[MA 2025 09] Fusing Vision, Voice & Vitals for Multimodal AI Triage
MESH, 370 Sarphatistraat, 1018GW Amsterdam
Proposed by: MESH R&D (Amsterdam, NL) [Dr. Simon Haddadin (Medical Doctor, MESH Founder)]
Hosting Entity
- External Organization: MESH R&D (Amsterdam, NL)
- Clinical Supervisor: Dr. Simon Haddadin (Medical Doctor, MESH Founder)
- Technical Supervisor: Denis Yakovlev, CTO
Project Context
This SRP supports the development of SAPIEN, MESH’s AI-driven diagnostic platform. The goal is to combine multiple patient inputs — facial imagery, voice tone, posture, and optionally vitals — into a unified model that flags relevant health risks. This early triage engine forms the heart of a CE-compliant digital twin.
Problem Statement
Traditional intake relies solely on verbal symptoms and vitals. This project explores a multimodal fusion approach to triage using smartphone/webcam video, short voice clips, and basic sensor input — making early diagnostics faster, broader, and more human-aware.
Research Questions
1. Which signal combinations (e.g., facial expression + vocal fatigue + movement) best predict clinical concern levels?
2. How can multimodal models operate efficiently on real-world hardware (e.g., mobile or in-clinic setups)?
3. What architectures (early vs. late fusion) are most robust under variable input quality?
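To make the early- vs. late-fusion question concrete, here is a minimal toy sketch in pure Python (no ML framework). The feature vectors, weights, and the linear "classifier" are hypothetical placeholders for illustration only, not part of SAPIEN: early fusion concatenates modality features before a single classifier, while late fusion scores each modality separately and combines the decisions.

```python
# Toy contrast of early vs. late fusion for two modalities.
# All features and weights are made-up placeholders.

def dot(w, x):
    """Linear score: weighted sum of features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def early_fusion(face_feats, voice_feats, weights):
    """Concatenate modality features, then apply one shared classifier."""
    fused = face_feats + voice_feats
    return dot(weights, fused)

def late_fusion(face_feats, voice_feats, w_face, w_voice, alpha=0.5):
    """Score each modality separately, then blend the two decisions.

    alpha weights the face score against the voice score."""
    face_score = dot(w_face, face_feats)
    voice_score = dot(w_voice, voice_feats)
    return alpha * face_score + (1 - alpha) * voice_score

face = [0.2, 0.8]   # e.g. facial-expression features (hypothetical)
voice = [0.5, 0.1]  # e.g. vocal-fatigue features (hypothetical)

print(early_fusion(face, voice, [1.0, 1.0, 1.0, 1.0]))   # → 1.6
print(late_fusion(face, voice, [1.0, 1.0], [1.0, 1.0]))  # → 0.8
```

One practical note relevant to the robustness question: late fusion degrades more gracefully when one input channel is missing or noisy, since the per-modality scores can simply be re-weighted (e.g. by adjusting `alpha`), whereas an early-fusion classifier expects the full concatenated feature vector.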
Timeline
Preferred: Begin as a working student as soon as possible; full-time SRP during the academic period November 2025 – June 2026
Expected Results
- Functional prototype fusing at least two input modalities (vision, voice, vitals)
- Lightweight triage classifier integrated into SAPIEN’s backend
- Evaluation of accuracy vs. interpretability tradeoffs
- Scientific report + publication-ready output
Student Benefits
- €600/month R&D stipend
- Mentorship from clinical + AI domain experts
- Access to structured movement tasks, video capture tools, and test patients
- Co-author opportunity on publishable thesis
- Potential post-thesis role or referral
- Bonus support for hackathons or conference travel (based on outcome)