[MA 2025 09] Fusing Vision, Voice & Vitals for Multimodal AI Triage

MESH, Sarphatistraat 370, 1018 GW Amsterdam
Proposed by: MESH R&D (Amsterdam, NL); Dr. Simon Haddadin, Medical Doctor and MESH Founder

Hosting Entity

- External Organization: MESH R&D (Amsterdam, NL)

- Clinical Supervisor: Dr. Simon Haddadin (Medical Doctor, MESH Founder)

- Technical Supervisor: Denis Yakovlev, CTO

Project Context

This SRP supports the development of SAPIEN, MESH's AI-driven diagnostic platform. The goal is to combine multiple patient inputs (facial imagery, voice tone, posture and, optionally, vitals) into a unified model that flags relevant health risks. This early-triage engine forms the core of a CE-compliant digital twin.

Problem Statement

Traditional patient intake relies solely on verbally reported symptoms and vital signs. This project explores a multimodal fusion approach to triage that uses smartphone or webcam video, short voice clips, and basic sensor input, with the aim of making early diagnostics faster, broader, and more human-aware.

Research Questions

1. Which signal combinations (e.g., facial expression + vocal fatigue + movement) best predict clinical concern levels?

2. How can multimodal models operate efficiently on real-world hardware (e.g., mobile or in-clinic setups)?

3. What architectures (early vs. late fusion) are most robust under variable input quality?
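To make the early- vs. late-fusion distinction in question 3 concrete, here is a minimal sketch of how each strategy handles a missing modality. It is an illustration under stated assumptions, not SAPIEN's design: the modality names, feature dimensions, zero-filling policy, and weighted-mean combination of per-modality risk scores are all hypothetical choices.

```python
from typing import Optional

# Hypothetical per-modality feature vectors (e.g. outputs of a face, voice,
# or vitals encoder). None marks a modality that was not captured.
Features = dict[str, Optional[list[float]]]


def early_fusion(features: Features, order: list[str], dims: dict[str, int]) -> list[float]:
    """Early fusion: concatenate all modality features into one input vector.

    A missing modality is zero-filled (an assumed policy), so a single
    downstream classifier has to cope with the gap itself.
    """
    fused: list[float] = []
    for name in order:
        vec = features.get(name)
        fused.extend(vec if vec is not None else [0.0] * dims[name])
    return fused


def late_fusion(scores: dict[str, Optional[float]], weights: dict[str, float]) -> float:
    """Late fusion: combine per-modality risk scores (each in [0, 1]).

    Uses a weighted mean whose weights are rescaled over the modalities
    that are actually present, so a missing input is simply skipped.
    """
    present = {m: s for m, s in scores.items() if s is not None}
    if not present:
        raise ValueError("no modality available")
    total_weight = sum(weights[m] for m in present)
    return sum(weights[m] * s for m, s in present.items()) / total_weight


if __name__ == "__main__":
    # Illustrative toy inputs only; the voice clip is missing in both cases.
    feats = {"face": [0.2, 0.7], "voice": None, "vitals": [0.9]}
    print(early_fusion(feats, ["face", "voice", "vitals"], {"face": 2, "voice": 3, "vitals": 1}))
    print(late_fusion({"face": 0.6, "voice": None, "vitals": 0.8},
                      {"face": 1.0, "voice": 1.0, "vitals": 2.0}))
```

With the voice clip absent, early fusion hands the classifier a zero-padded vector `[0.2, 0.7, 0.0, 0.0, 0.0, 0.9]`, whereas late fusion rescales over the remaining face and vitals scores, giving (1.0 × 0.6 + 2.0 × 0.8) / 3 ≈ 0.73. Which behaviour holds up better under real-world input quality is the kind of question the project would need to evaluate empirically.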

Timeline

Preferred: start as a working student as soon as possible, followed by a full-time SRP during the academic period November 2025 – June 2026

Expected Results

- Functional prototype fusing at least two of the three input modalities (vision, voice, vitals)

- Lightweight triage classifier integrated into SAPIEN’s backend

- Evaluation of accuracy vs. interpretability tradeoffs

- Scientific report + publication-ready output

Student Benefits

- €600/month R&D stipend

- Mentorship from clinical + AI domain experts

- Access to structured movement tasks, video capture tools, and test patients

- Co-authorship opportunity on a publishable thesis

- Potential post-thesis role or referral

- Bonus support for hackathons or conference travel (based on outcome)