Scientific Research Project

[MA 2025 14] Exploiting missingness masks to detect novel patients at test time

Department of Medical Informatics, Amsterdam UMC

Proposed by: Giovanni Cinà [g.cina@amsterdamumc.nl]

Introduction

ML models are vulnerable to changes in data distribution; their performance can degrade quickly when they are applied to patients drawn from distributions that differ from the training data. For this reason, in high-stakes environments such as clinical applications of AI, it is important to develop methods that flag patients who are ‘new’ to the model, to prevent the AI from making mistakes.

In clinical ML, since all tests are ordered by clinicians for a reason, how data are missing often carries information about workflow, patient acuity, and site-specific practice patterns. These “missingness masks”—binary indicators of which features are observed and which are missing—shift across hospitals, time, and acquisition pipelines. It has been shown that such masks can have value for prediction [1]. Such shifts can also signal that new samples are out-of-distribution (OOD) relative to training, even when observed feature values appear benign. In this respect, imputation of missing values can lead to loss of information when it comes to OOD detection.
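As a minimal illustration of the last point, the Python sketch below builds a missingness mask for a toy table and shows two patients who become indistinguishable after mean imputation although their masks differ. The records, the column meanings, the convention that 1 marks a missing entry, and the helper names `missingness_mask` and `mean_impute` are assumptions made for this example only.

```python
from statistics import mean

# Toy records (hypothetical values); None marks a test that was not ordered.
# Columns: heart rate, lactate, troponin.
records = [
    [80.0, 2.0, None],
    [90.0, None, 0.4],
    [85.0, 2.0, 0.4],
    [85.0, None, None],
]

def missingness_mask(rows):
    """Return 1 where a value is missing and 0 where it is observed (assumed convention)."""
    return [[1 if v is None else 0 for v in row] for row in rows]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

masks = missingness_mask(records)
imputed = mean_impute(records)

# The last two patients have identical imputed values but different masks.
print(imputed[2] == imputed[3])  # True
print(masks[2], masks[3])        # [0, 0, 0] [0, 1, 1]
```

After imputation, a value-based detector sees the fully tested patient and the barely tested patient as the same point; only the mask retains the difference in ordering behavior.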

Description of the SRP Project/Problem

The goal of the project is to develop and evaluate OOD detectors for medical tabular data that explicitly leverage missingness masks—alone and in combination with feature values—to identify distribution shift at deployment.

The motivation for this approach is that standard OOD detectors (e.g., distance-, density-, or energy-based) typically ignore the generative process behind missing data, risking silent failures when clinical ordering behavior changes. A change in missingness patterns can be a good proxy for a difference in care protocol, and can therefore signal the presence of OOD samples. A dedicated benchmark for medical tabular OOD detection [2] highlights the need for methods tailored to this setting and can be reused to test OOD detectors that make use of missingness masks.
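One simple way to quantify a change in missingness patterns, sketched below under stated assumptions, is to compare per-feature missingness rates between the training data and a batch of deployment data with a pooled two-proportion z statistic. This is an illustrative baseline rather than the method the project will adopt; the masks are synthetic, the convention that 1 marks a missing entry is assumed, and the helper names are hypothetical.

```python
import math

def missing_rates(masks):
    """Fraction of rows in which each feature is missing (mask value 1)."""
    n = len(masks)
    return [sum(col) / n for col in zip(*masks)]

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic for the change from p1 to p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return 0.0 if se == 0 else (p2 - p1) / se

# Synthetic masks: feature 0 is always observed; feature 1 is missing
# in 20% of training rows and 60% of deployment rows.
train_masks = [[0, 1]] * 20 + [[0, 0]] * 80
deploy_masks = [[0, 1]] * 60 + [[0, 0]] * 40

train_rates = missing_rates(train_masks)
deploy_rates = missing_rates(deploy_masks)
z_scores = [
    two_proportion_z(p1, len(train_masks), p2, len(deploy_masks))
    for p1, p2 in zip(train_rates, deploy_rates)
]
```

A large absolute z statistic for a feature suggests that its ordering rate has changed between the two data sets, independently of the observed feature values.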

The project will compare three configurations: 1) mask-only detectors, trained only on missingness masks; 2) mask-augmented detectors, trained on the concatenation of covariates and missingness masks; and 3) value-only detectors, trained on covariates alone after various kinds of imputation. Plausible OOD scenarios will be created by altering test-time ordering policies (mask distribution), site/domain shifts, and calendar drift.
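The three input configurations can be sketched as follows. This is a minimal example, not the project's implementation: it assumes mean imputation, a mask convention where 1 marks a missing entry, and a nearest-neighbor distance as a stand-in for any distance-based OOD score. The data and the names `column_means`, `build_inputs`, and `knn_score` are hypothetical.

```python
import math

def column_means(rows):
    """Mean of the observed (non-None) values in each column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return means

def build_inputs(rows, fill, mode):
    """Build detector inputs for one of the three configurations."""
    values = [[fill[j] if v is None else v for j, v in enumerate(r)] for r in rows]
    masks = [[1.0 if v is None else 0.0 for v in r] for r in rows]
    if mode == "mask_only":
        return masks
    if mode == "value_only":
        return values
    if mode == "mask_augmented":
        return [v + m for v, m in zip(values, masks)]
    raise ValueError(f"unknown mode: {mode}")

def knn_score(train, x, k=1):
    """Distance from x to its k-th nearest training point; larger means more novel."""
    return sorted(math.dist(x, t) for t in train)[k - 1]

# Training site always orders the second test; the test patient lacks it.
train_rows = [[80.0, 2.0], [90.0, 1.0], [85.0, 1.5]]
test_rows = [[85.0, None]]
fill = column_means(train_rows)  # imputation statistics come from training data only

scores = {}
for mode in ("mask_only", "mask_augmented", "value_only"):
    train_x = build_inputs(train_rows, fill, mode)
    test_x = build_inputs(test_rows, fill, mode)
    scores[mode] = knn_score(train_x, test_x[0])
```

In this toy case the value-only score is 0.0, because the imputed patient coincides with a training point, while the mask-only and mask-augmented scores are both 1.0. In practice the covariates would need to be scaled before being concatenated with the binary mask, so that neither part dominates the distance.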

Research questions

RQ1) How predictive is the missingness mask alone for OOD detection in medical tabular data compared with value-based baselines?

RQ2) Do mask-augmented detectors significantly improve OOD detection performance over value-only detectors using imputation?

RQ3) More broadly, can mask distribution monitoring detect clinically meaningful shifts earlier than performance drops (early-warning drift detection)?

RQ4) How sensitive are mask-based detectors to different missingness mechanisms (MCAR/MAR/MNAR, see [3]) and to site-specific ordering policies observed in practice?
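For RQ4, synthetic missingness can be injected into complete data under each mechanism. The sketch below is one hypothetical way to do so, using a simple threshold model: under MCAR the drop probability is constant, under MAR it depends on another, fully observed column, and under MNAR it depends on the value being dropped. The function name, parameters, default probabilities, and Gaussian toy data are all assumptions for illustration.

```python
import random

def apply_missingness(rows, target, mechanism, rng,
                      p_low=0.1, p_high=0.6, driver=0, threshold=0.0):
    """Return a copy of rows with column `target` masked under the given mechanism.

    MCAR: every entry is dropped with probability p_low.
    MAR:  the drop probability depends on the fully observed column `driver`.
    MNAR: the drop probability depends on the target value itself.
    """
    out = []
    for row in rows:
        if mechanism == "MCAR":
            p = p_low
        elif mechanism == "MAR":
            p = p_high if row[driver] > threshold else p_low
        elif mechanism == "MNAR":
            p = p_high if row[target] > threshold else p_low
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new_row = list(row)  # copy so the complete data stay untouched
        if rng.random() < p:
            new_row[target] = None
        out.append(new_row)
    return out

# Toy complete data: two independent standard-normal features.
rng = random.Random(0)
complete = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
datasets = {
    name: apply_missingness(complete, target=1, mechanism=name, rng=rng)
    for name in ("MCAR", "MAR", "MNAR")
}
```

Changing `p_low`, `p_high`, or `threshold` between the training and test splits gives a crude stand-in for a change in ordering policy, against which mask-based detectors could be compared.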

Expected results

- The main intended outcome of this SRP project is a scientific paper. The results of the work will be submitted to a top-tier machine learning workshop, and this paper will constitute your SRP.

- The second main deliverable is an open-source code base in which all experiments conducted will be shared with the research community.

Time period, please tick at least 1 time period:

November – June

May – November

References

[1] Le Morvan, Marine, et al. "NeuMiss networks: differentiable programming for supervised learning with missing values." Advances in Neural Information Processing Systems 33 (2020): 5980-5990.

[2] Azizmalayeri, Mohammad, Ameen Abu-Hanna, and Giovanni Cinà. "Unmasking the chameleons: A benchmark for out-of-distribution detection in medical tabular data." International Journal of Medical Informatics 195 (2025): 105762.

[3] Donders, A. Rogier T., et al. "A gentle introduction to imputation of missing values." Journal of Clinical Epidemiology 59.10 (2006): 1087-1091.