Scientific Research Project

[MA 2025 12] Evidence-based informatics: toward an evaluation framework for hospital systems building or buying LLM applications

Amsterdam UMC Department of Medical Informatics. In collaboration with Emma Kinderziekenhuis and the Amsterdam UMC ICT department.

Proposed by: David Neal [d.n.neal@amsterdamumc.nl]

Introduction

There is much interest in the potential for a wide range of applications of large language models (LLMs) to reduce administrative burdens on health care professionals and improve the efficiency of health care delivery.1 If a business case is considered strong enough to invest in this new technology, decision-makers are faced with choosing between a large number of commercially developed LLM-powered services. Decision-makers may also have the option to partner with a commercial provider to develop a new application, or may even have capacity for fully in-house development projects. To choose the right solution for their context, decision-makers should take an evidence-based approach. Whilst there is a proliferation of evidence and evaluation frameworks for LLMs,2–5 the primary focus is on aspects of model performance, whereas decision-makers seeking to procure or develop LLM-based applications also need to consider additional factors related to their organization and prospective users. From the organizational perspective, relevant evidence pertains to implementation requirements and feasibility, and to outcomes in each domain of the quadruple aim of value-based health care: patient health outcomes, patient satisfaction, staff satisfaction, and efficiency of health care delivery. From a user perspective, solutions should be compared based on their usability (defined as ease of use, efficiency and effectiveness for a given user, task and context) for the target users within the organization. Additionally, decision-makers need to acquire evidence for investment decisions at a lower cost in time and money than is typical for scientific research. There is currently a lack of standardized frameworks or approaches for efficiently acquiring such evidence to facilitate evidence-based informatics in health care. There is therefore a need for a decision-maker-focused framework for efficient collection of evidence that facilitates comparison of solutions from an organizational and technology user perspective.

Description of the SRP Project/Problem

The student will work toward developing an evaluation framework that can support managers within hospital systems to efficiently evaluate and compare competing LLM applications for a given use case, in order to make evidence-based decisions about procurement or development. This framework will be developed based on a comparative evaluation of two LLM-based applications within Amsterdam UMC: Ask Aletta and the AI Proeftuin.

Research questions

RQ1) Which evidence is required by decision-makers?

RQ1.1) Which stakeholders within Amsterdam UMC are involved in making decisions about procuring or developing LLM applications?

RQ1.2) What are the decisions that each stakeholder must take (e.g. provide development budget, agree to pilot project, procure an application)?

RQ1.3) For each stakeholder, what is the most salient information that would influence their decision-making processes?

RQ2) How can the most salient information be efficiently obtained?

RQ2.1) For the most salient information, which methodological approaches are appropriate to obtain evidence?

RQ2.2) What are the strengths and limitations of the methodological approaches?

RQ3) Case study: based on evidence collected in line with the evaluation framework, should Amsterdam UMC invest in procurement of Ask Aletta or in further development and implementation of the AI Proeftuin?

Expected results

- Master's thesis consisting of a minimum of three chapters; at least one of these may be suitable for subsequent write-up as a scientific paper describing a novel framework for pragmatic evaluation of application development or procurement proposals.

- Evaluation framework for LLM applications, describing the domains and outcomes on which applications will be evaluated, when applications will be evaluated (in relation to key decisions to be made over product development and procurement cycles), minimum standards and supporting evidence sought at each timepoint for each domain, and how results should be presented and to which stakeholders:

- Presented in the form of a static document that can be disseminated to Amsterdam UMC managers in IT, procurement, strategy, and external stakeholders.

- In the form of a prototype dashboard that can be used to track evaluation status by application or use case.

- Business report for the Amsterdam UMC AI strategy lead and managers at the Emma Kinderziekenhuis

Time period, please tick at least 1 time period

November – June (x)

May – November