Healthcare Foundation Models: where’s the breakthrough?
The medical world has a finicky relationship with AI and ML systems, and for good reason. As industries worldwide tinker with foundation models that leverage specialized data for a bespoke machine-driven edge, healthcare researchers must grapple with the strictest privacy standards and procedures around user (patient) data. This in turn creates additional problems, such as a dearth of shared evaluation frameworks for budding AI systems, frameworks that are crucial to machine learning fields everywhere. Given the extremely delicate nature of their services, healthcare ML developers cannot enjoy the same try-and-fail process as their colleagues in other domains; yet they are in the midst of model creation that may push boundaries well beyond the medical realm.
Since the AI hype cycle began two years ago, many researchers have looked at the lackluster predictive accuracy and factual inconsistency of LLMs and questioned whether such systems could ever be worth investing in for a safe and reliable healthcare service. Scholars at Stanford’s Human-Centered Artificial Intelligence Institute reviewed more than 80 clinical foundation models, including Clinical Language Models (CLaMs) and Foundation Models for Electronic Medical Records (FEMRs). The models proved considerably difficult to evaluate, yet showed the potential to handle complex medical data without extensive labeling efforts. “The authors propose a new evaluation paradigm to better align with clinical value, emphasizing the need for clearer metrics and datasets in healthcare applications of LLMs. They acknowledge the risks, including data privacy concerns and interpretability issues, but remain optimistic about the potential of foundation models in addressing healthcare challenges.” More and more researchers support this view, prompting new strategies for collecting medical datasets that let AI systems classify and respond more accurately. In 2023, Michael Moor et al. at Stanford put forward a paper arguing for the feasibility of a generalist medical AI (GMAI) that could interpret an exceptional breadth of medical modalities (e.g., “imaging, electronic health records, laboratory results, genomics, graphs or medical text”) with little or no specialized data labeling. Such a sweeping model was unheard of in the medical community only a couple of years ago.
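To make the “little or no specialized data labeling” idea concrete, here is a minimal, hypothetical sketch of one common self-supervised recipe: each modality gets its own encoder into a shared embedding space, and a CLIP-style contrastive loss uses the pairing of a patient’s own records as the supervision signal. The module names, dimensions, and loss choice are illustrative assumptions, not the GMAI architecture from Moor et al.

```python
# Hypothetical sketch: per-modality encoders trained contrastively,
# so paired records supervise each other -- no diagnostic labels needed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps one modality (e.g. a note embedding or lab values) into a shared space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.GELU(), nn.Linear(512, embed_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-length embeddings

def contrastive_loss(a: torch.Tensor, b: torch.Tensor, temp: float = 0.07):
    """Pull embeddings of the same patient together, push other patients apart."""
    logits = a @ b.t() / temp          # pairwise similarity matrix
    targets = torch.arange(a.size(0))  # i-th note pairs with i-th labs
    return F.cross_entropy(logits, targets)

# Toy usage: 32 patients, each with a 768-d note embedding and 40 lab values.
notes_enc, labs_enc = ModalityEncoder(768), ModalityEncoder(40)
notes, labs = torch.randn(32, 768), torch.randn(32, 40)
loss = contrastive_loss(notes_enc(notes), labs_enc(labs))
loss.backward()  # gradients flow; a real run would loop over an EHR dataset
```

The appeal for healthcare is exactly what the Stanford review highlights: the training signal comes from how a patient’s records co-occur, not from armies of clinical annotators.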
If these developments hold up, new multipurpose foundation models can deliver on the medical intelligence revolution in three core areas: medical diagnosis, treatment planning, and drug discovery. AI diagnosis has already shown promise in oncology. Last March, The Economist broke the story of Barbara, a UK woman whose mammogram the year before contained a small but aggressive growth that her doctors had disregarded. A new AI diagnosis system named Mia correctly identified the six-millimetre patch as Stage 2 breast cancer, which would have evolved quickly between then and her next routine checkup. Outside precision oncology, these models have also done significant work in reducing hospital admission times, accelerating new drug development, and improving clinical trial design for testing the efficacy of a new product. Significant issues remain with these advances, however, as many researchers grapple with moral and accountability concerns in the AI-enabled drug-development culture.
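Systems like Mia typically act as a second reader rather than an autonomous diagnostician. The sketch below, a hypothetical triage step loosely inspired by that workflow (not Mia’s actual implementation), shows the key design choice: a deliberately low, recall-oriented threshold that routes borderline findings to a radiologist instead of dismissing them.

```python
# Hypothetical second-reader triage: flag patches for human review.
import numpy as np

def flag_suspicious(patch_scores: np.ndarray, threshold: float = 0.30) -> np.ndarray:
    """Return indices of mammogram patches whose malignancy score exceeds the threshold.

    The threshold is deliberately low: in screening, a false positive costs
    a radiologist a second look, while a false negative can cost a missed cancer.
    """
    return np.flatnonzero(patch_scores >= threshold)

# Toy usage: scores a trained model might emit for 8 patches of one mammogram.
scores = np.array([0.02, 0.05, 0.41, 0.08, 0.01, 0.33, 0.04, 0.07])
print(flag_suspicious(scores))  # -> [2 5]: two patches sent for human review
```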
The Food and Drug Administration (FDA) in the United States continues to approve AI-driven drug-discovery and diagnostic services. Even once the extensive data-intake needs discussed above are mitigated, the fundamental concern is whether the training and output of these AI systems are backed by effective oversight. Regulating a nascent technology is inherently more reactive than preemptive. Still, as NYT reporter Christina Jewett has learned covering the FDA: “In medicine, the cautionary tales about the unintended effects of artificial intelligence are already legendary.” Noticing the warning signs, from overzealous development at commercial upstarts to opaque research at top institutions, is essential to promoting realistic dialogue outside the hype. Healthcare is drowning in data, and the urgency to wrangle it comes with an equal intensity to protect it. Because of this, and as new models emerge, perhaps medical foundation models are the one participant in the greater AI arc with the best chance to get it right.
Further reading: