Leading Edge
Review
The generative era of medical AI
L. John Fahrner,1,3 Emma Chen,1,3 Eric Topol,2,4,5 and Pranav Rajpurkar1,4,5,*
1Department of Biomedical Informatics, Harvard Medical School, Cambridge, MA, USA
2Scripps Research, La Jolla, CA, USA
3These authors contributed equally
4These authors contributed equally
5Senior author
*Correspondence: [email protected]
https://doi.org/10.1016/j.cell.2025.05.018
SUMMARY
Rapid advancements in artificial intelligence (AI), particularly large language models (LLMs) and multimodal AI, are transforming medicine through enhancements in diagnostics, patient interaction, and medical forecasting. LLMs enable conversational interfaces, simplify medical reports, and assist clinicians with decision-making. Multimodal AI integrates diverse data like images and genetic data for superior performance in pathology and medical screening. AI-driven tools promise proactive, personalized healthcare through continuous monitoring and multiscale forecasting. However, challenges like bias, privacy, regulatory hurdles, and integration into healthcare systems must be addressed for widespread clinical adoption.
INTRODUCTION

Technological innovation in biomedicine has directly contributed to improved quality of life and extended healthspan. Historically, advances in drug development, surgical techniques, understanding of biological pathways, imaging techniques, and other areas have propelled this progress. Now we are on the verge of a new phase of growth with the recent progress in artificial intelligence (AI), which we will attempt to summarize here. The weekly Doctor Penguin newsletter has continued to track novel developments in medical and health AI since 2019 and serves as a source of material for this review (https://doctorpenguin.substack.com). From a technical perspective, modern AI advancements have been enabled by several key architectural innovations, including the Transformer architecture, generative adversarial networks, and diffusion models, which together have powered the development of increasingly sophisticated generative AI systems. Research has shown the potential for transformative change because of large language models (LLMs) and multimodal AI, the changing medical practice, and multiscale medical forecasting; this review aims to summarize this seemingly exponential progress over the last 3 years. We will discuss the background, implementation, implications, and some of the persistent challenges associated with these new technologies.

LLMs AND THE PATH TO MULTIMODAL MEDICINE

The promise of AI in healthcare dates to the 1960s, when Joseph Weizenbaum developed ELIZA, one of the first chatbots.1 ELIZA simulated a Rogerian psychotherapist, engaging in simple dialogue with users. Subsequent efforts to create conversational AI for medicine were hindered by the limited capabilities of early AI systems, which were not reliable enough for real-world clinical applications.

Fast forward to today, LLMs like ChatGPT, Gemini, Claude, and Llama have captured the attention of the world. These models exemplify two important paradigms in modern AI: foundation models—large-scale, general-purpose AI systems trained on vast datasets that can be adapted to numerous downstream tasks—and generative AI, which enables the creation of novel content such as text, images, or molecular designs by modeling complex data distributions. Unlike traditional AI, which predominantly focused on discriminative classification tasks and relied on specialized architectures for specific domains, generative AI learns to generate outputs that statistically resemble the training data. This capability stems primarily from the Transformer architecture, introduced in 2017, which has redefined scalability and performance in AI.2

The Transformer's core innovation is self-attention, a mechanism that dynamically weighs the relevance of different input elements, allowing the model to capture long-range dependencies in data, such as relationships across sentences or protein sequences. In generative tasks, the decoder component of the Transformer is critical; it generates outputs sequentially (e.g., one word or token at a time) by attending to both the input context and previously generated elements. This architecture powers LLMs, such as those driving chatbots or text synthesis tools, which are trained on vast datasets to learn intricate patterns in language or other domains. Earlier approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs) faced fundamental scaling bottlenecks—RNNs struggled with parallelization and long-range dependencies, while CNNs had locality biases limiting their effectiveness for sequential data.
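To make the mechanism concrete, the sketch below implements scaled dot-product self-attention with a causal mask in plain NumPy. It is a minimal illustration of the idea described above, assuming arbitrary toy dimensions and random weights; it is not the production implementation of any particular LLM.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence X.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    A causal mask keeps each position from attending to later ones,
    which is what lets a decoder generate one token at a time.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise relevance
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                            # hide future tokens
    weights = softmax(scores, axis=-1)                # attention weights
    return weights @ V                                # weighted mix of values

# Toy example: 5 tokens, 16-dim embeddings, one 8-dim attention head.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(causal_self_attention(X, Wq, Wk, Wv).shape)     # (5, 8)
```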
The key discovery that Transformer models consistently improve as they grow larger accelerated the recent AI development. Researchers found that simply increasing the model size, training data, and computing power leads to predictable gains in performance—a property not seen with previous approaches where improvements would eventually plateau.3 This mathematical predictability, coupled with advances in specialized computing hardware and the availability of petabyte-scale datasets, established the precise conditions necessary for the current AI revolution. The convergence of these factors has positioned generative AI as a transformative tool for applications like drug discovery, clinical decision support, and automated analysis of medical literature, offering unprecedented opportunities to accelerate biomedical research.
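The "predictable gains" referred to above are usually summarized as empirical power laws relating loss to model size. The snippet below is a toy illustration of that shape; the coefficients are invented for the example and are not a fit to any published scaling study.

```python
import numpy as np

def power_law_loss(n_params, a=400.0, alpha=0.34, floor=1.7):
    """Toy neural scaling law: loss falls as a power of model size and
    flattens toward an irreducible floor. Coefficients here are
    illustrative assumptions, not measured values."""
    return a * n_params ** (-alpha) + floor

for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} parameters -> predicted loss {power_law_loss(n):.2f}")
```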
When applied to direct patient interaction, LLMs are charting a path toward meaningful conversational AI in medicine. In this application, LLMs can provide patients with accessible conversational interfaces to interact directly with their own individual health data in the electronic health record (EHR) and also with general medical information.4–6 For example, LLM agents have been used to reduce the complexity of pathology reports and for transforming hospital discharge summaries into a patient-friendly format.7,8 When mental health chatbots are made available to patients, they have shown potential in reducing stigma about mental health care and have demonstrated increased referral rates, most significantly for traditionally underrepresented groups.9 These results are notable, as the first steps of seeking care and receiving an appropriate referral are common barriers in the mental health pathway. These conversational agents can assist patients in navigating their healthcare course, providing personalized information and support. While many LLM tools await medical approval, early reports suggest patients are already testing their benefits; in one example, a mother was able to diagnose her young son's tethered cord after multiple fruitless physician visits.10

In addition to conversational and summarization agents, LLMs can be tools for clinicians.11 Models in the research setting have demonstrated performance at least comparable to clinicians in history-taking, following diagnostic pathways, communication, and empathy.11–14 LLMs can also serve as medical knowledge resources for clinicians. Dedicated LLMs have been developed for specialized fields, enabling clinicians to access expert knowledge and to assist with decision-making and guideline adherence, and have already gained certification.15–17 Models now routinely achieve passing scores on medical licensing exams, showcasing their potential to provide up-to-date and comprehensive medical information.18–21 By leveraging the vast knowledge encapsulated within LLMs, these diagnostic tools promise to aid clinicians in making accurate and timely diagnoses and guiding management decisions.

Beyond these applications, LLMs could be integrated into healthcare delivery to automate documentation tasks and improve clinician efficiency. AI-powered "scribes" are capable of recording patient histories; creating medical notes; handling pre-authorization requests for medications or tests; scheduling follow-up appointments; and managing lab test results, scans, procedures, billing, and more and are already being used clinically.11,22–25 Notably, LLMs have shown the ability to summarize medical information as effectively as human experts. Much of the emerging research on LLMs focuses on "agentic" environments, in which AI systems dynamically complete complex tasks by interacting with existing systems, humans, or other agents. Agentic systems promise to automate workflows, validate AI safety and reduce errors, aid in managing disparate AI tools, and to provide outcomes predictions, among other skills.26 Polaris AI exemplifies this agentic approach through its "constellation architecture," where a primary conversational agent works in concert with specialized LLM agents—including medication specialists that verify dosages, labs specialists that analyze test results, and nutrition specialists that provide tailored dietary guidance—enabling the system to maintain both engaging conversation and medical accuracy while ensuring built-in safety redundancies for healthcare interactions.27
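A constellation-style system can be pictured as an orchestrator that routes each user turn through specialist checkers before anything is returned. The sketch below is a hypothetical, highly simplified illustration of that pattern; the specialist names and the `call_llm` helper are assumptions for the example and do not reproduce Polaris's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Stand-in for a real LLM call; in practice each agent would wrap its
# own model and system prompt. Hypothetical helper, not a real API.
def call_llm(system_prompt: str, message: str) -> str:
    return f"[{system_prompt}] response to: {message}"

@dataclass
class SpecialistAgent:
    name: str
    system_prompt: str

    def review(self, draft_reply: str) -> str:
        # Each specialist checks the primary agent's draft from its own
        # angle (dosages, lab values, diet) and returns a critique.
        return call_llm(self.system_prompt, draft_reply)

def constellation_turn(user_message: str, specialists: List[SpecialistAgent]) -> Dict[str, str]:
    # 1. Primary conversational agent drafts an empathetic reply.
    draft = call_llm("primary conversational agent", user_message)
    # 2. Specialist agents independently verify the draft.
    critiques = {s.name: s.review(draft) for s in specialists}
    # 3. Primary agent revises the draft in light of the critiques.
    revised = call_llm("primary agent revising with specialist feedback",
                       draft + " | " + " | ".join(critiques.values()))
    return {"draft": draft, "final": revised}

specialists = [
    SpecialistAgent("medication", "verify drug names and dosages"),
    SpecialistAgent("labs", "interpret laboratory results"),
    SpecialistAgent("nutrition", "give dietary guidance"),
]
print(constellation_turn("Can I take my metformin with breakfast?", specialists)["final"])
```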
Recent advancements in chain-of-thought prompting and reasoning techniques have addressed the challenge of ensuring accurate and clinically relevant outputs from LLMs.28,29 These approaches have facilitated the development of datasets optimized for reasoning and, more recently, specialized reasoning models.30–32 As these techniques evolve, large reasoning models are poised to become increasingly prevalent in clinical applications.
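As a concrete illustration, chain-of-thought prompting simply asks the model to lay out intermediate reasoning before committing to an answer. The wording below is an assumed example for exposition, not a validated clinical prompt.

```python
# A minimal chain-of-thought style prompt for a clinical question.
# The structure (reason first, answer last) is the point; the exact
# wording is an illustrative assumption.
prompt = (
    "A 68-year-old on warfarin reports dark stools and dizziness.\n"
    "Think step by step: list the key findings, the most likely\n"
    "diagnosis, and the immediate next steps. Then state your final\n"
    "recommendation on one line beginning with 'ANSWER:'."
)
print(prompt)
```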
hensive medical information.18–21 By leveraging the vast knowl cepts across modalities (like a medical image of a tumor and
edge encapsulated within LLMs, these diagnostic tools promise its textual description) are positioned close together. This align
to aid clinicians in making accurate and timely diagnoses and ment enables generative models to process and generate multi
guiding management decisions. modal outputs, such as synthesizing medical reports from imag
Beyond these applications, LLMs could be integrated into ing data or answering clinical queries by combining visual and
healthcare delivery to automate documentation tasks and textual inputs. Theoretically, this integration mimics human
improve clinician efficiency. AI-powered ‘‘scribes’’ are capable reasoning, where physicians synthesize diverse information to
of recording patient histories; creating medical notes; handling form diagnoses, making multimodal AI a fundamental break
pre-authorization requests for medications or tests; scheduling through for medicine. By modeling statistical relationships
follow-up appointments; and managing lab test results, scans, across modalities, multimodal generative AI transforms medical
procedures, billing, and more and are already being used clini applications; it enhances diagnostic accuracy by correlating im
cally.11,22–25 Notably, LLMs have shown the ability to summarize aging and clinical notes, accelerates drug discovery by inte
medical information as effectively as human experts. Much of the grating molecular structures with textual annotations, and per
emerging research on LLMs focuses on ‘‘agentic’’ environments, sonalizes treatment plans by combining patient histories with
in which AI systems dynamically complete complex tasks by in real-time sensor data. Unlike unimodal models, which risk
Unlike unimodal models, which risk fragmented insights, multimodal AI captures the holistic nature of medical data, driving progress in precision medicine and clinical decision support. The scalability of models like CLIP, which improve with diverse and large-scale datasets, further amplifies their impact, positioning multimodal generative AI as a cornerstone of the generative AI era in medical research.37–44 Current multimodality AI research is focused on incorporating more modalities into a single model and incorporating volumetric datasets like magnetic resonance imaging (MRI) and computed tomography (CT) and video.
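The contrastive objective described above can be written in a few lines: image and text embeddings are normalized, all pairwise similarities are computed, and matched pairs are pushed up the diagonal of that similarity matrix. The sketch below, run on randomly generated embeddings, is a schematic of the CLIP-style loss rather than the exact training code of any published medical model.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) outputs of the image and text
    encoders for matched pairs (row i of each belongs together).
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature    # all pairwise similarities
    targets = torch.arange(len(logits))              # matched pairs lie on the diagonal
    loss_i = F.cross_entropy(logits, targets)        # image -> correct caption
    loss_t = F.cross_entropy(logits.T, targets)      # caption -> correct image
    return (loss_i + loss_t) / 2

# Toy batch: 8 image-report pairs with 512-dim embeddings.
torch.manual_seed(0)
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(clip_style_loss(img, txt).item())
```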
Multimodal AI is rapidly progressing in the field of pathology. Large pathology training sets can incorporate standardized images of slides and specimens with text reports, genomic data, and EHR data, providing a prime target for AI research.45 By applying the Transformer architecture, researchers have created "vision-language" models, which incorporate these text and image-analysis components into a unified model. These models are capable of advanced tasks like image captioning and answering questions about an image. An early model used publicly available social media pathology images and captions to produce a model named protein-ligand interaction profiler (PLIP).46 Later models used larger datasets; PathChat applied an LLM to the large UNI pathology image model to create a vision-language model, while contrastive learning from captions from histopathology (CONCH) trained natively on around 1 million image-text pairs from diverse sources.34,47 Both approaches resulted in accurate multimodal pathology models. As foundation models become more capable, it is likely that these single larger models will continue to replace dedicated smaller models.

CHANGING MEDICAL PRACTICE

Reviewing the extensive recent research in biomedical AI allows us to anticipate the future direction of medical care. State-of-the-art AI developments point to a new model of health management in which patients have more regular and detailed insight into their health. Patients are empowered to manage their own health with AI tools that provide more timely and personalized feedback. Traditional screening programs can become more tailored and personalized. AI has the potential to enable a transformation in the delivery of healthcare by shifting more care from reactive, hospital-centric treatment to proactive, personalized, and accessible health management. Patients can be triaged to more fine-tuned levels of care, with progressive escalation as needed. Earlier diagnosis and intervention reduce the reliance on acute care resources and can lead to improved outcomes. And finally, these new AI-powered tools promise to revolutionize medical forecasting with multiscale capabilities; predictions can be made at molecular, cellular, individual, and population scales, completely rethinking the standard model of care. Powerful AI models will enable more accurate and dynamic short-term and long-term risk assessment. These new AI-powered tools promise to improve the accessibility of care, improve decision-making, and provide more targeted management.

Newer AI-enabled smartwatches have demonstrated the ability to identify individuals at risk of atrial fibrillation, screen for left ventricular systolic dysfunction, and to monitor cardiac function post-COVID-19 vaccination.48–51 Devices in research settings include implantable temperature sensors for early detection of acute kidney rejection in transplant patients, sensors for continuous cortisol level detection in sweat, and wearable ultrasound devices for physiologic monitoring that use machine learning to maintain high-quality images during movement.52–54 Some AI-powered sensors do not provide continuous monitoring but do allow for more accessible diagnosis. For example, smartphone-based diagnostic tools, like dermoscopy lenses coupled with AI models, have demonstrated high accuracy in diagnosing suspicious skin lesions, reducing the need for in-person consultations.55 Similarly, smartphones with endoscope or otoscope attachments, combined with AI models, have been shown to assist with accurate remote diagnosis of diseases such as acute otitis media.56 By enabling individuals to take control of their health and well-being, these novel AI-enabled sensors have the potential to regularly monitor health in the home setting to identify developing issues earlier and to reduce the burden on healthcare systems.

Advanced medical screening
AI medical screening tools can detect disease earlier and more efficiently than with traditional methods. Screening programs have traditionally targeted large populations with broad inclusion criteria. AI enables more accurate targeting of high-risk individuals to realize individual and societal benefits. For example, researchers have recalibrated low-dose lung cancer screening recommendations using AI analysis to prioritize workup for higher-risk patients and to decrease screening frequency for those at lower risk.57

2D and 3D mammography-based AI interpretation algorithms have matched human abilities in real-world clinical scenarios while clinical rollout continues.58,59 Research has shown that more advanced models combining lesion detection and texture analysis to determine short-term and long-term breast cancer risks improve overall risk assessment.60 AI applied to traditional mammography has also been shown to help determine which patients would most benefit from supplemental MRI, reducing missed cancers without a large increase in the MRI screening burden.61

Photography- and video-based screening tools have been shown to provide affordable and fast analysis of complex neurological disorders, and they have also been able to go a step further and provide clinical predictions about disease progression.62–64 For example, a retinal image-based system was developed to predict myocardial infarction, providing a less invasive screening option.65

Next-generation AI-powered screening promises to more accurately triage patients, improve screening efficiency, and improve predictive analytics with accessible technology (Figure 1). The final large component of this new AI-enabled paradigm in healthcare is multiscale medical forecasting.
MULTISCALE MEDICAL FORECASTING

are paired with traditional and new data inputs to enable earlier and more precise, accurate, personalized, affordable, efficient, equitable, and convenient medical diagnosis.66 AI algorithms are being used in medical forecasting to predict future events or outcomes based on personalized patient information after training on large datasets. Forecasting applies to the entire context of health, from the molecular level to the cellular level, the organ system level, the individual level, and to the population and global levels (Figure 2).

Molecular-level progress
The field of protein science has been revolutionized by the development of an AI model called AlphaFold2, developed by DeepMind in 2020, which achieved unprecedented accuracy in protein structure prediction.67 This breakthrough, along with the independently developed RoseTTAFold, used attention mechanisms to predict 3D protein structures from amino acid sequences with near-experimental precision.68 AlphaFold2's success stemmed from its innovative use of multiple sequence alignments and its ability to learn spatial relationships between amino acids.67,69 This advancement not only accelerated structural biology research but also catalyzed developments in protein design, function prediction, and drug discovery; the developers of AlphaFold were awarded the 2024 Nobel Prize in Chemistry for this work.70

The impact of these advances extends beyond proof of concept. AlphaFold2 quickly expanded to include protein folding predictions for more than 200 million of the most common proteins found in over 1 million species.71 AlphaFold3 includes more complex biomolecular structures beyond individual proteins, expanding structure prediction capabilities to protein complexes and protein-ligand interactions.69 The pace of development of new AI tools has been staggering—within just a few weeks near the end of 2024, 10 major molecular-level research projects were released.72 These projects included a foundation model for DNA, a tool to predict protein-protein interactions, and a Human Cell Atlas, among others.73–75 These tools are crucial for understanding biological processes and for designing new therapeutics.70

However, as these models excel at predicting static structures, a significant challenge remains in capturing protein dynamics and flexibility.76 Current research is focused on extending these models to predict not just a single structure but also the various conformations a protein might adopt under various conditions.69
This is particularly important as subtle changes in protein folding can lead to significant physiologic differences and outcomes, with misfolded proteins implicated in diseases such as cystic fibrosis and Huntington disease.69 Currently these generative AI tools can be used to predict clinical outcomes of various cystic fibrosis mutations.77 AlphaFold2 has been helpful for predicting antigen proteins in pathogens like rotavirus and other infections.78 It has also been used in immunology research to predict antibody structure and assist with vaccine development, predict membrane protein structure and interactions to assist with drug development, characterize enzyme activity for diseases like porphyria, assist with research on drug resistance, and to predict outcomes for certain acute lymphoblastic leukemia (ALL) subtypes, among many other applications.78 From a clinical perspective, these models promise to enable personalized prediction of the impact of genetic mutations on protein function and disease pathways, as well as help us understand the nature of cancer.

Building on the foundations of protein structure prediction, the field of protein generation and design has seen significant advancements. Tools like RFdiffusion and FrameDiff use generative techniques to generate 3D structural protein backbones.79,80 Sequence generation tools like ProGen, ProteinMPNN, and Evo can output amino acid sequences based on various inputs and allow researchers to create novel proteins with specific structures or functions. The emergence of Evo 2 also represents a milestone multimodal foundation model incorporating DNA, RNA, and proteins in one large model.81 This shift from prediction to design represents a new frontier in protein engineering, opening possibilities for creating proteins tailored to specific tasks or environments.82–86 These models are helping to elucidate the mechanisms of biological action from DNA to RNA to protein, and they enable researchers to efficiently manipulate steps in these processes.
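In spirit, the sequence-generation tools named above are generative models over the 20 amino acids: given a context, they emit a probability distribution over the next residue, from which new sequences are sampled. The toy sampler below uses a uniform stand-in model to show only the sampling loop; it is an illustrative assumption, not the architecture of ProGen, ProteinMPNN, or Evo.

```python
import numpy as np

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues

def toy_next_residue_probs(context: str) -> np.ndarray:
    """Stand-in for a learned model: returns a probability distribution
    over the next amino acid given the sequence so far. A real tool
    would condition on structure, function tags, or family context."""
    rng = np.random.default_rng(abs(hash(context)) % (2**32))
    logits = rng.normal(size=len(AMINO_ACIDS))
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_protein(prefix: str = "M", length: int = 60, seed: int = 0) -> str:
    rng = np.random.default_rng(seed)
    seq = prefix
    while len(seq) < length:
        probs = toy_next_residue_probs(seq)
        seq += rng.choice(AMINO_ACIDS, p=probs)  # autoregressive step
    return seq

print(sample_protein())
```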
As AI continues to reshape protein science, the field is moving toward more integrated, multiscale approaches. The next frontier will likely involve developing AI systems that can not only design individual proteins but can also engineer entire protein networks or cellular systems, pushing the boundaries of synthetic biology and potentially enabling new approaches to treating complex diseases.

Cellular, organ-system, and individual-level forecasting
Cardiology
A wide range of experimental tools in cardiology can benefit from the pattern-matching capabilities of AI techniques. In the acute setting, AI models have the potential to alert clinicians to developing decompensation. For example, Lin et al. developed an electrocardiogram (ECG) model that monitored 12-lead ECGs of hospitalized patients and was able to alert providers of impending decompensation and to improve clinical outcomes.87 Other models have been able to identify patients at risk of hypotension, tachycardia, or hypoxia, based on standard vital sign monitors. Researchers were able to use ECG data to detect a pattern of occlusion myocardial infarction even without ST elevation, surpassing human abilities and allowing for earlier intervention.88 Sundrani et al. developed a bimodal model to predict tachycardia, hypotension, or hypoxia in the emergency department (ED), based on triage data and ECG/pulse plethysmograph (PPG) waveforms.89

AI-powered models have been developed to forecast the risk of future cardiovascular disease events such as heart attacks or strokes, based on various factors such as age, gender, and medical history. By identifying a unique set of patient variables from a potential list of thousands of variables, models have been shown to more accurately predict coronary artery disease (CAD) risk than was previously possible.90 In another model, researchers were able to identify 27 specific proteins in blood samples that can be used to create a personalized survival model that is more accurate than previous methods.91 ECG analysis can be used to predict the risk of future atrial fibrillation or LV dysfunction after percutaneous coronary intervention (PCI), which in turn predicts which patients would most benefit from medical intervention.92,93
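At their core, many of these risk forecasters are supervised models trained on tabular patient variables. The sketch below fits a logistic-regression risk model on synthetic data as a minimal stand-in; the variable names, the simulated cohort, and the modeling choice are all assumptions for illustration and do not correspond to the published CAD or proteomic models cited above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic cohort: columns stand in for age, systolic BP, LDL, and an
# ECG-derived feature; the outcome is a simulated cardiovascular event.
rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([
    rng.normal(60, 10, n),    # age (years)
    rng.normal(130, 15, n),   # systolic blood pressure (mmHg)
    rng.normal(3.0, 0.8, n),  # LDL cholesterol (mmol/L)
    rng.normal(0, 1, n),      # ECG-derived risk feature (arbitrary units)
])
logit = -20 + 0.15 * X[:, 0] + 0.05 * X[:, 1] + 0.8 * X[:, 2] + 1.0 * X[:, 3]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
risk = model.predict_proba(X_te)[:, 1]               # per-patient event probability
print(f"AUROC on held-out synthetic patients: {roc_auc_score(y_te, risk):.2f}")
```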
Cardiovascular disease risks can be estimated using imaging techniques. An AI tool called EchoCLIP was able to characterize subtle clinically significant changes over time on echocardiograms, which would be difficult for a human interpreter.94 The timing of future arrhythmic sudden death can be predicted based on myocardial scarring seen on MRI.95 Coronary artery CTA studies are time consuming when interpreted manually, but Lin et al. were able to automate the process and show prognostic value for predicting future myocardial infarction.96 Coronary CTA can also show perivascular fat inflammation, allowing researchers to create an AI algorithm to estimate the risk of future cardiac events even when there is no obstructive coronary disease.97

Radiology, oncology, and other fields
In radiology, AI algorithms have been applied to standard MRI or CT studies to identify subtle image texture patterns that are not detectable by human clinicians. For example, researchers were able to use MRI data to reliably classify pediatric medulloblastoma into four subtypes based on image characteristics alone, facilitating the development of treatment regimens when there is no access to molecular testing.98 AI-based tools have similarly been developed for classifying lung cancer, breast cancer, neuroendocrine tumor, gastrointestinal stromal tumor, colorectal cancer, and other tumors, and they can be used to predict histopathology, grading, metastatic potential, and other clinically useful characteristics.99–103

Research in the field of oncology has made significant strides in leveraging large multimodal AI models for automated analysis of whole-slide pathology images. AI models have been shown to help determine susceptibility to chemotherapy agents in pancreatic adenocarcinoma by analyzing subtle morphological features in the tumor microenvironment, ultimately informing clinical outcomes.104,105 AI has facilitated the development of tools like tumor origin differentiation using cytological histology (TORCH), which can more reliably identify the origin of cancers with unknown primary sites, using cytological samples from pleural and peritoneal fluid.106 Models have been developed that can predict the risk of a patient developing pancreatic adenocarcinoma based on the patient's historical diagnoses and trajectory of diseases.107 An AI algorithm trained on pancreatic cancer patients was able to predict future complications following pancreatic resection and was able to show reduced mortality by approximately 50% at 90 days when compared with the usual care in which the clinician did not have access to the algorithm.108

Other AI-based tools have been able to extract additional clinically relevant information from traditional sources. For example, a tool called RETFound used fundus photography and retinal optical coherence tomography to predict the presence of systemic conditions like heart failure and myocardial infarction in addition to more predictably identifying sight-threatening diseases of the retina.109

Harnessing contactless sensor data through the application of AI allows "ambient intelligence," in which the patterns of sensors are interpreted by AI algorithms to learn about the surrounding environment and patient movement; this has the potential to improve patient safety and clinical efficiency.110 For example, Liu et al. developed a low-power radio-based sensor to monitor gait.111 This device was able to identify statistically significant declines in gait speed in Parkinson disease patients, providing clinically helpful information about progression of the disease. Other contactless sensors such as cameras have been used to track neurodegenerative disease progression and even to identify the likely molecular etiology in patients with Friedreich's ataxia62 and Duchenne muscular dystrophy,112 allowing for earlier diagnosis, intervention, and personalized treatment plans.

AI models based on EHR data are capable of predicting readmission, mortality, and length-of-stay.89,113 An AI model trained on EHR data was able to predict the International Classification of Diseases (ICD) codes of a patient's next visit, increasing the ability to predict uncommon outcomes like pancreatic cancer and self-harm.114 Models have been able to predict seizure recurrence risk in pediatric patients, based on routine clinical notes, chart messages, and diagnostic studies.115
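Next-visit code prediction can be framed as a supervised problem over a patient's coded history. The sketch below shows only that framing, using a bag-of-codes baseline on synthetic visit histories; the code vocabulary and data are invented for illustration, and the published model referenced above is a far larger sequence model.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Synthetic patient histories: each string is a space-separated list of
# prior ICD-10 codes; the label is whether a target code (here "C25",
# pancreatic cancer) appears at the next visit. Purely illustrative data.
rng = np.random.default_rng(1)
codes = ["E11", "I10", "K21", "R10", "K86", "E66", "F32", "J45"]
histories, labels = [], []
for _ in range(1500):
    hist = list(rng.choice(codes, size=rng.integers(2, 7)))
    # Toy signal: chronic pancreatitis (K86) plus abdominal pain (R10)
    # raises the simulated probability of a future C25 code.
    p = 0.02 + 0.3 * ("K86" in hist) + 0.2 * ("R10" in hist)
    histories.append(" ".join(hist))
    labels.append(int(rng.random() < p))

vectorizer = CountVectorizer(token_pattern=r"\S+")   # one feature per code
X = vectorizer.fit_transform(histories)
model = LogisticRegression(max_iter=1000).fit(X, labels)

new_patient = ["K86 R10 E11"]                        # hypothetical history
risk = model.predict_proba(vectorizer.transform(new_patient))[0, 1]
print(f"Predicted probability of a C25 code at the next visit: {risk:.2f}")
```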
Population-level forecasting
Medical resources are fundamentally limited in our current medical system. Global forecasting allows for the optimal distribution of resources in order to provide the most benefit. For example, by modeling brain aging across populations, researchers are able to identify and address geographic, socioeconomic, and health factors, which are associated with increased risks of dementia.116 Modeling the global spread of infections can provide information about where a disease may next present. And by modeling weather and population data, governments can anticipate impending heatstroke events.117

CHALLENGES AND LIMITATIONS

AI has the potential to transform healthcare delivery, but multiple challenges and limitations need to be addressed before its potential can be realized.

LLM development and use presents challenges. Much of what has been shown is not based on real-world, prospective studies, but it is instead simulated with patient actors and theoretical cases. As the infrastructure supporting LLM deployment evolves, with the development of foundation models and standardized benchmarking,19 challenges related to accuracy, bias, privacy, and ethics persist.24 Earlier LLMs were prone to "hallucinating" information, although recent efforts have shown promise in mitigating this issue.118,119 Bias in training data and the tendency of LLMs to accept input text as truthful can also limit output accuracy.24,120 According to Han et al., medical LLM models currently do not meet standards of general or medical safety, although efforts to improve safety have been promising.121 Combining human skills with AI tools has the potential to improve care.122,123 However, more research is needed to understand how to effectively integrate these tools into medical workflows.120,124

Another fundamental challenge lies in the development and validation of AI models. Older models were relatively small and used a smaller set of curated training data. Determining how to regulate these tools was and remains a challenge; these models do not necessarily generalize well across different populations, again raising concerns about bias. Significant research and resources will be required to ensure that AI models are appropriate for any given situation or population. Larger and more diverse datasets may be required for training to ensure accurate performance.
To address the complexity of regulating regularly updated AI tools, the Food and Drug Administration (FDA) has developed the Predetermined Change Control Plan (PCCP) framework, in which a vendor undergoes initial certification for a product but then is able to update the product within certain boundaries.125 This acknowledges the benefits of product updates while it maintains safety standards and avoids overwhelming FDA resources. Multiple other frameworks and entities can be used for regulation of AI products.

Generative AI models present significantly more challenges from a regulatory perspective.125 It becomes less clear what data were used for training, how the model performs on any one task or population, and how the output varies based on the non-deterministic nature of generative AI tools. Additionally, any updates to a model or training data can significantly alter the model's output. Medical device companies have begun to integrate generative AI features into their products, although more integral use and approval of generative AI remain in question.126 As AI tools become more generally capable and more unknowable, evaluating them may begin to look more like the evaluation of physicians, incorporating licensing exams and monitoring.125,127

Most AI models today are not transparent enough about their design and datasets to allow for recreations of the model. The black-box nature of these tools reduces understanding of the mechanisms of the model and uses and limitations of the outputs. Additionally, current AI models may not be truly reasoning but rather repeating trained information. This limitation raises questions about their reliability in novel clinical scenarios.128 Many prediction and forecasting studies are retrospective and may not be rigorous enough for clinical implementation. It may be years before widespread clinical benefit can be realized or proven for some of these tools, as the scientific and healthcare communities adapt to the new capabilities. Furthermore, when implementing a new model, questions arise about how to validate, test, and continuously update it with new information or advancements.

The pathway to integrating AI tools into existing healthcare systems presents challenges. Healthcare systems worldwide are complex, often outdated, and heterogeneous. Data integration between devices and into preexisting IT systems will require significant effort. One solution is unlikely to work everywhere. The infrastructure must be in place to seamlessly incorporate this information into EHRs and decision-making processes. Incorporating AI models into practice remains a long and arduous process (Figure 3).

The role of physicians is likely to evolve. In the future, physicians may become orchestrators or directors, managing the most complex cases and overseeing other providers who review the more routine cases. Physicians could manage departmental operations, quality control, tumor boards, and procedures, while also bearing legal liability. Research has shown that different physicians can respond to AI tools variably, highlighting the need for additional research into understanding the use of AI tools.124,129 Healthcare providers must be trained to interpret and act upon the data generated by these tools. Skepticism or hesitancy about new AI tools is inevitable and will need to be addressed, particularly when considering the perception of potential job loss or loss of autonomy. Resources such as the AI for Medicine Specialization courses in Coursera (https://www.coursera.org/specializations/ai-for-medicine), Udacity courses, and books like Co-Intelligence: Living and Working with AI by Mollick and The AI Revolution in Medicine by Lee can serve as starting points.130,131

The costs of implementing new AI tools must be addressed to ensure accessibility and prevent exacerbation of healthcare disparities. It is unclear where the expenses for AI tools will fall—on patients, populations, or third parties. In the US, the dominant fee-for-service model leaves open the question of reimbursement by the government or private insurers. Most reimbursement for medical care currently is initiated by current procedural terminology (CPT) or diagnosis-related groups (DRG) codes; these codes generally do not cover AI tools in their current form. A few early AI tools received temporary coverage in part because of their novelty, but a more robust and predictable system of reimbursement must be developed, tested, and implemented if fee-for-service coverage continues. Generalist medical AI (GMAI) tools may not fit well into the traditional fee-for-service model, however.
Possible alternative reimbursement models for GMAI include assigning an overarching care management coordinated activity code for assisting with existing clinical services or value-based reimbursement. Proving clinical and economic value for AI tools is more complicated and multifaceted than would be initially expected, however. Advanced statistical analysis techniques may be required to assess the value or return on investment of a given AI project or tool. Survival, quality of life, and other variables can be weighted to form a proxy for the "value" of a high-investment technology. Finally, the realized value of a new tool often lags the implementation, as users learn how to incorporate the new information and optimize patient selection. Health systems and physicians are unlikely to implement AI tools on a larger scale if there is no quantifiable benefit. The overall impact on the healthcare system remains uncertain.

Privacy and data security are paramount as the medical system relies on increasingly connected technology. AI tools trained on real-world data carry the additional risk of exposing patient information.

AUTHOR CONTRIBUTIONS

L.J.F. wrote the original draft and edited the work. E.C. performed conceptualization and reviewed and edited the work. E.T. performed conceptualization, investigation, data curation, review and editing, and project administration and provided supervision. P.R. performed conceptualization and review and editing and provided supervision.

DECLARATION OF INTERESTS

P.R. is a co-founder, part-time employee, and equity holder of a2z Radiology AI.

DECLARATION OF GENERATIVE AI AND AI-ASSISTED TECHNOLOGIES IN THE WRITING PROCESS

During the preparation of this work, the author(s) used ChatGPT o4-mini and Grok 3 in order to improve the readability and language of the manuscript. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the published article.
20. Lee, P., Bubeck, S., and Petro, J. (2023). Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N. Engl. J. Med. 388, 1233–1239. https://doi.org/10.1056/NEJMsr2214184.

21. Saab, K., Tu, T., Weng, W.-H., Tanno, R., Stutz, D., Wulczyn, E., Zhang, F., Strother, T., Park, C., Vedadi, E., et al. (2024). Capabilities of Gemini models in medicine. Preprint at arXiv. https://doi.org/10.48550/arXiv.2404.18416.

22. Liu, S., McCoy, A.B., Wright, A.P., Carew, B., Genkins, J.Z., Huang, S.S., Peterson, J.F., Steitz, B., and Wright, A. (2023). Leveraging Large Language Models for Generating Responses to Patient Messages. Preprint at medRxiv, 2023.07.14.23292669. https://doi.org/10.1101/2023.07.14.23292669.

23. Tierney, A.A., Gayre, G., Hoberman, B., Mattern, B., Ballesca, M., Kipnis, P., Liu, V., and Lee, K. (2024). Ambient Artificial Intelligence Scribes to Alleviate the Burden of Clinical Documentation. NEJM Catal. 5. https://doi.org/10.1056/CAT.23.0404.

24. Omiye, J.A., Gui, H., Rezaei, S.J., Zou, J., and Daneshjou, R. (2024). Large Language Models in Medicine: The Potentials and Pitfalls: A Narrative Review. Ann. Intern. Med. 177, 210–220. https://doi.org/10.7326/M23-2772.

25. Grewal, H., Dhillon, G., Monga, V., Sharma, P., Buddhavarapu, V.S., Sidhu, G., and Kashyap, R. (2023). Radiology Gets Chatty: The ChatGPT Saga Unfolds. Cureus 15, e40135. https://doi.org/10.7759/cureus.40135.

36. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. (2021). Learning transferable visual models from natural language supervision. Preprint at arXiv. https://doi.org/10.48550/arXiv.2103.00020.

37. Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S. von, Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2022). On the opportunities and risks of foundation models. Preprint at arXiv. https://doi.org/10.48550/arXiv.2108.07258.

38. Zhang, K., Zhou, R., Adhikarla, E., Yan, Z., Liu, Y., Yu, J., Liu, Z., Chen, X., Davison, B.D., Ren, H., et al. (2024). A generalist vision–language foundation model for diverse biomedical tasks. Nat. Med. 30, 3129–3141. https://doi.org/10.1038/s41591-024-03185-2.

39. Zhou, H.-Y., Yu, Y., Wang, C., Zhang, S., Gao, Y., Pan, J., Shao, J., Lu, G., Zhang, K., and Li, W. (2023). A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics. Nat. Biomed. Eng. 7, 743–755. https://doi.org/10.1038/s41551-023-01045-x.

40. Khader, F., Müller-Franzes, G., Wang, T., Han, T., Tayebi Arasteh, S., Haarburger, C., Stegmaier, J., Bressem, K., Kuhl, C., Nebelung, S., et al. (2023). Multimodal Deep Learning for Integrating Chest Radiographs and Clinical Parameters: A Case for Transformers. Radiology 309, e230806. https://doi.org/10.1148/radiol.230806.

41. Chen, R.J., Lu, M.Y., Williamson, D.F.K., Chen, T.Y., Lipkova, J., Noor, Z., Shaban, M., Shady, M., Williams, M., Joo, B., et al. (2022). Pan-cancer
76. Karelina, M., Noh, J.J., and Dror, R.O. (2023). How accurately can one predict drug binding modes using AlphaFold models? eLife 12, RP89386. https://doi.org/10.7554/eLife.89386.

77. Drysdale, E. (2023). A multitask neural network trained on embeddings from ESMFold can accurately rank order clinical outcomes for different cystic fibrosis mutations. Preprint at bioRxiv. https://doi.org/10.1101/2023.10.26.564274.

78. Zhang, H., Lan, J., Wang, H., Lu, R., Zhang, N., He, X., Yang, J., and Chen, L. (2024). AlphaFold2 in biomedical research: facilitating the development of diagnostic strategies for disease. Front. Mol. Biosci. 11, 1414916. https://doi.org/10.3389/fmolb.2024.1414916.

79. Watson, J.L., Juergens, D., Bennett, N.R., Trippe, B.L., Yim, J., Eisenach, H.E., Ahern, W., Borst, A.J., Ragotte, R.J., Milles, L.F., et al. (2023). De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100. https://doi.org/10.1038/s41586-023-06415-8.

80. Yim, J., Trippe, B.L., Bortoli, V.D., M, E., Doucet, A., Barzilay, R., and Jaakkola, T. (2023). SE(3) diffusion model with application to protein backbone generation. Preprint at arXiv. https://doi.org/10.48550/arXiv.2302.02277.

81. Brixi, G., Durrant, M.G., Ku, J., Poli, M., Brockman, G., Chang, D., Gonzalez, G.A., King, S.H., Li, D.B., Merchant, A.T., et al. (2025). Genome modeling and design across all domains of life with Evo 2. Preprint at bioRxiv. https://doi.org/10.1101/2025.02.18.638918.

82. Ni, B., Kaplan, D.L., and Buehler, M.J. (2023). Generative design of de novo proteins based on secondary structure constraints using an attention-based diffusion model. Chem 9, 1828–1849. https://doi.org/10.1016/j.chempr.2023.03.020.

83. Lutz, I.D., Wang, S., Norn, C., Courbet, A., Borst, A.J., Zhao, Y.T., Dosey, A., Cao, L., Xu, J., Leaf, E.M., et al. (2023). Top-down design of protein architectures with reinforcement learning. Science 380, 266–273. https://doi.org/10.1126/science.adf6591.

84. Madani, A., Krause, B., Greene, E.R., Subramanian, S., Mohr, B.P., Holton, J.M., Olmos, J.L., Xiong, C., Sun, Z.Z., Socher, R., et al. (2023). Large language models generate functional protein sequences

91. Williams, S.A., Ostroff, R., Hinterberg, M.A., Coresh, J., Ballantyne, C.M., Matsushita, K., Mueller, C.E., Walter, J., Jonasson, C., Holman, R.R., et al. (2022). A proteomic surrogate for cardiovascular outcomes that is sensitive to multiple mechanisms of change in risk. Sci. Transl. Med. 14, eabj9625. https://doi.org/10.1126/scitranslmed.abj9625.

92. Khurshid, S., Friedman, S., Reeder, C., Di Achille, P., Diamant, N., Singh, P., Harrington, L.X., Wang, X., Al-Alusi, M.A., Sarma, G., et al. (2022). ECG-Based Deep Learning and Clinical Risk Factors to Predict Atrial Fibrillation. Circulation 145, 122–133. https://doi.org/10.1161/CIRCULATIONAHA.121.057480.

93. Jeon, K.-H., Lee, H.S., Kang, S., Jang, J.-H., Jo, Y.-Y., Son, J.M., Lee, M.S., Kwon, J.-M., Kwun, J.-S., Cho, H.-W., et al. (2024). AI-enabled ECG index for predicting left ventricular dysfunction in patients with ST-segment elevation myocardial infarction. Sci. Rep. 14, 16575. https://doi.org/10.1038/s41598-024-67532-6.

94. Christensen, M., Vukadinovic, M., Yuan, N., and Ouyang, D. (2024). Vision–language foundation model for echocardiogram interpretation. Nat. Med. 30, 1481–1488. https://doi.org/10.1038/s41591-024-02959-y.

95. Popescu, D.M., Shade, J.K., Lai, C., Aronis, K.N., Ouyang, D., Moorthy, M.V., Cook, N.R., Lee, D.C., Kadish, A., Albert, C.M., et al. (2022). Arrhythmic sudden death survival prediction using deep learning analysis of scarring in the heart. Nat. Cardiovasc. Res. 1, 334–343. https://doi.org/10.1038/s44161-022-00041-9.

96. Lin, A., Manral, N., McElhinney, P., Killekar, A., Matsumoto, H., Kwiecinski, J., Pieszko, K., Razipour, A., Grodecki, K., Park, C., et al. (2022). Deep learning-enabled coronary CT angiography for plaque and stenosis quantification and cardiac risk prediction: an international multicentre study. Lancet Digit. Health 4, e256–e265. https://doi.org/10.1016/S2589-7500(22)00022-X.

97. Chan, K., Wahome, E., Tsiachristas, A., Antonopoulos, A.S., Patel, P., Lyasheva, M., Kingham, L., West, H., Oikonomou, E.K., Volpe, L., et al. (2024). Inflammatory risk and cardiovascular events in patients without obstructive coronary artery disease: the ORFAN multicentre, longitudinal cohort study. Lancet 403, 2606–2618. https://doi.org/10.1016/S0140-6736(24)00596-8.