AI Efficiency in Medical Imaging
https://doi.org/10.1038/s41746-024-01248-9
In healthcare, the integration of artificial intelligence (AI) holds strong promise for facilitating clinicians' work, especially in clinical imaging. We aimed to assess the impact of AI implementation for medical imaging on efficiency in real-world clinical workflows and conducted a systematic review searching six medical databases. Two reviewers double-screened all records. Eligible records were evaluated for methodological quality. The outcomes of interest were workflow adaptation due to AI implementation, changes in time for tasks, and clinician workload. After screening 13,756 records, we identified 48 original studies to be included in the review. Thirty-three studies measured time for tasks, with 67% reporting reductions. Yet, three separate meta-analyses of 12 studies did not show significant effects after AI implementation. We identified five different workflows adapting to AI use. Most commonly, AI served as a secondary reader for detection tasks. Alternatively, AI was used as the primary reader for identifying positive cases, resulting in reorganized worklists or issued alerts. Only three studies scrutinized workload calculations based on the time saved through AI use. This systematic review and meta-analysis assesses the efficiency improvements offered by AI applications in real-world clinical imaging, predominantly revealing enhancements across the studies. However, considerable heterogeneity in the available studies precludes robust inferences regarding overall effectiveness in imaging tasks. Further work is needed on standardized reporting, evaluation of system integration, and real-world data collection to better understand the technological advances of AI in real-world healthcare workflows. Systematic review registration: PROSPERO ID CRD42022303439; International Registered Report Identifier (IRRID): RR2-10.2196/40485.
With a rising number of patients and limited staff available, the need for changes in healthcare is a pressing issue1. Artificial intelligence (AI) technologies promise to alleviate the current burden by taking over routine tasks, such as monitoring patients, documenting care tasks, providing decision support, and prioritizing patients by analyzing clinical data2,3. AI-facilitated innovations are claimed to significantly reduce the workload of healthcare professionals4,5.

Several medical specialties have already introduced AI into their routine work, particularly in data-intensive domains, such as genomics, pathology, and radiology4. In particular, image-based disciplines have seen substantial benefits from the pattern recognition abilities of AI, positioning them at the forefront of AI integration in clinical care3,6. AI technologies expedite the processing of an increasing number of medical images and are used for detecting artifacts, malignant cells, or other suspicious structures, and optionally for the subsequent prioritization of patients7–9.

To successfully adopt AI in everyday clinical practice, different ways of effective workflow integration can be conceived, largely depending on the specific aim, that is, enhancing the quality of diagnosis, providing reassurance, or reducing human workload10,11. Efficiency outcomes related to AI implementation include shorter reading times or a reduced workload of clinicians to meet the growing demand for interpreting an increasing number of images12–14. Thus, whether AI fulfills these aims and enables higher efficiency in everyday clinical work remains largely unknown.
Healthcare systems are complex, combining various components and stakeholders that interact with each other15. While the success of AI technology implementation highly depends on the setting, processes, and users, current studies largely focus on the technical features and capabilities of AI, not on its actual implementation and consequences in the clinical landscape2,3,6,16,17. Therefore, this systematic review aimed to examine the influence of AI technologies on workflow efficiency in medical imaging tasks within real-world clinical care settings, to account for effects that stem from the complex, everyday demands of real-world clinical care that are absent in experimental and laboratory settings18.

Results
Study selection
We identified 22,684 records in databases and an additional 295 articles through backward search. After the removal of duplicates, the 13,756 remaining records were included in the title/abstract screening. Then, 207 full texts were screened, of which 159 were excluded, primarily because of inadequate study designs or not focusing on AI for interpreting imaging data (Supplementary Table 1). Finally, 48 studies were included in the review and data extraction. Twelve studies underwent additional meta-analyses. A PRISMA flow chart is presented in Fig. 1.

Study characteristics
Of the 48 extracted studies, 30 (62.5%) were performed in a single institution, whereas the remaining 18 (37.5%) were multicenter studies. One study was published in 2010, another in 2012, and all other included studies were published from 2018 onward. Research was mainly conducted in North America (n = 21), Europe (n = 12), Asia (n = 11), and Australia (n = 3); furthermore, one study was conducted across continents. The included studies stemmed from the medical departments of radiology (n = 26), gastroenterology (n = 6), oncology (n = 4), emergency medicine (n = 4), ophthalmology (n = 4), human genetics (n = 1), nephrology (n = 1), neurology (n = 1), and pathology (n = 1). Most studies used computed tomography (CT) for imaging, followed by X-ray and colonoscopy. The most prominent indications were intracranial hemorrhage, followed by pulmonary embolism and cancer screening. Table 1 presents the key characteristics of all included studies.

Concerning the purpose of using AI tools in clinical work, we classified the studies into three main categories. First, five studies (10.4%) described an AI tool used for segmentation tasks (e.g., determining the boundaries or volume of an organ). Second, 25 studies (52.1%) used AI tools for detection tasks, such as identifying suspicious cancer nodules or fractures. Third, 18 studies (37.5%) investigated the prioritization of patients according to AI-detected critical features (e.g., reprioritizing the worklist or notifying the treating clinician via an alert).

Regarding the AI tools described in the studies, 34 studies (70.8%) focused on commercially available solutions (Table 2). Only Pierce et al. did not specify which commercially available algorithm was used19. Thirteen studies (27.1%) used non-commercially available algorithms; detailed information on these algorithms is provided in Table 3. Different measures were used to evaluate the accuracy of these AI tools, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC). Sensitivity and specificity were the most commonly reported measures (see Tables 2 and 3).
In total, only four studies followed a reporting guideline: three studies20–22 used the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline23, and Repici et al.24 followed the CONSORT guidelines for randomized controlled trials25. Only two studies24,26 pre-registered their protocol, and none of the included studies provided or used an open-source algorithm.

Appraisal of methodological quality
When assessing the methodological quality of the 45 non-randomized studies, only one (2.2%) was rated with an overall "low" risk of bias. Four studies (8.9%) were rated "moderate", 28 studies (62.2%) were rated "serious", and 12 studies (26.7%) were rated "critical". All three randomized studies were appraised with an overall high risk of bias. Summary plots of the risk of bias assessments are shown in Fig. 2; full assessments can be found in Supplementary Figs. 1 and 2. The assessment of the quality of reporting using the Methodological Index for Non-randomized Studies (MINORS) is included in Supplementary Figs. 3 and 4. Higher scores indicate higher quality of reporting, with the maximum score being 24 for comparative studies and 16 for non-comparative studies27. Comparative studies reported a median of 9 of 12 criteria with a median overall score of 15 (range: 9–23), and noncomparative studies reported a median of 7 of 8 checklist items, with a median overall score of 7 (range: 6–14).

Outcomes
Of all included studies, 33 (68.8%) surveyed the effects of AI implementation on clinicians' time for task execution. The most frequently reported outcomes included (1) reading time (i.e., the time the clinicians required to interpret an image); (2) report turnaround time (i.e., the time from completing the scan until the report is finalized); and (3) total procedure time (i.e., the time needed for colonoscopy)28–30. Times were assessed via surveys, recorded by researchers or staff, retrieved via time stamps, or self-recorded. Seventeen studies did not describe how they obtained the reported times.

Regarding our research question of whether AI use improves efficiency, 22 studies (66.6%) reported a reduction in time for task completion due to AI use, with 13 of these studies finding the difference to be statistically significant (see Table 4). Eight studies (24.2%) reported that AI did not reduce the time required for tasks. The remaining three studies (9.1%) chose a design or implementation protocol in which the AI was used after the normal reading, increasing the task time measured by study design31–33.

For our meta-analyses, we established clusters of studies deploying similar methods, outcomes, and specific purposes. Concerning studies on detection tasks, we identified two main subgroups: studies using AI for interpreting CT scans (n = 7) and those using AI for colonoscopy (n = 6). Among studies using AI for interpreting CT images, a meta-analysis was performed for four studies reporting clinicians' reading times. As shown in Fig. 3a, the reading times for interpreting CT images did not differ between the groups: standardized mean difference (SMD): −0.60 (95% confidence interval (CI), −2.02 to 0.82; p = 0.30). Furthermore, the studies showed significant heterogeneity: Q = 109.72, p < 0.01, I² = 96.35%. This heterogeneity may be associated with the different study designs included or the risk of bias ratings, with only one study being rated as having a low risk of bias. Furthermore, Mueller et al.8 reported no overall reading time but separated it for resident and attending physicians, which we included separately in our meta-analysis. Concerning the use of AI for colonoscopy, five studies reported comparable measures. Our random-effects meta-analysis showed no significant difference between the groups: SMD: −0.04 (95% CI, −0.76 to 0.67; p = 0.87), with significant heterogeneity: Q = 733.51, p < 0.01, I² = 99.45% (Fig. 3b). Four of the included studies had a serious risk of bias, whereas the one randomized study included was rated with a high risk of bias.

Among 11 studies that reported AI use for the prioritization of patients' scans, four measured the turnaround time. The study by Batra et al.34 did not report variance measures and was therefore excluded from the meta-analysis. The remaining three studies used the AI tool Aidoc (Tables 2 and 4) to detect intracranial hemorrhage and reported the turnaround time for cases flagged positive. The meta-analysis showed no significant difference in turnaround time between cases with and without AI use: SMD: 0.03 (95% CI, −0.50 to 0.56; p = 0.84), with significant heterogeneity across studies: Q = 12.31, p < 0.01, I² = 83.75% (Fig. 3c). All included studies were non-randomized, with two rated with a serious risk of bias and one with a moderate risk of bias.
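As a reading aid for the pooled estimates above: the SMD expresses the group difference in units of the pooled standard deviation, and I² follows directly from Cochran's Q and its degrees of freedom (df = k − 1 for k pooled effect sizes):

```latex
\mathrm{SMD}=\frac{\bar{x}_{\mathrm{AI}}-\bar{x}_{\mathrm{no\,AI}}}{s_{\mathrm{pooled}}},\qquad
I^{2}=\max\!\left(0,\;\frac{Q-\mathrm{df}}{Q}\right)\times 100\%
```

The reported figures are consistent with this identity: (109.72 − 4)/109.72 ≈ 96.35% (k = 5 effect sizes, since Mueller et al.8 contribute two), (733.51 − 4)/733.51 ≈ 99.45% (k = 5), and (12.31 − 2)/12.31 ≈ 83.75% (k = 3).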
In total, 37 studies reported details on the actual workflow adaptations due to AI implementation, which we classified into four main variants (depicted exemplarily in Fig. 4). Sixteen studies (43.2%) used an AI tool as a triage system, i.e., the AI tool reprioritized the worklist, sent an alert to the clinician, or referred the patient to a specialist for further examination (Fig. 4a: AI triage). In two studies (5.4%), the AI tool acted as a gatekeeper, only referring cases labeled as suspicious to the clinician for further review while excluding the remaining cases (Fig. 4a: AI gatekeeper). In 13 studies (35.1%), AI tools were used as a second reader for detection tasks in two variants (Fig. 4b: AI second reader). Eight studies reported that the AI tool functioned as a second reader in a concurrent mode, presenting additional information to clinicians during the task (e.g., in colonoscopy studies, where the workflow remained the same as before apart from the display of additional information during the procedure). Five studies described a workflow in which the AI tool was used additionally after the normal detection task, resulting in a sequential second-reader workflow. In five segmentation studies (13.5%), the AI tool served as a first reader, with the clinician reviewing and then correcting the AI-provided contours (Fig. 4c: AI first reader).

In a single study (2.7%), the type of actual workflow implementation was at the radiologist's choice. Three studies used a study design with the AI tool as a second reader in a pre-specified reading sequence; therefore, we did not classify them as workflow adaptations. The remaining studies did not provide sufficient information on workflow implementation.
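To make this taxonomy concrete, the following illustrative sketch (in R, matching the analysis code used elsewhere in this review; the `case` structure and its fields are hypothetical, not drawn from any included study) shows how each variant changes what reaches the clinician:

```r
# Illustrative routing logic for the workflow variants described above.
route_case <- function(case, variant) {
  switch(variant,
    triage = {                    # AI reprioritizes or alerts; every case is still read
      if (isTRUE(case$ai_flag)) case$priority <- "urgent"
      case
    },
    gatekeeper = {                # only AI-flagged cases reach the clinician at all
      if (isTRUE(case$ai_flag)) case else NULL
    },
    second_reader_concurrent = {  # AI findings are shown during the normal read
      case$overlay <- case$ai_findings
      case
    },
    second_reader_sequential = {  # AI findings are reviewed after the normal read
      case$post_read_review <- case$ai_findings
      case
    },
    first_reader = {              # AI proposes contours; the clinician edits them
      case$contours <- case$ai_contours
      case
    }
  )
}
```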
In our initial review protocol, we also aimed to include investigations on clinician workload14. Apart from three studies (Liu et al.35, Raya-Povedano et al.36, and Yacoub et al.37), which calculated the saved workload in scans or patients because of AI use, no other study reported AI implementation effects on clinicians' workload (besides the time-for-tasks effects, see above). Other reported outcomes included evaluations of the AI performing the task (i.e., satisfaction)8,38; frequency of AI use29,30; patient outcomes, such as length of stay or in-hospital complications39,40; and sensitivity or specificity changes8,21,24,28,41.
Risk of bias across studies
Funnel plots for the studies included in the meta-analyses were created (Supplementary Figs. 5–7). Nineteen studies declared a relevant conflict of interest and six other studies had potential conflicts of interest, which together amount to more than 50% of the included studies.

Additionally, we ran several sensitivity analyses to evaluate potential selection bias. We first searched the dblp computer science bibliography, yielding 1159 studies for title and abstract screening. Therein, we achieved perfect interrater reliability (100%). Subsequently, only thirteen studies proceeded to full-text screening, with just one meeting our review criteria. This study by Wismueller & Stockmaster42 was also part of our original search. Notably, this study was the only conference publication providing a full paper (refer to Supplementary Table 2).

Moreover, to ensure comprehensive coverage and to detect publications potentially missed due to excluding conference proceedings, we screened 2614 records from IEEE Xplore, MICCAI, and HICSS. Once again, our title and abstract screening demonstrated perfect interrater reliability (100%). However, despite including 31 publications in full-text screening, none met our inclusion criteria upon thorough assessment. Altogether, these additional searches showed no substantial indication of selection bias or of key work missed in other major scientific publication outlets.

Using AMSTAR-2 (A MeaSurement Tool to Assess Systematic Reviews)43, we rated the overall confidence in the results as low, mainly due to our decision to combine non-randomized and randomized studies within our meta-analysis (Supplementary Fig. 8).
Discussion
Given the widespread adoption of AI technologies in clinical work, our systematic review and meta-analysis assesses efficiency effects on routine clinical work in medical imaging. Although most studies reported positive effects, our three meta-analyses with subsets of comparable studies showed no evidence of AI tools reducing the time spent on imaging tasks. Studies varied substantially in design and measures. This high heterogeneity precludes robust inferences. Although nearly 67% of time-related outcome studies showed a decrease in time with AI use, a noteworthy portion of these studies revealed conflicts of interest, potentially influencing study design or outcome estimation44. Our findings emphasize the need for comparable and independent high-quality studies on AI implementation to determine its actual effect on clinical workflows.

Focusing on how AI tools were integrated into the clinical workflow, we discovered diverse adoptions of AI applications in clinical imaging. Some studies provided brief descriptions that lack adequate detail to comprehend the process. Despite predictions of AI potentially supplanting human readers or serving as gatekeepers, with humans primarily reviewing flagged cases to enhance efficiency10,11, we noted a limited adoption of AI in this manner across studies. In contrast, most studies reported AI tools as supplementary readers, potentially extending the time taken for interpretation when radiologists must additionally incorporate AI-generated results18,45. Another practice involved concurrent reading, which seems beneficial because it guides clinicians' attention to crucial areas, potentially improving reading quality and safety without lengthening reading times45,46. Regardless of how AI was used, a crucial factor is its alignment with the intended purpose and task15.

Although efficiency stands out in the current literature, we were also interested in whether AI affects clinicians' workload beyond the time measurements, such as the number of tasks or cognitive load. We found only three studies on AI's impact on clinicians' workload, and no study assessed workload separately (e.g., in terms of cognitive workload changes)18,35–37. This gap in research is remarkable since human–technology interaction and human factors assessment will be a success factor for the adoption of AI in healthcare47,48.

Our study included a vast variety of AI solutions reported in the publications. The majority were commercially available AI solutions, most of which had acquired FDA or CE clearance, ensuring safety of use in a medical context49. Nevertheless, it is desirable that future studies provide more detailed information about the accuracy of the AI solutions in their use case or their processing times, both of which can be crucial to AI adoption50. Regarding included studies that used non-commercially available algorithms, some did not specify the origin or source of the algorithm (i.e., the developer). Especially with the specific characteristics and potential bias introduced through a specific algorithm (e.g., stemming from a training bias or gaps in the underlying data), it is essential to provide information about the origins and prior validation steps of the algorithm in clinical use51,52. Interestingly, only four included studies discussed the possibility of bias in the AI algorithm53–56. Open science principles, such as data or code sharing, help to mitigate the impact of bias. Yet, none of the studies in our review used open-source solutions or provided their algorithm52. Additionally, guidelines such as CONSORT-AI or SPIRIT-AI provide recommendations for the reporting of clinical studies using AI solutions57, and previous systematic reviews have identified serious gaps in the reporting on clinical AI solutions58,59. Our results corroborate this shortcoming, as none of the studies reporting non-commercial algorithms and only four studies overall followed a reporting guideline. Notwithstanding, for some included studies, AI-specific reporting guidelines were published after their initial publication. Nevertheless, comprehensive and transparent reporting remains insufficient.

With our review, we were able to replicate some of the findings by Yin et al., who provided a first overview of AI solutions in clinical practice, e.g., insufficient reporting in included studies60. By providing time-for-task analyses and meta-analyses as well as workflow descriptions, our review substantially extends the scope of their review, providing a robust and detailed overview of the efficiency effects of AI solutions. In 2020, Nagendran et al. provided a review comparing AI algorithms for medical imaging with clinicians, concluding that only a few prospective studies in clinical settings existed59. Our systematic review demonstrated an increase in real-world studies in recent years and provides an up-to-date and comprehensive overview of AI solutions currently used in medical imaging practice. Our study thereby addresses one of the previously mentioned shortcomings, namely that benefits of an AI algorithm in silico or in retrospective studies might not transfer into clinical benefit59. This is also recognized by Han et al.61, who evaluated randomized controlled trials of AI in clinical practice and argued that efficiency outcomes will strongly depend on implementation processes in actual clinical practice.

The complexities of transferring AI solutions from research into practice were explored in a review by Hua et al.62, who evaluated the acceptability of AI for medical imaging among healthcare professionals. We believe that for AI to unfold its full potential, it is essential to pay thorough attention to adoption challenges and work system integration in clinical workplaces. Notwithstanding the increasing number of studies on AI use in real-world settings during the last years, many questions on AI implementation and workflow integration remain unanswered. On the one hand, limited consideration prevails regarding the acceptance of AI solutions by professionals62. Although studies even discuss the possibility of AI as a teammate in the future63,64, most available studies rarely include perceptions of affected clinicians60. On the other hand, operational and technical challenges as well as system integration into clinical IT infrastructures are major hurdles, as many of the described algorithms are cloud-based. Smooth interoperability between new AI technologies and local clinical information systems as well as existing IT infrastructure is key to efficient clinical workflows50. For example, the combination of multimodal data, such as imaging and EHR data, could be beneficial for future decision processes in healthcare65.
Table 2 | Commercially available AI algorithms

| Source | Clearance | Body part | Purpose | Technology | Study | Sensitivity | Specificity | Processing time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aidoc Medical, Tel Aviv, Israel/New York, NY, USA | FDA | Head | Prioritization | Convolutional neural network | Davis et al.39 | 95.0% | 99.0% | near real-time |
| | | | | | Ginat83 | 88.4% | 96.1% | 3 min |
| | | | | | O'Neill et al.89 | 95.0% | 99.0% | 30–45 sec |
| | | | | | Seyam et al.95 | 87.2% | 93.9% | NI |
| | | | | | Zia et al.30 | 85.7% | 96.8% | NI |
| Aidoc Medical, Tel Aviv, Israel | CE, FDA | Chest | Prioritization | Convolutional neural network | Batra et al.34 | 83.3% | 97.1% | NI |
| Annalise AI, Sydney, Australia | Pre-existing regulatory approval | Chest | Detection | Convolutional neural network | Jones et al.85 | NI | NI | NI |
| Digital Diagnostics, Coralville, IA, USA | FDA | Eye | Prioritization | Deep learning and rule-based models | Kanagasingam et al.22 | NI | 92.0% | <3 min |
| EndoVigilant Inc., MD, USA | NI | Colon | Detection | NI | Quan et al.91 | 90.0% | 97.0% | 30 frames per sec |
| FDNA Inc., Sunrise, FL, USA | NI | Face | Detection | NI | Marwaha et al.88 | NI | NI | NI |
| Gleamer, Paris, France | NI | Whole body | Detection | Convolutional neural network | Duron et al.21 | 79.4% (reader + AI, patient-wise) | 93.6% (reader + AI, patient-wise) | NI |
| | | | | | Oppenheimer et al.90 | 86.9% | 84.7% | 3 min |
| Hologic, Marlborough, MA, USA | NI | Breast | Detection | NI | Tchou et al.31 | NI | NI | NI |
| iCAD, Nashua, NH, USA | NI | Breast | Detection | Convolutional neural network | Conant et al.28 | 85.0% (reader + AI) | 69.6% (reader + AI) | NI |
| Infervision Technology Co., Ltd., Beijing, China | CE, FDA | Chest | Detection | Deep learning | Diao et al.20 | NI | NI | NI |
| Limbus AI, Regina, Saskatchewan, Canada | NI | Whole body | Segmentation | Deep learning | Wong et al.99 | NI | NI | NI |
| Lunit, Seoul, South Korea | NI | Chest | Detection | Deep learning | Hong et al.84 | 74.8% | 99.8% | NI |
| Medtronic, Minneapolis, MN, USA | FDA | Colon | Detection | NI | Ladabaum et al.41 | NI | NI | NI |
| | | | | | Levy et al.87 | NI | NI | NI |
| | | | | | Nehme et al.29 | NI | NI | NI |
| | | | | | Repici et al.24 | 99.7% | NI | real-time |
| MVision AI Oy, Helsinki, Finland | CE, FDA | Whole body | Segmentation | Convolutional neural network | Kiljunen et al.86 | NI | NI | NI |
| | | | | | Strolin et al.97 | NI | NI | 2.3 min |
| Philips Healthcare, Best, The Netherlands | NI | Chest | Detection | NI | Wittenberg et al.33 | 96.0% | 22.0% | NI |
| ScreenPoint Medical, Nijmegen, The Netherlands | CE, FDA | Breast | Prioritization | Deep learning | Raya-Povedano et al.36 | 84.1% (reader + AI) | NI | NI |
| Shanghai Wision AI Co., Ltd., Shanghai, China | NI | Colon | Detection | Deep learning | Wang et al.26 | 94.4% per image | 95.9% per image | real-time |
| Shenzhen SiBright Co. Ltd., Shenzhen, China | NIFDC | Eye | Detection | Ensemble of 3 convolutional neural networks | Yang et al.101 | 86.7% | 96.1% | 24 sec per eye |
| Siemens Healthcare, Erlangen, Germany | FDA | Chest | Detection | NI | Mueller et al.8 | NI | NI | NI |

Fig. 2 | Quality assessment of included articles. Summary plots of the risk of bias assessments via the Risk of Bias in Non-randomized Studies of Interventions tool (ROBINS-I) for non-randomized studies and the Cochrane Risk of Bias tool (RoB 2) for randomized studies.
Our review has several limitations. First, publication bias may have contributed to the high number of positive findings in our study. Second, despite searching multiple databases, selection bias may have occurred, particularly as some clinics implementing AI do not systematically assess or publish their processes in scientific formats60. Moreover, we excluded conference publications, which could be a source of potential bias. Nevertheless, we ran different sensitivity analyses for publication and selection bias and did not find evidence of major bias introduced by our search and identification strategy. Yet, aside from one conference paper, all other conference publications merely provided abstracts or posters, lacking a comprehensive base for the extraction of required details. Third, we focused exclusively on medical imaging tasks to enhance the internal validity of clinical tasks across diverse designs, AI solutions, and workflows. Fourth, our review received a low quality rating on the AMSTAR-2 checklist, which is due to the diverse study designs we included and calls for more comparable high-quality studies in this field. Nevertheless, we believe that our review provides a thorough summary of the available studies matching our research question. Finally, our review concentrated solely on efficiency outcomes stemming from the integration of AI into clinical workflows. Yet, the actual impact of AI algorithms on efficiency gains in routine clinical work can be influenced by further local factors not specified here, e.g., existing IT infrastructure, computational resources, and processing times. Next to testing AI solutions under standardized conditions or in randomized controlled trials, which can indicate whether an AI solution is suitable for transfer into routine medical care, careful evaluations of how AI solutions fit into everyday clinical workflows should be expanded, ideally before implementation. Exploring adoption procedures along with identifying key implementation facilitators and barriers provides valuable insights into successful AI technology use in clinical routines. However, it is important to note that AI implementation can address a spectrum of outcomes, including but not limited to enhancing patient quality and safety, augmenting diagnostic confidence, and improving healthcare staff satisfaction8.

In conclusion, our review showed a positive trend toward research on actual AI implementation in medical imaging, with most studies describing efficiency improvements in the course of AI technology implementation. We derive important recommendations for future studies on the implementation of AI in clinical settings. The rigorous use of reporting guidelines should be encouraged, as many studies reporting time outcomes did not provide sufficient details on their methods. Providing a protocol or a clear depiction of how AI tools modify clinical workflows allows comprehension and comparison between pre- and post-adoption processes while facilitating learning and future implementation practice. Considering the complexity of healthcare systems, understanding the factors contributing to successful AI implementation is invaluable. Our review corroborates the need for comparable evaluations to monitor and quantify the efficiency effects of AI in clinical real-world settings. Finally, future research should explore the success of, and potential differences between, AI algorithms in controlled trials as well as in real-world clinical practice settings to inform and guide future implementation processes.
Table 4 | Reported time outcomes of the included studies

| Study | Outcome | Time assessment method | Measure | Without AI | With AI | Difference | Significance | Workflow |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cheikh et al.81 | Reading time^a | Survey | Mean (SD) | 00:14:33 (00:09:05) | 00:15:36 (00:09:46) | +00:01:03 (+7.22%) | *** | Triage |
| Conant et al.28 | Reading time | NI | Mean (CI) | 00:01:04 (00:00:25) | 00:00:30 (00:00:12) | −00:00:34 (−52.57%) | ** | Second reader, concurrent |
| Diao et al.20 | Reading time | Automatically recorded | Mean (SD) | 00:04:30 (00:02:24) | 00:03:43 (00:02:26) | −00:00:47 (−17.41%) | *** | Second reader, sequential |
| Duron et al.21 | Reading time | Automatically recorded | Mean | 00:01:07 | 00:00:57 | −00:00:10 (−14.93%) | n.s. | Second reader, concurrent |
| Mueller et al.8 | Reading time − resident | NI | Mean (SD) | 00:06:10 (00:02:49) | 00:07:17 (00:02:29) | +00:01:07 (+18.11%) | n.s. | Depending on radiologist's choice |
| Mueller et al.8 | Reading time − consultant | NI | Mean (SD) | 00:06:06 (00:01:50) | 00:06:20 (00:02:01) | +00:00:14 (+3.83%) | n.s. | Depending on radiologist's choice |
| O'Neill et al.89 | Reading time^b | NI | Median (CI) | 00:04:50 (00:00:27) | 00:06:14 (00:05:28) | +00:01:23 (+28.73%) | n.s. | Triage |
| Schmuelling et al.94 | Reading time^a | Timestamps in the clinical information system | Mean (SD) | 01:25:30 (04:42:00) | 01:18:30 (04:33:00) | −00:07:00 (−8.19%) | n.s. | Triage |
| Vassallo et al.32 | Reading time | Recorded by investigator | Mean (SD) | 00:04:56 (00:01:20) | 00:05:29 (00:01:23) | +00:00:33 (+11.15%) | * | Sequential due to study design |
| Yacoub et al.37 | Reading time | Self-measured with digital stopwatch | Mean (SD) | 00:07:01 (00:02:55) | 00:05:28 (00:02:02) | −00:01:33 (−22.09%) | *** | Second reader, concurrent |
| Cha et al.38 | Contouring time | Self-report | Median (IQR) | 00:40:00 (00:43:00) | 00:28:00 (00:10:00) | −00:12:00 (−30.00%) | ** | First reader |
| Kiljunen et al.86 | Contouring time | NI | Mean | 00:27:00 | 00:15:00 | −00:12:00 (−44.44%) | NI | First reader |
| Strolin et al.97 | Contouring time | NI | Median (Range) | 00:25:00 (01:47:00) | 00:12:18 (00:46:54) | −00:12:42 (−50.80%) | *** | First reader |
| Tchou et al.31 | Time to review AI results | Timestamp macro in Excel/recording by investigator | Mean (SE) | 00:01:58 (00:00:04) | .. | 00:00:23^d (00:00:02) | NI | Sequential due to study design |
| Wittenberg et al.33 | Time to review AI results | NI | Mean (Range) | 00:01:15 (00:01:02) | .. | 00:00:22^d (00:00:18) | NI | Sequential due to study design |
| Arbabshirani et al.7 | Time to interpretation^b | NI | Median (IQR) | 08:32:00 (01:51:00) | 00:19:00 (00:22:00) | −08:13:00 (−96.29%) | *** | Triage |
| Ginat83 | Wait time (ED cases)^b | Automatically recorded | Mean (SD) | 01:25:00 (03:14:00) | 01:12:00 (02:57:00) | −00:13:00 (−15.29%) | n.s. | Triage |
| Ginat83 | Wait time (inpatient cases)^b | Automatically recorded | Mean (SD) | 06:30:00 (06:08:00) | 05:52:00 (05:15:00) | −00:38:00 (−9.74%) | ** | Triage |
| Ginat83 | Wait time (outpatient cases)^b | Automatically recorded | Mean (SD) | 11:14:00 (13:45:00) | 01:10:00 (02:21:00) | −10:04:00 (−89.61%) | *** | Triage |
| O'Neill et al.89 | Wait time^b | NI | Median (CI) | 00:15:45 (00:00:46) | 00:12:01 (00:01:55) | −00:03:44 (−23.75%) | *** | Triage |
| Elijovich et al.82 | Time to notification | Retrospective documentation | Median (IQR) | 00:26:00 (00:14:00) | 00:07:00 (00:04:00) | −00:19:00 (−73.08%) | *** | Triage |
| Hong et al.84 | Time to treatment | Retrospectively through analysis of electronic medical records | Mean (SD) | 02:30:00 (03:24:00) | 01:12:00 (19:30:00) | −01:18:00 (−4.91%) | n.s. | Second reader, concurrent |
| Batra et al.34 | Report turnaround time | Timestamps | Mean | 00:59:54 | 00:47:36 | −00:12:18 (−20.53%) | *** | Triage |
| Seyam et al.95 | Report turnaround time^b | Timestamps extracted from the electronic medical record and PACS | Mean (CI) | 01:00:00 (00:17:00) | 01:03:00 (00:11:00) | +00:03:00 (+5.00%) | NI | Triage |
| Sim et al.96 | Report turnaround time | Extracted timestamps from the hospital's RIS | Mean | 00:09:00 | 00:07:00 | −00:02:00 (−22.22%) | NI | Triage |
| Zia et al.30 | Report turnaround time^b | NI | Mean (SD) | 01:06:42 (00:41:30) | 01:20:00 (01:04:24) | +00:13:18 (+19.94%) | * | Second reader, sequential |
| Schmuelling et al.94 | ED turnaround time^a | Timestamps in the clinical information system | Mean (SD) | 02:06:00 (01:04:12) | 01:59:00 (01:41:00) | −00:07:00 (−5.56%) | n.s. | Triage |
| Hassan et al.40 | DIDO time at PSC | NI | Mean (SD) | 03:46:42 (04:02:54) | 02:04:24 (00:57:36) | −01:42:18 (−45.13%) | * | Triage |
| Ladabaum et al.41 | Withdrawal time | NI | Mean (CI) | 00:17:30 (00:01:30) | 00:18:00 (00:01:36) | +00:00:30 (+2.86%) | n.s. | NI |
| Nehme et al.29 | Withdrawal time | NI | Median (IQR) | 00:17:00 (00:15:00) | 00:18:00 (00:16:00) | +00:01:00 (+5.88%) | n.s. | NI |
| Repici et al.24 | Withdrawal time | Stopwatch | Mean (SD) | 00:07:15 (00:02:29) | 00:06:57 (00:01:41) | −00:00:18 (−4.14%) | n.s. | NI |
| Wang et al.26 | Withdrawal time | NI | Mean (SD) | 00:06:23 (00:01:13) | 00:06:53 (00:01:47) | +00:00:30 (+7.82%) | *** | Second reader, concurrent |
| Levy et al.87 | Total procedure time | Recorded by endoscopy nurse | Median (IQR) | 00:24:00 (00:17:00) | 00:22:00 (00:12:00) | −00:02:00 (−8.33%) | *** | NI |
| Wang et al.26 | Total procedure time | NI | Mean (SD) | 00:12:06 (00:04:05) | 00:12:31 (00:04:23) | +00:00:25 (+3.47%) | n.s. | Second reader, concurrent |
| Raya-Povedano et al.36 | .. | .. | .. | .. | .. | .. | .. | Gatekeeper |
| Ruamviboonsuk et al.92 | .. | .. | .. | .. | .. | .. | .. | Gatekeeper |

n.s. Not significant, AI Artificial intelligence, CI 95% confidence interval, DIDO Door-in-door-out time, ED Emergency department, EMR Electronic medical record, IQR Interquartile range, NI No information, PACS Picture archiving and communication system, PSC Primary stroke center, RIS Radiology information system, SD Standard deviation, SE Standard error. Time formats are hh:mm:ss. *p < 0.05, **p < 0.01, ***p < 0.001.
^a Time measurements for scans that have been classified positive for pulmonary embolism.
^b Time measurements for scans that have been classified positive for intracranial hemorrhage.
^c Potretzke et al. reported a reduction in segmentation time but no concrete numbers.
^d Additional reading time for AI use.
Methods
Registration and protocol
Before its initiation, our systematic literature review was registered in a database (PROSPERO, ID: CRD42022303439), and the review protocol was peer-reviewed (International Registered Report Identifier RR2-10.2196/40485)14. Our reporting adheres to the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) statement reporting guidelines (Supplementary Table 3). During the preparation of this work, we used ChatGPT (version GPT-3.5, OpenAI) to optimize the readability and wording of the manuscript. After using this tool, the authors reviewed and edited the content as required and take full responsibility for the content of the publication.

Search strategy and eligibility criteria
Articles were retrieved through a structured literature search in the following electronic databases: MEDLINE (PubMed), Embase, PsycINFO, Web of Science, IEEE Xplore, and the Cochrane Central Register of Controlled Trials. We included original studies on clinical imaging, written in German or English, retrieved in full text, and published in peer-reviewed journals from the 1st of January 2000 onward, which marked a new era of AI in healthcare with the development of deep learning14,66. The first search was performed on July 21st, 2022, and was updated on May 19th, 2023. Furthermore, a snowball search screening the references of the identified studies was performed to retrieve relevant studies. Dissertations, conference proceedings, and gray literature were excluded. This review encompassed observational and interventional studies, such as randomized controlled trials and nonrandomized studies on interventions (e.g., before–after studies).

Fig. 3 | Results of meta-analyses. Graphical display and statistical results of the three meta-analyses: a Studies using AI for detection tasks in CT images that reported clinicians' reading time. b Studies using AI to detect polyps during colonoscopy that measured the total procedure time. c Studies that used AI for reprioritization and measured the turnaround times for cases flagged positive; all included studies used Aidoc for intracranial hemorrhage detection.
Only studies that introduced AI to actual real-life clinical workflows were eligible, that is, those not conducted in an experimental setting or in a laboratory. The search strategy followed the PICO framework:
• Population: This review included studies conducted in real-world healthcare facilities, such as hospitals and clinics, using medical imaging and surveying healthcare professionals of varying expertise and qualifications.
• Exposure/interventions: This review encompassed studies that focused on various AI tools for diagnostics and their impact on healthcare professionals' interaction with the technology across various clinical imaging tasks67. We exclusively focused on AI tools that interpret image data for disease diagnosis and screening5. For data extraction, we used the following working definition of AI used for clinical diagnostics: "any computer system used to interpret imaging data to make a diagnosis or screen for a disease, a task previously reserved for specialists"14.
• Comparators: This review emphasized studies comparing the workflow before AI use with that after AI use, or the workflow with AI use with that without AI use, although this was not a mandatory criterion for inclusion in the review.
• Outcomes: The primary aim of this study was to evaluate how AI solutions impact workflow efficiency in clinical care contexts. Thus, we focused on three outcomes of interest: (1) changes in time required for task completion, (2) workflow adaptation, and (3) workload.
(1) Changes in time for completion of imaging tasks were considered, focusing on reported quantitative changes attributed to AI usage (e.g., throughput times and review duration).
(2) Workflow adaptation encompasses changes in the workflow that result from the introduction of new technologies, particularly in the context of AI implementation (i.e., specifying the time and purpose of AI use).
(3) Workload refers to the demands of tasks on human operators and changes associated with AI implementation (e.g., cognitive demands or task load).

The detailed search strategy following the PICO framework can be found in Supplementary Table 4 and Supplementary Note 1.

Fig. 4 | Prototypical workflows after AI implementation. Visual representation of the different workflows when using AI as reported in the included studies: a Workflows when using AI for prioritization tasks. b Workflow when using AI for detection. c Workflow when using AI for segmentation tasks. Figure created with Canva (Canva Pty Ltd, Sydney, Australia).

Screening and selection procedure
All retrieved articles were imported into the Rayyan tool68,69 for title and abstract screening. In the first step, after undergoing training, two study team members (KW and JK/MW/NG) independently screened the titles and abstracts to establish interrater agreement. In the second step, the full texts of all eligible publications were screened by KW and JK. Any potential conflicts regarding the inclusion of articles were resolved through discussions with a third team member (MW). Reasons for exclusion were documented, as depicted in the flow diagram in Fig. 170.

Data extraction procedure
Two authors (JK and KW/FZ) extracted the study data and imported them into MS Excel, which then went through random checks by a study team member (MW). To establish agreement, all reviewers extracted data from the first five studies based on internal data extraction guidelines.

Study quality appraisal and risk of bias assessment
To evaluate the methodological quality of the included studies, two reviewers (KW and JK) used three established tools. The Risk of Bias in Non-randomized Studies of Interventions tool (ROBINS-I) was used for non-randomized studies and the Cochrane Risk of Bias tool (RoB 2) for randomized studies71,72. To assess the reporting quality of the included studies, the MINORS was used27. The MINORS was used instead of the Quality of Reporting of Observational Longitudinal Research checklist73, as pre-specified in the review protocol, because this tool was more adaptable to all included studies. Appraisals were finally established through discussion until consensus was achieved.
Strategy for data synthesis
First, we describe the overall sample and the key information from each included study. Risk of bias assessment evaluations are presented in narrative and tabular formats. Next, where comparable studies were sufficient, a meta-analysis was performed to examine the effects of AI introduction. We used the method of Wan et al.74 to estimate the sample mean and standard deviation from the sample size, median, and interquartile range, because the reported measures varied across the included studies. Furthermore, we followed the Cochrane Handbook for calculating the standard deviation from the confidence interval (CI)75.
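A compact summary of the conversions involved, assuming the standard case of a median m with interquartile range (q1, q3) from n observations (Wan et al.74), and a mean reported with a 95% CI under approximate normality (Cochrane Handbook75):

```latex
\bar{x}\approx\frac{q_{1}+m+q_{3}}{3},\qquad
s\approx\frac{q_{3}-q_{1}}{2\,\Phi^{-1}\!\left(\frac{0.75n-0.125}{n+0.25}\right)},\qquad
s\approx\sqrt{n}\,\frac{\mathrm{upper}-\mathrm{lower}}{3.92}
```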
The metafor package in R76 was used to quantitatively synthesize data from the retrieved studies. Considering the anticipated heterogeneity of effects, a random-effects model was used to estimate the average effect across studies. Moreover, we used the DerSimonian and Laird method to determine cross-study variance and the Hartung–Knapp method to estimate the variance of the random effect77,78. Heterogeneity was assessed using Cochran's Q test79 and the I² statistic75. In cases where a meta-analysis was not feasible, the results were summarized in narrative form and presented in tabular format.
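A minimal sketch of this pipeline in R using metafor76; the study names and summary statistics below are assumed for illustration and are not data from the review. `method = "DL"` selects the DerSimonian-Laird estimator of cross-study variance and `test = "knha"` applies the Knapp-Hartung adjustment:

```r
library(metafor)

# Hypothetical per-study reading times in minutes (medians/IQRs would first be
# converted to means/SDs with the Wan et al. approximations shown above).
dat <- data.frame(
  study     = c("A", "B", "C", "D", "E"),
  mean_ai   = c(3.7, 6.4, 4.2, 5.5, 6.3),  sd_ai   = c(2.4, 2.5, 1.6, 2.0, 2.0),
  n_ai      = c(40, 55, 32, 60, 25),
  mean_ctrl = c(4.5, 6.2, 5.1, 7.0, 6.1),  sd_ctrl = c(2.4, 2.8, 1.9, 2.9, 1.8),
  n_ctrl    = c(40, 55, 32, 60, 25)
)

# Standardized mean differences (with AI minus without AI) and sampling variances.
dat <- escalc(measure = "SMD",
              m1i = mean_ai,   sd1i = sd_ai,   n1i = n_ai,
              m2i = mean_ctrl, sd2i = sd_ctrl, n2i = n_ctrl,
              data = dat)

# Random-effects model: DerSimonian-Laird tau^2 with the Knapp-Hartung adjustment.
res <- rma(yi, vi, data = dat, method = "DL", test = "knha")
summary(res)  # pooled SMD with 95% CI, Cochran's Q, and I^2
forest(res)   # study-level display, as in Fig. 3
funnel(res)   # funnel plot, as used for the meta-bias assessment
```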
Meta-biases
Potential sources of meta-bias, such as publication bias and selective reporting across studies, were considered. Funnel plots were created for the studies included in the meta-analyses.

To assess whether our review is subject to selection bias due to the choice of databases and publication types, we conducted an additional search in the dblp computer science bibliography (with our original search timeframe). As this database did not allow our original search string, the adapted version is found in Supplementary Note 2. Additionally, we performed searches on conference proceedings of the last three years, spanning publications from January 1st, 2020 until May 15th, 2023. We surveyed IEEE Xplore and two major conferences not included in the database: the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) and the Hawaii International Conference on System Sciences (HICSS). We conducted an initial screening of titles and abstracts, with one reviewer (KW) screening all records and JK screening 10% to assess interrater reliability. Full-text assessments for eligibility were then performed by one of the reviewers, respectively (KW or JK). Furthermore, the AMSTAR-2 critical appraisal tool for systematic reviews of randomized and/or non-randomized healthcare intervention studies was used43.

Data availability
All data generated or analyzed during this study is available from the corresponding author upon reasonable request.

Code availability
Code for the meta-analyses is available via https://github.com/katwend/metaanalyses.
Received: 3 April 2024; Accepted: 31 August 2024; 19. Pierce, J. et al. Seamless integration of artificial intelligence into the
clinical environment: our experience with a novel pneumothorax
detection artificial intelligence algorithm. J. Am. Coll. Radiol. 18,
References 1497–1505 (2021).
1. Yeganeh, H. An analysis of emerging trends and transformations in 20. Diao, K. et al. Diagnostic study on clinical feasibility of an AI-based
global healthcare. IJHG 24, 169–180 (2019). diagnostic system as a second reader on mobile CT images: a
2. Asan, O., Bayrak, A. E. & Choudhury, A. Artificial intelligence and preliminary result. Ann. Transl. Med. 10, 668 (2022).
human trust in healthcare: focus on clinicians. J. Med. Internet Res. 21. Duron, L. et al. Assessment of an AI aid in detection of adult
22, e15154 (2020). appendicular skeletal fractures by emergency physicians and
3. Park, C.-W. et al. Artificial intelligence in health care: current radiologists: a multicenter cross-sectional diagnostic study.
applications and issues. J. Korean Med. Sci. 35, e379 (2020). Radiology 300, 120–129 (2021).
4. Ahmad, Z., Rahim, S., Zubair, M. & Abdul-Ghafar, J. Artificial 22. Kanagasingam, Y. et al. Evaluation of artificial intelligence–based
Intelligence (ai) in medicine, current applications and future role with grading of diabetic retinopathy in primary care. JAMA Netw. Open 1,
special emphasis on its potential and promise in pathology: present e182665 (2018).
and future impact, obstacles including costs and acceptance among 23. Bossuyt, P. M. et al. STARD 2015: an updated list of essential items
pathologists, practical and philosophical considerations. a for reporting diagnostic accuracy studies. Radiology 277,
comprehensive review. Diagn. Pathol. 16, 24 (2021). 826–832 (2015).
24. Repici, A. et al. Efficacy of real-time computer-aided detection of 44. Boutron, I. et al. Considering bias and conflicts of interest among the
colorectal neoplasia in a randomized trial. Gastroenterology 159, included studies. In Cochrane Handbook for Systematic Reviews of
512–520.e7 (2020). Interventions (eds Higgins, J. P. T. et al.) 177–204 (Wiley, 2019).
25. Schulz, K. F., Altman, D. G. & Moher, D. CONSORT Group 45. Beyer, F. et al. Comparison of sensitivity and reading time for the use
CONSORT 2010 Statement: updated guidelines for reporting of computer-aided detection (CAD) of pulmonary nodules at MDCT
parallel group randomised trials. BMJ 340, c332–c332 (2010). as concurrent or second reader. Eur. Radio. 17, 2941–2947 (2007).
26. Wang, P. et al. Real-time automatic detection system increases 46. Fujita, H. AI-based computer-aided diagnosis (AI-CAD): the latest
colonoscopic polyp and adenoma detection rates: a prospective review to read first. Radio. Phys. Technol. 13, 6–19 (2020).
randomised controlled study. Gut 68, 1813–1819 (2019). 47. Asan, O. & Choudhury, A. Research trends in artificial intelligence
27. Slim, K. et al. Methodological index for non-randomized studies applications in human factors health care: mapping review. JMIR
(MINORS): development and validation of a new instrument: Hum. Factors 8, e28236 (2021).
methodological index for non-randomized studies. ANZ J. Surg. 73, 48. Herrmann, T. & Pfeiffer, S. Keeping the organization in the loop: a
712–716 (2003). socio-technical extension of human-centered artificial intelligence.
28. Conant, E. F. et al. Improving accuracy and efficiency with AI Soc. 38, 1523–1542 (2023).
concurrent use of artificial intelligence for digital breast 49. Allen, B. The role of the FDA in ensuring the safety and efficacy of
tomosynthesis. Radiol. Artif. Intell. 1, e180096 (2019). artificial intelligence software and devices. J. Am. Coll. Radiol. 16,
29. Nehme, F. et al. Performance and attitudes toward real-time 208–210 (2019).
computer-aided polyp detection during colonoscopy in a large 50. Wenderott, K., Krups, J., Luetkens, J. A. & Weigl, M. Radiologists’
tertiary referral center in the United States. Gastrointest. Endosc. 98, perspectives on the workflow integration of an artificial intelligence-
100–109.e6 (2023). based computer-aided detection system: a qualitative study. Appl.
30. Zia, A. et al. Retrospective analysis and prospective validation of an Ergon. 117, 104243 (2024).
Ai-based software for intracranial haemorrhage detection at a high- 51. Nazer, L. H. et al. Bias in artificial intelligence algorithms and
volume trauma centre. Sci. Rep. 12, 19885 (2022). recommendations for mitigation. PLOS Digit Health 2, e0000278 (2023).
31. Tchou, P. M. et al. Interpretation time of computer-aided detection at 52. Norori, N., Hu, Q., Aellen, F. M., Faraci, F. D. & Tzovara, A.
screening mammography. Radiology 257, 40–46 (2010). Addressing bias in big data and AI for health care: a call for open
32. Vassallo, L. et al. A cloud-based computer-aided detection system science. Patterns 2, 100347 (2021).
improves identification of lung nodules on computed tomography 53. Chen, W. et al. Improving the diagnosis of acute ischemic stroke on
scans of patients with extra-thoracic malignancies. Eur. Radiol. 29, non-contrast Ct using deep learning: a multicenter study. Insights
144–152 (2019). Imaging 13, 184 (2022).
33. Wittenberg, R. et al. Acute pulmonary embolism: effect of a 54. Potretzke, T. et al. Clinical implementation of an artificial intelligence
computer-assisted detection prototype on diagnosis—an observer algorithm for magnetic resonance-derived measurement of total
study. Radiology 262, 305–313 (2012). kidney volume. Mayo Clin. Proc. 98, 689–700 (2023).
34. Batra, K., Xi, Y., Bhagwat, S., Espino, A. & Peshock, R. Radiologist 55. Sun, J. et al. Performance of a chest radiograph AI diagnostic tool for
worklist reprioritization using artificial intelligence: impact on report COVID-19: a prospective observational study. Radiol. Artif. Intell. 4,
turnaround times for CTPA examinations positive for acute e210217 (2022).
pulmonary embolism. Am. J. Roentgenol 221, 324–333 (2023). 56. Tricarico, D. et al. Convolutional neural network-based automatic
35. Liu, X. et al. Evaluation of an OCT-AI-based telemedicine platform for analysis of chest radiographs for the detection of COVID-19
retinal disease screening and referral in a primary care setting. pneumonia: a prioritizing tool in the emergency department, phase
Transl. Vis. Sci. Technol. 11, 4 (2022). i study and preliminary ‘real life’ results. Diagnostics 12, 570
36. Raya-Povedano, J. L. et al. AI-based strategies to reduce workload (2022).
in breast cancer screening with mammography and tomosynthesis: 57. Ibrahim, H. et al. Reporting guidelines for clinical trials of artificial
a retrospective evaluation. Radiology 300, 57–65 (2021). intelligence interventions: the SPIRIT-AI and CONSORT-AI
37. Yacoub, B. et al. Impact of artificial intelligence assistance on chest guidelines. Trials 22, 11 (2021).
CT interpretation times: a prospective randomized study. Am. J. 58. Liu, X. et al. A comparison of deep learning performance against
Roentgenol. 219, 743–751 (2022). health-care professionals in detecting diseases from medical
38. Cha, E. et al. Clinical implementation of deep learning contour auto imaging: a systematic review and meta-analysis. Lancet Digit.
segmentation for prostate radiotherapy. Radiother. Oncol. 159, Health 1, e271–e297 (2019).
1–7 (2021). 59. Nagendran, M. et al. Artificial intelligence versus clinicians:
39. Davis, M. A., Rao, B., Cedeno, P. A., Saha, A. & Zohrabian, V. M. systematic review of design, reporting standards, and claims of
Machine learning and improved quality metrics in acute intracranial deep learning studies. BMJ m689 (2020).
hemorrhage by noncontrast computed tomography. Curr. Probl. 60. Yin, J., Ngiam, K. Y. & Teo, H. H. Role of artificial intelligence
Diagn. Radiol. 51, 556–561 (2022). applications in real-life clinical practice: systematic review. J. Med.
40. Hassan, A., Ringheanu, V. & Tekle, W. The implementation of artificial Internet Res. 23, e25759 (2021).
intelligence significantly reduces door-in-door-out times in a primary 61. Han, R. et al. Randomised controlled trials evaluating artificial
care center prior to transfer. Interv. Neuroradiol. 29, 631–636 (2022). intelligence in clinical practice: a scoping review. Lancet Digit. Health
41. Ladabaum, U. et al. Computer-aided detection of polyps does not 6, e367–e373 (2024).
improve colonoscopist performance in a pragmatic implementation 62. Hua, D., Petrina, N., Young, N., Cho, J.-G. & Poon, S. K.
trial. Gastroenterol. 164, 481–483 (2023). Understanding the factors influencing acceptability of AI in medical
42. Wismüller, A. & Stockmaster, L. A Prospective randomized clinical imaging domains among healthcare professionals: a scoping
trial for measuring radiology study reporting time on artificial review. Artif. Intell. Med. 147, 102698 (2024).
intelligence-based detection of intracranial hemorrhage in emergent 63. Bruni, S., Freiman, M. & Riddle, K. Beyond the tool vs. teammate
Care Head CT (2020). debate: exploring the sidekick metaphor in human-AI Dyads. In: Julia
43. Shea, B. J. et al. Amstar 2: a critical appraisal tool for systematic Wright and Daniel Barber (eds) Human Factors and Simulation.
reviews that include randomised or non-randomised studies of AHFE (2023) International Conference. AHFE Open Access,
healthcare interventions, or both. BMJ 358, j4008 (2017). 83 (2023).
64. Flathmann, C. et al. Examining the impact of varying levels of AI teammate influence on human-AI teams. Int. J. Hum.-Comput. Stud. 177, 103061 (2023).
65. Huang, S.-C., Pareek, A., Seyyedi, S., Banerjee, I. & Lungren, M. P. Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines. npj Digit. Med. 3, 136 (2020).
66. Kaul, V., Enslin, S. & Gross, S. A. History of artificial intelligence in medicine. Gastrointest. Endosc. 92, 807–812 (2020).
67. Dias, R. & Torkamani, A. Artificial intelligence in clinical and genomic diagnostics. Genome Med. 11, 70 (2019).
68. Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan - a web and mobile app for systematic reviews. Syst. Rev. 5, 210 (2016).
69. Ouzzani, M., Hammady, H., Fedorowicz, Z. & Elmagarmid, A. Rayyan - a web and mobile app for systematic reviews. Syst. Rev. 5, 210 (2016).
70. Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372, n71 (2021).
71. Sterne, J. A. et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355, i4919 (2016).
72. Sterne, J. A. C. et al. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ 366, l4898 (2019).
73. Tooth, L., Ware, R., Bain, C., Purdie, D. M. & Dobson, A. Quality of reporting of observational longitudinal research. Am. J. Epidemiol. 161, 280–288 (2005).
74. Wan, X., Wang, W., Liu, J. & Tong, T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. 14, 135 (2014).
75. Higgins, J. P. T., Thompson, S. G., Deeks, J. J. & Altman, D. G. Measuring inconsistency in meta-analyses. BMJ 327, 557–560 (2003).
76. Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
77. DerSimonian, R. & Laird, N. Meta-analysis in clinical trials. Control. Clin. Trials 7, 177–188 (1986).
78. Hartung, J. An alternative method for meta-analysis. Biom. J. 41, 901–916 (1999).
79. Cochran, W. G. The combination of estimates from different experiments. Biometrics 10, 101 (1954).
80. Carlile, M. et al. Deployment of artificial intelligence for radiographic diagnosis of COVID-19 pneumonia in the emergency department. J. Am. Coll. Emerg. Phys. Open 1, 1459–1464 (2020).
81. Cheikh, A. B. et al. How artificial intelligence improves radiological interpretation in suspected pulmonary embolism. Eur. Radiol. 32, 5831–5842 (2022).
82. Elijovich, L. et al. Automated emergent large vessel occlusion detection by artificial intelligence improves stroke workflow in a hub and spoke stroke system of care. J. NeuroIntervent. Surg. 14, 704–708 (2022).
83. Ginat, D. Implementation of machine learning software on the radiology worklist decreases scan view delay for the detection of intracranial hemorrhage on CT. Brain Sci. 11, 832 (2021).
84. Hong, W. et al. Deep learning for detecting pneumothorax on chest radiographs after needle biopsy: clinical implementation. Radiology 303, 433–441 (2022).
85. Jones, C. M. et al. Assessment of the effect of a comprehensive chest radiograph deep learning model on radiologist reports and patient outcomes: a real-world observational study. BMJ Open 11, e052902 (2021).
86. Kiljunen, T. et al. A deep learning-based automated CT segmentation of prostate cancer anatomy for radiation therapy planning: a retrospective multicenter study. Diagnostics 10, 959 (2020).
87. Levy, I., Bruckmayer, L., Klang, E., Ben-Horin, S. & Kopylov, U. Artificial intelligence-aided colonoscopy does not increase adenoma detection rate in routine clinical practice. Am. J. Gastroenterol. 117, 1871–1873 (2022).
88. Marwaha, A., Chitayat, D., Meyn, M., Mendoza-Londono, R. & Chad, L. The point-of-care use of a facial phenotyping tool in the genetics clinic: enhancing diagnosis and education with machine learning. Am. J. Med. Genet. A 185, 1151–1158 (2021).
89. O’Neill, T. J. et al. Active reprioritization of the reading worklist using artificial intelligence has a beneficial effect on the turnaround time for interpretation of head CT with intracranial hemorrhage. Radiol. Artif. Intell. 3, e200024 (2021).
90. Oppenheimer, J., Lüken, S., Hamm, B. & Niehues, S. A prospective approach to integration of AI fracture detection software in radiographs into clinical workflow. Life 13, 223 (2023).
91. Quan, S. Y. et al. Clinical evaluation of a real-time artificial intelligence-based polyp detection system: a US multi-center pilot study. Sci. Rep. 12, 6598 (2022).
92. Ruamviboonsuk, P. et al. Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: a prospective interventional cohort study. Lancet Digit. Health 4, e235–e244 (2022).
93. Sandbank, J. et al. Validation and real-world clinical application of an artificial intelligence algorithm for breast cancer detection in biopsies. npj Breast Cancer 8, 129 (2022).
94. Schmuelling, L. et al. Deep learning-based automated detection of pulmonary embolism on CT pulmonary angiograms: no significant effects on report communication times and patient turnaround in the emergency department nine months after technical implementation. Eur. J. Radiol. 141, 109816 (2021).
95. Seyam, M. et al. Utilization of artificial intelligence-based intracranial hemorrhage detection on emergent noncontrast CT images in clinical workflow. Radiol. Artif. Intell. 4, e210168 (2022).
96. Sim, J. Z. T. et al. Diagnostic performance of a deep learning model deployed at a National COVID-19 screening facility for detection of pneumonia on frontal chest radiographs. Healthcare 10, 175 (2022).
97. Strolin, S. et al. How smart is artificial intelligence in organs delineation? Testing a CE- and FDA-approved deep-learning tool using multiple expert contours delineated on planning CT images. Front. Oncol. 13, 1089807 (2023).
98. Wang, M. et al. Deep learning-based triage and analysis of lesion burden for COVID-19: a retrospective study with external validation. Lancet Digit. Health 2, e506–e515 (2020).
99. Wong, J. et al. Implementation of deep learning-based auto-segmentation for radiotherapy planning structures: a workflow study at two cancer centers. Radiat. Oncol. 16, 101 (2021).
100. Wong, K. et al. Integration and evaluation of chest X-ray artificial intelligence in clinical practice. J. Med. Imaging 10, 051805 (2023).
101. Yang, Y. et al. Performance of the AIDRScreening system in detecting diabetic retinopathy in the fundus photographs of Chinese patients: a prospective, multicenter, clinical study. Ann. Transl. Med. 10, 1088 (2022).
102. Elguindi, S. et al. Deep learning-based auto-segmentation of targets and organs-at-risk for magnetic resonance imaging only planning of prostate radiotherapy. Phys. Imaging Radiat. Oncol. 12, 80–86 (2019).
103. Wang, L. et al. An intelligent optical coherence tomography-based system for pathological retinal cases identification and urgent referrals. Transl. Vis. Sci. Technol. 9, 46 (2020).
104. Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
105. Krause, J. et al. Grader variability and the importance of reference standards for evaluating machine learning models for diabetic retinopathy. Ophthalmology 125, 1264–1272 (2018).
106. Ruamviboonsuk, P. et al. Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program. npj Digit. Med. 2, 25 (2019).
107. Retico, A., Delogu, P., Fantacci, M. E., Gori, I. & Preite Martinez, A. Lung nodule detection in low-dose and thin-slice computed tomography. Comput. Biol. Med. 38, 525–534 (2008).
108. Lopez Torres, E. et al. Large scale validation of the M5L lung CAD on heterogeneous CT datasets. Med. Phys. 42, 1477–1489 (2015).
109. Brown, M. S. et al. Automated endotracheal tube placement check using semantically embedded deep neural networks. Acad. Radiol. 30, 412–420 (2023).
Acknowledgements
We sincerely thank Dr. Nikoloz Gambashidze (Institute for Patient Safety, University Hospital Bonn) for helping with the title and abstract screening. We thank Annika Strömer (Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn) for her statistical support. This research was financed through the institutional budget; no external funding was received.

Author contributions
K.W.: conceptualization, data curation, formal analysis, investigation, methodology, project administration, software, visualization, writing – original draft, writing – review and editing; J.K.: data curation, investigation, visualization, writing – review and editing; F.Z.: investigation, writing – review and editing; M.W.: conceptualization, funding acquisition, supervision, validation. All authors have read and approved the manuscript.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41746-024-01248-9.

Correspondence and requests for materials should be addressed to Katharina Wenderott.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.