Lost in Digitization - A Systematic Review About The Diagnostic Test Accuracy in Digital Pathology Solution
Review Article
ARTICLE INFO
Keywords: Human pathology; Whole slide imaging (WSI); Validation studies; Diagnostic test accuracy; Diagnostic concordance; Overdiagnosis

ABSTRACT
Introduction: Digital pathology solutions are increasingly implemented for primary diagnostics in departments of
pathology around the world. This has sparked a growing engagement on validation studies to evaluate the diagnostic
performance of whole slide imaging (WSI) regarding safety, reliability, and accuracy. The aim of this review was to
evaluate the performance of digital pathology for diagnostic purposes compared to light microscopy (LM) in human
pathology, based on validation studies designed to assess such technologies.
Methods: In this systematic review based on PRISMA guidelines, we analyzed validation studies of WSI compared with
LM. We included studies of diagnostic performance of WSI regarding diagnostic test accuracy (DTA) indicators, degree
of overdiagnosis, diagnostic concordance, and observer variability as a secondary outcome. Overdiagnosis is (for
example) detecting a pathological condition that will either not progress or progress very slowly. Thus, the patient
will never get symptoms from this condition and the pathological condition will never be the cause of death. From a
search comprising four databases: PubMed, EMBASE, Cochrane Library, and Web of Science, encompassing the period
2010–2021, we selected and screened 12 peer-reviewed articles that fulfilled our selection criteria. Risk of bias was
assessed with the QUADAS-2 tool, and data analysis and synthesis were performed in a qualitative format.
Results: We found that diagnostic performance of WSI was not inferior to LM for DTA indicators, concordance, and
observer variability. The degree of overdiagnosis was not explicitly reported in any of the studies, while the term itself
was used in one study and could be implicitly calculated in another.
Conclusion: WSI had an overall high diagnostic accuracy based on traditional accuracy measurements; however, the
degree of overdiagnosis is unknown.
Contents
Introduction
Materials and methods
Results
Study characteristics and quality assessment
Primary and additional outcomes
Diagnostic test accuracy indicators
Diagnostic concordance
Degree of overdiagnosis
Additional outcomes
⁎ Corresponding author.
E-mail addresses: [email protected] (O. Kusta), [email protected] (C.V. Rift), [email protected] (T. Risør), [email protected] (E. Santoni-Rugiu),
[email protected] (J.B. Brodersen).
http://dx.doi.org/10.1016/j.jpi.2022.100136
Received 3 June 2022; Received in revised form 30 August 2022; Accepted 31 August 2022
Available online 6 September 2022
2153-3539/© 2022 The Author(s). Published by Elsevier Inc. on behalf of Association for Pathology Informatics. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
O. Kusta et al. Journal of Pathology Informatics 13 (2022) 100136
Discussion
Study design
Subspeciality
Sample preparation
Overdiagnosis
Shortcomings of the systematic review
Implications for practice
Conclusion
Funding support
Authors’ contributions
Conflicts of interests
Acknowledgements
Appendix A. Supplementary data
References
Fig. 1. Flowchart based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)a guidelines.
a The figure was drafted based on a freely available template at http://prisma-statement.org/documents/PRISMA%202009%20flow%20diagram.pdf.
The quality of the selected studies was assessed with the modified Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool.18 The assessment of bias in the studies was based on 4 domains: patient selection, index test, reference standard, and flow and timing (the flow of patients in the study and the timing of the intervention(s)).19
Primary and secondary outcomes are reported in tabular form, while the other extracted data are provided as supplementary material. We did not conduct a meta-analysis because of the heterogeneity of the studies.

Results

Study characteristics and quality assessment

We identified 2402 unique records in our literature search, of which 71 articles were included for full-text reading and assessment of eligibility for the study (Fig. 1). Among the 71 articles, 12 fulfilled the main selection criterion for our study, that is, reporting at least 2 of the primary outcomes (i.e., DTA indicators, diagnostic concordance, and overdiagnosis). Of the 12 studies in our review, 4 did not specify the kind of study20–23; 3 were retrospective studies,24–26 2 were comparative studies,27,28 and the remaining 3 were a randomized,29 an evaluation,30 and a validation study,31 respectively. The characteristics of the studies are presented in Supplementary Tables 1 and 2.
Regarding the digitization of slides, it is worth emphasizing that only 2 studies reported minor technical discrepancies. One study elaborated on a technical issue where 11 of 124 slides needed a rescan and 4 were excluded due to failed digitization31; another stated that 6 slides had loss of diagnostic material on the fine needle biopsy.21 The most used WSI scanner, reported in 4 studies, was the Aperio ScanScope XT (Aperio Technologies, Vista, Calif., USA),22,24,26,28 followed by the iScan Coreo (Ventana, Tucson, Ariz., USA), used in 3 studies.20,23,29 The remaining studies used diverse scanners such as the Mirax scanner (Carl Zeiss MicroImaging, Jena, Germany),25,30 NanoZoomer S260 (Hamamatsu photonics, Japan),27 Navigo (Visia Imaging, Arezzo, Italy),31 and a digital camera with NetCam software (Olympus America, Center Valley, PA).21
Regarding the quality assessment of the selected studies, overall there was a low risk of bias and low applicability concerns (for more details see Tables 1 and 2, and Fig. 2).

Table 1
Judgement for Risk of Bias summarized for domains (QUADAS 2)a.
Columns: Authors; Patient selection; Index test; Reference standard; Flow and timing.
[Judgement symbols were lost in extraction; only the "?" marks survive, and their column positions are not recoverable.]
Ammendola et al.27   ? ?
Brunyé et al.20
Cima et al.31
Elmore et al.29
Larghi et al.24
Nielsen et al.30     ?
Perez et al.21       ?
Ribback et al.25     ?
Tawfik et al.28      ?
Tawfik et al.26      ?
Tissier et al.22     ? ?
Zoroquiain et al.23  ?
a Table adapted from the freely available template at https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2Fwww.bristol.ac.uk%2Fmedia-library%2Fsites%2Fquadas%2Fmigrated%2Fdocuments%2Ftable.docx&wdOrigin=BROWSELINK.

[Table fragment: last rows and footnotes of a second QUADAS-2 judgement table; symbols not recoverable.]
Tawfik et al.26
Tissier et al.22     ? ?
Zoroquiain et al.23
a Table adapted from the freely available templates at https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2Fwww.bristol.ac.uk%2Fmedia-library%2Fsites%2Fquadas%2Fmigrated%2Fdocuments%2Ftable.docx&wdOrigin=BROWSELINK.
b Because final FS-FFPE diagnosis based on frozen sections (FS) or formalin-fixed and paraffin embedded (FFPE) biopsies may differ from the original assessment even during routine use of LM with frozen section.
c This refers to the comparison of accuracy of WSI with LM to identify microorganisms and not human cells.

The main DTA indicators reported for WSI in 10 studies were sensitivity, specificity, positive-predictive values, and negative-predictive values, while in 1 study the area under the curve (AUC) was reported as a probability.27 One study did not report any DTA indicators, but only diagnostic concordance.20 Of the 12 selected studies, 5 were based on histology preparations,22,23,27,29,30 3 used cytology preparations,21,26,28 1 used both histology and cytology samples,24 and 2 used frozen sections.25,31 The selected studies encompassed several pathology subspecialties, with 2 of them reporting on multiple subspecialties25,31 and 1 not specifying the subspecialty.21
All the results regarding the primary outcomes of accuracy measurements are shown in Table 3. At least 7 studies reported a very good performance of WSI based on DTA indicators.21–26,30,31 In these studies, sensitivity ranged from 86% to 100%, specificity from 75% to 100%, positive-predictive values from 92% to 99%, and negative-predictive values from 75% to 100%. Cima et al., examining frozen sections for intraoperative cancer staging and transplant organs, reported a drop in specificity and negative-predictive values (both 75%), due to 4 discordant cases (compared to LM) in the examination of kidney and liver donor transplant organs.31
In a study of pancreatic pathology, Larghi et al. reported an overall good performance of WSI for sensitivity, specificity, and positive-predictive values, but a poor performance for negative-predictive values for both LM and WSI (51% and 52%, respectively).24 However, the authors do not explain the reasons for this poor performance.
One study of gynecological pathology, diagnosing several diseases according to the 2001 Bethesda Report, stated a poor sensitivity of WSI for each of the individual diseases (23.5%–58.3%, see Table 3 for more details).28 However, the authors report a higher average sensitivity (82.1%) that is adjusted to the number of cases in each diagnostic category. Similarly, in a study of surgical neuropathology, Ammendola et al. reported a poor performance of both LM and WSI based on AUC (from 0.50 to 0.72) for several diagnostic features of meningioma.27
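The DTA indicators discussed here all derive from a 2×2 table of index-test results against a reference standard. The following is a minimal sketch of that arithmetic, not the method of any included study. The class name `TwoByTwo` and all counts are hypothetical and chosen only for illustration. The second table shows one arithmetic situation in which negative-predictive value can be low despite high sensitivity and specificity, namely when the target condition is very common in the sample; whether this applies to any of the included studies is not established by this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical counts of index-test results against a reference standard."""
    tp: int  # index test positive, reference positive
    fp: int  # index test positive, reference negative
    fn: int  # index test negative, reference positive
    tn: int  # index test negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive-predictive value."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative-predictive value."""
        return self.tn / (self.tn + self.fn)


# Illustrative counts only; they are not taken from any study in the review.
balanced = TwoByTwo(tp=90, fp=5, fn=10, tn=95)
high_prevalence = TwoByTwo(tp=920, fp=4, fn=80, tn=96)

for name, table in (("balanced", balanced), ("high prevalence", high_prevalence)):
    print(
        f"{name}: sensitivity={table.sensitivity():.3f} "
        f"specificity={table.specificity():.3f} "
        f"PPV={table.ppv():.3f} NPV={table.npv():.3f}"
    )
```

In the second table, sensitivity and specificity stay above 0.9 while the negative-predictive value falls to roughly 0.55, because the few false negatives are large relative to the small number of true negatives.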
Fig. 2. The proportion of the Risk of Bias and Applicability Concerns (QUADAS 2)a.
a The drafted figure is a template freely available at https://view.officeapps.live.com/op/view.aspx?src=http%3A%2F%2Fwww.bristol.ac.uk%2Fmedia-library%2Fsites%2Fquadas%2Fmigrated%2Fdocuments%2Fgraphs.xlsx&wdOrigin=BROWSELINK.
Table 3
Primary outcomes of diagnostic test accuracy (DTA) indicators and diagnostic concordance.

Source: Ammendola et al.27
Subspecialty: Surgical neuropathology
Diagnostic purpose: Grading of meningioma
Primary outcomes: Area Under the Curve (AUC)a
                          Observer 1    Observer 2    Observer 3    Observer 4
Brain invasion            0.50  0.50    0.51  0.51    0.53  0.55    0.50  0.55
High mitotic index        0.64  0.72    0.60  0.61    0.58  0.65    0.56  0.68
Hypercellularity          0.54  0.52    0.58  0.58    0.50  0.50    0.54  0.50
Sheeting                  0.57  0.52    0.59  0.59    0.55  0.59    0.50  0.62
Macronucleoli             0.53  0.51    0.55  0.53    0.51  0.53    0.53  0.53
Small cells               0.55  0.51    0.63  0.61    0.54  0.53    0.52  0.54
Spontaneous necrosis      0.51  0.52    0.61  0.61    0.51  0.51    0.56  0.54

Source: Brunyé et al.20
Subspecialty: Breast pathology
Diagnostic purpose: Classification of breast neoplasms
Primary outcomes: Diagnostic concordance (95% CI)

Source: Cima et al.31
Subspecialty: Multiple subspecialties and organs
Diagnostic purpose: Cancer staging (surgical margins, tumor biology, lymph node status) and organ quality for transplantation
Primary outcomes              Cancer (WSI)                        Transplant (WSI)
Sensitivity                   100%                                96%
Specificity                   96%                                 75%
Positive-predictive values    95%                                 96%
Negative-predictive values    100%                                75%
Diagnostic concordance        97% (κ=0.96, CI: 0.941–0.985)       86% (κ=0.91, CI: 0.877–0.958)

Source: Larghi et al.24
Subspecialty: Pancreatic pathology
Diagnostic purpose: Diagnostic classification according to the Papanicolau Society of Cytopathology system for reporting pancreatobiliary cytology
Primary outcomes              LM (95% CI)    WSI (95% CI)
Sensitivity                   92%            93%
Specificity                   96%            88%
Positive-predictive values    99%            99%
Negative-predictive values    51%            52%
Diagnostic concordance        92%            92%

Source: Nielsen et al.30
Subspecialty: Dermatopathology
Diagnostic purpose: Diagnosing neoplasms of the skin: benign, premalignant, and malignant
Primary outcomes              LM                   WSI
Sensitivity                   92% (85–96%)         86% (78–91%)
Specificity                   99.5% (97–99.5%)     99% (97–99.5%)
Positive-predictive values    93% (86–96.5%)       92% (84.5–95.5%)
Negative-predictive values    98% (97–99%)         97% (96–98%)
Diagnostic concordancef       72.4%                69.6%

Sensitivity                   87.9%
Specificity                   95.7%

Source: Ribback et al.25
Subspecialty: Urology, gynecology, and dermatopathology
Diagnostic purpose: Tumor diagnosis and assessment of surgical margin
Primary outcomes              WSI
Sensitivity                   92.6%
Specificity                   99.0%
Positive-predictive values    98.3%
Negative-predictive values    97.7%
Diagnostic concordance        98.35%

Source: Tawfik et al.26
Subspecialty: Gynecological pathology
Diagnostic purpose: Assessing if negative for intraepithelial lesion or malignancy
Primary outcomes: Sensitivity (95% CI)
Diagnosis                     WSI

Source: Tawfik et al.28
Subspecialty: Gynecological pathology
Diagnostic purpose: Diagnosing for neoplasms, cellular changes, and infectious agents according to 2001 Bethesda reporting system and terminology
Primary outcomes: Weighted average for WSI (95% CI)
Diagnosis                                                             Sensitivity    Specificity
Atypical squamous cells, cannot exclude high-grade
squamous intraepithelial lesion (ASC-H)                               23.5%          99.5%
Any conditionh                                                        82.1%          86.2%

Source: Tissier et al.22
Subspecialty: Nephropathology
Diagnostic purpose: Classification of adrenocortical tumor by Weiss score
Primary outcomes              Reading 1i     Reading 2j

Source: Zoroquiain et al.23
Subspecialty: Ocular pathology
Diagnostic purpose: Identification of prognostic factors for retinoblastoma
Primary outcomes: Morphological risk factors; Classic morphological features
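Table 3 lists a weighted average across diagnostic categories for one study, and the text describes that average as adjusted to the number of cases in each category. The sketch below shows, under stated assumptions, how a case-weighted average can be much higher than the individual category values when one large category performs well. The function name `weighted_average` and all numbers are hypothetical; they are not the study's data, and the study's exact weighting scheme is not reproduced here.

```python
def weighted_average(values, weights):
    """Return the average of `values`, weighting each by the matching entry in `weights`."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical per-category sensitivities and case counts (illustration only).
per_category_sensitivity = [0.25, 0.50, 0.90]
cases_per_category = [10, 20, 170]

weighted = weighted_average(per_category_sensitivity, cases_per_category)
unweighted = sum(per_category_sensitivity) / len(per_category_sensitivity)

print(f"case-weighted sensitivity: {weighted:.4f}")
print(f"unweighted mean sensitivity: {unweighted:.4f}")
```

With these made-up numbers the case-weighted value is about 0.83 while the unweighted mean is 0.55, which is why a summary figure of this kind should be read together with the per-category results.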
Interobserver variability between all observers (AO) and senior pathologists (SP)a
LM WSI
Interobserver variability for all observers
Parameter LM WSI
LM VS LM 79%
WSI VS WSI 73%
LM VS WSI 77%
WSI VS LM 76%
Negativef (95% CI) 0.74 0.49 (0.39–0.60) 0.63 0.79 0.61 0.68
(0.67–0.80) (0.52–0.73) (0.70–0.87) (0.52–0.70)
Atypical squamous cells of undetermined significance (ASCUS) 0.46 0.21 (0.10–0.32) 0.36 0.45 0.33 0.39
(95% CI) (0.39–0.52) (0.25–0.46) (0.36–0.44) (0.24–0.43)
Low-grade squamous intraepithelial lesions (LSIL) (95% CI) 0.53 0.41 (0.31–0.52) 0.52 0.55 0.51 0.51
(0.47–0.59) (0.42–0.63) (0.46–0.64) (0.42–0.60)
High-grade squamous intraepithelial lesions (HSIL) (95% CI) 0.58 0.36 (0.26–0.46) 0.42 0.58 0.54 0.52
(0.52–0.64) (0.31–0.52) (0.49–0.67) (0.45–0.63)
Tissier et al.22 Nephropathology Intraobserver variability (Weiss scoreg criteria Interobserver variability (Weiss score criteria reading)
reading)
Sinusoidal invasion 0 0.40 (0.37–0.44) 0.30 (0.27–0.33)
Weiss modified by Aubert et al ≥3 vs 0–2 0.50 0.67 (0.64–0.70) 0.75 (0.72–0.78)
a Interobserver concordance was measured between all the observers (pathologists), but also between senior pathologists versus all the observers that participated in the validation study.
b Here all the possible combinations of comparisons between LM and WSI were tried based on intraobserver agreement.
c Besides the diagnostic classification, other diagnostic features were considered in this study; therefore we use the term “parameters”.
d Kappa (κ) statistics are used to assess observer agreement for intervention(s).
e Nielsen et al. use the term ‘review’ instead of ‘reading’. We have chosen the latter for a consistent terminology (as it is used e.g. in Tissier et al.).
f The case does not have the target condition.
g Weiss score is a reference method to distinguish between a benign and a malignant adrenocortical tumor (ACT).
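Footnote d refers to kappa (κ) statistics for observer agreement. As a hedged illustration of what Cohen's κ measures, the sketch below computes it for two sets of categorical readings of the same cases: observed agreement corrected for the agreement expected by chance from each reader's category frequencies. The function name `cohens_kappa` and the example readings are hypothetical and are not data from any included study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two readings agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of 10 cases, e.g. one pathologist with LM and with WSI.
lm_reading = ["benign", "benign", "malignant", "malignant", "benign",
              "malignant", "benign", "benign", "malignant", "benign"]
wsi_reading = ["benign", "benign", "malignant", "benign", "benign",
               "malignant", "benign", "malignant", "malignant", "benign"]

kappa = cohens_kappa(lm_reading, wsi_reading)
print(f"kappa = {kappa:.3f}")
```

Here the readings agree on 8 of 10 cases (80%), yet κ is lower than 0.80 because about half of that agreement would be expected by chance alone, which is one reason percentage agreement and κ are not directly comparable across studies.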
Elmore et al., focusing on breast cancer, report a high predictive value for both LM and WSI in identifying benign without atypia (97.1% vs 95.7%) and invasive breast cancer (97.7% vs 97.2%).29 However, they report an average performance for ductal carcinoma in situ (DCIS) (69.6% LM vs 57.1% WSI) and a poor performance for atypia (37.8% vs 27.8%).

Diagnostic concordance

Six of the 12 studies reported the diagnostic concordance of WSI with LM20,21,24,25,30,31 (Table 3). Four of these reported a high diagnostic concordance for WSI, in the range 86%–98.35%. Nielsen et al., conducting a study in dermatopathology, report an average concordance for both LM and WSI, 72.4% vs 69.6%, respectively.30 The authors briefly elaborate on the poor performance of WSI for premalignant changes, where the main problems with accuracy (and concordance) were observed. This might explain the average concordance as opposed to an otherwise very good performance for DTA indicators (see the subsection above and Table 3). Finally, a study of breast cancer reported a varying mean concordance for different stages of breast cancer.20 As in the other breast cancer study,29 poor concordance was observed for atypia (37%) and very good concordance for invasive breast cancer (94%).20

Degree of overdiagnosis

The degree of overdiagnosis was not explicitly reported in any of the 12 studies. There are ongoing discussions about whether overdiagnosis should be defined as a diagnostic error32 and thereby be captured by Bayesian reasoning (2×2 table). As Brodersen et al. remark, overdiagnosis is not a false-positive result, that is, a diagnostic error that can be identified as such with further investigation; it is an abnormality that meets the pathological criteria of a disease.16 In one of the selected studies, Elmore and colleagues elaborate on overinterpretation for several grades of breast cancer on both WSI and LM.29 The term overinterpretation was used to denote the incorrect classification of a lesion to a higher stage. The authors of this study calculated that 3% of the cases were overinterpreted as invasive breast cancer with WSI, and thereby overdiagnosed.

Additional outcomes

Six of the 12 studies reported on observer variability22,24,26,27,29,30 (Table 4). Of these, 4 studies tested intra- or interobserver variability with Cohen’s kappa (κ) statistics,22,24,26,30 and 2 reported it as a percentage.27,29 Two studies calculated intra- and interobserver variability based on κ statistics, where the values for both LM and WSI were within κ 0.67–0.97.24,30 The 2 other studies calculated κ jointly for LM-WSI for different diagnostic features or categories, where interobserver variability ranged from κ 0.21 to 0.83.22,26
Two studies reported observer variability as a percentage for LM and WSI, where intraobserver variability ranged from 73% to 100% for both.27,29 Ammendola et al. also calculated interobserver variability for senior pathologists (range 49%–97%) vs all observers (range 26%–93%), and for all observers with LM (range 27%–83%) and WSI (range 31%–89%).27

Discussion

The selected studies in this systematic review displayed a low risk of bias and low applicability concerns as measured with the QUADAS-2.18,19 We found that WSI was not inferior to LM regarding diagnostic performance. In addition, in 4 studies reporting both LM and WSI, their performances were comparable.24,27,29,30 Moreover, 8 out of 12 studies state an overall very good performance of WSI regarding DTA and diagnostic concordance. However, the degree of overdiagnosis was not reported in any of the selected studies, which might artificially increase the apparent performance of WSI, as with other newer imaging tests. In this regard, Heleno et al., assessing the accuracy of low-dose CT scans for lung cancer screening, found that overdiagnosis inflated sensitivity and positive-predictive values.13
The 12 studies included in the present review displayed a high heterogeneity, and the analysis of the extracted data suggests that this has implications for the diagnostic performance of WSI in validation studies in pathology. There are 3 main aspects, in addition to the risk of overdiagnosis, where heterogeneity played an important role regarding performance: study design, subspeciality, and sample preparation.

Study design

The designs of the included studies were quite diverse regarding the main CAP recommendations, such as the number of samples, the number of pathologists, the washout period, the order of examination with LM and WSI, and the comparison between them. A reliable estimate of diagnostic performance is therefore directly related to the quality of the validation study, as also remarked in another systematic review comparing WSI with LM.33 In line with Goacher et al., the quality of the evidence regarding WSI performance is hampered by the heterogeneity of the study design, despite the evidence that WSI was not inferior to LM.34 In our review, 4 studies did not reach the number of samples recommended by CAP (60 cases),20,22,23,27 which might have increased the uncertainty due to broader confidence intervals. Notwithstanding the low risk of bias and low applicability concerns, 6 studies did not report confidence intervals for the diagnostic performance of WSI or LM.21,23,25,27,30,31 This raises further questions about the sample size and whether it is representative of the population.

Subspeciality

The 12 included studies represent different pathology subspecialties, with 2 even reporting on multiple subspecialties.25,31 Each subspecialty involves specific challenges regarding the number and type of diagnostic categories, as well as those cases requiring additional molecular tests for the final diagnosis.
For instance, Ammendola et al. reported AUC values (for both LM and WSI) for evaluating atypical meningioma mostly in the range of 0.50–0.60.27 These values indicate a poor performance regarding test accuracy. Nonetheless, the authors concluded that the suboptimal performance regarding the grading of meningioma was due to the diagnostic challenges that this disease poses for pathologists. In this case, more experienced senior pathologists performed significantly better than younger ones. This finding has implications for the role of clinical reasoning in diagnostic accuracy, where the literature suggests that expertise might be related to experience, especially in the pattern recognition that is important in visual diagnostics.32,35,36
Parallel to the increasing complexity of examinations, the subspecialty of gynecological pathology was challenged by a high diagnostic workload.37 In 2 studies of this subspecialty, the authors, assessing the performance of WSI based on DTA indicators, evaluated 335 slides28 and 1110 slides.26 In one of the studies, WSI showed high sensitivity for assessing intraepithelial lesions or malignancies.28 The other study displayed an inconsistent sensitivity for multiple diagnostic categories, but stated that their method of assessment was as sensitive as the standard reference method.26
Girolami et al. asserted that diagnostic performance is related to the time for making the diagnosis in cytology-based subspecialties.37 In this regard, Tawfik et al. reported an average scanning and reviewing time of 5.5 min with WSI for cytology-based gynecological pathology.26 In 3 other studies measuring the time for diagnosis with WSI, 2 stated that turnaround time (time from the arrival of the specimen until the communication of diagnosis) was comparable between LM and WSI,25,31 while Larghi et al. reported a comparable time for reviewing slides with LM and WSI, 84 and 108 s, respectively.24

Sample preparation

Sample preparation techniques pose specific challenges for slide digitization that might affect the performance of WSI, both regarding accuracy and time. One such example is cytology preparations, where smear thickness, overlapping cells, and obscuring backgrounds require multiplane
(z-stacking) focusing for digital slides.28 Of the selected articles, 3 were based on cytology preparations,21,26,28 1 involved both cell-blocks (cytology) and histology samples,24 and 2 used frozen sections.25 Despite the difficulties of sample preparation, all these studies reported a comparable performance of WSI with LM.
This important aspect of using WSI with z-stacking for routine work with cytology preparations was also emphasized in a systematic review of digital pathology for cytopathology.37 However, one study of surgical neuropathology based on histology preparations used 7 z-stack planes and a technique for optimizing the digital slide.27 Notwithstanding the fact that histology is less challenging for digitization, the performance of pathologists was not more accurate than with LM. However, even with single or multiple z-stacking, cytopathology and frozen sections are still difficult to digitize with the high image quality that can be achieved with histopathology slides.

Overdiagnosis

Adding to the challenges relating to diagnostic performance and the role of heterogeneity, overdiagnosis poses other difficulties. Although its degree was not reported explicitly, it was briefly addressed in the 2 breast cancer studies.20,29 Brunyé et al. mention the notion of overdiagnosis by elaborating on the unnecessary and costly treatment and intervention procedures it entails, for instance, when a biopsy is interpreted as ductal carcinoma in situ (DCIS) when it is in fact atypia.20 Conversely, Elmore et al. calculated the number of cases incorrectly classified to a higher stage (per hundred cases), showing that 3% of cases with WSI and 2% with LM (as the reference standard) were overinterpreted as invasive breast cancer.29 However, this was a validation study scenario, where clinical outcomes were not calculated, but only the performance of the pathologists involved in this study. In this regard, future studies should evaluate the DTA of WSI by including patient-relevant outcomes, and thereby overdiagnosis, in a randomized design to encompass the full spectrum of cases.29
While there are 5 cancers documented with a high risk of overdiagnosis, the reasons differ for each of them: screening (i.e., breast cancer, prostate cancer, and melanoma), incidental findings (renal cancer), or both incidental findings and excessive investigation (thyroid cancer).38 However, there are other cases such as lung cancer, where overdiagnosis is possible if screening for lung cancer is implemented.39 In this review, we focused on pathological diagnostics by comparing WSI to LM and not on the above factors for overdiagnosis. In this regard, the Cochrane Collaboration has launched a new research field regarding the use of evidence to tackle overdiagnosis and its consequences.40

[…] presented as a good solution to address the lack of pathologists and a growing workload. Following this, the possibility to train residents and pathologists with this digital solution adds to the capacity building needed to tackle these challenges.2 Finally, the prospect of using AI algorithms for quantitative measuring, counting, and computer-assisted diagnosis might contribute to better diagnostic accuracy and save time for pathologists.4,7

Conclusion

We found that WSI was not inferior to LM regarding DTA and diagnostic concordance. However, the degree of overdiagnosis was not systematically reported and is thereby unknown. The diverse subspecialties and their laboratory tasks raise important questions about whether it is possible to compare LM and WSI across all these subspecialties, or whether LM perhaps has advantages in some and WSI in others. When considering the implementation of digital pathology, departments should also take into account the advantages for remote diagnosis and consultations, cancer research, digital multidisciplinary case conferences, supervision of residents, and storage of digital slides. However, the designers of the validation studies and the participating pathologists should be careful in those areas where the risk of overdiagnosis exists.

Funding support

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors’ contributions

OK and JBB conceptualized the systematic review. The other authors helped to refine the conceptualization before submitting the protocol. Database search, screening, data extraction, risk of bias assessment, and data analysis and synthesis were conducted independently by CVR and OK. JBB acted as an arbiter in cases of disagreement. ESR helped with the terminology in the study and contributed his expertise as a senior pathologist throughout the different steps. TR helped with writing and reviewing the manuscript of the review. OK and CVR wrote the first draft and all the other authors helped during the writing, editing, and reviewing process.

Conflicts of interests

The authors declare no conflicts of interests.
5. Pantanowitz L, Farahani N, Parwani A. Whole slide imaging in pathology: advantages, limitations, and emerging perspectives. Pathol Lab Med Int 2015. https://doi.org/10.2147/plmi.S59826.
6. Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the digital pathology association. J Pathol Inform 2019;10:9. https://doi.org/10.4103/jpi.jpi_82_18.
7. Niazi MKK, Parwani AV, Gurcan MN. Digital pathology and artificial intelligence. Lancet Oncol 2019;20(5):e253–e261. https://doi.org/10.1016/s1470-2045(19)30154-8.
8. Garcia-Rojo M, De Mena D, Muriel-Cueto P, Atienza-Cuevas L, Dominguez-Gomez M, Bueno G. New European union regulations related to whole slide image scanners and image analysis software. J Pathol Inform 2019;10:2. https://doi.org/10.4103/jpi.jpi_33_18.
9. Deeks JJTY, Macaskill P, Bossuyt PM. Chapter 5: Understanding test accuracy measures. Draft version (29 October 2021). Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. Cochrane; 2021.
10. FaD Administration. Medical devices; hematology and pathology devices. Classification of Blood Establishment Computer Software and Accessories, 83. ; 2018. p. 23212.0097-6326.
11. Pantanowitz L, Sinard JH, Henricks WH, et al. Validating whole slide imaging for diagnostic purposes in pathology: guideline from the College of American Pathologists Pathology and Laboratory Quality Center. Arch Pathol Lab Med Dec 2013;137(12):1710–1722. https://doi.org/10.5858/arpa.2013-0093-CP.
12. Evaluation of Automatic. Class III Designation for Philips IntelliSite Pathology Solution (PIPS) (FDA). 2017:1–19.
13. Heleno B. Quantification of harms in cancer screening: are numbers available and what do they mean? PhD thesis. Faculty of Health and Medical Sciences, University of Copenhagen. 2015.
14. Rogers WA, Mintzker Y. Casting the net too wide on overdiagnosis: benefits, burdens and non-harmful disease. J Med Ethics Nov 2016;42(11):717–719. https://doi.org/10.1136/medethics-2016-103715.
15. Brodersen J, Schwartz LM, Woloshin S. Overdiagnosis: how cancer screening can turn indolent pathology into illness. APMIS Aug 2014;122(8):683–689. https://doi.org/10.1111/apm.12278.
16. Brodersen J, Schwartz LM, Heneghan C, O’Sullivan JW, Aronson JK, Woloshin S. Overdiagnosis: what it is and what it isn’t. BMJ Evid-Based Med 2018;23(1):1–3. https://doi.org/10.1136/ebmed-2017-110886.
17. Shamseer L, Moher D, Clarke M, et al. Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015: elaboration and explanation. BMJ Jan 2 2015;350:g7647. https://doi.org/10.1136/bmj.g7647.
18. Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 2011;155(8):529–536. https://doi.org/10.7326/0003-4819-155-8-201110180-00009.
19. Uo Bristol. QUADAS2: background document. 2014.
20. Brunye TT, Mercan E, Weaver DL, Elmore JG. Accuracy is in the eyes of the pathologist: the visual interpretive process and diagnostic accuracy with digital whole slide images. J Biomed Inform Feb 2017;66:171–179. https://doi.org/10.1016/j.jbi.2017.01.004.
21. Perez D, Stemmer MN, Khurana KK. Utilization of dynamic telecytopathology for rapid onsite evaluation of touch imprint cytology of needle core biopsy: diagnostic accuracy and pitfalls. Telemed J E Health May 2021;27(5):525–531. https://doi.org/10.1089/tmj.2020.0117.
22. Tissier F, Aubert S, Leteurtre E, et al. Adrenocortical tumors: improving the practice of the Weiss system through virtual microscopy: a National Program of the French Network INCa-COMETE. Am J Surg Pathol 2012;36(8):1194–1201. https://doi.org/10.1097/PAS.0b013e31825a6308.
23. Zoroquiain P, Logan P, Bravo-Filho V, et al. Diagnosing pathological prognostic factors in retinoblastoma: correlation between traditional microscopy and digital slides. Ocul Oncol Pathol Jun 2015;1(4):259–265. https://doi.org/10.1159/000381155.
24. Larghi A, Fornelli A, Lega S, et al. Concordance, intra- and inter-observer agreements between light microscopy and whole slide imaging for samples acquired by EUS in pancreatic solid lesions. Dig Liver Dis Nov 2019;51(11):1574–1579. https://doi.org/10.1016/j.dld.2019.04.019.
25. Ribback S, Flessa S, Gromoll-Bergmann K, Evert M, Dombrowski F. Virtual slide telepathology with scanner systems for intraoperative frozen-section consultation. Pathol Res Pract Jun 2014;210(6):377–382. https://doi.org/10.1016/j.prp.2014.02.007.
26. Tawfik O, Davis M, Dillon S, et al. Whole-slide imaging of pap cellblock preparations is a potentially valid screening method. Acta Cytol 2015;59(2):187–200. https://doi.org/10.1159/000430082.
27. Ammendola S, Bariani E, Eccher A, et al. The histopathological diagnosis of atypical meningioma: glass slide versus whole slide imaging for grading assessment. Virchows Arch Apr 2021;478(4):747–756. https://doi.org/10.1007/s00428-020-02988-1.
28. Tawfik O, Davis M, Dillon S, Tawfik L, Diaz FJ, Fan F. Whole slide imaging of pap cell block preparations versus liquid-based thin-layer cervical cytology: a comparative study evaluating the detection of organisms and nonneoplastic findings. Acta Cytol 2014;58(4):388–397. https://doi.org/10.1159/000365046.
29. Elmore JG, Longton GM, Pepe MS, et al. A randomized study comparing digital imaging to traditional glass slide microscopy for breast biopsy and cancer diagnosis. J Pathol Inform 2017;8:12. https://doi.org/10.4103/2153-3539.201920.
30. Nielsen PS, Lindebjerg J, Rasmussen J, Starklint H, Waldstrom M, Nielsen B. Virtual microscopy: an evaluation of its validity and diagnostic performance in routine histologic diagnosis of skin tumors. Hum Pathol Dec 2010;41(12):1770–1776. https://doi.org/10.1016/j.humpath.2010.05.015.
31. Cima L, Brunelli M, Parwani A, et al. Validation of remote digital frozen sections for cancer and transplant intraoperative services. J Pathol Inform 2018;9:34. https://doi.org/10.4103/jpi.jpi_52_18.
32. Balogh EP, Miller BT. In: Ball JR, ed. Improving Diagnosis in Health Care/Committee on Diagnostic Error in Health Care. Washington (DC): The National Academies Press; 2015.
33. Araujo ALD, Arboleda LPA, Palmier NR, et al. The performance of digital microscopy for primary diagnosis in human pathology: a systematic review. Virchows Arch Mar 2019;474(3):269–287. https://doi.org/10.1007/s00428-018-02519-z.
34. Goacher E, Randell R, Williams B, Treanor D. The diagnostic concordance of whole slide imaging and light microscopy: a systematic review. Arch Pathol Lab Med Jan 2017;141(1):151–161. https://doi.org/10.5858/arpa.2016-0025-RA.
35. Eva KW. What every teacher needs to know about clinical reasoning. Med Educ Jan 2005;39(1):98–106. https://doi.org/10.1111/j.1365-2929.2004.01972.x.
36. Norman GR, Eva KW. Diagnostic error and clinical reasoning. Med Educ Jan 2010;44(1):94–100. https://doi.org/10.1111/j.1365-2923.2009.03507.x.
37. Girolami I, Pantanowitz L, Marletta S, et al. Diagnostic concordance between whole slide imaging and conventional light microscopy in cytopathology: a systematic review. Cancer Cytopathol Jan 2020;128(1):17–28. https://doi.org/10.1002/cncy.22195.
38. Glasziou PP, Jones MA, Pathirana T, Barratt AL, Bell KJ. Estimating the magnitude of cancer overdiagnosis in Australia. Med J Aust Mar 2020;212(4):163–168. https://doi.org/10.5694/mja2.50455.
39. Brodersen J, Voss T, Martiny F, Siersma V, Barratt A, Heleno B. Overdiagnosis of lung cancer with low-dose computed tomography screening: meta-analysis of the randomised clinical trials. Breathe (Sheff) Mar 2020;16(1), 200013. https://doi.org/10.1183/20734735.0013-2020.
40. Mahase E. Cochrane launches new research field to tackle overdiagnosis and medical excess. BMJ Dec 6 2019;367:l6817. https://doi.org/10.1136/bmj.l6817.