
Computer Methods and Programs in Biomedicine 239 (2023) 107631


ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification

Jakub R. Kaczmarzyk a,c,∗, Rajarsi Gupta a, Tahsin M. Kurc a, Shahira Abousamra b, Joel H. Saltz a,1, Peter K. Koo c,1

a Department of Biomedical Informatics, Stony Brook Medicine, 101 Nicolls Rd, Stony Brook, 11794, NY, USA
b Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
c Simons Center for Quantitative Biology, 1 Bungtown Rd, Cold Spring Harbor, 11724, NY, USA

Article history: Received 19 January 2023; Revised 23 April 2023; Accepted 28 May 2023

Keywords: Computational pathology; Histopathology; Deep learning; Benchmarks; Classification

Abstract

Background and Objective: Histopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a light-weight, easy-to-use package for both algorithm developers and biomedical researchers.

Methods: Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop-shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, ChampKit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet-pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model.

Results: The main result of this paper is the ChampKit software. Using ChampKit, we were able to systematically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random initialization, with no clear benefit except in the low data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which is counter to other areas of computer vision.

Conclusions: Choosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.

© 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)


∗ Corresponding author.
E-mail addresses: [email protected] (J.R. Kaczmarzyk), [email protected] (J.H. Saltz), [email protected] (P.K. Koo).
1 Senior authors.

https://doi.org/10.1016/j.cmpb.2023.107631

1. Introduction

Histopathology is the gold standard for cancer diagnosis. Pathologists review tissue slides visually, often alternating among different magnifications and fields of view. However, examining the entirety of a slide at its highest resolution (e.g., 40x) and examining a large number of slides for a study are very challenging tasks, due to the sheer amount of time required. There is growing interest in computer-assisted analysis of histopathology images with deep learning. Physical pathology slides can be digitized at high resolution, in a process known as digital pathology, thanks to advanced tissue scanners. Digital pathology is at its core a big data problem, as these images are large (e.g., 100,000 by 80,000 pixels) and take up several gigabytes of storage.

There is significant interest in developing deep learning computer vision algorithms for digital pathology images in order to label digital pathology images at high resolutions [3–12]. Digital pathology images are too large for conventional deep learning algorithms and hardware, so the images are separated into patches for use with deep learning algorithms. There are several types of computer vision approaches for digital pathology, including slide-level classification, patch-based semantic segmentation, patch-based object detection, and patch-based classification. In this manuscript, we focus exclusively on patch-based classification tasks. Many studies in digital pathology have introduced patch-based deep learning algorithms that seek to characterize digital pathology images. Owing in large part to large-scale, publicly available, multi-cancer projects like The Cancer Genome Atlas, patch classification has been explored in a large number of cancers, including breast [13–15], lung [16–18], pancreas [19–23], and prostate [24,25] cancers, to name a few. Other well-studied applications of digital pathology patch classification include detecting microsatellite instability [6,26–29], tumor cells [30–35], and tumor-infiltrating lymphocytes [36–42].

While deep learning can be a powerful tool, planning a deep learning study in digital pathology is nontrivial [43]. There are several challenges, including how to choose the best deep learning model for the task, how to choose hyperparameters of models, and how to curate training and evaluation data. The choice of models is a high-dimensional problem. There are hundreds of neural network architectures to choose from, and within each there are hyperparameters that can greatly affect classification performance [43]. Model repositories exist that facilitate the use of neural networks pretrained on common natural image datasets like ImageNet. One such repository is timm [1], which contains dozens of model architectures and hundreds of sets of pretrained weights. Other examples include the PyTorch and TensorFlow model hubs. However, there is an unmet need for digital pathology, because these repositories do not contain pathology-specific models, though models trained on pathology data using self-supervision have recently come online [44–46]. Tools also exist to run benchmarking experiments [47–49], but these do not curate data specific to digital pathology.

There are several sources of public digital pathology data, and it would be greatly beneficial to the digital pathology community to curate these data in one location and prepare them for immediate use with deep learning workflows. Although the pieces required to evaluate patch-based digital pathology models exist as part of separate studies, a user would have to perform a complex process to curate relevant data, prepare reference implementations of neural network architectures, download pretrained weights, and implement relevant performance metrics. A better solution is a toolkit that streamlines these components for researchers [43]. The resulting framework should also be reproducible and simple for users to run, to mitigate any potential difference in results from differences in how the code was run. Finally, while there has been progress in developing such a toolkit to evaluate slide-level classification [50], there does not yet exist a reproducible, extensible toolkit coupled with a comprehensive set of benchmark datasets for patch-level analysis.

To address these unmet needs in digital pathology, we introduce ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit), an easy-to-use toolkit that focuses on the evaluation of neural network models for computational pathology image analysis (Fig. 1). The target users of ChampKit are (1) methods research groups interested in systematically and quickly evaluating their deep learning methods against a set of state-of-the-art (SOTA) methods with different pretraining and transfer learning configurations, and (2) biomedical research groups interested in finding and fine-tuning the best models to analyze a collection of whole slide images. Both research communities would benefit from the ability to quickly and systematically evaluate a set of SOTA (pre-trained) methods, and ChampKit enables these two use cases in an easy-to-use package. In Section 4, we recount how our group has used ChampKit to identify optimal models for Gleason grade classification. Importantly, ChampKit complements existing tools. It makes use of the timm model repository and extends it with the addition of pathology-specific pretrained models. Transfer learning has shown inconsistent benefits in digital pathology [51–53], so one use of ChampKit could be to comprehensively characterize the impact of transfer learning across models and pathology datasets. ChampKit also curates multiple public datasets for patch-based classification. The toolkit and datasets adhere to FAIR principles [54,55], which enhances the reusability and reproducibility of this work. ChampKit is meant to simplify the training and evaluation of deep learning models for patch-based classification.

The main contribution of this paper is the ChampKit software. To demonstrate the utility of ChampKit, we perform a study that establishes baseline performance of several existing models across six diverse classification tasks across various cancers through publicly available datasets curated by the toolkit.

1.1. Related work

Benchmarking of deep neural networks is a well-studied field, and multiple solutions exist in this space. BIAFLOWS is a benchmarking and deployment platform for microscopy image workflows. It supports many problem types, including object segmentation and particle tracking [56], and it implements metrics specific to each supported problem type. OpenML is another platform that allows for the creation and sharing of machine learning benchmarks and emphasizes community contributions of benchmark results [57]. Ludwig is a benchmarking toolkit that enables configurable, personalized benchmarking [49]. ShinyLearner is a tool to benchmark classification of tabular data [58]. Weights and Biases [2] provides many methods for hyperparameter search, experiment logging, and experiment visualization, all of which are useful for benchmarking. The Python package timm [1] allows easy access to hundreds of deep learning models for image classification, many of which are pretrained on ImageNet [59], and thus can facilitate analysis across many models. To our knowledge, however, there are few benchmarking solutions designed specifically for histopathology. The work of Laleh et al. [50], for example, provides a benchmark for weakly-supervised, specimen-level classification in digital pathology. These related projects have inspired the development of ChampKit as a benchmarking toolkit specific to patch-based classification in digital pathology.


Fig. 1. Overview of ChampKit. ChampKit enables systematic comparisons of model architectures and transfer learning across several patch-based image classification datasets. First, users select a model and pretrained weights from those available in timm [1] or a custom model with specification of pretrained weights or random initialization. Second, the models are trained on multiple tasks using identical training hyperparameters. Third, the trained models are evaluated on held-out test data for each task. Performance is tracked with Weights and Biases [2].

2. Materials and methods

2.1. ChampKit

ChampKit is a one-stop-shop for systematic exploration of deep learning architectures, training strategies, and transfer learning across patch-level histopathology classification tasks. Research in deep learning architectures moves quickly, so ChampKit integrates the timm [1] model repository to provide access to hundreds of (pretrained) deep learning models, enabling evaluation across different transfer learning schemes from ImageNet [59] or from self-supervision on histopathology images [46,60] or training from scratch. ChampKit enables rapid evaluation of hyperparameters using Weights & Biases [2] and neural network architectures through timm [1] and torchvision [61]. Users can incorporate their own datasets and models as well. Importantly, ChampKit does not require any coding — users can use configuration files and command line programs to run training and evaluation. Detailed instructions to use and extend ChampKit are available in the GitHub repository.¹

Toolkit There are two main use cases for ChampKit: 1) to identify an optimal classification model for one's own patch classification dataset, and 2) to benchmark models across multiple datasets, including those that are downloaded with ChampKit. For both use cases, the toolkit includes scripts to perform an end-to-end analysis: prepare datasets, download pretrained models, train models, and evaluate them on held-out test data. Weights and Biases [2] is used to log experiment parameters, visualize results across multiple models and training configurations, and orchestrate hyperparameter searches. Three hyperparameter search strategies are currently available via Weights and Biases; from most to least efficient, they are Bayesian, random, and grid search [62,63]. Users can design their own hyperparameter searches to find the optimal model for their own dataset, and the search configurations included in our code repository can be used as a starting point. In the following sections, we discuss how one can identify optimal models, use their own datasets, and use other models with ChampKit.

Identifying optimal models ChampKit includes a Python script to evaluate trained models on unseen data. This script calculates various performance metrics, including area under the receiver operating characteristic curve (AUROC), accuracy, F1 score, and a confusion matrix, and it also saves plots of these values across models. These evaluations support binary and multi-class classification tasks and will show the performance results per class. This is especially helpful in unbalanced datasets, so that one can determine whether the trained models perform well on the underrepresented class(es). The evaluation script then identifies the best model per evaluation metric and prints these results. The evaluation scores for each model are saved to a spreadsheet so that one can perform further processing if desired. Using this tool in ChampKit, one can choose the optimal model for one's patch classification dataset.

Datasets ChampKit has currently curated six patch-based image classification tasks: (1) tumor (versus no tumor) classification, (2) tumor-infiltrating lymphocyte detection, (3–5) microsatellite instability detection across different cancers and/or preparations, and (6) precancerous versus benign classification. These datasets represent a wide variety of tissue types, cancers, tasks, and sample sizes (see Table S1). This diversity represents a starting point for benchmarking patch classification in digital pathology and enables users to explore models that might generalize across these datasets or be ideal for certain data characteristics. In addition to serving as built-in datasets for exploration, these datasets serve to demonstrate the capabilities of ChampKit as a benchmarking toolkit. In Section 3, we describe the performance of several models on each dataset. ChampKit includes reproducible scripts to download all datasets, with the exception of the MHIST dataset [64], which requires completing an online form (an automated email is then sent with a download URL). The datasets for each task are curated from different studies [64–68] and were selected using the following guidelines: (1) designed for patch-based classification, (2) accessible without the need to make an online account, (3) dataset is versioned, (4) sufficient size (at least a few thousand images), (5) diversity of tasks, and (6) in our judgement, important to the biomedical community.

¹ https://github.com/SBU-BMI/champkit


Five of the six datasets are hosted on Zenodo, which stores static copies of the data. If these datasets are updated, the updated datasets can easily be incorporated into ChampKit. Each dataset was split into training, validation, and testing partitions (see Table S1). Models were trained on the training set, and the validation set was used to evaluate performance at the end of each epoch. Final models were chosen based on performance on the validation set. The test sets were used for final evaluation, and all results are reported on the test set. All data are downloaded when the user initializes the ChampKit repository, accessible without any requirements for registration.

Using a new dataset Beyond the six benchmark datasets, ChampKit allows for integration with custom datasets that are framed as binary or multi-class patch classification. Datasets must be organized in an ImageNet-like directory structure, in which the top-level directory contains directories train, val, and test, and each of those contains sub-directories of the different classes (e.g., tumor-positive, tumor-negative) with the corresponding patches. Images can be of any size and will be resized during training and evaluation – the size is configurable. Indeed, we demonstrate the creation and use of a new dataset in Section 3.5.

Preparing a dataset from annotated whole slide images ChampKit expects a dataset of patches in PNG or JPEG format. If one has a set of annotated whole slide images, where the annotations indicate regions in the slide that belong to a label (e.g., "normal epithelium"), then existing tools can be used to export patches from these labeled regions. These tools include QuPath [69], OpenSlide [70], and Large Image [71]. When exporting patches, users should consider the physical resolution of the patches and should use a consistent resolution for all patches. For example, a user may choose to extract patches of 224 × 224 pixels at 0.5 μm per pixel. Patches are also assumed to have mutually exclusive labels, so it is important to ensure that a patch is not a member of two annotated regions with different labels. Next, the extracted patches must be split into training, validation, and test sets (to avoid data leakage, patches from any single specimen or patient should not be a member of multiple subsets). As described in the preceding paragraph, all images that belong to a particular class should be placed in a class-specific directory. If, for example, the task is binary classification between "normal epithelium" and "neoplastic epithelium", the directories for the training set would be named train/normal_epithelium/ and train/neoplastic_epithelium/, and the directories would contain all training images of normal epithelium and neoplastic epithelium. This would be repeated for the validation and test subsets. After the dataset is prepared in this fashion, one can use ChampKit to identify optimal classification models for the dataset. Section 3.5 describes the creation of a patch classification dataset from annotated whole slide images in the PANDA dataset of prostate biopsies [72].

Using custom models While ChampKit provides access to a wide range of deep learning architectures and pre-trained models via timm [1], it may still be desirable to tweak an existing model, employ a custom model, or use pre-trained weights from pathology-specific models. Image classification models specific to pathology are being published regularly, and a user may want to apply these models to their own datasets. In many cases, pathology-specific models use architectures found in timm, but the weights will be specific to a pathology dataset. In that case, one can provide the name of the network architecture, the image normalization parameters, and a path to the pre-trained weights (an example of this is provided in the ChampKit code repository). If the network is not one found in timm, one can include the PyTorch [73] implementation of the architecture as well as image normalization parameters. The ChampKit code repository includes an example of using pretrained weights from a pathology-specific model [60], and we report results using this model in this manuscript. The self-supervised model uses the ResNet18 architecture and was trained on a large histopathology dataset in a self-supervised fashion. ChampKit itself does not support pretraining of models, but it does leverage pretrained models from published literature. Future versions of ChampKit may include other pathology-specific models, and the authors welcome community contributions of these models, which can be made through a pull request to our GitHub repository.

Criteria for reproducibility We strive to achieve the silver standard of reproducibility as defined by [74]. The silver standard has three criteria:

1. Software dependencies are all prepared with a single command,
2. Documentation of how to run scripts and in which order,
3. Random elements must be made deterministic.

ChampKit satisfies all three requirements. All dependencies are installed using the conda Python environment manager, and datasets are downloaded with a single command. The README of the source code includes extensive documentation of how to run the training and evaluation scripts, and the random number generators used during training are seeded, so that with the same seed, results will be identical or almost identical. Dataset preparation is also designed to be reproducible, meaning that when one initializes ChampKit, one is guaranteed to have exactly the same copy of the data as was reported in this manuscript. This is accomplished in two ways: 1) five of the six datasets in ChampKit are hosted on Zenodo, which preserves immutable versions of data; and 2) the MD5 hashes of the downloaded data are validated at download time to ensure that the user has the intended version of the dataset. In some cases, the downloaded dataset is minimally curated, including moving image patches into an appropriate directory structure and splitting the dataset into training, validation, and test splits. These splits are done with seeded pseudorandom number generators so that the results are deterministic. Data download and curation is all done automatically when one initializes the ChampKit repository with the setup.sh script.

2.2. Experiments

The main purpose of this work is to introduce ChampKit as a toolkit to rapidly evaluate deep neural networks on histopathology patch classification tasks. To demonstrate the utility of ChampKit, we systematically trained several models commonly used in histopathology, namely ResNet18 and ResNet50 [75], to establish baseline performance for each benchmark task. We also include a hybrid vision transformer (R26-ViT) [76], which has shown promise at image classification in smaller data regimes. All models included comparisons of transfer learning from image classification on ImageNet-1K [59] and training from random weight initialization. ResNet18 comparisons also included transfer learning from a model trained using self-supervision on histopathology data [46,60]. (ChampKit is designed to benchmark existing architectures from existing weights, and thus it does not support pretraining of models.) All models were implemented in timm using PyTorch, and pretrained ImageNet weights were accessed via timm, while the weights from self-supervised pre-training for ResNet18 were downloaded from [60]. The AdamW optimizer was used with a learning rate of 0.001 [77], and models optimized cross entropy loss. The learning rate was warmed up from 1e-6 to 0.001 over the first 10 epochs, followed by cosine decay set for 500 epochs. Early stopping was enabled after 30 epochs based on validation cross entropy loss with a patience of 20 epochs. All models used a dropout rate of 0.3. Each experiment was run on a single NVIDIA Quadro RTX 8000 GPU with 48GB of video memory.
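For illustration, the training configuration just described maps onto timm and PyTorch roughly as follows. This is a sketch, not ChampKit's actual training script; the choice of resnet18, the binary classification head, and the exact scheduler arguments are our assumptions based on the description above.

import timm
import torch
from timm.scheduler import CosineLRScheduler

# Build a classifier from timm with ImageNet weights and dropout, as described above.
model = timm.create_model("resnet18", pretrained=True, num_classes=2, drop_rate=0.3)

# AdamW with a base learning rate of 0.001 and a cross-entropy objective.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# Cosine decay over 500 epochs with a 10-epoch warmup from 1e-6 to 1e-3.
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=500,       # length of the cosine schedule, in epochs
    warmup_t=10,         # warmup epochs
    warmup_lr_init=1e-6,
    lr_min=0.0,
)

for epoch in range(500):
    # ... forward/backward passes over the training loader go here ...
    scheduler.step(epoch + 1)  # step once per epoch

Early stopping on validation loss, as used in the experiments, would be handled separately in the training loop.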


Data processing RGB channels were normalized with means (0.485, 0.456, 0.406) and standard deviations (0.229, 0.224, 0.225); these values are the means and standard deviations of the ImageNet training set. When using the R26-ViT or self-supervised ResNet18, images were transformed to the range [−1, 1] to match the normalization used in the pretraining. Images were resized to 224 × 224 with bicubic interpolation for R26-ViT and ResNet50, and bilinear interpolation for ResNet18. During training, images were randomly flipped horizontally or vertically.
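As an illustration, the preprocessing described above for an ImageNet-normalized model such as ResNet50 corresponds roughly to the following torchvision pipeline. This is a sketch of one branch only and is not taken from the ChampKit source; the interpolation and flip settings simply restate the values given above.

from torchvision import transforms
from torchvision.transforms import InterpolationMode

# ImageNet statistics used to normalize RGB channels.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Training-time pipeline: resize with bicubic interpolation, random flips, normalize.
train_transform = transforms.Compose([
    transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Evaluation uses the same resizing and normalization without augmentation.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])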
Evaluation Final models were chosen based on lowest cross entropy loss on the validation set. AUROC, F1-score (threshold=0.5), false negative rate, false positive rate, true negative rate, and true positive rate were calculated using torchmetrics [78]. ChampKit includes a script to perform this evaluation automatically using a single command (see Section 2.1).
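A minimal sketch of such an evaluation step with torchmetrics, assuming binary-task metrics and predicted probabilities for the positive class, is shown below; it is illustrative rather than ChampKit's evaluation script, and the rate calculations from the confusion matrix are our own shorthand.

import torch
import torchmetrics

# Binary-task metrics matching those reported in this work.
auroc = torchmetrics.AUROC(task="binary")
f1 = torchmetrics.F1Score(task="binary", threshold=0.5)
confmat = torchmetrics.ConfusionMatrix(task="binary")

# `probs` are predicted probabilities for the positive class;
# `targets` are the true 0/1 labels for the held-out test set.
probs = torch.tensor([0.9, 0.2, 0.7, 0.4])
targets = torch.tensor([1, 0, 1, 1])

print("AUROC:", auroc(probs, targets).item())
print("F1:", f1(probs, targets).item())

# Derive FNR, FPR, TNR, and TPR from the 2x2 confusion matrix [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = confmat(probs, targets).tolist()
print("FNR:", fn / (fn + tp))
print("FPR:", fp / (fp + tn))
print("TNR:", tn / (tn + fp))
print("TPR:", tp / (tp + fn))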

3. Results

The main result of this work is the ChampKit software. ChampKit is a toolkit that enables users to evaluate many deep learning architectures on histopathology patch classification tasks. The software is reproducible, curates datasets automatically, and is easy to extend with custom datasets and models. In our experiments, ChampKit facilitated the assessment of three deep neural network architectures across six datasets, including comparing training from randomly initialized weights and transfer learning. Results for all tasks are available in Figures S1, S2, and S3. In the following subsections, we report AUROC and false negative rate, as these metrics are the most suitable for these tasks. Additional metrics are reported in Supplementary Tables S2–S7.

3.1. Task 1: Identification of areas containing tumor cells

Identification of areas with tumor cells is critical in clinical histopathology. Tumor cells can have varied appearances and can be challenging to detect. In particular, small nests of tumor cells (<100 cells) might be difficult to detect, and this is one case where automated deep learning algorithms can be highly useful. This is especially true in sentinel lymph node biopsies, which are performed to determine whether cells from the primary tumor have metastasized. False negatives are unacceptable in this situation, and so deep learning methods for this task must be rigorously evaluated. We have included detection of areas containing tumor cells as task 1 in our benchmark because of its clinical importance [79] and already wide-spread application in deep learning.

Dataset The PatchCamelyon dataset [68] is a processed and curated version of the Camelyon16 dataset [80] containing 327,680 tumor and non-tumor images at 96 × 96 pixels (10× magnification) from sentinel lymph node biopsies of breast cancer (Fig. 1, Fig. 2 and Table S1). An image is positively labeled if the center 32 × 32 pixel region contains at least one pixel of tumor. The PatchCamelyon dataset is licensed under Creative Commons Zero v1.0 Universal and is anonymized. PatchCamelyon is available for download [81] via Zenodo and is downloaded automatically when the ChampKit repository is initialized.

Fig. 2. Sample tumor negative and tumor positive images from PatchCamelyon dataset [68]. The yellow arrows point to examples of tumor cells. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 1
Results on task 1. AUROC = area under the receiver operating characteristic curve. FNR = false negative rate. Values shown are means across three runs.

Model     Pretraining  AUROC  FNR
ResNet18  None         0.929  0.243
ResNet18  ImageNet     0.933  0.250
ResNet18  SSL          0.943  0.239
ResNet50  None         0.921  0.249
ResNet50  ImageNet     0.943  0.217
R26-ViT   None         0.943  0.271
R26-ViT   ImageNet     0.962  0.191

Results For task 1, most models performed well according to AUROC (Tables 1 and S2). Because false negatives are an unwanted outcome for tumor classification, the FNR is a more relevant metric, and R26-ViT pretrained on ImageNet had the lowest overall FNR and the highest overall AUROC. ImageNet pretraining improved performance in R26-ViT and ResNet50, but pretraining did not improve ResNet18.

3.2. Task 2: Identification of areas containing tumor-infiltrating lymphocytes

Tumor-infiltrating lymphocytes (TILs) are clinically useful as prognostic biomarkers, related to the degree of immune response against a cancer. TIL quantification is important for predicting survival outcomes and guiding treatment decisions [82–85]. TILs tend to be 8–12 μm in diameter with a dark, ovoid nucleus and scant cytoplasm [86]. Despite the subtle qualitative differences of TILs across image patches, pathologists can identify TILs through visual inspection. However, in practice, they tend to characterize only a small number of microscopic fields of view [79]. More detailed prognostic patterns can be made by mapping TILs at a whole-slide level [87]. Thus, it would be greatly beneficial to clinicians to identify areas that contain TILs across histopathology slides [36,37,84,88]. Deep learning has the potential to address major drawbacks of manual TIL scoring: inter-observer variability and the scalability of TIL detection. In response, there has been much interest in applying deep neural networks to this task [36–38,83,89,90]. Thus, task 2 consists of pan-cancer identification of regions containing TILs because of its tremendous clinical relevance and popularity in deep learning.

Dataset The dataset for task 2 consists of 304,097 TIL-positive and TIL-negative images from [65], a curated subset of the data presented in [36,37] (Fig. 3 and Table S1). This dataset includes 23 different cancer types from The Cancer Genome Atlas (TCGA) [91], representing a wide distribution of tissue types and stain differences. Patches are from formalin-fixed, paraffin-embedded (FFPE) whole slide images. Images are 100 × 100 pixels at 0.5 μm/pixel and are positive if they contain at least two TILs. No stain normalization was applied to the images. The data is licensed under Creative Commons Attribution 4.0 International. Images are anonymized, and there is no overlap in TCGA participants across data splits. This dataset is available for download via Zenodo


[65] and is downloaded automatically when the user initializes ChampKit.

Fig. 3. Sample images from TILs dataset [65]. The yellow arrows point to examples of TILs. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2
Results on task 2 (TIL detection). Values shown are means across three runs.

Model     Pretraining  AUROC  FNR
ResNet18  None         0.970  0.252
ResNet18  ImageNet     0.969  0.256
ResNet18  SSL          0.967  0.275
ResNet50  None         0.969  0.241
ResNet50  ImageNet     0.968  0.253
R26-ViT   None         0.943  0.464
R26-ViT   ImageNet     0.974  0.246

Results In general, all of the tested models do well on task 2 (Tables 2 and S3). ResNet18 and ResNet50 are relatively consistent across pretraining strategies, though SSL pretraining resulted in slightly worse performance. The randomly initialized R26-ViT performed most poorly and had a large spread in AUROC and FNR across three runs. However, pretraining on ImageNet brought performance of R26-ViT in line with that of the ResNets, and indeed this model was best based on AUROC and FNR.

3.3. Tasks 3–5: Microsatellite instability detection

Microsatellite instability (MSI) is an important prognostic clinical biomarker and has generated strong interest in recent years. MSI causes an abundance of DNA mutations and the formation of neoantigens, which activate the immune system, and causes changes in tissue morphology [86,92,93]. MSI is a useful clinical biomarker and is an indicator for PD-1/PD-L1 blocking therapies, like pembrolizumab [94–98]. [99] recently found that their PD-1-blocking therapy led to remission in all 18 study participants. If a pathologist suspects an MSI phenotype, the standard of care is to conduct confirmatory molecular testing. Previously, [26] found that they could potentially avoid the time and cost of molecular testing by detecting MSI directly from histopathology. Many similar studies have been conducted [6,29,100–104], highlighting the importance of and excitement around MSI. We have included MSI detection in different cancer types and tissue preparations as tasks 3–5 because of the strong interest in predicting MSI from histopathology and the clinical relevance of MSI.

Dataset MSI data was curated from [26], which includes images from formalin-fixed, paraffin-embedded samples of colorectal carcinoma (CRC) and stomach adenocarcinoma (STAD) [66] and images from frozen samples of CRC [67]. All images are 224 × 224 pixels at 0.5 μm/pixel (Fig. 4 and Table S1). These datasets are publicly available and licensed under Creative Commons Attribution 4.0 International, and all images are anonymized. These datasets are available for immediate download via Zenodo (task 3 [66], task 4 [67], and task 5 [66]), and ChampKit automatically downloads and prepares these datasets.

Fig. 4. Sample images from MSI datasets. Histologic features of MSI include poorly differentiated cells, signet ring cells, mucinous histopathology, cribriforming, and lymphocytic infiltrate [92]. FFPE and frozen are two different types of tissue preparations.

Results Overall, MSI detection was the most difficult task included in ChampKit (Table 3). In Task 3, the AUROC and FNR of the ResNets were consistent across pretraining strategies (Table S4). Randomly initialized R26-ViT had the worst AUROC but was improved significantly by ImageNet pretraining. The FNR of this model was highly variable across the three runs but was made more consistent with ImageNet pretraining. In Task 4, interestingly, the randomly initialized R26-ViT had the worst AUROC but the best FNR, and pretraining with ImageNet significantly improved the AUROC but worsened the FNR (Table S5). ImageNet pretraining worsened AUROC for the ResNets, and SSL pretraining resulted in the poorest performance for ResNet18. In Task 5, ImageNet pretraining only helped ResNet50. For all other models, pretraining resulted in worse AUROC and FNR (Table S6).

3.4. Task 6: Precancerous versus benign

Colonoscopies are an important screening test for colorectal carcinoma. Polyps are commonly found during the procedure [105], and these polyps can be benign, precancerous, or cancerous. It is critical to correctly classify a benign polyp from one with cancerous potential because cancerous polyps might indicate need for additional treatment, but distinguishing between these remains challenging [106]. False negatives are unacceptable in this task, and as such, there is significant interest in using deep learning to robustly detect precancerous polyps [107–109]. Due to the clinical importance of detecting precancerous colorectal polyps and the growing interest in applying deep learning to this problem, we elected to use the MHIST dataset [64] for task 6.

Dataset This dataset includes images of hyperplastic polyps and sessile serrated adenomas (Fig. 5). Hyperplasia is a benign overgrowth of cells, and an adenoma is a precancerous, low-grade disordered growth of cells. MHIST consists of 3,152 images of colorectal polyps. The images were labeled as hyperplastic or adenomas by seven pathologists, and a binary classification is made by majority

vote. All images are 224 × 224 pixels at 8x magnification and are deidentified. The MHIST dataset is the smallest dataset included in ChampKit (Table S1), and this provides a useful test of how well different models and pretraining strategies cope with a small data regime. To access the dataset, one must complete an online form accepting a dataset use agreement. The user should then receive an automated email with a link to download the dataset. Once the dataset is downloaded, ChampKit can be used to prepare the dataset and train and evaluate models on the data.

Table 3
Tasks 3–5 results. MSI detection in colorectal carcinoma (CRC) and stomach adenocarcinoma (STAD) with FFPE or frozen preparations. Values shown are means across three runs.

                       Task 3 (CRC — FFPE)   Task 4 (CRC — frozen)   Task 5 (STAD — FFPE)
Model     Pretraining  AUROC  FNR            AUROC  FNR              AUROC  FNR
ResNet18  None         0.661  0.628          0.708  0.632            0.710  0.555
ResNet18  ImageNet     0.666  0.665          0.698  0.657            0.693  0.582
ResNet18  SSL          0.667  0.652          0.667  0.683            0.696  0.571
ResNet50  None         0.667  0.619          0.701  0.674            0.705  0.550
ResNet50  ImageNet     0.668  0.646          0.689  0.677            0.728  0.520
R26-ViT   None         0.531  0.539          0.667  0.407            0.718  0.467
R26-ViT   ImageNet     0.676  0.697          0.732  0.656            0.709  0.586

Fig. 5. Sample images from MHIST dataset [64]. Colonic crypts are outlined in yellow. Please note the difference in appearance between hyperplasia and adenoma. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 4
Results on task 6 (precancerous versus benign). Values shown are means across three runs.

Model     Pretraining  AUROC  FNR
ResNet18  None         0.885  0.358
ResNet18  ImageNet     0.919  0.288
ResNet18  SSL          0.906  0.298
ResNet50  None         0.823  0.596
ResNet50  ImageNet     0.921  0.220
R26-ViT   None         0.770  0.608
R26-ViT   ImageNet     0.934  0.176

Results Unlike in the previous tasks, pretraining dramatically improved performance across all models (Tables 4 and S7), consistent with the original MHIST publication [64]. Randomly initialized R26-ViT had the worst AUROC and FNR of all models, but the ImageNet-pretrained model had the best performance overall. ResNet50 was similar in performance to R26-ViT. ResNet18 was best among the randomly initialized models, consistent with [51]. We speculate that pretraining was especially important here because of the small dataset size. Pretraining might provide useful initializations for other small datasets. SSL pretraining, however, did not provide improvements over ImageNet pretraining.

Hyperparameter search To demonstrate an evaluation of hyperparameters with ChampKit, we conducted a grid search using the ResNet18 model on Task 6, searching over the following parameters: ImageNet-pretrained (yes, no), learning rate (0.01, 0.001, 0.0001), batch size (16, 32, 64, 128), optimizer (Adam, AdamW, SGD), freeze encoder (yes, no), augmentation (yes, no; augmentation is horizontal and vertical flipping each with probabilities of 0.5), and training with automatic mixed precision (yes, no). "Freeze encoder" means that the original representation learning layers are frozen and the appended multilayer perceptron is trainable. Of the explored models, the AUROC ranged from 0.445 to 0.938, and F1 score ranged from 0.0 to 0.825 (Fig. S4). This demonstrates the effectiveness of using ChampKit to aid in hyperparameter search.
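For reference, a grid search like the one above can be expressed as a Weights and Biases sweep configuration. The snippet below is only an illustration of that idea, not the configuration shipped with ChampKit; the parameter and metric names are placeholders that a training function would need to consume.

import wandb

# Grid sweep over the hyperparameters explored for Task 6 (illustrative only).
sweep_config = {
    "method": "grid",  # Weights and Biases also supports "random" and "bayes"
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "pretrained": {"values": [True, False]},
        "learning_rate": {"values": [0.01, 0.001, 0.0001]},
        "batch_size": {"values": [16, 32, 64, 128]},
        "optimizer": {"values": ["adam", "adamw", "sgd"]},
        "freeze_encoder": {"values": [True, False]},
        "augmentation": {"values": [True, False]},
        "amp": {"values": [True, False]},
    },
}

# Register the sweep; an agent would then run a user-provided train() function
# once per grid point, logging results to the same project used for tracking.
sweep_id = wandb.sweep(sweep_config, project="champkit-task6-sweep")
# wandb.agent(sweep_id, function=train)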


3.5. Identifying an optimal model for multi-class patch classification from annotated whole slides

To demonstrate the use of ChampKit with annotated slides and multi-class classification, we evaluated models on patches sampled from the PANDA dataset [72]. This dataset contains over 10,000 annotated biopsy images and uses the CC BY-SA-NC 4.0 license. In about half of the dataset, regions of benign epithelium and different Gleason grades are segmented (Gleason 3, 4, and 5). The biopsy images and annotations are stored as multi-resolution TIFF images (Fig. 6a). We created a patch classification dataset using a subset of the annotated PANDA slides. A limitation of the PANDA dataset is that the Gleason segmentations are noisy, and this presents challenges when attempting to assign labels to extracted patches. We have chosen not to include the patched PANDA dataset in the ChampKit repository at this point because the noisy labels warrant further cleaning of the data. Despite the limitations, this section demonstrates 1) how one can leverage annotated whole slide images and 2) benchmark multi-class patch classification models with ChampKit.

Fig. 6. A sample prostate biopsy and extracted patches from the PANDA dataset [72]. This dataset includes over 10,000 biopsies and many have semantic segmentations of Gleason 3, 4, and 5, as well as normal epithelium. The annotations are noisy, as they were created using other machine learning models. For this reason, the patch labels are noisy too.

Extracting patches from annotated whole slides In Section 2.1, we described how one can prepare a patch classification dataset from annotated whole slides, and we listed several tools. We opted to use Large Image [71] in a Python script to extract patches of 128 × 128 pixels at 0.5 μm/pixel. In essence, we iterated through all non-overlapping patches in an annotation image, and we kept a patch if it contained more than 10% of a label (not including background) and only one label (normal epithelium, Gleason 3, Gleason 4, or Gleason 5). The histology patch was then extracted from the associated biopsy slide (using the same coordinates as the patch in the mask image) and was saved as a PNG file in a label-specific directory. This was done for over 5,000 biopsies in the PANDA dataset and resulted in a total of 1,621,011 benign, 2,220 Gleason 3, 1,722 Gleason 4, and 401 Gleason 5 patches. We then removed low contrast images, because low contrast could have indicated glass or another area that we did not intend to sample. The dataset was highly unbalanced at this point. To remedy this, we merged the Gleason 4 and Gleason 5 labels into a single class, "Gleason 4 or 5", and we randomly sampled 5,000 benign patches. Please see Fig. 6b for a sample of extracted patches and Table S8 for the final dataset size.

Table 5
Classification results for the patched PANDA dataset. The classes in this dataset are "benign", "Gleason 3", and "Gleason 4 or 5". Values shown are the performance for the indicated class and are means across three runs. The labels for this dataset are noisy, which could explain the high false negative rates.

                       Benign         Gleason 3      Gleason 4 or 5
Model     Pretraining  AUROC  FNR     AUROC  FNR     AUROC  FNR
ResNet18  None         0.791  0.048   0.619  1.000   0.865  0.418
ResNet18  ImageNet     0.847  0.097   0.767  0.636   0.921  0.358
ResNet18  SSL          0.853  0.095   0.776  0.660   0.919  0.331
ResNet50  None         0.796  0.044   0.626  0.998   0.870  0.414
ResNet50  ImageNet     0.835  0.076   0.769  0.695   0.906  0.366
R26-ViT   None         0.802  0.031   0.660  0.996   0.890  0.424
R26-ViT   ImageNet     0.836  0.085   0.766  0.648   0.910  0.383

Results We performed the same experiment as in Tasks 1–6: ResNet18, ResNet50, and R26-ViT models were trained with or without transfer learning, and evaluation metrics were calculated on a held-out test set. All experimental settings were the same as in the previous tasks except the number of training epochs. Models were trained for a total of 50 epochs. Notably, this task uses three classes, whereas Tasks 1–6 are binary classifications. ResNet18 models pretrained with self-supervised learning or ImageNet performed best overall (Tables 5, S9–S11). Models trained from random initialization performed markedly worse than transfer learning, often obtaining false negative rates close to 1.0 in Gleason 3 classification. This is consistent with results on Task 6 (MHIST), a similarly small dataset, and further suggests that transfer learning is beneficial in small data regimes. Classification performance (specifically false negative rate) varied widely among the three classes, reinforcing the need to evaluate the class-specific performance of models.
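To make the extraction step described in this section concrete, the following is a rough sketch of the per-slide loop. It is not the authors' script: it uses OpenSlide rather than Large Image, it assumes the label mask is available as a NumPy array of integer class codes aligned with level 0 of the slide, and the label codes, helper name, and output layout are hypothetical.

import numpy as np
import openslide

PATCH = 128  # patch size in pixels (at the target resolution of 0.5 um/pixel)
LABELS = {1: "benign", 2: "gleason3", 3: "gleason4", 4: "gleason5"}  # assumed codes

def extract_patches(slide_path, mask, out_dir):
    """Walk non-overlapping patches of a label mask and save qualifying patches.

    `mask` is a 2D integer array aligned with level 0 of the slide, where 0 is
    background and other values are class codes. Class sub-directories of
    `out_dir` are assumed to already exist.
    """
    slide = openslide.OpenSlide(slide_path)
    for y in range(0, mask.shape[0] - PATCH + 1, PATCH):
        for x in range(0, mask.shape[1] - PATCH + 1, PATCH):
            tile = mask[y:y + PATCH, x:x + PATCH]
            fg = tile[tile > 0]
            # Keep a patch only if >10% of it is labeled and only one label is present.
            if fg.size <= 0.10 * PATCH * PATCH or np.unique(fg).size != 1:
                continue
            label = LABELS.get(int(fg[0]))
            if label is None:
                continue
            # Read the matching histology region from level 0 of the slide.
            region = slide.read_region((x, y), 0, (PATCH, PATCH)).convert("RGB")
            region.save(f"{out_dir}/{label}/{x}_{y}.png")

In practice the authors used Large Image for this step and additionally removed low-contrast patches afterward, as described above.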


4. Discussion and conclusion

Here we introduce ChampKit, a reproducible toolkit for patch-based image classification in histopathology. We use ChampKit to provide baseline results for multiple models on six histopathology datasets. We found that transfer learning can improve classification performance, but this is not consistent across tasks. It remains unclear whether the scale of the histopathology features (i.e., magnification) plays a role in being amenable to transfer learning based on models pretrained on natural images. ChampKit enables the systematic evaluation of transfer learning on patch-based image classification, and we hope that it will greatly advance the knowledge of transfer learning and modeling innovations in histopathology. ChampKit also allows users to identify optimal classification models for their own datasets. Our own group has used this tool to accelerate our own research. Specifically, we used it to identify the best multi-class Gleason grade classification model for a private dataset of prostate whole slide images. We trained 33 models in a hyperparameter search across several model architectures and data regularization techniques, which took approximately three days to complete. We then identified the best model using the evaluation script in ChampKit, and we applied this model to whole slides using WSInfer [110] to obtain whole-specimen maps of Gleason grade for further analysis. The combination of ChampKit and WSInfer accelerated our work and can be explored further in future work.

A major goal of ChampKit is to expedite a user's search for an ideal classification model on their digital pathology dataset. This work makes use of two main components, namely timm (to provide pre-trained models and training methods) and Weights and Biases (for hyperparameter search, experiment logging, and browser-based visualization of results). Other benchmarking toolkits exist, as discussed in Section 1.1, but to our knowledge, no toolkit exists specifically for patch-based classification in digital pathology. Compared to existing tools, we do not expect ChampKit to be more efficient in hyperparameter search or model training. In fact, we use existing tools to implement these features, and as they are developed, they may become more efficient. Additionally, we do not expect the models developed with ChampKit to be more accurate than models developed with other methods, insofar as a similar hyperparameter space is explored. On the other hand, the ChampKit training script is derived from timm and includes many options that have been used to train state-of-the-art ImageNet classifiers, and these options may also prove useful in digital pathology patch classification.

ChampKit has several limitations. There are many deep learning tasks in digital pathology, and ChampKit addresses only patch-based classification. ChampKit can be modified to support other patch-based tasks, like segmentation, though this would require the addition of segmentation-specific architectures, data loading mechanisms, loss functions, and evaluation metrics. If users would like to benchmark semantic segmentation tasks, we refer them to the OpenMMLab Semantic Segmentation Toolbox and Benchmark [111]. Similarly, if users have slide-level classification tasks, we refer them to the work of Laleh et al. [50]. Additionally, it is assumed that all patches have mutually exclusive labels, though we acknowledge that it is possible that patches can potentially have multiple labels. ChampKit also does not perform pre-training on one's dataset and does not support self-supervised learning, and instead relies on previously published models for transfer learning. This limits the diversity of models that one may evaluate. We encourage the community to provide feedback, suggest features, and contribute new functionality to ChampKit via our GitHub repository.

The current study has several limitations as well. The baseline comparisons were limited to three network architectures with different pretrained weights, but this is a tiny sample of available architectures accessible to ChampKit. All models in this manuscript were also trained using identical hyperparameters for each dataset to make fair comparisons. Nevertheless, further optimization of each model could improve performance but was not explored in this study to maintain consistency. Thus, the benchmarks performed establish a lower bound of what is achievable. As the scope of this study was to introduce the ChampKit software, a follow-up study using this tool can build upon our work to elucidate modeling choices that are generalizable as part of a more comprehensive analysis.

In summary, ChampKit enables users to benchmark patch classification models and identify the best model for their dataset, which we hope will accelerate deep learning research in histopathology.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

We gratefully acknowledge support from National Cancer Institute grants U24CA215109 and UH3CA225021. JRK was also supported by National Institutes of Health grant T32GM008444 (NIGMS) and by the Medical Scientist Training Program at Stony Brook University. PKK was supported in part by funding from the National Human Genome Research Institute of the National Institutes of Health under Award Number R01HG012131, the Developmental Funds from the CSHL Cancer Center Support Grant 5P30CA045508, and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. This work was performed with assistance from the US National Institutes of Health Grant S10OD028632-01. The results shown here are in part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga. We thank Shushan Toneyan for her help reviewing the source code and reproducing parts of this manuscript and Satrajit S. Ghosh for help naming this project.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.cmpb.2023.107631.

References

[1] R. Wightman, PyTorch image models, 2019, https://github.com/rwightman/pytorch-image-models, doi:10.5281/zenodo.4414861.
[2] L. Biewald, Experiment tracking with Weights and Biases, 2020. Software available from wandb.com, https://www.wandb.com/.
[3] S. Banerji, S. Mitra, Deep learning in histopathology: a review, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12 (1) (2022) e1439.
[4] J. Van der Laak, G. Litjens, F. Ciompi, Deep learning in histopathology: the path to the clinic, Nat. Med. 27 (5) (2021) 775–784.
[5] C.L. Srinidhi, O. Ciga, A.L. Martel, Deep neural network models for computational histopathology: a survey, Med Image Anal 67 (2021) 101813.
[6] A. Echle, H.I. Grabsch, P. Quirke, P.A. van den Brandt, N.P. West, G.G.A. Hutchins, L.R. Heij, X. Tan, S.D. Richman, J. Krause, et al., Clinical-grade detection of microsatellite instability in colorectal tumors by deep learning, Gastroenterology 159 (4) (2020) 1406–1416.
[7] S. Deng, X. Zhang, W. Yan, E.I. Chang, Y. Fan, M. Lai, Y. Xu, et al., Deep learning in digital pathology image analysis: a survey, Front Med 14 (4) (2020) 470–487.
[8] A.S. Sultan, M.A. Elgharib, T. Tavares, M. Jessri, J.R. Basile, The use of artificial intelligence, machine learning and deep learning in oncologic histopathology, Journal of Oral Pathology & Medicine 49 (9) (2020) 849–856.
[9] R. Gupta, T. Kurc, A. Sharma, J.S. Almeida, J. Saltz, The emergence of pathomics, Curr Pathobiol Rep 7 (3) (2019) 73–84.
[10] A. Hamidinekoo, E. Denton, A. Rampun, K. Honnor, R. Zwiggelaar, Deep learning in mammography and breast histology, an overview and future trends, Med Image Anal 47 (2018) 45–67.
[11] N. Coudray, P.S. Ocampo, T. Sakellaropoulos, N. Narula, M. Snuderl, D. Fenyö, A.L. Moreira, N. Razavian, A. Tsirigos, Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning, Nat. Med. 24 (10) (2018) 1559–1567.
[12] O. Jimenez-del Toro, S. Otálora, M. Andersson, K. Eurén, M. Hedlund, M. Rousson, H. Müller, M. Atzori, Analysis of histopathology images: from traditional machine learning to deep learning, in: Biomedical Texture Analysis, Elsevier, 2017, pp. 281–314.
[13] J. Xie, R. Liu, J. Luttrell IV, C. Zhang, Deep learning based analysis of histopathological images of breast cancer, Front Genet 10 (2019) 80.
[14] W. Mi, J. Li, Y. Guo, X. Ren, Z. Liang, T. Zhang, H. Zou, Deep learning-based multi-class classification of breast digital pathology images, Cancer Manag Res (2021) 4605–4617.
[15] M. Abdolahi, M. Salehi, I. Shokatian, R. Reiazi, Artificial intelligence in automatic classification of invasive ductal carcinoma breast cancer in digital pathology images, Med J Islam Repub Iran 34 (2020) 140.

