
Medical Image Analysis 76 (2022) 102308

Contents lists available at ScienceDirect

Medical Image Analysis


journal homepage: www.elsevier.com/locate/media

Encoding histopathology whole slide images with location-aware graphs for diagnostically relevant regions retrieval

Yushan Zheng a, Zhiguo Jiang a,b,∗, Jun Shi d,∗∗, Fengying Xie a,b, Haopeng Zhang a,b, Wei Luo a,b, Dingyi Hu a,b, Shujiao Sun a,b, Zhongmin Jiang c, Chenghai Xue e,f

a Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
b Image Processing Center, School of Astronautics, Beihang University, Beijing 102206, China
c Department of Pathology, Tianjin Fifth Central Hospital, Tianjin 300450, China
d School of Software, Hefei University of Technology, Hefei 230601, China
e Wankangyuan Tianjin Gene Technology, Inc, Tianjin 300220, China
f Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China

Article info

Article history:
Received 19 October 2019
Revised 14 October 2021
Accepted 17 November 2021
Available online 20 November 2021

Keywords:
Histopathological image analysis
CBIR
Computer-aided cancer diagnosis
Graph convolutional network
Self-attention

Abstract

Content-based histopathological image retrieval (CBHIR) has become popular in recent years in histopathological image analysis. CBHIR systems provide auxiliary diagnosis information for pathologists by searching a pre-established database for regions whose content is similar to the region of interest (ROI) and returning them. Retrieving diagnostically relevant regions from a database consisting of histopathological whole slide images (WSIs) is challenging and yet significant in clinical applications. In this paper, we propose a novel framework for region retrieval from WSI databases based on location-aware graphs and deep hash techniques. Compared to the present CBHIR frameworks, both the structural information and the global location information of ROIs in the WSI are preserved by graph convolution and self-attention operations, which makes the retrieval framework more sensitive to regions that are similar in tissue distribution. Moreover, benefiting from the graph structure, the proposed framework scales well with both the size and shape variation of ROIs. It allows the pathologist to define query regions using free curves according to the appearance of tissue. Furthermore, the retrieval is achieved based on the hash technique, which ensures the framework is efficient and adequate for practical large-scale WSI databases. The proposed method was evaluated on an in-house endometrium dataset with 2650 WSIs and the public ACDC-LungHP dataset. The experimental results demonstrate that the proposed method achieves a mean average precision above 0.667 on the endometrium dataset and above 0.869 on the ACDC-LungHP dataset in the task of irregular region retrieval, which is superior to the state-of-the-art methods. The average retrieval time from a database containing 1855 WSIs is 0.752 ms. The source code is available at https://github.com/zhengyushan/lagenet.

© 2021 Elsevier B.V. All rights reserved.

∗ Corresponding author at: Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China.
∗∗ Co-corresponding author.
E-mail addresses: [email protected] (Z. Jiang), [email protected] (J. Shi).

https://doi.org/10.1016/j.media.2021.102308
1361-8415/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

With the development of digital pathology and artificial intelligence, computer-aided cancer diagnosis methods based on histopathological image analysis (HIA) (Litjens et al., 2017; Gurcan et al., 2009; Hollon et al., 2020) have been widely studied. In recent years, the studies focused on the histopathological whole slide image classification (Zheng et al., 2017; Xu et al., 2017), segmentation (Xu et al., 2014; Bejnordi et al., 2016; 2017; Jia et al., 2017; Falk et al., 2019), object detection (Xu et al., 2015; Veta et al., 2019), etc. Generally, these applications can provide the pathologists with diagnosis suggestions and even automatically generate reports with quantitative data and diagnostic descriptions. However, these applications can hardly provide the basis or reason for the decision. The information for diagnoses is limited.

Content-based histopathological image retrieval (CBHIR) is an emerging approach in the domain of HIA (Kalra et al., 2020; Li et al., 2018; Zhang and Metaxas, 2016; Zheng et al., 2018). Compared to the typical HIA methods mentioned above, CBHIR methods provide more valuable information, including similar regions from diagnosed cancer cases, the corresponding meta-information, and the diagnosis reports of experts stored along with the cases in the digital pathology platform. It can increase the information and
improve the interpretability of the automatic diagnosis, which is of developmental significance to pathologists.

With the rapid expansion of digital WSI archives, it has become crucial to develop effective retrieval systems for large-scale WSI databases. However, histopathology WSIs are gigapixel digital images with complex textural information, and the query image is a region of interest (ROI) of varying size and shape. This makes CBHIR for WSI databases a challenging task. The existing methods are confronted with many difficulties, which are reflected in three aspects: (1) Due to the constraint of the deep learning model, the regions used to establish the database and the query region are limited to rectangles of a fixed size (Ma et al., 2017; Shi et al., 2018; Peng et al., 2019). In this case, multiple models need to be established to support retrieval at different sizes. (2) The retrieval for large regions is commonly completed by measuring the distance between two sets of local features (Jimenez-del Toro et al., 2017; Zheng et al., 2018b; Chen et al., 2020). The internal adjacency relationship of these local features is not considered, and meanwhile, the computation is expensive. (3) The sub-regions are cropped from the WSI and then regarded as independent items (Ma et al., 2018; Zheng et al., 2019) in the database. The global location information of the sub-region in the WSI is discarded. The above problems lead to a series of defects in precision, efficiency, and convenience of the retrieval system when applied to a practical database.

In this paper, we simultaneously address the above three problems and propose a novel CBHIR framework for diagnostically relevant region retrieval from large-scale WSI databases based on graphs and a deep hashing method. Unlike the present sub-region retrieval frameworks, we propose constructing location-aware graphs (LA-Graph) for the sub-regions in the WSI to describe both the structural information within the regions and the global location information of the region in the WSI. Meanwhile, we design a novel location-aware graph encoding network (LAGE-Net) based on graph convolution and self-attention operations to encode the LA-Graph for retrieval. Moreover, we employ the hashing technique to ensure the efficiency of retrieval. The experiments on two large-scale datasets demonstrate the effectiveness of our method.

The contribution of this paper to the problem is three-fold:

1) We proposed a novel histopathology image retrieval framework based on location-aware graphs for databases consisting of whole slide images. To our knowledge, we are the first to use graphs to simultaneously represent the image content, the internal adjacency and the global location information for histopathology ROIs. Specifically, the local features are regarded as the nodes of the graph, the spatial connection relations of the features are described as the edges of the graph and the distances of patches to the border of the tissue are represented by distance embedding. The definition of the LA-Graph makes the sub-regions for retrieval size- and shape-scalable. It allows the pathologists to define query regions by free curves.
2) We designed a LAGE-Net to encode the LA-Graph into binary codes, which are used to index the sub-regions in the WSI and the query ROI. The spatial adjacency information in a LA-Graph is extracted by graph convolution operations and the global location information is modeled along with the local features by self-attention operations. Finally, the multiple sources of information are combined and converted into binary codes by a hash module. The local features, spatial information and location information for a sub-region in the WSI are effectively extracted and preserved by the LAGE-Net. The LAGE-Net can be trained end-to-end from graphs with a variable number of nodes to the binary-like codes, and the retrieval can be effectively achieved based on Hamming distances between binary codes. This makes the proposed method applicable to practical large-scale WSI databases.
3) We conducted comprehensive experiments to verify the proposed retrieval framework on an in-house endometrium dataset with 2650 WSIs and the public ACDC-LungHP dataset with 150 WSIs and compared it with 6 state-of-the-art methods. The experimental results demonstrate that the proposed method achieves the best performance in the task of irregular region retrieval with a mean average precision above 0.667 on the endometrium dataset and above 0.869 on the ACDC-LungHP dataset. The average retrieval time from a database containing 1855 WSIs is 0.752 ms.

The remainder of this paper is organized as follows. Section 2 reviews the history of histopathological image retrieval. Section 3 introduces the methodology of the proposed method. The experiment and discussion are presented in Section 4. Section 5 summarizes the contributions. A part of this work has been presented in a conference paper (Zheng et al., 2019).

2. Related works

The objects studied in CBHIR have evolved from cells/nuclei to image patches and whole slide images with the development of digital pathology. The typical methods related to our work are reviewed in this section.

2.1. Retrieval methods for cells and patches

Early studies focused on cell retrieval from histological images that were captured under optical microscopy (Comaniciu et al., 1998b; 1998a; Wetzel et al., 1999). With the digitalization of histological sections, CBHIR frameworks for patch retrieval were proposed. Zheng et al. (2004); Zhou and Jiang (2004) and Mehta et al. (2009) employed classical image features to depict the histopathological images and achieved patch-level retrieval. Then, the retrieval methodology was studied in various aspects.

A number of works concentrated on extracting high-level features of histopathological images to improve the accuracy of retrieval. Specifically, CBHIR frameworks based on manifold learning (Doyle et al., 2007; Sparks and Madabhushi, 2011), semantic analysis (Caicedo et al., 2008; Caicedo and Izquierdo, 2010; Zheng et al., 2014), spectral embedding (Sridhar et al., 2011) and fine-designed local descriptors (Tizhoosh and Babaie, 2018; Erfankhah et al., 2019) have been developed and have proven effective in improving the accuracy of retrieval. Meanwhile, Gu and Jie (2018); Zheng et al. (2018a); Gu and Yang (2019) proposed utilizing the contextual information by combining features from multiple magnifications of histopathological images to enhance the representations of image patches and thus improve the performance of retrieval. As for the online usage of CBHIR, the security of retrieval has also been considered (Cheng et al., 2019).

Besides the retrieval accuracy, the efficiency of CBHIR has received increasing attention in recent years. To support databases consisting of massive numbers of histopathological images, hashing techniques were introduced. Typically, Zhang et al. (2015b,a) and Jiang et al. (2016) introduced supervised hashing with kernels (KSH) (Liu et al., 2012) into CBHIR. Shi et al. (2017) utilized a graph hashing model to learn the similarity relationship of histopathological images. With hashing functions, the images are encoded into an array of binary codes, and the similarities among images are measured by Hamming distance, which can be calculated very efficiently using bit-wise operations. More recently, Shi et al. (2018); Sapkota et al. (2018) and Peng et al. (2019) constructed end-to-end deep learning frameworks based on CNNs to directly encode histopathological images into binary codes. The overall performance of patch-level CBHIR has been further improved.

2.2. Whole slide image database retrieval

Practical digital histopathology scans are generally stored in the format of whole slide images. Therefore, it is crucial to study the approach for retrieving relevant sub-regions from the WSIs for a region the pathologist provides during the diagnosis.

In a previous study, Ma et al. (2017) proposed dividing the WSIs into sub-regions following the sliding window paradigm and encoding the individual regions to establish the retrieval database. It is a convenient strategy to index WSIs for sub-region retrieval. However, the tissue structure was ignored in the division of WSIs, and retrieval instances in the database were limited to rectangular images of fixed sizes. This differs from the practical situation, where pathologists usually define the ROIs with free curves in various shapes and sizes.

Then, several retrieval strategies were developed to improve the scalability of the retrieval framework. Zheng et al. (2018a) proposed segmenting a WSI into super-pixels and defining the super-pixels as retrieval instances. Further, Ma et al. (2018) proposed merging the super-pixels into irregular regions based on selective search (Uijlings et al., 2013) to index the WSI. The query ROI in these methods was not restricted to rectangular regions. Chen et al. (2020) proposed representing the annotation regions by fusing patch-level features and encoding the region representation by supervised hashing for retrieval. However, the representation of an irregular region was obtained by the quantification of local features. The scale information of the region cannot be described. In the methods of Jimenez-del Toro et al. (2017) and Zheng et al. (2018b), the composition of the images was considered by measuring the similarity between each pair of local features across two regions. Nevertheless, the adjacency relationship of different types of histological objects was not considered in these methods. Therefore, the structural similarity between tissue regions is difficult to recognize in the retrieval procedure.

To overcome the drawbacks of the present methods, we propose establishing an end-to-end network based on the LA-Graphs to encode the regions into uniform indexes, where the local features, the adjacency relationships, and the location information are expected to be preserved in the indexes and reflected in the results of retrieval.

3. Method

The overview of the framework is illustrated in Fig. 1. The WSIs are first divided into patches and converted into an image feature tube with a pre-trained convolutional neural network (CNN) (He et al., 2016; Huang et al., 2017). Then, sub-region graphs are generated based on the spatial relationships and feature similarities of patches. Moreover, the minimum distances of the patches to the border of the tissue are measured to identify the location of the graph in the WSI. Finally, the location-aware graphs (LA-Graphs) are constructed and fed into the designed LAGE-Net to obtain the binary indexes for retrieval. The method for LA-Graph construction and the LAGE-Net are the main components of the framework, which are elaborated in this section.

3.1. Location-aware graph construction

The flowchart to generate the graphs for a WSI is illustrated in Fig. 2. The patches in a WSI are clustered into sub-regions based on their CNN features. Then, the sub-regions are represented with graphs by regarding the patches as the graph nodes and the spatial adjacency as the graph edges.

Letting $I_i$ represent the $i$th patch in a WSI, the process of feature extraction can be described as

$$x_i = F_{\mathrm{CNN}}(I_i),$$

where $F_{\mathrm{CNN}}$ represents a CNN feature extractor that takes an image patch as the input and outputs a $d_f$-dimensional column vector.

Then, the hierarchical agglomerative clustering (HAC) algorithm (Day and Edelsbrunner, 1984) is employed to merge the patches in the WSI based on the CNN features. HAC is designed to merge a set of samples into an assigned number of clusters. In each iteration of HAC, the two most similar clusters under a specific similarity measurement are merged. Specifically for a WSI, we propose regarding the patch features $\{x_i\}_{i=1}^{m_s}$ as initial clusters and utilizing the error sum of squares (EES) as the similarity measurement between clusters. Besides, an adjacency matrix $A_s \in \{0, 1\}^{m_s \times m_s}$ is generated to indicate the connectivity of the $m_s$ patches in the $s$th WSI, where $a_{ij} = 1$, $a_{ij} \in A_s$, indicates that the $i$th and the $j$th patch are spatially 4-connected and $a_{ij} = 0$ otherwise. To ensure the merged sub-regions are spatially connected, only the pairs associated with $a_{ij} = 1$ are allowed to be merged in the iterations of HAC. Fig. 2b illustrates the merged sub-regions, where a colored area represents a sub-region.

Another important piece of information considered in the graph is the global location of the sub-region in the WSI. The distance of the sub-region, especially the cancerous region, to the border of the tissue carries information about the size of the tumor, the depth of tumor infiltration, etc., which are important indicators for tumor classification and grading. Motivated by this, we propose measuring the minimum distance of each patch to the border of the tissue and adding it to the tissue graph data. This lets the graph carry the tissue depth information, which is expected to benefit the subsequent encoding process. Specifically, we apply distance transformation to the tissue mask segmented from the WSI, as shown in Fig. 1(e–f), and record the minimum distance to the tissue border for each patch center. The border distance of the $j$th patch is denoted by $\varphi_j$. For a uniform representation, $\varphi_j$ is scaled to $\varphi_j \in [0, 1]$ by the minimum and maximum values in the dataset.

Finally, we construct the location-aware graph for each sub-region, which can be represented as $G = (A, X, \phi)$, where $A \in \mathbb{R}^{m \times m}$ is an adjacency matrix that defines the connectivity in $G$ with $m$ nodes, $X = (x_1^{T}, x_2^{T}, \ldots, x_m^{T})^{T} \in \mathbb{R}^{m \times d_f}$ denotes the node feature matrix assuming each node is represented as a $d_f$-dimensional vector, and $\phi = (\varphi_1, \varphi_2, \ldots, \varphi_m)$. For convenience, all the graphs in the $s$th WSI are represented by the set $G_s = \{G_i \mid i = 1, 2, \ldots, n_s\}$, where $n_s$ denotes the number of graphs in the $s$th WSI. The set $G_s$ covers the entire content of the WSI and thereby can be used to index the WSI.

In summary, the algorithm of the LA-Graph construction is arranged as Algorithm 1.

3.2. LAGE-Net for region encoding

It is challenging to encode the graph node attributes with local adjacency and global location information into a uniform representation. In this paper, we propose a location-aware graph encoding network (LAGE-Net) to achieve this task. The structure of LAGE-Net is presented in Fig. 3. The CNN features for a graph are first embedded by a linear transformation, then fed into the stacked blocks consisting of the LAGE module and feed-forward linear layers to obtain the graph representation. Finally, the graph representation is transferred into the binary code by a hash module. Meanwhile, layer normalization and residual connection are inserted (as shown in Fig. 3a). The main component of LAGE-Net is the LAGE module, which is elaborated as follows.


Fig. 1. The proposed CBHIR framework. In the offline stage, the WSIs are first divided into patches following the sliding window paradigm and a CNN is trained based on the
patch labels to extract image features. Then, a WSI is divided into sub-regions based on the features and the spatial adjacency of the patches. Next, a graph is constructed
for each sub-region by regarding the patches within the sub-region as nodes and the spatial adjacency as edges. Additionally, the distances of the patches to the tissue
border are considered in the graph. Finally, the LAGE-Net is trained based on the graphs for sub-region encoding and indexing. In the online retrieval stage, the region the
pathologist queried is converted into a binary code using the trained models. The most relevant regions are retrieved by measuring the similarities between the query code
and those in the database and finally returned to pathologists for diagnosis reference.
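
The border-distance measurement mentioned in this caption and in Section 3.1 can be illustrated with a small sketch. It is only a toy illustration under stated assumptions: the tissue mask is a tiny binary grid, the city-block metric is used because the paper does not name the metric of its distance transformation, and the names `border_distance`, `min_max_scale` and `centers` are hypothetical rather than taken from the released code.

```python
from collections import deque


def border_distance(mask):
    """Distance of every tissue cell to the nearest background cell.

    mask is a list of rows with 1 for tissue and 0 for background.
    The city-block metric is an assumption made for this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search grows outward from the background.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def min_max_scale(values, lo, hi):
    """Scale raw distances to [0, 1] with dataset-wide minimum and maximum."""
    return [(v - lo) / (hi - lo) for v in values]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
dist = border_distance(mask)
centers = [(1, 1), (2, 2), (3, 2)]  # hypothetical patch centers
raw = [dist[r][c] for r, c in centers]
phi = min_max_scale(raw, lo=1, hi=2)
```

In this toy mask the central patch lies deepest in the tissue and therefore receives the largest scaled distance, which is the depth cue the location-aware graph is meant to carry.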

Fig. 2. The flowchart of tissue graph generation, where (a) is a digital WSI, (b) illustrates the sub-regions clustered by Algorithm 1, (c) shows the graphs established on the
sub-regions, and (d) jointly presents a graph and its corresponding region where the nodes are drawn on the centers of patches.
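
The adjacency-constrained merging that produces the sub-regions in panel (b) can be sketched as follows. This is a minimal sketch in the spirit of Algorithm 1, not the authors' implementation: patches are assumed to sit on a regular grid, features are toy one-dimensional vectors, and the function names `ess`, `grid_adjacency` and `merge_patches` are hypothetical. As in Algorithm 1, the cost of a candidate merge is taken to be the error sum of squares of the union of the two clusters.

```python
def ess(features):
    """Error sum of squares of a list of equal-length feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    return sum((f[d] - mean[d]) ** 2 for f in features for d in range(dim))


def grid_adjacency(coords):
    """4-connected patch pairs for patches given as (row, col) positions."""
    index = {rc: i for i, rc in enumerate(coords)}
    adj = set()
    for (r, c), i in index.items():
        for dr, dc in ((0, 1), (1, 0)):
            j = index.get((r + dr, c + dc))
            if j is not None:
                adj.add((i, j))
    return adj


def merge_patches(features, adj, target):
    """Greedily merge spatially connected clusters until `target` remain."""
    clusters = {i: [i] for i in range(len(features))}
    owner = list(range(len(features)))  # patch index -> cluster id
    while len(clusters) > target:
        # Candidate pairs: distinct clusters sharing a 4-connected patch pair.
        pairs = {
            tuple(sorted((owner[p], owner[q])))
            for p, q in adj
            if owner[p] != owner[q]
        }
        if not pairs:
            break  # disconnected tissue: nothing left to merge
        p, q = min(
            sorted(pairs),
            key=lambda pq: ess(
                [features[i] for i in clusters[pq[0]] + clusters[pq[1]]]
            ),
        )
        for i in clusters[q]:
            owner[i] = p
        clusters[p] += clusters.pop(q)
    return sorted(sorted(c) for c in clusters.values())


coords = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
feats = [[0.0], [0.1], [5.0], [0.2], [5.1], [5.2]]
adj = grid_adjacency(coords)
regions = merge_patches(feats, adj, target=2)
```

On this 2 × 3 toy grid the three low-valued patches and the three high-valued patches end up in two connected sub-regions. Each resulting index list would then be turned into one graph by taking the matching rows of the feature matrix, the matching block of the adjacency matrix, and the matching border distances.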

Fig. 3. The location-aware graph encoding network (LAGE-Net) consists of a feature embedding layer, multiple stacked LAGE blocks, and a hash layer. It takes the location-
aware graph as input and outputs a binary code that is used to index the graph for retrieval.
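
To illustrate how a binary code produced by the hash layer can index a graph, the sketch below binarizes a binary-like output vector, packs it into an integer, and ranks database entries by Hamming distance using XOR and a bit count. The tie-break at zero, the bit packing, the graph names and the helper names (`binarize`, `pack`, `hamming`, `retrieve`) are assumptions for illustration only; the paper states that Hamming distance is used but does not describe this particular storage layout.

```python
def binarize(y):
    """Apply sign() to binary-like outputs in (-1, 1).

    Mapping an exact zero to +1 is an assumption of this sketch.
    """
    return [1 if v >= 0 else -1 for v in y]


def pack(code):
    """Pack a {-1, +1} code into an int, one bit per dimension."""
    bits = 0
    for v in code:
        bits = (bits << 1) | (1 if v > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def retrieve(query, database, top_k):
    """Return the names of the top_k entries closest to the query code."""
    ranked = sorted(
        database.items(), key=lambda kv: (hamming(query, kv[1]), kv[0])
    )
    return [name for name, _ in ranked[:top_k]]


query = pack(binarize([0.9, -0.2, 0.4, -0.7]))
database = {  # hypothetical graph identifiers
    "g1": pack([1, -1, 1, 1]),
    "g2": pack([-1, 1, -1, 1]),
    "g3": pack([1, -1, 1, -1]),
}
top2 = retrieve(query, database, top_k=2)
```

Because the comparison reduces to one XOR and one bit count per database entry, ranking stays cheap even when the database holds many sub-region codes.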

3.2.1. LAGE module

The proposed LAGE module is composed of graph convolution, self-attention and linear transformation operations. The flowchart of the module is illustrated in Fig. 3b.

Internal relationship encoding with graph convolution. The adjacency matrix $A$ in a tissue graph describes the internal relationship of the graph nodes. The message-passing based on $A$ is essential in graph representation learning. Therefore, we first apply the GCN methodology proposed by Kipf and Welling (2016) to achieve the internal relationship encoding. Generally, a step of graph convolution can be formulated as

$$H_{gc} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X_e W_{gc}, \tag{1}$$


where $H_{gc} \in \mathbb{R}^{m \times d_e}$ denotes the node embeddings after the graph convolution step, $d_e$ denotes the dimension of the embeddings, $\tilde{A} = A + E$ with $E$ denoting the unit matrix, $\tilde{D} = \mathrm{diag}(\sum_j \tilde{A}_{1j}, \sum_j \tilde{A}_{2j}, \ldots, \sum_j \tilde{A}_{mj})$, and $W_{gc} \in \mathbb{R}^{d_e \times d_e}$ is a trainable weight matrix. For simplification, we define $\bar{A} = \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$ and rewrite Eq. (1) as

$$H_{gc} = \bar{A} X_e W_{gc}, \tag{2}$$

Specifically, $X_e$ represents the original embeddings of graph nodes, which in our method is defined as the linear transformation of the CNN features followed by a layer normalization (LN) operation.

Algorithm 1: The algorithm of tissue graph construction.
Input:
    $m_s$ ← The number of patches in the $s$th WSI;
    $\{x_i \mid i = 1, 2, \ldots, m_s\}$ ← The feature vectors of patches;
    $A_s \in \{0, 1\}^{m_s \times m_s}$ ← The adjacency matrix of patches;
    $\Phi_s \in [0, 1]^{m_s \times m_s}$ ← The distance transformation matrix for patch centers;
    $\hat{g}_s$ ← The target number of graphs ($\hat{g}_s \le m_s$);
Output: $G_s$
for $i = 1$ to $m_s$ do
    $C_i \leftarrow \{x_i\}$;
end
$C \leftarrow \{C_i \mid i = 1, 2, \ldots, m_s\}$;
$g_s \leftarrow m_s$;
while $g_s > \hat{g}_s$ do
    $T \leftarrow \emptyset$;
    for $(C_i, C_j)$ in $\{(C_i, C_j) \mid \exists x_p \in C_i, \exists x_q \in C_j, i \ne j, \text{ s.t. } a_{pq} = 1\}$ do
        $d_{ij} \leftarrow \mathrm{EES}(C_i \cup C_j)$;
        $T \leftarrow T \cup \{d_{ij}\}$;
    end
    index $(p, q) \leftarrow \arg\min_{(i,j)}(T)$;
    $C_p \leftarrow C_p \cup C_q$;
    $C \leftarrow C \setminus C_q$;
    $g_s \leftarrow g_s - 1$;
end
$G_s \leftarrow \emptyset$;
for $C_i$ in $C$ do
    $X_i \leftarrow (x_1, \ldots, x_j, \ldots, x_{|C_i|})_{x_j \in C_i}$;
    $A_i$ ← Seek $A_s$ for the adjacent relationship of patches corresponding to $X_i$;
    $\phi_i$ ← Seek $\Phi_s$ for the distance values of patches corresponding to $X_i$;
    $G_i \leftarrow (X_i, A_i, \phi_i)$;
    $G_s \leftarrow G_s \cup \{G_i\}$;
end
return $G_s$;

Global location encoding with self-attention. In this paper, the global location of the sub-regions in the WSI is also considered important for diagnostically relevant retrieval. Motivated by the usage of the position embedding in the Transformer (Vaswani et al., 2017), we build global location embeddings for the graph nodes and merge the global location information into the graph representation. Specifically, we define the distance index $\bar{\varphi}_j = [\varphi_j \times N_{dist}]$, where $N_{dist}$ controls the intervals of the distance indexing and is empirically set to 64 in the experiment. Then, we define

$$D = (d_{\bar{\varphi}_1}^{T}, d_{\bar{\varphi}_2}^{T}, \ldots)^{T} \in \mathbb{R}^{m \times d_e} \tag{3}$$

to be the global location embeddings that are indexed by $\{\bar{\varphi}_j\}_{j=1}^{m}$. $d_{\bar{\varphi}_j}$ is generated following the sinusoidal embedding formula in the Transformer (Vaswani et al., 2017). Then, $D$ is merged into the original node embeddings $X_e$ by the operation

$$X_{sa} = X_e + D, \tag{4}$$

Here, $X_{sa} \in \mathbb{R}^{m \times d_e}$ involves the information from both the image patterns and the global locations of the patches. Next, we apply the self-attention mechanism to build relations between the location-aware representations. This procedure can be represented as

$$A_{sa} = \mathrm{Softmax}\left(\frac{X_{sa} W_q \cdot (X_{sa} W_k)^{T}}{\sqrt{d_{att}}}\right), \tag{5}$$

$$H_{sa} = A_{sa} X_{sa} W_v, \tag{6}$$

where $W_q, W_k, W_v \in \mathbb{R}^{d_e \times d_{att}}$ represent the weights for the Query, Key, and Value branches of the self-attention module, respectively.

Information integration with linear transformation. Finally, the outputs of the graph convolution and self-attention are concatenated and then fed into a linear transformation layer to integrate the patterns extracted from different aspects. The linear transformation layer is formulated as

$$H_l = \mathrm{GELU}([H_{gc}; H_{sa}] \cdot W_l + b_l), \tag{7}$$

where GELU represents the Gaussian error linear units function, and $W_l$ and $b_l$ are the weights and bias.

It is notable that the graph convolution (Eq. (2)) and self-attention (Eq. (6)) share the same formulation. The main difference is that the message-passing matrix $\bar{A}$ is generated from the natural adjacency relationship of graph nodes and fixed in the calculation, whereas $A_{sa}$ is generated online based on the current state of each node regarding both the image content and the global location of the nodes. More generally, we extended the self-attention to the multi-head formulation to allow the LAGE module to observe more aspects of the relationships behind the image content and the global location information.

3.2.2. Binary indexing with Hash function

In our method, the network is used to generate graph region indexes that are effective for data retrieval. To learn the representation of the graph for the hash function, a trainable token is concatenated to the initial graph embeddings, following BERT and ViT, which can be formulated as

$$X_e \leftarrow [x_h^{T}; X_e^{T}]^{T}. \tag{8}$$

The learnable token is regarded as another node that is connected with all other nodes in the graph embedding calculation, for which the adjacency matrix $A$ is correspondingly modified. Meanwhile, the token participates in all the self-attention computations.

To ensure the framework is applicable to practical large-scale pathological databases, we built a head layer with hash functions on the output of the last MLP. Supposing $z_h \in \mathbb{R}^{d_e}$ represents the final MLP output of the learnable token, the hashing function is defined as

$$y_h = \tanh(z_h W_h + b_h), \tag{9}$$

where $W_h \in \mathbb{R}^{d_e \times d_h}$ and $b_h \in \mathbb{R}^{d_h}$ are the weights and bias for the hash functions, $d_h$ is the dimension of binary codes, and $\tanh$ represents the hyperbolic tangent function. $y_h \in (-1, 1)^{d_h}$ is the network output, which can be simply converted into binary codes by the equation $h = \mathrm{sign}(y_h) \in \{-1, 1\}^{d_h}$. Letting $Y \in (-1, 1)^{N_g \times d_h}$ denote the binary-like codes of $N_g$ graphs, the objective function to minimize for training the LAGE-Net is defined as

$$J = \frac{1}{N_g} \left\| \frac{1}{d_h} Y Y^{T} - C \right\|_F^2 + \lambda \left\| W_h^{T} W_h - E \right\|_F^2 \tag{10}$$


where C ∈ {−1, 1}Ng ×Ng is the pair-wise label matrix in which ci j = • ACDC-LungHP Li et al. (2021) contains 150 WSIs within lung
1 represents the ith graph and the jth graph are relevant and cancer regions annotated by pathologists.2 In the evaluation, 30
ci j = −1 otherwise. λ is the weight coefficient of the orthogonal WSIs were randomly selected as the testing dataset, and the re-
regularization and is empirically set to 0.01 in this paper. Finally, mainders were used to train the retrieval models and establish
the proposed LAGE-Net is trained end-to-end from the input graph the retrieval database.
with CNN-features to the output Y. For simplification, the LAGE-
Net is represented as h = FLAGE−Net (G ). All the WSIs were divided into square patches under lenses
of 20× following the sliding window paradigm. The step of
the window was set half of the length of the patch side.
3.3. Efficient retrieval with binary codes DenseNet (Huang et al., 2017) was employed as the CNN structure
to extract patch features. The global average pooling (GAP) layer
For each WSI in the retrieval database, a set of binary codes of the DenseNet structure was used as the feature extractor. The
that represent the graphs in the WSI can be obtained using the patch size was set 224 × 224 to fit the input of DenseNet. Graphs
trained feature extraction model FCNN and hash model FLAGE−Net . were constructed for each WSI using the algorithm provided in
When retrieving, the region the pathologist queries is divided Algorithm 1. For convenience, the graphs for establishing the re-
into patches and converted into binary codes using the same trieval database are represented as a set D and the query graphs
model. Then, the similarities between the query code and those are correspondingly represented as Q.
in the database are measured using Hamming distance referring to We first conducted experiments to determine hyper-parameters
Zhang et al. (2015b); Zhang and Metaxas (2016); Shi et al. (2018); of models involved in our method on the training set. Then, the re-
Zheng et al. (2019). After ranking the similarities, the most relevant trieval performance was evaluated on the testing set and compared
regions are retrieved and finally returned to the pathologist. with the state-of-the-art methods.
4. Experiments

4.1. Experimental setting

The experiments were mainly conducted on a public dataset and an in-house dataset of histopathological whole slide images. The profiles of the datasets are provided as follows.

• Endometrium-2K contains 2650 WSIs of endometrium histopathology from 2650 patients collected by Tianjin Fifth Central Hospital of China. These WSIs were scanned under a lens of 20× and categorized into 5 types of endometrial pathology, including Well-differentiated Endometrioid adenocarcinoma (WDEA), Moderately differentiated Endometrioid adenocarcinoma (MDEA), Low differentiated Endometrioid adenocarcinoma (LDEA), Serous endometrial intraepithelial carcinoma (SEIC), and Normal. All the cancerous regions were annotated by expert pathologists. The WSI instances are shown in Fig. 4, and data allocation is given in Table 1. In the experiment, 30% of the WSIs were randomly selected as the testing dataset (to generate query regions), and the remainder were used to train the retrieval models and establish the retrieval database.

In the evaluation, the graphs that contain more than 10% cancerous pixels referring to the pathologists' annotations were defined as Cancerous Graph, the graphs containing no cancerous pixels were regarded as Cancer-free Graph, and the remainder were not counted in the evaluation. For the Endometrium-2K dataset, the label of a Cancerous Graph is set the same as the WSI to which the graph belongs. Correspondingly, only the returned graphs that share the same label with the query one were considered as relevant in both the training and evaluation stage. The average precision of retrieval P@k for top-k-returned regions and the mean average precision mAP are used as the metrics. Letting $r_{ik} = 1$ indicate that the kth returned result shares the same label with the ith query graph and $r_{ik} = 0$ otherwise, P@k and mAP can be defined by the equations

$$P@k = \frac{1}{|Q|} \sum_{i=1}^{|Q|} p_i(k),$$

$$\mathrm{mAP} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{\sum_{k=1}^{|D|} p_i(k) \cdot r_{ik}}{\sum_{k=1}^{|D|} r_{ik}},$$

where $|\cdot|$ denotes the length of the set and

$$p_i(k) = \frac{\sum_{j=1}^{k} r_{ij}}{k}$$

represents the retrieval precision of the top-k returned results for the ith query instance. The higher the metrics, the better the retrieval performance.
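
As a worked illustration of the two metrics, the following Python sketch computes P@k and mAP from per-query relevance lists, where each list holds the indicators r_ik in ranked order. The function names and the toy relevance lists are hypothetical, and returning 0 for a query with no relevant result is an assumption, since the definition leaves that case open.

```python
from typing import List, Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """p_i(k): fraction of relevant results among the top-k returns."""
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int]) -> float:
    """Per-query term of mAP, computed over the full ranked list."""
    num_relevant = sum(relevance)
    if num_relevant == 0:
        return 0.0  # assumption: this case is not defined in the text
    weighted = sum(
        precision_at_k(relevance, k) * r
        for k, r in enumerate(relevance, start=1)
    )
    return weighted / num_relevant


def mean_precision_at_k(all_relevance: List[Sequence[int]], k: int) -> float:
    """P@k averaged over all query graphs."""
    return sum(precision_at_k(r, k) for r in all_relevance) / len(all_relevance)


def mean_average_precision(all_relevance: List[Sequence[int]]) -> float:
    """mAP averaged over all query graphs."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


# Two toy queries against a database of four graphs (illustrative values).
queries = [[1, 0, 1, 0], [0, 1, 0, 0]]
p_at_2 = mean_precision_at_k(queries, 2)
map_score = mean_average_precision(queries)
```

For the first toy query the relevant hits sit at ranks 1 and 3, giving (1 + 2/3) / 2 = 5/6; the second query has a single hit at rank 2, giving 1/2. The resulting mAP is therefore 2/3 and P@2 is 0.5.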
All the experiments were conducted in Python with PyTorch and run on a computer cluster with 10 available Nvidia GeForce 2080Ti GPUs. The source code for LAGE-Net is available at https://github.com/zhengyushan/lagenet.
The Adam optimizer was employed to train the model. The initial learning rate for training the LAGE-Net is $3 \times 10^{-4}$.
Fig. 4. Instances in the endometrial WSI dataset, where (a) is Well-differentiated Endometrioid adenocarcinoma (WDEA), (b) is Moderately differentiated Endometrioid adenocarcinoma (MDEA), (c) is Low differentiated Endometrioid adenocarcinoma (LDEA), (d) is Serous endometrial intraepithelial carcinoma (SEIC), (e) is Normal and the ground-truth for the cancerous regions are provided on the second row.

Table 1
Data allocation of the Endometrium-2K dataset.

Type Name   WDEA   MDEA   LDEA   SEIC   Normal   Total
Number      813    821    277    152    587      2650

² The dataset is accessible at https://acdc-lunghp.grand-challenge.org. Since the annotations of the testing part of the data set are not yet published, only the 150 training WSIs of the data were used in this paper.

4.2. The structure of feature extraction CNN

The CNN used for patch feature extraction was trained via a subtype classification task. Specifically, patches of 224 × 224 pixels were sampled from the training WSIs. The patches containing more than 75% cancerous pixels were labeled


Fig. 5. The mAP-#FLOPs curves as functions of the hyper-parameters of the LAGE-Net, where the average mAP of the 5-fold cross-validation is presented by each data point and the standard variance of the 5 trials is drawn with red bar. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

as positive samples, the patches involving no cancerous pixels were regarded as negative samples, and the other patches were not used.
The depth of DenseNet was tuned within the training set in the scope suggested in Huang et al. (2017). The best depth was determined according to the mean classification error of five-fold cross-validation with the training slides. Finally, the depth of the CNN was determined as 121 according to the validation error, for which the dimension of the patch features ($d_f$) is 1024.

Table 2
Results for the ablation study, where the metrics on the test set are compared and the best values are shown in bold.

Networks                   P@5     P@20    mAP
LAGE-Net w/o dist & adj    0.561   0.553   0.626
LAGE-Net w/o dist          0.561   0.559   0.634
LAGE-Net w/o adj           0.571   0.563   0.652
LAGE-Net                   0.587   0.583   0.667

4.3. The structure of LAGE-Net

The body of the LAGE-Net is stacked by multiple LAGE blocks and each block is composed of a graph convolution head and multiple self-attention heads. The number of blocks $N_b$, the number of attention heads $N_h$ and the dimension of the embeddings $d_e$ determine the computational complexity of the encoding process and the performance of retrieval. These hyper-parameters were tuned over a wide range and selected based on the best mAP obtained through five-fold cross-validation in the training set. Note that the division of the data for the cross-validation here was the same as that in the CNN training stage.
The mAP and the number of floating-point operations (#FLOPs) as functions of hyper-parameter settings for the Endometrium-2K dataset are presented in Fig. 5. The other hyper-parameters were kept fixed when one hyper-parameter was tuned.

1) The number of blocks $N_b$ determines the depth of the LAGE-Net. A larger $N_b$ helps extract a higher level of information from tissue graphs but would also increase the risk of over-fitting. $N_b$ was tuned from 4 to 12 with a step of 2 and the results (as shown in Fig. 5a) indicate $N_b = 8$ is optimum for the dataset.
2) The number of heads $N_h$ determines the width of the network. More heads for self-attention enable the network to build node relations in more aspects and therefore generate better graph representations for retrieval. $N_h$ was tuned from 0 to 14 with a step of 2. Note that $N_h = 0$ means self-attention operations along with the global location information are entirely omitted and the graph representation learning is only based on the local adjacency information. The results in Fig. 5b show that $N_h = 4$ and $N_h = 8$ achieved comparable retrieval performance. Finally, we decided to use 4 self-attention heads in each LAGE module in pursuit of lower computation.
3) The dimension of embedding $d_e$ affects the total floating-point operations of the linear transformation and the multi-head self-attention after $N_b$ and $N_h$ are decided. The computational complexity of LAGE-Net is in direct proportion to $O(N_b N_h d_e^2)$ when the input graph is fixed. In this experiment, we tuned $d_e$ from 128 to 1024. As shown in Fig. 5c, the mAP steadily increases as $d_e$ enlarges, but the computational amount grows quadratically. To ensure the retrieval can be quickly completed and meanwhile maintain a high retrieval precision, we set $d_e = 512$ in the following experiment.

4.4. Ablation study

The patch features X are the essential information that is required to be encoded by the LAGE-Net. Besides, the internal adjacency information described by A and the global location information in the distance embedding D are also proposed to be important in this paper. The former is mainly modeled by the graph convolution in the LAGE module and the latter is modeled by the self-attention operation along with the patch features. We conducted ablation experiments to certify the effectiveness of the two factors. The ablation models are as follows.

• LAGE-Net w/o dist. The distance embedding $d_{\bar{\phi}_j}$ is replaced with a common positional embedding $d_j$ that is associated with the patch index j. As a result, the global location information of the graph is removed.
• LAGE-Net w/o adj. The adjacency matrix in the graph convolution path is set to $A = 0$. Consequently, the internal structure constraint is not considered in the graph encoding process.
• LAGE-Net w/o dist & adj. Both the above two types of ablation are performed.

The retrieval performance is compared in Table 2. The average precision and mAP apparently decreased as the internal adjacency or the global location information of the graph was discarded. The experiment has verified the effectiveness of the two components. Especially, LAGE-Net w/o dist suffers a 3.6% drop in P@5 and 3.3% drop in mAP when the distance embedding is not considered. It indicates that the depth of a region to the border of the tissue matters to the recognition of tumor types. Moreover, we visualized the graph representation $\{z_h\}$ for the training graphs (the retrieval database) in the 2-dimensional space with t-SNE (Maaten and Hinton, 2008). Fig. 6(a and b) illustrates the averaged border distance for each graph. It is obvious that, in Fig. 6b, the graphs sharing a similar depth to the tissue border tend to cluster in the feature space. That is one of the reasons that LAGE-Net is significantly superior to LAGE-Net w/o dist. In contrast, the clustering phenotype became less obvious when the distance embedding was removed


Fig. 6. The 2-dimensional visualization of the graph representation output by the last MLP ($z_h$) for the Endometrium-2K retrieval database, where a dot represents a graph, the color of the dots in (a) and (b) presents the averaged normalized distance for each graph to the border of the tissue ($\phi_j$), and the color of the dots in (c) and (d) indicates the ratio of tumor occupation in the graph referring to the color bar on the right of the figure. (Only a part of the database graphs are randomly selected and plotted for the purpose of clear display.) (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

from the encoding. It also demonstrates the effectiveness of the proposed distance embedding approach.

4.5. Comparison with the state-of-the-art

The proposed method is compared with 6 state-of-the-art methods proposed by Ma et al. (2018); Jimenez-del Toro et al. (2017); Zheng et al. (2018b, 2019); Yan et al. (2020); Dosovitskiy et al. (2021). These methods can be categorized into two groups. The first group designs a distance between the sets of features of two sub-regions to measure their similarity for retrieval. This group includes the following three methods:

• Jimenez-del Toro et al. (2017). The retrieval is achieved based on both the WSIs and the text information of the cases. In the experiment, only the part for WSI retrieval was implemented because the meta-information of the datasets is not available. Specifically, the distances between all pairs of patch features across two sub-regions were calculated and the mean value of the distances was used as the similarity measurement.
• Zheng et al. (2018b). The patches in the sub-regions are encoded into binary codes. When retrieving, a set of proposal graphs is first retrieved through a table lookup operation based on patch codes. Then, the distances between the query graph and the proposal graphs are calculated under a specific similarity measurement and then the most similar graphs are returned.
• Ma et al. (2018). The features for a sub-region are quantified through a max-pooling operation. Then, the obtained representation is converted into binary codes based on latent Dirichlet allocation (LDA) (Blei et al., 2003) followed by supervised hashing. Finally, the similarity between two graphs is computed based on binary codes.

The second group trains deep learning models that take sub-region features as input and output uniform representations for the sub-regions. Then, the retrieval can be completed by measuring the distance of these uniform representations. We compare three typical methods from this group. Note that the proposed method belongs to the second group.

• Zheng et al. (2019). Graphs are constructed to describe the sub-regions in the WSIs and fed into a GCN with diffpool module (Ying et al., 2018) to extract the graph representation. A hash layer is built to the end of the GCN to convert the representation into binary codes.
• Yan et al. (2020). The patch features are fed into a two multi-layer bi-directional LSTM to exchange the contextual information. Then the outputs of the LSTM are merged by an average pooling layer for similarity measurement.³
• Dosovitskiy et al. (2021). The vision transformer (ViT) model is applied to encode the graph and the classification head of ViT is replaced with a hash layer to generate the binary codes for retrieval.

For a fair comparison, the feature extractors (or backbones) of the compared methods were the same DenseNet-121 structure.

³ The outputs of the LSTM are concatenated to generate the regional representation in the original method. Because the number of the features for a sub-region in our experiment was not consistent, we used an average pooling layer as a substitute for the concatenation.


Fig. 7. Comparison of interpolated precision-recall curves of different retrieval methods on the Endometrium-2K dataset, where (a) provides the distributions of number of
graph nodes obtained with different n̄, and (b–e) present the interpolated precision-recall curves for different settings of n̄, respectively. (For interpretation of the references
to color in this figure legend, the reader is referred to the web version of this article.)

Table 3
Retrieval performance for the state-of-the-art methods on the Endometrium-2K dataset, where the results for different size allocations of graphs (determined by n̄) are compared. Each entry lists P@5 / P@50 / mAP.

Methods                          n̄ = 30                  n̄ = 50                  n̄ = 70                  n̄ = 90                  Retrieval complexity
Jimenez-del Toro et al. (2017)   0.492 / 0.472 / 0.362   0.532 / 0.514 / 0.392   0.527 / 0.512 / 0.393   0.526 / 0.509 / 0.383   O(a²b)
Zheng et al. (2018b)             0.502 / 0.469 / 0.337   0.530 / 0.504 / 0.364   0.541 / 0.513 / 0.372   0.542 / 0.507 / 0.359   O(a²b)
Ma et al. (2018)                 0.486 / 0.466 / 0.361   0.526 / 0.510 / 0.392   0.522 / 0.507 / 0.401   0.522 / 0.505 / 0.398   O(b)
Zheng et al. (2019)              0.563 / 0.558 / 0.594   0.581 / 0.561 / 0.618   0.573 / 0.584 / 0.633   0.603 / 0.599 / 0.646   O(b)
Yan et al. (2020)                0.568 / 0.567 / 0.594   0.574 / 0.564 / 0.597   0.540 / 0.569 / 0.595   0.589 / 0.589 / 0.593   O(b)
Dosovitskiy et al. (2021)        0.569 / 0.569 / 0.633   0.587 / 0.587 / 0.645   0.590 / 0.588 / 0.653   0.604 / 0.603 / 0.665   O(b)
Proposed                         0.588 / 0.581 / 0.667   0.593 / 0.593 / 0.673   0.611 / 0.610 / 0.686   0.619 / 0.617 / 0.692   O(b)

4.5.1. Comparison of retrieval precision
We conducted experiments for sub-regions in different scales to evaluate the retrieval performance of the compared methods. Specifically, n̄ was set from 30 to 90 with a step of 20 and the retrieval database and the number of clusters (i.e. the target number of graphs $\hat{g}_s$ for each WSI) is determined by equation $\hat{g}_s = [m_s/\bar{n}]$. The graphs for each setting of n̄ were obtained based on Algorithm 1. The allocation of graph node numbers is visualized by boxplots in Fig. 7. The experimental results are summarized in Table 3. Correspondingly, the interpolated precision-recall curves are illustrated in Fig. 7(b–e).
Overall, the proposed method has achieved the best retrieval performance in the quantitative evaluation. The retrieval methods in the first group, including (Jimenez-del Toro et al., 2017; Ma et al., 2018; Zheng et al., 2018b), depend on the similarity measurement of patch features. In these methods, the patch features were regarded as equally informative in the mean average and max-pooling operations. Neither the interaction nor the location information of the patches was considered in the sub-region encoding process. These issues make it challenging to identify the subtle patterns in different subtypes of tumors, resulting in a gap in P@5 of about 6.1–11.3% to the methods in the second group.
The methods Zheng et al. (2019); Yan et al. (2020) in the second group have modeled the internal adjacency information among sub-region patches. The former applied the adjacency matrix to connecting features in 2D planar space and the latter constructed a sequence to describe the 1D adjacency information. Then, the graph neural networks and recurrent neural networks were trained end-to-end based on the region labels to generate uniform region representations. The end-to-end training strategy delivered a high mean average precision in the retrieval evaluation, as shown in Fig. 7(b-2). Meanwhile, the precision for the top-returned results was significantly improved. However, the gap of mAP for (Yan et al., 2020) to the other methods in the second group gradually enlarges as the


Table 4
Retrieval performance for the state-of-the-art methods on the ACDC-LungHP dataset, where the results for different size allocations of graphs (determined by n̄) are compared. Each entry lists P@50 / mAP.

Methods                          n̄ = 30          n̄ = 40          n̄ = 50          n̄ = 60          n̄ = 70          n̄ = 80          n̄ = 90
Jimenez-del Toro et al. (2017)   0.779 / 0.708   0.777 / 0.709   0.772 / 0.709   0.779 / 0.708   0.765 / 0.705   0.786 / 0.707   0.782 / 0.707
Zheng et al. (2018b)             0.797 / 0.702   0.788 / 0.704   0.789 / 0.703   0.794 / 0.703   0.780 / 0.701   0.790 / 0.703   0.793 / 0.704
Ma et al. (2018)                 0.783 / 0.715   0.783 / 0.719   0.779 / 0.719   0.788 / 0.718   0.783 / 0.717   0.789 / 0.718   0.786 / 0.718
Zheng et al. (2019)              0.801 / 0.862   0.811 / 0.865   0.840 / 0.867   0.831 / 0.872   0.797 / 0.857   0.845 / 0.884   0.858 / 0.881
Yan et al. (2020)                0.796 / 0.803   0.824 / 0.841   0.805 / 0.813   0.818 / 0.835   0.821 / 0.816   0.793 / 0.816   0.832 / 0.844
Dosovitskiy et al. (2021)        0.815 / 0.861   0.838 / 0.880   0.843 / 0.886   0.829 / 0.875   0.815 / 0.864   0.815 / 0.853   0.887 / 0.885
Proposed                         0.819 / 0.869   0.860 / 0.899   0.863 / 0.901   0.848 / 0.885   0.820 / 0.868   0.833 / 0.866   0.884 / 0.897

size of the sub-region increases. The main reason is that the adjacency information is modeled by RNN. The communication of the patch features requires traversing the entire sequence and therefore is weakened when the sequence lengthens.
ViT (Dosovitskiy et al., 2021) achieved comparable retrieval performance with our method. The superiority benefits from the self-attention mechanism, which enables a weighted communication among patch features during the encoding process. The essential difference of ViT from our method is that ViT utilizes trainable embedding indexed by the tensor positions rather than the border distances. However, the semantic for a tensor position is not fixed for the sub-region because the object in the histopathology image is non-rigid and non-directional. In this case, the trainable embedding could not find consistent meaning for a certain position and therefore could not be beneficial to the region encoding. In contrast, the proposed LAGE-Net equips the explicit embedding that is indexed by the distance of the patch to the tissue border. This difference brings an improvement of 1.9–2.1% in P@5 and 2.7–3.4% in mAP to our method.

4.5.2. Comparison of computational complexity
Efficiency is equally important for a CBHIR system. In the online retrieval stage, the computational complexity mainly derives from the strategy of retrieval, which depends on the pixel size of the query region a and the scale of the database b. The O notation for the compared methods is given in Table 3.
In our method, the retrieval is completed by measuring Hamming distances, which is irrelevant to the size of the query region after the graph encoding. Therefore, the complexity is O(b). The average time for querying the database containing 51,808 graphs is 0.752 ms in our experimental environment. Moreover, benefiting from the binary encoding, the similarity measurement is more time-saving than those based on float-type high-dimensional features (e.g., Jimenez-del Toro et al., 2017). When the order of magnitudes of WSI in the database increases, a hash table can be pre-established. Then, the retrieval can be easily achieved by a table-lookup operation, for which the complexity of retrieval is potentially reduced to O(1).

4.6. Comparison on ACDC-LungHP dataset

The same evaluations were completed on the ACDC-LungHP dataset. The hyper-parameters of the LAGE-Net were tuned within the 120 training WSIs and were finally determined as $(N_b, N_h, d_e) = (4, 4, 512)$. Then the training WSIs were encoded to construct the retrieval database. The 30 testing WSIs were used to generate the query graphs. The metrics of retrieval for different settings of n̄ are compared in Table 4. As the dataset only provides binary annotations, the training and evaluation were completed with binary labels, i.e., Tumor vs. Normal. The results have shown that all the compared methods achieved applicable retrieval performance. The P@5 is better than 0.779, and the mAP is above 0.701 for different sizes of query regions. Overall, the proposed retrieval framework with the LAGE-Net achieved the best performance. The experiment results are consistent with those obtained on the Endometrium-2K dataset.

4.7. Visualization

To further validate the qualitative performance of the proposed retrieval framework, we drew the graph structures on the retrieved regions. The joint visualization of WSI regions and graphs for the retrieval instances in the Endometrium-2K dataset is provided in Fig. 8. It shows that the relevant regions in various shapes and sizes for the query region are returned from the WSIs. It means that the LAGE-Net has learned the representations to identify the WDEA, MDEA, LDEA, and SEIC in endometrial histopathology.

4.8. Discussion

We have tried to use a trainable distance embedding indexed by $\bar{\phi}_j$ as a substitute for the sine-cosine embedding in the LAGE module but observed a slight decrease in retrieval precision. Therefore, we finally applied this constant embedding strategy.
The parameter n̄ affects the average size of the sub-regions that can be retrieved by the system. n̄ is considered more of a control value than a hyper-parameter to be optimized in this framework. In this case, the proposed method was expected to be stable to n̄. Higher values of n̄ generate larger sub-regions. Larger regions contain more contextual information, which helps the model better identify the category of the sub-regions. That was the main reason that a larger n̄ value delivered better performance. Because the purpose of this study was to develop a framework for fine-grained retrieval of sub-regions, we limited the value of n̄ to 90 in the main experiment. We also tested the retrieval framework for the settings n̄ = 180 and n̄ = 360, and got a mAP of 0.703 and a mAP of 0.711 in the Endometrium-2K dataset, respectively, which were better than those obtained with n̄ ≤ 90. However, the retrieval database was occupied by large WSI regions. Simultaneously, the computational amount of the self-attention module in LAGE-Net grows quadratically as n̄ increases, which increases difficulties to the training and inference of LAGE-Net.
Generally, the source of a query region, i.e. the organ a region comes from, is available in the digital pathology system, and the retrieval is usually performed within the WSIs from the same organ. Therefore, we did not mix the histopathology WSIs from different organs to establish the retrieval database. The proposed global location embeddings used to describe the location of the patches were calculated in the same resolution and then scaled based on the minimum and maximum distance values of the database. It ensures the same embedding represents consistent semantics. And when the retrieval system needs to be generalized to a retrieval database containing WSIs from different organs or even in different resolutions, the distances to the border should


Fig. 8. Visualization of the retrieval performance of the proposed method on the Endometrium-2K dataset, where the first column provides the 5 query regions in different types of lesions, the top-returned regions from the retrieval are ranked on the right, the irrelevant returned regions (having different labels from the query graph) are framed in red and the pixel sizes of the regions are located at the top left of the images. Please check the supplemental material for the high resolution version of the figure. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

be properly scaled to make sure the same embedding represents the equivalent actual distance.
The models in the proposed framework were trained based on supervised learning, which depends on the manual annotations of pathologists. Theoretically, the CNN and LAGE-Net can potentially be trained based on the methodology of unsupervised learning, especially contrastive representation learning (He et al., 2020; Grill et al., 2020; Chen et al., 2021). Therefore, one of the future works will focus on deploying the proposed framework without pathologists' manual annotations.
The encoding of regions in the proposed framework is divided into three separate stages: feature extraction, graph construction, and graph encoding. Another future work will focus on combining the three stages into an integrated model that can be trained end-to-end and can simultaneously predict the representative regions in the WSI and encode these regions to establish the retrieval database.

5. Conclusions

In this paper, we propose a novel histopathological image retrieval framework for a large-scale WSI database based on location-aware graphs and deep hashing techniques. The sub-regions in the WSI are represented as graphs with image features, internal connection information and WSI location information. The graphs are encoded by the designed LAGE-Net and archived with binary codes for hash retrieval. The experimental results on an in-house endometrium dataset and a public lung dataset have demonstrated that the proposed method achieves state-of-the-art retrieval performance. The LAGE-Net is scalable to size and shape variations of query regions and can effectively retrieve relevant regions that contain similar content and structure of the tissue. It allows pathologists to create query regions by free-curves on the digital pathology platform. Benefiting from the hashing structure, the retrieval process is completed based on Hamming distance, which is very time-saving. One future work is to build an integrated model for representative generation and indexing of whole slide images. Another future work will focus on developing unsupervised and weakly supervised retrieval frameworks.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Yushan Zheng: Conceptualization, Methodology, Software, Writing – original draft. Zhiguo Jiang: Funding acquisition, Supervision. Jun Shi: Formal analysis, Writing – review & editing. Fengying Xie: Writing – review & editing. Haopeng Zhang: Supervision, Validation. Wei Luo: Data curation. Dingyi Hu: Investigation, Visualization. Shujiao Sun: Data curation. Zhongmin Jiang: Data curation, Supervision. Chenghai Xue: Resources, Writing – review & editing.


Acknowledgments

This work was partly supported by the National Natural Science Foundation of China [Grant numbers 61901018, 61771031, 61906058, and 61471016], partly supported by the China Postdoctoral Science Foundation [Grant number 2019M650446], partly supported by Tianjin Science and Technology Major Project [Grant number 18ZXZNSY00260], partly supported by the Anhui Provincial Natural Science Foundation [Grant number 1908085MF210], partly supported by the Fundamental Research Funds for the Central Universities of China [Grant number JZ2020YYPY0093], and partly supported by the 111 Project [Grant number B13003].

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.media.2021.102308

References

Bejnordi, B.E., Balkenhol, M., Litjens, G., Holland, R., Bult, P., Karssemeijer, N., van der Laak, J.A., 2016. Automated detection of DCIS in whole-slide H&E stained breast histopathology images. IEEE Trans. Med. Imaging 35 (9), 2141–2150. doi:10.1109/tmi.2016.2550620.
Bejnordi, B.E., Veta, M., Van Diest, P.J., Van Ginneken, B., Karssemeijer, N., Litjens, G., Van Der Laak, J.A., Hermsen, M., Manson, Q.F., Balkenhol, M., et al., 2017. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318 (22), 2199–2210.
Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. J. Mach. Learn. Res 3 (Jan), 993–1022.
Caicedo, J.C., Gonzalez, F.A., Romero, E., 2008. A semantic content-based retrieval method for histopathology images. In: Asia Information Retrieval Conference on Information Retrieval Technology, pp. 51–60.
Caicedo, J.C., Izquierdo, E., 2010. Combining low-level features for improved classification and retrieval of histology images. Ibai Publ. 2 (1), 68–82.
Chen, P., Shi, X., Liang, Y., Li, Y., Yang, L., Gader, P.D., 2020. Interactive thyroid whole slide image diagnostic system using deep representation. Comput. Methods Prog. Biomed. 195, 105630.
Chen, X., Xie, S., He, K., 2021. An empirical study of training self-supervised vision transformers. arXiv preprint arXiv:2104.02057.
Cheng, S., Wang, L., Du, A., 2019. Histopathological image retrieval based on asymmetric residual hash and DNA coding. IEEE Access 7, 101388–101400.
Comaniciu, D., Meer, P., Foran, D., 1998. Shape-based image indexing and retrieval for diagnostic pathology. In: Pattern Recognition, 1998. Proceedings. Fourteenth International Conference on, 1. IEEE, pp. 902–904.
Comaniciu, D., Meer, P., Foran, D., Medl, A., 1998. Bimodal system for interactive indexing and retrieval of pathology images. In: Applications of Computer Vision, 1998. WACV. Proceedings. Fourth IEEE Workshop on, pp. 76–81.
Day, W.H.E., Edelsbrunner, H., 1984. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif 1 (1), 7–24.
Hollon, T.C., et al., 2020. Near real-time intraoperative brain tumor diagnosis using stimulated raman histology and deep neural networks. Nat. Med. 26 (1), 52–58.
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q., 2017. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708.
Jia, Z., Huang, X., Eric, I., Chang, C., Xu, Y., 2017. Constrained deep weak supervision for histopathology image segmentation. IEEE Trans. Med. Imaging 36 (11), 2376–2388.
Jiang, M., Zhang, S., Huang, J., Yang, L., Metaxas, D.N., 2016. Scalable histopathological image analysis via supervised hashing with multiple features. Med. Image Anal. 34, 3–12.
Kalra, S., Tizhoosh, H., Choi, C., Shah, S., Diamandis, P., Campbell, C.J., Pantanowitz, L., 2020. Yottixel–An image search engine for large archives of histopathology whole slide images. Med. Image Anal. 65, 101757.
Kipf, T.N., Welling, M., 2016. Semi-supervised classification with graph convolutional networks. In: Proceedings of Advances in Neural Information Processing Systems.
Li, Z., Zhang, J., Tan, T., Teng, X., Sun, X., Zhao, H., Liu, L., Xiao, Y., Lee, B., Li, Y., Zhang, Q., Sun, S., Zheng, Y., Yan, J., Li, N., Hong, Y., Ko, J., Jung, H., Liu, Y., cheng Chen, Y., wei Wang, C., Yurovskiy, V., Maevskikh, P., Khanagha, V., Jiang, Y., Yu, L., Liu, Z., Li, D., Schuffler, P.J., Yu, Q., Chen, H., Tang, Y., Litjens, G., 2021. Deep learning methods for lung cancer segmentation in whole-slide histopathology images–The ACDC@LungHP challenge 2019. IEEE J. Biomed. Health Inform. 25 (2), 429–440.
Li, Z., Zhang, X., Müller, H., Zhang, S., 2018. Large-scale retrieval for medical image analytics: a comprehensive review. Med. Image Anal. 43, 66–84.
Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Van Der Laak, J.A., Van Ginneken, B., Sánchez, C.I., 2017. A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88.
Liu, W., Wang, J., Ji, R., Jiang, Y.G., 2012. Supervised hashing with kernels. In: Computer Vision and Pattern Recognition, pp. 2074–2081.
Ma, Y., Jiang, Z., Zhang, H., Xie, F., Zheng, Y., Shi, H., Zhao, Y., Shi, J., 2017. Breast histopathological image retrieval based on latent Dirichlet allocation. IEEE J. Biomed. Health Inform. 21 (4), 1114–1123. doi:10.1109/JBHI.2016.2611615.
Ma, Y., Jiang, Z., Zhang, H., Xie, F., Zheng, Y., Shi, H., Zhao, Y., Shi, J., 2018. Generating region proposals for histopathological whole slide image retrieval. Comput. Methods Prog. Biomed. 159, 1–10.
Maaten, L.v.d., Hinton, G., 2008. Visualizing data using t-SNE. J. Mach. Learn. Res 9 (Nov), 2579–2605.
Mehta, N., Alomari, R.S., Chaudhary, V., 2009. Content based sub-image retrieval system for high resolution pathology images using salient interest points. In: International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 3719–3722.
Peng, T., Boxberg, M., Weichert, W., Navab, N., Marr, C., 2019. Multi-task learning of a deep K-nearest neighbour network for histopathological image classification and retrieval. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 676–684.
Sapkota, M., Shi, X., Xing, F., Yang, L., 2018. Deep convolutional hashing for low-dimensional binary embedding of histopathological images. IEEE J. Biomed. Health Inform. 23 (2), 805–816.
Shi, X., Sapkota, M., Xing, F., Liu, F., Cui, L., Yang, L., 2018. Pairwise based deep ranking hashing for histopathology image classification and retrieval. Pattern Recognit. 81, 14–22.
Shi, X., Xing, F., Xu, K., Xie, Y., Su, H., Yang, L., 2017. Supervised graph hashing for histopathology image retrieval and classification. Med. Image Anal. 42, 117.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Sparks, R., Madabhushi, A., 2011. Out-of-sample extrapolation using semi-supervised
Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N., 2021. manifold learning (OSE-SSL): content-based image retrieval for prostate histol-
An image is worth 16 × 16 words: transformers for image recognition at scale. ogy grading. In: International Symposium on Biomedical Imaging, pp. 734–737.
In: ICLR 2021: The Ninth International Conference on Learning Representations. Sridhar, A., Doyle, S., Madabhushi, A., 2011. Boosted spectral embedding (BOSE): ap-
Doyle, S., Hwang, M., S, N., MD, F., JE, T., A, M., 2007. Using manifold learning for plications to content-based image retrieval of histopathology. In: International
content-based image retrieval of prostate histopathology. Medical Image Com- Symposium on Biomedical Imaging, pp. 1897–1900.
puting and Computer-Assisted Intervention. Tizhoosh, H.R., Babaie, M., 2018. Representing medical images with encoded local
Erfankhah, H., Yazdi, M., Babaie, M., Tizhoosh, H.R., 2019. Heterogeneity-aware projections. IEEE Trans. Biomed. Eng. 65 (10), 2267–2277.
local binary patterns for retrieval of histopathology images. IEEE Access 7, Jimenez-del Toro, O., Otálora, S., Atzori, M., Müller, H., 2017. Deep multimodal
18354–18367. case-based retrieval for large histopathology datasets. In: MICCAI 2018 Work-
Falk, T., Mai, D., Bensch, R., Çiçek, Ö., Abdulkadir, A., Marrakchi, Y., Böhm, A., Deub- shop on Patch-based Techniques in Medical Imaging. Springer, pp. 149–157.
ner, J., Jäckel, Z., Seiwald, K., et al., 2019. U-Net: deep learning for cell counting, Uijlings, J.R.R., Sande, K.E.A., van de Gevers, T., Smeulders, A.W.M., 2013. Selective
detection, and morphometry. Nat. Methods 16 (1), 67. search for object recognition. Int. J. Comput. Vis. 104 (2), 154–171.
Grill, J.-B., Strub, F., AltchȨ, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doer- Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L.,
sch, C., Pires, B.A., Guo, Z.D., Azar, M.G., Piot, B., Kavukcuoglu, K., Munos, R., Polosukhin, I., 2017. Attention is all you need. In: Proceedings of the 31st
Valko, M., 2020. Bootstrap your own latent: a new approach to self-super- International Conference on Neural Information Processing Systems, vol. 30,
vised learning. In: Advances in Neural Information Processing Systems, vol. 33, pp. 5998–6008.
pp. 21271–21284. Veta, M., Heng, Y.J., Stathonikos, N., Bejnordi, B.E., Beca, F., Wollmann, T., Rohr, K.,
Gu, Y., Jie, Y., 2018. Densely-connected multi-magnification hashing for histopatho- Shah, M.A., Wang, D., Rousson, M., et al., 2019. Predicting breast tumor prolif-
logical image retrieval. IEEE J. Biomed. Health Inform. 23 (4), 1683–1691. eration from whole-slide images: the TUPAC16 challenge. Med. Image Anal. 54,
Gu, Y., Yang, J., 2019. Multi-level magnification correlation hashing for scalable 111–121.
histopathological image retrieval. Neurocomputing 351, 134–145. Wetzel, A.W., Crowley, R., Kim, S., Dawson, R., Zheng, L., Joo, Y.M., Yagi, Y., Gilbert-
Gurcan, M.N., Boucheron, L.E., Can, A., Madabhushi, A., Rajpoot, N.M., Yener, B., son, J., Gadd, C., Deerfield, D.W., 1999. Evaluation of prostate tumor grades by
2009. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, content-based image retrieval. In: Proceedings of SPIE - The International Soci-
147–171. ety for Optical Engineering, vol. 3584, pp. 244–252.
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R., 2020. Momentum contrast for unsuper- Xu, J., Xiang, L., Liu, Q., Gilmore, H., Wu, J., Tang, J., Madabhushi, A., 2015. Stacked
vised visual representation learning. In: 2020 IEEE/CVF Conference on Computer sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology
Vision and Pattern Recognition (CVPR), pp. 9729–9738. images. IEEE Trans. Med. Imaging 35 (1), 119–130.
He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recogni- Xu, Y., Jia, Z., Wang, L.B., Ai, Y., Zhang, F., Lai, M., Chang, I.C., 2017. Large scale
tion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern tissue histopathology image classification, segmentation, and visualization via
Recognition, pp. 770–778. deep convolutional activation features. BMC Bioinform. 18 (1), 281.


Xu, Y., Zhu, J.Y., Chang, I.C., Lai, M., Tu, Z., 2014. Weakly supervised histopathol- Zheng, Y., Jiang, B., Shi, J., Zhang, H., Xie, F., 2019. Encoding histopathological WSIs
ogy cancer image segmentation and classification. Med. Image Anal. 18 (3), using GNN for scalable diagnostically relevant regions retrieval. In: International
591–604. Conference on Medical Image Computing and Computer-Assisted Intervention.
Yan, R., Ren, F., Wang, Z., Wang, L., Zhang, T., Liu, Y., Rao, X., Zheng, C., Zhang, F., Springer, pp. 550–558. doi:10.1007/978- 3- 030- 32239- 7_61.
2020. Breast cancer histopathological image classification using a hybrid deep Zheng, Y., Jiang, Z., Shi, J., Ma, Y., 2014. Retrieval of pathology image for breast can-
neural network. Methods 173, 52–60. cer using PLSA model based on texture and pathological features. In: 2014 IEEE
Ying, Z., You, J., Morris, C., Ren, X., Hamilton, W., Leskovec, J., 2018. Hierarchical International Conference on Image Processing (ICIP). IEEE, pp. 2304–2308.
graph representation learning with differentiable pooling. In: Advances in Neu- Zheng, Y., Jiang, Z., Xie, F., Zhang, H., Ma, Y., Shi, H., Zhao, Y., 2017. Feature ex-
ral Information Processing Systems, pp. 4800–4810. traction from histopathological images based on nucleus-guided convolutional
Zhang, S., Metaxas, D., 2016. Large-scale medical image analytics: recent method- neural network for breast lesion classification. Pattern Recognit. 71, 14–25.
ologies, applications and Future directions. Med. Image Anal. 33, 98–101. doi:10.1016/j.patcog.2017.05.010.
Zhang, X., Dou, H., Ju, T., Xu, J., Zhang, S., 2015. Fusing heterogeneous features from Zheng, Y., Jiang, Z., Zhang, H., Xie, F., Ma, Y., Shi, H., Zhao, Y., 2018. Histopathological
stacked sparse autoencoder for histopathological image analysis. IEEE J. Biomed. whole slide image analysis using context-based CBIR. IEEE Trans. Med. Imaging
Health Inform. 20 (5), 1377–1383. 37 (7), 1641–1652.
Zhang, X., Liu, W., Dundar, M., Badve, S., Zhang, S., 2015. Towards large-scale Zheng, Y., Jiang, Z., Zhang, H., Xie, F., Ma, Y., Shi, H., Zhao, Y., 2018. Size-scalable
histopathological image analysis: hashing-based image retrieval. IEEE Trans. content-based histopathological image retrieval from database that consists of
Med. Imaging 34 (2), 496–506. doi:10.1109/tmi.2014.2361481. WSIs. IEEE J. Biomed. Health Inform. 22 (4), 1278–1287.
Zheng, L., Wetzel, A.W., Gilbertson, J., Becich, M.J., 2004. Design and analysis of Zhou, G., Jiang, L., 2004. Content-based cell pathology image retrieval by combining
a content-based pathology image retrieval system. IEEE Trans. Inf. Technol. different features. Med. Imaging Pacs Imaging Inform 5371, 326–333.
Biomed. 7 (4), 249–255.
