
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 24, NO. 5, MAY 2020 1405

Clinical Interpretable Deep Learning Model for Glaucoma Diagnosis

WangMin Liao, BeiJi Zou, RongChang Zhao, YuanQiong Chen, ZhiYou He, and MengJie Zhou

Abstract—Despite the potential to revolutionise disease diagnosis by performing data-driven classification, the clinical interpretability of ConvNets remains challenging. In this paper, a novel clinically interpretable ConvNet architecture is proposed, not only for accurate glaucoma diagnosis but also for more transparent interpretation, by highlighting the distinct regions recognised by the network. To the best of our knowledge, this is the first work to provide an interpretable diagnosis of glaucoma with a popular deep learning model. We propose a novel scheme for aggregating features from different scales to promote the performance of glaucoma diagnosis, which we refer to as M-LAP. Moreover, by modelling the correspondence from binary diagnosis information to spatial pixels, the proposed scheme generates glaucoma activations, which bridge the gap between global semantic diagnosis and precise localization. In contrast to previous works, it can discover the distinctive local regions in fundus images as evidence for clinically interpretable glaucoma diagnosis. Experimental results on the challenging ORIGA dataset show that our method outperforms state-of-the-art methods on glaucoma diagnosis with the highest AUC (0.88). Remarkably, the extensive results on optic disc segmentation (Dice of 0.9) and local disease focus localization based on the evidence map demonstrate the effectiveness of our method on clinical interpretability.

Index Terms—Glaucoma diagnosis, clinical interpretation, medical image processing.

Fig. 1. Top: a normal image. Bottom: a glaucoma image. The glaucoma image has a higher cup-to-disc ratio (CDR), and some glaucoma images have bleeding spots and a notch on the neuroretinal rim. These are the evidence for glaucoma diagnosis.

I. INTRODUCTION

GLAUCOMA is a major chronic eye disease and the second leading cause of blindness worldwide, affecting around 80 million people by 2020 [1], [2]. Since glaucoma can cause irreversible vision loss, early diagnosis is critical to slowing its progression [3]. Clinically, the usual diagnosis includes intra-ocular pressure and visual field loss tests together with a manual assessment of the optic disc (OD) through ophthalmoscopy. However, manual detection is difficult and time-consuming due to its complex procedure. As shown in Fig. 1, manual measurement is always required to quantitatively assess the structural changes and progressive damage of the optic nerve head (ONH) caused by glaucoma [4]. In clinical practice, widely adopted quantitative measurements include the cup-to-disc ratio (CDR), the rim-to-disc area ratio, the disc diameter, the disc area and so on [5]. Besides, the notch on the neuroretinal rim [6], bleeding on the optic disc [7] and defects in the retinal nerve fibre layer [8] are employed as evidence to provide detailed information for an accurate assessment of the ONH. Therefore, the clinical evidence of glaucoma is distributed on the OD.

Nowadays, convolutional neural network (CNN) based ONH assessment methods have been widely used for large-scale automated diagnosis of glaucoma [9]–[14]. With the rapid development of medical imaging [15], [16], these machine learning methods make rapid diagnosis possible, which is significant for screening in community health centres [17], [18]. Although these methods have made breakthroughs in automated glaucoma diagnosis, they still suffer from some weaknesses. The most criticised one is the lack of clinical interpretation and explicit diagnostic evidence [19]–[21]. CNN-based methods can often provide diagnostic conclusions accurately; however, they cannot bring out the facts or reasons why those conclusions are made. To solve this problem, we provide a pathological condition for physicians and an intuitive interpretation for patients of how the diagnosis is made, as clinical evidence. In a computer-aided

Manuscript received February 19, 2019; revised July 1, 2019 and August 23, 2019; accepted October 15, 2019. Date of publication October 23, 2019; date of current version May 6, 2020. This work was supported in part by the National Natural Science Foundation of China (61702558 and 61573380); in part by the Hunan Natural Science Foundation (2017JJ3411); in part by Key R&D Projects in Hunan (2017WK2074); in part by the National Key R&D Program of China (2017YFC0840104); and in part by the Graduate Research and Innovation Project of Central South University (2019zzts587). (Corresponding author: RongChang Zhao.)
The authors are with the School of Computer Science and Engineering, Central South University, Changsha 410083, China, and also with the Hunan Engineering Research Center of Machine Vision and Intelligent Medicine, Changsha 410012, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/JBHI.2019.2949075
2168-2194 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 2. (a) The proposed method not only obtains automated diagnostic conclusions but also provides the clinical evidence for the accurate diagnosis. (b) Traditional segmentation-based methods first measure the CDR from the segmented result, which requires strong prior information and user interaction, and then diagnose glaucoma based on the segmentation results.

approach, the clinical evidence of glaucoma is often shown as changes in intensity or structure in local regions. Unfortunately, a modern CNN has difficulty with the problem of evidence identification, because we use the CNN as a black box: the clinical evidence is hidden inside it. The most challenging task is to bridge the gap between the evidence of a model and the understanding of the ophthalmologist. Due to the pyramid structure of a CNN, the flow of information and the region of interest are imperceptible. Since we cannot open the old box, we can make an openable one.

Designing a system which provides reliable evidence for accurate diagnosis of glaucoma is a challenging task in clinical practice [22]. For the ophthalmologist, clear and easily understood evidence for the glaucoma diagnosis is the localization of lesions. Existing methods usually treat evidence extraction and glaucoma diagnosis as two separate tasks solved with two independent systems. On the one hand, many methods [9], [23]–[26] have been proposed to find the evidence area by localizing and segmenting the anatomies with supervised techniques. However, those segmented areas are not always sensitive to the accurate diagnosis as pathological conditions. On the other hand, glaucoma diagnosis is formulated as a classification problem in machine learning to be solved end-to-end [9]–[13]. The classification model is a black box, and neither clinicians nor patients can be told why a decision is made, only what it is. Multi-task learning [10] has been used to find the segmented area and diagnose glaucoma simultaneously. However, multi-task learning usually needs large-scale pixel-level annotation, which is expensive to obtain. Weakly-supervised learning has the ability to find local special regions with only classification labels [27]. Fig. 2 demonstrates how the proposed method differs from segmentation-based methods in obtaining the evidence.

To be clear and easily understood, the evidence should be highly correlated with the diagnosis. In fact, for the ophthalmologist, the clinical evidence for the glaucoma diagnosis is the segmentation of the optic disc and cup along with the localization of lesions. If the region of interest matches the clinical evidence area (optic disc and lesions), the model is interpretable. In this paper, we propose a novel clinically interpretable ConvNet architecture (EAMNet) not only to achieve accurate glaucoma diagnosis but also to provide a more transparent interpretation by highlighting the distinct regions recognized by the network. Our EAMNet makes the deep model interpretable, benefitting from three facts: 1) the model imitates the diagnosis process of clinical physicians, who discover evidence to support the diagnosis; the proposed EAMNet not only gives the diagnosis results but also provides a visual region of interest (ROI) to corroborate the reliability of the diagnosis decision. 2) the proposed EAMNet employs three distinguished components to accurately discover local regions with particular appearance and features to support the glaucoma diagnosis. Specifically, a well-designed CNN is constructed to abstract hierarchical information for semantic feature extraction and automated glaucoma diagnosis, and a novel method, Multi-Layers Average Pooling (M-LAP), is proposed to build an information passageway that bridges the gap between semantic information and localization information at multiple scales. 3) the results produced by our EAMNet are interpretable for glaucoma diagnosis because it can discover ophthalmic lesions and key anatomical regions (the OD) automatically without any pixel-level annotation, as shown in Section III. The contributions of our work are as follows:

1) For the first time, a clinically interpretable deep learning model is proposed that not only achieves accurate automated glaucoma diagnosis but also provides a more transparent interpretation by highlighting the distinct regions that support the diagnosis.

2) A novel method, Multi-Layers Average Pooling (M-LAP), is proposed to integrate features of different levels for accurate glaucoma diagnosis while building an information passageway to bridge the gap between semantic information and localization information at multiple scales. Collaborating with Evidence Activation Mapping, this method outputs both a fully-supervised diagnosis and a weakly-supervised evidence localization.

3) We achieve clinically interpretable diagnosis results of high accuracy. Our method achieves state-of-the-art glaucoma diagnosis with an Area Under the Curve (AUC) of 0.88, and it provides evidence activation maps that give the clinical basis of glaucoma, which is meaningful for the clinical application of CNNs.

II. METHODOLOGY

The proposed framework (EAMNet), as shown in Fig. 3, consists of three main parts: a CNN backbone network for hierarchical feature extraction and aggregation, Multi-Layers Average Pooling (M-LAP) to bridge the gap between semantic information and localization information at multiple scales, and Evidence Activation Mapping for evidence identification and discovery. We adopt a classification network with ResBlocks and multiple convolutional layers as a backbone network, which obtains excellent representation via aggregation of complex hierarchical features.
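As a purely illustrative sketch of this three-part pipeline, the following NumPy code mimics the data flow: backbone feature maps at three scales, M-LAP-style resizing, concatenation and global pooling, and a weighted sum that yields both a class score and a spatial evidence map. All names, sizes, and weights here are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def backbone(image):
    """Stand-in for the CNN backbone: three feature maps at decreasing
    resolutions (refined, coarse, discriminative). Sizes are assumed."""
    return [rng.standard_normal((8, 28, 28)),   # Conv_3x-like stage
            rng.standard_normal((8, 14, 14)),   # Conv_4x-like stage
            rng.standard_normal((8, 7, 7))]     # Conv_5x-like stage

def resize_nearest(fmap, size):
    """Nearest-neighbour upsampling so all maps share one spatial size."""
    c, h, w = fmap.shape
    return fmap.repeat(size // h, axis=1).repeat(size // w, axis=2)

def m_lap(feature_maps, size=28):
    """M-LAP sketch: resize, concatenate along channels, then global
    spatial pooling to a single value per channel."""
    stacked = np.concatenate([resize_nearest(f, size) for f in feature_maps])
    pooled = stacked.sum(axis=(1, 2))           # one value per channel
    return stacked, pooled

features = backbone(None)
stacked, pooled = m_lap(features)
w = rng.standard_normal(pooled.shape[0])        # class weights (assumed)
score = w @ pooled                              # class score fed to softmax
eam = np.tensordot(w, stacked, axes=1)          # spatial evidence map
```

Note that the scalar score equals the sum of the evidence map over all pixels; this is the property that lets the classifier weights be projected back onto spatial locations.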

To bridge the gap between semantic information and localization information, we apply a novel block, M-LAP, to the convolutional hierarchical feature maps. The method produces the diagnosis conclusion and meanwhile provides an evidence activation map. EAM then projects the binary diagnosis conclusion back onto the convolutional feature maps and activates the local pixels which contribute to the glaucoma diagnosis. Therefore, EAMNet can discover and identify the particular local regions of the fundus image (notch on the neuroretinal rim, bleeding on the optic disc, defects on the optic disc, etc.).

Fig. 3. The overview of the proposed EAMNet, containing three main parts: a CNN backbone network for hierarchical feature extraction and aggregation, Multi-Layers Average Pooling (M-LAP) to bridge the gap between semantic information and localization information at multiple scales, and Evidence Activation Mapping for evidence identification and discovery.

A. Backbone Architecture

The backbone of EAMNet is a feature-expressive representation network with multiple convolutional and pooling layers, as shown in Fig. 4. We use the ResBlock [29] as the basic module of our network. These ResBlocks are connected to other ResBlocks or pooling layers. We select three pooling layers according to the different levels of the feature layers, resize their outputs and concatenate them. The identity shortcut connection the ResBlock introduces provides a fairly good representation of fundus images, which largely enhances the ability to extract evidence and to diagnose glaucoma. Considering that the spatial layouts of fundus images are almost the same, and to avoid model redundancy, we configure a low number of filters. We also employ dropout and batch normalisation layers extensively to alleviate overfitting. Before being fed into the network, a fundus image is resized to 224 × 224. As can be seen in our experiments, our CNN architecture is beneficial for fundus image representation.

Note that there are five stages in the architecture. Each stage includes several ResBlocks and one pooling layer. The first stage, Conv_1, includes a 7 × 7 convolution layer; the others are ResBlocks with three convolution layers and a shortcut connection. As shown in Fig. 4, the stages differ slightly in the size of their outputs and the number of times the ResBlock is repeated. Experimentally, we select three stages as the input of M-LAP: Conv_3x, Conv_4x and Conv_5x. Their pooling layers are followed by a 1 × 1 convolution layer to decrease the parameter size, and their outputs are resized to the same size in M-LAP.

Each ResBlock is a combination of convolution layers. The architecture explicitly lets the stacked layers fit a residual mapping instead of directly fitting a desired underlying mapping. Denoting the underlying mapping as H(x), the stacked nonlinear layers fit another mapping F(x) = H(x) − x. This is the formulation of a shortcut connection: the original mapping H(x) becomes F(x) + x. Identity shortcut connections add neither extra parameters nor computational complexity [29].

Fig. 4. The overview of the backbone architecture is shown in the yellow box. The input image is a 224 × 224 RGB image. There are five stages in the architecture; each stage includes several ResBlocks and one pooling layer. The pooling layer is 2 × 2 max pooling. The architectures of the ResBlocks in different stages are shown at the bottom. The output of each selected stage is connected to the next stage and is also followed by a 1 × 1 convolution layer to decrease the parameter size. The chosen stages are the input of M-LAP.

B. Multi-Layers Average Pooling

We introduce Multi-Layers Average Pooling (M-LAP) to aggregate multi-scale global features for glaucoma diagnosis effectively. Meanwhile, M-LAP provides an information passageway to bridge the gap between semantic information and localization information at multiple scales. As shown in Fig. 5, M-LAP consists of multi-scale feature aggregation and channel-wise global pooling. Given the multi-scale feature maps, the task is to construct a classifier that accurately separates glaucoma from normal images by aggregating the feature maps, given that glaucoma lesions vary in layout and size. In our implementation, three levels of feature maps (refined, coarse and discriminative features) are aggregated to obtain expressive representations of fundus images. To aggregate features at different scales, we first resize all feature maps to the same size as the output of the feature extractor. All the resized feature maps are then concatenated into a multi-channel feature map, followed by a 1 × 1 convolution to interactively aggregate features among different channels and generate fixed-channel feature maps.

Different from a traditional classifier with a fully-connected layer, M-LAP uses global spatial pooling to abstract the semantics for accurate classification. Global spatial pooling (GSP) averages each feature map into a single representative value instead of keeping every pixel. As shown in Fig. 6, the GSP [30] layer is simple in structure and needs fewer parameters to train.
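The residual mapping described in Section II-A (the stacked layers fit F(x) = H(x) − x, so the block outputs F(x) + x) can be sketched numerically. The two-layer branch and its weights below are illustrative assumptions, not the paper's actual ResBlock.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative weights for a two-layer residual branch.
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1

def relu(x):
    return np.maximum(x, 0.0)

def res_block(x):
    """The stacked nonlinear layers fit the residual F(x); the identity
    shortcut adds x back, introducing no extra parameters."""
    fx = w2 @ relu(w1 @ x)   # F(x): the residual branch
    return fx + x            # H(x) = F(x) + x via the shortcut

x = rng.standard_normal(16)
y = res_block(x)
```

When the residual branch outputs zero, the block reduces exactly to the identity mapping, which is what makes very deep stacks of such blocks easy to optimise.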

For a given feature map, let f_ki(x, y) represent the activation of channel k in layer i at position (x, y). Then, for channel k in activation layer i, the result of global spatial pooling is F_ki = Σ_{x,y} f_ki(x, y). Thus, the output of the softmax layer for a given class c is S_c = Σ_{k,i} w^c_ki F_ki, where w^c_ki is the weight corresponding to class c for channel k in activation layer i.

By plugging F_ki = Σ_{x,y} f_ki(x, y) into the class score S_c, we obtain:

S_c = Σ_{x,y} Σ_{k,i} w^c_ki f_ki(x, y). (1)

It is easy to see that the number of pooled values Σ_{x,y} f_ki(x, y) equals the number of weights w^c_ki and the number of concatenated feature maps, which makes it possible to project the weights back onto the feature maps. Compared with a fully-connected layer, the parameters are reduced to 1/(xy).

Fig. 5. The three modules are connected in this way: the selected stages of the backbone architecture are connected to the M-LAP. Three branches with 1 × 1 convolution and resizing are used to modify the size of the feature maps so that they can be concatenated. After this process, three feature maps are generated, representing refined features, coarse features and discriminative features, respectively. Before classification, global average pooling is used to generate one-dimensional vectors.

Fig. 6. Fully connected layers flatten the feature maps. The network contains lots of redundant information, making it impossible to project information from the flattened arrays back to the feature maps. Global average pooling averages each feature map as its representation.

C. Evidence Activation Mapping

It is known that the shallower layers represent low-semantics features while the deeper layers represent discriminative features in a classification-oriented CNN [31]. Meanwhile, the shallower layers provide rich spatial information with high-resolution feature maps. Due to the pyramid structure of a CNN, discriminative feature maps shrink to an unacceptably small size in the deeper layers. This bottleneck obstructs the generation of feature maps with accurate spatial information and high semantics. In EAMNet, we describe a novel approach to generate refined evidence activation maps (EAMs) with accurate spatial evidence information for glaucoma diagnosis.

Fig. 7. Optic Disc Activation Mapping: the weights are mapped back to the previous convolutional layer to generate the Evidence Activation Maps (EAMs) as the attention scores for glaucoma classification. There are n feature maps in all, and correspondingly n weights learned in the previous process. The weighted summation of the weights and feature maps is used to generate the EAM, which highlights the glaucoma-specific discriminative regions.

Evidence activation mapping is a channel-wise, attention-based approach for evidence identification, implemented by a projection from the binary classification to spatial evidence maps. As shown in Fig. 7, the feature maps at different scales are aggregated into a single map by a weighted-sum function, which acts as an attention gate that gives the biggest weight to the feature map contributing most to the glaucoma classification and small weights to the others. Here, the weights are regarded as the attention scores for glaucoma classification and are optimised in the classification stage. With the weighted-sum function, EAMNet back-projects the attention scores from the glaucoma classification onto the different feature maps. In this implementation, we compute a weighted sum of the feature maps from the three chosen convolutional layers to obtain our EAM. Let g_ki(x, y) represent the normalised feature map of the kth kernel in the ith activation layer, where (x, y) is the coordinate of a pixel. In our method, there are three activation layers, as shown in Fig. 5: the refined, coarse and discriminative layers. Each feature map g_ki(x, y) has the same size of 28 × 28. We define M_c as the evidence activation map in which the optic disc region shares its location with the significant evidence for glaucoma diagnosis:

M_c(x, y) = Σ_i Σ_k w^c_ki g_ki(x, y), (2)

where w^c_ki is the weight learned by the classifier, as shown in Fig. 5. The kernel k and activation layer i correspond to the feature map g_ki(x, y).

Further experiments, discussed in Section III-B, indicate that the multi-scale feature-map concatenation performs better than single-scale feature maps for evidence identification, because multi-scale feature maps provide more detailed spatial information of the evidence at multiple scales. Our EAMNet makes the evidence map sharp and clear by using the refined features while enhancing its semantics by using the coarse and discriminative features. The lesions in the optic disc are accurately discovered due to the three kinds of features.
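A minimal sketch of Eq. (2): upsample and normalise the per-kernel maps g_ki from the three activation layers to a common 28 × 28 grid, then take the weighted sum with the class weights w^c_ki. The synthetic maps and weights below are placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(2)

def normalise(fmap):
    """Scale a feature map to [0, 1]; g_ki is assumed to be normalised."""
    lo, hi = fmap.min(), fmap.max()
    return (fmap - lo) / (hi - lo + 1e-8)

def upsample(fmap, size=28):
    """Nearest-neighbour upsampling to the common 28 x 28 grid."""
    h = fmap.shape[0]
    return fmap.repeat(size // h, axis=0).repeat(size // h, axis=1)

# Three activation layers (refined, coarse, discriminative), 4 kernels each.
layers = [rng.standard_normal((4, 28, 28)),
          rng.standard_normal((4, 14, 14)),
          rng.standard_normal((4, 7, 7))]
weights = [rng.standard_normal(4) for _ in layers]  # w^c_ki per layer

def evidence_activation_map(layers, weights):
    """M_c(x, y) = sum_i sum_k w^c_ki * g_ki(x, y), Eq. (2)."""
    m_c = np.zeros((28, 28))
    for g_i, w_i in zip(layers, weights):
        for g_ki, w_ki in zip(g_i, w_i):   # sum over kernels k
            m_c += w_ki * normalise(upsample(g_ki))
    return m_c

eam = evidence_activation_map(layers, weights)
```

Because the map is linear in the learned weights, feature maps with larger attention scores dominate the final evidence map, which is what makes the highlighted regions class-specific.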

III. EXPERIMENT

The effectiveness of the proposed EAMNet is validated in two aspects: the accuracy of glaucoma diagnosis and the precision of evidence identification. We perform experiments on the challenging public ORIGA dataset [34]. The experimental results verify that the proposed EAMNet achieves state-of-the-art diagnosis accuracy (0.88) and performs excellently on evidence identification.

In our experiments, the localization of lesions and the segmentation of the optic disc are employed as an instance of evidence identification for our clinically interpretable EAMNet. The pathogenesis of glaucoma, structural change of the optic nerve head, is often observed on the optic disc [1]. It is believed that when judging whether a fundus image shows glaucoma, doctors focus mostly on the optic disc and the lesions on it. Thus, when a CNN model provides a diagnosis result and meanwhile gives an evidence map locating the optic disc, we are convinced that the model is clinically interpretable. In this implementation, we make use of superpixels to soften the gradient of local features and employ ellipse fitting to obtain the segmentation of the optic disc. To the best of our knowledge, no previous work sets a criterion to measure the interpretability of a model.

A. Criteria

In this paper, we utilise the area under the curve (AUC) of the receiver operating characteristic (ROC) curve to evaluate the performance of glaucoma diagnosis. The ROC is plotted as a curve which shows the tradeoff between sensitivity (TPR) and specificity (TNR), defined as:

TPR = TP / (TP + FN), TNR = TN / (FP + TN) (3)

where TP and TN are the numbers of true positives and true negatives, and FP and FN are the numbers of false positives and false negatives, respectively.

We utilise the overlapping error E and the balanced accuracy A as the evaluation metrics for optic disc segmentation:

E = 1 − Area(S ∩ G) / Area(S ∪ G), A = (1/2)(TPR + TNR) (4)

with

TPR = TP / (TP + FN), TNR = TN / (FP + TN) (5)

where S and G denote the segmented mask and the manual ground truth, respectively.

Fig. 8. ROC curves of our method and other methods. Our method, the red one, performs better than the others.

Fig. 9. The activation map represents the optic disc and cup areas simultaneously with different activation amplitudes. The first column is the raw fundus image; the second and third columns are the activation map and the optic disc mask segmented by our EAMNet.

B. Dataset

The ORIGA dataset is used in the experiments to validate glaucoma diagnosis, disc segmentation and lesion localization. ORIGA comprises 168 glaucoma and 482 normal images from studies of a Malay population, with ground-truth cup and disc labels along with clinical glaucoma diagnoses. It was conducted over three years, from 2004 to 2007, by the Singapore Eye Research Institute and funded by the National Medical Research Council. The Singapore Malay Eye Study (SiMES) examined 3,280 Malay adults aged 40 to 80, of whom 149 are glaucoma patients. Retinal fundus images of both eyes were taken for each subject in the study [34]. The 650 images with manually labelled optic disc masks are divided into 325 training images (including 73 glaucoma cases) and 325 testing images (including 95 glaucoma cases).

1) Ablation Study: As shown in Figs. 8 and 9, the ablation study demonstrates that our method not only obtains an accurate glaucoma diagnosis but also provides a more transparent interpretation by highlighting the distinct regions recognised by the network. In Fig. 8, the ROC curve (in red) indicates that although the detection of glaucoma from colour fundus images is a challenging task, our EAMNet obtains high sensitivity and specificity. Thanks to the accurate evidence and multi-scale feature aggregation, EAMNet obtains a state-of-the-art AUC of 0.88. This is much higher than traditional image processing methods such as Airpuff, Wavelet, Gabor, and GRI. It also performs better than the superpixel and CNN methods (Chan et al. [4]). Further analytical results will be shown in the comparison results part.

Fig. 11. (1) Comparison of One-GAP-EAM and Multi-Layers Average


Pooling. The results of One-GAP-EAM and Multi-Layers Average Pool-
ing are shown in the middle and third columns, respectively. It is a 7 × 7
Fig. 10. (a) Shows that notch and bleeding spots are highlighted on map which only shows the approximate position of the optic disc. The
the final map. (b) Shows that the structural variation of blood vessels is resolution is not enough for segmentation. The result of our proposed
also highlighted. (c) Indicates that PPA is taken into consideration of our method is shown in the right row, which uses the feature maps of
method. 28 × 28, 14 × 14 and 7 × 7. The final EAM is much finer. (2) We also
test the result when EAM inserted in different layers. We find that when
shallower layers are inserted, the result will be affected by the identity
information, like vessels and texture of retina.
the comparison results part. Different from existing methods,
EAMNet develops a novel technique named as multi-layer av-
erage pooling to extract discriminative features by aggregating always visible on the fundus images of glaucoma patients. We
multi-scale information strictly related to glaucoma diagnosis. infer that our proposed method refers to not only the parameters
This strategy improves above 1.1% compared with the existing of the optic disc and cup but also some rare features in the
direct classification methods. diagnosis of glaucoma. These features are also very important
Significantly, as shown in Fig. 9, EAMNet provides the pre- clinically, sometimes decisive. Therefore, we are convinced that
cise activation area which contributes to glaucoma diagnosis. the diagnostic basis of our method is the same as that of humans.
In our experiments, the activation maps are used to localize lesions and to segment the optic disc, validating the effectiveness of EAMNet. In Fig. 9, the second column shows that the EAM activates the attention area as the pathogenic region in fundus images for glaucoma diagnosis. The third column in Fig. 9 shows the optic disc segmented by EAMNet. We can observe that EAMNet handles the challenging optic disc segmentation task even though only image-level labels are used to train our model, whereas existing methods typically achieve state-of-the-art results with supervised models trained on pixel-level labels.

It should be noted that the optic disc area is often segmented to measure structural changes for accurate glaucoma diagnosis. These distinctive regions show that EAMNet focuses on the optic disc and its lesions, where the pathogenesis of glaucoma is highlighted for the clinician's diagnosis. We also evaluate Multi-Layers Average Pooling against Single-Layer Average Pooling and find that the former largely enhances the ability of evidence activation.

Note that the distribution of the EAM is uneven, as shown in Fig. 10: some activated areas lie beyond the optic disc. Examined on their own, these areas exhibit the characteristics of glaucoma-related lesions, such as bleeding, notches, PPA, and structural variation of blood vessels, which are also crucial clinical evidence for glaucoma diagnosis.

It can thus be seen that our method is interpretable. We compare the One-GAP-EAM model with Multi-Layers Average Pooling. As shown in Fig. 11, the One-GAP-EAM result is not good enough to segment the optic disc, and no other lesion areas appear in the final One-GAP-EAM. Therefore, an EAM composed of feature maps at different resolutions is better suited to diagnosing glaucoma and to extracting glaucoma lesions comprehensively.

In addition, experiments are conducted to demonstrate how the clinical interpretation changes when the EAM module is inserted at different layers. To preserve the depth of the network, the last stage, Conv_5x, is always connected to the EAM module. As shown in Fig. 11, the outputs of randomly chosen structures are more likely to be affected by irrelevant information, such as vessels and retinal texture, because those models tend to overfit. When the representation ability of a model is weak, identity information such as vessels and retinal texture dominates. This demonstrates that our structure represents the pathology of glaucoma rather than overfitting the data set.

We then underfit EAMNet step by step to explore the relationship between glaucoma diagnosis and optic disc segmentation in the unified framework. As shown in Table I, the glaucoma diagnosis result improves as optic disc segmentation improves. We remove the batch normalization layers in each ResBlock and change the dropout rate to 0.2 to
overfit the model. As the overfitted accuracy rises, the segmentation accuracy drops. This shows that, although they look like two independent tasks, optic disc segmentation and glaucoma diagnosis in a unified framework are strongly related: segmentation of the optic disc is guided by the procedure of glaucoma diagnosis, while accurate glaucoma diagnosis is in turn promoted by effective segmentation of the optic disc as an evidence map.

TABLE I
RELATIONSHIP OF CLASSIFICATION AND SEGMENTATION

TABLE II
CLASSIFICATION ON THE ORIGA VALIDATION SET

TABLE III
OPTIC DISC SEGMENTATION ON THE ORIGA VALIDATION SET

2) Comparison Results: In this section, we compare the results of the proposed EAMNet with different types of CNN architectures and show that EAMNet obtains state-of-the-art performance on glaucoma diagnosis. As above, to quantify evidence activation, we compare the optic disc segmentations generated from the evidence activation maps by a generic and straightforward segmentation method. The compared methods are as follows. The Gabor [23] and wavelet [24] methods use manual features with a Support Vector Machine (SVM) classifier to obtain the diagnostic result. GRI [25] is a probabilistic two-stage classification method that extracts a Glaucoma Risk Index (GRI) and shows good glaucoma detection performance. The Superpixel method [26] segments the optic disc and optic cup by superpixel classification for glaucoma screening. Chen et al. [11] and Zhao et al. [9] propose two CNN methods, both with good accuracy. Meanwhile, U-Net [35] and M-Net + PT [36] are CNN-based optic disc segmentation methods.

In the experiments, manual labels are adopted as the ground truth and 10-fold cross-validation is used. We divide all samples into ten parts, each containing equal proportions of glaucomatous and normal individuals. In each round, nine parts are used for training and the remaining part for testing; the ten results are then averaged to obtain the final diagnosis result. As shown in Tables II and III, the proposed EAMNet achieves accurate glaucoma diagnosis (0.88 AUC) and optic disc segmentation (0.9 Adisc and 0.278 Edisc). EAMNet obtains precise boundaries of the optic disc and accurate glaucoma diagnosis simultaneously, since the accurate segmentation of the optic disc originates from accurate glaucoma diagnosis; in turn, the accurate segmentation (and evidence identification) promotes and verifies the accuracy of the diagnosis. Compared with state-of-the-art methods, our EAMNet achieves accurate glaucoma diagnosis while also obtaining high performance on evidence activation.

As shown in Table III, EAMNet deals effectively with the challenging optic disc segmentation task even though pixel-level labels are unavailable. Note that our method trails the other methods on optic disc segmentation: because we do not use any pixel-level labels, our task has much less supervision information than the fully-supervised methods. Those methods are included only for comparison, and the results are only for reference, to show that our weakly-supervised method is about as effective. Although it uses only image-level labels, EAMNet performs close to fully-supervised OD segmentation methods. This indicates that the main pathological area of glaucoma is located in the optic disc, which matches domain knowledge of glaucoma; and since intuitive clinical evidence of glaucoma, such as the CDR, is closely related to the optic disc, it is interpretable for the CNN activation map to cover it.

IV. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel clinically interpretable ConvNet architecture, EAMNet, for accurate glaucoma diagnosis and for a more transparent interpretation that highlights the distinct regions recognized by the network. EAMNet addresses the lack of interpretability of CNN-based glaucoma diagnosis CAD systems. Besides diagnosing glaucoma with high precision, the proposed EAMNet also gives an interpretation of the diagnosis: it performs weakly-supervised optic disc segmentation and activates extracted glaucoma lesions such as bleeding, notches, PPA, and structural variation of blood vessels. EAMNet employs ResNet and M-LAP, with three GAPs connected to three layers of different resolutions, which significantly increases the resolution of the EAM. The results show that classification performance is largely preserved while the additional function of optic disc segmentation is gained. We have demonstrated that our system produces highly accurate diagnosis and optic disc segmentation results on the ORIGA dataset.
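To make the aggregation step concrete, the M-LAP idea summarized above (class-activation maps from three stages of different resolution, averaged into one evidence activation map) can be sketched with NumPy. This is an illustrative sketch, not the authors' implementation: the names `upsample_nearest` and `evidence_activation_map`, the stage shapes, and the 64x64 output grid are all assumptions.

```python
import numpy as np

def upsample_nearest(cam, size):
    """Nearest-neighbour upsampling of an (H, W) map to (size, size)."""
    h, w = cam.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return cam[np.ix_(rows, cols)]

def evidence_activation_map(feature_maps, weights, out_size=64):
    """Average class-activation maps from several stages into one EAM.

    feature_maps: list of (C_i, H_i, W_i) arrays from stages of different resolution
    weights:      list of (C_i,) classifier weights learned behind each GAP
    """
    cams = []
    for fmap, w in zip(feature_maps, weights):
        cam = np.tensordot(w, fmap, axes=([0], [0]))   # weighted sum over channels -> (H_i, W_i)
        cams.append(upsample_nearest(cam, out_size))   # bring every stage to a common grid
    return np.mean(cams, axis=0)                       # multi-layer average

# Toy example: three stages at 8x8, 16x16 and 32x32 resolution.
rng = np.random.default_rng(0)
fmaps = [rng.standard_normal((c, s, s)) for c, s in ((64, 8), (32, 16), (16, 32))]
ws = [rng.standard_normal(c) for c in (64, 32, 16)]
eam = evidence_activation_map(fmaps, ws)
print(eam.shape)  # (64, 64)
```

Averaging the upsampled per-stage maps is what lets fine-grained evidence from the earlier, higher-resolution stages survive into the final map, which a single GAP over the last stage alone cannot provide.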


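The evaluation protocol described in the comparison experiments (ten folds with equal proportions of glaucomatous and normal samples; nine folds train, the held-out fold tests; the ten results are averaged) can be sketched in plain Python. The helper name `stratified_10_fold` and the toy cohort sizes are illustrative assumptions, not part of the paper.

```python
import random

def stratified_10_fold(labels, seed=0):
    """Split sample indices into 10 folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(10)]
    for cls in (0, 1):  # 0 = normal, 1 = glaucoma
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % 10].append(i)  # deal shuffled class members round-robin
    return folds

# Toy cohort: 30 glaucomatous and 70 normal samples.
labels = [1] * 30 + [0] * 70
folds = stratified_10_fold(labels)

# Each round, nine folds train and the held-out fold tests;
# the ten per-fold results are averaged for the final figure.
for test_fold in folds:
    train_idx = [i for fold in folds if fold is not test_fold for i in fold]
    assert len(train_idx) == 90
    # ... train on train_idx, evaluate on test_fold ...
```

Round-robin dealing within each class guarantees every fold keeps the cohort's class ratio, which matters for a threshold-free metric such as AUC on an imbalanced glaucoma/normal split.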
Based on this work, limitations and open questions can be drawn. High-resolution feature maps are hard to represent with GAP. Besides, the optic cup is also important and related to glaucoma diagnosis. Further studies are needed to design a more empirical model that achieves clear optic cup segmentation through weakly-supervised evidence exploration.

REFERENCES

[1] E. Dervisevic, S. Pavljasevic, A. Dervisevic, and S. Kasumovic, "Challenges in early glaucoma detection," Med. Arch., vol. 70, pp. 203-207, 2016.
[2] H. A. Quigley and A. T. Broman, "The number of people with glaucoma worldwide in 2010 and 2020," Brit. J. Ophthalmol., vol. 90, pp. 262-267, 2006.
[3] M. C. Leske, A. Heijl, L. Hyman, B. Bengtsson, and E. Komaroff, "Factors for progression and glaucoma treatment: The early manifest glaucoma trial," Current Opinion Ophthalmol., vol. 15, pp. 102-106, 2004.
[4] W. Chan et al., "Analysis of retinal nerve fiber layer and optic nerve head in glaucoma with different reference plane offsets, using optical coherence tomography," Dig. World Core Med. J., 2006.
[5] M. A. Fernandez-Granero et al., "Automatic CDR estimation for early glaucoma diagnosis," J. Healthcare Eng., vol. 2017, pp. 1-14, 2017.
[6] M. H. Tan et al., "Automatic notch detection in retinal images," in Proc. IEEE Int. Symp. Biomed. Imag., 2013, pp. 1440-1443.
[7] S. Vernon, "How to screen for glaucoma," Practitioner, vol. 239, pp. 257-260, 1995.
[8] J. B. Jonas et al., "Ranking of optic disc variables for detection of glaucomatous optic nerve damage," Investigative Ophthalmol. Vis. Sci., vol. 41, pp. 1764-1773, 2000.
[9] R. Zhao et al., "Automatic detection of glaucoma based on aggregated multi-channel features," J. Comput.-Aided Des. Comput. Graph., vol. 29, pp. 998-1006, 2017.
[10] B. Zou et al., "Classified optic disc localization algorithm based on verification model," Comput. Graph., vol. 70, pp. 281-287, 2018.
[11] X. Chen et al., "Glaucoma detection based on deep convolutional neural network," in Proc. IEEE Eng. Med. Biol. Soc., 2015, pp. 715-718.
[12] P. Junior and A. Sarmento, "A HW/SW embedded system for accelerating diagnosis of glaucoma from eye fundus images," in Proc. Int. Symp. Rapid Syst. Prototyping: Shortening Path Specification Prototype, 2016, pp. 12-18.
[13] T. J. Jun et al., "2sRanking-CNN: A 2-stage ranking-CNN for diagnosis of glaucoma from fundus images using CAM-extracted ROI as intermediate input," Preprint, 2018, arXiv:1805.05727.
[14] S. Yousefi et al., "Glaucoma progression detection using structural retinal nerve fiber layer measurements and functional visual field points," IEEE Trans. Biomed. Eng., vol. 61, no. 4, pp. 1143-1154, Apr. 2014.
[15] Z. Gao et al., "Motion tracking of the carotid artery wall from ultrasound image sequences: A nonlinear state-space approach," IEEE Trans. Med. Imag., vol. 37, no. 1, pp. 273-283, Jan. 2018.
[16] Z. Gao, H. Xiong, and X. Liu, "Robust estimation of carotid artery wall motion using the elasticity-based state-space approach," Med. Image Anal., vol. 37, pp. 1-21, 2017.
[17] K. Blekas, D. I. Fotiadis, and A. Likas, "Greedy mixture learning for multiple motif discovery in biological sequences," Bioinformatics, vol. 19, pp. 607-617, 2003.
[18] K. Kourou, C. Papaloukas, and D. I. Fotiadis, "Modeling biological data through dynamic Bayesian networks for oral squamous cell carcinoma classification," in Proc. World Congr. Med. Phys. Biomed. Eng., 2018, pp. 375-379.
[19] Q. Zhang et al., "Interpreting CNN knowledge via an explanatory graph," in Proc. Thirty-Second Assoc. Advancement Artif. Intell. Conf. Artif. Intell., 2017, pp. 4454-4463.
[20] C. M. Olthoff et al., "Noncompliance with ocular hypotensive treatment in patients with glaucoma or ocular hypertension: An evidence-based review," Ophthalmology, vol. 112, pp. 953-961, 2005.
[21] H. I. Suk and D. Shen, "Deep learning in diagnosis of brain disorders," in Recent Progress in Brain and Cognitive Engineering, 2015, pp. 203-213.
[22] R. Zhao et al., "Weakly-supervised simultaneous evidence identification and segmentation for automated glaucoma diagnosis," in Proc. Thirty-Third Assoc. Advancement Artif. Intell. Conf. Artif. Intell., Jul. 2019, vol. 33, no. 01, pp. 809-816.
[23] U. R. Acharya et al., "Decision support system for the glaucoma using Gabor transformation," Biomed. Signal Process. Control, vol. 15, pp. 18-26, 2015.
[24] S. Dua, U. R. Acharya, P. Chowriappa, and S. V. Sree, "Wavelet-based energy features for glaucomatous image classification," IEEE Trans. Inf. Technol. Biomed., vol. 16, no. 1, pp. 80-87, Jan. 2012.
[25] R. Bock, J. R. Meier, and L. G. Nyúl, "Glaucoma risk index: Automated glaucoma detection from color fundus images," Med. Image Anal., vol. 14, pp. 471-481, 2010.
[26] J. Cheng et al., "Superpixel classification based optic disc and optic cup segmentation for glaucoma screening," IEEE Trans. Med. Imag., vol. 32, no. 6, pp. 1019-1032, Jun. 2013.
[27] Z. Wang et al., "Zoom-in-Net: Deep mining lesions for diabetic retinopathy detection," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2017, pp. 267-275.
[28] S. Barratt, "InterpNET: Neural introspection for interpretable deep learning," Preprint, 2017, arXiv:1710.09511.
[29] K. He et al., "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778.
[30] B. Zhou et al., "Learning deep features for discriminative localization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 2921-2929.
[31] J. Ngiam et al., "Sparse filtering," in Proc. Int. Conf. Neural Inf. Process. Syst., 2011, pp. 1125-1133.
[32] M. Oquab et al., "Is object localization for free? Weakly-supervised learning with convolutional neural networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 685-694.
[33] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 1-9.
[34] Z. Zhang et al., "ORIGA-light: An online retinal fundus image database for glaucoma analysis and research," in Proc. IEEE Conf. Eng. Med. Biol. Soc., 2010, pp. 3065-3068.
[35] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assisted Intervention, 2015, pp. 234-241.
[36] H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao, "Joint optic disc and cup segmentation based on multi-label deep network and polar transformation," IEEE Trans. Med. Imag., vol. 37, no. 7, pp. 1597-1605, Jul. 2018.
