Retinal Medical Image Classification Based On Deep Convolutional Neural Network AlexNet
Abstract—Eye diseases can seriously affect the life, study and work of patients. To better assist doctors in their work, it is valuable to use deep neural networks for medical image analysis and auxiliary medical diagnosis. In this paper, we use the deep neural network AlexNet combined with the Adam optimization algorithm to classify images of four common eye diseases: vitreous opacity, vitreous opacity with retinal detachment, asteroid hyalosis and vitreous hemorrhage. The classification performance is evaluated with the confusion matrix, accuracy, precision, recall, specificity and other evaluation metrics. Results on ophthalmic ultrasound images from actual hospitals show that AlexNet achieves high classification accuracy on real ultrasound images and can be used to assist doctors in diagnosing ophthalmic diseases.

Index Terms—ultrasound pattern; computer aided diagnosis; deep learning; AlexNet

I. INTRODUCTION

The eye is one of the most important sensory organs in the human body. Eye disease can have a serious impact on a person's life, studies and work [1, 2]. It is essential to prevent, diagnose and treat eye disease in a timely manner. Many people around the world suffer from eye diseases and even face the risk of blindness, so new technical means are needed to find and diagnose eye disease as early as possible [3, 4].

Common eye diseases include the following. (1) Glaucoma is a group of diseases that damage the optic nerve, causing vision loss and visual field loss. According to recent estimates, the number of glaucoma patients worldwide will increase from 76.5 million in 2020 to 111.8 million in 2040 [5]. (2) Cataract is a visual disorder caused by opacity of the lens due to protein degeneration inside the lens, which can lead to varying degrees of vision loss or even blindness [6]. (3) Macular degeneration is a disease caused by atrophy of the macula in the retina or by the accumulation of metabolites, with deterioration of macular function. Age-related macular degeneration is the most prevalent retinal disease in the Western world [7, 8]. (4) Vitreous opacity is caused by the development of opacities, liquefaction and shrinkage of the vitreous over time, resulting in an opaque body within the vitreous [9]. (5) Retinal detachment refers to the separation of the retinal neuroepithelium from the pigment epithelium. Patients have symptoms such as spotted vision, flickering light spots or shadows covering their eyes [10]. (6) Asteroid hyalosis is a common degenerative process in which fat-calcium globules accumulate in the vitreous humor. Although its cause and mechanism are still unclear, it is related to aging and some systemic diseases [11]. (7) Vitreous hemorrhage is a relatively common eye disease which leads to clouding of the refractive media and causes vision loss. The most common cause is proliferative diabetic retinopathy, followed by ocular trauma [12]. This article focuses on the diagnosis of four types of diseases: vitreous opacities, vitreous opacities with retinal detachment, asteroid hyalosis and vitreous hemorrhage.

With the growing number of ophthalmic patients and the co-occurrence of various diseases, increasingly refined treatment methods are required. However, doctors' workload keeps rising, so there is an urgent need to develop new technology to assist doctors in diagnosis [13–15]. AlexNet won the 2012 ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) competition, which was the first time that a deep learning neural network won the competition. ResNet won the 2015 ILSVRC competition with a top-5 error rate of 3.57% on the classification track, which is lower than the error rate that humans can achieve.

Much research has made good progress in linking neural networks to medical image analysis and auxiliary medical diagnosis. Ting DSW et al. built a deep learning system to detect eye diseases such as diabetic retinopathy, sight-threatening diabetic retinopathy and possible glaucoma [16]. Their system has high sensitivity and specificity in identifying diabetic retinopathy and related eye diseases. It is worth noting that this method divides the data into multiple layers rather than using the entire image, so it is difficult to apply directly in practice. Assessing the severity of age-related macular degeneration (AMD) can be time-consuming if it is handled entirely by hand by an expert, so Yifan Peng et al. proposed a deep learning model, DeepSeeNet [17]. This method can directly target color fundus photographs and
978-1-6654-5120-8/22/$31.00©2022 IEEE
Authorized licensed use limited to: SRM Institute of Science and Technology. Downloaded on September 07,2023 at 06:42:12 UTC from IEEE Xplore. Restrictions apply.
automatically classify patients according to the severity of age-related macular degeneration. Deep learning algorithms for classifying glaucoma lesions often require extensive human labelling of the dataset. Medeiros FA proposed using spectral-domain optical coherence tomography (SDOCT) data to train deep learning algorithms to quantify glaucomatous structural damage on optic disc photographs, so that glaucoma can be better diagnosed [18].

Deep learning not only helps ophthalmologists in the auxiliary diagnosis of eye diseases, but has also made good progress in the diagnosis of other diseases. Lakshmanaprabu S.K. et al. used an optimal deep neural network (ODNN) to extract deep features from CT lung images and used linear discriminant analysis (LDA) to reduce the dimension of the features [19]. ODNN effectively reduced manual labeling time and human error, and its specificity reached 94.2%. Hassan Ali Khan et al. used edge detection to find and crop the region of interest in brain tumor images, then proposed a simple convolutional neural network for efficient brain tumor classification [20]. The proposed network is very small, and the training time for each round is about 200 seconds, much lower than that of existing networks: 456 seconds for VGG-16 and 606 seconds for ResNet-50. But how well such a small network performs when applied to multi-disease classification is unknown. Therefore, we choose the smaller AlexNet network for multi-disease classification, and one of the goals of this study is to examine whether a smaller network can handle multi-disease classification.

Because eye diseases are highly varied and the lesion locations are very close to one another, this paper uses the deep neural network AlexNet combined with the Adam optimizer to classify images of four types of eye disease. AlexNet includes components such as convolutional layers, pooling layers, the rectified linear unit (ReLU) and Dropout. On this basis, the Adam optimization algorithm is used to further improve the training of the model. This paper evaluates the classification performance of AlexNet on multiple ophthalmic diseases through metrics such as the confusion matrix, accuracy, precision, recall and specificity, and verifies the feasibility of AlexNet for practical auxiliary diagnosis.

The paper is organized as follows: Section II introduces the experimental model and evaluation metrics, Section III describes the ophthalmic data set, environment and results of the experiment, and Section IV discusses the experimental results and analyzes the challenges in future work.

II. MODEL AND EVALUATION METRICS

A. Deep Convolutional Neural Network Model

AlexNet [21] is a deep convolutional neural network jointly developed by Hinton and his student Alex Krizhevsky, which won the ImageNet ILSVRC-2012 competition. AlexNet consists of five convolutional layers, three pooling layers and three fully connected layers. The final output layer is a fully connected layer with 1000 channels and a softmax activation function. The entire network contains 60,000,000 parameters and 650,000 neurons. The usual optimization algorithm for AlexNet is stochastic gradient descent (SGD), while the adaptive moment estimation (Adam) optimization algorithm was chosen in this experiment. The detailed structure is shown in Fig. 1. Feature extraction is performed in the convolutional layers, each of which contains several convolutional kernels that generate their own feature maps; weight sharing greatly reduces the number of model parameters to be trained. Pooling is applied to reduce the size of the feature matrix extracted by the convolutional layers, and max pooling is used here. As shown in Fig. 2, a 4×4 feature matrix becomes a 2×2 matrix after the max pooling layer, which greatly reduces the computational effort when training the network.

Before a deep convolutional neural network can correctly classify the target data, it needs to be trained extensively on a training dataset, during which the weights and other parameters of the network are iteratively updated. To speed up training, AlexNet can be computed on the GPU, and a number of optimisations have been made to improve the network's classification ability. First, deep neural networks need activation functions to introduce nonlinearity, and AlexNet uses ReLU as the activation function of the convolutional layers:

$$\mathrm{ReLU}(x) = \max(x, 0) \tag{1}$$

Previous networks often used sigmoid or tanh as the activation function, but these functions are more expensive to differentiate during training and tend to cause vanishing gradients. ReLU mitigates both problems: its derivative is simply 0 or 1, and it does not saturate for positive inputs. Second, a Dropout module is added to AlexNet, which randomly deactivates neurons at a set ratio during each round of training. For example, if a ratio of 50% is set, Dropout will randomly deactivate 50% of the neurons in that layer during training, which is equivalent to randomly cutting off 50% of the pathways in that layer. Because a different random part of the network is deactivated in each round of training, the generalization ability of the deep convolutional network improves and overfitting is better prevented. Fig. 3 shows Dropout randomly deactivating neurons in the fully connected layer.

Deep convolutional neural networks require a loss function to measure the error between the predicted and actual labels, and reduce the error by propagating it backwards to update the parameters. This experiment is a classification task, so the cross-entropy loss (CrossEntropy Loss) was chosen to measure the error between the predicted labels and the actual labels. It is defined as

$$H_i = -\sum_{i=1}^{n} O_i^{*} \log(O_i) \tag{2}$$

where $H_i$ is the cross entropy between the predicted and true probabilities of the data in class $i$, and $O_i^{*}$ is the true value of the classification label. $O_i$ is the probability of the $i$th label
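predicted by the neural network, calculated from the Softmax function of the fully connected layer,

$$O_i = \frac{\exp(y_i)}{\sum_{j=1}^{n} \exp(y_j)} \tag{3}$$

where $n$ is the total number of labels and $y_i$ is the output of the fully connected layer for the $i$th label.

Fig. 1. The structural framework of AlexNet

Fig. 3. Dropout randomly deactivated neurons

The Adam optimizer updates the parameters as follows:

$$
\begin{aligned}
g &= \left(h_\theta(x^{i}) - y^{i}\right) x^{i} \\
w_t &= \beta_1 w_{t-1} + (1 - \beta_1)\, g \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g^{2} \\
\tilde{w}_t &= \frac{w_t}{1 - \beta_1^{t}} \\
\tilde{v}_t &= \frac{v_t}{1 - \beta_2^{t}} \\
\theta_j &= \theta_{j-1} - \tilde{w}_t \cdot \frac{\alpha}{\sqrt{\tilde{v}_t} + \epsilon}
\end{aligned}
\tag{4}
$$

where $g$ is the calculated gradient, $w_t$ is the first-order moment of the gradient, $\beta_1$ is the decay coefficient of the first-order moment, $v_t$ is the second-order moment of the gradient $g$, $\beta_2$ is the decay coefficient of the second-order moment, $\theta$ is the parameter to be updated or solved for, and $\tilde{w}_t$ and $\tilde{v}_t$ are the offset corrections for $w_t$ and $v_t$. Here $\alpha$ is the learning rate and $\epsilon$ is a small constant that avoids division by zero.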
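Equations (2) to (4) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' training code: the function names, the toy five-class sample and the default values alpha=1e-3, beta1=0.9, beta2=0.999 and eps=1e-8 (commonly used Adam defaults) are not taken from the paper. The toy loop uses the known gradient of softmax followed by cross entropy with respect to the outputs $y$, namely $O - O^{*}$, in place of the gradient $g$ of the full network.

```python
import math

def softmax(y):
    """Eq. (3): O_i = exp(y_i) / sum_j exp(y_j), shifted by max(y) for stability."""
    m = max(y)
    exps = [math.exp(value - m) for value in y]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(o_true, o_pred, eps=1e-12):
    """Eq. (2): -sum_i O*_i log(O_i); eps guards against log(0)."""
    return -sum(t * math.log(p + eps) for t, p in zip(o_true, o_pred))

def adam_step(theta, g, w, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update of Eq. (4) on lists of floats; returns the new (theta, w, v)."""
    new_theta, new_w, new_v = [], [], []
    for th, gi, wi, vi in zip(theta, g, w, v):
        wi = beta1 * wi + (1 - beta1) * gi        # first-order moment
        vi = beta2 * vi + (1 - beta2) * gi ** 2   # second-order moment
        w_hat = wi / (1 - beta1 ** t)             # offset corrections
        v_hat = vi / (1 - beta2 ** t)
        new_theta.append(th - alpha * w_hat / (math.sqrt(v_hat) + eps))
        new_w.append(wi)
        new_v.append(vi)
    return new_theta, new_w, new_v

# Toy usage (hypothetical data): fit the outputs y of one five-class sample.
y = [0.0] * 5
target = [0.0, 0.0, 1.0, 0.0, 0.0]   # one-hot true label O*
w = [0.0] * 5
v = [0.0] * 5
loss_before = cross_entropy(target, softmax(y))
for t in range(1, 201):
    # Gradient of softmax + cross entropy with respect to y is O - O*.
    g = [o - o_star for o, o_star in zip(softmax(y), target)]
    y, w, v = adam_step(y, g, w, v, t, alpha=0.1)
loss_after = cross_entropy(target, softmax(y))
```

On the first step the corrected moments reduce to $g$ and $g^2$, so each parameter moves by roughly $\alpha$ in the direction opposite to its gradient regardless of the gradient's magnitude.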
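The building blocks described in Section II-A (ReLU in Eq. (1), max pooling from a 4×4 to a 2×2 matrix, and Dropout at a 50% ratio) can be illustrated with a short plain-Python sketch. The function names and the example feature map are hypothetical. Rescaling the surviving activations by 1/(1 - ratio) ("inverted" dropout) is an assumption made here for illustration; the text does not specify how the scaling is done.

```python
import random

def relu(x):
    """Eq. (1): ReLU(x) = max(x, 0)."""
    return max(x, 0.0)

def max_pool_2x2(matrix):
    """2x2 max pooling with stride 2 over a list of lists with even dimensions."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [max(matrix[r][c], matrix[r][c + 1],
             matrix[r + 1][c], matrix[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]

def dropout(activations, ratio=0.5, rng=random):
    """Zero each activation with probability `ratio` during training.

    Survivors are divided by (1 - ratio); this inverted-dropout scaling
    is an assumption, not something stated in the paper.
    """
    keep = 1.0 - ratio
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical 4x4 feature map reduced to 2x2 by max pooling.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [3, 4, 6, 5],
]
pooled = max_pool_2x2(feature_map)   # [[6, 4], [7, 9]]
```

Dropout is only applied while training; at test time all neurons stay active.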
B. Evaluation indicators

The evaluation metrics include the confusion matrix, accuracy, precision, recall and specificity. The samples in the test set are divided into four categories according to their true and predicted labels: TP (True Positive), the number of samples that the model predicts to be positive and whose true labels are also positive; TN (True Negative), the number of samples that the model predicts to be negative and whose true labels are also negative; FP (False Positive), the number of samples that the model predicts to be positive but whose true labels are negative; and FN (False Negative), the number of samples that the model predicts to be negative but whose true labels are positive. The confusion matrix can then be drawn from these four values, as shown in Fig. 4.

Fig. 4. Diagram of the confusion matrix

III. EXPERIMENT

A. Datasets

The experimental data were obtained by cropping colour ultrasound images provided by the hospital. Ocular ultrasound examines the vitreous and lens to determine the anterior and posterior diameters of the lens and the length of the intraocular visual axis, and to look for echogenicity at the edge of the lens. The vitreous humor may also be examined for conditions such as infection or blood accumulation. The raw data were first manually cropped to remove gaps and interfering information, then sorted, and finally reviewed by a medical professional before being used in the experiment. The data set consisted of 1,966 images in five main categories: vitreous opacity (600 images), vitreous opacity with retinal detachment (227 images), asteroid hyalosis (304 images), vitreous hemorrhage (185 images) and normal (600 images). The data were divided into a training set and a test set in a ratio of 9:1. Fig. 5 shows examples of the five categories.

Fig. 5. Presentation of the 5 categories in the dataset
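A 9:1 division into training and test sets can be sketched as follows. This is only an illustration: the paper does not say whether the split was made per class, so splitting each category separately (a stratified split) is an assumption, and the file names, the function name and the random seed are hypothetical. The class sizes are those listed in the Datasets subsection.

```python
import random

def split_9_to_1(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of `items` and cut it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Class sizes as listed in the text; the file names are placeholders.
class_sizes = {
    "vitreous opacity": 600,
    "vitreous opacity with retinal detachment": 227,
    "asteroid hyalosis": 304,
    "vitreous hemorrhage": 185,
    "normal": 600,
}

train_set, test_set = [], []
for label, size in class_sizes.items():
    images = [(f"{label}_{i:04d}.png", label) for i in range(size)]
    train, test = split_9_to_1(images)   # split each class on its own
    train_set += train
    test_set += test
```

Splitting per class keeps the class proportions of the test set close to those of the full data set, which matters here because the categories are unevenly sized.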
TABLE I
ALEXNET INTERNAL PARAMETERS
TABLE II
COMPARISON OF THE TWO OPTIMISATION ALGORITHMS
TABLE III
THE PRECISION, RECALL AND SPECIFICITY OF ALEXNET FOR TESTING AFTER TRAINING WITH DIFFERENT OPTIMIZATION ALGORITHMS FOR DIFFERENT EPOCHS

                                      Adam                                                                          SGD
                                      Epochs=50                              Epochs=70                              Epochs=300
Class                                 Precision    Recall       Specificity  Precision    Recall       Specificity  Precision    Recall       Specificity
normal                                0.897        0.867        0.982        0.737        0.933        0.939        0.667        0.933        0.915
asteroid hyalosis                     0.800        0.800        0.911        0.820        0.683        0.933        0.797        0.783        0.911
vitreous opacities                    0.769        0.556        0.983        0.833        0.556        0.989        0.583        0.389        0.972
vitreous opacity, retinal detachment  0.703        0.750        0.859        0.643        0.750        0.815        0.684        0.650        0.867
vitreous hemorrhage                   0.828        0.889        0.970        0.840        0.778        0.976        0.840        0.778        0.976
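Per-class precision, recall and specificity of the kind reported in Table III can be derived from a multi-class confusion matrix by treating each class in turn as positive and all other classes as negative. The sketch below is a minimal plain-Python illustration using the standard definitions (accuracy = correct / total, precision = TP / (TP + FP), recall = TP / (TP + FN), specificity = TN / (TN + FP)). The function names and the toy labels are hypothetical and are not the authors' evaluation code or data.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        cm[t][p] += 1
    return cm

def per_class_metrics(cm, k):
    """One-vs-rest precision, recall and specificity for class k."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(n)) - tp   # predicted k, true label differs
    fn = sum(cm[k]) - tp                        # true label k, predicted otherwise
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, specificity

def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Toy example with three classes (hypothetical labels).
true_labels = [0, 0, 1, 1, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(true_labels, predicted_labels, 3)
precision_1, recall_1, specificity_1 = per_class_metrics(cm, 1)
```

For class 1 in this toy example, TP = 2, FP = 1, FN = 0 and TN = 3, giving a precision of 2/3, a recall of 1 and a specificity of 3/4.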