Deep Learning for Skin Lesion Classification
Abstract
ResNet50 and VGG-16 models are introduced in this paper with different strategies: with
and without preprocessing, and with and without a Support Vector Machine (SVM).
Moreover, both transfer learning and data augmentation are used to address the lack
of labeled data. Replacing the fully connected (FC) layer with an SVM classifier
leads to better accuracy. In addition, we utilize a median filter, contrast
enhancement and edge detection; the edge detection is based on four main steps: noise removal,
gradient calculation on the smoothed image, non-maximum suppression and hysteresis
thresholding. Also, k-fold cross-validation is performed to validate our model's
performance. Three datasets, ISIC 2017, MNIST-HAM10000 and ISBI 2016, are utilized
in our proposed work. It is observed that the proposed technique of ResNet50
hybridized with an SVM achieves the best performance, specifically on the ISIC 2017
dataset, producing 99.19% accuracy, 99.32% area under the curve (AUC), 98.98%
sensitivity, 98.78% precision, 98.88% F1 score and a computational time of 2.6988 s.
Keywords Skin cancer · Deep learning · Support vector machine · ResNet50 · Transfer learning
1 Introduction
Skin cancer has been increasing among men and women worldwide for many decades [23].
There were approximately 76,250 new cases of melanoma and approximately 8790 new
melanoma-related deaths during 2012 in the United States [39]. In Brazil, it was estimated that,
for the 2018–2019 biennium, there were 165,580 new cases of non-melanoma skin cancer
[26]. The rise in skin cancer cases results from many factors, such as the increasing longevity of
the population, greater exposure to the sun, and improved early detection. Dermoscopy, a
noninvasive skin imaging technique, is one of the most reliable methods for early detection of skin
cancer. The appearance of skin lesions in dermoscopic images may change significantly
according to the skin condition. In addition, different artifact sources, including hair, skin
texture, and air bubbles may lead to misidentifying the boundary between the skin lesions and
the surrounding healthy skin. Despite the effectiveness of dermoscopic diagnosis for skin
cancer, it is very difficult for expert dermatologists to provide an accurate classification of
malignant melanoma and benign skin lesions for a large number of dermoscopic images.
Therefore, it is very necessary to develop an efficient non-invasive computer-aided diagnosis
(CAD) system for skin lesion classification. The CAD system generally consists of four main
steps: image pre-processing, segmentation, feature extraction, and classification. Note that
each step significantly affects the classification performance of the whole CAD system [38].
Therefore, efficient algorithms should be employed in each step to achieve high diagnosis
performance.
Several studies investigated different machine learning techniques for diagnosis of different
types of cancer [8, 11, 24, 32]. Most of these studies employed classifiers with relatively
simple structures and trained on a group of hand-crafted features extracted from the images.
Most machine learning techniques require high computational time for accurate diagnosis,
and their performance depends on the selected features that characterize the cancerous region.
Deep learning techniques and Convolutional Neural Networks (CNNs) have become an
important approach for automated diagnosis of different types of cancer [3]. Deep learning
has achieved impressive results in image classification, including skin lesion analysis. In image
classification tasks, transfer learning [21] and data augmentation [29] are employed to
overcome the lack of data and to reduce the computational and memory requirements.
Transfer learning turns out to be useful when dealing with relatively small datasets, e.g.
medical images, which are harder to obtain in large numbers than other datasets. Instead of
training a neural network from scratch, which would require a significant amount of data and a
high computational time, it is often convenient to use a pre-trained model and just fine-tune its
performance to simplify and speed up the process. Several pre-trained CNNs, including
AlexNet, Inception, ResNet, and DenseNet [9, 28] were trained on the ImageNet Large Scale
Visual Recognition Challenge (ILSVRC) dataset. Several pre-trained CNNs achieved high
classification performance in skin cancer diagnosis [20, 22, 25].
Data augmentation is a method for increasing the size of the input data by generating new
data from the original input data [29]. Image augmentation techniques can be employed to
overcome the deficiency of the skin cancer dataset. There are many data augmentation
strategies, including rotation, scaling, random cropping, and color modifications. Data aug-
mentation is broadly employed with pre-trained CNN architectures. In [25], the performance
of skin lesion classification is enhanced by using data augmentation with geometric transforms
(rotations by multiples of 90 degrees and lesion-preserving crops). In [22], the effect of data
augmentation was investigated over different binary classifiers trained with features extracted
by a pre-trained Inception-v4. In [20], a deep learning approach was employed to classify
melanoma, seborrheic keratosis, and nevocellular nevus, using dermoscopic images. In [2], a
deep CNN prediction model based on a new regularizer technique is employed for skin lesion
classification, achieving an accuracy of 97.49%.
Multimedia Tools and Applications (2021) 80:26795–26811 26797
In this paper, the main contribution is the use of two deep neural networks, ResNet50 and
VGG16, with different strategies: with and without preprocessing, and with and without an SVM.
The fully connected layer is replaced with an SVM in both networks to achieve high rates
in the skin cancer classification problem. Preprocessing steps are employed to improve dataset
quality and therefore increase our framework's performance. Data augmentation and
transfer learning techniques are employed on three of the most common dermoscopic image
databases in the field: the International Skin Imaging Collaboration (ISIC) 2017 dataset
[13], the MNIST-HAM10000 dataset [14], and the International Symposium on Biomedical
Imaging (ISBI) 2016 dataset [13].
The remainder of the paper is structured as follows. Section 2 details the methodology used
in the current study for the classification of datasets as benign or malignant. The metric
evaluation and obtained results are displayed and discussed in Sections 3 and 4, respectively.
Section 5 is devoted to the main conclusions.
2 Methodology
The aim of this study is to propose an accurate diagnosis system for skin lesion classification in
dermoscopic images. The proposed system is based on employing efficient image processing
techniques: a median filter, contrast enhancement and edge detection. The edge detection is
based on four main steps: noise removal, gradient calculation on the smoothed image,
non-maximum suppression and hysteresis thresholding. The system also employs data
augmentation and transfer learning, along with an investigation of different pre-trained CNN
architectures for skin lesion classification. Also, k-fold cross-validation is performed to
validate our model's performance. Figure 1 explains the
structure of our proposed skin cancer system.
2.1 Preprocessing
Preprocessing steps play a vital role in increasing system performance. Filtering and image
enhancement are examples of preprocessing and are useful for reducing random noise. Here,
our preprocessing is divided into three steps.
Median filter: The quality of a diagnostic image is clearly degraded by the presence of all
kinds of noise, which can seriously impede the classification of images. The nonlinear
function of the median filter can be expressed as [12]:
T(m) = med[y(m − f), y(m − f + 1), …, y(m), …, y(m + f − 1)]   (1)

Y_m = (y_{1,1} … y_{1,2m+1}; …; y_{2m+1,1} … y_{2m+1,2m+1}) = (y_m(1), …, y_m(W))^T   (2)

where W = 2N + 1 = (2m + 1) is the window size, m and N are integers, and y_m(N + 1) is the
central input sample. For an even number of samples, the median is the average of the two
middle samples:

med = [y(m) + y(m + 1)] / 2   (3)

where med is defined as the middle sample of the elements of the vector y(m), the input signal.
In our scenario, a window size of 10 × 10 delivers better results in the classification
process.
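The median filtering step can be sketched in pure NumPy (an illustrative implementation, not the authors' code; the window size is exposed as a parameter):

```python
import numpy as np

# Minimal 2-D median filter: each output pixel is the median of a
# window x window neighbourhood of the input (edge-replicated padding).
def median_filter2d(image, window=3):
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty(image.shape, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + window, j:j + window])
    return out

# A single bright impulse (salt noise) is removed by a 3x3 median filter.
img = np.zeros((5, 5))
img[2, 2] = 255.0
clean = median_filter2d(img, window=3)
print(clean[2, 2])  # 0.0
```

With the 10 × 10 window used in the paper, `window=10` would be passed instead.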
- Contrast enhancement: The purpose of contrast enhancement is to make image features
more noticeable. The contrast stretching technique exploits the difference in brightness
and color between the object and its background to aid perception by the human visual
system. To measure the consistency and clarity of the image, the local contrast C is
measured as follows [10]:
C_local = (μ_target − μ_background) / (μ_target + μ_background)   (4)
Here, μ_target and μ_background are the mean grey levels of the target and the background in the
local region of interest. Larger values of C_local indicate a clearer image.
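Eq. (4) can be sketched as follows (an illustrative snippet; the constant arrays stand in for real regions of interest):

```python
import numpy as np

# Local contrast of Eq. (4): (mu_target - mu_background) / (mu_target + mu_background).
def local_contrast(target, background):
    mu_t, mu_b = float(np.mean(target)), float(np.mean(background))
    return (mu_t - mu_b) / (mu_t + mu_b)

# A bright region (mean 200) on a dark background (mean 50):
c = local_contrast(np.full((4, 4), 200.0), np.full((4, 4), 50.0))
print(c)  # 0.6
```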
- Edge detection: The tumor edges are detected using the Canny edge detection
technique [4], which consists of four principal steps:
- Noise removal: remove the noise effects from the skin images I(i, j) using a Gaussian filter
Gσ(i, j). For image processing, the two-dimensional zero mean discrete Gaussian function is
expressed as:
G_σ(i, j) = exp(−(i² + j²) / (2σ²))   (5)
In two dimensions, Gaussian functions are rotationally symmetric. This means that the amount
of smoothing performed by the filter will be the same in all directions.
The output smoothed data K(i, j) is obtained from the convolution between the Gaussian
filter Gσ(i, j) and the skin images I(i, j) as follows:
K(i, j) = G_σ(i, j) * I(i, j)   (6)
where G_σ(i, j) is the Gaussian smoothing filter and σ is the standard deviation of the
Gaussian distribution.
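Eqs. (5)–(6) can be sketched by building a discrete Gaussian kernel; normalizing it to sum to 1 (a common convention, not stated in the text) makes the smoothing preserve mean intensity:

```python
import numpy as np

# Discrete zero-mean 2-D Gaussian of Eq. (5), normalized to sum to 1 so
# that convolving it with the image, as in Eq. (6), preserves mean intensity.
def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    i, j = np.meshgrid(ax, ax, indexing="ij")
    g = np.exp(-(i ** 2 + j ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

k = gaussian_kernel()
print(round(float(k.sum()), 6))   # 1.0
print(bool(k[2, 2] == k.max()))   # True: the kernel peaks at the centre
```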
- Gradient calculation on the smoothed image: the gradient magnitude Z(i, j) of the
smoothed image is expressed as
Z(i, j) = √(K_x(i, j)² + K_y(i, j)²)   (7)

where K_x(i, j) = ∂K(i, j)/∂x and K_y(i, j) = ∂K(i, j)/∂y are the x and y partial derivatives, respectively.
The smoothed image gradient angle θ(i, j) is

θ(i, j) = tan⁻¹(K_y(i, j) / K_x(i, j))   (8)
- Non-maximum suppression: This step removes the blurring that the gradient stage introduces
at image edges. For each pixel I(i, j), if Z(i, j) is greater than its two neighbors along the
closest approximation of the gradient direction, then set I(i, j) = Z(i, j); otherwise set I(i, j) = 0.
- Hysteresis thresholding: The output of non-maximum suppression still suffers from local
maxima induced by noise. Hysteresis thresholding solves this problem using two threshold
values, a low threshold TL and a high threshold TH. If a pixel's gradient value is above TH,
the pixel is recognized as an edge pixel, while a pixel with a gradient value below TL is
marked as a non-edge pixel. If the pixel's gradient value lies between TL and TH, its
neighbors are evaluated iteratively: the pixel is marked as an edge pixel if it is connected
to an edge pixel, either directly or via other pixels whose values lie between TL and TH.
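The hysteresis-thresholding step can be sketched as follows (a minimal NumPy illustration, assuming `grad` holds gradient magnitudes after non-maximum suppression):

```python
import numpy as np

# Pixels above t_high are edges; pixels between t_low and t_high become
# edges only if they connect (8-neighbourhood) to a strong edge pixel.
def hysteresis(grad, t_low, t_high):
    strong = grad >= t_high
    weak = (grad >= t_low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:
        changed = False
        padded = np.pad(edges, 1)
        neighbor = np.zeros_like(edges)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                neighbor |= padded[1 + di:1 + di + edges.shape[0],
                                   1 + dj:1 + dj + edges.shape[1]]
        grown = edges | (weak & neighbor)  # grow strong edges into weak pixels
        if (grown != edges).any():
            edges, changed = grown, True
    return edges

grad = np.array([[0.0, 0.4, 0.9],
                 [0.0, 0.4, 0.0],
                 [0.0, 0.0, 0.0]])
print(hysteresis(grad, 0.3, 0.8).sum())  # 3: the strong pixel plus two linked weak pixels
```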
After the pre-processing step, each preprocessed skin image, in the training set, is augmented
into four images by rotating the input image into four directions of 0°, 90°, 180°, and 270°.
Data augmentation, introduced in [29], is investigated in the current work to increase the data
size, generate new data from the original input data, and overcome the lack of labeled
images.
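The four-direction rotation augmentation described above can be sketched as:

```python
import numpy as np

# Each training image yields four copies: rotations by 0, 90, 180 and 270 degrees.
def augment_rotations(image):
    return [np.rot90(image, k) for k in range(4)]

img = np.arange(9).reshape(3, 3)
aug = augment_rotations(img)
print(len(aug))       # 4
print(aug[2][0, 0])   # 8: the 180-degree rotation puts the last pixel first
```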
ISIC2017 database: 1800 Images (820 Malignant and 980 Benign cases)
As discussed in Section 1, transfer learning is useful when dealing with relatively small
datasets, such as medical image collections, which are harder to obtain in large numbers
than other datasets. Instead of training a deep neural network from scratch, which would
require a significant amount of data, power and time, it is often convenient to use a
pre-trained model and simply fine-tune it. For transfer learning, we apply the pre-trained
VGG-16 and ResNet50 models [20, 22, 25].
ResNet50 and VGG16 are considered among the most effective deep convolutional neural
networks for classification. Both models are pre-trained on the ImageNet dataset. In addition,
the last classification layer is replaced by a new classification layer for two classes, benign and
malignant, instead of 1000 classes. Then, certain parameters must be set to start the process of
fine-tuning on the datasets. Firstly, for the ResNet50 model, the iteration number and the
initial learning rate are set to 10^5 and 10^−4, respectively. The epoch number is 50, the
momentum is set to 0.8, and the weight decay is set to 5 × 10^−3. A Gaussian distribution
with zero mean and 10^−2 variance is used to initialize the new final classification layer.
Furthermore, for retraining the VGG16 model, the iteration number and the initial learning
rate are set to 10^6 and 10^−5, respectively. The momentum is 0.6 and the weight decay is
4 × 10^−2. To initialize the new layer as described above, a Gaussian distribution with zero
mean and 10^−3 variance is used.
Table 2 Average classification results and standard deviation for k-fold cross-validation for ResNet50 with
attaching SVM using ISIC 2017 databases
Table 3 Average k-fold cross-validation classification results using ResNet50 and VGG16 with attaching the
SVM for ISIC 2017 dataset
The structure of any deep convolutional neural network consists of a number of layers, which
can be described with the following equations.
Convolution layer: a small matrix of numbers (called a kernel or filter) is passed over the
image, which is transformed based on the filter values:

G[m, n] = (f * h)[m, n] = Σ_j Σ_k h[j, k] f[m − j, n − k]   (9)

where f is the input image and h is the kernel or filter. The indexes of rows and columns of the
result matrix are marked with m and n, respectively.
If we look at how the kernel moves through the image, we see that the impact of pixels
located on the outskirts is much smaller than that of pixels at the center of the image. So, we
can pad the image with an additional border according to the following equation:
p = (f − 1) / 2   (10)
where p is padding and f is the filter dimension.
The dimensions of the output matrix, taking into account padding and stride, can be
obtained using the following formula:
n_out = (n_in + 2p − f) / s + 1   (11)
where n_in is the input image size and s is the stride.
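Eqs. (10)–(11) can be checked numerically:

```python
# "Same" padding of Eq. (10) and convolution output size of Eq. (11).
def same_padding(f):
    return (f - 1) // 2

def conv_output_size(n_in, f, p, s):
    return (n_in + 2 * p - f) // s + 1

p = same_padding(3)
print(p)                                # 1
print(conv_output_size(224, 3, p, 1))   # 224: "same" padding preserves the size
print(conv_output_size(224, 3, 0, 2))   # 111
```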
Table 4 Average k-fold cross-validation classification results using ResNet50 and VGG16 with attaching the
SVM for MNIST-HAM10000 dataset
Table 5 Average k-fold cross-validation classification results using ResNet50 and VGG16 without attaching the
SVM for ISIC 2017 dataset
In CNNs, there are two conventional pooling methods: maximum pooling and average
pooling. The maximum pooling method selects the largest element in each pooling region
as [37]:
Y_{kij} = max_{(p, q) ∈ R_{ij}} x_{kpq}   (12)

where Y_{kij} is the output of the pooling operation, x_{kpq} is the element at location (p, q), and R_{ij} is the
pooling region, which embodies a receptive field around the position (i, j).
Fully connected layer: It is the last layer of the convolutional neural network, which is
responsible for summarizing the features learned by the convolutional neural network.
Q = MN + N   (13)
where M and N are the lengths of the input vectors and output vectors, respectively, and Q is
the number of parameters in the fully connected layer.
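Eq. (13) is easy to verify; for example, with ResNet50's 2048-dimensional features and two output classes:

```python
# Parameter count of a fully connected layer with M inputs and N outputs:
# M*N weights plus N biases, as in Eq. (13).
def fc_params(m, n):
    return m * n + n

print(fc_params(2048, 2))  # 4098
```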
The SVM [5] is a machine learning algorithm that analyzes data for classification. The goal of
the SVM is to find the optimum hyper-plane that separates clusters of target vectors on the
opposing sides of the plane.
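A minimal sketch of the hybrid classification step, with scikit-learn's `SVC` classifying feature vectors (synthetic two-dimensional "features" stand in for the real deep features here):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated synthetic clusters play the role of benign/malignant
# feature vectors extracted by the CNN.
rng = np.random.default_rng(0)
benign = rng.normal(loc=-2.0, size=(50, 2))
malignant = rng.normal(loc=2.0, size=(50, 2))
X = np.vstack([benign, malignant])
y = np.array([0] * 50 + [1] * 50)

# Fit the SVM that replaces the FC layer and classify two query points.
svm = SVC(kernel="rbf").fit(X, y)
print(svm.predict([[-2.0, -2.0], [2.0, 2.0]]))  # [0 1]
```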
3 Evaluation
There are several evaluation tools to assess a classifier, including the accuracy, the area
under the curve (AUC), the sensitivity, the precision and the F1 score.
Table 6 Average k-fold cross-validation classification results using ResNet50 and VGG16 without attaching the
SVM for MNIST-HAM10000 dataset
Table 7 Average k-fold cross-validation classification results using ResNet50 and VGG16 with attaching the
SVM for ISBI 2016 dataset
3.1 Accuracy
Accuracy measures the proportion of correct predictions made by the classifier and is defined
as [30]:
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (14)
where TP, FN, TN, and FP denote True Positive, False Negative, True Negative, and False
Positive, respectively.
3.2 Precision
Precision is the ratio of correctly predicted positive observations to the total predicted
positive observations [30]:
Precision = TP / (TP + FP)   (15)
3.3 Sensitivity
Sensitivity is the true positive rate, i.e., the fraction of true malignant cases that are detected
as malignant [30]:
Sensitivity = TP / (TP + FN)   (16)
Table 8 Average k-fold cross-validation classification results using ResNet50 and VGG16 without attaching the
SVM for ISBI 2016 dataset
Table 9 A comparative view of several classification methods based on different CNN architectures and datasets
and our proposed models with preprocessing

Method | Images | Dataset | AUC (%) | Accuracy (%) | Sensitivity (%)
VGG16 hybrid with SVM without preprocessing | 5580 | ISIC 2017 | 94.98 | 94.56 | 94.78
ResNet50 hybrid with SVM without preprocessing | 5580 | ISIC 2017 | 95.99 | 95.73 | 95.82
VGG16 hybrid with SVM with preprocessing | 5580 | ISIC 2017 | 98.75 | 98.88 | 98.78
ResNet50 hybrid with SVM with preprocessing | 5580 | ISIC 2017 | 99.32 | 99.19 | 98.98
Deep CNN with data augmentation [36] | 211 | ISIC 2017 | – | 88.00 | –
CNN model based on attention residual learning [40] | 2750 | ISIC 2017 | 87.50 | 85.00 | 65.80
Optimized deep features from well-established CNNs and from different abstraction levels [15] | 2037 | ISIC 2017 | 83.83 | – | –
Gabor wavelet-based deep CNN [27] | 2000 | ISIC 2017 | 96.00 | 83.00 | –
GoogleNet [34] | 21,570 | ISIC 2017 | – | 92.70 | –
ResNet50 [34] | 21,570 | ISIC 2017 | – | 92.81 | –
AlexNet [34] | 21,570 | ISIC 2017 | – | 91.40 | –
Pre-processing with cytological properties, CNN with 5-fold cross validation [35] | 1760 | ISIC archive | 84.70 | 86.60 | 80.90
Deep CNN with new regularizer technique [19] | 1760 | ISIC archive | 98.00 | 97.49 | 94.30
Probabilistic distribution segmentation, entropy-based feature selection, multi-class SVM [33] | 290 | ISIC archive | – | 97.75 | 96.60
Probabilistic distribution segmentation, entropy-based feature selection, multi-class SVM [33] | 2750 | ISBI 2016 & 2017 | 96.00 | 93.20 | 93.00
ResNet50 hybrid with SVM with preprocessing | 1000 | ISBI 2016 | 96.45 | 96.43 | 96.52
VGG16 hybrid with SVM with preprocessing | 1000 | ISBI 2016 | 94.98 | 95.86 | 94.98
Fully convolutional residual network [7] | 1250 | ISBI 2016 | – | 94.40 | 91.10
Deep CNN with data augmentation [7] | 900 | ISBI 2016 | – | 83.60 | 76.00
Deep learning model and iteration-controlled Newton-Raphson (IcNR) based feature selection method [17] | 1279 | ISBI 2016 | 98.00 | 94.50 | 94.00
Optimal color and texture features, majority-voting multi-class SVM [1] | 350 | Atlas of Dermoscopy CD | 94.00 | 93.00 | 94.00
Automatic ABCD scoring of skin lesions [18] | 200 | Private | – | 94.00 | 91.20
Probabilistic distribution segmentation, entropy-based feature selection, multi-class SVM [16] | 200 | PH2 | – | 97.50 | 96.67
Ant colony-based segmentation, geometric and texture features, ANN classifier [6] | 172 | DermIS and DermQuest | – | 93.60 | –
3.4 F1 score
The F1 score is utilized as a statistical measure to rate the classifier performance [30].
F1 = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)   (17)
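Eqs. (14)–(17) can be computed directly from the confusion-matrix counts; the counts below are illustrative, not results from the paper:

```python
# Evaluation metrics of Eqs. (14)-(17) from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, f1

acc, prec, sens, f1 = metrics(tp=90, tn=95, fp=5, fn=10)
print(round(acc, 3), round(prec, 3), round(sens, 3), round(f1, 3))
# 0.925 0.947 0.9 0.923
```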
Fig. 2 Preprocessing steps: (a) Original image (b) median filter image (c) Contrast enhancement and edge
detection
3.5 AUC

The AUC is widely used in medical diagnosis and provides an approach for evaluating
models: the model with the higher AUC value is the better classifier.
4 Results and discussion

In order to demonstrate the effectiveness of the proposed techniques, 1800, 6000 and 1000 images
taken from the ISIC 2017 [13], MNIST-HAM10000 [14] and ISBI 2016 [13] databases, respectively,
are investigated. Each dataset is divided into 70% for training and 30% for testing and validation.
The ISIC 2017 dataset is divided into 1260 images for training and 540 images for testing and
validation; with 5040 augmented images, the total number of images is 5580. Moreover, the
MNIST-HAM10000 dataset is divided into 4200 images for training and 1800 for testing and
validation; with 16,800 augmented images, the total number of images is 18,600. In addition, the
ISBI 2016 dataset is divided into 700 images for training and 300 for testing and validation; with
2800 augmented images, the total number of images is 3100. The selection criterion depends on the
availability of images containing tumors, as the ISIC 2017, MNIST-HAM10000 and ISBI 2016
databases contain many images without tumors. The distributions of the ISIC 2017 and
MNIST-HAM10000 databases are introduced in Table 1: the ISIC 2017 dataset consists of 1800
images, while 1000 and 6000 images are taken, respectively, from the ISBI 2016 and
MNIST-HAM10000 databases. Table 1 also records the patient's sex (male or female). The proposed
techniques are applied to the skin images, yielding the probability of each image belonging to one of
the two classes, benign or malignant. In our work, the datasets are chosen to verify the proposed
methods, which are implemented in Python.
The k-fold cross-validation methodology is employed to further reduce the effect of
overfitting [31]. In k-fold cross-validation, the data is randomly divided into k groups of
roughly equal size; each group is used once for testing and (k − 1) times for training. In our
work, k = 5 is used, and the average classification results are shown in Tables 2, 3, 4, 5 and 6
for ResNet50 and VGG16 using the ISIC 2017, MNIST-HAM10000 and ISBI 2016 databases,
respectively. Tables 2, 3 and 4
explain different strategies, ResNet50 and VGG16 with attaching SVM with and without
preprocessing techniques. Table 2 demonstrates the average classification results and standard
deviation of the k-fold cross-validation for ResNet50 with attaching SVM using ISIC 2017
databases and Table 3 represents the average k-fold cross-validation classification results using
ResNet50 and VGG16 with attaching the SVM for ISIC 2017 dataset. Moreover, Table 4
describes the average k-fold cross-validation classification results using ResNet50 and VGG16
with attaching the SVM for the MNIST-HAM10000 dataset, while Tables 5 and 6 represent the
two models without attaching SVM, with and without preprocessing techniques, based on the
ISIC 2017 and MNIST-HAM10000 databases. It is observed from Table 5 that the ResNet50 with
preprocessing techniques achieves better results than VGG16 with preprocessing. Moreover,
Table 6 introduces the average k-fold cross-validation classification results using ResNet50
and VGG16 without attaching the SVM for MNIST-HAM10000 dataset. Tables 7 and 8
demonstrate the two models with and without attaching the SVM, based on the ISBI 2016
database. Across all of these models, ResNet50 with SVM and preprocessing techniques
achieves better results than the VGG16 model. For ISIC 2017, it is obvious from Table 2 that
ResNet50 with preprocessing steps, hybridized with the SVM, achieves the highest values: an
accuracy of 99.19% and an AUC of 99.32%, while the sensitivity, precision and F1 score reach
98.98%, 98.78% and 98.88%, respectively. Table 9 gives a comparative view of several
classification methods based on different CNN architectures and datasets alongside our newly
proposed models. Our models achieve better performance than the other models in the
literature. Moreover, the average computational times for the ResNet50 and VGG16 models are
around 2.6988 and 3.1987 s, respectively.
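The k-fold protocol can be sketched with scikit-learn (illustrative only; the small array stands in for the real image set):

```python
import numpy as np
from sklearn.model_selection import KFold

# 5-fold cross-validation: each fold is used once for testing while the
# remaining k-1 folds train the model.
X = np.arange(20).reshape(10, 2)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
sizes = [(len(tr), len(te)) for tr, te in kf.split(X)]
print(sizes)  # each fold: 8 training samples, 2 test samples
```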
Fig. 7 Samples images from testing ISIC 2017 dataset: (a) Malignant, (b) Benign
Fig. 8 An example of ISIC 2017 dataset classification: (a) True Positive, (b) True Negative, (c) False Positive,
(d) False Negative
Figure 2 shows the images before and after applying the preprocessing steps. It is observed
that the quality of the images increases, leading to enhanced system performance. When
ResNet50 is used without attaching the SVM, the accuracy is 96.74%; with the SVM attached,
the accuracy becomes 99.19% with a computational time of 2.6988 s. Similarly, with VGG16
the accuracy is 92.54% and increases to 98.88% after attaching the SVM, with a computational
time of 3.1987 s, on the ISIC 2017 dataset, as shown in Figs. 3, 4, 5 and 6.
Figure 7 displays sample images from the testing set. Moreover, Fig. 8 provides an
example visualization of the classifier's output for four patients: the classifier diagnoses
patients A and B correctly, while patients C and D are not correctly diagnosed. The figures
indicate that the performance of our model increases quickly and stabilizes at an accuracy
greater than 90%.
5 Conclusion
The efficiency of the suggested approaches is tested on real clinical skin images of patients of
both sexes, taken from the ISIC 2017, MNIST-HAM10000 and ISBI 2016 databases.
Initially, our proposed algorithms were run end-to-end without preprocessing and did not
achieve high performance, so preprocessing steps were applied to improve it.
Transfer learning with fine-tuning is implemented to reduce the overfitting of the deep
neural networks. The data augmentation technique is used to resolve the data shortage
and adds variability to the dataset, thereby improving the generalization capability of the
pre-trained network and further alleviating overfitting. The efficiency of the diagnosis is
evaluated with k-fold cross-validation in terms of accuracy, sensitivity, precision, AUC,
F1 score and computational time. The ResNet50 hybrid system with SVM after
preprocessing achieves the best performance: 99.19% accuracy, 99.32% AUC, 98.98%
sensitivity, 98.78% precision, 98.88% F1 score and a computational time of 2.6988 s.
Moreover, the VGG16 model achieves 98.88% accuracy, 98.75% AUC, 98.78% sensitivity,
97.98% precision and 97.24% F1 score on the ISIC 2017 dataset. The obtained results
demonstrate the superior performance of the proposed structures according to the improved
measured parameter values.
As future work, a preprocessing step based on morphological filtering will be employed
for hair and artifact removal. Moreover, skin lesions will be segmented automatically
using Grab-Cut with minimal human interaction in HSV color space. Image processing
techniques will be investigated for an automatic implementation of the ABCD (asymmetry,
border irregularity, color and dermoscopic patterns) rule to separate malignant melanoma from
benign lesions. To classify skin lesions into benign or malignant, different pretrained
convolutional neural networks (CNNs), including InceptionV3 and MobileNet, will be
examined.
Conflict of interest Wessam M. Salama and Moustafa H. Aly declare that there is no conflict of interest.
Human and animal studies This article does not contain any studies with human participants or animals
performed by any of the authors.
Informed consent The article has no part or work that needs informed consent.
References
1. Abbas Q, Sadaf M, Akram A (2016) Prediction of dermoscopy patterns for recognition of both melanocytic
and non-melanocytic skin lesions. Computers 5(13):1–16
2. Albahar MA (2019) Skin lesion classification using convolutional neural network with novel regularizer.
IEEE Access 7:38306–38313
3. Brinker TJ, Hekler A, Utikal JS, Grabe N, Schadendorf D, Klode J, Berking C, Steeb T, Alexander HE,
Kalle CV (2018) Skin cancer classification using convolutional neural networks: systematic review. J Med
Internet Res 20(10):e11936
4. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell PAMI-
8(6):679–698
5. Cervantes J, Garcia-Lamont F, Rodriguez-Mazahua L, Lopez A (2020) A comprehensive survey on support
vector machine classification: applications, challenges and trends. Neurocomputing 408:189–215
6. Dalila F, Zohra A, Reda K, Hocine C (2017) Segmentation and classification of melanoma and benign skin
lesions. Optik 140:749–761
7. Demyanov S, Chakravorty R, Abedini M, Halpern A, Garnavi R (2016) Classification of dermoscopy
patterns using deep convolutional neural networks. In: 2016 IEEE 13th international symposium on
biomedical imaging (ISBI). Czech Republic, Prague, pp 364–368
8. Eltrass A, Salama M (2020) Fully automated scheme for computer-aided detection and breast cancer
diagnosis using digitised mammograms. IET Image Process 14(3):495–505
9. Feng X, Hongxun Y, Shengping Z (2019) An efficient way to refine DenseNet. SIViP 13(5):959–965
10. Firoz R, Ali M, Khan M, Hossain M, Islam M, Shahinuzzaman M (2016) Medical image enhancement
using morphological transformation. J Data Anal Inf Process 4:1–12. [Link]
11. Hardie R, Ali R, Silva MSD, Kebede TM (2018) Skin lesion segmentation and classification for ISIC 2018
using traditional classifiers with handcrafted features. arXiv:1807.07001 [[Link]]
12. Huang T, Yang G, Tang G (1979) A fast two-dimensional median filtering algorithm. IEEE Trans Acoust
Speech Signal Process 27(1):13–18
13. ISIC (n.d.) Archive [electronic resource], Kitware, Inc., [Link] (accessed January 24,
2020).
14. Kaggle Inc. (2019) Skin Cancer MNIST: HAM10000. Available: [Link]
mnist-ham10000/version/2 (accessed January 24, 2020).
15. Kasmi R, Mokrani K (2016) Classification of malignant and benign skin lesions: implementation of
automatic ABCD rule. IET Image Process 10(6):448–455
16. Khan MA, Akram T, Sharif M, Shahzad A, Aurangzeb K, Alhussein M, Haider SI, Altamrah A (2019) An
implementation of normal distribution based segmentation and entropy controlled features selection for skin
lesion detection and classification. BMC Cancer, 18(1), 638, [Link], pp. 1229–1233. [Link]
org/10.1186/s12885-018-4465-8
17. Khan MA, Sharif M, Akram T, Bukhari SAC, Nayak RS (2020) Developed Newton-Raphson based deep
features selection framework for skin lesion recognition. Pattern Recogn Lett 129:293–303
18. Mahbod A, Schaefer G, Wang C, Ecker R, Ellinge I (2019) Skin lesion classification using hybrid deep
neural networks. In ICASSP 2019 IEEE international conference on acoustics, speech and signal processing
(ICASSP’2019), Brighton, UK. [Link]
19. Majtner T, Yildirim-Yayilgan S, Hardeberg JY (2016) Combining deep learning and hand-crafted features
for skin lesion classification. In 2016 IEEE sixth international conference on image processing theory, tools
and applications (IPTA), Oulu, Finland, pp. 1–6. [Link]
20. Matsunaga K, Hamada A, Minagawa A, Koga H (2017) Image classification of melanoma, nevus and
seborrheic keratosis by deep neural network ensemble. arXiv preprint arXiv:1703.03108
21. Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
22. Pham BT, Jaafari A, Prakash I, Bui DT (2019) A novel hybrid intelligent model of support vector machines
and the multi-boost ensemble for landslide susceptibility modeling. Bull Eng Geol Environ 78(4):2865–
2886
23. Rembielak A, Ajithkumar T (2019) Non-melanoma skin cancer–an underestimated global health threat.
Clin Oncol 31(11):735–737
24. Salama MS, Eltrass AS, Elkamchouchi HM (2018) An improved approach for computer-aided diagnosis of
breast cancer in digital mammography. In IEEE International Symposium on Medical Measurements and
Applications, Rome, Italy, 1–5
25. Salamon J, Bello JP (2017) Deep convolutional neural networks and data augmentation for environmental
sound classification. IEEE Signal Process Lett 24(3):279–283
26. Santos MO (2018) Estimate: cancer incidence in Brazil. Revista Brasileira de Cancerologia 64(1):119–120
27. Serte S, Demirel H (2019) Gabor wavelet-based deep learning for skin lesion classification. Comput Biol
Med 113:103423
28. Shin HC, Roth HR, Mingchen G, Lu L, Ziyue X, Nogues I, Jianhua Y, Mollura D, Summers RM (2016)
Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteris-
tics and transfer learning. IEEE Trans Med Imaging 35(5):1285–1298
29. Shorten C, Khoshgoftaar TM (2019) A survey on image data augmentation for deep learning. J Big Data
6(1):60
30. Sofaer HR, Hoeting JA, Jarnevich CS (2019) The area under the precision-recall curve as a performance
metric for rare binary events. Methods Ecol Evol 10:565–577
31. Srinivasan K, Cherukuri AK, Vincent DR, Garg A, Chen B-Y (2019) An efficient implementation of
artificial neural networks with K-fold cross-validation for process optimization, J. Internet Technol 20:
1213–1225
32. Tschandl P, Codella N, Akay NB, Argenziano G, Braun PR, Cabo H, Gutman D, Halpern A, Helba B,
Wellenhof RH, Lallas A (2019) Comparison of the accuracy of human readers versus machine-learning
algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study.
Lancet Oncol 20(7):938–947
33. Vasconcelos CN, Vasconcelos BN (2017) Experiments using deep learning for dermoscopy image analysis.
Pattern Recogn Lett 139(2020):95–103
34. Yilmaz E, Trocan M (2020) Benign and malignant skin lesion classification comparison for three deep-
learning architectures. In: Nguyen N, Jearanaitanakij K, Selamat A, Trawiński B, Chittayasothorn S (eds)
Intelligent information and database systems. ACIIDS. Lecture notes in computer science, 12033. Springer,
Cham
35. Yoshida T, Celebi ME, Schaefer G, Iyatomi H (2016) Simple and effective pre-processing for automated
melanoma discrimination based on cytological findings. In 2016 IEEE international conference on big data
(big data), Washington, USA, pp. 3439–3442. [Link]
36. Yu L, Chen H, Dou Q, Qin J, Heng PA (2016) Automated melanoma recognition in dermoscopy images via
very deep residual networks. IEEE Trans Med Imaging 36(4):994–1004
37. Yu D, Wang H, Chen P, Wei Z (2014) Mixed pooling for convolutional neural networks. In: Miao D,
Pedrycz W, Ślęzak D, Peters G, Hu Q, Wang R (eds) Rough Sets and Knowledge Technology. RSKT 2014.
Lecture Notes in Computer Science, vol. 8818. Springer, Cham. [Link]
9_34
38. Yüksel ME, Borlu M (2009) Accurate segmentation of dermoscopic images by image thresholding
based on type-2 fuzzy logic. IEEE Trans Fuzzy Syst 17(4):976–982
39. Zachary HR, Secrest AM (2019) Public health implications of google searches for sunscreen, sunburn, skin
cancer, and melanoma in the United States. Am J Health Promot 33(4):611–615
40. Zhang J, Xie Y, Xia Y, Shen C (2019) Attention residual learning for skin lesion classification. IEEE Trans
Med Imaging 38(9):2092–2103
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.