Lung Cancer 1
https://doi.org/10.1007/s10729-019-09489-x
IoT with cloud based lung cancer diagnosis model using optimal
support vector machine
Dinesh Valluru 1 & I. Jasmine Selvakumari Jeya 2
Abstract
In the last decade, the exponential growth of the Internet of Things (IoT) and cloud computing has taken healthcare services to the next level. At the same time, lung cancer has been identified as a dangerous disease that increases the global mortality rate every year. Presently, the support vector machine (SVM) is an effective image classification tool, especially in medical imaging. Feature selection and parameter optimization are effective ways to improve the results of SVM, but they are conventionally solved separately. This paper presents an optimal SVM for lung image classification in which the SVM parameters are optimized and feature selection is performed by a modified grey wolf optimization algorithm combined with a genetic algorithm (GWO-GA). The experiments cover three tests: parameter optimization, feature selection, and optimal SVM. To assess the performance of the presented approach, a benchmark image database comprising 50 low-dose and stored lung CT images is employed. The presented method exhibits superior results on all the applied test images in several respects. In addition, it achieves an average classification accuracy of 93.54%, which is significantly higher than that of the compared methods.
Keywords Classification · Feature selection · IoT · Lung cancer · Support vector machine
method that can search for an optimal classification solution with limited data from a small sample dataset. SVM employs kernel functions to transform a non-linear problem into a linear one and reduces the mapping complexity [2, 26]. Through kernel function optimization and transformation, the optimization problem can be transformed into a convex quadratic programming problem, so, in theory, the SVM classifier reaches the global optimal solution, as shown in Fig. 1.

Presently, swarm intelligence algorithms are being applied to enhance the results of SVM. At the same time, an optimal SVM model can be attained when the parameters and the employed features are optimized simultaneously. An improved grey wolf optimization (IGWO) method has been presented to filter the significant features, after which SVM is applied to classify the data using the features chosen by the IGWO algorithm. In this study, the grey wolf optimization (GWO) algorithm is integrated with GA and employed to classify medical images.

The contribution of the paper is summarized here.

Nowadays, e-healthcare services have become more common with the use of IoT devices and cloud technologies. Several studies have developed automatic disease classification models to identify the presence of lung cancer. Presently, the support vector machine (SVM) is an effective image classification tool, especially in medical imaging. However, the results of SVM depend heavily on its parameters. For medical images with high-dimensional features, feature redundancy also has a great impact on the classification results. Feature selection and parameter optimization are effective ways to improve the results of SVM. The choice of the optimum feature subset and the tuning of the SVM parameters should therefore be considered concurrently. In this paper, we introduce an optimal SVM for lung image classification. Since parameter selection and feature selection play a vital role in the results of SVM, GWO is integrated with GA (GWO-GA). A set of experiments has been performed to investigate the feature selection results and the classifier performance. The simulation outcomes clearly show the enhanced performance of the optimal SVM over the compared methods.

The remaining sections of the paper are arranged as follows. An overview of related works is given in Section 2. The presented GWO-GA is described in Section 3. The application of the presented method to medical image classification is explained in Section 4. The results are validated in Section 5, and the paper is concluded in Section 6.

2 Related works

The kernel function parameter δ and the penalty factor C have a strong influence on SVM classification performance. At the same time, when SVM is used for classification, feature selection is important for datasets with high-dimensional features. In SVM, feature selection and parameter optimization play a major role and influence one another, so it is difficult to obtain an optimal SVM classifier if the SVM parameters and the feature subset are optimized separately. Hence, feature selection, parameter optimization and the SVM classifier should be optimized concurrently; for convenience, this is called the optimal SVM. In general, both feature selection and parameter optimization can be regarded as combinatorial optimization problems and may be handled by swarm intelligence algorithms and evolutionary algorithms such as the differential evolution (DE) algorithm, the genetic algorithm (GA), GWO and the particle swarm optimization (PSO) algorithm.
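Optimizing the parameters and the feature subset concurrently means that a single candidate solution must carry both. The sketch below is a minimal illustration of that idea, not the authors' encoding: it assumes a hypothetical layout in which one binary vector holds a few bits for C, a few bits for δ, and one bit per candidate feature. The bit widths, value ranges, and the names `decode`, `bits_to_value` and `SvmCandidate` are our own assumptions.

```python
# Hypothetical joint encoding of one "optimal SVM" candidate as a bit string:
# [bits for C | bits for delta | feature mask]. Widths and ranges are assumptions.
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SvmCandidate:
    C: float                  # penalty factor
    delta: float              # kernel parameter
    feature_mask: List[int]   # 1 = feature kept, 0 = feature discarded


def bits_to_value(bits: Sequence[int], low: float, high: float) -> float:
    """Map an unsigned binary number linearly onto the interval [low, high]."""
    as_int = int("".join(str(b) for b in bits), 2)
    max_int = 2 ** len(bits) - 1
    return low + (high - low) * as_int / max_int


def decode(
    chromosome: Sequence[int],
    n_param_bits: int,
    c_range: Tuple[float, float],
    delta_range: Tuple[float, float],
) -> SvmCandidate:
    """Split one binary chromosome into (C, delta, feature mask)."""
    c_bits = chromosome[:n_param_bits]
    delta_bits = chromosome[n_param_bits:2 * n_param_bits]
    mask = list(chromosome[2 * n_param_bits:])
    return SvmCandidate(
        C=bits_to_value(c_bits, *c_range),
        delta=bits_to_value(delta_bits, *delta_range),
        feature_mask=mask,
    )


# 4 bits for C, 4 bits for delta, then 6 candidate features.
chrom = [1, 1, 1, 1,  0, 0, 0, 0,  1, 0, 1, 1, 0, 0]
cand = decode(chrom, n_param_bits=4, c_range=(0.1, 100.0), delta_range=(0.01, 10.0))
print(cand.C, cand.delta, cand.feature_mask)
```

Because C, δ and the mask live in one vector, any search operator that changes the vector changes all three together, which is what "concurrent" optimization requires.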
At present, numerous unsupervised methods have been proposed for feature selection. Some unsupervised feature selection methods [7, 8] are based on structural learning, combining feature learning and image understanding in a joint learning framework. For image understanding tasks, by revealing the latent subspace, the simulation outcomes indicate that the proposed method was efficient and improved classification accuracy over other methods. For embedded feature selection, unsupervised learning methods using kernel K-means have been proposed [9]; they were employed to reduce violations of the initial cluster structure, and the simulation outcomes showed that the technique gave a better understanding of the underlying phenomena for further analysis. In image retrieval, unsupervised methods are commonly used to classify every pixel in an image while simultaneously choosing the optimal feature subset. However, because the pixel is the smallest unit of an image and the data size is comparatively large, estimating the correlation among the various features can take a long time, which leads to poor efficiency.

The main aim of feature selection is to attain good classification accuracy with as few features as possible. When the features are chosen by swarm intelligence or evolutionary algorithms, SVM, a prominent supervised classifier, is employed as the classifier. In this context, Martins proposed a new walker-assisted gait model that merges SVM with a GA-based feature selection method to differentiate between non-assisted gait and gait assisted by forearm supports, and it succeeded in restricting the number of features. A diagnosis model based on SVM and PSO was built [10] to choose the optimal feature subset; it finds the optimal feature subset from the actual feature set when diagnosing erythemato-squamous diseases. The simulation results show that the classification precision of this method was superior to previously reported outcomes, and that PSO can efficiently determine the SVM parameter values, which improves the overall classification accuracy. A hybrid feature selection method based on SVM and GA (GA-SVM) was proposed [11] for remote sensing image classification; the simulation outcomes show that it usually gives higher classification accuracy with a small number of selected features than methods without feature selection. A new classification and feature selection method for remote sensing images was proposed [12] that exploits the global optimization capability of the PSO algorithm. The simulation results demonstrate that its feature selection efficiently reduces the dimensionality of the data and attains higher accuracy than all the compared methods.

A GA method was proposed [13] for optimal SVM, and the simulation outcomes demonstrate that this GA-based simultaneous parameter optimization and feature selection technique improved accuracy with few features. The simulation outcomes of a PSO-based method [14] showed that it is good at removing insignificant or unnecessary features and determines the parameter values efficiently, improving the overall classification outcomes. A GA for detecting the optimal SVM parameters and a discriminative feature subset was presented in [15] for a remote sensing image dataset; for a given classification problem, the simulation outcomes indicate that it can find a better feature subset and obtain higher classification accuracy. For remote sensing images, [16] examined a PSO algorithm for parameter determination and feature selection in SVM-based classification, which achieved low sensitivity to the curse of dimensionality and improved classification accuracy. Other related works can be found in the literature [17–21].

2.4 Characteristics of the proposed method

Although PSO and GA can be applied to obtain an optimal SVM, both can easily get trapped in local optima, which makes it hard to find the global optimal solution. Using the presented GWO-GA algorithm, the proposed method attempts to find the best solution and to attain high classification accuracy with the selected feature subset and optimal parameters.

3 GWO-GA method

The GWO algorithm is based on the leadership hierarchy and hunting procedure of grey wolves. Grey wolves are considered top-level predators and usually live in packs of 5–12. Based on their hunting behavior, the wolves are categorized into four kinds: alpha (α), beta (β), delta (δ), and omega (ω). The most dominant wolves, called α, decide the place to hunt, sleep, and so on. The group follows these decisions, and its members acknowledge the α wolf by holding their tails downward. The next level is β: subordinate wolves that help α make decisions, and a β wolf takes over as α when the α wolf dies. The lowest-ranking grey wolves are the ω wolves, which play the role of scapegoat. Approaching, encircling and attacking the prey are the functions involved in the GWO algorithm. It has an exploitation phase as well as an exploration phase. The exploitation phase looks for the optimal solutions in the local search space: the swarm of grey wolves encircles and attacks the prey while exploring the optimal
Valluru D., Jeya I.J.S.
solutions in a local search space. In the exploration phase, the prey is searched for in the entire search area. During the process of encircling the prey, the wolves recognize the location of the prey and encircle it. The position vector of the prey is found, and the search agents alter their positions based on the best solution attained so far. Encircling the prey is defined as follows for the GWO algorithm:

$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right| \tag{1}$$

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{2}$$

where $t$ denotes the current iteration, $\vec{A}$ and $\vec{C}$ are coefficient vectors, $\vec{X}_p(t)$ is the position vector of the prey, $\vec{X}$ is the position vector of a grey wolf, $|\cdot|$ denotes the absolute value and $\cdot$ denotes element-wise multiplication. The vectors $\vec{A}$ and $\vec{C}$ are given by

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \tag{3}$$

$$\vec{C} = 2 \cdot \vec{r}_2 \tag{4}$$

where, in the standard GWO, the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations and $\vec{r}_1$, $\vec{r}_2$ are random vectors in $[0, 1]$.

GA [22] is an important evolutionary algorithm that follows the process of natural selection and evolution. It is commonly employed to generate useful solutions to optimization and search problems, particularly when no heuristic information about the problem is available. It is assumed that the initial heuristic information is created arbitrarily, without using data from the problem being solved. Fortunately, GA does not require any heuristic information during evolution, and it can attain an acceptable solution in most cases. On the other hand, it is prone to premature convergence and converges very slowly in the later stages of evolution. Considering these characteristics, the GWO-GA algorithm is introduced; it uses binary encoding, which makes the cooperation between the two algorithms easier. The solution of GA is used as the initial population of the GWO algorithm. When the fitness value of a current GWO individual is better than that of the corresponding GA individual, it replaces that GA individual and GA restarts; GA and GWO are thus executed repeatedly until the stopping criterion is satisfied.
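A minimal sketch of the search that Eqs. (1)–(4) describe, assuming the standard continuous GWO rather than the modified binary GWO-GA of this paper. Since the prey position is unknown, each wolf is moved toward the three fittest wolves (α, β, δ) using Eqs. (1)–(4), and the three resulting moves are averaged. The averaging rule, the linear schedule for a, the clipping to bounds, the sphere test function and the name `gwo_minimize` are our assumptions; the GA exchange step is not shown. The sketch minimizes; a fitness that should be maximized can simply be negated.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def gwo_minimize(
    fitness: Callable[[Vector], float],
    dim: int,
    bounds: Tuple[float, float],
    n_wolves: int = 12,
    n_iter: int = 200,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize `fitness` with a basic continuous grey wolf optimizer."""
    rng = random.Random(seed)
    lo, hi = bounds
    pack = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_x, best_f = list(pack[0]), fitness(pack[0])

    for t in range(n_iter):
        # Rank the pack: the three fittest wolves play alpha, beta and delta.
        ranked = sorted(pack, key=fitness)
        leaders = ranked[:3]
        if fitness(leaders[0]) < best_f:
            best_x, best_f = list(leaders[0]), fitness(leaders[0])

        a = 2.0 - 2.0 * t / n_iter  # assumed linear decrease from 2 towards 0
        new_pack = []
        for x in pack:
            new_x = []
            for d in range(dim):
                moves = []
                for leader in leaders:  # each leader stands in for the prey X_p
                    A = 2.0 * a * rng.random() - a      # Eq. (3), r1 ~ U[0, 1]
                    C = 2.0 * rng.random()              # Eq. (4), r2 ~ U[0, 1]
                    D = abs(C * leader[d] - x[d])       # Eq. (1)
                    moves.append(leader[d] - A * D)     # Eq. (2)
                # Average the three leader-guided moves, then keep it inside the bounds.
                new_x.append(min(hi, max(lo, sum(moves) / 3.0)))
            new_pack.append(new_x)
        pack = new_pack

    # The loop records leaders before moving, so also check the final pack.
    for x in pack:
        if fitness(x) < best_f:
            best_x, best_f = list(x), fitness(x)
    return best_x, best_f


def sphere(x: Vector) -> float:
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, score = gwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best, score)
```

As a decreases, |A| shrinks, so late iterations pull the wolves tightly around the leaders (exploitation), while early iterations with |A| > 1 allow wide moves (exploration). In the paper's setting, `fitness` would instead evaluate an SVM built from a decoded candidate.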
4.1 Parameter optimization and kernel function of SVM

For SVM, the kernel function is essential: it makes the actual data linearly separable and efficiently avoids the curse of dimensionality. SVM supports numerous kernel function types, such as the linear kernel function, the polynomial kernel function and the radial basis function (RBF) kernel function, which are represented as

$$K(x_i, x_j) = x_i^{T} \cdot x_j \tag{10}$$

$$K(x_i, x_j) = \left(x_i^{T} \cdot x_j + 1\right)^{\delta} \tag{11}$$

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{\delta^{2}}\right) \tag{12}$$

The linear kernel function has a very simple functional form and is most useful for linearly separable problems. The polynomial kernel function has a more complicated functional form and may cost a large amount of computing time [23]. The RBF kernel function can realize a non-linear mapping, requires comparatively few parameters and has low complexity. Compared with other commonly employed kernel functions [24], numerous simulation outcomes show that the RBF kernel function performs better. Therefore, the RBF kernel function is employed for classification on every dataset, and the optimal combination of the penalty factor C and the kernel function parameter δ is sought.
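Eqs. (10)–(12) translate directly into code. The following is a minimal pure-Python sketch; following the equations as printed, δ acts as the degree in Eq. (11) and as the width in Eq. (12). The function names are our own.

```python
import math
from typing import Sequence


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product x^T y."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (10): K(x_i, x_j) = x_i^T x_j."""
    return dot(x, y)


def polynomial_kernel(x: Sequence[float], y: Sequence[float], delta: float) -> float:
    """Eq. (11): K(x_i, x_j) = (x_i^T x_j + 1)^delta, delta used as the degree."""
    return (dot(x, y) + 1.0) ** delta


def rbf_kernel(x: Sequence[float], y: Sequence[float], delta: float) -> float:
    """Eq. (12): K(x_i, x_j) = exp(-||x_i - x_j||^2 / delta^2), delta used as the width."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / delta ** 2)


x_i, x_j = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x_i, x_j))         # 2.0
print(polynomial_kernel(x_i, x_j, 2))  # 9.0
print(rbf_kernel(x_i, x_j, 1.0))       # exp(-5), about 0.00674
```

Libraries such as LIBSVM and scikit-learn parameterize the RBF kernel as exp(−γ‖x_i − x_j‖²), so the width δ in Eq. (12) corresponds to γ = 1/δ² there.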
In CT lung images, high-dimensional features are required to attain high classification accuracy. However, these features also include some correlated and repetitive features that decrease classification accuracy and computational efficiency. Feature selection is the procedure of choosing a relevant subset of the features for model construction and removing duplicate features. Feature selection is a combinatorial optimization problem, and it is solved by the GWO-GA algorithm discussed in the previous section. SVM is used for classification, and the objective function depends on the classification accuracy obtained by the SVM and the number of features chosen by the presented method.

The presented method is simulated using MATLAB 2014b. The parameter settings used for the experiments are as follows: kernel type: polynomial; kernel degree: 2; cost: 0.1; gamma: 0.01. The presented method concentrates on the optimization capability of GWO-GA and the classification accuracy of the optimal SVM. The experiments cover three tests: parameter optimization, feature selection, and optimal SVM. To assess the performance of the presented approach, a benchmark image database comprising 50 low-dose and stored lung CT images is employed (http://www.via.cornell.edu/lungdb.html). Each image has a slice thickness of 1.25 mm and is acquired in a single breath hold. The position of
the nodules, as identified by experts, is also provided in the dataset. Some sample test images are shown in Fig. 3.

Tables 1, 2 and 3 provide the fitness values of parameter optimization, feature selection, and optimal SVM, respectively, as optimized by the different algorithms. Table 1 shows that the fitness values of the GA and BPSO algorithms are noticeably poorer than those of BDE and the presented method. For instance, for image I1, a maximum fitness value of 82.9 is obtained by the optimal SVM, whereas GA, BPSO and BDE achieve fitness values of 78.1, 80.3 and 80.8, respectively. Similarly, on image I2, the optimal SVM achieves a maximum fitness value of 86.50, whereas GA, BPSO and BDE achieve lower fitness values of 84.3, 84.8 and 85.8, respectively. Likewise, on test image I3, the presented method attains a higher fitness value of 80.9, whereas GA, BPSO and BDE achieve lower fitness values of 80, 80.2 and 80.2. In the same way, the optimal SVM obtains the maximum fitness value on all the applied test images. Table 1 also shows that the presented optimal SVM requires less computation time than the other methods.

Table 2 reports the optimization capability of the optimal SVM: its average fitness value, standard deviation, number of chosen features (Fn) and time. This table shows that the presented method chooses fewer features and attains a higher fitness value on all the applied images. For instance, for image I1, the presented method chooses only 4 features, whereas the other methods choose 5 or 6 features. In terms of fitness value, for image I1, a maximum fitness value of 0.82 is obtained by the optimal SVM, whereas GA, BPSO and BDE achieve 0.78, 0.78 and 0.79, respectively. Similarly, on image I2, the optimal SVM achieves a maximum fitness value of 0.45, whereas GA, BPSO and BDE achieve lower values of 0.26, 0.44 and 0.36, respectively. Likewise, on test image I3, the presented method attains a higher fitness value of 0.66, whereas GA, BPSO and BDE achieve lower values of 0.61, 0.61 and 0.64. Along the same lines, on image I4, the presented optimal SVM shows superior results with a fitness value of 0.84, whereas GA, BPSO and BDE give comparatively lower results of 0.74, 0.75 and 0.77, respectively. The values show that the
presented SVM is superior to the other methods in several respects. Table 2 also shows that the presented optimal SVM requires less computation time than the other methods.

Based on the data in Table 3, the values show the superior performance of the optimal SVM compared with the existing methods in several ways. The presented algorithm achieves an average fitness value of 0.828 over all the applied test images, which indicates its better optimization capability compared with the other existing algorithms. In particular, in terms of chosen features, the presented method selects fewer features, from a minimum of 6 features for image I1 to a maximum of 18 features for image I5. In addition, the computation time required by the presented method is considerably lower than that of the compared methods. In terms of fitness value, for image I1, a maximum fitness value of 0.83 is obtained by the optimal SVM, whereas GA, BPSO and BDE achieve 0.77, 0.78 and 0.78, respectively. Similarly, on image I2, the optimal SVM achieves a maximum fitness value of 0.86, whereas GA, BPSO and BDE achieve lower values of 0.82, 0.82 and 0.83, respectively. Likewise, on test image I3, the presented method attains a higher fitness value of 0.81, whereas GA, BPSO and BDE achieve lower values of 0.76, 0.76 and 0.77. Along the same lines, on image I4, the presented optimal SVM shows superior results with a fitness value of 0.94, whereas GA, BPSO and BDE give comparatively lower results of 0.90, 0.90 and 0.91, respectively. These values show that the presented SVM clearly improves on the compared methods.

Finally, the compared methods and the proposed method are analyzed in terms of classification accuracy, and the values are provided in Table 4. Table 4 and Fig. 4 show that the maximum accuracy is attained by the presented method, with a classification accuracy of 96.54% on image I1, whereas FCM-SVM attains a lower classification accuracy of 90.96%. Similarly, the presented method exhibits the best classification results on all the applied test images. On average, FCM-SVM, parameter optimization, feature selection and optimal SVM attain classification accuracies of 79.328%, 91.94%, 87.108% and 93.54%, respectively. These values show the superiority of the optimal SVM, with an average classification accuracy of 93.54%, over the compared methods. In short, the presented method converges easily to the optimum solution for parameter optimization, feature selection, and optimal SVM on all the applied test images.
[Fig. 4: Accuracy (y-axis, 0–100) versus Images I1–I5 (x-axis)]
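The feature selection objective is described as depending on the SVM classification accuracy and the number of selected features, but its exact formula is not given in this section. A common choice in wrapper feature selection is a weighted sum of the two, and the sketch below assumes that form. The weight `w_acc = 0.9` and the name `wrapper_fitness` are hypothetical, and the sketch is not claimed to reproduce the fitness values reported in Tables 1–3.

```python
from typing import Sequence


def wrapper_fitness(
    accuracy: float, feature_mask: Sequence[int], w_acc: float = 0.9
) -> float:
    """Hypothetical weighted fitness for one candidate SVM (higher is better).

    accuracy: classification accuracy of the SVM trained on the selected
        features, as a fraction in [0, 1].
    feature_mask: 1 marks a selected feature, 0 a discarded one.
    w_acc: assumed trade-off weight between accuracy and subset size.
    """
    n_total = len(feature_mask)
    n_selected = sum(feature_mask)
    if n_total == 0 or n_selected == 0:
        return 0.0  # no features selected: no classifier can be trained
    feature_term = 1.0 - n_selected / n_total  # larger when fewer features are kept
    return w_acc * accuracy + (1.0 - w_acc) * feature_term


# Same accuracy, different subset sizes out of 20 candidate features.
small = [1] * 4 + [0] * 16
large = [1] * 10 + [0] * 10
print(wrapper_fitness(0.9, small))  # about 0.89
print(wrapper_fitness(0.9, large))  # about 0.86
```

With equal accuracy, the candidate that keeps fewer features scores higher, which matches the stated preference for small feature subsets; the weight controls how much accuracy may be traded for a smaller subset.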