Disease Classification Using Machine Learning Algorithms - A Comparative Study

Article in International Journal of Pure and Applied Mathematics · January 2017


All content following this page was uploaded by Perumal Venkatesan on 19 August 2017.

International Journal of Pure and Applied Mathematics
Volume 114 No. 6 2017, 1-10
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: [Link]
Special Issue
[Link]

Disease Classification Using Machine Learning Algorithms - A Comparative Study

S. Leoni Sharmila1,*, C. Dharuman2 and P. Venkatesan3
1,2 Department of Mathematics, SRM University, Ramapuram Campus, Chennai - 600 089, India.
3 Sri Ramachandra University, Chennai, India.
* Corresponding author: sharmilamartin169@[Link]
February 24, 2017

Abstract
Machine learning techniques are widely used in various fields of
science and technology, where they extract meaningful, classified
information from data. Machine learning is concerned with the
construction and study of algorithms that can learn from data. Data
mining in healthcare is an emerging field of high importance for
providing prognosis and a deeper understanding of medical data. Its
applications in healthcare include the analysis and prevention of
hospital errors, early detection and prevention of diseases, and
cost savings. The main problem is to predict and diagnose disease at
an early stage using machine learning techniques. This paper gives a
comparative study of different machine learning techniques, namely
fuzzy logic, fuzzy neural network and decision tree, in classifying
a liver data set.
AMS Subject Classification: 03B52, 92B20, 68Q32.
Key Words and Phrases: Data mining, Fuzzy logic, Fuzzy Neural
Network, Decision Tree.


1 Introduction
Data mining is the process of analyzing large amounts of data to
automatically discover interesting regularities or associations,
which in turn lead to an improved understanding of the original
processes [1]. There are two categories of data mining: descriptive
and predictive. Descriptive data mining generalizes or summarizes
the general properties of the data in the database. Predictive data
mining performs inference on the present data in order to make
predictions [2]. Data mining covers several tasks, such as
association rules, classification, prediction and clustering.
Classification is a supervised learning technique that assigns data
to predefined class labels. It is one of the most useful techniques
in data mining and is commonly used to build models that predict
future data trends. The main aim of classification techniques is to
analyze the input data and build a model that accurately predicts
the class of future data. In the medical field, data mining plays a
vital role in finding relationships between patient data and medical
data sets in large databases. In this paper, three different
classification techniques, namely fuzzy logic, decision tree and
fuzzy neural network, are compared on the liver data set taken from
the UCI machine learning repository. The paper is organized as
follows: related work is reviewed in Section 2, the data set is
described in Section 3, Section 4 presents the machine learning
methods, results and discussions are given in Section 5, and the
conclusion of the study is presented in Section 6.

2 Literature Review
Over the years, the literature on intelligent methods in the medical
domain has grown to an enormous number of studies [3]. In [4], a
fuzzy rule-based expert system was implemented for diagnosing
asthma. An expert system using a neural network, a C5.0 decision
tree and linear discriminant analysis was suggested for the
classification of six different categories of dermatological
disease [5]. In [6], the authors evaluate different liver data sets
using classification algorithms, namely support vector machine,
C4.5, back-propagation neural network and Naive Bayes classifier.
In [7], classification by fuzzy logic is compared with
classification by neuro-fuzzy systems.

3 Data Set Description


The liver disorders data set is taken from the UCI machine learning
repository. Its records fall into 2 classes, namely normal and
diseased. The data comprise a total of 345 male patients, of whom
200 are normal and 145 are diseased. Six variables are selected for
study, together with the class attribute [8]:
1. mcv - mean corpuscular volume
2. alkphos - alkaline phosphatase
3. sgpt - alanine aminotransferase
4. sgot - aspartate aminotransferase
5. gammagt - gamma-glutamyl transpeptidase
6. drinks - number of half-pint equivalents of alcoholic beverages
   drunk per day
7. group - normal / disease

4 Machine Learning Methods


4.1 Fuzzy Logic
Fuzzy logic provides an inference structure that enables human
reasoning capabilities to be applied to knowledge-based systems. The
theory of fuzzy logic gives a mathematical framework for capturing
the uncertainties associated with human thinking and reasoning, and
it has proved to be an effective tool for intelligent systems in
medicine. In building a fuzzy model, the first step is to select
suitable input variables from the collection of system inputs; the
second step is to determine the number of membership functions for
each input variable. This process is closely related to the
partitioning of the input space. Fuzzy grids can be used to generate
fuzzy rules from system input-output training data. A one-pass
build-up procedure avoids a time-consuming learning process, but its
performance depends heavily on the definition of the grid. In
general, the finer the grid is, the better the performance will
be [9].
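
As an illustration of grid partitioning, the following minimal Python sketch places evenly spaced triangular membership functions on one input and counts the cells of the resulting grid. The triangular shape, the input range of 0 to 20 and the choice of three fuzzy sets per input are assumptions made for the example; the paper does not state the settings actually used.

```python
def triangular_grid(lo, hi, n_sets):
    """Peak positions of n_sets evenly spaced triangular fuzzy sets on [lo, hi]."""
    step = (hi - lo) / (n_sets - 1)
    return [lo + i * step for i in range(n_sets)]


def memberships(x, centers):
    """Membership degree of x in each triangular set (neighbouring sets cross at 0.5)."""
    width = centers[1] - centers[0]
    return [max(0.0, 1.0 - abs(x - c) / width) for c in centers]


# Hypothetical range for a single input, split into low / medium / high.
centers = triangular_grid(0.0, 20.0, 3)
degrees = memberships(5.0, centers)

# With a simple grid partition, every combination of fuzzy sets is one cell
# (one candidate rule), so the grid grows exponentially with the inputs.
n_inputs, n_sets = 6, 3
n_cells = n_sets ** n_inputs

print(centers, degrees, n_cells)
```

A finer grid (more sets per input) gives more cells and therefore more rules, which is why performance depends so heavily on how the grid is defined.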

4.2 Fuzzy Neural Network


Every intelligent technique has particular computational properties.
Neural networks and fuzzy logic are two approaches that are widely
used to solve classification problems. Neural networks are good at
recognizing patterns but are not good at explaining how they reach
their decisions. Fuzzy logic systems can reason with imprecise
information and are good at explaining their decisions, but they
cannot automatically acquire the rules they use to make those
decisions. Hybrid systems, which combine the two, are also important
when considering the varied nature of application domains. For this
work, the Simplified Fuzzy ARTMAP (SFAM) system is used for
classification.

4.2.1 Simplified Fuzzy ARTMAP


The SFAM consists of a two-layer net containing an input and an
output layer. Figure 1 illustrates the architecture of simplified
fuzzy ARTMAP. The main idea of SFAM is as follows [10]: (1) Find the
nearest subclass prototype that resonates with the input pattern
(the winner). (2) If the labels of the subclass and the input
pattern match, update the prototype to be closer to the input
pattern. (3) Otherwise, reset the winner, temporarily increase the
resonance threshold (ρ), and try the next winner. (4) If the winner
is uncommitted, create a new subclass (assign the input vector as
the prototype pattern of the winner, and label it with the class
label of the input).

The input to the network flows through the complement coder, where
the input string is stretched to double its size by appending its
complement. The complement-coded input then flows into the input
layer and remains there. Weights (w) from each of the output
category nodes flow down to the input layer. The category layer
merely holds the names of the M categories that the network has to
learn. The vigilance parameter and match tracking are mechanisms of
the network architecture that are primarily employed for network
training. The vigilance parameter ρ ranges from 0 to 1 and controls
the granularity of the output node encoding: high vigilance values
make the output node much fussier during pattern encoding, whereas
low vigilance makes the output node liberal during encoding. The
match tracking mechanism of the network is responsible for the
adjustment of vigilance values [11].
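
The complement coding and vigilance test can be sketched in a few lines of Python. The formulas below are the standard fuzzy ART match and update rules, not taken from the paper; the function names, the example prototype and the learning rate beta are illustrative assumptions.

```python
def complement_code(x):
    """Stretch an input in [0, 1]^d to length 2d by appending 1 - x."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND of two vectors."""
    return [min(p, q) for p, q in zip(a, b)]


def match_value(i, w):
    """|I AND w| / |I|: the fraction of the input covered by prototype w."""
    return sum(fuzzy_and(i, w)) / sum(i)


def resonates(i, w, rho):
    """Vigilance test: the winner is accepted only if the match reaches rho."""
    return match_value(i, w) >= rho


def update(i, w, beta=1.0):
    """Move the prototype toward the input; beta = 1.0 is fast learning (assumed)."""
    m = fuzzy_and(i, w)
    return [beta * a + (1.0 - beta) * b for a, b in zip(m, w)]


coded = complement_code([0.25, 0.75])      # length doubles from 2 to 4
prototype = [0.5, 0.5, 0.5, 0.5]           # hypothetical committed node

print(coded)
print(match_value(coded, prototype))
print(resonates(coded, prototype, 0.7), resonates(coded, prototype, 0.9))
print(update(coded, prototype))
```

With the same input and prototype, a low vigilance accepts the winner while a high vigilance rejects it, which is the "fussier" behaviour described above. In match tracking, ρ would be raised just above the current match value so that the next candidate is tried.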


Figure 1: Architecture of SFAM Network[10]

4.3 Decision Tree


A decision tree is a simple and widely used classification technique
that applies a straightforward idea to solve the classification
problem [12]. The classifier poses a series of carefully crafted
questions about the attributes of the test record. Each time it
receives an answer, a follow-up question is asked until a conclusion
about the class label of the record is reached. In this work, BF
tree, J48 tree, LMT and Random Forest are used for classification.

4.3.1 BF Tree
In Best First (BF) tree learners the "best" node is expanded first.
The "best" node is the node whose split leads to the maximum
reduction of impurity among all nodes available for splitting. When
fully grown, the resulting tree is the same as the one produced by
standard depth-first expansion; only the order in which the nodes
are expanded differs. The learner constructs binary trees, i.e.,
each internal node has exactly two outgoing edges. The tree-growing
method tries to maximize node homogeneity; the extent to which a
node does not represent a homogeneous subset of cases is a measure
of its impurity.
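
The best-first selection rule can be illustrated with a small Python sketch. The Gini index is assumed as the impurity measure here, since the paper does not say which measure was used, and the node names and class counts are invented for the example.

```python
def gini(counts):
    """Gini impurity of a node given its per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_reduction(parent, left, right):
    """Drop in impurity obtained by splitting parent into left and right."""
    n = sum(parent)
    weighted = sum(left) / n * gini(left) + sum(right) / n * gini(right)
    return gini(parent) - weighted


# Hypothetical open nodes: (parent counts, left child counts, right child counts).
candidates = {
    "node_a": ([4, 4], [4, 0], [0, 4]),   # perfectly separating split
    "node_b": ([4, 4], [2, 2], [2, 2]),   # split that separates nothing
}

reductions = {name: impurity_reduction(*c) for name, c in candidates.items()}

# Best-first expansion: split the open node with the largest reduction first.
best = max(reductions, key=reductions.get)
print(reductions, best)
```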


4.3.2 J48 Tree


J48 is the WEKA implementation of the C4.5 decision tree algorithm,
which is itself an extension of the ID3 algorithm. Using this
technique, a tree is constructed to model the classification
process. After the tree is built, it is applied to each tuple in the
database, which yields the classification for that tuple [13].
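
C4.5 chooses its splits by gain ratio, that is, information gain normalised by the entropy of the split itself. The following Python sketch shows the calculation on invented class counts; it is a simplified illustration, not the full algorithm (it omits continuous attributes, missing values and pruning).

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_ratio(parent, children):
    """Information gain of a split divided by its split information."""
    n = sum(parent)
    gain = entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)
    split_info = entropy([sum(ch) for ch in children])
    return gain / split_info if split_info else 0.0


# Hypothetical node with 4 normal and 4 diseased records.
pure_split = gain_ratio([4, 4], [[4, 0], [0, 4]])      # children are pure
useless_split = gain_ratio([4, 4], [[2, 2], [2, 2]])   # children mirror parent
print(pure_split, useless_split)
```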

4.3.3 LMT Tree


A logistic model tree (LMT) consists of a standard decision tree
structure with logistic regression functions at the leaves,
analogous to a model tree, which is a regression tree with
regression functions at the leaves.

4.3.4 Random Forest


Random Forest is an ensemble of unpruned classification or
regression trees, each built from a random sample of the training
data. Features are selected at random during tree induction. The
prediction of the ensemble is made by aggregating the predictions of
the individual trees, using a majority vote for classification [14].
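
The two ingredients described above, random sampling of the training data and majority voting, can be sketched as follows. This is a minimal Python illustration that assumes bootstrap sampling with replacement; the individual trees are omitted and their votes are made up.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n record indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(predictions):
    """Class predicted by most trees; ties go to the class seen first."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)                    # fixed seed, for repeatability only
sample = bootstrap_indices(345, rng)      # training sample for one tree

# Hypothetical votes of five trees for a single patient record.
votes = ["diseased", "normal", "diseased", "diseased", "normal"]
print(len(sample), majority_vote(votes))
```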

Figure 2: Decision Tree [12]


5 Results and Discussions


The liver disease classification data set consists of 345 samples,
of which 200 are normal and 145 are disease cases. There are 7
attributes, the last of which is the class to be predicted.
Classification with fuzzy logic and decision trees was done in WEKA.
For fuzzy logic, a simple grid partition was chosen under 10-fold
cross-validation and an accuracy of 58.8% was obtained. For the
decision tree algorithms (BF tree, J48, LMT and Random Forest),
10-fold cross-validation was also used. Fuzzy neural classification
was done in Neunet Pro using the SFAM method; in this experiment 70%
of the data was used for training and 30% for testing. In the health
sector, diagnosing a disease is a very important task. Fuzzy logic
and decision trees are good classifiers, but their performance
depends on the nature of the problem. For diagnosing liver disease,
the fuzzy neural network classifies better than the other two
techniques. The J48 decision tree is shown in Figure 3, the FNN
classification of the test data in Table 1, and the results of all
techniques in Table 2.
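
For readers unfamiliar with 10-fold cross-validation, the Python sketch below shows how 345 records can be divided into ten folds, each used once for testing while the other nine are used for training. It is a plain contiguous split written for illustration, not the procedure of any particular tool; in practice the records would usually be shuffled or stratified by class first.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


folds = k_fold_indices(345, 10)
sizes = [len(f) for f in folds]

# Each round holds out one fold for testing and trains on the remaining nine.
test_fold = folds[0]
train = [i for f in folds[1:] for i in f]
print(sizes, len(test_fold), len(train))
```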

Figure 3: J48 Decision Tree

Table 1: Classification of FNN for Test data

                      Predicted Value
                      Y        N
Actual Value    Y     37       9
                N     0        59
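
The accuracy reported for the fuzzy neural network can be recomputed from the counts in Table 1, as in the short Python check below. Treating Y as the positive (diseased) class is an assumption, since the table does not say which class Y denotes.

```python
# Counts from Table 1 (rows are actual values, columns are predicted values).
tp, fn = 37, 9     # actual Y: predicted Y, predicted N
fp, tn = 0, 59     # actual N: predicted Y, predicted N

total = tp + fn + fp + tn
accuracy = (tp + tn) / total          # 96 correct out of 105 test records
sensitivity = tp / (tp + fn)          # share of actual Y correctly found
specificity = tn / (tn + fp)          # share of actual N correctly found

print(total, round(accuracy * 100, 1), round(sensitivity, 3), specificity)
```

The 105 test records are consistent with a 30% hold-out from 345 samples, and 96/105 rounds to the 91% quoted in the text.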


Table 2: Results of different classification techniques

Technique               Correct Classifications
Fuzzy logic             58.8%
BF Tree                 64.9%
J48                     68.7%
LMT                     66.4%
Random Forest           68.1%
Fuzzy Neural Network    91%

6 Conclusion
This study surveyed some machine learning techniques for predicting
liver disease. The study analyzed fuzzy logic, decision trees and a
fuzzy neural network. The experimental results show that the fuzzy
neural network gives 91% accuracy, the decision tree (J48) gives
68.7% and fuzzy logic gives 58.8% accuracy for classification. The
hybrid technique could be successfully used to help the diagnosis of
liver disease. The neuro-fuzzy system used in this study performs
better than the other two techniques. The advantage of SFAM is that
it performs classification very efficiently and with very high
accuracy.

References
[1] Yao, H., Hamilton, H.J., Butz, C.J., A Foundational Approach
to Mining Itemset Utilities from Databases, In 4th SIAM
International Conference on Data Mining, Florida, USA, 2004.
[2] [Link], R. Jemina Priyadarsini, A Survey on Classification
Techniques in Data Mining for Analyzing Liver Disease Disorder,
International Journal of Computer Science and Mobile Computing,
Vol. 5, 2016, 483-488.
[3] Adeli, A. and Neshat, M., A Fuzzy Expert System for Heart
Disease Diagnosis, Proceedings of the International
MultiConference of Engineers and Computer Scientists, 2010.


[4] Zarandi, M.H.F., Zolnoori, M. M., Heidarnejad, H., A Fuzzy
Rule-Based Expert System for Diagnosing Asthma, Transaction E:
Industrial Engineering, 2010, 17(2), 129-142.
[5] [Link]. M, Diagnosis of Erythemato-Squamous Diseases Using
Ensemble of Data Mining Methods, ICGST-BIME Journal, 2010, 10(1).
[6] Bendi Venkata Ramana, [Link] Prasad Babu and N.B.
Venkateswarlu, A Critical Study of Selected Classification
Algorithms for Liver Disease Diagnosis, International Journal of
Database Management Systems (IJDMS), 2011, 3(2).
[7] Leoni Sharmila, S., Dharuman, C., Venkatesan, P., A Novel
Neuro-Fuzzy System for Classification, Global Journal of Pure and
Applied Mathematics, 2016, 12, 267-270.
[8] [Link]/pub/ml-repos/liver disease.
[9] Young Hoon Joo, Guanrong Chen, Fuzzy Systems Modeling: An
Introduction, IGI Global Distributions, 2009.
[10] Vakil-Baghmisheh, M., Pavešić, N., A Fast Simplified Fuzzy
ARTMAP Network, Neural Processing Letters, 2003, 17, 273-316.
[11] Boonruang Marungsri, Suphachai Boonpoke, Applications of
Simplified Fuzzy ARTMAP to Partial Discharge Classification and
Pattern Recognition, WSEAS Transactions on Systems, 2011, 10,
69-80.
[12] Nidhi Bhatla, Kiran Jyoti, A Novel Approach for Heart Disease
Diagnosis Using Data Mining and Fuzzy Logic, International Journal
of Computer Applications, 2012, 54(17).
[13] Thenmozhi, K., Deepika, P., Heart Disease Prediction Using
Classification with Different Decision Tree, International Journal
of Engineering Research and General Science, 2014, 2(6).
[14] Jehad Ali, Rehanullah Khan, Nasir Ahmad, Imran Maqsood,
Random Forests and Decision Trees, International Journal of
Computer Science, 2012, 9.