International Journal of Computer Science Trends and Technology (IJCST) – Volume 8 Issue 2, Mar-Apr 2020
RESEARCH ARTICLE OPEN ACCESS
A Data Mining Tool in Neurological Disorder
Prediction Using Feature Selection Techniques
P. Sravanthi [1], B. Sai Sudha [2], P. Sai Ganesh [3], M. Dinesh [4], T. Ravi Kumar [5], G. Rajasekharam [6]
Department of Computer Science and Engineering, Nadimpalli Satyanarayana Raju Institute of
Technology, Andhra Pradesh, India
ABSTRACT
Data mining techniques are used in a wide variety of applications. In the healthcare industry, data mining plays
an important role in predicting diseases. Detecting a disease usually requires a number of tests on the patient.
Data mining techniques can reduce the number of tests required, which saves time and improves performance.
Neurological disorders are diseases of the brain, the spine and the nerves that connect them. The increasing
capabilities of technology are generating massive volumes of complex data at a rapid pace. Evaluating and
diagnosing disorders of the nervous system is a complicated task, because many of the same or similar symptoms
occur in different combinations across different disorders. Here, we present selected data mining methods
applied to the diagnosis of neurological diseases. This will help experts understand how data mining techniques
can assist them in diagnosing neurological diseases and treating patients. The gradient boosting tree algorithm
is used to predict the disease because it produced the most accurate results in our experiments. Our aim is to
compare the performance of different classification methods on a large database.
Keywords:- Data Mining, Classification Rules, Voice dataset, Bar Graph.
I. INTRODUCTION
PROBLEM STATEMENT
Neurodegenerative disorders result from the progressive degeneration and loss of neurons in different areas
of the nervous system. Neurons are the functional units of the brain. They are contiguous rather than
continuous. A healthy neuron, as shown in Fig 1, has extensions called dendrites and axons, a cell body and a
nucleus that contains our DNA. DNA is our genome, and each of the hundred billion neurons contains the entire
genome packaged inside it. When a neuron gets sick, it loses its extensions and hence its ability to
communicate, and its metabolism slows down, so it starts to accumulate junk, which it tries to contain in
little packages and pockets. When things become worse, and if the neuron is in a cell culture, it completely
loses its extensions, becomes round and fills with vacuoles.

This work deals with the prediction of Parkinson’s disease, an incurable disease whose incidence is
increasing rapidly. Parkinson’s disease is one of the most rapidly spreading diseases [19]. It takes its name
from James Parkinson, who first described it as paralysis agitans; it later took his surname and became known
as Parkinson’s disease (PD). It generally affects the neurons that are responsible for overall body movement.
The main chemicals involved are dopamine and acetylcholine, which act on the human brain. Various
environmental factors have been implicated in PD [20]. The factors that cause Parkinson’s disease in an
individual are listed below.

II. OBJECTIVES
The main objective is to improve prediction efficiency in a way that benefits patients suffering from
Parkinson’s disease, so that the percentage of affected people can be reduced. Generally, in the first stage,
Parkinson’s disease can be managed with proper treatment, so it is important to identify PD at an early stage
for the betterment of the patients. The main purpose of this research work is to find the best prediction
model, i.e. the machine learning technique that best distinguishes Parkinson’s patients from healthy people.
The techniques considered are SVM, AdaBoost, naive Bayes, gradient descent and gradient boosting trees. We
have found that neural networks, SVM and linear regression have been reported in various studies, whereas
only a few researchers have explored
ISSN: 2347-8578 www.ijcstjournal.org Page 100
AdaBoost and gradient boosting trees. The experimental study is performed on biomedical voice measurements
from 31 people, 23 of whom have Parkinson’s disease. The prediction is evaluated using error rates. In
addition, a feature selection technique has been implemented with the aim of identifying the important
features that can detect Parkinson’s disease.

III. SCOPE
By using this feature selection technique we can predict the disease at its initial stage and save the life
of the affected person. Parkinson’s disease is one of the most important disorders to detect in the early
phases of its onset, so as to reduce the rate of disease progression among individuals. Various studies have
been carried out to find the basic cause, and some have gone further by proposing systems that differentiate
healthy people from those with Parkinson’s disease in a data set using various machine learning techniques.
Many pre-processing, feature selection and classification techniques have been developed and implemented in
the past decades. The following work has been done on the prediction of Parkinson’s disease.
Fig 1: Structure of a neuron present in the human brain

Machine learning algorithms
1. Naive Bayes
2. SVM (support vector machine)
3. SGD (stochastic gradient descent)
4. GBT (gradient boosting tree)

1. Naive Bayes Classifiers
This article discusses the theory behind naive Bayes classifiers and their implementation.
Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is not a
single algorithm but a family of algorithms that share a common principle: every pair of features being
classified is assumed to be independent of each other, given the class.
To start with, let us consider a dataset.
Consider a fictional dataset that describes the features for prediction of Parkinson’s disease.
Fig 2: Represents the data set
The dataset is divided into two parts, namely the feature matrix and the response vector.
Feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values
of the features.
Response vector contains the value of the class variable (prediction or output) for each row of the feature
matrix.

2. Support Vector Machine (SVM)
I guess by now you would have accustomed yourself with the linear regression and logistic regression
algorithms. If not, I suggest you have a look at them before moving on to the support vector machine. The
support vector machine is another simple algorithm that every machine learning expert should have in his/her
arsenal. It is highly preferred by many as it produces significant accuracy with less computation power.
Support Vector Machine, abbreviated as SVM, can be used for both regression and classification tasks, but it
is most widely used for classification.
The objective of the support vector machine algorithm is to find a hyperplane in an N-dimensional space
(where N is the number of features) that distinctly classifies the data points.
Fig 3: Possible hyperplanes
Fig 4: Hyperplanes in 2D and 3D feature space

3. Gradient Descent technique
Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the
direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use
gradient descent to update the parameters of our model. Parameters refer to coefficients in linear
regression and weights in neural networks.
Fig 6: Shows the SGD technique
Step-by-step
Now let’s run gradient descent using our new cost function. There are two parameters in our cost function we
can control: m (weight) and b (bias). Since we need to consider the impact each one has on the final
prediction, we need to use partial derivatives. We calculate the partial derivatives of the cost function
with respect to each parameter and store the results in a gradient.
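The step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define the cost function, so mean squared error on a line y = m*x + b is assumed, and the function names, learning rate and toy data are hypothetical.

```python
def gradient_step(xs, ys, m, b, learning_rate):
    """Perform one gradient descent update of m (weight) and b (bias).

    Assumed cost: mean squared error, (1/N) * sum((y - (m*x + b))**2).
    """
    n = len(xs)
    grad_m = 0.0
    grad_b = 0.0
    for x, y in zip(xs, ys):
        error = y - (m * x + b)
        grad_m += -2.0 * x * error / n  # partial derivative w.r.t. m
        grad_b += -2.0 * error / n      # partial derivative w.r.t. b
    # Move against the gradient, i.e. in the direction of steepest descent.
    return m - learning_rate * grad_m, b - learning_rate * grad_b


def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Repeat the update from a zero start and return the fitted (m, b)."""
    m, b = 0.0, 0.0
    for _ in range(steps):
        m, b = gradient_step(xs, ys, m, b, learning_rate)
    return m, b


# Toy data generated from y = 2x + 1 (invented for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
print(round(m, 4), round(b, 4))
```

The stochastic variant (SGD) follows the same update rule but estimates the gradient from one sample, or a small batch, per step instead of the full data set.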
Fig 5: Represents the support vectors

4. Gradient boosting tree
Although most Kaggle competition winners use a stack or ensemble of various models, one particular model
that is part of most of the ensembles is some variant of the Gradient Boosting (GBM) algorithm. Take, for
example, the winner of the latest Kaggle competition: Michael Jahrer’s solution with representation learning
in Safe Driver Prediction. His solution was a blend of 6 models: 1 LightGBM (a variant of GBM) and 5 neural
nets. Although his
success is attributed to the semi-supervised learning that he used for the structured data, the gradient
boosting model did a useful part too. Even though GBM is widely used, many practitioners still treat it as a
complex black-box algorithm and just run the models using pre-built libraries. The purpose of this post is to
simplify a supposedly complex algorithm and to help the reader understand it intuitively. I am going to
explain the pure vanilla version of the gradient boosting algorithm and will share links to its different
variants at the end. I have taken the base DecisionTree code from the fast.ai library
(fastai/courses/ml1/lesson3-rf_foundations.ipynb) and, on top of that, I have built my own simple version of
a basic gradient boosting model.
Fig 9: Simulated data (x: input, y: output)
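The vanilla idea can be illustrated with a short, self-contained sketch. It is not the fast.ai DecisionTree code mentioned above: it assumes squared-error loss, a one-dimensional input and depth-one trees (stumps), and every name, parameter value and data point in it is hypothetical. Each round fits a stump to the current residuals (the negative gradient of the squared loss) and adds a shrunken copy of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_mean, right_mean) with the lowest squared error."""
    best = None
    points = sorted(set(xs))
    for lower, upper in zip(points, points[1:]):
        threshold = (lower + upper) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Fit stumps sequentially, each one on the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict_boosting(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Simulated step-shaped data (invented for illustration).
xs = [i / 2.0 for i in range(20)]
ys = [1.0 if x < 3 else 4.0 if x < 7 else 2.0 for x in xs]

model = fit_boosting(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [predict_boosting(model, x) for x in xs])
print(baseline_mse, boosted_mse)
```

A smaller learning rate shrinks each stump's contribution, so more rounds are needed, but the fit usually becomes less sensitive to any single tree.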
IV. PROPOSED SYSTEM
We propose a new method for collecting the symptoms of the disease. We use a data set of Parkinson’s disease
and apply several machine learning algorithms to obtain more accurate predictions. The algorithms used are:
a. NAIVE BAYES
b. SVM
c. SGD
d. GBT
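To make the first algorithm in this list concrete, the following is a minimal sketch of a naive Bayes classifier working on a feature matrix and a response vector. The Gaussian likelihood is an assumption (the paper does not say which variant was used), and the two features, their values and the function names are invented for illustration.

```python
import math


def fit_gaussian_nb(feature_matrix, response_vector):
    """Estimate a prior plus per-feature mean and variance for every class."""
    model = {}
    n_total = len(response_vector)
    for label in set(response_vector):
        rows = [row for row, y in zip(feature_matrix, response_vector) if y == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # A small constant keeps the variance strictly positive.
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
                     for j in range(n_features)]
        model[label] = (len(rows) / n_total, means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for x, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data: two features per row, label 1 = Parkinson's, 0 = healthy.
X = [[0.20, 1.1], [0.30, 0.9], [0.25, 1.0],
     [0.90, 2.1], [1.00, 1.9], [0.95, 2.0]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
prediction_a = predict(model, [0.22, 1.05])
prediction_b = predict(model, [0.97, 2.05])
print(prediction_a, prediction_b)
```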
Fig 7: Represents the iteration, parallel and sequential
Fig 8: Sample random normally distributed residuals with mean around 0
Compared with previous research, we obtained better results, which are shown below.
Fig 10: Represents the total results of the algorithms
V. ADVANTAGES
We have taken the voice data set of Parkinson’s disease.
We have taken 195 patient samples with 23 features from the voice data set:
['MDVP:Fo(Hz)', 'MDVP:Fhi(Hz)',
'MDVP:Flo(Hz)', 'MDVP:Jitter(%)',
'MDVP:Jitter(Abs)', 'MDVP:RAP',
'MDVP:PPQ', 'Jitter:DDP', 'MDVP:Shimmer',
'MDVP:Shimmer(dB)', 'Shimmer:APQ3',
'Shimmer:APQ5', 'MDVP:APQ',
'Shimmer:DDA', 'NHR', 'RPDE', 'DFA',
'spread1', 'spread2', 'D2', 'PPE']
From these features we compare the patients’ data and predict the disease.
We divided the prediction into 3 sub-parts of 50, 100 and 150 samples.
In the above bar graph we can see the results.
By dividing the samples into 3 parts we can see which algorithm gives better output and more accurate results.
In our observation, GBT gives the most accurate results.
Time complexity is very low.
Reliability
Portability
By using these algorithms we can increase the life span of the patient if the disease is detected in its
initial stages.

SRS (SOFTWARE REQUIREMENT SPECIFICATION)
A software requirements specification (SRS) is a detailed description of a software system to be developed,
with its functional and non-functional requirements. The SRS is developed based on the agreement between
customer and contractors. It may include use cases of how the user is going to interact with the software
system. The software requirement specification document consists of all the necessary requirements for
project development. To develop the software system we should have a clear understanding of the software
system. To achieve this we need continuous communication with customers to gather all requirements.
Fig 11: Shows the software requirement specification

SPIRAL MODEL
The spiral model is one of the most important Software Development Life Cycle models, and it provides support
for risk handling.
In its diagrammatic representation, it looks like a spiral with many loops.
The exact number of loops of the spiral is unknown and can vary from project to project.
Each loop of the spiral is called a Phase of the software development process. The exact number of phases
needed to develop the product can be varied by the project manager depending upon the project risks.
As the project manager dynamically determines the number of phases, the project manager has an important role
in developing a product using the spiral model.
The radius of the spiral at any point represents the expenses (cost) of the project so far, and the angular
dimension represents the progress made so far in the current phase.
The diagram below shows the different phases of the Spiral Model:
Fig 12: Shows the working of the spiral model

Advantages of Spiral Model
Below are some of the advantages of the Spiral Model.
Risk Handling: For projects with many unknown risks that occur as the development proceeds, the Spiral Model
is the best development model to follow, due to the risk analysis and risk handling at every phase.
Good for large projects: It is recommended to use the Spiral Model in large and complex projects.
Flexibility in Requirements: Change requests in the requirements at a later phase can be incorporated
accurately by using this model.
Customer Satisfaction: The customer can see the development of the product at an early phase of the software
development and thus becomes habituated to the system by using it before completion of the total product.

VI. FUNCTIONAL REQUIREMENTS
A Functional Requirement (FR) is a description of the service that the software must offer.
Functional Requirements are also called Functional Specifications.
Capturing the patient’s medical data as input.
Handling large-scale datasets.
An observational, multi-center study that includes early untreated Parkinson’s disease patients along with
age- and gender-matched healthy normal subjects, to identify progression biomarkers in Parkinson’s disease.
The output will be comparative analysis graphs of the different machine learning algorithms.

NON-FUNCTIONAL REQUIREMENTS
A non-functional requirement defines a quality attribute of a software system.
Non-functional requirements allow you to impose constraints or restrictions on the design of the system
across the various agile backlogs.
Login for authorized users.
Data backup using SQL queries.
It should also run offline.
DATA FLOW DIAGRAM
Fig 13: Data flow diagram
ARCHITECTURAL DIAGRAM
Fig 14: Shows the architecture diagram
SOFTWARE TOOLS
The tools used to implement the solution are as follows:
Python 3.7.4 (64 bit)
Microsoft Excel 2007
Parkinson’s data set
Machine learning algorithms
HTML web page for user convenience
SCREEN SHOTS OUTPUT
Fig 14: HOME PAGE OF HTML
Fig 15: TO DIAGNOSE INDIVIDUAL PERSON
If the input matches the symptoms we get “YES”; if it does not match we get “NO”.
Fig 16: UPLOAD TRAINING SET FILE
Fig 17: TRAINING SET OUTPUT
Fig 18: Generated graph
Fig 19: CLICK ON THE UPLOAD TEST
Fig 20: TEST FILE OUTPUT
Fig 21: Generated graph
Fig 22: CLICK ON GET SCATTER GRAPH

VII. FUTURE SCOPE
In this study we have used machine learning techniques; however, only a few studies have explored these
algorithms for this task. In future, the work can be extended by using autoencoders to reduce the number of
features and to extract the most important ones. The dataset used in this work is not very complex, so an
autoencoder did not learn well from it, but with a more complex dataset it is expected to give better results.

VIII. CONCLUSION
In this work, various prediction models for Parkinson’s disease detection were studied. For this purpose,
four machine learning techniques were used: naive Bayes, support vector machine (SVM), stochastic gradient
descent (SGD) and gradient tree boosting (GBT). To evaluate the results, four performance metrics were
considered: accuracy, sensitivity, ROC and specificity.
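Three of these metrics follow directly from the confusion matrix, as the sketch below shows. It uses the standard definitions, with label 1 taken as the positive (Parkinson’s) class; the function names and the example labels are hypothetical and are not results from this study. ROC analysis is left out because it needs predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Invented labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity measures how many actual patients are detected, while specificity measures how many healthy people are correctly cleared, so reporting both is more informative than accuracy alone when the classes are unbalanced.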
From the results, GBT stands out from the other ML techniques in terms of accuracy. After that, we tried to
select the smallest set of most important features from the speech articulation data of 195 people, which has
23 features as explained in the dataset description. For that we used feature selection techniques, varying
the number of features selected. This gives an overall accuracy of 96.6%, which is better than all the other
machine learning techniques when their performance metrics are compared on the 50, 100 and 150 sample
subsets.
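The paper does not name the feature selection technique that was used, so the following is only an illustrative filter-style sketch of selecting a chosen number of features: each feature is scored by the absolute Pearson correlation between its column and the class label, and the k highest-scoring features are kept. The feature names, values and function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(feature_matrix, labels, feature_names, k):
    """Rank features by |correlation with the label| and keep the best k names."""
    scores = []
    for j, name in enumerate(feature_names):
        column = [row[j] for row in feature_matrix]
        scores.append((abs(pearson(column, labels)), name))
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


# Invented toy data: feature_a tracks the label, feature_b is constant,
# feature_c is only weakly related to it.
feature_names = ["feature_a", "feature_b", "feature_c"]
X = [[0.10, 5.0, 1.0], [0.20, 5.0, 0.0], [0.15, 5.0, 1.0],
     [0.90, 5.0, 0.0], [1.00, 5.0, 1.0], [0.95, 5.0, 0.0]]
y = [0, 0, 0, 1, 1, 1]

selected = select_top_k(X, y, feature_names, k=2)
print(selected)
```

Re-training a classifier on the selected columns for several values of k is one simple way to see how accuracy changes with the number of features kept.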
Fig 23: Generated scatter graph

REFERENCES
[1] Kamal Nayan Reddy Challa, Venkata Sasank Pagolu and Ganapati Panda, “An Improved Approach for Prediction
of Parkinson’s Disease using Machine Learning Techniques”, in Proceedings of the International Conference on
Signal Processing, Communication, Power and Embedded System (SCOPES)-2016, pp. 1446-145, 2016.
[2] Geeta Yadav, Yugal Kumar and G. Sahoo,
“Predication of Parkinson’s disease using Data Mining Methods: a comparative analysis of tree, statistical
and support vector machine classifiers”, in Proceedings of the National Conference on Computing and
Communication Systems (NCCCS), pp. 1-4, 2012.
[3] Paolo Bonato, Delsey M. Sherrill, David G. Standaert, Sara S. Salles and Metin Akay, “Data Mining
Techniques to Detect Motor Fluctuations in Parkinson's Disease”, in Proceedings of the 26th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4766-4769, 2004.
[4] Sonu S. R., Vivek Prakash and Ravi Ranjan, “Prediction of Parkinson’s Disease using Data Mining”, in
Proceedings of the International Conference on Energy, Communication, Data Analytics and Soft Computing
(ICECDS), pp. 1082-1085, 2017.
[5] Aarushi Agarwal, Spriha Chandrayan and Sitanshu S Sahu, “Prediction of Parkinson’s Disease using Speech
Signal with Extreme Learning Machine”, in Proceedings of the International Conference on Electrical,
Electronics, and Optimization Techniques (ICEEOT), pp. 1-4, 2016.
[6] Akshaya Dinesh and Jennifer He, “Using Machine Learning to Diagnose Parkinson’s Disease from Voice
Recording”, in Proceedings of the IEEE MIT Undergraduate Research Technology Conference (URTC), pp. 1-4, 2017.
[7] Giulia Fiscon, Emanuel Weitschek, Giovanni Felici and Paola Bertolazzi, “Alzheimer’s disease patients
classification through EEG signals processing”, in Proceedings of the IEEE Symposium on Computational
Intelligence and Data Mining (CIDM), pp. 1-4, 2014.
[8] Pedro Miguel Rodrigues, Diamantino Freitas and Joao Paulo Teixeira, “Alzheimer electroencephalogram
temporal events detection by K-means”, in Proceedings of the International Conference on Health and Social
Care Information Systems and Technologies HCIST, pp. 859-864, 2012.
[9] Daniel Johnstone, Elizabeth A. Milward, Regina Berretta and Pablo Moscato, “Multivariate Protein
Signatures of Pre-Clinical Alzheimer’s Disease in the Alzheimer’s Disease Neuroimaging Initiative (ADNI)
Plasma Proteome Dataset”, in Proceedings of the Disease Neuroimaging Initiative, vol-7, pp. 1-17, 2017.

AUTHOR DETAILS
P. SRAVANTHI is presently pursuing B.Tech (CSE) in the Department of Computer Science Engineering at
Nadimpalli Satyanarayana Raju Institute of Technology.
B. SAI SUDHA is presently pursuing B.Tech (CSE) in the Department of Computer Science Engineering at
Nadimpalli Satyanarayana Raju Institute of Technology.
M. DINESH is presently pursuing B.Tech (CSE) in the Department of Computer Science Engineering at
Nadimpalli Satyanarayana Raju Institute of Technology.
P. SAI GANESH is presently pursuing B.Tech (CSE) in the Department of Computer Science Engineering at
Nadimpalli Satyanarayana Raju Institute of Technology.
MR. T. RAVI KUMAR (M.Tech) is working as an Assistant Professor in the Department of Computer Science and
Engineering at Nadimpalli Satyanarayana Raju Institute of Technology, Sontyam, Visakhapatnam.
MR. G. RAJASEKHARAM (M.Tech, Ph.D) is working as an Associate Professor (HOD) in the Department of Computer
Science and Engineering at Nadimpalli Satyanarayana Raju Institute of Technology, Sontyam, Visakhapatnam.