
Biomedical Communication
Bioscience Biotechnology Research Communications (Biosc.Biotech.Res.Comm.) Vol 13 (2) April-May-June 2020 Pp-576-579

Identification of Parkinson’s Disease Using Machine Learning Algorithms
V. Ulagamuthalvi1, G. Kulanthaivel2, G. Sri Nikhil Reddy3 and G. Venugopal3
1 School of Computing, Sathyabama Institute of Science and Technology, Chennai, India
2 NITTTR, Chennai, India
3 School of Computing, Sathyabama Institute of Science and Technology, Chennai, India

ABSTRACT
Parkinson’s disease is a progressive nervous system disorder that affects human movement. Symptoms start gradually. As a result of the syndrome, patients have difficulty with activities such as talking and strolling, and experience tremor during motion. Physicians normally assess the disease using two scales: the Hoehn and Yahr scale and the Unified Parkinson’s Disease Rating Scale. The dataset, taken from the UCI dataset repository, contains many features, among them audio-signal features. Parkinson’s disease patients have a low-volume voice with a monotone quality. In this system, audio features such as jitter, shimmer, noise-to-harmonics ratio (NHR) and Multidimensional Voice Program (MDVP) measures are used as training and test data. The MinMaxScaler method is used to preprocess the data. A threshold value and the correlation coefficient of the audio data serve as the parameters for feature selection. Machine learning classifiers are used to identify the disease: our model employs Logistic Regression and eXtreme Gradient Boosting (XGBoost) classifiers. Of the twenty-one features, only twelve play an important role in predicting the disease. The system predicts whether a subject has Parkinson’s disease or is healthy. The XGBoost classifier achieved an accuracy of 96% and a Matthews Correlation Coefficient (MCC) of 0.89.

KEY WORDS: Multidimensional Voice Program, Matthews Correlation Coefficient, Parkinson’s disease, XGBoost.

ARTICLE INFORMATION
*Corresponding Author: [email protected]
Received 12th April 2020 Accepted after revision 19th May 2020
Print ISSN: 0974-6455 Online ISSN: 2321-4007 CODEN: BBRCBA
Thomson Reuters ISI Web of Science Clarivate Analytics USA and Crossref Indexed Journal
NAAS Journal Score 2020 (4.31) SJIF: 2020 (7.728)
A Society of Science and Nature Publication, Bhopal India 2020. All rights reserved
Online Contents Available at: http//www.bbrc.in/
DOI: 10.21786/bbrc/13.2/32

INTRODUCTION

Parkinson’s disease (PD) is a neurodegenerative disorder caused by the death of dopamine-generating cells (Jankovic et al 2008). The loss of dopaminergic neurons in the midbrain decreases the achievable rate of communication. Parkinson’s disease affects the central nervous system, which in turn affects the motor system; the main PD symptoms are tremor, rigidity and movement disorders (Ramezani et al 2017).

About 90% of people with Parkinson’s disease have a speech impairment, yet only 3% to 4% of PD patients receive speech therapy. One of the most important factors for PD is age: most PD patients are aged between 45 and 60 (Levine et al 2003). The speech of PD patients shows a change in the frequency spectrum of the voice because they lose control of the limbs, which decreases the frequency of the audio. The low-frequency region therefore gives important data for differentiating speech impairments in PD. Unified
Parkinson’s Disease Rating Scale (UPDRS) is used to assess the severity of PD with the help of clinical expertise and experience (Dobson et al 2008).

We perform feature selection on the audio-feature dataset created by Max Little of the University of Oxford (Centre for Machine Learning and Intelligent System 2009); high prediction performance has been achieved in terms of classification accuracy. An algorithm yields different accuracies for different variables, depending on the other attributes present in the feature dataset, so features play an important role. The dataset, which we took from the UCI repository, contains 21 features, and we applied Pearson’s correlation coefficient to determine the correlation among features.

Neharika et al (2020) describe the Multi-Dimensional Voice Program (MDVP), a standard computer program that can calculate as many as 33 acoustic parameters from a voice sample. Dobson et al (2008) present comparative efforts in which both model-based and model-free techniques are used for predicting Parkinson’s disease. Rätsch et al (2001) state that the most commonly used model-based tool is logistic regression, which measures the outcome on a binary scale (e.g. healthy/not); classification is carried out based on the estimated probabilities. Model-free methods like XGBoost, in contrast, adapt to the intrinsic characteristics of the data.

Fietzek et al (2020) report that high dataset-size requirements were met through a supervised data collection approach that generated informative annotations in one-minute intervals; to their knowledge, collecting expert annotations on a one-minute basis had not been reported before at such a large scale. Abós et al (2017) described data characteristics without any a priori model. We used the XGBoost algorithm for classification. XGBoost algorithms benefit from constant learning or retraining, but they do not guarantee optimized classification/regression. However, when trained and maintained, XGBoost learning methods have greater potential than logistic regression for solving real-world problems. Prior reports using the XGBoost technique to diagnose Parkinson’s disease classify patients according to their cognitive status.

XGBoost provided an accuracy of 96% in classifying the dataset and logistic regression an accuracy of 79%. This system for predicting PD compares the accuracy of LR and XGBoost on the training and test datasets. It uses the correlation coefficient to find the correlation among features; the comparison showed that XGBoost performed better than LR, with an accuracy of 96%. Mohammad et al (2014) performed a comparative analysis for detecting Parkinson’s disease using classifiers such as Support Vector Machine (SVM), Random Tree (RT) and feed-forward back-propagation Artificial Neural Network (FBANN). Geetha et al (2011) compared classifiers for differentiating between PD and healthy persons; the dataset in that study contains 195 voice samples from both male and female subjects, including 23 PD patients as well as healthy subjects. Comparing all the classifiers, the FBANN classifier achieved 97.37% accuracy.

Srilatha et al (2019) note that classification is an important task within the field of computer vision. Image classification refers to labelling images into one of a number of predefined categories and includes image sensors, image pre-processing, object detection, object segmentation, feature extraction and object classification. Many classification techniques have been developed for image classification; most attention is on using various classifiers combined with several segmentation algorithms for tumor detection using image processing. Shraddha et al (2019) used true positives, true negatives and accuracy as performance parameters. The authors used various semi-supervised classifiers for intrusion detection, all on the NSL KDD dataset.

Ramani et al (2011) discussed a system to classify PD and non-PD patients using Binary Logistic Regression, Linear Discriminant Analysis (LDA), Random Tree and SVM. The dataset came from the UCI PD repository; the training dataset consisted of 195 samples with 21 features, and LDA and Random Tree achieved an accuracy greater than 90%. Resul et al (2010) used various classification models to identify PD. The techniques implemented and analyzed were neural network, regression and decision tree. Various evaluation methods were used to assess classifier performance; only the neural network classifier yielded good results. The input dataset was randomly split into training and test sets. Paul et al (2019) used machine learning techniques for predicting student dropout using data mining.

In that model a decision tree was used to predict student dropout, obtaining an accuracy of 97.69%; the prediction was made using various parameters considered for every student. Mallikarjuna et al (2020) presented a feedback-based approach comparing normality and abnormality with the back-propagation approach. In the training phase, the extracted feature sequences of normal and abnormal walking for the classes A, B, C and D (normal, Parkinson gait, hemiplegic gait and neuropathic gait) are compared with the normal data set.

MATERIAL AND METHODS

In this system, we applied two machine learning algorithms, Logistic Regression and XGBoost. We implemented both to find the better model for the dataset.

Logistic regression: According to Mohammad et al (2013), logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous. It is used to explain the
relationship between one or more independent variables and one dependent binary variable; the dependent variable must be binary in nature, e.g. 0 or 1. There should be no high correlation among the predictors; this can be assessed with a correlation matrix.

Figure 1 shows the correlation between features; it distinguishes strong positive and strong negative correlations among features.

Figure 1: Cross correlation among features

Here the outcome has two classes. Logistic regression starts with a different model setup than linear regression: instead of modeling Y as a function of X directly, we model the probability that Y is equal to class 1, given X. First, we abbreviate P(X) = P(Y = 1 | X).
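
The probability model P(X) = P(Y = 1 | X) used by logistic regression can be illustrated with a minimal sketch. The weights, bias, sample values and the 0.5 decision threshold below are hypothetical choices for illustration only; they are not the coefficients fitted in this study.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Return P(X) = P(Y = 1 | X) for a single feature vector x."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(score)


def predict_label(x, weights, bias, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return 1 if predict_proba(x, weights, bias) >= threshold else 0


# Hypothetical coefficients and a hypothetical scaled sample (illustration only).
weights = [1.5, -2.0]
bias = 0.25
sample = [0.8, 0.1]

p = predict_proba(sample, weights, bias)
label = predict_label(sample, weights, bias)
```

In practice the weights and bias would be estimated from the training data rather than set by hand.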

A. XGBoost: Zhang et al (2019) presented XGBoost as a boosting algorithm; it is a statistical learning method derived from the gradient boosting decision tree, with better performance and optimization. We used XGBoost because of its efficiency and feasibility. XGBoost accepts dense and sparse matrices as input and, for classification, a numeric label vector of integers starting from 0; the number of boosting iterations can be added to the model. Consider a dataset with n samples and d features per sample; s_k is then the prediction from the k-th decision tree.

The prediction scores of the individual trees are summed to obtain the final score. Mathematically, our model has the form

$$\hat{y}_i = \sum_{k=1}^{K} s_k(x_i), \qquad s_k \in S \qquad (1)$$
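
The additive form described here, in which the final score is the sum of the individual tree predictions s_k, can be sketched with a toy booster. This is not XGBoost itself: it is a simplified least-squares gradient-boosting loop over one-split trees (stumps) in plain Python, and the function names, the learning rate and the six-sample dataset are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Build n_trees stumps, each fitted to what the current sum still gets wrong."""
    trees = []
    scores = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - s for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        s_k = lambda x, stump=stump: learning_rate * stump(x)
        trees.append(s_k)
        scores = [s + s_k(x) for s, x in zip(scores, xs)]
    return trees


def predict(trees, x):
    """Final score: the sum of the predictions of all trees."""
    return sum(s_k(x) for s_k in trees)


# Hypothetical one-feature dataset with integer class labels starting from 0.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]

trees = boost(xs, ys)
labels = [1 if predict(trees, x) >= 0.5 else 0 for x in xs]
```

XGBoost adds regularization, second-order gradient information and multi-feature trees on top of this basic idea, so the sketch only conveys the summation structure.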

In Eq. (1), K is the number of trees and each s_k is a function in the function space S.

B. Data preprocessing: MinMaxScaler and Normalizer are preprocessing methods in scikit-learn; the method is selected based on the feature values. Machine learning algorithms perform better and faster when features are on a relative or similar scale, so we suggest MinMaxScaler for preprocessing. It subtracts the minimum value of a feature and divides by its range, where the range is the difference between the maximum and the minimum, i.e. x' = (x - x_min) / (x_max - x_min). By default MinMaxScaler returns values in the range 0 to 1.

C. Feature selection: Arefi et al (2011) noted that features play an important role in classification. There are different approaches to feature selection; based on a threshold value and a benchmark algorithm we determine the optimality of the features in the dataset. Correlation-coefficient feature selection is the most widely used approach because features are selected based on their correlation with one another (Shahbakhti et al 2013).

Suppose f1 and f2 are two correlated features; Pearson’s correlation coefficient (r) is then

$$r = \frac{\mathrm{cov}(f_1, f_2)}{\sigma_{f_1}\,\sigma_{f_2}} \qquad (2)$$

where \sigma denotes the standard deviation and cov(x, y) is the covariance of variables x and y,

$$\mathrm{cov}(x, y) = E\left[(x - \bar{x})(y - \bar{y})\right] \qquad (3)$$

RESULTS AND DISCUSSION

Comparing the results of the system, XGBoost achieved the highest classification rate, with an accuracy of 96%, whereas LR achieved only 79% accuracy. Figure 2 shows a bar chart of the accuracy and Matthews correlation coefficient of the XGBoost and Logistic Regression classifiers. Figure 1 shows that only 12 features are the most important and characteristic among the features present in the dataset.

Table 1. Classification accuracy of both models

Algorithm             Accuracy   MCC
Logistic Regression   0.79       0.42
XGBoost               0.96       0.89

Figure 2: Bar chart of classifiers accuracy
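
The two metrics reported in Table 1, accuracy and the Matthews Correlation Coefficient, can both be computed from the four confusion-matrix counts. The sketch below is a plain-Python illustration; the label vectors are hypothetical and do not reproduce the predictions of this study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = PD, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; defined as 0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical ground truth and predictions (illustration only).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
coef = mcc(*counts)
```

Because MCC uses all four counts, it stays informative when the classes are imbalanced, which is one reason to report it alongside accuracy.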

This approach will permit us to compare two or more new algorithms with XGBoost and show the performance of the XGBoost classifier.

CONCLUSION

The aim of the study is to analyze which algorithm provides the highest prediction accuracy for the Parkinson’s disease dataset. The classification accuracy was studied and compared; with good performance and fast implementation, XGBoost achieved a high accuracy of 96%. This system provides a comparison between the LR and XGBoost machine learning classifiers in PD diagnosis with high-dimensional data.

REFERENCES

Abós Alexandra, Hugo C Baggio, Bàrbara Segura (2017) Discriminating cognitive status in Parkinson’s disease through functional connectomics and machine learning. Scientific Reports Vol 7 No. 45347.
Arefi Shirvan R, E Tahami (2011) Voice analysis for detecting Parkinson's disease using genetic algorithm and KNN classification method. Proc 18th Int Con on Biomedical Engineering, Tehran Pages 550-555.
Mallikarjuna B, R. Viswanathan and Bharat Bhushan Naib (2020) Feedback-based gait identification using deep neural network classification. Journal of Critical Reviews Vol 7 Pages 661-667.
Center for Machine Learning and Intelligent System (2009) website: http://Archive.Ics.UCI.Edu/ML/Datasets/Parkinsons.
Dobson A J, Barnett A (2008) An introduction to generalized linear models. CRC Press.
Franz M J Pfister, Terry Taewoong Um, Daniel C Pichler (2020) High-Resolution Motor State Detection in Parkinson’s Disease Using Convolutional Neural Networks. Scientific Reports Vol 10 5860.
Geetha R and R Sivagami (2011) Parkinson Disease Classification Using Data Mining Algorithms. International Journal of Computer Applications Vol 32 No 9 Pages 17-22.
Gladence L Mary, M Karthi, V Maria Anu (2015) A statistical comparison of logistic regression and different Bayes classification methods for machine learning. Journal of Engineering and Applied Sciences Vol 10 No 14 Pages 5947-5953.
Jankovic J (2008) Parkinson’s disease: clinical features and diagnosis. Journal of Neurology, Neurosurgery & Psychiatry Vol 79 Issue 4 Pages 368-376.
Levine C B, Fahrbach K R, Siderowf A D, R P Estok, V M Ludensky, S D Ross (2003) Diagnosis and treatment of Parkinson’s disease: a systematic review of the literature. Evid. Rep. Technol. Assess No. 57 Pages 1–4.
Mercy Paul Selvan, Nagubadi Navadurga, Nimmagadda Lakshmi Prasanna (2019) An Efficient Model for Predicting Student Dropout using Data Mining and Machine Learning Techniques. International Journal of Innovative Technology and Exploring Engineering Vol 8 Issue 9S2 Pages 750 – 752.
Mohammad S Islam, Imtiaz Parvez, Hai Deng, Parijat Goswami (2014) Performance Comparison Of Heterogeneous Classifiers For Detection Of Parkinson's Disease Using Voice Disorder (Dysphonia). International Conference On Informatics, Electronics & Vision (ICIEV) Pages 1 – 7.
Mohammad Shahbakhti, Danial Taherifar (2013) Linear and Non-Linear speech features for detection of Parkinson’s disease. BMEiCON-2013.
Murugan S, Kulanthaivel G, V Ulagamuthalvi (2019) Selection of test case features using fuzzy entropy measure and random forest. Ing. Des Syst. d’Information Vol 24 No 3 Pages 261–268.
Murugan S. and Ramachandran V. (2012) Aspect Oriented Decision Making Model for Byzantine Agreement. Journal of Computer Science Vol 8 No. 3 Pages 382-388.
Neharika D Bala, Anusuya S (2020) Machine Learning Algorithms for Detection of Parkinson’s Disease using Motor Symptoms: Speech and Tremor. International Journal of Recent Technology and Engineering Vol 8 Issue 6 Pages 47-50.
Ramezani H, Akan O B (2017) Rate region analysis of multi-terminal neuronal nanoscale communication channel. In 17th IEEE NANO Conf. IEEE.
Rätsch G, Onoda T, Müller K R (2001) Soft margins for AdaBoost. Machine Learning Vol 42 Pages 287–320.
Resul Das (2010) A Comparison of Multiple Classification Methods for Diagnosis of Parkinson Disease. Expert Systems with Applications Vol 37 Issue 2 Pages 1568-1572.
Shraddha Khonde, V Ulagamuthalvi (2019) Fusion of feature selection and Random Forest for an Anomaly based intrusion detection system. Journal of Computational and Theoretical Nanoscience Vol 16 Pages 3603-3607.
Srilatha K, V Ulagamuthalvi (2019) A Comparative Study on Tumor Classification. Research Journal of Pharmacy and Technology Vol 12 No 1 Pages 407-411.
Tsanas, M A Little, P E McSharry, J Spielman, L O Ramig (2012) Novel Speech Signal Processing Algorithms for High Accuracy Classification of Parkinson's Disease. IEEE Transactions on Biomedical Engineering Vol 59 No 5 Pages 1264-1271.
Zhang, J Ren, Y Cheng, B Wang, Z Wei (2019) Health Data Driven on Continuous Blood Pressure Prediction Based on Gradient Boosting Decision Tree Algorithm. IEEE Access Vol 7 Pages 32423-32433.
