Parkinson’s Disease (PD)
Chandan K S
Department of Computer Science, JSS College of Arts, Commerce and Science
Ooty Road, Mysore-57002, India
Email: [email protected]
Abstract: Parkinson’s Disease (PD) is a progressive neurological disorder that affects millions of people around the world. Detecting it early can make a huge difference in how well patients respond to treatment and manage their symptoms. In this study, we explored how machine learning (ML) can help identify PD earlier and more accurately by analyzing clinical and genetic data. We gathered a rich dataset that included detailed medical and genetic information from both individuals diagnosed with PD and healthy participants. Using this data, we applied several machine learning techniques (specifically Random Forest, Decision Tree, and AdaBoost) to build predictive models for Parkinson’s Disease. We also used feature selection and engineering methods to improve the accuracy and efficiency of these models. Among the algorithms we tested, the Random Forest model stood out, achieving an AUC-ROC score of 0.95. This means it was highly effective at distinguishing between PD patients and healthy individuals. Through feature importance analysis, we also identified specific clinical signs and genetic markers that played a major role in predicting the disease. Our findings show that machine learning can be a powerful tool in the early detection of Parkinson’s Disease. These models have the potential to support doctors in making more informed decisions, allowing for earlier intervention and more personalized care for those at risk.

Index Terms: Neurodegeneration, Dopamine, Basal Ganglia, Motor Symptoms, Tremor, Bradykinesia, Rigidity, Lewy Bodies.

I. INTRODUCTION

According to a recent report by the World Health Organization, there has been a noticeable rise in the number of people affected by Parkinson’s disease, and the overall health burden is growing rapidly. Parkinson’s is now recognized as the second most common neurodegenerative disorder, and it can seriously impact a person’s ability to move and function normally. People with this condition often experience shaking, stiffness, and trouble with balance or walking. The root cause is the gradual breakdown of nerve cells in the brain.

In healthcare, classification algorithms are often used to group medical data into categories based on specific characteristics. When it comes to Parkinson’s, these tools can help identify whether someone might have the disease by analyzing various features. Parkinson’s comes with a mix of symptoms: some affect movement, while others don’t. The motor symptoms include slow movements, muscle stiffness, balance issues, and tremors. Over time, these problems can make everyday tasks like walking or talking much harder. But Parkinson’s isn’t just about movement. It also affects people in other ways. Non-motor symptoms can include anxiety, breathing difficulties, depression, loss of smell, and changes in how someone speaks. When doctors notice these symptoms, they’re often recorded for further analysis. In this particular study, we focus on analyzing speech patterns to see if they can help detect Parkinson’s early on.

Parkinson’s is considered a neurodegenerative disorder, which means it slowly damages and destroys nerve cells over time. Neurons, which are the brain’s communication cells, play a huge role in how we think, move, and feel. A healthy neuron has a central body with branches called dendrites and axons, and inside it holds a nucleus with our full genetic code. That’s right: every single neuron contains our entire DNA blueprint.

When neurons start to fail, they lose their ability to connect with other neurons. Their metabolism slows down, and they begin to accumulate waste. The cell tries to manage the damage by packing that waste into small pockets. But if things get worse, the neuron can lose all its structure: it rounds up, fills with junk, and basically stops working altogether. That’s a big part of what makes diseases like Parkinson’s so devastating.

This study focuses on predicting Parkinson’s disease, a serious neurological disorder that’s becoming increasingly common and, unfortunately, remains incurable. James Parkinson first described it as “paralysis agitans”; the disease later took on his name and became widely known as Parkinson’s Disease (PD).

II. LITERATURE SURVEY

The literature survey for this project involves a review of several research studies and techniques that have been used to detect and predict Parkinson’s disease with the help of machine learning (ML). These studies highlight different approaches, tools, and levels of effectiveness in diagnosing PD. Below is a summary of a few key works:

2.1 Graph Theoretical Analysis (2019)
Authors: Jiayue Cai and Taormin MI
Methodology: This study used graph theoretical analysis to examine brain connectivity networks dynamically. The goal was to better understand how Parkinson’s disease affects the communication between different regions of the brain.
Results: The proposed method achieved a recognition rate of 90.5%, showing promising accuracy in distinguishing PD-affected individuals from healthy subjects.

2.2 Detection Using Rating Scale (2020)
Author: Benita
Methodology: In this research, a custom rating scale was
developed to help detect Parkinson’s disease. The approach involved the use of gradient-boosted regression trees, a powerful ML technique, to evaluate patient data.
Results: Besides accurate detection, this study also explored how effective medications were in managing symptoms, offering insights into both diagnosis and treatment monitoring.

2.3 Machine Learning for Early Prediction (2022)
Authors: Mavis Henriques and Ashin Laurel
Methodology: This study focused on using logistic regression models combined with other machine learning techniques to predict Parkinson’s disease in its early stages. The approach aimed to identify patterns in patient data that could signal the onset of the condition.
Results: The model achieved an accuracy of 85%, indicating strong potential for early detection through ML-based methods.

2.4 Voice-Based Telemonitoring (2020)
Authors: E. Wang and L. Verhagen
Methodology: The researchers implemented a Kernel Support Vector Machine (SVM) to detect Parkinson’s disease using vocal features. This method was designed for telemonitoring, allowing remote assessment of patients through speech analysis.
Results: The model delivered an accuracy of 91.4%, showcasing the effectiveness of voice-based analysis in PD detection.

2.5 Statistical Brain Connectivity Analysis (2019)
Author: Mohammad Hadi Aarabi
Methodology: This research applied Diffusion Tensor Imaging (DTI) to study brain connectivity and assess neurodegeneration in Parkinson’s patients. The statistical approach provided insights into how PD alters the brain’s structure and communication pathways.
Results: The method reached a high accuracy of 93.4%, making it a reliable tool for understanding and diagnosing Parkinson’s disease through brain imaging.

III. PROJECT IMPLEMENTATION / PROPOSED METHODOLOGY

Machine learning has revolutionized the ability of computer systems to learn from data and make predictions without the need for explicit programming. In this project, three machine learning algorithms are utilized to predict Parkinson’s disease (PD). The architecture diagram below provides an overview of the system components and their interactions. It outlines the major steps involved in the process, which are:
• Architecture Overview: The architecture defines the flow of the process, starting with refining the raw data and ultimately using it to predict Parkinson’s disease. This step sets the foundation for the entire model development process.
• Data Preprocessing: The raw data collected is not always in a format suitable for machine learning models. Preprocessing transforms this data into a clean and understandable format, making it ready for analysis and model training.
• Model Training: Once the data is ready, we split it into two parts: a training dataset and a testing dataset. The training dataset is used to train the machine learning models. Three algorithms (Decision Tree, Random Forest, and AdaBoost) are applied to the data, and the classification accuracy of each model is evaluated.
• Testing the Models: After the models have been trained, they are evaluated on the testing dataset to assess how well they can predict Parkinson’s disease on new, unseen data.
• Comparison of Results: Finally, the results from the three algorithms are compared based on their classification accuracy. This helps to determine which algorithm performs the best for predicting Parkinson’s disease.

Fig. 1: Architecture diagram of the proposed system.

METHODOLOGY

The proposed approach uses audio data from the PPMI [21] and UCI datasets, which contain voice recordings of individuals with Parkinson’s Disease. These recordings include key vocal features like jitter, shimmer, and MDVP values during vowel sounds.

The data is preprocessed and analyzed to highlight important patterns. Four machine learning models (Logistic Regression, SVM, Random Forest Regressor, and K-Nearest Neighbors) are trained using 75% of the dataset. These models aim to classify whether a voice sample belongs to a Parkinson’s patient or a healthy individual. The remaining 25% of the data is used for testing and evaluation. Model performance is assessed using metrics such as accuracy, precision, sensitivity, confusion matrix [22], and ROC-AUC score.
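The jitter and shimmer features mentioned in the Methodology can be illustrated with a small sketch. It assumes one common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value, applied to glottal periods for jitter and to peak amplitudes for shimmer. The function name `relative_perturbation` and the sample values are hypothetical, and the datasets may use other variants of these measures.

```python
def relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value.

    Assumed 'local' definition: pass glottal periods to get jitter,
    or per-cycle peak amplitudes to get shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Made-up measurements from a sustained vowel (not taken from the study).
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]   # cycle lengths in seconds
amplitudes = [0.80, 0.78, 0.83, 0.79]          # peak amplitude per cycle

jitter = relative_perturbation(periods_s)
shimmer = relative_perturbation(amplitudes)
print(f"jitter = {jitter:.4f}, shimmer = {shimmer:.4f}")
```

A perfectly steady voice would give zero for both values; larger values indicate more cycle-to-cycle instability in pitch or loudness.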
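As a minimal sketch of the 75%/25% split described in the Methodology, the snippet below divides sample indices class by class so that both parts keep a similar share of Parkinson’s and healthy recordings, then rescales the features using ranges learned from the training part only. Stratification, the fixed seed, min-max scaling, and the names `stratified_split`, `fit_min_max`, and `apply_min_max` are illustrative assumptions, not details taken from the study.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices into train/test parts, class by class.

    Hypothetical helper: the 25% test share follows the text; stratifying
    and the fixed seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_min_max(rows):
    """Learn per-feature (min, max) ranges from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, ranges):
    """Rescale each feature with the learned ranges; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Toy data: 12 PD recordings (label 1) and 4 healthy ones (label 0),
# each with two made-up feature values.
labels = [1] * 12 + [0] * 4
features = [[0.01 * i, 100.0 + i] for i in range(len(labels))]

train_idx, test_idx = stratified_split(labels)
ranges = fit_min_max([features[i] for i in train_idx])
train_scaled = apply_min_max([features[i] for i in train_idx], ranges)
test_scaled = apply_min_max([features[i] for i in test_idx], ranges)

print(len(train_idx), "training samples,", len(test_idx), "test samples")
```

Fitting the ranges on the training part alone keeps information from the test recordings out of the model. When a dataset holds several recordings per speaker, splitting by speaker rather than by recording may give a stricter estimate; the text does not say which was used.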
IV. SYSTEM DESIGN

A. System Architecture

The system architecture provides a clear and organized flow of how Parkinson’s Disease prediction is carried out using machine learning. It starts from collecting relevant patient data and continues through stages of cleaning, training, testing, and evaluating the results of different models.

This approach helps ensure that the models are well-prepared to identify patterns associated with Parkinson’s Disease. By structuring the process into separate stages, we can better manage and analyze each step, leading to more accurate and reliable predictions.

[Figure: PARKINSON DISEASE PREDICTION]
Fig. 2: System Architecture Diagram

The architecture includes the following stages:
– Data Collection: Involves gathering clinical and biometric data that reflects characteristics commonly associated with Parkinson’s Disease.
– Preprocessing: Raw data is cleaned, normalized, and transformed into a format suitable for machine learning algorithms.
– Model Training: The processed data is used to train different algorithms, including Decision Tree, Random Forest, and AdaBoost.
– Model Testing: The trained models are tested on a separate portion of the data to measure their performance.
– Result Evaluation: The accuracy and other evaluation metrics are used to compare and analyze the results from each model.

B. Data Design

The dataset used in this study was sourced from Kaggle, containing biomedical voice measurements from 31 individuals, 23 of whom have Parkinson’s Disease. Each row represents one of 195 voice recordings, and each column corresponds to a specific vocal feature such as pitch, jitter, and shimmer.

The dataset’s main goal is to differentiate between healthy individuals and those with Parkinson’s Disease, with the latter labeled as ‘1’.

Before training and testing the models, the raw data underwent preprocessing, which included cleaning, handling missing values, and normalization. The result is a standardized dataset ready for machine learning applications.

V. RESULTS

The “Health Assistant” application uses several machine learning models to predict different diseases based on health parameters provided by the user. These models are pre-trained and integrated into the Streamlit framework, allowing for real-time predictions. The performance of the system is assessed using key metrics such as accuracy, precision, recall, and F1-score, to evaluate how well each disease prediction model performs.

VI. MODEL PERFORMANCE EVALUATION

The performance of the model was evaluated using real-world datasets. The table below shows the accuracy of the model used in this application:

Parkinson’s Prediction Model      Accuracy
Support Vector Machine (SVM)      87.1%

TABLE I: Model Accuracy for Parkinson’s Prediction
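To make the evaluation metrics named in this paper concrete, the sketch below derives accuracy, precision, recall (sensitivity), and F1-score from confusion-matrix counts, and computes ROC-AUC as the fraction of (PD, healthy) pairs in which the PD sample receives the higher score. The labels and scores are invented for illustration and are not results from the study; the function names and the 0.5 decision threshold are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with label 1 = Parkinson's and 0 = healthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall (sensitivity), and F1 from the counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: true labels and model scores for eight recordings.
y_true = [1, 1, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.85, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Accuracy alone can look favorable on an imbalanced dataset, which is one reason to report precision, recall, and ROC-AUC alongside it.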
VII. DISCUSSION

Using machine learning (ML) algorithms to predict Parkinson’s Disease (PD) offers a promising way to assist with early diagnosis, which is essential for managing symptoms and improving patient outcomes. In this project, we applied three main ML algorithms (Decision Tree, Random Forest, and AdaBoost) to build models that can differentiate between healthy individuals and those with PD based on speech features.

Of the three models, the Random Forest algorithm stood out with the best performance, achieving an AUC-ROC of 0.95, which shows its strong ability to distinguish between the two groups. This result is in line with Random Forest’s ensemble approach, where multiple decision trees are combined to enhance accuracy and reduce overfitting. AdaBoost also performed well, thanks to its focus on correcting misclassified instances from previous iterations, making it effective at handling complex patterns in the data.

VIII. CONCLUSION & FUTURE WORK

We have developed an effective approach to create an accurate predictive model for Parkinson’s disease using Decision Tree, Random Forest, and AdaBoost classifiers. This method successfully identifies individuals with Parkinson’s disease with an accuracy ranging from 90% to 95%. Our in-depth study shows that sustained vowels contain enough information to reliably predict Parkinson’s disease. In future research, exploring different feature selection or reduction techniques could help further enhance the classification accuracy.

Future Work:
– In the future, these models can be trained with different datasets that include more features, which could potentially improve prediction accuracy.
– If the accuracy rate increases, these models could be used in laboratories and hospitals to easily predict diseases in the early stages.
– These models could also be applied to different medical and disease datasets, expanding their utility.
– A future direction could be to extend the work by developing a hybrid model that can predict more than one disease, using an accurate dataset that includes common features from both diseases.
– Additionally, future work could focus on building a model that extracts the most important features from the dataset, potentially leading to further improvements in accuracy.

IX. REFERENCES

1) Kenneth Thomsen, Anja Liljedahl Christensen, Lars Iversen, Hans Bredsted Lomhant, and Ole Winther. Deep Learning for Diagnostic Binary Classification of Multiple-Lesion Skin Disease. (2020). DOI: 10.3389/fmed.2020.574329.
2) Shuo Yang, Ping Luo, Chen Change Loy, and Xiaoou Tang. Faceness-Net: Face Detection through Deep Facial Part Responses. (2017). DOI: 10.1109/TPAMI.2017.2738644.
3) Tri-Cong Pham, Antoine Boucet, Chi Mai Loung, Cong Thanh Tran, and Van-Dung Hoang. Improving Skin Disease Classification Based on Customized with Balanced Mini-Batch Logic and Real-Time Image Augmentation. (2020). DOI: 10.1109/ACCESS.2020.3016653.
4) Rahat Yasir, Md. Ashiqur, and Nova Ahmed. Dermatological Disease Detection using Image Processing and Artificial Neural Network. (2014).
5) R.K.M.S.K Karunanayake, W.G Malaka Dananjaya, M.S.Y Peirs, B.R.I.S Gunatileka, Shashika Lokuliyana, and Anuththara Kuruppu. CURETO: Skin Disease Detection Using Image Processing and CNN. (2020).
6) Dr. Punal.M.Arabi, Mrs. Gayathri Joshi, Rohit N Reddy, Anusha S.R, and Archa P.S. Categorizing Normal Skin, Oily Skin and Dry Skin using 4-Connectivity and 8-Connectivity Region Properties. (2017).
7) Zhi-Hao Wang, Gwo-Jiun Horng, TZ Heng Hsu, Chao-Chun Chen, and Gwo Jia Jong. A novel Facial Thermal Feature Extraction Method for Non-Contact Healthcare System. (2020). DOI: 10.1109/ACCESS.2020.2992908.
8) Leelavathy S, Jaichandran R, Shobana R, Vasudevan, Sreejith S Prasad, and Nihad. Skin Disease Detection Using Computer Vision and Machine Learning Technique. (2020).