ISSN 2581 – 3781

Available Online at www.ajcse.info


Asian Journal of Computer Science Engineering 2025;10(3):1-12

REVIEW ARTICLE

Machine Learning-Based Disease Classification Models for Parkinson’s Based on Magnetic Resonance Imaging
Pradeep Laxkar*
Department of Computer Science and Engineering, ITM SLS University, Vadodara, Gujarat, India
Received on: 15-03-2025; Revised on: 10-04-2025; Accepted on: 05-05-2025
ABSTRACT
Parkinson’s disease (PD) is a slowly progressing neurological disorder of the central nervous system that manifests as tremor, rigidity, and slowness of movement. Early diagnosis is essential, yet it usually relies on detailed physical examinations and analysis of the patient’s medical history. This study presents an early-stage PD prediction system based on biomedical voice characteristics and machine learning. A publicly accessible Kaggle dataset is used to discriminate between healthy and affected individuals with advanced classification methods. Exploratory data analysis reveals feature correlations and class imbalance, which motivate a systematic data-processing pipeline of data cleaning, outlier identification, and standardization. Feature selection then removes unimportant features, reducing dimensionality and computational complexity and improving model performance. Two models, Logistic Regression (LR) and Extreme Gradient Boosting (XGBoost), were built and evaluated using the receiver operating characteristic curve, F1-score, accuracy, precision, recall, and the confusion matrix. The experimental results show that XGBoost outperformed LR, achieving an F1-score of 98.3%, an accuracy of 97.4%, and an area under the curve of 0.9833. These results suggest that XGBoost is a useful diagnostic tool that can assist medical professionals in early PD detection.
Key words: Clinical decision support, Early diagnosis, Medical diagnosis, Neurodegenerative
disorder, Parkinson’s disease, Voice recordings

INTRODUCTION
Alzheimer’s disease (AD) is the most prevalent neurodegenerative condition, followed by Parkinson’s disease (PD). PD-specific symptoms include bradykinesia, resting tremor, hypokinetic movement disorder, muscle stiffness, and postural and gait instability.[1-3] Non-motor characteristics, including dementia, depression, and dysautonomia, have also been described. The general disturbances of the motor system seen in PD are known as parkinsonism. Notably, although parkinsonism is primarily linked to PD, other disorders, including AD-related and PD-related diseases, share the same characteristics. To intervene effectively, PD must be identified early and managed in time, because early identification enables the right treatments and interventions, which can improve patient outcomes.[4,5]
Conventional PD diagnosis relies on clinical observation and subjective assessment, which can lead to misdiagnosis and delayed treatment. Biosensors, which are easy to build, inexpensive, readily available, and simple to interpret, have the potential to become an alternative and more promising method for the early detection of PD.[6,7] However, traditional biosensors have drawbacks, such as limited sensitivity, difficulty in detecting the target molecule at low concentrations, and low anti-interference ability.

Address for correspondence:
Pradeep Laxkar
E-mail: [email protected]

© 2025, AJCSE. All Rights Reserved 1

Laxkar: Machine Learning-Based Disease Classification Models for Parkinson’s Based on Magnetic Resonance Imaging

A common diagnostic approach for detecting PD early is the examination of brain magnetic resonance imaging (MRI) data. MRI images show the brain’s subcortical structures anatomically, and these images are examined to ensure
that no aneurysms are present. This information is also thought to be helpful in the early detection of specific disease types. However, because MRI data are three-dimensional, exploring the nuances and various features of subcortical areas by eye can be challenging.[8,9] As intelligent technologies have advanced, computer-aided detection systems that exploit multidimensional healthcare data have demonstrated remarkable efficacy in disease analysis and diagnosis.
The latest developments in deep learning (DL) and machine learning (ML), two branches of artificial intelligence (AI), are helping doctors diagnose diseases early. Accordingly, recent studies have used a range of AI and ML algorithms to detect PD automatically from MRI data.[10,11] DL has been applied to many different diseases and conditions, often surpassing conventional benchmarks. DL algorithms are powerful and widely used for image classification: because they can recognize intricate patterns and characteristics in images, they typically outperform traditional ML techniques in accuracy.

Motivation and Contribution of the Study
PD is a degenerative neurological condition that affects the motor system. Early diagnosis is essential for managing it and enhancing quality of life. Conventional diagnostic methods, however, typically rely on subjective assessments and physical observations, which can take time to establish. As ML advances and biomedical voice data become more accessible, automated, non-invasive, and effective diagnostic techniques may become more feasible. The project is driven by the need to use these technologies to investigate voice biomarkers for PD and enable early identification, since symptoms often fluctuate during the disease’s early stages. The study applies modern ML algorithms, including Extreme Gradient Boosting (XGBoost) and Logistic Regression (LR), to voice-based features in order to improve diagnostic accuracy, eliminate manual analysis, and contribute to the development of reliable, data-driven healthcare solutions.
The study’s primary contributions are as follows:
· Utilized a PD dataset from Kaggle, enhancing the practical relevance and applicability of the findings.
· Implemented a robust pre-processing pipeline, including data cleaning, outlier detection, and standardization of continuous variables, to improve data quality and model performance.
· Implemented XGBoost and LR classifiers to determine the most effective model for diagnosis.
· Employed ML to diagnose PD early, supporting its timely management and treatment.
· Measured the performance of the evaluated models with standard classification metrics (precision, recall, accuracy, and F1-score) to assess robustness and reliability.

Novelty and Justification of the Study
The proposed study is novel because it takes a holistic approach to detecting PD from voice attributes, combining ensemble learning (XGBoost) with a classical statistical baseline (LR) to benchmark overall performance. Compared with previous works that used a single model or a small number of features, this study employs a wide variety of vocal biomarkers derived from biomedical voice measurements, which capture subtle patterns related to PD. Advanced pre-processing, feature selection, and cross-validation techniques are used to support robust model training and generalization. The study is justified by the urgent need for accurate, non-invasive, early-stage diagnostic tools, as present clinical assessments are prone to delays and subjectivity. By comparing and validating multiple ML models, this study offers new information on the predictive power of voice characteristics, supporting the development of scalable, real-time diagnostic applications in clinical settings.

Structure of the Paper
The rest of the paper is structured as follows: Section II examines pertinent studies on early PD diagnosis, Section III describes the methodology, Section IV presents the findings and model comparisons, and Section V offers conclusions and suggestions for further study.

AJCSE/Jul-Sep-2025/Vol 10/Issue 3 2

LITERATURE REVIEW
The material currently available on the early diagnosis of PD is reviewed in this section. The
majority of studies emphasize the use of diverse algorithms to improve the efficiency and accuracy of early PD diagnosis. The main studies are summarized below.
Jain and Srivastava noted that, for neurological disorders, the use of MRI and CT images as input data in DL models is becoming increasingly widespread. In their study, MRI images from the “Alzheimer Parkinson 3 Class Data Set” available on the Kaggle platform were used to diagnose Alzheimer’s disease and PD. The dataset includes three classes: 2,561 Alzheimer’s, 906 Parkinson’s, and 3,010 Control (Normal) images. ResNet-18, VGG-16, and ConvNext architectures were trained on the Alzheimer, Parkinson, and Normal classes, yielding accuracy rates of 96.2%, 95.4%, and 98.9%, respectively. In addition, the Alzheimer and Parkinson classes were each tested against the Normal class using binary classifiers. For the Alzheimer-Normal and Parkinson-Normal tasks, ResNet-18 achieved accuracy rates of 82.0% and 96.1%, VGG-16 95.4% and 89.4%, and ConvNext 99.4% and 99.5%, respectively.[12]
Nawal et al. proposed combining Histogram of Oriented Gradients (HOG) features with a customized convolutional neural network (CNN) for early PD diagnosis. Pre-processing methods were used to improve the consistency and quality of a medical image collection. The CNN extracts key features while HOG provides edge-orientation information, and their fusion creates a robust feature map. An integrated attention mechanism further focuses the model on crucial regions. The evaluation shows a balanced trade-off between accuracy (99%) and parameter count (0.8M). Visualization tools such as gradient-weighted class activation mapping (Grad-CAM) offer insights into model decisions, aiding interpretability. This approach offers accurate PD detection, potentially transforming diagnosis and improving patient outcomes.[13]
Mehta and Khurana aimed to determine whether deep belief networks (DBNs) are suitable for detecting PD early, since they can assess complex, high-dimensional medical information. The DBN was trained and tested on publicly available datasets, and the recorded accuracy was 92%, with a sensitivity of 90% and a specificity of 94%. The area under the receiver operating characteristic curve (ROC-AUC) was 95%, indicating that a high level of diagnostic capacity was maintained. According to these results, the DBN model outperformed other diagnostic methods, including in terms of a low false-negative rate (FNR). Traditional techniques, in which the diagnosis depends on a doctor’s assessment and imaging techniques, are usually less accurate and take more time to detect diseases early.[14]
Vats and Mehta suggested deploying a DBN, considered a highly advanced ML algorithm with a hierarchical, memory-like structure capable of DL. Their study applied the DBN model to a diverse dataset of 500 subjects suspected of having early-stage PD. The dataset contains medical records, speech analysis, audio recordings of subjects, and biometric monitoring data. The model was trained using a two-phase approach, the first phase being unsupervised pre-training to learn general characteristics. The DBN model achieved an accuracy of 93%, a sensitivity of 90%, a specificity of 93%, and an AUC of 0.97, all strongly positive outcomes. These measurements exceed those of standard diagnostic techniques, which reached an accuracy of 85%, a sensitivity of 80%, a specificity of 85%, and an AUC of 0.85.[15]
Tesfai focused on developing a speech- and audio-based ML pipeline for PD diagnosis. Two voice-recording datasets were assembled using data augmentation techniques. Acoustic features paired with traditional ML models yield 99.21% accuracy, while Log-Mel spectrograms with CNNs achieve 99.71%. The highest accuracy, 99.82%, is attained by an ensemble model that combines the spectrogram and acoustic models. These outcomes provide compelling evidence for the effectiveness of multimodal ensemble models in PD diagnosis, offering promising prospects for non-invasive early detection.[16]
Lyu and Guo proposed Brain Graph Convolutional Networks, a unified framework that integrates brain functional connectivity, constructed with a non-Euclidean heuristic, into a graph-based DL model (graph convolutional network, GCN) for diagnosing PD. To preserve the spatial
dependency between the electroencephalogram (EEG) channels and to simplify the construction of functional connectivity, the EEG data are represented in graph form. The GCN simulates the flow of brain information between nodes using convolutions along the functional connectivity. The functional connectivity was obtained by solving a minimum spanning tree problem with a heuristic search technique. The resulting functional connectivity, in terms of the affected areas and hub shifts, was in line with previous MRI investigations. The effectiveness of the framework was assessed by contrasting the heuristic functional connectivity with random/uniform connectivity produced by k-NN. The proposed system attained both learning robustness and high accuracy (95.59%).[17]
Chang et al. noted that bradykinesia, rest tremor, and rigidity are the three primary motor symptoms of PD, which is the most common neurodegenerative movement disorder. The accuracy of a novel algorithm designed to recognize each motor evaluation on the Unified PD Rating Scale has been confirmed using a high-speed camera system. The system’s detection parameters fall into three categories: angle, time-frequency, and trajectory parameters. With the inertial measurement unit (IMU), the average detection accuracy is 87%, 90%, and 95%, respectively. Trial tests revealed some differences in movement characteristics between the 17 patients and the 20-year-old young controls. The typical control’s rotation speed in the pronation-supination task (item 3.6) can be double that of the patients, and the typical control’s amplitude deviation is 5°, whereas the patients’ can exceed 45°.[18]
A comparative analysis of the background studies, based on methodology, dataset/environment, problem addressed, performance, and future work/limitations, is provided in Table 1.

Table 1: Review of literature on early diagnosis of Parkinson’s disease

Authors | Methodology | Environment | Problem addressed | Performance | Future work/Limitation
Jain and Srivastava (2025) | MRI image deep learning with ResNet‑18, VGG‑16, and ConvNext | “Alzheimer Parkinson 3 Class Data Set” (Kaggle) | MRI imaging for Alzheimer’s and Parkinson’s disease diagnosis | Multi‑class: 96.2% (ResNet‑18), 95.4% (VGG‑16), 98.9% (ConvNext); Binary: up to 99.5% (ConvNext) | Focused on classification; could explore lightweight models for real‑time or mobile deployment
Nawal, Habib, and Barua (2025) | HOG + custom CNN with attention mechanism; Grad‑CAM visualization | Curated medical image dataset | Early Parkinson’s detection through hybrid feature learning | Accuracy: 99%, Parameters: 0.8M | Limited details on dataset diversity and generalizability; clinical validation needed
Mehta and Khurana (2024) | Deep Belief Network (DBN) on public PD datasets | Public datasets | High‑dimensional medical data analysis for early PD detection | Accuracy: 92%, Sensitivity: 90%, Specificity: 94%, ROC‑AUC: 95% | Lacks multimodal data usage; focused only on DBN architecture
Vats and Mehta (2024) | DBN with unsupervised pre‑training on multimodal data (voice, biometric, medical) | Diverse dataset with 500 PD subjects | Early‑stage PD detection with various physiological and biometric indicators | Accuracy: 93%, Sensitivity: 90%, Specificity: 93%, AUC: 97% | AUC reported inconsistently; real‑world deployment readiness not assessed
Tesfai (2023) | Traditional ML with acoustic features and CNNs with Log‑Mel spectrograms; ensemble model | Speech and audio datasets + data augmentation | PD diagnosis through non‑invasive speech signals | ML: 99.21%, CNN: 99.71%, Ensemble: 99.82% | Real‑time application and language/accent variation unaddressed
Lyu and Guo (2023) | Brain Graph Convolutional Networks (BGCN) using EEG functional connectivity through Minimum Spanning Tree heuristic | EEG data + graph‑based deep learning | EEG‑based PD diagnosis preserving spatial interdependence | Precision: 95.59%, robust learning performance | Heuristic connectivity may vary across individuals; needs clinical validation and real‑time efficiency review
Chang et al. (2022) | Wearable IMU system with Unified PD Rating Scale motor exam analysis (trajectory, time‑frequency, angle) | IMU + CMOS chip + high‑speed camera validation | Objective quantification of PD motor symptoms | Accuracy: 87%‑95% depending on metric; Power: 0.3713 mW; Area: 4.2 mm × 4.2 mm | Small subject pool (17 patients); generalization and long‑term use unassessed


METHODOLOGY
PD is a complex, progressive neurological disease that causes tremor, rigidity, and bradykinesia; as the illness progresses, some people may also develop postural instability. This section describes how ML is used to make an early diagnosis of PD. The procedure starts by gathering the PD dataset from Kaggle. The second step is extensive data preparation (data cleaning, identification of outliers, and normalization of continuous variables), followed by feature selection to keep the attributes most pertinent to classification. The cleaned dataset is then split into training and testing sets. Two ML classifiers, LR and XGBoost (XGB), are used to build predictive models, and their performance is assessed using metrics such as F1-score, recall, accuracy, and precision. Finally, the models’ ability to diagnose PD is determined from these evaluation results. The complete workflow is shown in Figure 1, and each step of the flowchart is explained below.
Figure 1: Flowchart of early diagnosis of Parkinson’s disease

Data Collection
This study used the PD dataset acquired from Kaggle. The collection covers 31 people, 23 of whom have PD, and includes a variety of biomedical voice measurements. The “name” column serves as the index; each column corresponds to a voice measure, and each row to one of the 195 voice recordings of these people. The “status” column, which is the primary target of the data, is set to 0 for healthy subjects and 1 for those with PD. Some exploratory data analysis graphs are given below.
Figure 2 visualizes the pairwise relationships between features in the dataset used to identify Parkinson’s illness. Each cell of the heatmap shows the Pearson correlation coefficient between two attributes, whose extremes are perfect negative correlation (-1) and perfect positive correlation (+1). Lighter blue hues and values close to 0 signify weak or non-existent linear associations, whereas darker blue hues indicate stronger positive correlations. The status variable, representing the disease state, shows moderate correlation with certain acoustic features, indicating their predictive relevance. The heatmap also helps identify multicollinearity, guiding dimensionality reduction and feature selection during model development.
Figure 3 displays the distribution of recordings by health status, categorized as Healthy and Parkinson’s. The x-axis shows the health status and the y-axis the number of recordings. The graph clearly illustrates the dataset’s imbalance: there are approximately 50 recordings from healthy individuals (blue bar) and around 145 from individuals with Parkinson’s (red bar), so the dataset is skewed toward the Parkinson’s class.
Figure 4 displays a grid of 23 histograms, each representing the distribution of a different feature. The histograms, drawn in blue on a white background, share a similar y-axis scale (frequency or count), while the x-axis scales vary by feature. Many histograms show a high frequency of values concentrated at the lower end and a long tail extending toward higher values. This indicates that most features are not normally distributed but positively skewed, with more instances of lower values and fewer instances of higher values.
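The exploratory checks above (class balance, Pearson correlation between a feature and the status label, and right skew of the feature histograms) can be reproduced with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the feature and label values are hypothetical toy numbers, not the Kaggle data, and the skewness is the population moment coefficient (pandas’ `Series.skew()` applies a small-sample correction, so its values differ slightly). With pandas, `DataFrame.corr()` and `Series.value_counts()` give the heatmap matrix and the class counts directly.

```python
import math
from collections import Counter
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def skewness(xs):
    """Population moment skewness m3 / m2**1.5; positive means a long right tail."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


# Hypothetical toy values (NOT the Kaggle dataset): one acoustic feature and
# the binary "status" label (0 = healthy, 1 = PD), deliberately imbalanced.
feature = [0.10, 0.12, 0.35, 0.40, 0.42, 0.55, 0.60, 1.20]
status = [0, 0, 1, 1, 1, 1, 1, 1]

print("class counts:", Counter(status))
print("r(feature, status):", round(pearson_r(feature, status), 3))
print("skewness(feature):", round(skewness(feature), 3))
```

On the real data, the same loop over every feature column would yield one row of the Figure 2 heatmap (correlation with `status`) and a skewness value per Figure 4 histogram.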

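The preparation stages listed in the methodology overview (cleaning, standardization as in Equation (1), feature selection as in Equation (2), and a 70/30 train/test split) can be sketched as plain functions. This is an illustrative sketch, not the author’s code: the helper names are hypothetical, missing values are assumed to be encoded as `None`, the standard deviation is the population one, the evaluation function passed to `select_features` is user-supplied (in practice it might be a validation error), and the split is stratified by class, a reasonable choice for an imbalanced label that the study does not state.

```python
import itertools
import random
from statistics import fmean, pstdev


def drop_uninformative(columns):
    """Data cleaning: keep only columns with no missing values (None) and
    more than one distinct value."""
    return {name: vals for name, vals in columns.items()
            if None not in vals and len(set(vals)) > 1}


def standardize(values):
    """Equation (1): (x - mean) / standard deviation, for one feature column."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def select_features(features, n_prime, eval_fn):
    """Equation (2): exhaustive arg min of eval_fn over all subsets M of the
    feature set with |M| = n_prime (feasible only for small feature counts)."""
    return min(itertools.combinations(features, n_prime), key=eval_fn)


def stratified_split(labels, test_frac=0.3, seed=0):
    """Split sample indices into (train, test), drawing test_frac of every class
    so the healthy/PD ratio is roughly preserved in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_frac)
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


# Usage with hypothetical toy inputs.
cols = {"feat_a": [1.0, 2.0, 3.0, 4.0], "const": [5, 5, 5, 5], "gap": [1, None, 2, 3]}
clean = drop_uninformative(cols)              # keeps only "feat_a"
z = standardize(clean["feat_a"])              # mean 0, population std 1
cost = {"a": 3.0, "b": 1.0, "c": 2.0}         # hypothetical per-feature cost
best = select_features(["a", "b", "c"], 2, lambda m: sum(cost[f] for f in m))
train_idx, test_idx = stratified_split([0] * 10 + [1] * 30)
```

Exhaustive search over all subsets of size n′ grows combinatorially, so with many features a greedy or model-based selector is usually substituted. In scikit-learn, `StandardScaler` and `train_test_split(..., test_size=0.3, stratify=y)` cover the standardization and split steps; fitting the scaler on the training split only avoids leaking test-set statistics into training.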

Figure 2: Correlation between features
Figure 3: Plot between healthy and Parkinson’s from the dataset
Figure 4: Analyzing the data attributes

Data Pre-processing
This part applied a variety of pre-processing techniques to improve data quality while keeping the original characteristics for further examination. The pre-processing involves data cleaning, outlier detection, standardization of continuous variables, and feature selection, which are discussed below.

Data cleaning
Single-value and missing-value columns were eliminated before pre-processing and analysis.[19] Effective data cleaning ensures that the information is trustworthy, consistent, and suitable for ML or analysis, which yields more dependable and meaningful findings.

Detecting outliers
When the mode, median, and mean all lie at the same location, the data are symmetrical.[20] A longer or fatter tail to the right indicates positive skewness, meaning that the mode is lower than the median and the mean.

Standardization of continuous variables
Because the dataset derived from the earlier phases included continuous variables, standardization was used to bring all of the data into a uniform format.[21] The dataset was standardized using Equation (1), in which the mean of each feature is subtracted from its value and the result is divided by the feature’s standard deviation:

$$\text{Stand} = \frac{x - \text{mean}}{\text{standard deviation}} \qquad (1)$$

Feature selection
Feature selection is a crucial step before applying classification algorithms: it lowers the algorithms’ complexity and computation time while also improving overall classification performance.[22] Given an evaluation function Eval and a feature set Q = {q1, q2, …, qn} of size n, where n is the total number of features, the aim of feature selection is to find the optimal subset Q′ ⊂ Q of size n′ (n′ < n) that satisfies Equation (2):

$$Q' = \operatorname*{arg\,min}_{M \subset Q,\ |M| = n'} \mathrm{Eval}(M) \qquad (2)$$

Here, n′ is a user-defined number or is dictated by the selection criteria.

Data Splitting
Data splitting, which separates the dataset into training and testing subsets, typically 70% for training and 30% for testing, is a crucial step in ML.

Proposed ML Classifiers
The ML models are described in this section.

XGBoost classifier
XGBoost is a classifier that uses the gradient boosting (GB) technique with decision trees (DTs) as base learners. Its speed, effectiveness, and scalability have led to its wide usage.[23] A general explanation of GB and XGBoost follows. Let D = {(x_i, y_i)} be a dataset with n observations, where x is the feature vector (independent variables) and y is the dependent variable.[24] The final prediction for a specific sample x_i is obtained by adding up the leaf scores of all trees, as shown in Equation (3):

$$\hat{y}_i = \sum_{b=1}^{B} f_b(x_i) \qquad (3)$$

Here, f_b denotes the b-th tree, whose structure q maps a sample to a leaf, and leaf j has weight score w_j. GB uses B such additive functions to predict the outcome; the partial sum up to b gives the prediction for the i-th sample at the b-th boost.
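Equation (3) can be made concrete with a toy ensemble. In this sketch the three depth-1 trees, their split thresholds and their leaf weights are hypothetical values chosen for illustration: each tree routes a sample to one leaf (the role of q) and returns that leaf’s weight w_j, and the prediction is the sum over trees. For binary labels such as the status column, the summed margin is commonly passed through the logistic function to obtain a probability, which is what XGBoost’s `binary:logistic` objective does; the real library also learns the trees by gradient boosting and starts from a global bias (`base_score`), both omitted here.

```python
import math


def make_stump(feature, threshold, w_left, w_right):
    """A hypothetical depth-1 regression tree f_b: q(x) picks a leaf, which
    returns its weight score w_j."""
    def f_b(x):
        return w_left if x[feature] < threshold else w_right
    return f_b


# B = 3 made-up trees; feature indices, thresholds and leaf weights are illustrative only.
trees = [
    make_stump(0, 0.5, -0.4, 0.6),
    make_stump(1, 0.2, -0.2, 0.3),
    make_stump(0, 0.8, 0.1, 0.5),
]


def predict_margin(x, trees):
    """Equation (3): y_hat_i = sum over b = 1..B of f_b(x_i)."""
    return sum(f_b(x) for f_b in trees)


def predict_proba(x, trees):
    """Map the summed margin to P(PD) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-predict_margin(x, trees)))


x_i = [0.9, 0.1]
print(round(predict_margin(x_i, trees), 2))   # leaf scores 0.6, -0.2 and 0.5 summed
print(round(predict_proba(x_i, trees), 3))
```

Truncating the list, for example `predict_margin(x_i, trees[:b])`, gives the prediction after the b-th boost, which is how each boosting round refines the previous sum.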


Logistic Regression (LR)
LR was employed in most early 20th-century biological research and applications. It is one of the most widely used ML techniques for categorical target variables and has recently gained popularity for binary classification problems.[25,26] Its output is a probability between 0 and 1 that is mapped to a discrete binary outcome. Using the underlying logistic function, LR estimates probabilities (p) to model the relationship between the feature variables and the outcome.[27] In the initial phase of the analysis, LR, a widely used method for predictive analytics and classification tasks, was applied. The logistic (sigmoid) function in Equation (4) maps a linear score z to a probability p:

$$p = \frac{1}{1 + \exp(-z)} \qquad (4)$$

Its inverse, the logit, transforms the odds of success (the probability of success divided by the probability of failure) to a linear scale, which facilitates binary classification by modeling the log-odds of the outcome as a linear function of the predictors, as given in Equation (5):

$$\ln\left(\frac{p}{1 - p}\right) = z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k \qquad (5)$$

where X1, …, Xk are the predictor variables, p is the probability of the event, and β0, β1, …, βk are coefficients that determine each predictor variable’s relative contribution.

Performance Metrics
The proposed models’ performance was evaluated using four commonly used metrics: recall, accuracy, precision, and F1-score. The models’ predictive ability was assessed by comparing their predictions with the test dataset’s actual class labels in a confusion matrix. This matrix summarizes correct and incorrect classifications in a simple way, giving a clear picture of how well a model works, and it serves as the basis for calculating key indicators of classification effectiveness. The confusion matrix’s essential elements are:
· True Positives (TP): the number of PD patients whom the model correctly predicts to have the condition.
· False Positives (FP): the number of instances in which the model diagnoses PD in a patient who does not have it.
· True Negatives (TN): the number of instances in which the model correctly predicts that a patient is healthy and does not have PD.
· False Negatives (FN): the number of instances in which the model predicts a patient to be healthy while in fact they have PD.

Accuracy
Accuracy evaluates the model’s overall diagnostic correctness for both PD patients and those without the condition. It is calculated using Equation (6):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

Precision
Precision evaluates the proportion of PD predictions that are correct, that is, how reliably a positive prediction corresponds to a true PD case. It is calculated using Equation (7):

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

Recall
Recall is the percentage of actual PD cases that the model correctly detects. A high recall signifies that the model detects the vast majority of PD cases, which is crucial for early diagnosis to avoid missed cases. Recall is given by Equation (8):

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (8)$$

F1-score
The F1-score is a single performance metric that balances detecting true Parkinson’s cases (recall) against avoiding false positives (precision). It is especially valuable when classes are unequally distributed. The F1-score is given by Equation (9):

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (9)$$

ROC-area under the curve (AUC)
The ROC curve measures a classifier’s performance across decision thresholds. The x-axis displays the false-positive rate (FPR), while the y-axis displays the true-positive rate (TPR). The AUC of the ROC curve is a separability statistic that shows how well a model can differentiate
between classes. The model predicts classes more correctly when the AUC is larger.

RESULTS AND DISCUSSION
The system used for this study is equipped with a 6th Generation Intel Core i5 processor and 12 GB of RAM for smooth multitasking and efficient data handling. It also has a dedicated 4 GB GPU to speed up computations, especially ML workloads. Table 2 compares the ML models for PD prediction on key performance measures: F1-score, recall, accuracy, and precision. The XGBoost model’s accuracy of 97.4%, precision of 99.9%, recall of 96.6%, and F1-score of 98.3% show how effectively it distinguishes between positive and negative cases. The LR model performs somewhat lower, with an accuracy of 92.3%, precision of 93.5%, recall of 96.6%, and F1-score of 95.0%. These findings underscore the efficiency and stability of XGBoost in predicting the presence of PD in its initial stages: it matches LR on recall and outperforms it on every other measured metric.

Table 2: Evaluation of machine learning models on early diagnosis of Parkinson’s disease
Model | Accuracy (%) | Precision (%) | Recall (%) | F1‑score (%)
XGBoost | 97.4 | 99.9 | 96.6 | 98.3
LR | 92.3 | 93.5 | 96.6 | 95.0
LR: Logistic regression, XGBoost: Extreme gradient boosting

The ROC curve for the XGBoost model is displayed in Figure 5, with the FPR on the x-axis and the TPR on the y-axis. The blue solid line shows the XGBoost model’s performance, whereas the black dashed line indicates a random classifier (AUC = 0.5). As the legend shows, the XGBoost curve stays well above the random-classifier line and attains an AUC of 0.9833. Based on this high AUC, the XGBoost model appears to have outstanding discriminating power, successfully differentiating between positive and negative classes.
Figure 5: Receiver operating characteristic graph of the Extreme gradient boosting model

The XGBoost model’s confusion matrix, shown in Figure 6, indicates strong classification performance. All 9 healthy individuals were correctly identified (TN), with no FPs. Among Parkinson’s cases, 29 were correctly classified (TP) and only 1 was misclassified (FN). The darker blue shades emphasize the high number of accurate predictions.
Figure 6: Confusion matrix of the Extreme gradient boosting model

Figure 7: Receiver operating characteristic graph of the logistic regression model
Figure 7 shows the ROC curve of the LR model, which indicates how effectively a binary classifier discriminates between classes as its decision threshold is varied. The TPR (sensitivity) is shown on the y-axis, while the FPR (1 − specificity) is shown on the x-axis. The blue solid line shows the model’s ROC curve, while a random classifier is shown by the
AJCSE/Jul-Sep-2025/Vol 10/Issue 3 9
Laxkar: Machine Learning-Based Disease Classification Models for Parkinson’s Based on Magnetic Resonance Imaging

black dashed line. The AUC of this curve is 0.8722.

In Figure 8, the classification performance of the LR model is shown as a confusion matrix. The matrix, labeled "Confusion Matrix - LR," has the "Actual" classes (Healthy and Parkinsons) on the y-axis and the "Predicted" classes (Healthy and Parkinsons) on the x-axis. According to the matrix, 7 individuals who were actually healthy were correctly predicted as "Healthy" (TN), and 2 healthy individuals were erroneously classified as "Parkinson's" (FP). Among the individuals who actually had Parkinson's, 1 was falsely classified as healthy (FN) and 29 were correctly classified as "Parkinson's" (TP).

Figure 8: Confusion matrix of the logistic regression model

Comparative Analysis

In this section, the proposed XGBoost and LR models are compared with existing ML techniques, namely Random Forest, Bagging, Decision Tree (DT), and Support Vector Machine (SVM). Table 3 shows that the XGBoost model has the highest accuracy, 97.4%, indicating strong predictive ability. LR also performs well, with an accuracy of 92.3%, followed by Random Forest[28] with an accuracy of 91.01%. Bagging[29] achieves a moderately high accuracy of 88.2%, whereas SVM[30] and DT[31] have lower accuracies of 76.32% and 60.7%, respectively. These results highlight the high precision and diagnostic efficiency of the proposed XGBoost model in comparison with both standard and ensemble-based ML methods.

Table 3: Comparative analysis of ML models on early diagnosis of Parkinson's disease (all values in %)
Model                    Accuracy   Precision   Recall   F1-score
Bagging                  88.2       67.6        94.8     -
Support vector machine   76.32      86.0        81.0     84.0
Decision tree            60.7       58.4        60.7     59.5
Random forest            91.01      89.25       93.26    91.21
XGBoost                  97.4       99.9        96.6     98.3
LR                       92.3       93.5        96.6     95.0
LR: Logistic regression, XGBoost: Extreme gradient boosting, ML: Machine learning, -: not reported

The proposed XGBoost and LR models are well suited for early PD detection because they are reliable, generalizable, and able to handle intricate data patterns. XGBoost, a successful ensemble learning technique, is a reliable algorithm because it captures non-linear feature relationships and interactions, which makes it well suited to biomedical classification tasks. Its built-in regularization and optimization help increase model stability and minimize overfitting. LR, on the other hand, is valued for its simplicity, interpretability, and effectiveness on linearly separable data, which is crucial in medical diagnosis, where transparency and explainability are vital. Taken together, these models outperform the traditional ML techniques in several ways: they produce more accurate and reliable predictions, offer better classification, and can distinguish healthy individuals from those with PD, enabling more effective early intervention and treatment planning.

CONCLUSION AND FUTURE SCOPE

PD is a neurodegenerative disorder that can cause both motor and non-motor symptoms. Non-motor symptoms include sleep difficulties, depression, and cognitive irregularities, whereas motor symptoms, including bradykinesia, tremors, and stiffness, have been linked to striatal dopamine deficit. There are currently no reliable tests to identify PD; however, identifying illnesses that share characteristics with parkinsonian syndromes is a crucial first step in the diagnostic process. Finally, the suggested strategy, which combines NLP with ML methods such as XGBoost and LR, may offer a novel and effective approach to early PD detection. In terms of precision, accuracy, recall, and F1-score, the suggested approach shows encouraging results for practical clinical use. This automated and scalable method can facilitate early intervention and better patient outcomes.
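All four metrics reported in Tables 2 and 3 (accuracy, precision, recall, and F1-score) follow directly from confusion-matrix counts such as those given for Figure 6 (XGBoost) and Figure 8 (LR). As a minimal sketch, not the author's implementation, the Python snippet below recomputes them from those counts, assuming Parkinson's is the positive class, as the TP/FN labels in the text imply. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: Parkinson's is the positive class)."""
    tp: int  # Parkinson's cases predicted as Parkinson's
    tn: int  # healthy individuals predicted as healthy
    fp: int  # healthy individuals predicted as Parkinson's
    fn: int  # Parkinson's cases predicted as healthy


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall and F1-score as fractions in [0, 1].

    A zero denominator yields 0.0 (a convention chosen for this sketch).
    """
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrices described for Figure 6 and Figure 8.
models = {
    "XGBoost": ConfusionCounts(tp=29, tn=9, fp=0, fn=1),
    "LR": ConfusionCounts(tp=29, tn=7, fp=2, fn=1),
}

if __name__ == "__main__":
    for name, counts in models.items():
        metrics = classification_metrics(counts)
        # Express each metric as a percentage rounded to one decimal place.
        print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

Running it prints 97.4% accuracy, 100.0% precision, 96.7% recall, and 98.3% F1-score for XGBoost, and 92.3%, 93.5%, 96.7%, and 95.1% for LR. Table 2 lists 96.6% recall for both models and a 95.0% F1-score for LR, which is consistent with truncating rather than rounding 29/30 ≈ 96.67% and 58/61 ≈ 95.08%; the XGBoost precision implied by the counts is exactly 29/29 = 100%, whereas the table reports 99.9%.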


As future work, the model can be extended by incorporating DL techniques, larger and more diverse datasets, and multilingual clinical records. In addition, expanding the pipeline to detect other neurological disorders or integrating it into a real-time diagnostic support tool could further enhance its utility and impact in the medical field.

REFERENCES

1. Paccosi E, Proietti-De-Santis L. Parkinson's disease: From genetics and epigenetics to treatment, a miRNA-based strategy. Int J Mol Sci 2023;24:9547.
2. Jalaja PP, Kommineni D, Mishra A, Tumati R, Joseph CA, Rupavath RV. Predictors of mortality in acute myocardial infarction: Insights from the healthcare cost and utilization project (HCUP) nationwide readmission database. Cureus 2025;17:e83675.
3. Patel N. Quantum cryptography in healthcare information systems: Enhancing security in medical data storage and communication. J Emerg Technol Innov Res 2022;9:193-202.
4. Jiang P, Gao N, Chang G, Wu Y. Biosensors for early detection of Parkinson's disease: Principles, applications, and future prospects. Biosensors (Basel) 2025;15:280.
5. Pahune S, Rewatkar N. Large language models and generative AI's expanding role in healthcare. Int J Res Appl Sci Eng Technol 2024;11:2288-302.
6. Mobed A, Razavi S, Ahmadalipour A, Shakouri SK, Koohkan G. Biosensors in Parkinson's disease. Clin Chim Acta 2021;518:51-8.
7. Singamsetty S. Neurofusion advancing Alzheimer's diagnosis with deep learning and multimodal feature integration. Int J Educ Appl Sci Res 2021;8:23-32.
8. Chakraborty S, Aich S, Kim HC. Detection of Parkinson's disease from 3T T1 weighted MRI scans using 3D convolutional neural network. Diagnostics (Basel) 2020;10:402.
9. Mostafiz MA. Machine learning for early cancer detection and classification: AI-based medical imaging analysis in healthcare. Int J Curr Eng Technol 2025;15:251-60.
10. Majhi B, Kashyap A, Mohanty SS, Dash S, Mallik S, Li A, et al. An improved method for diagnosis of Parkinson's disease using deep learning models enhanced with metaheuristic algorithm. BMC Med Imaging 2024;24:156.
11. Neeli S. Heart disease prediction for a cloud-based smart healthcare monitoring system using GANs and ant colony optimization. Int J Med Pub Health 2024;14:1219-29.
12. Jain S, Srivastava R. Neurodegenerative disease Alzheimer's and Parkinson's classification with deep learning. In: 2025 3rd International Conference on Advancement in Computation and Computer Technologies (InCACCT). IEEE; 2025. p. 798-803.
13. Nawal R, Habib N, Barua S. A deep learning based non-invasive framework for neurodegenerative Parkinson's disease diagnosis from template-less handwritten shapes. In: 2025 4th International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). Bangladesh: IEEE; 2025. p. 65-70.
14. Mehta S, Khurana S. From data to diagnosis: Utilizing deep belief networks for early Parkinson's disease detection. In: 2024 International Conference on Artificial Intelligence and Emerging Technology (Global AI Summit). IEEE; 2024. p. 347-51.
15. Vats S, Mehta S. The impact of deep belief networks on the early diagnosis of Parkinson's disease. In: 2024 5th International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE; 2024. p. 980-4.
16. Tesfai S. Multimodal ensemble models for Parkinson's disease diagnosis using log-mel spectrograms and acoustic features. In: 2023 IEEE MIT Undergraduate Research Technology Conference (URTC). IEEE; 2023. p. 1-5.
17. Lyu T, Guo H. BGCN: An EEG-based graphical classification method for Parkinson's disease diagnosis with heuristic functional connectivity speculation. In: 2023 11th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE; 2023. p. 1-4.
18. Chang WH, Du Liou K, Liu YT, Wen KA. Precise motor function monitor for Parkinson disease using low power and wearable IMU body area network. In: 2022 14th Biomedical Engineering International Conference (BMEiCON). IEEE; 2022. p. 1-5.
19. Sagili S, Goswami C, Bharathi VC, Ananthi S, Rani K, Sathya R. Identification of diabetic retinopathy by transfer learning based retinal images. In: 2024 9th International Conference on Communication and Electronics Systems (ICCES). IEEE; 2024. p. 1149-54.
20. Wang J, Zhou Z, Li Z, Du S. A novel fault detection scheme based on mutual k-nearest neighbor method: Application on the industrial processes with outliers. Processes 2022;10:497.
21. Chang KH, Wang CH, Hsu BG, Tsai JP. Serum osteopontin level is positively associated with aortic stiffness in patients with peritoneal dialysis. Life (Basel) 2022;12:397.
22. Khoury N, Attal F, Amirat Y, Oukhellou L, Mohammed S. Data-driven based approach to aid Parkinson's disease diagnosis. Sensors (Basel) 2019;19:242.
23. Mahajan RP. Transfer learning for MRI image reconstruction: Enhancing model performance with pretrained networks. Int J Sci Res Arch 2025;15:298-309.
24. Abdurrahman G, Sintawati M. Implementation of XGBoost for classification of Parkinson's disease. J Phys Conf Ser 2020;1538:1-8.
25. Gomathy CK. The Parkinson's disease detection using machine learning. Int Res J Eng Technol 2021;8:440-4.
26. Dattangire R, Biradar D, Vaidya R, Joon A. A comprehensive analysis of cholera disease prediction using machine learning. In: Congress on Intelligent Systems. Berlin, Germany: Springer; 2025. p. 555-68.
27. Shah SB. Artificial intelligence (AI) for brain tumor detection: Automating MRI image analysis for enhanced accuracy. Int J Curr Eng Technol 2024;14:320-7.
28. Yang X, Ye Q, Cai G, Wang Y, Cai G. PD-ResNet for classification of Parkinson's disease from gait. IEEE J Transl Eng Health Med 2022;10:1-11.
29. Barukab O, Ahmad A, Khan T, Kunhumuhammed MR. Analysis of Parkinson's disease using an imbalanced-speech dataset by employing decision tree ensemble methods. Diagnostics (Basel) 2022;12:300.
30. Hossain MA, Amenta F. Machine learning-based classification of Parkinson's disease patients using speech biomarkers. J Park Dis 2024;14:95-109.
31. Ali H, Hashmi E, Yildirim SY, Shaikh S. Analyzing Amazon products sentiment: A comparative study of machine and deep learning, and transformer-based techniques. Electronics 2024;13:1305.
