Machine Learning-Based Disease Classification Models For Parkinson's Based On Magnetic Resonance Imaging
REVIEW ARTICLE
that no aneurysms are present. This information is also thought to be helpful in the early detection of specific disease types. However, because an MRI scan is three-dimensional, exploring the nuances and various features of subcortical areas by eye can be challenging.[8,9] As intelligent technologies have advanced, computer-aided detection systems that use multidimensional healthcare data have demonstrated remarkable efficacy in disease analysis and diagnosis.
The latest developments in deep learning (DL) and machine learning (ML), two branches of artificial intelligence (AI), are helping doctors diagnose diseases early. As a result, recent studies have used a range of AI and ML algorithms to automatically detect PD from MRI data.[10,11] DL has been used to detect many different diseases and conditions, and the results often surpass conventional benchmarks. DL algorithms are powerful and widely used for image classification tasks. Because they can recognize intricate patterns and characteristics in images, they often outperform older ML techniques in terms of accuracy.
Motivation and Contribution of the Study
PD is a degenerative neurological condition that affects the motor system. Early diagnosis is essential for managing it and enhancing quality of life. Conventional diagnostic methods, however, typically rely on subjective assessments and physical observations, which can be time-consuming. Automated, non-invasive, and effective diagnostic techniques are becoming more feasible as ML advances and biomedical voice data becomes more accessible. The project is driven by the need to use these technologies to investigate voice biomarkers for PD to allow early identification, which often fluctuates throughout the disease’s early stages. The study applies modern ML algorithms, including Extreme Gradient Boosting (XGBoost) and Logistic Regression (LR), to voice-based features to improve diagnostic accuracy, reduce reliance on manual analysis, and contribute to the development of reliable, data-driven healthcare solutions.
The study’s primary contributions are as follows:
· Utilized a PD dataset from Kaggle, enhancing the practical relevance and applicability of the findings
· Implemented a robust pre-processing pipeline, including data cleaning, outlier detection, and standardization of continuous variables, to improve data quality and model performance
· Implemented XGBoost and LR classifiers to determine the most effective model for diagnosis
· Employed ML to diagnose PD so that the condition can be managed and treated early
· Measured the performance of the evaluated models with standard classification metrics (accuracy, precision, recall, and F1-score) to guarantee robustness and reliability.
Novelty and Justification of the Study
The proposed study is novel because it takes a holistic approach to detecting PD from voice attributes, using ensemble learning (XGBoost) alongside a classical statistical model (LR) as a benchmark to measure overall performance. Compared with previous works, which often rely on a single model or a small number of features, this study employs a wide variety of vocal biomarkers that capture subtle patterns related to PD in biomedical voice measurements. The use of advanced pre-processing, feature selection, and cross-validation techniques supports robust model training and generalization. The justification for this study stems from the urgent need for accurate, non-invasive, and early-stage diagnostic tools, as present clinical assessments are prone to delays and subjectivity. By comparing and validating multiple ML models, this study offers important new information on the predictive power of voice characteristics, supporting the development of scalable, real-time diagnostic applications in clinical settings.
Structure of the Paper
The paper is structured as follows: Section II examines pertinent studies on early PD diagnosis, Section III describes the methodology, Section IV presents the findings and model comparisons, and Section V offers conclusions and suggestions for further study.
LITERATURE REVIEW
The material currently available on the early diagnosis of PD is reviewed in this section. The
AJCSE/Jul-Sep-2025/Vol 10/Issue 3 2
Laxkar: Machine Learning-Based Disease Classification Models for Parkinson’s Based on Magnetic Resonance Imaging
majority of studies emphasize the use of diverse algorithms to improve the early diagnosis of PD. Common themes emerging from the reviewed literature include:
Jain and Srivastava observed that the use of MRI and CT images as input data for DL models of neurological disorders is becoming increasingly widespread. In their study, MRI images from the “Alzheimer Parkinson 3 Class Data Set” available on the Kaggle platform were used for the diagnosis of Alzheimer’s disease and PD. The dataset includes three classes: 2,561 Alzheimer’s, 906 Parkinson’s, and 3,010 Control (Normal) images. The Alzheimer, Parkinson, and Normal classes were trained using the ResNet-18, VGG-16, and ConvNext architectures, yielding accuracy rates of 96.2%, 95.4%, and 98.9%, respectively. In addition, the Alzheimer’s and Parkinson’s classes were each tested against the Normal class using binary classifiers. For the Alzheimer-Normal and Parkinson-Normal tasks, the models achieved the following accuracy rates: ResNet-18, 82.0% and 96.1%; VGG-16, 95.4% and 89.4%; and ConvNext, 99.4% and 99.5%, respectively.[12]
Nawal et al. proposed an approach that combines Histogram of Oriented Gradients (HOG) features with a customized convolutional neural network (CNN) for early PD diagnosis. Pre-processing methods were used to improve the consistency and quality of a medical image collection. The CNN extracts key features while HOG provides edge orientation information, and their fusion creates a robust feature map. An integrated attention mechanism further refines focus on crucial regions. Evaluation demonstrates a balance between accuracy (99%) and parameter count (0.8M). Visualization tools such as gradient-weighted class activation mapping (Grad-CAM) offer insights into model decisions, aiding interpretability. This approach offers accurate PD detection, potentially transforming diagnosis and improving patient outcomes.[13]
Mehta and Khurana aimed to determine whether deep belief networks (DBNs) are suitable for detecting PD early, since they can handle complicated, high-dimensional medical information. The DBN model was trained and tested on publicly available datasets, and the recorded accuracy was 92%, with a sensitivity of 90% and a specificity of 94%. The area under the receiver operating characteristic curve (ROC-AUC) was calculated to be 95%, indicating that a high level of diagnostic capacity was maintained. According to these results, the DBN model outperformed other diagnostic methods, including in achieving a low false negative rate (FNR). Traditional techniques, in which the diagnosis depends on a doctor’s assessment and on imaging, are usually less accurate and take more time to detect diseases early.[14]
Vats and Mehta suggested deploying a DBN, which they consider a highly advanced ML algorithm capable of deep, hierarchical learning. Their study applied a DBN model to a diverse dataset of 500 subjects suspected to have PD in its early stage. The dataset contains medical records, speech analysis, audio recordings of subjects, and biometric monitoring. The model was trained using a two-phase approach, the first phase being an unsupervised pre-training process to learn general characteristics. The DBN model’s accuracy of 93%, sensitivity of 90%, specificity of 93%, and AUC of 0.97 were all extremely positive outcomes. These results exceed those of standard diagnostic techniques, which achieved an accuracy of 85%, a sensitivity of 80%, a specificity of 85%, and an AUC of 0.85.[15]
Tesfai focused on the development of a speech- and audio-based ML pipeline for PD diagnosis. Two voice recording datasets were assembled using data augmentation techniques. Paired with traditional ML models, acoustic features yield 99.21% accuracy, while Log-Mel spectrograms with CNNs achieve 99.71% accuracy. The highest accuracy of 99.82% is attained through an ensemble model that combines both spectrogram and acoustic models. These outcomes provide compelling evidence for the effectiveness of multimodal ensemble models in PD diagnosis, offering promising prospects for non-invasive early detection.[16]
Lyu and Guo proposed Brain Graph Convolutional Networks (BGCN), a unified framework designed to integrate brain functional connectivity based on a non-Euclidean heuristic into a graph-based DL model (GCN) for diagnosing PD. To preserve the spatial
Detecting outliers
When the mode, median, and mean coincide, the data are symmetrical.[20] A longer or fatter tail to the right indicates positive skewness in the data, in which case the mode is lower than the median and the mean.
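The relationship between skewness and the position of the mean and median can be checked numerically. The following is a minimal sketch, not the study's code: the sample values, the function names `skewness` and `iqr_outliers`, and the use of Tukey's 1.5 × IQR rule for flagging outliers are all assumptions made for illustration, since the text does not state which outlier rule was applied.

```python
import statistics


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule, see lead-in)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical feature values, chosen only to illustrate the two cases.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 20.0]

for name, data in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    print(
        name,
        "mean:", statistics.fmean(data),
        "median:", statistics.median(data),
        "skewness:", round(skewness(data), 3),
        "outliers:", iqr_outliers(data),
    )
```

In the symmetric sample the mean and median coincide and the skewness is zero. In the right-skewed sample the single large value pulls the mean above the median, the skewness is positive, and the same value is flagged by the assumed IQR rule.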
between classes. The larger the AUC, the more correctly the model predicts the classes.
RESULTS AND DISCUSSION
The system used for this study is equipped with a 6th Generation Intel Core i5 processor, 12 GB of RAM, and a dedicated 4 GB GPU to speed up computations, especially those related to ML. Table 2 compares the ML models for PD prediction on accuracy, precision, recall, and F1-score. The XGBoost model achieves the highest accuracy (97.4%), with a precision of 99.9%, a recall of 96.6%, and an F1-score of 98.3%, showing how effectively it distinguishes positive cases from negative ones. The LR model performs slightly lower, with an accuracy of 92.3%, a precision of 93.5%, a recall of 96.6%, and an F1-score of 95.0%. These findings underscore the effectiveness and stability of XGBoost in predicting the presence of PD in its initial stages: it matches LR on recall and outperforms it on accuracy, precision, and F1-score.
The ROC curve for the XGBoost model is displayed in Figure 5. The false positive rate (FPR) is plotted on the x-axis and the true positive rate (TPR) on the y-axis. The blue solid line displays the XGBoost model’s performance, whereas the black dashed line indicates a random classifier (where AUC = 0.5). As seen in the legend, the XGBoost model’s curve stays well above the random classifier line and attains an AUC score of 0.9833. This high AUC value indicates that the XGBoost model has strong discriminating power and successfully differentiates between the positive and negative classes.
Figure 5: Receiver operating characteristic graph of the Extreme gradient boosting model
The XGBoost model’s confusion matrix is shown in Figure 6 and indicates strong classification performance. All 9 healthy individuals were correctly identified (true negatives, TN), with no false positives (FP). Among Parkinson’s cases, 29 were correctly classified (true positives, TP), and only 1 was misclassified (false negative, FN). The darker blue shades emphasize the high number of accurate predictions.
Figure 7: Receiver operating characteristic graph of the logistic regression model
Figure 7 shows the ROC curve of the LR model, which illustrates how effectively a binary classifier separates the classes as its discrimination threshold is varied. The TPR (sensitivity) is shown on the y-axis, while the FPR (1 − specificity) is shown on the x-axis. The blue solid line shows the model’s ROC curve, while a random classifier is shown by the
black dashed line. The AUC of this ROC curve is 0.8722.
Figure 8 shows the classification performance of the LR model as a confusion matrix. The matrix, labeled “Confusion Matrix - LR,” has “Actual” classes (Healthy and Parkinsons) on the y-axis and “Predicted” classes (Healthy and Parkinsons) on the x-axis. According to the matrix, 7 individuals who were actually healthy were correctly predicted as healthy (TN), and 2 healthy individuals were erroneously classified as having Parkinson’s (FP). Among individuals who actually had Parkinson’s, 1 was falsely classified as healthy (FN) and 29 were correctly classified as having Parkinson’s (TP).
Comparative Analysis
This section compares the proposed XGBoost and LR models with existing ML techniques, including Random Forest, Bagging, Support Vector Machine (SVM), and DT. Table 3 shows that the XGBoost model has the highest accuracy of 97.4%, indicating strong predictive ability. LR also performs well, achieving 92.3% accuracy, followed by Random Forest[28] with an accuracy of 91.01%. Bagging[29] produces a moderately high accuracy of 88.2%, whereas SVM[30] and DT[31] have lower accuracies of 76.32% and 60.7%, respectively. These results highlight the high accuracy and diagnostic efficiency of the proposed XGBoost model in comparison with both standard and ensemble-based ML methods.
The proposed XGBoost and LR models are well suited to early PD detection because they are reliable, generalizable, and able to handle intricate data patterns. XGBoost, an ensemble learning technique, captures non-linear feature relationships and interactions, which makes it useful for biomedical classification tasks. Its built-in regularization and optimization help increase model stability and reduce overfitting. LR, on the other hand, is valued for its simplicity, interpretability, and effectiveness on linearly separable data, which is crucial in medical diagnosis, where transparency and explainability are vital. Together, these models outperform traditional ML techniques in several ways: they produce more accurate and reliable predictions, offer better classification, and can distinguish healthy individuals from those with PD, enabling more effective early intervention and treatment planning.
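The accuracy, precision, recall, and F1-score values discussed in this section can be recomputed from the confusion-matrix counts reported for Figures 6 and 8. The following is a minimal sketch under that assumption; the class and function names are illustrative and are not taken from the study's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tn: int  # healthy, predicted healthy
    fp: int  # healthy, predicted Parkinson's
    fn: int  # Parkinson's, predicted healthy
    tp: int  # Parkinson's, predicted Parkinson's


def classification_metrics(c: ConfusionCounts) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = c.tn + c.fp + c.fn + c.tp
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts as described in the text for Figure 6 (XGBoost) and Figure 8 (LR).
xgboost_counts = ConfusionCounts(tn=9, fp=0, fn=1, tp=29)
lr_counts = ConfusionCounts(tn=7, fp=2, fn=1, tp=29)

for name, counts in [("XGBoost", xgboost_counts), ("LR", lr_counts)]:
    scores = classification_metrics(counts)
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

With these counts, the sketch reproduces the reported accuracies (97.4% and 92.3%) and agrees with the reported recall and F1-score values to within rounding. The XGBoost counts contain no false positives, so the computed precision is 1.0, which the text reports as 99.9%.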
For future scope, the model can be extended by incorporating DL techniques, larger and more diverse datasets, and multilingual clinical records. In addition, expanding the pipeline to detect other neurological disorders or integrating it into a real-time diagnostic support tool could further enhance its utility and impact in the medical field.
REFERENCES
1. Paccosi E, Proietti-De-Santis L. Parkinson’s disease: From genetics and epigenetics to treatment, a miRNA-based strategy. Int J Mol Sci 2023;24:9547.
2. Jalaja PP, Kommineni D, Mishra A, Tumati R, Joseph CA, Rupavath RV. Predictors of mortality in acute myocardial infarction: Insights from the healthcare cost and utilization project (HCUP) nationwide readmission database. Cureus 2025;17:e83675.
3. Patel N. Quantum cryptography in healthcare information systems: Enhancing security in medical data storage and communication. J Emerg Technol Innov Res 2022;9:193-202.
4. Jiang P, Gao N, Chang G, Wu Y. Biosensors for early detection of Parkinson’s disease: Principles, applications, and future prospects. Biosensors (Basel) 2025;15:280.
5. Pahune S, Rewatkar N. Large language models and generative AI’s expanding role in healthcare. Int J Res Appl Sci Eng Technol 2024;11:2288-302.
6. Mobed A, Razavi S, Ahmadalipour A, Shakouri SK, Koohkan G. Biosensors in Parkinson’s disease. Clin Chim Acta 2021;518:51-8.
7. Singamsetty S. Neurofusion advancing Alzheimer’s diagnosis with deep learning and multimodal feature integration. Int J Educ Appl Sci Res 2021;8:23-32.
8. Chakraborty S, Aich S, Kim HC. Detection of Parkinson’s disease from 3T T1 weighted MRI scans using 3D convolutional neural network. Diagnostics (Basel) 2020;10:402.
9. Mostafiz MA. Machine learning for early cancer detection and classification: AI-based medical imaging analysis in healthcare. Int J Curr Eng Technol 2025;15:251-60.
10. Majhi B, Kashyap A, Mohanty SS, Dash S, Mallik S, Li A, et al. An improved method for diagnosis of Parkinson’s disease using deep learning models enhanced with metaheuristic algorithm. BMC Med Imaging 2024;24:156.
11. Neeli S. Heart disease prediction for a cloud-based smart healthcare monitoring system using gans and ant colony optimization. Int J Med Pub Health 2024;14:1219-29.
12. Jain S, Srivastava R. Neurodegenerative disease Alzheimer’s and Parkinson’s classification with deep learning. In: 2025 3rd International Conference on Advancement in Computation and Computer Technologies (InCACCT). IEEE; 2025. p. 798-803.
13. Nawal R, Habib N, Barua S. A deep learning based non-invasive framework for neurodegenerative Parkinson’s disease diagnosis from template-less handwritten shapes. In: 2025 4th International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST). Bangladesh: IEEE; 2025. p. 65-70.
14. Mehta S, Khurana S. From data to diagnosis: Utilizing deep belief networks for early Parkinson’s disease detection. In: 2024 International Conference on Artificial Intelligence and Emerging Technology (Global AI Summit). IEEE; 2024. p. 347-51.
15. Vats S, Mehta S. The impact of deep belief networks on the early diagnosis of Parkinson’s disease. In: 2024 5th International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE; 2024. p. 980-4.
16. Tesfai S. Multimodal ensemble models for Parkinson’s disease diagnosis using log-mel spectrograms and acoustic features. In: 2023 IEEE MIT Undergraduate Research Technology Conference (URTC). IEEE; 2023. p. 1-5.
17. Lyu T, Guo H. BGCN: An EEG-based graphical classification method for Parkinson’s disease diagnosis with heuristic functional connectivity speculation. In: 2023 11th International IEEE/EMBS Conference on Neural Engineering (NER). IEEE; 2023. p. 1-4.
18. Chang WH, Du Liou K, Liu YT, Wen KA. Precise motor function monitor for Parkinson disease using low power and wearable IMU body area network. In: 2022 14th Biomedical Engineering International Conference (BMEiCON). IEEE; 2022. p. 1-5.
19. Sagili S, Goswami C, Bharathi VC, Ananthi S, Rani K, Sathya R. Identification of diabetic retinopathy by transfer learning based retinal images. In: 2024 9th International Conference on Communication and Electronics Systems (ICCES). IEEE; 2024. p. 1149-54.
20. Wang J, Zhou Z, Li Z, Du S. A novel fault detection scheme based on mutual k-nearest neighbor method: Application on the industrial processes with outliers. Processes 2022;10:497.
21. Chang KH, Wang CH, Hsu BG, Tsai JP. Serum osteopontin level is positively associated with aortic stiffness in patients with peritoneal dialysis. Life (Basel) 2022;12:397.
22. Khoury N, Attal F, Amirat Y, Oukhellou L, Mohammed S. Data-driven based approach to aid Parkinson’s disease diagnosis. Sensors (Basel) 2019;19:242.
23. Mahajan RP. Transfer learning for MRI image reconstruction: Enhancing model performance with pretrained networks. Int J Sci Res Arch 2025;15:298-309.
24. Abdurrahman G, Sintawati M. Implementation of XGboost for classification of Parkinson’s disease. J Phys Conf Ser 2020;1538:1-8.
25. Gomathy CK. The Parkinson’s disease detection using machine learning. Int Res J Eng Technol 2021;8:440-4.
26. Dattangire R, Biradar D, Vaidya R, Joon A. A comprehensive analysis of cholera disease prediction using machine learning. In: Congress on Intelligent Systems. Berlin, Germany: Springer; 2025. p. 555-68.
27. Shah SB. Artificial intelligence (AI) for brain tumor detection: Automating MRI image analysis for enhanced accuracy. Int J Curr Eng Technol 2024;14:320-7.
28. Yang X, Ye Q, Cai G, Wang Y, Cai G. PD-ResNet for classification of Parkinson’s disease from gait. IEEE J Transl Eng Health Med 2022;10:1-11.
29. Barukab O, Ahmad A, Khan T, Kunhumuhammed MR. Analysis of Parkinson’s disease using an imbalanced-speech dataset by employing decision tree ensemble methods. Diagnostics (Basel) 2022;12:300.
30. Hossain MA, Amenta F. Machine learning-based classification of Parkinson’s disease patients using speech biomarkers. J Park Dis 2024;14:95-109.
31. Ali H, Hashmi E, Yildirim SY, Shaikh S. Analyzing Amazon products sentiment: A comparative study of machine and deep learning, and transformer-based techniques. Electronics 2024;13:1305.