Comprehensive Detection of Alzheimer’s Disease
Using Machine Learning and Deep Neural
Networks: A Cross-Sectional and Longitudinal
Study
Poonam Rani, Dept. of Computer Science & Engineering
Tushar Dahiya, Dept. of Computer Science & Engineering
Divya Jain, Dept. of Computer Science & Engineering, Roll No. 2023UCS1647
Katikay Mehra, Dept. of Computer Science & Engineering, Roll No. 2023UCS1650
Sneha, Dept. of Computer Science & Engineering, Roll No. 2023UCS1641
Anjali, Dept. of Computer Science & Engineering, Roll No. 2023UCS1593
Abstract: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder and a pressing global health issue that demands more effective and accessible diagnostic approaches. Early and accurate detection is critical for patient care, clinical trial enrollment, and the development of new therapies. This study presents a comprehensive analysis of machine learning and deep learning models for AD detection, leveraging the publicly available OASIS cross-sectional and longitudinal datasets. Focusing on structured, MRI-derived neuroimaging features, this work systematically explores, implements, and evaluates a spectrum of models. We compare the static predictive power of classical classifiers (e.g., Logistic Regression, Random Forest, SVM) on cross-sectional data against the dynamic, sequence-learning capabilities of recurrent architectures (LSTMs) on longitudinal data. The models are benchmarked on their ability to perform binary classification (Healthy vs. Demented), a critical first step in clinical screening protocols. We address key challenges inherent to medical data, such as feature heterogeneity, missing values, and the distinct analytical requirements of static versus time-series data. This paper highlights the potential and limitations of different modeling paradigms, finding that while traditional models offer high interpretability and strong performance in cross-sectional analysis, deep learning models like LSTMs are uniquely suited to capturing the subtle temporal dynamics of disease progression. Ultimately, this work underscores the critical importance of matching model architecture to data structure to enhance diagnostic precision and provides a clear, reproducible comparative baseline for future research in computational neurology.
Keywords: Alzheimer’s disease, AD detection, convolutional neural network, recurrent neural network, machine learning, deep learning, cross-sectional, longitudinal, OASIS.

I. INTRODUCTION
Alzheimer’s disease (AD) is a chronic, irreversible, and progressive neurodegenerative disorder that represents the most prevalent cause of dementia globally. With the aging population steadily increasing, the World Health Organization (WHO) projects that around 139 million people will be living with dementia by 2050, creating an immense societal and economic burden. The pathology of AD typically leads to a gradual decline in cognitive functions, including memory, reasoning, and behavioral capabilities, severely impairing a person’s ability to perform daily activities. Given the profound impact on patients and healthcare systems, the development of tools for early and accurate diagnosis is a paramount challenge in modern medicine [14]. Early detection is critical not only for clinical intervention and care planning but also for enrolling participants in clinical trials aimed at slowing the disease’s progression.
Traditional diagnostic techniques, while established, present significant barriers to widespread, routine screening. Methods such as cerebrospinal fluid (CSF) analysis require an invasive lumbar puncture, while positron emission tomography (PET) scans are costly and involve radioactive tracers. These limitations create a critical need for diagnostic tools that are non-invasive, cost-effective, accurate, and scalable. To address these challenges, machine learning (ML) and deep learning (DL) have emerged as powerful paradigms for developing such systems [9]. These computational methods excel at identifying complex, high-dimensional patterns in standard neuroimaging data, like structural MRI, that may be too subtle for human interpretation. By learning from large datasets, these models can create robust, data-driven frameworks for diagnostics.
This study proposes a comprehensive ML pipeline that leverages the widely used Open Access Series of Imaging Studies (OASIS) datasets [2]. The core hypothesis is that a systematic comparison of different model classes on distinct data types (static versus dynamic) can reveal optimal strategies for AD detection using MRI-derived features. The focus is on a foundational binary classification task: differentiating between healthy and demented subjects. We implement and compare a wide range of models, from interpretable traditional classifiers like Support Vector Machines [6] and Random Forests [5] to advanced neural networks like LSTMs [8] that are capable of learning temporal patterns. A key contribution of this work is the creation of a unified framework to directly evaluate cross-sectional (“snapshot”) models against longitudinal (“progression”) models. This paper is organized as follows: Section II reviews the relevant literature. Section III details the full methodology. Section IV details the model architectures, followed by results, discussion, and conclusion.

II. LITERATURE REVIEW
The application of computational methods to Alzheimer’s disease diagnostics has grown into a major field of research, propelled by the availability of public datasets and advancements in machine learning algorithms.
A cornerstone of this research is the availability of high-quality, large-scale datasets. The Open Access Series of Imaging Studies (OASIS) initiative, first detailed by Marcus et al. [2], has been particularly influential. By providing both cross-sectional and longitudinal structural MRI data alongside clinical assessments, OASIS enables researchers to investigate AD not just as a static condition but as a progressive disease. This has paved the way for the development of sophisticated models that can track changes over time.
Historically, classical machine learning models have served as a strong baseline for classification tasks using tabular clinical data. Support Vector Machines (SVMs), as introduced by Cortes and Vapnik [6], have been widely used for their effectiveness in high-dimensional spaces, making them suitable for handling numerous neuroimaging features. Similarly, ensemble methods like Random Forests, developed by Breiman [5], are valued for their robustness against overfitting and their ability to capture non-linear interactions between features, a common characteristic of biological data. Such models have been successfully applied in numerous studies for AD classification [11].
More recently, the field has seen a significant shift towards deep learning. The survey by Zhang et al. [9] highlights the extensive use of deep learning for neuroimaging-based brain disorder analysis. For tasks involving structured, non-image data, Feedforward Neural Networks optimized with methods like Adam [7] have become a standard approach. Regularization techniques like Dropout, introduced by Srivastava et al. [12], are crucial for preventing overfitting in these models. For longitudinal data, which captures patient visits over time, Recurrent Neural Networks (RNNs) are particularly powerful. The Long Short-Term Memory (LSTM) architecture, proposed by Hochreiter and Schmidhuber [8], is a specialized RNN that can learn long-term dependencies, making it ideal for modeling disease progression from sequential clinical data [13].
The potential of even more advanced architectures is also being explored. Transformers, introduced by Vaswani et al. with their “Attention is All you Need” paper [10], have revolutionized sequence processing and are beginning to be applied in the medical domain. The development of this research has been critically accelerated by open-source software libraries. Frameworks like Scikit-learn [3] and Keras [4] have democratized access to powerful algorithms. As summarized by Alsubaie et al. [1], the overarching trend is a move towards more complex, multimodal models to capture the multifaceted nature of Alzheimer’s disease, though challenges in interpretability and clinical validation remain [14], [15].

III. METHODOLOGY
This study employs a multi-stage methodology that begins with data acquisition and rigorous preprocessing, followed by exploratory data analysis to uncover underlying patterns, and culminates in a comparative analysis of various machine learning and deep learning models. Each stage is designed to ensure the robustness and clinical relevance of the findings. The overall workflow is visualized in Figure 1.
Fig. 1. Project Methodology Flowchart.

A. Dataset Description
This work utilizes two public datasets from the Open Access Series of Imaging Studies (OASIS), which is a valuable resource for studying neurodegenerative diseases. The use of
two distinct datasets is a key strength of this study, allowing for both static and dynamic analysis of AD markers.
• OASIS-I: This is a cross-sectional MRI dataset containing records for 416 subjects, each from a single visit. This type of data is useful for building models that can make a diagnostic prediction based on a single “snapshot” in time, which is representative of a typical initial clinical encounter.
• OASIS-II: This is a longitudinal dataset that follows 150 individuals over two or more visits. This time-series data is critical for understanding and modeling the progression of Alzheimer’s disease, capturing subtle changes in biomarkers over time that might be missed in a single scan.
Key features extracted from these datasets for our models include demographic variables, clinical assessment scores, and MRI-derived volumetric measures:
• eTIV (Estimated Total Intracranial Volume): A measure of the total volume within the cranium, used to normalize brain volume measurements.
• nWBV (Normalized Whole Brain Volume): The ratio of brain volume to eTIV. This is a crucial biomarker, as a decrease in nWBV (brain atrophy) is a hallmark of AD.
• MMSE (Mini-Mental State Examination): A widely used 30-point questionnaire that measures cognitive impairment. Lower scores indicate more severe impairment.
• CDR (Clinical Dementia Rating): A scale used to quantify the severity of dementia, from 0 (normal) to 3 (severe).
• SES (Socioeconomic Status): A categorical variable representing the subject’s social and economic standing.

B. Preprocessing Steps
A rigorous, multi-step preprocessing pipeline was implemented to clean and prepare the data for modeling, ensuring the reliability and validity of the model inputs.
1) Handling Missing Data: Missing data is a common problem in clinical datasets. For the categorical SES feature, which had some missing entries, *mode imputation* was used. This technique replaces missing values with the most frequently occurring value in the column and is a standard approach for non-numeric data. For the longitudinal data from OASIS-II, *forward-filling* was applied. This method propagates the last observed value forward, which is a clinically sound assumption for a slowly progressing disease like AD, where a patient’s status is unlikely to change drastically between closely spaced visits.
2) Feature Engineering: Raw data must be transformed into a format suitable for machine learning algorithms. *One-hot encoding* was applied to categorical features like Gender and SES. This converts each category into a new binary column, preventing the model from assuming an incorrect ordinal relationship between categories. All numerical features were standardized using *Z-score normalization*, which rescales the data to have a mean of 0 and a standard deviation of 1. This is a critical step that ensures that features with larger numeric ranges (like eTIV) do not disproportionately influence the model’s learning process compared to features with smaller ranges (like nWBV).
3) Outlier Detection: Outliers can skew the results of a model. The *Interquartile Range (IQR)* method was used to identify any extreme data points. This method defines an outlier as any point that falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where Q1 and Q3 are the first and third quartiles and IQR = Q3 - Q1. The analysis revealed no extreme outliers in the key features, suggesting a high quality of data collection and recording.
4) Target Encoding: The primary goal was binary classification. The CDR score, a multi-class measure of dementia severity, was converted into a binary target variable. A CDR score of 0, which signifies no dementia, was encoded as “Healthy”. Scores of 0.5 or greater, which indicate at least very mild impairment, were encoded as “Demented”. This transformation creates a clear, clinically relevant classification task for the models to learn.

C. Exploratory Data Analysis (EDA)
1) Principal Component Analysis (PCA): PCA is a dimensionality reduction technique used to transform a large set of variables into a smaller one that still contains most of the information. In this study, PCA was applied to the feature set, and it was found that the first two principal components were able to capture over 85% of the total variance in the data. This indicates that the dataset has a strong underlying structure. The 2D projection of the principal components (Fig. 2) showed a noticeable, though not perfect, separation between the healthy and demented groups, providing early evidence that a machine learning classifier could successfully distinguish between them.
Fig. 2. 2D PCA projection showing separation of healthy and demented subjects.
2) Correlation Analysis: A correlation matrix was computed to understand the linear relationships between key variables (Fig. 3). The analysis confirmed several clinically expected relationships: the MMSE score was positively correlated with nWBV (i.e., higher cognitive scores are associated with larger brain volumes) and negatively correlated with age (cognitive function tends to decline with age). These findings validate the data’s integrity and confirm that the selected features are clinically relevant to Alzheimer’s disease.
3) Clustering: To explore natural groupings within the data without using the diagnostic labels, K-means clustering was performed with k=4. The resulting clusters aligned well with the known stages of AD progression: one cluster predominantly contained healthy subjects, while the others corresponded to patterns seen in mild cognitive impairment (MCI) and more advanced stages of dementia. This unsupervised analysis further supports the feature set’s ability to represent the disease spectrum.

IV. MODEL ARCHITECTURE AND IMPLEMENTATION
A. Traditional ML Models
A suite of classical classifiers was implemented to serve as robust baselines. These models are well-suited for tabular data and provide a high degree of interpretability, which is valuable in a clinical context.
• Logistic Regression: This model was chosen as a fundamental linear baseline. It calculates the probability of a binary outcome by fitting the data to a logistic function. Its primary advantages are its simplicity, speed, and the high interpretability of its coefficients, which directly indicate the influence of each feature on the prediction.
• Random Forest: As an ensemble method, Random Forest constructs a multitude of decision trees during training. It operates by building each tree on a random bootstrap sample of the data and using a random subset of features for each split. This dual randomization process makes it highly resistant to overfitting and capable of capturing complex, non-linear interactions between features without extensive parameter tuning.
• SVM: The Support Vector Machine is a powerful classifier that works by finding the optimal hyperplane that separates data points of different classes with the maximum possible margin. Its strength lies in its ability to handle high-dimensional feature spaces effectively and its use of the kernel trick to model non-linear boundaries.
• Gaussian Process and Linear Discriminant Analysis (LDA): These probabilistic classifiers were also explored to provide a probabilistic perspective on classification.
All traditional models were trained and evaluated using a 10-fold stratified cross-validation scheme to ensure that performance estimates were stable and not dependent on a single random split of the data. Hyperparameters were optimized using an exhaustive grid search.
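The preprocessing and validation scheme described in Section III-B and above can be sketched with Scikit-learn [3] as follows. This is a minimal illustration under stated assumptions, not the exact code used in this study: the column names, the synthetic data generator `make_synthetic_oasis`, and the grid of `C` values are all hypothetical, and Logistic Regression stands in for any of the classical classifiers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; the real OASIS CSV headers may differ.
NUMERIC = ["Age", "MMSE", "eTIV", "nWBV"]
CATEGORICAL = ["Gender", "SES"]


def make_synthetic_oasis(n=200, seed=0):
    """Build a toy table with OASIS-like columns (synthetic, not real data)."""
    rng = np.random.default_rng(seed)
    cdr = rng.choice([0.0, 0.5, 1.0, 2.0], size=n, p=[0.5, 0.3, 0.15, 0.05])
    df = pd.DataFrame({
        "Age": rng.normal(75, 8, n),
        "MMSE": np.clip(29 - 6 * cdr + rng.normal(0, 2, n), 0, 30),
        "eTIV": rng.normal(1500, 150, n),
        "nWBV": 0.75 - 0.03 * cdr + rng.normal(0, 0.03, n),
        "Gender": rng.choice(["M", "F"], n),
        "SES": rng.choice([1.0, 2.0, 3.0, 4.0, 5.0], n),
        "CDR": cdr,
    })
    # Simulate the missing SES entries mentioned in the text.
    df.loc[rng.choice(n, 15, replace=False), "SES"] = np.nan
    return df


def build_search():
    """Preprocessing + classifier wrapped in a 10-fold stratified grid search."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # mode imputation
        ("onehot", OneHotEncoder(handle_unknown="ignore")),   # one-hot encoding
    ])
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),                   # z-score normalization
        ("cat", categorical, CATEGORICAL),
    ])
    model = Pipeline([
        ("prep", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid only
    return GridSearchCV(model, grid, cv=cv, scoring="accuracy")


df = make_synthetic_oasis()
X = df[NUMERIC + CATEGORICAL]
y = (df["CDR"] >= 0.5).astype(int)  # 0 = Healthy (CDR 0), 1 = Demented (CDR >= 0.5)
search = build_search().fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Placing imputation and scaling inside the pipeline means they are re-fitted on the training portion of every fold, so no statistics from the held-out fold leak into preprocessing. Whether the original experiments were organized this way is not stated in the paper.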
Fig. 3. Correlation heatmap of key features.

B. Deep Learning Models
1) Feedforward Neural Network (FNN): For the cross-sectional data, a 3-layer FNN was constructed. The architecture consisted of an input layer with neurons corresponding to the number of input features, two hidden layers with ReLU activation functions to learn progressively more complex non-linear representations, and a final output layer with a single neuron and a sigmoid activation function to produce a probability score for the “Demented” class. To prevent overfitting, a dropout rate of 0.3 was applied after each hidden layer. The Adam optimizer was used for its adaptive learning rate capabilities, and the model was trained for 200 epochs, showing stable convergence on the loss curves.
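The forward pass of such a network can be sketched in plain NumPy. This is an illustrative sketch rather than the Keras model used in the study: the hidden-layer widths (32 and 16), the He-style initialization, and the function names `init_params` and `forward` are assumptions, and training with Adam is omitted.

```python
import numpy as np


def init_params(n_features, hidden=(32, 16), seed=0):
    """He-style initial weights for input -> hidden1 -> hidden2 -> 1 output."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, *hidden, 1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), (fan_in, fan_out)), np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, params, dropout_rate=0.3, training=False, rng=None):
    """Return the predicted probability of the "Demented" class for each row of x."""
    if rng is None:
        rng = np.random.default_rng(0)
    a = x
    for w, b in params[:-1]:
        a = np.maximum(0.0, a @ w + b)          # ReLU hidden layer
        if training:                            # inverted dropout after each hidden layer
            keep = rng.random(a.shape) >= dropout_rate
            a = a * keep / (1.0 - dropout_rate)
    w, b = params[-1]
    logits = a @ w + b
    return 1.0 / (1.0 + np.exp(-logits))        # sigmoid output, shape (n_samples, 1)


x = np.random.default_rng(1).normal(size=(5, 8))  # 5 subjects, 8 standardized features
params = init_params(n_features=8)
probs = forward(x, params)                        # inference mode: dropout disabled
print(probs.ravel())
```

Dropout is applied only when `training=True`; at inference the activations pass through unchanged, which is why repeated calls in inference mode return identical probabilities.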
Fig. 4. Boxplots of MMSE and nWBV scores across healthy and demented groups.
2) Long Short-Term Memory (LSTM): For the longitudinal data, an LSTM network was designed. This type of recurrent neural network (RNN) is specifically engineered to handle sequential data. Its architecture contains memory cells with input, output, and forget gates, which allow the network to selectively remember or discard information over long sequences. This makes it ideal for modeling disease progression, as it can learn patterns from the sequence of patient visits. The model’s architecture included one recurrent layer with 20 LSTM units. A key part of the optimization process was a comparison between the Adam and SGD optimizers. It was found that while Adam converged faster, SGD provided better
generalization and reduced overfitting, a common advantage on smaller and potentially noisy medical datasets.

V. RESULTS AND EVALUATION
A. Accuracy Comparison
TABLE I
MODEL ACCURACY ACROSS DATASETS
Model                 Accuracy   Dataset
Logistic Regression   83.4%      OASIS-I
Random Forest         81.0%      OASIS-I
LSTM (Adam)           80.0%      OASIS-II
LSTM (SGD)            83.0%      OASIS-II
FNN (Adam)            80.0%      OASIS-I

B. Evaluation Metrics
• Confusion Matrix: Analysis of the confusion matrices for the top-performing models revealed a high true positive rate, indicating that the models were effective at correctly identifying demented subjects. This is a critical requirement for a clinical screening tool, where missing a diagnosis (a false negative) is often more detrimental than a false alarm (a false positive).
• Loss Curves: The training and validation loss curves for the FNN were monitored. The stable and converging nature of these curves confirmed that the model was learning effectively and was not suffering from major instability or divergence during training.
• LSTM Heatmaps: For the LSTM model, heatmaps were used to visualize the attention or weights the model placed on different time steps (visits) in a patient’s sequence, providing insights into its temporal trend recognition capabilities.

C. Cross-Sectional vs. Longitudinal Analysis
A key finding of this study is the distinct utility of the two modeling approaches. The cross-sectional models, trained on OASIS-I, reached the highest single-visit (“snapshot”) accuracy in Table I (83.4% for Logistic Regression), although the SGD-trained LSTM on OASIS-II was close behind at 83.0%. This makes the cross-sectional models well-suited for an initial diagnostic screening, where a decision must be made based on a single point of data collection. In contrast, the longitudinal LSTM-based models captured the subtler transitions of the disease over time. This capability is especially valuable for monitoring patients diagnosed with Mild Cognitive Impairment (MCI) or for identifying individuals on a trajectory towards dementia, even if their current state does not meet the full criteria.

VI. DISCUSSION AND COMPARATIVE STUDY
The results of this study provide several valuable insights into the application of machine learning for AD detection. The strong performance of Logistic Regression on the cross-sectional data is noteworthy. It suggests that with proper feature engineering, even simple, interpretable linear models can be highly effective, which is a significant advantage in a clinical setting where model transparency is paramount. The inclusion of the MMSE score was a particularly powerful predictor, reinforcing its clinical importance.
The LSTM model’s success on the longitudinal data highlights the clinical depth that can be gained from time-series analysis. By detecting changes across time, the model can better represent the dynamic nature of disease progression. This is a crucial advantage over static models, which cannot differentiate between a stable patient and one who is rapidly declining.
Compared to prior literature, as surveyed by Alsubaie et al. [1], this project’s primary contribution is the direct and systematic implementation and validation of both static and dynamic models on real-world datasets. This includes a focused analysis on optimization strategies, such as the finding that SGD can outperform Adam for LSTMs on noisy medical data, which aligns with findings in other areas of medical imaging analysis [9]. While the achieved accuracies are consistent with similar studies, our unified pipeline provides a clear and reproducible framework for future comparative research.

VII. CONCLUSION AND FUTURE SCOPE
This project successfully designed, implemented, and evaluated an end-to-end machine learning pipeline for detecting Alzheimer’s Disease using widely available structural MRI-derived features. By systematically integrating and comparing both cross-sectional and longitudinal analyses with a diverse range of ML and DL models, this work achieved robust classification performance and generated clinically relevant insights into the trade-offs between different modeling approaches.
Key Contributions: The main contributions of this study are threefold:
• The development of a unified pipeline that can process and model both cross-sectional and longitudinal data, providing a comprehensive framework for AD analysis.
• The use of PCA-guided features and unsupervised clustering to enhance the interpretability of the dataset’s structure before supervised learning.
• A comparative analysis and optimization of deep learning models, particularly the demonstration of SGD’s effectiveness for LSTMs in this context, providing a practical guide for real-world applicability.
Future Work: To build upon these findings, future work should proceed in several key directions:
• Multimodal Data Integration: The current models rely solely on structural MRI features. Future iterations should incorporate multimodal imaging data, such as PET and fMRI scans, to provide a more holistic view of the disease’s pathophysiology and potentially improve predictive accuracy.
• Advanced Architectures: While LSTMs are effective, newer sequence models like Transformers and other attention-based architectures should be explored, as they may offer superior performance in capturing long-range dependencies in patient data [10].
• Clinical Deployment: A long-term goal is to refine these models into robust, interpretable screening tools that can be deployed in clinical settings to assist physicians in early diagnosis and patient monitoring.

REFERENCES
[1] M. G. Alsubaie et al., “Alzheimer’s Disease Detection Using Deep Learning on Neuroimaging: A Systematic Review,” Mach. Learn. Knowl. Extr., vol. 6, pp. 464-505, 2024.
[2] D. S. Marcus et al., “Open Access Series of Imaging Studies (OASIS): Cross-sectional MRI Data in Young, Middle Aged, Nondemented, and Demented Older Adults,” J. Cogn. Neurosci., vol. 19, no. 9, pp. 1498-1507, 2007.
[3] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825-2830, 2011.
[4] F. Chollet et al., “Keras,” [Link] 2015.
[5] L. Breiman, “Random Forests,” Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.
[6] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[7] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” in Proc. 3rd International Conference on Learning Representations (ICLR), 2015.
[8] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[9] J. Zhang et al., “A survey on deep learning for neuroimaging-based brain disorder analysis,” Frontiers in Neuroscience, vol. 15, p. 737599, 2021.
[10] A. Vaswani et al., “Attention is All you Need,” in Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017.
[11] S. S. Sarraf and G. Tofighi, “Deep learning-based pipeline to recognize Alzheimer’s disease,” Journal of Big Data, vol. 3, no. 1, p. 22, 2016.
[12] N. Srivastava et al., “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
[13] P. R. Adhikari et al., “Predicting Alzheimer’s disease progression using multi-modal deep learning approach,” in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI), pp. 1362-1365, 2019.
[14] A. Esteva et al., “A guide to deep learning in healthcare,” Nature Medicine, vol. 25, no. 1, pp. 24-29, 2019.
[15] W. J. E. P. L. et al., “A deep learning model for early prediction of Alzheimer’s disease dementia based on hippocampal MRI,” Alzheimer’s & Dementia, vol. 15, no. 8, pp. 1059-1070, 2019.