
Autism Spectrum Disorder Prediction in Children Using Machine Learning

The study investigates the use of machine learning techniques to predict Autism Spectrum Disorder (ASD) in children, aiming to enhance early detection and diagnosis. Various models, including support vector machines and logistic regression, were evaluated using publicly available datasets, with logistic regression achieving the highest accuracy. The research highlights the importance of timely intervention in improving long-term outcomes for individuals with ASD.


Journal of Disability Research

2024 | Volume 3 | Pages: 1–9 | e-location ID: e20230064


DOI: 10.57197/JDR-2023-0064

Autism Spectrum Disorder Prediction in Children Using Machine Learning
Mahmoud M. Abdelwahab1,2,*, Khamis A. Al-Karawi3,4 , E. M. Hasanin5 and H. E. Semary1,6

1Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
2Department of Basic Sciences, Higher Institute of Administrative Sciences, Osim, Egypt
3Department of Acoustic, School of Science, Engineering, and Environment, Salford University, Great Manchester, UK
4Department of Computer Science, Faculty of Science, Diyala University, Baqubah, Diyala, Iraq
5Faculty of Business Administration, Egyptian E-Learning University, Giza, Egypt
6Department of Statistics and Insurance, Faculty of Commerce, Zagazig University, Zagazig, Egypt

Correspondence to:
Mahmoud M. Abdelwahab*, e-mail: [email protected], Tel.: +966541065376
Khamis A. Al-Karawi, e-mail: [email protected]
E. M. Hasanin, e-mail: [email protected]
H. E. Semary, e-mail: [email protected]

Received: August 31 2023; Revised: December 11 2023; Accepted: December 11 2023; Published Online: January 5 2024

ABSTRACT
Symptoms associated with autism spectrum disorder (ASD) typically manifest during childhood and persist into adolescence and adulthood. ASD can be caused by genetic or environmental factors, and its outcomes can be significantly improved through early detection and treatment. Currently, standardized clinical tests are the primary diagnostic method for ASD. However, these tests are time consuming and expensive. Early detection and intervention are pivotal in enhancing the long-term prospects of children diagnosed with ASD. Machine-learning (ML) techniques are being utilized alongside conventional methods to improve the accuracy and efficiency of ASD diagnosis. Therefore, this paper explores the feasibility of employing support vector machine, random forest classifier, naïve Bayes, logistic regression (LR), K-nearest neighbor, and decision tree classification models on our dataset to construct predictive models for predicting and analyzing ASD across different age groups: children, adolescents, and adults. The proposed techniques are assessed on three distinct, publicly available nonclinical ASD datasets. The ASD datasets for toddlers, children, adolescents, and adults were obtained from publicly available repositories, specifically Kaggle and UCI ML. These repositories provide a valuable data source for research and analysis related to ASD. Our main objective is to identify susceptibility to ASD in children during the early stages, thereby streamlining the diagnosis process. Based on our findings, LR demonstrated the highest accuracy for the selected dataset.

KEYWORDS
autism, machine learning, random forest, SVM, decision tree

INTRODUCTION
Autism spectrum disorder (ASD) is a neurodevelopmental condition affecting a child’s communication, social interaction, and knowledge acquisition, typically presenting within the first 2 years of life (Frith and Happé, 2005). People with autism face various obstacles, including difficulty with focus, learning disabilities, mental health issues such as anxiety, depression, movement, sensory issues, and other challenges (Tripathy et al., 2021). As a result, it impacts an individual’s entire cognitive, social, emotional, and physical health (Omar et al., 2019; Alenizi and Al-Karawi, 2023a). The symptoms of this condition vary in extent and intensity, including communication difficulties, obsessive hobbies, and repeated mannerisms in social situations. A comprehensive examination is needed to detect ASD. This also comprises a thorough evaluation and a range of assessments performed by child psychologists and other qualified professionals (Bastiaansen et al., 2011; Alenizi and Al-Karawi, 2023b, c). Autism is a rapidly growing and numerous global condition, affecting approximately one child out of every 160, according to the World Health Organization (Suhas et al., 2021; Al-Karawi, 2023). ASD is a neurodevelopmental condition that affects social interaction and communication abilities, requiring 24-h care and assistance for some individuals (Vaishali and Sasikala, 2018; Thabtah, 2019). Individuals with ASD often experience lifelong challenges in these areas. ASD, a condition characterized by persistent symptoms, is believed to be caused by a combination of genetic and environmental factors, with no known cure, but early detection can help manage its effects. Genes, environmental factors, and risk factors like low birth weight, ASD sibling presence, and older parents can influence a person’s development. Early diagnosis of autism can be quite beneficial because it allows


doctors to provide patients with the appropriate treatment at an earlier stage. It can potentially halt any further deterioration of the patient’s condition. It would help to cut down on the expenditures associated with delayed diagnosis over the long term. Therefore, there is a significant need for a screening test instrument that is time efficient, accurate, and simple. This test instrument would predict autistic symptoms in an individual and determine whether or not that individual requires a thorough autism examination (Lakhan et al., 2020; Alenizi and Al-Karawi, 2022). Early detection and intervention are crucial for mitigating ASD symptoms and improving the quality of life. Observation is the primary method, with parents, teachers, and special education teams identifying potential symptoms. Children should seek healthcare for further testing, as identifying ASD symptoms in adults can be more challenging, while behavioral changes in children can be recognized as early as 6 months (Al-Karawi and Ahmed, 2021; Alenizi and Al-Karawi, 2023c). This study aims to develop a platform for accurately predicting autistic characteristics in individuals of any age, using machine-learning (ML) approaches to aid in early diagnosis and intervention.

BACKGROUND AND LITERATURE REVIEW

Vaishali and Sasikala (2018) proposed a method for identifying ASD using optimized behavior sets. The researchers experimented with an ASD diagnosis dataset containing 21 features from the UCI machine-learning repository. They employed a swarm intelligence-based binary firefly feature selection wrapper to explore the dataset. They tested the hypothesis that a machine-learning model could improve classification accuracy using minimal feature subsets, finding that only 10 features from the original 21-feature ASD dataset were sufficient. The study found that swarm intelligence-based binary firefly feature selection can achieve accurate ASD diagnosis with fewer features, achieving an average accuracy in the range of 92.12 to 97.95%, potentially improving efficiency and reducing computational complexity in ASD diagnostic systems. Thabtah (2017b) introduced an ASD screening model incorporating machine-learning adaptation and Diagnostic and Statistical Manual of Mental Disorders (DSM-5) criteria. Screening tools play a crucial role in achieving various objectives in ASD screening. The paper explores the use of machine learning for ASD classification, highlighting its advantages and disadvantages and the challenges existing tools face in aligning with the DSM-5 manual. Mythili and Shanavas (2014) researched ASD using classification techniques. The primary objective of their paper was to detect and classify levels of autism. They employed neural networks, support vector machine (SVM), and fuzzy techniques with WEKA tools to analyze students’ behavior and social interaction. In another study, Kosmicki et al. (2015) proposed a method for identifying a minimal set of traits for autism detection. The authors used machine learning to assess ASD clinically using the Autism Diagnostic Observation Schedule (ADOS). They identified 98.27% of the 28 behaviors from module 2 and 97.66% from module 3, achieving an overall accuracy of 98.27%.

The effectiveness of ML in predicting various diseases based on syndromes is highly noteworthy. For instance, Khan et al. (2017) and Al-Karawi and Ahmed (2021) utilized ML to predict whether a person has diabetes, whereas Cruz and Wishart (2006) attempted to diagnose cancer using ML. Alternating decision tree (ADTree) was used by Wall et al. (2012a) and Alenizi and Al-Karawi (2023b) to shorten the screening process and speed up the identification of ASD features. With data from 891 people, they employed the Autism Diagnostic Interview, Revised (ADI-R) approach. They reached high accuracy, but the test was restricted to people between the ages of 5 and 17, and it could not predict ASD for various age groups (children, adolescents, and adults). Machine learning has been used in several studies, in multiple ways, to enhance and expedite the diagnosis of ASD. Using a 65-item Social Responsiveness Scale, Duda et al. (2016) used forward feature selection and undersampling to distinguish between autism and attention deficit hyperactivity disorder (ADHD). The metrics of Al-Karawi (2021) and Deshpande et al. (2013) for predicting ASD were based on brain activity. Artificial neural networks (ANN), probabilistic reasoning, and classifier combinations are examples of soft computing approaches that have also been employed (Pratap et al., 2014; Alenizi and Al-Karawi, 2022). Numerous papers have discussed automatic ML models that solely consider characteristics for input features. Several studies also used brain neuroimaging data. Parikh et al. (2019) selected six personal traits from the ABIDE database and used a cross-validation technique to train and test ML models using data from 851 subjects. Patients with and without ASD were categorized accordingly. Rules of machine learning, which Thabtah and Peebles (2020) introduced, provide users with a knowledge base of rules for comprehending the classification’s fundamental causes and detecting ASD characteristics. Al Banna et al. (2020) tracked and supported ASD patients while they dealt with the COVID-19 epidemic. The study utilized five machine-learning models to classify participants as having ASD or No-ASD based on various parameters like age, sex, and ethnicity. Each classifier was then analyzed to find the model that performed the best. SVM was utilized by Bone et al. (2016) to apply ML for the same goal, achieving 89.2% sensitivity and 59% specificity. Their study involved 1264 people with ASD and 462 people without ASD features. However, because of the vast age range (4-55 years), their research was not approved as a screening method for all age groups. With more than 90% accuracy, Allison et al. (2012) used the “Red Flags” tool to screen for ASD in both children and adults with the Autism Spectrum Quotient before shortlisting them to the AQ-10. Schankweiler et al. (2023) attempted to identify relatively more important screening questions for the ADI-R and ADOS screening methods. They found that ADI-R and ADOS screening tests can work better when they are combined. Thabtah compared the previous works on ML algorithms to predict autism traits (Thabtah, 2017b). To identify ASD symptoms in children, such as developmental delay, obesity, and insufficient physical activity, van den Bekerom (2017) utilized multiple ML algorithms, including naïve Bayes (NB), SVM, and random forest algorithms. He then compared those results. ADTree and the functional tree fared well with high sensitivity, specificity, and accuracy, according to


Wall et al.’s (2012b) study on identifying autism using a short screening test and validation. Heinsfeld et al. (2018) used a sizable brain imaging dataset from the Autism Imaging Data Exchange (ABIDE I) to identify ASD patients and got a mean classification accuracy of 70% with accuracy in the range of 66 to 71%. The random forest classifier’s (RFC) mean accuracy was 63%, compared to the SVM classifier’s mean accuracy of 65%. This study’s accuracy, specificity, sensitivity, and AUC were 88.51%. To pinpoint the problems with conceptual problem formulation, methodology implementation, and result interpretation, Bone et al. (2015) analyzed the earlier works of Wall et al. (2012b) and Kosmicki et al. (2015). The researchers used machine learning to replicate their findings, but there is no consensus on the best approach for generalizing autism screening tools across different age ranges.

WORKING MODEL

This research aims to create a robust machine-learning model for detecting autism in individuals of different ages, ensuring accurate and effective detection. Figure 1 shows our system’s operation and data flow, starting with preliminary data processing: removing noise, missing values, and outliers, and encoding categorical attributes. We use feature-engineering techniques to reduce dataset dimensionality and improve training speed, and then use the preprocessed datasets for classification with SVM, decision tree, and RFC models. The system evaluates classifier accuracy using a structured workflow of data preprocessing, feature selection, and classification techniques, identifying the most accurate model for further training and categorization tasks.

Figure 1: The architecture of the proposed system (Alenizi and Al-Karawi, 2023c).

RESEARCH METHODOLOGY

The research involved five stages: data collection, synthesis, prediction model development, evaluation, and application development, each of which is briefly discussed below.

Data collection

The dataset utilized for this research has been acquired from the publicly available UCI Repository. The four ASD datasets, namely, toddlers, adolescents, children, and adults, were obtained from publicly available repositories, specifically Kaggle and UCI ML (Hasan et al., 2022). These repositories provide a valuable data source for research and analysis related to ASD.

Table 1: List of ASD datasets (Hasan et al., 2022).

S. no. | Dataset name | Source | Attribute type | Number of attributes | Number of instances
1 | ASD screening data for adult | UCI machine-learning repository (Thabtah, 2017b) | Categorical, continuous, and binary | 21 | 704
2 | ASD screening data for children | UCI machine-learning repository (Thabtah, 2017b) | Categorical, serial, and binary | 21 | 292
3 | ASD screening data for adolescent | UCI machine-learning repository (Thabtah, 2017a) | ASD categorical, continuous, and binary | 21 | 104

Abbreviation: ASD, autism spectrum disorder.

These datasets have 20 common attributes that are used for prediction. These attributes are listed below:

Table 2: List of attributes in the dataset (Hasan et al., 2022).

Attribute id | Attribute description
1 | Patient age
2 | Sex
3 | Nationality
4 | Whether the patient suffered from jaundice at birth
5 | Whether any family member suffered from pervasive developmental disorders
6 | Who completed the test
7 | The country in which the user lives
8 | Whether the user has used the screening application before
9 | Screening test type
10-19 | Answers to 10 questions, based on the screening method
20 | Screening score

Data preprocessing

Data preparation encompasses all the necessary preprocessing steps before commencing model training, aiming


to achieve optimal results (Gopal Krishna Patro and Sahu, 2015). This preparation entails a series of three stages.
• Data encoding involves transforming a dataset comprising 6 numerically assigned values and 13 nominally assigned values. To effectively employ various machine-learning algorithms, it is essential to work with real numbers. Consequently, all nominal values must be converted into real numbers. A straightforward representation is adopted in this case, wherein the real numbers 1 and 2 are utilized. For instance, the male class is encoded as 1, while the female class is encoded as 2.
• Dealing with missing values is a crucial step in data handling. A significant portion (48.3%) of the data is missing in the given dataset. Removing the records with missing values would render the dataset unusable, reducing it to 155 samples. Hence, it becomes essential to address this issue. A statistical approach is adopted whereby the missing values are replaced with the mean of the values corresponding to each class (Wohlrab and Fürnkranz, 2011). This ensures that the dataset remains intact and usable for further analysis and modeling.
• Normalization becomes necessary as the dataset exhibits significant variations in the range of values, particularly after the nominal values have been encoded into real numbers (1 and 2). Without normalization, attributes with more extensive numeric ranges can dominate those with smaller ranges, potentially biasing the analysis. Moreover, normalization facilitates faster execution of algorithms by avoiding the utilization of wide-ranging numbers (Deshpande et al., 2013). In this case, the data are scaled to fit within the interval of 0 to 1, following Equation (1), where x represents the original value of the attribute, x_Normalized represents the scaled value, min_a is the minimum value of attribute a, and max_a is the maximum value of attribute a.

\[ x_{\mathrm{Normalized}} = \frac{x - \min_a}{\max_a - \min_a} \tag{1} \]

Selecting the optimal subset of the feature

The feature selection block outlines the process of selecting the best subset of features, which is influenced by the chosen algorithm and the desired learning performance. The following steps are followed to accomplish this selection procedure:
• This study employs adaptive wrapper feature selection, specifically backward elimination (Mao, 2004; Al-Karawi and Mohammed, 2023), to determine the optimal set of features. The results of this process are presented in the paper. Initially, all features related to the chosen algorithm are included. Then, in each iteration, the importance of each feature is evaluated, and the feature with the lowest priority is eliminated. This iterative loop continues until only one feature remains unexplored. The process is repeated until a significant decline in diagnostic performance is observed, as discussed in the Results and Discussion section.
• After selecting the feature subset with optimal performance, as described earlier (initially starting with the full feature set in the first iteration), 10-fold cross-validation is employed to evaluate the discriminant performance (Berrar, 2019). As demonstrated later, all trained models are saved so that they can be used for diagnosing unseen samples. This process is repeated for each algorithm under consideration. To assess the model performance, the results obtained from cross-validation for each feature subset are compared, determining the best-performing model for each specific number of features.
• As part of this research, the objective is to develop a mobile application for patients or healthcare facilities. Therefore, one of the essential goals is to minimize the number of features, thereby reducing the cost of tests while maximizing accuracy. To achieve this, a procedure is implemented to identify the minimal set of features that yields the best performance across the 10 folds. The resulting 10 models, one from each fold, are saved for later use during the testing phase. This approach ensures that the application maintains high accuracy while minimizing the required features.

Training framework architecture

As mentioned earlier, previous studies have predominantly focused on selecting features independently of the training model. In traditional classification systems, a feature selection technique is often applied, and the selected features are then used across all algorithms to classify diseases. However, this approach can lead to varying performance for each model, depending on the algorithm used and the representation of the selected features. Specific algorithms may underperform because the chosen features may not be the most suitable for that particular algorithm. To address this feature selection challenge, this subsection proposes and justifies a stand-alone platform for diagnosing ASD. The platform encompasses the training framework architecture, testing framework architecture, and real-time diagnosis platform. Figure 2 illustrates the complete training framework architecture, with each section detailed. The entire process is repeated for all selected algorithms.

Testing framework architecture

Figure 3 illustrates the execution of a simulated test on an unseen portion of the dataset. The testing process involves data preparation, similar to the training process. The prepared data are then passed to a script that performs predictions using the 10 pretrained models. A voting process determines the final decision based on the highest probability (Parikh et al., 2019). However, in the scenario where five models predict “affected” and five models predict “healthy,” the patient is considered to have ASD. It is important to note that, since we are dealing with a clinical condition, it is highly recommended that the patient consult a doctor for further examination and diagnosis.
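As a rough illustration, the three preprocessing stages (encoding, class-mean imputation, min-max scaling) and the tie-breaking vote over the fold models can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`encode_binary`, `impute_class_mean`, `min_max_scale`, `vote`), the toy values, and the use of hard class labels in the vote are hypothetical choices made for illustration only.

```python
from statistics import mean


def encode_binary(values, first_label):
    """Map a two-valued nominal attribute to 1 and 2 (e.g. male -> 1, female -> 2).

    Missing entries (None) are passed through unchanged.
    """
    return [None if v is None else (1 if v == first_label else 2) for v in values]


def impute_class_mean(column, labels):
    """Replace missing entries (None) with the mean of the observed values
    that share the same class label.

    Assumes every class has at least one observed value; otherwise
    statistics.mean raises StatisticsError.
    """
    class_means = {}
    for label in set(labels):
        observed = [v for v, y in zip(column, labels) if y == label and v is not None]
        class_means[label] = mean(observed)
    return [class_means[y] if v is None else v for v, y in zip(column, labels)]


def min_max_scale(column):
    """Scale a numeric column to the interval [0, 1] (min-max normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def vote(predictions, positive="affected", negative="healthy"):
    """Majority vote over the fold models; a tie is resolved as positive."""
    n_pos = sum(1 for p in predictions if p == positive)
    n_neg = len(predictions) - n_pos
    return positive if n_pos >= n_neg else negative


# Toy data (hypothetical): four records, one missing age value.
labels = ["affected", "affected", "healthy", "healthy"]
sex = encode_binary(["m", "f", "f", "m"], first_label="m")  # [1, 2, 2, 1]
age = impute_class_mean([4, None, 6, 10], labels)           # [4, 4, 6, 10]
age_scaled = min_max_scale(age)                             # all values in [0, 1]

# Five of ten models say "affected": the tie goes to "affected".
decision = vote(["affected"] * 5 + ["healthy"] * 5)
```

Resolving ties toward "affected" trades some specificity for sensitivity, which is consistent with the recommendation that a flagged patient be referred for clinical examination.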


Figure 2: Training framework architecture (Alenizi and Al-Karawi, 2023c).

Figure 3: Testing framework architecture (Alenizi and Al-Karawi, 2023c).

CLASSIFICATION ALGORITHMS

Support vector machine

SVM is a supervised machine-learning technique for classification and regression tasks. It is a practical approach to solving pattern recognition problems. One notable advantage of SVM is its ability to mitigate overfitting issues. By establishing a decision boundary, SVM effectively segregates classes (Huang et al., 2018).

Figure 4: An SVM classifier (Alenizi and Al-Karawi, 2023c). Abbreviation: SVM, support vector machine.

Naïve Bayes

The NB classifier is a supervised learning algorithm that operates as a generative model based on joint probability distribution. It makes use of independence assumptions to simplify computations. Compared to SVM and ME models, NB exhibits faster training times. It calculates the posterior probability for a dataset by combining prior probability and likelihood estimations (John and Langley, 2013).

Logistic regression

Logistic regression (LR) is a regression technique for analyzing binary dependent variables. Its output is constrained to lie between 0 and 1, making it suitable for binary classification tasks. LR is beneficial for datasets with continuous values. It enables examining the relationship between a single dependent binary variable and one or more nominal or ordinal variables. The relationship is typically represented using the sigmoid function.

K-nearest neighbor

Figure 5: K-nearest neighbor (Alenizi and Al-Karawi, 2023c).

K-nearest neighbor (KNN) is a supervised learning method known for its simplicity. It is employed in both classification


and regression tasks. The underlying principle of KNN is that similar data points tend to be located close to each other. The “K” in KNN refers to the number of neighboring points to consider. Selecting an appropriate “K” value is crucial in minimizing errors. KNN relies on similarity, measured by distance, closeness, or proximity. The most widely used distance metric is the Euclidean distance.

Random forest classifier

The RFC is a versatile algorithm capable of handling classification, regression, and other tasks (Alam and Vuong, 2013). It operates by generating multiple decision trees using random subsets of the data. Once predictions are obtained from each tree, the final solution is determined by employing a voting mechanism. The prediction that receives the highest number of votes is selected as the best solution. This voting-based approach allows RFC to leverage the collective wisdom of multiple decision trees, resulting in improved accuracy and flexibility.
The random forest algorithm creates many decision trees from a randomly selected section of the training dataset, as shown in Figure 6. The votes from several decision trees are then averaged to establish the final class of test objects (Alam and Vuong, 2013).

Figure 6: Random forest classification (Alenizi and Al-Karawi, 2023c).

Decision tree classification method

The cornerstone of a decision tree is the decision-making process, which has outstanding accuracy and stability and can be seen as a tree. In Figure 7, a decision tree is displayed (Song and Ying, 2015).

Figure 7: Decision tree classification method (Alenizi and Al-Karawi, 2023c).

RESULTS AND DISCUSSION

The performance of the classification model is evaluated using metrics such as specificity, sensitivity, and accuracy, which are derived from the confusion matrix and classification report. These metrics provide insights into the model’s precision in predicting true negatives, positives, and overall accuracy. The model’s effectiveness depends on the accuracy of its training, as it directly influences the quality of the results obtained from these performance measures.

Performance evaluation

Evaluating the performance of a classification model is crucial to assess its effectiveness in achieving a desired outcome. Performance evaluation metrics quantitatively assess the model’s performance on a test dataset. Selecting appropriate metrics to evaluate the model’s performance accurately is essential. Several metrics can be utilized, including the confusion matrix, accuracy, specificity, sensitivity, and more. The following formulas are commonly employed to calculate these performance metrics, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

\[ \text{Specificity} = \frac{TN}{TN + FP} \tag{2} \]

\[ \text{True Positive Rate or Sensitivity} = \frac{TP}{TP + FN} \tag{3} \]

\[ \text{Accuracy} = \frac{TP + TN}{TN + TP + FP + FN} \tag{4} \]

Table 3: Elements of a confusion matrix.

 | Predictive ASD values |
Actual ASD values | True positive (TP) | False positive (FP)
 | False negative (FN) | True negative (TN)

Abbreviation: ASD, autism spectrum disorder.

The experimental results demonstrate the application of various machine-learning algorithms with feature selection for ASD screening data in children. All features were selected to evaluate the predictive models’ specificity, sensitivity, and accuracy. The specific implementations for each algorithm are as follows:
• NB: Gaussian NB algorithm was used.
• SVM: Radial basis function (RBF) kernel with a gamma value of 0.1 was utilized.
• KNN: K = 5 neighbors were considered.
• ANN: Adam optimizer with a learning rate of 0.01 and 100 epochs was employed.
• Random forest and decision tree: the random forest and decision tree algorithms were used.

Table 4: Performance measures for all machine-learning classifiers with the three datasets.

Classifier | Specificity | Sensitivity | Accuracy (%)
Logistic regression | 0.9375 | 0.9696 | 96.69
SVM | 0.9474 | 0.88888 | 98.11
Naïve Bayes | 0.9361 | 0.9676 | 96.24
KNN | 0.9148 | 0.9687 | 95.65
Random forest | 1.00 | 0.9933 | 99.75
Decision tree | 0.9887 | 0.98877 | 97.47

Abbreviations: KNN, K-nearest neighbor; SVM, support vector machine.

Figure 8: Classification performance.

Figure 9: Learning curve of naïve Bayes.

Figure 10: Learning curve of SVM. Abbreviation: SVM, support vector machine.

Figure 12: Learning curve of logistic regression.

Figure 13: Learning curve of random forest.

Figure 14: Learning curve of decision tree.
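To make the evaluation concrete, the sketch below classifies toy points with a 5-nearest-neighbor rule using Euclidean distance and then computes specificity, sensitivity, and accuracy from confusion-matrix counts, using the standard definitions TN/(TN + FP), TP/(TP + FN), and (TP + TN)/(TP + TN + FP + FN). It is a minimal illustration in plain Python, not the code used in the study; the function names (`knn_predict`, `confusion_counts`, `metrics`) and all data values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training
    points, with nearness measured by Euclidean distance."""
    nearest = sorted(zip(train_X, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Specificity, sensitivity, and accuracy from confusion-matrix counts.

    Assumes at least one actual negative and one actual positive,
    so that neither denominator is zero.
    """
    return {
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy training set (hypothetical): class 0 near the origin, class 1 near (5, 5).
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
           (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
train_y = [0] * 6 + [1] * 6

near_origin = knn_predict(train_X, train_y, (0.4, 0.4))  # -> 0
near_five = knn_predict(train_X, train_y, (5.4, 5.4))    # -> 1

# Hand-made labels (hypothetical) with one false negative and one false positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)  # (3, 5, 1, 1)
scores = metrics(tp, tn, fp, fn)
```

With these toy labels the sensitivity is 0.75 and the accuracy is 0.8, which shows how a classifier can look accurate overall while still missing a quarter of the positive cases.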

The evaluation of different machine-learning models on the ASD diagnosis dataset resulted in accuracy ranging from 95.65 to 99.75% on the original dataset. The KNN classifier with K = 5 achieved the lowest accuracy of 95.65%, while the random forest model achieved the highest prediction accuracy of 99.75% on the original dataset. Additionally, the learning curves of all the machine-learning algorithms provide further insights into the performance of the prediction models.

Figure 11: Learning curve of KNN. Abbreviation: KNN, K-nearest neighbor.

CONCLUSION

This study presents a machine-learning framework designed to detect ASD in individuals across various age groups, including toddlers, children, adolescents, and adults. Our findings demonstrate the effectiveness of predictive models based on machine-learning techniques as valuable tools for accomplishing this task. As a result, the prediction models proposed in this study can serve as an alternative or supportive tool for healthcare professionals in accurately identifying ASD cases across various age groups. The experimental analysis conducted in this research provides valuable insights for healthcare practitioners, enabling them to consider the most significant features when screening for ASD cases. It is important to note that the limitation of this study lies in the insufficient amount of data to develop a generalized model encompassing all stages of ASD. A large dataset is vital for constructing an appropriate model, and the dataset we used for this analysis did not contain enough cases.
On the other hand, our research findings have contributed to creating an automated model that can assist medical professionals in diagnosing autism in youngsters. In future work, we aim to gather a larger dataset related explicitly to ASD in order to increase generalization and to construct a more comprehensive prediction model applicable to individuals of any age. This will further enhance ASD detection and facilitate improved identification of other neurodevelopmental disorders.

CONFLICTS OF INTEREST

The authors declare no conflicts of interest in association with the present study.

ACKNOWLEDGMENTS

The authors extend their appreciation to the King Salman Centre for Disability Research for funding this work through Research Group no KSRG-2023-556.

REFERENCES
Al Banna M.H., Ghosh T., Taher K.A., Kaiser M.S. and Mahmud M. (2020). Allison C., Auyeung B. and Baron-Cohen S. (2012). Toward brief “red
A monitoring system for patients of autism spectrum disorder using flags” for autism screening: the short autism spectrum quotient and
artificial intelligence. In: Proceedings of the 13th International Con- the short quantitative checklist in 1,000 cases and 3,000 controls. J.
ference on Brain Informatics, BI 2020, Padua, Italy, 19 September Am. Acad. Child Adolesc. Psychiatry, 51(2), 202-212.e7.
2020, Springer. Bastiaansen J.A., Thioux M., Nanetti L., van der Gaag C., Ketelaars C.,
Alam M.S. and Vuong S.T. (2013). Random forest classification for detecting Minderaa R., et al. (2011). Age-related increase in inferior frontal
android malware. In: 2013 IEEE International Conference on Green gyrus activity and social functioning in autism spectrum disorder.
Computing and Communications and IEEE Internet of Things and Biol. Psychiatry, 69(9), 832-838.
IEEE Cyber, Physical and Social Computing, IEEE, Beijing, China. Berrar D. (2019). Encyclopedia of Bioinformatics and Computational Biol-
Alenizi A.S. and Al-Karawi K.A. (2022). Cloud computing adoption-based ogy. Cross-Validation, Academic Press, Oxford.
digital open government services: challenges and barriers. In: Pro- Bone D., Goodwin M.S., Black M.P., Lee C.C., Audhkhasi K. and Narayanan
ceedings of Sixth International Congress on Information and Com- S. (2015). Applying machine learning to facilitate autism diagnostics:
munication Technology, Springer. pitfalls and promises. J. Autism Dev. Disord., 45, 1121-1136.
Alenizi A.S. and Al-Karawi K.A. (2023a). Effective Biometric Technology Bone D., Bishop S.L., Black M.P., Goodwin M.S., Lord C. and Narayanan
Used with Big Data. In: Proceedings of Seventh International Con- S.S. (2016). Use of machine learning to improve autism screening
gress on Information and Communication Technology, Springer. and diagnostic instruments: effectiveness, efficiency, and multi-­
Alenizi A.S. and Al-Karawi K.A. (2023b). Internet of things (IoT) adop- instrument fusion. J. Child Psychol. Psychiatry, 57(8), 927-937.
tion: challenges and barriers. In: Proceedings of Seventh Interna- Cruz J.A. and Wishart D.S. (2006). Applications of machine learn-
tional Congress on Information and Communication Technology, ing in cancer prediction and prognosis. Cancer Inform., 2,
Springer. 117693510600200030.
Alenizi A.S. and Al-Karawi K.A. (2023c). Machine learning approach for Deshpande G., Libero L.E., Sreenivasan K.R., Deshpande H.D. and Kana
diabetes prediction. In: International Congress on Information and R.K. (2013). Identification of neural connectivity signatures of autism
Communication Technology, Springer. using machine learning. Front. Hum. Neurosci., 7, 670.
Al-Karawi K.A. (2021). Mitigate the reverberation effect on the speaker Duda M., Ma R., Haber N. and Wall D.P. (2016). Use of machine learning
verification performance using different methods. Int. J. Speech Tech- for behavioral distinction of autism and ADHD. Transl. Psychiatry,
nol., 24(1), 143-153. 6(2), e732.
Al-Karawi K.A. (2023). Face mask effects on speaker verification perfor- Frith U. and Happé F. (2005). Autism spectrum disorder. Curr. Biol., 15(19),
mance in the presence of noise. Multimed. Tools Appl., 82, 1-14. R786-R790.
Al-Karawi K.A. and Ahmed S.T. (2021). Model selection toward robustness Gopal Krishna Patro S. and Sahu K.K. (2015). Normalization: a preprocess-
speaker verification in reverberant conditions. Multimed. Tools Appl., ing stage. arXiv e-prints, p. arXiv:1503.06462.
80, 36549-36566. Hasan S.M., Uddin M.P., Mamun M.A., Sharif M.I., Ulhaq A. and
Al-Karawi K.A. and Mohammed D.Y. (2023). Using combined features to Krishnamoorthy G. (2022). A machine learning framework for ear-
improve speaker verification in the face of limited reverberant data. ly-stage detection of autism spectrum disorders. IEEE Access, 11,
Int. J. Speech Technol., 26, 789-799. 15038-15057.

Journal of Disability Research 2024


M. M. Abdelwahab et al.: ASD Prediction Using Machine Learning 9

Heinsfeld A.S., Franco A.R., Craddock R.C., Buchweitz A. and Meneguzzi Schankweiler P., Raddatz D., Ellrott T. and Hauck Cirkel C. (2023). Corre-
F. (2018). Identification of autism spectrum disorder using deep lates of food addiction and eating behaviours in patients with morbid
learning and the ABIDE dataset. Neuroimage Clin., 17, 16-23. obesity. Obesity Facts, 16, 465-474.
Huang S., Cai N., Pacheco P.P., Narrandes S., Wang Y. and Xu W. (2018). Song Y.-Y. and Ying L. (2015). Decision tree methods: applications for clas-
Applications of support vector machine (SVM) learning in cancer sification and prediction. Shanghai Arch. Psychiatry, 27(2), 130.
genomics. Cancer Genomics Proteomics, 15(1), 41-51. Suhas G., Naveen N., Nagabanu M., Mario Edwin R. and Nithish Kumar R.
John G.H. and Langley P. (2013). Estimating continuous distributions in (2021). A survey on autism spectrum disorder (ASD) using machine
Bayesian classifiers. arXiv preprint, arXiv:1302.4964. learning. Adv. Innov. Comput. Progr. Lang., 3(2).
Khan N.S., Muaz M.H., Kabir A. and Islam M.N. (2017). Diabetes predict- Thabtah F.F. (2017a). Autistic spectrum disorder screening data for adolescent.
ing mhealth application using machine learning. In: 2017 IEEE Inter- Thabtah F. (2017b). Autism spectrum disorder screening: machine learning
national WIE Conference on Electrical and Computer Engineering adaptation and DSM-5 fulfillment. In: Proceedings of the 1st Interna-
(WIECON-ECE), IEEE. tional Conference on Medical and health Informatics 2017.
Kosmicki J., Sochat V., Duda M. and Wall D.P. (2015). Searching for a Thabtah F. (2019). Machine learning in autistic spectrum disorder behavio-
minimal set of behaviors for autism detection through feature selec- ral research: a review and ways forward. Inform. Health Soc. Care,
tion-based machine learning. Transl. Psychiatry, 5(2), e514. 44(3), 278-297.
Lakhan R., Agrawal A. and Sharma M. (2020). Prevalence of depression, Thabtah F. and Peebles D. (2020). A new machine learning model based on
anxiety, and stress during COVID-19 pandemic. J. Neurosci. Rural induction of rules for autism detection. J. Health Inform., 26(1), 264-286.
Pract., 11(04), 519-525. Tripathy H.K., Mallick P.K. and Mishra S. (2021). Application and evalu-
Mao K.Z. (2004). Orthogonal forward selection and backward elimina- ation of classification model to detect autistic spectrum disorders in
tion algorithms for feature subset selection. IEEE Trans. Syst. Man children. Int. J. Comput. Appl. Technol., 65(4), 368-377.
Cybern., 34(1), 629-634. Vaishali R. and Sasikala R. (2018). A machine learning based approach to clas-
Mythili M. and Shanavas A. (2014). A study on Autism spectrum disorders sify autism with optimum behaviour sets. Int. J. Eng. Technol., 7(4), 18.
using classification techniques. Int. J. Soft Comput. Eng., 4(5), 88-91. van den Bekerom B. (2017). Using machine learning for detection of autism
Omar K.S., Mondal P., Khan N.S., Rizvi M.R.K. and Islam M.N. (2019). spectrum disorder. In: Proceedings of the 20th Student Conference IT.
A machine learning approach to predict autism spectrum disorder. In: Wall D.P., Dally R., Luyster R., Jung J.Y. and Deluca T.F. (2012a). Use of
2019 International Conference on Electrical, Computer and Commu- artificial intelligence to shorten the behavioral diagnosis of autism.
nication Engineering (ECCE), IEEE. PLoS One, 7(8), e43855.
Parikh M.N., Li H. and He L. (2019). Enhancing diagnosis of autism with Wall D.P., Kosmicki J., DeLuca T.F., Harstad E. and Fusaro V.A. (2012b).
optimized machine learning models and personal characteristic data. Use of machine learning to shorten observation-based screening and
Front. Comput. Neurosci., 13, 9. diagnosis of autism. Transl. Psychiatry, 2(4), e100-e100.
Pratap A., Kanimozhiselvi C.S., Vijayakumar R. and Pramod K.V. (2014). Wohlrab L. and Fürnkranz J. (2011). A review and comparison of strategies
Soft computing models for the predictive grading of childhood for handling missing values in separate-and-conquer rule learning. J.
Autism—a comparative study. Int. J. Soft Comput. Eng., 4(3), 64-67. Intell. Inf. Syst., 36, 73-98.
