
Hindawi

Complexity
Volume 2020, Article ID 1357853, 9 pages
https://doi.org/10.1155/2020/1357853

Research Article
Multichannel Deep Attention Neural Networks for the
Classification of Autism Spectrum Disorder Using
Neuroimaging and Personal Characteristic Data

Ke Niu,1,2 Jiayang Guo,3 Yijie Pan,4 Xin Gao,5 Xueping Peng,2 Ning Li,1 and Hailong Li6

1 Computer School, Beijing Information Science and Technology University, Beijing 100101, China
2 CAI, School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, Australia
3 Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
4 Ningbo Institute of Information Technology Application, CAS, Beijing, China
5 Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
6 Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA

Correspondence should be addressed to Jiayang Guo; [email protected], Xueping Peng; [email protected], and
Hailong Li; [email protected]

Received 12 June 2019; Revised 1 January 2020; Accepted 4 January 2020; Published 31 January 2020

Guest Editor: Gonzalo Farias

Copyright © 2020 Ke Niu et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Autism spectrum disorder (ASD) is a developmental disorder that impacts more than 1.6% of children aged 8 across the United States. It
is characterized by impairments in social interaction and communication, as well as by a restricted repertoire of activity and interests. The
current standardized clinical diagnosis of ASD remains subjective, relying mainly on behavior-based tests. The diagnostic process for ASD is not only time consuming but also costly, placing a tremendous financial burden on patients' families.
Therefore, automated diagnosis approaches are an attractive solution for earlier identification of ASD. In this work, we set out to
develop a deep learning model for automated diagnosis of ASD. Specifically, a multichannel deep attention neural network (DANN) was
proposed by integrating multiple layers of neural networks, attention mechanism, and feature fusion to capture the interrelationships in
multimodality data. We evaluated the proposed multichannel DANN model on the Autism Brain Imaging Data Exchange (ABIDE)
repository with 809 subjects (408 ASD patients and 401 typical development controls). Our model achieved a state-of-the-art accuracy of
0.732 on ASD classification by integrating three scales of brain functional connectomes and personal characteristic data, outperforming
multiple peer machine learning models in a k-fold cross validation experiment. Additional k-fold and leave-one-site-out cross validation
were conducted to test the generalizability and robustness of the proposed multichannel DANN model. The results show promise for
deep learning models to aid the future automated clinical diagnosis of ASD.

1. Introduction

Autism spectrum disorder (ASD) has been estimated to occur in more than 1.6% of children aged 8 across the United States [1]. As a chronic neurological condition, ASD is characterized by impairments in social interaction and communication, as well as by a restricted repertoire of activity and interests [2–5]. Patients with ASD exhibit different levels of impairment, ranging from above-average ability to intellectual disability. In neuroscience, ASD remains a formidable challenge due to its high prevalence, complexity, and substantial heterogeneity, which require multidisciplinary efforts [6–8]. Although clinical therapies have been developed to treat the symptoms, the diagnosis of ASD remains a challenging task. Currently, behavior-based testing is the standard clinical method for diagnosing ASD [9]. However, the diagnostic process for ASD is not only time consuming but also costly [10], resulting in a tremendous financial burden for patients' families. Meanwhile, because ASD is a lifetime condition, patients may have difficulties in normal socialization and working environments, increasing the overall social costs. Therefore, an automated diagnosis approach is desirable for earlier identification of ASD.

Machine learning is a promising tool for investigating the replicability of patterns across larger, more heterogeneous datasets [11–13]. For automated diagnosis of ASD, personal characteristic (PC) data, such as intelligence quotient (IQ) and the Social Responsiveness Scale (SRS) score, have been adopted in several studies [14–16]. In the study of ASD, IQ is a type of standard score derived from several standardized tests designed to assess human intelligence, and the SRS score comes from a 65-item standardized questionnaire regarding behaviors that are associated with ASD [17]. ASD is highly associated with intellectual disability, which is mainly measured by IQ. Meanwhile, some studies [18, 19] indicate that IQ discrepancy marks a meaningful phenotype in ASD. In this way, IQ becomes an important biomarker for classifying ASD.

Neuroimaging data have also been investigated to explore ASD biomarkers in recent decades. To facilitate the ASD research community, the Autism Brain Imaging Data Exchange (ABIDE), an international collaborative project, has collected data from over 1,000 subjects (e.g., structural MRI (sMRI), resting-state functional MRI (rs-fMRI), and PC data) and made the whole database publicly available. This provides a common platform to test hypotheses, search for key biomarkers, and develop advanced statistical and machine learning algorithms. For example, Ghiassian et al. [20] proposed an automated classifier by combining the histogram of oriented gradients approach for feature extraction from sMRI and rs-fMRI data with support vector machines (SVMs) for decision making. Their method was tested on the ABIDE dataset and achieved 65.0% accuracy on a hold-out set. Later, Sen et al. [21] developed a LEFMS learner, which applies a sparse autoencoder to extract features from sMRI and spatial nonstationary independent components from rs-fMRI data; an SVM was then utilized to classify ASD and improved accuracy by 0.042. Katuwal et al. [22] applied a random forest classifier to classify ASD and achieved an AUC of 0.61; adding verbal IQ and age to the morphometric features improved the AUC to 0.68. By introducing a hypergraph learning technique, Zu et al. [23] proposed a novel learning method to discover complex connectivity biomarkers that go beyond the widely used region-to-region connections in conventional brain network analysis.

Deep learning has had a profound impact on many data analytic applications, such as speech recognition, image classification, computer vision, and natural language processing [24]. Based on data-driven feature construction, deep learning provides a new direction for data analytic modelling. Over the past few years, an increasing body of literature has confirmed the success of feature construction using deep learning methods. Deep learning has been demonstrated to outperform traditional machine learning algorithms on numerous recognition and classification tasks [24–29], which has inspired researchers in the ASD community to apply deep learning approaches to ASD classification. Earlier, deep neural networks (DNNs) were applied to identify ASD patients using rs-fMRI [26]; that model achieved 70% accuracy by using the functional connectivity (FC) matrix as features for model training.

Kong et al. [27] constructed individual functional brain networks using the rs-fMRI data from 182 subjects of NYU Langone Medical Center, a data site within the ABIDE repository. FC features were used to represent the networks of all subjects and were further ranked using the F-score. Then, a stacked sparse autoencoder-based DNN model was developed. Significant performance improvement was achieved when comparing the proposed method with two existing algorithms.

More recently, ASD-DiagNet, a joint learning procedure using an autoencoder and a single-layer perceptron, was presented [28]. A data augmentation strategy was also designed for the FC features of functional brain networks, based on linear interpolation of available feature vectors, to ensure the robust training of ASD-DiagNet. Evaluated on 1035 subjects from 17 different sites of the ABIDE repository, ASD-DiagNet achieves 70.1% accuracy, 67.8% sensitivity, and 72.8% specificity in 10-fold cross validation. In the model evaluation on individual data centers, ASD-DiagNet outperformed other state-of-the-art methods and increased the accuracy performance by up to 20%, with a maximum accuracy of 80%.

In this work, we aim to develop a novel deep learning model for automated diagnosis of ASD. Specifically, we proposed a multichannel deep attention neural network, called DANN, by integrating multiple layers of neural networks, an attention mechanism, and feature fusion to capture the interrelationships in multimodality data (functional neuroimaging data and PC data) to distinguish ASD patients from typical development controls (TDCs). Attention mechanism-based learning is a type of deep learning and a recent trend for understanding which parts of historical information weigh more in predicting diseases [30, 31]. Taking advantage of the large heterogeneous dataset from ABIDE, multiscale brain functional connectomes and PC data were obtained as the features. We systematically evaluated the diagnostic power of our multichannel DANN on ASD classification and compared the performance of the proposed model with peer machine learning models.

The rest of the paper is organized as follows. Section 2 describes the ASD data and the multichannel deep attention neural network. The experimental setup is shown in Section 3, followed by the experimental results and discussion in Section 4. Finally, the conclusion of this work is described in Section 5.

2. Materials and Methods

2.1. Subjects. We collected preprocessed rs-fMRI and PC data from 809 subjects from the publicly accessible ABIDE repository, including 408 ASD subjects and 401 TDC subjects. Detailed demographic information of the subjects is listed in Table 1.

Table 1: Demographic information of 809 subjects from ABIDE.


Type Number Gender (M/F) Age FIQ PIQ VIQ
ASD 408 330/78 16.47 ± 6.70 110.63 ± 12.67 107.85 ± 13.41 111.17 ± 13.31
TDC 401 352/49 16.80 ± 7.80 105.28 ± 16.64 105.10 ± 17.10 104.60 ± 17.81
p value — 0.017 0.785 <0.001 0.003 <0.001
ASD: autism spectrum disorder; TDC: typical development control; M: male; F: female; VIQ: the verbal IQ; PIQ: the performance IQ; FIQ: the full-scale IQ.
The values are denoted as mean and standard deviation.

The incidence of ASD between male and female subjects is significantly different, and thus the majority of the subjects in the ABIDE dataset are male. There is no significant difference between the ages of the ASD and TDC groups. All three IQ scores showed significant differences between the two groups. The variables gender, age, and the three IQ scores were later used as PC data in our ASD classification experiments.

2.2. Data Preprocessing. Each rs-fMRI scan was preprocessed using the Configurable Pipeline for the Analysis of Connectomes (CPAC) preprocessing pipeline, which includes slice timing correction, motion realignment, and intensity normalization. Nuisance variable regression was implemented through bandpass filtering and global signal regression strategies to clean confounding variations introduced by heartbeats and respiration, head motion, and low-frequency scanner drifts. Furthermore, boundary-based rigid body registration and FMRIB's linear and nonlinear image registration tools were used to register functional to anatomical images. Then, both functional and anatomical images were normalized to template space (MNI 152). Three scales of brain functional connectomes were extracted in this work. Mean blood oxygen-level dependent (BOLD) time-series signals were calculated for three sets of regions of interest (ROIs), i.e., atlases, including the Automated Anatomical Labeling (AAL) atlas, the Harvard-Oxford (HO) atlas, and Craddock 200 (CC200). The weights of functional brain connectivity were defined using Pearson's correlation coefficient between any pair of ROIs. For the AAL atlas, each subject was represented by a 90 × 90 FC adjacency matrix, symmetric along the diagonal, in which each entry represents the brain connectivity between a pair of ROIs. Similarly, each rs-fMRI dataset was also represented by 110 × 110 and 200 × 200 symmetric FC adjacency matrices using the HO and CC200 atlases, respectively. In addition, from the 809 subjects, we obtained five PC variables, including sex, handedness, full-scale IQ (FIQ), verbal IQ (VIQ), and performance IQ (PIQ).
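
To make this feature-construction step concrete, the following minimal sketch (hypothetical function and variable names; it assumes the mean ROI time series have already been extracted) computes a Pearson-correlation FC matrix for the AAL atlas and vectorizes its upper triangle into the 90 × 89/2 = 4005 features that feed one input channel of the model:

import numpy as np

def fc_features(roi_timeseries):
    # roi_timeseries: array of shape (n_timepoints, n_rois), e.g., (T, 90) for the AAL atlas
    fc = np.corrcoef(roi_timeseries.T)      # 90 x 90 Pearson-correlation FC adjacency matrix
    iu = np.triu_indices_from(fc, k=1)      # upper triangle, excluding the diagonal
    return fc[iu]                           # 4005-dimensional feature vector for the AAL channel

# example with simulated data: 200 time points, 90 AAL ROIs
aal_features = fc_features(np.random.randn(200, 90))
print(aal_features.shape)                   # (4005,)
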
2.3. Multichannel Deep Attention Neural Network

2.3.1. Overview Structure. An overview of the multichannel DANN is given in Figure 1. It consists of blocks of multichannel inputs, multilayer perceptrons (MLPs), self-attention, fusion, and aggregation. The various components are described in the following sections.

Figure 1: A DANN structure for ASD classification in this study. The inputs are the AAL FC (90 × 90), HO FC (110 × 110), and CC200 FC (200 × 200) matrices and demographic data; each FC channel passes through an MLP, self-attention, and fusion, and the outputs are aggregated through a 32-unit "relu" dense layer and a sigmoid output to separate ASD from TDC.

2.3.2. MLP. The MLP block is composed of 5 layers: one dropout layer and four dense layers. The details of the block are shown in Figure 2. The dropout layer, which prevents overfitting during model training, is applied to the input data, e.g., the AAL FC (input size of 4005); the white circles in Figure 2 denote units dropped according to the dropout probability. The dropout layer is followed by four dense layers, whose hidden units are 1024, 512, 128, and 32, respectively, with corresponding activation functions "elu," "tanh," "tanh," and "relu."

Figure 2: Detailed MLP block in DANN structure (input → 1024 → 512 → 128 → 32).
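
A minimal Keras sketch of one such MLP channel is given below. The layer sizes, dropout placement, and activation functions follow the description above; the dropout rate itself is not reported in the paper, so the value here is only an assumption.

from keras.layers import Input, Dropout, Dense
from keras.models import Model

def mlp_channel(input_dim, dropout_rate=0.5):      # dropout rate is assumed, not reported
    x_in = Input(shape=(input_dim,))
    x = Dropout(dropout_rate)(x_in)                 # dropout applied directly to the FC input vector
    x = Dense(1024, activation='elu')(x)
    x = Dense(512, activation='tanh')(x)
    x = Dense(128, activation='tanh')(x)
    o_d = Dense(32, activation='relu')(x)           # 32-dimensional channel output o_d
    return Model(inputs=x_in, outputs=o_d)

aal_mlp = mlp_channel(4005)                         # AAL FC channel (90 x 90 matrix -> 4005 features)
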

2.3.3. Self-Attention. Attention was proposed to compute an alignment score between elements from two sources [32]. In particular, given an input FC adjacency matrix, which can be transformed into an FC adjacency sequence x = [x_1, x_2, ..., x_d], and a representation of a query q ∈ R^d, attention [33] computes the alignment score between q and each element x_i using a compatibility function f(x_i, q). A softmax function then transforms the alignment scores [f(x_i, q)]_{i=1}^{d} into a probability distribution p(z | x, q), where z is an indicator of which element is important to q. That is, a large p(z = i | x, q) means that x_i contributes important information to q. This attention process can be formalized as

α = [f(x_i, q)]_{i=1}^{d},
p(z = i | x, q) = softmax(α).      (1)

The output s_i is the weighted element according to its importance, i.e.,

s_i = p(z = i | x, q) x_i.      (2)

Additive attention mechanisms [33, 34] are commonly used attention mechanisms in which the compatibility function f(·) is parameterized by an MLP, i.e.,

f(x_i, q) = w^T σ(W^(1) x_i + W^(2) q),      (3)

where W^(1) ∈ R^{d×d}, W^(2) ∈ R^{d×d}, and w ∈ R^d are learnable parameters, d is the dimension of x_i, and σ(·) is an activation function. In contrast to additive attention, multiplicative attention [35, 36] uses cosine similarity or the inner product as the compatibility function f(x_i, q), i.e.,

f(x_i, q) = ⟨W^(1) x_i, W^(2) q⟩.      (4)

In practice, although additive attention is expensive in time cost and memory consumption, it usually achieves better empirical performance on downstream tasks.

Self-attention [37, 38] explores the importance of each feature to the entire FC given a specific task. In particular, q is removed from the common compatibility function, which is formally written as

f(x_i) = w^T σ(W^(1) x_i),
α = [f(x_i)]_{i=1}^{d},      (5)
p(z = i | x) = softmax(α).

The output s_i is the weighted element according to its importance, i.e.,

s_i = p(z = i | x) x_i.      (6)
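
The NumPy sketch below is only meant to make the query-free scoring and weighting of equations (5) and (6) concrete; in the actual model the parameters W^(1) and w are learned jointly with the rest of the network, whereas here they are randomly initialized and all names are illustrative.

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / e.sum()

def self_attention(X, W1, w, sigma=np.tanh):
    # X: (n, d) sequence of elements x_i (e.g., the rows of an FC adjacency matrix)
    # W1: (d, d) and w: (d,) play the roles of W^(1) and w in equation (5); sigma is the activation
    scores = np.array([w @ sigma(W1 @ xi) for xi in X])   # alignment score f(x_i) for each element
    p = softmax(scores)                                   # p(z = i | x), equation (5)
    S = p[:, None] * X                                    # weighted elements s_i of equation (6)
    return S, p

# toy example: a 90 x 90 AAL FC matrix treated as 90 elements of dimension 90
rng = np.random.RandomState(0)
X = rng.randn(90, 90)
S, p = self_attention(X, W1=0.01 * rng.randn(90, 90), w=0.01 * rng.randn(90))
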

2.3.4. Fusion. The fusion output u is obtained by combining the outputs of two dense layer blocks, which can capture the correlation between the two types of spaces. The combination is accomplished by a fusion gate, as shown in Figure 3, i.e.,

F = sigmoid(W^(f1) o_d1 + W^(f2) o_d2 + b^(f)),
u = F ⊙ o_d1 + (1 − F) ⊙ o_d2,      (7)

where W^(f1), W^(f2) ∈ R^{d_o}, d_o is the dimension of the output o_d, and b^(f) ∈ R are the learnable parameters of the fusion gate.

Figure 3: Detailed fusion gate in DANN structure.
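
A minimal Keras sketch of this fusion gate follows; the paper does not publish its implementation, so realizing the two weight matrices as a single dense layer over the concatenated inputs, and all names, are assumptions that simply reproduce equation (7).

from keras.layers import Input, Dense, Concatenate, Lambda
from keras.models import Model

def fusion_gate(dim):
    o_d1 = Input(shape=(dim,))   # output of one dense-layer (MLP) block
    o_d2 = Input(shape=(dim,))   # output of the other dense-layer block
    # F = sigmoid(W_f1 o_d1 + W_f2 o_d2 + b_f): one dense layer over the concatenation
    # realizes the two weight matrices and the bias of equation (7)
    F = Dense(dim, activation='sigmoid')(Concatenate()([o_d1, o_d2]))
    # u = F * o_d1 + (1 - F) * o_d2, element-wise
    u = Lambda(lambda t: t[0] * t[1] + (1.0 - t[0]) * t[2])([F, o_d1, o_d2])
    return Model(inputs=[o_d1, o_d2], outputs=u)

gate_model = fusion_gate(32)     # the channel outputs in this work are 32-dimensional
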


2.3.5. Aggregation. To aggregate the dense layer, self-attention, and fusion blocks into a DANN, the outputs of the self-attention and fusion blocks can be concatenated, multiplied, or averaged. In our implementation, the outputs of both the self-attention blocks and the fusion blocks are concatenated, followed by a dense layer and a sigmoid layer for classification:

l_d = relu(W_d v + b_d),
Comb = sigmoid(W_c l_d + b_c),      (8)

where v is a vector of the combined outputs of both the self-attention blocks and the fusion blocks. v = [s_aal, s_ho, s_cc, u_1, u_2, u_3, Demo] represents the concatenation of the outputs s_aal, s_ho, s_cc from the self-attention blocks, u_1, u_2, u_3 from the fusion blocks, and Demo from the demographic data. A sigmoid function on the dense layer is then used for data classification.
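
The sketch below assembles this aggregation step in Keras. The 32-unit "relu" dense layer and the sigmoid output follow Figure 1 and equation (8); the dimensions of the self-attention and fusion outputs, the optimizer, and all variable names are assumptions for illustration.

from keras.layers import Input, Concatenate, Dense
from keras.models import Model

# hypothetical dimensions: three self-attention outputs, three fusion outputs, and 5 PC variables
s_aal, s_ho, s_cc = (Input(shape=(32,)) for _ in range(3))
u1, u2, u3 = (Input(shape=(32,)) for _ in range(3))
demo = Input(shape=(5,))

v = Concatenate()([s_aal, s_ho, s_cc, u1, u2, u3, demo])   # v = [s_aal, s_ho, s_cc, u1, u2, u3, Demo]
l_d = Dense(32, activation='relu')(v)                      # l_d = relu(W_d v + b_d), cf. Figure 1
comb = Dense(1, activation='sigmoid')(l_d)                 # Comb = sigmoid(W_c l_d + b_c): ASD vs. TDC

dann_head = Model(inputs=[s_aal, s_ho, s_cc, u1, u2, u3, demo], outputs=comb)
# optimizer choice is an assumption; the paper does not report the DANN training configuration
dann_head.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
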
3. Experiment Setup

3.1. Model Evaluation. We conducted a comprehensive evaluation in this study by employing the proposed multichannel DANN on the ABIDE dataset to classify the ASD subjects from the TDC subjects. Two evaluation strategies, k-fold cross validation and leave-one-site-out cross validation, were designed in our experiments. For k-fold cross validation, the whole ABIDE dataset was divided into k portions. In each repeated iteration, we randomly used one portion of the data as testing data and applied the remaining (k − 1) portions as training data. This process was repeated k times until all data had been tested once. For the leave-one-site-out cross validation, we separated the whole ABIDE dataset according to the data sites. We removed the SBL site from this experiment due to its small subject size (N = 4), which resulted in a total of 12 data sites. We used data from one site as testing data and treated the remaining data from the 11 other data sites as training data. This was repeated 12 times until data from all sites had been evaluated as testing data. Both the k-fold cross validation and leave-one-site-out experiments were repeated 50 times to understand the variability of the results, and the mean and standard deviation (SD) were calculated. Student's t-test was applied to test differences between continuous values, and the chi-square test was used for discrete values. One-way analysis of variance (ANOVA) was utilized to compare multiple conditions (i.e., multiple k-fold cross validation experiments). A p value < 0.05 was used for inferring statistical significance.
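
The leave-one-site-out protocol corresponds to scikit-learn's LeaveOneGroupOut splitter; the sketch below, with toy data and hypothetical site labels, is one way to reproduce the site-wise splits described above (it is not the authors' code).

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# toy stand-ins: X would hold the FC (+ PC) features, y the ASD/TDC labels,
# and sites the ABIDE acquisition site of each subject
rng = np.random.RandomState(0)
X = rng.randn(60, 4005)
y = rng.randint(0, 2, size=60)
sites = np.repeat(['NYU', 'UM', 'USM', 'UCLA'], 15)   # hypothetical site labels

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=sites):
    held_out = np.unique(sites[test_idx])[0]          # the single site reserved for testing
    # fit the model on X[train_idx], y[train_idx] and evaluate on the held-out site
    print(held_out, len(train_idx), len(test_idx))
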
We calculated the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for the classification by comparing the classified labels with the gold-standard labels. Then, we calculated accuracy, sensitivity, precision, F-score, and specificity by

accuracy = (TP + TN) / (TP + TN + FP + FN),
sensitivity = TP / (TP + FN),
precision = TP / (TP + FP),      (9)
F-score = 2 × (precision × sensitivity) / (precision + sensitivity),
specificity = TN / (TN + FP).
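
These definitions map directly onto counts from the confusion matrix; the small helper below (a sketch, not code from the paper) reproduces equation (9):

def classification_metrics(tp, fp, tn, fn):
    # metrics of equation (9), computed from the confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, precision, f_score, specificity

print(classification_metrics(tp=60, fp=20, tn=58, fn=22))   # toy counts
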
3.2. Peer Machine Learning Models. To compare our multichannel DANN with existing machine learning models, we also implemented random forest (RF), support vector machine (SVM), and multichannel DNN models. Each model was designed to take multimodality data as inputs.

3.2.1. Random Forest (RF). RF is one of the classic ensemble learning methods; it learns multiple decision trees to improve classification performance and control overfitting. The number of trees in the forest was optimized over the empirical values [20, 40, 60, 80, 100]. We set the maximal depth of the trees to 10.

3.2.2. Support Vector Machine (SVM). An SVM model was developed to perform ASD classification using the vectorized FC features. We applied a linear kernel and searched the margin penalty over the empirical values [0.2, 0.4, 0.6, 0.8, 1.0].
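
A sketch of how these two baselines might be configured with scikit-learn 0.20 follows; the hyperparameter grids mirror the values above, while the use of a grid-search wrapper and the variable names are assumptions, since the paper does not describe how the search was run.

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# random forest: tune the number of trees over the empirical values; maximal depth fixed at 10
rf = GridSearchCV(RandomForestClassifier(max_depth=10),
                  param_grid={'n_estimators': [20, 40, 60, 80, 100]}, cv=10)

# linear-kernel SVM: tune the margin penalty C over the empirical values
svm = GridSearchCV(SVC(kernel='linear'),
                   param_grid={'C': [0.2, 0.4, 0.6, 0.8, 1.0]}, cv=10)

# rf.fit(X_train, y_train); svm.fit(X_train, y_train)   # on the vectorized FC (+ PC) features
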
3.2.3. Deep Neural Networks (DNNs). In terms of existing deep learning models, we compared our model with a DNN model developed previously for ASD classification [26]. In brief, the compared DNN is a 5-layer network, with the input number of nodes in the input layer, followed by 1024, 512, 128, and 32 nodes in the hidden layers; the output layer contains two output units. A cross-entropy loss function was adopted, the learning rate was set to 0.0001, and 10 epochs were applied to ensure the convergence of the model.
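
For reference, a Keras sketch of this comparison DNN as just described is shown below; the layer sizes, loss, learning rate, and epoch count follow the text, whereas the hidden-layer activations and the Adam optimizer are assumptions, since they are not reported.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def build_comparison_dnn(input_dim):
    model = Sequential([
        Dense(1024, activation='relu', input_shape=(input_dim,)),   # hidden activations are assumed
        Dense(512, activation='relu'),
        Dense(128, activation='relu'),
        Dense(32, activation='relu'),
        Dense(2, activation='softmax'),                             # two output units, one per class
    ])
    # cross-entropy loss and a 0.0001 learning rate follow the text; the Adam optimizer is assumed
    model.compile(optimizer=Adam(lr=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

dnn = build_comparison_dnn(4005)
# dnn.fit(X_train, y_train_onehot, epochs=10)   # 10 training epochs, per the description above
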
3.3. Developmental Environment. The proposed DANN and the peer machine learning models were implemented in the Python 3.7 environment. To build the deep learning related models, we applied the Keras (2.2.4) package with the TensorFlow (1.13.1) backend. For the traditional models, we adopted the implementations from scikit-learn 0.20 [39]. Statistical analyses were performed using Matlab 2019b.

All the experiments were conducted on a workstation with a 10-core Intel Core i9 CPU and 64 GB RAM. Due to the high computation cost of the deep learning algorithms, we configured one GPU (Nvidia TITAN Xp, 12 GB RAM) to accelerate the training of the models.

4. Results and Discussion

4.1. Performance Comparison on the Whole ABIDE Dataset. We first compared the ASD classification performance of the proposed multichannel DANN model with multiple peer machine learning models, including RF, SVM, and the multichannel DNN. The results were calculated based on 50 repeats of 10-fold cross validation experiments using the entire ABIDE dataset. The mean and SD of the performance metrics are listed in Table 2.

Table 2: Comparison of random forest (RF), support vector machine (SVM), multichannel deep neural network (DNN), and multichannel
deep attention neural network (DANN) classifiers trained using 10-fold cross validation on the entire dataset.
Method Accuracy Sensitivity Precision F-Score Specificity
RF 0.659 ± 0.018 0.689 ± 0.106 0.656 ± 0.012 0.671 ± 0.023 0.628 ± 0.081
SVM 0.693 ± 0.059 0.713 ± 0.059 0.696 ± 0.072 0.702 ± 0.048 0.673 ± 0.113
Multichannel DNN 0.707 ± 0.027 0.673 ± 0.088 0.740 ± 0.106 0.718 ± 0.060 0.700 ± 0.067
Multichannel DANN 0.732 ± 0.024 0.745 ± 0.115 0.730 ± 0.053 0.736 ± 0.042 0.717 ± 0.101
All data are mean and standard deviation. The highest metrics were marked as bold.

The proposed multichannel DANN exhibited a significantly higher accuracy than the multichannel DNN (p = 0.01), SVM (p = 0.014), and RF (p = 0.008) models. Similarly, the multichannel DANN also had a better F-score than the multichannel DNN (p = 0.004), SVM (p < 0.001), and RF (p < 0.001) models. The sensitivity of the multichannel DANN was significantly higher than that of the multichannel DNN (p = 0.009), SVM (p = 0.015), and RF (p = 0.005) models. The specificity of the multichannel DANN was significantly higher than that of the SVM (p = 0.004) and RF (p < 0.001) models but was not significantly better than that of the multichannel DNN (p = 0.082). Since the multichannel DNN had a relatively lower sensitivity (0.673), it achieved the best mean precision in our experiments. No significant difference (p = 0.219) was found between the multichannel DNN and DANN on precision, and the multichannel DANN model still exhibited higher precision than SVM (p = 0.003) and RF (p < 0.001). Overall, the proposed multichannel DANN achieved improved ASD classification accuracy, sensitivity, F-score, and specificity among the compared machine learning models, while the multichannel DNN had the highest precision.

Inspiringly, the proposed multichannel DANN significantly outperformed the multichannel DNN on four of five performance metrics, increasing mean accuracy by 0.025, sensitivity by 0.072, F-score by 0.018, and specificity by 0.017. Although no significance was found, the precision of the proposed approach is slightly lower than that of the multichannel DNN, by 0.01. The attention mechanism in our model, as the name implies, helps the deep learning model choose which features it should pay attention to. Our model can allocate attention by adjusting the weights assigned to individual FC features. This process can decide which FC features are more important than others for the ASD classification task; in other words, it optimizes the feature selection during the learning of the deep learning model. The improved performance of DANN over DNN demonstrates the validity of the attention mechanism. The results in Table 2 also show that the multichannel DANN achieved significantly improved performance compared with the traditional SVM and RF models, which is consistent with multiple previous ASD classification studies [26, 27]. The improvement was likely due to a combination of the attention mechanism and the superior capability of deep learning models on complex data patterns, such as FC features.

4.2. Leave-One-Site-Out Cross Validation of Multichannel DANN. To test the generalizability of the proposed model on unseen data from different data sites, we performed a leave-one-site-out cross validation. Similar to k-fold cross validation, we reserved the data from one data site as testing data and trained our model on all data from the remaining 11 data sites. However, since the training data were the same across all repeats, the performances have much smaller variations than in k-fold cross validation. Table 3 shows the classification performance of our model and the number of subjects for each data site.

Table 3: Leave-one-site-out cross validation results using multichannel DANN.


Site-out Size Accuracy Sensitivity Precision F-score Specificity
TRINITY 46 0.696 ± 0.012 0.640 ± 0.012 0.762 ± 0.036 0.696 ± 0.004 0.679 ± 0.070
YALE 56 0.696 ± 0.025 0.679 ± 0.029 0.714 ± 0.032 0.691 ± 0.034 0.682 ± 0.065
STANFORD 39 0.615 ± 0.018 0.350 ± 0.025 0.778 ± 0.039 0.483 ± 0.015 0.685 ± 0.032
SDSU 36 0.694 ± 0.024 0.727 ± 0.095 0.762 ± 0.072 0.744 ± 0.059 0.705 ± 0.067
CALTECH 36 0.667 ± 0.029 0.556 ± 0.016 0.714 ± 0.029 0.625 ± 0.015 0.693 ± 0.038
UCLA 98 0.755 ± 0.015 0.795 ± 0.017 0.700 ± 0.009 0.745 ± 0.012 0.701 ± 0.019
CMU 27 0.630 ± 0.019 0.692 ± 0.044 0.600 ± 0.037 0.643 ± 0.044 0.684 ± 0.035
USM 71 0.803 ± 0.015 0.560 ± 0.028 0.824 ± 0.034 0.667 ± 0.029 0.685 ± 0.038
NYU 175 0.709 ± 0.019 0.720 ± 0.026 0.758 ± 0.027 0.738 ± 0.039 0.689 ± 0.022
PITT 56 0.696 ± 0.022 0.778 ± 0.023 0.656 ± 0.002 0.712 ± 0.027 0.717 ± 0.013
LEUVEN 29 0.621 ± 0.017 1.000 ± 0.017 0.577 ± 0.027 0.732 ± 0.028 0.674 ± 0.022
UM 126 0.684 ± 0.026 0.761 ± 0.008 0.675 ± 0.009 0.715 ± 0.008 0.671 ± 0.012
Mean 62 0.713 ± 0.022 0.712 ± 0.081 0.731 ± 0.087 0.707 ± 0.043 0.713 ± 0.057
All data are mean and standard deviation.

In the NYU data site, which contains the largest sample size, our model achieved an accuracy of 0.709 ± 0.019, a sensitivity of 0.720 ± 0.086, a precision of 0.758 ± 0.127, an F-score of 0.738 ± 0.069, and a specificity of 0.689 ± 0.072. When examining data sites with more than 40 subjects, we found that our model achieved the highest accuracy (0.803 ± 0.045) on the USM site and the best F-score (0.745 ± 0.052) on the UCLA site. These two sites contain nearly 100 subjects, so the results are very informative. We also noted that the lowest accuracy our model returned was 0.684 ± 0.026, from the UM site, suggesting that the data there may have variability that is different from other sites. Overall, our model reached a mean accuracy of 0.713 ± 0.022 and a mean F-score of 0.707 ± 0.043. This was significantly lower than the accuracy (p = 0.002) and F-score (p < 0.001) from the cross validation results in Table 2, indicating a large data variability among different data sites.

4.3. Robustness of Multichannel DANN on Varying Data Split Schemes. Next, the robustness of our DANN was further tested using varying k-fold cross validation. A classification model that is not robust may appear to perform very differently with different k. Figure 4 shows plots of the accuracy, sensitivity, precision, F-score, and specificity of the proposed DANN over k-fold cross validation strategies (k = [6, 7, 8, 9, 10]). Using one-way ANOVA, the proposed DANN exhibited no significantly different performance across the varying k-fold experiments (p = 0.082), indicating the robustness of the proposed multichannel DANN model.

Figure 4: Performance of multichannel DANN over varying data split schemes with k-fold cross validation strategies (k = [6, 7, 8, 9, 10]). Mean and standard deviation are displayed; panels (a)–(e) show accuracy, sensitivity, precision, F-score, and specificity, respectively.

4.4. Impact of Data Modality on the Classification Performance. Finally, we set out to test the performance of the multichannel DANN when different data modalities are used for ASD classification. All results were based on 50 repeats of the 10-fold cross validation experiment. Table 4 lists the performance of the multichannel DANN on varying combinations of FC data (marked as AAL, HO, and CC200) and PC data (marked as Demo). The upper part of Table 4 contains results based on both FC and PC data, while the lower part of the table focuses on FC data only. The combined FC and PC data (AAL + HO + CC + Demo) had better accuracy (p = 0.011), sensitivity (p = 0.039), and specificity (p = 0.025) than FC data alone (AAL + HO + CC), while no significant differences were observed on precision (p = 0.231) and F-score (p = 0.347). This demonstrated the predictive power of PC data.

Table 4: Comparison of multichannel DANN on different data combinations using 10-fold cross validation on the entire dataset.
Data Accuracy Sensitivity Precision F-score Specificity
AAL + HO + CC + Demo 0.732 ± 0.024 0.745 ± 0.115 0.730 ± 0.053 0.736 ± 0.042 0.717 ± 0.101
AAL + HO + Demo 0.700 ± 0.035 0.698 ± 0.068 0.701 ± 0.035 0.702 ± 0.004 0.673 ± 0.401
AAL + CC + Demo 0.703 ± 0.009 0.721 ± 0.084 0.686 ± 0.071 0.699 ± 0.067 0.697 ± 0.060
HO + CC + Demo 0.691 ± 0.018 0.696 ± 0.098 0.686 ± 0.106 0.690 ± 0.054 0.687 ± 0.065
AAL + Demo 0.666 ± 0.002 0.683 ± 0.080 0.650 ± 0.071 0.659 ± 0.006 0.679 ± 0.031
HO + Demo 0.689 ± 0.027 0.681 ± 0.078 0.696 ± 0.106 0.691 ± 0.011 0.685 ± 0.057
CC + Demo 0.692 ± 0.053 0.703 ± 0.092 0.681 ± 0.141 0.689 ± 0.043 0.704 ± 0.055
AAL + HO + CC 0.720 ± 0.062 0.696 ± 0.097 0.738 ± 0.212 0.724 ± 0.078 0.695 ± 0.056
AAL + HO 0.684 ± 0.027 0.636 ± 0.084 0.730 ± 0.018 0.699 ± 0.035 0.673 ± 0.050
AAL + CC 0.695 ± 0.018 0.683 ± 0.083 0.706 ± 0.124 0.700 ± 0.030 0.683 ± 0.052
HO + CC 0.688 ± 0.027 0.666 ± 0.086 0.711 ± 0.053 0.697 ± 0.020 0.683 ± 0.055
AAL 0.658 ± 0.009 0.641 ± 0.078 0.674 ± 0.141 0.663 ± 0.035 0.658 ± 0.032
HO 0.679 ± 0.009 0.683 ± 0.065 0.674 ± 0.071 0.677 ± 0.007 0.691 ± 0.029
CC 0.682 ± 0.005 0.651 ± 0.071 0.713 ± 0.106 0.693 ± 0.034 0.677 ± 0.046
AAL: AAL atlas-based FC; HO: HO atlas-based FC; CC: CC200 atlas-based FC; Demo: PC data. All data are mean and standard deviation.

Without PC data, our model achieved the highest performance by combining FC from all three brain atlases. This suggests that brain connectivity data from different atlases may carry complementary information that assists the ASD classification. Interestingly, the model using CC200 FC data (marked as CC in the table) performed better than FC data derived from the AAL (p = 0.012) and HO (p = 0.023) atlases. This is likely because the CC200 atlas is constructed from rs-fMRI data, representing a brain functional parcellation.

5. Conclusion

In summary, we developed a multichannel DANN model by applying state-of-the-art attention mechanism-based deep learning techniques for automated diagnosis of ASD. The k-fold cross validation experiments have shown that our multichannel DANN achieved an accuracy of 0.732, outperforming multiple peer machine learning models. The results of the leave-one-site-out cross validation experiments showed promise for our model to be applied to clinical data with unseen variations. The experiments using varying combinations of data modalities demonstrated the discriminative power of individual data modalities such as the brain functional connectome and PC data. This suggests a future direction of combining additional data modalities to move machine learning applications towards the clinical usage of ASD computer-aided diagnosis tools. One limitation of the current work is that the selected cohort is in the adolescent and young adult population, which limits the generalizability of the model, since ASD diagnosis is performed much earlier. In a future study, we would retrain the model with additional data from a wider age range of the population.

Data Availability

The dataset used to support the findings of this study is available at http://fcon_1000.projects.nitrc.org/indi/abide/.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the Beijing Education Commission Research Project of China under grant no. KM201911232004, the National Natural Science Foundation of China under grant no. 61672105, and the National Key Research and Development Program of China under grant no. 2018YFB1004100.

References

[1] J. Baio, Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years-Autism and Developmental Disabilities Monitoring Network, 11 Sites, Centers for Disease Control and Prevention, Atlanta, GA, USA, 2010.
[2] L. Tonello, L. Giacobbi, A. Pettenon et al., "Crisis behavior in autism spectrum disorders: a self-organized criticality approach," Complexity, vol. 2018, pp. 1–7, 2018.
[3] E. Simonoff, A. Pickles, T. Charman, S. Chandler, T. Loucas, and G. Baird, "Psychiatric disorders in children with autism spectrum disorders: prevalence, comorbidity, and associated factors in a population-derived sample," Journal of the American Academy of Child & Adolescent Psychiatry, vol. 47, no. 8, pp. 921–929, 2008.
[4] S. Goldstein and S. Ozonoff, Assessment of Autism Spectrum Disorder, Guilford Publications, New York, NY, USA, 2018.
[5] I. Riquelme, S. M. Hatem, and P. Montoya, "Abnormal pressure pain, touch sensitivity, proprioception, and manual dexterity in children with autism spectrum disorders," Neural Plasticity, vol. 2016, Article ID 1723401, 9 pages, 2016.
[6] R. Djemal, K. AlSharabi, S. Ibrahim, and A. Alsuwailem, "EEG-based computer aided diagnosis of autism spectrum disorder using wavelet, entropy, and ANN," BioMed Research International, vol. 2017, Article ID 9816591, 9 pages, 2017.
[7] K. B. Schauder and L. Bennetto, "Toward an interdisciplinary understanding of sensory dysfunction in autism spectrum disorder: an integration of the neural and symptom literatures," Frontiers in Neuroscience, vol. 10, p. 268, 2016.
[8] N. Newbutt, C. Sung, H. J. Kuo, and M. J. Leahy, "The acceptance, challenges, and future applications of wearable technology and virtual reality to support people with autism spectrum disorders," in Recent Advances in Technologies for Inclusive Well-Being, pp. 221–241, Springer, Berlin, Germany, 2017.
[9] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders (DSM-5®), American Psychiatric Association Publishing, GA, USA, 2013.
[10] M. Galliver, E. Gowling, W. Farr, A. Gain, and I. Male, "Cost of assessing a child for possible autism spectrum disorder? An observational study of current practice in child development centres in the UK," BMJ Paediatrics Open, vol. 1, no. 1, Article ID e000052, 2017.
[11] K. K. Hyde, M. N. Novack, N. LaHaye et al., "Applications of supervised machine learning in autism spectrum disorder research: a review," Review Journal of Autism and Developmental Disorders, vol. 6, no. 2, pp. 128–146, 2019.
[12] D. Gil, M. Johnsson, H. Mora, and J. Szymanski, "Advances in architectures, big data, and machine learning techniques for complex Internet of things systems," Complexity, vol. 2019, Article ID 4184708, 3 pages, 2019.
[13] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, "Machine learning on big data: opportunities and challenges," Neurocomputing, vol. 237, pp. 350–361, 2017.
[14] Ç. Uğur, H. Tunca, E. Sekmen et al., "A comparative study of the oxidative stress indices of children with autism and healthy children," Anatolian Journal of Psychiatry, vol. 19, no. 3, 2018.
[15] C. Li, H. Zhou, T. Wang et al., "Performance of the autism spectrum rating scale and social responsiveness scale in identifying autism spectrum disorder among cases of intellectual disability," Neuroscience Bulletin, vol. 34, no. 6, pp. 972–980, 2018.
[16] R. L. Hansen, N. J. Blum, A. Gaham et al., "Diagnosis of autism spectrum disorder by developmental-behavioral pediatricians in academic centers: a DBPNet study," Pediatrics, vol. 137, no. 2, pp. S79–S89, 2016.
[17] J. N. Constantino and C. P. Gruber, Social Responsiveness Scale (SRS), Western Psychological Services, Springer, New York, NY, USA, 2007.
[18] T. Charman, A. Pickles, E. Simonoff, S. Chandler, T. Loucas, and G. Baird, "IQ in children with autism spectrum disorders: data from the special needs and autism project (SNAP)," Psychological Medicine, vol. 41, no. 3, pp. 619–627, 2011.
[19] S. L. Bishop, J. Richler, and C. Lord, "Association between restricted and repetitive behaviors and nonverbal IQ in children with autism spectrum disorders," Child Neuropsychology, vol. 12, no. 4-5, pp. 247–267, 2006.
[20] S. Ghiassian, R. Greiner, P. Jin, and M. R. G. Brown, "Using functional or structural magnetic resonance images and personal characteristic data to identify ADHD and autism," PLoS One, vol. 11, no. 12, Article ID e0166934, 2016.
[21] B. Sen, N. C. Borle, R. Greiner, and M. R. G. Brown, "A general prediction model for the detection of ADHD and autism using structural and functional MRI," PLoS One, vol. 13, no. 4, Article ID e0194856, 2018.
[22] G. J. Katuwal, S. A. Baum, N. D. Cahill, and M. M. Andrew, "Divide and conquer: sub-grouping of ASD improves ASD detection based on brain morphometry," PLoS One, vol. 11, no. 4, Article ID e0153331, 2016.
[23] C. Zu, Y. Gao, B. Munsell et al., "Identifying disease-related subnetwork connectome biomarkers by sparse hypergraph learning," Brain Imaging and Behavior, vol. 13, no. 4, pp. 879–892, 2018.
[24] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, Curran Associates, Inc., Lake Tahoe, NV, USA, 2012.
[26] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi, "Identification of autism spectrum disorder using deep learning and the ABIDE dataset," NeuroImage: Clinical, vol. 17, pp. 16–23, 2018.
[27] Y. Kong, J. Gao, Y. Xu, Y. Pan, J. Wang, and J. Liu, "Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier," Neurocomputing, vol. 324, pp. 63–68, 2019.
[28] T. Eslami, V. Mirjalili, A. Fong, A. Laird, and F. Saeed, "ASD-DiagNet: a hybrid learning approach for detection of autism spectrum disorder using fMRI data," 2019, http://arxiv.org/abs/1904.07577.
[29] J. Guo, K. Yang, H. Liu et al., "A stacked sparse autoencoder-based detector for automatic identification of neuromagnetic high frequency oscillations in epilepsy," IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2474–2482, 2018.
[30] F. Ma, Q. You, H. Xiao, R. Chitta, J. Zhou, and J. G. Kame, "Knowledge-based attention model for diagnosis prediction in healthcare," in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 743–752, ACM, Torino, Italy, October 2018.
[31] X. Peng, G. Long, T. Shen, S. Wang, J. Jiang, and M. Blumenstein, "Temporal self-attention network for medical concept embedding," 2019, http://arxiv.org/abs/1909.06886.
[32] T. Shen, T. Zhou, G. Long, J. Jiang, S. Pan, and C. Zhang, "DiSAN: directional self-attention network for RNN/CNN-free language understanding," in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018.
[33] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.
[34] L. Shang, Z. Lu, and H. Li, "Neural responding machine for short-text conversation," 2015, http://arxiv.org/abs/1503.02364.
[35] S. Sukhbaatar, J. Weston, R. Fergus et al., "End-to-end memory networks," in Proceedings of the Conference on Neural Information Processing Systems, Montreal, Canada, December 2015.
[36] A. M. Rush, S. Chopra, and J. Weston, "A neural attention model for abstractive sentence summarization," 2015, http://arxiv.org/abs/1509.00685.
[37] Z. Lin, M. Feng, C. Nogueira dos Santos et al., "A structured self-attentive sentence embedding," 2017, http://arxiv.org/abs/1703.03130.
[38] Y. Liu, C. Sun, L. Lin, and X. Wang, "Learning natural language inference using bidirectional LSTM model and inner-attention," 2016, http://arxiv.org/abs/1605.09090.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
