Research Article
Complexity
Volume 2020, Article ID 1357853, 9 pages
https://doi.org/10.1155/2020/1357853
Multichannel Deep Attention Neural Networks for the
Classification of Autism Spectrum Disorder Using
Neuroimaging and Personal Characteristic Data
Ke Niu,1,2 Jiayang Guo,3 Yijie Pan,4 Xin Gao,5 Xueping Peng,2 Ning Li,1 and Hailong Li6
1 Computer School, Beijing Information Science and Technology University, Beijing 100101, China
2 CAI, School of Computer Science, Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, Australia
3 Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA
4 Ningbo Institute of Information Technology Application, CAS, Beijing, China
5 Computational Bioscience Research Center (CBRC), Computer Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
6 Department of Pediatrics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
Correspondence should be addressed to Jiayang Guo; [email protected], Xueping Peng; [email protected], and
Hailong Li; [email protected]
Received 12 June 2019; Revised 1 January 2020; Accepted 4 January 2020; Published 31 January 2020
Copyright © 2020 Ke Niu et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Autism spectrum disorder (ASD) is a developmental disorder that impacts more than 1.6% of children aged 8 across the United States. It is characterized by impairments in social interaction and communication, as well as by a restricted repertoire of activity and interests. The current standardized clinical diagnosis of ASD remains subjective, relying mainly on behavior-based tests. Moreover, the diagnostic process for ASD is not only time consuming but also costly, causing a tremendous financial burden for patients’ families. Therefore, automated diagnosis approaches have been an attractive solution for earlier identification of ASD. In this work, we set out to develop a deep learning model for automated diagnosis of ASD. Specifically, a multichannel deep attention neural network (DANN) is proposed by integrating multiple layers of neural networks, attention mechanism, and feature fusion to capture the interrelationships in multimodality data. We evaluated the proposed multichannel DANN model on the Autism Brain Imaging Data Exchange (ABIDE) repository with 809 subjects (408 ASD patients and 401 typical development controls). Our model achieved a state-of-the-art accuracy of 0.732 on ASD classification by integrating three scales of brain functional connectomes and personal characteristic data, outperforming multiple peer machine learning models in a k-fold cross validation experiment. Additional k-fold and leave-one-site-out cross validation were conducted to test the generalizability and robustness of the proposed multichannel DANN model. The results show promise for deep learning models to aid the future automated clinical diagnosis of ASD.
1. Introduction
Autism spectrum disorder (ASD) has been estimated to occur in more than 1.6% of children aged 8 across the United States [1]. As a chronic neurological condition, ASD is characterized by impairments in social interaction and communication, as well as by a restricted repertoire of activity and interests [2–5]. Patients with ASD exhibit different levels of impairments, ranging from above average to intellectual disability. In neuroscience, ASD remains a formidable challenge, due to its high prevalence, complexity, and substantial heterogeneity, which require multidisciplinary efforts [6–8]. Although clinical therapies have been developed to treat the symptoms, the diagnosis of ASD
remains a challenging task. Currently, behavior-based testing is the standard clinical method for diagnosing ASD [9]. However, the diagnostic process for ASD is not only time consuming but also costly [10]. This results in a tremendous financial burden for patients’ families. Meanwhile, because ASD is a lifelong condition, patients may have difficulties in normal socialization and working environments, increasing the overall social costs. Therefore, an automated diagnosis approach is desirable for earlier identification of ASD.
Machine learning is a promising tool for investigating the replicability of patterns across larger, more heterogeneous datasets [11–13]. For automated diagnosis of ASD, personal characteristic (PC) data, such as intelligence quotient (IQ) and Social Responsiveness Scale (SRS) score, have been adopted in several studies [14–16]. In the study of ASD, IQ is a type of standard score that is derived from several standardized tests designed to assess human intelligence, and the SRS score is based on a 65-item standardized questionnaire regarding behaviors that are associated with ASD [17]. ASD is highly associated with intellectual disability, which is mainly measured by IQ. Meanwhile, some studies [18, 19] indicate that IQ discrepancy marks a meaningful phenotype in ASDs. In this way, IQ becomes an important biomarker for classifying ASD.
Neuroimaging data have also been investigated to explore ASD biomarkers in recent decades. To facilitate the ASD research community, Autism Brain Imaging Data Exchange (ABIDE), an international collaborative project, has collected data from over 1,000 subjects (e.g., structural MRI (sMRI), resting-state functional MRI (rs-fMRI), and PC data) and made the whole database publicly available. This provided a common platform to test hypotheses, search for key biomarkers, and develop advanced statistical and machine learning algorithms. For example, Ghiassian et al. [20] proposed an automated classifier by combining the histogram of oriented gradients approach for feature extraction from sMRI and rs-fMRI data and support vector machines (SVMs) for decision making. Their method was tested on the ABIDE dataset and achieved 65.0% accuracy on a hold-out set. Recently, Sen et al. [21] developed a LEFMS learner, which applies a sparse autoencoder to extract features from sMRI and spatial nonstationary independent components on rs-fMRI data. An SVM was then utilized to classify ASD and improved accuracy by 0.042. Katuwal et al. [22] applied a random forest classifier to classify ASD and achieved an AUC of 0.61. Adding verbal IQ and age to morphometric features improved the AUC to 0.68. By introducing a hypergraph learning technique, Zu et al. [23] proposed a novel learning method to discover complex connectivity biomarkers that are beyond the widely used region-to-region connections in the conventional brain network analysis.
Deep learning has had a profound impact on many data analytic applications, such as speech recognition, image classification, computer vision, and natural language processing [24]. Based on data-driven feature construction, deep learning provides a new direction for data analytic modelling. Over the past few years, an increasing body of the literature has confirmed the success of feature construction using deep learning methods. Deep learning has been demonstrated to outperform traditional machine learning algorithms on numerous recognition and classification tasks [24–29], which has inspired researchers in the ASD community to apply deep learning approaches to ASD classification. Earlier, deep neural networks (DNNs) were applied to identify ASD patients using rs-fMRI [26]. That model achieved 70% accuracy by using the functional connectivity (FC) matrix as features for model training.
Kong et al. [27] constructed individual functional brain networks using the rs-fMRI data from 182 subjects of NYU Langone Medical Center, a data site within the ABIDE repository. FC features were used to represent the networks of all subjects and further ranked using F-score. Then, a stacked sparse autoencoder-based DNN model was developed. The proposed method achieved a significant performance improvement over two existing algorithms.
More recently, ASD-DiagNet, a joint learning procedure using an autoencoder and a single layer perceptron, was presented [28]. A data augmentation strategy was also designed for the FC features of functional brain networks based on linear interpolation of available feature vectors to ensure the robust training of ASD-DiagNet. Evaluated on 1035 subjects from 17 different sites of the ABIDE repository, ASD-DiagNet achieved 70.1% accuracy, 67.8% sensitivity, and 72.8% specificity in 10-fold cross validation. In the model evaluation on individual data centers, ASD-DiagNet outperformed other state-of-the-art methods and increased the accuracy by up to 20%, with a maximum accuracy of 80%.
In this work, we aim to develop a novel deep learning model for automated diagnosis of ASD. Specifically, we propose a multichannel deep attention neural network, called DANN, by integrating multiple layers of neural networks, attention mechanism, and feature fusion to capture the interrelationships in multimodality data (functional neuroimaging data and PC data) to distinguish ASD patients from typical development controls (TDCs). Attention mechanism-based learning is a type of deep learning and a recent trend for understanding what part of historical information weighs more in predicting diseases [30, 31]. Taking advantage of the large heterogeneous dataset from ABIDE, multiscale brain functional connectomes and PC data were obtained as the features. We systematically evaluated the diagnostic power of our multichannel DANN on ASD classification and compared the performance of the proposed model with peer machine learning models.
The rest of the paper is organized as follows. Section 2 describes the ASD data and the multichannel deep attention neural network. The experimental setup is shown in Section 3, followed by the experimental results and discussion in Section 4. Finally, the conclusion of this work is described in Section 5.
2. Materials and Methods
2.1. Subjects. We collected preprocessed rs-fMRI and PC data from 809 subjects from the publicly accessible ABIDE repository, including 408 ASD subjects and 401 TDC subjects. Detailed demographic information of subjects is listed in Table 1. The incidence of ASD between male and female
subjects is significantly different, and thus the majority of the subjects in the ABIDE dataset are male. There is no significant difference between the ages of the ASD and TDC groups. All three IQ scores differed significantly between the two groups. Later, the variables gender, age, and three IQs were used as PC data in our ASD classification experiments.
2.2. Data Preprocessing. Each rs-fMRI dataset was preprocessed using the Configurable Pipeline for the Analysis of Connectomes (CPAC), which includes slice timing correction, motion realignment, and intensity normalization. Nuisance variable regression was implemented through bandpass filtering and global signal regression strategies to clean confounding variations introduced by heartbeats and respiration, head motion, and low-frequency scanner drifts. Furthermore, boundary-based rigid body and FMRIB’s linear and nonlinear image registration tools were used to register functional to anatomical images. Then, both functional and anatomical images were normalized to template space (MNI 152). Three scales of brain functional connectomes were extracted in this work. Mean blood oxygen-level dependent (BOLD) time-series signals were calculated for three sets of regions of interest (ROIs), i.e., atlases: the Automated Anatomical Labeling (AAL) atlas, the Harvard-Oxford (HO) atlas, and Craddock 200 (CC200). The weights of functional brain connectivity were defined using Pearson’s correlation coefficient between each pair of ROIs. For the AAL atlas, each subject was represented by a 90 × 90 FC adjacency matrix, symmetric along the diagonal, in which each entry represents the brain connectivity between a pair of ROIs. Similarly, each rs-fMRI dataset was also represented by 110 × 110 and 200 × 200 symmetric FC adjacency matrices using the HO and CC200 atlases, respectively. In addition, from the 809 subjects, we obtained five PC variables: sex, handedness, full-scale IQ (FIQ), verbal IQ (VIQ), and performance IQ (PIQ).
2.3. Multichannel Deep Attention Neural Network
2.3.1. Overview Structure. An overview of multichannel DANN is given in Figure 1. It consists of blocks of multichannel inputs, multilayer perceptron (MLP), self-attention, fusion, and aggregation. The various components are described in the following sections.
2.3.2. MLP. The MLP block is composed of 5 layers: one dropout layer and four dense layers. The details of the block are shown in Figure 2.
A dropout layer, which prevents overfitting during model training, is applied to the input data, e.g., AAL FC (input size is 4005, i.e., the 90 × 89/2 unique off-diagonal entries of the 90 × 90 matrix). The white circles in Figure 2 denote units dropped according to the dropout probability. The dropout layer is followed by four dense layers with 1024, 512, 128, and 32 hidden units and “elu,” “tanh,” “tanh,” and “relu” activation functions, respectively.
2.3.3. Self-Attention. Attention is proposed to compute an alignment score between elements from two sources [32]. In particular, given an input FC adjacency matrix, which can be transformed into an FC adjacency sequence $x = [x_1, x_2, \ldots, x_d]$, and a representation of a query $q \in \mathbb{R}^{d}$, attention [33] computes the alignment score between $q$ and each element $x_i$ using a compatibility function $f(x_i, q)$. A softmax function then transforms the alignment scores $[f(x_i, q)]_{i=1}^{d}$ to a probability distribution $p(z \mid x, q)$, where $z$ is an indicator of which element is important to $q$. That is, a large $p(z = i \mid x, q)$ means that $x_i$ contributes important information to $q$. This attention process can be formalized as
$$\alpha = [f(x_i, q)]_{i=1}^{d}, \qquad p(z = i \mid x, q) = \mathrm{softmax}(\alpha). \tag{1}$$
The output $s_i$ is the weighted element according to its importance, i.e.,
$$s_i = p(z = i \mid x, q)\, x_i. \tag{2}$$
Additive attention mechanisms [33, 34] are commonly used attention mechanisms where the compatibility function $f(\cdot)$ is parameterized by an MLP, i.e.,
$$f(x_i, q) = w^{T} \sigma\left(W^{(1)} x_i + W^{(2)} q\right), \tag{3}$$
where $W^{(1)} \in \mathbb{R}^{d \times d}$, $W^{(2)} \in \mathbb{R}^{d \times d}$, and $w \in \mathbb{R}^{d}$ are learnable parameters, $d$ is the dimension of $x_i$, and $\sigma(\cdot)$ is an activation function. In contrast to additive attention, multiplicative attention [35, 36] uses cosine similarity or inner product as the compatibility function for $f(x_i, q)$, i.e.,
$$f(x_i, q) = \langle W^{(1)} x_i, W^{(2)} q \rangle. \tag{4}$$
In practice, although additive attention is expensive in time cost and memory consumption, it usually achieves better empirical performance for downstream tasks.
Self-attention [37, 38] explores the importance of each feature to the entire FC given a specific task. In particular, $q$ is removed from the common compatibility function, which is formally written as the following equation:
Table 2: Comparison of random forest (RF), support vector machine (SVM), multichannel deep neural network (DNN), and multichannel deep attention neural network (DANN) classifiers trained using 10-fold cross validation on the entire dataset.
Method               Accuracy        Sensitivity     Precision       F-score         Specificity
RF                   0.659 ± 0.018   0.689 ± 0.106   0.656 ± 0.012   0.671 ± 0.023   0.628 ± 0.081
SVM                  0.693 ± 0.059   0.713 ± 0.059   0.696 ± 0.072   0.702 ± 0.048   0.673 ± 0.113
Multichannel DNN     0.707 ± 0.027   0.673 ± 0.088   0.740 ± 0.106   0.718 ± 0.060   0.700 ± 0.067
Multichannel DANN    0.732 ± 0.024   0.745 ± 0.115   0.730 ± 0.053   0.736 ± 0.042   0.717 ± 0.101
All data are mean and standard deviation. The highest metrics were marked as bold.
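The five columns of Table 2 are standard binary classification metrics. As a minimal sketch, assuming ASD is coded as the positive class (label 1) and TDC as 0 (an assumption, since the label coding is not stated here), they could be computed from true and predicted labels as follows. The function name `binary_metrics` and the toy labels are illustrative only and are not taken from the study.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute the five metrics of Table 2, treating label 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion-matrix counts.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    precision = ratio(tp, tp + fp)     # positive predictive value
    specificity = ratio(tn, tn + fp)   # true negative rate
    f_score = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, y_true.size)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score,
        "specificity": specificity,
    }

# Toy labels (hypothetical): 4 positive and 4 negative subjects.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

For these toy labels there are three true positives, three true negatives, one false positive, and one false negative, so all five metrics equal 0.75. In a cross validation setting, the function would be applied per fold and the results summarized as mean and standard deviation.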
DANN was significantly higher than that of multichannel DNN (p = 0.009), SVM (p = 0.015), and RF (p = 0.005) models. The specificity of the multichannel DANN was significantly higher than that of SVM (p = 0.004) and RF (p < 0.001) models but was not significantly better than multichannel DNN (p = 0.082). Since the multichannel DNN had a relatively lower sensitivity (0.673), it achieved the best mean precision in our experiments. No significant difference (p = 0.219) was found between multichannel DNN and DANN on precision. The multichannel DANN model still exhibited higher precision than SVM (p = 0.003) and RF (p < 0.001). Overall, the proposed multichannel DANN achieved improved ASD classification accuracy, sensitivity, F-score, and specificity among compared machine learning models, while the multichannel DNN had the highest precision.
Encouragingly, the proposed multichannel DANN outperformed multichannel DNN on four of five performance metrics, increasing mean accuracy by 0.025, sensitivity by 0.072, F-score by 0.018, and specificity by 0.017, although the specificity gain was not statistically significant (p = 0.082). The precision of the proposed approach is slightly lower than that of multichannel DNN, by 0.01, but this difference was not significant. The attention mechanism in our model, as the name implies, helps the deep learning model choose which features it should pay attention to. Our model can allocate attention by adjusting the weights it assigns to individual FC features. This process can decide which FC features are more important than others for the ASD classification task. In other words, it optimizes the feature selection during the training of a deep learning model. The improved performance of DANN over DNN supports the validity of the attention mechanism. The results in Table 2 also showed that multichannel DANN achieved significantly improved performance compared to the traditional models SVM and RF. This is consistent with multiple previous ASD classification studies [26, 27]. The improvement was likely due to a combination of the attention mechanism and the superior capability of deep learning models on complex data patterns, such as FC features.
4.2. Leave-One-Site-Out Cross Validation of Multichannel DANN. To test the generalizability of the proposed model on unseen data from different data sites, we performed a leave-one-site-out cross validation. Similar to k-fold cross validation, we reserved data from one data site as testing data and trained our model by using all data from the remaining 11 data sites. However, since the training data were the same across all repeats, the performances have much smaller variations than in k-fold cross validation. Table 3 shows the classification performance of our model and the number of subjects for each data site.
In the NYU data site, which contains the largest sample size, our model achieved an accuracy of 0.709 ± 0.019, sensitivity of 0.720 ± 0.086, precision of 0.758 ± 0.127, F-score of 0.738 ± 0.069, and specificity of 0.689 ± 0.072. When examining data sites with more than 40 subjects, we found that our model achieved the highest accuracy (0.803 ± 0.045) on the USM site and the best F-score (0.745 ± 0.052) on the UCLA site. These two sites contain nearly 100 subjects, so the results are very informative. We also noted that the lowest accuracy our model returned was 0.684 ± 0.026 from the UM site, suggesting that the data there may have variability that is different from other sites. Overall, our model reached a mean accuracy of 0.713 ± 0.022 and mean F-score of 0.707 ± 0.043. These were significantly lower than the accuracy (p = 0.002) and F-score (p < 0.001) from the cross validation results in Table 2, indicating a large data variability among different data sites.
4.3. Robustness of Multichannel DANN on Varying Data Split Schemes. Next, the robustness of our DANN was further tested using varying k-fold cross validation. A classification model that is not robust may appear to perform very differently with different k. Figure 4 shows plots of the accuracy, sensitivity, precision, F-score, and specificity of the proposed DANN over k-fold cross validation strategies (k = [6, 7, 8, 9, 10]). Using one-way ANOVA, the proposed DANN exhibited no significantly different performance across varying k-fold experiments (p = 0.082), indicating the robustness of the proposed multichannel DANN model.
4.4. Impact of Data Modality on the Classification Performance. Finally, we set out to test the performance of the multichannel DANN when different data modalities are used for ASD classification. All results were based on 50 repeats of 10-fold cross validation. Table 4 lists the performance of multichannel DANN on varying combinations of FC data (marked as AAL, HO, and CC200) and PC data (marked as Demo). The upper part of Table 4 contains results based on both FC and PC data, while the lower part of the table focuses on FC data only. The combined FC and PC data (AAL + HO + CC + Demo) had better accuracy (p = 0.011), sensitivity (p = 0.039), and specificity (p = 0.025) than FC data alone (AAL + HO + CC), while no significant differences were
[Figure 4 panels: (a) Accuracy, (b) Sensitivity, (c) Precision, (d) F-score, (e) Specificity; horizontal axis: Fold 6 to Fold 10.]
Figure 4: Performance of multichannel DANN over varying data split schemes with k-fold cross validation strategies (k = [6, 7, 8, 9, 10]).
Mean and standard deviation are displayed.
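The robustness check summarized in Figure 4 (cross validation with k from 6 to 10, followed by a one-way ANOVA across the k settings) can be sketched in a few lines. This is only an illustration under loud assumptions: the data are synthetic, a logistic regression stands in for the multichannel DANN, only accuracy is compared, and the use of scikit-learn and SciPy here is a choice made for the sketch rather than a description of the original analysis code.

```python
from scipy.stats import f_oneway
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for subject features and ASD/TDC labels (hypothetical sizes).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Stand-in classifier; the multichannel DANN itself is not reproduced here.
clf = LogisticRegression(max_iter=1000)

# Collect per-fold accuracies for each data split scheme.
scores_by_k = {}
for k in (6, 7, 8, 9, 10):
    cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores_by_k[k] = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"k={k}: {scores_by_k[k].mean():.3f} ± {scores_by_k[k].std():.3f}")

# One-way ANOVA: does mean accuracy differ between the k settings?
f_stat, p_value = f_oneway(*scores_by_k.values())
print(f"one-way ANOVA: F={f_stat:.3f}, p={p_value:.3f}")
```

A p value above the chosen significance level would indicate no detectable dependence of accuracy on k, which is the sense of robustness used in Section 4.3. The same loop could be repeated for the other four metrics.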
Table 4: Comparison of multichannel DANN on different data combinations using 10-fold cross validation on the entire dataset.
Data                     Accuracy        Sensitivity     Precision       F-score         Specificity
AAL + HO + CC + Demo     0.732 ± 0.024   0.745 ± 0.115   0.730 ± 0.053   0.736 ± 0.042   0.717 ± 0.101
AAL + HO + Demo          0.700 ± 0.035   0.698 ± 0.068   0.701 ± 0.035   0.702 ± 0.004   0.673 ± 0.401
AAL + CC + Demo          0.703 ± 0.009   0.721 ± 0.084   0.686 ± 0.071   0.699 ± 0.067   0.697 ± 0.060
HO + CC + Demo           0.691 ± 0.018   0.696 ± 0.098   0.686 ± 0.106   0.690 ± 0.054   0.687 ± 0.065
AAL + Demo               0.666 ± 0.002   0.683 ± 0.080   0.650 ± 0.071   0.659 ± 0.006   0.679 ± 0.031
HO + Demo                0.689 ± 0.027   0.681 ± 0.078   0.696 ± 0.106   0.691 ± 0.011   0.685 ± 0.057
CC + Demo                0.692 ± 0.053   0.703 ± 0.092   0.681 ± 0.141   0.689 ± 0.043   0.704 ± 0.055
AAL + HO + CC            0.720 ± 0.062   0.696 ± 0.097   0.738 ± 0.212   0.724 ± 0.078   0.695 ± 0.056
AAL + HO                 0.684 ± 0.027   0.636 ± 0.084   0.730 ± 0.018   0.699 ± 0.035   0.673 ± 0.050
AAL + CC                 0.695 ± 0.018   0.683 ± 0.083   0.706 ± 0.124   0.700 ± 0.030   0.683 ± 0.052
HO + CC                  0.688 ± 0.027   0.666 ± 0.086   0.711 ± 0.053   0.697 ± 0.020   0.683 ± 0.055
AAL                      0.658 ± 0.009   0.641 ± 0.078   0.674 ± 0.141   0.663 ± 0.035   0.658 ± 0.032
HO                       0.679 ± 0.009   0.683 ± 0.065   0.674 ± 0.071   0.677 ± 0.007   0.691 ± 0.029
CC                       0.682 ± 0.005   0.651 ± 0.071   0.713 ± 0.106   0.693 ± 0.034   0.677 ± 0.046
AAL: AAL atlas-based FC; HO: HO atlas-based FC; CC: CC200 atlas-based FC; Demo: PC data. All data are mean and standard deviation.
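Each row of Table 4 corresponds to a subset of input channels: FC features from one or more atlases, optionally with PC data. The following sketch shows one way such channels might be assembled for a single subject, assuming each FC matrix is reduced to its unique off-diagonal Pearson correlations. The random time series, the scan length of 150 time points, the zero-valued PC placeholder, and the helper names `fc_vector` and `select_channels` are all hypothetical and are not taken from the study.

```python
import numpy as np

def fc_vector(time_series):
    """Return the unique off-diagonal Pearson correlations between ROIs.

    time_series has shape (n_timepoints, n_rois).
    """
    fc = np.corrcoef(time_series, rowvar=False)      # (n_rois, n_rois), symmetric
    rows, cols = np.triu_indices(fc.shape[0], k=1)   # strictly upper triangle
    return fc[rows, cols]

rng = np.random.default_rng(0)
n_timepoints = 150                                   # hypothetical scan length
atlas_rois = {"AAL": 90, "HO": 110, "CC": 200}       # ROI counts per atlas

# Random signals stand in for the mean BOLD time series of one subject.
channels = {
    name: fc_vector(rng.standard_normal((n_timepoints, n_rois)))
    for name, n_rois in atlas_rois.items()
}
channels["Demo"] = np.zeros(5)                       # placeholder for five PC values

def select_channels(all_channels, names):
    """Pick the input channels for one row of Table 4."""
    return {name: all_channels[name] for name in names}

combo = select_channels(channels, ["AAL", "HO", "Demo"])
sizes = {name: vec.size for name, vec in combo.items()}
print(sizes)  # {'AAL': 4005, 'HO': 5995, 'Demo': 5}
```

With 90 ROIs the FC vector has 90 × 89 / 2 = 4005 entries; the HO and CC channels have 5995 and 19900 entries, respectively. Keeping the channels in a dictionary rather than concatenating them leaves room for a model that processes each channel separately before fusion.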
[8] N. Newbutt, C. Sung, H. J. Kuo, and M. J. Leahy, “The acceptance, challenges, and future applications of wearable technology and virtual reality to support people with autism spectrum disorders,” in Recent Advances in Technologies for Inclusive Well-Being, pp. 221–241, Springer, Berlin, Germany, 2017.
[9] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders (DSM-5®), American Psychiatric Association Publishing, GA, USA, 2013.
[10] M. Galliver, E. Gowling, W. Farr, A. Gain, and I. Male, “Cost of assessing a child for possible autism spectrum disorder? an observational study of current practice in child development centres in the UK,” BMJ Paediatrics Open, vol. 1, no. 1, Article ID e000052, 2017.
[11] K. K. Hyde, M. N. Novack, N. LaHaye et al., “Applications of supervised machine learning in autism spectrum disorder research: a review,” Review Journal of Autism and Developmental Disorders, vol. 6, no. 2, pp. 128–146, 2019.
[12] D. Gil, M. Johnsson, H. Mora, and J. Szymanski, “Advances in architectures, big data, and machine learning techniques for complex Internet of things systems,” Complexity, vol. 2019, Article ID 4184708, 3 pages, 2019.
[13] L. Zhou, S. Pan, J. Wang, and A. V. Vasilakos, “Machine learning on big data: opportunities and challenges,” Neurocomputing, vol. 237, pp. 350–361, 2017.
[14] Ç. Uğur, H. Tunca, E. Sekmen et al., “A comparative study of the oxidative stress indices of children with autism and healthy children,” Anatolian Journal of Psychiatry, vol. 19, no. 3, 2018.
[15] C. Li, H. Zhou, T. Wang et al., “Performance of the autism spectrum rating scale and social responsiveness scale in identifying autism spectrum disorder among cases of intellectual disability,” Neuroscience Bulletin, vol. 34, no. 6, pp. 972–980, 2018.
[16] R. L. Hansen, N. J. Blum, A. Gaham et al., “Diagnosis of autism spectrum disorder by developmental-behavioral pediatricians in academic centers: a DBPNet study,” Pediatrics, vol. 137, no. 2, pp. S79–S89, 2016.
[17] J. N. Constantino and C. P. Gruber, Social Responsiveness Scale (SRS), Western Psychological Services, Springer, New York, NY, USA, 2007.
[18] T. Charman, A. Pickles, E. Simonoff, S. Chandler, T. Loucas, and G. Baird, “IQ in children with autism spectrum disorders: data from the special needs and autism project (SNAP),” Psychological Medicine, vol. 41, no. 3, pp. 619–627, 2011.
[19] S. L. Bishop, J. Richler, and C. Lord, “Association between restricted and repetitive behaviors and nonverbal IQ in children with autism spectrum disorders,” Child Neuropsychology, vol. 12, no. 4-5, pp. 247–267, 2006.
[20] S. Ghiassian, R. Greiner, P. Jin, and M. R. G. Brown, “Using functional or structural magnetic resonance images and personal characteristic data to identify ADHD and autism,” PLoS One, vol. 11, no. 12, Article ID e0166934, 2016.
[21] B. Sen, N. C. Borle, R. Greiner, and M. R. G. Brown, “A general prediction model for the detection of ADHD and autism using structural and functional MRI,” PLoS One, vol. 13, no. 4, Article ID e0194856, 2018.
[22] G. J. Katuwal, S. A. Baum, N. D. Cahill, and M. M. Andrew, “Divide and conquer: sub-grouping of asd improves ASD detection based on brain morphometry,” PLoS One, vol. 11, no. 4, Article ID e0153331, 2016.
[23] C. Zu, Y. Gao, B. Munsell et al., “Identifying disease-related subnetwork connectome biomarkers by sparse hypergraph learning,” Brain Imaging and Behavior, vol. 13, no. 4, pp. 879–892, 2018.
[24] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., Lake Tahoe, NV, USA, 2012.
[26] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi, “Identification of autism spectrum disorder using deep learning and the abide dataset,” NeuroImage: Clinical, vol. 17, pp. 16–23, 2018.
[27] Y. Kong, J. Gao, Y. Xu, Y. Pan, J. Wang, and J. Liu, “Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier,” Neurocomputing, vol. 324, pp. 63–68, 2019.
[28] T. Eslami, V. Mirjalili, A. Fong, A. Laird, and F. Saeed, “ASD-diagnet: a hybrid learning approach for detection of autism spectrum disorder using FMRI data,” 2019, http://arxiv.org/abs/1904.07577.
[29] J. Guo, K. Yang, H. Liu et al., “A stacked sparse autoencoder-based detector for automatic identification of neuromagnetic high frequency oscillations in epilepsy,” IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2474–2482, 2018.
[30] F. Ma, Q. You, H. Xiao, R. Chitta, J. Zhou, and J. G. Kame, “Knowledge-based attention model for diagnosis prediction in healthcare,” in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 743–752, ACM, Torino, Italy, October 2018.
[31] X. Peng, G. Long, T. Shen, S. Wang, J. Jiang, and M. Blumenstein, “Temporal self-attention network for medical concept embedding,” 2019, http://arxiv.org/abs/1909.06886.
[32] T. Shen, T. Zhou, G. Long, J. Jiang, S. Pan, and C. Zhang, “Disan: directional self-attention network for RNN/CNN-free language understanding,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018.
[33] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” 2014, http://arxiv.org/abs/1409.0473.
[34] L. Shang, Z. Lu, and H. Li, “Neural responding machine for short-text conversation,” 2015, http://arxiv.org/abs/1503.02364.
[35] S. Sukhbaatar, J. Weston, R. Fergus et al., “End-to-end memory networks,” in Proceedings of the Conference on Neural Information Processing Systems, Montreal, Canada, December 2015.
[36] A. M. Rush, S. Chopra, and J. Weston, “A neural attention model for abstractive sentence summarization,” 2015, http://arxiv.org/abs/1509.00685.
[37] Z. Lin, M. Feng, C. Nogueira dos Santos et al., “A structured self-attentive sentence embedding,” 2017, http://arxiv.org/abs/1703.03130.
[38] Y. Liu, C. Sun, L. Lin, and X. Wang, “Learning natural language inference using bidirectional LSTM model and inner-attention,” 2016, http://arxiv.org/abs/1605.09090.
[39] F. Pedregosa, G. Varoquaux, A. Gramfort et al., “Scikit-learn: machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.