1 s2.0 S2352914821000356 Main
ARTICLE INFO

Keywords:
Deep neural networks
Deep learning class imbalance
Mental disorder multi-label classification
Psychotic disorder disease

ABSTRACT

Electronic Health Records (EHRs) hold symptoms of many diverse diseases and it is imperative to build models to recognise these problems early and classify the diseases appropriately. This classification task could be presented as a single or multi-label problem. Thus, this study presents Psychotic Disorder Diseases (PDD) dataset with five labels: bipolar disorder, vascular dementia, attention-deficit/hyperactivity disorder (ADHD), insomnia, and schizophrenia as a multi-label classification problem. The study also investigates the use of deep neural network and machine learning techniques such as multilayer perceptron (MLP), support vector machine (SVM), random forest (RF) and Decision tree (DT), for identifying hidden patterns in patients’ data. The study furthermore investigates the symptoms associated with certain types of psychotic diseases and addresses class imbalance from a multi-label classification perspective. The performances of these models were assessed and compared based on an accuracy metric. The result obtained revealed that deep neural network gave a superior performance of 75.17% with class imbalance accuracy, while the MLP model accuracy is 58.44%. Conversely, the best performance in the machine learning techniques was exhibited by the random forest model, using the dataset without class imbalance and its result, compared with deep learning techniques, is 64.1% and 55.87%, respectively. It was also observed that patient’s age is the most contributing feature to the performance of the model while divorce is the least. Likewise, the study reveals that there is a high tendency for a patient with bipolar disorder to have insomnia; these diseases are strongly correlated with an R-value of 0.98. Our concluding remark shows that applying the deep and machine learning model to PDD dataset not only offers improved clinical classification of the diseases but also provides a framework for augmenting clinical decision systems by eliminating the class imbalance and unravelling the attributes that influence PDD in patients.
* Corresponding author.
E-mail addresses: [email protected], [email protected] (S.G. Fashoto).
https://doi.org/10.1016/j.imu.2021.100545
Received 21 June 2020; Received in revised form 28 February 2021; Accepted 1 March 2021
Available online 17 March 2021
2352-9148/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
I. Elujide et al. Informatics in Medicine Unlocked 23 (2021) 100545
at an early stage of sickness may be unpredictable and sometimes treatment is yet to be decided. Diagnosing a patient with PDD is exclusively clinically stressful [6] for physicians, psychologists, and psychiatrists because they need to know the medical and psychiatric history of the patient. The PDD coexists with non-communicable chronic diseases (NCD) [7]. The diagnosis of psychotic disorder diseases such as Schizophrenia is based on the DSM-5 and ICD, and is done clinically, since specific biomarkers that can predict the illness accurately remain unknown [8]. Emerging technologies like machine learning methods (such as pattern recognition, support vector machines, multivariate pattern analysis, Gaussian processes, logistic regression, random forest, neural networks) [4,9] and magnetic resonance imaging (MRI) [8,10] when applied to neuroimage data represent a new and promising approach that could support the diagnosis of mental disorders. The issue of PDD continues to persist and it is one of the most devastating mental illnesses globally [2,4], because it incapacitates individuals psychologically. There is presently no known cure for mental illness but timely detection and prompt intervention can aid in slowing down the illness [2,4,11]. The onset of mental illness is primarily preceded by non-severe symptoms which are a major challenge in diagnosing the early onset of mental illness or psychosis. The DSM-5 and ICD-10 systems require the elimination of medical conditions before diagnosing psychotic disorder [12]. In an attempt to overcome the devastating state of PDD, psychiatrists and psychologists in the psychiatric community teamed up with computer scientists and engineers to develop machine learning algorithms for the prediction of psychotic disorder diseases, simultaneously, based on the existing dataset [3,13,14].

The outline of the paper is as follows: Section 2 describes the types of mental disorder discussed in this study, deep and machine learning techniques for multi-label classification, deep learning approaches and class imbalance in machine learning techniques. Section 3 describes the methods adopted in this study. The PDD dataset used will be described and also represented as a multi-label problem. Also, the Deep Learning architecture and the machine learning techniques used for the multi-classification and class imbalance will be extensively described. Section 4 will discuss the results obtained, based on the measures used in the study. Lastly, section 5 will conclude the work and describe future work.

2. Mental disorder

2.1. Types of mental disorder

There are five types of mental disorders considered in this study, based on the available dataset in Nigeria to label psychotic patients; these include bipolar disorder, attention deficit hyperactive disorder, schizophrenia, vascular dementia and insomnia.

2.1.1. Bipolar disorder
Bipolar disorder is an unusual mood change which is often extreme and fluctuating, and which occurs at irregular intervals. The emotion ranges between euphoric feelings and bouts of depression [15]. Bipolar disorder is sub-categorised into Bipolar I and Bipolar II. Diagnostic symptoms of Bipolar I range from manic episodes to hypomanic episodes, while Bipolar II diagnostic symptoms range from mild depression to moderate and extreme depressive episodes [16].

2.1.2. Attention deficit hyperactive disorder (ADHD)
The new name for Minimal Brain Dysfunction (MBD) in the psychiatric space nowadays is Attention Deficit Hyperactivity Disorder (ADHD). Attention-deficit is characterised by the inability to sustain and/or maintain attention, control impulses and regulate the level of physical activities [17], while hyperactivity disorder is seen in children who are highly impulsive, act too fast without thinking in order to complete a task, and refuse to wait for their turn [18]. ADHD in children (especially between 3 and 6 years old), young people and adults conditions them to perceive and react to most stimuli in their environment and this predisposes them to accidents or injury, sleep disturbance, aggression, mood swings, substance misuse, immature language, anxiety states, academic underachievement and unpopularity with peers [19]. Since ADHD symptoms differ from one person to another, its diagnosis takes time and in some cases results in high error rates, due to different clinical examinations. For the same reason, ADHD can appear in three different ways among different people: inattention, hyperactivity-impulsivity, and a combination of both [20].

2.1.3. Schizophrenia
Schizophrenia is characterised by a collection of symptoms such as deterioration of mental functioning, language disturbance, disjointed speech, social withdrawal, hallucination, motor disturbance and irrational thinking [6]. Schizophrenia is a disorder that impacts every area of an individual’s psychological functioning, and which is characterised by severe deviation from reality [12]. Prevalence of schizophrenia in the adult population is between 0.3 and 0.7%, though it is higher in male than female adults [21].

2.1.4. Vascular dementia
Vascular Dementia (VD) is a neuro-cognitive disorder which is characterised by a progressive decline in memory and cognitive functions. That is, such individuals decline in language, complex attention, learning and memory, social cognition and motor function [12]. Individuals with the dementia condition may become suicidal, depressed and harmful to themselves and others [12]. Prevalence is high among older adults (above 65 years) and varies between 2% and 25% in the adult population. The World Health Organisation estimates that 35.6 million people live with dementia, a number that is anticipated to triple by 2050, as 7.7 million new cases of dementia are diagnosed every year, posing a financial burden to the society [22].

2.1.5. Insomnia
Insomnia is a chronic sleep disorder. An individual with insomnia experiences the challenge of falling asleep and is unable to maintain sleep [23]. This condition may be intermittent, ranging from acute (a few weeks) to chronic (several months).

2.2. Deep and machine learning techniques for multi-label classification on PDD

Several models have been applied to address single-labelled classification and multi-class problems. A single-labelled classification is concerned with learning from a set of instances that are each related to a single label l from a set of disjoint labels L, |L| > 1. If |L| = 2, then this is termed a binary classification problem. A multi-class problem is characterised by |L| > 2. A multi-label classification, in contrast, is characterised by instances associated with a set of labels Y⊆L [24]. For instance, authors in Ref. [25] presented an unsupervised deep feature learning method named ‘deep patient’ with a 3-layered stack of denoising autoencoders to capture hierarchical regularities and dependencies in the aggregated EHRs. These EHRs contain about 700,000 patients as samples with 78 diseases as labels from the Mount Sinai data warehouse, which were split into the ratio of 89.11–10.89 train–test split. Results obtained revealed that severe diabetes, schizophrenia, and various cancers performed best.

The approach taken by Ref. [4] is a semantic and latent content analysis. It considered semantic density, which was determined by the number of parts in a sentence which makes the sentence meaningful. Furthermore, it compared an individual’s speech with a large database of other people’s speech patterns to determine when the speech becomes abnormal. This results in the prediction of possible psychosis in an individual. Machine learning techniques in the form of a two-layer neural network were then implemented with these two linguistic indicators to
predict psychosis. The authors in Ref. [3] proposed the use of machine learning evaluation on PDD like label and ranking-based perspectives. The evaluation metrics considered in the study are hamming loss (HL), one error (OE), zero-one loss (ZOL), ranking loss (RL), accuracy (Acc), average precision (AP), Micro-F1 and Macro-F1. The dataset for the study was evaluated on the Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Model Tree (LMT), and Naïve Bayes Tree (NBTree), base classifiers in the Problem Transformation (PT) and Ensemble methods. The PT approach uses Binary relevance (BR), Classifier Chains (CC), Probabilistic Classifier Chains (PCC), Pruned Set (PS), FW and RT, while the ensemble approach uses Random k-label sets (RAkEL), RAkELd, EBR, Ensemble of Classifier Chains (ECC), EFW, EPCC, Ensemble of Pruned Set (EPS), ELC and ERT. The study shows that the Label Powerset (LP) and Pruned Sets (PS) in the multi-label classification methods, with Naïve Bayes (NB) and Naïve Bayes Tree (NBTree), consistently performed best in terms of the evaluation measures on the PDD dataset. Schizophrenia diagnosis was examined in Ref. [26] by utilising a hybrid of artificial intelligence and a knowledge base supplying an expert system to make accurate diagnoses. The knowledge base for classifying psychotic disorders was sourced from previously established works and models for multi-criteria decision analysis, while artificial intelligence was used to create production rules and probabilities. The work in Ref. [6] reviewed 35 previous studies that utilised various combinations of different machine learning techniques, such as support vector machines, multivariate pattern analysis, and random forest, in order to analyse neuro-images and determine the presence of schizophrenia in a particular subject. The use of artificial intelligence and machine learning in medicine and psychiatry was considered in Ref. [27]. This study looks at a possible future using these technologies in these industries, and in particular, the benefits and challenges.

2.2.1. Overview of deep learning approaches
The increase in psychotic disorder cases and the popularity of deep learning algorithms present unprecedented opportunities for the application of deep learning algorithms, for modelling and classifying PDD, using patients’ data. Deep learning (DL) approaches achieved remarkable results in many domains, thereby revolutionising the field of machine learning through deep hierarchical feature construction in the dataset. DL is a subset of machine learning, composed of multi-layered neural networks; one or more hidden layers are connected to each other to form a network that is capable of learning complex structures with a high level of abstraction [28]. Deep learning approaches such as convolutional neural networks (CNN), recurrent neural network (RNN), deep reinforcement learning (DRL) and multilayer feedforward networks can learn the optimal representation from the raw data through consecutive nonlinear transformations, thus achieving increasingly higher levels of abstraction and complexity, as compared to machine learning algorithms [5]. The DL neural networks are inspired by the way the human brain processes information [29]; therefore, its architecture is composed of an input layer, two or more hidden layers, and an output layer. The PDD data is fed into the input layer; the abstract features of the PDD data are passed into hidden layers, which process them using an activation function. Features or patterns are then fed to the output layer that assigns the observations to classes. The performance of DL algorithms is improved through an iterative process of adjustment of the interconnections (weights and learning rate) between hidden artificial neurons in the hidden layer.

perceptron (MLP) [33]. In a multilayer feedforward network, each input variable is associated with weight values which are inputted in the input layer. Each node in the input layer consists of the artificial neuron, which sums the product of input variables ($x_1, x_2, \ldots, x_n$) and associated weight values ($w_1, w_2, \ldots, w_n$); the sum of weighted inputs is calculated as in equation (1).

$$Y_i = \sum_{i=1}^{n} x_i \cdot w_i \qquad (1)$$

The summation output of the weighted inputs is then passed through a nonlinear activation function $f\left(\sum_{i=1}^{n} x_i \cdot w_i\right)$ to transform the pre-activation level of the neuron to an output $y_i$ as shown in Fig. 1. The weight value determines the strength and direction (inhibitory or excitatory) of each neuron input.

The output of the input layer is then used as the input of the succeeding layer, called the hidden layer. The number of hidden layers represents the depth of the network. The output of hidden layers is fed into the output layer, whose purpose is to classify labels in the context of classification. Each neuron is fully connected with all predecessor neurons in the previous layer as well as all neurons in the succeeding layer in a feedforward structure, as shown in Fig. 2. Usually, MLPs are trained using the backpropagation method, where the error is adjusted using associated weight values (learning rate or momentum value), from the output layer to the input layer. This process is called network training. Network training continues iteratively using a training set and training algorithm until the error has reached its minimum value.

2.2.1.2. Convolutional neural networks (CNNs). Recent studies have shown that convolutional neural networks have been applied to classifying psychiatric disorders [34–36]. The CNN is a deep neural network architecture initially introduced by LeCun in 1989, designed to process visual imagery [37]. A CNN can be thought of as a classifier that extracts and processes hierarchical features for imagery data. Thus, images are given as input labels and training is done automatically. In CNNs, the first layer is the input image. Instead of having an input layer and output layer only, CNNs have additional types of layers called convolutional layer, pooling layer, and fully-connected layer.

The convolutional layer is the core module of the CNN and is responsible for convolving the input image with learnable filters and extracting its features [38]. Every filter is composed of neurons that detect features for the layer inputs. A filter has spatial dimensions of width ($W_i$), height ($H_i$), and channel number ($D_i$). Convolution means the sum of the element by element multiplication of the neurons in each filter with the corresponding values at the input. A 2-dimensional feature map with parameters such as padding and stride is therefore produced on convolution with a single filter in each layer. The strides determine the size of the feature map and the weights associated with the filter determine the features. Each convolutional layer has n filters, with each resulting in a feature map. The output of the convolutional layer is the stacked feature maps produced by filters containing different weights, as shown in Fig. 3.
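The multiply-and-sum operation described for the convolutional layer can be illustrated with a minimal sketch in plain Python. It assumes a single-channel input, one filter and no padding; the function name `conv2d_single` and the toy `image` and `kernel` values are hypothetical and are not taken from the study.

```python
def conv2d_single(image, kernel, stride=1):
    """Slide one filter over a 2-D input (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Sum of element-by-element products between the filter and the
            # input patch whose top-left corner is (r * stride, c * stride).
            total = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 input and 2x2 filter (illustrative values only).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [1, 0],
    [0, 1],
]

fmap_stride1 = conv2d_single(image, kernel, stride=1)  # 3x3 feature map
fmap_stride2 = conv2d_single(image, kernel, stride=2)  # 2x2 feature map
```

With the same filter, a larger stride yields a smaller feature map, which is the sense in which the stride determines the feature-map size. A layer with n filters would repeat this once per filter and stack the resulting maps.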
The data were obtained from Yaba Psychiatry Hospital, Yaba, Lagos State, Nigeria by Adejumo et al. [50]. It contained medical records of 500 psychotic patients with 16 variables (11 independent and 5 dependent variables). The information spans a period of five years (Jan. 2010–Dec. 2014). A deep learning neural network was employed to cater for edge-cases that could not be addressed by machine learning algorithms. Machine learning techniques were employed to eliminate the class imbalance in the dataset using the Synthetic Minority Oversampling Technique (SMOTE). The categorical feature vectors from the experiment are transformed into binary vectors using a one-hot encoding technique. The deep learning neural network is designed to be a 3-layer deep architecture as presented in Fig. 4. The activation functions used in the architecture were rectified linear units (ReLU) for the connected layers and sigmoid for the output layer. The loss function for training is binary cross-entropy and the evaluation metric is accuracy. The first layers in Fig. 4a and Fig. 4b are the input layers, where the input data is read into the network model. The fifth layers are the output layers used for the classification (single and multi-label). The second, third, and fourth layers are called hidden layers. The hidden layers are the layers between the input layers and the output layers. The number of hidden layers represents the depth of the network.
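The architecture described above (one-hot encoded inputs, three ReLU hidden layers, a sigmoid output per label and a binary cross-entropy loss) can be sketched as a single forward pass in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the hidden width of 8, the random initialisation, the feature values, the category names and the label order are all hypothetical.

```python
import math
import random


def one_hot(value, categories):
    """Encode one categorical value as a binary indicator vector."""
    return [1.0 if value == category else 0.0 for category in categories]


def dense(inputs, weights, biases):
    """Weighted sum per neuron: sum_i x_i * w_i + b."""
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (initialisation is an assumption)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """ReLU on every hidden layer, sigmoid on the output layer."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return sigmoid(dense(activations, out_weights, out_biases))


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over the independent label outputs."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical patient record: a scaled age plus a one-hot marital status.
marital_categories = ["single", "married", "divorced"]  # assumed category names
features = [0.35] + one_hot("married", marital_categories)

rng = random.Random(0)
sizes = [len(features), 8, 8, 8, 5]  # input, three hidden layers, five labels
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

probabilities = forward(features, layers)
# Assumed label order: bipolar, vascular dementia, ADHD, insomnia, schizophrenia.
target = [1.0, 0.0, 0.0, 1.0, 0.0]
loss = binary_cross_entropy(target, probabilities)
predicted_labels = [1 if p >= 0.5 else 0 for p in probabilities]
```

Using one sigmoid per label rather than a softmax over labels lets several disorders be predicted for the same patient, which is what makes the output multi-label. Training would adjust the weights by backpropagating the gradient of `loss`; that step is omitted here.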
Table 1
Accuracy results of single and multi-label classification model on dataset with class imbalance.

Deep Learning                 Model (%)   Validation (%)
Multi-output (Multi label)    77.86       75.17
Insomnia                      79.29       74.17
Schizophrenia                 92.50       90.00
Vascular dementia             85.36       80.10
ADHD                          77.86       65.00
Bipolar disorder              81.43       76.67

Table 2
Accuracy results of single and multi-label classification model on dataset without class imbalance.

Deep Learning                 Model (%)   Validation (%)

the curve (AUC) of 93%. This indicates that the model has more true positives and fewer false positives. On the other hand, the worst performance is exhibited by the model for bipolar disorder with 71% AUC.

For the confusion matrix, all the values on the diagonal from top left to bottom right are correctly classified data samples, as presented in Fig. 8. In our validation set, the sum across the columns of a row (left to right) gives the total number of samples present in that class. For example, in row one, the total number of class “0” samples is 23 + 1 = 24. This means that in the actual validation set we have 24 samples belonging to the “0” class, of which our model predicted 23 correctly (95.8%). For class “11”, the model did not perform well; it showed 28.6% accuracy by classifying only 6 samples correctly and misclassifying most data samples into different classes. Here, the model is overfitted on this class. This shows that the data samples in the training set belonging to class “11” are not enough.

4.2. Discussion of results
The target column now forms our multiclass variable. The values in this column were found to contain 19 different combinations. When the distribution was checked, we found that some of the combinations occurred only once, some two times, some 5 times and some more than 10 times. The different combinations found are 01110, 11100, 11101, 11001, 11111, 01100, 01000, 01111, 10001, 10011, 00000, 10111, 11011, 00100, 00110, 01010, 10101, 00010 and 10100. The SMOTE recommendation is that, for good performance, we have at least 6 neighbours. As such, we removed all combinations with fewer than 6 occurrences. The removed combinations are 11100, 01111, 10011, 00110, 00010, 10100, 10111. After removing combinations with fewer than 6 occurrences (which were 7 in number), the following are the remaining combinations: 01110, 11101, 11001, 11111, 01100, 01000, 10001, 00000, 11011, 00100, 01010, 10101. We then apply SMOTE on the data, such that every combination had a total of 101 samples. We further run an analysis on the dataset to compute the feature importance, and the plot is as shown in Fig. 9. Age is the top feature contributing to the classifications of the PDD as derived from the random forest model, followed by occupation, and divorce is the least, as shown in Fig. 9. We further train a Multilayer Perceptron (MLP), Support Vector Machines (SVM), Random Forest (RF) and Decision Tree (DT) on the training data (which is 80% of the whole data). The four algorithms are evaluated on the test set (which is 20% of the sampled data). The accuracy and balanced accuracy are as presented in Table 3.

Our results on machine learning techniques were compared with [3,50] in terms of accuracy, balanced accuracy and correlation matrix on attributes influencing PDD. Our proposed study performed better with MLP, compared to SVM, RF and decision tree, based on accuracy, while the RF outperformed SVM, MLP and DT based on balanced accuracy, as presented in Table 3. The MLP and RF also outperformed the best ensemble machine learning employed in Ref. [3], based on accuracy and balanced accuracy, respectively, as presented in Table 3.

Table 3
Comparison of machine learning techniques results.

Algorithm   Accuracy (%)   Balanced Accuracy (%)
MLP         58.44          58.53
SVM         46.91          49.82
RF          46.91          64.07
DT          46.91          49.82

4.3. Evaluation of techniques based on accuracy and balanced accuracy

Different validation accuracies were obtained, as presented in Tables 4 and 5, based on optimal accuracy on the dataset with class imbalance and optimal balanced accuracy on the dataset with a balanced class from different studies. This was achieved using different techniques on the same dataset. The proposed deep neural network-multilayer perceptron (DNN-MLP) and machine learning-multilayer perceptron (ML-MLP) outperformed the other approaches in Table 4. The results of the proposed deep neural network and of machine learning with RF on the balanced class are as shown in Table 5. After balancing the dataset, we discovered that the machine learning-random forest (ML-RF) outperformed the machine learning-multilayer perceptron (ML-MLP), while ML-RF classification performance is less than deep neural network.

Table 4
Accuracy on multi-label classification with class imbalance.

Techniques                          Optimal Accuracy (%)   Author
LC-Naïve Bayes tree (Multi-label)   56.56                  [3]
LC-Naïve Bayes (Multi-label)        56.56                  [3]
Proposed (Deep learning)            75.17                  this study
Proposed (ML-MLP)                   58.44                  this study

Table 5
Balanced accuracy on multi-label classification without class imbalance.

Techniques                          Optimal Accuracy (%)   Author
Proposed (Deep learning)            55.87                  this study
Proposed (ML-RF)                    64.07                  this study
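The combination filtering described in Section 4.2 and the gap between accuracy and balanced accuracy can be sketched in plain Python. The sketch assumes that balanced accuracy means the mean of per-class recall (the definition used by scikit-learn's `balanced_accuracy_score`); the paper does not state its formula. The function names and toy data are hypothetical, and the SMOTE oversampling step itself is not reproduced.

```python
from collections import Counter


def to_combination(label_row):
    """Join five binary labels into one class string, e.g. [0, 1, 1, 1, 0] -> '01110'."""
    return "".join(str(int(v)) for v in label_row)


def keep_frequent_combinations(combinations, min_count=6):
    """Return indices of samples whose label combination occurs at least min_count times."""
    counts = Counter(combinations)
    return [i for i, combo in enumerate(combinations) if counts[combo] >= min_count]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / totals[c] for c in totals) / len(totals)


# Toy label matrix: one combination is too rare to supply 6 neighbours.
label_rows = [[0, 1, 1, 1, 0]] * 7 + [[1, 1, 1, 0, 0]] * 2 + [[0, 0, 0, 0, 0]] * 6
combinations = [to_combination(row) for row in label_rows]
kept = keep_frequent_combinations(combinations, min_count=6)
kept_classes = sorted({combinations[i] for i in kept})

# A classifier that always predicts the majority class looks good on accuracy
# but not on balanced accuracy.
y_true = ["01110"] * 4 + ["00000"]
y_pred = ["01110"] * 5
acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
```

In the toy example the rare combination 11100 is dropped, and the always-majority predictor scores 0.8 on accuracy but 0.5 on balanced accuracy. This is the kind of majority-class bias that motivates reporting both metrics.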
The features that influence the PDD by Ref. [50] and the proposed study, based respectively on a statistical approach using χ² statistics and a machine learning approach using a pairwise correlation function, are as presented in Table 6.

Table 6
Attributes influencing PDD.

Techniques          Features                                                          Author
Insomnia            age, occupation, marital status, divorce, spiritual consult       This study
Schizophrenia       age, marital status, divorce                                      This study
Vascular dementia   spiritual consult                                                 This study
ADHD                sex, age, occupation, marital status                              This study
Bipolar disorder    age, marital status, divorce, spiritual consult                   This study
Insomnia            age, occupation, status, divorce, spiritual consult               [50]
Schizophrenia       age, occupation, religion, status, hereditary, divorce            [50]
Vascular dementia   history, spiritual consult                                        [50]
ADHD                sex, age, occupation, religion, marital status                    [50]
Bipolar disorder    age, occupation, marital status, divorce, spiritual consult       [50]

In this study, the result shows that there is a high tendency for a patient with Bipolar disorder to have insomnia. The claim is supported by Ref. [53] because the features influencing them are almost the same, except for occupation, in Bipolar disorder. Our findings in Table 6 are supported by Refs. [50,53]. Bipolar disorder and insomnia are strongly correlated with an R-value of 0.98 as shown in Fig. 10. Bipolar disorder and insomnia are also most prevalent in old adults. Fig. 10 also shows that marital stress can predispose married couples to ADHD, while insomnia and bipolar are higher in single patients.

Fig. 10. Feature importance in PDD.

In spite of the accomplishment of the deep learning technique, its ascendency cannot be demonstrated in all instances in real-life scenarios. Most researchers have concluded that deep learning technique will always outperform machine learning technique [54,55]. This is not always the case in all circumstances, as established in this study. Deep learning outperforms machine learning, considering a dataset that contains both imaging and non-imaging raw data, as established in some systematic literature review [5]. We established that deep learning outperforms machine learning on PDD dataset with class imbalance from a multi-classification accuracy perspective, whereas on PDD dataset with balanced class, machine learning outperforms the deep learning. Many studies also supported the fact that machine learning will outperform deep learning on the dataset with a small sample size, as corroborated in this study. The deep neural network algorithm in this study produced good results on the psychotic disorder dataset using two splits (train and validate), but the same algorithm produced poor results with three splits (train, validate and test) because the sample size was small. Secondly, the accuracy of the multi-classification based on the deep learning-multilayer perceptron algorithm was high on the class imbalance dataset due to bias towards the majority class, but it was low on the balanced class.

5. Conclusion

Diagnosing mental illness is increasingly becoming more complex because of confusing symptoms. Also, some patients do not clearly articulate their mental health state and diseases’ symptoms, especially PDD. A proper diagnosing tool is necessary to assist medical practitioners to distinguish and properly diagnose psychotic disorders. Therefore, this study presented PDD as a multi-label classification problem to investigate the use of deep neural network and machine learning architectures and techniques: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF) and Decision tree (DT). This was done for the purpose of identifying hidden patterns in patients’ data and for classifying Psychotic Disorder Diseases like Bipolar disorder, Vascular Dementia, Insomnia, Schizophrenia and Attention Deficit Hyperactive Disorder (ADHD) in patients, for the use of clinicians. The architectures are treasured tools as correlations in the data, through which iterative optimisation techniques can be concluded. The outcomes obtained showed that Deep Learning Neural Networks give good classification performance results, based on accuracy, true positive rate, false positive rate and AUC, compared to the machine learning approach adopted by Ref. [56]. The proposed approaches in this study and the use of ensemble machine learning on the same dataset in the previous studies both point to schizophrenia as the one with the best performance in terms of accuracy, and to ADHD as the one with the least performance. The limitation is that the use of temporal validation performs badly, compared to the train/test split, both from deep learning perspective on the same dataset. These results show that applying the deep learning model to PDD can derive patient representations that offer improved clinical predictions and augment machine learning framework for making clinical decisions. The deep neural network algorithm produced strongly good results on the psychotic disorder dataset using two splits (train and validate), but the same algorithm produced poor results with three splits (train, validate and test) because the sample size is small.

The result obtained revealed that deep neural network gave a superior performance of 75.17% with class imbalance accuracy while the MLP model accuracy is 58.44%. Conversely, the best performance in the machine learning techniques is exhibited by the random forest model, using the dataset without class imbalance and its result, compared with deep learning techniques, is 64.1% and 55.87%, respectively. It was also observed that patients’ age is the most contributing feature to the performance of the model while divorce is the least. Likewise, the study reveals that there is a high tendency for a patient with bipolar disorder to have insomnia; these diseases are strongly correlated with an R-value of 0.98. Bipolar disorder and insomnia are also most prevalent in old adults. Our concluding remark shows that applying the deep learning model to PDD data not only offers improved clinical classification of the diseases but also provides a framework for augmenting clinical decision systems, by eliminating the class imbalance and unravelling the attributes that influence PDD in patients. This study also supports other researchers to be assertive that deep learning does not outperform machine learning techniques in all real-life scenarios [57]. In the future,
the authors intend to develop a soft-computing model that can handle early diagnosis of psychotic diseases as a multi-classification problem, including analysis of the confusion matrix.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

We thank the team at Yaba Psychiatry Hospital, Yaba, Lagos State, Nigeria, and Adejumo, A. O., Ikoba, N. A., Suleiman, E. A., Okagbue, H. I., Oguntunde, P. E., Odetunmibi, O. A., and Job, O., for making the dataset available for research purposes.

References

[1] Clark Lee Anna, Cuthbert Bruce, Lewis-Fernández Roberto, Narrow William E, Reed Geoffrey M. Three approaches to understanding and classifying mental disorder: ICD-11, DSM-5, and the National Institute of Mental Health's Research Domain Criteria (RDoC). Psychol Sci Publ Interest 2017;18(2):72–145.
[2] Bzdok Danilo, Meyer-Lindenberg Andreas. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatr: Cognit Neurosci Neuroimag 2018;3(3):223–30.
[3] Folorunso SO, Fashoto SG, Olaomi J, Fashoto OY. A multi-label learning model for psychotic diseases in Nigeria. Inf Med Unlocked 2020;19:100326.
[4] Rezaii Neguine, Walker Elaine, Wolff Phillip. A machine learning approach to predicting psychosis using semantic density and latent content analysis. npj Schizophrenia 2019;5(1):1–12.
[5] Vieira Sandra, Pinaya Walter HL, Mechelli Andrea. Using deep learning to investigate the neuroimaging correlates of psychiatric and neurological disorders: methods and applications. Neurosci Biobehav Rev 2017;74:58–75.
[6] de Filippis Renato, Carbone Elvira Anna, Gaetano Raffaele, Bruni Antonella, Pugliese Valentina, Segura-Garcia Cristina, De Fazio Pasquale. Machine learning techniques in a structural and functional MRI diagnostic approach in schizophrenia: a systematic review. Neuropsychiatric Dis Treat 2019;15:1605.
[7] Stein Dan J, Benjet Corina, Gureje Oye, Lund Crick, Scott Kate M, Poznyak Vladimir, van Ommeren Mark. Integrating mental health with other non-communicable diseases. BMJ 2019;364:l295.
[8] Mikolas Pavol, Hlinka Jaroslav, Skoch Antonin, Pitra Zbynek, Frodl Thomas, Spaniel Filip, Hajek Tomas. Machine learning classification of first-episode schizophrenia spectrum disorders and controls using whole brain white matter fractional anisotropy. BMC Psychiatr 2018;18(1):97.
[9] Stamate Daniel, Katrinecz Andrea, Stahl Daniel, Verhagen Simone JW, Delespaul Philippe AEG, van Os Jim, Guloksuz Sinan. Identifying psychosis spectrum disorder from experience sampling data using machine learning approaches. Schizophr Res 2019;209:156–63.
[10] Vieira Sandra, Gong Qi-yong, Pinaya Walter HL, Scarpazza Cristina, Tognin Stefania, Crespo-Facorro Benedicto, Tordesillas-Gutierrez Diana, Ortiz-García Victor, Setien-Suero Esther, Scheepers Floortje E, et al. Using machine learning and structural neuroimaging to detect first episode psychosis: reconsidering the evidence. Schizophr Bull 2020;46(1):17–26.
[11] Larson Molly K, Walker Elaine F, Compton Michael T. Early signs, diagnosis and therapeutics of the prodromal phase of schizophrenia and related psychotic disorders. Expert Rev Neurother 2010;10(8):1347–59.
[12] Botha K, Moletsane M. Western and African aetiological models. In: Abnormal psychology. A South African perspective; 2012. p. 80–99.
[13] Morel Didier, Yu Kalvin C, Liu-Ferrara Ann, Caceres-Suriel Ambiorix J, Kurtz Stephan G, Tabak Ying P. Predicting hospital readmission in patients with mental or substance use disorders: a machine learning approach. Int J Med Inf 2020:104136.
[14] Scarpazza Cristina, Baecker Lea, Vieira Sandra, Mechelli Andrea. Applications of machine learning to brain disorders. In: Machine learning. Elsevier; 2020. p. 45–65.
[15] Agbu Jane-Frances. “Language and thoughts”. In: Agiobu Kemmer I, editor. Essentials of psychology. Springfield Books; 2004.
[16] Grunze Heinz. Bipolar disorder. In: Neurobiology of brain disorders. Elsevier; 2015. p. 655–73.
[17] Anastopoulos Arthur D, Guevremont David C, Shelton Terri L, DuPaul George J. Parenting stress among families of children with attention deficit hyperactivity disorder. J Abnorm Child Psychol 1992;20(5):503–20.
[18] Green Christopher, Chee Kit. Understanding ADHD. A parent's guide to attention deficit hyperactivity disorder in children. London: Vermilion; 1997.
[19] Biederman Joseph, Mick Eric, Faraone Stephen V. Age-dependent decline of symptoms of attention deficit hyperactivity disorder: impact of remission definition and symptom type. Am J Psychiatr 2000;157(5):816–8.
[20] Sartipi Shadi, Kalbkhani Hashem, Ghasemzadeh Peyman, Shayesteh Mahrokh G. Stockwell transform of time-series of fMRI data for diagnoses of attention deficit hyperactive disorder. Appl Soft Comput 2020;86:105905.
[21] Saha Sukanta, Chant David, Welham Joy, McGrath John. A systematic review of the prevalence of schizophrenia. PLoS Med 2005;2(5).
[22] Iadecola Costantino. The pathobiology of vascular dementia. Neuron 2013;80(4):844–66.
[23] Hublin CG, Partinen Markku M. The extent and impact of insomnia as a public health problem. J Clin Psychiatry Prim Care Companion 2002;4(8–12).
[24] Tsoumakas Grigorios, Katakis Ioannis. Multi-label classification: an overview. Int J Data Warehous Min 2007;3(3):1–13.
[25] Miotto Riccardo, Li Li, Kidd Brian A, Dudley Joel T. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci Rep 2016;6(1):1–10.
[26] Nunes Luciano Comin, Rogério Pinheiro Plácido, Pequeno Cavalcante Tarcísio, Dantas Pinheiro Mirian Calíope. Handling diagnosis of schizophrenia by a hybrid method. Comput Math Methods Med 2015;2015:13. https://doi.org/10.1155/2015/987298.
[27] Kalanderian Hripsime, Nasrallah Henry A. Artificial intelligence in psychiatry. Curr Psychiatr 2019;18(8):33–8.
[28] Mbunge Elliot, Makuyana Ralph, Chirara Nation, Chingosho Antony. Fraud detection in e-transactions using deep neural networks-a case of financial institutions in Zimbabwe. Int J Sci Res 2017;6(9):1036–40.
[29] Chu Lei, Qiu Robert, Liu Haichun, Ling Zenan, Zhang Tianhong, Wang Jijun. Individual recognition in schizophrenia using deep learning methods with random forest and voting classifiers: insights from resting state EEG streams. 2017. arXiv preprint arXiv:1707.03467.
[30] Bashyal Shishir. Classification of psychiatric disorders using artificial neural network. In: International symposium on neural networks. Springer; 2005. p. 796–800.
[31] Fashoto Stephen Gbenga, Akinnuwesi Boluwaji, Owolabi Olumide, Adelekan David. Decision support model for supplier selection in healthcare service delivery using analytical hierarchy process and artificial neural network. Afr J Bus Manag 2016;10(9):209.
[32] Gnana Sheela K, Deepa Subramaniam N. Review on methods to fix number of hidden neurons in neural networks. Math Probl Eng 2013:2013.
[33] Gardner Matt W, Dorling SR. Artificial neural networks (the multilayer perceptron) - a review of applications in the atmospheric sciences. Atmos Environ 1998;32(14–15):2627–36.
[34] Campese Stefano, Lauriola Ivano, Scarpazza Cristina, Sartori Giuseppe, Aiolli Fabio. Psychiatric disorders classification with 3D convolutional neural networks. In: INNS big data and deep learning conference. Springer; 2019. p. 48–57.
[35] Durstewitz Daniel, Koppe Georgia, Meyer-Lindenberg Andreas. Deep neural networks in psychiatry. Mol Psychiatr 2019;24(11):1583–98.
[36] Lanillos Pablo, Oliva Daniel, Philippsen Anja, Yamashita Yuichi, Nagai Yukie, Cheng Gordon. A review on neural network models of schizophrenia and autism spectrum disorder. Neural Network 2020;122:338–63.
[37] Balas Valentina Emilia, Roy Sanjiban Sekhar, Sharma Dharmendra, Samui Pijush. Handbook of deep learning applications, vol 136. Springer; 2019.
[38] Hua Jing, Zhong Zichun. Spectral geometry of shapes. Academic Press; 2020.
[39] Gu Jiuxiang, Wang Zhenhua, Kuen Jason, Ma Lianyang, Shahroudy Amir, Shuai Bing, Liu Ting, Wang Xingxing, Wang Gang, Cai Jianfei, et al. Recent advances in convolutional neural networks. Pattern Recogn 2018;77:354–77.
[40] Hadji Isma, Wildes Richard P. What do we understand about convolutional networks? 2018. arXiv preprint arXiv:1803.08834.
[41] Miikkulainen Risto, Liang Jason, Meyerson Elliot, Rawal Aditya, Fink Daniel, Francon Olivier, Raju Bala, Shahrzad Hormoz, Navruzyan Arshak, Duffy Nigel, et al. Evolving deep neural networks. In: Artificial intelligence in the age of neural networks and brain computing. Elsevier; 2019. p. 293–312.
[42] Li Shuai, Li Wanqing, Cook Chris, Zhu Ce, Gao Yanbo. Independently recurrent neural network (IndRNN): building a longer and deeper RNN. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 5457–66.
[43] Buda Mateusz, Maki Atsuto, Mazurowski Maciej A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Network 2018;106:249–59.
[44] Liu Xu-Ying, Wu Jianxin, Zhou Zhi-Hua. Exploratory undersampling for class-imbalance learning. IEEE Trans Syst Man Cybern B Cybern 2008;39(2):539–50.
[45] Ali Aida, Shamsuddin Siti Mariyam, Ralescu Anca L, et al. Classification with class imbalance problem: a review. Int J Adv Soft Comput Appl 2015;7(3):176–204.
[46] Guo Xinjian, Yin Yilong, Dong Cailing, Yang Gongping, Zhou Guangtong. On the class imbalance problem. In: 2008 Fourth international conference on natural computation, vol. 4. IEEE; 2008. p. 192–201.
[47] Chawla Nitesh V, Bowyer Kevin W, Hall Lawrence O, Kegelmeyer W Philip. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
[48] Pruengkarn Ratchakoon, Wong Kok Wai, Fung Chun Che. Imbalanced data classification using complementary fuzzy support vector machine techniques and SMOTE. In: 2017 IEEE international conference on systems, man, and cybernetics (SMC). IEEE; 2017. p. 978–83.
[49] Saad Abdelrahman I, Omar Yasser MK, Maghraby Fahima A. Predicting drug interaction with adenosine receptors using machine learning and SMOTE techniques. IEEE Access 2019;7:146953–63.
[50] Adejumo Adebowale O, Ikoba Nehemiah A, Suleiman Esivue A, Okagbue Hilary I, Oguntunde Pelumi E, Odetunmibi Oluwole A, Job Obalowu. Quantitative exploration of factors influencing psychotic disorder ailments in Nigeria. Data in Brief 2017;14:175–85.
[51] Bergstra James, Bengio Yoshua. Random search for hyper-parameter optimization. J Mach Learn Res 2012;13(Feb):281–305.
[52] Vabalas Andrius, Gowen Emma, Poliakoff Ellen, Casson Alexander J. Machine learning algorithm validation with a limited sample size. PLoS One 2019;14(11):e0224365.
[53] Kaplan Katherine A, Harvey Allison G. Behavioral treatment of insomnia in bipolar disorder. Am J Psychiatr 2013;170(7):716–20.
[54] Plis Sergey M, Hjelm Devon R, Salakhutdinov Ruslan, Allen Elena A, Bockholt Henry J, Long Jeffrey D, Johnson Hans J, Paulsen Jane S, Turner Jessica A, Calhoun Vince D. Deep learning for neuroimaging: a validation study. Front Neurosci 2014;8:229.
[55] Han Xiaobing, Zhong Yanfei, He Lifang, Yu Philip S, Zhang Liangpei. The unsupervised hierarchical convolutional sparse auto-encoder for neuroimaging data classification. In: International conference on brain informatics and health. Springer; 2015. p. 156–66.
[56] Akinnuwesi Boluwaji A, Fashoto Stephen G, Metfula Andile S, Akinnuwesi Adetutu N. Experimental application of machine learning on financial inclusion data for governance in Eswatini. In: Conference on e-Business, e-Services and e-Society. Springer; 2020. p. 414–25.
[57] Fashoto Stephen, Mbunge Elliot, Ogunleye Gabriel, Van den Burg Johan. Implementation of machine learning for predicting maize crop yields using multiple linear regression and backward elimination. Malay J Comput 2021;6(1):679–97.