Array
journal homepage: www.elsevier.com/locate/array
Keywords: Depression assessment, Machine learning, Voting algorithm, Explainable AI

Depression is a common psychiatric disorder that is becoming more prevalent in developing countries like Bangladesh. It has been found to be prevalent among youths and influences a person's lifestyle and thought process. Unfortunately, due to the public and social stigma attached to this disease, the mental health issues of individuals are often overlooked. Early diagnosis of patients who may have depression often helps to provide effective treatment. This research aims to develop mechanisms to detect and predict depression levels, and these mechanisms were applied to university students in Bangladesh. In this work, a questionnaire containing 106 questions has been constructed. The questions in the questionnaire are primarily of two kinds: (i) personal and (ii) clinical. The questionnaire was distributed amongst Bangladeshi students, and a total of 684 responses from participants aged between 19 and 35 were obtained. Participants were allowed to take the survey only after giving appropriate consent. After carefully scrutinizing the responses, 520 samples were taken into final consideration. A hybrid depression assessment scale was developed using a voting algorithm that employs eight well-known existing scales to assess the depression level of an individual. This hybrid scale was then applied to the collected samples, which comprise personal information and questions from various familiar depression measuring scales. In addition, ten machine learning and two deep learning models were applied to predict the three classes of depression (normal, moderate and extreme). Five hyperparameter optimizers and nine feature selection methods were employed to improve predictive performance. Accuracies of 98.08%, 94.23%, and 92.31% were obtained using Random Forest, Gradient Boosting, and CNN models, respectively. With its optimized hyperparameters, Random Forest achieved the fewest false negatives and the highest F-measure. Finally, LIME, an explainable AI framework, was applied to interpret and retrace the prediction output of the machine learning models.
∗ Corresponding author.
E-mail address: [email protected] (S. Momen).
https://doi.org/10.1016/j.array.2023.100291
Received 27 February 2023; Received in revised form 25 April 2023; Accepted 3 May 2023
Available online 10 May 2023
2590-0056/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
R. Siddiqua et al. Array 18 (2023) 100291
This study followed various measures to assess and predict depression, including establishing an integrated set of questionnaires and a new scale for measuring depression, collecting a private dataset, and analyzing and evaluating data. After the questionnaires were created, data were collected in two ways: (1) via a Google Form and (2) via a printed form. In both cases, a consent form was provided on the first page, and individuals were required to sign it to participate in this work. The major contributions of this work are as follows:

• In order to investigate depression levels amongst Bangladeshi university students, a dataset has been collected using a questionnaire in both online and offline formats.
• A new scale combining eight well-known existing scales has been created to measure depression as either normal, moderate or extreme. A voting technique has been proposed to combine the eight existing scales.
• Machine learning and deep learning algorithms have been applied to classify three categories of depression. Hyperparameter optimizers and feature selection methods are used to improve the performance of these models.
• LIME, an explainable AI tool, has been used to understand the final predictions provided by the machine learning models.
• The created models were validated using an independently collected dataset from a different time period. The results show that the performance on this independent dataset is comparable to the performance on the test dataset, indicating the robustness of the models.

To the best of our knowledge, this is the first time a new integrated scale based on a voting algorithm has been introduced, and a dataset of Bangladeshi participants has been used to investigate and comprehend the level of depression among university students.

The rest of this article is organized as follows. Related works are addressed in Section 2. Section 3 discusses the methodology adopted in this work. Section 4 examines the acquired outcomes. Section 5 discusses how the outcomes achieved our research objectives. Section 6 discusses the limitations of this work, and finally, Section 7 concludes with remarks on our future work.

2. Related work

Psychiatric disorders, especially depression, are highly prevalent among young people, with potentially severe complications. Appropriate and early care for this disease is of paramount importance in dealing with its consequent impairing problems. However, such care is not always possible because of the social stigma and misconceptions associated with the treatment of mental illness [10]. Considerable work has been performed to detect depression using depression measuring scales and artificial intelligence techniques. This section briefly discusses some of the notable works in this area.

2.1. Related work on depression prediction using machine learning

In recent years, with the immense growth of artificial intelligence and machine learning techniques, many studies have been performed to detect depression and anxiety-related mental disorders employing a wide range of private and public datasets. Nemesure and his team utilized ensemble-based machine learning approaches to predict the depression and anxiety of undergraduate students [11]. More than four thousand students from the University of Nice Sophia-Antipolis participated in this study. The XGBoost model attained an AUC of 0.67 on the validation set. Lee and Kim applied machine learning frameworks to predict the depressive behavior of American adults [12]. They used the United States National Health and Nutrition Examination Survey (NHANES) dataset. NHANES conducted a PHQ-9 survey of approximately 8600 people over ten years. Boruta and LASSO feature selection approaches have been used in this work to select the dominant attributes. SVM, ANN and a wide range of ensemble models have been used to classify the depressed and non-depressed samples. The SVM and ANN techniques achieved the highest accuracy and AUC of 77.1% and 0.813, respectively. Su et al. [13] proposed an automatic system to forecast the depression of the Chinese elderly population using the CLHLS survey dataset. A KNN-based imputation framework has been used to substitute the missing data samples in the preprocessing step. Gradient-Boosted Decision Tree achieved the overall best performance with 75.9% accuracy and 0.63 AUC.

Ryu and colleagues developed a machine learning-based depression forecasting system for stroke survivors [14]. The NIHSS survey and the Hamilton depression prediction index were administered to 623 individuals from a medical center in Korea. SVM and KNN reported accuracies of 77.5% and 73.3%, respectively. Haque et al. [15] developed an automated depression detection scheme for children using the Young Minds Matter dataset. The Boruta approach has been used to select the important features. The Decision Tree classifier produced 95% accuracy and 0.99 precision.

There has been considerable work on predicting depression among Bangladeshi individuals using locally collected datasets. For instance, Choudhury and his team used deep learning and five machine learning algorithms to predict depression among Bangladeshi undergraduate students based on some basic questionnaires [7]. Recursive Feature Elimination with Cross-Validation and Random Forest Classification was used for selecting salient features. A total of 65 questions were included in their dataset: 7 basic information questions, 16 depression-related queries, 21 BDI questions and 21 DASS 21-BV (Bangla version) questions. After pre-processing, 577 participants were considered in this study.

Zulfiker and his colleagues [16] used six different machine learning classifiers and three distinct feature selection methods to predict depression and extract relevant features. The Synthetic Minority Oversampling Technique (SMOTE) [17] and the Burns Depression Checklist (BDC) were also used in this work. The AdaBoost technique with the SelectKBest feature selection approach provided the best performance, with 0.9256 classification accuracy.

Ahmed and teammates used five machine learning classifiers on two datasets to predict depression and anxiety [18]. In addition, two well-known depression and anxiety measuring scales were used in this work. The CNN model attained the highest accuracies of 0.96 for anxiety and 0.968 for depression detection using 45 epochs. In [5], the authors utilized a binary logistic model to predict depression for Bangladeshi university students. The authors collected a private dataset from an online survey of 210 participants using the DASS-21 scale. According to this study, relationships with parents and friends, bedtime and dietary patterns, and family socioeconomic status are the primary factors of depression and anxiety. Moon et al. employed various machine learning and ensemble approaches to predict the depression of employed Bangladeshis [19].

2.2. Related work on depression measuring scaling system

In order to measure depression levels, clinicians develop a set of questions that are given to an individual under assessment. The questions typically contain options from which the individual needs to select. Based on the responses provided, a score is given per question. The total score obtained is then used to assess the level of depression of an individual. Different scaling systems exist, which are used by psychiatrists to diagnose the level of depression one is suffering from [20]. A new integrated scale has been developed in this work using a majority voting technique on eight existing depression assessment scales, i.e., BDI, HDRS, MADRS, EQ-5, PHQ-9, QIDS-SR, DASS-21 and K-10. These assessment scales have been successfully utilized in many works to detect depression and anxiety-associated mental disorders. Some of these recent articles are briefly presented below.
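As a concrete illustration of the total-score approach described above, the following Python sketch sums the nine PHQ-9 item responses (each scored 0 to 3) and maps the total to the PHQ-9 severity bands listed in Table 2. The function names and the sample responses are hypothetical; this is an illustration, not code from the study.

```python
# Hypothetical sketch: total-score scoring of a questionnaire-based scale,
# using the PHQ-9 severity bands from Table 2.

PHQ9_BANDS = [
    # (highest total score in the band, band label)
    (4, "Minimal depression"),
    (9, "Mild depression"),
    (14, "Moderate depression"),
    (19, "Moderately severe depression"),
    (27, "Severe depression"),
]


def phq9_total(item_scores):
    """Sum the nine PHQ-9 item responses, each scored 0, 1, 2 or 3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each PHQ-9 item is scored 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_level(total):
    """Map a PHQ-9 total (0-27) to its severity band.

    Table 2 starts the minimal band at 1; this sketch also places a total
    of 0 in that band.
    """
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    for upper, label in PHQ9_BANDS:
        if total <= upper:
            return label


answers = [1, 2, 1, 0, 3, 1, 2, 1, 0]  # hypothetical responses
total = phq9_total(answers)
print(total, phq9_level(total))  # 11 Moderate depression
```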
Table 1
A comparative analysis of related works.
| Ref. | Subject | Sample size | Age range (years) | Data collection means | Advantage | Limitation |
|---|---|---|---|---|---|---|
| [11] | UG students | 4184 | 18–20 mostly | Electronic health records | Investigates depression and anxiety disorder | Low AUC score |
| [15] | Children and adolescents | 6310 | 4–17 | Australian children and adolescent survey | The dataset contains a rich feature set | The samples are in the age range 4–17 |
| [13] | Adult Chinese | 1538 | 35–64 | Chinese Longitudinal Healthy Longevity Study | The survey data was collected over a longer time period | Low accuracy |
| [14] | Stroke survivors | 65 | 47–79 | Comparable-aged stroke patients with 4-week screening were polled | Various cognitive and functional analyses were performed | Dataset with low sample size |
| [12] | Adults with hypertension | 8628 | ≥40 | PHQ-9 survey | The survey data was collected over a comparatively longer time period | Low accuracy |
| [7] | UG students | 577 | – | Beck Depression Scale and DASS-21 | Multiple depression scales were used | Low accuracy |
| [16] | Bangladeshi participants | 604 | 16 and above | Burns Depression Checklist (BDC) | Age distribution was wide | Depression level has not been predicted |
| [18] | Bangladeshi women | – | 15–35 | Lifestyle-related questions | Multiple levels of depression and anxiety were measured | Limited details on dataset distribution |
Ustun [21] used the Beck Depression Inventory scale to determine the depression level among the citizens of Turkey during COVID-19. A custom dataset with 1115 samples was collected through Google Forms over a span of ten days. Among the survey participants, 47%, 25.7%, 22.3%, and 5% were found to have minimal, mild, moderate, and severe depression, respectively. Saha and his colleagues [22] applied the Kessler K-10 metric to determine the level of psychological distress among Bangladeshi undergraduate students, particularly from Dhaka city. The authors collected 180 records using an online survey over a span of two weeks. Their dataset contained 28 features comprising coronavirus-related stress factors. This study found that almost 40% and 30.56% of students struggle with mild and moderate psychological distress, respectively. The Hamilton Scale for Anxiety (HAMA), the Hamilton Depression Rating Scale (HDRS), and Beck's Suicide Intent Scale were used in an observational study [23] with a set of predetermined questionnaires for Indian students. This study predicted anxiety, despair, and suicide intent among undergraduate dental students and identified various stressors. Extended study hours, exhaustive workload, frequency of tests, competition among peers, and fear of failing were revealed to be statistically significant stressors.

Ibrahim et al. used the Zagazig Depression Scale (ZDS) to measure the existence of psychological illness in a group of Egyptian undergraduate students [24]. In this work, ZDS, an individually assessed Arabic-language interpretation of the Hamilton Rating Scale, was used to determine the prevalence of mental disorder symptoms. Participants had an average ZDS score of approximately 18 and a maximum of 20. These ZDS scores indicate that more than 71% of the survey participants suffer from mild depression. Guo [25] used the HAMD-17, CES-D, and WHOQOL-BREF evaluation metrics on undergraduate students with SSD to assess the impacts of electroacupuncture and cognitive behavioral therapy, or their combined effects, on mental disorders.

Ozawa conducted cross-sectional research on Japanese outpatients with ICD-10-defined depressive disorder [26]. The Montgomery–Asberg Depression Rating Scale (MADRS) was used in this study with 100 depressed outpatients and 36 healthy family members. In [27], the authors investigated a large number of people with abrupt mood swings and subsyndromal depression symptoms. The integrated characteristics specifier was defined as a score of 1 to 3 on selected items of the MADRS or the Hamilton Depression Rating Scale (HAMD-17). According to Burchert and his colleagues [28], users of smartphone health applications are eager to analyze everyday depression symptoms. The Patient Health Questionnaire (PHQ-9) was used for analyzing short-term mood dynamics.

Wang and teammates [29] aimed to create a mapping framework that connects the acromegaly quality of life (AcroQoL) assessment survey to the EQ-5D-5L survey, in order to provide a preference-based score that could be used in socioeconomic assessments. In this study, 424 adult individuals had an average EQ-5D-5L score of approximately 0.80 with a standard deviation of 0.15. Mergen and colleagues evaluated the validity and accuracy of the Quick Inventory of Depressive Symptomatology (QIDS-SR16) and its American lexicon version using Turkish student participants [30]. Lu et al. evaluated the gender-based measurement invariance of the DASS-21 scale [31]. A total of 13,208 students (4985 men and 8223 women) from five different cities participated in this study. The average indices for the DASS-21 depression, anxiety, and stress sub-scales were 2.17, 2.48, and 3.69, respectively.

Table 1 provides a summary comparison of related works. The following observations have been made after careful inspection of the related work: (i) no hybrid scale has been used to measure the level of depression, (ii) predictive models were typically applied to a single scale rather than to multiple scales, and (iii) most of the works do not provide an explanation of their predictions.

3. Methodology and model architecture

Fig. 1 shows the primary steps taken in this research. A questionnaire is first created and distributed amongst Bangladeshi students, and the responses form the dataset. The dataset is preprocessed so that it is in a format conducive to machine learning algorithms. The dataset is then organized into two forms: (i) dataset 1, which comprises the 70 personal/basic questions of the respondents, and (ii) dataset 2, which contains all 106 questions, i.e., the personal questions as well as the clinical questions.

3.1. Questionnaires creation

One of the significant contributions of this work is to create a dataset of integrated depression assessment scales. The dataset is composed of two kinds of questions: (i) personal questions and (ii) clinical questions. After careful inspection of the literature, a total of nine
important areas were identified, which subsequently led to 70 personal/basic questions. Fig. 2 shows a mind map of the nine major areas and relevant parameters. An additional 36 questions were created from various well-established depression assessment scales: BDI, HDRS, MADRS, EQ-5, PHQ-9, QIDS-SR, DASS-21 and K-10. Three randomly chosen Bangladeshi undergraduate students (who had not seen the questionnaires before) were asked to take an initial survey to determine whether the selected questions were understandable to them. The initial survey took 20–25 min to complete. The volunteers found several HDRS depression assessment scale questions confusing, including three questions from the personal questionnaire. Consequently, these HDRS questions were translated into Bangla, the participants' native language. Finally, the evaluation questionnaire was finalized.

3.2. Data collection

The survey was conducted online using Google Forms and offline (printed copy) over about nine and a half weeks, i.e., from October 23, 2021, to December 28, 2021. A consent agreement document for participating in the survey was attached on the first page. The consent form gave participants a brief overview of the study and stated that their participation was voluntary. It was clearly stated that no traceable personal information would be collected and that participants could stop taking the survey and withdraw at any time. A total of 684 records were collected, of which 312 were offline and 372 were online submissions. In total, 335 males and 349 females aged between 19 and 35 participated in the study.

3.3. Dataset preprocessing

This section describes the data cleaning, organization and visualization steps and, finally, the handling of the questions.

3.3.1. Data cleaning

Some of the 684 collected records were incomplete, so they were removed from the final dataset. Only 148 of the 312 offline submissions remained after this removal process. Duplicate checking and feature renaming were then conducted. Label and one-hot encodings were applied to provide a numerical representation of the categorical features. For example, 'Undergraduate' and 'Postgraduate' were set to 1 and 2, respectively, in the feature named 'degree'. Null-valued entries were replaced with their corresponding mean values. A min–max scaler was used to normalize the features so that all numerical features share a common range.

3.3.2. Selecting questions from familiar depression assessment tools

This work used eight depression measurement scales, i.e., BDI, HDRS, QIDS, MADRS, EQ5D, DASS21, PHQ-9 and K10. The selected questions from five scales (i.e., BDI, HDRS, QIDS, EQ5D, and DASS21) were taken verbatim. For the other three scales, questions that overlapped with those already selected from the five scales were not included, thus avoiding duplication of questions.

To create each category, values were assigned to each question for that particular scale and scores were calculated by summing up
those values. Each scale has its own range for measuring depression levels, mostly with 4 to 6 classes (except EQ5D, which has 2 classes). Finally, the range of each scale was normalized and expressed using three common depression classification levels, i.e., Normal, Moderate and Extreme, to maintain consistency and achieve better results.

Fig. 2. Mind map of the personal questionnaires. The blue boxes show the nine major areas that were identified. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
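The preprocessing steps of Section 3.3.1 (label encoding, one-hot encoding, mean imputation and min–max scaling) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline. The `residence` and `age` columns, the sample values and the helper names are hypothetical, and the scaler is assumed to map to the range [0, 1].

```python
# Hypothetical sketch of the Section 3.3.1 preprocessing steps using plain
# Python lists; the column names and values are illustrative only.

DEGREE_CODES = {"Undergraduate": 1, "Postgraduate": 2}  # label encoding from the text


def one_hot(values):
    """One-hot encode a categorical column (categories sorted for a stable order)."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories] for value in values]
    return rows, categories


def impute_mean(column):
    """Replace missing (None) entries with the mean of the observed entries."""
    observed = [value for value in column if value is not None]
    mean = sum(observed) / len(observed)
    return [mean if value is None else value for value in column]


def min_max_scale(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to all zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(value - low) / (high - low) for value in column]


degree = [DEGREE_CODES[d] for d in ["Undergraduate", "Postgraduate", "Undergraduate"]]
age = min_max_scale(impute_mean([20, None, 26]))
residence, residence_categories = one_hot(["Hall", "Home", "Hall"])
print(degree)                           # [1, 2, 1]
print(age)                              # [0.0, 0.5, 1.0]
print(residence_categories, residence)  # ['Hall', 'Home'] [[1, 0], [0, 1], [1, 0]]
```

In a full pipeline, the imputation mean and the min/max bounds would normally be computed on the training split only and then reused on the test split, so that no test information leaks into training.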
Table 2
Synopsis of the eight depression measuring scales used in this study.
| Scales | Levels of depression based on total score | No. of questions | Depression categories of new scale |
|---|---|---|---|
| Beck Depression Inventory (BDI) | Normal (1–10); Mild mood disturbance (11–16); Borderline clinical depression (17–20); Moderate depression (21–30); Severe depression (31–40); Extreme depression (over 40) | 21 | Normal (0–13); Moderate (14–27); Extreme (over 27) |
| Hamilton Depression Rating Scale (HDRS) | Normal (0–7); Mild depression (8–16); Moderate depression (17–23); Severe depression (over 23) | 17 | Normal (0–10); Moderate (11–22); Extreme (over 22) |
| The Quick Inventory of Depressive Symptomatology (QIDS-SR16) | Normal (0–5); Mild depression (6–10); Moderate depression (11–15); Severe depression (16–20); Very severe depression (21–27) | 16 | Normal (0–8); Moderate (9–17); Extreme (18–27) |
| Montgomery and Asberg Depression Rating Scale (MADRS) | Depressive symptoms absent (0–8); Mild (9–17); Moderate (18–34); Severe (35–60) | 10 | Normal (0–15); Moderate (16–30); Extreme (over 30) |
| EQ-5D | Worst (0); Best (100) | 5 | Normal (0–8); Extreme (over 8) |
| Depression, Anxiety and Stress Scale (DASS21) | Normal (0–9); Mild (10–13); Moderate (14–20); Severe (21–27); Extremely severe (over 27) | 21 | Normal (0–13); Moderate (14–28); Extreme (over 28) |
| Patient Health Questionnaire (PHQ-9) | Minimal depression (1–4); Mild depression (5–9); Moderate depression (10–14); Moderately severe depression (15–19); Severe depression (20–27) | 9 | Normal (0–7); Moderate (8–16); Extreme (17–27) |
| Kessler Psychological Distress Scale (K10) | Normal (10–19); Mild disorder (20–24); Moderate disorder (25–29); Severe disorder (30–50) | 10 | Normal (0–22); Moderate (23–30); Extreme (31–50) |
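The right-hand column of Table 2 can be applied mechanically. The sketch below assumes that a total score has already been computed for each scale and maps that total to the new scale's three categories. The dictionary and function names are hypothetical. Because EQ-5D has no Moderate band in Table 2, any EQ-5D total above 8 is treated as Extreme.

```python
# Sketch: map a scale's total score to the three categories of the new scale,
# using the "Depression categories of new scale" column of Table 2.
# Each entry is (highest Normal score, highest Moderate score); None marks a
# scale with no Moderate band (EQ-5D in Table 2).

NEW_SCALE_CUTOFFS = {
    "BDI": (13, 27),
    "HDRS": (10, 22),
    "QIDS": (8, 17),
    "MADRS": (15, 30),
    "EQ5D": (8, None),
    "DASS21": (13, 28),
    "PHQ-9": (7, 16),
    "K10": (22, 30),
}


def new_scale_level(scale, total):
    """Return 'normal', 'moderate' or 'extreme' for a total score on one scale."""
    normal_max, moderate_max = NEW_SCALE_CUTOFFS[scale]
    if total <= normal_max:
        return "normal"
    if moderate_max is not None and total <= moderate_max:
        return "moderate"
    return "extreme"


print(new_scale_level("BDI", 20))  # moderate
print(new_scale_level("EQ5D", 9))  # extreme
print(new_scale_level("K10", 31))  # extreme
```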
3.4. New depression assessment scale creation

In this research, a new scale was created by applying the voting technique to the eight different depression measuring scales discussed in Table 2. The eight scales do not use the same set of questions. One of the critical motivations for combining the eight scales is that doing so increases the feature space, which facilitates analyzing depression on a much wider range of factors. This also improves the predictability of the machine learning algorithms. To determine the level of depression for a new record, the record is assessed against each of the eight depression scales. The most frequent label (normal, moderate, extreme) serves as the final depression level for this record. However, to handle equal
votes, the lower depression level was taken into account. For example, if 'normal' and 'moderate' both obtain the maximum number of votes, then 'normal' is taken as the level of depression. This new scale was later used as the ground truth when applying supervised machine learning techniques. The pseudocode of the voting algorithm is shown in Algorithm 1. Finally, according to the new scale, 197 students were labeled normal, 191 students were labeled moderately depressed, and 132 students were found to be extremely depressed. The depression detection questionnaire and the scale creation procedures are illustrated in Fig. 3.

3.5. Data visualization

Fig. 4 provides information on the percentage of depression levels and suicidal thoughts among male and female students in the collected depression dataset. According to this figure, women are more vulnerable to both depression and suicidal thoughts than men. Depression was found in about one-third (33.27%) of the total respondents, whereas the male percentage was 28.84%. Fig. 4(b) shows the distribution of suicidal thoughts across gender. Of the respondents who have suicidal thoughts, 56.6% (120 out of 212) were female and 43.4% (92 out of 212) were male.

Fig. 4. Percentage distribution of respondent (a) depression level, (b) suicidal thoughts by gender.

Fig. 5 shows the percentage of depression levels among those who have experienced emotional and sexual violence. According to Fig. 5, 71.23% and 37.73% of all participants admitted that they had faced emotional and sexual violence, respectively, which provokes depression in their lives.

• Fig. 5(a) shows that both the "Strongly Agree" and "Agree" responses include a large number of students who are extremely depressed due to emotional violence, whereas the normal percentage (1.42%) is very low. Moreover, among the "Strongly Agree" instances, the extreme case (27.36%) is almost twice the moderate case (14.62%).
• Fig. 5(b) shows that a small portion of respondents have experienced sexual violence, but those who have been victims of sexual violence suffer from depression. On closer inspection of both the "Strongly Agree" and "Agree" instances in Fig. 5(b), no student was found without depression. On top of that, the extreme depression percentage is higher than the moderate percentage for sexual violence.

Fig. 6 shows the proportion of depression levels among survey respondents who face financial hardship. According to Fig. 6, more than half (57.41%) of the students were found both to have families that are financially dependent on them due to their current financial difficulties and to have a moderate to extreme level of depression.

According to Fig. 7(a), 323 out of 520 (62.11%) regular smokers have moderate to severe depression. It is also worth noting that there is a correlation between a lack of physical activity and depression. According to Fig. 7(b), 67.76% of participants who have never or infrequently engaged in physical exercise have moderate to severe depression.

Empirical findings also show that families play a critical role in people's lives, to the point where people feel pressured by family members. Participants were provided with six statements and asked to rate how much pressure they feel from their family members. Fig. 8 indicates that the pressure towards career choice, higher studies and the lifestyle one adopts is slightly higher than the pressure one feels towards marriage. However, a significant number of respondents feel extremely stressed in their role as a sibling or a child.

Algorithm 1 Voting algorithm to measure depression
procedure VotingAlgorithm(Record r)
    scales = [BDI, HDRS, QIDS, MADRS, EQ5D, DASS21, PHQ-9, K10]
    normalFreq ← 0
    moderateFreq ← 0
    extremeFreq ← 0
    for i in scales do
        depLevel ← depression level of Record r according to scale i
        if depLevel == normal then
            normalFreq ← normalFreq + 1
        else if depLevel == moderate then
            moderateFreq ← moderateFreq + 1
        else
            extremeFreq ← extremeFreq + 1
        end if
    end for
    if normalFreq ≥ moderateFreq and normalFreq ≥ extremeFreq then
        depressionLevel ← normal
    else if moderateFreq > normalFreq and moderateFreq ≥ extremeFreq then
        depressionLevel ← moderate
    else
        depressionLevel ← extreme
    end if
    return depressionLevel
end procedure
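A minimal Python sketch of Algorithm 1 follows. It assumes that each of the eight scales has already assigned the record one of the three labels (for example, using the Table 2 cut-offs). The function name `vote` and the sample record are hypothetical. Unlike the pseudocode, whose final `else` branch counts any unrecognized label as extreme, this sketch rejects unexpected labels.

```python
from collections import Counter

LEVELS = ("normal", "moderate", "extreme")  # ordered from lowest to highest


def vote(scale_levels):
    """Combine per-scale labels by majority vote; ties go to the lower level."""
    counts = Counter(scale_levels)
    unexpected = set(counts) - set(LEVELS)
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    best = max(counts[level] for level in LEVELS)
    # The first level (lowest first) that reaches the maximum count wins,
    # which matches the tie-breaking conditions of Algorithm 1.
    for level in LEVELS:
        if counts[level] == best:
            return level


# One hypothetical record, labelled by the eight scales in the order
# BDI, HDRS, QIDS, MADRS, EQ5D, DASS21, PHQ-9, K10.
record_levels = ["normal", "moderate", "moderate", "normal",
                 "extreme", "moderate", "normal", "extreme"]
print(vote(record_levels))  # normal
```

`LEVELS` is ordered from the lowest to the highest level, so returning the first level that reaches the maximum count reproduces the tie-breaking rule of Algorithm 1. In the example, normal and moderate both receive three votes, so the record is labelled normal.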
Fig. 5. Percentage of depression of survey respondents who have experienced (a) emotional violence and (b) sexual violence.
Since most of the participants are students, one source of depression is academic performance. Therefore, to understand how the respondents perceive their academic performance, seven statements were provided. The respondents used a 5-point Likert scale to specify their level of agreement with these statements. Fig. 9 shows that academic workload plays a critical role in the amount of stress they feel.

3.6. Algorithms application

After preprocessing the data, 520 records were obtained in the final dataset. A stratified approach was used to divide the dataset into two parts, thus ensuring that both parts contain a proportional representation of all classes. Dataset 1 contained only personal questions, while dataset 2 included all questions, i.e., the personal questions and the questions obtained from the depression assessment scales. Both datasets were split into training (90%) and testing (10%) sets. Therefore, 468 and 52 instances were assigned to the training and test sets, respectively. Ten machine learning techniques (Logistic Regression, Gradient Boosting, K-Nearest Neighbor, Random Forest, Decision Tree, Support Vector Machine, Perceptron, Naive Bayes (Gaussian), Naive Bayes (Multinomial) and the ZeroR Classifier) and two deep learning techniques (Artificial Neural Network (ANN) and Convolutional Neural Network (CNN)) were applied.
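The 468/52 split follows from a per-class 90/10 split of the three class sizes reported in Section 3.4 (197, 191 and 132). The Python sketch below illustrates one way to perform such a split. It rounds each class's test share independently; this rounding rule is an assumption and is not necessarily the allocation rule of the tool the authors used (for example, scikit-learn's `train_test_split` with `stratify`). The function name and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=42):
    """Return (train_idx, test_idx), drawing test_fraction of every class.

    Per-class test sizes are rounded independently, so the overall test
    size can differ slightly from round(test_fraction * len(labels)).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes produced by the voting scale (Section 3.4).
labels = ["normal"] * 197 + ["moderate"] * 191 + ["extreme"] * 132
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 468 52
```

With these class sizes, the per-class test counts are 20, 19 and 13, which add up to the 52 test instances reported above.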
Fig. 7. Distribution of (a) smoking habit and (b) physical exercises of the respondents.
4. Result analysis

This section discusses the results of the proposed automatic depression assessment system.

Accuracy, precision, recall and F1 score are used to measure the performance of the various machine learning and deep learning models. Eqs. (6) to (9) were used to determine the performance metrics.

$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\text{F1 score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{8}$$
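Eqs. (6)–(8) are defined per class, so for the three-class problem the scores must be averaged over classes. The averaging scheme is not stated in this section, so the sketch below assumes support-weighted averaging. Under that scheme, weighted recall always equals accuracy, which is consistent with the Recall column matching the Accuracy column for most rows of Table 3. The function name and the class labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (Eqs. (6)-(8)).

    Per-class scores with a zero denominator are set to 0.
    """
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = support[c] / n  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return accuracy, precision, recall, f1


# ZeroR-style baseline: a constant prediction that matches 14 of 52 labels.
y_true = ["a"] * 14 + ["b"] * 20 + ["c"] * 18
y_pred = ["a"] * 52
print([round(100 * m, 2) for m in weighted_metrics(y_true, y_pred)])
```

A constant prediction that matches 14 of 52 test labels gives 26.92% accuracy, the ZeroR accuracy in Table 3. For that case the sketch prints [26.92, 7.25, 26.92, 11.42]. This agrees with the ZeroR row of Table 3 up to the rounding of the precision value, which the table reports as 7.24%.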
Table 3
Performance metrics for different algorithms of dataset 1 and dataset 2.
| Algorithms | Datasets | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| Logistic Regression | Dataset-1 | 50.00% | 48.79% | 50.00% | 48.97% |
| Logistic Regression | Dataset-2 | 88.46% | 88.50% | 88.46% | 88.42% |
| Gradient Boosted Algorithm | Dataset-1 | 50.00% | 52.83% | 50.00% | 51.12% |
| Gradient Boosted Algorithm | Dataset-2 | 82.69% | 84.63% | 82.69% | 83.04% |
| K-Nearest Neighbor Algorithm | Dataset-1 | 63.46% | 60.77% | 61.54% | 59.72% |
| K-Nearest Neighbor Algorithm | Dataset-2 | 78.85% | 79.62% | 79.62% | 75.60% |
| Random Forest | Dataset-1 | 63.46% | 62.67% | 63.46% | 62.83% |
| Random Forest | Dataset-2 | 90.38% | 89.17% | 88.46% | 88.67% |
| Decision Tree | Dataset-1 | 59.62% | 63.14% | 61.54% | 61.44% |
| Decision Tree | Dataset-2 | 76.92% | 77.54% | 75.00% | 75.45% |
| Support Vector Machine | Dataset-1 | 57.69% | 58.17% | 57.69% | 57.82% |
| Support Vector Machine | Dataset-2 | 86.54% | 91.03% | 86.54% | 86.53% |
| Perceptron | Dataset-1 | 46.15% | 65.21% | 32.69% | 25.27% |
| Perceptron | Dataset-2 | 76.92% | 83.02% | 75.00% | 76.18% |
| Naive Bayes (Gaussian) | Dataset-1 | 55.77% | 41.07% | 55.77% | 47.23% |
| Naive Bayes (Gaussian) | Dataset-2 | 73.08% | 73.46% | 73.08% | 70.47% |
| Naive Bayes (Multinomial) | Dataset-1 | 61.54% | 62.45% | 61.54% | 61.70% |
| Naive Bayes (Multinomial) | Dataset-2 | 86.54% | 87.98% | 86.54% | 86.94% |
| ZeroR | Dataset-1 | 26.92% | 7.24% | 26.92% | 11.42% |
| ZeroR | Dataset-2 | 26.92% | 7.24% | 26.92% | 11.42% |
| ANN | Dataset-1 | 48.08% | 50.50% | 38.46% | 30.69% |
| ANN | Dataset-2 | 69.23% | 54.08% | 65.38% | 56.68% |
| CNN | Dataset-1 | 55.77% | 58.08% | 56.63% | 57.72% |
| CNN | Dataset-2 | 92.31% | 88.83% | 87.86% | 88.63% |
Table 4
Accuracy for different algorithms on dataset 1 and dataset 2 after using various hyperparameter optimizers.
Algorithm | Dataset | RandomizedSearchCV (%) | GridSearchCV (%) | Bayesian Optimization (%) | Genetic Algorithms (%) | Optuna (%)
Logistic Regression | Dataset-1 | 48.08 | 48.08 | 48.08 | 51.92 | 50.00
Logistic Regression | Dataset-2 | 90.38 | 92.31 | 90.38 | 84.61 | 88.69
Gradient Boosting | Dataset-1 | 63.36 | 51.92 | 55.77 | 54.64 | 53.85
Gradient Boosting | Dataset-2 | 94.23 | 79.90 | 94.23 | 89.20 | 78.84
K-Nearest Neighbor | Dataset-1 | 55.77 | 55.77 | 61.54 | 55.77 | 63.46
K-Nearest Neighbor | Dataset-2 | 73.08 | 75.00 | 76.92 | 73.08 | 73.08
Random Forest | Dataset-1 | 59.61 | 61.54 | 63.46 | 57.69 | 61.53
Random Forest | Dataset-2 | 92.31 | 92.31 | 90.38 | 92.31 | 88.46
Decision Tree | Dataset-1 | 55.77 | 53.85 | 53.85 | 57.69 | 61.54
Decision Tree | Dataset-2 | 92.31 | 69.23 | 67.31 | 69.23 | 71.15
Support Vector Machine | Dataset-1 | 57.69 | 64.75 | 58.90 | 59.94 | 57.90
Support Vector Machine | Dataset-2 | 86.54 | 86.54 | 94.23 | 86.54 | 82.48
Perceptron | Dataset-1 | 48.08 | 48.64 | 47.62 | 49.64 | 53.94
Perceptron | Dataset-2 | 75.00 | 84.62 | 83.61 | 76.32 | 80.69
Gaussian Naive Bayes | Dataset-1 | 59.62 | 59.62 | 57.69 | 55.77 | 61.54
Gaussian Naive Bayes | Dataset-2 | 78.85 | 78.85 | 80.77 | 78.85 | 78.85
Multinomial Naive Bayes | Dataset-1 | 51.92 | 53.85 | 55.77 | 57.69 | 53.85
Multinomial Naive Bayes | Dataset-2 | 86.54 | 86.54 | 84.61 | 82.69 | 84.61
ZeroR | Dataset-1 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
ZeroR | Dataset-2 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
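Two of the optimizers in Table 4, GridSearchCV and RandomizedSearchCV, are scikit-learn classes that differ in how they explore the search space: the grid search fits every combination, while the randomized search samples a fixed number (`n_iter`) of candidates. The following is a minimal sketch on synthetic data, assuming a Random Forest, a 90/10 train/test split of 520 samples and a small illustrative search space; the hyperparameter ranges actually used in the paper are not given in this section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split

# Synthetic stand-in for the survey data: 520 samples, 3 depression classes.
X, y = make_classification(n_samples=520, n_features=30, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)  # 52 test samples

# Grid search: every combination is evaluated (2 x 2 = 4 candidates).
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
)
grid.fit(X_train, y_train)

# Randomized search: n_iter candidates sampled from the listed values.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [25, 50, 100, 150],
                         "max_depth": [None, 3, 5, 10],
                         "min_samples_leaf": [1, 2, 4]},
    n_iter=5, cv=3, random_state=0,
)
rand.fit(X_train, y_train)

for name, search in [("GridSearchCV", grid), ("RandomizedSearchCV", rand)]:
    print(f"{name}: best={search.best_params_} "
          f"test accuracy={100 * search.score(X_test, y_test):.2f}%")
```

The randomized search here evaluates 5 of the 48 possible combinations, which is why it scales better than a grid search as the number of hyperparameters grows.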
Table 5
Hyperparameters of the CNN model.
Parameters Values/Types
Epoch 15
Initial learning rate 0.001
Batch size 5
Optimizer Adam
Loss function Categorical cross-entropy
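The training settings in Table 5 (Adam, initial learning rate 0.001, batch size 5, 15 epochs, categorical cross-entropy) can be illustrated with a minimal NumPy sketch. For brevity, a linear softmax classifier on synthetic data stands in for the CNN, whose layers are not listed in this table. The 468-row training set (520 samples minus a 52-row test split) and Adam's β1, β2 and ε (the commonly used defaults) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 468 training rows, 20 features, 3 classes.
n, d, k = 468, 20, 3
X = rng.normal(size=(n, d))
y = np.argmax(X @ rng.normal(size=(d, k)), axis=1)
Y = np.eye(k)[y]  # one-hot targets, as categorical cross-entropy expects


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def categorical_cross_entropy(P, Y):
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))


# Table 5 settings; beta1, beta2 and eps are assumed Adam defaults.
epochs, batch_size, lr = 15, 5, 0.001
beta1, beta2, eps = 0.9, 0.999, 1e-8

params = {"W": np.zeros((d, k)), "b": np.zeros(k)}
m = {name: np.zeros_like(p) for name, p in params.items()}  # first-moment estimates
v = {name: np.zeros_like(p) for name, p in params.items()}  # second-moment estimates
step = 0
losses = []

for epoch in range(epochs):
    order = rng.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        P = softmax(X[idx] @ params["W"] + params["b"])
        G = (P - Y[idx]) / len(idx)  # gradient of mean cross-entropy w.r.t. logits
        grads = {"W": X[idx].T @ G, "b": G.sum(axis=0)}
        step += 1
        for name in params:
            m[name] = beta1 * m[name] + (1 - beta1) * grads[name]
            v[name] = beta2 * v[name] + (1 - beta2) * grads[name] ** 2
            m_hat = m[name] / (1 - beta1 ** step)  # bias correction
            v_hat = v[name] / (1 - beta2 ** step)
            params[name] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    losses.append(categorical_cross_entropy(softmax(X @ params["W"] + params["b"]), Y))

print(f"{step} Adam updates; loss {losses[0]:.3f} after epoch 1 -> "
      f"{losses[-1]:.3f} after epoch {epochs}")
```

With a batch size of 5, each epoch over 468 rows takes 94 updates (the last batch holds 3 rows), so 15 epochs amount to 1410 Adam steps.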
R. Siddiqua et al. Array 18 (2023) 100291
respectively, on dataset 2. Precisions of 65.21% and 91.03% on datasets 1 and 2 were achieved by Perceptron and Support Vector Machine, respectively. Random Forest achieved recalls of 63.46% and 88.46% on datasets 1 and 2, respectively; Logistic Regression also reached 88.46% recall on dataset 2. As expected, the baseline classifier, ZeroR, had the lowest performance on both datasets. The proposed ML models provided better results on dataset 2 than on dataset 1.
Fig. 11 illustrates how the accuracy of the CNN model on dataset 2 changes with the number of epochs. As expected, the accuracy gradually improves as the number of epochs increases.
Fig. 11. Accuracy vs. epochs graph of CNN for dataset 2.
4.1. Performance of different algorithms with hyperparameter optimization
This section discusses the performance of various classifiers with hyperparameter optimization techniques applied to them. Table 4 shows that, on dataset 1, Gradient Boosting, SVM and Random Forest obtained accuracies of 63.36%, 64.75% and 63.46% with RandomizedSearchCV, GridSearchCV and Bayesian Optimization, respectively. On dataset 2, Gradient Boosting and Random Forest achieved 94.23% and 92.31% accuracy, respectively, with the RandomizedSearchCV optimizer.
The results obtained after keeping the top 20 features and after dropping the lowest 20 features from both datasets are shown in Tables 6 and 7, respectively. Table 6 shows that, on dataset 1, Gradient Boosting and KNN provided accuracies of 67.31% and 65.38% with the Pearson Correlation and SelectKBest methods, respectively. On dataset 2, Logistic Regression and Random Forest achieved 92.31% and 96.15% accuracy with the Pearson Correlation and Mutual Information methods, respectively. Table 7 shows that, on dataset 1, Gradient Boosting provided 67.31% accuracy with the Pearson Correlation method. On dataset 2, the Random Forest model with Pearson's correlation coefficient-based feature selection attained the highest accuracy, 98.08%.
Fig. 12 compares the accuracies of the ML techniques before and after feature selection on both datasets, using the 20 most important features and the reduced feature sets left after dropping the lowest 20 (50 features for dataset 1 and 86 for dataset 2). In dataset 1, the accuracy of Logistic Regression without feature selection is 50.00%, which increased to 63.46% and 65.38% with the top 20 and top 50 features, respectively. For Gradient Boosting, the accuracy without feature selection is 50.00%; it improved to 67.31% with both the best 20 and the best 50 features. Similarly, in dataset 2, Gradient Boosting attained 82.69% accuracy, which increased to 90.38% and 88.54% with the dominant 20 and 86 features, respectively. The accuracy of Random Forest was 90.38%, which increased to 96.15% and 98.08% with the most important 20 and 86 features, respectively.
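Both selection settings used in these experiments (keep the top 20 features, or drop the lowest 20) can be derived from a single ranking. The following is a minimal NumPy sketch of Pearson-correlation-based selection. It assumes that the three ordinal classes are encoded as 0 (normal), 1 (moderate) and 2 (extreme), and that features are ranked by the absolute value of their correlation with that label. The synthetic data and the function names are hypothetical.

```python
import numpy as np


def pearson_ranking(X, y):
    """Return feature indices ordered by |Pearson r| with y (strongest first) and the r values."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc.T @ yc) / np.where(denom == 0, 1.0, denom)  # constant columns get r = 0
    return np.argsort(-np.abs(r), kind="stable"), r


def keep_top_k(X, y, k):
    """Indices of the k features most correlated with the label."""
    order, _ = pearson_ranking(X, y)
    return np.sort(order[:k])


def drop_lowest_k(X, y, k):
    """Indices left after removing the k least correlated features."""
    order, _ = pearson_ranking(X, y)
    return np.sort(order[:X.shape[1] - k])


rng = np.random.default_rng(0)
X = rng.normal(size=(520, 106))  # hypothetical 106-question feature matrix
score = X[:, 3] - 2.0 * X[:, 7] + 0.5 * rng.normal(size=520)
y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3])).astype(float)  # 0, 1, 2

top20 = keep_top_k(X, y, 20)
kept86 = drop_lowest_k(X, y, 20)
print(len(top20), len(kept86), 3 in top20 and 7 in top20)  # 20 86 True
```

Ranking by |r| keeps strongly negative predictors (feature 7 here) as well as positive ones; using the signed r instead would discard them.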
Table 6
Accuracy for different algorithms on dataset 1 and dataset 2 after using various feature selection methods of significant 20 features.
Algorithm | Dataset | Recursive Feature Elimination (%) | Select K Best (%) | Fisher Score Chi-square Test (%) | Extra Trees Classifier (%) | Pearson Correlation (%) | Mutual Information (%) | Mutual Info Regression (%) | Manual Uniqueness (%) | Variance Threshold (%)
Logistic Regression | Dataset-1 | 61.54 | 63.46 | 61.54 | 57.69 | 48.08 | 59.62 | 57.69 | 61.54 | 50.00
Logistic Regression | Dataset-2 | 88.46 | 88.46 | 84.62 | 88.46 | 92.31 | 90.38 | 80.77 | 61.54 | 88.46
Gradient Boosting | Dataset-1 | 46.15 | 53.85 | 55.77 | 57.69 | 67.31 | 57.69 | 59.62 | 51.92 | 50.00
Gradient Boosting | Dataset-2 | 82.69 | 90.38 | 80.77 | 84.62 | 82.69 | 86.54 | 84.62 | 51.92 | 82.69
K-Nearest Neighbor | Dataset-1 | 51.92 | 65.38 | 65.38 | 57.69 | 57.69 | 59.62 | 61.54 | 55.77 | 63.46
K-Nearest Neighbor | Dataset-2 | 88.46 | 82.69 | 82.69 | 76.92 | 76.92 | 76.92 | 78.85 | 55.77 | 78.85
Random Forest | Dataset-1 | 61.54 | 59.62 | 61.54 | 53.85 | 61.54 | 65.38 | 63.46 | 51.92 | 63.46
Random Forest | Dataset-2 | 82.69 | 94.23 | 90.38 | 92.31 | 92.31 | 96.15 | 90.38 | 59.62 | 90.38
Decision Tree | Dataset-1 | 44.23 | 53.85 | 55.77 | 55.77 | 55.77 | 63.46 | 55.77 | 61.54 | 59.62
Decision Tree | Dataset-2 | 78.85 | 73.08 | 90.08 | 78.85 | 75.00 | 75.00 | 76.92 | 59.62 | 76.92
Support Vector Machine | Dataset-1 | 63.46 | 61.54 | 61.54 | 61.54 | 61.54 | 55.77 | 61.54 | 53.85 | 57.69
Support Vector Machine | Dataset-2 | 82.69 | 88.46 | 86.54 | 86.54 | 90.38 | 84.62 | 84.62 | 53.85 | 86.54
Perceptron | Dataset-1 | 46.15 | 53.85 | 34.62 | 48.08 | 34.62 | 36.54 | 51.92 | 44.23 | 46.15
Perceptron | Dataset-2 | 76.92 | 53.85 | 76.92 | 76.92 | 65.38 | 67.31 | 71.15 | 44.23 | 76.92
Gaussian Naive Bayes | Dataset-1 | 53.85 | 61.54 | 59.62 | 59.62 | 55.77 | 53.85 | 55.77 | 61.54 | 55.77
Gaussian Naive Bayes | Dataset-2 | 78.85 | 86.54 | 80.77 | 88.46 | 73.08 | 82.69 | 82.69 | 61.54 | 73.08
Multinomial Naive Bayes | Dataset-1 | 67.31 | 57.69 | 61.54 | 55.77 | 61.54 | 61.54 | 51.92 | 59.62 | 61.54
Multinomial Naive Bayes | Dataset-2 | 80.77 | 82.69 | 82.69 | 84.62 | 86.54 | 78.85 | 86.54 | 59.62 | 86.54
ZeroR | Dataset-1 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
ZeroR | Dataset-2 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
Table 7
Accuracy for different algorithms on dataset 1 (best 50 features) and dataset 2 (top 86 features) after using feature selection methods.
Algorithm | Dataset | Recursive Feature Elimination (%) | Select K Best (%) | Fisher Score Chi-square Test (%) | Extra Trees Classifier (%) | Pearson Correlation (%) | Mutual Information (%) | Mutual Info Regression (%) | Manual Uniqueness (%) | Variance Threshold (%)
Logistic Regression | Dataset-1 | 55.77 | 48.08 | 50.00 | 44.23 | 48.08 | 53.85 | 57.69 | 65.38 | 50.00
Logistic Regression | Dataset-2 | 92.31 | 90.38 | 90.38 | 88.46 | 92.31 | 88.46 | 88.46 | 88.46 | 88.46
Gradient Boosting | Dataset-1 | 55.77 | 50.00 | 50.00 | 50.00 | 67.31 | 55.77 | 53.85 | 65.38 | 50.00
Gradient Boosting | Dataset-2 | 82.69 | 78.85 | 78.85 | 78.85 | 82.69 | 82.69 | 84.62 | 88.54 | 82.69
K-Nearest Neighbor | Dataset-1 | 63.46 | 50.00 | 50.00 | 48.08 | 57.69 | 55.77 | 51.92 | 57.69 | 63.46
K-Nearest Neighbor | Dataset-2 | 80.77 | 75.00 | 76.92 | 78.85 | 76.92 | 75.00 | 78.85 | 80.77 | 78.85
Random Forest | Dataset-1 | 61.54 | 65.38 | 57.69 | 65.38 | 59.62 | 57.69 | 55.77 | 63.46 | 63.46
Random Forest | Dataset-2 | 92.31 | 86.54 | 88.46 | 88.46 | 98.08 | 86.54 | 88.46 | 86.54 | 90.38
Decision Tree | Dataset-1 | 59.62 | 57.69 | 61.54 | 61.54 | 53.85 | 67.31 | 57.69 | 57.69 | 59.62
Decision Tree | Dataset-2 | 67.31 | 69.23 | 73.08 | 73.08 | 73.08 | 73.08 | 75.00 | 67.31 | 76.92
Support Vector Machine | Dataset-1 | 63.46 | 57.69 | 59.62 | 59.62 | 61.54 | 63.46 | 61.54 | 59.62 | 57.69
Support Vector Machine | Dataset-2 | 88.46 | 84.62 | 84.62 | 88.46 | 90.38 | 88.46 | 88.46 | 88.46 | 86.54
Perceptron | Dataset-1 | 48.08 | 55.77 | 46.15 | 53.85 | 34.62 | 32.69 | 28.85 | 42.31 | 46.15
Perceptron | Dataset-2 | 90.38 | 76.92 | 90.38 | 67.31 | 65.38 | 80.77 | 63.46 | 73.08 | 76.92
Gaussian Naive Bayes | Dataset-1 | 53.85 | 55.77 | 55.77 | 57.69 | 55.77 | 59.62 | 57.69 | 57.69 | 55.77
Gaussian Naive Bayes | Dataset-2 | 78.85 | 76.92 | 73.08 | 76.92 | 73.08 | 75.00 | 75.00 | 86.54 | 73.08
Multinomial Naive Bayes | Dataset-1 | 63.46 | 59.62 | 65.38 | 59.62 | 61.54 | 61.54 | 55.77 | 65.38 | 61.54
Multinomial Naive Bayes | Dataset-2 | 88.46 | 84.62 | 84.62 | 86.54 | 86.54 | 90.38 | 86.54 | 86.54 | 86.54
ZeroR | Dataset-1 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
ZeroR | Dataset-2 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92 | 26.92
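One practical point when reproducing experiments like those in Tables 6 and 7 is where the feature selector is fitted. The following is a minimal scikit-learn sketch on synthetic data with 106 features, using two of the nine listed methods: Variance Threshold, and Select K Best with a mutual-information score. Wrapping the selector and the classifier in one `Pipeline` refits the selector inside every cross-validation fold, so the held-out fold never influences which features are kept. Whether the original experiments did this is not stated in this section.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in: 520 responses, 106 features, 3 classes.
X, y = make_classification(n_samples=520, n_features=106, n_informative=10,
                           n_classes=3, random_state=0)

pipe = Pipeline([
    ("variance", VarianceThreshold(threshold=0.0)),  # drop constant columns
    ("select", SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=20)),
    ("model", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Both selectors are refitted on the training part of each fold.
scores = cross_val_score(pipe, X, y, cv=5)
print(f"5-fold accuracy: {100 * scores.mean():.2f}% +/- {100 * scores.std():.2f}%")

# Refit on all data to inspect which 20 columns survive. The indices refer to
# the columns that passed the variance filter (here all 106, none are constant).
pipe.fit(X, y)
kept = pipe.named_steps["select"].get_support(indices=True)
print("kept feature indices:", kept.tolist())
```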
A synopsis of the feature selection techniques for dataset 1 and dataset 2 is given in Table 8.
Local Interpretable Model-Agnostic Explanations (LIME) is an efficient technique for interpreting black-box machine learning models [32]. It constructs a simpler surrogate model that locally approximates the complex ML model: the surrogate examines the neighborhood of an individual prediction and produces an interpretable explanation that holds in that local region. Figs. 13 and 14 show the LIME interpretations of the depression predictions for an extreme case and a normal instance, respectively. For the sample in Fig. 13, the Random Forest model with the Pearson correlation feature selection technique predicted extreme depression with a confidence score of 0.88, owing to the respondent's irregular sleep pattern, suicidal and somatic symptoms, disappointment with the future and self-blaming attitude.
Table 9 compares the proposed depression detection system with similar works.
4.3. Cross validation on an independently collected dataset
One of the challenges with this model is that it is difficult to apply it to other existing datasets, because it is trained on a dataset with a rich feature set, whereas other existing datasets do not possess such an extensive feature set. Therefore,
Fig. 12. Accuracies before and after feature selection (a) dataset 1 (b) dataset 2.
Table 8
Summary of the feature selection techniques for dataset 1 and dataset 2.
Machine learning algorithm | Dataset | Setting that gave the highest accuracy | Highest accuracy (%)
ZeroR | 1, 2 | Same for all settings | 26.92
Naive Bayes (Multinomial) | 1 | Considering top 20 features | 67.31
Naive Bayes (Multinomial) | 2 | Considering top 86 features | 90.38
Naive Bayes (Gaussian) | 1 | Considering top 20 features | 61.54
Naive Bayes (Gaussian) | 2 | Considering top 20 features | 88.46
Perceptron | 1 | Considering top 50 features | 55.77
Perceptron | 2 | Considering top 86 features | 90.38
Support Vector Machine | 1 | Same performance if top 20/top 50 features are considered | 63.46
Support Vector Machine | 2 | Same performance if top 20/top 86 features are considered | 90.38
Decision Tree | 1 | Considering top 50 features | 67.31
Decision Tree | 2 | Considering top 20 features | 90.08
Random Forest | 1 | Same performance if top 20/top 50 features are considered | 65.38
Random Forest | 2 | Considering top 86 features | 98.08
K-Nearest Neighbor | 1 | Considering top 20 features | 65.38
K-Nearest Neighbor | 2 | Considering top 20 features | 88.46
Gradient Boosting | 1 | Same performance if top 20/top 50 features are considered | 67.31
Gradient Boosting | 2 | Considering top 20 features | 90.38
Logistic Regression | 1 | Considering top 50 features | 65.38
Logistic Regression | 2 | Same performance if top 20/top 86 features are considered | 92.31
Fig. 13. Depression prediction interpretation of an extreme case of the LIME explainable AI.
Fig. 14. Depression prediction interpretation of a normal instance of the LIME explainable AI.
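The local-surrogate idea behind LIME can be sketched in a few lines of NumPy. This is a simplified re-implementation of the technique, not the reference LIME implementation [32]. It perturbs one instance with Gaussian noise and queries the black-box model on the perturbed points. Each point is weighted with an exponential kernel on its distance to the instance, and a weighted ridge regression is fitted whose coefficients serve as the explanation. The black-box function, feature names, kernel width and ridge penalty are hypothetical.

```python
import numpy as np


def lime_explain(predict_fn, x, scale, num_samples=5000, kernel_width=None,
                 ridge=1.0, seed=0):
    """Return local linear-surrogate coefficients explaining predict_fn around x.

    predict_fn maps an (n, d) array to an (n,) array of class probabilities.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)  # assumed default width
    # 1. Perturb the instance with Gaussian noise scaled per feature.
    offsets = rng.normal(size=(num_samples, d))
    Z = x + offsets * scale
    # 2. Query the black-box model on the perturbed points.
    target = predict_fn(Z)
    # 3. Proximity weights: exponential kernel on the scaled distance to x.
    dist = np.linalg.norm(offsets, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted ridge regression: target ~ intercept + offsets @ coef.
    A = np.hstack([np.ones((num_samples, 1)), offsets])
    penalty = ridge * np.eye(d + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    AtW = A.T * w        # scales each sample's column by its weight
    beta = np.linalg.solve(AtW @ A + penalty, AtW @ target)
    return beta[1:]


def black_box(Z):
    # Hypothetical stand-in for P(extreme | features) from a trained classifier.
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 0] - 1.0 * Z[:, 1])))


names = ["sleep_irregularity", "future_optimism", "unrelated_item"]  # hypothetical
coef = lime_explain(black_box, x=np.zeros(3), scale=np.ones(3))
for name, c in sorted(zip(names, coef), key=lambda t: -abs(t[1])):
    print(f"{name:>20s} {c:+.3f}")
```

Because the black box is known here, the expected explanation is easy to check: the first feature receives the largest positive weight, the second a smaller negative weight and the third a weight close to zero.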
Table 9
Comparison of the proposed depression prediction system with similar works.
Reference | Dataset | Number of participants | Used scale | Number of questions | Best model | Accuracy
[5] | Custom dataset of Bangladeshi students | 210 | Modified DASS-21 | 21 | Binary logistic model | N/A
[11] | Undergraduate students from the University of Nice Sophia-Antipolis | 4184 | Electronic health records | 59 | XGBoost | AUC = 0.79
[12] | United States National Health and Nutrition Examination Survey (NHANES) | 8628 | PHQ-9 | 9 | SVM | 77.1%
[13] | Chinese Longitudinal Healthy Longevity Study (CLHLS) | 1538 | Information on health status and quality of life | 8 | Gradient Boosting | 75.9%
[14] | Korean adults | 623 | National Institutes of Health Stroke Scale (NIHSS) | 13 | SVM | 77.1%
[15] | Australian children | 6310 | Australian Child and Adolescent Survey | 667 | XGBoost | 95%
[16] | Bangladeshi participants | 6.4 | Burns Depression Checklist (BDC) | 55 | AdaBoost | 92.56%
[18] | Bangladeshi women | 623 | Lifestyle related questions | 30 | CNN | 96.8%
This work | Custom dataset of Bangladeshi students | 684 | Combined scale | 106 | Random Forest | 98.08%
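The "Combined scale" in the last row of Table 9 is the hybrid scale obtained by voting over eight established depression scales. The exact voting rule is not restated in this section. The following is therefore a minimal sketch that assumes a plain majority vote over the three levels after each scale's score has been mapped to normal, moderate or extreme by that scale's own cut-offs. Ties are broken toward the more severe level, which is a hypothetical choice.

```python
from collections import Counter

# Ordering of the three levels used by the hybrid scale.
SEVERITY = {"normal": 0, "moderate": 1, "extreme": 2}


def hybrid_level(scale_levels):
    """Combine per-scale levels into one level by majority vote.

    Ties are broken toward the more severe level (an assumption).
    """
    if len(scale_levels) != 8:
        raise ValueError("expected one level from each of the eight scales")
    counts = Counter(scale_levels)
    top = max(counts.values())
    tied = [level for level, c in counts.items() if c == top]
    return max(tied, key=SEVERITY.__getitem__)


# Hypothetical respondent: levels already derived from each scale's cut-offs.
votes = ["normal", "moderate", "moderate", "extreme",
         "moderate", "normal", "extreme", "moderate"]
print(hybrid_level(votes))                             # moderate (4 of 8 votes)
print(hybrid_level(["normal"] * 4 + ["extreme"] * 4))  # extreme (tie -> more severe)
```

Breaking ties toward the more severe level favors sensitivity over specificity, which is one plausible choice for a screening tool. Other rules, such as falling back to the median level, would also be reasonable.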
Table 10
Model performance on test set and cross-validation dataset.
Classifier | Features/hyperparameters used | Test accuracy (%) | Accuracy on independent dataset (cross-val) (%) | Training time (ms) | Model size (KB)
Random Forest | Pearson Correlation (drop 20 features) | 98.08 | 95.95 | 103.77 | 88.6
Gradient Boosting | RandomizedSearchCV (hyperparameter optimization) | 94.23 | 93.24 | 210 755.27 | 88.6
Logistic Regression | GridSearchCV (hyperparameter optimization) | 92.31 | 87.84 | 700.71 | 88.6
Logistic Regression | Recursive Feature Elimination (drop 20 features) | 92.31 | 87.84 | 54.85 | 2.79
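The four measurements in Table 10 (test accuracy, accuracy on an independently collected dataset, training time and model size) can all be taken from one fitted model. The following is a minimal sketch assuming scikit-learn models, wall-clock training time measured with `time.perf_counter`, and model size taken as the length of the pickled model. The paper's exact measurement method is not described in this section. The synthetic "independent" rows also come from the same distribution as the training data, which a real independently collected dataset would not, and the 74-row size is hypothetical.

```python
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 520 survey rows plus 74 rows playing the independent dataset.
X, y = make_classification(n_samples=594, n_features=86, n_informative=10,
                           n_classes=3, random_state=0)
X_main, y_main = X[:520], y[:520]
X_indep, y_indep = X[520:], y[520:]
X_train, X_test, y_train, y_test = train_test_split(
    X_main, y_main, test_size=0.1, stratify=y_main, random_state=0)


def profile(model):
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_ms = (time.perf_counter() - start) * 1000.0  # wall-clock training time
    size_kb = len(pickle.dumps(model)) / 1024.0         # serialized model size
    return {
        "test_acc": 100 * model.score(X_test, y_test),
        "indep_acc": 100 * model.score(X_indep, y_indep),
        "train_ms": train_ms,
        "size_kb": size_kb,
    }


results = {
    "Random Forest": profile(RandomForestClassifier(n_estimators=100, random_state=0)),
    "Logistic Regression": profile(LogisticRegression(max_iter=1000)),
}
for name, r in results.items():
    print(f"{name:20s} test={r['test_acc']:.2f}% independent={r['indep_acc']:.2f}% "
          f"train={r['train_ms']:.1f} ms size={r['size_kb']:.1f} KB")
```

Training times measured this way depend strongly on the hardware, which is why they are only comparable within one machine configuration.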
and frequent social media usage have been found to be common traits among depressed students.
Another objective of this research is to test whether depression can be accurately predicted with the newly created depression assessment scale using various machine learning and deep learning models. The new depression assessment questionnaire was created by applying the voting technique to eight well-known depression measuring scales. Various feature selection approaches and hyperparameter optimization techniques were applied to enhance the performance of the prediction models, and keeping only the dominant features, i.e., removing irrelevant ones, further increased the accuracy.
The proposed model achieved higher accuracy than the notable works found in the literature, as shown in Table 9. The empirical results show that machine learning and deep learning algorithms can predict depression with high performance. The training times and model sizes of the classifiers listed in Table 10 were measured on the following PC configuration:
Processor: Intel(R) Core(TM) i7-6500U CPU @ 2.50 GHz 2.59 GHz; RAM: 8 GB; System type: 64-bit operating system, x64-based processor.
Random Forest produced the best result on the test set. Random Forest is an ensemble method that combines the predictions of many decision trees, which is likely why it performed better than the individual classifiers.
6. Limitation of the work
This article presents a novel approach to measuring and predicting depression using machine learning and deep learning models, applied to Bangladeshi university students. However, the work faced a number of challenges:
• The novel approach used a voting algorithm applied to eight well-known depression measuring scales. Consequently, it required a much more extensive feature set than previous work in the literature.
• Because of the extensive feature set, the questionnaire had a total of 106 questions, which were time-consuming for participants to complete.
• Other datasets available in the literature have a much smaller set of features. As a consequence, our models cannot be applied to other existing datasets.
• The model was applied to university students. Hence, it may not be reliable for predictions on individuals with different profiles (such as children or the elderly).
7. Conclusion and future scope
This research paper presents various machine learning and deep learning methods for assessing the level of depression among Bangladeshi university students. In addition, the study explores various personal and social factors that negatively impact young people's mental health. Moreover, this work is significant for studying
various factors of suicidal behavior among young people, which has been increasing recently, mostly due to depression. Our research created a new depression measuring scale with three distinct levels, i.e., normal, moderate and extreme, by applying the voting technique to the results of the eight recognized scales. A private dataset of responses from 684 participants was collected after obtaining their consent. Next, nine feature selection methods were used to find the most relevant features contributing to depression. In addition, 12 machine learning, ensemble and deep learning algorithms were applied to predict depression automatically; Random Forest, Gradient Boosting and CNN were found to be the best-performing models for evaluating depression. Finally, hyperparameter optimization and feature selection approaches were applied to improve the prediction results. In the future, a more comprehensive sample with more diverse cohorts can be combined with the existing dataset. Meta-heuristic optimization techniques can be applied to find the best features from patients' extensive biometric markers and characteristics. Possible extensions of this work include leveraging more advanced artificial intelligence frameworks, e.g., adversarial learning, sequential learning and domain adaptation.
CRediT authorship contribution statement
Rokeya Siddiqua: Conceptualization, Methodology, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing, Visualization. Nusrat Islam: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft. Jarba Farnaz Bolaka: Conceptualization, Methodology, Investigation. Riasat Khan: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision. Sifat Momen: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
Data will be made available on request.
References
[1] Dunn G, Sham P, Hand D. Statistics and the nature of depression. Psychol Med 1993;23:871–89.
[2] World Health Organization. The World Health Report: Mental disorders affect one in four people. 2001.
[3] World Health Organization. Depression. 2023, https://www.who.int/news-room/fact-sheets/detail/depression [Online, accessed: 22 April 2023].
[4] Hossain MD, Ahmed HU, Chowdhury WA, Niessen LW, Alam DS. Mental disorders in Bangladesh: A systematic review. BMC Psychiatry 2014;14:1–8.
[5] Arusha A, Biswas R. Prevalence of stress, anxiety and depression due to examination in Bangladeshi youths: A pilot study. Child Youth Serv Rev 2020;116:1–6.
[6] Chang J, Yuan Y, Wang D. Mental health status and its influencing factors among college students during the epidemic of COVID-19. J South Med Univ 2020;40:171–6.
[7] Choudhury AA, Khan MRH, Nahim NZ, Tulon SR, Islam S, Chakrabarty A. Predicting depression in Bangladeshi undergraduates using machine learning. In: IEEE region 10 symposium. 2019, p. 789–94.
[8] Hoque R. Major mental health problems of undergraduate students in a private university of Dhaka, Bangladesh. Eur Psychiatry 2015;30:1880.
[9] Sultana J, Elhum Uddin Quadery S, Amik FR, Basak T, Momen S. A data-driven approach to understanding the impact of Covid-19 on dietary habits amongst Bangladeshi students. J Posit Sch Psychol 2022;6:11691–7.
[10] Conceição V, Rothes I, Gusmão R. The association between stigmatizing attitudes towards depression and help seeking attitudes in college students. PLOS ONE 2022;17:1–14.
[11] Nemesure M, Heinz M, Huang R, Jacobson N. Predictive modeling of psychiatric illness using electronic health records and a novel machine learning approach with artificial intelligence. Sci Rep 2020;11:1–9.
[12] Lee C, Kim H. Machine learning-based predictive modeling of depression in hypertensive populations. PLOS ONE 2022;17:1–17.
[13] Su D, Zhang X, He K, Chen Y. Use of machine learning approach to predict depression in the elderly in China: A longitudinal study. J Affect Disord 2021;282:289–98.
[14] Ryu YH, et al. Prediction of poststroke depression based on the outcomes of machine learning algorithms. J Clin Med 2022;11.
[15] Haque UM, Kabir E, Khanam R. Detection of child depression using machine learning methods. PLOS ONE 2021;16:1–13.
[16] Zulfiker MS, Kabir N, Biswas AA, Nazneen T, Uddin MS. An in-depth analysis of machine learning approaches to predict depression. Curr Res Behav Sci 2021;2:1–12.
[17] Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artificial Intelligence Res 2002;16:321–57.
[18] Ahmed A, Sultana R, Ullas MTR, Begom M, Rahi MMI, Alam MA. A machine learning approach to detect depression and anxiety using supervised learning. In: Asia-Pacific conference on computer science and data engineering. 2020, p. 1–6.
[19] Moon NN, Mariam A, Sharmin S, Islam MM, Nur FN, Debnath N. Machine learning approach to predict the depression in job sectors in Bangladesh. Curr Res Behav Sci 2021;2:1–10.
[20] Cassidy S, Bradley L, Bowen E, Wigham S, Rodgers J. Measurement properties of tools used to assess depression in adults with and without autism spectrum conditions: A systematic review. Autism Res 2018;11:738–54.
[21] Ustun G. Determining depression and related factors in a society affected by COVID-19 pandemic. Int J Soc Psychiatry 2021;67:54–63.
[22] Saha A, Dutta A, Sifat RI. The mental impact of digital divide due to COVID-19 pandemic induced emergency online learning at undergraduate level: Evidence from undergraduate students from Dhaka City. J Affect Disord 2021;294:170–9.
[23] Bathla M, Singh M, Kulhara P, Chandna S, Aneja J. Evaluation of anxiety, depression and suicidal intent in undergraduate dental students: A cross-sectional study. Contemp Clin Dent 2015;6:215–22.
[24] Ibrahim AK, Kelly SJ, Glazebrook C. Reliability of a shortened version of the Zagazig Depression Scale and prevalence of depression in an Egyptian university student sample. Compr Psychiatry 2012;53:638–47.
[25] Guo T, Guo Z, Zhang W, Ma W, Yang X, Yang X, Hwang J, He X, Chen X, Ya T. Electroacupuncture and cognitive behavioural therapy for sub-syndromal depression among undergraduates: A controlled clinical trial. Acupunct Med 2016;34:356–63.
[26] Ozawa C, et al. Resilience and spirituality in patients with depression and their family members: A cross-sectional study. Compr Psychiatry 2017;77:53–9.
[27] McIntyre RS, et al. The prevalence and illness characteristics of DSM-5-defined “mixed feature specifier” in adults with major depressive disorder and bipolar disorder: results from the International Mood Disorders Collaborative Project. J Affect Disord 2015;172:259–64.
[28] Burchert S, Kerber A, Zimmermann J, Knaevelsrud C. Screening accuracy of a 14-day smartphone ambulatory assessment of depression symptoms and mood dynamics in a general population sample: Comparison with the PHQ-9 depression screening. PLoS One 2021;16:1–25.
[29] Wang K, et al. Mapping of the acromegaly quality of life questionnaire to EQ-5D-5L index score among patients with acromegaly. Eur J Health Econ 2021;1–11.
[30] Mergen H, et al. Comparative validity and reliability study of the QIDS-SR16 in Turkish and American college student samples. Klin Psikofarmakol Bülteni-Bull Clin Psychopharmacol 2011;21:289–301.
[31] Lu S, Hu S, Guan Y, Xiao J, Cai D, Gao Z, Sang Z, Wei J, Zhang X, Margraf J. Measurement invariance of the Depression Anxiety Stress Scales-21 across gender in a sample of Chinese university students. Front Psychol 2018;9:2064.
[32] Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: Explaining the predictions of any classifier. In: International conference on knowledge discovery and data mining. 2016.