
Special Issue Reprint

Intelligent Medicine
and Health Care

Edited by
Chien-Hung Yeh, Xiaojuan Ban, Men-Tzung Lo, Wenbin Shi and Shenghong He

mdpi.com/journal/applsci

Editors
Chien-Hung Yeh
Xiaojuan Ban
Men-Tzung Lo
Wenbin Shi
Shenghong He

Basel • Beijing • Wuhan • Barcelona • Belgrade • Novi Sad • Cluj • Manchester


Editors
Chien-Hung Yeh, School of Information and Electronics, Beijing Institute of Technology, Beijing, China
Xiaojuan Ban, School of Computer & Communication Engineering, University of Science and Technology Beijing, Beijing, China
Men-Tzung Lo, Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan
Wenbin Shi, School of Information and Electronics, Beijing Institute of Technology, Beijing, China
Shenghong He, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK

Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal
Applied Sciences (ISSN 2076-3417) (available at:
https://www.mdpi.com/journal/applsci/special_issues/WW038E1NMQ).

For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:

Lastname, A.A.; Lastname, B.B. Article Title. Journal Name Year, Volume Number, Page Range.

ISBN 978-3-7258-0569-3 (Hbk)


ISBN 978-3-7258-0570-9 (PDF)
doi.org/10.3390/books978-3-7258-0570-9

© 2024 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms
and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
license.
Contents

Nirmal Acharya, Padmaja Kar, Mustafa Ally and Jeffrey Soar
Predicting Co-Occurring Mental Health and Substance Use Disorders in Women:
An Automated Machine Learning Approach
Reprinted from: Appl. Sci. 2024, 14, 1630, doi:10.3390/app14041630

Shadi Eltanani, Tjeerd V. olde Scheper, Mireya Muñoz-Balbontin, Arantza Aldea,
Jo Cossington, Sophie Lawrie, et al.
A Novel Criticality Analysis Method for Assessing Obesity Treatment Efficacy
Reprinted from: Appl. Sci. 2023, 13, 13225, doi:10.3390/app132413225

Basem Assiri
A Modified and Effective Blockchain Model for E-Healthcare Systems
Reprinted from: Appl. Sci. 2023, 13, 12630, doi:10.3390/app132312630

Guillaume Dessevre, Cléa Martinez, Liwen Zhang, Christophe Bortolaso and Franck Fontanili
The Centralization and Sharing of Information for Improving a Resilient Approach Based on
Decision-Making at a Local Home Health Care Center
Reprinted from: Appl. Sci. 2023, 13, 8576, doi:10.3390/app13158576

Rytis Maskeliunas, Robertas Damasevicius, Tomas Blazauskas, Kipras Pribuisis,
Nora Ulozaite-Staniene and Virgilijus Uloza
Pareto-Optimized AVQI Assessment of Dysphonia: A Clinical Trial Using Various Smartphones
Reprinted from: Appl. Sci. 2023, 13, 5363, doi:10.3390/app13095363

Khalil Al-Hussaeni, Ioannis Karamitsos, Ezekiel Adewumi and Rema M. Amawi
CNN-Based Pill Image Recognition for Retrieval Systems
Reprinted from: Appl. Sci. 2023, 13, 5050, doi:10.3390/app13085050

Ming-Hung Chang, Yi-Chao Wu, Hsi-Yu Niu, Yi-Ting Chen and Shu-Han Juang
Cross-Platform Gait Analysis and Fall Detection Wearable Device
Reprinted from: Appl. Sci. 2023, 13, 3299, doi:10.3390/app13053299

Salaki Reynaldo Joshua, Wasim Abbas, Je-Hoon Lee and Seong Kun Kim
Trust Components: An Analysis in The Development of Type 2 Diabetic Mellitus Mobile
Application
Reprinted from: Appl. Sci. 2023, 13, 1251, doi:10.3390/app13031251

Salaki Reynaldo Joshua, Wasim Abbas and Je-Hoon Lee
M-Healthcare Model: An Architecture for a Type 2 Diabetes Mellitus Mobile Application
Reprinted from: Appl. Sci. 2023, 13, 8, doi:10.3390/app13010008

Elissaveta Zvetkova, Eugeni Koytchev, Ivan Ivanov, Sergey Ranchev and Antonio Antonov
Biomechanical, Healing and Therapeutic Effects of Stretching: A Comprehensive Review
Reprinted from: Appl. Sci. 2023, 13, 8596, doi:10.3390/app13158596
Applied Sciences
Article
Predicting Co-Occurring Mental Health and Substance Use
Disorders in Women: An Automated Machine Learning
Approach
Nirmal Acharya 1, *, Padmaja Kar 2 , Mustafa Ally 3 and Jeffrey Soar 3

1 Australian International Institute of Higher Education, Brisbane, QLD 4000, Australia


2 St Vincent’s Care Services, Mitchelton, QLD 4053, Australia
3 School of Business, University of Southern Queensland, Toowoomba, QLD 4350, Australia;
[email protected] (M.A.); [email protected] (J.S.)
* Correspondence: [email protected]

Abstract: Significant clinical overlap exists between mental health and substance use disorders,
especially among women. The purpose of this research is to leverage an AutoML (Automated Machine
Learning) interface to predict and distinguish co-occurring mental health (MH) and substance use
disorders (SUD) among women. By employing various modeling algorithms for binary classification,
including Random Forest, Gradient Boosted Trees, XGBoost, Extra Trees, SGD, Deep Neural Network,
Single-Layer Perceptron, K Nearest Neighbors (grid), and a super learning model (constructed by
combining the predictions of a Random Forest model and an XGBoost model), the research aims
to provide healthcare practitioners with a powerful tool for earlier identification, intervention, and
personalised support for women at risk. The present research presents a machine learning (ML)
methodology for more accurately predicting the co-occurrence of mental health (MH) and substance
use disorders (SUD) in women, utilising the Treatment Episode Data Set Admissions (TEDS-A) from
the year 2020 (n = 497,175). A super learning model was constructed by combining the predictions of
a Random Forest model and an XGBoost model. The model demonstrated promising predictive
performance in predicting co-occurring MH and SUD in women with an AUC = 0.817, Accuracy = 0.751,
Precision = 0.743, Recall = 0.926 and F1 Score = 0.825. The use of accurate prediction models can
substantially facilitate the prompt identification and implementation of intervention strategies.

Keywords: mental health; substance use disorder; machine learning; AutoML

Citation: Acharya, N.; Kar, P.; Ally, M.; Soar, J. Predicting Co-Occurring Mental Health and
Substance Use Disorders in Women: An Automated Machine Learning Approach. Appl. Sci. 2024, 14,
1630. https://doi.org/10.3390/app14041630

Academic Editors: Chien-Hung Yeh, Wenbin Shi, Xiaojuan Ban, Men-Tzung Lo and Shenghong He

Received: 8 January 2024
Revised: 15 February 2024
Accepted: 17 February 2024
Published: 18 February 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
An association between co-occurring substance use disorders (SUDs) and various
mental health disorders is linked to substantial levels of sickness, death, and impairment [1].
Twenty-five percent of patients seeking medical care have at least one mental or behavioural
issue; however, these conditions frequently remain undetected and untreated [2]. Substance
addiction affects both genders, although there is evidence to suggest that women may face
a more rapid progression toward addiction, encounter greater difficulties in sustaining
abstinence, and have a higher susceptibility to relapse compared to men [3]. Women tend to
resort to substance consumption as a response to negative emotions [4,5], and prior research
has also revealed the distinctive mental health dimensions experienced by women who
have substance-related issues [6]. These dimensions include higher levels of depression,
traumatic stress, and borderline features in comparison to men [7]. The implications of these
interconnected issues have broader consequences, as substance use disorders (SUDs) have
been linked to increased risks of suicide and aggressiveness [7,8]. Women grappling
with co-occurring disorders often navigate a multitude of hurdles, spanning familial
conflicts, depression, educational barriers, economic hardships, past trauma, physical
health concerns, reproductive health complications, infertility, early onset of menopause,
and complications during pregnancy, breastfeeding, childbirth, unemployment, and more,
highlighting the multifaceted nature of their challenges [1,9,10].
Machine learning (ML) has emerged as a promising tool for understanding and
addressing these challenges. Previous studies have explored its application in identifying
predictors for suicide, treatment success, and more. Acion, et al. [11] aimed to investigate
disparities in substance use disorder treatment completion in the U.S. using 2017–2019
data from TEDS-D by SAMHSA. Employing a two-stage virtual twins model (random
forest + decision tree), the research identified factors influencing completion probability
(e.g., race/ethnicity, income source), revealing that those without co-occurring mental
health conditions, with job-related income, and white non-Hispanics are more likely to
complete treatment. Miranda, et al. [12] employed deep learning and natural language
processing to develop DeepBiomarker2 that accurately predicts alcohol and substance
use disorder risk in post-traumatic stress disorder patients and identifies medications and
social determinants of health parameters that may reduce this risk. Adams, et al. [13]
performed a study in Denmark that focused on individuals with substance use disorders
(SUDs) and their elevated suicide risk. Using machine learning, the analysis identified
key predictors for suicide in men and women with SUDs, highlighting specific factors
such as antidepressant use, poisoning diagnoses, age, and comorbid psychiatric disorders.
The findings suggest that individuals with prior incidents of poisoning and mental health
disorders, especially women, are at increased risk of suicide among those with substance
use disorders in Denmark. Aishwarya, et al. [14] investigated the use of machine learning,
including AutoML and ensemble classifiers, to predict potential cardiovascular diseases
by analysing real-time IoT-based healthcare data, highlighting improved accuracy and
efficiency in data analytics for healthcare devices. Kundu, et al. [15] explored the application
of machine learning (ML) in investigating mental health and substance use concerns within
the LGBTQ2S+ population. Examining 11 recent studies, the findings suggested ML as
a promising tool. A lack of studies evaluating substance use treatments in women with
severe mental illness who differ in their needs and capacity has been noted [16]. There are
therefore opportunities to explore the application of Automated Machine Learning (AutoML)
interfaces to research in this field.
The current research uses data from the Treatment Episode Data Set Admissions
(TEDS-A) for the year 2020 and applies an AutoML interface to predict co-occurring mental
health and substance use disorders among women. The rationale behind leveraging
AutoML stems from its growing significance in healthcare analysis [17–19], particularly
within the domain of mental health and substance use disorders [20–22], where it often
leads to enhanced precision and accuracy [23].
The opportunity for AutoML arises from the need to provide a more user-friendly
method for anyone to generate and implement machine learning, offering a more intuitive
approach for creating and deploying models with minimal reliance on coding or complex
ML infrastructure [24]. Given the limited financial resources allocated to clinical coding and
the high wages of data scientists [25], it is imperative to identify a cost-effective approach
that enables healthcare organisations to leverage machine learning capabilities without
incurring substantial expenses. Several AutoML platforms are currently available. Certain
platforms are open source whereas others are commercial. Many prominent organisations
in the field of artificial intelligence, including Microsoft Azure, Google, Amazon, H2O.ai,
Dataiku, and RapidMiner, have undertaken the development and dissemination of advanced
systems, such as the publicly accessible Cloud AutoML [26]. Platforms such as
Dataiku exemplify this shift, providing a graphical interface empowering users to fine-tune
computational settings effortlessly, enhancing accessibility. Instead of being tethered to
specific algorithms or coding languages, researchers gain the flexibility to explore diverse
methods within a unified space, encompassing languages such as Python, R, and more,
fostering the full spectrum of ML tools. Within the AutoML framework, users leverage existing
algorithms and ML frameworks. The process begins with inputting data onto the platform.
Users can then opt to employ a specific method or request algorithm suggestions. Once
chosen, an algorithm is set up to facilitate training, seamlessly leading into the automated
testing phase. This yields immediate access to ML insights, including model predictions
and performance metrics, enabling researchers to employ validated models for forecasting
or analysing various phenomena. Typically, AutoML workflows start with basic, well-evaluated
ML algorithms known for their simplicity and ease of use, such as k-nearest neighbors and
decision trees. As the analysis demands closer scrutiny, more complex alternatives such as
boosted trees (e.g., XGBoost) or deep learning come into play. More intricate models can also
be built, often as ensembles: a fusion of basic models that leverages the strengths of each
component while mitigating individual weaknesses. The synergy within an ensemble of algorithms aims
to enhance overall predictive power and model robustness. ML techniques, particularly
AutoML, have led to improved granularity and accuracy in various studies [11,22,27]. This
research capitalises on the power of AutoML to automate the process of model selection,
hyperparameter tuning, and feature engineering, streamlining the analytical process and
enhancing the predictive accuracy of the models. The super learning (SL) model has the
potential to distinguish women with co-occurring disorders from those without. The super
learning algorithm is a supervised learning method that uses a loss-based approach to
choose the best combination of prediction algorithms [28]. The method achieves asymptotic
performance comparable to the optimal weighted combination of the basic learners, making
it a highly effective strategy for addressing various issues using the same technique; it can
reduce the probability of over-fitting during the training process, employing a modified
version of cross-validation [29,30]. The area under the curve (AUC) value of 0.817 achieved
by the super learning model attests to its efficacy in capturing intricate patterns within the
data, underscoring its potential as a robust diagnostic tool.
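
The loss-based selection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' Dataiku configuration: it assumes out-of-fold predicted probabilities from two base learners are already available, uses log loss as the loss function, and searches a coarse grid of convex weights. The function names and the toy numbers are hypothetical.

```python
import math


def log_loss(y_true, probs, eps=1e-12):
    """Mean negative log-likelihood for binary labels coded 0/1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def blend(probs_a, probs_b, weight_a):
    """Convex combination of two probability vectors."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def best_blend_weight(y_true, oof_a, oof_b, steps=20):
    """Pick the weight on learner A that minimises log loss on
    out-of-fold (cross-validated) predictions."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: log_loss(y_true, blend(oof_a, oof_b, w)))


# Toy out-of-fold predictions (made-up values, for illustration only).
y_true = [1, 0, 1, 0]
oof_rf = [0.9, 0.2, 0.6, 0.4]
oof_xgb = [0.6, 0.4, 0.9, 0.1]
w = best_blend_weight(y_true, oof_rf, oof_xgb)
blended_loss = log_loss(y_true, blend(oof_rf, oof_xgb, w))
```

Because the candidate grid includes the weights 0 and 1, the blended loss on the out-of-fold data can never be worse than that of either base learner alone. The study itself reports combining the two models with a simple "average", which corresponds to fixing the weight at 0.5 rather than searching for it.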
This study serves as an illustration of advanced statistical methods and machine
learning techniques harnessed through Dataiku, an AutoML interface, in a real-world
healthcare setting. It showcases the platform’s ability to automate essential operations such
as selecting models, optimising hyperparameters, and engineering features. The super
learning model, which combines Random Forest and XGBoost, has superior performance
compared to separate algorithms. It serves as a diagnostic tool for early detection of co-
occurring disorders in women. The study emphasises the potential advantages of these
powerful predictive models. By leveraging these technological advancements, we aim
to bridge the gap between data-driven innovation and clinical practice. SUDs are often
inadequately addressed in women [31,32]. The insight of the study holds the potential
to develop a tool for early identification of mental health and SUDs in women. The
findings also offer valuable insights that can inform future research and collaborations with
policymakers, medical associations, and patient advocacy groups to develop guidelines
for responsible integration and optimise the model’s potential advantages while ensuring
patient well-being and privacy protection.
The structure of the paper is as follows. Section 2 presents the materials and
methods, covering the description of the dataset, the machine learning models utilised,
and the statistical method employed. Section 3 presents the findings of the study.
Section 4 discusses the application, limitations, and future prospects of the study.
Finally, the paper concludes with the implications, recommendations for further
research, and conclusions.

2. Materials and Methods


2.1. Dataset
This study used publicly available Treatment Episode Data Set Admissions (TEDS-A)
2020 [33], maintained by the Center for Behavioral Health Statistics and Quality (CBHSQ)
of the Substance Abuse and Mental Health Services Administration (SAMHSA) to illustrate
the machine learning approach to predict co-occurring mental and substance use disorders
in women. TEDS, encompassing the Admissions Data Set (TEDS-A) and the Discharges
Data Set (TEDS-D), is a notable representation of a substantial administrative dataset
that may captivate addiction researchers in practical situations [34,35]. TEDS provides
comprehensive statistics regarding admissions and discharges from substance use disorder
treatment programs across participating states. However, the analysis for the year 2020
had to exclude Oregon, North Dakota, Idaho, and Washington due to inadequate data
reporting. Notably, some states contribute data that document multiple admissions for the
same individual, shaping statistical analyses to accurately portray admissions rather than
individual clients [36]. The dependent variable in this study was co-occurring mental health
and substance use disorder which is coded as PSYPROB (1 = Yes, 2 = No) in the dataset.
As we focused on women, we extracted the records where the client’s biological sex was
female (n = 497,175). We then conducted data pre-processing which consisted of three steps.
First, we conducted listwise deletion for records with missing values at the dependent
variable and thirty-seven relevant predictors that include PSYPROB, STFIPS, SERVICES,
PREG, IDU, EMPLOY, EDUC, ETHNIC, LIVARAG, BARBFLG, MARFLG, DSMCRIT,
AGE, MARSTAT, RACE, PSOURCE, AMPHFLG, ALCDRUG, STIMFLG, MTHAMFLG,
ALCFLG, SEDHPFLG, INHFLG, OTCFLG, PCPFLG, HALLFLG, OPSYNFLG, BENZFLG,
TRNQFLG, METHFLG, COKEFLG, HERFLG, OTHERFLG, METHUSE, FRSTUSE1, SUB1,
SUB2, SUB3, NOPRIOR. Records with incomplete data in any of the predictors, the outcome,
or characteristics used for defining inclusion in the study were excluded from the analysis.
Second, outlier detection was performed using the analyse function in Dataiku at the
dependent variable and all relevant predictors. The outliers were handled by performing
listwise deletion, leaving us a final analytic sample (n = 132,128). Finally, each feature
was processed using target encoding, in which its original value was substituted with
a numerical value derived from the target values. Within the dataset, several features
exhibit different units and scales. This trend could result in certain features having a
more significant influence on the learning algorithm compared to others, thus potentially
introducing bias. To tackle this issue, we employed the min–max normalisation technique
to standardise all the features, consequently ensuring that they are within a consistent
range, typically ranging from 0 to 1 [37]. This ensures that all features contribute equally
to the model. The provided sample was then randomly split into two sets: a training set
comprising 80% of the sample (n = 105,760), and a test set consisting of the remaining 20%
(n = 26,368).
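
The pre-processing steps above (listwise deletion, target encoding, min–max normalisation, and the 80/20 split) can be sketched as follows. This is a simplified illustration in plain Python, not the Dataiku recipe used in the study. It assumes the outcome has been recoded to 1/0 (the dataset codes PSYPROB as 1 = Yes, 2 = No), and the helper names, the toy rows, and the random seed are hypothetical.

```python
import random
from collections import defaultdict


def listwise_delete(rows, columns):
    """Keep only rows with a value in every listed column."""
    return [r for r in rows if all(r.get(c) is not None for c in columns)]


def target_encode(rows, feature, target):
    """Replace each category with the mean target value of that category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[feature]] += r[target]
        counts[r[feature]] += 1
    means = {k: sums[k] / counts[k] for k in sums}
    return [means[r[feature]] for r in rows]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off the test fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy records (made up); PSYPROB already recoded to 1 = Yes, 0 = No.
rows = [
    {"EMPLOY": "unemployed", "PSYPROB": 1},
    {"EMPLOY": "unemployed", "PSYPROB": 0},
    {"EMPLOY": "part-time", "PSYPROB": 0},
    {"EMPLOY": None, "PSYPROB": 1},  # dropped by listwise deletion
    {"EMPLOY": "part-time", "PSYPROB": 0},
]
complete = listwise_delete(rows, ["EMPLOY", "PSYPROB"])
encoded = target_encode(complete, "EMPLOY", "PSYPROB")
scaled = min_max_scale(encoded)
train, test = train_test_split(complete)
```

One design point worth noting: because target encoding uses the outcome, it is common practice to fit the encoding on the training rows only and then apply it to the test rows, so that test labels do not leak into the features. The sketch follows the order in which the text lists the steps and does not show that refinement.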
As the data utilised in this study were sourced from publicly available information
without any subject identification, the research design and methodology were determined
to be exempt from ethics review.

2.2. Statistical Methods


Multivariate analysis was performed using Dataiku v12 [38], an integrated coding-free
platform for data science, machine learning, and analytics [24]. The modelling algorithms
applied for binary classification modelling for the prediction of the probability of co-
occurring mental health and substance use disorders in women were Random Forest,
Gradient Boosted Trees, XGBoost, Extra Trees, SGD, Deep Neural Network, Single-Layer
Perceptron, K Nearest Neighbors (grid) and a super learning model (constructed by
combining the predictions of a Random Forest model and an XGBoost model) [39] (see Figure 1).
In the discipline of predictive modelling, conventional techniques such as linear or logistic
regression have historically been used. However, the advancement of machine learning
has introduced Random Forests (RF) and XGBoost as robust alternatives in the field of
health sciences [40–42]. The rationale behind incorporating a Random Forest model and
an XGBoost model into a super learning framework lies in their capacity to overcome the
limitations of traditional regression approaches [11,43–45]. Random Forest, with its
collection of decision trees, offers resistance against overfitting and excels in capturing intricate,
non-linear relationships within data. Meanwhile, XGBoost utilises gradient boosting to
repeatedly improve predictive accuracy by combining weak learners and tackling obstacles
posed by heterogeneous data. The objective of this integrated strategy is to capitalise on
the advantages of both algorithms, promoting a more robust and precise predictive model.

Figure 1. Analytic workflow.

The optimal analytic approach for forecasting the co-occurrence of mental health and
substance abuse disorders was determined to be the model that maximises the Area Under
the Curve (AUC) [46]. The AUC is a useful metric for evaluating prediction accuracy. It
represents the likelihood that a randomly selected positive case (here, a woman with
co-occurring disorders) will be ranked higher than a randomly selected negative case by
the algorithm. AUC values range from 0 to 1: AUC = 1 indicates a perfect forecast, while
AUC = 0.5 suggests that the prediction is no better than chance.
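
The ranking interpretation of the AUC can be made concrete by counting pairs directly. The sketch below is illustrative only: the function name `pairwise_auc` and the toy labels and scores are hypothetical, and ties are counted as half a win, which is the usual convention. It is quadratic in the number of cases, so it is suited to small examples rather than a test set of this size.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 2 positives and 2 negatives give 4 pairs, 3 ranked correctly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = pairwise_auc(labels, scores)  # 3 / 4 = 0.75
```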
Dataiku’s ML diagnostics feature was enabled to conduct comprehensive checks on
the dataset, modelling parameters, training speed, overfitting, leakage, model checks,
ML assertions, and abnormal predictions. A Bayesian search strategy was employed to
optimise the hyperparameters of the machine learning models. The search was guided by a
probabilistic model that intelligently selected hyperparameter combinations for evaluation.
The goal was to find the best-performing set of hyperparameters for the model’s task. The
search process was limited to exploring five different combinations of hyperparameters.
This approach allowed for an efficient and systematic exploration of the hyperparameter
space, leading to improved model performance. A super learning model was constructed
by combining the predictions of a Random Forest model and an XGBoost model using the
“average” method. Each model was trained independently on the training data to capture
distinct patterns and relationships. During prediction, the outputs of both models were
averaged for each data point, resulting in a final prediction for the super learning model.
This approach leverages the strengths of both Random Forest and XGBoost, providing
a potentially more robust and accurate prediction by blending the insights from these
two diverse algorithms.
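
The "average" combination described above amounts to taking the unweighted mean of the two models' predicted probabilities for each record. The sketch below assumes each model has already produced a probability per record; the function names, the toy probabilities, and the 0.5 decision threshold are assumptions for illustration and are not taken from the study.

```python
def average_predictions(*model_probs):
    """Unweighted mean of per-record probabilities from several models."""
    if not model_probs:
        raise ValueError("at least one model's predictions are required")
    n = len(model_probs[0])
    if any(len(p) != n for p in model_probs):
        raise ValueError("all models must score the same records")
    return [sum(col) / len(model_probs) for col in zip(*model_probs)]


def classify(probs, threshold=0.5):
    """Turn probabilities into 1/0 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probs]


# Toy probabilities for three records (made-up values).
rf_probs = [0.75, 0.25, 0.5]
xgb_probs = [0.5, 0.125, 0.25]
ensemble_probs = average_predictions(rf_probs, xgb_probs)
ensemble_labels = classify(ensemble_probs)
```

Averaging keeps the ensemble output on the same probability scale as its inputs, so metrics such as the AUC can be computed on it in the same way as for the individual models.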

3. Results
Dataiku automatically identifies the best-performing interpretable model based on the
set performance metric (AUC in this case). The characteristics of the sample are presented
in Table 1. For conciseness, the table gives an overview of the cohort that includes only
gender and the top 10 predictors of PSYPROB, selected as the most important based on their
Shapley values. The cohort predominantly resides in states such as New York,
Colorado, and Illinois. Notably, around 30% had no prior treatment episodes, reflecting a
significant proportion seeking treatment for the first time, while the diverse referral sources
(individuals, legal systems, and community referrals) highlight the multifaceted pathways to
treatment. In terms of race, the majority of the individuals classified themselves as Black or
African American (73.9%), and a range of substance use patterns emerged, encompassing
various substances across primary, secondary, and tertiary categories. The significant
unemployment rate of 52.3% among the cohort highlights the possible socioeconomic factors
at play. The substances encompass alcohol, cocaine/crack, marijuana/hashish, prescription
opiates/synthetics, methamphetamine/speed, and various other substances. The diagnostic
data indicated a significant occurrence of opioid dependence, with a prevalence
rate of 33.8%. Additionally, around 26.3% of individuals received medication-assisted
opioid therapy.

Table 1. Baseline characteristics of the cohort (n = 132,128).

Factor | Value | Number (%)
GENDER | Female | 132,128 (100)
State (STFIPS) | New York | 35,662 (27)
 | Colorado | 11,247 (8.5)
 | Illinois | 9351 (7.1)
 | Michigan | 9161 (6.9)
 | North Carolina | 8410 (6.4)
 | New Jersey | 7520 (5.7)
 | Indiana | 6806 (5.2)
 | Connecticut | 5502 (4.2)
 | Kentucky | 4954 (3.7)
 | Tennessee | 4052 (3.1)
 | Missouri | 3593 (2.7)
 | Pennsylvania | 3236 (2.4)
 | Ohio | 2771 (2.1)
 | Other | 19,863 (15)
Previous substance use treatment episodes (NOPRIOR) | No prior treatment episodes | 39,112 (29.6)
 | One prior treatment episode | 28,334 (21.4)
 | Five or more prior treatment episodes | 24,478 (18.5)
 | Two prior treatment episodes | 19,212 (14.5)
 | Three prior treatment episodes | 13,101 (9.9)
 | Four prior treatment episodes | 7891 (6)
Type of treatment service/setting (SERVICES) | Ambulatory, non-intensive outpatient | 70,284 (53.2)
 | Rehab/residential, short-term (30 days or fewer) | 21,110 (16)
 | Ambulatory, intensive outpatient | 15,802 (12)
 | Detox, 24-h, free-standing residential | 13,945 (10.6)
 | Other | 10,983 (8.4)
Referral source (PSOURCE) | Individual (includes self-referral) | 62,232 (47.1)
 | Court/criminal justice referral/DUI/DWI | 29,059 (22)
 | Other community referral | 15,135 (11.5)
 | Alcohol/drug use care provider | 14,176 (10.7)
 | Other | 11,535 (9.7)
Race (RACE) | Black or African American | 97,677 (73.9)
 | Asian or Pacific Islander | 20,931 (15.8)
 | White | 8788 (6.7)
 | Other | 4732 (3.5)
Substance use (secondary) (SUB2) | None | 48,513 (36.7)
 | Cocaine/crack | 19,126 (14.5)
 | Marijuana/hashish | 18,044 (13.7)
 | Methamphetamine/speed | 11,659 (8.8)
 | Alcohol | 10,974 (8.3)
 | Other opiates and synthetics | 7012 (5.3)
 | Other | 10,246 (12.7)
Substance use (tertiary) (SUB3) | None | 92,205 (69.8)
 | Marijuana/hashish | 10,689 (8.1)
 | Alcohol | 6454 (4.9)
 | Cocaine/crack | 5465 (4.1)
 | Methamphetamine/speed | 3373 (2.6)
 | Other | 13,942 (10.5)
Employment (EMPLOY) | Unemployed | 69,048 (52.3)
 | Not in labour force | 51,075 (38.7)
 | Part-time | 12,005 (9.1)
Diagnostic and Statistical Manual of Mental Disorders diagnosis (DSMCRIT) | Opioid dependence | 44,699 (33.8)
 | Alcohol dependence | 22,077 (16.7)
 | Other substance dependence | 17,766 (13.4)
 | Other mental health condition | 14,540 (11)
 | Cannabis dependence | 6567 (5)
 | Cocaine dependence | 6196 (4.7)
 | Other | 20,283 (15.4)
Medication-assisted opioid therapy (METHUSE) | Yes | 34,807 (26.3)
 | No | 97,321 (73.7)

Table 2 shows the performance metrics for each algorithm applied to binary classification
in the test set (N = 26,368). The primary evaluation criterion in this study was the AUC,
which can be read as the probability that a model ranks a randomly selected woman with
co-occurring mental and substance use disorders higher than a randomly selected woman
without them. AUC values ranged from 0.631 to 0.817. As hypothesised, the super learning
model showed the largest AUC (0.817), demonstrating robust predictive capability. It was
closely followed by XGBoost (AUC = 0.809) and by the other tree-based ensembles, Random
Forest (AUC = 0.807), Extra Trees (AUC = 0.803), and Gradient-Boosted Trees (AUC = 0.799).
SGD (AUC = 0.778) and the Single-Layer Perceptron (AUC = 0.776) achieved comparable but
slightly lower AUC scores. K Nearest Neighbors (grid) discriminated less well, with an
AUC of 0.670. Interestingly, the Deep Neural Network performed worst among the investigated
algorithms, with an AUC of 0.631. These findings highlight the superiority of the proposed
ensemble-based approach in achieving higher AUC values and therefore more successful binary
classification in this experimental environment.
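
The ranking interpretation of the AUC can be made concrete with a minimal sketch. The function name and the six scores and labels below are hypothetical illustrations, not values from the study; the code simply counts the share of (positive, negative) pairs in which the positive case receives the higher predicted risk, with ties counted as half.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical predicted risks for six admissions (1 = co-occurring disorders).
example_scores = [0.91, 0.80, 0.62, 0.55, 0.40, 0.15]
example_labels = [1, 1, 0, 1, 0, 0]
example_auc = auc_by_ranking(example_scores, example_labels)
print(f"AUC = {example_auc:.3f}")
```

In this toy example eight of the nine positive-negative pairs are ordered correctly. The pairwise loop is quadratic in the sample size, so for a test set of the size used here a rank-based implementation from a statistics library would be the practical choice.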

Table 2. Performance metrics for each algorithm applied for binary classification in the test set
(N = 26,368).

Model                       AUC    Accuracy  Precision  Recall  F1 Score

Random Forest               0.807  0.742     0.734      0.923   0.818
Gradient Boosted Trees      0.799  0.733     0.726      0.921   0.812
XGBoost                     0.809  0.745     0.739      0.918   0.819
Extra Trees                 0.803  0.738     0.733      0.916   0.814
SGD                         0.778  0.725     0.728      0.898   0.804
Deep Neural Network         0.631  0.628     0.628      1.000   0.771
Single Layer Perceptron     0.776  0.721     0.718      0.916   0.805
K Nearest Neighbors (grid)  0.670  0.661     0.655      0.971   0.782
Super Learning              0.817  0.751     0.743      0.926   0.825
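
As a consistency check on Table 2, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the dictionary copies the rounded values from Table 2, and the tolerance of 0.001 is an assumption chosen to absorb the rounding of the inputs to three decimals.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as printed in Table 2.
table2 = {
    "Random Forest": (0.734, 0.923, 0.818),
    "Gradient Boosted Trees": (0.726, 0.921, 0.812),
    "XGBoost": (0.739, 0.918, 0.819),
    "Extra Trees": (0.733, 0.916, 0.814),
    "SGD": (0.728, 0.898, 0.804),
    "Deep Neural Network": (0.628, 1.0, 0.771),
    "Single Layer Perceptron": (0.718, 0.916, 0.805),
    "K Nearest Neighbors (grid)": (0.655, 0.971, 0.782),
    "Super Learning": (0.743, 0.926, 0.825),
}

# Inputs are rounded to three decimals, so allow a small tolerance (assumed).
TOLERANCE = 0.001
mismatches = [
    name
    for name, (precision, recall, reported) in table2.items()
    if abs(f1_score(precision, recall) - reported) > TOLERANCE
]
print("Rows whose F1 differs from the recomputed value:", mismatches)
```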

Table 3 provides the mean and standard deviation (SD) of each metric across the evaluated
models. Measured against these averages, the “super learning” model scored above the mean
for AUC, accuracy, precision, and F1 score, and it emerged as the better option.

Table 3. Mean and standard deviation of key metrics across the evaluated models.

Metric Mean Standard Deviation (SD)


AUC 0.772 0.059
Accuracy 0.713 0.037
Precision 0.711 0.037
Recall 0.935 0.032
F1 Score 0.804 0.020

Figure 2 provides a visual representation of the AUC in the test set (n = 26,368) for
the super learning model. The super learning model exhibited the highest AUC among all
models (0.817), a value that is typically considered strong performance for prediction
models. Its other metrics were accuracy (0.751), precision (0.743), recall (0.926), and
F1 score (0.825). This outcome underscores the model’s strong ability to distinguish women
with co-occurring mental and substance use disorders from those without.


Figure 2. Distribution of the performance metric (AUC) of the super learning model.

4. Discussion
This section offers a discussion on the application, limitations, and future prospects
of the research findings. It highlights the practical implications, potential challenges, and
opportunities for further progress in the field of co-occurring mental health and substance
use disorders among women.

4.1. Application
In ML, classification is a vital task that involves predicting the target class of each
data instance [28]. Achieving good performance on diverse datasets requires the careful
selection of suitable classifiers, and the challenge lies in identifying the data mining
or machine learning model best suited to a specific problem. To tackle this complexity,
researchers often train an array of models and compare their performance for a given
scenario. AutoML platforms are appealing pre-packaged tools for constructing predictive
models using healthcare data [47]. In this study, the AutoML interface was employed to
predict and differentiate the co-occurrence of mental health and substance use disorders
in women. AutoML has been reported to improve precision and specificity in a number of
research investigations. By harnessing the capabilities of AutoML, the study automated
critical tasks encompassing model selection, hyperparameter optimisation, and feature
engineering. This streamlined approach simplifies the analytical pipeline and can improve
the accuracy of prediction models, marking a promising stride within computational
health research.
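
The kind of search that an AutoML platform automates can be illustrated with a small sketch of hyperparameter selection by cross-validation. It does not reproduce Dataiku's internals: the one-feature nearest-neighbour classifier, the toy data, the candidate grid, and all function names are hypothetical and serve only to show the select-by-validation-score loop.

```python
def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (single feature)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def cv_accuracy(data, k, folds=4):
    """Mean accuracy over interleaved folds; every point is held out once."""
    correct = 0
    for fold in range(folds):
        held_out = data[fold::folds]
        train = [pair for i, pair in enumerate(data) if i % folds != fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(data)


def select_k(data, grid=(1, 3, 5)):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_accuracy(data, k) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical (feature, label) pairs; not derived from TEDS-A.
toy_data = [
    (0.10, 0), (0.20, 0), (0.30, 0), (0.35, 1), (0.40, 0), (0.60, 1),
    (0.65, 0), (0.70, 1), (0.80, 1), (0.90, 1), (0.15, 0), (0.85, 1),
]
best_k, cv_scores = select_k(toy_data)
print("Selected k:", best_k, "with cross-validated accuracy", cv_scores[best_k])
```

An AutoML tool runs the same loop over many model families and hyperparameter grids at once, which is why the validation metric (here accuracy, in the study the AUC) has to be fixed before the search starts.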


A significant novelty of our study is the use of an AutoML interface to predict
co-occurring mental and substance use disorders in women. The research serves as a
practical demonstration of the presented statistical methods within a real-world context,
offering valuable insights into predictive analytics in health-related domains.
The performance of the super learning model in distinguishing co-occurring mental health
and substance use disorders among women carries substantial potential for transformative
impact. The super learning model comprised two base learners, a Random Forest model and an
XGBoost model, and it outperformed the individual base learners. Its accuracy would enable
early identification of women at risk of co-occurring mental health and substance use
disorders. Women who receive substance use treatment that is tailored to their gender stay
in treatment longer and have a higher probability of maintaining abstinence after
completing treatment [1]. More accurate and timely diagnosis has significant consequences
for the development of improved treatment techniques aimed at reducing the complications,
illness, and death associated with these disorders [26]. Resource allocation would also
become more efficient, as the model’s precision allows healthcare providers to focus on
those at elevated risk, ensuring that support and treatment resources are channelled where
they can yield the most significant benefits.
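
The combination step of a super learner with two base learners can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in Dataiku: the out-of-fold probabilities are invented, the Brier score is an assumed loss, and the convex weight is found by a simple grid search. The variable names rf_oof and xgb_oof are hypothetical stand-ins for cross-validated predictions from a Random Forest and an XGBoost model.

```python
def brier(preds, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


def fit_convex_weight(preds_a, preds_b, labels, steps=100):
    """Grid-search w in [0, 1] that minimises the Brier score of w*a + (1-w)*b."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        loss = brier(blended, labels)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Hypothetical out-of-fold predicted probabilities; not values from the study.
labels = [1, 1, 1, 0, 0, 1, 0, 1]
rf_oof = [0.9, 0.6, 0.7, 0.4, 0.2, 0.8, 0.5, 0.3]
xgb_oof = [0.7, 0.8, 0.6, 0.1, 0.4, 0.9, 0.3, 0.6]

weight, blended_loss = fit_convex_weight(rf_oof, xgb_oof, labels)
print(f"weight on first learner = {weight:.2f}, blended Brier = {blended_loss:.4f}")
print(f"base learners alone: {brier(rf_oof, labels):.4f}, {brier(xgb_oof, labels):.4f}")
```

Because the grid includes w = 0 and w = 1, the blended cross-validated loss can never be worse than that of either base learner on the same predictions, which is the intuition behind a super learner matching or exceeding its components.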

4.2. Limitations
AutoML platforms expedite the development of machine learning pipelines, and the models
they produce can be used as initial frameworks for constructing predictive models. The
integration of such a model should be approached with caution when determining the best
output levels for the research topic, taking into account ethical considerations, data
security, and ongoing clinical supervision to ensure that its application aligns with the
envisioned positive impact. Fear of stigma prevents some women from seeking substance
abuse treatment, making them less likely than men to pursue help [10], which is a
potential limitation in real-world application. Although the super learning model
demonstrated enhanced accuracy in differentiating co-occurring mental health and substance
use disorders in women, it is important to acknowledge its limitations, including the
possibility of false positives and false negatives. While the potential benefits of early
identification and intervention for the health of women are promising, it may take time
for these effects to become evident. Further research is necessary to validate and measure
these possible long-term benefits.

4.3. Future Prospects


The utilisation of AutoML platforms, as demonstrated by the study’s implementation
of Dataiku, signifies a progression towards enhancing patient outcomes in the medical
domain, particularly through the efficient analysis of large collections of patient data.
Streamlined methodologies and improved diagnostic procedures increase the efficiency of
healthcare operations while potentially preserving resources. This could reduce the need
for physical infrastructure such as storage rooms, as data administration and utilisation
become more optimised. The performance of the super learning model has the potential to
bring about significant changes, especially in the early detection of women who are at
risk of experiencing mental health and substance use issues simultaneously. Early
diagnosis enables proactive intervention, customised support, and trauma-informed
treatments [48], which address the fear of social disapproval and encourage more
compassionate approaches to women’s mental well-being. By supporting effective resource
allocation, the model’s accuracy could help overcome the obstacles related to stigma,
leading to improved health outcomes for affected women and for their families and
communities. Ultimately, the precise forecasts generated by the model have the potential
to accelerate progress in research methodologies and shape policy choices in the field of
co-occurring mental health and substance use problems.

5. Conclusions
This study investigated the potential of AutoML for predicting co-occurring mental
health and substance use disorders among women using TEDS-A data for 2020. Using advanced
statistical and machine learning techniques through Dataiku’s AutoML interface, the study
built a super learning model that achieved a high AUC of 0.817, demonstrating robust
predictive capability. These findings highlight the promise of AutoML in healthcare,
particularly the super learning model’s potential as a diagnostic tool for early
identification of co-occurring disorders in women. Future research should focus on
disseminating knowledge about AutoML’s advantages and the ethical considerations of
integrating it into healthcare. Collaboration with policymakers, medical associations, and
patient advocacy groups is crucial for establishing guidelines on responsible
implementation, data privacy, and continuous performance monitoring. Such a holistic
approach would help maximise the model’s benefits while adhering to the highest ethical
standards and safeguarding patient well-being and privacy.

Author Contributions: Conceptualisation, N.A. and P.K.; methodology, N.A. and P.K.; software, N.A.
and P.K.; validation, N.A., P.K., M.A. and J.S.; formal analysis, N.A. and P.K.; writing—original draft
preparation, N.A. and P.K.; writing—review and editing, N.A., P.K., M.A. and J.S.; supervision, M.A.
and J.S.; project administration, P.K. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Publicly available datasets were analyzed in this study. The data can
be found here: https://www.datafiles.samhsa.gov/dataset/treatment-episode-data-set-admissions-
2020-teds-2020-ds0001 (accessed on 2 June 2023).
Conflicts of Interest: Author Padmaja Kar was employed by the company St Vincent’s Care Services.
The remaining authors declare that the research was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.

References
1. Louison, L.; Green, S.L.; Bunch, S.; Scheyett, A. The problems no one wants to see: Mental illness and substance abuse among
women of reproductive age in North Carolina. North Carol. Med. J. 2009, 70, 454–458. [CrossRef]
2. Stewart, D.; Ashraf, I.; Munce, S. Women’s mental health: A silent cause of mortality and morbidity. Int. J. Gynecol. Obstet. 2006,
94, 343–349. [CrossRef] [PubMed]
3. Kokane, S.S.; Perrotti, L.I. Sex Differences and the Role of Estradiol in Mesolimbic Reward Circuits and Vulnerability to Cocaine
and Opiate Addiction. Front. Behav. Neurosci. 2020, 14, 74. [CrossRef] [PubMed]
4. McCaul, M.E.; Roach, D.; Hasin, D.S.; Weisner, C.; Chang, G.; Sinha, R. Alcohol and women: A brief overview. Alcohol. Clin. Exp.
Res. 2019, 43, 774. [CrossRef]
5. Fox, H.C.; Sinha, R. Sex differences in drug-related stress-system changes: Implications for treatment in substance-abusing
women. Harv. Rev. Psychiatry 2009, 17, 103–119. [CrossRef] [PubMed]
6. Prieto-Arenas, L.; Díaz, I.; Arenas, M.C. Gender differences in dual diagnoses associated with cannabis use: A review. Brain Sci.
2022, 12, 388. [CrossRef]
7. Ruiz, M.A.; Douglas, K.S.; Edens, J.F.; Nikolova, N.L.; Lilienfeld, S.O. Co-occurring mental health and substance use problems in
offenders: Implications for risk assessment. Psychol. Assess. 2012, 24, 77–87. [CrossRef]
8. Forster, M.; Rogers, C.J.; Tinoco, S.; Benjamin, S.; Lust, K.; Grigsby, T.J. Adverse childhood experiences and alcohol related
negative consequence among college student drinkers. Addict. Behav. 2023, 136, 107484. [CrossRef]
9. Larsen, J.L.; Johansen, K.S.; Mehlsen, M.Y. What kind of science for dual diagnosis? A pragmatic examination of the enactive
approach to psychiatry. Front. Psychol. 2022, 13, 825701. [CrossRef]
10. Agterberg, S.; Schubert, N.; Overington, L.; Corace, K. Treatment barriers among individuals with co-occurring substance use
and mental health problems: Examining gender differences. J. Subst. Abus. Treat. 2020, 112, 29–35. [CrossRef]
11. Acion, L.; Kelmansky, D.; van der Laan, M.; Sahker, E.; Jones, D.; Arndt, S. Use of a machine learning framework to predict
substance use disorder treatment success. PLoS ONE 2017, 12, e0175383. [CrossRef]


12. Miranda, O.; Fan, P.; Qi, X.; Wang, H.; Brannock, M.D.; Kosten, T.R.; Ryan, N.D.; Kirisci, L.; Wang, L. DeepBiomarker2: Prediction
of Alcohol and Substance Use Disorder Risk in Post-Traumatic Stress Disorder Patients Using Electronic Medical Records and
Multiple Social Determinants of Health. J. Pers. Med. 2024, 14, 94. [CrossRef]
13. Adams, R.S.; Jiang, T.; Rosellini, A.J.; Horváth-Puhó, E.; Street, A.E.; Keyes, K.M.; Cerdá, M.; Lash, T.L.; Sørensen, H.T.; Gradus, J.L.
Sex-Specific Risk Profiles for Suicide Among Persons with Substance Use Disorders in Denmark. Addiction 2021, 116, 2882–2892.
[CrossRef]
14. Aishwarya, N.; Yathishan, D.; Alageswaran, R.; Manivannan, D. AutoML Based IoT Application for Heart Attack Risk Prediction.
In Proceedings of the Decision Intelligence Solutions, Singapore, 2–3 March 2023; pp. 19–29.
15. Kundu, A.; Chaiton, M.; Billington, R.; Grace, D.; Fu, R.; Logie, C.; Baskerville, B.; Yager, C.; Mitsakakis, N.; Schwartz, R. Machine
Learning Applications in Mental Health and Substance Use Research Among the LGBTQ2S+ Population: Scoping Review. JMIR
Med Inf. 2021, 9, e28962. [CrossRef] [PubMed]
16. Johnstone, S.; Dela Cruz, G.A.; Kalb, N.; Tyagi, S.V.; Potenza, M.N.; George, T.P.; Castle, D.J. A systematic review of gender-
responsive and integrated substance use disorder treatment programs for women with co-occurring disorders. Am. J. Drug
Alcohol Abus. 2023, 49, 21–42. [CrossRef] [PubMed]
17. Waring, J.; Lindvall, C.; Umeton, R. Automated machine learning: Review of the state-of-the-art and opportunities for healthcare.
Artif. Intell. Med. 2020, 104, 101822. [CrossRef] [PubMed]
18. Obermeyer, Z.; Powers, B.; Vogeli, C.; Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of
populations. Science 2019, 366, 447–453. [CrossRef] [PubMed]
19. Mustafa, A.; Rahimi Azghadi, M. Automated Machine Learning for Healthcare and Clinical Notes Analysis. Computers 2021,
10, 24. [CrossRef]
20. Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018, 319, 1317–1318. [CrossRef] [PubMed]
21. Rajkomar, A.; Dean, J.; Kohane, I. Machine learning in medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [CrossRef]
22. Tsamardinos, I.; Charonyktakis, P.; Papoutsoglou, G.; Borboudakis, G.; Lakiotaki, K.; Zenklusen, J.C.; Juhl, H.; Chatzaki, E.;
Lagani, V. Just Add Data: Automated predictive modeling for knowledge discovery and feature selection. NPJ Precis. Oncol. 2022,
6, 38. [CrossRef] [PubMed]
23. Thomaidis, G.V.; Papadimitriou, K.; Michos, S.; Chartampilas, E.; Tsamardinos, I. A characteristic cerebellar biosignature for
bipolar disorder, identified with fully automatic machine learning. IBRO Neurosci. Rep. 2023, 15, 77–89. [CrossRef] [PubMed]
24. Naser, M.Z. Machine learning for all! Benchmarking automated, explainable, and coding-free platforms on civil and environmental
engineering problems. J. Infrastruct. Intell. Resil. 2023, 2, 100028. [CrossRef]
25. Perotte, A.; Pivovarov, R.; Natarajan, K.; Weiskopf, N.; Wood, F.; Elhadad, N. Diagnosis code assignment: Models and evaluation
metrics. J. Am. Med. Inf. Assoc. 2014, 21, 231–237. [CrossRef] [PubMed]
26. Zhuhadar, L.P.; Lytras, M.D. The Application of AutoML Techniques in Diabetes Diagnosis: Current Approaches, Performance,
and Future Directions. Sustainability 2023, 15, 13484. [CrossRef]
27. Barenholtz, E.; Fitzgerald, N.D.; Hahn, W.E. Machine-learning approaches to substance-abuse research: Emerging trends and
their implications. Curr. Opin. Psychiatry 2020, 33, 334–342. [CrossRef]
28. Kabir, M.F.; Ludwig, S.A. Enhancing the Performance of Classification Using Super Learning. Data-Enabled Discov. Appl. 2019,
3, 5. [CrossRef]
29. Van der Laan, M.J.; Rose, S. Targeted Learning: Causal Inference for Observational and Experimental Data; Springer: Berlin/Heidelberg,
Germany, 2011; Volume 4.
30. Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super Learner. Stat. Appl. Genet. Mol. Biol. 2007, 6. [CrossRef]
31. Comartin, E.B.; Burgess-Proctor, A.; Harrison, J.; Kubiak, S. Gender, Geography, and Justice: Behavioral Health Needs and Mental
Health Service Use Among Women in Rural Jails. Crim. Justice Behav. 2021, 48, 1229–1242. [CrossRef]
32. Zhao, Q.; Kong, Y.; Henderson, D.; Parrish, D. Arrest Histories and Co-Occurring Mental Health and Substance Use Disorders
Among Women in the USA. Int. J. Ment. Health Addict. 2023. [CrossRef]
33. SAMHSA. Treatment Episode Data Set Admissions (TEDS-A) 2020; SAMHSA: Rockville, MD, USA, 2023.
34. Standeven, L.R.; Scialli, A.; Chisolm, M.S.; Terplan, M. Trends in cannabis treatment admissions in adolescents/young adults:
Analysis of TEDS-A 1992 to 2016. J. Addict. Med. 2020, 14, e29–e36. [CrossRef]
35. Baird, A.; Cheng, Y.; Xia, Y. Use of machine learning to examine disparities in completion of substance use disorder treatment.
PLoS ONE 2022, 17, e0275054. [CrossRef]
36. Yang, J.C.; Roman-Urrestarazu, A.; Brayne, C. Differences in receipt of opioid agonist treatment and time to enter treatment
for opioid use disorder among specialty addiction programs in the United States, 2014–2017. PLoS ONE 2019, 14, e0226349.
[CrossRef]
37. Pozo-Luyo, C.A.; Cruz-Duarte, J.M.; Amaya, I.; Ortiz-Bayliss, J.C. Forecasting PM2.5 concentration levels using shallow machine
learning models on the Monterrey Metropolitan Area in Mexico. Atmos. Pollut. Res. 2023, 14, 101898. [CrossRef]
38. Egger, R. Software and tools. Applied Data Science in Tourism: Interdisciplinary Approaches, Methodologies, and Applications; Springer:
Cham, Switzerland, 2022; pp. 547–588.
39. Tapeh, A.T.G.; Naser, M.Z. Artificial Intelligence, Machine Learning, and Deep Learning in Structural Engineering: A Scientomet-
rics Review of Trends and Best Practices. Arch. Comput. Methods Eng. 2023, 30, 115–159. [CrossRef]


40. Sahker, E.; Acion, L.; Arndt, S. National analysis of differences among substance abuse treatment outcomes: College student and
nonstudent emerging adults. J. Am. Coll. Health 2015, 63, 118–124. [CrossRef]
41. Glasheen, C.; Pemberton, M.R.; Lipari, R.; Copello, E.A.; Mattson, M.E. Binge drinking and the risk of suicidal thoughts, plans,
and attempts. Addict. Behav. 2015, 43, 42–49. [CrossRef]
42. Alang, S.M. Sociodemographic disparities associated with perceived causes of unmet need for mental health care. Psychiatr.
Rehabil. J. 2015, 38, 293. [CrossRef] [PubMed]
43. Huang, J.-C.; Tsai, Y.-C.; Wu, P.-Y.; Lien, Y.-H.; Chien, C.-Y.; Kuo, C.-F.; Hung, J.-F.; Chen, S.-C.; Kuo, C.-H. Predictive modeling of
blood pressure during hemodialysis: A comparison of linear model, random forest, support vector regression, XGBoost, LASSO
regression and ensemble method. Comput. Methods Programs Biomed. 2020, 195, 105536. [CrossRef]
44. Hong, W.; Zhou, X.; Jin, S.; Lu, Y.; Pan, J.; Lin, Q.; Yang, S.; Xu, T.; Basharat, Z.; Zippi, M. A comparison of XGBoost, random
forest, and nomograph for the prediction of disease severity in patients with COVID-19 pneumonia: Implications of cytokine and
immune cell profile. Front. Cell. Infect. Microbiol. 2022, 12, 819267. [CrossRef]
45. Meng, D.; Xu, J.; Zhao, J. Analysis and prediction of hand, foot and mouth disease incidence in China using Random Forest and
XGBoost. PLoS ONE 2021, 16, e0261629. [CrossRef] [PubMed]
46. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [CrossRef]
47. Romero, R.A.A.; Deypalan, M.N.Y.; Mehrotra, S.; Jungao, J.T.; Sheils, N.E.; Manduchi, E.; Moore, J.H. Benchmarking AutoML
frameworks for disease prediction using medical claims. BioData Min. 2022, 15, 15. [CrossRef] [PubMed]
48. Apsley, H.B.; Vest, N.; Knapp, K.S.; Santos-Lozada, A.; Gray, J.; Hard, G.; Jones, A.A. Non-engagement in substance use treatment
among women with an unmet need for treatment: A latent class analysis on multidimensional barriers. Drug Alcohol Depend.
2023, 242, 109715. [CrossRef] [PubMed]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

applied
sciences
Article
A Novel Criticality Analysis Method for Assessing Obesity
Treatment Efficacy
Shadi Eltanani 1, *, Tjeerd V. olde Scheper 1 , Mireya Muñoz-Balbontin 1 , Arantza Aldea 1 , Jo Cossington 2 , Sophie
Lawrie 2 , Salvador Villalpando-Carrion 3 , Maria Jose Adame 3 , Daniela Felgueres 3 , Clare Martin 1
and Helen Dawes 4

1 School of Engineering, Computing and Mathematics, Faculty of Technology, Design and Environment,
Oxford Brookes University, Wheatley Campus, Wheatley, Oxford OX33 1HX, UK;
[email protected] (T.V.o.S.); [email protected] (A.A.); [email protected] (C.M.)
2 Centre for Movement and Occupational Rehabilitation Sciences (MOReS), Oxford Brookes University,
Oxford OX3 0BP, UK; [email protected] (J.C.); [email protected] (S.L.)
3 Hospital Infantil de Mexico Federico Gomez, Mexico City 06720, Mexico;
[email protected] (S.V.-C.); [email protected] (M.J.A.);
[email protected] (D.F.)
4 National Institute for Health and Care Research (NIHR) Exeter Biomedical Research Centre,
University of Exeter, St Luke’s Campus, Exeter EX1 2LU, UK; [email protected]
* Correspondence: [email protected]

Abstract: Human gait is a significant indicator of overall health and well-being due to its dependence
on metabolic requirements. Abnormalities in gait can indicate the presence of metabolic dysfunction,
such as diabetes or obesity. However, detecting these can be challenging using classical methods,
which often involve subjective clinical assessments or invasive procedures. In this work, a novel
methodology known as Criticality Analysis (CA) was applied to the monitoring of the gait of
teenagers with varying amounts of metabolic stress who are taking part in a clinical intervention
to increase their activity and reduce overall weight. The CA approach analysed gait using inertial
measurement units (IMU) by mapping the dynamic gait pattern into a nonlinear representation
space. The resulting dynamic paths were then classified using a Support Vector Machine (SVM)
algorithm, which is well-suited for this task due to its ability to handle nonlinear and dynamic
data. The combination of the CA approach and the SVM algorithm detected metabolic stress
non-invasively, with an average accuracy within the range of 78.2% to 90%. Additionally, at the
group level, fitness and health were observed to improve during the period of the intervention.
Therefore, this methodology showed great potential to be a valuable tool for healthcare
professionals in detecting and monitoring metabolic stress, as well as other associated disorders.

Keywords: human gait; criticality analysis; support vector machine

Citation: Eltanani, S.; olde Scheper, T.V.; Muñoz-Balbontin, M.; Aldea, A.; Cossington, J.; Lawrie, S.;
Villalpando-Carrion, S.; Adame, M.J.; Felgueres, D.; Martin, C.; et al. A Novel Criticality Analysis
Method for Assessing Obesity Treatment Efficacy. Appl. Sci. 2023, 13, 13225.
https://doi.org/10.3390/app132413225

Academic Editors: Arkady Voloshin, Chien-Hung Yeh, Wenbin Shi, Xiaojuan Ban, Men-Tzung Lo and
Shenghong He

Received: 18 August 2023
Revised: 2 November 2023
Accepted: 7 December 2023
Published: 13 December 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Human gait, the intricate orchestration of biomechanical movements during ambulation,
stands as a pivotal aspect of human motor function with far-reaching implications
for an individual’s health and overall wellbeing [1–3]. This intricate phenomenon, however,
is not a static entity. It is subject to perturbations stemming from a diverse array
of factors, encompassing injuries, diseases, disorders, and external conditions, thereby
engendering deviations from established norms and the emergence of irregular or aberrant
gait patterns [4–6].
The traditional methodologies employed in gait analysis, predominantly reliant upon
observational techniques, possess inherent limitations marked by subjectivity and
inter-observer variability [7]. These methodologies, furthermore, confront difficulties in
adequately encapsulating the nuanced, multifaceted, and quantitative attributes inherent

Appl. Sci. 2023, 13, 13225. https://doi.org/10.3390/app132413225 https://www.mdpi.com/journal/applsci



in gait, particularly when confronted with intricate patterns stemming from underlying
pathologies [8].
In the contemporary landscape of medical diagnostics, artificial intelligence (AI) has
emerged as a promising frontier, bearing the capacity to substantially enhance gait analysis
and expedite the detection of gait-related disorders [9,10]. Its proficiency in the processing
of extensive datasets, identification of latent patterns, and facilitation of early diagnoses
offers substantial promise [11]. Nonetheless, AI grapples with intricate challenges in the
realm of gait analysis, stemming from the dynamic nature of gait itself and the intricate
interplay of a multitude of contributory factors [11].
In response to these exigencies, this paper introduces Criticality Analysis (CA) as an
innovative and robust AI tool primed for precise identification of abnormal or irregular
gait patterns, thereby indicating the presence of latent disorders or ailments. Beyond
its diagnostic prowess, CA serves as a dynamic tool for the continuous monitoring of
disorder progression and the systematic tracking of treatment efficacy over temporal
trajectories. This innovative paradigm promises to empower medical practitioners in their
clinical decision making in relation to gait-related disorders, ultimately effecting substantial
enhancements in patient outcomes and the broader landscape of healthcare.
The paper is structured as follows: Section 2 outlines a methodology employing
mathematical models to assess critical aspects of human gait, emphasising the integration of
mathematical modeling and criticality analysis in the context of gait study. In Section 3, the
methodology applied to the Criticality Analysis of Diabetic Gait in Children (CARDIGAN)
dataset is detailed, covering data collection, feature extraction, criticality analysis data
representation, and spatiotemporal analysis. Section 4 explores the experimental results
of the CARDIGAN dataset, including the analysis of Receiver Operating Characteristic
(ROC) Curves, the calculation of Area Under the Curve (AUC), and the determination of
the decision boundary of the support vector method. Section 5 provides a concise summary
of the overarching results, while Section 6 concludes by highlighting the key findings of
the research.

2. Mathematical Model-Driven Criticality Analysis for Human Gait Assessment


The complexities of human locomotion have long intrigued researchers who strive to
decode the dynamic, self-organised patterns underlying gait. With each step, the intricately
choreographed motions blur the boundary between conscious control and automated
processes. Gait is governed by nonlinear phenomena, including emergent oscillations,
traveling waves, and spiraling coordination, which evolve across spatiotemporal scales. To
decipher the chaotic fluctuations disrupting normal walking, mathematical models become
indispensable, capturing nuanced biomechanics within the motor control system [12].
Particularly intriguing is the analysis of disruptions propelling gait into criticality, marked
by surges in kinetic energy and resulting in near power-law and exponential dynamics. The
human locomotive system, while captivating, lacks comprehensive models characterising
its dynamics across both temporal and spatial dimensions. Simplified models often cannot
fully capture the complex control mechanisms translating network behavior into stable,
coordinated movement patterns across space. Therefore, developing sophisticated models
is crucial for decoding the inherent complexities of human gait and locomotion.
In this paper, we employ a nonlinear biochemical enzyme control model to conduct
a comprehensive analysis of human gait dynamics, shedding new light on the intricate
biomechanical processes underlying locomotion [13]. The investigation is centered on
elucidating the specific role of biochemical reactions in shaping gait patterns.
This meticulous exploration of biochemical intricacies allows for discerning their profound
influence on the overall coordination and characteristics of human gait. This modeling
approach gains particular relevance when examining scenarios where particular biochemical
pathways are suspected to be pivotal factors contributing to gait abnormalities or
adaptations, as often observed in various pathological conditions.

Appl. Sci. 2023, 13, 13225

The model utilises a control mechanism to stabilise external perturbations to the motor
system by precisely calibrating the quantity of enzyme in relation to the concentration
of one of the variables, f . The model also represents the control process of two enzymes
that govern the formation of the extracellular matrix, m, from soluble filaments, f . The
proteinase, p, deconstructs the matrix into filaments, while transglutaminase, g, reassembles
the filaments into the matrix. The extracellular matrix, m, is continually generated by
adjacent cells, rim , at a constant rate, with each protein undergoing catalytic processes
proportional to p. The bifurcation parameter, rim , acts as an external turbulent input to
the control model. The dynamics governing the rate of enzyme production, specifically
enzymes p and g, are influenced by the Rate Control of Chaos (RCC). This approach
employs a series of nonlinear rate equations, as illustrated in Equations (4)–(7), to describe
the temporal evolution of system variables. A key control term in these rate equations
contains variable f , which has a strong nonlinear influence on the dynamics. The RCC
confines this control term using a rate control function, as depicted in Equation (1), that
restricts its divergence rate, thereby stabilising the overall system behavior. The adjustable
parameters in the rate control function allow tuning the intensity of control applied to
the chaotic dynamics. Meanwhile, the criticality analysis involves examining the phase
space representation of outputs f and m, which correspond to concentrations of soluble
filaments and extracellular matrix, respectively. The phase space plot with f on the x-
axis and m on the y-axis illustrates the time-dependent evolution of nonlinear dynamics.
As parameters are varied, the system exhibits complex phenomena, including bistability,
limit cycles, spiralling trajectories, and chaos, particularly near critical transition points.
Analysing the geometric patterns within the phase portrait provides valuable insights into
the mechanisms underpinning self-organised criticality. Characteristics such as the number
and stability of fixed points, oscillations, excitability, and susceptibility to perturbations
can be deduced from phase space topology. Additionally, the fractal-like features within
the phase portrait unveil the self-similar, scale-invariant nature of critical fluctuations.
Moreover, this representation facilitates the quantification of nonlinear correlations that
capture the intricacies of coupled dynamics. Therefore, phase space-based criticality
analysis unveils the system’s rich nonlinear behavior, phase transitions, and emergent
complexity resulting from self-organised criticality.
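$$q_f = \frac{f}{f + \mu_f}, \tag{1}$$

$$\sigma_p(q_f) = f_p\, e^{(x_p q_f)}, \tag{2}$$

$$\sigma_g(q_f) = f_g\, e^{(x_g q_f)}, \tag{3}$$

$$\frac{dm}{dt} = k_g \frac{f g}{K_G + f} - \frac{m p}{1 + m} + r_{im}, \tag{4}$$

$$\frac{df}{dt} = -k_g \frac{f g}{K_G + f} + \frac{m p}{1 + m} - \frac{f p}{1 + f}, \tag{5}$$

$$\frac{dp}{dt} = \sigma_p(q_f)\, \gamma \frac{f^n}{K_R^n + f^n} - k_a p^2, \tag{6}$$

$$\frac{dg}{dt} = \sigma_g(q_f)\, \beta \frac{f^l}{K_S^l + f^l} - k_{deg} \frac{g p}{K_{deg} + g}. \tag{7}$$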
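The rate equations can be made concrete with a short sketch. The Python code below is a minimal illustration and not the authors' implementation: it evaluates the rate control functions of Equations (1)–(3) and the right-hand side of Equations (4)–(7) for a single oscillator, using the parameter values quoted in the text. The function names, the dictionary layout, and the reading of the Hill terms as f^n/(K_R^n + f^n) and f^l/(K_S^l + f^l) are assumptions made for illustration.

```python
import math

# Parameter values as quoted in the text; the names are illustrative.
PARAMS = {
    "gamma": 0.026, "beta": 0.00075,
    "K_R": 4.5, "K_S": 1.0, "K_G": 0.1, "K_deg": 1.1,
    "k_g": 0.05, "k_deg": 0.05, "k_a": 0.0455,
    "n": 4, "l": 4,
    "f_p": 1.0, "f_g": 1.0, "x_p": -1.0, "x_g": -1.0, "mu_f": 2.0,
}


def rate_control(f, scale, exponent, mu_f):
    """Equations (1)-(3): sigma(q_f) = scale * exp(exponent * q_f)."""
    q_f = f / (f + mu_f)
    return scale * math.exp(exponent * q_f)


def derivatives(state, r_im, P=PARAMS):
    """Right-hand side of Equations (4)-(7) for one oscillator."""
    m, f, p, g = state
    assembly = P["k_g"] * f * g / (P["K_G"] + f)   # filaments -> matrix
    breakdown = m * p / (1.0 + m)                  # matrix -> filaments
    sigma_p = rate_control(f, P["f_p"], P["x_p"], P["mu_f"])
    sigma_g = rate_control(f, P["f_g"], P["x_g"], P["mu_f"])
    hill_p = f ** P["n"] / (P["K_R"] ** P["n"] + f ** P["n"])  # assumed Hill form
    hill_g = f ** P["l"] / (P["K_S"] ** P["l"] + f ** P["l"])  # assumed Hill form
    dm = assembly - breakdown + r_im
    df = -assembly + breakdown - f * p / (1.0 + f)
    dp = sigma_p * P["gamma"] * hill_p - P["k_a"] * p ** 2
    dg = sigma_g * P["beta"] * hill_g - P["k_deg"] * g * p / (P["K_deg"] + g)
    return dm, df, dp, dg


# Hypothetical state (m, f, p, g); not an initial condition from the paper.
dm, df, dp, dg = derivatives((1.0, 1.0, 1.0, 1.0), r_im=0.0)
```

Because x_p and x_g are negative, the rate control factor shrinks as f grows, which is how the control term limits the growth rate of enzyme production.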


Mathematically speaking, the CA model has several parameters, including $\gamma = 0.026$,
$\beta = 0.00075$, $K_R = 4.5$, $K_S = 1$, $K_G = 0.1$, $K_{deg} = 1.1$, $k_g = k_{deg} = 0.05$, and
$k_a = k_{deg}/K_{deg} = 0.0455$. Hill numbers $n$ and $l$ are both set to four. Varying bifurcation
parameter $r_{im}$ produces a wide range of dynamic behaviors, including stable periodic cycles,
bistability, and chaos. This parameter remains constant for all oscillators within the chaotic
domain. Additionally, an external input is applied as a perturbation to the $r_{im}$ parameter as
described in Equation (8). This parameter links different oscillators together by using a relative
scale contribution from all other oscillators. RCC control parameters presented in
Equations (1)–(3) ($f_p = f_g = 1$, $x_p = x_g = -1$, and $\mu_f = 2$) are kept constant throughout
the simulations in this paper, but can take different values that allow a local oscillator to
change its oscillatory orbit.
$$r_{im,i} = \sum_{j=1,\, j \neq i}^{n} w_j m_j + \varepsilon. \tag{8}$$
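As a hedged illustration of Equation (8), the following Python sketch computes the input to each oscillator as the weighted sum of the matrix concentrations of all other oscillators plus a noise term. The function name, the example concentrations, and the clipping of a Gaussian draw to [−1, 1] are assumptions; the text does not specify how the perturbation is scaled.

```python
import random


def coupled_inputs(m_values, weights, rng, noise_scale=1.0):
    """Equation (8): input to oscillator i is sum over j != i of w_j * m_j, plus epsilon."""
    total = sum(w * m for w, m in zip(weights, m_values))
    inputs = []
    for w_i, m_i in zip(weights, m_values):
        # Assumption: Gaussian draw clipped to [-1, 1]; the paper's scaling is not specified.
        eps = max(-1.0, min(1.0, rng.gauss(0.0, 1.0))) * noise_scale
        inputs.append(total - w_i * m_i + eps)   # remove the self-contribution
    return inputs


rng = random.Random(0)
m_values = [0.5 + 0.01 * i for i in range(16)]   # hypothetical matrix concentrations
weights = [0.0002] * 16                          # connectivity strength used in the paper
# Noise switched off here so the coupling term can be inspected on its own.
r_im = coupled_inputs(m_values, weights, rng, noise_scale=0.0)
```

Subtracting the self-term from the full weighted sum avoids a nested loop while still excluding j = i.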

The connectivity strength between oscillators, represented by $w_j$, can take values from
0.00011 to 0.00025 (0.00011, 0.00012, and so on). External perturbations, represented by
$\varepsilon$, are drawn from a Gaussian distribution and scaled to the interval [−1, 1].
These perturbations are observed over a range of evolution steps to explore the varying
oscillatory cycles they produce. In this paper, a connectivity strength of $w_j = 0.0002$ was
selected from the chaotic domain of the underlying oscillators to assess its effect on the
dynamics while maintaining overall stability.
The network of nonlinear models in this paper consists of 16 oscillators, each of which
can adjust its local dynamics to adapt to external perturbations from its neighboring
oscillators. The simulation of the entire model was carried out using EuNeurone software
(v2.3, 2013), with the Fehlberg Runge-Kutta method as a fixed-step integrator for the ordinary
differential equations (ODEs). The total unweighted dynamics, represented by M and
F in Equations (9) and (10), were measured as the net sum of the individual oscillators,
allowing for observation by a remote observer who would otherwise be unable to detect
the individual oscillators.
$$M = \sum_{i=1}^{n} m_i, \tag{9}$$

$$F = \sum_{i=1}^{n} f_i. \tag{10}$$
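The combination of fixed-step integration and network-level summation can be sketched as follows. This Python example is an illustration only: it uses the classical fourth-order Runge-Kutta scheme in place of the Fehlberg method named in the text, and a stand-in decay equation in place of the enzyme model, so that the summation of Equation (9) can be checked against a known solution. All names and initial values are hypothetical.

```python
import math


def rk4_step(rhs, y, h):
    """One classical fourth-order Runge-Kutta step for an autonomous system y' = rhs(y)."""
    k1 = rhs(y)
    k2 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = rhs([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = rhs([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def toy_rhs(y):
    # Stand-in dynamics (independent exponential decay), NOT the enzyme model.
    return [-yi for yi in y]


n_oscillators = 16
m = [1.0] * n_oscillators      # hypothetical initial values m_i(0)
h, steps = 0.01, 100           # fixed step size; integrates to t = 1
M_series = []
for _ in range(steps):
    m = rk4_step(toy_rhs, m, h)
    M_series.append(sum(m))    # Equation (9): M is the sum of all m_i
```

For this toy system the exact aggregate at t = 1 is 16·e^(−1), which gives a simple check that the fixed-step scheme and the summation behave as intended.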

The CA method described in this paper has previously been applied in research, leveraging
its capability to generate dynamic and scale-free nonlinear data representations,
which in turn facilitate the precise detection of disturbances associated with human gait [14].
These encoded representations were subsequently combined with the SVM algorithm,
improving detection accuracy. This combination outperforms traditional methods that lack
the CA approach in terms of both performance and robustness. Hence, the CA
approach allows for the generation of nonlinear data representations that are well suited
for training conventional classifiers [15].

3. Methodology
The proposed CA method for classifying human gait disorders includes a framework
consisting of several key components, including data collection, data processing, feature
extraction, and the use of the SVM technique. This methodology is illustrated in Figure 1.


[Figure 1 flowchart: Start → Develop a sampling plan → Contact participants and obtain consent → Baseline assessment (lifestyle questionnaire; physical activity test; nutrition assessment; clinical lab tests: blood glucose, insulin resistance, cholesterol test, HDL, LDL, ...) → Clinical screening (Yes/No decision) → CARDIGAN data collection → Feature extraction by DGAS (temporal, spatial, multi-dimensional, and phase plot analysis) → Applying statistics; Criticality Analysis (CA) → SVM classification → Prediction → Performance measure of SVM → End]

Figure 1. The flowchart of the proposed CARDIGAN methodology is presented.

3.1. Data Collection


The study assembled a heterogeneous cohort of 50 adolescent subjects, comprising
individuals diagnosed with obesity, individuals with diabetes, and healthy controls, all
of whom were recruited from Mexico Children’s Hospital. The healthy control group
consisted of 19 males and 1 female, with ages ranging
from 10 to 15 years, weights spanning from 40 to 83 kg, heights from 133 to 172.9 cm, and
BMIs from 21.16 to 34. The participants with obesity included 16 males and 4 females, aged
between 10 and 17 years, with weights ranging from 36 to 106.4 kg, heights spanning from
129 to 179 cm, and BMIs from 17.64 to 35.5. Meanwhile, diabetic participants consisted of
10 females aged 12 to 13 years, with weights ranging from 74.5 to 76.2 kg, heights from 159
to 159.7 cm, and BMIs from 29.5 to 30.4. It is crucial to underscore that due to an inadequate
volume of available diabetic data points, the analysis was restricted to data from 20 healthy
controls and 20 participants diagnosed with obesity, ensuring the robustness of the findings
while acknowledging data limitations. The participants underwent a 6-week intervention
program aimed at improving fitness and reducing weight. Gait analysis was conducted
at baseline, immediately after the intervention, and at 3- and 6-month follow-ups. Gait
analysis involved participants walking back and forth over a 30 m track for 6 min while


wearing an inertial measurement unit sensor on their lower back. The gait data were
anonymised, and approval was obtained before analysis.
Assessment was based on an inertial measurement unit (IMU) sensor
placed on the fourth lumbar vertebra, located at the top left of the anatomical position of
the lumbar spine and regarded as the body Centre of Mass (CoM). The sensor was designed to
be highly flexible, allowing mobility in many different planes, including flexion,
extension, side bending, and rotation. Gait analysis was conducted for participants using a
standardised 6 m test, wherein an IMU was attached to the lower back to capture triaxial
accelerometer and gyroscope data at a frequency of 100 Hz. For individuals with DPN
(Diabetic Peripheral Neuropathy), the assessment took place at OCDEM (Oxford Centre
for Diabetes, Endocrinology, and Metabolism) in a dedicated obstacle-free corridor. The
methodology employed for deriving gait parameters has been comprehensively described
in previous studies. The spatiotemporal parameters obtained from the 6 min walking
test encompassed step time (measured in milliseconds), cadence (expressed in steps per
minute), stride length (in meters), and walking speed (in meters per second). Furthermore,
gait control parameters, which encompass measures of dynamic stability and gait variabil-
ity, were evaluated utilising various instruments such as accelerometers, force plates, or
motion capture technology. These assessments aimed to quantify fluctuations in temporal
aspects (e.g., stride time), spatial aspects (e.g., step length), and comprehensive whole-body
kinematics (e.g., segment angles). The parameters assessed included Beta (expressed in
degrees), SDa (measured in arbitrary units), SDb (also in arbitrary units), ratio
(dimensionless), and walk ratio (in millimeters per step per minute). These parameters have
been identified as indicators of neuromotor control [16,17]. The dynamics of the participants’
walking activity were monitored using the Polar Team tracking system [18].

3.2. Feature Extraction Method for Analysing Gait Data


The CARDIGAN dataset, which was collected using an IMU sensor comprising a 3-dimensional
accelerometer, gyroscope, and magnetometer, was analysed using DataGait Analysis
Software (DGAS) (v11.1, 2019). Developed as a standalone software analysis package by
the Movement Science Group at Oxford Brookes University using LabVIEW 2011 (National
Instruments, Ireland), DGAS employs quaternion rotation matrices and double integration
to transform the z-axis accelerations from the object frame to the global frame,
thereby allowing the measurement of translatory vertical CoM accelerations during
walking and the derivation of the relative change in position. As referenced in [18], upward
CoM measurements determine the global quality of human gait parameters. DGAS extracts
critical features of individuals’ gait for the purpose of classification. In this context, it
becomes feasible to differentiate between biologically distinct masculine and feminine gait
patterns, taking into account not only the spatiotemporal parameters that capture gait
dynamics at specific time points but also the potential impact of their respective body
shapes or dimensions on these distinctions. In gait analysis, a multitude of parameters
are employed to comprehensively understand the complexities of human locomotion.
Temporal parameters encompass fundamental measurements such as Step Time, which
quantifies the duration from the initial contact of one foot to the subsequent contact of the
opposite foot, and Stride Time, which denotes the time interval between successive initial
contacts of the same foot. Cadence adds another layer of insight, representing the number
of steps taken per unit of time. Meanwhile, spatial parameters add a spatial dimension to gait
assessment: Step Length measures the distance between the initial contact of one foot and the
subsequent initial contact of the opposite foot, and Stride Length measures the distance between
successive initial contacts of the same foot. The rate of position change during gait, known as Velocity,
is calculated by the ratio of stride length to stride time. In addition, multi-dimensional
parameters introduce complexity: Duty Factor gauges the percentage of the gait cycle
during which each foot remains on the ground, while the Froude Number serves as a
dimensionless speed parameter reflecting the interplay of centripetal and gravitational
forces during walking. Finally, the Walk Ratio denotes the relationship between cadence


and mean step length, offering insights into the neuromotor control of gait. Within the realm
of Phase Plot Analysis, distinctive parameters emerge: Beta Angle, a measure of stability,
is the angle of the primary gait phase plot axis relative to the vertical axis; SDa and SDb
represent standard deviations describing the distribution of phase plot points, with SDa
reflecting stability and SDb pertaining to rhythm; and Ratio (SDa/SDb) serves as a relative
rhythm stability indicator. The raw data from the accelerometer are processed using DGAS
software, which is founded on the inverted pendulum approach. This conversion results in
17 parameters that are used to assess the physical characteristics of each individual. These
17 gait features serve as inputs for perturbing the criticality analysis model represented
by Equations (4) and (7). Table 1 displays the list of the 17 gait parameters.

Table 1. Extracted Gait Features.

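| Category | Gait Parameter | Measurement Unit |
|---|---|---|
| Temporal | Step Time | ms |
| Temporal | Step Time (Left) | ms |
| Temporal | Step Time (Right) | ms |
| Temporal | Stride Time | ms |
| Temporal | Cadence | steps/min |
| Spatial | Step Length (Left) | m |
| Spatial | Step Length (Right) | m |
| Spatial | Stride Length | m |
| Spatial | Velocity | m/s |
| Multi-dimensional | Duty Factor Double Stance | % |
| Multi-dimensional | Duty Factor Single Stance | % |
| Multi-dimensional | Froude Number | au |
| Multi-dimensional | Walk Ratio | mm/steps/min |
| Phase Plot Analysis | Beta Angle | degrees (°) |
| Phase Plot Analysis | SDa | au |
| Phase Plot Analysis | SDb | au |
| Phase Plot Analysis | Ratio = SDa/SDb | dimensionless |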
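To illustrate how several of the spatiotemporal parameters relate to one another, the following Python sketch derives them from a list of initial-contact times and step lengths. It is not the DGAS algorithm, whose internals are not described here. The function name and inputs are hypothetical, stride values are approximated as twice the mean step values, and the Froude number is computed with the common convention v²/(g·L), with leg length L as an assumed length scale.

```python
def gait_parameters(heel_strikes_s, step_lengths_m, leg_length_m, g=9.81):
    """Derive basic gait parameters.

    heel_strikes_s: times (s) of successive initial contacts, alternating feet.
    step_lengths_m: one length (m) per step, i.e. len(heel_strikes_s) - 1 entries.
    """
    step_times = [b - a for a, b in zip(heel_strikes_s, heel_strikes_s[1:])]
    mean_step_time = sum(step_times) / len(step_times)
    mean_step_length = sum(step_lengths_m) / len(step_lengths_m)
    stride_time = 2.0 * mean_step_time        # assumption: two steps per stride
    stride_length = 2.0 * mean_step_length
    cadence = 60.0 / mean_step_time           # steps per minute
    velocity = stride_length / stride_time    # m/s
    return {
        "step_time_ms": 1000.0 * mean_step_time,
        "stride_time_ms": 1000.0 * stride_time,
        "cadence_steps_per_min": cadence,
        "stride_length_m": stride_length,
        "velocity_m_per_s": velocity,
        "walk_ratio_mm_per_steps_per_min": 1000.0 * mean_step_length / cadence,
        "froude_number": velocity ** 2 / (g * leg_length_m),
    }


# Hypothetical walk: a step every 0.5 s, each 0.6 m long, leg length 0.8 m.
params = gait_parameters([0.0, 0.5, 1.0, 1.5, 2.0], [0.6, 0.6, 0.6, 0.6], leg_length_m=0.8)
```

For this example the cadence is 120 steps/min, the velocity is 1.2 m/s, and the walk ratio is 5 mm per step per minute.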

3.3. Gait Data Representation and Spatiotemporal Analysis Using Criticality Analysis
Criticality Analysis is a method used to represent complex multivariate data patterns
in a simplified form, typically in the form of a phase plot portrait or manifold. This method
involves analysing the data in multiple dimensions and identifying patterns or structures
that are most critical to understanding the underlying dynamics of the system. The
features extracted by DGAS were used as perturbation inputs to the CA model represented
by Equations (4) and (7). This aided in gaining a deeper understanding of
the underlying mechanisms and dynamics of the system under study. The visual
representations of gait regulation and coordination between spatial and temporal domains are
depicted in Figures 2–7 through phase plot orbits. Well-regulated gait was characterised by
smooth and narrow orbits, whereas dysfunctional gait control was evident in irregular and
variable orbits. These phase plots served as a means to distinguish between healthy and
pathological gaits by evaluating the dynamics of spatiotemporal coordination. These phase
plots demonstrated clear differences between the gait patterns of the healthy control and
obesity groups over the 6-week period. Specifically, in the healthy control group, the phase
plots exhibited a consistent pattern characterised by smooth and regular oscillations with
steady amplitudes and frequencies. These findings were indicative of a well-maintained,
rhythmic gait pattern that demonstrated excellent coordination and balance. Notably,
the orbits in this group remained relatively narrow, which underscored the efficiency of
their biomechanics and the minimal occurrence of side-to-side body motion. In contrast,
the obesity group displayed phase plots that deviated from the healthy control group’s
pattern. These plots appeared more irregular and distorted, with variable amplitudes and
wider orbits, suggesting a compromised sense of balance and increased lateral swaying


during gait. Furthermore, these phase plots exhibited more abrupt changes in direction,
indicating the need for sudden adjustments to maintain stability. These observations were
reflective of a slower and more effortful gait, likely resulting from the additional weight
burdening the joints and muscles in individuals with obesity. Moreover, both groups
exhibited a common trend of declining gait consistency over the 6-week observation period,
potentially attributable to the onset of fatigue effects. The healthy control group, despite
its initial robust gait pattern, displayed a gradual decrease in consistency, reflecting the
possibility of accumulating fatigue from repeated gait assessments. The obesity group,
already experiencing challenges in maintaining gait regularity, showed a similar decline in
consistency, accentuating the toll that prolonged observation sessions might take on their
gait patterns. This convergence in declining gait consistency underscores the importance
of considering potential fatigue factors in the interpretation of gait analysis results across
different population groups.
Examining the gait patterns across the 6-week period, in Figure 2 (Week 1), the healthy
group displayed a tight circular cluster, indicating consistent gait cycles. In contrast,
the obesity group showed more elongated, scattered orbits, reflecting a higher degree of
variability in gait. As we progressed to data depicted in Figure 3 (Week 2), the healthy group
continued to maintain a tight cluster, while the obesity group’s orbits, though still dispersed,
appeared somewhat more rounded, suggesting some improvement in gait coordination
compared to that of Week 1. Figure 4 (Week 3) portrayed the healthy group with a very
tight cluster, indicative of highly consistent gait, while the obesity group exhibited more
elongated orbits with flatter tops, indicating instability in their gait pattern. In Figure 5
(Week 4), the healthy control cluster became somewhat looser, possibly due to accumulating
fatigue. The obesity group’s orbits remained uneven but showed a slightly improved level
of coordination compared to Week 3. Figure 6 (Week 5) demonstrates the healthy control
group’s cluster becoming more dispersed, reflecting increasing gait variability. On the other
hand, the obesity group’s plots were highly scattered with jagged trajectories, suggesting a
worsening of gait control. Finally, in Figure 7 (Week 6), both groups display more dispersed
orbits than in previous weeks, indicating the potential impact of fatigue on gait consistency
in both groups. The obesity group’s orbits appeared slightly more rounded than in Week 5,
hinting at some recovery in coordination, although the overall trend indicated challenges
in maintaining consistent gait patterns.

Figure 2. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w1 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.

In this paper, we utilised a kernel SVM classifier to distinguish between the obesity
and healthy groups based on phase plot data. The choice of kernel SVM is particularly
well suited for this analysis due to the inherently nonlinear and complex nature of the data.
Phase plot data, representing dynamic patterns of physiological processes, often exhibit
intricate and nonlinear relationships. Traditional linear classifiers may struggle to capture
the patterns present in such data. However, the kernel SVM is designed to address this
challenge by mapping the data into a higher-dimensional space, where complex patterns


become more separable. It leverages a diverse set of kernel functions, including radial
basis function (RBF), to effectively transform the data into a format where it can distinguish
between the obese and healthy groups. This approach enables the identification of hidden
patterns, making it an ideal choice for this study, and ensures that the classification approach
is capable of handling the inherent nonlinearity in the phase plot data, facilitating the
reliable and accurate differentiation of the two groups. Figures 16–18 demonstrate the
decision boundary generated by kernel SVM, highlighting its effectiveness in distinguishing
between obese and healthy control groups using phase plot data.
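The mapping performed by the RBF kernel can be illustrated with a few lines of Python. The sketch below computes the Gaussian kernel between pairs of two-dimensional points, which is the similarity measure a kernel SVM uses in place of an explicit higher-dimensional mapping. The parametrisation exp(−‖x − y‖²/(2σ²)) is one common convention and is assumed here; the sample points are hypothetical and are not data from the study.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian (RBF) kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """Pairwise kernel values for a list of points."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical 2-D phase plot samples (f, m).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
K = gram_matrix(points, sigma=1.0)
```

Nearby points receive kernel values close to 1 and distant points values close to 0, so the classifier effectively weighs each training sample by its proximity to the point being classified.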

Figure 3. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w2 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.

Figure 4. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w3 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.

Figure 5. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w4 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.


Figure 6. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w5 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.

Figure 7. Comparison of phase space plots of walk patterns for healthy control and obesity groups in
the clinical gait experiment conducted in w6 is presented. Healthy control walk patterns are shown
on the left while obesity walk patterns are shown on the right.

Figures 8 and 9 serve as invaluable tools for assessing the dynamic nature of gait
progression throughout the 6-week study period. These graphical representations offered a
comprehensive view of the data by plotting the peak values extracted from each phase plot
orbit as discrete data points for every week. Consequently, these visualisations effectively
generated trajectories that unveiled nuanced alterations in gait patterns over time. Fun-
damentally, each data point within these trajectories captured the maximum step length
achieved during a specific gait cycle, encapsulating the essence of gait performance. By
plotting these peak values across the 6-week observation window, an intuitive visual per-
spective emerged on how maximal step length evolved across multiple visits. Furthermore,
these peak values functioned as numerical metrics that concisely represented the range of
variability, which is an informative measure quantifying the degree of variation in maximal
step length from one week to the next. To provide a rigorous statistical summary of this
variation over time, standard deviation (SD) of the peak values for each participant was cal-
culated. This SD became a pivotal indicator, with a higher value signifying a greater degree
of inconsistency in the maximal step length achieved across different weeks. Consequently,
comparing SD values before and after the intervention yielded a quantitative assessment of
whether gait improved (resulting in a lower SD) or deteriorated (resulting in a higher SD).
This analytical approach enabled the precise quantification of the impact of the intervention
on gait stability and consistency, offering valuable insights into the effectiveness of the
intervention. In Figure 8, which pertains to healthy controls, the majority of trajectories
exhibited minimal fluctuation, remaining relatively level throughout the study duration.
This observation signified consistent gait patterns from week to week, characterised by
limited variation in peak values. Conversely, in Figure 9, representing the obesity group, the
trajectories displayed greater irregularity, featuring discernible peaks and troughs across


the weeks. This pattern indicated increased instability in gait parameters, with significant
variations in the peak values across the different visits. As an illustrative example, partici-
pant P14 was considered. In Figure 8, P14’s trajectory remained consistently around 0.55,
demonstrating a steady gait with little variation over time. However, in Figure 9, P14’s
trajectory exhibited a drop from approximately 0.7 to 0.4 by Week 3 before subsequently
rebounding. This trajectory pattern suggested that P14’s gait became more irregular during
the course of the study but later exhibited improvement. Complementing these trajectories,
the standard deviation bars visually represented the extent of variability across the 6 weeks.
In Figure 8, P14’s standard deviation bars were notably small, confirming minimal fluctua-
tion and consistent gait during the pre-intervention period. In contrast, Figure 9 portrayed
larger standard deviation bars, indicating increased inconsistency in gait patterns when
P14 was in an obese state. Overall, these quantitative comparisons between Figures 8 and 9
provided valuable insights into the differences in gait stability and variability between
individuals with obesity and healthy controls. These trajectories, coupled with the standard
deviation bars, facilitated the rigorous tracking and assessment of gait changes for each
participant throughout the 6-week study period.
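The variability measure described above can be sketched in a few lines of Python. The weekly peak values below are hypothetical and serve only to show the computation; the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import stdev

# Hypothetical weekly peak values for two participants (six visits each).
weekly_peaks = {
    "steady": [0.55, 0.56, 0.55, 0.54, 0.55, 0.56],
    "irregular": [0.70, 0.55, 0.40, 0.50, 0.60, 0.65],
}

# Standard deviation of the peak values across weeks, per participant.
variability = {pid: stdev(peaks) for pid, peaks in weekly_peaks.items()}
most_variable = max(variability, key=variability.get)
```

A flat trajectory yields a small SD, whereas a trajectory with pronounced peaks and troughs yields a larger one, matching the interpretation given for the standard deviation bars.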

Figure 8. The advancement of normal walking patterns for each person over a 6-week period
is shown.

Figure 9. Tracking the improvement in gait for managing obesity for each person over a 6-week
period is shown.

4. Experiment Results
In this section, we present the findings of our experimental investigation, which
encompasses the performance of the SVM classifier in identifying gait patterns for both


healthy control and obese groups. Additionally, we examine the impact of various Kernel
SVM model parameters on classification performance. A comprehensive analysis of the
generalisation performance of the SVM classifier is also presented, including the Receiver
Operating Characteristic (ROC) curve, the area under the ROC curve, and the SVM decision
classification boundary. The results demonstrate the potential of using SVM in combination
with a controlled CA model for accurate detection of gait patterns associated with healthy
controls and individuals with obesity.

4.1. Receiver Operating Characteristic (ROC) Curve


The Receiver Operating Characteristic (ROC) curve is a graphical representation of
the performance of a binary classifier system as the discrimination threshold is varied [19].
In the context of SVM, the ROC curve is used to evaluate the performance of the SVM
classifier in classifying data samples into two different classes. The ROC curve plots the
true positive rate (TPR, sensitivity) against the false positive rate (FPR = 1 − TNR, i.e., 1 − specificity) at
various threshold settings. Figures 10–12 illustrate the ROC curves for the best pair of σ
and C values that satisfy the highest accuracy during the entire trial period.
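As a minimal sketch of how an ROC curve is constructed, the following Python code sweeps a threshold over classifier scores and records the resulting (FPR, TPR) pairs. The labels and scores are hypothetical; in practice the scores would be the SVM decision values for each sample.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over the scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


labels = [1, 1, 0, 1, 0, 0]               # 1 = positive class, 0 = negative class
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]   # hypothetical decision values
curve = roc_points(labels, scores)
```

Lowering the threshold can only add predicted positives, so both rates are non-decreasing along the curve, which runs from (0, 0) to (1, 1).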

Figure 10. The relationship between True Positive Rate (Sensitivity) and False Positive Rate
(1-Specificity) at various threshold levels, as determined by the kernel function of the SVM, is
displayed through the ROC curves of w1 and w2 .

Figure 11. The relationship between True Positive Rate (Sensitivity) and False Positive Rate
(1-Specificity) at various threshold levels, as determined by the kernel function of the SVM, is
displayed through the ROC curves of w3 and w4 .


Figure 12. The relationship between True Positive Rate (Sensitivity) and False Positive Rate
(1-Specificity) at various threshold levels, as determined by the kernel function of the SVM, is
displayed through the ROC curves of w5 and w6 .

The ROC plots (Figures 10–12) show that, in the context of SVM, parameter C controls
the trade-off between maximising the margin and minimising the misclassification error.
When the value of C is smaller, such as C = 0.1, the margin becomes wider, but there are
more instances of misclassifications. Conversely, a larger value of C, such as C = 10, leads
to a narrower margin, but with a reduced number of misclassifications. Parameter σ is used
to control the width of the Kernel Gaussian function that is used to map the input data
into a higher-dimensional space, where a linear boundary can be found. A larger value of
σ results in a wider Gaussian function, which leads to a softer decision boundary and a
higher bias, while a smaller value of σ results in a narrower Gaussian function, which leads
to a harder decision boundary and a higher variance.
When σ is small, the decision boundary is more sensitive to input data, which can lead
to overfitting. On the other hand, when σ is large, the decision boundary is less sensitive
to input data, which can lead to underfitting. Therefore, the value of σ has an impact on
generalisation performance of the SVM.
A good value for C and σ is the one that balances the trade-off of bias and variance,
that is, a good balance between overfitting and underfitting.
The ROC curves shown in Figures 10–12 indicate that the classifier performs well with
σ = 0.1 and σ = 1 across the tested values of C, likely because these settings strike a
good balance between overfitting and underfitting.

4.2. The Area under the Curve (AUC)


The area under the ROC curve is a measure of the performance of a binary classi-
fier [20]. In the context of SVM, the AUC represents the ability of the classifier to distinguish
between positive and negative classes. A higher AUC value indicates that the classifier is
able to correctly classify more instances of the positive class as positive, while also correctly
classifying more instances of the negative class as negative. An AUC of 1.0 represents a
perfect classifier, while an AUC of 0.5 represents a classifier that performs no better than
random guessing.
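The AUC has an equivalent rank-based interpretation: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below computes the AUC in this way on hypothetical labels and scores; it is an illustration, not the evaluation code used in the study.

```python
def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])   # classes fully separated
chance = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])    # scores carry no information
mixed = auc([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.1])
```

The three cases reproduce the reference points named in the text: 1.0 for perfect separation, 0.5 for uninformative scores, and an intermediate value (8/9 here) when one negative sample outranks one positive sample.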
Figures 13–15 show how the performance of the SVM model changes as the regulari-
sation parameter strength C is varied. Regularisation parameter C controls the trade-off
between maximising the margin (the distance between the decision boundary and the clos-
est training instances) and minimising the classification error. When C is small, the model
focuses more on maximising the margin, which can lead to a simpler decision boundary but
also a higher classification error. As C is increased, the model focuses more on minimising
the classification error, which can lead to a more complex decision boundary but also lower
classification error. In Figures 13–15, an AUC that increases with C means that
the model’s performance improves as the penalty on misclassification grows, that is, as
regularisation weakens. This suggests that the model underfits the data when C is small and
that a larger C helps. On the other hand, an AUC that decreases as C increases means that
performance worsens as regularisation weakens. This suggests that the model overfits the data
when C is large, with the more complex decision boundary fitting noise rather than structure
in the data.
The optimal value of C is where the AUC is the highest; this is the sweet spot where
the model is able to balance the trade-off between maximising the margin and minimising
the classification error in a way that leads to the best classification performance.
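
The selection rule described above can be sketched as a lookup over AUC values measured for each candidate C. The AUC numbers here are hypothetical placeholders, and breaking ties in favour of the smallest C (the simpler model) is an assumption rather than something stated in the text.

```python
def best_c(auc_by_c):
    """Return the C with the highest AUC; ties go to the smallest C (assumed rule)."""
    return min(auc_by_c, key=lambda c: (-auc_by_c[c], c))


# Hypothetical AUC values for illustration only; they are not results from the study.
example_auc = {0.1: 0.84, 1.0: 0.91, 10.0: 0.91}
print(best_c(example_auc))  # 1.0
```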

Figure 13. The relationship between the AROC and regularisation parameter C for w1 and w2
is presented.

Figure 14. The relationship between the AROC and regularisation parameter C for w3 and w4
is presented.

Figure 15. The relationship between the AROC and regularisation parameter C for w5 and w6
is presented.

Appl. Sci. 2023, 13, 13225

4.3. The Classification Decision Boundary of SVM


The decision boundary of an SVM classifier is determined by the support vectors,
which are the data points closest to the boundary. Parameters C and σ, the regularisation
and kernel parameters, respectively, control the width of the margin and the shape of the
decision boundary. For instance, when σ is set to 0.1 and C is set to 1, the decision
boundary becomes complex and is more influenced by individual data points. The margin
becomes relatively narrow and the classifier more sensitive to outliers, as the algorithm
tries to minimise misclassification errors. When σ is set to 0.1 and C is set to 10, the
decision boundary is even more complex, because misclassified training points are penalised
more heavily. The margin is narrower still and the classifier is even more sensitive to
outliers. Conversely, when σ is set to 0.1 and C is set to 0.1, the decision boundary is
relatively simple, because misclassifications carry a much smaller penalty. The margin is
relatively wide and the classifier is less sensitive to outliers.
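
To illustrate why a small σ makes the boundary depend on individual data points, the sketch below evaluates a Gaussian (RBF) kernel for two points a fixed distance apart. The kernel form K(x, y) = exp(−‖x − y‖² / (2σ²)) is assumed here as the common definition; this section does not state the exact kernel used.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)); the form is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Two points 0.5 apart: with sigma = 0.1 they barely influence each other,
# whereas with sigma = 1.0 they are treated as close neighbours.
print(rbf_kernel((0.0, 0.0), (0.5, 0.0), sigma=0.1))
print(rbf_kernel((0.0, 0.0), (0.5, 0.0), sigma=1.0))
```

With σ = 0.1 the similarity is close to zero, so each training point shapes the boundary only in its immediate neighbourhood, which is consistent with the complex boundaries described above.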
The classification boundaries of the SVM model are depicted in Figures 16–18 using
the best classification parameters, enabling the model to accurately categorise participants
into the appropriate group.

[Figure 16 legend: Healthy Control Group; Obesity Group; Misclassified from Healthy Control
Group; Misclassified from Obesity Group.]

Figure 16. The boundary that separates the healthy control walk patterns from the obesity patterns in
an SVM model, with RCC control parameters f_p = 1, f_g = 1, x_p = −1, x_g = −1, μ_f = 2, and σ = 0.1
and C = 0.1 for w1 and σ = 0.1 and C = 1 for w2, is shown.

[Figure 17 legend: Healthy Control Group; Obesity Group; Misclassified from Healthy Control
Group; Misclassified from Obesity Group.]

Figure 17. The boundary that separates the healthy control walk patterns from the obesity patterns in
an SVM model, with RCC control parameters f_p = 1, f_g = 1, x_p = −1, x_g = −1, μ_f = 2, and σ = 0.1
and C = 1 for w3 and w4, is shown.

The overall performance of the proposed SVM model is evaluated in Figure 19, where
the best classification parameters (σ = 0.1 and C = 0.1) result in the optimal generalisation
performance. The 6-week evaluation of the SVM shows fluctuating accuracy in classifying
participants into the healthy control and obesity groups. A high accuracy reflects consistent
participant characteristics, facilitating accurate classification by the SVM, whereas a low
accuracy indicates high variability in participant characteristics, making classification
challenging. In individuals with obesity, various factors, including walk speed, influence
the results depicted in Figure 19. The figure shows that accuracy is highest in the first
week and declines significantly in the third week. There is a slight improvement in the
fourth week, followed by further declines in the fifth and sixth weeks. Although the
participants followed a uniform diet and exercise regimen, the accuracy of the SVM model
fluctuated over the 6-week period. One possible reason is variation in compliance: some
participants may have adhered to the regimen more diligently than others, leading to
different classifications into the healthy control or obesity groups. Other factors include
individual differences such as genetics, medical history, and personal habits; measurement
inaccuracies; and changes in the metabolic, neuromuscular, or cardiovascular systems that
alter participant characteristics over time, even when the prescribed diet and exercise
regimen is followed. Participants’ stress levels or health status could also alter variables
affecting their gait and hence their classification into the healthy control or obesity groups.

[Figure 18 legend: Healthy Control Group; Obesity Group; Misclassified from Healthy Control
Group; Misclassified from Obesity Group.]

Figure 18. The boundary that separates the healthy control walk patterns from the obesity patterns in
an SVM model, with RCC control parameters f_p = 1, f_g = 1, x_p = −1, x_g = −1, μ_f = 2, and σ = 0.1
and C = 10 for w5 and σ = 0.1 and C = 1 for w6, is shown.

Figure 19. The classification performance of SVM over a 6-week (w1–w6) period is presented.

5. Discussion
The primary aim of this study was to assess the effectiveness of obesity treatment
interventions through the application of a unique CA approach. At the same time, the
investigation examined the dynamics of gait patterns within the context of obesity, with
the goal of quantifying gait variability and assessing the robustness of our CA-SVM
methodology for gait classification in obesity treatment assessment. Throughout the 6-week
study period, SVM classification accuracy fluctuated significantly, with the highest
accuracy recorded in the initial week, followed by a substantial decline in Week 3, a slight
improvement in Week 4, and further declines in Weeks 5 and 6. These fluctuations highlight
the dynamic nature of the SVM model’s ability to categorise gait patterns over time.
Notably, these variations are not arbitrary but reflect shifts in participant
characteristics that significantly impact classification outcomes.
All participants adhered to a uniform diet and exercise regimen during the study,
making it evident that individual factors beyond the protocol influenced gait patterns
and SVM classifications each week. Factors such as compliance, genetics, medical history,
stress levels, or changes in metabolic/neuromuscular systems likely played a role. The
SVM accuracy metric robustly quantifies the influence of these individual characteristics
on classification performance. ROC curves and AUC values provide insights into the
model’s proficiency in distinguishing between healthy and obese gait patterns. Optimal
model performance occurred under specific parameter settings, σ = 0.1 and C = 1 or 10,
where AUC values were maximised. These results confirm the exceptional generalisation
capabilities of the SVM model when applied to unseen data and underscore the effectiveness
of our CA-SVM methodology in extracting relevant features from gait data.
Visual representations, in the form of phase plots, vividly illustrate the distinctions
between healthy and obese gait patterns achieved through the CA method. The phase plots
reveal that obese gait patterns are characterised by slower and more laboured movements,
while healthy gait patterns are smoother and more fluid. These differences align with
expectations, given the increased joint stress and stiffness associated with obesity. Phase plots
affirm that CA effectively distinguishes between the two groups by revealing interpretable
spatio-temporal gait characteristics.
The spatio-temporal analysis quantifying variability in gait progression over a
6-week period offers valuable insights. Notably, wider variability is observed among
obese participants compared to their healthy counterparts. This underscores CA’s capacity
to elucidate subtle yet evolving patterns through phase plot analysis and its competence in
quantifying gait variability. Lastly, classification accuracy results, ranging from 78.2% to
90%, strongly validate the efficacy of the CA method in dimensionality reduction and data
representation, enhancing classification performance. CA successfully transforms complex
gait patterns into lower-dimensional trajectories discernible by the SVM algorithm, leading
to high accuracy. This demonstrates CA’s proficiency in extracting essential features and
capturing nonlinear relationships, enabling precise classification and establishing it as an
effective data representation strategy.

6. Conclusions
The Criticality Analysis method for nonlinear data representation can be effectively
used to represent gait data and highlight medical conditions. The variability and detection
of changes over time highlight the ability of the method to determine changes in gait
in response to clinical intervention. The potential to assess clinical disorders using only
gait is an exciting development especially for long-term or complex disorders associated
with metabolic stress. The combination of nonlinear data representation with supervised
machine learning methods can significantly improve the assessment of a patient’s status
and improve the likelihood of positive outcomes by enabling objective assessment
during treatment.


Author Contributions: Conceptualisation, S.E.; methodology, S.E. and T.V.o.S.; validation, S.E.;
formal analysis, S.E.; investigation, S.E.; data curation, H.D., J.C., A.A., C.M., M.M.-B., S.L., S.V.-C.,
M.J.A. and D.F.; writing—original draft preparation, S.E.; writing—review and editing, S.E. and
T.V.o.S.; visualisation, S.E.; supervision, T.V.o.S. All authors have read and agreed to the published
version of the manuscript.
Funding: This work was supported by a Newton Fund Institutional Links grant (grant ID: 432368181)
under the Newton–Mosharafa Fund partnership between the United Kingdom and Mexico. The
grant was funded by the UK Department for Business, Energy and Industrial Strategy (BEIS) and
delivered by the British Council. The funding supported collaboration activities between Oxford
Brookes University and Hospital Infantil de Mexico Federico Gomez under the Newton Institutional
Links program administered by the British Council.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: In accordance with the General Data Protection Regulation (GDPR)
guidelines, the database utilised in this study is maintained in a confidential and secure manner
within the purview of the Faculty of Health and Life Sciences at Oxford Brookes University. Owing
to privacy considerations, access to the dataset is restricted to authorised personnel only.
Acknowledgments: The authors gratefully acknowledge the support provided by the British Council
under the Newton Fund Institutional Links program for making this collaboration possible. We also
acknowledge our institutional partners Oxford Brookes University and Hospital Infantil de Mexico
Federico Gomez for their support and contribution.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Kuo, A.D.; Donelan, J.M. Dynamic principles of gait and their clinical implications. Phys Ther. 2010, 90, 157–174. [CrossRef]
[PubMed]
2. Clark, J.E.; Phillips, S.J. A longitudinal study of intralimb coordination in the first year of independent walking: A dynamical
systems analysis. Child Dev. 1993, 64, 1143–1157. [CrossRef] [PubMed]
3. Dingwell, J.B.; Cusumano, J.P. Nonlinear time series analysis of normal and pathological human walking. Chaos 2000, 10, 848–863.
[CrossRef] [PubMed]
4. Glazier, D.S. Metabolic Scaling in Complex Living Systems. Systems 2014, 2, 451–540. [CrossRef]
5. Hausdorff, J.M.; Mitchell, S.L.; Firtion, R.; Peng, C.K.; Cudkowicz, M.E.; Wei, J.Y.; Goldberger, A.L. Altered fractal dynamics of
gait: Reduced stride-interval correlations with aging and Huntington’s disease. J. Appl. Physiol. 1997, 82, 262–269. [CrossRef]
[PubMed]
6. Alam, U.; Riley, D.R.; Jugdey, R.S.; Azmi, S.; Rajbhandari, S.; D’Août, K.; Malik, R.A. Diabetic Neuropathy and Gait: A Review.
Diabetes Ther. 2017, 8, 1253–1264. [CrossRef] [PubMed]
7. Toro, B.; Nester, C.; Farren, P. A review of observational gait assessment in clinical practice. Physiother. Theory Pract. 2003, 19,
137–149. [CrossRef]
8. Pirker, W.; Katzenschlager, R. Gait disorders in adults and the elderly: A clinical guide. Wien. Klin. Wochenschr. 2017, 129, 81–95.
[CrossRef]
9. Sipari, D.; Chaparro-Rico, B.D.M.; Cafolla, D. SANE (Easy Gait Analysis System): Towards an AI-Assisted Automatic Gait-
Analysis. Int. J. Environ. Res. Public Health 2022, 19, 10032. [CrossRef]
10. Guo, Q.; Jiang, D. Method for Walking Gait Identification in a Lower Extremity Exoskeleton Based on C4.5 Decision Tree
Algorithm. Int. J. Adv. Robot. Syst. 2015, 12, 30.
11. Harris, E.J.; Khoo, I.-H.; Demircan, E. A Survey of Human Gait-Based Artificial Intelligence Applications. Front. Robot. AI 2022, 8,
749274. [CrossRef] [PubMed]
12. McGrath, M.; Howard, D.; Baker, R. The strengths and weaknesses of inverted pendulum models of human walking. Gait Posture
2015, 41, 389–394. [CrossRef] [PubMed]
13. Berry, H. Chaos in a Bienzymatic Cyclic Model with Two Autocatalytic Loops. Chaos Solitons Fractals 2003, 18, 1001–1014.
[CrossRef]
14. Eltanani, S.; olde Scheper, T.V.; Dawes, H. A Novel Criticality Analysis Technique for Detecting Dynamic Disturbances in Human
Gait. Computers 2022, 11, 120. [CrossRef]
15. olde Scheper, T.V. Criticality Analysis: Bio-inspired Nonlinear Data Representation. arXiv 2023, arXiv:2305.14361.
16. Rota, V.; Perucca, L.; Simone, A.; Tesio, L. Walk ratio (step length/cadence) as a summary index of neuromotor control of gait:
Application to multiple sclerosis. Int. J. Rehabil. Res. 2011, 34, 265–269. [CrossRef] [PubMed]
17. Mobbs, R.J.; Perring, J.; Raj, S.M.; Maharaj, M.; Yoong, N.K.M.; Sy, L.W.; Fonseka, R.D.; Natarajan, P.; Choy, W.J. Gait metrics
analysis utilizing single-point inertial measurement units: A systematic review. Mhealth 2022, 8, 9. [CrossRef] [PubMed]
18. Esser, P.; Dawes, H.; Collett, J.; Howells, K. Insights into Gait Disorders: Walking Variability Using Phase Plot Analysis,
Parkinson’s Disease. Gait Posture 2013, 38, 648–652. [CrossRef]
19. Nahm, F.S. Receiver operating characteristic curve: Overview and practical use for clinicians. Korean J. Anesthesiol. 2022, 75, 25–36.
[CrossRef]
20. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997,
30, 1145–1159. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Applied Sciences

Article
A Modified and Effective Blockchain Model for E-Healthcare Systems

Basem Assiri

Computer Science Department, Jazan University, Jazan 82917, Saudi Arabia; [email protected] or [email protected]

Abstract: The development of e-healthcare systems requires the application of advanced technologies,
such as blockchain technology. The main challenge of applying blockchain technology to e-healthcare
is to handle the impact of the delay that results from blockchain procedures during the communication
and voting phases. The impacts of latency in blockchains negatively influence systems’ efficiency,
performance, real-time processing, and quality of service. Therefore, this work proposes a modified
model of a blockchain that allows delays to be avoided in critical situations in healthcare. Firstly, this
work analyzes the specifications of healthcare data and processes to study and classify healthcare
transactions according to their nature and sensitivity. Secondly, it introduces the concept of a
fair-proof-of-stake consensus protocol for block creation and correctness procedures in place of
well-known ones such as proof-of-work or proof-of-stake. Thirdly, the work presents a simplified procedure for
block verification, where it classifies transactions into three categories according to the time period
limit and trustworthiness level. Consequently, there are three kinds of blocks, since every category
is stored in a specific kind of block. The ideas of time period limits and trustworthiness fit with
critical healthcare situations and the authority levels in healthcare systems. Therefore, we reduce the
validation process of the trusted blocks and transactions. All proposed modifications help to reduce
computational costs, speed up processing times, and enhance security and privacy. The experimental
results show that the total execution time using a modified blockchain is reduced by about 49%
compared to traditional blockchain models. Additionally, the number of messages using modified
blockchain is reduced by about 53% compared to the traditional blockchain model.

Keywords: parallelism; distributed systems; modified blockchain technology; personal health
records; e-healthcare

Citation: Assiri, B. A Modified and Effective Blockchain Model for E-Healthcare Systems.
Appl. Sci. 2023, 13, 12630. https://doi.org/10.3390/app132312630
Academic Editor: Gianluca Lax
Received: 25 October 2023
Revised: 21 November 2023
Accepted: 22 November 2023
Published: 23 November 2023
Copyright: © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open
access article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Technological development plays vital roles in areas of life such as healthcare, education,
tourism, national security, and others. In the field of healthcare, healthcare agencies
compete through applying advanced technologies to improve their throughput, management,
control, and services with reduced costs. One important step in this direction is
to use electronic personal health records (EPHRs), which support the use of other
technologies [1]. The use of EPHRs involves cloud storage, which allows for more control,
availability, and accessibility [2,3]. However, keeping EPHRs in cloud storage results in a
centralized parallel and distributed system, which has a single point of failure.
Blockchain technology is one kind of distributed system that runs in a decentralized
manner [4], in which multiple transactions are processed and grouped into one block. The
blocks are listed in one ledger. Copies of the ledger are distributed among all nodes, such
that every node has an updated copy of the ledger. The nodes are devices that belong to
the blockchain network, and they are authorized to store and validate transactions and
blocks. Transactions are executed by users, but users cannot confirm (commit)
those transactions. The blockchain nodes (miners) process transactions to
confirm them. During this process, the miners compete in transaction processing to create
a block of transactions using some consensus mechanism, such as proof-of-work (PoW) or
proof-of-stake (PoS) [5]. Then, the miner who succeeds in creating a new block proposes
that block to other miners in order to verify it. If this block is verified, then it is added to
the ledger; otherwise, it is ignored [6,7]. The blockchain processes are executed as follows:
• Proposal: A miner verifies transactions and proposes a new block to other miners
(they will act as validators);
• Verification: The validators validate the proposed block and send their votes to the
others to either confirm or decline (commit or abort) the proposed block;
• Consensus: After receiving the votes of all validators, every validator checks the votes
of the majority. Accordingly, the block is committed and either added to the ledger
or not.
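
The three steps above can be sketched as a single voting round. This is a minimal illustration under stated assumptions: validators are modelled as plain functions that return True (commit) or False (abort), a strict majority is required, and networking, signatures, and the proposer election are omitted. The names `consensus_round`, `ledger`, and `validators` are hypothetical.

```python
def consensus_round(ledger, block, validators):
    """Run one proposal/verification/consensus round for a proposed block.

    Each validator votes on the block; the block is appended to the ledger
    only when strictly more than half of the votes are to commit.
    Returns (committed, votes).
    """
    votes = [bool(validate(block)) for validate in validators]
    committed = sum(votes) * 2 > len(votes)
    if committed:
        ledger.append(block)
    return committed, votes


# Usage with three toy validators, one of which rejects every block.
ledger = []
accept = lambda block: True
reject = lambda block: False
print(consensus_round(ledger, "block-1", [accept, accept, reject]))
print(ledger)
```

A tie does not commit the block in this sketch; a real deployment would need to define that case, as well as what happens when votes are missing.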
On the other hand, healthcare utilizes various data sources, such as healthcare
professionals and the Internet of Things (IoT). Firstly, healthcare professionals have different
levels of authority and trustworthiness, and this is also connected with the criticality of
healthcare situations. For example, in emergency cases, doctors and nurses should access
EPHRs to read or update them directly without any delay; for these purposes, blockchain
procedures such as permission or voting cannot be applied. Secondly, IoT devices, including
sensors, smartphones, and wearable mobile devices, provide real-time data, as they
sense and report data directly [8]. These devices are able to facilitate or perform some
actions [9–11]. Wearable mobile devices can be embedded in clothes or accessories
such as watches, bracelets, glasses, jewelry, etc. [12]. There are also other, more complicated
kinds of wearable devices that can be embedded into the human body. This improves
healthcare follow-up and services, and helps in tracking vital signs and
monitoring patients’ situations [13]. However, such real-time technology is challenged by
delays caused by blockchain procedures [14]. Moreover, these devices usually have limited
storage, processing, and energy capabilities, which would also be challenged by blockchain
procedures such as mining, validating, voting, and storing processes [15].
Applying blockchain technology to e-healthcare has many advantages, such as
decentralization, security, privacy, anonymity, transparency, reliability, and fault tolerance.
However, the main challenge is to handle the impact of the latency that results from
blockchain procedures during the communication and voting phases. The impacts of
latency in blockchains negatively influence systems’ efficiency, performance, real-time
processing, and quality of services. To the best of our knowledge, this is the first work
that modifies the blockchain model to cope with e-healthcare authority levels and
real-time specifications.
This work proposes a modified model of a blockchain that is used to store, process, and
manage data in the field of e-healthcare. Firstly, this work studies and classifies healthcare
data according to their nature and sensitivity. It investigates and analyzes the roles and
authorities that are interwoven with data access and processing. Secondly, understanding
the nature of transactions is an important step at the beginning of this work. Unfortunately,
many works apply blockchain technology without studying the implications of using
transactions, which shows a lack of understanding of transaction specifications.
Therefore, the proposed model modifies the shape of data within the blocks, since the
regular form of transaction is not required for all data, operations, and processes. Thirdly,
it also introduces a modified form of PoS that incorporates the fairness of the proof-of-queue
protocol (PoQ). The proposed protocol is called fair-proof-of-stake (FPoS). In PoS,
the chance of creating and validating blocks is given according to the amount of stake the
miner puts in, which causes difficulty for new miners. The PoQ, on the other hand, queues
miners and gives a fair chance to every miner. The proposed FPoS uses the block creation
and correctness procedure of PoS for a specific number of cycles; then, it gives chances to
the miners waiting in the queue (but they cannot compete in the PoS manner). Fourthly, it
presents a modified procedure for block verification that relaxes the verification for some
trusted transactions and blocks. All proposed modifications help to reduce
costs, support processing speed-up, and enhance security and privacy. The experimental
results show that the total execution time using the modified blockchain was reduced
by about 49% compared to the traditional blockchain model. Additionally, the number of
messages using the modified blockchain was reduced by about 53% compared to the traditional
blockchain model.
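
The alternation between PoS cycles and queue turns described for FPoS can be sketched as follows. This is only one possible reading of the description above: the number of PoS cycles (`pos_cycles`), the stake-weighted random draw, and the rule that a served miner rejoins the back of the queue are all assumptions made for illustration, not details given in this section.

```python
import random
from collections import deque


def fpos_proposers(stakes, queue, pos_cycles, rounds, rng):
    """Return the proposer chosen in each round of a simplified FPoS schedule.

    For `pos_cycles` consecutive rounds the proposer is drawn among staking
    miners with probability proportional to stake (PoS manner); the next
    round is given to the miner at the head of the waiting queue.
    """
    waiting = deque(queue)
    names = list(stakes)
    weights = [stakes[name] for name in names]
    chosen = []
    for r in range(rounds):
        if waiting and (r + 1) % (pos_cycles + 1) == 0:
            miner = waiting.popleft()
            waiting.append(miner)  # assumed: a served miner rejoins the back of the queue
        else:
            miner = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(miner)
    return chosen


# Hypothetical miners: A and B hold stakes, Q1 and Q2 wait in the queue.
schedule = fpos_proposers({"A": 10, "B": 1}, ["Q1", "Q2"], pos_cycles=2,
                          rounds=6, rng=random.Random(0))
print(schedule)
```

In this sketch every third round goes to a queued miner regardless of stake, which is the fairness property the text attributes to PoQ.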
The rest of this paper is organized as follows: Related work is described in Section 2.
Section 3 shows the analysis of healthcare specifications. Section 4 introduces the modified
blockchain model that suits e-healthcare specifications. Section 5 illustrates the numerical
analysis and the experimental results. Section 6 discusses the advantages of the proposed
model, while Section 7 concludes the paper.

2. Related Work
Blockchain technology was first used in the Bitcoin cryptocurrency [16], and then in
other cryptocurrencies such as Ripple, Litecoin, Ethereum, and Zcash [17]. It was
introduced to avoid the centralization and control of third parties [18]. Since then, researchers
have applied blockchain technology in many other fields, such as healthcare, education,
judiciary, etc. [19,20].
Blockchain-distributed architecture is supported by consensus protocols to ensure the
correctness of the processes. Different consensus protocols are used, such as PoW, in which
miners solve mathematical puzzles that meet given specifications. The results of
such puzzles are used to hash the proposed block, and the other miners use them to
verify the correctness of the block [5]. Another consensus protocol is PoS, in which miners
put up their own coins as guarantees; according to their stake, they obtain the chance to
propose or validate blocks, which also gives them a chance to win more coins [5]. Proof-of-space
allows users to offer their own hard disks and hardware to process and secure the
blockchain; the more space a miner provides, the higher its chance. Another protocol
is the practical Byzantine fault tolerance (PBFT) consensus protocol, where the number of
faulty votes should not exceed one-third of the votes [21].
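
As a hedged illustration of the PoW idea, the sketch below searches for a nonce whose SHA-256 hash of the block data starts with a given number of zero hex digits, and shows how another miner can check the result with a single hash. The puzzle form, the function names, and the difficulty value are assumptions chosen for illustration; real blockchains define their own block headers and difficulty targets.

```python
import hashlib


def block_hash(block_data, nonce):
    """SHA-256 hex digest of the block data concatenated with the nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data, difficulty):
    """Find the smallest nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_hash(block_data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(block_data, nonce, difficulty):
    """Verification needs one hash, whereas mining needs many attempts."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


nonce = mine("block-1", difficulty=3)
print(nonce, verify("block-1", nonce, difficulty=3))
```

The asymmetry between the search in `mine` and the single check in `verify` is what makes the puzzle usable as a correctness proof, and it is also the source of the computational cost that the modified model aims to avoid.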
In addition, wearable devices help to sense, collect, and send data and to receive
alerts, as well as to share updates and information. This improves healthcare processes
and services for all stakeholders [11]. Many studies have investigated the use of wearable
devices in healthcare for patient monitoring [22], recognition, and assistance, as well as for
research purposes [23].
Many researchers have linked blockchains with wearable devices to support users,
healthcare providers, and insurance companies [24–26]. Such research investigates the
advantages of using blockchain with wearable devices, such as decentralization, distribution,
transparency, robustness, availability, automation, traceability, reliability, ownership protection,
privacy, and security; some of these advantages overlap with each
other. In contrast, some work has highlighted the blockchain’s disadvantages, such as
energy consumption, computational cost, traffic flow latency, and scalability [27]. In
response, many works have provided modified blockchain models [28] and modified
transactions [29]. Blockchains have been used in healthcare systems for storage security [30],
EPHR sharing [31], insurance processes [32], pharmaceutical supply chains [33], patient
monitoring [34], organ transplant management [35], clinical trial support [36], and IoT data
management [37]. However, to the best of our knowledge, this is the first work that has
targeted e-healthcare authority levels, in addition to reducing latency and processing time,
using a relaxed blockchain model.

3. Healthcare System’s Specifications


Before we move on to integrating blockchain technology with a healthcare system, this
paper illustrates some important points related to the healthcare system. Such an analysis
allows us to frame the specifications of healthcare data and processes. Accordingly, the
blockchain model will be adjusted. The analysis of healthcare specifications is presented
as follows:
• Healthcare system’s stakeholders: The main healthcare system’s stakeholders are patients,
doctors, nurses, dentists, health technicians, and administration staff [38].
• EPHR general privacy: Patients’ information in EPHRs can be revealed for specific
purposes, such as research or awareness-raising campaigns, but only if the personal
identities are hidden [39]. Dealing
with EPHRs is very sensitive, even with hidden identities, as they could be negatively
exploited by politicians, marketplaces, or businesses.
• EPHR with healthcare stakeholders: No patient information or EPHRs should be
hidden from doctors, nurses, dentists, or health technicians [39]. The data can be
accessed under a non-disclosure agreement. However, some EPHR information, such
as identities or mental and psychological issues, should be hidden from co-members
such as volunteers and students who join healthcare teams.
• EPHR privacy levels: Different privacy levels are assigned to EPHRs. For example,
EPHRs would require specific privacy levels [39,40] for politicians, military leaders,
and famous people compared to the public.
• Updating EPHR: Different parts of EPHRs can be updated by doctors, nurses, dentists,
or health technicians. Indeed, everyone who is part of the healthcare staff can update
specific related parts without restriction. However, we should restrict updates that are
irrelevant to the roles within the healthcare team [41].
• Direct update of EPHR: Most EPHR updates require the approval of the primary or
main doctor. Primary doctors do not need any approval, and they can authorize others
to directly update the relevant parts or sections without any approval [42].
• Financial Transactions: Financial transactions can be accessed or updated by the autho-
rized people, although the approval of any operation is required [32].
• Latency: Latency or delays that result from approval are critical in some emergency
cases and scenarios.
• System considerations: The development of any healthcare system should consider the
general regulations, ethical obligations, and cultural influence. For example, cases of
abortion, transgender status, and violence should be treated according to the laws and
cultural perspectives, which differ from one place to another.
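
The update rules in the list above can be sketched as a small authorization check. The role names, the EPHR section names, and the three outcomes ("direct", "needs_approval", "denied") are hypothetical choices for illustration; the mapping of roles to sections in particular is an assumption and not part of the specification.

```python
# Hypothetical mapping of roles to the EPHR sections relevant to them.
ROLE_SECTIONS = {
    "doctor": {"diagnosis", "prescriptions"},
    "nurse": {"vital_signs"},
    "dentist": {"dental"},
    "technician": {"lab_results"},
}


def check_update(user, role, section, primary_doctor, delegated=frozenset()):
    """Decide how an EPHR update request should be handled.

    - The primary doctor updates directly, without approval (assumed for any section).
    - Updates outside the sections relevant to the user's role are denied.
    - Users authorized by the primary doctor for a section update it directly.
    - All other relevant updates wait for the primary doctor's approval.
    """
    if user == primary_doctor:
        return "direct"
    if section not in ROLE_SECTIONS.get(role, set()):
        return "denied"
    if (user, section) in delegated:
        return "direct"
    return "needs_approval"


# Usage: a nurse who has been authorized to update vital signs directly.
delegated = {("nurse_amal", "vital_signs")}
print(check_update("nurse_amal", "nurse", "vital_signs", "dr_sami", delegated))
print(check_update("nurse_amal", "nurse", "prescriptions", "dr_sami", delegated))
```

Separating "direct" from "needs_approval" mirrors the latency concern in the list: only the second path would have to wait for an approval step.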

4. Modified Blockchain Model


Blockchain technology introduces decentralized and distributed architecture that is
combined with supported algorithms and procedures. The blockchain has nodes that are
fully connected to each other and share copies of the same ledger. The algorithm starts
by confirming the correctness of transactions and groups them in blocks. Only one of
the miners can propose a new block. This miner is decided based on different consensus
mechanisms, such as PoW, PoS, PoQ, or PBFT. Then, the other miners validate and vote
on the approval of the proposed block. According to the votes of the majority (consensus),
the block is approved and added to the ledger; otherwise, it is ignored and another block is
proposed. In fact, every miner has to update their copy of the ledger based on the consensus.
In this paper, the blockchain procedure is modified according to the guidance of the
results of the healthcare specifications in Section 3. The details of the modifications are
explained in the following subsections.

4.1. Transaction Process


Understanding the nature of transactions is important. Unfortunately, many works
apply blockchain technology without studying the implications of using transactions, which
indicates a lack of understanding of transaction specifications.
A regular operation reads data or updates it directly in the main memory. However, a
transaction consists of one or more operations. These operations involve either reading a
piece of data or writing a piece of data. The operations within a transaction are executed
one by one in a temporary memory (buffer). In the end, the transaction is committed or
aborted. Committing a transaction means that the results of all operations are reflected

Appl. Sci. 2023, 13, 12630

from the temporary memory to the main memory, while aborting a transaction means that
the results of all operations are discarded and the temporary memory is freed. In addition,
in a transactional system, many operations and transactions are executed in parallel,
which increases the throughput (the number of transactions executed per unit of time). It also
speeds up processing at minimal cost. Another advantage is the ability to roll
back and retrieve the correct data. The transactional system is supported by software and
hardware resources.
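
The commit and abort behavior described in this subsection can be sketched in a few lines. This is a single-threaded illustration under stated assumptions: the class name Transaction and its methods are hypothetical, main memory is modeled as a plain dictionary, and no concurrency control is included.

```python
class Transaction:
    def __init__(self, memory):
        self.memory = memory  # shared "main memory", modeled as a dict
        self.buffer = {}      # temporary memory private to this transaction

    def read(self, key):
        # A transaction sees its own buffered writes before main memory.
        if key in self.buffer:
            return self.buffer[key]
        return self.memory[key]

    def write(self, key, value):
        # Writes stay in the buffer until the transaction commits.
        self.buffer[key] = value

    def commit(self):
        # Reflect all buffered results into main memory, then free the buffer.
        self.memory.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        # Discard all buffered results; main memory is left unchanged.
        self.buffer.clear()


memory = {"a": 0}
t = Transaction(memory)
t.write("a", 5)
print(memory["a"])  # 0, the write is not yet visible
t.commit()
print(memory["a"])  # 5
```

Because writes are held in the buffer, rolling back is just a matter of dropping the buffer, which is the property the text relies on when it mentions retrieving the correct data.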

4.2. Mining
The mining process has two basic steps. Firstly, the correctness of the transactions
is confirmed, and they are placed into a new block. Secondly, a consensus protocol such
as PoW or PoS is followed to obtain a chance of proposing the new block to others for
validation and votes, which is explained in the following section. Now, let us focus on the
correctness of transactions. There are two kinds of transactions: read and update.
A read transaction includes only read operations, while an update transaction includes
at least one write operation. Figure 1 shows examples
of both kinds. T1 is a read transaction that includes only one read
operation, which returns the value of the variable a. At the end, the transaction tries to
commit (TryC). T2 is a read transaction that includes multiple read operations for the
variables a, b, and c. T3 is an update transaction that includes only one write operation,
which updates the variable a with the value 5. T4 is an update transaction that
includes multiple write operations to update the variables a, b, and c with
the values 1, 2, and 3, respectively. T5 is an update transaction that includes multiple read and write
operations, which read variable a, write to a, then read variable c.
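
The distinction between the two kinds can be expressed as a short check. In this sketch, each operation is a (kind, variable) pair with "r" for read and "w" for write; this encoding and the function name transaction_kind are assumptions for illustration, and written values are omitted because the classification does not depend on them.

```python
def transaction_kind(operations):
    """Return "update" if any operation is a write, otherwise "read"."""
    return "update" if any(kind == "w" for kind, _ in operations) else "read"


# The five transactions described in the text, as (kind, variable) pairs.
transactions = {
    "T1": [("r", "a")],
    "T2": [("r", "a"), ("r", "b"), ("r", "c")],
    "T3": [("w", "a")],
    "T4": [("w", "a"), ("w", "b"), ("w", "c")],
    "T5": [("r", "a"), ("w", "a"), ("r", "c")],
}

for name, operations in transactions.items():
    print(name, transaction_kind(operations))
```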

Figure 1. Kinds of transactions.

Furthermore, since transactions run in parallel and in isolation, the effects of a
transaction's operations appear only after the transaction is committed. This means that the changes that result
from its operations are not visible to the rest of the system until it commits. This may allow
out-of-date data to be read, and may cause conflicts among transactions,
as illustrated in Figure 2. Running read transactions in parallel does not have
any negative impact, since the values in memory remain stable. However, update
transactions change the memory content. Figure 2 gives an example of four
transactions running in parallel and shows how conflicts cause the abortion of some

transactions. T1, T2, and T3 are read transactions, while T4 is an update transaction. The
operations of those transactions access the memory variables a, b, c, and d, which all
initially hold 0. T1 read variables a (returning a = 0), c (returning c = 0), and d (returning
d = 0). T2 read variables a (returning a = 0) and b (returning b = 1), which means that T4
had changed the value of b and had already committed. T3 read variables b (returning b = 0)
and d (returning d = 1). This means that T3 read b before the commitment of T4, and d
afterward. T4 read variable a (returning a = 0) and updated the values of b (setting b = 1)
and d (setting d = 1).

Figure 2. An example of parallel execution of four transactions to illustrate the conflicts that caused
the abortion of T3.

The execution of parallel transactions is correct when
the results of the parallel execution match those of some correct serial execution. In other words,
the transactions in a parallel execution must be orderable in a logical way. Considering our
example in Figure 2, T1 can be considered as the first executed transaction, since it returned
the original values of a, c, and d. T4 can be considered as the second executed transaction,
since it read the original value of a and updated the values of b (b = 1) and d (d = 1). T2
can be considered as the third executed transaction, since it read the original value of
a (no concurrent transaction has updated a) and b (returning b = 1), where the value of b
had been updated by T4. However, T3 must be aborted and ignored in the process of
ordering; it cannot be ordered before T4 because it read d = 1, which could only be seen after
the commitment of T4, and it cannot be ordered after T4 because it read b = 0, ignoring
the change in variable b that took effect when T4 was committed.
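
The ordering argument above can be checked mechanically by trying every serial order. The following is a brute-force sketch, assuming each transaction is summarized by the values it read and the values it wrote; the function name find_serial_order and this encoding are hypothetical, and the exhaustive search is only practical for a handful of transactions.

```python
from itertools import permutations


def find_serial_order(transactions, initial):
    """Return a serial order that reproduces every recorded read, or None."""
    for order in permutations(transactions):
        state = dict(initial)
        consistent = True
        for name in order:
            reads, writes = transactions[name]
            # Every value the transaction read must match the current state.
            if any(state[var] != value for var, value in reads.items()):
                consistent = False
                break
            state.update(writes)  # apply the transaction's committed writes
        if consistent:
            return list(order)
    return None


initial = {"a": 0, "b": 0, "c": 0, "d": 0}
# name: (values read, values written), taken from the Figure 2 discussion
transactions = {
    "T1": ({"a": 0, "c": 0, "d": 0}, {}),
    "T2": ({"a": 0, "b": 1}, {}),
    "T3": ({"b": 0, "d": 1}, {}),
    "T4": ({"a": 0}, {"b": 1, "d": 1}),
}

print(find_serial_order(transactions, initial))  # None
without_t3 = {k: v for k, v in transactions.items() if k != "T3"}
print(find_serial_order(without_t3, initial))    # ['T1', 'T4', 'T2']
```

With all four transactions no order exists, and once T3 is removed the only consistent order is T1, T4, T2, which matches the reasoning in the text.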
One last example is shown in Figure 3 to explain the conflicts between two update
transactions. Consider one bank account, called the hospital bank account (h), and let
h = 10,000, which means that the hospital bank account contains USD 10,000. Now, let
there be two transactions, T1 and T2, that belong to different patients and are executed
in parallel. The first patient pays USD 100 as the cost of some blood tests (T1), while the
other pays USD 500 for a medical examination (T2). T1 should therefore read the value of
h and add 100, while T2 should read the value of h and add 500. Since
T1 and T2 are executed in parallel, both of them read the original value of h. This
means that T1 adds 100 to 10,000 and writes h = 10,100, while T2
adds 500 to 10,000 and writes h = 10,500. Whichever write lands last, the final result
is incorrect, since the value of h after the commitment of T1 and T2 should be 10,600.
Consequently, one of the transactions must be aborted and rolled back, then executed again
after the commitment of the other. Let us say that T2 is aborted, then executed again after
the commitment of T1: it sees h = 10,100, and by adding 500, the result becomes h = 10,600.
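
The abort-and-retry outcome can be reproduced with a small simulation. Note that the text does not say how the conflict is detected; the version counter used here (a simple form of optimistic validation) is an assumption for illustration, and the names Account and try_commit are hypothetical.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.version = 0  # incremented on every committed update


def try_commit(account, read_version, new_balance):
    """Commit only if nobody else committed since the value was read."""
    if account.version != read_version:
        return False  # conflict detected: the transaction is aborted
    account.balance = new_balance
    account.version += 1
    return True


h = Account(10_000)

# T1 and T2 run in parallel, so both read the original value of h.
t1_version, t1_value = h.version, h.balance
t2_version, t2_value = h.version, h.balance

print(try_commit(h, t1_version, t1_value + 100))  # True, h = 10,100
print(try_commit(h, t2_version, t2_value + 500))  # False, T2 is aborted

# T2 is executed again after the commitment of T1.
t2_version, t2_value = h.version, h.balance
print(try_commit(h, t2_version, t2_value + 500))  # True
print(h.balance)                                  # 10600
```

Without the version check, the second write would overwrite the first and leave h = 10,500, which is the lost update described above.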
