Combining AI and Human Support in Mental Health
Clare E Palmer1, Emily Marshall1, Edward Millgate1, Graham Warren1, Michael P. Ewbank1, Elisa Cooper1, Samantha
Lawes1, Malika Bouazzaoui1, Alastair Smith1, Chris Hutchins-Joss1, Jessica Young1, Morad Margoum2, Sandra
Healey2, Louise Marshall1, Shaun Mehew1, Ronan Cummins1, Valentin Tablan1, Ana Catarino1, Andrew E Welchman1
and Andrew D Blackwell1
1 ieso Digital Health, The Jeffrey's Building, Cowley Road, Cambridge, CB4 0DS, UK
2 Dorset HealthCare University NHS Foundation, Sentinel House, Nuffield Industrial Estate, Nuffield Road, Poole, UK
Escalating global mental health demand exceeds existing clinical capacity. Scalable digital solutions will be essential
to expand access to high-quality mental healthcare. This study evaluated the effectiveness of a digital intervention
to alleviate mild, moderate and severe symptoms of generalized anxiety. This structured, evidence-based program
combined an Artificial Intelligence (AI) driven conversational agent to deliver content with human clinical oversight
and user support to maximize engagement and effectiveness. The digital intervention was compared to three
propensity-matched real-world patient comparator groups: i) waiting control; ii) face-to-face cognitive behavioral
therapy (CBT); and iii) remote typed-CBT. Endpoints for effectiveness, engagement, acceptability, and safety were
collected before, during and after the intervention, and at one-month follow-up. Participants (n=299) used the
program for a median of 6 hours over 53 days. There was a large clinically meaningful reduction in anxiety symptoms
for the intervention group (per-protocol (n=169): change on GAD-7 = –7.4, d = 1.6; intention-to-treat (n=299): change
on GAD-7 = –5.4, d = 1.1) that was statistically superior to the waiting control, non-inferior to human-delivered care,
and was sustained at one-month follow-up. By combining AI and human support, the digital intervention achieved
clinical outcomes comparable to human-delivered care while significantly reducing the required clinician time.
These findings highlight the immense potential of technology to scale effective evidence-based mental healthcare,
address unmet need, and ultimately impact quality of life and economic burden globally.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
medRxiv preprint doi: https://doi.org/10.1101/2024.07.17.24310551; this version posted July 17, 2024. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
It is made available under a CC-BY-NC-ND 4.0 International license .
Figure 1. CONSORT diagram. Prior to assessment, enrolment avenues differed for external recruits (left) and patients referred to ieso for typed
therapy (either from an NHS Provider or via a self-direct referral; right). External recruits signed up specifically for the study via an external webpage
following social media or email advertisements. All potential participants irrespective of enrolment avenue were triaged for suitability based on a
Self-Assessment Questionnaire (SAQ). For patients, only those deemed to be potentially eligible were invited to participate. Participants were
withdrawn either actively (requested to withdraw), passively (dropped-out or disengaged from study procedures), clinician-led (withdrawn based
on clinician recommendation), or other (due to reasons such as technical issues). TAU = treatment as usual.
The six modules consisted of an introduction module, three core modules, and two consolidation modules (Figure 2). The three core modules each consisted of three sessions that followed the pattern of i) learning, ii) activity, and iii) practice. The two consolidation modules consisted of two sessions. There were 16 sessions total. The introduction and consolidation modules consisted of sessions designed for onboarding and learning consolidation, respectively. All modules began with a symptom “check-in” consisting of the GAD-7 and PHQ-9 within the software immediately before the first session within that module. Sessions were made available on a timed schedule subject to completing the prior session (Figure 2).

Within each session, the software used a conversational agent to guide participants through a combination of videos, educational content, conversations, and worksheets written by accredited clinicians. The software used AI models for Natural Language Understanding, specific and tailored elements of Natural Language Generation and a dialogue management system. Part way through enrolment, with agreement from the overseeing NHS Research Ethics Committee, the software was updated to fix bugs, improve the user experience within the introductory module, and update select AI models. The final 60 participants enrolled were offered the updated software. Software version was controlled for in statistical analyses. The digital program was built in accordance with ISO 13485. Prior to the study, the study program was registered as a UKCA marked Class 1 medical device.

To ensure participant safety and maximize engagement and acceptability of the program, a dedicated human user and clinical support service was provided. Prior to enrolment, as part of the screening process, all participants received a standardized clinical assessment by a trained clinician with an accredited postgraduate qualification via typed modality. The clinician assessed the individual’s needs, determined if they were eligible for the study and obtained informed consent. Research coordinators provided fortnightly check-in calls to all participants throughout the program and sent weekly emails or SMSs to remind participants only if they deviated from the program schedule. Risk could be flagged through symptom monitoring of GAD-7 and PHQ-9 scores or through interaction with the research coordinators during check-in calls or ad hoc communication. Flagged risk was escalated to a clinician for review. Where appropriate the participant would then be contacted for further risk assessment by a clinician to ensure their safety. Participants could also request an appointment with a clinician at any point to discuss their journey, particularly if they were unsure the program was working for them. At the end of the study, all participants were offered a further discharge appointment with a clinician to discuss the next steps for their care.
Figure 2. Schematic of ieso Digital Program with human clinical and user support service and study procedures. All participants received a clinical
assessment prior to enrolment and were offered a discharge appointment with a clinician following the program. Clinicians were available via
asynchronous messaging or for a review appointment whenever needed. All participants received email or SMS reminders and fortnightly check-
in calls throughout the program to maximize engagement delivered via the research team. The ieso Digital Program included 6 modules with a total
of 16 sessions. Each module started with a symptom check-in consisting of the GAD-7 and PHQ-9.
The support service and study procedures are illustrated in Figure 2. In total, delivering the intervention required an average of 97 minutes (1.6 hours) of clinician time (i.e. time spent in sessions with participants) per participant. This included 299 assessments (mean 66 mins; range 31–105 mins), 46 review appointments (mean 32 mins; range 14–60 mins) and 174 discharge appointments (mean 44 mins; range 13–76 mins).

Adults with mild to severe symptoms of anxiety, consistent with Generalized Anxiety Disorder (GAD), were invited to participate either following referral to ieso’s typed therapy service (either referred to ieso from the NHS Provider or via self-referral direct to ieso) or in response to online advertisements or email invitation through the NIHR BioResource for Translational Research (https://bioresource.nihr.ac.uk/). Only participants with a main problem descriptor of GAD were eligible as established through a clinician assessment in line with the NHS TT manual (16).

conducted prior to retrospective analysis of external control data to estimate the total sample size needed to quantify clinical effectiveness (i.e. change in GAD-7 total score) compared to an active external control. Clinical effectiveness was defined as a change in GAD-7 score over either the course of six treatment sessions or until recovery was reached (if sooner than 6 sessions). A non-inferiority margin of a 1.8 change in GAD-7 total score was chosen based on previous literature ((38–40); see Supplementary Methods for more details). Using data from patients being treated for GAD via typed-CBT, with at least six sessions or recovery, we estimated an expected standard deviation of GAD-7 change of 5.14. To estimate a sample size, we used the following equation:

$$n = 2\left(\frac{Z_\alpha + Z_\beta}{(\delta + \Delta)/\sigma}\right)^2$$

(see (41)), where $Z_\alpha$ and $Z_\beta$ are the standard normal scores for the one-sided significance level of 2.5% (1.96) and power of 90% (1.28) respectively, $\delta$ is the non-inferiority level 1.8 and $\sigma$ is the standard deviation 5.14. A sample size of 172 was estimated for the study intervention to enable a non-inferiority analysis of clinical effectiveness compared to human-delivered care.
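As a check on the sample-size arithmetic, the equation above can be evaluated directly. The following is a minimal sketch rather than the authors' code: the function name is hypothetical, and it assumes the expected true difference between arms (Δ) is zero, a value the text does not state but which reproduces the reported estimate of 172.

```python
import math


def noninferiority_sample_size(z_alpha, z_beta, margin, sd, true_diff=0.0):
    """Sample size from n = 2 * ((z_alpha + z_beta) / ((margin + true_diff) / sd))**2.

    true_diff (the Delta term) defaults to 0, which is an assumption made
    for this sketch; the source text does not give its value.
    """
    standardized_margin = (margin + true_diff) / sd
    n = 2 * ((z_alpha + z_beta) / standardized_margin) ** 2
    # Round up, since a fractional participant cannot be recruited.
    return math.ceil(n)


# Values quoted in the text: one-sided alpha of 2.5%, power of 90%,
# non-inferiority margin of 1.8 GAD-7 points, SD of GAD-7 change of 5.14.
n_required = noninferiority_sample_size(z_alpha=1.96, z_beta=1.28, margin=1.8, sd=5.14)
print(n_required)
```

With these inputs the unrounded value is about 171.2, which rounds up to the 172 reported in the text.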
December 2023, and b) DHC between January 2017 and December 2021.

Analyses were conducted in R (42). A statistical analysis plan was defined prior to final analyses being conducted.

The per-protocol (PP) sample (n=169) was defined as participants who completed the minimum meaningful clinical dose of the program (MMCD) and the final post-intervention GAD-7 and PHQ-9 questionnaires. This dose was defined a priori by three accredited cognitive behavioral therapists who evaluated the content of the program to determine the amount of content required to deliver meaningful clinical improvement on the GAD-7 scale based on their clinical experience (mean experience of 14 years delivering psychological therapy). Based on this evaluation, the MMCD was defined as completing modules 1 to 3 in the digital program and the module 4 check-in.

The intention-to-treat (ITT) sample (n=299) included all participants who completed questionnaires at enrolment irrespective of adherence to the digital program except for one participant who requested that their data be deleted. Due to missing data for the pre-intervention WSAS (external recruits only), the ITT sample for all WSAS analyses was n=295.

Metrics of adherence were primarily assessed with descriptive statistics of in-software usage metrics: median and distribution of time spent in the digital program in hours, days since initialization of the program (defined based on the date that the software was downloaded); and proportion of participants completing each session, module, and check-in. An “engaged” patient is defined as an individual who has received the minimum amount of therapy such that pre- and post-treatment measures can be collected, and clinical outcomes estimated (16). Here we used a comparable definition of engagement based on usage of the program (including time in the program, content delivered, and number of outcomes measured) defined as completing session 1 of module 2 in the program. This is in contrast to the MMCD definition which is defined based on both usage and expected improvement in symptoms.

Clinical effectiveness was quantified by calculating the change in anxiety symptoms, measured using the GAD-7, from baseline to final score, and estimating a within-subject effect size (Cohen’s d). The threshold for a clinically meaningful reduction in symptoms was defined as a change greater than the reliable change index of the GAD-7 scale (minimum of a 4-point reduction; Toussaint et al., 2020). Clinical outcomes were calculated using the following definitions: a) improvement was defined as a reduction on the PHQ-9 or GAD-7 scales greater than or equal to the reliable change index (≥4 for GAD-7; ≥6 for PHQ-9) and no reliable increase on either measure; b) recovery was defined as reduction on both scales to below the clinical cutoff (GAD-7 score <8; PHQ-9 score <10); c) reliable recovery was defined as having both improved and recovered; d) responder rate was defined as an improvement of either ≥4 on the GAD-7 or ≥6 on the PHQ-9; and e) remission rate was defined as having either a final GAD-7 score <8 or final PHQ-9 score <10 for those only having started above the clinical cut-off. Definitions for improvement, recovery and reliable recovery are equivalent to those used in NHS TT (44). A within-subjects effect-size for mean change in GAD-7 scores from post-intervention to one month follow-up was calculated to determine the short-term durability of any effects of the digital intervention. We also measured effectiveness by calculating the change in PHQ-9 and WSAS between baseline and final score, as well as between comparator groups. For the ITT sample, when calculating GAD-7 and PHQ-9 effectiveness, missing post-intervention scores were imputed using last observation carried forward, such that the final score collected prior to disengagement or withdrawal was used.

To determine whether any demographic or study variables were associated with adherence or effectiveness, a series of regression analyses were conducted. All regression models included age, gender, highest qualification, employment status, religion, presence of a chronic physical health condition, ethnicity, reported disability, sexuality, baseline GAD-7 severity, software version, and enrolment path (referred to ieso’s typed therapy service or externally recruited) as predictors. Linear regression models were used to predict continuous dependent variables: i) number of sessions completed; ii) change in GAD-7 score from baseline to final score. A logistic regression model was used to predict non-adherence (i.e. participants who did not complete the necessary program sessions or study assessments to be in the PP sample; non-adherence coded as 1). Due to unequal sample sizes within demographic sub-categories (e.g. sexuality), groups were truncated to aid in the interpretability of findings and power of analyses. To determine if adherence across sessions differed between groups, adherence rates were compared between the digital program, face-to-face CBT
and typed CBT by estimating the slope and confidence intervals of the association between the proportion of sample that completed each GAD-7 assessment (either symptom check-in within the digital program or prior to each therapy session as standard within NHS TT) and session number. Sessions were aligned such that each symptom check-in within the program was associated with a treatment session for the control group.

Safety was assessed using reported serious adverse events, software deficiencies, and number of cases withdrawn based on clinician assessment of suitability to continue with the program. Software deficiencies include malfunctions or errors of the software that could result in issues related to safety or software performance.

Three propensity-matched external control groups were created using real-world historic patient data (see External comparator data source) to compare the clinical effectiveness of the intervention to no intervention and standard of care. All propensity-matched control patients had a main problem descriptor of GAD as established through a clinician assessment.

The control groups consisted of:
i. waiting controls (total available sample n=576); patients referred for typed-CBT with two GAD-7 scores between 4–10 weeks apart without having started treatment during that time (same sample used for PP and ITT analyses),
ii. therapist delivered typed CBT (total available sample n=2,210); patients referred for typed-CBT with at least two scores on the GAD-7, who had completed a course of typed CBT - defined by the discharge code of ‘completed treatment’ - and discharged with a maximum of twelve treatment sessions (PP sample), or any patient who had entered treatment, regardless of completion (ITT sample), and
iii. therapist delivered face-to-face CBT (total available sample n=753); NHS TT patients referred to DHC who received face-to-face CBT and had a minimum of two and a maximum of twelve treatment sessions (PP sample), or any patient who attended treatment (ITT sample). Unlike the typed-CBT comparator, due to unavailability of discharge codes it was not possible to use the ‘completed treatment’ to define the PP sample for this group.

Enrolled participants in the intervention group were propensity-matched to patients from these control groups using baseline GAD-7 scores, baseline PHQ-9 scores, age, and the presence of a chronic physical health condition (yes/no/not known). Propensity-matching was conducted using the ‘MatchIt’ package (45) in R with ‘nearest neighbor’ methodology (average treatment effect in treated patients). For the waitlist control only participants in the PP sample were matched (n=169) due to limited available data for matching. For the human-delivered therapy control groups all participants were matched (n=299). Supplementary Table 2 illustrates the matching of comparator groups to the intervention sample.

In line with the a priori defined statistical analysis plan, a superiority analysis was conducted to test the hypothesis that the clinical effectiveness of the intervention was greater than a propensity-matched waiting control group. A non-inferiority analysis was conducted to test the hypothesis that the clinical effectiveness of the intervention was not inferior to the effectiveness of typed CBT or face-to-face CBT in comparison to waiting-list. Within and between-subject effect sizes were also estimated for the change in total score on the PHQ-9 and the WSAS to estimate the effectiveness of the intervention on low mood and work and social functioning relative to the waiting control.

The final sample for analysis included 299 participants of whom 80% were female (n=240) with a mean age at baseline of 39.8 years (range: 18 – 75 years). Table 1 provides an overview of demographics and baseline severity for participants in the intervention group for both the ITT and PP samples.

Participants (n=299) completed a median of 6.1 hours of program interaction over 53.1 days. This was higher for the PP sample in which participants completed a median of 8.7 hours over 59.6 days. In total, 232 participants (78%) were engaged in the program (i.e. completed session 1 of module 2) involving a median of 2 hours interacting with the program content over 14 days. Out of those engaged participants, 78% (n=180) reached the minimum meaningful clinical dose (i.e. completing up to check-in 4 out of 6 in the program). The overall study attrition rate (defined as the proportion of participants who did not complete the final study questionnaires) was
Table 1. Sample characteristics of the digital intervention group for both ITT and PP samples.
Demographic | Category | ITT (N=299) | PP (N=169)
Age, Mean (SD) | - | 39.8 (12.8) | 41.7 (11.8)
32%. Descriptive statistics of engagement with the program across modules are outlined in Table 2.

To determine if adherence across sessions differed between the groups, we compared adherence rates between the digital program, face-to-face therapy and typed therapy based on session completion. Confidence intervals for estimates of the adherence rate were overlapping, indicating no difference in adherence rates across groups (Figure 3; Supplementary Table 3).
Table 2. Engagement metrics for the digital program.
Sample | N | Median time since initialization (days) | Median time interacting in program (hours)
Per-protocol sample total | 169 | 59.6 | 8.7
Intention-to-treat sample total | 299 | 53.1 | 6.1
Engaged sample total (up to module 2 session 1) | 232 | 14.0 | 2.0
All participants by milestone
Module 1 check-in | 284 | 0.0 | 0.03
Module 6 check-in | 113 | 49.5 | 5.4
Median days since initialization was calculated as number of days since software download at onboarding for each sample: PP, ITT and the engaged sample (i.e. those who completed up to session 1 of module 2). Metrics are also shown for each symptom check-in at the beginning of each module.

To investigate potential drivers of program adherence, demographic and study factors were tested for association with i) adherence, defined as number of completed sessions, and ii) non-adherence, defined as not being included in the PP sample. The number of completed sessions in the program did not show any significant associations with demographic or study factors (linear regression: F(24, 274) = 1.4, p = .11, adjusted R² = 0.03; Supplementary Table 4). Age was associated with non-adherence, with younger participants less likely to be in the PP sample (logistic regression: OR = 0.97, p = .009; Supplementary Table 5).

On average, across the intervention sample, there was a large, clinically meaningful reduction in anxiety symptoms from baseline to final score (PP: mean GAD-7 change = –7.4, d = 1.6; ITT: mean GAD-7 change = –5.4, d = 1.1). This reduction was significantly greater than that found for the waiting control (mean GAD-7 change = –1.9; PP between-subject effect: p <.001, d = 1.3; ITT between-subject effect: p <.001, d = 0.8), and statistically non-inferior to the propensity-matched face-to-face therapy control (PP: mean GAD-7 change = –6.4, non-inferiority effect p <.001; ITT: mean GAD-7 change = –6.0, non-inferiority effect p = .002). For the propensity-matched typed-therapy control, the intervention was significantly non-inferior for the PP sample (mean GAD-7 change: –7.5; non-inferiority p <.001), and for the ITT sample the effect was approaching significance (mean GAD-7 change = –6.6, p = .06; Figure 4, Table 3). Clinical outcomes for all groups are reported in Supplementary Table 6.
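To illustrate how a non-inferiority comparison against the 1.8-point margin works, the sketch below applies a one-sided z-test to the ITT summary statistics reported in Table 3. This is an assumption-laden illustration, not the authors' analysis: the text does not specify the test statistic used, the function names are hypothetical, and the calculation ignores any covariate adjustment.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def noninferiority_p(mean_new, sd_new, n_new, mean_ref, sd_ref, n_ref, margin):
    """One-sided p-value for H0: the new arm is worse than the reference by >= margin.

    Means are GAD-7 change scores, so more negative means more improvement.
    Assumes an unadjusted two-sample z-test on summary statistics.
    """
    diff = mean_new - mean_ref  # positive = new arm improved less
    se = math.sqrt(sd_new ** 2 / n_new + sd_ref ** 2 / n_ref)
    z = (diff - margin) / se
    return normal_cdf(z)


MARGIN = 1.8  # non-inferiority margin in GAD-7 points, from the sample-size section

# ITT rows of Table 3: mean change, SD of change, N
p_face_to_face = noninferiority_p(-5.4, 5.1, 299, -6.0, 4.9, 299, MARGIN)
p_typed = noninferiority_p(-5.4, 5.1, 299, -6.6, 4.6, 299, MARGIN)

print(f"ITT vs face-to-face CBT: p = {p_face_to_face:.3f}")
print(f"ITT vs typed CBT:        p = {p_typed:.3f}")
```

Under these assumptions the two p-values come out at approximately .002 and .07, close to the reported .002 and .06, which suggests the reported results are consistent with a simple comparison of this kind.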
Figure 4. Change in anxiety symptoms from baseline to final score for all groups. A) Mean change (final score – baseline) in GAD-7 scores for the
PP sample (n=169), propensity-matched waiting control group, face-to-face CBT group, and typed CBT group. B) Mean change in GAD-7 scores
for the ITT sample (n=299) and a propensity-matched waiting control group, face-to-face CBT group, and typed CBT group. C) Mean GAD-7
scores at baseline and final score with 95% confidence intervals for the PP sample (n=169) and propensity-matched waiting control group, face-
to-face CBT group, and typed CBT group. D) Mean GAD-7 scores at baseline and final score with 95% confidence intervals for the ITT sample
(n=299) and propensity-matched waiting control group, face-to-face CBT group, and typed CBT group. *** = p < .001, ** = p <.005
The associations between participant demographics, study factors and change in GAD-7 score were explored with a linear regression: F(24, 274) = 3.45, p < .001, adjusted R² of 0.16. Greater reductions in GAD-7 scores were associated with higher baseline GAD-7 scores (β = 0.70, SE = 0.09, t = 7.6, p < .001), and higher baseline age (β = 0.07, SE = 0.02, t = 3.0, p = .003) (Supplementary Table 9), such that more severe, older participants saw a larger change in GAD-7 score.

As intended, given the specificity of the program for targeting symptoms of generalized anxiety, there was a statistically significant yet smaller effect for low mood symptoms as measured with the PHQ-9 (PP: mean PHQ-9 change = –3.1, d = 0.7; ITT: mean PHQ-9 change = –1.6, d = 0.3) (Table 4). This mean change was significantly greater than the mean change in the waiting control group for the PP sample (mean PHQ-9 change = –1.0, between-subject effect, p < .001, d = 0.5), but not for the ITT sample (p = .11, d = 0.1). Despite this, PHQ-9 remission rate (based on n=80 above the clinical cut-off at baseline) was 78.8% (Supplementary Table 6). Participants with severe and moderate baseline GAD-7 scores experienced the largest improvement in PHQ-9 scores (Supplementary Table 8). There was minimal mean change in scores between post intervention and follow-up for both PP and ITT samples (PP mean difference = 0.5; ITT mean difference = 0.4) (Supplementary Table 10).
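The outcome categories referred to in these results (improvement, recovery and reliable recovery) follow the thresholds given in the statistical analysis section. A minimal sketch of that classification is shown below. The class and function names are hypothetical, the example participants are invented for illustration, and the requirement that a participant starts at or above a clinical cut-off on at least one scale is an assumption drawn from the phrase "reduction ... to below the clinical cutoff".

```python
from dataclasses import dataclass

GAD7_RCI, PHQ9_RCI = 4, 6          # reliable change thresholds from the text
GAD7_CUTOFF, PHQ9_CUTOFF = 8, 10   # scores below these are under the clinical cut-off


@dataclass
class Scores:
    gad7_baseline: int
    gad7_final: int
    phq9_baseline: int
    phq9_final: int


def improved(s: Scores) -> bool:
    """Reliable reduction on either scale and no reliable increase on either."""
    gad_drop = s.gad7_baseline - s.gad7_final
    phq_drop = s.phq9_baseline - s.phq9_final
    reliable_decrease = gad_drop >= GAD7_RCI or phq_drop >= PHQ9_RCI
    reliable_increase = gad_drop <= -GAD7_RCI or phq_drop <= -PHQ9_RCI
    return reliable_decrease and not reliable_increase


def recovered(s: Scores) -> bool:
    """Final scores below both cut-offs.

    Assumption: the participant was at or above a cut-off on at least
    one scale at baseline.
    """
    was_case = s.gad7_baseline >= GAD7_CUTOFF or s.phq9_baseline >= PHQ9_CUTOFF
    below_both = s.gad7_final < GAD7_CUTOFF and s.phq9_final < PHQ9_CUTOFF
    return was_case and below_both


def reliably_recovered(s: Scores) -> bool:
    """Both improved and recovered."""
    return improved(s) and recovered(s)


# Two invented participants, for illustration only.
large_change = Scores(gad7_baseline=13, gad7_final=5, phq9_baseline=8, phq9_final=5)
small_change = Scores(gad7_baseline=9, gad7_final=7, phq9_baseline=4, phq9_final=4)

for label, s in [("large change", large_change), ("small change", small_change)]:
    print(label, improved(s), recovered(s), reliably_recovered(s))
```

The second participant shows why reliable recovery is the stricter criterion: a 2-point drop takes the GAD-7 score below the cut-off, so the participant counts as recovered, but the change is smaller than the reliable change index and so does not count as improvement.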
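Table 3. Change in GAD-7 score from baseline to final score for all groups.
Sample | Group | N | Baseline mean | Baseline SD | Mean change | SD of change | 95% CI | Cohen's d
Per-protocol | ieso Digital Program | 169 | 12.4 | 3.4 | –7.4 | 4.6 | –6.7, –8.1 | 1.6
Per-protocol | Face-to-face CBT | 253 | 13.0 | 3.1 | –6.4 | 4.8 | –5.8, –7.0 | 1.3
Per-protocol | Typed CBT | 229 | 12.5 | 3.4 | –7.5 | 4.1 | –7.0, –8.0 | 1.8
Intention-to-treat | ieso Digital Program | 299 | 12.5 | 3.3 | –5.4 | 5.1 | –4.8, –6.0 | 1.1
Intention-to-treat | Face-to-face CBT | 299 | 12.9 | 3.1 | –6.0 | 4.9 | –5.5, –6.6 | 1.2
Intention-to-treat | Typed CBT | 299 | 12.6 | 3.5 | –6.6 | 4.6 | –6.1, –7.1 | 1.4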
Mean difference in GAD-7 score was calculated between baseline and final score for the intervention group (“ieso Digital Program”)
and all propensity-matched comparator arms: waiting control; face-to-face CBT; and typed-CBT. A negative mean difference denotes
a reduction in GAD-7 total scores. Within-subject effect sizes (Cohen’s d) were estimated for the mean change in GAD-7 scores for
each group. Change scores were calculated for PP and ITT samples.
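The within-subject effect sizes and confidence intervals in Table 3 can be checked from the reported means and standard deviations. The sketch below assumes that Cohen's d was computed as the mean change divided by the standard deviation of the change scores, and that the interval is a normal-approximation 95% confidence interval. Both are assumptions that happen to be consistent with the tabulated values, and the function names are hypothetical.

```python
import math


def within_subject_d(mean_change, sd_change):
    """Within-subject Cohen's d, assumed here to be |mean change| / SD of change."""
    return abs(mean_change) / sd_change


def ci95(mean_change, sd_change, n):
    """Normal-approximation 95% confidence interval for the mean change."""
    half_width = 1.96 * sd_change / math.sqrt(n)
    # Returned in the same order as the table: smaller reduction first.
    return mean_change + half_width, mean_change - half_width


# Per-protocol row for the ieso Digital Program in Table 3.
d_pp = within_subject_d(mean_change=-7.4, sd_change=4.6)
ci_pp = ci95(mean_change=-7.4, sd_change=4.6, n=169)

print(round(d_pp, 1), tuple(round(x, 1) for x in ci_pp))
```

For this row the calculation rounds to d = 1.6 and an interval of –6.7 to –8.1, matching the table entry.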
Table 4. Change in PHQ-9 and WSAS score from baseline to final score for all groups.
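Measure | Sample | Group | N | Baseline mean | Baseline SD | Mean change | SD of change | 95% CI | Cohen's d
PHQ-9 | Per-protocol | ieso Digital Program | 169 | 8.0 | 3.8 | –3.1 | 4.5 | –2.4, –3.8 | 0.7
PHQ-9 | Per-protocol | Face-to-face CBT | 253 | 8.5 | 3.7 | –3.0 | 4.8 | –2.4, –3.6 | 0.6
PHQ-9 | Per-protocol | Typed CBT | 229 | 8.1 | 3.5 | –4.1 | 3.9 | –3.6, –4.6 | 1.1
PHQ-9 | Intention-to-treat | ieso Digital Program | 299 | 8.0 | 3.7 | –1.6 | 4.8 | –1.1, –2.1 | 0.3
PHQ-9 | Intention-to-treat | Face-to-face CBT | 299 | 8.4 | 3.6 | –2.7 | 4.8 | –2.2, –3.3 | 0.6
PHQ-9 | Intention-to-treat | Typed CBT | 299 | 8.1 | 3.6 | –3.3 | 4.2 | –2.9, –3.8 | 0.8
WSAS | - | Waiting control | 153 | 10.6 | 6.1 | –0.1 | 1.3 | 0.1, –0.3 | 0.1
WSAS | Per-protocol | ieso Digital Program | 169 | 15.3 | 6.4 | –5.3 | 6.2 | –4.4, –6.2 | 0.9
WSAS | Per-protocol | Face-to-face CBT | 253 | 14.1 | 7.6 | –4.3 | 8.6 | –3.3, –5.4 | 0.5
WSAS | Per-protocol | Typed CBT | 223 | 10.8 | 6.4 | –4.6 | 5.5 | –3.8, –5.3 | 0.8
WSAS | Intention-to-treat | ieso Digital Program | 295 | 14.9 | 6.6 | –4.7 | 6.5 | –3.8, –5.6 | 0.7
WSAS | Intention-to-treat | Face-to-face CBT | 299 | 14.1 | 7.6 | –3.9 | 8.3 | –2.9, –4.8 | 0.5
WSAS | Intention-to-treat | Typed CBT | 291 | 10.8 | 6.3 | –3.9 | 5.7 | –3.2, –4.5 | 0.7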
Mean differences in PHQ-9 and WSAS scores were calculated between baseline and final score for the intervention group (“ieso Digital
program”) and all propensity-matched comparator arms: waiting control; face-to-face CBT; and, typed-CBT. A negative mean difference
denotes a reduction in scores. Within-subject effect sizes (Cohen’s d) were estimated for the mean change for each group. Change
scores were calculated for PP and ITT samples.
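For the ITT change scores, missing post-intervention scores were imputed with last observation carried forward, so that the final score collected before disengagement or withdrawal was used. A minimal sketch of that rule is given below. The function names and the example score sequence are hypothetical, and representing a missed check-in as None is an assumption made for illustration.

```python
def locf_final_score(scores):
    """Return the last observed score in a chronological list.

    Missing check-ins are represented as None (an assumption of this sketch).
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed score to carry forward")
    return observed[-1]


def itt_change_score(scores):
    """Change from baseline to the (possibly carried-forward) final score."""
    baseline = scores[0]
    if baseline is None:
        raise ValueError("baseline score is required")
    return locf_final_score(scores) - baseline


# Invented example: a participant who disengaged after the third check-in.
gad7_checkins = [14, 11, 9, None, None, None]
change = itt_change_score(gad7_checkins)
print(change)
```

Here the third check-in score is carried forward as the final score, giving a change of –5. A participant who completed only the baseline questionnaire would contribute a change of 0 under this rule, which is one reason ITT effects are smaller than PP effects.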
the potential to expand global access to high-quality, effective mental healthcare.

The large clinical effect of the digital intervention across participants with moderate or severe symptoms highlights the clinical value of the combined program content and human support. Here, the PP (d = 1.3) and ITT (d = 0.8) effect sizes relative to waitlist are larger than the pooled effect size reported in a recent meta-analysis (n comparisons = 96, g = 0.26) (14). Unlike the PP sample, which is designed to demonstrate the clinical effectiveness of an intervention when the intervention is adhered to, the ITT sample provides an estimate of effectiveness more reflective of the real-world context by accounting for disengagement. The large ITT effect was significantly non-inferior to face-to-face therapy, and approaching significance for non-inferiority to typed-therapy (p = 0.06). Human-delivered care enables greater flexibility to respond to patient concerns and adapt content compared to a digital program; therefore, the comparable clinical effects and adherence rates across groups indicate the potential of this digital intervention to significantly impact real-world patient outcomes.

It is important to note that high relapse and recurrence rates have implications for both patient quality of life and economic healthcare costs; therefore, ensuring effects are durable is imperative (46–48). Incorporating cognitive and behavioral principles into daily life through practical exercises can enable meaningful behavioral change that persists beyond treatment end (16). Here, both the persistent clinical effect at one month follow-up and the significant improvement in the impact of anxiety on participants’ day-to-day functioning (as measured with the WSAS) highlight the potential of the intervention to instigate long-lasting behavioral change. Retrospective analysis of recurrence data from electronic health records is needed to accurately measure the persistence of the clinical effect in the real world over a longer follow-up time-period.

The engagement rate of the digital program (78%) and time to reach “engaged” (~2 hours of program interaction over 2 weeks) are comparable to engagement rates and time in therapy observed in NHS TT services for treatment of GAD (70%; 2022-2023) (49). Adherence rates across groups in the study were also similar. Average program interaction time (median 6.1 hours) across the ITT sample was greater than that reported for similar app-based interventions (e.g. median 3.4 hours) (50), indicating high engagement with the program. Study
medRxiv preprint doi: https://doi.org/10.1101/2024.07.17.24310551; this version posted July 17, 2024. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
It is made available under a CC-BY-NC-ND 4.0 International license .
attrition (32%) is higher than previous reports from studies of conversational agent-delivered mental health interventions (22%) (51), yet similar to real-world global treatment drop-out rates (~20-40%) (52,53). This may be due to the pragmatic design of the study: 30% of the sample recruited through ieso’s therapy referrals could choose to withdraw at any time and immediately access 1:1 human-delivered therapy; and participants had the option to discuss their progress or any issues with the clinical team at any point. These factors could have increased withdrawal rates relative to previous studies, but more readily reflect real-world patient choice and clinical decision-making.

To our knowledge, this study is the first to compare the effectiveness of a digital intervention to standard of care using external propensity-matched comparator groups from real-world patient data. There is increasing acceptance of externally controlled clinical trials (54–57), made possible by the availability of large-scale, standardized datasets. Generating external comparator groups reduces patient burden and study costs, and avoids delaying treatment for the comparator group receiving no intervention (58). However, creating standard of care control arms that are directly comparable to a novel intervention remains difficult due to differences in how to define comparable doses and treatment completion, and how to account for study-specific assessments. Moreover, the lack of randomization means selection bias and study effects are not controlled for in the current study. Nevertheless, this is more reflective of real-world care, where treatment outcomes are biased by a patient’s preference and choice over their treatment.

The clinical effect and engagement rate reported in the current study could have been driven by a combination of the three key features of the digital intervention: i) a curated and structured evidence-based program, ii) a conversational agent to deliver the program content, and iii) a human user and clinical support model akin to standard healthcare delivery. First, the structured evidence-based program was curated by a team of accredited cognitive behavioral therapists with an average of 14 years of direct clinical experience. The program used principles from traditional CBT (22) including third wave approaches, such as ACT. This approach encourages individuals to accept their thoughts and feelings while committing to actions aligned with their values. There is a growing body of evidence indicating that ACT demonstrates comparable effectiveness to other forms of CBT for anxiety disorders (59–61), and has been shown to be acceptable and engaging within a digital program for GAD (62,63).

Second, a conversational agent was used to personalize the content delivery and enhance engagement. Despite rapid growth in the development of AI conversational agents, use of this technology remains rare in digital mental health interventions, with only ~5% using this technology (14). The majority of these systems employ a tree-based dialogue approach, where natural language processing analyzes user input, and responses are selected from a predefined set of pre-written answers. However, previous research has shown users find this frustrating, particularly when it feels the agent does not understand them (64,65). Recent advances in the development of large language models now make it possible to flexibly generate personalized language for a more engaging user experience. In the current study, the digital program primarily used a tree-based dialogue system with controlled use of natural language generation in specific instances to enhance engagement. Increased use of generative technology and reduced reliance on tree-based approaches will continually improve the capability of conversational agents to create a personalized and engaging experience. However, allowing fully autonomous language generation within the context of mental health, where patient problems can be nuanced and complex and require the consideration of social and cultural contexts, poses a high risk of patient harm and misuse (66). Stringent validation of these new AI technologies with a phased roll out alongside human oversight will be essential to ensure patient safety (67).

Finally, a ‘blended’ design of human support and conversational technology has been suggested to be key for maximizing real-world engagement (51). Previous research has highlighted lack of trust, lack of user-centric design, privacy concerns, poor usability, and being unhelpful in emergencies as key drivers of poor engagement with digital interventions (12). To address these concerns, we mirrored a real-world treatment model including user support services, clinician referral to the program, proactive symptom monitoring and clinician availability for collaborative decision-making with each participant. This service created a credible and trustworthy patient experience that we believe positively impacted patient outcomes. Although this study was not designed to demonstrate the economic value of the intervention, the average clinician time spent per participant was <2 hours, which is significantly lower than current standards of care globally: approximately 4 times less than an average episode of treatment in the UK for GAD (~8 appointments between 45-60 mins; NHS Digital 2021-2022) (49) and approximately 8 times less globally (~15 appointments; mean across reported naturalistic studies in (68)). This new model, combining an AI-driven program with clinical support, allows the current, limited supply of trained therapists to help more people than current standards of care.
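The clinician-time comparison above can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the `CareEpisode` class and `reduction_factor` function are hypothetical names, the 2-hour figure is the upper bound on clinician time reported in the text (the actual average was lower, so the ratios are conservative), and the 45-60 minute session length for the global estimate is an assumption carried over from the UK figure, since the text gives only the appointment count.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CareEpisode:
    """Approximate clinician contact for one episode of human-delivered care."""
    label: str
    appointments: int
    minutes_low: int   # shortest typical session length
    minutes_high: int  # longest typical session length

    def hours_range(self) -> Tuple[float, float]:
        """Total clinician hours at the short and long session lengths."""
        return (self.appointments * self.minutes_low / 60,
                self.appointments * self.minutes_high / 60)


def reduction_factor(episode: CareEpisode, clinician_hours: float) -> Tuple[float, float]:
    """How many times more clinician time the episode needs than the program."""
    low, high = episode.hours_range()
    return (low / clinician_hours, high / clinician_hours)


# Upper bound reported in the text ("<2 hours" per participant).
CLINICIAN_HOURS_UPPER_BOUND = 2.0

uk = CareEpisode("UK episode for GAD", appointments=8, minutes_low=45, minutes_high=60)
# Session length is NOT given for the global figure; 45-60 minutes is assumed here.
global_mean = CareEpisode("Global naturalistic mean", appointments=15, minutes_low=45, minutes_high=60)

for episode in (uk, global_mean):
    low, high = reduction_factor(episode, CLINICIAN_HOURS_UPPER_BOUND)
    print(f"{episode.label}: {low:.1f}x to {high:.1f}x the clinician time")
```

Under these assumptions the UK ratio comes out at 3 to 4 and the global ratio at roughly 5.6 to 7.5, which matches the approximate 4-fold and 8-fold reductions quoted in the text when sessions run close to an hour.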
Limitations of the current study include the use of compensation for time for those who volunteered to participate, and the selection of a sample with limited low mood symptoms. In particular, in line with the study exclusion criteria, individuals with severe depression symptoms were not included. Nevertheless, the propensity-matching across groups controls for this, i.e. all groups included patients with similar baseline anxiety and depression symptoms. Differences in PP sample sizes across the control groups were likely driven by the definition of PP in each context rather than engagement, given similar adherence rates across the groups. Defining a comparable PP sample across groups is challenging due to differences in dose intensity, delivery mechanism and data collected, as well as significant variation across patients in both clinical presentation of generalized anxiety and response to treatment. The PP samples for the therapy control groups were based on completed episodes of care, and therefore were agnostic of therapy dose and would have included those who received a low number of sessions and recovered quickly. Those individuals would not have been included in the intervention PP sample, which was conservatively defined based on minimum program interaction.

There were also limitations in terms of the diversity of the intervention sample, with enrolled participants predominantly white, highly educated, and female. This sample is reflective of the typical profile of GAD patients in the UK and US (49,69). Although we attempted to increase diversity in this sample through focused marketing campaigns, these efforts were not successful. Needs differ across individuals, conditions and contexts, and a greater understanding of the barriers to research participation is required to fully understand these needs, particularly where groups have been systematically excluded from research, and where there is stigma around mental health. Increasing access to mental health support could play a substantial role in addressing unmet need in underserved groups; therefore, future work will aim to evidence the inclusivity of this digital intervention and its potential to counter existing health inequalities.

In conclusion, this study demonstrates that a digital intervention, designed for adults with symptoms of generalized anxiety, produces comparable outcomes to human-delivered CBT while significantly reducing the required clinician time. This result indicates the potential for digital interventions to provide high quality, evidence-based care at scale to address unmet need worldwide. As AI technologies rapidly progress, it is evident that generative dialogue systems that emulate creative and flexible human language will soon be widely accessible. This accessibility will radically change how individuals seek mental health support. Our responsibility lies in leveraging these advances, addressing the ethical and social challenges inherent in AI, and combining the best of technology with the best of clinical care to increase access to effective, safe and engaging mental health support for everyone. Rigorous evidence, particularly to understand the optimal blend of human and computer support for different individuals, will be key to accelerate precision treatment, maintain scalability, maximize uptake and adherence, and integrate digital interventions into health systems.

We extend our gratitude to the patients who participated in the study and to the dedicated clinicians and support staff involved. We would like to thank Gerald Chan, Stephen Bruso, Andy Richards, Ann Hayes, David Icke, Michael Black, Clare Hurley, Florian Erber, Richard Marsh, Sam Williams and Jo Parfrey for their support and encouragement. We are grateful to Prof Thalia Eley for introducing us to NIHR BioResource. We thank NIHR BioResource volunteers for their participation, and gratefully acknowledge NIHR BioResource centers, NHS Trusts and staff for their contribution. We thank the National Institute for Health and Care Research, NHS Blood and Transplant, and Health Data Research UK as part of the Digital Innovation Hub Program. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. We thank Dorset Healthcare University NHS Foundation Trust (DHC) for providing external data for comparison.

This research was funded by ieso Digital Health Ltd.

Chief Investigator (EMa) and other investigators (CEP, EMi, GW, MPE, EC, SL, AS, CH, JY, MB, LM, SM, RC, VT, AC, AW, AB) are employees of ieso Digital Health Limited (the company funding this research) or its subsidiaries. None of these authors had a direct financial incentive related to the results of this study or the publication of the manuscript.

CEP, EMa, AW & AB conceptualized the study. CEP and EMi drafted the paper. GW, EMi, MPE, EC, MB, AS & AC contributed to data analyses and interpretation. SL, JY, CH, EMa & CEP conducted the study. All authors contributed to the interpretation of results and paper revision, and approved the final version.

Owing to the potential risk of patient identification, and following data privacy policies at ieso and DHC, individual-level data are not available. Aggregated data are available upon request, subject to a data-sharing agreement with ieso and DHC. Data requests should be sent to the corresponding author and will be responded to within 30 days.
1. World Health Organisation. Mental Health | Key facts [Internet]. 2022 [cited 2024 Jun 24]. Available from: https://www.who.int/news-room/fact-sheets
2. Alonso J, Liu Z, Evans-Lacko S, Sadikova E, Sampson N, Chatterji S, et al. Treatment gap for anxiety disorders is global: Results of the World Mental Health Surveys in 21 countries. Depress Anxiety. 2018 Mar 1;35(3):195–208.
3. Our World in Data. Psychiatrists per 100,000 people [Internet]. 2024 [cited 2024 Jul 3]. Available from: https://ourworldindata.org/grapher/psychiatrists-working-in-the-mental-health-sector
4. Health Resources & Services Administration. Health Workforce Shortage Areas [Internet]. 2024 [cited 2024 Jul 3]. Available from: https://data.hrsa.gov/topics/health-workforce/shortage-areas
5. Roland J, Lawrance E, Insel T, Christensen H. THE DIGITAL MENTAL HEALTH REVOLUTION TRANSFORMING CARE THROUGH INNOVATION AND SCALE-UP: WISH 2020 Forum on Mental Health and Digital Technologies. 2020.
6. Clay R. Mental health apps are gaining traction [Internet]. 2021 [cited 2024 Jul 3]. Available from: https://www.apa.org/monitor/2021/01/trends-mental-health-apps
7. Torous J, Roberts LW. Needed innovation in digital health and smartphone applications for mental health transparency and trust. Vol. 74, JAMA Psychiatry. American Medical Association; 2017. p. 437–8.
8. Lattie EG, Stiles-Shields C, Graham AK. An overview of and recommendations for more accessible digital mental health services. Vol. 1, Nature Reviews Psychology. Nature Publishing Group; 2022. p. 87–100.
9. Borghouts J, Eikey E, Mark G, De Leon C, Schueller SM, Schneider M, et al. Barriers to and facilitators of user engagement with digital mental health interventions: Systematic review. J Med Internet Res. 2021 Mar 1;23(3).
10. M. Ng M, Firth J, Minen M, Torous J. User engagement in mental health apps: A review of measurement, reporting, and validity. Psychiatric Services. 2019;70(7):538–44.
11. Michie S, Yardley L, West R, Patrick K, Greaves F. Developing and evaluating digital interventions to promote behavior change in health and health care: Recommendations resulting from an international workshop. J Med Internet Res. 2017;19(6).
12. Torous J, Nicholas J, Larsen ME, Firth J, Christensen H. Clinical review of user engagement with mental health smartphone apps: Evidence, theory and improvements. Vol. 21, Evidence-Based Mental Health. BMJ Publishing Group; 2018. p. 116–9.
13. Tafradzhiyski N. Business of Apps | Mobile App Retention. [Internet]. 2023 [cited 2024 Jun 24]. Available from: https://www.businessofapps.com/guide/mobile-app-retention/
14. Linardon J, Torous J, Firth J, Cuijpers P, Messer M, Fuller-Tyszkiewicz M. Current evidence on the efficacy of mental health smartphone apps for symptoms of depression and anxiety. A meta-analysis of 176 randomized controlled trials. World Psychiatry. 2024 Feb 1;23(1):139–49.
15. David D, Cristea I, Hofmann SG. Why cognitive behavioral therapy is the current gold standard of psychotherapy. Front Psychiatry. 2018 Jan 29;9(JAN).
16. The National Collaborating Centre for Mental Health. The Improving Access to Psychological Therapies Manual The Improving Access to Psychological Therapies Manual - Appendices and helpful resources. 2018.
17. Ewbank MP, Cummins R, Tablan V, Catarino A, Buchholz S, Blackwell AD. Understanding the relationship between patient language and outcomes in internet-enabled cognitive behavioural therapy: A deep learning approach to automatic coding of session transcripts. Psychotherapy Research. 2020;1–13.
18. Ewbank MP, Cummins R, Tablan V, Bateup S, Catarino A, Martin AJ, et al. Quantifying the Association between Psychotherapy Content and Clinical Outcomes Using Deep Learning. JAMA Psychiatry. 2019 Jan 1;77(1):35–43.
19. Huckvale K, Venkatesh S, Christensen H. Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, and safety. Vol. 2, npj Digital Medicine. Nature Publishing Group; 2019.
20. Catarino A, Harper S, Malcolm R, Stainthorpe A, Warren G, Margoum M, et al. Economic evaluation of 27,540 patients with mood and anxiety disorders and the importance of waiting time and clinical effectiveness in mental healthcare. Nature Mental Health. 2023 Aug 31;1(9):667–78.
21. Taylor HL, Menachemi N, Gilbert A, Chaudhary J, Blackburn J. Economic Burden Associated with Untreated Mental Illness in Indiana. JAMA Health Forum. 2023 Oct 13;4(10):E233535.
22. Fenn K, Byrne M. The key principles of cognitive behavioural therapy. InnovAiT: Education and inspiration for general practice. 2013 Sep;6(9):579–85.
23. Wilson K, Hayes S, Strosahl K. Acceptance and commitment therapy: an experiential approach to behavior change. New York: Guilford Press; 2003.
24. Gilbody S, Brabyn S, Lovell K, Kessler D, Devlin T, Smith L, et al. Telephone-supported computerised cognitive-behavioural therapy: REEACT-2 large-scale pragmatic randomised controlled trial. British Journal of Psychiatry. 2017 May 1;210(5):362–7.
25. Schulz KF, Altman DG, Moher D. CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. BMJ (Online). 2010 Mar 27;340(7748):698–702.
26. Spitzer RL, Kroenke K, Williams JBW, Löwe B. A Brief Measure for Assessing Generalized Anxiety Disorder The GAD-7. Arch Intern Med. 2006;166:1092–7.
27. Kroenke K, Spitzer RL. The PHQ-9: A New Depression Diagnostic and Severity Measure. Psychiatr Ann. 2002 Sep;32(9):509–15.
28. Mundt JC, Marks IM, Shear MK, Greist JH. The Work and Social Adjustment Scale: A simple measure of impairment in functioning. British Journal of Psychiatry. 2002;180(MAY):461–4.
29. Rolffs JL, Rogge RD, Wilson KG. Disentangling Components of Flexibility via the Hexaflex Model: Development and Validation of the Multidimensional Psychological Flexibility Inventory (MPFI). Assessment. 2018 Jun 1;25(4):458–82.
30. O’Brien HL, Cairns P, Hall M. A practical approach to measuring user engagement with the refined user engagement scale (UES) and new UES short form. International Journal of Human Computer Studies. 2018 Apr 1;112:28–39.
31. Brooke J. Usability Evaluation in Industry. 1st Edition. 1996.
32. Hirani SP, Rixon L, Beynon M, Cartwright M, Cleanthous S, Selva A, et al. Quantifying beliefs regarding telehealth: Development of the Whole Systems Demonstrator Service User Technology Acceptability Questionnaire. J Telemed Telecare. 2017;23(4):460–9.
33. Hayes S, Follette V, Linehan M. Mindfulness and Acceptance: Expanding the Cognitive-Behavioral Tradition. Guilford Press; 2011.
34. Berg H, Akeman E, McDermott TJ, Cosgrove KT, Kirlic N, Clausen A, et al. A randomized clinical trial of behavioral activation and exposure-based therapy for adults with generalized anxiety disorder. Journal of Mood and Anxiety Disorders. 2023 Jun;1:100004.
35. Beatty C, Malik T, Meheli S, Sinha C. Evaluating the Therapeutic Alliance With a Free-Text CBT Conversational Agent (Wysa): A Mixed-Methods Study. Front Digit Health. 2022 Apr 11;4.
36. Boucher E, Honomichl R, Ward H, Powell T, Stoeckl SE, Parks A. The Effects of a Digital Well-being Intervention on Older Adults: Retrospective Analysis of Real-world User Data. JMIR Aging. 2022 Jul 1;5(3).
37. Cliffe B, Croker A, Denne M, Stallard P. Supported Web-Based Guided Self-Help for Insomnia for Young People Attending Child and Adolescent Mental Health Services: Protocol for a Feasibility Assessment. JMIR Res Protoc. 2018 Dec 1;7(12).
38. Robinson E, Titov N, Andrews G, McIntyre K, Schwencke G, Solley K. Internet treatment for generalized anxiety disorder: A randomized controlled trial comparing clinician vs. technician assistance. PLoS One. 2010;5(6).
39. [...] effective for generalized anxiety disorder: randomized controlled trial. Australian & New Zealand Journal of Psychiatry [Internet]. 2009;43(10):905–12. Available from: www.virtualclinic.org.au
40. Titov N, Dear BF, Johnston L, Lorian C, Zou J, Wootton B, et al. Improving Adherence and Clinical Outcomes in Self-Guided Internet Treatment for Anxiety and Depression: Randomised Controlled Trial. PLoS One. 2013 Jul 3;8(7).
41. Rothmann MD, Wiens BL, Chan ISF. Design and Analysis of Non-Inferiority Trials. Chapman and Hall/CRC; 2016.
42. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2016.
43. Toussaint A, Hüsing P, Gumz A, Wingenfeld K, Härter M, Schramm E, et al. Sensitivity to change and minimal clinically important difference of the 7-item Generalized Anxiety Disorder Questionnaire (GAD-7). J Affect Disord. 2020 Mar 15;265:395–401.
44. Clark DM. Realizing the Mass Public Benefit of Evidence-Based Psychological Therapies: The IAPT Program. Annu Rev Clin Psychol. 2018 May 7;14:159–83.
45. Ho DE, Imai K, King G, Stuart EA. MatchIt: Nonparametric Preprocessing for Parametric Causal Inference [Internet]. Vol. 42, JSS Journal of Statistical Software. 2011. Available from: http://www.jstatsoft.org/
46. Ali S, Rhodes L, Moreea O, McMillan D, Gilbody S, Leach C, et al. How durable is the effect of low intensity CBT for depression and anxiety? Remission and relapse in a longitudinal cohort study. Behaviour Research and Therapy. 2017 Jul 1;94:1–8.
47. Delgadillo J, Rhodes L, Moreea O, McMillan D, Gilbody S, Leach C, et al. Relapse and Recurrence of Common Mental Health Problems after Low Intensity Cognitive Behavioural Therapy: The WYLOW Longitudinal Cohort Study. Psychother Psychosom. 2018 Mar 1;87(2):116–7.
48. Shallcross AJ, Willroth Aaron Fisher EC, Dimidjian S, Gross JJ, Visvanathan Manhattan Mindfulness-Based Cognitive Behavioral Therapy Iris B Mauss PD. Relapse/Recurrence Prevention in Major Depressive Disorder: 26-Month Follow-Up of Mindfulness-Based Cognitive Therapy Versus an Active Control ScienceDirect. Behav Ther [Internet]. 2018; Available from: www.sciencedirect.com www.elsevier.com/locate/bt
49. NHS Digital. NHS Digital. 2024 [cited 2024 May 24]. NHS Talking Therapies, for anxiety and depression, Annual reports, 2022-23. Available from: https://digital.nhs.uk/data-and-information/publications/statistical/nhs-talking-therapies-for-anxiety-and-depression-annual-reports/2022-23#resources
50. Richards D, Enrique A, Eilert N, Franklin M, Palacios J, Duffy D, et al. A pragmatic randomized waitlist-controlled effectiveness and cost-effectiveness trial of digital interventions for depression and anxiety. NPJ Digit Med. 2020 Dec 1;3(1).
51. Jabir AI, Lin X, Martinengo L, Sharp G, Theng YL, Car LT. Attrition in Conversational Agent-Delivered Mental Health Interventions: Systematic Review and Meta-Analysis. J Med Internet Res. 2024 Jan 1;26(1).
52. Olfson M, Marcus S. National Trends in Outpatient Psychotherapy. Am J Psychiatry. 2010;167(12):1456–63.
53. Wells JE, Browne MO, Aguilar-Gaxiola S, Al-Hamzawi A, Alonso J, Angermeyer MC, et al. Drop out from out-patient mental healthcare in the World Health Organization’s World Mental Health Survey initiative. British Journal of Psychiatry. 2013 Jan;202(1):42–9.
54. U.S. Department of Health and Human Services Food and Drug Administration. Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products Guidance for Industry DRAFT GUIDANCE [Internet]. 2023. Available from: https://www.fda.gov/vaccines-blood-biologics/guidance-compliance-regulatory-information-biologics/biologics-guidances
55. National Institute for Health and Care Excellence. NICE real-world evidence framework [Internet]. 2022. Available from: www.nice.org.uk/corporate/ecd9
56. Thorlund K, Dron L, Park JJH, Mills EJ. Synthetic and external controls in clinical trials – A primer for researchers. Clin Epidemiol. 2020;12:457–67.
57. Corrigan-Curay J, Sacks L, Woodcock J. Real-world evidence and real-world data for evaluating drug safety and effectiveness. Vol. 320, JAMA - Journal of the American Medical Association. American Medical Association; 2018. p. 867–8.
58. Patterson B, Boyle MH, Kivlenieks M, Van Ameringen M. The use of waitlists as control conditions in anxiety disorders research. J Psychiatr Res. 2016 Dec 1;83:112–20.
59. American Psychological Association. DIAGNOSIS: Mixed Anxiety Conditions TREATMENT: Acceptance And Commitment Therapy For Mixed Anxiety Disorders [Internet]. 2015 [cited 2024 Jun 24]. Available from: https://div12.org/treatment/acceptance-and-commitment-therapy-for-mixed-anxiety-disorders/
60. Han A, Kim TH. Efficacy of Internet-Based Acceptance and Commitment Therapy for Depressive Symptoms, Anxiety, Stress, Psychological Distress, and Quality of Life: Systematic Review and Meta-analysis. J Med Internet Res. 2022 Dec 1;24(12).
61. [...] Disorder in Adults: A Systematic Review and Network Meta-Analysis of Randomized Clinical Trials. JAMA Psychiatry. 2024 Mar 6;81(3):250–9.
62. Kelson J, Rollin A, Ridout B, Campbell A. Internet-delivered acceptance and commitment therapy for anxiety treatment: Systematic review. Vol. 21, Journal of Medical Internet Research. JMIR Publications Inc.; 2019.
63. Hemmings NR, Kawadler JM, Whatmough R, Ponzo S, Rossi A, Morelli D, et al. Development and feasibility of a digital acceptance and commitment therapy-based intervention for generalized anxiety disorder: Pilot acceptability study. JMIR Form Res. 2021 Feb 1;5(2).
64. Coghlan S, Leins K, Sheldrick S, Cheong M, Gooding P, D’Alfonso S. To chat or bot to chat: Ethical issues with using chatbots in mental health. Digit Health. 2023 Jan;9.
65. Huang YS (Sandy), Dootson P. Chatbots and service failure: When does it lead to customer aggression. Journal of Retailing and Consumer Services. 2022 Sep 1;68.
66. Nuffield Council on Bioethics. The role of technology in mental healthcare [Internet]. 2022 [cited 2024 Jun 24]. Available from: https://www.nuffieldbioethics.org/assets/pdfs/The-role-of-technology-in-mental-healthcare.pdf
67. Stade EC, Stirman SW, Ungar LH, Boland CL, Schwartz HA, Yaden DB, et al. Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation. npj Mental Health Research. 2024 Apr 2;3(1).
68. Flückiger C, Wampold BE, Delgadillo J, Rubel J, Vîslǎ A, Lutz W. Is There an Evidence-Based Number of Sessions in Outpatient Psychotherapy? - A Comparison of Naturalistic Conditions across Countries. Psychother Psychosom. 2020 Aug 1;89(5):333–5.
69. Terlizzi E, Villarroel M. Symptoms of Generalized Anxiety Disorder Among Adults: United States, 2019 [Internet]. 2020 Sep [cited 2024 Jul 4]. Available from: https://www.cdc.gov/nchs/products/databriefs/db378.htm
70. Food and Drug Administration. Non-Inferiority Clinical Trials to Establish Effectiveness. Guidance for Industry [Internet]. 2016 Sep [cited 2024 Jul 11]. Available from: https://www.fda.gov/media/78504/download