Tolin, 2015 - Empirically Supported Treatment - Recommendations For A New Model

This document discusses the evolution of empirically supported treatments (ESTs) over the past 20 years. It notes that while most of the originally listed ESTs were cognitive-behavioral therapies, research has increasingly evaluated other approaches. The quality and quantity of treatment outcome studies have greatly increased, with more rigorous guidelines and systematic reviews. EST identification has influenced clinical training and practice, encouraging the use of evidence-based approaches. EST evaluation remains important as a starting point for evidence-based practice, which integrates research, clinical expertise, and patient characteristics.

Empirically Supported Treatment: Recommendations for a New Model
David F. Tolin, The Institute of Living and Yale University School of Medicine
Dean McKay, Fordham University
Evan M. Forman, Drexel University
E. David Klonsky, University of British Columbia
Brett D. Thombs, Jewish General Hospital and McGill University

Over the 20 years since the criteria for empirically supported treatments (ESTs) were published, standards for synthesizing evidence have evolved and more systematic approaches to reviewing the findings from intervention trials have emerged. Currently, the APA is planning the development of treatment guidelines, a process that will likely take many years. As an intermediate step, we recommend a revised set of criteria for ESTs that will utilize existing systematic reviews of all of the available literature, and recommendations that address the methodological quality, outcomes, populations, and treatment settings included in the literature.

Key words: clinical significance, empirically supported treatment, GRADE tool, systematic reviews. [Clin Psychol Sci Prac, 2015]

Address correspondence to David F. Tolin, Ph.D., Anxiety Disorders Center, The Institute of Living, 200 Retreat Avenue, Hartford, CT 06106. E-mail: [email protected].

doi:10.1111/cpsp.12122

CONSIDERATIONS IN THE EVALUATION OF EMPIRICALLY SUPPORTED TREATMENTS: ARE EMPIRICALLY SUPPORTED TREATMENTS STILL RELEVANT?

Over two decades ago, the Society of Clinical Psychology (Division 12 of the American Psychological Association [APA]), under the direction of then President David Barlow, first published criteria for what were initially termed “empirically validated psychological treatments” (Task Force on Promotion and Dissemination of Psychological Procedures, 1993) and later termed “empirically supported psychological treatments” (Chambless & Hollon, 1998; Chambless & Ollendick, 2001). The identification of empirically supported treatments (ESTs) has had substantial impact in psychology and related mental health disciplines. One immediately tangible effect of the movement to identify ESTs has been the validation of procedures for specific psychological problems, and the dissemination of that information to practitioners, consumers, and other stakeholders on the web (www.psychologicaltreatments.org).

Since (and perhaps in part due to) that early work, the quantity of treatment outcome studies has increased dramatically. A search of PsycINFO for the terms “randomized controlled trial” or “randomised controlled trial” (November 23, 2014) yielded only 20 citations for the year 1995, compared to 123 in 2000, 427 in 2005, and 950 in 2010. Among this increase in randomized controlled trials (RCTs), we see a wide range of therapeutic approaches being evaluated for efficacy. Since the development of the original list of ESTs, most of which were cognitive-behavioral treatments, efficacy trials for psychodynamic therapy (Milrod et al., 2007), transference-focused psychotherapy (Yeomans, Levy, & Caligor, 2013), family-based therapy (Lock et al., 2010), and interpersonal psychotherapy (e.g., Parker, Parker, Brotchie, & Stuart, 2006) have

© 2015 American Psychological Association. Published by Wiley Periodicals, Inc., on behalf of the American Psychological Association.
All rights reserved. For permissions, please email: [email protected].
appeared, to name a diverse few. The result has been a greater emphasis on empiricism among approaches that previously lacked a history of accumulating research support. This increase in diverse outcome research has shifted the debate among practitioners of different theoretical persuasions from mere assertions of theory to a consideration of empirical evidence.

The quality of available research evidence has also increased substantially over the past 20 years. Detailed and stringent guidelines have now been published regarding the execution and reporting of methodologically sound treatment outcome studies (Moher, Schulz, & Altman, 2001), and leading psychology journals such as the Journal of Consulting and Clinical Psychology require that manuscripts adhere to such guidelines (retrieved November 23, 2014, from http://www.apa.org/pubs/journals/ccp/index.aspx). These changes have led to a greater emphasis on study quality. Given the emphasis on establishing procedures as empirically supported, guidebooks have been published that carefully document how to design sound therapy research investigations (e.g., Areán & Kraemer, 2013). Recently, a review of trials of psychodynamic and cognitive-behavioral therapies, using a rating scale of various aspects of methodological quality and study reporting (Kocsis et al., 2010), concluded that study quality and reporting have been significantly improving over the past four decades (Thoma et al., 2012).

The EST movement has led to changes in how students are trained in clinical practice. Although training programs still have a wide degree of latitude, EST lists help guide curricula and inform syllabi. Most prominently, the APA Commission on Accreditation’s Guidelines and Procedures (2013) encourages programs to train students in assessment and treatment procedures based on empirically supported methods, encourages placement in training settings that employ empirically supported approaches, and encourages internship training sites to include methods of demonstrating that interns possess intermediate to expert-level knowledge in ESTs.

Finally, the development of lists of ESTs has resulted in greater protections for the public. By developing a list of established and empirically supported interventions, treatment-seeking individuals are now better able to learn about and seek out information on well-validated treatments for specific disorders and problem areas. This increased consumer education encourages clinicians who might otherwise not have practiced in an empirically supported manner to acquire the necessary skills to begin offering scientifically based treatments. Perhaps the most ambitious illustration of the impact of the movement toward scientifically tested treatments on clinical practice are the National Institute of Clinical Excellence standards in the United Kingdom (NICE; Baker & Kleijnen, 2000), established to ensure that clinicians practice specific and accepted empirically based interventions for different psychological conditions (see http://guidance.nice.org.uk/Topic/MentalHealthBehavioural). Similarly, the Veterans Health Administration, which serves nearly 6 million veterans in the United States, has undertaken a complete overhaul of its mental health practices and is implementing a systemwide dissemination of empirically based treatments for posttraumatic stress disorder, depression, and serious mental illness (Ruzek, Karlin, & Zeiss, 2012).

Importantly, the early work on ESTs was an important catalyst for the APA’s relatively recent emphasis on evidence-based practice (EBP). EBP is a broad template of activities that include assessment, case formulation, relationship factors, and treatment decisions that will assist the clinician to work with a patient to achieve the best possible outcome. In 2006, a Presidential Task Force of the American Psychological Association (APA Presidential Task Force on Evidence-Based Practice, 2006) adapted the Institute of Medicine’s (2001) definition of evidence-based medicine, defining EBP as practice that integrates three sources of information: patient characteristics, clinical expertise, and the best available research evidence.

It might well be asked, given the broad movement in psychology and other health disciplines toward EBP, whether identification of ESTs is still a necessary task. We argue that it is, perhaps now more than ever. The “three-legged stool” of research evidence, patient characteristics, and clinician expertise leaves room for debate about the relative importance of each; however, we suggest that EBP is best approached as starting from the perspective of ESTs: that is, for any given problem, what treatment or treatments have proven efficacious? This scientific information is then interpreted and potentially adapted based on clinician expertise and

CLINICAL PSYCHOLOGY: SCIENCE AND PRACTICE 2


patient characteristics. Thus, where treatment selection is concerned, EBP might be thought of as an approach to ESTs, filtering that scientific information through the clinician’s and patient’s “lenses” (Djulbegovic & Guyatt, 2014; Tolin, 2014).

As a brief example, a clinician may want to select a treatment approach for an impoverished African American man with a presenting complaint of depression, as well as a significant drinking problem. Most likely, no published list of ESTs will match this situation perfectly. However, using the “filter system” of EBP may lead to a helpful solution. Examination of the available ESTs for depression alerts the clinician to the fact that behavioral activation has strong empirical support in the treatment of depression (Lejuez, Hopko, & Hopko, 2001; Lewinsohn, Biglan, & Zeiss, 1976; Martell, Addis, & Jacobson, 2001). The contributing research, however, did not address the present patient’s characteristics such as socioeconomic status, race, and the presence of a co-occurring substance use disorder. The clinician would therefore rely on expertise and additional research to understand how an EST such as behavioral activation might be adapted in a manner that successfully addresses these issues. These modifications might include specific cultural adaptations (Benish, Quintana, & Wampold, 2011; Griner & Smith, 2006; van Loon, van Schaik, Dekker, & Beekman, 2013) or the addition (either concurrently or sequentially) of an EST for drinking problems such as behavioral couples therapy (O’Farrell, Cutter, Choquette, Floyd, & Bayog, 1992) or contingency management (Petry, Martin, Cooney, & Kranzler, 2000). The treatment(s) must also be delivered competently in a way that successfully engages the patient, thus requiring a high level of clinical competency and cross-cultural awareness. The process starts, however, with identification of a specific EST. To make informed decisions, patients and clinicians must be aware of the available scientific evidence, and the degree of confidence that can be placed in that evidence.

WHY DOES THE LIST NEED TO BE REVISED?

Many authors, including those broadly in agreement with the EST concept in theory, have raised significant concerns about how ESTs are currently defined. Many of the critiques of the EST movement point to problems in how research evidence is synthesized and evaluated. The original Division 12 report on ESTs delineated specific criteria (see Table 1) by which a treatment would be regarded as “probably efficacious” or “well established” (Chambless & Hollon, 1998; Chambless & Ollendick, 2001; Task Force on Promotion and Dissemination of Psychological Procedures, 1993), and these criteria are still being used today. In brief, to meet the highest standard of “well established,” a treatment must be supported by (a) at least two independently conducted, well-designed studies or (b) a large series of well-designed and carefully controlled single-case design experiments. To meet the standard of “probably efficacious,” a treatment must be supported by at least one well-designed study or a small series of single-case design experiments.

Table 1. Current definitions of “well established” and “probably efficacious” treatments (adapted from Chambless et al., 1998)

Well Established
  I.   At least two good between-group design experiments demonstrating efficacy in one or more of the following ways:
       A. Superior (based on statistical significance alone) to pill or psychological placebo or to another treatment.
       B. Equivalent to an already established treatment in experiments with adequate statistical power, considered to be approximately 30 per group.
  OR
  II.  A large series of single-case design experiments (n > 9) demonstrating efficacy. These experiments must have:
       A. Used good experimental designs and
       B. Compared the intervention to another treatment as in IA.
  Further criteria for both I and II:
  III. Experiments must be conducted with treatment manuals.
  IV.  Characteristics of the client samples must be clearly specified.
  V.   Effects must have been demonstrated by at least two different investigators or investigating teams.

Probably Efficacious
  I.   Two experiments showing the treatment is superior (based on statistical significance alone) to a waiting-list control group.
  OR
  II.  One or more experiments meeting all criteria for well-established treatments except V (demonstration by independent investigator teams).
  OR
  III. A small series of single-case design experiments (n > 3) meeting well-established treatment criteria II, III, and IV.

Given the proliferation of clinical research over the past two decades, the improved quality of clinical research, and the adoption of more sophisticated methods for research synthesis and evaluation, we concur with many critics who have suggested that the current criteria are outdated (see Table 2). The evaluation based on two studies sets an unacceptably low bar for efficacy, may not account for mixed findings, and risks creating a misrepresentative and highly selective impression of efficacy (Borkovec & Castonguay, 1998; Henry, 1998; Herbert, 2003). For example, if two studies find evidence that a given treatment is efficacious, five studies find the treatment is no better than placebo, and 10 studies find that the treatment is worse than placebo, the current criteria for a designation of a “well-established” EST would be satisfied. This is not a hypothetical scenario, and many bodies of treatment evidence include some studies with statistically significant results favoring a treatment and other studies that report null or even negative findings. This is a problem that occurs across areas of research, and its influence has been well documented in the evidence on pharmaceutical products, where a clear bias for trials favorable to a sponsored product has been demonstrated (Lexchin, Bero, Djulbegovic, & Clark, 2003; Lundh, Sismondo, Lexchin, Busuioc, & Bero, 2012). Registration of clinical trials (e.g., at www.clinicaltrials.gov) is increasingly emphasized to address this problem, not only for pharmaceutical studies but also for studies of psychological interventions, although poor adherence to registration policies and poor quality of trial registrations have been problematic (Riehm, Azar, & Thombs, 2015).

Table 2. Common critiques of the EST movement and suggested changes

Area: Concerns about the strength of treatment
  Critiques:
    • Inadequate attention to null or negative findings
    • Reliance on statistical, rather than clinical, significance
    • Inadequate attention to long-term outcomes
    • Potentially significant variability in study quality
  Proposed Changes:
    • Emphasize systematic reviews rather than individual studies
    • Separate strength of effect from strength of evidence
    • Grade quality of studies
    • Consider clinical significance in addition to statistical significance
    • Consider long-term efficacy in addition to short-term efficacy

Area: Concerns about selecting among multiple treatment options
  Critiques:
    • Within a given EST category, there is little basis for choosing one over another
    • Lack of clarity about whether empirical support translates to a recommendation
  Proposed Changes:
    • Present quantitative information about treatment strength
    • Make specific recommendations based on clinical outcomes and the quality of the available research

Area: Concerns about the relevance of findings
  Critiques:
    • Inadequate attention to functional outcomes
    • Inadequate attention to effectiveness in nonresearch settings or with diverse populations
  Proposed Changes:
    • Include functional or other health-related outcomes as well as symptom outcomes
    • Address generalization of research findings to nonresearch settings and diverse populations

Area: Concern about unclear active treatment ingredients and the proliferation of manuals for specific diagnoses
  Critiques:
    • Listing of packaged treatments rather than empirically supported principles of change
    • Emphasis on specific psychiatric diagnoses
  Proposed Changes:
    • Evaluate and encourage dismantling research to identify empirically supported principles of change
    • De-emphasize diagnoses and emphasize syndromes/mechanisms of psychopathology

The exclusive focus on symptom reduction risks ignoring other potentially important clinical outcomes, such as functional impairment (Dobson & Beshai, 2013), despite the fact that functional concerns are a leading reason for individuals to seek treatment (Hunt & McKenna, 1993). Although symptom reduction and improvements in functioning are significantly correlated, there can be a mismatch after treatment (see Vatne & Bjorkly, 2008, for review). Thus, it is possible that a treatment is highly effective at reducing specific target symptoms, and yet the patient fails to achieve desired clinical outcomes such as improved social or occupational functioning. Therefore, a number of scholars have cautioned against overreliance on symptom-based evaluations of efficacy and have instead urged consideration of wellness, quality of life, well-being, and functionality (Cowen, 1991; Hayes, 2004; Seligman, 1995). We propose that symptom reduction no longer be considered the sine qua non of treatment outcome. Symptom reduction is important in determining the efficacy of a treatment, but the value of symptom reduction is greatly diminished if functional improvement is not also demonstrated. Functional outcomes address domains of psychosocial functioning, which may include work attendance or performance, school attendance or performance, social engagement, or family functioning. Several measures of such functional outcomes have been published, including the Sheehan Disability Scale (Sheehan, 2008), Leibowitz Self-rating Disability Scale (Schneier et al., 1994), Work and Social Adjustment Scale (Mundt, Marks, Shear, & Greist, 2002), Range of Impaired Functioning Tool (Leon et al., 1999), and the functional subscales of the Outcomes Questionnaire (Lambert et al., 1996), in addition to a wide array of performance-based functional tests from disciplines such as industrial/organizational psychology. The value of specific measures in the evidence review will depend on their psychometric properties and direct relevance to the clinical problem being treated.

Quality of life (QOL) is a less well-defined construct (Gill & Feinstein, 1994), which is problematic for many trials of psychological treatment, given its apparently strong overlap with depression (Keltner et al., 2012). We therefore concur with Muldoon, Barger, Flory, and Manuck (1998) that objective functioning and subjective appraisals of well-being be considered separately. Nevertheless, there is increasing interest in QOL as an outcome measure in trials of psychological treatments, particularly in the United Kingdom (e.g., Layard & Clark, 2014), and its inclusion in treatment guidelines should be considered carefully going forward.

There is, at present, no clear way to establish whether a treatment has proven effective with diverse populations or in more clinically representative settings (Beutler, 1998; Goldfried & Wolfe, 1996, 1998; Gonzales & Chambers, 2002; Norcross, 1999; Seligman, 1996). Concerns about the transportability of treatment include the fact that patients seen in routine clinical practice might be more complex or heterogeneous than those in efficacy-oriented RCTs, that willingness to be randomized to treatments may be a confounding factor that diminishes sample representativeness, and that the therapists used in efficacy RCTs are more highly trained, specialized, monitored, or structured than are those working in routine clinical settings. The issue of treatment generalizability is complex. Patients seen in clinical settings do not necessarily appear more complex or severe than those seen in clinical trials; in one study of clinical outpatients deemed ineligible for depression research trials, the most common reasons for exclusion were partial remission of symptoms at intake and insufficient severity or duration of symptoms. Importantly, of those meeting criteria for major depression, none were excluded due to Axis I or Axis II comorbidity (Stirman, DeRubeis, Crits-Christoph, & Rothman, 2005).

Evidence for differential efficacy of treatments administered in research versus clinical settings is mixed. In some cases, randomized and nonrandomized patients receiving similar treatments appear to do equally well (Franklin, Abramowitz, Kozak, Levitt, & Foa, 2000), whereas in other cases, treatments administered in a research setting yield outcomes superior to the same treatments administered in a clinical setting (Gibbons, Stirman, DeRubeis, Newman, & Beck, 2013; Kushner, Quilty, McBride, & Bagby, 2009). The reasons for a possibly stronger response in research trials are unclear, but could include factors such as therapist training and fidelity monitoring, setting time limits for treatment, and providing feedback to clinicians and patients on treatment progress.

Many have called for a greater emphasis on effectiveness research, which focuses primarily on the generalizability of the treatment to more clinically representative situations. We therefore suggest that the evaluation of ESTs attend not only to the efficacy of a treatment in research settings, but also to that treatment’s effectiveness in nonresearch settings. Criteria that could be considered include effectiveness with more diagnostically complex patients, effectiveness with nonrandomized patients, effectiveness when used by nonacademic practitioners, and utility in open-ended, flexible practice.

The internal validity and degree of research bias in clinical trials are not adequately addressed, potentially making trials prone to false-positive results (Luborsky et al., 1999; Wachtel, 2010). Internal validity relates to the degree to which a given trial likely answers the research question being evaluated correctly or free from bias. Bias is systematic error that can lead to underestimation or overestimation of true treatment



effects (Higgins & Green, 2008). It is not usually possible to know with precision the degree to which design flaws may have influenced results in a given treatment trial, but elements of trial design have been shown to be related to bias. In RCTs, generally, design weaknesses related to allocation concealment, blinding, and randomization methods may be expected to influence effect estimates, particularly when outcomes are subjective (Savovic et al., 2012), which is the case in most trials of psychological treatments (Wood et al., 2008). An additional example is the researcher allegiance effect (Gaffan, Tsaousis, & Kemp-Wheeler, 1995; Luborsky et al., 1999). The presence of researcher allegiance does not necessarily imply bias (Hollon, 1999; Leykin & DeRubeis, 2009); however, it is a risk factor that has been shown empirically to be associated with some probability of bias. Financial conflict of interest, a demonstrated source of publication bias in pharmaceutical studies (Friedman & Richter, 2004; Lexchin et al., 2003; Perlis et al., 2005), may also be considered in rating risk of bias (Bero, 2013; Roseman et al., 2011, 2012), although conflict of interest may be harder to identify and quantify in studies of psychological treatments.

DOES THE WORLD NEED ANOTHER LIST OF ESTS?

Even though, as we argue, it remains of vital importance to identify ESTs, one might ask whether another list would be beneficial to the field. We suggest that a well-designed list could be of great import, filling noticeable gaps in the available knowledge. Three alternative systems with which readers are likely to be familiar include the NICE standards in the United Kingdom (Baker & Kleijnen, 2000), the Practice Guidelines published by the American Psychiatric Association (e.g., 2009, 2010), and the Veterans Administration/Department of Defense Clinical Practice Guidelines (e.g., Veterans Health Administration, 2004, 2009). These systems are available immediately and have the advantage of addressing both psychological and pharmacological treatments. However, the breadth of these systems is also a limitation for psychologists. As broad guidelines, they lack the depth of information that clinical psychologists or other psychotherapy practitioners would need to make informed treatment decisions. For example, in the NICE guidelines for panic disorder (National Institute for Clinical Excellence, 2011), clinicians are advised to use cognitive-behavioral therapy (CBT). We would not disagree with this recommendation; however, NICE provides little means for understanding what kind of CBT is most helpful or the strength of various interventions. Thus, although existing guidelines are comprehensive and immediately available, we argue that there is room for an alternative source of information for consumers of research on psychological treatments. As the Society of Clinical Psychology has been at the forefront of identifying and disseminating ESTs for the past two decades and is one of the most prominent organizations dedicated to psychological ESTs in particular, it is logical for this group to take the lead in this next phase of treatment evaluation.

In recent years, the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines was formed to provide research-based recommendations for the psychological treatment of particular disorders (Hollon et al., 2014). When in place, guideline development panels, under the direction of the Steering Committee, will oversee the development of clinical practice guidelines. A number of steps that have been proposed by the Advisory Steering Committee to generate patient-focused, scientifically based, clinically useful guidance point the way toward steps that should be taken for a much-needed update of EST standards. Two of them, in particular, should be central to modernizing EST standards: (a) the evaluation of all existing evidence via high-quality systematic reviews, which include (i) evaluation of relevance to clinical practice, including treatment fidelity; (ii) an assessment of risk of bias; and (iii) other considerations, including evaluation of multiple clinical outcomes, including functional, as well as symptom, outcomes; and (b) a committee-based appraisal of the evidence, using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system (Atkins et al., 2004; Guyatt et al., 2006, 2008) to assess the quality of relevant evidence and degree to which benefits are established in excess of potential harms.

The proposed process by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines represents an important step forward in initiating a disorder-based guideline development process



for psychological treatments. This process, which parallels that used by the Institute of Medicine (2011a, 2011b), is expected to result in a transparent system of treatment recommendations for practitioners and consumers. However, it is an expensive and extremely time-consuming process, and it is unlikely that the Task Force will develop recommendations for a wide range of clinical problems in the immediate future. Indeed, the APA initiated a process for producing guidelines in 2010 and announced panels to develop guidelines for the treatment of obesity and posttraumatic stress disorder in 2012 and for depressive disorders in 2013, but has not yet generated any finished guidelines. Thus, there is an immediate need for dissemination of up-to-date, evidence-based guidance that can not only complement the work of the APA Task Force, but also provide practitioners with clear information about the strength of ESTs and the degree of confidence that can be derived from the available evidence.

TO WHAT EXTENT SHOULD WE FOCUS ON ESTABLISHED TREATMENTS, VERSUS PRINCIPLES OF CHANGE?

Over time, the field would likely benefit from a shift away from “named” or “packaged” treatments. The current EST list includes more recent multicomponent treatments that contain many different interventions within one treatment “package.” CBT for fibromyalgia, as one example of a treatment currently identified as well established, is described as including education, relaxation, graded behavioral activation, pleasant activity scheduling, sleep hygiene, stress management, goal setting, structured problem solving, reframing, and communication skills (Bernardy, Fuber, Kollner, & Hauser, 2010). While the assessment of such treatment packages is a necessary step in identifying what works, such research does not allow for a determination of which aspects of the treatment are responsible for change (Borkovec & Castonguay, 1998; Gonzales & Chambers, 2002; Henry, 1998). That is, within a given treatment package, there is no way to determine which components of that treatment are therapeutically active or inert. As a result, practitioners are often unable to make informed decisions about which treatments to use (Herbert, 2003; Rosen & Davison, 2003; Westen, Novotny, & Thompson-Brenner, 2004), and many treatments may work for reasons other than those proposed by the treatment developers (Lohr, Tolin, & Lilienfeld, 1998).

An emphasis on identifying the active ingredients of change need not exclude factors associated with the therapeutic relationship. Indeed, many have suggested that the therapeutic relationship accounts for greater variance in clinical outcomes than do those aspects of the therapy that are described as “techniques” (Blatt & Zuroff, 2005; Henry, 1998; Lambert & Barley, 2001; Norcross, 1999). Relationship-oriented therapist behaviors are themselves subject to empirical scrutiny.

A pressing question, however, is whether there is enough research to date to make meaningful recommendations to practitioners, consumers, and other stakeholders based solely on empirically supported principles of change. We suggest that the field is approaching that target, but has not yet arrived. Certainly, there is much work being done in this area (e.g., Castonguay & Beutler, 2006); however, in our opinion, the field has not yet amassed a body of evidence that would adequately address the multiple concerns of patients seen in clinical settings. As just one example, a recent review concluded that the mechanisms of prolonged exposure (PE) for posttraumatic stress disorder (PTSD), which is a well-studied and fairly straightforward treatment, remain unclear (Zalta, 2015). It would be difficult, therefore, to evaluate only mechanism-based processes at this time, although we believe that such research should be emphasized going forward.

HOW SHOULD WE HANDLE TREATMENTS WITH CONFLICTING EVIDENCE?

As noted previously, a primary limitation of the existing criteria is that they allow reviewers to select two positive studies, while potentially ignoring studies with null or even negative outcomes. In our view, the only defensible strategy is a systematic (quantitative) review that takes into account all of the available research evidence, rather than selecting a limited number of positive studies. This is the approach that has been proposed by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines (Hollon et al., 2014). Twenty years ago, there were not enough controlled research trials, in many cases, for such a process to be feasible. Today, however, the field has seen a marked increase in published research,
be weakened by ineffective components and/or work making larger-scale reviews possible.
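
The contrast between a quantitative review of all available trials and the selection of two positive studies can be illustrated with a minimal sketch. It is not a procedure taken from the source: it pools a set of invented trial results with a fixed-effect, inverse-variance weighted average (one common meta-analytic estimator; the text does not specify which model reviewers should use) and compares the result with pooling only the two most favorable trials. The trial values and the names `trials` and `pooled_effect` are assumptions made for the example.

```python
import math

# Hypothetical trials: (standardized mean difference d, standard error).
# These numbers are invented for illustration only.
trials = [
    (0.80, 0.30),
    (0.65, 0.28),
    (0.10, 0.25),
    (-0.05, 0.22),
    (0.20, 0.26),
]


def pooled_effect(studies):
    """Fixed-effect inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / (se ** 2) for _, se in studies]
    total = sum(weights)
    estimate = sum(w * d for w, (d, _) in zip(weights, studies)) / total
    return estimate, math.sqrt(1.0 / total)


# Keep only the two most favorable trials, ignoring null or negative results.
two_best = sorted(trials, key=lambda t: t[0], reverse=True)[:2]

all_d, all_se = pooled_effect(trials)
best_d, best_se = pooled_effect(two_best)

low, high = all_d - 1.96 * all_se, all_d + 1.96 * all_se
print(f"all trials:    d = {all_d:.2f} (95% CI {low:.2f} to {high:.2f})")
print(f"two best only: d = {best_d:.2f}")
```

With these invented numbers, the estimate based on the two most favorable trials is larger than the estimate based on all five, which is the kind of distortion a systematic review is meant to avoid.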

EMPIRICALLY SUPPORTED TREATMENTS ! TOLIN ET AL. 7


HOW MUCH WEIGHT SHOULD WE AFFORD IMMEDIATE VERSUS LONG-TERM EFFICACY OF TREATMENTS?
Both short-term and long-term outcomes of psychological treatment are important. Short-term outcomes are frequently the strongest and give the best estimate of the immediate efficacy of the treatment. However, it is quite possible that a given treatment is effective in the short term but not at a time point well after treatment discontinuation (i.e., participants exhibited signs of relapse). In some cases, this might reflect a basic weakness of the treatment, suggesting that its effects are not durable. In some other cases, it could be argued that the treatment is only effective so long as one remains in the treatment; so long as the treatment can be feasibly delivered on a maintenance basis, this is not necessarily a fatal flaw. For example, while many have pointed out that gold standard cognitive-behavioral treatments for obesity have short-term effects (most people eventually gain back their lost weight), others point out that a continuous care model is both feasible and better suited to the problem of overeating (Perri, Sears, & Clark, 1993). In still other cases, a lack of long-term efficacy may reflect the presence of competing issues (e.g., chronic psychosocial stressors) that complicate the long-term prognosis despite an apparently successful treatment, suggesting the need for supplemental intervention. Alternatively, it is possible that a treatment might show only modest clinical effects at immediate posttreatment, but outcomes become stronger after treatment discontinuation (sleeper effects) due to memory consolidation effects, skill practice effects, naturalistic reinforcement, or other factors. Consumers, practitioners, and policymakers should be able to evaluate both short- and long-term treatment effects as part of a systematic review.

HOW SHOULD WE ADDRESS EFFICACY VERSUS EFFECTIVENESS?
Many authors have questioned whether the results of RCTs conducted in clinical research settings will translate to more clinically representative settings such as private practice, community mental health centers, and hospitals (Beutler, 1998; Goldfried & Wolfe, 1996, 1998; Gonzales & Chambers, 2002; Norcross, 1999; Seligman, 1996). Concerns about the transportability of treatment include the fact that patients seen in routine clinical practice might be more complex or heterogeneous than those in efficacy-oriented RCTs, that willingness to be randomized to treatments may be a confounding factor that diminishes sample representativeness, and that the therapists used in efficacy RCTs are more highly trained, specialized, monitored, or structured than are those working in routine clinical settings. Many have therefore called for a greater emphasis on effectiveness research, which focuses primarily on the generalizability of the treatment to more clinically representative situations.

We suggest that treatments should be evaluated from both an efficacy and effectiveness perspective. Specifically, it is important to identify treatments that are not only efficacious in research-based settings but have also demonstrated evidence of effectiveness in more typical clinical settings. Criteria that could be considered include more diagnostically complex patients, effectiveness with nonrandomized patients, effectiveness when used by nonacademic practitioners, and utility in open-ended, flexible practice.

HOW SHOULD TREATMENT COSTS AND BENEFITS BE WEIGHED?
There is, unfortunately, no quantitative “gold standard” for determining whether or not a treatment is cost-effective. Nevertheless, cost-effectiveness considerations must be taken into account. Two treatments may show similar clinical effects, but if one treatment is clearly more costly to consumers, third-party payers, or society (e.g., the treatment requires a very large number of sessions, long duration, or hospitalization), then this should be taken into consideration. It would be prohibitive to conduct a full cost-benefit analysis of every psychological treatment, but a reasonable panel of reviewers should be able to upgrade or downgrade a treatment based on obvious strengths or weaknesses in cost or patient burden.

WHAT STRENGTH OF EFFECT SHOULD BE CONSIDERED “GOOD”?
Various attempts to define cutoffs of “good response” have been proposed. Cohen (1988), for example, suggested that effect sizes (d) of 0.2, 0.5, and 0.8 be considered small, moderate, and large effects, respectively. Others have proposed varying definitions of treatment

CLINICAL PSYCHOLOGY: SCIENCE AND PRACTICE 8


response and remission (Andreasen et al., 2005; Doyle & Pollack, 2003; Frank et al., 1991; McIntyre, Fallu, & Konarski, 2006; Simpson, Huppert, Petkova, Foa, & Liebowitz, 2006), usually operationalized as a cutoff score on a standardized measure. Similarly, many have called for the use of reliable change (demonstration that reduction on a measure is greater than would be expected to occur at random) and clinically significant change (variously described as posttreatment scores no longer in the pathological range, posttreatment scores in the normal range, or posttreatment scores that are closer to the normal range than the pathological range) as outcome criteria (Jacobson, Follette, & Revenstorf, 1984; Lambert & Bailey, 2012). Some have used the criterion of good end-state functioning (e.g., Feeny, Zoellner, & Foa, 2002), reflecting scores in the normal range on a variety of different measures, not solely measures of the disorder being treated. From a population-based perspective, some have suggested the use of statistics such as number needed to treat (NNT), reflecting the number of patients needed to treat to observe one improvement.

These methods (many of which overlap considerably) all have their individual strengths and weaknesses. Ultimately, however, there is no clear consensus in the field to tell us how strong an effect must be observed before we pronounce a treatment to be efficacious. In our view, the degree to which treatment effects are considered clinically meaningful is highly dependent on contextual factors such as the disorder being treated and the goals of treatment. In a case of (for example) mild depression treated on an outpatient basis, full remission and good end-state functioning might be considered appropriate targets, and one might be skeptical of a treatment that fails to achieve those goals. On the other hand, for chronically psychotic patients seen in residential or day treatment, improvements in psychosocial functioning, regardless of the presence of psychotic symptoms, might be considered an appropriate goal, and full remission would not be reasonably expected. Brief inpatient interventions for suicidality may have as their aim the reduction of suicidal ideation and behavior, but not necessarily the remission of depression. Interventions with medical populations might aim to improve compliance with treatment regimens, but not necessarily address the symptoms of a psychological disorder directly. Thus, a “clinically meaningful” treatment result for one group and purpose might not be suitable for another group and purpose. The conclusion that a treatment is “efficacious” therefore is a subjective process that requires human decision-making.

A PROPOSED SYSTEM OF TREATMENT EVALUATION FOR THE SOCIETY OF CLINICAL PSYCHOLOGY
As described previously, the proposed process of systematic evaluation by the APA Advisory Steering Committee for the Development of Clinical Practice Guidelines represents a clear move in the right direction. However, we argue that there remains a need, both due to the time-consuming nature of the APA process and due to the specific needs of clinical psychologists and consumers for evidence-based decision-making, for the Society of Clinical Psychology to create a new system by which scientific evidence of treatment efficacy can be evaluated and disseminated in a clear, transparent, and cost-effective manner that prioritizes the empirical basis of psychological treatments.

The system we propose here is consistent with the methods that will be used by the APA Task Force (Hollon et al., 2014), but requires less time and therefore can provide more rapid dissemination of findings and recommendations. The most time-consuming aspect of the APA Task Force will be the systematic review of research findings. That process could be greatly sped up by using existing, published systematic reviews of the literature. Since the original EST criteria were developed, systematic reviews and meta-analyses have become available for most interventions, and for many of these, the Task Force will be able to use high-quality reviews that have already been published in order to expedite its work.

We note as well that although many of the existing clinical trials and systematic reviews are based on participants selected according to diagnostic criteria (e.g., those listed in the Diagnostic and Statistical Manual of Mental Disorders [5th ed.; DSM-5; American Psychiatric Association, 2013]), there is no requirement that they do so. Indeed, the reliability and validity of the DSM and the medical taxonomy implied therein have been critiqued as a basis for psychotherapy research (Fensterheim & Raw, 1996; Henry, 1998). Over the coming



years, we encourage clinical psychology researchers to focus on distinct, empirically derived syndromes of psychopathology (which can range from mild to severe), rather than on categorical diagnoses. Such a shift would comport well with the Research Domain Criteria (RDoC) project currently underway within the National Institute of Mental Health (Insel et al., 2010), although the specific RDoC dimensions may or may not be those chosen as targets for psychotherapy research. That shift would also likely decrease the EST movement’s reliance on a large number of treatment manuals, a process to which many authors, even those supportive of the broad EST movement, object (e.g., Fonagy, 1999; Goldfried & Eubanks-Carter, 2004; Levant, 2004; Norcross, 1999; Wachtel, 2010; Westen et al., 2004). Understanding the core dimensions of pathology and the treatments that target these dimensions would create a much simpler, more intuitive, and more practitioner-friendly system.

The proposed system takes into account the recommendations of APA work groups (American Psychological Association, 1995, 2002), suggesting that treatment guidelines should (a) be based on broad and careful consideration of the relevant empirical literature, (b) take into consideration the level of methodological rigor and clinical sophistication of the research, (c) take comparison conditions into account, (d) consider available evidence regarding patient-treatment matching, (e) specify the outcomes the intervention is intended to produce, (f) identify known patient variables that influence the utility of the intervention, (g) take the setting of the treatment into account, (h) note possible adverse effects, and (i) take treatment cost into account.

STEP 1: EXAMINATION OF SYSTEMATIC RESEARCH REVIEWS
We propose that candidate treatments be evaluated on the basis of existing (published or unpublished) quantitative reviews by a Task Force operating under the direction of the Committee on Science and Practice, the group that has overseen the identification of ESTs over the past two decades. The process of member selection should be transparent, with an open nominating process, public listing of member names, and organizational measures to ensure diversity of member backgrounds. Following the recommendations of the APA work groups (American Psychological Association, 1995, 2002), review panels should (a) be composed of individuals with a broad range of documented expertise, (b) disclose actual and potential conflicts of interest, (c) maintain a climate of openness and free exchange of views, and (d) have clearly defined processes and methods.

When an individual nominates a treatment for evaluation, the nominator may provide existing reviews or may create a new review for this purpose, although all reviews will be evaluated carefully for thoroughness and risk of bias (see below). Published or unpublished systematic reviews that are not deemed to meet rigorous quality standards will not be considered for EST designation. Recently conducted reviews (i.e., within the past 2 years) will be required unless the evidence in an older review is robust and a strong case can be made that it is unlikely that there are recent developments that would influence the evaluation of the body of evidence for or against a treatment. The effectiveness of a given treatment can be evaluated (a) based on comparisons to known and quantifiable inactive control conditions including (i) wait list, (ii) pill placebo, and (iii) psychological placebo or (b) by comparing alternative psychological treatments.

Evaluating the Quality of Systematic Reviews
There are a number of ways to determine whether a systematic review has been conducted with sufficient transparency and rigor to provide confidence that its results are comprehensive and reflect the best possible evidence. The Cochrane Handbook (Higgins & Green, 2008) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Liberati et al., 2009) are well-respected systems for evaluation; the Task Force will use, at least in its initial efforts, the AMSTAR checklist (Shea, Bouter, et al., 2007; Shea, Grimshaw, et al., 2007; Shea et al., 2009) as described above and shown in the online supplement. The AMSTAR checklist is not scored; therefore, there is no cutoff at which a review is considered reliable; rather, the items on the checklist will be used to inform the group’s subjective decision of when a systematic review is of sufficient quality and reported sufficiently well to be used by the Division 12 Task Force (Table 3).
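
As a rough illustration of how a nominated review might be recorded under the rules described above, the following sketch stores checklist responses item by item without ever summing them, and applies the 2-year recency requirement with the stated exception for older reviews. Everything named here (`NominatedReview`, `meets_recency_rule`, `items_to_discuss`, the item labels, and the field names) is hypothetical and not part of the proposed procedure; the quality decision itself remains a panel judgment.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NominatedReview:
    """A systematic review submitted with a treatment nomination (hypothetical record)."""
    treatment: str
    publication_year: int
    # Responses are kept per item and never summed, because the text
    # states that the checklist is not scored and has no cutoff.
    checklist_responses: Dict[str, str] = field(default_factory=dict)
    # Set by the panel when it accepts the case for using an older review.
    older_review_justified: bool = False


def meets_recency_rule(review: NominatedReview, current_year: int, window: int = 2) -> bool:
    """True if the review falls within the window or the panel granted an exception."""
    is_recent = (current_year - review.publication_year) <= window
    return is_recent or review.older_review_justified


def items_to_discuss(review: NominatedReview) -> List[str]:
    """Checklist items not answered 'yes', listed for panel discussion (no score)."""
    return sorted(item for item, answer in review.checklist_responses.items()
                  if answer != "yes")


example = NominatedReview(
    treatment="hypothetical treatment",
    publication_year=2011,
    checklist_responses={"item_1": "yes", "item_2": "no", "item_3": "cannot answer"},
)
print(meets_recency_rule(example, current_year=2015))
print(items_to_discuss(example))
```

The design choice worth noting is the absence of any total or threshold: the record only surfaces the items that the panel would need to weigh.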



Table 3. Summary of the proposed Division 12 procedure for evaluating empirically supported treatments

Step    Process                          Details
Step 1  Systematic review                • Treatment is nominated
                                         • Existing systematic review is evaluated according to:
                                           o PICOTS (population, intervention, comparison, outcomes, timeline, setting)
                                           o Risk of bias (low, unclear, high)
Step 2  Committee-based evidence review  • GRADE (Grading of Recommendations Assessment, Development, and Evaluation) recommendation by committee: very strong, strong, weak

In some cases, a systematic review may combine trials in which the treatments differed from each other in one or more ways, such as the manner in which an intervention was applied, the characteristics of the provider, or the length of treatment or follow-up. In some cases, elements of treatment might be added to or subtracted from certain studies. Such modifications across studies create a dilemma for the reviewers, who must determine whether there is sufficient similarity among the studies to consider them all to be testing the same essential treatment. Some degree of clinical heterogeneity must be anticipated and allowed, or else there would be very few meaningful groupings of studies for review. However, the degree to which there is clinical heterogeneity that negatively impacts the interpretability of a single quantitative result must be carefully considered before a meta-analysis is considered by the Task Force (Ioannidis, 2008). A standard part of the review should include agreement among reviewers that all of the selected studies are similar enough that they can be considered to reflect a single treatment.

The use of systematic reviews does not preclude the inclusion of single-case designs, as these designs, when using appropriate experimental control, can establish causality (Horner et al., 2005) in a manner comparable to RCTs, although the smaller number of subjects may limit the generalizability of findings. Methods have been developed for calculating effect sizes of such studies and conducting Bayesian and multilevel modeling (see Shadish, 2014, for a summary). Assessment of the quality of single-subject designs could employ published quality indicators (Horner et al., 2005; Smith et al., 2007), in a manner that parallels the procedures used to evaluate RCTs. Consistent with current approaches to evidence synthesis, however, we do not recommend that evidence from only single-subject designs be used as the basis of recommendations, which should rely largely on synthesis of data from larger clinical trials.

Evaluation of Relevance to Clinical Practice
An important component for ensuring the external validity of systematic reviews is the definition of structured review questions. The mnemonic PICOTS refers to the explicit delineation of trials that are eligible for consideration in the systematic review based on the population that received the treatment (P); the intervention delivered (I); the comparison, such as another active treatment or an inactive control (C); outcomes that are assessed (O); the timeline (e.g., 12 weeks, 6 months, or long-term) (T); and setting of treatment, for example, inpatient psychiatry versus outpatient community clinics (S). To ensure external validity or generalizability, the Task Force should insist that a clear PICOTS statement is included in the systematic review, clearly defining the population of interest, the intervention, the comparisons considered, outcomes examined, and timing of outcome assessment.

In addition, the systematic review should evaluate the degree to which trials included in the review took steps to ensure treatment fidelity. Bellg et al. (2004) provide a thorough discussion of elements of treatment fidelity and steps that can be taken to enhance treatment fidelity in trials of behavior change studies. In the context of systematic reviews, there are no standard instruments for assessing steps taken to ensure treatment fidelity in included trials. Elements that were included in Chambless and Hollon’s (1998) original EST definition, and that continue to be evaluated in evidence reviews, are therapist qualifications and training, the use of a treatment manual, and monitoring of the degree to which the treatment is implemented according to the manual.

Assessing Risk of Bias
The original EST criteria (Chambless et al., 1998) operationalized methodological adequacy as including the use of a treatment manual, a well-characterized sample, and random assignment to treatment and



control conditions. Since these criteria were published, however, standards for evaluating both the external and internal validity of treatment trials have evolved substantially, and there are now several widely accepted methods of determining methodological adequacy that should be considered. We recommend that authors of systematic reviews assess validity using the Cochrane Risk of Bias Tool (Higgins et al., 2011). This tool, widely regarded as the standard for evaluating risk of bias in RCTs included in systematic reviews, provides a rating system and criteria by which individual RCTs are evaluated according to the potential sources of bias related to (a) adequate allocation sequence generation; (b) concealment of allocation to conditions; (c) blinding of participants, personnel, and outcome assessors; (d) incomplete outcome data; (e) selective outcome reporting; and (f) other sources of bias (see online supplement). Adequate sequence allocation ensures that study participants were appropriately randomized to study conditions. Allocation concealment means that the random assignment is implemented in a way that cannot be predicted by participants or key study personnel. Blinding of key study personnel and outcome assessors ensures that those personnel in a position to affect outcome data are unaware of participants’ study condition. Blinding of participants indicates that the participants themselves are unaware of study condition. Blinding of participants is not commonly used (and is often not possible) in trials of psychotherapy. In many cases, such as when a treatment group is compared to a nontreatment group, this would be reflected as a methodological limitation common to studies of psychological treatments. However, the Cochrane system allows a “low risk of bias” determination on this item when the outcome and outcome measurement are not likely to be influenced by lack of blinding, or outcome assessment was blinded and the nonblinding of participants was unlikely to introduce bias. Blinding of participants, or at least blinding to study aims and hypotheses, would be possible in comparison trials between two psychological treatments; full blinding of participants has been noted in some studies of computerized cognitive bias modification training (e.g., Amir, Beard, Burns, & Bomyea, 2009). Appropriate handling of incomplete (missing) outcome data due to attrition during the study or to exclusions from the analysis helps ensure that the outcome analyses adequately represent the outcomes of the sample. Examination of selective outcome reporting helps identify whether important (possibly nonsignificant) findings were omitted from the report of the study (Higgins & Green, 2008). Ascertaining whether clinical trials are registered and, if so, whether published outcomes are consistent with registered outcomes is an important step in a systematic review (Milette, Roseman, & Thombs, 2011; Thombs, Kwakkenbos, & Coronado-Montoya, 2014).

Across all dimensions, trials are rated as high risk of bias, unclear risk of bias, or low risk of bias. Cochrane advocates that systematic reviews assess the potential influence on outcomes of each of these dimensions separately and recommends against attempting to generate a single score or rating of overall bias (Higgins & Green, 2008). Summary scores tend to confound the quality of reporting with the quality of trial conduct, to assign weights to different items in ways that are difficult to justify, and to yield inconsistent and unpredictable associations with intervention effect estimates (Greenland & O’Rourke, 2001; Juni, Witschi, Bloch, & Egger, 1999).

Both individual trials and systematic reviews can be judged as having low, unclear, or high risk of bias (see online supplement). A systematic review would be graded to be at low risk of bias when the conclusions from the review are based on evidence judged to be at low risk of bias, according to the GRADE dimensions described above. Note that this grading system differs markedly from those originally proposed by the Division 12 Task Force (e.g., Chambless et al., 1998). Two well-conducted studies are no longer considered sufficient; this system would now require that the conclusions of the systematic review are based on studies deemed to be of high quality.

Assessment of risk of bias requires human judgment (Higgins et al., 2011), and, unfortunately, there is no quantitative algorithm that will consistently lead to reliable and valid assessment. Thus, there will always be room for disagreement and debate about the merits of individual studies and about the quality of research across studies for a given treatment. Assessment of whether a particular methodological concern in a trial creates a risk of bias requires both knowledge of the trial methods and a judgment about whether those



methods are likely to have led to a risk of bias. The Cochrane Risk of Bias Tool, at least, makes the decision process transparent and provides accessible guidance for how decisions should be made (Higgins & Green, 2008).

Additional Considerations for the Evaluation of Systematic Reviews and Recommendations for Implementation
Systematic reviews will be examined for both short-term and long-term outcomes. Long-term outcomes will generally be defined as outcomes collected some time after treatment discontinuation; however, we recognize that some treatments may include a low-intensity “maintenance” phase that continues for a long time after the more acute phase; outcomes during the maintenance phase might be appropriate for consideration as long-term effects. Effects for both symptom reduction and functional outcomes will be coded, relying on validated measures that are appropriate for the population and treatment under study. Finally, the review will note whether the treatment has demonstrated effectiveness (e.g., more diagnostically complex patients, effectiveness with nonrandomized patients, effectiveness when used by nonacademic practitioners, and utility in open-ended, flexible practice) in addition to efficacy.

STEP 2: COMMITTEE-BASED EVIDENCE REVIEW USING THE GRADE TOOL
The systematic review, having been graded for risk of bias, must then be translated into practical recommendations that will address the concerns of a broad range of patients, presenting problems, clinicians, and clinical settings. As it is unlikely that any statistical algorithm will ever be able to provide such guidance consistently, the process of recommending treatments must ultimately be a product of human judgment. The systematic review will provide raw information about the strength of clinical effects, as well as the risk of bias of the studies evaluating the treatment. In addition to those basic assessments, a determination of whether psychological treatments should be recommended to clinicians, consumers, and other stakeholders must be based on the strength and quality of existing evidence and a comparison of the likely benefits versus burden, cost, and potential harms of the treatment. The best strategy one can use in such a situation is to provide a clear framework to guide the decision-making process, and to make the process as transparent as possible so that the public can understand how these judgments were made.

A number of different strategies have been employed by guideline developers to attempt to make clear the strength of evidence and recommendations, although the most widely used system is the GRADE system (Atkins et al., 2004; Guyatt et al., 2008). The aim of the GRADE system is to rate quality of evidence and strength of recommendations in a manner that is explicit, comprehensive, transparent, and pragmatic. Factors that are taken into account in making these decisions include the methodological quality of the evidence that supports estimates of benefits, costs, and harms; the importance of the outcome that the treatment improves; the magnitude of the treatment effect and the precision of its estimate; the burden, costs, and potential risks associated with the therapy; and other consumer values that might be expected to influence their decision process.

Using the GRADE System for Treatment Recommendations
The GRADE system rates evidence quality as high, moderate, or low. Evidence is judged to be high quality when reviewers can be highly confident that the true effect lies close to that of the estimate of the effect. For example, evidence is judged as high quality if all of the following apply:

1. There is a wide range of studies included in the analyses with no major limitations.
2. There is little variation between studies.
3. The summary estimate has a narrow confidence interval.

Evidence is judged to be moderate quality when reviewers conclude that the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. For example, evidence is judged as moderate quality if any of the following applies:

1. There are only a few studies, and some have limitations but not major flaws.
2. There is some variation between studies, or the confidence interval of the summary estimate is wide.



Evidence is judged to be low quality when the true effect may be substantially different from the estimate of the effect. For example, evidence is judged as low quality if any of the following applies:

1. The studies have major flaws.
2. There is important variation between studies.
3. The confidence interval of the summary estimate is very wide.

Table 4. Modified GRADE recommendations for psychological treatments based on systematic reviews (adapted from Guyatt et al., 2008)

Recommendation

Very strong recommendation. All of the following:
  • There is high-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
  • There is high-quality evidence that the treatment produces a clinically meaningful effect on functional outcomes.
  • There is high-quality evidence that the treatment produces a clinically meaningful effect on symptoms and/or functional outcomes at least 3 months after treatment discontinuation.
  • At least one well-conducted study has demonstrated effectiveness in nonresearch settings.

Strong recommendation. At least one of the following:
  • There is moderate- to high-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
  • There is moderate- to high-quality evidence that the treatment produces a clinically meaningful effect on functional outcomes.

Weak recommendation. Any of the following:
  • There is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated.
  • There is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated as well as on functional outcomes.
  • There is moderate- to high-quality evidence that the effect of the treatment, although statistically significant, may not be of a magnitude that is clinically meaningful.

In the GRADE system to determine quality of evidence, evidence based on RCTs begins as high-quality evidence, but such evidence could be downgraded based on concerns such as study limitation, inconsistency of results, indirectness of evidence, imprecision, and reporting bias. Other types of studies begin as lower-quality evidence, but may be upgraded if merited on a case-by-case basis.

The GRADE process typically results in a weak or a strong recommendation. For the psychotherapy evaluation, we suggest that the GRADE system be modified to include a third category. A three-tier system would better correspond to the current reality that few existing trials of psychological treatments have assessed functional and disability outcomes, despite the fact that such outcomes may be more important than symptom outcomes. Thus, based on evidence from the submitted systematic review and meta-analysis, we recommend that the Task Force use an adapted GRADE process and make one of three recommendations for the
empirical support of a psychological treatment: weak, ation) interval of not less than 3 months, with relatively
strong, or very strong. Treatments not meriting at least a little risk of harm and reasonable resource use, and
weak recommendation (e.g., no systematic review is there is at least one well-conducted study that has
available, or the outcomes of treatment studies do not demonstrated effectiveness of that treatment in nonre-
satisfy the minimal criteria for a weak recommenda- search settings (e.g., settings that provide routine clini-
tion) will be described simply as lacking sufficient evi- cal care, such as community mental health centers,
dence of efficacy. The criteria for these inpatient or outpatient treatment facilities, health main-
recommendations are shown in Table 4. tenance organizations, or private practices). We recog-
The GRADE recommendations are hierarchical; nize that this level of recommendation may be largely
treatments are ranked according to the highest level of aspirational at this time, although some treatments will
recommendation obtained. A very strong recommenda- merit a very strong recommendation at present. In other
tion is made when there is high-quality evidence that cases, the establishment of this level of recommenda-
the treatment produces a clinically meaningful effect on tion sets a bar for the planning of future treatment out-
symptoms of the disorder being treated, as well as a come studies.
clinically meaningful effect on functional outcomes, A strong recommendation, which will be more read-
with significant improvement noted at immediate ily attainable for many treatments at this time, requires
posttreatment and at a follow-up (treatment discontinu- the presence of moderate- to high-quality evidence



that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated, or on functional outcomes, again, with a clear positive balance in consideration of benefits versus possible harms and resource use. Evidence of external effectiveness or generalizability is not required for this level of recommendation.

Weak recommendations, which are not necessarily intended to discourage the use of treatments, are made when there is only low- or very low-quality evidence that the treatment produces a clinically meaningful effect on symptoms of the disorder being treated and/or functional outcomes, or when the evidence suggests that the effects of the treatment may not be clinically meaningful (although they may be statistically significant). In the case of a weak recommendation, it is not clear that gains from treatment warrant the resources involved, and patient preferences will be central in determining whether engaging in the treatment is the best possible decision.

Taking Contextual Factors Into Account
It would be prohibitive, on several levels, for the Task Force to explicitly require comparative effectiveness analyses of all possible treatments or analyses of cost-effectiveness. However, when there are obvious concerns, the committee should be able to incorporate them into the recommendation. This might occur, for instance, in contextualizing the clinical meaningfulness of a treatment effect when there are other psychological treatments that have well-documented and much larger effects. Similarly, if a treatment generates an effect that is similar to other well-studied treatments, but requires a very large number of sessions or length of time to generate the same effect at a much higher cost, then the Task Force may take this into consideration.

The Task Force may take into account the purported mechanism or active ingredient(s) of treatment and may upgrade or downgrade the recommendation based on the quality of evidence supporting that mechanism or ingredient(s). It is conceptually difficult to standardize this consideration into the criteria, as admittedly the mechanisms of many efficacious treatments are unclear. However, to the extent that a given treatment is based on a specific purported mechanism or relies strongly on a particular treatment ingredient, the board can and should consider whether those assertions are supported. Single-case designs are often particularly useful for such purposes. Such consideration would help reduce the risk of allocating resources to elements of treatment that are inert or worse. Below, we describe a longer-term plan for identifying active therapeutic ingredients.

Although most ESTs appear effective when applied to minority groups with specific disorders (e.g., Miranda et al., 2005), it cannot be automatically assumed that an EST that is effective for the majority population will be equally effective among minority groups. Therefore, it is important that research on treatment efficacy and effectiveness attend to the generalizability of effects across diverse populations. At this time, it would be difficult to require a documentation of efficacy or effectiveness across minority groups, given the many nuances associated with assessing, treating, and modifying treatments for different populations. Furthermore, it would likely be counterproductive to identify a treatment as appropriate for minority populations unless all such populations had been studied. We therefore recommend that nominators of treatments identify specific studies demonstrating efficacy or effectiveness within a particular underrepresented group and that such findings be highlighted in the presentation of the treatment and by the Task Force when recommendations are made.

CONCLUSIONS AND FUTURE DIRECTIONS
The EST movement has, overall, provided positive direction for clinical psychology. However, several valid criticisms of the process have been offered. In this article, we propose a new approach for identifying ESTs and for recommending specific psychological treatments to practitioners, consumers, and other stakeholders. Twenty years after the original Division 12 Task Force report, such an update is long overdue. Although clinical psychology once led the way in articulating how a treatment should be determined to be empirically supported (and although many other healthcare fields still look to those original criteria for guidance), advances in the field of evidence-based medicine have rendered the old criteria obsolete.

In this article, we propose a two-stage process by which the Society of Clinical Psychology/Division 12



may help bridge the gap between the current, outdated EST criteria and the planned treatment guidelines from APA. The aim is to begin to evaluate treatments in a manner that parallels and will support the methods proposed by APA, but in a manner that lends itself to more rapid dissemination of scientific findings to those who would benefit most from them. We propose that the process of identifying one or two positive studies for a treatment ceases, and that in its place we begin evaluating systematic reviews of the treatment outcome literature, weighting them according to the risk of bias in the studies contributing to the review. We further recommend that instead of labeling treatments as “well established” or “probably efficacious,” as is done under the current system, we translate the research findings into clear recommendations of very strong, strong, or weak, using well-established, widely accepted, and transparent grading guidelines. These steps, which can be implemented immediately, will greatly improve the quality of information that is disseminated.

As mentioned earlier, the APA Presidential Task Force on Evidence-Based Practice (2006) defines EBP as consisting of three components of information: best available research evidence, clinical expertise, and patient characteristics. In our view, these three components play different critical roles in clinical decision-making (e.g., Tolin, 2014), in which the best available research evidence forms the basis of clinical decisions and is interpreted, adjusted, and implemented through clinical expertise and patient characteristics. A skilled evidence-based practitioner will first identify the EST that most closely matches the concerns presented by a given patient. One EST is selected over the others by examining the available research evidence that shows the strength of the treatment and the quality of evidence. ESTs may also need to be adapted or augmented, based on patient characteristics such as comorbid psychopathology, situational factors, or cultural and demographic features. Such selection, adaptation, and augmentation procedures derive from the expertise of the clinician, guided wherever possible by the best scientific evidence (with the understanding that such research will rarely line up perfectly with the clinical problem). It is noted in this model that clinical expertise and patient characteristics do not trump the best available research evidence, nor should all three factors be considered an “either-or” selection. That is, skillful EBP does not involve selecting a treatment based on research evidence or on the clinician’s expertise or on patient characteristics. Rather, the best available research evidence (including ESTs) forms the basis of clinical judgment, with additional selection and modification based on clinical expertise and patient characteristics. The modifications to how ESTs are evaluated and disseminated proposed in this article are hoped to help EBP practitioners reach appropriate conclusions based on the best available clinical science.

REFERENCES
American Psychiatric Association. (2009). Practice guideline for the treatment of patients with panic disorder (2nd ed.). Washington, DC: Author.
American Psychiatric Association. (2010). Practice guideline for the treatment of patients with major depressive disorder (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: Author.
American Psychological Association. (1995). Template for developing guidelines: Interventions for mental disorders and psychosocial aspects of physical disorders. Washington, DC: Author.
American Psychological Association. (2002). Criteria for evaluating treatment guidelines. American Psychologist, 57, 1052–1059. doi:10.1037//0003-066X.57.12.1052
Amir, N., Beard, C., Burns, M., & Bomyea, J. (2009). Attention modification program in individuals with generalized anxiety disorder. Journal of Abnormal Psychology, 118(1), 28–33. doi:10.1037/a0012589
Andreasen, N. C., Carpenter, W. T., Jr., Kane, J. M., Lasser, R. A., Marder, S. R., & Weinberger, D. R. (2005). Remission in schizophrenia: Proposed criteria and rationale for consensus. American Journal of Psychiatry, 162, 441–449. doi:10.1176/appi.ajp.162.3.441
APA Presidential Task Force on Evidence-Based Practice. (2006). Evidence-based practice in psychology. American Psychologist, 61, 271–285. doi:10.1037/0003-066X.61.4.271
Areán, P. A., & Kraemer, H. C. (2013). High-quality psychotherapy research. New York, NY: Oxford University Press.
Atkins, D., Eccles, M., Flottorp, S., Guyatt, G. H., Henry, D., Hill, S., . . . GRADE Working Group. (2004). Systems for grading the quality of evidence and the strength of
