Delphi Methodology PMC
Abstract
The use of the Delphi technique is prevalent across health sciences research, and it is used to identify priorities, reach consensus
on issues of importance and establish clinical guidelines. Thus, as a form of expert opinion research, it can address fundamental
questions present in healthcare. However, there is little guidance on how to apply it, resulting in heterogeneous Delphi
studies and methodological confusion. Therefore, the purpose of this review is to introduce the use of the Delphi method, assess
the application of the Delphi technique within health sciences research, discuss areas of methodological uncertainty and propose
recommendations. Advantages of the use of Delphi include anonymity, controlled feedback, flexibility in the choice of statistical
analysis, and the ability to gather participants from geographically diverse areas. Areas of methodological uncertainty worthy of
further discussion broadly include experts and data management. For experts, the definition and number of participants remain
issues of contention, while there are ongoing difficulties with expert selection and retention. For data management, there are
issues with data collection, defining consensus and methods of data analysis, such as percent agreement, central tendency,
measures of dispersion, and inferential statistics. Overall, the use of Delphi addresses important issues present in health sciences
research, but methodological issues remain. It is likely that the aggregation of future Delphi studies will eventually pave the way for
more comprehensive reporting guidelines and subsequent methodological clarity.
Abbreviations: IQR = interquartile range.
Keywords: Delphi, mixed-methods research, quantitative research, qualitative research, survey research
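As a hedged illustration of the data-analysis measures named in the abstract, the following minimal Python sketch summarises the ratings for a single item from one Delphi round using the proportion of ratings at or above a cut-off, the median and the interquartile range. The function name `consensus_summary`, the cut-off parameter `agree_min`, the example ratings and the choice of the "inclusive" quartile method are all assumptions made for illustration and are not taken from the review; other quartile conventions can give slightly different IQR values.

```python
from statistics import median, quantiles


def consensus_summary(ratings, agree_min):
    """Summarise one item's Likert ratings from a single Delphi round.

    `agree_min` is a hypothetical cut-off: ratings at or above it are
    counted as agreement with the statement.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    # Proportion within a range: share of panellists rating >= agree_min.
    pct = 100 * sum(r >= agree_min for r in ratings) / len(ratings)
    # Quartiles by linear interpolation over the observed ratings
    # (assumption: the 'inclusive' method; other conventions exist).
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return {
        "percent_at_or_above": pct,
        "median": median(ratings),
        "iqr": q3 - q1,
    }


# Hypothetical ratings from a 10-member panel on a 5-point Likert scale.
ratings = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4]
summary = consensus_summary(ratings, agree_min=4)
print(summary)
```

Whether such a summary counts as consensus still depends on the definition and level that the research team chose a priori.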
The authors have no conflicts of interest to disclose.
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
a Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada.
* Correspondence: Zhida Shang, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada (e-mail: [email protected]).
Copyright © 2023 the Author(s). Published by Wolters Kluwer Health, Inc.
This is an open access article distributed under the Creative Commons Attribution License 4.0 (CCBY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
How to cite this article: Shang Z. Use of Delphi in health sciences research: A narrative review. Medicine 2023;102:7(e32829).
Received: 6 September 2021 / Received in final form: 10 January 2023 / Accepted: 11 January 2023
http://dx.doi.org/10.1097/MD.0000000000032829
Shang • Medicine (2023) 102:7Medicine
2. A brief overview of Delphi
2.1. History
Dealing with uncertainty is a perpetual component of the human experience, and throughout human history, researchers, theorists, and philosophers have attempted to perfect human mastery over it. In Ancient Greece, kings and generals sought to face their uncertainties for their state and careers by consulting a prophet, the Pythia, more commonly known as the Oracle of Delphi. The Oracle was known for her prophecies, which supposedly came directly from the Greek God Apollo. More than 2500 years later, during the height of the Cold War, the United States Army Air Corps also faced military uncertainties and consulted a think tank, the RAND Corporation. Using the name of the prophet from the ancient world, researchers from the RAND Corporation developed the Delphi technique, which involves recruiting several military experts, asking each expert about the probability, frequency and intensity of a potential Soviet attack, and then asking each expert to provide anonymous feedback, a process that is repeated until consensus is reached.[13] Since then, this technique has been declassified and has evolved beyond its military applications into the different health sciences.
2.2. Basic tenets of Delphi
The Delphi technique is defined as the procedure of asking a panel of experts for their opinion on a relevant issue, summarizing and presenting their collective responses and repeating this process for a certain number of rounds (Fig. 1).[14] Overall, there are 4 key features to any Delphi, namely anonymity, iteration, controlled feedback and the statistical aggregation of group response.[15] Anonymity refers to the fact that participants should not know who else is involved in the study besides the researchers, and is achieved through the use of anonymized questionnaires[15] and can be enhanced through assigning a unique confidential code.[16] Also, anonymity from each other prevents undue influence by other participants potentially seen as superior or more expert than themselves.[17]
Iterations refer to the feedback process, which is viewed as a series of rounds allowing for participants to reassess their previous judgments.[18] Controlled feedback refers to the fact that participants are informed of the responses of their anonymous colleagues, allowing for the vocalization of the collective opinions and judgments rather than those of the vocal few.[15] Statistical aggregation involves the presentation of the statistical summary of the group response by the researcher, which typically consists of measures of central tendency,[15] while also allowing for the final results to be amenable to statistical analysis, leading to a sense of quasi-objectivity.[17]
2.3. Consensus group techniques
The Delphi technique has been widely used as a forecasting tool to predict certain developments, build consensus around clinical issues and develop, describe and evaluate clinical guidelines or tools.[19] Despite the diversity in its applications, consensus is a central theme and Delphi is considered a consensus group research method.[20–22] Consensus group techniques involve obtaining the views of a group of experts, and aim to bring about consensus and agreement as outcomes.[21] A key advantage of consensus group techniques is a balanced participation from participants through structured formatting.[22]
As a whole, researchers using consensus group methods assume that the views of a group are superior to those of an individual,[15] consistent with the psychology of the “wisdom of crowds.”[23,24] To elaborate, “crowds,” or a collection of individuals, are often better at decision making than a single member of the group due to diversity of expertise, independent decision making, decentralized working conditions and aggregation.[24] The Delphi technique incorporates these aspects, as there is an independence of decision making through anonymized questionnaires, decentralization through experts responding autonomously but sharing decisions through the researcher, aggregation through the researcher's presentation of results and the potential of having diverse expertise through appropriate recruitment criteria.[25]
Another consensus group method is the Nominal Group technique, and the major difference between the Nominal Group and Delphi is that the Nominal Group is conducted in-person.[20] Specifically, the Nominal Group technique involves participants receiving and reflecting upon a question, having the facilitator ask each participant to share their ideas with the group, generating a group discussion and lastly ranking the discussed ideas.[26] In contrast to the Delphi technique, because the Nominal Group technique is done in-person, it is nearly impossible to conduct anonymously and typically takes less than 2 hours to answer a single question.[27] One of the primary advantages of using the Nominal Group is the ability to establish collaborative partnerships among the participants, and it is particularly well-suited for research designs where such partnerships are required, such as action research.[21] Due to their respective differences, the Nominal Group technique is typically used in exploring consumer and stakeholder views, whereas the Delphi technique is
used to create best practice guidelines and treatment protocols among healthcare professionals.[22] Conversely, power differentials might lead participants perceived as “weaker” to rescind their actual views in the face of the opinions of those perceived as “stronger.”[22]
The 2 most common forms of consensus techniques used are Delphi and Nominal Group,[28] but there is also a third one that is occasionally used in health sciences research, the RAND Appropriateness method.[29] In essence, the RAND Appropriateness method employs the Delphi technique through online or mailed questionnaires, and then uses the Nominal Group technique to discuss the findings generated during the Delphi phase.[29]
2.4. Different forms of Delphi
Currently, there is much debate surrounding the definitions of the Delphi technique. This adds to the methodological confusion, which is a major critique of the technique, related to the lack of methodological rigor, little existing guidance to help researchers and large variations in how Delphis are conducted.[30–32] Below is an outline of some of the most used methods, namely “classic Delphi,” “modified Delphi,” “policy Delphi” and “e-Delphi.” All these terms are often used interchangeably, and currently, no concrete definitions or guidelines exist to differentiate between these different but related Delphi techniques.
The “classic Delphi” is a term which is mostly outdated, as it is now overwhelmingly referred to as simply “Delphi”[20,33] and is the most commonly used method. The classic Delphi is typically conducted for 2 to 3 rounds,[34] and the first round involves a qualitative open-ended questionnaire or interviews to develop initial statements, generating a large amount of data (Trevelyan & Robinson, 2015). Subsequent rounds involve quantitative questionnaires, with central tendency and measures of dispersion to aggregate data.[35]
The “modified Delphi” is used in an incredibly diverse fashion, and almost no two modified Delphis are conducted in the same way. For example, in a widely cited study, Morisset, Johannson[36] used a modified Delphi to identify diagnostic criteria for chronic hypersensitivity pneumonitis. In the first round, Morisset, Johannson[36] used qualitative interviews and a literature review to identify initial items, and proceeded for 2 rounds using the classic Delphi approach. Subsequently, in the final round, researchers asked participants to use the newly conceived diagnostic criteria in a series of clinical vignettes, and to provide a level of diagnostic confidence. As another influential example, Feo, Conroy[37] used a modified Delphi approach to standardize the definition of “fundamental care.” For the first round, Feo, Conroy[37] conducted an in-person interactive workshop during an academic conference on the topic of “fundamental care,” compiled the results of the workshop and went through 2 rounds of “classic Delphi.” Overall, most modified Delphis tend to involve an in-person aspect,[14] such as interviews or focus groups,[38] which are typically done in the first round. Nonetheless, the use of “modified Delphi” appears to be a blanket term for heterogeneous methods that deviate from the “classic Delphi.”
An interesting deviation from the traditional Delphi approach is the “policy Delphi,” which is used in cases where consensus is not required, and dissensus is promoted.[39] Thus, policy Delphis are not meant for decision making, but instead are useful as an analytic tool in policy issues.[40] Although they are conducted in a heterogeneous fashion,[41] policy Delphis tend to involve similar steps as the classic Delphi, in which issues are formulated and participants are asked for their opinion on the items, but also explore the reasons for disagreements and subsequently reevaluate the options.[40]
Lastly, eDelphi, defined as the classic Delphi but done completely online,[38] is increasingly being used as it offers unparalleled convenience, time and cost savings and allows for an unprecedented ease of data management.[42] Specifically, the eDelphi allows experts to participate regardless of geographic location or time zone and often leads to faster response times.[17] Currently, the term “eDelphi” is inconsistently used, as many researchers refer to their studies as simply “Delphi.”[43] Therefore, due to the ubiquitous use of the Internet, most modern day Delphis are conducted as eDelphis, unless otherwise specified.
3. Importance to health sciences
The use of Delphi is important to health sciences research in several ways. As stated earlier, the Delphi technique is considered a manifestation of expert opinion developed via consensus. Expert opinion is considered to fall within the lowest level of evidence on the evidence pyramid, whereas the highest are systematic reviews and meta-analyses.[44] Nonetheless, being “low” on the evidence pyramid does not mean that Delphi studies are without value or are considered low-quality research. Evidence-based medicine and nursing require a balance of studies involving the entire pyramid, and the dominance of one form would lead to discrepancies, confusion and an incomplete rendition of the phenomena under study. Furthermore, consensus methods are often used to determine the directionality of scientific research, unearthing the fundamental underpinnings of a field, and are seen as a foundational methodology upon which all other methodologies rest.[25] Lastly, as the information stemming from experts tends to have direct and practical results, consensus by expert opinion allows for the easier generation of solutions for real-world problems. Despite its many advantages, there are several potential barriers to implementing a successful Delphi study. Being a form of survey research, conducting a Delphi can be a slow process, and it may take 2 to 6 months to complete a 2-round Delphi.[45,46] A study involving closely contacting many participants over such a long duration may incur additional financial costs and resources, which the research team should be aware of and adjust for. Thus, it is imperative for the researcher and/or Delphi coordinator to become familiar with the process, in order to have a streamlined process, which may avoid participant dropout and increase the overall study success.[47]
Delphi studies are useful to collect the first opinion on phenomena, and are often used to examine an area with limited empirical research and/or where there are questions for which there may be no definitive answers.[48] Relatedly, depending on the research team and research question, Delphis tend to quickly identify important points or issues, rapidly leading to conclusions.[34] As one of the first empirical papers published in an emerging field, a Delphi study can subsequently influence a large body of literature. Therefore, the use of Delphi by various healthcare professionals can allow for opportunities to publish highly visible research. Furthermore, despite being used inconsistently,[30–32] the Delphi is considered the most widely used consensus group technique.[30] Thus, more Delphi studies should be conducted by nurses and other healthcare professionals to ensure future methodological clarity and conciseness.
4. Issue #1 – experts
4.1. What is an “expert?”
The Delphi involves the recruitment of a panel of “experts,” in which the term “expert” is left open to interpretation. As a whole, an aim of the Delphi technique is to obtain high-quality responses from a select panel of experts, as opposed to getting a representative sample as in traditional survey techniques.[49,50] The Delphi technique uses a nonrandom sampling method, and aims to identify prominent, knowledgeable or representative people of the field under study.[50] As Delphis use a nonrandom sampling technique, there is inherent bias in the recruitment process,
since participants who are more interested in the topic are more likely to be involved for the various rounds.[9] Unfortunately, participants cannot be selected randomly, due to the need to ensure “expertise,”[50] a concept which will be elaborated in the section below.
Criteria used to define “expertise” are highly diverse, with examples being high educational attainment,[17] being part of the researcher's personal network,[51] years of clinical/practical experience,[17] authorship in a peer-reviewed publication[52] and membership in a professional association.[53] Although educational attainment and years of experience are the most commonly used metrics to gauge expertise,[35] there is an ongoing debate on the definition of expertise, and there are no current guidelines or standards on the selection of expert panel members. Commonly accepted requirements for participation in the expert panel include: experience and knowledge, willingness and capacity to participate, time to participate and adequate communication skills.[35,54] Thus, researchers should strive to maintain a balance between these points. Additionally, a definition that is too stringent or specific would reduce the potential pool of participants, while a poorly specified definition can potentially affect the construct validity of the Delphi panel.
4.2. Number of experts
Another issue of ongoing debate is the number of participants to participate in a Delphi. The number of panelists can range from as few as 4[55] to several thousand.[56] Most commonly, Delphis tend to be within the range of 8[57] to 20.[25] Interestingly, a bootstrap study done by Akins, Tolson[58] suggests that 23 participants lead to response stability within multiple rounds. Nonetheless, this is just one dated study, and more research is needed in studying the optimal panel size for Delphi studies. It has also been argued that findings will be more stable with larger sample sizes.[25] To illustrate, a smaller panel of 10 experts can be highly unstable, as one person makes up 10% of the responses and thus is a major influence on the results of the study. With larger panel sizes, each individual expert's influence on the study will be less, and findings will be more stable.[25] On the other hand, researchers have found that large expert panels can introduce difficulties in data collection and management.[18] Overall, Delphi panel sizes should be chosen with consideration to time and monetary constraints and ideally be between 8 and 23 participants.
4.3. Issues with expert selection and retention
Besides the number of participants, the choice of participants is also open to debate. Strict selection criteria and definitions of expertise lead to a more homogeneous expert panel, whereas less restrictive definitions will lead to a more heterogeneous sample. Currently, it is recommended to have a heterogeneous sample in terms of expertise,[25,31,55,59] as it leads to better performance and higher quality responses due to a wider range of perspectives.[24,35] Furthermore, if the issue under study is used to inform broader policy or has global relevance, then it may be optimal to have a more heterogeneous sample.[60] However, it is also argued that heterogeneous panels can increase the complexity and difficulty of collecting data, reaching consensus, conducting analyses, and verifying results.[61] This leads to a decision quality trade-off, as stability increases in tandem with sample size and heterogeneity (thus an increase in decision quality), but beyond a certain threshold panel size and heterogeneity, managing the Delphi process becomes cumbersome in return for marginal benefits.[61] Although these issues are inherent with Delphi methodology and unique to the problem under study, a portion of these issues can potentially be mitigated by conducting pilot Delphi studies, or validating the results through triangulation with other techniques, such as qualitative focus groups.[50]
Next, attrition has been identified as a major issue faced by Delphi studies,[20,25,35] with attrition rates ranging from 0% to 92% for classical Delphis.[62] As Delphis can have multiple rounds, more and more participants are likely to drop out through the subsequent rounds. This can be problematic, because increasing attrition over subsequent rounds can be due to participants with dissenting views dropping out, creating a false sense of consensus,[20] leading to a form of response bias.[50] Interestingly, Hejblum, Ioos[63] found that there are lower response rates for Internet-based Delphis as opposed to a mail-in system, leading Boulkedid, Abdoul[31] to propose that researchers use a mixed Internet and mail-in approach. However, the applicability of the findings of Hejblum, Ioos[63] can be critiqued due to their outdatedness, as the year of implementation is a strong predictor of mail-in-survey response rate.[64]
Overall, it is recommended to ensure that participants are fully informed of the study, including the time commitments, and researchers should maintain a short between-round time frame,[17,31,35] while ensuring a presentable delivery of feedback. Also, sending regular reminders to participants that each round is constructed out of their responses encourages interest, ownership and partnership.[50] Due to the permeability and flexibility of the Internet, the use of Internet-based approaches for Delphi is recommended, and even though evidence (albeit outdated) suggests the contrary, it is without debate that Internet-based survey methods are much more cost-effective.[65] Therefore, monetary funds could be redistributed to other aspects of the study, resulting in a decrease in attrition and an overall more robust study.
5. Issue #2 – data management
5.1. Data collection
Data is collected through Likert surveys, and summarized results from the previous round, alongside the participant's own responses, are presented to each participant. For Likert scales, typically 5- and 10-point scales are used[66] and can be supported through graphical representation, such as bar graphs. Currently, there is debate around whether to include a midpoint (odd number of categories) or not (even number of categories).[35] If there is a midpoint, there is a chance that participants may elect to choose the midpoint for several reasons: having no opinion on the question, choosing a minimally acceptable response as soon as it is found and avoiding what appears to be the socially undesirable behavior of selecting a “negative” option.[67] Nonetheless, the midpoint is useful for expressing neutral opinions, which is important for answering obscure and emerging topics,[68] topics which are often studied by Delphis. Also, it has been argued that midpoints are not “dumping grounds” and instead the phenomenon can be attributed to a lack of question clarity by the research team.[69] Therefore, researchers should carefully consider the clarity of their Likert questionnaire, which can be accomplished through pilot testing.[38] It is also suggested to include as few items as possible within the survey, as a large number of items is associated with lower response rates.[70]
5.2. Consensus defined?
As stated earlier, the aim of the Delphi technique is to achieve consensus. However, in a systematic review by Diamond, Grant,[32] nearly every study used its own standard for consensus, a finding that is also supported by a more dated review.[31] Consensus can be defined in 2 ways, with the first being agreement with the statement and the second being the extent to which participants agree with each other.[71] Furthermore, there is stability, which measures whether agreement is present throughout the Delphi process, or whether it changed between rounds.[71] With such confusing and ambiguous definitions going around, it is without
a doubt that consensus is inadequately addressed by researchers, with roughly 26% of all Delphi studies not even defining consensus.[32]
Currently, the most common definitions of consensus are: percent agreement (e.g., x% with the same rating), measure of central tendency (e.g., median ≥7 on a 9-point Likert scale), proportion within a range (e.g., x% of participants scoring above a certain score on a Likert questionnaire) and dispersion of responses (e.g., interquartile range of 1 on a 5-point Likert scale).[32,72] For the purposes of this paper, the term “definition” will be considered a measure, such as percent agreement or interquartile range. The term “level” will be considered the degree of the definition, such as 70 percent agreement, or an interquartile range of 2 on a 10-point scale. Despite this diversity of analytical measures and definitions of consensus, there are no currently agreed upon standards or guidelines for choosing one over the other.
5.3. Consensus – percent agreement, central tendency and measures of dispersion
Even with such diverse definitions of consensus, there are differing levels of each definition. For example, if going by percent agreement, then achieving 100% agreement by all participants would be incredibly difficult. However, an unimpressively low percent agreement, perhaps around 30% to 50%, will be quite easy to achieve but renders the results of the Delphi less robust. Therefore, a balance is required, and the answer may lie within the importance of the research question: for example, if it revolves around a life-or-death issue, then a very high consensus level will be desirable.[73] If using percent agreement, then a level between 70% and 80% is usually adopted and widely considered to be rigorous.[16,34,54,60,73] To summarize, researchers should aim for 70% to 80%, unless the research question is one that requires incredible precision, such as end-of-life guidelines, in which case researchers should aim for 90% to 100%.
Other commonly used definitions are measures of central tendency, which include mean, median and mode and are recommended over percent agreement by Hsu and Sandford.[18] However, Likert survey data are traditionally considered to be on the ordinal scale, and thus cannot be used for calculating the mean, as it can only be used for data on an interval scale.[74] Therefore, it is recommended to avoid reporting the mean as a definition of consensus, and instead use median and mode. Furthermore, the median is less likely to be influenced by outliers, which can very likely occur if there is an expert with an extremely strong and divergent opinion on a certain issue. The mode is also used, but it can be problematic as it cannot capture multimodal data distributions, such as if an issue is perceived by some experts as moderately important, while by others as extremely important. Therefore, if using central tendency, it is recommended to solely use the median, while considering the mode as a form of secondary analysis.
One commonly used descriptive statistic and measure of dispersion is the standard deviation.[66,74,75] The standard deviation is a statistic that measures the dispersion relative to the mean and is calculated as the square root of the variance. Thus, a smaller spread means a smaller standard deviation, which means that it is more likely to represent consensus. However, as with the critiques of using the mean, the standard deviation is still sensitive (albeit less so) to outliers and cannot be used for ordinal data coming from Likert questionnaires. Another measure of dispersion is the interquartile range (IQR), which is defined as the amount of spread of the middle 50% of observations. The IQR is a frequently used metric for consensus, and is considered objective and rigorous by several authors.[66,74,76,77] Typically, an IQR of 1 or less on a 4- or 5-point Likert scale and 2 or less on a 10-point scale can be considered as consensus, with the more points being on the scale, the larger the expected IQR.[74] Therefore, researchers should elect to use the IQR to represent spread and consensus rather than the standard deviation.
5.4. Consensus to inferential statistics
Inferential statistics are statistics that help to establish relationships among variables and draw conclusions, and can be viewed as a measure of stability, according to various authors.[35,74,75] The Chi squared test for independence is a nonparametric test by which one can assess whether there is a relationship between 2 variables. It has been proposed by Dajani, Sincoff[78] as a method to check for the stability of the responses, but has been criticized by Holey, Feeley,[75] who argue that Chi squared is instead testing for the independence of the Delphi rounds from responses obtained in them. Another inferential statistic is the Wilcoxon matched-pairs signed-rank test, which is the nonparametric alternative to the paired t test. Specifically, it compares the difference in responses to a survey item from 2 rounds, and assesses whether it is equal to zero.[10] The Wilcoxon matched-pairs signed-rank test can assess the degree of consensus, and thus is seen by several authors as a measure of stability.[10,35,74] Overall, consensus is one of the most undefined aspects of Delphis, and stability is the most undefined aspect of consensus.[10] Currently, it is unknown whether stability is a valid stopping criterion, or if inferential statistics truly represent stability, and therefore researchers should interpret the results of any inferential statistic in Delphi cautiously.
5.5. Consensus – lessons learned
Although consensus is a defining aspect of Delphi studies, over 70% of Delphi studies do not use the achievement of consensus as a stopping criterion, but instead only run for a predetermined number of rounds.[32] If not using consensus as a stopping criterion, having a Delphi run for 3 rounds has been deemed to be optimal by several authors.[35,61,79] One example of a 3-round Delphi is by Griffiths, who demonstrated high consensus (>90%) in identifying different treatment goals for pulmonary perioperative complications.[80] Nonetheless, there can also be low percent agreement and consensus within a study, which would subsequently eliminate many of the proposed points, as seen in the 3-round Delphi done by Huijben in examining ICU care qualities for patients with traumatic brain injury.[81] This can be advantageous for the research team, as it will allow for better logistical and time management, indirectly leading to cost savings. Additionally, forcing consensus by continuously conducting an indefinite number of rounds is counterproductive, because participants may eventually become frustrated and agree with each other to make it end.[20] Generally, due to its simplicity and ease of planning, Delphi studies can be conducted with a predetermined number of rounds, ideally 3, without significant controversy. One possible exception to this will be for research questions which require a large amount of precision, such as the previously mentioned example of end-of-life guidelines. Also, setting a predetermined number of rounds is not an excuse to ignore or conduct a subpar data analysis, and researchers should always justify their choices.
In an interesting study, Grant, Booth[72] used available data sets from published Delphi studies and calculated final consensus based on several commonly used definitions with varying levels of stringency. In that study, the authors found that the percentage of items reaching consensus varied dramatically from 0% to 84% depending on the analytic procedure, leading Grant, Booth[72] to caution readers against potential data mining and selective reporting of consensus. Relatedly, there is always a chance that consensus represents collective ignorance as opposed to wisdom.[60] Also, even though anonymity prevents any explicit form of domination,[17] inherently weaker-willed members may be inclined to change their opinions
Shang • Medicine (2023) 102:7Medicine
due to the sole desire to conform, while inherently strong-willed participants may continue to rigidly hold onto their views.[73] Therefore, even if consensus is “achieved,” it should be carefully interpreted, and it does not necessarily mean that the statement is “correct.”[73] As a whole, it is suggested that researchers should always report the consensus definition and level a priori, and ideally explain their reasoning for choosing them. Readers should always critically appraise the methodology, design and results, and be aware of the inherent weaknesses surrounding the Delphi technique.
6. Conclusion
To conclude, the Delphi technique is useful for addressing several critical problems within the different health professions. The combination of anonymity, iteration, controlled feedback and the statistical aggregation of group response makes it ideal for addressing emerging and unknown topics and forecasting issues of importance in medicine, nursing and other fields. Furthermore, as a form of expert opinion consensus technique, a Delphi study has the potential to be one of the first peer-reviewed publications in a new field, subsequently influencing a large body of literature. Lastly, Delphi studies are often cost-effective and able to gather experts from geographically diverse areas. However, there continues to be methodological uncertainty and a lack of clear guidelines. More precisely, there are ongoing debates about the definition of expertise, how many panel members to recruit, definitions of consensus and issues with using different forms of statistical analysis. Accordingly, the preliminary reporting guidelines set out by Diamond et al.[32] can be used as a foundation for future Delphi research. Naturally, it is likely that the aggregation of future Delphi studies will eventually pave the way for more comprehensive reporting guidelines and subsequent methodological clarity.
Acknowledgements
This paper is dedicated to the memory of San Seungwoo Hong.
Author contributions
Conceptualization: Zhida Shang.
Formal analysis: Zhida Shang.
Investigation: Zhida Shang.
Methodology: Zhida Shang.
Validation: Zhida Shang.
Visualization: Zhida Shang.
Writing – original draft: Zhida Shang.
Writing – review & editing: Zhida Shang.
References
[1] Watson R. Quantitative research. Nurs Stand. 2015;29:44–8.
[2] Polit DF, Beck CT. Nursing Research: Generating and Assessing Evidence for Nursing Practice. Lippincott Williams & Wilkins; 2008.
[3] Sidani S. Health Intervention Research: Understanding Research Design and Methods. London: SAGE Publications; 2014.
[4] Curtis E, Comiskey C, Dempsey O. Correlational research: Importance and use in nursing and health research. Nurs Res. 2015;6:20–5.
[5] Coughlan M, Cronin P, Ryan F. Survey research: process and limitations. Int J Ther Rehabil. 2009;16:9–15.
[6] Privitera GJ. Research Methods for the Behavioral Sciences. SAGE Publications; 2018.
[7] Story DA, Tait AR. Survey research. Anesthesiology. 2019;130:192–202.
[8] Cope DG. Using electronic surveys in nursing research. Oncol Nurs Forum. 2014.
[9] Hasson F, Keeney S, McKenna H. Research guidelines for the Delphi survey technique. J Adv Nurs. 2000;32:1008–15.
[10] Kalaian S, Kasim RM. Terminating sequential Delphi survey data collection. Pract Assess Res. 2012;17:1–11.
[11] Shariff N. Utilizing the Delphi survey approach: a review. J Nurs Care Qual. 2015;4:246–51.
[12] Baethge C, Goldbeck-Wood S, Mertens S. SANRA – a scale for the quality assessment of narrative review articles. Res Integr Peer Rev. 2019;4:5.
[13] Dalkey NC. Analysis of the Future: the Delphi Method. Santa Monica. California: RAND Corporation; 1967.
[14] McKenna HP. The Delphi technique: a worthwhile research approach for nursing? J Adv Nurs. 1994;19:1221–5.
[15] Rowe G, Wright G. The Delphi technique as a forecasting tool: issues and analysis. Int J Forecast. 1999;15:353–75.
[16] Falzarano M, Zipp GP. Seeking consensus through the use of the Delphi technique in health sciences research. J Allied Health. 2013;42:99–105.
[17] McPherson S, Reese C, Wendler MC. Methodology update: Delphi studies. Nurs Res. 2018;67:404–10.
[18] Hsu C-C, Sandford BA. The Delphi technique: making sense of consensus. Pract Assess Res. 2007;12:10.
[19] Thangaratinam S, Redman CW. The Delphi technique. Obstet Gynaecol. 2005;7:120–5.
[20] Humphrey-Murto S, Varpio L, Gonsalves C, et al. Using consensus group methods such as Delphi and Nominal Group in medical education research. Med Teach. 2017;39:14–9.
[21] Harvey N, Holmes CA. Nominal group technique: an effective method for obtaining group consensus. Int J Nurs Pract. 2012;18:188–94.
[22] McMillan SS, King M, Tully MP. How to use the nominal group and Delphi techniques. Int J Clin Pharm. 2016;38:655–62.
[23] Mannes AE, Soll JB, Larrick RP. The wisdom of select crowds. J Pers Soc Psychol. 2014;107:276–99.
[24] Surowiecki J. The Wisdom of Crowds. Anchor; 2005.
[25] Jorm AF. Using the Delphi expert consensus method in mental health research. Aust N Z J Psychiatry. 2015;49:887–97.
[26] Delbecq AL, Van de Ven AH, Gustafson DH. Group Techniques for Program Planning: A Guide to Nominal Group and Delphi Processes. Scott, Foresman; 1975.
[27] Bradley F, Schafheutle EI, Willis SC, Noyce PR. Changes to supervision in community pharmacy: pharmacist and pharmacy support staff views. Health Soc Care Community. 2013;21:644–54.
[28] Humphrey-Murto S, Varpio L, Wood TJ, et al. The use of the Delphi and other consensus group methods in medical education research: a review. Acad Med. 2017;92:1491–8.
[29] Fitch K, Bernstein SJ, Aguilar MD, et al. The RAND/UCLA Appropriateness Method User’s Manual. CA: Rand Corp Santa Monica; 2001.
[30] Foth T, Efstathiou N, Vanderspank-Wright B, et al. The use of Delphi and Nominal Group Technique in nursing education: a review. Int J Nurs. 2016;60:112–20.
[31] Boulkedid R, Abdoul H, Loustau M, et al. Using and reporting the Delphi method for selecting healthcare quality indicators: a systematic review. PLoS One. 2011;6:e20476.
[32] Diamond IR, Grant RC, Feldman BM, et al. Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol. 2014;67:401–9.
[33] Varndell W, Fry M, Lutze M, et al. Use of the Delphi method to generate guidance in emergency nursing practice: a systematic review. Int Emerg Nurs. 2020;100867.
[34] Asselin M, Harper M. Revisiting the Delphi technique: implications for nursing professional development. J Nurses Prof Dev. 2014;30:11–5.
[35] Trevelyan EG, Robinson PN. Delphi methodology in health research: how to do it? Eur J Integr Med. 2015;7:423–8.
[36] Morisset J, Johannson KA, Jones KD, et al. Identification of diagnostic criteria for chronic hypersensitivity pneumonitis. An international modified Delphi survey. Am J Respir Crit. 2018;197:1036–44.
[37] Feo R, Conroy T, Jangland E, et al. Towards a standardised definition for fundamental care: a modified Delphi study. J Clin Nurs. 2018;27:2285–99.
[38] Toronto C. Considerations when conducting e-Delphi research: a case study. Nurse Res. 2017;25:10–5.
[39] Turoff M. The design of a policy Delphi. Technol Forecast Soc Change. 1970;2:149–71.
[40] Turoff M, Linstone HA. The Delphi method-techniques and applications. 2002. Available at: http://www.foresight.pl/assets/downloads/publications/Turoff_Linstone.pdf.
[41] de Loë RC, Melnychuk N, Murray D, et al. Advancing the state of policy Delphi practice: a systematic review evaluating methodological
evolution, innovation, and opportunities. Technol Forecast Soc Change. 2016;104:78–88.
[42] Donohoe H, Stellefson M, Tennant B. Advantages and limitations of the e-Delphi technique: implications for health education researchers. Am J Health Educ. 2012;43:38–46.
[43] Gill FJ, Leslie GD, Grech C, et al. Using a web-based survey tool to undertake a Delphi study: application for nurse education research. Nurse Educ Today. 2013;33:1322–8.
[44] Ingham-Broomfield JR. A nurses’ guide to the hierarchy of research designs and evidence. Aust J Adv Nurs. 2016;33:38–43.
[45] Schmalz U, Spinler S, Ringbeck J. Lessons learned from a two-round Delphi-based scenario study. MethodsX. 2021;8:101179.
[46] Isla D, González-Rojas N, Nieves D, et al. Treatment patterns, use of resources, and costs of advanced non-small-cell lung cancer patients in Spain: results from a Delphi panel. Clin Transl Oncol. 2011;13:460–71.
[47] Ab Latif R, Dahlan A, Mulud ZA, et al. The Delphi technique as a method to obtain consensus in health care education research. EIMJ. 2017;9.
[48] Schneider Z, Whitehead D, LoBiondo-Wood G, et al. Nursing and Midwifery Research: Methods and Appraisal for Evidence Based Practice. Elsevier; 2016.
[49] Devaney L, Henchion M. Who is a Delphi “expert?” Reflections on a bioeconomy expert selection procedure from Ireland. Futures. 2018;99:45–55.
[50] Keeney S, Hasson F, McKenna HP. A critical review of the Delphi technique as a research methodology for nursing. Int J Nurs. 2001;38:195–200.
[51] Manzano-García G, Ayala J-C. Insufficiently studied factors related to burnout in nursing: results from an e-Delphi study. PLoS One. 2017;12:e0175352.
[52] Cheung KL, de Ruijter D, Hiligsmann M, et al. Exploring consensus on how to measure smoking cessation. A Delphi study. BMC Public Health. 2017;17.
[53] Salgado TM, Fedrigon A, Omichinski DR, et al. Identifying medication management smartphone app features suitable for young adults with developmental disabilities: Delphi consensus study. JMIR mHealth uHealth. 2018;6.
[54] Grisham T. The Delphi technique: a method for testing complex and multifaceted topics. Int J Manag. 2009.
[55] Bloor M, Sampson H, Baker S, et al. Useful but no Oracle: reflections on the use of a Delphi group in a multi-methods policy research study. Qual Res. 2015;15:57–70.
[56] Cuhls K, Kuwahara T. Outlook for Japanese and German Future Technology: Comparing Technology Forecast Surveys. Springer Science & Business Media; 2012.
[57] Paul CL. A modified delphi approach to a new card sorting methodology. J Usabil Stud. 2008;4:7–30.
[58] Akins RB, Tolson H, Cole BR. Stability of response characteristics of a Delphi panel: application of bootstrap data expansion. BMC Med Res Methodol. 2005;5:37.
[59] Powell C. The Delphi technique: myths and realities. J Adv Nurs. 2003;41:376–82.
[60] Vernon W. The Delphi technique: a review. Int J Ther Rehabil. 2009;16:69–76.
[61] Skulmoski GJ, Hartman FT, Krahn J. The Delphi method for graduate research. J Inf Technol. 2007;6:1–21.
[62] Keeney S, McKenna H, Hasson F. The Delphi Technique in Nursing and Health Research. John Wiley & Sons; 2011.
[63] Hejblum G, Ioos V, Vibert J-F, et al. A web-based Delphi study on the indications of chest radiographs for patients in ICUs. Chest. 2008;133:1107–12.
[64] Stedman RC, Connelly NA, Heberlein TA, et al. The end of the (research) world as we know it? Understanding and coping with declining response rates to mail surveys. Soc Nat Resour. 2019;32:1139–54.
[65] Campbell RM, Venn TJ, Anderson NM. Cost and performance tradeoffs between mail and internet survey modes in a nonmarket valuation study. J Environ Manage. 2018;210:316–27.
[66] Giannarou L, Zervas E. Using Delphi technique to build consensus in practice. Int J Bus Sci Appl Manag. 2014;9:65–82.
[67] Chyung SY, Roberts K, Swanson I, et al. Evidence-based survey design: the use of a midpoint on the Likert scale. Perform Improv. 2017;56:15–23.
[68] Johns R. One size doesn’t fit all: selecting response scales for attitude items. J Elect Public Opin Parties. 2005;15:237–64.
[69] Kulas JT, Stachowski AA. Respondent rationale for neither agreeing nor disagreeing: person and item contributors to middle category endorsement intent on Likert personality indicators. J Res Pers. 2013;47:254–62.
[70] Gargon E, Crew R, Burnside G, et al. Higher number of items associated with significantly lower response rates in COS Delphi surveys. J Clin Epidemiol. 2019;108:110–20.
[71] Becker GE, Roberts T. Do we agree? Using a Delphi technique to develop consensus on skills of hand expression. J Hum Lact. 2009;25:220–5.
[72] Grant S, Booth M, Khodyakov D. Lack of preregistered analysis plans allows unacceptable data mining for and selective reporting of consensus in Delphi studies. J Clin Epidemiol. 2018;99:96–105.
[73] Keeney S, Hasson F, McKenna H. Consulting the oracle: ten lessons from using the Delphi technique in nursing research. J Adv Nurs. 2006;53:205–12.
[74] von der Gracht HA. Consensus measurement in Delphi studies: review and implications for future quality assurance. Technol Forecast Soc Change. 2012;79:1525–36.
[75] Holey EA, Feeley JL, Dixon J, et al. An exploration of the use of simple statistics to measure consensus and stability in Delphi studies. BMC Med Res Methodol. 2007;7:52.
[76] Gnatzy T, Warth J, von der Gracht H, et al. Validating an innovative real-time Delphi approach - a methodological comparison between real-time and conventional Delphi studies. Technol Forecast Soc Change. 2011;78:1681–94.
[77] Birko S, Dove ES, Özdemir V. Evaluation of nine consensus indices in Delphi foresight research and their dependency on Delphi survey characteristics: a simulation study and debate on Delphi design and interpretation. PLoS One. 2015;10:e0135162.
[78] Dajani JS, Sincoff MZ, Talley WK. Stability and agreement criteria for the termination of Delphi studies. Technol Forecast Soc Change. 1979;13:83–90.
[79] Green B, Jones M, Hughes D, et al. Applying the Delphi technique in a study of GPs’ information requirements. Health Soc Care Community. 1999;7:198–205.
[80] Griffiths SV, Conway DH, Javier BF, et al. What are the optimum components in a care bundle aimed at reducing post-operative pulmonary complications in high-risk patients? Perioper Med. 2018;7:7.
[81] Huijben JA, Wiegers EJA, de Keizer NF, et al. Development of a quality indicator set to measure and improve quality of ICU care for patients with traumatic brain injury. Crit Care. 2019;23:95.
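As a supplementary illustration of the dispersion-based consensus rule described in the data analysis discussion (an IQR of 1 or less on a 4- or 5-point Likert scale, and 2 or less on a 10-point scale), the following is a minimal Python sketch and not a prescribed procedure. The function names, the threshold table and the two sets of panel ratings are hypothetical and chosen only for illustration. The quartile method (linear interpolation) is also an assumption, since different quartile conventions can yield slightly different IQR values.

```python
from statistics import quantiles, stdev

# Thresholds quoted in the text: IQR <= 1 on a 4- or 5-point Likert scale,
# IQR <= 2 on a 10-point scale. Other scale lengths are not covered there.
IQR_THRESHOLDS = {4: 1, 5: 1, 10: 2}


def interquartile_range(ratings):
    """Spread of the middle 50% of ratings (Q3 - Q1).

    Assumption: quartiles use linear interpolation ("inclusive" method);
    other quartile conventions can give slightly different values.
    """
    q1, _median, q3 = quantiles(ratings, n=4, method="inclusive")
    return q3 - q1


def reaches_consensus(ratings, scale_points):
    """Apply the IQR rule of thumb for the given Likert scale length."""
    if scale_points not in IQR_THRESHOLDS:
        raise ValueError(f"no IQR threshold stated for a {scale_points}-point scale")
    return interquartile_range(ratings) <= IQR_THRESHOLDS[scale_points]


# Hypothetical ratings from a 10-member panel on a 5-point scale.
item_agreed = [4, 4, 5, 4, 5, 4, 4, 5, 4, 4]
item_split = [1, 2, 5, 5, 1, 4, 2, 5, 1, 4]

for name, ratings in [("item_agreed", item_agreed), ("item_split", item_split)]:
    print(
        name,
        "sample SD:", round(stdev(ratings), 2),  # shown for comparison only
        "IQR:", interquartile_range(ratings),
        "consensus:", reaches_consensus(ratings, 5),
    )
```

The standard deviation is printed only for comparison, since the text cautions against relying on it for ordinal Likert data. Whatever rule is used, the threshold should be fixed and reported before the data are analysed, in line with the recommendation to define consensus a priori.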