
Mokkink et al. BMC Medical Research Methodology 2010, 10:22
http://www.biomedcentral.com/1471-2288/10/22

CORRESPONDENCE Open Access

The COSMIN checklist for evaluating the methodological quality of studies on measurement properties: A clarification of its content

Lidwine B Mokkink1*, Caroline B Terwee1†, Dirk L Knol1, Paul W Stratford2, Jordi Alonso3,4, Donald L Patrick5, Lex M Bouter1,6, Henrica CW de Vet1†

Abstract
Background: The COSMIN checklist (COnsensus-based Standards for the selection of health status Measurement
INstruments) was developed in an international Delphi study to evaluate the methodological quality of studies on
measurement properties of health-related patient reported outcomes (HR-PROs). In this paper, we explain our
choices for the design requirements and preferred statistical methods for which no evidence is available in the
literature or on which the Delphi panel members had substantial discussion.
Methods: The issues described in this paper are a reflection of the Delphi process in which 43 panel members
participated.
Results: The topics discussed are internal consistency (relevance for reflective and formative models, and the distinction with unidimensionality), content validity (judging relevance and comprehensiveness), hypotheses testing as an aspect of construct validity (specificity of hypotheses), criterion validity (relevance for PROs), and responsiveness (the concept, its relation to validity, and (in)appropriate measures).
Conclusions: We expect that this paper will contribute to a better understanding of the rationale behind the
items, thereby enhancing the acceptance and use of the COSMIN checklist.

Background
For the measurement of health-related patient-reported outcomes (HR-PROs) it is important to evaluate the methodological quality of studies in which the measurement properties of these instruments are assessed. When studies on measurement properties have good methodological quality, their conclusions are more trustworthy. A checklist containing standards for design requirements and preferred statistical methods is a useful tool for this purpose. However, there is not much empirical evidence for the content of such a tool. A Delphi study is a useful study design in fields lacking empirical evidence. It is particularly valued for its ability to structure and organize group communication [1].
In an international Delphi study we developed the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist for evaluating the methodological quality of studies on measurement properties [2,3]. The checklist contains twelve boxes. Ten boxes can be used to assess whether a study meets the standards for good methodological quality. Nine of these boxes contain standards for the included measurement properties (internal consistency, reliability, measurement error, content validity (including face validity), structural validity, hypotheses testing, and cross-cultural validity (these three are aspects of construct validity), criterion validity, and responsiveness), and one box contains standards for studies on interpretability. In addition, two boxes are included in the checklist that contain general requirements for articles in which IRT methods are applied (IRT box) and general requirements for the generalisability of the results (Generalisability box), respectively. More information on how to use the COSMIN checklist can be found elsewhere [4].

* Correspondence: [email protected]
† Contributed equally
1 Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands

© 2010 Mokkink et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.

The checklist can be used, for example, in a systematic review of measurement properties, in which the quality of studies on measurement properties of instruments with a similar purpose is assessed, and the results of those studies are compared with a view to selecting the best instrument. If the results of high-quality studies differ from the results of low-quality studies, this can be an indication of bias. Consequently, instrument selection should be based on the high-quality studies. The COSMIN checklist can also be used as guidance for designing or reporting a study on measurement properties. Furthermore, students can use the checklist when learning about measurement properties, and reviewers or editors of journals can use it to appraise the methodological quality of studies on measurement properties. Note that the COSMIN checklist is not a checklist for evaluating the quality of a HR-PRO itself, but the methodological quality of studies on its measurement properties.
As a foundation for the content of the checklist, we developed a taxonomy of all included measurement properties, and reached international consensus on terminology and definitions of measurement properties [5]. The focus of the checklist is on studies on measurement properties of HR-PROs used in an evaluative application, i.e. longitudinal assessment of treatment effects or changes in health over time.
In this paper, we provide a clarification of some parts of the COSMIN checklist. We explain our choices for the included design requirements and preferred statistical methods for which no evidence is available in the literature or which generated substantial discussion among the members of the Delphi panel. The topics discussed in detail below are internal consistency, content validity, hypotheses testing as an aspect of construct validity, criterion validity, and responsiveness.

Internal Consistency
Internal consistency was defined as the interrelatedness among the items [5]. Its standards are given in Figure 1. The discussion was about the relevance of internal consistency for reflective models and formative models, and about the distinction between internal consistency and unidimensionality.
The Delphi panel reached consensus that the internal consistency statistic only gets an interpretable meaning when (1) the interrelatedness among the items is determined for a set of items that together form a reflective model, and (2) all items tap the same construct, i.e., they form a unidimensional (sub)scale [6,7].

Figure 1 Box A. Internal consistency.
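The consensus point above can be illustrated numerically with Cronbach's alpha, a common internal consistency statistic. The sketch below uses invented item scores purely for illustration; it is not part of the COSMIN standards.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale; `items` is a list of columns,
    one list of respondent scores per item."""
    k = len(items)
    total = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(total))

# Items that covary, consistent with a reflective, unidimensional scale:
alpha_reflective = cronbach_alpha([[1, 2, 3, 4, 5],
                                   [2, 3, 4, 5, 6],
                                   [1, 3, 3, 5, 5]])

# Items that do not covary, as in a formative model (e.g. life events):
alpha_formative = cronbach_alpha([[1, 2, 3, 4, 5],
                                  [3, 1, 5, 2, 4],
                                  [5, 4, 1, 3, 2]])
```

The first set yields a high alpha, the second a low (here even negative) one, which is why alpha is only interpretable for reflective, unidimensional scales. Note that a high alpha by itself does not demonstrate unidimensionality, as discussed below.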



A reflective model is a model in which all items are a manifestation of the same underlying construct [8,9]. These items are called effect indicators and are expected to be highly correlated and interchangeable [9]. Its counterpart is a formative model, in which the items together form a construct [8]. These items do not need to be correlated. Therefore, internal consistency is not relevant for items that form a formative model. For example, stress could be measured by asking about the occurrence of different situations and events that might lead to stress, such as job loss, a death in the family, or divorce. These events obviously do not need to be correlated, so internal consistency is not relevant for such an instrument. Often, authors do not explicitly describe whether their HR-PRO is based on a reflective or a formative model. To decide afterwards which model was used, one can do a simple "thought test": consider whether all item scores are expected to change when the construct changes. If yes, a reflective model is at issue. If not, the HR-PRO instrument is probably based on a formative model [8].
For an internal consistency statistic to get an interpretable meaning, the scale needs to be unidimensional. Unidimensionality of a scale can be investigated with, e.g., a factor analysis, but not with an assessment of internal consistency [8]. Rather, unidimensionality of a scale is a prerequisite for a clear interpretation of the internal consistency statistics [6,7].

Content validity
Content validity was defined as the degree to which the content of a HR-PRO instrument is an adequate reflection of the construct to be measured [5] (see Figure 2 for its standards). The discussion was about how to evaluate content validity.
The Delphi panel agreed that content validity should be assessed by making a judgment about the relevance and the comprehensiveness of the items. The relevance of the items should be assessed by judging whether the items are relevant for the construct to be measured (D1), for the study population (D2), and for the purpose of the HR-PRO (D3). When a new HR-PRO is developed, the focus and detail of the content of the instrument should match the target population (D2). When the instrument is subsequently used in a population other than the original target population for which it was developed, it should be assessed whether all items are relevant for this new study population (D2). For example, a questionnaire measuring shoulder disability (i.e., the Shoulder Disability Questionnaire [10]) may include the item "my shoulder hurts when I bring my hand towards the back of my head". When one decides to use this questionnaire in a population of patients with wrist problems to measure wrist disability, one could not simply change the word "shoulder" into "wrist", because this item might not be relevant for patients with wrist problems. Moreover, an item like "Do you have difficulty with the grasping and use of small objects such as keys or pens?" [11] will probably not be included in a questionnaire for shoulder disability, while it is clearly relevant to ask patients with wrist problems.
Experts should judge the relevance of the items for the construct (D1), for the patient population (D2), and for the purpose (D3). Because the focus is on PROs, patients should be considered as experts when judging the relevance of the items for the patient population (D2). In addition, many missing observations on an item can be an indication that the item is not relevant for the population, or that it is ambiguously formulated.

Figure 2 Box D. Content validity (including face validity).
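The missing-observations signal mentioned above is straightforward to screen for. A minimal sketch: the item names, response data, and the 20% flag threshold are all invented for illustration, not COSMIN standards.

```python
# responses: one dict per respondent; None marks a skipped item
responses = [
    {"pain_reaching": 4, "pain_night": 2, "grip_small_objects": None},
    {"pain_reaching": 3, "pain_night": None, "grip_small_objects": None},
    {"pain_reaching": 5, "pain_night": 3, "grip_small_objects": 1},
]

items = sorted({key for r in responses for key in r})
missing_rate = {
    item: sum(r.get(item) is None for r in responses) / len(responses)
    for item in items
}
# Items skipped by many respondents are flagged for a review of their
# relevance to the population and their wording
flagged = [item for item, rate in missing_rate.items() if rate > 0.20]
```

Such a screen only flags candidates; judging why an item is skipped (irrelevance versus ambiguous wording) remains a matter for patients and experts.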



To assess the comprehensiveness of the items (D4), three aspects should be taken into account: the content coverage of the items, the description of the domains, and the theoretical foundation. The first two aspects refer to the question whether all relevant aspects of the construct are covered by the items and the domains. The theoretical foundation refers to the availability of a clear description of the construct and of the theory on which it is based. Part of this theoretical foundation could be a description of how different constructs within a concept are interrelated, for instance as described in the model of health status of Wilson and Cleary [12] or the International Classification of Functioning, Disability and Health (ICF) model [13]. An indication that the comprehensiveness of the items was assessed could be that patients or experts were asked whether they missed items. Large floor and ceiling effects can be an indication that a scale is not comprehensive.

Construct validity
Construct validity is the degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured [5]. It contains three aspects, i.e. structural validity, which concerns the internal relationships, and hypotheses testing and cross-cultural validity, which both concern the relationships to scores of other instruments or differences between relevant groups.

Hypotheses testing
The standards for hypotheses testing are given in Figure 3. The discussion was about how specific the formulated hypotheses should be.
Hypotheses testing is an ongoing, iterative process [14]. Specific hypotheses should include an indication of the expected direction and magnitude of correlations or differences. Hypotheses testing is about whether the direction and magnitude of a correlation or difference is similar to what could be expected based on the construct(s) being measured. The more hypotheses are tested on whether the data correspond to a priori formulated hypotheses, the more evidence is gathered for construct validity.

Figure 3 Box F. Hypotheses testing.
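The required specificity can be made concrete in code: the hypotheses, including expected direction and magnitude, are written down before seeing the data and then checked against the observed correlations. The scores and the 0.70 / 0.10 thresholds below are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Invented scores on the instrument under study and two comparator measures
instrument = [1, 2, 3, 4, 5]
measure_b = [1.1, 2.0, 2.9, 4.2, 5.0]  # intended to tap the same construct
measure_c = [2, 1, 4, 3, 5]            # a related but more distant construct

# A priori hypotheses: a positive correlation of at least 0.70 with B,
# and a correlation at least 0.10 higher with B than with C
r_b = pearson(instrument, measure_b)
r_c = pearson(instrument, measure_c)
hypothesis_1 = r_b > 0.70
hypothesis_2 = (r_b - r_c) >= 0.10
```

Note that the check compares observed magnitudes with pre-specified expectations; no significance test against zero is involved.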



The Delphi panel agreed that specific hypotheses to be tested should be formulated a priori (F4) about expected mean differences between known groups or expected correlations between the scores on the instrument and other variables, such as scores on other instruments or demographic or clinical variables. The expected direction (positive or negative) (F5) and magnitude (absolute or relative) (F6) of the correlations or differences should be included in the hypotheses (e.g. [14-17]).
For example, an investigator may theorize that two HR-PROs intended to assess the same construct should correlate. Therefore, the investigator would test whether the observed correlation equals the expected correlation (e.g. > 0.70). The hypotheses may also concern the relative magnitude of correlations, for example "it is expected that the score on measure A correlates higher (e.g. 0.10 higher) with the score on measure B than with the score on measure C".
A hypothesis can also concern differences in scores between groups. When assessing differences between groups, it is less relevant whether these differences are statistically significant (which depends on the sample size) than whether they have the expected magnitude. For example, an investigator may theorize, based on previous evidence, that persons off work with low back pain have more pain-related disability than persons working with low back pain. Accordingly, an instrument measuring pain-related disability would be valid in this context if it is capable of distinguishing these two groups. However, it is preferable to specify a minimally important between-group difference. The Delphi panel recommended that p-values should be avoided in the hypotheses, because it is not relevant to examine whether correlations or differences differ statistically from zero [18]. The size of the difference is more important than the statistical significance of the difference between the groups, since the latter depends on the number of subjects in each group. Formal hypotheses testing should preferably be based on the expected magnitude of correlations and differences, rather than on p-values.
When hypotheses are formulated about expected relations with other instruments, these comparator instruments should be appropriately described (F7). For example, if the comparator instrument intends to measure physical activity (PA), it should be described which construct exactly it aims to measure. Some PA instruments aim to measure total energy expenditure, while others focus on the duration, frequency, or type of physical activities [19]. Ideally, the measurement properties of the comparator instruments should have been assessed in the same language version and the same patient population as used in the study.

Criterion validity
Criterion validity was defined as the degree to which the scores of a HR-PRO instrument are an adequate reflection of a "gold standard" [5]. The criterion used should be considered a reasonable "gold standard" (H4). The Delphi panel reached consensus that no gold standards exist for HR-PRO instruments, and discussed whether criterion validity should be mentioned at all in the COSMIN checklist. The panel decided that the only exception is when a shortened instrument is compared to the original long version: in that case, the original long version can be considered the gold standard. Often, authors wrongly consider their comparator instrument a gold standard, for example when they compare the scores of a new instrument to a widely used instrument like the SF-36. When a new instrument is compared to the SF-36, we consider this construct validation, and hypotheses about the expected magnitude and direction of the correlations between (subscales of) the instruments should be formulated and tested.

Responsiveness
The discussion on responsiveness was about the concept of responsiveness, (in)appropriate methods to evaluate responsiveness, and its relationship with validity. In the COSMIN study, responsiveness was defined as the ability of a HR-PRO instrument to detect change over time in the construct to be measured [5]. Although the Delphi panel wanted to discuss responsiveness as a separate measurement property, the panel agreed that the only difference between cross-sectional (construct and criterion) validity and responsiveness is that validity refers to the validity of a single score, whereas responsiveness refers to the validity of a change score [5]. Therefore, the panel decided that the standards for responsiveness should be analogous to the standards for construct and criterion validity. As with criterion validity, it was agreed that no gold standards exist for change scores on HR-PROs, with the exception that change on the original longer version of a HR-PRO can be considered a gold standard when it is compared to change on its shorter version.
Appropriate measures to evaluate responsiveness are the same as those for hypotheses testing and criterion validity, with the only difference that the hypotheses should focus on the change score of an instrument. For example, De Boer et al. assessed the responsiveness of the Low Vision Quality of Life questionnaire (LVQOL) and the Vision-Related Quality of Life Core Measure (VCM1) by testing pre-specified hypotheses about the relations of changes in these questionnaires with changes in other measures in patients with irreversible vision loss [20]. They hypothesized, for example, that 'the correlation of change on the LVQOL/VCM1 with change on the Visual Functioning questionnaire (VF-14) is higher than the correlation with the global rating scale, change in visual acuity and change on the Euroqol thermometer'. After calculating the correlations between the change scores on the different measurement instruments, they concluded whether the correlations were as expected.
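The De Boer-style check described above applies the same hypothesis-testing logic to change scores. A sketch, with all pre/post scores invented for illustration:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def change(pre, post):
    """Per-patient change scores."""
    return [b - a for a, b in zip(pre, post)]

# Invented pre/post scores for the instrument under study and two comparators
d_instrument = change([10, 12, 15, 9, 14], [14, 13, 19, 12, 15])
d_similar = change([20, 25, 28, 18, 26], [27, 27, 35, 23, 28])  # similar construct
d_distant = change([3, 4, 2, 5, 4], [4, 4, 3, 5, 6])            # distant construct

# Pre-specified hypothesis: change on the instrument correlates higher with
# change on the similar-construct measure than with the distant one
r_similar = pearson(d_instrument, d_similar)
r_distant = pearson(d_instrument, d_distant)
hypothesis_holds = r_similar > r_distant
```

As in the cross-sectional case, the conclusion rests on whether the observed pattern of change-score correlations matches the a priori expectations, not on significance tests.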

There are a number of parameters proposed in the literature to assess responsiveness that the Delphi panel considers inappropriate. The panel reached consensus that the use of effect sizes (mean change score / SD of baseline scores) [21] and related measures, such as the standardised response mean (mean change score / SD of change scores) [22], Norman's responsiveness coefficient (σ²change / (σ²change + σ²error)) [23], and the relative efficacy statistic ((t-statistic1 / t-statistic2)²) [24], are inappropriate measures of responsiveness. The paired t-test was also considered inappropriate, because it is a measure of significant change instead of valid change, and it depends on the sample size of the study [18]. These measures are considered measures of the magnitude of change due to an intervention or other event, rather than measures of the quality of the measurement instrument [25,26]. Guyatt's responsiveness ratio (MIC / SD of change scores of stable patients) [27] was also considered inappropriate, because it takes the minimal important change into account. The Delphi panel agreed that the minimal important change concerns the interpretation of the change score, but not the validity of the change score.

Discussion
In this article, we explained our choices for the design requirements and preferred statistical methods for which no evidence is available in the literature or which generated major discussions among the members of the Delphi study during the development of the COSMIN checklist. However, within the four rounds of the Delphi study, two issues could not be discussed extensively due to lack of time. These issues concerned factor analyses (mentioned in Box A internal consistency and Box E structural validity) and minimal important change (mentioned in Box J interpretability).
The Delphi panel decided that the evaluation of structural validity can be done either by explorative factor analysis or by confirmatory factor analysis. However, confirmatory factor analysis is preferred over explorative factor analysis, because confirmatory factor analysis tests whether the data fit an a priori hypothesized factor structure [28], while explorative factor analysis can be used when no clear hypotheses exist about the underlying dimensions [28]. Such an explorative factor analysis is not a strong tool for hypothesis testing. In the COSMIN study we did not discuss specific requirements for factor analyses, such as the choice of the explorative factor analysis (principal component analysis or common factor analysis), the choice and justification of the rotation method (e.g. orthogonal or oblique rotation), or the decision about the number of relevant factors. Such specific requirements are described by, e.g., Floyd & Widaman [28] and De Vet et al. [29].
In the Delphi panel it was discussed that, in a study evaluating the interpretability of scores of an HR-PRO instrument, the minimal important change (MIC) or minimal important difference (MID) should be determined. The MIC is the smallest change in score in the construct to be measured which patients perceive as important. The MID is the smallest difference in the construct to be measured between patients that is considered important [30]. Since we are talking about patient-reported outcomes, the agreement among panel members was that the patients should be the ones to decide what is important. In the literature there is an ongoing discussion about which methods should be used to determine the MIC or MID of a HR-PRO instrument [31]. Consequently, the opinions of the panel members differed widely, and within the COSMIN study no consensus on standards for assessing the MIC could be reached.
The results of a Delphi study depend on the composition of the panel. The panel members do not need to be randomly selected to represent a target population. Rather, experts are chosen because of their knowledge of the topic of interest [32,33]. It has been noted that heterogeneous groups produce a higher proportion of high-quality, highly acceptable solutions than homogeneous groups [1]. Furthermore, anonymity of the panel members is often recommended, because it provides an equal chance for each panel member to present and react to ideas unbiased by the identities of other participants [34]. Both issues were ensured in this Delphi study. We included experts in the fields of psychology, epidemiology, statistics and clinical medicine. The panel members did not know who the other panel members were. All questionnaires were analysed and reported back anonymously. Only one of the researchers (LM) had access to this information.
The COSMIN Delphi study focussed on assessing the methodological quality of studies on measurement properties of existing HR-PROs. However, we think that the discussions described above and the COSMIN checklist itself are also relevant and applicable for researchers who are developing HR-PROs. The COSMIN checklist can be a useful tool for designing a study on measurement properties.

Conclusions
In conclusion, as there is not much empirical evidence for standards for the assessment of measurement properties, we consider the Delphi technique the most appropriate method to develop a checklist on the methodological quality of studies on measurement properties. Within this Delphi study we have had many interesting discussions, and reached consensus on a number of important issues about the assessment of measurement properties. We expect that this paper will contribute to a better understanding of the rationale behind the items in the COSMIN checklist, thereby enhancing its acceptance and use.

Acknowledgements
We are grateful to all the panel members who have participated in the COSMIN study: Neil Aaronson, Linda Abetz, Elena Andresen, Dorcas Beaton, Martijn Berger, Giorgio Bertolotti, Monika Bullinger, David Cella, Joost Dekker, Dominique Dubois, Arne Evers, Diane Fairclough, David Feeny, Raymond Fitzpatrick, Andrew Garratt, Francis Guillemin, Dennis Hart, Graeme Hawthorne, Ron Hays, Elizabeth Juniper, Robert Kane, Donna Lamping, Marissa Lassere, Matthew Liang, Kathleen Lohr, Patrick Marquis, Chris McCarthy, Elaine McColl, Ian McDowell, Don Mellenbergh, Mauro Niero, Geoffrey Norman, Manoj Pandey, Luis Rajmil, Bryce Reeve, Dennis Revicki, Margaret Rothman, Mirjam Sprangers, David Streiner, Gerold Stucki, Giulio Vidotto, Sharon Wood-Dauphinee, Albert Wu.
This study was financially supported by the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, and the Anna Foundation, Leiden, the Netherlands. These funding organizations did not play any role in the study design, data collection, data analysis, data interpretation, or publication.

Author details
1 Department of Epidemiology and Biostatistics and the EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, the Netherlands. 2 School of Rehabilitation Science and Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada. 3 Health Services Research Unit, Institut Municipal d'Investigació Mèdica (IMIM-Hospital del Mar), Barcelona, Spain. 4 Centro de Investigación Biomédica en Red de Epidemiología y Salud Pública (CIBERESP), Spain. 5 Department of Health Services, University of Washington, Seattle, USA. 6 Executive Board of VU University Amsterdam, Amsterdam, the Netherlands.

Authors' contributions
CT and HdV secured funding for the study. CT, HdV, LB, DK, DP, JA, and PS conceived the idea for the study. LM and CT prepared all questionnaires for the four Delphi rounds, supervised by HdV, DP, JA, PS, DK and LB. LM, CT, and HdV interpreted the data. LM coordinated the study and managed the data. CT, DP, JA, PS, DK, LB and HdV supervised the study. LM wrote the manuscript with input from all the authors. All authors read and approved the final version of the report.

Competing interests
The authors declare that they have no competing interests.

Received: 18 August 2009 Accepted: 18 March 2010 Published: 18 March 2010

References
1. Powell C: The Delphi technique: myths and realities. J Adv Nurs 2003, 41:376-382.
2. Mokkink LB, Terwee CB, Knol DL, Stratford PW, Alonso J, Patrick DL, Bouter LM, De Vet HCW: Protocol of the COSMIN study: COnsensus-based Standards for the selection of health Measurement INstruments. BMC Med Res Methodol 2006, 6:2.
3. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, De Vet HCW: The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010.
4. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, De Vet HCW: The COSMIN checklist manual. [http://cosmin.nl].
5. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, De Vet HCW: International consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes: results of the COSMIN study. J Clin Epidemiol 2010.
6. Cortina JM: What is coefficient alpha? An examination of theory and applications. J Appl Psychology 1993, 78:98-104.
7. Cronbach LJ: Coefficient Alpha and the Internal Structure of Tests. Psychometrika 1951, 16:297-334.
8. Fayers PM, Hand DJ, Bjordal K, Groenvold M: Causal indicators in quality of life research. Qual Life Res 1997, 6:393-406.
9. Streiner DL: Being inconsistent about consistency: when coefficient alpha does and doesn't matter. J Pers Assess 2003, 80:217-222.
10. Van der Heijden GJ, Leffers P, Bouter LM: Shoulder disability questionnaire design and responsiveness of a functional status measure. J Clin Epidemiol 2000, 53:29-38.
11. Levine DW, Simmons BP, Koris MJ, Daltroy LH, Hohl GG, Fossel AH, Katz JN: A self-administered questionnaire for the assessment of severity of symptoms and functional status in carpal tunnel syndrome. J Bone Joint Surg Am 1993, 75:1585-1592.
12. Wilson IB, Cleary PD: Linking clinical variables with health-related quality of life. A conceptual model of patient outcomes. JAMA 1995, 273:59-65.
13. World Health Organization: ICF: international classification of functioning, disability and health. Geneva: World Health Organization 2001.
14. Strauss ME, Smith GT: Construct Validity: Advances in Theory and Methodology. Annu Rev Clin Psychol 2008.
15. Cronbach LJ, Meehl PE: Construct validity in psychological tests. Psychol Bull 1955, 52:281-302.
16. McDowell I, Newell C: Measuring health. A guide to rating scales and questionnaires. New York, NY: Oxford University Press, 2 1996.
17. Messick S: The standard problem. Meaning and values in measurement and evaluation. American Psychologist 1975, 955-966.
18. Altman DG: Practical statistics for medical research. London: Chapman & Hall/CRC 1991.
19. Terwee CB, Mokkink LB, Van Poppel MNM, Chinapaw MJM, Van Mechelen W, De Vet HCW: Qualitative attributes and measurement properties of physical activity questionnaires: a checklist. Accepted for publication in Sports Med.
20. De Boer MR, Terwee CB, De Vet HC, Moll AC, Völker-Dieben HJ, Van Rens GH: Evaluation of cross-sectional and longitudinal construct validity of two vision-related quality of life questionnaires: the LVQOL and VCM1. Qual Life Res 2006, 15:233-248.
21. Cohen J: Statistical power analysis for the behavioural sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, 2 1988.
22. McHorney CA, Tarlov AR: Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res 1995, 4:293-307.
23. Norman GR: Issues in the use of change scores in randomized trials. J Clin Epidemiol 1989, 42:1097-1105.
24. Stockler MR, Osoba D, Goodwin P, Corey P, Tannock IF: Responsiveness to change in health-related quality of life in a randomized clinical trial: a comparison of the Prostate Cancer Specific Quality of Life Instrument (PROSQOLI) with analogous scales from the EORTC QLQ-C30 and a trial specific module. European Organization for Research and Treatment of Cancer. J Clin Epidemiol 1998, 51:137-145.
25. Streiner DL, Norman GR: Health measurement scales. A practical guide to their development and use. Oxford: University Press, 4 2008.
26. Terwee CB, Dekker FW, Wiersinga WM, Prummel MF, Bossuyt PM: On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res 2003, 12:349-362.
27. Guyatt GH, Walter S, Norman GR: Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987, 40:171-178.
28. Floyd FJ, Widaman KF: Factor analysis in the development and refinement of clinical assessment instruments. Psychological Assessment 1995, 7:286-299.
29. De Vet HC, Ader HJ, Terwee CB, Pouwer F: Are factor analytical techniques used appropriately in the validation of health status questionnaires? A systematic review on the quality of factor analysis of the SF-36. Qual Life Res 2005, 14:1203-1218.
30. De Vet H, Beckerman H, Terwee CB, Terluin B, Bouter LM, for the clinimetrics working group: Definition of clinical differences. Letter to the Editor. J Rheumatol 2006, 33:434.
31. Revicki DA, Hays RD, Cella DF, Sloan JA: Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008, 61:102-109.
32. Keeney S, Hasson F, McKenna HP: A critical review of the Delphi technique as a research methodology for nursing. Int J Nurs Stud 2001, 38:195-200.
33. Hasson F, Keeney S, McKenna H: Research guidelines for the Delphi survey technique. J Adv Nurs 2000, 32:1008-1015.
34. Goodman CM: The Delphi technique: a critique. J Adv Nurs 1987, 12:729-734.

Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/10/22/prepub

doi:10.1186/1471-2288-10-22
Cite this article as: Mokkink et al.: The COSMIN checklist for evaluating
the methodological quality of studies on measurement properties: A
clarification of its content. BMC Medical Research Methodology 2010 10:22.
