
Beaton2005 Quick Dash

This document describes the development of a shortened version of the DASH outcome measure called the QuickDASH. Researchers evaluated three approaches to reducing the 30-item DASH to 11 items while maintaining its measurement properties. The approaches were a concept-retention method, equidiscriminative item-total correlation, and Rasch modeling. All three 11-item versions showed good reliability and validity when tested on patients with upper extremity disorders. The concept-retention version was found to best retain similarities to the original DASH and is therefore named the QuickDASH.


COPYRIGHT © 2005 BY THE JOURNAL OF BONE AND JOINT SURGERY, INCORPORATED

Development of the QuickDASH: Comparison of Three Item-Reduction Approaches
BY DORCAS E. BEATON, BScOT, MSc, PhD, JAMES G. WRIGHT, MD, MPH, FRCSC, JEFFREY N. KATZ, MD, MS, AND THE UPPER EXTREMITY COLLABORATIVE GROUP*
Investigation performed at the Institute for Work and Health, Toronto, Ontario, Canada

Background: The purpose of this study was to develop a short, reliable, and valid measure of physical function and symptoms related to upper-limb musculoskeletal disorders by shortening the full, thirty-item DASH (Disabilities of the Arm, Shoulder and Hand) Outcome Measure.

Methods: Three item-reduction techniques were used on the cross-sectional field-testing data derived from a study of 407 patients with various upper-limb conditions. These techniques were the concept-retention method, the equidiscriminative item-total correlation, and the item response theory (Rasch modeling). Three eleven-item scales were created. Data from a longitudinal cohort study in which the DASH questionnaire was administered to 200 patients with shoulder and wrist/hand disorders were then used to assess the reliability (Cronbach alpha and test-retest reliability) and validity (cross-sectional and longitudinal construct) of the three scales. Results were compared with those derived with the full DASH.

Results: The three versions were comparable with regard to their measurement properties. All had a Cronbach alpha of ≥0.92 and an intraclass correlation coefficient of ≥0.94. Evidence of construct validity was established (r ≥ 0.64 with single-item indices of pain and function). The concept-retention method, the most subjective of the approaches to item reduction, ranked highest in terms of its similarity to the original DASH.

Conclusions: The concept-retention version is named the QuickDASH. It contains eleven items and is similar with regard to scores and properties to the full DASH. A comparison of item-reduction approaches suggested that the retention of clinically sensible and important content produced a comparable, if not slightly better, instrument than did more statistically driven approaches.

Clinical Relevance: The QuickDASH is a more efficient version of the DASH outcome measure that appears to retain its measurement properties.

THE JOURNAL OF BONE & JOINT SURGERY · JBJS.ORG · VOLUME 87-A · NUMBER 5 · MAY 2005

*The Upper Extremity Collaborative Group included Peter Amadio, MD, Claire Bombardier, MD, Donald Cole, MD, MSc, Aileen Davis, BScPT, PhD, Pam Hudak, BScPT, PhD, Robert Marx, MD, MSc, Gillian Hawker, MD, MSc, Matti Makela, MD, and Laura Punnett, DSc.

Patient-based questionnaires are well-accepted means with which to quantify a patient’s perception of the impact of a disorder. Shorter questionnaires are attractive as they save time, are easier to use, and minimize the burden on the respondent and therefore minimize missing data. However, they often sacrifice measurement properties (internal consistency and test-retest reliability) for their brevity. An ideal scale would be one that was as short as possible while retaining the necessary measurement properties.

The Disabilities of the Arm, Shoulder and Hand Outcome Measure (DASH) is a thirty-item questionnaire that quantifies physical function and symptoms in persons with any or multiple musculoskeletal disorders of the upper limb¹⁻³. Direct comparisons with other, more joint-specific or disease-specific measures have shown the DASH to have comparable² or almost comparable⁴ reliability and validity. A major advantage of the DASH is that it can be used for any upper-extremity evaluation and therefore offers more versatility for clinical and research applications.

Shortening the DASH is attractive and sensible provided that its measurement properties are maintained. The thirty-item DASH has been shown in multiple studies to have a high Cronbach alpha (0.97)¹,³, suggesting the possibility of item redundancy. The Spearman-Brown prophecy⁵ is a belief that the cross-sectional reliability of a questionnaire will be reduced by shortening the questionnaire, given fairly consistent inter-item correlations. We wished to ensure that any version of the DASH would have a Cronbach alpha of 0.90 to 0.95. Using a formula for the Spearman-Brown prophecy, we calculated that a minimum of eleven items would be needed to retain an alpha of 0.90, which is the lower boundary of what is considered reasonable for measurement at an individual level (i.e., in a clinical situation)⁵,⁶. The decision to reduce the number of items on an established questionnaire is not a simple one. The risk with regard to precision could result in a weak instrument with which the respondents’ “true” score lies somewhere within a broader range of scores around the observed score. When the Cronbach alpha coefficient is <0.90, this range becomes so large that it is impractical to use the questionnaire for individual patients⁶. Our goal in reducing the size of the DASH was to retain a sufficiently high Cronbach alpha (>0.90) to ensure precision at an individual level⁶, while retaining its reliability and validity.

Several approaches are available for reducing the number of items. The first, which we will call the “concept-retention approach,” is a judgmental approach geared toward retaining the domains or concepts in the original theoretical framework for the instrument³,⁷ and selecting an item or items from each domain. The second approach is equidiscriminative item-total correlation. With this method, the investigator selects items with the highest correlation with the overall score, across the breadth of the overall score, by calculating the item-total correlations in three subgroups stratified by the level of the score. We chose three levels: high, moderate, and low levels of disability⁵. The highest ranking items should create a questionnaire that is able to detect disability across the full range of scores. This approach, supplemented by patient ratings of the importance and difficulty of items³, was used to develop the DASH⁸. Finally, our third item-reduction approach was Rasch modeling, with which items are calibrated on the basis of their relative difficulty, as compared with one another, along the breadth of the scale score. When used to reduce items, Rasch modeling leads to the selection of items that are equally spaced and calibrated along the scale length⁹,¹⁰.

The purpose of this study was to develop a valid, reliable eleven-item version of the DASH on the basis of a comparison of these three approaches to item reduction.

Materials and Methods
Data
Two datasets were used for the item-reduction process. The first was the field-testing data that had been used to create the thirty-item DASH, and the second was data gathered in a cohort study used to test the reliability and validity of the DASH. Each will be briefly described below and in Table I.

TABLE I Description of Samples Used in Development and Testing of QuickDASH

| Characteristic | Development of QuickDASH | Validation of QuickDASH |
|---|---|---|
| Original study | Field-testing for development of full DASH¹¹ | Cohort study for testing responsiveness of DASH and other measures¹² |
| Size of cohort | 407 | 200 |
| Diagnoses | Mixed, accrued from clinical practices at 21 sites | Total shoulder arthroplasty, rotator cuff tendinopathy, carpal tunnel syndrome, wrist/hand tendinitis |
| Mean patient age (yr) | 45.0 | 53.6 |
| Gender | 166 male, 206 female, 35 with missing data on gender | 86 male, 113 female, 1 with missing data on gender |

Field-Testing Data
The DASH was created by pooling items from thirteen different questionnaires that addressed the health-related quality of life of persons with an upper-limb problem¹¹. Through a process of eliminating overlap, a set of seventy-eight items was selected and was fielded in a cross-sectional study of 407 persons with varying upper-extremity problems from twenty-one different sites. In a second sample, the importance and difficulty of the seventy-eight items were rated by a sample of seventy-six persons from two of the twenty-one sites. The mean age (and standard deviation) in both of these samples was 45 ± 16 years; the percentages of women were 51% and 61%⁸. Equidiscriminative item-total correlations were used along with patient ratings of difficulty and importance to form the final thirty-item DASH³,⁷. The mean DASH score (and standard deviation) was 37.9 ± 22.0 points.

Cohort Data
The measurement properties of the DASH were evaluated in a prospective cohort study of 200 persons who completed a questionnaire package twice, at intervals three to five days apart, before treatment and twice, at four and twelve weeks, after treatment for a shoulder or wrist/hand problem²,¹². This study focused on four diagnostic groups: rotator cuff tendinopathies, shoulder osteoarthritis, carpal tunnel syndrome, and wrist/hand tendinopathies. Other measures used as comparators for the assessment of construct validity included visual analogue scales of the patients’ perception of their problem, pain severity, function, and ability to work as well as questions about work status and their ability to cope with symptoms or limitations. The mean age of this sample was 53.6 ± 14.2 years, and 57% (113) of the 200 individuals were female. The mean DASH score in this sample was 43.9 ± 22.9 points. Table I summarizes the data on the two samples.
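The Spearman-Brown projection referred to in the introduction can be illustrated with a short sketch. It assumes the standard prophecy formula, in which a scale whose length changes by a factor k has a projected reliability of k·r / (1 + (k − 1)·r). The function name and the use of 0.97 as the starting alpha are illustrative assumptions; the paper does not report the exact inputs to its own calculation, so this is not a reproduction of the authors' computation.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Project reliability after changing scale length by `length_ratio`.

    Standard Spearman-Brown prophecy formula. It assumes the retained
    items behave like the original ones (similar inter-item correlations).
    """
    return length_ratio * reliability / (1.0 + (length_ratio - 1.0) * reliability)


# Illustrative inputs (assumption): alpha of 0.97 for the 30-item DASH,
# shortened to 11 items.
projected_alpha = spearman_brown(0.97, 11 / 30)
print(f"Projected alpha for 11 items: {projected_alpha:.3f}")
```

Under these assumed inputs the projected value stays above the 0.90 threshold that the authors set for measurement at an individual level.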
In the current study, the field-testing data were used for item reduction and the cohort data were used to evaluate the measurement properties of the three versions of the QuickDASH created by the three item-reduction approaches²,¹². Each of the original studies had been reviewed by the research ethics board at each clinical site, and the participants had consented to be in the study with the knowledge that the data would be used for ongoing testing of the DASH questionnaire.

Application of Three Item-Reduction Approaches to the DASH
Concept-Retention Approach
The intent of the concept-retention approach was to create a short version with items selected from the key domains identified in the theoretical framework of the DASH⁷. To achieve this, items in the DASH were sorted according to the domain that they represented. Data from the field-testing study were used to rank the importance and difficulty of each of the items as well as the correlation that each had with the total score. This information was combined to identify the highest-ranking item in each of the sixteen domains⁸,¹³,¹⁴. Five domains had to be eliminated to achieve our target of an eleven-item questionnaire. Sexual functioning was dropped as a result of the high percentage of missing values, probably due to poor acceptability, during clinical use of the DASH¹. Self-image was furthest from the core concepts of physical function and symptoms and therefore was also eliminated. Family care was eliminated by combining it with social activities. Finally, since inter-item correlations among stiffness, pain, and weakness were all ~0.60 (polychoric correlations), only pain, the most universally experienced of the three symptoms and the most salient presenting feature for patients, was used. The number of domains now numbered eleven, and the top item in each domain became part of the short, concept-retention version, which was approved by the Upper Extremity Collaborative Group, who originally developed the DASH.

Equidiscriminative Item-Total Correlation Approach
The second item-reduction approach was the equidiscriminative item-total correlation method⁵. This was the statistical approach that contributed to the development of the DASH from a larger pool of items⁸,¹³,¹⁴, and we followed the same process. Three variables were created, representing the 25th, 50th, and 75th percentile values for the distribution of the thirty-item DASH scores in the field-testing sample. Individuals were assigned a “yes” or “no” for each of these variables depending on whether or not their score on the thirty-item DASH was higher or lower than the particular percentile value. Each dichotomous variable was then correlated with each of the items in the DASH. A higher correlation identified items associated with a higher DASH score (yes/no to above the 75th percentile), midrange, or lower (yes/no to above the 25th percentile) score. The four items with the highest correlations with each dichotomous marker were selected because they represented the items most likely to be sensitive and discriminating to that score level. If the same item was in the top-four list for two of the score ranges, it was dropped from the list of the higher score group, and the next ranked item from that group was substituted into that list. Finally, the item with the lowest correlation of the twelve was dropped to bring the total to our target of eleven items.

Rasch Approach
A single-parameter partial credit Rasch model⁹,¹⁰,¹⁵ was used to create the third short version of the DASH with use of Bigsteps software (version 2.6)⁹. Items are weighted according to their level of difficulty along a linear logistic function. Adjustments are made to allow for multiple rather than dichotomous response options (partial credit model). If an item fits this linear function, the mean square error term for the item will rest between 0.7 and 1.3, or the related Z statistic will be <2. In item reduction, Rasch methodology can be used to delete these misfitting items as well as to minimize overlap in the level of difficulty represented in the scale. Items were deleted when their infit and/or outfit statistic was >2, indicating noise in how that item was scored relative to other items across individuals. Priority was given to the infit Z statistics because they are sensitive to errors near the person’s ability, as opposed to outfit statistics, which are more sensitive to errors in items further from the person’s ability. Highly negative (less than −2) standardized infit statistics suggest redundancy in items. We used this factor only as a second line of reduction. Although it is not done with most Rasch approaches to scale development, we set an a priori target of eleven items. The first run included all thirty DASH items. Misfitting items (infit and outfit Z statistics of >2) were dropped manually, with the item with the largest infit statistic dropped first and the program rerun. Iterations continued until eleven items remained, ideally with no misfitting items and a good range of logit (weighting) values (−2 to +2) and good steps between the logit values (~0.15 logits). Rasch modeling was used only for item selection; Rasch weights and scores were not used as a final scoring because of the complexities of applying these partial credit weights in a clinical situation. A simple summative score was used across the Rasch-selected items.

Testing of the Measurement Properties of the Resultant Questionnaires
The three eleven-item QuickDASH versions resulting from the three approaches to item reduction were evaluated independently. Testing was done with use of the prospective cohort data. The results were then compared with each other and with the results of the DASH outcome measure.

Item Level
The proportion of the sample falling into each of the response categories, including “missing,” was calculated. Items with >40% in one category were considered to be at risk for poor discrimination in this sample⁵. Mean item difficulties, item-to-total correlations, and Cronbach alpha coefficients were calculated¹⁶.
Scale Level
Distribution
Univariate analyses, including the mean, median, standard deviation, 25th and 75th percentiles, and full range of scores, were used to describe the distribution of scores for each of the QuickDASH versions.

Convergent Construct Validity
Correlations (Pearson product moment) were calculated between each of the QuickDASH versions and the visual analogue scores for ability to function in daily activities, rating of problem, pain severity, and ability to work. Correlations were 0.64 to 0.80, as expected, and were consistent with the previous testing of the DASH¹,³. Higher correlations would not be expected because of the use of single-item scales (higher measurement error).

Known-Groups Validity
Two constructs to which the DASH is known to be sensitive were used to test the ability of the three QuickDASH versions to differentiate people who are more severely affected from those who are not as severely affected. The first was the ability to work versus the inability to work as a result of the upper-limb problem. The second was the ability to do all that one wanted to do versus being limited in some way. Unpaired t tests were used to compare QuickDASH scores to determine whether there were significant differences in the scores across groups. The magnitude of the difference was also compared across QuickDASH versions.

Responsiveness
Standardized response means, defined as the mean change score divided by the standard deviation of the change, were used as a summary statistic of responsiveness¹⁷,¹⁸. Responsiveness was calculated on the basis of the patients expected to improve (therefore everyone in the cohort study) and those who actually indicated that their problem was decreased on an 11-point transitional scale (a scale of 0 to 10, with >6 considered to indicate improvement). Relative efficiency was assessed by creating a ratio of the square of the standardized response mean for each QuickDASH over the square of the standardized response mean for the thirty-item DASH¹⁹. If the relative efficiency was greater than one, that version of the QuickDASH was considered to be more “efficient” (more signal, less noise) than the full DASH for measuring change in that sample. The relative efficiency provides a ratio that can be related to the relative sample size needed to observe that level of effect size if one used the instrument in the denominator rather than the one in the numerator.

A Priori Decision Rules
We recognized a priori that each item-reduction method would have certain strengths based solely on its approach to item reduction. For example, the concept-retention QuickDASH would have the advantage of covering all of the relevant domains. On the other hand, the Rasch version of the questionnaire could be closest to interval level in measurement but would be expected to have items with skewed distribution because of its effort to capture the full spectrum of disability. Finally, the version derived with the equidiscriminative item-total correlation method would probably have items with the highest correlation with each other and with the total score, giving it an advantage with regard to traditional psychometric markers such as the Cronbach alpha as well as homogeneity in the distribution of responses to items. The different advantages meant that we had to set clear a priori rules for deciding which version we would endorse. The final decision was to be based on three criteria: (1) fewest items with >40% in one response category, (2) a Cronbach alpha of >0.90, and (3) highest correlation with the full DASH and measurement properties most similar to those of the full DASH.

Results
Creation of Three Versions of the QuickDASH
Concept-Retention Approach
The final eleven items derived with the conceptual approach to item reduction are shown in the Appendix as well as in Figure 1. As described above, the best item was selected from eleven of the sixteen original domains of the theoretical framework of the DASH⁷. The domains that were dropped were weakness, stiffness, family care, sexual activities, and self-image.

Equidiscriminative Item-Total Correlation Approach
The four items with the highest correlation to the dichotomous indicators of having a DASH score higher or lower than each of the 25th, 50th, and 75th percentiles are shown in Figure 1 and in the Appendix. Several items appeared in more than one list. These included pushing open a heavy door, doing heavy household chores, gardening/yard work, carrying heavy objects, and recreational activities with force through the arm. These were retained in the lowest percentile column and were replaced with the next highest ranked item in the higher percentile column. After all twelve items (four in each column) were selected, the one with the lowest item-total correlation (doing usual work; r = 0.58) was dropped to achieve the final target of eleven items.

Rasch Approach
Five iterations of Rasch modeling were used, with misfitting items dropped at each step. The first item to be eliminated was self-image (infit Z = 9.9) followed by weakness and tingling (infit Z = ~6). Writing was deleted as well because of a less extreme misfit (infit Z = 2.4). Sexual function was also dropped at this stage because of a high rate of missing values (115 of 395 observations were missing). The next four steps deleted additional items. Fortunately, the fit improved as the item count neared the target of eleven. The final items are shown in Figure 1 and the Appendix, which lists the retained items, their weight, and the standardized infit and outfit statistics. Two items, using a key and doing usual work, became marginally misfitting (infit and outfit Z > 2) in this final model, but they were retained because no other substitution of a similarly weighted item had a better fit and because of their consistent fit in previous iterations.

Summary of Items Across Different Versions of the QuickDASH
Only two items were shared across all three methodological approaches: doing heavy household chores and carrying a shopping bag (Fig. 1). The equidiscriminative item-total correlation version of the QuickDASH and the Rasch version had the greatest overlap, with five items shared only between them and two items common to all three versions. The questionnaire derived with the concept-retention approach had the most unique items (five), which probably reflects the focus on unique domains in each item.

Results of Item-Level Analysis
Details of the item-level analysis are available in the Appendix. There were very few items to which more than one or two patients failed to respond. Recreational activities with force taken through the arm and doing yard work had the highest frequency of missing responses, with 6% (twelve) of the 200 patients not responding to those questions. Light recreation, managing transportation needs, using a key, and using a knife were the only items to which >40% of the sample responded with one response category (“no difficulty”). On the concept-retention version of the QuickDASH, the response to the item regarding severity of numbness and tingling was given as “none” by 39.5% of the respondents, and that item had the lowest item-to-total correlation (r = 0.46), suggesting that it could be different from the other items. Pain severity did not have this problem. The distributions were most disparate between the items in the Rasch version of the QuickDASH. This finding is consistent with Rasch analyses, which seek items across the score range and hence with opposite distributions. Equidiscriminative item-total correlation tends to favor similar, more normal distributions where the greatest variance is found, leading to a higher correlation. The Cronbach alpha coefficient was ≥0.92 for each version (0.92 for the concept-retention version, 0.95 for the equidiscriminative item-total correlation version, and 0.95 for the Rasch version), and item-to-total correlations were also satisfactory in all versions.

Fig. 1
Items selected with each item-reduction method and overlap between approaches. Concept = concept-retention method, EITC = equidiscriminative item-total correlation, and Rasch = Rasch modeling. Only two items were found in all three versions of the QuickDASH.
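The equidiscriminative item-total correlation selection (four items per percentile marker, repeated items kept in the lowest list, weakest of the twelve dropped) can be sketched roughly as below. This is an illustration under several assumptions, not the authors' procedure: the total score is taken as the sum of the supplied items rather than the thirty-item DASH score, a Pearson coefficient is used for the item-marker correlation (the paper does not name the coefficient), and the data are randomly generated. All function and variable names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, quantiles


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def eitc_select(scores, per_marker=4, target=11):
    """Pick items by equidiscriminative item-total correlation.

    `scores` maps an item label to that item's answers for all respondents.
    """
    names = list(scores)
    totals = [sum(person) for person in zip(*scores.values())]
    chosen = []
    # Markers for the 25th, 50th, and 75th percentiles, lowest first, so a
    # repeated item stays in the lowest list in which it appears.
    for cut in quantiles(totals, n=4):
        marker = [1.0 if t > cut else 0.0 for t in totals]
        ranked = sorted(names, key=lambda n: pearson(scores[n], marker), reverse=True)
        chosen += [n for n in ranked if n not in chosen][:per_marker]
    # Drop the weakest item-total correlations until `target` items remain.
    chosen.sort(key=lambda n: pearson(scores[n], totals), reverse=True)
    return chosen[:target]


# Synthetic data (assumption): 14 made-up items, 60 simulated respondents.
rng = random.Random(0)
demo = {f"item_{i:02d}": [] for i in range(1, 15)}
for _ in range(60):
    severity = rng.uniform(1, 5)
    for answers in demo.values():
        answers.append(min(5, max(1, round(severity + rng.gauss(0, 1)))))
selected = eitc_select(demo)
print(selected)
```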

TABLE II Construct Validity, Responsiveness, and Test-Retest Reliability for Three Versions of QuickDASH with Reference to Full DASH Results⁵ for Comparison*

| | QuickDASH: Concept-Retention | QuickDASH: Equidiscriminative Item-Total Correlation | QuickDASH: Rasch | Full DASH |
|---|---|---|---|---|
| Convergent construct validity (Pearson correlation) | | | | |
| Overall problem | 0.70 | 0.64 | 0.65 | 0.71 |
| Overall pain | 0.73 | 0.64 | 0.65 | 0.72 |
| Ability to function | 0.80 | 0.74 | 0.78 | 0.79 |
| Ability to work | 0.76 | 0.72 | 0.73 | 0.77 |
| Known-groups validity (mean scores for subgroups)† | | | | |
| Able to do all one needs to do | 25.4 | 24.1 | 20.6 | 23.6 |
| Unable to do all one needs to do | 48.6 | 49.0 | 41.1 | 47.1 |
| Known-groups validity (mean scores for subgroups)† | | | | |
| Able to work | 27.5 | 26.4 | 20.1 | 26.8 |
| Unable to work due to arm | 52.6 | 53.1 | 45.3 | 47.1 |
| Responsiveness (standardized response mean‡) | | | | |
| Observed change (n = 171) | 13.4/16.9 = 0.79 | 13.1/18.9 = 0.69 | 12.5/18.2 = 0.64 | 13.2/16.9 = 0.78 |
| Change in those rating problem as better (n = 121) | 17.3/16.7 = 1.03 | 17.1/18.8 = 0.91 | 16.0/18.0 = 0.89 | 17.3/16.4 = 1.05 |
| Patients with shoulder problem | 17.8/16.4 = 1.08 | 17.5/17.7 = 0.99 | 16.3/17.5 = 0.93 | 17.7/15.7 = 1.13 |
| Patients with wrist/hand problem | 16.3/17.6 = 0.93 | 16.3/21.0 = 0.77 | 15.2/19.2 = 0.79 | 16.3/17.8 = 0.92 |
| Relative efficiency§ (compared with full DASH) | | | | |
| Observed change | 1.03 | 0.78 | 0.67 | 1.0 |
| Change in those rating problem as better | 0.96 | 0.75 | 0.72 | 1.0 |
| Patients with shoulder problem | 0.91 | 0.76 | 0.68 | 1.0 |
| Patients with wrist/hand problem | 1.02 | 0.70 | 0.74 | 1.0 |
| Correlations with change (transitional index) | | | | |
| Change in problem overall | 0.39 | 0.37 | 0.32 | 0.40 |
| Change in ability to function | 0.35 | 0.26 | 0.25 | 0.32 |
| Test-retest reliability (Shrout and Fleiss 2,1 intraclass correlation coefficient) | 0.94 | 0.96 | 0.97 | 0.96 |

*All correlations are Pearson product-moment correlations and were significant at p = 0.05. †Known-group differences were analyzed with an unpaired t statistic and were significant at p = 0.05. ‡Standardized response mean (SRM) = mean change/standard deviation of that change. §Relative efficiency = $\mathrm{SRM}^2_{\mathrm{QuickDASH}} / \mathrm{SRM}^2_{\mathrm{DASH}}$.
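The two responsiveness statistics defined in the table footnotes can be worked through with the published "observed change" figures for the concept-retention version and the full DASH. The function names are illustrative; the formulas follow the footnote definitions (SRM = mean change divided by the standard deviation of the change, relative efficiency = ratio of squared SRMs).

```python
def standardized_response_mean(mean_change: float, sd_change: float) -> float:
    """SRM = mean change / standard deviation of that change."""
    return mean_change / sd_change


def relative_efficiency(srm_short: float, srm_full: float) -> float:
    """Ratio of squared SRMs, short form over full instrument."""
    return (srm_short / srm_full) ** 2


# "Observed change (n = 171)" row of Table II.
srm_concept = standardized_response_mean(13.4, 16.9)  # concept-retention version
srm_full = standardized_response_mean(13.2, 16.9)     # full DASH
re_concept = relative_efficiency(srm_concept, srm_full)
print(round(srm_concept, 2), round(srm_full, 2), round(re_concept, 2))
```

A value above one, as in this row, means the short form carried slightly more signal relative to noise than the full DASH in that sample.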

Results of Scale-Level Analysis
Distribution of Scores
The mean and median scores for the concept-retention and equidiscriminative item-total correlation versions of the QuickDASH were similar in magnitude to the scores for the DASH, whereas those for the Rasch version tended to be lower (mean scores [and standard deviation], 45.3 ± 23.2 points for the concept-retention version, 45.6 ± 26.2 points for the equidiscriminative item-total correlation version, and 37.9 ± 25.1 points for the Rasch version; see Appendix). Correlations between the QuickDASH and the DASH were extremely high (Pearson product-moment correlation, r ≥ 0.97), which speaks to the comparability of scores between the full DASH and QuickDASH versions. Correlations among the three versions of the QuickDASH were also extremely high (Pearson product-moment correlation, r ≥ 0.94). The distribution of the concept-retention version was closest to that of the DASH.

Construct Validity: Correlational Convergent and Known-Groups
Only the concept-retention version of the QuickDASH maintained the levels of correlation expected with the target constructs for pain and for rating of the overall problem. All versions had satisfactory correlations with the ability to function and the ability to work. These correlations, especially those of the concept-based QuickDASH, were also comparable with those of the DASH. Each version of the QuickDASH was able to discriminate (p < 0.0001) between the known groups of being able to do everything one needs to do (or not) and being able to work (or not because of problems with the arm). Higher QuickDASH scores were found in the more disabled group, and the difference in scores, particularly those of the equidiscriminative item-total correlation version, was similar to those found with the DASH (Table II).

Reliability and Responsiveness
Test-retest reliability was excellent for all three versions of the QuickDASH (Table II) and was fairly consistent with that of the DASH (intraclass correlation coefficient = 0.94 to 0.97). Analysis of responsiveness revealed that the concept-retention version of the QuickDASH was most similar to the DASH, with higher standardized response means than were found for the other versions. This translated into a high relative efficiency of this version (0.96), offering a statistical advantage over the other versions (relative efficiency, 0.72 and 0.75), which required larger sample sizes to detect the same change in patient state. This advantage of the concept-retention version was retained after stratification by proximal versus distal disorders. Responsiveness was also assessed by examining the correlation with responses to the transitional scales rating the change in the overall problem and the change in function. The correlations were all low to moderate and were similar to the DASH correlations. The concept-retention version again was ranked first and was closest to the DASH.

Discussion
Short, psychometrically sound measures offer clinicians and researchers more efficient ways of quantifying patient outcomes while retaining the validity and reliability of the longer versions. Such instruments offer the advantage of providing the same quality of information with less burden for the patient completing it and easier scoring for the clinician or researcher. In this study, we developed the QuickDASH, an eleven-item questionnaire that addresses symptoms and physical function in people with any or multiple disorders involving the upper limb. We demonstrated strong measurement properties with use of this shortened scale. The QuickDASH demonstrated reliability, validity, and responsiveness when it was used for patients with either a proximal or a distal disorder of the upper extremity. It provides a summative score on a 100-point scale, with 100 indicating the most disability. Scores are obtained by summing circled responses, dividing the total by the number of items completed, subtracting one, and then multiplying that figure by 25. Only one missing item (10% of the items) can be tolerated; the QuickDASH score cannot be calculated if two or more items are missing. The QuickDASH is comparable with the full DASH (r = 0.98), and its construct validity and responsiveness suggest that the QuickDASH scores should give views of disability and symptoms that are relatively similar to those provided by the full DASH.
Decision on Which Version of Additional comparisons should be made. For direct
the QuickDASH to Adopt comparisons of two datasets with DASH and QuickDASH
Each version of the QuickDASH was highly correlated with scores, clinical researchers may wish to extract the Quick-
the DASH. In terms of the criteria set before the analysis, DASH items from the full DASH data to have the greatest
the equidiscriminative item-total correlation and concept- confidence. The high correlation between the QuickDASH
retention approaches both had one item each with >40% re- and DASH suggests highly comparable scores; however, an
sponses in one category and the Rasch approach had three. exact match between the numeric scores of long and shorter
The Rasch version would be expected to have more such scales is not guaranteed (i.e., 45 points on the DASH may not
items because, with that approach, items with extreme scores equal 45 points on a QuickDASH), but they are likely to be
are selected in order to cover the spectrum of disability and close. The optional modules (sports/performing arts and
to ensure that items are available across the range of disabil- work) are retained as optional; they have not changed from
ity in the sample. This often provides an advantage in a fairly the original DASH.
well or a very ill group. The other two approaches were more In this study, we compared three approaches toward
correlationally based, which favors items in the midrange item reduction. The resultant scales differed in content, but at
and with a broader distribution of responses. Thus, these a group level they were similar in terms of their measurement
versions would be expected to have fewer items with extreme properties and relationship to other measures. The three
response distributions. scales were so similar in performance that the results were un-
All versions had a Cronbach alpha of >0.90 and ac- likely to reflect clinically important differences in performance.
ceptable test-retest reliability (intraclass correlation coeffi- The version that ultimately was selected was concept-based. As
cient of ≥0.94). The construct validity and responsiveness described above, items for this version were selected by our
of the concept-retention QuickDASH were closest to those core development group with the goal of retaining the key
of the thirty-item DASH, although by rank more than by concepts outlined in the conceptual framework of the DASH7,11.
magnitude. An added benefit of this decision is the retention of the key
The descriptive and psychometric results for all three elements of our framework, which were domains deemed to
versions were sent to ten members of the Upper Extremity Col- be clinically relevant to clinicians and are the important issues
laborative Group. The group members, blinded to the item- for patients.
reduction approach, independently judged which version The similarity of the measurement properties despite
to recommend on the basis of the criteria described above. differences in item content could reflect the high inter-item
There was unanimous support for the concept-retention correlations in the full thirty-item DASH. The thirty-item
version. version has a very high Cronbach alpha (0.97), which to some
1045
THE JOUR NAL OF BONE & JOINT SURGER Y · JBJS.ORG D E VE L O P M E N T O F T H E Q U I C K DASH: C O M P A R I S O N
VO L U M E 87-A · N U M B E R 5 · M AY 2005 O F T H R E E I T E M -R E D U C T I O N A P P RO A C H E S

suggests redundancy across items. The various approaches to tients) and has the potential to work well in the more de-
item reduction reflected this high inter-item correlation, in manding role of monitoring care of individual patients in a
their similar performances with regard to numeric scores, clinical setting. Congruent findings from evalutions of the
construct validity, and correlation with the full DASH. QuickDASH on its own, rather than as extracted from the full
The concept-retention QuickDASH, which was the DASH, will increase our confidence in its measurement prop-
most subjective version, had the strongest ranking in terms erties. On the basis of our current findings, the QuickDASH
of measurement properties, which are often assumed to de- appears to perform comparably with the DASH, with little
pend on solid psychometric foundations for a measure. This loss of reliability, validity, or responsiveness.
finding might be counterintuitive to those who believe that
psychometric approaches produce a good, or better, psycho- Appendix
metric measure. On the other hand, it lends support to the Tables providing details of the item reduction, details of
hypothesis that greater clinical sensibility in a measure trans- the item-level analysis, and the distribution of the scores
lates into greater validity. for all three methods as well as the eleven-item QuickDASH
Our study was limited by the fact that we had to extract outcome measure are available with the electronic versions of
the QuickDASH versions from longer questionnaires for the this article, on our web site at jbjs.org (go to the article citation
psychometric testing of the resultant scale. Although we have and click on “Supplementary Material”) and on our quarterly
no clear understanding about whether this affected response CD-ROM (call our subscription department, at 781-449-9780,
patterns, we cannot be certain that it did not. To avoid this to order the CD-ROM). 
limitation, we would have had to have performed three inde-
pendent studies, one for each version of the QuickDASH. This
would have been impractical from a sample-size point of view, Dorcas E. Beaton, BScOT, MSc, PhD
and it would have meant the loss of the ability to perform con- Institute for Work and Health, 481 University Avenue, Suite 800,
current comparisons in the same sample. Toronto, ON M5G 2E9, Canada. E-mail address: [email protected]
Another limitation of the study is the method used for James G. Wright, MD, MPH, FRCSC
the Rasch approach. The Rasch model focuses on one pa- Division of Orthopaedic Surgery, The Hospital for Sick Children, 555
rameter, item difficulty. Other item-response-theory models University Avenue, Toronto, ON M5G 1X8, Canada
offer up to three parameters or are nonparametric, which
also allows the response options to have broader or narrower Jeffrey N. Katz, MD, MS
scales of meaning between items. These models become Robert B. Brigham Arthritis and Musculoskeletal Clinical Research Cen-
ter, Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02115
more complex to interpret in a clinical setting and almost
necessitate the use of computer interfaces for data collection In support of their research or preparation of this manuscript, one or
or analysis. The second weakness in our use of the Rasch ap- more of the authors received grants or outside funding from the Institute
proach was the a priori decision to target eleven items. We for Work and Health, the American Academy of Orthopaedic Surgeons,
retained two items that had been fitting on previous itera- the American Society of Surgery of the Hand, Canadian Institutes of
tions but were borderline misfitting items on the eleven-item Health Research, and the National Institutes of Health (Grants K24
iteration. There were no alternative items with which to re- AR02123 and P60 AR47782). None of the authors received payments or
other benefits or a commitment or agreement to provide such benefits
place those items and provide a better fit. In a purely Rasch from a commercial entity. No commercial entity paid or directed, or
approach, one might decide to stop at another level and have agreed to pay or direct, any benefits to any research fund, foundation,
a better fit. educational institution, or other charitable or nonprofit organization
In conclusion, on the basis of these findings, we antici- with which the authors are affiliated or associated.
pate that the new eleven-item QuickDASH (see Appendix)
will work very well in groups of patients (in research studies,
case series, and program evaluations involving groups of pa- doi:10.2106/JBJS.D.02060
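
The scoring rule stated in the Discussion (mean of the completed items, minus one, times 25, with at most one missing item tolerated) can be illustrated with a short sketch. This is not code from the article. The function name `quickdash_score`, the use of `None` for an unanswered item, and the return of `None` for an unscorable questionnaire are illustrative choices. The coding of each response as an integer from 1 to 5 is an assumption, inferred from the fact that the formula yields 0 to 100 only for that range.

```python
from typing import Optional, Sequence

N_ITEMS = 11      # the QuickDASH has eleven items
MAX_MISSING = 1   # the score cannot be calculated if two or more items are missing


def quickdash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return a 0-100 score (100 = most disability), or None if unscorable.

    `responses` holds one entry per item; None marks an unanswered item.
    Answered items are assumed to be coded 1 to 5 (an assumption, see above).
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")

    answered = [r for r in responses if r is not None]
    if any(r not in range(1, 6) for r in answered):
        raise ValueError("each answered item must be an integer from 1 to 5")

    # Too many missing items: no score is reported.
    if N_ITEMS - len(answered) > MAX_MISSING:
        return None

    # Sum the responses, divide by the number of items completed,
    # subtract one, and multiply by 25.
    return (sum(answered) / len(answered) - 1) * 25
```

For example, a questionnaire answered 3 on every item scores 50.0, and the same questionnaire with a single item left blank still scores 50.0 because the divisor is the number of completed items rather than eleven.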

References
1. Beaton DE, Davis AM, Hudak PL, McConnell S. The DASH (Disabilities of the Arm, Shoulder and Hand) outcome measure: what do we know about it now? Br J Hand Ther. 2001;6:109-18.
2. Beaton DE, Katz JN, Fossel AH, Wright JG, Tarasuk V, Bombardier C. Measuring the whole or the parts? Validity, reliability, and responsiveness of the Disabilities of the Arm, Shoulder and Hand outcome measure in different regions of the upper extremity. J Hand Ther. 2001;14:128-46.
3. McConnell S, Beaton DE, Bombardier C. The DASH outcome measure user’s manual. Toronto: Institute for Work and Health; 1999.
4. MacDermid JC, Richards RS, Donner A, Bellamy N, Roth JH. Responsiveness of the short form-36, disability of the arm, shoulder, and hand questionnaire, patient-rated wrist evaluation, and physical impairment measurements in evaluating recovery after a distal radius fracture. J Hand Surg [Am]. 2000;25:330-40.
5. Nunnally JC, Bernstein IH. Psychometric theory. 3rd ed. New York: McGraw-Hill; 1994.
6. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4:293-307.
7. Hudak PL, Amadio PC, Bombardier C. Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand). The Upper Extremity Collaborative Group. Am J Ind Med. 1996;29:602-8. Erratum in: Am J Ind Med. 1996;30:372.
8. Marx RG, Bombardier C, Hogg-Johnson S, Wright JG. Clinimetric and psychometric strategies for development of a health measurement scale. J Clin Epidemiol. 1999;52:105-11.
9. Linacre JM, Wright BD. A user’s guide to BIGSTEPS/Winsteps Rasch-model computer program. Chicago: MESA Press; 1998.
10. Wright BD. Solving measurement problems with the Rasch model. J Educ Meas. 1977;14:97-116.
11. Solway S, Beaton DE, McConnell S, Bombardier C. The DASH outcome measure user’s manual. 2nd ed. Toronto: Institute for Work and Health; 2002.
12. Beaton DE. Are you better? Describing and explaining changes in health status in persons with upper extremity musculoskeletal disorders [PhD dissertation]. Toronto: University of Toronto; 2000.
13. Marx R. A comparison of clinimetric and psychometric techniques for item reduction in the development of an upper extremity disability scale [MSc Thesis]. Toronto: University of Toronto; 1996.
14. Wright JG, Feinstein AR. A comparative contrast of clinimetric and psychometric methods for constructing indexes and rating scales. J Clin Epidemiol. 1992;45:1201-18.
15. Andiel C. Rasch analysis: a description of the model and related issues. Can J Rehabil. 1995;9:17-25.
16. Ware JE Jr, Gandek B. Methods for testing data quality, scaling assumptions, and reliability: the IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol. 1998;51:945-52.
17. Beaton DE, Hogg-Johnson S, Bombardier C. Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders. J Clin Epidemiol. 1997;50:79-93.
18. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol. 1997;50:239-46.
19. Liang MH, Larson MG, Cullen KE, Schwartz JA. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis Rheum. 1985;28:542-7.
