Lippincott Williams & Wilkins Medical Care

The document discusses the psychometric and clinical tests of validity for the MOS 36-Item Short-Form Health Survey (SF-36) in measuring physical and mental health constructs. It presents findings from the Medical Outcomes Study, demonstrating that the SF-36 effectively distinguishes between groups with varying severities of medical and psychiatric conditions. The study emphasizes the importance of establishing guidelines for interpreting the scales and understanding the differences in health status among clinical groups.

The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and Clinical Tests of Validity in Measuring Physical and Mental Health Constructs
Author(s): Colleen A. McHorney, John E. Ware, Jr. and Anastasia E. Raczek
Source: Medical Care, Vol. 31, No. 3 (Mar., 1993), pp. 247-263
Published by: Lippincott Williams & Wilkins
Stable URL: http://www.jstor.org/stable/3765819
Accessed: 13-04-2015 21:42 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact [email protected].

Lippincott Williams & Wilkins is collaborating with JSTOR to digitize, preserve and extend access to Medical Care.

http://www.jstor.org

This content downloaded from 137.99.31.134 on Mon, 13 Apr 2015 21:42:56 UTC
All use subject to JSTOR Terms and Conditions
MEDICAL CARE
Volume 31, Number 3, pp 247-263
© 1993, J. B. Lippincott Company

The MOS 36-Item Short-Form Health Survey (SF-36): II. Psychometric and Clinical Tests of Validity in Measuring Physical and Mental Health Constructs

COLLEEN A. MCHORNEY, PHD, JOHN E. WARE, JR., PHD, AND ANASTASIA E. RACZEK, AB

Cross-sectional data from the Medical Outcomes Study (MOS) were analyzed to test the validity of the MOS 36-Item Short-Form Health Survey (SF-36) scales as measures of physical and mental health constructs. Results from traditional psychometric and clinical tests of validity were compared. Principal components analysis was used to test for hypothesized physical and mental health dimensions. For purposes of clinical tests of validity, clinical criteria defined mutually exclusive adult patient groups differing in severity of medical and psychiatric conditions. Scales shown in the components analysis to primarily measure physical health (physical functioning and role limitations-physical) best distinguished groups differing in severity of chronic medical condition and had the most pure physical health interpretation. Scales shown to primarily measure mental health (mental health and role limitations-emotional) best distinguished groups differing in the presence and severity of psychiatric disorders and had the most pure mental health interpretation. The social functioning, vitality, and general health perceptions scales measured both physical and mental health components and, thus, had the most complex interpretation. These results are useful in establishing guidelines for the interpretation of each scale and in documenting the size of differences between clinical groups that should be considered very large. Key words: health status assessment; health-related quality of life; construct validity; MOS SF-36 health survey. (Med Care 1993;31:247-263)

From The Health Institute, New England Medical Center, Boston, Massachusetts.
This research was supported by the Henry J. Kaiser Family Foundation, Menlo Park, CA (Grant No. 85-6515); the Functional Outcomes Program of the Henry J. Kaiser Family Foundation of the Health Institute, New England Medical Center, Boston MA (Grant No. 91-013); and the National Institute on Aging, Bethesda, MD (Grant No. AG07508). Data come from the Medical Outcomes Study (MOS), which is also supported by the Robert Wood Johnson Foundation, The Pew Charitable Trusts, The Agency for Health Care Policy and Research, and the National Institute of Mental Health.
Address correspondence to: Colleen A. McHorney, PhD, The Health Institute, Box 345, New England Medical Center, 750 Washington Street, Boston, MA 02111.
This article was originally submitted for consideration on 11-19-92. It was accepted for publication on 10-21-92 following the completion of all necessary reviews and/or revisions.

A major goal of the Medical Outcomes Study (MOS) was to advance the state-of-the-art of methods used for routine monitoring of patient outcomes in medical practice and clinical research.1 The value of a standardized and more practical short-form questionnaire for measuring general health concepts was demonstrated by the 20-item MOS Short-Form General Health Survey (SF-20).2,3 That form has been used in comparisons of patients with both medical and psychiatric conditions4-10 and in comparisons with general populations.3,11,12

The MOS 36-Item Short-Form Health Survey (SF-36) was constructed to broaden the health concepts measured and improve measurement precision for each concept over that achieved by the SF-20. Noteworthy improvements include the addition of items tapping vitality, better representation of the domain of general health perceptions, distinguishing between physical and mental causes of role limitations, and increased measurement precision for physical, role, social, and bodily pain scales.13 The eight SF-36 measures constitute the core set of generic health outcomes assessed in the longitudinal component of the MOS.

A continuous aspect of evaluating both the SF-20 and SF-36 surveys has been accumulating evidence for validity, the fidelity with which a scale measures what it purports to measure.14 Validity is the basis of the interpretability and meaningfulness of scores.15 One traditional psychometric approach to validation is through components or factor analysis, which gauges the congruence between the hypothesized constructs of interest and scales constructed to measure those attributes. However, traditional psychometric tests often do not explicitly address other key validity issues, such as the relevance of scores to the intended use of a measure and the "quality of inferences"16 derived from specific applications. A more unified approach to validity emphasizes both kinds of tests: 1) psychometric tests, which are the foundation of scale construction and scoring; and 2) applied tests of relevance and usefulness that approximate a particular use of the measure.15

Construct validation, the accumulation of evidence of validity in relation to theoretical constructs, requires three steps:17 1) specifying the domain of variables, i.e., preparing a blueprint for constructs; 2) establishing the internal structure of observed variables; and 3) verifying theoretical relationships between scale scores and external criteria. In this article, we focus on the second and third aspects of construct validity. The conceptual blueprint and rationale underlying item selection for the eight SF-36 health concepts have been previously reported.13

Briefly, the SF-36 survey was constructed to achieve two well-accepted standards of comprehensiveness: 1) representation of multidimensional health concepts; and 2) measurement of the full range of health states, including levels of well-being and personal evaluations of health. Accordingly, the SF-36 measures the health concepts most frequently included in widely used health surveys (physical, role, and social functioning, mental health, and general health perceptions) as well as two additional concepts strongly supported by empirical work (bodily pain and vitality). To achieve depth of measurement for each health concept, i.e., measurement precision, short-form multi-item scales were constructed from a subset of items shown to best reproduce a full-length and well-validated scale.

The full-length measures of general health status that preceded the SF-36 were constructed to capture two major dimensions of health (physical and mental), and these dimensions have been empirically confirmed in both general and patient populations.8,18,19 We replicated this important psychometric test of construct validity for the SF-36 measures. We also went beyond psychometric tests and evaluated whether similar patterns of results are observed when the scales are examined in relation to clinical criteria of physical and mental health status. Finally, we compared results from psychometric and clinical criteria to determine the extent to which conclusions about the convergent and discriminant validity of each scale are replicated across criteria. Because interest in using general health scales in clinical research and medical practice is growing rapidly, information about validity in relation to clinical criteria is crucial to document the size of small and large differences
and to advance understanding of how these differences should be interpreted.

Methods

Sample and Data Collection

The data for this analysis came from MOS forms completed by patients and physicians and from health examinations administered in 1986-1987. Details on study objectives and design, including selection of sites and recruitment of clinicians and patients, have been extensively reported1,4,5,10,20,21 and are briefly summarized here. The MOS was conducted in three cities (Boston, Mass; Chicago, Ill; and Los Angeles, Calif) selected from three of four census regions. In each city, one large health maintenance organization (HMO), numerous multispecialty groups, and representative solo practices were studied. From these systems of care, physicians board certified or board eligible in family practice, internal medicine, cardiology, endocrinology, and psychiatry were identified along with clinical psychologists, clinical social workers, and other mental health providers. Solo and small-group clinicians were identified from master files of the American Medical Association, American Academy of Family Physicians, and American Psychological Association. Multispecialty group clinicians were identified from the Medical Group Management Association membership directory, and HMO clinicians were identified by upper-level management.

The process of enrolling clinicians differed by system of care. Of eligible clinicians practicing in HMOs or large multispecialty groups, 225 (79%) agreed to participate in the MOS.'1 Solo and small-group clinicians were selected by a multistage sampling process. This process yielded 298 solo or small-group practitioners (58% of those eligible and who agreed to be contacted).10 Physician participants were similar to nonparticipants regarding clinical training and sociodemographic and practice characteristics; participants tended to be more involved in direct patient care than nonparticipants.1

Study participants were English-speaking adults (18 years of age and older) who had an office visit with an enrolled clinician during 9-day screening periods in February to November, 1986. Patients seen during this period were asked to complete a brief, standardized, self-report questionnaire that gathered information about chronic disease, depressive symptoms, sociodemographic characteristics, and general health status. Complete questionnaires were obtained from 74% of eligible patients treated in group practices and from 65% of patients treated in solo or small-group practices (N = 22,462). For 96% of these patients, their clinicians also completed a brief, standardized questionnaire that elicited information on diagnosis, disease severity, and visit content.

Data from the physician-completed questionnaires were used to identify patients with the four MOS medical tracer conditions (hypertension, diabetes, congestive heart failure (CHF), and recent myocardial infarction (MI)).1,4,10 Patients with these conditions were identified on the basis of a standardized physician report form. A two-stage process, involving a depressive symptom scale22 included in the patient-completed questionnaire and the National Institute of Mental Health's Diagnostic Interview Schedule (DIS), was used to identify patients with depression and to stage their severity.5,21

Patients with matched patient and physician questionnaires who were determined to have one of the medical tracer conditions or current depressive symptoms were subsequently contacted for a telephone interview. This interview was designed to: 1) determine the presence of psychiatric disorder among those with current symptoms by using the depression section of the DIS;5,21 and 2) enroll patients who met DIS criteria for psychiatric disorders and patients who met original diagnostic criteria for the medical tracers. Of those eligible for enrollment and
who were successfully contacted by telephone, 73% (N = 5,341) completed the interview and 91% (N = 4,824) of interviewed patients agreed to enroll in the study. Upon enrollment, patients were invited to the MOS Health Examination and were sent the baseline Patient Assessment Questionnaire. The health examination (standardized medical history and clinical examination) was independently conducted by specially trained MOS medical staff. Health examinations were completed on 2,583 patients and 3,445 patients returned the baseline questionnaire.

The MOS patient sample used for the psychometric analyses included all enrolled patients who completed the 245-item baseline questionnaire, which included the 36 items that were later used to construct the SF-36 survey (N = 3,445). Because disease-specific information from the health examination was used to stage severity for clinical tests of validity reported here, we limited that sample to a subset of enrolled patients who completed both the baseline questionnaire and health examination within a 1-month period (N = 1,014). We required the completion of the baseline questionnaire and the health examination to be within a 1-month period so that the clinical criteria and the health scales they are compared with were measured in close proximity. The sample analyzed here for clinical tests of validity is similar to that used in previously reported comparisons of the relative precision of single-item and MOS short- and long-form general health status measures.23 In this article, we add a fourth group of patients: those who have both chronic medical and psychiatric conditions. We also add clinical tests of validity using the severity of psychiatric disorders as additional clinical criteria.

Tests of Validity Using Psychometric Criteria

Previous studies investigating the dimensionality of self-reports of health have confirmed distinct physical and mental health components.18,19,24-27 To test for these dimensions of health within the SF-36, we extracted principal components from the correlations among its eight scales.28 Correlations between the scales and the first unrotated component test for the large general health factor hypothesized to be common to all eight scales. The pattern of correlations between the eight scales and the two rotated components tests the validity of each scale in relation to hypothesized physical and mental health dimensions.

Tests of Validity Using Clinical Criteria

We also assessed the validity of each scale by comparing patient groups differing in physical and/or mental health status and severity. Using clinical criteria, four mutually exclusive groups were formed: Group 1, minor (uncomplicated) chronic medical conditions only (N = 638); Group 2, serious (complicated) chronic medical conditions only (N = 168); Group 3, psychiatric conditions only (N = 163); and Group 4, both serious medical and psychiatric conditions (N = 45). The first three groups are identical to those studied elsewhere.23 We document here more thoroughly the clinical criteria used to define each group.

To distinguish patients differing in severity of chronic medical condition, we used disease-specific severity scales constructed from the standardized medical history interview.29,30 Patients classified as having a serious chronic medical condition (Groups 2 and 4) included the following: 1) CHF patients reporting edema, orthopnea, or dyspnea on exertion (5% of CHF patients); 2) MI survivors with noteworthy and recurring angina symptoms and/or severe CHF symptomology (2% of MI patients); and 3) hypertension patients with reports of severe CHF symptomology and/or history of a stroke (2% of hypertension patients). Twelve percent of diabetic patients were classified as severe because of the presence of at least one of the following complications: history of an MI; weekly angina; severe autonomic neuropathy; moderately severe peripheral neuropathy and lack of blood sugar control or severe vision problems or moderately severe autonomic neuropathy; or recurring angina monthly and lack of blood sugar control or severe vision problems or severe peripheral neuropathy or moderately severe autonomic neuropathy.

We defined psychiatric conditions using well-established psychiatric diagnostic criteria, as reported in detail elsewhere.5,21,22 Briefly, patients were determined to have current depressive symptoms based on responses to an eight-item depression symptom scale22 administered during the screening visit. The subsequent DIS telephone interview (described earlier) was used to classify them as having current unipolar affective disorder (major depression or dysthymia) or serious depressive symptoms in the absence of a disorder. Patients with either depressive disorders or current symptoms were included in Groups 3 and 4. To test validity in relation to severity of psychiatric condition, we disaggregated Group 3 and compared patients with current unipolar affective disorder to those with serious depressive symptoms in the absence of a disorder.

Hypotheses

The first panel of Table 1 presents hypotheses regarding the factor content of each SF-36 scale along with results of tests of those hypotheses, which are presented below. We define a strong association as a correlation greater than 0.70, moderate to substantial as a correlation of 0.30 to 0.70, and weak as a correlation less than 0.30. These are approximately equivalent, in variance terms, to shared variances of > 50%, 10% to 50%, and < 10%.

On the basis of previous research,18,19,26 we expected SF-36 scales measuring physical functioning, role limitations due to physical health problems, and bodily pain 1) to be most highly correlated with an empirically derived physical health component; 2) to be most valid in distinguishing groups differing in severity of chronic medical condition; 3) to show little or no association with the mental health component; and 4) to perform less well than the mental health scales in distinguishing groups differing in the presence

TABLE 1. Hypothesized Associations Between SF-36 Scales and Results From Psychometric Tests

                              Hypothesized         Rotated Principal
                              Association          Components                      Relative Validity (b)
Scale                         Physical   Mental    Physical (a)   Mental (a)   h2     Physical   Mental

Physical functioning          +          -         0.88           0.04         0.78   1.00       0.00
Role-physical                 +          -         0.78           0.30         0.70   0.79       0.11
Bodily pain                   +          -         0.77           0.24         0.65   0.77       0.07
Mental health                 -          +         0.12           0.90         0.82   0.02       1.00
Role-emotional                -          +         0.19           0.81         0.69   0.05       0.81
Social functioning            *          +         0.44           0.71         0.70   0.25       0.62
Vitality                      *          *         0.59           0.57         0.67   0.45       0.40
General health perceptions    *          *         0.68           0.32         0.56   0.60       0.13

h2, proportion of total variance of each scale explained by the two extracted components.
(a) Correlation between each scale and rotated principal component.
(b) Computed by the ratio of the common-factor variance of each scale relative to the scale with the greatest common-factor variance. The common-factor variance of each scale is the square of each scale-component correlation.
+ Strong Association (r > 0.70)
* Moderate to Substantial Association (0.30 < r < 0.70)
- Weak Association (r < 0.30)
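
The derived columns of Table 1 follow arithmetically from the two rotated loadings, so they can be checked with a few lines of code. The sketch below is illustrative only and is not the authors' software: the function names are hypothetical, and the treatment of values falling exactly on 0.30 or 0.70 is an assumption based on the wording of the Hypotheses section. It squares each loading to get the common-factor variance, sums the two squares to get h2, and divides each scale's common-factor variance by the largest one on that component to get relative validity, as described in footnote b.

```python
# Rotated loadings (physical, mental) copied from Table 1.
LOADINGS = {
    "Physical functioning": (0.88, 0.04),
    "Role-physical": (0.78, 0.30),
    "Bodily pain": (0.77, 0.24),
    "Mental health": (0.12, 0.90),
    "Role-emotional": (0.19, 0.81),
    "Social functioning": (0.44, 0.71),
    "Vitality": (0.59, 0.57),
    "General health perceptions": (0.68, 0.32),
}


def communality(loadings):
    """h2: variance explained by both components (sum of squared loadings)."""
    return sum(value ** 2 for value in loadings)


def relative_validity(table, component):
    """RV: squared loading divided by the largest squared loading on a component.

    component is 0 for the physical component and 1 for the mental component.
    """
    shared = {scale: values[component] ** 2 for scale, values in table.items()}
    best = max(shared.values())
    return {scale: variance / best for scale, variance in shared.items()}


def association_label(r):
    """Classify a correlation using the article's thresholds.

    Boundary handling (0.30 and 0.70 counted as moderate) is an assumption.
    """
    size = abs(r)
    if size > 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate to substantial"
    return "weak"


rv_physical = relative_validity(LOADINGS, 0)
rv_mental = relative_validity(LOADINGS, 1)

for scale, loadings in LOADINGS.items():
    print(
        f"{scale:28s} h2={communality(loadings):.2f} "
        f"RV physical={rv_physical[scale]:.2f} RV mental={rv_mental[scale]:.2f}"
    )

# The thresholds expressed as shared variance: 0.70 squared is 0.49 and
# 0.30 squared is 0.09, which the text rounds to 50% and 10%.
print(f"{0.70 ** 2:.2f} {0.30 ** 2:.2f}")
```

Rounded to two decimals, these computed values agree with the printed h2 and relative validity columns, which suggests the table was recovered correctly from the scan.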

and severity of psychiatric disorders. Similarly, on the basis of previous research,18,19,26 we expected SF-36 measures of general mental health and role limitations due to emotional problems 1) to be most highly correlated with the mental health component; 2) to be most valid in distinguishing groups differing in the presence and severity of psychiatric disorders; 3) to show little or no association with the physical health component; and 4) to perform less well than the physical health scales in distinguishing patients differing in severity of chronic medical condition.

On the basis of their content, we expected some scales to measure both physical and mental health factors and, thus, to be valid for purposes of comparing groups differing in both physical and mental health status as clinically defined. First, on the basis of previous research,18,19,26 we expected the vitality and general health perceptions scales to be moderately correlated with both physical and mental health components and to distinguish groups differing in both physical and mental health status. Second, although we expected the social functioning scale to be highly correlated with the mental health component,18 we also expected a moderate correlation with the physical health component. Because the social functioning items confound physical and mental health by design, that scale should be sensitive to the burden of both physical and mental health as clinically defined.

Methods of Analysis

The general methodology used for assessing the relative validity (RV) of the eight SF-36 scales as measures of physical and mental health constructs has its roots in the concept of statistical efficiency.31,32 Briefly, a measure is more efficient, relative to another, if it yields the right information with greater accuracy (less error). Liang et al.33 applied the concept of statistical efficiency in health status assessment by comparing the relative efficiency of five health status instruments (using the ratio of squared t-statistics) in detecting change in functioning over time. We improved on this methodology in tests of the relative precision of short- and long-form health status scales by holding sample size constant within comparisons, holding groups constant across comparisons, and defining clinical groups to differ in clearly interpretable ways.23

We extend here the methodology to test the RV of the eight SF-36 scales as indicators of two unobservable health constructs. For clinical tests of validity, we used unadjusted general linear models to estimate mean differences between pairs of clinical groups for each of the eight scales. The resulting F-statistic for each scale defines the ratio of between-groups (systematic) variance relative to within-group (error) variance. The greater the F-ratio, the greater the amount of information (systematic variance) a scale provides about the criterion relative to error variance. Sample size was held constant across scales to standardize comparisons. By analyzing identical samples across scales for each clinical contrast, the relative size of F-ratios reflects the relevance of the scales to a particular criterion. We estimated RV for the eight scales for each clinical-group contrast by computing the ratio of pair-wise F-statistics (F for each comparison scale divided by F for the most valid scale). The resulting RV estimates indicate in proportional terms how much less valid each scale is as a measure of physical or mental health status, relative to the most valid scale.

We used principal components analysis to test the hypothesized dimensionality of the SF-36 scales. Because we hypothesized two dimensions to underlie the structure of the eight scales, we extracted two principal components. The size of the first unrotated component and the pattern of correlations between it and the eight scales gauge the extent to which the scales contribute to a common general health dimension. To facilitate interpretation, we rotated the components to orthogonal simple structure using the varimax method. To interpret the components, we examined the pattern of correlations across the eight scales. To evaluate the validity of each of the eight scales, we compared their correlations with the hypothesized component(s) (convergent validity) versus the other component (discriminant validity).

To evaluate the factorial validity of each scale as a measure of each component, we first squared each factor loading (scale-component correlation) to estimate the proportion of variance shared with that component (common-factor variance). We defined the scale sharing the most variance with each component as the most valid measure of that component. For each component, we then estimated RV for each scale by dividing the variance shared with the component by that estimate for the most valid scale. These ratios indicate in proportional terms how much less valid each scale is relative to the most valid scale. The higher the RV of a scale, the more precisely or efficiently it measures the underlying construct of interest as defined by the most valid scale.

Results

Validation of Clinical Groups Compared

As Table 2 shows, clinical criteria produced the desired mutually exclusive groups differing in the severity of medical and psychiatric conditions. None of the patients assigned to Groups 1 and 3 had complicated medical conditions, whereas patients in Groups 2 and 4 all had complicated medical conditions. As intended, none of the patients in Groups 1 and 2 had current depressive symptoms or disorders. All patients in Groups 3 and 4 had current depressive symptoms, and 64% of Group 3 and 22% of Group 4 patients had a current depressive disorder.

TABLE 2. Characteristics of Patients in Four Clinical Groups

                                                                                           Psychiatric and
                                         Minor Medical    Serious Medical  Psychiatric     Serious Medical
                                         Conditions (a)   Conditions (b)   Condition       Conditions (d)
                                                                           Only (c)
                                         (N = 638)        (N = 168)        (N = 163)       (N = 45)
Patient Characteristics                  1                2                3               4

Sociodemographics
  Mean age (SD)                          57.4 (12.8)      61.0 (12.4)      41.8 (12.6)     54.4 (12.5)
  % female                               47.0             49.7             73.0            68.9
Medical and psychiatric conditions
  % Complicated advanced coronary
    artery disease                       0.0              35.1             0.0             17.8
  % Complicated hypertension             0.0              20.8             0.0             28.9
  % Complicated diabetes                 0.0              61.3             0.0             62.2
  % Current depressive symptoms          0.0              0.0              100.0           100.0
  % Current depressive disorder          0.0              0.0              63.8            22.2
Health status
  % Self-rated health fair or poor       17.4             43.8             21.6            74.4
  % Any bed days last 3 months           8.7              15.8             35.2            46.7
Provider specialty
  Medical subspecialist                  13.9             25.6             3.7             24.4
  Mental health professional             0.0              0.0              42.3            0.0
Utilization of health care services
  % Clinician visit within past two
    weeks                                29.1             40.0             45.3            51.3
  % Hospitalized past 12 months          12.3             25.2             20.8            47.4
  % Ever utilized mental health
    services                             21.2             22.4             82.5            56.8

(a) Minor medical: patients with uncomplicated chronic medical conditions.
(b) Serious medical: patients with advanced or complicated chronic medical conditions.
(c) Psychiatric only: patients with either current depressive symptoms or disorder but no chronic medical condition.
(d) Psychiatric and serious medical: patients with either current depressive symptoms or disorder and a serious chronic medical condition.

Demographic differences among the groups correspond well with epidemiologic trends in the United States34 (Group 3 patients were the youngest and disproportionately female and Group 2 patients were the oldest). The substantial differences in personal ratings of health, proportion reporting any bed days in the last 3 months, and utilization of health services across groups provide further evidence of the desired distinctions between the groups in health status as clinically defined. For example, 74% of patients with both serious medical and psychiatric conditions reported their health as fair or poor, compared with 44% of serious medical patients and 22% or less of patients with solely psychiatric or minor medical conditions. Report of any bed days in the last 3 months was also greatest among patients with both serious medical and psychiatric conditions. Patients with minor medical conditions were the least likely to have recently used health care services, while patients with psychiatric conditions were the most likely to have ever consulted a mental health professional. In summary, these data provide prima facie evidence that the intended differences in the presence and severity of medical and psychiatric conditions were achieved across the comparison groups.

Psychometric Validity

The components analysis confirmed the substantial general health dimension hypothesized to be common to all eight scales. The first principal component accounted for 55% of the total measured variance and correlated highly with all eight scales (range = 0.67 for role-emotional to 0.82 for vitality, median = 0.74). Extraction of the second component increased the percentage of total variance explained from 55% to 70%. Communalities (h2 in Table 1) indicate the extent of overlap in terms of common variance between each measure and the two extracted factors. The proportion of total variance in each scale accounted for by the two-factor solution ranged from 0.56 to 0.82 across scales, indicating that the two factors accounted for the majority of the reliable variance in each scale.

The middle panel of Table 1 presents correlations between the SF-36 scales and the two rotated components. Rotation of these components confirmed the hypothesized physical and mental dimensions of health. As hypothesized for a physical health component, the physical functioning, role-physical, and bodily pain scales correlated most highly with the first rotated component, while the mental health and role-emotional scales correlated weakly. As hypothesized for a mental health component, the order of correlations with the eight scales was nearly reversed for the second component. Specifically, the mental health, role-emotional, and social functioning scales correlated most highly with the second component, while physical functioning, bodily pain, and role-physical scales correlated weakly. Based on these patterns of correlations, we interpreted the first and second components as "physical" and "mental" health dimensions, respectively.

The third panel of Table 1 presents estimates of the RV of the eight scales as measures of physical and mental health components. Because the physical functioning and mental health scales had the highest correlations, respectively, with the physical and mental health components, they served as the standards for estimating RV. As hypothesized, the role-physical and bodily pain
This content downloaded from 137.99.31.134 on Mon, 13 Apr 2015 21:42:56 UTC
All use subject to JSTOR Terms and Conditions
Vol. 31, No. 3 SF-36: PSYCHOMETRIC
AND CLINICALTESTS

scales showed strong associations, in terms of shared common-factor variance, with the physical health component (RV = 79% and 77%, respectively). The mental health component was best measured by the mental health scale, followed by the role-emotional and social functioning scales (RV = 0.81 and 0.62, respectively). The three scales hypothesized to measure more than one health dimension (social functioning, vitality, and general health perceptions) showed moderate to strong associations with both components. However, for these three scales, there was substantial variation in observed RV estimates: the general health perceptions scale was clearly more strongly associated with the physical than mental component; the social functioning scale was more highly associated with the mental than physical component; and the vitality scale showed nearly equal associations with both components.

Clinical Validity

Tables 3, 4, and 5 present results from tests of validity based on comparisons among the four clinical groups. These comparisons test the validity of the scales in detecting decrements in health status associated with chronic medical and/or psychiatric conditions. Table 3 presents means and standard errors for each group across the eight SF-36 scales.

Table 4 presents pair-wise mean differences, pair-wise F-statistics, and estimates of RV for group comparisons involving minor medical patients. Patients with serious medical conditions scored significantly lower on all eight scales compared to patients with minor medical conditions (Group 2 vs. 1). However, as indicated by the wide range of observed RV estimates, all scales were not equally valid in this clinical-group comparison. As hypothesized, the physical functioning scale was most valid in detecting differences between patients with minor versus serious medical conditions. The general health perceptions scale nearly equaled that standard (RV = 0.99), followed by the role-physical and vitality scales (RV = 0.71 and 0.67, respectively). As hypothesized, the best mental health scales (mental health and

TABLE 3. Means (and Standard Errors) for Groups Differing in Medical and Psychiatric Conditions

                                            Comparison Groups
                              Group 1     Group 2     Group 3       Group 4
                              Minor       Serious     Psychiatric   Psychiatric &
                              Medical     Medical     Only          Serious Medical
Scale                         N = 576     N = 144     N = 153       N = 43

Physical functioning          80.53       57.35       80.62         46.37
                              (0.89)      (2.34)      (1.64)        (4.24)
Role-physical                 70.27       43.92       55.56         23.84
                              (1.48)      (3.31)      (3.18)        (4.63)
Bodily pain                   76.06       65.10       63.30         50.23
                              (0.91)      (2.06)      (1.91)        (3.52)
Mental health                 82.49       77.59       52.75         56.90
                              (0.59)      (1.32)      (1.63)        (3.08)
Role-emotional                84.26       76.16       40.74         52.71
                              (1.27)      (3.11)      (3.20)        (5.89)
Social functioning            91.62       80.03       64.54         65.12
                              (0.62)      (2.03)      (2.06)        (3.44)
Vitality                      62.02       47.79       45.32         37.05
                              (0.82)      (1.82)      (1.65)        (3.11)
General health perceptions    67.02       49.13       57.91         39.93
                              (0.74)      (1.80)      (1.75)        (2.30)
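
As a rough consistency check on Table 3, a pair-wise F-statistic for two groups can be approximated from their means and standard errors as the squared ratio of the mean difference to the standard error of that difference. The following is a minimal sketch, assuming independent groups and an unpooled standard error of the difference; the function name `approx_pairwise_f` is hypothetical, and the authors' exact procedure may differ, so small discrepancies from the published F-statistics are to be expected.

```python
import math


def approx_pairwise_f(mean_a, se_a, mean_b, se_b):
    """Approximate a two-group F-statistic as the square of a t-like ratio.

    Assumption (not stated in the article): the groups are independent and
    the standard error of the difference is sqrt(se_a**2 + se_b**2).
    """
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (diff / se_diff) ** 2


# Physical functioning: Group 2 (serious medical) vs. Group 1 (minor medical),
# using the means and standard errors reported in Table 3.
f_pf = approx_pairwise_f(57.35, 2.34, 80.53, 0.89)

# Mental health: Group 3 (psychiatric only) vs. Group 1 (minor medical).
f_mh = approx_pairwise_f(52.75, 1.63, 82.49, 0.59)

print(f"physical functioning: F is roughly {f_pf:.1f}")
print(f"mental health:        F is roughly {f_mh:.1f}")
```

Under these assumptions both approximations fall within one unit of the corresponding F-statistics reported in Table 4 (85.9 and 294.7).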

MCHORNEY, WARE, AND RACZEK MEDICALCARE

TABLE 4. Summary of Clinical Validity Tests Involving Minor Medical Patients

                              Group 2 vs. 1                 Group 3 vs. 1                 Group 4 vs. 1
                              Serious Medical vs.           Psychiatric vs.               Both Serious Medical and
                              Minor Medical                 Minor Medical                 Psychiatric vs. Minor Medical

                              Mean                Relative  Mean                Relative  Mean                Relative
Scale                         Difference  F       Validity  Difference  F       Validity  Difference  F       Validity

Physical functioning          -23.18a     85.9    1.00      0.09        0.0     0.00      -34.16a     62.2    0.66
Role-physical                 -26.35a     60.6    0.71      -14.71a     19.9    0.07      -46.43a     69.9    0.74
Bodily pain                   -10.96a     23.6    0.27      -12.76a     39.9    0.14      -25.83a     55.6    0.59
Mental health                 -4.90a      13.3    0.15      -29.74a     294.7   1.00      -25.59a     66.7    0.71
Role-emotional                -8.10b      5.8     0.07      -43.52a     159.9   0.54      -31.55a     27.4    0.29
Social functioning            -11.59a     29.9    0.35      -27.08a     158.6   0.54      -26.50a     57.4    0.61
Vitality                      -14.23a     57.8    0.67      -16.70a     86.0    0.29      -24.97a     64.4    0.68
General health perceptions    -17.89a     84.7    0.99      -9.11a      22.9    0.08      -27.09a     94.4    1.00

a P < 0.001.
b P < 0.01.
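
The Relative Validity column in Table 4 is consistent with dividing each scale's F-statistic by the largest F-statistic in the same comparison. The following is a minimal sketch, assuming that ratio definition and rounding to two decimals; the helper name `relative_validity` is hypothetical and is not taken from the article.

```python
def relative_validity(f_stats):
    """Return RV per scale: each F divided by the largest F in the comparison.

    Assumption: RV is the ratio of a scale's F-statistic to that of the
    best-performing scale, rounded to two decimals as in the tables.
    """
    best = max(f_stats.values())
    return {scale: round(f / best, 2) for scale, f in f_stats.items()}


# F-statistics for Group 2 vs. Group 1 as reported in Table 4.
f_group2_vs_1 = {
    "Physical functioning": 85.9,
    "Role-physical": 60.6,
    "Bodily pain": 23.6,
    "Mental health": 13.3,
    "Role-emotional": 5.8,
    "Social functioning": 29.9,
    "Vitality": 57.8,
    "General health perceptions": 84.7,
}

rv = relative_validity(f_group2_vs_1)
for scale, value in rv.items():
    print(f"{scale:28s} {value:.2f}")
```

With the physical functioning scale as the standard (RV = 1.00), the ratios reproduce the Group 2 vs. 1 column of Table 4, for example 60.6 / 85.9 = 0.71 for the role-physical scale.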

role-emotional) performed most poorly in this test. The bodily pain scale performed less well than hypothesized (RV = 0.27).

As hypothesized, for clinical comparisons involving the presence or absence of a psychiatric condition (Group 3 vs. 1), the mental health scale proved to be the most valid, followed by the role-emotional and social functioning scales (RV = 0.54 each). Also as hypothesized, the physical functioning scale did not distinguish between groups differing only in psychiatric condition (RV = 0.00), and the role-physical and bodily pain scales were less valid measures for this group contrast. The general health perceptions scale also yielded poor validity relative to the standard in this test (RV = 0.08).

Patients with both serious medical and psychiatric conditions scored significantly lower than minor medical patients in all eight scales (Group 4 vs. 1). The general health perceptions scale was most valid in detecting the combined effects of medical and psychiatric conditions. The other scales performed similarly in this test (RV range = 0.59-0.74), with the exception of the role-emotional scale (RV = 0.29).

Table 5 extends tests of validity to groups of patients with serious medical and psychiatric conditions. The mental health scale was most valid in detecting the incremental burden of a psychiatric condition among patients with serious medical conditions (Group 4 vs. 2). The other seven scales were well below that standard (RV range = 0.13 to 0.34, median = 0.32). The physical functioning scale was most valid in detecting the incremental burden of a serious medical condition among patients with a psychiatric condition (Group 4 vs. 3), followed by the general health perceptions and role-physical scales (RV = 0.68 and 0.56, respectively). The remaining five scales performed relatively poorly in this test (RV range = 0.00 to 0.18).

The mental health scale was most valid in distinguishing serious medical from psychiatric patients (Group 3 vs. 2). Although the physical functioning and role-emotional scales had similar RV estimates (RV = 0.47 and 0.45, respectively), their group mean differences were in opposite directions, as would be expected. Specifically, patients with psychiatric conditions had better physical functioning but worse role-emotional functioning than patients with serious medical conditions. Scales measuring social functioning, general health perceptions, and role-physical showed significant differences between the groups but were far less valid.


TABLE 5. Summary of Clinical Validity Tests Involving Chronically Ill Patients

                              Group 4 vs. 2                 Group 4 vs. 3                 Group 3 vs. 2
                              Psychiatric Incremental:      Medical Incremental:          Psychiatric vs.
                              Psychiatric Among             Serious Medical Among         Serious Medical
                              Serious Medical               Psychiatric

                              Mean                Relative  Mean                Relative  Mean                Relative
Scale                         Difference  F       Validity  Difference  F       Validity  Difference  F       Validity

Physical functioning          -10.98c     5.1     0.13      -34.25a     56.8    1.00      23.27a      66.4    0.47
Role-physical                 -20.08a     12.5    0.33      -31.72a     31.9    0.56      11.64c      6.4     0.05
Bodily pain                   -14.87a     12.3    0.32      -13.07b     10.4    0.18      -1.80       0.4     0.00
Mental health                 -20.69a     38.1    1.00      4.15        1.4     0.02      -24.84a     140.2   1.00
Role-emotional                -23.45a     12.9    0.34      11.97       3.1     0.05      -35.42a     62.7    0.45
Social functioning            -14.91a     12.9    0.34      0.58        0.0     0.00      -15.49a     28.7    0.20
Vitality                      -10.74b     8.3     0.22      -8.27c      5.5     0.10      -2.47       1.0     0.01
General health perceptions    -9.20b      9.9     0.26      -17.98a     38.6    0.68      8.78a       12.3    0.09

a P < 0.001.
b P < 0.01.
c P < 0.05.
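
The range and median quoted in the text for the Group 4 vs. 2 comparison (RV range = 0.13 to 0.34, median = 0.32) can be recovered from the Relative Validity column of Table 5 by setting aside the standard scale and summarizing the other seven. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from statistics import median

# RV estimates for Group 4 vs. Group 2 as reported in Table 5.
rv_group4_vs_2 = {
    "Physical functioning": 0.13,
    "Role-physical": 0.33,
    "Bodily pain": 0.32,
    "Mental health": 1.00,
    "Role-emotional": 0.34,
    "Social functioning": 0.34,
    "Vitality": 0.22,
    "General health perceptions": 0.26,
}

# The scale with the largest RV serves as the standard for this comparison.
best_scale = max(rv_group4_vs_2, key=rv_group4_vs_2.get)

# Summarize the remaining seven scales: (minimum, maximum, median).
others = sorted(v for s, v in rv_group4_vs_2.items() if s != best_scale)
summary = (others[0], others[-1], median(others))

print(best_scale, summary)
```

The same summary applied to the Group 4 vs. 3 column would use the physical functioning scale as the standard.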

The vitality and bodily pain scales did not distinguish these two groups.

Table 6 presents results for tests of validity in relation to the severity of psychiatric disorder for patients within Group 3: symptomatic depression versus more severe clinical depression. As hypothesized, the mental health scale was most valid in detecting these differences, followed by scales measuring role-emotional (RV = 0.43), social functioning (RV = 0.32), and vitality (RV = 0.31). The best physical health measures (physical functioning, role-physical, bodily pain, and general health perceptions) all had RV estimates close to 0.

Summary of Results

Table 7 presents hypotheses for each scale and summarizes RV estimates obtained

TABLE 6. Summary of Clinical Validity Results for Groups Differing in Severity of Psychiatric Condition

                              Symptomatic   Clinical
                              Depression    Depression    Mean                  Relative
Scale                         N = 56        N = 97        Difference  F         Validity

Physical functioning          81.20         80.28         -0.92       0.07      0.00
                              (2.92)        (1.97)
Role-physical                 62.95         51.29         -11.66      3.16      0.06
                              (5.21)        (3.97)
Bodily pain                   64.71         62.48         -2.23       0.31      0.01
                              (3.25)        (2.37)
Mental health                 65.19         45.56         -19.63a     49.03a    1.00
                              (2.00)        (1.96)
Role-emotional                58.93         30.24         -28.69a     21.10a    0.43
                              (5.50)        (3.53)
Social functioning            74.78         58.63         -16.15a     15.64a    0.32
                              (3.10)        (2.53)
Vitality                      53.39         40.65         -12.74a     15.06a    0.31
                              (2.47)        (2.05)
General health perceptions    59.95         56.74         -3.21       0.78      0.02
                              (2.78)        (2.25)

a P < 0.001.
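
The two depression subgroups in Table 6 appear to make up Group 3 of Table 3 (56 + 97 = 153 patients), so their sample-size-weighted means should reproduce the Group 3 means up to rounding. The following is a minimal sketch of that cross-check; the helper name `pooled_mean` is hypothetical, and the equivalence of the two samples is an inference from the matching sample sizes, not a statement made in the text.

```python
def pooled_mean(mean_a, n_a, mean_b, n_b):
    """Sample-size-weighted mean of two subgroup means."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Subgroup sizes from Table 6: symptomatic vs. clinical depression.
N_SYMPTOMATIC, N_CLINICAL = 56, 97

# Physical functioning and mental health subgroup means from Table 6.
pf = pooled_mean(81.20, N_SYMPTOMATIC, 80.28, N_CLINICAL)
mh = pooled_mean(65.19, N_SYMPTOMATIC, 45.56, N_CLINICAL)

print(f"physical functioning: {pf:.2f}")
print(f"mental health:        {mh:.2f}")
```

Both weighted means agree with the Group 3 values in Table 3 (80.62 and 52.75) to within rounding of the published subgroup means.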


from psychometric tests and five clinical tests. These clinical tests were judged to be most useful because they tested convergent and discriminant validity in relation to unconfounded differences in physical or mental health as clinically defined. This table can be interpreted both row-wise and column-wise. Each column summarizes results across scales for a particular validity criterion. Table entries for a given column (criterion) are RV estimates, which indicate how much less valid a scale is relative to the best scale. These results serve as guidelines for hypothesizing which scale/concept is most relevant to each criterion. Summaries of results by row indicate whether the interpretation of each scale is pure or complex; that is, whether observed differences are largely due to one health component or likely due to both components. These results serve as guidelines for interpreting each scale.

[Table 7 (hypotheses and summary of RV estimates for each scale): table body not recoverable from the extracted text.]

As summarized in Table 7, the scales identified in the components analysis to best represent the physical and mental health dimensions (physical functioning and mental health) were most valid, respectively, in clinical tests involving detection of the burden of severe medical versus psychiatric conditions. Further, the mental health scale best distinguished between patients within the psychiatric group who differed only in the severity of their disorder. These findings support the convergent validity of the physical functioning and mental health scales. Consistent with results from psychometric tests, the physical functioning scale was least valid in tests involving the presence and severity of psychiatric conditions and the mental health scale was least valid in the medical severity test. These findings support the discriminant validity of these two scales.

The incremental burden tests summarized in Table 7 provide further evidence for the convergent and discriminant validity of these two scales. The physical functioning scale was most valid, and the mental health scale least valid, in detecting the incremental burden of serious medical conditions among

those with a psychiatric condition. The mental health scale was most valid, and physical functioning least valid, in detecting the incremental burden of psychiatric conditions among those with serious medical conditions.

The role-physical and role-emotional scales showed strong convergent and discriminant validity in relation to role disabilities associated with medical versus psychiatric disorders. In both psychometric and clinical tests, each role functioning scale was strongly related to one component (physical or mental) and unrelated to the other component. For both physical and mental health dimensions, the social functioning scale showed moderate to strong convergent validity across psychometric and clinical tests but fairly poor discriminant validity. As hypothesized, the vitality scale showed good convergent validity for physical and mental health effects in both psychometric and clinical tests, but it has poor discriminant validity.

Two exceptions in expected results from psychometric and clinical validity tests are apparent in Table 7. First, the bodily pain scale showed strong convergent validity in the physical-health factorial test as hypothesized, but poor convergent validity in both medical-severity clinical tests. Second, we hypothesized moderate convergent validity for the general health perceptions scale in relation to both physical and mental components of health. However, for both psychometric and clinical criteria, it performed relatively better than hypothesized in physical health tests and relatively worse in mental health tests.

Discussion

Well-accepted definitional standards35-39 and empirical work to date18,19,24-27 have identified physical and mental components of health status. The SF-36 survey was constructed to provide a comprehensive assessment of each of these dimensions.13 We used psychometric and clinical standards to assess the validity of each SF-36 scale as a measure of the physical or mental dimension of health status. Overall, results from the psychometric and clinical tests of validity agreed with one another and converged with study hypotheses. Thus, there is a good basis for establishing guidelines for the interpretation of score differences for each scale as a measure of physical and/or mental health effects and also specifying the size of differences in each scale score that should be considered large.

Our results indicate that the physical functioning and mental health scales are relatively pure and, therefore, their interpretation is unequivocal. These two scales, respectively, measure the physical and mental dimensions of health and are most sensitive, respectively, to the clinical manifestations of medical and psychiatric conditions. Therefore, when observed differences are found on these scales, interpretation attributed to physical or mental causes can be made with a high degree of confidence. Unambiguous interpretations of these scores were generalizable both within and across various combinations of the medical and psychiatric conditions studied here. This information is important because little is known about the validity of health status measures in patients with both medical and psychiatric conditions.27

However, a comprehensive assessment of health requires representation of more than physical and mental functioning as defined by these two scales. To be comprehensive, an assessment should provide information on limitations in engaging in normative roles as a result of health problems. To capture aspects of disability, role and social functioning scales were included in the SF-36 survey. Observed differences on the role-physical scale can be interpreted as role disability associated largely, but not entirely, with physical health effects. Interpretation of scores may be complicated somewhat when psychiatric conditions are present (see

incremental test of physical health). Differences in role-emotional scores can be interpreted with confidence as role disability associated with mental health problems. By design, the social functioning scale confounds physical and mental health attributions. Accordingly, while the social functioning scale appears most sensitive to social disability associated with mental health problems, it is moderately sensitive to the burden of physical health problems as well. Interpretation of social functioning scores is, therefore, complex and observed differences cannot be confidently attributed to either physical or mental health problems.

The vitality scale is a subjective measure of general well-being. By design, it was intended to tap both positive health states (e.g., energy) as well as somatic expressions of physical illness and psychological distress (e.g., fatigue). As a result, the interpretation of vitality scores was expected to be complicated relative to both physical and mental health dimensions, and this was confirmed empirically in both psychometric and clinical tests of validity.

The strong convergent validity of the bodily pain scale in the psychometric test, yet poor convergent validity in medical tests, may be an artifact of the specific conditions that were represented in the severe medical group. The four medical conditions represented are not typically dominated by pain. Consistent with this explanation, previous studies have shown that the SF-36 severity of bodily pain item was the most valid measure in group discriminations involving patients with arthritis and back problems.4 This issue warrants further study. Given the weak to low-moderate associations between the bodily pain scale and both psychometric and clinical criteria for mental health, our results suggest that differences in this scale can be attributed largely to the physical dimension of health.

The relatively poor convergent validity results for the general health perceptions scale in both psychometric and clinical tests of the mental health component suggest that this scale is most sensitive to the physical health dimension. Further, RV estimates for the general health perceptions scale tended to be higher in clinical than in psychometric tests of physical health. These differences in results across psychometric and clinical tests suggest this scale taps aspects of physical health including but not limited to those represented in the physical functioning scale. Consistent with this finding, previous research has found measures of general health perceptions to be highly sensitive to both serious and minor physical symptoms, regardless of whether they are associated with physical limitations or with disability.40

Although the results of psychometric and clinical tests were not identical, taken as a whole they were very similar and provide a basis for guidelines for interpreting each scale. Both psychometric and clinical tests provided consistent information about the underlying nature of each scale (physical and/or mental) as well as the degree to which each scale measured that component (pure versus complex). We achieved a greater understanding of the validity of score inferences, and the quality of those inferences, by combining distinct approaches to construct validation: assessment of convergent and discriminant validity across psychometric and clinical standards. These results underscore the usefulness of combining psychometric with clinical tests to better understand the interpretation of measures.

An important lesson of this research is that a multidimensional assessment of health is necessary to achieve a comprehensive understanding of the impact of disease on health-related quality of life. Relatively pure measures, such as the physical functioning and mental health scales, are highly sensitive to the psychometric and clinical criteria studied here and permit unambiguous interpretations. However, sole use of these measures results in an incomplete assessment of health because they ignore variations in disability, personal evaluations of health, and general well-being. Therefore, despite the complexity of interpretation inherent in measures of role and social disability, vitality, and perceptions of health, they are essential qualities to measure to obtain a synergistic and comprehensive assessment of the burden of disease and/or treatment on patients' everyday functioning and well-being.

Further, multidimensional assessments of health are important because, unlike the groups deliberately formed for validity tests here, most patients have multiple coexisting conditions, both physical and mental. For example, medical comorbidity is common among patients with both chronic medical4,41 and psychiatric42 conditions, and psychiatric comorbidity is common among patients with medical conditions.43,44 Moreover, given the extent of under-recognition of depressive disorders in primary care,21,43,45 the prevalence of comorbid medical and psychiatric conditions may be greater than previously reported. Results from the incremental burden tests indicate that scales that measure both physical and mental dimensions may be most useful in these circumstances. For example, the general health perceptions scale was most valid in detecting the combined effect of having both a serious medical and psychiatric condition relative to uncomplicated patients. Analysis of a unidimensional measure will not capture the range of effects disease and/or treatment have on subjective states that have social meaning for the patient and possibly clinical significance for the practitioner.

One barrier to the meaningful use of general health status measures in clinical practice and research is the lack of information necessary to interpret scores.46,47 Our results not only provide guidelines for interpreting score differences in each scale but also provide guidelines for establishing the size of large score differences. Because clinically severe groups were compared, results reported here help to gauge the size of differences in scores that should be considered very large. These estimates apply only to the MOS SF-36 scoring algorithms, which are documented elsewhere.48 For example, a difference of 23 points on the physical functioning scale (nearly one standard deviation) reflects the impact of a complicated chronic medical condition on everyday physical functioning. A difference of 27 points on the mental health scale (1.3 standard deviation units) reflects the impact of serious depressive symptoms. Pending further research, the mean differences reported here are offered as benchmarks for gauging very large effect sizes for the SF-36 scales. While these differences might appear to be so large as to render measurement meaningless, physicians greatly underestimate patient-reported disabilities in physical and social functioning,49,50 and mental health differences of this magnitude are routinely underdetected in primary care.21,43,45

Results from tests of validity based on comparisons between groups known to differ clinically have great potential in documenting the sizes of small and large differences in general health scales as well as in advancing understanding of the meaning of those differences. Such tests should be extended to include more subtle disease-specific criteria to define the sizes of very small score differences and tests of the convergent and discriminant validity of scales in detecting those differences. Tests based on small and large clinical changes over time will also advance understanding of how to use and interpret general health scales. The results reported here clearly indicate that the issue is not as simple as whether or not a health status scale is valid. At least for the SF-36 scales, validity for purposes of measuring one dimension of health tends to go hand in hand with poor validity for another. Thus, in selecting measures of health status, priority should be given to those proven to be most relevant to the desired use and interpretation.

Acknowledgments

The authors gratefully acknowledge the following: Audrey Burnam, PhD, Sheldon Greenfield, MD, Richard Kravitz, MD, Alvin R. Tarlov, MD, Kenneth Wells, MD, and Mark B. Wenneker, MD for assistance in defining the medical and psychiatric groups compared; Cameron Cushing, MS, Stephanie Kieszak, MA, and J. F. Rachel Lu, MS, for analytic assistance; helpful critiques provided by two anonymous reviewers, James D. Lankin for editorial assistance; and Rebecca Voris, Jennifer Lin, and Kathleen Clark for administrative support.

References

1. Tarlov AR, Ware JE, Greenfield S, et al. The Medical Outcomes Study: an application of methods for monitoring the results of medical care. JAMA 1989;262:925.
2. Stewart AL, Hays RD, Ware JE. The MOS Short-Form General Health Survey: reliability and validity in a patient population. Med Care 1988;26:724.
3. Ware JE, Sherbourne CD, Davies AR. Developing and testing the MOS 20-item Short-Form Health Survey: a general population application. In: Stewart AL, Ware JE, eds. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Chapel Hill, NC: Duke University Press, 1992.
4. Stewart AL, Greenfield S, Hays RD, et al. Functional status and well-being of patients with chronic conditions: results from the Medical Outcomes Study. JAMA 1989;262:907.
5. Wells KB, Stewart A, Hays RD, et al. The functioning and well-being of depressed patients: results from the Medical Outcomes Study. JAMA 1989;262:914.
6. Bindman AB, Keane D, Lurie N. Measuring health changes among severely ill patients: the floor phenomenon. Med Care 1990;28:1142.
7. Katon WJ, Buchwald DS, Simon GE, et al. Psychiatric illness in patients with chronic fatigue and those with rheumatoid arthritis. J Gen Intern Med 1991;6:277.
8. Wu AW, Rubin HR, Mathews WC, et al. A health status questionnaire using 30 items from the Medical Outcomes Study. Med Care 1991;29:786.
9. Wachtel T, Piette J, Mor V, et al. Quality of life in persons with AIDS as measured by the Medical Outcomes Study instruments. Ann Intern Med 1992;116:129.
10. Kravitz RL, Greenfield S, Rogers W, et al. Differences in the mix of patients among medical specialties and systems of care: results from the Medical Outcomes Study. JAMA 1992;267:1617.
11. Parkerson GR, Broadhead WE, Tse CJ. Comparison of the Duke Health Profile and the MOS Short-Form in healthy young adults. Med Care 1991;29:679.
12. Anderson JS, Sullivan V, Usherwood TP. The Medical Outcomes Study Instrument (MOSI): use of a new health status measure in Britain. J Fam Practice 1990;7:205.
13. Ware JE, Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36): I. Conceptual framework and item selection. Med Care 1992;30:473.
14. Garrett HE. Statistics in Psychology and Education. New York: Longmans, Green and Co., 1926.
15. Messick S. The once and future issues of validity: assessing the meaning and consequences of measurement. In: Wainer H, Braun H. Test Validity. Hillsdale, NJ: Lawrence Erlbaum Assoc., Publishers, 1988.
16. Guion RM. On trinitarian doctrines of validity. Professional Psychology 1980;11:385.
17. Nunnally JC. Psychometric Theory. New York: McGraw-Hill Publishing Company, 1978.
18. Ware JE, Davies-Avery A, Brook RH. Conceptualization and Measurement of Health for Adults in the Health Insurance Study: Vol. VI, Analysis of Relationships Among Health Status Measures. Santa Monica, CA: The RAND Corporation, 1980 (publication number R-1987/6-HEW).
19. Hays RD, Stewart AL. The structure of self-reported health in chronic disease patients. Psychological Assessment 1990;2:22.
20. Rogers W, McGlynn E, Berry S, et al. Methods of sampling. In: Stewart AL, Ware JE, eds. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Chapel Hill, NC: Duke University Press, 1992.
21. Wells KB, Hays RD, Burnam MA, et al. Detection of depressive disorder for patients receiving prepaid or fee-for-service care: results from the Medical Outcomes Study. JAMA 1989;262:3298.
22. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument for detecting depressive disorders. Med Care 1988;26:775.
23. McHorney CA, Ware JE, Rogers W, et al. The validity and relative precision of MOS short- and long-form health status scales and Dartmouth COOP charts: results from the Medical Outcomes Study. Med Care 1992;30:MS253.
24. Bergner M, Bobbitt RA, Carter WB, et al. The Sickness Impact Profile: development and final revision of a health status measure. Med Care 1981;19:787.
25. Greenwald HP. The specificity of quality-of-life measures among the seriously ill. Med Care 1987;25:642.
26. Hall JA, Epstein AM, McNeil BJ. Multidimensionality of health status in an elderly population: construct validity of a measurement battery. Med Care 1989;27:S168.
27. Brooks WB, Jordan JS, Divine GW, et al. The impact of psychologic factors on measurement of functional status. Med Care 1990;28:793.
28. Comrey AL. A first course in factor analysis. New York: Academic Press, 1973.


29. Wenneker MB, Greenfield S, McHorney CA, et al. The validity of a severity scale for hypertension in predicting functional status and well-being: results from the Medical Outcomes Study. Clin Res 1990;38:228A.
30. Wenneker MB, McHorney CA, Kieszak SM, et al. The impact of diabetes severity on quality of life: results from the Medical Outcomes Study. Clin Res 1991;39:612A.
31. Dixon WJ, Massey FJ. Introduction to statistical analysis. New York: McGraw-Hill Book Company, Inc., 1951.
32. Snedecor GW, Cochran WG. Statistical methods, 8th ed. Ames, Iowa: Iowa State University Press, 1967.
33. Liang MH, Larson MG, Cullen KE, et al. Comparative measurement efficiency and sensitivity of five health status instruments for arthritis research. Arthritis Rheum 1985;28:542.
34. Broadhead WE, Blazer DG, George LK, et al. Depression, disability days, and days lost from work in a prospective epidemiologic survey. JAMA 1990;264:2524.
35. World Health Organization. Constitution of the World Health Organization. In: Basic Documents. Geneva: World Health Organization, 1948.
36. Bergner M. Measurement of health status. Med Care 1985;23:696.
37. Spitzer WO. State of Science 1986: quality of life and functional status as target variables for research. J Chron Dis 1987;40:465.
38. Ware JE. Standards for validating health measures: definition and content. J Chron Dis 1987;40:473.
39. Patrick DL, Erickson P. Assessing health-related quality of life for clinical decision making. In: Walker SR, Rosser RM, eds. Quality of Life: Assessment and Application. Lancaster: MTP Press Limited, 1988.
40. Shapiro MF, Ware JE, Sherbourne CD. Effects of cost sharing on seeking care for serious and minor symptoms: results of a randomized controlled trial. Ann Intern Med 1986;104:246.
41. Lohr KN, Kamberg CJ, Keeler EB, et al. Chronic disease in a general adult population: findings from the RAND Health Insurance Experiments. West J Med 1986;145:537.
42. Wells KB, Rogers W, Burnam A, et al. How the medical comorbidity of depressed patients differs across health care settings: results from the Medical Outcomes Study. Am J Psychiatry 1991;148:1688.
43. Nielsen AC, Williams TA. Depression in ambulatory medical patients: prevalence by self-report questionnaire and recognition by nonpsychiatric physicians. Arch Gen Psychiatry 1980;37:999.
44. Rodin G, Voshart K. Depression in the medically ill: an overview. Am J Psychiatry 1986;143:696.
45. Prestidge BR, Lake CR. Prevalence and recognition of depression among primary care outpatients. J Family Practice 1987;25:67.
46. Nelson EC, Berwick DM. The measurement of health status in clinical practice. Med Care 1989;27:S77.
47. Deyo RA, Patrick DL. Barriers to the use of health status measures in clinical investigation, patient care, and policy research. Med Care 1989;27:S254.
48. International Resource Center for Health Care Assessment. How to score the MOS 36-item short-form health survey (SF-36). Boston: The Health Institute, 1992.
49. Nelson E, Conger B, Douglass R, et al. Functional health status levels of primary care patients. JAMA 1983;249:3331.
50. Calkins DR, Rubenstein LV, Cleary PD, et al. Failure of physicians to recognize functional disability in ambulatory patients. Ann Intern Med 1991;114:451.
