Global Assessment of Functioning:
A Modified Scale
The modified Global Assessment of Functioning (GAF) scale has more detailed criteria and a more structured scoring system than the original GAF. The two scales were compared for reliability and validity. Raters who had different training levels assigned hospital admission and discharge GAF scores from patient charts. Intraclass correlation coefficients for admission GAF scores were higher for raters who used the modified GAF (0.81), compared with raters who used the original GAF (0.62). Validity studies showed a high correlation (0.80) between the two sets of scores. The modified GAF also correlated well with Zung Depression scores (-0.73). The modified GAF may be particularly useful when interrater reliability needs to be maximized and/or when persons with varying skills and employment backgrounds, and without much GAF training, must rate patients. Because of the increased structure, the modified GAF may also be more resistant to rater bias. (Psychosomatics 1995; 36:267-275)

illness severity. Endicott et al.4 tested the reliability of the GAS in 5 studies and reported intraclass correlation coefficients ranging from 0.61 to 0.91, with associated standard error of measurement scores ranging from 5.0 to 8.0 units. Most of Endicott's ratings were done by only a few, well-trained interviewers. Having consistently trained interviewers should produce a greater likelihood of higher interrater reliability scores and smaller standard errors. Yet even with this bias, two of the studies had intraclass correlation coefficients in the 0.60s, suggesting that the scale might be less reliable than had been hoped. In contrast, one of the reliability studies used 15 raters of different backgrounds and training levels. Although the intraclass correlation coefficient was high, this was due primarily to a greater heterogeneity of illness severity as compared to the other studies, not to interrater consistency of scoring. The lack of interrater consistency was demonstrated by a high standard error of measurement not seen in the other studies.

Although we could not find published reliability studies on the GAF in the literature, our subjective experience at Florida Hospital was that the GAF was used by staff members of different backgrounds (physicians with varying degrees of familiarity with the scale, nurses, Ph.D. researchers), and GAF ratings from these staff differed substantially for the same patient. Thus, we hypothesized that the original GAF might be less reliable than we had expected. To test this hypothesis and to improve interrater reliability, we developed a modified GAF scale, and we formally tested interrater reliability in the original and modified versions of the GAF. We conducted our study in 1992-1993.

METHODS

A modified GAF scale was developed by increasing the structure of the original GAF instrument with a greater number of criteria and with additional directions for assigning scores. We chose to modify the GAF rather than the GAS because the GAF, as listed in the DSM-III-R, reflects more current ideas on illness severity rating and is the more frequently used instrument. The criteria and scoring changes that we made in the GAF were tested among a small group of staff members who rated patients from successive drafts of the modified scale. When staff members had different ratings of a given patient, their reasons were discussed, and changes were made in the wording or use of the criteria or scoring directions.

Reliability studies for both the original and modified GAF scales were based on ratings of 16 patient intake histories and discharge summaries taken from the patients' hospital charts. All of these patients had diagnoses of major depression with or without comorbid eating disorders. They had all been inpatients on the Affective/Eating Disorders Unit, and their intake histories were obtained by one of the same two doctors. These particular 16 patients were chosen for review because they had the most detailed intake histories and discharge summaries available. Thus, a maximum amount of patient information was available for evaluation with the GAF.

Two groups of staff from the psychiatric units at Florida Hospital rated each of the same 16 patient histories and discharge summaries. All patients were given a GAF score for the severity of illness at admission and a second GAF score for illness severity at discharge. One group of staff rated the patients using the original GAF, and the other group of staff rated the patients using the modified GAF. None of the staff received any training in the use of either GAF, but they were allowed to read it and to ask questions for clarification. This procedure was followed to evaluate the consistency of ratings by untrained staff; therefore, we could evaluate the soundness and reliability of each GAF under these conditions.

The staff in the group using the original GAF consisted of 12 professionals (nurses, physicians, social workers, psychiatry technicians, and clinical Ph.D.'s) assigned to 2 inpatient treatment units (affective/eating disorders and psychiatric/medical). The staff in the group using the modified GAF consisted of another group of 12 professionals from other inpatient
268 PSYCHOSOMATICS
Hall
units (acute general psychiatry, adolescent, or intensive treatment). Within each of the rating groups (original or modified GAF), the means and standard errors were calculated for the ratings of each patient on admission and discharge. Intraclass correlation coefficients (ICC) were then calculated separately for the original GAF group on admission and discharge and for the modified GAF group on admission and discharge. Both the admission and discharge correlation coefficients were compared between the groups.

The concurrent validity of the modified GAF was tested by comparing admission scores of this instrument with admission scores on the original GAF, the Zung depression test, and a self-rating of global illness severity. Pearson Product Moment correlations were used for these three assessments of validity. For the modified and original GAF comparison, admission scores were obtained from the same 16 patient histories and discharge summaries as in the reliability tests. For the modified GAF and Zung comparison and the modified GAF and self-rating of illness comparison, data were obtained from outpatient telephone interviews with 142 patients who had been discharged from Florida Hospital 6 months to 1.5 years before. These patients all had diagnoses of major depression with or without comorbid diagnoses of eating disorder. Each patient had been evaluated using the modified GAF only, the Zung depression test, and a self-illness severity rating. The self-rated global illness scores were on a scale of 1-10, where 1 was sickest and 10 was most healthy.

RESULTS

The Modified GAF

The modified GAF retained the same 1-90 scale with the same 10-point intervals as the original GAF. All criteria in the original GAF were retained and were listed on separate lines to facilitate quick reading (Table 1). Additional criteria were added to most of the 10-point intervals, and directions for scoring the patient's illness severity were added at the end of each 10-point interval. The purpose for these additions was to decrease the variability in scoring. Usually, the scoring within a 10-point interval applied only to the criteria within that interval. For example, in the 81-90 interval, a patient having no symptoms or problems received a score of 88-90; a patient having minimal symptoms or problems received a score of 84-87; and a patient having minimal symptoms and problems received a score of 81-83 (Table 1). However, in the 21-30, 31-40, and 41-50 scoring intervals, the same 10 criteria were listed in each interval, and the score depended on the number of criteria that a patient met within these 3 scoring intervals.

For example, if a patient met 1 of these criteria, the score was 48-50; if a patient met 2 of the criteria, the score was 44-47; and if the patient met 3 of the criteria, the score was 41-43. However, if the patient met 4-6 of the criteria, the scores ranged from 31-40. If the patient met 7-10 of the criteria, the scores ranged from 21-30 (Table 1). Finally, in the 21-30 scoring interval, a unique set of criteria and scores also existed in addition to the criteria and scoring already discussed. These unique criteria were listed in the original GAF and were deemed to be of sufficient seriousness that they should not be added to the list of criteria in the 31-40 and 41-50 intervals but rather would warrant the lowest score available in the 21-30 category. Thus, suicidal preoccupation and preparation, behavior considerably influenced by delusions or hallucinations, or serious impairment in communication (i.e., sometimes incoherent or profound stuporous depression), always elicited a score of 21.

The various changes we made in modifying the GAF made it longer than the original GAF (4 pages vs. 1). Thus, it is suggested that when using this new GAF, the interviewer should question the patient about each of the criteria, then write down answers, and later count the number of criteria that the patient meets. It is felt that the slower speed in assigning a score from the modified GAF is compensated for by the increased consistency of ratings attributable
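
The criteria-count scoring described for the 21-30, 31-40, and 41-50 intervals can be sketched in code. This is a minimal illustration under stated assumptions, not part of the published scale: the function name `modified_gaf_range` and its parameters are hypothetical, the function returns only the score range (the text does not say how a rater picks a single value inside a range), and a patient who meets none of the 10 shared criteria is treated as falling outside these three intervals.

```python
from typing import Optional, Tuple


def modified_gaf_range(criteria_met: int,
                       serious_criterion: bool = False) -> Optional[Tuple[int, int]]:
    """Return the (low, high) score range for the 21-50 band of the modified GAF.

    criteria_met: how many of the 10 shared criteria the patient meets.
    serious_criterion: True if one of the unique serious criteria applies
        (for example, suicidal preoccupation and preparation).
    Hypothetical helper; names and return type are assumptions for illustration.
    """
    if not 0 <= criteria_met <= 10:
        raise ValueError("criteria_met must be between 0 and 10")
    if serious_criterion:
        # The unique serious criteria always elicit the lowest score in 21-30.
        return (21, 21)
    if criteria_met == 0:
        # Assumption: no shared criteria met means the patient is scored
        # in a higher interval, which this sketch does not cover.
        return None
    if criteria_met == 1:
        return (48, 50)
    if criteria_met == 2:
        return (44, 47)
    if criteria_met == 3:
        return (41, 43)
    if criteria_met <= 6:
        return (31, 40)
    return (21, 30)
```

Returning a range rather than a single number keeps the sketch faithful to the description, which fixes the band but leaves the final value to the rater.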
Interestingly, all of the means for the patient's admission GAF scores were also higher in the original GAF group than in the modified group. Thus, the modified GAF caused patients to be rated more sick than the original GAF.

Concurrent Validity

Because all of the mean admission GAF scores for the original group were higher than the scores in the modified group, we wanted to test the correlation between the scores of the two GAFs and test the correlation of the modified GAF with other psychological assessment tests. The Pearson Product Moment correlation coefficient between the 16 original and 16 modified mean admission scores was 0.80, P < 0.001, df = 14, showing good correlation (Table 2).

Because all of the patients used in these studies were depressed, we also compared modified GAF scores with the scores from the Zung depression test. The Pearson Product Moment correlation coefficient was -0.73, P < 0.001 (negative because a higher number represents sickness in the Zung scores and a lower number represents sickness in the GAF) (Table 2).

Finally, we also correlated modified GAF scores with the scores that patients gave themselves to indicate their severity of illness. The Pearson Product Moment correlation coefficient was 0.58, P < 0.01 (Table 2).

DISCUSSION

Our finding of an intraclass correlation coefficient of 0.62 for admission scores on the original GAF agreed with Endicott et al.'s report4 of ICCs ranging from 0.61 to 0.91. Our ICC of 0.62 was significant at P < 0.001, thus indicating that while the reliability was somewhat low for admission ratings, it still was perfectly usable. Likewise, the ICC for discharge ratings from the original GAF was 0.90, which indicates excellent reliability. The value of the modified GAF (with its admission ICC of 0.81 and discharge ICC of 0.95) is for instances when interrater reliability needs to be as high as it can be or when multiple persons of varying employment backgrounds and without much GAF training will rate patients. Research is a prime example for both uses of the modified GAF. Usually during research studies, there would also be enough time to read this longer GAF and assign ratings.

Another use for the modified GAF, compared with the original GAF or GAS, is in evaluating the need for hospital admission. Specifically, Thompson et al.,2 in a review of 9,055 adult intakes, found marked variations in the way managed care case managers, compared with providers, assigned GAS scores generated from the same data. Thompson and colleagues felt that higher (less sick) scores reflected a need by managed care companies to limit the use of all inpatient services rather than their desire to selectively eliminate unnecessary hospitalizations. The ability of the managed care industry to affect the GAS scores in this way is attributed to the relatively less-structured nature of the GAS instrument, leading to lower interrater reliability. As we have shown, the modified GAF is both more structured than the original GAF or GAS and has better interrater reliability on admission scores. Thus, the modified GAF is less likely to reflect a bias by a managed care or governmental agency.

In addition to reliability tests, modified GAF ratings were also correlated with Zung depression tests and self-ratings of illness severity in outpatients. Similar to reliability tests, these correlations were in the same range as the correlations that Endicott et al.4 found between the original GAS and the Mental Status Examination Record (MSER) or the Family Evaluation Form (FEF) in outpatients. The slightly higher correlation between the modified GAF and the Zung depression test (-0.73), compared with the original GAS and MSER (0.62), probably was because all of our patients were depressed and the Zung specifically assessed depression. In contrast, the MSER is a global rating scale like the GAS, and there was probably greater heterogeneity among these patients. However, both of these sets of correlations were acceptable, thereby indicating that interviewer-rated scales provide similar types of information and that the original GAS and modified GAF each show acceptable validity. Interestingly, both the self-rated illness severity test that we correlated with the modified GAF and the FEF, correlated by Endicott with the original GAS, gave scores based on someone other than the interviewer's judgment, specifically the patient or the patient's family. Both of these sets of correlations were fairly low, 0.58 for the self-rated scale and modified GAF and -0.52 or -0.45 for the FEF and original GAS. While the Zung is also a self-rated instrument, its questions are more objective than the self-rated global illness scale or FEF, which may have accounted for the Zung's higher correlation with the GAF. Still, McGlashan5 and Pfeiffer6 reported that patient self-assessments and physician or interviewer assessments of patients may differ significantly. One might also expect the same discrepancy between family and interviewer assessments of patients. Thus, the interviewer vs. self- or family-rating procedures for measuring severity of illness often cannot be considered as providing similar or redundant information.

The modified GAF is an instrument having a higher reliability and similar validity to the original GAF or GAS. The modified GAF may be particularly useful when interrater reliability needs to be maximized (i.e., in research or as a tool to determine need for hospitalization) and/or when multiple persons of varying skills and employment backgrounds and without having had much GAF training (i.e., in managed care organizations) must rate patients. In addition, when used to evaluate the need for hospital admission, the modified GAF is less likely than the original GAF or GAS to reflect a provider or managed care bias. Thus, our modified GAF may be an improved patient assessment tool, one that can more accurately reflect a patient's true need for hospitalization.
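
The intraclass correlation coefficients discussed above can be illustrated with a short calculation. The text does not state which ICC model was used, so this sketch assumes a one-way random-effects ICC for single ratings, computed from ANOVA mean squares as (MSB - MSW) / (MSB + (k - 1) MSW), where k is the number of raters per patient. The function name `icc_oneway` is hypothetical, and the ratings in the usage example are made up for illustration; they are not study data.

```python
from statistics import mean
from typing import List, Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC for single ratings (assumed model).

    ratings: one row per patient, each row holding the k raters' scores.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 patients")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each patient needs the same number (>= 2) of ratings")

    all_scores: List[float] = [x for row in ratings for x in row]
    grand_mean = mean(all_scores)
    row_means = [mean(row) for row in ratings]

    # Between-patient and within-patient sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2
                    for row, m in zip(ratings, row_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Made-up GAF-style scores: 3 patients, 2 raters each.
    example = [[30, 40], [50, 60], [70, 80]]
    print(round(icc_oneway(example), 3))
```

Under this model, the coefficient rises when patients differ more from one another as well as when raters agree more, which is the heterogeneity effect noted for one of the Endicott reliability studies.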
References
1. Westermeyer J: Problems with managed psychiatric care without a psychiatrist-manager. Hosp Community Psychiatry 1991; 42:1221-1224
2. Thompson JW, Burns BJ, Goldman HH, et al: Initial level of care and clinical status in a managed mental health program. Hosp Community Psychiatry 1992; 43:599-603
3. McGlashan T (ed): The Documentation of Clinical Psychotropic Drug Trials. Rockville, MD, National Institute of Mental Health, 1973
4. Endicott J, Spitzer RL, Fleiss JL, et al: The global assessment scale. A procedure for measuring overall severity of psychiatric disturbance. Arch Gen Psychiatry 1976; 33:766-771
5. McGlashan TH: The Chestnut Lodge follow-up study II. Long-term outcome of schizophrenia and the affective disorders. Arch Gen Psychiatry 1984; 41:586-601
6. Pfeiffer SI: An analysis of methodology in follow-up studies of adult inpatient psychiatric treatment. Hosp Community Psychiatry 1990; 41:1315-1321