Assessment of Language Disorders in Children
Why I Wrote This Book
“You can’t kill anyone with speech-language pathology.”
I came to speech-language pathology by what was then an unconventional route—a Ph.D. in a nonclinical specialty
within behavioral sciences, followed by postdoctoral study, clinical practicum, and a clinical fellowship year. Thus, I
was unschooled in the humorous wisdom that is passed along with more standard fare to speech-language pathology
doctoral students through the years. I was able to glean only one or two such aphorisms from my contacts with a more
conventionally trained and clinically savvy colleague.
“You can’t kill anyone with speech-language pathology,” she said. A balm to the anxieties of a beginning clinician who
knows that there is so much she does not know. A bit of humor to help you while you learn. However, the more clients I
worked with, the more I was haunted by this aphorism. Certainly, killing was exceedingly rare to nonexistent, but
looming large were the specters of unfulfilled hopes and wasted time. The possibility for improving children’s lives
became ever clearer, but so did the possibility of less desirable outcomes.
Initially my clients were preschoolers whose parents were baffled by their children’s failure to express themselves
clearly, or they were school-aged children who were diagnosed with both language-learning disabilities and serious
emotional problems. More recently, my clients have included unintelligible children whose problems were largely
limited to their phonology as well as children whose problems encompassed not only that one aspect of language, but
almost all other areas one might examine. All of these clients—like those with whom you currently work or will soon
work—present us with puzzles to be solved and responsibilities to be met if we are to help them.
The puzzle presented by children with language disorders is the array of abilities and difficulties that they bring to
language learning and use. I use the word “puzzle” because, like puzzles, their problems at first suggest many alternative
modes of solution—some better, some worse, and some probably of no value at all. Thus, “responsibilities” follow from
our professional obligation to help children maximize their skills and minimize their problems in the process of
deciphering the particular pattern of intricacies they present.
In short, the reason I wrote this book was to help identify better ways of dealing with the puzzles and responsibilities that
are so frustratingly linked in our interactions with our clients. By finding the best ways of dealing with these puzzles and
responsibilities we can avoid the harm implied by the aphorism quoted earlier and can instead enrich their lives by
helping them improve their communication with others.
How This Book Is Organized
Overall Organization of the Book
This book is divided into three major sections. In Part I, concepts in measurement are explained as they apply to
children’s communication. Although some of these concepts are quantitative in nature, others relate to the social context
in which measurements are made and used. Special emphasis is placed on the concepts of validity and reliability because
all other measurement characteristics are ultimately of interest by virtue of their effects on reliability and, more
importantly, on validity. This part of the book concludes with a chapter providing direct advice regarding the
examination of materials associated with measurement tools for purposes of determining their usefulness for a particular
child or group of children.
In Part II, four major categories of childhood language disorders are discussed: specific language impairment (chap. 5),
language problems associated with mental retardation (chap. 6), autism spectrum disorders (chap. 7), and language
problems associated with hearing impairment (chap. 8). These four categories were selected because they are the most
frequently occurring childhood language disorders. Although children across these disorder categories share many
problems, each group also presents unique challenges to assessment and management. Some of these challenges relate to
the heterogeneity of language and other abilities shown by children in the category, the relative amount of information
available due to the rarity of the problem, and the often diverse theoretical orientations of researchers. Each of these
chapters provides a bare-bones introduction to the disorder category: its suspected causes,
special challenges to language assessment, expected patterns of language performance, and accompanying problems that
are unrelated to language. A full description of any one of these disorders would require several books as long as this
one. Consequently, readers are directed to more comprehensive sources for further learning but are given sufficient
information to anticipate how language assessment will need to be focused in order to begin to respond to the special
needs of each group of children.
In Part III, three major types of questions that serve as the starting points for assessment are introduced and then pursued
in detail—from theoretical underpinnings to currently available measures. The major questions correspond to steps in the
clinical interaction. First, the clinician must determine whether a language problem exists; second, he or she must
determine the nature of the problem—both in terms of specific patterns of impairment across language domains and
modalities and in terms of specific problem areas within each domain and modality. Finally, he or she must track change,
determining how the client’s behaviors are changing and whether treatment seems to be the cause of identified
improvements. In the course of addressing each of these questions, the reader is taken through the steps required to move
from the question to the tools available to answer it for any given client.
Organization within Chapters
Each chapter contains several features designed to assist readers in mastering new content and in searching the text for
specific information. Chapter outlines and enumerated summaries of major points aid readers interested in obtaining an
overview of chapter content. To help readers with new or unfamiliar vocabulary, key terms are highlighted in the text,
defined when of particular importance, and listed at the end of each chapter. Finally, a list of study questions and
recommended readings is designed to allow readers to pursue topics further.
Acknowledgments
Whereas the flaws of this book are certainly of my own doing, its virtues owe much to the help I have received from
colleagues and friends. Numerous colleagues in Vermont and elsewhere read sections of the book and contributed
greatly to my understanding of the diverse group of children described in it and deserve my considerable thanks. Among
them are Melissa Bruce, Kristeen Elaison, Laura Engelhart, Julie Hanson, and Julie Roberts. In addition, I owe special
appreciation to Barry Guitar, whose experience with his own books helped him provide the most meaningful
encouragement and advice on all aspects of the project. I am particularly grateful for his ability to temper constructive
criticism with ego-boosting praise. My long-time colleague and friend Martha Demetras took on a heroic and most
helpful reading of a near final form of the book. She along with Frances Billeaud, Bernard Grela and Elena Plante read
some of the most challenging sections and tried to help keep me on track. At Lawrence Erlbaum Associates, Susan
Milmoe, Kate Graetzer, Jenny Wiseman, and Eileen Engel have helped me countless times through their expertise and
patience. Irene Farrar took my graphics and made them both clearer and more interesting, and Kathryn Houghtaling made the cover all I could have hoped for. She did this with the help of the photographer
Holly Favro and her most graceful niece Sara Faust.
Although not involved with this project directly, there are several mentors who have shaped my interest in the topics
discussed here and contributed substantially to my ability to tackle those topics as well as I have. They have my respect
and gratitude always: Ralph Shelton, Linda Swisher, Betty Stark, Dick Curlee, and Dale Terbeek. Finally, I owe great
thanks to my parents, who each read and commented on some portion of the book and who provided encouragement
along the way, not to mention the foundation that led me to want to pursue this project.
CHAPTER 1
Introduction
Purposes of This Text
Why Do We Make Measurements in the Assessment and Management of Childhood Language Disorders?
Fig. 1.1. A decision matrix for the decision of whether to refer Mary Beth for neurologic evaluation.
been used to assess the implications of alternative choices in a variety of fields (Berk, 1984; Thorner & Remein, 1962;
Turner & Nielsen, 1984). To construct such a matrix as a means of considering repercussions for a single case, one
pretends that one has access to the ultimate “truth” about what is best for Mary Beth. From that perspective, a referral
either should or should not be made—no doubts.
With such perfect knowledge, therefore, suppose that a referral should be made. In that case, the clinician will have made
a correct judgment if he or she has referred and an incorrect one if he or she has not. If the clinician errs by not referring,
Mary Beth may become involved in the expense and frustration of continuing speech-language treatment that is doomed
to failure. Further, she may be delayed in or prevented from receiving attention for an incipient neurologic condition,
which, in turn, could have serious, even life-threatening consequences. Although this error might be corrected over time,
its effects are likely to be relatively long lasting and potentially costly in terms of time and money.
On the other hand, suppose that the “truth” is that a referral is not needed and therefore should not be made. In that case
the clinician will have made a correct judgment if she has not referred and an error if she has. Plausibly, this type of error
may result in a needless expenditure of time and money and in undue concern on the part of Mary Beth’s family. A bit
more positively, however, the effects of this error would probably be relatively short-lived: Once the neurologic
evaluation took place, the concern would probably end.
A decision matrix makes it clear that different errors in clinical decision making are associated with different effects.
Errors vary in terms of the likelihood that they
will be detected, the time course for that detection, and the nature of costs they will exact from the client and clinician.
The decision matrix, therefore, is a particularly powerful tool because it allows one to examine both the frequency and
type of errors made. I return to this type of matrix frequently because of its helpfulness in thinking about tools used to
reach clinical decisions.
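The logic of such a matrix is simple enough to sketch in a few lines of code. The sketch below is my own illustration of the general technique; the cell labels and the cost comments are assumptions for the referral example, not taken from Fig. 1.1 itself:

```python
# A minimal sketch of a 2x2 decision matrix for a binary clinical
# decision ("refer" vs. "do not refer"). Labels are illustrative.

def classify(truth_needs_referral: bool, clinician_refers: bool) -> str:
    """Return the decision-matrix cell for one truth/decision pair."""
    if truth_needs_referral and clinician_refers:
        return "correct referral"
    if truth_needs_referral and not clinician_refers:
        return "miss"           # potentially long-lasting, costly error
    if not truth_needs_referral and clinician_refers:
        return "false alarm"    # needless expense, usually short-lived
    return "correct non-referral"

def tally(pairs):
    """Count outcomes over many (truth, decision) pairs, so that both
    the frequency and the type of errors can be examined."""
    counts = {}
    for truth, decision in pairs:
        cell = classify(truth, decision)
        counts[cell] = counts.get(cell, 0) + 1
    return counts
```

Tallying outcomes over many cases in this way is what lets the matrix expose not just how often a decision rule errs, but which kind of error it favors.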
In the next section of this chapter, I introduce methods used to understand (and therefore potentially to improve) clinical
decision making. Their description is followed by the introduction of a model that is intended to serve as a framework in
which to think about the steps involved in formulating and answering clinical questions.
A Model of Clinical Decision Making
The processes by which individuals make decisions about complex problems—such as those involved in a variety of
clinical settings—have been the focus of several lines of research (Shanteau & Stewart, 1992; Tracey & Rounds, 1999).
Each differs from the others somewhat in intent, but all have something to offer anyone interested in clinical decision
making.
First, decision making has been of interest to psychologists who want to understand how complicated problems are
solved and to what extent those who are acknowledged “expert” problem solvers in a given area (e.g., chess, medicine,
accounting) differ from naive problem solvers (Barsalou, 1992). Second, skilled decision making has been studied by
researchers from a variety of disciplines who wish to develop computer programs called expert systems, which seek to
mimic expert performance (Shanteau & Stewart, 1992). Such researchers have focused on the creation of computer
programs yielding optimal clinical judgments. Because they focus on successful decision making, these researchers have
been uninterested in understanding expert errors in decision-making. Finally, there has been a much smaller group of
researchers who study the nature and process of decision making in specific fields for the benefit of the field itself. In
speech-language pathology and audiology, such research has increased dramatically over the last decade (e.g., McCauley & Baker, 1994; Records & Tomblin, 1994; Records & Weiss, 1990). Researchers in this third category tend to be
interested in both errors and successful performance, often as a means of improving professional training.
You may be asking, “How does research on decision making relate to measurement in speech-language pathology?” and more specifically, “How can it help me be a better professional?” To begin with, a detailed understanding of expert
clinical decision making may help beginning clinicians reach the ranks of “expert” more quickly. For example, such an
understanding may identify which sources of information and which methods experts use—as well as which ones they
avoid. Another potential benefit of research in clinical decision making is that it may identify problems that beset even
experienced clinicians, thereby helping decision makers at all levels be vigilant in avoiding them (e.g., Faust, 1986;
Tracey & Rounds, 1999). A relatively brief description of two such problems may help illustrate the potential value of
this type of research.
In a review of research on human judgment in clinical psychology and related fields, Faust (1986) described clinicians’
over-reliance on confirmatory strategies. Essentially, the use of a confirmatory strategy means that after forming a
hypothesis early in the course of decision making (e.g., regarding a diagnosis, etiology, or some other clinical question), the
clinician proceeds to search out and emphasize information tending to confirm the hypothesis. At the same time, she or
he may fail to search out discrepant evidence. The tendency for very able clinicians to adopt such a strategy has been
demonstrated repeatedly in studies in which clinicians are asked to make decisions on hypothetical clinical data
(Chapman & Chapman, 1967, 1969; Dawes, Faust, & Meehl, 1993).
For an example of how a confirmatory strategy might operate in a case of decision making in speech-language
pathology, I return to the case of Alejandro. Suppose that Alejandro’s clinician initially develops the hypothesis that
Alejandro responds most consistently when communicating in English. The clinician would be using a confirmatory
strategy if she or he failed to evaluate Alejandro’s performance for Spanish and informally sought teachers’ impressions
of how well Alejandro was responding to the English-only approach she had recommended, but did so in such a way as
to invite only positive reactions.
A second example of a problem in clinical decision making has been described as the failure to “realize the extent to
which sampling error increases as sample size decreases.” (Faust, 1986, p. 421). Tversky and Kahneman (1993)
described this practice as evidence of “the belief in the law of small numbers,” by which they mean the tendency to
assume that even a very small sample is likely to be representative of the larger population from which it is drawn.
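The claim that sampling error grows as sample size shrinks can be made concrete with a quick simulation. The sketch below is mine, not drawn from the text; the population parameters (mean 100, standard deviation 15, as on many standardized tests) are illustrative assumptions:

```python
import random

def spread_of_sample_means(sample_size, n_samples=2000, seed=1):
    """Estimate how widely sample means scatter around the population mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Draw one sample from a population with mean 100, SD 15.
        sample = [rng.gauss(100, 15) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    grand_mean = sum(means) / n_samples
    variance = sum((m - grand_mean) ** 2 for m in means) / n_samples
    return variance ** 0.5  # SD of the sample means (the standard error)

# Means computed from 3-child "caseloads" scatter several times more
# widely than means from 100-child caseloads, so a tiny sample is a
# shaky basis for beliefs about the whole population.
```

Running the comparison shows why a clinician who has seen only a handful of children with a given diagnosis should expect her experience to be far less representative than a large-sample picture would be.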
Returning to one of the hypothetical cases presented earlier, imagine this sort of problem as menacing the clinician who
is to evaluate Mary Beth, the youngster with Down syndrome. Suppose that that clinician were to have seen only two or
three children with Down syndrome during her clinical career—each of whom had made exceptionally poor progress.
The danger would be that the clinician would consider those few children she had seen as representative of all children
with that diagnosis, thereby causing her to downplay the stated concerns about Mary Beth’s lack of progress.
Neither of these problems in clinical decision making has been seen as evidence of gross incompetence. Although poor
clinicians may succumb more frequently to these practices, the practices themselves should be of considerable concern to
scientifically oriented clinicians precisely because they seem to be related to tendencies in human problem solving, and
they must actively be worked against for the good of clients and of the profession.
Once aware that bad habits such as those described above may creep into clinical decision making, the wary clinician
can seek remedies. Among the remedies recommended for the tendency to use a confirmatory strategy is the adoption of
a disconfirmatory strategy, in which evidence both for and against one’s pet hypothesis is sought after and valued.
Similarly, a belief in the law of small numbers can be undermined by reminders that when one has only limited
experience with individuals with a particular type of communication disorder, the characteristics of people from that
sample are quite likely to be unrepresentative of that population as a whole.
Although the process by which speech-language pathologists and audiologists reach clinical decisions is far from well
understood at this point (Kamhi, 1994; Yoder & Kent, 1988), the model shown in Fig. 1.2 is intended to serve as a
working model that can be
elaborated on as understanding increases.

Fig. 1.2. A model illustrating the ways in which measurements are used to reach clinical decisions leading to the initiation or modification of clinical actions.

Such a graphic model can help emphasize the varied nature of the processes involved in reaching complex clinical decisions, including both those that are very deliberate and readily available for inspection as well as those that are almost automatic and less available for observation.
The process of clinical decision making is initiated as the speech-language pathologist formulates one or more clinical
questions. Although such questions may often coincide with those actually expressed by the client, they may not always
do so. Thus, for example, the parents of 3-year-old Mary Beth may not have expressed interest in having her hearing
status evaluated. On the other hand, her speech-language pathologist would see that as a critically important question,
given both the susceptibility to middle ear infection with associated hearing loss among children with Down syndrome and the pivotal role of hearing in speech-language acquisition. This example points out that clinical questions
arise both from clients’ expressions of need and from the expert knowledge possessed by the clinician.
The formulation of clinical questions is of central importance to the quality of clinical decision making because it drives
all that follows. First, the clinical question determines what range of information should be sought. Second, it guides the
clinician in the selection or creation of appropriate measurement tools. In fact, it is widely held that any measurement
tool can only be evaluated in relation to its adequacy in addressing a specific clinical question (American Educational
Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in
Education [NCME], 1985; Messick, 1989). No measure is intrinsically “good” or “valid.” Rather, the quality of a
measure varies depending on the specific question it is used to address. Thus, for example, a given language test may be
an excellent tool for answering a question about the adequacy of 4-year-old Mary Beth’s expressive language skills, yet
it may be a perfectly awful tool if used to examine such skills for 9-year-old bilingual Alejandro.
Optimally, specific measurement tools will be selected so as to address the full scope of each clinical question being
posed using the best measures available (Vetter, 1988b). For some questions, however, the wealth of commercially
available standardized tests and published procedures will fail to yield any acceptable measure, or even any measure at
all. At such times, clinicians may decide to develop an informal measure of their own (Vetter, 1988a), or they may
simply have to admit that not all clinical questions for all clients are answerable (Pedhazur & Schmelkin, 1991).
The administration or collection of selected clinical measures is certainly the most obvious portion of the clinical
decision-making process. Its importance can be emphasized by reference to the data-processing adage “garbage in,
garbage out.” Put more decorously, the act of skillful administration is crucial to the quality of information obtained.
Haphazard compliance with standard administration guidelines may render the information obtained spurious and
misleading, thereby undermining all later efforts of the clinician to use it to arrive at a reasonable clinical decision.
Following data collection, the clinician examines information obtained across a variety of sources and integrates that
information to address specific clinical questions. For example, in order to comment on the reasonableness of progress
made by Mary Beth during the past 2 years, her speech-language pathologist will need to perform a Herculean task—
integrating across time and content area measures related to speech, language, hearing, and nonverbal cognition.
Components of the clinical decision-making process outlined in Fig. 1.2 have received differing amounts of attention
from speech-language pathology and audiology professionals. Thus, for example, considerable attention has been paid to
the formulation of relevant clinical questions for specific categories of communication disorders (e.g., Creaghead,
Newman & Secord, 1989; Guitar, 1998; Lahey, 1988). On the other hand, little has been written about how clinicians
can use such information to arrive at effective clinical decisions (Records & Tomblin, 1994; Turner & Nielsen, 1984).
Therefore, in the remainder of this text, both venerable concepts and emerging hypotheses will be shared to help readers
improve the quality of their clinical decision making and, consequently, of their clinical actions toward children with developmental language disorders.
Summary
1. Measurement of developmental language disorders draws on methods used in a wide variety of disciplines.
2. The purposes of this text are to help readers learn to frame effective clinical questions that will guide the decision-
making process, to recognize that all measurement opportunities present alternatives, and to recognize the connection
between the quality of clinical actions and the quality of measurement used in the clinical decision-making process.
3. Speech-language pathologists obtain and use information obtained through measurement to arrive at diagnoses that
affect medical, educational, social, and even legal outcomes. They derive this information cooperatively with others (e.g., families and other professionals) and share it with others as a means of achieving the child’s greatest good.
4. Measurement is important because it helps drive clinical decision making, which in turn affects clinical actions.
5. Measurement is used to address clinical questions related to screening, diagnosis, planning for treatment, determining
severity, evaluating treatment efficacy, and evaluating change in communication over time.
6. The cognitive processes involved in clinical decision making are not well understood but have begun to be studied in
research addressing complex problem solving, computer expert systems, and specific issues within a variety of fields (e.g., medicine, special education).
7. Examples of problematic tendencies that have been identified as possible barriers to effective clinical decision making
include the use of confirmatory strategies and the belief in the law of small numbers.
Key Concepts and Terms
belief in the law of small numbers: the tendency to overvalue information obtained from a relatively small sample of
individuals, for example, those few individuals with an uncommon disorder with whom one has had direct contact.
clinical decision making: the processes by which clinicians pose and answer clinical questions as a basis for clinical
actions such as diagnosing a communication disorder, developing a treatment plan, or referring a client for medical
evaluation.
confirmatory strategy: the tendency to seek and pay special attention to information that is consistent with a clinical
hypothesis while failing to seek, or undervaluing, information that is not consistent with the hypothesis.
decision matrix: a method used to consider the outcomes associated with correct and incorrect decisions.
differential diagnosis: the identification of a specific disorder when several diagnoses are possible because of shared
symptoms (self-reported problems) and signs (observed problems).
measurement: methods used to describe and understand characteristics of a person.
Study Questions and Questions to Expand Your Thinking
1. Taking each of the three cases described earlier in the chapter, use Table 1.1 to determine what types of clinical
decisions and related clinical actions are likely to be required for each.
2. For each of those cases used in Question 1, identify a binary clinical decision and consider the implications of the two
kinds of errors that can result.
3. On the basis of your current knowledge of children with language disorders, develop a hierarchy of outcomes that
might result from clinical errors in the following cases:
● screening of hearing in a 4-month-old infant;
● collection of treatment data in English for a child whose first language is Vietnamese;
● collection of trial treatment data for purposes of selecting treatment goals for a child exhibiting significant semantic
delays;
● evaluation of language skills in a child who exhibits severe delays in speech development.
4. Think about decisions—big and small—that you may have made during the last week. Try to remember the process by
which you reached your decision. Did any of your decision making involve the use of a confirmatory strategy? Describe
the specific example and how your thinking might have differed if you had avoided such a strategy.
Recommended Readings
Barsalou, L. W. (1992). Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence Erlbaum
Associates.
McCauley, R. J. (1988). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and Hearing
Association, Spring 1988, 6–9.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K.
Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council
on Measurement in Education (NCME) (1985). Standards for educational and psychological testing. Washington, DC:
APA.
Barsalou, L. W. (1992). Thinking. Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: C. C. Thomas.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of
Abnormal Psychology, 72, 193–204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs.
Journal of Abnormal Psychology, 74, 271–280.
Creaghead, N. A., Newman, P. W., & Secord, W. A. (1989). Assessment and remediation of articulatory and
phonological disorders. Columbus: Merrill.
Dawes, R. M., Faust, D., & Meehl, P. E. (1993). Statistical prediction versus clinical prediction: Improving what works.
In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp.
351–367). Hillsdale, NJ: Lawrence Erlbaum Associates.
Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology, 17, 420–
430.
Guitar, B. (1998). Stuttering: An integrated approach to the nature and treatment (3rd ed.). Baltimore, MD: Williams &
Wilkins.
Kamhi, A. G. (1994). Toward a theory of clinical expertise in speech-language pathology. Language, Speech, Hearing
Services in Schools, 25, 115–118.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
McCauley, R. J. (1988, Spring). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and
Hearing Association, 6–9.
McCauley, R. J., & Baker, N. E. (1994). Clinical decision-making in specific language impairment: Actual cases. Journal
of the National Student Speech-Language-Hearing Association, 21, 50–58.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–104). New York: American Council
on Education and Macmillan Publishing.
Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Records, N. L., & Tomblin, J. B. (1994). Clinical decision making: Describing the decision-rules of practicing speech-language pathologists. Journal of Speech and Hearing Research, 37, 144–156.
Records, N. L., & Weiss, A. (1990). Clinical judgment: An overview. Journal of Childhood Communication Disorders, 13, 153–165.
Rowland, R. C. (1988). Malpractice in audiology and speech-language pathology. Asha, 45–48.
Shanteau, J., & Stewart, T. R. (1992). Why study expert decision making? Some historical perspectives and comments.
Organizational Behavior and Human Decision Processes, 53, 95–106.
Thorner, R. M., & Remein, Q. R. (1962). Principles and procedures in the evaluation of screening for disease. Public
Health Service Monograph No. 67, 408–421.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K.
Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.
Turner, R. G., & Nielsen, D. W. (1984). Application of clinical decision analysis to audiological tests. Ear and Hearing,
5, 125–133.
Tversky, A., & Kahneman, D. (1993). Belief in the law of small numbers. In G. Keren & C. Lewis (Eds.), A handbook
for data analysis in the behavioral sciences: Methodological issues (pp. 341–350). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making
in speech-language pathology (pp. 192–193). Toronto: BC Decker Inc.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision
making in speech-language pathology (pp. 190–191). Toronto: BC Decker Inc.
Yoder, D. E., & Kent, R. D. (Eds.) (1988). Decision making in speech-language pathology. Toronto: BC Decker Inc.
PART I
Case Example
Theoretical Building Blocks of Measurement
What Is Measured by Measurements?
Measurements are usually indirect, that is, they involve the description of a characteristic taken to be closely related to
but different from the characteristic of interest. As an illustration of this notion, Pedhazur and Schmelkin (1991)
considered temperature. Conceptually, temperature is most closely related to the rate of molecular movement within a
material, yet it is almost always measured using a column of mercury. In this way, the measurement is made indirectly
using the height of the column of mercury as the indicator, or indirect focus of measurement. Although it would be
possible to determine the rate of molecular movement more directly, this is not done because of the considerable expense
and effort involved.
Similarly, measurements of behavior or other characteristics of people are almost always indirect. Consider, for example,
a characteristic that might be of interest to a
speech-language pathologist, such as a child’s ability to understand language. Clearly, as in the case of temperature, one
cannot easily measure this characteristic in a direct fashion. In fact, the ability to understand language cannot ever be
directly measured but instead must be inferred from a variety of indicators. This is because that ability is a theoretical
construct,1 a concept used in a specific way within a particular system of related concepts, or theory. Thus, the
theoretical construct referred to here as “the ability to understand language” represents shorthand for the carefully
weighed observations one has made about people as they respond to the vocalizations of others as well as for the
information one has read or been told about this construct by others. Figure 2.1 attempts to capture the complex
relationship between what one wants to measure, the theoretical construct, and the indicators used to measure it.
Looking at Fig. 2.1, you can see that there are many possible indicators for a single construct. This premise is important
to clinicians and researchers who need to recognize that any test or measure they use represents a choice from the set of
all possible indicators.
As will become clearer in later sections of this book, the wealth of indicators available for a construct presents flexibility
for those interested in measuring the construct, but it also presents potential problems. For example, a diverse range of
indicators for a single construct (e.g., intelligence) can lead to confusion when clinicians or researchers use different
indicators in relation to the same construct and reach different conclusions about both the construct and how the
characteristic being studied functions in the world. As an example, if one were to use an “intelligence” test that heavily
emphasizes knowledge of a particular culture, then use of that measure with children who come from a different culture
would lead to very different conclusions regarding how intelligent the children are.
Alternatively, focusing on a single indicator and ignoring the broader range of possible indicators for a given construct
can lead to its impoverishment. This type of problem has recently received attention in the literature on learning
disabilities, where it has been asserted that intelligence is synonymous with performance on one particular test—the
Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974). Critics complain that the use of this single measure
means that the knowledge gained by such research may be far more limited in its appropriate application than has been
appreciated. In summary, the choice of which indicator, and how many indicators, to use in order to gain information
about a particular construct—be it intelligence, receptive language, or narrative production—has important implications
for the quality of the information to be gained.
Pedhazur and Schmelkin (1991) described two kinds of indicators: reflective and formative indicators. Reflective
indicators represent effects of the construct, and formative indicators represent causes of it. An example of a reflective
indicator of one’s ability to understand a language would be the proportion of a set of simple commands in that language
that one can correctly follow. An example of a formative indicator of one’s ability to understand a language would be the
number of years one has been exposed to it. Almost all indicators are reflective; however, formative indicators are
sometimes used.

1 Within the literature on psychological testing, there is a tendency to refer to such constructs as latent variables.

Fig. 2.1. The relationship between a theoretical construct—single word comprehension—and several indicators that
could be used to measure it.
By this point, you may be scratching your head, wondering whether the term indicator is synonymous with the
somewhat more familiar term variable. In fact, those terms are quite closely related and, at times, may be used
synonymously. I introduced the term indicator first because variable is so closely associated with research that its
application to clinical measures might have seemed confusing. Consequently, I believe that an initial discussion of
indicators may help readers see how similar clinical and research measures are to one another while averting the
confusion. For the purposes of this book, indicator and variable will be used almost interchangeably to refer to a
measurable characteristic associated with a theoretical construct. However, variable is frequently used in a more
restricted way than indicator, to refer to a property that takes on specific values (Kerlinger, 1973).
One more term that commonly functions as a building block for measurement in descriptions of human behavior and
abilities is the operational definition. This term was originally introduced in physics by Bridgman (1927) to suggest that
in a given application (e.g., a specific research design or a particular clinical measure) a construct can be considered
identical to the procedures used to measure it. Operational definitions have been influential in communication disorders
because they have given rise to the clinical use of behavioral objectives, specific statements defining desired outcomes of
treatment for clients in terms that explain exactly how one will know whether the desired outcome has been achieved.
Operational definitions are probably most useful as a means of encouraging us to think carefully about the specific
indicators we use to gain information about a given theoretical construct.
TESTING AND MEASUREMENT CLOSE-UP:
ALFRED BINET AND THE POTENTIAL EVILS OF REIFICATION
In his 1981 book The Mismeasure of Man, Stephen Jay Gould, a noted biologist and popularizer of science, described
the work of Alfred Binet, the Frenchman who developed one of the first well-known intelligence tests. Gould noted
that Binet began to develop the test in 1904 when he was commissioned by the minister of education to devise a
practical technique for “identifying those children whose lack of success in normal classrooms suggested the need for
some form of special education” (p. 149). Almost as soon as the test came into use, Binet expressed hopes that its
results not be taken as iron-clad predictions of what a child could achieve, but that they be used as a basis for providing
help rather than as a justification for limiting opportunities. Gould went on to describe the regrettable dismantling of
Binet’s fond hope.
Gould’s book describes the process of the reification of intelligence, a process in which an abstract, complex
theoretical construct (such as “intelligence”) comes to have a life of its own, to be seen as real rather than the abstract
approximations that its originators may have had in mind. To illustrate this process, Gould described events in the
United States that occurred within a mere 20 years of Binet’s initial test development. Intelligence had been reified to
the point that it was used—or rather misused—as a basis for decisions having major effects on military service,
immigration policies, penal systems, and the treatment of individuals suspected of “mental defectiveness.”
Levels of Measurement
There are numerous ways to categorize measurements, but the notion of levels, or scales, of measurement introduced by
S. S. Stevens (1951) is one of the most influential and continues to inspire both defenders and attackers. Stevens’s levels
describe the mathematical properties of different kinds of indicators, or variables. The concept of levels is usually
defined operationally, with each level of measurement described in terms of the methods used to assign values to
variables—for example, whether the values are assigned using categories (normal vs. disordered) versus numbers
(percentage correct).
Typically, a hierarchical system of four ordered levels is discussed, in which the higher levels preserve greater amounts
of information about the characteristic being measured. Table 2.1 summarizes the defining properties of each level of
measurement and lists examples of each that relate to the assessment of childhood language disorders. These levels have
implications not only for our interpretation of specific measures, but also for what statistics will be appropriate for their
further investigation.
The nominal level of measurement refers to measures in which mutually exclusive categories are used. Diagnostic labels
and category systems for describing errors are frequently used examples of nominal measures.

Table 2.1
Three Levels of Measurement, Their Defining Characteristics,
and Examples From Developmental Language Disorders

Nominal
Defining characteristics: Mutually exclusive categories.
Examples: Describing a child as having word-finding difficulties; labeling a child’s problem as specific language impairment; describing a child’s use and nonuse for each of 14 grammatical morphemes.

Ordinal
Defining characteristics: Mutually exclusive categories; categories reflect a rank ordering of the characteristic being measured.
Examples: Describing the severity of a child’s expressive language difficulties as severe; characterizing a child’s intelligibility along a rating scale, such as “intelligible with careful listening,” where no effort has been made to assure that the scale has equal intervals; describing a child’s language in a conversational sample as productive at a particular phase (Lahey, 1988).

Interval
Defining characteristics: Mutually exclusive categories; categories reflect a rank ordering of the characteristic being measured; units of equal size are used, making the comparison of differences in numbers of units meaningful.
Examples: Summarizing a child’s standardized test performance using a raw or standard score; describing a child’s spontaneous use of personal pronouns using the number of correct responses; rating intelligibility using an equal-interval scale.

Although numerals may sometimes be used as labels for nominal categories (e.g., serial numbers or numbers on baseball jerseys), nominal
measurements are not quantitative and simply involve the assignment of an individual or behavior to a particular
category. Measurement at this level is quite crude in that all people or behaviors assigned to a specific category are
treated as if they are identical.
Ideally, categories used in nominal level measures are mutually exclusive: Each person or characteristic to be measured
can be assigned to only one category. Diagnostic labels used in childhood language disorders can ideally be thought of as
nominal; however, they are not always mutually exclusive. For example, a child may have language problems associated
with both mental retardation and hearing impairment. Similarly, a child with mental retardation may show a pattern of
greater difficulties with linguistic than nonlinguistic cognitive functions, leading one to want to entertain a designation of
the child as both language impaired and mentally retarded (Francis, Fletcher, Shaywitz, Shaywitz, & Rourke, 1996).
The ordinal level of measurement refers to measures using mutually exclusive categories in which the categories reflect
an underlying ranking of the characteristic
to be measured. Put differently, at this level, categories bear an ordered relationship to one another so that objects or
persons placed in one category have less or more of the characteristic being measured than those assigned to another
category. Despite the greater information provided at this level of measurement compared with the nominal level, it lacks
the assumption that categories differ from one another by equal amounts. Severity ratings are probably the most
commonly used ordinal measures in childhood language disorders.
Although ordinal measures reflect relative amounts of a characteristic, they are still not quantitative in the sense of
reflecting precise numerical relationships between categories. For example, although a profound expressive language
impairment may be regarded as representing “more” of an impairment than a severe expressive language impairment, it
is not clear how much more of the impairment is present.
One result of the absence of equal distances between categories (also called equal intervals) in an ordinal measure is that
when rankings are based on an individual judgment, they are likely to be quite inconsistent across individuals. Imagine
the case of a clinician who only serves children with devastatingly severe language impairments. When that clinician
uses the label mild to describe a child’s problems, it may mean something very different from the level of impairment
meant to be conveyed by the same label when it is used by clinicians serving a less involved population. Because of this,
it has been recommended that ordinal measures be used when the ratings made by a single individual will be compared
with one another, but not when ratings of several people will be compared (Allen & Yen, 1979; Pedhazur & Schmelkin,
1991).
The interval level of measurement refers to measures using mutually exclusive categories, ordered rankings of
categories, and units of equal size. It is the highest level of measurement usually encountered in measurements of human
abilities and behavior. Unlike measurements at the first two levels, measurements at this level can be considered
quantitative because numerical differences between scores are meaningful, as was not the case for numerals used at the
nominal or ordinal levels. Test scores are usually identified as the most frequent examples of this level of measurement
in childhood language disorders.
The use of equal-size units in interval-level measurements allows more precise comparisons of measured characteristics
to take place. For example, someone who receives a score of 100 on a vocabulary test can be said to have received 10
more points than someone who received a score of 90, and the same can be said for the person who scored 40 points
when compared with someone who scored 30 on the same test. What cannot be said, however, is that someone who
received a score of 80 knew twice as much as someone who received a score of 40—that comparison entails a ratio
(80:40), and the ability to describe ratios precisely is not reached until the final level of measurement. However, for most
measurement purposes, the interval level of measurement allows sufficient precision.
The ratio level of measurement refers to measures using mutually exclusive categories, ordered rankings of categories,
equal-size units, and a real zero. Achievement of this level of measurement is considered rare in the behavioral sciences,
but occurs when a measure demonstrates all of the traits associated with interval measures along
with a sensitivity to the absence of the characteristic being measured—the “real zero” mentioned above. The term ratio
is used to describe such measures because ratio comparisons of two different measurements along this scale hold true
regardless of the unit of measurement that is used. It should also be noted that when ratios are formed from other
measures, they achieve this level of measurement. For example, the ratio of a person’s height to weight falls at the ratio
level of measurement. Measures involving time (such as age or duration) are probably the most common of the relatively
few measures in childhood language disorders that reach the ratio level.
At this point, readers may wonder why score data are not described as falling at the ratio level of measurement given that
a score of 0 on a test or other scored clinical measures is an unpleasant but real possibility. For score data, however, the
zero point is considered an arbitrary zero rather than a real zero because a score of 0 does not reflect a real absence of the
characteristic being studied (Pedhazur & Schmelkin, 1991). Thus, for example, a score of zero on a 15-item task
concerning phonological awareness is not considered indicative of a complete absence of phonological awareness on the
part of the person taking the test. In order to demonstrate that a person has no phonological awareness, the test would
need to include items addressing all possible demonstrations of phonological awareness and would therefore be too long
to administer (or devise, for that matter).
Information concerning levels of measurement may be a review to many readers who remember it from past statistics or
research methods courses. Levels of measurement are introduced in those contexts because each level is associated with
specific mathematical transformations that can be applied to measurements at that level without changing the
relationship between the characteristic to be measured and the value or category assigned to it. Those mathematical
properties, in turn, determine the types of statistics considered appropriate to the measure. In general, the lower the level
of measurement, the less information contained in the measure and the less flexibility one will have in its statistical
treatment.
Recall that a given construct may be associated with indicators at various levels of measurement. Consequently, the level
of measurement of an indicator may be one consideration when choosing a particular measure. Thus, for example,
imagine that you are interested in characterizing a child’s skill at structuring an oral narrative. At the crudest level, one
might choose to label a child’s performance in the production of such a narrative as impaired or not impaired—
measuring it at a nominal level. For greater precision, however, a spontaneous narrative produced by the child might be
rated using a 5-point scale, with 1 indicating a very poorly organized narrative and 5 a narrative with adult-like
structure. Yet probably the most satisfactory type of measure for describing this child’s difficulties is one at the interval
level of measurement. An example of such a measure for narrative production is one devised by Culatta, Page, and Ellis
(1983), in which the child receives a score for the number of propositions correctly recalled in a story-retelling task.
With such a measure (as opposed to measures at the nominal or ordinal levels), you can obtain greater insight into the
nature of the difficulties facing the child and can more readily make comparisons to the severity of other children with
problems in narrative production.
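The relationship among these three levels can be sketched in a few lines of Python. The score, the rating cutoffs, and the labels below are invented purely for illustration (the text does not specify them); the point is that each step down the hierarchy discards information that cannot be recovered.

```python
# Hypothetical number of propositions correctly recalled in a story-retelling
# task (an interval-level measure, in the spirit of the Culatta, Page, and
# Ellis measure described above).
score = 14

def ordinal_rating(score):
    """Derive a 5-point ordinal rating by binning the interval score.
    The cutoffs here are invented for illustration only."""
    if score < 5:
        return 1   # very poorly organized narrative
    if score < 10:
        return 2
    if score < 14:
        return 3
    if score < 18:
        return 4
    return 5       # narrative with adult-like structure

def nominal_label(rating):
    """Derive a nominal label from the ordinal rating."""
    return "impaired" if rating <= 2 else "not impaired"

rating = ordinal_rating(score)  # 4
label = nominal_label(rating)   # "not impaired"

# Information flows in one direction only: from the label "not impaired"
# alone, neither the rating of 4 nor the score of 14 can be recovered.
```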
Basic Statistical Concepts
As a branch of applied mathematics, the field of statistics has two general uses: describing groups of measurements made
to gain information about one or more variables and testing hypotheses about the relationships of variables to one
another. For many students in an elementary statistics class, each of these uses represents a vast, awe-inspiring, and
sometimes fear-provoking landscape. In this section of the chapter, only the highest peaks and lowest valleys of these
landscapes will be surveyed. Specifically, selected statistical concepts are introduced in terms of their meaning and the
practical uses to which they are applied by those of us interested in measuring children’s behaviors and abilities.
Although statistical calculations are described, only rarely are specific formulas given so that the connection between
meaning and application can remain particularly close. More elaborate and mathematically specific discussions can be
found in sources such as Pedhazur and Schmelkin (1991).
Statistical Concepts Used to Describe Groups of Measurements
One of the most common uses of statistics is to summarize groups of measurements, typically referred to as distributions.
Distributions can consist of a set of measurements based on actual observations (often called a sample) or a set of values
hypothesized for a set of possible observations (often called a population). An example of a distribution based on a
sample would be all of the test scores obtained by children in a single preschool class on a screening test of language. In
contrast, an example of a distribution based on a population would be all of the scores on that same test obtained by any
child who has ever taken it. Except when population distributions are discussed from a purely mathematical point of
view, they are almost always inferred from a specific sample distribution because of the impracticality or even
impossibility of measuring the population.
Two types of statistics used to summarize distributions of measurements are measures of central tendency and
variability. Measures of central tendency are designed to convey a typical or representative value, whereas measures of
variability are used to convey the degree of variation from the central tendency.
Measures of central tendency have been described as indicating “how scores tend to cluster in a particular
distribution” (Williams, 1979, p. 30). The three most common measures of central tendency are (in order of decreasing
use) the mean, median, and mode. The mean is the most common measure of central tendency. It is used to refer to the
value in a distribution that is the arithmetic average, that is, the result when the sum of all scores in a distribution is
divided by the number of scores in the distribution. Unlike the two other measures of central tendency, the mean is
appropriate only for measurements that fall at interval or ratio levels. Although it is considered the richest measure of
central tendency, the mean has the negative feature of being particularly sensitive to outliers—extreme scores that differ
greatly from most scores in the distribution. Because of this, the mean will sometimes not be used even if the level of
measurement allows it; instead, the median, the next most informative measure of central tendency, will be used.
The median is the score or category that lies at the midpoint of a distribution. It is the middle score in the case of
ungrouped distributions of interval or ratio data and the middle category in the case of ordinal data. The median is
considered an appropriate measure of central tendency for either ordinal or interval measures and is even superior to the
mean in terms of its relative stability in the face of outliers. On the other hand, it is considered inappropriate for nominal
measures because the categories used at that level of measurement cannot, by definition, be ordered logically. Because of
this lack of “order” in nominal data, finding a middle score or category is nonsensical.
The third and final measure of central tendency, the mode, has relatively few uses. The mode simply refers to the most
frequently occurring score (for interval or ratio data) or category (for nominal data). Because of the way the mode is
defined, it is possible for there to be more than one mode in a given distribution, in which case the distribution from
which it comes can be referred to as bimodal, trimodal, and so forth. For nominal level data, the mode is the only
suitable measure of central tendency.
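All three measures of central tendency, and the mean’s sensitivity to outliers, can be illustrated with Python’s standard statistics module. The scores below are invented for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical raw scores for ten children on a vocabulary probe.
scores = [8, 10, 11, 11, 12, 13, 13, 13, 15, 18]

print(mean(scores))       # 12.4: sum of scores divided by number of scores
print(median(scores))     # 12.5: midpoint of the ordered distribution
print(multimode(scores))  # [13]: most frequently occurring score

# The mean is pulled by extreme scores; the median is comparatively stable.
with_outlier = scores[:-1] + [48]   # replace the top score with an outlier
print(mean(with_outlier))           # jumps to 15.4
print(median(with_outlier))         # still 12.5
```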
Because measurements within a distribution vary, a measure of variability is also required to characterize it effectively.
Three measures of variability, two of which are very closely related, are most frequently used in descriptions of
children’s abilities and behaviors. As was done in the description of measures of central tendency, these measures will be
described in order of decreasing use.
Although considered somewhat daunting by beginning statistics students because of its relatively involved calculations,
the most frequently used measure of variability is the standard deviation. The standard deviation was developed for
interval and ratio measures as an improvement on the seemingly good idea of describing the average (or mean)
difference (or deviation) from the mean. The problem with an average deviation was that because of the way the mean is
defined, all of the deviations above the mean are positive in sign and would therefore balance all of the negative
deviations falling below the mean, leading to an average deviation of zero for all distributions—regardless of obvious
differences in variability from one distribution to another. In order to avoid this problem, the standard deviation is
calculated in a manner that makes all deviations positive. Nonetheless, the intent behind the standard deviation is to
convey the size of the typical difference from the mean score. As I expand on in an upcoming section of this chapter, the
standard deviation has special significance because of its relationship to the normal curve. Specifically, standard
deviation units become critical to comparisons of one person’s score against a distribution of scores, such as occurs when
test norms are used.
The concept of variance is closely related to the standard deviation. In fact, the standard deviation of a distribution is the
square root of its variance. Despite this very close relationship to standard deviation, variance is less frequently used
because, unlike the standard deviation, it cannot be expressed in the same units as the measure it is being used to
characterize. For example, you can describe the age of a group of children in months by saying that the mean age for the
group is 36 months, and the standard deviation is 3 months. This results in a much clearer description than saying that
the mean age for the group is 36 months, and the variance is 9. No, not 9 months—simply 9. Because of this “unit-
lessness,” variance is rarely used when the
intent is simply to describe the characteristics of a group. It does play a role in some statistical operations, however, and
so is an important statistic to be aware of.
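These relationships can be checked numerically. The ages below are invented so that the numbers match the example in the text (a mean of 36 months, a standard deviation of 3 months, and a variance of 9); statistics.pstdev and pvariance compute the population forms of these statistics.

```python
import math
from statistics import mean, pstdev, pvariance

# Hypothetical ages in months, chosen to reproduce the example in the text.
ages = [33, 33, 39, 39]

m = mean(ages)
print(m)                             # 36 (months)

# The naive "average deviation" is always zero: positive and negative
# deviations from the mean cancel exactly.
print(sum(age - m for age in ages))  # 0

print(pstdev(ages))     # 3.0: typical deviation from the mean, in months
print(pvariance(ages))  # 9.0: squared units, so not "9 months"

# The standard deviation is the square root of the variance.
assert math.isclose(pstdev(ages), math.sqrt(pvariance(ages)))
```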
The least complicated measure of variability, the range, is also the least frequently used of the three measures. It
represents the difference between the highest and lowest scores in a distribution. The utility of the range lies in its ease of
calculation and its applicability to distributions at any level of measurement other than the nominal level. For interval or
ratio data, it is calculated by subtracting the lowest from the highest score and adding 1. Thus for example, if the highest
and lowest scores in a distribution of test scores were 85 and 25, respectively, the range would be 61. At the ordinal
level, the range is usually reported by indicating the lowest to highest value used. For example, one might report that
listener ratings of a child’s intelligibility in conversation ranged from usually unintelligible to intelligible with careful
listening, or from 2 to 4 if a 5-point numeric scale were used. Because the range is based on only two numbers (or two
levels in the case of an ordinal measure), its weaknesses are a lack of sensitivity and a susceptibility to the effects of outliers.
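The inclusive form of the range just described is a one-line calculation. The example reuses the highest (85) and lowest (25) scores from the text; the other scores are invented filler.

```python
def inclusive_range(scores):
    """Range for interval or ratio data as defined in the text:
    highest score minus lowest score, plus 1."""
    return max(scores) - min(scores) + 1

print(inclusive_range([85, 40, 62, 25, 71]))  # 61, as in the text's example

# A single outlier changes the range drastically, illustrating its
# susceptibility to extreme scores.
print(inclusive_range([85, 40, 62, 25, 71, 150]))  # 126
```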
In summary, measures of central tendency and variability are useful for describing groups of measurements related to a
single variable and are selected on the basis of the variable’s level of measurement.
Statistical Concepts Used to Describe Relationships between Variables
A number of statistical concepts are available to describe relationships between and among two or more groups of
measurements and to test hypotheses about the nature of those relationships. Because the intent here is to focus only on
those concepts most basic to understanding measurement applications in developmental language disorders, only one of
those concepts will be discussed in some detail—the correlation.
The correlation between two variables describes the degree of relationship existing between them as well as information
about the direction of that relationship and its strength. Correlation coefficients typically range in degree from 0
(indicating no relationship) to positive or negative 1 (indicating a perfect relationship in which knowing one measure for
an individual would allow you to predict that person’s performance on the second measure with perfect accuracy). The
sign of the correlation refers to its direction: A positive correlation indicates that as one measure increases, the second
measure increases as well. Relationships associated with a positive correlation are said to be direct. A vivid example of a
direct relationship would be the relationship some see between money and happiness. In contrast, a negative correlation
indicates that as one measure increases, the second measure decreases. Relationships associated with a negative
correlation are said to be inverse. A vivid example of an inverse relationship would be the relationship between unpaid
bills and peace of mind.
Figure 2.2 contains examples of graphic representations of correlations that differ in magnitude and direction. Notice that
two of the correlations are described as being associated with a correlation coefficient of 0. The second of those
demonstrates a curvilinear relationship, which cannot be captured by the simple methods described here.
Fig. 2.2. Illustrations showing the variety of relationships that can exist between variables and can potentially be
described using correlation coefficients. These include no relationship (i.e., the value of one variable is independent of
the value of the other), a curvilinear relationship (i.e., in which the nature of the relationship between variables changes
in a curvilinear fashion depending on the value of one of the variables), and linear relationships of lower and greater
magnitudes.
As a more detailed (and relevant) example involving correlation, let’s consider two hypothetical sets of test scores
obtained for a class of third graders—one on reading comprehension and the other on phonological awareness (explicit
knowledge of the sound structure of words). If this group of children were like many others, then one would expect their
performances on these two measures to be positively correlated (e.g., Badian, 1993; Bradley & Bryant, 1983)—that is,
one would expect that children who receive higher scores on the reading comprehension test would receive higher scores
on the phonological awareness test. However, because many factors affect each of the abilities targeted by the measures,
it would be unlikely that the magnitude of the correlation, which reflects the strength of the association, would be very
large. In
fact, a low correlation might be expected in this context. Table 2.2 contains labels that are frequently used to describe
correlations of various magnitudes (Williams, 1979).
The correlation coefficient most frequently used in describing human behavior is the Pearson product–moment
correlation coefficient (r), the specific type of correlation that would have been appropriate for the example given above.
Unfortunately, that correlation coefficient is only considered appropriate for measurements at the interval or ratio level
of measurement. For measurements at the ordinal level, Spearman’s rank-order correlation coefficient (ρ) can be
calculated. At the nominal level, the contingency coefficient (C) is used to describe the relationship between the
frequencies of pairs of nominal categories.
In addition to these correlation coefficients, however, there are several other correlation coefficients (e.g., phi, point
biserial, biserial, tetrachoric) that are used during the development of standardized tests. The choice of these less familiar
correlation coefficients is dictated by the characteristics of the measurements to be correlated, such as whether either or
both of the measurements are dichotomous (e.g., yes–no, correct–incorrect), multivalued (e.g., number correct), or
continuous (e.g., response times).
It is easy to be intimidated by an unfamiliar correlation coefficient. However, this danger can be countered with the
knowledge that the concept of correlation remains the same, regardless of how exotic the name of the specific
coefficient. Thus, whether one is using phi or Pearson’s product–moment correlation, a correlation coefficient always is
intended to describe the extent to which two measures tend to vary with one another. In fact, even when one examines
the relationships between the distributions of more than two variables using multiple correlations, the interpretation of
correlations remains essentially unchanged.
Correlation coefficients are usually reported along with a statement of statistical significance, which describes the extent
to which the correlation coefficient is likely to differ from zero by chance, given the size of the sample on which it is
based. In general, statements of statistical significance always carry the implication that although a particular sample of
behavior was observed, it is being used to draw conclusions for the larger population. Statements of statistical
significance are used to test hypotheses—conjectural statements about a relation between two or more variables
(Pedhazur & Schmelkin, 1991). In this case, the hypothesis is that the obtained correlation coefficient differs from zero.
Statistical significance indicates that the obtained value was unlikely to have occurred by chance.
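One common way to test the hypothesis that r differs from zero uses the t statistic, t = r·√(n − 2)/√(1 − r²). The sketch below (with invented values, not drawn from the text) shows how sample size drives significance:

```python
import math

def t_for_r(r, n):
    """t statistic (df = n - 2) for testing H0: the population correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# The same modest correlation of .30 tested with two sample sizes:
small = t_for_r(0.30, 20)    # roughly 1.33, below typical critical values (~2.1)
large = t_for_r(0.30, 200)   # roughly 4.43, comfortably significant
print(round(small, 2), round(large, 2))
```

With about 20 participants, an r of .30 would not reach significance; with 200, the very same coefficient would. This is why statements of statistical significance must always be read alongside the sample size on which they are based.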
Table 2.2
Descriptive Labels Applied to Correlations of Varying Magnitudes

Table 2.3
Examples of Norm-Referenced and Criterion-Referenced Measures

Norm-referenced
  Personal experience: IQ tests; GREs; SATs; Classroom tests (with grading on the curve)
  Developmental language disorders: IQ tests; Most language tests
Criterion-referenced
  Personal experience: Driver's test; Eye examination; Classroom examination (without grading on the curve)
  Developmental language disorders: Most articulation or phonology tests; Treatment probes in which a set criterion (e.g., 80%) is used

Note. GRE = Graduate Record Examination; SAT = Scholastic Aptitude Test.
Table 2.4
The Amazing University of Vermont Test

1. The University of Vermont is located in (a) Burlington, Vermont (b) Montpelier, Vermont (c) Manchester, New Hampshire (d) St. Albans, Vermont (e) Enosburg Falls, Vermont
2. The official acronym for the University is (a) U of V (b) VU (c) UVM (d) MUV (e) none of the above
3. The number of students attending the University is (a) 500–1500 (b) 1500–3000 (c) 3000–4500 (d) 4500–6000 (e) > 10,000
4. The school colors are (a) grey and white (b) green and white (c) grey and green (d) green and gold (e) grey and gold
5. The mascot of the University is (a) snowy owl (b) raccoon (c) barn owl (d) catamount (e) Jersey cow
6. The most popular spectator sport at the University is (a) cow tipping (b) ice hockey (c) football (d) downhill skiing (e) snowboarding
7. The most famous philosopher graduating from UVM was (a) Ethan Allen (b) Ira Allen (c) Woody Allen (d) Woody Jackson (e) John Dewey
8. Translated from the Latin, the school motto means (a) Scholarship and hard work (b) Stay warm (c) Live free and stay out of New Hampshire (d) Suspect flatlanders (e) Independence and dignity
Such a comparison group is called a normative group; hence the designation norm-referenced to refer to the method of
score interpretation and sometimes to refer to the specific type of measure being used.
Norms, then, refer to the specific information about the distribution of scores associated with the normative group. Two
types of norms merit special attention: national norms and local norms. National norms are data concerning a group that
has been recruited so as to be representative of a national cross section of individuals who might be tested. Norms for
tests involving children are typically organized so that information
Page 33
based on subgroups of children is reported by age (usually in 2- to 6-month intervals), by grade, or both. It is often
recommended that when norms are collected, the normative groups be matched against national data (usually census
data) for socioeconomic status, race, ethnicity, education, and geographic region (Salvia & Ysseldyke, 1995). National
norms are collected almost solely for standardized measures that will be used with very large numbers of individuals
each year. For example, intelligence tests, educational tests, and many language tests typically provide national norms.
Local norms are prepared when national norms for a measure are unavailable or inappropriate to a group of test takers.
They represent normative data collected on a group of test takers like those on whom the measure will be used. Local
norms are especially useful when national norms are likely to be inappropriate for a group of test takers whose language
is unlike that in which the test is written. Most frequently, this would involve individuals who speak one of many
regional or social dialects that are significantly different from the idealized “standard” American English dialect, for
example, speakers of Black American English or Spanish-influenced English. Alternatively, a clinician may want to
collect local norms for specific client populations for whom normative data are lacking (e.g., individuals with hearing
impairment, mental retardation, or cerebral palsy).
Rather than using the Amazing University of Vermont Test to compare performances of a number of test takers, you
might use the Amazing University of Vermont Test to determine whether a group of incoming students has adequately
learned the information included in their orientation materials. In that case, the outcome of the test could lead to a
student’s becoming exempt from an additional orientation session or being required to complete it.
For that testing purpose, scores would be interpreted in relation to a behavioral criterion, for example, 6 of 10 correct.
When interpreted in that way, the test could be described as a criterion-referenced measure. The level of performance
would then be considered a cutoff, or, less frequently, a cutting score. Often the term master is used to refer to a test taker
whose score exceeds the cutoff score, and nonmaster is used to refer to a test taker whose score falls below the cutoff.
Briefly then, in contrast to a norm-referenced interpretation, score interpretation for a criterion-referenced measure
hinges on knowledge of the person’s raw score and the cutoff score. Information about a reference or normative group is
not necessary. It is often useful, however, for developers of criterion-referenced measures to study group performances
as a means of determining a reasonable cutoff score—one that is empirically derived rather than based on an arbitrary
cutoff, for example at 80% correct.
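The mechanics of criterion-referenced interpretation are simple enough to state in a few lines of Python (a sketch; the cutoff of 6 comes from the 6-of-10 example above, and treating a score exactly at the cutoff as mastery is an assumption):

```python
def classify(raw_score, cutoff):
    """Criterion-referenced interpretation: only the raw score and the
    cutoff matter; no normative group is consulted. Here a score at or
    above the cutoff counts as mastery."""
    return "master" if raw_score >= cutoff else "nonmaster"

print(classify(7, 6), classify(4, 6))  # master nonmaster
```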
In addition to differences in the mechanics of score interpretation, norm-referenced versus criterion-referenced measures
tend to differ in the scope of knowledge being assessed and the specific method used to choose items. Specifically, norm-
referenced measures tend to address a large content area which is sampled broadly; whereas criterion-referenced
measures tend to address a quite narrowly defined concept that is sampled in as exhaustive a manner as possible. For
norm-referenced measures, items are selected so that the greatest amount of variability in test scores is achieved among
test takers; whereas for criterion-referenced measures, items are selected primarily because of how well they address the
targeted construct. Figure 2.3 shows the steps involved in the development of standardized norm-referenced and
criterion-referenced instruments.
Page 34
At the beginning of this section, only a single measure, the Amazing University of Vermont Test, was used to introduce
the concepts of criterion- and norm-referencing. This was done in order to emphasize that method of interpretation is the
most crucial feature distinguishing norm- from criterion-referenced measures. Practically, however, because of
differences in how items are selected for each type of measure, it is very difficult to develop a single measure that can
equally support these two different approaches to score interpretation.
Types of Scores
Norm-Referenced Measures
For norm-referenced measures, a variety of test scores is useful. Because of the centrality of the comparison between the
test taker’s and the normative group’s performances, however, the raw score is of little value
Fig. 2.3. Steps involved in the development of norm-referenced and criterion-referenced standardized measures.
Page 35
except as the starting point for other scores. These other scores are termed derived scores because of their dependent
relationship to the raw score. Three types of derived scores deserve attention: developmental scores, percentile ranks,
and standard scores. These are listed in increasing order of both their value as a means of representing a test taker’s
performance and their complexity of calculation.
Developmental scores are the least valuable derived scores but are still ubiquitous in clinical and research contexts—a
paradox that I will address shortly. The two most commonly used developmental scores are age-equivalent scores and
grade-equivalent scores. A test taker’s age-equivalent score is derived by identifying the age group that has a mean score
closest to the score received by the individual test taker. For example, if a test taker's raw score of 85 corresponds to the mean raw score of a group of 3-year-olds, the age-equivalent score assigned to the test taker would be 3 years. If there is no age group whose mean exactly matches the score of a test taker, then an estimation is made of how many months should be added to the age group whose mean falls just below that of the test taker, resulting in age-equivalent scores such as 2 years, 6 months or 5 years, 11 months. Typically, test users do not have to examine the group data directly, but are given tables listing raw scores and the age-equivalent scores to which they correspond.
Grade-equivalent scores are similar in many respects to age-equivalent scores but are, as one would guess from their
name, derived from data concerning the mean performance of groups of test takers in different grades. When estimation
is required, grade-equivalent scores are reported in tenths of a grade. Thus, for a 12-year-old who achieves a score just
slightly above that of a group of 4th graders, a grade-equivalent of 4.1 or 4.2 might be assigned.
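As a sketch of how a raw score is turned into an age-equivalent, the Python below interpolates between age-group means. The norms table is entirely invented for illustration; real tests supply lookup tables rather than a formula:

```python
def age_equivalent(raw, norms):
    """Return an age-equivalent in months given `norms`, a list of
    (age_in_months, mean_raw_score) pairs sorted by age. Linear
    interpolation between group means stands in for the estimation
    step described above."""
    if raw <= norms[0][1]:
        return norms[0][0]
    for (a0, m0), (a1, m1) in zip(norms, norms[1:]):
        if m0 <= raw <= m1:
            return round(a0 + (raw - m0) / (m1 - m0) * (a1 - a0))
    return norms[-1][0]

# Invented age-group means: 3;0 -> 85, 3;6 -> 91, 4;0 -> 97
norms = [(36, 85), (42, 91), (48, 97)]
print(age_equivalent(85, norms))  # 36 months, i.e., 3 years exactly
print(age_equivalent(88, norms))  # 39 months, i.e., 3 years, 3 months
```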
In psychometric circles, almost never is a kind word spoken about scores of this type. Long, derogatory lists of the
problems with developmental scores abound (e.g., McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995), but the lists
invariably center around concerns that such scores are easily misunderstood and likely to be unreliable. Table 2.5
provides an elaborate version of these lists as well as a pointed commentary on developmental scores.
The appeal of developmental scores is twofold. First, the apparent uniformity of meaning of such scores across different
tests makes it seem that they allow for a comparison of skills in different areas and permit a sensitive quantification of
degree of impairment. Thus, when a 9-year-old child is said to have skills falling at the 7-year level in math and the 8-
year level in receptive language, it can be misinterpreted as indicating significant problems in both areas, with a more
severe impairment in mathematics. Although many individuals are quite aware of the low esteem in which
developmental scores are held, they nonetheless fall into misinterpretations like this. Given that age-equivalent scores
only crudely compare two scores as their means of norm-referencing, neither individual developmental scores nor
comparisons between them necessarily convey degrees of impairment. Depending on the tests used, for example, it may
be that a great many very normally developing children would exhibit the same “impaired” scores.
The second appeal of developmental scores lies outside the interests of individual test users. Numerous state and
insurance regulations demand that developmental scores be used to describe test performances, presumably on the basis
of the misconceptions cited earlier that meaningful comparisons between skill areas can be based
Page 36
Table 2.5
Five Drawbacks to Developmental Scores, Such as Age-Equivalent
and Grade-Equivalent Scores (Anastasi, 1982; Salvia & Ysseldyke, 1995)
1. Developmental scores lead to frequent misunderstandings concerning the meaning of scores falling below a child’s
age or grade. For example, a parent may interpret an age equivalent of 5 years, 10 months as evidence of a delay in
a 6-year-old. In fact, by definition, half of the children in a given age group (or grade level) would receive age-equivalent scores below their own age. This problem arises because developmental scores contain no information
about normal group variability.
2. There is a tendency to interpret developmental scores as indicating that performance was similar to that of an
individual of corresponding age—for example, that a score of 3 years, 6 months would be associated with
performance that was qualitatively like that of a 3½-year-old. In fact, however, it is unlikely that the nature and
consistency of errors would be similar for two individuals with similar developmental scores but differing ages or
grade levels.
3. Developmental scores promote comparisons of children with other children of different ages or grades rather than
with their same-age peers.
4. Developmental scores tend to be ordinal in their level of measurement. Therefore, they lack flexibility in how they
may be treated mathematically and are prone to being misunderstood. For example, a “delay” of 1 year in a fifth
grader who receives a grade equivalent score of 4 is not necessarily comparable to a “delay” of 1 year in a ninth
grader who receives a grade-equivalent score of 8.
5. Developmental scores are less reliable than other types of scores.
on them. As I discuss in the next section of this chapter, such regulation of test users provides a vivid example of the
numerous cases in which assessment must respond to a variety of forces outside of the direct clinical interaction between
clinician and client. Typically, test users faced with the dilemma of having to report developmental scores are advised by
psychometricians to report them along with more useful derived scores in a manner that minimizes the likelihood of
misunderstanding.
Percentile ranks are actually one variety of a class of derived scores that includes quartiles and deciles. Percentile ranks
represent the percentage of people receiving scores at or below a given raw score. Thus, a percentile rank of 98, or 98th
percentile, indicates that a test taker received a score better than or equal to those of 98% of persons taking the test (usually
the normative sample). This type of score has the distinct advantage of being readily understood by a wide range of
persons, including parents and some older children.
Percentile ranks have two disadvantages. The first is that they are sometimes misunderstood as meaning percentage of
correct responses on the test. Readers can avoid this false step if they remember that on a very difficult test, one could
perform better than almost anyone (and therefore have a high percentile rank), but in fact have obtained a low percentage
correct. The second disadvantage of percentile ranks is that, like developmental scores, they represent an ordinal measure
and thus cannot be combined or averaged.
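The definition of a percentile rank (the percentage of the normative group scoring at or below a given raw score) translates directly into code. The normative scores here are invented:

```python
def percentile_rank(raw, normative_scores):
    """Percentage of the normative group scoring at or below `raw`.
    (Published tests sometimes use slightly different conventions,
    e.g., counting only half of the scores exactly equal to `raw`.)"""
    at_or_below = sum(1 for s in normative_scores if s <= raw)
    return 100 * at_or_below / len(normative_scores)

# A raw score of 58 out of, say, 200 items is only 29% correct,
# yet it can still carry a respectable percentile rank:
norm_group = [40, 45, 50, 52, 55, 58, 60, 65, 70, 90]
print(percentile_rank(58, norm_group))  # 60.0
```

The usage example makes concrete the warning above: percentile rank reflects standing relative to the normative group, not percentage correct.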
Standard scores represent the pinnacle of scoring approaches used in norm-referenced testing. They preserve information
about the comparison between an individual and appropriate age group and information about the variability of the
normative group.
Page 37
In addition, they are at the interval level of measurement and thus can be combined and averaged in ways not possible
with the other types of scores discussed earlier.
Standard scores are "standard" because the original distribution of raw scores on which they are based has been
transformed to produce a standard distribution having a specific mean and standard deviation. Because standard scores
are normally distributed, they can be interpreted in terms of known properties of the normal distribution, especially how likely or unlikely a particular score is. This makes standard scores a favored method
of communicating test results among professionals. Figure 2.4 illustrates the relationship between the normal curve and
several of the most frequently used scores: the z score, deviation IQ score, and T scores.
The most basic standard score is the z score, which has a mean of 0 and a standard deviation of 1. It is calculated by
taking the difference of a particular raw score from the mean for the distribution and dividing the result by the standard
deviation of the distribution. Each score is represented by the number of standard deviations it falls from the mean, with
positive values representing scores above the mean and negative values representing those below the mean.
Fig. 2.4. The relationship between the normal curve and several of the most frequently used standard scores, including the z-score, deviation IQ score, and T scores. From Assessment of children (p. 17), by J. M. Sattler, 1988, San Diego, CA: Author. Copyright 1988 by J. M. Sattler. Reprinted with permission.

Page 38

Because of the relationship between this type of score and the normal curve, it is possible to know that a z score of –2 falls 2 standard deviations below the mean and that fewer than 3% of the normative population had a score that low or lower.
Other widely used standard scores in developmental language disorders are the deviation IQ and the T score. These
scores share the virtue of z scores in their known relationships to the normal curve: the deviation IQ has a mean of 100 and a standard deviation of 15, and the T score has a mean of 50 and a standard deviation of 10. As an additional benefit, such scores are somewhat less open to the confusion associated
with negative numbers used in z scores. However, their interpretation remains quite challenging for people who are
unfamiliar with the use of the normal curve in score interpretation. Still, because of their strengths, standard scores such
as these are frequently used among professionals, with percentiles favored for use with other audiences.
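Because all of these standard scores are linear transformations of z, converting among them is mechanical. The Python sketch below (with an invented raw score and scale) also uses the error function to recover the proportion of the normal curve falling below a given z:

```python
import math

def z_score(raw, mean, sd):
    """Number of standard deviations a raw score falls from the normative mean."""
    return (raw - mean) / sd

def deviation_iq(z):
    return 100 + 15 * z   # mean 100, standard deviation 15

def t_score(z):
    return 50 + 10 * z    # mean 50, standard deviation 10

def proportion_below(z):
    """Proportion of a normal distribution falling at or below z (via erf)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = z_score(70, 100, 15)                 # a raw score 2 SDs below the mean
print(z, deviation_iq(z), t_score(z))    # -2.0 70.0 30.0
print(round(100 * proportion_below(z), 1))  # about 2.3 (i.e., fewer than 3%)
```

The last line reproduces the figure cited earlier: a z score of –2 falls below all but roughly the lowest 2.3% of the normative distribution.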
Criterion-Referenced Measures
For criterion-referenced measures, raw scores are the major type of score because by definition such measures involve
the comparison of a raw score against a given criterion or cutting score. As mentioned previously, it is possible for the
cutoff score to be based on empirical study or for it to be arbitrarily established on the basis of hypotheses about the level
of performance, or performance standard, required for satisfactory advancement to later levels of skill acquisition
(McCauley, 1996).
Case Example
Case 2.1 illustrates most of the concepts discussed in this chapter as they relate to Austin, a 5-year-old boy with specific
language impairment. This hypothetical report is annotated to highlight instances where a measurement has been made
by the clinician. Specifically, both formal and informal measures are bolded in this case.
Case 2.1
Speech-Language-Hearing Center
353 Luse Street
Burlington, VT 05405-0010
Client’s name: Austin G. Date of Evaluation: 2/12/97
Address: (child’s home with mother and stepfather) Parents’ names: Leslie G. (mother)
284 Willow Creek Road Warren G. (stepfather)
Burlington, VT 05401 George C. (father)
33 Elm Street
Savannah, GA 31411
Date of Birth: 1/8/92 h: (912) 999-9393
Education Status: Kindergarten Referral Source: Dr. A. B. Park
School: Woodward Elementary School Student clinician: E. Miller, B.A.
2 Station Street Supervisor: R. J. Turner, M.S., CCC-SLP
Burlington, VT 05401
Date of report: 2/14/97
Page 39
BACKGROUND INFORMATION
Austin, a 5-year, 1-month-old boy, was seen today for a speech and language evaluation following referral by his
primary care physician, Dr. A. B. Park. Background information was obtained using a case history form, an in-depth
parent interview conducted with Mr. and Mrs. G., who accompanied Austin today, and a phone conversation with Mr.
C., Austin’s biological father.
The reasons given by Mr. and Mrs. G for today’s evaluation were growing concerns regarding Austin’s articulation,
overall intelligibility, and expressive language skills. Mr. and Mrs. G report that strangers and even other children in
Austin’s class find him difficult to understand and frequently ask him to repeat what he has said. He is also becoming
increasingly frustrated with family members when they fail to understand him, resulting in increasingly frequent and
escalating arguments with his older sister, Elizabeth (age 10). In contrast, they report that he understands everything
that is said to him and is recognized as a very bright child even by adults who fail to understand him.
Austin and his sister Elizabeth live with Mr. and Mrs. G and see their biological father, Mr. C, only at holidays and for
6 weeks in the summer. The parents divorced when Austin was 1 year old, and he calls his stepfather as well as his
biological father "Daddy." Austin currently attends a kindergarten class in the Woodward Elementary School—Burlington, where he has three or four especially close friends. According to his teacher Mrs. Smith's reports to his parents, Austin is a happy child who is popular at least in part because of his enthusiastic manner and skill at
playground athletics. Because he is small for his age (in the 5th percentile for height and weight) and because of his
immature-sounding speech, he is sometimes teased by children from older classes about being a “baby,” but is readily
defended by his classmates and appears unaffected by such taunts, according to Mrs. Smith. She referred Austin for a
speech-language evaluation by the school speech-language pathologist in January because of concerns about his
language production and articulation, but otherwise she states that he is performing well in the kindergarten classroom.
Because circumstances prevented that evaluation from taking place, Mr. and Mrs. G had decided to seek an evaluation
at the Luse Center.
Austin’s birth and early health and developmental history are unremarkable except for delays in the onset of speech,
with only about 10 words by age 2 and no word combinations until age 3. Although he had shown a dramatic increase
in the length of his utterances over the past 2 years, his parents reported that he still speaks in incomplete sentences
and produces many words incorrectly. Both biological parents reported a significant history of family members with
speech and language problems, including Mr. C., who received speech therapy until 5th grade for what appeared to
have been language-related concerns, two of Austin’s paternal uncles, one maternal aunt in the preceding generation,
and two maternal cousins.
Page 40
LIST OF ASSESSMENT TOOLS
The assessment procedures that were conducted during this evaluation are listed and reported in the paragraphs that
follow.
In addition, informal procedures were used to screen pragmatics, voice, and fluency. Overall results of these tests and
procedures are described in the following sections, with more detailed information about subtest performance and
specific errors available on summary test forms (see file).
Hearing
Austin’s hearing was screened using pure tones that were presented under headphones at 20 dB bilaterally at 500, 100,
2000, and 4000 Hz. He passed the screening in both ears.
Receptive Language
Austin’s ability to understand what is said to him was assessed using receptive portion of the Test of Language
Development—2 Primary (TOLD-P:2) and the Peabody Picture Vocabulary Test—3 (PPVT-2). On the receptive
language subtests of the TOLD-P:2, Austin received a listening quotient of 96, which approximates a percentile rank of
50. On the PPVT-3, his performance was even better. The raw score he obtained was 78, which corresponds to a
percentile rank of 75 and a standard score of 110.
Expressive Language
Austin’s ability to express himself was assessed using the TOLD-P:2 expressive portions and the Expressive One-
Word Picture Vocabulary Test—Revised, as well as informal measures obtained from a transcription of a
conversational sample taken as Austin played with his mother. Austin’s formal test scores were considerably lower on
these measures, in part because of the difficulties associated with his speech intelligibility. On EOWPVT-R, Austin
received a raw score
Page 41
of 20, which corresponds to the 5th percentile and a standard score of 76. Of his 10 errors on that test, approximately 4
were unambiguous with respect to the possible impact of his speech production difficulties; for example, they involved
the use of a more general or associated word than the target, or they consisted of instances when Austin said that he did
not know the name. On the TOLD-P:2 expressive subtests, Austin received an overall speaking quotient of 61, which
falls below the first percentile. An examination of his utterances during a conversation with his mother revealed
frequent omission of grammatical morphemes, an absence of complex sentences, and a tendency to overuse the word
‘‘thingy” to refer to numerous elements of a Lego construction that they built cooperatively.
Phonology and Oral–Motor Performance
The Oral Speech Mechanism Examination—Revised (OSME-R) was used to examine the adequacy of Austin’s oral
structures for speech production. His performance on that measure was well within the normal range, with no signs of
incoordination or weakness and no observable abnormalities of the structures used in speech. Errors noted in the
production of repeated syllables mirrored those in his conversational speech.
On the Bankson–Bernthal Test of Phonology, Austin received a word inventory score, which reflects the number of
words produced correctly, of 39, which corresponds to a Standard Score of 71 and a percentile rank of 3. Errors
occurred primarily on medial or final consonants. Patterns of errors that occurred most frequently were final consonant
deletion (omission of the final consonant in the word; e.g., "bat" becomes "ba"), cluster simplification (replacement or loss of one or more elements of a consonant cluster; e.g., "clown" becomes "clo"), and fronting (replacement of a velar
consonant by a more forward consonant; e.g., "gun" becomes "dun"). Efforts to elicit correct production of two consonants that had not been produced correctly up to that point (viz., k, g) were undertaken using phonetic placement instructions and touch cues; these efforts resulted in velar fricative approximations. Other sounds consistently in error
were [s, z, r] and [l].
When the language sample discussed in the previous section was examined with regard to speech errors and
intelligibility, very similar error patterns were observed, and the percentage of understandable words out of all words
spoken was determined to be 70%.
Screening for Other Language and Speech Problems
The conversational sample between Austin and his mother was also examined to screen for problems in pragmatics,
voice, and fluency.

Page 42

Austin's use of language and his ability to describe the plot of a movie he had recently seen without his mother appeared appropriate for his age. His voice quality and pitch were normal. Fluency also appeared normal, although frequent repetitions and rewordings of sentences occurred in response to his mother's verbal and nonverbal indications of having difficulty in understanding some of his utterances. Although Austin's awareness of his communication difficulties is quite sophisticated for a child of his age, his facial expression and movements at times suggested significant frustration.
Summary
Austin appears to be a bright and sensitive 5-year-old with no significant medical history, but a family history of
communication difficulties. Today’s evaluation reveals normal hearing and language comprehension, as well as good
conversational skills and normal voice and fluency. Austin’s difficulties in being understood are moderate to severe at
this time and appear to reflect his difficulties in using sounds as expected for his age and in selecting and combining
words to create grammatically acceptable sentences. His strong skills in other areas, support from family and school
personnel, and clear motivation to improve his communication efforts suggest a very positive prognosis for change.
Recommendations
Austin is likely to benefit from speech-language intervention conducted in individual and group settings at his school,
including in-class work conducted by his teacher in consultation with the school speech-language pathologist. Areas to
be targeted include phonology, expressive vocabulary, and syntax. Specific goals should address (a) the phonological
processes of final consonant deletion and fronting, (b) expressive vocabulary related to school activities, (c) the use of
grammatical morphemes that are not currently used but should be pronounceable given his current phonological
system, and (d) the development of strategies for dealing in a more relaxed way with listeners' difficulties in understanding Austin's speech.
It was a pleasure to meet Austin and his family today and to have talked previously to others involved in his education
and upbringing. We urge you to call with any questions you might have about this report or Austin’s ongoing
development.
Sincerely,
Validity
Reliability
Historical Background
The historic roots of behavioral measurement can be traced to tests used in the third century B.C. by the Chinese military
for the purpose of identifying officers worthy of promotion (Nitko, 1983). Despite such early beginnings, however,
widespread interest in measurement for purposes such as helping children has far more recent origins, beginning at the
close of the 19th century. Not surprisingly, therefore, there are many threads of thought leading to the diversity of
instruments and procedures now being used to describe and make decisions about people.
During the 20th century, perspectives on how to develop and use measures such as those used to help children with
developmental language disorders have come from education, psychology, and—most recently—speech-language
pathology. Over this relatively brief period of time, professional and academic organizations in these fields have taken
on the responsibility of developing standards of test development and use. These efforts have primarily focused on tests,
where test is defined as a behavioral measure in which a structured sample of behavior is obtained under conditions in
which the tested individual is expected (or at least has been instructed) to do his or
Page 50
her best1 (AERA, APA, & NCME, 1985). Despite a focus on tests in this narrow sense, such standards have always been
meant to apply to all behavioral measures—although they apply to a greater or lesser extent depending on the specific
characteristics of the measure.
Most notable among efforts to provide guidance to test developers and users have been those of the APA, AERA, and
NCME. In 1966, after two earlier sets of testing standards (APA, 1954; National Education Association, 1955), the three
organizations worked together to create a single document, Standards for Educational and Psychological Tests and
Manuals, which has gone through two revisions. The most recent revision was renamed Standards for Educational and
Psychological Testing (AERA, APA, & NCME, 1985).
The frequent revision of these standards reflects the brisk pace of research and ongoing discussion about behavioral
measurement. One particularly important transition occurring within the past two decades is reflected in the change of
title from Standards for… Tests to Standards for… Testing. This change emphasizes the centrality of the test user in
measurement quality. Earlier editions focused on ways in which test developers could demonstrate the quality of their
instruments. Far less attention was paid to issues related to actual test administration and interpretation. In fact, whereas
75% of the 1974 version related to test standards, only 25% of it related to standards of test use. In the most recent
version, there has been almost a reversal in those percentages: about 60% relates to test use versus 40% to test standards.
This shift is consistent with the most influential work conducted in the last decade in which test users are asked to
consider not simply the technical adequacy of methods used to derive specific test scores, but also the impact their
decisions will have (Messick, 1989). Not surprisingly, the term ethics has cropped up frequently in the course of these
discussions. It will surface frequently in this text as well.
Beginning with this chapter, I hope that readers will adopt a perspective similar to that set by the APA, AERA, and
NCME (1985). Specifically, I hope that you will consider measurement quality in developmental language disorders as an
arena in which many elements come into play, but in which you are the lion tamer, the person who remains expertly in
charge of a potentially dangerous situation. In this chapter and the one that follows it, I focus on how best to select
appropriate measures once you have a fairly specific application in mind. Chapters in Part II focus on those specific
applications commonly faced by clinicians who work with children who have developmental language disorders. Those
chapters will figure prominently in helping you learn to tailor your measurements to the specific purposes you have in
mind—a key lesson for those interested in providing their clients with the best possible care.
The remainder of this chapter is intended to introduce you to validity and reliability, two concepts that invariably
dominate discussions of measurement quality. Validity is the more central of the two terms; it might even be said
that any discussion of measurement quality is automatically a discussion of validity. Reliability is of lesser
importance but is still vital. Its secondary place derives from its role as prerequisite for, but not sole determinant
of, validity.
1 This assumption is probably not well founded for many children with language disorders, who may be unable to
understand what it means to “do one’s best” or who may be unwilling to do it. I return to this issue at numerous points
throughout this book.
Validity
Validity can be defined as the extent to which a measure measures what it is being used to measure. So, you might ask,
what’s all of the fuss about? Despite its seeming simplicity, however, the concept of validity has a number of subtle
nuances that can be difficult to grasp for even the most seasoned users of behavioral measures. Several misconceptions
are evident when a test user or developer says sweepingly that a given test is a valid test. First, this kind of statement
about a measure suggests that it somehow possesses validity, independent of its use for a particular purpose. Second, it
suggests that validity is an all-or-nothing proposition. Both of those suggestions are untrue, however. What can safely be
said about a given measure is that it seems to have a certain level of validity to answer a specific question regarding a
specific individual. However, even reaching that less-than-definitive-sounding conclusion requires considerable work on
the part of the clinician.
To explore the general concept of validity a little more fully, consider a specific, widely used measure—the Peabody
Picture Vocabulary Test-III (Dunn &amp; Dunn, 1997). That measure was developed for the purpose of examining receptive
vocabulary in a wide variety of individuals using a task in which a single word is spoken by the test giver and the test
taker points to one picture (from a set of four) to which the word corresponds. Despite the exceptionally detailed
development undergone by the PPVT-III, it is nonetheless quite easy to imagine situations in which its use could lead to
highly invalid conclusions and, thus, for which its validity could be questioned. For example, using the PPVT-III to
reach conclusions about a test taker’s artistic talent, or about the vocabulary of someone who does not speak English,
represents a gross example of how misapplication undermines validity.
One can also imagine—or simply observe—less obvious yet similarly problematic applications of the PPVT-III. For
example, the PPVT-III might be used to draw conclusions about overall receptive language, rather than receptive
vocabulary only. It might also be used to examine the receptive vocabulary skills of an individual or group lacking much
previous exposure to many vocabulary items pictured in the exam. In each of these cases, the validity of the test’s use
would be adversely affected, although probably not to the degree of the first, extreme examples. Thus, these latter
examples illustrate the continuous nature of validity: a measure misused in these ways is less valid than one used
appropriately, but more valid than one wildly misused. These last two examples are also poignant because they aren’t just
hypothetical, but actual ones that readily occur if a clinician is careless or naive about the concept of validity.
As another way of thinking about these problems in validity, consider two questions: (a) Is something other than the
intended construct actually being measured by the indicator (the test)? and (b) Does the indicator reflect its target
construct in such a limited way that much of the meaning of the construct is lost? Affirmative answers to either or both
of those questions chip away at the value of the indicator as a means
of measuring the intended construct and, by definition, chip away at the measure’s validity. Thus, when the PPVT-III is
used as a measure of receptive language as a whole, the construct of receptive language is greatly impoverished, hence
one can conclude that reduced validity is a strong risk. When, on the other hand, it is used to measure vocabulary skills
in individuals who have not had much exposure to the vocabulary, it may become a measure of exposure to the
vocabulary rather than learning of the vocabulary, thus reducing the measure’s validity because the test would not be
measuring what it was supposed to measure.
Given the continuous nature of validity and the considerable specificity with which it must be demonstrated, how does
one ascertain that a measure is valid enough to warrant use for a particular purpose? In the next section I outline methods
that are used by test developers and other researchers to provide support of a general nature—that is, suggesting broad
parameters associated with its useful application. Methods used by test users to evaluate that support in terms of a
specific application are described in the next chapter.
Ways of Examining Validity
The methods used to demonstrate that a measure is likely to prove valid for a general purpose (such as identifying a
problem area or monitoring learning) have grown in number and sophistication over the years. Although the methods are
highly interrelated, they are nonetheless characterized as falling into three categories: construct validation, content
validation, and criterion-related validation. These three categories are ordered beginning with the most important.
Construct Validation
Construct validation refers to the accumulation of evidence showing that a measure relates in predicted ways to the
construct it is being used to measure—that is, to show that it is an effective indicator of that construct. A wide variety of
evidence falls into this category, including evidence that is described as content- or criterion-related in the sections that
follow. If that seems confusing to you at first, you are not alone; the theoretical centrality of construct validity has only
recently been recognized. Until then, validity was usually portrayed as composed of three parts rather than as a unity.
Figure 3.1 portrays the relationship between the three types of validity evidence. It also conveys the two meanings of
construct validity—(a) as a cover term for all types of validity evidence and (b) as a term used to refer to several
methods of validation that are not seen as fitting under either content- or criterion-related validation techniques.
The underlying similarity of methods uniquely defined as demonstrating construct validity can perhaps best be seen
through a discussion of the earliest stages in measurement development. When approaching the development of a
behavioral measure, the developer considers how the construct to be measured (such as receptive vocabulary) is related
to other behavioral constructs and events in the world (such as age, gender, other abilities). Also considered at this stage
are possible indicators (such as pointing at named pictures or acting out named actions) that might reasonably be used
to obtain information about the construct and thereby serve as the basis for the measure.
Fig. 3.1. A graphic analogy illustrating the different kinds of evidence of validity.
For example, in the case of receptive vocabulary as a possible construct, the test developer begins with a scientific
knowledge base that supports expectations about how receptive vocabulary is affected by phenomena such as age and
gender. That knowledge base also generates expectations about how the construct is related to other behavioral
constructs such as expressive language development and hearing ability. From this knowledge base, the developer
formulates predictions about how a valid indicator, or measure, will be affected by such phenomena and how such a
valid indicator will be related to other constructs. Evidence suggesting that the measure acts as predicted supports claims
of construct validity. Four specific methods of construct validation are discussed in upcoming paragraphs—
developmental studies, contrasting group studies, factor analytic studies, and convergent-discriminant validation studies.
For many measures used with children, two kinds of studies are frequently used to provide evidence of construct validity
—developmental studies (sometimes called age differentiation studies) and studies in which groups who are believed to
differ in relation to the construct are contrasted with one another (sometimes called group differentiation studies). Table
3.1 provides an example of the description provided for each of these types of study. The specific examples used here are
neither the most thorough nor the most sophisticated possible; instead, they are meant to help you anticipate the way
such studies are described in test manuals.
The developmental method of construct validation is based on the general expectation that language and many related
skills of interest increase with age. The hypothesis tested in this type of validation study is that performance on the
measure being studied will improve with age.

Table 3.1
Examples of Test Manual Descriptions of Two Types of Construct Validation Studies

Developmental studies: “Correlational methods were used to determine if performance on the TWF [Test of Word
Finding] changes with age. Using the Pearson product-moment correlation procedure, TWF accuracy scores (scale
scores generated from the Rasch analyses) were correlated with the chronological age of the 1,200 normal subjects in
the standardization sample…. All coefficients were statistically significant and of a sufficient magnitude to support the
construct validity of the TWF as a measure of expressive language for both boys and girls and of children of different
ethnic and racial background.

Comparison of accuracy scores at each grade level also reflected developmental trends as the accuracy scores of the
normal subjects in the standardization sample increased across grades…. These findings, which support grade
differentiation by the TWF for all but one grade, are a further indication of developmental trends in test performances
on the TWF.” (German, 1986, p. 5)

Contrasting group studies: “In order to test the capacity of the TELD [Test of Early Language Development] to
distinguish between groups known to differ in communication ability, we administered the TELD to seventeen children
who were diagnosed as ‘communication disordered’ cases. No children with apparent hearing losses were included in
the group. Eighty percent of the children were white males; they ranged in age from three to six and a half. In
socioeconomic status, sixty-four percent were middle class or above. All of the children attended school in Dallas,
Texas.

The mean Language Quotient (LQ) derived from the TELD for this group was 76. Since the TELD is built to conform
to a distribution that has a mean of 100 and a standard deviation of 15, it is apparent that the observed 76 LQ represents
a considerable departure from expectancy. It is a discrepancy that approaches two standard deviations from normal.
These findings were taken as evidence supporting the TELD’s construct validity.” (Hresko, Reid, &amp; Hammill, 1981,
p. 15)
As you probably recall from previous course work, developmental studies of this kind can take a couple of different
forms—one (called a longitudinal study) compares the performances of a single group of children across time, and a
second (called a cross-sectional design) compares the performances of several groups of children, each group falling at a
different age. Cross-sectional studies are particularly popular among test developers, undoubtedly because the data
needed to test the hypothesis are the same as those needed to provide norms.
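To make the logic of a cross-sectional developmental study concrete, here is a minimal sketch. The ages and scores below are invented for illustration; an actual study would use a full standardization sample, as in the TWF example in Table 3.1.

```python
# Sketch of the developmental (cross-sectional) method of construct
# validation: if a measure taps a construct that grows with age, scores
# from a cross-sectional sample should correlate positively with age.
# The data here are invented for illustration only.

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical sample: age in months paired with a raw test score.
ages   = [36, 42, 48, 54, 60, 66, 72, 78, 84, 90]
scores = [11, 14, 15, 19, 22, 21, 26, 28, 31, 33]

r = pearson_r(ages, scores)
print(f"r(age, score) = {r:.2f}")  # prints: r(age, score) = 0.99
```

A strong positive coefficient like this is the kind of evidence test manuals cite as support for construct validity under the developmental method.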
A second major type of construct validation study, which can be called the contrasting groups method of construct
validation, tests the hypothesis that two or more groups of children will differ significantly in their performance on the
targeted measure. Again, consider receptive vocabulary as the example. Obviously developing a test of receptive
vocabulary for use with children only makes sense if you believe that there are some children whose performance falls so
far below that of peers as to
have significant negative consequences. For this type of measure, one might evaluate construct validity by finding
groups of children who are thought to differ in their receptive vocabulary knowledge (e.g., children with a
developmental language disorder vs. children without such a disorder). In this type of study, if the measure is a valid
reflection of the construct, children who have been identified as differing in relation to the construct should also differ in
their performance on the measure. See Table 3.1 for an example of a validation study of this type.
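The arithmetic behind a contrasting-groups result such as the TELD example in Table 3.1 can be sketched in a few lines. The function name is my own; the normative mean (100) and standard deviation (15) come from the manual excerpt quoted above.

```python
# Sketch of the contrasting-groups method, using the TELD-style numbers
# quoted in Table 3.1: a clinical group's mean quotient is compared with
# the normative mean (100, SD 15). A large standardized gap between groups
# known to differ on the construct supports construct validity.

def gap_in_sds(group_mean, norm_mean=100.0, norm_sd=15.0):
    """Distance of a group mean below the normative mean, in SD units."""
    return (norm_mean - group_mean) / norm_sd

gap = gap_in_sds(76)  # the clinical group's mean LQ from the manual excerpt
print(f"clinical group falls {gap:.1f} SDs below the normative mean")
# prints: clinical group falls 1.6 SDs below the normative mean
```

The 1.6-SD gap is what the manual describes as "approaching two standard deviations from normal."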
A third category of construct validity study is identified through the use of a specific statistical technique—factor
analysis. Factor analysis is less frequently used in speech-language pathology than it is in some other disciplines. For
example, it has been used most extensively to study intelligence tests. Besides its value as a means of studying an
already developed measure, factor analysis is frequently used in early stages of test development as an aid in selecting
items from a pool of possible items.
The term factor analysis describes a number of techniques used to examine the interrelationships of a set of variables and
to explain those interrelationships through a smaller number of factors (Allen & Yen, 1979). Factor analysis assists
researchers in the very difficult process of making sense of a large number of correlations, the most basic method for
describing interrelationships (as described in chap. 2).
In factor analytic studies, the original set of variables to be studied typically consists of a group’s performance on the
target measure as well as a set of other measures, some of which tap the same construct as the target measure. Although
a factor does not correspond exactly to a specific underlying construct, all measures related to a single construct are
expected to be associated with a single factor. Therefore, construct validity would be demonstrated in this
type of study when the target measure shares, or “loads on,” the same factor as measures for which validity with respect
to a particular construct has already been demonstrated (Pedhazur & Schmelkin, 1991).
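As a rough illustration of how indicators of the same construct "load on" a common factor, the following sketch simulates two clusters of measures and extracts two factors with a simple eigendecomposition of the correlation matrix. The measure names and data are invented, and real test development would use dedicated factor-analytic software rather than this bare-bones approach.

```python
import numpy as np

# Toy factor-analytic check: three invented "vocabulary" measures and two
# invented "motor" measures. If the vocabulary measures are valid indicators
# of one construct, they should load together on a single factor, separate
# from the motor measures. (Data and measure names are hypothetical.)

rng = np.random.default_rng(0)
n = 500
vocab_ability = rng.normal(size=n)  # latent construct 1
motor_ability = rng.normal(size=n)  # latent construct 2

measures = np.column_stack([
    vocab_ability + 0.5 * rng.normal(size=n),  # picture pointing
    vocab_ability + 0.5 * rng.normal(size=n),  # word definition
    vocab_ability + 0.5 * rng.normal(size=n),  # synonym choice
    motor_ability + 0.5 * rng.normal(size=n),  # pegboard speed
    motor_ability + 0.5 * rng.normal(size=n),  # bead threading
])

R = np.corrcoef(measures, rowvar=False)             # 5 x 5 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)                # eigenvalues ascending
loadings = eigvecs[:, -2:] * np.sqrt(eigvals[-2:])  # two retained factors

# The three vocabulary measures share large loadings on one factor and the
# two motor measures on the other, mirroring "loading on the same factor."
print(np.round(loadings, 2))
```

In this simulation the last column (the largest factor) picks out the three vocabulary measures and the other column the two motor measures, which is the pattern a validation study would interpret as construct-consistent.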
A particularly sophisticated method of studying construct validity has been applied to measures of a variety of
behavioral constructs, but only rarely to speech and language measures. That is the method known as convergent and
discriminant validation (Campbell &amp; Fiske, 1959), which is associated with a type of experimental design its authors
called a multitrait–multimethod matrix. Because of the relative rarity of this approach for
measures used with children who have language disorders, I do not discuss it in detail. However, because this method is
sometimes used for measures you will be interested in, it is important to know that convergent validation refers to
demonstrations that a measure correlates significantly and highly with measures aimed at the same construct, but using
different methods; discriminant validation refers to demonstrations that it does not correlate significantly and highly with
measures targeting different constructs (Pedhazur & Schmelkin, 1991).
An example from Anastasi (1988) may help make the ideas behind convergent and discriminant validation clearer:
Correlation of a quantitative reasoning test with subsequent grades in a math course would be an example of convergent
validation. For the same test, discriminant validity would be evidenced by a low and insignificant correlation with scores
on a reading comprehension test, since reading ability is an irrelevant variable in a test designed to measure quantitative
reasoning. (p. 156)
In short, validity is supported in this approach through evidence that the measure under study is measuring what it is
supposed to measure in a manner uncontaminated by its relationship to something else that it was not supposed to
measure.
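Anastasi's example can be restated as a small sketch: a convergent correlation is expected to be high, a discriminant correlation low. The coefficients and measure names below are hypothetical, and the .5 cutoff is an arbitrary threshold for illustration only; real studies judge the full pattern of coefficients, not a single cut point.

```python
# Minimal sketch of how convergent and discriminant evidence is read off a
# set of correlations with a quantitative reasoning test. The measure names
# and coefficients are invented; an actual multitrait-multimethod study
# crosses several traits with several methods (Campbell & Fiske, 1959).

target = "quantitative reasoning test"
correlations = {
    "math course grades":    0.71,  # same construct, different method
    "arithmetic interview":  0.65,  # same construct, different method
    "reading comprehension": 0.18,  # different construct
}
same_construct = {"math course grades", "arithmetic interview"}

for other, r in correlations.items():
    kind = "convergent" if other in same_construct else "discriminant"
    # Evidence supports validity when high r pairs with the same construct
    # and low r pairs with a different construct.
    verdict = "supports" if (r >= 0.5) == (other in same_construct) else "questions"
    print(f"{kind:12s} r = {r:.2f} with {other}: {verdict} validity")
```

Here all three lines "support" validity: the target correlates highly with same-construct measures and weakly with the different-construct measure.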
In the context of their discussion of convergent and discriminant validation, Pedhazur and Schmelkin (1991) discussed a
pair of fallacies that threaten researchers’ understanding of the evidence they obtain using this method, but that apply
equally to test selection. Cleverly, they have been termed the “jingle and jangle fallacies.” Jingle fallacies
arise when one assumes that measures with similar names must tap similar constructs; whereas jangle fallacies arise
when one assumes that measures with dissimilar names must tap dissimilar constructs. Obviously, close examination of
actual content can help ward off the deluding effects of such thinking.
Although I have discussed only four methods of construct validation, many more are actually used, including those
that have conventionally been identified with content- and criterion-related validation. Methods fitting under content-
and criterion-related validation, which are typically viewed as more easily understood than construct validation, are
discussed next.
Content Validation
Content validation involves the demonstration that a measure’s content is consistent with the construct or constructs it is
being used to measure. As with construct validity, the developer addresses concerns about content validity from the
earliest stages of the measure’s development. Such concerns necessitate the use of a plan to guide the construction of the
components of the measure (test items, in the case of standardized tests). The plan ensures that the components of the
measure will provide sufficient coverage of various aspects of a construct (often called content coverage) while avoiding
extraneous content unrelated to the construct (thus assuring content relevance). Later, content validity is evaluated
directly, usually through the use of a panel of experts who evaluate the original plan and the extent to which it was
effectively executed. Table 3.2 lists the basic steps involved in the development of standardized measures.
Despite underlying similarities, the specific ways in which concerns regarding content validity affect the development
process differ for norm-referenced and criterion-referenced measures. Before attempting a comparison of these
differences, recall
Table 3.2
Steps Involved in the Development of a Standardized Measure
(Allen & Yen, 1979; Berk, 1984)
Table 3.3
Examples of Test Manual Descriptions of Two Types of Criterion-Related Validation Studies

Concurrent validity
Test of Phonological Awareness (TOPA): “When the TOPA (Test of Phonological Awareness) was given to a sample of
100 children at the end of kindergarten, it was found to be significantly correlated with two other, relatively different
measures of phonological awareness. The TOPA-Kindergarten scores were correlated with scores from a measure called
sound isolation (a 15-item test requiring pronunciation of the first phoneme in words) at .66 and with a segmentation
task (requiring children to produce all the phonemes in a three- to five-phoneme word) at .47. Both of these other
measures assessed analytical phonological awareness, although they required a more explicit level of awareness than
did the TOPA.” (Torgesen &amp; Bryant, 1994, p. 24)

Preschool Language Scale-3 (PLS-3): “A study of the relationship between PLS-3 and CELF-R [Clinical Evaluation of
Language Function-Revised (Semel, Wiig, &amp; Secord, 1987)] was conducted with 58 children. The sample consisted of
25 males and 33 females ranging in age from 5 years to 6 years, 11 months (mean = 6 years, 0 months). The two tests
were administered in counterbalanced order. The between-test interval ranged from two days to two weeks, with an
average of 4.5 days. Both tests were administered by the same examiner. Reported correlations were as follows: PLS-3
Auditory Comprehension with CELF-R Receptive Composite (r = .69); PLS-3 Expressive Communication with CELF-R
Expressive Composite (r = .75); PLS-3 Total Language score with CELF-R Total Score (r = .82).” (Zimmerman, Steiner,
&amp; Pond, 1992, p. 95)

Predictive validity
Test of Phonological Awareness (TOPA): “When the TOPA-Kindergarten was given to 90 kindergarten children
sampled from two elementary schools serving primarily low socioeconomic status and racial minority children, its
correlation with a measure of alphabetic reading skill (the Word Analysis subtest from the Woodcock Reading Mastery
Test) at the end of first grade was .62. Thus, between 30% and 40% of the variance in word-level reading skills in first
grade was accounted for by the TOPA administered in kindergarten.” (Torgesen &amp; Bryant, 1994, p. 24)

Receptive-Expressive Emergent Language Scale (REEL-2): “In the first study investigating predictive validity,
researchers at the University of Florida’s Emergent Language laboratory conducted a longitudinal study of 50 ‘normal’
infants from linguistically enriched environments. After repeated monthly testing over a 2- to 3-year period, all infants
were found to achieve mean average scores for Receptive Language Age (RLA), Expressive Language Age (ELA), and
Combined Language Age (CLA) at or about their chronological ages.” (Bzoch &amp; League, 1992, p. 10)
has been to examine performances of an item tryout sample before and after instruction designed to produce mastery
(Allen & Yen, 1979). In that context, better items are those in which p values show the greatest upward change.
As was the case with norm-referenced measures, the last step of the test construction for a criterion-referenced measure
involves the collection of initial information about the instrument’s overall validity and reliability and the preparation of
documentation concerning the instrument. Here, content validity is addressed using means similar to those used
for norm-referenced measures. In addition to providing descriptive evidence of the procedures used to develop the test’s
content, test authors look to the results of expert evaluations of construction methods and final test content as a further
source of content validation.
TESTING AND MEASUREMENT CLOSE-UP
Anne Anastasi has been called one of “psychology’s leading women.” She was one of only five women (of a total of
96 psychologists) to be considered during the first eight decades of this century in a prominent series of books
recording the history of psychology through autobiography (Stevens & Gardner, 1982). Although Anastasi has made
contributions in a variety of areas in psychology, she is included here because of her authorship of a
classic text on psychological testing (Anastasi, 1954). That text has gone through seven editions, with the latest edition
published in 1997. It has undoubtedly served as the source of more information on testing for psychologists and others
than perhaps any other work, and in its latest edition, Anastasi (1997) again provided one of the clearest sources for
essential information on validity and reliability.
In the early 1980s, at the University of Arizona, I had the pleasure of hearing Anne Anastasi present a lecture, when
she was in her 70s. Her black patent leather pocketbook was propped up in front of her on the podium as she spoke, its
stiff handle almost obscuring the audience’s view of her white hair, thick horn-rimmed glasses, and the bright eyes that
lay behind them. I do not actually remember much about the details of her presentation, except that her speech was as
clear as her writing and was presented without a single note. She was as impressive in person as she had been on the
page.
The following passage from her autobiography breathes life into two very different ideas from this chapter. First, it
shows the possibly traumatizing effect that the process of assessment can have—even on a child whose biggest
problem appears to have been her exceptional intelligence. Second, it revisits the distinction between norm-referenced
and criterion-referenced (or as she calls it here, content-referenced) score interpretation.
“Throughout my schooling, I retained a deep-rooted notion that any grade short of 100 percent was unsatisfactory. At
one time I actually believed that a single error meant a failing score. I recall a spelling test in 4B, in which we wrote ten
words from dictation. I was unable to hear one of the words properly, because the subway had just roared past the
window (it was elevated in that area). The word was “friend,” but I heard it as “brand.” As a result, the item was
marked wrong and my grade was only 90%. That evening when I told my mother about it, she consoled me and
advised me to raise my hand at the time and tell the teacher, if anything like that should happen again. But she did not
disabuse me of the notion that anything short of a perfect score was a failure. I eventually discovered for myself that
one could pass despite a few errors; but I always felt personally uncomfortable with the
idea. There seemed to be some logical fallacy in calling a performance satisfactory when it contained errors. I was
apparently following a content-referenced rather than a norm-referenced approach to performance
evaluation.” (Anastasi, 1980, pp. 7–8)
Face Validity. One further topic regarding content validity that demands attention is not really a matter of true validity at
all, despite its being termed face validity. Face validity is the superficial appearance of validity to a casual observer. Face
validity is considered a potentially dangerous notion if a test user mistakenly assumes that a cursory evaluation of a
measure for its face validity constitutes sufficient evidence to warrant its adoption. Nonetheless, face validity can play a
role in a test’s actual validity; for example, poor face validity may cause a test taker to discount the importance of a
measure and thereby undermine its ability to function as intended.
In summary, the kind of evidence provided for norm-referenced versus criterion-referenced measures differs. However,
content validation for both types of measures is achieved through the author’s careful planning, execution, and reporting
of the measure’s development and through the positive evaluation of this process by experts in the content being
addressed.
Criterion-Related Validation
Criterion-related validation refers to the accumulation of evidence that the measure being validated is related to another
measure—a criterion—where the criterion is assumed to have been shown to be a valid indicator of the targeted
construct. Putting this in primitive terms, criterion-related validation involves looking to see if your “duck” acts like a
“duck.” This explanation derives from famous streetwise logic in which anything that looks like a duck, walks like a
duck, and quacks like a duck is determined to be a duck. Thus, as you set out to validate your measure (Duck 1), you
search around for a duck (Duck 2, a.k.a. Criterion Duck) that everyone acknowledges is indeed a true duck (i.e., a valid
indicator of the underlying construct). Then you put your ducks through their paces to see to what extent they act
similarly. The greater their similarities, the better the evidence that they share a common “duck-ness.” And then, voilà:
You have evidence of criterion-related validity!
In case I lost you there: for a behavioral measure, criterion-related evidence is obtained by finding a strong, usually
positive, correlation between the target measure and a criterion. The choice of the
criterion is crucial because of the assumption that the criterion has high validity itself. It can also be problematic because
for many constructs it may be difficult to find a criterion that can claim such an exalted status.
Two types of criterion-related validity studies are typically described: concurrent and predictive. Predictive validity is
most relevant when the measure under study will be used to predict future performance in some area. For example, the
Predictive
Screening Test of Articulation (PSTA; Van Riper & Erickson, 1969) was intended to predict whether a child tested at the
beginning of first grade would still be considered impaired in phonologic performance 2 years later. Consequently, this
type of evidence was important in demonstrating that the test would measure what it was supposed to measure. In that
particular case, the test developers used as the criterion measure the researcher’s judgments of normal articulation versus
continued articulatory errors based on a simple phonetic inventory and on samples of spontaneous connected speech
obtained 2 years after initial testing with the PSTA.
A study of concurrent validity is performed when the criterion and target measures are studied simultaneously in a group
of individuals like those for whom the test will generally be used. It is by far the more common type of criterion-related
validity study. See Table 3.3 for an example of this type of validity study.
For both predictive and concurrent studies of criterion-related validity, the resulting correlation coefficient is often
termed a validity coefficient or, more specifically, a predictive or concurrent validity coefficient, respectively.
Interpretation of such coefficients is essentially the same as that described for correlations in chapter 2. However, one
factor in the interpretation of validity coefficients that was not addressed previously concerns how high a correlation has
to be for one to consider it credible support of a measure’s valid use for a particular purpose. The Standards for
Educational and Psychological Testing (APA, AERA, & NCME, 1985) does not provide direct guidance on this
question. However, several experts recommend that when a measure is going to be used to make decisions about an
individual (rather than as a way to summarize a group’s performance), a standard of .90 should be used. As an additional
proviso, the correlation coefficient should also be found to be statistically significant (Anastasi, 1988).
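One common aid in interpreting a validity coefficient is to square it, giving the proportion of criterion variance the measure accounts for, as in the TOPA example quoted earlier. A minimal sketch:

```python
# A validity coefficient is interpreted like any correlation: its square is
# the proportion of variance in the criterion accounted for by the measure.
# Worked here with the predictive coefficient quoted earlier for the TOPA
# (r = .62) and with the .90 standard suggested for individual decisions.

def variance_accounted(r):
    """Proportion of criterion variance explained by the measure."""
    return r ** 2

print(f"r = .62 -> {variance_accounted(0.62):.0%} of variance")  # about 38%
print(f"r = .90 -> {variance_accounted(0.90):.0%} of variance")  # about 81%
```

Note how steeply the variance accounted for falls as r drops: even the stringent .90 standard leaves nearly a fifth of the criterion variance unexplained.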
Factors Affecting Validity
Anything that causes a measure to be sensitive to factors other than the targeted construct will diminish the measure’s
validity. For example, a bathroom scale that becomes sensitive to room temperature or humidity is likely to be less valid
as an indicator of how much damage one has done after a series of holiday meals. In this section of the chapter, I consider
factors affecting the validity of behavioral measures such as those used with children—first considering two factors over
which the clinician has considerable direct control, then two factors over which the clinician’s control is far less direct.
Selection of an Appropriate Measure
As mentioned at the beginning of this chapter, probably the biggest factor affecting the validity of decisions made using
a particular measure is the suitability of the match between the specific testing purpose and the demonstrated qualities of
the measure to be used. The majority of information described thus far relates to activities performed by the developer of
a standardized measure. Still to be discussed is how test users make use of that information to do their rather large part in
assuring the validity of their own test use. For the moment, it is sufficient that you be aware that your role is critical in
assuring testing validity and that it begins with a thorough evaluation of information provided by the
Page 63
test developer, test reviewers, and the clinical literature in light of your client’s needs. Specific steps leading to such an
evaluation are described in the next chapter.
Administration of the Measure
After successful selection of a measure, the clinician plays a critical role in assuring validity through its skilled and
accurate administration. Unless a measure is administered in a manner consistent with the methods used in developing
the measure’s norms and testing its reliability and validity, any comparison of the resulting performance against either
norms or performance standards becomes distorted, even nonsensical. Thus, for example, the directions supplied with a
test may indicate that orally presented items are to be read aloud only once. In that case, the difficulty of that test will
probably be lessened if the test user decides that it’s only “fair” to the child to give a second chance to hear the
information included in the item. In reality, however, it is decidedly “unfair” to the child if the test is being used to
provide information about how that child’s performance compares with a standard that was determined under different
conditions.
Skilled administration of standardized measures, however, goes well beyond the preservation of idealized conditions. It
also facilitates a crucial but sometimes overlooked function of a testing situation—that is, the establishment of a trusting,
potentially helpful relationship between the clinician and the child being tested. If test administration goes well, the child
comes away from the experience with a sense that the test giver likes the child and is a rewarding person with whom to
interact. If it does not, not only will the test data be compromised, but the child may develop expectations of the test
giver that will be difficult to overcome. Indeed, some researchers (Maynard & Marlaire, 1999; Stillman, Snow, &
Warren, 1999) who examine the testing process in detail note that far too little attention is paid to the collaborative
nature of testing, in which the examiner is not a passive conduit of items but an integral participant in the testing
outcome. Table 3.4 lists some suggestions gleaned from several years of clinical experience (my own and others’)
concerning how to facilitate testing.
Client Factors
Client factors are such a key feature to valid testing of children that it seems worth discussing them under a separate
heading. Of particular interest are motivation and what Salvia and Ysseldyke (1995) called enabling behaviors and
knowledge.
Motivation affects the performance of adults and children in often dramatic ways. Although the topic of motivation has
been the impetus for extensive research in several disciplines, you can quickly appreciate the devastating impact of low
motivation by looking back over your own experiences and remembering an occasion when a classroom quiz or test fell
at a time when you were preoccupied by other things happening in your life, or perhaps a time when you “psyched
yourself out,” thereby seemingly necessitating the fulfillment of a prophecy of failure. For me, the experience that
comes to mind is a midterm examination I took in college. I had found an unconscious but still breathing mockingbird on
my way to the exam. Consequently, during the examination, I spent much more of my time wondering whether the bird
would still be alive when I finished it and where I could get help for it if it were still alive, than I spent actively focused
on the outcome of the examination. With predictable results. (Sadly, the bird fared no better than my exam grade.)
Page 64
Table 3.4
Testing Recommendations
1. Remember that children rarely have much sophistication in test-taking skills. They expect your relationship with
them to be based on the same rules that apply to interactions in other situations. Therefore it is your responsibility
to honor their expectations and find ways to achieve your goals within that context.
2. Children’s efforts to achieve their best for you will be built on the expectation that you and they are out to please
each other in the interaction. You want to be accepted by the child as a rewarding, appreciative adult who is
generally fun to be with.
3. For older children, you need to strive for a balance in which you are in control as much as you need to be to have
your questions answered and the child is in control as much as possible otherwise. For example, it is important that
you maintain control over your test materials, are relatively firm when you make a request that is a necessary part
of the testing process, and only offer choices where they are truly available (e.g., avoid asking questions such as the
following if they are not true offers: “Do you want to look at some pictures with me now?”).
4. Help children cooperate by informing them about the content, order, and time frames associated with various
assessment tasks. Toward this end, consider doing the following: (a) Whenever possible, allow the child to make
choices in ordering activities, and (b) devise a method to let children know how much more is required of them. For
older children, you can use a list where each completed item is checked off or rewarded with a sticker. For younger
or less sophisticated children, you can use tokens equaling the number of activities, which are removed from sight
or moved to a different location as each activity is finished.
Motivation is particularly critical for measures that are intended to elicit one’s best effort. One variety of such measures
consists of those in which clients are assumed to be doing their best under conditions stressing accuracy, speed of execution, or
both. These are sometimes called maximal performance measures. Common examples of maximal performance
measures in childhood language disorders include measurement of language functions in which responses are timed as
well as a variety of speech production measures, including diadochokinetic rate. In a discussion of such measures used to
study speech production, Kent, Kent and Rosenbek (1987) cautioned that extreme care should be taken before
concluding that a test taker is fully aware and motivated and therefore likely to produce a performance that can
reasonably be compared with norms or behavioral standards. The need for caution is particularly great for younger
children and for children with either Down syndrome or autism, but it should always be a concern for any child. Whereas
the level of concern should be greatest for maximal performance testing, any testing of a child will be subject to reduced
validity if the child is uninterested or overly anxious.
Enabling behaviors and knowledge are defined by Salvia and Ysseldyke (1995) as “skills and facts that a person must
rely on to demonstrate a target behavior or knowledge.” If an assumed enabling behavior is absent or diminished,
performance on the
Page 65
measure may no longer be associated with the behavior under study; hence its validity is threatened dramatically.
Enabling behaviors that are frequently assumed in children’s language tests include adequate vision, hearing, motor skill,
and understanding of the dialect in which the test is constructed. In fact, although I discussed it earlier as a separate
category, positive motivation to participate in assessment is a frequently assumed enabling behavior.
Reliability
Reliability, or consistency in measurement, is invariably listed as a major factor affecting validity because it is a
necessary condition for validity, meaning that a measure can only be valid if it is also reliable. Reliability does not assure
validity, however. Figure 3.2 illustrates this relationship between reliability and validity using archery as an analogy.
Target number 1 demonstrates the handiwork of an archer whose aim is both reliable and valid; number 2, an archer
whose aim is reliable, but not valid; and number 3, an archer whose aim is neither reliable nor valid. In behavioral
measurement, the use of measures with degrees of reliability and validity similar to those shown in targets 2 and 3 will
have similarly negative outcomes, although unfortunately the outcomes may not be as obvious and, therefore, will be
harder to detect—and, possibly, to rectify.
Fig. 3.2. A graphic analogy illustrating the relationship between reliability and validity.
Page 66
One point (no pun intended) made by Fig. 3.2 is that reliability limits how valid a measure can be; any loss of reliability
represents a loss of validity. Thus, information about reliability can provide very important insight into the quality of a
measure. To illustrate this problem in a more lively way, imagine the problems associated with an elastic and therefore
unreliable ruler. Over repeated measurements of a single piece of wood with such a ruler, its user on each attempt might
try desperately, even comically, to apply exactly the same outward pressures to the ruler—almost certainly in vain, with
measurements of 5 inches one time, 6 inches the next, and so on. With such immediate feedback, the user of the measure
would surely recognize the hopeless lack of validity in these measurements and would undoubtedly go looking for a
better ruler. Unfortunately, when human behavior is being measured, even measures with reliability equivalent to that of
an elastic ruler would not be so easily recognized. Thus, because of the importance of reliability, the next section of this
chapter is devoted to a more detailed explanation of reliability—what it is and how it is studied.
Reliability
Reliability can be defined as the consistency of a measure across various conditions—such as conditions associated with
changes in time, in the individual administering or scoring the measure, and even changes in the specific items it
contains. If a measure is shown to be consistent in its results across these conditions, then its user can make inferences
from performance under observed conditions to behaviors and skills shown in other, unobserved conditions. In short,
acceptable reliability allows for generalization of findings obtained in the assessment situation to a broader array of real-
life situations—those in which test users are really more interested.
When the reliability of a measure is examined during the course of its construction, that information is frequently
represented using another type of correlation coefficient called a reliability coefficient. Alternatively, more sophisticated
statistical methods have been developed to examine the reliability of measures on the basis of an influential perspective
called Generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972), which attempts to examine several
sources of inconsistency simultaneously. These methods, however, are relatively recent and only infrequently applied in
speech and language measures (Cordes, 1994).
Another way of thinking about reliability is in terms of how it affects an individual score. The most popular framework
guiding this perspective on reliability is sometimes described as the “classical psychometric theory” or the “classical
true-score theory.” Although recent developments, including Generalizability theory, have eclipsed classical theory as
the cutting edge of psychometrics (Fredericksen, Mislevy, & Bejar, 1993), classical theory nonetheless pervades many of
the practical methods used by test developers and hence test users. Further, its continuing utility is praised even by those
actively working along other lines (e.g., Mislevy, 1993).
The most important assumption associated with classical true-score theory (Allen & Yen, 1979) is that an observed score
(a score someone actually obtains) is the sum of the test taker’s true score plus some nonsystematic error. Thus, the true
score is an
Page 67
idealization. It has alternatively been described as the score you would find if you had access to a crystal ball or as the
mean score a test taker would achieve if tested infinitely. Notice that error and reliability are related in this
perspective. Specifically, they are inversely related: the larger the reliability, the smaller the error.
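The true-score idea can be illustrated with a small simulation, sketched here in Python with an invented true score and error spread. Each observed score is the true score plus random, nonsystematic error, and the mean over many hypothetical administrations converges on the (unobservable) true score:

```python
import random
import statistics

random.seed(42)      # fixed seed so the illustration is reproducible

TRUE_SCORE = 100     # hypothetical test taker's true score (an idealization)
ERROR_SD = 5         # spread of the nonsystematic measurement error

# Classical true-score model: observed score = true score + random error.
observed = [TRUE_SCORE + random.gauss(0, ERROR_SD) for _ in range(10_000)]

print(f"one observed score:       {observed[0]:.1f}")
print(f"mean of 10,000 'retests': {statistics.mean(observed):.2f}")
```

Any single observed score may sit several points away from the true score, but the average across the simulated retests lands very close to it, which is exactly the sense in which the true score is the mean of infinitely many administrations.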
Besides its historical value, this perspective on reliability is useful because it foreshadows our ability to apply reliability
information obtained on a group to possible error in the observed score of an individual test taker, such as our client.
When the reliability of a measure is expressed in relation to individual scores, that information is represented using a
measure known as the standard error of measurement (SEM). Its mention here is meant to whet your appetite for further
information, which is provided later under the heading Internal Consistency.
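As a brief preview, the standard classical formula relates the SEM to the measure's standard deviation and its reliability coefficient: SEM = SD × √(1 − r). The following is a minimal sketch, using hypothetical values (SD of 15, reliability of .90) typical of many standardized scales:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical SEM: expected spread of observed scores around a true score."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: standard deviation 15, reliability coefficient .90.
sem = standard_error_of_measurement(sd=15, reliability=0.90)

observed = 85  # one child's hypothetical observed score
# A conventional 68% band (plus or minus 1 SEM) around the observed score.
print(f"SEM = {sem:.2f}")
print(f"68% band: {observed - sem:.1f} to {observed + sem:.1f}")
```

Note how the formula captures the inverse relation just described: as reliability approaches 1, the SEM shrinks toward zero, and observed scores cluster ever more tightly around the true score.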
Ways of Examining Reliability
Three types of reliability are of most frequent interest—test–retest reliability, internal consistency reliability, and
interexaminer reliability. A fourth type of reliability, alternate-forms reliability, is relatively infrequently used. The
methods used to demonstrate such reliability with a particular group of test takers depend to some extent on whether it
will be interpreted using a criterion-referenced or norm-referenced approach. Whereas there is widespread agreement
concerning the methods to be used to study the reliability of norm-referenced measures, debate continues concerning the
best methods to be used with criterion-referenced measures and whether methods traditionally developed for norm-
referenced measures can also be used with criterion-referenced measures (Gronlund, 1993; Nitko, 1983). I discuss
reliability primarily from the traditional, or norm-referenced, perspective, but note those points at which methods
recommended for criterion-referenced measures depart from that perspective.
Test–Retest Reliability
Test–retest reliability is studied in order to address concerns about a measure’s consistency over time. It is particularly
important where the characteristic being measured is thought to remain relatively constant for at least shorter periods of
time (such as 2 weeks to a month). Sometimes a distinction is made between examinations of reliability over periods of
time under 2 months and those of reliability over longer periods of time, which is then termed stability (e.g., Watson,
1983). However, more common is a tendency for the terms test–retest reliability and stability to be used interchangeably.
For norm-referenced measures used with children with language impairments, test–retest reliability is typically studied
by testing a group of children similar to those for whom the measure is intended on two occasions, usually no more than
a month apart. A correlation coefficient, called a test–retest reliability coefficient, is calculated to describe the
relationship between the two sets of scores and is interpreted in a manner identical to that used for previous correlation
coefficients, with increasing correlation size showing a greater degree of relatedness between the two sets of scores.
For measures used with children, the test–retest interval is particularly crucial because rapid developmental changes are
likely to affect whatever characteristic is being meas-
Page 68
ured if the test–retest interval is too large. Thus it is imperative that test developers report the size of that interval over
which test–retest reliability is calculated. Only rarely will a measure be examined for test–retest reliability over an
interval longer than a month.
One limitation of test–retest reliability coefficients is their susceptibility to carryover effects where the first testing
affects the second. Depending on the nature of the carryover, the apparent reliability of a measure for use in a one-time
testing situation (the most typical application) might be either inflated or deflated (Allen & Yen, 1979). For example,
practice effects might make the test easier on the second testing, causing answers to change from the first to the second
testing and resulting in a reliability coefficient that is smaller than it would be if carryover had not occurred. On the
other hand, test takers may remember their answers from the first testing and simply repeat them on the second, resulting
in a reliability coefficient that is larger than it would be if carryover had not occurred. Because of this, test developers
will sometimes adopt methods other than the straightforward test–retest method, choosing to use alternate-forms
retesting methods to supplement or sometimes even replace test–retest data.
Many measures of considerable utility to speech-language pathologists working with children who have language
impairments are not standardized tests for which reliability data are provided. Instead, they are informal measures
devised for a limited purpose. For informal measures it is more common to discuss the concept of consistency under the
heading of agreement. Thus, for example, it is possible to calculate test–retest agreement for an informal measure used
by a single clinician.
Figure 3.3 provides an example of an informal probe measure for which an agreement measure is calculated. Although
this example uses two judges, analogous methods can be used to examine consistency for a single judge over time. In
this example, the importance of agreement measures in giving you a sense of the consistency of measurement is
highlighted when you notice that the two judges arrived at exactly the same percentage correct calculation for the client.
However, they did so while agreeing about which words were correctly produced at a percentage almost equal to that
predicted if their judgments were due to chance (50%)! A particularly popular alternative to the simple procedure I
described is the Kappa coefficient (Fleiss, 1981; Hsu & Hsu, 1996), which addresses this problem of chance agreement.
McReynolds and Kearns (1983) are an especially helpful resource for those interested in a more thorough description of
agreement measures. Yet another resource for those interested in a detailed discussion of the meaning and relative merit
of such measures can be found in Cordes (1994).
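The chance-agreement problem that the Kappa coefficient addresses can be sketched in Python. The judgments below are invented to mirror the situation just described: two judges each score exactly 50% of 20 probe words as correct, yet they agree with each other only at the level expected by chance, and Cohen's kappa correctly reports no agreement beyond chance:

```python
def cohens_kappa(judge_a, judge_b):
    """Cohen's kappa for two judges' binary (1 = correct, 0 = incorrect) judgments."""
    n = len(judge_a)
    observed = sum(a == b for a, b in zip(judge_a, judge_b)) / n
    # Agreement expected by chance, from each judge's marginal rates.
    pa, pb = sum(judge_a) / n, sum(judge_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

# Invented judgments for 20 words: each judge marks exactly 10 correct,
# but they coincide on only 10 of 20 words (5 both-correct, 5 both-incorrect).
judge_a = [1] * 10 + [0] * 10
judge_b = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5

percent_agreement = sum(a == b for a, b in zip(judge_a, judge_b)) / 20
print(f"percent correct, judge A: {sum(judge_a) / 20:.0%}")  # 50%
print(f"percent correct, judge B: {sum(judge_b) / 20:.0%}")  # 50%
print(f"simple agreement: {percent_agreement:.0%}")          # 50%
print(f"kappa: {cohens_kappa(judge_a, judge_b):.2f}")        # 0.00
```

Identical percent-correct totals thus say nothing about consistency of judgment; the kappa of zero exposes what the matching 50% figures conceal.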
Internal Consistency
Internal consistency is studied in order to address concerns about a measure’s consistency of content. It is primarily of
interest in cases where a test or subtest has items that are assumed to function similarly. Obtaining information about
internal consistency for norm-referenced measures presents few practical difficulties: The same information used to
provide norms is used to study internal consistency. Thus, information about internal consistency is often provided, even
if little else is.
The most basic method for examining internal consistency involves the calculation of a split-half reliability coefficient,
where performances of a group of test takers like
Page 69
Asian Americans
Strict gender and age roles
Father—the family leader, head of family
Mother—the nurturer, caregiver
Older males superior to younger males
Females submissive to males
Close, extended families
Multigenerational families
Older children strictly controlled, restricted, protected
Physical punishment used
Parents actively promote learning activities at home—may not participate in school functions
Children are treasured
Infant/toddler needs met immediately or anticipated
Close physical contact between mother and child
Touch rather than vocal/verbal is primary vehicle of early mother–infant interaction
Harmony of society more important than individual
Infant seen as independent and needing to develop dependence on family and society
African Americans
Mothers and grandmothers may be greatest influences
Strong extended family ties are encouraged
Independence and assertiveness encouraged
Infants may be focus of family attention
Affectionate treatment of babies, but fear of “spoiling”
Strong belief in discipline, often physical
Caregiving of older toddler may be done by an older child
Hispanic Americans
Strong identification with extended family
Families tend to be patriarchal with males making most decisions
Infants tend to be indulged; toddlers are expected to learn acceptable behavior
Emphasis placed on cooperativeness and harmony in family and society
Independence and ability to defend self encouraged
Older siblings often participate in child care
Note. From Family-Centered Early Intervention for Communication Disorders: Prevention and Treatment (p. 21), by G. Donahue-Kilburg, 1992, Gaithersburg, MD: Aspen. Copyright 1992 by Aspen. Reprinted with permission.
speech-language pathology and audiology. Thus, clinicians are increasingly faced with the special challenge of enlarging
their understanding of other cultures and linguistic communities and the skills required to implement that understanding
in their work.
The process of respecting diversity in children and in their families pervades all phases of clinical interaction. Because it
is critical to valid screening, identification, description, and assessments of change, diversity arises as a continuing point
of discussion throughout the remainder of this text. I highlight it here because of its particular relevance to the test
review process discussed later in this chapter.
Page 84
Societal and Legal Contexts
Just as the child whose language development is in doubt exists as a member of a larger community, so too does the
speech-language pathologist who serves the child. He or she is also a participant in the larger social contexts of a given
profession and workplace within a particular time and place—a given era within a given school district or institution,
state, and country. Each of these contextual factors can affect decisions about assessment. A recent discussion of the
roles and responsibilities of school speech-language pathologists, contained within an extensive ASHA document
available on its website, emphasized this fact (ASHA, 1999). Table 4.2 includes just a small number of the many
factors ASHA described as affecting clinical practice with children. In this brief section, two particularly compelling
sources of effects on measurement practice are addressed: national legislation and changing global perspectives on
disablement.
National Legislation
As mentioned briefly in terms of regulations regarding family involvement, legal influences on how children are
evaluated for language problems represent some of the most powerful influences in clinical practice. In particular,
federal legislation establishing the ways in which public schools address the needs of children has had profound effects
on how children’s problems are screened, identified, and addressed (ASHA, 1999; Demers & Fiorello, 1999). Thus, as
described earlier, it was through Education of the Handicapped Act Amendments of 1986 that ideas about the need for
greater attention to families became a potent factor in shaping actual practice. In this section, I point out the even broader
effects that have resulted from a number of other legislative initiatives, paying particular attention to the Individuals with
Disabilities Education Act (IDEA), which was passed in 1990.
The IDEA built on and modified earlier legislation, including two landmark federal laws: the Education for All
Handicapped Children Act of 1975 (P.L. 94-142), which established many now-standard features of educational
attention to children with special needs, and the Education of the Handicapped Act Amendments (1986), which mandated
services for those children from birth to age 21, in addition to its role in pressing for greater inclusion of families in
educational evaluations. Since 1990, the IDEA has been amended (IDEA Amendments of 1997) and has had regulations
developed for its implementation.
Table 4.2
A Brief List of Some of the Contextual Factors Affecting Speech-Language Pathology Practice Among School-Based
Clinicians (ASHA, 1999)
Specific federal legislative actions (e.g., the Individuals with Disabilities Education Act of 1990)
State regulations and guidelines
Local policies and procedures
Staffing needs
Caseload composition and severity
Cutbacks in education budgets
Personnel shortages
Expanding roles
Page 85
The IDEA and the 1997 amendments to it maintained numerous elements of the earlier legislation. Among the most
important of these maintained features is a mandate for nondiscriminatory assessment. In such assessments, it is required
that measures be administered in the child’s native language by trained personnel following the procedures outlined in
the test manual. In addition, these more recent laws dictate that validity information for a test be specific to the purpose
for which the test is used. Further, this legislation requires that evaluations of children be comprehensive, multifactored,
and conducted by an interdisciplinary team. Although each of these components was viewed as the best practice at the
time it was written into law, it is legislation, along with the potential for litigation where legislation is not followed, that
gives rise to the actual implementation of professional and academic recommendations. However, it’s important to
recognize that legislation is not always in accord with best practices, as I discuss in later sections.
New provisions of the IDEA, its amendments, and the more recent development of regulations implementing it include
some changes in nomenclature, such as abandonment of the term handicapped for the term disabled as the designation
given to children covered by the law. In addition, these legal actions have added several new separate disability
categories, with autism being the most relevant to discussions of language disorders. Other new elements consist of
demands for increased accountability, with resulting increases in documentation requirements, and insistence that
children’s individualized education programs (IEPs) contain information connecting the child’s disability to its impact on
the general education curriculum
(ASHA, 1999; Demers & Fiorello, 1999).
Because of the legislation described above, speech-language pathologists who work with children in schools are
involved in a broader range of responsibilities and potential roles (ASHA, 1999). The children they evaluate are more
diverse in age, language, and culture, and the collaborative nature of their work has increased dramatically. Also,
clinicians are made more accountable for the validity of the instruments they use and the methods they follow in
evaluating clients.
To a great extent, the effects of national legislation are supportive of good measurement practices. At the same time,
however, legislation introduces complexity for clinicians, who face increasing responsibilities, increasing demands for
documentation, and the push to revise or develop strategies to deal with the specific ways in which individual states and
school districts implement federal law. Some of the complications to clinical practice introduced by state Departments of
Education are discussed as they relate to specific measurement questions in later chapters.
World Health Organization Definitions
At an international level, changes brought about by the World Health Organization (WHO) of the United Nations have
affected assessment practices (WHO, 1980). As part of its charge to develop “a global common language in the field of
health,” WHO proposed guidelines reflecting changing views about health and departures from health that would affect
a wide array of sectors, including health care, research, planning and policy formation, and education. Specifically, in
1980, WHO developed the International Classification of Impairments, Disabilities, and Handicaps (ICIDH), in which
various types of outcomes associated with health conditions were considered.
Page 86
The 1980 ICIDH classification recognized four levels of effects. These levels are summarized here with examples taken
from applications to language disorders. First, there is disease or disorder, the physical presence of a health condition,
for which a language disorder can serve as the example. Next, there is impairment, an alteration of structure or function
causing the individual with the condition to become aware of it. For children with language disorders, an example of a
possible impairment would be inappropriate use of grammatical morphemes. The third level of effects is described as
disability, an alteration in functional ability. For children with language disorders, the disability associated with their
difficulties could be a decreased ability to communicate. The last level recognized in the ICIDH is that of handicap,
which is a social outcome. Thus, negative attitudes on the part of playmates or teachers toward affected children
constitute a possible handicap associated with language disorder.
Although changes in these terms and the reasons for those changes are discussed in a moment, I first discuss two
important implications of this new classification system that have proven most significant. First, although there is a
tendency for these four types of effects to be related to one another (e.g., for more severe disorders to be associated with
greater handicaps), this is not always the case. For example, it is possible for a handicap to exist apart from the presence
of a disease or disorder, as might be the case if societal prejudice against an individual occurred in the absence of actual
impairment. A specific example might be if a child were to be excluded by a group of peers because of a cleft lip, an
observable but functionally insignificant difference.
Similarly, it is possible for a more severe impairment to be associated with only a mild disability and minimal handicap
because of successful compensatory strategies on the part of the individual, effective interventions on the part of
professionals, or both. Imagine a child with a moderate hearing loss acquired after the initial stages of language
acquisition are complete, who has high overall intelligence, strong motivation, a supportive home environment, and effective
auditory management. Such a child could be expected to experience lesser effects on communication effectiveness and
on social roles than would be expected on the basis of the severity of hearing loss alone. This classification causes one to
consider the role not only of the child, but also of his or her surroundings, in determining the nature of negative effects
experienced because of a disorder.
A second major implication of the 1980 classification is that each of the four levels of effects is understood to be
associated with different measurement goals for both research and clinical purposes. For example, measurement focused
at the level of handicap requires information about how a child’s social and educational roles are affected by his or her
condition. This contrasts with measures focused at the level of impairment, which require information about the child’s
use of particular language structures. The greater attention paid to the larger ramifications of health conditions coincides with an urgent push in both clinical and educational settings to measure and evaluate the effectiveness of interventions in terms of these higher order effects.
Despite the widespread influence of the 1980 classification system, dissatisfaction existed with its terminology and with the ways in which the social contributions to the effects of health conditions were handled. Among the specific criticisms were that the terminology was sometimes confusing and included potentially offensive terms such as handicap (Frattali, 1998). The model underlying the classification was also criticized for failing to represent the influence of contextual factors.
Because of concerns about the 1980 classification system, a draft revision was put forward in 1997 for comment and
field testing, with final approval of a revised version expected in 2000 (WHO, 1998). The proposed classification
system is called the ICIDH-2: International Classification of Impairments, Activities, and Participation (WHO, 1998),
reflecting significant changes to the theoretical orientation from the earlier classification of “Impairments, Disabilities,
and Handicaps.” The details of the final revision remain indefinite at the moment. Nonetheless, the current draft warrants
discussion because of its value as an indicator of emerging trends and because it fits snugly with the view of children
advanced up to this point in the chapter—that is, as deeply affecting and affected by their environment.
As its most important change, the 1997 classification is designed to embrace a model in which human functioning and
disablement result from an interaction of the individual’s condition and his or her social and physical environment. In
this system, therefore, the following definitions are used to describe levels of functioning (or where decreased
functioning is noted, disablement) in the context of a health condition:
1. “Impairment is a loss or abnormality of body structure or physiological or psychological function, e.g., loss of a limb,
loss of vision” (WHO, 1998, p. 8). Notice that this level corresponds to the current ICIDH level of impairment and thus
might refer to a child’s abnormal or delayed language characteristics.
2. “An Activity is the nature and extent of functioning at the level of the person. Activities may be limited in nature,
duration, and quality, e.g., taking care of oneself, maintaining a job” (WHO, 1998, p. 8). Notice that this level replaces
the current ICIDH level of disability and thus might refer to a child’s reduced ability to communicate.
3. “Participation is the nature and extent of a person’s involvement in life situations in relation to Impairment, Activities,
Health Conditions and Contextual factors. Participation may be restricted in nature, duration and quality, e.g.,
participation in community activities, obtaining a driving license” (WHO, 1998, p. 8). This final level corresponds to
the older level of handicap and thus might refer to negative social outcomes of a child’s language problems.
On the basis of these new formulations, one can see continuities between the proposed and existing systems yet also notice a significant change in orientation that is both more positive in tone and more attentive to contextual influences. In the new classification system, a person’s environmental (social and physical) and personal contexts are
said to influence how disablement at each of these levels is experienced. In particular, two types of contextual factors are
deemed most important: (a) environmental and physical factors (such as social attitudes, physical barriers posed by
specific settings, climate, and public policy) and (b) personal factors (e.g., education, coping style, gender, age, and other
health conditions; WHO, 1998, p. 8).
From this overview, it is evident that the thrust of the ICIDH-2 will be support for many of the principles championed by
Bronfenbrenner, by recent federal legislation,
and by advocates for an integrated view of validity in which the effects of a decision made using a measure must be
considered when one evaluates a measure’s validity. Overall, a unifying principle is that decision making on behalf of
children requires attention not simply to properties of the child but to the context in which the decision is being made and
acted on.
In the last half of this chapter, practical steps involved in the process of evaluating measures for possible use in decision
making are described. Although I have rendered the larger context in which this process must take place in only the
grossest detail, I hope that you can sense the sheer intricacy of the task at hand. On the one hand, confronting the very
significant intellectual challenge entailed in the selection, use, and interpretation of appropriate measures makes me
nearly turn tail and run. On the other hand, however, the rewards of successful clinical decision making and action would
be less sweet if they were easily won.
Evaluating Individual Measures
Evaluating individual measures is like solving a mystery, where the mystery is how to view a measure for use with a
particular client or group of clients. After a general plan is developed in the early stages of the review process, clues are
collected and weighed. Most clues come from the clinician’s knowledge of individual clients and their needs and from
the manual for the particular measure. Additional sources of information, such as test reviews and pertinent research
articles, can also help in the process. This chapter is arranged so that, following a brief overview of two modes of
reviewing, you are introduced to the test manual and then to other sources of information to help you reach a final
decision—to “crack the case,” if we follow the detective analogy.
Client-Oriented Versus Population-Oriented Reviews of Measures
I have said that the validity of a measure depends on its ability to answer a particular clinical question for a particular
child. Consequently, the appropriateness of a measure is determined within the realm of the particulars—ideally, within a
firm appreciation of factors important to an individual child, such as coexisting handicapping conditions, language
background, gender, and age—as one reviews the test manual and other sources of information for the measure. Such a
review might be said to be a client-oriented review of the measure.
Client-oriented review of measures is an ideal that is often unattainable. Given the pace of most clinical environments,
clinicians are rarely able to review each potential measure thoroughly and compare it with competing measures
immediately prior to each measurement they make. In fact, clinicians more commonly use what I would call a
population-oriented evaluation.
In a population-oriented review of a measure, the clinician reviews the measure’s documentation in reference to a
particular group or groups—usually those subgroups of children they serve most frequently. For example, a speech-language pathologist in a rural Vermont school would pay special attention to a test’s likely value for a subgroup of children with few significant problems in other areas of development, who come from homes in which English is the only language spoken, and whose socioeconomic status is middle to low. In contrast, a very different population-oriented assessment
might be conducted by a speech-language pathologist in a Boston school district with a caseload consisting solely of
children from French-speaking Haitian families living in poverty. Although evaluating a measure for these two
populations would involve many of the same questions, each would require different answers reflecting sensitivity to the
relevant population.
Population-oriented reviews are most frequently conducted when a new measure is considered for purchase, when a
measure is examined at a publisher’s display at a convention, or when a speech-language pathologist enters a new
position and inventories available measures. In contrast, client-oriented reviews of measures often arise when an
uncommon clinical question emerges or when a child’s particular pattern of problems (e.g., mental retardation and a severe hearing loss) makes the child’s needs in a testing situation too unlike those for which the clinician has conducted a population-oriented evaluation.
How to Use Test Manuals
Regardless of the type of review you undertake, the outcome of your evaluation will never be a simple buy–don’t-buy or use–don’t-use decision. A thorough review provides potential users with an appreciation of the measure’s limitations for answering specific clinical questions.
The test manual is the definitive source of information on a standardized measure. In fact, many of the recommendations
made in the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1985) relate directly to
material that should be provided in test manuals. Despite their importance, however, test manuals range widely in their
sophistication and value. At their best, test manuals provide not only the basic information required to evaluate the
measure’s appropriateness for given uses with specific populations, but also insightful tutorial information that can
reinforce and extend one’s understanding of test construction and use. At their worst, test manuals appear to be little
more than sales brochures designed to obscure a test’s weaknesses and imply that it can be used for all clients and testing
purposes. Even measures that are valuable additions to a clinician’s arsenal may imply possible uses that really are not
supportable. Consequently, a clinician’s detective talents are called on to ferret out the truth!
The reviewing guide reproduced in Fig. 4.2 is a worksheet for evaluating behavioral measures. It is blank so that you can
readily duplicate and use it. An annotated version of the guide, which appears as Fig. 4.3, summarizes the most
important kinds of information—or “clues”—you will be looking for as you conduct a measure review. The annotated
guide is designed to function like the ready reference cards available for many software applications.
The reviewing guide and annotated guide are included to make reviewing a more efficient process, but their inclusion is
not without hazards. The danger of such worksheets and summaries is that some individuals may consider them all one
needs to know in order to conduct a credible review. This is a big mistake! These guides are a first step that should
always be accompanied by a willingness—even eagerness—to
Fig. 4.2. Review form (blank worksheet; original spans pages 90–92).
Fig. 4.3. Annotated review form (original spans pages 93–95).
look back at trusted resources on measurement, especially the Standards for Educational and Psychological Testing
(AERA, APA, & NCME, 1985). After all, even Sherlock Holmes depended on his learned friend Dr. Watson!
Numerous authors writing about psychometric issues propose review procedures that are very similar to those described
here (e.g., Anastasi, 1988, 1997; Hammer, 1992; Hutchinson, 1996; Salvia & Ysseldyke, 1998; Vetter, 1988b).
Appendix 5 in Salvia and Ysseldyke (1998, pp. 763–766), “How to Review a Test,” is a particularly informative and
amusing description of the review process.
In the remainder of this section I lead you through the annotated guide, explaining why it is important to look for certain
kinds of information. These sections are less sketchy versions of the brief summaries given in Fig. 4.3. Some of their
content should sound quite familiar because it is based on the concepts discussed at length in chapter 3. This section ends
with a review guide completed as part of a hypothetical client-oriented review (Fig. 4.4).
1. Reviewer This information will probably be unnecessary for reviewers who function alone in their test selection and
evaluation. On the other hand, it can be helpful in cases where multiple test users share reviewing responsibilities, at
least for preliminary, population-oriented reviews. Use of a standard guide facilitates such sharing by reducing
differences between reviewers and offering later reviewers a possible starting point for client-oriented reviews.
2. Identifying Information Besides information that can help you locate or replace an instrument, this section provides
preliminary clues to the scope and nature of the measure. Test names vary greatly in just how much they disclose about
the nature of the test (e.g., whether it is comprehensive or aimed at only one modality or one domain of language), so
they should be approached with caution. Testing time, which users may want to break down in terms of projected
administration and scoring times, is of practical importance when scheduling testing.
Information about basic characteristics of the measure such as whether it is standardized versus informal, criterion-
referenced versus norm-referenced, is used to determine the measure’s suitability for certain clinical questions and
guides expectations for other sections of the review guide. Although all major sections of the guide are relevant to all
measures, the kinds and amounts of information provided vary depending on the measure’s type. Manuals for
standardized, norm-referenced measures probably provide the greatest amounts of information. On the other hand, more
informal, criterion-referenced measures, which have often been created by an individual clinician for a specific purpose,
have far less information available. (See, however, Vetter, 1988a, for recommendations about the kind of information that should be kept for any procedure that might profitably be used on repeated occasions.)
3. Testing Purpose Here, you summarize your knowledge of the intended client or population. Relevant information
includes the client’s age, other problems (e.g., visual, motor, or cognitive impairments), and important language
characteristics (e.g., bilingual home, regional or social dialect use).
Fig. 4.4. Sample of a completed review form (original spans pages 97–99).
The main clinical questions leading to the search for an appropriate measure should also be recorded here: Is the measure
going to be used for screening, identifying a problem or difference, treatment planning, or assessing change? Also, what
language modalities and skill areas are of interest? As mentioned in chapter 1, each of these clinical questions requires
different measurement solutions. Therefore, the reviewer should conduct all reviews with the assessment purpose vividly
in mind. Chapters 9–12 address in considerable detail the demands associated with different clinical questions.
4. Test Content This section returns the reviewer’s attention to the test manual. Gaining a clear understanding of a test’s
content usually requires that you examine at least the early sections of the test manual and the test form itself.
Homogeneous measures, in which all items are aimed at a single modality and language domain, are relatively easy to
specify in terms of their content. For example, the Expressive One-Word Picture Vocabulary Test–Revised (Gardner,
1990) fits into this category; its content can be specified as expressive vocabulary or expressive semantics. Usually,
however, measures address more than one content area; these areas are indicated through the use of subtests or subscores. For
this section of the review guide, as well as for the sections that follow it, recording page numbers along with your
findings is an excellent way to encourage checking against the manual during later use of the completed guide.
As you record information about test content, you want to see how well the content areas covered by the measure match
those of interest for your client. Even the nature of items (e.g., forced choice vs. open-ended responses) will be important
in helping you determine whether the behaviors or abilities of interest will be the largest contributor to your client’s
performance. Recall that one threat to validity introduced in chapter 3 was that of enabling behaviors, behaviors that
enable a test taker to take the test validly. For example, suppose that you were interested in assessing the receptive
language skills of a child with cerebral palsy who fatigued easily if asked to show or act out responses. The motoric demands of such measures become enabling behaviors that will negatively influence the child’s performance even though they are independent of the targeted construct of receptive language.
In addition to providing a tangible reminder not to overlook possibly problematic enabling behaviors, this section of the
review form should also stimulate clue-gathering around what is actually being tested (Hammer, 1992; Sabers, 1996).
Recall that as the test developer moves from an ideal formulation of the measure’s underlying construct to the down-and-
dirty task of writing sets of items, certain behaviors or skills necessarily tag along to yield a fleshed-out construct that
may or may not match your own (or even the author’s) intended formulations.
As an example of how constructs can be modified as a test takes shape, imagine a test developer who decides to devise a
measure to assess use of complex sentences using methods that place a heavy demand on working memory capabilities.
For example, the test developer could provide the test taker with a set of seven words, including the word because,
are to be used to create a single sentence. Although the final form in which the construct is realized may be acceptable to
some test users, it may
not be to others, depending on their understanding of the targeted construct. It is primarily through careful attention to
this step in the review process that you will become aware of correspondences—or disjunctions—between the test
developer’s and your own view of what is being tested. Armed with this knowledge, you can make an informed decision as to whether the construct being measured is close enough to your reason for testing for you to consider using the measure.
5. Standardization Sample/Norms At first glance, this section may seem to be primarily of interest when you are looking
for a norm-referenced instrument, that is, one in which scores are interpreted primarily on the basis of how the test
taker’s performance compares in a quantitative way with that of a peer group. In fact, however, the nature of the
standardization sample has important implications for all measures. It can determine the extent to which summary
statistics (in the case of norm-referenced measures) or summary descriptions of behaviors (in the case of criterion-
referenced measures) are likely to reflect characteristics of most children rather than those of a small, potentially
nonrepresentative group (e.g., children of affluent, highly educated parents). Nonetheless, there are some differences in
how the information provided in this section will be weighed on the basis of the nature of the instrument.
When a norm-referenced measure is being evaluated, you look for a clear description of the normative sample that was
used: how many children were studied, whether and why any children were excluded, and how representative the sample
is compared with the population your client (or subgroup of clients) fits into. Ideally, at least 50 children within a relatively narrow age range around your client’s age (usually no more than 6 months older or younger) will have been
tested. Also, you want these children to be similar in race, language background, and socioeconomic status to the child or
children you have in mind.
When there are significant differences between the normative sample and your client(s), you need to draw on your
knowledge of the appropriate research base as well as your own knowledge of cultural differences to determine to what
extent the validity of this measure is likely to be undermined. If a measure’s validity is seriously undermined and
alternative measures are unavailable, a variety of approaches, including dynamic assessment and the development of an
informal measure, represent possible strategies (see chap. 10 for further discussion of this issue).
For a norm-referenced instrument, you also want to examine the types of scores the test uses to describe the test taker’s
performance. In terms of desirability, standard scores rank first, percentile scores are next, and developmental scores
(such as age-equivalent or grade-equivalent scores) earn a sorry last place. In this section of the review form, you may
also want to record the availability of tables that record the standard error of measurement (which will be discussed at
greater length below under reliability). Recording that information here is a good idea because it indicates the amount of
error associated with a test taker’s standard score.
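As a reminder of what such tables summarize, the standard error of measurement is conventionally derived from the test’s standard deviation and its reliability coefficient; the following is the standard psychometric relationship with illustrative numbers, not values drawn from any particular manual:

```latex
% Standard error of measurement (SEM) from test SD and reliability r_xx
SEM = SD\sqrt{1 - r_{xx}}

% Example: SD = 15, r_xx = .90
SEM = 15\sqrt{1 - .90} \approx 4.7

% A 95% confidence band around an obtained standard score X
X \pm 1.96 \times SEM \quad \text{(e.g., } 85 \pm 9 \text{, roughly 76--94)}
```

Thus, the lower a test’s reliability, the wider the band of error that must be imagined around any individual score.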
When a criterion-referenced measure is evaluated, the composition of groups used to determine cutoff scores will be the
focus of your scrutiny at this point in the review form. I am not aware of recommendations concerning sample size and
composition that are as specific as those given above for norm-referenced measures. However, you
want to be sure that the group for whom the cutoff scores are provided is similar to your client or clients and that the group is large enough that the cutoff is likely to be stable (McCauley, 1996).
6. Reliability In this section, you will summarize relevant information about the test’s reliability, which is almost always
contained in a separate, clearly marked section of the test manual. The operative word here is relevant. The manual may
report 6, 10, even 20 studies in which the reliability of the measure was examined. Nonetheless, the relevant ones are
those (a) using participants who are as similar as possible to your client(s) and (b) focusing on the type of reliability that
is either most at risk because of the nature of the instrument or most important to your clinical question. Recall that
chapter 3 discusses the different kinds of reliability data that are typically of interest.
Once you have decided what forms of reliability are of greatest importance, how do you know whether the evidence is
adequate? For norm-referenced tests, the evidence will almost always take the form of reliability coefficients.
Traditionally, it has been suggested that one demand correlation coefficients that are statistically significant and at
least .80 in magnitude for screening purposes and at least .90 when making more important decisions about individuals
(Salvia & Ysseldyke, 1998). However, a more circumspect recommendation might be that you want the best reliability
available on the market. By this I mean that when the ideal of .90 is not available, and a decision must be made, you will
want the best that you can find as well as multiple, independent sources of information.
For criterion-referenced measures, evidence for reliability can take a great many forms—from correlation coefficients to
agreement indices (Feldt & Brennan, 1989). Such evidence for criterion-referenced measures usually addresses the
question of how consistently the cutoff can be used to reach a particular decision. As you would do for norm-referenced
measures, focus on the results of those studies that involve research questions most like your clinical question and
participants most like your client(s). Information about the relationship between types of reliability and clinical questions
is discussed in chapters 9 to 11.
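To illustrate one common form such agreement evidence can take (offered here as a familiar example of the general class of indices Feldt and Brennan describe, not a formula prescribed by any of the sources cited), chance-corrected agreement between two scorers or administrations can be expressed as Cohen’s kappa:

```latex
% Cohen's kappa: observed agreement p_o corrected for chance agreement p_e
\kappa = \frac{p_o - p_e}{1 - p_e}

% Example: two scorers agree on pass/fail decisions for 90% of children
% (p_o = .90), and agreement expected by chance from their marginal
% pass rates is p_e = .60
\kappa = \frac{.90 - .60}{1 - .60} = .75
```

Values near 1 indicate agreement well beyond chance, whereas values near 0 indicate agreement no better than chance would produce.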
7. Validity Although the entire review form is aimed at your cracking the case of a measure’s validity for a particular
use, in this section of the review form, you will summarize the most important of the information provided by the test
developer for the purpose of evaluating validity. Although most of the information of interest will probably be found in
clearly labeled sections of the manual, information relevant to considerations of content and construct validity is also
frequently found in sections dealing with the measure’s initial development and subsequent revisions (if any). Recall that
some of the specific methods used to provide evidence of validity (e.g., developmental studies, contrasting group
studies) are discussed at some length in the previous chapter.
The statistical methods that are used to document validity vary from correlation coefficients to analyses of variance to
factor analysis. Consequently, a discussion of what constitutes acceptable data must remain fairly general here. Overall,
one looks
to see that the measure is shown to function as it is predicted to function if valid. As with reliability evidence, the nature
of the participants in the study will affect the extent to which it is relevant for your client and purposes. As you complete
this section of the review form, every skeptical bone in your body should be recruited for service. Claiming validity
doesn’t make a measure valid, although at times test developers seem to forget this.
8. Overall Impressions of Validity for Testing Purpose and Population At this point in the review guide, you put the
clues together to sum up the case. Your study of the pros and cons should be summarized, with holes in the evidence
noted and discussed in terms of their implications for interpreting results. This is where you determine whether you
believe the instrument can be safely used and, if used, what cautions should be kept in mind when it is administered and
interpreted. Clearly, this is the most demanding point in the review process—akin to a final exam or the concluding
paragraph of a large paper. Although practice is perhaps the best way of honing the requisite analytic skills, examination
of other reviews of the instrument (when they’re available) can help you make sure you have not overlooked any major
clues and can also help you see how others have approached the task. Even examining reviews of other measures can prove helpful for getting a sense of how seasoned detectives sum up their cases (e.g., see the reviews in Conoley & Impara, 1995, of the Receptive–Expressive Emergent Language Test–2 [Bzoch & League, 1994], written by Bachman [1995] and Bliss [1995], and of the Test of Early Reading Ability–Deaf or Hard of Hearing [Reid, Hresko, Hammill, & Wiltshire, 1991], written by Rothlisberg [1995] and Toubanos [1995]).
Because examples can prove so helpful in developing one’s understanding of a new process, I included Fig. 4.4, which illustrates how I would complete the reviewing guide for the Expressive Vocabulary Test (Williams, 1997) as I consider its validity for use with a hypothetical child, Melissa. Melissa is a 9-year, 2-month-old girl who has been receiving treatment for a specific language impairment. She is being tested as part of a periodic reevaluation, which will
be used by an educational team to determine whether she will continue to receive services in her school. Melissa’s
unilateral hearing loss and problems with attention will require special attention during the review of the Expressive
Vocabulary Test (Williams, 1997) for possible use.
How to Access Other Sources of Information
In addition to test manuals, independent test reviews are available to help in the test review process in three different
forms: reviews appearing in standard reference volumes on behavioral measures, journal articles reviewing one or more
tests in a particular area, and computer databases of test reviews.
Standard references and journal articles that include reviews of tests used frequently in the assessment of children with
developmental language disorders or that provide specific information relevant to an understanding of individual tests
are listed in Table 4.3.
Table 4.3
Books and Journal Articles Providing Information About Specific Tests Used With Children
Books
American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment
instruments. Rockville, MD: Author.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Impara, J. C., & Plake, B. S. (Eds.). (1998). Thirteenth mental measurements yearbook. Lincoln, NE: Buros Institute of
Mental Measurements.
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the
literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.
Journal Articles
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-
language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children.
Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and Hearing Services in Schools, 28, 50–58.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. American Journal of
Speech-Language Pathology, 4, 70–76.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-
Language Pathology, 4, 70–76.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in
Language Disorders, 5(3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Each new volume in the Mental Measurements Yearbook series contains reviews of commercially available tests that have just been published or that were revised since their review in a preceding volume. Entries are organized alphabetically by test name, with two reviews prepared independently by individuals with expertise in testing, in the content area tested, or both. A new volume of this series appears about every three years. In addition, reviews published since 1989 are available on the Internet, where searching capabilities can help consumers find both reviews and specific kinds of measures.
Several recent journal articles reviewing tests in a particular content area or for a particular group of children with
language impairments are also listed in Table 4.3.
Computer databases represent a more recent source of information on standardized measures. Reviews from the Mental Measurements Yearbook series are
available on-line through colleges, universities, and public libraries. Reviews included in this on-line database are
identical in content to those included in the bound volumes of the Mental measurements yearbook. Further, these reviews
are more timely than those appearing in the printed volumes because reviews that will eventually be incorporated in a
later bound volume are added every month.
The Health and Psychosocial Instruments (HaPI) database is also available at many libraries and can be searched on-line.
It allows one to search for information about a specific test, to find publishing information about a test through its name, acronym, or authorship, and to search for instruments by content area or age group. HaPI provides abstracts and does not contain complete reviews of instruments. However, it does indicate whether information is
reported for seven critical characteristics: internal consistency reliability, test–retest reliability, parallel forms reliability,
interrater reliability, content validity, construct validity, and criterion-related validity.
Summary
1. Effective evaluation of measures of children’s communication and related skills must be conducted with appreciation
for the contextual variables affecting both children and clinicians.
2. The bioecological theory of Bronfenbrenner and his colleagues emphasizes the interplay of the child’s characteristics
with those of his or her environment, beginning with the family and extending to the broader physical, social, and
historical environment as well. The relevance of this theory to the evaluation of measures and measurement strategies for
children lies in the connection between validity and attention to these contextual variables.
3. Among the contextual variables affecting clinicians as they interact with children and evaluate their language are not
only personal variables (e.g., their own language and culture), but also legal variables and other variables affecting their
professional practice.
4. Evaluation of individual measures requires the potential test user to gather clues suggesting the strengths and
weaknesses of the measure for answering a particular clinical question for a particular client. Client-oriented reviews are
conducted to refine information obtained from a population-oriented review or in response to the exceptional needs of a
particular client.
5. Test manuals or other materials provided by the developer of a measure serve as the primary source of information to
be considered in evaluating its usefulness for a given client.
6. The test reviewer needs to approach the review process armed with a skeptical attitude toward unproven claims and an
arsenal of information regarding acceptable psychometric standards.
7. The Standards for educational and psychological testing (AERA, APA, & NCME, 1985) is the most widely accepted
source for such information on standards.
8. Additional information for use in the reviewing process is available in the form of reviews published in standard
reference books, relevant journal articles, and computer databases.
9. In spite of existing ideals for evidence of reliability and validity, the clinician may nonetheless decide to use a
particular measure that falls short of those ideals when it is the best available for a particular client and a clinical
decision must be made.
Key Concepts and Terms
client-oriented measure review: evaluation of a measure’s appropriateness for use in answering a specific clinical
question for a single client.
Individuals with Disabilities Education Act (IDEA): federal legislation addressing the education needs of individuals
with disabilities, including children with communication disorders.
International Classification of Impairments, Disabilities, and Handicaps (ICIDH): a classification designed by the WHO
for global use by health professionals, educators, legislators, and other groups concerned with health-related issues to
serve as a common language.
Mental measurements yearbooks: a well-regarded source of test reviews.
nondiscriminatory assessment: the use of measures and procedures for administering and interpreting data that will not
confound a child’s language or dialect background with the target of testing.
population-oriented measure review: a preliminary evaluation of a measure’s likely appropriateness for use in answering
one or more clinical questions for a population of clients who share important similar characteristics. Population-oriented
reviews of measures are often conducted for subgroups of clients who are frequently seen by a given clinician.
Study Questions and Questions to Expand Your Thinking
1. Consider your own social ecology. Think about a specific kind of decision you have made or will make (e.g.,
concerning school or employment). What institutions and people affect your decision?
2. Talk to the parent of a young child about the contexts in which that child functions—daycare, time spent with
extended family, and so forth. Determine how many hours the child spends in each setting and who the main interaction
partners for the child are. How might these settings influence the communication experiences of this child?
3. List five domains of language.
4. Does time taken to conduct a test have any obvious potential relationship to the validity of testing? If so, when or for
what groups of children?
5. Discuss the importance of conducting a client-oriented review rather than simply a population-oriented review of a
measure you will use with a specific client.
6. Go to the library and examine several volumes of the Mental measurements yearbook series. Describe the process by
which tests are selected to be reviewed, and examine two reviews for a single speech-language measure.
7. Choose a test that you have heard referred to in a course you have taken. See if you can find a review for it in the
Mental measurements yearbook series or elsewhere. Also, consider the extent to which the interaction implicit in the
testing procedures matches the kinds of experiences a child might have on an everyday basis.
8. Complete a review form for a norm-referenced speech-language test.
9. Complete a review form for a criterion-referenced speech-language measure.
Recommended Readings
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and
Hearing Services in Schools, 27, 109–121.
Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–108.
Salvia, J., & Ysseldyke, J. (1998). Appendix 5. In J. Salvia & J. Ysseldyke (Eds.), Assessment (5th ed., pp. 763–766).
Boston: Houghton Mifflin.
References
American Psychological Association, American Educational Research Association, & National Council on Measurement in
Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological
Association.
American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment
instruments. Rockville, MD: Author.
American Speech-Language-Hearing Association. (1999). Guidelines for roles and responsibilities of the school-based
speech-language pathologist [On-line]. Available: http://www.asha.org/professionals/library/slpschool_i.htm#purpose.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Anastasi, A. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Bachman, L. F. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (pp. 843–845). Lincoln, NE: Buros Institute of Mental
Measurements.
Bliss, L. S. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (p. 846). Lincoln, NE: Buros Institute of Mental
Measurements.
Bronfenbrenner, U. (1974). Developmental research, public policy, and the ecology of childhood. Child Development,
45, 1–5.
Bronfenbrenner, U. (1986). Recent advances in research on the ecology of human development. In R. K. Silbereisen, E.
Eyferth, & G. Rudinger (Eds.), Development as action in context: Problem behavior and normal youth development (pp.
286–309). New York: Springer-Verlag.
Bronfenbrenner, U., & Morris, P. (1998). The ecology of developmental processes. In W. Damon & R. M. Lerner (Eds.),
Handbook of child psychology: Theoretical models of human development (5th ed., Vol. 1, pp. 993–1028). New York:
Wiley.
Bzoch, K. R., & League, R. (1994). Receptive-Expressive Emergent Language Test-2. Austin, TX: Pro-Ed.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Conoley, J. C., & Impara, J. C. (Eds.). (1995). The twelfth mental measurements yearbook. Lincoln, NE: Buros Institute
of Mental Measurements.
Crais, E. R. (1992). ‘‘Best practices” with preschoolers: Assessing within the context of a family-centered approach. In
W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/non-standardized language
assessment (pp. 33–42). San Antonio, TX: Psychological Corporation.
Crais, E. R. (1995). Expanding the repertoire of tools and techniques for assessing the communication skills of infants
and toddlers. American Journal of Speech-Language Pathology, 4, 47–59.
Damico, J. S., Smith, M., & Augustine, L. E. (1996). In M. D. Smith & J. S. Damico (Eds.), Childhood language
disorders (pp. 272–299). New York: Thieme.
Demers, S. T., & Fiorello, C. (1999). Legal and ethical issues in preschool assessment and screening. In E. V. Nuttall, I.
Romero, & J. Kalesnik (Eds.), Assessing and screening preschoolers: Psychological and educational dimensions (2nd
ed., pp. 50–58). Needham Heights, MA: Allyn & Bacon.
Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and
treatment. Gaithersburg, MD: Aspen.
Education for All Handicapped Children Act of 1975. Pub. L. No. 94-142, 89 Stat. 773 (1975).
Education of the Handicapped Act Amendments of 1986. Pub. L. No. 99-457, 100 Stat. 1145 (1986).
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146).
New York: American Council on Education and Macmillan.
Frattali, C. (1998). Outcome measurement: Definitions, dimensions, and perspectives. In C. Frattali (Ed.), Measuring
outcomes in speech-language pathology (pp. 1–27). New York: Thieme.
Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test–Revised. Novato, CA: Academic Therapy.
Hammer, A. L. (1992). Test evaluation and quality. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside
view. Palo Alto, CA: Consulting Psychologists Press.
Hammill, D. D., & Newcomer, P. L. (1988). Test of Language Development–2 Intermediate. Austin, TX: Pro-Ed.
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-
language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and
Hearing Services in Schools, 27, 109–121.
Impara, J. C., & Plake, B. S. (Eds.). (1998). The thirteenth mental measurements yearbook (pp. 1050–1052). Lincoln,
NE: Buros Institute of Mental Measurements.
Individuals with Disabilities Education Act (IDEA). Pub. L. No. 101-476, 104 Stat. 1103 (1990).
Individuals with Disabilities Education Act Amendments of 1997. Pub. L. No. 105-17, 111 Stat. 37 (1997).
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children.
Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and
Hearing Services in Schools, 28, 50–58.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the
literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and
Hearing Services in Schools, 25, 15–24.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-
Language Pathology, 4, 70–76.
Radziewicz, C. (1995). In E. Tiegerman-Farber, Language and communication intervention in preschool children (pp.
95–128). Boston: Allyn & Bacon.
Reid, D. K., Hresko, W. P., Hammill, D. D., & Wiltshire, S. (1991). Test of Early Reading Ability–Deaf or Hard of
Hearing. Austin, TX: Pro-Ed.
Rothlisberg, B. A. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J.
C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 1049–1051). Lincoln, NE: Buros Institute of Mental
Measurements.
Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–
108.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (7th ed.). Boston: Houghton Mifflin.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in
Language Disorders, 5 (3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Toubanos, E. S. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (pp. 1051–1053). Lincoln, NE: Buros Institute of Mental
Measurements.
van Kleeck, A. (1994). Potential cultural bias in training parents as conversational partners with their children who have
delays in language development. American Journal of Speech-Language Pathology, 3, 67–78.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making
in speech-language pathology (pp. 192–193). Philadelphia: B. C. Decker.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision
making in speech-language pathology (pp. 190–191). Philadelphia: B. C. Decker.
Williams, K. T. (1997). Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.
World Health Organization. (1980). ICIDH: The international classification of impairments, disabilities, and handicaps.
Geneva: Author.
World Health Organization. (1998). Towards a common language for functioning and disablement: ICIDH-2: The
International Classification of Impairments, Activities and Participation. Geneva: Author.
PART II
Suspected Causes
Related Problems
Defining the Problem
Sandy is a compact 6-year-old who was late in talking and considered unintelligible by all but a few family members
until about age 5. She is still often mistaken for a younger child because of her size, limited vocabulary, and frequent
errors in grammar. Having recently transferred to a new school, Sandy is having trouble adjusting and has become very
quiet except for occasional interactions with friends from her previous school.
Joshua, a 9-year-old with a history of delayed speech and language, continues to use short, simple sentences that are
often ineffective in getting his message across. Despite significant gains in his oral communication, he has made little
progress in early reading skills. Thus, despite two years of instruction and special support in both oral and written
language, he names letters of the alphabet inconsistently and has a
sight vocabulary limited to about 30 words. Joshua also appears to have difficulty understanding many of the
instructions given in the classroom.
Wilson is a 4-year-old whirlwind who augments his limited speech productions with animated gestures and, sometimes,
truly gifted doodles. Because of his activity level and awkward, sometimes overwhelming style of interacting, he is
avoided by his peers and has formed fierce attachments to the preschool teacher and his speech-language pathologist.
Wilson’s parents and educators are beginning to question whether his activity level falls within the normal range and
will be discussing the possibility of having him evaluated for attention deficit disorder with hyperactivity at their next
meeting. Wilson’s ability to understand the communications of others has never been questioned.
Although Sandy, Joshua, and Wilson are varied in their patterns of communication difficulties, each can be described as
demonstrating specific language impairment (SLI), a disorder estimated to affect between 1.5 and 7% of children
(Leonard, 1998). A recently proposed figure of 7% for 5-year-olds may be the best current estimate of prevalence: The
research on which it was based was rigorous and included the use of a carefully selected sample of 7,218 children
(Tomblin et al., 1997). Although estimates differ considerably from study to study, it has generally been found that boys
are affected more often than girls, with some studies suggesting that boys are at twice the risk of girls (Tomblin, 1996b).
SLI can be defined as “delayed acquisition of language skills, occurring in conjunction with normal functioning in
intellectual, social-emotional, and auditory domains” (Watkins, 1994, p. 1). Thus, SLI is frequently described as a
disorder of exclusion. As such, it can seem like a definition of leftovers, encompassing those instances where language
impairment exists but cannot readily be attributed to factors that clearly limit a child’s access to information about
language or to the abilities required to undertake the creative task of language acquisition. On the other hand, specific
language impairment can be regarded as a “pure” form of developmental language disorder, one in which language alone
is affected (Bishop, 1992b).
Hopes of defining the nature of specific language impairment have instigated a wealth of research in child language
disorders over the past 50 years. Initially termed “congenital aphasia” or “developmental dysphasia,” SLI seemed to
offer the opportunity to look at a pure, or “specific,” variety of communication disorder (Rapin, 1996; Rapin & Allen,
1983). Historically, each of the categories of developmental language disorders examined in other chapters in this
section offered ostensibly obvious explanations for their existence. In contrast, children with SLI offered no apparent
explanations yet promised an opportunity to look at the unique effects of impaired language on development. Or so it
first appeared. In the Related Problems section of this chapter, you will read about the subtle differences in cognition and
other attributes that have been identified in children with SLI and that thus threaten narrow conceptions of specific
impairment.
The DSM–IV (American Psychiatric Association, 1994) does not use the term specific language impairment, but includes
two disorders that together cover much of the same terrain: Expressive Language Disorder and Mixed Expressive–
Receptive Language Disorder. Table 5.1 lists the diagnostic criteria for these two communication disorders.
Table 5.1
Summary of Criteria for Two Disorders Corresponding to Specific Language Impairment From the Diagnostic and
Statistical Manual (4th ed.) of the American Psychiatric Association (1994)
The division of SLI into these two categories reflects a recurring impulse among researchers and clinicians to
identify subgroups within the larger population—in this case and most often according to whether receptive language is
significantly affected.
The DSM–IV criteria include a variation on the exclusionary elements of the SLI definition described up to this point.
Specifically, in Criterion D for both disorders, the clinician is directed to look for language impairments whose severity
is unexplained by the obvious threats to language development included in other exclusionary definitions (e.g., the
presence of hearing impairment or mental retardation). The DSM–IV definitions allow both for the identification of a
language impairment when no obvious threats exist and for cases where the presence of such threats does not
seem sufficient to account for the degree of problem presented.
Most researchers over the past three decades have used definitions largely like those discussed and have particularly
relied on the operationalization of SLI proposed by Stark and Tallal (1981, 1988). The details of such definitions,
however, have proven quite controversial (Camarata & Swisher, 1990; Johnston, 1992, 1993; Kamhi,
1993; Plante, 1998). When attention moves from the laboratory to clinical practice in schools, the controversy intensifies
because state policies are vigorous participants in the decision-making process. In particular, the use of difference or discrepancy
scores is often mandated but has faced increasing criticism (e.g., Aram, Morris, & Hall, 1993; Fey, Long, & Cleave,
1994; Kamhi, 1998). Although methods used in identification of SLI are discussed at some length in the Special
Challenges in Assessment section of this chapter, they are mentioned here because they affect understanding of the
nature of the problem and therefore affect research intended to obtain information about suspected causes, patterns of
language performance, and related problems.
It seems important to recognize that SLI is a term that is often absent from the day-to-day functioning of speech-language
pathologists in many clinical and education settings. Instead, they frequently use the terms language delays or language
impairments, thereby remaining silent on the specificity of a given child’s problems (Kamhi, 1998). Nonetheless, the
foundation of research on this population and clinical writings provides an important context for scientifically oriented
clinical practice. In the same way that field geologists need to know about basic chemistry despite few encounters in the
wild with pure iron or other elements, speech-language pathologists can learn from attempts to identify and understand
SLI and can recognize it when they encounter it in their practice. The very length of this chapter compared with
the others addressing information about subgroups of children with language disorders testifies to the fertility of the
resulting explorations.
Suspected Causes
The question of what causes isolated language impairment has been approached from several perspectives—from genetic
to linguistic, physiological to social. It remains a question—or, more accurately, a series of related questions—that
tantalizes researchers, clinicians, and parents alike. It is best viewed as a set of related questions because one can
conceive of causes on several different levels (e.g., physical as well as social). In addition, it can be viewed that way
because effects are frequently the result of a convergence of causes rather than a single cause. Thus two or more factors
may need to come into play before impaired language occurs. Understanding causation is further clouded by the fact that
researchers are frequently only in the position of identifying risk factors; that is, factors that tend to co-occur with the
presence of SLI, but that can only be thought of as potential causes until the nature of the association can be worked out
through further research.
In this section, a review of suspected causes encompasses not only differences in brain structure and function, genetics,
and selected environmental factors, but also more abstract linguistic and cognitive discussions of the origins of specific
language disorder in children. Although there is considerable turmoil in the community of child language researchers
concerning the more abstract accounts provided in linguistic and cognitive explanations, their role in assessment and
planning for treatment has the potential for being more immediate and influential than that of accounts related to genetics
and physiology.
Genetics
Genetic origins for SLI have probably been suspected for some years by anyone who has encountered families in which
language problems seem more commonplace than one might expect given the relative exceptionality of language
impairment. Nonetheless, serious study of genetic contributions to SLI has been undertaken only in the last couple of
decades (Leonard, 1998; Pembrey, 1992; Rice, 1996). Largely, the increase in such studies has occurred because of
advances in the study of behavioral genetics (Rice, 1996). In addition, however, the delayed interest in the genetics of
language impairment has resulted from the need for agreement on a phenotype, that is, the behavior or set of behaviors
that constitute critical characteristics of the disorder (Gilger, 1995; Rice, 1996).
Several different types of genetic studies are regularly used to link specific diseases or behavioral differences with
genetic underpinnings (Brzustowicz, 1996). Among those that have been used to the greatest extent so far in studying SLI
are family studies, twin studies, and pedigree studies. In family studies, the family members of a proband (i.e., an
affected person who is the focus of study) are examined to determine whether they show evidence of the characteristic or
disorder under study at rates that are higher than would be expected in the general population. If they do, the
characteristic or disorder is considered familial—a state of affairs that could be due to genetic origins or to common
exposure to other influences. Thus, for example, a fondness for chocolate might be found to be familial, but, without
further study, could just as easily be due to long exposure to a kitchen full of chocolate delicacies as to a genetic basis.
In twin studies, comparisons of the frequency of a characteristic or disorder are made between identical and fraternal
twins. Because identical twins share the same genetic makeup, they should show higher concordance for the
characteristic if it has a genetic basis; that is, there should be a strong tendency for both identical twins to either have or
not have the characteristic. In contrast, if their rates of concordance are relatively high, but similar to those of the
fraternal twins (who are no more genetically related than any pair of siblings and thus on average share 50% of their
genetic make-up), the characteristic might still be considered familial. However, in that case, it would be more likely the
result of environmental rather than genetic influences. (See Tomblin, 1996b, for a discussion of some of the complexities
of this type of design.)
In pedigree studies, as many members as possible of a single proband’s large, multigenerational family are examined in
order to get insight into patterns of inheritance associated with the targeted characteristic or disorder. Closely related to
pedigree studies are segregation studies in which multiple families with affected members are examined to compare
observed patterns of inheritance with patterns that have been observed for other genetically transmitted diseases.
Despite the difficulties associated with defining a disorder as complex as SLI (Brzustowicz, 1996), considerable progress
has been made over the past 15 years in understanding genetic contributions to the disorder. Familial studies (e.g., Neils
& Aram, 1986; Tallal, Ross, & Curtiss, 1989; Tomblin, 1989) have consistently demonstrated higher risk among families
selected because of an individual member with SLI than families selected because of an unaffected member who is
serving as a control
participant. Complicating these findings, however, have been observations that many children with SLI come from
families where they are the only affected member (Tomblin & Buckwalter, 1994). Further, family histories of SLI may
be more common among children with expressive problems only than among those with both receptive and expressive
problems (Lahey & Edwards, 1995).
Whereas some familial studies (e.g., Neils & Aram, 1986; Tallal et al., 1989; Tomblin, 1989) have used questionnaires to
examine the language skills of other, often older, family members, others have used direct assessment of language skills
(e.g., Plante, Shenkman, & Clark, 1996; Tomblin & Buckwalter, 1994). The latter studies are considered more desirable
(Leonard, 1998) because they rely neither on participants’ memories of childhood difficulties nor on potentially
incomplete and inaccurate school or clinical records. Further, they seem to be more sensitive to manifestations of SLI in
adults, thereby capturing a greater number of affected individuals for examination of inheritance patterns (Plante et al.,
1996). Most importantly, however, both types of studies can demonstrate familial patterns of SLI, which are the first step
toward proving its genetic underpinnings for at least some affected individuals.
Twin studies (e.g., Bishop, 1992a; Tomblin, 1996b) have demonstrated higher concordance for SLI among identical than
fraternal twins, thus providing evidence of some degree of genetic influence. However, even among identical twins,
concordance is not perfect, despite their identical genetic make-up. Consequently, it has been suggested either that the
gene associated with SLI does not always produce the same outcome (due to incomplete penetrance) or that it does
not operate alone to produce SLI (Tomblin & Buckwalter, 1994; Leonard, 1998). In the former case, incomplete
penetrance refers to cases in which a gene associated with a disorder fails to act in an all-or-nothing fashion, with some
people who carry a gene showing no ill effects (Gilger, 1995). The latter prospect means that SLI may be caused by
more than one gene or that a gene or group of genes must operate in combination with environmental factors.
Current research on the genetics of SLI is weighing these alternative scenarios. Among the kinds of studies needed are
pedigree and segregation studies in which a single family or groups of families are studied across generations. One family,
referred to as the KE family, has been under study for some time (e.g., Crago & Gopnik, 1994; Gopnik & Crago, 1991;
Vargha-Khadem, Watkins, Alcock, Fletcher, & Passingham, 1995). This family continues to be examined to determine
whether a hypothesized autosomal dominant transmission mode is at work. Briefly, autosomal dominant transmission
means that the disorder is transmitted through a pair of autosomal chromosomes (i.e., one of the 22 chromosome pairs
that are not sex-linked) and will occur even if only one of the two chromosomes in a pair is affected.
The KE family has many affected members, as would be expected given an autosomal dominant mode of transmission,
as opposed to modes involving the sex chromosomes (a single pair) or a recessive mode of transmission in which both
members of a pair would need to be affected for the disorder to result. In fact, most members of the KE family demonstrate both
severely impaired speech and language, and several show cognitive impairment or psychiatric disorders as well. Thus,
additional work is needed to examine other families who might be more representative of greater numbers of children
with SLI.
Continuing pursuit of information about genetic bases is thought to be useful because it may be possible to determine
what aspects of language impairment are more biologically determined and, therefore, perhaps less amenable to
treatment. Once those determinations are made, clinicians could focus on the fostering of compensatory strategies or on
the amelioration of remaining aspects of the language impairment that may be more modifiable through treatment (Rice,
1996).
Differences in Brain Structure and Function
The prospect of differences in brain structure and function between children with SLI and those without has beckoned as
a potential explanation since researchers first began ruminating about this disorder. This is illustrated by the use of the
term childhood aphasia in the 1930s and several decades thereafter. Among the possibilities that have been examined are
early damage to both cerebral hemispheres, damage to the left hemisphere only (Aram & Eisele, 1994; Bishop, 1992a),
and the possibility that the differences are not the result of “damage” per se, but rather are the
expression of natural genetic variation (Leonard, 1998).
Currently, cases of frank neurologic damage—for example, those following a stroke or head injury—are excluded from
definitions of SLI. Somewhat more difficult to classify are the problems of children with Landau–Kleffner syndrome,
also called acquired epileptic aphasia. These children fail to show signs of focal damage except for
electroencephalographic abnormalities, yet they experience a profound loss of language skills (Bishop, 1993). Although
included in early formulations of childhood aphasia, this syndrome is now typically excluded from definitions of SLI.
Despite the exclusion of known brain damage from strict definitions of SLI, a relatively large number of studies using
techniques such as magnetic resonance imaging (MRI) and, less frequently, autopsy examination have been undertaken
to determine whether subtle differences in brain structure and function can account for the difficulties facing children
with SLI. Often these differences have been structural anomalies that seem to depart from those considered optimal for a
left-hemisphere dominance for speech—leading to either right-hemisphere dominance or a lack of dominance by either
hemisphere (Gauger, Lombardino, & Leonard, 1997). Increasingly, it is thought that such differences may reflect
variations in structure that make language development less efficient (e.g., Leonard, 1998).
Two areas of the cerebral hemispheres in which such variations have been identified are the plana temporale and the
perisylvian areas, illustrated in Fig. 5.1. These two areas overlap, with the smaller planum temporale lying within the
larger perisylvian region of each hemisphere; both lie within a region that has consistently been shown to be associated
with language function.
Examinations of the plana temporale in individuals with SLI were sparked by a 1985 autopsy study (Galaburda,
Sherman, Rosen, Aboitiz, & Geschwind) of adults who had had written language deficits. Detailed examination of these
individuals’ brains after death showed an atypical symmetry between the planum temporale on the left and the one on the
right. This pattern contrasted with the more typical asymmet
Page 120
Fig. 5.1. The left cerebral hemisphere with the planum temporale highlighted. From Neural bases of speech, hearing,
and language (Figure 9-2), by D. P. Kuehn, M. L. Lemme, & J. M. Baumgartner, 1989, San Antonio, TX: Pro-Ed.
Copyright 1989 by Pro-Ed. Adapted with permission.
ric arrangement in which the planum temporale on the left is bigger than that on the right, with the larger size thought to
reflect greater involvement in language processing. The atypical symmetry results from a typically sized left planum
temporale and a larger-than-usual right planum temporale. In the only autopsy study conducted to date for a single child
with SLI, this same atypical symmetry was observed (Cohen, Campbell, & Yaghmai, 1989).
Similar asymmetries, with left- larger than right-hemisphere perisylvian areas, have also been identified in autopsy
studies performed on individuals who did not have SLI during their lifetimes (e.g., Geschwind & Levitsky, 1968;
Teszner, Tzavaras, Gruner, & Hecaen, 1972). The perisylvian areas, rather than the smaller plana temporale, became the
focus of a series of studies conducted by Plante and her colleagues (Plante, 1991; Plante, Swisher, & Vance, 1989;
Plante, Swisher, Vance, & Rapcsak, 1991). In those studies, Plante and her colleagues compared the relative size of these
areas between hemispheres and between family members who were affected or unaffected by SLI. The researchers
focused on the perisylvian areas rather than the plana temporale because of limitations in the use of MRI (Plante, 1996)—
a technique that was nonetheless highly desirable because it could be used even on very young, live participants.
The researchers found that children with SLI and their families demonstrated perisylvian areas that were larger on the
right than those typically seen in studies of individuals without SLI or a known family history of SLI (Plante, 1991;
Plante et al., 1989, 1991). These larger right perisylvian areas were sometimes associated with symmetry across
hemispheres and sometimes with asymmetries favoring the right hemisphere. Nonetheless, because some individuals with atypical
configurations did not show language impairment, and others with normal configurations did show such impairment, this
structural difference cannot be seen as a single cause of language impairment. In a 1996 review of this literature, Plante
noted that the absence of abnormal findings for some individuals may simply be due to the insensitivity of MRI
techniques to subtle differences in brain structure. Nonetheless, her argument does not really explain the instances in
which identified atypical structures are associated with normal language performance.
Page 121
Furthermore, Plante, as well as other researchers in the field (Leonard, 1998; Rice, 1996; Watkins, 1994), believes
that a number of factors probably need to be in place for structural brain differences to culminate in language impairment.
More recent studies have looked not only at the perisylvian areas but also at other brain structures for differences that
may help researchers better understand SLI (e.g., Clark & Plante, 1998; Gauger et al., 1997; Jackson & Plante, 1996).
Whereas many of these have been regions in or close to the perisylvian region (e.g., Clark & Plante, 1998; Jackson &
Plante, 1996), others have looked at much larger areas of the cerebrum (Jernigan, Hesselink, Sowell & Tallal, 1991), at
the extensive tract of nerve fibers connecting the two cerebral hemispheres (Cowell, Jernigan, Denenberg & Tallal, 1994,
cited in Leonard, 1998), and at areas including the ventricles (Trauner, Wulfeck, Tallal, & Hesselink, 1995). All of these
studies found at least some differences (Cowell et al., 1994).
In a recent review of these studies and others using behavioral and neurophysiological data, Leonard (1998) summarized
the evidence as indicating that the high percentage of atypical neurobehavioral findings for children with specific
language impairment implicates a “constitutional basis” that may contribute to the presence of language impairment. The origins
of these suspected differences in brain structure lead to other kinds of questions about causes, such as environmental
factors.
Environmental Variables
Environmental variables can encompass physical, social, emotional, or other aspects of the developing child’s
surroundings from conception onward. Two types of environmental variables, however, have received the greatest
amount of attention for SLI—(a) variables constituting the social and linguistic environment in which children with SLI
are acquiring their language (Leonard, 1998) and (b) demographic variables, such as parental education, birth order, and
family socioeconomic status (SES), that affect that environment in less direct ways (Tomblin, 1996b).
A particularly engaging and clear account of the literature examining conversational environment of children with SLI
can be found in Leonard (1998, chap. 8). In the literature examining this type of environmental influence (e.g.,
Bondurant, Romeo, & Kretschmer, 1983; Cunningham, Siegel, van der Spuy, Clark, & Bow, 1985), most studies have
focused on the nature and linguistic content of conversations occurring between children with SLI and their parents.
Usually, comparisons are made to conversations between parents and their normally developing children (age-matched
or language-matched, depending on the study). In addition, in order to clarify “chicken-or-the-egg” speculation about the
direction of causation (i.e., Are differences in conversation causing children’s problems or resulting from them?), studies
have also examined conversations between children with SLI and unrelated adults (Newhoff, 1977) and even with other
children (e.g., Hadley & Rice, 1991).
Despite abundant methodological variation and conflicting patterns of empirical findings, Leonard (1998) ventured a few
generalizations about this line of investigation. First, most of the evidence
in which children with SLI are compared with control children who are similar in age suggests that their
Page 122
conversation partners (parents, other adults, and peers alike) make allowances for their diminished language skills and
are thus reacting to, rather than causing, the children’s problems. For example, Cunningham et al. (1985) found that
mothers of children with SLI interacted similarly to mothers of control children of similar ages in conditions of free
play, but asked fewer questions during a structured task. In addition, for those children with SLI whose comprehension
and production were both affected, mothers reduced their length of utterance, something that was not done by mothers
whose children were either normally developing or had SLI in which only expressive language was affected.
Second, Leonard (1998) contended that in studies where children with SLI are compared with younger children who are
similar in language characteristics, findings are less consistent in showing differences. Nonetheless, the most reliable
difference in how each group is spoken to by their parents involves the frequency with which recasts are used. A recast is
a restatement of a child’s production using grammatically correct structures, often incorporating morphosyntactic forms
that had been omitted or produced in error by the child (Leonard, 1998). Recent research has suggested that this
conversational strategy is used frequently by parents of normally developing children at earlier stages, but is then faded
over time. It has also been shown to be a useful therapeutic strategy (Nelson, Camarata, Welsh, Butkovsky, & Camarata,
1996). Interestingly, Leonard noted that rather than increasing their use of this kind of statement with children with SLI
as might be expected in compensation, parents of children with SLI use it less frequently than those of children without
SLI. Despite the possible value of additional research in clarifying why this difference is seen, all in all, this line of
research has not proven as productive to the understanding of the genesis of SLI as was once hoped (Leonard, 1998).
Turning to possible clues in the form of demographic variables, Tomblin (1996b) searched for risk factors in
demographic data obtained from the preliminary results (consisting of 32 children with SLI and 114 controls) of a larger
epidemiological study (planned to include 200 children with SLI and 800 controls). Specifically, he looked for
associations between demographic and biological data and the presence of SLI. Among the variables he examined
relative to the home environment were parent education, family income, and birth order of the child in the family.
Although there were trends in the direction of children with SLI being later born and having parents with fewer years of
education than unaffected children, neither of these trends was significant. Tomblin speculated that the two trends may
both have stemmed from the extent to which lower incomes are associated with larger families.
Also available to Tomblin (1996b) were data concerning exposure to biological risk factors including maternal infection
or illness, medication, use of alcohol, and use of tobacco during pregnancy, as well as the evidence of potential trauma at
birth and the participants’ birthweights. In these preliminary data at least, Tomblin found no differences between the
groups relative to maternal infection and illness during pregnancy, and actually found lower, but nonsignificant rates of
exposure to alcohol and medication. Birth histories and birthweights also did not differ significantly. Only maternal
smoking showed a trend towards higher levels among the children with SLI. Although attributing the lack of significant
findings to the relatively small sample sizes used,
Page 123
Tomblin also suggested that the larger numbers associated with the completed study would be unlikely to reveal effect
sizes of any practical importance, where effect size refers to the magnitude of the difference between groups.
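The notion of effect size can be made concrete with a brief sketch. The scores below are hypothetical, and Cohen's d is just one common effect-size index; it expresses the difference between two group means in units of their pooled standard deviation:

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d: standardized mean difference between two groups,
    computed with the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (n - 1 in the denominator)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical standard scores for a control group and an SLI group
control = [95, 98, 92, 97, 93]
sli = [82, 85, 88, 90, 84]
print(round(cohens_d(control, sli), 2))  # 3.18
```

A value near 0 would mean the groups barely differ relative to their spread, which is why a large completed sample can reach statistical significance while the effect itself remains trivially small.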
Clearly, findings across several lines of research suggest the need for the continuation and coordination of efforts to
understand the complexity of variables that put children at risk for SLI. Although neurologic and genetic research
findings have been particularly exciting over the past two decades, these variables are not sufficient by themselves to
explain SLI. Biological and environmental factors represent important frontiers for a more complete understanding of
language impairment (Snow, 1996). At a different level of explanation, linguistic and cognitive accounts attempt to
provide more immediate explanations for the specific patterns of language behaviors seen in SLI and their variability
across children and over varying ages.
Linguistic and Cognitive Accounts
A large number of linguistic accounts of SLI as well as cognitive accounts have been advanced over the past several
decades. At present, more than a dozen warrant serious consideration (Leonard, 1998). As a group, these accounts
deserve some attention here because of their potential impact on assessment and treatment of children for whom SLI is
suspected or confirmed.
As discussed in previous chapters, the validity of an assessment tool chiefly turns on the extent to which it captures the
construct being measured. Consequently, different models of SLI imply the need for different measures. In practice,
however, the link between theoretical understandings of a complex behavior and readily available assessment procedures
is usually far from direct. This is particularly true when there are a large number of competing accounts but no clear
front-runners—the current case for SLI. In addition, the term accounts, used here as Leonard uses it, acknowledges that
these formulations do not yet tie together the breadth of data typically associated with the term theories. Despite these
limitations, some familiarity with these competing accounts can help readers
anticipate future trends in both theoretical efforts and in recommended assessment practice.
Leonard (1998) reviewed a wide field of linguistic and cognitive explanations of SLI, dividing them into three
categories. Specifically, he considered six explanations focusing on deficits in linguistic knowledge, three focusing on
limitations in general processing capacity, and three focusing on specific processing deficits. Because of space limitations, each of
these twelve accounts cannot be discussed in detail here. Instead, a small subset will be used to introduce readers to this
complex debate and illustrate the challenges awaiting researchers and clinicians who seek to translate these accounts into
assessment practice.
Language Knowledge Deficit Accounts
Leonard (1998) argued that Chomsky’s (1986) principles and parameters framework for language acquisition can be seen
as a foundation for the major accounts in which deficits in linguistic knowledge are postulated as central to SLI.
Stemming
Page 124
from transformational grammar of the 1960s and 1970s, ‘‘principles” represent universals of natural languages, and
“parameters” the dimensions along which individual languages differ. Children are presumed to work within the
constraints associated with universal principles to acquire the specific knowledge of the parameter settings associated
with their ambient language. Chief among the difficulties facing children in this process is the apparent need to
understand more than just the surface relations existing between words in sentences as they are heard. Rather, they must
also understand the underlying, or inferred, relationships between lexical categories (e.g., noun, verb, adjective) and
functional categories that explain relationships between words within sentences (e.g., complementizer, inflection,
determiner).
Differences in the accounts that Leonard (1998) placed within this category lie primarily in which area of linguistic
knowledge is absent or, more often, incomplete in children with SLI. Leonard himself and several colleagues are
associated with accounts in which knowledge of functional categories overall is deemed incomplete (Loeb & Leonard,
1991; Leonard, 1995). Alternatively, Rice, Wexler, and Cleave (1995) are associated with the extended optional
infinitive account, in which children with SLI are thought to remain too long in a developmental phase in which tense is
treated as optional. Other accounts see children with SLI as unable to develop implicit grammatical rules (Gopnik,
1990), as developing rules that are too narrow in their application (e.g., Ingram & Carr, 1994), or as lacking the ability to
understand different agreement or dependent relationships existing between functional categories (e.g., Clahsen, 1989;
van der Lely, 1996).
Among the significant challenges facing these accounts is their need to provide more complete explanations of the
variability in developmental patterns shown by children with SLI and of crosslinguistic differences in the error patterns
and development of children with SLI. In addition, despite emerging efforts to tie linguistic accounts to genetic,
biological, and environmental accounts (e.g., Gopnik & Crago, 1991), further steps in that direction are needed.
Accounts Positing General Processing Deficits
General processing deficit accounts of SLI place general deficits in cognitive processing at the core of SLI, with the most
ambitious of them holding these deficits responsible for both the linguistic and nonlinguistic differences seen in children
with SLI (Leonard, 1998). Rather than assume that specific cognitive mechanisms are affected—as is done in the third
and final category of accounts—these accounts postulate a more widespread deficiency offering a simpler, more elegant
explanation of the patterns of deficits seen in children with SLI. Typically, such accounts tend to describe central
cognitive deficits in terms of reductions in processing capacity or speed.
Such accounts are particularly compelling for explanations of difficulties in word recall and retrieval and comprehension
as well as nonlinguistic cognitive deficits, but must also explain the special difficulties associated with morphosyntax in
most English-speaking children with SLI. Among the numerous researchers cited by Leonard (1998) as working on
accounts of this type are Ellis Weismer (1985), Bishop (1994), Edwards and Lahey (1996; Lahey & Edwards, 1996) as
well as Leonard himself.
Page 125
Leonard’s surface hypothesis (e.g., Leonard, 1989; Leonard, Eyer, Bedore, & Grela, 1997) represents one of the most
thoroughly probed of the general processing deficit accounts and, consequently, serves here as an important exemplar of
such accounts. The surface hypothesis suggests that differences in the pattern of deficits observed crosslinguistically in
children with SLI may be due to differences in language structure across languages. Such differences are thought to lead
to differences in processing demands rather than to the impaired grammatical systems posited by linguistic accounts.
This account emphasizes the importance of surface features of languages, such as the physical properties of English
grammatical morphology, that may represent special challenges to children, particularly to those with reduced
processing capabilities.
According to the surface hypothesis, children with SLI will take longer to acquire the more difficult aspects of their
language and may focus their processing efforts in some areas at the expense of others (e.g., on word order at the
expense of morphology). Among those features of a language that are considered particularly vulnerable are those that
are relatively brief, uncommon in languages of the world, or less regular within the language (e.g., numerous
grammatical morphemes in English). Leonard (1998) provides a thorough description of the successes and failures of
this account in explaining an ever expanding body of empirical data from several language groups. Further, he shows its
basic compatibility with other processing limitation accounts that emphasize reduced speed of processing.
As with the grammatical knowledge accounts, accounts that posit general processing deficits have a wide range of
crosslinguistic data to address, including patterns of errors and of acquisition in children with SLI. Further, the
appeal of such accounts in terms of simplicity is enhanced if they can also address similar data for children without
impaired language. Add to that the desirability of addressing emerging data on the genetic and biologic factors
associated with SLI, and it is small wonder that consensus leading to a unified theory of SLI eludes the research
community at this time. The last of the three types of accounts Leonard describes wrestles with this same list of
empirical challenges but proposes cognitive limitations that are more specific in nature.
Specific Processing Deficit Accounts of SLI
According to Leonard (1998), three accounts have focused on specific deficits as responsible for far-reaching
consequences for language function. Respectively, these accounts hypothesize deficits in phonological memory (Ellis
Weismer, Evans, & Hesketh, 1999; Gathercole & Baddeley, 1990), in temporal processing (Tallal, 1976; Tallal &
Piercy, 1973; Tallal, Stark, Kallman, & Mellits, 1981), and in the mechanisms used for grammatical analysis (Locke,
1994). These accounts are less well developed than the linguistic and general cognitive deficit accounts in terms of the
breadth of data they encompass.
Of these accounts, those associated with temporal processing (viz., Stark & Tallal, 1988; Tallal et al., 1996) have
had the greatest recent impact, including considerable attention in the popular press (e.g., in a USA Today article [Levy,
1996]).
Page 126
This attention has largely been the result of the popularization of a specific training program called Fast ForWord
(Scientific Learning Corporation, 1998).
After a long history of work on SLI, Tallal joined with Michael Merzenich and others to conduct a series of remarkable
treatment studies (Merzenich et al., 1996; Tallal et al., 1996). In those studies use of Fast ForWord, a computer training
program designed to address hypothesized processing difficulties, resulted in significant gains in language performance
and auditory processing. Development of that program was based on evidence that children with SLI have difficulty
processing brief stimuli or stimuli that follow one another in rapid succession—difficulties that might significantly affect
a child’s ability to process speech. Further, the program is based on the hypothesis that the deficit can be ameliorated by
exposing children with SLI to stimuli that are initially recognizable but acoustically altered through the lengthening of
formant transitions. During treatment, children participate in a large number of video-game-like trials in which they are
required to make judgements about the altered stimuli. Across trials, the stimulus characteristics are altered in the
direction of natural speech.
Readers are encouraged to take note of the debate surrounding this account and the commercialization it has fostered (e.
g., Gillam, 1999; Veale, 1999). Ironically, the authors of the other accounts discussed in this section of the chapter
appear to have taken greater pains to tie together a huge number of empirical clues about the nature of SLI. However, it is
rare to find the public so aware of an account—or at least the treatment program associated with it—and to clamor for its
use with children presenting with a wide range of communication-related disorders (including reading disabilities and
autism). These public responses alone make it a fascinating area of additional investigation for clinicians and researchers
interested in children’s language disorders. Independent validation of this treatment and its theoretical underpinnings has
yet to be provided (Gillam, 1999).
What’s Ahead for Accounts of SLI?
In this section, I have tried my best to point out the most important landmarks of this vast and changing terrain (helped
considerably by the work of Leonard, 1998, and the urgings of Bernard Grela to address these complex issues).
However, I am certain that I have missed some important vantage points and critical roadways. Nonetheless, I hope that
this brief overview provides you with a sense of the complexities facing these researchers.
The researchers working on this topic have immense amounts of data to address if they are to settle on a truly
comprehensive theory, rather than fragmented accounts of isolated aspects of SLI. Not only must they deal with
information about how children with SLI perform on a range of language and nonlanguage tasks, they must do so for the
wide range of spoken languages and across the life span. Further, they must tie these together with the burgeoning
findings about the genetics, brain structures, and social contexts of children with SLI.
Other challenges facing researchers interested in SLI have been summarized by Tager-Flusberg and Cooper (1999), who
reviewed the findings of a recent National Institutes of Health workshop focused on steps needed to produce clear
definitions of SLI
Page 127
for genetic study. Despite the narrow focus of that conference, the recommendations that came out of it appear germane
to thoughts about the relation of theory to assessment practices. Among the recommendations summarized by Tager-
Flusberg and Cooper are that researchers abandon exclusionary definitions of SLI, broaden the language domains and
information-processing skills they assess, and develop a standard approach to defining SLI, not only in preschoolers but
also in older school-age children, adolescents, and adults. These same recommendations are clinically relevant insofar as
combining clinical and research efforts may result in the greatest gains in both arenas.
Special Challenges in Assessment
In addition to the theoretical challenges to the assessment of children with SLI, these children also come with a range of
personal reactions to testing that are at least partially determined by the amount of success they expect. Any of us who
has difficulty in certain areas, such as singing, drawing, or playing sports, knows how uncomfortable we feel when our
performance in those areas is evaluated. Consequently, reviewing the general guidelines in chapter 3 will serve as a
useful exercise in preparing to work with children with SLI.
Beyond the personal dynamics that should always be a special consideration in assessment, children with SLI present
several problems related to how they are identified as needing help. Plante (1998) pointed out at least three problems
with how such children have been identified by researchers. Some of Plante’s concerns about the literature also face
clinicians. Even those that do not still deserve the attention of knowledgeable consumers of this research literature.
First, Plante (1998) argued, researchers have tended to use criteria for nonverbal IQ (often nonverbal IQ of 85 or greater)
that exclude not only children with mental retardation but also large numbers of others whose lower intelligence makes
them no less relevant to our understanding of SLI. Second, Plante noted that in the identification process, researchers have
tended to use tests and cutoff scores on those tests that have not been shown to successfully identify children with the
disorder. Specifically, she questioned two particular aspects of the validity of those tests and cutoffs: their sensitivity (the
extent to which individuals with disorders are actually identified as having the disorder) and specificity (the extent to
which individuals without disorders are successfully identified as such). (See chap. 9 for more complete explanations of
these concepts).
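Sensitivity and specificity are simple proportions, as a brief sketch can show. The screening outcomes below are entirely hypothetical and serve only to illustrate the two definitions:

```python
def sensitivity_specificity(results):
    """Compute sensitivity and specificity from (has_disorder, test_positive) pairs.

    Sensitivity: proportion of individuals WITH the disorder whom the test flags.
    Specificity: proportion of individuals WITHOUT the disorder whom the test clears.
    """
    true_pos = sum(1 for has, flagged in results if has and flagged)
    false_neg = sum(1 for has, flagged in results if has and not flagged)
    true_neg = sum(1 for has, flagged in results if not has and not flagged)
    false_pos = sum(1 for has, flagged in results if not has and flagged)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical screening outcomes: (actually has the disorder, scored below cutoff)
outcomes = ([(True, True)] * 18 + [(True, False)] * 2 +
            [(False, False)] * 75 + [(False, True)] * 5)
sens, spec = sensitivity_specificity(outcomes)
print(sens, spec)  # 0.9 0.9375
```

Note that the two indices trade off against each other as the cutoff score moves, which is why Plante's criticism targets tests whose cutoffs were never validated against either one.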
Third, Plante (1998) questioned the use of discrepancy or difference scores in the practice often referred to as cognitive
referencing. Cognitive referencing occurs when the identification of SLI hinges on the demonstration of a specific
difference between expected language function (based on nonverbal IQ) and language performance. Plante attacked this
practice on two grounds: (a) because of a tendency for such comparisons to be based on age-equivalent scores, which are
the targets of a long history of criticism from psychometric perspectives (e.g., see chap. 2) and (b) because there is no
good evidence to support the use of nonverbal IQ as an indicator of language potential. As just one example of this lack
of evidence, Krassowski and Plante (1997) reported a lack of stability in the performance IQ scores of 75 children with
SLI over a 3-year time frame that would be inconsistent with their use as a constant
Page 128
measure of language potential. Plante and her colleagues are joined by large numbers of the community of language
researchers in finding serious—many would say fatal—flaws with cognitive referencing (e.g., Aram et al., 1993; Fey et
al., 1994; Kamhi, 1998; Lahey, 1988). Along with the instability of categorizations obtained through cognitive
referencing, others have noted that similar amounts of improvement in specific treatments are made by children who
would fall on both sides of conventional cognitive criteria (e.g., Fey et al., 1994).
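The arithmetic behind cognitive referencing can be sketched as follows; the required gap and the standard scores are hypothetical, and the sketch is meant only to illustrate how the criticized practice operates, not to endorse it:

```python
def qualifies_by_cognitive_referencing(nonverbal_iq, language_score, required_gap=15):
    """Identify SLI only when language performance falls a fixed amount below
    the expectation set by nonverbal IQ (the discrepancy practice criticized above)."""
    return (nonverbal_iq - language_score) >= required_gap

# Hypothetical standard scores: identical language performance, different nonverbal IQs
print(qualifies_by_cognitive_referencing(100, 80))  # True: qualifies for services
print(qualifies_by_cognitive_referencing(88, 80))   # False: same language score, no services
```

The second case makes the objection concrete: two children with the same language score can be classified differently, and because performance IQ itself is unstable over time, the same child could move in and out of eligibility on retesting.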
Even readers who have simply skimmed earlier chapters on their way to this one will recognize certain common
dilemmas facing clinicians as well as researchers regarding cognitive referencing. Thus, for example, both groups need
to be as careful as possible to select measures that have been studied very carefully for the purpose to which they are
being used. That is, evidence of criterion-related validity for how and with whom measures are used is something in
which both clinicians and researchers have a prodigious stake. In addition, both groups should avoid the relatively
unreliable and misleading nature of age-equivalent scores—insofar as they are able to do so. The “wiggle room” left by
that last clause stems from the fact that clinicians may find themselves compelled to use age-equivalent scores by the
settings in which they work, particularly for younger children. With regard to cognitive referencing, Casby (1992) noted
that in 31 states, eligibility for services based on SLI demands its use in some form. In such situations, an ethical and
sensible recommendation would be to provide the required documentation (i.e., to go ahead and report the cognitive-
referenced information, age-equivalent scores, or both), but accompany it with appropriate warnings about the
limitations of each and recommendations from a more scientifically supportable perspective.
In a discussion of problems of differential diagnosis in SLI, Leonard (1998) called attention to a further difficulty
associated with the assessment of children considered at risk for the disorder. Specifically, he called attention to the
difficulty in distinguishing late talkers, who will ultimately prove to be simply late in developing language, from those
children whose late talking foretells persisting problems in language acquisition. Most children with SLI have a history
of late talking (which is usually defined in terms of late use of words). However, only one quarter to one half of late
talkers will go on to be diagnosed with a language disorder. Developing accurate predictions of which children are
showing early signs of SLI has spurred the efforts of a number of researchers who hope that early identification will lead
to effective and efficient early intervention (e.g., Paul, 1996; Rescorla, 1991).
Unfortunately, the dramatic variability in children’s normal language development is proving a considerable obstacle.
Thus, reliable signs yielding reasonably accurate predictions have evaded researchers, leading Leonard (1998) to
recommend withholding diagnoses until at least age 3 and Paul (1996) to advise a “watch and see” policy. A differing
interpretation of the data on which Paul’s recommendations are based, one that includes a plea for more aggressive
intervention, can be found in van Kleeck, Gillam, and Davis (1997).
Also urging more aggressive responses to late-talking children, Olswang, Rodriguez, and Timler (1998) represent a
somewhat more optimistic reading of the research evidence. Specifically they outlined speech and language differences
and other risk factors that they propose should prompt decisions to intervene. Table 5.2 summarizes their list; they recommended that larger numbers of risk factors be viewed as cause for greater concern.
Page 129
Table 5.2
Predictors and Risk Factors Useful in Helping Clinicians Decide Whether to Enroll Toddlers Who Are Late Talkers for Intervention
Predictors
Language production: Small vocabulary for age; few verbs; preponderance of general all-purpose verbs (e.g., want, go, get, do, put, look, make, got); more transitive verbs (e.g., John hit the ball); few intransitive and ditransitive verb forms (e.g., he sleep, doggie run)
Play: Primarily manipulating and grouping; little combinatorial and/or symbolic play
Otitis media: Prolonged periods of untreated otitis media
Language comprehension: Presence of 6-month comprehension gap; large comprehension-production gap with comprehension deficit
Gestures: Few communicative gestures, symbolic gestural sequences, or supplementary gestures
Heritability: Family member with persistent language and learning problems
Phonology: Few prelinguistic vocalizations; limited number of consonants; limited variety in babbling structure; less than 50% consonants correct (substitution of glottal consonants and back sounds for front); restricted syllable structure; vowel errors
Social skills: Behavior problems; few conversational initiations; interactions with adults more than peers; difficulty gaining access to activities
Parent needs: Parent characteristics (low SES; directive more than responsive interaction style); extreme parent concern
Imitation: Few spontaneous imitations; reliance on direct model and prompting in imitation tasks of emerging language forms
Note. From “Recommending Intervention for Toddlers With Specific Language Learning Difficulties: We May Not Have All the Answers, but We Know a Lot,” by L. Olswang, B. Rodriguez, & G. Timler, 1998, American Journal of Speech-Language Pathology, 7, p. 29. Copyright 1998 by American Speech-Language-Hearing Association. Reprinted with permission.
Page 130
Expected Patterns of Language Performance
The language performance of children with SLI has undergone greater scrutiny than that of any other group of children
with language difficulties. The diversity and depth of this research over several decades leads to some clear expectations
of areas in which difficulties can be expected, but also to the ubiquitous expectation that each child will be different.
Therefore, before I delve into expected patterns of difficulties, I should mention again that general expectations lead to
hypotheses about what might be expected in a given child—not infallible certainties. Generalizations also fail to capture either the variations found in studies identifying distinct subtypes of SLI (e.g., Aram & Nation, 1975; Rapin & Allen, 1988; Wilson & Risucci, 1986) or the changes in patterns of impairment that occur with age (e.g., Aram,
Ekelman, & Nation, 1984; Stothard, Snowling, Bishop, Chipchase, & Kaplan, 1998; Tomblin, Freese, & Records, 1992).
Further, these generalizations have been identified for children acquiring English—a potentially serious limitation for
clinicians working with children acquiring other less-studied languages (Leonard, 1998). Thus, the expected patterns
discussed here are described only briefly and are meant to prompt consideration of likely areas of difficulty, not to
become the only ones given attention.
Among the more robust findings from studies examining language skills in English-speaking children with SLI have
been the findings that (a) expressive and receptive language are often differentially impaired, and (b) degree of
involvement can vary from quite mild to quite severe. Also, expressive language tends to be more frequently and
severely affected—an observation that is borne out in much of the literature and is also reflected in the DSM–IV
(American Psychiatric Association, 1994) definition shared at the beginning of the chapter. Recent research, however,
suggests that this disparity may not be as large as has sometimes been thought. Among the children who were found to
have impaired language in a report dealing with a large epidemiological study, Tomblin (1996a) identified 35% of
children with expressive problems, 28% with receptive problems, and 35% with both expressive and receptive problems
(given a cutoff of 1.25 standard deviations below the mean).
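For context on such a criterion, under a normal distribution a cutoff of 1.25 standard deviations below the mean flags roughly the lowest 10–11% of scorers on any single measure. A minimal sketch of that arithmetic (the helper function and framing here are mine, not drawn from Tomblin's report):

```python
# Sketch: the proportion of a normally distributed population expected to
# fall below a cutoff of z standard deviations from the mean.
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Proportion of a standard normal distribution falling below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A cutoff 1.25 SD below the mean, as in Tomblin (1996a).
cutoff = -1.25
proportion_below = normal_cdf(cutoff)
print(f"Expected proportion below {cutoff} SD: {proportion_below:.3f}")  # ~0.106
```

This illustrates why cutoff choice matters for identification rates: a stricter cutoff (e.g., 2 SD) would flag only about 2% of children on a single measure, while 1.25 SD flags about five times as many.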
In Table 5.3, specific areas of difficulty relative to normally developing peers are summarized on the basis of an
extensive review of the literature appearing in Leonard (1998; cf. Menyuk, 1993; Watkins, 1994). The density of comments falling under language production in Table 5.3 reflects not only the tendency for this modality to be affected by more obvious and often more severe deficits than comprehension, but also a tendency for it to have received much greater
research attention. A related table, Table 5.4, lists specific grammatical morphemes that have been identified as
particularly problematic.
As you examine Table 5.3, notice that many—although not all—of the differences shown by children with SLI resemble
patterns seen in younger children and are therefore characterized as delays. This observation may have implications
related to the nature of this disorder. In addition, it supports the reasonableness of approaching
Page 131
Table 5.3
Patterns of Oral Language Impairment by Modality and Domain Reported in Children With Specific Language
Impairment (SLI) (Leonard, 1998)
Semantics
Lexical abilities and early word combinations: Delays in acquiring first words and word combinations; delays in verb acquisition, with overuse of some common verbs (e.g., do, go, get, put, want); word-finding difficulties,ᵃ especially noted in school-age children; deficits in learning to understand new words, particularly those involving actions
Argument structure: Increased tendency to omit obligatory arguments (e.g., omission of object for transitive verb) or even the verb itself; increased tendency to omit optional but semantically important information (e.g., adverbials providing information regarding time, location, or manner of action) and use of an infinitival complement (e.g., He wants to do this); increased difficulty in acquiring argument structure information from syntactic information for new verbs
Grammatical morphologyᵇ
Grammatical morphology constitutes a relative and sometimes enduring weakness in children with SLI (see Table 5.4 for a list of grammatical morphemes that have received particular attention); grammatical morphology related to verbs is especially affected; errors most often consist of omissions rather than inappropriate use, but are likely to be inconsistent in either case; limited research suggests poorer comprehension of grammatical morphemes, especially those of shorter duration, and poorer identification of errors involving grammatical morphemes
Phonology
Although occasionally occurring alone, phonological deficits are almost always accompanied by other language deficits, and vice versa; delays are most frequently seen, with most errors resembling those of younger normally developing children; unusual errors in productionᶜ occur rarely, but probably more often than in normally developing children; greater variability in production than children without SLI at similar stages of phonological development
(Continued)
Page 132
Table 5.3 (Continued)
Pragmatics
Some evidence of pragmatic difficulties; although these difficulties largely seem due to communication problems posed by other language deficits, independent pragmatic deficits may occur as well; participation in communication is negatively affected when communication involves adults or multiple communication partners; limited research suggests that understanding of the speech acts of others may be affected; comprehension of figurative language (e.g., metaphors, idioms) can be affected
Narratives
Cohesion of narratives can be affected, and sometimes expected story components are absent; comprehension of narratives can be affected when inferences need to be drawn from the literal narrative content
ᵃ Evidenced by unusually long pauses in speech, frequent circumlocution, or frequent use of nonspecific words such as it and stuff.
ᵇ Grammatical morphology can be defined as “the closed-class morphemes of language, both the morphemes seen in inflectional morphology (e.g., ‘plays,’ ‘played’) and derivational morphology (e.g., ‘fool,’ ‘foolish’), and function words such as articles and auxiliary verbs” (Leonard, 1998, p. 55).
ᶜ Among the unusual errors reported for this population are later developing sounds being used in place of earlier developing sounds, sound segment additions, and use of sounds not heard in the child’s ambient language.
treatment goals from a developmental perspective (Leonard, 1998). Also, notice the expanse of unmapped country
revealed here. Despite several decades of work, much remains unknown about the abilities of children with SLI and how
they are related to one another. Consequently, the potential for valuable outcomes from experimental exploration is
immense!
Finally, on a very different note, readers of this table may find that their knowledge of some terminology related to
linguistic descriptions of these children’s difficulties is outdated or incomplete. They are referred to Hurford (1994) as a
reference guide to the more basic grammatical terms.
Related Problems
When compared with children described in other sections of this book, children with SLI have far fewer related
problems. Despite the more restricted nature of their difficulties, however, children with SLI are at increased risk for a
number of significant, ongoing problems in addition to a lengthening list of subtle perceptual and cognitive deficiencies
that were described briefly earlier. Among these are increased risk for emotional, behavioral, and social difficulties. In
addition, there is increased risk for ongoing academic difficulties often associated with diagnoses of learning disabilities
(Wallach & Butler, 1994).
Page 133
Table 5.4
Examples of Grammatical Morphemes, an Area of Special Difficulty for Children With Specific Language Impairment
(SLI)
Inflectional morphemes
Past tense, regular –ed: walked; irregular: slept, flew, hid
Third-person singular –s: sits, runs
Progressive –ing: is running, is seeing
Plural –s: coats, flowers
Possessive ‘s (also called genitive ‘s): Sam’s, dog’s
Other grammatical morphemes
Copula be: he is a boy; they are happy
Auxiliary be: she is hunting, he was cooking
Auxiliary do: I don’t hate you; Do you remember that man?
Articles: the man; a cat
Pronouns: anything, herself, I, he, they, them, her
Tracy, a 10-year-old with Down syndrome, attends a regular classroom, where her voice often rings out as she expresses
exuberant enthusiasm for all the fun things that happen. Tracy speaks in short sentences that are frequently difficult to
understand. Although she sometimes shows considerable frustration with others’ not understanding her, most of the time
Tracy appears oblivious to their lack of understanding. A speech-language pathologist works with her on goals related
to syntax and intelligibility, usually within the classroom.
Seth, a 4-year-old with cerebral palsy and epilepsy as well as mental retardation, attends a special preschool classroom
irregularly because of his frequent illnesses. In the classroom, he spends much of his time in a wheelchair or adaptive
seat, which was designed to provide him with the postural support needed for him to control his head movements. In
addition to working with him in the classroom, a speech-language pathologist visits his home once a week to work with
Seth and his mother. Seth vocalizes infrequently and often seems unaware of others in his environment. Goals for him
Page 147
include establishing nonverbal turn-taking skills and increasing the frequency of his vocalizations.
Jake is a 12-year-old boy with mild mental retardation associated with fetal alcohol syndrome. Although his
comprehension skills test within the normal range, and he is generally understandable in his language production, Jake
has considerable difficulty in following directions in school. He has been diagnosed with ADD and requires frequent
redirecting to stay involved in classroom activities. Although he is eager to establish friendships with his classmates, his
ability to use social cues to guide his communications appears inconsistent. Intervention for Jake includes individual attention within the classroom and participation in a social skills group with the speech-language pathologist once per week.
Defining the Problem
Tracy, Seth, and Jake are representative of the approximately 3% of school-age children in the United States who exhibit
problems associated with mental retardation (Roeleveld, Zielhuis, & Gabreels, 1997), where mental retardation can be
defined as reduced intelligence accompanied by reduced adaptive functioning, that is, reduced ability to function in
everyday situations in a manner considered culturally and developmentally appropriate. Because communication is a
particularly important adaptive function affected by mental retardation, speech-language pathologists often work with
affected children and their families.
About 85% of children with mental retardation experience mild problems (Lubetsky, 1990) and may not be identified as
mentally retarded until they reach school age. Children with more significant degrees of impairment are often identified
at an earlier point because their delays in achieving developmental milestones are more pronounced and because they
often have additional medical difficulties, such as cerebral palsy or epilepsy (Durkin & Stein, 1996). Although mental retardation is usually present from birth, it can also result from conditions occurring up to age 18, including exposure to environmental toxins such as lead over the first few years of life.
Despite the brief definition offered earlier, formulating a more complete, usable definition of mental retardation that is
equally acceptable to families, advocates, scientists, clinicians, and politicians has proved controversial and difficult—
some would say impossible—particularly where milder forms of retardation are concerned (Baumeister, 1997; Roeleveld
et al., 1997). Table 6.1 provides two of the most influential definitions currently being used—those proposed by the
American Association on Mental Retardation (AAMR) and the American Psychiatric Association.
The AAMR and American Psychiatric Association definitions both specify impairment in adaptive skills as a critical
element in the identification process. Traditionally, IQ score alone, with less attention to adaptive skills, was central to
the identification process. These two newer definitions address essentially the same adaptive skills (viz., communication,
self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and
work). Despite this uniformity, however, these definitions are still quite controversial because of significant concerns
Page 148
Table 6.1
Two Influential Definitions of Mental Retardation
about the lack of valid measures for many adaptive skill areas (e.g., Jacobson & Mulick, 1996; Macmillan & Reschly,
1997) and because of debates about the number of dimensions needed to capture adaptive functioning (Simeonsson &
Short, 1996).
Although not evident in Table 6.1, the complete AAMR and American Psychiatric Association definitions differ sharply
in their handling of severity. Whereas the American Psychiatric Association definition maintains a traditional treatment
of severity using a system with five levels (see Table 6.2), the AAMR system (Luckasson, 1992) replaces those with the
description of levels of support needed by the individual (intermittent, limited, extensive, and pervasive) for intellectual
ability and for each adaptive skill separately. Because treatment recommendations are often formulated on the basis of
severity (Durkin & Stein, 1996), this change in the AAMR definition represents a major departure from long-standing
practice.
Table 6.2
Degrees of Severity of Mental Retardation Used
by the American Psychiatric Association (DSM–IV, 1994)
Degree IQ Level
Category: Heredity (5% of cases of mental retardation)
Specific conditions: Inborn errors of metabolism (e.g., Tay-Sachs disease); single-gene abnormalities (e.g., tuberous sclerosis); chromosomal aberrations (e.g., fragile X syndrome, a small number of cases of Down syndrome)
Category: Early alterations of embryonic development (30% of cases)
Specific conditions: Chromosomal changes (most cases of Down syndrome, i.e., those due to trisomy 21); prenatal damage due to toxins (e.g., maternal alcohol consumption, infections)
Category: Pregnancy and perinatal problems (10% of cases)
Specific conditions: Fetal malnutrition, prematurity, hypoxia (oxygen deficiency), viral and other infections, and trauma
Category: General medical conditions acquired in infancy or childhood (5% of cases)
Specific conditions: Infections, traumas, and poisoning (e.g., due to lead)
Among the most basic facts of genetics is that all cells in the human body except the reproductive cells (sperm in men and ova in women) contain 23 pairs of chromosomes. These 23 chromosome pairs consist of 22 pairs of numbered autosomes and 1 pair of sex chromosomes, which are identified as XX for women and
XY for men. These chromosomes, which hold many individual genes, act as the blueprints for cell function and thus
determine an individual’s physical make-up.
Unlike other human cells, ova and sperm cells have half the usual number of chromosomes—23 unpaired chromosomes consisting of 22 autosomes and one sex chromosome. During the reproductive process, this feature of reproductive cells allows each parent to
contribute one half of each offspring’s genetic material as the genetic materials of both reproductive cells are combined
during fertilization. Because chromosomes contain numerous genes, defects to either the larger chromosomes or to
individual genes can result in impaired cellular function during embryonic development and later life.
Down syndrome is an example of an autosomal genetic disorder in which extra genetic material is found at chromosome
pair 21. This condition arises about once in every 800 live births, making it the most common genetic disorder associated
with mental retardation. About 95% of the time, Down syndrome occurs because an entire
Page 151
Fig. 6.1. Graphic representation of the genetic test used to identify the presence of trisomy 21. From Babies With Down Syndrome: A New Parents’ Guide (p. 8), by K. Stray-Gunderson (Ed.), 1986, Kensington, MD: Woodbine House.
Copyright 1986 by Woodbine House. Reproduced with permission.
extra chromosome is present, resulting in the individual possessing three copies of chromosome 21, known as trisomy 21, instead of the normal pair (Bellenir, 1996). Figure 6.1 illustrates the complete set of
chromosomes associated with a girl who has Down syndrome.
Less frequently, Down syndrome is associated with only a portion of an extra chromosome occurring at chromosome 21
or with the occurrence of an entire extra chromosome 21, but only in some cells within the body (termed mosaic Down
syndrome). Usually the chromosomal defect occurs during the development of an individual ovum, but it can occur
because of a sperm defect or a defect occurring after the uniting of the sperm and ovum in fertilization. Because of this timing of the change in the genetic material, Down syndrome is described as a genetic disorder, but not as an inherited one, in which both parent and child would be affected.
Down syndrome is associated with a characteristic physical appearance, involving slanted eyes, small skin folds on the
inner corner of the eyes (epicanthal folds), slightly protruding lips, small ears, an overly large tongue (macroglossia), and
short hands, feet, and trunk (Bellenir, 1996). Figure 6.2 shows two young children with this syndrome.
Other more serious physical anomalies found among children with Down syndrome affect the cervical spine, bowel,
thyroid, eyes, and heart (Cooley & Graham, 1991). Children with Down syndrome are more susceptible to infection,
including otitis media
Page 152
Fig. 6.4. Two youngsters with fetal alcohol syndrome. From Fetal Alcohol Syndrome: Diagnosis, Epidemiology,
Prevention, and Treatment (Figure 1-1, p. 18), by K. Stratton, C. Howe, & F. Battaglia (Eds.), 1996, Washington, DC:
National Academy Press. Copyright 1996 by National Academy Press. Reproduced with permission.
Page 156
Despite the implications that these cases involve social or experiential bases, there is considerable speculation that
nonorganic cases of mental retardation may actually reflect our current lack of knowledge rather than truly nonorganic
causes (Baumeister, 1997; Richardson & Koller, 1994). Many cases now identified as nonorganic may be recategorized as the relationships of low SES and family history to exposure to environmental toxins (e.g., lead), to poor nutrition, and to other ultimately organic causes are uncovered. The one major, truly nonorganic factor associated with mental retardation
is severe social deprivation, as a result of either inadequate institutional conditions or limitations of a child’s principal
caregiver (Richardson & Koller, 1994). Yet even that mechanism may act by depriving the infant’s maturing nervous
system of the proper inputs to promote specific physiological states required for brain development.
Special Challenges in Assessment
One of the most important things to keep in mind when trying to understand any child is his or her uniqueness—the
uniqueness of current strengths and weaknesses, history, and family situation. Most important is the need to remember the uniqueness that makes them “Tracy” or “Seth” or “Jake,” rather than just the child with a particular syndrome and pattern of deficits. Assessing children with mental retardation tempts some individuals to equate them with their level of retardation or its etiology and to attend to what they cannot do rather than to what they are doing in their communications. Personal Perspective 6 hints at the negative effects of such a mistake.
PERSONAL PERSPECTIVE
The following passage is taken from a book written by a pair of young adult friends who have each been diagnosed with Down syndrome. The title of their book is Count Us In: Growing Up With Down Syndrome (Kingsley & Levitz, 1994, p. 35).
August ‘90
Mitchell: I wish I didn’t have Down syndrome because I would be a regular person, a regular mainstream normal
person. Because I didn’t know I had Down syndrome since a long time ago, but I feel very special in many ways. I feel
that being with, having Down syndrome, there’s more to it than I expected. It was very difficult but… I was able to
handle it very well.
Jason: I’m glad to have Down syndrome. I think it’s a good thing to have for all people that are born with it. I don’t
think it’s a handicap. It’s a disability for what you’re learning because you’re learning slowly. It’s not that bad. (p. 35)
How do you avoid these temptations? First, plan assessments based on initial hypotheses about developmental levels and patterns of impairment (which will be described in the next section) and on information obtained from caregivers or others who know
Page 157
the child well. Framing the assessment questions with special clarity can help you anticipate the particular challenges
individual children might pose to the validity of conventional instruments.
Second, prepare to alter your plan as needed to keep the child engaged and interacting. Not only does this mean that you
may need to turn away from a standardized instrument midstream (e.g., if it is developmentally inappropriate) in favor of
a more informal or dynamic assessment method (see chap. 10), you may also want to consider the use of adaptations.
Test adaptations are changes made in the test stimuli, response required of the child, or testing procedures (Stagg, 1988;
Wasson, Tynan, & Gardiner, 1982). On the one hand, the use of test adaptations threatens the validity of norm-
referenced comparisons that may be made using the instrument. Therefore, if a clinical question that really requires that
kind of comparison is at stake (e.g., an initial evaluation in which a difference from norms must be demonstrated to help
a child receive services), the clinician will avoid adaptations if possible. On the other hand, when some aspect of the
standard administration other than the basic skill or knowledge being tested interferes with a child’s ability to reveal his
or her actual skill or knowledge, one can argue that the validity of the comparison has already been severely
compromised. Table 6.4 lists some of the most common adaptations used. Regardless of which adaptations are used, they
should be described in reports of test results and the clinician should com-
Table 6.4
Examples of Testing Adaptations Used Frequently
With Children With Mental Retardation and Frequent Coexisting Problems (Stagg, 1988)
Down syndrome
Communication strengths: Semantics (Rondal, 1996); pragmatics (e.g., turn-taking, diversity of speech acts; Rondal, 1996); nonverbal social interaction skills (Hodapp, 1996)
Communication weaknesses: Morphology (Fowler, 1990; Rondal, 1996); syntax (Fowler, 1990; Rondal, 1996); phonology (Rondal, 1996); expressive skills relative to receptive skills (Dykens, Hodapp, & Leckman, 1994); plateauing of development in above areas from late childhood on (Rondal, 1996); auditory processing (Hodapp, 1996); nonverbal requesting behavior (Hodapp, 1996); increased risk of hearing loss (Bellenir, 1996); increased risk of fluency disorder (Bloodstein, 1995)
Other strengths: Adaptive behavior (Hodapp, 1996); “pleasant personality” (Hodapp, 1996)
Other weaknesses: Low task persistence (Hodapp, 1996); mathematics (Hodapp, 1996); inadequate motor organization (Hodapp, 1996); visually directed reaching (Hodapp, 1996); visual monitoring; hypotonia (Hodapp, 1996); slow orienting to auditory information (Hodapp, 1996)
Fragile X syndromeᵃ
Communication strengths: Expressive vocabulary skills (Rondal & Edwards, 1997); possibly syntax (although sometimes grammar has been identified as a weakness; Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997)
Communication weaknesses: Fluency abnormalities (e.g., perseverative and staccato speech, rate of speech, cluttering; Rondal & Edwards, 1997); pragmatics, especially poor eye contact and other autistic-like behaviors (Rondal & Edwards, 1997); phonology, difficulty in sequencing syllables (Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997)
Other strengths: Adaptive skills (especially in personal and domestic skills; Dykens, Hodapp, & Leckman, 1994)
Other weaknesses: Attention deficits and hyperactivity (Dykens, Hodapp, & Leckman, 1994); social avoidance and shyness (Dykens, Hodapp, & Leckman, 1994)
(Continued)
Page 160
Table 6.5 (Continued)
Fetal alcohol syndrome and fetal alcohol effect
Communication strengths: Most areas of language relatively unaffected
Communication weaknesses: Comprehension; pragmatics (e.g., frequently tangential responses; Abkarian, 1992)
Other strengths: Cognitive delays, when present, are usually mild
Other weaknesses: Attentional problems or hyperactivity (Stratton, Howe, & Battaglia, 1996); increased risk for visual and hearing problems (Stratton, Howe, & Battaglia, 1996); increased risk for behavior problems (Stratton, Howe, & Battaglia, 1996)
Williams syndromeᵇ
Communication strengths: Expressive language (Rondal & Edwards, 1997); morphology and syntax (Rondal & Edwards, 1997); lexical knowledge (Rondal & Edwards, 1997); metalinguistic knowledge (Rondal & Edwards, 1997); fluency, prosody (Rondal & Edwards, 1997); narrative skills (Rondal & Edwards, 1997); phonological skills (Rondal & Edwards, 1997)
Communication weaknesses: Receptive language (Udwin & Yule, 1990); pragmatics skills (socially inappropriate content, poor eye contact; Rondal & Edwards, 1997)
Other strengths: Facial recognition (Rondal & Edwards, 1997)
Other weaknesses: Severe visuospatial deficits (Rondal & Edwards, 1997); hyperacusis (abnormal sensitivity to noise), especially in younger children
ᵃ Patterns relate almost entirely to affected males because of the paucity of data on affected females.
ᵇ Patterns based on a very limited database.
Page 161
because of its rarity.) Because the four groups of children described in Table 6.5 have experienced very different levels of scrutiny, they differ in the certainty with which these strengths and weaknesses are known (Hodapp & Dykens, 1994).
Specifically, children with Down syndrome have received much more attention than those with fragile X, who have, in
turn, received considerably more attention than those with Williams syndrome or FAS. Interestingly, there has even been some work suggesting that the specific type of chromosomal abnormality underlying Down syndrome results in different prognoses for communication outcomes, with better communication skills predicted for children with mosaic Down syndrome than for those with the more common trisomy 21 (Rondal, 1996).
Related Problems
Children with mental retardation are at risk for a variety of additional health-related and social problems, particularly if
the retardation is more severe (American Psychiatric Association, 1994). For example, two medical conditions that occur
frequently among children with severe or profound mental retardation are epilepsy and cerebral palsy, with expected occurrence rates of 19–36% for epilepsy and 20–40% for cerebral palsy (Richardson & Koller, 1994).
Overall, children with mental retardation, regardless of etiology, appear to be at four times the normal risk level for
ADHD, although there is some question as to whether their attention problems are really manifestations of mental
retardation rather than an independent additional problem (Biederman, Newcorn, & Sprich, 1997). Other behavioral and
emotional problems are also observed more frequently among individuals with mental retardation than among others, including conduct disorder, anxiety disorders, psychotic disorders, and depression (Eaton & Menolascino, 1982).
Often, the etiology of mental retardation is closely associated with risk levels for particular problems. For example,
different kinds of visual problems are found in children with Down syndrome than in children with fragile X syndrome.
Whereas children with Down syndrome will frequently experience nearsightedness and cataracts (Connor & Ferguson-
Smith, 1997; Lubetsky, 1990), children with fragile X syndrome will more commonly have strabismus, a problem in the
coordination of eye movements (Maino, Wesson, Schlange, Cibis, & Maino, 1991).
Children with developmental and speech delays have also been found to be at increased risk for maltreatment, including
physical abuse, sexual abuse, and neglect (Sandgrund, Gaines, & Green, 1974; Taitz & King, 1988). Given the close
contact that speech-language pathologists frequently have with their clients, this increased incidence of maltreatment
makes it particularly important for them to be aware of signs of maltreatment (Veltkamp, 1994).
Summary
1. Mental retardation, which affects about 3% of children in the United States, involves reduced intelligence and reduced
adaptive functioning.
2. More severe levels of mental retardation (i.e., moderate, severe, and profound) are often diagnosed relatively early,
but are relatively uncommon, affecting only 15%
Page 162
of those children diagnosed with mental retardation. Mild mental retardation affects about 85% of children with mental
retardation but tends to be diagnosed later—sometimes not until school age.
3. Definitions of mental retardation proposed by the AAMR and the American Psychiatric Association differ primarily in
their characterization of severity, with the AAMR definition proposing levels of support needed for numerous
intellectual and adaptive functions in place of levels of impairment.
4. Increasingly, organic factors, as opposed to familial or nonorganic factors, are being identified as reasonable
explanations for cases of mental retardation. The three most common organic causes of mental retardation are Down
syndrome, fragile X syndrome, and FAE.
5. Down syndrome and fragile X syndrome are the most frequent genetic sources of mental retardation. Down syndrome
is almost always associated with a chromosomal abnormality, whereas fragile X syndrome is associated with an error
involving a single gene on the X chromosome.
6. FAS, which is usually associated with mild mental retardation, is considered the most frequent preventable cause of
mental retardation.
7. Assessment challenges include the need for particularly careful selection of developmentally appropriate instruments,
increased need for less formal measures because of a lack of appropriate standardized measures, and the need to adapt
tests to help ensure that aspects of the child’s difficulties that are unrelated to the concept being tested are not preventing
successful performance.
8. Expected patterns of communication performance are related to level of mental retardation and to etiology.
Key Concepts and Terms
adaptive functioning: reduced ability to function in everyday situations in a manner considered culturally and
developmentally adequate.
autosomes: the most common type of chromosome within the human cell. They are usually contrasted with the sex
chromosomes, which typically consist of a single pair (XX for women and XY for men).
chromosomes: structures within human cells that carry the genes that act as blueprints for cell function.
dementia: a significant decline in intellectual function, usually after a period of normal intellectual function.
discrepancy testing: the comparison of performances in two different behavioral or skill areas (e.g., between ability and
achievement) to determine whether a discrepancy exists; often used as a requirement for services in education systems.
Down syndrome: an autosomal genetic disorder that is considered the most common genetic abnormality resulting in
mental retardation. It is associated with mild to severe mental retardation and particularly marked difficulties with syntax
and phonology.
Page 163
fetal alcohol effect (FAE): a diagnosis related to FAS, in which some but not all of the abnormalities required for a
diagnosis of FAS are observed.
fetal alcohol syndrome (FAS): the constellation of physical abnormalities, deficient growth patterns, and cognitive and
behavioral problems found in children with a significant prenatal exposure to alcohol.
fragile X syndrome: the most common inherited cause of mental retardation; it is related to an X-chromosome
abnormality that may be passed through several generations before becoming severe enough to result in mental
retardation. The syndrome more commonly affects men than women.
mental retardation: reduced intelligence accompanied by reduced adaptive functioning.
mosaic Down syndrome: an uncommon form of Down syndrome occurring in less than 5% of cases, in which trisomy 21
affects only some rather than all cells in the body.
out-of-level testing: the use of an instrument developed for children whose age differs from that of the child to be tested
(Berk, 1984).
premutation: a gene that is somewhat defective but not associated with significant abnormalities, as can happen in
families where fragile X syndrome is subsequently identified.
sex chromosomes: gene-bearing chromosomes associated with gender-related characteristics; these are related to
numerous birth defects in which patterns of transmission appear to be affected by gender.
strabismus: a problem in eye movement coordination, sometimes referred to as crossed eyes.
trisomy 21: the most typical chromosomal abnormality in Down syndrome, consisting of a third chromosome 21.
Williams syndrome: a congenital metabolic disease usually associated with moderate to severe learning difficulties.
Study Questions and Questions to Expand Your Thinking
1. What are the major common components of the definitions of mental retardation provided in this chapter?
2. Describe three possible co-occurring problems that may affect the communication and test-taking behaviors of a child
with mental retardation.
3. What is the most common inherited cause of mental retardation? What is the most common preventable cause?
4. Determine the definition for mental retardation used in a school system near you. How does that definition compare to
those of the AAMR and the American Psychiatric Association?
5. One test of adaptive skills that is frequently used is the Vineland Adaptive Behavior Scales (Sparrow, Balla, &
Cicchetti, 1984). Examine that measure in terms
Page 164
of items related to communication. What language domains (e.g., semantics, syntax, morphology, pragmatics) and what
language modalities (speaking, listening, writing, reading) are emphasized?
6. Using a format like that used in Table 6.5, identify a syndrome not described in this chapter (e.g., Prader-Willi
syndrome, cri du chat) and prepare a brief list of expected patterns of language and communication.
7. Examine the test manual of a language test to determine (a) what, if anything, is said about the appropriateness of the
measure for a child with mental retardation, and (b) what aspects of one or more tasks included in the test might be
incompatible with the characteristics of the following children:
● a child with severe cerebral palsy and moderate retardation whose only reliable response mode is a slow, effortful
pointing response;
● a child with mild retardation but severe attention and motivational problems; and
● a child with Down syndrome who has moderate retardation and a severe visual impairment.
Recommended Readings
Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand
Oaks, CA: Sage.
Hersen, M., & Van Hasselt, V. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A
casebook. Newbury Park, CA: Sage.
Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular.
Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents’ guide. Kensington, MD: Woodbine
Press.
References
Abkarian, G. (1992). Communication effects of prenatal alcohol exposure. Journal of Communication Disorders, 25(4),
221–240.
American College of Medical Genetics. (1997). Policy statement on fragile X syndrome: Diagnostic and carrier testing
[On-line]. Available: http://www.faseb.org/genetics/acmg/pol-16.htm.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington,
DC: Author.
Batshaw, M. L., & Perret, Y. M. (1981). Children with handicaps: A medical primer. Baltimore: Brookes Publishing
Company.
Baumeister, A. A. (1997). Behavioral research: Boom or bust? In W. E. MacLean, Jr. (Ed.), Ellis’ handbook of mental
deficiency, psychological theory and research (3rd ed., pp. 3–45). Mahwah, NJ: Lawrence Erlbaum Associates.
Baumeister, A. A., & Woodley-Zanthos, P. (1996). Prevention: Biological factors. In J. W. Jacobson & J. A. Mulick
(Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 229–242). Washington, DC: APA.
Bellenir, K. (1996). Facts about Down syndrome. In Genetic disorders handbook (pp. 3–14). Detroit, MI: Omnigraphics.
Bellugi, U., Marks, S., Bihrle, A., & Sabo, H. (1993). Dissociation between language and cognitive functions in
Williams syndrome. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 177–
189). Mahwah, NJ: Lawrence Erlbaum Associates.
Page 165
Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: Thomas.
Biederman, J., Newcorn, J. H., & Sprich, S. (1997). Comorbidity of attention-deficit/hyperactivity disorder. In T. A.
Widiger, A. J. Frances, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM–IV sourcebook. Washington, DC:
American Psychiatric Association.
Bloodstein, O. (1995). A handbook on stuttering (5th ed.). San Diego, CA: Singular.
Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic Fragile X males. Developmental Brain Dysfunction, 8,
252–269.
Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Connor, M., & Ferguson-Smith, M. (1997). Essential medical genetics (5th ed.). Oxford, England: Blackwell.
Cooley, W. C., & Graham, J. M. (1991). Common syndromes and management issues for primary care physicians: Down
syndrome—An update and review for the primary pediatrician. Clinical Pediatrics, 30(4), 233–253.
Cromer, R. (1981). Reconceptualizing language acquisition and cognitive development. In R. L. Schiefelbusch & D. D.
Bricker (Eds.), Early language: Acquisition and intervention. Baltimore: University Park Press.
Donaldson, M. D. C., Shu, C. E., Cooke, A., Wilson, A., Greene, S. A., & Stephenson, J. B. (1994). The Prader-Willi
syndrome. Archives of Disease in Childhood, 70, 58–63.
Downey, J., Ehrhardt, A. A., Gruen, R., Bell, J. J., & Morishima, A. (1989). Psychopathology and social functioning in
women with Turner syndrome. Journal of Nervous and Mental Disease, 177, 191–201.
Durkin, M. S., & Stein, Z. A. (1996). Classification of mental retardation. In J. W. Jacobson & J. A. Mulick, (Eds.),
Manual of diagnosis and professional practice in mental retardation (pp. 67–73). Washington, DC: APA.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand
Oaks, CA: Sage.
Eaton, L. F., & Menolascino, F. J. (1982). Psychiatric disorder in the mentally retarded: Types, problems, and
challenges. American Journal of Psychiatry, 139, 1297–1303.
Fowler, A. E. (1990). Language abilities in children with Down syndrome: evidence for a specific syntactic delay. In D.
Cicchetti & M. Beeghly (Eds.), Children with Down syndrome (pp. 302–328). Cambridge, England: Cambridge
University Press.
Fox, R., & Wise, P. S. (1981). Infant and preschool reinforcement survey. Psychology in the Schools, 18, 87–92.
Gottlieb, M. L. (1987). Major variations in intelligence. In M. I. Gottlieb & J. E. Williams (Eds.), Textbook of
developmental pediatrics (pp. 127–150). New York: Plenum.
Grossman, H. J. (Ed.). (1983). Classification in mental retardation. Washington, DC: American Association on Mental
Deficiency.
Hersen, M., & Van Hasselt, V. B. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A
casebook. Newbury Park, CA: Sage Publications.
Hodapp, R. M. (1996). Cross-domain relations in Down’s syndrome. In J. A. Rondal, J. Perera, L. Nadel, & A.
Comblain (Eds.), Down’s syndrome: Psychological, psychobiological, and socio-educational perspectives (pp. 65–79). San
Diego, CA: Singular.
Hodapp, R. M., & Dykens, E. M. (1994). Mental retardation’s two cultures of behavioral research. American Journal on
Mental Retardation, 98, 675–687.
Hodapp, R. M., & Zigler, E. (1997). New issues in the developmental approach to mental retardation. In W. E. MacLean,
Jr. (Ed.), Ellis’ handbook of mental deficiency, psychological theory and research (3rd ed., pp. 1–28). Mahwah, NJ:
Lawrence Erlbaum Associates.
Hodapp, R. M., Leckman, J. F., Dykens, E. M., Sparrow, S. S., Zelinsky, D. G., & Ort, S. I. (1992). K-ABC profiles of
children with fragile X syndrome, Down syndrome, and nonspecific mental retardation. American Journal on Mental
Retardation, 97, 39–46.
Jacobson, J. W., & Mulick, J. A. (Eds.) (1996). Manual of diagnosis and professional practice in mental retardation.
Washington, DC: APA.
Page 166
Kingsley, J., & Levitz, M. (1994). Count us in: Growing up with Down syndrome. New York: Harcourt Brace.
Lehrke, R. G. (1972). A theory of X-linkage of major intellectual traits. American Journal of Mental Deficiency, 76, 611–
619.
Lubetsky, M. J. (1990). Diagnostic and medical considerations. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological
aspects of developmental and physical disabilities: A casebook (pp. 25–53). Newbury Park, CA: Sage Publications.
Luckasson, R. (1992). Mental retardation: Definition, classification, and systems of support. Washington, DC: American
Association on Mental Retardation.
Macmillan, D. L., & Reschly, D. J. (1997). Issues of definition and classification. In W. E. MacLean, Jr. (Ed.), Ellis’
handbook of mental deficiency, psychological theory and research (3rd ed., pp. 47–71). Mahwah, NJ: Lawrence Erlbaum
Associates.
Maino, D. M., Wesson, M., Schlange, D., Cibis, G., & Maino, J. H. (1991). Optometric findings in the fragile X syndrome.
Optometry and Vision Science, 68, 634–640.
Maxwell, L. A., & Geschwint-Rabin, J. (1996). Substance abuse risk factors and childhood language disorders. In M. D.
Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 235–271). New York: Thieme.
Mervis, C. B. (1998). The Williams syndrome cognitive profile: Strengths, weaknesses, and interrelations among
auditory short-term memory, language, and visuospatial constructive cognition. In E. Winograd, R. Fivush, & W. Hirst
(Eds.), Ecological approaches to cognition. Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, J. F., & Chapman, R. (1984). Disorders of communication: Investigating the development of language of mentally
retarded children. American Journal of Mental Deficiency, 88, 536–545.
Richardson, S. A., & Koller, H. (1994). Mental retardation. In I. B. Pless (Ed.), The epidemiology of childhood disorders
(pp. 277–303). New York: Oxford University Press.
Roeleveld, N., Zielhuis, G. A., & Gabreels, F. (1997). The prevalence of mental retardation: A critical review of the
literature. Developmental Medicine and Child Neurology, 39, 125–132.
Rondal, J. A. (1996). Oral language in Down’s syndrome. In J. A. Rondal, J. Perera, L. Nadel, & A. Comblain (Eds.),
Down’s syndrome: Psychological, psychobiological, and socio-educational perspectives (pp. 99–117). San Diego, CA:
Singular.
Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular.
Rosenberg, S., & Abbeduto, L. (1993). Language and communication in mental retardation: Development, processes,
and intervention. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sandgrund, A., Gaines, R., & Green, A. (1974). Child abuse and mental retardation: A problem of cause and effect.
American Journal of Mental Deficiency, 79, 327–330.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author.
Simeonsson, R. J., & Short, R. J. (1996). Adaptive development, survival roles, and quality of life. In J. W. Jacobson &
J. A. Mulick (Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 137–146). Washington,
DC: APA.
Sparks, S. N. (1993). Children of prenatal substance abuse. San Diego, CA: Singular.
Sparrow, S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American
Guidance Service.
Stagg, V. (1988). Clinical considerations in the assessment of young handicapped children. In T. D. Wachs & R.
Sheehan (Eds.), Assessment of young developmentally disabled children (pp. 61–73). New York: Plenum.
Stratton, K., Howe, C., & Battaglia, F. (1996). Fetal alcohol syndrome: Diagnosis, epidemiology, prevention, and
treatment. Washington, DC: National Academy Press.
Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents’ guide. Kensington, MD: Woodbine
Press.
Taitz, L. S., & King, J. M. (1988). A profile of abuse. Archives of Disease in Childhood, 63, 1026–1031.
Udwin, O., & Yule, W. (1990). Expressive language of children with Williams syndrome. American Journal of Medical
Genetics-Supplement 6, 108–114.
Page 167
Veltkamp, L. J. (1994). Clinical handbook of child abuse and neglect. Madison, CT: International Universities Press.
Wasson, P., Tynan, T., & Gardiner, P. (1982). Test adaptations for the handicapped. San Antonio, TX: Education
Service Center, Region 20.
Zigman, W. B., Schupf, N., Zigman, A., & Silverman, W. (1993). Aging and Alzheimer’s disease in people with mental
retardation. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19, pp. 41–70). New York:
Academic Press.
Page 168
CHAPTER
7
Suspected Causes
Related Problems
Andrew is a 4-year-old who rarely speaks or vocalizes. He also fails to respond or make eye contact when others speak
to him. He has some activities he will engage in incessantly, such as spinning parts of a toy truck or twirling his fingers
in front of his eyes. Andrew has epileptic seizures almost daily, is not yet toilet trained, rises early in the morning and
awakens once or twice each night—problems that provide additional stress to his caring, beleaguered parents. He was
initially identified as having severe to profound mental retardation and has more recently been identified as having
Autistic Disorder.
Peter is a 12-year-old who speaks infrequently and often appears to ignore remarks directed to him by others. He
occasionally repeats the full text of a television commercial containing words he neither uses nor appears to understand
in other contexts. Peter’s expressive and receptive language, as measured through standardized tests, appears delayed; his
vocal intonation sounds unmodulated in pitch; and he rarely
Page 169
seems able to practice the give-and-take required for conversation. Although Peter was initially identified as having
autism, he has recently been diagnosed as having pervasive developmental disorder not otherwise specified.
Amelia is a 10-year-old girl who was considered normal in her development of language until her extreme difficulty in
using language for communication was noticed when she entered preschool. Despite having near-normal language
abilities on standardized measures, her need for sameness and her difficulty in engaging in social interaction make her a
very solitary child. She performs best in school subjects such as mathematics and geography, which appear to interest
her greatly. Her problems have been tentatively identified as associated with Asperger’s Disorder.
Defining the Problem
Autistic spectrum disorder, the diagnostic category that encompasses many of the problems of Andrew, Peter, and
Amelia, is found in 0.02 to 0.05% of the population, or in about 2 to 5 of every 10,000 people (American Psychiatric
Association, 1994). Recently, somewhat higher estimates have suggested as many as 10 to 14 of every 10,000
individuals (Trevarthen, Aitken, Papoudi, & Robarts, 1996). Even with these higher estimates, autistic spectrum disorder
is relatively rare. The magnitude of its impact on affected children and their families, however, has caused it to be the
focus of considerable research and clinical writing. Its impact stems from the severity of symptoms, which include
delayed or deviant language and social communication and abnormal ways of responding to people, places, and objects.
There is also some evidence to suggest that it is becoming more prevalent (Wolf-Schein, 1996; cf. Trevarthen et al.,
1996).
About 75% of children with autism are diagnosed with mental retardation as well (Rutter & Schopler, 1987), with about
50% reportedly having IQs less than 50 and fewer than 33% having IQs greater than 70 (Waterhouse, 1996). There is
great uncertainty associated with these figures, however, because the diagnosis of mental retardation is often
questionable given the difficulty these children have in participating in formal assessment procedures (Wolf-Schein,
1996).
In the influential DSM–IV system of nomenclature (American Psychiatric Association, 1994), autistic spectrum disorder
is referred to as Pervasive Developmental Disorder (PDD), a category that includes autistic disorder, Rett’s disorder,
childhood disintegrative disorder, Asperger’s disorder, and pervasive developmental disorder not otherwise specified
(PDD-NOS) (Waterhouse, 1996). Readers should be aware that an alternative and somewhat more complicated set of
diagnoses related to autism has been formulated by the World Health Organization (WHO) in the International
Classification of Diseases (ICD; WHO, 1992, 1993), although it is not discussed here.
Autistic disorder is sometimes referred to as Kanner’s autism or infantile autism and is the most common of the spectrum
disorders. Its symptoms are similar to those of the other disorders within the PDD category, including severe delays in
‘‘reciprocal social interaction skills, communication skills, and the presence of stereotyped behavior, interests and
activities” (American Psychiatric Association, 1994, p. 65). Although
Page 170
children with autistic disorder share many characteristics with children with other PDD disorders, the primary focus of this
chapter is children with autistic disorder and their surprising degree of heterogeneity with regard to levels of cognitive
function, language outcomes, and specific symptoms (Hall & Aram, 1996; Myles, Simpson, & Becker, 1995). The
considerable differences within this single disorder are illustrated by the range of difficulties described at the outset of
the chapter in relation to Peter and Andrew.
The American Psychiatric Association (1994) definition for Autistic Disorder is presented in Table 7.1. Besides calling
attention to these children’s very marked problems in social interaction and language, this definition emphasizes the
abnormal and
Table 7.1
A Definition of Autistic Disorder (American Psychiatric Association, 1994)
A. A total of six (or more) items from (1), (2), and (3), with at least two from (1) and one each from (2) and (3):
(1) Qualitative impairment in social interaction, as manifested by at least two of the following:
(a) marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze, facial expression,
body postures, and gestures to regulate social interaction;
(b) failure to develop peer relationships appropriate to developmental level;
(c) a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a
lack of showing, bringing, or pointing out objects of interest);
(d) lack of social or emotional reciprocity.
(2) Qualitative impairments in communication as manifested by at least one of the following:
(a) delay in, or total lack of, the development of spoken language (not accompanied by an attempt to
compensate through alternative modes of communication such as gestures or mime);
(b) in individuals with adequate speech, marked impairment in the ability to initiate or sustain a conversation
with others;
(c) stereotyped and repetitive use of language or idiosyncratic language;
(d) lack of varied, spontaneous make-believe play or social imitative play appropriate to developmental level.
(3) Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as manifested by at least
one of the following:
(a) encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is
abnormal either in intensity or focus;
(b) apparently inflexible adherence to specific, nonfunctional routines or rituals;
(c) stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-
body movements);
(d) persistent preoccupation with parts of objects.
B. Delays or abnormal functioning in at least one of the following areas, with onset prior to age 3 years: (1) social
interaction, (2) language as used in social communication, or (3) symbolic or imaginative play.
C. The disturbance is not better accounted for by Rett’s syndrome or Childhood Disintegrative Disorder.
Note. From Diagnostic and Statistical Manual of Mental Disorders (4th ed., pp. 70–71) by the American Psychiatric
Association, 1994, Washington, DC: Author. Copyright 1994 by the American Psychiatric Association. Adapted with
permission.
Page 171
often rigid pattern of interaction with objects and other aspects of their environment that is characteristic of children with
autism. In this definition, the onset is specified as being prior to age 3 because of the variety of ages at which marked
changes in development are reported: Although many children are described by their parents as having always been
distant and unresponsive, others are described as having responded to social interaction normally until age 1 or 2
(American Psychiatric Association, 1994; Prizant & Wetherby, 1993).
Difficulties in defining autistic disorder arise from the remarkable heterogeneity of children with the disorder and from
the extent to which their problems overlap with those associated with other developmental disorders and with mental
retardation (Carpentieri & Morgan, 1996; Nordin & Gillberg, 1996; Waterhouse et al., 1996). Table 7.2 lists the other
disorders included within PDD and the characteristics that are thought to distinguish autistic disorder from them.
A number of researchers (e.g., Rapin, 1996; Waterhouse, 1996; Wing, 1991) have explored common features across
specific disorders included within PDD and have suggested that frequent changes in terminology and clinical categories
are likely to continue as more is learned about these children (Waterhouse, 1996). In particular, considerable research
has recently been devoted to defining the boundaries between Asperger’s syndrome and autistic disorder in individuals
with higher measured IQs (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991).
The overlap between mental retardation and autistic spectrum disorder also presents major challenges to researchers and
clinicians. As mentioned earlier, about 75% of children with autistic spectrum disorder are diagnosed with mental
retardation. In addition, the severity of mental retardation appears to be related to the frequency of autistic symptoms.
For example, in one recent Swedish study (Nordin & Gillberg, 1996a), autistic spectrum disorder was identified in about
12% of children with mild retardation, whereas it was identified in 29.5% of those with severe retardation. The fact that
not all children with mental retardation show autistic symptoms, however, suggests that much more needs to be done to
understand the relationship of these two conditions. Increased understanding of the nature of the relationship between
mental retardation and the specific cognitive deficits associated with autistic spectrum disorder should help improve the
quality of care directed to children with these combined difficulties.
Additional difficulties in diagnosis arise because the symptoms associated with autistic disorder change with age,
although currently there is considerable disagreement over the nature and direction of those changes (i.e.,
improvement vs. decline; e.g., see Eaves & Ho, 1996; Piven, Harper, Palmer, & Arndt, 1996). Despite possible changes
over time, however, it is rare for individuals diagnosed as autistic in childhood to enter adulthood without significant
residual problems (e.g., see Piven et al., 1996). A personal experience with an acquaintance in graduate school—who in
retrospect would probably have been identified as having Asperger’s disorder and whom I will call Matthew Metz—
captures this generality for me: Although Matthew would eventually complete a Ph.D. in history, he invariably greeted
members of our graduate house he saw on campus with an introduction—“Hi, you may not remember me, but my name
is Matthew Metz.” This greeting persisted despite months of having
Page 172
Table 7.2
Differentiating Autistic Disorder From Other Disorders Within the Autistic Spectrum Disorder
(Called Pervasive Developmental Disorders, PDD, by the American Psychiatric Association, 1994)
Rett’s disorder
● An autosomal disorder affecting only women (probably no men are identified because of fetal mortality)
● Normal pattern of early physical and motor development, with later loss of skills and deceleration in head growth
● Associated with severe or profound mental retardation and limited language skills
● Characteristic hand movements (“wringing” or “washing” of hands)
● Differences in sex ratios (female only, versus predominantly male in autism)
● Head growth slows after infancy only in Rett’s disorder; autistic disorder may actually be associated with an abnormally
large head circumference (Waterhouse et al., 1996)
● Social interaction difficulties are more persistent into late childhood in autism than in Rett’s disorder
● Differentiation from autism depends on good evidence of normal development during the first two years; otherwise, the
autism categorization is preferred
Childhood disintegrative disorder
● Marked regression after at least 2 years of seemingly normal development
● Social, communication, and behavioral characteristics similar to autism
● Usually associated with severe mental retardation
● Very rare disorder, possibly more common in men than women
Asperger’s disorder
● Preserved language function in the presence of ‘‘severe and sustained impairment in social interaction” (p. 75)
● Restricted, repetitive patterns of behavior, interests, and activities (e.g., pronounced interest in train schedules)
● Absence of significant language and cognitive deficits in Asperger’s disorder, but very significant delays in autism
● Except for social communication deficits, adaptive skills are developmentally appropriate in Asperger’s disorder, but not
in autism
● Asperger’s disorder is typically diagnosed later than autism, often at school age, possibly due to later onset than autism
Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS)
● Severe and pervasive impairment in social interaction, and/or verbal and nonverbal communication, and/or presence of
restricted, repetitive patterns of behavior, interests, and activities
● Failure to meet the specific criteria required for the other PDD categories described above with regard to severity of
symptoms or age of onset
● Onset or symptoms failing to conform to criteria for other PDD, including autism
● Sometimes referred to as “atypical autism”
Page 173
shared dinners at a common table with the acquaintances he addressed. As you may expect, Matthew had a very
restricted social sphere that was largely confined to fellow students in his graduate program. When I last heard of him, he
was living with his elderly parents and earned a limited income by writing entries on historical subjects for publishers of
an encyclopedia. Thus, even in the presence of the intellectual abilities required for completion of a graduate degree,
significant challenges for Matthew persisted well into adulthood.
Suspected Causes
To date, discussions of etiology for autistic spectrum disorder have focused on socioenvironmental, behavioral, and
purely organic possibilities (Haas, Townsend, Courchesne, Lincoln, Schreibman, & Yeung-Courchesne, 1996;
Waterhouse, 1996; Wolf-Schein, 1996). The socioenvironmental perspective had strong proponents in the 1960s,
especially among psychoanalysts who held that poor parenting was the source of these children’s difficulties (e.g.,
Bettelheim, 1967). More recently, however, such theories have lost favor with almost all researchers and clinicians.
Currently, the dominant perspective on autism is that it has one or more organic bases in the form of underlying
neurological abnormalities.
The nature of neurologic abnormalities underlying autism has not yet been well documented and constitutes a major area
of research (Rapin, 1996). Proposed sites of suspected neurologic abnormalities are the frontal lobe (Frith, 1993), the
reticular formation of the brain stem (Rimland, 1964), and the cerebellum (Courchesne, 1995)—just to name a few (cf.
Cohen, 1995; Wolf-Schein, 1996). In addition, the role that the right hemisphere of the brain plays in autistic symptoms
has received some attention (e.g., Shields, Varley, Broks, & Simpson, 1996). Although localized functional
abnormalities have been sought, it has frequently been suggested that the underlying abnormalities are in fact likely to be
diffuse (Rapin, 1996).
As a more distal causal factor leading to the brain abnormalities that are then believed to cause autistic symptoms more
directly, genetic factors are implicated for some cases of autism. Evidence supporting this reasoning includes (a) the
preponderance of males in all categories with PDD except Rett’s disorder (American Psychiatric Association, 1994;
Waterhouse et al., 1996),1 (b) the tendency for PDD to occur much more frequently in some families than in others
(Folstein & Rutter, 1977), and (c) the tendency for PDD to occur frequently among individuals with fragile X, where
genetic abnormalities are well documented (Cohen, 1995).
Many cases of autism, however, have yet to be linked to genetic abnormalities. Nonetheless, it is suspected that these
cases are still due to organic factors arising before rather than during or after the child’s birth (Rapin, 1996). Other
suspected sources of the presumed neurologic abnormalities include metabolic disorders and infectious disorders (e.g.,
congenital rubella, encephalitis, or meningitis; Rapin, 1996; Wolf-Schein, 1995). In some cases, no likely causal factor is
suggested—leading to cases that are termed idiopathic, that is, without a known cause. Efforts to identify the real nature of such idiopathic cases and to identify the specific mechanisms by which known causes act to create autistic symptoms represent some of the most needed areas for research on PDD.
1 The reasoning is that male preponderance may exist because males’ single X chromosome makes them at special risk for X-chromosome defects.
Special Challenges in Assessment
Children with autistic spectrum disorder present the greatest imaginable challenges to the clinician contemplating formal
testing as a means of collecting information. Frequently, these children’s essential social interaction deficits dramatically
limit their participation in the usual give-and-take required by most standardized language instruments. Consequently,
informal measures, especially parent questionnaires and behavioral checklists, are used very frequently for purposes of
screening, diagnosis, and description of language among children and adults with autistic spectrum disorder (Chung,
Smith, & Vostanis, 1995; DiLavore, Lord, & Rutter, 1995; Gillberg, Nordin, & Ehlers, 1996; Nordin & Gillberg, 1996;
Prizant & Wetherby, 1993; Sponheim, 1996).
Alternatives to standardized tests are particularly valuable for those children whose communication repertoire is very
limited, a group that includes as many as 50% of all children with autism (Paul, 1987). Where the purpose of an
evaluation is to aid in diagnosis of the disorder, it has been argued that parent interviews may be considerably better than
observational methods that may be applied by clinicians (Rapin, 1996). Table 7.3 lists some of the most common
questionnaires, interview schedules, checklists, and other instruments used in screening and diagnosing autistic spectrum
disorders. Although many of these focus on the entire range of difficulties often seen as part of autism, some focus on
selected skill areas, such as communication or play.
Despite the frequent need for nontraditional, observational techniques, more traditional, standardized speech and
language tests can play a useful role in language assessments of some children with autism. In particular, children with
more elaborate language and communication skills—children who are often described as “high functioning”—may be
amenable to standardized testing when appropriate attention is paid to motivation and other enabling factors. Information
obtained from family members and other individuals who are very familiar with the child can help pinpoint the
reinforcers that will prove most helpful in facilitating a child’s participation and warn against specific stimuli (e.g., types
of environmental noise such as traffic noise or the sound of some electrical devices) that are likely to be distracting or
disturbing to the individual child.
For higher functioning children, standardized speech and language testing may not only be feasible, but quite vital to a
thorough understanding of their strengths and weaknesses—particularly for receptive skills, which, unlike expressive skills,
cannot be as readily observed in spontaneous productions.
Even when expressive language testing is feasible, analysis of spontaneous productions will almost always constitute a
particularly desirable tool for expressive language assessment. Not only does analysis of spontaneous language allow
one to simultaneously examine variables related to numerous expressive language domains (Snow & Pan, 1993), one can
argue that the validity of such measures is particularly high for children who are so reactive to standardized
testing procedures. In
Table 7.3
Recent Behavioral Checklists and Interviews for Screening and Description of Autistic Spectrum Disorder
(Chung, Smith, & Vostanis, 1995; Gillberg, Nordin, & Ehlers, 1996; Wolf-Schein, 1996)

Screening
Checklist for Autism in Toddlers (CHAT; Baron-Cohen, Allen, & Gillberg, 1992). Children from 18 to 30 months. Uses 14 items that are responded to by parent (n = 9) and by clinician (n = 5 items); found to have a low rate of false positives and reported to have good reliability (Gillberg, Nordin, & Ehlers, 1996).
Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS; DiLavore, Lord, & Rutter, 1995). Children under 6 years of age. Uses 12 play-based activities with 17 associated ratings, with items administered by the examiner or through one of the child’s caregivers; designed to relate directly to the DSM–IV or ICD-10 criteria.
Asperger Syndrome Screening Questionnaire (ASSQ; Ehlers & Gillberg, 1993). 7 to 16 years. A teacher questionnaire containing 27 items; it appears to consistently identify Asperger’s disorder, but it may overidentify in cases of other social abnormalities; one of the few measures developed to be sensitive to Asperger’s disorder.

Diagnosis and Description
Autism Diagnostic Interview–Revised (ADI-R; Lord, Rutter, & Le Couteur, 1994). Children from 18 months to adults. Uses interview of parents or caregivers of individuals with suspected autistic disorder; designed to relate directly to the DSM–IV or ICD-10 criteria.
Childhood Autism Rating Scale (CARS; Schopler, Reichler, & Renner, 1986). Children. Uses direct observation of children with suspected autistic disorder; designed to be used in diagnosis and description of severity.
chapter 10, the use of spontaneous language sample analyses is discussed at some length.
Expected Patterns of Language Performance
Certain specific language behaviors are frequently associated with autism, although they may also occur, albeit infrequently,
in normal language development and in other language disturbances. Among these behaviors are echolalia, pronominal
reversals, and stereotypic or nonreciprocal language (Fay, 1993; Paul, 1995).
Echolalia consists of the immediate or delayed repetition of speech, often without evident communicative intent.
Echolalic productions can often be quite complex in their language structure relative to the level of the child’s
spontaneous communications and may simply represent memorized routines rather than creatively generated language.
The presence of echolalic productions often appears to indicate a child’s attempt to stay engaged in the social interaction
despite failing to understand what has just been said or being unable to produce a more suitable response. Such
productions, consequently, may be communicative in intent and therefore provide information about the nature of the
child’s pragmatic skills (Paul, 1995).
Pronominal reversals involve an apparent confusion in pronoun choice in which first- and second-person pronouns are
substituted for one another. Thus, for example, a child might say “you go” when apparently referring to him- or herself.
Although at one point in time these errors were thought to reflect the child’s failure to distinguish him- or herself from
the environment, they are currently taken to reflect the child’s inflexible use of language forms. In short, the child treats
pronouns, which are sometimes referred to as “deictic shifters,” as unchanging labels, thereby failing to recognize the
shift that allows “I” to refer to several different speakers in turn simply by virtue of their role as speaker, and “you” by
virtue of their role as listener. Although once considered a hallmark of the disorder, pronominal reversals are not
necessarily used frequently (Baltaxe & D’Angiola, 1996). The Personal Perspective included in this chapter contains the
reflections of Donna Williams, an adult with autism, who argues persuasively for the relative unimportance of pronoun
use as a target for therapy, given all of the words one needs to learn.
PERSONAL PERSPECTIVE
The following passage comes from a book written by a young woman who describes herself as having autism
associated with high functioning (Williams, 1996, pp. 160–61). In this passage, she discusses which words are
important and which are unimportant to learn:
“Words to do with the names of objects are probably the most important ones to connect with as it is hard to ask for
help if you haven’t got these. If someone can only say ‘book,’ at least you can work out what they might want done
with
it. If they just say ‘look’ but haven’t connected with ‘book’, you have a whole house full of things that can be
‘looked’ (at or for).
“Words to do with what things are contained in (box, bottle, bag, packet), made of (wood, metal, cloth, leather, glass,
plastic, powder, goo) or what is done with them (eating, drinking, closing, warming, sleeping) are also really important
to learn. Much later, less tangible, less directly observable words such as those to do with feelings (had enough, hurt,
good, angry) or body sensations (tired, full, cold, thirsty) are really important to connect with.
“Words to do with pronouns, such as ‘I,’ ‘you,’ ‘he,’ ‘she,’ ‘we’ or ‘they,’ aren’t so important. Too many people make
a ridiculous big hoo-ha about these things, because they want to eradicate this ‘symptom of autism,’ or for the sake of
‘manners’ or impressiveness. Pronouns are ‘relative’ to who is being referred to, where you are and where they are in
space and who you are telling all this to. That’s a lot of connections and far more than ever have to be made to
correctly access, use and interpret most other words. Pronouns are, in my experience, the hardest words to connect with
experienceable meaning because they are always changing, because they are so relative. In my experience, they require
far more connections, monitoring and feedback than in the learning of so many other words.
“Too often so much energy is put into teaching pronouns and the person being drilled experiences so little consistent
success in using them that it can really strongly detract from any interest in learning all the words that can be easily
connected with. I got through most of my life using general terms like ‘a person’ and ‘one,’ calling people by name or
by gender with terms like ‘the woman’ or ‘the man’ or by age with terms like ‘the boy.’ It didn’t make a great deal of
difference to my ability to be comprehended whether I referred to these people’s relationship to me or in space or not.
These things might have their time and place but there are a lot of more important things to learn which come easier
and can build a sense of achievement before building too great a sense of failure.”
Stereotypic or nonreciprocal language refers to idiosyncratic use of words or even whole sentences (Paul, 1995). Often
the particular word or phrase seems to be used because it was first heard in a particular situation or in conjunction with a
specific event or objects. Thereafter, it is used to stand for the associated situation, event, or object, despite its lack of
meaning to anyone except a very perceptive individual present at the time the association was formed. Temple Grandin,
a college professor who has recently published several books about her experiences as someone with autism, describes a
personal example of nonreciprocal language:
Teachers who work with autistic children need to understand associative thought patterns. An autistic child will often use
a word in an inappropriate manner. Sometimes these uses have a logical associated meaning and other times they don’t.
For example, an autistic child might say the word “dog” when he wants to go outside. The word “dog” is associated with
going outside. In my
own case, I can remember both logical and illogical use of inappropriate words. When I was six, I learned to say
‘prosecution.’ I had absolutely no idea what it meant, but it sounded nice when I said it, so I used it as an exclamation
every time my kite hit the ground. I must have baffled more than a few people who heard me exclaim “Prosecution!” to
my downward spiraling kite. (Grandin, 1995, p. 32)
In addition to characteristic kinds of atypical language use, patterns of language strengths and weaknesses among
children with autistic disorder and Asperger’s disorder have received extensive attention by researchers. Table 7.4
summarizes the language characteristics described for three diagnoses in the spectrum: two forms of autistic disorder and
Asperger’s disorder. The two descriptions provided under autistic disorder are included because of the relatively rich
research base that has identified very different skills seen in individuals who can be described as high- versus
low-functioning in terms of severity as well as in terms of nonverbal intelligence scores. A study performed by a large group
of researchers headed by Isabelle Rapin (1996) provides the most comprehensive study of the largest number of children
with autism to date; it made use of normal controls and two other control groups—(a) a group of language-impaired
children to act as controls for the high-functioning children with autism and (b) a group of children without autism but
with low nonverbal IQs to act as a control group for the low-functioning children with autism. That multiyear, multisite
study provided much of the information included in Table 7.4. Despite my use of the subcategories high- and
low-functioning, it should be noted that researchers have identified several subgroupings of autistic spectrum disorder beyond
those discussed in this chapter, including aloof, passive, and active-but-odd (e.g., Frith, 1991; Sevin et al., 1995;
Waterhouse, 1996; Waterhouse et al., 1996).
Related Problems
Autistic disorder, and indeed most of the disorders on the autistic spectrum, are characterized by a number of behavioral
problems in addition to those already discussed in terms of communication and language. Two of these—“restricted
repetitive and stereotyped patterns of behavior, interests, and activities” and “lack of varied, spontaneous make-believe
play or social imitative play appropriate to developmental level”—are considered central enough to the nature of the
disorder to be listed in the DSM–IV definition (American Psychiatric Association, 1994). They are closely related.
Restricted and stereotyped patterns of behavior, interests, and activities can include behaviors such as the child’s
rocking, flapping one or both hands in front of his or her own eyes, repeatedly manipulating parts of objects (such as
spinning the wheel on a toy or repeatedly opening and closing a hallway door), or, more alarmingly, repeatedly biting or
striking others or him- or herself. Some of these repetitive behaviors can be interpreted as self-stimulatory or as efforts by
the child to deal with anxiety and avoid overstimulation (e.g., Cohen, 1995); others are more difficult to interpret.
Stereotyped, repetitive behaviors (sometimes referred to as stereotypies) will often need to be addressed in order to free
the child to attend to important interactions (such as assessment or establishing relationships with peers). How they
should be addressed
Table 7.4
Patterns of Strengths and Weaknesses Among Children With Autistic Disorder—High-Functioning,
Autistic Disorder—Low-Functioning, and Asperger’s Disorder

Autistic disorder—high-functioning (AD-HF)
Language strengths: Expressive vocabulary (Rapin, 1996); written language superior to oral language and superior to the written language skills of children with delayed language skills but normal intelligence (Rapin, 1996); relatively less use of echolalia than in AD-LF.
Language weaknesses: Receptive language more affected than expressive language (Rapin, 1996); functional use of expressive language below performance on most tests of expressive language (Rapin, 1996); pragmatic skills; rapid naming within a category (Rapin, 1996); formulated output of connected speech (Rapin, 1996); verbal reasoning (Rapin, 1996); delayed development of question-asking, as pronounced as in AD-LF (Rapin, 1996).
Other strengths: Preserved function on visuospatial and visual-perceptual skills (Rapin, 1996).
Other weaknesses: Marked delay in onset of ability to engage in symbolic play (Rapin, 1996); possible deficits in memory (Rapin, 1996); subtle motor deficits, especially affecting gross motor skills, that are more consistent with language skills than with nonverbal IQ (Rapin, 1996).

Autistic disorder—low-functioning (AD-LF)
Patterns of strengths and weaknesses may be especially difficult to determine because of floor effects on many measures (Rapin, 1996).
Language strengths: Expressive vocabulary is a relative strength and is generally better than receptive vocabulary.
Language weaknesses: Verbal communication may be absent in about half of these children (Rapin, 1996); when present, most areas of language are severely affected (Rapin, 1996); reported temporary regression of language skills in early development (Rapin, 1996).
Other strengths: Nonverbal performance superior to verbal performance (Rapin, 1996).

Asperger’s disorder (AD)
Language strengths: Generally preserved language skills (American Psychiatric Association, 1994); phonology, except possibly in the area of prosody; syntax.
Language weaknesses: Pragmatic skills (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991); atypical prosody and vocal characteristics (Ramberg et al., 1996).
Other strengths: Normal nonverbal intelligence.
Other weaknesses: Motor clumsiness (Ramberg et al., 1996; Wing, 1991).

Note. Asperger’s disorder is considered equivalent to autistic disorder—high-functioning by some authors (e.g., Rapin, 1996).
must be determined in relation to their potentially adaptive role from the child’s perspective. Team approaches using
behavioral interventions and, at times, drug intervention are sometimes useful.
The high frequency of these stereotyped patterns of interaction is combined with a lack of the spontaneous, imaginative
play considered so characteristic of childhood. Although this deficiency has been noted since autism was first described
by Kanner in 1943, it has recently been seen as related to these children’s apparent inability to assume alternative
perspectives—an ability that also supports social interaction. It has been said that one of the chief cognitive deficits in
children with autistic disorder may be their lack of a theory of mind, the ability to think about emotions, thoughts, and
motives—either in themselves or in others (Frith, 1993).
Pronounced sensory abnormalities have been inferred in many autistic children on the basis of their apparent
avoidance of and negative reactions to many auditory, visual, and tactile stimuli. In particular, hypersensitivity and
hyposensitivity have been associated with autistic spectrum disorders (e.g., Roux et al., 1994; Sevin et al., 1995).
Recently, a controversial therapy technique, auditory integration training (Rimland & Edelson, 1995), has been devised
in an attempt to eliminate the abnormal responses to auditory stimuli seen in some children.
In a growing number of studies, children with autism spectrum disorder have been found to be at increased risk for motor
abnormalities. For example, in a recent large-scale study, children in both high- and low-functioning groups showed a
greater frequency of motor abnormalities than did groups of children with either mental retardation without autism or
SLI (Rapin, 1996). However, oromotor impairments tended to be more common and more severe among children in the
low-functioning group. Among the difficulties noted have been akinesia (absent or diminished movement), bradykinesia
(delay in initiating, stopping, or changing movement patterns), and dyskinesia (involuntary tics or stereotypies; Damasio
& Maurer, 1978) as well as problems with muscle tone, posture, and gait (Page & Boucher, 1998). Of particular interest
to speech-language pathologists who may wish to work on oral motor activities in efforts to foster speech or on manual
gestures have been reports of oral and manual dyspraxia, difficulties in the performance of purposeful voluntary
movements in the absence of paralysis or muscular weakness (Page & Boucher, 1998; Rapin, 1996).
Other problems that are more common among children on the autistic disorder spectrum than among children without
identified problems are epilepsy, especially a form called infantile spasms, and sleep disorders (Rapin, 1996). ADHD
(discussed in chap. 5) is also more prevalent (Wender, 1995).
Summary
1. Autistic spectrum disorder, also termed Pervasive Developmental Disorder (PDD), encompasses at least five related
and relatively rare disorders: Rett’s disorder, autistic disorder, Asperger’s disorder, childhood disintegrative disorder,
and pervasive developmental disorder not otherwise specified (PDD-NOS), according to the diagnostic system of the DSM–
IV (American Psychiatric Association, 1994).
2. Difficulties shared by children with autistic spectrum disorders include delayed or deviant language and social
communication, as well as abnormal ways of responding to people, places, and objects.
3. Autistic spectrum disorders frequently co-occur with mental retardation, perhaps because of a shared cause:
underlying neurologic abnormalities.
4. Although the source of underlying neurologic abnormalities is generally unknown, genetic factors and prenatal
infections are suspected in some cases.
5. Children with autistic spectrum disorder are often unable to participate in standardized testing required for the
diagnosis of their disorder, making the use of observational methods and parental questionnaires a very frequent and
relatively well-studied alternative.
6. Echolalia, pronominal reversals, and stereotypic language are abnormal features of language that are seen more
frequently in autistic disorder than in other developmental language disorders.
7. Other problems affecting children with autistic spectrum disorders include a lack of spontaneous, imaginative play and
restricted patterns of behavior, interests, and activities. In addition, these children are at increased risk for motor
abnormalities, seizures, and sleep disorders.
Key Concepts and Terms
akinesia: absent or diminished movement.
autistic disorder: the major and most frequently occurring disorder category within the larger DSM–IV (American
Psychiatric Association, 1994) definition of Pervasive Developmental Disorders; often used synonymously with infantile
autism or Kanner’s autism.
Asperger’s disorder: an autistic disorder within the larger DSM–IV category of Pervasive Developmental Disorders in
which early delays in communication are absent; often considered synonymous with high-functioning autism.
bradykinesia: a motor abnormality characterized by delays in initiation, cessation, or alteration of movement pattern.
childhood disintegrative disorder: a very rare autistic disorder within the larger DSM–IV category of Pervasive
Developmental Disorders in which a period of about 2 years of normal development is followed by autistic symptoms.
dyskinesia: a movement abnormality characterized by involuntary tics or stereotypies.
dyspraxia: difficulties in the performance of purposeful voluntary movements in the absence of paralysis or muscular
weakness; for example, oral dyspraxia, manual dyspraxia, verbal dyspraxia (also frequently referred to as verbal apraxia).
echolalia: immediate or delayed repetition of a previous speaker’s or one’s own utterance.
epilepsy: a chronic disorder associated with excessive neuronal discharge and with altered consciousness, sensory
activity, motor activity, or both.
Pervasive Developmental Disorders (PDDs): the group of severe disorders having their onset in childhood, characterized
by significant deficits in social interaction and communication, as well as the presence of stereotyped behavior, interests,
and activities; considered synonymous with autistic spectrum disorder.
pervasive developmental disorder not otherwise specified (PDD-NOS): within the DSM–IV system of disorder
classification, a diagnosis made when some but not all of the major criteria for autistic disorder are met; also
referred to as atypical autism.
pronominal reversals: incorrect use of first- and second-person pronouns (e.g., “you want” to mean “I want”), which are
considered typical of autistic speech.
Rett’s disorder: a severe pervasive developmental disorder affecting only girls, in which a brief period of
normal development is followed by regression; associated with severe or profound levels of mental retardation.
stereotypy: frequent repetition of a meaningless gesture or movement pattern.
theory of mind: the ability to think about emotions, thoughts, and motives—either in oneself or others; considered to be a
primary deficit among individuals whose difficulties fall along the autistic disorder spectrum.
Study Questions and Questions to Expand Your Thinking
1. On the Internet, look for sites related to PDD. For which disorders within that designation do you find web sites? Who
are the main audiences for these sites? How do sites respond differently to these various audiences?
2. On the basis of Table 7.2, list the major characteristics of a child’s behavior that will be needed to determine which
PDD label is most appropriate.
3. On the basis of the discussion of suspected causes of PDD, outline two major research needs that should be pursued by
future researchers.
4. List in order of importance the problems—other than those intrinsic to autism itself—presented to adults who wish to
interact with children with PDD.
5. What features of a child’s communication would cause you to be most concerned that he or she was showing
symptoms of autism? What features of his or her language?
6. What practical problems might a parent of a child with PDD face that are different from those faced by other parents?
7. Find out what definition of autistic spectrum disorders is used in a local school system. How does it differ from the
system described in DSM–IV (American Psychiatric Association, 1994)?
Recommended Readings
Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M.
Bristol (Eds.), Preschool issues in autism. New York: Plenum.
Campbell, M., Schopler, E., Cueva, J. E., & Hallin, A. (1996). Treatment of autistic disorder. Journal of the American
Academy of Child and Adolescent Psychiatry, 35, 124–143.
Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday.
Schopler, E. (1994). Behavioral issues in autism. New York: Plenum.
Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and
physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington,
DC: Author.
Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M.
Bristol (Eds.), Preschool issues in autism (pp. 17–38). New York: Plenum.
Baltaxe, C. A. M., & D’Angiola, N. (1996). Referencing skills in children with autism and specific language impairment.
European Journal of Disorders of Communication, 31, 245–258.
Baron-Cohen, S., Allen, J., & Gillberg, C. (1992). Can autism be detected at 18 months? The needle, the haystack, and
the CHAT. British Journal of Psychiatry, 161, 839–843.
Bettelheim, B. (1967). The empty fortress. New York: Collier Macmillan.
Carpentieri, S., & Morgan, S. B. (1996). Adaptive and intellectual functioning in autistic and nonautistic retarded children. Journal of Autism and Developmental Disorders, 26, 611–620.
Chung, M. C., Smith, B., & Vostanis, P. (1995). Detection of children with autism. Educational and Child Psychology,
12(2), 31–36.
Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic fragile X males. Developmental Brain Dysfunction, 8,
252–269.
Courchesne, E. (1995). New evidence of cerebellar and brainstem hypoplasia in autistic infants, children, and
adolescents: The MR imaging study by Hashimoto and colleagues. Journal of Autism and Developmental Disorders, 25,
19–22.
Damasio, A. R., & Maurer, R. G. (1978). A neurological model for childhood autism. Archives of Neurology, 35, 779–
786.
DiLavore, P. C., Lord, C., & Rutter, M. (1995). The Pre-Linguistic Autism Diagnostic Observation Schedule. Journal of
Autism and Developmental Disorders, 25, 355–379.
Eaves, L. C., & Ho, H. H. (1996). Brief report: Stability and change in cognitive and behavioral characteristics of autism
through childhood. Journal of Autism and Developmental Disorders, 26, 557–569.
Ehlers, S., & Gillberg, C. (1993). The epidemiology of Asperger syndrome: A total population study. Journal of Child
Psychology and Psychiatry, 34, 1327–1350.
Fay, W. (1993). Infantile autism. In D. Bishop & K. Mogford (Eds.), Language development in exceptional
circumstances (pp. 190–202). Mahwah, NJ: Lawrence Erlbaum Associates.
Folstein, S., & Rutter, M. (1977). Infantile autism: A genetic study of 21 twin pairs. Journal of Child Psychology and
Psychiatry, 18, 297–321.
Frith, U. (1991). Asperger and his syndrome. In U. Frith (Ed.), Autism and Asperger syndrome (pp. 1–36). Cambridge:
Cambridge University Press.
Frith, U. (1993). Autism and Asperger syndrome. Cambridge, England: Cambridge University Press.
Gillberg, C., Nordin, V., & Ehlers, S. (1996). Early detection of autism: Diagnostic instruments for clinicians. European
Child and Adolescent Psychiatry, 5, 67–74.
Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday.
Hall, N. E., & Aram, D. M. (1996). Classification of developmental language disorders. In I. Rapin (Ed.), Preschool
children with inadequate communication (pp. 10–20). London: MacKeith Press.
Haas, R. H., Townsend, J., Courchesne, E., Lincoln, A. J., Schreibman, L., & Yeung-Courchesne, R. (1996). Neurologic
abnormalities in infantile autism. Journal of Child Neurology, 11(2), 84–92.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217–250.
Lord, C., Rutter, M., & Le Couteur, A. (1994). Autism Diagnostic Interview-Revised: a revised version of a diagnostic
interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism &
Developmental Disorders, 24, 659–685.
Myles, B. S., Simpson, R. L., & Becker, J. (1995). An analysis of characteristics of students diagnosed with higher-
functioning Autistic Disorder. Exceptionality, 5(1), 19–30.
Nordin, V., & Gillberg, C. (1996a). Autism spectrum disorders in children with physical or mental disability or both: I.
Clinical and epidemiological aspects. Developmental Medicine and Child Neurology, 38, 297–311.
Nordin, V., & Gillberg, C. (1996b). Autism spectrum disorders in children with physical or mental disability or both: II.
Screening aspects. Developmental Medicine and Child Neurology, 38, 314–324.
Page, J., & Boucher, J. (1998). Motor impairments in children with autistic disorder. Child Language Teaching and
Therapy, 14, 233–259.
Paul, R. (1987). Communication. In D. J. Cohen & A. M. Donnellan (Eds.), Handbook of autism and pervasive
developmental disorders (pp. 61–84). New York: Wiley.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis, MO:
Mosby.
Piven, J., Harper, J., Palmer, P., & Arndt, S. (1996). Course of behavioral change in autism: A retrospective study of
high-IQ adolescents and adults. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 523–529.
Prizant, B. M., & Wetherby, A. M. (1993). Communication in preschool autistic children. In E. Schopler, M. E. Van
Bourgondien, & M. M. Bristol (Eds.), Preschool issues in autism (pp. 95–128). New York: Plenum.
Ramberg, C., Ehlers, S., Nyden, A., Johansson, M., & Gillberg, C. (1996). Language and pragmatic functions in school-
age children on the autism spectrum. European Journal of Disorders of Communication, 31, 387–414.
Rapin, I. (1996). Classification of autistic disorder. In I. Rapin (Ed.), Preschool children with inadequate
communication. (pp. 10–20). London: MacKeith Press.
Rimland, B. (1964). Infantile autism. New York: Appleton.
Rimland, B., & Edelson, S. M. (1995). Brief report: a pilot study of auditory integration training in autism. Journal of
Autism & Developmental Disorders, 25, 61–70.
Roux, S., Malvy, J., Bruneau, N., Garreau, B., Guerin, P., Sauvage, D., & Barthelemy, C. (1994). Identification of
behaviour profiles within a population of autistic children using multivariate statistical methods. European Child and
Adolescent Psychiatry, 4, 249–258.
Rutter, M., & Schopler, E. (1987). Autism and pervasive developmental disorders: Concepts and diagnostic issues.
Journal of Autism and Developmental Disorders, 17, 159–186.
Schopler, E., Reichler, R. J., & Renner, B. R. (1988). The Childhood Autism Rating Scale (CARS). Revised. Los Angeles:
Western Psychological Services.
Sevin, J. A., Matson, J. L., Coe, D., Love, S. R., Matese, M. J., & Benavidez, D. A. (1995). Empirically derived subtypes
of pervasive developmental disorders: A cluster analytic study. Journal of Autism and Developmental Disorders, 25,
561–578.
Shields, J., Varley, R., Broks, P., & Simpson, A. (1996). Hemispheric function in developmental language disorders and
high-level autism. Developmental Medicine and Child Neurology, 38, 473–486.
Snow, C. E., & Pan, B. A. (1993). Ways of analyzing the spontaneous speech of children with mental retardation: The
value of cross-domain analyses. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19,
pp. 163–192). New York: Academic Press.
Sponheim, E. (1996). Changing criteria of autistic disorders: A comparison of the ICD-10 research criteria and DSM–IV
with DSM–III–R, CARS, and ABC. Journal of Autism and Developmental Disorders, 26, 513–525.
Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and
physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage.
Page 186
Trevarthen, C., Aitken, K., Papoudi, D, & Robarts, J. (1996). Children with autism: Diagnosis and interventions to meet
their needs. London: Jessica Kingsley.
Waterhouse, L. (1996). Classification of autistic disorder (AD). In I. Rapin (Ed.), Preschool children with inadequate
communication (pp. 21–30). London: MacKeith Press.
Waterhouse, L., Morris, R., Allen, D., Dunn, M., Fein, D., Feinstein, C., Rapin, I., & Wing, L. (1996). Diagnosis and
classification in Autism. Journal of Autism and Developmental Disorders, 26, 59–86.
Wender, E. (1995). Hyperactivity. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp.
185–194). Boston: Little, Brown.
Williams, D. (1996). Autism—An inside-out approach: An innovative look at the mechanics of ‘autism ‘ and its
developmental ‘cousins.’ Bristol, PA: Jessica Kingsley.
Wing, L. (1991). The relationship between Asperger’s syndrome and Kanner’s autism. In U. Frith (Ed.), Autism and
Asperger syndrome (pp. 93–121). Cambridge, England: Cambridge University Press.
Wolf-Schein, E. G. (1996). The autistic spectrum disorder: A current review. Developmental Disabilities Bulletin, 24(1),
33–55.
World Health Organization. (1992). The ICD-10 classification of mental and behavioral disorders: Clinical descriptions
and diagnostic guidelines. Geneva, Switzerland: Author.
World Health Organization. (1993). The ICD-10 classification of mental and behavioral disorders: Diagnostic criteria
for research. Geneva, Switzerland: Author.
Page 187
CHAPTER
8
Suspected Causes
Related Problems
Bradley was 5 years old when it was determined that he had a mild, bilateral sensorineural hearing loss. Prior to
entering kindergarten, his parents described him as a shy child who disliked larger play groups and preferred playing
alone or with one close friend. In a noisy 16-child classroom, the adequacy of his hearing was first questioned by his
kindergarten teacher, who reported that she often had difficulty getting his attention and found his poor attention during
circle time inconsistent with his good attention in one-on-one situations. A hearing screening by a speech-language
pathologist, which was performed because of concerns about delayed phonologic development, was the immediate
source of a referral for the complete audiological examination in which his hearing loss was identified. After detection of
the hearing loss, Bradley was fitted for binaural behind-the-ear aids. (He loved the bright blue earmolds and tubing he
was allowed to choose.) Within a short time of the fitting, Bradley appeared more attentive during circle time and
readily made progress in work on targeted speech distortions.
Page 188
Sammy, or Samantha on formal occasions, is a 3-year-old whose moderate high-frequency hearing loss was identified
shortly after birth, when she failed a screening conducted because her family history of hearing loss placed her at high risk.
Because an ear-level fitting initially proved infeasible, Sammy used a body-worn aid, which was replaced by a behind-
the-ear fitting at age 1½. Six months ago, the use of an FM trainer was extended to the home after continuous use in a
preschool group that she had attended since age 1½. Although she is experiencing some delays in speech, her
communication development otherwise appears on-target.
Desmond’s profound hearing loss was identified using auditory brainstem response (ABR) during his 3-week stay in a
neonatal intensive care unit, following his premature birth at 7 months gestational age with a birth weight of 3.1 pounds.
He required ventilator support for 5 days after birth. Now that Desmond is 5 years old, his parents have been frustrated by
his slow progress in oral language development despite years of participation in special education and several
failed attempts at successful amplification. Desmond currently uses a vibrotactile aid to increase his awareness of
environmental sounds and his speech reception and is being considered as a candidate for a cochlear implant.
Defining the Problem
Estimates of the prevalence of hearing impairment in children vary from 0.1 to 4%, or from 1 in every 1,000 to 1 in every
25 children, depending on the definitions used (Bradley-Johnson & Evans, 1991; Northern & Downs, 1991). Of
children between the ages of 3 and 17, about 52,000 have impairments severe enough to be termed deafness, where
deafness can be defined as a hearing loss, usually above 70 dB, that precludes the understanding of speech through
listening (Ries, 1994). When all levels of hearing loss are considered, hearing impairment is the most common disability
among American school children (Flexer, 1994).
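The percent-to-ratio conversion behind these prevalence figures is simple arithmetic: a prevalence of p% corresponds to 1 in N children, where N = 100 / p. A minimal Python sketch, using the two endpoint values quoted above:

```python
def one_in_n(prevalence_percent):
    # Convert a prevalence expressed as a percentage into a
    # "1 in N children" figure: N = 100 / percent.
    return round(100 / prevalence_percent)

print(one_in_n(0.1))  # 1000 -> "1 in every 1,000 children"
print(one_in_n(4))    # 25   -> "1 in every 25 children"
```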
The negative impact of deafness on the normal acquisition of oral language may seem obvious: You cannot learn about
phenomena with which you have limited experience. In addition, for children with profound hearing loss, this experience
is largely restricted to a sensory channel (i.e., vision) that is mismatched to the most distinctive characteristics of that
phenomenon (i.e., oral language). One line of evidence suggesting how great this mismatch is comes from a growing
body of research suggesting that the structure of oral languages differs substantially from that of visuospatial languages
(such as sign; Bellugi, van Hoek, Lillo-Martin, & O’Grady, 1993). Nonetheless, there is research suggesting that
lipreading becomes more important to oral language development as hearing impairment worsens (Mogford-Bevan,
1993).
Because limiting auditory exposure limits learning opportunities, even children with milder hearing losses, who obtain
greater amounts of acoustic information about oral language than children with more severe losses, experience
significant consequences for their spoken language reception in everyday situations. Therefore, although this
chapter focuses most intently on children with greater degrees of hearing impairment, it also alerts readers to the
jeopardy in which children
Page 189
with even unilateral or “mild” bilateral hearing impairments are placed when it comes to language learning and academic
success (Bess, 1985; Bess, Klee, & Culbertson, 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler,
Oyler, & Matkin, 1988). In the Personal Perspective for this chapter, a teenager describes the ways in which deafness has
affected her school life.
PERSONAL PERSPECTIVE
The following is an excerpt from the transcript of a statement made by Darby, a high school junior with a profound
hearing impairment. She speaks about the academic and personal challenges facing her in school:
“I have never, and most likely never will, hear sounds in the same way as a hearing person. As a result, hearing people
experience things every millisecond of the day that I never will. By the same token, I have experienced things and will
experience things that no hearing person can.
“My deafness makes me different, and that difference makes me strong. I seem to get respect from other people just for
doing things a hearing person can do with ease. For example, watch television, use the telephone, listen to music, and
so on. For whatever reason, I never think about the fact that I am doing something that would normally be difficult for
someone who couldn’t hear. In fact, I have never looked at myself as someone who was limited in any way, someone
who couldn’t do something that any other hearing person could do. I’ve always known that I was different, but even
though people would intimate that I wasn’t able to compete on the same level as hearing people, I would ignore them,
or maybe I just didn’t “hear” them.
“I have always attended Dalton, a private hearing school. It has never been, and never will be, easy for me. I have
experienced periods of rejection and isolation, but I have proven myself worthy of the privilege of attending this school
by receiving grades as good as many of my hearing peers and better than most.
“I have definitely survived the academic challenges of my school and life. Socially, I still feel though that I’m not
accepted as a true equal, but hey, that’s their problem, they don’t know what they’re missing.” (Ross, 1990, pp. 304–
305)
Overall degree of hearing loss, or magnitude, is a major descriptor of hearing impairment, usually based on an estimate
of an individual’s ability to detect the presence of a pure tone at three frequencies important for speech information (500,
1000, and 2000 Hz; Bradley-Johnson & Evans, 1991). Table 8.1 lists major categories of hearing loss and provides some
preliminary information about the effects of that level of loss. Although deafness is not listed as a category in the table, it
is frequently used to refer to a hearing loss greater than or equal to 70 dB (Northern & Downs, 1991).
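The three-frequency estimate described above is a simple average, often called a pure-tone average. The following Python sketch illustrates the computation; the 70-dB boundary for deafness comes from the chapter (Northern & Downs, 1991), while the example audiogram values and the two-way labeling are illustrative assumptions, not a clinical classification:

```python
def pure_tone_average(thresholds_db):
    """Average hearing threshold (dB HL) across the three
    frequencies most important for speech: 500, 1000, 2000 Hz."""
    speech_freqs = (500, 1000, 2000)
    return sum(thresholds_db[f] for f in speech_freqs) / len(speech_freqs)

def describe_loss(pta_db):
    # Per the chapter, "deafness" frequently refers to a loss
    # >= 70 dB (Northern & Downs, 1991); lesser degrees are
    # described as "hard of hearing."
    return "deaf" if pta_db >= 70 else "hard of hearing"

# Example audiogram: thresholds in dB HL keyed by frequency (Hz).
# These numbers are invented for illustration.
audiogram = {250: 30, 500: 60, 1000: 75, 2000: 90, 4000: 95}
pta = pure_tone_average(audiogram)
print(pta, describe_loss(pta))  # 75.0 deaf
```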
Page 190
Table 8.1
Effect of Differing Magnitudes of Hearing Loss
Note. From Hearing in Children (4th ed., p. 14), by J. L. Northern and M. P. Downs, 1991, Baltimore: Williams &
Wilkins. Copyright 1991 by Williams & Wilkins. Reprinted with permission.
Page 191
The term hard of hearing is used to refer to lesser degrees of hearing loss that allow speech and language acquisition to
occur primarily through audition (Ross, Brackett, & Maxon, 1991).
In addition to the magnitude of loss, related variables that influence how children’s language is affected include (a)
variables affecting the auditory nature of the loss (such as type, configuration, and whether the loss is unilateral or
bilateral), (b) the age at which the hearing loss is acquired, (c) the age at which it is identified, and (d) how well the loss
is managed.
Type of hearing loss—conductive, sensorineural, or mixed—refers to the physiological site responsible for reduced
sensitivity to auditory stimuli. Conductive hearing losses result from conditions that prevent adequate transmission of
sound energy somewhere along the pathway leading from the external auditory canal to the inner ear. They can result
from conditions that block the external ear canal or interfere with the energy-transferring movement of the ossicles
(small bones) of the middle ear. Conductive losses are generally similar across frequencies and, at their most severe, do
not exceed 60 dB (Northern & Downs, 1991). Such losses can often be corrected or significantly reduced using medical
or surgical therapies (Paul & Jackson, 1993).
One particularly common cause of conductive hearing loss is middle ear infection, otitis media. The hearing loss
associated with this condition may be the most widely experienced form of hearing loss, given that 90% of children in
the United States have had at least one episode of otitis media by age 6 (Northern & Downs, 1991). Although not all
episodes of otitis media are associated with hearing losses, when losses do occur, their overall magnitudes have
generally been found to fall between 20 and 30 dB in the affected ear (Fria, Cantekin, & Eichler, 1985).
Sensorineural hearing losses result from damage to the inner ear or to some portion of the nervous system pathways
connecting the inner ear to the brain. They are responsible for the most serious hearing losses, accounting for or
contributing to most hearing losses in the severe to profound range. In addition, they account for most congenital hearing
losses (Scheetz, 1993) and are rarely reversible (Northern & Downs, 1991).
Mixed hearing losses refer to losses in which both conductive and sensorineural components are evident. Because the
conductive components of a mixed hearing loss are generally treatable, such losses often become sensorineural in nature
following effective treatment for the condition underlying the conductive loss. For example, a child with Down
syndrome may experience a mixed loss consisting of a sensorineural loss exacerbated by poor eustachian tube function
and chronic otitis media. Effective management of the middle ear condition can reduce the magnitude of the loss
substantially in many cases. Consequently, clinicians who work with children who have sensorineural losses need to be
especially aware that an already significant degree of loss can be further worsened if middle ear disease goes undetected.
Central auditory processing disorders refer to abnormalities in the processing of auditory stimuli occurring in the absence
of reduced acuity for pure tones or at a more pronounced level than would be expected given the degree of reduced
acuity. In especially severe cases, such difficulties have been described as a specific type of language disorder: verbal
auditory agnosia (Resnick & Rapin, 1991). Although central auditory processing disorders receive increasing attention
from audiologists, their separability from language disabilities and other learning disabilities continues to be debated
(Cacace & McFarland, 1998; Rees, 1973).
Page 192
Hearing loss configuration refers to the relative amount of loss occurring at different frequency regions of the sound
spectrum. For example, a high-frequency loss is one in which the loss is largely or solely confined to the higher
frequencies of the speech spectrum. In contrast, a flat hearing loss is one in which the degree of loss is relatively constant
across the spectrum.
Knowing the magnitude and configuration of an individual’s hearing loss can help you predict what sounds will be
difficult for him or her to hear at specific loudness levels. A pair of figures may help illustrate this. Figure 8.1 consists of
two frequency × intensity graphs (like those of a traditional audiogram) on which are plotted a variety of common
sounds occurring at various intensity levels and frequencies. The shaded area on Figure 8.1A indicates the sound
frequencies and intensities that might not be heard by children with severe high-frequency hearing losses—children such
as Sammy, who was described at the beginning of the chapter. Although Sammy would easily hear environmental
sounds such as car horns or telephones as well as many speech sounds when they are produced at conversational
loudness levels, she would probably miss most fricative sounds because of their high frequency (high pitch) and low
intensity (softness) when they are produced in the same conversations.
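The audiogram-based reasoning in this paragraph (a sound is missed when its intensity falls below the listener’s threshold at that frequency) can be sketched as follows; the threshold values and the nearest-frequency lookup are illustrative assumptions, not data taken from Figure 8.1:

```python
def audible(sound_freq_hz, sound_level_db, thresholds_db):
    """A sound is heard only if its level meets or exceeds the
    listener's threshold at the nearest audiogram frequency."""
    nearest = min(thresholds_db, key=lambda f: abs(f - sound_freq_hz))
    return sound_level_db >= thresholds_db[nearest]

# Illustrative severe high-frequency loss (dB HL), like Sammy's;
# the values are invented for this sketch.
sammy = {250: 15, 500: 25, 1000: 55, 2000: 80, 4000: 90}
print(audible(500, 60, sammy))   # loud low-frequency sound: True
print(audible(4000, 30, sammy))  # soft high-frequency fricative energy: False
```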
Figure 8.1B represents the kind of loss frequently associated with deafness, the kind of loss demonstrated by Desmond.
The negligible amount of auditory information to which Desmond has access is well-illustrated by this figure. The
centrality of visual information to Desmond’s interactions with the world is underscored by the fact that
even the best available amplification would probably fail to improve Desmond’s access to sound information.
Consequently, it is not surprising that vision has been called “the primary input mode of deaf children” (Ross, Brackett,
& Maxon, 1991) and that management of the communication needs of such children often veers away from methods in
which auditory information plays a major role (Nelson, Loncke, & Camarata, 1993), although growing effectiveness of
cochlear implants may increase that somewhat, especially as cochlear implants are used at younger ages (Tye-Murray,
Spencer, & Woodworth, 1995). A cochlear implant entails the insertion of a sophisticated device that includes an internal
receiver/stimulator and an external transmitter and microphone with a micro speech processor (Sanders, 1993). Their
rapid evolution and increasing application make them an exciting option in the management of severe hearing
losses.
Whether one or both ears are affected represents another important factor determining the significance of a hearing loss.
Unilateral hearing losses, ones affecting only one ear, usually have fewer negative consequences than bilateral hearing
losses. That does not mean, however, that unilateral losses are insignificant. Adequate hearing in both ears is of
particular importance when listening to quiet sounds or in noisy surroundings—especially for children. This special
importance of bilateral hearing in children arises because their incomplete language acquisition makes using language
knowledge and environmental context to “guess” the message being conveyed by an imperfect signal much harder for
them than it is for adults. In a study conducted by Bess et al. (1986), about one third of the children who exhibited
unilateral sensorineural hearing losses of 45 dB HL or greater were found to have either failed a grade or required
special assistance in school.
Page 193
Fig. 8.1. Figures illustrating the types of sounds that are likely to be heard (unshaded areas) and not heard (shaded areas)
for two different hearing losses: a severe high-frequency loss (8.1A) and a profound hearing loss (8.1B). For purposes of
clarity, but contrary to most instances in real life, these figures represent hearing loss as identical for each ear. From
Hearing in Children (4th ed., p. 17), by J. L. Northern and M. P. Downs, 1991, Baltimore: Williams & Wilkins.
Copyright © 1991 by Williams & Wilkins. Adapted by permission.
Page 194
Despite the importance of the nature of a child’s hearing loss to that child’s overall outcome for speech and oral
language, several nonauditory factors can play a very significant role. For example, the age at which a hearing loss is
acquired has a tremendous impact on the extent to which it will interfere with the acquisition of oral language.
Congenital hearing losses, those present at birth, are more detrimental than those acquired in early childhood, which in
turn are more detrimental than those acquired in later childhood or adulthood. Even 3 or 4 years of good hearing can
dramatically alter a child’s later language skills (Ross, Brackett, & Maxon, 1991). This fact has led to the use of the term
prelingual hearing loss to refer to a hearing loss acquired before age 2, which is thus thought to be associated with a
more significant impact (Paul & Jackson, 1993).
The age at which a hearing loss is detected is yet another variable affecting the oral language of hearing-impaired
children. The earlier the detection, the better the outcome for language acquisition—assuming,
of course, that adequate intervention follows. Recently devised methods, such as the measurement of auditory brainstem-
evoked responses and transient-evoked otoacoustic emissions, permit the detection of even mild hearing loss in children
from shortly after birth (Carney & Moeller, 1998; Mauk & White, 1995; Northern & Downs, 1991). Between 10 and
26% of hearing loss is estimated to exist at birth or to occur within the first 2 years of life (Kapur, 1996), thus making
efforts at detection an ongoing need.
Despite the possibility of early detection, however, hearing loss will escape detection for varying periods of time in
children whose hearing is not screened or is screened prior to the onset of the loss. In a recent study, Harrison and Roush
(1996) surveyed the parents of 331 children who had been identified with hearing loss. They found that when there was
no known risk factor, the median age of identification of hearing loss was about 13 months for severe to profound losses
and 22 months for mild to moderate losses. Although the presence of known risk factors was associated with decreased
age at identification for milder losses (down to about 12 months), identification for more severe losses remained about
the same in this group (12 months). Median additional delays of up to 10 months were observed between identification
of hearing loss and early interventions. These delays represent precious lost time for children whose auditory experience
of the world is compromised. Only in late 1999 did efforts to make universal screening of infant hearing a reality
(Mauk & White, 1995) receive momentous support, in the form of the Newborn and Infant Hearing Screening and
Intervention Act of 1999. This federal legislation provides new funding for newborn hearing screening grants to
individual states. It is hoped that this funding will prompt all states to implement infant screening programs, leading to
a revolution in the early identification of hearing loss.
A fourth factor influencing how hearing loss will affect children’s language development is the management of the loss.
For children with mild and moderate bilateral or unilateral losses, there is considerable agreement as to the approaches
that will optimize their access to the auditory signal on which they will rely for processing information about oral
language.
Page 195
Table 8.2 lists some of the types of interventions typically considered in the hearing management of children with
these lesser degrees of loss.
When it comes to children with greater losses, however, there is much controversy among professionals as well as
members of the Deaf community (Coryell & Holcomb, 1997). A frequent battleground for those interested in
interventions for deaf youngsters concerns the primacy of oral versus signed language. Arguments favoring an emphasis
on oral language stress that the vast majority of society are users of oral language and, therefore, deaf children should be
given tools with which to negotiate effectively within that context. Further, it can be stressed that their families will
almost always (90% of the time) be composed entirely of hearing individuals (Mogford, 1993).
Arguments favoring an emphasis on sign language stress that the Deaf community is a cohesive subculture in which
visuospatial communication is the effective norm. In fact, in recent years, the Deaf community has begun to advocate for
a difference rather than disorder perspective on hearing impairment, a political perspective thought to be vital to the
emotional and social well-being of its members (Corker, 1996; Harris, 1995). Arguments favoring a strong emphasis on
sign language also sadly note that only poor levels of achievement in oral language and particularly poor levels of
achievement in written language (which often plateaus at a third-grade level) have been the norm in studies of
individuals with severe to profound hearing losses (Dubé, 1996; Paul, 1998).
Table 8.2
Interventions Used With Children Who Have Mild and Moderate Hearing Impairment (Brackett, 1997)
Method: Personal amplification; FM radio systems used with remote microphones.
Function: Increase loudness levels of acoustic signals; acoustic signal enhanced relative to background noise levels; a
far superior means of dealing with a noisy classroom than preferential seating (Flexer, 1994); one of several types of
special amplification systems (Sanders, 1993).
Method: Sound treatment of classrooms (e.g., using carpets, acoustic ceiling tiles, curtains).
Function: Reduction of reverberation and other sources of noise.
Method: Preferential seating.
Function: Reduction of distance between speaker and child can increase audibility of a signal; sitting next to a child is
better than sitting in front of the child (Flexer, 1994), although for children who require visual information, this
strategy decreases access to visual information.
Method: Inclusion in regular classroom with supplementation through pull-out services.
Function: Provision of the wealth of social and academic experiences afforded by regular classrooms, with support
designed to preview and review instructional vocabulary as well as work on communication goals inconsistent with the
classroom setting (e.g., the earliest stages involved in acquiring a new communicative behavior).
Method: Auditory learning program (e.g., Ling, 1989; Stout & Windle, 1992).
Function: Improvement of the child’s attention and use of auditory information enhanced by personal and classroom
amplification.
Page 196
Total communication was originally proposed as the simultaneous use of multiple communication modes (e.g.,
fingerspelling, sign language, speech, and speech reading) selected with the child’s individual needs in mind. As
implemented, however, total communication has been found typically to consist of the simultaneous use of speech and
one of several sign languages other than American Sign Language (ASL) that use word order and word inflections
closely resembling those of spoken English (Coryell & Holcomb, 1997). The most prominent examples of these sign
languages, sometimes referred to as manually coded English systems, are Seeing Essential English (SEE-1), Signing
Exact English (SEE-2), and Signed English. Although most classroom teachers report using this relatively limited form of
total communication (sometimes termed simultaneous communication), it is infrequently used among adults in the Deaf
community (Coryell & Holcomb, 1997).
In a review of studies of treatment efficacy for hearing loss in children, Carney and Moeller (1998) noted a current trend
toward considering oral language as a potential second language for deaf children, to be acquired after some degree of
proficiency in a first (visuospatial) language is attained. This approach, termed the bilingual education model, is seen by
some as having the strengths associated with learning a language (i.e., ASL) for which a cohesive community of users
exists, while at the same time valuing the importance of English competence as a curricular rather than rehabilitative
issue (Coryell & Holcomb, 1997; Dubé, 1996). Data supporting this approach, however, are relatively sparse as yet. To
date, such data consist of evidence of strong academic performance in English by deaf children reared by deaf parents
who are proficient in ASL and evidence that skills in English are strongly related to skills in ASL, independent of
parental hearing status (Moores, 1987; Spencer & Deyo, 1993; Strong & Prinz, 1997).
A recent position statement of the Joint Committee of ASHA and the Council on Education of the Deaf (1998) illustrates
the growing influence of the Deaf community’s insistence that deafness be viewed as a “cultural phenomenon” rather
than a clinical condition (Crittenden, 1993). In that position statement, professionals are cautioned to adopt terminology
that respects the individual and family or caregiver preferences while facilitating the individual’s access to services and
assistive technology. Sensitivity to cultural factors is a requisite for speech-language pathologists in all settings working
with all populations. For speech-language pathologists working with members of the Deaf community, it is a
requirement of critical importance to the deaf child’s social and emotional development.
Suspected Causes
What is currently known about the causes of permanent hearing impairment in children is almost entirely restricted to
studies focused on more serious levels of hearing loss, especially deafness. Although there may be considerable overlap
in the known
Page 197
causes of deafness and milder degrees of impairment, differences also exist. Because this section limits itself to causes
related to these more severe levels of hearing loss, I remind readers that what I say relates less clearly to children with
milder losses.
Genetic factors are suspected in about half of all cases of deafness (Kapur, 1996; Vernon & Andrews, 1990). Of these
genetically based instances of deafness, about 80% are due to autosomal recessive disorders, almost 20% to autosomal
dominant disorders, and the small remainder to sex-linked disorders (Fraser, 1976). Because recessive disorders require
that both parents contribute a defective gene for their offspring to demonstrate the disorder, and because carrier parents
need not show evidence of the disorder themselves, it is relatively uncommon for children with congenital deafness to
have parents who are also deaf.
with parents whose first language is oral and who will need to acquire sign as a belated second language if they are to
assist their child’s acquisition of sign.
Genetically caused deafness sometimes occurs within the context of genetic syndromes in which one or more specific
organ systems (e.g., the skeleton, skin, nervous system) are also affected. About 70 such syndromes have been identified,
including Down syndrome, Apert syndrome, Treacher Collins syndrome, Pierre Robin syndrome, and muscular dystrophy (Bergstrom,
Hemenway, & Downs, 1971). Although most genetically caused deafness will be sensorineural in type, conductive
components are also observed. Some syndromes are associated with hearing losses that are progressive, causing
increasing hearing loss over time, often at unpredictable rates. Examples of such syndromes are Friedreich’s ataxia, severe
infantile muscular dystrophy, and Hunter syndrome, as well as the closely related Hurler syndrome.
Nongenetic causes of deafness include prenatal rubella, postnatal infection with meningitis, prematurity, Rh factor
incompatibility between mother and infant, exposure to ototoxic drugs, syphilis, Ménière’s disease, and mumps (Vernon
& Andrews, 1990). Four of these factors—prenatal rubella, meningitis, syphilis, and mumps—are infectious diseases,
meaning that their successful prevention can drastically reduce instances of deafness from those causes.
The three noninfectious factors most commonly associated with hearing loss in children are Rh factor incompatibility,
exposure to ototoxic drugs, and Meniere’s disease. Rh factor incompatibility refers to a condition in which a mother and
the embryo she is carrying have blood types characterized by discrepant Rh factors, a circumstance that stimulates the
production of maternal antibodies against the developing child. This condition is currently considered preventable
through maternal immunization or the treatment of the infant using phototherapy or transfusions (Kapur, 1996).
Ototoxicity refers to a drug’s toxicity to the inner ear. Although the use of drugs with this side effect is usually avoided
in pregnant women and infants, they may be required as the only effective treatment for some diseases. Monitoring of
hearing can frequently prevent hearing loss in children who require treatment with ototoxic drugs because of infections
or cancer (Kapur, 1996).
Prematurity, birth 2 or more weeks prior to the expected due date (Dirckx, 1997), is an increasingly frequent correlate of
hearing impairment. Whereas mortality was once
Page 198
an almost certain outcome of prematurity, improved neonatal care over the past half century (Vernon & Andrews, 1990)
has resulted in the increased survival of children who nonetheless may show residual effects. Premature birth is most
directly associated with hearing impairment and other co-occurring difficulties (e.g., mental retardation, cerebral palsy)
through the neurologic stresses it places on the infant. Indirect links between prematurity and hearing impairment lie in
the fact that premature birth is frequently precipitated by conditions that are themselves associated with hearing
impairment (such as prenatal rubella, meningitis, and Rh factor incompatibility). Prematurity increases the risk of
deafness 20-fold (Kapur, 1996).
Special Challenges in Assessment
When assessing the oral communication skills of children with hearing impairment, the speech-language pathologist is
confronted with numerous threats to the validity of his or her decision making. Therefore, in addition to the usual care
that must be taken to determine the precise questions prompting assessment and the factors that may complicate accurate
information gathering, clinicians working with children whose hearing is temporarily (e.g., during episodes of otitis
media) or permanently impaired must consider a larger-than-usual range of possible complicating factors and necessary
adaptations. Table 8.3 lists some of the considerations related to the evaluation of language skills of a child with hearing
impairment.
A major first consideration for children with very severe hearing loss is the choice of language or languages in which the
child is to be assessed. Often, testing in both a sign and an oral language is reasonable for obtaining information about
potentially optimal performance as well as about development with the alternative form.
Complexities of the child’s hearing loss and of its management will need to be considered in making this decision,
because children who may be considered deaf do not always receive enough exposure to sign language to consider it
their first language (Mogford, 1993).
Although efforts to standardize assessments of ASL have begun (e.g., Lillo-Martin, Bellugi, & Poizner, 1985; Prinz &
Strong, 1994; Supalla et al., 1994), children’s performance in ASL (the most widely used sign language in the United
States) is usually assessed informally by individuals with high levels of proficiency in ASL. A small number of
standardized tools have been developed. Among these are the Caro
Table 8.3
Considerations When Planning the Assessment of a Child With Impaired Hearing
Grammatical Analysis of Elicited Language—Complex Sentence Level (GAEL-C; Moog & Geers, 1980)
Ages: 8 to 12 years
Focus: Skills are assessed in terms of prompted production and imitation
Comments: Two groups of children, one with and one without hearing impairment, were studied; the hearing-impaired
children had severe to profound levels of impairment and were without other problem areas. Sixteen grammatical
categories are assessed: articles, noun modifiers, subject nouns, object nouns, noun plurals, personal pronouns,
indefinite and reflexive pronouns, conjunctions, auxiliary verbs, first clause verbs, verb inflections, infinitives and
participles, prepositions, negation, and wh- questions. Scores are expressed as percentiles or language quotients
(M = 100; SD = 15).

Rhode Island Test of Language Structure (Engen & Engen, 1983)
Ages: 5 to 17+ years
Focus: Designed to assess comprehension of syntax
Comments: Normed on 364 children with hearing impairment ranging from moderate to profound and 283 children
without hearing impairment; considerable information is available about the hearing-impaired group. One hundred items
are used to assess 20 sentence types, including simple sentences, imperatives, negatives, passives, dative sentences,
expanded simple sentences, adverbial clauses, relative clauses, conjunctions, deleted sentences, noninitial subjects,
embedded imperatives, and complements. The test may be presented orally or through simultaneous presentation of
signed and spoken English. Results are presented as percentiles or standard scores.

Scales of Early Communication Skills (SECS; Moog & Geers, 1975)
Ages: 2 to 9 years
Focus: Verbal and nonverbal skills are assessed receptively and expressively through teacher ratings
Comments: Standardized on 372 children from 2 years to 8 years, 11 months, with profound hearing impairments from
oral programs. Interexaminer reliability data only; no test–retest data or validity information.

Teacher Assessment of Grammatical Structures (TAGS; Moog & Kozak, 1983)
Ages: Not specified
Focus: Criterion-referenced teacher rating of children’s grammatical structures at four levels: comprehension, imitated
production, prompted production, and spontaneous production
Comments: There are three levels of the test: pre-sentence, simple sentence, and complex sentence. Can be used with
children who use signed or spoken English. Structures examined are less comprehensive than in other measures
developed by Moog and her colleagues.
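Several of the measures above report standard scores with a mean of 100 and a standard deviation of 15. Assuming those scores are normally distributed, a standard score converts to an approximate percentile rank as sketched below; the function name is illustrative, not part of any of the tests described:

```python
from statistics import NormalDist

def quotient_to_percentile(score, mean=100.0, sd=15.0):
    """Convert a standard score (e.g., a language quotient with M = 100,
    SD = 15) to an approximate percentile rank, assuming a normal
    distribution of scores."""
    return round(NormalDist(mean, sd).cdf(score) * 100, 1)

print(quotient_to_percentile(100))  # 50.0 (at the mean)
print(quotient_to_percentile(85))   # 15.9 (one SD below the mean)
print(quotient_to_percentile(70))   # 2.3 (two SDs below the mean)
```

In practice, test manuals provide their own normative conversion tables, which should be preferred over this normal-curve approximation.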
Page 203
first followed by the visual stimulus, thus allowing the child to look at the clinician as he or she speaks. Use of an FM
listening system during testing can also be recommended for obtaining information about optimal performance
(Brackett, 1997).
It is unlikely that one measure or one person who interacts with a hearing-impaired child will capture all of the child’s
strengths and weaknesses as a communicator (Moeller, 1988). Consequently, the speech-language pathologist will need
to rely on multiple measures and seek team input both as an assessment is planned and as it is interpreted. In addition to
the audiologist, the child’s educators, psychologists, and especially those who know the child best—the child him- or
herself and the child’s parents—can be valuable sources of information. Excellent sources of recommendations for
effective interactions with families can be found in Donahue-Kilburg (1992) and Roush and Matkin (1996).
Expected Patterns of Oral Language Performance
Despite evidence that even children with mild or unilateral hearing losses are at risk for academic difficulties (Bess,
1985; Bess et al., 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler et al., 1988), relatively little is
known about their oral or sign language development (Mogford-Bevan, 1993). To date, most research on oral language
development in children with hearing impairment has focused on children with more severe congenital losses (Mogford-
Bevan, 1993) or with the fluctuating hearing loss associated with otitis media (Klein & Rapin, 1992).
The fluctuating hearing loss associated with otitis media appears more important when combined with other risk factors
for disordered language development than it does when viewed as a single explanatory factor (Klein & Rapin, 1992;
Paul, 1995). In contrast, there is considerable evidence that deaf children and those who are hard of hearing experience
difficulties across all oral language domains and modalities—at least when comparisons are made against same-age
peers (Mogford-Bevan, 1993).
Syntax has been described as the “most severely affected aspect of language” in children with hearing loss that occurs
congenitally or in early childhood (Mogford-Bevan, 1993). Phonology is understandably quite affected, although some
children who appear to derive all of their phonological information visually (through speech reading) demonstrate the
ability to use the phonological code and show many phonological patterns consistent with younger, hearing children
(Mogford-Bevan, 1993). Documented semantic deficits involve lexical items referring to sounds and concepts related to
the ordering of events across time, and possibly, to the use of metaphorical language (Mogford-Bevan, 1993). Pragmatic
deficits are sometimes described and attributed to the close relationship of pragmatics to syntax as well as to changes that
occur in conversational interaction on the part of speaker and listener when one is deaf. A different pattern of
conversational initiation and turn-taking represents the milieu in which such children acquire their knowledge of
language use (Mogford-Bevan, 1993; Yoshinaga-Itano, 1997). Therefore, it has been suggested that comparisons with
hearing peers may not prove to be a useful means of understanding the pragmatic development of deaf children. In a
recent article, Yoshinaga-Itano (1997)
Page 204
described a comprehensive approach to assessing pragmatics, semantics, and syntax among children with hearing
impairment in which the interrelationships of these domains were stressed and both informal and formal measures were
used.
Related Problems
Children with hearing loss appear to be at increased risk for a number of problems (e.g., Voutilainen, Jauhiainen, &
Linkola, 1988). This increased risk may arise because the cause of the hearing loss has multiple negative outcomes (e.g.,
some genetic syndromes or infections can cause both mental retardation and hearing loss). Alternatively, hearing loss
may make children more vulnerable (e.g., children who are less able to communicate for any reason may be at greater
risk for psychosocial difficulties). Despite a convergence of evidence suggesting increased risk, the specific prevalence
of multiple handicaps in children with hearing loss is a matter of considerable debate (Bradley-Johnson & Evans, 1991).
The prevalence of specific problems also appears to be related to etiology. For example, whereas children whose hearing
impairments have inherited or unknown etiologies tend to have fewer additional problems, those whose hearing
impairment is due to cytomegalovirus are at increased risk for behavioral problems (Bradley-Johnson & Evans, 1991).
In a 1979 study looking at additional problem areas for children with hearing impairment (Karchmer, Milone, & Wolk,
1979), the most common additional problems were mental retardation (7.8%), visual impairment (7.4%), and emotional–
behavioral disorder (6.7%). Although each of these problems was found to occur in less than 10% of children with
hearing loss, their prevalence was still considerably higher than in children without hearing loss (Bradley-Johnson &
Evans, 1991).
The increased prevalence of emotional–behavioral disorders is of interest because of the special management issues that
accompany it. Biological factors may be responsible for emotional–behavioral disorders in children with hearing loss.
However, it has also been suggested that mismatches between the child’s communication needs and capacities and those
of his or her caregivers and peers may contribute to special environmental stresses that increase a child’s risk of these
disorders (Paul & Jackson, 1993). Paul and Jackson provided a fascinating discussion of the literature describing the
subtle and not-so-subtle differences in world experience that accompany deafness.
The one problem area in which children with hearing loss were found to be at reduced risk in the study by Karchmer et
al. (1979) was learning disorders, a finding that some authors have attributed to the effects of overshadowing (Goldsmith
& Schloss, 1986). Overshadowing is the tendency for professionals to focus on a primary problem to a degree that causes
them to overlook other, significant problem areas. Although overshadowing may be one source of underidentification of
learning disabilities in children with hearing loss, another possible source is certainly the tendency of researchers and
clinicians to define learning disabilities as “specific learning disabilities,” in which problems known to affect learning
are excluded. The question remains, however, whether some children with a hearing loss have a learning disability
whose origin is unrelated to that hearing loss.
Page 205
Summary
1. Permanent hearing loss in children encompasses both (a) children who are hard of hearing, who will learn speech
primarily through auditory means, and (b) children who are deaf, who may acquire speech primarily through vision.
2. Characteristics of hearing losses that affect the impact of the loss include degree of loss (mild, moderate, severe,
profound; hard of hearing, deafness), type of loss (conductive, sensorineural, mixed), configuration (flat, high-frequency,
low-frequency), laterality (unilateral vs. bilateral), and age of onset (congenital, acquired).
3. Genetic sources account for about 50% of all cases of deafness, with remaining causes including infectious disease,
Rh factor incompatibility, and exposure to ototoxic drugs.
4. Even mild or unilateral hearing loss can negatively affect children’s language learning and academic progress, and
there is some evidence to suggest that the transient hearing loss associated with otitis media can interact with other risk
factors to undermine children’s learning (Peters, Grievink, van Bon, Van den Bercken, & Schilder, 1997).
5. Management of the hearing loss for children who are hard of hearing ideally includes amplification (hearing aids and
FM system use), sound treatment of the child’s language learning environment, speech-language intervention, and
classroom support as needed.
6. Under most current programs of early identification and subsequent interventions, deafness poses a grim threat to
children’s normal acquisition of an oral language.
7. Current controversies in deafness include the relative importance of oral versus sign languages in children’s
acquisition of communication competence and the role of the Deaf culture as a political force.
8. Challenges in the assessment of communication of children with hearing loss include difficulties in determining the
mode(s) in which to conduct testing (e.g., oral, ASL, Total Communication) as well as a scarcity of both appropriate
developmental expectations for communication acquisition and standardized norm-referenced measures for this
population in any mode.
Key Concepts and Terms
cochlear implant: a prosthetic device that provides stimulation of the acoustic nerve in response to sound and is used
with individuals who have little residual hearing.
conductive hearing loss: a hearing loss caused by an abnormality affecting the transmission of sound and mechanical
energy from the outer to the inner ear.
deafness: a hearing loss greater than or equal to 70 dB HL, which precludes the understanding of speech through
audition.
Page 206
FM (frequency modulated) radio systems: one of several systems designed to address the problems of low signal-to-
noise ratios and reverberation occurring in settings such as classrooms; these are used in combination with personal
hearing aids.
hard of hearing: having a degree of hearing loss usually less than 70 dB HL, which allows speech and language
acquisition to occur primarily through audition.
hearing loss configuration: the pattern of hearing loss across sound frequencies—for instance, a high-frequency loss is
one in which the loss is greatest in the high frequencies.
mixed hearing loss: a hearing loss with both conductive and sensorineural components.
otitis media: middle ear infection.
otoacoustic emissions: low-level audio frequency sounds that are produced by the cochlea as part of the normal hearing
process (Lonsbury-Martin, Martin, & Whitehead, 1997).
ototoxicity: the property, found in some drugs and environmental substances, of being poisonous to the inner ear.
overshadowing: the tendency for professionals to focus on a primary problem to a degree that causes them to overlook
other, significant problem areas.
prelingual hearing loss: a hearing loss acquired before age 2, which is thought to be associated with a more significant
impact.
prematurity: birth 2 or more weeks prior to the expected due date.
Rh factor incompatibility: a condition in which the blood of mother and infant have discrepant Rh factors, resulting in
maternal antibody production that can prove harmful to the infant if untreated.
sensorineural hearing loss: hearing loss due to pathology affecting the inner ear or nervous system pathways leading to
the cortex.
Study Questions and Questions to Expand Your Thinking
1. The tendency to have a diagnosis such as deafness overshadow other significant but less severe conditions is an
understandable but quite unfortunate clinical error. How might you avoid this kind of error in clinical practice?
2. Protective ear plugs (e.g., EAR Classic) produce the equivalent of a mild (approximately 20–30 dB) hearing loss. Find
a pair and use them in three different listening conditions. For example, talking with a friend face to face in a quiet
setting, listening to a lecture from your usual seat in the classroom, and watching the TV news with the loudness level set
at a comfortable listening level (before you put the plugs in). Write down what you hear.
3. Repeat the experiment from Question 2 using only one ear plug. Besides noting what you hear, note whether you
changed anything else about your behavior as you listened and talked.
Page 207
4. Briefly describe an argument you might make favoring the use of total communication with a deaf child born to
hearing parents.
5. Repeat Question 4, but argue in favor of the use of ASL only with the same child.
6. Consider the etiologies described for hearing loss in this chapter. What preventive measures might help reduce the
occurrence of hearing loss in infants? Are there any of these measures in which you could play a role as a school-based
speech-language pathologist? As a citizen of your local community?
7. List four things you would want to be sure to remember as you prepare for the oral language evaluation of a child who
is hard of hearing and who regularly uses a hearing aid, where the purpose of the evaluation is to determine the child’s
optimal performance.
Recommended Readings
Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech-Language-
Hearing Research, 41, 561–584.
Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.
Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.
Scheetz, N. A. (1993). Orientation to deafness. Boston: Allyn & Bacon.
References
American Speech-Language-Hearing Association and the Council on Education of the Deaf. (1998). Hearing loss:
Terminology and classification; position statement and technical report. ASHA, 40 (Suppl. 18), pp. 22–23.
Bellugi, U., van Hoek, K., Lillo-Martin, D., & O’Grady, L. (1993). The acquisition of syntax and space in young deaf
signers. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 132–149).
Mahwah, NJ: Lawrence Erlbaum Associates.
Bergstrom, L., Hemenway, W. G., & Downs, M. P. (1971). A high risk registry to find congenital deafness.
Otolaryngological Clinics of North America, 4, 369–399.
Bess, F. H. (1985). The minimally hearing-impaired child. Ear and Hearing, 6(1), 43–47.
Bess, F., Klee, T., & Culbertson, J. L. (1986). Identification, assessment and management of children with unilateral
sensorineural hearing loss. Ear and Hearing, 7(1), 43–51.
Brackett, D. (1997). Intervention for children with hearing impairment in general education settings. Language, Speech,
and Hearing in Schools, 28, 355–361.
Bradley-Johnson, S., & Evans, L. D. (1991). Psychoeducational assessment of hearing-impaired students: Infancy
through high school. Austin, TX: Pro-Ed.
Cacace, A. T., & McFarland, D. J. (1998). Central auditory processing disorder in school-aged children: A critical
review. Journal of Speech-Language-Hearing Research, 41, 355–373.
Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech-Language-
Hearing Research, 41, S61–S84.
Corker, M. (1996). Deaf transitions: Images and origins of deaf families, deaf communities and deaf identities. Bristol,
PA: Jessica Kingsley.
Coryell, J., & Holcomb, T. K. (1997). The use of sign language and sign systems in facilitating the language acquisition
and communication of deaf students. Language, Speech & Hearing Services in Schools, 28, 384–394.
Crittenden, J. B. (1993). The culture and identity of deafness. In P. V. Paul & D. W. Jackson (Eds.), Toward a
psychology of deafness: Theoretical and empirical perspectives (pp. 215–235). Needham Heights, MA: Allyn & Bacon.
Page 208
Culbertson, J. L., & Gilbert, L. E. (1986). Children with unilateral sensorineural hearing loss: Cognitive, academic, and
social development. Ear and Hearing, 7(1), 38–42.
Dirckx, J. H. (1997). Stedman’s concise medical dictionary for the health professions. Baltimore: Williams & Wilkins.
Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and
treatment. Gaithersburg, MD: Aspen.
Dubé, R. V. (1995). Language assessment of deaf children: American Sign Language and English. Journal of the
American Deafness and Rehabilitation Association, 29, 8–16.
Engen, E., & Engen, T. (1983). The Rhode Island Test of Language Structure. Baltimore: University Park Press.
Flexer, C. (1994). Facilitating hearing and listening in young children. San Diego, CA: Singular Press.
Fraser, G. R. (1976). The causes of profound deafness in childhood. Baltimore: The Johns Hopkins University Press.
Fria, T. J., Cantekin, E. I., & Eichler, J. A. (1985). Hearing acuity of children with otitis media with effusion.
Otolaryngology—Head and Neck Surgery, 111, 10–16.
Goldsmith, L., & Schloss, P. J. (1986). Diagnostic overshadowing among school psychologists working with hearing-
impaired learners. American Annals of the Deaf, 131, 288–293.
Harris, J. (1995). The cultural meaning of deafness. Brookfield, VT: Ashgate.
Harrison, M., & Roush, J. (1996). Age of suspicion, identification, and intervention for infants and young children with
hearing loss: A national study. Ear and Hearing, 17(1), 55–62.
Kapur, Y. P. (1996). Epidemiology of childhood hearing loss. In S. E. Gerber (Ed.), The handbook of pediatric
audiology (pp. 3–14). Washington, DC: Gallaudet University Press.
Karchmer, M. A., Milone, M. N., & Wolk, S. (1979). Educational significance of hearing loss at three levels of severity.
American Annals of the Deaf, 124, 97–109.
Klein, S. K., & Rapin, I. (1992). Intermittent conductive hearing loss and language development. In D. Bishop & K.
Mogford (Eds.), Language development in exceptional circumstances (pp. 96–109). Mahwah, NJ: Lawrence Erlbaum
Associates.
Layton, T. L., & Holmes, D. W. (1985). Carolina Picture Vocabulary Test. Austin, TX: Pro-Ed.
Lillo-Martin, D., Bellugi, U., & Poizner, H. (1985). Tests for American Sign Language. San Diego: The Salk Institute for
Biological Studies.
Ling, D. (1989). Foundations of spoken language for hearing impaired children. Washington, DC: A. G. Bell
Association for the Deaf.
Lonsbury-Martin, B. L., Martin, G. K., & Whitehead, M. L. (1997). Distortion-production otoacoustic emissions. In M.
S. Robinette & T. J. Glattke (Eds.), Otoacoustic emissions: Clinical applications (pp. 83–109). New York: Thieme.
Mauk, G. W., & White, K. R. (1995). Giving children a sound beginning: The promise of universal newborn hearing
screening. Volta Review, 97(1), 5–32.
Maxwell, M. M. (1997). Communication assessments of individuals with limited hearing. Language, Speech, Hearing
Services in Schools, 28, 231–244.
Moeller, M. P. (1988). Combining formal and informal strategies for language assessment of hearing-impaired children.
Journal of the Academy of Rehabilitative Audiology. Monograph Supplement, 21, 73–99.
Mogford, K. (1993). Oral language acquisition in the prelinguistically deaf. In D. Bishop & K. Mogford (Eds.),
Language development in exceptional circumstances (pp. 110–131). Mahwah, NJ: Lawrence Erlbaum Associates.
Mogford-Bevan, K. (1993). Language acquisition and development with sensory impairment: Hearing impaired children.
In G. Blanken, J. Pitman, H. Grimm, J. C. Marshall, & C. W. Wallesch (Eds.), Linguistic disorders and pathologies: An
international handbook (pp. 660–679). Berlin, Germany: deGruyter.
Moog, J. S., & Geers, A. E. (1975). Scales of Early Communication Skills. St. Louis, MO: Central Institute for the Deaf.
Moog, J. S., & Geers, A. E. (1980). Grammatical analysis of elicited language: Complex sentence level. St. Louis, MO:
Central Institute for the Deaf.
Page 209
Moog, J. S., & Geers, A. E. (1985). Grammatical analysis of elicited language: Simple sentence level. St. Louis, MO:
Central Institute for the Deaf.
Moog, J. S., & Kozak, V. J. (1983). Teacher assessment of grammatical structure. St. Louis, MO: Central Institute for
the Deaf.
Moog, J. S., Kozak, V. J., & Geers, A. E. (1983). Grammatical analysis of written language: Pre-sentence level. St.
Louis, MO: Central Institute for the Deaf.
Moores, D. F. (1987). Educating the deaf. Boston: Houghton Mifflin.
Musket, C. H. (1981). Maintenance of personal hearing aids. In M. Ross, R. J. Roeser, & M. Downs (Eds.), Auditory
disorders in school children (pp. 229–248). New York: Thieme & Stratton.
Nelson, K. E., Loncke, F., & Camarata, S. (1993). Implications of research on deaf and hearing children’s language
learning. In M. Marschark & M. D. Clarke (Eds.), Psychological perspectives on deafness (pp. 123–152). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle Developmental Inventory. Allen, TX:
DLM Teaching Resources.
Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.
Oyler, R. F., Oyler, A. L., & Matkin, N. D. (1988). Unilateral hearing loss: Demographics and educational impact.
Language, Speech, and Hearing Services in the Schools, 19, 201–209.
Paul, P. V. (1998). Literacy & Deafness. Boston: Allyn & Bacon.
Paul, P. V., & Jackson, D. W. (1993). Toward a psychology of deafness: Theoretical and empirical perspectives.
Needham Heights, MA: Allyn & Bacon.
Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis: Mosby.
Peters, S. A. F., Grievink, E. H., van Bon, W. H. J., Van den Bercken, J. H. L., & Schilder, A. G. M. (1997). The
contribution of risk factors to the effect of early otitis media with effusion on later language, reading, and spelling.
Developmental Medicine and Child Neurology, 39, 31–39.
Prinz, P., & Strong, M. (1994). A test of ASL. Unpublished manuscript, San Francisco State University, California
Research Institute.
Rees, N. S. (1973). Auditory processing factors in language disorders: The view from Procrustes’ bed. Journal of Speech
and Hearing Disorders, 38, 304–315.
Resnick, T. J., & Rapin, I. (1991). Language disorders in children. Psychiatric Annals, 21, 709–716.
Ries, P. W. (1994). Prevalence and characteristics of persons with hearing trouble: United States. 1990–91. National
Center for Health Statistics. Vital Health Statistics, 10 (188).
Ross, M. (1990). Hearing impaired children in the mainstream. Parkton, MD: York Press.
Ross, M., Brackett, D., & Maxon, A. (1991). Assessment and management of mainstreamed hearing-impaired children:
Principles and practices. Austin, TX: Pro-Ed.
Roush, J., & Matkin, N. D. (1994). Infants and toddlers with hearing loss: Family centered assessment and intervention.
Baltimore: York Press.
Sanders, D. A. (1993). Management of hearing handicap. Englewood Cliffs, NJ: Prentice-Hall.
Scheetz, N. A. (1993). Orientation to deafness. Needham Heights, MA: Allyn & Bacon.
Smedley, T., & Plapinger, D. (1988). The nonfunctioning hearing aid: A case of double jeopardy. The Volta Review,
February/March, 77–84.
Spencer, P. E., & Deyo, D. A. (1993). Cognitive and social aspects of deaf children’s play. In M. Marschark & M. D.
Clarke (Eds.), Psychological perspectives on deafness (pp. 65–91). Hillsdale, NJ: Lawrence Erlbaum Associates.
Stout, G. G., & Windle, J. (1992). Developmental approach to successful listening II—DASL II. Denver: Resource Point.
Strong, M., & Prinz, P. (1997). A study of the relationship between American Sign Language and English literacy.
Journal of Deaf Studies and Deaf Education, 2(1), 37–46.
Supalla, T., Newport, E., Singleton, J., Supalla, S., Metlay, D., & Coulter, G. (1994). Test Battery for American Sign
Language Morphology and Syntax. Burtonsville, MD: Linstok Press.
Tye-Murray, N., Spencer, L., & Woodworth, G. G. (1995). Acquisition of speech by children who have prolonged
cochlear implant experience. Journal of Speech & Hearing Research, 38(2), 327–337.
Page 210
Vernon, M., & Andrews, J. F. (1990). Other causes of deafness: Their psychological role. The psychology of deafness
(pp. 40–67). New York: Longman.
Voutilainen, R., Jauhiainen, T., & Linkola, H. (1988). Associated handicaps in children with hearing loss. Scandinavian
Audiological Supplement, 33, 57–59.
Worthington, D. W., Stelmachowicz, P., & Larson, L. (1986). Audiological evaluation. In M. J. Osberger (Ed.),
Language and learning skills of hearing impaired students. American Speech-Language-Hearing Association
Monographs, 23, 12–20.
Ying, E. (1990). Speech and language assessment: Communication evaluation. In M. Ross (Ed.), Hearing impaired
children in the mainstream (pp. 45–60). Parkton, MD: York Press.
Yoshinaga-Itano, C. (1997). The challenge of assessing language in children with hearing loss. Language, Speech, and
Hearing Services in Schools, 28, 362–373.
Page 211
PART
III
Available Tools
Practical Considerations
Since his infancy, Serge’s parents had suspected that there was something different about their third child. Although he
was a healthy and friendly baby, he rarely vocalized and used only a few intelligible words by the time he was 3. He also
seemed able to ignore much of what went on around him while being extraordinarily sensitive to loud noises such as
motorcycles or a TV turned up by his older siblings. On the basis of Serge’s mother’s reports and the results of the
Denver II (Frankenburg, Dodds, & Archer, 1990), an early educator at a preschool screening recommended a complete
speech-language and hearing evaluation.
Amelia had “just gotten by” in the early grades. Although she never performed particularly well, she rarely failed
assignments and never received a failing grade. She was well organized, attentive, and ever so eager to please. Her
parents were accepting of her performance because they, too, had never done terribly well in school; they had just been
happy that she was enjoying it so much. All of her enjoyment vanished,
Page 214
however, in the fourth grade, when the language of the classroom became more complex and more dependent on the
books being used. She pretended to be sick in order to avoid school and cried in frustration when the work seemed too
hard. Her teacher and the school speech-language pathologist were so alarmed by her behavior and by the quality of
her written and oral discourse that they decided an in-depth examination of her oral language and literacy skills was
necessary immediately.
The Nature of Screening and Identification
Screening and identification of language disorders are closely related enterprises. Screening procedures aid clinicians in
making a relatively gross decision—Should this child’s communication be scrutinized more closely for the possible
presence of a language disorder? Identification, on the other hand, takes that question several steps further. Does this
child have a language disorder, a difference in language, or both? Often this complex question is tied to yet another
question: Is this child eligible for services within a particular setting?
Screening
In many cases, referrals by concerned parents, teachers, or physicians function as indirect screening mechanisms.
Nonetheless, alternative procedures are needed in cases when such indirect methods are unlikely to occur or are
unsuccessful. Although detection may readily occur at the behest of concerned families facing severe problems,
detection may be delayed when the problems are mild (e.g., when they consist of subtle difficulties in comprehension) or
when they are unaccompanied by obvious physical or cognitive disabilities (Prizant & Wetherby, 1993).
Screening is typically used when the number of individuals under consideration makes the use of more elaborate
methods impractical—usually from the perspectives of both time and money. Much of the current thinking about
screening and its relationship to identification is borrowed from the realm of public health (e.g., Thorner & Remein,
1962). In that context, screenings are designed to be quick, inexpensive, and capable of being conducted by individuals
with lesser amounts of training. Similarly, in speech-language pathology, the administration and interpretation of
screening methods should require minimal time and expertise. Nonetheless, validity continues to be of critical
importance because an inaccurate screening procedure is useless no matter how quick or inexpensive it may be!
A number of different kinds of screening mechanisms occur in the detection and management of language disorders. Of
greatest importance for our purposes is screening for the presence of a language disorder. Such a screening, for example,
might be performed on all 3–5 year olds in a given school district, often as part of a broader screening for a variety of
health and developmental risks. Another example of such a comprehensive screening would occur as part of neonatal
intensive care follow-up. When examined alone, communication is screened using a great variety of measures with
selected aspects of speech, language, and hearing as their major foci.
In practice, such screenings are often informal and frequently combine several measures—some formal and some informal—to increase the comprehensiveness of the examination. Specific tools used in a more focused approach to language screening are discussed in the Available Tools section of this chapter.
When examined as part of a broader screening effort, communication is frequently assessed using a measure designed to
address a variety of major areas of functioning. One example of these kinds of screening measures is the Denver
Developmental Screening Test—Revised (Frankenburg, Dodds, Fandal, Kazuk, & Cohrs, 1975; Feeney & Bernthal,
1996), a screening tool for children from birth to age 6 that makes use of direct elicitation and parental reports. Another
is the Developmental Indicators for Assessment of Learning—Revised (DIAL-R; Mardell-Czudnowski & Goldenberg, 1983), a screening tool for children ages 2–6 that is often used to screen larger numbers of children through the use of a team of evaluators, each of whom elicits behaviors from an individual child within a given area.
In a 1986 study of the 19 measures most commonly used in federally funded demonstration projects around the United
States, Lehr, Ysseldyke, and Thurlow (1986) found only 3 that they judged to be technically adequate: the Vineland
Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984), the McCarthy Scales of Children’s Abilities (McCarthy,
1972), and the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983). Bracken (1987) noted similar
problems with available screening measures, especially among measures designed for children younger than 4. This lack
of well-developed comprehensive screening tests is particularly problematic given the demand inherent in the
Individuals with Disabilities Education Act (IDEA, 1990) which compels identification of at-risk children at very young
ages.
Screening procedures are also used by speech-language pathologists during comprehensive communication assessments
to determine (a) whether specific areas of communication (e.g., voice, fluency, hearing) need in-depth testing and (b)
whether problems exist in other major areas of functioning (e.g., vision, cognition) and thus require referrals. Nuttall,
Romero, and Kalesnik (1999) provided a wide-ranging discussion of various types of developmental preschool
screenings.
Identification
Essentially, identification procedures for language disorders in children are intended to verify the existence of a problem
that may have been suspected by referral sources or uncovered through a screening program. For the purposes of this
book, identification is seen as synonymous with the term diagnosis, when that term is defined as the “identification of a
disease, abnormality, or disorder by analysis of the symptoms presented” (Nicolosi, Harryman, & Kresheck, 1996, p.
86). Diagnosis is often defined so that it includes the larger set of questions leading to conclusions regarding etiology,
prognosis, and recommendations for treatment (e.g., see Haynes, Pindzola, & Emerick, 1992). Here, however, the term
identification is preferred as a means of expediting our focus on the special measurement considerations it entails.
Identification decisions involving children are crucial for at least two reasons. First, identification is usually the first step
that enables the child to receive help, often in the form of intervention. This step is a critical one because of the emotional, monetary, and temporal demands that accompany intervention, demands that will be met to varying degrees by the child, the parents, the speech-language pathologist, and the larger community. Second, by leading to effective
intervention, correct identification can help prevent or mitigate the additional social and scholastic problems that may
accompany language impairment. Identification decisions are among the most important ones made by speech-language
pathologists and, therefore, should be among the most carefully made.
Because identification decisions often involve the assignment of a label, they are often associated with a fear on the part
of many parents and some theorists (Shepard, 1989) that the child will be equated with the disorder. For example, the
parents may fear that their child will no longer be seen as “a cute, complicated child” when he or she becomes an
“autistic child.” Although person-first nomenclature (e.g., referring to “a person with autism” rather than “an autistic
person” or, worse yet, “an autistic”) is intended to make the process of labeling more benign, the negative implications
of being identified as having a communication disorder exist nonetheless in the minds of parents and perhaps in the
understandings of naive observers. This is evident when parents find one label—for example, “language impaired”—
more acceptable than another—such as “language delayed”—as clinicians frequently discover during their interactions
with families (Kamhi, 1998). Concerns about labeling in the special education community are intense and have led to
recommendations to avoid labels as much as possible, particularly for younger children and in cases where only a
screening has been conducted (Nuttall et al., 1999).
Many of the measurement issues associated with identification mirror those of screening. However, the more permanent
nature of identification and its association with decisions about access to continuing services raise the stakes in the
quality of decision making required. In the next section, special measurement considerations affecting both screening
and identification are discussed in some detail, with efforts made to call readers’ attention to points where the two differ.
Special Considerations When Asking This Clinical Question
If I were reading this book as a student (or as a clinician who finds measurement less interesting than I do), I would be
hoping that my friendly author would offer several easy steps toward accurate and efficient screening and identification.
Better yet, perhaps she would tell me exactly which screening and identification measures I should purchase and exactly
which three simple steps I should follow for infallible clinical decision making. Sadly, as much as I would like to help, a
blanket prescription for test purchasing and use cannot be made for all of the testing situations facing even a very small
group of readers. Instead, what I can do is provide basic information about some special considerations and then, in the
next section, introduce some of the many available measures that can be used for screening and identification.
In this section of the chapter, several special considerations are explored to help readers engage in the process of test
selection and interpretation for the purposes of screening and identification. These special considerations represent
refinements of some of the information presented in earlier chapters—refinements dictated by the particular demands of
screening and identification as testing purposes.
In learning how to choose the best possible measure for a given purpose, the tie between measurement purpose and
methodology was not always obvious to me. Some time ago, in my first published article, a colleague and I used 10
operational definitions of psychometric guidelines offered by the APA, AERA, and NCME (1985) to evaluate 30
language and articulation tests used with preschool children (McCauley & Swisher, 1984a). The criteria included an
adequate description of tester qualifications, evidence of test–retest reliability, information about criterion-related
validity, and others. Almost instantly, a well-known language researcher, John Muma (1985), chastised us, citing, among
other reasons, the danger that readers would assume that each of the criteria we included was as important as
every other. Today, as in 1985, it seems to me that although Muma failed to understand the basic intent of the article, he
was absolutely on the mark in his concern about its fostering misunderstanding. In fact, as you will see in the next
chapters, different purposes of testing will draw special attention to different aspects of the measures one might use. It is
important to pay attention to this ironclad connection in order to make ethical decisions.
The appropriateness of standardized norm-referenced tests for purposes of identifying a language disorder or difference
is almost universally accepted in the clinical literature (e.g., see Kelly & Rice, 1986; Merrell & Plante, 1997; Sabatino,
Vance, & Miller, 1993; cf. Muma, 1998). In addition, such instruments are widely favored for that purpose by practicing
speech-language pathologists (e.g., see Huang, Hopkins, & Nippold, 1997). Often, their use is mandated as the backbone
of screening and identification efforts.
In an ideal world, speech-language pathologists would be able to predict flawlessly which children would experience
persistent, penalizing differences in communication based on a description of each child’s current language status. Thus,
criterion-referenced measures would generally suffice for both identification and treatment planning. However, given the
current level of understanding, the best strategy is to (a) identify those children whose performance seems sufficiently
different from the performances of a relatively large group of peers as to warrant concern and (b) supplement that
information with other sources of information, particularly from persons familiar with the child’s functional
communication.
Because of the tie between norm-referenced measures and identification procedures, most of the special considerations
regarding screening and identification discussed next relate to the use of norm-referenced measures in decision making.
The six special considerations involve (a) weighing measure sensitivity and specificity in test selection, (b) deciding on
cutoff scores, (c) remembering measurement error in score interpretation, (d) wrestling with the disorder–difference
question, (e) conducting comparisons between scores, and (f) taking into account base rates and referral rates in
evaluating screening measures. The first two of these considerations address concerns that will primarily be dealt with by the clinician prior to use of an instrument in a particular case. The next three address concerns
arising during the process of test use. The last consideration relates to one’s thinking about how to implement and
potentially evaluate a screening program—a more specific concern than the other five.
Weighing Measure Sensitivity and Specificity in Test Selection
On the basis of previous discussions of validity, readers can anticipate that a measure used to screen or identify children for language disorders should provide, as a cornerstone of the evidence supporting its validity, convincing empirical documentation of its ability to distinguish children with and without such disorders (Plante & Vance, 1994).
One method used to examine the accuracy of classification achieved by screening and identification measures entails the
comparison of the measure under study with a measure that is considered valid or at least acceptable given the state of
the art. Comparison against an ideal is often described as a comparison with a gold standard, a measure that has been so
thoroughly studied that it is thought to represent the very best measure available for a given purpose. Because of the
scarcity of gold standards in arenas related to child language assessment, the more typical scenario involves a
comparison with a well-studied and well-respected measure.
In the case of a screening measure, the comparison is often made between the results of a screening procedure and those
of a more elaborate and established method of identification. The comparison may involve the use of a more well-
established test or test battery that has been independently validated. As you may recognize in the discussion that
follows, the method used to compare these performances is largely an elaboration of the contrasting-groups method
described in chapter 3.
The comparison often makes use of a contingency table, such as that portrayed in Fig. 9.1 and in earlier sections of the
book. In Fig. 9.1, two tables are used—one to illustrate the components of this type of table and the other to show a
hypothetical example: the results of the Hopeful Screening Test contrasted with those of the Firmly Established
Identification Measure for a group of 1000 individuals.
As you can see from the first table in the figure, sensitivity is simply the proportion of true positives produced by the
measure. Thus, it reflects how frequently those children needing further evaluation are accurately found using this
measure. According to a more formal definition, sensitivity is a measure of the ability of a test or procedure to give a
positive result when the person being assessed truly does have the disorder. Specificity is a measure of the ability of a
measure to give a negative result when the person being assessed truly does not have the disorder. It is usually described
as the proportion of true negatives associated with the measure. Thus, for a screening measure, specificity reflects how frequently problem-free individuals are correctly held back from additional evaluation that they do not need. In other words, a test or procedure that underidentifies children suffers from poor sensitivity, and a test or procedure that overidentifies children suffers from poor specificity.
In the case of the hypothetical Hopeful Screening Test of Language, sensitivity seems to be less than most people would
be happy with: on the basis of its results,
Fig. 9.1. Information contained in a contingency table and an example showing how it can be used to calculate
sensitivity and specificity.
22%, or about 1/5, of children with the disorder would go undetected and thus be excluded from further assessment. In
contrast, the measure’s specificity is excellent, with only about 5 out of every 100 children who are performing normally
recommended for unnecessary testing.
In discussions of what constitutes acceptable levels of overall accuracy for language identification measures, Plante and Vance (1994) noted that overall accuracy (i.e., the percentage of true positives plus true negatives out of the entire population) should be at least 90% for an evaluation of “good” and 80% for an evaluation of “fair.” Thus, although the Hopeful Screening Test of Language might be considered good in its overall accuracy (about 94%), its sensitivity cannot be regarded nearly so highly (78%).
With regard to sensitivity and specificity for language-screening procedures, Plante and Vance (1995) recommended that
a higher standard be met for sensitivity than for specificity. Specifically, they recommended that sensitivity should be at
90% or above, whereas for specificity they accepted levels of 80% as “good” and 70% as “fair.” Thus, although sensitivity and specificity are both inversely related to the frequency of errors (also called “misses”) in decision making associated with a particular test or procedure, it is important to examine them independently rather than lumped together in a single measure of accuracy because their effects differ. As Plante and Vance noted, sensitivity is more
important for screening measures than specificity because the underreferrals associated with poorer sensitivity may have
greater negative effects on children than overreferrals associated with poorer specificity.
Taking Plante and Vance’s (1995) line of thought one step further, not only should clinicians go beyond overall accuracy
of classification in their evaluations of measures, they should also consider the implications of a measure’s sensitivity
and specificity levels in light of the specific testing situation. Properties of that testing situation include the gravity of the
decision to be made and its irreversibility. For example, lower sensitivity may be more acceptable in settings where
failures to refer for testing or to take steps toward identification will be corrected—such as a situation in which a well-
informed teaching staff will be likely to bring a child to the clinician’s attention regardless of previous screening results.
Similarly, lower specificity may be tolerated in situations where testing resources are not sorely taxed (if there are such
places).
Finally, as a point that cannot be overstressed—the relative sensitivity and specificity of accessible alternatives needs to
enter into the clinician’s decision making: It makes little sense to jump from a rocking boat to a sinking one. Yet this is
the action that may be taken regularly by clinicians who choose reliance on their own untested “judgment” over a flawed
but better understood screening mechanism.
Lest the reader hope that if other indicators of validity and reliability look promising all is likely to be well with regard to
a test’s sensitivity and specificity, consider a relevant finding of Plante and Vance’s (1994) research. Using criteria
closely related to those used in McCauley and Swisher (1984a), Plante and Vance rated 21 language tests designed for
use with 4 to 5 year olds. The researchers then conducted a study of 4 of the tests that met a relatively larger number of
criteria (6 out of 10) to determine their sensitivity and specificity. Of the 4 they examined, only one achieved
acceptable levels. Thus, it pays to look for specific information on sensitivity and specificity—and to demand it from
publishers as a prerequisite to purchase.
In summary, sensitivity and specificity data provide special insight into the way that measures function for purposes of
screening and identification. Thus, they can provide enormously valuable evidence of a measure’s value for those
purposes. Whereas for many purposes sensitivity is even more important than specificity, the specific context in which
the measure is used and the availability of preferable alternatives will ultimately affect clinical perceptions of acceptable
levels. Finally, it seems quite probable that the absence of this information from test manuals, although currently
commonplace, will be rectified only when clinicians begin to discriminate among tests on this basis and to directly urge
publishers to take action.
Choosing a Cutoff Score
One factor that affects both sensitivity and specificity is the cutoff used to determine whether a positive or negative result
has been obtained. When a screening or identification decision is made using a normative comparison, a cutoff score is
selected to indicate the score at which a child’s performance is seen as crossing an invisible boundary from a region of normal variation for that particular group on that particular measure into a region suggesting a difficulty or difference
worthy of attention. Clearly, however, the location of the cutoff point is both arbitrary and significant. Shifting its
location can decrease a test’s specificity while increasing its sensitivity, or vice versa. Thus, the choice of a cutoff is not
a trivial matter.
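The tradeoff can be made concrete with a toy example. All of the scores below are invented; the point is only that moving the cutoff upward catches more affected children (higher sensitivity) while flagging more typical children (lower specificity).

```python
# Invented standard scores for two small groups of children.
affected = [68, 72, 74, 76, 79, 81, 83, 85, 88, 90]    # children with a disorder
typical  = [78, 82, 85, 88, 90, 92, 95, 98, 100, 104]  # typically developing

def screen(cutoff):
    """Flag any score below the cutoff; return (sensitivity, specificity)."""
    sens = sum(s < cutoff for s in affected) / len(affected)
    spec = sum(s >= cutoff for s in typical) / len(typical)
    return sens, spec

print(screen(78))  # (0.4, 1.0): a strict cutoff misses many affected children
print(screen(86))  # (0.8, 0.7): a lenient cutoff flags more typical children
```

No position of the cutoff fixes both numbers at once; the choice only moves errors from one cell of the contingency table to another.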
Clinically oriented authors writing about language disorders have recommended a variety of possible cutoffs for use
when norm-referenced instruments are used as part of developmental language assessments. For example, Owens (1995)
noted that scores falling below the 10th percentile are often considered “other-than-normal.” Leonard (1998) also
observed that researchers frequently use cutoffs falling 1.25 or 1.5 standard deviations below the mean, thus falling close
to Owens’s 10th percentile. Similarly, Paul (1995) endorsed a cutoff at the 10th percentile, corresponding to a standard
score of about 80 and a z score falling 1.25 standard deviations below the mean for scores that are normally distributed.
She indicated that she based her recommendation, in part, on similar levels previously recommended by Fey (1986) and
Lee (1974). However, because of concerns about its arbitrariness and questionable psychometric defensibility, Paul’s
complete criterion is somewhat more elaborate. Specifically, she required
that a child thought by significant adults in his or her life to have a communication handicap should score below the
tenth percentile or below a standard score of 80 on two well-constructed measures of language function to be thought of
as having a language disorder. (p. 5)
Paul’s intention was to make sure that this definition would not strong-arm children who had no real-life problems into
diagnoses simply because of differences in test scores that, although detectable, are of little or no practical significance.
(See a longer discussion of clinical or practical significance in chap. 11.)
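For normally distributed scores, the correspondence among percentiles, z scores, and standard scores (mean 100, SD 15) that underlies these recommendations can be checked directly; the code below is simply normal-curve arithmetic, not a feature of any particular test.

```python
from statistics import NormalDist

nd = NormalDist()  # the standard normal curve

# A z score of -1.25 corresponds to roughly the 10th-11th percentile...
print(f"{nd.cdf(-1.25):.3f}")  # 0.106

# ...and, on a scale with mean 100 and SD 15, to a standard score near 80.
print(100 + 15 * -1.25)  # 81.25

# Conversely, the exact 10th percentile sits at about z = -1.28.
z10 = nd.inv_cdf(0.10)
print(f"{z10:.2f}, {100 + 15 * z10:.1f}")  # -1.28, 80.8
```

This is why cutoffs described as "the 10th percentile," "a standard score of 80," and "1.25 standard deviations below the mean" all land in nearly the same place.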
It is also important to note that Paul (1995) recommended the use of two “well-constructed” measures, given that the use
of one or two measures that are less than that will undermine the intent of the recommendation. Just as a chain is no
stronger than its weakest link, a battery (even of just 2 measures) will be no more accurate than its least accurate member
(Plante & Vance, 1994; Turner, 1988). Because of this concern, Plante (1998) recently recommended that a single valid
test along with a second functional indicator (e.g., clinician judgment, enrollment in treatment) be used for verification of
specific language impairment for research purposes. This recommendation suggests an obvious parallel for clinical applications, one that can be seen as consistent with IDEA (Plante, personal communication).
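A rough illustration of the weakest-link point: if identification requires failure on both of two tests, the battery's sensitivity cannot exceed that of its weaker member, and under an (admittedly simplistic) assumption that the two tests err independently it falls below it. The sensitivities and specificities here are invented.

```python
# Hypothetical accuracy figures for two tests in a battery.
sens_a, spec_a = 0.90, 0.85
sens_b, spec_b = 0.80, 0.90

# Rule: identify a disorder only when BOTH tests are failed.
# Assuming independent errors (a simplification), an affected child
# fails both tests with probability equal to the product...
battery_sens = sens_a * sens_b
print(round(battery_sens, 2))  # 0.72 -- worse than either test alone

# ...while specificity improves, since an unaffected child must be
# misclassified by both tests to be wrongly identified.
battery_spec = 1 - (1 - spec_a) * (1 - spec_b)
print(round(battery_spec, 3))  # 0.985
```

The asymmetry is the practical point: adding a weak test to a battery quietly trades away sensitivity, which is exactly the property screening most needs to protect.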
Sometimes, when cutoffs are selected in accordance with test developer recommendations, clinicians and researchers use
different cutoffs for different tests. Usually, the recommendations of the test developers result in very similar cutoffs to
those discussed earlier. Looking back at the normal curve and its relationship to different types of scores in Fig. 2.5
suggests that small differences in cutoffs should result in only small shifts in selection, thus suggesting that the method
used to select a cutoff probably does not matter. Surprisingly, however, Plante and Vance (1994, 1995) demonstrated
that an empirically derived cutoff can greatly enhance a measure’s sensitivity and specificity. Further, they showed that
empirically derived cutoffs are likely to vary from test to test, thus making the use of a “one-cutoff-fits-all-tests”
practice something that they would advise against. Their work is described briefly in the next paragraphs to help
illustrate the value of research into basic measurement issues such as cutoff selection.
In their studies, Plante and Vance (1994, 1995) used a statistical technique called discriminant analysis—a form of
regression analysis—to examine outcomes associated with different cutoffs. Using this technique, the experimenter
determines to what extent variation in scores is accounted for by group membership and then examines the accuracy of
predictions of group membership made from a resulting regression equation. It allows one to examine the ways in which
changing the cutoff affects sensitivity and specificity.
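Discriminant analysis itself requires statistical software, but its core idea, letting the data pick the cutoff, can be sketched with a brute-force stand-in: scan every candidate cutoff and keep the one that maximizes combined sensitivity and specificity. All scores below are invented, and this scan is a crude illustration rather than the procedure Plante and Vance used.

```python
# Invented scores for children with and without a confirmed language disorder.
affected = [62, 65, 70, 72, 75, 77, 80, 84]
typical  = [79, 86, 88, 91, 94, 97, 100, 103]

def sens_spec(cutoff):
    """A score below the cutoff counts as a positive result."""
    sens = sum(s < cutoff for s in affected) / len(affected)
    spec = sum(s >= cutoff for s in typical) / len(typical)
    return sens, spec

# Try every observed score as a candidate cutoff; keep the one with the
# best sensitivity + specificity (an empirically derived cutoff in miniature).
candidates = sorted(set(affected + typical))
best = max(candidates, key=lambda c: sum(sens_spec(c)))
print(best, sens_spec(best))  # 86 (1.0, 0.875)
```

Notice that the data-driven choice here lands well above a conventional cutoff such as 80, which for these particular samples would have missed two affected children; that test-to-test variation is precisely Plante and Vance's argument against a one-cutoff-fits-all practice.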
Plante and Vance (1994, 1995) recommended two strategies for ensuring the availability of empirically derived cutoffs
such as those that can be obtained through discriminant analysis. First, they advised clinicians to insist that standardized
measures offer such cutoffs along with data concerning sensitivity and specificity. Second, they noted the possibility of
developing local cutoffs, a process that requires fewer participants than local norming but that can require clinicians who
attempt it to seek statistical assistance (Plante & Vance, 1995).
Although not endorsed by Plante and Vance (1995), the development of local norms may also represent a responsible
strategy for increasing the availability of data concerning sensitivity and specificity of decisions in settings where
sufficient resources and numbers of children (including those with disorders) exist (e.g., see Hirshoren & Ambrose,
1976; Norris, Juarez, & Perkins, 1989; Smit, 1986). Software designed to aid in the construction of local norms (Sabers
& Hutchinson, 1990) makes this strategy more feasible than it once was (Hutchinson, 1996). In addition, the
development and use of local norms has been recommended as a means of dealing with
bias in testing that results from the use of inappropriate norms (e.g., see Vaughn-Cooke, 1983).
In summary, then, the cutoffs used to identify children’s performance as falling below expectations are often arbitrarily
set at about 1.25 to 1.5 standard deviations below the mean. However, greater sensitivity and specificity can be achieved
when empirical methods are used to optimize the performance of the measures used. Not only does this practice
constitute another step that can be taken by test authors and publishers to improve the quality of clinical decision making
in the field, it represents a topic of such practical significance as to invite a wealth of applied research. In addition, as
Paul (1995) suggested, the current state of the art precludes reliance on a single measure—or even a single battery of
measures—to lead in a lockstep fashion to decision making. Integration of functional data about the child will remain a
necessary component of screening and identification for the foreseeable future. As understanding of functional or
qualitative data—such as portfolios and teacher reports of critical incidents—increases (e.g., Schwartz & Olswang,
1996), their role will probably increase as well (see chap. 10), with beneficial results for the sensitivity and specificity of
the process. Further, in many clinical and especially educational settings, the choice of cutoff to be used can seem—and
in some cases may be—outside the control of the speech-language pathologist. The role played by educational agencies
in establishing guidelines for measurement use and clinicians’ productive responses to these are discussed later in this
chapter in the section called Practical Considerations.
There are theoretical concerns, too, about the use of cutoffs that relate to our understanding of the very nature of
language impairment in all children, but particularly in those for whom no obvious cause exists: children with SLI.
Dollaghan and Campbell (1999) recently called attention to the fact that the use of an arbitrary cutoff at a point along a
normal distribution of scores is at odds with theoretical notions that language impairment represents a natural category,
or taxon. Instead, they say that it implies an assumption that children with “impaired” language may simply represent
those children who have less language ability, in the same way that short persons have less height. This possibility has
been pointed out by several theoreticians addressing the question of etiology for children with SLI (e.g., see Lahey,
1990; Leonard, 1987) but has failed to receive sustained attention. As an important step toward reviving consideration of
this hypothesis, Dollaghan and Campbell noted that the question of whether “language impairment” represents a distinct
category versus the lower range of a continuum of performance is an empirical one with potentially powerful
repercussions for both assessment and treatment. Specifically, as a working hypothesis they predict that if language
impairment is taxonic, language deficits would be likely to be more focused and would therefore require more focused
assessments and treatments.
Dollaghan and Campbell (1999) also noted that the time may be ripe for addressing the question of the nature of
language impairment because parallel concerns in clinical psychology with regard to schizophrenia and depression have
spawned rich advances in methodology (Meehl, 1992; Meehl & Yonce, 1994, 1996). They conjectured that these
advances might provide an auspicious starting point for additional efforts. Among the implications of this work are the
possibility of identifying those cutoffs that truly identify children who are categorically different in their language skills from other children rather than those who simply seem quantitatively suspicious because of their lower
performances. Thus, these methods may prove to provide additional strategies for more rational cutoff selection.
Remembering Measurement Error in Score Interpretation
Once a measure has actually been selected and administered and a cutoff level settled on, the clinician uses the test
taker’s score to assist in a decision regarding screening or identification. During this process, because of the weight
attached to individual scores in screening and identification decisions, remembering measurement error in score
interpretation becomes critical to solid clinical decision making—even when functional criteria are incorporated.
Recall that in chapter 3 the concept of SEM was described as a means of conveying the impact of a test’s reliability on
an individual score. Specifically, the lower the reliability of the instrument, the higher the error (quantified using SEM)
attached to the individual score. The importance of reliability and SEM is not due to their ability to remove error
(because they can’t), but rather to their helping us understand the magnitude of error we face.
Figure 9.2 is intended to provide an example illustrating the effect of SEM on a screening decision. It shows the same
score achieved by a child on two different screening measures—one with a larger SEM and the other with a smaller
SEM for that child’s age group. Around each of these scores, there is a 95% confidence interval. The confidence interval
represents a range of scores in which it is likely (although not absolutely assured) that the test taker’s true score falls. A
95% confidence level means that there is a probability of 95% that the interval contains the child’s true score and, of
course, 5% that it does not. It is often recommended that clinicians characterize children’s performance using the range
of scores encompassed within the confidence interval, rather than a single score. Further, it has been suggested that the
SEM for a measure should be no more than one third to one half of its standard deviation (Hansen, 1999).
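The arithmetic behind these intervals can be sketched briefly. In the hypothetical illustration below (the reliabilities, SEMs, and standard deviation are invented for the example, not drawn from any published test), the SEM is derived from a test's standard deviation and reliability, and a 95% confidence interval is the observed score plus or minus 1.96 SEMs:

```python
# Hypothetical sketch: how SEM and a 95% confidence interval are
# computed. All numeric values here are invented for illustration.

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * (1 - reliability) ** 0.5

def confidence_interval(score, sem_value, z=1.96):
    """Bounds of a confidence interval: score +/- z * SEM.
    z = 1.96 corresponds to a 95% confidence level."""
    return (score - z * sem_value, score + z * sem_value)

# Two hypothetical screening tests, both with SD = 15 for this age group:
sem_a = sem(15, 0.84)   # test A: lower reliability -> SEM = 6.0
sem_b = sem(15, 0.96)   # test B: higher reliability -> SEM = 3.0

# The same observed score of 75 yields very different intervals:
interval_a = confidence_interval(75, sem_a)   # about 63.2 to 86.8
interval_b = confidence_interval(75, sem_b)   # about 69.1 to 80.9
```

With a cutoff near 75, the wider interval produced by the less reliable test makes a below-cutoff decision far less certain, which is exactly the situation depicted in Fig. 9.2.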
If a score of 75 is used as a cutoff on each test in the example, clearly the task of deciding that the child’s performance
falls below that value becomes much trickier for test A than for test B, despite identical scores. In fact, one might be
tempted to refrain from using test A in favor of test B when screening children of this particular age. However, perhaps
test A is preferable as a screening tool for other reasons, for example, because it has a more appropriate normative
sample and better evidence of validity for children similar to the one being tested. In that case, the clinician may decide
to use the measure but view the resulting data with greater circumspection.
Some tests make it quite easy to take error into account during score interpretation because of the way in which a child’s
scores are plotted on the test form. For tests that do not provide this user-friendly feature, however, the test user can
calculate a confidence interval using the tables and following the example laid out in Fig. 9.3. Although the choice of
confidence level is somewhat arbitrary, more stringent levels are usually selected for more momentous decisions.
Confidence intervals of 68, 95,
Page 225
Fig. 9.2. Two 95% confidence intervals calculated for the same score using two different screening measures, one with a larger SEM, on the left, and
the other with a smaller SEM, on the right.
and 99% are the ones most typically reported, with 85 and 90% used less frequently (Sattler, 1988).1
The old adage “know your limitations”—including know the limitations of your data—would work as an apt summary of this brief section.
Information about SEM can help clarify the significance of reliability data for individual clients and can thus be used to help the clinician make
choices in the measures he or she adopts. Further, through the
1 Also note that Salvia and Ysseldyke (1991) and others (including McCauley & Swisher, 1984b; Nunnally, 1978) recommended a slightly more
complex procedure in which an estimated true score is calculated first. This procedure is offered as a first step in appreciating the potential value of
confidence intervals but should not be taken as definitive.
Page 226
Fig. 9.3. Table to be used in calculating confidence intervals, with an example. From “The truth about scores children
achieve on tests” by J. Brown, 1989, Language, Speech, and Hearing Services in Schools, 20, p. 371. Copyright © 1989 by
American Speech-Language-Hearing Association. Reprinted with permission.
use of confidence intervals during interpretation of an individual’s performance, the clinician is given the opportunity to
gauge the possible effect of a measure’s known imperfection (imperfect reliability in this case). Therefore, what may
have begun to sound like a repeated refrain in the last three sections can be sounded again here. One should always make
use of such information when it is readily available, calculate it if possible, and encourage test publishers to provide it
when it is neither offered nor calculable.
Page 227
Wrestling with the Disorder–Difference Question
The diversity of cultural and language backgrounds represented among any group of children can be quite breathtaking.
Even in Vermont, which is often cited as one of the least diverse states in the country, the school district of the state’s
largest city, Burlington (population 40,000), has children whose first languages include Vietnamese, Serbo-Croatian,
Mandarin, and Arabic. In fact, in 1998 and 1999, about 25 languages other than English were spoken by children whose
proficiency in English was sufficiently low to require special intervention. During the time frame 1987–1988 to 1998–
1999, the number of such children grew from just below 20 to just about 300 (Horness, personal communication).
Because several national companies are represented in Burlington, there are numerous children who have moved here
from different regions of the United States with their parents. Whereas some of these families have moved from other
New England regions with similar regional dialects to Vermont, others have moved from the Deep South or other
regions claiming distinct regional dialects. Further, children in this same school district come from families whose incomes
range from below the poverty level to the stratosphere of affluence. On the basis of these few facts, it seems
safe to say that each speech-language pathologist working in this school district confronts issues related to differences in
culture, regional dialect, social dialect, and primary language on a daily basis. Even in Vermont!
As this example illustrates, diversity affecting language use among young native speakers of English and language use
by children who are acquiring English as a second language is the rule rather than the exception. Consequently,
professionals who work with children are challenged to remain vigilant to cultural and linguistic factors in the selection
and use of screening and identification measures.
Clearly, the magnitude of the challenge differs substantially when the clinician works with children who speak a
minority dialect of English compared with those who are being exposed to English for the first time in a school setting.
This latter group of children is sometimes referred to as having limited English proficiency (LEP). Regardless of
whether they are seen as having a language disorder, they will often be served through an English as a Second Language
(ESL) program in school systems. In contrast, the children who speak a minority dialect of English are perhaps more
easily misunderstood by the SLP because their differences in dialect may go unappreciated, under the assumption that they
are bidialectal—that is, able to use the dialect of the school and a regional or social dialect as well. They may also
include children whose first dialect is unknown to both their classmates and the speech-language pathologist, thus further
increasing the complexity of the speech-language pathologist’s work.
Regardless of the differences between these groups of children, any time there is a mismatch between the tools being
used or between the clinician’s language and culture and the language and culture of the child, the issue of difference
versus disorder becomes relevant. Table 9.1 offers a pair of hypothetical scenarios in which challenges of this type are
presented.
Before figuring out exactly how to respond to the challenges of linguistic and cultural diversity, however, we need to
remind ourselves of what threats to validity are
Page 228
Table 9.1
Scenarios Illustrating the Challenges of Cultural and Linguistic Diversity
interwoven with diversity. I begin by considering the threats that occur in instances where a child speaks a dialect of
English or is acquiring English as a second language—for example, Black English or Spanish-influenced English.
Among the threats to valid testing in English that have been most thoroughly discussed are those arising from the
potential for measures to use situations, directions, formats, or language that are inconsistent with the child’s previous
experience (Taylor & Payne, 1983). Here, the chief concern is in correctly respecting the presence of a language
difference, a difference in language use associated with systematic variation in semantics, phonology, and so on, when
compared with the idealized dialect that is typically represented in standardized language measures. The danger, of
course, is erroneously identifying a difference as a disorder. ASHA (1993) has defined language difference more
elaborately
as a variation of a symbol system used by a group of individuals that reflects and is determined by shared regional,
social, or cultural/ethnic factors. A regional, social or ethnic variation of a symbol system is not considered a disorder of
speech or language. (p. 41)
For children using minority dialects, English-language measures developed without attention to dialectal and
accompanying cultural variation are especially problematic for purposes of screening and identification. The advantages
and disadvantages
Page 229
of alternatives for children who speak Black English and other minority dialects fuel continuing discussion (e.g., see
Damico, Smith, & Augustine, 1996; Kamhi, Pollock, & Harris, 1996; Kayser, 1989, 1995; Reveron, 1984; Taylor &
Payne, 1983; Terrell & Terrell, 1983; Van Keulen, Weddington, & DeBose, 1998; Vaughn-Cooke, 1983).
Not surprisingly, many strategies for coping with this complex issue have been considered, but none are completely
satisfactory for use with children speaking minority dialects (Vaughn-Cooke, 1983; Washington, 1996). When the
continuing use of norm-referenced instruments for these children is entertained (e.g., see Kayser, 1989; Vaughn-Cooke,
1983), it is generally recognized that there are few existing measures that have been found to be suitable. The strategies
that have been recommended and tried include the development of alternative norms, either through adding minorities in
small numbers to normative samples or obtaining normative data for minority children—ideas that are, respectively,
ineffective or impractical in addressing the problems with the norms (e.g., Vaughn-Cooke, 1983). A second method
involves modifying objectionable test components (e.g., Kayser, 1989), and a third involves developing alternative
scoring rules designed to give credit for “correct” answers in the dialect being considered (e.g., Terrell, Arensberg, &
Rosa, 1992). Both of these latter methods have been found lacking because they invalidate the norms, thus transforming
the targeted measure into an informal criterion-referenced measure. Table 9.2 lists some modifications in test admin-
Table 9.2
Modifications of Testing Procedures
1. Reword instructions.
2. Provide additional time for the child to respond.
3. Continue testing beyond the ceiling.
4. Record all responses, particularly when the child changes an answer, explains, comments, or demonstrates.
5. Compare the child’s answers to dialect or to first language or second language learning features. Rescore
articulation and expressive language samples, giving credit for variation or differences.
6. Develop several more practice items so that the process of “taking the test” is established.
7. On picture vocabulary recognition tests, have the child name the picture in addition to pointing to the stimulus
item to ascertain the appropriateness of the label for the pictorial representation.
8. Have the child explain why the “incorrect” answer was selected.
9. Have the child identify the actual object, body part, action, photograph, and so forth, particularly if he or she has
had limited experience with books, line drawings, or the testing process.
10. Complete the testing in several sessions.
11. Omit items you expect the child to miss because of age, language, or culture.
12. Change the pronunciation of vocabulary.
13. Use different pictures.
14. Accept culturally appropriate responses as correct.
15. Have parents or other trusted adult administer the test items.
16. Repeat the stimuli more than specified in the test manual.
Note. From “Speech and Language, Assessment of Spanish-Speaking Children,” by H. Kayser, 1989, Language,
Speech, and Hearing Services in Schools, 20, p. 244. Copyright 1989 by American Speech-Language-Hearing
Association. Reprinted with permission.
Page 230
istration that have been proposed for use with minority children who have been tested with existing norm-referenced
tests; these modifications might profitably be applied in cases where a description of the child’s responses to certain
kinds of stimuli is wanted. Usually, however, those cases will exist not during identification of a language impairment,
but during the descriptive process that follows it (see chap. 10). A fourth method consists of supplementing existing
norm-referenced measures with descriptive tools (Vaughn-Cooke, 1983), which seems to present a very difficult
interpretation challenge to the clinician because norm-referenced measures will be assumed to be biased, and descriptive
measures are usually not up to the challenge of identification.
Finding more widespread approval than those methods just discussed are strategies that entail the abandonment of
currently available measures. These include (a) the substitution of descriptive methods (such as language sample analysis
or criterion-referenced measures; e.g., see Damico, Smith, & Augustine, 1996; Leonard & Weiss, 1983; Schraeder,
Quinn, Stockman, & Miller, 1999) and (b) development of new, more appropriate norm-referenced instruments (Vaughn-
Cooke, 1983; Washington, 1996). Sole use of criterion-referenced approaches, such as language sampling, has the chief
disadvantage of insufficient data supporting that strategy in screening and identification. Washington also noted that
language analyses that might be conducted for young speakers of Black English are hampered by the absence of
appropriate norms because normative data are currently available only for adolescents and adults. However, the many
proponents of a criterion-referenced or descriptive approach (e.g., see Damico, Secord, & Wiig, 1992; Robinson-
Zañartu, 1996) would argue that despite their drawbacks, descriptive strategies offer the least dangerous of the choices.
Not much progress has been made in the development of appropriate norm-referenced instruments; however, that may
change in response to pressures for improved nonbiased assessment. In addition, perusal of recently developed tests
suggests that more sophisticated efforts are being made to consider dialect use in the development of tests for more
diverse populations. This has included the test developer’s examination of item bias for minority children (Plante,
personal communication). Depending on when it is obtained, the resulting data can be used in the test’s early
development to lead to less biased testing or can be presented to show that a relatively unbiased measure has been
achieved.
Beyond the realm of traditional recommendations for improving language assessment validity for diverse groups of
children, attention has been paid recently to the development of methods that seek to reduce the effects of prior
knowledge and experience on performance. Two approaches of particular interest are processing-dependent measures
and dynamic assessment methods. The development of processing-dependent measures involves the use of tasks with
either high novelty or high familiarity for all participants (e.g., Campbell, Dollaghan, Needleman, & Janosky, 1997).
Dynamic assessment methods focus on the child’s learning of new material rather than acquired knowledge. This is done
as a means of leveling the effects of prior experience and obtaining information about how to support the child’s learning
beyond the assessment situation (e.g., Gutierrez-Clellan, Brown, Conboy, & Robinson-Zañartu, 1998; Olswang, Bain, &
Johnson, 1992; Peña, 1996). Although proposed as being applicable to identification decisions, these two types of
measures are more frequently used for descriptive purposes and are discussed more thoroughly in the next chapter.
Page 231
Assessments designed to address the needs of children who can be described as having LEP are growing in number.
Table 9.3 illustrates some of the measures that are being developed for use with children from diverse linguistic and
cultural backgrounds. Clearly at this point, the majority of these measures have been developed for children with Spanish
as their first language. Some of these measures are developed “from scratch” and thus can take advantage of the existing
knowledge base concerning development and disorders in the target languages. In contrast, others are little more than
translations of existing tests—a practice that requires considerable care and may still result in measures that do not get at
the heart of major developmental tasks in the language. For example, translations can be hampered by items that do not
have true counterparts or that will require greater linguistic complexity to convey information in the target language than
in the original. Consumers should be cautioned to be skeptical of their own comfort level with such adaptations of
familiar tests. Further, they will want to be careful of the match between the dialect spoken by the child and the dialect in
which a test is written.
I encourage you to look at more thorough discussions of the special challenges posed during the identification of
language impairment in several groups whose first or major language or dialect is either not English or not the dialect of
English typical of standardized tests. Sources warranting particular attention exist for children who are Native American
(Crago, Annahatak, Doehring, & Allen, 1991; Leap, 1993; Robinson-Zañartu, 1996), Hispanic American (Kayser, 1989,
1991, 1995), Asian American (Cheng, 1987; Pang & Cheng, 1998), and who speak Black English (Kamhi et al., 1996;
Van Keulen et al., 1998) and regional dialects (Wolfram, 1991).
Conducting Comparisons between Scores
Clinicians rarely compare scores on different instruments as part of screening. Instead, such comparisons occur more
commonly during identification. They are particularly common in settings requiring a comparison of nonverbal and
verbal skills, known as cognitive referencing. Despite widespread criticism of this practice (Aram, Morris, & Hall, 1993;
Fey, Long, & Cleave, 1994; Kamhi, 1998; Krassowski & Plante, 1997; Lahey, 1988), its use is nonetheless mandated in
several states to justify services. In addition, it has sometimes been used in research definitions of SLI and other learning
disabilities (see a lengthier discussion of this point in chap. 5). Comparisons of this kind are also used as a means of
identifying strengths and weaknesses in preparation for planning intervention—a descriptive use that is touched on in the
next chapter.
When single pairs of scores are compared, the comparison is frequently referred to as discrepancy analysis; when larger
numbers of scores are compared, it is more frequently referred to as profile analysis. Numerous discussions of the
hazards of this type of comparison are provided in the literature (e.g., McCauley & Swisher, 1984b; Salvia & Ysseldyke,
1998). The focus of the current discussion is the use of such comparisons in identification.
For purposes of illustration, imagine that a child’s overall score on a language measure is to be compared with her
performance on a nonverbal measure of intelli-
Page 232
Table 9.3
Selected Tests Designed for Children Whose Primary Language Is Not English
(Sources: Compton, 1996; Roussel, 1991)

Bilingual Syntax Measure–Chinese (Tsang, n.d.). Ages: Grades K–12. Language: Chinese. Areas: E-Sem. Reference: Tsang, C. (n.d.). Bilingual Syntax Measure–Chinese. Berkeley, CA: Asian-American Bilingual Center.

Spanish Structured Photographic Expressive Language Test (Werner & Kresheck, 1989). Ages: 3-0 to 5-11; 4-0 to 9-5. Language: Spanish. Areas: E. Reference: Werner, E. O., & Kresheck, J. S. (1989). Spanish Structured Photographic Expressive Language Test. Sandwich, IL: Janelle.

Ber-Sil Spanish Test (Beringer, n.d.). Ages: 4 to 12 years. Language: Spanish. Areas: R-Sem, Morph. Reference: Beringer, M. (n.d.). Ber-Sil Spanish Test. Rancho Palos Verdes, CA: The Ber-Sil Company.

Austin Spanish Articulation Test (Carrow-Woolfolk, n.d.). Ages: 3 years to adult. Language: Spanish. Areas: E-Phon. Reference: Carrow-Woolfolk, E. (n.d.). Austin Spanish Articulation Test. Allen, TX: DLM Teaching Resources.

Compton Speech and Language Screening Evaluation–Spanish (Compton & Kline, n.d.). Ages: 3 to 6 years. Language: Spanish. Areas: R & E-Phon, Sem, Syn. Reference: Compton, A. J., & Kline, M. (n.d.). Compton Speech and Language Screening Evaluation–Spanish. San Francisco: Institute of Language.

Test de Vocabulario en Imagenes Peabody (Dunn, Lugo, Padilla, & Dunn, 1986). Ages: 2-6 to 17-11. Language: Spanish. Areas: R-Sem. Reference: Dunn, L. M., Lugo, D. E., Padilla, E. R., & Dunn, L. M. (1986). Test de Vocabulario en Imagenes Peabody. Circle Pines, MN: American Guidance Service.

Page 233

Expressive One-Word Picture Vocabulary Test–Spanish (Gardner, n.d.). Ages: 2 to 11. Language: Spanish. Areas: E-Sem. Reference: Gardner, M. F. (n.d.). Expressive One-Word Picture Vocabulary Test–Spanish. San Francisco: Children’s Hospital of San Francisco.

Prueba del Desarrollo Inicial del Lenguaje (Hresko, Reid, & Hammill, n.d.). Ages: 3 to 7. Language: Spanish. Areas: R-Sem, Syn. Reference: Hresko, W. P., Reid, D. K., & Hammill, D. D. (n.d.). Prueba del Desarrollo Inicial del Lenguaje. San Antonio, TX: Pro-Ed.

Clinical Evaluation of Language Function–3 Spanish Edition (Semel, Wiig, & Secord, n.d.). Ages: 6 to 21. Language: Spanish. Areas: R & E-Sem, Morph, Syn, Prag. Reference: Semel, E., Wiig, E. H., & Secord, W. (n.d.). Clinical Evaluation of Language Function–3 Spanish Edition. San Antonio, TX: Psychological Corporation.

Del Rio Language Screening Test (Toronto, Leverman, Hanna, Rosenzweig, & Maldonado, n.d.). Ages: 3 to 6. Language: Spanish. Areas: R-Sem. Reference: Toronto, A. S., Leverman, D., Hanna, C., Rosenzweig, P., & Maldonado, A. (n.d.). Del Rio Language Screening Test. Austin, TX: National Educational Laboratory.

Preschool Language Scale–3 (Zimmerman, Steiner, & Pond, 1992). Ages: Birth to 6 years. Language: Spanish. Areas: E & R. Reference: Zimmerman, I. L., Steiner, V., & Pond, R. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.

Sequenced Inventory of Communication Development–Revised (Hedrick et al., 1984). Ages: 0-4 to 4-0. Language: Spanish translation. Areas: E & R. Reference: Hedrick, D. L., Prather, E. M., Tobin, A. R., Allen, D. Y., Bliss, L. S., & Rosenberg, L. R. (1984). Sequenced Inventory of Communication Development, Revised Edition. Seattle, WA: University of Washington Press.

Bilingual Syntax Measure–Tagalog (Tsang, n.d.). Ages: Grades K–12. Language: Tagalog. Areas: E. Reference: Tsang, C. (n.d.). Bilingual Syntax Measure–Tagalog. Berkeley, CA: Asian-American Bilingual Center.

Note. E = Expressive. R = Receptive. E-Sem = Expressive Semantics, etc. Morph = Morphology. Phon = Phonology.
Syn = Syntax. Prag = Pragmatics.
Page 234
gence. Imagine that she receives a standard score of 70 on the former and 90 on the latter. On the face of this
comparison, it looks like there is quite a difference. However, differences between scores, also called difference scores
or discrepancies, are often less reliable than the scores on which they are based. In fact, the likelihood that observed
differences are due to error rather than real differences is affected by three factors: the reliability of each measure, the
correlation of the two measures, and the similarity of their normative samples (Salvia & Ysseldyke, 1998).
The task of assessing norm comparability is as straightforward as looking over descriptions of each normative group to
determine whether they seem to differ in ways that could affect the scores to be compared. To see why this is necessary,
recall that the standard scores best used to summarize test performance include the group mean in their calculation.
Therefore, something about the normative group may push one group mean higher (e.g., one group is more “elite” in
some sense than the other). Consequently, one would fare more poorly in a comparison against that group than against a
group with a lower mean, even if one’s true abilities in the two areas were comparable. To provide a poignant example,
imagine a ruthless clinician has decided to compare your language and nonverbal skills—using scores obtained by
comparing your performances against those of Nobel laureates in literature for the former and fifth graders for the latter.
Not only could you legitimately question the appropriateness of the norms as a basis of each of the scores, you could
also vehemently protest the resulting comparison. Thankfully, flagrant mismatches between test norms used in
comparisons may not occur outside of examples like this one. However, if overlooked, more subtle mismatches can
nonetheless contribute to poor decisions and inappropriate clinical actions.
Taking test error and test correlation into account is less straightforward than inspecting norms. On the basis of ideas
analogous to those used for calculating a confidence interval around a single score, however, it is possible to calculate a
confidence interval around a difference score. Salvia and Ysseldyke (1998) described two methods based on differing
assumptions about the causal relationship of the two skills being compared. In addition to the actual score data, both
methods require information about the reliability of the measures being used and about their correlation. Whereas the
relevant information about reliability and the nature of normative samples should be readily available for individual
measures, information about the correlation between measures will often be lacking. In that event, abandoning a direct
comparison and instead noting the results of each test as supporting or not supporting the identification of a problem in a
given area may represent the best alternative (McCauley & Swisher, 1984b).
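To make the idea concrete, here is a minimal sketch of one common way to put a confidence interval around a difference score. It combines the two tests' errors of measurement and assumes both tests share the same standard-score scale; it is one of several procedures described in the measurement literature, not the only defensible method, and the scores, reliabilities, and standard deviation below are invented for illustration:

```python
import math

# Hypothetical sketch of a confidence interval around a difference
# score. Assumes both tests use the same standard-score scale
# (here, mean 100 and SD 15); all numeric values are invented.

def se_difference(sd, rel_x, rel_y):
    """Standard error of the difference between two obtained scores:
    sd * sqrt(2 - rel_x - rel_y)."""
    return sd * math.sqrt(2 - rel_x - rel_y)

def difference_interval(score_x, score_y, sd, rel_x, rel_y, z=1.96):
    """95% (z = 1.96) confidence interval for score_x - score_y."""
    diff = score_x - score_y
    se = se_difference(sd, rel_x, rel_y)
    return (diff - z * se, diff + z * se)

# Language score 70 vs. nonverbal score 90, with invented
# reliabilities of .90 and .85:
low, high = difference_interval(70, 90, 15, 0.90, 0.85)
# The interval runs from about -34.7 to about -5.3; because it
# excludes 0, measurement error alone is an unlikely explanation
# for the observed 20-point discrepancy.
```

Note how quickly the interval widens as either reliability drops: with reliabilities of .80 on both tests, the standard error of the difference grows to about 9.5 points, and a 20-point discrepancy becomes much harder to distinguish from error.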
Even when a difference between two scores is found to be reliable, Salvia and Good (1982) pointed out, a difference of
that magnitude may not be particularly uncommon, or, even more importantly, it may not be functionally meaningful.
Because of the resources involved, determining the functional significance of differences in skill levels represents yet
another area in which clinicians must look to the research literature to help them interpret their clinical data. Fortunately,
in cases where comparisons between scores affect identification decisions, there is a rich literature examining these
issues (e.g., for SLI). Clinicians can be more active and work to change policy in settings in which the use of
discrepancies is mandated for purposes for which they have been found to lack meaning.
Page 235
In summary, comparing scores is a more complicated endeavor than it first appears, involving as it does not only the
child’s test scores but also the properties of the two tests, especially their norms and intercorrelation. A well-reasoned
conservatism in undertaking identifications based on such comparisons should be joined by a healthy appetite for the
clinical literature exploring their significance.
Taking into Account Base Rates and Referral Rates
Each of the special considerations addressed earlier had a more specific focus on test selection or on the use of tests with
a particular child. Two other factors that affect screening and identification decisions really represent features of the
clinical environment: the rarity of the disorder (the base rate of the disorder) and the frequency with which referrals are
made in a particular setting (the referral rate). In this section, these two topics are discussed briefly because of their
effect on screening programs.
The lower the base rate of the disorder—that is, the rarer the disorder in the general population—the more likely it
becomes that the positive results of screening or identification are actually false positives rather than true positives
(Hummel, 1999). Shepard (1989) pointed out that although people understand that classification error will occur based
on fallible measures and decision processes, they fail to appreciate that that error will fall equally on those children who
are identified as having a disorder and on those who are not, even when the validity coefficient for the measure being used is
quite large. She concluded that when base rates are low, “even with reasonably valid measures, the identifications will be
equally divided between correct decisions and false positive decisions” (Shepard, 1989, p. 551). This problem is
particularly acute when measures are less valid for a given population, such as minority children, where
overidentification is very likely to result (Schraeder et al., 1999).
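The arithmetic behind this point is worth seeing once. The sketch below (with invented sensitivity, specificity, and base-rate values) applies Bayes' theorem to compute the probability that a child who screens positive truly has the disorder:

```python
# Hypothetical illustration of why low base rates produce many false
# positives. Sensitivity, specificity, and base rates are invented.

def positive_predictive_value(base_rate, sensitivity, specificity):
    """Probability that a positive screening result is a true positive,
    via Bayes' theorem."""
    true_pos = base_rate * sensitivity
    false_pos = (1 - base_rate) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# A fairly accurate screen (90% sensitivity and specificity) applied
# to a general population with a 5% base rate:
ppv_general = positive_predictive_value(0.05, 0.90, 0.90)
# about 0.32 -- roughly two of every three positives are false.

# The same screen applied to a referred subgroup with a 50% base rate:
ppv_referred = positive_predictive_value(0.50, 0.90, 0.90)
# about 0.90 -- most positives are now true positives.
```

The contrast between the two results shows why targeting screening at subgroups with higher expected prevalence, as discussed next, can so sharply improve the yield of a screening program.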
Concern about low base rates has led public health researchers and psychologists interested in rare psychiatric outcomes
(e.g., suicide) to develop several strategies designed to target screening at subsets of the larger population with higher
base rates. These include the use of multistep screening procedures and the application of
screening procedures to subgroups who are expected to have higher prevalence rates than the general population
(Derogatis & DellaPietra, 1994). Currently, the prevalence of childhood language disorders across all types is not
particularly low, as illustrated by the estimate that children with language disorders constitute 53% of all
speech-language pathologists’ caseloads (Nelson, 1993). Nonetheless, it is sufficiently low that careful selection of
groups for language screening makes good sense. Children about whom concerns are expressed or who are
demonstrably failing in some aspect of their adaptation to school or home environments make obvious candidates for
more focused screenings and indeed are often seen for screening prior to more comprehensive evaluations.
Screening programs in preschool education are associated with enormous differences in referral rates (Thurlow,
Ysseldyke, & O’Sullivan, 1985, as cited in Nuttall et al., 1999), the rates at which children who are screened are referred
on for additional assessment. This variability leads to concerns about overreferral when referral rates are particularly
high and underreferral when they are particularly low. Because
Page 236
overreferrals needlessly tax clinical resources, parental concern, and the child’s patience, whereas underreferrals deprive
children of needed attention, steps to study and alter referral rates have been recommended. Changes in the targets for
screening and the criteria (including cutoffs) used can be made to address verified inadequacies in the screening
mechanism. In addition, the use of a second-level screening using measures that are intermediate in their efficiency and
comprehensiveness between initial screenings and full-fledged assessments has been recommended (Nuttall et al., 1999).
Available Tools
Screening
Available screening measures differ in terms of whether information is obtained directly by the speech-language
pathologist and whether the measurement is formal or informal. Screening methods include the use of norm-referenced
standardized tools as well as informal clinician-developed measures. Over the past few years there has been growing
interest in the development of questionnaires that might be used to increase the involvement of parents and others
familiar with the child and improve the quality of information obtained from them. More recently still, there has been an
interest in the development of criterion-referenced authentic assessments in which specific minimal competencies are
evaluated in a familiar setting. Schraeder et al. (1999) described such a protocol that was developed for use with young
speakers of Black English. Because its elements were selected for their high degree of overlap with features of Standard
American English, Schraeder and her colleagues suggested its potential relevance for many children in the targeted age
group of 3-year-olds.
Parent Questionnaires and Related Instruments
Although historically some instruments have incorporated the use of parent report for very young children (e.g., the
Sequenced Inventory of Communicative Development, Hedrick, Prather, & Tobin, 1975), extensive development of
parent questionnaires for language-disorder screening has blossomed only in the past decade. The use of such
instruments is welcomed from a family-centered perspective (Crais, 1993) because parents are given the opportunity to
share their expertise concerning the child as part of their collaboration in the assessment process. In addition, these
measures also show good potential for efficient, valid use from a psychometric point of view. One obvious advantage
that they have over the clinician-administered procedures is their ability to obtain information that has been accumulated
by the parent over time using questions that cover a variety of situations and settings. For some children and at some
times, the testing advantage is irrefutable: The child will simply not cooperate for more direct testing or is so thoroughly
affected by the testing situation as to make the results of structured observations hopelessly flawed. Even when children
are more amenable to interacting with strangers, parent questionnaires may help remove the subtler invalidating
influence of the clinician on the child’s behavior (Maynard & Marlaire, 1999).
Page 237
On the basis of a growing number of studies, it appears that parent questionnaires may reliably and validly be used to
obtain information about a number of language areas, especially expressive vocabulary and syntax—although most
individual measures remain relatively undeveloped. Leading the trend toward increased development of these measures, the
MacArthur Communicative Development Inventories (Fenson et al., 1991) has been thoroughly studied (e.g., Bates,
Bretherton, & Snyder, 1988; Dale, Bates, Reznick, & Morisset, 1989; Reznick & Goldsmith, 1989). In addition, it has
also been effectively adapted for use with other languages, including Italian, Spanish, and Icelandic (Camaioni, Castelli,
Longobardi, & Volterra, 1991; Jackson-Maldonado, Thal, Marchman, Bates & Gutierrez-Clellan, 1993; Thordardottir &
Ellis Weismer, 1996). Other tools that assess communication more broadly have also been developed but have received
less widespread attention and validation (e.g., Girolametto, 1997; Hadley & Rice, 1993; Haley, Coster, Ludlow,
Haltiwanger, & Andrellos, 1992). Table 9.4 lists five instruments for use with English-speaking children under the age of
3, each of which consists of a parent questionnaire or makes use of parent report for at least some items.
Questionnaires that take advantage of the familiarity of other adults with the child—usually classroom teachers—are also
being developed (Bailey & Roberts,
Table 9.4
Instruments for Use With Children Under 3 Years of Age, Including Parent Reports

Communication Screen (Striffler & Willis, 1981): 3 to 7 years
Fluharty Preschool Speech and Language Screening Test (Fluharty, 1978): 2 to 6 years
Physician's Developmental Quick Screen (Kulig & Baker, 1975): <1 to 6 years
Stephens Oral Language Screening Test (Stephens, 1977): PreK to 1st grade
Sentence Repetition Screening Test (Sturner, Kunze, Funk, & Green, 1993): 4 to 5 years
Texas Preschool Screening (Haber & Norris, 1983): 4 to 6 years
Nigel is a 9-year-old with mild mental retardation whose placement in a multiage classroom is complicated by a
moderate hearing loss and ADD. A 3-year reevaluation conducted at the beginning of the school year included extensive
audiological assessment as well as standardized language testing that confirmed particular difficulties in expressive
phonology and morphosyntax. Language sampling and a classroom checklist were used to help determine the
educational impact of Nigel’s difficulties and to help plan accommodations and develop Nigel’s individualized
educational plan.
Tao has a long history of communication problems that have changed with age. She was diagnosed with autism at age 4,
then Asperger’s syndrome at age 8. Now, at age 12, with appropriate accommodations and intensive treatment, she is in
a regular junior high school. Speech-language intervention has centered on addressing her pragmatic challenges with
peers and teachers. Goals in this area have been identified and tracked during the semester using a variety of descriptive
measures
Page 251
created by her clinicians. Recently a dynamic assessment designed to examine Tao’s emerging awareness of the
perspectives of others was undertaken as part of this process.
The Nature of Description
Describing the skills and problems of children with suspected language impairments sometimes occurs as
part of screening, thus preceding the use of formal procedures associated with identification. More often, description
represents a critical component of initial assessments and continues throughout all of the later steps involved in speech-
language management. With such pervasiveness, description undoubtedly constitutes the major measurement task facing
clinicians.
The purposes served by description are varied. Descriptive measures are initially used to characterize the specific areas
of linguistic or communicative difficulty facing a child, the functional limitations those difficulties impose, and
increasingly the effects on the child’s social roles that are associated with the child’s language disorder (Goldstein &
Gierut, 1998). At the same time, descriptive measures can be used to help plan initial treatment strategies, choose
specific treatment goals, and provide the basis for later comparisons. During treatment, descriptive probes—especially of
untreated but related stimuli—and other descriptive measures are likely to provide some of the best evidence of
treatment effectiveness (Bain & Dollaghan, 1991; Olswang & Bain, 1994; Schmidt & Bjork, 1992) because they reflect
the extent to which generalization is occurring. In fact, much of the profession’s recent focus on measuring outcomes to
document the value of treatment (see Frattali, 1998) involves the development and use of descriptive measures.
Despite the ubiquity of descriptive measures (and perhaps because of it), the measurement challenges they present can be
overlooked, or at least underappreciated (Leonard, Prutting, Perozzi, & Berkley, 1978; McCauley, 1996; Minifie, Darley,
& Sherman, 1963). Illustrating a growing interest in those challenges, Secord (1992) devoted an entire book to
descriptive, nonstandardized language assessment. In an early chapter of that book, Damico, Secord, and Wiig (1992)
noted that effective descriptive assessment procedures need to be “as rigorous as norm-referenced tests” (p. 1). The
source of that rigor, however, is much less obvious than that associated with measures used for purposes of classification.
Much of the rigor associated with methods used in the identification of language impairment appears to reside in the
hands of others (e.g., test authors and publishers, individual researchers). In contrast, for descriptive measures, the
responsibility for rigor falls largely into the hands of the clinician. As Leonard (1996) observed, such measures are
“essentially experimental tasks”—often created by clinicians and sometimes borrowed directly from experimenters.
Fortunately, in creating and understanding such measures, the clinician has allies in the increasing number of clinician–
researchers in speech-language pathology and related fields who develop and share individual methods and reflections
on the measurement challenges they present. In this chapter, I try to pass along some of their insights and direct readers
to particularly helpful examples.
Page 252
Special Considerations for Asking This Clinical Question
The process of description can sometimes use norm-referenced measurement. When profiles of performance are
examined to assess broader patterns of strengths and challenges within different areas of communication, standardized
norm-referenced measures can provide useful information (Olswang & Bain, 1991). This is especially true when
limitations due to test content and measurement error are taken into account (McCauley & Swisher, 1984; Salvia &
Ysseldyke, 1981).
Usually, however, the process of description makes use of criterion-referenced measurement. Such measurement can
function at several levels of detail—from more global categorizations of language function in different modalities to the
detailed description of a specific language or communication skill (e.g., frequency of use of a particular grammatical
morpheme or communicative intent in a given conversational context). Although such descriptions may not always fit
within a view of measurement as the assignment of numbers to behaviors, they fit within the broader view of behavioral
measurement as a simplification process or as information compression used to aid decision making (Barrow, 1992;
Morris, 1994). Thus, as with all cases of measurement, our central concern with validity remains (APA, AERA, &
NCME, 1985; Messick, 1989). However, validity is fostered through means that may superficially appear unrelated to
the psychometric concerns described for norm-referenced instruments. For example, rather than a study of criterion-
related validity using numerous participants and other norm-referenced measures, evidence for descriptive measures may
involve the collection of supporting qualitative and subjective data for a much smaller number of cases, or even a single
case. Because a principal value of such measures is their close tie to a specific construct, the user’s alertness to the nature
of a targeted construct and the degree to which a specific measure serves as an acceptable indicator of it rises in
importance from large to gargantuan proportions.
Damico et al. (1992) discussed three complex characteristics pivotal to effective descriptive assessment techniques:
authenticity, functionality, and richness of description. Authenticity is used to refer to three related concepts: linguistic
realism, ecological validity, and psychometric veracity. Linguistic realism involves the treatment of communication in
data collection and analysis as a complex and synergistic process with the sharing of meaning as its goal, whereas
ecological validity refers to the preservation of natural communicative contexts in assessment. The third concept,
psychometric veracity, encompasses the traditional concepts of reliability and validity as well as the clinical practicality
of the measures in terms such as time and required resources. Concerns regarding authenticity have led to the use of the
term authentic assessment to refer to assessments designed with authenticity as their paramount virtue (e.g., Schraeder,
Quinn, Stockman, & Miller, 1999).
The term functionality as used by Damico et al. (1992) relates to effectiveness, fluency, and appropriateness of conveyed
meaning. This criterion focuses not just on obtaining information about clients’ underlying competence but also about
their ability to put knowledge into play effectively to achieve communication goals.
Page 253
The criterion of richness of description, cited by those same authors, entails the use of assessment procedures designed to provide
detailed descriptions of communicative performance leading to explanatory hypotheses for detected communication
difficulties. This criterion, then, associates descriptive measures with the manipulation of variables in the environment
(materials used, identity of communication partner, etc.) that can be studied for their immediate effect on performance.
I urge readers to examine the original source (Damico et al., 1992) in order to get a deeper feel for the intricacies
involved in assessment that preserve those characteristics of communication that make communication what it is. I also
suggest, however, that the overarching point Damico and his colleagues were making is that descriptive measures of
communication need to be valid—they need to measure what they purport to measure. Specifically, to the very great
extent to which communication is embedded in social interaction, intended to share meaning, and constrained by the
physiological and social makeup of its users, its measurement must honor those properties or suffer the fate of reduced
validity. The work of Damico et al. and numerous others (e.g., Kovarsky, Duchan, & Maxwell, 1999; Lund & Duchan,
1983, 1993; Muma, 1998) is extremely valuable in calling attention to these special properties—an endeavor made all
the more necessary by the frequent equating of principles such as validity only with norm-referenced measurement.
Because of growing sensitivity to the demands for a widening range of descriptive measures, advice about construction
of such measures by clinicians themselves has become increasingly available (e.g., Miller & Paul, 1995; Vetter, 1988).
Providing a succinct foundation for these recommendations, Vetter outlined a systematic process for developing informal
assessment procedures. In an earlier publication on criterion-referenced measures (McCauley, 1996), I modified that
process somewhat and have modified it further in Fig. 10.1 through the addition of a step encouraging clinicians to seek
out existing probes for possible use or adaptation.
In the process outlined in Fig. 10.1, the crucial first step is the formulation of the specific clinical question. In questions
of description, the clinician is relatively unencumbered by the external, regulatory forces (e.g., state requirements) that
affect both the kinds of clinical questions that are asked and the methods used to answer them. However, that does little
to decrease, and may even increase, the clinical perspicacity required at this step. The multiple levels of WHO’s
classification systems (WHO, 1980, 1998) come into play in the complexity of this step. Recall that these levels (e.g.,
impairment, disorder, disability, and handicap in the 1980 version) consider the broader effects of health conditions and
the role that society plays in determining the implications of a given condition for the individual. These levels bring to
mind the challenge of describing a child’s communication in terms of effects on the child’s participation in social roles,
as well as in the specifics of lexicon, grammar, and so forth. Consequently, the clinician who wishes to describe a child’s
communication will need to choose selectively from a large number of possible levels and areas for which description is
possible. In so doing, the clinician can focus on a smaller number of clinical questions whose answers can have a
powerful impact on the child’s treatment and subsequent functioning.
The remaining steps in Vetter’s process entail tailoring the procedure to meet the demands of a specific clinical question
and client, implementing it, and then evaluating the results.
Page 254
Production

Elicited imitation (Lust, Flynn, & Foley, 1996). The child is asked to repeat an utterance (usually a single sentence) exactly as produced by an adult. It is assumed that only structures reflecting the child's grammatical competence will be produced. An easy technique, even for children as young as 1 or 2.
Strengths: You can choose stimuli very precisely and "know" what the child is attempting to say. Studies show good agreement with comprehension and other data. The technique is applicable, with small changes, for children from a wide range of cultures and languages and can be used at relatively low developmental levels.
Weaknesses: Stimulus design is complex due to the need to control variables that are not of direct interest (e.g., cognitive demand, attention, grammatical complexity, sentence length). The technique has been criticized for relying unduly on short-term memory.

Elicited production (Thornton, 1996). Situations are created to increase the likelihood that the child will attempt to produce a given structure, usually including the use of a "lead in" sentence produced by the adult to "provide the context and 'ingredients' for production of the structure without modeling it." Sometimes this technique makes use of a puppet who can be asked questions, directed to do things, or corrected. Typically used with normally developing children 3 years and older.
Strengths: Generation of the targeted structure rests more squarely with the child and is unlikely to be due to chance. A large number of such probes have been described in the research literature.
Weaknesses: The child's enjoyment level is key to the success of the strategy because she or he needs to be an active participant. The awkwardness associated with a "no response" from the child may be intensified relative to other methods and may make children less willing to continue. Working out the details required to elicit production may require considerable piloting with adults or normally developing children. Similarly, correct productions are far more straightforwardly interpreted than incorrect or untargeted productions.
Page 261
Comprehension

Intermodal preferential looking (Hirsh-Pasek & Golinkoff, 1996). The child is seated on a parent's lap, hears a stimulus, and then is presented simultaneously with two novel video images, one matching and the other not matching what has been said. Greater time spent watching the matching video is expected for comprehended structures. Used for children between 12 months and 4 years of age.
Strengths: Minimal action is required. Use of videos allows the presentation of dynamic relationships. Can be used at lower developmental levels than many other tasks.
Weaknesses: Considerable time and expertise are required to create the video stimuli. Only a few stimuli can be studied at any point in time.

Picture selection task (Gerken & Shady, 1996). The child hears the adult or a recorded voice presenting a verbal stimulus and then points to one of two to four pictures. Typically this task is useful with normally developing children 20 to 24 months and older.
Strengths: This technique has been widely used to assess understanding of specific phonological distinctions, lexical comprehension, or comprehension of specific morphosyntactic structures. It tends to produce results comparable to object selection where either task is feasible.
Weaknesses: Considerable time can be required to produce comparable target and foil items. Although the use of tape-recorded or synthetic speech can help increase children's attention, it increases the complexity of task construction. Failures to respond are difficult to interpret.

Acting-out tasks (Goodluck, 1996). The child is asked to use provided props to act out a sentence that is read or played back from tape. Typically used for children older than 3 years.
Strengths: The task has a long history of use and is easy and inexpensive to use. It can be fun for the child and can be particularly effective in assessing understanding of anaphora and pronominalization. It is a relatively open-ended task that may be less sensitive to response bias than many others, though it may be associated with a tendency to repeatedly use a prop once it is picked up.
Weaknesses: It cannot be used with constructions or predicates that are difficult to act out and can be associated with responses that are difficult to interpret. Because of the cognitive complexity of the task, it typically is used for normally developing children older than 3 years, thus limiting use with children with language difficulties.
Page 262
informative in their utterances as they participate in a role-playing game. The child is assigned the role of "warehouse
manager" and is approached by the clinician, in the role of "toy buyer," who asks where different toys might be found in the
warehouse. In a similar vein, Roeper, de Villiers, and de Villiers (1999) recently described their ongoing efforts to
design an extensive number of probes for assessing important interacting knowledge in pragmatics, semantics, and
syntax for 5-year-olds—for example, the need to know specific semantic and syntactic forms to achieve particular
pragmatic functions. Elaborately developed in terms of the materials, instructions, and scoring procedures, both the
probes developed by Lucas et al. and those developed by Roeper et al. illustrate that a measure’s formality is better
conceived of as a continuum than a dichotomy. Further, the thorough description of the probes offered by Lucas et al.
illustrates the extent to which sharing the results of well-developed probes can increase the efficiency of clinicians' efforts.
Professional journals and a growing number of books on language development and disorders describe numerous clinical
and research probes (e.g., Brinton & Fujiki, 1992; Lund & Duchan, 1993; Miller, 1981; Miller & Paul, 1995; Simon,
1984). Table 10.3 showcases a modest sample of these probes for children across a wide range of ages and
developmental levels. It is offered to help provide a feel for the heterogeneity and considerable potential of such
measures.
4. Rating Scales
Rating scales involve the assignment of numerals or labels to an individual's behavior in a particular context. They are
typically completed by the clinician or other observer after the observation of individual communication events. At
times, such scales can be used to help observers summarize their experience across multiple observation experiences.
Rating scales differ from on-line observations, another type of descriptive measure, in that on-line judgments are made
during rather than after the actual communicative event.
Rating scales have a lengthy history in psychology and speech-language pathology (e.g., see Schiavetti, 1992), but
primarily in research rather than clinical settings (e.g., Burroughs & Tomblin, 1990; Campbell & Dollaghan, 1992).
However, increasing attention to the documentation of children’s functional limitations (Goldstein & Gierut, 1998) may
cause rating scales to be used with greater frequency in the future.
Two types of rating scales that have been most influential in speech-language pathology are interval scaling and direct
magnitude estimation (Campbell & Dollaghan, 1992; Schiavetti, 1992). These rating scales are usually used to compare
a large number of stimulus examples—something that is not always done with rating scales. When interval scaling is
used, the rater assigns each characteristic or behavior being rated to a linearly partitioned continuum, which is marked
off using numerals or descriptive labels. Thus, for example, a rater might be asked to rate a behavior on a continuum
from uncommon to most common, using a 6- or 7-point scale that might look something like this:

uncommon  1  2  3  4  5  6  7  most common

Page 263

or this:

uncommon  |____|____|____|____|____|____|  most common
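Ratings gathered on such an interval scale are ordinal, and one cautious way to summarize them across observers is the median, which does not assume that the scale points are equally spaced. The data below are invented for illustration:

```python
from statistics import median

def summarize_ratings(ratings):
    """Summarize 7-point interval-scale ratings with the median, which is
    safer than the mean when equal spacing of scale points is uncertain."""
    if not all(1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must fall on the 1-7 scale")
    return median(ratings)

# Five hypothetical observers rating how common a target behavior is
# (1 = uncommon, 7 = most common).
print(summarize_ratings([4, 5, 3, 5, 6]))  # prints 5
```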
When direct magnitude estimation is used, the rater is asked to rate each characteristic or behavior either as a proportion of a standard stimulus provided as part of the rating system or as a proportion of other rated stimuli. Thus, for example, Campbell and Dollaghan (1992) described a method in which no standard stimulus is provided. In their study, listeners were instructed to assign any number of their choice to the first of 36 speech samples they were asked to rate. Later samples were then rated subjectively on the basis of (a) their proportional informativeness relative to the other judgments made in the sample and (b) the understanding that higher numbers were to be associated with greater informativeness than lower numbers.

Page 264

Table 10.3
A Sample of Probes Used in the Description of Children's Language

Comprehension of action words (Miller & Paul, 1995); 12 to 24 months. The child is asked to perform actions, on familiar objects and people, that the child's parent(s) believe he or she may understand. Unconventional actions may be requested to help distinguish action unconnected to the request from intentional responses.

Bellugi's negation test (Miller, 1981). The child is asked to provide the negative of an utterance produced by an adult. Variations can include different auxiliaries, negatives with indefinites, imperatives, and multipropositional sentences.

Production of question forms (Lund & Duchan, 1993). The Messenger Game. The child is asked to get information from a third party, ideally one who is out of view. For example, "Ask her how she got to this school."

Comprehension of nonliteral meaning (Lund & Duchan, 1993); early adolescence. Joke explanations. The child is asked to explain a joke that he or she finds humorous.

Comprehension of classroom direction vocabulary (Miller & Paul, 1995); 6 to 12 years. Classroom directions and vocabulary that are thought to be difficult for the child are incorporated in instructions that the child must follow using paper and pencil.

Production of sequential description (Simon, 1984); middle and high school students. Description of using a pay phone. The child is shown a picture of a pay phone and asked to give a step-by-step description of how it is used.
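Because each listener chooses his or her own number range, direct magnitude estimation ratings must be brought onto a common footing before they are combined across listeners. The sketch below uses a geometric-mean normalization; both the data and that particular normalization are illustrative assumptions, not Campbell and Dollaghan's exact procedure:

```python
import math

def geometric_mean(xs):
    """Geometric mean, the conventional average for ratio-scaled judgments."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def normalize_rater(ratings):
    """Divide one rater's free-number ratings by that rater's geometric mean,
    so raters who chose different number ranges become comparable."""
    gm = geometric_mean(ratings)
    return [r / gm for r in ratings]

def dme_scale_values(all_ratings):
    """all_ratings: one list of ratings per rater, one rating per sample.
    Returns a scale value per sample: the geometric mean, across raters,
    of the normalized ratings."""
    normalized = [normalize_rater(r) for r in all_ratings]
    n_samples = len(all_ratings[0])
    return [geometric_mean([rater[i] for rater in normalized])
            for i in range(n_samples)]

# Three hypothetical raters judging four samples, each rater starting
# from a different self-chosen number range.
ratings = [
    [10, 20, 5, 40],
    [100, 210, 45, 400],
    [1, 2.2, 0.5, 4],
]
print(dme_scale_values(ratings))
```

Despite the very different raw number ranges, the normalized scale values agree on the ordering of the four samples, which is the kind of stability across listeners that makes the method attractive.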
The Observational Rating Scales that are included as part of the third edition of the Clinical Evaluation of Language
Fundamentals (Semel, Wiig, & Secord, 1996) provide an example of how a rating scale can be used to enrich the
clinician’s understanding of the school-age child and his or her communication environment. They are mentioned here
because of the relative dearth of such scales for school-age children, although they are becoming more common—for
example, the Functional Status Measures (Educational Settings) of the Pediatric Treatment Outcomes Form (ASHA,
1995) and the Teacher Assessment of Student Communicative Competence (Smith, McCauley, & Guitar, in press). In
addition, the Observational Rating Scales are of particular interest because of their novel inclusion of parallel rating
forms so that comparable information can be obtained from the child, his or her parent(s), and teacher(s). They represent an
example of the interval scaling method, one in which individuals are asked to respond in a summative fashion to past
observations.
Each scale of the Observational Rating Scales consists of 40 items addressing "troubles" facing the child in listening (9
items), speaking (19 items), reading (6 items), and writing (6 items). To illustrate the nature of these items, let me
indicate that the first listening item is “I have trouble paying attention” for the student version (often completed with the
speech-language pathologist); “My child has trouble paying attention” for the parent version; and “The student has
trouble paying attention” for the teacher version. Each item is rated as occurring never, sometimes, often, or always, with
DK (Don’t know) used to mark items for which the rater feels unable to pass judgment. The Observational Rating Scales
also describe procedures for the observers to identify and provide examples of their top five concerns, thus paving the
way for functionally oriented intervention planning.
The chief appeals of rating scales are the apparent ease with which they can be created and administered, as well as their
wide applicability (Pedhazur & Schmelkin, 1991; Salvia & Ysseldyke, 1998). These virtues, however, may mask their
susceptibility to a number of problems, especially ones stemming from poorly defined points along an interval scale and
from differences introduced by different raters. In a brief review of such measurement issues facing rating scales,
Pedhazur and Schmelkin (1991) concluded that ratings may often “tell more about the raters than about the objects they
rate” (p. 121). They cited a rich literature in which the perceptual aspects of the rating task make raters vulnerable to a
number of types of bias. Two common types of bias include halo effects, in which raters allow impressions of general
characteristics or previous knowledge to have a consistent effect on ratings, and leniency effects, in which overly
positive judgments appear to occur because the rater is familiar with the person whose characteristics are being rated
(Primavera, Allison, & Alfonso, 1996).
An additional challenge to valid use of rating scales lies in the need to achieve a successful fit between the nature of the
characteristic being rated and the type of scal-
Page 265
ing method used to rate it (Campbell & Dollaghan, 1992; Schiavetti, 1992). In particular, researchers have noted a
difference in what kind of scale is appropriate depending on whether the rated characteristic falls along a metathetic
versus a prothetic continuum. On a metathetic continuum, raters’ responses to differences between rated entities seem to
reflect qualitative distinctions; whereas on a prothetic continuum, raters’ responses to differences between rated entities
appear to reflect quantitative distinctions (Stevens, 1975). The classic contrastive pair illustrating these two types of
continuum is pitch and loudness. Without looking ahead to the next paragraph, can you anticipate which of those two
characteristics of sound is prothetic (i.e., characterized by quantitative rather than qualitative differences)?
If you decided that loudness was prothetic, you are in agreement with a large body of research suggesting that people
tend to treat judgments such as loudness as if they were judgments about whether a stimulus had "more" or "less" of
something (Stevens, 1975). In contrast, pitch differences tend to be judged as if they represent qualitatively different
stimuli. Well, the challenge in devising appropriate rating scales is that whereas direct magnitude estimation can validly
be used to measure either type of characteristic, interval scaling appears to be valid only for measuring characteristics
that are metathetic.
Campbell and Dollaghan (1992) suggested that because of the lack of research determining which language
characteristics are metathetic versus prothetic, direct magnitude estimation is a less risky choice for researchers and
clinicians who wish to use rating scales in their descriptions of children’s language disorders. They noted that direct
magnitude estimation can be used to provide a comparison of children’s spontaneously produced language against that of
their peers. Among the most important uses they saw for such judgments was the examination of change occurring as a
result of, or in the absence of, treatment. In particular, Campbell and Dollaghan described a method in which 10 to 15
listeners could be used to provide ratings with a stable percentage of variability.
Specifically, Campbell and Dollaghan (1992) had 13 listeners compare the informativeness ("amount of verbal
information conveyed by a speaker during a specified period of spontaneous language production," p. 50) achieved by
three children who had sustained severe brain injury with that of three age-matched controls when both sets of children were
engaged in a video-narration task (Dollaghan, Campbell, & Tomlin, 1990). (Recall that the particulars of the direct
magnitude estimation method involved in this study were described earlier in the chapter when that rating method was
introduced.) The use of this technique provided social validation of the recovery patterns shown by the three children with brain injury
who participated in the study. The relatively large number of raters required for use of direct magnitude estimation may
preclude its use in many clinical situations. However, it may prove valuable as a means of validating more efficient
methods of social validation. In addition, it may prove valuable as a method that could provide exactly the information
required for certain clinical situations. For example, it might be used as described by Campbell and Dollaghan to support
a relatively costly or lengthy treatment approach for a given child or group of similar children.
Not surprisingly, then, it appears that the use of rating scales as a descriptive measurement tool, like others discussed in
this section, has a greater complexity than might
Page 266
at first be apparent. Thus, wise users will require as much evidence regarding validity as possible for specific methods
prior to deciding to implement them clinically. Their promise should also prompt users to participate in providing such
evidence.
5. Language Analysis
Language sampling and analysis have enjoyed a long history of use in studies of children’s language acquisition (e.g.,
Brown, 1973; Miller, 1981; Templin, 1957). The variety of procedures recommended for elicitation of language samples
and for the derivation of measures based on them has grown appreciably over the past 40 years and has changed as
understandings of the nature of language impairments have changed (Evans, 1996a; Gavin, Klee, & Membrino, 1993;
Miller, 1996; Stromswold, 1996).
In a survey of 253 American speech-language pathologists who work with preschool children, Kemp and Klee
(1997) found that 85% of them used language analysis in their practice, with most preferring nonstandardized forms to
formal procedures. Language analyses are sometimes avoided by clinicians who report that they do not have the time to
incorporate them into practice or that they lack the computer resources that would make their use more time efficient
(Kemp & Klee, 1997). However, these objections are rapidly being addressed by the refinement and proliferation of
computerized analysis programs (Long, 1999). Innovations such as transcription laboratories staffed by nonprofessional
transcribers, the creation of databases reporting findings for large numbers of children, and the availability of analysis
procedures at no cost also point to greater practicality of language analysis in the future (Evans & Miller, 1999; Long,
personal communication, January 7, 2000; Miller, 1996; Miller, Freiberg, Rolland, & Reeves, 1992).
Among the numerous discussions extolling the virtues of language sampling and analysis, Evans and Miller (1999)
offered one that is particularly powerful:
The language sample, by contrast [with available standardized tools], represents the child’s integration of specific
intervention goals within the larger communication context and provides clinicians with an opportunity to assess
children’s language skills dynamically across a range of situations that vary in communicative demand (e.g., free-play,
interview, narration, picture description). Language samples can be collected as often as necessary without performance
bias, and changes in children’s abilities can be documented across a wide range of linguistic levels. (Evans & Miller,
1999, pp. 101–102)
Additionally, such analyses can examine not only many aspects of language but also how complexity in one area may
affect another—a theme of growing interest in the evolution of language assessment tools.
Although language analyses are typically used to assess aspects of expressive communication, they are also frequently
used as a means of examining receptive skills. In particular, it seems that children’s responses to the directions and
comments of their conversational partners provide data that are valued by many clinicians (Beck, 1996). In the next
section, the evolution
Page 267
of language sampling and analysis is described to help readers understand the variety of available measures and how
these measures have changed over time.
The Evolution of Language Analyses
Evans (1996a) reviewed the changes in emphasis in language sampling techniques that have accompanied changes in
theoretical perspectives on language development and language disorders. In particular, she discussed the influence of
three dominant research paradigms spanning the past half-century: (a) the behaviorist learning paradigm, (b) the
formalist competence-based paradigm (encompassing “generative syntax, generative semantics, and a narrow
interpretation of syntax,” Evans, 1996a, p. 208), and (c) the functionalist paradigm. A brief summary of her comments is
relevant to anyone using language analysis because so many of the measures associated with earlier paradigms remain
available and in widespread use—sometimes in revised versions and sometimes in their original form (Kemp & Klee,
1997).
In the heyday of the behaviorist learning paradigm, the role of the environment in learning was emphasized, and the
word served as the unit of analysis. Language acquisition was understood to occur through the reinforcement of correct use of
words and sentences (word sequences). Although standardized language tests (e.g., the Peabody Picture Vocabulary
Test, Illinois Test of Psycholinguistic Abilities) dominated language assessment methods during this period, language
analysis techniques were used as well and emphasized counts or descriptions of different verbal behaviors (e.g., type–
token ratio, measures of sentence length).
The second paradigm discussed by Evans (1996a), the formalist competence-based paradigm, was designed to address
the generativity of children’s language, that is, the use of novel and therefore unmodeled and presumably unreinforced
utterances (e.g., overregularization of past tense, as in “he goed”). As Evans noted, this paradigm was made possible by
linguistic theory of the day (particularly the work of Chomsky), in which a major goal of linguists became the
identification of language-independent competencies, termed linguistic universals. Such universals were features of
linguistic structure thought likely to occur in all languages.
Evans (1996a) suggested that initial orientations within the formalist paradigm were largely syntactic in nature and
proceeded on the assumption that domains of language—syntax, semantics, and so forth—could be viewed
independently. An assumption was also made that variability in performance was more likely to be a function of a
child’s knowledge than a function of contextual factors. According to Evans’s account, later developments in this
paradigm, fueled by theory and data from a variety of sources, shifted the focus somewhat—first to semantics, then to
pragmatics. Evans pointed out that language analyses associated with the formalist period similarly shifted, although
sometimes subtly, from largely syntactic measures (e.g., Developmental Sentence Scoring, DSS; Language Sampling,
Analysis, and Training, LSAT; and Language Assessment Remediation and Screening Procedure, LARSP) to measures
focusing on semantics (e.g., mean length of utterance in morphemes, MLUm) and, later, on pragmatics (e.g., Roth &
Spekman, 1984).
Page 268
Evans (1996a) noted that, throughout this period, the child’s task in language acquisition was largely seen as that of
acquiring competence in the underlying rules of the ambient language. Predictably, then, childhood language disorders
within this paradigm were seen as difficulties in acquiring the rules of the individual subsystems of language. In Evans’s
view, language assessments have thus grown through accretion to require elaborate analyses across semantics, syntax,
and pragmatics—a process that has been made more feasible through modern technology. Among the analyses she
associates with this period are the Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1982, 1998)
and the Child Language Analysis programs (CLAN; MacWhinney, 1991).
Evans (1996a) suggested that functional theories, the last of the three paradigms, were prompted by difficulties in
accounting for children’s variability across contexts. If rule acquisition is what is taking place, then a form evidencing
that rule should be either present or absent in a child’s productions—not present in some situations but absent in others,
or with some conversational partners but not others. The functionalist paradigm is reflected in works such as Bates and
MacWhinney (1989). According to Evans, it is based on the following premise:
Variability in speaker performance is simply the final solution to the interaction among the internal state of a complex
system (i.e., the underlying speaker competence), the structure of the system (e.g., word order, lexical items,
morphonology, suprasegmentals), and the impact of external constraints such as real-time language processing demands.
(Evans, 1996a, p. 254)
Within the functionalist paradigm, then, variability becomes a major source of information about the current state of a
child’s dynamic system (linguistic and nonlinguistic) as it responds to external conditions (e.g., situational or attentional
factors). Increased variability is seen as an opportunity for positive change. In addition, this paradigm emphasizes the
necessity of examining the interplay of language domains, an area identified by numerous authors as among the most
exciting challenges facing clinicians this decade (Howard, Hartley, & Muller, 1995).
Evans (1996b) provided an example of such interactions when she found fewer morphosyntactic omissions in the speech
of children with SLI when their utterances occurred within a conversational turn rather than adjacent to a shift in
conversational turn. Numerous studies beyond those just cited (e.g., Crystal, 1987; Panagos & Prelock, 1982; Paul &
Shriberg, 1982) argued that rich and powerful understandings of children’s speech and language development emerge
from the kinds of detailed analyses called for by current theory.
Certainly one of the major advantages of language sampling, then, is the variety of questions to which the resulting
sample can be put. For example, Dollaghan and Campbell (1992) described a taxonomy of within-utterance disruptions
arising from language rather than fluency disorders to help characterize the subtle deficits lying across language domains
that plague young speakers with language disorders, both developmental and acquired.
Table 10.4 lists some of the standardized measures currently used to describe children’s language skills based on
language samples.
Page 269
Table 10.4
Tools Available for Detailed Analyses of Language Samples
(Evans, 1996a; Long, 1999; Owens, 1998)
In this table, a variety of information about each procedure and the children for whom it would be useful is provided. In
addition, those procedures that are
available on computer are indicated. Recently, one of these computerized programs, CP (Long, Fey, & Channell, 1998),
has been made available without charge at the following Internet website: http://www.cwru.edu/artsci/cosi/cp.htm (Long,
personal communication, January 7, 2000).
Readers are reminded that computerized measures should be viewed hopefully (Long, 1991, 1999; Long & Masterson,
1993), but with caution as well (Cochran &
Page 270
Masterson, 1995). After all, computers render it possible to conduct language analyses that would be prohibitively time-
consuming if performed by hand, but they also make it possible to make really silly or wrong-headed mistakes more
quickly than ever—for example, to use the wrong analysis for a particular child. The user of such measures must
exercise as much caution as ever in selecting the specific sample to be used as input and in “buying into” the specific
techniques used. Further, one should recognize that although language samples are “natural” in the sense that they are
often not consciously structured by the clinician, they are nonetheless subject to the same contextual effects that affect
norm-referenced test performance (Plante, personal communication, February 18, 2000). A growing literature on the
subject of language analyses can help clinicians determine what is available and likely to be useful for their clients
(Cochran & Masterson, 1995; Long, 1991, 1999; Long & Masterson, 1993).
Although a detailed account of even a single analysis tool is beyond the scope of this book, a summary of some recent
research may help the reader see the wealth of information obtainable through language analysis. Table 10.5 lists some
patterns of disordered language performance that can be described using the SALT (Miller, 1996). Miller and Klee
(1995) used these categories to characterize problems of 256 children from ages 2 years, 9 months to 13 years, 8 months.
The data were based on conversational and narrative samples, contexts that were selected because of the wealth of
research on the former and the important connection to literacy of the latter (Miller, 1996). Miller and Klee (1995) found
significant numbers of children at varying ages falling in one or more categories, with only 20 children not described by
any category.
For preschool children, one very specific measure that has remained in use in a relatively consistent form across the
paradigms described by Evans has been the MLU, measured in morphemes. Guidelines for the calculation of MLU as
described by Chapman (1981) are shown in Table 10.6. MLU is regularly used clinically (Kemp & Klee, 1997; Miller,
1996) and has been incorporated in several of the procedures described in Table 10.4, including SALT. Its use is based
on the premise that, at least in younger children, increasing syntactic complexity will also require increasing utterance
length—especially when length is measured in morphemes and therefore would be sensitive to increases in either words
or grammatical or derivational morphemes.
Numerous studies lend credence to the value of MLU in describing language change through the preschool years
(Conant, 1987; Rondal, Ghiotto, Bredart, & Bachelet, 1988; Scarborough, Wyckoff, & Davidson, 1986). In 1993, Blake,
Quartaro, and Onorati found evidence that MLU correlated highly with a measure of grammatical complexity obtained
using the LARSP until an MLU of 4.5 was reached. Findings such as these have provided considerable support for
MLU’s widespread use in research as a means of grouping children according to language skill (Miller, 1996), but the
appropriateness of MLU depends on the precise focus of the study.1 Recent research (e.g., Aram, Morris, & Hall, 1993)
has also suggested the diagnostic utility of MLU in clinical settings, particularly where production difficulties are
prominent features of the child’s profile.
1 Leonard (1996) described several alternative measures for equating research groups that will be more appropriate in
certain circumstances, including mean number of arguments expressed per utterance, mean number of open-class words
per utterance, measures of unstressed syllable production or word-final consonant production, and expressive vocabulary.
Page 271
Table 10.5
A Clinical Typology of Disordered Language Performance
Based on Use of the SALT
Note. SALT = Systematic Analysis of Language Transcripts; MLU = mean length of utterance; NP-VP = n. From
“Progress in Assessing, Describing, and Defining Child Language Disorder,” by J. Miller, 1996, in K. N. Cole, P. S.
Dale, and D. J. Thal (Eds.), Assessment of Communication and Language (p. 319), Baltimore: Brookes Publishing.
Copyright 1996 by Brookes Publishing. Reprinted with permission.
Technical Considerations: Sample Size and Variations in Language Sampling Conditions
Recently, Muma et al. (1998) reported on a study conducted several years earlier in which language samples were
obtained from a group of seven normally developing children between the ages of 2 years, 2 months and 5 years, 2
months. They noted that 200–300 utterances were needed to obtain acceptable error rates on many grammatical
structures related to the child’s use of different grammatical systems (nominal, auxiliary, verbal) and grammatical
operations (use of relative clauses, do insertion, participle shifts, etc.). Specifically, they found a 15% error rate for the
200- to 300-utterance samples versus error rates of 55% and 40%, respectively, for the 50-utterance and 100-utterance samples.
Not surprisingly, then, these data suggest that the more specific the nature of the information that will be looked for in
the language analysis (i.e., whether detailed information about specific structures is sought), the longer the sample will
need to be (Plante, personal communication, February 20, 2000).
In a similar study, Gavin and Giles (1996) conducted a SALT analysis on language samples of varying sizes based on
either increments of time (12 or 20 minutes) or number of utterances (25–175, in 25-utterance increments). Study
participants were 20 children from 31 to 46 months of age. The researchers examined the test–retest relia-
Page 272
Table 10.6
A Summary of the Method for Calculating Mean Length of Utterance (MLU) in
Morphemes, as Described by Chapman (1981) as an Adaptation From Brown (1973)
The child’s speech is segmented using the criterion of terminal intonation (rising or falling). These procedures differ
from those of Brown (1973) in that a sample of the first consecutive 50 utterances (including the first page of
transcription) rather than 100 utterances (excluding the first page) is recommended. Excluded from the sample of
utterances are unintelligible or partially unintelligible utterances. Included are “doubtful” transcriptions and exact
utterance repetitions.
Counting morphemes in each utterance
Morphemes are defined as minimal meaningful units of a language, with dog and -s given as examples. Counting rules
based on those of Brown (1973) are given to address the greater uncertainty of what constitutes a morpheme in the
speech of a child. The total count for each utterance is calculated, summed, and divided by the total number of
utterances spoken to yield the MLU. The counting rules are given verbatim:
“(1) Stuttering is marked as repeated efforts at a single word; the word is counted once in the most complete form
produced. In the few cases where a word is produced for emphasis, or the like (no, no, no), each occurrence is counted
separately. (2) Such fillers as mm or oh are not counted, but no, yeah, and hi are. (3) All compound words (two or more
free morphemes), proper nouns, and ritualized reduplications count as single words. Some examples are birthday,
rackety-boom, choo-choo, quack-quack, night-night, pocketbook, seesaw. The justification for this decision is that there
is no evidence that the constituent morphemes function as such for these children. (4) All irregular pasts of the verb
(got, did, went, saw) count as one morpheme. Again, there is no evidence that the child relates these to present form.
(5) All diminutives (doggie, mommie) count as one morpheme because these children do not seem to use the suffix
productively. Diminutives are the standard forms used by the child. (6) All auxiliaries (is, have, will, can, must, would)
count as separate morphemes, as do all catenatives (gonna, wanna, hafta, gotta). The catenatives are counted as single
morphemes, rather than as going to or want to, because evidence is that they function as such for children. All
inflections, for example, possessive (s), plural (s), third person singular (s), regular past (ed), and progressive (ing),
count as separate morphemes.” (Chapman, 1981, p. 24)
Chapman (1981) identified several special characteristics of a sample that may affect the representativeness of the
MLU: a high rate of imitation (i.e., >20% of the child’s utterances), frequent self-repetitions within a speech turn, a
high proportion of answers occurring in response to adult questions (i.e., >30–40% of the child’s utterances), frequent
use of routines (such as “counting, saying the alphabet, nursery rhymes, song fragments, commercial jingles, or long
utterances made up by listing objects in a book or the room”), and a high proportion of utterances in which clauses are
conjoined by and. Among the strategies she suggested for addressing these problems are calculations conducted with
and without imitations, self-repetitions, frequent routines, and responses to questions. In addition, she suggested
obtaining additional samples with another adult who asks fewer questions when high rates of question responses are
noted and the use of another measure (the T unit) when a high proportion of utterances consist of clauses conjoined by and.
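For readers who prefer to see the arithmetic, the counting and averaging steps in Table 10.6 can be sketched in a few lines of code. The sketch below is illustrative rather than a full implementation of Chapman's procedure: it assumes a transcript in which bound morphemes have already been marked with a slash (e.g., dog/s, go/ing, a convention similar to that used by transcription systems such as SALT), so that rules 3 through 6 have been applied during transcription, and it implements only the filler exclusion of rule 2 and the final averaging.

```python
FILLERS = {"mm", "oh", "um", "uh"}  # rule 2 examples; "um"/"uh" added here as common cases

def morphemes_in(utterance):
    """Count morphemes in one utterance.

    Assumes bound morphemes are pre-marked with "/" (e.g., "dog/s",
    "go/ing"); under that convention, rules 3-6 (compounds, irregular
    pasts, diminutives, catenatives = one morpheme) are applied during
    transcription, so each token counts as its root plus its markers.
    """
    count = 0
    for token in utterance.lower().split():
        if token in FILLERS:
            continue  # fillers are not counted (rule 2)
        count += 1 + token.count("/")
    return count

def mlu(utterances):
    """Sum the per-utterance counts and divide by the number of utterances."""
    counts = [morphemes_in(u) for u in utterances]
    return sum(counts) / len(counts)

sample = [
    "doggie go/ing home",  # 4 morphemes (diminutive counts once, rule 5)
    "he went",             # 2 morphemes (irregular past counts once, rule 4)
    "oh two dog/s",        # 3 morphemes ("oh" excluded, rule 2)
]
print(round(mlu(sample), 2))  # 3.0
```

The three-utterance toy sample yields an MLU of 3.0; a real Chapman-style calculation would, of course, be based on 50 consecutive utterances.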
Page 273
bility of four measures (MLU, number of different words, total number of words, and mean syntactic length) in samples
at these different lengths. They found that only at the largest number of utterances (about 175) did reliability coefficients
meet or exceed .90, the value considered acceptable for diagnostic use.
The implication of these findings extends beyond a simple admonition for clinicians to attempt to obtain larger sample
sizes on which to base language analyses or for them to be very aware of the potential for error dogging analyses based
on smaller samples—although those are clear and potent implications. Even more importantly, however, they illustrate
the connection between reliability and sample size that haunts many if not most descriptive measures. Obviously, rarer
structures or phenomena are more likely to be vulnerable, but additional research will prove helpful in guiding us toward
best practices in our choice of tools and sample sizes.
The conditions under which language samples are collected are known to affect numerous measures obtained in
language analyses (Agerton & Moran, 1995; Landa & Olswang, 1988; Miller, 1981; Moellman-Landa & Olswang, 1984;
Terrell, Terrell, & Golin, 1977). Even a partial listing of some of the variables affecting a child’s productions can leave
one quite daunted—for example, race and familiarity of communication partner, stimulus materials, number of
communication partners, number and types of questions asked, type of communication required (e.g., narrative,
description of a procedure), to name a few! It is possible to leave these variables uncontrolled—as is often done when an
unstructured conversation between clinician and child is used as the sample. In such cases, the clinician will want to
consider these variables in his or her analysis and interpretation process.
As an alternative to unstructured language samples, structured sampling tasks have been recommended as providing
more relevant (i.e., valid) information for some clinical questions. Following is a list of five sets of tasks designed to
elicit structured language samples for school-age children (Cirrin & Penner, 1992):
1. describing an object or picture that is in view;
2. recalling a two-paragraph story told by the clinician without pictures;
3. describing a person, place, or thing that is not present in the immediate surroundings;
4. providing a description of how to do something familiar (e.g., making a sandwich); and
5. telling what the child would do in a given situation (e.g., waking up or seeing a house on fire).
This list illustrates tasks that manipulate some of the variables that may present a child with particular difficulty, thus
allowing the clinician to target language sampling for those areas of special importance for the individual child.
However, it is important to remember that each of these conditions is likely to affect more about the child’s productions
than simply the variable that appears to be manipulated. For example, on the basis of the precise way in which the task is
set up by the clinician, variables beyond the desired topic or level of language complexity will probably be affected.
Page 274
In another effort to help clinicians standardize the conditions under which they collect conversational language samples,
Campbell and Dollaghan (1992) offered a sequence of topic questions that they suggested be used in order, but only as
spurs to conversation. Thus, only topics that the child would show genuine interest in would be continued. Further,
additional topics introduced by the child would be pursued as long as they continued to interest the child. The intended
result was increased consistency across examiners. In brief, the sequence begins with questions about the child’s age,
birth date, and siblings; then proceeds to questions about family pets, favorite home activities, and school affairs; and
closes with questions about vacations, favorite books, and TV shows. Although this list is relatively conventional, the
decision of a group of colleagues to adopt it—or some other consistent set of starter questions—might help lend greater
consistency to the language samples obtained across children. This, in turn, would increase the integrity of local
measures that might be made using the data from a number of clients. However, it should be noted that standardization in
this way is not necessarily going to add to the representativeness of the sample for the individual child—that may best be
achieved by entering one of a child’s favorite activities and simply observing what happens there.
6. On-Line Observations
This category of descriptive measures is characterized by Damico et al. (1992) as real-time observation and coding of
behaviors exhibited during communicative interactions as they happen. Thus, these measures differ from rating scales
that are completed outside of that time frame. Damico et al. noted that, although such measures are not at all rare in
research on communication, they are only rarely applied by speech-language clinicians in clinical practice.
McReynolds and Kearns (1983) described five kinds of observational information or codes that are frequently used in
applied research settings to obtain on-line measures: (a) trial scoring, (b) event recording, (c) interval recording, (d) time
sampling, and (e) response duration. As each of these is described, the reader will see that these same categories can be
used to describe the outcomes of probes. The chief difference between probes and on-line observations is that the latter
involves responses to a more naturalistic communication event, whereas the former involves a greater level of
contrivance on the part of the clinician.
In trial scoring, responses following a specific stimulus or trial are scored as correct or incorrect. Such responses can
occur either naturally or with prompting. Although correct versus incorrect are the most commonly used labels applied to
responses in trial scoring, a numerical code (which may in fact represent a type of rating scale) may be used to provide
greater detail about the nature of responses. One example of a numerical code is the multidimensional scoring system
used in the Porch Index of Communicative Ability in Children (Porch, 1979), which uses a 16-point scoring system to
reflect 5 dimensions (accuracy, responsiveness, completeness, promptness, and efficiency). Readers should note that
only rarely are such combinations of rating scales and trial scoring used in on-line situations because of the intense
demands on the rater, which leaves such measures quite vulnerable to problems with reliability.
Page 275
In event recording, a code is established consisting of behaviors (including verbal, nonverbal, or both) of interest. That
code is then used to summarize the targeted child’s behaviors over a given time period (e.g., a 15-minute period). One
example of a code that might be used in event recording would be the one developed by Dollaghan and Campbell (1992).
That code had been developed to describe within-utterance speech disruptions (i.e., pauses, repetitions, revisions, and
orphans—linguistic units such as sounds or words that are not reliably related to other such units within an utterance).
Whereas Dollaghan and Campbell used that code in an analysis of previously recorded language samples, it could also
be used for on-line observation.
Interval recording and time sampling are closely related both to each other and to event recording (McReynolds &
Kearns, 1983). In interval recording, a set time period is divided into short, equal intervals (e.g., 10 seconds) and events
are noted as having occurred once if they occur at any point during the interval. In time
sampling, a set time period is again divided into intervals, but only the presence of the behavior at the very end of the
interval is recorded. In addition to the designation of intervals devoted to observation, this approach also includes
recording intervals in which no observations are attempted. In time sampling, therefore, a 7.5-second observation
interval might be followed by a 2.5-second recording interval. Time sampling has been thought to be associated with
fewer problems affecting accuracy than interval recording. However, both methods require that care be taken in the
selection of interval sizes (McReynolds & Kearns, 1983). Intervals that are too short are likely to increase recording
errors; those that are too long are likely to lose information due to waning observer attention.
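The contrast between the two methods can be sketched concretely. In the hypothetical example below, episodes of a target behavior are represented as (start, end) times in seconds; interval recording credits any interval the behavior touches, whereas time sampling checks only the instant that ends each 7.5-second observation window:

```python
def occurring_at(t, episodes):
    """True if the behavior is under way at time t (in seconds)."""
    return any(start <= t < end for start, end in episodes)

def interval_recording(episodes, total, interval=10.0):
    """Score 1 for each interval the behavior touches at any point."""
    scores, t = [], 0.0
    while t < total:
        hit = any(start < t + interval and end > t for start, end in episodes)
        scores.append(1 if hit else 0)
        t += interval
    return scores

def time_sampling(episodes, total, observe=7.5, record=2.5):
    """Score 1 only when the behavior is present at the end of each
    observation window; nothing is observed during the recording window."""
    scores, t = [], 0.0
    while t + observe <= total:
        scores.append(1 if occurring_at(t + observe, episodes) else 0)
        t += observe + record
    return scores

# Hypothetical 30-second observation: the child talks during
# seconds 2-4 and seconds 12-26.
episodes = [(2.0, 4.0), (12.0, 26.0)]
print(interval_recording(episodes, 30))  # [1, 1, 1]
print(time_sampling(episodes, 30))       # [0, 1, 0]
```

Note that time sampling misses the brief first episode entirely—exactly the kind of information loss that makes the selection of interval sizes important.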
The last of the observational codes described by McReynolds and Kearns (1983) is the recording of response duration, in
which the duration of a specific event of interest (e.g., pause duration) is recorded using a stopwatch or other timing
device. Although response duration may not be applicable to many language phenomena, it can nonetheless prove quite
useful from time to time for children with language disorders. For example, a functional measure for a child with SLI
who demonstrates pragmatic difficulties might consist of time spent engaged in conversation with one or more peers
during recess. Alternatively, time spent in perseverative or noncommunicative speech (e.g., repeated recitation of a
television commercial) during a group activity might be used as a functional measure for a child with autism.
Damico (1992) provided an example of an on-line observational system, called Systematic Observation of
Communicative Interaction (SOCI), which makes use of event recording and time sampling. In SOCI, problematic
verbal and nonverbal behaviors are recorded along with information about several dimensions (such as illocutionary
purpose) each time one occurs within a fixed time period (a 10-second cycle that consists of a 7-second observation
interval and a 3-second recording interval). Recorded behaviors include failure to provide significant information, nonspecific
vocabulary, message inaccuracy, poor topic maintenance, inappropriate response, linguistic nonfluency, and
inappropriate intonation contour. Four to seven recording periods of approximately 12 minutes each are recommended.
Although some data regarding reliability of this procedure are mentioned in Damico (1992), clearly this type of
procedure warrants additional evidence to provide better guidance regarding its interpretation and validity.
Page 276
7. Dynamic Assessment
Dynamic assessment procedures represent a large number of procedures that are designed to examine a child’s changing
response to levels of support provided by the clinician. Proponents of dynamic assessment might balk at its inclusion in
the list of measures reviewed in this chapter, maintaining that it represents an approach to assessment that is entirely
different from the rest. In fact, for proponents of dynamic assessment, most other forms of descriptive assessment can be
lumped into the single, usually less desirable category “static.” Within this conceptualization, static assessments assume
a constant set of stimuli and interactions between the child and tester, whereas dynamic assessments assume a changing
set of stimuli and interactions that are manipulated to provide a richer description of how the child’s performance can be
modified. Referred to as dynamic assessment here, a wide variety of related assessment strategies fall within this
category.
To those unfamiliar with the term dynamic assessment, Olswang and Bain (1991), two of its foremost advocates in
language assessment, helpfully noted its strong resemblance to a more familiar and venerable concept. Specifically, they
compared it with stimulability, in which unaided productions (usually in articulation testing) are followed by efforts to
obtain the child’s “best” productions when aided by the clinician’s visual, auditory, and attentional prompts. In both
stimulability and in dynamic assessment procedures, facilitating actions on the part of the clinician are designed to help
determine the upper limits of a child’s performance. As a result, the boundaries of assessment and treatment are blurred.
This blurring has led to the use of the term mediated learning experience (Feuerstein, Rand, & Hoffman, 1979; Lidz &
Peña, 1996) to refer to one model of dynamic assessment. It also foreshadows the integration of such assessment
techniques into treatment (e.g., Norris & Hoffman, 1993).
Initially applied in cognitive and educational psychology by Feuerstein and others (e.g., Feuerstein, Rand, & Hoffman,
1979; Feuerstein, Miller, Rand, & Jensen, 1981; Lidz, 1987), dynamic assessment models are typically based on the
work of Vygotsky (1978), who proposed the zone of proximal development (ZPD) as a conceptualization of the moving
boundary of a child’s learning. The zone of proximal development is defined as “the distance between the actual
developmental level as determined by independent problem solving and the level of potential development as determined
through problem solving under adult guidance or in collaboration with more capable peers” (Vygotsky, 1978, p. 86).
Problem solving or behaviors lying within this zone are thought to represent those areas where maturation is occurring
and to characterize development “prospectively” rather than “retrospectively” as is done with typical, static assessment
(Vygotsky, 1978).
The ZPD has been interpreted as being indicative of learning readiness. Therefore, its description through dynamic
assessment has been considered especially useful for identifying treatment goals (Bain & Olswang, 1995; Olswang &
Bain, 1991, 1996). Specifically, Olswang and Bain (1991, 1996) suggested that tasks that children perform with little
assistance do not warrant treatment, and those that children fail to perform, even when provided with maximal
assistance, are not yet appropriate targets. Instead, the most appropriate targets are likely to be those that children
perform only
Page 277
when given considerable assistance. Modifiability of performance in response to adult facilitation has also been shown to
predict generalization of performance to new situations, such that children who demonstrate less modifiability show less
transfer (Campione & Brown, 1987; Olswang, Bain, & Johnson, 1992).
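Olswang and Bain’s targeting logic can be expressed as a simple decision rule. The sketch below is illustrative only: the function name, the proportion-correct inputs, and the mastery and floor cutoffs are hypothetical values chosen for the example, not figures proposed by those authors.

```python
# Illustrative sketch of the Olswang and Bain (1991, 1996) target-selection
# logic. The labels and the 0.80/0.20 cutoffs are hypothetical.

def classify_potential_target(unaided_correct, maximally_aided_correct,
                              mastery=0.80, floor=0.20):
    """Classify a candidate skill by performance with and without support.

    unaided_correct: proportion correct with no clinician assistance.
    maximally_aided_correct: proportion correct with maximal cueing.
    """
    if unaided_correct >= mastery:
        # Performed independently: treatment not warranted.
        return "no treatment needed"
    if maximally_aided_correct <= floor:
        # Fails even with maximal help: not yet an appropriate target.
        return "not yet an appropriate target"
    # Emerges only with assistance: lies within the zone of proximal
    # development, so it is the most appropriate kind of target.
    return "appropriate treatment target"

# A skill produced correctly 10% of the time alone but 70% with cueing
# falls within the child's zone of proximal development:
print(classify_potential_target(0.10, 0.70))  # appropriate treatment target
```

The cutoffs a clinician would actually use depend on the task and the data available; the point of the sketch is only the three-way partition of candidate goals.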
Another benefit of dynamic assessment observed by Olswang and Bain (1991) is that dynamic assessment strategies
allow the clinician to determine not only what the child is learning, but also how that learning can be supported through
the manipulation of antecedent and consequent events. They note that whereas consequent events such as the nature of
reinforcement (e.g., tangible vs. social) and schedule of reinforcement (e.g., continuous vs. variable) have received
attention for many years in speech-language pathology, antecedent events receive greater attention in dynamic
assessment. Among the antecedent events highlighted in dynamic assessment are the use of models or prompts, the
selection of the modalities of stimuli or cues that are used, and the number of stimulus presentations that are provided.
Table 10.7 provides a hierarchy of verbal cues used to provide differing levels of support for children with specific
expressive language impairment learning two-word utterances (Bain & Olswang, 1995). In the study, which was
designed to validate the
Table 10.7. A Sample Hierarchy of Verbal Cues.
David is 8 years old and was diagnosed at the age of 7 with a fatal form of a genetic neurodegenerative disease,
adrenoleukodystrophy. He had developed normally until about age 6½, when he began showing signs of clumsiness and
behavior problems that had initially been attributed to the stresses of a cross-country move and beginning first grade.
Currently, he follows simple verbal directions with some consistency but rarely speaks. His family is interested in both
his current level of comprehension and in information about the rate at which his communication skills are declining so
that they can facilitate the child’s participation in the family and plan more for his ongoing care.
Tamika, a 5-year-old girl with specific expressive language impairment, has been seen for treatment since age 3.
Initially her treatment was aimed at increasing the frequency and intelligibility of single word productions; more recent
goals have focused on her use of grammatical morphemes and monitoring comprehension of directions. In her efforts to
adjust Tamika’s treatment and monitor her overall progress, Tamika’s speech-language pathologist uses periodic
standardized testing along with frequent
Page 294
informal probes, including probes of treated, generalization, and control items. The speech-language pathologist is
concerned about her ability to assess the true impact of treatment on Tamika’s social communication with peers and
family members because Tamika’s family speaks Black English, whereas the clinician does not. She would like to find an
appropriate assessment strategy to help document Tamika’s ongoing communication skills.
The five certified speech-language pathologists working within a small Vermont school district are eager to demonstrate
the efficacy of their work with school-age children because of concerns about cutbacks in neighboring special education
budgets. They decide to participate in ASHA’s National Outcomes Measurement System and begin collecting data for
each of their students. In addition, because of their commitment to improving the quality of their practice, they also
decide to use a computerized language sampling system with all of their preschool and first grade children with
language problems.
The Nature of Examining Change
The examination of change in children’s language disorders actually encompasses a fairly large number of related
questions—Is this child’s overall language changing? What aspects in particular are changing? Is observed change likely
to be due to treatment rather than to maturation or other factors? Should a specific treatment be continued, or has
maximum progress been made? Should termination of treatment occur? How effective is this particular clinical practice
group in achieving change with the children it serves? These assessment questions present some of the most challenging
issues facing speech-language pathology professionals (e.g., Diedrich & Bangert, 1980; Elbert, Shelton, & Arndt, 1967;
Mowrer, 1972; Olswang, 1990; Olswang & Bain, 1994).
Described with regard to a single child, methods used to examine change will fuel decisions regarding how the child
moves through a given treatment plan, whether alternative treatment strategies should be explored, and, finally, whether
treatment should be terminated. Providing a more formal categorization, Campbell and Bain (1991) drew on the
framework of Rosen and Proctor (1978, 1981) to describe three dimensions or kinds of change: ultimate, intermediate,
and instrumental.
Ultimate outcomes constitute grounds for ending treatment, and they should be established at the initiation of treatment.
They are similar to long-term treatment objectives, with levels of final expected performance defined in terms of “age
appropriate, functional, or maximal communicative effectiveness” (Campbell & Bain, 1991, p. 272). Modification of an
ultimate outcome might occur. For example, a functional outcome level might initially be set for a child because of
expectations that performance at a level with same-age peers was unrealistic. However, if treatment data suggested
otherwise, a revision in outcome level would be appropriate (Campbell & Bain, 1991).
Intermediate outcomes were seen by Campbell and Bain (1991) as more specific and numerous for a given client. They
relate to individual behaviors that must be acquired in order for the ultimate outcome to be achieved and for progression
through
Page 295
a given hierarchically arranged treatment to occur. Data from treatment tasks within a session are given as an example of
such data.
Instrumental outcomes illustrate the likelihood that additional change will occur without additional treatment (Campbell
& Bain, 1991). Data documenting generalization fit into this third category. Campbell and Bain acknowledged that this
type of outcome is challenging to identify because of the difficulty in knowing at what point evidence of generalization
reliably predicts improvement towards ultimate outcomes.
The feature that most complicates the assessment of change in children is that children’s behavior is characterized by
change stemming from a variety of sources, most of which are related to growth and development. With few exceptions,
children—even those with quite significant difficulties—are benefiting from developmental advances that enhance their
communication skills. Sometimes change occurs broadly and sometimes in some areas more than others. Even children
who have sustained severe brain damage during early childhood will experience developmental benefits as well as the
physiological benefits of biological recovery. Only a few exceptions to this upward trend exist—for example, in children
with very severe neurologic damage or with neurodegenerative disease and in children who tend to regress in
performance when therapy is withdrawn (e.g., some children with developmental dyspraxia of speech or mental
retardation). In all cases, however, the speech-language pathologist’s assessment of whether change is occurring and
why it is occurring must be gauged on a terrain that is rarely flat and is sometimes a series of foothills.
Clinical questions involving change make use of many of the same types of measures discussed in chapters 9 and 10 and
often examine similar issues across the added dimension of time. Nonetheless, despite their importance for work with
children with language disorders, at least until recently such questions have generally received less attention than
questions related to screening, identification, or description at a given point in time. Thankfully, a variety of external
factors affecting clinical practice described in preceding chapters, such as the demand for greater accountability in
schools and hospitals, are helping to encourage and even mandate greater research attention to the assessment of change
(Frattali, 1998b; Olswang, 1990, 1993, 1998).
Once, broad questions regarding the value of treatment approaches lay principally within the purview of researchers,
who conducted treatment efficacy research in highly controlled conditions. Over the past decade, however, concerns
about accountability have caused individual professionals in speech-language pathology to become more active in
collecting and using such data as well (Eger, 1988; Eger, Chabon, Mient, & Cushman, 1986). The primary emphasis on
evidence obtained in tightly controlled conditions has been shifted to include emphases on evidence obtained under the
very conditions in which treatment is typically conducted—data that are typically referred to as outcomes.
In this chapter, the specific considerations affecting the assessment of change in clinical practice are addressed, followed
by the special considerations relating to tools that are available to address this issue. Finally, practical considerations
related to outcome assessment are discussed for the ways in which they shape professional practices in this area of
assessment.
Page 296
Special Considerations for Asking This Clinical Question
At least four special concerns complicate the process of answering clinical questions regarding change: (a) identifying
reliable, or real, change; (b) determining that the change that is observed is important; (c) determining responsibility for
change; and (d) predicting the likelihood of future change (Bain & Dollaghan, 1991; Campbell & Bain, 1991; McCauley
& Swisher, 1984; Schwartz & Olswang, 1996). These concerns affect both global inferences regarding a child’s overall
progress (ultimate outcomes) as well as the more specific decisions involved in specific treatment goals (intermediate
and instrumental outcomes) (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1996).
Identification of Reliable and Valid Change
Because examination of change depends on a comparison of measurements made on at least two occasions, reliability in
the measurement of change is no more certain than the reliability of a single measurement. In fact, there is every
indication that it is less so (McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995). In order to get an idea of the effect
of measurement error on the examination of change, consider the case of a child whose score on a specific measure taken
4 months apart changes from 15 to 30, where 80 is the highest possible score. Initially, this change would appear to be
cause for some degree of celebration—more restrained if you looked just at the number of points gained out of the
number possible; less restrained if you looked at the fact that the child had doubled his score. However, once you remind
yourself that measures vary in their reliability (sometimes quite wildly), you realize that more information is needed
before party invitations can be sent out. Depending on the reliability of the measure, each observed score could fall far
from the test taker’s true score, with unfortunate consequences for the believability of observations about the
difference between the two testings. The difference between these two scores could be described as a difference score or,
more frequently in this kind of situation, a gain score.
In fact, gain scores are often less reliable than the measures on which they are based (Mehrens & Lehman, 1980; Salvia
& Ysseldyke, 1995). Although concerns about gain scores are typically expressed in relation to standardized norm-
referenced measures, they apply equally to other quantitative measures. The nature of the measure used in the preceding
example was intentionally ambiguous in order to emphasize that point.
The advantage of some standardized norm-referenced tests is the availability of information allowing one to estimate the
risk of error associated with individual gain scores. Using the standard error of measurement and methods like those used
to examine difference scores when they occur in profiles, it is possible to examine the likelihood that a difference score
is reliable (Anastasi, 1982; Salvia & Ysseldyke, 1995). Indeed, some tests include graphic devices on their scoring sheets
that will help users determine whether a difference is likely to be reliable. However, there is still reason to believe that
numerous norm-referenced tests continue to fail to provide this information for users (Sturner et al., 1994).
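The standard-error logic described above can be made concrete with a few lines of arithmetic. In the sketch below, the SEM values are hypothetical (real values come from a test’s manual), and the function name is my own; the formula itself, in which the standard error of a difference between two equally reliable scores is the SEM times the square root of 2, is the standard psychometric one.

```python
# A minimal sketch of checking whether a gain score is likely to be
# reliable, using the standard error of measurement (SEM). SEM values
# here are hypothetical; real ones come from a test's manual.
import math

def reliable_gain(score1, score2, sem, z=1.96):
    """Return True if the gain exceeds chance at roughly 95% confidence.

    The standard error of the difference between two scores with the
    same SEM is sem * sqrt(2), because measurement errors on the two
    testings are assumed to be independent.
    """
    se_diff = sem * math.sqrt(2)
    return abs(score2 - score1) > z * se_diff

# The chapter's example: a score moving from 15 to 30 between testings.
# With a hypothetical SEM of 4, the 15-point gain exceeds
# 1.96 * 4 * sqrt(2), about 11.1, so it is probably real:
print(reliable_gain(15, 30, sem=4))   # True
# With a hypothetical SEM of 8 it does not (criterion is about 22.2):
print(reliable_gain(15, 30, sem=8))   # False
```

As the two calls show, the same 15-point gain can be credible or not depending entirely on the measure’s reliability, which is the chapter’s caution about sending out party invitations too soon.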
Page 297
The problem facing norm-referenced instruments, however, is shared and even intensified by informal
measures: informal quantitative measures almost never provide that information. Thus, additional strategies are
needed for providing evidence of reliability—that is, evidence that a measure is likely to be consistent over short periods
of time, when used by different clinicians, and so forth—and is thus able to reflect real change, rather than error, when it
occurs. As you will see later in this chapter, single subject designs constitute the most powerful of these strategies.
As a sophisticated observer of psychometric properties, you may be waiting for the other shoe to drop—the validity
shoe. Although it might be possible for developers of well-established standardized measures to study the ability of
their measure to capture significant change as a form of criterion-related validity evidence, they almost never do so.
Instead, for most measures in speech-language pathology and other applied behavioral sciences as well, the examination
of validity has been couched in terms of discussions of “importance”: Is observed change that appears to be reliable also
important?
Determining That Observed Change Is Important
Issues about the importance of change can be complex. They include questions such as, Is the change large enough to be
significant? and Is the nature of the change such that it is likely to affect the child’s communicative and social life?
These are some of the questions that Bain and Dollaghan (1991) explored under the notion of clinically significant
change.
A number of complementary indicators of “importance” have been put forward. The most important of these are (a)
effect size—Did much happen? (Bain & Dollaghan, 1991); (b) social validation—Did it make a difference in this
person’s communicative life? (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Kazdin, 1977, 1999; Schwartz &
Olswang, 1996); and (c) the use of multiple measures (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz &
Olswang, 1996).
Effect Size
In the statistical and research design literature, a distinction is made between statistical significance and substantive
importance, or meaningfulness. That distinction, although often overlooked by researchers who focus on statistical
significance as if it were the holy grail (Young, 1993), is a valuable one for our thinking about the clinical importance of
change we observe in children. Effect size, which refers to the magnitude of difference observed, is frequently discussed
in relation to substantive importance, or clinical significance, and is discussed at some length later in this section.
Statistical significance is a relatively straightforward concept. Specifically, when a research finding is statistically
significant, a statistical test has suggested that the finding is unlikely to have occurred by chance, that it is rare (Pedhazur
& Schmelkin, 1991). More complex, however, is the matter of determining whether a statistically significant finding is
meaningful, that is, whether it says anything important about the matter under study (Pedhazur & Schmelkin, 1991). A
term frequently used to refer to the meaningfulness or substantive importance of a difference to clinical decision making
Page 298
is clinical significance (Bain & Dollaghan, 1991; Bernthal & Bankson, 1998). Other terms applied to this concept in
the rich psychological literature on the topic include social validity, clinical importance, qualitative change, educational
relevance, ecological validity, and cultural validity (Foster & Mash, 1999).
A research example using a difference between two groups at a single point in time can help illustrate the distinction
between statistical significance and substantive importance. In a research study one might compare the performance of
two groups on a given test with 100 items and find that the two groups differed in their performance by just 2 items.
Further, the difference might be shown to be statistically significant. Despite the statistical significance, however, most
observers, if aware of the size of the difference, would consider a difference of just 2 points to merit no more than a yawn
—no matter how much verbal arm waving the researcher in question might use to inspire interest. In contrast, if a much
larger difference had been obtained and found to be statistically significant, most observers would be moved to rapt
attention, having been persuaded that the basis for group assignments had at least some sort of important relationship to
the subject covered by the test.
Using an analogous clinical example, one can imagine achieving a very consistent result when using a particular
treatment with a given child—for instance, Tamika, from the introduction of this chapter. Perhaps Tamika makes gains
of one or two items on untreated probes that are used over the course of a semester to monitor her progress in the use of
grammatical morphemes. That relatively high consistency (or reliability) of change, however, would probably not please
you (or Tamika) and would probably send you scrambling to find an alternative, more effective intervention strategy.
The clinical significance of change observed for Tamika simply would not warrant contentment with the current
treatment.
Effect size, which can be measured in a variety of ways, generally refers to the magnitude of the difference between two
scores or sets of scores, or of the correlation between two sets of variables (Pedhazur & Schmelkin, 1991). Authors
regularly suggest that researchers in speech-language pathology and elsewhere appear to fixate on statistical significance
at the expense of effect size or other measures that are more amenable to decisions about the value of information to
decision making (e.g., Pedhazur & Schmelkin, 1991; Young, 1993). Because information about the reliability of
difference scores is difficult and often impossible to come by for the measures clinicians use to examine change,
clinicians and their constituents are much more likely to want to inspect the actual magnitude of change with an eye
toward its clinical meaning. Effect size cannot, however, be the sole basis for judging the meaning of a particular
difference because other factors will need to be taken into account (e.g., the social significance of the difference, the
likely generalizability of the difference). However, it can be an important element in that process (Bain & Olswang,
1995).
Bain and Dollaghan (1991) described a couple of strategies for looking at effect size. One of these strategies uses
standard scores, takes into account the absolute amount of change that has occurred, and is therefore primarily limited to
use with norm-referenced standardized measures. The other uses age-equivalent scores, looks at the relative size of
change, and is subject to the vagaries associated with that inferior method of characterizing performance.
Page 299
Using standard scores to examine change, Bain and Dollaghan (1991) noted that the amount of change can be expressed
in terms of standard deviation units and compared against an arbitrary standard. Thus, a difference might be considered
of practical significance if it met or exceeded a change of so many standard deviation units—with those authors citing 1
standard deviation as a frequently used standard. For instance, imagine that at Time 1, a child receives a standard score
of 70 on a test with a mean of 100 and standard deviation of 10. Then, at Time 2, the child receives a score of 81 on that
same test. The amount of change would be considered of clinical significance because it corresponded to slightly more
than one standard deviation.
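The worked example above amounts to one line of arithmetic, sketched here for concreteness; the function name is my own, and the one-standard-deviation criterion is, as the text notes, an arbitrary but frequently used standard.

```python
# The standard-score comparison from the text, as a computation. The
# test mean and standard deviation are those given in the example.

def change_in_sd_units(score1, score2, sd):
    """Express a gain as standard deviation units of the test's norms."""
    return (score2 - score1) / sd

# From a standard score of 70 to 81 on a test with SD = 10:
gain = change_in_sd_units(70, 81, sd=10)
print(gain)          # 1.1
print(gain >= 1.0)   # True: meets the one-SD criterion cited in the text
```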
As long as the measure that is being used has been carefully selected for its validity for the given child and content area,
this method seems a reasonable one for many purposes. In particular, its use is strengthened if the time period
encompassed by the comparison results in a comparison against a single normative subgroup. Specifically, if a child’s
performance can be compared with just a single normative subgroup over time (e.g., all of the children age 5 years, 1
month to 6 years), then the extra variability introduced by comparing his or her first performance with one set of children (e.
g., the children from 5 years to 5 years, 6 months) and then with another (e.g., the children from 5 years, 7 months to 6
years) can be avoided.
The use of standard scores is also preferable to the same method applied using age-equivalent scores and a cutoff
established around a certain age-equivalent gain (Bain & Dollaghan, 1991) because of the poor reliability of such scores
(McCauley & Swisher, 1984). Admittedly, at this point, selection of the cutoff in this strategy using standard scores is
arbitrary—how much change should be regarded as clinically significant can serve as a point of considerable argument.
However, additional research by test developers and others could validate specific levels in a manner quite analogous to
that proposed for cutoffs used in other areas of clinical decision making (Plante & Vance, 1994).
The Proportional Change Index (PCI), the alternative strategy for examining effect size described by Bain and Dollaghan
(1991), provides a relative measure of change arising from the work of Wolery (1983). The measure is relative in the
sense that it attempts to examine the rate of change characteristic of the child’s behavior for the period before treatment
as compared with the rate observed during treatment. Specifically, the PCI is the proportion created when the child’s
preintervention rate of development is divided by the child’s rate of development during intervention. The
preintervention rate of change is estimated by dividing the child’s age-equivalent score on a measure taken just before
the beginning of treatment by his age in months. The rate of development during intervention is estimated by dividing
the gain score obtained for that measure when it is readministered after a period of treatment by the duration of
treatment. For a child whose behavior is being monitored over time without intervention, the measure might be used to
compare the rate of change before the observation period with that observed during it. The merit of this particular
measure is that it “takes into account the number of months actually gained, the number of months in intervention [or
observation] and the child’s rate of development at the pretest date” (Wolery, 1983, p. 168). Figure 11.1 illustrates the
calculation of PCI for two children: Shana, who shows excellent gains in receptive vocabulary, with twice as
Page 300
Fig. 11.1. A hypothetical example showing the calculation of the Proportional Change Index (Bain & Dollaghan, 1991; Wolery, 1983) for
two children.
much progress in treatment as prior to treatment; and Jason, who shows progress in receptive vocabulary acquisition that
is no better in treatment than it had been prior to treatment.
If the two rates of change used in the equation for PCI are similar, the calculated value for PCI will approach a value of
one. On the other hand, if treatment or other factors have accelerated development, the PCI should exceed 1, with
larger PCIs indicating greater acceleration. Thus, for example, a PCI of 3 would imply that change had occurred three
times as quickly during treatment as preceding it. Alternatively, a PCI of .5 would suggest that change had occurred at
half the rate during the treatment or observation period as preceding it.
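The PCI computation can be sketched directly from its definition. The ages and gains below are hypothetical numbers chosen to mirror children like Shana and Jason in Fig. 11.1, not values taken from that figure.

```python
# A sketch of the Proportional Change Index (Wolery, 1983; Bain &
# Dollaghan, 1991). All ages and scores below are hypothetical.

def pci(pre_age_equiv, chron_age_at_pretest, gain, months_of_treatment):
    """PCI = (rate of change during intervention) / (preintervention rate).

    All quantities are in months; pre_age_equiv is the age-equivalent
    score just before treatment, and gain is the age-equivalent months
    gained during months_of_treatment.
    """
    pre_rate = pre_age_equiv / chron_age_at_pretest
    tx_rate = gain / months_of_treatment
    return tx_rate / pre_rate

# A "Shana"-like child: age equivalent 24 months at a chronological age
# of 36 months (pre rate 0.67), then an 8-month gain over 6 months of
# treatment (rate 1.33) -- progress twice as fast in treatment:
print(round(pci(24, 36, 8, 6), 2))   # 2.0

# A "Jason"-like child: same starting point, but only a 4-month gain in
# 6 months -- no faster than before treatment:
print(round(pci(24, 36, 4, 6), 2))   # 1.0
```

A PCI of 2.0 corresponds to development proceeding twice as fast during intervention as before it, whereas a PCI near 1.0 suggests treatment added nothing beyond the child’s preexisting rate of change.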
As described earlier, the PCI is usually recommended for its utility in examining change during a period of intervention
in which positive change is expected. Nonetheless, it might also be used if one were interested in examining alterations
in rates of change occurring under conditions like those described for David at the beginning of the chapter. Recall that
David had been diagnosed with a neurodegenerative disease that was predicted to result in skill loss. It might also be
used under conditions in which problems in development were suspected (as in the case of a suspected “late talker”), but
the child’s clinician had opted for a watch-and-see strategy with a planned 6-month reevaluation.
Bain and Dollaghan (1991) noted that the PCI rests on two problematic assumptions, the first being that change in
children’s skills occurs at a constant rate in the absence of intervention. A plausible alternative to this assumption is that
change may occur at varying rates during development—with children’s behaviors sometimes racing ahead, sometimes
holding steady, and sometimes, perhaps, even regressing for a time. The problem with the assumption of constant change
embodied in the PCI is addressed to some extent by the use of single subject designs, a specific method that is described
in greater detail later in the chapter. Single subject designs escape this assumption through the clinician’s active
examination of change patterns during periods in which intervention is not occurring as well as when it is. Thankfully,
too, the question of whether change is constant can be addressed empirically. Although additional information is needed
to determine the extent to which this assumption is tenable, efforts to examine patterns of change are underway and
suggest that over shorter time periods the assumption of a constant rate of change is probably false (Diedrich & Bangert,
1980; Olswang & Bain, 1985).
The second problematic assumption of the PCI lies in its use of age-equivalent scores and the temptation that it presents
for clinicians to use tests that present such scores without much in the way of empirical support—either for the age-
equivalent scores or for the test in its entirety. Bain and Dollaghan (1991) acknowledged this potential drawback and
implicitly recommended that clinicians should search for the highest quality measures to use for documenting change.
However, they also suggested that in the absence of such measures, the PCI may offer a better alternative than the simple
assumption that a gain in age-equivalent scores over time represents progress.
An additional limitation affecting the PCI is the need for users to adopt an arbitrary basis for determining when a certain
amount of change is sufficient to support the use of time and other resources required to achieve a particular gain. Thus
far, no measure
Page 303
described herein or proposed elsewhere has been able to claim a rational basis for its particular standard or cutoff.
In principle, then, the two measures of effect size that I have described (standard score gain scores and PCI) seem to
represent strong contenders for use in decisions about the importance of observed change—both for change observed
during treatment or for change observed over a period of time in which intervention is not used but a child’s performance
is monitored. However, additional research is needed to validate their use in decision making, particularly in the case of
the PCI in which the strength of the logic behind the measure is undermined by its dependence on age-equivalent scores.
I also call readers’ attention to the fact that both of these methods will more readily be implemented for standardized
norm-referenced tests than for other types of measures that might be used to describe a child’s language.
Social Validation
In examining the importance of change, clinicians are almost always interested in considering whether observed changes
conform to theoretical expectations, especially developmental expectations, that imply a hierarchy of learning in which
some behaviors are seen as prerequisites to others (Bain & Dollaghan, 1991; Lahey, 1988). Put differently, clinicians are
interested in determining whether the child has made gains that theoretically appear to be movements along the “right”
path. Gains on those behaviors that are seen as precursors to further advancement are judged to be more important than
those that are not.
Additionally, clinicians have always valued and sometimes solicited family and teacher reports asserting progress as de
facto evidence that change has occurred and is important. This way of thinking about the importance of language change
falls under the term social validation. Social validation also complements the use of effect size in fostering the richest
possible conceptualization of “importance.” Acknowledging that such evidence has value is consistent, first of all, with
an appreciation that the functional and social effects of communication disorders warrant greater incorporation into
clinical practice (Frattali, 1998b; Goldstein & Gierut, 1998; Olswang & Bain, 1994).
In a different context (discussing research significance as opposed to clinical significance), Pedhazur and Schmelkin
(1991) offered a quotation from Gertrude Stein: “A difference in order to be a difference must make a difference” (p.
203). If rephrased slightly, this quotation also seems to speak to efforts to examine the importance of change in
children’s language: For change in a child’s language to be significant, it must make a difference in the child’s life.
Use of measures to examine the functional and social impact of change is also consistent with the growing appreciation
of qualitative data described in the last chapter. Because qualitative data are unapologetically subjective in nature
(Glesne & Peshkin, 1992), they may be used very effectively—more effectively than reams of quantitative data—to
address questions related to the social context supporting and affecting a child and to how the child is viewed in that
context. Over the past few decades, quantitative as well as qualitative measures have received growing attention for the
purpose of assessing function and social impacts of treatment (Bain & Dollaghan, 1991;
Page 304
Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Koegel, Koegel, Van Voy, & Ingham, 1988; Olswang & Bain,
1994; Schwartz & Olswang, 1996).
Kazdin (1977) described a process by which such measures can be used to look at the importance of behavioral change.
In particular, he focused on behavioral change achieved through applied behavior analysis and based his work on that of
Wolf and his colleagues (e.g., Maloney et al., 1976; Minkin et al., 1976; Wolf, 1978). Kazdin defined social validation as
the assessment of “the social acceptability of intervention,” where such acceptability could be assessed with regard to
intervention focus, procedures, and—importantly for this discussion—behavior change. More recently, he has defined
clinical significance as “the practical or applied value or importance of the effect of an intervention—that is, whether the intervention makes a real (e.g., genuine, palpable, practical, noticeable) difference in everyday life to the clients or others with whom the clients interact” (Kazdin, 1999, p. 332). Although Kazdin and numerous other authors working in the
area of clinical psychology (e.g., Foster & Mash, 1999; Jacobson, Roberts, Berns, & McGlinchey, 1999; Kazdin, 1999)
have continued to elaborate on the concepts outlined in Kazdin (1977), basic issues raised in that earlier work remain
relevant. In particular, this relevance derives from the lack of empirical validation supporting many of the highly
developed measures of clinical significance proposed in the clinical psychology literature (Kazdin, 1999).
Kazdin (1977) recommended two general approaches to the social validation of behavior change that have been
embraced by a number of researchers in child language disorders—social comparison and subjective evaluation (Bain &
Dollaghan, 1991; Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Olswang & Bain, 1994; Schwartz & Olswang,
1994). Social comparison involves pre- and post-intervention comparisons of behaviors exhibited by the child receiving intervention with those of a group of same-age peers who are unaffected by language impairment
(Campbell & Bain, 1991). Astute readers will find this method reminiscent of a normative comparison. However, instead
of comparisons on a standardized measure against a relatively large group of ostensible “peers,” here the child’s
performance on a more informal measure (usually a clinician-designed probe) is compared against that of a relatively
small group of actual peers. The value of this technique will certainly be affected by the care taken to choose a
representative, if small, comparison group. In addition, it may also prove most valuable in cases where a norm-
referenced comparison using a larger group is unavailable because no appropriate measures or appropriate normative
samples exist for the targeted behavior and particular client.
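As a concrete sketch, the logic of a social comparison can be reduced to checking whether the treated child’s post-intervention probe score falls within the range observed for a small peer group. The function and all scores below are hypothetical illustrations, not data from any of the cited studies:

```python
def social_comparison(child_score, peer_scores):
    """Compare a child's probe score (percent correct) with scores from a
    small group of typically developing same-age peers."""
    lo, hi = min(peer_scores), max(peer_scores)
    return {
        "peer_mean": sum(peer_scores) / len(peer_scores),
        "peer_range": (lo, hi),
        "within_peer_range": lo <= child_score <= hi,
    }

# Hypothetical post-intervention probe: the child produces the target on
# 70% of trials, while four peers score 80, 85, 90, and 75.
result = social_comparison(70, [80, 85, 90, 75])
```

Here the child’s 70% falls below the peers’ 75% to 90% range, so a clinician might conclude that, despite measurable gains, performance has not yet reached peer levels.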
Subjective evaluation involves the use of procedures designed to determine whether individuals who interact frequently
with the child see perceived changes as important (Kazdin, 1977). Methods that have been proposed for these purposes
in speech-language pathology range from quite informal to relatively sophisticated. Thus, for example, at the informal
end of the continuum, it has been suggested that parents, teachers and other adults who are familiar with the child be
asked to appraise the adequacy of a child’s performance following a period of intervention (Bain & Dollaghan, 1991;
Campbell & Bain, 1991). Clearly these data may be qualitative in nature (Olswang & Bain, 1994; Schwartz & Bain,
1995) and would benefit from the clinician’s use of triangulation with other sources, as discussed in the previous chapter,
Page 305
thus implying the use of multiple measures. This is consistent with the idea emphasized in Kazdin’s (1999) recent work,
that “clinical significance invariably includes a frame of reference or perspective” (p. 334).
A more intermediate level of complexity might involve use of an existing rating scale, such as the Observational Rating
Scales of the Clinical Evaluation of Language Fundamentals—3 (Semel, Wiig, & Secord, 1996), in which a similar rating
scale is completed by the child, the parent(s), and a classroom teacher. The growing interest in the development of
functional measures for use with children in school settings will certainly provide many new alternatives of this kind.
Addition of this type of measure to the very detailed measures of progress being used for Tamika may not only provide
strong evidence of functional impact, but may also help reduce possible bias in the assessment of progress achieved by a
child who speaks a dialect usually underrepresented in standardized measures.
A higher level of complexity in the use of subjective evaluation would involve the use of a panel of naive listeners who
could be asked to use a rating strategy such as direct magnitude estimation to make judgments about some aspect of the
communicative effectiveness of a child’s productions. Campbell and Dollaghan (1992) described the use of a 13-person
panel that was asked to rate the informativeness (“amount of verbal information conveyed by a speaker during a
specified period of spontaneous language”, p. 50) of utterances produced by nine children with brain damage and their
controls. This example of social validation is particularly complex given that Campbell and Dollaghan applied a hybrid
method that used both social comparison and subjective evaluation components. Although methods as complex as these
are probably not practical in many clinical settings, they provide a valuable illustration of how flexible social validation
procedures can be.
In summary, social validation methods add greatly to our estimation of how important an observed change is. In
particular, they can help us see how observed differences make a change in a child’s communicative and social functions
and opportunities. They vary dramatically in terms of their complexity and sophistication. Further, the fact that they can be applied to qualitative as well as quantitative data, including data from informal measures, is an especially attractive feature.
Use of Multiple Measures
The augmentation of measures designed to directly assess linguistic behaviors with measures intended to provide social
validation constitutes one very important way in which multiple measures may be used to enhance our ability to tease out
the contribution of treatment to change. However, the kinds of multiple sources of data recommended by clinical
researchers do not stop there (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz & Olswang, 1996). They
extend to considering the value of multiple indicators in helping one best address the construct of interest—an idea that
was introduced in Fig. 2.2 and in chapter 2. Whether the construct is one related to a particular linguistic skill or to a
child’s communicative function within a given setting, there is general agreement that making use of several measures
can best support conclusions about the construct under consideration.
Page 306
Writing from a research perspective, Primavera, Allison, and Alfonso (1996) noted that Cook and Campbell (1979)
introduced the idea of multioperationalism into behavioral research, in which a construct is operationalized using as
many indicators as possible in order to truly capture its essence. In a similar vein, Pedhazur and Schmelkin (1991)
offered a detailed account explaining why the use of a single indicator of a construct “almost always poses
insurmountable problems” (p. 56) related to knowing to what extent the indicator reflects the construct rather than error.
Whereas researchers may have greater opportunities and rewards for practicing multioperationalism, clinicians, too, can
benefit from its application. When a clinician uses a single measure (e.g., a single test of receptive vocabulary) to support
conclusions about a construct (e.g., receptive language), both the clinician and his or her audience either immediately
feel skeptical that the part (receptive vocabulary) represents the whole (receptive language) or should feel skeptical if
they give it much thought. Even if conclusions are limited to those about receptive vocabulary, however, a quick
reminder about the nature of most such tests—that they frequently address only pictureable nouns—should cause the
clinician to pause. Clearly, the single indicator seems unlikely to capture the construct of interest. The time demands of
clinical practice can sometimes make the collection of even one measure seem onerous and the idea of multiple measures
an author’s fantasy and clinician’s nightmare. However, becoming aware of the value of such measures may help
clinicians decide to take the extra time and provide support for that decision in select cases. Further, in cases where the
use of multiple measures has not seemed practical, it can help lead to more limited and therefore more valid
interpretations.
In this section, three principal strategies for examining the importance of change were briefly introduced: use of multiple
measures, social validation, and effect size. Authors such as Bain, Campbell, Dollaghan, and Olswang have begun to
venture deep into the literatures of related disciplines to explore this relatively new territory for the resources it might
contribute to measurement in communication disorders. Given the value of their work to date, their efforts will
undoubtedly continue and be joined by those of others who respond to recent calls for more persuasive evidence that
speech-language pathology services make a difference for children with communication disorders.
Determining Responsibility for Change
Whereas determining the extent to which change in language has occurred and determining its importance are closely
related tasks, verifying the clinician’s contributions to that change is an altogether different and more daunting task.
Granted, simply noting the extent to which change has occurred and its nature can be useful in instances where no
intervention has taken place—for example, in cases where a child’s development is being monitored because of
suspicion that the child is a late talker. More commonly, assessment of change for children in treatment involves cases
where all stakeholders are comfortable with the unexamined assumption that change will be primarily the result of
intervention efforts. However, there are times when demonstrating that treatment is responsible for observed changes is
crucial. In this era of growing attention to accountability and quality assurance, these times are becoming more common
(Eger, 1988; Frattali, 1998a, 1998b).
Page 307
The difficulty in pinning down causal explanations for human behavior or behavior change is a driving force behind
developments in psychology and related disciplines over the past 100 years. Again and again, the problem with
determining causality seems to be ruling out alternative explanations in cases where stringent control over potential
causes is either not possible or not ethical. Treatment for language disorders in children presents the classic difficulty in
this regard. The possibility of factors other than treatment—such as development, environmental influences, and changes
in the child’s physiology through recovery from a disease process or trauma—make it very difficult to identify
treatments or indirect management strategies as having “caused” gains that are seen in a child’s performance.
At least two design elements have provided a logical basis for increasing the plausibility that gains in performance seen
while a child is undergoing treatment are attributable to treatment rather than to alternative explanations. These two
elements are repeated observations over a period of time prior to the onset of treatment and the use of treatment,
generalization, and control probes. Both of these elements have been incorporated into the framework of research known
as single subject experimental design (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds &
Kearns, 1983). In addition, each has been identified separately as a means of enhancing support for treatment as a causal
factor in cases of behavioral gains (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz
& Bain, 1996).
Pretreatment Baselines
The use of multiple observations over a period of time prior to the initiation of treatment is frequently referred to as a
baseline or the A condition in a single subject experimental design. Multiple observations function as a window into the
stability of the behavior and the measure used to characterize it. If little variation is observed, it seems most likely that
the behavior is not changing and that the measure being used to track the behavior is not introducing error (i.e., that it is
probably reliable). This means that departures from stability observed after the onset of treatment can be more readily
attributed to treatment than to either the instability of the behavior being measured or to measurement error. The
presence of stability during baseline observations might alternatively be interpreted as suggesting that the behavior being
measured and the measure being used for that purpose are varying in ways that cancel each other out—a most unlikely
prospect.
In contrast, when considerable variation is observed, it can be difficult to determine which of the two possible sources of
variation (change in the behavior vs. error in the measurement) is the culprit. Consequently, as a rule, baselines are
easiest to interpret, and they provide the strongest support for interpreting changes that might occur under conditions such as treatment, when they are sufficiently lengthy, show no obvious trends, and appear to be stable (McReynolds & Kearns, 1983). With regard to length, three observations are often cited as a minimum (McReynolds & Kearns,
1983), with longer baselines required if the behavior shows a trend or other lack of stability. The presence of a trend
(consistent increase or decrease in data values in the direction of expected change with treatment) can be problematic, as
can lack of stability in
Page 308
which both increases and decreases in a specific measure are noted. Because stability is a relative quality, we again are in
a position of looking toward expert advice to help us agree on an acceptable range of variation. McReynolds and Kearns
(1983) pointed to a historic standard of 5 to 10%. However, they noted that lower levels of stability achieved during a
baseline will simply necessitate greater amounts of change to justify claims of effective treatment.
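The range criterion just described can be expressed in a few lines of code. This is a minimal sketch assuming baseline observations are recorded as percent-correct scores; the 10-point threshold and three-observation minimum follow the conventions cited above:

```python
def baseline_is_stable(observations, max_range=10.0, min_sessions=3):
    """Apply a simple stability criterion to pretreatment baseline data
    (e.g., percent-correct scores across sessions): enough sessions, and
    a total range no greater than the historic 5- to 10-point standard."""
    if len(observations) < min_sessions:
        return False
    return max(observations) - min(observations) <= max_range

# A baseline of 52, 55, and 50 varies within 5 points and so passes;
# one of 30, 50, and 45 varies by 20 points and so fails.
```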
Proponents of single subject experimental designs who are the chief resources for interpreting baseline data have often
suggested that visual inspection of such data is sufficient for the detection of stability and systematic change. Recently,
however, the complexity of this judgment task has led to questions about its use (Franklin, Gorman, Beasley, & Allison,
1996; Parsonson & Baer, 1992). In particular, researchers have noted a tendency for visual analysis to fail to detect
change when it has actually occurred, thus suggesting a lack of sensitivity to smaller levels of change. This reduced
sensitivity may present serious problems for clinicians who believe that small amounts of change will be important to
documenting the effect of their treatment. On the other hand, for those who attempt to target behaviors on which they
expect larger changes (larger effect sizes, to use our previous terminology) the reduction in sensitivity may represent a
reasonable trade-off against the relative simplicity of graphic analysis. Nonetheless, clinicians who may wish to rely on
visual analysis would do well to look into the emerging complexities of this aid to data interpretation (Franklin et al.,
1996; Parsonson & Baer, 1992). Researchers and clinicians with sufficient resources might also consider alternative
interpretations that make use of emerging methods (Gorman & Allison, 1996).
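For readers who want a simple numerical supplement to visual inspection, an ordinary least-squares slope over session number offers a crude check for baseline trend. This sketch is an illustration only, not one of the formal methods proposed by Gorman and Allison (1996):

```python
def baseline_slope(observations):
    """Ordinary least-squares slope of baseline scores over session
    number: a crude numerical supplement to visual inspection when
    screening for a trend (a slope near zero suggests no trend)."""
    n = len(observations)
    mean_x = (n - 1) / 2                      # sessions numbered 0..n-1
    mean_y = sum(observations) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(observations))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# A steadily rising baseline yields a large positive slope; a flat,
# stable baseline yields a slope near zero.
```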
Treatment, Control, and Generalization Probes
The idea of treatment and control probes draws once again on the single subject experimental design literature (Bain &
Dollaghan, 1991). In that context, treatment probes represent quantitative measures focusing on behaviors that are or
will be the target of treatment. They are usually the minimum type of data collected to provide evidence of change. In
contrast, control probes represent quantitative measures obtained periodically over the course of a study to allow the
clinician to monitor the effects of extraneous variables on an individual’s behavior. They are usually constructed or
selected so that they measure behaviors that are unrelated to the treated behavior. If the treated behavior shows change
whereas the untreated, control behavior monitored using control probes does not, then the clinician can feel confident
that maturation and other factors have not produced global advances from which treated stimuli would have benefited
with or without the implementation of treatment. (Of course, one of the perils involved in the selection of control probes
is that developmental forces may cause changes in the behavior they are used to track even without a direct effect of
treatment; Demetras, personal communication, February 2000).
Generalization probes are used to track behaviors that are related to but distinct from those receiving treatment. Thus, their use involves a violation of the expected lack of relationship to treated behaviors that characterizes control probes within single subject designs (Bain & Dollaghan, 1991; Fey, 1988).
Page 309
In the construction of generalization probes, the clinician looks for
behaviors that are related to treated behaviors in a manner thought likely to cause generalization that will affect them. On
the basis of the current understanding of generalization, generalization probes would be expected to show similar but
smaller changes than treatment probes in response to the implementation of an effective treatment. Although
generalization across behaviors may be the most common dimension in which generalization probes are studied
clinically, generalization across situations will also prove of interest as will generalization across time (McReynolds &
Kearns, 1983).
The use of generalization and control probes allows for a clear demonstration that treatment is behaving as predicted
relative to the targeted behavior. Specifically, their use can help demonstrate that treatment is having its greatest effect
on treated behaviors, a lesser effect on untreated or other generalization behaviors, and no effect on control behaviors.
Their use can thus contribute to the plausibility of arguments that treatment, rather than the myriad of other variables that
might help a child’s behavior improve, is the agent responsible for observed change. Campbell and Bain (1991) further
argued that evidence of generalization obtained during treatment offers speech-language pathologists their clearest
opportunity to show instrumental outcomes (i.e., outcomes suggesting the likelihood that treatment will lead to
additional outcomes without further treatment). More support for these varied measures comes from the motor learning
literature, in which it was observed that data obtained during a learning condition (e.g., a treatment session) can
overestimate learning compared to generalization or maintenance data (e.g., see Schmidt & Bjork, 1992).
An example illustrating the use of treatment, generalization, and control probes is described in Bain and Dollaghan
(1991) as part of a single subject design. Using the case of a hypothetical preschooler with SLI, they suggested a
treatment target consisting of the production of a two-word semantic relation Agent + Action. As a generalization
behavior, they proposed the production of Action + Object because its shared component, Action, was thought to make
generalization likely. Finally, as a control behavior, they proposed the production of Entity + Locative because it seemed
unlikely to change without direct treatment. Each probe consisted of the child’s percentage of correct production of 10
unfamiliar exemplars that the clinician attempted to elicit through manipulation of several toys and the context.
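The expected ordering of effects across the three probe types can be illustrated with a small sketch. The probe labels follow Bain and Dollaghan’s (1991) hypothetical example, but the session scores are invented for illustration:

```python
# Percent-correct probe scores across four probe sessions. The probe
# labels follow Bain and Dollaghan's (1991) hypothetical example; the
# scores themselves are invented for illustration.
probes = {
    "treatment (Agent + Action)":       [10, 30, 60, 80],
    "generalization (Action + Object)": [10, 20, 40, 60],
    "control (Entity + Locative)":      [10, 10, 20, 10],
}

def overall_gain(scores):
    """Change from the first probe session to the last."""
    return scores[-1] - scores[0]

gains = {name: overall_gain(s) for name, s in probes.items()}
# The predicted pattern for an effective treatment: largest gain on the
# treatment probe, a smaller gain on the generalization probe, and
# little or no gain on the control probe.
```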
Treatment, generalization, and control probes often involve elicited behaviors such as those described under that heading
in the preceding chapter. However, other measures, such as those derived from language samples and analyses, could also serve to examine treatment, generalization, and control behaviors. Although there is a
tendency for treatment probes to be obtained frequently so that the process of treatment as well as the product may be
illuminated (McReynolds & Kearns, 1983), generalization and control probes are typically evaluated on a less frequent basis (Bain & Dollaghan, 1991). The frequency with which treatment probes are used may depend on the expected rate
of change; Bain and Dollaghan pointed out that the behaviors of a child with cognitive delays indicative of an overall
slower rate of learning may require less frequent collection of data.
Page 310
Determining Whether Additional Change Is Likely to Occur
As an additional aspect of examining change, authors have sometimes called attention to the value of predicting whether
future change is likely. In particular, this general question has been asked specifically with regard to addressing
predictions of change at two different ends of the treatment process: initiation and termination. First, successful
prediction of whether change is likely might help in judging whether treatment should be initiated because of a child’s
“readiness” for change in a particular area (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996).
Second, successful prediction might help in judging whether treatment should be terminated, or at least temporarily
discontinued, because additional change is unlikely (Campbell & Bain, 1991; Eger et al., 1986). Both kinds of questions
will require substantial empirical investigations to arrive at universal recommendations for best practices. Nonetheless,
each depends on evidence that a particular technique is valid for predicting a given outcome—thus suggesting that
evidence of predictive criterion-related validity is at the root of both of these questions. This realization is implicit in the
work of Bain and Olswang (1995), in which they sought to demonstrate the predictive validity of dynamic assessment to
support its use in determining readiness for the production of two-word phrases.
Posing the question of when treatment might most profitably be initiated goes beyond the clinical assumption that
treatment should be undertaken any time a child is found to demonstrate a significant problem in language or
communication skills. The question itself suggests the possibility that there are times when children may exhibit
evidence of a language disorder but that treatment would be unlikely to be effective—either in a global sense or in
relation to a specific domain or behavior. Timing the onset of treatment or at least the onset of treatment aimed at
specific targets to coincide with children’s areas of readiness could be expected to yield major enhancements to
treatment efficiency (Long & Olswang, 1996).
Olswang and Bain (1996) discussed the use of profiling in static assessment versus dynamic assessment as tools to use in
addressing the question of readiness. The use of profiles, which are most often created by comparing a child’s
performances on several tests or subtests, was discussed at some length in chapter 9. Even though the use of profiles has
been largely debunked as a strategy for highlighting domains or children that might exhibit the greatest change in
treatment, Olswang and Bain (1996) decided both to pursue it as one of the few methods in static assessment that has
been proposed for addressing the prediction of future change and to compare it with techniques from dynamic
assessment.
One of the greatest promises of dynamic assessment has been its use in identifying the moving boundary of a child’s
learning, or zone of proximal development (ZPD; Olswang & Bain, 1996; Vygotsky, 1978). As described in chapter 10,
the ZPD is thought to reflect the loci of a child’s active developmental processes and thus to suggest areas in which
treatment might be aimed to achieve optimal change. As a result of this promise, Olswang and Bain (1996) decided to
compare the relative merits of profiles based on static assessments as well as performances on other selected variables
versus measures of dynamic assessment techniques in predicting responses to
Page 311
treatment. The dynamic measures were found to correlate more strongly than the static measures with a measure of change (PCI) calculated following a 3-week treatment period.
The results of their study led Olswang and Bain (1996) to propose that dynamic assessment procedures are better than
other techniques at determining the likelihood of immediate change. However, they noted that additional research is
needed to determine whether observed changes would have occurred even in the absence of treatment. They might also
have noted that additional research is needed to determine whether the predictive powers of dynamic assessment would
have performed as well over longer periods of treatment.
As Campbell and Bain (1991) advised, decisions regarding treatment termination can be based on predetermined exit
criteria or on demonstrations that no change has occurred over a given period of time. Such decisions, however, can also
be based on empirical evidence that additional change is unlikely. This last alternative thus demands a prediction of
future change levels akin to that sought by Olswang and Bain in their efforts to identify harbingers of change prior to
treatment initiation.
Campbell and Bain (1991) touched on the possibility of predicting future change for purposes of making a rational
decision about the end of treatment in their discussion of ultimate or instrumental outcomes. Whereas ultimate outcomes
can be defined as a child’s achievement of age-appropriate or maximal communicative effectiveness, such outcomes can
also be defined as functional communicative effectiveness, which implies that the child has achieved his or her best
approximation of maximal communicative effectiveness. Additionally, “instrumental outcomes” can be defined as outcomes suggesting that additional change will be forthcoming in the absence of treatment. The notions of “functional”
communicative effectiveness and instrumental outcomes each involve implications related to the prediction of future
change. Specifically, when functional communicative effectiveness is seen as a legitimate ultimate outcome, it is almost
invariably because the prospect of additional change is seen as unlikely or as prohibitive in terms of the time and effort
required to produce it. Similarly, instrumental outcomes depend on the notion that additional change is likely.
At present, it appears that generalization data, such as those described in the preceding section, may represent
the best method for addressing questions regarding future change. Research designed to identify more appropriate
methods of predicting future change will undoubtedly need to proceed hand-in-hand with research aimed at
understanding the nature of language learning and of threats to language learning posed by language disorders before
substantial progress on these clinical questions can be made. Measures of predictive validity will also undoubtedly play a
role in helping us arrive at satisfying answers.
Available Tools
The kinds of tools available for use in addressing questions of change in children’s language disorders largely overlap
those available for description that were described in the preceding chapter. Therefore, in this chapter, discussion of
available tools is
Page 312
quite brief and focuses on those measures that are most frequently used to examine behavioral change and the special
considerations that arise when they are used for that purpose. The only new tool to be introduced in this chapter is single
subject designs, a family of methods that has been alluded to throughout this chapter but has not been adequately
introduced as a specific method for examining change.
Standardized, Norm-Referenced Tests
Repeated administration of standardized, norm-referenced tests is probably the most widespread method used by speech-
language pathologists to examine broad changes in language behaviors over time (McCauley & Swisher, 1984). More so
than other measures used to examine change, standardized norm-referenced measures are often accompanied by data
concerning their reliability and validity. This represents a distinct potential advantage because such data can enhance the
clinician’s ability to determine whether observed changes are likely to be reliable and important. Regrettably, however,
norm-referenced measures often do not provide sufficiently detailed data to make this potential a reality (Sturner et al.,
1994).
As additional barriers to their effective use for evaluating change, there are a number of pitfalls that must be avoided.
The most important of these relates to the tendency for such measures to have been devised so that they are more
sensitive to large differences in knowledge between individuals than to small differences (Carver, 1974; McCauley &
Swisher, 1984). Yet it is small differences that are characteristic of the changes most likely to occur in treatment within a
given individual (Carver, 1974; McCauley & Swisher, 1984). Thus, clinicians who use such measures to assess change must be aware that their efforts are likely to prove insensitive to small but very important changes, as well as to changes in behaviors that simply are not addressed by a given test. Such tests are therefore best reserved for occasions when broad changes are of interest.
Among other possible pitfalls cited by McCauley and Swisher (1984), as well as others, are the need to avoid situations
in which the test is explicitly taught by a well-meaning clinician or implicitly taught through repeated administrations
that occur so closely in time as to allow the child an unwarranted advantage at the second administration. Another pitfall
arises when changes in the normative groups occur over the time interval studied or when different measures (albeit ones that ostensibly tap the same behavior) are used
at different times. Now, it may be tempting to view change as having occurred because a child has received a relatively
better score on Test B of Language Behavior X than she or he did on Test A of Language Behavior X. However, the
huge amount of error that could be introduced by differences in the content of Tests A and B (despite their similar names), as well as by differences in their normative samples, is likely to make such a conclusion completely erroneous.
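The hazard of comparing scores across tests with different normative samples can be made concrete. In this hypothetical sketch (all values invented), the child’s raw performance barely changes, yet the difference in norms manufactures an apparent gain of more than 11 standard-score points:

```python
def standard_score(raw, norm_mean, norm_sd, mean=100, sd=15):
    """Convert a raw score to a standard score relative to a given
    normative sample, on the conventional mean-100, SD-15 scale."""
    return mean + sd * (raw - norm_mean) / norm_sd

# The same child, two ostensibly similar tests with different norms
# (all values invented): the raw score barely changes, yet the
# standard scores suggest a sizable "gain."
time1 = standard_score(raw=40, norm_mean=45, norm_sd=10)  # Test A
time2 = standard_score(raw=42, norm_mean=40, norm_sd=8)   # Test B
```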
One method that has been recommended (e.g., McCauley & Swisher, 1984) as helping remove the additional error
associated with gain scores has been to simply reexamine a child with the same initial question: Is this child’s language
(or the particular aspect of it that is under scrutiny) impaired? However, a recent study looking at remission rates for
reading disability among children examined in two studies over
Page 313
a 2-year time period suggested that measurement error can lead to significant overestimates of recovery rates even when
this more cautious strategy is applied (Fergusson, Horwood, Caspi, Moffitt, & Silva, 1996). However, the chief source of
difficulty was not in how change was examined, but in the fact that the question of measurement error had not been explored
sufficiently by the original investigators at the time of the children’s original diagnoses. Careful analysis by Fergusson
and his colleagues suggested that the overidentification of many children at their first testing, due to a lack of
appreciation of testing error, was the villain. It remains an empirical question whether the findings of
Fergusson et al. are echoed in the identification of children as having a language impairment. However, I include this
brief description of their work here as a cautionary tale suggesting that careful use of norm-referenced measures in
assessing change begins with their careful use in identification processes.
In short, despite their frequent use for the assessment of change, norm-referenced tests are most useful when broad
changes are expected and when clinicians are careful to avoid the several problems that can undermine the validity of
their use for this purpose as well as for purposes of identification.
Standardized Criterion-Referenced Measures
Because criterion-referenced measures are more often developed so that they exhaustively examine knowledge within a
given domain, they have been hailed as superior to norm-referenced measures for purposes of examining change
(Carver, 1974; McCauley, 1996; McCauley & Swisher, 1984). However, their relative rarity (as shown by the sampling
of such tools in Table 10.1) means that their value in assessing language change in children has not been extensively
evaluated.
Clinicians need to examine documentation for such measures to determine whether the author has presented a reasonable
evidence base supporting their use to examine change over time. Especially desirable is evidence suggesting that changes
in performance of specific magnitudes are likely to reflect significant functional changes in performance. Nonetheless,
where they are used as a simple description of the specific content on which gains have been achieved, such evidence is
not as critical.
Probes and Other Informal Criterion-Referenced Measures
As argued throughout this book, probes have a relative advantage in their malleability to the specific clinical questions
posed by the speech-language pathologist. Thus, they can be devised or selected to address very specific questions about
change that coincide with the very focus of treatment for a given child. That they are often relatively brief and
straightforward in interpretation represents a further advantage.
To contemplate the possible pitfalls of the use of probes, however, readers need only return to their description in chapter
10. Without the considerable effort entailed in standardization, clinician-devised probes or probes that are borrowed from
other nonstandardized sources are unknown with respect to reliability and validity. Although their possible fit to the
question being asked presents a great potential for excellent construct validity, the tendency for probes to be haphazardly
constructed,
Page 314
administered, and interpreted represents a potentially devastating threat to that potential. Because of the expectation that
repeated use of probes will be required if they are to be used to assess change, the standardization strategies described in
Figure 10.1 become particularly vital defenses against those threats.
Dynamic Assessment Methods
The growing literature aimed at exploring the utility of dynamic assessment methods in predicting readiness for language
change (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996) supports a hopeful but questioning
view regarding the uniformity with which such techniques succeed. Although by definition such methods are intended to
elicit conditions that change a child’s likelihood of acquiring a more mature behavior, they may at times provide no more
than transient predictions whose short tenure makes them of lesser value for signaling treatment focus. Nonetheless, their predictive value in specific domains and for specific clients warrants further investigation. In the
meantime, their greatest promise appears to lie in the insights they provide regarding how intervention might best take
place and in providing more valid assessments for children who are highly reactive to testing. There are also numerous
suggestions that they promise to provide more valid assessments than other available methods for children from diverse
backgrounds who may lack the experiences assumed by more conventional testing methods.
Single Subject Designs
In their ground-breaking work on the application of single subject experimental designs to speech-language pathology,
McReynolds and Kearns (1983) noted that such designs had the promise of wide application by clinicians because of
their practicality and clinical relevance. Despite their wide acceptance as an alternative method of scientific inquiry,
however, such designs have been resisted by speech-language clinicians in daily practice—probably because their
practicality falls short of that demanded by most clinical settings. Nonetheless, they remain the strongest available
method when the clinical question at hand centers on whether treatment is the likely cause of observed changes in
behavior.
The most frequently used measures in single subject designs are elicited probes and other informal measures, which are
referred to as dependent measures in this context. These informal measures often lack the documentation regarding
validity and reliability that can adorn more formal measures. Nonetheless, their use is strengthened by their close tie to
the specific construct for which they have been created or selected. Ideally, they represent highly defensible
operationalizations of the behavior or ability of interest. Their use is further strengthened when measures of inter- and
intraexaminer agreement, or other basic measures aimed at demonstrating reliability, are obtained. They can also be
enhanced by blind measurement procedures in which the person making the measurement is unaware of the purpose it
will serve or, ideally, the individual on whom it was obtained (Fukkink, 1996).
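The interexaminer agreement just mentioned is often summarized as point-by-point percent agreement. The sketch below uses hypothetical item scores from two examiners; in practice, chance-corrected statistics (e.g., Cohen's kappa) may also be reported.

```python
def point_by_point_agreement(rater_a, rater_b):
    """Percent agreement between two examiners scoring the same
    probe items (1 = correct, 0 = incorrect)."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both examiners must score the same items")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)

# Hypothetical scoring of 10 probe items by two examiners
# who disagree on a single item:
examiner_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
examiner_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(point_by_point_agreement(examiner_1, examiner_2))  # prints: 90.0
```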
Page 315
As part of the systematic structuring of observations that underlies the rationale behind single subject designs, dependent
measures are obtained frequently and can thus provide persuasive evidence of consistency or change. In addition, the
temporal structure of such designs is intended to provide logical support for the role of treatment versus alternative
explanations as agents of change. On the basis of these ideals, single subject experimental designs have been lauded for their ability to provide superior evidence not only about causation at the level of the individual but also about both the outcome and process of treatment (McReynolds & Kearns, 1983; McReynolds & Thompson, 1986).
A simple consideration of a few of the books on the subject suggests that detailed discussion of the methods and logic
supporting the application of single subject designs in communication disorders is well beyond the scope of this book (e.g., Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983). Nonetheless, a
simple example can be used to illustrate the logic that supports causal interpretation of such designs and thus their
potential for addressing the question of whether treatment is likely to be responsible for a child’s behavioral change. The
example shown in Fig. 11.2 is a hypothetical one from Bain and Dollaghan (1991). It was described previously for
its use of control, generalization, and treatment probes. It is described here for the way in which the stability of data,
timing of treatment, and demonstrations of change lead one to the conclusion that observed changes probably resulted
from treatment.
As you look at Fig. 11.2, notice first the top graph, in which probes for the primary focus of treatment (Agent + Action)
are studied first without the presence of treatment during a baseline condition. Because the baseline is clearly
unchanging, it is reasonable to conclude that factors such as maturation, informal instruction by a parent, and so forth are
not playing a role in the child’s acquisition of the target form prior to the initiation of treatment. Although the initiation
of treatment does not result in instantaneous change, change does occur over the course of the treatment interval. Further,
that change seems likely to be due to the effects of treatment rather than alternative explanatory factors because of the
implausibility that such factors would commence by chance in such close proximity to the onset of treatment. Whereas in
most single subject designs, the period labeled “withdrawal” is considered a second baseline, here it is described as
withdrawal because the experimenter would probably expect some additional growth (generalization) due to learning
effects. This kind of design in which treatment is absent, then present, then absent again is often referred to as an ABA or
withdrawal design.
In classical applications, ABA designs are often avoided when an effective treatment would be expected to show “carryover” in this way. Instead, such designs are more typically used for behaviors that are
expected to return to baseline when treatment is ended. When language development is studied, however, the presence of
generalization is not considered a serious detractor from the logic of an experiment when it occurs as part of a set of
predictions made in advance by the clinician or experimenter.
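The logic just described (a stable baseline, change after treatment onset, and maintained gains during withdrawal) can be sketched numerically. The probe values below are invented for illustration; actual single subject data are typically evaluated by visual inspection of graphed data rather than by summary statistics alone.

```python
# Hypothetical probe scores (% correct) for an ABA (withdrawal) design.
baseline   = [10, 12, 11, 10, 12]   # A: stable and low before treatment
treatment  = [15, 25, 40, 55, 70]   # B: accelerating after treatment begins
withdrawal = [68, 72, 75]           # A: gains maintained (carryover)

def mean(xs):
    return sum(xs) / len(xs)

def is_stable(xs, tolerance=5):
    """Crude stability check: every point within `tolerance`
    percentage points of the phase mean."""
    m = mean(xs)
    return all(abs(x - m) <= tolerance for x in xs)

print(is_stable(baseline))                  # flat baseline: maturation unlikely
print(mean(treatment) - mean(baseline))     # sizable level change after onset
print(mean(withdrawal) >= mean(treatment))  # carryover, not return to baseline
```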
In the second graph of Fig. 11.2, a second dependent measure (or generalization probe), Action + Object, is observed
with the expectation that its relationship to the targeted variable, Agent + Action, will cause some developmental change
to occur
Page 316
Fig. 11.2. A hypothetical multiple baseline single subject design that makes use of treatment (Agent + Action), generalization
(Action + Object), and control (Entity + Locative) probes (Graphs 1, 2, and 3, respectively). From “The Notion of Clinically
Significant Change,” by B. A. Bain and C. A. Dollaghan, 1991, Language, Speech, and Hearing Services in Schools, 22, p. 266.
Copyright 1991 by the American Speech-Language-Hearing Association. Reprinted with permission.
during treatment and possibly beyond. However, the presence of an initial period of stability prior to the onset of change in this
measure is again helpful in strengthening the plausibility of the argument that the observed change is likely to result from the
treatment rather than other factors. In addition, that argument is strengthened if the generalization probe does not improve to the same
extent as the target probe, or does so following a delay relative to the actual target of treatment.
In the third graph of Fig. 11.2, the control probe, Entity + Locative, is shown with a stable but longer baseline, thus indicating that
extraneous variables are unlikely to
Page 317
be acting on the child’s language development for the entire duration of the baseline. It is important that the baseline for
this variable, which was predicted to be unaffected by generalization, remained stable throughout the entirety of
treatment directed at Agent + Action and its withdrawal period in order to support the treatment effect on the other
variables. As importantly, it begins to show improvement only after the initiation of treatment in which it has become the
direct target.
The practical requirements in terms of data collection and display are not inconsequential for single subject designs.
However, as this example illustrates, they do not have to be overly burdensome either, with the chief investment here
being the periodic (and staggered) collection of probe data for two additional forms. This cost seems well worth it when
weighed against the value of evidence documenting the effectiveness of the treatment used for two different targets and
of real-time insights into the generalization patterns of the individual child.
In addition to numerous books dealing more comprehensively with the large number of designs that can be applied in
clinical settings (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983), a set of
three classic articles (Connell & Thompson, 1986; Kearns, 1986; McReynolds & Thompson, 1986) represents a
wonderful initiation to the promise such designs hold for clinicians interested in children’s language disorders.
Practical Considerations
With regard to assessing change, the largest practical consideration appearing on the horizon has been the presence of
professional and societal forces urging clinicians to find measures that document the value of what they do on a broader
scale and with greater regularity. Therefore, although other practical issues exist as very real pressures on clinicians’
decision making regarding all of the areas of change discussed in this chapter, the issue of outcome measurement seems
to warrant the full attention of the remaining pages of this chapter and, indeed, the concluding pages of this book.
In speech-language pathology, interest in how language treatment affects children has been around for quite some time
(e.g., Schreibman & Carr, 1978; Wilcox & Leonard, 1978). However, a continuing complaint has been that not enough such research on treatment is being done (e.g., McReynolds, 1983; Olswang, 1998), and that the research that is being done involves treatment procedures that, although useful for purposes of scientific rigor, cannot readily be applied to real clinical settings. Thus, the generalizability of a small research base has been at issue. Nonetheless, existing treatment
research has provided at least some preliminary evidence of the effectiveness of treatment extending beyond the level of
the individual clinician.
More recently, interest in accountability (e.g., Eger, 1988; Eger et al., 1986; Mowrer, 1972) has arisen at a grassroots
level because of growing demands from individual consumers and their advocates. This interest has been joined in an
intense top-down fashion by ASHA as it responds to protect its members’ roles in fast-changing health care and
educational systems (Frattali, 1998a,b; Hicks, 1998). In a chapter addressing the specific nature of top-down pressures
necessitating greater attention to outcomes
Page 318
assessment in speech-language pathology, Hicks (1998) described at least three sources of influence to which the
profession must respond:
1. accrediting agencies (e.g., the Rehabilitation Accreditation Commission; the Joint Commission on Accreditation of
Healthcare Organizations, JCAHO; ASHA’s Professional Services Board, PSB);
2. payer requirements (e.g., Medicare; Medicaid; and Managed Care Organizations, MCOs); and
3. legislative and regulatory requirements (e.g., the Omnibus Budget Reconciliation Act of 1987, Public Law 100-203, and the Social Security Act, Part 484).
At first glance, these forces would seem to come primarily from those clinical settings that serve adults and, thus, it
might be thought that they would not affect clinicians who work with children in primarily educational settings.
However, as appreciation of the value of outcomes measures has become more widespread and as the great divide
between education and healthcare breaks down (as illustrated in Medicaid funding for some children enrolled in school
programs), the blissful luxury of considering treatment outcomes someone else’s challenge has all but disappeared. Eger
(1998) noted that Congress’s passing of the Education of All Handicapped Children Act of 1975 (P.L. 94-142) served as
a possible precursor to formal outcomes measurement activities in special education because it included as one of its four
main goals the assessment and assurance of educational effectiveness. The passage of the 1997 amendments to IDEA (P.L. 105-17) further reinforces the importance of continued development in this area. In order to respond to the challenges
facing the professions across settings, ASHA has begun the development of treatment outcomes measures that can be
used by groups of clinicians to document their value and provide a basis for comparisons by important groups (e.g.,
school districts, third-party payers).
At this point, readers who are unfamiliar with the terminology that accompanies outcomes measurement may feel a tad
bewildered. Therefore, some background on the relationship between treatment efficacy research and treatment
outcomes research seems in order. Despite some important underlying similarities and overlapping methods, an
important distinction can be made between these two terms (Frattali, 1998a; Olswang, 1998). Olswang (1998) pointed
out that both efficacy research and outcome research represent strategies for examining the influence of treatment on
individuals with communication disorders. Nonetheless, whereas efficacy research emphasizes the importance of
documenting treatment as a cause for change, outcomes research emphasizes the benefits associated with treatment as it
is administered in real-world circumstances. Frattali (1998a) described the distinction quite succinctly by saying that
“efficacy research is designed to prove,” whereas “outcomes research can only identify trends, describe, or make
associations or estimates” (p. 18). Whereas past efficacy research has focused primarily on the behaviors that fall at the
impairment level in terms of the ICIDH classification system, a broadening of concerns to embrace behaviors falling at
the levels of disability and handicap is an emerging trend (Olswang, 1998).
Page 319
Treatment efficacy is often defined as encompassing treatment effectiveness, efficiency, and effects (e.g., see Kreb &
Wolf, 1997; Olswang, 1990, 1998). Treatment effectiveness refers to the traditional idea of whether or not a given
treatment is likely to be responsible for observed changes in behavior. Treatment efficiency refers to the relative
effectiveness of several treatments or to the role of components of a treatment in contributing to its effectiveness.
Finally, treatment effects refers to the specific changes that can be seen in a constellation of behaviors in response to a
given treatment. Similar components have also been identified as falling within the province of treatment outcomes as
well (Kreb & Wolf, 1997).
Whereas treatment efficacy research is usually conducted under optimal conditions, or at least well-controlled clinical
conditions, outcomes measurement is, by definition, conducted under typical conditions (Frattali, 1998b; Olswang,
1998). On the downside, this means treatment outcomes research will almost never be able to contribute to arguments
about cause-and-effect relationships between treatments and observed benefits. Nonetheless, outcomes research will almost
always be in a better position than treatment efficacy research to address concerns about the value of services offered to
professional constituencies (e.g., within a given hospital or school district). Consequently, outcomes research has a very
special value to individual clinicians. It can enable them to demonstrate accountability not in the abstract, based on
treatments conducted solely by other clinician–researchers working under controlled conditions, but by comparing their
own outcomes with those obtained by others through participation in the large-scale, multi-site efforts that are
characteristic of such research.
In 1997, the National Center for Treatment Effectiveness in Communication Disorders began work on a database that
will involve clinicians in the collection of outcomes data on a national basis. This complex database, the National
Outcomes Measurement System (NOMS), will eventually include information about all of the populations served by
speech-language pathologists and audiologists. Currently, however, NOMS is limited to information about adults seen in
healthcare settings, preschool children who are served in school or healthcare settings, and children in kindergarten
through the sixth grade who are seen in schools. (Note that data concerning infant hearing screenings are just beginning
to be collected.) In order to participate, school-based clinicians work cooperatively to provide data for a given school
system in which at least 75% of the speech-language pathologists hold ASHA certification and in which all students will
be included in the data that are collected. These two restrictions are designed to improve the quality and
representativeness of the data.
For schools, data for the NOMS are collected at the beginning and conclusion of services, or at the beginning and end of
the school year, with data collection procedures designed to take no more than 5 to 10 minutes per child. Data include
information about demographics, eligibility for services, the nature of treatment (i.e., model of services, amount, and
frequency of services), teacher and family satisfaction, and the results of the Functional Communication Measures
(FCMs), a set of 7-point scales developed by ASHA. The scales address functional performance within the educational environment and include items such as “The student responds to questions regarding everyday and classroom activities”
and “The student knows and uses age-appropriate
Page 320
interaction with peers and staff.” These items are rated on the following scale: 0 = No basis for rating; 1 = Does not do;
2 = Does with maximal assistance; 3 = Does with moderate to maximal assistance; 4 = Does with moderate assistance; 5
= Does with minimal to moderate assistance; 6 = Does with minimal assistance; and 7 = Does.
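To make the scale concrete, the sketch below records hypothetical admission and discharge FCM ratings for one student. The item names are abbreviated paraphrases and the ratings are invented for illustration; they do not reproduce any actual NOMS record.

```python
# FCM rating labels (0 = no basis for rating ... 7 = does independently).
fcm_labels = {
    0: "No basis for rating", 1: "Does not do",
    2: "Does with maximal assistance",
    3: "Does with moderate to maximal assistance",
    4: "Does with moderate assistance",
    5: "Does with minimal to moderate assistance",
    6: "Does with minimal assistance", 7: "Does",
}

# Hypothetical ratings for one student at the start and end of a school year:
admission = {"responds to questions": 3, "age-appropriate interaction": 4}
discharge = {"responds to questions": 5, "age-appropriate interaction": 6}

for item in admission:
    gain = discharge[item] - admission[item]
    print(f"{item}: {fcm_labels[admission[item]]} -> "
          f"{fcm_labels[discharge[item]]} (gain of {gain} levels)")
```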
ASHA’s goals for the NOMS are lofty. Besides demonstrating positive outcomes for children receiving speech-language
pathology services, it is hoped that the NOMS will facilitate administrative planning (e.g., caseload assignments) as well
as individual decisions about intervention. Among particular aspirations are that it will provide information about when
intervention is most effective, how much progress can be expected over an academic year, what service delivery model
and frequency of service result in the greatest gains for a given kind of communication disorder, and what entrance and
dismissal criteria are reasonable. In addition, it is hoped that comparative NOMS data might allow individual school
systems or groups of school systems to demonstrate their effectiveness and efficiency in ways that will help them
negotiate in an era of strained educational resources. The success of the system in meeting these goals will depend
greatly on widespread participation allowing the representative samples required for specific generalizations such as
those just described. In terms of the utility of the system for providing comparative data across school systems or units, a
greater tailoring of reports available to participants may be necessary before those aspirations can be actualized.
Beyond the NOMS, Eger (1998) described numerous ways in which an outcomes approach can be incorporated within
school practice. These range from simple modifications of the way goals and objectives are written for individualized
educational plans (IEPs) to the development of empirically motivated dismissal criteria to more elaborate investigations
of effectiveness of specific service delivery models (e.g., classroom-based interventions, self-contained classroom).
These three examples run the gamut from those that can be implemented by the individual clinician to those requiring
more extensive resources, akin to those required by the NOMS.
In terms of how the individual speech-language pathologists can modify the IEPs they write, Eger (1998) provided an
example. She noted that a goal that might currently be written as “The student will improve expressive language skills”
could be replaced with one or more of the following: “The student will apply problem-solving and decision making skills
in math and English classes,” “The student will use language to create dialogues with teachers and peers to facilitate
learning,” or “The student will be able to follow written directions on objective tests” (Eger, 1998, p. 447).
Regardless of whether speech-language pathologists working with children actively work to include an outcomes
perspective in their practice, the outcomes movement will undoubtedly drive extensive changes in clinical practice over
the next decade, especially as these relate to the documentation of change in children’s communication. Responsible
reactions to these changes will depend on sensitivity to the measurement virtues (i.e., functionality and the development
of common best practices) as well as to the measurement perils. Many of these perils are shared with all measurement
strategies, such as concerns about the quality of data collection at its source and the size of the sample used for any
particular decision. Some, however, are unique to such a large undertaking—the relinquishment of decisions about how
interpretation
Page 321
will take place and, thus, the possible relinquishment of feelings of personal responsibility as well. Still, it is an exciting
time for measurement in communication disorders, one in which sizeable resources may finally be funneled to some of
the questions that most trouble speech-language pathologists. The desired outcome of such investments is the
proliferation of innovative measurement strategies and refinement of existing tools to help us arrive at a sophisticated
armamentarium of tools for addressing our clinical questions.
Summary
1. The assessment of change underlies both critical and commonplace decisions made in the management of children’s
language disorders. These include decisions about individuals, such as when to begin and end treatment and whether
treatment tactics should be altered during the course of treatment.
2. When questions of treatment efficacy and accountability are raised, the assessment of change can also fuel decisions
about the relative merit of various treatment approaches or the relative productivity of groups of clinicians.
3. Three types of outcomes observed in clinical settings include ultimate outcomes, intermediate outcomes, and
instrumental outcomes. Whereas ultimate outcomes relate to decisions about treatment termination, intermediate and
instrumental outcomes relate to clinical decisions made during the course of treatment.
4. Measurement error presents an especially difficult challenge to interpretation when measures are examined at multiple
points in time, such as when past change is examined or future change is predicted.
5. Clinically significant change must not only be reliable, it must also represent an important change to the life of the
child. Three methods used to address whether an observed change is likely to be important involve considerations of
effect size, social validation, and the use of multiple measures.
6. Determining that positive changes in a child’s language are caused by treatment is made extraordinarily difficult by
the thankfully unavoidable but nonetheless confounding influences of growth and development. Increased understanding
of those influences within and across children is needed to help address this very thorny measurement problem.
7. Single subject experimental designs offer clinicians the best currently available means for demonstrating that
treatment is responsible for observed changes, but have thus far been used primarily by researchers.
8. Measurement elements strengthening arguments that treatment is the cause of observed changes include the presence
of pretreatment baselines and the use of treatment, generalization, and control probes.
9. Treatment efficacy research is concerned with documenting whether treatment is effective, efficient, and whether the
effects of treatment extend to a number of significant behaviors.
Page 322
10. Treatment outcomes research is designed to demonstrate benefits associated with treatment as it is conducted in
everyday contexts. Cooperation from all members of the profession is needed to collect some kinds of particularly
persuasive treatment outcomes data, such as those being collected in the NOMS database by ASHA.
Key Concepts and Terms
clinically significant change: a change that makes an immediate impact on the communicative life of a child or that
represents significant progress toward the acquisition of critical aspects of language.
effect size: the magnitude of the difference between two scores or sets of scores, or of the correlation between two sets of
variables.
Functional Communication Measures (FCMs): one of several rating scales designed by ASHA for use in tracking
functional communication gains made by clients.
gain scores: the difference between scores obtained by an individual at two points in time when that difference represents
a positive change in performance; also called difference scores.
instrumental outcomes: individual behaviors acquired during treatment that suggest the likelihood of additional change;
generalization probe data function as instrumental outcomes.
intermediate outcomes: individual behaviors that must be acquired for progress in treatment to have occurred; treatment
probe data can function as intermediate outcomes.
National Outcomes Measurement System (NOMS): an outcomes database for speech-language pathology and audiology
that is being developed to address the professions’ need for large-scale outcomes data.
outcome measurement: the use of measures designed to describe the effects of treatment conducted under typical, rather than controlled, conditions.
Proportional Change Index (PCI): a method for examining the rate of change observed in a given behavior during
treatment relative to that observed prior to treatment.
single subject experimental designs: a group of related research designs that permit the user to support claims of causal
relationship between variables, such as the effect of treatment on a targeted behavior.
social comparison: a social validation method that involves the use of a comparison between language behaviors of a
given child or group of children and those of a small group of peers.
social validation: methods used to indicate the social importance of changes occurring in treatment.
Page 323
subjective evaluation: a social validation method in which procedures are used to determine whether individuals who
interact frequently with a child who is receiving treatment perceive observed changes as important.
treatment effectiveness: the demonstration that a treatment, rather than other variables, is responsible for changes in
behavior (Kreb & Wolf, 1997; Olswang, 1990).
treatment effects: changes in multiple behaviors that appear to result from a given treatment (Olswang, 1990).
treatment efficacy research: research designed to demonstrate the complex property of a treatment that includes its
effectiveness, efficiency, and effects (Olswang, 1990, 1998).
treatment efficiency: the effectiveness of a treatment relative to an alternative; a more efficient treatment is one in which
goals are accomplished more rapidly, completely, or more cost-effectively than a less efficient treatment (Olswang,
1990).
ultimate outcomes: individual behaviors that signal successful treatment, either because age-appropriate or functionally
adequate levels of performance have been achieved or because further treatment would be unlikely to yield significant
additional gains.
Study Questions and Questions to Expand Your Thinking
1. Arrange to see a clinical case file for a child who is receiving treatment for a language disorder. List the ways in which
change is currently documented. Consider ways in which that documentation might be strengthened, including how
efforts might be made to address changes in educational or social function as well as in the nature of the impairment.
2. Discuss the advantages and disadvantages of using a standard battery of norm-referenced tests to look at a child’s
overall language functioning over time. If you were to devise such a battery, what would you look for in its components?
Would that battery differ on the basis of the etiology of the disorder? If so, how?
3. With regard to the different tools that might be used to examine change, discuss how you might explain each method
to a child’s parents.
4. Visit the web site for the NOMS at http://www.asha.org/nctecd/treatment_outcomes.htm. Determine what barriers
might exist to participating in the NOMS. On the basis of the information you obtained in this chapter and through that
web site, what arguments might be made to justify efforts to overcome these barriers?
5. Look at the treatment efficacy studies for child language disorders collected at the NOMS web site under the Efficacy
Bibliographies link. On the basis of the information you can glean from reading the titles of articles listed there, which
aspects of treatment efficacy seem to have received the greatest attention?
6. On the basis of what you know about clinical decisions regarding change, discuss specific changes that might warrant
the use of a method such as a single subject
Page 324
design or social validation techniques. Although these methods are more complex than some other methods, they have
the respective advantages of demonstrating the clinician’s responsibility for change or the social impact of change.
Recommended Readings
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing
Services in Schools, 22, 264–270.
Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical
Psychology, 67, 332–339.
Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In R. A. Kreb & K. E. Wolf (Eds.), Successful
operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-
Language-Hearing Association.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
References
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing
Services in Schools, 22, 264–270.
Bain, B. A., & Olswang, L. B. (1995). Examining readiness for learning two-word utterances by children with specific
expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4,
81–91.
Bernthal, J. E., & Bankson, N. W. (1998). Articulation and phonological disorders (4th ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Campbell, T., & Bain, B. A. (1991). Treatment efficacy: How long to treat: A multiple outcome approach. Language,
Speech, and Hearing Services in Schools, 22, 271–276.
Campbell, T., & Dollaghan, C. (1992). A method for obtaining listener judgments of spontaneously produced language:
Social validation through direct magnitude estimation. Topics in Language Disorders, 12 (2), 42–55.
Carver, R. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, 29, 512–518.
Connell, P. J., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part III: Using flexibility to
design and modify experiments. Journal of Speech and Hearing Disorders, 51, 214–225.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston:
Houghton Mifflin.
Diedrich, W. M., & Bangert, J. (1980). Articulation learning. Houston, TX: College-Hill Press.
Education for All Handicapped Children Act of 1975. Pub. L. No. 94–142. 89 Stat. 773 (1975).
Eger, D. (1988). Accountability in action: Entry, measurement, exit. Seminars in Speech and Language, 9, 299–319.
Eger, D. (1998). Outcomes measurement in the schools. In C. Frattali (Ed.), Measuring outcomes in speech-language
pathology (pp. 438–452). New York: Thieme.
Eger, D., Chabon, S. S., Mient, M. G., & Cushman, B. B. (1986). When is enough enough? Articulation therapy
dismissal considerations in the public schools. Asha, 28, 23–25.
Elbert, M., Shelton, R. L., & Arndt, W. B. (1967). A task for evaluation of articulation change: I. Development of
methodology. Journal of Speech and Hearing Research, 10, 281–289.
Fergusson, D. M., Horwood, L. J., Caspi, A., Moffitt, T. E., & Silva, P. A. (1996). The (artefactual) remission of reading
disability: Psychometric lessons in the study of stability and change in behavioral development. Developmental
Psychology, 32, 132–140.
Page 325
Fey, M. (1988). Generalization issues facing language interventionists: An introduction. Language, Speech, and Hearing
Services in Schools, 19, 272–281.
Foster, S. L., & Mash, E. J. (1999). Assessing social validity in clinical treatment research: Issues and procedures.
Journal of Consulting and Clinical Psychology, 67, 308–319.
Franklin, R. D., Allison, D. B., & Gorman, B. S. (Eds.). (1996). Design and analysis of single-case research. Mahwah,
NJ: Lawrence Erlbaum Associates.
Franklin, R. D., Gorman, B. S., Beasley, T. M., & Allison, D. B. (1996). Graphical display and visual analysis. In R. D.
Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 119–158). Mahwah,
NJ: Lawrence Erlbaum Associates.
Frattali, C. (1998a). Measuring modality-specific behaviors, functional abilities, and quality of life. In C. Frattali (Ed.),
Measuring outcomes in speech-language pathology (pp. 55–88). New York: Thieme.
Frattali, C. (Ed.). (1998b). Measuring outcomes in speech-language pathology. New York: Thieme.
Fukkink, R. (1996). The internal validity of aphasiological single-subject studies. Aphasiology, 10, 741–754.
Glesne, C., & Peshkin, A. (1992). Becoming qualitative researchers: An introduction. White Plains, NY: Longman.
Goldfried, M. R., & Wolfe, B. E. (1998). Toward a more clinically valid approach to therapy research. Journal of
Consulting and Clinical Psychology, 66, 143–150.
Goldstein, H., & Gierut, J. (1998). Outcomes measurement in child language and phonological disorders. In C. Frattali
(Ed.), Measuring outcomes in speech-language pathology (pp. 406–437). New York: Thieme.
Gorman, B. S., & Allison, D. B. (1996). Statistical alternatives for single-case designs. In R. D. Franklin, D. B. Allison,
& B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 159–214). Mahwah, NJ: Lawrence Erlbaum
Associates.
Hicks, P. L. (1998). Outcomes measurement requirements. In C. Frattali (Ed.), Measuring outcomes in speech-language
pathology (pp. 28–49). New York: Thieme.
Individuals with Disabilities Education Act (IDEA) Amendments of 1997. Pub. L. 105–17. 111 Stat. 37 (1997).
Jacobson, N. S., Roberts, L. J., Berns, S. B. & McGlinchey, J. B. (1999). Methods for defining and determining the
clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical
Psychology, 67, 300–307.
Kamhi, A. (1991). Clinical forum: Treatment efficacy, an introduction. Language, Speech, and Hearing Services in
Schools, 22, 254.
Kazdin, A. E. (1977). Assessing the clinical or applied significance of behavioral change through social validation.
Behavior Modification, 1, 427–452.
Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical
Psychology, 67, 332–339.
Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments.
Journal of Consulting and Clinical Psychology, 66, 19–36.
Kearns, K. P. (1986). Flexibility of single-subject experimental designs. Part II: Design selection and arrangement of
experimental phases. Journal of Speech and Hearing Disorders, 51, 204–214.
Koegel, R., Koegel, L. K., Van Voy, K., & Ingham, J. (1988). Within-clinic versus outside-of-clinic self-monitoring of
articulation to promote generalization. Journal of Speech and Hearing Disorders, 53, 392–399.
Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and
education. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In R. A. Kreb & K. E. Wolf (Eds.), Successful
operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-
Language-Hearing Association.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Long, S. H., & Olswang, L. B. (1996). Readiness and patterns of growth in children with SELI. Language, Speech, and
Hearing Services in Schools, 5, 79–85.
Maloney, D. M., Harper, T. M., Braukmann, C. J., Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1976). Teaching
conversation-related skills to pre-delinquent girls. Journal of Applied Behavior Analysis, 9, 371.
Page 326
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical
case. Journal of Speech and Hearing Disorders, 49, 338–348.
McReynolds, L. V. (1983). Discussion: VII. Evaluating program effectiveness. ASHA Reports, 12, 298–306.
McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Austin,
TX: Pro-Ed.
McReynolds, L. V., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part I: Review of the
basics of single-subject designs. Journal of Speech and Hearing Disorders, 51, 194–203.
Mehrens, W., & Lehman, I. (1980). Standardized tests in education (3rd ed.). New York: Holt, Rinehart & Winston.
Minkin, N., Braukmann, C. J., Minkin, B. L., Timbers, G. D., Timbers, B. J., Fixsen, D. L., Phillips, E. L., & Wolf, M.
M. (1976). The social validation and training of conversational skills. Journal of Applied Behavior Analysis, 9, 127–
139.
Mowrer, D. (1972). Accountability and speech therapy in the public schools. Asha, 14, 111–115.
Olswang, L. B. (1990). Treatment efficacy research: A path to quality assurance. Asha, 32, 45–47.
Olswang, L. B. (1993). Treatment efficacy research: A paradigm for investigating clinical practice and theory. Journal of
Fluency Disorders, 18, 125–131.
Olswang, L. B. (1998). Treatment efficacy research. In C. Frattali (Ed.), Measuring outcomes in speech-language
pathology (pp. 134–150). New York: Thieme.
Olswang, L. B., & Bain, B. A. (1985). Monitoring phoneme acquisition for making treatment withdrawal decisions.
Applied Psycholinguistics, 6, 17–37.
Olswang, L. B., & Bain, B. A. (1994). Data collection: Monitoring children’s treatment progress. American Journal of
Speech-Language Pathology, 3, 55–66.
Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production.
Journal of Speech and Hearing Research, 39, 414–423.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it.
In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and
education (pp. 15–40). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Plante, E., & Vance, R. (1994). Selection of preschool speech and language tests: A data-based approach. Language,
Speech, and Hearing Services in Schools, 25, 15–23.
Primavera, L. H., Allison, D. B., & Alfonso, V. C. (1996). Measurement of dependent variables. In R. D. Franklin, D. B.
Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 41–89). Mahwah, NJ: Lawrence
Erlbaum Associates.
Rosen, A., & Proctor, E. K. (1978). Distinctions between treatment outcomes and their implications for treatment
process: The basis for effectiveness research. Journal of Social Service Research, 2, 25–43.
Rosen, A., & Proctor, E. K. (1981). Distinctions between treatment outcomes and their implications for treatment
evaluation. Journal of Consulting and Clinical Psychology, 49, 418–425.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms
suggest new concepts for training. Psychological Science, 3, 207–217.
Schreibman, L., & Carr, E. G. (1978). Elimination of echolalic responding to questions through the training of a
generalized verbal response. Journal of Applied Behavior Analysis, 11, 453–463.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals 3. San Antonio, TX:
Psychological Corporation.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Page 327
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge: Harvard
University Press.
Wilcox, M. J., & Leonard, L. B. (1978). Experimental acquisition of Wh-questions in language-disordered children.
Journal of Speech and Hearing Research, 21, 220–239.
Wolery, M. (1983). Proportional change index: An alternative for comparing child change data. Exceptional Children,
50, 167–170.
Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its
heart. Journal of Applied Behavior Analysis, 11, 203–214.
Young, M. A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and
Hearing Research, 36, 644–656.
Page 328
APPENDIX A
Page 329
Norm-Referenced Tests Designed for the Assessment of Language in Children,
Excluding Those Designed Primarily for Phonology (Appendix B)
Page 330
Test | Ages | Modalities and Domains | Related Areas | Reference | MMY Review
Test of Adolescent and Adult Language–3 | 12 to 21 years | R- and E-Sem, Morph, Syn | Writing: Sem, Syn | Hammill, D. D., Brown, V. L., Larsen, S. C., & Wiederholt, J. L. (1994). Test of Adolescent and Adult Language–3. Austin, TX: Pro-Ed. | x
Test of Adolescent/Adult Word Finding | 12 to 80 years | E-WF | no | German, D. J. (1990). Test of Adolescent/Adult Word Finding. San Antonio, TX: Psychological Corporation. | x
Test of Auditory Comprehension of Language–3 | 3 years to 9 years, 11 months | R-Sem, Morph, Syn | no | Carrow-Woolfolk, E. (1999). Test of Auditory Comprehension of Language–3. Austin, TX: Pro-Ed. | no
Test of Children’s Language | 5 years to 8 years, 11 months | E | Reading, writing | Barenbaum, E., & Newcomer, P. (1996). Test of Children’s Language. San Antonio, TX: Pro-Ed. | x
Test of Early Language Development | 3 years to 7 years, 11 months | R- and E-Sem, Syn | no | Hresko, W. P., Reid, K., & Hammill, D. D. (1991). Test of Early Language Development (2nd ed.). Austin, TX: Pro-Ed. | x
Test of Language Competence—Expanded | 5 to 18 years, 11 months | R- and E-Sem, Syn, Prag | no | Wiig, E. H., & Secord, W. (1989). Test of Language Competence—Expanded Edition. San Antonio, TX: Psychological Corporation. | x
Test of Language Development—Intermediate: 3 | 8 years to 12 years, 11 months | R- and E-Sem, Syn | no | Hammill, D. D., & Newcomer, P. L. (1997). Test of Language Development—Intermediate: 3. Circle Pines, MN: American Guidance Service. | x
Test of Language Development—Primary: 3 | 4 years to 8 years, 11 months | R- and E-Phon, Sem, Syn | no | Newcomer, P., & Hammill, D. (1997). Test of Language Development—Primary: 3. Austin, TX: Pro-Ed. | x
Test of Pragmatic Language | 5 to 13 years, 11 months | R- and E- | no | Phelps-Terasaki, D., & Phelps-Gunn, T. (1992). Test of Pragmatic Language. San Antonio, TX: Psychological Corporation. | x
Test of Pragmatic Skills (Revised) | 3 to 8 years | R- and E-Sem, Prag | no | Shulman, B. B. (1986). Test of Pragmatic Skills (Revised). Tucson, AZ: Communication Skill Builders. | no
Page 333
Test of Relational Concepts | 3 years to 7 years, 11 months | R-Sem | no | Edmonston, N., & Thane, N. L. (1988). Test of Relational Concepts. Austin, TX: Pro-Ed. | x
Test of Word Finding | 6½ to 12 years, 11 months | E-WF | no | German, D. J. (1989). Test of Word Finding. San Antonio, TX: Psychological Corporation. | x
Test of Word Finding in Discourse | 6½ to 12 years, 11 months | E-WF | no | German, D. J. (1991). Test of Word Finding in Discourse. Chicago: Riverside Publishing. | x
Test of Word Knowledge | 5 to 17 years | R- and E-Sem | no | Wiig, E. H., & Secord, W. (1992). Test of Word Knowledge. San Antonio, TX: Psychological Corporation. | x
Test of Written Expression | 6½ years to 14 years, 11 months | — | Writing | McGhee, R., Bryant, B. R., Larsen, S. C., & Rivera, D. M. (1995). Test of Written Expression. San Antonio: Pro-Ed. | x
Test of Written Language–2 | 17 years, 11 months | E | Writing | Hammill, D. D., & Larsen, S. C. (1988). Test of Written Language–2. San Antonio, TX: Psychological Corporation. | x
The Word Test–Adolescent | 12 years to 17 years, 11 months | E-Sem | no | Bowers, L., Huisingh, R., Orman, J., Barrett, M., & LoGiudice, C. (1989). The Word Test–Adolescent. East Moline, IL: LinguiSystems. | no
The Word Test–Revised Elementary | 7 to 11 years | E-Sem | no | Bowers, L., Huisingh, R., Barrett, M., LoGiudice, C., & Orman, J. (1990). The Word Test–Revised Elementary. East Moline, IL: LinguiSystems. | no
Token Test for Children | 3 to 12 years | R-Sem, Syn | no | DiSimoni, F. (1978). Token Test for Children. Chicago: Riverside. | MMY9
Utah Test of Language Development–3 | 3 years to 10 years, 11 months | R- and E-Syn | no | Mecham, M. J. (1989). Utah Test of Language Development–3. Austin, TX: Pro-Ed. | x
Woodcock Language Proficiency Battery–Revised | 2 to 95 years | R- and E-Sem, Syn | Reading, writing | Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised. Chicago: Riverside. | x
Note. Modalities and domains are abbreviated as follows: Receptive (R), Expressive (E), Semantics (Sem),
Morphology (Morph), Syntax (Syn), Pragmatics (Prag), Phonology (Phon), and Word Finding (WF). The presence of a
review in the Mental Measurements Yearbook (MMY) database or print series is noted in the final column, with x
indicating a computerized version and numerals representing the specific print volume containing the review.
aMitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental
Measurement.
bConoley, J. C., & Kramer, J. J. (Eds.). (1989). The tenth mental measurements yearbook. Lincoln, NE: Buros Institute
of Mental Measurement.
Page 334
APPENDIX B
Page 335
Norm-Referenced and Criterion-Referenced Tests Designed Primarily for the Assessment of Phonology in Children
Page 336
Note. The presence of a review in the Mental Measurements Yearbook (MMY) is noted, with x indicating a computerized
version and numerals representing the specific print volume containing the review.
aMitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental
Measurement.
bBuros, O. K. (Ed.). (1972). The seventh mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Page 338
Page 339
AUTHOR INDEX
Entries in italics appear in reference lists.
A
Abbeduto, L., 149, 155, 166
Abkarian, G., 160, 164
Aboitiz, F., 119, 141
Agerton, E. P., 273, 287
Aitken, K., 169, 186
Alcock, K., 118, 145
Alfonso, V. C., 264, 280, 291, 306, 326
Allen, D., 114, 130, 143, 171, 172, 173, 178, 186, 233
Allen, J., 175, 184
Allen, M. J., 22, 47, 55, 56, 57, 58, 59, 66, 68, 76
Allen, S., 231, 245
Allison, D. B., 264, 280, 291, 306, 307, 308, 315, 317, 325, 326
Ambrose, W. R., 222, 246
American College of Medical Genetics, 153, 164
American Educational Research Association (AERA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287
American Psychiatric Association, 111, 114, 115, 130, 134, 140, 148, 149, 150, 161, 164, 169, 170, 171, 172, 173, 178,
180, 181, 182, 183, 184
American Psychological Association (APA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 217, 228, 244, 252, 287
American Speech-Language-Hearing Association (ASHA), 82, 84, 85, 104, 107, 196, 207, 264, 287
Anastasi, A., 22, 47, 55, 60, 61, 62, 76, 96, 107, 296, 324
Andrellos, P. J., 237, 246
Andrews, J. F., 197, 198, 210
Angell, R., 181, 184
Annahatak, B., 231, 245
Apel, K., 241, 242, 245
Aram, D. M., 116, 117, 118, 119, 128, 130, 140, 143, 170, 184, 231, 240, 245, 270, 287
Archer, P., 213, 246
Arensberg, K., 229, 249
Arndt, S., 171, 185
Arndt, W. B., 259, 288, 294, 324
Aspedon, M., 238, 248
Augustine, L. E., 82, 108, 229, 230, 245
B
Bachelet, J. F., 270, 291
Bachman, L. F., 103, 107
Baddeley, A., 125, 141
Badian, N., 27, 47
Baer, D. M., 308, 326
Bailey, D., 237, 245
Page 340
Bain, B. A., 230, 248, 251, 252, 255, 256, 276, 277, 278, 279, 286, 287, 291, 294, 295, 296, 297, 298, 299, 300, 302,
303, 304, 305, 307, 308, 309, 310, 311, 314, 315, 316, 324, 326
Baker, K. A., 239, 247
Baker, L., 133, 140
Baker, N. E., 7, 13
Baker-van den Goorbergh, L., 269, 287
Ball, E. W., 137, 140
Balla, D., 163, 166, 215, 249
Baltaxe, C. A. M., 176, 184
Bangert, J., 259, 288, 294, 302, 324
Bankson, N. W., 29, 40, 47, 298, 324, 329, 335
Barenbaum, E., 332
Baron-Cohen, S., 175, 184
Barrett, M., 329, 333
Barrow, J. D., 252, 287
Barsalou, L. W., 7, 12
Barthelemy, C., 181, 185
Bashir, A., 135, 140, 241, 245
Bates, E., 237, 245, 246, 268, 287
Batshaw, M. L., 149, 164
Battaglia, F., 153, 154, 155, 160, 166
Baumeister, A. A., 147, 149, 152, 154, 156, 158, 164
Baumgartner, J. M., 120, 142
Beasley, T. M., 308, 325
Beck, A. R., 266, 283, 287
Becker, J., 170, 185
Bedi, G., 125, 126, 144
Bedor, L., 125, 142
Beitchman, J. H., 130, 140
Bejar, I. I., 66, 77
Bell, J. J., 158, 165
Bellenir, K., 151, 152, 159, 164
Bellugi, U., 158, 166, 188, 198, 207, 208
Benavidez, D. A., 178, 181, 185
Berg, B. L., 279, 280, 287
Bergstrom, L., 197, 207
Beringer, M., 232
Berk, R. A., 6, 13, 56, 76, 158, 163, 165, 256, 287
Berkley, R. K., 251, 289
Berlin, L. J., 258
Bernthal, J. E., 29, 40, 47, 215, 246, 298, 324, 335
Berns, S. B., 304, 325
Bess, F. H., 189, 192, 203, 207
Bettelheim, B., 173, 184
Biederman, J., 161, 165
Bihrle, A., 158, 166
Biklen, S. K., 279, 280, 287
Bishop, D. V. M., 114, 118, 119, 124, 130, 140, 141, 144, 269, 287
Bjork, R. A., 251, 292, 309, 326
Blackmon, R., 283, 292
Blake, J., 270, 287
Blakeley, R. W., 337
Blank, M., 258
Bliss, L. S., 103, 107, 233
Bloodstein, O., 159, 165
Boehm, A. E., 329
Bogdan, R. C., 279, 280, 287, 292
Bondurant, J., 121, 141
Botting, N., 238, 245
Boucher, J., 181, 185
Bow, S., 121, 122, 141
Bowers, L., 333
Bracken, B. A., 215, 245, 329
Brackett, D., 191, 192, 194, 195, 200, 203, 207, 209
Bradley, L., 27, 47
Bradley-Johnson, S., 188, 189, 200, 201, 204, 207
Braukmann, C. J., 304, 325, 326
Bredart, S., 270, 291
Breecher, S. V. A., 238, 248
Brennan, R. L., 102, 108
Bretherton, I., 237, 245
Bridgman, P. W., 19, 47
Brinton, B., 133, 141, 241, 245, 262, 287
Broks, P., 173, 185
Bronfenbrenner, U., 79, 107
Brown, A. L., 226, 245, 272, 277, 287
Brown, J., 226, 245
Brown, R., 266, 287
Brown, S., 230, 246, 255, 278, 289
Brown, V. L., 332
Brownell, R., 331
Brownlie, E. B., 133, 140
Bruneau, N., 181, 185
Bryant, B. R., 59, 77, 333
Bryant, P., 27, 47
Brzustowicz, L., 117, 141
Buckwalter, P., 114, 118, 145
Bunderson, C. V., 31, 47
Buros, O., 337
Burroughs, E. I., 262, 287
Butkovsky, L., 122, 143
Butler, K. G., 132, 145, 255, 287
Byma, G., 125, 144
Bzoch, K. R., 59, 76, 103, 107, 237, 245, 258
C
Cacace, A. T., 192, 207
Cairns, H. S., 259, 290
Page 341
Calhoon, J. M., 281, 287
Camarata, M., 122, 143
Camarata, S., 115, 122, 141, 143, 192, 209
Camaioni, L., 237, 245
Campbell, D., 55, 76, 306, 324
Campbell, M., 184
Campbell, R., 120, 141
Campbell, T., 223, 230, 245, 262, 264, 265, 268, 274, 275, 287, 288, 294, 295, 296, 304, 305, 307, 309, 311, 324
Campione, J., 277, 287
Cantekin, E. I., 191, 208
Cantwell, D., 133, 140
Carney, A. E., 189, 194, 196, 203, 207
Carpentieri, S., 171, 185
Carr, E. G., 317, 326
Carr, L., 124, 142
Carrow-Woolfolk, E., 232, 329, 330, 331, 332
Carver, R., 57, 76, 312, 313, 324
Casby, M., 128, 141
Caspi, A., 313, 324
Castelli, M. C., 237, 245
Chabon, S. S., 295, 310, 317, 324
Channell, R. W., 269, 290
Chapman, A., 238, 248
Chapman, J. P., 8, 13
Chapman, L. J., 8, 13
Chapman, R., 158, 166, 268, 272, 287, 290
Cheng, L. L., 231, 245
Chial, M. R., 30, 47
Chipchase, B. B., 130, 144
Chomsky, N., 123, 141
Chung, M. C., 174, 175, 184
Cibis, G., 161, 166
Cicchetti, D., 163, 166, 215, 249
Cirrin, F. M., 241, 245, 255, 273, 288
Clahsen, H., 124, 141
Clark, M., 118, 121, 122, 135, 141, 143
Cleave, P. L., 116, 124, 128, 141, 144, 231, 246
Clegg, M., 133, 140
Cochran, P. S., 269, 270, 288
Coe, D., 178, 181, 185
Cohen, I. L., 153, 165, 173, 178, 184
Cohen, M., 120, 141, 149, 164, 165
Cohen, N. J., 134, 141
Cohrs, M., 215, 246
Compton, A. J., 232
Compton, C., 104, 108, 232, 245
Conant, S., 270, 288
Conboy, B., 230, 246, 255, 278, 289
Connell, P. J., 317, 324
Connor, M., 152, 161, 165
Conoley, J. C., 103, 104, 108, 333
Conover, W. M., 30, 47
Conti-Ramsden, G., 238, 245
Cook, T. D., 306, 324
Cooke, A., 158, 165
Cooley, W. C., 151, 152, 165
Cooper, J., 126, 144
Cordes, A. K., 66, 68, 76
Corker, M., 195, 207
Coryell, J., 195, 196, 207
Coster, W. J., 237, 246
Courchesne, E., 173, 184, 185
Crago, M., 118, 124, 135, 141, 142, 231, 245
Craig, H. K., 133, 141, 268
Crais, E. R., 79, 81, 108, 236, 245, 281, 288
Creaghead, N. A., 10, 13, 282, 288
Creswell, J. W., 279, 288
Crittenden, J. B., 196, 207
Cromer, R., 149, 165
Cronbach, L. J., 66, 76
Crutchley, A., 238, 245
Crystal, D., 268, 269, 288
Cueva, J. E., 184
Culatta, B., 23, 48
Culbertson, J. L., 189, 192, 203, 207, 208
Cunningham, C., 121, 122, 141
Curtiss, S., 117, 118, 144
Cushman, B. B., 295, 310, 317, 324
D
Dale, P., 237, 246
Damasio, A. R., 181, 184
Damico, J. S., 82, 108, 229, 230, 241, 245, 251, 252, 253, 255, 257, 274, 275, 283, 284, 285, 286, 288
D’Angiola, N., 176, 184
Daniel, B., 271, 291
Darley, F., 251, 290, 337
Davidson, R., 270, 291
Davies, C., 238, 249
Davine, M., 134, 141
Davis, B., 128, 145
Dawes, R. M., 8, 13
Day, K., 271, 291
de Villiers, J., 262, 291
de Villiers, P., 262, 291
DeBose, C. E., 229, 231, 249
DellaPietra, L., 235, 245
Demers, S. T., 84, 85, 108
Denzin, N. K., 279, 288
Derogatis, L. R., 235, 245
Page 342
Deyo, D. A., 196, 209
Dickey, S., 337
Diedrich, W. M., 259, 288, 294, 302, 324
DiLavore, P., 174, 175, 184
Dirckx, J. H., 197, 208
DiSimoni, F., 333
Dobrich, W., 135, 144
Dodds, J., 213, 215, 246
Doehring, D. G., 231, 245
Dollaghan, C., 223, 230, 245, 251, 262, 264, 265, 268, 274, 275, 287, 288, 296, 297, 298, 299, 300, 302, 303, 304, 305,
307, 308, 309, 315, 316, 324
Donahue-Kilburg, G., 82, 83, 108, 203, 208
Donaldson, M. D. C., 158, 165
Dowdy, C. A., 134, 141
Downey, J., 158, 165
Downs, M. P., 188, 189, 190, 191, 193, 194, 197, 207, 209
Dubé, R. V., 196, 208
Dublinske, S., 241, 245
Duchan, J., 253, 259, 262, 263, 279, 289, 290
Dunn, Leota, 40, 51, 57, 71, 76, 232, 245, 331
Dunn, Lloyd, 40, 51, 57, 71, 76, 232, 245, 331
Dunn, M., 171, 172, 186, 240, 245
Durkin, M. S., 147, 148, 165
Dykens, E. M., 149, 152, 153, 158, 159, 161, 164, 165
E
Eaton, L. F., 161, 165
Eaves, L. C., 171, 184
Edelson, S. M., 181, 185
Edmonston, A., 333
Edwards, E. B., 241, 245
Edwards, J., 118, 124, 141, 142
Edwards, S., 159, 160, 164, 166
Eger, D., 295, 306, 310, 317, 318, 320, 324
Ehlers, S., 171, 174, 175, 180, 184, 185
Ehrhardt, A. A., 158, 165
Eichler, J. A., 191, 208
Eisele, J. A., 119, 140
Ekelman, B., 130, 140
Elbert, M., 259, 288, 294, 324
Elcholtz, G., 283, 292
Ellis Weismer, S., 124, 125, 141, 237, 249
Ellis, J., 23, 48
Embretson, S. E., 279, 288
Emerick, L. L., 215, 248
Engen, E., 202, 208
Engen, T., 202, 208
Erickson, J. G., 62, 77
Evans, A. W., 104, 109, 238, 239, 249, 296, 312, 326
Evans, J., 125, 141, 266, 267, 268, 288
Evans, L. D., 188, 189, 200, 201, 204, 207
Eyer, J., 125, 142
F
Fandal, A., 215, 246
Farmer, M., 133, 141
Faust, D., 7, 8, 13
Fay, W., 176, 184
Feeney, J., 215, 246
Fein, D., 171, 172, 173, 178, 186
Feinstein, C., 171, 172, 173, 178, 186
Feldt, L. S., 102, 108
Fenson, L., 237, 246
Ferguson, B., 133, 140
Ferguson-Smith, M., 152, 161, 165
Fergusson, D. M., 313, 324
Feuerstein, R., 276, 278, 288
Fey, M., 116, 128, 141, 221, 231, 246, 269, 290, 309, 325
Finnerty, J., 269, 288
Fiorello, C., 84, 85, 108
Fisher, H. B., 335
Fiske, D. W., 55, 76
Fixsen, D. L., 304, 325, 326
Flax, J., 240, 247
Fleiss, J. L., 68, 77
Fletcher, J. M., 21, 48
Fletcher, P., 118, 145, 269, 288
Flexer, C., 188, 195, 199, 208
Fluharty, N., 239, 246
Flynn, S., 260, 290
Foley, C., 260, 290
Folstein, S., 173, 184
Foster, R., 258
Foster, S. L., 298, 304, 325
Fowler, A. E., 159, 165
Fox, R., 157, 165
Francis, D. J., 21, 48
Frankenburg, W. K., 213, 215, 246
Franklin, R. D., 307, 308, 315, 317, 325
Fraser, G. R., 197, 208
Frattali, C., 87, 108, 251, 288, 295, 303, 306, 318, 319, 325
Fredericksen, N., 66, 77
Freedman, D., 30, 48
Freese, P., 130, 133, 135, 143, 145
Freiberg, C., 266, 290
Page 343
Fria, T. J., 191, 208
Fristoe, M., 336
Frith, U., 173, 178, 181, 184
Fudala, J., 335
Fujiki, M., 133, 141, 262, 287
Fukkink, R., 314, 325
Funk, S. G., 104, 109, 238, 239, 249, 296, 312, 326
G
Gabreels, F., 147, 166
Gaines, R., 161, 166
Galaburda, A., 119, 141
Gardiner, P., 157, 167
Gardner, M. F., 40, 48, 60, 77, 100, 108, 233, 330, 331
Garman, M. L., 269, 288
Garreau, B., 181, 185
Gathercole, S., 125, 141
Gauger, L., 119, 121, 141
Gavin, W. J., 266, 271, 289
Geers, A. E., 201, 202, 208, 209
Geirut, J., 251, 262, 289, 303, 325
Gerken, L., 261, 289
German, D. J., 54, 77, 332, 333
Gertner, B. L., 133, 141
Geschwind, N., 119, 120, 141, 142
Geschwint-Rabin, J., 154, 166
Ghiotto, M., 270, 291
Gibbons, J. D., 30, 47
Giddan, J. J., 258
Gilbert, L. E., 189, 203, 208
Giles, L., 271, 289
Gilger, J. W., 117, 118, 140, 142
Gillam, R., 126, 128, 140, 142, 145
Gillberg, C., 171, 174, 175, 180, 184, 185
Girolametto, L., 237, 246
Glaser, R., 58, 77
Gleser, G. D., 66, 76
Glesne, C., 279, 303, 325
Goldenberg, D., 215, 247
Goldfield, B. A., 302
Goldman, R., 336
Goldman, S., 134, 142
Goldsmith, L., 204, 208, 237, 248
Goldstein, H., 251, 262, 289, 303, 325
Golin, S., 273, 292
Golinkoff, R. M., 261, 289
Good, R., 234, 248
Goodluck, H., 261, 289
Gopnik, M., 118, 124, 135, 141, 142
Gordon-Brannan, M., 241, 242, 245
Gorman, B. S., 307, 308, 315, 317, 325
Gottlieb, M. L., 165
Gottsleben, R., 269, 292
Gould, S. J., 20, 47, 48
Graham, J. M., 151, 152, 165
Grandin, T., 178, 184
Green, A., 161, 166
Green, J. A., 239, 249
Greene, S. A., 158, 165
Grela, B., 125, 142
Grievink, E. H., 205, 209
Grimes, A. M., 241, 245
Gronlund, N., 31, 48, 67, 71, 76, 77
Grossman, H. J., 149, 165
Gruber, C. P., 331
Gruen, R., 158, 165
Gruner, J., 120, 144
Guerin, P., 181, 185
Guidubaldi, J., 201, 209
Guitar, B., 10, 13, 238, 249, 264, 292
Gutierrez-Clellen, V. F., 230, 237, 246, 255, 278, 289
H
Haas, R. H., 173, 185
Haber, J. S., 239, 246
Hadley, P., 121, 133, 141, 142, 237, 246
Haley, S. M., 237, 246
Hall, N. E., 116, 128, 135, 140, 142, 170, 184, 231, 245, 270, 287
Hall, P., 259, 290
Hall, R., 283, 292
Hallin, A., 184
Haltiwanger, J. T., 237, 246
Hammer, A. L., 96, 100, 108
Hammill, D. D., 40, 48, 54, 57, 77, 103, 109, 233, 240, 247, 330, 332, 333
Hand, L., 337
Hanna, C., 233
Hanner, M. A., 330
Hansen, J. C., 224, 244, 246
Harper, T. M., 171, 185, 304, 325
Harris, J. L., 229, 231, 246
Harris, J., 195, 208
Harrison, M., 194, 208
Harryman, E., 215, 248
Hartley, J., 268, 289
Hartung, J., 237, 246
Haynes, W. O., 215, 248
Hecaen, H., 120, 144
Hedrick, D., 233, 236, 237, 246
Heller, J. H., 104, 109, 238, 239, 249, 296, 312, 326
Hemenway, W. G., 197, 207
Hersen, M., 164, 165
Hesketh, L. J., 125, 141
Hesselink, J., 121, 142
Hicks, P. L., 317, 318, 325
Hirshoren, A., 222, 246
Hirsh-Pasek, K., 261, 289
Hixson, P. K., 269, 289
Ho, H. H., 171, 184
Hodapp, R. M., 149, 152, 153, 158, 159, 161, 164, 165
Hodson, B., 241, 242, 245, 335
Hoffman, M., 276, 278, 288
Hoffman, P., 276, 291
Holcomb, T. K., 195, 196, 207
Holmes, D. W., 199, 208
Hopkins, J., 104, 108, 217, 238, 246
Horodezky, N., 134, 141
Horwood, L. J., 313, 324
Howard, S., 268, 289
Howe, C., 153, 154, 155, 160, 166
Howlin, P., 133, 144
Hresko, W., 54, 77, 103, 109, 233, 332
Hsu, J. R., 68, 77
Hsu, L. M., 68, 77
Huang, R., 104, 108, 217, 238, 246
Huisingh, R., 329, 333
Hummel, T. J., 235, 246
Hurford, J. R., 132, 140, 142
Hutchinson, T. A., 96, 107, 108, 222, 246, 248
Hux, K., 238, 248
I
Iglesias, A., 278, 291
Impara, J. C., 103, 104, 108
Ingham, J., 304, 325
Inglis, A., 133, 140
Ingram, D., 124, 142
Inouye, D. K., 31, 47
Isaacson, L., 134, 141
J
Jackson, D. W., 121, 142, 194, 200, 204, 209
Jackson-Maldonado, D., 237, 246
Jacobson, J. W., 148, 165, 304, 325
Janesick, V. J., 280, 289
Janosky, J., 230, 245
Jauhiainen, T., 204, 210
Jenkins, W., 125, 126, 143, 144
Jensen, M., 276, 288
Jernigan, T., 121, 142
Johansson, M., 171, 180, 185
Johnson, G. A., 277, 291
Johnson, G., 230, 248
Johnston, A. V., 330
Johnston, E. G., 330
Johnston, J. R., 115, 142
Johnston, P., 126, 143
Jones, S. S., 31, 48
Juarez, M. J., 222, 248
K
Kahneman, D., 8, 13
Kalesnik, J. O., 215, 216, 235, 236, 238, 248
Kallman, C., 125, 144
Kamhi, A., 8, 13, 115, 116, 128, 142, 216, 229, 231, 241, 245, 246
Kanner, L., 181, 185
Kaplan, C. A., 130, 144
Kapur, Y. P., 194, 197, 198, 208
Karchmer, M. A., 204, 208
Kaufman, A. S., 215, 246
Kaufman, N. L., 215, 246, 336
Kayser, H., 229, 231, 246, 247
Kazdin, A. E., 297, 304, 305, 324, 325
Kazuk, E., 215, 246
Kearns, K. P., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 325, 326
Kelley, D. L., 279, 289
Kelly, D. J., 217, 247
Kemp, K., 266, 267, 270, 289
Kent, J. F., 64, 77
Kent, R. D., 8, 13, 64, 77
Kerlinger, F. N., 19, 48
Keyser, D. J., 104, 108
Khan, L. M., 336
King, J. M., 161, 166
Kingsley, J., 156, 166
Klaus, D. J., 58, 77
Klee, T., 189, 192, 203, 207, 266, 267, 270, 289, 290
Klein, S. K., 203, 208
Kline, M., 232
Koegel, L. K., 304, 325
Koegel, R., 304, 325
Koller, H., 156, 161, 166
Kovarsky, D., 253, 279, 286, 289
Kozak, V. J., 201, 202, 209
Kramer, J. J., 333
Krassowski, E., 127, 142, 231, 247
Kratochwill, T. R., 307, 315, 317, 325
Kreb, R. A., 319, 323, 324, 325
Kretschmer, R., 121, 141
Kresheck, J., 215, 232, 248, 331
Kuder, G. F., 69, 77
Kuehn, D. P., 120, 142
Kulig, S. G., 239, 247
Kunze, L., 239, 249
Kwiatkowski, J., 336
L
Lahey, M., 10, 13, 48, 118, 124, 128, 141, 142, 223, 231, 247, 269, 289, 303, 325
Lancee, W., 133, 140
Lancy, D., 279, 289
Landa, R. M., 273, 289
Larsen, S. C., 332, 333
Larson, L., 199, 210
Layton, T. L., 104, 109, 199, 208, 238, 239, 249, 296, 312, 326
Le Couteur, A., 175, 185
League, R., 59, 76, 103, 107, 237, 246, 258
Leap, W. L., 231, 247
Leckman, J. F., 149, 152, 153, 159, 164, 165
Lee, L., 218, 247
Lehman, I., 296, 326
Lehr, C. A., 215, 247
Lehrke, R. G., 152, 166
Lemme, M. L., 120, 142
Leonard, C., 119, 121, 141
Leonard, L., 114, 117, 118, 119, 121, 122, 123, 124, 125, 126, 128, 130, 131, 132, 137, 140, 142, 221, 223, 230, 240, 247, 251, 270, 289, 317, 327
Leverman, D., 233
Levin, J. R., 307, 315, 317, 325
Levitsky, W., 120, 142
Levitz, M., 156, 166
Levy, D., 125, 143
Lewis, N. P., 336
Lidz, C. S., 255, 276, 278, 289
Lillo-Martin, D., 188, 198, 207, 208
Lincoln, A. J., 173, 185
Lincoln, Y. S., 279, 288
Linder, T. W., 281, 289
Ling, D., 195, 208
Linkola, H., 204, 210
Lipsett, L., 134, 141
Locke, J., 125, 143
Loeb, D., 124, 143
Logemann, J. A., 335
Logue, B., 271, 291
LoGuidice, C., 333
Lombardino, L., 119, 121, 141
Loncke, F., 192, 209
Long, S. H., 116, 128, 141, 231, 246, 266, 269, 270, 278, 289, 290, 310, 314, 325
Longobardi, E., 237, 245
Lonsbury-Martin, B. L., 206, 208
Lord, C., 174, 175, 184, 185
Love, S. R., 178, 181, 185
Lowe, R., 335
Lubetsky, M. J., 147, 161, 166
Lucas, C. R., 259, 290
Luckasson, R., 148, 166
Ludlow, L. H., 237, 246
Lugo, D. E., 232, 245
Lund, N. J., 253, 259, 262, 263, 290
Lust, B., 260, 290
Lyman, H. B., 76
M
Machon, M. W., 104, 109, 238, 239, 249, 296, 312, 326
Macmillan, D. L., 148, 149, 166
MacWhinney, B., 268, 269, 287, 290
Maino, D. M., 161, 166
Maloney, D. M., 304, 325
Malvy, J., 181, 185
Marchman, V., 237, 246
Mardell-Czudnowski, C., 215, 247
Marks, S., 158, 166
Marlaire, C. L., 63, 77, 236, 244, 247
Martin, G. K., 206, 208
Mash, E. J., 298, 304, 325
Masterson, J. J., 269, 270, 288, 290
Matese, M. J., 178, 181, 185
Matkin, N. D., 189, 203, 209
Matson, J. L., 178, 181, 185
Matthews, R., 133, 145
Mauk, G. W., 194, 208
Maurer, R. G., 181, 184
Mawhood, L., 133, 144
Maxon, A., 191, 192, 194, 200, 209
Maxwell, L. A., 154, 166, 199, 200, 208
Maxwell, M. M., 253, 279, 289
Maynard, D. W., 63, 77, 236, 244, 247
McCarthy, D. A., 215, 249
McCauley, R. J., 7, 12, 13, 35, 38, 48, 102, 104, 108, 217, 220, 225, 231, 234, 238, 247, 249, 251, 252, 253, 256, 264, 290, 292, 296, 299, 312, 313, 326
McClave, J. T., 30, 48
McDaniel, D., 259, 290
McFarland, D. J., 192, 207
McGhee, R., 333
McGlinchey, J. B., 304, 325
McKee, C., 259, 290
McReynolds, L. V., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 326
Mecham, M. J., 333
Meehl, P. E., 8, 13, 223, 247
Mehrens, W., 296, 326
Mellits, D., 125, 144
Membrino, I., 266, 289
Menolascino, F. J., 161, 165
Menyuk, P., 130, 143
Merrell, A. M., 104, 108, 217, 242, 247
Mervis, C. B., 158, 166
Merzenich, M., 125, 126, 143, 144
Messick, S., 4, 13, 76, 77, 252, 290
Mient, M. G., 295, 310, 317, 324
Miller, J. F., 158, 166, 230, 235, 236, 249, 252, 253, 258, 262, 263, 266, 268, 269, 270, 271, 273, 284, 288, 290, 292
Miller, R., 276, 288
Miller, S., 125, 126, 143, 144
Miller, T. L., 217, 248
Milone, M. N., 204, 208
Minifie, F., 251, 290
Minkin, B. L., 304, 326
Minkin, N., 304, 326
Mislevy, R. J., 66, 77
Mitchell, J. V., 333, 337
Moeller, M. P., 189, 194, 196, 200, 203, 207, 208
Moellman-Landa, R., 273, 290
Moffitt, T. E., 313, 324
Mogford, K., 195, 208
Mogford-Bevan, K., 188, 203, 208
Moldonado, A., 233
Montgomery, A. A., 104, 109
Montgomery, J. K., 241, 242, 248
Moog, J. S., 201, 202, 208, 209
Moores, D. F., 196, 209
Morales, A., 271, 291
Moran, M. J., 273, 287
Mordecai, D. R., 269, 290
Morgan, S. B., 171, 184
Morishima, A., 158, 165
Morisset, C., 237, 245
Morris, P., 79, 107
Morris, R., 116, 128, 140, 171, 172, 173, 178, 186, 231, 245, 252, 270, 287, 290
Morriss, D., 271, 291
Mowrer, D., 294, 317, 326
Mulick, J. A., 148, 165
Muller, D., 268, 289
Muma, J., 79, 108, 217, 247, 253, 271, 290, 291
Murphy, L. L., 104, 108
Musket, C. H., 199, 209
Myles, B. S., 170, 185
N
Nagarajan, S., 125, 126, 144
Nair, R., 133, 140
Nanda, H., 66, 76
Nation, J., 130, 140
National Council on Measurement in Education (NCME), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287
Needleman, H., 230, 245
Neils, J., 117, 118, 143
Nelson, K. E., 122, 143, 192, 209
Nelson, N. W., 235, 247, 282, 291
Newborg, J., 201, 209
Newcomer, P. L., 40, 48, 57, 77, 103, 108, 240, 247, 332
Newcorn, J. H., 161, 165
Newhoff, M., 121, 143
Newman, P. W., 10, 13
Newport, E., 198, 209
Nicolosi, L., 215, 248
Nielsen, D. W., 6, 10, 13
Nippold, M. A., 104, 108, 134, 143, 217, 238, 246, 248
Nitko, A. J., 49, 58, 67, 71, 77
Nordin, V., 171, 174, 175, 184, 185
Norris, J., 276, 291
Norris, M. K., 222, 248
Norris, M. L., 239, 246
Northern, J. L., 188, 189, 190, 191, 193, 194, 207, 209
Nunnally, J., 225, 248
Nuttall, E. V., 215, 216, 235, 236, 238, 248
Nyden, A., 171, 180, 185
Nye, C., 241, 242, 248
O
O’Brien, M., 114, 145
O’Grady, L., 188, 207
Olsen, J. B., 31, 47
Olswang, L. B., 128, 129, 143, 223, 230, 248, 249, 251, 252, 255, 256, 273, 276, 277, 278, 279, 280, 286, 287, 289, 290, 291, 292, 294, 295, 296, 297, 298, 302, 303, 304, 305, 307, 310, 311, 314, 317, 318, 319, 323, 324, 325, 326
Onorati, S., 270, 287
Orman, J., 333
Ort, S. I., 149, 165
Owens, R. E., 221, 248
Oyler, A. L., 189, 203, 209
Oyler, R. F., 189, 203, 209
P
Padilla, E. R., 232, 245
Page, J. L., 23, 48, 181, 185
Palin, M. W., 269, 290
Palmer, P., 171, 185, 269, 290
Pan, B. A., 174, 185
Panagos, J., 268, 291
Pang, V. O., 231, 248
Papoudi, D., 169, 186
Parsonson, B. S., 308, 326
Passingham, R., 118, 145
Patell, P. G., 133, 140
Patton, J. R., 134, 141
Paul, P. V., 194, 196, 200, 204, 207, 209
Paul, R., 128, 143, 174, 176, 177, 185, 203, 209, 221, 222, 223, 248, 253, 262, 263, 268, 290, 291
Payne, K. T., 228, 229, 249
Pedhazur, R. J., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326
Pembrey, M., 117, 143
Peña, E., 230, 248, 255, 276, 278, 289
Pendergast, K., 337
Penner, S. G., 255, 273, 288
Perachio, J. J., 331
Perkins, M. N., 222, 248
Perozzi, J. A., 251, 289
Perret, Y. M., 149, 164
Peshkin, A., 279, 303, 325
Peters, S. A. F., 205, 209
Pethick, S., 237, 246
Phelps-Gunn, T., 332
Phelps-Terasaki, D., 332
Phillips, E. L., 304, 325, 326
Piercy, M., 125, 144
Pindzola, R. H., 215, 248
Pisani, R., 30, 48
Piven, J., 171, 185
Plake, B. S., 104, 108
Plante, E., 104, 108, 116, 118, 120, 121, 127, 135, 141, 142, 143, 217, 218, 220, 222, 231, 242, 247, 299, 326
Plapinger, D., 199, 209
Poizner, H., 198, 208
Pollock, K. E., 229, 231, 246
Polloway, E. A., 134, 141
Pond, R. E., 59, 77, 233, 331
Porch, B. E., 274, 331
Prather, E. M., 233, 236, 237, 238, 246, 248
Prelock, P. A., 241, 245, 268, 282, 289, 291
Primavera, L. H., 264, 280, 291, 306, 326
Prinz, P., 196, 198, 209
Prizant, B. M., 171, 174, 185, 214, 248
Proctor, E. K., 294, 326
Prutting, C. A., 251, 289
Purves, R., 30, 48
Pye, C., 269, 291
Q
Quartaro, G., 270, 287
Quigley, S. P., 207
Quinn, M., 230, 235, 236, 249, 252, 284, 292
Quinn, R., 278, 291
R
Radziewicz, C., 81, 109
Rajaratnam, N., 66, 76
Ramberg, C., 171, 180, 185
Rand, Y., 276, 278, 288
Rapcsak, S., 120, 143
Rapin, I., 114, 130, 143, 171, 172, 173, 174, 178, 179, 180, 181, 185, 186, 191, 203, 208, 209
Raver, S. A., 281, 291
Records, N. L., 7, 10, 13, 114, 130, 133, 135, 143, 145
Rees, N. S., 192, 209
Reeves, M., 266, 290
Reichler, R. J., 175, 185
Reid, D., 54, 77, 103, 109, 233, 332
Reilly, J., 237, 246
Remein, Q. R., 6, 13, 214, 249
Renner, B. R., 175, 185
Reschly, D. J., 148, 149, 166
Rescorla, L., 128, 143, 237, 248
Resnick, T. J., 191, 209
Reveron, W. W., 229, 248
Reynell, J., 331
Reynolds, W. M., 335
Reznick, S., 237, 246
Rice, M. L., 117, 119, 121, 124, 133, 141, 142, 143, 144, 217, 237, 247, 249
Richard, G. J., 330
Richardson, M. W., 69, 77
Richardson, S. A., 156, 161, 166
Ries, P. W., 188, 209
Riley, A. M., 330
Rimland, B., 173, 181, 185
Risucci, D., 130, 145
Rivera, D. M., 333
Robarts, J., 169, 186
Roberts, J. E., 237, 245
Roberts, L. J., 304, 325
Robinson-Zañartu, C., 230, 231, 248, 255, 278, 289
Roby, C., 136, 144
Rodriguez, B., 128, 129, 143, 241, 245
Roeleveld, N., 147, 166
Roeper, T., 262, 291
Rolland, M.-B., 266, 290
Romeo, D., 121, 141
Romero, I., 215, 216, 235, 236, 238, 248
Rondal, J. A., 159, 160, 161, 164, 166, 270, 291
Rosa, M., 229, 249
Rose, S. A., 258
Rosen, A., 294, 326
Rosen, G., 119, 141
Rosenbek, J. C., 64, 77
Rosenberg, L. R., 233
Rosenberg, S., 149, 155, 166
Rosenzweig, P., 233
Rossetti, L., 237, 248, 281
Ross, M., 189, 191, 192, 194, 200, 209
Ross, R., 117, 118, 144
Roth, F., 267, 291
Rothlisberg, B. A., 103, 109
Rounds, J., 7, 13
Rourke, B., 21, 48
Roush, J., 194, 203, 208, 209
Roussel, N., 232, 248
Roux, S., 181, 185
Rowland, R. C., 3, 13
Ruscello, D., 40, 48
Rutter, M., 133, 144, 169, 173, 174, 175, 184, 185
S
Sabatino, A. D., 217, 248
Sabers, D. L., 100, 107, 109, 222, 248
Sabo, H., 158, 166
Salvia, J., 33, 35, 36, 48, 63, 64, 77, 96, 102, 107, 109, 225, 231, 233, 248, 252, 264, 291, 296, 326
Sanders, D. A., 192, 195, 209
Sandgrund, A., 161, 166
Sanger, D., 238, 248
Sattler, J. M., 37, 47, 48, 76, 158, 166, 225, 226, 249
Sauvage, D., 181, 185
Scarborough, H., 135, 144, 269, 270, 291
Schachter, D. C., 133, 137, 140, 144
Scheetz, N. A., 191, 207, 209
Schiavetti, N., 262, 265, 292
Schilder, A. G. M., 205, 209
Schlange, D., 161, 166
Schloss, P. J., 204, 208
Schmelkin, L. P., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326
Schmidt, R. A., 251, 292, 309, 326
Schopler, E., 169, 175, 184, 185
Schraeder, T., 230, 235, 236, 249, 252, 284, 292
Schreibman, L., 173, 185, 317, 326
Schreiner, C., 125, 126, 143, 144
Schupf, N., 152, 167
Schwartz, I. S., 223, 249, 255, 279, 280, 292, 296, 297, 304, 305, 307, 324, 326
Scientific Learning Corporation, 126, 144
Secord, W. A., 10, 13, 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 259, 264, 274, 283, 284, 285, 286, 288, 292, 305, 326, 329, 332, 333, 337
Selmar, J., 337
Semel, E., 59, 77, 233, 238, 249, 264, 292, 305, 326, 329
Sevin, J. A., 178, 181, 185
Shady, M., 261, 289
Shanteau, J., 7, 13
Shaywitz, B., 21, 48
Shaywitz, S. E., 21, 48
Shelton, R. L., 259, 288, 294, 324
Shenkman, K., 118, 135, 143
Shepard, L. A., 235, 249
Sherman, D., 251, 290
Sherman, G., 119, 141
Shewan, C., 283, 292
Shields, J., 173, 185
Shine, R. E., 259, 292
Shipley, K. G., 331
Short, R. J., 148, 166
Shriberg, L., 268, 291, 336
Shu, C. E., 158, 165
Shulman, B., 241, 242, 245, 332
Siegel, L., 121, 122, 141
Silliman, E. R., 279, 282, 292
Silva, P. A., 313, 324
Silverman, W., 152, 167
Simeonsson, R. J., 148, 166
Simon, C., 262, 263, 292
Simpson, A., 173, 185
Simpson, R. L., 170, 185
Slater, S., 283, 292
Sliwinski, M., 240, 247
Smedley, T., 199, 209
Smit, A., 222, 249, 337
Smith, A. R., 238, 249, 264, 292
Smith, B., 174, 175, 184
Smith, E., 114, 145
Smith, M., 82, 108, 229, 230, 245
Smith, S., 271, 291
Smith, T. E. C., 134, 141
Snow, C. E., 123, 144, 174, 185
Snow, R., 63, 77
Snyder, L., 237, 245
Snowling, M. J., 130, 144
Soder, A. L., 337
Sowell, E., 121, 142
Sparks, S. N., 155, 166
Sparrow, S. S., 149, 163, 165, 166, 215, 249
Spekman, N., 266, 290
Spencer, L., 192, 196, 209
Sponheim, E., 174, 185
Sprich, S., 161, 165
St. Louis, K. O., 40, 48
Stafford, M. L., 238, 248
Stagg, V., 157, 166
Stark, J., 258
Stark, R. E., 115, 125, 137, 144
Stein, Z. A., 147, 148, 165
Steiner, V., 59, 77, 233, 331
Stelmachowicz, P., 199, 210
Stephens, M. I., 104, 109, 239, 249
Stephenson, J. B., 158, 165
Stevens, G., 60, 77
Stevens, S. S., 20, 43, 48, 265, 292
Stevenson, J., 133, 144
Stewart, T. R., 7, 13
Stillman, R., 63, 77
Stock, J. R., 201, 209
Stockman, I. J., 230, 235, 236, 249, 252, 284, 292
Stokes, S., 238, 249
Stone, T. A., 331
Stothard, S. E., 130, 144
Stout, G. G., 195, 209
Strain, P. S., 184, 185
Stratton, K., 153, 154, 155, 160, 166
Stray-Gunderson, K., 151, 164, 166
Striffler, N., 239, 249
Strominger, A., 135, 140
Stromswold, K., 266, 292
Strong, M., 196, 198, 209
Sturner, R. A., 104, 109, 238, 239, 249, 296, 312, 326
Sue, M. B., 331
Supalla, S., 198, 209
Supalla, T., 198, 209
Svinicki, J., 201, 209
Sweetland, R. C., 104, 108
Swisher, L., 35, 48, 104, 108, 115, 120, 141, 143, 217, 220, 225, 231, 234, 247, 252, 256, 290, 296, 299, 312, 313, 326
T
Tackett, A., 271, 291
Tager-Flusberg, H., 126, 144
Taitz, L. S., 161, 166
Tallal, P., 115, 117, 118, 121, 125, 126, 137, 142, 143, 144, 145
Taylor, O. L., 228, 229, 249
Taylor, S. J., 279, 292
Templin, M. C., 266, 292, 337
Terrell, F., 229, 249, 273, 292
Terrell, S. L., 229, 249, 273, 292
Teszner, D., 120, 144
Thal, D., 237, 246
Thane, N. L., 333
Thompson, C. K., 315, 317, 324, 326
Thordardottir, E. T., 237, 249
Thorner, R. M., 6, 13, 214, 249
Thornton, R., 260, 292
Thorum, A. R., 330
Thurlow, M. L., 215, 247
Tibbits, D. F., 241, 245
Timbers, B. J., 304, 326
Timbers, G. D., 304, 326
Timler, G., 128, 145, 146
Tobin, A., 233, 236, 237, 246
Tomblin, J. B., 7, 10, 13, 114, 117, 118, 121, 122, 130, 133, 135, 142, 143, 144, 145, 262, 265, 287
Tomlin, R., 265, 288
Torgesen, J. K., 59, 77
Toronto, A. S., 233
Toubanos, E. S., 103, 109
Townsend, J., 173, 185
Tracey, T. J., 7, 12, 13
Trauner, D., 121, 145
Trevarthen, C., 169, 186
Tsang, C., 232, 233
Turner, R. G., 6, 10, 13, 222, 249, 256, 292
Tversky, A., 8, 13
Tyack, D., 269, 292
Tye-Murray, N., 192, 209
Tynan, T., 157, 167
Tzavares, A., 120, 144
U
Udwin, O., 160, 166
V
van Bon, W. H. J., 205, 209
Van den Bercken, J. H. L., 205, 209
van der Lely, H., 124, 145
van der Spuy, H., 121, 122, 141
Van Hasselt, V. B., 164, 165
van Hoek, K., 188, 207
Van Keulen, J. E., 229, 231, 249
van Kleeck, A., 82, 109, 128, 145
Van Riper, C., 62, 77
Van Voy, K., 304, 325
Vance, H. B., 217, 248
Vance, R., 104, 108, 120, 143, 218, 220, 222, 247, 299, 326
Vargha-Kadeem, F., 118, 145
Vaughn-Cooke, F. B., 223, 229, 230, 249
Veale, T. K., 126, 145
Veltkamp, L. J., 161, 167
Vernon, M., 197, 198, 210
Vetter, D. K., 10, 13, 96, 109, 253, 292
Volterra, V., 237, 245
Vostanis, P., 174, 175, 184
Voutilainen, R., 204, 210
Vygotsky, L. S., 276, 292, 310, 327
W
Wallace, E. M., 238, 248
Wallace, G., 330
Wallach, G. P., 132, 145
Walters, H., 133, 140
Wang, X., 125, 126, 144
Warren, K., 63, 77
Washington, J. A., 229, 230, 249
Wasson, P., 157, 167
Waterhouse, L., 169, 171, 172, 173, 178, 186
Watkins, K., 118, 145
Watkins, R. V., 114, 121, 130, 145
Wechsler, D., 18, 48
Weddington, G. T., 229, 231, 249
Weiner, F. F., 269, 292, 336
Weiner, P., 135, 145
Weiss, A., 7, 13, 230, 247, 259, 290
Welsh, J., 122, 143
Wender, E., 134, 145, 181, 186
Werner, E. O., 232, 331
Wesson, M., 161, 166
Westby, C., 241, 245, 279, 292
Wetherby, A. M., 171, 174, 185, 214, 248
Wexler, K., 124, 144
White, K. R., 194, 208
Whitehead, M. L., 206, 208
Whitworth, A., 238, 249
Wiederholt, J. L., 332
Wiig, E. D., 31, 48
Wiig, E. H., 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 258, 264, 274, 283, 284, 285, 286, 288, 292, 305, 326, 329, 332, 333
Wiig, E. S., 31, 48
Wilcox, M. J., 317, 327
Wild, J., 133, 140
Wilkinson, L. C., 279, 282, 292
Williams, D., 176, 186
Williams, F., 24, 28, 48
Williams, K. T., 97, 103, 109, 330
Willis, S., 239, 249
Wilson, A., 158, 165
Wilson, B., 130, 133, 140, 145
Wilson, K., 283, 292
Wiltshire, S., 103, 109
Windle, J., 195, 209
Wing, L., 171, 172, 173, 178, 180, 186
Wise, P. S., 157, 165
Wnek, L., 201, 209
Wolery, M., 299, 300, 327
Wolf, M. M., 304, 305, 319, 323, 324, 325, 326
Wolfram, W., 231, 249
Wolf-Schein, E. G., 169, 173, 175, 186
Wolk, S., 204, 208
Woodcock, R. W., 333
Woodley-Zanthos, P., 152, 154, 164
Woodworth, G. G., 192, 198, 209
World Health Organization, 85, 87, 109, 169, 186, 253, 280, 282, 292
Worthington, D. W., 199, 210
Wulfeck, B., 121, 145
Wyckoff, J., 270, 291
Y
Yaghmai, F., 120, 141
Yen, W. M., 22, 47, 55, 56, 57, 58, 59, 68, 76
Yeung-Courchesne, R., 173, 185
Ying, E., 199, 200, 210
Yoder, D. E., 8, 13, 258
Yonce, L. J., 223, 245
Yoshinaga-Itano, C., 200, 203, 210
Young, E. C., 331
Young, M. A., 29, 48, 297, 298, 327
Ysseldyke, J. E., 33, 35, 36, 48, 64, 77, 96, 102, 107, 109, 215, 225, 231, 234, 247, 248, 252, 264, 291, 296, 326
Yule, W., 160, 166
Z
Zachman, L., 329
Zelinsky, D. G., 149, 165
Zhang, X., 114, 145
Zielhuis, G. A., 147, 166
Zigler, E., 149, 165
Zigman, A., 152, 167
Zigman, W. B., 152, 167
Zimmerman, I. L., 59, 77, 233, 331
SUBJECT INDEX
Page numbers followed by a t indicate tables and those followed by an f indicate figures.
Children with SLI are at increased risk for a range of emotional, behavioral, and social difficulties, including attention deficit disorder (ADD), conduct disorder, and anxiety disorders.
Autism spectrum disorders present diagnostic challenges because of their substantial overlap with mental retardation and other developmental disorders. Their heterogeneity and shifting symptom profiles further complicate diagnosis, making it essential to track specific cognitive deficits and symptomatology over time.
Different types of reliability, such as test-retest or interrater reliability, must be matched to the clinical question and the population being assessed. High reliability (a coefficient of at least .90 for important decisions) supports confidence in results, whereas lower reliability calls for cautious decision making and confirmation of findings from multiple sources.
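As a minimal numerical sketch of the idea above (the scores and score scale are hypothetical, not drawn from any test discussed in this book), test-retest reliability can be estimated by correlating scores from two administrations of the same measure:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical standard scores for eight children tested twice
time1 = [88, 95, 102, 110, 77, 123, 99, 105]
time2 = [90, 93, 104, 108, 80, 120, 101, 103]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.3f}")  # prints r = 0.991, above the .90 guideline
```

A coefficient this high would support using the scores for important decisions; a value well below .90 would argue for corroborating the result with other measures.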
An understanding of etiology, such as Down syndrome versus fragile X syndrome, is important because it can influence associated health risks, prognosis, and the specific educational and therapeutic approaches needed for effective management.
Clinicians should consider the representativeness of a test's norm sample, as well as its reliability and validity, in relation to the demographic and linguistic characteristics of the student population. Practical considerations, such as ease of administration and interpretation, are also crucial.
The standard error of measurement indicates the degree of confidence that can be placed in a test score's accuracy by quantifying its expected variability. It helps determine whether observed changes between test administrations reflect true differences in skill or merely measurement error.
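The standard formula is SEM = SD × √(1 − r), and an approximate 95% confidence interval is the observed score ± 1.96 × SEM. A brief illustration (the SD of 15, reliability of .90, and score of 85 are hypothetical values, not from the text):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    margin = z * sem(sd, reliability)
    return (score - margin, score + margin)

# Standard score scale: mean 100, SD 15; reliability r = .90
low, high = confidence_interval(85, 15, 0.90)
print(f"SEM = {sem(15, 0.90):.2f}")        # prints SEM = 4.74
print(f"95% CI: {low:.1f} to {high:.1f}")  # prints 95% CI: 75.7 to 94.3
```

The width of that band shows why a retest score of, say, 90 would not by itself demonstrate true improvement: it falls inside the measurement-error range of the original score.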
Cultural factors significantly influence the assessment of bilingual children with language impairments. Assessments must take into account the child's linguistic community and the cultural norms and expectations that may affect language use. Speech-language pathologists are encouraged to evaluate children in their native language and to use coordinated, interdisciplinary approaches that ensure a comprehensive, culturally sensitive assessment. Assessments should also include a dynamic component, such as mediated learning experiences, that can reveal the child's potential across cultural contexts. Finally, clinicians need to recognize and minimize their own cultural biases and ensure that their evaluations are relevant to the child's real-life interactions and social environments.
It is crucial that the normative sample of a test closely match the race, language background, and socioeconomic status of the child being assessed. Significant mismatches can undermine the test's validity, in which case practitioners may need to draw on cultural knowledge and alternative approaches, such as dynamic assessment.
Dynamic assessment provides a framework for nonbiased evaluation by emphasizing learning potential rather than static abilities. It incorporates mediation and focuses on a child's capacity to learn when given support, helping to mitigate cultural and linguistic biases.
Organic factors, such as genetic syndromes (e.g., Down syndrome, fragile X), offer biological explanations for developmental disabilities, whereas familial factors generally reflect socioenvironmental influences. These distinctions affect diagnosis, management, and understanding of the developmental trajectory.