
Cover

title: Assessment of Language Disorders in Children
author: McCauley, Rebecca Joan
publisher: Lawrence Erlbaum Associates, Inc.
isbn10 | asin:
print isbn13: 9780805825619
ebook isbn13: 9780585378947
language: English
publication date: 2001
lcc: RJ496.L35M375 2001eb
ddc: 618.92/855075
subject: Language disorders in children--Diagnosis; Communicative disorders in children--Diagnosis; Learning disabled children--Language--Evaluation
ASSESSMENT OF LANGUAGE DISORDERS IN CHILDREN
Rebecca J. McCauley
University of Vermont
Copyright © 2001 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form, by photostat, microform, retrieval system, or
any other means, without the prior written permission of the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430
Cover design by Kathryn Houghtaling Lacey
Library of Congress Cataloging-in-Publication Data
McCauley, Rebecca Joan, 1952–
Assessment of language disorders in children / Rebecca J.
McCauley.
p. cm.
ISBN: 0-8058-2561-4 (cloth : alk. paper)/ 0-8058-2562-2 (pbk. : alk. paper)
1. Language disorders in children—Diagnosis. 2. Communicative
disorders in children—Diagnosis. 3. Learning disabled children—
Language—Evaluation. I. Title.

RJ496.L35 M375 2001 00-050403


618.92’855075—dc21 CIP
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
To my parents,
Fred and Priscilla McCauley
Contents
Preface xi
Why I Wrote This Book
How This Book Is Organized
Acknowledgments
1 Introduction 1
Purposes of This Text 1
Why Do We Make Measurements in the Assessment and Management of Childhood Language
Disorders? 2
What Problems Accompany Measurement? 4
A Model of Clinical Decision Making 7
Summary 11
Key Concepts and Terms 11
Study Questions and Questions to Expand Your Thinking 12
Recommended Readings 12
References 12
PART I: BASIC CONCEPTS IN ASSESSMENT
2 Measurement of Children’s Communication and Related Skills 17
Theoretical Building Blocks of Measurement 17
Basic Statistical Concepts 24
Characterizing the Performance of Individuals 30
Case Example 38
Summary 43
Key Concepts and Terms 44
Study Questions and Questions to Expand Your Thinking 46
Recommended Readings 47
References 47
3 Validity and Reliability 49
Historical Background 49
Validity 51
Reliability 66
Summary 72
Key Concepts and Terms 73
Study Questions and Questions to Expand Your Thinking 75
Recommended Readings 76
References 76
4 Evaluating Measures of Children’s Communication and Related Skills 78
Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place 79
Evaluating Individual Measures 88
Summary 105
Key Concepts and Terms 106
Study Questions and Questions to Expand Your Thinking 106
Recommended Readings 107
References 107
PART II: AN OVERVIEW OF CHILDHOOD LANGUAGE DISORDERS
5 Children with Specific Language Impairment 113
Defining the Problem 113
Suspected Causes 116
Special Challenges in Assessment 127
Expected Patterns of Language Performance 130
Related Problems 132
Summary 137
Key Concepts and Terms 138
Study Questions and Questions to Expand Your Thinking 139
Recommended Readings 140
References 140
6 Children with Mental Retardation 146
Defining the Problem 147
Suspected Causes 149
Special Challenges in Assessment 156
Expected Pattern of Strengths and Weaknesses 158
Related Problems 161
Summary 161
Key Concepts and Terms 162
Study Questions and Questions to Expand Your Thinking 163
Recommended Readings 164
References 164
7 Children with Autistic Spectrum Disorder 168
Defining the Problem 169
Suspected Causes 173
Special Challenges in Assessment 174
Expected Patterns of Language Performance 176
Related Problems 178
Summary 181
Key Concepts and Terms 182
Study Questions and Questions to Expand Your Thinking 183
Recommended Readings 184
References 184
8 Children with Hearing Impairment 187
Defining the Problem 188
Suspected Causes 196
Special Challenges in Assessment 198
Expected Patterns of Oral Language Performance 203
Related Problems 204
Summary 205
Key Concepts and Terms 205
Study Questions and Questions to Expand Your Thinking 206
Recommended Readings 207
References 207
PART III: CLINICAL QUESTIONS DRIVING ASSESSMENT
9 Screening and Identification: Does This Child Have a Language Impairment? 213
The Nature of Screening and Identification 214
Special Considerations When Asking This Clinical Question 216
Available Tools 236
Practical Considerations 240
Summary 242
Key Concepts and Terms 243
Study Questions and Questions to Expand Your Thinking 244
Recommended Readings 244
References 244
10 Description: What Is the Nature of This Child’s Language? 250
The Nature of Description 251
Special Considerations for Asking This Clinical Question 252
Available Tools 255
Practical Considerations 280
Summary 283
Key Concepts and Terms 284
Study Questions and Questions to Expand Your Thinking 286
Recommended Readings 286
References 287
11 Examining Change: Is This Child’s Language Changing? 293
The Nature of Examining Change 294
Special Considerations for Asking This Clinical Question 296
Available Tools 311
Practical Considerations 317
Summary 321
Key Concepts and Terms 322
Study Questions and Questions to Expand Your Thinking 323
Recommended Readings 324
References 324
Appendix A 328
Appendix B 334
Author Index 339
Subject Index 353
Preface
Why I Wrote This Book

How This Book Is Organized

Acknowledgments
Why I Wrote This Book
"You can’t kill anyone with speech-language pathology."
I came to speech-language pathology by what was then an unconventional route—a Ph.D. in a nonclinical specialty
within behavioral sciences, followed by postdoctoral study, clinical practicum, and a clinical fellowship year. Thus, I
was unschooled in the humorous wisdom that is passed along with more standard fare to speech-language pathology
doctoral students through the years. I was able to glean only one or two such aphorisms from my contacts with a more
conventionally trained and clinically savvy colleague.
“You can’t kill anyone with speech-language pathology,” she said. A balm to the anxieties of a beginning clinician who
knows that there is so much she does not know. A bit of humor to help you while you learn. However, the more clients I
worked with, the more I was haunted by this aphorism. Certainly, killing was exceedingly rare to nonexistent, but
looming large were the specters of unfulfilled hopes and wasted time. The possibility for improving children’s lives
became ever clearer, but so did the possibility of less desirable outcomes.
Initially my clients were preschoolers whose parents were baffled by their children’s failure to express themselves
clearly, or they were school-aged children who were diagnosed with both language-learning disabilities and serious
emotional problems. More recently, my clients have included unintelligible children whose problems were largely
limited to their phonology as well as children whose problems encompassed not only that one aspect of language, but
almost all other areas one might examine. All of these clients—like those with whom you currently work or will soon
work—present us with puzzles to be solved and responsibilities to be met if we are to help them.
The puzzle presented by children with language disorders is the array of abilities and difficulties that they bring to
language learning and use. I use the word “puzzle” because, like puzzles, their problems at first suggest many alternative
modes of solution—some better, some worse, and some probably of no value at all. Thus, “responsibilities” follow from
our professional obligation to help children maximize their skills and minimize their problems in the process of
deciphering the particular pattern of intricacies they present.
In short, the reason I wrote this book was to help identify better ways of dealing with the puzzles and responsibilities that
are so frustratingly linked in our interactions with our clients. By finding the best ways of dealing with these puzzles and
responsibilities we can avoid the harm implied by the aphorism quoted earlier and can instead enrich their lives by
helping them improve their communication with others.
How This Book Is Organized
Overall Organization of the Book
This book is divided into three major sections. In Part I, concepts in measurement are explained as they apply to
children’s communication. Although some of these concepts are quantitative in nature, others relate to the social context
in which measurements are made and used. Special emphasis is placed on the concepts of validity and reliability because
all other measurement characteristics are ultimately of interest by virtue of their effects on reliability and, more
importantly, on validity. This part of the book concludes with a chapter providing direct advice regarding the
examination of materials associated with measurement tools for purposes of determining their usefulness for a particular
child or group of children.
In Part II, four major categories of childhood language disorders are discussed: specific language impairment (chap. 5),
language problems associated with mental retardation (chap. 6), autism spectrum disorders (chap. 7), and language
problems associated with hearing impairment (chap. 8). These four categories were selected because they are the most
frequently occurring childhood language disorders. Although children across these disorder categories share many
problems, each group also presents unique challenges to assessment and management. Some of these challenges relate to
the heterogeneity of language and other abilities shown by children in the category, the relative amount of information
available due to the rarity of the problem, and the often diverse theoretical orientations of researchers. Each of these
chapters provides a bare-bones introduction to the disorder category: its suspected causes,
special challenges to language assessment, expected patterns of language performance, and accompanying problems that
are unrelated to language. A full description of any one of these disorders would require several books as long as this
one. Consequently, readers are directed to more comprehensive sources for further learning but are given sufficient
information to anticipate how language assessment will need to be focused in order to begin to respond to the special
needs of each group of children.
In Part III, three major types of questions that serve as the starting points for assessment are introduced and then pursued
in detail—from theoretical underpinnings to currently available measures. The major questions correspond to steps in the
clinical interaction. First, the clinician must determine whether a language problem exists; second, he or she must
determine the nature of the problem—both in terms of specific patterns of impairment across language domains and
modalities and in terms of specific problem areas within each domain and modality. Finally, he or she must track change,
determining how the client’s behaviors are changing and whether treatment seems to be the cause of identified
improvements. In the course of addressing each of these questions, the reader is taken through the steps required to move
from the question to the tools available to answer it for any given client.
Organization within Chapters
Each chapter contains several features designed to assist readers in mastering new content and in searching the text for
specific information. Chapter outlines and enumerated summaries of major points aid readers interested in obtaining an
overview of chapter content. To help readers with new or unfamiliar vocabulary, key terms are highlighted in the text,
defined when of particular importance, and listed at the end of each chapter. Finally, a list of study questions and
recommended readings is designed to allow readers to pursue topics further.
Acknowledgments
Whereas the flaws of this book are certainly of my own doing, its virtues owe much to the help I have received from
colleagues and friends. Numerous colleagues in Vermont and elsewhere read sections of the book and contributed
greatly to my understanding of the diverse group of children described in it and deserve my considerable thanks. Among
them are Melissa Bruce, Kristeen Elaison, Laura Engelhart, Julie Hanson, and Julie Roberts. In addition, I owe special
appreciation to Barry Guitar, whose experience with his own books helped him provide the most meaningful
encouragement and advice on all aspects of the project. I am particularly grateful for his ability to temper constructive
criticism with ego-boosting praise. My long-time colleague and friend Martha Demetras took on a heroic and most
helpful reading of a near-final form of the book. She, along with Frances Billeaud, Bernard Grela, and Elena Plante, read
some of the most challenging sections and tried to help keep me on track. At Lawrence Erlbaum Associates, Susan
Milmoe, Kate Graetzer, Jenny Wiseman, and Eileen Engel have helped me countless times through their expertise and
patience. Irene Farrar took my graphics and made them both clearer and more interesting, and Kathryn Houghtaling made the cover all I could have hoped for. She did this with the help of the photographer Holly Favro and her most graceful niece, Sara Faust.
Although not involved with this project directly, there are several mentors who have shaped my interest in the topics
discussed here and contributed substantially to my ability to tackle those topics as well as I have. They have my respect
and gratitude always: Ralph Shelton, Linda Swisher, Betty Stark, Dick Curlee, and Dale Terbeek. Finally, I owe great
thanks to my parents, who each read and commented on some portion of the book and who provided encouragement
along the way, not to mention the foundation that led me to want to pursue this project.
CHAPTER 1

Introduction
Purposes of This Text

Why Do We Make Measurements in the Assessment and Management of Childhood Language Disorders?

What Problems Accompany Measurement?

A Model of Clinical Decision Making


Purposes of This Text
The distraught parents of a 3-year-old with delayed communication arrive at the office of a speech-language pathologist,
youngster in tow and anxiety emanating almost palpably with every word: "Does our child have a serious problem?"
“What can be done to correct it?” “How effective will treatment be?”
Although the children and the specific questions change, the scene remains the same: A child’s parents or teacher turn to
a speech-language clinician for help that will include answers to specific questions about whether a language problem
exists, its nature, and how to intervene to minimize or remove its effects. This book focuses on basic elements of
measurement of childhood language disorders as the means of providing valid clinical answers to these questions
because only with valid clinical answers can effective clinical action be taken.
Specifically, this book is designed to prepare readers to select, create, and use behavioral measures as they assess,
manage, and evaluate treatment efficacy for children with language disorders. Although it is designed to provide guidance for those working with children with any
language disorder, the greatest attention is paid to specific language impairment, autism, and language disorders related
to mental retardation and hearing impairment.
This book is intended primarily for graduate and undergraduate students who expect to enter the field of communication
disorders. It may also serve as a refresher for professionals, such as practicing speech-language pathologists or teachers,
who have never been formally introduced to some of the basic concepts behind the wide range of measures used in the
assessment of childhood language disorders or who would like an introduction to the latest developments in this area.
Unfortunately, the topic of measurement in childhood language disorders has the reputation of threatening complexity.
Indeed, measurement of language, or communication more generally, is complex both because of the wealth of abilities
and behaviors underlying language use and because of the variety of measurement orientations on which speech-
language pathology and audiology draw. Although direct roots in educational and psychological testing traditions are
particularly robust, there are also connections to measurement traditions in linguistics, personnel management, medicine,
public health, and even acoustics. The approach taken here attempts to blend the best of these traditions and alert readers
to the elements they share.
For all readers, the text is intended to achieve three goals. First, readers will learn to recognize the bond that ties the
quality of clinical actions to the quality of measurement used in the process of clinical decision making for children with
suspected language disorders. Second, they will learn how to frame clinical questions in measurement terms by
considering the information needed and the specific methods available to answer them. Third, they will learn to
recognize that all measurement opportunities present alternatives—at times alternatives of comparable merit, but more
often alternatives that vary in their ability to answer the clinical question at hand. This last goal will enable readers to act
as critical consumers and discriminating developers of clinical tools for language measurement. Case examples are used
frequently in the text to help readers apply new concepts and methods to specific problems like those they currently face
or will soon encounter.
Why Do We Make Measurements in the Assessment and Management of Childhood Language
Disorders?
The following three cases illustrate a variety of occasions in which measurement serves as the basis for clinical actions
involving children with various language difficulties.
Two-year-old Cameron has been scheduled for a communication evaluation because of parental concerns that he uses
only two words and does not appear to understand as well as his older sister did at a much younger age. Additionally, he
generally avoids eye contact, which his parents find particularly alarming because of recent exposure to a television
show on autism. Thus, they have specific questions about whether their child has autism and what they can do to
improve his ability to communicate with other members of the family.
Alejandro, a diminutive 9-year-old who hardly seems imposing enough for such a distinguished name, moved from Mexico to the United States a year ago and has just moved into a new school district. Although he has been diagnosed with
a language disorder, no information concerning the relation of that language disorder to his bilingualism has
accompanied him to his new school. Decisions regarding his school placement and access to special services will hinge
on that information.
Four-year-old Mary Beth has been referred by her pediatrician to your private practice for a complete evaluation of her
communication skills. Although she has been receiving speech-language treatment since she was 2 years of age because
of Down syndrome, Mary Beth has not made progress at the rate expected by her regular speech-language pathologist
or desired by her parents. In fact, she appears to have made almost no progress in the past year and may be losing skills
in some areas.
These three cases illustrate the varied problems facing children and families who turn to speech-language pathologists
for solutions. They also illustrate the speech-language pathologist’s role as part of a larger team of professionals.
First, Cameron’s parents are faced with a child who appears quite delayed in his expressive and receptive language and
who may also evidence difficulties in the nonverbal underpinnings of communication. Addressing their chief concern
will require an interdisciplinary effort involving several professionals (including possibly a psychologist, a neurologist, a
developmental pediatrician, and a social worker) designed to yield a differential diagnosis. If autism is diagnosed, the
need for interdisciplinary efforts will continue because of the array of problems often associated with autism—ranging
from mental retardation to sleep disorders. The family’s needs, as well as the child’s, may be intense, with the result that
the speech pathologist’s focus on the child’s communication may broaden to encompass the family communication
context as well as the coordination of efforts aimed at the child’s overall needs.
Alejandro presents the speech-language pathologist with the difficult task of determining to what extent his language difficulties are differences not unlike those facing anyone with undeveloped skills in a new language and to what extent they reflect an underlying disorder in language learning affecting both his native and second languages. In
addition to decisions regarding the nature of direct therapy that he should receive (including whether it should be
conducted in Spanish or English), critical decisions regarding his classroom placements are pressing. Not only will the
speech-language pathologist need to work closely with his family and teachers to reach these decisions, he or she may
also need to work with a translator or cultural informant to arrive at the best decisions for Alejandro’s academic and
social future.
Finally, Mary Beth’s parents and pediatrician are interested in receiving information that will shed some light on her lack
of progress in speech-language treatment. Such information could help guide her subsequent treatment by providing her
parents, pediatrician, and regular speech-language pathologist with a better understanding of her current strengths and
weaknesses and, consequently, a better understanding of reasonable next steps. It should be noted, however, that Mary
Beth’s parents might also use this information as they consider suing the speech-language pathologist responsible for her
care. Although this prospect is remote, it is nonetheless an increasing possibility (Rowland, 1988).
These three cases reveal that speech-language pathologists are asked to obtain and use information to help children from
a variety of cultural backgrounds and with a range of communication problems. Although they obtain much of that information directly, they must often work with families
and other professionals to stand a chance of getting the “facts.” Speech-language pathologists use some of this
information themselves, such as when they identify and describe a language disorder or plan their role in treatment. They
also share information with others, including doctors, teachers, and other individuals who work with persons
experiencing a communication disorder. In brief, then, speech-language pathologists generate, use, and share information
having potentially vital medical, educational, social, and even legal significance.
So how does measurement enter into the strategies used to address children’s needs? Put simply and in terms specific to
its use in communication disorders, measurement can be seen as the methods used to describe and understand
characteristics of persons and their communication as part of clinical decision making, the process by which the clinician
devises a plan for clinical action. Thus, it is the connection between clinical decisions and clinical action that makes
measurement matter (Messick, 1989). Clinicians make numerous, almost countless decisions about a child in the course
of a clinical relationship—from determining that a communication disorder exists, to selecting a general course of
treatment, to examining the efficacy of a very specific treatment task. Because the clinician bases her actions at least in
part on measurement data obtained from the client, the quality of the action will be closely related to the quality of the
data used to plan it. The section that follows considers several decision points that offer opportunities for successes—or
failures—in clinical decision making.
What Problems Accompany Measurement?
Table 1.1 lists five different kinds of decisions occurring in the course of a clinical relationship as well as some of the
measures that might be used to provide input to each decision. This listing is intended to illustrate the variety of
decisions to be made rather than to list them exhaustively. As illustrated in the table, decision making begins even prior
to the initiation of an ongoing clinical relationship, as the speech-language pathologist screens communication skills to
determine whether additional attention is warranted. Subsequently, the clinician will require more information to
understand the nature of the problem presented and to arrive at decisions about how best to manage it. Once a program
of management is in place, ongoing measurement is required to respond to the client’s changing needs and
accomplishments. Even the end of the clinical relationship is based on the clinician’s use of measurement, with dismissal from treatment usually occurring when communication skills are normalized, maximum gains have been effected, or
treatment has been found to be unsuccessful. At each of the points of decision making, the potential for harm enters hand
in hand with the potential for benefit.
Table 1.1
Clinical Decisions in Speech-Language Pathology

Clinical decision: Screening for a language disorder
Related clinical actions: Refer for complete evaluation; counsel client and family; inform and confer with relevant professionals; refer for related evaluations
Types of measures used: Client and family interview; standardized screening measure; informal clinician-designed measure

Clinical decision: Diagnosis of a language disorder
Related clinical actions: Recommend treatment, monitoring, or no treatment; counsel client and family; inform and confer with relevant professionals
Types of measures used: Client and family interview; standardized norm-referenced tests; parent report instruments

Clinical decision: Planning for management of a language disorder
Related clinical actions: Recommend type and frequency of treatment; identify strengths and weaknesses in communicative functioning; consult with professionals serving client needs (e.g., educators, psychologists, physicians)
Types of measures used: Standardized norm-referenced or criterion-referenced tests; informal measures related to specific treatment goals, used to describe domains for which measures are unavailable, or requiring a realistic setting (e.g., functional performance in the classroom)

Clinical decision: Assessment of change in communication over time
Related clinical actions: Infer developmental trends; modify treatment plan; document treatment efficacy; dismiss from treatment
Types of measures used: Standardized norm-referenced or criterion-referenced tests; informal measures related to specific treatment goals; single-subject experimental designs

Clinical decision: Identification of need for additional information in a related area
Related clinical actions: Refer to a related professional for additional information
Types of measures used: Client and family interview; standardized norm-referenced tests; informal clinician-devised measure

A brief reconsideration of the case of Mary Beth can be used to illustrate the potential for clinical harm as well as to introduce a method for evaluating the effects of different kinds of errors in decision making. Recall that Mary Beth has received speech-language treatment for 2 years because of an early diagnosis of Down syndrome. Her lack of any progress in speech and language over the past year or, worse yet, her loss of skills may represent a poor fit between the assessment tools used to measure progress and the areas in which Mary Beth has in fact advanced, or it may reflect unsatisfactory clinical practice on the part of her regular speech-language pathologist. On the other hand, this lack of progress may reflect a change in Mary Beth’s neurological status that requires medical attention. Therefore, one of the most immediate decisions to be made from a speech-language perspective is whether to refer Mary Beth to a neurologist.
Figure 1.1, a decision matrix, illustrates a method for thinking about the possible outcomes associated with this particular
decision. This type of decision matrix has been used to assess the implications of alternative choices in a variety of fields (Berk, 1984; Thorner & Remein, 1962; Turner & Nielsen, 1984).

Fig. 1.1. A decision matrix for the decision of whether to refer Mary Beth for neurologic evaluation.

To construct such a matrix as a means of considering repercussions for a single case, one
Turner & Nielsen, 1984). To construct such a matrix as a means of considering repercussions for a single case, one
pretends that one has access to the ultimate “truth” about what is best for Mary Beth. From that perspective, a referral
either should or should not be made—no doubts.
With such perfect knowledge, therefore, suppose that a referral should be made. In that case, the clinician will have made
a correct judgment if he or she has referred and an incorrect one if he or she has not. If the clinician errs by not referring,
Mary Beth may become involved in the expense and frustration of continuing speech-language treatment that is doomed
to failure. Further she may be delayed in or prevented from receiving attention for an incipient neurologic condition,
which, in turn, could have serious, even life-threatening consequences. Although this error might be corrected over time,
its effects are likely to be relatively long lasting and potentially costly in terms of time and money.
On the other hand, suppose that the "truth" is that a referral is not needed and therefore should not be made. In that case
the clinician will have made a correct judgment if she has not referred and an error if she has. Plausibly, this type of error
may result in a needless expenditure of time and money and in undue concern on the part of Mary Beth’s family. A bit
more positively, however, the effects of this error would probably be relatively short-lived: Once the neurologic
evaluation took place, the concern would probably end.
A decision matrix makes it clear that different errors in clinical decision making are associated with different effects.
Errors vary in terms of the likelihood that they will be detected, the time course for that detection, and the nature of costs they will exact from the client and clinician.
The decision matrix, therefore, is a particularly powerful tool because it allows one to examine both the frequency and
type of errors made. I return to this type of matrix frequently because of its helpfulness in thinking about tools used to
reach clinical decisions.
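The logic of a decision matrix like the one in Fig. 1.1 can be sketched in a few lines of code. The following is a minimal illustration only; the outcome labels and cost descriptions are assumptions added for the sketch, not values taken from the chapter.

```python
# A minimal sketch of the decision matrix in Fig. 1.1 for the referral
# decision. Outcome labels and cost notes are illustrative assumptions.

# Keys pair the (hypothetical) "truth" about whether a referral is
# needed with the clinician's actual decision; values give the outcome
# type and a note on its likely consequences.
decision_matrix = {
    ("needed", "referred"): ("correct", "neurologic condition addressed promptly"),
    ("needed", "not referred"): ("error", "long-lasting: delayed medical attention, wasted treatment"),
    ("not needed", "referred"): ("error", "short-lived: needless expense and worry"),
    ("not needed", "not referred"): ("correct", "no needless evaluation"),
}

def judge(truth: str, decision: str) -> tuple[str, str]:
    """Look up the outcome type and consequence note for one cell."""
    return decision_matrix[(truth, decision)]
```

As the chapter notes, the two error cells differ sharply in their costs: failing to refer when a referral is needed carries the more serious and longer-lasting consequences, whereas an unneeded referral exacts a cost that is comparatively brief.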
In the next section of this chapter, I introduce methods used to understand (and therefore potentially to improve) clinical
decision making. Their description is followed by the introduction of a model that is intended to serve as a framework in
which to think about the steps involved in formulating and answering clinical questions.
A Model of Clinical Decision Making
The processes by which individuals make decisions about complex problems—such as those involved in a variety of
clinical settings—have been the focus of several lines of research (Shanteau & Stewart, 1992; Tracey & Rounds, 1999).
Each differs from the others somewhat in intent, but all have something to offer anyone interested in clinical decision
making.
First, decision making has been of interest to psychologists who want to understand how complicated problems are
solved and to what extent those who are acknowledged “expert” problem solvers in a given area (e.g., chess, medicine,
accounting) differ from naive problem solvers (Barsalou, 1992). Second, skilled decision making has been studied by
researchers from a variety of disciplines who wish to develop computer programs called expert systems, which seek to
mimic expert performance (Shanteau & Stewart, 1992). Such researchers have focused on the creation of computer
programs yielding optimal clinical judgments. Because they focus on successful decision making, these researchers have
been uninterested in understanding expert errors in decision-making. Finally, there has been a much smaller group of
researchers who study the nature and process of decision making in specific fields for the benefit of the field itself. In
speech-language pathology and audiology, such research has increased dramatically over the last decade (e.g., McCauley
& Baker, 1994; Records & Tomblin, 1994; Records & Weiss, 1990). Researchers in this third category tend to be
interested in both errors and successful performance, often as a means of improving professional training.
You may be asking, “How does research on decision-making relate to measurement in speech-language pathology?” and
more specifically, “How can it help me be a better professional?” To begin with, a detailed understanding of expert
clinical decision making may help beginning clinicians reach the ranks of “expert” more quickly. For example, such an
understanding may identify which sources of information and which methods experts use—as well as which ones they
avoid. Another potential benefit of research in clinical decision making is that it may identify problems that beset even
experienced clinicians, thereby helping decision makers at all levels be vigilant in avoiding them (e.g., Faust, 1986;
Tracey & Rounds, 1999). A relatively brief description of two such problems may help illustrate the potential value of
this type of research.
In a review of research on human judgment in clinical psychology and related fields, Faust (1986) described clinicians’
over-reliance on confirmatory strategies. Essentially, the use of a confirmatory strategy means that after forming a
hypothesis early in the course of decision making (e.g., regarding a diagnosis, etiology, or some other clinical question), the
clinician proceeds to search out and emphasize information tending to confirm the hypothesis. At the same time, she or
he may fail to search out discrepant evidence. The tendency for very able clinicians to adopt such a strategy has been
demonstrated repeatedly in studies in which clinicians are asked to make decisions on hypothetical clinical data
(Chapman & Chapman, 1967, 1969; Dawes, Faust, & Meehl, 1993).
For an example of how a confirmatory strategy might operate in a case of decision making in speech-language
pathology, I return to the case of Alejandro. Suppose that Alejandro’s clinician initially develops the hypothesis that
Alejandro responds most consistently when communicating in English. The clinician would be using a confirmatory
strategy if she or he failed to evaluate Alejandro’s performance in Spanish and informally sought teachers’ impressions
of how well Alejandro was responding to the English-only approach she had recommended, but did so in such a way as
to invite only positive reactions.
A second example of a problem in clinical decision making has been described as the failure to “realize the extent to
which sampling error increases as sample size decreases” (Faust, 1986, p. 421). Tversky and Kahneman (1993)
described this practice as evidence of “the belief in the law of small numbers,” by which they mean the tendency to
assume that even a very small sample is likely to be representative of the larger population from which it is drawn.
Returning to one of the hypothetical cases presented earlier, imagine this sort of problem as menacing the clinician who
is to evaluate Mary Beth, the youngster with Down syndrome. Suppose that clinician had seen only two or
three children with Down syndrome during her clinical career—each of whom had made exceptionally poor progress.
The danger would be that the clinician would consider those few children she had seen as representative of all children
with that diagnosis, thereby causing her to downplay the stated concerns about Mary Beth’s lack of progress.
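Tversky and Kahneman’s point can be demonstrated directly with a short simulation. The Python sketch below (using invented scores, not clinical data) draws repeated samples from the same large population and shows that the means of very small samples scatter far more widely than the means of large ones:

```python
import random
import statistics

random.seed(1)

# Hypothetical population of 100,000 scores (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(100_000)]

def spread_of_sample_means(sample_size, n_samples=2000):
    """SD of the means of many random samples of the given size."""
    means = [statistics.mean(random.sample(population, sample_size))
             for _ in range(n_samples)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(3)    # e.g., three children seen in a career
spread_large = spread_of_sample_means(100)
print(f"n = 3: {spread_small:.1f}; n = 100: {spread_large:.1f}")
```

Theory predicts standard errors of roughly 15/√3 ≈ 8.7 and 15/√100 = 1.5, and the simulation agrees: a clinician’s sample of three children can easily be wildly unrepresentative of the population those children are drawn from.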
Neither of these problems in clinical decision making has been seen as evidence of gross incompetence. Although poor
clinicians may succumb more frequently to these practices, the practices themselves should be of considerable concern to
scientifically oriented clinicians precisely because they seem to be related to tendencies in human problem solving, and
they must actively be worked against for the good of clients and of the profession.
Once aware that bad habits such as those described above may creep into clinical decision making, the wary clinician
can seek remedies. Among the remedies recommended for the tendency to use a confirmatory strategy is the adoption of
a disconfirmatory strategy, in which evidence both for and against one’s pet hypothesis is sought after and valued.
Similarly, a belief in the law of small numbers can be undermined by reminders that when one has only limited
experience with individuals with a particular type of communication disorder, the characteristics of people from that
sample are quite likely to be unrepresentative of that population as a whole.
Although the process by which speech-language pathologists and audiologists reach clinical decisions is far from well
understood at this point (Kamhi, 1994; Yoder & Kent, 1988), the model shown in Fig. 1.2 is intended to serve as a
working model that can be elaborated on as understanding increases. Such a graphic model can help emphasize the varied
nature of the processes involved in reaching complex clinical decisions, including both those that are very deliberate and
readily available for inspection as well as those that are almost automatic and less available for observation.
Fig. 1.2. A model illustrating the ways in which measurements are used to reach clinical decisions leading to the
initiation or modification of clinical actions.
The process of clinical decision making is initiated as the speech-language pathologist formulates one or more clinical
questions. Although such questions may often coincide with those actually expressed by the client, they may not always
do so. Thus, for example, the parents of 3-year-old Mary Beth may not have expressed interest in having her hearing
status evaluated. On the other hand, her speech-language pathologist would see that as a critically important question,
given both the susceptibility to middle ear infection with associated hearing loss among children with Down syndrome
and the pivotal role of hearing in speech-language acquisition. This example points out that clinical questions
arise both from clients’ expressions of need and from the expert knowledge possessed by the clinician.
The formulation of clinical questions is of central importance to the quality of clinical decision making because it drives
all that follows. First, the clinical question determines what range of information should be sought. Second, it guides the
clinician in the selection or creation of appropriate measurement tools. In fact, it is widely held that any measurement
tool can only be evaluated in relation to its adequacy in addressing a specific clinical question (American Educational
Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in
Education [NCME], 1985; Messick, 1989). No measure is intrinsically “good” or “valid.” Rather, the quality of a
measure varies depending on the specific question it is used to address. Thus, for example, a given language test may be
an excellent tool for answering a question about the adequacy of 4-year-old Mary Beth’s expressive language skills, yet
it may be a perfectly awful tool if used to examine such skills for 9-year-old bilingual Alejandro.
Optimally, specific measurement tools will be selected so as to address the full scope of each clinical question being
posed using the best measures available (Vetter, 1988b). For some questions, however, the wealth of commercially
available standardized tests and published procedures will fail to yield any acceptable measure, or even any measure at
all. At such times, clinicians may decide to develop an informal measure of their own (Vetter, 1988a), or they may
simply have to admit that not all clinical questions for all clients are answerable (Pedhazur & Schmelkin, 1991).
The administration or collection of selected clinical measures is certainly the most obvious portion of the clinical
decision-making process. Its importance can be emphasized by reference to the data-processing adage “garbage in,
garbage out.” Put more decorously, the act of skillful administration is crucial to the quality of information obtained.
Haphazard compliance with standard administration guidelines may render the information obtained spurious and
misleading, thereby undermining all later efforts of the clinician to use it to arrive at a reasonable clinical decision.
Following data collection, the clinician examines information obtained across a variety of sources and integrates that
information to address specific clinical questions. For example, in order to comment on the reasonableness of progress
made by Mary Beth during the past 2 years, her speech-language pathologist will need to perform a Herculean task:
integrating measures related to speech, language, hearing, and nonverbal cognition across time and content areas.
Components of the clinical decision-making process outlined in Fig. 1.2 have received differing amounts of attention
from speech-language pathology and audiology professionals. Thus, for example, considerable attention has been paid to
the formulation of relevant clinical questions for specific categories of communication disorders (e.g., Creaghead,
Newman, & Secord, 1989; Guitar, 1998; Lahey, 1988). On the other hand, little has been written about how clinicians
can use such information to arrive at effective clinical decisions (Records & Tomblin, 1994; Turner & Nielsen, 1984).
Therefore, in the remainder of this text, both venerable concepts and emerging hypotheses will be shared to help readers
improve the quality of their clinical decision
making and, consequently, of their clinical actions toward children with developmental language disorders.
Summary
1. Measurement of developmental language disorders draws on methods used in a wide variety of disciplines.
2. The purposes of this text are to help readers learn to frame effective clinical questions that will guide the decision-
making process, to recognize that all measurement opportunities present alternatives, and to recognize the connection
between the quality of clinical actions and the quality of measurement used in the clinical decision-making process.
3. Speech-language pathologists use information obtained through measurement to arrive at diagnoses that affect
medical, educational, social, and even legal outcomes. They derive this information cooperatively with others (e.g.,
families and other professionals) and share it with others as a means of achieving the child’s greatest good.
4. Measurement is important because it helps drive clinical decision making, which in turn affects clinical actions.
5. Measurement is used to address clinical questions related to screening, diagnosis, planning for treatment, determining
severity, evaluating treatment efficacy, and evaluating change in communication over time.
6. The cognitive processes involved in clinical decision making are not well understood but have begun to be studied in
research addressing complex problem solving, computer expert systems, and specific issues within a variety of fields
(e.g., medicine, special education).
7. Examples of problematic tendencies that have been identified as possible barriers to effective clinical decision making
include the use of confirmatory strategies and the belief in the law of small numbers.
Key Concepts and Terms
belief in the law of small numbers: the tendency to overvalue information obtained from a relatively small sample of
individuals, for example, those few individuals with an uncommon disorder with whom one has had direct contact.
clinical decision making: the processes by which clinicians pose and answer clinical questions as a basis for clinical
actions such as diagnosing a communication disorder, developing a treatment plan, or referring a client for medical
evaluation.
confirmatory strategy: the tendency to seek and pay special attention to information that is consistent with a clinical
hypothesis while failing to seek, or undervaluing, information that is not consistent with the hypothesis.
decision matrix: a method used to consider the outcomes associated with correct and incorrect decisions.
differential diagnosis: the identification of a specific disorder when several diagnoses are possible because of shared
symptoms (self-reported problems) and signs (observed problems).
measurement: methods used to describe and understand characteristics of a person.
Study Questions and Questions to Expand Your Thinking
1. Taking each of the three cases described earlier in the chapter, use Table 1.1 to determine what types of clinical
decisions and related clinical actions are likely to be required for each.
2. For each of those cases used in Question 1, identify a binary clinical decision and consider the implications of the two
kinds of errors that can result.
3. On the basis of your current knowledge of children with language disorders, develop a hierarchy of outcomes that
might result from clinical errors in the following cases:
● screening of hearing in a 4-month-old infant;
● collection of treatment data in English for a child whose first language is Vietnamese;
● collection of trial treatment data for purposes of selecting treatment goals for a child exhibiting significant semantic delays;
● evaluation of language skills in a child who exhibits severe delays in speech development.
4. Think about decisions—big and small—that you may have made during the last week. Try to remember the process by
which you reached your decision. Did any of your decision making involve the use of a confirmatory strategy? Describe
the specific example and how your thinking might have differed if you had avoided such a strategy.
Recommended Readings
Barsalou, L. W. (1992). Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence Erlbaum
Associates.
McCauley, R. J. (1988). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and Hearing
Association, Spring 1988, 6–9.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K.
Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.
References
American Educational Research Association (AERA), American Psychological Association (APA), & National Council
on Measurement in Education (NCME) (1985). Standards for educational and psychological testing. Washington, DC:
APA.
Barsalou, L. W. (1992). Thinking. Cognitive psychology: An overview for cognitive scientists. Hillsdale, NJ: Lawrence
Erlbaum Associates.
Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: C. C. Thomas.
Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of
Abnormal Psychology, 72, 193–204.
Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs.
Journal of Abnormal Psychology, 74, 271–280.
Creaghead, N. A., Newman, P. W., & Secord, W. A. (1989). Assessment and remediation of articulatory and
phonological disorders. Columbus: Merrill.
Dawes, R. M., Faust, D., & Meehl, P. E. (1993). Statistical prediction versus clinical prediction: Improving what works.
In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp.
351–367). Hillsdale, NJ: Lawrence Erlbaum Associates.
Faust, D. (1986). Research on human judgment and its application to clinical practice. Professional Psychology, 17, 420–
430.
Guitar, B. (1998). Stuttering: An integrated approach to the nature and treatment (3rd ed.). Baltimore, MD: Williams &
Wilkins.
Kamhi, A. G. (1994). Toward a theory of clinical expertise in speech-language pathology. Language, Speech, Hearing
Services in Schools, 25, 115–118.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
McCauley, R. J. (1988, Spring). Measurement as a dangerous activity. Hearsay: Journal of the Ohio Speech and
Hearing Association, 6–9.
McCauley, R. J., & Baker, N. E. (1994). Clinical decision-making in specific language impairment: Actual cases. Journal
of the National Student Speech-Language-Hearing Association, 21, 50–58.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–104). New York: American Council
on Education and Macmillan Publishing.
Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Records, N. L., & Tomblin, J. B. (1994). Clinical decision making: Describing the decision-rules of practicing speech-
language pathologists. Journal of Speech and Hearing Research, 37, 144–156.
Records, N. L., & Weiss, A. (1990). Clinical judgment: An overview. Journal of Childhood Communication Disorders,
13, 153–165.
Rowland, R. C. (1988). Malpractice in audiology and speech-language pathology. Asha, 45–48.
Shanteau, J., & Stewart, T. R. (1992). Why study expert decision making? Some historical perspectives and comments.
Organizational Behavior and Human Decision Processes, 53, 95–106.
Thorner, R. M., & Remein, Q. R. (1962). Principles and procedures in the evaluation of screening for disease. Public
Health Service Monograph No. 67, 408–421.
Tracey, T. J., & Rounds, J. (1999). Inference and attribution errors in test interpretation. In J. W. Lichtenberg & R. K.
Goodyear (Eds.), Scientist-practitioner perspectives on test interpretation (pp. 113–131). Boston: Allyn & Bacon.
Turner, R. G., & Nielsen, D. W. (1984). Application of clinical decision analysis to audiological tests. Ear and Hearing,
5, 125–133.
Tversky, A., & Kahneman, D. (1993). Belief in the law of small numbers. In G. Keren & C. Lewis (Eds.), A handbook
for data analysis in the behavioral sciences: Methodological issues (pp. 341–350). Hillsdale, NJ: Lawrence Erlbaum
Associates.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making
in speech-language pathology (pp. 192–193). Toronto: BC Decker Inc.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision
making in speech-language pathology (pp. 190–191). Toronto: BC Decker Inc.
Yoder, D. E., & Kent, R. D. (Eds.) (1988). Decision making in speech-language pathology. Toronto: BC Decker Inc.
PART I
BASIC CONCEPTS IN ASSESSMENT
CHAPTER 2
Measurement of Children’s Communication and Related Skills
Theoretical Building Blocks of Measurement
Basic Statistical Concepts
Characterizing the Performance of Individuals
Case Example
Theoretical Building Blocks of Measurement
What Is Measured by Measurements?
Measurements are usually indirect, that is, they involve the description of a characteristic taken to be closely related to
but different from the characteristic of interest. As an illustration of this notion, Pedhazur and Schmelkin (1991)
considered temperature. Conceptually, temperature is most closely related to the rate of molecular movement within a
material, yet it is almost always measured using a column of mercury. In this way, the measurement is made indirectly
using the height of the column of mercury as the indicator, or indirect focus of measurement. Although it would be
possible to determine the rate of molecular movement more directly, this is not done because of the considerable expense
and effort involved.
Similarly, measurements of behavior or other characteristics of people are almost always indirect. Consider, for example,
a characteristic that might be of interest to a
speech-language pathologist, such as a child’s ability to understand language. Clearly, as in the case of temperature, one
cannot easily measure this characteristic in a direct fashion. In fact, the ability to understand language cannot ever be
directly measured but instead must be inferred from a variety of indicators. This is because that ability is a theoretical
construct,1 a concept used in a specific way within a particular system of related concepts, or theory. Thus, the
theoretical construct referred to here as “the ability to understand language” represents shorthand for the carefully
weighed observations one has made about people as they respond to the vocalizations of others as well as for the
information one has read or been told about this construct by others. Figure 2.1 attempts to capture the complex
relationship between what one wants to measure, the theoretical construct, and the indicators used to measure it.
Looking at Fig. 2.1, you can see that there are many possible indicators for a single construct. This premise is important
to clinicians and researchers who need to recognize that any test or measure they use represents a choice from the set of
all possible indicators.
As will become clearer in later sections of this book, the wealth of indicators available for a construct presents flexibility
for those interested in measuring the construct, but it also presents potential problems. For example, a diverse range of
indicators for a single construct (e.g., intelligence) can lead to confusion when clinicians or researchers use different
indicators in relation to the same construct and reach different conclusions about both the construct and how the
characteristic being studied functions in the world. As an example, if one were to use an “intelligence” test that heavily
emphasizes knowledge of a particular culture, then use of that measure with children who come from a different culture
would lead to very different conclusions regarding how intelligent the children are.
Alternatively, focusing on a single indicator and ignoring the broader range of possible indicators for a given construct
can lead to its impoverishment. This type of problem has recently received attention in the literature on learning
disabilities, where it has been asserted that intelligence is synonymous with performance on one particular test—the
Wechsler Intelligence Scale for Children-Revised (Wechsler, 1974). Critics complain that the use of this single measure
means that the knowledge gained by such research may be far more limited in its appropriate application than has been
appreciated. In summary, the choice of which indicator and how many indicators are used in order to gain information
about a particular construct—be it intelligence, receptive language, or narrative production—have important implications
for the quality of information to be gained.
Pedhazur and Schmelkin (1991) described two kinds of indicators: reflective and formative indicators. Reflective
indicators represent effects of the construct, and formative indicators represent causes of it. An example of a reflective
indicator of one’s ability to understand a language would be the proportion of a set of simple commands in that language
that one can correctly follow. An example of a formative indicator of one’s ability to understand a language would be the
number of years one has
been exposed to it. Almost all indicators are reflective; however, formative indicators are sometimes used.
1 Within the literature on psychological testing, there is a tendency to refer to such constructs as latent variables.
Fig. 2.1. The relationship between a theoretical construct—single word comprehension—and several indicators that
could be used to measure it.
By this point, you may be scratching your head, wondering whether the term indicator is synonymous with the
somewhat more familiar term variable. In fact, those terms are quite closely related and, at times, may be used
synonymously. I introduced the term indicator first because variable is so closely associated with research that its
application to clinical measures might have seemed confusing. Consequently, I believe that an initial discussion of
indicators may help readers see how similar clinical and research measures are to one another while averting the
confusion. For the purposes of this book, indicator and variable will be used almost interchangeably to refer to a
measurable characteristic associated with a theoretical construct. However, variable is frequently used in a more
restricted way than indicator, to refer to a property that takes on specific values (Kerlinger, 1973).
One more term that commonly functions as a building block for measurement in descriptions of human behavior and
abilities is the operational definition. This term was originally introduced in physics by Bridgman (1927) to suggest that
in a given application (e.g., a specific research design or a particular clinical measure) a construct can be considered
identical to the procedures used to measure it. Operational definitions have been influential in communication disorders
because they have given rise to the clinical use of behavioral objectives, specific statements defining desired outcomes of
treatment for clients in terms that explain exactly how one will know whether the desired outcome has been achieved.
Operational definitions are probably most useful as a means of encouraging us to think carefully about the specific
indicators we use to gain information about a given theoretical construct.
TESTING AND MEASUREMENT CLOSE-UP:
ALFRED BINET AND THE POTENTIAL EVILS OF REIFICATION
In his 1981 book The Mismeasure of Man, Stephen Jay Gould, a noted biologist and popularizer of science, described
the work of Alfred Binet, the Frenchman who developed one of the first well-known intelligence tests. Gould noted
that Binet began to develop the test in 1904 when he was commissioned by the minister of education to devise a
practical technique for ‘‘identifying those children whose lack of success in normal classrooms suggested the need for
some form of special education” (p. 149). Almost as soon as the test came into use, Binet expressed hopes that its
results not be taken as iron-clad predictions of what a child could achieve, but that they be used as a basis for providing
help rather than as a justification for limiting opportunities. Gould went on to describe the regrettable dismantling of
Binet’s fond hope.
Gould’s book describes the process of the reification of intelligence, a process in which an abstract, complex
theoretical construct (such as “intelligence”) comes to have a life of its own, to be seen as real rather than the abstract
approximations that its originators may have had in mind. To illustrate this process, Gould described events in the
United States that occurred within a mere 20 years of Binet’s initial test development. Intelligence had been reified to
the point that it was used—or rather misused—as a basis for decisions having major effects on military service,
immigration policies, penal systems, and the treatment of individuals suspected of “mental defectiveness.”
Levels of Measurement
There are numerous ways to categorize measurements, but the notion of levels, or scales, of measurement introduced by
S. S. Stevens (1951) is one of the most influential and continues to inspire both defenders and attackers. Stevens’s levels
describe the mathematical properties of different kinds of indicators, or variables. The concept of levels is usually
defined operationally, with each level of measurement described in terms of the methods used to assign values to
variables—for example, whether the values are assigned using categories (normal vs. disordered) versus numbers
(percentage correct).
Typically, a hierarchical system of four ordered levels is discussed, in which the higher levels preserve greater amounts
of information about the characteristic being measured. Table 2.1 summarizes the defining properties of each level of
measurement and lists examples of each that relate to the assessment of childhood language disorders. These levels have
implications not only for our interpretation of specific measures but also for what statistics will be appropriate for their
further investigation.
Table 2.1
Three Levels of Measurement, Their Defining Characteristics, and Examples From Developmental Language Disorders
Nominal
Characteristics: Mutually exclusive categories.
Examples: Describing a child as having word-finding difficulties; labeling a child’s problem as specific language
impairment; describing a child’s use and nonuse for each of 14 grammatical morphemes.
Ordinal
Characteristics: Mutually exclusive categories; categories reflect a rank ordering of the characteristic being measured.
Examples: Describing the severity of a child’s expressive language difficulties as severe; characterizing a child’s
intelligibility along a rating scale, such as “intelligible with careful listening,” where no effort has been made to assure
that the scale has equal intervals; describing a child’s language in a conversational sample as productive at a particular
phase (Lahey, 1988).
Interval
Characteristics: Mutually exclusive categories; categories reflect a rank ordering of the characteristic being measured;
units of equal size are used, making the comparison of differences in numbers of units meaningful.
Examples: Summarizing a child’s standardized test performance using a raw or standard score; describing a child’s
spontaneous use of personal pronouns using the number of correct responses; rating intelligibility using an equal-interval
scale.
The nominal level of measurement refers to measures in which mutually exclusive categories are used. Diagnostic labels
and category systems for describing errors are frequently used examples of nominal measures. Although numerals may
sometimes be used as labels for nominal categories (e.g., serial numbers or numbers on baseball jerseys), nominal
measurements are not quantitative and simply involve the assignment of an individual or behavior to a particular
category. Measurement at this level is quite crude in that all people or behaviors assigned to a specific category are
treated as if they are identical.
Ideally, categories used in nominal level measures are mutually exclusive: Each person or characteristic to be measured
can be assigned to only one category. Diagnostic labels used in childhood language disorders can ideally be thought of as
nominal; however, they are not always mutually exclusive. For example, a child may have language problems associated
with both mental retardation and hearing impairment. Similarly, a child with mental retardation may show a pattern of
greater difficulties with linguistic than nonlinguistic cognitive functions, leading one to want to entertain a designation of
the child as both language impaired and mentally retarded (Francis, Fletcher, Shaywitz, Shaywitz, & Rourke, 1996).
The ordinal level of measurement refers to measures using mutually exclusive categories in which the categories reflect
an underlying ranking of the characteristic
Page 22
to be measured. Put differently, at this level, categories bear an ordered relationship to one another so that objects or
persons placed in one category have less or more of the characteristic being measured than those assigned to another
category. Despite the greater information provided at this level of measurement compared with the nominal level, it lacks
the assumption that categories differ from one another by equal amounts. Severity ratings are probably the most
commonly used ordinal measures in childhood language disorders.
Although ordinal measures reflect relative amounts of a characteristic, they are still not quantitative in the sense of
reflecting precise numerical relationships between categories. For example, although a profound expressive language
impairment may be regarded as representing “more” of an impairment than a severe expressive language impairment, it
is not clear how much more of the impairment is present.
One result of the absence of equal distances between categories (also called equal intervals) in an ordinal measure is that
when rankings are based on an individual judgment, they are likely to be quite inconsistent across individuals. Imagine
the case of a clinician who only serves children with devastatingly severe language impairments. When that clinician
uses the label mild to describe a child’s problems, it may mean something very different from the level of impairment
meant to be conveyed by the same label when it is used by clinicians serving a less involved population. Because of this,
it has been recommended that ordinal measures be used when the ratings made by a single individual will be compared
with one another, but not when ratings of several people will be compared (Allen & Yen, 1979; Pedhazur & Schmelkin,
1991).
The interval level of measurement refers to measures using mutually exclusive categories, ordered rankings of
categories, and units of equal size. It is the highest level of measurement usually encountered in measurements of human
abilities and behavior. Unlike measurements at the first two levels, measurements at this level can be considered
quantitative because numerical differences between scores are meaningful, as was not the case for numerals used at the
nominal or ordinal levels. Test scores are usually identified as the most frequent examples of this level of measurement
in childhood language disorders.
The use of equal-size units in interval-level measurements allows more precise comparisons of measured characteristics
to take place. For example, someone who receives a score of 100 on a vocabulary test can be said to have received 10
more points than someone who received a score of 90, and the same can be said for the person who scored 40 points
when compared with someone who scored 30 on the same test. What cannot be said, however, is that someone who
received a score of 80 knew twice as much as someone who received a score of 40—that comparison entails a ratio
(80:40), and the ability to describe ratios precisely is not reached until the final level of measurement. However, for most
measurement purposes, the interval level of measurement allows sufficient precision.
The ratio level of measurement refers to measures using mutually exclusive categories, ordered rankings of categories,
equal-size units, and a real zero. Achievement of this level of measurement is considered rare in the behavioral sciences,
but occurs when a measure demonstrates all of the traits associated with interval measures along
with a sensitivity to the absence of the characteristic being measured—the “real zero” mentioned above. The term ratio
is used to describe such measures because ratio comparisons of two different measurements along this scale hold true
regardless of the unit of measurement that is used. It should also be noted that when ratios are formed from other
measures, they achieve this level of measurement. For example, the ratio of a person’s height to weight falls at the ratio
level of measurement. Measures involving time (such as age or duration) are probably the most common of the relatively
few measures in childhood language disorders that reach the ratio level.
At this point, readers may wonder why score data are not described as falling at the ratio level of measurement given that
a score of 0 on a test or other scored clinical measures is an unpleasant but real possibility. For score data, however, the
zero point is considered an arbitrary zero rather than a real zero because a score of 0 does not reflect a real absence of the
characteristic being studied (Pedhazur & Schmelkin, 1991). Thus, for example, a score of zero on a 15-item task
concerning phonological awareness is not considered indicative of a complete absence of phonological awareness on the
part of the person taking the test. In order to demonstrate that a person has no phonological awareness, the test would
need to include items addressing all possible demonstrations of phonological awareness and would therefore be too long
to administer (or devise, for that matter).
Information concerning levels of measurement may be a review to many readers who remember it from past statistics or
research methods courses. Levels of measurement are introduced in those contexts because each level is associated with
specific mathematical transformations that can be applied to measurements at that level without changing the
relationship between the characteristic to be measured and the value or category assigned to it. Those mathematical
properties, in turn, determine the types of statistics considered appropriate to the measure. In general, the lower the level
of measurement, the less information contained in the measure and the less flexibility one will have in its statistical
treatment.
Recall that a given construct may be associated with indicators at various levels of measurement. Consequently, the level
of measurement of an indicator may be one consideration when choosing a particular measure. Thus, for example,
imagine that you are interested in characterizing a child’s skill at structuring an oral narrative. At the crudest level, one
might choose to label a child’s performance in the production of such a narrative as impaired or not impaired—
measuring it at a nominal level. For greater precision, however, a spontaneous narrative produced by the child might be
rated using a 5-point scale, with 1 indicating a very poorly organized narrative and 5 a narrative with adult-like
structure. Yet probably the most satisfactory type of measure for describing this child’s difficulties is one at the interval
level of measurement. An example of such a measure for narrative production is one devised by Culatta, Page, and Ellis
(1983), in which the child receives a score for the number of propositions correctly recalled in a story-retelling task.
With such a measure (as opposed to measures at the nominal or ordinal levels), you can obtain greater insight into the
nature of the difficulties facing the child and can more readily make comparisons to the severity of other children with
problems in narrative production.
Basic Statistical Concepts
As a branch of applied mathematics, the field of statistics has two general uses: describing groups of measurements made
to gain information about one or more variables and testing hypotheses about the relationships of variables to one
another. For many students in an elementary statistics class, each of these uses represents a vast, awe-inspiring, and
sometimes fear-provoking landscape. In this section of the chapter, only the highest peaks and lowest valleys of these
landscapes will be surveyed. Specifically, selected statistical concepts are introduced in terms of their meaning and the
practical uses to which they are applied by those of us interested in measuring children’s behaviors and abilities.
Although statistical calculations are described, only rarely are specific formulas given so that the connection between
meaning and application can remain particularly close. More elaborate and mathematically specific discussions can be
found in sources such as Pedhazur and Schmelkin (1991).
Statistical Concepts Used to Describe Groups of Measurements
One of the most common uses of statistics is to summarize groups of measurements, typically referred to as distributions.
Distributions can consist of a set of measurements based on actual observations (often called a sample) or a set of values
hypothesized for a set of possible observations (often called a population). An example of a distribution based on a
sample would be all of the test scores obtained by children in a single preschool class on a screening test of language. In
contrast, an example of a distribution based on a population would be all of the scores on that same test obtained by any
child who has ever taken it. Except when population distributions are discussed from a purely mathematical point of
view, they are almost always inferred from a specific sample distribution because of the impracticality or even
impossibility of measuring the population.
Two types of statistics used to summarize distributions of measurements are measures of central tendency and
variability. Measures of central tendency are designed to convey a typical or representative value, whereas measures of
variability are used to convey the degree of variation from the central tendency.
Measures of central tendency have been described as indicating “how scores tend to cluster in a particular
distribution” (Williams, 1979, p. 30). The three most common measures of central tendency are (in order of decreasing
use) the mean, median, and mode. The mean is the most common measure of central tendency. It is used to refer to the
value in a distribution that is the arithmetic average, that is, the result when the sum of all scores in a distribution is
divided by the number of scores in the distribution. Unlike the two other measures of central tendency, the mean is
appropriate only for measurements that fall at interval or ratio levels. Although it is considered the richest measure of
central tendency, the mean has the negative feature of being particularly sensitive to outliers—extreme scores that differ
greatly from most scores in the distribution. Because of this, the mean will sometimes not be used even if the level of
measurement allows it; instead, the median, which is the next most sensitive measure of central tendency, will be used.
The median is the score or category that lies at the midpoint of a distribution. It is the middle score in the case of
ungrouped distributions of interval or ratio data and the middle category in the case of ordinal data. The median is
considered an appropriate measure of central tendency for either ordinal or interval measures and is even superior to the
mean in terms of its relative stability in the face of outliers. On the other hand, it is considered inappropriate for nominal
measures because the categories used at that level of measurement cannot, by definition, be ordered logically. Because of
this lack of “order” in nominal data, finding a middle score or category is nonsensical.
The third and final measure of central tendency, the mode, has relatively few uses. The mode simply refers to the most
frequently occurring score (for interval or ratio data) or category (for nominal data). Because of the way the mode is
defined, it is possible for there to be more than one mode in a given distribution, in which case the distribution from
which it comes can be referred to as bimodal, trimodal, and so forth. For nominal level data, the mode is the only
suitable measure of central tendency.
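For readers comfortable with a little programming, the three measures can be made concrete in a short computation; the scores below are invented purely for illustration:

```python
from statistics import mean, median, multimode

# Invented raw scores for ten children on a 20-item vocabulary probe
scores = [12, 14, 14, 15, 16, 16, 16, 17, 18, 20]

print(mean(scores))       # 15.8 -- the arithmetic average (sum / count)
print(median(scores))     # 16.0 -- the middle value of the ordered scores
print(multimode(scores))  # [16] -- the most frequent value(s); a list,
                          # because a distribution can be bimodal or trimodal

# The mean is sensitive to outliers, but the median is not: replacing the
# top score of 20 with an extreme score of 60 pulls the mean upward while
# leaving the median untouched.
outlier_scores = [12, 14, 14, 15, 16, 16, 16, 17, 18, 60]
print(mean(outlier_scores))    # 19.8
print(median(outlier_scores))  # 16.0
```

The second half of the sketch illustrates why the median is preferred when outliers are present, as discussed above.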
Because measurements within a distribution vary, a measure of variability is also required to characterize it effectively.
Three measures of variability, two of which are very closely related, are most frequently used in descriptions of
children’s abilities and behaviors. As was done in the description of measures of central tendency, these measures will be
described in order of decreasing use.
Although considered somewhat daunting by beginning statistics students because of its relatively involved calculations,
the most frequently used measure of variability is the standard deviation. The standard deviation was developed for
interval and ratio measures as an improvement on the seemingly good idea of describing the average (or mean)
difference (or deviation) from the mean. The problem with an average deviation was that because of the way the mean is
defined, all of the deviations above the mean are positive in sign and would therefore balance all of the negative
deviations falling below the mean, leading to an average deviation of zero for all distributions—regardless of obvious
differences in variability from one distribution to another. In order to avoid this problem, the standard deviation is
calculated in a manner that makes all deviations positive. Nonetheless, the intent behind the standard deviation is to
convey the size of the typical difference from the mean score. As I expand on in an upcoming section of this chapter, the
standard deviation has special significance because of its relationship to the normal curve. Specifically, standard
deviation units become critical to comparisons of one person’s score against a distribution of scores, such as occurs when
test norms are used.
The concept of variance is closely related to the standard deviation. In fact, the standard deviation of a distribution is the
square root of its variance. Despite this very close relationship to standard deviation, variance is less frequently used
because, unlike the standard deviation, it cannot be expressed in the same units as the measure it is being used to
characterize. For example, you can describe the age of a group of children in months by saying that the mean age for the
group is 36 months, and the standard deviation is 3 months. This results in a much clearer description than saying that
the mean age for the group is 36 months, and the variance is 9. No, not 9 months—simply 9. Because of this “unitlessness,” variance is rarely used when the
intent is simply to describe the characteristics of a group. It does play a role in some statistical operations, however, and
so is an important statistic to be aware of.
The least complicated measure of variability, the range, is also the least frequently used of the three measures. It
represents the difference between the highest and lowest scores in a distribution. The utility of the range lies in its ease of
calculation and its applicability to distributions at any level of measurement other than the nominal level. For interval or
ratio data, it is calculated by subtracting the lowest from the highest score and adding 1. Thus for example, if the highest
and lowest scores in a distribution of test scores were 85 and 25, respectively, the range would be 61. At the ordinal
level, the range is usually reported by indicating the lowest to highest value used. For example, one might report that
listener ratings of a child’s intelligibility in conversation ranged from usually unintelligible to intelligible with careful
listening, or from 2 to 4 if a 5-point numeric scale were used. Because the range is based on only two numbers (or two
levels in the case of an ordinal measure), its weaknesses are a lack of sensitivity and a susceptibility to the effects of outliers.
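The logic of these three measures of variability can be sketched in a few lines; the scores are invented, and the variance is computed with the population formula (dividing by n rather than n − 1, as some applications do for samples):

```python
from math import sqrt

scores = [85, 70, 60, 50, 25]  # invented test scores
n = len(scores)
mean_score = sum(scores) / n   # 58.0

# Signed deviations from the mean always sum to zero, which is why a
# simple "average deviation" cannot serve as a measure of variability.
deviations = [x - mean_score for x in scores]
print(sum(deviations))  # 0.0

# Squaring removes the signs: the variance is the mean squared deviation,
# and the standard deviation is its square root (back in the original units).
variance = sum(d ** 2 for d in deviations) / n
sd = sqrt(variance)
print(variance, round(sd, 2))  # 406.0 20.15

# Inclusive range: highest minus lowest, plus 1 (as in the 85/25 example).
score_range = max(scores) - min(scores) + 1
print(score_range)  # 61
```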
In summary, measures of central tendency and variability are useful for describing groups of measurements related to a
single variable and are selected on the basis of the variable’s level of measurement.
Statistical Concepts Used to Describe Relationships between Variables
A number of statistical concepts are available to describe relationships between and among two or more groups of
measurements and to test hypotheses about the nature of those relationships. Because the intent here is to focus only on
those concepts most basic to understanding measurement applications in developmental language disorders, only one of
those concepts will be discussed in some detail—the correlation.
The correlation between two variables describes the degree of relationship existing between them as well as information
about the direction of that relationship and its strength. Correlation coefficients typically range in degree from 0
(indicating no relationship) to positive or negative 1 (indicating a perfect relationship in which knowing one measure for
an individual would allow you to predict that person’s performance on the second measure with perfect accuracy). The
sign of the correlation refers to its direction: A positive correlation indicates that as one measure increases, the second
measure increases as well. Relationships associated with a positive correlation are said to be direct. A vivid example of a
direct relationship would be the relationship some see between money and happiness. In contrast, a negative correlation
indicates that as one measure increases, the second measure decreases. Relationships associated with a negative
correlation are said to be inverse. A vivid example of an inverse relationship would be the relationship between unpaid
bills and peace of mind.
Figure 2.2 contains examples of graphic representations of correlations that differ in magnitude and direction. Notice that
two of the correlations are described as being associated with a correlation coefficient of 0. The second of those
demonstrates a curvilinear relationship, which cannot be captured by the simple methods described here.

Fig. 2.2. Illustrations showing the variety of relationships that can exist between variables and can potentially be
described using correlation coefficients. These include no relationship (i.e., the value of one variable is independent of
the value of the other), a curvilinear relationship (i.e., in which the nature of the relationship between variables changes
in a curvilinear fashion depending on the value of one of the variables), and linear relationships of lower and greater
magnitudes.
As a more detailed (and relevant) example involving correlation, let’s consider two hypothetical sets of test scores
obtained for a class of third graders—one on reading comprehension and the other on phonological awareness (explicit
knowledge of the sound structure of words). If this group of children were like many others, then one would expect their
performances on these two measures to be positively correlated (e.g., Badian, 1993; Bradley & Bryant, 1983)—that is,
one would expect that children who receive higher scores on the reading comprehension test would receive higher scores
on the phonological awareness test. However, because many factors affect each of the abilities targeted by the measures,
it would be unlikely that the magnitude of the correlation, which reflects the strength of the association, would be very
large. In
fact, a low correlation might be expected in this context. Table 2.2 contains labels that are frequently used to describe
correlations of various magnitudes (Williams, 1979).
The correlation coefficient most frequently used in describing human behavior is the Pearson product–moment
correlation coefficient (r), the specific type of correlation that would have been appropriate for the example given above.
Unfortunately, that correlation coefficient is only considered appropriate for measurements at the interval or ratio level
of measurement. For measurements at the ordinal level, Spearman’s rank-order correlation coefficient (ρ) can be
calculated. At the nominal level, the contingency coefficient (C) is used to describe the relationship between the
frequencies of pairs of nominal categories.
In addition to these correlation coefficients, however, there are several other correlation coefficients (e.g., phi, point
biserial, biserial, tetrachoric) that are used during the development of standardized tests. The choice of these less familiar
correlation coefficients is dictated by the characteristics of the measurements to be correlated, such as whether either or
both of the measurements are dichotomous (e.g., yes–no, correct–incorrect), multivalued (e.g., number correct), or
continuous (e.g., response times).
It is easy to be intimidated by an unfamiliar correlation coefficient. However, this danger can be countered with the
knowledge that the concept of correlation remains the same, regardless of how exotic the name of the specific
coefficient. Thus, whether one is using phi or Pearson’s product–moment correlation, a correlation coefficient always is
intended to describe the extent to which two measures tend to vary with one another. In fact, even when one examines
the relationships between the distributions of more than two variables using multiple correlations, the interpretation of
correlations remains essentially unchanged.
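As a sketch of the underlying computation (the paired scores here are invented), Pearson’s r can be obtained directly from its definition: the sum of the products of each pair’s deviations from their means, scaled by the product of the two variables’ overall variability:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = (sqrt(sum((a - mx) ** 2 for a in x))
                   * sqrt(sum((b - my) ** 2 for b in y)))
    return numerator / denominator

# Invented scores for five children: phonological awareness (x) and
# reading comprehension (y)
x = [4, 6, 7, 9, 11]
y = [50, 58, 54, 61, 70]

print(round(pearson_r(x, y), 2))  # 0.93 -- a direct (positive) relationship,
                                  # "high" by the labels in Table 2.2
```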
Correlation coefficients are usually reported along with a statement of statistical significance, which describes the extent
to which the correlation coefficient is likely to differ from zero by chance, given the size of the sample on which it is
based. In general, statements of statistical significance always carry the implication that although a particular sample of
behavior was observed, it is being used to draw conclusions for the larger population. Statements of statistical
significance are used to test hypotheses—conjectural statements about a relation between two or more variables
(Pedhazur & Schmelkin, 1991). In this case, the hypothesis is that the obtained correlation coefficient differs from zero.
Statistical significance indicates that the obtained value was unlikely to have occurred by chance.
Table 2.2
Descriptive Labels Applied to Correlations of Varying Magnitudes

Correlation   Label                   Degree of Relationship
< .20         Slight correlation      Almost negligible relationship
.20–.40       Low correlation         Definite, but small relationship
.40–.70       Moderate correlation    Substantial relationship
.70–.90       High correlation        Marked relationship
> .90         Very high correlation   Very dependable relationship
Unfortunately, a correlation’s statistical significance is sometimes mistakenly taken as the best indication of its importance. However, a very low correlation coefficient is unlikely to be important even if it is statistically significant
because it does not explain much of the variability of the correlated measures. In addition, the larger the sample size, the
easier it is for a correlation coefficient to attain statistical significance. Therefore, although a statistically significant
correlation coefficient is always desirable, the magnitude as well as the significance of the correlation must be
considered.
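This dependence on sample size can be made concrete with the standard t statistic used to test whether a Pearson r differs from zero, t = r√((n − 2)/(1 − r²)); the sample sizes and the correlation value below are invented:

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# The same low correlation (r = .25), evaluated at two sample sizes:
print(round(t_for_r(0.25, 20), 2))    # 1.1   -- too small to reach significance
print(round(t_for_r(0.25, 2000), 2))  # 11.54 -- easily significant, yet r is
                                      # still only a "low" correlation
```

The identical r = .25 fails to reach significance with 20 children yet is overwhelmingly significant with 2,000, which is exactly why magnitude must be weighed alongside significance.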
An additional concern surrounding the interpretation of correlation coefficients, such as the Pearson product–moment
correlation coefficient, is that its magnitude does not itself reflect the extent to which two variables explain one another.
Instead, that information is provided by a closely related statistic, the coefficient of determination, which can also be
referred to as “variance accounted for,” or r2, for the Pearson product–moment correlation. It is calculated by squaring
the correlation coefficient and multiplying it by 100. As an example, assume that the correlation between two sets of
test scores was .60 (a moderate correlation according to Table 2.2). The corresponding coefficient of determination
would be 36%, meaning that 36% of the variation observed in the two sets of test scores was accounted for by their
relationship—leaving a substantial 64% unexplained. Awareness of this concept becomes important in evaluating
correlational evidence provided by test developers to support the quality of their test.
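The arithmetic of the example above is simple enough to state directly:

```python
def variance_accounted_for(r):
    """Coefficient of determination (r squared), expressed as a percentage."""
    return r ** 2 * 100

# The moderate correlation of .60 from the example above:
print(round(variance_accounted_for(0.60)))  # 36 -- leaving 64% unexplained
```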
In their book on phonologic disorders, Bernthal and Bankson (1998) made a general point concerning the limitations of
statistical significance as an indication of the importance of a research finding. Although they were not talking specifically
about correlational data, they warned clinicians against the assumption that any statistically significant finding reported
in the research literature should influence clinical practice. They used the term clinical significance to suggest that
only relatively large effects (i.e., those that would be associated with a relatively large proportion of variance accounted
for) would likely be of importance in the clinical environment. They encouraged readers to look for evidence of the size
of relationships in the form of “variation accounted for,” which is reported as omega-squared for many analyses (Young,
1993). For the purposes of this book, Bernthal and Bankson’s caution should be considered as it applies to both
correlation coefficients and any statistical finding that might be used in discussions of children’s language abilities.
A final cautionary statement concerning the interpretation of correlations is the fundamental idea that the existence of a
correlation between two measures does not constitute evidence of a causal relationship between them. Thus, returning to
the example initially used to introduce the concept of correlation, remember that children’s scores on two tests, one of
reading comprehension and one of phonological awareness, were found to be correlated. Although it would be very
tempting to conclude that children’s phonological awareness “caused” their comprehension performance, that would be
an incomplete, even incorrect interpretation of the correlation. Theoretically such an interpretation would be quite
inviting because it would be easy to imagine that a greater familiarity with the sound structure of a written language
would make its processing easier, thus resulting in improved comprehension. In fact, however, it is equally plausible that
children’s comprehension caused their performance on the phonological tasks. That is, their level of comprehension may
have allowed them to process the
sound information of the language to a greater degree because they were not as overwhelmed with the other memory and
processing demands associated with understanding text. Thus, they would perform better on the phonological awareness
test because of their comprehension skills. Finally, it would also be plausible to imagine that children’s performances on
both tasks were in fact caused by some third variable or by multiple variables. The oft-repeated warning not to confuse
correlation with causation is probably one of the most important lessons in this or any book because of its impact on
critical thinking in nonscientific as well as scientific realms.
In addition to simple correlations, a wide range of other statistics are available for examining hypotheses about the
relationship between variables. Frequently, hypotheses relate to the relationship of one or more classification variables (e.g., age and gender) to an outcome or response variable (e.g., performance on a particular test). Alternatively, statistics
are used to determine whether one or more variables have a causal effect on a response variable. When that is the case,
variables hypothesized to be causes are termed independent variables and those hypothesized to be effects are termed
dependent variables. Selection of specific statistical techniques for testing a hypothesis depends quite heavily on the
level of measurement of the outcome or dependent variable.
Variables measured at the interval or ratio level of measurement are generally studied using parametric statistics (e.g., t
tests, analyses of variance, or ANOVAs); whereas variables measured at nominal or ordinal levels are examined using
nonparametric statistics (e.g., chi-square analyses and Cochran’s Q). Nonparametric statistics are also used when the
dependent variable seems to be distributed in a manner that either departs significantly from a normal distribution or
seems likely to violate assumptions underlying the use of normal distributions. A concise introduction to the decision
making behind the selection of an appropriate statistical technique can be found in Chial (1988). Longer discussions can
be found in Freedman, Pisani, & Purves (1998) or McClave (1995) for parametric statistics, and Conover (1998) or
Gibbons (1993) for nonparametric statistics.
Statistical techniques for testing hypotheses are not explored further here because of their relatively limited use in
assessing children’s language disorders. They primarily come into play in the documentation provided by test developers
to support the value of standardized measures, and they will be discussed further in that context in the next chapter.
Characterizing the Performance of Individuals
Methods for summarizing an individual’s performance vary depending on the nature of the measurement being made.
Numerous schemes for categorizing measurements of human behavior have been proposed. These categorizations often
assume that the measurements of interest are formal tests because tests are the most studied form of measurement related
to human abilities and behaviors. One frequently discussed categorization separates achievement testing from ability
testing; the former seeks to measure actual learning, and the latter seeks to measure learning potential. Within
achievement testing, distinctions are made between placement testing, which takes place prior to instruction; formative
and diagnostic testing, which take place during
instruction; and summative testing, which takes place at the end of instruction (Gronlund, 1982). Formative testing is
designed to measure the learner’s progress as learning is underway, whereas diagnostic testing identifies the source of
difficulties impeding the learner’s progress. Summative testing is designed to evaluate learning progress at some ending
point, for example, at the end of a school term.
Other categories applied to tests have included paper-and-pencil tests, the most studied medium for test execution;
performance tests, which typically involve the test taker’s manipulation of objects or performance of some activity that
usually does not involve the use of paper and pencil; and computerized tests, which involve the use of computer displays
or both computer display and keyboarded responses. Although performance tests predominate as a method of testing in
developmental language disorders, paper-and-pencil tests are typically used in cases when written language skills are
assessed. Computerized testing is a growing topic of interest (e.g., Wiig, Jones, & Wiig, 1996) because of the
possibilities it presents for providing more interesting, even animated stimuli and for greater tailoring of test items to a
client’s needs by choosing later items based on earlier performance (Bunderson, Inouye, & Olsen, 1989). Each of these
types of tests alters aspects of the test administration and scoring process and thus indirectly affects the interpretation of
individual scores.
Although tests and other measures can be categorized along many different dimensions, the categorization of measures
as norm-referenced versus criterion-referenced has the greatest impact on how individual performances are interpreted.
In fact, at times, these two categories are referred to as modes of score interpretation rather than types of tests (e.g., APA,
AERA, & NCME, 1985).
Norm-Referenced versus Criterion-Referenced Measures
Overall, norm-referenced measures are those for which an individual’s performance is interpreted in relation to the
performance of others, and criterion-referenced measures are those for which an individual’s performance is interpreted
in relation to an established behavioral criterion. Table 2.3 lists some norm-referenced and criterion-referenced measures
with which readers may have had personal experience as well as some that are commonly used in developmental
language disorders. Although not every author would agree that some of the more informal of these measures should be
categorized as norm- or criterion-referenced, each of the measures fits within the definitions appearing at the beginning of
this paragraph.
The dependence of this categorization on the method used to interpret an individual’s score can be illustrated using the
brief example in Table 2.4, which I call the Amazing University of Vermont Test. Imagine first that this would-be test is
to be given to determine which incoming students to the University will receive a scholarship being granted by the
University’s Alumni Association. If that were the test’s purpose, appropriate score interpretation would involve
comparing all of the incoming first-year students to see which ones had the most knowledge and thus would receive the
scholarship. That method of score interpretation, therefore, would depend not only on knowledge of a single test taker’s
score, but also on knowledge of the performance of the entire group against which the individual’s performance was to
be compared.
Table 2.3
Examples of Criterion- and Norm-Referenced Measures Associated With Readers’
Personal Experiences and Clinical Practice in Developmental Language Disorders

Norm-referenced
  Personal experience: IQ tests; GREs; SATs; classroom tests (with grading on the curve)
  Developmental language disorders: IQ tests; most language tests
Criterion-referenced
  Personal experience: driver's test; eye examination; classroom examination (without grading on the curve)
  Developmental language disorders: most articulation or phonology tests; treatment probes in which a set criterion (e.g., 80%) is used
Note. GRE = Graduate Record Examination; SAT = Scholastic Aptitude Test.
Table 2.4
The Amazing University of Vermont Test

1. The University of Vermont is located in (a) Burlington, Vermont (b) Montpelier, Vermont (c) Manchester, New Hampshire (d) St. Albans, Vermont (e) Enosburg Falls, Vermont
2. The official acronym for the University is (a) U of V (b) VU (c) UVM (d) MUV (e) none of the above
3. The number of students attending the University is (a) 500–1500 (b) 1500–3000 (c) 3000–4500 (d) 4500–6000 (e) > 10,000
4. The school colors are (a) grey and white (b) green and white (c) grey and green (d) green and gold (e) grey and gold
5. The mascot of the University is (a) snowy owl (b) raccoon (c) barn owl (d) catamount (e) Jersey cow
6. The most popular spectator sport at the University is (a) cow tipping (b) ice hockey (c) football (d) downhill skiing (e) snowboarding
7. The most famous philosopher graduating from UVM was (a) Ethan Allen (b) Ira Allen (c) Woody Allen (d) Woody Jackson (e) John Dewey
8. Translated from the Latin, the school motto means (a) Scholarship and hard work (b) Stay warm (c) Live free and stay out of New Hampshire (d) Suspect flatlanders (e) Independence and dignity

Such a comparison group is called a normative group; hence the designation norm-referenced to refer to the method of
score interpretation and sometimes to refer to the specific type of measure being used.
Norms, then, refer to the specific information about the distribution of scores associated with the normative group. Two
types of norms merit special attention: national norms and local norms. National norms are data concerning a group that
has been recruited so as to be representative of a national cross section of individuals who might be tested. Norms for
tests involving children are typically organized so that information
based on subgroups of children is reported by age (usually in 2–6 month intervals), by grade, or both. It is often
recommended that when norms are collected, the normative groups be matched against national data (usually census
data) for socioeconomic status, race, ethnicity, education, and geographic region (Salvia & Ysseldyke, 1995). National
norms are collected almost solely for standardized measures that will be used with very large numbers of individuals
each year. For example, intelligence tests, educational tests, and many language tests typically provide national norms.
Local norms are prepared when national norms for a measure are unavailable or inappropriate to a group of test takers.
They represent normative data collected on a group of test takers like those on whom the measure will be used. Local
norms are especially useful when national norms are likely to be inappropriate for a group of test takers whose language
is unlike that in which the test is written. Most frequently, this would involve individuals who speak one of many
regional or social dialects that are significantly different from the idealized “standard” American English dialect, for
example, speakers of Black American English or Spanish-influenced English. Alternatively, a clinician may want to
collect local norms for specific client populations for whom normative data are lacking (e.g., individuals with hearing
impairment, mental retardation, or cerebral palsy).
Rather than using the Amazing University of Vermont Test to compare performances of a number of test takers, you
might use the Amazing University of Vermont Test to determine whether a group of incoming students has adequately
learned the information included in their orientation materials. In that case, the outcome of the test could lead to a
student’s becoming exempt from an additional orientation session or being required to complete it.
For that testing purpose, scores would be interpreted in relation to a behavioral criterion, for example, 6 of 10 correct.
When interpreted in that way, the test could be described as a criterion-referenced measure. The level of performance
would then be considered a cutoff, or, less frequently, a cutting score. Often the term master is used to refer to a test taker
whose score exceeds the cutoff score, and nonmaster is used to refer to a test taker whose score falls below the cutoff.
Briefly then, in contrast to a norm-referenced interpretation, score interpretation for a criterion-referenced measure
hinges on knowledge of the person’s raw score and the cutoff score. Information about a reference or normative group is
not necessary. It is often useful, however, for developers of criterion-referenced measures to study group performances
as a means of determining a reasonable cutoff score—one that is empirically derived rather than based on an arbitrary
cutoff, for example at 80% correct.
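The criterion-referenced interpretation just described can be sketched in a few lines (Python here for concreteness); the 6-of-10 cutoff and the master/nonmaster labels follow the hypothetical orientation-screening example, not any published instrument:

```python
# Criterion-referenced interpretation: a raw score is compared
# against a fixed cutoff; no normative group is needed.
# The cutoff of 6 correct out of 10 is the hypothetical value
# from the orientation-test example, not an empirical standard.

def classify(raw_score, cutoff=6):
    """Return 'master' if the raw score meets the cutoff, else 'nonmaster'."""
    return "master" if raw_score >= cutoff else "nonmaster"

print(classify(8))  # master: exempt from the additional orientation session
print(classify(4))  # nonmaster: required to complete the session
```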
In addition to differences in the mechanics of score interpretation, norm-referenced versus criterion-referenced measures
tend to differ in the scope of knowledge being assessed and the specific method used to choose items. Specifically, norm-
referenced measures tend to address a large content area which is sampled broadly; whereas criterion-referenced
measures tend to address a quite narrowly defined concept that is sampled in as exhaustive a manner as possible. For
norm-referenced measures, items are selected so that the greatest amount of variability in test scores is achieved among
test takers; whereas for criterion-referenced measures, items are selected primarily because of how well they address the
targeted construct. Figure 2.3 shows the steps involved in the development of standardized norm-referenced and
criterion-referenced instruments.
At the beginning of this section, only a single measure, the Amazing University of Vermont Test, was used to introduce
the concepts of criterion- and norm-referencing. This was done in order to emphasize that method of interpretation is the
most crucial feature distinguishing norm- from criterion-referenced measures. Practically, however, because of
differences in how items are selected for each type of measure, it is very difficult to develop a single measure that can
equally support these two different approaches to score interpretation.
Types of Scores
Norm-Referenced Measures
For norm-referenced measures, a variety of test scores is useful. Because of the centrality of the comparison between the
test taker’s and the normative group’s performances, however, the raw score is of little value

Fig. 2.3. Steps involved in the development of norm-referenced and criterion-referenced standardized measures.
except as the starting point for other scores. These other scores are termed derived scores because of their dependent
relationship to the raw score. Three types of derived scores deserve attention: developmental scores, percentile ranks,
and standard scores. These are listed in increasing order of both their value as a means of representing a test taker’s
performance and their complexity of calculation.
Developmental scores are the least valuable derived scores but are still ubiquitous in clinical and research contexts—a
paradox that I will address shortly. The two most commonly used developmental scores are age-equivalent scores and
grade-equivalent scores. A test taker’s age-equivalent score is derived by identifying the age group that has a mean score
closest to the score received by the individual test taker. For example, if a test taker's raw score of 85 corresponds to the mean raw score of a group of 3-year-olds, the age-equivalent score assigned to the test taker would be 3 years. If no age group's mean exactly matches the score of a test taker, then an estimate is made of how many months should be added to the age of the group whose mean falls just below the test taker's score, resulting in age-equivalent scores such as 2 years, 6 months or 5 years, 11 months. Typically, test users do not have to examine the group data directly, but are given tables listing raw scores and the age-equivalent scores to which they correspond.
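As a rough sketch of this derivation, the following Python fragment interpolates an age-equivalent from a small table of invented group means; none of these numbers come from real norms:

```python
# Hypothetical normative data: mean raw score for each age group,
# with ages in months. All values are invented for illustration.
group_means = {36: 85, 42: 92, 48: 100, 54: 107, 60: 113}

def age_equivalent(raw_score):
    """Return an age-equivalent as (years, months), interpolating
    between the two adjacent group means as described in the text."""
    ages = sorted(group_means)
    if raw_score <= group_means[ages[0]]:
        return divmod(ages[0], 12)
    for lo, hi in zip(ages, ages[1:]):
        m_lo, m_hi = group_means[lo], group_means[hi]
        if raw_score <= m_hi:
            frac = (raw_score - m_lo) / (m_hi - m_lo)
            return divmod(round(lo + frac * (hi - lo)), 12)
    return divmod(ages[-1], 12)

print(age_equivalent(85))  # (3, 0): matches the 3-year-old group mean exactly
print(age_equivalent(96))  # (3, 9): interpolated between the 42- and 48-month groups
```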
Grade-equivalent scores are similar in many respects to age-equivalent scores but are, as one would guess from their
name, derived from data concerning the mean performance of groups of test takers in different grades. When estimation
is required, grade-equivalent scores are reported in tenths of a grade. Thus, for a 12-year-old who achieves a score just
slightly above that of a group of 4th graders, a grade-equivalent of 4.1 or 4.2 might be assigned.
In psychometric circles, almost never is a kind word spoken about scores of this type. Long, derogatory lists of the
problems with developmental scores abound (e.g., McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995), but the lists
invariably center around concerns that such scores are easily misunderstood and likely to be unreliable. Table 2.5
provides an elaborate version of these lists as well as a pointed commentary on developmental scores.
The appeal of developmental scores is twofold. First, the apparent uniformity of meaning of such scores across different
tests makes it seem that they allow for a comparison of skills in different areas and permit a sensitive quantification of
degree of impairment. Thus, when a 9-year-old child is said to have skills falling at the 7-year level in math and the 8-
year level in receptive language, it can be misinterpreted as indicating significant problems in both areas, with a more
severe impairment in mathematics. Although many individuals are quite aware of the low esteem in which
developmental scores are held, they nonetheless fall into misinterpretations like this. Given that age-equivalent scores
only crudely compare two scores as their means of norm-referencing, neither individual developmental scores nor
comparisons between them necessarily convey degrees of impairment. Depending on the tests used, for example, it may
be that a great many very normally developing children would exhibit the same “impaired” scores.
The second appeal of developmental scores lies outside the interests of individual test users. Numerous state and
insurance regulations demand that developmental scores be used to describe test performances, presumably on the basis
of the misconceptions cited earlier that meaningful comparisons between skill areas can be based
Table 2.5
Five Drawbacks to Developmental Scores, Such as Age-Equivalent
and Grade-Equivalent Scores (Anastasi, 1982; Salvia & Ysseldyke, 1995)

1. Developmental scores lead to frequent misunderstandings concerning the meaning of scores falling below a child’s
age or grade. For example, a parent may interpret an age equivalent of 5 years, 10 months as evidence of a delay in
a 6-year-old. In fact, by definition, half of those children in a given age group (or grade level) would receive age-
equivalent scores below the child’s age. This problem arises because developmental scores contain no information
about normal group variability.
2. There is a tendency to interpret developmental scores as indicating that performance was similar to that of an
individual of corresponding age—for example, that a score of 3 years, 6 months would be associated with
performance that was qualitatively like that of a 3½-year-old. In fact, however, it is unlikely that the nature and
consistency of errors would be similar for two individuals with similar developmental scores but differing ages or
grade levels.
3. Developmental scores promote comparisons of children with other children of different ages or grades rather than
with their same-age peers.
4. Developmental scores tend to be ordinal in their level of measurement. Therefore, they lack flexibility in how they
may be treated mathematically and are prone to being misunderstood. For example, a “delay” of 1 year in a fifth
grader who receives a grade-equivalent score of 4 is not necessarily comparable to a “delay” of 1 year in a ninth
grader who receives a grade-equivalent score of 8.
5. Developmental scores are less reliable than other types of scores.

on them. As I discuss in the next section of this chapter, such regulation of test users provides a vivid example of the
numerous cases in which assessment must respond to a variety of forces outside of the direct clinical interaction between
clinician and client. Typically, test users faced with the dilemma of having to report developmental scores are advised by
psychometricians to report them along with more useful derived scores in a manner that minimizes the likelihood of
misunderstanding.
Percentile ranks are actually one variety of a class of derived scores that includes quartiles and deciles. Percentile ranks
represent the percentage of people receiving scores at or below a given raw score. Thus, a percentile rank of 98, or 98th
percentile, indicates that a test taker received a score better than or equal to those of 98% of persons taking the test (usually
the normative sample). This type of score has the distinct advantage of being readily understood by a wide range of
persons, including parents and some older children.
Percentile ranks have two disadvantages. The first is that they are sometimes misunderstood as meaning percentage of
correct responses on the test. Readers can avoid this false step if they remember that on a very difficult test, one could
perform better than almost anyone (and therefore have a high percentile rank), but in fact have obtained a low percentage
correct. The second disadvantage of percentile ranks is that, like developmental scores, they represent an ordinal measure
and thus cannot be combined or averaged.
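The computation behind a percentile rank is simple counting against the normative distribution; the scores below are invented for illustration, and note that the result says nothing about percentage of items answered correctly:

```python
# Percentile rank: the percentage of the normative sample scoring
# at or below a given raw score. The normative scores are invented.
norm_scores = [52, 55, 58, 60, 61, 63, 64, 66, 70, 75]

def percentile_rank(raw_score, scores=norm_scores):
    """Percentage of the reference group scoring at or below raw_score."""
    at_or_below = sum(s <= raw_score for s in scores)
    return 100 * at_or_below / len(scores)

print(percentile_rank(63))  # 60.0: at or above 6 of the 10 normative scores
print(percentile_rank(75))  # 100.0
```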
Standard scores represent the pinnacle of scoring approaches used in norm-referenced testing. They preserve information
about the comparison between an individual and the appropriate age group and information about the variability of the
normative group.
In addition, they are at the interval level of measurement and thus can be combined and averaged in ways not possible
with the other types of scores discussed earlier.
Standard scores are “standard” because the original distribution of raw scores on which they are based has been
transformed to produce a standard distribution having a specific mean and standard deviation. Because standard scores
are normally distributed, they can be interpreted in terms of known properties of the normal distribution, especially
expectations concerning how expected or unexpected a particular score is. This makes standard scores a favored method
of communicating test results among professionals. Figure 2.4 illustrates the relationship between the normal curve and
several of the most frequently used scores: the z score, deviation IQ score, and T scores.
The most basic standard score is the z score, which has a mean of 0 and a standard deviation of 1. It is calculated by
taking the difference of a particular raw score from the mean for the distribution and dividing the result by the standard
deviation of the distribution. Each score is represented by the number of standard deviations it falls from the mean, with
positive values representing scores that were above the mean and negative values representing those below the mean.
Because of the relationship between this type of score and the normal curve, it is possible to know that a z score

Fig. 2.4. The relationship between the normal curve and several of the most frequently used standard scores, including
the z-score, deviation IQ score, and T scores. From Assessment of children (p. 17), by J. M. Sattler, 1988, San Diego,
CA: Author. Copyright 1988 by J. M. Sattler. Reprinted with permission.
of –2 falls 2 standard deviations below the mean and that fewer than 3% of the normative population had a score that low
or lower.
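The z-score calculation described above amounts to one line; the mean of 100 and standard deviation of 15 below are illustrative values, not those of any particular test:

```python
# z score: number of standard deviations a raw score falls from the
# normative mean. Mean and SD are illustrative, not from a real test.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

# A raw score two SDs below a mean of 100 (SD 15); fewer than 3% of a
# normally distributed norm group would score this low or lower.
print(z_score(70, 100, 15))  # -2.0
```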
Other widely used standard scores in developmental language disorders are the deviation IQ and the T score. These
scores share the virtue of z scores in their known relationships to the normal curve: the deviation IQ has a mean of 100 and a standard deviation of 15, and the T score has a mean of 50 and a standard deviation of 10. As an additional benefit, such scores are somewhat less open to the confusion associated
with negative numbers used in z scores. However, their interpretation remains quite challenging for people who are
unfamiliar with the use of the normal curve in score interpretation. Still, because of their strengths, standard scores such
as these are frequently used among professionals, with percentiles favored for use with other audiences.
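Because deviation IQs and T scores are linear rescalings of z, conversion between them is direct. The T-score convention of mean 50 and standard deviation 10 used below is the common one; treat the specific numbers as illustrative assumptions:

```python
# Any standard score is a linear rescaling of z to a chosen mean and SD.
# Deviation IQ: mean 100, SD 15. T score: mean 50, SD 10 (the usual
# conventions, assumed here rather than taken from a specific test).
def rescale(z, new_mean, new_sd):
    return new_mean + z * new_sd

z = -2.0
print(rescale(z, 100, 15))  # 70.0: deviation IQ two SDs below the mean
print(rescale(z, 50, 10))   # 30.0: the equivalent T score
```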
Criterion-Referenced Measures
For criterion-referenced measures, raw scores are the major type of score because by definition such measures involve
the comparison of a raw score against a given criterion or cutting score. As mentioned previously, it is possible for the
cutoff score to be based on empirical study or for it to be arbitrarily established on the basis of hypotheses about the level
of performance, or performance standard, required for satisfactory advancement to later levels of skill acquisition
(McCauley, 1996).
Case Example
Case 2.1 illustrates most of the concepts discussed in this chapter as they relate to Austin, a 5-year-old boy with specific
language impairment. This hypothetical report is annotated to highlight instances where a measurement has been made
by the clinician. Specifically, both formal and informal measures are bolded in this case.
Case 2.1
Speech-Language-Hearing Center
353 Luse Street
Burlington, VT 05405-0010
Client’s name: Austin G. Date of Evaluation: 2/12/97
Address: (child’s home with mother and stepfather) Parents’ names: Leslie G. (mother)
284 Willow Creek Road Warren G. (stepfather)
Burlington, VT 05401 George C. (father)
33 Elm Street
Savannah, GA 31411
Date of Birth: 1/8/92 h: (912) 999-9393
Education Status: Kindergarten Referral Source: Dr. A. B. Park
School: Woodward Elementary School Student clinician: E. Miller, B.A.
2 Station Street Supervisor: R. J. Turner, M.S., CCC-SLP
Burlington, VT 05401
Date of report: 2/14/97
BACKGROUND INFORMATION
Austin, a 5-year, 1-month-old boy, was seen today for a speech and language evaluation following referral by his
primary care physician, Dr. A. B. Park. Background information was obtained using a case history form, an in-depth
parent interview conducted with Mr. and Mrs. G., who accompanied Austin today, and a phone conversation with Mr.
C., Austin’s biological father.
The reasons given by Mr. and Mrs. G for today’s evaluation were growing concerns regarding Austin’s articulation,
overall intelligibility, and expressive language skills. Mr. and Mrs. G report that strangers and even other children in
Austin’s class find him difficult to understand and frequently ask him to repeat what he has said. He is also becoming
increasingly frustrated with family members when they fail to understand him, resulting in increasingly frequent and
escalating arguments with his older sister, Elizabeth (age 10). In contrast, they report that he understands everything
that is said to him and is recognized as a very bright child even by adults who fail to understand him.
Austin and his sister Elizabeth live with Mr. and Mrs. G and see their biological father, Mr. C, only at holidays and for
6 weeks in the summer. The parents divorced when Austin was 1 year old, and he calls his stepfather as well as his
biological father “Daddy.” Austin currently attends a kindergarten class in the Woodward Elementary School in Burlington, where he has three or four especially close friends. According to his teacher Mrs. Smith's reports to his parents, Austin is a happy child who is popular at least in part because of his enthusiastic manner and skill at
playground athletics. Because he is small for his age (in the 5th percentile for height and weight) and because of his
immature-sounding speech, he is sometimes teased by children from older classes about being a “baby,” but is readily
defended by his classmates and appears unaffected by such taunts, according to Mrs. Smith. She referred Austin for a
speech-language evaluation by the school speech-language pathologist in January because of concerns about his
language production and articulation, but otherwise she states that he is performing well in the kindergarten classroom.
Because circumstances prevented that evaluation from taking place, Mr. and Mrs. G had decided to seek an evaluation
at the Luse Center.
Austin’s birth and early health and developmental history are unremarkable except for delays in the onset of speech,
with only about 10 words by age 2 and no word combinations until age 3. Although he had shown a dramatic increase
in the length of his utterances over the past 2 years, his parents reported that he still speaks in incomplete sentences
and produces many words incorrectly. Both biological parents reported a significant history of family members with
speech and language problems, including Mr. C., who received speech therapy until 5th grade for what appeared to
have been language-related concerns, two of Austin’s paternal uncles, one maternal aunt in the preceding generation,
and two maternal cousins.
LIST OF ASSESSMENT TOOLS
The assessment procedures that were conducted during this evaluation are listed and reported in the paragraphs that
follow.
In addition, informal procedures were used to screen pragmatics, voice, and fluency. Overall results of these tests and
procedures are described in the following sections, with more detailed information about subtest performance and
specific errors available on summary test forms (see file).
Hearing
Austin’s hearing was screened using pure tones that were presented under headphones at 20 dB bilaterally at 500, 100,
2000, and 4000 Hz. He passed the screening in both ears.
Receptive Language
Austin’s ability to understand what is said to him was assessed using receptive portion of the Test of Language
Development—2 Primary (TOLD-P:2) and the Peabody Picture Vocabulary Test—3 (PPVT-2). On the receptive
language subtests of the TOLD-P:2, Austin received a listening quotient of 96, which approximates a percentile rank of
50. On the PPVT-2, his performance was even better. The raw score he obtained was 78, which corresponds to a
percentile rank of 75 and a standard score of 110.
Expressive Language
Austin’s ability to express himself was assessed using the TOLD-P:2 expressive portions and the Expressive One-
Word Picture Vocabulary Test—Revised, as well as informal measures obtained from a transcription of a
conversational sample taken as Austin played with his mother. Austin’s formal test scores were considerably lower on
these measures, in part because of the difficulties associated with his speech intelligibility. On the EOWPVT-R, Austin
received a raw score
of 20, which corresponds to the 5th percentile and a standard score of 76. Of his 10 errors on that test, approximately 4
were unambiguous with respect to the possible impact of his speech production difficulties; for example, they involved
the use of a more general or associated word than the target, or they consisted of instances when Austin said that he did
not know the name. On the TOLD-P:2 expressive subtests, Austin received an overall speaking quotient of 61, which
falls below the first percentile. An examination of his utterances during a conversation with his mother revealed
frequent omission of grammatical morphemes, an absence of complex sentences, and a tendency to overuse the word
“thingy” to refer to numerous elements of a Lego construction that they built cooperatively.
Phonology and Oral–Motor Performance
The Oral Speech Mechanism Examination—Revised (OSME-R) was used to examine the adequacy of Austin’s oral
structures for speech production. His performance on that measure was well within the normal range, with no signs of
incoordination or weakness and no observable abnormalities of the structures used in speech. Errors noted in the
production of repeated syllables mirrored those in his conversational speech.
On the Bankson–Bernthal Test of Phonology, Austin received a word inventory score, which reflects the number of
words produced correctly, of 39, which corresponds to a Standard Score of 71 and a percentile rank of 3. Errors
occurred primarily on medial or final consonants. Patterns of errors that occurred most frequently were final consonant
deletion (omission of the final consonant in the word; e.g., “bat” becomes “ba”), cluster simplification (replacement or
loss of one or more elements of a consonant cluster; e.g., “clown” becomes “clo”), and fronting (replacement of a velar
consonant by a more forward consonant; e.g., “gun” becomes “dun”). Efforts to elicit correct production of two
consonants that had not been produced correctly up to that point (viz., k, g) were undertaken using phonetic placement instructions and touch cues, resulting in velar fricative approximations. Other sounds consistently in error
were [s, z, r] and [l].
When the language sample discussed in the previous section was examined with regard to speech errors and
intelligibility, very similar error patterns were observed, and the percentage of understandable words out of all words
spoken was determined to be 70%.
Screening for Other Language and Speech Problems
The conversational sample between Austin and his mother was also examined to screen for problems in pragmatics,
voice, and fluency. Austin's use of language and his ability to describe the plot of a movie he had recently seen without his mother appeared appropriate for his age. His voice quality and pitch were normal. Fluency also appeared
normal, although frequent repetitions and rewordings of sentences occurred in response to his mother’s verbal and
nonverbal indications of having difficulty in understanding some of his utterances. Although Austin's awareness of his communication difficulties is quite sophisticated for a child of his age, his facial expression and movements at times
suggested significant frustration.
Summary
Austin appears to be a bright and sensitive 5-year-old with no significant medical history, but a family history of
communication difficulties. Today’s evaluation reveals normal hearing and language comprehension, as well as good
conversational skills and normal voice and fluency. Austin’s difficulties in being understood are moderate to severe at
this time and appear to reflect his difficulties in using sounds as expected for his age and in selecting and combining
words to create grammatically acceptable sentences. His strong skills in other areas, support from family and school
personnel, and clear motivation to improve his communication efforts suggest a very positive prognosis for change.
Recommendations
Austin is likely to benefit from speech-language intervention conducted in individual and group settings at his school,
including in-class work conducted by his teacher in consultation with the school speech-language pathologist. Areas to
be targeted include phonology, expressive vocabulary, and syntax. Specific goals should address (a) the phonological
processes of final consonant deletion and fronting, (b) expressive vocabulary related to school activities, (c) the use of
grammatical morphemes that are not currently used but should be pronounceable given his current phonological
system, and (d) the development of strategies for dealing in a more relaxed way with listeners' difficulties in understanding Austin's speech.
It was a pleasure to meet Austin and his family today and to have talked previously to others involved in his education
and upbringing. We urge you to call with any questions you might have about this report or Austin’s ongoing
development.
Sincerely,

E. Miller, B.A. R. J. Turner, M.S., CCC-SLP


Student clinician Supervisor
Summary
1. Measurement is usually indirect, meaning that it involves the measurement of characteristics, sometimes called
indicators, that are closely related to but different from the characteristic being described by the process of measurement.
2. The use of theoretical constructs, which are examined using various indicators, underlies clinical as well as research
measurement.
3. Four levels of measurement, first proposed by S. S. Stevens (1951), are nominal, ordinal, interval, and ratio. These
levels correspond to different methods of assigning measurements to characteristics, which have implications for the
measurement’s appropriate interpretation and statistical study.
4. Measures at the nominal level, such as diagnostic labels or labels of error type, involve the assignment of measured
individual performances or behaviors to mutually exclusive categories. Measurements at the ordinal level, such as
severity labels, also use mutually exclusive categories, but ones that can be ordered as demonstrating more or less of the
measured characteristic.
5. Measures at the interval level, such as test scores reported in raw or standard scores, involve the assignment of
numbered values to characteristics. This is the highest level of measurement usually attained in the behavioral sciences.
6. Often a theoretical construct can be measured using indicators falling at various levels of measurement.
7. Statistics are useful for gaining and summarizing information about groups of measurements, called distributions, as
well as for testing hypotheses about the relationships between distributions or between an individual score and a
distribution.
8. Two types of statistics used in summarizing distributions of measurements are measures of central tendency (e.g.,
mean, median, mode) and measures of variability (e.g., standard deviation, variance, range). Central tendency refers to
the most typical values in a distribution, whereas variability refers to the tendency of values in the distribution to differ
from one another.
9. In the measurement literature, correlation coefficients are the statistics most often used to describe the relationship between
groups of measures, with the Pearson product–moment correlation coefficient achieving the greatest use. Correlation
refers to the tendency for values of one distribution to be systematically related to values of another distribution.
10. Causal inferences cannot be made directly from observations of correlations: If variables A and B are related, it may be because A caused B, B caused A, or because both are caused by a third variable or set of variables.
11. Norm-referenced measures are interpreted through the comparison of a person’s performance to those of a relevant
normative group—usually using a derived score that incorporates information relevant to the comparison. Criterion-
referenced measures are interpreted through the comparison of a person’s performance to a performance standard—
usually using a raw score.
12. Derived scores consist of developmental scores (age- and grade-equivalent scores), percentile ranks, and standard
scores (e.g., z scores, T scores, deviation IQ scores). Percentile ranks are probably the most widely used among lay
persons, whereas standard scores are preferred by most professionals. Although widely used, developmental scores are
the least respected type of score among professionals because they encourage misunderstanding and are less reliable than
other derived scores.
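Points 8 through 10 lend themselves to a brief computational illustration. The sketch below (invented scores and ages, not data from any test) computes the summary statistics named in point 8 and the Pearson product–moment correlation of point 9 using only Python's standard library:

```python
import statistics

# A small, invented distribution of raw test scores for seven children.
scores = [4, 7, 7, 8, 9, 10, 11]

# Measures of central tendency: the most typical values in the distribution.
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle score
mode = statistics.mode(scores)      # most frequently occurring score

# Measures of variability: the tendency of values to differ from one another.
variance = statistics.pvariance(scores)  # mean squared distance from the mean
sd = statistics.pstdev(scores)           # square root of the variance
score_range = max(scores) - min(scores)  # highest score minus lowest

# Pearson product-moment correlation between two paired distributions,
# here invented ages (in months) paired with the raw scores above.
ages = [30, 33, 36, 40, 44, 48, 52]

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.pstdev(x), statistics.pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

r = pearson_r(ages, scores)  # close to +1: a strong positive relationship
```

Consistent with point 10, even a correlation near 1.0 here would not show that age causes scores to rise; both could reflect some third variable.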
Key Concepts and Terms
ability testing: a systematic procedure for exploring learning potential.
achievement testing: a systematic procedure for examining past learning.
age-equivalent score: a derived score corresponding to the age group with the mean score that is closest to the raw score
received by the individual test taker.
behavioral objectives: a description of treatment goals in terms of client behaviors.
clinical significance: the likely value of a particular research finding on the basis of the reliability of the finding (i.e., its
statistical significance) and its magnitude.
computerized tests: tests that involve the use of computer display, keyboarded responses, or both.
correlation: the degree of relationship existing between two or more variables.
criterion-referenced measure: a measure in which scores are interpreted in relation to a particular behavioral criterion;
contrasts with norm-referenced measure.
developmental scores: a type of derived score in which development is taken into account, for example, age-equivalent
and grade-equivalent scores.
distribution: a group of scores, either theoretical or observed.
formative indicators: indicators that are associated with a cause of a construct that is of interest.
grade-equivalent scores: a derived score corresponding to the grade-specified group with the mean score that is closest to
the raw score received by the individual test taker.
indicator: an indirect object of measurement; something one measures in place of the characteristic one is really
interested in because it is both related to the actual focus of interest and is more accessible to measurement.
interval level of measurement: a level of measurement using mutually exclusive categories in which scores reflect a rank
ordering of the characteristic being measured and the difference between adjacent scores is equal in size; for example,
scores on a behavioral probe.
local norms: summaries of the performance of a relevant group of individuals that are obtained, often when national
norms are unavailable, for purposes of making a specific comparison between an individual test taker’s performance and
those of that group.
mean: a distribution’s arithmetic average.
median: the middle score of a distribution.
mode: the most frequently occurring score(s) of a distribution.
national norms: summaries of the test performances of a large group of individuals against which a person’s performance
can be compared; usually consisting of individuals with known demographic characteristics.
nominal level of measurement: a level of measurement in which characteristics of an individual are assigned to mutually
exclusive categories (e.g., boys and girls).
nonparametric statistics: statistics that do not require assumptions about the nature of the underlying distribution from
which observations are drawn.
normal distribution: a theoretical distribution of scores or set of scores with known mathematical properties.
normative group: a group whose performance is used in the comparison and interpretation of an individual’s score in
norm-referenced score interpretation.
norm-referenced measure: a measure in which scores are interpreted in relation to the performance of a normative group;
contrasts with criterion-referenced.
norms: data concerning the distribution of scores achieved by a normative group.
operational definition: defining a variable through the operations used to measure it.
ordinal level of measurement: a level of measurement using mutually exclusive categories that reflect a rank ordering of
the characteristic being measured.
paper-and-pencil tests: conventional testing in which printed test materials are completed independently by literate test
takers.
parametric statistics: statistics that require certain assumptions about the nature of the underlying distribution from which
observations are drawn.
percentile ranks (percentiles): derived scores representing the percentage of individuals performing at or below a given
raw score.
performance standard: a criterion against which an individual’s performance can be compared in criterion-referenced
score interpretation.
performance tests: tests to assess skills involving the manipulation of objects or that otherwise are difficult or impossible
to assess using paper-and-pencil tests.
range: one method of describing the variability of a distribution; the difference between the highest and lowest scores in
the distribution.
ratio level of measurement: a level of measurement using mutually exclusive categories (scores) in which the scores
reflect a rank ordering of the characteristic being measured, the difference between adjacent scores is equal in size, and
there is a real zero along the scale; for example, the time elapsing between presentation of a picture and name production.
reflective indicators: indicators that are associated with the effects of a construct that is of interest.
standard deviation: a method of describing the variability of a distribution of scores; the square root of the variance.
standard scores: derived scores in which a transformation has been used to assure a predetermined mean and standard
deviation, for example, a mean of 100 and standard deviation of 15.
statistical significance: statistical evidence that an obtained value was unlikely to have occurred by chance.
theoretical construct: a concept used in a specific way within a particular system of related concepts.
theory: a system of related concepts, usually used to explain a variety of related data concerning a phenomenon of
interest.
variables: measurable characteristics that differ under different circumstances.
variance: a method of describing the variability of a distribution; it consists of the mean of the squared distances of
scores from the distribution mean.
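Several of the derived scores defined above are simple linear transformations of a z score. The sketch below uses invented normative values (a raw-score mean of 50 and standard deviation of 8, not taken from any real test) and Python's standard library:

```python
from statistics import NormalDist

# Invented normative data for the comparison group.
norm_mean, norm_sd = 50.0, 8.0
raw = 42.0  # an individual test taker's raw score

z = (raw - norm_mean) / norm_sd  # z score: mean 0, standard deviation 1
t = 50 + 10 * z                  # T score: mean 50, standard deviation 10
dev_score = 100 + 15 * z         # deviation-IQ metric: mean 100, SD 15

# If raw scores are roughly normally distributed, the percentile rank can be
# read from the normal curve; a z score of -1 falls near the 16th percentile.
percentile = NormalDist().cdf(z) * 100
```

A score one standard deviation below the mean thus corresponds to a z score of -1.0, a T score of 40, a deviation score of 85, and roughly the 16th percentile: the same fact, however it is expressed.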
Study Questions and Questions to Expand Your Thinking
1. Imagine that you are interested in measuring the ability of a child to understand the names of colors usually known by
children of his or her age. Think of four different indicators for the construct of color name comprehension—two that are
reflective and two that are formative.
2. Propose an indicator of spelling proficiency falling at each of the first three levels of measurement: nominal, ordinal,
and interval.
3. Suppose that a measurement tool offers you two different normative groups against which to compare the performance
of a child who speaks Korean as a first language and English as a second—one group consisting of children of similar
ages with similar language histories to the child to be tested and one of children of similar ages with English as their only
language. What would each comparison tell you about the child?
4. For each of the following measurement purposes, explain which type of score interpretation would be most suitable—
norm- or criterion-referencing:
1. identifying the poorest performance on a classroom test
2. competency testing for graduation from high school
3. testing for licensure in a profession, such as speech-language pathology
4. national testing for scholastic aptitude (e.g., SATs or GREs)
5. determining success of treatment aimed at improving a student’s correct use of selected verb forms
5. Find a newspaper article in which a behavioral measure is described. What construct appears to be measured? At what
level is that measurement conducted? What measures of central tendency and variability would be appropriate for this
measure?
6. Find a newspaper article in which the relationship between two variables is described. Is a causal relationship between
these variables implied? Does that interpretation seem warranted or can you imagine a different causal relationship
between the variables? Describe it.
7. On the basis of your personal observation, describe two variables you believe have a positive correlation with one
another, then two that have a negative correlation.
8. A 3-year-old child receives a test score on a norm-referenced test that falls at the 35th percentile and yields an age
equivalent score of 2 years, 8 months. Explain the meaning of those scores as if you were talking to a very worried
parent.
9. The parent of a high-achieving 10-year-old girl tells you that her daughter has been tested by a neighbor who is
studying psychology and achieved a standard score of 100 on an intelligence test. She wonders if that doesn’t mean that
her child’s perfect score suggests that she is a genius who should skip several grades. What would you tell her about her
child’s performance? (This is tricky. Consider both the fact that you didn’t obtain this information directly as well as the
meaning of standard scores.)
10. Pretend that you have devised a test to determine students’ mastery of the content covered in this chapter. How might
you determine an appropriate cutting score? (No, the answer to this is not in the book up to this point. Think creatively.)
Recommended Readings
Gould, S. J. (1981). The mismeasure of man. New York: Norton.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego: Author.
References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole Publishing.
American Psychological Association, American Educational Research Association, & National Council on Measurement
in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological
Association.
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Badian, N. (1993). Phonemic awareness, naming and visual symbol processing and reading. Reading and Writing, 5, 87–
100.
Bankson, N. W., & Bernthal, J. E. (1990). Bankson–Bernthal Test of Phonology. Chicago: Riverside.
Bernthal, J. E., & Bankson, N. W. (1998). Articulation and phonological disorders (4th ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Bradley, L., & Bryant, P. (1983). Categorizing sounds and learning to read: A causal connection. Nature, 301, 419–421.
Bridgman, P. W. (1927). The logic of modern physics. New York: Macmillan.
Bunderson, C. V., Inouye, D. K., & Olsen, J. B. (1989). The four generations of computerized educational measurement.
In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 409–429). New York: National Council on Measurement in
Education and American Council on Education.
Chial, M. R. (1988). Utility of inferential statistics. In D. Yoder & R. D. Kent (Eds.), Decision making in speech-language
pathology (pp. 198–201). Toronto: B. C. Decker.
Conover, W. M. (1998). Practical nonparametric statistics (3rd ed.). New York: Wiley.
Culatta, B., Page, J. L., & Ellis, J. (1983). Story retelling as a communicative performance screening tool. Language,
Speech, and Hearing Services in Schools, 14, 66–74.
Francis, D. J., Fletcher, J. M., Shaywitz, B. A., Shaywitz, S. E., & Rourke, B. P. (1996). Defining learning and language
disabilities: Conceptual and psychometric issues with the use of IQ tests. Language, Speech, and Hearing Services in
Schools, 27, 132–143.
Freedman, D., Pisani, R., & Purves, R. (1998). Statistics (3rd ed.). New York: Norton.
Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test—Revised. Novato, CA: Academic Therapy.
Gibbons, J. D. (1993). Nonparametric statistics: An introduction. Newbury Park, CA: Sage.
Gould, S. J. (1981). The mismeasure of man. New York: Norton.
Gronlund, N. (1982). Constructing achievement tests (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Kerlinger, F. N. (1973). Foundations of behavioral research (2nd ed.). New York: Holt, Rinehart & Winston.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical
case. Journal of Speech and Hearing Disorders, 49, 338–348.
McClave, J. T. (1995). A first course in statistics (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Newcomer, P. L., & Hammill, D. D. (1991). Test of Language Development—2 Primary. Austin, TX: Pro-Ed.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author.
St. Louis, K. O., & Ruscello, D. (1987). Oral Speech Mechanism Screening Examination—Revised. Baltimore:
University Park Press.
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental
psychology (pp. 1–49). New York: Wiley.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children-Revised. San Antonio: The Psychological
Corporation.
Wiig, E. S., Jones, S. S., & Wiig, E. D. (1996). Computer-based assessment of word knowledge in teens with learning
disabilities. Language, Speech, and Hearing Services in Schools, 27, 21–28.
Williams, F. (1979). Reasoning with statistics (2nd ed.). New York: Holt, Rinehart & Winston.
Young, M. A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and
Hearing Research, 36, 644–656.
CHAPTER 3

Validity and Reliability

Historical Background
Validity
Reliability
Historical Background
The historic roots of behavioral measurement can be traced to tests used in the third century B.C. by the Chinese military
for the purpose of identifying officers worthy of promotion (Nitko, 1983). Despite such early beginnings, however,
widespread interest in measurement for purposes such as helping children has far more recent origins, beginning at the
close of the 19th century. Not surprisingly, therefore, there are many threads of thought leading to the diversity of
instruments and procedures now being used to describe and make decisions about people.
During the 20th century, perspectives on how to develop and use measures such as those used to help children with
developmental language disorders have come from education, psychology, and—most recently—speech-language
pathology. Over this relatively brief period of time, professional and academic organizations in these fields have taken
on the responsibility of developing standards of test development and use. These efforts have primarily focused on tests,
where test is defined as a behavioral measure in which a structured sample of behavior is obtained under conditions in
which the tested individual is expected (or at least has been instructed) to do his or
her best1 (APA, AERA, & NCME, 1985). Despite a focus on tests in this narrow sense, such standards have always been
meant to apply to all behavioral measures—although they apply to a greater or lesser extent depending on the specific
characteristics of the measure.
Most notable among efforts to provide guidance to test developers and users have been those of the APA, AERA, and
NCME. In 1966, after two earlier sets of testing standards (APA, 1954; National Education Association, 1955), the three
organizations worked together to create a single document, Standards for Educational and Psychological Tests and
Manuals, which has gone through two revisions. The most recent revision was renamed Standards for Educational and
Psychological Testing (AERA, APA, & NCME, 1985).
The frequent revision of these standards reflects the brisk pace of research and ongoing discussion about behavioral
measurement. One particularly important transition occurring within the past two decades is reflected in the change of
title from Standards for… Tests to Standards for… Testing. This change emphasizes the centrality of the test user in
measurement quality. Earlier editions focused on ways in which test developers could demonstrate the quality of their
instruments. Far less attention was paid to issues related to actual test administration and interpretation. In fact, whereas
75% of the 1974 version related to test standards, only 25% of it related to standards of test use. In the most recent
version, there has been almost a reversal in those percentages: about 60% relates to test use versus 40% to test standards.
This shift is consistent with the most influential work conducted in the last decade in which test users are asked to
consider not simply the technical adequacy of methods used to derive specific test scores, but also the impact their
decisions will have (Messick, 1989). Not surprisingly, the term ethics has cropped up frequently in the course of these
discussions. It will surface frequently in this text as well.
Beginning with this chapter, I hope that readers will adopt a perspective similar to that set by the APA, AERA, and
NCME (1985). Specifically, I hope that you will consider measurement quality in developmental language disorders as an
arena in which many elements come into play, but in which you are the lion tamer, the person who remains expertly in
charge of a potentially dangerous situation. In this chapter and the one that follows it, I focus on how best to select
appropriate measures once you have a fairly specific application in mind. Chapters in Part II focus on those specific
applications commonly faced by clinicians who work with children who have developmental language disorders. Those
chapters will figure prominently in helping you learn to tailor your measurements to the specific purposes you have in
mind—a key lesson for those interested in providing their clients with the best possible care.
The remainder of this chapter is intended to introduce you to validity and reliability, two concepts that invariably
dominate discussions of measurement quality. Validity is by far the more central of the two terms. It even might be said
that any discussion of measurement quality is automatically a discussion of validity. Reliability is of lesser importance
but is still vital. Its secondary place derives from its role as prerequisite for, but not sole determinant of, validity.
1 This assumption is probably not well founded for many children with language disorders, who may be unable to
understand what it means to “do one’s best” or who may be unwilling to do it. I return to this issue at numerous points
throughout this book.
Validity
Validity can be defined as the extent to which a measure measures what it is being used to measure. So, you might ask,
what’s all of the fuss about? Despite its seeming simplicity, however, the concept of validity has a number of subtle
nuances that can be difficult to grasp for even the most seasoned users of behavioral measures. Several misconceptions
are evident when a test user or developer says sweepingly that a given test is a valid test. First, this kind of statement
about a measure suggests that it somehow possesses validity, independent of its use for a particular purpose. Second, it
suggests that validity is an all-or-nothing proposition. Both of those suggestions are untrue, however. What can safely be
said about a given measure is that it seems to have a certain level of validity to answer a specific question regarding a
specific individual. However, even reaching that less-than-definitive-sounding conclusion requires considerable work on
the part of the clinician.
To explore the general concept of validity a little more fully, consider a specific, widely used measure—the Peabody
Picture Vocabulary Test–III (Dunn & Dunn, 1997). That measure was developed for the purpose of examining receptive
vocabulary in a wide variety of individuals using a task in which a single word is spoken by the test giver and the test
taker points to one picture (from a set of four) to which the word corresponds. Despite the exceptionally detailed
development undergone by the PPVT-III, it is nonetheless quite easy to imagine situations in which its use could lead to
highly invalid conclusions and, thus, for which its validity could be questioned. For example, using the PPVT-III to
reach conclusions about a test taker’s artistic talent, and using it to assess the vocabulary of someone who does not
speak English, represent gross examples of how misapplication undermines validity.
One can also imagine—or simply observe—less obvious yet similarly problematic applications of the PPVT-III. For
example, the PPVT-III might be used to draw conclusions about overall receptive language, rather than receptive
vocabulary only. It might also be used to examine the receptive vocabulary skills of an individual or group lacking much
previous exposure to many vocabulary items pictured in the exam. In each of these cases, the validity of the test’s use
would be adversely affected, although probably not to the degree of the first, extreme examples. Thus, these latter
examples illustrate the continuous nature of validity by showing that a measure can be less valid than if it were used
appropriately, but more valid than if wildly misused. These last two examples are also poignant because they aren’t just
hypothetical examples, but actual ones that readily occur if a clinician is careless or naive about the concept of validity.
As another way of thinking about these problems in validity, consider two questions: (a) Is something other than the
intended construct actually being measured by the indicator (the test)? and (b) Does the indicator reflect its target
construct in such a limited way that much of the meaning of the construct is lost? Affirmative answers to either or both
of those questions chip away at the value of the indicator as a means
of measuring the intended construct and, by definition, chip away at the measure’s validity. Thus, when the PPVT-III is
used as a measure of receptive language as a whole, the construct of receptive language is greatly impoverished, hence
one can conclude that reduced validity is a strong risk. On the other hand, it may be used to measure vocabulary skills in
individuals who have not had much exposure to the vocabulary. Then it may become a measure of exposure to the
vocabulary rather than learning of the vocabulary, thus reducing the measure’s validity because the test would not be
measuring what it was supposed to measure.
Given the continuous nature of validity and the considerable specificity with which it must be demonstrated, how does
one ascertain that a measure is valid enough to warrant use for a particular purpose? In the next section I outline methods
that are used by test developers and other researchers to provide support of a general nature—that is, suggesting broad
parameters associated with its useful application. Methods used by test users to evaluate that support in terms of a
specific application are described in the next chapter.
Ways of Examining Validity
The methods used to demonstrate that a measure is likely to prove valid for a general purpose (such as identifying a
problem area or monitoring learning) have grown in number and sophistication over the years. Although the methods are
highly interrelated, they are nonetheless characterized as falling into three categories: construct validation, content
validation, and criterion-related validation. These three categories are ordered beginning with the most important.
Construct Validation
Construct validation refers to the accumulation of evidence showing that a measure relates in predicted ways to the
construct it is being used to measure—that is, to show that it is an effective indicator of that construct. A wide variety of
evidence falls into this category, including evidence that is described as content- or criterion-related in the sections that
follow. If that seems confusing to you at first, you are not alone; the theoretical centrality of construct validity has only
recently been recognized. Until that time, validity was usually conveyed as composed of three parts rather than as a unity.
Figure 3.1 portrays the relationship between the three types of validity evidence. It also conveys the two meanings of
construct validity—(a) as a cover term for all types of validity evidence and (b) as a term used to refer to several
methods of validation that are not seen as fitting under either content- or criterion-related validation techniques.
The underlying similarity of methods uniquely defined as demonstrating construct validity can perhaps best be seen
through a discussion of the earliest stages in measurement development. When approaching the development of a
behavioral measure, the developer considers how the construct to be measured (such as receptive vocabulary) is related
to other behavioral constructs and events in the world (such as age, gender, other abilities). Also considered at this stage
are possible indicators (such as pointing at named pictures or acting out named actions) that might reasonably be used
to obtain information about the construct and thereby serve as the basis for the measure.

Fig. 3.1. A graphic analogy illustrating the different kinds of evidence of validity.
For example, in the case of receptive vocabulary as a possible construct, the test developer begins with a scientific
knowledge base that supports expectations about how receptive vocabulary is affected by phenomena such as age and
gender. That knowledge base also generates expectations about how the construct is related to other behavioral
constructs such as expressive language development and hearing ability. From this knowledge base, the developer
formulates predictions about how a valid indicator, or measure, will be affected by such phenomena and how such a
valid indicator will be related to other constructs. Evidence suggesting that the measure acts as predicted supports claims
of construct validity. Four specific methods of construct validation are discussed in upcoming paragraphs—
developmental studies, contrasting group studies, factor analytic studies, and convergent-discriminant validation studies.
For many measures used with children, two kinds of studies are frequently used to provide evidence of construct validity
—developmental studies (sometimes called age differentiation studies) and studies in which groups who are believed to
differ in relation to the construct are contrasted with one another (sometimes called group differentiation studies). Table
3.1 provides an example of the description provided for each of these types of study. The specific examples used here are
not considered to be the most thorough nor the most sophisticated possible examples. Instead they are meant to help you
anticipate the way such studies are described in test manuals.
The developmental method of construct validation is based on the general expectation that language and many related
skills of interest increase with age. The hypothesis tested in this type of validation study is that performance on the
measure being studied will improve with age.

Table 3.1
Examples of Test Manual Descriptions of Two Types of Construct Validation Studies

Type of study Description

Developmental studies “Correlational methods were used to determine if performance on the TWF [Test of Word
Finding] changes with age. Using the Pearson product-moment correlation procedure, TWF
accuracy scores (scale scores generated from the Rasch analyses) were correlated with the
chronological age of the 1,200 normal subjects in the standardization sample…. All
coefficients were statistically significant and of a sufficient magnitude to support the
construct validity of the TWF as a measure of expressive language for both boys and girls
and of children of different ethnic and racial background.
Comparison of accuracy scores at each grade level also reflected developmental trends as
the accuracy scores of the normal subjects in the standardization sample increased across
grades…. These findings, which support grade differentiation by the TWF for all but one
grade, are a further indication of developmental trends in test performances on the
TWF.” (German, 1986, p. 5)
Contrasting group studies “In order to test the capacity of the TELD [Test of Early Language Development] to
distinguish between groups known to differ in communication ability, we administered the
TELD to seventeen children who were diagnosed as ‘communication disordered’ cases. No
children with apparent hearing losses were included in the group. Eighty percent of the
children were white males; they ranged in age from three to six and a half. In socio-economic
status, sixty-four percent were middle class or above. All of the children attended
school in Dallas, Texas.
The mean Language Quotient (LQ) derived from the TELD for this group was 76. Since the
TELD is built to conform to a distribution that has a mean of 100 and a standard deviation
of 15, it is apparent that the observed 76 LQ represents a considerable departure from
expectancy. It is a discrepancy that approaches two standard deviations from normal. These
findings were taken as evidence supporting the TELD’s construct validity.” (Hresko, Reid,
& Hammill, 1981, p. 15)

As you probably recall from previous course work, developmental studies of this kind can take a couple of different
forms—one (called a longitudinal study) compares the performances of a single group of children across time, and a
second (called a cross-sectional design) compares the performances of several groups of children, each group falling at a
different age. Cross-sectional studies are particularly popular among test developers, undoubtedly because the data
needed to test the hypothesis are the same as those needed to provide norms.
A second major type of construct validation study, which can be called the contrasting groups method of construct
validation, tests the hypothesis that two or more groups of children will differ significantly in their performance on the
targeted measure. Again, consider receptive vocabulary as the example. Obviously developing a test of receptive
vocabulary for use with children only makes sense if you believe that there are some children whose performance falls so
far below that of peers as to
have significant negative consequences. For this type of measure, one might evaluate construct validity by finding
groups of children who are thought to differ in their receptive vocabulary knowledge (e.g., children with a
developmental language disorder vs. children without such a disorder). In this type of study, if the measure is a valid
reflection of the construct, children who have been identified as differing in relation to the construct should also differ in
their performance on the measure. See Table 3.1 for an example of a validation study of this type.
A third category of construct validity study is identified through the use of a specific statistical technique—factor
analysis. Factor analysis is used less frequently in speech-language pathology than in some other disciplines; it has
been used most extensively, for example, to study intelligence tests. Besides its value as a means of studying an
already developed measure, factor analysis is frequently used in early stages of test development as an aid in selecting
items from a pool of possible items.
The term factor analysis describes a number of techniques used to examine the interrelationships of a set of variables and
to explain those interrelationships through a smaller number of factors (Allen & Yen, 1979). Factor analysis assists
researchers in the very difficult process of making sense of a large number of correlations, the most basic method for
describing interrelationships (as described in chap. 2).
In factor analytic studies, the original set of variables to be studied typically consists of a group’s performance on the
target measure as well as a set of other measures—some of which tap the same construct as the target measure. Although a factor does not correspond exactly to a specific underlying construct, all measures related to a single
construct are expected to be associated with a single factor. Therefore, construct validity would be demonstrated in this
type of study when the target measure shares, or “loads on,” the same factor as measures for which validity with respect
to a particular construct has already been demonstrated (Pedhazur & Schmelkin, 1991).
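To give a concrete (if highly simplified) picture of what “loading on” a shared factor means, the following sketch applies principal-component extraction, one simple factor-extraction method, to an invented correlation matrix for three measures. Everything here is hypothetical and intended only to illustrate the pattern of loadings one would hope to see.

```python
# Toy illustration of factor-analytic construct validation using
# principal-component extraction (one simple factor-extraction method).
# The correlation matrix below is invented: A and B are two hypothetical
# receptive vocabulary measures; C is an unrelated fine-motor task.

R = [
    [1.0, 0.8, 0.1],  # A: target vocabulary measure
    [0.8, 1.0, 0.1],  # B: established vocabulary measure
    [0.1, 0.1, 1.0],  # C: fine-motor task
]

def first_factor(matrix, iterations=200):
    """Approximate the dominant eigenvector (first principal component)
    of a correlation matrix by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

loadings = first_factor(R)
print([round(x, 2) for x in loadings])  # [0.7, 0.7, 0.17]
```

The two vocabulary measures share large, same-sign loadings on the dominant factor, whereas the motor task does not; that is the pattern taken as construct-validity evidence for the target measure.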
A particularly sophisticated method for studying construct validity has been applied to measures of a variety of behavioral constructs but only rarely to speech and language measures: convergent and discriminant validation (Campbell & Fiske, 1959), which is associated with a type of experimental design its authors called the multitrait–multimethod matrix. Because of the relative rarity of this approach for
measures used with children who have language disorders, I do not discuss it in detail. However, because this method is
sometimes used for measures you will be interested in, it is important to know that convergent validation refers to
demonstrations that a measure correlates significantly and highly with measures aimed at the same construct, but using
different methods; discriminant validation refers to demonstrations that it does not correlate significantly and highly with
measures targeting different constructs (Pedhazur & Schmelkin, 1991).
An example from Anastasi (1988) may help make the ideas behind convergent and discriminant validation clearer:
Correlation of a quantitative reasoning test with subsequent grades in a math course would be an example of convergent
validation. For the same test, discriminant validity would be evidenced by a low and insignificant correlation with scores
on a reading comprehension test, since reading ability is an irrelevant variable in a test designed to measure quantitative
reasoning. (p. 156)
In short, validity is supported in this approach through evidence that the measure under study is measuring what it is
supposed to measure in a manner uncontaminated by its relationship to something else that it was not supposed to
measure.
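To make this concrete, here is a small worked example with invented scores for eight children; the measures named in the comments are hypothetical stand-ins, not published instruments, and Python is used only to show the arithmetic of the two kinds of correlations.

```python
# Worked illustration of convergent and discriminant validation.
# All scores are invented; the "measures" are hypothetical.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Target measure: e.g., a picture-pointing test of receptive vocabulary.
target = [12, 15, 9, 20, 17, 8, 14, 19]
# Same construct, different method: e.g., a parent-report vocabulary checklist.
same_construct = [11, 16, 10, 21, 16, 9, 13, 18]
# Different construct: e.g., a fine-motor task.
different_construct = [7, 5, 8, 6, 9, 7, 5, 8]

# High correlation with the same construct counts as convergent evidence.
print(round(pearson_r(target, same_construct), 2))       # 0.97
# Near-zero correlation with a different construct counts as discriminant evidence.
print(round(pearson_r(target, different_construct), 2))  # -0.04
```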
In their discussion of convergent and discriminant validation, Pedhazur and Schmelkin (1991) described a pair of fallacies that threaten researchers’ understanding of the evidence obtained with this method and that apply equally to test selection. Cleverly, they have been termed the “jingle and jangle fallacies.” Jingle fallacies
arise when one assumes that measures with similar names must tap similar constructs; whereas jangle fallacies arise
when one assumes that measures with dissimilar names must tap dissimilar constructs. Obviously, close examination of
actual content can help ward off the deluding effects of such thinking.
Although I discussed only four methods of construct validation, many more are actually used, including those that have conventionally been identified with content- and criterion-related validation. These methods, which are typically viewed as more easily understood than construct validation, are discussed next.
Content Validation
Content validation involves the demonstration that a measure’s content is consistent with the construct or constructs it is
being used to measure. As with construct validity, the developer addresses concerns about content validity from the
earliest stages of the measure’s development. Such concerns necessitate the use of a plan to guide the construction of the
components of the measure (test items, in the case of standardized tests). The plan ensures that the components of the
measure will provide sufficient coverage of various aspects of a construct (often called content coverage) while avoiding
extraneous content unrelated to the construct (thus assuring content relevance). Later, content validity is evaluated
directly, usually through the use of a panel of experts who evaluate the original plan and the extent to which it was
effectively executed. Table 3.2 lists the basic steps involved in the development of standardized measures.
Despite underlying similarities, the specific ways in which concerns regarding content validity affect the development
process differ for norm-referenced and criterion-referenced measures. Before attempting a comparison of these
differences, recall
Table 3.2
Steps Involved in the Development of a Standardized Measure
(Allen & Yen, 1979; Berk, 1984)

Step Test Development Activity

1 Plan the test


2 Write possible items
3 Conduct an item try-out
4 Conduct an item analysis
5 Develop interpretive base (norms or performance standards)
6 Collect additional validity and reliability data
these two ideas from chapter 2: (a) content tends to be broadly sampled for norm-referenced measures and extensively,
almost exhaustively, sampled for criterion-referenced measures; and (b) a person’s performance is interpreted in relation
to the performance of a normative group for norm-referenced measures and to a specific performance level for criterion-
referenced measures. In the following sections, I describe the effect of these differences on content validation within the
context of an explanation of procedures used in the development of norm-referenced as well as criterion-referenced
measures.
The Development of Norm-Referenced Measures and Content Validity. For norm-referenced tests, the development of
the plan involves decisions about the number and complexity of constructs to be examined as well as the numbers and
kinds of items to be used. Some tests attempt to take on only one construct (e.g., the PPVT-III [Dunn & Dunn, 1997]), whereas others address many or complex constructs and consequently are composed of numerous subtests (e.g., the Test of Language Development Intermediate–3 [Hammill & Newcomer, 1997], in which the complex construct of language is viewed as composed of numerous simpler constructs involving various aspects of receptive and expressive language).
Next, as many as 1.5 to 3 times as many items are written as are expected to be used in the final version of the test (Allen
& Yen, 1979). Items are written with the goal of sampling evenly across the range of all possible items and providing a
large enough pool of items that their effectiveness can be studied in the next steps of the test’s development: item tryout
and item analysis.
An item tryout is conducted using a large sample of individuals chosen to be as similar as possible to those for whom the
test will ultimately be used. After the test is given to the sample, the performance of each item is studied using item
analysis. This analysis tends to rely most heavily on information about the item’s difficulty and discrimination but can
involve a variety of techniques (including factor analysis) intended to help the test developer arrive at a subset of the
most valid items by throwing out or modifying unsatisfactory items.
Item difficulty (p) is the number of persons answering the item correctly divided by the number of persons who took the
item. It can be used to gauge whether an item is appropriate to the range of abilities characteristic of the target
population. Obviously, if an item is passed by everyone (p = 1.0) or failed by everyone (p = 0), it will not help you rank
individuals relative to one another—the goal of a norm-referenced measure. In fact, it is generally held that an item has
the maximum ability to discriminate among test takers when it has a p value of .50. Norm-referenced test developers are
often encouraged to strive for items with difficulties falling between .30 and .70 as an acceptable range around .50 (Allen
& Yen, 1979; Carver, 1974). Items that fall outside of this range are discarded or rewritten (because a difficult item may
only be difficult because its wording is confusing).
Item discrimination can be measured in several different ways, with item discrimination indexes and item–total test score
point biserial correlations as the most popular methods. Item discrimination reflects the extent to which people tend to
perform similarly on the item as they do on the test as a whole (Allen & Yen, 1979). It is generally thought that better items will be those for which there is a tendency for more positive performances on the item to be associated with more positive performance on the test as a whole. Again, items that fail to perform in a desirable
fashion are candidates for rewriting or exclusion.
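Both statistics can be computed directly from tryout data. The sketch below uses invented responses from ten test takers; the point-biserial index is computed as an ordinary Pearson correlation between the dichotomous item scores and the total test scores.

```python
# Illustration of item difficulty (p) and item-total discrimination for a
# single tryout item. The responses and total scores are invented.

def item_difficulty(responses):
    """p value: proportion of test takers answering the item correctly."""
    return sum(responses) / len(responses)

def point_biserial(item, totals):
    """Correlation between a dichotomous item (1 = correct, 0 = incorrect)
    and total test scores, computed as a Pearson correlation."""
    n = len(item)
    mean_i, mean_t = sum(item) / n, sum(totals) / n
    cov = sum((i - mean_i) * (t - mean_t) for i, t in zip(item, totals))
    var_i = sum((i - mean_i) ** 2 for i in item)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    return cov / (var_i * var_t) ** 0.5

item = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]              # ten test takers
totals = [38, 41, 22, 45, 25, 36, 40, 19, 44, 28]  # their total test scores

print(item_difficulty(item))                   # 0.6, inside the .30-.70 range
print(round(point_biserial(item, totals), 2))  # 0.93, a strong discriminator
```

Here the item would be retained: its difficulty falls within the commonly recommended range, and correct answers tend to come from children with higher total scores.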
Once items are rewritten and a subsequent item analysis yields satisfactory results for the final body of items, the
last step of the test construction process involves the collection of initial information about the instrument’s overall
validity and reliability and the preparation of documentation concerning the instrument. Content validity comes in at this
point in two ways. First, by reporting on the specific methods used in the steps I described, the test author is providing a
potential test user with some evidence that the initial intended content of the test has been well translated into the final
measure. Second, one type of data collected during the final step of test construction consists of expert evaluation of the
development process and of the final fit between intended and actual content of the test. Table 3.3 provides two
examples, showing how different test manuals describe this information.
The Development of Criterion-Referenced Measures and Content Validity. Criterion-referenced measures are constructed
using steps similar to those previously described. However, numerous differences in methods and rationales distinguish
the development of such measures from the development of norm-referenced measures.
To begin with, the initial plan used for a criterion-referenced measure tends to be more elaborate and detailed than that
used in norm-referenced measure construction (Glaser, 1963; Glaser & Klaus, 1962). Also, behavioral objectives, often
hierarchically arranged, may be used as part of the plan, particularly when the measure is being developed to examine
progress in the acquisition of a particular body of information or a particular skill (Allen & Yen, 1979). Nitko (1983)
offered a detailed accounting of the sometimes very intricate plans used for criterion-referenced measures.
The Testing and Measurement Close-Up in this chapter provides a very personal example from the life of one of the
authors quoted most frequently on the topic of validity, Anne Anastasi, which reminds us of the difference between norm-referenced and criterion-referenced tests.
Once the plan has been finalized, items are written so that they address all aspects of the intended content. Although
exhaustive is too strong a word (an exhaustive test of any construct worth knowing about would undoubtedly require
several lifetimes), the extensiveness of item coverage is definitely in the direction of exhaustive when compared with
that of norm-referenced measures.
Item tryouts and analyses offer another point at which major differences separate norm-referenced from criterion-
referenced instruments. For norm-referenced measures, items are selected for their ability to discriminate across a range
of abilities; for criterion-referenced measures, however, items are selected for their ability to discriminate between
performance levels. Most commonly, dichotomous performance levels are used, such that items are selected for their
ability to discriminate between performance showing mastery of a particular content versus that showing nonmastery.
For that purpose, an ideal item’s difficulty would approximate zero for nonmasters and 1 for masters. One method used
to tentatively identify masters and nonmasters
Table 3.3
Examples of Two Types of Criterion-Related Validity Studies

Concurrent validity

Test of Phonological Awareness (TOPA): “When the TOPA (Test of Phonological Awareness) was given to a sample of 100 children at the end of kindergarten, it was found to be significantly correlated with two other, relatively different measures of phonological awareness. The TOPA-Kindergarten scores were correlated with scores from a measure called sound isolation (a 15-item test requiring pronunciation of the first phoneme in words) at .66 and with a segmentation task (requiring children to produce all the phonemes in a three- to five-phoneme word) at .47. Both of these other measures assessed analytical phonological awareness, although they required a more explicit level of awareness than did the TOPA.” (Torgesen & Bryant, 1994, p. 24)

Preschool Language Scale-3 (PLS-3): “A study of the relationship between PLS-3 and CELF-R [Clinical Evaluation of Language Function-Revised (Semel, Wiig, & Secord, 1987)] was conducted with 58 children. The sample consisted of 25 males and 33 females ranging in age from 5 years to 6 years, 11 months (mean = 6 years, 0 months). The two tests were administered in counterbalanced order. The between-test interval ranged from two days to two weeks, with an average of 4.5 days. Both tests were administered by the same examiner. Reported correlations were as follows: PLS-3-Auditory Comprehension with CELF-R Receptive Composite (r = .69); PLS-3-Expressive Communication with CELF-R Expressive Composite (r = .75); PLS-3-Total Language score with CELF-R total Score (r = .82).” (Zimmerman, Steiner, & Pond, 1992, p. 95)

Predictive validity

Test of Phonological Awareness (TOPA): “When the TOPA-Kindergarten was given to 90 kindergarten children sampled from two elementary schools serving primarily low socioeconomic status and racial minority children, its correlation with a measure of alphabetic reading skill (the Word Analysis subtest from the Woodcock Reading Mastery Test) at the end of first grade was .62. Thus, between 30% to 40% of the variance in word-level reading skills in first grade was accounted for by the TOPA administered in kindergarten.” (Torgesen & Bryant, 1994, p. 24)

Receptive-Expressive Emergent Language Scale (REEL-2): “In the first study investigating predictive validity, researchers at the University of Florida’s Emergent Language laboratory conducted a longitudinal study of 50 ‘normal’ infants from linguistically enriched environments. After repeated monthly testing over a 2- to 3-year period, all infants were found to achieve mean average scores for Receptive Language Age (RLA) and Expressive Language Age (ELA), and Combined Language Age (CLA) at or about their chronological ages.” (Bzoch & League, 1992, p. 10)
has been to examine the performance of an item tryout sample before and after instruction designed to produce mastery
(Allen & Yen, 1979). In that context, better items are those in which p values show the greatest upward change.
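The logic of this selection method can be sketched with hypothetical p values for three items; the item names and values below are invented for illustration.

```python
# Sketch of selecting criterion-referenced items by the change in item
# difficulty (p) from before to after instruction. All p values are invented.

pre_instruction_p = {"item1": 0.10, "item2": 0.45, "item3": 0.15}
post_instruction_p = {"item1": 0.90, "item2": 0.55, "item3": 0.85}

# An ideal item is failed by nonmasters (pre) and passed by masters (post),
# so better items show the greatest upward change in p.
change = {name: post_instruction_p[name] - pre_instruction_p[name]
          for name in pre_instruction_p}
ranked = sorted(change, key=change.get, reverse=True)
print(ranked)  # ['item1', 'item3', 'item2']
```

On this criterion, item2 would be the weakest candidate: instruction barely changes its p value, so it does little to separate masters from nonmasters.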
As was the case with norm-referenced measures, the last step of the test construction for a criterion-referenced measure
involves the collection of initial information about the instrument’s overall validity and reliability and the preparation of
documentation concerning the instrument. Here, the effects on content validity are achieved using means similar to those used
for norm-referenced measures. In addition to providing descriptive evidence of the procedures used to develop the test’s
content, test authors look to the results of expert evaluations of construction methods and final test content as a further
source of content validation.
TESTING AND MEASUREMENT CLOSE-UP
Anne Anastasi has been called one of “psychology’s leading women.” She was one of only five women (of a total of
96 psychologists) to be considered during the first eight decades of this century in a prominent series of books
recording the history of psychology through autobiography (Stevens & Gardner, 1982). Although Anastasi has made
contributions in a variety of areas in psychology, the reason that she is included here is because of her authorship of a
classic text on psychological testing (Anastasi, 1954). That text has gone through seven editions, with the latest edition
published in 1997. It has undoubtedly served as the source of more information on testing for psychologists and others
than perhaps any other work, and in its latest edition, Anastasi (1997) again provided one of the clearest sources for
essential information on validity and reliability.
In the early 1980s, at the University of Arizona, I had the pleasure of hearing Anne Anastasi present a lecture, when
she was in her 70s. Her black patent leather pocketbook was propped up in front of her on the podium as she spoke, its
stiff handle almost obscuring the audience’s view of her white hair, thick horn-rimmed glasses, and the bright eyes that
lay behind them. I do not actually remember much about the details of her presentation, except that her speech was as
clear as her writing and was presented without a single note. She was as impressive in person as she had been on the
page.
The following passage from her autobiography breathes life into two very different ideas from this chapter. First, it
shows the possibly traumatizing effect that the process of assessment can have—even on a child whose biggest
problem appears to have been her exceptional intelligence. Second, it revisits the distinction between norm-referenced
and criterion-referenced (or as she calls it here, content-referenced) score interpretation.
“Throughout my schooling, I retained a deep-rooted notion that any grade short of 100 percent was unsatisfactory. At
one time I actually believed that a single error meant a failing score. I recall a spelling test in 4B, in which we wrote ten
words from dictation. I was unable to hear one of the words properly, because the subway had just roared past the
window (it was elevated in that area). The word was “friend,” but I heard it as “brand.” As a result, the item was
marked wrong and my grade was only 90%. That evening when I told my mother about it, she consoled me and
advised me to raise my hand at the time and tell the teacher, if anything like that should happen again. But she did not
disabuse me of the notion that anything short of a perfect score was a failure. I eventually discovered for myself that one could pass despite a few errors; but I always felt personally uncomfortable with the
idea. There seemed to be some logical fallacy in calling a performance satisfactory when it contained errors. I was
apparently following a content-referenced rather than a norm-referenced approach to performance
evaluation.” (Anastasi, 1980, pp. 7–8)
Face Validity. One further topic regarding content validity that demands attention is not really a matter of true validity at
all, despite its being termed face validity. Face validity is the superficial appearance of validity to a casual observer. Face
validity is considered a potentially dangerous notion if a test user mistakenly assumes that a cursory evaluation of a
measure for its face validity constitutes sufficient evidence to warrant its adoption. Nonetheless, face validity can play a
role in a test’s actual validity; for example, poor face validity may cause a test taker to discount the importance of a
measure and thereby undermine its ability to function as intended.
In summary, the kind of evidence provided for norm-referenced versus criterion-referenced measures differs. However,
content validation for both types of measures is achieved through the author’s careful planning, execution, and reporting
of the measure’s development and through the positive evaluation of this process by experts in the content being
addressed.
Criterion-Related Validation
Criterion-related validation refers to the accumulation of evidence that the measure being validated is related to another
measure—a criterion—where the criterion is assumed to have been shown to be a valid indicator of the targeted
construct. Putting this in primitive terms, criterion-related validation involves looking to see if your “duck” acts like a
“duck.” This explanation derives from famous streetwise logic in which anything that looks like a duck, walks like a
duck, and quacks like a duck is determined to be a duck. Thus, as you set out to validate your measure (Duck 1), you
search around for a duck (Duck 2, a.k.a. Criterion Duck) that everyone acknowledges is indeed a true duck (i.e., a valid
indicator of the underlying construct). Then you put your ducks through their paces to see to what extent they act
similarly. The greater their similarities, the better the evidence that they share a common “duck-ness.” And then, voilà:
You have evidence of criterion-related validity!
In case I lost you there, the way that criterion-related validation works for a behavioral measure is that one obtains
evidence by finding a strong, usually positive correlation between the target measure and a criterion. The choice of the
criterion is crucial because of the assumption that the criterion has high validity itself. It can also be problematic because
for many constructs it may be difficult to find a criterion that can claim such an exalted status.
Two types of criterion-related validity studies are typically described: concurrent and predictive. Predictive validity is
most relevant when the measure under study will be used to predict future performance in some area. For example, the
Predictive
Screening Test of Articulation (PSTA; Van Riper & Erickson, 1969) was intended to predict whether a child tested at the
beginning of first grade would still be considered impaired in phonologic performance 2 years later. Consequently, this
type of evidence was important in demonstrating that the test would measure what it was supposed to measure. In that
particular case, the test developers used as the criterion measure the researcher’s judgments of normal articulation versus
continued articulatory errors based on a simple phonetic inventory and on samples of spontaneous connected speech
obtained 2 years after initial testing with the PSTA.
A study of concurrent validity is performed when the criterion and target measures are studied simultaneously in a group
of individuals like those for whom the test will generally be used. It is by far the more common type of criterion-related
validity study. See Table 3.3 for an example of this type of validity study.
For both predictive and concurrent studies of criterion-related validity, the resulting correlation coefficient is often
termed a validity coefficient or, more specifically, a predictive or concurrent validity coefficient, respectively.
Interpretation of such coefficients is essentially the same as that described for correlations in chapter 2. However, one
factor in the interpretation of validity coefficients that was not addressed previously concerns how high a correlation has
to be for one to consider it credible support of a measure’s valid use for a particular purpose. The Standards for
Educational and Psychological Testing (APA, AERA, & NCME, 1985) does not provide direct guidance on this
question. However, several experts recommend that when a measure is going to be used to make decisions about an
individual (rather than as a way to summarize a group’s performance), a standard of .90 should be used. As an additional
proviso, the correlation coefficient should also be found to be statistically significant (Anastasi, 1988).
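Taken together, these recommendations amount to a simple decision rule, sketched below. The function name and example values are my own illustrative inventions, and the .05 significance level is the conventional default rather than a requirement stated in the Standards.

```python
# Sketch of the decision rule discussed above for validity coefficients:
# a correlation of at least .90, and statistically significant, before a
# measure is used for decisions about individuals.

def adequate_for_individual_decisions(validity_coefficient, p_value,
                                      cutoff=0.90, alpha=0.05):
    """Apply the rule of thumb: r >= .90 and statistically significant."""
    return validity_coefficient >= cutoff and p_value < alpha

print(adequate_for_individual_decisions(0.82, 0.001))  # False (strong, but below .90)
print(adequate_for_individual_decisions(0.93, 0.001))  # True
```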
Factors Affecting Validity
Anything that causes a measure to be sensitive to factors other than the targeted construct will diminish the measure’s
validity. For example, a bathroom scale that becomes sensitive to room temperature or humidity is likely to be less valid
as an indicator of how much damage one has done after a series of holiday meals. In this section of the chapter, I consider
factors affecting the validity of behavioral measures such as those used with children—first considering two factors over
which the clinician has considerable direct control, then two factors over which the clinician’s control is far less direct.
Selection of an Appropriate Measure
As mentioned at the beginning of this chapter, probably the biggest factor affecting the validity of decisions made using
a particular measure is the suitability of the match between the specific testing purpose and the demonstrated qualities of
the measure to be used. The majority of information described thus far relates to activities performed by the developer of
a standardized measure. Still to be discussed is how test users make use of that information to do their rather large part in
assuring the validity of their own test use. For the moment, it is sufficient that you be aware that your role is critical in
assuring testing validity and that it begins with a thorough evaluation of information provided by the
test developer, test reviewers, and the clinical literature in light of your client’s needs. Specific steps leading to such an
evaluation are described in the next chapter.
Administration of the Measure
After successful selection of a measure, the clinician plays a critical role in assuring validity through its skilled and
accurate administration. Unless a measure is administered in a manner consistent with the methods used in developing
the measure’s norms and testing its reliability and validity, any comparison of the resulting performance against either
norms or performance standards becomes distorted, even nonsensical. Thus, for example, the directions supplied with a
test may indicate that orally presented items are to be read aloud only once. In that case, the difficulty of that test will
probably be lessened if the test user decides that it’s only “fair” to the child to give a second chance to hear the
information included in the item. In reality, however, it is decidedly “unfair” to the child if the test is being used to
provide information about how that child’s performance compares with a standard that was determined under different
conditions.
Skilled administration of standardized measures, however, goes well beyond the preservation of idealized conditions. It
also facilitates a crucial but sometimes overlooked function of a testing situation—that is, the establishment of a trusting,
potentially helpful relationship between the clinician and the child being tested. If test administration goes well, the child
comes away from the experience with a sense that the test giver likes the child and is a rewarding person with whom to
interact. If it does not, not only will the test data be compromised, but the child may develop expectations of the test
giver that will be difficult to overcome. Indeed, some researchers (Maynard & Marlaire, 1999; Stillman, Snow, &
Warren, 1999) who examine the testing process in detail note that far too little attention is paid to the collaborative
nature of testing, in which the examiner is not a passive conduit of items but an integral participant in the testing
outcome. Table 3.4 lists some suggestions gleaned from several years of clinical experience (my own and others’)
concerning how to facilitate testing.
Client Factors
Client factors are such a key feature to valid testing of children that it seems worth discussing them under a separate
heading. Of particular interest are motivation and what Salvia and Ysseldyke (1995) called enabling behaviors and
knowledge.
Motivation affects the performance of adults and children in often dramatic ways. Although the topic of motivation has
been the impetus for extensive research in several disciplines, you can quickly appreciate the devastating impact of low
motivation by looking back over your own experiences and remembering an occasion when a classroom quiz or test fell
at a time when you were preoccupied by other things happening in your life, or perhaps a time when you “psyched
yourself out,” thereby seemingly necessitating the fulfillment of a prophecy of failure. For me, the experience that
comes to mind is a midterm examination I took in college. I had found an unconscious but still breathing mockingbird on
my way to the exam. Consequently, during the examination, I spent much more of my time wondering whether the bird
would still
Table 3.4
Testing Recommendations

Things to Consider When Testing Children

1. Remember that children rarely have much sophistication in test-taking skills. They expect your relationship with
them to be based on the same rules that apply to interactions in other situations. Therefore it is your responsibility
to honor their expectations and find ways to achieve your goals within that context.
2. Children’s efforts to achieve their best for you will be built on the expectation that you and they are out to please
each other in the interaction. You want to be accepted by the child as a rewarding, appreciative adult who is
generally fun to be with.
3. For older children, you need to strive for a balance in which you are in control as much as you need to be to have
your questions answered and the child is in control as much as possible otherwise. For example, it is important that
you maintain control over your test materials, are relatively firm when you make a request that is a necessary part
of the testing process, and only offer choices where they are truly available (e.g., avoid asking questions such as the
following if they are not true offers: “Do you want to look at some pictures with me now?”).
4. Help children cooperate by informing them about the content, order, and time frames associated with various
assessment tasks. Toward this end, consider doing the following: (a) Whenever possible, allow the child to make
choices in ordering activities, and (b) devise a method to let children know how much more is required of them. For
older children, you can use a list where each completed item is checked off or rewarded with a sticker. For younger
or less sophisticated children, you can use tokens equaling the number of activities, which are removed from sight
or moved to a different location as each activity is finished.

be alive when I finished it and where I could get help for it if it were still alive, than I spent actively focused on the
outcome of the examination. With predictable results. (Sadly, the bird fared no better than my exam grade.)
Motivation is particularly critical for measures that are intended to elicit one’s best effort. One variety of such measures consists of those in which clients are assumed to be doing their best under conditions stressing accuracy, speed of execution, or
both. These are sometimes called maximal performance measures. Common examples of maximal performance
measures in childhood language disorders include measurement of language functions in which responses are timed as
well as a variety of speech production measures, including diadochokinetic rate. In a discussion of such measures used to
study speech production, Kent, Kent, and Rosenbek (1987) cautioned that extreme care should be taken before
concluding that a test taker is fully aware and motivated and therefore likely to produce a performance that can
reasonably be compared with norms or behavioral standards. The need for caution is particularly great for younger
children and for children with either Down syndrome or autism, but it should always be a concern for any child. Whereas
the level of concern should be greatest for maximal performance testing, any testing of a child will be subject to reduced
validity if the child is uninterested or overly anxious.
Enabling behaviors and knowledge are defined by Salvia and Ysseldyke (1995) as “skills and facts that a person must
rely on to demonstrate a target behavior or knowledge.” If an assumed enabling behavior is absent or diminished,
performance on the
measure may no longer be associated with the behavior under study; hence its validity is threatened dramatically.
Enabling behaviors that are frequently assumed in children’s language tests include adequate vision, hearing, motor skill,
and understanding of the dialect in which the test is constructed. In fact, although I discussed it earlier as a separate
category, positive motivation to participate in assessment is a frequently assumed enabling behavior.
Reliability
Reliability, or consistency in measurement, is invariably listed as a major factor affecting validity because it is a
necessary condition for validity, meaning that a measure can only be valid if it is also reliable. Reliability does not assure
validity, however. Figure 3.2 illustrates this relationship between reliability and validity using archery as an analogy.
Target number 1 demonstrates the handiwork of an archer whose aim is both reliable and valid; number 2, an archer
whose aim is reliable, but not valid; and number 3, an archer whose aim is neither reliable nor valid. In behavioral
measurement, the use of measures with degrees of reliability and validity similar to that shown in targets 2 and 3 will
have similarly negative outcomes, although unfortunately the outcomes may not be as obvious and, therefore, will be
harder to detect—and, possibly, to rectify.

Fig. 3.2. A graphic analogy illustrating the relationship between reliability and validity.
One point (no pun intended) made by Fig. 3.2 is that reliability limits how valid a measure can be; any loss of reliability
represents a loss of validity. Thus, information about reliability can provide very important insight into the quality of a
measure. To illustrate this problem in a more lively way, imagine the problems associated with an elastic and therefore
unreliable ruler. Over repeated measurements of a single piece of wood with such a ruler, its user on each attempt might
try desperately, even comically, to apply exactly the same outward pressures to the ruler—almost certainly in vain, with
measurements of 5 inches one time, 6 inches the next, and so on. With such immediate feedback, the user of the measure
would surely recognize the hopeless lack of validity in these measurements and would undoubtedly go looking for a
better ruler. Unfortunately, when human behavior is being measured, even measures with reliability equivalent to that of
an elastic ruler would not be so easily recognized. Thus, because of the importance of reliability, the next section of this
chapter is devoted to a more detailed explanation of reliability—what it is and how it is studied.
Reliability
Reliability can be defined as the consistency of a measure across various conditions—such as conditions associated with
changes in time, in the individual administering or scoring the measure, and even changes in the specific items it
contains. If a measure is shown to be consistent in its results across these conditions, then its user can make inferences
from performance under observed conditions to behaviors and skills shown in other, unobserved conditions. In short,
acceptable reliability allows for generalization of findings obtained in the assessment situation to a broader array of real-
life situations—those in which test users are really more interested.
When the reliability of a measure is examined during the course of its construction, that information is frequently
represented using another type of correlation coefficient called a reliability coefficient. Alternatively, more sophisticated
statistical methods have been developed to examine the reliability of measures on the basis of an influential perspective
called Generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972), which attempts to examine several
sources of inconsistency simultaneously. These methods, however, are relatively recent and only infrequently applied in
speech and language measures (Cordes, 1994).
Another way of thinking about reliability is in terms of how it affects an individual score. The most popular framework
guiding this perspective on reliability is sometimes described as the “classical psychometric theory’’ or the “classical
true-score theory.” Although recent developments, including Generalizability theory, have eclipsed classical theory as
the cutting edge of psychometrics (Fredericksen, Mislevy, & Bejar, 1993), classical theory nonetheless pervades much of
the practical methods used by test developers and hence test users. Further, its continuing utility is praised even by those
actively working along other lines (e.g., Mislevy, 1993).
The most important assumption associated with classical true-score theory (Allen & Yen, 1979) is that an observed score
(a score someone actually obtains) is the sum of the test taker’s true score plus some nonsystematic error. Thus, the true
score is an
idealization. It has alternatively been described as the score you would find if you had access to a crystal ball or as the
mean score a test taker would achieve if tested infinitely. Notice that error and reliability are correlated in this
perspective. Specifically, they are inversely related: The larger the reliability, the smaller the error.
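For readers who like to see the arithmetic behind this model, the short Python sketch below simulates the classical assumption with an invented true score and error size: each observed score is the true score plus nonsystematic error, so the mean over many hypothetical administrations converges on the true score.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

TRUE_SCORE = 100   # idealized true score (invented for the example)
ERROR_SD = 5       # spread of the nonsystematic error (invented)

# Classical model: observed score = true score + nonsystematic error.
observed = [TRUE_SCORE + random.gauss(0, ERROR_SD) for _ in range(10_000)]

# Over many imagined administrations the error averages out,
# so the mean observed score approaches the true score.
print(round(statistics.mean(observed), 1))
```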
Besides its historical value, this perspective on reliability is useful because it foreshadows our ability to apply reliability
information obtained on a group to possible error in the observed score of an individual test taker, such as our client.
When the reliability of a measure is expressed in relation to individual scores, that information is represented using a
measure known as the standard error of measurement (SEM). Its mention here is meant to whet your appetite for further
information, which is provided later under the heading Internal Consistency.
Ways of Examining Reliability
Three types of reliability are of most frequent interest—test–retest reliability, internal consistency reliability, and
interexaminer reliability. A fourth type of reliability, alternate-forms reliability, is relatively infrequently used. The
methods used to demonstrate such reliability with a particular group of test takers depend to some extent on whether it
will be interpreted using a criterion-referenced or norm-referenced approach. Whereas there is widespread agreement
concerning the methods to be used to study the reliability of norm-referenced measures, debate continues concerning the
best methods to be used with criterion-referenced measures and whether methods traditionally developed for norm-
referenced measures can also be used with criterion-referenced measures (Gronlund, 1993; Nitko, 1983). I discuss
reliability primarily from the traditional, or norm-referenced, perspective, but note those points at which methods
recommended for criterion-referenced measures depart from that perspective.
Test–Retest Reliability
Test–retest reliability is studied in order to address concerns about a measure’s consistency over time. It is particularly
important where the characteristic being measured is thought to remain relatively constant for at least shorter periods of
time (such as 2 weeks to a month). Sometimes a distinction is made between examinations of reliability over periods of
time under 2 months and those of reliability over longer periods of time, which is then termed stability (e.g., Watson,
1983). However, more common is a tendency for the terms test–retest reliability and stability to be used interchangeably.
For norm-referenced measures used with children with language impairments, test–retest reliability is typically studied
by testing a group of children similar to those for whom the measure is intended on two occasions, usually no more than
a month apart. A correlation coefficient, called a test–retest reliability coefficient, is calculated to describe the
relationship between the two sets of scores and is interpreted in a manner identical to that used for previous correlation
coefficients, with increasing correlation size showing a greater degree of relatedness between the two sets of scores.
For measures used with children, the test–retest interval is particularly crucial because rapid developmental changes are
likely to affect whatever characteristic is being measured if the test–retest interval is too large. Thus it is imperative that test developers report the size of that interval over
which test–retest reliability is calculated. Only rarely will a measure be examined for test–retest reliability over an
interval longer than a month.
One limitation of test–retest reliability coefficients is their susceptibility to carryover effects where the first testing
affects the second. Depending on the nature of the carryover, the apparent reliability of a measure for use in a one-time
testing situation (the most typical application) might be either inflated or deflated (Allen & Yen, 1979). For example,
practice effects might make the test easier on the second testing, causing answers to change from the first to the second testing, which would result in a reliability coefficient that is smaller than it would be if carryover had not occurred. On the
other hand, test takers may remember their answers from the first testing and simply repeat them on the second, resulting
in a reliability coefficient that is larger than it would be if carryover had not occurred. Because of this, test developers
will sometimes adopt methods other than the straightforward test–retest method, choosing to use alternate-forms
retesting methods to supplement or sometimes even replace test–retest data.
Many measures of considerable utility to speech-language pathologists working with children who have language
impairments are not standardized tests for which reliability data are provided. Instead, they are informal measures
devised for a limited purpose. For informal measures it is more common to discuss the concept of consistency under the
heading of agreement. Thus, for example, it is possible to calculate test–retest agreement for an informal measure used
by a single clinician.
Figure 3.3 provides an example of an informal probe measure for which an agreement measure is calculated. Although
this example uses two judges, analogous methods can be used to examine consistency for a single judge over time. In
this example, the importance of agreement measures in giving you a sense of the consistency of measurement is
highlighted when you notice that the two judges arrived at exactly the same percentage correct calculation for the client.
However, they did so while agreeing about which words were correctly produced at a percentage almost equal to that
predicted if their judgments were due to chance (50%)! A particularly popular alternative to the simple procedure I
described is the Kappa coefficient (Fleiss, 1981; Hsu & Hsu, 1996), which addresses this problem of chance agreement.
McReynolds and Kearns (1983) is an especially helpful resource for those interested in a more thorough description of
agreement measures. Yet another resource for those interested in a detailed discussion of the meaning and relative merit
of such measures can be found in Cordes (1994).
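The point-to-point calculation illustrated in Fig. 3.3, and the chance correction that Kappa provides, can both be sketched in a few lines of Python. The judges' scores below are invented for illustration, and the function shown handles only the simple two-judge, dichotomous case; note that the invented data reproduce the situation described above, where agreement looks respectable but is no better than chance.

```python
def point_to_point_agreement(judge_a, judge_b):
    """Proportion of items on which two judges assign the same score."""
    matches = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return matches / len(judge_a)

def cohens_kappa(judge_a, judge_b):
    """Chance-corrected agreement for dichotomous (1 = correct, 0 = incorrect) scores."""
    n = len(judge_a)
    p_observed = point_to_point_agreement(judge_a, judge_b)
    # Chance agreement is estimated from each judge's marginal proportions.
    pa, pb = sum(judge_a) / n, sum(judge_b) / n
    p_chance = pa * pb + (1 - pa) * (1 - pb)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical scores from two judges for a 10-item probe
judge_a = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
judge_b = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]

print(point_to_point_agreement(judge_a, judge_b))  # 0.5 — apparently 50% agreement
print(round(cohens_kappa(judge_a, judge_b), 2))    # 0.0 — no better than chance
```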
Internal Consistency
Internal consistency is studied in order to address concerns about a measure’s consistency of content. It is primarily of
interest in cases where a test or subtest has items that are assumed to function similarly. Obtaining information about
internal consistency for norm-referenced measures presents few practical difficulties: The same information used to
provide norms is used to study internal consistency. Thus, information about internal consistency is often provided, even
if little else is.
Fig. 3.3. An example showing how to calculate a point-to-point measure of agreement.

The most basic method for examining internal consistency involves the calculation of a split-half reliability coefficient, where performances of a group of test takers like those for whom the measure is designed are compared for two halves of the measure. Although the measure may be split in half using a variety of strategies, most often even items are compared with odd items through the calculation of a correlation coefficient. Higher correlations are taken as evidence of internal consistency.
A major problem with the split-half method is that because you compare only one-half of the test items with the other
half, the amount of data used in the correlation coefficient is half what it should be. This has the effect of making the
correlation coefficient smaller than it would otherwise be. Alternative methods have been developed to cope with this
limitation.
The two most important alternative measures of internal consistency encountered in tests for children are the Kuder–
Richardson formula (KR20) and Coefficient alpha (α). KR20 (which, in case you’re curious about the name, was the
twentieth formula used by Kuder & Richardson in a famous 1937 article—Kuder & Richardson, 1937) is used only for
dichotomously scored items (e.g., those scored as 1 = right and 0 = wrong only). It cannot be used for items that are not
scored dichotomously (e.g., those using a rating system from 1 to 4). This limitation led to the development of α.
Coefficient alpha is a more general formula than KR20 and can handle both dichotomously and nondichotomously
scored measures. KR20 and α are thought to be more sensitive than split-half methods to homogeneity of item content,
meaning the extent to which items are aimed at the same specific construct. Thus they are sometimes described as
measures of test homogeneity.
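As a rough illustration of how these statistics work, here is a Python sketch of coefficient α computed from a small, invented item-by-examinee score matrix. Because the items in the example happen to be scored 1/0, the identical computation yields KR20.

```python
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def coefficient_alpha(items):
    """Coefficient alpha for `items`, a list of per-item score lists
    (one score per test taker, in the same order for every item).
    With dichotomous (1/0) items this is equivalent to KR20."""
    k = len(items)                 # number of items
    n = len(items[0])              # number of test takers
    totals = [sum(item[j] for item in items) for j in range(n)]
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Hypothetical dichotomous scores: 3 items, 4 test takers
items = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(round(coefficient_alpha(items), 2))  # → 0.89
```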
Near the beginning of this section on reliability I introduced the idea that reliability can be considered in terms of its
impact on a given score using a statistic called the SEM. The SEM is discussed in greater detail at this point because it is
usually based on a measure of internal consistency (possibly because of the easy availability of this type of reliability
data, rather than for theoretical reasons). The formula for the SEM is relatively easy to understand and use. It is
calculated by multiplying the standard deviation of the test by the square root of 1 minus the reliability coefficient. It
represents the degree of error affecting an individual score.
Recall that as reliability increases, the size of the SEM decreases: The more reliable a measure, the smaller the error
affecting individual scores and the more precise the measurement. Thus, one can use the SEM directly as a means of
determining which of two competing measures is more precise. For example, for a 4-year-old child you may want to
compare the SEM for two tests designed to address receptive vocabulary skills using very similar tasks. Although there
are additional grounds on which you may want to compare the two tests, precision would be one important feature to
consider in making a choice between them. Searching in their test manuals, you find that the SEM for the first test is 7
(for which the mean standard score is 100, SD = 15) and for the second test is 4 (for which the mean standard score is also 100, SD = 15). Thus, the second is the more precise of the two measures. (Although it is possible to make essentially
the same comparison using the reliability coefficients for these measures, phrasing that comparison in terms of the SEM
allows you to see much more vividly the impact on an individual score.)
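A brief Python sketch of the formula follows. The reliability coefficients used are illustrative values back-solved to be consistent with the SEMs of 7 and 4 quoted above (with SD = 15); in practice the test manuals would report the actual coefficients.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Illustrative reliability values chosen to be consistent with the
# SEMs of 7 and 4 described in the text (both tests have SD = 15).
print(round(sem(15, 0.78), 1))  # ≈ 7.0 — the less precise test
print(round(sem(15, 0.93), 1))  # ≈ 4.0 — the more precise test
```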
The SEM can also be used, along with information about the normal curve, to obtain a confidence interval around an
obtained score—a concept I discuss more fully in chapter 9 as part of a larger discussion of test scores and identification
decisions.
Interexaminer Reliability
Interexaminer reliability is studied in order to address concerns about a measure’s consistency across examiners.
Essentially, this form of reliability study addresses the question, Are different examiners likely to affect performance on
the measure? Depending on the specific focus, it can be called by a variety of names: interscorer reliability, interobserver
reliability, interjudge reliability, among others. The nature of the study depends on which aspects of the sequence of
activities involved in administering, scoring, and interpreting the measure are expected to be most vulnerable to
inconsistency. For example, if a measure involves a sophisticated perceptual judgment on the part of the examiner (such
as the application of a 5-point rating scale), that aspect of the test’s use would be the primary focus of a reliability study.
Alternatively, if the calculation of a measure’s total score depended on the calculation and correct recording of numerous
sums, then that aspect of the test’s use would be a more important focus of study.
Where possible during reliability studies, two testers are asked to perform the same function (e.g., scoring), either from
tape (audio or video) or live, for a single group of test takers. Then the resulting scores are examined using a reliability
coefficient. When the actual administration of items seems to provide a source of error, the same
group of test takers may be tested by two testers. The results will be less clear-cut in that case, however, because
differences in the two testing times could be due to differences either in testers or in testing times (test–retest reliability).
For informal measures, consistency across users of the measure is more commonly discussed in terms of agreement. For
example, it is possible to calculate agreement for two examiners using a behavioral probe to examine performance within
a specific treatment task. The methods are identical to those described in Fig. 3.3.
Alternate-Forms Reliability
Alternate-forms reliability is studied to address concerns about consistency across varying forms of the test. Multiple
versions of a test, termed alternate or parallel forms, tend to be created when a test will probably be used on more than
one occasion with an individual, thus making repeat testing subject to possible carryover effects. Alternate forms are
created by selecting items for each form from a common pool of possible items. Alternate-forms reliability is studied by
administering one version, then another (balanced so that half of the test takers will take one version first and the other
half will take the other version first), and then calculating a correlation coefficient for the resulting two sets of scores.
Often the interval between testings is very short, and the correlation coefficient is thought to reflect only differences in
the form used. If the interval is longer, however, the resulting correlation coefficient can be expected to reflect not only
differences in content between the two forms, but also changes due to time. Therefore, information about the interval
between testings should be reported as part of the test developer’s description of the study.
Alternate, or parallel, forms are rarely provided for tests used with children who have developmental language problems.
They are typically reserved for tests that are used with greater frequency, such as some educational and intelligence tests.
Nonetheless, there are a small number of tests (e.g., the PPVT-III, Dunn & Dunn, 1997) that do provide this information,
which is why it is considered here.
Factors Affecting Reliability
Any factor that increases the likelihood that nonsystematic error will enter into the testing situation will, by definition,
decrease a measure’s measured reliability. Consequently, any lack of similarity between testing conditions during a study
of test–retest reliability or interexaminer reliability, for example, is likely to result in lower reliability coefficients. In addition, a couple of factors that may not be so obvious can also distort the magnitude of reliability
coefficients. These are discussed further in a variety of sources, including Nitko (1983) and Gronlund (1993).
First, the length of the measure used will affect the size of the reliability coefficient. In general, the longer a measure, the
greater its reliability. This factor presents a significant challenge to those wishing to develop tests for test takers with
shorter attention spans (e.g., children!).
Second, the specific group on which reliability is studied may affect the size of the obtained reliability coefficient. One
reason for this is a phenomenon known as restriction of range. What that means is that when there is little variability in performance in a distribution of scores (the
restricted range), the size of the correlation coefficient will be smaller than if the same pattern of variation were extended
over a larger range of scores.
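A small numerical illustration (with invented scores) shows the effect: the same pattern of pairing between two sets of scores yields a noticeably smaller correlation when only a restricted slice of the range is examined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (math.sqrt(sum((a - mx) ** 2 for a in x)) *
                  math.sqrt(sum((b - my) ** 2 for b in y)))

# Hypothetical scores for ten test takers on two administrations of a measure
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

print(round(pearson_r(x, y), 2))          # full range: 0.94
print(round(pearson_r(x[6:], y[6:]), 2))  # top four scorers only: 0.6
```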
Another reason for the possibility of specific groups affecting the size of reliability coefficients is that characteristics of
one group may make it susceptible to error that does not affect a different group. Take the performance on an IQ test of
one group with and one group without an identified learning disability. The ability of those two groups to perform
consistently under the same conditions may not be the same, leading to differing results if reliability coefficients were to
be calculated for each group. The danger would be, however, that rather than looking for evidence for each group
separately, one would consider evidence about the group without an identified learning disability as sufficient for both
groups. Here, as has been stressed before, the adequacy of evidence concerning reliability (and validity) needs to be
considered in light of the specific circumstances (who is being measured and for what purpose) motivating the clinician’s
search for an appropriate measure. In the next chapter, procedures are presented that are designed to help you learn how
to evaluate individual measures within a client-oriented framework.
Summary
1. Although behavioral measurement has relatively ancient roots, clinical and educational testing began only at the end
of the 19th century.
2. The most influential standards developed for educational and psychological testing have been those of APA, AERA,
and NCME (1985). These standards apply to all behavioral measures, but apply most strictly to standardized tests.
3. The test user is responsible for assuring that a specific measure is likely to provide the information being sought (i.e.,
that the measure is a valid measure for the purpose to which it will be put).
4. Because all evidence of validity depends on demonstrations that the measure captures the theoretical construct it was
intended to assess, construct validity can be seen as the overarching framework of validation. As a result of historical
factors, however, three types of evidence are typically discussed: construct validity, content validity, and criterion-
related validity.
5. Four specific methods of construct validation include the developmental method, the contrasting groups method,
factor analytic studies, and studies of convergent and discriminant validity.
6. Content validation activities occur as part of the development process (e.g., documentation of the test plan, item
analyses) and, following development, as part of validation activities.
7. Standardized measures designed for criterion-referenced interpretation and for norm-referenced interpretation are
developed using similar steps, but differ in the methods used to make decisions at each step.
8. Face validity, or public relations validity as it is sometimes called, involves a measure’s appearance of validity rather
than the degree of validity it will be shown to have on closer, systematic scrutiny.
9. Criterion-related validity involves the collection of evidence suggesting that the target measure performs in a manner
similar to that of an already validated criterion measure. Concurrent validity refers to criterion-related validation
studies in which the criterion and the target measure are administered to the participant at the same point in time,
whereas predictive validity refers to studies in which the target measure is obtained first and the criterion at a later time.
10. Validity is affected by appropriate measure selection, test administration conditions, reliability, and client factors
such as motivation and enabling behaviors.
11. Reliability (consistency of measurement) places an upper limit on possible validity, but even perfect reliability does
not ensure validity. Reliability is therefore said to be a necessary, but not sufficient condition for validity.
12. Studies of reliability usually target consistency across testing occasions (test–retest reliability), across subsets of test
items (internal consistency reliability), and across testers (interexaminer reliability). For speech-language pathology and
audiology measures, consistency across test versions (alternate form reliability) is much less frequently examined.
13. When reliability information is reported for a particular measure, reliability correlation coefficients are used most
frequently. When such information is reported in relation to a specific score obtained by an individual, SEM is used.
14. When information about consistency is sought for informal measures, agreement measures are usually calculated.
The most common measures of agreement are test–retest and interexaminer agreement.
15. Classical true test score theory holds that the score actually received by an individual (the obtained score) is
composed of error and the theoretical score the individual “should” receive (the true score).
16. SEM can be used to construct a confidence interval within which one can determine a high probability of finding the
individual’s true score.
17. Reliability is affected by test length (with fewer items resulting in lower reliability) and by the range of abilities
represented in the reliability subjects (with a smaller range of abilities resulting in lower reliability).
Key Concepts and Terms
construct validation: the accumulation of evidence showing that a measure relates in predicted ways to the construct it is
being used to measure.
content validation: the accumulation of evidence suggesting that the content included in a measure is relevant and
representative of the range of behaviors fitting within the construct being measured.
contrasting groups method of construct validation: the accumulation of evidence suggesting that groups known to differ
in the extent to which the tested construct applies to them also differ in their performance on the target measure.
convergent and discriminant validation: the accumulation of evidence suggesting that a measure correlates significantly
and highly with measures aimed at the same construct (convergent validation) as well as evidence that the measure does
not correlate significantly and highly with measures targeting different constructs (discriminant validation).
criterion-related validation: the accumulation of evidence suggesting that the measure performs in a manner similar to
another measure (the criterion) that is believed to be a valid indicator of the construct under study, either where both
criterion and measure are administered at one point in time (concurrent validation) or with the criterion measured at a
later point in time than the target measure (predictive validation).
developmental method of construct validation: the accumulation of evidence suggesting that performance on a measure
changes with age (usually improves), when the measure is meant to target a construct that is thought to change with age.
enabling behaviors and knowledge: behaviors not related to the construct under study that are nonetheless required for
successful test performance (e.g., vision for tasks using visual stimuli, previous exposure to vocabulary being used).
face validity: the appearance of validity of a measure, which is not necessarily reflective of its actual validity.
factor analysis: a number of statistical procedures used during test development and construct validation to describe and
confirm the relationships of a number of variables.
informal measure: a measure developed for a limited measurement purpose for which a standardized measure was
inappropriate or unavailable; for example, probes designed by speech-language pathologists to assess learning within a
treatment session are usually informal measures.
interexaminer agreement: the extent to which results of a measure agree when it is administered, scored, or interpreted
by two or more examiners.
interexaminer reliability: the consistency of a measure across two or more examiners, also termed interjudge reliability,
interobserver reliability, and intertester reliability.
internal consistency: the consistency of a measure across subdivisions of its content, usually measured using split-half
reliability, KR20, or coefficient α.
item analysis: a variety of procedures applied to the pool of items being considered for inclusion in a measure that
examine its possible contributions to the overall measure.
observed score: the score actually achieved by a given test taker; usually contrasted with “true” score in classical true-
score theory.
reliability: consistency of a measure across changes in time (test–retest), in the individual administering or scoring it
(interexaminer), and in the specific items it contains (internal consistency).
standard error of measurement (SEM): a measure of reliability that is expressed in terms of the original units of
measurement (e.g., number of items).
test–retest reliability: the consistency of a measure that is administered at two points in time.
test: a behavioral measure in which a structured sample of behavior is obtained under conditions in which the “tested”
individual is assumed to perform at his or her best (APA, AERA, & NCME, 1985).
true score: a theoretical value that hypothetically would be obtained by a test taker if the measure being used were
perfectly reliable, that is, were unaffected by error.
validity: the extent to which a measure actually measures what it claims to measure.
Study Questions and Questions to Expand Your Thinking
1. Define validity.
2. Choose a specific test dealing with child language. Compare sources of information about its content: (a) content
implied by the title, (b) its apparent content on the basis of the author’s overview statements concerning the intended
purpose of the test and its intended content, and (c) individual items. How might a naive test user be misled if he or she
only considers the title?
3. Describe the three major kinds of validity evidence and their relationships with one another.
4. Translate the following sentence into a form that someone unfamiliar with testing would be able to understand:
Reliability is a necessary but not sufficient condition for validity.
5. List the steps required in the development of a standardized measure. Compare and contrast these steps as they apply
to criterion- versus norm-referenced measures.
6. Imagine that you’ve set up a task with 20 items that you believe may be difficult for you to rate consistently as correct
or incorrect. What procedure would you use to obtain a measure of your consistency in rating these items?
7. List three factors known to affect validity.
8. List three factors known to affect reliability.
9. Explain how the amount of variability in test scores affects the magnitude of correlation coefficients. What
implications does this effect have for test developers?
10. What is meant by the convergent–discriminant approach to construct validity?
11. How does reliability relate to classical true score theory?
12. Why is internal consistency associated with three measures: split-half reliability, KR20, and coefficient α?
13. List five enabling behaviors required for the performance of a picture vocabulary test in which the test taker is
required to listen to the name of an action and pick out a picture (from a group of 4) that corresponds to the action.
14. Reflect on situations in which a teacher, coach, or parent has helped you do something that you found particularly
difficult. What did they do that helped you feel motivated to try that difficult something? How might you apply the same
approach to the testing of a reluctant child?
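Question 6 above concerns intra-rater consistency. One common procedure is to score the 20 items twice, with a delay between passes, and then compute point-to-point percent agreement, optionally supplemented by Cohen’s kappa to correct for agreement expected by chance. The sketch below illustrates both calculations; the two rating passes shown are hypothetical, purely for illustration.

```python
# Intra-rater consistency for 20 items rated correct (1) / incorrect (0).
# The two rating passes below are hypothetical example data.

def percent_agreement(pass1, pass2):
    """Point-to-point agreement: proportion of items rated identically."""
    agree = sum(a == b for a, b in zip(pass1, pass2))
    return agree / len(pass1)

def cohens_kappa(pass1, pass2):
    """Chance-corrected agreement for two binary rating passes."""
    n = len(pass1)
    po = percent_agreement(pass1, pass2)   # observed agreement
    p1 = sum(pass1) / n                    # proportion rated 1 on pass 1
    p2 = sum(pass2) / n                    # proportion rated 1 on pass 2
    pe = p1 * p2 + (1 - p1) * (1 - p2)     # agreement expected by chance
    return (po - pe) / (1 - pe)

first  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0]
second = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0]

print(round(percent_agreement(first, second), 2))  # 0.9  (18 of 20 items match)
print(round(cohens_kappa(first, second), 2))       # 0.78
```

Note that kappa is lower than raw agreement because it discounts the matches two passes would produce by chance alone, which is why it is often preferred when base rates of “correct” ratings are high.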
Recommended Readings
American Educational Research Association, American Psychological Association, and National Council on
Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American
Psychological Association.
Gronlund, N. E. (1993). How to make achievement tests and assessments. (5th ed.). Boston: Allyn & Bacon.
Lyman, H. B. (1963). Test scores and what they mean. Englewood Cliffs, NJ: Prentice-Hall.
McReynolds, L., & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Baltimore:
University Park Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American
Council on Education and Macmillan.
Sattler, J. (1988). Assessment of children (3rd ed.). San Diego, CA: Author.
References
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic
techniques. Washington, DC: Author.
American Psychological Association, American Educational Research Association, and National Council on
Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American
Psychological Association.
Anastasi, A. (1954). Psychological testing. New York: Macmillan.
Anastasi, A. (1980). Anne Anastasi. In G. Lindzey (Ed.), A history of psychology in autobiography (pp. 1–37). San
Francisco: W. H. Freeman and Company.
Anastasi, A. (1988). Psychological testing (6th ed.). Upper Saddle River, NJ: Prentice-Hall.
Berk, R. A. (1984). Screening and identification of learning disabilities. Springfield, IL: CC Thomas.
Bzoch, K. R., & League, R. (1991). Receptive–Expressive Emergent Language Test (REEL-2). Austin, TX: Pro-Ed.
Campbell, D. P., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait–multimethod matrix.
Psychological Bulletin, 56, 81–105.
Carver, R. P. (1974). Two dimensions of tests: Edumetric and psychometric. American Psychologist, 29, 512–518.
Cordes, A. K. (1994). The reliability of observational data: I. Theories and methods for speech-language pathology.
Journal of Speech and Hearing Research, 37, 264–278.
Cronbach, L. J., Gleser, G. D., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements:
Theory of generalizability for scores and profiles. New York: Wiley.
Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test–III. Circle Pines, MN: American Guidance Services.
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Fredericksen, N., Mislevy, R. J., & Bejar, I. I. (Eds.) (1993). Test theory for a new generation of tests. Hillsdale, NJ:
Lawrence Erlbaum Associates.
German, D. J. (1986). Test of Word Finding. Allen, TX: DLM Teaching Resources.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18, 519–
521.
Glaser, R., & Klaus, D. J. (1962). Proficiency measurement: Assessing human performance. In R. Gagne (Ed.),
Psychological principles in systems development (pp. 419–476). New York: Holt, Rinehart & Winston.
Gronlund, N. (1993). How to make achievement tests and assessments. (5th ed.). Boston: Allyn & Bacon.
Hammill, D. D., & Newcomer, P. L. (1997). Test of Language Development Intermediate–3. Circle Pines, MN:
American Guidance Service.
Hresko, W., Reid, D., & Hammill, D. D. (1981). Test of Early Language Development. Los Angeles: Western
Psychological Associates.
Hsu, J. R., & Hsu, L. M. (1996). Issues in design research and evaluating data pertaining to children’s syntactic
knowledge. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children’s syntax (pp. 303–341).
Cambridge, MA: MIT Press.
Kent, R. D., Kent, J. F., & Rosenbek, J. C. (1987). Maximum performance tests of speech production. Journal of Speech
and Hearing Disorders, 52, 367–387.
Kuder, G. F., & Richardson, M. W. (1937). The theory of estimation of test reliability. Psychometrika, 2, 151–160.
Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of
educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196).
Mahwah, NJ: Lawrence Erlbaum Associates.
McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Baltimore:
University Park Press.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American
Council on Education and Macmillan.
Mislevy, R. J. (1993). Foundations of a new test theory. In N. Fredericksen, R. J. Mislevy, & I. I. Bejar (Eds.), Test
theory for a new generation of tests (pp. 19–40). Hillsdale, NJ: Lawrence Erlbaum Associates.
National Education Association. (1955). Technical recommendations for achievement tests. Washington, DC: Author.
Nitko, A. J. (1983). Educational tests and measurement: An introduction. New York: Harcourt Brace Jovanovich.
Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston: Houghton Mifflin.
Semel, E., Wiig, E. H., & Secord, W. (1987). Clinical Evaluation of Language Fundamentals–Revised. San Antonio, TX: The
Psychological Corporation.
Stevens, G., & Gardner, S. (1982). The women of psychology. Cambridge, MA: Schenkman.
Stillman, R., Snow, R., & Warren, K. (1999). “I used to be good with kids.” Encounters between speech-language
pathology students and children with Pervasive Developmental Disorders (PDD). In D. Kovarsky, J. Duchan, & M.
Maxwell (Eds.), Constructing (in)competence (pp. 29–48). Mahwah, NJ: Lawrence Erlbaum Associates.
Torgesen, J. K., & Bryant, B. R. (1994). Test of Phonological Awareness (TOPA). Austin, TX: Pro-Ed.
Van Riper, C., & Erickson, R. (1969). A predictive screening test of articulation. Journal of Speech and Hearing
Disorders, 34, 214–219.
Zimmerman, I. L., Steiner, V. G., & Pond, R. E. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological
Corporation.
CHAPTER 4

Evaluating Measures of Children’s Communication and Related Skills

Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place

Evaluating Individual Measures
In the last chapter, you were introduced to the most important test-related considerations for evaluating individual
measures: validity and reliability. In this chapter, you will learn about factors to consider in evaluating measures and
about how to perform such an evaluation—a process that makes the most sense when the focus is shifted away from the
test itself and toward the reason for its use: the child in question and the larger world in which he or she moves.
Speech-language pathologists use measurement information to achieve goals affecting children’s health, development,
family life, education, and social well-being. They obtain this information cooperatively (working primarily with
families and other professionals) and share it with others as a means of achieving the child’s greatest good. This
cooperative pursuit on behalf of the child is not simply a practical matter, although it certainly affects the logistics of
measurement in very practical ways. Rather, a rich understanding of the way in which children’s interactions with the
world are mediated by their family and culture is critical to framing questions that will result in valid responses to the
child’s needs. Also needed, however, is an appreciation that the clinician brings his or her own history, culture, and
workplace constraints to the question-asking situation—
all of which will also bear on which questions are asked and how they are answered. In the first half of this chapter, I
discuss the larger context of measurement, focusing first on factors affecting the child and then on factors that more
directly impinge on the clinician. Figure 4.1 provides a visual model for thinking about this larger context.
Contextual Considerations in Assessment: The Bigger Picture in Which Assessments Take Place
In 1974, Urie Bronfenbrenner offered an evaluation of the developmental research of that era that can still
chill the hearts of researchers and clinicians who study children in highly structured contexts. Specifically, he described
that research as “the study of the strange behavior of children in strange situations for the briefest possible period of
time” (Bronfenbrenner, 1974). This quotation brings into sharp focus a deep concern that researchers were failing to
capture the essential factors affecting the child by failing to study them and their most influential companions (usually
parents) in the natural situations in which development occurs. Shifting species for a second, one could say that
essentially Bronfenbrenner pointed out that drawing conclusions about children in real life from existing research
paradigms was akin to concluding that one knew about lions in the wild by observing lions moving around the artificial
rocks, caves, and ponds of their enclosure in a zoo. Anyone who has seen a wild-eyed, noncompliant, and virtually
nonverbal child leave a clinic room to begin a fast-paced, detailed litany of his ordeals can understand Bronfenbrenner’s
point—as well as the relevance of the lion analogy.
A vast research literature was spawned by Bronfenbrenner’s criticism and by the program of research he and others
undertook to understand development through observations of children and their caretakers in real-life settings. The
resulting literature is associated with an evolving theory of development (Bronfenbrenner, 1986; Bronfenbrenner &
Morris, 1998) that can provide us with a valuable starting point for thinking about the larger context of assessment.
A recent articulation of this model (Bronfenbrenner & Morris, 1998) was described by its authors as a “bioecological
model” of development because it emphasizes both the child’s characteristics and the context in which development
occurs as contributors to the process of development. Among the most obvious modifications represented in this version
of the model are the placing of greater emphasis both on biological factors affecting the child and those around him or
her and on the child’s role in affecting his or her environment as well as being affected by it. The enduring central
component of the model, however, and the component that was most needed and championed in speech-language
pathology, is its celebration of the importance of the child’s environment to developmental processes, especially the
social environment (Crais, 1995; Muma, 1998). In the following pages I briefly discuss how current thoughts on the
contexts of family, language, culture, and society as a whole continue to shape and reshape views of valid language
evaluation and how aspects of the clinician’s context also affect the evaluation of children’s language.
Fig. 4.1. A model of factors affecting the child and the clinician in the assessment process. From Assessing and
screening preschoolers: Psychological and educational dimensions (p. 6), by Vasquez-Nuttall, Romero, and Kalesnik,
Boston: Allyn & Bacon. Copyright 1999 by Allyn & Bacon. Adapted by permission.
Familial Contexts
Why should families be seen as the central forum for language development and, thus, language assessment? From the
time the child is born, the family constitutes the most basic and enduring of contexts in which children spend their time
and their energies. Further, the foundation of communication and language is established with the give and take of early
feeding and proceeds onward to all attained levels of linguistic achievement.
Although these truths have probably always been recognized by professionals at some level, they have tended to be
overlooked in measurement practices until the diffusion of theories such as that proposed by Bronfenbrenner led to
political action. Specifically, the Education of the Handicapped Act Amendments of 1986 required Individual
Educational Plans (IEPs) for children ages 3 to 5 and Individualized Family Service Plans (IFSPs) for children younger
than 3. Through these requirements, the law embodied the perspective that because of the intertwined and interdependent
nature of child and family needs, effective evaluation and intervention for children requires inclusion of the family as
collaborators—that is, as active agents in the life and affairs of the child rather than as passive recipients of professional
activities. Particularly for children below the age of three, this perspective was seen as crucial, hence the requirement of
the IFSP for that age group.
Within the IFSP provisions, assessments include information about family strengths, needs, and variables related to
program services, as well as about the child’s current level of functioning (Radziewicz, 1995). Radziewicz noted that
effective family assessment is conducted in a manner that is positive for the family, respectful of the family’s values,
inclusive of key family members, nonintrusive, and aimed at targeting family needs and resources (Radziewicz, 1995).
New types of tools have been developed to address clinical questions concerning the nature of the family and of parent–
child interactions. Radziewicz (1995) and Crais (1992, 1995) provided excellent discussions of these.
In addition to serving as a focus of attention of professionals, however, parents and families have become more actively
involved in a variety of “clinical” activities, including screening, providing descriptions and other data, validating
evaluation findings, and even administering some tests. Although these activities are described in later chapters, they are
mentioned here to help you become aware that your consideration of an instrument’s validity will often include thinking
about the suitability of its use with and by parents. Not surprisingly, this need is greatest for younger children and infants
and for children who are more affected by their difficulties.
Cultural-Linguistic Contexts for Assessment
Just as the child is embedded within his or her family, so too is the family embedded within a specific culture and
linguistic context. Thus, effective interaction with families depends not simply on the clinician’s choosing to include
them in the process, but also on her or his knowledge of each family’s cultural and linguistic expectations. The variety of
cultural and linguistic differences affecting a clinician’s interaction with
parents is quite awesome. Among just a few of the differences discussed in a growing literature (Damico, Smith, &
Augustine, 1996; Donahue-Kilburg, 1992; van Kleeck, 1994) are the following:
1. differences in child-rearing practices (e.g., the appropriateness of asking children to engage in question asking or to
recite information already known by listeners);
2. differences in patterns of decision making within families (e.g., which figures are seen as primary decision makers);
3. differences in family choices concerning language and dialect use (e.g., whether children are expected to use the
language of the home); and
4. differences in how difficulties in communication are viewed (e.g., how they are seen as affecting the family
and child).
Differences such as these can affect the nature and extent of communications occurring between clinicians and parents,
how they are included in their child’s care, the nature of intervention, and—most importantly for the purposes of this
book—how language evaluations are planned, executed, and acted on. Also, because evaluations are prompted by
heightened parental concern or can act to promote parents’ focus on their child, evaluations that successfully involve
parents can also enlist parents’ continuing engagement in ways that are critical to the child’s success. Table 4.1
summarizes reported trends in the attitudes of Asian Americans, African Americans, and Hispanic Americans toward
children, family, and child rearing. Of course, these trends represent prejudices, that is, prejudgments, of a type: There is
simply no substitute for finding out how a specific family functions and what its attitudes are, regardless of its culture.
That each child is also a maturing user of ambient language(s) and dialect(s) will affect assessment dramatically. Most
obviously, clinicians are aware of this when they are asked to assess the communication skills of a child whose first
language is not the same as their own, and they must decide whether and how they can be involved with the child.
Clinicians are also aware of this when they serve children who differ in social or regional dialect from themselves. In
both cases, the clinician must often determine whether the differences from the mainstream, dominant, or school
language are due to language disorder or to difficulties specific to second language or dialect acquisition (e.g.,
inadequate exposure, transference effects from the first language or dialect, motivational differences between first and
second language acquisition; Damico et al., 1996).
Issues related to the presence of culturally and linguistically diverse clients were once seen as a matter of sporadic
significance, considered more important in bigger cities with larger immigrant populations and in
geographic regions with greater cultural, ethnic, and dialectal diversity. Now, however, it has been estimated that one in
every three Americans is African American, Hispanic, Asian American, or American Indian (American Speech–
Language–Hearing Association [ASHA], 1999). Although nationally and globally, diversity in language and culture is
the rule rather than the exception, that fact is not represented in the demographics of the professions of
Table 4.1
Trends in Attitudes Toward Children, Family, and Child Rearing

Asian Americans
- Strict gender and age roles
- Father—the family leader, head of family
- Mother—the nurturer, caregiver
- Older males superior to younger males
- Females submissive to males
- Close, extended families
- Multigenerational families
- Older children strictly controlled, restricted, protected
- Physical punishment used
- Parents actively promote learning activities at home—may not participate in school functions
- Children are treasured
- Infant/toddler needs met immediately or anticipated
- Close physical contact between mother and child
- Touch rather than vocal/verbal is primary vehicle of early mother–infant interaction
- Harmony of society more important than individual
- Infant seen as independent and needing to develop dependence on family and society

African Americans
- Mothers and grandmothers may be greatest influences
- Strong extended family ties are encouraged
- Independence and assertiveness encouraged
- Infants may be focus of family attention
- Affectionate treatment of babies, but fear of “spoiling”
- Strong belief in discipline, often physical
- Caregiving of older toddler may be done by an older child

Hispanic Americans
- Strong identification with extended family
- Families tend to be patriarchal with males making most decisions
- Infants tend to be indulged; toddlers are expected to learn acceptable behavior
- Emphasis placed on cooperativeness and harmony in family and society
- Independence and ability to defend self encouraged
- Older siblings often participate in child care

Note. From Family-Centered Early Intervention for Communication Disorders: Prevention and Treatment (p. 21), by G. Donahue-Kilburg, 1992, Gaithersburg, MD: Aspen. Copyright 1992 by Aspen. Reprinted with permission.
speech-language pathology and audiology. Thus, clinicians are increasingly faced with the special challenge of enlarging
their understanding of other cultures and linguistic communities and the skills required to implement that understanding
in their work.
The process of respecting diversity in children and in their families pervades all phases of clinical interaction. Because it
is critical to valid screening, identification, description, and assessments of change, diversity arises as a continuing point
of discussion throughout the remainder of this text. I highlight it here because of its particular relevance to the test
review process discussed later in this chapter.
Societal and Legal Contexts
Just as the child whose language development is in doubt exists as a member of a larger community, so too is the speech-
language pathologist who serves the child. He or she is also a participant in the larger social contexts of a given
profession and workplace within a particular time and place—a given era within a given school district or institution,
state, and country. Each of these contextual factors can affect decisions about assessment. A recent discussion of the
roles and responsibilities of school speech-language pathologists, contained within an extensive ASHA document
available on their website, emphasized this fact (ASHA, 1999). Table 4.2 includes just a small number of the many
factors ASHA described as affecting clinical practice with children. In this brief section, two particularly compelling
sources of effects on measurement practice are addressed: national legislation and changing global perspectives on
disablement.
National Legislation
As mentioned briefly in terms of regulations regarding family involvement, legal influences on how children are
evaluated for language problems represent some of the most powerful influences in clinical practice. In particular,
federal legislation establishing the ways in which public schools address the needs of children has had profound effects
on how children’s problems are screened, identified, and addressed (ASHA, 1999; Demers & Fiorello, 1999). Thus, as
described earlier, it was through Education of the Handicapped Act Amendments of 1986 that ideas about the need for
greater attention to families became a potent factor in shaping actual practice. In this section, I point out the even broader
effects that have resulted from a number of other legislative initiatives, paying particular attention to the Individuals with
Disabilities Education Act (IDEA), which was passed in 1990.
The IDEA built on and modified earlier legislation, including two landmark federal laws: the Education for All
Handicapped Children Act of 1975 (P.L. 94-142), which established many now-standard features of educational
attention to children with special needs, and the Education of the Handicapped Act Amendments of 1986, which mandated
services for those children from birth to age 21, in addition to its role in pressing for greater inclusion of families in educational
educational evaluations. Since 1990, the IDEA has been amended (IDEA Amendments of 1997) and has had regulations
developed for its implementation.
Table 4.2
A Brief List of Some of the Contextual Factors Affecting Speech-Language Pathology Practice Among School-Based
Clinicians (ASHA, 1999)

- Specific federal legislative actions (e.g., the Individuals with Disabilities Education Act of 1990)
- State regulations and guidelines
- Local policies and procedures
- Staffing needs
- Caseload composition and severity
- Cutbacks in education budgets
- Personnel shortages
- Expanding roles
The IDEA and the 1997 amendments to it maintained numerous elements of the earlier legislation. Among the most
important of these maintained features is a mandate for nondiscriminatory assessment. In such assessments, it is required
that measures be administered in the child’s native language by trained personnel following the procedures outlined in
the test manual. In addition, these more recent laws dictate that validity information for a test be specific to the purpose
for which the test is used. Further, this legislation requires that evaluations of children be comprehensive, multifactored,
and conducted by an interdisciplinary team. Although each of these components was viewed as best practice at the
time, it is legislation, together with the potential for litigation when legislation is not followed, that gives rise to the actual
implementation of professional and academic recommendations. However, it’s important to recognize that legislation is
not always in accord with best practices, as I discuss in later sections.
New provisions of the IDEA, its amendments, and the more recent development of regulations implementing it include
some changes in nomenclature, such as abandonment of the term handicapped for the term disabled as the designation
given to children covered by the law. In addition, these legal actions have added several new separate disability
categories, with autism being the most relevant to discussions of language disorders. Other new elements consist of
demands for increased accountability with resulting increases in documentation requirements and insistence that
children’s IEPs contain information connecting the child’s disability to its impact on the general education curriculum
(ASHA, 1999; Demers & Fiorello, 1999).
Because of the legislation described above, speech-language pathologists who work with children in schools are
involved in a broader range of responsibilities and potential roles (ASHA, 1999). The children they evaluate are more
diverse in age, language, and culture, and the collaborative nature of their work has increased dramatically. Also,
clinicians are made more accountable for the validity of the instruments they use and the methods they follow in
evaluating clients.
To a great extent, the effects of national legislation are supportive of good measurement practices. At the same time,
however, legislation introduces complexity for clinicians, who face increasing responsibilities, increasing demands for
documentation, and the push to revise or develop strategies to deal with the specific ways in which individual states and
school districts implement federal law. Some of the complications to clinical practice introduced by state Departments of
Education are discussed as they relate to specific measurement questions in later chapters.
World Health Organization Definitions
At an international level, changes brought about by the World Health Organization (WHO) of the United Nations have
affected assessment practices (WHO, 1980). As part of its charge to develop “a global common language in the field of
health,” WHO proposed guidelines reflecting changing views about health and departures from health that would affect
a wide array of sectors, including health care, research, planning and policy formation, and education. Specifically, in
1980, WHO developed the International Classification of Impairments, Disabilities, and Handicaps (ICIDH), in which
various types of outcomes associated with health conditions were considered.
The 1980 ICIDH classification recognized four levels of effects. These levels are summarized here with examples taken
from applications to language disorders. First, there is disease or disorder, the physical presence of a health condition,
for which a language disorder can serve as the example. Next, there is impairment, an alteration of structure or function
causing the individual with the condition to become aware of it. For children with language disorders, an example of a
possible impairment would be inappropriate use of grammatical morphemes. The third level of effects is described as
disability, an alteration in functional ability. For children with language disorders, the disability associated with their
difficulties could be a decreased ability to communicate. The last level recognized in the ICIDH is that of handicap,
which is a social outcome. Thus, negative attitudes on the part of playmates or teachers toward affected children
constitute a possible handicap associated with language disorder.
Although changes in these terms and the reasons for those changes are discussed in a moment, I first discuss two
important implications of this new classification system that have proven most significant. First, although there is a
tendency for these four types of effects to be related to one another (e.g., for more severe disorders to be associated with
greater handicaps), this is not always the case. For example, it is possible for a handicap to exist apart from the presence
of a disease or disorder, as might be the case if societal prejudice against an individual occurred in the absence of actual
impairment. A specific example might be if a child were to be excluded by a group of peers because of a cleft lip, an
observable but functionally insignificant difference.
Similarly, it is possible for a more severe impairment to be associated with only a mild disability and minimal handicap
because of successful compensatory strategies on the part of the individual, effective interventions on the part of
professionals, or both. Imagine a child with a moderate hearing loss acquired after the initial stages of language acquisition
are complete, who has high overall intelligence, strong motivation, a supportive home environment, and effective
auditory management. Such a child could be expected to experience lesser effects on communication effectiveness and
on social roles than would be expected on the basis of the severity of hearing loss alone. This classification causes one to
consider the role of not only the child, but also of his or her surroundings in determining the nature of negative effects
experienced because of a disorder.
A second major implication of the 1980 classification is that each of the four levels of effects is understood to be
associated with different measurement goals for both research and clinical purposes. For example, measurement focused
at the level of handicap requires information about how a child’s social and educational roles are affected by his or her
condition. This contrasts with measures focused at the level of impairment, which require information about the child’s
use of particular language structures. The greater attention paid to the larger ramifications of health conditions coincides
with an urgent push in both clinical and educational settings for measuring and evaluating the effectiveness of
interventions in terms of these higher-order effects.
Despite the widespread influence of the 1980 classification system, dissatisfaction existed with its terminology and with
the ways in which the social contributions to the effects of health conditions were handled. Among specific criticisms was
that terminology was sometimes confusing and included the use of potentially offensive terms
such as handicap (Frattali, 1998). The model underlying the classification was also criticized for failing to represent the
influence of contextual factors.
Because of concerns about the 1980 classification system, a draft revision was put forward in 1997 for comment and
field testing, with final approval of a revised version expected in 2000 (WHO, 1998). The proposed classification
system is called the ICIDH-2: International Classification of Impairments, Activities, and Participation (WHO, 1998),
reflecting significant changes to the theoretical orientation from the earlier classification of “Impairments, Disabilities,
and Handicaps.” The details of the final revision remain indefinite at the moment. Nonetheless, the current draft warrants
discussion because of its value as an indicator of emerging trends and because it fits snugly with the view of children
advanced up to this point in the chapter—that is, as deeply affecting and affected by their environment.
As its most important change, the 1997 classification is designed to embrace a model in which human functioning and
disablement result from an interaction of the individual’s condition and his or her social and physical environment. In
this system, therefore, the following definitions are used to describe levels of functioning (or where decreased
functioning is noted, disablement) in the context of a health condition:
1. “Impairment is a loss or abnormality of body structure or physiological or psychological function, e.g., loss of a limb,
loss of vision” (WHO, 1998, p. 8). Notice that this level corresponds to the current ICIDH level of impairment and thus
might refer to a child’s abnormal or delayed language characteristics.
2. “An Activity is the nature and extent of functioning at the level of the person. Activities may be limited in nature,
duration, and quality, e.g., taking care of oneself, maintaining a job” (WHO, 1998, p. 8). Notice that this level replaces
the current ICIDH level of disability and thus might refer to a child’s reduced ability to communicate.
3. “Participation is the nature and extent of a person’s involvement in life situations in relation to Impairment, Activities,
Health Conditions and Contextual factors. Participation may be restricted in nature, duration and quality, e.g.,
participation in community activities, obtaining a driving license” (WHO, 1998, p. 8). This final level corresponds to
the older level of handicap and thus might refer to negative social outcomes of a child’s language problems.
On the basis of these new formulations, one can see continuities between the proposed and existing systems yet also
notice a significant change in orientation that is both more positive in tone and more attentive to contextual
influences. In the new classification system, a person’s environmental (social and physical) and personal contexts are
said to influence how disablement at each of these levels is experienced. In particular, two types of contextual factors are
deemed most important: (a) environmental and physical factors (such as social attitudes, physical barriers posed by
specific settings, climate, and public policy) and (b) personal factors (e.g., education, coping style, gender, age, and other
health conditions; WHO, 1998, p. 8).
From this overview, it is evident that the thrust of the ICIDH-2 will be support for many of the principles championed by
Bronfenbrenner, by recent federal legislation,
and by advocates for an integrated view of validity in which the effects of a decision made using a measure must be
considered when one evaluates a measure’s validity. Overall, a unifying principle is that decision making on behalf of
children requires attention not simply to properties of the child but to the context in which the decision is being made and
acted on.
In the last half of this chapter, practical steps involved in the process of evaluating measures for possible use in decision
making are described. Although I have rendered the larger context in which this process must take place in only the
grossest detail, I hope that you can sense the sheer intricacy of the task at hand. On the one hand, confronting the very
significant intellectual challenge entailed in the selection, use, and interpretation of appropriate measures makes me
nearly turn tail and run. On the other hand, however, the rewards of successful clinical decision making and action would
be less sweet if they were easily won.
Evaluating Individual Measures
Evaluating individual measures is like solving a mystery, where the mystery is how to view a measure for use with a
particular client or group of clients. After a general plan is developed in the early stages of the review process, clues are
collected and weighed. Most clues come from the clinician’s knowledge of individual clients and their needs and from
the manual for the particular measure. Additional sources of information, such as test reviews and pertinent research
articles, can also help in the process. This chapter is arranged so that, following a brief overview of two modes of
reviewing, you are introduced to the test manual and then to other sources of information to help you reach a final
decision—to “crack the case,” if we follow the detective analogy.
Client- versus Population-Oriented Reviews of Measures
I have said that the validity of a measure depends on its ability to answer a particular clinical question for a particular
child. Consequently, the appropriateness of a measure is determined within the realm of the particulars—ideally, within a
firm appreciation of factors important to an individual child, such as coexisting handicapping conditions, language
background, gender, and age—as one reviews the test manual and other sources of information for the measure. Such a
review might be said to be a client-oriented review of the measure.
Client-oriented review of measures is an ideal that is often unattainable. Given the pace of most clinical environments,
clinicians are rarely able to review each potential measure thoroughly and compare it with competing measures
immediately prior to each measurement they make. In fact, clinicians more commonly use what I would call a
population-oriented evaluation.
In a population-oriented review of a measure, the clinician reviews the measure’s documentation in reference to a
particular group or groups—usually those subgroups of children they serve most frequently. For example, a speech-
language pathologist in a rural Vermont school would pay special attention to a test’s likely value for a subgroup of
children with few significant problems in other areas of development, who come from homes in which English is the only
language spoken, and whose socioeconomic status is middle to low. In contrast, a very different population-oriented assessment
might be conducted by a speech-language pathologist in a Boston school district with a caseload consisting solely of
children from French-speaking Haitian families living in poverty. Although evaluating a measure for these two
populations would involve many of the same questions, each would require different answers reflecting sensitivity to the
relevant population.
Population-oriented reviews are most frequently conducted when a new measure is considered for purchase, when a
measure is examined at a publisher’s display at a convention, or when a speech-language pathologist enters a new
position and inventories available measures. In contrast, client-oriented reviews of measures often arise when an
uncommon clinical question emerges or when a child’s particular pattern of problems (e.g., mental retardation and a
severe hearing loss) make the child’s needs in a testing situation too unlike those for which the clinician has conducted a
population-oriented evaluation.
How to Use Test Manuals
Regardless of the type of review you undertake, the outcome of your evaluation will never simply be a buy–don’t-buy
or use–don’t-use decision. A thorough review provides potential users with an appreciation of the measure’s limitations
for answering specific clinical questions.
The test manual is the definitive source of information on a standardized measure. In fact, many of the recommendations
made in the Standards for Educational and Psychological Testing (APA, AERA, & NCME, 1985) relate directly to
material that should be provided in test manuals. Despite their importance, however, test manuals range widely in their
sophistication and value. At their best, test manuals provide not only the basic information required to evaluate the
measure’s appropriateness for given uses with specific populations, but also insightful tutorial information that can
reinforce and extend one’s understanding of test construction and use. At their worst, test manuals appear to be little
more than sales brochures designed to obscure a test’s weaknesses and imply that it can be used for all clients and testing
purposes. Even measures that are valuable additions to a clinician’s arsenal may imply possible uses that really are not
supportable. Consequently, a clinician’s detective talents are called on to ferret out the truth!
The reviewing guide reproduced in Fig. 4.2 is a worksheet for evaluating behavioral measures. It is blank so that you can
readily duplicate and use it. An annotated version of the guide, which appears as Fig. 4.3, summarizes the most
important kinds of information—or “clues”—you will be looking for as you conduct a measure review. The annotated
guide is designed to function like the ready reference cards available for many software applications.
The reviewing guide and annotated guide are included to make reviewing a more efficient process, but their inclusion is
not without hazards. The danger of such worksheets and summaries is that some individuals may consider them all one
needs to know in order to conduct a credible review. This is a big mistake! These guides are a first step that should
always be accompanied by a willingness—even eagerness—to
Page 90
(continued)
Page 91
(continued)
Page 92
Fig. 4.2. Annotated review form.
Page 93
(continued)
Page 94
(continued)
Page 95
Fig. 4.3. Review form.
Page 96
look back at trusted resources on measurement, especially the Standards for Educational and Psychological Testing
(APA, AERA, & NCME, 1985). After all, even Sherlock Holmes depended on his learned friend Dr. Watson!
Numerous authors writing about psychometric issues propose review procedures that are very similar to those described
here (e.g., Anastasi, 1988, 1997; Hammer, 1992; Hutchinson, 1996; Salvia & Ysseldyke, 1998; Vetter, 1988b).
Appendix 5 in Salvia and Ysseldyke (1998, pp. 763–766), “How to Review a Test,” is a particularly informative and
amusing description of the review process.
In the remainder of this section I lead you through the annotated guide, explaining why it is important to look for certain
kinds of information. These sections are less sketchy versions of the brief summaries given in Fig. 4.3. Some of their
content should sound quite familiar because it is based on the concepts discussed at length in chapter 3. This section ends
with a review guide completed as part of a hypothetical client-oriented review (Fig. 4.4).
1. Reviewer

This information will probably be unnecessary for reviewers who function alone in their test selection and
evaluation. On the other hand, it can be helpful in cases where multiple test users share reviewing responsibilities, at
least for preliminary, population-oriented reviews. Use of a standard guide facilitates such sharing by reducing
differences between reviewers and offering later reviewers a possible starting point for client-oriented reviews.
2. Identifying Information

Besides information that can help you locate or replace an instrument, this section provides
preliminary clues to the scope and nature of the measure. Test names vary greatly in just how much they disclose about
the nature of the test (e.g., whether it is comprehensive or aimed at only one modality or one domain of language), so
they should be approached with caution. Testing time, which users may want to break down in terms of projected
administration and scoring times, is of practical importance when scheduling testing.
Information about basic characteristics of the measure (e.g., standardized vs. informal, criterion-referenced vs.
norm-referenced) is used to determine the measure’s suitability for certain clinical questions and
guides expectations for other sections of the review guide. Although all major sections of the Guide are relevant to all
measures, the kinds and amounts of information provided vary depending on the measure’s type. Manuals for
standardized, norm-referenced measures probably provide the greatest amounts of information. On the other hand, more
informal, criterion-referenced measures, which have often been created by an individual clinician for a specific purpose,
have far less information available. (Although see Vetter, 1988a, for recommendations about the kind of information that
should be kept for any procedure that might profitably be used on repeated occasions).
3. Testing Purpose

Here, you summarize your knowledge of the intended client or population. Relevant information
includes the client’s age, other problems (e.g., visual, motor, or cognitive impairments), and important language
characteristics (e.g., bilingual home, regional or social dialect use).
Fig. 4.4. Sample of a completed review form.
The main clinical questions leading to the search for an appropriate measure should also be recorded here: Is the measure
going to be used for screening, identifying a problem or difference, treatment planning, or assessing change? Also, what
language modalities and skill areas are of interest? As mentioned in chapter 1, each of these clinical questions requires
different measurement solutions. Therefore, the reviewer should conduct all reviews with the assessment purpose vividly
in mind. Chapters 9–12 address in considerable detail the demands associated with different clinical questions.
4. Test Content

This section returns the reviewer’s attention to the test manual. Gaining a clear understanding of a test’s
content usually requires that you examine at least the early sections of the test manual and the test form itself.
Homogeneous measures, in which all items are aimed at a single modality and language domain, are relatively easy to
specify in terms of their content. For example, the Expressive One-Word Picture Vocabulary Test–Revised (Gardner,
1990) fits into this category; its content can be specified as expressive vocabulary or expressive semantics. Usually,
however, measures address more than one content area; these areas are indicated through the use of subtests or subscores. For
this section of the review guide, as well as for the sections that follow it, recording page numbers along with your
findings is an excellent way to encourage checking against the manual during later use of the completed guide.
As you record information about test content, you want to see how well the content areas covered by the measure match
those of interest for your client. Even the nature of items (e.g., forced choice vs. open-ended responses) will be important
in helping you determine whether the behaviors or abilities of interest will be the largest contributor to your client’s
performance. Recall that one threat to validity introduced in chapter 3 was that of enabling behaviors, behaviors that
enable a test taker to take the test validly. For example, suppose that you were interested in assessing the receptive
language skills of a child with cerebral palsy who fatigued easily if asked to show or act out responses. The motoric
demands of measures become enabling behaviors that will negatively influence the child’s performance even though
they are independent of the targeted construct of receptive language.
In addition to providing a tangible reminder not to overlook possibly problematic enabling behaviors, this section of the
review form should also stimulate clue-gathering around what is actually being tested (Hammer, 1992; Sabers, 1996).
Recall that as the test developer moves from an ideal formulation of the measure’s underlying construct to the down-and-
dirty task of writing sets of items, certain behaviors or skills necessarily tag along to yield a fleshed-out construct that
may or may not match your own (or even the author’s) intended formulations.
As an example of how constructs can be modified as a test takes shape, imagine a test developer who decides to devise a
measure to assess use of complex sentences using methods that place a heavy demand on working memory capabilities.
For example, the test developer could provide the test taker with a set of seven words, including the word because that
are to be used to create a single sentence. Although the final form in which the construct is realized may be acceptable to
some test users, it may
not be to others, depending on their understanding of the targeted construct. It is primarily through careful attention to
this step in the review process that you will become aware of correspondences—or disjunctions—between the test
developer’s and your view of what is being tested. Armed with this knowledge, you can make an informed decision as to
whether the construct being measured is close enough to your reason for testing for you to consider using it.
5. Standardization Sample/Norms

At first glance, this section may seem to be primarily of interest when you are looking
for a norm-referenced instrument, that is, one in which scores are interpreted primarily on the basis of how the test
taker’s performance compares in a quantitative way with that of a peer group. In fact, however, the nature of the
standardization sample has important implications for all measures. It can determine the extent to which summary
statistics (in the case of norm-referenced measures) or summary descriptions of behaviors (in the case of criterion-
referenced measures) are likely to reflect characteristics of most children rather than those of a small, potentially
nonrepresentative group (e.g., children of affluent, highly educated parents). Nonetheless, there are some differences in
how the information provided in this section will be weighed on the basis of the nature of the instrument.
When a norm-referenced measure is being evaluated, you look for a clear description of the normative sample that was
used: how many children were studied, whether and why any children were excluded, and how representative the sample
is compared with the population your client (or subgroup of clients) fits into. Ideally, at least 50 children who are within
a relatively small range in age from that of your client (usually no more than 6 months older or younger) will have been
tested. Also, you want these children to be similar in race, language background, and socioeconomic status to the child or
children you have in mind.
When there are significant differences between the normative sample and your client(s), you need to draw on your
knowledge of the appropriate research base as well as your own knowledge of cultural differences to determine to what
extent the validity of this measure is likely to be undermined. If a measure’s validity is seriously undermined and
alternative measures are unavailable, a variety of approaches, including dynamic assessment and the development of an
informal measure, represent possible strategies (see chap. 10 for further discussion of this issue).
For a norm-referenced instrument, you also want to examine the types of scores the test uses to describe the test taker’s
performance. In terms of desirability, standard scores rank first, percentile scores are next, and developmental scores
(such as age-equivalent or grade-equivalent scores) earn a sorry last place. In this section of the review form, you may
also want to note the availability of tables that report the standard error of measurement (which will be discussed at
greater length below under reliability). Recording that information here is a good idea because it indicates the amount of
error associated with a test taker’s standard score.
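Because standard scores and percentile scores are linked through the normal distribution, the relation between the two score types can be sketched in a few lines of code. The sketch below is illustrative only; it assumes the test reports standard scores that are normally distributed with a mean of 100 and a standard deviation of 15, a common but not universal convention, so always confirm the score metric in the manual you are reviewing.

```python
import math

def standard_score_to_percentile(score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Convert a standard score to a percentile rank, assuming a normal distribution."""
    z = (score - mean) / sd
    # Cumulative normal probability computed via the error function
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A standard score of 85 (1 SD below the mean) falls near the 16th percentile
print(round(standard_score_to_percentile(85), 1))   # -> 15.9
print(round(standard_score_to_percentile(100), 1))  # -> 50.0
```

Seen this way, standard scores and percentiles carry the same information when the normality assumption holds; developmental scores such as age equivalents have no such principled conversion, which is one reason they rank last in desirability.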
When a criterion-referenced measure is evaluated, the composition of groups used to determine cutoff scores will be the
focus of your scrutiny at this point in the review form. I am not aware of recommendations concerning sample size and
composition that are as specific as those given above for norm-referenced measures. However, you
want to be sure that the group for whom the cutoff scores are provided is similar to your client or clients and that the
group is large enough so that the cutoff is likely to be stable (McCauley, 1996).
6. Reliability

In this section, you will summarize relevant information about the test’s reliability, which is almost always
contained in a separate, clearly marked section of the test manual. The operative word here is relevant. The manual may
report 6, 10, even 20 studies in which the reliability of the measure was examined. Nonetheless, the relevant ones are
those (a) using participants who are as similar as possible to your client(s) and (b) focusing on the type of reliability that
is either most at risk because of the nature of the instrument or most important to your clinical question. Recall that
chapter 3 discusses the different kinds of reliability data that are typically of interest.
Once you have decided what forms of reliability are of greatest importance, how do you know whether the evidence is
adequate? For norm-referenced tests, the evidence will almost always take the form of reliability coefficients.
Traditionally, it has been suggested that one demand correlation coefficients that are statistically significant and at
least .80 in magnitude for screening purposes and at least .90 when making more important decisions about individuals
(Salvia & Ysseldyke, 1998). However, a more circumspect recommendation might be that you want the best reliability
available on the market. By this I mean that when the ideal of .90 is not available, and a decision must be made, you will
want the best that you can find as well as multiple, independent sources of information.
For criterion-referenced measures, evidence for reliability can take a great many forms—from correlation coefficients to
agreement indices (Feldt & Brennan, 1989). Such evidence for criterion-referenced measures usually addresses the
question of how consistently the cutoff can be used to reach a particular decision. As you would do for norm-referenced
measures, focus on the results of those studies that involve research questions most like your clinical question and
participants most like your client(s). Information about the relationship between types of reliability and clinical questions
is discussed in chapters 9 to 11.
7. Validity

Although the entire review form is aimed at your cracking the case of a measure’s validity for a particular use, in this
section you will summarize the most important of the information provided by the test developer for the purpose of
evaluating validity. Although most of the information of interest will probably be found in
clearly labeled sections of the manual, information relevant to considerations of content and construct validity is also
frequently found in sections dealing with the measure’s initial development and subsequent revisions (if any). Recall that
some of the specific methods used to provide evidence of validity (e.g., developmental studies, contrasting group
studies) are discussed at some length in the previous chapter.
The statistical methods that are used to document validity vary from correlation coefficients to analyses of variance to
factor analysis. Consequently, a discussion of what constitutes acceptable data must remain fairly general here. Overall,
one looks
to see that the measure is shown to function as it is predicted to function if valid. As with reliability evidence, the nature
of the participants in the study will affect the extent to which it is relevant for your client and purposes. As you complete
this section of the review form, every skeptical bone in your body should be recruited for service. Claiming validity
doesn’t make a measure valid, although at times test developers seem to forget this.
8. Overall Impressions of Validity for Testing Purpose and Population

At this point in the review guide, you put the
clues together to sum up the case. Your study of the pros and cons should be summarized, with holes in the evidence
noted and discussed in terms of their implications for interpreting results. This is where you determine whether you
believe the instrument can be safely used and, if used, what cautions should be kept in mind when it is administered and
interpreted. Clearly, this is the most demanding point in the review process—akin to a final exam or the concluding
paragraph of a large paper. Although practice is perhaps the best way of honing the requisite analytic skills, examination
of other reviews of the instrument (when they’re available) can help you make sure you have not overlooked any major
clues and can also help you see how others have approached the task. Even examining reviews on other measures can
prove helpful for getting a sense of how seasoned detectives sum up their cases (see, e.g., reviews in Conoley & Impara,
1995, of the Receptive–Expressive Emergent Language Test–2 [Bzoch & League, 1994], written by Bachman [1995] and
Bliss [1995] and of the Test of Early Reading Ability–Deaf or Hard of Hearing [Reid, Hresko, Hammill, & Wiltshire,
1991], written by Rothlisberg [1995] and Toubanos [1995]).
Because examples can prove so helpful in developing one’s understanding of a new process, I included Fig. 4.4, which
illustrates how I would complete the reviewing guide for the Expressive Vocabulary Test (Williams, 1997) as I consider
its validity for use with a hypothetical child, Melissa. Melissa is a 9-year, 2-month-old girl who has previously been
receiving treatment for a specific language impairment. She is being tested as part of a periodic reevaluation, which will
be used by an educational team to determine whether she will continue to receive services in her school. Melissa’s
unilateral hearing loss and problems with attention will require special attention during the review of the Expressive
Vocabulary Test (Williams, 1997) for possible use.
How to Access Other Sources of Information
In addition to test manuals, independent test reviews are available to help in the test review process in three different
forms: reviews appearing in standard reference volumes on behavioral measures, journal articles reviewing one or more
tests in a particular area, and computer databases of test reviews.
Standard references and journal articles that include reviews of tests used frequently in the assessment of children with
developmental language disorders or that provide specific information relevant to an understanding of individual tests
are listed in Table 4.3.
Table 4.3
Books and Journal Articles Providing Information About Specific Tests Used With Children

Books
American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment
instruments. Rockville, MD: Author.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Impara, J. C., & Plake, B. S. (Eds.). (1998). Thirteenth mental measurements yearbook. Lincoln, NE: Buros Institute of
Mental Measurements.
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the
literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.

Journal Articles
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-
language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children.
Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech,
and Hearing Services in Schools, 28, 50–58.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and
Hearing Services in Schools, 25, 15–24.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-
Language Pathology, 4, 70–76.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in
Language Disorders, 5(3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.

Each new volume in the Mental measurements yearbook series contains reviews of commercially available tests and
tests that have just been published or were revised since their review in a preceding volume. Entries are alphabetically
organized by the name of the test, with two reviews prepared independently by individuals with expertise in testing, in
the content area tested, or both. A new volume of this series appears about every three years. In addition, reviews
published since 1989 are available on the Internet, where on-line searching can help consumers find reviews as
well as specific kinds of measures.
Several recent journal articles reviewing tests in a particular content area or for a particular group of children with
language impairments are also listed in Table 4.3.
Computer databases represent a more recent possible source of information on standardized measures. Reviews from the
Mental measurements yearbook series are
available on-line through colleges, universities, and public libraries. Reviews included in this on-line database are
identical in content to those included in the bound volumes of the Mental measurements yearbook. Further, these reviews
are more timely than those appearing in the printed volumes because reviews that will eventually be incorporated in a
later bound volume are added every month.
The Health and Psychosocial Instruments (HaPI) database is also available at many libraries and can be searched on-line.
It allows one to search for information about a specific test, to find publishing information about a test through its
name, acronym, or authorship, and to search for instruments by content area or age group. HaPI provides
abstracts rather than complete reviews of instruments. However, it does indicate whether information is
reported for seven critical characteristics: internal consistency reliability, test–retest reliability, parallel forms reliability,
interrater reliability, content validity, construct validity, and criterion-related validity.
Summary
1. Effective evaluation of measures of children’s communication and related skills must be conducted with appreciation
for the contextual variables affecting both children and clinicians.
2. The bioecological theory of Bronfenbrenner and his colleagues emphasizes the interplay of the child’s characteristics
with those of his or her environment, beginning with the family and extending to the broader physical, social, and
historical environment as well. The relevance of this theory to the evaluation of measures and measurement strategies for
children lies in the connection between validity and attention to these contextual variables.
3. Among the contextual variables affecting clinicians as they interact with children and evaluate their language are not
only personal variables (e.g., their own language and culture), but also legal variables and other variables affecting their
professional practice.
4. Evaluation of individual measures requires the potential test user to gather clues suggesting the strengths and
weaknesses of the measure for answering a particular clinical question for a particular client. Client-oriented reviews are
conducted to refine information obtained from a population-oriented review or in response to the exceptional needs of a
particular client.
5. Test manuals or other materials provided by the developer of a measure serve as the primary source of information to
be considered in evaluating its usefulness for a given client.
6. The test reviewer needs to approach the review process armed with a skeptical attitude toward unproven claims and an
arsenal of information regarding acceptable psychometric standards.
7. The Standards for educational and psychological testing (AERA, APA, & NCME, 1985) is the most widely accepted
source of such standards.
8. Additional information for use in the reviewing process is available in the form of reviews published in standard
reference books, relevant journal articles, and computer databases.
9. In spite of existing ideals for evidence of reliability and validity, the clinician may nonetheless decide to use a particular measure that falls short of those ideals when it is the best available for a particular client and a clinical decision must be made.
Key Concepts and Terms
client-oriented measure review: evaluation of a measure’s appropriateness for use in answering a specific clinical
question for a single client.
Individuals with Disabilities Education Act (IDEA): federal legislation addressing the education needs of individuals
with disabilities, including children with communication disorders.
International Classification of Impairments, Disabilities, and Handicaps (ICIDH): a classification designed by the WHO
for global use by health professionals, educators, legislators, and other groups concerned with health-related issues to
serve as a common language.
Mental measurements yearbooks: a well-regarded source of test reviews.
nondiscriminatory assessment: the use of measures and procedures for administering and interpreting data that will not
confound a child’s language or dialect background with the target of testing.
population-oriented measure review: a preliminary evaluation of a measure’s likely appropriateness for use in answering
one or more clinical questions for a population of clients who share important similar characteristics. Population-oriented
reviews of measures are often conducted for subgroups of clients who are frequently seen by a given clinician.
Study Questions and Questions to Expand Your Thinking
1. Consider your own social ecology. Think about a specific kind of decision you have made or will make (e.g.,
concerning school or employment). What institutions and people affect your decision?
2. Talk to the parent of a young child about the contexts in which that child functions—daycare, time spent with
extended family, and so forth. Determine how many hours the child spends in each setting and who the main interaction
partners for the child are. How might these settings influence the communication experiences of this child?
3. List five domains of language.
4. Does time taken to conduct a test have any obvious potential relationship to the validity of testing? If so, when or for
what groups of children?
5. Discuss the importance of conducting a client-oriented review rather than simply a population-oriented review of a
measure you will use with a specific client.
6. Go to the library and examine several volumes of the Mental measurements yearbook series. Describe the process by
which tests are selected to be reviewed, and examine two reviews for a single speech-language measure.
7. Choose a test that you have heard referred to in a course you have taken. See if you can find a review for it in the
Mental measurements yearbook series or elsewhere. Also, consider the extent to which the interaction implicit in the
testing procedures matches the kinds of experiences a child might have on an everyday basis.
8. Complete a review form for a norm-referenced speech-language test.
9. Complete a review form for a criterion-referenced speech-language measure.
Recommended Readings
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27, 109–121.
Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–108.
Salvia, J., & Ysseldyke, J. (1998). Appendix 5. In J. Salvia & J. Ysseldyke (Eds.), Assessment (5th ed., pp. 763–766).
Boston: Houghton Mifflin.
References
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: American Psychological Association.
American Speech-Language-Hearing Association. (1995). Directory of speech-language pathology assessment
instruments. Rockville, MD: Author.
American Speech-Language-Hearing Association. (1999). Guidelines for roles and responsibilities of the school-based speech-language pathologist [On-line]. Available: http://www.asha.org/professionals/library/slpschool_i.htm#purpose.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Anastasi, A. (1997). Psychological testing (7th ed.). Upper Saddle River, NJ: Prentice Hall.
Bachman, L. F. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (pp. 843–845). Lincoln, NE: Buros Institute of Mental
Measurements.
Bliss, L. S. (1995). Review of the Receptive–Expressive Emergent Language Test (2nd ed.). In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (p. 846). Lincoln, NE: Buros Institute of Mental
Measurements.
Bronfenbrenner, U. (1974). Developmental research, public policy, and the ecology of childhood. Child Development,
45, 1–5.
Bronfenbrenner, U. (1986). Recent advances in research on the ecology of human development. In R. K. Silbereisen, E.
Eyferth, & G. Rudinger (Eds.), Development as action in context: Problem behavior and normal youth development (pp.
286–309). New York: Springer-Verlag.
Bronfenbrenner, U., & Morris, P. (1998). The ecology of developmental processes. In W. Damon & R. M. Lerner (Eds.),
Handbook of child psychology: Theoretical models of human development (5th ed., Vol. 1, pp. 993–1028). New York:
Wiley.
Bzoch, K. R., & League, R. (1994). Receptive-Expressive Emergent Language Test-2. Austin, TX: Pro-Ed.
Compton, C. (1996). A guide to 100 tests for special education. Upper Saddle River, NJ: Globe Fearon Educational.
Conoley, J. C., & Impara, J. C. (Eds.). (1995). The twelfth mental measurements yearbook. Lincoln, NE: Buros Institute
of Mental Measurements.
Crais, E. R. (1992). “Best practices” with preschoolers: Assessing within the context of a family-centered approach. In
W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/non-standardized language
assessment (pp. 33–42). San Antonio, TX: Psychological Corporation.
Crais, E. R. (1995). Expanding the repertoire of tools and techniques for assessing the communication skills of infants
and toddlers. American Journal of Speech-Language Pathology, 4, 47–59.
Damico, J. S., Smith, M., & Augustine, L. E. (1996). In M. D. Smith & J. S. Damico (Eds.), Childhood language
disorders (pp. 272–299). New York: Thieme.
Demers, S. T., & Fiorello, C. (1999). Legal and ethical issues in preschool assessment and screening. In E. V. Nuttall, I.
Romero, & J. Kalesnik (Eds.), Assessing and screening preschoolers: Psychological and educational dimensions (2nd
ed., pp. 50–58). Needham Heights, MA: Allyn & Bacon.
Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and
treatment. Gaithersburg, MD: Aspen.
Education for All Handicapped Children Act of 1975. Pub. L. No. 94-142, 89 Stat. 773 (1975).
Education of the Handicapped Act Amendments of 1986. Pub. L. No. 99-457, 100 Stat. 1145 (1986).
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105–146).
New York: American Council on Education and Macmillan.
Frattali, C. (1998). Outcome measurement: Definitions, dimensions, and perspectives. In C. Frattali (Ed.), Measuring
outcomes in speech-language pathology (pp. 1–27). New York: Thieme.
Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test–Revised. Novato, CA: Academic Therapy.
Hammer, A. L. (1992). Test evaluation and quality. In M. Zeidner & R. Most (Eds.), Psychological testing: An inside
view. Palo Alto, CA: Consulting Psychologists Press.
Hammill, D. D., & Newcomer, P. L. (1988). Test of Language Development–2 Intermediate. Austin, TX: Pro-Ed.
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-
language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–23.
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and Hearing Services in Schools, 27, 109–121.
Impara, J. C., & Plake, B. S. (Eds.). (1998). The thirteenth mental measurements yearbook (pp. 1050–1052). Lincoln,
NE: Buros Institute of Mental Measurements.
Individuals with Disabilities Education Act (IDEA). Pub. L. No. 101-476, 104 Stat. 1103 (1990).
Individuals with Disabilities Education Act Amendments of 1997. Pub. L. No. 105-17, 111 Stat. 37 (1997).
Keyser, D. J., & Sweetland, R. C. (Eds.). (1994). Test critiques (Vol. X). Austin, TX: Pro-Ed.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children.
Journal of Speech and Hearing Disorders, 49, 34–42.
Merrell, A. W., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and Hearing Services in Schools, 28, 50–58.
Murphy, L. L., Conoley, J. C., & Impara, J. C. (Eds.). (1994). Tests in print IV: An index to tests, test reviews, and the
literature on specific tests. Lincoln, NE: Buros Institute of Mental Measurements.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Plante, E., & Vance, R. (1994). Selection of preschool language tests: A data-based approach. Language, Speech, and
Hearing Services in Schools, 25, 15–24.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-
Language Pathology, 4, 70–76.
Radziewicz, C. (1995). In E. Tiegerman-Farber, Language and communication intervention in preschool children (pp.
95–128). Boston: Allyn & Bacon.
Reid, D. K., Hresko, W. P., Hammill, D. D., & Wiltshire, S. (1991). Test of Early Reading Ability–Deaf or Hard of Hearing. Austin, TX: Pro-Ed.
Rothlisberg, B. A. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J.
C. Impara (Eds.), The twelfth mental measurements yearbook (pp. 1049–1051). Lincoln, NE: Buros Institute of Mental
Measurements.
Sabers, D. L. (1996). By their tests we will know them. Language, Speech, and Hearing Services in Schools, 27, 102–
108.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (7th ed.). Boston: Houghton Mifflin.
Stephens, M. I., & Montgomery, A. A. (1985). A critical review of recent relevant standardized tests. Topics in
Language Disorders, 5 (3), 21–45.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Toubanos, E. S. (1995). Review of the Test of Early Reading Ability–Deaf or Hard of Hearing. In J. C. Conoley & J. C.
Impara (Eds.), The twelfth mental measurements yearbook (pp. 1051–1053). Lincoln, NE: Buros Institute of Mental
Measurements.
van Kleeck, A. (1994). Potential cultural bias in training parents as conversational partners with their children who have
delays in language development. American Journal of Speech-Language Pathology, 3, 67–78.
Vetter, D. K. (1988a). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making
in speech-language pathology (pp. 192–193). Philadelphia: B. C. Decker.
Vetter, D. K. (1988b). Evaluation of tests and assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision
making in speech-language pathology (pp. 190–191). Philadelphia: B. C. Decker.
Williams, K. T. (1997). Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.
World Health Organization. (1980). ICIDH: The international classification of impairments, disabilities, and handicaps.
Geneva: Author.
World Health Organization. (1998). Towards a common language for functioning and disablement: ICIDH-2: The
International Classification of Impairments, Activities and Participation. Geneva: Author.
PART II

AN OVERVIEW OF CHILDHOOD LANGUAGE DISORDERS
Part II introduces the four most frequently occurring categories of childhood language disorders: specific language
impairment (chap. 5) and language problems associated with mental retardation (chap. 6), autism spectrum disorders
(chap. 7), and hearing impairment (chap. 8). Each chapter is designed to provide an overview of the nature and special
testing problems associated with one category.
Within each chapter, disorder categories are defined, where possible, according to criteria outlined in the Diagnostic and
statistical manual of mental disorders (4th ed.; DSM–IV) of the American Psychiatric Association (1994) and in some
chapters according to other influential definitions. Each disorder category is then further introduced in terms of its
suspected causes, the special challenges to language assessment afforded by children with the specific problem, their
expected patterns of language performance, and accompanying problems that may further complicate these children’s
lives and communication functioning. Each chapter also contains a short passage written from the perspective of
someone diagnosed with the condition addressed in the chapter.
CHAPTER 5

Children with Specific Language Impairment


Defining the Problem
Suspected Causes
Special Challenges in Assessment
Expected Patterns of Language Performance
Related Problems
Defining the Problem
Sandy is a compact 6-year-old who was late in talking and considered unintelligible by all but a few family members
until about age 5. She is still often mistaken for a younger child because of her size, limited vocabulary, and frequent
errors in grammar. Having recently transferred to a new school, Sandy is having trouble adjusting and has become very
quiet except for occasional interactions with friends from her previous school.
Joshua, a 9-year-old with a history of delayed speech and language, continues to use short, simple sentences that are
often ineffective in getting his message across. Despite significant gains in his oral communication, he has made little
progress in early reading skills. Even after two years of instruction and special support in both oral and written language, he names letters of the alphabet inconsistently and has a
sight vocabulary limited to about 30 words. Joshua also appears to have difficulty understanding many of the
instructions given in the classroom.
Wilson is a 4-year-old whirlwind who augments his limited speech productions with animated gestures and, sometimes,
truly gifted doodles. Because of his activity level and awkward, sometimes overwhelming style of interacting, he is
avoided by his peers and has formed fierce attachments to the preschool teacher and his speech-language pathologist.
Wilson’s parents and educators are beginning to question whether his activity level falls within the normal range and
will be discussing the possibility of having him evaluated for attention deficit disorder with hyperactivity at their next
meeting. Wilson’s ability to understand the communications of others has never been questioned.
Although Sandy, Joshua, and Wilson are varied in their patterns of communication difficulties, each can be described as
demonstrating specific language impairment (SLI), a disorder estimated to affect between 1.5 and 7% of children
(Leonard, 1998). A recently proposed figure of 7% for 5-year-olds may be the best current estimate of prevalence: The
research on which it was based was rigorous and included the use of a carefully selected sample of 7,218 children
(Tomblin et al., 1997). Although estimates differ considerably from study to study, it has generally been found that boys
are affected more often than girls, with some studies suggesting that boys are at twice the risk of girls (Tomblin, 1996b).
SLI can be defined as “delayed acquisition of language skills, occurring in conjunction with normal functioning in
intellectual, social-emotional, and auditory domains” (Watkins, 1994, p. 1). Thus, SLI is frequently described as a
disorder of exclusion. As such, it can seem like a definition of leftovers, encompassing those instances where language
impairment exists but cannot readily be attributed to factors that clearly limit a child’s access to information about
language or to the abilities required to undertake the creative task of language acquisition. On the other hand, specific
language impairment can be regarded as a “pure” form of developmental language disorder, one in which language alone
is affected (Bishop, 1992b).
Hopes of defining the nature of specific language impairment have instigated a wealth of research in child language
disorders over the past 50 years. Initially termed “congenital aphasia” or “developmental dysphasia,” SLI seemed to
offer the opportunity to look at a pure, or “specific,” variety of communication disorder (Rapin, 1996; Rapin & Allen,
1983). Historically, each of the categories of developmental language disorders examined in other chapters in this
section offered an ostensibly obvious explanation for its existence. In contrast, children with SLI offered no apparent
explanations yet promised an opportunity to look at the unique effects of impaired language on development. Or so it
first appeared. In the Related Problems section of this chapter, you will read about the subtle differences in cognition and
other attributes that have been identified in children with SLI and that thus threaten narrow conceptions of specific
impairment.
The DSM–IV (American Psychiatric Association, 1994) does not use the term specific language impairment, but includes
two disorders that together cover much of the same terrain: Expressive Language Disorder and Mixed Receptive–Expressive Language Disorder. Table 5.1 lists the diagnostic criteria for these two communication
Table 5.1
Summary of Criteria for Two Disorders Corresponding to Specific Language Impairment From the Diagnostic and
Statistical Manual (4th ed.) of the American Psychiatric Association (1994)

Expressive Language Disorder (American Psychiatric Association, 1994, p. 58)


A. The scores obtained from standardized, individually administered measures of expressive language development
are substantially below those obtained from standardized measures of both nonverbal intellectual capacity and
receptive language development. The disturbance may be manifest clinically by symptoms that include having a
markedly limited vocabulary, making errors in tense, or having difficulty recalling words or producing sentences
with developmentally appropriate length or complexity.
B. The difficulties with expressive language interfere with academic or occupational achievement or with social
communication.
C. Criteria are not met for Mixed Receptive-Expressive Language Disorder or Pervasive Developmental Disorder.
D. If Mental Retardation, a speech-motor or sensory deficit, or environmental deprivation is present, the language
difficulties are in excess of those usually associated with these problems.
Mixed Receptive-Expressive Language Disorder (American Psychiatric Association, 1994, pp. 60–61).
A. The scores obtained from a battery of standardized individually administered measures of both receptive and
expressive language development are substantially below those obtained from standardized measures of
nonverbal intellectual capacity. Symptoms include those for Expressive Language Disorder as well as difficulty
understanding words, sentences, or specific types of words, such as spatial terms.
B. The difficulties with receptive and expressive language significantly interfere with academic or occupational
achievement or with social communication.
C. Criteria are not met for Pervasive Developmental Disorder.
D. If Mental Retardation, a speech-motor or sensory deficit, or environmental deprivation is present, the language
difficulties are in excess of those usually associated with these problems.

disorders. The division of SLI into these two categories reflects a recurring impulse among researchers and clinicians to
identify subgroups within the larger population—in this case and most often according to whether receptive language is
significantly affected.
The DSM–IV criteria include a variation on the exclusionary elements of the SLI definition described up to this point.
Specifically, in Criterion D for both disorders, the clinician is directed to look for language impairments whose severity
is unexplained by the obvious threats to language development included in other exclusionary definitions (e.g., the
presence of hearing impairment or mental retardation). The DSM–IV definitions allow both for the identification of a
language impairment when no obvious threat exists and for cases where the presence of these threats does not
seem sufficient to account for the degree of problem presented.
Most researchers over the past three decades have used definitions largely like those discussed and have particularly
relied on the operationalization of SLI proposed by Stark and Tallal (1981, 1988). The details of such definitions,
however, have proven quite controversial (Camarata & Swisher, 1990; Johnston, 1992, 1993; Kamhi,
1993; Plante, 1998). As assessment moves from the laboratory to clinical practice in schools, the controversy intensifies because
state policies are vigorous participants in the decision-making process. In particular, the use of difference or discrepancy
scores is often mandated but has faced increasing criticism (e.g., Aram, Morris, & Hall, 1993; Fey, Long, & Cleave,
1994; Kamhi, 1998). Although methods used in identification of SLI are discussed at some length in the Special
Challenges in Assessment section of this chapter, they are mentioned here because they affect understanding of the
nature of the problem and therefore affect research intended to obtain information about suspected causes, patterns of
language performance, and related problems.
It seems important to recognize that SLI is a term that is often absent from the day-to-day functioning of speech-language
pathologists in many clinical and education settings. Instead, they frequently use the terms language delays or language
impairments, thereby remaining silent on the specificity of a given child’s problems (Kamhi, 1998). Nonetheless, the
foundation of research on this population and clinical writings provides an important context for scientifically oriented
clinical practice. In the same way that field geologists need to know about basic chemistry despite few encounters in the
wild with pure iron or other elements, speech-language pathologists can learn from attempts to identify and understand
SLIs and to recognize them when they encounter them in their practice. The very length of this chapter compared with
the others addressing information about subgroups of children with language disorders testifies to the fertility of the
resulting explorations.
Suspected Causes
The question of what causes isolated language impairment has been approached from several perspectives—from genetic
to linguistic, physiological to social. It remains a question—or, more accurately, a series of related questions—that
tantalizes researchers, clinicians, and parents alike. It is best viewed as a set of related questions because causes can be conceived on several different levels (e.g., physical as well as social) and because effects are frequently the result of a convergence of causes rather than a single cause. Thus, two or more factors
may need to come into play before impaired language occurs. Understanding causation is further clouded by the fact that
researchers are frequently only in the position of identifying risk factors; that is, factors that tend to co-occur with the
presence of SLI, but that can only be thought of as potential causes until the nature of the association can be worked out
through further research.
In this section, a review of suspected causes encompasses not only differences in brain structure and function, genetics,
and selected environmental factors, but also more abstract linguistic and cognitive discussions of the origins of specific
language disorder in children. Although there is considerable turmoil in the community of child language researchers
concerning the more abstract accounts provided in linguistic and cognitive explanations, their role in assessment and
planning for treatment has the potential for being more immediate and influential than that of accounts related to genetics
and physiology.
Genetics
Genetic origins for SLI have probably been suspected for some years by anyone who has encountered families in which
language problems seem more commonplace than one might expect given the relative exceptionality of language
impairment. Nonetheless, serious study of genetic contributions to SLI have been undertaken only in the last couple of
decades (Leonard, 1998; Pembrey, 1992; Rice, 1996). Largely, the increase in such studies has occurred because of
advances in the study of behavioral genetics (Rice, 1996). In addition, however, the delayed interest in the genetics of
language impairment has resulted from the need for agreement on a phenotype, that is, the behavior or set of behaviors
that constitute critical characteristics of the disorder (Gilger, 1995; Rice, 1996).
Several different types of genetic studies are regularly used to link specific diseases or behavioral differences with
genetic underpinnings (Brzustowicz, 1996). Among those that have been used to the greatest extent so far in studying SLI
are family studies, twin studies, and pedigree studies. In family studies, the family members of a proband (i.e., an
affected person who is the focus of study) are examined to determine whether they show evidence of the characteristic or
disorder under study at rates that are higher than would be expected in the general population. If they do, the
characteristic or disorder is considered familial—a state of affairs that could be due to genetic origins or to common
exposure to other influences. Thus, for example, a fondness for chocolate might be found to be familial, but, without
further study, could just as easily be due to long exposure to a kitchen full of chocolate delicacies as to a genetic basis.
In twin studies, comparisons of the frequency of a characteristic or disorder are made between identical and fraternal
twins. Because identical twins share the same genetic makeup, they should show higher concordance for the
characteristic if it has a genetic basis; that is, there should be a strong tendency for both identical twins to either have or
not have the characteristic. In contrast, if their rates of concordance are relatively high, but similar to those of the
fraternal twins (who are no more genetically related than any pair of siblings and thus on average share 50% of their
genetic make-up), the characteristic might still be considered familial. However, in that case, it would be more likely the
result of environmental rather than genetic influences. (See Tomblin, 1996b, for a discussion of some of the complexities
of this type of design.)
In pedigree studies, as many members as possible of a single proband’s large, multigenerational family are examined in
order to get insight into patterns of inheritance associated with the targeted characteristic or disorder. Closely related to
pedigree studies are segregation studies in which multiple families with affected members are examined to compare
observed patterns of inheritance with patterns that have been observed for other genetically transmitted diseases.
Despite the difficulties associated with defining a disorder as complex as SLI (Brzustowicz, 1996), considerable progress
has been made over the past 15 years in understanding genetic contributions to the disorder. Familial studies (e.g., Neils
& Aram, 1986; Tallal, Ross, & Curtiss, 1989; Tomblin, 1989) have consistently demonstrated higher risk among families
selected because of an individual member with SLI than among families selected because of an unaffected member who is
serving as a control
participant. Complicating these findings, however, have been observations that many children with SLI come from
families where they are the only affected member (Tomblin & Buckwalter, 1994). Further, family histories of SLI may
be more common among children with expressive problems only than among those with both receptive and expressive
problems (Lahey & Edwards, 1995).
Whereas some familial studies (e.g., Neils & Aram, 1986; Tallal et al., 1989; Tomblin, 1989) have used questionnaires to
examine the language skills of other, often older, family members, others have used direct assessment of language skills (e.g., Plante, Shenkman, & Clark, 1996; Tomblin & Buckwalter, 1994). The latter studies are considered more desirable
(Leonard, 1998) because they rely neither on participants’ memories of childhood difficulties nor on potentially
incomplete and inaccurate school or clinical records. Further, they seem to be more sensitive to manifestations of SLI in
adults, thereby capturing a greater number of affected individuals for examination of inheritance patterns (Plante et al.,
1996). Most importantly, however, both types of studies can demonstrate familial patterns of SLI, which are the first step
toward proving its genetic underpinnings for at least some affected individuals.
Twin studies (e.g., Bishop, 1992a; Tomblin, 1996b) have demonstrated higher concordance for SLI among identical than
fraternal twins, thus providing evidence of some degree of genetic influence. However, even among identical twins,
concordance is not perfect, despite their identical genetic make-up. Consequently it has been suggested that either the
affected gene associated with SLI does not always produce the same outcome (due to incomplete penetrance) or it does
not operate alone to produce SLI (Tomblin & Buckwalter, 1994; Leonard, 1998). In the former case, incomplete
penetrance refers to instances in which a gene associated with a disorder fails to act in an all-or-nothing fashion, with some people who carry the gene showing no ill effects (Gilger, 1995). The latter prospect means that SLI may be caused by
more than one gene or that a gene or group of genes must operate in combination with environmental factors.
Current research on the genetics of SLI is weighing these alternative scenarios. Among the kinds of studies needed are
pedigree and segregation studies in which groups of families or a single family is studied across generations. One family,
referred to as the KE family, has been under study for some time (e.g., Crago & Gopnik, 1994; Gopnik & Crago, 1991;
Vargha-Khadem, Watkins, Alcock, Fletcher, & Passingham, 1995). This family continues to be examined to determine
whether a hypothesized autosomal dominant transmission mode is at work. Briefly, autosomal dominant transmission
means that the disorder is transmitted through a pair of autosomal chromosomes (i.e., one of the 22 chromosome pairs
that are not sex-linked) and will occur even if only one of the two chromosomes in a pair is affected.
The KE family has many affected members, as would be expected given an autosomal dominant mode of transmission,
as opposed to modes involving the sex chromosomes (a single pair) or a recessive mode of transmission in which both
members of a pair must be affected for the disorder to occur. In fact, most members of the KE family demonstrate both
severely impaired speech and language, and several show cognitive impairment or psychiatric disorders as well. Thus,
additional work is needed to examine other families who might be more representative of greater numbers of children
with SLI.
Continuing pursuit of information about genetic bases is thought to be useful because it may be possible to determine
what aspects of language impairment are more biologically determined and, therefore, perhaps less amenable to
treatment. Once those determinations are made, clinicians could focus on the fostering of compensatory strategies or on
the amelioration of remaining aspects of the language impairment that may be more modifiable through treatment (Rice,
1996).
Differences in Brain Structure and Function
The prospect of differences in brain structure and function between children with SLI and those without has beckoned as
a potential explanation since researchers first began ruminating about this disorder. This is illustrated by the use of the
term childhood aphasia in the 1930s and several decades thereafter. Among the possibilities that have been examined are
those of early damage to both cerebral hemispheres, of damage to the left hemisphere only (Aram & Eisele, 1994;
Bishop, 1992a), as well as the possibility that differences are not the result of “damage” per se, but rather are the
expression of natural genetic variation (Leonard, 1998).
Currently, cases of frank neurologic damage—for example, those following a stroke or head injury—are excluded from
definitions of SLI. Somewhat more difficult to classify are the problems of children with Landau–Kleffner syndrome,
also called acquired epileptic aphasia. These children fail to show signs of focal damage except for
electroencephalographic abnormalities, yet they experience a profound loss of language skills (Bishop, 1993). Although
included in early formulations of childhood aphasia, this syndrome has more recently come to be grouped with the cases that are typically excluded from SLI.
Despite the exclusion of known brain damage from strict definitions of SLI, a relatively large number of studies using
techniques such as magnetic resonance imaging (MRI) and, less frequently, autopsy examination have been undertaken
to determine whether subtle differences in brain structure and function can account for the difficulties facing children
with SLI. Often these differences have been structural anomalies that depart from the configuration considered optimal for left-hemisphere dominance for speech—leading to either right-hemisphere dominance or a lack of dominance by either
hemisphere (Gauger, Lombardino, & Leonard, 1997). Increasingly, it is thought that such differences may reflect
variations in structure that make language development less efficient (e.g., Leonard, 1998).
Two areas of the cerebral hemispheres in which such variations have been identified are the plana temporale and the
perisylvian areas, illustrated in Fig. 5.1. These two areas overlap, with the smaller planum temporale lying within the
larger perisylvian region of each hemisphere; both lie within a region that has consistently been shown to be associated with language function.
Examinations of the plana temporale in individuals with SLI were sparked by a 1985 autopsy study (Galaburda,
Sherman, Rosen, Aboitiz, & Geschwind) of adults who had had written language deficits. Detailed examination of these
individuals’ brains after death showed an atypical symmetry between the planum temporale on the left and the one on the
right.

Fig. 5.1. The left cerebral hemisphere with the planum temporale highlighted. From Neural bases of speech, hearing, and language (Figure 9-2), by D. P. Kuehn, M. L. Lemme, & J. M. Baumgartner, 1989, San Antonio, TX: Pro-Ed. Copyright 1989 by Pro-Ed. Adapted with permission.

This pattern contrasted with the more typical asymmetric arrangement in which the planum temporale on the left is bigger than that on the right, with the larger size thought to
reflect greater involvement in language processing. The atypical symmetry results from a typically sized left planum
temporale and a larger-than-usual right planum temporale. In the only autopsy study of a child with SLI conducted to date, this same atypical symmetry was observed (Cohen, Campbell, & Yaghmai, 1989).
Similar asymmetries, with the left-hemisphere perisylvian areas larger than those on the right, have also been identified in autopsy studies performed on individuals who did not have SLI during their lifetimes (e.g., Geschwind & Levitsky, 1968; Teszner, Tzavaras, Gruner, & Hecaen, 1972). The perisylvian areas, rather than the smaller plana temporale, became the
focus of a series of studies conducted by Plante and her colleagues (Plante, 1991; Plante, Swisher, & Vance, 1989;
Plante, Swisher, Vance, & Rapcsak, 1991). In those studies, Plante and her colleagues compared the relative size of these
areas between hemispheres and between family members who were affected or unaffected by SLI. The researchers
focused on the perisylvian areas rather than the plana temporale because of limitations in the use of MRI (Plante, 1996)—
a technique that was nonetheless highly desirable because it could be used even on very young, live participants.
The researchers found that children with SLI and their families demonstrated perisylvian areas that were larger on the
right than those typically seen in studies of individuals without SLI or a known family history of SLI (Plante, 1991;
Plante et al., 1989, 1991). These larger right perisylvian areas were sometimes associated with symmetry across hemispheres
and sometimes with asymmetries favoring the right hemisphere. Nonetheless, because some individuals with atypical
configurations did not show language impairment, and others with normal configurations did show such impairment, this
structural difference cannot be seen as a single cause of language impairment. In a 1996 review of this literature, Plante
noted that the absence of abnormal findings for some individuals may simply be due to the insensitivity of MRI
techniques to subtle differences in brain structure. However, that argument does not explain the instances in
which identified atypical structures are associated with normal language performance. Furthermore, Plante, as well as other researchers in the field (Leonard, 1998; Rice, 1996; Watkins, 1994), believes
that a number of factors probably need to be in place for structural brain differences to culminate in language impairment.
More recent studies have looked not only at the perisylvian areas but also at other brain structures for differences that
may help researchers better understand SLI (e.g., Clark & Plante, 1998; Gauger et al., 1997; Jackson & Plante, 1996).
Whereas many of these studies have examined regions in or close to the perisylvian region (e.g., Clark & Plante, 1998; Jackson & Plante, 1996), others have looked at much larger areas of the cerebrum (Jernigan, Hesselink, Sowell, & Tallal, 1991), at
the extensive tract of nerve fibers connecting the two cerebral hemispheres (Cowell, Jernigan, Denenberg & Tallal, 1994,
cited in Leonard, 1998), and at areas including the ventricles (Trauner, Wulfeck, Tallal, & Hesselink, 1995). All of these
studies found at least some differences (Cowell et al., 1994).
In a recent review of these studies and others using behavioral and neurophysiological data, Leonard (1998) summarized the evidence as indicating that the high percentage of atypical neurobehavioral findings for children with specific language impairment implicates a “constitutional basis” that may contribute to the presence of language impairment. The origins
of these suspected differences in brain structure lead to other kinds of questions about causes, such as environmental
factors.
Environmental Variables
Environmental variables can encompass physical, social, emotional, or other aspects of the developing child’s
surroundings from conception onward. Two types of environmental variables, however, have received the greatest
amount of attention for SLI—(a) variables constituting the social and linguistic environment in which children with SLI
are acquiring their language (Leonard, 1998) and (b) demographic variables, such as parental education, birth order, and
family socioeconomic status (SES), that affect that environment in less direct ways (Tomblin, 1996b).
A particularly engaging and clear account of the literature examining the conversational environment of children with SLI
can be found in Leonard (1998, chap. 8). In the literature examining this type of environmental influence (e.g.,
Bondurant, Romeo & Kretschmer, 1983; Cunningham, Siegel, van der Spuy, Clark, & Bow, 1985), most studies have
focused on the nature and linguistic content of conversations occurring between children with SLI and their parents.
Usually, comparisons are made to conversations between parents and their normally developing children (age-matched
or language-matched, depending on the study). In addition, in order to clarify “chicken-or-the-egg” speculation about the
direction of causation (i.e., Are differences in conversation causing children’s problems or resulting from them?), studies
have also examined conversations between children with SLI and unrelated adults (Newhoff, 1977) and even with other
children (e.g., Hadley & Rice, 1991).
Despite the impediments offered by abundant methodological variations and challenging patterns of empirical
disagreements, Leonard (1998) ventured a few generalizations about this line of investigation. First, most of the evidence
in which children with SLI are compared with control children who are similar in age suggests that their
conversation partners (parents, other adults, and peers alike) make allowances for their diminished language skills and
are thus reacting to, rather than causing, the children’s problems. For example, Cunningham et al. (1985) found that
mothers of children with SLI interacted similarly to mothers of control children of similar ages in conditions of free
play, but asked fewer questions during a structured task. In addition, for those children with SLI whose comprehension
and production were both affected, mothers reduced their length of utterance, something that was not done by mothers
whose children were either normally developing or had SLI in which only expressive language was affected.
Second, Leonard (1998) contended that in studies where children with SLI are compared with younger children who are
similar in language characteristics, findings are less consistent in showing differences. Nonetheless, the most reliable
difference in how each group is spoken to by their parents involves the frequency with which recasts are used. A recast is
a restatement of a child’s production using grammatically correct structures, often incorporating morphosyntactic forms
that had been omitted or produced in error by the child (Leonard, 1998). Recent research has suggested that this
conversational strategy is used frequently by parents of normally developing children at earlier stages, but is then faded
over time. It has also been shown to be a useful therapeutic strategy (Nelson, Camarata, Welsh, Butkovsky, & Camarata,
1996). Interestingly, Leonard noted that rather than increasing their use of this kind of statement with children with SLI
as might be expected in compensation, parents of children with SLI use it less frequently than those of children without
SLI. Despite the possible value of additional research in clarifying why this difference is seen, all in all, this line of
research has not proven as productive to the understanding of the genesis of SLI as was once hoped (Leonard, 1998).
Turning to possible clues in the form of demographic variables, Tomblin (1996b) searched for risk factors in
demographic data obtained from the preliminary results (consisting of 32 children with SLI and 114 controls) of a larger
epidemiological study (planned to include 200 children with SLI and 800 controls). Specifically, he looked for
associations between demographic and biological data and the presence of SLI. Among the variables he examined
relative to the home environment were parent education, family income, and birth order of the child in the family.
Although there were trends in the direction of children with SLI being later born and having parents with fewer years of
education than unaffected children, neither of these trends was significant. Tomblin speculated that the two trends may
have reflected the extent to which lower incomes are associated with larger families.
Also available to Tomblin (1996b) were data concerning exposure to biological risk factors including maternal infection
or illness, medication, use of alcohol, and use of tobacco during pregnancy, as well as the evidence of potential trauma at
birth and the participants’ birthweights. In these preliminary data at least, Tomblin found no differences between the
groups relative to maternal infection and illness during pregnancy, and actually found lower, but nonsignificant rates of
exposure to alcohol and medication. Birth histories and birthweights also did not differ significantly. Only maternal
smoking showed a trend towards higher levels among the children with SLI. Although attributing the lack of significant
findings to the relatively small sample sizes used,
Tomblin also suggested that the larger numbers associated with the completed study would be unlikely to reveal effect
sizes of any major significance, where effect sizes relate to the magnitude of the difference between groups.
Clearly, findings across several lines of research suggest the need for the continuation and coordination of efforts to
understand the complexity of variables that put children at risk for SLI. Although neurologic and genetic research
findings have been particularly exciting over the past two decades, these variables are not sufficient by themselves to
explain SLI. Biological and environmental factors represent important frontiers for a more complete understanding of
language impairment (Snow, 1996). At a different level of explanation, linguistic and cognitive accounts attempt to
provide more immediate explanations for the specific patterns of language behaviors seen in SLI and their variability
across children and over varying ages.
Linguistic and Cognitive Accounts
A large number of linguistic and cognitive accounts of SLI have been advanced over the past several
decades. At present, more than a dozen warrant serious consideration (Leonard, 1998). As a group, these accounts
deserve some attention here because of their potential impact on assessment and treatment of children for whom SLI is
suspected or confirmed.
As discussed in previous chapters, the validity of an assessment tool turns chiefly on the extent to which it captures the
construct being measured. Consequently, different models of SLI imply the need for different measures. In practice,
however, the link between theoretical understandings of a complex behavior and readily available assessment procedures
is usually far from direct. This is particularly true when there are a large number of competing accounts but no clear
front-runners—the current case for SLI. In addition, the term accounts, used here and by Leonard, specifically
implies acknowledgement that these formulations fail to tie together the breadth of data that are typically associated with
use of the term theories. Despite these limitations, some familiarity with these competing accounts can help readers
anticipate future trends in both theoretical efforts and in recommended assessment practice.
Leonard (1998) reviewed a wide field of linguistic and cognitive explanations of SLI, dividing them into three
categories. Specifically, he considered six explanations of SLI focusing on deficits in linguistic knowledge, three on limitations in general processing capacity, and three on specific processing deficits. Because of space limitations, each of
these twelve accounts cannot be discussed in detail here. Instead, a small subset will be used to introduce readers to this
complex debate and illustrate the challenges awaiting researchers and clinicians who seek to translate these accounts into
assessment practice.
Language Knowledge Deficit Accounts
Leonard (1998) argued that Chomsky’s (1986) principles and parameters framework for language acquisition can be seen
as a foundation for the major accounts in which deficits in linguistic knowledge are postulated as central to SLI.
Stemming from transformational grammar of the 1960s and 1970s, “principles” represent universals of natural languages, and
“parameters” the dimensions along which individual languages differ. Children are presumed to work within the
constraints associated with universal principles to acquire the specific knowledge of the parameter settings associated
with their ambient language. Chief among the difficulties facing children in this process is the apparent need to
understand more than just the surface relations existing between words in sentences as they are heard. Rather, they must
also understand the underlying, or inferred, relationships between lexical categories (e.g., noun, verb, adjective) and
functional categories that explain relationships between words within sentences (e.g., complementizer, inflection,
determiner).
Differences in the accounts that Leonard (1998) placed within this category lie primarily in which area of linguistic
knowledge is absent or, more often, incomplete in children with SLI. Leonard himself and several colleagues are
associated with accounts in which knowledge of functional categories overall is deemed incomplete (Loeb & Leonard,
1991; Leonard, 1995). Alternatively, Rice, Wexler, and Cleave (1995) are associated with the extended optional
infinitive account, in which children with SLI are thought to remain too long in a developmental phase in which tense is
treated as optional. Other accounts see children with SLI as unable to develop implicit grammatical rules (Gopnik,
1990), as developing rules that are too narrow in their application (e.g., Ingram & Carr, 1994), or as lacking the ability to
understand different agreement or dependent relationships existing between functional categories (e.g., Clahsen, 1989;
van der Lely, 1996).
Among the significant challenges facing these accounts is their need to provide more complete explanations of the
variability in developmental patterns shown by children with SLI and of crosslinguistic differences in the error patterns
and development of children with SLI. In addition, despite emerging efforts to tie linguistic accounts to genetic,
biological, and environmental accounts (e.g., Gopnik & Crago, 1991), further steps in that direction are needed.
Accounts Positing General Processing Deficits
General processing deficit accounts of SLI place general deficits in cognitive processing at the core of SLI, with the most
ambitious of them holding these deficits responsible for both the linguistic and nonlinguistic differences seen in children
with SLI (Leonard, 1998). Rather than assume that specific cognitive mechanisms are affected—as is done in the third
and final category of accounts—these accounts postulate a more widespread deficiency offering a simpler, more elegant
explanation of the patterns of deficits seen in children with SLI. Typically, such accounts tend to describe central
cognitive deficits in terms of reductions in processing capacity or speed.
Such accounts are particularly compelling as explanations of difficulties in word recall and retrieval and in comprehension, as well as of nonlinguistic cognitive deficits, but must also explain the special difficulties associated with morphosyntax in
most English-speaking children with SLI. Among the numerous researchers cited by Leonard (1998) as working on
accounts of this type are Ellis Weismer (1985), Bishop (1994), Edwards and Lahey (1996; Lahey & Edwards, 1996) as
well as Leonard himself.
Leonard’s surface hypothesis (e.g., Leonard, 1989; Leonard, Eyer, Bedore, & Grela, 1997) represents one of the most
thoroughly probed of the general processing deficit accounts and, consequently, serves here as an important exemplar of
such accounts. The surface hypothesis suggests that differences in the pattern of deficits observed crosslinguistically in
children with SLI may be due to differences in language structure across languages. Such differences are thought to lead
to differences in processing demands rather than to the impaired grammatical systems posited by linguistic accounts. This account emphasizes the importance of surface features of languages, such as the physical properties of English grammatical morphology, that may represent special challenges to children, particularly to those with reduced processing capabilities.
According to the surface hypothesis, children with SLI will take longer to acquire the more difficult aspects of their
language and may focus their processing efforts in some areas at the expense of others (e.g., on word order at the
expense of morphology). Among those features of a language that are considered particularly vulnerable are those that
are relatively brief, uncommon in languages of the world, or less regular within the language (e.g., numerous
grammatical morphemes in English). Leonard (1998) provides a thorough description of the successes and failures of
this account in explaining an ever-expanding body of empirical data from several language groups. Further, he shows its basic compatibility with other processing limitation accounts that emphasize reduced speed of processing.
As with the grammatical knowledge accounts, accounts that posit general processing deficits have a wide range of
crosslinguistic data to address, including patterns of errors and acquisition in children with SLI. Further, the
appeal of such accounts in terms of simplicity is enhanced if they can also address similar data for children without
impaired language. Add to that the desirability of addressing emerging data on the genetic and biologic factors
associated with SLI, and it is small wonder that a consensus leading to a unified theory of SLI eludes the research community at this time. The last of the three types of accounts Leonard describes wrestles with this same list of empirical challenges but proposes cognitive limitations that are more specific in nature.
Specific Processing Deficit Accounts of SLI
According to Leonard (1998), three accounts have focused on specific deficits as responsible for far-reaching
consequences for language function. Respectively, these accounts hypothesize deficits in phonological memory (Ellis
Weismer, Evans, & Hesketh, 1999; Gathercole & Baddeley, 1990), in temporal processing (Tallal, 1976; Tallal &
Piercy, 1973; Tallal, Stark, Kallman, & Mellits, 1981), and in the mechanisms used for grammatical analysis (Locke,
1994). These accounts are less well developed than the linguistic and general cognitive deficit accounts in terms of the
breadth of data they encompass.
Of these, the accounts associated with temporal processing (viz., Stark & Tallal, 1988; Tallal et al., 1996) have
had the greatest recent impact, including considerable attention in the popular press (e.g., in a USA Today article [Levy,
1996]).
This attention has largely been the result of the popularization of a specific training program called Fast ForWord
(Scientific Learning Corporation, 1998).
After a long history of work on SLI, Tallal joined with Michael Merzenich and others to conduct a series of remarkable
treatment studies (Merzenich et al., 1996; Tallal et al., 1996). In those studies, use of Fast ForWord, a computer training
program designed to address hypothesized processing difficulties, resulted in significant gains in language performance
and auditory processing. Development of that program was based on evidence that children with SLI have difficulty
processing brief stimuli or stimuli that follow one another in rapid succession—difficulties that might significantly affect
a child’s ability to process speech. Further, the program is based on the hypothesis that the deficit can be ameliorated by
exposing children with SLI to stimuli that are initially recognizable but acoustically altered through the lengthening of
formant transitions. During treatment, children participate in a large number of video-game-like trials in which they are
required to make judgements about the altered stimuli. Across trials, the stimulus characteristics are altered in the
direction of natural speech.
Readers are encouraged to take note of the debate surrounding this account and the commercialization it has fostered (e.g., Gillam, 1999; Veale, 1999). Ironically, the authors of the other accounts discussed in this section of the chapter have
appeared to take greater pains to tie together a huge number of empirical clues about the nature of SLI. However, it is
rare to find the public so aware of an account—or at least the treatment program associated with it—and to clamor for its
use with children presenting with a wide range of communication-related disorders (including reading disabilities and
autism). These public responses alone make it a fascinating area of additional investigation for clinicians and researchers
interested in children’s language disorders. Independent validation of this treatment and its theoretical underpinnings has
yet to be provided (Gillam, 1999).
What’s Ahead for Accounts of SLI?
In this section, I have tried my best to point out the most important landmarks of this vast and changing terrain (helped
considerably by the work of Leonard, 1998, and the urgings of Bernard Grela to address these complex issues).
However, I am certain that I have missed some important vantage points and critical roadways. Nonetheless, I hope that
this brief overview provides you with a sense of the complexities facing these researchers.
The researchers working on this topic have immense amounts of data to address if they are to settle on a truly
comprehensive theory, rather than fragmented accounts of isolated aspects of SLI. Not only must they deal with
information about how children with SLI perform on a range of language and nonlanguage tasks, they must do so for the wide range of spoken languages and across the life span. Further, they must tie these together with the burgeoning
findings about the genetics, brain structures, and social contexts of children with SLI.
Other challenges facing researchers interested in SLI have been summarized by Tager-Flusberg and Cooper (1999), who
reviewed the findings of a recent National Institutes of Health workshop focused on steps needed to produce clear
definitions of SLI
for genetic study. Despite the narrow focus of that conference, the recommendations that came out of it appear germane
to thoughts about the relation of theory to assessment practices. Among the recommendations summarized by Tager-
Flusberg and Cooper are that researchers abandon exclusionary definitions of SLI, broaden the language domains and
information-processing skills they assess, and develop a standard approach to defining SLI, not only in preschoolers but
also in older school-age children, adolescents, and adults. These same recommendations are clinically relevant insofar as
combining clinical and research efforts may result in the greatest gains in both arenas.
Special Challenges in Assessment
In addition to the theoretical challenges to the assessment of children with SLI, these children also come with a range of
personal reactions to testing that are at least partially determined by the amount of success they expect. Any of us who
has difficulty in certain areas, such as singing, drawing, or playing sports knows how uncomfortable we feel when our
performance in those areas is evaluated. Consequently, I urge you to refer back to the general guidelines in chapter 3; reviewing them will serve as useful preparation for working with children with SLI.
Beyond the personal dynamics that should always be a special consideration in assessment, children with SLI present
several problems related to how they are identified as needing help. Plante (1998) pointed out at least three problems
with how such children have been identified by researchers. Some of Plante’s concerns about the literature also face
clinicians. Even those that do not still deserve attention from knowledgeable consumers of this research literature.
First, Plante (1998) argued, researchers have tended to use criteria for nonverbal IQ (often nonverbal IQ of 85 or greater)
that exclude not only children with mental retardation but large numbers of others whose lower intelligence makes them
no less relevant to our understanding of SLI. Second, Plante noted that in the identification process, researchers have
tended to use tests and cutoff scores on those tests that have not been shown to successfully identify children with the
disorder. Specifically, she questioned two particular aspects of the validity of those tests and cutoffs: their sensitivity (the
extent to which individuals with disorders are actually identified as having the disorder) and specificity (the extent to
which individuals without disorders are successfully identified as such). (See chap. 9 for more complete explanations of
these concepts).
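The arithmetic behind these two indices is simple enough to sketch. The following illustration uses invented counts (not data from any study cited here) to show how sensitivity and specificity are computed from the four possible outcomes of an identification decision:

```python
# Sensitivity and specificity from the four outcomes of an identification
# decision. All counts below are invented for illustration only.
true_positives = 18   # children with the disorder whom the test flags
false_negatives = 6   # children with the disorder whom the test misses
true_negatives = 85   # unaffected children whom the test clears
false_positives = 11  # unaffected children whom the test flags

# Sensitivity: proportion of affected children correctly identified.
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of unaffected children correctly identified.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.2f}")  # 18/24 = 0.75
print(f"specificity = {specificity:.2f}")  # 85/96 ≈ 0.89
```

Note that the two indices answer different questions, which is why Plante's critique requires evidence on both: a cutoff score can be made more sensitive simply by flagging more children, but only at the cost of specificity.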
Third, Plante (1998) questioned the use of discrepancy or difference scores in the practice often referred to as cognitive
referencing. Cognitive referencing occurs when the identification of SLI hinges on the demonstration of a specific
difference between expected language function (based on nonverbal IQ) and language performance. Plante attacked this
practice on two grounds: (a) because of a tendency for such comparisons to be based on age-equivalent scores, which are
the targets of a long history of criticism from psychometric perspectives (e.g., see chap. 2) and (b) because there is no
good evidence to support the use of nonverbal IQ as an indicator of language potential. As just one example of this lack
of evidence, Krassowski and Plante (1997) reported a lack of stability in the performance IQ scores of 75 children with
SLI over a 3-year time frame that would be inconsistent with their use as a constant
measure of language potential. Plante and her colleagues are joined by much of the language research community in finding serious—many would say fatal—flaws with cognitive referencing (e.g., Aram et al., 1993; Fey et
al., 1994; Kamhi, 1998; Lahey, 1988). Along with the instability of categorizations obtained through cognitive
referencing, others have noted that similar amounts of improvement in specific treatments are made by children who
would fall on both sides of conventional cognitive criteria (e.g., Fey et al., 1994).
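To make the criticized practice concrete, the sketch below shows how a discrepancy rule of this kind operates. The function name, the scores, and the 15-point criterion are my own illustrative assumptions, not a rule drawn from any specific eligibility policy:

```python
# A hypothetical illustration of cognitive referencing: SLI is identified
# only when language performance falls a set amount below the expectation
# derived from nonverbal IQ. The 15-point cutoff is invented for
# illustration and does not reflect any particular state's rule.
def cognitively_referenced(nonverbal_iq: int,
                           language_standard_score: int,
                           required_discrepancy: int = 15) -> bool:
    """Return True if the IQ-language gap meets the required discrepancy."""
    return (nonverbal_iq - language_standard_score) >= required_discrepancy

# A child with nonverbal IQ 100 and a language standard score of 82 meets
# this criterion; one scoring 90 does not, even though both language
# scores may signal clinically meaningful difficulty.
print(cognitively_referenced(100, 82))  # True  (gap = 18)
print(cognitively_referenced(100, 90))  # False (gap = 10)
```

The sketch makes the core objection visible: eligibility hinges on the gap rather than on the language score itself, so two children with identical language difficulties can receive different decisions solely because their nonverbal IQ estimates differ.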
Even readers who have simply skimmed earlier chapters on their way to this one will recognize certain common
dilemmas facing clinicians as well as researchers regarding cognitive referencing. Thus, for example, both groups need
to be as careful as possible to select measures that have been studied very carefully for the purpose for which they are
being used. That is, evidence of criterion-related validity for how and with whom measures are used is something in
which both clinicians and researchers have a prodigious stake. In addition, both groups should avoid the relatively
unreliable and misleading nature of age-equivalent scores—insofar as they are able to do so. The “wiggle room” left by
that last clause stems from the fact that clinicians may find themselves compelled to use age-equivalent scores by the
settings in which they work, particularly for younger children. With regard to cognitive referencing, Casby (1992) noted
that in 31 states, eligibility for services based on SLI demands its use in some form. In such situations, an ethical and
sensible recommendation would be to provide the required documentation (i.e., to go ahead and report the cognitive-
referenced information, age-equivalent scores, or both), but accompany it with appropriate warnings about the
limitations of each and recommendations from a more scientifically supportable perspective.
In a discussion of problems of differential diagnosis in SLI, Leonard (1998) called attention to a further difficulty
associated with the assessment of children considered at risk for the disorder. Specifically, he noted the difficulty of
distinguishing late talkers, who will ultimately prove to be simply late in developing language, from those
children whose late talking foretells persisting problems in language acquisition. Most children with SLI have a history
of late talking (which is usually defined in terms of late use of words). However, only one quarter to one half of late
talkers will go on to be diagnosed with a language disorder. Developing accurate predictions of which children are
showing early signs of SLI has spurred the efforts of a number of researchers who hope that early identification will lead
to effective and efficient early intervention (e.g., Paul, 1996; Rescorla, 1991).
Unfortunately, the dramatic variability in children’s normal language development is proving a considerable obstacle.
Thus, reliable signs yielding reasonably accurate predictions have evaded researchers, leading Leonard (1998) to
recommend withholding diagnoses until at least age 3 and Paul (1996) to advise a “watch and see” policy. A differing
interpretation of the data on which Paul’s recommendations are based that includes a plea for more aggressive
intervention can be found in van Kleeck, Gillam, and Davis (1997).
Also urging more aggressive responses to late-talking children, Olswang, Rodriguez, and Timler (1998) represent a
somewhat more optimistic reading of the research evidence. Specifically they outlined speech and language differences
and other risk factors that they propose should prompt decisions to intervene. Table 5.2 summarizes their list. They recommended that larger numbers of risk factors be viewed as cause for greater concern.

Table 5.2
Predictors and Risk Factors Useful in Helping Clinicians Decide Whether to Enroll Toddlers Who Are Late Talkers for Intervention

Speech predictors
Language production: Small vocabulary for age. Few verbs. Preponderance of general all-purpose verbs (e.g., want, go, get, do, put, look, make, got). More transitive verbs (e.g., John hit the ball). Few intransitive and ditransitive verb forms (e.g., he sleep, doggie run).
Language comprehension: Presence of a 6-month comprehension gap. Large comprehension-production gap with comprehension deficit.
Phonology: Few prelinguistic vocalizations. Limited number of consonants. Limited variety in babbling structure. Less than 50% consonants correct (substitution of glottal consonants and back sounds for front). Restricted syllable structure. Vowel errors.
Imitation: Few spontaneous imitations. Reliance on direct model and prompting in imitation tasks of emerging language forms.

Nonspeech predictors
Play: Primarily manipulating and grouping. Little combinatorial and/or symbolic play.
Gestures: Few communicative gestures, symbolic gestural sequences, or supplementary gestures.
Social skills: Behavior problems. Few conversational initiations. Interactions with adults more than peers. Difficulty gaining access to activities.

Risk factors
Otitis media: Prolonged periods of untreated otitis media.
Heritability: Family member with persistent language and learning problems.
Parent needs: Parent characteristics (low SES; directive more than responsive interaction style). Extreme parent concern.

Note. From “Recommending Intervention for Toddlers With Specific Language Learning Difficulties: We May Not Have All the Answers, but We Know a Lot,” by L. Olswang, B. Rodriguez, & G. Timler, 1998, American Journal of Speech-Language Pathology, 7, p. 29. Copyright 1998 by the American Speech-Language-Hearing Association. Reprinted with permission.
Expected Patterns of Language Performance
The language performance of children with SLI has undergone greater scrutiny than that of any other group of children
with language difficulties. The diversity and depth of this research over several decades lead to clear expectations about areas of likely difficulty but also to the ubiquitous expectation that each child will be different.
Therefore, before I delve into expected patterns of difficulties, I should mention again that general expectations lead to
hypotheses about what might be expected in a given child—not infallible certainties. Generalizations also fail to capture
the variations found either in studies identifying distinct subtypes of SLI (e.g., Aram & Nation, 1975; Rapin & Allen,
1988; Wilson & Risucci, 1986) or in studies revealing changes in patterns of impairment that occur with age (e.g., Aram,
Ekelman, & Nation, 1984; Stothard, Snowling, Bishop, Chipchase, & Kaplan, 1998; Tomblin, Freese, & Records, 1992).
Further, these generalizations have been identified for children acquiring English—a potentially serious limitation for
clinicians working with children acquiring other less-studied languages (Leonard, 1998). Thus, the expected patterns
discussed here are described only briefly and are meant to prompt consideration of likely areas of difficulty, not to
become the only ones given attention.
Among the more robust findings from studies examining language skills in English-speaking children with SLI have
been the findings that (a) expressive and receptive language are often differentially impaired, and (b) degree of
involvement can vary from quite mild to quite severe. Also, expressive language tends to be more frequently and
severely affected—an observation that is borne out in much of the literature and is also reflected in the DSM–IV
(American Psychiatric Association, 1994) definition shared at the beginning of the chapter. Recent research, however,
suggests that this disparity may not be as large as has sometimes been thought. Among the children who were found to
have impaired language in a report dealing with a large epidemiological study, Tomblin (1996a) identified 35% of
children with expressive problems, 28% with receptive problems, and 35% with both expressive and receptive problems
(given a cutoff of 1.25 standard deviations below the mean).
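To see what a cutoff of 1.25 standard deviations below the mean means in score terms, a quick calculation helps. The Python sketch below assumes the common standard-score scale (mean 100, SD 15); the scale used in the epidemiological study itself may differ:

```python
from statistics import NormalDist

def cutoff_score(mean=100.0, sd=15.0, z=-1.25):
    """Standard score corresponding to a z-score cutoff."""
    return mean + z * sd

def pct_expected_below(z=-1.25):
    """Percent of a normally distributed population scoring below z."""
    return 100 * NormalDist().cdf(z)

print(cutoff_score())                  # 81.25
print(round(pct_expected_below(), 1))  # 10.6
```

On this assumed scale, then, a child would need a standard score below roughly 81 to be counted, a criterion that about 11% of a normally distributed population would also meet.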
In Table 5.3, specific areas of difficulty relative to normally developing peers are summarized on the basis of an
extensive review of the literature in Leonard (1998; cf. Menyuk, 1993; Watkins, 1994). There, the density of comments
falling under language production reflects not only the tendency for this modality to show more obvious and often more
severe deficits than comprehension, but also a tendency for it to have received much greater
research attention. A related table, Table 5.4, lists specific grammatical morphemes that have been identified as
particularly problematic.
As you examine Table 5.3, notice that many—although not all—of the differences shown by children with SLI resemble
patterns seen in younger children and are therefore characterized as delays. This observation may have implications
related to the nature of this disorder. In addition, it supports the reasonableness of approaching treatment goals from a developmental perspective (Leonard, 1998).

Table 5.3
Patterns of Oral Language Impairment by Modality and Domain Reported in Children With Specific Language Impairment (SLI) (Leonard, 1998)

Semantics: Lexical abilities and early word combinations
Production: Delays in acquiring first words and word combinations. Delays in verb acquisition, with overuse of some common verbs (e.g., do, go, get, put, want). Word-finding difficulties,(a) especially noted in school-age children.
Comprehension: Deficient in learning to understand new words, particularly those involving actions.

Semantics: Argument structure
Production: Increased tendency to omit obligatory arguments (e.g., omission of the object for a transitive verb) or even the verb itself. Increased tendency to omit optional but semantically important information (e.g., adverbials providing information regarding time, location, or manner of action) and use of an infinitival complement (e.g., He wants to do this).
Comprehension: Increased difficulty in acquiring argument structure information from syntactic information for new verbs.

Grammatical morphology(b)
Production: Grammatical morphology constitutes a relative and sometimes enduring weakness in children with SLI (see Table 5.4 for a list of grammatical morphemes that have received particular attention). Grammatical morphology related to verbs is especially affected. Errors most often consist of omissions rather than inappropriate use, but are likely to be inconsistent in either case.
Comprehension: Limited research suggests poorer comprehension of grammatical morphemes, especially those of shorter duration, and poorer identification of errors involving grammatical morphemes.

Phonology
Production: Although occasionally occurring alone, phonological deficits are almost always accompanied by other language deficits, and vice versa. Delays are most frequently seen, with most errors resembling those of younger normally developing children. Unusual errors in production(c) occur rarely, but probably more often than in normally developing children. Greater variability in production than children without SLI at similar stages of phonological development.

Pragmatics
Production: Some evidence of pragmatic difficulties. Although these difficulties largely seem due to communication problems posed by other language deficits, independent pragmatic deficits may occur as well. Participation in communication is negatively affected when communication involves adults or multiple communication partners.
Comprehension: Limited research suggests that understanding of the speech acts of others may be affected. Comprehension of figurative language (e.g., metaphors, idioms) can be affected.

Narratives
Production: Cohesion of narratives can be affected, and sometimes expected story components are absent.
Comprehension: Comprehension of narratives can be affected when inferences need to be drawn from the literal narrative content.

(a) Evidenced by unusually long pauses in speech, frequent circumlocution, or frequent use of nonspecific words such as it and stuff.
(b) Grammatical morphology can be defined as “the closed-class morphemes of language, both the morphemes seen in inflectional morphology (e.g., ‘plays,’ ‘played’) and derivational morphology (e.g., ‘fool,’ ‘foolish’), and function words such as articles and auxiliary verbs” (Leonard, 1998, p. 55).
(c) Among the unusual errors reported for this population are later developing sounds being used in place of earlier developing sounds, sound segment additions, and use of sounds not heard in the child’s ambient language.

Also, notice the expanse of unmapped country
revealed here. Despite several decades of work, much remains unknown about the abilities of children with SLI and how
they are related to one another. Consequently, the potential for valuable outcomes from experimental exploration is
immense!
Finally, on a very different note, readers of this table may find that their knowledge of some terminology related to
linguistic descriptions of these children’s difficulties is outdated or incomplete. They are referred to Hurford (1994) as a
reference guide to the more basic grammatical terms.
Related Problems
When compared with children described in other sections of this book, children with SLI have far fewer related
problems. Despite the more restricted nature of their difficulties, however, children with SLI are at increased risk for a
number of significant, ongoing problems in addition to a lengthening list of subtle perceptual and cognitive deficiencies
that were described briefly earlier. Among these are increased risk for emotional, behavioral, and social difficulties. In
addition, there is increased risk for ongoing academic difficulties often associated with diagnoses of learning disabilities
(Wallach & Butler, 1994).
Table 5.4
Examples of Grammatical Morphemes, an Area of Special Difficulty for Children With Specific Language Impairment (SLI)

Inflectional morphemes
Past tense: regular –ed (walked); irregular (slept, flew, hid)
Third-person singular –s: sits, runs
Progressive –ing: is running, is seeing
Plural –s: coats, flowers
Possessive ’s (also called genitive ’s): Sam’s, dog’s

Other grammatical morphemes
Copula be: he is a boy; they are happy
Auxiliary be: she is hunting; he was cooking
Auxiliary do: I don’t hate you; Do you remember that man?
Articles: the man; a cat
Pronouns: anything, herself, I, he, they, them, her
Emotional, Behavioral, and Social Difficulties


The possibility that children with specific language disorders may be at risk for difficulties in personal adjustment has
been examined at several levels of severity. These levels of severity have ranged from studies examining the prevalence
of identifiable psychiatric diagnoses (e.g., Baker & Cantwell, 1987a, b; Beitchman, Nair, Clegg, & Patell, 1986;
Beitchman, Brownlie, et al., 1996; Beitchman, Wilson, et al., 1996) to studies examining specific aspects of peer
relationships or social maturation (e.g., Craig, 1993; Farmer, 1997; Fujiki & Brinton, 1994; Gertner, Rice, & Hadley,
1994; Records, Tomblin, & Freese, 1992; Rutter, Mawhood, & Howlin, 1992).
Studies looking at this issue differ in a large number of methodological variables (e.g., ages studied, methods used to
define language impairment and other problem areas). Nonetheless, a serviceable overview of their findings is that
children with SLI are at increased risk for difficulties involving their emotional, behavioral, and social status. Further,
this generalization holds for both children and older individuals with a history of SLI—even when they appear to have
outgrown persisting language impairment (through treatment or maturational processes alone; e.g., Rutter, Mawhood, &
Howlin, 1992). There is evidence that children with receptive problems or those with both expressive and receptive
language problems are at greater risk than those with expressive problems alone (e.g., Beitchman, Wilson, et al., 1996;
Stevenson, 1996). The causal mechanisms involved in the co-occurrence of communication problems and difficulties in
emotional, behavioral, and social realms are difficult to discern and are far from being understood (Stevenson, 1996).
Still, the implications of the co-occurrence alone are important for those who help children with SLI.
Among the specific problems associated with SLI that can be categorized as psychiatric problems are attention deficit
disorder (ADD), conduct disorder, and anxiety disorders (Baker & Cantwell, 1987b). Of these three disorders, perhaps
the most familiar to many people is attention deficit hyperactivity disorder (ADHD). With
an estimated prevalence of 4 to 6% of all elementary-school-aged children, it has been described as the “most common
significant behavioral syndrome in children” (Wender, 1995, p. 185). Recall that in the description of Wilson at the
beginning of the chapter, it was suspected as a contributor to some of his difficulties in fitting into the classroom and
interacting with peers.
ADHD is typically diagnosed in children who show patterns of inattention, overactivity-impulsivity, or both, that seem
inappropriate for age and detrimental to functioning (American Psychiatric Association, 1994). Although symptoms of
the disorder may be more common in some situations than others, they occur across settings. Excellent practical
recommendations for dealing with the symptoms of this disorder in the classroom are available in Dowdy, Patton, Smith,
and Polloway (1998, Appendix A).
Conduct disorder is diagnosed in children who demonstrate a repeated and consistent pattern of behavior that is
inappropriate for age and violates social or even legal norms (American Psychiatric Association, 1994; Goldman, 1995).
Behaviors that are associated with this diagnosis can include aggression to people and animals, destruction of property,
deceitfulness, theft, truancy, and running away.
Anxiety disorder is diagnosed in children who worry excessively, usually about their performance, with resulting
negative effects on their functioning (American Psychiatric Association, 1994). Although the area of concern may shift
from time to time, the intensity, duration, and frequency of the anxiety and worry are seen as out of proportion with their
actual likelihood or impact. Children with this disorder may be overly concerned about approval and require excessive
reassurance about the adequacy of their performance or other focus of concern.
Although the diagnoses of attention deficit disorder, conduct disorder, and anxiety disorders are relatively rare among
children with language impairment, another view of the association between psychiatric diagnoses and language skills
has been taken by researchers who examine the language skills of groups of children seen as psychiatric outpatients. In
one relatively recent study, researchers found that one third of the 399 such children whose language was screened were
identified as having an unsuspected language impairment (Cohen, Davine, Horodezky, Lipsett, & Isaacson, 1993). Thus,
awareness of this possible association can help speech-language pathologists contribute to the development of children
whose emotional and behavioral issues have previously overshadowed very real language difficulties, as well as those
children for whom a language diagnosis has already been made.
Academic Difficulties
The connection between language difficulties and academic difficulties is a powerful one. In the early grades, academic
skills build on language skills used in everyday experience. Later, academic demands, especially for written language
acquisition but also for the understanding and use of figurative language, narrative construction and the use of language
in reasoning (Nippold, 1998), help fuel additional gains in language development. At least that is the way things are
thought to work for normally developing children.
Increasingly, it appears that the oral language difficulties of children with SLI may contribute to and be exacerbated by
the unsuccessful language experiences they encounter in school. Bashir and Strominger (1996) described the
interweaving of oral and written language problems as follows:
It is reasonable to argue that the continued academic vulnerability in children with language disorders in the middle
grades reflects both the persistence of language problems and restrictions on later language development resulting from
reduced reading as well as restricted exposure to different texts and text-based information. (p. 134)
Thus, not only may language impairments lead to academic difficulties, but difficulties with the language of the
academic setting may contribute to children falling further behind their peers in language development.
A recent study by Stothard et al. (1998) is quite representative of the literature on later language and academic outcomes (e.g.,
Hall & Tomblin, 1978; Tomblin et al., 1992; Weiner, 1974) and corroborates some of its more robust findings. The study
reports on data from the same children seen at ages 4, 5½, and 15½ years. Experimental measures included measures of
oral language (receptive vocabulary, expressive vocabulary, general comprehension, grammatical understanding,
naming), short-term memory and phonological skills (sentence repetition, nonword repetition, and spoonerisms), and
written language (consisting of one test that assesses single-word reading, single-word spelling, and reading
comprehension). In addition, information about children’s special education status was examined.
Results indicated that children who were seen as having persisting SLI at age 5½ demonstrated long-standing
impairment, with performances at age 15½ falling below age-matched peers on all oral language measures. In particular,
47% of these children obtained verbal composite scores more than 1 standard deviation below the mean, and 20%
obtained scores more than 2 standard deviations below the mean. In addition, they showed persisting problems in reading
and spelling that had resulted in a high percentage receiving special education assistance of some kind.
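The 47% and 20% figures are easier to appreciate against the base rates expected in a typical population. Under a normal distribution, only about 16% of children score more than 1 standard deviation below the mean and about 2% score more than 2 standard deviations below it; the short Python check below (standard library only) confirms those baselines:

```python
from statistics import NormalDist

def pct_below_z(z):
    """Percent of a normal population scoring below z SDs from the mean."""
    return 100 * NormalDist().cdf(z)

print(round(pct_below_z(-1), 1))  # 15.9
print(round(pct_below_z(-2), 1))  # 2.3
```

Against those baselines, rates of 47% and 20% in the persisting-SLI group represent roughly a threefold to ninefold elevation.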
Even children who seemed to have recovered at age 5½ performed significantly less well than age-matched peers at age
15½ on tests tapping short-term memory and phonological skills. Further, almost a third of these children, 31% (8 out of
26), demonstrated performances consistent with the persisting SLI category; that is, they could again be considered
language impaired. This is consistent with a pattern termed illusory recovery, which refers to the apparent reemergence
of problems as the complexity of demands placed on children increases with grade level (Scarborough & Dobrich, 1990).
The few studies exploring the language skills and academic accomplishments of adults with a history of SLI (Gopnik &
Crago, 1991; Hall & Tomblin, 1978; Plante et al., 1996; Tomblin et al., 1992) confirm that many children with SLI will
continue to be plagued by significant differences in language performance that impact other areas of functioning,
including school advancement.
The Personal Perspective for this chapter illustrates the possibility of long-term academic effects of SLI.
PERSONAL PERSPECTIVE
This perspective was provided by Michele, who told her story to Cynthia Roby, the author of When learning is tough:
Kids talk about their learning disabilities (1994). Although Michele primarily discussed her learning disability and its
impact on her school experiences, it seems likely that language problems were part of her “learning problem.”
“My apartment is above a video store in the city. I can go right downstairs and rent a movie. I don’t have any brothers
or sisters. My mother is Korean. She works at a cafeteria. My dad works at the airport; he’s Chinese. We eat lots of
Korean and Chinese food. I like rice and noodles. I love pizza, too.…
“I think my parents found out I had a learning problem when I was two. I had a problem when people would read to
me. I would just draw on the books because I couldn’t understand the stories. It was hard for me to understand the
words.
“I hated my old school, I felt a little bit mad about having a learning problem. I couldn’t read the words and the other
kids could. I had to be sent to a quiet room so I could read. Somebody would help me there. It made me feel happy
when I finally got extra help. It didn’t make me feel bad to go to the special classroom.
“Then a few years ago, my parents decided to send me to a special school for kids with learning disabilities. I like it
there, and the teachers help me. They treat me nicely and help me with my reading. Even so, recess is still the most fun.
I run around the playground with the girls.
“My parents help me with school work. My dad used to show me flash cards. He still helps me with my math, my
reading, and my spelling. He made a list of all the math facts I have to learn; it’s taped next to my bed. My parents are
good to me. They don’t get angry at me because I have learning problems.
“I think my cousin may have learning problems, too. He is just little. He goes to school, and when the teacher reads a
book he won’t listen. He is like me at that age. I’m good at art. I like to do self-portraits and paint and do projects. I
would rather paint all day instead of doing math or reading. I like classical music. And last year I learned to play “Can
Can” on the keyboard. I practiced every day. Sometimes I would mess up a little. Then I would do it over again, and I
would do it right.
“I think high school will be hard, very hard. I am going to study biology in college—it’s all about human beings and
the body parts. I’ll be a teacher when I grow up. I will tell kids not to fight or pinch. I want to teach little kids. They’re
cute!”
Michele’s tip: “I would tell other kids with learning problems to get books and keep trying to read them.”
Although it is the exceptional child with written language difficulties who is without a history of spoken language
difficulties (Stark & Tallal, 1988), not all children with SLI go on to be identified with difficulties in written language.
Factors that appear to predict later problems in literacy include difficulties with receptive language, phonological
awareness, and rapid naming (Leonard, 1998). Phonological awareness is explicit knowledge about the sound structure
of the language—for instance, that words are made up of syllables and syllables of individual sounds (Ball, 1993). Other
complications to be dealt with in understanding the relationship between language impairment and later academic
difficulties are a tendency for lower intelligence and lower SES to encroach as potential confounding variables
(Schachter, 1996).
Summary
1. Specific language impairment (SLI) is a research construct designed to help identify a “pure” language disorder and is
usually defined in terms of exclusionary as well as inclusionary characteristics, although such definitions are
increasingly controversial.
2. Among factors that are suspected in the causation of SLI are genetics, differences in brain structure and function, and
other biological factors. Environmental factors, especially aspects of the child’s social environment, have been
examined, but appear less important at this time.
3. In addition, linguistic and cognitive accounts of causation in SLI have received extensive attention from researchers.
According to Leonard (1998), the three major categories into which these accounts fit are those focused on linguistic
knowledge deficits, general processing deficits, and specific processing deficits.
4. Although theoretical understanding of SLI can ultimately be expected to engender major shifts in assessment
methods, little translation from experimental assessment tools to those available to practicing clinicians has yet occurred.
More immediate impact may derive from calls for researchers (and clinicians) to assess language and other performance
domains more broadly and to seek consensus on diagnostic methods.
5. Special challenges to assessment include problems with the frequent exclusion of children with mental retardation
from definitions of SLI, the use of cognitive referencing in research, bureaucratically dictated protocols, and the overuse
of measures in identification without sufficient study of their validity for that purpose.
6. As an additional challenge, the problem of differentiating young late-talkers from children who will have a persistent
impairment in language poses special difficulties.
7. Patterns of language impairment can range from mild to quite severe and can affect both receptive and expressive
language. Domains of language that are particularly problematic for young children learning English appear to include
morphology, syntax, and phonology.
8. Related problems for these children include somewhat increased risk for emotional, behavioral, and social difficulties,
as well as greater risk for persistent academic difficulties.
Key Concepts and Terms
anxiety disorder: an emotional disorder in children in which their excessive anxiety and worrying, usually about
performance, adversely affects their functioning in school and at home.
attention deficit disorder (with or without hyperactivity): a psychological disorder in which individuals demonstrate
excessive inattention and distractibility, impulsivity and hyperactivity, or both when compared with other individuals the
same age.
cognitive referencing: the use of a measure of intelligence (usually nonverbal IQ) as a reference against which to define
impaired language; it is based on the assumption that nonverbal cognition represents an upper bound for language
function.
concordance: agreement in the presence or absence of a disorder between two individuals in a natural pair (e.g., a pair of
identical or fraternal twins).
conduct disorder: a psychological disorder in which there is a persistent pattern of violating social rights, others’ rights,
or societal norms through behaviors such as aggression toward people or animals, destruction of property, theft, or
deceitfulness.
effect size: A measure reflecting the magnitude of difference between groups in an experimental study. Whereas
statistical significance addresses the reliability of a research finding, effect size provides important information for
judging the importance of a statistically significant effect.
Fast ForWord: A computerized treatment developed by Paula Tallal, Michael Merzenich, and their colleagues, based on
the premise that SLI is caused by difficulties in temporal processing.
general all-purpose verbs: Verbs, such as do and get, that occur with relatively high frequency in the speech of normally
developing children but that also tend to be overused in the speech of children who are “late talkers.”
general processing deficit accounts of SLI: explanations of SLI in which processing deficits are presumed to account for
both verbal and nonverbal difficulties documented in children with SLI. The surface hypothesis is one such account.
incomplete penetrance: the failure of a gene to have the same effect on all individuals who carry it, for example, when a
gene that is usually associated with a specific disease does not produce that disease in some individuals who carry it.
late-talkers: Children who show delays in language production that may represent early signs of SLI or simply a delay in
language development that is overcome as the child matures.
linguistic accounts of SLI: Accounts in which deficits in linguistic knowledge are considered the core deficits in children
with SLI. Rice, Wexler, and Cleave’s extended optional infinitive account is an example.
magnetic resonance imaging (MRI): a relatively noninvasive radiographic technique used to study brain structure in
living individuals.
phenotype: the behavioral outcome for which a genetic explanation is sought (Rice, 1996).
phonological awareness: explicit knowledge about the sound structure of the language, for example, knowing that words
are made of syllables, and syllables of individual sounds.
proband: the affected individual in a genetic study, whose identified disorder or difficulty leads to researchers including
them and members of their family in genetic research.
recast: a restatement of a child’s production using grammatically correct structures, often incorporating morphosyntactic
forms that had been omitted or produced in error by the child.
risk factors: factors that are associated with increased likelihood that a disorder will occur; these factors may or may not
represent causes.
specific language impairment (SLI): delayed acquisition of language skills, usually defined as occurring in the absence
of impairments in other areas of functioning, such as nonverbal cognition and hearing.
specific processing deficit accounts of SLI: explanations of SLI in which specific processing deficits (e.g., in auditory
processing or phonologic working memory) are thought to account for the language and other difficulties associated with
SLI. Tallal’s account based on temporal processing deficits is one example.
Study Questions and Questions to Expand Your Thinking
1. How might knowledge that SLI is sometimes “caused” by differences in brain structure affect diagnosis? How might
it affect treatment?
2. Remembering that co-occurrence does not mean causation, consider the significance of a physical marker, such as a
specific neurological anomaly, for SLI. What other mechanisms might explain its presence besides its having a role in
causing the appearance of language learning difficulties?
3. If you were the parent of a child with SLI, what might you want to know about the genetics of this condition? How
might you, as a clinician, explain this information, and where could you suggest that both you and the parent obtain
additional information?
4. Describe three possible co-occurring problems that may affect the communication and test-taking behaviors of a child
with SLI.
5. On the basis of your reading, what domains of language and communication have been considered important by
researchers? Can you find standardized tests that correspond to these areas?
6. What research questions do you think are most important for furthering our understanding of this condition?
Recommended Readings
Gilger, J. W. (1995). Behavioral genetics: Concepts for research and practice in language development and disorders.
Journal of Speech and Hearing Research, 38, 1126–1142.
Gillam, R. (1999). Computer assisted language intervention using Fast ForWord: Theoretical and empirical considerations
for clinical decision-making. Language, Speech, and Hearing Services in Schools, 30, 363–370.
Hurford, J. R. (1994). Grammar: A student’s guide. Cambridge, England: Cambridge University Press.
Leonard, L. (1998). Children with specific language impairment. Cambridge, MA: MIT Press.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington,
DC: Author.
Aram, D. M., & Eisele, J. A. (1994). Limits to a left hemisphere explanation for specific language impairment. Journal
of Speech and Hearing Research, 37, 824–830.
Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific
language impairment. Journal of Speech and Hearing Research, 36, 580–591.
Aram, D. M., & Nation, J. (1975). Patterns of language behavior in children with developmental language disorders.
Journal of Speech and Hearing Research, 18, 229–241.
Aram, D. M., Ekelman, B., & Nation, J. (1984). Preschoolers with language disorders: 10 years later. Journal of Speech
and Hearing Research, 27, 232–244.
Baker, L., & Cantwell, D. (1987a). Comparison of well, emotionally disordered and behaviorally disordered children
with linguistic problems. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 193–196.
Baker, L., & Cantwell, D. (1987b). A prospective psychiatric follow-up of children with speech/language disorders.
Journal of the American Academy of Child and Adolescent Psychiatry, 26, 546–553.
Ball, E. W. (1993). Assessing phoneme awareness. Language, Speech, and Hearing Services in Schools, 24, 130–139.
Bashir, A., & Strominger, A. (1996). Children with developmental language disorders: Outcomes, persistence, and
change. In M. D. Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 119–140). New York: Thieme.
Beitchman, J. H., Nair, R., Clegg, M., & Patel, P. G. (1986). Prevalence of psychiatric disorders in children with speech
and language disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 25, 528–535.
Beitchman, J. H., Brownlie, E. B., Inglis, A., Wild, J., Ferguson, B., Schachter, D., Lancee, W., Wilson, B., & Mathews,
R. (1996). Seven-year follow-up of speech/language impaired and control children: Psychiatric outcome. Journal of
Child Psychology and Psychiatry, 37, 961–970.
Beitchman, J. H., Wilson, B., Brownlie, E. B., Walters, H., Inglis, A., & Lancee, W. (1996). Long-term consistency in
speech/language profiles: II. Behavioral, emotional, and social outcomes. Journal of the American Academy of Child and
Adolescent Psychiatry, 35, 815–825.
Bishop, D. V. M. (1992a). The biological basis of specific language impairment. In P. Fletcher & D. Hall (Eds.), Specific
speech and language disorders in children (pp. 2–17). San Diego, CA: Singular Press.
Bishop, D. V. M. (1992b). The underlying nature of specific language impairment. Journal of Child Psychology and
Psychiatry, 33, 3–66.
Bishop, D. V. M. (1993). Language development after focal brain damage. In D. Bishop & K. Mogford (Eds.), Language
development in exceptional circumstances (pp. 203–219). Hove, UK: Lawrence Erlbaum Associates.
Bishop, D. V. M. (1994). Grammatical errors in specific language impairment: Competence or performance limitations?
Applied Psycholinguistics, 15, 507–550.
Bishop, D. V. M., & Edmundson, A. (1987). Language-impaired 4-year-olds: Distinguishing transient from persistent
impairment. Journal of Speech and Hearing Disorders, 52, 156–173.
Bondurant, J., Romeo, D., & Kretschmer, R. (1983). Language behaviors of mothers of children with normal and
delayed language. Language, Speech, and Hearing Services in Schools, 14, 233–242.
Brzustowicz, L. (1996). Looking for language genes: Lessons from complex disorder studies. In M. Rice (Ed.), Toward
a genetics of language (pp. 3–25). Mahwah, NJ: Lawrence Erlbaum Associates.
Camarata, S., & Swisher, L. (1990). A note on intelligence assessment within studies of specific language impairment.
Journal of Speech and Hearing Research, 33, 205–207.
Casby, M. (1992). The cognitive hypothesis and its influence on speech-language services in schools. Language, Speech,
and Hearing Services in School, 23, 198–202.
Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press.
Clahsen, H. (1989). The grammatical characterization of developmental dysphasia. Linguistics, 27, 897–920.
Clark, M., & Plante, E. (1998). Morphology of the inferior frontal gyrus in developmentally language-disordered adults.
Brain and Language, 61(2), 288–303.
Cohen, M., Campbell, R., & Yaghmai, F. (1989). Neuropathological abnormalities in developmental dysphasia. Annals
of Neurology, 25, 567–570.
Cohen, N. J., Davine, M., Horodezky, N., Lipsett, L., & Isaacson, L. (1993). Unsuspected language impairment in
psychiatrically disturbed children: prevalence and language and behavioral characteristics. Journal of the American
Academy of Child and Adolescent Psychiatry, 32, 595–603.
Crago, M., & Gopnik, M. (1994). From families to phenotypes. In R. Watkins & M. Rice (Eds.), Specific language
impairments in children (pp. 35–51). Baltimore: Paul H. Brookes.
Craig, H. K. (1993). Social skills of children with specific language impairment: Peer relationships. Language, Speech,
and Hearing Services in Schools, 24, 206–215.
Cunningham, C., Siegel, L., van der Spuy, H., Clark, M., & Bow, S. (1985). The behavioral and linguistic interactions of
specifically language-delayed and normal boys with their mothers. Child Development, 56, 1389–1403.
Dowdy, C. A., Patton, J. R., Smith, T. E. C., & Polloway, E. A. (1998). Attention deficit/hyperactivity disorder in the
classroom: A practical guide for teachers. Austin, TX: Pro-Ed.
Edwards, J., & Lahey, M. (1996). Auditory lexical decisions of children with specific language impairment. Journal of
Speech and Hearing Research, 39, 1263–1273.
Ellis Weismer, S. (1985). Constructive comprehension abilities exhibited by language-disordered children. Journal of
Speech and Hearing Research, 28, 175–184.
Ellis Weismer, S., Evans, J., & Hesketh, L. J. (1999). An examination of verbal working memory capacity in children
with specific language impairment. Journal of Speech, Language, and Hearing Research, 42, 1249–1260.
Farmer, M. (1997). Exploring the links between communication skills and social competence. Educational and Child
Psychology, 14(3), 38–44.
Fey, M., Long, S. H., & Cleave, P. L. (1994). Reconsideration of IQ criteria in the definition of specific language
impairment. In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 161–178). Baltimore: Paul
H. Brookes.
Fujiki, M., & Brinton, B. (1994). Social competence and language impairment in children. In R. V. Watkins & M. L.
Rice (Eds.), Specific language impairments in children (pp. 123–143). Baltimore: Paul H. Brookes.
Galaburda, A., Sherman, G., Rosen, G., Aboitiz, F., & Geschwind, N. (1985). Developmental dyslexia: Four consecutive
patients with cortical anomalies. Annals of Neurology, 18, 222–233.
Gathercole, S., & Baddeley, A. (1990). Phonological memory deficits in language disordered children: Is there a causal
connection? Journal of Memory and Language, 29, 336–360.
Gauger, L., Lombardino, L., & Leonard, C. (1997). Brain morphology in children with specific language impairment.
Journal of Speech, Language, and Hearing Research, 40, 1272–1284.
Gertner, B. L., Rice, M. L., & Hadley, P. A. (1994). Influence of communicative competence on peer preferences in a
preschool classroom. Journal of Speech and Hearing Research, 37, 913–923.
Geschwind, N., & Levitsky, W. (1968). Human brain: Asymmetries in the temporal speech region. Science, 161, 186–
187.
Gilger, J. W. (1995). Behavioral genetics: Concepts for research and practice in language development and disorders.
Journal of Speech and Hearing Research, 38, 1126–1142.
Gillam, R. (1999). Computer assisted language intervention using Fast ForWord: Theoretical and empirical
considerations for clinical decision-making. Language, Speech, and Hearing Services in Schools, 30, 363–370.
Goldman, S. (1995). Disruptive behavior, lying, stealing, and aggression. In S. Parker & B. Zuckerman (Eds.),
Behavioral and developmental pediatrics (pp. 110–115). Boston: Little, Brown.
Gopnik, M. (1990). Feature-blind grammar and dysphasia. Nature, 344, 715.
Gopnik, M., & Crago, M. (1991). Familial aggregation of a developmental language disorder. Cognition, 39, 1–50.
Hadley, P., & Rice, M. L. (1991). Conversational responsiveness of speech- and language-impaired preschoolers.
Journal of Speech and Hearing Research, 34, 1308–1317.
Hall, P., & Tomblin, J. B. (1978). A follow-up study of children with articulation and language disorders. Journal of
Speech and Hearing Disorders, 43, 227–241.
Hurford, J. R. (1994). Grammar: A student’s guide. Cambridge, England: Cambridge University Press.
Ingram, D., & Carr, L. (1994, November). When morphology ability exceeds syntactic ability: A case study. Paper
presented at the Convention of the American Speech-Language-Hearing Association, New Orleans, LA.
Jackson, T., & Plante, E. (1996). Gyral morphology in the posterior sylvian region in families affected by developmental
language disorder. Neuropsychology Review, 6(2), 81–94.
Jernigan, T., Hesselink, J., Sowell, E., & Tallal, P. (1991). Cerebral structure on magnetic resonance imaging in
language- and learning-impaired children. Archives of Neurology, 48, 539–545.
Johnston, J. R. (1992). Cognitive abilities of language-impaired children. In P. Fletcher & D. Hall (Eds.), Specific speech
and language disorders in children (pp. 105–116). San Diego, CA: Singular Press.
Johnston, J. R. (1993). Definition and diagnosis of language development disorders. In G. Blanken, J. Dittman, H.
Grimm, J. C. Marshall, & C.-W. Wallesch (Eds.), Linguistic disorders and pathologies: An international handbook (pp.
574–585). Berlin, Germany: deGruyter.
Kamhi, A. (1993). Children with specific language impairment (developmental dysphasia): Perceptual and cognitive
aspects. In G. Blanken, J. Dittman, H. Grimm, J. C. Marshall, & C.-W. Wallesch (Eds.), Linguistic disorders and
pathologies: An international handbook (pp. 574–585). Berlin, Germany: deGruyter.
Kamhi, A. (1998). Trying to make sense of developmental language disorders. Language, Speech, and Hearing Services
in Schools, 29, 35–44.
Krassowski, E., & Plante, E. (1997). IQ variability in children with SLI: Implications for use of cognitive referencing in
determining SLI. Journal of Communication Disorders, 30, 1–9.
Kuehn, D. P., Lemme, M. L., & Baumgartner, J. M. (1989). Neural bases of speech, hearing, and language. San
Antonio, TX: Pro-Ed.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Lahey, M., & Edwards, J. (1995). Specific language impairment: Preliminary investigation of factors associated with
family history and with patterns of language performance. Journal of Speech and Hearing Research, 38, 643–657.
Lahey, M., & Edwards, J. (1996). Why do children with specific language impairment name pictures more slowly than
their peers? Journal of Speech and Hearing Research, 39, 1081–1098.
Leonard, L. (1989). Language learnability and specific language impairment in children. Applied Psycholinguistics, 10,
179–202.
Leonard, L. (1995). Functional categories in the grammars of children with specific language impairment. Journal of
Speech and Hearing Research, 38, 1270–1283.
Leonard, L. (1998). Children with specific language impairment. Cambridge, MA: MIT Press.
Leonard, L., Eyer, J., Bedore, L., & Grela, B. (1997). Three accounts of the grammatical morpheme difficulties of
English-speaking children with specific language impairment. Journal of Speech and Hearing Research, 40, 741–753.
Levy, D. (1996, January 5). Sound games help teach impaired kids. USA Today, p. 1A.
Locke, J. (1994). Gradual emergence of developmental language disorders. Journal of Speech and Hearing Research,
37, 608–616.
Loeb, D., & Leonard, L. (1991). Subject case marking and verb morphology in normally developing and specifically
language-impaired children. Journal of Speech and Hearing Research, 34, 340–346.
Menyuk, P. (1993). Children with specific language impairment (developmental dysphasia): Linguistic aspects. In G.
Blanken, J. Dittman, H. Grimm, J. C. Marshall, & C.-W. Wallesch (Eds.), Linguistic disorders and pathologies: An
international handbook (pp. 606–625). Berlin, Germany: deGruyter.
Merzenich, M., Jenkins, W., Johnston, P., Schreiner, C., Miller, S., & Tallal, P. (1996, January). Temporal processing
deficits of language-learning impaired children ameliorated by training. Science, 271, 77–81.
Neils, J., & Aram, D. (1986). Family history of children with developmental language disorders. Perceptual and Motor
Skills, 63, 655–658.
Nelson, K. E., Camarata, S., Welsh, J., Butkovsky, L., & Camarata, M. (1996). Effects of imitative and conversational
recasting treatment on the acquisition of grammar in children with specific language impairment and younger language-
normal children. Journal of Speech and Hearing Research, 39, 850–859.
Newhoff, M. (1977). Maternal linguistic behavior in relation to the linguistic and developmental ages of children.
Unpublished doctoral dissertation, Memphis State University, Tennessee.
Nippold, M. A. (1998). Later language development: The school-age and adolescent years (2nd ed.). Austin, TX: Pro-
Ed.
Olswang, L., Rodriguez, B., & Timler, G. (1998). Recommending intervention for toddlers with specific language
learning difficulties: We may not have all the answers, but we know a lot. American Journal of Speech-Language
Pathology, 7, 23–32.
Paul, R. (1996). Clinical implications of the natural history of slow expressive language development. American Journal
of Speech-Language Pathology, 5, 5–21.
Pembrey, M. (1992). Genetics and language disorder. In P. Fletcher & D. Hall (Eds.), Specific speech and language
disorders in children (pp. 51–62). San Diego, CA: Singular Press.
Plante, E. (1991). MRI findings in the parents and siblings of specifically language-impaired boys. Brain and Language,
41, 67–80.
Plante, E. (1996). Phenotypic variability in brain-behavior studies of specific language impairment. In M. Rice (Ed.),
Toward a genetics of language (pp. 317–335). Mahwah, NJ: Lawrence Erlbaum Associates.
Plante, E. (1998). Criteria for SLI: The Stark and Tallal legacy and beyond. Journal of Speech, Language, and Hearing
Research, 41, 951–957.
Plante, E., Shenkman, K., & Clark, M. (1996). Classification of adults for family studies of developmental language
disorders. Journal of Speech and Hearing Research, 39, 661–667.
Plante, E., Swisher, L., & Vance, R. (1989). Anatomical correlates of normal and impaired language in a set of dizygotic
twins. Brain and Language, 37, 643–655.
Plante, E., Swisher, L., Vance, R., & Rapcsak, S. (1991). MRI findings in boys with specific language impairment. Brain
and Language, 41, 52–66.
Rapin, I. (1996). Practitioner review: Developmental language disorders: A clinical update. Journal of Child Psychology
and Psychiatry, 37, 643–655.
Rapin, I., & Allen, D. (1983). Developmental language disorders: Nosologic considerations. In U. Kirk (Ed.),
Neuropsychology of language, reading, and spelling (pp. 155–184). New York: Academic Press.
Rapin, I., & Allen, D. (1988). Syndromes in developmental dysphasia and adult aphasia. In F. Plum (Ed.), Language,
communication, and the brain (pp. 57–75). New York: Raven Press.
Records, N. L., Tomblin, J. B., & Freese, P. (1992). The quality of life of young adults with histories of speech-language
impairment. American Journal of Speech-Language Pathology, 1, 44–53.
Rescorla, L. (1991). Identifying expressive language delay at age 2. Topics in Language Disorders, 11(4), 14–20.
Rice, M. L. (1996). Of language, phenotypes, and genetics: Building a cross-disciplinary platform for inquiry. In M. Rice
(Ed.), Toward a genetics of language (pp. xi–xxv). Mahwah, NJ: Lawrence Erlbaum Associates.
Rice, M. L., Wexler, K., & Cleave, P. (1995). Specific language impairment as a period of extended optional infinitive.
Journal of Speech and Hearing Research, 38, 850–863.
Roby, C. (1994). When learning is tough: Kids talk about their learning disabilities. Morton Grove, IL: Albert Whitman
& Company.
Rutter, M., Mawhood, L., & Howlin, P. (1992). Language delay and social development. In P. Fletcher & D. Hall (Eds.),
Specific speech and language disorders in children (pp. 63–78). San Diego, CA: Singular Press.
Scarborough, H., & Dobrich, W. (1990). Development of children with early language delay. Journal of Speech and
Hearing Research, 33, 70–83.
Schachter, D. C. (1996). Academic performance in children with speech and language impairment: A review of follow-
up research. In J. H. Beitchman, N. J. Cohen, M. M. Konstantaras, & R. Tannock (Eds.), Language, learning, and
behavior disorders: Developmental, biological, and clinical perspectives (pp. 515–529). Cambridge, England:
Cambridge University Press.
Scientific Learning Corporation. (1998). Fast ForWord [Computer software]. Berkeley, CA: Author.
Snow, C. E. (1996). Toward a rational empiricism: Why interactionism is not behavior any more than biology is
genetics. In M. L. Rice (Ed.), Toward a genetics of language (pp. 377–396). Mahwah, NJ: Lawrence Erlbaum Associates.
Stark, R. E., & Tallal, P. (1981). Selection of children with specific language deficits. Journal of Speech and Hearing
Disorders, 46, 114–122.
Stark, R. E., & Tallal, P. (1988). Language, speech, and reading disorders in children: Neuropsychological studies.
Boston: Little, Brown/College-Hill.
Stevenson, J. (1996). Developmental changes in the mechanisms linking language disabilities and behavior disorders. In
J. H. Beitchman, N. J. Cohen, M. M. Konstantaras, & R. Tannock (Eds.), Language, learning, and behavior disorders:
Developmental, biological, and clinical perspectives (pp. 78–99). Cambridge, England: Cambridge University Press.
Stothard, S. E., Snowling, M. J., Bishop, D. V. M., Chipchase, B. B., & Kaplan, C. A. (1998). Language-impaired
preschoolers: A follow-up into adolescence. Journal of Speech, Language, and Hearing Research, 41, 407–418.
Tager-Flusberg, H., & Cooper, J. (1999). Present and future possibilities for defining a phenotype for specific language
impairment. Journal of Speech, Language, and Hearing Research, 42, 1273–1278.
Tallal, P. (1976). Rapid auditory processing in normal and disordered language development. Journal of Speech and
Hearing Research, 19, 561–571.
Tallal, P., & Piercy, M. (1973). Developmental aphasia: Impaired rate of non-verbal processing as a function of sensory
modality. Neuropsychologia, 11, 389–398.
Tallal, P., Ross, R., & Curtiss, S. (1989). Familial aggregation in specific language impairment. Journal of Speech and
Hearing Disorders, 54, 167–173.
Tallal, P., Stark, R. E., Kallman, C., & Mellits, D. (1980). Developmental aphasia: The relation between acoustic
processing deficits and verbal processing. Neuropsychologia, 18, 273–284.
Tallal, P., Miller, S., Bedi, G., Byma, G., Wang, X., Nagarajan, S., Schreiner, C., Jenkins, W., & Merzenich, M. (1996,
January). Language comprehension in language-learning impaired children with acoustically modified speech. Science,
271, 81–84.
Teszner, D., Tzavaras, A., Gruner, J., & Hécaen, H. (1972). L’asymétrie droite-gauche du planum temporale: À propos
de l’étude anatomique de 100 cerveaux [Right-left asymmetry of the planum temporale: Apropos of the anatomical study
of 100 brains]. Revue Neurologique, 146, 444–449.
Tomblin, J. B. (1989). Familial concentration of developmental language impairment. Journal of Speech and Hearing
Disorders, 54, 287–295.
Tomblin, J. B. (1996a, June). The big picture of SLI: Results of an epidemiologic study of SLI among kindergarten
children. Paper presented at the Symposium for Research in Child Language Disorders, Madison, WI.
Tomblin, J. B. (1996b). Genetic and environmental contributions to the risk for specific language impairment. In M. Rice
(Ed.), Toward a genetics of language (pp. 191–210). Mahwah, NJ: Lawrence Erlbaum Associates.
Tomblin, J. B., & Buckwalter, P. (1994). Studies of genetics of specific language impairment. In R. Watkins & M. Rice
(Eds.), Specific language impairments in children (pp. 17–35). Baltimore: Paul H. Brookes.
Tomblin, J. B., Freese, P., & Records, N. (1992). Diagnosing specific language impairment in adults for the purpose of
pedigree analysis. Journal of Speech and Hearing Research, 35, 832–843.
Tomblin, J. B., Records, N. L., Buckwalter, P., Zhang, X., Smith, E., & O’Brien, M. (1997). Prevalence of specific
language impairment in kindergarten children. Journal of Speech, Language, and Hearing Research, 40, 1245–1260.
Trauner, D., Wulfeck, B., Tallal, P., & Hesselink, J. (1995). Neurologic and MRI profiles of language impaired children.
Technical Report CND-9513. Center for Research in Language, University of California at San Diego.
van der Lely, H. (1996). Specifically language impaired and normally developing children: Verbal passive vs. adjectival
passive interpretation. Lingua, 98, 243–272.
van Kleeck, A., Gillam, R. B., & Davis, B. (1997). When is “watch and see” warranted? A response to Paul’s 1996
article “Clinical implications of the natural history of slow expressive language development.” American Journal of
Speech-Language Pathology, 6, 34–39.
Vargha-Khadem, F., Watkins, K., Alcock, K., Fletcher, P., & Passingham, R. (1995). Praxic and nonverbal cognitive
deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National
Academy of Sciences, 92, 930–933.
Veale, T. K. (1999). Targeting temporal processing deficits through Fast ForWord: Language therapy with a new twist.
Language, Speech, and Hearing Services in Schools, 30, 353–362.
Wallach, G. P., & Butler, K. G. (1994). Language learning disabilities in school-age children and adolescents: Some
principles and applications. New York: Macmillan.
Watkins, R. V. (1994). Specific language impairments in children: An introduction. In R. V. Watkins & M. L. Rice
(Eds.), Specific language impairments in children (pp. 1–15). Baltimore: Paul H. Brookes.
Wender, E. (1995). Hyperactivity. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp.
185–194). Boston: Little, Brown.
Weiner, P. (1974). A language-delayed child at adolescence. Journal of Speech and Hearing Disorders, 39, 202–212.
Wilson, B., & Risucci, D. (1986). A model for clinical-quantitative classification. Generation I: Application to language-
disordered preschool children. Brain and Language, 27, 281–309.
CHAPTER 6

Children with Mental Retardation

Defining the Problem
Suspected Causes
Special Challenges in Assessment
Expected Pattern of Strengths and Weaknesses
Related Problems
Tracy, a 10-year-old with Down syndrome, attends a regular classroom, where her voice often rings out as she expresses
exuberant enthusiasm for all the fun things that happen. Tracy speaks in short sentences that are frequently difficult to
understand. Although she sometimes shows considerable frustration with others’ not understanding her, most of the time
Tracy appears oblivious to their lack of understanding. A speech-language pathologist works with her on goals related
to syntax and intelligibility, usually within the classroom.
Seth, a 4-year-old with cerebral palsy and epilepsy as well as mental retardation, attends a special preschool classroom
irregularly because of his frequent illnesses. In the classroom, he spends much of his time in a wheelchair or adaptive
seat, which was designed to provide him with the postural support needed for him to control his head movements. In
addition to working with him in the classroom, a speech-language pathologist visits his home once a week to work with
Seth and his mother. Seth vocalizes infrequently and often seems unaware of others in his environment. Goals for him
include establishing nonverbal turn-taking skills and increasing the frequency of his vocalizations.
Jake is a 12-year-old boy with mild mental retardation associated with fetal alcohol syndrome. Although his
comprehension skills test within the normal range, and he is generally understandable in his language production, Jake
has considerable difficulty in following directions in school. He has been diagnosed with ADD and requires frequent
redirecting to stay involved in classroom activities. Although he is eager to establish friendships with his classmates, his
ability to use social cues to guide his communications appears inconsistent. Intervention for Jake includes individual
attention within the classroom and participation in a social skills group with the speech-language pathologist once a
week.
Defining the Problem
Tracy, Seth, and Jake are representative of the approximately 3% of school-age children in the United States who exhibit
problems associated with mental retardation (Roeleveld, Zielhuis, & Gabreels, 1997), where mental retardation can be
defined as reduced intelligence accompanied by reduced adaptive functioning, that is, reduced ability to function in
everyday situations in a manner considered culturally and developmentally appropriate. Because communication is a
particularly important adaptive function affected by mental retardation, speech-language pathologists often work with
affected children and their families.
About 85% of children with mental retardation experience mild problems (Lubetsky, 1990) and may not be identified as
mentally retarded until they reach school age. Children with more significant degrees of impairment are often identified
at an earlier point because their delays in achieving developmental milestones are more pronounced and because they
often have additional medical difficulties, such as cerebral palsy or epilepsy (Durkin & Stein, 1996). Although mental
retardation is usually present from birth, it can also result from conditions arising at any point up to 18 years of age,
including exposure to environmental toxins such as lead during the first few years of life.
Despite the brief definition offered earlier, formulating a more complete, usable definition of mental retardation that is
equally acceptable to families, advocates, scientists, clinicians, and politicians has proved controversial and difficult—
some would say impossible—particularly where milder forms of retardation are concerned (Baumeister, 1997; Roeleveld
et al., 1997). Table 6.1 provides two of the most influential definitions currently being used—those proposed by the
American Association on Mental Retardation (AAMR) and the American Psychiatric Association.
The AAMR and American Psychiatric Association definitions both specify impairment in adaptive skills as a critical
element in the identification process. Traditionally, IQ score alone, with less attention to adaptive skills, was central to
the identification process. These two newer definitions address essentially the same adaptive skills (viz., communication,
self-care, home living, social skills, community use, self-direction, health and safety, functional academics, leisure, and
work). Despite this uniformity, however, these definitions are still quite controversial because of significant concerns
Table 6.1
Two Influential Definitions of Mental Retardation

American Association on Mental Retardation (Luckasson, 1992)


Mental retardation refers to substantial limitations in present functioning. It is characterized by significantly
subaverage intellectual functioning, existing concurrently with related limitations in two or more of the following
applicable adaptive skill areas: communication, self-care, home living, social skills, community use, self-direction,
health and safety, functional academics, leisure, and work. Mental retardation manifests before age 18.
American Psychiatric Association (1994)
Diagnostic criteria:
A. Significantly subaverage intellectual functioning: an IQ of approximately 70 or below on an individually
administered IQ test (for infants, a clinical judgment of significantly subaverage intellectual functioning);
B. Concurrent deficits or impairments in present adaptive functioning (i.e., the person’s effectiveness in meeting the
standards expected for his or her age by his or her cultural group) in at least two of the following areas:
communication, self-care, home living, social and interpersonal skills, use of community resources, self-direction,
functional academic skills, work, leisure, health, and safety;
C. The onset is before age 18 years.

about the lack of valid measures for many adaptive skill areas (e.g., Jacobson & Mulick, 1996; Macmillan & Reschly,
1997) and because of debates about the number of dimensions needed to capture adaptive functioning (Simeonsson &
Short, 1996).
Although not evident in Table 6.1, the complete AAMR and American Psychiatric Association definitions differ sharply
in their handling of severity. Whereas the American Psychiatric Association definition maintains a traditional treatment
of severity using a system with five levels (see Table 6.2), the AAMR system (Luckasson, 1992) replaces those with the
description of levels of support needed by the individual (intermittent, limited, extensive, and pervasive) for intellectual
ability and for each adaptive skill separately. Because treatment recommendations are often formulated on the basis of
severity (Durkin & Stein, 1996), this change in the AAMR definition represents a major departure from long-standing
practice.
Table 6.2
Degrees of Severity of Mental Retardation Used
by the American Psychiatric Association (DSM–IV, 1994)

Degree                                       IQ Level
Mild mental retardation                      50–55 to approximately 70
Moderate mental retardation                  35–40 to 50–55
Severe mental retardation                    20–25 to 35–40
Profound mental retardation                  Below 20 or 25
Mental retardation, severity unspecified     Used when there is a strong presumption of mental retardation but the
                                             individual cannot be tested using standardized instruments
Radical changes in definitions such as those just described can affect the ways in which governmental and other agencies
determine which children are eligible for assistance. They also affect researchers who must identify the group of
individuals to whom their research can be generalized and clinicians as they work within the bureaucracy to help
affected children and their families (Macmillan & Reschly, 1997). Therefore, although wrangles over definitions can
seem irrelevant to a basic understanding of language impairment and its assessment in children with mental retardation,
they are powerful in determining how such children can be helped. For example, depending on which of the two
definitions described in this section is used and exactly how it is implemented, Jake, the third child described at the
beginning of the chapter, might not be identified as a child requiring special attention in the school setting.
Suspected Causes
Until the past decade, only about 25% of cases of mental retardation were associated with known organic causes (e.g.,
Down syndrome, perinatal trauma; Grossman, 1983). Recent advances, however, bring that figure up to about 50%
(American Psychiatric Association, 1994; Baumeister, 1997), with a wide range of organic causes now identified. Such
causes are often associated with more severe cases of mental retardation (Rosenberg & Abbeduto, 1993).
Organic Causes
Classification of the many pre-, peri-, and postnatal organic causes of mental retardation reveals human vulnerability to a
myriad of factors that can alter later neurologic development and function. Table 6.3 presents a lengthy but far from
complete list of predisposing factors. Knowledge of causation can help in efforts to prevent retardation in some
individuals, to counsel families regarding its likelihood of recurring in later children, and to develop treatments that can
prevent or ameliorate long-term negative consequences.
Three important known causes of mental retardation are Down syndrome, fragile X syndrome, and fetal alcohol
syndrome. Each of these conditions is described as a syndrome because it is associated with a “common set of physical
traits or malformations sharing a similar prognosis” (Batshaw & Perret, 1981). Two of these syndromes have genetic
causes; the third, fetal alcohol syndrome, has a preventable cause—namely, intrauterine exposure to alcohol, a powerful
toxin to the developing brain. Consideration of these syndromes demonstrates the intimate connections between the
cause of mental retardation and the nature of communication and other difficulties confronting affected children
(Cromer, 1981; Hodapp & Dykens, 1994; Hodapp, Leckman, Dykens, Sparrow, Zelinsky, & Ort, 1992; cf. Hodapp &
Zigler, 1990).
Down syndrome and fragile X syndrome are the most common genetic birth defects associated with mental retardation.
Beginning to understand these two conditions, therefore, depends on at least a bare-bones grasp of human genetics,
which will be offered here. More lengthy treatments can be found in resources such as M. M. Cohen (1997).
Table 6.3
Categories of Organic Predisposing Factors Associated With Mental Retardation (American Psychiatric Association, 1994, p. 43)

Heredity (about 5% of cases)
Inborn errors of metabolism (e.g., Tay-Sachs disease)
Single-gene abnormalities (e.g., tuberous sclerosis)
Chromosomal aberrations (e.g., fragile X syndrome, a small number of cases of Down syndrome)

Early alterations of embryonic development (about 30% of cases)
Chromosomal changes (most cases of Down syndrome, i.e., those due to trisomy 21)
Prenatal damage due to toxins (e.g., maternal alcohol consumption, infections)

Pregnancy and perinatal problems (about 10% of cases)
Fetal malnutrition, prematurity, hypoxia (oxygen deficiency), viral and other infections, and trauma

General medical conditions acquired in infancy or childhood (about 5% of cases)
Infections, traumas, and poisoning (e.g., due to lead)
Among the most basic facts of genetics is that all cells in the human body, except the reproductive cells (sperm in men and ova in women), contain 23 pairs of chromosomes. These 23 chromosome pairs
consist of 22 pairs of numbered autosomes and 1 pair of sex chromosomes, which are identified as XX for women and
XY for men. These chromosomes, which hold many individual genes, act as the blueprints for cell function and thus
determine an individual’s physical make-up.
Unlike other human cells, ova and sperm cells have half the usual number of chromosomes: 22 nonpaired autosomes and one sex chromosome, for a total of 23. During the reproductive process, this feature of reproductive cells allows each parent to
contribute one half of each offspring’s genetic material as the genetic materials of both reproductive cells are combined
during fertilization. Because chromosomes contain numerous genes, defects to either the larger chromosomes or to
individual genes can result in impaired cellular function during embryonic development and later life.
Down syndrome is an example of an autosomal genetic disorder in which extra genetic material is found at chromosome
pair 21. This condition arises about once in every 800 live births, making it the most common genetic disorder associated
with mental retardation. About 95% of the time, Down syndrome occurs because an entire extra chromosome is present, resulting in the individual’s possessing three copies of chromosome 21 (known as trisomy 21) instead of the normal pair (Bellenir, 1996). Figure 6.1 illustrates the complete set of chromosomes associated with a girl who has Down syndrome.

Fig. 6.1. Graphic representation of the genetic test used to identify the presence of trisomy 21. From Babies With Down Syndrome: A New Parents Guide (p. 8), by K. Stray-Gunderson (Ed.), 1986, Kensington, MD: Woodbine House. Copyright 1986 by Woodbine House. Reproduced with permission.
Less frequently, Down syndrome is associated with only a portion of an extra chromosome occurring at chromosome 21
or with the occurrence of an entire extra chromosome 21, but only in some cells within the body (termed mosaic Down
syndrome). Usually the chromosomal defect occurs during the development of an individual ovum, but it can occur
because of a sperm defect or a defect occurring after the uniting of the sperm and ovum in fertilization. Because of this
timing of the change in the genetic material, Down syndrome is described as a genetic disorder but not an inherited one (i.e., one in which both parent and child are affected).
Down syndrome is associated with a characteristic physical appearance, involving slanted eyes, small skin folds on the
inner corner of the eyes (epicanthal folds), slightly protruding lips, small ears, an overly large tongue (macroglossia), and
short hands, feet, and trunk (Bellenir, 1996). Figure 6.2 shows two young children with this syndrome.
Other more serious physical anomalies found among children with Down syndrome affect the cervical spine, bowel,
thyroid, eyes, and heart (Cooley & Graham, 1991). Children with Down syndrome are more susceptible to infection,
including otitis media (Cooley & Graham, 1991), and are 20 times more likely than other children to develop and die from leukemia.

Fig. 6.2. Two children with Down syndrome.

Because
of these abnormalities, as a group their life expectancy is somewhat shortened, despite recent advances in the correction
of congenital heart defects, improved control of infections, and avoidance of institutionalization. Roughly 80% of these
children will live to the age of 30 and beyond (Cooley & Graham, 1991). Adults with Down syndrome have also been
shown to be at increased risk for the onset of Alzheimer’s-like dementia, or decline in intellectual function (Connor &
Ferguson-Smith, 1997; Zigman, Schupf, Zigman, & Silverman, 1993).
Fragile X syndrome is currently thought to be the single most common inherited cause of mental retardation (Baumeister
& Woodley-Zanthos, 1996). Although it occurs less frequently than Down syndrome—the most frequent genetic cause
of mental retardation—fragile X is more frequently inherited than Down syndrome because Down syndrome is almost
never passed from one generation to the next.
Fragile X occurs about once in every 1250 to 2500 men and about half that often in women (Bellenir, 1996). Although
fragile X can occur in either gender, it is more often associated with mental retardation in affected men. When mental
retardation occurs, it can range from mild to profound levels, with generally milder impairments in affected women
(Dykens, Hodapp, & Leckman, 1994). Because its patterns of inheritance are more complex than those seen in other
previously identified genetic disorders, fragile X was only identified in the 1970s (Lehrke, 1972).
Fragile X syndrome involves the single gene FMR1, present on the X chromosome, which can be defective or absent. A
partially defective gene is referred to as a premutation and may be associated with very mild or even no obvious problems in the affected person. When the defect is
greater, or the gene FMR1 is absent, more serious problems, including severe to profound mental retardation, are the
likely outcome.
Fragile X syndrome is inherited through an X-linked mode of transmission (similar to hemophilia) in which some
individuals are “carriers” (usually women) and others are affected individuals (usually men). Fathers have only one X
and one Y chromosome. Consequently, they can only transmit a defective X chromosome to a daughter, who will have
received a second X from her mother (who has two X and no Y chromosomes). Because only one of the two X
chromosomes in a girl is likely to be active, it is possible for daughters to appear unaffected, but to be carriers of the
defective chromosome. They can also be affected, however, if they possess two defective X chromosomes or if the
defective X chromosome for some reason is the active one. About one third of those girls with the defective gene will be
of normal intelligence, one third will have borderline intelligence, and one third will have greater degrees of mental
retardation (American College of Medical Genetics, 1997). About 50% of the male offspring of carrier women will
demonstrate fragile X syndrome (Dykens et al., 1994), and most of these children will have mental retardation.
Boys with fragile X and mental retardation often share the following physical traits: a long, narrow face; long, thick,
prominent ears; and overly large testicles (Dykens et al., 1994). Figure 6.3 shows two youngsters with this condition.
Beyond the physical traits noted throughout life, children are at risk for obesity during adolescence. On the basis of a
smaller number of studies than those undertaken for males with fragile X, it appears that females with fragile X show traits similar to those of males, although to a lesser extent. Conditions that tend to accompany mental retardation in
children with fragile X are ADD and ADHD, anxiety and mood difficulties, as well as auditory and visual problems
(Dykens et al., 1994). Considerable controversy has surrounded the relationship between fragile X and autistic disorder
(chap. 7, this volume; I. L. Cohen, 1995). There has been some speculation that the rate of co-occurrence may be due to
the level of mental retardation rather than to etiology (Dykens et al., 1994). However, work by I. L. Cohen (1995)
suggests that boys with both autism and fragile X are more significantly impaired than would be expected if the effects
of each condition were simply additive.
Fetal alcohol syndrome (FAS) refers to the constellation of physical abnormalities, deficient growth patterns, and
cognitive and behavioral problems found in children whose mothers drank heavily during pregnancy. Fetal alcohol effect
(FAE) is a closely related diagnosis in which only some portion of the constellation of abnormalities described for FAS
is seen in the affected child (Stratton, Howe, & Battaglia, 1996).
Although a possible connection between alcohol consumption by mothers during pregnancy and subsequent birth defects
has been known throughout history, only in the late 1960s and early 1970s was FAS formally described (Stratton et al.,
1996). Despite its having received considerable attention only recently, FAS has been proposed as the “most common
known nongenetic cause of mental retardation” (Stratton et al., 1996, p. 7), with estimates of incidence ranging from 0.5
to 3 births per 1000 live births (Stratton et al., 1996). The higher of these incidence figures makes FAS a more frequent
cause of mental retardation than either Down syndrome or fragile X syndrome.
In addition, it is thought to be widely underdiagnosed (Maxwell & Geschwint-Rabin, 1996).

Fig. 6.3. Two young boys with fragile X syndrome.
Alcohol is one of many different substances well known to be toxic to the developing central nervous system. However,
the specific mechanism by which alcohol consumption leads to the variety of difficulties seen in FAS or FAE is poorly
understood (Baumeister & Woodley-Zanthos, 1996). In general, the magnitude and nature of a toxin’s effects on prenatal
development are thought to be closely related to the amount of the toxin, the timing of exposure, and the genetic make-
up of the mother and child (Stratton et al., 1996). Currently, however, little is known about how those variables interact
to produce the broad range of effects seen in children with full FAS or with FAE. Particularly puzzling are observations
that some women who drink very heavily throughout their pregnancy can give birth to unaffected children, whereas
other women who drink far less can give birth to children with severe symptoms. This uncertainty about how damage is
caused has resulted in strong prohibitions against drinking during pregnancy until more is known about what, if any,
degree of exposure is safe.
As a group, children with full FAS tend to have mild mental retardation, but for individual children, cognitive levels can
range from severe retardation to normal function. In addition to mental retardation, cardiac and skeletal abnormalities and vision problems have also been noted.
Facial abnormalities apparent during early childhood include the presence of epicanthal folds (such as those seen in
Down syndrome), eyelids that are overly narrow from inner to outer corner, a flat midface, smooth or long philtrum (area
above the upper lip), and thin upper lip (Sparks, 1993). These facial features are sometimes less pronounced in infancy
and after childhood, so they are not as useful as indicators of this problem for some age groups as for others. Congenital
hearing loss is another area of increased risk (Stratton et al., 1996). Figure 6.4 shows two youngsters affected by FAS.
Nonorganic Suspected Causes
Despite the growing frequency with which biological causes of mental retardation are identified, about half of all cases
of mental retardation do not have such well-defined explanations. In such cases, the degree of retardation tends to be
milder, and the retardation tends to be associated with a family history of mental retardation and low SES (Rosenberg &
Abbeduto, 1993). Historically, such cases were classified as “nonorganic” or “familial” mental retardation.

Fig. 6.4. Two youngsters with fetal alcohol syndrome. From Fetal Alcohol Syndrome: Diagnosis, Epidemiology,
Prevention, and Treatment (Figure 1-1, p. 18), by K. Stratton, C. Howe, & F. Battaglia (Eds.), 1996, Washington, DC:
National Academy Press. Copyright 1996 by National Academy Press. Reproduced with permission.
Despite the implications that these cases involve social or experiential bases, there is considerable speculation that
nonorganic cases of mental retardation may actually reflect our current lack of knowledge rather than truly nonorganic
causes (Baumeister, 1997; Richardson & Koller, 1994). Many cases now identified as nonorganic may be recategorized
as the relationship of low SES and family history to exposure to environmental toxins (e.g., lead), poor nutrition, and
other ultimately organic causes are uncovered. The one major, truly nonorganic factor associated with mental retardation
is severe social deprivation, as a result of either inadequate institutional conditions or limitations of a child’s principal
caregiver (Richardson & Koller, 1994). Yet even that mechanism may act by depriving the infant’s maturing nervous
system of the proper inputs to promote specific physiological states required for brain development.
Special Challenges in Assessment
One of the most important things to keep in mind when trying to understand any child is his or her uniqueness—the
uniqueness of current strengths and weaknesses, history, and family situation. Most important is remembering the uniqueness that makes them “Tracy” or “Seth” or ‘‘Jake,” rather than just the child with a particular syndrome and pattern of deficits. Assessing children with mental retardation tempts some clinicians to equate them with their level of retardation or its etiology and to attend to what they cannot do rather than to what they are doing in their communications. Personal Perspective 6 hints at the negative effects of such a mistake.
PERSONAL PERSPECTIVE
The following passage is taken from a book written by a pair of young adult friends who have each been diagnosed
with Down syndrome. The title of their book is Count Us in: Growing up With Down Syndrome (Kingsley & Levitz,
1994, p. 35).
August ‘90
Mitchell: I wish I didn’t have Down syndrome because I would be a regular person, a regular mainstream normal
person. Because I didn’t know I had Down syndrome since a long time ago, but I feel very special in many ways. I feel
that being with, having Down syndrome, there’s more to it than I expected. It was very difficult but… I was able to
handle it very well.
Jason: I’m glad to have Down syndrome. I think it’s a good thing to have for all people that are born with it. I don’t
think it’s a handicap. It’s a disability for what you’re learning because you’re learning slowly. It’s not that bad. (p. 35)
How do you avoid these temptations? First, plan assessments based on initial hypotheses about developmental levels and patterns of impairment (described in the next section) and on information obtained from caregivers or others who know the child well. Framing the assessment questions with special clarity can help you anticipate the particular challenges
individual children might pose to the validity of conventional instruments.
Second, prepare to alter your plan as needed to keep the child engaged and interacting. Not only does this mean that you
may need to turn away from a standardized instrument midstream (e.g., if it is developmentally inappropriate) in favor of
a more informal or dynamic assessment method (see chap. 10), you may also want to consider the use of adaptations.
Test adaptations are changes made in the test stimuli, response required of the child, or testing procedures (Stagg, 1988;
Wasson, Tynan, & Gardiner, 1982). On the one hand, the use of test adaptations threatens the validity of norm-
referenced comparisons that may be made using the instrument. Therefore, if a clinical question that really requires that
kind of comparison is at stake (e.g., an initial evaluation in which a difference from norms must be demonstrated to help
a child receive services), the clinician will avoid adaptations if possible. On the other hand, when some aspect of the
standard administration other than the basic skill or knowledge being tested interferes with a child’s ability to reveal his
or her actual skill or knowledge, one can argue that the validity of the comparison has already been severely
compromised. Table 6.4 lists some of the most common adaptations used. Regardless of which adaptations are used, they
should be described in reports of test results, and the clinician should comment on the extent to which these adaptations are likely to interfere with the valid use of norms.

Table 6.4
Examples of Testing Adaptations Used Frequently With Children With Mental Retardation and Frequent Coexisting Problems (Stagg, 1988)

Reason for Adaptation: Attention and motivation
Recommended Adaptations:
Increased use of social, tangible, and activity reinforcers (Fox & Wise, 1981)
Breaking up administration into smaller periods of time to maximize attention
Use of auditory commands or visual cueing (e.g., with a light pen) to direct attention prior to each item (Wasson, Tynan, & Gardiner, 1982)

Reason for Adaptation: Motor skills
Recommended Adaptations:
Replacement of tabletop administration to position a child to achieve optimal motor performance
Use of alternative response modes (e.g., gaze, head pointers, oral instead of pointing; Wasson, Tynan, & Gardiner, 1982)
Removal of response time restrictions
Breaking up administration into smaller periods of time to address fatigue

Reason for Adaptation: Hearing
Recommended Adaptations:
Substitution of sign for oral presentation
Addition of gesture or sign to oral presentation (Wasson, Tynan, & Gardiner, 1982)
Positioning to enhance the child’s access to visual information and to optimize residual hearing

Reason for Adaptation: Vision
Recommended Adaptations:
Substitution of high-contrast or larger stimuli for standard visual stimuli
Placement of all stimuli within the child’s visual field (as determined prior to testing)
Related to the use of adaptations is a method that Sattler (1988) has proposed as a follow-up to standardized test administration: testing of limits. A test of limits involves (a) providing additional cues, (b) changing test modality (e.g.,
from written to oral), (c) establishing methods used by the tested child, (d) eliminating time limits, and (e) asking
probing questions designed to clarify a child’s thinking leading to a response. It is meant to help the clinician gain an
appreciation of how a child has approached the task and what aspects of it interfered with success. It is closely related to
dynamic assessment approaches, which I describe in greater detail in chapter 10.
Special issues in testing that I discuss in later chapters of the book are out-of-level testing and discrepancy testing.
Except for brief definitions, these topics are not addressed here because they are also relevant to some of the other
groups of children discussed in the next few chapters. Out-of-level testing (Berk, 1984) refers to the use of an instrument
developed for children of a different age group from that of the child to be tested. In the context of children with mental
retardation, this is done in order to use content that is developmentally appropriate. This practice is discussed again in
chapter 10.
Discrepancy testing refers to the comparison of performances in two different behavioral or skill areas (e.g., between
ability and achievement) to determine whether a discrepancy exists. This kind of testing is important for children with
mental retardation because it will often be required as part of the procedures dictated within an educational system to
justify the provision of specific kinds of assistance. This topic is discussed repeatedly throughout this book, but
especially in chapters 9 and 10, because it represents one of the greatest contemporary challenges to assessment.
Expected Pattern of Strengths and Weaknesses
In psychology and special education, level of mental retardation has played a much greater role than etiology in the
identification of participants for research studies and the development of treatment approaches (Baumeister, 1997;
Hodapp & Dykens, 1994). However, there is a growing sensitivity that both etiology (e.g., Down syndrome, fragile X
syndrome) and level of mental retardation (viz., mild, moderate, severe, profound) provide useful bases for some
tentative predictions regarding likely patterns of behavioral strengths and weaknesses (Miller & Chapman, 1984).
Syndromes for which communication skills have been extensively studied are Down syndrome and, to a lesser extent,
fragile X syndrome. Several other syndromes, such as Williams syndrome (Bellugi, Marks, Bihrle, & Sabo, 1993; Mervis, 1998), Prader-Willi syndrome (Donaldson, Shu, Cooke, Wilson, Greene, & Stephenson, 1994), and Turner syndrome (Downey, Ehrhardt, Gruen, Bell, & Morishima, 1989), have begun to be studied.
Table 6.5 summarizes tentative patterns of strengths and weaknesses as they have been suggested for children with
Down syndrome, fragile X, FAS, and Williams syndrome, a congenital metabolic disease usually associated with
moderate to severe learning difficulties. (Williams syndrome was not discussed previously in this chapter because of its rarity.)

Table 6.5
Patterns of Strengths and Weaknesses Among Children With Mental Retardation Associated With Down Syndrome, Fragile X Syndrome, Fetal Alcohol Syndrome, and Williams Syndrome

Down syndrome
Relative strengths in communication:
Semantics (Rondal, 1996)
Pragmatics (e.g., turn-taking, diversity of speech acts; Rondal, 1996)
Nonverbal social interaction skills (Hodapp, 1996)
Relative weaknesses in communication:
Morphology (Fowler, 1990; Rondal, 1996)
Syntax (Fowler, 1990; Rondal, 1996)
Phonology (Rondal, 1996)
Expressive skills relative to receptive skills (Dykens, Hodapp, & Leckman, 1994)
Plateauing of development in the above areas from late childhood on (Rondal, 1996)
Auditory processing (Hodapp, 1996)
Nonverbal requesting behavior (Hodapp, 1996)
Increased risk of hearing loss (Bellenir, 1996)
Increased risk of fluency disorder (Bloodstein, 1995)
Other strengths:
Adaptive behavior (Hodapp, 1996)
“Pleasant personality” (Hodapp, 1996)
Other weaknesses:
Low task persistence (Hodapp, 1996)
Mathematics (Hodapp, 1996)
Inadequate motor organization (Hodapp, 1996)
Visually directed reaching (Hodapp, 1996)
Visual monitoring
Hypotonia (Hodapp, 1996)
Slow orienting to auditory information (Hodapp, 1996)

Fragile X syndrome (a)
Relative strengths in communication:
Expressive vocabulary skills (Rondal & Edwards, 1997)
Possibly syntax (although grammar has sometimes been identified as a weakness; Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997)
Relative weaknesses in communication:
Fluency abnormalities (e.g., perseverative and staccato speech, rate of speech, cluttering; Rondal & Edwards, 1997)
Pragmatics, especially poor eye contact and other autistic-like behaviors (Rondal & Edwards, 1997)
Phonology, difficulty in sequencing syllables (Dykens, Hodapp, & Leckman, 1994; Rondal & Edwards, 1997)
Other strengths:
Adaptive skills (especially personal and domestic skills; Dykens, Hodapp, & Leckman, 1994)
Other weaknesses:
Attention deficits and hyperactivity (Dykens, Hodapp, & Leckman, 1994)
Social avoidance and shyness (Dykens, Hodapp, & Leckman, 1994)

Fetal alcohol syndrome and fetal alcohol effect
Relative strengths in communication:
Most areas of language relatively unaffected
Relative weaknesses in communication:
Comprehension
Pragmatics (e.g., frequently tangential responses; Abkarian, 1992)
Other strengths:
Cognitive delays, when present, are usually mild
Other weaknesses:
Attentional problems or hyperactivity (Stratton, Howe, & Battaglia, 1996)
Increased risk for visual and hearing problems (Stratton, Howe, & Battaglia, 1996)
Increased risk for behavior problems (Stratton, Howe, & Battaglia, 1996)

Williams syndrome (b)
Relative strengths in communication:
Expressive language (Rondal & Edwards, 1997)
Morphology and syntax (Rondal & Edwards, 1997)
Lexical knowledge (Rondal & Edwards, 1997)
Metalinguistic knowledge (Rondal & Edwards, 1997)
Fluency and prosody (Rondal & Edwards, 1997)
Narrative skills (Rondal & Edwards, 1997)
Phonological skills (Rondal & Edwards, 1997)
Relative weaknesses in communication:
Receptive language (Udwin & Yule, 1990)
Pragmatic skills (socially inappropriate content, poor eye contact; Rondal & Edwards, 1997)
Other strengths:
Facial recognition (Rondal & Edwards, 1997)
Other weaknesses:
Severe visuospatial deficits (Rondal & Edwards, 1997)
Hyperacusis (negative sensitivity to noise), especially in younger children

(a) Patterns relate almost entirely to affected males because of the paucity of data on affected females.
(b) Patterns based on a very limited database.

Because the four groups of children described in Table 6.5 have experienced very different levels of scrutiny, they differ in the certainty with which these strengths and weaknesses are known (Hodapp & Dykens, 1994).
Specifically, children with Down syndrome have received much more attention than those with fragile X, who have, in
turn, received considerably more attention than those with Williams or FAS. Interestingly, there has even been some
work suggesting that the specific type of chromosomal abnormality underlying Down syndrome carries different prognoses for communication outcomes, with better communication skills predicted for children with mosaic Down syndrome than for those with the more common trisomy 21 (Rondal, 1996).
Related Problems
Children with mental retardation are at risk for a variety of additional health-related and social problems, particularly if
the retardation is more severe (American Psychiatric Association, 1994). For example, two medical conditions that occur
frequently among children with severe or profound mental retardation are epilepsy and cerebral palsy, with expected occurrence rates of 19–36% and 20–40%, respectively (Richardson & Koller, 1994).
Overall, children with mental retardation, regardless of etiology, appear to be at four times the normal risk level for
ADHD, although there is some question as to whether their attention problems are really manifestations of mental
retardation rather than an independent additional problem (Biederman, Newcorn, & Sprich, 1997). Other behavioral and
emotional problems are also observed more frequently among individuals with mental retardation than among others,
including conduct disorder, anxiety disorders, schizoid disorder, and depression (Eaton & Menolascino, 1982).
Often, the etiology of mental retardation is closely associated with risk levels for particular problems. For example,
different kinds of visual problems are found in children with Down syndrome than in children with fragile X syndrome.
Whereas children with Down syndrome will frequently experience nearsightedness and cataracts (Connor & Ferguson-
Smith, 1997; Lubetsky, 1990), children with fragile X syndrome will more commonly have strabismus, a problem in the
coordination of eye movements (Maino, Wesson, Schlange, Cibis, & Maino, 1991).
Children with developmental and speech delays have also been found to be at increased risk for maltreatment, including
physical abuse, sexual abuse, and neglect (Sandgrund, Gaines, & Green, 1974; Taitz & King, 1988). Given the close
contact that speech-language pathologists frequently have with their clients, this increased incidence of maltreatment
makes it particularly important for them to be aware of signs of maltreatment (Veltkamp, 1994).
Summary
1. Mental retardation, which affects about 3% of children in the United States, involves reduced intelligence and reduced
adaptive functioning.
2. More severe levels of mental retardation (i.e., moderate, severe, and profound) are often diagnosed relatively early,
but are relatively uncommon, affecting only 15%
of those children diagnosed with mental retardation. Mild mental retardation affects about 85% of children with mental
retardation but tends to be diagnosed later—sometimes not until school age.
3. Definitions of mental retardation proposed by the AAMR and the American Psychiatric Association differ primarily in
their characterization of severity, with the AAMR definition proposing levels of support needed for numerous
intellectual and adaptive functions in place of levels of impairment.
4. Increasingly, organic factors, as opposed to familial or nonorganic factors, are being identified as reasonable
explanations for cases of mental retardation. The three most common organic causes of mental retardation are Down
syndrome, fragile X syndrome, and FAS.
5. Down syndrome and fragile X syndrome are the most frequent genetic sources of mental retardation. Down syndrome
is almost always associated with a chromosomal abnormality, whereas fragile X syndrome is associated with an error
involving a single gene on the X chromosome.
6. FAS, which is usually associated with mild mental retardation, is considered the most frequent preventable cause of
mental retardation.
7. Assessment challenges include the need for particularly careful selection of developmentally appropriate instruments, an increased need for less formal measures because of the lack of appropriate standardized measures, and the need to adapt tests to help ensure that aspects of the child’s difficulties unrelated to the concept being tested do not prevent successful performance.
8. Expected patterns of communication performance are related to level of mental retardation and to etiology.
Key Concepts and Terms
adaptive functioning: the ability to function in everyday situations in a manner considered culturally and
developmentally adequate.
autosomes: the most common type of chromosome within the human cell. They are usually contrasted with the sex
chromosomes, which typically consist of a single pair (XX for women and XY for men).
chromosomes: structures within human cells that carry the genes that act as blueprints for cell function.
dementia: a significant decline in intellectual function, usually after a period of normal intellectual function.
discrepancy testing: the comparison of performances in two different behavioral or skill areas (e.g., between ability and
achievement) to determine whether a discrepancy exists; often used as a requirement for services in education systems.
Down syndrome: an autosomal genetic disorder that is considered the most common genetic abnormality resulting in
mental retardation. It is associated with mild to severe mental retardation and particularly marked difficulties with syntax
and phonology.
fetal alcohol effect (FAE): a diagnosis related to FAS, in which some but not all of the abnormalities required for a
diagnosis of FAS are observed.
Fetal alcohol syndrome (FAS): the constellation of physical abnormalities, deficient growth patterns, and cognitive and
behavioral problems found in children with a significant prenatal exposure to alcohol.
fragile X syndrome: the most common inherited cause of mental retardation; it is related to an X-chromosome
abnormality that may be passed through several generations before becoming severe enough to result in mental
retardation. The syndrome more commonly affects men than women.
mental retardation: reduced intelligence accompanied by reduced adaptive functioning.
mosaic Down syndrome: an uncommon form of Down syndrome, occurring in fewer than 5% of cases, in which trisomy 21
affects only some rather than all cells in the body.
out-of-level testing: the use of an instrument developed for children whose age differs from that of the child to be tested
(Berk, 1984).
premutation: a gene that is somewhat defective but not associated with significant abnormalities, as can happen in
families where fragile X syndrome is subsequently identified.
sex chromosomes: gene-bearing chromosomes associated with gender-related characteristics; these are related to
numerous birth defects in which patterns of transmission appear to be affected by gender.
strabismus: a problem in eye movement coordination, sometimes referred to as crossed eyes.
trisomy 21: the most typical chromosomal abnormality in Down syndrome, consisting of a third chromosome 21.
Williams syndrome: a congenital metabolic disease usually associated with moderate to severe learning difficulties.
Study Questions and Questions to Expand Your Thinking
1. What are the major common components of the definitions of mental retardation provided in this chapter?
2. Describe three possible co-occurring problems that may affect the communication and test-taking behaviors of a child
with mental retardation.
3. What is the most common inherited cause of mental retardation? What is the most common preventable cause?
4. Determine the definition for mental retardation used in a school system near you. How does that definition compare to
those of the AAMR and the American Psychiatric Association?
5. One test of adaptive skills that is frequently used is the Vineland Adaptive Behavior Scales (Sparrow, Balla, &
Cicchetti, 1984). Examine that measure in terms
of items related to communication. What language domains (e.g., semantics, syntax, morphology, pragmatics) and what
language modalities (speaking, listening, writing, reading) are emphasized?
6. Using a format like that used in Table 6.5, identify a syndrome not described in this chapter (e.g., Prader-Willi
syndrome, cri du chat) and prepare a brief list of expected patterns of language and communication.
7. Examine the test manual of a language test to determine (a) what, if anything, is said about the appropriateness of the
measure for a child with mental retardation, and (b) what aspects of one or more tasks included in the test might be
incompatible with the characteristics of the following children:
● a child with severe cerebral palsy and moderate retardation whose only reliable response mode is a slow, effortful pointing response;
● a child with mild retardation but severe attention and motivational problems; and
● a child with Down syndrome who has moderate retardation and a severe visual impairment.
Recommended Readings
Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand
Oaks, CA: Sage.
Hersen, M., & Van Hasselt, V. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A
casebook. Newbury Park, CA: Sage.
Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular.
Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents' guide. Kensington, MD: Woodbine
Press.
References
Abkarian, G. (1992). Communication effects of prenatal alcohol exposure. Journal of Communication Disorders, 25(4),
221–240.
American College of Medical Genetics. (1997). Policy statement on fragile X syndrome: Diagnostic and carrier testing
[On-line]. Available: http://www.faseb.org/genetics/acmg/pol-16.htm.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington,
DC: Author.
Batshaw, M. L., & Perret, Y. M. (1981). Children with handicaps: A medical primer. Baltimore: Brookes Publishing
Company.
Baumeister, A. A. (1997). Behavioral research: Boom or bust? In W. E. MacLean, Jr. (Ed.), Ellis’ handbook of mental
deficiency, psychological theory and research (3rd ed., pp. 3–45). Mahwah, NJ: Lawrence Erlbaum Associates.
Baumeister, A. A., & Woodley-Zanthos, P. (1996). Prevention: Biological factors. In J. W. Jacobson & J. A. Mulick
(Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 229–242). Washington, DC: APA.
Bellenir, K. (1996). Facts about Down syndrome. In Genetic disorders handbook (pp. 3–14). Detroit, MI: Omnigraphics.
Bellugi, U., Marks, S., Bihrle, A., & Sabo, H. (1993). Dissociation between language and cognitive functions in
Williams syndrome. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 177–
189). Mahwah, NJ: Lawrence Erlbaum Associates.
Berk, R. A. (1984). Screening and diagnosis of children with learning disabilities. Springfield, IL: Thomas.
Biederman, J., Newcorn, J. H., & Sprich, S. (1997). Comorbidity of attention-deficit/hyperactivity disorder. In T. A.
Widiger, A. J. Frances, H. A. Pincus, R. Ross, M. B. First, & W. Davis (Eds.), DSM–IV sourcebook. Washington, DC:
American Psychiatric Association.
Bloodstein, O. (1995). A handbook on stuttering (5th ed.). San Diego, CA: Singular.
Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic Fragile X males. Developmental Brain Dysfunction, 8,
252–269.
Cohen, M. M. (1997). The child with multiple birth defects (2nd ed.). New York: Oxford University Press.
Connor, M., & Ferguson-Smith, M. (1997). Essential medical genetics (5th ed.). Oxford, England: Blackwell.
Cooley, W. C., & Graham, J. M. (1991). Common syndrome and management issues for primary care physicians: Down
syndrome—An update and review for the primary pediatrician. Clinical Pediatrics, 30(4), 233–253.
Cromer, R. (1981). Reconceptualizing language acquisition and cognitive development. In R. L. Schiefelbusch & D. D.
Bricker (Eds.), Early language: Acquisition and intervention. Baltimore: University Park Press.
Donaldson, M. D. C., Shu, C. E., Cooke, A., Wilson, A., Greene, S. A., & Stephenson, J. B. (1994). The Prader-Willi
syndrome. Archives of Disease in Childhood, 70, 58–63.
Downey, J., Ehrhardt, A. A., Gruen, R., Bell, J. J., & Morishima, A. (1989). Psychopathology and social functioning in
women with Turner syndrome. Journal of Nervous and Mental Disease, 177, 191–201.
Durkin, M. S., & Stein, Z. A. (1996). Classification of mental retardation. In J. W. Jacobson & J. A. Mulick, (Eds.),
Manual of diagnosis and professional practice in mental retardation (pp. 67–73). Washington, DC: APA.
Dykens, E. M., Hodapp, R. M., & Leckman, J. F. (1994). Behavior and development in fragile X syndrome. Thousand
Oaks, CA: Sage.
Eaton, L. F., & Menolascino, F. J. (1982). Psychiatric disorder in the mentally retarded: Types, problems, and
challenges. American Journal of Psychiatry, 139, 1297–1303.
Fowler, A. E. (1990). Language abilities in children with Down syndrome: Evidence for a specific syntactic delay. In D.
Cicchetti & M. Beeghley (Eds.), Children with Down syndrome (pp. 302–328). Cambridge, England: Cambridge
University Press.
Fox, R., & Wise, P. S. (1981). Infant and preschool reinforcement survey. Psychology in the Schools, 18, 87–92.
Gottlieb, M. L. (1987). Major variations in intelligence. In M. I. Gottlieb & J. E. Williams (Eds.), Textbook of
developmental pediatrics (pp. 127–150). New York: Plenum.
Grossman, H. J. (Ed.). (1983). Classification in mental retardation. Washington, DC: American Association on Mental
Deficiency.
Hersen, M., & Van Hasselt, V. B. (Eds.). (1990). Psychological aspects of developmental and physical disabilities: A
casebook. Newbury Park, CA: Sage Publications.
Hodapp, R. M. (1996). Cross-domain relations in Down’s syndrome. In J. A. Rondal, J. Porera, L. Nadel, & A.
Comblain (Eds.), Down's syndrome: Psychological, psychobiological, and socio-educational perspectives (pp. 65–79). San
Diego, CA: Singular.
Hodapp, R. M., & Dykens, E. M. (1994). Mental retardation’s two cultures of behavioral research. American Journal on
Mental Retardation, 98, 675–687.
Hodapp, R. M., & Zigler, E. (1997). New issues in the developmental approach to mental retardation. In W. E. MacLean,
Jr. (Ed.), Ellis' handbook of mental deficiency, psychological theory and research (3rd ed., pp. 1–28). Mahwah, NJ:
Lawrence Erlbaum Associates.
Hodapp, R. M., Leckman, J. F., Dykens, E. M., Sparrow, S. S., Zelinsky, D. G., & Ort, S. I. (1992). K-ABC profiles of
children with fragile X syndrome, Down syndrome, and nonspecific mental retardation. American Journal on Mental
Retardation, 97, 39–46.
Jacobson, J. W., & Mulick, J. A. (Eds.). (1996). Manual of diagnosis and professional practice in mental retardation.
Washington, DC: APA.
Kingsley, J., & Levitz, M. (1994). Count us in: Growing up with Down syndrome. New York: Harcourt Brace.
Lehrke, R. G. (1972). A theory of X-linkage of major intellectual traits. American Journal of Mental Deficiency, 76, 611–
619.
Lubetsky, M. J. (1990). Diagnostic and medical considerations. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological
aspects of developmental and physical disabilities: A casebook (pp. 25–53). Newbury Park, CA: Sage Publications.
Luckasson, R. (1992). Mental retardation: Definition, classification, and systems of support. Washington, DC: American
Association on Mental Retardation.
Macmillan, D. L., & Reschly, D. J. (1997). Issues of definition and classification. In W. E. MacLean, Jr. (Ed.), Ellis’
handbook of mental deficiency, psychological theory and research (3rd ed., pp. 47–71). Mahwah, NJ: Lawrence Erlbaum
Associates.
Maino, D. M., Wesson, M., Schlange, D., Cibis, G., & Maino, J. H. (1991). Optometric findings in the fragile X.
Optometry and Vision Science, 68, 634–640.
Maxwell, L. A., & Geschwint-Rabin, J. (1996). Substance abuse risk factors and childhood language disorders. In M. D.
Smith & J. S. Damico (Eds.), Childhood language disorders (pp. 235–271). New York: Thieme.
Mervis, C. B. (1998). The Williams syndrome cognitive profile: Strengths, weaknesses, and interrelations among
auditory short-term memory, language, and visuospatial constructive cognition. In E. Winograd, R. Fivush, & W. Hirst
(Eds.), Ecological approaches to cognition. Mahwah, NJ: Lawrence Erlbaum Associates.
Miller, J. F., & Chapman, R. (1984). Disorders of communication: Investigating the development of language of mentally
retarded children. American Journal of Mental Deficiency, 88, 536–545.
Richardson, S. A., & Koller, H. (1994). Mental retardation. In I. B. Pless (Ed.), The epidemiology of childhood disorders
(pp. 277–303). New York: Oxford University Press.
Roeleveld, N., Zielhuis, G. A., & Gabreels, F. (1997). The prevalence of mental retardation: A critical review of the
literature. Developmental Medicine and Child Neurology, 39, 125–132.
Rondal, J. A. (1996). Oral language in Down’s syndrome. In J. A. Rondal, J. Porera, L. Nadel, & A. Comblain (Eds.),
Down’s syndrome: Psychological, psychobiological, and socio-educational perspectives (pp. 99–117). San Diego, CA:
Singular.
Rondal, J. A., & Edwards, S. (1997). Language in mental retardation. San Diego, CA: Singular.
Rosenberg, S., & Abbeduto, L. (1993). Language and communication in mental retardation: Development, processes,
and intervention. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sandgrund, A., Gaines, R., & Green, A. (1974). Child abuse and mental retardation: A problem of cause and effect.
American Journal of Mental Deficiency, 79, 327–330.
Sattler, J. M. (1988). Assessment of children (3rd ed.). San Diego, CA: Author.
Simeonsson, R. J., & Short, R. J. (1996). Adaptive development, survival roles, and quality of life. In J. W. Jacobson &
J. A. Mulick (Eds.), Manual of diagnosis and professional practice in mental retardation (pp. 137–146). Washington,
DC: APA.
Sparks, S. N. (1993). Children of prenatal substance abuse. San Diego, CA: Singular.
Sparrow, S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American
Guidance Service.
Stagg, V. (1988). Clinical considerations in the assessment of young handicapped children. In T. D. Wachs & R.
Sheehan (Eds.), Assessment of young developmentally disabled children (pp. 61–73). New York: Plenum.
Stratton, K., Howe, C., & Battaglia, F. (1996). Fetal alcohol syndrome: Diagnosis, epidemiology, prevention, and
treatment. Washington, DC: National Academy Press.
Stray-Gunderson, K. (Ed.). (1986). Babies with Down syndrome: A new parents’ guide. Kensington, MD: Woodbine
Press.
Taitz, L. S., & King, J. M. (1988). A profile of abuse. Archives of Disease in Childhood, 63, 1026–1031.
Udwin, O., & Yule, W. (1990). Expressive language of children with Williams syndrome. American Journal of Medical
Genetics-Supplement 6, 108–114.
Veltkamp, L. J. (1994). Clinical handbook of child abuse and neglect. Madison, CT: International Universities Press.
Wasson, P., Tynan, T., & Gardiner, P. (1982). Test adaptations for the handicapped. San Antonio, TX: Education
Service Center, Region 20.
Zigman, W. B., Schupf, N., Zigman, A., & Silverman, W. (1993). Aging and Alzheimer’s disease in people with mental
retardation. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19, pp. 41–70). New York:
Academic Press.
CHAPTER 7

Children with Autistic Spectrum Disorder


Defining the Problem

Suspected Causes

Special Challenges in Assessment

Expected Patterns of Language Performance

Related Problems
Andrew is a 4-year-old who rarely speaks or vocalizes. He also fails to respond or make eye contact when others speak
to him. He has some activities he will engage in incessantly, such as spinning parts of a toy truck or twirling his fingers
in front of his eyes. Andrew has epileptic seizures almost daily, is not yet toilet trained, rises early in the morning and
awakens once or twice each night—problems that provide additional stress to his caring, beleaguered parents. He was
initially identified as having severe to profound mental retardation and has more recently been identified as having
Autistic Disorder.
Peter is a 12-year-old who speaks infrequently and often appears to ignore remarks directed to him by others. He
occasionally repeats the full text of a television commercial containing words he neither uses nor appears to understand
in other contexts. Peter's expressive and receptive language, as measured through standardized tests, appears delayed; his
vocal intonation sounds unmodulated in pitch; and he rarely seems able to practice the give-and-take required for
conversation. Although Peter was initially identified as having autism, he has recently been diagnosed as having
pervasive developmental disorder not otherwise specified.
Amelia is a 10-year-old girl who was considered normal in her development of language until her extreme difficulty in
using language for communication was noticed when she entered preschool. Despite having near-normal language
abilities on standardized measures, her need for sameness and her difficulty in engaging in social interaction make her a
very solitary child. She performs best in school subjects such as mathematics and geography, which appear to interest
her greatly. Her problems have been tentatively identified as associated with Asperger’s Disorder.
Defining the Problem
Autistic spectrum disorder, the diagnostic category that encompasses many of the problems of Andrew, Peter, and
Amelia, is found in 0.02 to 0.05% of the population, or in about 2 to 5 of every 10,000 people (American Psychiatric
Association, 1994). Recently, somewhat higher estimates have suggested as many as 10 to 14 of every 10,000
individuals (Trevarthen, Aitken, Papoudi, & Robarts, 1996). Even with these higher estimates, autism spectrum disorder
is relatively rare. The magnitude of its impact on affected children and their families, however, has caused it to be the
focus of considerable research and clinical writing. Its impact stems from the severity of symptoms, which include
delayed or deviant language and social communication and abnormal ways of responding to people, places, and objects.
There is also some evidence to suggest that it is becoming more prevalent (Wolf-Schein, 1996; cf. Trevarthen et al.,
1996).
About 75% of children with autism are diagnosed with mental retardation as well (Rutter & Schopler, 1987), with about
50% reportedly having IQs less than 50 and fewer than 33% having IQs greater than 70 (Waterhouse, 1996). There is
great uncertainty associated with these figures, however, because the diagnosis of mental retardation is often
questionable given the difficulty these children have in participating in formal assessment procedures (Wolf-Schein,
1996).
In the influential DSM–IV system of nomenclature (American Psychiatric Association, 1994), autistic spectrum disorder
is referred to as Pervasive Developmental Disorder (PDD), a category that includes autistic disorder, Rett’s disorder,
childhood disintegrative disorder, Asperger’s disorder, and pervasive developmental disorder not otherwise specified
(PDD-NOS) (Waterhouse, 1996). Readers should be aware that an alternative and somewhat more complicated set of
diagnoses related to autism has been formulated by the World Health Organization (WHO) in the International
Classification of Diseases (ICD; WHO, 1992, 1993), although it is not discussed here.
Autistic disorder is sometimes referred to as Kanner's autism or infantile autism and is the most common of the spectrum
disorders. Its symptoms are similar to those of the other disorders within the PDD category, including severe delays in
"reciprocal social interaction skills, communication skills, and the presence of stereotyped behavior, interests and
activities" (American Psychiatric Association, 1994, p. 65). Although children with autistic disorder share many
characteristics with children with other PDD disorders, the primary focus of this
chapter is children with autistic disorder and their surprising degree of heterogeneity, with regard to levels of cognitive
function, language outcomes, and specific symptoms (Hall & Aram, 1996; Myles, Simpson, & Becker, 1995). The
considerable differences within this single disorder are illustrated by the range of difficulties described at the outset of
the chapter in relation to Peter and Andrew.
The American Psychiatric Association (1994) definition for Autistic Disorder is presented in Table 7.1. Besides calling
attention to these children’s very marked problems in social interaction and language, this definition emphasizes the
abnormal and
Table 7.1
A Definition of Autistic Disorder (American Psychiatric Association, 1994)

A. A total of six (or more) items from (1), (2), and (3), with at least two from (1) and one each from (2) and (3):
(1) Qualitative impairment in social interaction, as manifested by at least two of the following:
(a) marked impairment in the use of multiple nonverbal behaviors such as eye-to-eye gaze, facial expression,
body postures, and gestures to regulate social interaction;
(b) failure to develop peer relationships appropriate to developmental level;
(c) a lack of spontaneous seeking to share enjoyment, interests, or achievements with other people (e.g., by a
lack of showing, bringing, or pointing out objects of interest);
(d) lack of social or emotional reciprocity.
(2) Qualitative impairments in communication as manifested by at least one of the following:
(a) delay in, or total lack of, the development of spoken language (not accompanied by an attempt to
compensate through alternative modes of communication such as gestures or mime);
(b) in individuals with adequate speech, marked impairment in the ability to initiate or sustain a conversation
with others;
(c) stereotyped and repetitive use of language or idiosyncratic language;
(d) lack of varied, spontaneous make-believe play or social imitative play appropriate to developmental level.
(3) Restricted repetitive and stereotyped patterns of behavior, interests, and activities, as manifested by at least
one of the following:
(a) encompassing preoccupation with one or more stereotyped and restricted patterns of interest that is
abnormal either in intensity or focus;
(b) apparently inflexible adherence to specific, nonfunctional routines or rituals;
(c) stereotyped and repetitive motor mannerisms (e.g., hand or finger flapping or twisting, or complex whole-
body movements);
(d) persistent preoccupation with parts of objects.
B. Delays or abnormal functioning in at least one of the following areas, with onset prior to age 3 years: (1) social
interaction, (2) language as used in social communication, or (3) symbolic or imaginative play.
C. The disturbance is not better accounted for by Rett’s syndrome or Childhood Disintegrative Disorder.

Note. From Diagnostic and Statistical Manual of Mental Disorders (4th ed., pp. 70–71) by the American Psychiatric
Association, 1994, Washington, DC: Author. Copyright 1994 by the American Psychiatric Association. Adapted with
permission.
often rigid pattern of interaction with objects and other aspects of their environment that is characteristic of children with
autism. In this definition, the onset is specified as being prior to age 3 because of the variety of ages at which marked
changes in development are reported: Although many children are described by their parents as having always been
distant and unresponsive, others are described as having responded to social interaction normally until age 1 or 2
(American Psychiatric Association, 1994; Prizant & Wetherby, 1993).
Difficulties in defining autistic disorder arise from the remarkable heterogeneity of children with the disorder and from
the extent to which their problems overlap with those associated with other developmental disorders and with mental
retardation (Carpentieri & Morgan, 1996; Nordin & Gillberg, 1996; Waterhouse et al., 1996). Table 7.2 lists the other
disorders included within PDD and the characteristics that are thought to distinguish autistic disorder from them.
A number of researchers (e.g., Rapin, 1996; Waterhouse, 1996; Wing, 1991) have explored common features across
specific disorders included within PDD and have suggested that frequent changes in terminology and clinical categories
are likely to continue as more is learned about these children (Waterhouse, 1996). In particular, considerable research
has recently been devoted to defining the boundaries between Asperger's syndrome and autistic disorder in individuals
with higher measured IQs (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991).
The overlap between mental retardation and autistic spectrum disorder also presents major challenges to researchers and
clinicians. As mentioned earlier, about 75% of children with autistic spectrum disorder are diagnosed with mental
retardation. In addition, the severity of mental retardation appears to be related to the frequency of autistic symptoms.
For example, in one recent Swedish study (Nordin & Gillberg, 1996a), autistic spectrum disorder was identified in about
12% of children with mild retardation, whereas it was identified in 29.5% of those with severe retardation. The fact that
not all children with mental retardation show autistic symptoms, however, suggests that much more needs to be done to
understand the relationship of these two conditions. Increased understanding of the nature of the relationship between
mental retardation and the specific cognitive deficits associated with autistic spectrum disorder should help improve the
quality of care directed to children with these combined difficulties.
Additional difficulties in diagnosis are due to the changing nature of symptoms associated with autistic disorder with
age, although currently there is considerable disagreement over the nature and direction of those changes (i.e.,
improvement vs. decline; e.g., see Eaves & Ho, 1996; Piven, Harper, Palmer, & Arndt, 1996). Despite possible changes
over time, however, it is rare for individuals diagnosed as autistic in childhood to enter adulthood without significant
residual problems (e.g., see Piven et al., 1996). A personal experience with an acquaintance in graduate school—who in
retrospect would probably have been identified as having Asperger’s disorder and whom I will call Matthew Metz—
captures this generality for me: Although Matthew would eventually complete a Ph.D. in history, he invariably greeted
members of our graduate house he saw on campus with an introduction—“Hi, you may not remember me, but my name
is Matthew Metz.” This greeting persisted despite months of having
Table 7.2
Differentiating Autistic Disorder From Other Disorders Within the Autistic Spectrum Disorder
(Called Pervasive Developmental Disorders, PDD, by the American Psychiatric Association, 1994)

Rett's disorder
Major characteristics: An autosomal disorder affecting only women (probably no men are identified because of fetal
mortality). Normal pattern of early physical, motor development with later loss of skills and deceleration in head
growth. Associated with severe or profound mental retardation and limited language skills. Characteristic hand
movements ("wringing" or "washing" of hands).
Basis for differentiation from autistic disorder: Differences in sex ratios (female only versus predominately male in
autism). Head growth slows down after infancy only in Rett's; autistic disorder may actually be associated with an
abnormally large head circumference (Waterhouse et al., 1996). Social interaction difficulties are more persistent into
late childhood in autism than in Rett's disorder.

Childhood disintegrative disorder
Major characteristics: Marked regression after at least 2 years of seemingly normal development. Social,
communication, and behavioral characteristics similar to autism. Usually associated with severe mental retardation.
Very rare disorder, possibly more common in men than women.
Basis for differentiation from autistic disorder: Differentiation from autism depends on good evidence of normal
development during the first two years; otherwise, the autism categorization is preferred.

Asperger's disorder
Major characteristics: Preserved language function in the presence of "severe and sustained impairment in social
interaction" (p. 75). Restricted, repetitive patterns of behavior, interests, and activities (e.g., pronounced interest in
train schedules).
Basis for differentiation from autistic disorder: Absence of significant language and cognitive deficits in Asperger's
disorder, but very significant delays in autism. Except for social communication deficits, adaptive skills are
developmentally appropriate in Asperger's, but not in autism. Asperger's disorder is typically diagnosed later than
autism, often at school age, possibly due to later onset than autism.

Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS)
Major characteristics: Severe and pervasive impairment in social interaction and/or verbal and nonverbal
communication, and/or presence of restricted, repetitive patterns of behavior, interests, and activities. Failure to meet
specific criteria required for other PDD categories described above with regard to severity of symptoms or age of onset.
Basis for differentiation from autistic disorder: Onset or symptoms failing to conform to criteria for other PDD,
including autism. Sometimes referred to as "atypical autism."
shared dinners at a common table with the acquaintances he addressed. As you may expect, Matthew had a very
restricted social sphere that was largely confined to fellow students in his graduate program. When I last heard of him, he
was living with his elderly parents and earned a limited income by writing entries on historical subjects for publishers of
an encyclopedia. Thus, even in the presence of the intellectual abilities required for completion of a graduate degree,
significant challenges for Matthew persisted well into adulthood.
Suspected Causes
To date, discussions of etiology for autistic spectrum disorder have focused on socioenvironmental, behavioral, and
purely organic possibilities (Haas, Townsend, Courchesne, Lincoln, Schreibman, & Yeung-Courchesne, 1996;
Waterhouse, 1996; Wolf-Schein, 1996). The socioenvironmental perspective had strong proponents in the 1960s,
especially among psychoanalysts who held that poor parenting was the source of these children’s difficulties (e.g.,
Bettelheim, 1967). More recently, however, such theories have lost favor with almost all researchers and clinicians.
Currently, the dominant perspective on autism is that it has one or more organic bases in the form of underlying
neurological abnormalities.
The nature of neurologic abnormalities underlying autism has not yet been well documented and constitutes a major area
of research (Rapin, 1996). Proposed sites of suspected neurologic abnormalities are the frontal lobe (Frith, 1993), the
reticular formation of the brain stem (Rimland, 1964), and the cerebellum (Courchesne, 1995)—just to name a few (cf.
Cohen, 1995; Wolf-Schein, 1996). In addition, the role that the right hemisphere of the brain plays in autistic symptoms
has received some attention (e.g., Shields, Varley, Broks, & Simpson, 1996). Although localized functional
abnormalities have been sought, it has frequently been suggested that the underlying abnormalities are in fact likely to be
diffuse (Rapin, 1996).
As a more distal causal factor leading to the brain abnormalities that are then believed to cause autistic symptoms more
directly, genetic factors are implicated for some cases of autism. Evidence supporting this reasoning includes (a) the
preponderance of males in all categories with PDD except Rett’s disorder (American Psychiatric Association, 1994;
Waterhouse et al., 1996),1 (b) the tendency for PDD to occur much more frequently in some families than in others
(Folstein & Rutter, 1977), and (c) the tendency for PDD to occur frequently among individuals with fragile X, where
genetic abnormalities are well documented (Cohen, 1995).
Many cases of autism, however, have yet to be linked to genetic abnormalities. Nonetheless, it is suspected that these
cases are still due to organic factors arising before rather than during or after the child’s birth (Rapin, 1996). Other
suspected sources of the presumed neurologic abnormalities include metabolic disorders and infectious disorders (e.g.,
congenital rubella, encephalitis, or meningitis; Rapin, 1996; Wolf-Schein, 1995). In some cases, no likely causal factor is
suggested—leading to cases that are termed
1 The reasoning is that male preponderance may exist because males’ single X chromosome makes them at special risk
for X-chromosome defects.
idiopathic, that is, without a known cause. Efforts to identify the real nature of such idiopathic cases and to identify the
specific mechanisms by which known causes act to create autistic symptoms represent some of the most needed areas for
research on PDD.
Special Challenges in Assessment
Children with autistic spectrum disorder present the greatest imaginable challenges to the clinician contemplating formal
testing as a means of collecting information. Frequently, these children’s essential social interaction deficits dramatically
limit their participation in the usual give-and-take required by most standardized language instruments. Consequently,
informal measures, especially parent questionnaires and behavioral checklists, are used very frequently for purposes of
screening, diagnosis, and description of language among children and adults with autistic spectrum disorder (Chung,
Smith, & Vostanis, 1995; DiLavore, Lord, & Rutter, 1995; Gillberg, Nordin, & Ehlers, 1996; Nordin & Gillberg, 1996;
Prizant & Wetherby, 1993; Sponheim, 1996).
Alternatives to standardized tests are particularly valuable for those children whose communication repertoire is very
limited, a group that includes as many as 50% of all children with autism (Paul, 1987). Where the purpose of an
evaluation is to aid in diagnosis of the disorder, it has been argued that parent interviews may be considerably better than
observational methods that may be applied by clinicians (Rapin, 1996). Table 7.3 lists some of the most common
questionnaires, interview schedules, checklists, and other instruments used in screening and diagnosing autistic spectrum
disorders. Although many of these focus on the entire range of difficulties often seen as part of autism, some focus on
selected skill areas, such as communication or play.
Despite the frequent need for nontraditional, observational techniques, more traditional, standardized speech and
language tests can play a useful role in language assessments of some children with autism. In particular, children with
more elaborate language and communication skills—children who are often described as "high functioning"—may be
amenable to standardized testing when appropriate attention is paid to motivation and other enabling factors. Information
obtained from family members and other individuals who are very familiar with the child can help pinpoint the
reinforcers that will prove most helpful in facilitating a child’s participation and warn against specific stimuli (e.g., types
of environmental noise such as traffic noise or the sound of some electrical devices) that are likely to be distracting or
disturbing to the individual child.
For higher functioning children, standardized speech and language testing may not only be feasible, but quite vital to a
thorough understanding of their strengths and weaknesses—particularly for receptive skills that, unlike expressive skills,
are not as readily observable in spontaneous productions.
Even when expressive language testing is feasible, analysis of spontaneous productions will almost always constitute a
particularly desirable tool for expressive language assessment. Not only does analysis of spontaneous language allow
one to examine variables related to numerous expressive language domains simultaneously (Snow & Pan, 1993), but one can also argue that the validity of such measures will be especially high for children who are so reactive to standardized
testing procedures. In chapter 10, the use of spontaneous language sample analyses is discussed at some length.

Table 7.3
Recent Behavioral Checklists and Interviews for Screening and Description of Autistic Spectrum Disorder (Chung, Smith, & Vostanis, 1995; Gillberg, Nordin, & Ehlers, 1996; Wolf-Schein, 1996)

Screening
Checklist for Autism in Toddlers (CHAT; Baron-Cohen, Allen, & Gillberg, 1992). For children from 18 to 30 months. Uses 14 items that are responded to by a parent (n = 9) and by a clinician (n = 5); found to have a low rate of false positives and reported to have good reliability (Gillberg, Nordin, & Ehlers, 1996).
Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS; DiLavore, Lord, & Rutter, 1995). For children under 6 years of age. Uses 12 play-based activities with 17 associated ratings, with items administered by the examiner or through one of the child's caregivers; designed to relate directly to the DSM–IV or ICD-10 criteria.
Asperger Syndrome Screening Questionnaire (ASSQ; Ehlers & Gillberg, 1993). For ages 7 to 16 years. A teacher questionnaire containing 27 items; it appears to consistently identify Asperger's disorder, but it may overidentify in cases of other social abnormalities; one of the few measures developed to be sensitive to Asperger's disorder.

Diagnosis and Description
Autism Diagnostic Interview-Revised (ADI-R; Lord, Rutter, & Le Couteur, 1994). For children from 18 months to adults. Uses an interview of parents or caregivers of individuals with suspected autistic disorder; designed to relate directly to the DSM–IV or ICD-10 criteria.
Childhood Autism Rating Scale (CARS; Schopler, Reichler, & Renner, 1988). For children. Uses direct observation of children with suspected autistic disorder; designed to be used in diagnosis and description of severity.
Expected Patterns of Language Performance
Certain specific language behaviors are frequently associated with autism, although they may also occur infrequently in
normal language development and in other language disturbances. Among these behaviors are echolalia, pronominal
reversals, and stereotypic or nonreciprocal language (Fay, 1993; Paul, 1995).
Echolalia consists of the immediate or delayed repetition of speech, often without evident communicative intent.
Echolalic productions can often be quite complex in their language structure relative to the level of the child’s
spontaneous communications and may simply represent memorized routines rather than creatively generated language.
The presence of echolalic productions often appears to indicate a child’s attempt to stay engaged in the social interaction
despite failing to understand what has just been said or being unable to produce a more suitable response. Such
productions, consequently, may be communicative in intent and therefore provide information about the nature of the
child’s pragmatic skills (Paul, 1995).
Pronominal reversals involve an apparent confusion in pronoun choice in which first- and second-person pronouns are substituted for one another. Thus, for example, a child might say "you go" when apparently referring to him- or herself.
Although at one point in time these errors were thought to reflect the child's failure to distinguish him- or herself from
the environment, they are currently taken to reflect the child’s inflexible use of language forms. In short, the child treats
pronouns, which are sometimes referred to as “deictic shifters,” as unchanging labels, thereby failing to recognize the
shift that allows "I" to refer to several different speakers in turn simply by virtue of their role as speaker, and "you" by
virtue of their role as listener. Although once considered a hallmark of the disorder, pronominal reversals are not
necessarily used frequently (Baltaxe & D’Angiola, 1996). The Personal Perspective included in this chapter contains the
reflections of Donna Williams, an adult with autism, who argues persuasively for the relative unimportance of pronoun
use as a target for therapy, given all of the words one needs to learn.
PERSONAL PERSPECTIVE
The following passage comes from a book written by a young woman who describes herself as having autism
associated with high functioning (Williams, 1996, pp. 160–161). In this passage, she discusses which words are
important and which are unimportant to learn:
“Words to do with the names of objects are probably the most important ones to connect with as it is hard to ask for
help if you haven’t got these. If someone can only say ‘book,’ at least you can work out what they might want done
with
it. If they just say 'look' but haven't connected with 'book', you have a whole house full of things that can be
‘looked’ (at or for).
“Words to do with what things are contained in (box, bottle, bag, packet), made of (wood, metal, cloth, leather, glass,
plastic, powder, goo) or what is done with them (eating, drinking, closing, warming, sleeping) are also really important
to learn. Much later, less tangible, less directly observable words such as those to do with feelings (had enough, hurt,
good, angry) or body sensations (tired, full, cold, thirsty) are really important to connect with.
“Words to do with pronouns, such as ‘I,’ ‘you,’ ‘he,’ ‘she,’ ‘we’ or ‘they,’ aren’t so important. Too many people make
a ridiculous big hoo-ha about these things, because they want to eradicate this ‘symptom of autism,’ or for the sake of
‘manners’ or impressiveness. Pronouns are ‘relative’ to who is being referred to, where you are and where they are in
space and who you are telling all this to. That’s a lot of connections and far more than ever have to be made to
correctly access, use and interpret most other words. Pronouns are, in my experience, the hardest words to connect with
experienceable meaning because they are always changing, because they are so relative. In my experience, they require
far more connections, monitoring and feedback than in the learning of so many other words.
“Too often so much energy is put into teaching pronouns and the person being drilled experiences so little consistent
success in using them that it can really strongly detract from any interest in learning all the words that can be easily
connected with. I got through most of my life using general terms like ‘a person’ and ‘one,’ calling people by name or
by gender with terms like ‘the woman’ or ‘the man’ or by age with terms like ‘the boy.’ It didn’t make a great deal of
difference to my ability to be comprehended whether I referred to these people’s relationship to me or in space or not.
These things might have their time and place but there are a lot of more important things to learn which come easier
and can build a sense of achievement before building too great a sense of failure.”
Stereotypic or nonreciprocal language refers to idiosyncratic use of words or even whole sentences (Paul, 1995). Often
the particular word or phrase seems to be used because it was first heard in a particular situation or in conjunction with a
specific event or objects. Thereafter, it is used to stand for the associated situation, event, or object, despite its lack of
meaning to anyone except a very perceptive individual present at the time the association was formed. Temple Grandin,
a college professor who has recently published several books about her experiences as someone with autism, describes a
personal example of nonreciprocal language:
Teachers who work with autistic children need to understand associative thought patterns. An autistic child will often use
a word in an inappropriate manner. Sometimes these uses have a logical associated meaning and other times they don’t.
For example, an autistic child might say the word "dog" when he wants to go outside. The word "dog" is associated with
going outside. In my
own case, I can remember both logical and illogical use of inappropriate words. When I was six, I learned to say
‘prosecution.’ I had absolutely no idea what it meant, but it sounded nice when I said it, so I used it as an exclamation
every time my kite hit the ground. I must have baffled more than a few people who heard me exclaim “Prosecution!” to
my downward spiraling kite. (Grandin, 1995, p. 32)
In addition to characteristic kinds of atypical language use, patterns of language strengths and weaknesses among
children with autistic disorder and Asperger’s disorder have received extensive attention by researchers. Table 7.4
summarizes the language characteristics described for three diagnoses in the spectrum: two forms of autistic disorder and
Asperger’s disorder. The two descriptions provided under autistic disorder are included because of the relatively rich
research base that has identified very different skills seen in individuals who can be described as high- versus low-
functioning in terms of severity as well as in terms of nonverbal intelligence scores. A study performed by a large group
of researchers headed by Isabelle Rapin (1996) provides the most comprehensive study of the largest number of children
with autism to date; it made use of normal controls and two other control groups—(a) a group of language-impaired
children to act as controls for the high-functioning children with autism and (b) a group of children without autism but
with low nonverbal IQs to act as a control group for the low-functioning children with autism. That multiyear, multisite
study provided much of the information included in Table 7.4. Despite my use of the subcategories high- and low-
functioning, it should be noted that researchers have identified several subgroupings of autistic spectrum disorder beyond
those discussed in this chapter, including aloof, passive, and active-but-odd (e.g., Frith, 1991; Sevin et al., 1995;
Waterhouse, 1996; Waterhouse et al., 1996).
Related Problems
Autistic disorder, and indeed most of the disorders on the autistic spectrum, is characterized by a number of behavioral
problems in addition to those already discussed in terms of communication and language. Two of these—“restricted
repetitive and stereotyped patterns of behavior, interests, and activities” and “lack of varied, spontaneous make-believe
play or social imitative play appropriate to developmental level"—are considered central enough to the nature of the
disorder to be listed in the DSM–IV definition (American Psychiatric Association, 1994). They are closely related.
Restricted and stereotyped patterns of behavior, interests, and activities can include behaviors such as the child’s
rocking, flapping one or both hands in front of his or her own eyes, repeatedly manipulating parts of objects (such as
spinning the wheel on a toy or repeatedly opening and closing a hallway door), or, more alarmingly, repeatedly biting or
striking others or him- or herself. Some of these repetitive behaviors can be interpreted as self-stimulatory or as efforts by
the child to deal with anxiety and avoid overstimulation (e.g., Cohen, 1995); others are more difficult to interpret.
Stereotyped, repetitive behaviors (sometimes referred to as stereotypies) will often need to be addressed in order to free
the child to attend to important interactions (such as assessment or establishing relationships with peers). How they
should be addressed must be determined in relation to their potentially adaptive role from the child's perspective. Team approaches using behavioral interventions and, at times, drug intervention are sometimes useful.

Table 7.4
Patterns of Strengths and Weaknesses Among Children With Autistic Disorder—High-Functioning, Autistic Disorder—Low-Functioning, and Asperger's Disorder

Autistic disorder—high-functioning (AD-HF)
Relative strengths in communication: expressive vocabulary (Rapin, 1996); written language superior to oral language and superior to the written language skills of children with delayed language skills but normal intelligence (Rapin, 1996); relatively less use of echolalia than in AD-LF.
Relative weaknesses in communication: receptive language more affected than expressive language (Rapin, 1996); functional use of expressive language below performance on most tests of expressive language (Rapin, 1996); pragmatic skills; rapid naming within a category (Rapin, 1996); formulated output of connected speech (Rapin, 1996); verbal reasoning (Rapin, 1996); delayed development of question-asking, as pronounced as in AD-LF (Rapin, 1996).
Other strengths: preserved function on visuospatial and visual-perceptual skills (Rapin, 1996).
Other weaknesses: marked delay in onset of ability to engage in symbolic play (Rapin, 1996); possible deficits in memory (Rapin, 1996); subtle motor deficits, especially affecting gross motor skills, that are more consistent with language skills than with nonverbal IQ (Rapin, 1996).

Autistic disorder—low-functioning (AD-LF)
Relative strengths in communication: expressive vocabulary is a relative strength and is generally better than receptive vocabulary; patterns of strengths and weaknesses may be especially difficult to determine because of floor effects on many measures (Rapin, 1996).
Relative weaknesses in communication: verbal communication may be absent in about half of these children (Rapin, 1996); when present, most areas of language are severely affected (Rapin, 1996); reported temporary regression of language skills in early development (Rapin, 1996).
Other strengths: nonverbal performance superior to verbal performance (Rapin, 1996).

Asperger's disorder (AD)
Relative strengths in communication: generally preserved language skills (American Psychiatric Association, 1994); phonology, except possibly in the area of prosody; syntax.
Relative weaknesses in communication: pragmatic skills (Ramberg, Ehlers, Nyden, Johansson, & Gillberg, 1996; Wing, 1991); atypical prosody and vocal characteristics (Ramberg et al., 1996).
Other strengths: normal nonverbal intelligence.
Other weaknesses: motor clumsiness (Ramberg et al., 1996; Wing, 1991).

Note. Asperger's disorder is considered equivalent to autistic disorder—high-functioning by some authors (e.g., Rapin, 1996).
The high frequency of these stereotyped patterns of interaction is combined with a lack of the spontaneous, imaginative
play considered so characteristic of childhood. Although this deficiency has been noted since autism was first described
by Kanner in 1943, it has recently been seen as related to these children’s apparent inability to assume alternative
perspectives—an ability that also supports social interaction. It has been said that one of the chief cognitive deficits in
children with autistic disorder may be their lack of a theory of mind, the ability to think about emotions, thoughts,
motives—either in themselves or others (Frith, 1993).
Pronounced sensory abnormalities have been inferred in many autistic children on the basis of their apparent avoidance of and negative reactions to many auditory, visual, and tactile stimuli. In particular, hypersensitivity and hyposensitivity have been associated with autistic spectrum disorders (e.g., Roux et al., 1994; Sevin et al., 1995).
Recently, a controversial therapy technique, auditory integration training (Rimland & Edelson, 1995), has been devised
in an attempt to eliminate these abnormal responses to auditory stimuli seen in some children.
In a growing number of studies, children with autism spectrum disorder have been found to be at increased risk for motor
abnormalities. For example, in a recent large-scale study, children in both high-and low-functioning groups showed a
greater frequency of motor abnormalities than did groups of children with either mental retardation without autism or
SLI (Rapin, 1996). However, oromotor impairments tended to be more common and more severe among children in the
low-functioning group. Among the difficulties noted have been akinesia (absent or diminished movement), bradykinesia (delay in initiating, stopping, or changing movement patterns), and dyskinesia (involuntary tics or stereotypies; Damasio
& Maurer, 1978) as well as problems with muscle tone, posture, and gait (Page & Boucher, 1998). Of particular interest
to speech-language pathologists who may wish to work on oral motor activities in efforts to foster speech or on manual
gestures have been reports of oral and manual dyspraxia, difficulties in the performance of purposeful voluntary
movements in the absence of paralysis or muscular weakness (Page & Boucher, 1998; Rapin, 1996).
Other problems that are more common among children on the autistic disorder spectrum than among children without
identified problems are epilepsy, especially a form called infantile spasms, and sleep disorders (Rapin, 1996). ADHD
(discussed in chap. 5) is also more prevalent (Wender, 1995).
Summary
1. Autistic spectrum disorder, also termed Pervasive Developmental Disorder (PDD), encompasses five related and relatively rare disorders: Rett's disorder, autistic disorder, Asperger's disorder, childhood disintegrative disorder, and pervasive developmental disorder not otherwise specified (PDD-NOS), according to the diagnostic system of the DSM–IV (American Psychiatric Association, 1994).
2. Difficulties shared by children with autistic spectrum disorders include delayed or deviant language, social
communication, and abnormal ways of responding to people, places, and objects.
3. Autistic spectrum disorders frequently co-occur with mental retardation, perhaps because of a shared cause:
underlying neurologic abnormalities.
4. Although the source of underlying neurologic abnormalities is generally unknown, genetic factors and prenatal
infections are suspected in some cases.
5. Children with autistic spectrum disorder are often unable to participate in standardized testing required for the
diagnosis of their disorder, making the use of observational methods and parental questionnaires a very frequent and
relatively well-studied alternative.
6. Echolalia, pronominal reversals, and stereotypic language are abnormal features of language that are seen more
frequently in autistic disorder than in other developmental language disorders.
7. Other problems affecting children with autistic spectrum disorders include a lack of spontaneous, imaginative play and
restricted patterns of behavior, interests, and activities. In addition, these children are at increased risk for motor
abnormalities, seizures, and sleep disorders.
Key Concepts and Terms
akinesia: absent or diminished movement.
autistic disorder: the major and most frequently occurring disorder category within the larger DSM–IV (American
Psychiatric Association, 1994) definition of Pervasive Developmental Disorders; often used synonymously with infantile
autism or Kanner’s autism.
Asperger’s disorder: an autistic disorder within the larger DSM–IV category of Pervasive Developmental Disorders in
which early delays in communication are absent; often considered synonymous with high-functioning autism.
bradykinesia: a motor abnormality characterized by delays in initiation, cessation, or alteration of movement pattern.
childhood disintegrative disorder: a very rare autistic disorder within the larger DSM–IV category of Pervasive
Developmental Disorders in which a period of about 2 years of normal development is followed by autistic symptoms.
dyskinesia: a movement abnormality characterized by involuntary tics or stereotypies.
dyspraxia: difficulties in the performance of purposeful voluntary movements in the absence of paralysis or muscular
weakness; for example, oral dyspraxia, manual dyspraxia, verbal dyspraxia (also frequently referred to as verbal apraxia).
echolalia: immediate or delayed repetition of a previous speaker’s or one’s own utterance.
epilepsy: a chronic disorder associated with excessive neuronal discharge and with alterations of consciousness, sensory activity, motor activity, or both.
Pervasive Developmental Disorders (PDDs): the group of severe disorders having their onset in childhood, characterized
by significant deficits in social interaction and communication, as well as the presence of stereotyped behavior, interests
and activities; considered synonymous with autistic spectrum disorder.
pervasive developmental disorder not otherwise specified (PDD-NOS): Within the DSM–IV system of disorder
classification, this diagnosis is made when some but not all of the major criteria for autistic disorder are met; also
referred to as atypical autism.
pronominal reversals: incorrect use of first- and second-person pronouns (e.g., "you want" to mean "I want"), which are
considered typical of autistic speech.
Rett's disorder: a severe pervasive developmental disorder affecting only girls, in which a brief period of normal development is followed by regression; associated with severe or profound levels of mental retardation.
stereotypy: frequent repetition of a meaningless gesture or movement pattern.
theory of mind: the ability to think about emotions, thoughts, and motives—either in oneself or others; considered to be a
primary deficit among individuals whose difficulties fall along the autistic spectrum.
Study Questions and Questions to Expand Your Thinking
1. On the Internet, look for sites related to PDD. For which disorders within that designation do you find web sites? Who
are the main audiences for these sites? How do sites respond differently to these various audiences?
2. On the basis of Table 7.2, list the major characteristics of a child’s behavior that will be needed to determine which
PDD label is most appropriate.
3. On the basis of the discussion of suspected causes of PDD, outline two major research needs that should be pursued by
future researchers.
4. List in order of importance the problems—other than those intrinsic to autism itself—presented to adults who wish to
interact with children with PDD.
5. What features of a child’s communication would cause you to be most concerned that he or she was showing
symptoms of autism? What features of his or her language?
6. What practical problems might a parent of a child with PDD face that are different from those faced by other parents?
7. Find out what definition of autistic spectrum disorders is used in a local school system. How does it differ from the
system described in DSM–IV (American Psychiatric Association, 1994)?
Recommended Readings
Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M.
Bristol (Eds.), Preschool issues in autism. New York: Plenum.
Campbell, M., Schopler, E., Cueva, J. E., & Hallin, A. (1996). Treatment of autistic disorder. Journal of the American
Academy of Child and Adolescent Psychiatry, 35, 124–143.
Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday.
Schopler, E. (1994). Behavioral issues in autism. New York: Plenum.
Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and
physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington,
DC: Author.
Angell, R. (1993). A parent’s perspective on the preschool years. In E. Schopler, M. E. Van Bourgondien, & M. M.
Bristol (Eds.), Preschool issues in autism (pp. 17–38). New York: Plenum.
Baltaxe, C. A. M., & D’Angiola, N. (1996). Referencing skills in children with autism and specific language impairment.
European Journal of Disorders of Communication, 31, 245–258.
Baron-Cohen, S., Allen, J., & Gillberg, C. (1992). Can autism be detected at 18 months? The needle, the haystack, and
the CHAT. British Journal of Psychiatry, 161, 839–843.
Bettelheim, B. (1967). The empty fortress. New York: Collier Macmillan.
Carpentieri, S., & Morgan, S. B. (1996). Adaptive and intellectual functioning in autistic and nonautistic retarded children. Journal of Autism and Developmental Disorders, 26, 611–620.
Chung, M. C., Smith, B., & Vostanis, P. (1995). Detection of children with autism. Educational and Child Psychology,
12(2), 31–36.
Cohen, I. L. (1995). Behavioral profiles of autistic and nonautistic fragile X males. Developmental Brain Dysfunction, 8,
252–269.
Courchesne, E. (1995). New evidence of cerebellar and brainstem hypoplasia in autistic infants, children, and
adolescents: The MR imaging study by Hashimoto and colleagues. Journal of Autism and Developmental Disorders, 25,
19–22.
Damasio, A. R., & Maurer, R. G. (1978). A neurological model for childhood autism. Archives of Neurology, 35, 779–
786.
DiLavore, P. C., Lord, C., & Rutter, M. (1995). The Pre-Linguistic Autism Diagnostic Observation Schedule. Journal of
Autism and Developmental Disorders, 25, 355–379.
Eaves, L. C., & Ho, H. H. (1996). Brief report: Stability and change in cognitive and behavioral characteristics of autism
through childhood. Journal of Autism and Developmental Disorders, 26, 557–569.
Ehlers, S., & Gillberg, C. (1993). The epidemiology of Asperger syndrome: A total population study. Journal of Child
Psychology and Psychiatry, 34, 1327–1350.
Fay, W. (1993). Infantile autism. In D. Bishop & K. Mogford (Eds.), Language development in exceptional
circumstances (pp. 190–202). Mahwah, NJ: Lawrence Erlbaum Associates.
Folstein, S., & Rutter, M. (1977). Infantile autism: A genetic study of 21 twin pairs. Journal of Child Psychology and
Psychiatry, 18, 297–321.
Frith, U. (1991). Asperger and his syndrome. In U. Frith (Ed.), Autism and Asperger syndrome (pp. 1–36). Cambridge:
Cambridge University Press.
Frith, U. (1993). Autism and Asperger syndrome. Cambridge, England: Cambridge University Press.
Gillberg, C., Nordin, V., & Ehlers, S. (1996). Early detection of autism: Diagnostic instruments for clinicians. European
Child and Adolescent Psychiatry, 5, 67–74.
Grandin, T. (1995). Thinking in pictures and other reports from my life with autism. New York: Doubleday.
Haas, R. H., Townsend, J., Courchesne, E., Lincoln, A. J., Schreibman, L., & Yeung-Courchesne, R. (1996). Neurologic abnormalities in infantile autism. Journal of Child Neurology, 11(2), 84–92.
Hall, N. E., & Aram, D. M. (1996). Classification of developmental language disorders. In I. Rapin (Ed.), Preschool children with inadequate communication (pp. 10–20). London: MacKeith Press.
Kanner, L. (1943). Autistic disturbances of affective contact. Nervous Child, 2, 217–250.
Lord, C., Rutter, M., & Le Couteur, A. (1994). Autism Diagnostic Interview-Revised: A revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. Journal of Autism and Developmental Disorders, 24, 659–685.
Myles, B. S., Simpson, R. L., & Becker, J. (1995). An analysis of characteristics of students diagnosed with higher-
functioning Autistic Disorder. Exceptionality, 5(1), 19–30.
Nordin, V., & Gillberg, C. (1996a). Autism spectrum disorders in children with physical or mental disability or both: I.
Clinical and epidemiological aspects. Developmental Medicine and Child Neurology, 38, 297–311.
Nordin, V., & Gillberg, C. (1996b). Autism spectrum disorders in children with physical or mental disability or both: II.
Screening aspects. Developmental Medicine and Child Neurology, 38, 314–324.
Page, J., & Boucher, J. (1998). Motor impairments in children with autistic disorder. Child Language Teaching and
Therapy, 14, 233–259.
Paul, R. (1987). Communication. In D. J. Cohen & A. M. Donnellan (Eds.), Handbook of autism and pervasive
developmental disorders (pp. 61–84). New York: Wiley.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis, MO:
Mosby.
Piven, J., Harper, J., Palmer, P., & Arndt, S. (1996). Course of behavioral change in autism: A retrospective study of
high-IQ adolescents and adults. Journal of the American Academy of Child and Adolescent Psychiatry, 35, 523–529.
Prizant, B. M., & Wetherby, A. M. (1993). Communication in preschool autistic children. In E. Schopler, M. E. Van
Bourgondien, & M. M. Bristol (Eds.), Preschool issues in autism (pp. 95–128). New York: Plenum.
Ramberg, C., Ehlers, S., Nyden, A., Johansson, M., & Gillberg, C. (1996). Language and pragmatic functions in school-
age children on the autism spectrum. European Journal of Disorders of Communication, 31, 387–414.
Rapin, I. (1996). Classification of autistic disorder. In I. Rapin (Ed.), Preschool children with inadequate
communication. (pp. 10–20). London: MacKeith Press.
Rimland, B. (1964). Infantile autism. New York: Appleton.
Rimland, B., & Edelson, S. M. (1995). Brief report: a pilot study of auditory integration training in autism. Journal of
Autism & Developmental Disorders, 25, 61–70.
Roux, S., Malvy, J., Bruneau, N., Garreau, B., Guerin, P., Sauvage, D., & Barthelemy, C. (1994). Identification of
behaviour profiles within a population of autistic children using multivariate statistical methods. European Child and
Adolescent Psychiatry, 4, 249–258.
Rutter, M., & Schopler, E. (1987). Autism and pervasive developmental disorders: Concepts and diagnostic issues.
Journal of Autism and Developmental Disorders, 17, 159–186.
Schopler, E., Reichler, R. J., & Renner, B. R. (1988). The Childhood Autism Rating Scale (CARS). Revised. Los Angeles:
Western Psychological Services.
Sevin, J. A., Matson, J. L., Coe, D., Love, S. R., Matese, M. J., & Benavidez, D. A. (1995). Empirically derived subtypes
of pervasive developmental disorders: A cluster analytic study. Journal of Autism and Developmental Disorders, 25,
561–578.
Shields, J., Varley, R., Broks, P., & Simpson, A. (1996). Hemispheric function in developmental language disorders and
high-level autism. Developmental Medicine and Child Neurology, 38, 473–486.
Snow, C. E., & Pan, B. A. (1993). Ways of analyzing the spontaneous speech of children with mental retardation: The
value of cross-domain analyses. In N. W. Bray (Ed.), International review of research in mental retardation (Vol. 19,
pp. 163–192). New York: Academic Press.
Sponheim, E. (1996). Changing criteria of autistic disorders: A comparison of the ICD-10 research criteria and DSM–IV
with DSM–III–R, CARS, and ABC. Journal of Autism and Developmental Disorders, 26, 513–525.
Strain, P. S. (1990). Autism. In M. Hersen & V. B. Van Hasselt (Eds.), Psychological aspects of developmental and
physical disabilities: A casebook (pp. 73–86). Newbury Park, CA: Sage.
Page 186
Trevarthen, C., Aitken, K., Papoudi, D, & Robarts, J. (1996). Children with autism: Diagnosis and interventions to meet
their needs. London: Jessica Kingsley.
Waterhouse, L. (1996). Classification of autistic disorder (AD). In I. Rapin (Ed.), Preschool children with inadequate
communication (pp. 21–30). London: MacKeith Press.
Waterhouse, L., Morris, R., Allen, D., Dunn, M., Fein, D., Feinstein, C., Rapin, I., & Wing, L. (1996). Diagnosis and
classification in Autism. Journal of Autism and Developmental Disorders, 26, 59–86.
Wender, E. (1995). Hyperactivity. In S. Parker & B. Zuckerman (Eds.), Behavioral and developmental pediatrics (pp.
185–194). Boston: Little, Brown.
Williams, D. (1996). Autism—An inside-out approach: An innovative look at the mechanics of ‘autism ‘ and its
developmental ‘cousins.’ Bristol, PA: Jessica Kingsley.
Wing, L. (1991). The relationship between Asperger’s syndrome and Kanner’s autism. In U. Frith (Ed.), Autism and
Asperger syndrome (pp. 93–121). Cambridge, England: Cambridge University Press.
Wolf-Schein, E. G. (1996). The autistic spectrum disorder: A current review. Developmental Disabilities Bulletin, 24(1),
33–55.
World Health Organization. (1992). The ICD-10 classification of mental and behavioral disorders: Clinical descriptions
and diagnostic guidelines. Geneva, Switzerland: Author.
World Health Organization. (1993). The ICD-10 classification of mental and behavioral disorders: Diagnostic criteria
for research. Geneva, Switzerland: Author.
Page 187
CHAPTER 8

Children with Hearing Impairment


Defining the Problem

Suspected Causes

Special Challenges in Assessment

Expected Patterns of Oral Language Performance

Related Problems
Bradley was 5 years old when it was determined that he had a mild, bilateral sensorineural hearing loss. Prior to
entering kindergarten, his parents described him as a shy child who disliked larger play groups and preferred playing
alone or with one close friend. In a noisy 16-child classroom, the adequacy of his hearing was first questioned by his
kindergarten teacher, who reported that she often had difficulty getting his attention and found his poor attention during
circle time inconsistent with his good attention in one-on-one situations. A hearing screening by a speech-language
pathologist, which was performed because of concerns about delayed phonologic development, was the immediate
source of a referral for the complete audiological examination in which his hearing loss was identified. After detection of
the hearing loss, Bradley was fitted for binaural behind-the-ear aids. (He loved the bright blue earmolds and tubing he
was allowed to choose.) Within a short time of the fitting, Bradley appeared more attentive during circle time and
readily made progress in work on targeted speech distortions.
Sammy, or Samantha on formal occasions, is a 3-year-old whose moderate high-frequency hearing loss was identified
shortly after birth following her failure on a high-risk screening conducted because of her family history of hearing loss.
Because initially an ear-level fitting proved unfeasible, Sammy used a body-worn aid, which was replaced by a behind-
the-ear fitting at age 1½. Six months ago, the use of an FM trainer was extended to the home after continuous use in a
preschool group that she had attended since age 1½. Although she is experiencing some delays in speech, her
communication development otherwise appears on-target.
Desmond’s profound hearing loss was identified using auditory brainstem response (ABR) during his 3-week stay in a
neonatal intensive care unit, following his premature birth at 7 months gestational age with a birth weight of 3.1 pounds.
He required ventilator support for 5 days after birth. Now 5 years old, Desmond’s parents have been frustrated by
Desmond’s slow progress in oral language development despite years of participation in special education and several
failed attempts at successful amplification. Desmond currently uses a vibrotactile aid to increase his awareness of
environmental sounds and his speech reception and is being considered as a candidate for a cochlear implant.
Defining the Problem
Estimates of the prevalence of hearing impairment in children vary from 0.1 to 4%—or from 1 in every 1,000 to 1 in every 25 children—depending on the definitions used (Bradley-Johnson & Evans, 1991; Northern & Downs, 1991). Of
children between the ages of 3 and 17, about 52,000 have impairments severe enough to be termed deafness, where
deafness can be defined as a hearing loss, usually above 70 dB, that precludes the understanding of speech through
listening (Ries, 1994). When all levels of hearing loss are considered, hearing impairment is the most common disability
among American school children (Flexer, 1994).
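The correspondence between the percentage and "1 in N" forms of these prevalence estimates is simple arithmetic; a minimal sketch (the figures themselves come from the sources cited above, and the function name is illustrative):

```python
def one_in_n(percent):
    """Convert a prevalence given as a percentage to a '1 in N' rate."""
    return round(100 / percent)

# 0.1% of children corresponds to 1 in 1,000; 4% to 1 in 25
print(one_in_n(0.1))  # 1000
print(one_in_n(4))    # 25
```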
The negative impact of deafness for the normal acquisition of oral language may seem obvious: You cannot learn about
phenomena with which you have limited experience. In addition, for children with profound hearing loss, this experience
is largely restricted to a sensory channel (i.e., vision) that is mismatched to the most distinctive characteristics of that
phenomenon (i.e., oral language). One line of evidence suggesting how great this mismatch is comes from a growing
body of research suggesting that the structure of oral languages differs substantially from that of visuospatial languages
(such as sign; Bellugi, van Hoek, Lillo-Martin, & O’Grady, 1993). Nonetheless, there is research suggesting that
lipreading becomes more important to oral language development as hearing impairment worsens (Mogford-Bevan,
1993).
Because limiting auditory exposure limits learning opportunities, even children with milder hearing losses—who
therefore obtain greater amounts of acoustic information about oral language than children with greater hearing losses—
experience significant consequences for their spoken language reception in everyday situations. Therefore, although this
chapter focuses most intently on children with greater degrees of hearing impairment, it also alerts readers to the
jeopardy in which children
with even unilateral or “mild” bilateral hearing impairments are placed when it comes to language learning and academic
success (Bess, 1985; Bess, Klee, & Culbertson, 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler,
Oyler, & Matkin, 1988). In the Personal Perspective for this chapter, a teenager describes the ways in which deafness has
affected her school life.
PERSONAL PERSPECTIVE
The following is an excerpt from the transcript of a statement made by Darby, a high school junior with a profound
hearing impairment. She speaks about the academic and personal challenges facing her in school:
“I have never, and most likely never will, hear sounds in the same way as a hearing person. As a result, hearing people
experience things every millisecond of the day that I never will. By the same token, I have experienced things and will
experience things that no hearing person can.
“My deafness makes me different, and that difference makes me strong. I seem to get respect from other people just for
doing things a hearing person can do with ease. For example, watch television, use the telephone, listen to music, and
so on. For whatever reason, I never think about the fact that I am doing something that would normally be difficult for
someone who couldn’t hear. In fact, I have never looked at myself as someone who was limited in any way, someone
who couldn’t do something that any other hearing person could do. I’ve always known that I was different, but even though people would intimate that I wasn’t able to compete on the same level as hearing people, I would ignore them, or maybe I just didn’t “hear” them.
“I have always attended Dalton, a private hearing school. It has never been, and never will be, easy for me. I have
experienced periods of rejection and isolation, but I have proven myself worthy of the privilege of attending this school
by receiving grades as good as many of my hearing peers and better than most.
“I have definitely survived the academic challenges of my school and life. Socially, I still feel though that I’m not
accepted as a true equal, but hey, that’s their problem, they don’t know what they’re missing.” (Ross, 1990, pp. 304–305)
Overall degree of hearing loss, or magnitude, is a major descriptor of hearing impairment, usually based on an estimate
of an individual’s ability to detect the presence of a pure tone at three frequencies important for speech information (500,
1000, and 2000 Hz; Bradley-Johnson & Evans, 1991). Table 8.1 lists major categories of hearing loss and provides some
preliminary information about the effects of that level of loss. Although deafness is not listed as a category in the table, it
is frequently used to refer to a hearing loss greater than or equal to 70 dB (Northern & Downs, 1991).
Table 8.1
Effect of Differing Magnitudes of Hearing Loss

Average Hearing Level (500–2000 Hz) | Description | What Can Be Heard Without Amplification | Handicapping Effects (If Not Treated in First Year of Life) | Probable Needs
0 to 15 dB | Normal range | All speech sounds | None | None
15 to 25 dB | Slight hearing loss | Vowel sounds are heard clearly; unvoiced consonant sounds may be missed | Mild auditory dysfunction in language learning | Consideration of need for hearing aid, speechreading, auditory training, speech therapy, preferential seating
25 to 30 dB | Mild hearing loss | Only some of the speech sounds are heard—the louder voiced sounds | Auditory learning dysfunction, mild language retardation, mild speech problems, inattention | Hearing aid, speechreading, auditory training, speech therapy
30 to 50 dB | Moderate hearing loss | Almost no speech sounds are heard when produced at normal conversational level | Speech problems, language retardation, learning dysfunction, inattention | All of the above, plus consideration of special classroom situation
50 to 70 dB | Severe hearing loss | No speech sounds are heard at normal conversational level | Severe speech problems, language retardation, learning dysfunction, inattention | All of the above, probable assignment to special classes
70+ dB | Profound hearing loss | No speech or other sounds are heard | Severe speech problems, language retardation, learning dysfunction, inattention | All of the above, probable assignment to special classes

Note. From Hearing in Children (4th ed., p. 14), by J. L. Northern and M. P. Downs, 1991, Baltimore: Williams & Wilkins. Copyright 1994 by Williams & Wilkins. Reprinted with permission.
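As a sketch of how these descriptors are applied, the three-frequency average described above can be computed and matched against the Table 8.1 boundaries; the function names here are illustrative rather than drawn from any clinical software:

```python
def pure_tone_average(thresholds_db):
    """Average of pure-tone thresholds (dB HL) at 500, 1000, and 2000 Hz."""
    return sum(thresholds_db[f] for f in (500, 1000, 2000)) / 3

def describe_loss(pta):
    """Map a pure-tone average onto the descriptors of Table 8.1."""
    if pta <= 15:
        return "normal range"
    if pta <= 25:
        return "slight hearing loss"
    if pta <= 30:
        return "mild hearing loss"
    if pta <= 50:
        return "moderate hearing loss"
    if pta <= 70:
        return "severe hearing loss"
    return "profound hearing loss"

# Hypothetical thresholds of 45, 50, and 55 dB HL give a pure-tone average of 50 dB
pta = pure_tone_average({500: 45, 1000: 50, 2000: 55})
print(pta, describe_loss(pta))  # 50.0 moderate hearing loss
```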
The term hard of hearing is used to refer to lesser degrees of hearing loss that allow speech and language acquisition to
occur primarily through audition (Ross, Brackett, & Maxon, 1991).
In addition to the magnitude of loss, related variables that influence how children’s language is affected include (a)
variables affecting the auditory nature of the loss (such as type, configuration, and whether the loss is unilateral or
bilateral), (b) the age at which the hearing loss is acquired, (c) the age at which it is identified, and (d) how well the loss
is managed.
Type of hearing loss—conductive, sensorineural, or mixed—refers to the physiological site responsible for reduced
sensitivity to auditory stimuli. Conductive hearing losses result from conditions that prevent adequate transmission of
sound energy somewhere along the pathway leading from the external auditory canal to the inner ear. They can result
from conditions that block the external ear canal or interfere with the energy-transferring movement of the ossicles
(small bones) of the middle ear. Conductive losses are generally similar across frequencies and, at their most severe, do
not exceed 60 dB (Northern & Downs, 1991). Such losses can often be corrected or significantly reduced using medical
or surgical therapies (Paul & Jackson, 1993).
One particularly common cause of conductive hearing loss is middle ear infection, otitis media. The hearing loss
associated with this condition may be the most widely experienced form of hearing loss, given that 90% of children in
the United States have had at least one episode of otitis media by age 6 (Northern & Downs, 1991). Although not all
episodes of otitis media are associated with hearing losses, when losses do occur their overall magnitude has generally been found to fall between 20 and 30 dB in the affected ear (Fria, Cantekin, & Eichler, 1985).
Sensorineural hearing losses result from damage to the inner ear or to some portion of the nervous system pathways
connecting the inner ear to the brain. They are responsible for the most serious hearing losses, accounting for or
contributing to most hearing losses in the severe to profound range. In addition, they account for most congenital hearing
losses (Scheetz, 1993) and are rarely reversible (Northern & Downs, 1991).
Mixed hearing losses refer to losses in which both conductive and sensorineural components are evident. Because the
conductive components of a mixed hearing loss are generally treatable, such losses often become sensorineural in nature
following effective treatment for the condition underlying the conductive loss. For example, a child with Down
syndrome may experience a mixed loss consisting of a sensorineural loss exacerbated by poor eustachian tube function
and chronic otitis media. Effective management of the middle ear condition can reduce the magnitude of the loss
substantially in many cases. Consequently, clinicians who work with children who have sensorineural losses need to be
especially aware that an already significant degree of loss can be further worsened if middle ear disease goes undetected.
Central auditory processing disorders refer to abnormalities in the processing of auditory stimuli occurring in the absence
of reduced acuity for pure tones or at a more pronounced level than would be expected given the degree of reduced
acuity. In especially severe cases, such difficulties have been described as a specific type of language disorder: verbal
auditory agnosia (Resnick & Rapin, 1991). Although central auditory processing disorders receive increasing attention
by audiologists, their separability from language disabilities and other learning disabilities continues to be debated (Cacace & McFarland, 1998;
Rees, 1973).
Hearing loss configuration refers to the relative amount of loss occurring at different frequency regions of the sound
spectrum. For example, a high-frequency loss is one in which the loss is largely or solely confined to the higher
frequencies of the speech spectrum. In contrast, a flat hearing loss is one in which the degree of loss is relatively constant
across the spectrum.
Knowing the magnitude and configuration of an individual’s hearing loss can help you predict what sounds will be
difficult for him or her to hear at specific loudness levels. A pair of figures may help illustrate this. Figure 8.1 consists of
two frequency × intensity graphs (like those of a traditional audiogram) on which are plotted a variety of common
sounds occurring at various intensity levels and frequencies. The shaded area on Figure 8.1A indicates the sound
frequencies and intensities that might not be heard by children with severe high-frequency hearing losses—children such
as Sammy, who was described at the beginning of the chapter. Although Sammy would easily hear environmental
sounds such as car horns or telephones as well as many speech sounds when they are produced at conversational
loudness levels, she would probably miss most fricative sounds because of their high frequency (high pitch) and low
intensity (softness) when they are produced in the same conversations.
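The prediction described here amounts to comparing a sound's intensity with the listener's threshold at that frequency. A minimal sketch, using invented threshold values that loosely follow a high-frequency configuration like Sammy's:

```python
# Hypothetical audiogram: threshold in dB HL at each measured frequency (Hz)
thresholds = {250: 15, 500: 20, 1000: 30, 2000: 70, 4000: 85, 8000: 90}

def is_audible(freq_hz, level_db, audiogram):
    """A sound is detected when its level exceeds the threshold at the nearest measured frequency."""
    nearest = min(audiogram, key=lambda f: abs(f - freq_hz))
    return level_db > audiogram[nearest]

# A loud, low-frequency sound (e.g., a car horn) is heard; a soft, high-frequency
# fricative is not, matching the shaded region of Figure 8.1A
print(is_audible(500, 90, thresholds))   # True
print(is_audible(4000, 30, thresholds))  # False
```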
Figure 8.1B represents the kind of loss frequently associated with deafness, the kind of loss demonstrated by Desmond.
The negligible amount of auditory information to which Desmond has access is well-illustrated by this figure. The
centrality of visual information to Desmond’s interactions with the world is further brought home when you are told that
even the best available amplification would probably fail to improve Desmond’s access to sound information.
Consequently, it is not surprising that vision has been called “the primary input mode of deaf children” (Ross, Brackett,
& Maxon, 1991) and that management of the communication needs of such children often veers away from methods in
which auditory information plays a major role (Nelson, Loncke, & Camarata, 1993), although the growing effectiveness of cochlear implants may increase the role of audition somewhat, especially as implants are provided at younger ages (Tye-Murray, Spencer, & Woodworth, 1995). A cochlear implant entails the insertion of a sophisticated device that includes an internal receiver/stimulator and an external transmitter and microphone with a micro speech processor (Sanders, 1993). The rapid development and increasing application of cochlear implants make them an exciting advance in the management of severe hearing losses.
Whether one or both ears are affected represents another important factor determining the significance of a hearing loss.
Unilateral hearing losses, ones affecting only one ear, usually have fewer negative consequences than bilateral hearing
losses. That does not mean, however, that unilateral losses are insignificant. Adequate hearing in both ears is of
particular importance when listening to quiet sounds or in noisy surroundings—especially for children. This special
importance of bilateral hearing in children arises because their incomplete language acquisition makes using language
knowledge and environmental context to “guess” the message being conveyed by an imperfect signal much harder for
them than it is for adults. In a study conducted by Bess et al. (1986), about one third of the children who exhibited unilateral sensorineural hearing losses of 45 dB HL or greater were found to have either failed a grade or required special assistance in school.

Fig. 8.1. Figures illustrating the types of sounds that are likely to be heard (unshaded areas) and not heard (shaded areas) for two different hearing losses: a severe high-frequency loss (8.1A) and a profound hearing loss (8.1B). For purposes of clarity, but contrary to most instances in real life, these figures represent hearing loss as identical for each ear. From Hearing in Children (4th ed., p. 17), by Northern and Downs, Baltimore: Williams & Wilkins. Copyright © 1991 by Williams & Wilkins. Adapted by permission.
Despite the importance of the nature of hearing loss affecting a child to that child’s overall outcome for speech and oral
language, several nonauditory factors can play a very significant role. For example, the age at which a hearing loss is
acquired has a tremendous impact on the extent to which it will interfere with the acquisition of oral language.
Congenital hearing losses, those present at birth, are more detrimental than those acquired in early childhood, which in
turn are more detrimental than those acquired in later childhood or adulthood. Even 3 or 4 years of good hearing can
dramatically alter a child’s later language skills (Ross, Brackett, & Maxon, 1991). This fact has led to the use of the term
prelingual hearing loss to refer to a hearing loss acquired before age 2, which is thus thought to be associated with a
more significant impact (Paul & Jackson, 1993).
The age of detection of hearing loss in children is yet another variable affecting the oral language of hearing-impaired
children. The earlier the detection of hearing loss in children, the better the outcome for language acquisition—assuming,
of course, that adequate intervention follows. Recently devised methods, such as the measurement of auditory brainstem-
evoked responses and transient-evoked otoacoustic emissions, permit the detection of even mild hearing loss in children
from shortly after birth (Carney & Moeller, 1998; Mauk & White, 1995; Northern & Downs, 1991). Between 10 and
26% of hearing loss is estimated to exist at birth or to occur within the first 2 years of life (Kapur, 1996), thus making
efforts at detection an ongoing need.
Despite the possibility of early detection, however, hearing loss will escape detection for varying periods of time in
children whose hearing is not screened or is screened prior to the onset of the loss. In a recent study, Harrison and Roush
(1996) surveyed the parents of 331 children who had been identified with hearing loss. They found that when there was
no known risk factor, the median age of identification of hearing loss was about 13 months for severe to profound losses
and 22 months for mild to moderate losses. Although the presence of known risk factors was associated with decreased
age at identification for milder losses (down to about 12 months), identification for more severe losses remained about
the same in this group (12 months). Median additional delays of up to 10 months were observed between identification
of hearing loss and early interventions. These delays represent precious lost time for children whose auditory experience
of the world is compromised. Only in late 1999 have efforts to make universal screening of infant hearing a reality
(Mauk & White, 1995) received momentous support in the form of The Newborn and Infant Hearing Screening and
Intervention Act of 1999. This federal legislation provides new funding for newborn hearing screening grants to
individual states. It is hoped that this funding will cause all states to implement infant screening programs leading to a
revolution in the early identification of hearing loss.
A fourth factor influencing how hearing loss will affect children’s language development is the management of the loss. For children with mild and moderate bilateral or unilateral losses, there is considerable agreement as to the approaches that will optimize their access to the auditory signal on which they will rely for processing information about oral language. Table 8.2 lists some of the types of interventions typically considered in the hearing management of children with these lesser degrees of loss.

Table 8.2
Interventions Used With Children Who Have Mild and Moderate Hearing Impairment (Brackett, 1997)

Method | Function
Personal amplification (e.g., FM radio systems used with remote microphones) | Increase loudness levels of acoustic signals; acoustic signal enhanced relative to background noise levels; a far superior means of dealing with a noisy classroom than preferential seating (Flexer, 1994); one of several types of special amplification systems (Sanders, 1993)
Sound treatment of classrooms (e.g., using carpets, acoustic ceiling tiles, curtains) | Reduction of reverberation and other sources of noise
Preferential seating | Reduction of distance between speaker and child can increase audibility of a signal; sitting next to a child is better than sitting in front of the child (Flexer, 1994), although for children who require visual information, this strategy decreases access to visual information
Inclusion in regular classroom with supplementation through pull-out services | Provision of the wealth of social and academic experiences afforded by regular classrooms, with support designed to preview and review instructional vocabulary as well as work on communication goals inconsistent with the classroom setting (e.g., the earliest stages involved in acquiring a new communicative behavior)
Auditory learning program (e.g., Ling, 1989; Stout & Windle, 1992) | Improvement of the child’s attention to and use of auditory information enhanced by personal and classroom amplification

When it comes to children with greater losses, however, there is much controversy among professionals as well as members of the Deaf community (Coryell & Holcomb, 1997). A frequent battleground for those interested in interventions for deaf youngsters concerns the primacy of oral versus signed language. Arguments favoring an emphasis on oral language stress that the vast majority of society are users of oral language and, therefore, deaf children should be given tools with which to negotiate effectively within that context. Further, it can be stressed that their families will almost always (90% of the time) be composed entirely of hearing individuals (Mogford, 1993).

Arguments favoring an emphasis on sign language stress that the Deaf community is a cohesive subculture in which visuospatial communication is the effective norm. In fact, in recent years, the Deaf community has begun to advocate for a difference rather than disorder perspective on hearing impairment, a political perspective thought to be vital to the emotional and social well-being of its members (Corker, 1996; Harris, 1995). Arguments favoring a strong emphasis on sign language also sadly note that only poor levels of achievement in oral language and particularly poor levels of achievement in written language (which often plateaus at a third-grade level) have been the norm in studies of individuals with severe to profound hearing losses (Dubé, 1996; Paul, 1998).
Total communication was originally proposed as the simultaneous use of multiple communication modes (e.g.,
fingerspelling, sign language, speech, and speech reading) selected with the child’s individual needs in mind. As
implemented, however, total communication has been found typically to consist of the simultaneous use of speech and
one of several sign languages other than American Sign Language (ASL) that use word order and word inflections
closely resembling those of spoken English (Coryell & Holcomb, 1997). The most prominent examples of these sign
languages, sometimes referred to as manually coded English systems, are Seeing Essential English (SEE-1), Signing Exact English (SEE-2), and Signed English. Although most classroom teachers report using this relatively limited form of
total communication (sometimes termed simultaneous communication), it is infrequently used among adults in the Deaf
community (Coryell & Holcomb, 1997).
In a review of studies of treatment efficacy for hearing loss in children, Carney and Moeller (1998) noted a current trend
toward considering oral language as a potential second language for deaf children, to be acquired after some degree of
proficiency in a first (visuospatial) language is attained. This approach, termed the bilingual education model, is seen by
some as having the strengths associated with learning a language (i.e., ASL) for which a cohesive community of users
exists, while at the same time valuing the importance of English competence as a curricular rather than rehabilitative
issue (Coryell & Holcomb, 1997; Dubé, 1996). Data supporting this approach, however, are relatively sparse as yet. To
date, such data consist of evidence of strong academic performance in English by deaf children reared by deaf parents
who are proficient in ASL and evidence that skills in English are strongly related to skills in ASL, independent of
parental hearing status (Moores, 1987; Spencer & Deyo, 1993; Strong & Prinz, 1997).
A recent position statement of the Joint Committee of ASHA and the Council on Education of the Deaf (1998) illustrates
the growing influence of the Deaf community’s insistence that deafness be viewed as a “cultural phenomenon” rather
than a clinical condition (Crittenden, 1993). In that position statement, professionals are cautioned to adopt terminology
that respects the individual and family or caregiver preferences while facilitating the individual’s access to services and
assistive technology. Sensitivity to cultural factors is a requisite for speech-language pathologists in all settings working
with all populations. For speech-language pathologists working with members of the Deaf community, it is a
requirement of critical importance to the deaf child’s social and emotional development.
Suspected Causes
What is currently known about the causes of permanent hearing impairment in children is almost entirely restricted to
studies focused on more serious levels of hearing loss, especially deafness. Although there may be considerable overlap
in the known
causes of deafness and milder degrees of impairment, differences also exist. Because this section limits itself to causes
related to these more severe levels of hearing loss, I remind readers that what I say relates less clearly to children with
milder losses.
Genetic factors are suspected in about half of all cases of deafness (Kapur, 1996; Vernon & Andrews, 1990). Of these
genetically based instances of deafness, about 80% are due to autosomal recessive disorders, almost 20% are autosomal
dominant disorders, and the remainder are sex-linked (Fraser, 1976). Because recessive disorders demand that both parents of an individual contribute a defective gene for their offspring to demonstrate the disorder, although the parents need not show evidence of the disorder themselves, it is relatively uncommon for children with congenital deafness to have parents who are also deaf.
with parents whose first language is oral and who will need to acquire sign as a belated second language if they are to
assist their child’s acquisition of sign.
Genetically caused deafness sometimes occurs within the context of genetic syndromes in which one or more specific
organ systems (e.g., the skeleton, skin, nervous system) are also affected. About 70 such syndromes have been identified,
including Down syndrome, Apert syndrome, Treacher Collins, Pierre Robin, and muscular dystrophy (Bergstrom,
Hemenway, & Downs, 1971). Although most genetically caused deafness will be sensorineural in type, conductive
components are also observed. Some syndromes are associated with progressive hearing losses that worsen over time, often at unpredictable rates. Examples of such syndromes are Friedreich’s ataxia, severe infantile muscular dystrophy, and Hunter syndrome, as well as the closely related Hurler syndrome.
Nongenetic causes of deafness include prenatal rubella, postnatal infection with meningitis, prematurity, Rh factor incompatibility between mother and infant, exposure to ototoxic drugs, syphilis, Ménière’s disease, and mumps (Vernon & Andrews, 1990). Four of these factors—prenatal rubella, meningitis, syphilis, and mumps—are infectious diseases, meaning that their successful prevention can drastically reduce instances of deafness from those causes.
The three noninfectious factors most commonly associated with hearing loss in children are Rh factor incompatibility, exposure to ototoxic drugs, and Ménière’s disease. Rh factor incompatibility refers to a condition in which a mother and the embryo she is carrying have blood types characterized by discrepant Rh factors, a circumstance that stimulates the production of maternal antibodies against the developing child. This condition is currently considered preventable through maternal immunization or the treatment of the infant using phototherapy or transfusions (Kapur, 1996).
Ototoxicity refers to a drug’s toxicity to the inner ear. Although the use of drugs with this side effect is usually avoided
in pregnant women and infants, they may be required as the only effective treatment for some diseases. Monitoring of
hearing can frequently prevent hearing loss in children who require treatment with ototoxic drugs because of infections
or cancer (Kapur, 1996).
Prematurity, birth 2 or more weeks prior to expected due date (Dirckx, 1997), is an increasingly frequent correlate of
hearing impairment. Whereas mortality was once
an almost certain outcome of prematurity, improved neonatal care over the past half century (Vernon & Andrews, 1990)
has resulted in the increased survival of children who nonetheless may show residual effects. Premature birth is most
directly associated with hearing impairment and other co-occurring difficulties (e.g., mental retardation, cerebral palsy)
through the neurologic stresses it places on the infant. Indirect links between prematurity and hearing impairment lie in
the fact that premature birth is frequently precipitated by conditions that are themselves associated with hearing
impairment (such as prenatal rubella, meningitis, and Rh factor incompatibility). Prematurity increases the risk of deafness twentyfold (Kapur, 1996).
Special Challenges in Assessment
When assessing the oral communication skills of children with hearing impairment, the speech-language pathologist is
confronted with numerous threats to the validity of his or her decision making. Therefore, in addition to the usual care
that must be taken to determine the precise questions prompting assessment and factors that may complicate accurate
information gathering, clinicians working with children whose hearing is temporarily (e.g., during episodes of otitis
media) or permanently impaired, must consider a larger than usual range of possible complicating factors and necessary
adaptations. Table 8.3 lists some of the considerations related to the evaluation of language skills of a child with hearing
impairment.
A major first consideration for children with very severe hearing loss is the choice of language or languages in which the
child is to be assessed. Often, testing in both a sign and an oral language is reasonable for obtaining information about
potentially optimal performance as well as about development with the alternative form.
Complexities of the child’s hearing loss and of its management will need to be considered in making this decision,
because children who may be considered deaf do not always receive enough exposure to sign language to consider it
their first language (Mogford, 1993).
Although efforts to standardize assessments of ASL have begun (e.g., Lillo-Martin, Bellugi, & Poizner, 1985; Prinz & Strong, 1994; Supalla et al., 1994), children’s performance in ASL (the most widely used sign language in the United States) is usually assessed informally by individuals with high levels of proficiency in ASL. A small number of
standardized tools have been developed. Among these is the Carolina Picture Vocabulary Test (Layton & Holmes, 1985), which is designed for use with children ages 2 years, 8 months to 18 years whose primary mode of communication is sign. Dubé (1996) discussed the current pressing need for better methods of assessing children’s competence in both ASL and English.

Table 8.3
Considerations When Planning the Assessment of a Child With Impaired Hearing

- Determine what modality or modalities will be used.
- Match demands placed on hearing to the assessment question.
- Ensure that instructions are understood.
- Identify an appropriate normative group for norm-referenced interpretations.
- Ensure optimal attention and minimal distractions.
- Consider the use of modifications and describe them in reports of testing.
- Rely on multiple measures and team input for preparation and interpretation.
For children with severe to profound hearing losses, assessment of oral language may also require interactions in ASL (e.
g., to ensure that a task is understood). Maxwell (1997) reasonably pointed out that deaf individuals often use both sign and spoken language depending on the demands of the communicative situation, and that determinations of what modes of communication a child uses are too often based on hearsay or limited by the clinician’s own ability to identify the communication system being used. Therefore, speech-language pathologists who work frequently with hearing-impaired children should themselves be proficient in sign and, ideally, in both signed English and ASL. Those who are not proficient but receive occasional requests to serve hearing-impaired children should proceed carefully in determining what can be done in the absence of such proficiency and should be prepared to make referrals as needed to ensure optimal assessment data. The remainder of this section is devoted to considerations that come into play during oral language testing.
For children with all degrees of hearing loss, one of the first considerations in oral language testing is the listening
condition confronting the child. For example, is the setting in which the testing is done relatively quiet? If very quiet,
optimal performance may be assessed (assuming other factors are optimal). If less quiet, optimal performance will be
unlikely, but useful information for extrapolating typical performance in similar settings may be obtained. In many cases, language testing is performed for purposes of examining optimal performance. However, if the purpose of testing is to determine the kind of difficulty facing the child in a conventional classroom, then testing in noisier environments would be indicated. Ying (1990) discussed a systematic approach to examining the child’s functional auditory skills under conditions that vary (a) access to both visual and auditory information versus auditory information only, (b) auditory stimuli that are close versus far, and (c) noisy versus quiet environments.
Knowledge of the child’s listening conditions includes not only information about the ambient environment, but also
about the status of the child’s hearing and hearing aid at the time of testing. Because children’s hearing can be affected more readily by middle ear infections than can adults’, it is particularly important to know whether the child has an upper respiratory infection or is showing signs of reduced hearing, such as altered responses to auditory stimuli or appearing confused or in pain in noisy situations (Flexer, 1994). Ascertaining directly or indirectly that a hearing aid has charged batteries and is functioning well is time well spent, given studies indicating that children’s hearing aids are frequently found to be functioning unacceptably (e.g., Musket, 1981; Worthington, Stelmachowicz, & Larson, 1986). In addition, when hearing aids are in place but not functioning because of a dead battery, their “use” has been found to reduce hearing by an additional 25 to 30 dB at critical speech frequencies (Smedley & Plapinger, 1988).
Ensuring that directions are understood is obviously most crucial for receptive language testing, but can be critical for
expressive language testing as well, particularly when verbal instructions are used. Besides the steps described earlier,
which are
aimed at improving the child’s auditory access to information, seating to make the tester’s face visible and well-lit (not
back-lit) can help. Because children with hearing loss may fail to signal their incomplete understanding of directions
(Paul & Jackson, 1993), the clinician must be particularly watchful for hesitations or facial expressions indicating a lack
of understanding. In addition, students should be encouraged to ask questions when they are uncertain (Bradley-Johnson
& Evans, 1991).
When the question being asked in an assessment is whether the child’s performance is like that of his or her peers, the ticklish question of norms arises. Sometimes it is assumed that those peers should be a group that is similar in age to the tested child (e.g., because these are the children with whom a child will be compared at school and with whom he or she shares a common developmental history; Bradley-Johnson & Evans, 1991). In such cases, finding appropriate norms is relatively easy, and the use of peers with normal hearing is quite appropriate (Brackett, 1997; Ying, 1990). However, that normative group will not help determine whether any observed decrements in performance are due primarily to hearing differences or whether additional cognitive or environmental barriers to language learning exist. For those questions, norms should ideally consist of children with similar patterns of hearing impairment and similar developmental experiences. Alternatively, interpretations based on information other than norms should be considered (for detailed discussion of informal methods, see Maxwell, 1997; Moeller, 1988; Ross, Brackett, & Maxon, 1991; and Yoshinaga-Itano, 1997).
Table 8.4 lists language tests that have been developed for or normed on children with hearing impairment (Bradley-
Johnson & Evans, 1991). Even when such norms are available, determining the appropriateness of the norms still hinges
on the test user’s examination of the test manual for specific information about the normative sample. For children with
hearing impairment, factors affecting the relevance of norms include the group’s age of onset of the hearing loss, degree
and type of loss, etiology, presence of other significant problems, and the communication used during testing (Bradley-
Johnson & Evans, 1991).
Once an appropriate measure has been selected, increasing the child’s attention to the task and minimizing distractions
further enhance the possibility of obtaining information reflecting optimal performance. Positioning oneself close to the
child and paying close attention to the child’s gaze as a signal of current focus can help increase attention while also
minimizing distractions (Maxwell, 1997).
By modifying testing procedures, one risks invalidating normative comparisons. However, when testing modifications
are noted in reports on the testing and discussed for their possible effects on test validity, their use can actually improve
validity by removing sources of error that are unrelated to the skill or attribute being tested. Ying (1990) discussed a
number of possible modifications to use when testing children with hearing impairment. These include asking the child
to repeat all verbal stimuli to ensure that poor reception is not undermining performance and using extra demonstration
items to ensure that the child understands the task demands. Another possible modification she recommended was
repeating verbally presented test items. Also, when standardized instructions call for simultaneous presentation of verbal
and visual stimuli, she suggested altering procedures so that the verbal stimulus is presented
first followed by the visual stimulus, thus allowing the child to look at the clinician as he or she speaks. Use of an FM listening system during testing can also be recommended for obtaining information about optimal performance (Brackett, 1997).

Table 8.4
Language Tests Designed or Adapted for Children With Hearing Impairment (Bradley-Johnson & Evans, 1991)

Battelle Developmental Inventory (Newborg, Stock, Wnek, Guidubaldi, & Svinicki, 1984)
Ages: Birth to 8 years of age
Description: Tests across 5 domains: personal-social, adaptive, motor, communication, and cognitive; purpose is to identify children with handicaps, determine strengths and weaknesses, and help in planning instruction and monitoring progress
Comments: Although children with hearing impairment are described as an appropriate population for testing, neither norms nor studies validating that use are contained in the test manual; adaptations of items in the communication domain have been described as “inappropriate”

Grammatical Analysis of Elicited Language—Pre-sentence Level (GAEL-P; Moog, Kozak, & Geers, 1983)
Ages: 3 to 6 years
Description: Skills assessed for comprehension, prompted production, and imitated production, with items at 3 levels: readiness, single words, and word combinations
Comments: Standardized on 150 hearing-impaired children enrolled in oral educational programs whose hearing impairment was not described; no data for children who use manual communication; scores expressed as percentiles

Grammatical Analysis of Elicited Language—Simple Sentence Level (GAEL-S; Moog & Geers, 1985)
Ages: 5 to 9 years
Description: Skills are assessed in terms of prompted production and imitation
Comments: Norms obtained for 3 groups of children with hearing impairment and one non-hearing-impaired group; considerable information is available about these groups; one of the groups with hearing impairment came from total communication backgrounds and was tested using that method; 94 items assess articles, modifiers, pronouns, subject nouns, object nouns, wh- questions, verbs, verb inflections, copula inflections, prepositions, and negation; scores expressed as percentiles or language quotients (M = 100; SD = 15)

Grammatical Analysis of Elicited Language—Complex Sentence Level (GAEL-C; Moog & Geers, 1980)
Ages: 8 to 12 years
Description: Skills are assessed in terms of prompted production and imitation
Comments: Two groups of children, one with and one without hearing impairment, were studied; the hearing-impaired children had severe to profound levels of impairment and were without other problem areas; 16 grammatical categories are assessed: articles, noun modifiers, subject nouns, object nouns, noun plurals, personal pronouns, indefinite and reflexive pronouns, conjunctions, auxiliary verbs, first clause verbs, verb inflections, infinitives and participles, prepositions, negation, and wh- questions; scores expressed as percentiles or language quotients (M = 100; SD = 15)

Rhode Island Test of Language Structure (Engen & Engen, 1983)
Ages: 5 to 17+ years
Description: Designed to assess comprehension of syntax
Comments: Normed on 364 children with hearing impairment ranging from moderate to profound and 283 children without hearing impairment; considerable information is available about the hearing-impaired group; 100 items are used to assess 20 sentence types, including simple sentences, imperatives, negatives, passives, dative sentences, expanded simple sentences, adverbial clauses, relative clauses, conjunctions, deleted sentences, noninitial subjects, embedded imperatives, and complements; test may be orally presented or presented through simultaneous presentation of signed and spoken English; results are presented as percentiles or standard scores

Scales of Early Communication Skills (SECS; Moog & Geers, 1975)
Ages: 2 to 9 years
Description: Verbal and nonverbal skills are assessed receptively and expressively through teacher ratings
Comments: Standardized on 372 children from 2 years to 8 years, 11 months, with profound hearing impairments from oral programs; interexaminer reliability data only; no test–retest data or validity information

Teacher Assessment of Grammatical Structures (TAGS; Moog & Kozak, 1983)
Ages: Not specified
Description: Criterion-referenced teacher rating of children’s grammatical structures at four levels: comprehension, imitated production, prompted production, and spontaneous production
Comments: There are 3 levels of the test: pre-sentence, simple sentence, and complex sentence; can be used with children who use signed or spoken English; structures examined are less comprehensive than in other measures developed by Moog and her colleagues
It is unlikely that any one measure or any one person who interacts with a hearing-impaired child will capture all of the child’s strengths and weaknesses as a communicator (Moeller, 1988). Consequently, the speech-language pathologist will need to rely on multiple measures and seek team input both as an assessment is planned and as it is interpreted. In addition to the audiologist, the child’s educators, psychologists, and especially those who know the child best—the child him- or herself and the child’s parents—can be valuable sources of information. Excellent recommendations for effective interactions with families can be found in Donahue-Kilburg (1992) and Roush and Matkin (1996).
Expected Patterns of Oral Language Performance
Despite evidence that even children with mild or unilateral hearing losses are at risk for academic difficulties (Bess,
1985; Bess et al., 1986; Carney & Moeller, 1998; Culbertson & Gilbert, 1986; Oyler et al., 1988), relatively little is
known about their oral or sign language development (Mogford-Bevan, 1993). To date, most research on oral language
development in children with hearing impairment has focused on children with more severe congenital losses (Mogford-
Bevan, 1993) or with the fluctuating hearing loss associated with otitis media (Klein & Rapin, 1992).
The fluctuating hearing loss associated with otitis media appears more important when combined with other risk factors
for disordered language development than it does when viewed as a single explanatory factor (Klein & Rapin, 1992;
Paul, 1995). In contrast, there is considerable evidence that deaf children and those who are hard of hearing experience
difficulties across all oral language domains and modalities—at least when comparisons are made against same-age
peers (Mogford-Bevan, 1993).
Syntax has been described as the “most severely affected aspect of language” in children with hearing loss that occurs
congenitally or in early childhood (Mogford-Bevan, 1993). Phonology is understandably quite affected, although some
children who appear to derive all of their phonological information visually (through speech reading) demonstrate the
ability to use the phonological code and show many phonological patterns consistent with younger, hearing children
(Mogford-Bevan, 1993). Documented semantic deficits involve lexical items referring to sounds and concepts related to
the ordering of events across time, and possibly, to the use of metaphorical language (Mogford-Bevan, 1993). Pragmatic
deficits are sometimes described and attributed to the close relationship of pragmatics to syntax as well as to changes that
occur in conversational interaction on the part of speaker and listener when one is deaf. A different pattern of
conversational initiation and turn-taking represents the milieu in which such children acquire their knowledge of
language use (Mogford-Bevan, 1993; Yoshinaga-Itano, 1997). Therefore, it has been suggested that comparisons with
hearing peers may not prove to be a useful means of understanding the pragmatic development of deaf children. In a
recent article, Yoshinaga-Itano (1997) described a comprehensive approach to assessing pragmatics, semantics, and syntax among children with hearing impairment in which the interrelationships of these domains were stressed and both informal and formal measures were used.
Related Problems
Children with hearing loss appear to be at increased risk for a number of problems (e.g., Voutilainen, Jauhiainen, &
Linkola, 1988). This increased risk may arise because the cause of the hearing loss has multiple negative outcomes (e.g.,
some genetic syndromes or infections can cause both mental retardation and hearing loss). Alternatively, hearing loss may make children more vulnerable (e.g., children who are less able to communicate for any reason may be at greater risk for psychosocial difficulties). Despite a convergence of evidence suggesting increased risk, the specific prevalence of multiple handicaps in children with hearing loss is a matter of considerable debate (Bradley-Johnson & Evans, 1991).
The prevalence of specific problems also appears to be related to etiology. For example, whereas children whose hearing impairments are inherited or of unknown etiology tend to have fewer additional problems, those whose hearing impairment is due to cytomegalovirus are at increased risk for behavioral problems (Bradley-Johnson & Evans, 1991). In a 1979 study examining additional problem areas for children with hearing impairment (Karchmer, Milone, & Wolk,
1979), the most common additional problems were mental retardation (7.8%), visual impairment (7.4%), and emotional–
behavioral disorder (6.7%). Although each of these problems was found to occur in less than 10% of children with
hearing loss, their prevalence was still considerably higher than in children without hearing loss (Bradley-Johnson &
Evans, 1991).
The increased prevalence of emotional–behavioral disorders is of interest because of the special management issues that
accompany it. Biological factors may be responsible for emotional–behavioral disorders in children with hearing loss.
However, it has also been suggested that mismatches between the child’s communication needs and capacities and those
of his or her caregivers and peers may contribute to special environmental stresses that increase a child’s risk of these
disorders (Paul & Jackson, 1993). Paul and Jackson provided a fascinating discussion of the literature describing the
subtle and not-so-subtle differences in world experience that accompany deafness.
The one problem area in which children with hearing loss were found to be at reduced risk in the study by Karchmer et
al. (1979) was learning disorders, a finding that some authors have attributed to the effects of overshadowing (Goldsmith
& Schloss, 1986). Overshadowing is the tendency for professionals to focus on a primary problem to a degree that causes
them to overlook other, significant problem areas. Although overshadowing may be one source of underidentification of
learning disabilities in children with hearing loss, another possible source is the tendency of researchers and clinicians to define learning disabilities as “specific learning disabilities,” a definition that excludes children with other problems known to affect learning. The question remains, however, whether some children with a hearing loss have a learning disability whose origin is unrelated to that hearing loss.
Summary
1. Permanent hearing loss in children encompasses both (a) children who are hard of hearing, who will learn speech
primarily through auditory means, and (b) children who are deaf, who may acquire speech primarily through vision.
2. Characteristics of hearing losses that affect the impact of the loss include degree of loss (mild, moderate, severe,
profound; hard of hearing, deafness), type of loss (conductive, sensorineural, mixed), configuration (flat, high-frequency,
low-frequency), laterality (unilateral vs. bilateral), and age of onset (congenital, acquired).
3. Genetic sources account for about 50% of all cases of deafness, with remaining causes including infectious disease, Rh factor incompatibility, and exposure to ototoxic drugs.
4. Even mild or unilateral hearing loss can negatively affect children’s language learning and academic progress, and there is some evidence to suggest that the transient hearing loss associated with otitis media can interact with other risk factors to undermine children’s learning (Peters, Grievink, van Bon, Van den Bercken, & Schilder, 1997).
5. Management of the hearing loss for children who are hard of hearing ideally includes amplification (hearing aids and
FM system use), sound treatment of the child’s language learning environment, speech-language intervention, and
classroom support as needed.
6. Even under current programs of early identification and subsequent intervention, deafness poses a grim threat to children’s normal acquisition of an oral language.
7. Current controversies in deafness include the relative importance of oral versus sign languages in children’s
acquisition of communication competence and the role of the Deaf culture as a political force.
8. Challenges in the assessment of communication of children with hearing loss include difficulties in determining the
mode(s) in which to conduct testing (e.g., oral, ASL, Total Communication) as well as a scarcity of both appropriate
developmental expectations for communication acquisition and standardized norm-referenced measures for this
population in any mode.
Key Concepts and Terms
cochlear implant: a prosthetic device that provides stimulation of the acoustic nerve in response to sound and is used
with individuals who have little residual hearing.
conductive hearing loss: a hearing loss caused by an abnormality affecting the transmission of sound and mechanical
energy from the outer to the inner ear.
deafness: a hearing loss greater than or equal to 70 dB HL, which precludes the understanding of speech through
audition.
FM (frequency modulated) radio systems: one of several systems designed to address the problems of low signal-to-
noise ratios and reverberation occurring in settings such as classrooms; these are used in combination with personal
hearing aids.
hard of hearing: having a degree of hearing loss usually less than 70 dB HL, which allows speech and language
acquisition to occur primarily through audition.
hearing loss configuration: the pattern of hearing loss across sound frequencies—for instance, a high-frequency loss is
one in which the loss is greatest in the high frequencies.
mixed hearing loss: a hearing loss with both conductive and sensorineural components.
otitis media: middle ear infection.
otoacoustic emissions: low-level audio frequency sounds that are produced by the cochlea as part of the normal hearing process (Lonsbury-Martin, Martin, & Whitehead, 1997).
ototoxicity: the property, found in some drugs and environmental substances, of being poisonous to the inner ear.
overshadowing: the tendency for professionals to focus on a primary problem to a degree that causes them to overlook
other, significant problem areas.
prelingual hearing loss: a hearing loss acquired before age 2, which is thought to be associated with a more significant
impact.
prematurity: birth 2 or more weeks prior to expected due date.
Rh factor incompatibility: a condition in which the blood of mother and infant have discrepant Rh factors, resulting in maternal antibody production that can prove harmful to the infant if untreated.
sensorineural hearing loss: hearing loss due to pathology affecting the inner ear or nervous system pathways leading to
the cortex.
Study Questions and Questions to Expand Your Thinking
1. The tendency to have a diagnosis such as deafness overshadow other significant but less severe conditions is an
understandable but quite unfortunate clinical error. How might you avoid this kind of error in clinical practice?
2. Protective ear plugs (e.g., EAR Classic) produce the equivalent of a mild (approximately 20–30 dB) hearing loss. Find
a pair and use them in three different listening conditions. For example, talking with a friend face to face in a quiet
setting, listening to a lecture from your usual seat in the classroom, and watching the TV news with the loudness level set
at a comfortable listening level (before you put the plugs in). Write down what you hear.
3. Repeat the experiment from Question 2 using only one ear plug. Besides noting what you hear, note whether you
changed anything else about your behavior as you listened and talked.
4. Briefly describe an argument you might make favoring the use of total communication with a deaf child born to
hearing parents.
5. Repeat Question 4, but argue in favor of the use of ASL only with the same child.
6. Consider the etiologies described for hearing loss in this chapter. What preventive measures might help reduce the
occurrence of hearing loss in infants? Are there any of these measures in which you could play a role as a school-based
speech-language pathologist? As a citizen of your local community?
7. List four things you would want to be sure to remember as you prepare for the oral language evaluation of a child who
is hard of hearing and who regularly uses a hearing aid, where the purpose of the evaluation is to determine the child’s
optimal performance.
Recommended Readings
Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech, Language, and Hearing Research, 41, S61–S84.
Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.
Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.
Scheetz, N. A. (1993). Orientation to deafness. Boston: Allyn & Bacon.
References
American Speech-Language-Hearing Association and the Council on Education of the Deaf. (1998). Hearing loss:
Terminology and classification; position statement and technical report. ASHA, 40 (Suppl. 18), pp. 22–23.
Bellugi, U., van Hoek, K., Lillo-Martin, D., & O’Grady, L. (1993). The acquisition of syntax and space in young deaf
signers. In D. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 132–149).
Mahwah, NJ: Lawrence Erlbaum Associates.
Bergstrom, L., Hemenway, W. G., & Downs, M. P. (1971). A high risk registry to find congenital deafness.
Otolaryngological Clinics of North America, 4, 369–399.
Bess, F. H. (1985). The minimally hearing-impaired child. Ear and Hearing, 6(1), 43–47.
Bess, F., Klee, T., & Culbertson, J. L. (1986). Identification, assessment and management of children with unilateral sensorineural hearing loss. Ear and Hearing, 7(1), 43–51.
Brackett, D. (1997). Intervention for children with hearing impairment in general education settings. Language, Speech,
and Hearing in Schools, 28, 355–361.
Bradley-Johnson, S., & Evans, L. D. (1991). Psychoeducational assessment of hearing-impaired students: Infancy
through high school. Austin, TX: Pro-Ed.
Cacace, A. T., & McFarland, D. J. (1998). Central auditory processing disorder in school-aged children: A critical review. Journal of Speech, Language, and Hearing Research, 41, 355–373.
Carney, A. E., & Moeller, M. P. (1998). Treatment efficacy: Hearing loss in children. Journal of Speech, Language, and Hearing Research, 41, S61–S84.
Corker, M. (1996). Deaf transitions: Images and origins of deaf families, deaf communities and deaf identities. Bristol,
PA: Jessica Kingsley.
Coryell, J., & Holcomb, T. K. (1997). The use of sign language and sign systems in facilitating the language acquisition
and communication of deaf students. Language, Speech, and Hearing Services in Schools, 28, 384–394.
Crittenden, J. B. (1993). The culture and identity of deafness. In P. V. Paul & D. W. Jackson (Eds.), Toward a
psychology of deafness: Theoretical and empirical perspectives (pp. 215–235). Needham Heights, MA: Allyn & Bacon.
Culbertson, J. L. & Gilbert, L. E. (1986). Children with unilateral sensorineural hearing loss: Cognitive, academic, and
social development. Ear and Hearing, 7(1), 38–42.
Dirckx, J. H. (1997). Stedman’s concise medical dictionary for the health professions. Baltimore: Williams & Wilkins.
Donahue-Kilburg, G. (1992). Family-centered early intervention for communication disorders: Prevention and
treatment. Gaithersburg, MD: Aspen.
Dubé, R. V. (1995). Language assessment of deaf children: American Sign Language and English. Journal of the
American Deafness and Rehabilitation Association, 29, 8–16.
Engen, E., & Engen, T. (1983). The Rhode Island Test of Language Structure. Baltimore: University Park Press.
Flexer, C. (1994). Facilitating hearing and listening in young children. San Diego, CA: Singular Press.
Fraser, G. R. (1976). The causes of profound deafness in childhood. Baltimore: The Johns Hopkins University Press.
Fria, T. J., Cantekin, E. I., & Eichler, J. A. (1985). Hearing acuity of children with otitis media with effusion.
Otolaryngology—Head and Neck Surgery, 111, 10–16.
Goldsmith, L., & Schloss, P. J. (1986). Diagnostic overshadowing among school psychologists working with hearing-
impaired learners. American Annals of the Deaf, 131, 288–293.
Harris, J. (1995). The cultural meaning of deafness. Brookfield, VT: Ashgate.
Harrison, M., & Roush, J. (1996). Age of suspicion, identification, and intervention for infants and young children with
hearing loss: A national study. Ear and Hearing, 17(1), 55–62.
Kapur, Y. P. (1996). Epidemiology of childhood hearing loss. In S. E. Gerber (Ed.), The handbook of pediatric
audiology (pp. 3–14). Washington, DC: Gallaudet University Press.
Karchmer, M. A., Milone, M. N., & Wolk, S. (1979). Educational significance of hearing loss at three levels of severity.
American Annals of the Deaf, 124, 97–109.
Klein, S. K., & Rapin, I. (1992). Intermittent conductive hearing loss and language development. In D. Bishop & K.
Mogford (Eds.), Language development in exceptional circumstances (pp. 96–109). Mahwah, NJ: Lawrence Erlbaum
Associates.
Layton, T. L., & Holmes, D. W. (1985). Carolina Picture Vocabulary Test. Austin, TX: Pro-Ed.
Lillo-Martin, D., Bellugi, U., & Poizner, H. (1985). Tests for American Sign Language. San Diego: The Salk Institute for
Biological Studies.
Ling, D. (1989). Foundations of spoken language for hearing impaired children. Washington, DC: A. G. Bell
Association for the Deaf.
Lonsbury-Martin, B. L., Martin, G. K., & Whitehead, M. L. (1997). Distortion-production otoacoustic emissions. In M.
S. Robinette & T. J. Glattke (Eds.), Otoacoustic emissions: Clinical applications (pp. 83–109). New York: Thieme.
Mauk, G. W., & White, K. R. (1995). Giving children a sound beginning: The promise of universal newborn hearing
screening. Volta Review, 97(1), 5–32.
Maxwell, M. M. (1997). Communication assessments of individuals with limited hearing. Language, Speech, and Hearing Services in Schools, 28, 231–244.
Moeller, M. P. (1988). Combining formal and informal strategies for language assessment of hearing-impaired children.
Journal of the Academy of Rehabilitative Audiology. Monograph Supplement, 21, 73–99.
Mogford, K. (1993). Oral language acquisition in the prelinguistically deaf. In D. Bishop & K. Mogford (Eds.),
Language development in exceptional circumstances (pp. 110–131). Mahwah, NJ: Lawrence Erlbaum Associates.
Mogford-Bevan, K. (1993). Language acquisition and development with sensory impairment: Hearing impaired children.
In G. Blanken, J. Dittmann, H. Grimm, J. C. Marshall, & C.-W. Wallesch (Eds.), Linguistic disorders and pathologies: An international handbook (pp. 660–679). Berlin, Germany: de Gruyter.
Moog, J. S., & Geers, A. E. (1975). Scales of Early Communication Skills. St. Louis, MO: Central Institute for the Deaf.
Moog, J. S., & Geers, A. E. (1980). Grammatical analysis of elicited language: Complex sentence level. St. Louis, MO:
Central Institute for the Deaf.
Moog, J. S., & Geers, A. E. (1985). Grammatical analysis of elicited language: Simple sentence level. St. Louis, MO:
Central Institute for the Deaf.
Moog, J. S., & Kozak, V. J. (1983). Teacher assessment of grammatical structure. St. Louis, MO: Central Institute for
the Deaf.
Moog, J. S., Kozak, V. J., & Geers, A. E. (1983). Grammatical analysis of written language: Pre-sentence level. St.
Louis, MO: Central Institute for the Deaf.
Moores, D. F. (1987). Educating the deaf. Boston: Houghton Mifflin.
Musket, C. H. (1981). Maintenance of personal hearing aids. In M. Ross, R. J. Roeser, & M. Downs (Eds.), Auditory
disorders in school children (pp. 229–248). New York: Thieme & Stratton.
Nelson, K. E., Loncke, F., & Camarata, S. (1993). Implications of research on deaf and hearing children’s language
learning. In M. Marschark & M. D. Clarke (Eds.), Psychological perspectives on deafness (pp. 123–152). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Newborg, J., Stock, J. R., Wnek, L., Guidubaldi, J., & Svinicki, J. (1984). Battelle Developmental Inventory. Allen, TX:
DLM Teaching Resources.
Northern, J. L., & Downs, M. P. (1991). Hearing in children (4th ed.). Baltimore: Williams & Wilkins.
Oyler, R. F., Oyler, A. L., & Matkin, N. D. (1988). Unilateral hearing loss: Demographics and educational impact.
Language, Speech, and Hearing Services in the Schools, 19, 201–209.
Paul, P. V. (1998). Literacy and deafness. Boston: Allyn & Bacon.
Paul, P. V., & Jackson, D. W. (1993). Toward a psychology of deafness: Theoretical and empirical perspectives.
Needham Heights, MA: Allyn & Bacon.
Paul, P. V., & Quigley, S. P. (1994). Language and deafness (2nd ed.). San Diego, CA: Singular.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis: Mosby.
Peters, S. A. F., Grievink, E. H., van Bon, W. H. J., Van den Bercken, J. H. L., & Schilder, A. G. M. (1997). The
contribution of risk factors to the effect of early otitis media with effusion on later language, reading, and spelling.
Developmental Medicine and Child Neurology, 39, 31–39.
Prinz, P., & Strong, M. (1994). A test of ASL. Unpublished manuscript, San Francisco State University, California
Research Institute.
Rees, N. S. (1973). Auditory processing factors in language disorders: The view from Procrustes’ bed. Journal of Speech
and Hearing Disorders, 38, 304–315.
Resnick, T. J., & Rapin, I. (1991). Language disorders in children. Psychiatric Annals, 21, 709–716.
Ries, P. W. (1994). Prevalence and characteristics of persons with hearing trouble: United States, 1990–91. National
Center for Health Statistics. Vital Health Statistics, 10 (188).
Ross, M. (1990). Hearing impaired children in the mainstream. Parkton, MD: York Press.
Ross, M., Brackett, D., & Maxon, A. (1991). Assessment and management of mainstreamed hearing-impaired children:
Principles and practices. Austin, TX: Pro-Ed.
Roush, J., & Matkin, N. D. (1994). Infants and toddlers with hearing loss: Family centered assessment and intervention.
Baltimore: York Press.
Sanders, D. A. (1993). Management of hearing handicap. Englewood Cliffs, NJ: Prentice-Hall.
Scheetz, N. A. (1993). Orientation to deafness. Needham Heights, MA: Allyn & Bacon.
Smedley, T., & Plapinger, D. (1988). The nonfunctioning hearing aid: A case of double jeopardy. The Volta Review,
February/March, 77–84.
Spencer, P. E., & Deyo, D. A. (1993). Cognitive and social aspects of deaf children’s play. In M. Marschark & M. D.
Clarke (Eds.), Psychological perspectives on deafness (pp. 65–91). Hillsdale, NJ: Lawrence Erlbaum Associates.
Stout, G. G., & Windle, J. (1992). Developmental approach to successful listening II—DASL II. Denver: Resource Point.
Strong, M., & Prinz, P. (1997). A study of the relationship between American Sign Language and English literacy.
Journal of Deaf Studies and Deaf Education, 2(1), 37–46.
Supalla, T., Newport, E., Singleton, J., Supalla, S., Metlay, D., & Coulter, G. (1994). Test Battery for American Sign
Language Morphology and Syntax. Burtonsville, MD: Linstok Press.
Tye-Murray, N., Spencer, L., & Woodworth, G. G. (1995). Acquisition of speech by children who have prolonged
cochlear implant experience. Journal of Speech and Hearing Research, 38(2), 327–337.
Vernon, M., & Andrews, J. F. (1990). Other causes of deafness: Their psychological role. The psychology of deafness
(pp. 40–67). New York: Longman.
Voutilainen, R., Jauhiainen, T., & Linkola, H. (1988). Associated handicaps in children with hearing loss. Scandinavian
Audiological Supplement, 33, 57–59.
Worthington, D. W., Stelmachowicz, P., & Larson, L. (1986). Audiological evaluation. In M. J. Osberger (Ed.),
Language and learning skills of hearing impaired students. American Speech-Language-Hearing Association
Monographs, 23, 12–20.
Ying, E. (1990). Speech and language assessment: Communication evaluation. In M. Ross (Ed.), Hearing impaired
children in the mainstream (pp. 45–60). Parkton, MD: York Press.
Yoshinaga-Itano, C. (1997). The challenge of assessing language in children with hearing loss. Language, Speech, and
Hearing Services in Schools, 28, 362–373.
PART III

CLINICAL QUESTIONS DRIVING ASSESSMENT
CHAPTER 9

Screening and Identification: Does This Child Have a Language Impairment?

The Nature of Screening and Identification
Special Considerations When Asking This Clinical Question
Available Tools
Practical Considerations
Since his infancy, Serge’s parents had suspected that there was something different about their third child. Although he
was a healthy and friendly baby, he rarely vocalized and used only a few intelligible words by the time he was 3. He also
seemed able to ignore much of what went on around him while being extraordinarily sensitive to loud noises such as
motorcycles or a TV turned up by his older siblings. On the basis of Serge’s mother’s reports and the results of the
Denver II (Frankenburg, Dodds, & Archer, 1990), an early educator at a preschool screening recommended a complete
speech-language and hearing evaluation.
Amelia had “just gotten by” in the early grades. Although she never performed particularly well, she rarely failed
assignments and never received a failing grade. She was well organized, attentive, and ever so eager to please. Her
parents were accepting of her performance because they, too, had never done terribly well in school; they had just been
happy that she was enjoying it so much. All of her enjoyment vanished,
however, in the fourth grade, when the language of the classroom became more complex and more dependent on the
books being used. She pretended to be sick in order to avoid school and cried in frustration when the work seemed too
hard. Her teacher and the school speech-language pathologist were so alarmed by her behavior and by the quality of
her written and oral discourse that they decided an in-depth examination of her oral language and literacy skills was
necessary immediately.
The Nature of Screening and Identification
Screening and identification of language disorders are closely related enterprises. Screening procedures aid clinicians in
making a relatively gross decision—Should this child’s communication be scrutinized more closely for the possible
presence of a language disorder? Identification, on the other hand, takes that question several steps further. Does this
child have a language disorder, a difference in language, or both? Often this complex question is tied to yet another
question: Is this child eligible for services within a particular setting?
Screening
In many cases, referrals by concerned parents, teachers, or physicians function as indirect screening mechanisms.
Nonetheless, alternative procedures are needed in cases when such indirect methods are unlikely to occur or are
unsuccessful. Although detection may readily occur at the behest of concerned families facing severe problems,
detection may be delayed when the problems are mild (e.g., when they consist of subtle difficulties in comprehension) or
when they are unaccompanied by obvious physical or cognitive disabilities (Prizant & Wetherby, 1993).
Screening is typically used when the number of individuals under consideration makes the use of more elaborate
methods impractical—usually from the perspectives of both time and money. Much of the current thinking about
screening and its relationship to identification is borrowed from the realm of public health (e.g., Thorner & Remein,
1962). In that context, screenings are designed to be quick, inexpensive, and capable of being conducted by individuals
with lesser amounts of training. Similarly, in speech-language pathology, the administration and interpretation of
screening methods should require minimal time and expertise. Nonetheless, validity continues to be of critical
importance because an inaccurate screening procedure is useless no matter how quick or inexpensive it may be!
A number of different kinds of screening mechanisms occur in the detection and management of language disorders. Of
greatest importance for our purposes is screening for the presence of a language disorder. Such a screening, for example,
might be performed on all 3–5 year olds in a given school district, often as part of a broader screening for a variety of
health and developmental risks. Another example of such a comprehensive screening would occur as part of neonatal
intensive care follow-up. When examined alone, communication is screened using a great variety of measures with
selected aspects of speech, language, and hearing as their major foci.
In practice, such screenings are often informal and frequently combine several measures—some formal and some
informal—to increase the comprehensiveness of the examination. Specific tools used in a more focused approach to
language screening are discussed in the Available Tools section of this chapter.
When examined as part of a broader screening effort, communication is frequently assessed using a measure designed to
address a variety of major areas of functioning. One example of these kinds of screening measures is the Denver
Developmental Screening Test—Revised (Frankenburg, Dodds, Fandal, Kazuk, & Cohrs, 1975; Feeney & Bernthal,
1996), a screening tool for children from birth to age 6 that makes use of direct elicitation and parental reports. Another
is the Developmental Indicators for Assessment of Learning—Revised (DIAL-R; Mardell-Czudnowski & Goldenberg,
1983), a screening tool for children ages 2–6 that is often used to screen larger numbers of children through the use of a
team of evaluators, each of whom elicits behaviors from an individual child within a given area.
In a 1986 study of the 19 measures most commonly used in federally funded demonstration projects around the United
States, Lehr, Ysseldyke, and Thurlow (1986) found only 3 that they judged to be technically adequate: the Vineland
Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984), the McCarthy Scales of Children’s Abilities (McCarthy,
1972), and the Kaufman Assessment Battery for Children (Kaufman & Kaufman, 1983). Bracken (1987) noted similar
problems with available screening measures, especially among measures designed for children younger than 4. This lack
of well-developed comprehensive screening tests is particularly problematic given the demand inherent in the
Individuals with Disabilities Education Act (IDEA, 1990), which compels identification of at-risk children at very young
ages.
Screening procedures are also used by speech-language pathologists during comprehensive communication assessments
to determine (a) whether specific areas of communication (e.g., voice, fluency, hearing) need in-depth testing and (b)
whether problems exist and thus require referrals in other major areas of functioning (e.g., vision, cognition). Nuttall,
Romero, and Kalesnik (1999) provided a wide-ranging discussion of various types of developmental preschool
screenings.
Identification
Essentially, identification procedures for language disorders in children are intended to verify the existence of a problem
that may have been suspected by referral sources or uncovered through a screening program. For the purposes of this
book, identification is seen as synonymous with the term diagnosis, when that term is defined as the “identification of a
disease, abnormality, or disorder by analysis of the symptoms presented” (Nicolosi, Harryman, & Kresheck, 1996, p.
86). Diagnosis is often defined so that it includes the larger set of questions leading to conclusions regarding etiology,
prognosis, and recommendations for treatment (e.g., see Haynes, Pindzola, & Emerick, 1992). Here, however, the term
identification is preferred as a means of expediting our focus on the special measurement considerations it entails.
Identification decisions involving children are crucial for at least two reasons. First, identification is usually the first step
that enables the child to receive help, often in the form of intervention. This step is a critical one because of the
emotional, monetary, and temporal demands that accompany intervention, demands that will be met to varying degrees by the child, the parents, and the speech-language pathologist, as well as by the larger community. Second, by leading to effective
intervention, correct identification can help prevent or mitigate the additional social and scholastic problems that may
accompany language impairment. Identification decisions are among the most important ones made by speech-language
pathologists and, therefore, should be among the most carefully made.
Because identification decisions often involve the assignment of a label, they are often associated with a fear on the part
of many parents and some theorists (Shepard, 1989) that the child will be equated with the disorder. For example, the
parents may fear that their child will no longer be seen as “a cute, complicated child” when he or she becomes an
‘‘autistic child.” Although person first nomenclature (e.g., referring to “a person with autism” rather than “an autistic
person” or, worse yet, “an autistic”) is intended to make the process of labeling more benign, the negative implications
of being identified as having a communication disorder exist nonetheless in the minds of parents and perhaps in the
understandings of naive observers. This is evident when parents find one label—for example, “language impaired”—
more acceptable than another—such as “language delayed”—as clinicians frequently discover during their interactions
with families (Kamhi, 1998). Concerns about labeling in the special education community are intense and have led to
recommendations to avoid labels as much as possible, particularly for younger children and in cases where only a
screening has been conducted (Nuttall et al., 1999).
Many of the measurement issues associated with identification mirror those of screening. However, the more permanent
nature of identification and its association with decisions about access to continuing services raise the stakes in the
quality of decision making required. In the next section, special measurement considerations affecting both screening
and identification are discussed in some detail, with efforts made to call readers’ attention to points where the two differ.
Special Considerations When Asking This Clinical Question
If I were reading this book as a student (or as a clinician who finds measurement less interesting than I do), I would be
hoping that my friendly author would offer several easy steps toward accurate and efficient screening and identification.
Better yet, perhaps she would tell me exactly which screening and identification measures I should purchase and exactly
which three simple steps I should follow for infallible clinical decision making. Sadly, as much as I would like to help, a
blanket prescription for test purchasing and use cannot be made for all of the testing situations facing even a very small
group of readers. Instead, what I can do is provide basic information about some special considerations and then, in the
next section, introduce some of the many available measures that can be used for screening and identification.
In this section of the chapter, several special considerations are explored to help readers engage in the process of test
selection and interpretation for the purposes of screening and identification. These special considerations represent
refinements of some of the information presented in earlier chapters—refinements dictated by the particular demands of
screening and identification as testing purposes.
When I was learning how to choose the best possible measure for a given purpose, the tie between measurement purpose and
methodology was not always obvious to me. Some time ago, in my first published article, a colleague and I used 10
operational definitions of psychometric guidelines offered by the APA, AERA, and NCME (1985) to evaluate 30
language and articulation tests used with preschool children (McCauley & Swisher, 1984a). The criteria included an
adequate description of tester qualifications, evidence of test–retest reliability, information about criterion-related
validity, and others. Almost instantly, a well-known language researcher, John Muma (1985), chastised us, citing, among
other reasons, the danger that readers would assume that each of the criteria we included was as important as
every other. Today, as in 1985, it seems to me that although Muma failed to understand the basic intent of the article, he
was absolutely on the mark in his concern about its fostering misunderstanding. In fact, as you will see in the next
chapters, different purposes of testing will draw special attention to different aspects of the measures one might use. It is
important to pay attention to this ironclad connection in order to make ethical decisions.
The appropriateness of standardized norm-referenced tests for purposes of identifying a language disorder or difference
is almost universally accepted in the clinical literature (e.g., see Kelly & Rice, 1986; Merrell & Plante, 1997; Sabatino,
Vance, & Miller, 1993; cf. Muma, 1998). In addition, such instruments are widely favored for that purpose by practicing
speech-language pathologists (e.g., see Huang, Hopkins, & Nippold, 1997). Often, their use is mandated as the backbone
of screening and identification efforts.
In an ideal world, speech-language pathologists would be able to predict flawlessly which children would experience
persistent, penalizing differences in communication based on a description of each child’s current language status. Thus,
criterion-referenced measures would generally suffice for both identification and treatment planning. However, given the
current level of understanding, the best strategy is to (a) identify those children whose performance seems sufficiently
different from the performances of a relatively large group of peers as to warrant concern and (b) supplement that
information with other sources of information, particularly from persons familiar with the child’s functional
communication.
Because of the tie between norm-referenced measures and identification procedures, most of the special considerations
regarding screening and identification discussed next relate to the use of norm-referenced measures in decision making.
The six special considerations involve (a) weighing measure sensitivity and specificity in test selection, (b) deciding on
cutoff scores, (c) remembering measurement error in score interpretation, (d) wrestling with the disorder–difference
question, (e) conducting comparisons between scores, and (f) taking into account base rates and referral rates in
evaluating screening measures. The first two of these considerations address concerns that will primarily be dealt with by the clinician prior to use of an instrument in a particular case. The second three address concerns
arising during the process of test use. The last consideration relates to one’s thinking about how to implement and
potentially evaluate a screening program—a more specific concern than the other five.
Weighing Measure Sensitivity and Specificity in Test Selection
On the basis of previous discussions of validity, readers can anticipate that a measure used to screen or identify children
for language disorders should provide, as a cornerstone of evidence supporting its validity, convincing empirical
documentation of its ability to distinguish children with and without such disorders (Plante & Vance, 1994).
One method used to examine the accuracy of classification achieved by screening and identification measures entails the
comparison of the measure under study with a measure that is considered valid or at least acceptable given the state of
the art. Comparison against an ideal is often described as a comparison with a gold standard, a measure that has been so
thoroughly studied that it is thought to represent the very best measure available for a given purpose. Because of the
scarcity of gold standards in arenas related to child language assessment, the more typical scenario involves a
comparison with a well-studied and respected measure.
In the case of a screening measure, the comparison is often made between the results of a screening procedure and those
of a more elaborate and established method of identification. The comparison may involve the use of a better established test or test battery that has been independently validated. As you may recognize in the discussion that
follows, the method used to compare these performances is largely an elaboration of the contrasting-groups method
described in chapter 3.
The comparison often makes use of a contingency table, such as that portrayed in Fig. 9.1 and in earlier sections of the
book. In Fig. 9.1, two tables are used—one to illustrate the components of this type of table and the other to show a
hypothetical example: the results of the Hopeful Screening Test contrasted with those of the Firmly Established
Identification Measure for a group of 1000 individuals.
As you can see from the first table in the figure, sensitivity is simply the proportion of true positives produced by the
measure. Thus, it reflects how frequently those children needing further evaluation are accurately found using this
measure. According to a more formal definition, sensitivity is a measure of the ability of a test or procedure to give a
positive result when the person being assessed truly does have the disorder. Specificity is a measure of the ability of a
measure to give a negative result when the person being assessed truly does not have the disorder. It is usually described
as the proportion of true negatives associated with the measure. Thus, for a screening measure, specificity reflects how frequently children who are problem-free, and who therefore do not need additional evaluation, are correctly excluded from further testing. In other words, a test or procedure that underidentifies children suffers from poor sensitivity, and a test or
procedure that overidentifies children suffers from poor specificity.
In the case of the hypothetical Hopeful Screening Test of Language, sensitivity seems to be less than most people would
be happy with: on the basis of its results, 22%, or about 1/5, of children with the disorder would go undetected and thus be excluded from further assessment. In contrast, the measure’s specificity is excellent, with only about 5 out of every 100 children who are performing normally recommended for unnecessary testing.

Fig. 9.1. Information contained in a contingency table and an example showing how it can be used to calculate sensitivity and specificity.
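The sensitivity and specificity figures for the hypothetical Hopeful Screening Test can be reproduced in a few lines. The cell counts below are invented, chosen only to be consistent with the percentages reported in the text (assuming, for illustration, that 100 of the 1,000 children truly have the disorder).

```python
def screening_stats(tp, fn, fp, tn):
    """Compute sensitivity, specificity, and overall accuracy from the
    four cells of a contingency table like the one in Fig. 9.1."""
    sensitivity = tp / (tp + fn)                 # true positives among all with the disorder
    specificity = tn / (tn + fp)                 # true negatives among all without the disorder
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # all correct decisions
    return sensitivity, specificity, accuracy

# Hypothetical counts: of 1,000 children, 100 have the disorder;
# the test detects 78 of them and mislabels 45 of the 900 without it.
sens, spec, acc = screening_stats(tp=78, fn=22, fp=45, tn=855)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, accuracy = {acc:.2f}")
# → sensitivity = 0.78, specificity = 0.95, accuracy = 0.93
```

With these assumed counts, overall accuracy comes out near the roughly 94% cited in the text; the exact value depends on the base rate of disorder in the sample, which is one reason sensitivity and specificity are worth examining separately.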
In discussions of what constitutes acceptable levels of overall accuracy for language identification measures, Plante and Vance (1994) noted that overall accuracy (i.e., the percentage of true positives plus true negatives out of the entire population) should be at least 90% for an evaluation of “good” and 80% for an evaluation of “fair.” Thus, although the
Hopeful Screening Test of Language might be considered good in its overall accuracy (about 94%), its sensitivity cannot
be regarded nearly so highly (78%).
With regard to sensitivity and specificity for language-screening procedures, Plante and Vance (1995) recommended that
a higher standard be met for sensitivity than for specificity. Specifically, they recommended that sensitivity should be at
90% or above, whereas for specificity they accepted levels of 80% as “good” and 70% as “fair.” Thus, although
sensitivity and specificity are both inversely related to the frequency of errors (also called “misses”) in decision making associated with a particular test or procedure, it is important to examine them independently rather than lumped
together in a single measure of accuracy because their effects differ. As Plante and Vance noted, sensitivity is more
important for screening measures than specificity because the underreferrals associated with poorer sensitivity may have
greater negative effects on children than overreferrals associated with poorer specificity.
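These benchmarks can be made concrete in a small helper function. This is only an illustrative sketch: the cutpoints follow the Plante and Vance (1995) recommendations as summarized above, and the function name and rating labels are invented here.

```python
def rate_screening_measure(sensitivity, specificity):
    """Apply the screening benchmarks attributed to Plante and Vance (1995)
    in the text: sensitivity should reach .90, whereas specificity of .80
    counts as 'good' and .70 as 'fair'."""
    sensitivity_adequate = sensitivity >= 0.90
    if specificity >= 0.80:
        specificity_rating = "good"
    elif specificity >= 0.70:
        specificity_rating = "fair"
    else:
        specificity_rating = "poor"
    return sensitivity_adequate, specificity_rating

# The hypothetical Hopeful Screening Test: excellent specificity,
# but sensitivity falls short of the recommended .90 standard.
print(rate_screening_measure(0.78, 0.95))  # → (False, 'good')
```

Encoding the thresholds this way also makes plain that the two properties are judged against different standards, reflecting the greater weight given to underreferral in screening contexts.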
Taking Plante and Vance’s (1995) line of thought one step further, not only should clinicians go beyond overall accuracy
of classification in their evaluations of measures, they should also consider the implications of a measure’s sensitivity
and specificity levels in light of the specific testing situation. Properties of that testing situation include the gravity of the
decision to be made and its irreversibility. For example, lower sensitivity may be more acceptable in settings where
failures to refer for testing or to take steps toward identification will be corrected—such as a situation in which a well-
informed teaching staff will be likely to bring a child to the clinician’s attention regardless of previous screening results.
Similarly, lower specificity may be tolerated in situations where testing resources are not sorely taxed (if there are such
places).
Finally, as a point that cannot be overstressed—the relative sensitivity and specificity of accessible alternatives needs to
enter into the clinician’s decision making: It makes little sense to jump from a rocking boat to a sinking one. Yet this is
the action that may be taken regularly by clinicians who choose reliance on their own untested “judgment” over a flawed
but better understood screening mechanism.
Lest the reader hope that if other indicators of validity and reliability look promising all is likely to be well with regard to
a test’s sensitivity and specificity, consider a relevant finding of Plante and Vance’s (1994) research. Using criteria
closely related to those used in McCauley and Swisher (1984a), Plante and Vance rated 21 language tests designed for
use with 4- to 5-year-olds. The researchers then conducted a study of the 4 tests that met a relatively large number of
criteria (6 out of 10) to determine their sensitivity and specificity. Of the 4 they examined, only one achieved
acceptable levels. Thus, it pays to look for specific information on sensitivity and specificity—and to demand it from
publishers as a prerequisite to purchase.
In summary, sensitivity and specificity data provide special insight into the way that measures function for purposes of
screening and identification. Thus, they can provide enormously valuable evidence of a measure’s value for those
purposes. Whereas for many purposes sensitivity is even more important than specificity, the specific context in which
the measure is used and the availability of preferable alternatives will ultimately affect clinical perceptions of acceptable
levels. Finally, it seems quite probable that the absence of this information from test manuals, although currently
commonplace, will be rectified only when clinicians begin to discriminate among tests on this basis and to directly urge
publishers to take action.
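The arithmetic behind these two indices is simple enough to sketch. In the example below, the counts are invented for illustration: sensitivity is the proportion of children with the disorder whom the screen correctly flags, and specificity is the proportion of typically developing children whom it correctly passes.

```python
# Sensitivity and specificity from a hypothetical 2 x 2 screening table.
# All counts are invented for illustration.

true_positives = 18   # children with a disorder who failed the screen
false_negatives = 2   # children with a disorder who passed the screen
true_negatives = 85   # typical children who passed the screen
false_positives = 15  # typical children who failed the screen

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.90
print(f"Specificity: {specificity:.2f}")  # 0.85
```

Note that both values depend on knowing each child's true status from some independent criterion, which is why this kind of evidence must come from validation research rather than from routine clinical use.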
Choosing a Cutoff Score
One factor that affects both sensitivity and specificity is the cutoff used to determine whether a positive or negative result
has been obtained. When a screening or identification decision is made using a normative comparison, a cutoff score is
selected to indicate the score at which a child’s performance is seen as crossing an invisible boundary between a region
of normal variation for that particular group on that particular measure into a region suggesting a difficulty or difference
worthy of attention. Clearly, however, the location of the cutoff point is both arbitrary and significant. Shifting its
location can decrease a test’s specificity while increasing its sensitivity, or vice versa. Thus, the choice of a cutoff is not
a trivial matter.
Clinically oriented authors writing about language disorders have recommended a variety of possible cutoffs for use
when norm-referenced instruments are used as part of developmental language assessments. For example, Owens (1995)
noted that scores falling below the 10th percentile are often considered “other-than-normal.” Leonard (1998) also
observed that researchers frequently use cutoffs falling 1.25 or 1.5 standard deviations below the mean, thus falling close
to Owen’s 10th percentile. Similarly, Paul (1995) endorsed a cutoff at the 10th percentile, corresponding to a standard
score of about 80 and a z score falling 1.25 standard deviations below the mean for scores that are normally distributed.
She indicated that she based her recommendation, in part, on similar levels previously recommended by Fey (1986) and
Lee (1974). However, because of concerns about its arbitrariness and questionable psychometric defensibility, Paul’s
complete criterion is somewhat more elaborate. Specifically, she required
that a child thought by significant adults in his or her life to have a communication handicap should score below the
tenth percentile or below a standard score of 80 on two well-constructed measures of language function to be thought of
as having a language disorder. (p. 5)
Paul’s intention was to make sure that this definition would not strong-arm children who had no real-life problems into
diagnoses simply because of differences in test scores that, although detectable, are of little or no practical significance.
(See a longer discussion of clinical or practical significance in chap. 11.)
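The equivalences among these cutoff conventions follow directly from the normal curve. The short sketch below, using only Python's standard library and the conventional standard-score scale (mean 100, standard deviation 15), shows why a 10th-percentile cutoff, a standard score near 80, and a z score near -1.25 all pick out roughly the same children.

```python
# Converting among percentile, z score, and standard score (mean 100, SD 15)
# cutoffs on a normal distribution.
from statistics import NormalDist

unit_normal = NormalDist(mu=0, sigma=1)

# z score corresponding to the 10th percentile
z_at_10th = unit_normal.inv_cdf(0.10)        # about -1.28
standard_score = 100 + 15 * z_at_10th        # about 80.8

# proportion of scores falling below a cutoff 1.25 SDs below the mean
pct_at_minus_1_25 = unit_normal.cdf(-1.25)   # about 0.106, i.e., roughly the 11th percentile

print(f"10th percentile -> z = {z_at_10th:.2f}, standard score = {standard_score:.1f}")
print(f"z = -1.25 -> {pct_at_minus_1_25:.1%} of scores fall below")
```

These conversions hold only to the extent that scores on the measure are in fact normally distributed, which is one reason empirical checks of a cutoff's performance remain necessary.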
It is also important to note that Paul (1995) recommended the use of two “well-constructed” measures, given that the use
of one or two measures that are less than that will undermine the intent of the recommendation. Just as a chain is no
stronger than its weakest link, a battery (even of just 2 measures) will be no more accurate than its least accurate member
(Plante & Vance, 1994; Turner, 1988). Because of this concern, Plante (1998) recently recommended that a single valid
test along with a second functional indicator (e.g., clinician judgment, enrollment in treatment) be used for verification of
specific language impairment for research purposes. This recommendation leads to an obvious parallel for initial
identification, one that can be seen as consistent with IDEA (Plante, personal communication).
Sometimes, when cutoffs are selected in accordance with test developer recommendations, clinicians and researchers use
different cutoffs for different tests. Usually, the recommendations of the test developers result in very similar cutoffs to
those discussed earlier. Looking back at the normal curve and its relationship to different types of scores in Fig. 2.5
suggests that small differences in cutoffs should result in only small shifts in selection, thus suggesting that the method
used to select a cutoff probably does not matter. Surprisingly, however, Plante and Vance (1994, 1995) demonstrated
that an empirically derived cutoff can greatly enhance a measure’s sensitivity and specificity. Further, they showed that
empirically derived cutoffs are likely to vary from test to test, thus making the use of a “one-cutoff-fits-all-tests”
practice something that they would advise against. Their work is described briefly in the next paragraphs to help
illustrate the value of research into basic measurement issues such as cutoff selection.
In their studies, Plante and Vance (1994, 1995) used a statistical technique called discriminant analysis—a form of
regression analysis—to examine outcomes associated with different cutoffs. Using this technique, the experimenter
determines to what extent variation in scores is accounted for by group membership and then examines the accuracy of
predictions of group membership made from a resulting regression equation. It allows one to examine the ways in which
changing the cutoff affects sensitivity and specificity.
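Although a full discriminant analysis is beyond a short example, the core idea of letting the data choose the cutoff can be sketched with a simple sweep. The scores below are fabricated, and the sweep simply keeps the cutoff that maximizes the sum of sensitivity and specificity (one common optimality criterion); this is an illustration of the logic, not of Plante and Vance's actual procedure.

```python
# Empirically deriving a cutoff: sweep candidate cutoffs and keep the one
# that best separates two (fabricated) groups of test scores.

impaired = [66, 72, 78, 84]          # scores of children with diagnosed impairment
typical = [80, 85, 88, 92, 96, 100]  # scores of typically developing children

def sens_spec(cutoff):
    """Scores at or below the cutoff are flagged as positive."""
    sensitivity = sum(s <= cutoff for s in impaired) / len(impaired)
    specificity = sum(s > cutoff for s in typical) / len(typical)
    return sensitivity, specificity

# Candidate cutoffs: every observed score.
best_cutoff, best_sens, best_spec = max(
    ((c, *sens_spec(c)) for c in sorted(set(impaired + typical))),
    key=lambda t: t[1] + t[2],
)

print(f"Best cutoff: {best_cutoff} "
      f"(sensitivity {best_sens:.2f}, specificity {best_spec:.2f})")
```

With these fabricated scores the sweep settles on a cutoff of 84, flagging every impaired child while passing five of the six typical children; a cutoff fixed in advance at, say, 78 would have missed one impaired child in four.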
Plante and Vance (1994, 1995) recommended two strategies for ensuring the availability of empirically derived cutoffs
such as those that can be obtained through discriminant analysis. First, they advised clinicians to insist that standardized
measures offer such cutoffs along with data concerning sensitivity and specificity. Second, they noted the possibility of
developing local cutoffs, a process that requires fewer participants than local norming but that can require clinicians who
attempt it to seek statistical assistance (Plante & Vance, 1995).
Although not endorsed by Plante and Vance (1995), the development of local norms may also represent a responsible
strategy for increasing the availability of data concerning sensitivity and specificity of decisions in settings where
sufficient resources and numbers of children (including those with disorders) exist (e.g., see Hirshoren & Ambrose,
1976; Norris, Juarez, & Perkins, 1989; Smit, 1986). Software designed to aid in the construction of local norms (Sabers
& Hutchinson, 1990) makes this strategy more feasible than it once was (Hutchinson, 1996). In addition, the
development and use of local norms has been recommended as a means of dealing with
bias in testing that results from the use of inappropriate norms (e.g., see Vaughn-Cooke, 1983).
In summary, then, the cutoffs used to identify children’s performance as falling below expectations are often arbitrarily
set at about 1.25 to 1.5 standard deviations below the mean. However, greater sensitivity and specificity can be achieved
when empirical methods are used to optimize the performance of the measures used. Not only does this practice
constitute another step that can be taken by test authors and publishers to improve the quality of clinical decision making
in the field, it represents a topic of such practical significance as to invite a wealth of applied research. In addition, as
Paul (1995) suggested, the current state of the art precludes reliance on a single measure—or even a single battery of
measures—to lead in a lockstep fashion to decision making. Integration of functional data about the child will remain a
necessary component of screening and identification for the foreseeable future. As understanding of functional or
qualitative data—such as portfolios and teacher reports of critical incidents—increases (e.g., Schwartz & Olswang,
1996), their role will probably increase as well (see chap. 10), with beneficial results for the sensitivity and specificity of
the process. Further, in many clinical and especially educational settings, the choice of cutoff to be used can seem—and
in some cases may be—outside the control of the speech-language pathologist. The role played by educational agencies
in establishing guidelines for measurement use and clinicians’ productive responses to these are discussed later in this
chapter in the section called Practical Considerations.
There are theoretical concerns, too, about the use of cutoffs that relate to our understanding of the very nature of
language impairment in all children, but particularly in those for whom no obvious cause exists: children with SLI.
Dollaghan and Campbell (1999) recently called attention to the fact that the use of an arbitrary cutoff at a point along a
normal distribution of scores is at odds with theoretical notions that language impairment represents a natural category,
or taxon. Instead, they say that it implies an assumption that children with “impaired” language may simply represent
those children who have less language ability, in the same way that short persons have less height. This possibility has
been pointed out by several theoreticians addressing the question of etiology for children with SLI (e.g., see Lahey,
1990; Leonard, 1987) but has failed to receive sustained attention. As an important step toward reviving consideration of
this hypothesis, Dollaghan and Campbell noted that the question of whether “language impairment” represents a distinct
category versus the lower range of a continuum of performance is an empirical one with potentially powerful
repercussions for both assessment and treatment. Specifically, as a working hypothesis they predict that if language
impairment is taxonic, language deficits would be likely to be more focused and would therefore require more focused
assessments and treatments.
Dollaghan and Campbell (1999) also noted that the time may be ripe for addressing the question of the nature of
language impairment because parallel concerns in clinical psychology with regard to schizophrenia and depression have
spawned rich advances in methodology (Meehl, 1992; Meehl & Yonce, 1994, 1996). They conjectured that these
advances might provide an auspicious starting point for additional efforts. Among the implications of this work are the
possibility of identifying those cutoffs that truly identify children who are categorically different in their language
skills from other children rather than those who simply seem quantitatively suspicious because of their lower
performances. Thus, these methods may prove to provide additional strategies for more rational cutoff selection.
Remembering Measurement Error in Score Interpretation
Once a measure has actually been selected and administered and a cutoff level settled on, the clinician uses the test
taker’s score to assist in a decision regarding screening or identification. During this process, because of the weight
attached to individual scores in screening and identification decisions, remembering measurement error in score
interpretation becomes critical to solid clinical decision making—even when functional criteria are incorporated.
Recall that in chapter 3 the concept of SEM was described as a means of conveying the impact of a test’s reliability on
an individual score. Specifically, the lower the reliability of the instrument, the higher the error (quantified using SEM)
attached to the individual score. The importance of reliability and SEM is not due to their ability to remove error
(because they can’t), but rather to their helping us understand the magnitude of error we face.
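The relationship can be made concrete. SEM is conventionally computed as the test's standard deviation times the square root of one minus its reliability coefficient; the reliability values below are hypothetical.

```python
# SEM = SD * sqrt(1 - reliability): lower reliability means larger error.
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement for a given SD and reliability coefficient."""
    return sd * sqrt(1 - reliability)

sd = 15  # standard deviation of a typical standard-score scale
print(f"reliability .90 -> SEM = {sem(sd, 0.90):.2f}")  # about 4.74
print(f"reliability .75 -> SEM = {sem(sd, 0.75):.2f}")  # 7.50
```

Note that the less reliable measure in this hypothetical pair already violates the guideline, mentioned below, that SEM stay within one third to one half of the standard deviation.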
Figure 9.2 is intended to provide an example illustrating the effect of SEM on a screening decision. It shows the same
score achieved by a child on two different screening measures—one with a larger SEM and the other with a smaller
SEM for that child’s age group. Around each of these scores, there is a 95% confidence interval. The confidence interval
represents a range of scores in which it is likely (although not absolutely assured) that the test taker’s true score falls. A
95% confidence level means that there is a probability of 95% that the interval contains the child’s true score and, of
course, 5% that it does not. It is often recommended that clinicians characterize children’s performance using the range
of scores encompassed within the confidence interval, rather than a single score. Further, it has been suggested that the
SEM for a measure should be no more than one third to one half of its standard deviation (Hansen, 1999).
If a score of 75 is used as a cutoff on each test in the example, clearly the task of deciding that the child’s performance
falls below that value becomes much trickier for test A than for test B, despite identical scores. In fact, one might be
tempted to refrain from using test A in favor of test B when screening children of this particular age. However, perhaps
test A is preferable as a screening tool for other reasons, for example, because it has a more appropriate normative
sample and better evidence of validity for children similar to the one being tested. In that case, the clinician may decide
to use the measure but view the resulting data with greater circumspection.
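A confidence interval of this kind is just the observed score plus or minus a multiple of the SEM (1.96 for 95% confidence). The sketch below uses hypothetical numbers in the spirit of tests A and B just described: the same observed score, two different SEMs.

```python
# 95% confidence interval around an observed score: score +/- 1.96 * SEM.

def confidence_interval(score, sem, z=1.96):  # z = 1.96 for 95% confidence
    return score - z * sem, score + z * sem

score = 80   # hypothetical observed score on both tests
cutoff = 75

for name, sem in [("Test A", 7.0), ("Test B", 2.0)]:
    low, high = confidence_interval(score, sem)
    verdict = "clearly above" if low > cutoff else "uncertain relative to"
    print(f"{name}: 95% CI = [{low:.1f}, {high:.1f}] "
          f"-> {verdict} the cutoff of {cutoff}")
```

With these hypothetical values, only the interval from the small-SEM test lies entirely above the cutoff; the wide interval from the other test leaves the pass–fail decision genuinely in doubt.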
Some tests make it quite easy to take error into account during score interpretation because of the way in which a child’s
scores are plotted on the test form. For tests that do not provide this user-friendly feature, however, the test user can
calculate a confidence interval using the tables and following the example laid out in Fig. 9.3. Although the choice of
confidence level is somewhat arbitrary, more stringent levels are usually selected for more momentous decisions.
Confidence intervals of 68, 95, and 99% are the ones most typically reported, with 85 and 90% used less frequently (Sattler, 1988).1
Fig. 9.2. Two 95% confidence intervals calculated for the same score using two different screening measures, one with a larger SEM, on the left, and the other with a smaller SEM, on the right.
The old adage “know your limitations”—including know the limitations of your data—would work as an apt summary of this brief section.
Information about SEM can help clarify the significance of reliability data for individual clients and can thus be used to help the clinician make
choices in the measures he or she adopts. Further, through the
1 Also note that Salvia and Ysseldyke (1991) and others (including McCauley & Swisher, 1984b; Nunnally, 1978) recommended a slightly more
complex procedure in which an estimated true score is calculated first. This procedure is offered as a first step in appreciating the potential value of
confidence intervals but should not be taken as definitive.

Fig. 9.3. Table to be used in calculating confidence intervals, with an example. From “The truth about scores children
achieve on tests” by J. Brown, 1989, Language, Speech, and Hearing Services in Schools, 20, p. 371. Copyright © 1989 by
American Speech-Language-Hearing Association. Reprinted with permission.
use of confidence intervals during interpretation of an individual’s performance, the clinician is given the opportunity to
gauge the possible effect of a measure’s known imperfection (imperfect reliability in this case). Therefore, what may
have begun to sound like a repeated refrain in the last three sections can be sounded again here. One should always make
use of such information when it is readily available, calculate it if possible, and encourage test publishers to provide it
when it is neither offered nor calculable.
Wrestling with the Disorder–Difference Question
The diversity of cultural and language backgrounds represented among any group of children can be quite breathtaking.
Even in Vermont, which is often cited as one of the least diverse states in the country, the school district of the state’s
largest city, Burlington (population 40,000), has children whose first languages include Vietnamese, Serbo-Croatian,
Mandarin, and Arabic. In fact, in 1998 and 1999, about 25 languages other than English were spoken by children whose
proficiency in English was sufficiently low to require special intervention. During the time frame 1987–1988 to 1998–1999, the number of such children grew from just below 20 to just about 300 (Horness, personal communication).
Because several national companies are represented in Burlington, there are numerous children who have moved here
from different regions of the United States with their parents. Whereas some of these families have moved from other
New England regions with similar regional dialects to Vermont, others have moved from the Deep South or other
regions claiming distinct regional dialects. Further, children in this same school district come from families with incomes
below the poverty level to those with incomes in the stratosphere of affluence. On the basis of these few facts, it seems
safe to say that each speech-language pathologist working in this school district confronts issues related to differences in
culture, regional dialect, social dialect, and primary language on a daily basis. Even in Vermont!
As this example illustrates, diversity affecting language use among young native speakers of English and language use
by children who are acquiring English as a second language is the rule rather than the exception. Consequently,
professionals who work with children are challenged to remain vigilant to cultural and linguistic factors in the selection
and use of screening and identification measures.
Clearly, the magnitude of the challenge differs substantially when the clinician works with children who speak a
minority dialect of English compared with those who are being exposed to English for the first time in a school setting.
This latter group of children is sometimes referred to as having limited English proficiency (LEP). Regardless of
whether they are seen as having a language disorder, they will often be served through an English as a Second Language
(ESL) program in school systems. In contrast, the children who speak a minority dialect of English are perhaps more
easily misunderstood by the SLP because their differences in dialect may go unappreciated, under the assumption that they
are bidialectal—that is, able to use the dialect of the school and a regional or social dialect as well. They may also
include children whose first dialect is unknown to both their classmates and the speech-language pathologist, thus further
increasing the complexity of the speech-language pathologist’s work.
Regardless of the differences between these groups of children, any time there is a mismatch between the tools being
used or between the clinician’s language and culture and the language and culture of the child, the issue of difference
versus disorder becomes relevant. Table 9.1 offers a pair of hypothetical scenarios in which challenges of this type are
presented.
Before figuring out exactly how to respond to the challenges of linguistic and cultural diversity, however, we need to
remind ourselves of what threats to validity are
Table 9.1
Scenarios Illustrating the Challenges of Cultural and Linguistic Diversity

Little English, Little Vietnamese—An Experiential Deficit or a Disorder?


Although Van and a twin sister were born in the Southwestern United States to parents of Vietnamese heritage, Van
was adopted at age 5 by a professional couple in New England after he was removed from his home because of severe
neglect. Not much was known about his life before the adoption. However, informants knowledgeable in Vietnamese
indicated that although his understanding in that language seemed excellent, he spoke little. During a foster placement
immediately preceding his adoption, he had begun to use English as frequently as Vietnamese, but he continued to be
very quiet around everyone except his new parents. The speech-language pathologist and the educational team assigned
to work with Van and his new family were interested in obtaining information about Van’s language status in both
languages.
American English Dialect, Probably Not a Disorder, but a Problematic Difference
Raymond moved from a school district in New Orleans in which about 95% of his classmates in kindergarten were
Black, to a racially mixed suburb of Chicago in which a White speech-language pathologist who had been raised in
Toronto, Canada was assigned to serve as his speech-language pathologist. Concerns had been raised about his speech
intelligibility and his vocabulary use and understanding by his classroom teacher, who was a White native of Indiana.
Although both professionals had many years of experience working with children and colleagues in their racially
diverse school, neither had had such a difficult time in understanding a speaker of Black English. They wanted to
determine whether Raymond’s speech was simply “different” because of his dialect or whether it represented a genuine
problem. Although they were relieved to find out that Raymond’s family considered him a competent, if young
speaker, they were even more perplexed about how they might smooth his transition into his new school.

interwoven with diversity. I begin by considering the threats that occur in instances where a child speaks a dialect of
English or is acquiring English as a second language—for example, Black English or Spanish-influenced English.
Among the threats to valid testing in English that have been most thoroughly discussed are those arising from the
potential for measures to use situations, directions, formats, or language that are inconsistent with the child’s previous
experience (Taylor & Payne, 1983). Here, the chief concern is in correctly respecting the presence of a language
difference, a difference in language use associated with systematic variation in semantics, phonology, and so on, when
compared with the idealized dialect that is typically represented in standardized language measures. The danger, of
course, is erroneously identifying a difference as a disorder. ASHA (1993) has defined language difference more
elaborately
as a variation of a symbol system used by a group of individuals that reflects and is determined by shared regional,
social, or cultural/ethnic factors. A regional, social or ethnic variation of a symbol system is not considered a disorder of
speech or language. (p. 41)
For children using minority dialects, English-language measures developed without attention to dialectal and
accompanying cultural variation are especially problematic for purposes of screening and identification. The advantages
and disadvantages
of alternatives for children who speak Black English and other minority dialects fuel continuing discussion (e.g., see
Damico, Smith, & Augustine, 1996; Kamhi, Pollock, & Harris, 1996; Kayser, 1989, 1995; Reveron, 1984; Taylor &
Payne, 1983; Terrell & Terrell, 1983; Van Keulen, Weddington, & DeBose, 1998; Vaughn-Cooke, 1983).
Not surprisingly, many strategies for coping with this complex issue have been considered, but none are completely
satisfactory for use with children speaking minority dialects (Vaughn-Cooke, 1983; Washington, 1996). When the
continuing use of norm-referenced instruments for these children is entertained (e.g., see Kayser, 1989; Vaughn-Cooke,
1983), it is generally recognized that there are few existing measures that have been found to be suitable. The strategies
that have been recommended and tried include the development of alternative norms, either through adding minorities in
small numbers to normative samples or obtaining normative data for minority children—ideas that are, respectively,
ineffective or impractical in addressing problems with the norms (e.g., Vaughn-Cooke, 1983). A second method
involves modifying objectionable test components (e.g., Kayser, 1989), and a third involves developing alternative
scoring rules designed to give credit for “correct” answers in the dialect being considered (e.g., Terrell, Arensberg, &
Rosa, 1992). Both of these latter methods have been found lacking because they invalidate the norms, thus transforming
the targeted measure into an informal criterion-referenced measure. Table 9.2 lists some modifications in test admin-
Table 9.2
Modifications of Testing Procedures

1. Reword instructions.
2. Provide additional time for the child to respond.
3. Continue testing beyond the ceiling.
4. Record all responses, particularly when the child changes an answer, explains, comments, or demonstrates.
5. Compare the child’s answers to dialect or to first language or second language learning features. Rescore
articulation and expressive language samples, giving credit for variation or differences.
6. Develop several more practice items so that the process of “taking the test” is established.
7. On picture vocabulary recognition tests, have the child name the picture in addition to pointing to the stimulus
item to ascertain the appropriateness of the label for the pictorial representation.
8. Have the child explain why the “incorrect” answer was selected.
9. Have the child identify the actual object, body part, action, photograph, and so forth, particularly if he or she has
had limited experience with books, line drawings, or the testing process.
10. Complete the testing in several sessions.
11. Omit items you expect the child to miss because of age, language, or culture.
12. Change the pronunciation of vocabulary.
13. Use different pictures.
14. Accept culturally appropriate responses as correct.
15. Have parents or other trusted adult administer the test items.
16. Repeat the stimuli more than specified in the test manual.

Note. From “Speech and Language, Assessment of Spanish-Speaking Children,” by H. Kayser, 1989, Language,
Speech, and Hearing Services in Schools, 20, p. 244. Copyright 1989 by American Speech-Language-Hearing
Association. Reprinted with permission.
istration that have been proposed for use with minority children who have been tested with existing norm-referenced
tests; these modifications might profitably be applied in cases where a description of the child’s responses to certain
kinds of stimuli is wanted. Usually, however, those cases will exist not during identification of a language impairment,
but during the descriptive process that follows it (see chap. 10). A fourth method consists of supplementing existing
norm-referenced measures with descriptive tools (Vaughn-Cooke, 1983), which seems to present a very difficult
interpretation challenge to the clinician because norm-referenced measures will be assumed to be biased, and descriptive
measures are usually not up to the challenge of identification.
Finding more widespread approval than those methods just discussed are strategies that entail the abandonment of
currently available measures. These include (a) the substitution of descriptive methods (such as language sample analysis
or criterion-referenced measures; e.g., see Damico, Smith, & Augustine, 1996; Leonard & Weiss, 1983; Schraeder,
Quinn, Stockman, & Miller, 1999) and (b) development of new, more appropriate norm-referenced instruments (Vaughn-
Cooke, 1983; Washington, 1996). Sole use of criterion-referenced approaches, such as language sampling, has the chief
disadvantage of insufficient data supporting that strategy in screening and identification. Washington also noted that
language analyses that might be conducted for young speakers of Black English are hampered by the absence of
appropriate norms because normative data are currently available only for adolescents and adults. However, the many
proponents of a criterion-referenced or descriptive approach (e.g., see Damico, Secord, & Wiig, 1992; Robinson-
Zañartu, 1996) would argue that despite their drawbacks, descriptive strategies offer the least dangerous of the choices.
Not much progress has been made in the development of appropriate norm-referenced instruments; however, that may
change in response to pressures for improved nonbiased assessment. In addition, perusal of recently developed tests
suggests that more sophisticated efforts are being made to consider dialect use in the development of tests for more
diverse populations. This has included the test developer’s examination of item bias for minority children (Plante,
personal communication). Depending on when it is obtained, the resulting data can be used in the test’s early
development to lead to less biased testing or can be presented to show that a relatively unbiased measure has been
achieved.
Beyond the realm of traditional recommendations for improving language assessment validity for diverse groups of
children, attention has been paid recently to the development of methods that seek to reduce the effects of prior
knowledge and experience on performance. Two approaches of particular interest are processing-dependent measures
and dynamic assessment methods. The development of processing-dependent measures involves the use of tasks with
either high novelty or high familiarity for all participants (e.g., Campbell, Dollaghan, Needleman, & Janosky, 1997).
Dynamic assessment methods focus on the child’s learning of new material rather than acquired knowledge. This is done
as a means of leveling the effects of prior experience and obtaining information about how to support the child’s learning
beyond the assessment situation (e.g., Gutierrez-Clellan, Brown, Conboy, & Robinson-Zañartu, 1998; Olswang, Bain, &
Johnson, 1992; Peña, 1996). Although proposed as being applicable to identification decisions, these two types of
measures are more frequently used for descriptive purposes and are discussed more thoroughly in the next chapter.
Assessments designed to address the needs of children who can be described as having LEP are growing in number.
Table 9.3 illustrates some of the measures that are being developed for use with children from diverse linguistic and
cultural backgrounds. Clearly at this point, the majority of these measures have been developed for children with Spanish
as their first language. Some of these measures are developed “from scratch” and thus can take advantage of the existing
knowledge base concerning development and disorders in the target languages. In contrast, others are little more than
translations of existing tests—a practice that requires considerable care and may still result in measures that do not get at
the heart of major developmental tasks in the language. For example, translations can be hampered by items that do not
have true counterparts or that will require greater linguistic complexity to convey information in the target language than
in the original. Consumers should be cautioned to be skeptical of their own comfort level with such adaptations of
familiar tests. Further, they will want to be careful of the match between the dialect spoken by the child and the dialect in
which a test is written.
I encourage you to look at more thorough discussions of the special challenges posed during the identification of
language impairment in several groups whose first or major language or dialect is either not English or not the dialect of
English typical of standardized tests. Sources warranting particular attention exist for children who are Native American
(Crago, Annahatak, Doehring, & Allen, 1991; Leap, 1993; Robinson-Zañartu, 1996), Hispanic American (Kayser, 1989,
1991, 1995), Asian American (Cheng, 1987; Pang & Cheng, 1998), and who speak Black English (Kamhi et al., 1996;
Van Keulen et al., 1998) and regional dialects (Wolfram, 1991).
Conducting Comparisons between Scores
Clinicians rarely compare scores on different instruments as part of screening. Instead, such comparisons occur more
commonly during identification. They are particularly common in settings requiring a comparison of nonverbal and
verbal skills called cognitive referencing. Despite widespread criticism of this practice (Aram, Morris, & Hall, 1993;
Fey, Long, & Cleave, 1994; Kamhi, 1998; Krassowski & Plante, 1997; Lahey, 1988), its use is nonetheless mandated in
several states to justify services. In addition, it has sometimes been used in research definitions of SLI and other learning
disabilities (see a lengthier discussion of this point in chap. 5). Comparisons of this kind are also used as a means of
identifying strengths and weaknesses in preparation for planning intervention—a descriptive use that is touched on in the
next chapter.
When single pairs of scores are compared, the comparison is frequently referred to as discrepancy analysis; when larger
numbers of scores are compared, it is more frequently referred to as profile analysis. Numerous discussions of the
hazards of this type of comparison are provided in the literature (e.g., McCauley & Swisher, 1984b; Salvia & Ysseldyke,
1998). The focus of the current discussion is the use of such comparisons in identification.
Table 9.3
Selected Tests Designed for Children Whose Primary Language Is Not English (Compton, 1996; Roussel, 1991)

Test | Ages | Language | Oral Language Modalities & Domains | Reference
Bilingual Syntax Measure–Chinese (Tsang, n.d.) | Grades K–12 | Chinese | E-Sem | Tsang, C. (n.d.). Bilingual Syntax Measure–Chinese. Berkeley, CA: Asian-American Bilingual Center.
Spanish Structured Photographic Expressive Language Test (Werner & Kresheck, 1989) | 3-0 to 5-11; 4-0 to 9-5 | Spanish | E | Werner, E. O., & Kresheck, J. S. (1989). Spanish Structured Photographic Expressive Language Test. Sandwich, IL: Janelle.
Ber-Sil Spanish Test (Beringer, n.d.) | 4 to 12 years | Spanish | R-Sem, Morph | Beringer, M. (n.d.). Ber-Sil Spanish Test. Rancho Palos Verdes, CA: The Ber-Sil Company.
Austin Spanish Articulation Test (Carrow-Woolfolk, n.d.) | 3 years to adult | Spanish | E-Phon | Carrow-Woolfolk, E. (n.d.). Austin Spanish Articulation Test. Allen, TX: DLM Teaching Resources.
Compton Speech and Language Screening Evaluation–Spanish (Compton & Kline, n.d.) | 3 to 6 years | Spanish | R & E-Phon, Sem, Syn | Compton, A. J., & Kline, M. (n.d.). Compton Speech and Language Screening Evaluation–Spanish. San Francisco: Institute of Language.
Test de Vocabulario en Imagenes Peabody (Dunn, Lugo, Padilla, & Dunn, 1986) | 2-6 to 17-11 | Spanish | R-Sem | Dunn, L. M., Lugo, D. E., Padilla, E. R., & Dunn, L. M. (1986). Test de Vocabulario en Imagenes Peabody. Circle Pines, MN: American Guidance Service.
Expressive One-Word Picture Vocabulary Test–Spanish (Gardner, n.d.) | 2 to 11 | Spanish | E-Sem | Gardner, M. F. (n.d.). Expressive One-Word Picture Vocabulary Test–Spanish. San Francisco: Children’s Hospital of San Francisco.
Prueba del Desarrollo Inicial del Lenguaje (Hresko, Reid, & Hammill, n.d.) | 3 to 7 | Spanish | R-Sem, Syn | Hresko, W. P., Reid, D. K., & Hammill, D. D. (n.d.). Prueba del Desarrollo Inicial del Lenguaje. San Antonio, TX: Pro-Ed.
Clinical Evaluation of Language Function–3 Spanish Edition (Semel, Wiig, & Secord, n.d.) | 6 to 21 | Spanish | R & E-Sem, Morph, Syn, Prag | Semel, E., Wiig, E. H., & Secord, W. (n.d.). Clinical Evaluation of Language Function–3 Spanish Edition. San Antonio, TX: Psychological Corporation.
Del Rio Language Screening Test (Toronto, Leverman, Hanna, Rosenzweig, & Maldonado, n.d.) | 3 to 6 | Spanish | R-Sem | Toronto, A. S., Leverman, D., Hanna, C., Rosenzweig, P., & Maldonado, A. (n.d.). Del Rio Language Screening Test. Austin, TX: National Educational Laboratory.
Preschool Language Scale–3 (Zimmerman, Steiner, & Pond, 1992) | Birth to 6 years | Spanish | E & R | Zimmerman, I. L., Steiner, V., & Pond, R. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.
Sequenced Inventory of Communication Development–Revised (Hedrick et al., 1984) | 0-4 to 4-0 | Spanish translation | E & R | Hedrick, D. L., Prather, E. M., Tobin, A. R., Allen, D. Y., Bliss, L. S., & Rosenberg, L. R. (1984). Sequenced Inventory of Communication Development–Revised Edition. Seattle, WA: University of Washington Press.
Bilingual Syntax Measure–Tagalog (Tsang, n.d.) | Grades K–12 | Tagalog | E | Tsang, C. (n.d.). Bilingual Syntax Measure–Tagalog. Berkeley, CA: Asian-American Bilingual Center.

Note. E = Expressive. R = Receptive. E-Sem = Expressive Semantics, etc. Morph = Morphology. Phon = Phonology. Syn = Syntax. Prag = Pragmatics.

For purposes of illustration, imagine that a child’s overall score on a language measure is to be compared with her performance on a nonverbal measure of intelligence. Imagine that she receives a standard score of 70 on the former and 90 on the latter. On the face of this
comparison, it looks like there is quite a difference. However, differences between scores, also called difference scores
or discrepancies, are often less reliable than the scores on which they are based. In fact, the likelihood that observed
differences are due to error rather than real differences is affected by three factors: the reliability of each measure, the
correlation of the two measures, and the similarity of their normative samples (Salvia & Ysseldyke, 1998).
The task of assessing norm comparability is as straightforward as looking over descriptions of each normative group to
determine whether they seem to differ in ways that could affect the scores to be compared. To see why this is necessary,
recall that the standard scores best used to summarize test performance include the group mean in their calculation.
Therefore, something about the normative group may push one group mean higher (e.g., one group is more “elite” in
some sense than the other). Consequently, one would fare more poorly in a comparison against that group than against a
group with a lower mean, even if one’s true abilities in the two areas were comparable. To provide a poignant example,
imagine a ruthless clinician has decided to compare your language and nonverbal skills—using scores obtained by
comparing your performances against those of Nobel laureates in literature for the former and fifth graders for the latter.
Not only could you legitimately question the inappropriateness of the norms as a basis of each of the scores, you could
also vehemently protest the resulting comparison. Thankfully, flagrant mismatches between test norms used in
comparisons may not occur outside of examples like this one. However, if overlooked, more subtle mismatches can
nonetheless contribute to poor decisions and inappropriate clinical actions.
Taking test error and test correlation into account is less straightforward than inspecting norms. On the basis of ideas
analogous to those used for calculating a confidence interval around a single score, however, it is possible to calculate a
confidence interval around a difference score. Salvia and Ysseldyke (1998) described two methods based on differing
assumptions about the causal relationship of the two skills being compared. In addition to the actual score data, both
methods require information about the reliability of the measures being used and about their correlation. Whereas the
relevant information about reliability and the nature of normative samples should be readily available for individual
measures, information about the correlation between measures will often be lacking. In that event, abandoning a direct
comparison and instead noting the results of each test as supporting or not supporting the identification of a problem in a
given area may represent the best alternative (McCauley & Swisher, 1984b).
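To make the arithmetic concrete, the sketch below computes a confidence interval around a difference score using one widely cited psychometric formula (not necessarily either of the specific methods Salvia and Ysseldyke, 1998, described), along with the reliability of the difference score itself. The reliability and correlation values are illustrative assumptions, not figures for any particular test.

```python
import math

def difference_ci(score_a, score_b, rel_a, rel_b, sd=15.0, z=1.96):
    """Confidence interval around the difference between two standard scores
    reported on the same scale (assumed here: mean 100, SD 15). Uses the
    classical standard error of a difference,
    SE_diff = SD * sqrt(2 - rel_a - rel_b),
    which requires only each test's reliability coefficient."""
    se_diff = sd * math.sqrt(2 - rel_a - rel_b)
    diff = score_a - score_b
    return diff - z * se_diff, diff + z * se_diff

def difference_reliability(rel_a, rel_b, r_ab):
    """Reliability of a difference score, given each test's reliability and
    the correlation between the two tests: the higher the intercorrelation,
    the less reliable the difference score becomes."""
    return (0.5 * (rel_a + rel_b) - r_ab) / (1 - r_ab)

# The chapter's example: language 70 vs. nonverbal 90, both tests assumed
# to have reliability .90 and an intercorrelation of .70 (invented values).
low, high = difference_ci(70, 90, rel_a=0.90, rel_b=0.90)
print(round(low, 1), round(high, 1))                       # -33.1 -6.9
print(round(difference_reliability(0.90, 0.90, 0.70), 2))  # 0.67
```

Under these assumptions the 95% interval around the 20-point difference still excludes zero, but two tests with reliability .90 each yield a difference score with reliability of only about .67, which illustrates why difference scores deserve the caution urged above.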
Even when a difference between two scores is found to be reliable, Salvia and Good (1982) pointed out, a difference of
that magnitude may not be particularly uncommon, or, even more importantly, it may not be functionally meaningful.
Because of the resources involved, determining the functional significance of differences in skill levels represents yet
another area in which clinicians must look to the research literature to help them interpret their clinical data. Fortunately,
in cases where comparisons between scores affect identification decisions, there is a rich literature examining these
issues (e.g., for SLI). Clinicians can be more active and work to change policy in settings in which the use of
discrepancies is mandated for purposes for which they have been found to lack meaning.
In summary, comparing scores is a more complicated endeavor than it first appears, involving as it does not only the child’s test scores but also the properties of the two tests, especially their norms and intercorrelation. A well-reasoned
conservatism in undertaking identifications based on such comparisons should be joined by a healthy appetite for the
clinical literature exploring their significance.
Taking into Account Base Rates and Referral Rates
Each of the special considerations addressed earlier had a more specific focus on test selection or on the use of tests with
a particular child. Two other factors that affect screening and identification decisions really represent features of the
clinical environment: the rarity of the disorder (the base rate of the disorder) and the frequency with which referrals are
made in a particular setting (the referral rate). In this section, these two topics are discussed briefly because of their
effect on screening programs.
The lower the base rate of the disorder—that is, the rarer the disorder in the general population—the more likely it
becomes that the positive results of screening or identification are actually false positives rather than true positives
(Hummel, 1999). Shepard (1989) pointed out that although people understand that classification error will occur based
on fallible measures and decision processes, they fail to appreciate that that error can fall as heavily on children who are identified as having a disorder as on those who are not, even when the validity coefficient for the measure being used is
quite large. She concluded that when base rates are low, “even with reasonably valid measures, the identifications will be
equally divided between correct decisions and false positive decisions” (Shepard, 1989, p. 551). This problem is
particularly acute when measures are less valid for a given population, such as minority children, where
overidentification is very likely to result (Schraeder et al., 1999).
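The base-rate effect Shepard describes is simple arithmetic. The sketch below, using illustrative (not published) sensitivity, specificity, and base-rate values, shows how a low base rate drives down the proportion of positive screens that are true positives.

```python
def screening_outcomes(base_rate, sensitivity, specificity, n=10_000):
    """Expected screening outcomes for n children. Returns counts of true
    and false positives and the positive predictive value (the share of
    positive screens that are correct)."""
    affected = n * base_rate
    unaffected = n - affected
    true_pos = affected * sensitivity
    false_pos = unaffected * (1 - specificity)
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv

# A reasonably valid measure (90% sensitivity and specificity) applied
# where the disorder's base rate is 5%; all three values are invented
# for illustration, not figures for any published screening test.
tp, fp, ppv = screening_outcomes(base_rate=0.05, sensitivity=0.90, specificity=0.90)
print(round(tp), round(fp), round(ppv, 2))  # 450 950 0.32
```

Even with these generous assumptions, more than two of every three positive screens are false positives. Raising the base rate by targeting referral-worthy subgroups (say, to 20%) lifts the positive predictive value to roughly .69, which is the logic behind the targeting strategies discussed next.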
Concern about low base rates has led public health researchers and psychologists interested in rare psychiatric outcomes
(e.g., suicide) to develop several strategies designed to target screening at subsets of the larger population with higher
base rates. These include the use of multistep screening procedures and the application of screening procedures to subgroups expected to have higher prevalence rates than the general population
(Derogatis & DellaPietra, 1994). Currently, the prevalence of childhood language disorders across all types is not
particularly low, as can be illustrated by the fact that it is estimated that children with language disorders constitute 53%
of all speech-language pathologists’ case loads (Nelson, 1993). Nonetheless, it is sufficiently low that careful selection of
groups for language screening makes good sense. Children about whom concerns are expressed or who are
demonstratively failing in some aspect of their adaptation to school or home environments make obvious candidates for
more focused screenings and indeed are often seen for screening prior to more comprehensive evaluations.
Screening programs in preschool education are associated with enormous differences in referral rates (Thurlow,
Ysseldyke, & O’Sullivan, 1985, as cited in Nuttall et al., 1999), the rates at which children who are screened are referred
on for additional assessment. This variability leads to concerns about overreferral when referral rates are particularly
high and underreferral when they are particularly low. Because
overreferrals needlessly tax clinical resources, parental concern, and the child’s patience, whereas underreferrals deprive
children of needed attention, steps to study and alter referral rates have been recommended. Changes in the targets for
screening and the criteria (including cutoffs) used can be made to address verified inadequacies in the screening
mechanism. In addition, the use of a second-level screening using measures that are intermediate in their efficiency and
comprehensiveness between initial screenings and full-fledged assessments has been recommended (Nuttall et al., 1999).
Available Tools
Screening
Available screening measures differ in terms of whether information is obtained directly by the speech-language
pathologist and whether the measurement is formal or informal. Screening methods include the use of norm-referenced
standardized tools as well as informal clinician-developed measures. Over the past few years there has been growing
interest in the development of questionnaires that might be used to increase the involvement of parents and others
familiar with the child and improve the quality of information obtained from them. More recently still, there has been an
interest in the development of criterion-referenced authentic assessments in which specific minimal competencies are
evaluated in a familiar setting. Schraeder et al. (1999) described such a protocol that was developed for use with young
speakers of Black English. Because its elements were selected for their high degree of overlap with features of Standard
American English, Schraeder and her colleagues suggested its potential relevance for many children in the targeted age
group of 3-year-olds.
Parent Questionnaires and Related Instruments
Although historically some instruments have incorporated the use of parent report for very young children (e.g., the
Sequenced Inventory of Communicative Development, Hedrick, Prather, & Tobin, 1975), extensive development of
parent questionnaires for language-disorder screening has blossomed only in the past decade. The use of such
instruments is welcomed from a family-centered perspective (Crais, 1993) because parents are given the opportunity to
share their expertise concerning the child as part of their collaboration in the assessment process. In addition, these
measures also show good potential for efficient, valid use from a psychometric point of view. One obvious advantage
that they have over the clinician-administered procedures is their ability to obtain information that has been accumulated
by the parent over time using questions that cover a variety of situations and settings. For some children and at some
times, the testing advantage is irrefutable: The child will simply not cooperate for more direct testing or is so thoroughly
affected by the testing situation as to make the results of structured observations hopelessly flawed. Even when children
are more amenable to interacting with strangers, parent questionnaires may help remove the subtler invalidating
influence of the clinician on the child’s behavior (Maynard & Marlaire, 1999).
On the basis of a growing number of studies, it appears that parent questionnaires may reliably and validly be used to
obtain information about a number of language areas, especially expressive vocabulary and syntax—although most
individual measures are still very undeveloped. Leading the trend toward increased development of these measures, the
MacArthur Communicative Development Inventories (Fenson et al., 1991) has been thoroughly studied (e.g., Bates,
Bretherton, & Snyder, 1988; Dale, Bates, Reznick, & Morisset, 1989; Reznick & Goldsmith, 1989). In addition, it has
also been effectively adapted for use with other languages, including Italian, Spanish, and Icelandic (Camaioni, Castelli,
Longobardi, & Volterra, 1991; Jackson-Maldonado, Thal, Marchman, Bates & Gutierrez-Clellan, 1993; Thordardottir &
Ellis Weismer, 1996). Other tools that assess communication more broadly have also been developed but have received
less widespread attention and validation (e.g, Girolametto, 1997; Hadley & Rice, 1993; Haley, Coster, Ludlow,
Haltiwanger, & Andrellos, 1992). Table 9.4 lists five instruments for use with English-speaking children under the age of
3, each of which consists of a parent questionnaire or makes use of parent report for at least some items.
Table 9.4
Instruments for Use With Children Under 3 Years of Age, Including Parent Reports

Measure and Source | Ages Covered | Receptive or Expressive | Areas of Language Covered
Language Development Survey (Rescorla, 1989). From “The language development survey: A screening tool for delayed language in toddlers.” Journal of Speech and Hearing Disorders, 54, 587–599. | 2-year-olds | E | Semantics
MacArthur Communicative Development Inventories (Fenson, Dale, Reznick, Thal, Bates, Hartung, Pethick, & Reilly, 1991). San Diego, CA: San Diego State University, Center for Research in Language. | 8 months to 2½ years | E | Semantics
Receptive-Expressive Emergent Language Test (2nd ed.; Bzoch & League, 1971). Austin, TX: Pro-Ed. | 0 to 3 years | R and E |
Rossetti Infant–Toddler Language Scale (Rossetti, 1990). East Moline, IL: LinguiSystems. | 0 to 3 years | R and E | Pragmatics, play, comprehension, expression
Sequenced Inventory of Communication Development–Revised (Hedrick, Prather, & Tobin, 1984). Seattle, WA: University of Washington Press. | 4 months to 4 years | R and E | Phonology, morphology, syntax, semantics

Questionnaires that take advantage of the familiarity of other adults with the child—usually classroom teachers—are also being developed (Bailey & Roberts,
1987; Sanger, Aspedon, Hux, & Chapman, 1995; Semel, Wiig, & Secord, 1996; Smith, McCauley, & Guitar, in press;
Stokes, 1997). Results of these have also been compared with parent questionnaires (Whitworth, Davies, & Stokes,
1993) and against formal assessments (Botting, Conti-Ramsden, & Crutchley, 1997). Usually, however, these
questionnaires have not been developed for use in the identification process, but rather to describe the nature of problems
facing the child in the classroom. Thus, they will be considered in the next chapter, which deals with description.
Norm-Referenced Standardized Measures
Standardized measures are not well established as screening tools in the field. Only 50% of the 109 clinicians in Oregon
responding to a survey concerning their test use reported that they used standardized measures for screening (Huang et
al., 1997). Another related result from that same study was that only 1 screening test (the Screening Test of Adolescent
Language, Prather, Breecher, Stafford, & Wallace, 1980) appeared in the list of 10 tests that are most commonly used by
speech-language pathologists in their work with four age groups (0–3, 4–5, 6–12, and 13–19). Nonetheless, standardized
screening of younger children has received increased attention with the IDEA requirement that children with
communication disorders be identified before entering school (Nuttall, Romero, & Kalesnik, 1999; Sturner, Layton,
Evans, Heller, Funk, & Machon, 1994).
Sturner et al. (1994) reviewed 51 measures available for speech and language screening covering at least some part of the
3–6-year age span. In that review, the researchers found that only 6 of the measures they examined provided sufficient
normative data, and were both brief (i.e., requiring 10 minutes or less) and comprehensive (i.e., covering more than one
modality or domain). Thus, despite a playing field filled with many players, relatively few instruments warrant serious consideration as comprehensive language screening tools. Table 9.5 describes the six tools supported in Sturner et al.’s review.
Although Sturner et al. (1994) focused on preschool screening measures, many of the measures they studied also extend to school-age children. Nonetheless, the availability of measures for both younger school-age
children and adolescents is greatly reduced compared with those available for preschoolers. This is probably due, for the
most part, to the various referral mechanisms that can reduce the need for formal screenings. Also, the persistent nature
of language problems means that screening of older children and adolescents for language disorders will usually only be
needed if screenings have been absent or ineffective at younger ages.
Table 9.5
Communication Screening Measures for Children Between 3 and 7 Years of Age That Were Found to Be Brief, Norm-Referenced, and Comprehensive (Defined as Phonology [Articulation] and Other Language Domains) by Sturner, Layton, Evans, Heller, Funk, and Machon (1994)

Test | Ages Covered | Marks Across the Expressive, Receptive, Semantics, Morphosyntax, Phonology, Pragmatics, and Reviewed in MMY? Columns
Communication Screen (Striffler & Willis, 1981) | 3 to 7 years | X X X
Fluharty Preschool Speech and Language Screening Test (Fluharty, 1978) | 2 to 6 years | X X X X X X
Physician’s Developmental Quick Screen (Kulig & Baker, 1975) | <1 to 6 years | X
Stephens Oral Language Screening Test (Stephens, 1977) | PreK–1st grade | X X X
Sentence Repetition Screening Test (Sturner, Kunze, Funk, & Green, 1993) | 4 to 5 years | X
Texas Preschool Screening (Haber & Norris, 1983) | 4 to 6 years | X X

Note. MMY = Mental Measurements Yearbooks.

Identification

Norm-Referenced Standardized Instruments

Even children within any specific category of developmental language disorders (i.e., language disorder associated with hearing loss, autism spectrum disorder, mental retardation, and SLI) vary considerably in the areas of language that are affected. Thus, it is important to be quite comprehensive in the identification process, particularly because part of that process will often be the identification of which aspects of language are affected. More comprehensive coverage across modalities (receptive, expressive) and domains of language (e.g., syntax, phonology) can be achieved
through the use of a measure designed for that purpose (e.g., the Test of Language Development—Primary: 3;
Newcomer & Hammill, 1997). It can also be achieved through the use of a battery of tests that provide more
comprehensive coverage or through a combination of these methods. Even when a “comprehensive” measure is used,
however, certain aspects of language function (especially pragmatics and discourse) are almost certainly overlooked.
The Appendix lists over 50 tests that have been described as useful in the identification process. The table includes basic information about each test’s identity, content, and intended population. Almost all of the measures
published between 1989 and 1996 have been reviewed for the Mental Measurements Yearbook on-line review service, thus allowing anyone with access to the Internet an opportunity to examine at least one, and often two, independent reviews. Earlier tests are likely to have been reviewed in the Mental Measurements Yearbook printed volumes. Tests
published after about 1996 are likely to be reviewed soon, perhaps even before the publication of this book.
Although the Appendix is not intended to be exhaustive, the number of tests it includes illustrates the staggering task
facing clinicians who must choose among them. It is interesting to note the relatively large number of tests that have
been created in the 1990s and the relatively small number of publishing houses responsible for their availability if not
their original construction. On the plus side, this means that in efforts to increase the quality of available measures,
individual clinicians and the profession can focus their cooperative interactions with fewer parties. On the negative side,
it means that publishers are often in the position of competing largely with their own products—a prospect that makes it
unlikely for free market pressures to help drive the quality of tests higher.
Criterion-Referenced Measures
In the realm of criterion-referenced measures, specific measures obtained through language analysis (e.g., mean length of
utterance, or MLU; 14-morpheme count; type-token ratio) are gaining increasing support in the identification process (e.g., Aram et al., 1993). In particular, some researchers have used MLU as an identification tool and found it to be more
consistent with clinician judgments than certain test data (M. Dunn, Flax, Sliwinski, & Aram, 1996). Usually, however,
MLU is used in combination with norm-referenced measures (Leonard, 1998). Because language analysis measures are
typically considered more useful in description than identification, the next chapter contains a more detailed account of
recent studies in which their strengths and limitations are examined. Nonetheless, it is important to reiterate here that
their use in identification is growing in significance.
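For readers unfamiliar with these language-sample measures, the sketch below computes MLU in morphemes and a type-token ratio from a tiny hand-segmented sample. The utterances and the hyphen convention for bound morphemes are invented for illustration; real analyses follow detailed segmentation rules (e.g., Brown-style morpheme counting) that are not automated here.

```python
def mlu_morphemes(utterances):
    """Mean length of utterance in morphemes. Each utterance is a list of
    morphemes (bound morphemes marked here with a leading hyphen); the
    segmentation itself is assumed to have been done by hand."""
    return sum(len(u) for u in utterances) / len(utterances)

def type_token_ratio(words):
    """Number of different words divided by the total words in the sample."""
    return len(set(words)) / len(words)

# A tiny invented sample, segmented by hand for illustration only.
sample = [
    ["doggie", "run", "-ing"],
    ["want", "cookie"],
    ["mommy", "go", "-ed", "out"],
    ["doggie", "go"],
]
print(mlu_morphemes(sample))  # 2.75
free_words = [m for u in sample for m in u if not m.startswith("-")]
print(round(type_token_ratio(free_words), 2))  # 0.78
```

Clinically, of course, the numbers matter only against normative or criterion expectations for the child’s age, which is where the research literature discussed above comes in.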
Practical Considerations
In chapter 4, several variables affecting clinicians were highlighted for their potential effects on speech-language
pathologists. These variables included federal legislation, local regulations, and global changes in perspective toward
behavioral problems. In
cases of screening and identification, particularly as they are practiced in school settings, those variables can
dramatically affect the shape of practice—both for better and for worse (Cirrin et al., 1989). In this brief section, the
effects of these factors on screening and identification are primarily discussed through practice constraints related to
determining children’s eligibility for services.
In 1989, Nye and Montgomery examined the criteria used in 47 states to identify children as having a language disorder.
They used a case example in which a 13-year-old girl who moved frequently because her father was in the military had
variously been considered “language disordered in one state, learning disabled in another, ineligible in a third, and
eligible only for tutorial support in a fourth.” (Nye & Montgomery, 1989, p. 26). In the 47 states they examined, they
found that although most provided specific definitions of language disorder, the definitions were highly inconsistent
from one state to the next. Only about a half of the states made some reference to the components of language, and
among those that did, semantics and syntax were included far more frequently than phonology, morphology, and
pragmatics. Twenty-one states required the use of at least one standardized language test and only 7 required use of a
language sample. Three different means of finding eligibility were identified across the states—the use of a discrepancy
formula, a rating–severity scale, and professional report. Nye and Montgomery noted the poor reliability likely to be
attached to the use of rating–severity scales. Consistent with the poor evaluation of discrepancy scores even in the 1980s,
the authors expressed dismay at the relative frequency with which discrepancy formulas were used. However, they
seemed to have combined instances in which a cutoff is used (e.g., 1.5 standard deviations below the mean on a
standardized measure) with the truly more notorious instances of cognitive referencing in which a discrepancy is found
between two measures for a given child. This practice makes the extent of cognitive referencing difficult to determine
from their report. In their conclusions, Nye and Montgomery pointed out the need for greater uniformity in terminology
and criteria used with this population.
In case you have been reading this account and hoping that things changed rapidly, a brief look at similar variables 4
years later (Apel, Hodson, Shulman, & Gordon-Brannan, 1994) will be of interest. Apel and his colleagues examined the
eligibility guidelines for most states (data for Tennessee could not be obtained) and for the District of Columbia. The
data showed continuing inadequacy in the definitions being used. Definitions of language used by state Departments of
Education included reference to both oral and written language only 40% of the time, with the majority of state
definitions including either no reference to oral or written language (40%) or definitions addressing oral language only
(20%). Specific guidelines for eligibility were often missing (46%) or were quite heterogeneous. Although standard
scores often figured in available guidelines, cutoffs were quite variable (ranging from 1.5 to greater than 2 standard
deviations below the mean), encouragement to use multiple standardized measures was often absent, and severity ratings
were sometimes used as bases for eligibility. When specific criteria for preschool children were sought, only 8 states
(16%) had developed criteria for that population and the types of criteria used were quite variable. Among the practices
incorporated in these guidelines were the use of percentage delay as the sole criterion or as part of a more complex
criterion—a
practice that, unfortunately, relies on the use of notoriously unreliable age-equivalent scores.
In short, 4 years did not appear to have resulted in many improvements in the practices reflected in state regulations.
Where are we today? A study of state regulations comparable with those of Nye and Montgomery (1989) and Apel et al.
(1994) is currently underway by ASHA (Susan Karr, personal communication). Although these data are in the process of
being analyzed, it seems unlikely that the fit between legislatively influenced practice and “best” practices will have
been brought into much better alignment than that reported a decade ago by Nye and Montgomery. One positive trend,
however, is the intention of the developers of this most recent report to pair recommendations for components
comprising a defensible set of guidelines with preliminary follow-up efforts designed to result in the redrafting of
guidelines in at least a small number of states (Susan Karr, personal communication, October, 1999).
In 1997, Merrell and Plante called attention to the need for more studies aimed at furthering the development of
empirical bases for test selection. In particular, they noted that such studies can minimize test selection based on
subjective grounds, such as test familiarity and the recommendations or mandates of supervisors or districts. Although
responding to legal and workplace obligations is a necessary part of clinical practice, the nature of the response can go
beyond a simple compliance with an unsatisfactory status quo. More satisfying and ethical responses include increasing
the knowledge base of the profession through studies intended to identify best practices, increasing the knowledge base
of individuals around measurement issues, and working with professional organizations at the state and national levels to
effect needed changes.
Summary
1. Screening procedures, which typically are designed to be efficient in terms of time and other resources, lead to
decisions that a child receive further assessment. Strategic targeting of groups to be screened and the use of multiple
steps in screening procedures can improve screening accuracy when concerns about rarity of the disorder (low base
rates) and about overly high referral rates are encountered in a particular setting.
2. Identification involves the determination that a language problem exists, usually through the use of normative
comparisons facilitated by norm-referenced measures. Methods used in identification are often affected by the eligibility
requirements instituted by state Departments of Education.
3. Measures of sensitivity and specificity provide important empirical bases for test selection.
4. Cutoff scores are used in research and clinical practice to standardize identification decisions. Although their
empirical determination is feasible, they are often related to state eligibility requirements and are best used in
conjunction with awareness of the possible influence of measurement error and functional criteria that associate test
performance with real-world effects on the child’s social functioning.
5. For all children, but particularly for those with limited English proficiency or dialect use that differs substantially from
the clinician’s, the clinician’s attention to the effects of language difference and cultural effects on assessment can
enhance validity. Although controversy persists in the face of an inadequate but growing literature on the subject of
language and cultural influences on assessment, clinical strategies for mitigating negative effects on assessment abound.
6. When scores on different measures are compared with one another during the identification process, factors requiring
consideration include the effects of test error, test correlation, and differences in normative groups.
7. Although most measures used in screening of older children are standardized norm-referenced tests, the use of parent
questionnaires for younger children and criterion-referenced measures derived from language analyses is becoming more
common with increasing research.
8. Among practical factors affecting the selection of measures for use in screening and identification are state guidelines,
which have historically been slow to respond to professional recommendations regarding best practices.
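The interplay among sensitivity, specificity, and base rate described in points 1 through 3 can be made concrete with a brief calculation. The sketch below uses hypothetical values (a 5% base rate, 90% sensitivity, and 85% specificity) chosen purely for illustration; they are not drawn from any actual screening instrument.

```python
def screening_outcomes(n_children, base_rate, sensitivity, specificity):
    """Return expected counts of true positives, false negatives,
    false positives, and true negatives for a screening measure."""
    affected = n_children * base_rate
    unaffected = n_children - affected
    true_pos = affected * sensitivity            # children with the disorder who fail the screen
    false_neg = affected - true_pos              # children with the disorder who pass (misses)
    false_pos = unaffected * (1 - specificity)   # typically developing children who fail (over-referrals)
    true_neg = unaffected - false_pos
    return true_pos, false_neg, false_pos, true_neg

tp, fn, fp, tn = screening_outcomes(1000, base_rate=0.05,
                                    sensitivity=0.90, specificity=0.85)

# Positive predictive value: of children referred on, the proportion
# who truly have the disorder.
ppv = tp / (tp + fp)
referral_rate = (tp + fp) / 1000

print(f"Children referred for further assessment: {tp + fp:.0f} of 1000")
print(f"Referral rate: {referral_rate:.1%}")
print(f"Proportion of referrals with a true disorder (PPV): {ppv:.1%}")
```

With these hypothetical figures, fewer than one in four referred children actually has a disorder, even though the screen correctly identifies 90% of affected children. That arithmetic consequence of rarity is one motivation for the multiple-step screening procedures mentioned in point 1.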
Key Concepts and Terms
cutoff score: the score that serves as a decision boundary in screening or identification, such that scores above a
particular level are seen as representing nonproblematic performance and those below that level are seen as indicative of
potential disorder or difference.
gold standard: a measure used as the basis for comparison when a second measure is being evaluated. It is thought to
provide a “true” measurement of the behavior or characteristic being measured.
language difference: a difference in language use reflecting systematic variation in phonology, syntax, semantics, and so
forth, when compared with the dialect that is typically represented in standardized language measures.
limited English proficiency (LEP): language difficulties in English that appear to be related primarily to ineffective or
insufficient exposure to the language rather than to a language disorder, which may nonetheless be coexisting.
person-first nomenclature: using terms such as "a child with impaired language" instead of "a language-impaired child"
to avoid undue emphasis on the role of the problem in understanding the child.
referral rate: the rate at which children who are screened are referred on for additional assessment.
sensitivity: the ability of a measure to give a positive result when the person being assessed truly has the disorder.
specificity: the ability of a measure to give a negative result when the person being assessed truly does not have the
disorder.
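The caution in summary point 4 about measurement error near a cutoff score can likewise be sketched numerically. The standard error of measurement (SEM) is conventionally estimated as SEM = SD × √(1 − r), where r is the test's reliability coefficient. The observed score, standard deviation, reliability, and cutoff used below are hypothetical values for illustration only, not figures from any particular test manual.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence interval around an observed score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical case: observed standard score of 82, test SD of 15,
# reliability of .90, and a cutoff of 85.
low, high = confidence_interval(observed=82, sd=15, reliability=0.90)
cutoff = 85
interval_spans_cutoff = low < cutoff < high

print(f"SEM: {sem(15, 0.90):.2f}")
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
print(f"Interval includes the cutoff of {cutoff}: {interval_spans_cutoff}")
```

Because the resulting interval spans the cutoff, the observed score of 82 cannot, by itself, be taken as decisive evidence that the child's true score falls below 85; this is one reason cutoff scores are best interpreted alongside functional criteria.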
Study Questions and Questions to Expand Your Thinking
1. What might the effects of poor sensitivity be on the following decisions?
Screenings for hearing loss in children with known language impairments;
Identification testing for children’s eligibility for communication problems warranting early intervention services; and
Determination of the presence of a language disorder in a bilingual child.
2. What might the effects of poor specificity be on the following decisions?
Screenings of a large group of kindergarten children for speech and language disorders;
Identification of oral language disorder in children who are failing academically; and
Language screenings for children who speak Spanish-influenced English.
3. Imagine that you are a school speech-language pathologist who is interested in obtaining information about the
specificity and sensitivity of your own screening procedures. How might you obtain the information you need for
looking at both kinds of hits and both kinds of misses—false positives and false negatives? Which kind of information will be
most difficult to obtain?
4. Use Appendix A and your reading of this chapter to consider the following questions. What domains of language and
what age groups appear to be less well represented in standardized tests? Besides those reasons given in the text, can you
think of reasons for these patterns?
5. Take two measures listed in Appendix A that are said to target one or more language domains and modalities in
common. Compare and contrast the content of these shared components in terms of numbers and kinds of items, tasks,
and stimuli.
6. On the basis of what you have read, create a list of 5 research questions that, if answered, would greatly improve
screening and assessment practices in speech-language pathology.
Recommended Readings
American Speech-Language-Hearing Association (1993). Definitions of communication disorders and variations. Asha,
35(Suppl. 10), 40–41.
Hansen, J. C. (1999). Test psychometrics. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist–practitioner
perspectives on test interpretation (pp. 15–30). Boston: Allyn & Bacon.
Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of
educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196).
Mahwah, NJ: Lawrence Erlbaum Associates.
References
American Speech-Language-Hearing Association (1999). Guidelines for roles and responsibilities of the school-based
speech-language pathologist [On-line]. Available: http://www.asha.org/professionals/library/slpschool_i.htm#purpose.
Apel, K., Hodson, B., Shulman, B., & Gordon-Brannan, M. (1994, November). Severity ratings and eligibility criteria: A
(confused?) state of the union. Miniseminar presented to the annual convention of the American Speech-Language-
Hearing Association, New Orleans, LA.
Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific
language impairment. Journal of Speech and Hearing Research, 36, 580–591.
Bailey, D., & Roberts, J. E. (1987). Teacher–Child Communication Scale. Chapel Hill, NC: University of North Carolina.
Bates, E., Bretherton, I., & Snyder, L. (1988). From first words to grammar: Individual differences and dissociable
mechanisms. Cambridge, England: Cambridge University Press.
Botting, N., Conti-Ramsden, G., & Crutchley, A. (1997). Concordance between teacher/therapist opinion and formal
language assessment scores in children with language impairment. European Journal of Disorders of Communication,
32, 317–327.
Bracken, B. A. (1987). Limitations of preschool instruments and standards for minimal levels of technical adequacy.
Journal of Psychoeducational Assessment, 4, 313–326.
Brown, J. (1989). The truth about scores children achieve on tests. Language, Speech, and Hearing Services in Schools,
20, 366–371.
Bzoch, K. R., & League, R. (1971). The Receptive–Expressive Emergent Language Scale—Revised.
Gainesville, FL: Language Education Division, Computer Management Corporation.
Camaioni, L., Castelli, M. C., Longobardi, E., & Volterra, V. (1991). A parent report instrument for early language
assessment. First Language, 11, 345–359.
Campbell, T., Dollaghan, C., Needleman, H., & Janosky, J. (1997). Reducing bias in language assessment: Processing-
dependent measures. Journal of Speech, Language, and Hearing Research, 40, 519–525.
Cheng, L. L. (1987). Assessing Asian language proficiency: Guidelines for evaluating limited-English-proficient
students. Rockville, MD: Aspen.
Cirrin, F. M., Bashir, A., Brinton, B., Damico, J. S., Dublinske, S., Edwards, E. B., Grimes, A. M., Kamhi, A. G.,
Prelock, P. A., Rodriguez, J. M., Shulman, B. B., Tibbits, D. F., & Westby, C. (1989, March). Issues in determining
eligibility for language intervention. ASHA, 113–118.
Compton, C. (1996). A guide to 100 tests in special education. Upper Saddle River, NJ: Globe Fearon Educational
Publisher.
Crago, M. B., Annahatak, B., Doehring, D. G., & Allen, S. (1991). First language evaluation by native speakers: A
preliminary study. Journal of Speech-Language Pathology and Audiology, 15, 43–48.
Crais, E. R. (1993). Families and professionals as collaborators in assessment. Topics in Language Disorders, 14(1), 29–
40.
Dale, P., Bates, E., Reznick, S., & Morisset, C. (1989). The validity of a parent report instrument of child language at
twenty months. Journal of Child Language, 16, 239–249.
Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and
design. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language
assessment (pp. 1–8). San Antonio, TX: Psychological Corporation.
Damico, J. S., Smith, M., & Augustine, L. E. (1996). Multicultural populations and language disorders. In M. D. Smith
& J. S. Damico (Eds.), Childhood language disorders (pp. 272–299). New York: Thieme.
Derogatis, L. R., & DellaPietra, L. (1994). The use of psychological testing for treatment planning and outcome
assessment. In M. E. Mareish, (Ed.), The use of psychological testing for treatment planning and outcome assessment
(pp. 22–54). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dollaghan, C., & Campbell, T. (1999, November). Is child language impairment a taxon? Paper presented at the
American Speech-Language-Hearing Association, San Francisco.
Dunn, L. M., & Dunn, L. M. (1997). Peabody Picture Vocabulary Test–III (3rd ed.). Circle Pines, MN: American
Guidance Service.
Dunn, L. M., Lugo, D. E., Padilla, E. R., & Dunn, L. M. (1986). Test de Vocabulario en Imagenes Peabody [Peabody
Picture Vocabulary Test]. Circle Pines, MN: American Guidance Service.
Dunn, M., Flax, J., Sliwinski, M., & Aram, D. M. (1996). The use of spontaneous language measures as criteria for
identifying children with specific language impairment: An attempt to reconcile clinical and research incongruence.
Journal of Speech and Hearing Research, 39, 643–654.
Erickson, J. G., & Iglesias, A. (1986). Assessment of communication disorders in non-English proficient children. In O.
Taylor (Ed.), Nature of communication disorders in culturally and linguistically diverse populations (pp. 181–218). San
Diego, CA: College-Hill Press.
Feeney, J., & Bernthal, J. E. (1996). The efficiency of the Revised Denver Developmental Screening Test as a language
screening tool. Language, Speech, and Hearing Services in Schools, 27, 330–332.
Fenson, L., Dale, P., Reznick, S., Thal, D., Bates, E., Hartung, J., Pethick, S., & Reilly, J. (1991). Technical manual for
the MacArthur Communicative Development Inventories. San Diego, CA: San Diego State University.
Fey, M. (1986). Language intervention with young children. San Diego, CA: College-Hill Press.
Fey, M., Long, S., & Cleave, P. (1994). Reconsideration of IQ criteria in the definition of specific language impairment.
In R. Watkins & M. Rice (Eds.), Specific language impairments in children (pp. 161–178). Baltimore: Paul H. Brookes.
Fluharty, N. (1978). Fluharty Preschool Speech and Language Screening Test. Boston: Teaching Resources.
Frankenburg, W. K., Dodds, J., & Archer, P. (1990). Denver–II: Technical manual. Denver, CO: Denver Developmental
Materials.
Frankenburg, W. K., Dodds, J., Fandal, A., Kazuk, E., & Cohrs, M. (1975). The Denver Developmental Screening Test:
Revised. Denver, CO: Denver Developmental Materials.
Girolametto, L. (1997). Development of a parent report measure for profiling the conversational skills of preschool
children. American Journal of Speech-Language Pathology, 6, 25–33.
Gutierrez-Clellen, V. F., Brown, S., Conboy, B., & Robinson-Zañartu, C. (1998). Modifiability: A dynamic approach to
assessing immediate language change. Journal of Children's Communication Development, 19, 31–42.
Haber, J. S., & Norris, M. L. (1983). The Texas Preschool Screening Inventory: A simple screening device for language
and learning disorders. Children’s Health Care, 12(1), 11–18.
Hadley, P. A., & Rice, M. L. (1993). Parental judgments of preschoolers’ speech and language development: A resource
for assessment and IEP planning. Seminars in Speech and Language, 14, 278–288.
Haley, S. M., Coster, W. J., Ludlow, L. H., Haltiwanger, J. T., & Andrellos, P. J. (1992). Pediatric evaluation of
disability inventory, Version 1.0. Boston: New England Medical Center Hospitals.
Hansen, J. C. (1999). Test psychometrics. In J. W. Lichtenberg & R. K. Goodyear (Eds.), Scientist–practitioner
perspectives on test interpretation (pp. 15–30). Boston: Allyn & Bacon.
Haynes, W. O., Pindzola, R. H., & Emerick, L. L. (1992). Diagnosis and evaluation in speech pathology (4th ed.).
Englewood Cliffs, NJ: Prentice–Hall.
Hedrick, D., Prather, E., & Tobin, A. (1975). Sequenced Inventory of Communication Development. Seattle, WA:
University of Washington Press.
Hirshoren, A., & Ambrose, W. R. (1976). Language, Speech, and Hearing Services in Schools, 7(2), 86–89.
Huang, R., Hopkins, J., & Nippold, M. A. (1997). Satisfaction with standardized language testing: A survey of speech-
language pathologists. Language, Speech, and Hearing Services in Schools, 28, 12–29.
Hummel, T. J. (1999). The usefulness of tests in clinical decisions. In J. W. Lichtenberg & R. K. Goodyear (Eds.),
Scientist–practitioner perspectives on test interpretation (pp. 59–112). Boston: Allyn & Bacon.
Hutchinson, T. A. (1996). What to look for in the technical manual: Twenty questions for users. Language, Speech, and
Hearing Services in Schools, 27, 109–121.
Individuals With Disabilities Education Act (IDEA). Pub. L. No. 101–476, 104 Stat. 1103 (1990).
Jackson-Maldonado, D., Thal, D., Marchman, V., Bates, E., & Gutierrez-Clellen, V. (1993). Early lexical development
in Spanish-speaking infants and toddlers. Journal of Child Language, 20(3), 523–549.
Kamhi, A. (1998). Trying to make sense of developmental language disorders. Language, Speech, and Hearing
Services in Schools, 29, 35–44.
Kamhi, A., Pollock, K. E., & Harris, J. L. (Ed.). (1996). Communication development and disorders in African American
children: Research, assessment, and intervention. Baltimore: Paul H. Brookes.
Kaufman, A. S., & Kaufman, N. L. (1983). Kaufman Assessment Battery for Children. Circle Pines, MN: American
Guidance Service.
Kayser, H. (1989). Speech and language assessment of Spanish-speaking children. Language, Speech, and Hearing
Services in Schools, 20, 226–244.
Kayser, H. (1991). Interpreters in speech-language pathology. Texas Journal of Audiology and Speech Pathology, 17, 28–
29.
Kayser, H. (Ed.). (1995). Bilingual speech-language pathology: An Hispanic focus. San Diego, CA: Singular Publishing
Group.
Kelly, D. J., & Rice, M. L. (1986). A strategy for language assessment of young children: A combination of two
approaches. Language, Speech, and Hearing Services in Schools, 17, 83–94.
Krassowski, E., & Plante, E. (1997). IQ variability in children with SLI: Implications for use in cognitive referencing in
determining SLI. Journal of Communication Disorders, 30, 1–9.
Kulig, S. G., & Baker, K. A. (1975). Physician’s Developmental Quick Screen for Speech Disorders. Galveston, TX:
Department of Pediatrics, University of Texas Medical Branch at Galveston.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Lahey, M. (1990). Who shall be called language disordered? Some reflections and one perspective. Journal of Speech
and Hearing Disorders, 55, 612–620.
Leap, W. L. (1993). American Indian English. Salt Lake City, UT: University of Utah Press.
Lee, L. (1974). Developmental sentence analysis. Evanston, IL: Northwestern University Press.
Lehr, C. A., Ysseldyke, J. E., & Thurlow, M. L. (Eds.). (1986). Assessment practices in model early childhood education
programs. Psychology in the Schools, 24, 390–399.
Leonard, L. (1987). Is specific language impairment a useful construct? In S. Rosenberg (Ed.), Advances in applied
psycholinguistics, 1: Disorders of first-language acquisition (pp. 1–39). New York: Cambridge University Press.
Leonard, L. (1998). Children with specific language impairment. Cambridge: Massachusetts Institute of Technology
Press.
Leonard, L., & Weiss, A. L. (1983). Application of nonstandardized assessment procedures to diverse linguistic
populations. Topics in Language Disorders, 3(3), 35–45.
Mardell-Czudnowski, C., & Goldenberg, D. (1983). Developmental Indicators for Assessment of Learning—Revised
(DIAL-R). Edison, NJ: Childcraft Education Corporation.
Maynard, D. W., & Marlaire, C. L. (1999). Good reasons for bad testing performance: The interactional substrate of
educational testing. In D. Kovarsky, J. Duchan, & M. Maxwell (Eds.), Constructing (in)competence (pp. 171–196).
Mahwah, NJ: Lawrence Erlbaum Associates.
McCarthy, D. A. (1972). McCarthy Scales of Children's Abilities. San Antonio, TX: Psychological Corporation.
McCauley, R. J., & Swisher, L. (1984a). Psychometric review of language and articulation tests for preschool children.
Journal of Speech and Hearing Disorders, 49, 34–42.
McCauley, R. J., & Swisher, L. (1984b). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical
case. Journal of Speech and Hearing Disorders, 49, 338–348.
Meehl, P. E. (1992). Factors and taxa, traits and types, differences of degree and differences in kind. Journal of
Personality, 60, 117–174.
Meehl, P. E., & Yonce, L. J. (1994). Taxometric analysis: I. Detecting taxonicity with two quantitative indicators using
means above and means below a sliding cut (MAMBAC procedure). Psychological Reports, 74 (Monograph Supplement
1-V74), 1059–1274.
Meehl, P. E., & Yonce, L. J. (1996). Taxometric analysis: II. Detecting taxonicity using two quantitative indicators in
successive intervals of a third indicator (MAXCOV procedure). Psychological Reports, 78 (Monograph Supplement 1-
V78), 1091–1227.
Merrell, A., & Plante, E. (1997). Norm-referenced test interpretation in the diagnostic process. Language, Speech, and
Hearing Services in Schools, 28, 50–58.
Muma, J. (1985). “No news is bad news”: Response to McCauley and Swisher (1984). Journal of Speech and Hearing
Disorders, 50, 290–293.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Nelson, N. W. (1993). Language intervention in school settings. In D. K. Bernstein & E. Tiegerman (Eds.), Language
and communication disorders in children (3rd ed., pp. 273–324). New York: Merrill.
Newborn Infant Hearing Screening and Intervention Act. Pub. L. No. 106–113, 113 Stat. 1501 (1999).
Newcomer, P. L., & Hammill, D. D. (1997). Test of Language Development—Primary: 3. Austin, TX: Pro-Ed.
Nicolosi, L., Harryman, E., & Kresheck, J. (1996). Terminology of communication disorders (4th ed.). Baltimore:
Williams & Wilkins.
Norris, M. K., Juarez, M. J., & Perkins, M. N. (1989). Adaptation of a screening test for bilingual and bidialectal
populations. Language, Speech, and Hearing Services in Schools, 20, 381–390.
Nunnally, J. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Nuttall, E. V., Romero, I., & Kalesnik, J. (Eds.). (1999). Assessing and screening preschoolers: Psychological and
educational dimensions (2nd ed.). Needham Heights, MA: Allyn & Bacon.
Nye, C., & Montgomery, J. K. (1989). Identification criteria for language disordered children: A national survey.
Hearsay: The Journal of the Ohio Speech and Hearing Association, Spring, 26–33.
Olswang, L., Bain, B., & Johnson, G. (1992). Using dynamic assessment with children with language disorders. In S. F.
Warren & J. Reichle (Eds.), Causes and effects in communication and language intervention (pp. 187–215). Baltimore:
Brookes Publishing.
Owens, R. E. (1995). Language disorders: A functional approach to assessment and treatment (2nd ed.). Boston: Allyn
& Bacon.
Pang, V. O., & Cheng, L. L. (Eds.). (1998). Struggling to be heard: The unmet needs of Asian Pacific American children.
Albany, New York: State University of New York Press.
Paul, R. (1995). Language disorders from infancy through adolescence: Assessment and intervention. St. Louis, MO:
Mosby Yearbook.
Peña, E. (1996). Dynamic assessment: The model and its language applications. In K. Cole, P. Dale, & D. Thal (Eds.),
Advances in assessment of communication and language (pp. 281–308). Baltimore: Brookes Publishing.
Plante, E. (1998). Criteria for SLI: The Stark and Tallal legacy and beyond. Journal of Speech, Language, and Hearing
Research, 41, 951–957.
Plante, E., & Vance, R. (1994). Selection of preschool speech and language tests: A data-based approach. Language,
Speech, and Hearing Services in Schools, 25, 15–23.
Plante, E., & Vance, R. (1995). Diagnostic accuracy of two tests of preschool language. American Journal of Speech-
Language Pathology, 4, 70–76.
Prather, E. M., Breecher, S. V. A., Stafford, M. L., & Wallace, E. M. (1980). Screening Test of Adolescent Language
(STAL). Seattle, WA: University of Washington Press.
Prizant, B. M., & Wetherby, A. M. (1993). Communication and language assessment for young children. Infants and
Young Children, 5, 20–34.
Rescorla, L. (1989). The language development survey: A screening tool for delayed language in toddlers. Journal of
Speech and Hearing Disorders, 54, 587–599.
Reveron, W. W. (1984). Language assessment of Black children: The state of the art. Papers in the Social Sciences, 4,
79–94.
Reznick, S., & Goldsmith, L. (1989). A multiple form word production checklist for assessing early language. Journal of
Child Language, 16, 91–100.
Robinson-Zañartu, C. (1996). Serving Native American children and families: Considering cultural variables. Language,
Speech, and Hearing Services in Schools, 27, 373–384.
Rossetti, L. (1990). Rossetti Infant–Toddler Language Scale. East Moline, IL: LinguiSystems.
Roussel, N. (1991). Appendix A: Annotated bibliography of Communicative Abilities Test. In E. V. Hamayan & J. S.
Damico (Eds.), Limiting bias in the assessment of bilingual children (pp. 320–343). Austin: Pro-Ed.
Sabatino, A. D., Vance, H. B., & Miller, T. L. (1993). Defining best diagnostic practices. In H. B. Vance (Ed.), Best
practices in assessment for school and clinical settings (pp. 1–28). Brandon, VT: Clinical Psychology Publishing.
Sabers, D., & Hutchinson, T. (1990). User norms software. Chicago: Riverside Publishing.
Salvia, J., & Good, R. (1982). Significant discrepancies in the classification of pupils: Differentiating the concept. In J.
T. Neisworth (Ed.), Assessment in special education (pp. 77–82). Rockville, MD: Aspen.
Salvia, J., & Ysseldyke, J. E. (1991). Assessment (5th ed.). Boston: Houghton Mifflin.
Salvia, J., & Ysseldyke, J. E. (1998). Assessment (7th ed.). Boston: Houghton Mifflin.
Sanger, D., Aspedon, M., Hux, K., & Chapman, A. (1995). Early referral of school-age children with language problems.
Journal of Childhood Communication Disorders, 16, 3–9.
Sattler, J. M. (1988). Assessment of children. San Diego: Author.
Schraeder, T., Quinn, M., Stockman, I. J., & Miller, J. F. (1999). Authentic assessment as an approach to preschool
speech-language screening. American Journal of Speech-Language Pathology, 8, 195–200.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Observation Rating Scales. Clinical Evaluation of Language
Fundamentals (3rd ed.). San Antonio: The Psychological Corporation.
Shepard, L. A. (1989). Identification of mild handicaps. In R. Linn (Ed.), Educational measurement (3rd ed., pp. 545–
572). New York: American Council on Education and Macmillan.
Smit, A. (1986). Ages of speech sound acquisition: Comparisons and critiques of several normative studies. Language,
Speech, and Hearing Services in Schools, 17, 175–186.
Smith, A. R., McCauley, R. J., & Guitar, B. (in press). Development of the Teacher Assessment of Student
Communicative Competence (TASCC) for children in grades 1 through 5. Communication Disorders Quarterly.
Sparrow, S., Balla, D., & Cicchetti, D. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American
Guidance Service.
Stephens, M. I. (1977). Stephens Oral Language Screening Test (SOLST). Peninsula, OH: Interim Publishers.
Stokes, S. F. (1997). Secondary prevention of paediatric language disability: A comparison of parents and nurses as
screening agents. European Journal of Disorders of Communication, 32, 139–158.
Striffler, N., & Willis, S. (1981). The Communication Screen. Tucson, AZ: Communication Skill Builders.
Sturner, R. A., Kunze, L., Funk, S. G., & Green, J. A. (1993). Elicited imitation: Its effectiveness for speech and
language screening. Developmental Medicine and Child Neurology, 35, 715–726.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Taylor, O. L., & Payne, K. T. (1983). Culturally valid testing: A proactive approach. Topics in Language Disorders, 3, 8–
20.
Terrell, S. L., Arensberg, K., & Rosa, M. (1992). Parent-child comparative analysis: A criterion-referenced method for
the nondiscriminatory assessment of a child who spoke a relatively uncommon dialect of English. Language, Speech, and
Hearing Services in Schools, 23, 34–42.
Terrell, S. L., & Terrell, F. (1983). Distinguishing linguistic differences from disorders: The past, present, and future of
nonbiased assessment. Topics in Language Disorders, 3, 107.
Thordardottir, E. T., & Ellis Weismer, S. (1996). Language assessment via parent report: Development of a screening
instrument for Icelandic children. First Language, 16, 265–285.
Thorner, R. M., & Remein, Q. R. (1962). Principles and procedures in the evaluation of screening for disease. Public
Health Monograph, 67, 408–421.
Turner, R. G. (1988). Techniques to determine test protocol performance. Ear and Hearing, 9, 177–189.
Van Keulen, J. E., Weddington, G. T., & DeBose, C. E. (1998). Speech, language, and learning and the African
American child. Boston: Allyn & Bacon.
Vaughn-Cooke, F. B. (1983). Improving language assessment in minority children. Asha, 25, 29–34.
Washington, J. A. (1996). Issues in assessing the language abilities of African American children. In A. G. Kamhi, K. E.
Pollock, & J. L. Harris (Eds.), Communication development and disorders in African American children (pp. 35–54).
Baltimore: Brookes.
Weddington, G. T. (1987). The assessment and treatment of communication disorders in culturally diverse populations.
Unpublished manuscript.
Whitworth, A., Davies, C., & Stokes, S. F. (1993). Identification of communication impairments in preschoolers: A
comparison of parent and teacher success. Australian Journal of Human Communication Disorders, 21, 112–133.
Wolfram, W. (1991). Dialects and American English. Englewood Cliffs, NJ: Prentice-Hall.
CHAPTER 10

Description: What Is the Nature of This Child's Language?

The Nature of Description
Special Considerations for Asking This Clinical Question
Available Tools
Practical Considerations
Nigel is a 9-year-old with mild mental retardation whose placement in a multiage classroom is complicated by a
moderate hearing loss and ADD. A 3-year reevaluation conducted at the beginning of the school year included extensive
audiological assessment as well as standardized language testing that confirmed particular difficulties in expressive
phonology and morphosyntax. Language sampling and a classroom checklist were used to help determine the
educational impact of Nigel’s difficulties and to help plan accommodations and develop Nigel’s individualized
educational plan.
Tao has a long history of communication problems that have changed with age. She was diagnosed with autism at age 4,
then Asperger’s syndrome at age 8. Now, at age 12, with appropriate accommodations and intensive treatment, she is in
a regular junior high school. Speech-language intervention has centered on addressing her pragmatic challenges with
peers and teachers. Goals in this area have been identified and tracked during the semester using a variety of descriptive
measures
created by her clinicians. Recently a dynamic assessment designed to examine Tao’s emerging awareness of the
perspectives of others was undertaken as part of this process.
The Nature of Description
Describing the skills of children with suspected language impairments, and the problems they face, sometimes occurs as
part of screening, thus preceding the use of formal procedures associated with identification. More often, description
represents a critical component of initial assessments and continues throughout all of the later steps involved in speech-
language management. With such pervasiveness, description undoubtedly constitutes the major measurement task facing
clinicians.
The purposes served by description are varied. Descriptive measures are initially used to characterize the specific areas
of linguistic or communicative difficulty facing a child, the functional limitations those difficulties impose, and
increasingly the effects on the child’s social roles that are associated with the child’s language disorder (Goldstein &
Gierut, 1998). At the same time, descriptive measures can be used to help plan initial treatment strategies, choose
specific treatment goals, and provide the basis for later comparisons. During treatment, descriptive probes—especially of
untreated but related stimuli—and other descriptive measures are likely to provide some of the best evidence of
treatment effectiveness (Bain & Dollaghan, 1991; Olswang & Bain, 1994; Schmidt & Bjork, 1992) because they reflect
the extent to which generalization is occurring. In fact, much of the profession’s recent focus on measuring outcomes to
document the value of treatment (see Frattali, 1998) involves the development and use of descriptive measures.
Despite the ubiquity of descriptive measures (and perhaps because of it), the measurement challenges they present can be
overlooked, or at least underappreciated (Leonard, Prutting, Perozzi, & Berkley, 1978; McCauley, 1996; Minifie, Darley,
& Sherman, 1963). Illustrating a growing interest in those challenges, Secord (1992) devoted an entire book to
descriptive, nonstandardized language assessment. In an early chapter of that book, Damico, Secord, and Wiig (1992)
noted that effective descriptive assessment procedures need to be “as rigorous as norm-referenced tests” (p. 1). The
source of that rigor, however, is much less obvious than that associated with measures used for purposes of classification.
Much of the rigor associated with methods used in the identification of language impairment appears to reside in the
hands of others (e.g., test authors and publishers, individual researchers). In contrast, for descriptive measures, the
responsibility for rigor falls largely into the hands of the clinician. As Leonard (1996) observed, such measures are
“essentially experimental tasks”—often created by clinicians and sometimes borrowed directly from experimenters.
Fortunately, in creating and understanding such measures, the clinician has allies in the increasing number of clinician–
researchers in speech-language pathology and related fields who develop and share individual methods and reflections
on the measurement challenges they present. In this chapter, I try to pass along some of their insights and direct readers
to particularly helpful examples.
Special Considerations for Asking This Clinical Question
The process of description can sometimes use norm-referenced measurement. When profiles of performance are
examined to assess broader patterns of strengths and challenges within different areas of communication, standardized
norm-referenced measures can provide useful information (Olswang & Bain, 1991). This is especially true when
limitations due to test content and measurement error are taken into account (McCauley & Swisher, 1984; Salvia &
Ysseldyke, 1981).
Usually, however, the process of description makes use of criterion-referenced measurement. Such measurement can
function at several levels of detail—from more global categorizations of language function in different modalities to the
detailed description of a specific language or communication skill (e.g., frequency of use of a particular grammatical
morpheme or communicative intent in a given conversational context). Although such descriptions may not always fit
within a view of measurement as the assignment of numbers to behaviors, they fit within the broader view of behavioral
measurement as a simplification process or as information compression used to aid decision making (Barrow, 1992;
Morris, 1994). Thus, as with all cases of measurement, our central concern with validity remains (APA, AERA, &
NCME, 1985; Messick, 1989). However, validity is fostered through means that may superficially appear unrelated to
the psychometric concerns described for norm-referenced instruments. For example, rather than a study of criterion-
related validity using numerous participants and other norm-referenced measures, evidence for descriptive measures may
involve the collection of supporting qualitative and subjective data for a much smaller number of cases, or even a single
case. Because a principal value of such measures is their close tie to a specific construct, the user’s alertness to the nature
of a targeted construct and the degree to which a specific measure serves as an acceptable indicator of it rises in
importance from large to gargantuan proportions.
Damico et al. (1992) discussed three complex characteristics pivotal to effective descriptive assessment techniques:
authenticity, functionality, and richness of description. Authenticity is used to refer to three related concepts: linguistic
realism, ecological validity, and psychometric veracity. Linguistic realism involves the treatment of communication in
data collection and analysis as a complex and synergistic process with the sharing of meaning as its goal, whereas
ecological validity refers to the preservation of natural communicative contexts in assessment. The third concept,
psychometric veracity, encompasses the traditional concepts of reliability and validity as well as the clinical practicality
of the measures in terms such as time and required resources. Concerns regarding authenticity have led to the use of the
term authentic assessment to refer to assessments designed with authenticity as their paramount virtue (e.g., Schraeder,
Quinn, Stockman, & Miller, 1999).
The term functionality as used by Damico et al. (1992) relates to effectiveness, fluency, and appropriateness of conveyed
meaning. This criterion focuses not just on obtaining information about clients’ underlying competence but also about
their ability to put knowledge into play effectively to achieve communication goals. The criterion of richness of description, cited by those same authors, entails the use of assessment procedures designed to provide
detailed descriptions of communicative performance leading to explanatory hypotheses for detected communication
difficulties. This criterion, then, associates descriptive measures with the manipulation of variables in the environment
(materials used, identity of communication partner, etc.) that can be studied for their immediate effect on performance.
I urge readers to examine the original source (Damico et al., 1992) in order to get a deeper feel for the intricacies
involved in assessment that preserve those characteristics of communication that make communication what it is. I also
suggest, however, that the overarching point Damico and his colleagues were making is that descriptive measures of
communication need to be valid—they need to measure what they purport to measure. Specifically, to the very great
extent to which communication is embedded in social interaction, intended to share meaning, and constrained by the
physiological and social makeup of its users, its measurement must honor those properties or suffer the fate of reduced
validity. The work of Damico et al. and numerous others (e.g., Kovarsky, Duchan, & Maxwell, 1999; Lund & Duchan,
1983, 1993; Muma, 1998) is extremely valuable in calling attention to these special properties—an endeavor made all
the more necessary by the frequent equating of principles such as validity only with norm-referenced measurement.
Because of growing sensitivity to the demands for a widening range of descriptive measures, advice about construction
of such measures by clinicians themselves has become increasingly available (e.g., Miller & Paul, 1995; Vetter, 1988).
Providing a succinct foundation for these recommendations, Vetter outlined a systematic process for developing informal
assessment procedures. In an earlier publication on criterion-referenced measures (McCauley, 1996), I modified that
process somewhat and have modified it further in Fig. 10.1 through the addition of a step encouraging clinicians to seek
out existing probes for possible use or adaptation.
In the process outlined in Fig. 10.1, the crucial first step is the formulation of the specific clinical question. In questions
of description, the clinician is relatively unencumbered by the external, regulatory forces (e.g., state requirements) that
affect both the kinds of clinical questions that are asked and the methods used to answer them. However, that does little
to decrease, and may even increase, the clinical perspicacity required at this step. The multiple levels of WHO’s
classification systems (WHO, 1980, 1998) come into play in the complexity of this step. Recall that these levels (e.g.,
impairment, disorder, disability, and handicap in the 1980 version) consider the broader effects of health conditions and
the role that society plays in determining the implications of a given condition for the individual. These levels bring to
mind the challenge of describing a child’s communication in terms of effects on the child’s participation in social roles,
as well as in the specifics of lexicon, grammar, and so forth. Consequently, the clinician who wishes to describe a child’s
communication will need to choose selectively from a large number of possible levels and areas for which description is
possible. In so doing, the clinician can focus on a smaller number of clinical questions whose answers can have a
powerful impact on the child’s treatment and subsequent functioning.
The remaining steps in Vetter’s process entail tailoring the procedure to meet the demands of a specific clinical question
and client, implementing it, and then evaluating its effectiveness—ideally through the accumulation of data spanning several clients.

Fig. 10.1. Steps in the development of an informal measure.

The clinician’s reactions to this
evaluative step in the process can include changing instructions or the specific items used, increasing the number of
items used in order to increase reliability, or abandoning the procedure altogether.
Particularly when a measure lends itself to use with numerous children, additional steps such as a more rigorous
evaluation of reliability and the development of local
norms can be well worth the additional effort. Cirrin and Penner (1992) discussed how descriptive measures can be
implemented district-wide. Their recommendations make use of multiple stages to ensure feasibility and validity. Among
factors that they stressed are the need to (a) use pilot procedures with a small number of clinicians prior to widespread
use, (b) conduct initial training and follow-up sessions for all users, and (c) undertake a district-wide trial period. Cirrin
and Penner stressed the value of local norms as a means of improving eligibility decisions, but they also acknowledged
the heavy administrative demands this entails in terms of expertise and staff time. In so doing, they point to one of the
chief challenges of descriptive measures—making their construction and use fit within the sometimes harsh demands for
efficiency (especially time demands) facing most speech-language pathologists. However, it should always be
remembered that cutting corners in the name of efficiency can yield an incomplete picture of a child’s problem, one that in the long run produces far greater losses of time. This issue will receive additional attention
later in the chapter in the section entitled “Practical Considerations.”
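The core computation behind a local norm, locating a child’s raw score within a locally gathered distribution, can be sketched as a percentile rank. The sample scores below are hypothetical and serve only to illustrate the arithmetic; a real local norm would rest on a much larger, carefully sampled group.

```python
# Minimal sketch of a "local norm": the percentile rank of a child's raw
# score within a locally collected sample (all scores are hypothetical).

def percentile_rank(score, sample):
    """Percent of the local sample scoring below the child, counting ties
    as half below (a common convention for percentile ranks)."""
    below = sum(1 for s in sample if s < score)
    equal = sum(1 for s in sample if s == score)
    return 100.0 * (below + 0.5 * equal) / len(sample)

# Hypothetical district sample of raw scores on a locally developed probe
local_sample = [14, 18, 21, 22, 25, 25, 27, 30, 31, 35]

print(percentile_rank(22, local_sample))  # → 35.0
```

The half-credit treatment of tied scores keeps the rank stable when several children obtain the same raw score, a frequent occurrence with short probes.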
Available Tools
One reason that description can seem relatively perplexing from a measurement perspective is the diversity of available
tools and strategies. This diversity includes tools that are quite standardized, tools proposed informally in research or
clinical publications, and tools that the clinician may decide to develop on-demand to address a specific clinical question
for which no commercially developed alternative is available. Although not exhaustive, a relatively detailed list of
available types of such measures has been offered by Damico et al. (1992)—language sample analyses, probes, rating
scales, and on-line observations. Although one can broadly categorize the tools and strategies listed by Damico et al.
(1992) as norm-referenced and criterion-referenced measures falling at various levels of standardization, considering
them in greater detail seems warranted. Consequently, all of the categories described by Damico et al. as well as
standardized norm-referenced measures and standardized criterion-referenced measures are briefly discussed in this
section.
Two additional assessment strategies are also highlighted—dynamic assessment (Gutierrez-Clellen, Brown, Conboy, &
Robinson-Zañartu, 1998; Lidz, 1987; Lidz & Peña, 1996) and qualitative measures (Olswang & Bain, 1994; Schwartz &
Olswang, 1996). These techniques are singled out for special attention by virtue of their emerging status as innovative
approaches to description. Although dynamic assessment has received considerable attention in the professional
literature (Butler, 1997), the use of qualitative measures represents a refinement of clinical practice that has received less
direct critical attention.
1. Standardized Norm-Referenced Measures
Standardized norm-referenced measures are frequently used to characterize areas of greater or lesser deficit—a type of
description that involves what is sometimes termed profile analysis or discrepancy analysis. For example, many
clinicians make use of
the structure of available norm-referenced tests in which both receptive and expressive skills are examined to determine
the extent of problems in each area. Additionally, they may make use of subtest structure, when it is available, to further
refine a list of more specific strengths and challenges. For example, the clinician may note a child’s better performance
on receptive subtests with longer stimuli (e.g., listening to paragraphs) than on those with shorter stimuli (e.g., word
classes).
In chapter 9, problems in profile analysis were discussed in relation to using profiles in identification decisions (see the
section on conducting comparisons between scores). As a brief reprise, these problems relate to the difficulty in
distinguishing real differences between scores from those due to measurement error or to differences in normative
groups. In addition, when measures used in a profile are highly correlated, the comparison may offer little or no new
information (Olswang & Bain, 1994; Turner, 1988). Finally, even differences between tests or subtests that are real (i.e.,
are not due to error) and have occurred on measures of independent skills may not represent differences that are any
greater than those that may be observed in normal development (Berk, 1984; Olswang & Bain, 1991). The strategy of
simply distinguishing between age-appropriate and non-age-appropriate functioning seems a useful alternative to more
elaborate but problematic strategies of interpretation (McCauley & Swisher, 1984). This strategy consists of making
decisions about the adequacy of functioning in a given area independently, rather than in relation to function in other
areas.
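The measurement-error concern raised above can be made concrete. One conventional psychometric check asks whether the gap between two standard scores exceeds what error alone could produce, using the standard error of the difference. The sketch below uses hypothetical scores and SEM values, not figures from any particular test.

```python
# Sketch: is an observed difference between two standard scores larger than
# measurement error alone would predict? Uses the standard error of the
# difference, SE_diff = sqrt(SEM1^2 + SEM2^2). All values are hypothetical.

import math

def significant_difference(score1, score2, sem1, sem2, z=1.96):
    """True if |score1 - score2| exceeds z standard errors of the difference
    (z = 1.96 corresponds to a 95% confidence criterion)."""
    se_diff = math.sqrt(sem1 ** 2 + sem2 ** 2)
    return abs(score1 - score2) > z * se_diff

# Hypothetical receptive vs. expressive standard scores with SEMs of 4 and 5
print(significant_difference(92, 81, sem1=4, sem2=5))  # → False
print(significant_difference(92, 78, sem1=4, sem2=5))  # → True
```

Note that an 11-point gap fails the check here: a difference that looks clinically meaningful may still lie within the range expected from error in the two scores.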
Several difficulties in addition to those described in chapter 9 arise when norm-referenced tests are used to identify a
detailed set of strengths and challenges for purposes of description. One difficulty lies in the relatively small number of
content areas for which subtests are available. Looking only at those areas for which subtests do exist is quite akin to the
story of the intoxicated soul who looks under the lamppost for lost keys. Turning to non-norm-referenced measures
presents a logical, “sober” alternative in many such cases.
Even when a subtest contains items that seem perfectly relevant to a description of a child’s communication, however,
norm-referenced tests can also be used erroneously in efforts to provide detailed information. Treating individual items
or even subtests as reliable descriptors is likely to be erroneous in part because of the unreliability of small sample sizes
(i.e., the small amount of the child’s behavior that was sampled; McCauley & Swisher, 1984). In addition, because items
in such tests are usually selected more often because they discriminate between individuals than because of the specific
content they reflect, they can provide a spotty representation of the specific content area (McCauley & Swisher, 1984).
In addition to their use in profiles, norm-referenced tests are used in out-of-level testing, the practice of using a test that
may not be appropriate for a client of a given age to sample a set of behaviors. This descriptive information is intended
to help define what an individual does and does not do in response to a standard task and set of stimuli. Although this
practice is probably most frequently used with individuals with mental retardation, it can be used at any time when more
appropriate measures are wanting (Berk, 1984). When used in this way, the measure is treated as if it were criterion-
referenced, with the sampling of content becoming critically important to its value. The problem of small,
unrepresentative samples of behavior described earlier will require cautious interpretation or, more probably, a search for
a more appropriate tool.
2. Standardized Criterion-Referenced Measures
Criterion-referenced measures have traditionally been applauded for their descriptive powers. After all, they are
generally constructed to enable a description of an individual’s knowledge base, rather than to facilitate comparisons
between individuals. However, there are relatively few criterion-referenced measures of communication that demonstrate
the same degree of standardization seen in norm-referenced tests. Because criterion-referenced measures require more
comprehensive coverage of smaller content areas, the demand for any single measure may not be sufficient to support
more extensive development. Recalling also that interest in the measurement community has only lately turned to
criterion referencing, it is easy to understand why informal criterion-referenced measures abound. Nonetheless, a few
more elaborately developed criterion-referenced measures exist, and many are in various stages of development.
Specific types of procedures used to collect data for criterion-referenced interpretation vary significantly and include
each of the measurement types discussed in the remainder of this chapter. The decision to highlight standardized
criterion-referenced measures in this separate section was based on a desire to emphasize the potential value of
strengthening such measures through the additional empirical scrutiny that accompanies their formal development.
Table 10.1 provides several examples of criterion-referenced measures encompassing diverse communication domains
and modalities. They vary in the extent to which they have been standardized. However, at a minimum they demonstrate
several of the hallmarks of standardization for a criterion-referenced instrument: development of guidelines for
appropriate use, administration procedures, scoring procedures, and method of interpretation.
3. Probes
Probes involve the use of structured tasks or contexts intended to elicit a given behavior (Damico et al., 1992). Although
that definition can also apply to the contents of standardized measures, the term probe is more typically reserved for
more informal measures. Elicitation greatly increases the probability of obtaining information about a given behavior
within a given time span, particularly for those behaviors that occur less frequently in natural conversation. However,
elicitation procedures represent potential intrusions on the naturalness of the elicited behavior. This potential means that,
insofar as naturalness is a major concern in description, their use should primarily be limited to behaviors that occur only
rarely without elicitation. In addition, special care should be taken during their construction to preserve the authenticity
of the communication exchange in which they are embedded. When such care is seen as impractical, the resulting data
more closely resemble a standardized test in miniature than a descriptive procedure meeting the more intense demands
for naturalness of context desirable for this type of measurement question. Data obtained from probes are frequently
evaluated by the clinician in terms of number or percentage correct. (See the discussion of observational codes under On-
Line Observations later in this chapter.)
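When probe data are summarized as number or percentage correct, the arithmetic is simple but worth making explicit, particularly when responses are tallied separately by elicitation context. The responses and context labels below are hypothetical.

```python
# Minimal sketch: summarizing probe responses as percent correct by
# elicitation context (hypothetical data and context labels).

def percent_correct(responses):
    """responses: list of booleans, True = correct production.
    Returns None for an empty list rather than dividing by zero."""
    if not responses:
        return None
    return 100.0 * sum(responses) / len(responses)

probe_data = {
    "single words": [True, True, False, True, True, False, True, True],
    "short phrases": [True, False, False, True, True, False],
    "story retelling": [False, False, True, False, True],
}

for context, responses in probe_data.items():
    pc = percent_correct(responses)
    print(f"{context}: {sum(responses)}/{len(responses)} correct ({pc:.0f}%)")
```

Keeping the raw counts alongside the percentage guards against over-interpreting small samples: 2 of 5 correct and 20 of 50 correct both yield 40%, but they warrant very different confidence.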
Table 10.1
A List of Some Criterion-Referenced Measures Available for the Description of Language Disorders in Children

Foster, R., Giddan, J. J., & Stark, J. (1983). Assessment of Children’s Language Comprehension. Palo Alto, CA: Consulting Psychologists Press. Ages: 3 years to 6 years, 11 months. Modality: receptive. Areas: semantics, syntax.

Miller, J. F., & Yoder, D. E. (1984). Miller–Yoder Language Comprehension Test. Austin, TX: Pro-Ed. Ages: 4 to 8 years. Modality: receptive. Areas: morphology, syntax.

Blank, M., Rose, S. A., & Berlin, L. J. (1978). Preschool Language Assessment Instrument (PLAI). San Antonio, TX: Psychological Corporation. Ages: 2 years, 9 months to 5 years, 8 months. Modality: receptive and expressive. Areas: semantics, pragmatics.

Bzoch, K. R., & League, R. (1991). Receptive–Expressive Emergent Language Test–2. Austin, TX: Pro-Ed. Ages: 0 to 3 years. Modality: receptive and expressive. Areas: semantics, morphology, syntax, pragmatics.

Wiig, E. (1990). Wiig Criterion-Referenced Inventory of Language. San Antonio, TX: Psychological Corporation. Ages: 4 to 13 years. Modality: expressive. Areas: semantics, morphology, syntax, pragmatics.
An extended example in which measures varying in naturalness are described may help readers see the trade-offs
between naturalness, efficiency, and the clinician’s control of variables affecting performance. The procedures in this
example derive from attempts to examine phonological performance on a single sound or sound pattern in some detail
and over time. The first part of this example was created in 1967, when Elbert, Shelton, and Arndt developed the Sound
Production Task (SPT). In that task, the client imitated the production of 30 to 60 items containing a particular target
sound. Some items on the SPT consisted of nonsense syllables, others of single words, and others of short phrases
containing the sound. The SPT was designed to obtain relatively large numbers of observations in varying phonetic and
linguistic contexts, while avoiding repeated, inappropriate use of entire norm-referenced tests or items from them.
In a study of patterns of acquisition for /s/ and /r/ in treatment, Diedrich and Bangert (1980) used the SPT, but they also
devised a less reactive procedure, that is, one that was more covert in terms of its focus and thus less apt to elicit
uncharacteristically careful speech from the tested child. For this second procedure, called the Talking Task (TT), the
clinician engaged in a 3-minute conversation with the child and covertly noted the number of correct productions out of
those attempted. Although the TT represented an interesting innovation, it left the clinician at the mercy of chance, in
that infrequently occurring sounds might occur only a few times during the 3-minute sample—something that might be
addressed by defining the sample length in terms of a certain number of attempts, rather than in terms of time.
In 1981, Secord developed a set of tasks, the Clinical Probes of Articulation Consistency (C-PAC), recently replaced by
the Secord Contextual Articulation Tests (S-CAT; Secord & Shine, 1997), which bears some relationship to each of these
previous two tasks. In the S-CAT, probes for consonantal /r/ and vocalic /r/ are elicited in prevocalic and postvocalic
positions, as well as in clusters—in imitations of single words, short phrases, and sentences as well as in delayed
retellings of a story containing many words with the target sound. Thus, this set of probes can efficiently help the
clinician consider the possible effects of linguistic complexity (single word, sentence, narrative contexts) and phonetic
context (postvocalic, prevocalic, and cluster contexts). However, naturalness is somewhat reduced in a story-retelling
format and is reduced still further in imitation. These kinds of trade-offs abound in the construction of probes, making
the sharing of successful creations with colleagues a substantial and time-saving contribution.
In a book entitled Assessing children’s syntax (McDaniel, McKee, & Cairns, 1996), a variety of elicitation strategies for
both comprehension and production are discussed in detail by researchers who have considerable experience in their
application. Table 10.2 lists a number of these elicitation strategies. The descriptions of these strategies reveal the
common techniques available to both professional test authors and clinicians wishing to construct a syntactic probe for a
particular client.
Table 10.2
Elicitation Strategies for Assessing the Comprehension and Production of Syntax in Children

Production

Elicited imitation (Lust, Flynn, & Foley, 1996). The child is asked to repeat an utterance (usually a single sentence) exactly as produced by an adult. It is assumed that only structures reflecting the child’s grammatical competence will be produced. An easy technique, even for children as young as 1 or 2. Strengths: Stimuli can be chosen very precisely, so the clinician “knows” what the child is attempting to say; studies show good agreement with comprehension and other data; the technique is applicable, with small changes, for children from a wide range of cultures and languages and can be used at relatively low developmental levels. Weaknesses: Stimulus design is complex because of the need to control variables that are not of direct interest (e.g., cognitive demand, attention, grammatical complexity, sentence length); the technique has been criticized for relying unduly on short-term memory.

Elicited production (Thornton, 1996). Situations are created to increase the likelihood that the child will attempt to produce a given structure, usually including a “lead-in” sentence produced by the adult to “provide the context and ‘ingredients’ for production of the structure without modeling it.” Sometimes this technique makes use of a puppet who can be asked questions, directed to do things, or corrected. Typically used with normally developing children 3 years and older. Strengths: Generation of the targeted structure rests more entirely with the child and is unlikely to be due to chance; a large number of such probes have been described in the research literature. Weaknesses: The child’s enjoyment level is key to the success of the strategy because he or she needs to be an active participant; the awkwardness associated with a “no response” may be intensified relative to other methods and may make children less willing to continue; working out the details required to elicit production may require considerable piloting with adults or normally developing children; similarly, correct productions are far more straightforwardly interpreted than incorrect or untargeted productions.

Comprehension

Intermodal preferential looking (Hirsh-Pasek & Golinkoff, 1996). The child is seated on a parent’s lap, hears a stimulus, and is then presented simultaneously with two novel video images—one matching and the other not matching what has been said. Greater time spent watching the matching video is expected for comprehended structures. Used for children between 12 months and 4 years of age. Strengths: Minimal action is required; the use of videos allows the presentation of dynamic relationships; the task can be used at lower developmental levels than many other tasks. Weaknesses: Considerable time and expertise are required to create the video stimuli; only a few stimuli can be studied at any point in time.

Picture selection task (Gerken & Shady, 1996). The child hears the adult or a recorded voice presenting a verbal stimulus and then points to one of two to four pictures. Typically useful with normally developing children 20 to 24 months and older. Strengths: This technique has been widely used to assess understanding or grammaticality of specific phonological distinctions, lexical comprehension, or comprehension of specific morphosyntactic structures; it tends to produce results comparable to object selection where either task is feasible. Weaknesses: Considerable time can be required to produce comparable target and foil items; although the use of tape-recorded or synthetic speech can help increase children’s attention, it increases the complexity of task construction; failures to respond are difficult to interpret.

Acting-out tasks (Goodluck, 1996). The child is asked to use provided props to act out a sentence that is read aloud or played back from tape. Typically used for children older than 3 years. Strengths: The task has a long history of use and is easy and inexpensive to use; it can be fun for the child and can be particularly effective in assessing understanding of anaphora and pronominalization; it is a relatively open-ended task that may be less sensitive to response bias than many others, although it may be associated with a tendency to use a prop repeatedly once it has been picked up. Weaknesses: It cannot be used with constructions or predicates that are difficult to act out and can be associated with responses that are difficult to interpret; because of the cognitive complexity of the task, it typically is used with normally developing children older than 3 years, thus limiting its use with children with language difficulties.

Informal probes have also been developed to examine pragmatic skills—an area in which there is a dearth of standardized measures (Lund & Duchan, 1993). For example, Lucas, Weiss, and Hall (1993) described the development of a probe designed to examine the extent to which children with communication disorders are sufficiently informative in their utterances as they participate in a role-playing game. The child is assigned the role of “warehouse
manager” and is approached by the clinician, playing a “toy buyer,” who asks where different toys might be found in the
warehouse. In a similar vein, Roeper, de Villiers, and de Villiers (1999) recently described their ongoing efforts to
design an extensive number of probes for assessing important interacting knowledge in pragmatics, semantics, and
syntax for 5-year-olds—for example, the need to know specific semantic and syntactic forms to achieve particular
pragmatic functions. Elaborately developed in terms of the materials, instructions, and scoring procedures, both the
probes developed by Lucas et al. and those developed by Roeper et al. illustrate that a measure’s formality is better
conceived of as a continuum than a dichotomy. Further, the thorough description of the probes offered by Lucas et al.
illustrates the extent to which sharing the results of well-developed probes can increase the efficiency of clinicians’ efforts.
Professional journals and a growing number of books on language development and disorders describe numerous clinical
and research probes (e.g., Brinton & Fujiki, 1992; Lund & Duchan, 1993; Miller, 1981; Miller & Paul, 1995; Simon,
1984). Table 10.3 showcases a modest sample of these probes for children across a wide range of ages and
developmental levels. It is offered to help provide a feel for the heterogeneity and considerable potential of such
measures.
4. Rating Scales
Rating scales involve the assignment of numerals or labels to an individual’s behavior in a particular context. Rating scales are typically completed by the clinician or other observer after the observation of individual communication events. At times, such scales can be used to help observers summarize their impressions across multiple observations. Rating scales differ from on-line observations, another type of descriptive measure, in that on-line judgments are made during rather than after the actual communicative event.
Rating scales have a lengthy history in psychology and speech-language pathology (e.g., see Schiavetti, 1992), but
primarily in research rather than clinical settings (e.g., Burroughs & Tomblin, 1990; Campbell & Dollaghan, 1992).
However, increasing attention to the documentation of children’s functional limitations (Goldstein & Gierut, 1998) may
cause rating scales to be used with greater frequency in the future.
Two types of rating scales that have been most influential in speech-language pathology are interval scaling and direct
magnitude estimation (Campbell & Dollaghan, 1992; Schiavetti, 1992). These rating scales are usually used to compare
a large number of stimulus examples—something that is not always done with rating scales. When interval scaling is
used, the rater assigns each characteristic or behavior being rated to a linearly partitioned continuum, which is marked
off using numerals or descriptive labels. Thus, for example, a rater might be asked to rate a behavior on a continuum
from uncommon to most common, using a 6- or 7-point scale that might look something like this:

uncommon   1    2    3    4    5    6    7   most common

or a version in which each scale point is marked with a descriptive label rather than a numeral.
Table 10.3
A Sample of Probes Used in the Description of Children’s Language

Comprehension of action words (Miller & Paul, 1995). Ages 12 to 24 months. The child is asked to perform actions, on familiar objects and people, that the child’s parent(s) believe he or she may understand. Unconventional actions may be requested to help distinguish actions unconnected to the request from intentional responses.

Bellugi’s negation test (Miller, 1981). The child is asked to provide the negative of an utterance produced by an adult. Variations can include different auxiliaries, negatives with indefinites, imperatives, and multipropositional sentences.

Production of question forms (Lund & Duchan, 1993). The Messenger Game: the child is asked to get information from a third party, ideally one who is out of view. For example, “Ask her how she got to this school.”

Comprehension of nonliteral meaning (Lund & Duchan, 1993). Early adolescence. Joke explanations: the child is asked to explain a joke that he or she finds humorous.

Comprehension of classroom direction vocabulary (Miller & Paul, 1995). Ages 6 to 12 years. Classroom directions and vocabulary thought to be difficult for the child are incorporated in instructions that the child must follow using paper and pencil.

Production of sequential description (Simon, 1984). Middle and high school students. Description of using a pay phone: the child is shown a picture of a pay phone and asked to give a step-by-step description of how it is used.

When direct magnitude estimation is used, the rater is asked to rate each characteristic or behavior either as a proportion of a standard stimulus provided as part of the rating system or as a proportion of other rated stimuli. Thus, for example, Campbell and Dollaghan (1992) described a method in which no standard stimulus is provided. In their study, listeners were instructed to assign any number of their choice to the first of 36 speech samples they were asked to rate. Later samples were then rated subjectively on the bases of (a) their proportional informativeness relative to the other judgments made in the sample and (b) the understanding that higher numbers were to be associated with greater informativeness than lower numbers.
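One common way to place such free-modulus ratings on a shared scale across listeners is to divide each listener’s ratings by that listener’s own geometric mean. This is a general technique from magnitude-estimation work, offered here as an illustration rather than as the exact analysis Campbell and Dollaghan used; the ratings are hypothetical.

```python
# Sketch: normalizing free-modulus magnitude estimates so that listeners
# who chose large or small starting numbers become comparable.

import math

def normalize(ratings):
    """Divide each rating by the rater's geometric mean, preserving the
    ratios among a rater's judgments while removing the arbitrary scale."""
    gm = math.exp(sum(math.log(r) for r in ratings) / len(ratings))
    return [r / gm for r in ratings]

# Hypothetical informativeness ratings of the same 4 samples by 2 listeners
listener_a = [50, 100, 25, 200]  # chose large starting numbers
listener_b = [2, 4, 1, 8]        # chose small starting numbers

print(normalize(listener_a))
print(normalize(listener_b))  # same relative pattern, so same values
```

Because only the ratios among a listener’s judgments carry information in this procedure, the two listeners above, whose ratings differ by a constant factor of 25, yield identical normalized profiles.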
The Observational Rating Scales that are included as part of the third edition of the Clinical Evaluation of Language
Fundamentals (Semel, Wiig, & Secord, 1996) provide an example of how a rating scale can be used to enrich the
clinician’s understanding of the school-age child and his or her communication environment. They are mentioned here
because of the relative dearth of such scales for school-age children, although they are becoming more common—for
example, the Functional Status Measures (Educational Settings) of the Pediatric Treatment Outcomes Form (ASHA,
1995) and the Teacher Assessment of Student Communicative Competence (Smith, McCauley, & Guitar, in press). In
addition, the Observational Rating Scales are of particular interest because of their novel inclusion of parallel rating forms, so that comparable information can be obtained from the child and his or her parent(s) and teacher(s). They represent an
example of the interval scaling method, one in which individuals are asked to respond in a summative fashion to past
observations.
Each scale of the Observational Rating Scales consists of 40 items addressing “troubles” facing the child in listening (9
items), speaking (19 items), reading (6 items), and writing (6 items). To illustrate the nature of these items, let me
indicate that the first listening item is “I have trouble paying attention” for the student version (often completed with the
speech-language pathologist); “My child has trouble paying attention” for the parent version; and “The student has
trouble paying attention” for the teacher version. Each item is rated as occurring never, sometimes, often, or always, with
DK (Don’t know) used to mark items for which the rater feels unable to pass judgment. The Observational Rating Scales
also describe procedures for the observers to identify and provide examples of their top five concerns, thus paving the
way for functionally oriented intervention planning.
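A clinician compiling such ratings might tally them along the following lines. This sketch is not part of the published scoring procedure; the domain labels, item wordings, and the decision to count "often"/"always" items as candidate concerns are illustrative assumptions only:

```python
from collections import Counter

def tally_concerns(responses):
    """responses: dict mapping (domain, item) to one of
    'never', 'sometimes', 'often', 'always', or 'DK'.
    Counts 'often'/'always' items per domain as candidate concerns;
    'DK' (rater unable to judge) and low-frequency ratings are skipped."""
    concerns = Counter()
    for (domain, _item), rating in responses.items():
        if rating in ("often", "always"):
            concerns[domain] += 1
    return dict(concerns)

ratings = {
    ("listening", "trouble paying attention"): "often",
    ("listening", "trouble following directions"): "sometimes",
    ("speaking", "trouble finding the right word"): "always",
    ("reading", "trouble with new words"): "DK",
}
print(tally_concerns(ratings))  # {'listening': 1, 'speaking': 1}
```

A summary of this kind could feed directly into the "top five concerns" step described above.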
The chief appeals of rating scales are the apparent ease with which they can be created and administered, as well as their
wide applicability (Pedhazur & Schmelkin, 1991; Salvia & Ysseldyke, 1998). These virtues, however, may mask their
susceptibility to a number of problems, especially ones stemming from poorly defined points along an interval scale and
from differences introduced by different raters. In a brief review of such measurement issues facing rating scales,
Pedhazur and Schmelkin (1991) concluded that ratings may often “tell more about the raters than about the objects they
rate” (p. 121). They cited a rich literature in which the perceptual aspects of the rating task make raters vulnerable to a
number of types of bias. Two common types of bias include halo effects, in which raters allow impressions of general
characteristics or previous knowledge to have a consistent effect on ratings, and leniency effects, in which overly
positive judgments appear to occur because the rater is familiar with the person whose characteristics are being rated
(Primavera, Allison, & Alfonso, 1996).
An additional challenge to valid use of rating scales lies in the need to achieve a successful fit between the nature of the
characteristic being rated and the type of scaling method used to rate it (Campbell & Dollaghan, 1992; Schiavetti, 1992). In particular, researchers have noted a
difference in what kind of scale is appropriate depending on whether the rated characteristic falls along a metathetic
versus a prothetic continuum. On a metathetic continuum, raters’ responses to differences between rated entities seem to
reflect qualitative distinctions; whereas on a prothetic continuum, raters’ responses to differences between rated entities
appear to reflect quantitative distinctions (Stevens, 1975). The classic contrastive pair illustrating these two types of
continuum are pitch and loudness. Without looking ahead to the next paragraph, can you anticipate which of those two
characteristics of sound is prothetic (i.e., characterized by quantitative rather than qualitative differences)?
If you decided that loudness was prothetic, you are in agreement with a large body of research suggesting that people tend to treat judgments such as loudness as if they were judgments about whether a stimulus had “more” or “less” of something (Stevens, 1975). In contrast, pitch differences tend to be judged as if they represent qualitatively different stimuli. The challenge in devising appropriate rating scales is that whereas direct magnitude estimation can validly be used to measure either type of characteristic, interval scaling appears to be valid only for measuring characteristics that are metathetic.
Campbell and Dollaghan (1992) suggested that because of the lack of research determining which language
characteristics are metathetic versus prothetic, direct magnitude estimation is a less risky choice for researchers and
clinicians who wish to use rating scales in their descriptions of children’s language disorders. They noted that direct
magnitude estimation can be used to provide a comparison of children’s spontaneously produced language against that of
their peers. Among the most important uses they saw for such judgments were the examination of change occurring as
result of or in the absence of treatment. In particular, Campbell and Dollaghan described a method in which 10 to 15
listeners could be used to provide ratings with a stable percentage of variability.
Specifically, Campbell and Dollaghan (1992) had 13 listeners compare the informativeness—“amount of verbal
information conveyed by a speaker during a specified period of spontaneous language production” (p. 50)—achieved by
three children who had sustained severe brain injury with three age-matched controls, when both sets of children were
engaged in a video-narration task (Dollaghan, Campbell, & Tomlin, 1990). (Recall that the particulars of the direct magnitude estimation method involved in this study were described earlier in the chapter when that rating method was introduced.) The use of this technique provided social validation of the recovery patterns shown by the three children with brain injury who participated in the study. The relatively large number of raters required for direct magnitude estimation may preclude its use in many clinical situations. However, it may prove valuable as a means of validating more efficient methods of social validation. In addition, it may prove valuable as a method that can provide exactly the information required in certain clinical situations. For example, it might be used as described by Campbell and Dollaghan to support a relatively costly or lengthy treatment approach for a given child or group of similar children.
Not surprisingly, then, it appears that the use of rating scales as a descriptive measurement tool, like others discussed in
this section, has a greater complexity than might at first be apparent. Thus, wise users will require as much evidence regarding validity as possible for specific methods
prior to deciding to implement them clinically. Further evidence of their promise should prompt users to want to
participate in providing such evidence.
5. Language Analysis
Language sampling and analysis have enjoyed a long history of use in studies of children’s language acquisition (e.g.,
Brown, 1973; Miller, 1981; Templin, 1957). The variety of procedures recommended for elicitation of language samples
and for the derivation of measures based on them has grown appreciably over the past 40 years and has changed as
understandings of the nature of language impairments have changed (Evans, 1996a; Gavin, Klee, & Membrino, 1993;
Miller, 1996; Stromswold, 1996).
In a study of some 253 American speech-language pathologists who work with preschool children, Kemp and Klee
(1997) found that 85% of them used language analysis in their practice, with most preferring nonstandardized forms to
formal procedures. Language analyses are sometimes avoided by clinicians who report that they do not have the time to
incorporate them into practice or that they lack the computer resources that would make their use more time efficient
(Kemp & Klee, 1997). However, these objections are rapidly being addressed by the refinement and proliferation of
computerized analysis programs (Long, 1999). Innovations such as transcription laboratories staffed by nonprofessional
transcribers, the creation of databases reporting findings for large numbers of children, and the availability of analysis
procedures at no cost also point to greater practicality of language analysis in the future (Evans & Miller, 1999; Miller,
1996; Long, personal communication, January 7, 2000; Miller, Freiberg, Rolland, & Reeves, 1992).
Among the numerous discussions extolling the virtues of language sampling and analysis, Evans and Miller (1999)
offered one that is particularly powerful:
The language sample, by contrast [with available standardized tools], represents the child’s integration of specific
intervention goals within the larger communication context and provides clinicians with an opportunity to assess
children’s language skills dynamically across a range of situations that vary in communicative demand (e.g., free-play,
interview, narration, picture description). Language samples can be collected as often as necessary without performance
bias, and changes in children’s abilities can be documented across a wide range of linguistic levels. (Evans & Miller,
1999, pp. 101–102)
Additionally, such analyses can not only examine many aspects of language but also reveal how complexity in one area may affect another—a theme of growing interest in the evolution of language assessment tools.
Although language analyses are typically used to assess aspects of expressive communication, they are also frequently
used as a means of examining receptive skills. In particular, it seems that children’s responses to the directions and
comments of their conversational partners provide data that are valued by many clinicians (Beck, 1996). In the next
section, the evolution
of language sampling and analysis is described to help readers understand the variety of available measures and how
these measures have changed over time.
The Evolution of Language Analyses
Evans (1996a) reviewed the changes in emphasis in language sampling techniques that have accompanied changes in
theoretical perspectives on language development and language disorders. In particular, she discussed the influence of
three dominant research paradigms spanning the past half-century: (a) the behaviorist learning paradigm, (b) the
formalist competence-based paradigm (encompassing “generative syntax, generative semantics, and a narrow
interpretation of syntax,” Evans, 1996a, p. 208), and (c) the functionalist paradigm. A brief summary of her comments is
relevant to anyone using language analysis because so many of the measures associated with earlier paradigms remain
available and in widespread use—sometimes in revised versions and sometimes in their original form (Kemp & Klee,
1997).
In the heyday of the behaviorist learning paradigm, the role of the environment in learning and the word as the unit of
analysis were emphasized. Language acquisition was understood to occur through the reinforcement of correct use of
words and sentences (word sequences). Although standardized language tests (e.g., the Peabody Picture Vocabulary
Test, Illinois Test of Psycholinguistic Abilities) dominated language assessment methods during this period, language
analysis techniques were used as well and emphasized counts or descriptions of different verbal behaviors (e.g., type–
token ratio, measures of sentence length).
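Two of those early counts are still easy to state precisely. As an illustrative sketch (the toy sample is invented), type–token ratio and a simple length-in-words measure can be computed as:

```python
def type_token_ratio(words):
    """Number of different words (types) divided by total words (tokens)."""
    return len(set(words)) / len(words)

def mean_utterance_length_in_words(utterances):
    """Mean utterance length counted in words (a simpler relative of MLU)."""
    return sum(len(u.split()) for u in utterances) / len(utterances)

sample = ["the dog ran", "the dog barked", "he barked again"]
tokens = [w for u in sample for w in u.split()]
print(round(type_token_ratio(tokens), 2))     # 6 types / 9 tokens = 0.67
print(mean_utterance_length_in_words(sample)) # 3.0
```

Note that type–token ratio is sensitive to sample size, which is one reason later sections stress the length of the sample on which such counts are based.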
The second paradigm discussed by Evans (1996a), the formalist competence-based paradigm, was designed to address
the generativity of children’s language, that is, the use of novel and therefore unmodeled and presumably unreinforced
utterances (e.g., overregularization of past tense, as in “he goed.”). As Evans notes, this paradigm was made possible by
linguistic theory of the day (particularly the work of Chomsky), in which a major goal of linguists became the
identification of language-independent competencies, termed linguistic universals. Such universals were thought to
suggest features of languages and linguistic structure that were likely to occur in all languages.
Evans (1996a) suggested that initial orientations within the formalist paradigm were largely syntactic in nature and
proceeded on the assumption that domains of language—syntax, semantics, and so forth—could be viewed
independently. An assumption was also made that variability in performance was more likely to be a function of a
child’s knowledge than a function of contextual factors. According to Evans’s account, later developments in this
paradigm, fueled by theory and data from a variety of sources, shifted the focus somewhat—first to semantics, then to
pragmatics. Evans pointed out that language analyses associated with the formalist period similarly shifted, although
sometimes subtly, from largely syntactic measures (e.g., Developmental Sentence Scoring, DSS; Language Sampling,
Analysis, and Training, LSAT; and Language Assessment Remediation and Screening Procedure, LARSP) to measures
focusing on semantics (e.g., mean length of utterance in morphemes, MLUm) and, later, on pragmatics (e.g., Roth &
Spekman, 1984).
Evans (1996a) noted that, throughout this period, the child’s task in language acquisition was largely seen as that of
acquiring competence in the underlying rules of the ambient language. Predictably, then, childhood language disorders
within this paradigm were seen as difficulties in acquiring the rules of the individual subsystems of language. In Evans’s
view, language assessments have thus grown through accretion to require elaborate analyses across semantics, syntax,
and pragmatics—a process that has been made more feasible through modern technology. Among the analyses she
associates with this period are the Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1982, 1998)
and the Child Language Analysis programs (CLAN; MacWhinney, 1991).
Evans (1996a) suggested that functional theories, the last of the three paradigms, were prompted by difficulties in
accounting for children’s variability across contexts. If rule acquisition is what is taking place, then a form evidencing
that rule should either be present or absent in a child’s productions—not present in some situations but not others, or with some conversational partners but not others. The functionalist paradigm is reflected in works such as Bates and
MacWhinney (1989). According to Evans, it is based on the following premise:
Variability in speaker performance is simply the final solution to the interaction among the internal state of a complex
system (i.e., the underlying speaker competence), the structure of the system (e.g., word order, lexical items,
morphonology, suprasegmentals), and the impact of external constraints such as real-time language processing demands.
(Evans, 1996a, p. 254)
Within the functionalist paradigm, then, variability becomes a major source of information about the current state of a
child’s dynamic system (linguistic and nonlinguistic) as it responds to external conditions (e.g., situational or attentional
factors). Increased variability is seen as an opportunity for positive change. In addition, this paradigm emphasizes the
necessity of examining the interplay of language domains, an area identified by numerous authors as among the most
exciting challenges facing clinicians this decade (Howard, Hartley, & Muller, 1995).
Evans (1996b) provided an example of such interactions when she found fewer morphosyntactic omissions in the speech
of children with SLI when their utterances occurred within a conversational turn rather than adjacent to a shift in
conversational turn. Numerous studies beyond those just cited (e.g., Crystal, 1987; Panagos & Prelock, 1982; Paul &
Shriberg, 1982) argued that rich and powerful understandings of children’s speech and language development emerge
from the kinds of detailed analyses called for by current theory.
Certainly one of the major advantages of language sampling, then, is the variety of questions to which the resulting
sample can be put. For example, Dollaghan and Campbell (1992) described a taxonomy of within-utterance disruptions
arising from language rather than fluency disorders to help characterize the subtle deficits lying across language domains
that plague young speakers with language disorders, both developmental and acquired.
Table 10.4 lists some of the standardized measures currently used to describe children’s language skills based on language samples. In this table, a variety of information about the procedure and the children for whom it would be useful is provided. In addition, those procedures that are available on computer are indicated. Recently, one of these computerized programs, CP (Long, Fey, & Channell, 1998), has been made available without charge at the following Internet website: http://www.cwru.edu/artsci/cosi/cp.htm (Long, personal communication, January 7, 2000).

Table 10.4
Tools Available for Detailed Analyses of Language Samples
(Evans, 1996a; Long, 1999; Owens, 1998)

Procedure                                                             Content of Analyses             Computerized?
Assigning structural stage (Miller, 1981)                             Morphology, Syntax
Communication analyzer (Finnerty, 1991)                               Morphology, Syntax              √
Computerized Language Assessment, Remediation, and
  Screening Procedure (LARSP; Bishop, 1985)                           Morphology, Syntax              √
Computerized profiling versions 6.2 and 1.0 (Long & Fey, 1989)        Morphology, Syntax, Narrative   √
Computerized language analysis (CLAN; MacWhinney, 1991)               Morphology, Syntax, Narrative   √
Computerized language error analysis report (CLEAR;
  Baker-van den Goorbergh, 1990)                                      Morphology, Syntax, Pragmatics  √
Computerized profiling (CP; Long, Fey, & Channell, 1998)              Morphology, Syntax              √
Developmental sentence scoring (DSS) computer program
  (Hixson, 1985)                                                      Morphology, Syntax
Content, form, and use analysis (Lahey, 1988)                         Semantics, Morphology, Syntax,
                                                                      and Pragmatics
Index of Productive Syntax (IPSyn; Scarborough, 1990)                 Morphology, Syntax              √
Language assessment, remediation, and screening procedure
  (LARSP; Crystal, Fletcher, & Garman, 1989)                          Morphology, Syntax
Language sampling, analysis, and training (LSAT; Tyack &
  Gottsleben, 1974)                                                   Morphology, Syntax
Lingquest (Mordecai, Palin, & Palmer, 1985)                           Morphology, Syntax              √
Parrot early language sample analysis (PELSA; Weiner, 1988)           Morphology, Syntax              √
Profile in semantics—grammar (PRISM-G; Crystal, 1982)                 Semantics
Profile in semantics—lexicon (PRISM-L; Crystal, 1982)                 Morphology
Pye analysis of language (PAL; Pye, 1987)                             Morphology, Syntax              √
Systematic analysis of language transcripts (SALT; Miller &
  Chapman, 1998)                                                      Morphology, Syntax, Narrative   √
Readers are reminded that computerized measures should be viewed hopefully (Long, 1991, 1999; Long & Masterson,
1993), but with caution as well (Cochran &
Masterson, 1995). After all, computers render it possible to conduct language analyses that would be prohibitively time-
consuming if performed by hand, but they also make it possible to make really silly or wrong-headed mistakes more
quickly than ever—for example, to use the wrong analysis for a particular child. The user of such measures must
exercise as much caution as ever in selecting the specific sample to be used as input and in “buying into” the specific
techniques used. Further, one should recognize that although language samples are “natural’’ in the sense that they are
often not consciously structured by the clinician, they are nonetheless subject to the same contextual effects that affect
norm-referenced test performance (Plante, February 18, 2000, personal communication). A growing literature on the
subject of language analyses can help clinicians determine what is available and likely to be useful for their clients
(Cochran, & Masterson, 1995; Long, 1991, 1999; Long & Masterson, 1993).
Although a detailed account of even a single analysis tool is beyond the scope of this book, a summary of some recent
research may help the reader see the wealth of information obtainable through language analysis. Table 10.5 lists some
patterns of disordered language performance that can be described using the SALT (Miller, 1996). Miller and Klee
(1995) used these categories to characterize problems of 256 children from ages 2 years, 9 months to 13 years, 8 months.
The data were based on conversational and narrative samples, contexts that were selected because of the wealth of
research on the former and the important connection to literacy of the latter (Miller, 1996). Miller and Klee (1995) found
significant numbers of children at varying ages falling in one or more categories, with only 20 children not described by
any category.
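The hypo- and hyper-verbal rate categories in this typology rest on simple rate calculations that can be sketched directly. The transcript, timing, and flagging decisions below are invented for illustration, not drawn from Miller's procedures:

```python
def verbal_rates(utterances, minutes):
    """Return (utterances per minute, words per minute) for a timed sample."""
    total_words = sum(len(u.split()) for u in utterances)
    return len(utterances) / minutes, total_words / minutes

transcript = ["I want the ball", "no", "give it"]
upm, wpm = verbal_rates(transcript, minutes=2.0)
print(upm, wpm)  # 1.5 utterances/min, 3.5 words/min
```

In practice, such rates would be compared against age expectations before labeling a child's output hypo- or hyper-verbal.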
For preschool children, one very specific measure that has remained in use in a relatively consistent form across the
paradigms described by Evans has been the MLU, measured in morphemes. Guidelines for the calculation of MLU as
described by Chapman (1981) are shown in Table 10.6. MLU is regularly used clinically (Kemp & Klee, 1997; Miller,
1996) and has been incorporated in several of the procedures described in Table 10.4, including SALT. Its use is based
on the premise that, at least in younger children, increasing syntactic complexity will also require increasing utterance
length—especially when length is measured in morphemes and therefore would be sensitive to increases in either words
or grammatical or derivational morphemes.
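The core arithmetic of MLU is straightforward, although the morpheme counts themselves require Chapman's full rules. The sketch below approximates only a few regular inflections and should not be mistaken for a clinically valid implementation; the sample utterances are invented:

```python
# Crude approximation: progressive -ing, regular past -ed, and -s
# (plural, possessive, third person) only.
REGULAR_INFLECTIONS = ("ing", "ed", "s")

def count_morphemes(word):
    """One morpheme per word, plus one for a recognizable regular inflection.
    Chapman's (1981) rules for irregular pasts, catenatives, compounds,
    diminutives, etc. are NOT implemented here."""
    for suffix in REGULAR_INFLECTIONS:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return 2
    return 1

def mlu_in_morphemes(utterances):
    """Total morphemes divided by total utterances."""
    total = sum(count_morphemes(w) for u in utterances for w in u.split())
    return total / len(utterances)

sample = ["doggie running", "he walked home", "two cats"]
print(round(mlu_in_morphemes(sample), 2))  # 10 morphemes / 3 utterances = 3.33
```

Note that "doggie" is counted as a single morpheme here, which happens to match Chapman's treatment of diminutives, but only by accident of the suffix list.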
Numerous studies lend credence to the value of MLU in describing language change through the preschool years
(Conant, 1987; Rondal, Ghiotto, Bredart, & Bachelet, 1988; Scarborough, Wyckoff, & Davidson, 1986). Blake, Quartaro, and Onorati (1993) found evidence that MLU correlated highly with a measure of grammatical complexity obtained
using the LARSP until an MLU of 4.5 was reached. Findings such as these have provided considerable support for
MLU’s widespread use in research as a means of grouping children according to language skill (Miller, 1996), but the
appropriateness of MLU depends on the precise focus of the study.1 Recent research (e.g., Aram, Morris, & Hall, 1993)
has also suggested MLU’s diagnostic utility in clinical settings, particularly where production difficulties are prominent features of the child’s difficulties.

1 Leonard (1996) described several alternative measures for equating research groups that will be more appropriate in certain circumstances, including mean number of arguments expressed per utterance, mean number of open-class words per utterance, measures of unstressed syllable production or word-final consonant production, and expressive vocabulary.

Table 10.5
A Clinical Typology of Disordered Language Performance Based on Use of the SALT

Clinical type            Characteristics
Utterance formulation    Maze revisions at word- and phrase-level units; increased MLU; pauses within and between
                         utterances; word-order errors
Word finding             Maze revisions and repetitions at word- and part-word-level units; pauses within utterances;
                         word omissions; word-choice errors
Hypo-verbal rate         Decreased number of utterances and words per minute; pauses within and between utterances
Hyper-verbal rate        Increased number of utterances and words per minute, which may be combined with reduced
                         semantic content
Pragmatic or discourse   Noncontingent utterances; pronominal reference errors; problems with topic maintenance, new
                         versus old information, and narrative structure
Semantic or reference    Overgeneralization, word-choice, and noun phrase–verb phrase (NP-VP) symmetry errors;
                         abandoned utterances; redundancy
Delayed development      Decreased number of different words and total number of words; delayed syntactic development
                         as measured in MLU and other detailed syntactic analyses

Note. SALT = Systematic Analysis of Language Transcripts; MLU = mean length of utterance; NP-VP = noun phrase–verb phrase. From “Progress in Assessing, Describing, and Defining Child Language Disorder,” by J. Miller, 1996, in K. N. Cole, P. S. Dale, and D. J. Thal (Eds.), Assessment of Communication and Language (p. 319), Baltimore: Brookes Publishing. Copyright 1996 by Brookes Publishing. Reprinted with permission.
Technical Considerations: Sample Size and Variations in Language Sampling Conditions
Recently, Muma et al. (1998) reported on a study conducted several years earlier in which language samples were
obtained from a group of seven normally developing children between the ages of 2 years, 2 months and 5 years, 2
months. They noted that 200–300 utterances were needed to obtain acceptable error rates on many grammatical
structures related to the child’s use of different grammatical systems (nominal, auxiliary, verbal) and grammatical
operations (use of relative clauses, do insertion, participle shifts, etc.). Specifically, they found a 15% error rate for the 200- to 300-utterance samples versus error rates of 55% and 40%, respectively, for 50-utterance and 100-utterance samples.
Not surprisingly, then, these data suggest that the more specific the nature of the information that will be looked for in
the language analysis (i.e., whether detailed information about specific structures is sought), the longer the sample will
need to be (Plante, personal communication, February 20, 2000).
In a similar study, Gavin and Giles (1996) conducted a SALT analysis on language samples of varying sizes based on
either increments of time (12 or 20 minutes) or number of utterances (25–175, in 25-utterance increments). Study
participants were 20 children from 31 to 46 months of age. The researchers examined the test–retest reliability of four measures (MLU, number of different words, total number of words, and mean syntactic length) in samples at these different lengths. They found that only at the largest number of utterances (about 175) did reliability coefficients meet or exceed .90, the value considered acceptable for diagnostic use.

Table 10.6
A Summary of the Method for Calculating Mean Length of Utterance (MLU) in Morphemes, as Described by Chapman (1981) as an Adaptation From Brown (1973)

Preparing the speech sample for calculation of MLU
The child’s speech is segmented using the criterion of terminal intonation (rising or falling). These procedures differ from those of Brown (1973) in that a sample of the first consecutive 50 utterances (including the first page of transcription) rather than 100 utterances (excluding the first page) is recommended. Excluded from the sample of utterances are unintelligible or partially unintelligible utterances. Included are “doubtful” transcriptions and exact utterance repetitions.

Counting morphemes in each utterance
Morphemes are defined as minimal meaningful units of a language, with dog and -s given as examples. Counting rules based on those of Brown (1973) are given to address the greater uncertainty of what constitutes a morpheme in the speech of a child. The total count for each utterance is calculated, summed, and divided by the total number of utterances spoken to yield the MLU. The counting rules are given verbatim:

“(1) Stuttering is marked as repeated efforts at a single word; the word is counted once in the most complete form produced. In the few cases where a word is produced for emphasis, or the like (no, no, no), each occurrence is counted separately. (2) Such fillers as mm or oh are not counted, but no, yeah, and hi are. (3) All compound words (two or more free morphemes), proper nouns, and ritualized reduplications count as single words. Some examples are birthday, rackety-boom, choo-choo, quack-quack, night-night, pocketbook, seesaw. The justification for this decision is that there is no evidence that the constituent morphemes function as such for these children. (4) All irregular pasts of the verb (got, did, went, saw) count as one morpheme. Again, there is no evidence that the child relates these to the present form. (5) All diminutives (doggie, mommie) count as one morpheme because these children do not seem to use the suffix productively. Diminutives are the standard forms used by the child. (6) All auxiliaries (is, have, will, can, must, would) count as separate morphemes, as do all catenatives (gonna, wanna, hafta, gotta). The catenatives are counted as single morphemes, rather than as going to or want to, because evidence is that they function as such for children. All inflections, for example, possessive (s), plural (s), third person singular (s), regular past (ed), and progressive (ing), count as separate morphemes.” (Chapman, 1981, p. 24)

Chapman (1981) identified several special characteristics of a sample that may affect the representativeness of the MLU: a high rate of imitation (i.e., >20% of the child’s utterances), frequent self-repetitions within a speech turn, a high proportion of answers occurring in response to adult questions (i.e., >30–40% of the child’s utterances), frequent use of routines (such as “counting, saying the alphabet, nursery rhymes, song fragments, commercial jingles, or long utterances made up by listing objects in a book or the room”), and a high proportion of utterances in which clauses are conjoined by and. Among the strategies she suggested for addressing these problems are calculations conducted with and without imitations, self-repetitions, frequent routines, and responses to questions. In addition, she suggested obtaining additional samples with another adult who asks fewer questions when high rates of question responses are noted and the use of another measure (the T-unit) when a high proportion of utterances consist of clauses conjoined by and.
The implication of these findings extends beyond a simple admonition for clinicians to attempt to obtain larger sample
sizes on which to base language analyses or for them to be very aware of the potential for error dogging analyses based
on smaller samples—although those are clear and potent implications. Even more importantly, however, they illustrate
the connection between reliability and sample size that haunts many if not most descriptive measures. Obviously, rarer
structures or phenomena are more likely to be vulnerable, but additional research will prove helpful in guiding us toward
best practices in our choice of tools and sample sizes.
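The reliability coefficient at issue in the Gavin and Giles study is, in essence, a correlation between a measure obtained on two occasions. A sketch of that computation, with invented test–retest MLU values for five children:

```python
import math

def pearson_r(x, y):
    """Pearson correlation, usable as a test-retest reliability estimate
    for a measure (e.g., MLU) obtained on two sampling occasions."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical MLU values for five children on two occasions:
test_mlu = [2.1, 3.4, 2.8, 4.0, 3.1]
retest_mlu = [2.3, 3.2, 2.9, 4.2, 3.0]
print(round(pearson_r(test_mlu, retest_mlu), 2))  # 0.97
```

A coefficient at or above the .90 criterion mentioned above would support diagnostic use of the measure at that sample length; values below it would argue for longer samples.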
The conditions under which language samples are collected are known to affect numerous measures obtained in
language analyses (Agerton & Moran, 1995; Landa & Olswang, 1988; Miller, 1981; Moellman-Landa & Olswang, 1984;
Terrell, Terrell, & Golin, 1977). Even a partial listing of some of the variables affecting a child’s productions can leave
one quite daunted—for example, race and familiarity of communication partner, stimulus materials, number of
communication partners, number and types of questions asked, type of communication required (e.g., narrative,
description of a procedure), to name a few! It is possible to leave these variables uncontrolled—as is often done when an
unstructured conversation between clinician and child is used as the sample. In such cases, the clinician will want to
consider these variables in his or her analysis and interpretation process.
As an alternative to unstructured language samples, structured sampling tasks have been recommended as providing
more relevant (i.e., valid) information for some clinical questions. Following is a list of five sets of tasks designed to
elicit structured language samples for school-age children (Cirrin & Penner, 1992):
1. describing an object or picture that is in view;
2. recalling a two-paragraph story told by the clinician without pictures;
3. describing a person, place, or thing that is not present in the immediate surroundings;
4. providing a description of how to do something familiar (e.g., making a sandwich); and
5. telling what the child would do in a given situation (e.g., waking up or seeing a house on fire).
This list illustrates tasks that manipulate some of the variables that may present a child with particular difficulty, thus
allowing the clinician to target language sampling for those areas of special importance for the individual child.
However, it is important to remember that each of these conditions is likely to affect more about the child’s productions than simply the variable that appears to be manipulated. For example, depending on the precise way in which the clinician sets up the task, variables beyond the desired topic or level of language complexity will probably be affected.
Page 274
In another effort to help clinicians standardize the conditions under which they collect conversational language samples,
Campbell and Dollaghan (1992) offered a sequence of topic questions that they suggested be used in order, but only as
spurs to conversation. Thus, only topics that the child would show genuine interest in would be continued. Further,
additional topics introduced by the child would be pursued as long as they continued to interest the child. The intended
result was increased consistency across examiners. In brief, the sequence begins with questions about the child’s age,
birth date, and siblings; then proceeds to questions about family pets, favorite home activities, and school affairs; and
closes with questions about vacations, favorite books, and TV shows. Although this list is relatively conventional, the
decision of a group of colleagues to adopt it—or some other consistent set of starter questions—might help lend greater
consistency to the language samples obtained across children. This, in turn, would increase the integrity of local
measures that might be made using the data from a number of clients. However, it should be noted that standardizing in this way will not necessarily add to the representativeness of the sample for the individual child; representativeness may best be achieved by entering one of a child’s favorite activities and simply observing what happens there.
6. On-Line Observations
This category of descriptive measures is characterized by Damico et al. (1992) as real-time observation and coding of
behaviors exhibited during communicative interactions as they happen. Thus, these measures differ from rating scales
that are completed outside of that time frame. Although such measures are not at all rare in research on communication, Damico et al. noted that they are only rarely applied by speech-language clinicians in clinical practice.
McReynolds and Kearns (1983) described five kinds of observational information or codes that are frequently used in
applied research settings to obtain on-line measures: (a) trial scoring, (b) event recording, (c) interval recording, (d) time
sampling, and (e) response duration. As each of these is described, the reader will see that these same categories can be
used to describe the outcomes of probes. The chief difference between probes and on-line observations is that the latter
involves responses to a more naturalistic communication event, whereas the former involves a greater level of
contrivance on the part of the clinician.
In trial scoring, responses following a specific stimulus or trial are scored as correct or incorrect. Such responses can
occur either naturally or with prompting. Although correct versus incorrect are the most commonly used labels applied to
responses in trial scoring, a numerical code (which may in fact represent a type of rating scale) may be used to provide
greater detail about the nature of responses. One example of a numerical code is the multidimensional scoring system
used in the Porch Index of Communicative Ability in Children (Porch, 1979), which uses a 16-point scoring system to
reflect five dimensions (accuracy, responsiveness, completeness, promptness, and efficiency). Readers should note that such combinations of rating scales and trial scoring are only rarely used in on-line situations because of the intense demands they place on the rater, which leave such measures quite vulnerable to problems with reliability.
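For readers who tally such data electronically, the arithmetic of simple trial scoring can be sketched as follows. This is an illustrative sketch only: the trial scores are invented, and a percent-correct summary is just one of several ways the scored trials might be summarized.

```python
# Minimal sketch of trial scoring: each elicited response is marked
# correct (1) or incorrect (0), and percent correct summarizes the set.
trials = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]  # invented scores for 10 trials

percent_correct = 100 * sum(trials) / len(trials)
print(percent_correct)  # 70.0
```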
In event recording, a code is established consisting of behaviors of interest (verbal, nonverbal, or both). That code is then used to summarize the targeted child’s behaviors over a given time period (e.g., a 15-minute period). One example is the code developed by Dollaghan and Campbell (1992) to describe within-utterance speech disruptions (i.e., pauses, repetitions, revisions, and orphans—linguistic units such as sounds or words that are not reliably related to other such units within an utterance). Whereas Dollaghan and Campbell used that code in an analysis of previously recorded language samples, it could also be used for on-line observation.
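In computational terms, event recording amounts to tallying coded behaviors over the observation period. The sketch below is illustrative only; the category labels follow the disruption code described above, and the stream of observed events is invented.

```python
from collections import Counter

# Hypothetical stream of coded events noted during a single observation
# period; labels follow the within-utterance disruption categories above.
observed = ["pause", "repetition", "pause", "revision", "orphan", "pause"]

# Event recording reduces to a frequency count per coded category.
tally = Counter(observed)
print(tally["pause"])  # 3
```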
Interval recording and time sampling are closely related both to each other and to event recording (McReynolds & Kearns, 1983). In interval recording, a set time period is divided into short, equal intervals (e.g., 10 seconds), and events are noted as having occurred once if they occur at any point during the interval. In time sampling, a set time period is again divided into intervals, but only the presence of the behavior at the very end of the interval is recorded. In addition to the designation of intervals devoted to observation, this approach also includes
recording intervals in which no observations are attempted. In time sampling, therefore, a 7.5-second observation
interval might be followed by a 2.5-second recording interval. Time sampling has been thought to be associated with
fewer problems affecting accuracy than interval recording. However, both methods require that care be taken in the
selection of interval sizes (McReynolds & Kearns, 1983). Intervals that are too short are likely to increase recording
errors; those that are too long are likely to lose information due to waning observer attention.
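The difference between the two methods can be illustrated with a short simulation. This sketch is hypothetical: it assumes the times of the target behavior are known in advance, uses the interval sizes mentioned above purely for illustration, and is not drawn from McReynolds and Kearns.

```python
# Contrast interval recording and time sampling for one 60-second
# observation, given known event times (in seconds). Interval sizes
# here are illustrative, not prescriptive.

def interval_recording(event_times, total=60.0, interval=10.0):
    """Score an interval 1 if the behavior occurs at ANY point within it."""
    n = int(total / interval)
    return [int(any(i * interval <= t < (i + 1) * interval
                    for t in event_times)) for i in range(n)]

def time_sampling(event_times, total=60.0, observe=7.5, record=2.5, window=0.5):
    """Score only whether the behavior occurs near the END of each
    observation interval; the recording interval is not observed."""
    cycle = observe + record
    n = int(total / cycle)
    return [int(any((i * cycle + observe - window) <= t < (i * cycle + observe)
                    for t in event_times)) for i in range(n)]

events = [3.0, 7.2, 31.0, 44.9]  # invented event times
print(interval_recording(events))  # [1, 0, 0, 1, 1, 0]
print(time_sampling(events))       # [1, 0, 0, 0, 0, 0]
```

As the simulated output suggests, time sampling registers only events that happen to fall near the end of an observation interval, which is one reason interval size must be chosen with care.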
The last of the observational codes described by McReynolds and Kearns (1983) is the recording of response duration, in
which the duration of a specific event of interest (e.g., pause duration) is recorded using a stopwatch or other timing
device. Although response duration may not be applicable to many language phenomena, it can nonetheless prove quite useful for some children with language disorders. For example, a functional measure for a child with SLI
who demonstrates pragmatic difficulties might consist of time spent engaged in conversation with one or more peers
during recess. Alternatively, time spent in perseverative or noncommunicative speech (e.g., repeated recitation of a
television commercial) during a group activity might be used as a functional measure for a child with autism.
Damico (1992) provided an example of an on-line observational system, called Systematic Observation of
Communicative Interaction (SOCI), which makes use of event recording and time sampling. In SOCI, problematic
verbal and nonverbal behaviors are recorded along with information about several dimensions (such as illocutionary
purpose) each time they occur within a fixed time period (a 10-second period that consists of a 7-second observation and a 3-second recording interval). Recorded behaviors include failure to provide significant information, nonspecific
vocabulary, message inaccuracy, poor topic maintenance, inappropriate response, linguistic nonfluency, and
inappropriate intonation contour. Four to seven recording periods of approximately 12 minutes each are recommended.
Although some data regarding reliability of this procedure are mentioned in Damico (1992), clearly this type of
procedure warrants additional evidence to provide better guidance regarding its interpretation and validity.
7. Dynamic Assessment
Dynamic assessment encompasses a large number of procedures designed to examine a child’s changing response to the levels of support provided by the clinician. Proponents of dynamic assessment might balk at its inclusion in the list of measures reviewed in this chapter, maintaining that it represents an approach to assessment that is entirely different from the rest. In fact, for proponents of dynamic assessment, most other forms of descriptive assessment can be lumped into the single, usually less desirable category "static." Within this conceptualization, static assessments assume a constant set of stimuli and interactions between the child and tester, whereas dynamic assessments assume a changing set of stimuli and interactions that are manipulated to provide a richer description of how the child’s performance can be modified.
To those unfamiliar with the term dynamic assessment, Olswang and Bain (1991), two of its foremost advocates in
language assessment, helpfully noted its strong resemblance to a more familiar and venerable concept. Specifically, they
compared it with stimulability, in which unaided productions (usually in articulation testing) are followed by efforts to
obtain the child’s “best” productions when aided by the clinician’s visual, auditory, and attentional prompts. In both
stimulability and in dynamic assessment procedures, facilitating actions on the part of the clinician are designed to help
determine the upper limits of a child’s performance. As a result, the boundaries of assessment and treatment are blurred.
This blurring has led to the use of the term mediated learning experience (Feuerstein, Rand, & Hoffman, 1979; Lidz &
Peña, 1996) to refer to one model of dynamic assessment. It also foreshadows the integration of such assessment
techniques into treatment (e.g., Norris & Hoffman, 1993).
Initially applied in cognitive and educational psychology by Feuerstein and others (e.g., Feuerstein, Rand, & Hoffman,
1979; Feuerstein, Miller, Rand, & Jensen, 1981; Lidz, 1987), dynamic assessment models are typically based on the
work of Vygotsky (1978), who proposed the zone of proximal development (ZPD) as a conceptualization of the moving
boundary of a child’s learning. The zone of proximal development is defined as “the distance between the actual
developmental level as determined by independent problem solving and the level of potential development as determined
through problem solving under adult guidance or in collaboration with more capable peers” (Vygotsky, 1978, p. 86).
Problem solving or behaviors lying within this zone are thought to represent those areas where maturation is occurring
and to characterize development "prospectively" rather than "retrospectively," as is done with typical, static assessment
(Vygotsky, 1978).
The ZPD has been interpreted as being indicative of learning readiness. Therefore, its description through dynamic
assessment has been considered especially useful for identifying treatment goals (Bain & Olswang, 1995; Olswang &
Bain, 1991, 1996). Specifically, Olswang and Bain (1991, 1996) suggested that tasks that children perform with little
assistance do not warrant treatment, and those that children fail to perform, even when provided with maximal
assistance, are not yet appropriate targets. Instead, the most appropriate targets are likely to be those that children
perform only
when given considerable assistance. Modifiability of performance in response to adult facilitation has also been shown to
predict generalization of performance to new situations, such that children who demonstrate less modifiability show less
transfer (Campione & Brown, 1987; Olswang, Bain, & Johnson, 1992).
Another benefit observed by Olswang and Bain (1991) is that dynamic assessment strategies allow the clinician to determine not only what the child is learning, but also how that learning can be supported through the manipulation of antecedent and consequent events. They noted that whereas consequent events such as the nature of
reinforcement (e.g., tangible vs. social) and schedule of reinforcement (e.g., continuous vs. variable) have received
attention for many years in speech-language pathology, antecedent events receive greater attention in dynamic
assessment. Among the antecedent events highlighted in dynamic assessment are the use of models or prompts, the
selection of the modalities of stimuli or cues that are used, and the number of stimulus presentations that are provided.
Table 10.7 provides a hierarchy of verbal cues used to provide differing levels of support for children with specific
expressive language impairment learning two-word utterances (Bain & Olswang, 1995). In the study, which was
designed to validate the
Table 10.7
A Sample Hierarchy of Verbal Cues

General statement: Opportunity to direct the child’s attention.
    "Oh look at this."
Elicitation question: Opportunity plus an elicitation cue.
    "What’s happening?" "What’s he doing?"
Cloze or sentence completion: More salient opportunity, contrasting the particular feature of what is to be coded.
    "Look, the dog is sitting and __." (manipulating the dog so it is walking)
Indirect model: Repetition of the opportunity plus an embedded or delayed model and an elicitation cue.
    "See, the dog is walking; what is he doing?"
Direct model evoking spontaneous imitation: Opportunity plus a direct model of the desired utterance without an elicitation cue; the participant spontaneously imitates the utterance.
    "Dog walk."
Direct model plus an elicitation statement: Opportunity plus a direct model of the desired utterance with an elicitation statement.
    "Tell me, dog walk."

Note. This table represents a sample hierarchy of verbal cues arranged from those providing least to most support for the production of two-word utterances in children with specific expressive language impairment who are producing few or no utterances of this type. This example uses cues designed to elicit Agent + Action ("dog walk") as relevant objects are manipulated. From "Examining Readiness for Learning Two-Word Utterances by Children With Specific Expressive Language Impairment: Dynamic Assessment Validation," by B. A. Bain and L. B. Olswang, 1995, American Journal of Speech-Language Pathology, 4, p. 84. Copyright 1995 by the American Speech-Language-Hearing Association. Reprinted with permission.
use of dynamic assessment, 15 children who were producing few or no two-word utterances were assessed using
standardized measures, language samples, and dynamic assessment, then treated for 3 weeks. Construct validity was
supported through the demonstration that more supportive cues (i.e., those providing more information) resulted in more
correctly produced two-word utterances than less supportive cues. In addition, predictive validity was supported through
the demonstration that children who showed the greatest responsiveness to the hierarchy (responded to the less
supportive cues) showed the greatest language change over the study period. One unexpected finding was that language
sampling was associated with a greater variety of word combinations and two-word utterance types than was dynamic
assessment. This finding was inconsistent with the outcome needed to support concurrent validity, thus suggesting the
need for further study.
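The logic of working through such a cue hierarchy can be sketched in code. This is an abstraction for illustration only, not Bain and Olswang’s actual procedure: the cue labels follow Table 10.7, and the callable passed in is a hypothetical stand-in for the child’s behavior.

```python
# Cue conditions from Table 10.7, ordered from least to most support.
CUE_HIERARCHY = [
    "general statement",
    "elicitation question",
    "cloze or sentence completion",
    "indirect model",
    "direct model evoking spontaneous imitation",
    "direct model plus an elicitation statement",
]

def support_level_needed(child_responds):
    """Return the first (least supportive) cue that elicits the target
    utterance, or None if even maximal support fails.

    child_responds: a callable standing in for the child's (unknown)
    behavior; it takes a cue label and returns True if the target
    utterance is produced.
    """
    for cue in CUE_HIERARCHY:
        if child_responds(cue):
            return cue
    return None

# Hypothetical child who responds only from the indirect model onward:
needs_model = lambda cue: CUE_HIERARCHY.index(cue) >= 3
print(support_level_needed(needs_model))  # indirect model
```

In these terms, a child who responds early in the hierarchy (i.e., with little support) would, on Bain and Olswang’s account, be a better candidate for rapid change than one who responds only to direct models.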
The collaborative nature of the interaction promoted in dynamic assessment is thought to have immediate benefits to the
child’s motivation. Lidz (1996) described this interaction as promoting “rapport-building and motivational variables,
including reduced anxiety, [such that] assessment becomes more of an instructional conversation than a test” (p. 11). At
the same time, a number of authors (Gutierrez-Clellen, Brown, Conboy, & Robinson-Zañartu, 1998; Lidz, 1996) noted
that the use of dynamic assessment allows the clinician to determine how assessment conditions facilitate or obstruct the
child’s attention or arousal, perception, memory, conceptual processing, and metacognitive processing. Thus, dynamic
assessment may provide information not only about the child’s current and potential level of functioning on a given task,
but also about the child’s learning needs and style that extend beyond the task at hand.
Because of its complexity, dynamic assessment is recommended for some, but not all children whose language requires
description. Much of the early work on dynamic assessment was directed at its use for children with mental retardation
(Feuerstein et al., 1979). More recently it has received considerable attention as a nonbiased approach for use with
children who come from linguistically or culturally diverse communities (Gutierrez-Clellen et al., 1998; Lidz, 1996; Lidz
& Peña, 1996; Peña, Quinn, & Iglesias, 1992). Reduced bias is expected for at least three reasons. First, dynamic
assessment techniques can either circumvent or alter as needed the unfamiliar language and interaction routines that may
penalize children from nondominant cultural backgrounds. Second, the collaborative nature of the interaction of child
and clinician can facilitate more relaxed, confident, and, consequently, valid efforts from the child. Third, the embedding
of instruction in dynamic assessment can reduce the effects of previous experience, a major source of bias for children
who lack the experiences of the mainstream culture (Lidz, 1996).
Bain and Olswang (1995) summarized the promise of dynamic assessment techniques as follows:
Dynamic assessment offers clinicians the opportunity to obtain information as to who to treat, when to treat, what to
treat, how to treat, and to determine prognosis. Such information will enable clinicians to make informed decisions as
they provide services to children with language impairment. (p. 90)
A growing body of data bolsters portions of these claims (e.g., see Long & Olswang, 1996; Olswang & Bain, 1996).
However, the complexity and variety of procedures fitting within the umbrella of dynamic assessment mean that much
work remains to be done to optimize the validity of these procedures for individual children and assessment purposes—
or even to understand the extent to which traditional psychometric concepts can be applied to their evaluation
(Embretson, 1987).
8. Qualitative Measures
Speech-language pathologists have always paid attention to a very wide range of information sources beyond those
described thus far in the chapter, including teacher and parent comments, client observations, interviews, and official
documents. More recently, sources such as student journals, portfolios, clinician journals, and critical incident reports, or
“stand outs,” have been added (Schwartz & Olswang, 1996; Silliman & Wilkinson, 1991). Olswang and Bain (1994)
used the terms descriptive and qualitative to refer to these sources of information and described them as subjective, in
contrast to the more typical, operationally defined quantitative data. Olswang and Bain based their discussion of such
measures on the work of authors (e.g., Bogdan & Biklen, 1992; Glesne & Peshkin, 1992) describing qualitative research,
an umbrella term used to describe several research strategies in which subjective, inductive, and richly descriptive
measures are systematically used to examine participants’ perspectives on phenomena of interest. Because of this close
connection to a type of research that may be unfamiliar to many readers, a brief discussion of qualitative research is
offered as background.
Historically, qualitative research methods have been developed somewhat independently in anthropology, nursing,
education, sociology and social work, among other disciplines (Bogdan & Biklen, 1998; Lancy, 1993), but have shown
increasing cross-fertilization. Recently, these methods, especially those described as "ethnographic," have begun to be
adopted in research and, to a lesser extent, in clinical practice in speech-language pathology (Kovarsky, 1994; Kovarsky
et al., 1999; Silliman & Wilkinson, 1991; Westby, 1990). A thorough description of qualitative research is beyond the
scope of this text, having, in fact, served as the focus for a dazzling array of texts in just the past decade (e.g., Berg,
1998; Bogdan & Biklen, 1998; Creswell, 1998; Denzin & Lincoln, 2000; Kelley, 1999; Lancy, 1993; Taylor & Bogdan,
1998). Nonetheless, a brief overview of some of the theoretical threads uniting different approaches within qualitative
research can help guide our thinking about how qualitative data may be used in the assessment of children’s language
disorders.
Qualitative research strategies have been described as demonstrating, to greater or lesser degrees, the following five features, many of which clearly contrast with quantitative strategies (Bogdan & Biklen, 1998). First, the focus of qualitative research is a natural context in which the researcher serves as the primary "instrument." Second, data are
descriptive, rather than quantitative, in nature. Third, interactive social processes, rather than products, are of interest.
Fourth, methods are inductive; thus, abstractions are made from the data that are present, rather than tested from data that
are sought out. Fifth, meaning as experienced by individuals from their personal perspectives is of paramount interest.
From a clinical vantage point, one of the chief attractions of qualitative methods is their potential to guide clinicians in
the use of data that may have previously been seen as illicit.
One of the major sources of evidence for the validity of qualitative data lies in the process of triangulation, which can be
defined as the believability provided by repeated examples of a given behavior obtained in a variety of settings or using a
variety of methods (Schwartz & Olswang, 1996). Janesick (1994) described five kinds of triangulation: triangulation
across (a) data sources, (b) researchers or evaluators, (c) multiple perspectives, (d) multiple methods, and (e) disciplines.
It is the preponderance of evidence gained under these conditions that validates findings (Berg, 1998). Some authors (e.g., Bogdan & Biklen, 1998) object that the term triangulation is used differently by different authors and thus argue that
the exact methods used to provide rich support for validity need to be specified. However, it is a useful term for
capturing the way in which validity, or believability, is alternatively characterized within this research paradigm. Also,
interestingly, it is related to the concept pervading mainstream psychometric discussions that theoretical constructs need
to be studied using several indicators (Pedhazur & Schmelkin, 1991; Primavera, Allison & Alfonso, 1996). When
thinking about how qualitative data may complement quantitative descriptions of the language and linguistic context for
children with language impairment, the concept points toward a need for multiple sources and settings.
Although additional research may help us understand how such data can best be used in combination with more
established, quantitative measures, existing work on qualitative research can point to the types of questions for which
qualitative data may be best suited (Schwartz & Olswang, 1996). Specifically, questions that relate to the diverse ways in
which a child is viewed in his or her linguistic community or to the special expectations falling on a specific child in a
specific community may best be addressed using qualitative data. Thus, questions that address concerns about handicap
and disability, which relate to functional and participative effects of impairments (WHO, 1980), may be very effectively
answered using qualitative methods.
Practical Considerations
Numerous practical considerations affect the way in which speech-language pathologists currently describe language
disorders in children. Influences related to the consideration of the larger contexts in which language impairments occur
include movements toward increasing use of assessments in which several professionals contribute their insights into the
functioning of a child (coordinated assessment strategies). In addition, there has been a continuing movement in the past
few decades toward assessments for school-age children in which the functional demands of academic settings are
recognized as the chief challenges facing them (curriculum-based assessment). These coordinated approaches toward
assessment could have been profitably discussed in chapter 9, which dealt with identification. Nonetheless, they are
discussed here because of the closer connection of description than identification to
treatment planning—the area of clinical decision making thought to benefit most from coordination.
Beyond these assessment strategies, perhaps the most important practical consideration affecting the descriptive
measures chosen by clinicians is that of time and other practical resources. In this section, the role of coordinated
assessment strategies and other practical factors is discussed briefly to illustrate some of the forces shaping descriptive
practices in work with children with language impairments.
Coordinated Assessment Strategies
Children with language impairment experience a range of needs that require the attention and care of individuals from a
variety of disciplines, for example, speech-language pathology and audiology, psychology, social work, occupational
therapy, physical therapy, and a variety of other health professionals. As a child’s physical problems and other problems
increase in number, coordination of assessments and interventions conducted by these professionals become crucial
(Linder, 1993; Rosetti, 1986). Without coordination, professionals may work at cross purposes with families, overwhelm
them with excessive or contradictory recommendations and, as a result, facilitate small gains in individual domains while
undermining the overall quality of the child’s life (Calhoon, 1997; Raver, 1991). Rosetti described a professional working alone with a child with many problems as suffering from tunnel vision, in which the child may be viewed from the exceedingly narrow perspective of that single individual’s academic and professional background.
Particularly for very young children with multiple needs, the need for coordination has been recognized in legislation
and in the development of sophisticated strategies of coordination. Three general strategies that attempt to meet the
needs of children and families are multidisciplinary, interdisciplinary, and transdisciplinary approaches (Calhoon, 1997;
Raver, 1991).
Multidisciplinary assessment involves parallel planning, administration, and interpretation processes in which parents
interact independently with individual disciplines. Interdisciplinary assessment involves coordination through team planning of assessment and consultation among team members as assessments occur within individual disciplines; parent involvement is encouraged. Transdisciplinary assessment involves shared assessments conducted by the entire team.
This approach involves participation of all team members as well as the child’s parents throughout the planning,
administration, and interpretation process. Two examples of specific transdisciplinary approaches are play assessment and arena assessment, in which a criterion-referenced measurement strategy is implemented within a naturalistic context. Whereas
multidisciplinary and interdisciplinary approaches tend to predominate in systems designed for older children,
transdisciplinary approaches have become particularly popular for the assessment of infants and toddlers (Calhoon,
1997).
Attempts at the coordination of disciplines, particularly those that increase the involvement of parents, are presumed to
increase the validity of measurement and the effectiveness with which clinical decisions can be implemented (Crais,
1993). Further, greater coordination, particularly with parents, is required through IDEA (1990).
As a consequence, it is likely that increasing attention will be paid to the validation of coordinated approaches to
assessment and to the development of methods to increase their efficiency.
Curriculum-Based Assessment
For school-age children with language impairments, coordination of disciplines is often more limited than for younger
children, although it is at least as vital. For school-age children, coordination will entail collaboration between classroom
teachers, special educators, and speech-language pathologists. For this age group, collaborative assessment approach is
the term most frequently used to refer to the way in which professionals (speech-language pathologists in this case)
attempt to coordinate their activities with those of the other professionals serving the child in a school setting.
Curriculum-based assessment is one particularly widespread component of collaborative assessment approaches
(Prelock, 1997).
Collaboration enables the speech-language pathologist and other members of the educational team to understand the
specific language and communication demands facing the child with a given teacher, classroom, and curriculum. The
purposes of this collaboration are to determine what demands present particular challenges to the child and to identify
team resources for addressing them (Creaghead, 1992; Prelock, 1997; Silliman & Wilkinson, 1991).
Curriculum-based assessment has been defined broadly as “evaluation of a student’s ability to meet curriculum
objectives so that school success can be achieved” (Prelock, 1997, p. 35). Adding more detail to this concept, Nelson
(1989, 1994) called attention to the presence of numerous kinds of curricula. Thus, for example, in addition to the
official curriculum of the school district, there are the cultural curriculum consisting of unspoken expectations based on
the mainstream culture and the underground curriculum consisting of the rules affecting peer social interactions.
In order to understand and respond to school curricula in both the broad and more detailed senses, the speech-language
pathologist will almost always need to use criterion-referenced measures. Such measures are sometimes aimed at
characterizing the educational setting and its demands and sometimes aimed at determining whether the child is or is not
able to meet those demands. Identifying the taxing aspects of language and communication within the classroom will
benefit not just the child with language impairments but all students within that classroom (Prelock, 1997). Obviously,
the benefit of collaborative curriculum-based assessments to children with language impairments is the possibility of
describing and then responding to their difficulties. These responses by the speech-language pathologist and other team
members can result in accommodations or other active steps to foster greater success in the regular classroom. In
essence, curriculum-based assessments can help prevent impairment from necessarily being realized as a disability or
handicap, in the terminology of the ICIDH (WHO, 1980). Alternatively, it can also be seen as preventing impairment
from being realized as a limitation in activities or participation opportunities, in the terminology of ICIDH-2 (WHO,
1998).
Other Practical Factors
Practical factors beyond those discussed in this chapter, such as time and money, appear to affect the ways in which
clinicians conduct language assessments (Beck, 1996; Wilson, Blackmon, Hall, & Elcholtz, 1991), including
assessments designed to plan for treatment (Beck, 1996). Time demands seem to stem from the pressures of large
caseloads. In particular, whereas ASHA (1993) recommended caseload sizes of 40, Shewan and Slater (1993) found that
school clinicians have average caseloads of 52! Beck’s survey found that clinicians frequently reported that they did not
have sufficient time to conduct complete assessments. In addition, clinicians also reported insufficient funds to buy
“adequate materials for assessment.” Other data from the same source led Beck to ponder whether frequency of use
might result from properties of a test as simple as its being appropriate for a wide age range and its addressing both
receptive and expressive concerns. This possibility led her to comment that “these are certainly not the ideal criteria on which to base selection of assessment methods” (Beck, 1996, p. 58).
Further, Beck (1996) and Wilson et al. (1991) did not obtain detailed information about the entire range of descriptive
measures used by clinicians. However, they did find that language sampling is very widely used. Given the expressed
concerns about time and money, however, it seems likely that the more time-consuming descriptive measures, and many of the exciting but still emerging measures described in this chapter, may not make it into the repertoire of techniques
used by clinicians. At least this conclusion seems reasonable in the absence of considerable effort on the part of
individual clinicians and the profession. These efforts may take the form of working to reduce caseload sizes and
increase budgets. Alternatively, they may take the form of research studies aimed at increasing the efficiency and variety
of descriptive measures. Fortunately, there is widespread realization that descriptive measures are the most appropriate
tools to use in addressing many critical clinical questions—the first step needed to engage the attention of individual
clinicians and of the profession as a whole.
Summary
1. Descriptive measurement of language presents both greater challenges and greater rewards to the practicing clinician than does assessment aimed at screening or identification because of its steadfast tie to the heart of clinical practice: interventions designed to improve the social and communicative lives of children.
2. Even more than measures used in identification, descriptive measures of language require scrupulous attention by the clinician to achieve a match between the specific clinical question being posed and the method used to answer it. This is true largely because the specificity of the question being asked necessitates the use of informal measures that can only be “validated” through the actions of the individual clinician.
3. Damico et al. (1992) described authenticity, functionality, and richness of description as critical characteristics for
descriptive measures.
4. A wealth of strategies has been proposed for use in description, including standardized norm-referenced measures,
standardized criterion-referenced measures, probes, rating scales, language analysis, on-line observations, dynamic
assessment, and qualitative measures.
5. Because children with special language needs often require the attention of other professionals as well, assessment
frameworks have arisen that reflect differing degrees of coordination across disciplines, as well as differing degrees of
parent involvement. These range from multidisciplinary to interdisciplinary to transdisciplinary assessments.
6. The nature of coordinated assessment efforts changes according to the age of the child, with younger children more frequently served using methods that involve a greater degree of integration across professions and older children served using methods that acknowledge the primacy of the school environment. Terms associated with
coordinated assessments include arena and play-based assessment methods for younger children as well as curriculum-
based assessment for older children.
7. Recent innovations, such as dynamic assessment and the thoughtful use of qualitative measures, challenge researchers
and clinicians with opportunities for a richer description of the effects of language disorders on children, including those
from nonmainstream cultures.
8. Future developments with regard to descriptive measures are likely to include the development and validation of new
methods as well as the development of better practices leading to more efficient and effective application of existing
approaches.
Key Concepts and Terms
authentic assessment: assessment in which the skills to be assessed are selected to represent realistic learning demands and are examined in real-life settings, such as classrooms, where artificial and standardized conditions are avoided (Schraeder et al., 1999).
authenticity: the most complex of three primary characteristics described by Damico et al. (1992) as necessary for
descriptive measures; it includes respect for and preservation of the intricate and meaning-directed nature of
communication as well as traditional concepts of reliability and validity.
collaborative assessment approach: any of several approaches in which professionals from different disciplines (e.g.,
speech-language pathologists, audiologists, special educators) work together to provide information leading to effective
and efficient intervention for a given child.
curriculum-based assessment: assessment aimed at examining a child’s skills and challenges in relation to curricular
demands for purposes of planning interventions that may occur within and outside of the classroom.
direct magnitude estimation: a type of rating method in which stimuli to be rated are compared with one another or
against a standard stimulus.
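The arithmetic behind combining direct magnitude estimates can be made concrete with a small sketch. The ratings and scaling steps below are purely illustrative assumptions, not a procedure described in this chapter: each rater’s hypothetical estimates are rescaled to a common modulus by dividing by that rater’s geometric mean, and the rescaled estimates are then averaged geometrically across raters.

```python
import math

# Hypothetical direct magnitude estimates: each rater assigns numbers
# proportional to perceived language performance for five samples,
# using whatever personal number scale feels natural.
ratings = {
    "rater_A": [20, 40, 10, 80, 40],
    "rater_B": [2, 4, 1, 8, 4],    # same ratios as rater_A, smaller modulus
    "rater_C": [15, 35, 10, 60, 30],
}

def rescale(values):
    """Divide by the rater's geometric mean so all raters share a modulus."""
    gm = math.exp(sum(math.log(v) for v in values) / len(values))
    return [v / gm for v in values]

rescaled = {rater: rescale(vals) for rater, vals in ratings.items()}

# Combine across raters with a geometric mean per stimulus.
n = len(next(iter(ratings.values())))
combined = []
for i in range(n):
    vals = [rescaled[rater][i] for rater in rescaled]
    combined.append(math.exp(sum(math.log(v) for v in vals) / len(vals)))

for i, score in enumerate(combined, 1):
    print(f"sample {i}: scaled estimate {score:.2f}")
```

Because magnitude estimates are ratio judgments made on rater-chosen scales, geometric rather than arithmetic averaging preserves those ratios; rater B’s estimates, though ten times smaller than rater A’s, contribute identically after rescaling.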
dynamic assessment: a variety of approaches to description in which stimuli and procedures are modified to identify the
child’s potential performance with adult collaboration to help determine treatment goals and facilitative methods;
considered especially useful as a means of nonbiased assessment for children who are bilingual or from nondominant
cultural backgrounds.
event recording: an observational method in which the frequency of specific behaviors (events) is recorded across the
entire observational time period.
functionality: one of three primary characteristics described by Damico et al. (1992) as necessary for descriptive
measures, consisting of their ability to capture a child’s skill in transmitting meaning effectively, fluently, and
appropriately.
interval recording: a method of obtaining on-line observational data in which the observer notes the presence of a behavior or targeted characteristic within a relatively short time frame (e.g., 10 seconds).
interval scaling: a rating technique in which raters are asked to assign a number or verbal label to each of a set of related stimuli.
metathetic continuum: the type of rating shown when raters’ responses to differences between rated entities seem to
reflect qualitative distinctions. Auditory stimuli differing in pitch appear to be treated in this fashion by raters.
multidisciplinary assessment: assessment in which professionals involved with a child work in parallel to plan, conduct,
and interpret their individual assessments with interactions between professionals occurring in a less structured fashion
than interdisciplinary or transdisciplinary assessments.
probe: an informal measure in which the clinician attempts to devise conditions that will elicit a response demonstrating
a child’s knowledge of a particular area of form, content, or use.
prothetic continuum: the type of rating shown when raters’ responses to differences between rated entities appear to
reflect quantitative distinctions. Auditory stimuli differing in loudness appear to be judged in this fashion by raters.
qualitative research: a range of research strategies designed to be naturalistic, descriptive, inductive in nature, and
concerned with process and meaning (Bogdan & Biklen, 1998).
richness of description: one of three primary characteristics described by Damico et al. (1992) as necessary for
descriptive measures; it entails the use of sufficient detail to lead to an understanding of causality that may be used in
planning treatment.
time sampling: a method of observation in which the observation time period is divided into intervals and the presence of
a targeted behavior is recorded at the end of each interval.
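The three on-line observation methods defined above (event recording, interval recording, and time sampling) differ only in what is tallied, which a brief simulation can make concrete. The behavior stream and interval length below are hypothetical, chosen for illustration only:

```python
import random

random.seed(1)

# Simulate a 300-second observation: 1 marks a second in which the
# target behavior (e.g., a conversational initiation) occurs.
behavior = [1 if random.random() < 0.15 else 0 for _ in range(300)]

# Event recording: count every occurrence across the whole period.
event_count = sum(behavior)

# Interval recording: divide the period into short intervals (here 10 s)
# and note whether the behavior occurred at all within each interval.
interval_len = 10
intervals = [behavior[i:i + interval_len] for i in range(0, 300, interval_len)]
interval_hits = sum(1 for chunk in intervals if any(chunk))

# Time sampling: record only whether the behavior is present at the
# end of each interval.
sample_hits = sum(1 for chunk in intervals if chunk[-1])

print(f"event recording:    {event_count} occurrences")
print(f"interval recording: {interval_hits}/{len(intervals)} intervals")
print(f"time sampling:      {sample_hits}/{len(intervals)} samples")
```

Event recording preserves frequency exactly but demands continuous attention; interval recording and time sampling trade precision for feasibility, which is why each count above can only shrink as the method grows less demanding.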
transdisciplinary assessment: assessments in which team members from different disciplines share maximally in the
assessment process; specific examples of this type of assessment include arena and play-based assessments, which are
used most frequently with infants and toddlers.
trial scoring: the recording of a response as correct or incorrect following a specific stimulus or trial (McReynolds & Kearns, 1983).
triangulation: an approach to validation in which convergent findings are sought across varying methods and data sources; recently emphasized in relation to qualitative research methods.
zone of proximal development (ZPD): the range of behaviors lying between independent functioning and functioning
that must be facilitated by a more expert interaction partner; thought to illustrate a child’s emerging mastery or learning
readiness.
Study Questions and Questions to Expand Your Thinking
1. On the basis of your reading of this chapter, formulate three ideas for research projects aimed at clarifying some
psychometric characteristic (e.g., validity for a purpose, reliability) of a specific descriptive measure, thus making it
more clinically useful.
2. Look at a recent issue of a journal containing articles on children with language impairments. See if you can find
examples of probes that could be added to Table 10.3.
3. Engage in a conversation with two different people for a period of 10 minutes each, ideally tape-recording each conversation with your partner’s knowledge so that you can go back over it later. Then create a list of the factors affecting your word choice,
the length of your sentences, the structure of your sentences, the nature of your turn-taking, and so forth. Can you group
the items on your list into related factors? Once you have done this, consider the extent to which children’s
communications are likely to be similarly affected in the course of collecting a language sample.
4. Consider ways to triangulate information about a child’s lack of success in reading in a regular first-grade class.
Develop a small set of related questions about the child and the context and then consider what kinds of measures might
provide you with a rich understanding of the child’s difficulties.
5. Find out what coordinated assessment methods exist in any clinical settings that serve children to which you have
access. Consider what benefits might be gained, and at what costs, if greater integration were to occur across
professional roles within that setting.
Recommended Readings
Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and
design. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language
assessment (pp. 1–8). San Antonio, TX: Psychological Corporation.
Kovarsky, D. (1994). Distinguishing quantitative and qualitative research methods in communication sciences and
disorders. National Student Speech Language Hearing Association Journal, 21, 59–64.
Olswang, L. B., & Bain, B. A. (1991). When to recommend intervention. Language, Speech, and Hearing Services in
Schools, 22, 255–263.
References
Agerton, E. P., & Moran, M. J. (1995). Effects of race and dialect of examiner on language samples elicited from
Southern African American preschoolers. Journal of Childhood Communication Disorders, 16, 25–30.
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC: APA.
American Speech-Language-Hearing Association. (1993). Guidelines for caseload size and speech-language service delivery in the schools. Asha, 35(Suppl. 10), 33–39.
Aram, D. M., Morris, R., & Hall, N. E. (1993). Clinical and research congruence in identifying children with specific
language impairment. Journal of Speech and Hearing Research, 36, 580–591.
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing
Services in Schools, 22, 264–270.
Bain, B. A., & Olswang, L. B. (1995). Examining readiness for learning two-word utterances by children with specific
expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4,
81–91.
Baker-van den Goorbergh, L. (1990). CLEAR: Computerized language-error analysis report. Clinical Linguistics and Phonetics, 4, 285–293.
Barrow, J. D. (1992). Pi in the sky: Counting, thinking, and being. New York: Oxford University Press.
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.),
The crosslinguistic study of sentence processing (pp. 3–73). New York: Cambridge University Press.
Beck, A. R. (1996). Language assessment methods for three age groups of children. Journal of Children’s
Communication Development, 17, 51–66.
Berg, B. L. (1998). Qualitative research methods for the social sciences. (3rd ed.). Boston: Allyn & Bacon.
Berk, R. A. (1984). Screening and identification of learning disabilities. Springfield, IL: Thomas.
Bishop, D. V. M. (1985). Automated LARSP (Language Assessment, Remediation and Screening Procedure) [Computer program]. Manchester, England: University of Manchester.
Blake, J., Quartaro, G., & Onorati, S. (1993). Evaluating quantitative measures of grammatical complexity in
spontaneous speech samples. Journal of Child Language, 20, 139–152.
Bogdan, R. C., & Biklen, S. K. (1992). Qualitative research for education: An introduction to theory and methods (2nd
ed.). Boston: Allyn & Bacon.
Bogdan, R. C., & Biklen, S. K. (1998). Qualitative research for education: An introduction to theory and methods (3rd
ed.). Boston: Allyn & Bacon.
Brinton, B., & Fujiki, M. (1992). Setting the context for conversational language sampling. In W. Secord (Ed.), Best
practices in school speech-language pathology: Descriptive/nonstandardized language assessment (pp. 9–19). San
Antonio, TX: Psychological Corporation.
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Burroughs, E. I., & Tomblin, J. B. (1990). Speech and language correlates of adults’ judgments of children. Journal of
Speech and Hearing Disorders, 55, 485–494.
Butler, K. G. (1997). Dynamic assessment at the millennium: A transient tutorial for today! Journal of Children’s
Communication Development, 19, 43–54.
Calhoon, J. M. (1997). Comparison of assessment results between a formal standardized measure and a play-based
format. Infant–Toddler Intervention, 7, 201–214.
Campbell, T., & Dollaghan, C. (1992). A method for obtaining listener judgments of spontaneously produced language:
Social validation through direct magnitude estimation. Topics in Language Disorders, 12, 42–55.
Campione, J., & Brown, A. L. (1987). Linking dynamic assessment with school achievement. In C. S. Lidz (Ed.),
Dynamic assessment: An interactional approach to evaluating learning potential (pp. 82–116). New York: Guildford.
Chapman, R. (1981). Computing mean length of utterance in morphemes. In J. F. Miller (Ed.), Assessing language
production in children (pp. 22–25). Baltimore: University Park Press.
Cirrin, F. M., & Penner, S. G. (1992). Implementing change to descriptive language assessment approaches in the
schools. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized language
assessment (pp. 23–131). San Antonio, TX: Psychological Corporation.
Cochran, P. S., & Masterson, J. J. (1995). NOT using a computer in language assessment/intervention: In defense of the reluctant clinician. Language, Speech, and Hearing Services in Schools, 26, 213–222.
Conant, S. (1987). The relationship between age and MLU in young children: A second look at Klee and Fitzgerald’s
data. Journal of Child Language, 14, 169–173.
Crais, E. R. (1993). Families and professionals as collaborators in assessment. Topics in Language Disorders, 14(1), 29–
40.
Creaghead, N. A. (1992). Classroom interactional analysis/script analysis. In W. Secord (Ed.), Best practices in speech-
language pathology: Descriptive/nonstandardized language assessment (pp. 65–72). San Antonio, TX: Psychological
Corporation.
Creswell, J. W. (1998). Qualitative inquiry and research design: Choosing among five traditions. Thousand Oaks, CA:
Sage Publications.
Crystal, D. (1982). Profiling linguistic disability. London: Edward Arnold.
Crystal, D. (1987). Toward a “bucket” theory of language disability: Taking account of interaction between linguistic
levels. Clinical Linguistics and Phonetics, 1, 7–22.
Crystal, D., Fletcher, P., & Garman, M. L. (1989). Grammatical analysis of language disability (2nd ed.). London:
Whurr.
Damico, J. S. (1992). Systematic observation of communicative interaction: A valid and practical descriptive assessment
technique. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive/nonstandardized
language assessment (pp. 133–143). San Antonio, TX: Psychological Corporation.
Damico, J. S., Secord, W. A., & Wiig, E. H. (1992). Descriptive language assessment at school: Characteristics and
design. In W. Secord (Ed.), Best practices in school speech-language pathology: Descriptive /nonstandardized language
assessment (pp. 1–8). San Antonio, TX: Psychological Corporation.
Denzin, N. K., & Lincoln, Y. S. (2000). The handbook of qualitative research (2nd ed.). Thousand Oaks, CA: Sage.
Diedrich, W. M., & Bangert, J. (1980). Articulation learning. Houston, TX: College-Hill Press.
Dollaghan, C., & Campbell, T. (1992). A procedure for classifying disruptions in spontaneous language samples. Topics
in Language Disorders, 12, 56–68.
Dollaghan, C., Campbell, T., & Tomlin, R. (1990). Video narration as a language sampling context. Journal of Speech
and Hearing Disorders, 55, 582–590.
Elbert, M., Shelton, R. L., & Arndt, W. B. (1967). A task for evaluation of articulation change: I. Development of
methodology. Journal of Speech and Hearing Research, 10, 281–289.
Embretson, S. E. (1987). Toward development of a pyschometric approach. In C. S. Lidz (Ed.), Dynamic assessment: An
interactional approach to evaluating learning potential (pp. 141–170). New York: Guilford.
Evans, J. L. (1996a). Plotting the complexities of language sample analysis. In K. N. Cole, P. S. Dale, & D. J. Thal
(Eds.), Assessment of communication and language (pp. 207–256). Baltimore: Brookes Publishing.
Evans, J. L. (1996b). SLI subgroups: Interaction between discourse constraints and morphological deficits. Journal of
Speech and Hearing Research, 39, 655–660.
Evans, J. L., & Miller, J. F. (1999). Language sample analysis in the 21st century. Seminars in Speech and Language, 20,
101–115.
Feuerstein, R., Miller, R., Rand, Y., & Jensen, M. (1981). Can evolving techniques better measure cognitive change?
Journal of Special Education, 15, 201–219.
Feuerstein, R., Rand, Y., & Hoffman, M. (1979). The dynamic assessment of retarded performers. Baltimore: University
Park Press.
Finnerty, J. (1991). Communication analyzer [Computer software]. Lexington, MA: Educational Software Research.
Frattali, C. (Ed.). (1998). Measuring outcomes in speech-language pathology. New York: Thieme.
Gavin, W. J., & Giles, L. (1996). Sample size effects on temporal reliability of language sample measures of preschool
children. Journal of Speech, Language, and Hearing Research, 39, 1258–1262.
Gavin, W. J., Klee, T., & Membrino, I. (1993). Differentiating specific language impairment from normal language
development using grammatical analysis. Clinical Linguistics and Phonetics, 7, 191–206.
Gerken, L., & Shady, M. (1996). The picture selection task. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods
for assessing children’s syntax (pp. 287–302). Cambridge, MA: MIT Press.
Goldstein, H., & Geirut, J. (1998). Outcomes measurement in child language and phonological disorders. In C. Frattali
(Ed.), Measuring outcomes in speech-language pathology (pp. 406–437). New York: Thieme.
Goodluck, H. (1996). The acting-out task. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing
children’s syntax (pp. 147–162). Cambridge, MA: MIT Press.
Gutierrez-Clellen, V. F., Brown, S., Conboy, B., & Robinson-Zañartu, C. (1998). Modifiability: A dynamic approach to
assessing immediate language change. Journal of Children’s Communication Development, 19, 31–42.
Hirsh-Pasek, K., & Golinkoff, R. M. (1996). The intermodal preferential looking paradigm: A window onto emerging
language comprehension. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children’s syntax
(pp. 105–124). Cambridge, MA: MIT Press.
Hixson, P. K. (1985). Developmental Sentence Scoring computer program [Computer program]. Omaha, NE:
Computerized Language Analysis.
Howard, S., Hartley, J., & Muller, D. (1995). The changing face of child language assessment: 1985–1995. Child
Language Teaching and Therapy, 11, 7–22.
Janesick, V. J. (1994). The dance of qualitative research design. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of
qualitative research (pp. 209–219). Thousand Oaks, CA: Sage.
Kelley, D. L. (1999). Measurement made accessible: A research approach using qualitative, quantitative and TQM
methods. Thousand Oaks, CA: Sage.
Kemp, K., & Klee, T. (1997). Clinical language sampling practices: Results of a survey of speech-language pathologists
in the United States. Child Language Teaching and Therapy, 13, 161–176.
Kovarsky, D. (1994). Distinguishing quantitative and qualitative research methods in communication sciences and
disorders. National Student Speech Language Hearing Association Journal, 21, 59–64.
Kovarsky, D., Duchan, J., & Maxwell, M. (Eds.). (1999). Constructing (in)competence. Mahwah, NJ: Lawrence Erlbaum
Associates.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Lancy, D. (1993). Qualitative research in education: An introduction to the major traditions. White Plains, NY:
Longman.
Landa, R. M., & Olswang, L. (1988). Effectiveness of language elicitation tasks with two-year-olds. Child Language
Teaching and Therapy, 4, 170–192.
Lee, L. (1974). Developmental sentence analysis. Evanston, IL: Northwestern University Press.
Leonard, L. (1996). Assessing morphosyntax in clinical settings. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.),
Methods for assessing children’s syntax (pp. 287–302). Cambridge, MA: MIT Press.
Leonard, L., Prutting, C. A., Perozzi, J. A., & Berkley, R. K. (1978). Nonstandardized approaches to the assessment of
language behaviors. Asha, 20, 371–379.
Lidz, C. S. (1987). Dynamic assessment: An interactional approach to evaluating learning potential. New York:
Guilford.
Lidz, C. S. (1996, November). Dynamic assessment: Theory, application and research. Handout for seminar presented at
the American Speech-Language-Hearing Association meeting, Seattle, WA.
Lidz, C. S., & Peña, E. D. (1996). Dynamic assessment: The model, its relevance as a nonbiased approach, and its
application to Latin American preschool children. Language, Speech, and Hearing Services in Schools, 27, 367–372.
Linder, T. W. (1993). Traditional assessment and transdisciplinary play-based assessment. In T. W. Linder (Ed.),
Transdisciplinary play-based assessment (pp. 9–22). Baltimore: Brookes Publishing.
Long, S. H. (1991). Integrating microcomputer applications into speech and language assessment. Topics in Language
Disorders, 11, 1–17.
Long, S. H. (1999). Technology applications in the assessment of children’s language. Seminars in Speech and
Language, 20, 117–132.
Long, S. H., & Fey, M. E. (1989). Computerized Profiling Version 6.2 (Macintosh and MS-DOS series) [Computer
program]. Ithaca, NY: Ithaca College.
Long, S. H., Fey, M., & Channell, R. W. (1998). Computerized profiling (CP) [computer program]. Cleveland, OH:
Department of Communication Sciences, Case Western Reserve University.
Long, S. H., & Masterson, J. J. (1993, September). Computer technology: Use in language analysis. Asha, 35, 40–41, 51.
Long, S. H., & Olswang, L. B. (1996). Readiness and patterns of growth in children with SELI. American Journal of
Speech-Language Pathology, 5, 79–85.
Lucas, D. R., Weiss, A. L., & Hall, P. K. (1993). Assessing referential communication skills: The use of a
nonstandardized assessment procedure. Journal of Childhood Communication Disorders, 15, 25–34.
Lund, N. J., & Duchan, J. (1983). Assessing children’s language in naturalistic contexts. Englewood Cliffs, NJ: Prentice-
Hall.
Lund, N. J., & Duchan, J. (1993). Assessing children’s language in naturalistic contexts. (3rd ed.). Englewood Cliffs,
NJ: Prentice-Hall.
Lust, B., Flynn, S., & Foley, C. (1996). Why children know what they say: Elicited imitation as a research method for
assessing children’s syntax. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing children’s syntax
(pp. 55–76). Cambridge, MA: MIT Press.
MacWhinney, B. (1991). The CHILDES project: Tools for analyzing talk. Hillsdale, NJ: Lawrence Erlbaum Associates.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical
case. Journal of Speech and Hearing Disorders, 49, 338–348.
McDaniel, D., McKee, C., & Cairns, H. S. (Eds.) (1996). Methods for assessing children’s syntax. Cambridge, MA: MIT
Press.
McReynolds, L. V., & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Austin, TX:
Pro-Ed.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American
Council on Education and Macmillan.
Miller, J. F. (1981). Assessing language production in children: Experimental procedures. Baltimore, MD: University
Park Press.
Miller, J. F. (1996). Progress in assessing, describing, and defining child language disorder. In K. N. Cole, P. S. Dale, &
D. J. Thal (Eds.), Assessment of communication and language (pp. 309–324). Baltimore: Brookes Publishing.
Miller, J. F., & Chapman, R. (1982). SALT: Systematic analysis of Language Transcripts [computer software]. Madison,
WI: University of Wisconsin-Madison, Waisman Research Center, Language Analysis Laboratory.
Miller, J. F., & Chapman, R. (1998). SALT: Systematic analysis of Language Transcripts [computer software]. Madison,
WI: University of Wisconsin-Madison, Waisman Research Center, Language Analysis Laboratory.
Miller, J., Freiberg, C., Rolland, M. -B., & Reeves, M. (1992). Implementing computerized language sample analysis in
the public schools. Topics in Language Disorders, 12(2), 69–82.
Miller, J. F., & Klee, T. (1995). Computational approaches to the analysis of language impairment. In P. Fletcher & B.
MacWhinney (Eds.), The handbook of child language (pp. 545–572). Oxford: Blackwell.
Miller, J. F., & Paul, R. (1995). The clinical assessment of language comprehension. Baltimore: Brookes Publishing.
Minifie, F., Darley, F., & Sherman, D. (1963). Temporal reliability of seven language measures. Journal of Speech and
Hearing Research, 6, 139–149.
Moellman-Landa, R., & Olswang, L. B. (1984). Effects of adult communication behaviors on language-impaired
children’s verbal output. Applied Psycholinguistics, 5, 117–134.
Mordecai, D. R., Palin, M. W., & Palmer, C. B. (1985). Lingquest 1 [computer software]. Columbus, OH: Macmillan.
Morris, R. (1994). A review of critical concepts and issues in the measurement of learning disabilities. In R. Lyon (Ed.),
Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 615–626).
Baltimore: Brookes Publishing.
Muma, J. (1998). Effective speech-language pathology: A cognitive socialization approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Muma, J., Morales, A., Day, K., Tackett, A., Smith, S., Daniel, B., Logue, B., & Morriss, D. (1998). Language sampling:
Grammatical assessment. In J. Muma (Ed.), Effective speech-language pathology: A cognitive socialization approach
(pp. 310–345). Mahwah, NJ: Lawrence Erlbaum Associates.
Nelson, N. W. (1989). Curriculum-based language assessment and intervention. Language, Speech, and Hearing
Services in Schools, 20, 170–184.
Nelson, N. W. (1994). Curriculum-based language assessment and intervention across the grades. In G. P. Wallach & K.
G. Butler (Eds.), Language learning disabilities in school-age children and adolescents: Some principles and
applications (pp. 104–131). New York: Merrill.
Norris, J., & Hoffman, P. (1993). Whole language intervention for school-age children. San Diego, CA: Singular
Publishing.
Olswang, L. B., & Bain, B. A. (1991). When to recommend intervention. Language, Speech, and Hearing Services in
Schools, 22, 255–263.
Olswang, L. B., & Bain, B. A. (1994). Data collection: Monitoring children’s treatment progress. American Journal of
Speech-Language Pathology, 3, 55–66.
Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production.
Journal of Speech and Hearing Research, 39, 414–423.
Olswang, L. B., Bain, B. A., & Johnson, G. A. (1992). Using dynamic assessment with children with language disorders.
In S. F. Warren & J. Reichle (Eds.), Causes and effects in communication and language intervention (pp. 187–215).
Baltimore: Brookes Publishing.
Panagos, J., & Prelock, P. (1982). Phonological constraints on the sentence productions of language disordered children.
Journal of Speech and Hearing Research, 25, 171–176.
Paul, R., & Shriberg, L. (1982). Associations between phonology and syntax in speech disordered children. Journal of
Speech and Hearing Research, 25, 536–546.
Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Peña, E., Quinn, R., & Iglesias, A. (1992). The application of dynamic assessment methods to language assessment: A
non-biased procedure. Journal of Special Education, 26, 269–280.
Prelock, P. A. (1997). Language-based curriculum analysis: A collaborative assessment and intervention process.
Journal of Children’s Communication Development, 19, 35–42.
Primavera, L. H., Allison, D. B., & Alfonso, V. C. (1996). Measurement of dependent variables. In R. D. Franklin, D. B.
Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 41–92). Mahwah, NJ: Lawrence
Erlbaum Associates.
Pye, C. (1987). The Pye analysis of language [computer software]. Lawrence, KS: Author.
Raver, S. A. (1991). Transdisciplinary approach to infant and toddler intervention. In S. A. Raver (Ed.), Strategies for
teaching at-risk and handicapped infants and toddlers. A transdisciplinary approach (pp. 26–44). New York: Merrill.
Roeper, T., de Villiers, J., & de Villiers, P. (1999, November). What every 5 year old should know: Syntax, semantics
and pragmatics. Presentation to the American Speech-Language-Hearing Association convention, San Francisco.
Rondal, J. A., Ghiotto, M., Bredart, S., & Bachelet, J. F. (1988). Age-relation, reliability, and grammatical validity of
measures of utterance length. Journal of Child Language, 14, 433–446.
Rossetti, L. (1986). High-risk infants: Identification, assessment, and intervention. Boston: College-Hill.
Roth, F., & Spekman, N. (1984). Assessing the pragmatic abilities of children: Part I. Organizational framework and
assessment parameters. Journal of Speech and Hearing Disorders, 49, 2–11.
Salvia, J., & Ysseldyke, J. E. (1981). Assessment in remedial and special education (2nd ed.). Boston: Houghton-Mifflin.
Scarborough, H. S. (1990). Index of productive syntax. Applied Psycholinguistics, 11, 1–22.
Scarborough, H. S., Wyckoff, J., & Davidson, R. (1986). A reconsideration of the relation between age and mean
utterance length. Journal of Speech and Hearing Research, 29, 394–399.
Schiavetti, N. (1992). Scaling procedures for the measurement of speech intelligibility. In R. Kent (Ed.), Intelligibility in
speech disorders: Theory, measurement and management (pp. 11–34). Philadelphia: John Benjamins.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms
suggest new concepts for training. Psychological Sciences, 3, 207–217.
Schraeder, T., Quinn, M., Stockman, I. J., & Miller, J. F. (1999). Authentic assessment as an approach to preschool
speech-language screening. American Journal of Speech-Language Pathology, 8, 195–200.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Secord, W. A. (1981). C-PAC: Clinical Probes of Articulation Consistency. San Antonio, TX: Psychological
Corporation.
Secord, W. A., & Shine, R. E. (1997). Secord-Contextual Articulation Tests. Sedona, AZ: Red Rock Educational
Publications.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals 3. San Antonio, TX:
Psychological Corporation.
Shewan, C., & Slater, S. (1993). Caseloads of speech-language pathologists. Asha, 35, 64.
Silliman, E. R., & Wilkinson, L. C. (1991). Communicating for learning: Classroom observation and collaboration.
Gaithersburg, MD: Aspen Publishers.
Simon, C. (1984). Evaluating communicative competence: A functional pragmatic procedure. Tucson, AZ:
Communication Skill Builders.
Smith, A. R., McCauley, R. J., & Guitar, B. (in press). Development of the Teacher Assessment of Student
Communicative Competence (TASCC) for children in grades 1 through 5. Communication Disorders Quarterly.
Stevens, S. S. (1975). Psychophysics. New York: Wiley.
Stromswold, K. (1996). Analyzing spontaneous speech. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for
assessing children’s syntax (pp. 23–53). Cambridge, MA: MIT Press.
Taylor, S. J., & Bogdan, R. (1998). Introduction to qualitative research methods: A guidebook and resource (3rd ed.).
New York: Wiley.
Templin, M. C. (1957). Certain language skills in children. Minneapolis: University of Minnesota Press.
Terrell, F., Terrell, S. L., & Golin, S. (1977). Language productivity of black and white children in black versus white
situations. Language and Speech, 20, 377–383.
Thornton, R. (1996). Elicited production. In D. McDaniel, C. McKee, & H. S. Cairns (Eds.), Methods for assessing
children’s syntax (pp. 77–102). Cambridge, MA: MIT Press.
Turner, R. G. (1988). Techniques to determine test protocol performance. Ear and Hearing, 9, 177–189.
Tyack, D., & Gottsleben, R. (1974). Language sampling, analysis, and training. Palo Alto, CA: Consulting
Psychologists Press.
Vetter, D. K. (1988). Designing informal assessment procedures. In D. E. Yoder & R. D. Kent (Eds.), Decision making
in speech-language pathology (pp. 192–193). Baltimore: Brookes Publishing.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Edited by M. Cole, V.
John-Steiner, S. Scribner, & E. Souberman. Cambridge, MA: Harvard University Press.
Weiner, F. F. (1988). Parrot Easy Language Sample Analysis (PELSA) [Computer software]. State College, PA: Parrot
Software.
Westby, C. E. (1990). Ethnographic interviewing: Asking the right questions to the right people in the right ways.
Journal of Childhood Communication Disorders, 13, 101–111.
Wilson, K., Blackmon, R., Hall, R., & Elcholtz, G. (1991). Methods of language assessment: A survey of California
public school clinicians. Language, Speech, and Hearing Services in Schools, 22, 236–241.
World Health Organization. (1980). ICIDH: The International Classification of Impairments, Disabilities, and
Handicaps. Geneva, Switzerland: World Health Organization.
World Health Organization. (1998). Toward a common language for functioning and disablement: ICIDH-2: The
International Classification of Impairments, Activities and Participation. Geneva, Switzerland: World Health
Organization.
CHAPTER 11

Examining Change: Is This Child’s Language Changing?

The Nature of Examining Change
Special Considerations for Asking This Clinical Question
Available Tools
Practical Considerations
David is 8 years old and was diagnosed at the age of 7 with a fatal form of a genetic neurodegenerative disease,
adrenoleukodystrophy. He had developed normally until about age 6½, when he began showing signs of clumsiness and
behavior problems that had initially been attributed to the stresses of a cross-country move and beginning first grade.
Currently, he follows simple verbal directions with some consistency but rarely speaks. His family is interested both in
his current level of comprehension and in information about the rate at which his communication skills are declining so
that they can facilitate his participation in the family and plan for his ongoing care.
Tamika, a 5-year-old girl with specific expressive language impairment, has been seen for treatment since age 3.
Initially her treatment was aimed at increasing the frequency and intelligibility of single word productions; more recent
goals have focused on her use of grammatical morphemes and monitoring comprehension of directions. In her efforts to
adjust Tamika’s treatment and monitor her overall progress, Tamika’s speech-language pathologist uses periodic
standardized testing along with frequent
informal probes, including probes of treated, generalization, and control items. The speech-language pathologist is
concerned about her ability to assess the true impact of treatment on Tamika’s social communication with peers and
family members because Tamika’s family speaks Black English, whereas the clinician does not. She would like to find an
appropriate assessment strategy to help document Tamika’s ongoing communication skills.
The five certified speech-language pathologists working within a small Vermont school district are eager to demonstrate
the efficacy of their work with school-age children because of concerns about cutbacks in neighboring special education
budgets. They decide to participate in ASHA’s National Outcomes Measurement System and begin collecting data for
each of their students. In addition, because of their commitment to improving the quality of their practice, they also
decide to use a computerized language sampling system with all of their preschool and first grade children with
language problems.
The Nature of Examining Change
The examination of change in children’s language disorders actually encompasses a fairly large number of related
questions—Is this child’s overall language changing? What aspects in particular are changing? Is observed change likely
to be due to treatment rather than to maturation or other factors? Should a specific treatment be continued, or has
maximum progress been made? Should termination of treatment occur? How effective is this particular clinical practice
group in achieving change with the children it serves? These assessment questions present some of the most challenging
issues facing speech-language pathology professionals (e.g., Diedrich & Bangert, 1980; Elbert, Shelton, & Arndt, 1967;
Mowrer, 1972; Olswang, 1990; Olswang & Bain, 1994).
Described with regard to a single child, methods used to examine change will fuel decisions regarding how the child
moves through a given treatment plan, whether alternative treatment strategies should be explored, and, finally, whether
treatment should be terminated. Providing a more formal categorization, Campbell and Bain (1991) drew on the
framework of Rosen and Proctor (1978, 1981) to describe three dimensions or kinds of change: ultimate, intermediate,
and instrumental.
Ultimate outcomes constitute grounds for ending treatment, and they should be established at the initiation of treatment.
They are similar to long-term treatment objectives, with levels of final expected performance defined in terms of “age
appropriate, functional, or maximal communicative effectiveness” (Campbell & Bain, 1991, p. 272). Modification of an
ultimate outcome might occur. For example, a functional outcome level might initially be set for a child because of
expectations that performance at a level with same-age peers was unrealistic. However, if treatment data suggested
otherwise, a revision in outcome level would be appropriate (Campbell & Bain, 1991).
Intermediate outcomes were seen by Campbell and Bain (1991) as more specific and numerous for a given client. They
relate to individual behaviors that must be acquired in order for the ultimate outcome to be achieved and for progression
through
a given hierarchically arranged treatment to occur. Data from treatment tasks within a session are given as an example of
such data.
Instrumental outcomes illustrate the likelihood that additional change will occur without additional treatment (Campbell
& Bain, 1991). Data documenting generalization fit into this third category. Campbell and Bain acknowledged that this
type of outcome is challenging to identify because of the difficulty in knowing at what point evidence of generalization
reliably predicts improvement towards ultimate outcomes.
The feature that most complicates the assessment of change in children is that children’s behavior is characterized by
change stemming from a variety of sources, most of which are related to growth and development. With few exceptions,
children—even those with quite significant difficulties—are benefiting from developmental advances that enhance their
communication skills. Sometimes change occurs broadly and sometimes in some areas more than others. Even children
who have sustained severe brain damage during early childhood will experience developmental benefits as well as the
physiological benefits of biological recovery. Only a few exceptions to this upward trend exist—for example, in children
with very severe neurologic damage or with neurodegenerative disease and in children who tend to regress in
performance when therapy is withdrawn (e.g., some children with developmental dyspraxia of speech or mental
retardation). In all cases, however, the speech-language pathologist’s assessment of whether change is occurring and
why it is occurring must be gauged on a terrain that is rarely flat and is sometimes a series of foothills.
Clinical questions involving change make use of many of the same types of measures discussed in chapters 9 and 10 and
often examine similar issues across the added dimension of time. Nonetheless, despite their importance for work with
children with language disorders, at least until recently such questions have generally received less attention than
questions related to screening, identification, or description at a given point in time. Thankfully, a variety of external
factors affecting clinical practice described in preceding chapters, such as the demand for greater accountability in
schools and hospitals, are helping to encourage and even mandate greater research attention to the assessment of change
(Frattali, 1998b; Olswang, 1990, 1993, 1998).
Once, broad questions regarding the value of treatment approaches lay principally within the purview of researchers,
who conducted treatment efficacy research in highly controlled conditions. Over the past decade, however, concerns
about accountability have caused individual professionals in speech-language pathology to become more active in
collecting and using such data as well (Eger, 1988; Eger, Chabon, Mient, & Cushman, 1986). The primary emphasis on
evidence obtained in tightly controlled conditions has been shifted to include emphases on evidence obtained under the
very conditions in which treatment is typically conducted—data that are typically referred to as outcomes.
In this chapter, the specific considerations affecting the assessment of change in clinical practice are addressed, followed
by the special considerations relating to tools that are available to address this issue. Finally, practical considerations
related to outcome assessment are discussed for the ways in which they shape professional practices in this area of
assessment.
Special Considerations for Asking This Clinical Question
At least four special concerns complicate the process of answering clinical questions regarding change: (a) identifying
reliable, or real, change; (b) determining that the change that is observed is important; (c) determining responsibility for
change; and (d) predicting the likelihood of future change (Bain & Dollaghan, 1991; Campbell & Bain, 1991; McCauley
& Swisher, 1984; Schwartz & Olswang, 1996). These concerns affect both global inferences regarding a child’s overall
progress—ultimate outcomes as well as the more specific decisions involved in specific treatment goals—intermediate
and instrumental outcomes (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1996).
Identification of Reliable and Valid Change
Because examination of change depends on a comparison of measurements made on at least two occasions, reliability in
the measurement of change is no more certain than the reliability of a single measurement. In fact, there is every
indication that it is less so (McCauley & Swisher, 1984; Salvia & Ysseldyke, 1995). In order to get an idea of the effect
of measurement error on the examination of change, consider the case of a child whose score on a specific measure taken
4 months apart changes from 15 to 30, where 80 is the highest possible score. Initially, this change would appear to be
cause for some degree of celebration—more restrained if you looked just at the number of points gained out of the
number possible; less restrained if you looked at the fact that the child had doubled his score. However, once you remind
yourself that measures vary in their reliability (sometimes quite wildly), you realize that more information is needed
before party invitations can be sent out. Depending on the reliability of the measure, each observed score could fall quite
far off the mark of the test taker’s real score, with unfortunate consequences for the believability of observations about the
difference between the two testings. The difference between these two scores could be described as a difference score or,
more frequently in this kind of situation, a gain score.
In fact, gain scores are often less reliable than the measures on which they are based (Mehrens & Lehmann, 1980; Salvia
& Ysseldyke, 1995). Although concerns about gain scores are typically expressed in relation to standardized norm-
referenced measures, they apply equally to other quantitative measures. The nature of the measure used in the preceding
example was intentionally ambiguous in order to emphasize that point.
The advantage of some standardized norm-referenced tests is the availability of information allowing one to estimate the
risk of error associated with individual gain scores. Using the standard error of measurement and methods like those used
to examine difference scores when they occur in profiles, it is possible to examine the likelihood that a difference score
is reliable (Anastasi, 1982; Salvia & Ysseldyke, 1995). Indeed, some tests include graphic devices on their scoring sheets
that will help users determine whether a difference is likely to be reliable. However, there is still reason to believe that
numerous norm-referenced tests continue to fail to provide this information for users (Sturner et al., 1994).
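The logic of using the standard error of measurement (SEM) to judge whether a gain score reflects real change can be sketched in a few lines. The standard error of a difference between two scores is the square root of the sum of the squared SEMs of the two testings; the numbers below are hypothetical, and the 1.96 criterion simply corresponds to a conventional 95% confidence level.

```python
# A sketch of one way to judge whether a gain score is reliable, using the
# standard error of measurement (SEM). The scores and SEM are hypothetical.
import math

def reliable_gain(score1, score2, sem, z=1.96):
    """True if the observed gain exceeds what measurement error alone would explain."""
    # Standard error of a difference score; the same SEM is assumed at both testings.
    se_diff = math.sqrt(sem**2 + sem**2)
    return abs(score2 - score1) > z * se_diff

print(reliable_gain(70, 81, sem=3))  # True: a gain of 11 exceeds 1.96 * 4.24
print(reliable_gain(70, 75, sem=3))  # False: a gain of 5 does not
```

A gain that fails this check may still be real, of course; the check only indicates that error alone could plausibly account for it.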
The problem facing norm-referenced instruments is shared, and often intensified, by informal measures: Informal
quantitative measures will almost never provide that information. Thus, additional strategies are
needed for providing evidence of reliability—that is, evidence that a measure is likely to be consistent over short periods
of time, when used by different clinicians, and so forth—and is thus able to reflect real change, rather than error, when it
occurs. As you will see later in this chapter, single subject designs constitute the most powerful of these strategies.
As a sophisticated observer of psychometric properties, you may be waiting for the other shoe to drop—the validity
shoe. Although it might be possible for developers of highly developed standardized measures to study the ability of
their measure to capture significant change as a form of criterion-related validity evidence, they almost never do so.
Instead, for most measures in speech-language pathology and other applied behavioral sciences as well, the examination
of validity has been couched in terms of discussions of “importance”: Is observed change that appears to be reliable also
important?
Determining That Observed Change Is Important
Issues about the importance of change can be complex. They include questions such as, Is the change large enough to be
significant? and Is the nature of the change such that it is likely to affect the child’s communicative and social life?
These are some of the questions that Bain and Dollaghan (1991) explored under the notion of clinically significant
change.
A number of complementary indicators of “importance” have been put forward. The most important of these are (a)
effect size—Did much happen? (Bain & Dollaghan, 1991); (b) social validation—Did it make a difference in this
person’s communicative life? (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Kazdin, 1977, 1999; Schwartz &
Olswang, 1996); and (c) the use of multiple measures (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz &
Olswang, 1996).
Effect Size
In the statistical and research design literature, a distinction is made between statistical significance and substantive
importance, or meaningfulness. That distinction, although often overlooked by researchers who focus on statistical
significance as if it were the holy grail (Young, 1993), is a valuable one for our thinking about the clinical importance of
change we observe in children. Effect size, which refers to the magnitude of difference observed, is frequently discussed
in relation to substantive importance, or clinical significance, and is discussed at some length later in this section.
Statistical significance is a relatively straightforward concept. Specifically, when a research finding is statistically
significant, a statistical test has suggested that the finding is unlikely to have occurred by chance, that it is rare (Pedhazur
& Schmelkin, 1991). More complex, however, is the matter of determining whether a statistically significant finding is
meaningful, that is, whether it says anything important about the matter under study (Pedhazur & Schmelkin, 1991). A
term frequently used to refer to the meaningfulness or substantive importance of a difference to clinical decision making
is clinical significance (Bain & Dollaghan, 1991; Bernthal & Bankson, 1998). Other terms applied to this concept in
the rich psychological literature on the topic include social validity, clinical importance, qualitative change, educational
relevance, ecological validity, and cultural validity (Foster & Mash, 1999).
A research example using a difference between two groups at a single point in time can help illustrate the distinction
between statistical significance and substantive importance. In a research study one might compare the performance of
two groups on a given test with 100 items and find that the two groups differed in their performance by just 2 items.
Further, the difference might be shown to be statistically significant. Despite the statistical significance, however, most
observers, if aware of the size of the difference, would consider a difference of just 2 points to merit no more than a yawn
—no matter how much verbal arm waving the researcher in question might use to inspire interest. In contrast, if a much
larger difference had been obtained and found to be statistically significant, most observers would be moved to rapt
attention, having been persuaded that the basis for group assignments had at least some sort of important relationship to
the subject covered by the test.
Using an analogous clinical example, one can imagine achieving a very consistent result when using a particular
treatment with a given child—for instance, Tamika, from the introduction of this chapter. Perhaps Tamika makes gains
of one or two items on untreated probes that are used over the course of a semester to monitor her progress in the use of
grammatical morphemes. That relatively high consistency (or reliability) of change, however, would probably not please
you (or Tamika) and would probably send you scrambling to find an alternative, more effective intervention strategy.
The clinical significance of change observed for Tamika simply would not warrant contentment with the current
treatment.
Effect size, which can be measured in a variety of ways, generally refers to the magnitude of the difference between two
scores or sets of scores, or of the correlation between two sets of variables (Pedhazur & Schmelkin, 1991). Authors
regularly suggest that researchers in speech-language pathology and elsewhere appear to fixate on statistical significance
at the expense of effect size or other measures that are more amenable to decisions about the value of information to
decision making (e.g., Pedhazur & Schmelkin, 1991; Young, 1993). Because information about the reliability of
difference scores is difficult and often impossible to come by for the measures clinicians use to examine change,
clinicians and their constituents are much more likely to want to inspect the actual magnitude of changes with an eye
toward its clinical meaning. Effect size cannot, however, be the sole basis for determining the meaning of a particular
difference because other factors will need to be taken into account (e.g., the social significance of the difference, the
likely generalizability of the difference). However, it can be an important element in that process (Bain & Olswang,
1995).
Bain and Dollaghan (1991) described a couple of strategies for looking at effect size. One of these strategies uses
standard scores, takes into account the absolute amount of change that has occurred, and is therefore primarily limited to
use with norm-referenced standardized measures. The other uses age-equivalent scores, looks at the relative size of
change, and is subject to the vagaries associated with that inferior method of characterizing performance.
Using standard scores to examine change, Bain and Dollaghan (1991) noted that the amount of change can be expressed
in terms of standard deviation units and compared against an arbitrary standard. Thus, a difference might be considered
of practical significance if it met or exceeded a change of so many standard deviation units—with those authors citing 1
standard deviation as a frequently used standard. For instance, imagine that at Time 1, a child receives a standard score
of 70 on a test with a mean of 100 and standard deviation of 10. Then, at Time 2, the child receives a score of 81 on that
same test. The amount of change would be considered of clinical significance because it corresponded to slightly more
than one standard deviation.
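The arithmetic of this standard-score strategy is simple enough to sketch directly. The scores below are the hypothetical values from the example above, and the 1-standard-deviation criterion is the frequently used (but arbitrary) standard just described.

```python
# A minimal sketch of the standard-score strategy for examining change.
# Scores (70 at Time 1, 81 at Time 2) and the SD of 10 are the hypothetical
# values from the text; the 1-SD criterion is one commonly cited standard.

def change_in_sd_units(score_time1, score_time2, sd):
    """Express a gain score in standard deviation units of the test."""
    return (score_time2 - score_time1) / sd

gain = change_in_sd_units(70, 81, sd=10)
print(round(gain, 2))   # 1.1
print(gain >= 1.0)      # True: the gain meets a 1-SD criterion
```

Note that this expresses only the size of the change; whether that change is reliable, and whether it matters to the child, remain separate questions.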
As long as the measure that is being used has been carefully selected for its validity for the given child and content area,
this method seems a reasonable one for many purposes. In particular, its use is strengthened if the time period
encompassed by the comparison results in a comparison against a single normative subgroup. Specifically, if a child’s
performance can be compared with just a single normative subgroup over time (e.g., all of the children age 5 years, 1
month to 6 years), then the extra variability introduced by comparing his or her first performance with one set of children (e.
g., the children from 5 years to 5 years, 6 months) and then with another (e.g., the children from 5 years, 7 months to 6
years) can be avoided.
The use of standard scores is also preferable to the same method applied using age-equivalent scores and a cutoff
established around a certain age-equivalent gain (Bain & Dollaghan, 1991) because of the poor reliability of such scores
(McCauley & Swisher, 1984). Admittedly, at this point, selection of the cutoff in this strategy using standard scores is
arbitrary—how much change should be regarded as clinically significant can serve as a point of considerable argument.
However, additional research by test developers and others could validate specific levels in a manner quite analogous to
that proposed for cutoffs used in other areas of clinical decision making (Plante & Vance, 1994).
The Proportional Change Index (PCI), the alternative strategy for examining effect size described by Bain and Dollaghan
(1991), provides a relative measure of change arising from the work of Wolery (1983). The measure is relative in the
sense that it attempts to examine the rate of change characteristic of the child’s behavior for the period before treatment
as compared with the rate observed during treatment. Specifically, the PCI is the proportion created when the child’s
rate of development during intervention is divided by the child’s preintervention rate of development. The
preintervention rate of change is estimated by dividing the child’s age-equivalent score on a measure taken just before
the beginning of treatment by his age in months. The rate of development during intervention is estimated by dividing
the gain score obtained for that measure when it is readministered after a period of treatment by the duration of
treatment. For a child whose behavior is being monitored over time without intervention, the measure might be used to
compare the rate of change before the observation period with that observed during it. The merit of this particular
measure is that it “takes into account the number of months actually gained, the number of months in intervention [or
observation] and the child’s rate of development at the pretest date” (Wolery, 1983, p. 168). Figure 11.1 illustrates the
calculation of PCI for two children: Shana, who shows excellent gains in receptive vocabulary, with twice as much
progress in treatment as prior to treatment; and Jason, who shows progress in receptive vocabulary acquisition that is no
better in treatment than it had been prior to treatment.

Fig. 11.1. A hypothetical example showing the calculation of the Proportional Change Index (Bain & Dollaghan, 1991; Wolery, 1983) for two children.
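The PCI calculation just described can be sketched as follows. The rates are those defined in the text (preintervention rate = pretest age-equivalent score divided by chronological age; intervention rate = age-equivalent gain divided by months of treatment); the specific numbers are illustrative assumptions in the spirit of Fig. 11.1, not data taken from the figure.

```python
# A sketch of the Proportional Change Index (Wolery, 1983), as described in the
# text. All ages and gains are in months; the example values are hypothetical.

def pci(pretest_age_equiv, chron_age_at_pretest, gain_age_equiv, months_in_treatment):
    """Rate of change during intervention divided by the preintervention rate."""
    pre_rate = pretest_age_equiv / chron_age_at_pretest        # e.g., 24/36
    treatment_rate = gain_age_equiv / months_in_treatment      # e.g., 12/9
    return treatment_rate / pre_rate

# A "Shana-like" case: age-equivalent of 24 months at a chronological age of
# 36 months, then a 12-month age-equivalent gain over 9 months of treatment.
print(round(pci(24, 36, 12, 9), 2))  # 2.0: twice the pretest rate of change
```

A "Jason-like" case would yield a value near 1.0, since the rate during treatment matches the preintervention rate.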
If the two rates of change used in the equation for PCI are similar, the calculated value for PCI will approach a value of
one. On the other hand, if treatment or other factors have accelerated development, the PCI should exceed one, with
larger PCIs indicating greater acceleration. Thus, for example, a PCI of 3 would imply that change had occurred three
times as quickly during treatment as preceding it. Alternatively, a PCI of .5 would suggest that change had occurred at
half the rate during the treatment or observation period as preceding it.
As described earlier, the PCI is usually recommended for its utility in examining change during a period of intervention
in which positive change is expected. Nonetheless, it might also be used if one were interested in examining alterations
in rates of change occurring under conditions like those described for David at the beginning of the chapter. Recall that
David had been diagnosed with a neurodegenerative disease that was predicted to result in skill loss. It might also be
used under conditions in which problems in development were suspected (as in the case of a suspected “late talker”), but
the child’s clinician had opted for a watch-and-see strategy with a planned 6-month reevaluation.
Bain and Dollaghan (1991) noted that the PCI rests on two problematic assumptions, with the first being that change in
children’s skills occurs at a constant rate in the absence of intervention. A plausible alternative to this assumption is that
change may occur at varying rates during development—with children’s behaviors sometimes racing ahead, sometimes
holding steady, and sometimes, perhaps, even regressing for a time. The problem with the assumption of constant change
embodied in the PCI is addressed to some extent by the use of single subject designs, a specific method that is described
in greater detail later in the chapter. Single subject designs escape this assumption through the clinician’s active
examination of change patterns during periods in which intervention is not occurring as well as when it is. Thankfully,
too, the question of whether change is constant can be addressed empirically. Although additional information is needed
to determine the extent to which this assumption is tenable, efforts to examine patterns of change are underway and
suggest that over shorter time periods the assumption of a constant rate of change is probably false (Diedrich & Bangert,
1980; Olswang & Bain, 1985).
The second problematic assumption of the PCI lies in its use of age-equivalent scores and the temptation that it presents
for clinicians to use tests that present such scores without much in the way of empirical support—either for the age-
equivalent scores or for the test in its entirety. Bain and Dollaghan (1991) acknowledged this potential drawback and
implicitly recommended that clinicians should search for the highest quality measures to use for documenting change.
However, they also suggested that in the absence of such measures, the PCI may offer a better alternative than the simple
assumption that a gain in age-equivalent scores over time represents progress.
An additional limitation affecting the PCI is the need for users to adopt an arbitrary basis for determining when a certain
amount of change is sufficient to support the use of time and other resources required to achieve a particular gain. Thus
far, no measure described herein or proposed elsewhere has been able to claim a rational basis for its particular standard or cutoff.
In principle, then, the two measures of effect size that I have described (standard score gain scores and PCI) seem to
represent strong contenders for use in decisions about the importance of observed change—both for change observed
during treatment or for change observed over a period of time in which intervention is not used but a child’s performance
is monitored. However, additional research is needed to validate their use in decision making, particularly in the case of
the PCI in which the strength of the logic behind the measure is undermined by its dependence on age-equivalent scores.
I also call readers’ attention to the fact that both of these methods will more readily be implemented for standardized
norm-referenced tests than for other types of measures that might be used to describe a child’s language.
Social Validation
In examining the importance of change, clinicians are almost always interested in considering whether observed changes
conform to theoretical expectations, especially developmental expectations, that imply a hierarchy of learning in which
some behaviors are seen as prerequisites to others (Bain & Dollaghan, 1991; Lahey, 1988). Put differently, clinicians are
interested in determining whether the child has made gains that theoretically appear to be movements along the “right”
path. Gains on those behaviors that are seen as precursors to further advancement are judged to be more important than
those that are not.
Additionally, clinicians have always valued and sometimes solicited family and teacher reports asserting progress as de
facto evidence that change has occurred and is important. This way of thinking about the importance of language change
falls under the term social validation. Social validation also complements the use of effect size in fostering the richest
possible conceptualization of “importance.” Acknowledging that such evidence has value is consistent, first of all, with
an appreciation that the functional and social effects of communication disorders warrant greater incorporation into
clinical practice (Frattali, 1998b; Goldstein & Geirut, 1998; Olswang & Bain, 1994).
In a different context (discussing research significance as opposed to clinical significance), Pedhazur and Schmelkin
(1991) offered a quotation from Gertrude Stein: “A difference in order to be a difference must make a difference” (p.
203). If rephrased slightly, this quotation also seems to speak to efforts to examine the importance of change in
children’s language: For change in a child’s language to be significant, it must make a difference in the child’s life.
Use of measures to examine the functional and social impact of change is also consistent with the growing appreciation
of qualitative data described in the last chapter. Because qualitative data are unapologetically subjective in nature
(Glesne & Peshkin, 1992), they may be used very effectively—more effectively than reams of quantitative data—to
address questions related to the social context supporting and affecting a child and to how the child is viewed in that
context. Over the past few decades, quantitative as well as qualitative measures have received growing attention for the
purpose of assessing function and social impacts of treatment (Bain & Dollaghan, 1991;
Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Koegel, Koegel, Van Voy, & Ingham, 1988; Olswang & Bain,
1994; Schwartz & Olswang, 1996).
Kazdin (1977) described a process by which such measures can be used to look at the importance of behavioral change.
In particular, he focused on behavioral change achieved through applied behavior analysis and based his work on that of
Wolf and his colleagues (e.g., Maloney et al., 1976; Minkin et al., 1976; Wolf, 1978). Kazdin defined social validation as
the assessment of “the social acceptability of intervention,” where such acceptability could be assessed with regard to
intervention focus, procedures, and—importantly for this discussion—behavior change. More recently, he has defined
clinical significance as “the practical or applied value or importance of the effect of an intervention—that is, whether the
intervention makes a real (e.g., genuine, palpable, practical, noticeable) difference in everyday life to the clients or others
with whom the clients interact” (Kazdin, 1999, p. 332). Although Kazdin and numerous other authors working in the
area of clinical psychology (e.g., Foster & Mash, 1999; Jacobson, Roberts, Berns, & McGlinchey, 1999; Kazdin, 1999)
have continued to elaborate on the concepts outlined in Kazdin (1977), basic issues raised in that earlier work remain
relevant. In particular, this relevance derives from the lack of empirical validation supporting many of the highly
developed measures of clinical significance proposed in the clinical psychology literature (Kazdin, 1999).
Kazdin (1977) recommended two general approaches to the social validation of behavior change that have been
embraced by a number of researchers in child language disorders—social comparison and subjective evaluation (Bain &
Dollaghan, 1991; Campbell & Bain, 1991; Campbell & Dollaghan, 1992; Olswang & Bain, 1994; Schwartz & Olswang,
1994). Social comparison involves comparisons, conducted pre- and post-intervention, between behaviors exhibited by the
child receiving intervention and those of a group of same-age peers who are unaffected by language impairment
(Campbell & Bain, 1991). Astute readers will find this method reminiscent of a normative comparison. However, instead
of comparisons on a standardized measure against a relatively large group of ostensible “peers,” here the child’s
performance on a more informal measure (usually a clinician-designed probe) is compared against that of a relatively
small group of actual peers. The value of this technique will certainly be affected by the care taken to choose a
representative, if small, comparison group. In addition, it may also prove most valuable in cases where a norm-
referenced comparison using a larger group is unavailable because no appropriate measures or appropriate normative
samples exist for the targeted behavior and particular client.
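A social comparison of this kind can be reduced to a simple computation once probe data have been collected from the child and the peer group. The following sketch uses a criterion of one standard deviation around the peer mean; that criterion, like the function name and the numbers, is an illustrative assumption rather than a published standard.

```python
from statistics import mean, stdev

def within_peer_range(child_score, peer_scores, sd_band=1.0):
    """Return True when the child's probe score falls within sd_band
    standard deviations of the mean of a small group of typically
    developing peers.

    With very small peer groups, comparison against the observed minimum
    and maximum may be preferable to a standard-deviation band.
    """
    m, s = mean(peer_scores), stdev(peer_scores)
    return abs(child_score - m) <= sd_band * s

# Post-intervention: the child produces 72% of probe items correctly;
# four classroom peers produce 70, 78, 82, and 75% correct
within_peer_range(72, [70, 78, 82, 75])
```

As the text emphasizes, the value of such a comparison rests entirely on how carefully the small peer group has been chosen.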
Subjective evaluation involves the use of procedures designed to determine whether individuals who interact frequently
with the child see perceived changes as important (Kazdin, 1977). Methods that have been proposed for these purposes
in speech-language pathology range from quite informal to relatively sophisticated. Thus, for example, at the informal
end of the continuum, it has been suggested that parents, teachers and other adults who are familiar with the child be
asked to appraise the adequacy of a child’s performance following a period of intervention (Bain & Dollaghan, 1991;
Campbell & Bain, 1991). Clearly these data may be qualitative in nature (Olswang & Bain, 1994; Schwartz & Bain,
1995) and would benefit from the clinician’s use of triangulation with other sources, as discussed in the previous chapter,
thus implying the use of multiple measures. This is consistent with the idea emphasized in Kazdin’s (1999) recent work,
that “clinical significance invariably includes a frame of reference or perspective” (p. 334).
A more intermediate level of complexity might involve use of an existing rating scale, such as the Observational Rating
Scales of the Clinical Evaluation of Language Fundamentals—3 (Semel, Wiig, & Secord, 1996), in which a similar rating
scale is completed by the child, the parent(s), and a classroom teacher. The growing interest in the development of
functional measures for use with children in school settings will certainly provide many new alternatives of this kind.
Addition of this type of measure to the very detailed measures of progress being used for Tamika may not only provide
strong evidence of functional impact, but may also help reduce possible bias in the assessment of progress achieved by a
child who speaks a dialect usually underrepresented in standardized measures.
A higher level of complexity in the use of subjective evaluation would involve the use of a panel of naive listeners who
could be asked to use a rating strategy such as direct magnitude estimation to make judgments about some aspect of the
communicative effectiveness of a child’s productions. Campbell and Dollaghan (1992) described the use of a 13-person
panel that was asked to rate the informativeness (“amount of verbal information conveyed by a speaker during a
specified period of spontaneous language,” p. 50) of utterances produced by nine children with brain damage and their
controls. This example of social validation is particularly complex given that Campbell and Dollaghan applied a hybrid
method that used both social comparison and subjective evaluation components. Although methods as complex as these
are probably not practical in many clinical settings, they provide a valuable illustration of how flexible social validation
procedures can be.
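Because direct magnitude estimates are ratio-scale judgments, they are conventionally summarized with geometric rather than arithmetic means. A sketch of that aggregation step follows; the panel size of 13 echoes Campbell and Dollaghan’s study, but the ratings themselves are invented for illustration.

```python
from math import prod

def geometric_mean(ratings):
    """Summarize one sample's direct magnitude estimates across a listener
    panel; all ratings must be greater than zero."""
    return prod(ratings) ** (1 / len(ratings))

# Thirteen listeners rate the informativeness of a pre- and a post-treatment
# language sample, each relative to a standard sample assigned a value of 100
pre = geometric_mean([60, 55, 70, 65, 50, 75, 60, 58, 66, 72, 54, 61, 68])
post = geometric_mean([90, 85, 110, 95, 100, 88, 105, 92, 99, 87, 96, 103, 91])
# A post-treatment geometric mean above the pre-treatment mean suggests that
# listeners heard the later sample as more informative
```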
In summary, social validation methods add greatly to our estimation of how important an observed change is. In
particular, they can help us see how observed differences make a change in a child’s communicative and social functions
and opportunities. They vary dramatically in terms of their complexity and sophistication. Further, their applicability to
qualitative as well as quantitative data, including data from informal measures, is an especially attractive feature.
Use of Multiple Measures
The augmentation of measures designed to directly assess linguistic behaviors with measures intended to provide social
validation constitutes one very important way in which multiple measures may be used to enhance our ability to tease out
the contribution of treatment to change. However, the kinds of multiple sources of data recommended by clinical
researchers do not stop there (Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz & Olswang, 1996). They
extend to considering the value of multiple indicators in helping one best address the construct of interest—an idea that
was introduced in Fig. 2.2 and in chapter 2. Whether the construct is one related to a particular linguistic skill or to a
child’s communicative function within a given setting, there is general agreement that making use of several measures
can best support conclusions about the construct under consideration.
Writing from a research perspective, Primavera, Allison, and Alfonso (1996) noted that Cook and Campbell (1979)
introduced the idea of multioperationalism into behavioral research, in which a construct is operationalized using as
many indicators as possible in order to truly capture its essence. In a similar vein, Pedhazur and Schmelkin (1991)
offered a detailed account explaining why the use of a single indicator of a construct “almost always poses
insurmountable problems” (p. 56) related to knowing to what extent the indicator reflects the construct rather than error.
Whereas researchers may have greater opportunities and rewards for practicing multioperationalism, clinicians, too, can
benefit from its application. When a clinician uses a single measure (e.g., a single test of receptive vocabulary) to support
conclusions about a construct (e.g., receptive language), both the clinician and his or her audience either immediately
feel skeptical that the part (receptive vocabulary) represents the whole (receptive language) or should feel skeptical if
they give it much thought. Even if conclusions are limited to those about receptive vocabulary, however, a quick
reminder about the nature of most such tests—that they frequently address only pictureable nouns—should cause the
clinician to pause. Clearly, the single indicator seems unlikely to capture the construct of interest. The time demands of
clinical practice can sometimes make the collection of even one measure seem onerous and the idea of multiple measures
an author’s fantasy and clinician’s nightmare. However, becoming aware of the value of such measures may help
clinicians decide to take the extra time and provide support for that decision in select cases. Further, in cases where the
use of multiple measures has not seemed practical, it can help lead to more limited and therefore more valid
interpretations.
In this section, three principal strategies for examining the importance of change were briefly introduced: effect size,
social validation, and the use of multiple measures. Authors such as Bain, Campbell, Dollaghan, and Olswang have begun to
venture deep into the literatures of related disciplines to explore this relatively new territory for the resources it might
contribute to measurement in communication disorders. Given the value of their work to date, their efforts will
undoubtedly continue and be joined by those of others who respond to recent calls for more persuasive evidence that
speech-language pathology services make a difference for children with communication disorders.
Determining Responsibility for Change
Whereas determining the extent to which change in language has occurred and determining its importance are closely
related tasks, verifying the clinician’s contributions to that change is an altogether different and more daunting task.
Granted, simply noting the extent to which change has occurred and its nature can be useful in instances where no
intervention has taken place—for example, in cases where a child’s development is being monitored because of
suspicion that the child is a late talker. More commonly, assessment of change for children in treatment involves cases
where all stakeholders are comfortable with the unexamined assumption that change will be primarily the result of
intervention efforts. However, there are times when demonstrating that treatment is responsible for observed changes is
crucial. In this era of growing attention to accountability and quality assurance, these times are becoming more common
(Eger, 1988; Frattali, 1998a, 1998b).
The difficulty in pinning down causal explanations for human behavior or behavior change is a driving force behind
developments in psychology and related disciplines over the past 100 years. Again and again, the problem with
determining causality seems to be ruling out alternative explanations in cases where stringent control over potential
causes is either not possible or not ethical. Treatment for language disorders in children presents the classic difficulty in
this regard. The possibility of factors other than treatment—such as development, environmental influences, and changes
in the child’s physiology through recovery from a disease process or trauma—makes it very difficult to identify
treatments or indirect management strategies as having “caused” gains that are seen in a child’s performance.
At least two design elements have provided a logical basis for increasing the plausibility that gains in performance seen
while a child is undergoing treatment are attributable to treatment rather than to alternative explanations. These two
elements are repeated observations over a period of time prior to the onset of treatment and the use of treatment,
generalization, and control probes. Both of these elements have been incorporated into the framework of research known
as single subject experimental design (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds &
Kearns, 1983). In addition, each has been identified separately as a means of enhancing support for treatment as a causal
factor in cases of behavioral gains (Bain & Dollaghan, 1991; Campbell & Bain, 1991; Olswang & Bain, 1994; Schwartz
& Bain, 1996).
Pretreatment Baselines
The use of multiple observations over a period of time prior to the initiation of treatment is frequently referred to as a
baseline or the A condition in a single subject experimental design. Multiple observations function as a window into the
stability of the behavior and the measure used to characterize it. If little variation is observed, it seems most likely that
the behavior is not changing and that the measure being used to track the behavior is not introducing error (i.e., that it is
probably reliable). This means that departures from stability observed after the onset of treatment can be more readily
attributed to treatment than to either the instability of the behavior being measured or to measurement error. The
presence of stability during baseline observations might alternatively be interpreted as suggesting that the behavior being
measured and the measure being used for that purpose are varying in ways that cancel each other out—a most unlikely
prospect.
In contrast, when considerable variation is observed, it can be difficult to determine which of the two possible sources of
variation (change in the behavior vs. error in the measurement) is the culprit. Consequently, as a rule, baselines are
easiest to interpret and they provide the strongest support for observing changes that might occur under conditions such
as treatment, when they are sufficiently lengthy, show no obvious trends, and appear to be stable (McReynolds &
Kearns, 1983). With regard to length, three observations are often cited as a minimum (McReynolds & Kearns,
1983), with longer baselines required if the behavior shows a trend or other lack of stability. The presence of a trend
(consistent increase or decrease in data values in the direction of expected change with treatment) can be problematic, as
can lack of stability in
which both increases and decreases in a specific measure are noted. Because stability is a relative quality, we again are in
a position of looking toward expert advice to help us agree on an acceptable range of variation. McReynolds and Kearns
(1983) pointed to a historic standard of 5 to 10%. However, they noted that lower levels of stability achieved during a
baseline will simply necessitate greater amounts of change to justify claims of effective treatment.
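The historic 5 to 10% standard can be operationalized in more than one way; the screen below treats the range of the baseline probes, expressed as a percentage of their mean, as the index of variability. That operationalization, like the function itself, is an illustrative assumption rather than a published procedure, and it is no substitute for inspecting the plotted data for trends.

```python
def baseline_is_stable(probe_scores, max_range_pct=10.0):
    """Crude stability screen for pretreatment baseline probes.

    Requires at least three observations (a commonly cited minimum) and a
    range no greater than max_range_pct of the baseline mean.
    """
    if len(probe_scores) < 3:
        return False
    m = sum(probe_scores) / len(probe_scores)
    spread = max(probe_scores) - min(probe_scores)
    return (spread / m) * 100 <= max_range_pct

# Three probes of percent correct production
baseline_is_stable([22, 24, 23])   # range of 2 on a mean of 23: stable
baseline_is_stable([10, 25, 40])   # range of 30 on a mean of 25: unstable
```

A screen of this kind flags only overall spread; a consistent upward trend within an acceptable range would still need to be caught by eye or by a slope estimate.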
Proponents of single subject experimental designs who are the chief resources for interpreting baseline data have often
suggested that visual inspection of such data is sufficient for the detection of stability and systematic change. Recently,
however, the complexity of this judgment task has led to questions about its use (Franklin, Gorman, Beasley, & Allison,
1996; Parsonson & Baer, 1992). In particular, researchers have noted a tendency for visual analysis to fail to detect
change when it has actually occurred, thus suggesting a lack of sensitivity to smaller levels of change. This reduced
sensitivity may present serious problems for clinicians who believe that small amounts of change will be important to
documenting the effect of their treatment. On the other hand, for those who attempt to target behaviors on which they
expect larger changes (larger effect sizes, to use our previous terminology) the reduction in sensitivity may represent a
reasonable trade-off against the relative simplicity of graphic analysis. Nonetheless, clinicians who may wish to rely on
visual analysis would do well to look into the emerging complexities of this aid to data interpretation (Franklin et al.,
1996; Parsonson & Baer, 1992). Researchers and clinicians with sufficient resources might also consider alternative
interpretations that make use of emerging methods (Gorman & Allison, 1996).
Treatment, Control, and Generalization Probes
The idea of treatment and control probes draws once again on the single subject experimental design literature (Bain &
Dollaghan, 1991). In that context, treatment probes represent quantitative measures focusing on behaviors that are or
will be the target of treatment. They are usually the minimum type of data collected to provide evidence of change. In
contrast, control probes represent quantitative measures obtained periodically over the course of a study to allow the
clinician to monitor the effects of extraneous variables on an individual’s behavior. They are usually constructed or
selected so that they measure behaviors that are unrelated to the treated behavior. If the treated behavior shows change
whereas the untreated, control behavior monitored using control probes does not, then the clinician can feel confident
that maturation and other factors have not produced global advances from which treated stimuli would have benefited
with or without the implementation of treatment. (Of course, one of the perils involved in the selection of control probes
is that developmental forces may cause changes in the behavior they are used to track even without a direct effect of
treatment; Demetras, personal communication, February 2000).
Generalization probes are used to track behaviors that are related to, but distinct from, those receiving treatment. Thus,
their use violates the lack of relationship to treated behaviors that is expected of control probes within
single subject designs (Bain & Dollaghan, 1991; Fey, 1988). In the construction of generalization probes, the clinician looks for
behaviors that are related to treated behaviors in a manner thought likely to cause generalization that will affect them. On
the basis of the current understanding of generalization, generalization probes would be expected to show similar but
smaller changes than treatment probes in response to the implementation of an effective treatment. Although
generalization across behaviors may be the most common dimension in which generalization probes are studied
clinically, generalization across situations will also prove of interest as will generalization across time (McReynolds &
Kearns, 1983).
The use of generalization and control probes allows for a clear demonstration that treatment is behaving as predicted
relative to the targeted behavior. Specifically, their use can help demonstrate that treatment is having its greatest effect
on treated behaviors, a lesser effect on untreated or other generalization behaviors, and no effect on control behaviors.
Their use can thus contribute to the plausibility of arguments that treatment, rather than the myriad of other variables that
might help a child’s behavior improve, is the agent responsible for observed change. Campbell and Bain (1991) further
argued that evidence of generalization obtained during treatment offers speech-language pathologists their clearest
opportunity to show instrumental outcomes (i.e., outcomes suggesting the likelihood that treatment will lead to
additional outcomes without further treatment). More support for these varied measures comes from the motor learning
literature, in which it was observed that data obtained during a learning condition (e.g., a treatment session) can
overestimate learning compared to generalization or maintenance data (e.g., see Schmidt & Bjork, 1992).
An example illustrating the use of treatment, generalization, and control probes is described in Bain and Dollaghan
(1991) as part of a single subject design. Using the case of a hypothetical preschooler with SLI, they suggested a
treatment target consisting of the production of a two-word semantic relation Agent + Action. As a generalization
behavior, they proposed the production of Action + Object because its shared component, Action, was thought to make
generalization likely. Finally, as a control behavior, they proposed the production of Entity + Locative because it seemed
unlikely to change without direct treatment. Each probe consisted of the child’s percentage of correct production of 10
unfamiliar exemplars that the clinician attempted to elicit through manipulation of several toys and the context.
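The predicted pattern in Bain and Dollaghan’s hypothetical case can be summarized in a few lines; the probe percentages below are invented solely to illustrate the expected ordering of gains.

```python
# Hypothetical percent-correct probe data (10 exemplars per probe)
probes = {
    "treatment (Agent + Action)":       {"pre": 10, "post": 80},
    "generalization (Action + Object)": {"pre": 10, "post": 50},
    "control (Entity + Locative)":      {"pre": 10, "post": 10},
}

gains = {name: p["post"] - p["pre"] for name, p in probes.items()}

# If treatment, rather than maturation or other global influences, drives
# the change, gains should be largest for the treated relation, smaller for
# the generalization relation, and near zero for the control relation.
```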
Treatment, generalization, and control probes often involve elicited behaviors such as those described under that heading
in the preceding chapter. However, other measures, such as performance on language samples and analyses, could also
serve to examine treatment, generalization, and control behaviors. Although there is a
tendency for treatment probes to be obtained frequently so that the process of treatment as well as the product may be
illuminated (McReynolds & Kearns, 1983), generalization and control probes are typically administered less often
(Bain & Dollaghan, 1991). The frequency with which treatment probes are used may depend on the expected rate
of change; Bain and Dollaghan pointed out that the behaviors of a child with cognitive delays indicative of an overall
slower rate of learning may require less frequent collection of data.
Determining Whether Additional Change Is Likely to Occur
As an additional aspect of examining change, authors have sometimes called attention to the value of predicting whether
future change is likely. In particular, this general question has been posed at two different ends of the treatment process:
initiation and termination. First, successful
prediction of whether change is likely might help in judging whether treatment should be initiated because of a child’s
“readiness” for change in a particular area (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996).
Second, successful prediction might help in judging whether treatment should be terminated, or at least temporarily
discontinued, because additional change is unlikely (Campbell & Bain, 1991; Eger et al., 1986). Both kinds of questions
will require substantial empirical investigations to arrive at universal recommendations for best practices. Nonetheless,
each depends on evidence that a particular technique is valid for predicting a given outcome—thus suggesting that
evidence of predictive criterion-related validity is at the root of both of these questions. This realization is implicit in the
work of Bain and Olswang (1995), in which they sought to demonstrate the predictive validity of dynamic assessment to
support its use in determining readiness for the production of two-word phrases.
Posing the question of when treatment might most profitably be initiated goes beyond the clinical assumption that
treatment should be undertaken any time a child is found to demonstrate a significant problem in language or
communication skills. The question itself suggests the possibility that there are times when children may exhibit
evidence of a language disorder but that treatment would be unlikely to be effective—either in a global sense or in
relation to a specific domain or behavior. Timing the onset of treatment or at least the onset of treatment aimed at
specific targets to coincide with children’s areas of readiness could be expected to yield major enhancements to
treatment efficiency (Long & Olswang, 1996).
Olswang and Bain (1996) discussed the use of profiling in static assessment versus dynamic assessment as tools to use in
addressing the question of readiness. The use of profiles, which are most often created by comparing a child’s
performances on several tests or subtests, was discussed at some length in chapter 9. Even though the use of profiles has
been largely debunked as a strategy for highlighting domains or children that might exhibit the greatest change in
treatment, Olswang and Bain (1996) decided both to pursue it as one of the few methods in static assessment that has
been proposed for addressing the prediction of future change and to compare it with techniques from dynamic
assessment.
One of the greatest promises of dynamic assessment has been its use in identifying the moving boundary of a child’s
learning, or zone of proximal development (ZPD; Olswang & Bain, 1996; Vygotsky, 1978). As described in chapter 10,
the ZPD is thought to reflect the loci of a child’s active developmental processes and thus to suggest areas in which
treatment might be aimed to achieve optimal change. As a result of this promise, Olswang and Bain (1996) decided to
compare the relative merits of profiles based on static assessments as well as performances on other selected variables
versus measures of dynamic assessment techniques in predicting responses to
treatment. The dynamic measures were found to correlate more strongly than the static measures with a measure of
change (PCI) calculated following a 3-week treatment period.
The results of their study led Olswang and Bain (1996) to propose that dynamic assessment procedures are better than
other techniques at determining the likelihood of immediate change. However, they noted that additional research is
needed to determine whether observed changes would have occurred even in the absence of treatment. They might also
have noted that additional research is needed to determine whether the predictive power of dynamic assessment would
hold over longer periods of treatment.
As Campbell and Bain (1991) advised, decisions regarding treatment termination can be based on predetermined exit
criteria or on demonstrations that no change has occurred over a given period of time. Such decisions, however, can also
be based on empirical evidence that additional change is unlikely. This last alternative thus demands a prediction of
future change levels akin to that sought by Olswang and Bain in their efforts to identify harbingers of change prior to
treatment initiation.
Campbell and Bain (1991) touched on the possibility of predicting future change for purposes of making a rational
decision about the end of treatment in their discussion of ultimate or instrumental outcomes. Whereas ultimate outcomes
can be defined as a child’s achievement of age-appropriate or maximal communicative effectiveness, such outcomes can
also be defined as functional communicative effectiveness, which implies that the child has achieved his or her best
approximation of maximal communicative effectiveness. Additionally, “instrumental outcomes” can be defined as
outcomes suggesting that additional change will be forthcoming in the absence of treatment. The notions of “functional”
communicative effectiveness and instrumental outcomes each involve implications related to the prediction of future
change. Specifically when functional communicative effectiveness is seen as a legitimate ultimate outcome, it is almost
invariably because the prospect of additional change is seen as unlikely or as prohibitive in terms of the time and effort
required to produce it. Similarly, instrumental outcomes depend on the notion that additional change is likely.
At this point, it appears that generalization data, such as those described in the preceding section, may represent
the best method for addressing questions regarding future change. Research designed to identify more appropriate
methods of predicting future change will undoubtedly need to proceed hand-in-hand with research aimed at
understanding the nature of language learning and of threats to language learning posed by language disorders before
substantial progress on these clinical questions can be made. Measures of predictive validity will also undoubtedly play a
role in helping us arrive at satisfying answers.
Available Tools
The kinds of tools available for use in addressing questions of change in children’s language disorders largely overlap
those available for description that were described in the preceding chapter. Therefore, in this chapter, discussion of
available tools is
quite brief and focuses on those measures that are most frequently used to examine behavioral change and the special
considerations that arise when they are used for that purpose. The only new tool to be introduced in this chapter is single
subject designs, a family of methods that has been alluded to throughout this chapter but has not been adequately
introduced as a specific method for examining change.
Standardized, Norm-Referenced Tests
Repeated administration of standardized, norm-referenced tests is probably the most widespread method used by speech-
language pathologists to examine broad changes in language behaviors over time (McCauley & Swisher, 1984). More so
than other measures used to examine change, standardized norm-referenced measures are often accompanied by data
concerning their reliability and validity. This represents a distinct potential advantage because such data can enhance the
clinician’s ability to determine whether observed changes are likely to be reliable and important. Regrettably, however,
norm-referenced measures often do not provide sufficiently detailed data to make this potential a reality (Sturner et al.,
1994).
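When a manual does report reliability data, one way to put them to work is the reliable change index familiar from the clinical psychology literature cited earlier (e.g., the work of Jacobson and colleagues): the gain is divided by the standard error of the difference between two administrations. The sketch below assumes that approach; the particular scores and reliability figure are invented for illustration.

```python
from math import sqrt

def reliable_change_index(score1, score2, sd, reliability):
    """Reliable change index for a pre-post pair of standard scores.

    SEM = sd * sqrt(1 - reliability); the standard error of the difference
    between two administrations is SEM * sqrt(2). |RCI| values above 1.96
    suggest a change larger than measurement error alone would plausibly
    produce. The sd and reliability figures should come from the test manual.
    """
    sem = sd * sqrt(1 - reliability)
    return (score2 - score1) / (sem * sqrt(2))

# A 12-point gain (78 to 90) on a test with sd = 15 and reliability = .90
rci = reliable_change_index(78, 90, 15, 0.90)   # about 1.79: short of 1.96
```

Even a seemingly large 12-point gain, in other words, may not exceed what retesting error alone could produce on a moderately reliable instrument.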
As additional barriers to their effective use for evaluating change, there are a number of pitfalls that must be avoided.
The most important of these relates to the tendency for such measures to have been devised so that they are more
sensitive to large differences in knowledge between individuals than to small differences (Carver, 1974; McCauley &
Swisher, 1984). Yet it is small differences that are characteristic of the changes most likely to occur in treatment within a
given individual (Carver, 1974; McCauley & Swisher, 1984). Thus, clinicians who use such measures to assess change
must be aware that their efforts are likely to prove insensitive to very important changes in behaviors that simply are not
addressed by a given test. Such tests are best reserved for occasions when broad changes are of interest.
Among the other pitfalls cited by McCauley and Swisher (1984) and others is the need to avoid situations
in which the test is explicitly taught by a well-meaning clinician or implicitly taught through repeated administrations
that occur so closely in time as to allow the child an unwarranted advantage at the second administration. Another pitfall
is the use of norm-referenced instruments to assess change, which can be problematic if changes in the normative groups
occur over the time interval studied or if different measures (albeit those that ostensibly tap the same behavior) are used
at different times. Now, it may be tempting to view change as having occurred because a child has received a relatively
better score on Test B of Language Behavior X than she or he did on Test A of Language Behavior X. However, the
huge amount of error that could be introduced by differences in the content of Tests A and B (despite their similar names), as well as by differences in their normative samples, is likely to make such a conclusion erroneous.
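The instability of gain scores can be illustrated numerically. A classical psychometric formula (assuming the two tests have equal score variances) expresses the reliability of a difference score in terms of the two tests’ reliabilities and their intercorrelation. The sketch below uses illustrative values, not figures from any particular instrument:

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical psychometric estimate of the reliability of a
    difference (gain) score, assuming equal score variances.
    r_xx, r_yy: reliabilities of the two tests; r_xy: their
    intercorrelation."""
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

# Two respectable tests (reliability .90) that correlate .80 with
# each other yield a gain score whose reliability is only .50.
print(round(difference_score_reliability(0.90, 0.90, 0.80), 2))  # -> 0.5
```

The point of the example is that even well-constructed tests can produce quite unreliable gain scores whenever the two administrations correlate highly, as they typically do.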
One method that has been recommended (e.g., McCauley & Swisher, 1984) for avoiding the additional error associated with gain scores is simply to reexamine a child with the same initial question: Is this child’s language
(or the particular aspect of it that is under scrutiny) impaired? However, a recent study looking at remission rates for
reading disability among children examined in two studies over
a 2-year time period suggested that measurement error can lead to significant overestimates of recovery rates even when
this more cautious strategy is applied (Fergusson, Horwood, Caspi, Moffitt, & Silva, 1996). However, the chief source of difficulty lay not in how change was examined, but in the fact that measurement error had not been explored sufficiently by the original investigators at the time of the children’s original diagnoses. Careful analysis by Fergusson and his colleagues suggested that the overidentification of many children at their first testing, due to a lack of appreciation of testing error, was the villain. Whether the findings of Fergusson et al. are echoed in the identification of children as having a language impairment remains an empirical question. However, I include this
brief description of their work here as a cautionary tale suggesting that careful use of norm-referenced measures in
assessing change begins with their careful use in identification processes.
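The phenomenon Fergusson and his colleagues described, apparent recovery produced purely by measurement error at identification, can be illustrated with a small simulation. All values below (test mean 100, SD 15, cutoff 85, error SD 5) are illustrative assumptions, not figures from their study:

```python
import random

random.seed(1)

# Minimal sketch of regression to the mean: when identification at
# Time 1 uses a fallible measure, retesting the "impaired" group
# against the same cutoff shows apparent recovery even though no
# child's true ability has changed at all.
N, CUTOFF, ERROR_SD = 10_000, 85.0, 5.0
true_scores = [random.gauss(100, 15) for _ in range(N)]

def observe(true_score):
    # Observed score = stable true ability + random measurement error.
    return true_score + random.gauss(0, ERROR_SD)

time1 = [(t, observe(t)) for t in true_scores]
impaired_at_t1 = [t for t, obs in time1 if obs < CUTOFF]
recovered = sum(observe(t) >= CUTOFF for t in impaired_at_t1)
rate = recovered / len(impaired_at_t1)
print(f"Apparent 'recovery' with no true change: {rate:.0%} "
      f"of {len(impaired_at_t1)} identified children")
```

Because some children fall below the cutoff at Time 1 only through unlucky measurement error, a sizable fraction of the identified group scores above the cutoff on retest despite entirely unchanged abilities.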
In short, despite their frequent use for the assessment of change, norm-referenced tests are most useful when broad
changes are expected and when clinicians are careful to avoid the several problems that can undermine the validity of
their use for this purpose as well as for purposes of identification.
Standardized Criterion-Referenced Measures
Because criterion-referenced measures are more often developed so that they exhaustively examine knowledge within a
given domain, they have been hailed as superior to norm-referenced measures for purposes of examining change
(Carver, 1974; McCauley, 1996; McCauley & Swisher, 1984). However, their relative rarity (as shown by the sampling
of such tools in Table 10.1) means that their value in assessing language change in children has not been extensively
evaluated.
Clinicians need to examine documentation for such measures to determine whether the author has presented a reasonable
evidence base supporting their use to examine change over time. Especially desirable is evidence suggesting that changes of specific magnitudes in test performance are likely to reflect significant functional change. Nonetheless, where such measures are used simply to describe the specific content on which gains have been achieved, such evidence is less critical.
Probes and Other Informal Criterion-Referenced Measures
As argued throughout this book, probes have a relative advantage in their adaptability to the specific clinical questions posed by the speech-language pathologist. Thus, they can be devised or selected to address very specific questions about change that coincide with the very focus of treatment for a given child. Their relative brevity and straightforwardness of interpretation represent further advantages.
To contemplate the possible pitfalls of the use of probes, however, readers need only return to their description in chapter
10. Without the considerable effort entailed in standardization, clinician-devised probes or probes that are borrowed from
other nonstandardized sources are unknown with respect to reliability and validity. Although their close fit to the question being asked offers great potential for excellent construct validity, the tendency for probes to be haphazardly constructed, administered, and interpreted poses a potentially devastating threat to that promise. Because of the expectation that
repeated use of probes will be required if they are to be used to assess change, the standardization strategies described in
Figure 10.1 become particularly vital defenses against those threats.
Dynamic Assessment Methods
The growing literature aimed at exploring the utility of dynamic assessment methods in predicting readiness for language
change (Bain & Olswang, 1995; Long & Olswang, 1996; Olswang & Bain, 1996) supports a hopeful but questioning
view regarding the uniformity with which such techniques succeed. Although by definition such methods are intended to
elicit conditions that change a child’s likelihood of acquiring a more mature behavior, they may at times provide no more than transient predictions, too short-lived to be of much value in signaling treatment focus. Nonetheless,
exploration of their predictive value in specific domains and for specific clients warrants further investigation. In the
meantime, their greatest promise appears to lie in the insights they provide regarding how intervention might best take
place and in providing more valid assessments for children who are highly reactive to testing. There are also numerous
suggestions that they promise to provide more valid assessments than other available methods for children from diverse
backgrounds who may lack the experiences assumed by more conventional testing methods.
Single Subject Designs
In their ground-breaking work on the application of single subject experimental designs to speech-language pathology,
McReynolds and Kearns (1983) noted that such designs had the promise of wide application by clinicians because of
their practicality and clinical relevance. Despite their wide acceptance as an alternative method of scientific inquiry,
however, such designs have been resisted by speech-language clinicians in daily practice—probably because their
practicality falls short of that demanded by most clinical settings. Nonetheless, they remain the strongest available
method when the clinical question at hand centers on whether treatment is the likely cause of observed changes in
behavior.
The most frequently used measures in single subject designs are elicited probes and other informal measures, which are
referred to as dependent measures in this context. These informal measures often lack the documentation regarding
validity and reliability that can adorn more formal measures. Nonetheless, their use is strengthened by their close tie to
the specific construct for which they have been created or selected. Ideally, they represent highly defensible
operationalizations of the behavior or ability of interest. Their use is further strengthened when measures of inter- and
intraexaminer agreement, or other basic measures aimed at demonstrating reliability, are obtained. They can also be
enhanced by blind measurement procedures in which the person making the measurement is unaware of the purpose it
will serve or, ideally, the individual on whom it was obtained (Fukkink, 1996).
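Inter- and intraexaminer agreement of the sort just described is often reported as point-by-point percent agreement, sometimes supplemented by a chance-corrected index such as Cohen’s kappa. The sketch below uses hypothetical item-level codings from two examiners:

```python
from collections import Counter

def percent_agreement(a, b):
    """Point-by-point agreement between two examiners' codings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement (Cohen's kappa) for two examiners."""
    n = len(a)
    p_obs = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance, from each examiner's marginal rates.
    p_chance = sum(ca[k] * cb[k] for k in ca) / n**2
    return (p_obs - p_chance) / (1 - p_chance)

# Two examiners scoring the same 10 probe items as correct (1) or not (0).
ex1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
ex2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
print(percent_agreement(ex1, ex2))          # -> 0.8
print(round(cohens_kappa(ex1, ex2), 2))     # -> 0.52
```

The contrast between the two indices (80% raw agreement versus a kappa of about .52) shows why percent agreement alone can flatter examiners when one coding category predominates.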
As part of the systematic structuring of observations that underlies the rationale behind single subject designs, dependent
measures are obtained frequently and can thus provide persuasive evidence of consistency or change. In addition, the
temporal structure of such designs is intended to provide logical support for the role of treatment versus alternative
explanations as agents of change. On the basis of these ideals, single subject experimental designs have been lauded not only for their ability to provide superior evidence about causation at the level of the individual but also for the evidence they provide about both the outcome and process of treatment (McReynolds & Kearns, 1983; McReynolds & Thompson, 1986).
A simple consideration of a few of the books on the subject suggests that detailed discussion of the methods and logic
supporting the application of single subject designs in communication disorders is well beyond the scope of this book (e.g., Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983). Nonetheless, a
simple example can be used to illustrate the logic that supports causal interpretation of such designs and thus their
potential for addressing the question of whether treatment is likely to be responsible for a child’s behavioral change. The
example shown in Fig. 11.2 is a hypothetical one from Bain and Dollaghan (1991). It was described previously for
its use of control, generalization, and treatment probes. It is described here for the way in which the stability of data,
timing of treatment, and demonstrations of change lead one to the conclusion that observed changes probably resulted
from treatment.
As you look at Fig. 11.2, notice first the top graph, in which probes for the primary focus of treatment (Agent + Action)
are studied first without the presence of treatment during a baseline condition. Because the baseline is clearly
unchanging, it is reasonable to conclude that factors such as maturation, informal instruction by a parent, and so forth are
not playing a role in the child’s acquisition of the target form prior to the initiation of treatment. Although the initiation
of treatment does not result in instantaneous change, change does occur over the course of the treatment interval. Further,
that change seems likely to be due to the effects of treatment rather than alternative explanatory factors because of the
implausibility that such factors would commence by chance in such close proximity to the onset of treatment. Whereas in
most single subject designs, the period labeled “withdrawal” is considered a second baseline, here it is described as
withdrawal because the experimenter would probably expect some additional growth (generalization) due to learning
effects. This kind of design in which treatment is absent, then present, then absent again is often referred to as an ABA or
withdrawal design.
ABA designs are often avoided in classical single subject designs in cases where an effective treatment would be
expected to show “carryover” in this way. Instead such designs would more typically be used for behaviors that are
expected to return to baseline when treatment is ended. When language development is studied, however, the presence of
generalization is not considered a serious threat to the logic of an experiment when it occurs as part of a set of
predictions made in advance by the clinician or experimenter.
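The ABA phase logic just described can be made concrete with a toy example. The probe percentages and the stability rule below are hypothetical and deliberately crude; actual single subject designs rely on visual analysis and far more careful criteria:

```python
# Hypothetical probe data (% correct) for an ABA (withdrawal) design.
phases = {
    "baseline":   [10, 12, 8, 11],   # A: stable and low before treatment
    "treatment":  [15, 30, 48, 62],  # B: change emerges after onset
    "withdrawal": [60, 64, 66],      # A: gains retained (generalization)
}

def mean(xs):
    return sum(xs) / len(xs)

def is_stable(xs, max_range=10):
    """Crude stability rule: probe values span no more than max_range points."""
    return max(xs) - min(xs) <= max_range

baseline_ok = is_stable(phases["baseline"])
level_change = mean(phases["treatment"]) - mean(phases["baseline"])
print(f"Stable baseline: {baseline_ok}; "
      f"treatment-phase level change: {level_change:.1f} points")
```

A stable, unchanging baseline followed by a clear change in level shortly after treatment begins is the core of the plausibility argument; the staggered baselines of the multiple baseline variant described next extend that logic across behaviors.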
In the second graph of Fig. 11.2, a second dependent measure (or generalization probe), Action + Object, is observed
with the expectation that its relationship to the targeted variable, Agent + Action, will cause some developmental change
to occur
Fig. 11.2. A hypothetical multiple baseline single subject design that makes use of treatment (Agent + Action), generalization
(Action + Object), and control (Entity + Locative) probes (Graphs 1, 2, and 3, respectively). From “The Notion of Clinically
Significant Change,” by B. A. Bain and C. A. Dollaghan, 1991, Language, Speech, and Hearing Services in Schools, 22, p. 266.
Copyright 1991 by the American Speech-Language-Hearing Association. Reprinted with permission.
during treatment and possibly beyond. However, the presence of an initial period of stability prior to the onset of change in this
measure is again helpful in strengthening the plausibility of the argument that the observed change is likely to result from the
treatment rather than other factors. In addition, that argument is strengthened if the generalization probe does not improve to the same
extent as the target probe, or does so following a delay relative to the actual target of treatment.
In the third graph of Fig. 11.2, the control probe, Entity + Locative, is shown with a stable but longer baseline, thus indicating that
extraneous variables are unlikely to
be acting on the child’s language development for the entire duration of the baseline. It is important that the baseline for
this variable, which was predicted to be unaffected by generalization, remained stable throughout the entirety of
treatment directed at Agent + Action and its withdrawal period in order to support the treatment effect on the other
variables. Just as important, it begins to show improvement only after it becomes the direct target of treatment.
The practical requirements in terms of data collection and display are not inconsequential for single subject designs.
However, as this example illustrates, they do not have to be overly burdensome either, with the chief investment here
being the periodic (and staggered) collection of probe data for two additional forms. This cost seems well worth it when
weighed against the value of evidence documenting the effectiveness of the treatment used for two different targets and
of real-time insights into the generalization patterns of the individual child.
In addition to numerous books dealing more comprehensively with the large number of designs that can be applied in
clinical settings (Franklin, Allison, & Gorman, 1996; Kratochwill & Levin, 1992; McReynolds & Kearns, 1983), a set of
three classic articles (Connell & Thompson, 1986; Kearns, 1986; McReynolds & Thompson, 1986) represents a wonderful introduction to the promise such designs hold for clinicians interested in children’s language disorders.
Practical Considerations
With regard to assessing change, the largest practical consideration appearing on the horizon has been the presence of
professional and societal forces urging clinicians to find measures that document the value of what they do on a broader
scale and with greater regularity. Therefore, although other practical issues exist as very real pressures on clinicians’
decision making regarding all of the areas of change discussed in this chapter, the issue of outcome measurement seems
to warrant the full attention of the remaining pages of this chapter and, indeed, the concluding pages of this book.
In speech-language pathology, interest in how language treatment affects children has been around for quite some time
(e.g., Schreibman & Carr, 1978; Wilcox & Leonard, 1978). However, a continuing complaint has been that not enough
such research on treatment is being done (e.g., McReynolds, 1983; Olswang, 1998), and the research that is being done
involves treatment procedures that, although useful for purposes of scientific rigor, cannot readily be applied to real
clinical settings. Thus, the generalizability of a small research base has been at issue. Nonetheless, existing treatment
research has provided at least some preliminary evidence of the effectiveness of treatment extending beyond the level of
the individual clinician.
More recently, interest in accountability (e.g., Eger, 1988; Eger et al., 1986; Mowrer, 1972) has arisen at a grassroots
level because of growing demands from individual consumers and their advocates. This interest has been joined in an
intense top-down fashion by ASHA as it responds to protect its members’ roles in fast-changing health care and
educational systems (Frattali, 1998a,b; Hicks, 1998). In a chapter addressing the specific nature of top-down pressures
necessitating greater attention to outcomes
assessment in speech-language pathology, Hicks (1998) described at least three sources of influence to which the
profession must respond:
1. accrediting agencies (e.g., the Rehabilitation Accreditation Commission; the Joint Commission on Accreditation of
Healthcare Organizations, JCAHO; ASHA’s Professional Services Board, PSB);
2. payer requirements (e.g., Medicare; Medicaid; and Managed Care Organizations, MCOs); and
3. legislative and regulatory requirements (e.g., Omnibus Budget Reconciliation Act of 1987, Public Law 100-203, and the Social Security Act, Part 484).
At first glance, these forces would seem to come primarily from those clinical settings that serve adults and, thus, it
might be thought that they would not affect clinicians who work with children in primarily educational settings.
However, as appreciation of the value of outcomes measures has become more widespread and as the great divide
between education and healthcare breaks down (as illustrated in Medicaid funding for some children enrolled in school
programs), the blissful luxury of considering treatment outcomes someone else’s challenge has all but disappeared. Eger
(1998) noted that Congress’s passing of the Education of All Handicapped Children Act of 1975 (P.L. 94-142) served as
a possible precursor to formal outcomes measurement activities in special education because it included as one of its four
main goals the assessment and assurance of educational effectiveness. The passage of the 1997 amendments to IDEA (P.L. 105-17) further reinforces the importance of continued development in this area. In order to respond to the challenges
facing the professions across settings, ASHA has begun the development of treatment outcomes measures that can be
used by groups of clinicians to document their value and provide a basis for comparisons by important groups (e.g.,
school districts, third-party payers).
At this point, readers who are unfamiliar with the terminology that accompanies outcomes measurement may feel a tad
bewildered. Therefore, some background on the relationship between treatment efficacy research and treatment
outcomes research seems in order. Despite some important underlying similarities and overlapping methods, an
important distinction can be made between these two terms (Frattali, 1998a; Olswang, 1998). Olswang (1998) pointed
out that both efficacy research and outcome research represent strategies for examining the influence of treatment on
individuals with communication disorders. Nonetheless, whereas efficacy research emphasizes the importance of
documenting treatment as a cause for change, outcomes research emphasizes the benefits associated with treatment as it
is administered in real-world circumstances. Frattali (1998a) described the distinction quite succinctly by saying that
“efficacy research is designed to prove,” whereas “outcomes research can only identify trends, describe, or make
associations or estimates” (p. 18). Whereas past efficacy research has focused primarily on the behaviors that fall at the
impairment level in terms of the ICIDH classification system, a broadening of concerns to embrace behaviors falling at the levels of disability and handicap is an emerging trend (Olswang, 1998).
Treatment efficacy is often defined as encompassing treatment effectiveness, efficiency, and effects (e.g., see Kreb &
Wolf, 1997; Olswang, 1990, 1998). Treatment effectiveness refers to the traditional idea of whether or not a given
treatment is likely to be responsible for observed changes in behavior. Treatment efficiency refers to the relative
effectiveness of several treatments or to the role of components of a treatment in contributing to its effectiveness.
Finally, treatment effects refers to the specific changes that can be seen in a constellation of behaviors in response to a
given treatment. Similar components have also been identified as falling within the province of treatment outcomes as
well (Kreb & Wolf, 1997).
Whereas treatment efficacy research is usually conducted under optimal conditions, or at least well-controlled clinical
conditions, outcomes measurement is, by definition, conducted under typical conditions (Frattali, 1998b; Olswang,
1998). On the downside, this means treatment outcomes research will almost never be able to contribute to arguments
about the cause and effect relationships of treatments and observed benefits. Nonetheless, outcomes research will almost
always be in a better position than treatment efficacy research to address concerns about the value of services offered to
professional constituencies (e.g., within a given hospital or school district). Consequently, outcomes research has a very
special value to individual clinicians. It can enable them to demonstrate accountability not in the abstract, based on
treatments conducted solely by other clinician–researchers working under controlled conditions, but by comparing their
own outcomes with those obtained by others through participation in the large-scale, multi-site efforts that are
characteristic of such research.
In 1997, the National Center for Treatment Effectiveness in Communication Disorders began work on a database that
will involve clinicians in the collection of outcomes data on a national basis. This complex database, the National
Outcomes Measurement System (NOMS), will eventually include information about all of the populations served by
speech-language pathologists and audiologists. Currently, however, NOMS is limited to information about adults seen in
healthcare settings, preschool children who are served in school or healthcare settings, and children in kindergarten
through the sixth grade who are seen in schools. (Note that data concerning infant hearing screenings are just beginning
to be collected.) In order to participate, school-based clinicians work cooperatively to provide data for a given school
system in which at least 75% of the speech-language pathologists hold ASHA certification and in which all students will
be included in the data that are collected. These two restrictions are designed to improve the quality and
representativeness of the data.
For schools, data for the NOMS are collected at the beginning and conclusion of services, or at the beginning and end of
the school year, with data collection procedures designed to take no more than 5 to 10 minutes per child. Data include
information about demographics, eligibility for services, the nature of treatment (i.e., model of services, amount, and
frequency of services), teacher and family satisfaction, and the results of the Functional Communication Measures
(FCMs), a 7-point scale developed by ASHA. The scale addresses functional performance within the educational
environment. It includes items such as “The student responds to questions regarding everyday and classroom activities”
and “The student knows and uses age-appropriate
interaction with peers and staff.” These items are rated on the following scale: 0 = No basis for rating; 1 = Does not do;
2 = Does with maximal assistance; 3 = Does with moderate to maximal assistance; 4 = Does with moderate assistance; 5
= Does with minimal to moderate assistance; 6 = Does with minimal assistance; and 7 = Does.
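For record keeping, the FCM levels quoted above lend themselves to a simple encoding. In this sketch the variable and function names are mine, not ASHA’s; only the level wording follows the text:

```python
# The FCM rating levels as quoted in the text, encoded as a lookup.
FCM_LEVELS = {
    0: "No basis for rating",
    1: "Does not do",
    2: "Does with maximal assistance",
    3: "Does with moderate to maximal assistance",
    4: "Does with moderate assistance",
    5: "Does with minimal to moderate assistance",
    6: "Does with minimal assistance",
    7: "Does",
}

def fcm_gain(admission, discharge):
    """Change in FCM level between the start and end of services."""
    return discharge - admission

# A child rated at level 3 at the start of the year and level 5 at its end.
print(FCM_LEVELS[3], "->", FCM_LEVELS[5], "| gain:", fcm_gain(3, 5))
```

Recording ratings at the beginning and end of services, as the NOMS procedure described above requires, reduces each child’s functional change to a simple level difference of this kind.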
ASHA’s goals for the NOMS are lofty. Besides demonstrating positive outcomes for children receiving speech-language
pathology services, it is hoped that the NOMS will facilitate administrative planning (e.g., caseload assignments) as well
as individual decisions about intervention. Among particular aspirations are that it will provide information about when
intervention is most effective, how much progress can be expected over an academic year, what service delivery model
and frequency of service results in the greatest gains for a given kind of communication disorder, and what entrance and
dismissal criteria are reasonable. In addition, it is hoped that comparative NOMS data might allow individual school
systems or groups of school systems to demonstrate their effectiveness and efficiency in ways that will help them
negotiate in an era of strained educational resources. The success of the system in meeting these goals will depend
greatly on widespread participation allowing the representative samples required for specific generalizations such as
those just described. In terms of the utility of the system for providing comparative data across school systems or units, a
greater tailoring of reports available to participants may be necessary before those aspirations can be actualized.
Beyond the NOMS, Eger (1998) described numerous ways in which an outcomes approach can be incorporated within
school practice. These range from simple modifications of the way goals and objectives are written for individualized
educational plans (IEPs) to the development of empirically motivated dismissal criteria to more elaborate investigations
of effectiveness of specific service delivery models (e.g., classroom-based interventions, self-contained classroom).
These three examples run the gamut from those that can be implemented by the individual clinician to those requiring
more extensive resources, akin to those required by the NOMS.
In terms of how the individual speech-language pathologists can modify the IEPs they write, Eger (1998) provided an
example. She noted that a goal that might currently be written as “The student will improve expressive language skills”
could be replaced with one or more of the following: “The student will apply problem-solving and decision making skills
in math and English classes,” “The student will use language to create dialogues with teachers and peers to facilitate
learning,” or “The student will be able to follow written directions on objective tests” (Eger, 1998, p. 447).
Regardless of whether speech-language pathologists working with children actively work to include an outcomes
perspective in their practice, the outcomes movement will undoubtedly drive extensive changes in clinical practice over
the next decade, especially as these relate to the documentation of change in children’s communication. Responsible
reactions to these changes will depend on sensitivity to the measurement virtues (i.e., functionality and the development
of common best practices) as well as the measurement perils. Many of these perils are those shared with all measurement
strategies, such as concerns about the quality of data collection at its source and the size of the sample used for any
particular decision. Some, however, are unique to such a large undertaking—the relinquishment of decisions about how
interpretation
will take place and, thus, the possible relinquishment of feelings of personal responsibility as well. Still, it is an exciting
time for measurement in communication disorders, one in which sizeable resources may finally be funneled to some of
the questions that most trouble speech-language pathologists. The desired outcome of such investments is the
proliferation of innovative measurement strategies and refinement of existing tools to help us arrive at a sophisticated
armamentarium of tools for addressing our clinical questions.
Summary
1. The assessment of change underlies both critical and commonplace decisions made in the management of children’s
language disorders. These include decisions about individuals, such as when to begin and end treatment and whether
treatment tactics should be altered during the course of treatment.
2. When questions of treatment efficacy and accountability are raised, the assessment of change can also fuel decisions
about the relative merit of various treatment approaches or the relative productivity of groups of clinicians.
3. Three types of outcomes observed in clinical settings include ultimate outcomes, intermediate outcomes, and
instrumental outcomes. Whereas ultimate outcomes relate to decisions about treatment termination, intermediate and
instrumental outcomes relate to clinical decisions made during the course of treatment.
4. Measurement error presents an especially difficult challenge to interpretation when measures are examined at multiple
points in time, such as when past change is examined or future change is predicted.
5. Clinically significant change must not only be reliable, it must also represent an important change to the life of the
child. Three methods used to address whether an observed change is likely to be important involve considerations of
effect size, social validation, and the use of multiple measures.
6. Determining that positive changes in a child’s language are caused by treatment is made extraordinarily difficult by
the thankfully unavoidable but nonetheless confounding influences of growth and development. Increased understanding of those influences within and across children is needed to help address this very thorny measurement problem.
7. Single subject experimental designs offer clinicians the best currently available means for demonstrating that
treatment is responsible for observed changes, but have thus far been used primarily by researchers.
8. Measurement elements strengthening arguments that treatment is the cause of observed changes include the presence
of pretreatment baselines and the use of treatment, generalization, and control probes.
9. Treatment efficacy research is concerned with documenting whether treatment is effective and efficient and whether its effects extend to a number of significant behaviors.
10. Treatment outcomes research is designed to demonstrate benefits associated with treatment as it is conducted in
everyday contexts. Cooperation from all members of the profession is needed to collect some kinds of particularly
persuasive treatment outcomes data, such as those being collected in the NOMS database by ASHA.
Key Concepts and Terms
clinically significant change: a change that makes an immediate impact on the communicative life of a child or that
represents significant progress toward the acquisition of critical aspects of language.
effect size: the magnitude of the difference between two scores or sets of scores, or of the correlation between two sets of
variables.
Functional Communication Measures (FCMs): one of several rating scales designed by ASHA for use in tracking
functional communication gains made by clients.
gain scores: the difference between scores obtained by an individual at two points in time when that difference represents
a positive change in performance; also called difference scores.
instrumental outcomes: individual behaviors acquired during treatment that suggest the likelihood of additional change;
generalization probe data function as instrumental outcomes.
intermediate outcomes: individual behaviors that must be acquired for progress in treatment to have occurred; treatment
probe data can function as intermediate outcomes.
National Outcomes Measurement System (NOMS): an outcomes database for speech-language pathology and audiology
that is being developed to address the professions’ need for large-scale outcomes data.
outcome measurement: the use of measures designed to describe the effects of treatment conducted under typical, rather
than controlled conditions.
Proportional Change Index (PCI): a method for examining the rate of change observed in a given behavior during
treatment relative to that observed prior to treatment.
single subject experimental designs: a group of related research designs that permit the user to support claims of causal
relationship between variables, such as the effect of treatment on a targeted behavior.
social comparison: a social validation method that involves the use of a comparison between language behaviors of a
given child or group of children and those of a small group of peers.
social validation: methods used to indicate the social importance of changes occurring in treatment.
subjective evaluation: a social validation method in which procedures are used to determine whether individuals who
interact frequently with a child who is receiving treatment see perceived changes as important.
treatment effectiveness: the demonstration that a treatment, rather than other variables, is responsible for changes in
behavior (Kreb & Wolf, 1997; Olswang, 1990).
treatment effects: changes in multiple behaviors that appear to result from a given treatment (Olswang, 1990).
treatment efficacy research: research designed to demonstrate the complex property of a treatment that includes its
effectiveness, efficiency, and effects (Olswang, 1990, 1998).
treatment efficiency: the effectiveness of a treatment relative to an alternative; a more efficient treatment is one in which
goals are accomplished more rapidly, completely, or more cost-effectively than a less efficient treatment (Olswang,
1990).
ultimate outcomes: individual behaviors that signal successful treatment, either because age-appropriate or functionally
adequate levels of performance have been achieved or because further treatment would be unlikely to yield significant
additional gains.
Study Questions and Questions to Expand Your Thinking
1. Arrange to see a clinical case file for a child who is receiving treatment for a language disorder. List the ways in which
change is currently documented. Consider ways in which that documentation might be strengthened including how
efforts might be made to address changes in educational or social function as well as in the nature of impairment.
2. Discuss the advantages and disadvantages of using a standard battery of norm-referenced tests to look at a child’s
overall language functioning over time. If you were to devise such a battery, what would you look for in its components?
Would that battery differ on the basis of the etiology of the disorder? If so, how?
3. For each of the different tools that might be used to examine change, discuss how you might explain that tool to
a child's parents.
4. Visit the web site for the NOMS at http://www.asha.org/nctecd/treatment_outcomes.htm. Determine what barriers
might exist to participating in the NOMS. On the basis of the information you obtained in this chapter and through that
web site, what arguments might be made to justify efforts to overcome these barriers?
5. Look at the treatment efficacy studies for child language disorders collected at the NOMS web site under the Efficacy
Bibliographies link. On the basis of the information you can glean from reading the titles of articles listed there, which
aspects of treatment efficacy seem to have received the greatest attention?
6. On the basis of what you know about clinical decisions regarding change, discuss specific changes that might warrant
the use of a method such as a single subject design or social validation techniques. Although these methods are more
complex than other methods, they have the respective advantages of demonstrating the clinician's responsibility for
change or the social impact of change.
Recommended Readings
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing
Services in Schools, 22, 264–270.
Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical
Psychology, 67, 332–339.
Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In R. A. Kreb & K. E. Wolf (Eds.), Successful
operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-
Language-Hearing Association.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
References
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Bain, B. A., & Dollaghan, C. (1991). The notion of clinically significant change. Language, Speech, and Hearing
Services in Schools, 22, 264–270.
Bain, B. A., & Olswang, L. B. (1995). Examining readiness for learning two-word utterances by children with specific
expressive language impairment: Dynamic assessment validation. American Journal of Speech-Language Pathology, 4,
81–91.
Bernthal, J. E., & Bankson, N. W. (1998). Articulation and phonological disorders (4th ed.). Englewood Cliffs, NJ:
Prentice-Hall.
Campbell, T., & Bain, B. A. (1991). Treatment efficacy: How long to treat: A multiple outcome approach. Language,
Speech, and Hearing Services in Schools, 22, 271–276.
Campbell, T., & Dollaghan, C. (1992). A method for obtaining listener judgments of spontaneously produced language:
Social validation through direct magnitude estimation. Topics in Language Disorders, 12 (2), 42–55.
Carver, R. (1974). Two dimensions of tests: Psychometric and edumetric. American Psychologist, 29, 512–518.
Connell, P. J., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part III: Using flexibility to
design and modify experiments. Journal of Speech and Hearing Disorders, 51, 214–225.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston:
Houghton Mifflin.
Diedrich, W. M., & Bangert, J. (1980). Articulation learning. Houston, TX: College-Hill Press.
Education for All Handicapped Children Act of 1975. Pub. L. No. 94–142. 89 Stat. 773 (1975).
Eger, D. (1988). Accountability in action: Entry, measurement, exit. Seminars in Speech and Language, 9, 299–319.
Eger, D. (1998). Outcomes measurement in the schools. In C. Frattali (Ed.), Measuring treatment outcomes in speech-language
pathology (pp. 438–452). New York: Thieme.
Eger, D., Chabon, S. S., Mient, M. G., & Cushman, B. B. (1986). When is enough enough? Articulation therapy
dismissal considerations in the public schools. Asha, 28, 23–25.
Elbert, M., Shelton, R. L., & Arndt, W. B. (1967). A task for evaluation of articulation change: I. Development of
methodology. Journal of Speech and Hearing Research, 10, 281–289.
Fergusson, D. M., Horwood, L. J., Caspi, A., Moffitt, T. E., & Silva, P. A. (1996). The (artefactual) remission of reading
disability: Psychometric lessons in the study of stability and change in behavioral development. Developmental
Psychology, 32, 132–140.
Fey, M. (1988). Generalization issues facing language interventionists: An introduction. Language, Speech, and Hearing
Services in Schools, 19, 272–281.
Foster, S. L., & Mash, E. J. (1999). Assessing social validity in clinical treatment research: Issues and procedures.
Journal of Consulting and Clinical Psychology, 67, 308–319.
Franklin, R. D., Allison, D. B., & Gorman, B. S. (Eds.). (1996). Design and analysis of single-case research. Mahwah,
NJ: Lawrence Erlbaum Associates.
Franklin, R. D., Gorman, B. S., Beasley, T. M., & Allison, D. B. (1996). Graphical display and visual analysis. In R. D.
Franklin, D. B. Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 119–158). Mahwah,
NJ: Lawrence Erlbaum Associates.
Frattali, C. (1998a). Measuring modality-specific behaviors, functional abilities, and quality of life. In C. Frattali (Ed.),
Measuring treatment outcomes in speech-language pathology (pp. 55–88). New York: Thieme.
Frattali, C. (Ed.). (1998b). Measuring treatment outcomes in speech-language pathology. New York: Thieme.
Fukkink, R. (1996). The internal validity of aphasiological single-subject studies. Aphasiology, 10, 741–754.
Glesne, C., & Peshkin, A. (1992). Becoming qualitative researchers: An introduction. White Plains, NY: Longman.
Goldfried, M. R., & Wolfe, B. E. (1998). Toward a more clinically valid approach to therapy research. Journal of
Consulting and Clinical Psychology, 66, 143–150.
Goldstein, H., & Gierut, J. (1998). Outcomes measurement in child language and phonological disorders. In C. Frattali
(Ed.), Measuring treatment outcomes in speech-language pathology (pp. 406–437). New York: Thieme.
Gorman, B. S., & Allison, D. B. (1996). Statistical alternatives for single-case designs. In R. D. Franklin, D. B. Allison,
& B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 159–214). Mahwah, NJ: Lawrence Erlbaum
Associates.
Hicks, P. L. (1998). Outcomes measurement requirements. In C. Frattali (Ed.), Measuring treatment outcomes in speech-language
pathology (pp. 28–49). New York: Thieme.
Individuals with Disabilities Education Act (IDEA) Amendments of 1997. Pub. L. 105–17. 111 Stat. 37 (1997).
Jacobson, N. S., Roberts, L. J., Berns, S. B. & McGlinchey, J. B. (1999). Methods for defining and determining the
clinical significance of treatment effects: Description, application, and alternatives. Journal of Consulting and Clinical
Psychology, 67, 300–307.
Kamhi, A. (1991). Clinical forum: Treatment efficacy, an introduction. Language, Speech, and Hearing Services in
Schools, 22, 254.
Kazdin, A. E. (1977). Assessing the clinical or applied significance of behavioral change through social validation.
Behavior Modification, 1, 427–452.
Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical
Psychology, 67, 332–339.
Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments.
Journal of Consulting and Clinical Psychology, 66, 19–36.
Kearns, K. P. (1986). Flexibility of single-subject experimental designs. Part II: Design selection and arrangement of
experimental phases. Journal of Speech and Hearing Disorders, 51, 204–214.
Koegel, R., Koegel, L. K., Van Voy, K., & Ingham, J. (1988). Within-clinic versus outside-of-clinic self-monitoring of
articulation to promote generalization. Journal of Speech and Hearing Disorders, 53, 392–399.
Kratochwill, T. R., & Levin, J. R. (1992). Single-case research design and analysis: New directions for psychology and
education. Hillsdale, NJ: Lawrence Erlbaum Associates.
Kreb, R. A., & Wolf, K. E. (1997). Treatment outcomes terminology. In R. A. Kreb & K. E. Wolf (Eds.), Successful
operations in the treatment-outcomes-driven world of managed care. Rockville, MD: National Student Speech-Language-Hearing Association.
Lahey, M. (1988). Language disorders and language development. New York: Macmillan.
Long, S. H., & Olswang, L. B. (1996). Readiness and patterns of growth in children with SELI. Language, Speech, and
Hearing Services in Schools, 5, 79–85.
Maloney, D. M., Harper, T. M., Braukmann, C. J., Fixsen, D. L., Phillips, E. L., & Wolf, M. M. (1976). Teaching
conversation-related skills to pre-delinquent girls. Journal of Applied Behavior Analysis, 9, 371.
McCauley, R. J. (1996). Familiar strangers: Criterion-referenced measures in communication disorders. Language,
Speech, and Hearing Services in Schools, 27, 122–131.
McCauley, R. J., & Swisher, L. (1984). Use and misuse of norm-referenced tests in clinical assessment: A hypothetical
case. Journal of Speech and Hearing Disorders, 49, 338–348.
McReynolds, L. V. (1983). Discussion: VII. Evaluating program effectiveness. ASHA Reports, 12, 298–306.
McReynolds, L. V., & Kearns, K. P. (1983). Single-subject experimental designs in communicative disorders. Austin,
TX: Pro-Ed.
McReynolds, L. V., & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part I: Review of the
basics of single-subject designs. Journal of Speech and Hearing Disorders, 51, 194–203.
Mehrens, W., & Lehman, I. (1980). Standardized tests in education (3rd ed.). New York: Holt, Rinehart & Winston.
Minkin, N., Braukmann, C. J., Minkin, B. L., Timbers, G. D., Timbers, B. J., Fixsen, D. L., Phillips, E. L., & Wolf, M.
M. (1976). The social validation and training of conversational skills. Journal of Applied Behavior Analysis, 9, 127–139.
Mowrer, D. (1972). Accountability and speech therapy in the public schools. Asha, 14, 111–115.
Olswang, L. B. (1990). Treatment efficacy research: A path to quality assurance. Asha, 32, 45–47.
Olswang, L. B. (1993). Treatment efficacy research: A paradigm for investigating clinical practice and theory. Journal of
Fluency Disorders, 18, 125–131.
Olswang, L. B. (1998). Treatment efficacy research. In C. Frattali (Ed.), Measuring treatment outcomes in speech-
language pathology (pp. 134–150). New York: Thieme.
Olswang, L. B., & Bain, B. A. (1985). Monitoring phoneme acquisition for making treatment withdrawal decisions.
Applied Psycholinguistics, 6, 17–37.
Olswang, L. B., & Bain, B. A. (1994). Data collection: Monitoring children’s treatment progress. American Journal of
Speech-Language Pathology, 3, 55–66.
Olswang, L. B., & Bain, B. A. (1996). Assessment information for predicting upcoming change in language production.
Journal of Speech and Hearing Research, 39, 414–423.
Parsonson, B. S., & Baer, D. M. (1992). The visual analysis of data, and current research into the stimuli controlling it.
In T. R. Kratochwill & J. R. Levin (Eds.), Single-case research design and analysis: New directions for psychology and
education (pp. 15–40). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pedhazur, R. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Plante, E., & Vance, R. (1994). Selection of preschool speech and language tests: A data-based approach. Language,
Speech, and Hearing Services in Schools, 25, 15–23.
Primavera, L. H., Allison, D. B., & Alfonso, V. C. (1996). Measurement of dependent variables. In R. D. Franklin, D. B.
Allison, & B. S. Gorman (Eds.), Design and analysis of single-case research (pp. 41–89). Mahwah, NJ: Lawrence
Erlbaum Associates.
Rosen, A., & Proctor, E. K. (1978). Distinctions between treatment outcomes and their implications for treatment
process: The basis for effectiveness research. Journal of Social Service Research, 2, 25–43.
Rosen, A., & Proctor, E. K. (1981). Distinctions between treatment outcomes and their implications for treatment
evaluation. Journal of Consulting and Clinical Psychology, 49, 418–425.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment. (6th ed.). Boston: Houghton Mifflin.
Schmidt, R. A., & Bjork, R. A. (1992). New conceptualizations of practice: Common principles in three paradigms
suggest new concepts for training. Psychological Science, 3, 207–217.
Schreibman, L., & Carr, E. G. (1978). Elimination of echolalic responding to questions through the training of a
generalized verbal response. Journal of Applied Behavior Analysis, 11, 453–463.
Schwartz, I. S., & Olswang, L. B. (1996). Evaluating child behavior change in natural settings: Exploring alternative
strategies for data collection. Topics in Early Childhood Special Education, 16, 82–101.
Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals–3. San Antonio, TX:
Psychological Corporation.
Sturner, R. A., Layton, T. L., Evans, A. W., Heller, J. H., Funk, S. G., & Machon, M. W. (1994). Preschool speech and
language screening: A review of currently available tests. American Journal of Speech-Language Pathology, 3, 25–36.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge: Harvard
University Press.
Wilcox, M. J., & Leonard, L. B. (1978). Experimental acquisition of Wh-questions in language-disordered children.
Journal of Speech and Hearing Research, 21, 220–239.
Wolery, M. (1983). Proportional change index: An alternative for comparing child change data. Exceptional Children,
50, 167–170.
Wolf, M. M. (1978). Social validity: The case for subjective measurement or how applied behavior analysis is finding its
heart. Journal of Applied Behavior Analysis, 11, 203–214.
Young, M. A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and
Hearing Research, 36, 644–656.
APPENDIX A
Norm-Referenced Tests Designed for the Assessment of Language in Children,
Excluding Those Designed Primarily for Phonology (Appendix B)

Each entry gives the test name, ages covered, oral language modalities and domains, whether written language is included, the complete reference, and whether the test is reviewed in the MMY (x = computer form).

Assessing Semantic Skills Through Everyday Themes. Ages: 3 to 9 years. Oral: R and E-Sem. Written: no. MMY: no. Reference: Barrett, M., Zachman, L., & Huisingh, R. (1988). Assessing Semantic Skills Through Everyday Themes. East Moline, IL: LinguiSystems.

Bankson Language Test–2. Ages: 3 years to 6 years, 11 months. Oral: E-Sem, Morph, Syn, Prag. Written: no. MMY: x. Reference: Bankson, N. W. (1990). Bankson Language Test–2. San Antonio, TX: Pro-Ed.

Boehm Test of Basic Concepts–Preschool. Ages: 3 to 5 years. Oral: R-Sem. Written: no. MMY: x. Reference: Boehm, A. E. (1986). Boehm Test of Basic Concepts–Preschool Version. San Antonio, TX: Psychological Corporation.

Boehm Test of Basic Concepts–Revised. Ages: Kindergarten to Grade 2. Oral: R-Sem. Written: no. MMY: x. Reference: Boehm, A. E. (1986). Boehm Test of Basic Concepts–Revised. San Antonio, TX: Psychological Corporation.

Bracken Basic Concept Scale–Revised. Ages: 2½ to 8 years. Oral: R-Sem. Written: no. MMY: x. Reference: Bracken, B. A. (1986). Bracken Basic Concept Scale. San Antonio, TX: Psychological Corporation.

Carrow Elicited Language Inventory. Ages: 3 years to 7 years, 11 months. Oral: E-Morph, Syn. Written: no. MMY: x. Reference: Carrow-Woolfolk, E. (1974). Carrow Elicited Language Inventory. Austin, TX: Learning Concepts.

Clinical Evaluation of Language Fundamentals–3. Ages: 6 to 21 years. Oral: R- and E-Sem, Syn, Rapid Naming. Written: no. MMY: x. Reference: Semel, E., Wiig, E. H., & Secord, W. A. (1996). Clinical Evaluation of Language Fundamentals–3. San Antonio, TX: Psychological Corporation.

Clinical Evaluation of Language Fundamentals–Preschool. Ages: 3 to 6 years, 11 months. Oral: R and E-Sem, Syn. Written: no. MMY: x. Reference: Wiig, E. H., Secord, W., & Semel, E. (1992). Clinical Evaluation of Language Fundamentals–Preschool. San Antonio, TX: Psychological Corporation.

Communication Abilities Diagnostic Test. Ages: 3 to 9 years. Oral: R- and E-Sem, Syn, Prag. Written: no. MMY: x. Reference: Johnston, E. B., & Johnston, A. V. (1990). Communication Abilities Diagnostic Test. Chicago: Riverside.

Comprehensive Assessment of Spoken Language. Ages: 3 to 21 years. Oral: R and E-Sem, Morph, Syn, Prag. Written: no. MMY: no. Reference: Carrow-Woolfolk, E. (1999). Comprehensive Assessment of Spoken Language. Circle Pines, MN: American Guidance Service.

Comprehensive Receptive and Expressive Vocabulary Test. Ages: 4 to 17 years, 11 months. Oral: R and E-Sem. Written: no. MMY: x. Reference: Wallace, G., & Hammill, D. D. (1994). Comprehensive Receptive and Expressive Vocabulary Test. San Antonio, TX: Psychological Corporation.

Evaluating Acquired Skills in Communication–Revised. Ages: 3 months to 8 years. Oral: R- and E-Sem, Morph, Syn, Prag. Written: no. MMY: x. Reference: Riley, A. M. (1991). Evaluating Acquired Skills in Communication–Revised. San Antonio, TX: Psychological Corporation.

Expressive One-Word Picture Vocabulary Test–Revised. Ages: 2 to 12 years. Oral: E-Sem. Written: no. MMY: x. Reference: Gardner, M. F. (1990). Expressive One-Word Picture Vocabulary Test–Revised. Austin, TX: Pro-Ed.

Expressive Vocabulary Test. Ages: 2½ to 90 years. Oral: E. Written: no. MMY: no. Reference: Williams, K. T. (1997). Expressive Vocabulary Test. Circle Pines, MN: American Guidance Service.

Fullerton Language Test for Adolescents. Ages: 11 years to adult. Oral: R- and E-Sem, Morph, Syn. Written: no. MMY: x. Reference: Thorum, A. R. (1986). Fullerton Language Test for Adolescents (2nd ed.). San Antonio, TX: Pro-Ed.

Language Processing Test–Revised. Ages: 5 to 11 years, 11 months. Oral: E-Sem. Written: no. MMY: x. Reference: Richard, G. J., & Hanner, M. A. (1985). Language Processing Test–Revised. East Moline, IL: LinguiSystems.

Oral and Written Language Scales: Listening Comprehension and Oral Expression. Ages: 3 to 21 years for oral scales. Oral: R and E. Written: no. MMY: x. Reference: Carrow-Woolfolk, E. (1995). Oral and Written Language Scales: Listening Comprehension and Oral Expression. Circle Pines, MN: American Guidance Service.

Oral and Written Language Scales: Written Expression. Ages: 5 to 21 years. Oral: E. Written: Writing: Morph, Syn. MMY: x. Reference: Carrow-Woolfolk, E. (1996). Oral and Written Language Scales: Written Expression. Circle Pines, MN: American Guidance Service.

Patterned Elicitation Syntax Test With Morphophonemic Analysis. Ages: 3½ to 7 years. Oral: E-Sem, Morph, Syn. Written: no. MMY: x. Reference: Young, E. C., & Perachio, J. J. (1993). The Patterned Elicitation Syntax Test with Morphophonemic Analysis. Tucson, AZ: Communication Skill Builders.

Peabody Picture Vocabulary Test–III. Ages: 2½ to 90+ years. Oral: R-Sem. Written: no. MMY: no. Reference: Dunn, L., & Dunn, L. (1997). Peabody Picture Vocabulary Test–III. Circle Pines, MN: American Guidance Service.

Porch Index of Communicative Ability in Children. Ages: 4 to 12 years. Oral: R and E. Written: no. MMY: x. Reference: Porch, B. E. (1979). Porch Index of Communicative Ability in Children. Chicago: Riverside.

Preschool Language Scale–3. Ages: Birth to 6 years, 11 months. Oral: R and E-Sem, Morph, Syn. Written: no. MMY: no. Reference: Zimmerman, I. L., Steiner, V., & Pond, R. (1992). Preschool Language Scale–3. San Antonio, TX: Psychological Corporation.

Receptive One-Word Picture Vocabulary Test–Upper Extension. Ages: 12 years to 15 years, 11 months. Oral: R-Sem. Written: no. MMY: x. Reference: Brownell, R. (1987). Receptive One-Word Picture Vocabulary Test–Upper Extension. Novato, CA: Academic Therapy Publications.

Receptive One-Word Picture Vocabulary Test. Ages: 2 years, 11 months to 12 years. Oral: R-Sem. Written: no. MMY: x. Reference: Gardner, M. F. (1985). Receptive One-Word Picture Vocabulary Test. Novato, CA: Academic Therapy Publications.

Reynell Developmental Language Scales–U.S. Edition. Ages: 1 year to 6 years, 11 months. Oral: R and E. Written: no. MMY: x. Reference: Reynell, J., & Gruber, C. P. (1990). Reynell Developmental Language Scales: U.S. Edition. Windsor, Ontario, Canada: NFER-Nelson.

Structured Photographic Expressive Language Test–II. Ages: 4 to 9 years, 5 months. Oral: E-Morph, Syn. Written: no. MMY: MMY9a. Reference: Werner, E., & Kresheck, J. D. (1983). Structured Photographic Expressive Language Test–II. Sandwich, IL: Janelle.

Test for Examining Expressive Morphology. Ages: 3 years to 7 years, 11 months. Oral: E-Syn. Written: no. MMY: MMY10b. Reference: Shipley, K. G., Stone, T. A., & Sue, M. B. (1983). Test for Examining Expressive Morphology. Tucson, AZ: Communication Skill Builders.

Test of Adolescent and Adult Language–3. Ages: 12 to 21 years. Oral: R- and E-Sem, Morph, Syn. Written: Writing: Sem, Syn. MMY: x. Reference: Hammill, D. D., Brown, V. L., Larsen, S. C., & Wiederholt, J. L. (1994). Test of Adolescent and Adult Language–3. Austin, TX: Pro-Ed.

Test of Adolescent/Adult Word Finding. Ages: 12 to 80 years. Oral: E-WF. Written: no. MMY: x. Reference: German, D. J. (1990). Test of Adolescent/Adult Word Finding. San Antonio, TX: Psychological Corporation.

Test of Auditory Comprehension of Language–3. Ages: 3 years to 9 years, 11 months. Oral: R-Sem, Morph, Syn. Written: no. MMY: no. Reference: Carrow-Woolfolk, E. (1999). Test of Auditory Comprehension of Language–3. Austin, TX: Pro-Ed.

Test of Children's Language. Ages: 5 years to 8 years, 11 months. Oral: E. Written: Reading, writing. MMY: x. Reference: Barenbaum, E., & Newcomer, P. (1996). Test of Children's Language. San Antonio, TX: Pro-Ed.

Test of Early Language Development. Ages: 3 years to 7 years, 11 months. Oral: R- and E-Sem, Syn. Written: no. MMY: x. Reference: Hresko, W. P., Reid, K., & Hammill, D. D. (1991). Test of Early Language Development (2nd ed.). Austin, TX: Pro-Ed.

Test of Language Competence–Expanded. Ages: 5 to 18 years, 11 months. Oral: R and E-Sem, Syn, Prag. Written: no. MMY: x. Reference: Wiig, E. H., & Secord, W. (1989). Test of Language Competence–Expanded Edition. San Antonio, TX: Psychological Corporation.

Test of Language Development–Intermediate: 3. Ages: 8 years to 12 years, 11 months. Oral: R and E-Sem, Syn. Written: no. MMY: x. Reference: Hammill, D. D., & Newcomer, P. L. (1997). Test of Language Development–Intermediate: 3. Circle Pines, MN: American Guidance Service.

Test of Language Development–Primary: 3. Ages: 4 years to 8 years, 11 months. Oral: R- and E-Phon, Sem, Syn. Written: no. MMY: x. Reference: Newcomer, P., & Hammill, D. (1997). Test of Language Development–Primary: 3. Austin, TX: Pro-Ed.

Test of Pragmatic Language. Ages: 5 to 13 years, 11 months. Oral: R and E-Prag. Written: no. MMY: x. Reference: Phelps-Terasaki, D., & Phelps-Gunn, T. (1992). Test of Pragmatic Language. San Antonio, TX: Psychological Corporation.

Test of Pragmatic Skills (Revised). Ages: 3 to 8 years. Oral: R- and E-Sem, Prag. Written: no. MMY: no. Reference: Shulman, B. B. (1986). Test of Pragmatic Skills (Revised). Tucson, AZ: Communication Skill Builders.

Test of Relational Concepts. Ages: 3 years to 7 years, 11 months. Oral: R-Sem. Written: no. MMY: x. Reference: Edmonston, N., & Thane, N. L. (1988). Test of Relational Concepts. Austin, TX: Pro-Ed.

Test of Word Finding. Ages: 6½ to 12 years, 11 months. Oral: E-WF. Written: no. MMY: x. Reference: German, D. J. (1989). Test of Word Finding. San Antonio, TX: Psychological Corporation.

Test of Word Finding in Discourse. Ages: 6½ to 12 years, 11 months. Oral: E-WF. Written: no. MMY: x. Reference: German, D. J. (1991). Test of Word Finding in Discourse. Chicago: Riverside Publishing.

Test of Word Knowledge. Ages: 5 to 17 years. Oral: R- and E-Sem. Written: no. MMY: x. Reference: Wiig, E. H., & Secord, W. (1992). Test of Word Knowledge. San Antonio, TX: Psychological Corporation.

Test of Written Expression. Ages: 6½ years to 14 years, 11 months. Oral: none. Written: Writing. MMY: x. Reference: McGhee, R., Bryant, B. R., Larsen, S. C., & Rivera, D. M. (1995). Test of Written Expression. San Antonio, TX: Pro-Ed.

Test of Written Language–2. Ages: to 17 years, 11 months. Oral: E. Written: Writing. MMY: x. Reference: Hammill, D. D., & Larsen, S. C. (1988). Test of Written Language–2. San Antonio, TX: Psychological Corporation.

The Word Test–Adolescent. Ages: 12 years to 17 years, 11 months. Oral: E-Sem. Written: no. MMY: no. Reference: Bowers, L., Huisingh, R., Orman, J., Barrett, M., & LoGiudice, C. (1989). The Word Test–Adolescent. East Moline, IL: LinguiSystems.

The Word Test–Revised Elementary. Ages: 7 to 11 years. Oral: E-Sem. Written: no. MMY: no. Reference: Bowers, L., Huisingh, R., Barrett, M., LoGiudice, C., & Orman, J. (1990). The Word Test–Revised Elementary. East Moline, IL: LinguiSystems.

Token Test for Children. Ages: 3 to 12 years. Oral: R-Sem, Syn. Written: no. MMY: MMY9. Reference: DiSimoni, F. (1978). Token Test for Children. Chicago: Riverside.

Utah Test of Language Development–3. Ages: 3 years to 10 years, 11 months. Oral: R- and E-Syn. Written: no. MMY: x. Reference: Mecham, M. J. (1989). Utah Test of Language Development–3. Austin, TX: Pro-Ed.

Woodcock Language Proficiency Battery–Revised. Ages: 2 to 95 years. Oral: R- and E-Sem, Syn. Written: Reading, writing. MMY: x. Reference: Woodcock, R. W. (1991). Woodcock Language Proficiency Battery–Revised. Chicago: Riverside.

Note. Modalities and domains are abbreviated as follows: Receptive (R), Expressive (E), Semantics (Sem), Morphology (Morph), Syntax (Syn), Pragmatics (Prag), Phonology (Phon), and Word Finding (WF). The presence of a review in the Mental Measurements Yearbook (MMY) database or print series is noted in the MMY field, with x indicating a computerized version and numerals representing the specific print volume containing the review.
a. Mitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurement.
b. Conoley, J. C., & Kramer, J. J. (Eds.). (1989). The tenth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurement.
APPENDIX B
Norm-Referenced and Criterion-Referenced Tests Designed Primarily for the Assessment of Phonology in Children

Each entry gives the test with reference information, whether it is criterion-referenced (CR) and/or norm-referenced (NR), ages covered, stimuli, processes, and other features, and whether the test is reviewed in the MMY (x = computer form).

Assessment Link Between Phonology and Articulation: ALPHA (Revised ed.) (Lowe, 1995). Mifflinville, PA: Speech and Language Resources. Type: NR/CR. Ages: 3–0 to 8–11. Features: Sentences or single words elicited using delayed imitation and pictures; 15 processes examined. MMY: no.

Assessment of Phonological Processes–Revised (Hodson, 1986). Danville, IL: Interstate Press. Type: CR. Ages: Preschool to age 10. Features: Single words elicited using objects; 30 processes, including 10 basic processes used in calculating an overall score that allows classification of severity. MMY: x.

Arizona Articulation Proficiency Scale, 2nd ed. (Fudala & Reynolds, 1994). Los Angeles: Western Psychological Services. Type: NR/CR. Ages: 1–6 to 13–11. Features: Single words and stimuli for elicited connected speech; consonants, consonant clusters, vowels, and diphthongs are assessed; omission, substitution, and distortion error analysis; also allows calculation of severity. MMY: x.

Bankson-Bernthal Test of Phonology (Bankson & Bernthal, 1990). Chicago: Riverside Press. Type: NR/CR. Ages: 3–0 to 9–11. Features: Single words; 10 most frequently occurring processes in standardization samples. MMY: x.

Fisher-Logemann Test of Articulatory Competence (Fisher & Logemann, 1971). Boston: Houghton Mifflin. Type: CR. Ages: 2 to 3 years and up. Features: Single-word and sentence forms; consonants, consonant clusters, vowels, and diphthongs are assessed; place/manner/voicing analysis only. MMY: ?

Goldman-Fristoe Test of Articulation–2 (Goldman & Fristoe, 2000). Austin, TX: Pro-Ed. Type: NR/CR. Ages: 2 to 21 years. Features: 44 single words and 2 sets of pictures for connected speech elicitation; error analysis does not include features, but the Khan-Lewis is designed for use with the earlier version of this test. MMY: x.

Kaufman Speech Praxis Test for Children (Kaufman, 1995). Detroit: Wayne State University Press. Type: NR/CR. Ages: 2 years to 6 years. Features: Limited normative data; assesses productions at 4 levels: oral movement, simple phonemic/syllabic, complex phonemic/syllabic, and spontaneous length and complexity. MMY: no.

Khan-Lewis Phonological Analysis (Khan & Lewis, 1986). Circle Pines, MN: American Guidance Service. Type: NR/CR. Ages: 2 to 6 years. Features: Stimulus materials are those of the Goldman-Fristoe Test of Articulation–Revised; 15 phonological processes; one of the few tests with normative data for processes. MMY: x.

Natural Process Analysis (Shriberg & Kwiatkowski, 1980). New York: John Wiley & Sons. Type: CR. Ages: any age. Features: Analysis method for a continuous speech sample; 8 natural processes. MMY: no.

Phonological Process Analysis (Weiner, 1979). Baltimore: University Park Press. Type: CR. Ages: preschool children. Features: Single words or sentences; 16 phonological processes. MMY: no.

Photo Articulation Test (Pendergast, Dickey, Selmar, & Soder, 1984). Austin, TX: Pro-Ed. Type: NR/CR. Ages: 3 to 12 years. Features: Single words and stimuli to elicit connected speech, including consonants, consonant clusters, and diphthongs; omissions, substitutions, and distortions are scored. MMY: MMY9a.

Screening Test for Developmental Apraxia of Speech (Blakely, 1980). Austin, TX: Pro-Ed. Type: NR/CR. Ages: 4 to 12 years. Features: Scores given for expressive language discrepancy, vowels and diphthongs, oral motor movement, verbal sequencing, articulation, motorically complex words, transpositions, prosody, and total; based on a small population. MMY: x.

S-CAT: Secord Consistency of Articulation Tests (Secord, 1997). Sedona, AZ: Red Rock Educational Publications. Type: CR. Ages: all ages. Features: Two test components: (1) Contextual Probes of Articulation Competence (CPAC), which probe production of individual sounds and processes in words, clusters, and sentences; (2) Storytelling Probes of Articulation Competence (SPAC), which probe production in a narrative task. MMY: no.

Smit-Hand Articulation and Phonology Evaluation (SHAPE; Smit & Hand, 1997). Los Angeles: Western Psychological Services. Type: NR/CR. Ages: 3 to 9 years. Features: Single words elicited through pictures or delayed imitation; 11 processes examined. MMY: no.

Templin-Darley Tests of Articulation (Templin & Darley, 1969). Iowa City, IA: Bureau of Educational Research and Service, University of Iowa. Type: NR/CR. Ages: 3 to 8 years. Features: Single words; several subtests, including screening, the Iowa Pressure consonants test (those affected by velopharyngeal insufficiency), vowels, and diphthongs; omissions, substitutions, and distortions are scored. MMY: MMY7b.

Note. The presence of a review in the Mental Measurements Yearbook (MMY) is noted in the MMY field, with x indicating a computerized version and numerals representing the specific print volume containing the review.
a. Mitchell, J. V. (Ed.). (1985). The ninth mental measurements yearbook. Lincoln, NE: Buros Institute of Mental Measurement.
b. Buros, O. K. (Ed.). (1972). The seventh mental measurements yearbook. Highland Park, NJ: Gryphon Press.
AUTHOR INDEX
Entries in italics appear in reference lists.
A
Abbeduto, L., 149, 155, 166
Abkarian, G., 160, 164
Aboitiz, F., 119, 141
Agerton, E. P., 273, 287
Aitken, K., 169, 186
Alcock, K., 118, 145
Alfonso, V. C., 264, 280, 291, 306, 326
Allen, D., 114, 130, 143, 171, 172, 173, 178, 186, 233
Allen, J., 175, 184
Allen, M. J., 22, 47, 55, 56, 57, 58, 59, 66, 68, 76
Allen, S., 231, 245
Allison, D. B., 264, 280, 291, 306, 307, 308, 315, 317, 325, 326
Ambrose, W. R., 222, 246
American College of Medical Genetics, 153, 164
American Educational Research Association (AERA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287
American Psychiatric Association, 111, 114, 115, 130, 134, 140, 148, 149, 150, 161, 164, 169, 170, 171, 172, 173, 178,
180, 181, 182, 183, 184
American Psychological Association (APA), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 217, 228, 244, 252, 287
American Speech-Language-Hearing Association (ASHA), 82, 84, 85, 104, 107, 196, 207, 264, 287
Anastasi, A., 36, 47, 55, 60, 61, 62, 76, 96, 107, 296, 324
Andrellos, P. J., 237, 246
Andrews, J. F., 197, 198, 210
Angell, R., 181, 184
Annahatak, B., 231, 245
Apel, K., 241, 242, 245
Aram, D. M., 116, 117, 118, 119, 128, 130, 140, 143, 170, 184, 231, 240, 245, 270, 287
Archer, P., 213, 246
Arensberg, K., 229, 249
Arndt, S., 171, 185
Arndt, W. B., 259, 288, 294, 324
Aspedon, M., 238, 248
Augustine, L. E., 82, 108, 229, 230, 245
B
Bachelet, J. F., 270, 291
Bachman, L. F., 103, 107
Baddeley, A., 125, 141
Badian, N., 27, 47
Baer, D. M., 308, 326
Bailey, D., 237, 245
Bain, B. A., 230, 248, 251, 252, 255, 256, 276, 277, 278, 279, 286, 287, 291, 294, 295, 296, 297, 298, 299, 300, 302,
303, 304, 305, 307, 308, 309, 310, 311, 314, 315, 316, 324, 326
Baker, K. A., 239, 247
Baker, L., 133, 140
Baker, N. E., 7, 13
Baker-van den Goorbergh, L., 269, 287
Ball, E. W., 137, 140
Balla, D., 163, 166, 215, 249
Baltaxe, C. A. M., 176, 184
Bangert, J., 259, 288, 294, 302, 324
Bankson, N. W., 29, 40, 47, 298, 324, 329, 335
Barenbaum, E., 332
Baron-Cohen, S., 175, 184
Barrett, M., 329, 333
Barrow, J. D., 252, 287
Barsalou, L.W., 7, 12
Barthelemy, C., 181, 185
Bashir, A., 135, 140, 241, 245
Bates, E., 237, 245, 246, 268, 287
Batshaw, M. L., 149, 164
Battaglia, F., 153, 154, 155, 160, 166
Baumeister, A. A., 147, 149, 152, 154, 156, 158, 164
Baumgartner, J. M., 120, 142
Beasley, T. M., 308, 325
Beck, A. R., 266, 283, 287
Becker, J., 170, 185
Bedi, G., 125, 126, 144
Bedor, L., 125, 142
Beitchman, J. H., 130, 140
Bejar, I. I., 66, 77
Bell, J. J., 158, 165
Bellenir, K., 151, 152, 159, 164
Bellugi, U., 158, 166, 188, 198, 207, 208
Benavidez, D. A., 178, 181, 185
Berg, B. L., 279, 280, 287
Bergstrom, L., 197, 207
Beringer, M., 232
Berk, R. A., 6, 13, 56, 76, 158, 163, 165, 256, 287
Berkley, R. K., 251, 289
Berlin, L. J., 258
Bernthal, J. E., 29, 40, 47, 215, 246, 298, 324, 335
Berns, S. B., 304, 325
Bess, F. H., 189, 192, 203, 207
Bettelheim, B., 173, 184
Biederman, J., 161, 165
Bihrle, A., 158, 166
Biklen, S. K., 279, 280, 287
Bishop, D. V. M., 114, 118, 119, 124, 130, 140, 141, 144, 269, 287
Bjork, R. A., 251, 292, 309, 326
Blackmon, R., 283, 292
Blake, J., 270, 287
Blakeley, R. W., 337
Blank, M., 258
Bliss, L. S., 103, 107, 233
Bloodstein, O., 159, 165
Boehm, A. E., 329
Bogdan, R. C., 279, 280, 287, 292
Bondurant, J., 121, 141
Botting, N., 238, 245
Boucher, J., 181, 185
Bow, S., 121, 122, 141
Bowers, L., 333
Bracken, B. A., 215, 245, 329
Brackett, D., 191, 192, 194, 195, 200, 203, 207, 209
Bradley, L., 27, 47
Bradley-Johnson, S., 188, 189, 200, 201, 204, 207
Braukmann, C. J., 304, 325, 326
Bredart, S., 270, 291
Breecher, S. V. A., 238, 248
Brennan, R. L., 102, 108
Bretherton, I., 237, 245
Bridgman, P. W., 19, 47
Brinton, B., 133, 141, 241, 245, 262, 287
Broks, P., 173, 185
Bronfenbrenner, U., 79, 107
Brown, A. L., 226, 245, 272, 277, 287
Brown, J., 226, 245
Brown, R., 266, 287
Brown, S., 230, 246, 255, 278, 289
Brown, V. L., 332
Brownell, R., 331
Brownlie, E. B., 133, 140
Bruneau, N., 181, 185
Bryant, B. R., 59, 77, 333
Bryant, P., 27, 47
Brzustowicz, L., 117, 141
Buckwalter, P., 114, 118, 145
Bunderson, C. V., 31, 47
Buros, O., 337
Burroughs, E. I., 262, 287
Butkovsky, L., 122, 143
Butler, K. G., 132, 145, 255, 287
Byma, G., 125, 144
Bzoch, K. R., 59, 76, 103, 107, 237, 245, 258
C
Cacace, A. T., 192, 207
Cairns, H. S., 259, 290
Calhoon, J. M., 281, 287
Camarata, M., 122, 143
Camarata, S., 115, 122, 141, 143, 192, 209
Camaioni, L., 237, 245
Campbell, D., 55, 76, 306, 324
Campbell, M., 184
Campbell, R., 120, 141
Campbell, T., 223, 230, 245, 262, 264, 265, 268, 274, 275, 287, 288, 294, 295, 296, 304, 305, 307, 309, 311, 324
Campione, J., 277, 287
Cantekin, E. I., 191, 208
Cantwell, D., 133, 140
Carney, A. E., 189, 194, 196, 203, 207
Carpentieri, S., 171, 185
Carr, E. G., 317, 326
Carr, L., 124, 142
Carrow-Woolfolk, E., 232, 329, 330, 331, 332
Carver, R., 57, 76, 312, 313, 324
Casby, M., 128, 141
Caspi, A., 313, 324
Castelli, M. C., 237, 245
Chabon, S. S., 295, 310, 317, 324
Channell, R. W., 269, 290
Chapman, A., 238, 248
Chapman, J. P., 8, 13
Chapman, L. J., 8, 13
Chapman, R., 158, 166, 268, 272, 287, 290
Cheng, L. L., 231, 245
Chial, M. R., 30, 47
Chipchase, B. B., 130, 144
Chomsky, N., 123, 141
Chung, M. C., 174, 175, 184
Cibis, G., 161, 166
Cicchetti, D., 163, 166, 215, 249
Cirrin, F. M., 241, 245, 255, 273, 288
Clahsen, H., 124, 141
Clark, M., 118, 121, 122, 135, 141, 143
Cleave, P. L., 116, 124, 128, 141, 144, 231, 246
Clegg, M., 133, 140
Cochran, P. S., 269, 270, 288
Coe, D., 178, 181, 185
Cohen, I. L., 153, 165, 173, 178, 184
Cohen, M., 120, 141, 149, 164, 165
Cohen, N. J., 134, 141
Cohrs, M., 215, 246
Compton, A. J., 232
Compton, C., 104, 108, 232, 245
Conant, S., 270, 288
Conboy, B., 230, 246, 255, 278, 289
Connell, P. J., 317, 324
Connor, M., 152, 161, 165
Conoley, J. C., 103, 104, 108, 333
Conover, W. M., 30, 47
Conti-Ramsden, G., 238, 245
Cook, T. D., 306, 324
Cooke, A., 158, 165
Cooley, W. C., 151, 152, 165
Cooper, J., 126, 144
Cordes, A. K., 66, 68, 76
Corker, M., 195, 207
Coryell, J., 195, 196, 207
Coster, W. J., 237, 246
Courchesne, E., 173, 184, 185
Crago, M., 118, 124, 135, 141, 142, 231, 245
Craig, H. K., 133, 141, 268
Crais, E. R., 79, 81, 108, 236, 245, 281, 288
Creaghead, N. A., 10, 13, 282, 288
Creswell, J. W., 279, 288
Crittenden, J. B., 196, 207
Cromer, R., 149, 165
Cronbach, L. J., 66, 76
Crutchley, A., 238, 245
Crystal, D., 268, 269, 288
Cueva, J. E., 184
Culatta, B., 23, 48
Culbertson, J. L., 189, 192, 203, 207, 208
Cunningham, C., 121, 122, 141
Curtiss, S., 117, 118, 144
Cushman, B. B., 295, 310, 317, 324
D
Dale, P., 237, 246
Damasio, A. R., 181, 184
Damico, J. S., 82, 108, 229, 230, 241, 245, 251, 252, 253, 255, 257, 274, 275, 283, 284, 285, 286, 288
D’Angiola, N., 176, 184
Daniel, B., 271, 291
Darley, F., 251, 290, 337
Davidson, R., 270, 291
Davies, C., 238, 249
Davine, M., 134, 141
Davis, B., 128, 145
Dawes, R. M., 8, 13
Day, K., 271, 291
de Villiers, J., 262, 291
de Villiers, P., 262, 291
DeBose, C. E., 229, 231, 249
DellaPietra, L., 235, 245
Demers, S. T., 84, 85, 108
Denzin, N. K., 279, 288
Derogatis, L. R., 235, 245
Deyo, D. A., 196, 209
Dickey, S., 337
Diedrich, W. M., 259, 288, 294, 302, 324
DiLavore, P., 174, 175, 184
Dirckx, J. H., 197, 208
DiSimoni, F., 333
Dobrich, W., 135, 144
Dodds, J., 213, 215, 246
Doehring, D. G., 231, 245
Dollaghan, C., 223, 230, 245, 251, 262, 264, 265, 268, 274, 275, 287, 288, 296, 297, 298, 299, 300, 302, 303, 304, 305,
307, 308, 309, 315, 316, 324
Donahue-Kilburg, G., 82, 83, 108, 203, 208
Donaldson, M. D. C., 158, 165
Dowdy, C. A., 134, 141
Downey, J., 158, 165
Downs, M. P., 188, 189, 190, 191, 193, 194, 197, 207, 209
Dubé, R. V., 196, 208
Dublinske, S., 241, 245
Duchan, J., 253, 259, 262, 263, 279, 289, 290
Dunn, Leota, 40, 51, 57, 71, 76, 232, 245, 331
Dunn, Lloyd, 40, 51, 57, 71, 76, 232, 245, 331
Dunn, M., 171, 172, 186, 240, 245
Durkin, M. S., 147, 148, 165
Dykens, E. M., 149, 152, 153, 158, 159, 161, 164, 165
E
Eaton, L. F., 161, 165
Eaves, L. C., 171, 184
Edelson, S. M., 181, 185
Edmonston, A., 333
Edwards, E. B., 241, 245
Edwards, J., 118, 124, 141, 142
Edwards, S., 159, 160, 164, 166
Eger, D., 295, 306, 310, 317, 318, 320, 324
Ehlers, S., 171, 174, 175, 180, 184, 185
Ehrhardt, A. A., 158, 165
Eichler, J. A., 191, 208
Eisele, J. A., 119, 140
Ekelman, B., 130, 140
Elbert, M., 259, 288, 294, 324
Elcholtz, G., 283, 292
Ellis, J., 23, 48
Ellis Weismer, S., 124, 125, 141, 237, 249
Embretson, S. E., 279, 288
Emerick, L. L., 215, 248
Engen, E., 202, 208
Engen, T., 202, 208
Erickson, J. G., 62, 77
Evans, A. W., 104, 109, 238, 239, 249, 296, 312, 326
Evans, J., 125, 141, 266, 267, 268, 288
Evans, L. D., 188, 189, 200, 201, 204, 207
Eyer, J., 125, 142
F
Fandal, A., 215, 246
Farmer, M., 133, 141
Faust, D., 7, 8, 13
Fay, W., 176, 184
Feeney, J., 215, 246
Fein, D., 171, 172, 173, 178, 186
Feinstein, C., 171, 172, 173, 178, 186
Feldt, L. S., 102, 108
Fenson, L., 237, 246
Ferguson, B., 133, 140
Ferguson-Smith, M., 152, 161, 165
Fergusson, D. M., 313, 324
Feuerstein, R., 276, 278, 288
Fey, M., 116, 128, 141, 221, 231, 246, 269, 290, 309, 325
Finnerty, J., 269, 288
Fiorello, C., 84, 85, 108
Fisher, H. B., 335
Fiske, D. W., 55, 76
Fixsen, D. L., 304, 325, 326
Flax, J., 240, 247
Fleiss, J. L., 68, 77
Fletcher, J. M., 21, 48
Fletcher, P., 118, 145, 269, 288
Flexer, C., 188, 195, 199, 208
Fluharty, N., 239, 246
Flynn, S., 260, 290
Foley, C., 260, 290
Folstein, S., 173, 184
Foster, R., 258
Foster, S. L., 298, 304, 325
Fowler, A. E., 159, 165
Fox, R., 157, 165
Francis, D. J., 21, 48
Frankenburg, W. K., 213, 215, 246
Franklin, R. D., 307, 308, 315, 317, 325
Fraser, G. R., 197, 208
Frattali, C., 87, 108, 251, 288, 295, 303, 306, 318, 319, 325
Fredericksen, N., 66, 77
Freedman, D., 30, 48
Freese, P., 130, 133, 135, 143, 145
Freiberg, C., 266, 290
Fria, T. J., 191, 208
Fristoe, M., 336
Frith, U., 173, 178, 181, 184
Fudala, J., 335
Fujiki, M., 133, 141, 262, 287
Fukkink, R., 314, 325
Funk, S. G., 104, 109, 238, 239, 249, 296, 312, 326
G
Gabreels, F., 147, 166
Gaines, R., 161, 166
Galaburda, A., 119, 141
Gardiner, P., 157, 167
Gardner, M. F., 40, 48, 60, 77, 100, 108, 233, 330, 331
Garman, M. L., 269, 288
Garreau, B., 181, 185
Gathercole, S., 125, 141
Gauger, L., 119, 121, 141
Gavin, W. J., 266, 271, 289
Geers, A. E., 201, 202, 208, 209
Geirut, J., 251, 262, 289, 303, 325
Gerken, L., 261, 289
German, D. J., 54, 77, 332, 333
Gertner, B. L., 133, 141
Geschwind, N., 119, 120, 141, 142
Geschwint-Rabin, J., 154, 166
Ghiotto, M., 270, 291
Gibbons, J. D., 30, 47
Giddan, J. J., 258
Gilbert, L. E., 189, 203, 208
Giles, L., 271, 289
Gilger, J. W., 117, 118, 140, 142
Gillam, R., 126, 128, 140, 142, 145
Gillberg, C., 171, 174, 175, 180, 184, 185
Girolametto, L., 237, 246
Glaser, R., 58, 77
Gleser, G. D., 66, 76
Glesne, C., 279, 303, 325
Goldenberg, D., 215, 247
Goldfield, B. A., 302
Goldman, R., 336
Goldman, S., 134, 142
Goldsmith, L., 204, 208, 237, 248
Goldstein, H., 251, 262, 289, 303, 325
Golin, S., 273, 292
Golinkoff, R. M., 261, 289
Good, R., 234, 248
Goodluck, H., 261, 289
Gopnik, M., 118, 124, 135, 141, 142
Gordon-Brannan, M., 241, 242, 245
Gorman, B. S., 307, 308, 315, 317, 325
Gottlieb, M. L., 165
Gottsleben, R., 269, 292
Gould, S. J., 20, 47, 48
Graham, J. M., 151, 152, 165
Grandin, T., 178, 184
Green, A., 161, 166
Green, J. A., 239, 249
Greene, S. A., 158, 165
Grela, B., 125, 142
Grievink, E. H., 205, 209
Grimes, A. M., 241, 245
Gronlund, N., 31, 48, 67, 71, 76, 77
Grossman, H. J., 149, 165
Gruber, C. P., 331
Gruen, R., 158, 165
Gruner, J., 120, 144
Guerin, P., 181, 185
Guidubaldi, J., 201, 209
Guitar, B., 10, 13, 238, 249, 264, 292
Gutierrez-Clellen, V. F., 230, 237, 246, 255, 278, 289
H
Haas, R. H., 173, 185
Haber, J. S., 239, 246
Hadley, P., 121, 133, 141, 142, 237, 246
Haley, S. M., 237, 246
Hall, N. E., 116, 128, 135, 140, 142, 170, 184, 231, 245, 270, 287
Hall, P., 259, 290
Hall, R., 283, 292
Hallin, A., 184
Haltiwanger, J. T., 237, 246
Hammer, A. L., 96, 100, 108
Hammill, D. D., 40, 48, 54, 57, 77, 103, 109, 233, 240, 247, 330, 332, 333
Hand, L., 337
Hanna, C., 233
Hanner, M. A., 330
Hansen, J. C., 224, 244, 246
Harper, T. M., 171, 185, 304, 325
Harris, J., 195, 208
Harris, J. L., 229, 231, 246
Harrison, M., 194, 208
Harryman, E., 215, 248
Hartley, J., 268, 289
Hartung, J., 237, 246
Haynes, W. O., 215, 248
Hecaen, H., 120, 144
Hedrick, D., 233, 236, 237, 246
Heller, J. H., 104, 109, 238, 239, 249, 296, 312, 326
Hemenway, W. G., 197, 207
Hersen, M., 164, 165
Hesketh, L. J., 125, 141
Hesselink, J., 121, 142
Hicks, P. L., 317, 318, 325
Hirshoren, A., 222, 246
Hirsh-Pasek, K., 261, 289
Hixson, P. K., 269, 289
Ho, H. H., 171, 184
Hodapp, R. M., 149, 152, 153, 158, 159, 161, 164, 165
Hodson, B., 241, 242, 245, 335
Hoffman, M., 276, 278, 288
Hoffman, P., 276, 291
Holcomb, T. K., 195, 196, 207
Holmes, D. W., 199, 208
Hopkins, J., 104, 108, 217, 238, 246
Horodezky, N., 134, 141
Horwood, L. J., 313, 324
Howard, S., 268, 289
Howe, C., 153, 154, 155, 160, 166
Howlin, P., 133, 144
Hresko, W., 54, 77, 103, 109, 233, 332
Hsu, J. R., 68, 77
Hsu, L. M., 68, 77
Huang, R., 104, 108, 217, 238, 246
Huisingh, R., 329, 333
Hummel, T. J., 235, 246
Hurford, J. R., 132, 140, 142
Hutchinson, T. A., 96, 107, 108, 222, 246, 248
Hux, K., 238, 248
I
Iglesias, A., 278, 291
Impara, J. C., 103, 104, 108
Ingham, J., 304, 325
Inglis, A., 133, 140
Ingram, D., 124, 142
Inouye, D. K., 31, 47
Isaacson, L., 134, 141
J
Jackson, D. W., 121, 142, 194, 200, 204, 209
Jackson-Maldonado, D., 237, 246
Jacobson, J. W., 148, 165, 304, 325
Janesick, V. J., 280, 289
Janosky, J., 230, 245
Jauhiainen, T., 204, 210
Jenkins, W., 125, 126, 143, 144
Jensen, M., 276, 288
Jernigan, T., 121, 142
Johansson, M., 171, 180, 185
Johnson, G. A., 277, 291
Johnson, G., 230, 248
Johnston, A. V., 330
Johnston, E. G., 330
Johnston, J. R., 115, 142
Johnston, P., 126, 143
Jones, S. S., 31, 48
Juarez, M. J., 222, 248
K
Kahneman, D., 8, 13
Kalesnik, J. O., 215, 216, 235, 236, 238, 248
Kallman, C., 125, 144
Kamhi, A., 8, 13, 115, 116, 128, 142, 216, 229, 231, 241, 245, 246
Kanner, L., 181, 185
Kaplan, C. A., 130, 144
Kapur, Y. P., 194, 197, 198, 208
Karchmer, M. A., 204, 208
Kaufman, A. S., 215, 246
Kaufman, N. L., 215, 246, 336
Kayser, H., 229, 231, 246, 247
Kazdin, A. E., 297, 304, 305, 324, 325
Kazuk, E., 215, 246
Kearns, K. P., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 325, 326
Kelley, D. L., 279, 289
Kelly, D. J., 217, 247
Kemp, K., 266, 267, 270, 289
Kent, J. F., 64, 77
Kent, R. D., 8, 13, 64, 77
Kerlinger, F. N., 19, 48
Keyser, D. J., 104, 108
Khan, L. M., 336
King, J. M., 161, 166
Kingsley, J., 156, 166
Klaus, D. J., 58, 77
Klee, T., 189, 192, 203, 207, 266, 267, 270, 289, 290
Klein, S. K., 203, 208
Kline, M., 232
Koegel, L. K., 304, 325
Koegel, R., 304, 325
Koller, H., 156, 161, 166
Kovarsky, D., 253, 279, 286, 289
Kozak, V. J., 201, 202, 209
Kramer, J. J., 333
Krassowski, E., 127, 142, 231, 247
Kratochwill, T. R., 307, 315, 317, 325
Kreb, R. A., 319, 323, 324, 325
Kresheck, J., 215, 232, 248, 331
Kretschmer, R., 121, 141
Kuder, G. F., 69, 77
Kuehn, D. P., 120, 142
Kulig, S. G., 239, 247
Kunze, L., 239, 249
Kwiatkowski, J., 336
L
Lahey, M., 10, 13, 48, 118, 124, 128, 141, 142, 223, 231, 247, 269, 289, 303, 325
Lancee, W., 133, 140
Lancy, D., 279, 289
Landa, R. M., 273, 289
Larsen, S. C., 332, 333
Larson, L., 199, 210
Layton, T. L., 104, 109, 199, 208, 238, 239, 249, 296, 312, 326
Le Couteur, A., 175, 185
League, R., 59, 76, 103, 107, 237, 246, 258
Leap, W. L., 231, 247
Leckman, J. F., 149, 152, 153, 159, 164, 165
Lee, L., 218, 247
Lehman, I., 296, 326
Lehr, C. A., 215, 247
Lehrke, R. G., 152, 166
Lemme, M. L., 120, 142
Leonard, C., 119, 121, 141
Leonard, L., 114, 117, 118, 119, 121, 122, 123, 124, 125, 126, 128, 130, 131, 132, 137, 140, 142, 221, 223, 230, 240,
247, 251, 270, 289, 317, 327
Leverman, D., 233
Levin, J. R., 307, 315, 317, 325
Levitsky, W., 120, 142
Levitz, M., 156, 166
Levy, D., 125, 143
Lewis, N. P., 336
Lidz, C. S., 255, 276, 278, 289
Lillo-Martin, D., 188, 198, 207, 208
Lincoln, A. J., 173, 185
Lincoln, Y. S., 279, 288
Linder, T. W., 281, 289
Ling, D., 195, 208
Linkola, H., 204, 210
Lipsett, L., 134, 141
Locke, J., 125, 143
Loeb, D., 124, 143
Logemann, J. A., 335
Logue, B., 271, 291
LoGuidice, C., 333
Lombardino, L., 119, 121, 141
Loncke, F., 192, 209
Long, S. H., 116, 128, 141, 231, 246, 266, 269, 270, 278, 289, 290, 310, 314, 325
Longobardi, E., 237, 245
Lonsbury-Martin, B. L., 206, 208
Lord, C., 174, 175, 184, 185
Love, S. R., 178, 181, 185
Lowe, R., 335
Lubetsky, M. J., 147, 161, 166
Lucas, C. R., 259, 290
Luckasson, R., 148, 166
Ludlow, L. H., 237, 246
Lugo, D. E., 232, 245
Lund, N. J., 253, 259, 262, 263, 290
Lust, B., 260, 290
Lyman, H. B., 76
M
Machon, M. W., 104, 109, 238, 239, 249, 296, 312, 326
Macmillan, D. L., 148, 149, 166
MacWhinney, B., 268, 269, 287, 290
Maino, D. M., 161, 166
Maloney, D. M., 304, 325
Malvy, J., 181, 185
Marchman, V., 237, 246
Mardell-Czudnoswki, C., 215, 247
Marks, S., 158, 166
Marlaire, C. L., 63, 77, 236, 244, 247
Martin, G. K., 206, 208
Mash, E. J., 298, 304, 325
Masterson, J. J., 269, 270, 288, 290
Matese, M. J., 178, 181, 185
Matkin, N. D., 189, 203, 209
Matson, J. L., 178, 181, 185
Matthews, R., 133, 145
Mauk, G. W., 194, 208
Maurer, R. G., 181, 184
Mawhood, L., 133, 144
Maxon, A., 191, 192, 194, 200, 209
Maxwell, L. A., 154, 166, 199, 200, 208
Maxwell, M. M., 253, 279, 289
Maynard, D. W., 63, 77, 236, 244, 247
McCarthy, D. A., 215, 249
McCauley, R. J., 7, 12, 13, 35, 38, 48, 102, 104, 108, 217, 220, 225, 231, 234, 238, 247, 249, 251, 252, 253, 256, 264,
290, 292, 296, 299, 312, 313, 326
McClave, J. T., 30, 48
McDaniel, D., 259, 290
McFarland, D. J., 192, 207
McGhee, R., 333
McGlinchey, J. B., 304, 325
McKee, C., 259, 290
McReynolds, L. V., 68, 76, 77, 274, 275, 286, 290, 307, 308, 309, 314, 315, 317, 326
Mecham, M. J., 333
Meehl, P. E., 8, 13, 223, 247
Mehrens, W., 296, 326
Mellits, D., 125, 144
Membrino, I., 266, 289
Menolascino, F. J., 161, 165
Menyuk, P., 130, 143
Merrell, A. M., 104, 108, 217, 242, 247
Mervis, C. B., 158, 166
Merzenich, M., 125, 126, 143, 144
Messick, S., 4, 13, 76, 77, 252, 290
Mient, M. G., 295, 310, 317, 324
Miller, J. F., 158, 166, 230, 235, 236, 249, 252, 253, 258, 262, 263, 266, 268, 269, 270, 271, 273, 284, 288, 290, 292
Miller, R., 276, 288
Miller, S., 125, 126, 143, 144
Miller, T. L., 217, 248
Milone, M. N., 204, 208
Minifie, F., 251, 290
Minkin, B. L., 304, 326
Minkin, N., 304, 326
Mislevy, R. J., 66, 77
Mitchell, J. V., 333, 337
Moeller, M. P., 189, 194, 196, 200, 203, 207, 208
Moellman-Landa, R., 273, 290
Moffitt, T. E., 313, 324
Mogford, K., 195, 208
Mogford-Bevan, K., 188, 203, 208
Moldonado, A., 233
Montgomery, A. A., 104, 109
Montgomery, J. K., 241, 242, 248
Moog, J. S., 201, 202, 208, 209
Moores, D. F., 196, 209
Morales, A., 271, 291
Moran, M. J., 273, 287
Mordecai, D. R., 269, 290
Morgan, S. B., 171, 184
Morishima, A., 158, 165
Morisset, C., 237, 245
Morris, P., 79, 107
Morris, R., 116, 128, 140, 171, 172, 173, 178, 186, 231, 245, 252, 270, 287, 290
Morriss, D., 271, 291
Mowrer, D., 294, 317, 326
Mulick, J. A., 148, 165
Muller, D., 268, 289
Muma, J., 79, 108, 217, 247, 253, 271, 290, 291
Murphy, L. L., 104, 108
Musket, C. H., 199, 209
Myles, B. S., 170, 185
N
Nagarajan, S., 125, 126, 144
Nair, R., 133, 140
Nanda, H., 66, 76
Nation, J., 130, 140
National Council on Measurement in Education (NCME), 10, 12, 31, 47, 50, 62, 72, 75, 76, 89, 96, 105, 107, 252, 287
Needleman, H., 230, 245
Neils, J., 117, 118, 143
Nelson, K. E., 122, 143, 192, 209
Nelson, N. W., 235, 247, 282, 291
Newborg, J., 201, 209
Newcomer, P. L., 40, 48, 57, 77, 103, 108, 240, 247, 332
Newcorn, J. H., 161, 165
Newhoff, M., 121, 143
Newman, P. W., 10, 13
Newport, E., 198, 209
Nicolosi, L., 215, 248
Nielsen, D. W., 6, 10, 13
Nippold, M. A., 104, 108, 134, 143, 217, 238, 246, 248
Nitko, A. J., 49, 58, 67, 71, 77
Nordin, V., 171, 174, 175, 184, 185
Norris, J., 276, 291
Norris, M. K., 222, 248
Norris, M. L., 239, 246
Northern, J. L., 188, 189, 190, 191, 193, 194, 207, 209
Nunnally, J., 225, 248
Nuttall, E. V., 215, 216, 235, 236, 238, 248
Nyden, A., 171, 180, 185
Nye, C., 241, 242, 248
O
O’Brien, M., 114, 145
O’Grady, L., 188, 207
Olsen, J. B., 31, 47
Olswang, L. B., 128, 129, 143, 223, 230, 248, 249, 251, 252, 255, 256, 273, 276, 277, 278, 279, 280, 286, 287, 289, 290,
291, 292, 294, 295, 296, 297, 298, 302, 303, 304, 305, 307, 310, 311, 314, 317, 318, 319, 323, 324, 325, 326
Onorati, S., 270, 287
Orman, J., 333
Ort, S. I., 149, 165
Owens, R. E., 221, 248
Oyler, A. L., 189, 203, 209
Oyler, R. F., 189, 203, 209
P
Padilla, E. R., 232, 245
Page, J. L., 23, 48, 181, 185
Palin, M. W., 269, 290
Palmer, P., 171, 185, 269, 290
Pan, B. A., 174, 185
Panagos, J., 268, 291
Pang, V. O., 231, 248
Papoudi, D., 169, 186
Parsonson, B. S., 308, 326
Passingham, R., 118, 145
Patell, P. G., 133, 140
Patton, J. R., 134, 141
Paul, P. V., 194, 196, 200, 204, 207, 209
Paul, R., 128, 143, 174, 176, 177, 185, 203, 209, 221, 222, 223, 248, 253, 262, 263, 268, 290, 291
Payne, K. T., 228, 229, 249
Pedhazur, R. J., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326
Pembrey, M., 117, 143
Peña, E., 230, 248, 255, 276, 278, 289
Pendergast, K., 337
Penner, S. G., 255, 273, 288
Perachio, J. J., 331
Perkins, M. N., 222, 248
Perozzi, J. A., 251, 289
Perret, Y. M., 149, 164
Peshkin, A., 279, 303, 325
Peters, S. A. F., 205, 209
Pethick, S., 237, 246
Phelps-Gunn, T., 332
Phelps-Terasaki, D., 332
Phillips, E. L., 304, 325, 326
Piercy, M., 125, 144
Pindzola, R. H., 215, 248
Pisani, R., 30, 48
Piven, J., 171, 185
Plake, B. S., 104, 108
Plante, E., 104, 108, 116, 118, 120, 121, 127, 135, 141, 142, 143, 217, 218, 220, 222, 231, 242, 247, 299, 326
Plapinger, D., 199, 209
Poizner, H., 198, 208
Pollock, K. E., 229, 231, 246
Polloway, E. A., 134, 141
Pond, R. E., 59, 77, 233, 331
Porch, B. E., 274, 331
Prather, E. M., 233, 236, 237, 238, 246, 248
Prelock, P. A., 241, 245, 268, 282, 289, 291
Primavera, L. H., 264, 280, 291, 306, 326
Prinz, P., 196, 198, 209
Prizant, B. M., 171, 174, 185, 214, 248
Proctor, E. K., 294, 326
Prutting, C. A., 251, 289
Purves, R., 30, 48
Pye, C., 269, 291
Q
Quartaro, G., 270, 287
Quigley, S. P., 207
Quinn, M., 230, 235, 236, 249, 252, 284, 292
Quinn, R., 278, 291
R
Radziewicz, C., 81, 109
Rajaratnam, N., 66, 76
Ramberg, C., 171, 180, 185
Rand, Y., 276, 278, 288
Rapcsak, S., 120, 143
Rapin, I., 114, 130, 143, 171, 172, 173, 174, 178, 179, 180, 181, 185, 186, 191, 203, 208, 209
Raver, S. A., 281, 291
Records, N. L., 7, 10, 13, 114, 130, 133, 135, 143, 145
Rees, N. S., 192, 209
Reeves, M., 266, 290
Reichler, R. J., 175, 185
Reid, D., 54, 77, 103, 109, 233, 332
Reilly, J., 237, 246
Remein, Q. R., 6, 13, 214, 249
Renner, B. R., 175, 185
Reschly, D. J., 148, 149, 166
Rescorla, L., 128, 143, 237, 248
Resnick, T. J., 191, 209
Reveron, W. W., 229, 248
Reynell, J., 331
Reynolds, W. M., 335
Reznick, S., 237, 246
Rice, M. L., 117, 119, 121, 124, 133, 141, 142, 143, 144, 217, 237, 247, 249
Richard, G. J., 330
Richardson, M. W., 69, 77
Richardson, S. A., 156, 161, 166
Ries, P. W., 188, 209
Riley, A. M., 330
Rimland, B., 173, 181, 185
Risucci, D., 130, 145
Rivera, D. M., 333
Robarts, J., 169, 186
Roberts, J. E., 237, 245
Roberts, L. J., 304, 325
Robinson-Zañartu, C., 230, 231, 248, 255, 278, 289
Roby, C., 136, 144
Rodriguez, B., 128, 129, 143, 241, 245
Roeleveld, N., 147, 166
Roeper, T., 262, 291
Rolland, M.-B., 266, 290
Romeo, D., 121, 141
Romero, I., 215, 216, 235, 236, 238, 248
Rondal, J. A., 159, 160, 161, 164, 166, 270, 291
Rosa, M., 229, 249
Rose, S. A., 258
Rosen, A., 294, 326
Rosen, G., 119, 141
Rosenbek, J. C., 64, 77
Rosenberg, L. R., 233
Rosenberg, S., 149, 155, 166
Rosenzweig, P., 233
Rosetti, L., 237, 248, 281
Ross, M., 189, 191, 192, 194, 200, 209
Ross, R., 117, 118, 144
Roth, F., 267, 291
Rothlisberg, B. A., 103, 109
Rounds, J., 7, 13
Rourke, B., 21, 48
Roush, J., 194, 203, 208, 209
Roussel, N., 232, 248
Roux, S., 181, 185
Rowland, R. C., 3, 13
Ruscello, D., 40, 48
Rutter, M., 133, 144, 169, 173, 174, 175, 184, 185
S
Sabatino, A. D., 217, 248
Sabers, D. L., 100, 107, 109, 222, 248
Sabo, H., 158, 166
Salvia, J., 33, 35, 36, 48, 63, 64, 77, 96, 102, 107, 109, 225, 231, 233, 248, 252, 264, 291, 296, 326
Sanders, D. A., 192, 195, 209
Sandgrund, A., 161, 166
Sanger, D., 238, 248
Sattler, J. M., 37, 47, 48, 76, 158, 166, 225, 226, 249
Sauvage, D., 181, 185
Scarborough, H., 135, 144, 269, 270, 291
Schachter, D. C., 133, 137, 140, 144
Scheetz, N. A., 191, 207, 209
Schiavetti, N., 262, 265, 292
Schilder, A. G. M., 205, 209
Schlange, D., 161, 166
Schloss, P. J., 204, 208
Schmelkin, L. P., 10, 13, 17, 18, 22, 23, 24, 28, 48, 55, 56, 76, 264, 280, 291, 297, 298, 303, 306, 326
Schmidt, R. A., 251, 292, 309, 326
Schopler, E., 169, 175, 184, 185
Schraeder, T., 230, 235, 236, 249, 252, 284, 292
Schreibman, L., 173, 185, 317, 326
Schreiner, C., 125, 126, 143, 144
Schupf, N., 152, 167
Schwartz, I. S., 223, 249, 255, 279, 280, 292, 296, 297, 304, 305, 307, 324, 326
Scientific Learning Corporation, 126, 144
Secord, W. A., 10, 13, 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 259, 264, 274, 283, 284, 285, 286, 288,
292, 305, 326, 329, 332, 333, 337
Selmar, J., 337
Semel, E., 59, 77, 233, 238, 249, 264, 292, 305, 326, 329
Sevin, J. A., 178, 181, 185
Shady, M., 261, 289
Shanteau, J., 7, 13
Shaywitz, B., 21, 48
Shaywitz, S. E., 21, 48
Shelton, R. L., 259, 288, 294, 324
Shenkman, K., 118, 135, 143
Shepard, L. A., 235, 249
Sherman, D., 251, 290
Sherman, G., 119, 141
Shewan, C., 283, 292
Shields, J., 173, 185
Shine, R. E., 259, 292
Shipley, K. G., 331
Short, R. J., 148, 166
Shriberg, L., 268, 291, 336
Shu, C. E., 158, 165
Shulman, B., 241, 242, 245, 332
Siegel, L., 121, 122, 141
Silliman, E. R., 279, 282, 292
Silva, P. A., 313, 324
Silverman, W., 152, 167
Simeonsson, R. J., 148, 166
Simon, C., 262, 263, 292
Simpson, A., 173, 185
Simpson, R. L., 170, 185
Slater, S., 283, 292
Sliwinski, M., 240, 247
Smedley, T., 199, 209
Smit, A., 222, 249, 337
Smith, A. R., 238, 249, 264, 292
Smith, B., 174, 175, 184
Smith, E., 114, 145
Smith, M., 82, 108, 229, 230, 245
Smith, S., 271, 291
Smith, T. E. C., 134, 141
Snow, C. E., 123, 144, 174, 185
Snow, R., 63, 77
Snowling, M. J., 130, 144
Snyder, L., 237, 245
Soder, A. L., 337
Sowell, E., 121, 142
Sparks, S. N., 155, 166
Sparrow, S. S., 149, 163, 165, 166, 215, 249
Spekman, N., 266, 290
Spencer, L., 192, 196, 209
Sponheim, E., 174, 185
Sprich, S., 161, 165
St. Louis, K. O., 40, 48
Stafford, M. L., 238, 248
Stagg, V., 157, 166
Stark, J., 258
Stark, R. E., 115, 125, 137, 144
Stein, Z. A., 147, 148, 165
Steiner, V., 59, 77, 233, 331
Stelmachowicz, P., 199, 210
Stephens, M. I., 104, 109, 239, 249
Stephenson, J. B., 158, 165
Stevens, G., 60, 77
Stevens, S. S., 20, 43, 48, 265, 292
Stevenson, J., 133, 144
Stewart, T. R., 7, 13
Stillman, R., 63, 77
Stock, J. R., 201, 209
Stockman, I. J., 230, 235, 236, 249, 252, 284, 292
Stokes, S., 238, 249
Stone, T. A., 331
Stothard, S. E., 130, 144
Stout, G. G., 195, 209
Strain, P. S., 184, 185
Stratton, K., 153, 154, 155, 160, 166
Stray-Gunderson, K., 151, 164, 166
Striffler, N., 239, 249
Strominger, A., 135, 140
Stromswold, K., 266, 292
Strong, M., 196, 198, 209
Sturner, R. A., 104, 109, 238, 239, 249, 296, 312, 326
Sue, M. B., 331
Supalla, S., 198, 209
Supalla, T., 198, 209
Svinicki, J., 201, 209
Sweetland, R. C., 104, 108
Swisher, L., 35, 48, 104, 108, 115, 120, 141, 143, 217, 220, 225, 231, 234, 247, 252, 256, 290, 296, 299, 312, 313, 326
T
Tackett, A., 271, 291
Tager-Flusberg, H., 126, 144
Taitz, L. S., 161, 166
Tallal, P., 115, 117, 118, 121, 125, 126, 137, 142, 143, 144, 145
Taylor, O. L., 228, 229, 249
Taylor, S. J., 279, 292
Templin, M. C., 266, 292, 337
Terrell, F., 229, 249, 273, 292
Terrell, S. L., 229, 249, 273, 292
Teszner, D., 120, 144
Thal, D., 237, 246
Thane, N. L., 333
Thompson, C. K., 315, 317, 324, 326
Thordardottir, E. T., 237, 249
Thorner, R. M., 6, 13, 214, 249
Thornton, R., 260, 292
Thorum, A. R., 330
Thurlow, M. L., 215, 247
Tibbits, D. F., 241, 245
Timbers, B. J., 304, 326
Timbers, G. D., 304, 326
Timler, G., 128, 145, 146
Tobin, A., 233, 236, 237, 246
Tomblin, J. B., 7, 10, 13, 114, 117, 118, 121, 122, 130, 133, 135, 142, 143, 144, 145, 262, 265, 287
Tomlin, R., 265, 288
Torgesen, J. K., 59, 77
Toronto, A. S., 233
Toubanos, E. S., 103, 109
Townsend, J., 173, 185
Tracey, T. J., 7, 12, 13
Trauner, D., 121, 145
Trevarthen, C., 169, 186
Tsang, C., 232, 233
Turner, R. G., 6, 10, 13, 222, 249, 256, 292
Tversky, A., 8, 13
Tyack, D., 269, 292
Tye-Murray, N., 192, 209
Tynan, T., 157, 167
Tzavares, A., 120, 144
U
Udwin, O., 160, 166
V
van Bon, W. H. J., 205, 209
Van den Bercken, J. H. L., 205, 209
van der Lely, H., 124, 145
van der Spuy, H., 121, 122, 141
Van Hasselt, V. B., 164, 165
van Hoek, K., 188, 207
Van Keulen, J. E., 229, 231, 249
van Kleeck, A., 82, 109, 128, 145
Van Riper, C., 62, 77
Van Voy, K., 304, 325
Vance, H. B., 217, 248
Vance, R., 104, 108, 120, 143, 218, 220, 222, 247, 299, 326
Vargha-Khadem, F., 118, 145
Vaughn-Cooke, F. B., 223, 229, 230, 249
Veale, T. K., 126, 145
Veltkamp, L. J., 161, 167
Vernon, M., 197, 198, 210
Vetter, D. K., 10, 13, 96, 109, 253, 292
Volterra, V., 237, 245
Vostanis, P., 174, 175, 184
Voutilainen, R., 204, 210
Vygotsky, L. S., 276, 292, 310, 327
W
Wallace, E. M., 238, 248
Wallace, G., 330
Wallach, G. P., 132, 145
Walters, H., 133, 140
Wang, X., 125, 126, 144
Warren, K., 63, 77
Washington, J. A., 229, 230, 249
Wasson, P., 157, 167
Waterhouse, L., 169, 171, 172, 173, 178, 186
Watkins, K., 118, 145
Watkins, R. V., 114, 121, 130, 145
Wechsler, D., 18, 48
Weddington, G. T., 229, 231, 249
Weiner, F. F., 269, 292, 336
Weiner, P., 135, 145
Weiss, A., 7, 13, 230, 247, 259, 290
Welsh, J., 122, 143
Wender, E., 134, 145, 181, 186
Werner, E. O., 232, 331
Wesson, M., 161, 166
Westby, C., 241, 245, 279, 292
Wetherby, A. M., 171, 174, 185, 214, 248
Wexler, K., 124, 144
White, K. R., 194, 208
Whitehead, M. L., 206, 208
Whitworth, A., 238, 249
Wiederholt, J. L., 332
Wiig, E. D., 31, 48
Wiig, E. H., 59, 77, 230, 233, 238, 245, 249, 251, 252, 253, 255, 257, 258, 264, 274, 283, 284, 285, 286, 288, 292, 305,
326, 329, 332, 333
Wiig, E. S., 31, 48
Wilcox, M. J., 317, 327
Wild, J., 133, 140
Wilkinson, L. C., 279, 282, 292
Williams, D., 176, 186
Williams, F., 24, 28, 48
Williams, K. T., 97, 103, 109, 330
Willis, S., 239, 249
Wilson, A., 158, 165
Wilson, B., 130, 133, 140, 145
Wilson, K., 283, 292
Wiltshire, S., 103, 109
Windle, J., 195, 209
Wing, L., 171, 172, 173, 178, 180, 186
Wise, P. S., 157, 165
Wnek, L., 201, 209
Wolery, M., 299, 300, 327
Wolf, M. M., 304, 305, 319, 323, 324, 325, 326
Wolfram, W., 231, 249
Wolf-Schein, E. G., 169, 173, 175, 186
Wolk, S., 204, 208
Woodcock, R. W., 333
Woodley-Zanthos, P., 152, 154, 164
Woodworth, G. G., 192, 198, 209
World Health Organization, 85, 87, 109, 169, 186, 253, 280, 282, 292
Worthington, D. W., 199, 210
Wulfeck, B., 121, 145
Wyckoff, J., 270, 291
Y
Yaghmai, F., 120, 141
Yen, W. M., 22, 47, 55, 56, 57, 58, 59, 68, 76
Yeung-Courchesne, R., 173, 185
Ying, E., 199, 200, 210
Yoder, D. E., 8, 13, 258
Yonce, L. J., 223, 245
Yoshinaga-Itano, C., 200, 203, 210
Young, E. C., 331
Young, M. A., 29, 48, 297, 298, 327
Ysseldyke, J. E., 33, 35, 36, 48, 64, 77, 96, 102, 107, 109, 215, 225, 231, 234, 247, 248, 252, 264, 291, 296, 326
Yule, W., 160, 166
Z
Zachman, L., 329
Zelinsky, D. G., 149, 165
Zhang, X., 114, 145
Zielhuis, G. A., 147, 166
Zigler, E., 149, 165
Zigman, A., 152, 167
Zigman, W. B., 152, 167
Zimmerman, I. L., 59, 77, 233, 331
SUBJECT INDEX
Page numbers followed by a t indicate tables and those followed by an f indicate figures.

14-morpheme count, 240


A
Ability testing, 30, 44
Accountability, 295, 306–309, 315, 317
Achievement testing, 30, 44
Acquired epileptic aphasia, see Landau-Kleffner syndrome
Acting-out tasks, 261
Activity, ICIDH-2 proposed definition of, 87
African American culture,
  Black English, 236
  family attitudes, 83
Age differentiation studies of construct validity, see Construct validity, developmental studies of
Age-equivalent scores, 35–36t, 44
Agreement measures, 68–69f
Akinesia, 181–182
Alternate-forms reliability, 67–68, 71
Amazing University of Vermont Test, 32
American Sign Language (ASL), 196
American Speech-Language-Hearing Association, 317, 319–320
Anastasi, Anne, autobiographical statement, 60–61
Anxiety disorder, 134, 138
Arena assessment, 281, 284
Arizona Articulation Proficiency Scale, 2nd ed., 335t
Asian culture, 83
Asperger’s disorder, 169, 171–172t, 180t, 182
Asperger syndrome, see Asperger’s disorder
Asperger Syndrome Screening Questionnaire (ASSQ), 175
Assessing Semantic Skills Through Everyday Themes, 329t
Assessment Link Between Phonology and Articulation: ALPHA (Revised ed.), 335t
Assessment of change,
  importance of, 294, 317–321
  outcome measurement and, 294–295, 317, 322
  prediction of future change, 310–311
  recommended readings, 324
  special considerations, 296–311
  types of methods used,
    dynamic assessment, 310–311, 314
    informal criterion-referenced measures, 313–314
    norm-referenced tests, 312–313
    single subject experimental designs, 314–317, 316f
    standardized criterion-referenced measures, 258t, 313
Assessment of Children’s Language Comprehension, 258t
Assessment of Phonological Processes–Revised, 335t
Assigning Structural Stage, 269t
Attention deficit hyperactivity disorder (ADHD), 133–134, 138
definition, 134
specific language impairment and, 133–134
Atypical autism, see Pervasive developmental disorder not otherwise specified (PDD-NOS)
Auditory integration training, 181
Auditory training, 190
Austin Spanish Articulation Test, 232t
Authentic assessment, 236, 252, 284
Authenticity, 252, 284
Autism, see Autistic spectrum disorder
Autistic disorder,
definition, 170t, 182
high-functioning, 174, 176, 179t
low-functioning, 180t
other terms for, 169
symptoms of, 169
Autistic spectrum disorder,
behavioral checklists and interviews, 174–175t
classification of subgroups, 169, 178
DSM-IV diagnostic categories, 169
dyspraxia and, 181
fragile X syndrome and, 153, 173
mental retardation and, 169, 171
motor abnormalities, 179t–180t, 181
personal perspective, 176–177
play and, 170, 174, 178
pragmatic deficits, 169–170t, 172, 174, 179t, 180t
prevalence, 169
recommended readings, 184
sensory differences, 181
sleep disorders and, 181
stereotypical behaviors, 170t, 178
suspected causes, 173–174
genetic, 173
infectious disease, 173
neurologic, 173
suspected neurologic abnormalities, 173
theory of mind and, 181
written language and, 179
Autism Diagnostic Interview-Revised (ADI-R), 175t
B
Bankson-Bernthal Test of Phonology, 335t
Bankson Language Test-2, 329t
Baseline measures, 307, 315–317, 316f
Batelle Developmental Inventory, 201t
Behavioral objectives, 19, 44
Belief in the law of small numbers, 8, 11
Bellugi’s negation test, 263
Ber-Sil Spanish Test, 232t
Bilingual Syntax Measure-Chinese, 232t
Bilingual Syntax Measure-Tagalog, 232t
Bioecological model of development, 79
Blind measurement procedures, definition of, 314
Boehm Test of Basic Concepts-Preschool, 329t
Boehm Test of Basic Concepts-Revised, 329t
Bracken Basic Concept Scale-Revised, 329t
Bradykinesia, 181, 182
Bronfenbrenner, influence in developmental research, 79
C
Carolina Picture Vocabulary Test, 199
Carrow Elicited Language Inventory, 329t
Case examples, 1, 2, 3, 38–42, 113–114, 168–169, 187–188, 213–214, 228t, 250–251, 293–294
Caseloads and assessment practices, 283
Causation,
confusion with correlation, 29–30, 43
single subject design and study of, 307
Central auditory processing disorders, 191
Chapter summary, 11, 46–47, 75–76, 106–107, 137, 161–162, 181–182, 205, 242–243, 283–284, 323–324
Checklist for Autism in Toddlers (CHAT), 175t
Childhood Autism Rating Scale (CARS), 175t
Childhood disintegrative disorder, 169, 172t, 182
Chinese, 232t
Chromosomes, 162
Classical psychometric theory, 66–67
Classical true score theory, see Classical psychometric theory
Clinical decision making,
definition, 4, 11
disconfirmatory strategy in, 8
ethics and, 50, 72
fallacies in, 7, 8
measurement and, 252
model of, 7, 9f
types of, 5t
Clinical Evaluation of Language Fundamentals-3, 264, 329t
Clinical Evaluation of Language Fundamentals-3 Spanish Edition, 233t
Clinical Evaluation of Language Fundamentals-Preschool, 329t
Clinical Probes of Articulation (C-PAC), 259
Clinically significant change, 321–322
Clinical significance, 29, 44, 297–306, 321
Cochlear implants, 192, 205
Coefficient alpha, 69
Coefficient of determination, 29
Cognitive referencing, see also Discrepancy testing
definition, 127, 138
problems with, 127–128, 231–235
Collaborative assessment approaches, 280–282
types of, 281
for younger children, 236–237
Communication Abilities Diagnostic Test, 330t
Communication Analyzer, 269t
Communication Screen, 239t
Compton Speech and Language Screening Evaluation-Spanish, 232t
Comprehensive Assessment of Spoken Language, 330t
Comprehensive Receptive and Expressive Vocabulary Test, 330t
Computerized Language Analysis (CLAN), 268
Computerized Language Error Analysis Report (CLEAR), 269t
Computerized Profiling Version 6.2 and 1.0, 269t
Computers and language assessment and treatment, 31, 44, 126
Concurrent validity, 59t, 61–62
Conduct disorder, 134, 138
Confidence interval, 70, 73, 224–226, 225f
Confirmatory strategy in decision making, 7–8, 11
Congenital aphasia, see Specific language impairment
Construct validity,
centrality of, 53, 72
contrasting groups evidence, 53–55, 54t, 74
convergent and discriminant validation, 55–56, 74
definition, 52, 73
developmental studies of, 53–54, 54t, 74
factor analysis and, 55
Content, Form and Use Analysis, 269
Content-related validity, see Content validity
Content validity, see also Item analysis
content coverage, 56
content relevance, 56
definition, 73
expert evaluation of, 56
test design and, 56
Contexts,
affecting children and families, 79–83, 80f, 83t, 87
affecting clinicians, 79–80f, 84t, 84–88, 240–242, 283, 317–321
Coordinated assessment strategies, 280–282, 284
Correlation, 26–28, 27f
Correlation coefficients, interpretation of magnitude, 28t
Correlation coefficients, types of, 28
Criterion-referenced measures
construction of, 33–34f, 58, 253–255, 254f
definition of, 31, 44
examples of, 32t
interpretation of, 31, 43, 60
scores for, 38, 101–102
use in screening and identification, 217, 230, 236
Criterion-related validity, 61–62, 74
concurrent validity, 59t
criterion selection, 61
predictive validity, 59t, 61–62, 310
Cultural validity, see Clinical significance
Curricula, types of, 282
Curriculum-based assessment, 280, 282, 284
Cutoff score
confidence intervals and, 224–226
definition, 33, 243
determining local cutoffs, 222
empirical selection of, 222
recommended levels for identification of language impairment, 221–224
Cutting score, see Cutoff score
D
Deaf culture, 195–196
Deafness, see Hearing impairment, deafness
Decision matrix, 5–6f, 11, 219f
Del Rio Language Screening Test, 233
Denver Developmental Screening Test-Revised, 215
Derived scores, 35–37, 44
Description of language, see Descriptive measures
Descriptive measures, see also Criterion-referenced measures; Informal measures
characteristics of, 252–253, 283
criterion-referenced tests as, 257
norm-referenced tests as, 255–256
purposes, 230
recommended readings, 286
types of
criterion-referenced, 257, 258t
dynamic assessment, 276–279, 277t, 310–311
on-line observations, 274–275
norm-referenced, 255–256
probes, see also Informal measures, 257, 259–261, 260t–261t, 263t, 285, 308–310, 316f
qualitative measures, 279–280
rating scales, 262–266
use in examining treatment effectiveness, 251
use in treatment planning, 251
validity and, 250–255, 280, 283
Developmental dysphasia, see Specific language impairment
Developmental Indicators for Assessment of Learning-Revised, 215
Developmental scores, see Age-equivalent scores; Grade-equivalent scores
Developmental Sentence Scoring (DSS), 267, 269t
Deviation IQ, 38
Diadochokinesis, 64
Diagnosis, see Identification
Diagnostic and Statistical Manual of Mental Disorders IV,
diagnostic categories related to autistic disorder, 170t
diagnostic categories related to specific language impairment, 114–115t
Dichotomous scoring, 69
Difference scores, 116, 296
Differential diagnosis, 3, 12
Direct magnitude estimation, 263, 284
Disability,
ICIDH definition of, 86
Discrepancy analysis, see Discrepancy testing
Discrepancy testing, see also Cognitive referencing
criticisms of, 116, 231–235
mental retardation and, 158, 162
specific language impairment and, 116
state regulations and, 241–242
use in description, 255–257
Discriminant analysis, 222
Distributions, statistical, 24, 37f, 43–44
Down syndrome,
definition, 162
dementia and, 5, 152
health problems and, 151–152
pattern of strengths and weaknesses, 159t
personal perspective, 156
prevalence, 150–152, 151f, 152f
Dynamic assessment, 276–279
definition, 276
sample hierarchy of cues, 277t
use in identification, 278
use in planning treatment, 276–278
use with children from diverse cultures, 230, 278
use with children with mental retardation, 278
validation, 278–279
Dyskinesia, 181–182
Dyspraxia, 181–182
E
Echolalia, 176, 179, 182
Ecological validity, see Clinical significance
Educational relevance, see Clinical significance
Effect size, 123, 138, 297–303, 322–323, see also Clinical significance
Elicitation strategies,
imitation, 260t
production, 260t
syntax, 260t–261t
Eligibility for special education services, 241–242
Emotional/Behavioral problems
hearing impairment and, 204
mental retardation and, 161
specific language impairment and, 133–134
Enabling behaviors, 63–65, 74, 100
English as a Second Language (ESL), 227
Epilepsy and language disorders, 119, 161, 181, 183
Error, see Measurement error
Evaluating Acquired Skills in Communication—Revised, 330t
Event recording, 275, 285
Expert systems, 7
Expressive language disorder, 114–115t
Expressive One-Word Picture Vocabulary Test-Revised, 330t
Expressive One-Word Picture Vocabulary Test-Spanish, 233t
Expressive Vocabulary Test, 97f–99f, 103, 109, 330t
Extended optional infinitive account of SLI, 124
F
Face validity, 61, 74
Factor analysis, 74
Fallacies in decision making, 7–8
Family assessment, 81
Family members as partners in assessment, 78, 81, 236–237, 281
Fast ForWord, 126, 138
Fetal alcohol effect (FAE), 153, 163
Fetal alcohol syndrome, 153–155t, 163
Fisher-Logemann Test of Articulatory Competence, 335t
Fluharty Preschool Speech and Language Screening Test, 239t
FM radio systems, 206
Formative testing, 30, 31
Fragile X syndrome
attention deficit and hyperactivity disorder, 153
autism and, 153, 173
definition, 163
gender and, 152
prevalence and, 152
sensory problems, 153
Fullerton Language Test for Adolescents, 330t
Functional Communication Measures (FCMs), 319–320, 322
Functional Status Measures (Educational Settings) of the Pediatric Treatment Outcomes Form, 264
Functionality, 252, 285
G
Gain scores, 296, 322, see also Difference scores
General all-purpose verbs, 129t, 138
General processing deficit accounts of SLI, 124–125, 138
Generalizability theory, 66
Generalization, 295, 311, 315
Genetics,
basic concepts, 150, 162–163
chromosomal disorders, 150
concordance, 117, 138
Down syndrome and, 150–151f
family studies of specific language impairment, 117–118
fragile X syndrome and, 152–153, 154f
genetic disorders versus inherited disorders, 151
hearing impairment and, 197
incomplete penetrance, 118, 138
pedigree studies of specific language impairment, 117
premutation, 152–153
specific language impairment and, 117–119
transmission modes
autosomal versus X-linked, 118, 162
dominant versus recessive, 118
twin studies of specific language impairment, 117
Goldman-Fristoe Test of Articulation—Revised, 336t
Gold standard, 218, 243
Grade-equivalent scores, 35, 36t, 44
Grammar, recommended tutorial text, 132
Grammatical Analysis of Elicited Language—Complex Sentence Levels (GAEL-C), 201t
Grammatical Analysis of Elicited Language–Presentence Level (GAEL–P), 201t
Grammatical Analysis of Elicited Language—Simple Sentence Level (GAEL-S), 201t
Grammatical complexity, see Linguistic complexity
Grammatical morphemes,
inflectional morphemes, 133
specific language impairment and, 131, 133
H
Handicap,
ICIDH definition of, 86
objections to use of this term, 86–87
Hard of hearing, definition, 191, 206
Health and Psychosocial Instruments (HaPI) database, 105
Hearing aids, 195
Hearing impairment,
academic difficulties, 189, 203
age at identification, 194
assessment of American sign language (ASL), 198–199
bilingual model of language development for Deaf children, 196
causes
genetic, 197
infectious disease, 197
ototoxic agents, 197, 206
prematurity, 197–198, 206
Rh incompatibility, 197, 206
configuration of, 192–193f, 206
deafness
cultural considerations, 195, 196, see also Deaf culture
definition, 188, 205
differences from other levels of hearing impairment
effects on oral language acquisition, 203–204
emotional/behavioral disorders and, 204
implications for oral language assessment,
norms, 200
procedures, 199–201t
interventions
for mild and moderate hearing impairment, 190t, 195t
for profound hearing impairment, 190t
laterality of, 192
magnitude of, 189–190t
personal perspective, 189
prelingual, 194
prevalence, 188
recommended readings, 207
sign language, 188, 195–196
special considerations in assessment planning, 198–200, 203
syndromes associated with, 197
total communication and, 195–196
types of,
central auditory processing disorders, 191–192
conductive, 191, 205
mixed, 191, 206
sensorineural, 191, 206
Hispanic culture, 83t
Homogeneity of item content, 69
I
ICIDH: International Classification of Impairments, Disabilities, and Handicaps, 85–87, 282
ICIDH-2: International Classification of Impairments, Activities, and Participation of the World Health Organization, 87
IDEA, see Legislation, Individuals with Disabilities Education Act of 1990 (IDEA)
Identification of language impairment,
cognitive referencing and, 231–235
definition, 215
diagnosis versus, 215
disorder versus difference question, 227–231
federal legislation and, see Legislation
importance of, 216
local regulations and, 128
recommended cutoffs, 221–224
recommended levels of sensitivity and specificity, 220, 222
recommended readings, 244
special challenges in, 217–236
use of criterion-referenced measures in, 238–240
use of norm-referenced measures in, 217–240
use of standardized measures in, 217
Illinois Test of Psycholinguistic Abilities, 267
Index of Productive Syntax (IPSyn), 269t
Imitation, 260t
Impairment,
ICIDH definition of, 86
ICIDH-2 proposed definition of, 87
Indicators
definition, 17, 19f, 43, 44
formative, 18, 19, 44
reflective, 18, 45
value of multiple indicators, 305–306
Individual Educational Plans (IEPs), 81, 320
Individualized Family Service Plans (IFSPs), 81
Individuals with Disabilities Education Act (IDEA), 84, 106, 108
Informal measures, see also Criterion-referenced measures; Descriptive measures
development of, 254f
relationship to criterion-referenced measures, 251
relationship to experimental measures, 251
reliability, 68–69t
Informativeness, 265
Instrumental outcomes, 295, 311–322
Intelligence testing, 20, see also Cognitive referencing
Interdisciplinary teams, 281
for children with autistic spectrum disorder, 3
for children with hearing impairment, 203
requirement for nondiscriminatory assessment, 85
Interexaminer agreement, 69t, 74
Interexaminer reliability, 70, 74
Intermediate outcomes, 294, 322
Internal consistency, see Reliability, types of
Interval level of measurement, 21t–22, 43–44
Interval recording, 275, 285
Interval scaling, 262, 285
Item analysis, 57–59, 74
Item difficulty, 57
Item discrimination, 57
Item formats, 100
Item tryout, 57
J
Jangle fallacy, 56
Jingle fallacy, 56
K
Kaufman Assessment Battery for Children, 215
Kaufman Speech Praxis Test for Children, 336t
KE family, 118
Key concepts and terms, 11–12, 44–46, 73–75, 106, 138–139, 162–163, 182–183, 205–206, 243, 284–286, 322–323
Khan-Lewis Phonological Analysis, 336t
Kuder-Richardson formula 20 (KR20), 69
L
Labeling
negative effects of, 216
purposes of, 216
Landau-Kleffner syndrome, 119
Language Assessment, Remediation, and Screening Procedure (LARSP), 269t
Language
development,
as a guide to treatment planning, 130
regression in childhood disintegrative disorder, 172t
regression in Rett’s disorder, 172t
regression in Landau-Kleffner syndrome, 119
variability in, 128–129
domains, 90
modalities, 90
Language Development Survey, 237t
Language difference, 228, 243
Language diversity,
current levels of diversity, 82, 227
implications for screening and identification, 3, 33, 81, 227–231, 243
norms, 33, 229–230
recommended readings, 231
Language impairment versus language delay, 130, 132, 216
Language knowledge deficit accounts of SLI, 123–124
Language Processing Test-Revised, 330t
Language sample analysis, 266–274
analysis methods, 267–271
computerized programs, 266, 269t–270
elicitation procedures, 269t, 273–274
factors affecting results, 271, 273–274
history of use, 266–271, 283
innovations in, 266
use in assessing change, 266
use in examining interactions in language performance, 268
use in identification, 240
use in treatment planning, 266
use with diverse populations, 230
Language Sampling, Analysis & Training (LSAT), 267, 269t
Language tests,
criterion-referenced measures, 32t
for children under age 3, 237t
for children with hearing impairment, 201t–202t
for languages other than sign languages or English, 232t–233t
norm-referenced measures, 329t–337t
processing-dependent measures, 230
sign languages, 198–199
written language, 329t–337t
Latent variables, 18
Late talkers, 128–130, 129t, 138
Learning disabilities and measurement issues, 18
Learning readiness, see Assessment of change, prediction of future change
Legislation
Education for All Handicapped Children Act of 1975 (PL 94–142), 84, 108
Education of the Handicapped Act Amendments of 1986, 81, 108
Individuals with Disabilities Education Act of 1990 (IDEA), 84, 85, 106, 108, 281–282
Individuals with Disabilities Education Act Amendments of 1997, 84, 85, 108, 318
Newborn and Infant Hearing Screening and Intervention Act of 1999, 194
Limited English proficiency (LEP), 227, 243
Lingquest, 269
Linguistic complexity, 259, 266
Linguistic universals, 267
Lipreading, see Speech reading
Local norms, 33, 44, 222–223
M
MacArthur Communicative Development Inventories, 237t
Magnetic resonance imaging (MRI), 119–120, 139
Manual communication, see Sign languages
Mastery, 33
Maximal performance measures, 64
McCarthy Scales of Children’s Abilities, 215
Mean, 24, 45
Mean length of utterance (MLU), 240, 267, 270–271
calculation of, 272t
Measurement of behavior
definition, 4, 12, 252
history of, 20, 49
levels of, 20–23
relationship to selection of appropriate statistical methods, 23
Measurement error, 224–226f
assessment of change and, 296
base rates and, 235
referral rates and, 235
relationship to reliability, 67, 224
types, 6
false negatives, 219f
false positives, 219f
Measurement scales, see Measurement of behavior, levels of
Median, 25, 45
Mental Measurements Yearbook series, 104–105, 106, 240
Mental retardation
adaptive functioning and, 147, 162
age at identification, 147, 161–162
alcohol and, 153–154
attention deficit and hyperactivity disorder, 153, 159t–161, 160t,
autism and, 153, 171
causes
nonorganic, 155–156
organic, 149–155
toxins, 153–154, 156
cerebral palsy and, 147
communication strengths and weaknesses, 159t–160t
definitions of, 147, 148t, 163
dementia and, 152, 162
emotional/behavioral disorders, 153, 159t–161, 160t
familial, 155
fetal alcohol syndrome and, 153–155f
fluency disorder, 159t
fragile X syndrome and, 149, 152–153, 154f
hearing impairment and, 151, 155, 159t, 160t
longterm outcomes, 171
maltreatment and, 161
personal perspective, 156
prevalence, 147, 161
recommended readings, 164
sensory differences, 151, 153, 155, 159t–160t
severity, 147, 148
Miller-Yoder Language Comprehension Test, 258t
Mixed expressive-receptive language disorder, 114–115, 115t
Mode, 25, 45
Mosaic Down syndrome, 151, 163
Multidisciplinary assessment, 281, 285
Multiple measures, 223, 305–306, see also Multiple operationalism
Multiple operationalism, 306
N
National norms, 32, 45
National Outcomes Measurement System (NOMS), 294, 319–320, 322–323
Native American culture family attitudes, 83t
Natural Process Analysis, 336t
Nominal level of measurement, 20–21t, 43, 45
Nondiscriminatory assessment
definition of, 85, 106
methods for achieving, 229–231, 278
Nonparametric statistics, 30, 45
Nonreciprocal language, see Stereotypic language
Normal curve, see Normal distribution
Normal distribution, 30, 37f
Normative group, 32, 45, 101, 234
Norm-referenced measures
construction of, 33–34f, 57–58
definition of, 31, 45
examples of, 32t
interpretation of, 31, 43, 60, 218–227, 234
scores, 34–35, 101
use in description, 255–256
use in screening and identification, 217
Norms
definition, 32, 45
local, 33, 44
national, 32, 45
O
Observational Rating Scales, 305
Observed score, 66, 74
Omega squared, 29
Operational definitions, 19, 45
Oral and Written Language Scales: Listening Comprehension and Oral Expression, 330t
Oral and Written Language Scales: Written Expression, 331t
Ordinal level of measurement, 21t–22, 43, 45
Otitis media, 129, 151, 191, 199, 203, 205–206
Otoacoustic emissions and early identification of hearing impairment, 194, 206
Outliers, 24
Out-of-level testing, 158, 163, 256
Overshadowing, 204
P
Paper-and-pencil tests, 31, 45
Parallel-forms reliability, see Alternate-forms reliability
Parametric statistics, 30
Parent involvement in assessment, 236, 304
Parent questionnaires, 236–238
Parrot Early Language Sample Analysis (PELSA), 269
Participation, ICIDH-2 proposed definition of, 87
Patterned Elicitation Syntax Test with Morphophonemic Analysis, 331t
Peabody Picture Vocabulary Test, 267
Peabody Picture Vocabulary Test-III, 51–52, 57, 71, 331t
Pearson Product Moment correlation coefficient, 28, 43
Percentile ranks, 36
Performance standard, 38
Performance testing, 31, 45
Perisylvian areas, 119–120
Person first nomenclature, 216, 243
Pervasive developmental disorder (PDD), 169, 172t, 183
Pervasive developmental disorder not otherwise specified (PDD-NOS), 169, 172t, 183
Phenotype, 117, 139
Phonological awareness, 137, 139
Phonological memory deficit account of specific language impairment, 125
Phonological Process Analysis, 336t
Phonology tests, 335t–337t
Photo Articulation Test, 337t
Physician’s Developmental Quick Screen, 239t
Picture selection task, 261
Placement testing, 30
Play-based assessment, 281, 284
Porch Index of Communicative Ability in Children, 274, 331t
Prader-Willi syndrome, 158
Predictive validity, see Criterion-related validity, predictive validity
Preferential looking, 261t
Preferential seating, 190t, 195
Prelingual hearing loss, 206
Pre-Linguistic Autism Diagnostic Observation Schedule (PL-ADOS), 175
Preschool Language Assessment Instrument (PLAI), 258t
Preschool Language Scale-3 (PLS-3), 331t
Preschool Language Scale-3–Spanish edition, 233t
Prueba del Desarrollo Inicial del Lenguaje, 233t
Principles and parameters framework (Chomsky), 123–124
Proband, 117, 139
Probes, see also Criterion-referenced measures; Descriptive measures; Informal measures
control probes, 308, 315
generalization probes, 251, 308–309, 315
phonology, 259
pragmatics, 259, 260
sources for finding, 259, 260t–261t
syntax, 260t–261t
treatment probes, 308–309, 315
Profile analysis, see Discrepancy testing
Profile in Semantics-Grammar (PRISM-G), 269t
Profile in Semantics-Lexicon (PRISM-L), 269t
Pronominal reversals, 176, 183
Proportional Change Index (PCI), 299–303, 322
Psychiatric diagnoses and language impairment, 134
Public relations validity, 73, see also Face validity
Pye Analysis of Language (PAL), 269t
Q
Qualitative change, see Clinical significance
Qualitative measures, 279–280
Qualitative research, 280, 285
R
Range, 26, 45
Rating scales, 262–266
halo effects, 264
leniency effects, 264
metathetic continuum, 265, 285
prothetic continuum, 265, 285
Ratio level of measurement, 21t–23, 43, 45
Raw scores, 34–35
Recasts, 122, 139
Receptive-Expressive Emergent Language Test-2, 103, 237t, 258t
Receptive One-Word Picture Vocabulary Test-Upper Extension, 331t
Receptive One-Word Picture Vocabulary Test, 331t
Regional dialect, 82, 227–231
Reification and intelligence tests, 20
Reliability
coefficients, 66
definition, 65–66, 75
differences in methods for criterion- versus norm-referenced measures, 67, 102
factors affecting, 71, 72
recommendation regarding levels, 102
relationship to agreement, 68–69f
relationship to validity, 51, 65f–66, 73
types of, 73
alternate forms reliability, 67–68, 71
internal consistency, 68–70
test-retest reliability, 67–68, 75
Restriction of range, effect on reliability, 71
Rett’s disorder, 169, 172t, 183
Reynell Developmental Language Scales-U.S. Edition, 331t
Rhode Island Test of Language Structure, 202
Richness of description, 253, 285
Risk factors
definition of, 116, 139
for language impairment, 116–127
Rossetti Infant-Toddler Language Scale, 237t
S
Scales of Early Communication Skills (SECS), 202
School language, 82
Scores, types of
age-equivalent, 35–36t, 44
criterion-referenced, 38
grade-equivalent, 35–36t, 44
norm-referenced, 34–38
percentile ranks, 36
standard scores, 36–37, 46
Screening, see also Identification
base rates and, 235–236
characteristics of, 214, 242
comprehensive tests that include communication, 215
federal and local legislation, 240–242
indirect methods, 214
language measures for, 236–239t
reasons for, 214–215
referral rates, 235–236, 243
Screening Test for Developmental Apraxia of Speech, 337t
Secord-Consistency of Articulation Tests (S-CAT), 259, 337t
Segregation studies, see Genetics, pedigree studies of specific language impairment
SEM, see Standard error of measurement (SEM)
Sensitivity
definition of, 218, 243
language tests and, 220–221
Sentence Repetition Screening Test, 239t
Sequenced Inventory of Communication Development (SICD), 236–237t
Sequenced Inventory of Communication Development (SICD) Spanish translation, 233t
Severity ratings, 43
Sex chromosomes, 163
Sign languages,
tests of, 198–199
varieties of, 196
Signed English, 196
Signing Essential English (SEE-1), 196
Signing Exact English (SEE-2), 196
Simultaneous communication, see Hearing impairment, total communication and
Single subject experimental designs,
clinical use of, 307–310, 314
definition, 322
interpretation of, 307–308, 315–317
recommended readings, 317
statistical versus visual analysis, 308
withdrawal, 315
Smit-Hand Articulation and Phonology Evaluation, 337t
Social comparison as a method of social validation, 304, 322
Social deprivation, effects on development, 156
Social dialect, 82, 227–231
Social validation, 297, 303, 322
Social validity, see Clinical significance
Sound Production Task (SPT), 259
Spanish, 231–233t, 237
Spanish Structured Photographic Expressive Language Test, 232t
Specific language impairment (SLI)
academic difficulties and, 132, 134–135
alternative terms for, 114–116, 119
argument structure and, 131t
brain differences, 119–121
affecting dominance, 119
perisylvian areas, 119–121
planum temporale, 119–120f
versus damage, 119
definition of, 114–115, 137, 139
demographic variables and, 121–122
emotional/behavioral disorders and, 133
environmental variables and, 121–123
figurative language and, 132t, 134
gender differences and, 114
genetic factors, 117–119
illusory recovery, 135
language patterns, 130–133t, 137
long-term outcomes, 135
morphological deficits and, 131t
narrative skills and, 132
nature of, 223–224
personal perspective, 136
phonology and, 131t, 135, 137
pragmatics and, 132t
prevalence, 114
recommended readings, 140
subgroup identification, 115, 130
suspected causes, 116–127
syntactic deficits and, 131t
theoretical accounts (after Leonard), 123–127
crosslinguistic data and, 123–125
linguistic knowledge deficit accounts, 123–124, 138
generalized processing deficit accounts, 124–125
specific processing deficit accounts, 125–126, 139
written language and, 135, 137
Specificity,
definition, 218, 243
language tests and, 218–221, 222
Speech reading, 188, 190t
Split-half reliability, 68–69
Stability, 67
Standard deviation, 25, 46
Standard error of measurement (SEM), 67, 70, 75, 101, 224
Standard scores, 36–37, 46
Standards for Educational and Psychological Testing, 50, 62, 89, 96, 105, 107
Statistical measures
of central tendency, 24–25, 43
of variability, 24–26, 43
Statistical significance, 28–29, 46, 297
Stephens Oral Language Screening Test, 239t
Stereotypic language, 177
Stereotypy, 182–183
Stimulability testing, relationship to dynamic assessment, 276
Strabismus, 161, 163
Structured Photographic Expressive Language Test-II, 331t
Subjective evaluation as a method of social validation, 304–305, 323
Summative testing, 31
Surface hypothesis account of SLI, 125
Syndrome, definition of, 149
Systematic Analysis of Language Transcripts (SALT), 268, 269t, 271t
T
Tagalog, 233
Talking Task (TT), 259
Teacher Assessment of Student Communicative Competence (TASCC), 264
Teacher Assessment of Grammatical Structures (TAGS), 202
Teacher questionnaires, 237–238
Templin-Darley Tests of Articulation, 336t
Temporal processing account of specific language impairment, 125–126
Termination of treatment, 4, 310–311
Test,
definition, 49, 75
effect of length on reliability, 71
Test administration,
adaptations, 63, 157–158, 200, 203, 229t
importance of, 10, 63
motivation, 63
suggestions for, 64t
Test de Vocabulario en Imagenes Peabody, 232t
Test for Examining Expressive Morphology, 331t
Test manuals, how to use, 88–103
Test of Adolescent and Adult Language, 332t
Test of Adolescent/Adult Word Finding, 332t
Test of Auditory Comprehension of Language-3, 332t
Test of Children’s Language, 332t
Test of Early Language Development, 332t
Test of Early Reading Ability-Deaf or Hard of Hearing, 103, 109
Test of Language Competence-Expanded, 332t
Test of Language Development-Intermediate: 3, 57, 332t
Test of Language Development-Primary: 3, 240, 332t
Test of Pragmatic Language, 332t
Test of Pragmatic Skills (Revised), 332t
Test of Relational Concepts, 333t
Test of Word Finding, 333t
Test of Word Finding in Discourse, 333t
Test of Word Knowledge, 333t
Test of Written Expression, 333t
Test of Written Language–2, 333t
Test review guide,
annotated, 90f–92f
basic form, 93f–95f
completed example, 97f–99f
Test reviews
client-oriented, 88–89, 106
computerized sources of, 104–105
population-oriented, 88–89, 106
steps in, 88–103
sources of published reviews, 103–105, 104t
Testing of limits, 158
Texas Preschool Screening, 239t
Theoretical construct, 18–19f, 43, 46, 51, 57, 306
Theory, 18, 46
Theory of mind, 181, 183
Time sampling, 275, 285
Token Test for Children, 333t
Transdisciplinary assessment, 281, 285
Treatment
effectiveness, 319, 323
effects, 319, 323
efficacy research, 295, 318, 321
efficiency, 319, 323
outcomes, 294–295
outcomes research, 318
Trial scoring, 274, 286
Triangulation of qualitative data, 280, 286
Trisomy 21, 151, 163
True score, 66, 75
T score, 38
Turner syndrome, 158
Type-token ratio, 240
U
Ultimate outcomes, 294, 311, 323
Utah Test of Language Development-3, 333t
V
Validity
centrality to discussions of measurement quality, 50
definition, 51, 75
factors affecting, 10, 61, 62–66, 235–236
“types of,” see Validation, strategies of evidence gathering
Validation
differences for criterion- versus norm-referenced measures, 56–60
strategies of evidence gathering, 52–62
content validity, 52, 56t–60
criterion-related validity, 52, 61–62, 310
construct validity, 52–56, 53f
Variable, 19, 46
Variance, 25, 46
Variance accounted for, 29
Verbal auditory agnosia, 191
Vineland Adaptive Behavior Scales, 163, 215
Visuospatial languages, see Sign languages
W
“Watch and see” policy toward late talkers, 128
Wechsler Intelligence Scale for Children-Revised, 18
Wiig Criterion-Referenced Inventory of Language, 258t
Williams syndrome, 158, 160t, 163
Woodcock Language Proficiency Battery-Revised, 333t
Word Test–Adolescent, 333t
Word Test-Revised, 333t
World Health Organization, 85
Written language, 241
Z
Zone of proximal development (ZPD), 276, 286
Z-scores, 37


It's crucial to ensure that the normative sample of the test closely matches the race, language background, and socioeconomic status of the child being assessed. Significant differences can undermine the validity of the test, so practitioners may need to draw on cultural knowledge and alternative assessment approaches, such as dynamic assessment, if the validity is compromised .

Dynamic assessment provides framework for non-biased evaluation by emphasizing learning potential rather than static abilities. It incorporates mediation and focuses on children's capacity to learn when given support, helping mitigate cultural and linguistic biases .

Organic factors, like genetic syndromes (e.g., Down syndrome, fragile X), offer biological explanations for developmental disabilities, whereas familial factors often relate to socio-environmental influences. These factors affect diagnosis, management, and understanding of the developmental trajectory .

You might also like