H. DOUGLAS BROWN
PRIYANVADA ABEYWICKRAMA
Language Assessment: Principles and Classroom Practices, Second Edition
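Copyright © 2010 by Pearson Education, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the publisher.
Pearson Education, 10 Bank Street, White Plains, NY 10606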
Staff credits: The people who made up the Language Assessment: Principles and Classroom Practices, Second Edition, team, representing editorial, production, design, and manufacturing, include Kim Steiner and Jennifer Stem.
Cover design: Tracey Munz Cataldo
Text design: Wendy Wolf, TSI Graphics
Text composition: TSI Graphics
Text font: Garamond Book
Te a Don Naranes
Tex cred See page at
Library of Congress Cataloging-in-Publication Data
Brown, H. Douglas.
Language assessment: principles and classroom practices / H. Douglas Brown, Priyanvada Abeywickrama.
p. cm.
1. Language and languages: Study and teaching. 2. Language and languages: Examinations. 3. Language acquisition. I. Abeywickrama, Priyanvada. II. Title.
Pearsonlongman.com offers online resources for teachers and students. Access our Companion Websites, our online catalog, and our local offices around the world. Visit us at pearsonlongman.com.
Printed in the United States of America
CONTENTS

Preface
Text Credits

Chapter 1 Assessment Concepts and Issues
Assessment and Testing
Measurement and Evaluation
Assessment and Learning
Informal and Formal Assessment
Formative and Summative Assessment
Norm-Referenced and Criterion-Referenced Tests
Types and Purposes of Assessment
Achievement Tests
Diagnostic Tests
Placement Tests
Proficiency Tests
Aptitude Tests
Issues in Language Assessment: Then and Now
Behavioral Influences on Language Testing
Integrative Approaches
Communicative Language Testing
Performance-Based Assessment
Current "Hot Topics" in Classroom-Based Assessment
Multiple Intelligences
Traditional and "Alternative" Assessment
Computer-Based Testing
Other Current Issues
Exercises
For Your Further Reading

Chapter 2 Principles of Language Assessment
Practicality
Reliability
Student-Related Reliability
Rater Reliability
Test Administration Reliability
Test Reliability
Validity
Content-Related Evidence
Criterion-Related Evidence
Construct-Related Evidence
Consequential Validity (Impact)
Face Validity
Authenticity
Washback
Applying Principles to the Evaluation of Classroom Tests
1. Are the test procedures practical?
2. Is the test itself reliable?
3. Can you ensure rater reliability?
4. Does the procedure demonstrate content validity?
5. Has the impact of the test been carefully accounted for?
6. Is the procedure "biased for best"?
7. Are the test tasks as authentic as possible?
8. Does the test offer beneficial washback to the learner?
Exercises
For Your Further Reading

Chapter 3 Designing Classroom Language Tests
Four Assessment Scenarios
Scenario 1: Reading Quiz
Scenario 2: Grammar Unit Test
Scenario 3: Midterm Essay
Scenario 4: Listening/Speaking Final Exam
Determining the Purpose of a Test
Designing Clear, Unambiguous Objectives
Drawing Up Test Specifications
Devising Test Items
Designing Multiple-Choice Items
Design each item to measure a single objective.
State both stem and options as simply and directly as possible.
Make certain that the intended answer is clearly the only correct one.
(Optional) Use item indices to accept, discard, or revise items.
Administering the Test
Scoring, Grading, and Giving Feedback
Scoring
Grading
Giving Feedback
Exercises
For Your Further Reading

Chapter 4 Standards-Based Assessment
The Role of Standards in Standardized Tests
Standards-Based Education
Designing English Language Standards
Standards-Based Assessment
CASAS and SCANS
Teacher Standards
The Consequences of Standards-Based and Standardized Testing
Test Bias
Test-Driven Learning and Teaching
Ethical Issues: Critical Language Testing
Exercises
For Your Further Reading

Chapter 5 Standardized Testing
Advantages and Disadvantages of Standardized Tests
Developing a Standardized Test
Determine the purpose and objectives for the test.
Design test specifications.
Design, select, and arrange test tasks/items.
Make appropriate evaluations of different kinds of items.
Specify scoring procedures and reporting formats.
Perform ongoing construct validation studies.
Standardized Language Proficiency Testing
Exercises
For Your Further Reading

Chapter 6 Beyond Tests: Alternatives in Assessment
The Dilemma of Maximizing Both Practicality and Washback
Performance-Based Assessment
Rubrics
Portfolios
Journals
Conferences and Interviews
Observations
Self- and Peer-Assessments
Types of Self- and Peer-Assessment
Guidelines for Self- and Peer-Assessment
A Taxonomy of Self- and Peer-Assessment Tasks
Exercises
For Your Further Reading

Chapter 7 Assessing Listening
Integration of Skills in Language Assessment
Assessing Grammar and Vocabulary
Observing the Performance of the Four Skills
The Importance of Listening
Basic Types of Listening
Micro- and Macroskills of Listening
Designing Assessment Tasks: Intensive Listening
Recognizing Phonological and Morphological Elements
Paraphrase Recognition
Designing Assessment Tasks: Responsive Listening
Designing Assessment Tasks: Selective Listening
Listening Cloze
Information Transfer
Sentence Repetition
Designing Assessment Tasks: Extensive Listening
Dictation
Communicative Stimulus-Response Tasks
Authentic Listening Tasks
Exercises
For Your Further Reading

Chapter 8 Assessing Speaking
Basic Types of Speaking
Micro- and Macroskills of Speaking
Designing Assessment Tasks: Imitative Speaking
Versant®
Designing Assessment Tasks: Intensive Speaking
Directed Response Tasks
Read-Aloud Tasks
Sentence/Dialogue Completion Tasks and Oral Questionnaires
Picture-Cued Tasks
Translation (of Limited Stretches of Discourse)
Designing Assessment Tasks: Responsive Speaking
Question and Answer
Giving Instructions and Directions
Paraphrasing
Test of Spoken English (TSE® Test)
Designing Assessment Tasks: Interactive Speaking
Interview
Role Play
Discussions and Conversations
Games
ACTFL Oral Proficiency Interview (OPI)
Designing Assessments: Extensive Speaking
Oral Presentations
Picture-Cued Storytelling
Retelling a Story, News Event
Translation (of Extended Prose)
Exercises
For Your Further Reading

Chapter 9 Assessing Reading
Genres of Reading
Microskills, Macroskills, and Strategies for Reading
Types of Reading
Designing Assessment Tasks: Perceptive Reading
Reading Aloud
Written Response
Multiple-Choice
Picture-Cued Items
Designing Assessment Tasks: Selective Reading
Multiple-Choice (for Form-Focused Criteria)
Matching Tasks
Editing Tasks
Picture-Cued Tasks
Gap-Filling Tasks
Designing Assessment Tasks: Interactive Reading
Cloze Tasks
Impromptu Reading Plus Comprehension Questions
Short-Answer Tasks
Editing (Longer Texts)
Scanning
Ordering Tasks
Information Transfer: Reading Charts, Maps, Graphs, Diagrams
Designing Assessment Tasks: Extensive Reading
Skimming Tasks
Summarizing and Responding
Notetaking and Outlining
Exercises

Chapter 10 Assessing Writing
Genres of Written Language
Types of Writing Performance
Micro- and Macroskills of Writing
Tasks in [Hand]writing Letters, Words, and Punctuation
Spelling Tasks and Detecting Phoneme-Grapheme Correspondences
Designing Assessment Tasks: Intensive (Controlled) Writing
Dictation and Dicto-Comp
Grammatical Transformation Tasks
Picture-Cued Tasks
Vocabulary Assessment Tasks
Ordering Tasks
Short-Answer and Sentence-Completion Tasks
Issues in Assessing Responsive and Extensive Writing
Designing Assessment Tasks: Responsive and Extensive Writing
Paraphrasing
Guided Question and Answer
Paragraph Construction Tasks
Strategic Options
Standardized Tests of Responsive Writing
Scoring Methods for Responsive and Extensive Writing
Holistic Scoring
Primary Trait Scoring
Analytic Scoring
Beyond Scoring: Responding to Extensive Writing
Assessing Initial Stages of the Process of Composing
Assessing Later Stages of the Process of Composing
Exercises
For Your Further Reading

Chapter 11 Assessing Grammar and Vocabulary
Assessing Grammar
Defining Grammatical Knowledge
Designing Assessment Tasks: Selected Response
Multiple-Choice Tasks
Discrimination Tasks
Noticing Tasks or Consciousness-Raising Tasks
Designing Assessment Tasks: Limited Production
Gap-Filling Tasks
Short-Answer Tasks
Dialogue-Completion Tasks
Designing Assessment Tasks: Extended Production
Information Gap Tasks
Role-Play or Simulation Tasks
Assessing Vocabulary
The Nature of Vocabulary
Defining Lexical Knowledge
Some Considerations in Designing Assessment Tasks
Designing Assessment Tasks: Receptive Vocabulary
Designing Assessment Tasks: Productive Vocabulary
Exercises
For Your Further Reading

Chapter 12 Grading and Student Evaluation
The Philosophy of Grading: What Should Grades Reflect?
Guidelines for Selecting Grading Criteria
Methods for Calculating Grades
Teachers' Perceptions of Appropriate Grade Distributions
Institutional Expectations and Constraints
Cross-Cultural Factors and the Question of Difficulty
What Do Letter Grades "Mean"?
Calculating Grades
Alternatives to Letter Grading
Some Principles and Guidelines for Grading and Evaluation
Exercises
For Your Further Reading

Appendix: Commercial Tests
Glossary
Bibliography
Name Index
Subject Index

PREFACE

As this second edition of Language Assessment: Principles and Classroom Practices goes to press, we are embarking on the second decade of this new millennium. In that first ten-year period alone, the field of second language acquisition and pedagogy saw remarkable advances in our stockpile of methodological options for teaching languages. The subdiscipline of language assessment kept pace with this growth. In this second edition, we have almost doubled the number of bibliographic entries found in the first edition, a sign of a full research agenda. Also, in that period of time several new journals, exclusively devoted to language assessment, have been published, a further index of academic prosperity.

Assessment in general, but in particular language assessment, is an area of intense fascination. No longer a field exclusively relegated to psychometricians and testing experts, assessment has caught the interest of classroom teachers, students, parents, and political action groups. How can I (a teacher) design an effective classroom test? What can I (a student) do to prepare for a test and to make assessments of all kinds enhancing learning experiences? Are the standardized tests of language (that my child has to take) accurate measures of ability? And do I (as an advocate for fair testing practices) believe that the plethora of tests that students are exposed to are culture-fair, free from bias, and not instruments of a powerful elite designed to further the gap between the haves and the have-nots?

All of these and many more questions now being addressed by teachers, researchers, and specialists can be overwhelming to the novice language teacher, who is already baffled by linguistic and psychological paradigms and a multitude of methodological options. This book provides teachers, and teachers-to-be, with a clear, reader-friendly presentation of the essential foundation stones of language assessment, with ample practical examples to illustrate their application in language classrooms. It is a book that simplifies the issues without oversimplifying. It doesn't dodge complex questions, and it treats them in ways that classroom teachers can comprehend. Readers do not have to become testing experts, or statisticians adept in manipulating mathematical equations and advanced calculus, to understand and apply the concepts in this book.

• Glossary of terminology. In this edition you will find a glossary of assessment terms and concepts, all of which have been boldfaced in the text of the book. We hope this will be a useful way to quickly check the meaning of the myriad of terms introduced in this book.
• Appendix listing commercially available tests. Current information on widely marketed tests is now presented in an appendix that lists pertinent information, specifications, and Internet references.

PERSONAL WORDS OF APPRECIATION

First, I want to welcome my coauthor for this edition, Dr. Priyanvada Abeywickrama, professor of English at San Francisco State University. It has been my pleasure to work with Priya in writing this second edition, as she has been especially helpful in identifying new, cutting-edge research in the field as well as in offering insights on standards-based and form-focused assessment.

We can both heartily assert that this book is very much the product of our own teaching of language assessment. Our students have collectively taught us more than we have taught them, which prompts us to thank them all, everywhere, for these gifts of knowledge. And of course the embracing support of the MA TESOL faculty at San Francisco State University is an uplifting source of stimulation and affirmation.

I'm further indebted to teachers in many countries around the world where I have had the honor of offering workshops and seminars on language assessment. I have memorable impressions of such sessions in Brazil, Canada, Chile, the Dominican Republic, Egypt, Japan, Korea, Mexico, Peru, Spain, Taiwan, Thailand, Turkey, Uruguay, and the former republic of Yugoslavia, where cross-cultural issues in assessment have been especially stimulating.

Finally, we wish to thank our respective partners, Scott and Mary, for tolerating authors in their home who need all too many hours of uninterrupted focus during periods of book writing. Their love and support is a marvelous affirmation of our work.

We would like to thank the following reviewers who offered valuable insights about the first edition and early drafts of the revised edition: Jorge H. Cubillos, University of Delaware, Newark, Delaware; Fernando Fleurquin, University of Maryland Baltimore County, Baltimore, Maryland; Vasie Kelos, Seneca College, Toronto, Canada; Suzanne Medina, California State University, California (Dominguez Hills); Lynn Stafford-Yilmaz, Bellevue Community College, Bellevue, Washington; Diane Strong-Krause, Brigham Young University, Provo, Utah; Latricia Trites, Murray State University, Kentucky.

Dr. H. Douglas Brown
September 2009

TEXT CREDITS

Grateful acknowledgment is made to the following publishers and authors for permission to reprint copyrighted material.

American Council on the Teaching of Foreign Languages (ACTFL), for material from ACTFL Proficiency Guidelines: Speaking (1986); Oral Proficiency Interview (OPI): Summary Highlights.

Blackwell Publishers, for material from Brown, James Dean & Bailey, Kathleen M. (1984). A categorical instrument for scoring second language writing skills. Language Learning, 34, 21–42.

California Department of Education, for material from California English Language Development (ELD) Standards: Listening and Speaking.

Educational Testing Service (ETS), for material from Test of English as a Foreign Language (TOEFL® Test); Test of Spoken English (TSE® Test); Test of Written English (TWE® Test).

English Language Institute, University of Michigan, for material from Michigan English Language Assessment Battery (MELAB).

Georgetown University Press, for material from Swain, Merrill. (1990). The language of French immersion students: Implications for theory and practice. In James Alatis (Ed.), Georgetown University round table on languages and linguistics. Washington: Georgetown University Press.

Oxford University Press, for material from Bachman, L. F. (1990). Fundamental considerations in language testing. New York: Oxford University Press.

Pearson Education, for material from Versant® Test.

Pearson Longman ESL, and Deborah Phillips, for material from Phillips, Deborah (2001). Longman Introductory Course for the TOEFL Test. White Plains, NY: Pearson Education.

Second Language Testing, Inc. (SLTI), for material from Modern Language Aptitude Test.

University of Cambridge Local Examinations Syndicate (UCLES), for material from International English Language Testing System.

Yasuhiro Imao, Roshan Khan, Eric Phillips, and Sheila Viotti, for unpublished material.

PURPOSE AND AUDIENCE

This book is designed to offer a comprehensive survey of essential principles and tools for second language assessment. It has been successfully used in its first edition in teacher-training courses, teacher certification curricula, and TESOL master of arts programs. As the third in a trilogy of teacher education textbooks, it is designed to follow H. Douglas Brown's other two books, Principles of Language Learning and Teaching (fifth edition, Pearson Education, 2007) and Teaching by Principles (third edition, Pearson Education, 2007). References to those two books are sprinkled throughout the current book. In keeping with the tone set in the previous two books, this one features uncomplicated prose and a systematic, spiraling organization. Concepts are introduced with a maximum of practical exemplification and a minimum of weighty definition. Supportive research is acknowledged and succinctly explained without burdening the reader with ponderous debate over minutiae.

The testing discipline sometimes possesses an aura of sanctity that can cause teachers to feel inadequate as they approach the task of mastering principles and designing effective instruments. Some testing manuals, with their heavy emphasis on jargon and mathematical equations, don't help to dissipate that mystique. By the end of Language Assessment, readers will have gained access to this not-so-frightening field. They will have a working knowledge of a number of useful, fundamental principles of assessment and will have applied those principles to practical classroom contexts. They will also have acquired a storehouse of useful, comprehensible tools for evaluating and designing practical, effective assessment techniques for their classrooms.

PRINCIPAL FEATURES

Notable features of this book include the following:
• Clearly framed fundamental principles for evaluating and designing assessment procedures of all kinds
• Focus on the most common pedagogical challenge: classroom-based assessment
• Many practical examples to illustrate principles and guidelines
• Concise but comprehensive treatment of assessing all four skills (listening, speaking, reading, writing)
• In each skill, classification of assessment techniques that range from controlled to open-ended item types on a specified continuum of micro- and macroskills of language
• Explanation of standards-based assessment: what it is, why it is so popular, and its pros and cons
• Thorough discussion of large-scale standardized tests: their purpose, design, validity, and utility
• Consideration of the ethics of testing in an educational and commercial world driven by tests
• A comprehensive presentation of alternatives in assessment, namely, portfolios, journals, conferences, observations, interviews, and self- and peer-assessment
• A systematic discussion of letter grading and overall evaluation of student performance in a course
• End-of-chapter exercises that suggest whole-class discussion and individual, pair, and group work for the classroom
• Suggested additional readings at the end of each chapter

IMPROVEMENTS IN THE SECOND EDITION

In this second edition of Language Assessment, a number of changes are present, reflecting advances in the field as well as enhancements based on feedback from the first edition. Some of these changes are as follows:
• Advance organizers at the beginning of each chapter. Each chapter now begins with a brief list of objectives, which serve as preceding organizers for students and instructors.
• A new chapter. The domain of form-focused assessment is now addressed in a separate chapter (Chapter 11). Although assessing the four skills involves, in some cases, assessing pertinent grammar and vocabulary, it is appropriate to treat such forms as a separate issue. Background information and a survey of research are presented along with practical examples of techniques for assessing grammar and vocabulary.
• Updated references and new information. The six years of research and practice that have transpired since the first edition of this book create the obvious need to encapsulate that progress in the form of reports of new research, updated references, and the insertion of new information. The latter includes the following:
  • Reorganization of the first three chapters for a more logical progression of steps toward understanding classroom-based assessment
  • A longer and more inclusive discussion of the history of language assessment
  • Updated descriptions of current issues and challenges throughout
  • Recent research and practice with regard to standards-based assessment, in a completely redesigned chapter (Chapter 4)
  • A discussion of rubrics in the chapter on "alternatives" (Chapter 6)
  • The aforementioned treatment of assessing grammar and vocabulary, now with the attention it deserves in a separate chapter (Chapter 11)
  • An additional section in the chapter on grading and evaluation (Chapter 12) on calculating grades with references to Web-based resources

OBJECTIVES: After reading this chapter, you will be able to
• understand differences between assessment and testing, along with other basic assessment concepts and terms
• distinguish among five different types of language tests, cite examples of each, and apply them for different purposes and contexts
• appreciate historical antecedents of present-day trends and research in language assessment
• grasp some of the major current issues that assessment researchers are now addressing

Tests have a way of scaring students. How many times in your school days did you feel yourself tense up when your teacher mentioned a test? The anticipation of the upcoming "moment of truth" provoked feelings of anxiety and self-doubt along with a fervent hope that you would come out on the other end with at least a sense of worthiness. The fear of failure is perhaps one of the strongest negative emotions a student can experience, and the most common instrument inflicting such fear is the test. You are not likely to view a test as positive, pleasant, or affirming, and, like most ordinary mortals, you intensely wish for a miraculous exemption from the ordeal.

And yet tests seem as unavoidable as tomorrow's sunrise in virtually all educational settings around the world. Courses of study in every discipline are marked by these periodic milestones of progress (or sometimes, in the perception of the learner, confirmations of inadequacy) that have become conventional methods of measurement. The gatekeeping function of tests, from classroom achievement tests to large-scale standardized tests, has become an acceptable norm.

Now, just for fun, take the following quiz. All five of the words are found in standard English dictionaries, so you should be able to answer all five items correctly, right? Okay, go for it!

Directions: In each of the five items below, select the definition that correctly

show unequivocally that those kinds of tasks predict communicative success in a language, especially untutored acquisition of the language.

Because of this limitation, standardized aptitude tests are seldom used today, with the exception, perhaps, of identifying foreign language learning disability (Stansfield & Reed, 2004). Instead, attempts to measure language aptitude more often provide learners with information about their preferred styles and their potential strengths and weaknesses, with follow-up strategies for capitalizing on the strengths and overcoming the weaknesses (Robinson, 2005; Skehan, 2002). Any test that claims to predict success in learning a language is undoubtedly flawed because we now know that with appropriate self-knowledge, active strategic involvement in learning, and/or strategies-based instruction, virtually everyone can eventually succeed. To pigeonhole learners a priori, before they have even attempted to learn a language, is to presuppose failure or success without substantial cause. (A further discussion of language aptitude can be found in H. D. Brown's [2007a] Principles of Language Learning and Teaching [PLLT], Chapter 4.)¹

ISSUES IN LANGUAGE ASSESSMENT: THEN AND NOW

Before moving on to the practicalities of creating classroom tests and assessments, you will better appreciate the intricacies of the process by taking a brief historical look at language testing over the past half-century and taking note of some current issues in the field.

Historically, language-testing trends and practices have followed the shifting sands of teaching methodology (for a description of these trends, see H. D. Brown [2007b], Teaching by Principles [TBP], Chapter 2). For example, in the 1940s and 1950s, an era of behaviorism and special attention to contrastive analysis, language tests focused on specific linguistic elements such as the phonological, grammatical, and lexical contrasts between two languages. In the 1970s and 1980s, communicative theories of language brought with them a more integrative view of testing in which specialists claimed that "the whole of the communicative event was considerably greater than the sum of its linguistic elements" (Clark, 1983, p. 432). Today, test designers are still challenged in their quest for more authentic, valid instruments that simulate real-world interaction (Leung & Lewkowicz, 2006).

¹ Frequent references are made in this book to companion volumes by H. Douglas Brown. Principles of Language Learning and Teaching (PLLT, fifth edition, 2007a) presents the background on which pedagogical practices are based. Teaching by Principles (TBP, third edition, 2007b) spells out that pedagogy in practical terms for the language teacher.

Behavioral Influences on Language Testing

Through the middle of the twentieth century, language teaching and testing were both strongly influenced by behavioral psychology and structural linguistics. Both traditions emphasized sentence-level grammatical paradigms, definitions of vocabulary items, and translation from first language to second language and placed only minor focus on real-world authentic communication. Typically, tests consisted of grammar and vocabulary items in multiple-choice format along with a variety of translation exercises ranging from words to sentences to short paragraphs.

Such discrete-point formats still prevail today, especially in large-scale standardized "entrance examinations" used to admit students to institutions of higher education around the world. (See Barnwell, 1996, and Spolsky, 1978, 1995, for a summary.) Such assessments were designed on the assumption that language can be broken down into its component parts and that those parts can be tested successfully. These components are the skills of listening, speaking, reading, and writing and the various units of language (discrete points) of phonology/graphology, morphology, lexicon, syntax, and discourse. It was claimed that an overall language proficiency test, then, should sample all four skills and as many linguistic discrete points as possible.

Discrete-point testing provided fertile ground for what Spolsky (1978, 1995) called the psychometric-structuralist approach to language assessment, in which test designers seized the tools of the day to focus on issues of validity, reliability, and objectivity. Standardized tests of language blossomed in this scientific climate, and the language teaching/testing world saw such tests as the Michigan Test of English Language Proficiency (1961) and the Test of English as a Foreign Language (1963) become extraordinarily popular. The science of measurement and the art of teaching appeared to have made a revolutionary alliance.

Integrative Approaches

In the midst of this fervor, language pedagogy was rapidly moving in more communicative directions, and testing specialists were forced into a debate that would soon respond to the changes. The discrete-point approach presupposed a decontextualization that was proving to be inauthentic. So, as the profession emerged into an era of emphasizing communication, authenticity, and context, new approaches were sought. John Oller (1979) argued that language competence was a unified set of interacting abilities that could not be tested separately. His claim was that communicative competence is so global and requires such integration that it cannot be captured in additive tests of grammar, reading, vocabulary, and other discrete points of language. Others (among them Cziko, 1982, and Savignon, 1982) soon followed in their support for what became known as integrative testing.

What does an integrative test look like? Two types of tests were, at the time, claimed to be examples of integrative tests: cloze tests and dictations. A cloze test is a reading passage (perhaps 150 to 300 words) in which roughly every sixth or seventh word has been deleted; the test-taker is required to supply words that fit into those blanks. (See Chapter 9 for a full discussion of cloze testing.)
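
As a rough illustration of the fixed-ratio deletion that defines a cloze passage, the procedure could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a procedure taken from this book: the function name make_cloze, the default of deleting every seventh word, and the numbered-blank format are all hypothetical choices made for the example.

```python
import string


def make_cloze(passage, nth=7, blank="_____"):
    """Replace every nth word of a passage with a numbered blank.

    Returns the cloze text and the list of deleted words (the answer key).
    The names and defaults here are illustrative assumptions only.
    """
    words = passage.split()
    answers = []
    # Visit the nth, 2nth, 3nth ... word (zero-based index nth - 1, and so on).
    for i in range(nth - 1, len(words), nth):
        token = words[i]
        # Separate the word from any attached punctuation, e.g. "away." -> "away".
        core = token.strip(string.punctuation)
        if not core:
            continue  # the token was punctuation only; leave it in place
        answers.append(core)
        # Keep surrounding punctuation so the sentence still reads naturally.
        words[i] = token.replace(core, f"({len(answers)}) {blank}", 1)
    return " ".join(words), answers


if __name__ == "__main__":
    text, key = make_cloze(
        "The quick brown fox jumps over the lazy dog and then runs far away.",
        nth=7,
    )
    print(text)
    print(key)
```

A teacher trying this on a real passage of 150 to 300 words would probably leave the first sentence or two intact before deletions begin; that refinement is omitted here to keep the sketch short.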

Oller (1979) claimed that cloze test results were good measures of overall proficiency. According to theoretical constructs underlying this claim, the ability to supply appropriate words in blanks requires competence in a language, which includes knowledge of vocabulary, grammatical structure, discourse structure, reading skills and strategies, and an internalized "expectancy" grammar (enabling one to predict an item that will come next in a sequence). It was argued that successful completion of cloze items taps into all of those abilities, which were said to be the essence of global language proficiency.

Dictation, in which learners listen to a short passage and write what they hear, is a familiar language-teaching technique that evolved into a testing technique. (See Chapter 7 for a discussion of dictation as an assessment device.) Supporters argued that dictation was an integrative test because it taps into grammatical and discourse competencies required for other modes of performance in a language. Success on a dictation test requires careful listening, reproduction in writing of what is heard, efficient short-term memory, and, to an extent, some expectancy rules to aid the short-term memory. Further, dictation test results tend to correlate strongly with other tests of proficiency. For large-scale testing, the usually classroom-centered dictation technique can be practical and reliable through the design of multiple-choice items.

Proponents of integrative test methods soon centered their arguments on what became known as the unitary trait hypothesis, which suggested an "indivisible" view of language proficiency: that vocabulary, grammar, phonology, the "four skills," and other discrete points of language could not be disentangled from each other in language performance. The unitary trait hypothesis argued that there is a general factor of language proficiency such that all the discrete points do not add up to that whole. However, based on a series of debates and research evidence (Farhady, 1982; Oller, 1983), the unitary trait hypothesis was abandoned.
Communicative Language Testing
By the mid-1980s, especially in the wake of Canale and Swain's (1980) seminal
work on communicative competence, the language-testing field had begun to
focus on designing communicative language-testing tasks. Bachman and
Palmer (1996) included among "fundamental" principles of language testing the
need for a correspondence between language test performance and language
use: "In order for a particular language test to be useful for its intended pur-
poses, test performance must correspond in demonstrable ways to language use
in nontest situations" (p. 9). The problem that language assessment experts
faced was that tasks tended to be artificial, contrived, and unlikely to mirror lan-
guage use in real life. As Weir (1990) noted, "Integrative tests such as cloze only
tell us about a candidate's linguistic competence. They do not tell us anything
directly about a student's performance ability" (p. 6).
Thus a quest for authenticity was launched, as test designers centered on com-
municative performance. Following Canale and Swain's (1980) model, Bachman
(1990) proposed a model of language competence consisting of organizational and
pragmatic competence, respectively subdivided into grammatical and textual com-
ponents and into illocutionary and sociolinguistic components (see Figure 1.2).
Language Competence
  Organizational Competence
    Grammatical Competence: vocabulary, morphology, syntax, phonology/graphology
    Textual Competence: cohesion, rhetorical organization
  Pragmatic Competence
    Illocutionary Competence: ideational functions, manipulative functions,
      heuristic functions, imaginative functions
    Sociolinguistic Competence: sensitivity to dialect or variety, sensitivity
      to register, sensitivity to naturalness, cultural references and figures
      of speech
Figure 1.2. Components of language competence (Bachman, 1990, p. 87)
Bachman and Palmer (1996, pp. 70-75) also emphasized the importance of
strategic competence (the ability to employ communicative strategies to compen-
sate for breakdowns as well as to enhance the rhetorical effect of utterances) in
the process of communication. All elements of the model, especially pragmatic and
strategic abilities, needed to be included in the constructs of language testing and
in the actual performance required of test-takers.
Communicative testing presented challenges to test designers, as we will see
in subsequent chapters of this book. Test designers began to identify the kinds of
real-world tasks that language learners were called on to perform. It was clear that
the contexts for those tasks were extraordinarily widely varied and that the sam-
pling of tasks for any assessment procedure needed to be validated by what lan-
guage users actually do with language. Weir (1990) reminded his readers that "to
measure language proficiency ... account must now be taken of where, when,
how, with whom, and why the language is to be used, and on what topics, and with
what effect" (p. 1). The assessment field also became more concerned with the
authenticity of tasks and the genuineness of texts. (See Skehan, 1988, 1989, and
Fulcher, 2000, for surveys of communicative testing research.)
Performance-Based Assessment
In language courses and programs around the world, test designers are now tackling
this new and more student-centered agenda (Alderson & Banerjee, 2001, 2002;
Bachman, 2002; Leung & Lewkowicz, 2006; Weir, 2005). Instead of offering paper-and-
pencil multiple-choice tests of a plethora of separate items, performance-based
assessment of language typically involves oral production, written production, open-
ended responses, integrated performance (across skill areas), group performance, and
other interactive tasks. To be sure, such assessment is time-consuming and therefore
expensive, but those extra efforts result in more direct and more accurate testing
because students are assessed as they perform actual or simulated real-world tasks.
In technical terms, higher content validity (see Chapter 2 for an explanation) is achieved
because learners are measured in the process of performing the targeted linguistic acts.
In an English language-teaching context, performance-based assessment means that
you may have a difficult time distinguishing between formal and informal assess-
ment. If you rely a little less on formally structured tests and a little more on evalu-
ation while students are performing various tasks, you will be taking some steps
toward meeting the goals of performance-based assessment. (See Chapter 10 for a
further discussion of performance-based assessment.)
A characteristic of many (but not all) performance-based language assessments
is the presence of interactive tasks, hence the alternative term, task-based
assessment, for such approaches. In such cases, the assessments involve learners
in actually performing the behavior that we want to measure. In interactive tasks,
test-takers are measured in the act of speaking, requesting, responding, or in com-
bining listening and speaking, and in integrating reading and writing. Paper-and-
pencil tests certainly do not elicit such communicative performance.
A prime example of an interactive language assessment procedure is an oral
interview. The test-taker is required to listen accurately to someone else and to
respond appropriately. If care is taken in the test design process, language elicited
and volunteered by the student can be personalized and meaningful, and tasks can
approach the authenticity of real-life language use. (See Chapter)
CURRENT "HOT TOPICS" IN CLASSROOM-BASED ASSESSMENT
Designing communicative, performance-based assessment rubrics continues to
challenge assessment experts and classroom teachers alike. In addition, three new
issues in the field are shaping our current understanding of effective assessment.
These are: (1) the effect of new theories of intelligence on the testing industry in
general, (2) the advent of what has come to be called "alternative" assessment, and
(3) the increasing use of computer technology in assessments of various kinds. We
briefly explore these issues here.
Multiple Intelligences
Intelligence was once viewed strictly as the ability to perform (a) linguistic and (b)
logical-mathematical problem solving. This IQ (intelligence quotient) concept of
intelligence has permeated the Western world and its way of testing for almost a
century. Because "smartness" in general is measured by timed, discrete-point tests
consisting of a hierarchy of separate items, why shouldn't every field of study be so
measured? For many years, we have lived in a world of standardized, norm-referenced
tests that are timed in a multiple-choice format and consist of a multiplicity of logic-
constrained items, many of which are inauthentic.
However, in the last two decades of the twentieth century, research on intelli-
gence began to turn the psychometric world upside down. Howard Gardner (1983,
1999), for example, extended the traditional view of intelligence to eight different
components.¹ He accepted the traditional conceptualizations of linguistic intelli-
gence and logical-mathematical intelligence on which standardized IQ tests are
based but included other "frames of mind" in his theory of multiple intelligences:
spatial, musical, kinesthetic, naturalist, interpersonal, and intrapersonal. Robert
Sternberg (1988, 1997) also charted new territory in intelligence research in recog-
nizing creative thinking and manipulative strategies as part of intelligence. Likewise,
Daniel Goleman's (1995) concept of EQ (emotional quotient) spurred us to under-
score the importance of the emotions in our cognitive processing.
These conceptions of intelligence were not universally accepted by the
academic community (see White, 1988, for example). After all, how does one objec-
tively measure such hypothetical constructs as interpersonal intelligence, creativity,
and self-esteem? Nevertheless, their intuitive appeal infused educators with a sense
of both freedom and responsibility in their teaching and testing agendas, as
evidenced by educational reforms at the time (Armstrong, 1994).
In the language assessment field in particular, the recognition of multiple intelli-
gences has had an important effect. On the one hand, communicative class
materials in textbooks and programs have paid increasing attention to diversity of
learning abilities and styles. Christison (2005), for example, offered more than 150
activities for language learners, each emphasizing specific intelligences. On the
other hand, in classroom assessment, new views on intelligence have helped to free
language-instruction programs from relying exclusively on timed, discrete-point,
analytical tests in measuring language. Classroom language teachers have been
prodded to cautiously combat the potential tyranny of "objectivity" and its
accompanying impersonal approach. Teachers and administrators have also been
urged to measure whole language skills, learning processes, and the ability to nego-
tiate meaning. Our challenge continues to be designing assessments that tap into
interpersonal, creative, communicative, and interactive skills and in doing so place
some trust in our subjectivity and intuition.

¹ For a summary of Gardner's theory of intelligence, see H. D. Brown (2007, pp. 107-110).
Traditional and Alternative Assessment
Implied in some of the earlier description of performance-based classroom assess-
ment is a trend to supplement traditional test designs with alternatives that are more
authentic in their elicitation of meaningful communication. Table 1.1 highlights dif-
ferences between the two approaches (adapted from Armstrong, 1994, and Bailey,
1998, p. 207).
Two caveats need to be stated here. First, the concepts in Table 1.1 represent
some overgeneralizations and should therefore be interpreted with caution. It is dif-
ficult, in fact, to draw a clear line of distinction between what Armstrong (1994) and
Bailey (1998) called traditional and alternative assessment. Many forms of assess-
ment fall between the two, and some combine the best of both.
Second, it is obvious that the table shows a bias toward alternative assessment,
and one should not be misled into thinking that everything on the left-hand side is
tainted whereas the list on the right-hand side offers salvation to the field of lan-
guage assessment. As Brown and Hudson (1998) aptly pointed out, the assessment
traditions available to us should be valued and utilized for the functions they pro-
vide. At the same time, we might all be stimulated to look at the right-hand list and
ask ourselves if, among those concepts, there are alternatives to assessment that we
can use constructively in our classrooms.
Table 1.1. Traditional and alternative assessment

Traditional Assessment             Alternative Assessment
                                   Continuous long-term assessment
Timed, multiple-choice format      Untimed, open-ended responses
Decontextualized test items        Contextualized communicative tasks
Scores suffice for feedback        Individualized feedback
Norm-referenced scores             Criterion-referenced scores
Focus on discrete answers          Open-ended, creative answers
Summative                          Formative
Oriented to product                Oriented to process
Noninteractive performance         Interactive performance
Fosters extrinsic motivation       Fosters intrinsic motivation

It should be noted here that considerably more time and higher institutional
budgets are required to administer and score assessments that presuppose more
subjective evaluation, more individualization, and more interaction in the process
of offering feedback. The payoff for the latter, however, comes with more useful
feedback to students, the potential for intrinsic motivation, and ultimately a more
complete description of a student's ability. (See Chapter 6 for a complete treatment
of alternatives in assessment.) More educators and advocates for educational reform
are arguing for a de-emphasis of large-scale standardized tests in favor of contextu-
alized, communicative, performance-based assessment that will better facilitate
learning in our schools. (In Chapters 4 and 5, issues surrounding standardized
testing are addressed at length.)
Computer-Based Testing
Recent years have seen a burgeoning of computer technology and applications of
that technology to language learning and teaching. Virtually every language
learner worldwide is, to a lesser or greater extent, a user of computers, the
Internet, iPods, cell phones, the Web, and other common cybertechnology. It's no
surprise, then, that an overwhelming number of language courses utilize some
form of computer-assisted language learning (CALL) to achieve their goals, as
recent publications show (Chapelle, 2005; Chapelle & Jamieson, 2008; de
Szendeffy, 2005; H. D. Brown, 2007a).
The assessment of language learning is no exception to the mushrooming
growth of computer technology in educational contexts (see Chapelle & Douglas,
2006; Douglas & Hegelheimer, 2008; Jamieson, 2005, for overviews of computer-
based second language testing). Some computer-based tests are small-scale "home-
grown" tests available on a plethora of Web sites. Others are standardized,
large-scale tests in which tens of thousands of test-takers may be involved. Students
receive prompts (or probes, as they are sometimes referred to) in the form of
spoken or written stimuli from a preprogrammed algorithm and are required to type
(or in some cases, speak) their responses. Most computer-based test items have
fixed, closed-ended responses; however, tests such as the Test of English as a
Foreign Language (TOEFL® Test) now offer a written essay section and an oral pro-
duction section, both of which are scored by humans (as opposed to automatic,
electronic, or machine scoring).
Recent developments in computer-based assessment include contributions of
corpus linguistics in providing more authenticity, the design of more complex
tasks in computer-delivered tests, the utilization of speech and writing recognition
software to score oral and written production (Jamieson, 2005), and some
intriguing questions about "whether and how the delivery medium [of computer-
based language testing] changes the nature of the construct being measured"
(Douglas & Hegelheimer, 2008, p. 116).
A specific type of computer-based test, a computer-adaptive test (CAT), has
been available for many years but has recently gained momentum. In a computer-
adaptive test, each test-taker receives a set of questions that meet the test specifica-
tions and are generally appropriate for his or her performance level. The CAT starts
with questions of moderate difficulty. As test-takers answer each question, the
computer scores the question and uses that information, as well as the responses to
previous questions, to determine which question will be presented next. As long as
examinees respond correctly, the computer typically selects questions of greater or
equal difficulty. Incorrect answers, however, typically bring questions of lesser or
equal difficulty. The computer is programmed to fulfill the test design as it continu-
ously adjusts to find questions of appropriate difficulty for test-takers at all perfor-
mance levels. In CATs, the test-taker sees only one question at a time, and the
computer scores each question before selecting the next one. As a result, test-takers
cannot skip questions, and once they have entered and confirmed their answers, they
cannot return to questions or to any earlier part of the test.
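The selection logic described in this paragraph can be sketched as a simple up/down rule in Python. This is a deliberately simplified illustration under stated assumptions: the Item class, the run_cat function, and the integer difficulty levels are all hypothetical, and operational CATs generally rely on more sophisticated statistical models of item difficulty than the one-step rule shown here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    prompt: str
    answer: str
    difficulty: int  # assumed scale: 1 = easiest


def run_cat(bank, respond, num_items=5):
    """Administer up to num_items questions adaptively (illustrative only).

    `respond(item)` returns the test-taker's answer as a string.
    Returns a list of (item, was_correct) pairs in presentation order.
    """
    levels = sorted({item.difficulty for item in bank})
    remaining = list(bank)
    idx = len(levels) // 2  # start at moderate difficulty
    history = []
    while remaining and len(history) < num_items:
        target = levels[idx]
        # Present the unused item whose difficulty is closest to the target.
        item = min(remaining, key=lambda it: abs(it.difficulty - target))
        remaining.remove(item)  # one question at a time, never revisited
        correct = respond(item).strip().lower() == item.answer.lower()
        history.append((item, correct))
        # Score before selecting the next item: step up after a correct
        # answer, down after an incorrect one, staying within the bank's range.
        if correct:
            idx = min(idx + 1, len(levels) - 1)
        else:
            idx = max(idx - 1, 0)
    return history


if __name__ == "__main__":
    bank = [Item(f"Question at level {d}", "a", d) for d in range(1, 6)]
    for item, ok in run_cat(bank, lambda it: "a", num_items=3):
        print(item.difficulty, ok)
```

Because each item is removed from the bank as soon as it is presented, the sketch also reflects the point that test-takers cannot skip or return to questions.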
Computer-based testing, with or without CAT technology, offers these advantages:
• a variety of easily administered classroom-based tests
• self-directed testing on various aspects of a language (vocabulary, grammar,
  discourse, one or all of the four skills, etc.)
• practice for upcoming high-stakes standardized tests
• some individualization, in the case of CATs
• large-scale standardized tests that can be administered easily to thousands of
  test-takers at many different stations, then scored electronically for rapid
  reporting of results
• improved (but imperfect) technology for automated essay evaluation and
  speech recognition (Douglas & Hegelheimer, 2008)
Of course, some disadvantages are present in our current predilection for com-
puter-based testing. Among them:
• Lack of security and the possibility of cheating are inherent in unsupervised
  computerized tests.
• Occasional "homegrown" quizzes that appear on unofficial Web sites may be
  mistaken for validated assessments.
• The multiple-choice format preferred for most computer-based tests contains
  the usual potential for flawed item design (see Chapter 3).
• Open-ended responses are less likely to appear due to (a) the expense and
  potential unreliability of human scoring or (b) the complexity of recognition
  software for automated scoring.
• The human interactive element (especially in oral production) is absent.
• Validation issues stem from test-takers approaching tasks as test tasks
  rather than as real-world language use (Douglas & Hegelheimer, 2008).
Some argue that computer-based testing, pushed to its ultimate level, might
militate against recent efforts to return testing to its artful form of (a) being tailored
by teachers for their classrooms, (b) being designed to be performance-based,
and (c) allowing a teacher-student dialogue to form the basis of assessment.
This need not be the case. While "computer-assisted language tests [CALTs]
have not fully lived up to their promise ... research and development of CALTs
continues in interesting and principled directions" (Douglas & Hegelheimer,
2008, p. 127). Computer technology can be a boon to both communicative lan-
guage teaching and testing (Chapelle & Jamieson, 2008; Jamieson, 2005).
Teachers and test-makers now have access to an ever-increasing range of tools to
make computer-based testing less formulaic. By using technological innovations
creatively, testers will be able to enhance authenticity, increase interactive
exchange, and promote autonomy.
OTHER CURRENT ISSUES
A survey of recent articles and books on language assessment yields several other
current issues, beyond those mentioned above, that are being probed in assessment
circles around the world. They will be discussed in subsequent chapters of this
book but deserve some mention now, as you begin this journey into the intricacies
of language assessment.
Direct assessment of speaking and writing. Whether by means of human
evaluators or automated computer-based software, the testing industry has begun
a new era in the direct assessment of productive skills. "Direct" means that test-
takers must actually "do" language, not just respond to questions "about" language.
In bygone years, test designers shrank from involving test-takers in actual lan-
guage performance, especially in large-scale assessment, because of the cost and
unreliability of the endeavor. Now, with advances in discourse analysis, better
accounting of examiner-examinee interaction, improved rubrics, and technology-
enhanced scoring, the testing industry is taking on the challenge of direct assess-
ment (ior, 200).
Advances in corpus linguistics. As literally billions of words and sentences are
gathered from the real world, logged into linguistic corpora, and catalogued into
manageable, retrievable data, the design of assessment instruments is being revolu-
tionized. The old complaint that the language of standardized tests was too "phony"
and contrived should no longer hold in the near future as we have access to real lan-
guage spoken and written in the real world (Conrad, 2005).
Standards-based assessment. In Chapter 4, we go into detail on issues
surrounding standards-based assessment or, as the process is known in some cir-
cles, establishing benchmarks or frameworks of reference. Around the
world, educational institutions are demanding common criteria for evaluation of
students entering into programs, advancing from one course to another, and
graduating from one grade to the next. Such benchmarks are a hotbed of con-
troversy at times as teachers, administrators, and politicians clash over ethical
issues (McNamara & Roever, 2006). In more constructive moments, they provide
much-needed standardization.
Consequential validity (impact). As you will soon read in the next chapter and
then again in Chapter 4, according to some researchers, the impact of standardized
tests, ubiquitous in many societies, governs and determines people's entire education
(Shohamy, 2001). The antithesis of the negative aspects of such impact is the poten-
tial for reexamining the washback that tests and assessments can provide in the
language classroom (Tyla, 200).
* * *
As you read this book, we hope you will do so with an appreciation for the place of
testing in assessment and with a sense of the interconnection of assessment and
teaching. Assessment is an integral part of the teaching-learning cycle. In an inter-
active, communicative curriculum, assessment is almost constant. Tests, which are
a subset of assessment, can provide authenticity, motivation, and feedback to the
learner. Tests are essential components of a successful curriculum and one of sev-
eral partners in the learning process. Keep in mind these basic principles:
1. Periodic assessments, both formal and informal, can increase motivation by
   serving as milestones of student progress.
2. Appropriate assessments aid in the reinforcement and retention of information.
3. Assessments can confirm areas of strength and pinpoint areas needing
   further work.
4. Assessments can provide a sense of periodic closure to modules within
   a curriculum.
5. Assessments can promote student autonomy by encouraging students' self-
   evaluation of their progress.
6. Assessments can spur learners to set goals for themselves.
7. Assessments can aid in evaluating teaching effectiveness.
2. Answers tothe analogies quiz on page 1.2.03. 485 bo
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (G) In a small group, look at Figure 1.1 on page 6 that shows, among other
   relationships, tests as a subset of assessment and the latter as a subset of
   teaching. Consider the following classroom teaching techniques: choral drill,
   pair pronunciation practice, reading aloud, information gap task, singing
   songs in English, writing a description of the weekend's activities. In your
   group, specifically describe aspects of each that could be used for assessment
   purposes. Share your conclusions with the rest of the class.
2. (G) The chart on the next page shows a hypothetical line of distinction between
   formative and summative assessment and between informal and formal assess-
   ment. In a group, reproduce the chart on a large sheet of paper (or four sheets).
   Then come to a consensus on which of the four cells each of the following tech-
   niques/procedures would be placed in and justify your decision. Share your
   results with other groups and discuss any differences of opinion.
   Placement tests
   Diagnostic tests
   Periodic achievement tests
   Short pop quizzes
   Standardized proficiency tests
   Final exams
   Portfolios
   Journals
   Speeches (prepared and rehearsed)
   Oral presentations (prepared but not rehearsed)
   Impromptu student responses to teacher's questions
   Student-written response (one paragraph) to a reading assignment
   Drafting and revising writing
   Final essays (after several drafts)
   Student oral responses to teacher questions after a videotaped lecture
   Whole-class open-ended discussion of a topic

                 Formative        Summative
   Informal
   Formal
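3. (I/C) On your own, research the distinction between norm-referenced and
   criterion-referenced testing and how bell-shaped and other distributions are
   derived. Then consider this question: If norm-referenced tests typically yield
   a distribution of scores that resemble a bell-shaped curve, what kinds of
   distributions are typical of classroom achievement tests in your experience?
   Report your findings to the class.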
4. (C) In a whole-class discussion, ask volunteers to describe personal experiences
   (tell their own stories) about either taking or administering the five types of
   tests described on pages 9-12. In each case, have the volunteers assess the
   extent to which the test was successful in accomplishing its purpose.
5. (C) In a whole-class discussion, brainstorm a variety of test tasks (e.g., multiple-
   choice, true/false, essay questions) that class members have experienced in
   learning a foreign language and make a list on the board of all the tasks that are
   mentioned. Then decide which of those tasks are performance-based, which
   are communicative, and which are both, neither, or perhaps fall in between.
6. (G) Look at the list of Gardner's eight intelligences. Take one or two intelli-
   gences, as assigned to your group, and brainstorm some teaching activities
   that foster that type of intelligence. Then brainstorm some assessment tasks
   that may presuppose the same intelligence to perform well. Share your results
   with other groups.
7. (G) Table 1.1 lists traditional and alternative assessment tasks and characteris-
   tics. In pairs, brainstorm both the positive and the negative aspects of each item in
   each list. Share your conclusions, which should yield a balanced perspective,
   with the rest of the class.
8. (C) Ask class members to share any experiences with computer-based testing
   and evaluate the advantages and disadvantages of those experiences.
9. (G) With a partner, visit Dave's ESL Café at hts con/ and look at
   some of the dozens of quizzes that are presented. Can you determine the
   extent to which such quizzes are useful for a classroom teacher and the
   extent to which they may present some problems and disadvantages? Report
   your findings back to the class.
FOR YOUR FURTHER READING
McNamara, Tim. (2000). Language testing. Oxford: Oxford University Press.
One of a number of Oxford University Press's brief introductions to various
areas of language study, this 140-page primer on testing offers definitions
of basic terms in language testing with brief explanations of fundamental
concepts. It is a useful little reference book to check your understanding of
testing jargon and issues in the field.
Mousavi, Seyyed Abbas. (2008). An encyclopedic dictionary of language testing
(his ed). Tehran: Rahnama Publications.
This is a highly useful, very detailed compilation of virtually every term in
the field of language testing, with definitions, background history, and
research references. It provides comprehensive explanations of theories,
principles, issues, tools, and tasks and an exhaustive §-page bibliography.
A shorter version of this 946-page tome may be found in Mousavi's (1958)
Dictionary of Language Testing (Tehran: Rahnama Publications).
OBJECTIVES: After reading this chapter, you will be
duction and the behaviors (listening, reading, grammaticality detection, and
writing) actually sampled on such tests (Duran, Canale, Penfield, Stansfield, &
Liskin-Gasparro, 1985). Because of the crucial need to offer financially affordable
proficiency tests and the high cost of administering and scoring oral production
tests, the omission of oral content was justified as an economic necessity. However,
in the last decade, with advances in developing rubrics for scoring oral production
tasks and in automated speech recognition software, more general language profi-
ciency tests include oral production tasks, largely stemming from the demands of
the professional community for authenticity and content validity.
Consequential Validity (Impact)
As well as these three widely accepted forms of evidence that may be introduced to
support the validity of an assessment, two other categories may be of some interest
and utility in your own quest for validating classroom tests. Messick (1989),
T. McNamara (2000), Brindley (2001), Fulcher and Davidson (2007), and Gronlund
and Waugh (2008), among others, underscore the potential importance of the con-
sequences of using an assessment. Consequential validity encompasses all the
consequences of a test, including such considerations as its accuracy in measuring
intended criteria, its effect on the preparation of test-takers, and the (intended and
unintended) social consequences of a test's interpretation and use.
Bachman and Palmer (1996), McKay (2000), Davies (2003), and Choi (2008) use
the term impact to refer to consequential validity, perhaps more broadly encom-
passing the many consequences of assessment, before and after a test administration.
The impact of test-taking and the use of test scores can, according to Bachman and
Palmer (p. 30), be seen at both a macro level (the effect on society and educational
systems) and a micro level (the effect on individual test-takers). At the macro level,
Choi argued that the wholesale employment of standardized tests for such gate-
keeping purposes as college admission "deprive students of crucial opportunities to
learn and acquire productive language skills," causing test consumers to be "increas-
ingly disillusioned with EFL testing" (p. 58). More will be said about impact and
related issues of values, social consequences, ethics, and fairness in Chapter 4.
As high-stakes assessment has gained ground in the last two decades, one
aspect of consequential validity has drawn special attention: the effect of test prepa-
ration courses and manuals on performance. T. McNamara (2000) cautioned against
test results that may reflect socioeconomic conditions such as opportunities for
coaching that are "differentially available to the students being assessed (for
example, because only some families can afford coaching, or because children with
more highly educated parents get help from their parents)" (p. 5).
At the micro level, specifically the classroom instructional level, another impor-
tant consequence of a test falls into the category of washback, to be defined and
more fully discussed later in this chapter. Gronlund and Waugh (2008) encouraged
teachers to consider the effect of assessments on students' motivation, subsequent
performance in a course, independent learning, study habits, and attitude toward
schoolwork.
Face Validity
A further facet of consequential validity is the extent to which "students view the
assessment as fair, relevant, and useful for improving learning" (Gronlund, 1998,
p. 210), or what has popularly been called, or misnamed, face validity. "Face
validity refers to the degree to which a test looks right, and appears to measure the
knowledge or abilities it claims to measure, based on the subjective judgment of the
examinees who take it, the administrative personnel who decide on its use, and
other psychometrically unsophisticated observers" (Mousavi, 2008, p. 247).
Despite the intuitive appeal of the concept of face validity, it remains a notion
that cannot be empirically measured or theoretically justified under the category of
validity. It is purely a factor of the "eye of the beholder": how the test-taker, or pos-
sibly the test giver, intuitively perceives an instrument. For this reason, many assess-
ment experts (see Bachman, 1990, pp. 285-289) view face validity as a superficial
factor that is too dependent on the whim of the perceiver. In Bachman's "post-
mortem" on face validity, he echoes Mosier's (1947, p. 196) decades-old contention
that face validity is a "pernicious fallacy ... [that should be] purged from the
technician's vocabulary."
At the same time, Bachman (1990) and other assessment experts "grudgingly"
agree that test appearance does indeed have an effect that neither test-takers nor
test designers can ignore. Students may for a variety of reasons feel that a test isn't
testing what it's supposed to test, and this might affect their performance and, con-
sequently, create the student-related unreliability referred to previously. So student
perception of a test's fairness matters in classroom-based assessment, and teachers
can strengthen that perception by using:
• a well-constructed, expected format with familiar tasks
• tasks that can be accomplished within an allotted time limit
• items that are clear and uncomplicated
• directions that are crystal clear
• tasks that have been rehearsed in their previous course work
• tasks that relate to their course work (content validity)
• a difficulty level that presents a reasonable challenge
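Finally, the issue of face validity reminds us that the psychological state of the
learner (confidence, anxiety, etc.) is an important ingredient in peak performance.
Students can be distracted and their anxiety increased if you "throw a curve" at
them on a test. They need to have rehearsed test tasks before the fact and feel com-
fortable with them. A classroom test is not the time to introduce new tasks, because
you won't know if student difficulty is a factor of the task itself or of the objectives
you are testing.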
I once administered a dictation test and a cloze test (see Chapter 9 for a dis-
cussion of cloze tests) as a placement test for a group of learners of English as a
second language. Some learners were upset because such tests, on the face of it,
did not appear to them to test their true abilities in English. They felt that a multiple-
choice grammar test would have been the appropriate format to use. A few claimed
they didn't perform well on the cloze and dictation because they were not accus-
tomed to these formats. As it turned out, the tests served as superior instruments
for placement, but the students did not think so.
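As already noted above, validity is a complex concept, yet it is indispensable
to the teacher's understanding of what makes a good test. We do well to heed Messick's
(1989, p. 3) caution that validity is not an all-or-none proposition and that various
forms of validity may need to be applied to a test in order to be satisfied with its
overall effectiveness. If in your language assessment procedures you can make a
point of primarily focusing on content and criterion validity, then you are well on
your way to making accurate judgments about the competence of the learners with
whom you are working.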
AUTHENTICITY
A fourth major principle of language testing is authenticity, a concept that is
difficult to define, especially within the art and science of evaluating and designing
tests. Bachman and Palmer (1996) defined authenticity as "the degree of correspon-
dence of the characteristics of a given language test task to the features of a target
language task" (p. 23) and then suggested an agenda for identifying those target
language tasks and for transforming them into valid test items.
ica det or mesure (Lewinics (200) suse he dha
so ating bent agg assent) After a he ct AY
aac ask ox angage seals s"reaiwor” or ao? Ofea such joan
Tae ject and yt suthentcy concept ha angageesing PT 7
eda oe de of enon co Gaiman & Pes 1996 Fue: & DS,
Fey. panes, acolo Chia (2006), many eps asst Fak
sword sks
Essentially, when you make a claim for authenticity in a test task, you are
saying that this task is likely to be enacted in the real world. Many test item types
fail to simulate real-world tasks. They may be contrived or artificial in their attempt to
target a grammatical form or a lexical item. The sequencing of items that bear no
relationship to one another lacks authenticity. One does not have to look very long
to find reading comprehension passages in proficiency tests that do not reflect a
real-world passage.
In a test, authenticity may be present in the following ways:
AN AUTHENTIC TEST ...
• contains language that is as natural as possible
• has items that are contextualized rather than isolated
• includes meaningful, relevant, interesting topics
• provides some thematic organization to items, such as through a
  story line or episode
• offers tasks that replicate real-world tasks
The authenticity of test tasks in recent years has increased noticeably. Two or
three decades ago, unconnected, boring, contrived items were accepted as a neces-
sary component of testing. Things have changed. It was once assumed that large-
scale testing could not include performance of the productive skills and stay within
budgetary constraints, but now many such tests offer speaking and writing compo-
nents. Reading passages are selected from real-world sources that test-takers are
likely to have encountered or will encounter. Listening comprehension sections fea-
ture natural language with hesitations, white noise, and interruptions. More tests
offer items that are episodic in that they are sequenced to form meaningful units,
paragraphs, or stories.
We invite you to take up the challenge of authenticity in your classroom tests.
As we explore many different types of tasks in this book, especially in Chapters 6
through 9, the principle of authenticity will be very much in the forefront.
‘WASHBACK,
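A facet of consequential validity discussed above is "the effect of testing on teaching and learning" (Hughes, 2003, p. 1), otherwise known in the language assessment field as washback. To distinguish the impact of an assessment, discussed above, from washback, think of the latter as referring almost always to classroom-based issues such as the extent to which assessment affects a student's future language development. Messick (1996, p. 241) reminded us that the washback effect may refer to both the promotion and the inhibition of learning, thus emphasizing what may be referred to as beneficial versus harmful (or negative) washback. Alderson and Wall (1993) considered washback an important enough concept to define a Washback Hypothesis that essentially elaborated on how tests influence both teaching and learning. Cheng, Watanabe, and Curtis (2004) devoted an entire anthology to the issue of washback, and Spratt (2005) challenged teachers to become agents of beneficial washback in their language classrooms.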
The following factors comprise the concept of washback:
A TEST THAT PROVIDES BENEFICIAL WASHBACK
• positively influences what and how teachers teach
• positively influences what and how learners learn
• offers learners a chance to adequately prepare
• gives learners feedback that enhances their language development
• is more formative in nature than summative
• provides conditions for peak performance by the learner
as eee eee
ee
so ee eae
ee
Satan ence oca
lp ppc
coun cp ncexed ceecce in cran bagged (Chapel
i ae cersate toto
ee
sat i oe
eae ret aa
Se ener
ee Se ores
een ene ene
oa ney aero
soit cg nage eeepc
een acreerio ata
en pegepeerrine
ae erode
pee ere
ae ete er
meee eer oeecnrreett
Se eee eee
eee ote
a
=
say itace watts commen aul tod seca oe
ee anes
with a single letter grade or numerical score and consider their job done. In reality, letter grades and numerical scores give absolutely no information of intrinsic interest to the student. Grades and scores alone, without comments and other feedback, reduce the linguistic and cognitive performance data available to the student to almost nothing. At best, they give a relative indication of a formulaic judgment of performance as compared to others in the class, which fosters competitive, not cooperative, learning.
With this in mind, when you return a written test or a data sheet from an oral production test, consider giving more than a number, grade, or phrase as your feedback. Even if your evaluation is not a neat paragraph appended to the test, you can respond to as many details throughout the test as time will permit. Give praise for strengths (the "good stuff") as well as constructive criticism of weaknesses. Give strategic hints on how a student might improve certain elements of performance. In other words, take some time to make the test performance an intrinsically motivating experience from which a student will gain a sense of accomplishment and challenge.
A little bit of washback may also help students through a specification of the numerical scores on the various subsections of the test. A subsection on verb tenses, for example, that yields a relatively low score may serve the diagnostic purpose of showing the student an area of challenge.
Another viewpoint on washback is achieved by a quick consideration of differences between formative and summative tests, mentioned in Chapter 1. Formative tests, by definition, provide washback in the form of information to the learner on progress toward goals. But teachers might be tempted to feel that summative tests, which provide assessment at the end of a course or program, do not need to offer much in the way of washback. Such an attitude is unfortunate because the end of every language course or program is always the beginning of further pursuits, more learning, more goals, and more challenges to face. Even a final examination in a course should carry with it some means for giving washback to students.
In my courses I never give a final examination as the last scheduled classroom session. I always administer a final exam during the penultimate session, then complete the evaluation of the exams in order to return them to students during the last class. At this time, the students receive scores, grades, and comments on their work, and I spend some of the class session addressing material on which the students were not completely clear. My summative assessment is thereby enhanced by some beneficial washback that is usually not expected of final examinations.
Finally, washback implies that students have ready access to you to discuss the feedback and evaluation you have given. Whereas you almost certainly have known teachers with whom you wouldn't dare argue about a grade, an interactive, cooperative, collaborative classroom can promote an atmosphere of dialogue between students and teachers regarding evaluative judgments. For learning to continue, students need to have a chance to feed back on your feedback, to seek clarification of any issues that are fuzzy, and to set new and appropriate goals for themselves for the days and weeks ahead.
APPLYING PRINCIPLES TO THE EVALUATION
OF CLASSROOM TESTS
The five principles of practicality, reliability, validity, authenticity, and washback go a long way toward providing useful guidelines for both evaluating an existing assessment procedure and designing one on your own. Quizzes, tests, final exams, and standardized proficiency tests can all be scrutinized through these five lenses.
Are there other principles that should be invoked in evaluating and designing assessments? The answer, of course, is yes. Language assessment is an extraordinarily broad discipline with many branches, interest areas, and issues. The process of designing effective assessment instruments is far too complex to be reduced to five principles. Good test construction, for example, is governed by research-based rules of test preparation, sampling of tasks, item design and construction, scoring responses, ethical standards, and so on. But the five principles cited here serve as an excellent foundation on which to evaluate existing instruments and to build your own.
We will look at how to design tests in Chapter 3. The tips and checklists that follow in this chapter, indexed by the five principles, will help you evaluate existing tests for your own classroom. It is important for you to remember, however, that the sequence of these questions does not imply a priority order. Validity, for example, is certainly the most significant cardinal principle of assessment evaluation. Practicality could be a secondary issue in classroom testing. Or, for a particular test, you may need to place authenticity as your primary consideration. When all is said and done, however, if validity is not substantiated, all other considerations may be rendered useless.
1. Are the test procedures practical?
Practicality is determined by the teacher's (and the students') time constraints, costs, and administrative details, and to some extent by what occurs before and after the test. To determine whether a test is practical for your needs, you may want to use the checklist below.
PRACTICALITY CHECKLIST
2. Can students complete the test reasonably within the set time frame?
3. Can the test be administered smoothly, without procedural "glitches"?
4. Are all printed materials accounted for?
5. Has equipment been pretested?
6. Is the scoring/evaluation system feasible in the teacher's time frame?
7. Is the cost of the test within budgeted limits?
8. Are methods for reporting results determined in advance?
As this checklist suggests, after you account for the administrative details of giving a test, you need to think about the practicality of your plans for scoring the test. In teachers' busy lives, time often emerges as the most important factor, one that overrides other considerations in evaluating an assessment. If you need to tailor a test to fit your own time frame, as teachers frequently do, you need to accomplish this without damaging the test's validity and washback. Teachers should, for example, avoid the temptation to offer only quickly scored multiple-choice selection items that may be neither appropriate nor well designed. Everyone knows teachers secretly hate to grade tests (almost as much as students hate to take them) and will do almost anything to get through that task as quickly and effortlessly as possible. Yet good teaching almost always implies an investment of the teacher's time in giving feedback (comments and suggestions) to students on their tests.
2. Is the test itself reliable?
Reliability applies to the student, the test administration, the test itself, and the teacher. At least four sources of unreliability must be guarded against, as noted in this chapter on pages 27-29. Test and test administration reliability can be achieved by making sure that all students receive the same quality of input, whether written or auditory. The following checklist should help you to determine if a test is itself reliable:
TEST RELIABILITY CHECKLIST
1. Does every student have a cleanly photocopied test sheet?
2. Is sound amplification clearly audible to everyone in the room?
3. Is video input clearly and uniformly visible to all?
4. Are lighting, temperature, extraneous noise, and other classroom conditions equal (and optimal) for all students?
5. For closed-ended responses, do scoring procedures leave little debate about correctness of an answer?
3. Can you ensure rater reliability?
Rater reliability, another common issue in assessments, may be more difficult, perhaps because we too often overlook this as an issue. Because classroom tests rarely involve two scorers, interrater reliability is seldom an issue. Instead, intrarater reliability is of constant concern to teachers: What happens to our fallible concentration and stamina over the period of time during which we are evaluating a test? Teachers need to find ways to maintain their focus and energy over the time it takes to score assessments. In open-ended response tests, this issue is of paramount importance. It is easy to let mentally established standards erode over the hours required to evaluate the test.
Intrarater reliability for open-ended responses may be enhanced by answering these questions:
INTRARATER RELIABILITY CHECKLIST
1. Have you established consistent criteria for correct responses?
2. Can you give uniform attention to those criteria throughout the evaluation time?
3. Can you guarantee that scoring is based only on the established criteria and not on extraneous or subjective variables?
4. Have you read through tests at least twice to check for consistency?
5. If you have made "midstream" modifications of what you consider a correct response, did you go back and apply the same standards to all?
6. Can you avoid fatigue by reading the tests in several sittings, especially if the time requirement is a matter of several hours?
4, Does the procedure demonstrate content validity?
The major source for establishing validity in a classroom test is content validity: the extent to which the assessment requires students to perform tasks that were included in the previous classroom lessons and that directly represent the objectives of the unit on which the assessment is based. If you have been teaching an English language class to students who have been reading, summarizing, and responding to short passages, and if your assessment is based on this work, then to be content valid, the test needs to include performance in those skills. For classroom assessments, content and criterion validity are closely linked, because lesson or unit objectives are essentially the criterion of an assessment covering that lesson or unit.
Several steps might be taken to evaluate the content validity of a classroom test:
CONTENT VALIDITY CHECKLIST (FOR A TEST ON A UNIT)
1. Are unit objectives identified?
2. Are unit objectives represented in the form of test specifications? (See the next page for details on test specifications.)
3. Do the test specifications include tasks that have already been performed as part of the course procedures?
4. Do the test specifications include tasks that represent all (or most) of the objectives for the unit?
5. Do those tasks involve actual performance of the target task(s)?
A primary issue in establishing content validity is recognizing that underlying every good classroom test are the objectives of the lesson, module, or unit of the course in question. So the first measure of an effective classroom test is the identification of objectives. Sometimes this is easier said than done. Too often teachers work through lessons day after day with little or no cognizance of the objectives they seek to fulfill. Or perhaps those objectives are so poorly defined that determining whether they were accomplished is impossible.
A second issue in content validity is test specifications (specs). Don't let this word scare you. It simply means that a test should have a structure that follows logically from the lesson or unit you are testing. Many tests have a design that
• divides them into a number of sections (corresponding, perhaps, to the objectives that are being assessed)
• offers students a variety of item types
• gives an appropriate relative weight to each section
Some tests, of course, do not lend themselves to this kind of structure. A test in a course in academic writing at the university level might justifiably consist of an in-class written essay on a given topic (only one "item" and one response, in a manner of speaking). But in this case the specs would be embedded in the prompt itself and in the scoring or evaluation rubric used to grade it and give feedback. We return to the concept of test specs in the next chapter.
The content validity of an existing classroom test should be apparent in how the objectives of the unit being tested are represented in the form of the content of items, clusters of items, and item types. Do you clearly perceive the performance of test-takers as reflective of the classroom objectives? If so (and you can argue this), content validity has most likely been achieved.
5. Has the impact of the test been carefully accounted for?
This question integrates the concept of consequential validity (impact) and the importance of structuring an assessment procedure to elicit the optimal performance of the student. Remember that even though it is an elusive concept, the appearance of a test from a student's point of view is important to consider.
The following factors might help you to pinpoint some of the issues surrounding the impact of a test:
CONSEQUENTIAL VALIDITY CHECKLIST
1. Have you offered students appropriate review and preparation for the test?
2. Have you suggested test-taking strategies that will be beneficial?
3. Is the test structured so that, if possible, the best students will be modestly challenged and the weaker students will not be overwhelmed?
4. Does the test lend itself to your giving beneficial washback?
5. Are the students encouraged to see the test as a learning experience?
6. Is the procedure "biased for best"?
A phrase that has come to be associated with consequential validity is "biased for best," a term that goes a little beyond how the student views the test to a degree of strategic involvement on the part of student and teacher in preparing for, setting up, and following up on the test itself. According to Swain (1984), to give an assessment procedure that is biased for best, a teacher provides conditions for a student's optimal performance. In such a case, your role is not to be "tricky" or to scare your students but to encourage them and bring out the best in their performance. Cohen (2006) supported Swain's concept in a comprehensive discussion of research that showed the positive effects of students' awareness and utilization of test-taking strategies.
It's easy for teachers to forget how challenging some tests can be, and so a well-planned testing experience will include some strategic suggestions on how students might optimize their performance. In evaluating a classroom test, consider the extent to which before-, during-, and after-test options are fulfilled.
7. Are the test tasks as authentic as possible?
Evaluate the extent to which a test is authentic by asking the following questions:
AUTHENTICITY CHECKLIST
1. Is the language in the test as natural as possible?
2. Are items as contextualized as possible rather than isolated?
3. Are topics and situations interesting, enjoyable, and/or humorous?
4. Is some thematic organization provided, such as through a story line or episode?
5. Do tasks represent, or closely approximate, real-world tasks?
Consider the following two excerpts from tests, and the concept of authenticity may become a little clearer.
Multiple-choice tasks: contextualized
Dens: Aer answering the questn, oe the‘Sabit ten,
Going To”
1. Amanda: What ___ this weekend?
Oynsgeng a
Oaeyougin boo
jour
2. Gwen: I'm not sure. ___ anything special?
Ode yu gig ta
‘Oto aregang oo
Ob pig nce
3. Amanda: Melissa and I ___ a party. Would you like to come?
ate einato
are gehg
Owe
4. Gwen: I'd love to! ___?
(oWhatsing tobe
(OW tobe
‘Ore it te
5. Amanda: It's ___ to be at Ruth's house.
on
9
Ome
Ape om SV, fo DarsESL UEi
Multiple-choice tasks: decontextualized
“Going To”
1. What ___ this summer?
dein ig do
Bis.in gig bio
youre gong ao
2. ___ anything special next weekend?
‘Rhee oot oto
Bw pig 8
Gls ing todo
4. Steand|__my Engle ass tomorow,
‘are wing
Baza
C gona
4. The Giants are playing baseball on Wednesday.
Aatsiaxng?
os Rang bt
(c's Egoig be pye?
5. The ocean's ___ to be at low tide later this morning.
Aw
Bing
Cig
The sequence of items in the contextualized tasks achieves a modicum of authenticity by contextualizing all the items in a story line. The conversation is one that might occur in the real world, even if with a little less formality. The sequence of items in the decontextualized tasks takes the test-taker into five different topic areas with no context for any, with the grammatical category as the only unifying element. Each sentence is likely to be written or spoken in the real world but only perhaps in five different contexts. Given the constraints of a multiple-choice format, on a measure of authenticity we would say the first excerpt is good and the second excerpt is only fair.
8. Does the test offer beneficial washback to the learner?
The design of an effective test should point the way to beneficial washback. A test that achieves content validity demonstrates relevance to the curriculum in question and thereby sets the stage for washback. When test items represent the various objectives of a unit, and/or when sections of a test clearly focus on major topics of the unit, classroom tests can serve in a diagnostic capacity even if they aren't specifically labeled as such.
The following checklist should help you to maximize beneficial washback in a test:
WASHBACK CHECKLIST
1. Is the test designed in such a way that you can give feedback that will be relevant to the objectives of the unit being tested?
2. Have you given students sufficient pretest opportunities to review the subject matter of the test?
3. In your written feedback to each student, do you include comments that will contribute to students' formative development?
4. After returning tests, do you spend class time "going over" the test and offering advice on what students should focus on in the future?
5. After returning tests, do you encourage questions from students?
6. If time and circumstances permit, do you offer students (especially the weaker ones) a chance to discuss results in an office hour?
groups are more valuable, in terms of measurable washback, than the test itself.
By spending classroom time after the test reviewing the content, students discover their areas of strength and weakness. Teachers can raise the washback potential ... Journal writing may provide students a specific place to record their feelings, what they learned, and their resolutions for future effort.
The five basic principles of language assessment have been expanded here into eight questions you might ask yourself about an assessment. As you use the principles and guidelines to evaluate various forms of tests and procedures, be sure to allow each one of the five to take on greater or lesser importance, depending on the context. In large-scale standardized testing, for example, practicality is usually more important than washback, but the reverse may be true of most classroom tests. Validity is of course always the final arbiter. Remember, too, that these principles, important as they are, are not the only considerations in evaluating or making an effective test. Leave some space for other factors to enter in.
The next chapter focuses on how to design a test. These same five principles underlie test construction as well as test evaluation, along with some new concepts that expand your ability to apply principles to the practicalities of language assessment in your own classroom.
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (C) Ask the class to volunteer brief descriptions of tests they have taken or given that illustrate, either positively or negatively, each of the five basic principles of language assessment that are defined and explained in this chapter. In the process, try to come up with examples of tests that illustrate (and differentiate) four kinds of reliability as well as the four types of evidence that support the validity of a test.
1. Standardized multiple-choice proficiency test, no oral or written production. Student (S) receives a report form listing a total score and subscores for listening, grammar, proofreading, and reading comprehension.
2. Timed impromptu test of written English (TWE Test). S receives a report form listing one holistic score ranging between 0 and 6.
3. One-on-one oral interview to assess overall oral production ability. S receives one holistic score ranging between 0 and 5.
2. (G/C) Some assessment experts contend that face validity is not a legitimate form of validity because it relies solely on the perception of the test-taker rather than an external measure. Nevertheless, a number of educational assessment experts recognize the perception of the test-taker as a very important factor in test design and administration. What is your opinion? How would you reconcile the two views?
3. (G) In the section on washback, it is stated that "Washback enhances a number of basic principles of language acquisition: intrinsic motivation, autonomy, self-confidence, language ego, interlanguage, and strategic investment, among others" (page 38). In a group, discuss the connection between washback and each of the above-named general principles of language learning and teaching. Describe specific examples or illustrations of each connection. If time permits, report your examples to the class.
4. (G) In a small group, evaluate the assessment scenarios in the chart on pages 49-50 by ranking the six factors listed there from 1 to 5 (with a score of 5 indicating that the principle is highly fulfilled and a score of 1 indicating very low or no fulfillment). Evaluate the scenarios by using your best intuition in the absence of complete information for each context. Report your group's findings to the rest of the class and compare.
4. S gives a five-minute prepared oral presentation in class. Teacher (T) evaluates by filling in a rating sheet indicating S's success in delivery, rapport, pronunciation, grammar, and content.
5. S listens to a fifteen-minute video lecture and takes notes. T makes individual comments on each S's notes.
6. S writes a take-home (overnight) one-page essay on an assigned topic. T reads paper and comments on organization and content only, then returns essay to S for a subsequent draft.
Flute eee of be eset in ems (te eb po
es fase dr Deep aes rt entero ea
Clap: ren: jourEdns eras wen apt ope een
7. S creates multiple drafts of an essay, peer- and T-reviewed, and turns in a final version. T comments on grammatical/rhetorical errors only.
This highly informative "state-of-the-art" article summarizes a number of issues in classroom-based language assessment. For a relatively short article,
‘sexta tad nis tenet eaaree
pone at ae ben sn ecaeyns Com cal
et 2ad lity a eam nd cee nd eee ey
theme scenes ctl assed dt vehoene eee
Data amples tat aches rh ely seat
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke, England: Palgrave Macmillan.
8. S assembles a portfolio of materials over a semester-long course. T conferences with S on the portfolio at the end of the semester.
5. (6) Ts chap ges cet ol ou uae and phe i pich
ls ofuguge snc nour poop, kt on pace
Desc set i onere in your op oko pe Dec
loi quesion: Dili es he eter be ec Repo ny
ctyou discs tact ote cs
6. (©)lnowr dscsionof input in scape te sugeson as mde hat
techs an prepress es yeas hem seeps :
pepuing ling sod ean tom tes. Test at ecg 8
"eb, dog nd afer cates “ee” seeps ol nde ig
iran abu what expand signs rb oc. i
“Dassen he is acing tes a ese.
eae 'Aer sep shasta omeneasaleinienng |
frugal cou Dene sens i sal pep, eign cet
often eps, ete cach goup Dea ono be
tees Repo you heck bc the ds
1. (6) in an accesible gage ins, he exces alow yout obsene
t= asesent prcfite tatabouto ae lets mses pe
tae tesa ue). Do te loi
2 Codes ee i te ache e pct ge
ines onthe pupae ft seen eis pce een
2. Ober pol de xl xian oe tse.
Aeaige hor ter wih th tech er et 17
CHAPTER 3
DESIGNING CLASSROOM
LANGUAGE TESTS
OBJECTIVES: After reading this chapter, you will be able to
• go beyond evaluating an existing test to actually designing one on your own
• analyze the purpose of a test
• state in explicit terms the objectives of a proposed test
• create test specifications for a proposed test
• design a variety of items (test methods) for a proposed test
• carry out the administration of a test, after checking a number of essential details involved
• construct a means for scoring, grading, and giving feedback on a proposed test
formative and summative tests, and norm- and criterion-referenced tests. Different types or purposes of assessment have been introduced to you. You've traced some of the historical lines of thought in the field of language assessment. You have a sense of major current trends in language assessment, especially the present focus on ... anguishing ordeals into challenging and intrinsically motivating learning experiences. By now, certain foundational principles have entered your working vocabulary: practicality, reliability, validity, authenticity, and washback. You should now also possess a few tools with which you can evaluate the effectiveness of an existing classroom test.
In this chapter, you will draw on those foundations and tools to begin the process of designing tests or revising existing tests. As always, the primary focus of this book is on classroom-based assessment, because that's the down-to-earth context in which you are regularly involved. We'll deal directly with issues in large-scale, standardized testing in the next chapter. So for now, for classroom purposes, let's start the process by asking some critical questions.
1. What is the purpose of the test? Why are you creating this test, or why was it created by, say, a textbook writer? What is its significance relative to your course (for example, to evaluate overall proficiency or place a student in a course)? How important is the test compared to other student performance? What will its impact be on you and your students before and after the assessment? Once you have established the major purpose of a test, it then becomes easier to specify its objectives.
2. What are the objectives of the test? What exactly are you trying to find out? Establishing appropriate objectives involves a number of issues, from relatively simple ones about forms and functions covered in a course unit to much more complex ones about constructs to be represented on the test. Included here are decisions about what language abilities are to be assessed.
3. How will the test specifications reflect both the purpose and the objectives? To design or evaluate a test, you must make sure that the test has a structure that logically follows from the unit or lesson it is testing. The class objectives should be present in the test through appropriate task types and weights, a logical sequence, and a variety of tasks.
4. How will the test item types (tasks) be selected and the separate items arranged? The tasks need to be practical (as defined in Chapter 2). To have content validity, they should also mirror tasks of the course, lesson, or segment. They should also be authentic, with a progression biased for best performance. Finally, the tasks must be ones that can be evaluated reliably by the teacher or scorer.
5. In administering the test, what details should I attend to in order to help students achieve optimal performance? Once the test has been created and is ready to administer, students need to feel well prepared for their performance. An otherwise effective, valid test might fail to reach its goal if the conditions for test-taking are inadequately established. How will you reduce unnecessary anxiety in students, raise their confidence, and help them view the test as an opportunity to learn?
6. What kind of scoring, grading, and/or feedback is expected? The appropriate form of feedback on tests will vary, depending on their purpose. For every test, the way results are reported is an important consideration. Under some circumstances a letter grade or a holistic score may be appropriate; other circumstances may require that a teacher offer substantive washback to the learner.
These six questions should form the basis of your approach to designing, administering, and making maximum use of tests in your classroom.
FOUR ASSESSMENT SCENARIOS
For the purposes of making practical applications in this chapter, we will consider four scenarios as we proceed through the six steps for designing an assessment. These common classroom contexts should enable you to identify with real-world assessment situations.
Scenario 1: Reading Quiz
“Te fist contet ian intermedia
English chs in Bo. The students,
for secondaiy school seuded
been asigned a two-page shor story to
The quiz will (a) give students a sense of how well they understood the story and (b) act as a starting point for a teacher-led discussion on each of the items. Results of the quiz will not be recorded in the teacher's record book.
Scenario 2: Grammar Unit Test
This test comes at the end of a three-week unit in a grammar-focus course at a high-beginning (Level 2) class in an adult school in the United States. Students have completed Level 1 or have been placed into Level 2 by a placement test. All the students are simultaneously taking two integrated-skills classes (listening/speaking and reading/writing), and the grammar class serves to reinforce the grammatical forms that have been encountered in the other two classes.
The grammar unit has covered verb tenses. The curriculum specifies that the 45-minute test is to be divided into three sections: multiple-choice items, fill-in-the-blank (cloze) items, and a grammar editing task (where students must detect errors in several written paragraphs). The test will be handed in, graded by the teacher, and returned to students a few days later.
Scenario 3: Midterm Essay
In a writing course in a university in Thailand, students at the advanced level have been working for half a semester on writing essays, mostly narrative and description essays. In the second half of the course, students will move on to cause, argument, and opinion essays.
The midterm essay is an opportunity for students to demonstrate their ability to write a coherent essay with relatively few grammatical and rhetorical errors. The essay will be given in class during a 90-minute class period. The students do not know the topic ahead of time but are allowed to use a bilingual dictionary to look up words or spelling. The curriculum specifies quality of writing over quantity. The teacher will read essays over the weekend and make comments but not give a grade or a score. During the next week, there will be peer conferences with the goal of each student to revise his or her essay, followed by a student-teacher conference after revision has been turned in.
Scenario 4: Listening/Speaking Final Exam
Children in the fifth grade of a private school in Japan have been taking a 15-week course in oral communication skills (listening and speaking). This is their third year of English courses (they began in the third grade), and by now they are able to comprehend simple English sentences, distinguish many phonemic contrasts, orally produce (repeat) sentences that have been modeled for them, and carry on very rudimentary oral exchanges, mostly using words and phrases the
See neonate ther won ecb vey vn Se
matical accuracy is perhaps paszble with the mininal amouzt
aia pass of language they
The final exam for the class, according to the prescribed curriculum, consists of (a) listening to an audio program with most of the course's grammatical and phonological elements represented in a variety of stimulus types and responding to written multiple-choice items (the suggested time limit for this listening portion is 20 minutes), followed by (b) a three-minute oral interview, one-on-one, with the teacher. (Because it is a private school, the class size is quite small [15 students], which gives the teacher time to complete oral interviews within the allotted time limit for the final examination.) While students are going one by one into the oral interview in a separate room, others are doing Internet-based English activities and games in the school's computer lab. Because this is a final examination, the only anticipated follow-up to the two-part exam is a score report by the teacher, which parents and students will see in a few weeks.
Keep these four assessment situations in mind as you read this chapter. All four will be referred to as we look at the five steps in designing an effective test.
DETERMINING THE PURPOSE OF A TEST
You may think that every test you devise must be a wonderfully innovative instrument that impresses your colleagues and students alike. Not so. First, new and innovative testing formats take a lot of effort to design and a long time to refine through trial and error. Second, traditional testing techniques can, with a little creativity, conform to the spirit of an interactive, communicative language curriculum. Your best course of action as a new teacher is to work within the guidelines of accepted, known, traditional testing techniques. Slowly, with experience, you can become bolder in your test design. In that spirit, let's consider some practical steps in constructing classroom tests.
The first and perhaps most important step in designing any sort of classroom assessment (or in determining the appropriateness of an existing test) is to step back and consider the overall purpose of the exercise that your students are about to perform. The purpose of an assessment is what Bachman and Palmer (1996, pp. 17-19) refer to as test usefulness, or simply to what use you will put an assessment. Consider the checklist on the next page for determining purpose and usefulness.
PURPOSE AND USEFULNESS CHECKLIST
1. Do I need to administer a test at this point in my course? If so, what purpose will it serve the students and/or me?
2. What is its significance relative to my course?
3. Is it simply an expected way to mark the end of a lesson, unit, or period of time?
4. How important is it compared to other student performance?
5. Do I want to use results to determine if my students have met certain predetermined curricular standards?
6. Do I genuinely want students to be recipients of beneficial washback?
7. Will I use the results as a means to allocate my own pedagogical efforts in the days or weeks to follow?
8. What will its impact be on what I do, and what students do, before and after the test?
Now look back at each of the four assessment scenarios described on pages 54-55 and think about the purpose of each. Before reading on, do some personal brainstorming (see Exercise 1 at the end of this chapter) on just how the eight questions in the checklist will be answered for each scenario.
Reading Quiz. To start your thinking process, let's look at the purpose of the first scenario, the reading quiz. The quiz is designed to be an instructional tool to guide classroom discussion for one classroom period. Its significance is minor but not trivial when viewed against the backdrop of the whole course. Because it is a surprise test and a tool for teaching and self-assessment, the results will justifiably not be recorded, and so one student's performance compared to others is irrelevant. It is entirely formative in nature, with the almost exclusive purpose of providing beneficial washback. Forcing students to think independently about the reading passage allows them to see areas of strength and weakness in their comprehension skills.
Can you now consider the other three scenarios and think about the overall purpose of each one, given the context described and the information given? Your understanding of the purpose of an assessment procedure governs, to a great extent, the next steps you take in identifying clear objectives, designing test specifications, constructing tasks, and determining scoring and reporting criteria.
DESIGNING CLEAR, UNAMBIGUOUS OBJECTIVES
In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday in the third week of the course; after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do" based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.
Remember that every curriculum should have appropriately framed, assessable objectives, that is, objectives that are stated in terms of overt performance by students. Thus an objective that states "Students will learn tag questions" or simply names the grammatical focus of "tag questions" is not testable. You don't know whether students should be able to understand them in spoken or written language, or whether they should be able to produce them orally or in writing. Nor do you know in what context (a conversation? an essay? an academic lecture?) those linguistic forms should be used. Your first task in designing a test, then, is to determine appropriate objectives, stated as explicitly as possible.
Grammar Unit Test. If you're lucky, someone will have already stated objectives clearly in performance terms. If you're less fortunate, you may have to go back through a unit and formulate them yourself. Let's say you find yourself teaching the grammar focus class described in Scenario 2, and the objectives given by the course guide simply specify the following for the unit on verb tenses:
Students will understand and produce the following verb tenses in appropriate oral and written contexts:
1. simple present (review)
2. present continuous
3. simple past
4. present perfect
Elsewhere in the curriculum, "appropriate contexts" are described as a continuation of the material introduced and practiced in the other two (listening/speaking and reading/writing) classes. So you're left with a sketchy but workable set of objectives on which to base your unit test. You will certainly need to flesh these out in more detail before you can be satisfied that you have clear, assessable objectives.
Where do you begin? In this grammar course, students equally use all four skills as they work with the grammar-focus structures. So to achieve content validity, your objectives should reflect all four modes of performance and sample all four verb tenses. Here is a possible set of objectives for you to work from:
Comprehension
Students will recognize and understand, in contexts of oral and written discourse, the following verb tenses:
1. simple present
2. present continuous
3. simple past
4. present perfect
Production
Students will accurately produce, orally and in writing, the following verb tenses:
1. simple present
2. present continuous
3. simple past
4. present perfect
Although these objectives may seem a bit overstated, it's usually helpful to articulate all the possible elements of both comprehension and production to give you an instant checklist for your test specifications (see next section). Notice that each objective is stated in terms of the performance elicited and the target linguistic domain.
oral or written response, and likewise the written elicitations. Granted, not all of the response modes correspond to all of the elicitation modes. For example, it is unlikely that a prompt of minimal pairs would be matched with a "yes/no" response, nor would a monologue as a prompt elicit spelling a word as a response. A modicum of intuition will eliminate these non sequiturs.
In Scenario 4, the fifth-grade English class in Japan, the curriculum dictates a listening section of 20 minutes and a three-minute oral interview. This may be a tall order for a final examination that ostensibly covers a semester's work in oral communication skills, but we'll begin with the course objectives. In shortened form, those objectives are as follows:
As you design your final exam, it's important to consider the age of the students. Fifth graders are approximately 10 years old, and at this age explicit focus on form is appropriate as a part of the curriculum. So your objectives, stated on the previous page, imply use of the forms indicated as well as their explicit detection. A great deal of the instruction throughout the year has consisted of auditory input from Internet-based activities, DVDs, and a CD supplied with the textbook for the course. Activities range from games to repetition drills, and most oral production is rehearsed.
Listening Comprehension Section. Because of the constraints of your curriculum, the listening part of the final exam must take no more than 20 minutes, as already noted. The students have become accustomed to taking multiple-choice tests and quizzes in their classwork. So for reasons of practicality and impact, you decide to design a multiple-choice listening comprehension test with three different tasks. Your school has the latest computer technology available, so you can make a good-quality audio recording using your voice and that of one other person (a colleague in the school who is a native speaker of Japanese but has excellent oral skills in English). Here's the format you decide to use:
Listening comprehension format
Test method: Audio prompts, multiple-choice response
Specification: Each item uses familiar, rehearsed content.
Part 1. Minimal pairs in words and sentences (10 items: 5 minutes)
Part 2. Vocabulary comprehension of objects, clothing, and classroom items (10 items: 7 minutes)
Part 3. A mix of negatives, contractions, and prepositions of location (10 items: 8 minutes)
Scoring: Record the number of correct responses out of 30.
This informal, classroom-oriented outline gives you an indication of
• the implied elicitation and response formats for items
• the objectives you will cover
• the number of items in each section
• the time to be allocated for each
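If you keep an outline like this in a file or a short script, a quick consistency check can confirm that the parts add up before you start writing items. The sketch below is only an illustration in Python: the dictionary layout and the name `check_spec` are hypothetical, and the item counts and minutes per part are assumptions read off the outline (three parts of 10 items each, within the 20-minute limit set by the curriculum).

```python
# Hypothetical encoding of the listening-test outline. The item counts and
# minutes per part are assumptions read off the outline, not official figures.
SPEC = {
    "Part 1: minimal pairs": {"items": 10, "minutes": 5},
    "Part 2: vocabulary": {"items": 10, "minutes": 7},
    "Part 3: negatives, contractions, prepositions": {"items": 10, "minutes": 8},
}
TIME_LIMIT_MINUTES = 20  # ceiling imposed by the curriculum
MAX_SCORE = 30           # scoring: correct responses out of 30


def check_spec(spec, time_limit, max_score):
    """Return (total_items, total_minutes); raise ValueError if the outline is inconsistent."""
    total_items = sum(part["items"] for part in spec.values())
    total_minutes = sum(part["minutes"] for part in spec.values())
    if total_items != max_score:
        raise ValueError(f"{total_items} items planned, but scoring assumes {max_score}")
    if total_minutes > time_limit:
        raise ValueError(f"{total_minutes} minutes planned, but the limit is {time_limit}")
    return total_items, total_minutes


print(check_spec(SPEC, TIME_LIMIT_MINUTES, MAX_SCORE))  # (30, 20)
```

The check passes only when the item total matches the scoring line and the time total stays within the limit, so lengthening one part without trimming another is caught at once.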
Notice that a number of the possible listening objectives are not directly tested. This decision may be based on the time you devoted to these objectives, the importance you place on each objective, and of course the finite number of minutes available to administer the test. Is this an appropriate decision?
The final item in your test outline specifies scoring. For the listening section, scoring is simple. For the oral interview, it becomes considerably more complex, as we shall see. We'll look again at scoring, grading, and feedback later in this chapter and then much more comprehensively in Chapter 12.
What will those multiple-choice listening comprehension items look like? How will you design appropriate stems (see pages 68-70 for a description of multiple-choice question design), each with a key, the correct response, and multiple distractors? Can you ensure enough authenticity and also provide some variety for your students? We'll take up these questions in the next main section of this chapter. Meanwhile, we turn our attention to the oral production section.
Oral Production Section. Your curriculum allows you to design your own oral interview protocol, and so you draft questions to conform to the accepted pattern of oral interviews (see Chapter 8 for information on constructing oral interviews). You have decided to conduct the interviews one-on-one because you have enough time with a small class to do so. You begin and end with nonscored items (warm-up and wind-down) designed to set students at ease and then sandwich between them items intended to test the objectives (level check) and a little beyond (probe).
Because these are 10-year-old children, you have decided, on the advice of other teachers, to make use of a file of pictures for stimuli. They have responded well to pictures in previous one-on-one situations. Here is the outline you decide to follow:
Oral interview format
A. Warm-up: greetings and setting the child at ease
B. Level-check questions
1. Identifying objects, singular and plural
2. Present progressive tense
3. Adjectives (big, little)
C. Probe questions
1. Talk about this picture.
2. Ask me a question.
D. Wind-down: comments and reassurance
You're now ready to draft actual test items, with matching pictures and expected responses. Here's what you come up with as a first draft:
Hi, [name]. How are you?
(Give a compliment to the child. They have practiced giving compliments in class.)
(Briefly explain, in Japanese, the procedure for the interview. Reassure the child.)
What color is it?
This is a picture of a boy. (The picture shows a boy walking.) What is the boy doing?
(Repeat this procedure with three other pictures.)
(Show a picture of two girls, one big and one small.) Are they the same size? (They have practiced simple comparisons.)
(Repeat this procedure with other pictures.)
(Show the child a picture of a family eating a meal at home.)
(Show a picture of children playing in a park.)
Good job! Now can you ask me a question?
(This has been practiced in class many times.)
As you continue to put yourself in the place of the teacher in this school in Japan, are there any changes you would make to your protocol for the oral interview?
The final step in the process is to devise a method of scoring. You need to make this as simple and straightforward as possible, so let's say you decide to give two scores for each separate question, one for pronunciation and one for grammar. Because the course has focused a good deal on grammatical and phonological form, and because students expect such an assessment, you justify the exclusion of such elements as content and social skills. Here's what you come up with:
• virtually perfect pronunciation/grammar: 2
• some error(s) in the response: 1
• wrong or no response: 0
You prepare a card for each student with your list of questions. Beside each question are the numbers 2, 1, and 0. You circle the numbers as the interview proceeds. When all interviews are complete, you can add up the numbers for a total score. Because it's a final examination and the school's policy doesn't offer a means to give more than a score report to the child (and the parent), your numerical scores appear to be sufficient.
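The card-and-circle procedure just described amounts to simple bookkeeping, and a short Python sketch can make the arithmetic explicit. The class name `ScoreCard` and its methods are hypothetical, invented here for illustration; the only parts taken from the text are the two marks per question (pronunciation and grammar) and the 2/1/0 scale.

```python
from dataclasses import dataclass, field

# 2 = virtually perfect, 1 = some error(s), 0 = wrong or no response
VALID_MARKS = (0, 1, 2)


@dataclass
class ScoreCard:
    """One student's card: a (pronunciation, grammar) pair per scored question."""
    student: str
    marks: list = field(default_factory=list)

    def circle(self, pronunciation, grammar):
        """Record the two numbers circled for one question."""
        for mark in (pronunciation, grammar):
            if mark not in VALID_MARKS:
                raise ValueError(f"mark must be one of {VALID_MARKS}, got {mark}")
        self.marks.append((pronunciation, grammar))

    def total(self):
        """Sum of every circled number on the card."""
        return sum(p + g for p, g in self.marks)

    def maximum(self):
        """Highest possible total: two marks of 2 for each question."""
        return 4 * len(self.marks)


card = ScoreCard("Student A")  # placeholder name
card.circle(2, 1)
card.circle(1, 1)
card.circle(0, 2)
print(card.total(), "out of", card.maximum())  # 7 out of 12
```

Only the scored (level-check and probe) questions would be entered; the warm-up and wind-down items are nonscored and never reach the card.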
DESIGNING MULTIPLE-CHOICE ITEMS
Soon we'll return to the specific task of designing the multiple-choice listening comprehension test for the Japanese fifth graders, but we first need to turn our attention to some important principles and tips for designing multiple-choice tests. Multiple-choice items, which may on the surface appear to be simple items to construct, are actually very difficult to design correctly. Hughes (2003, pp. 76-78) cautions against a number of weaknesses of multiple-choice items:
• The technique tests only recognition knowledge.
• Guessing may have a considerable effect on test scores.
• The technique severely restricts what can be tested.
• It is very difficult to write successful items.
• Beneficial washback may be minimal.
• Cheating may be facilitated.
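To get a feel for the second weakness in the list above, it can help to work out what blind guessing alone would yield. The following Python sketch is an illustration under simplifying assumptions that are not part of Hughes's discussion: every item has the same number of options, and the test-taker guesses at random on every item. The function names are hypothetical.

```python
from math import comb


def expected_chance_score(n_items, n_options):
    """Average number of items answered correctly by guessing alone."""
    return n_items / n_options


def prob_at_least(n_items, n_options, threshold):
    """Binomial probability of guessing at least `threshold` items correctly."""
    p = 1 / n_options
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(threshold, n_items + 1)
    )


print(expected_chance_score(30, 4))  # 7.5
print(prob_at_least(30, 4, 15))      # chance of reaching half marks on guesses alone
```

On a 30-item test with four options per item, then, a quarter of the marks can be expected from guessing alone, which is one reason raw scores near that level say little about what a student knows.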
The two principles that stand out in support of multiple-choice formats are, of course, practicality and reliability. With their predetermined correct responses and time-saving scoring procedures, multiple-choice items offer overworked teachers the tempting possibility of an easy and consistent process of scoring and grading. But is the preparation phase worth the effort? Sometimes it is, but you might spend even more time designing such items than you save in grading the test. Of course, if your objective is to design a large-scale standardized test for repeated administrations, then a multiple-choice format does indeed become viable.
As you face the task of designing the listening comprehension section of the fifth-grade English exam, let's first consider some important terminology:
1. Multiple-choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses rather than creating a response (the latter is commonly called a supply type of response). Other receptive item types include true/false questions and matching lists. (In the discussion here, the guidelines apply primarily to multiple-choice item types and not necessarily to other receptive types.)
2. Every multiple-choice item has a stem (the "body" of the item that presents a stimulus) and several (usually between three and five) options or alternatives to choose from.
3. One of those options, the key, is the correct response, whereas the others serve as distractors.
Because there will be occasions when multiple-choice items are appropriate, consider the following four guidelines for designing multiple-choice items for both classroom-based and large-scale situations (adapted from Gronlund, 1998, pp. 60-75, and J. D. Brown, 2005, pp. 48-50).
1. Design each item to measure a single objective.
Consider the following item from a secondary school class in English at the intermediate level. The objective is wh-questions:
Test-takers hear: Where did George go after the party last night?
Test-takers read:
A. Yes, he did.
B. because he was tired
C. to Elaine's place for another party
D. around eleven o'clock
Distractor A is designed to ascertain that the student knows the difference between an answer to a wh-question and a yes/no question. Distractors B and D, as well as the key item, C, test comprehension of the meaning of where as opposed to why and when. The objective has been directly addressed.
On the other hand, here's an item that was designed to test recognition of the correct word order of indirect questions:
Excuse me, do you know ______?
A. where is the post office
B. where the post office is
C. where post office is
Distractor A is designed to lure students who don't know how to frame indirect questions and therefore serves as an efficient distractor. But what does distractor C actually measure? In fact, the missing definite article (the) is what J. D. Brown (2005) calls an "unintentional clue" (p. 48), a flaw that could cause the test-taker to eliminate C automatically. In the process, no assessment has been made of indirect questions in this distractor. Can you think of a better distractor for C that would focus more clearly on the objective?
2. State both stem and options as simply and directly as possible.
We're sometimes tempted to make multiple-choice items too wordy. A good rule of thumb is to get directly to the point. Here's a negative example:
My eyesight has really been deteriorating lately. I wonder if I need glasses. I think I'd better go to the ______ to have my eyes checked.
A. pediatrician
B. dermatologist
C. optometrist
You might argue that the first two sentences of this item give it some authenticity and accomplish a bit of schema setting. But if you simply want a student to identify the type of medical professional who deals with eyesight issues, those sentences are superfluous. Moreover, by lengthening the stem, you have introduced a potentially confounding lexical item, deteriorate, that could distract the student unnecessarily.
Another rule of succinctness is to remove needless redundancy from your options. In the following item, "which were" is repeated in all three options. It should be placed in the stem to keep the item as succinct as possible.
We went to visit the temples, ______ fascinating.
A. which were beautiful
B. which were especially
C. which were holy
Then, because students have also been reading English words and can recognize them well, the last five items use visual cues, as follows:
Test-takers see:
Aba Brat
can Bo
Test-takers hear:
1. The abat one fos.
8, Vice (How ld ha? Voie Ste’ it
For Part 2, you again choose to use picture-cued items, this time for six items. Pictures allow you to depict objects easily and unambiguously. You have chosen to have four multiple-choice options for each item. So two of those items look like this:
Test-takers see: [a set of four pictures for each item]
Test-takers hear:
1, (sepa sit
2. Thea a earintofeest
For the last four items, you choose to have students do a matching exercise, to test knowledge of objects in the classroom. The item, worth four points for all four objects, looks like this:
Test-takers see: [a picture of a classroom with lettered objects]
Example: A B C D E F G
7. A B C D E F G
8. A B C D E F G
9. A B C D E F G
10. A B C D E F G
Test-takers hear:
In the picture, each item is marked with a letter. For each word you hear, circle the correct letter that matches the word. An example has been done for you.
Example: pencil
(The letter has been circled because it points to the pencil.)
1. desk
8 toot
2. compet
1. cic
For Part 3, in which you are to test negatives, contractions, and prepositions of location, once again you have decided to rely on pictures, given the age of your test-takers and what they are accustomed to responding to in your classroom activities. Here's what two of those items look like:
Test-takers see: [pictures]
Test-takers hear:
1. The shoe's under the box.
2. The cat isn't on the chair.
As you can see, these items are quite traditional. The format lends itself to practicality and reliability, paving the way to quick, consistent scoring. The items are all very clearly formulated, within all the expected objectives of the course, and students are accustomed to such testing techniques, so various aspects of validity are accounted for. You might self-critically admit that the format of some of the items is contrived, thus lowering the level of authenticity. But your students are in quite a traditional educational system in which they will need familiarity with traditional test formats, so these items may help them, in a "friendly" way, to handle such assessments in the future. No washback is built into the system, so you have to be satisfied with what you hope was ample washback in the 15 weeks of classroom activity that led up to this day.
As you look over the items, are there some that need to be revised before you finalize them? In revising your draft, ask yourself the following questions:
Suggestions for revising your test:
1. Are the directions to each section absolutely clear?
2. Is there an example item for each section? If not, are the directions and format so familiar to students that they will easily understand the tasks they are being asked to perform?
3. Does each item measure a specified objective?
4. Is there a single correct answer for each question?
5. Is each item stated in clear, simple language?
6. Does each multiple-choice item have appropriate distractors; that is, are the wrong items clearly wrong and yet sufficiently "alluring" that they aren't ridiculously easy?
7. Is the difficulty of each item appropriate for your students?
8. Is the language of each item sufficiently authentic?
9. Is there a balance between easy and difficult items?
10. Do the sum of the items and the test as a whole adequately reflect the learning objectives?
Ideally you would try out all your tests on a sample of students not in your class before actually administering the test, but in our daily classroom teaching, such a tryout phase is almost impossible. Alternatively, you could enlist the aid of a colleague to look over your test or, better yet, take the test as a trial run. You must do what you can to bring to your students an instrument that is, to the best of your ability, practical and reliable.
In the final revision of your test, imagine that you're a student taking the test. Go through each set of directions and all items slowly and deliberately. Time yourself. (Often we underestimate the time students need to complete a test.) If the test should be shortened or lengthened, make the necessary adjustments. Make sure your test is neat and uncluttered on the page and that it is clear and unambiguous, reflecting all the care and precision you have put into its construction. If there is an audio component, as there is in the listening test for Japanese fifth graders, make sure that the script is clear, that your voice and any other voices are clear, and that the audio equipment is in working order before starting the test.
ADMINISTERING THE TEST
The moment has arrived. You have designed your test based on your carefully considered purposes, objectives, and specs. Could anything now go awry in these best-laid plans? Of course, you know the answer is yes. So consider some of the measures you can take to ensure that the actual administration of the test accomplishes everything you want it to. Here's a list of pointers:
Pre-test considerations (the day before the in-class essay)
1. Provide appropriate pre-test information on
a. the conditions for the test (time limits, no portable electronics, breaks, etc.)
b. materials that students should bring with them
c. the kinds of items (item types) that will be on the test
d. suggestions of strategies for optimal performance
e. evaluation criteria (rubrics, show benchmark samples)
2. Offer a review of components of narrative and description essays.
3. Give students a chance to ask any questions, and provide responses.
Test administration details
4. Arrive early and see to it that the classroom conditions (lighting, temperature, a clock that all can see clearly, furniture arrangement, etc.) are conducive.
5. If audio or video or other technology is needed for administration, try everything out in advance.
6. Have extra paper, writing instruments, or other response materials on hand.
7. Start on time.
8. Distribute the test itself.
9. Remain quietly seated at the teacher's desk, available for questions from students as they proceed.
10. For a timed test, warn students when time is about to run out, and encourage their completion of their work.
This is not an exhaustive list, as it does not cover all possible testing situations, but it should serve as a starting point for you as you attempt to cover all the details involved in test administration.
SCORING, GRADING, AND GIVING FEEDBACK
Scoring
As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. In the four scenarios that we have been discussing in this chapter, scoring is (in mathematical terms) a factor in only two of them, the grammar unit test and the final listening/speaking examination for Japanese fifth graders. Let's look at each.
The grammar test has three sections, each with a number of scorable items. Logically then, you could place equal weight on each section and mathematically calculate a score. However, this may not reflect your own conception of the importance of each task type, and so one decision you might make is to place more weight on, perhaps, the grammar editing section. Your argument might simply be that you feel those tasks represent more general or integrative language ability and therefore are deserving of greater weight.
The listening/speaking final exam for Japanese fifth graders presents a potentially complex challenge, because school policy requires one grade for the final examination to be reported for each student. It's your job to determine the relative weight of the listening section and the oral interview. You could argue that the oral interview involves both comprehension and production and therefore give it more weight, or you might simply consider them equal components. To make a final decision on this issue, you would need to know more about the particular context than has been described here.
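Whichever weighting you settle on, the arithmetic behind a single reported score is the same: convert each section to a proportion, multiply by its weight, and divide by the sum of the weights. The Python sketch below illustrates this under stated assumptions; the function name `composite_score`, the section maximums, and the sample results are all invented for the example and do not come from the scenario itself.

```python
def composite_score(sections, weights):
    """Combine section results into one percentage.

    sections: name -> (raw_score, maximum_score)
    weights:  name -> relative weight (any positive numbers)
    """
    if set(sections) != set(weights):
        raise ValueError("every section needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        weights[name] * raw / maximum
        for name, (raw, maximum) in sections.items()
    )
    return 100 * weighted / total_weight


# Invented results for one child: 24 of 30 on listening, 9 of 12 on the interview.
results = {"listening": (24, 30), "oral interview": (9, 12)}

equal = composite_score(results, {"listening": 1, "oral interview": 1})
oral_heavy = composite_score(results, {"listening": 1, "oral interview": 2})
print(round(equal, 1), round(oral_heavy, 1))  # 77.5 76.7
```

Running the same results through two weightings shows how much, or how little, the decision changes the reported score, which can inform the judgment call the text describes.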
As a classroom teacher, after administering a test once, you may decide to revise your scoring plan for the course the next time you teach it. At that point you'll have valuable information about how easy or difficult a test was, about whether the time limit was reasonable, about your students' affective reaction to it, and about their general performance. Finally, you will have an intuitive judgment about whether the test correctly assessed your students. Take note of these impressions, even though they are not empirical data, and use them for revising a test in another term.
Grading
Your first thought might be that assigning grades to student performance on this test would be easy: just give an A for 90 to 100 percent, a B for 80 to 89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 12 is devoted to the topic. How you assign letter grades to this test is a product of
• the country, culture, and context of the English classroom
• institutional expectations (most of them unwritten)
• explicit and implicit definitions of grades that you have set forth
• the relationship you have established with this class
• student expectations that have been engendered in previous tests and quizzes in the class
For the time being, then, we will set aside issues that deal with grading the four scenarios in particular in favor of the comprehensive treatment of grading in Chapter 12.
Giving Feedback
A section on scoring and grading would not be complete without some consideration of the forms in which you can offer feedback to your students, feedback that you want to become beneficial washback. Consider just a few of the many possible manifestations of feedback associated with tests (this is not an exhaustive list):
1. In general, scoring/grading
a. a letter grade
b. a total score
c. subscores (e.g., of separate skills or sections of a test)
2. For responses to listening and reading items
a. indication of correct/incorrect responses
b. diagnostic set of scores (e.g., scores on certain grammatical categories)
c. checklist of areas needing work and strategic options
3. For oral production tests
a. scores for each element being rated
b. checklist of areas needing work and strategic options
c. oral feedback after performance
d. post-interview conference to go over results
4. On written essays
a. scores for each element being rated
b. checklist of areas needing work and strategic options
c. marginal and end-of-essay comments, suggestions
d. post-test conference to go over work
5. Additional/alternative feedback for a test
a. on all or selected parts of a test, peer conferences on results
b. whole-class discussion of results of the test
c. individual conferences with each student to review the complete test
d. self-assessment in various manifestations
In the four example scenarios that we have been referring to in this chapter, there are a multitude of options for giving feedback:
Reading Quiz. The primary if not exclusive purpose of the reading quiz was to prompt self-assessment and class discussion. With no scoring or grading, feedback was to some degree self-induced through the knowledge of what questions one got right or wrong, but more extensively in the form of whole-class discussion of the reading passage.
Grammar Unit Test. The most salient form of feedback is in total score and subscores, but perhaps the most useful feedback could come in the form of diagnostic scores, a checklist of areas needing work, and class discussion of the results of the test.
Midterm Essay. All the types of feedback listed are feasible and potentially useful, but perhaps the kind of feedback that would most contribute to beneficial washback would be the subsequent peer conferences and individual conferences between student and teacher.
Listening/Speaking Final Exam. Eventually the children in this class will receive a letter grade for the course, which may include scores and subscores of the final examination, with little else possible within the system. One might venture to say that the teacher could give some minimal oral feedback after the oral interview.
In this chapter, guidelines and tools were provided to enable you to address the five questions posed at the outset: (a) how to determine the purpose of the test, (b) how to state objectives, (c) how to design test specifications, (d) how to design or select test tasks, including evaluating those tasks with item indices, and (e) how to begin to address scoring and grading. This five-part template, shown in Figure 3.2, can serve as a pattern as you design classroom tests.
Figure 3.2. Steps to designing an effective test
[Flowchart: Determine purpose/usefulness → State objectives → Draw up test specifications → Select tasks and item types and arrange them systematically → In administering the test, help students to achieve optimal performance → Construct a system of scoring/grading and providing student feedback]
In the next two chapters (Chapters 4 and 5), you will explore the extent to which any of these principles and guidelines apply to standards-based (and standardized) large-scale testing. You will then consider an array of possibilities of what has come to be called "alternative" assessment (Chapter 6), only because portfolios, conferences, journals, and self- and peer-assessments are not always comfortably categorized among more traditional forms of assessment. Subsequent chapters (7 through 10) will lead you through a wide selection of test tasks in the separate skills of listening, speaking, reading, and writing, as well as provide a sense of how testing for form-focused objectives fits into the picture (Chapter 11). Finally, in Chapter 12 you will take a long, hard look at the dilemmas of grading students.
EXERCISES
[Note: (I) Individual work; (G) Group or pair work; (C) Whole-class discussion.]
1. (G) The first issue discussed in this chapter was determining the purpose of a proposed test, and a checklist was offered. In small groups, have one or two members share an experience they had either taking or giving a test, then systematically discuss the probable answers to each item on the checklist. The group may be able to help solve certain problems or dilemmas that came up. Report back to the class any notable surprises or questions that were unresolved.
2. (G) Look again at the discussion of objectives (pages 56-58). You will note that we have not explicitly discussed objectives for Scenarios 1, 3, and 4. What would those objectives look like? With a partner or in a small group, see if you can jot down objectives for all three scenarios. Share your thoughts with the rest of the class.
3. (I/C) Figure 3.1 depicts various modes of elicitation and response. Are there other modes of elicitation and response that could be included in such a chart? Justify your additions with an example of each.
4. (G) With a partner or in a small group, look at the items in each part of the listening comprehension test for Japanese fifth graders (page 64) and design several more items for each part. Share your results with another pair (as a group of four) and talk about strengths and weaknesses of items.
5. (G) In the oral interview for Japanese fifth graders discussed earlier in this chapter (pages 65-67), can you justify the decisions made? Would you suggest any changes? Discuss with a partner.
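6. (G) This exercise could be a challenge, especially to those who have never designed test specifications before, so plenty of time and assistance may be necessary. Look at the following nine objectives from a low-intermediate integrated-skills course. In four different groups, craft a set of test specs and sample test items for the objectives your group has been assigned. Report your findings to the rest of the class.
Form-focused objectives (listening and speaking)
Students will
1. recognize and produce tag questions, with the correct grammatical form and final intonation pattern, in simple social conversations
2. recognize and produce wh-information questions with correct final intonation pattern
Communication skills (speaking)
Students will
3. state completed actions and events in a social conversation
4. ask for confirmation in a social conversation
5. give opinions about an event in a social conversation
6. produce language with contextually appropriate intonation, stress, and rhythm
Reading skills (simple essay or story)
Students will
7. recognize irregular past tense of selected verbs in a story or essay
Writing skills (simple essay or story)
Students will
8. write a one-paragraph story about a simple event in the past
9. use conjunctions so and because in a statement of opinion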
17. (@) Selecta lagtage cas in your immediate eairoament forthe folowing
projec In sal groups, design an achievement et fora eaonably hort
Segment ofthe course (perhaps 2 lesson or unit for which there is 80 current
tes or for which the present tests inadequat) Follow the guidlines ia this
cuptr for developing an assessment procedure Wea tis completed, pre-
se yur astssmene project othe restof the cbs.
8. (C) If possible, locate an existing, recently used standardized multiple-choice test for which there is accessible data on student performance. Calculate the item facility (IF) and item discrimination (ID) index for selected items. If there are no data for an existing test, select some items on the test and analyze the structure of those items in a distractor analysis to determine if they have (a) any bad distractors, (b) any bad stems, or (c) more than one potentially correct answer.
9. (I/C) On page [...], a number of options are listed for giving feedback to students on assessments. Review the practicality of each and determine the extent to which practicality (principally, more time expended) is justifiably sacrificed in order to offer better washback to learners.
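
For readers attempting exercise 8 with real score data, the two indices can be computed in a few lines. The sketch below is a minimal illustration, not a prescribed procedure. It assumes the commonly used definitions: IF is the proportion of test-takers who answered an item correctly, and ID is the number correct in the high-scoring group minus the number correct in the low-scoring group, divided by half the combined size of the two groups. The function names and the response data are hypothetical.

```python
from typing import Sequence


def item_facility(responses: Sequence[int]) -> float:
    """Proportion of test-takers who answered the item correctly.

    Each response is 1 (correct) or 0 (incorrect).
    """
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def item_discrimination(high_group: Sequence[int], low_group: Sequence[int]) -> float:
    """(high-group correct - low-group correct) / (half of both groups combined).

    Assumes the two comparison groups were already formed from total test scores.
    """
    total = len(high_group) + len(low_group)
    if total == 0:
        raise ValueError("both groups are empty")
    return (sum(high_group) - sum(low_group)) / (0.5 * total)


# Hypothetical responses to one item: 10 high scorers and 10 low scorers.
high = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # 7 of 10 correct
low = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 2 of 10 correct

print(item_facility(list(high) + list(low)))  # 9 correct out of 20
print(item_discrimination(high, low))         # (7 - 2) / 10
```

With these invented responses, the item is answered correctly by 9 of 20 test-takers, and the high group outperforms the low group by 5 correct answers out of a possible 10. How to interpret such values for a given test is a judgment the chapter leaves to the test designer.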
FOR YOUR FURTHER READING
Bachman, Lyle F., & Palmer, Adrian S. (1996). Language testing in practice. New York: Oxford University Press.
   Bachman and Palmer's book remains a standard in the field of language assessment. More practically oriented than Bachman's (1990) seminal theoretical exposition on language testing, this one is aimed at providing a comprehensive set of tools for the development of language tests. It is difficult reading for the beginning-level graduate student or novice teacher, but others will appreciate its thoroughness, theoretical soundness, and practical examples.
Gronlund, Norman E., & Waugh, C. Keith. (200). Assessment of student achievement (9th ed.). Boston: Allyn & Bacon.
   This widely used general manual of testing across educational subject areas provides useful information for language assessment. In particular, Chapters 5, 6, 7, and 8 describe detailed steps for designing tests and writing multiple-choice, true-false, and short-answer items.
Brown, James Dean. (2005). Testing in language programs (2nd ed.). New York: McGraw-Hill.
   Chapters 3 and 4 of this language-testing manual offer some further information on developing tests and test items, including formulas for calculating item facility and item discrimination.
OBJECTIVES: After reading this chapter, you will be able to
• understand the crucial role of standards in educational instruction and assessment, especially in standardized testing
• examine a set of standards for a specified age, level, and context and apply them to contexts of your own
• analyze the purpose, advantages, and disadvantages of standards-based assessment
• apply principles of standardization to the construction of teacher-based standards
• appreciate the two-edged sword of large-scale standards-based testing: its social, political, and ideological consequences
• be prepared to take action in your own teaching and assessing to ensure fairness and openness for your students
Throughout history, people have been tested to prove their capabilities or qualifications. Some of the earliest formal examinations or tests have been traced back almost 2,000 years to the Han Dynasty in China, where they were used to select the highest officials in the country (Cheng, 2008). Even earlier, in the Bible (Judges 12:5-6), we have a recorded instance of the use of the so-called "shibboleth test," by means of which two ethnically and linguistically different groups of people were distinguished. When an Ephraimite was ordered to pronounce the word "shibboleth," his language caused him to say "sibboleth" with an /s/, as opposed to the Gileadites' pronunciation that used a /ʃ/. The consequences were dire: Ephraimites were thus exposed and killed. In another example, before World War II, the Australian government used language tests as a method to keep out immigrants from other countries (T. McNamara, 2000). The government officer could select any language for the dictation test that the immigrant had to take.
Today, we are all still deeply affected by tests and examinations, especially high-stakes standardized tests. For almost a century, schools, universities, businesses, and governments have looked to standardized measures for economical, reliable, and valid assessment of those who would enter, continue in, or exit their institutions. Proponents of these large-scale instruments make strong claims for their usefulness when great numbers of people must be measured quickly and effectively. Those claims are well supported by reams of research data that comprise construct validations of their efficacy and the specification of standards or benchmarks that are to be incorporated into assessment instruments. We have become a world that abides by the results of standardized tests as if they were sacrosanct, having been blessed by research findings and by institutions.
In this chapter, we look at just what these standards are that underlie many large-scale standardized tests, where they come from, and whether they are always valid. Our purpose is to raise our awareness of standards-based assessment: assessments that are used to evaluate student academic achievement and show that students have reached certain performance levels or standards. With this backdrop, we will then turn, in Chapter 5, to the issues surrounding the standardized tests that such standards are intended to support.
THE ROLE OF STANDARDS IN STANDARDIZED TESTS
Ask nonlanguage specialists what standardized testing is and they are likely to describe a multiple-choice test, and then they will give you an example such as the SAT or GRE. By now you know that this is not a complete answer. A standardized test, among other things, presupposes certain standard objectives or performance criteria, now better known as standards (and also known as benchmarks), that are held constant across one form of the test to another. The standards that underlie standardized tests are usually a set of clearly defined competencies that apply to a course, a curriculum, a yearlong program, or even multiple-year objectives for, say, a K-12 program or secondary school graduation criteria. Standards-based assessment refers to procedures that have been specifically designed to test such competencies.
Where do these standards come from? Who designs them, and how are they incorporated into assessment instruments? The past 30 years have seen a mushrooming of efforts on the part of educational leaders worldwide to base the plethora of school-administered standardized tests on clearly specified criteria within each content area being measured. For example, most departments of education at the state level in the United States have now specified the appropriate standards (that is, criteria or objectives) for each grade level (kindergarten to grade 12) and each content area (math, language, sciences, arts).
[Table: California English language development standards for listening and speaking, beginning level (California Department of Education)]
Between 1999 and 2002, the California English Language Development Test (CELDT) was developed. The CELDT is a set of instruments designed to assess attainment of ELD standards across grade levels. (For reasons of test security, specifications for this test are not available to the public.) Llosa (2007) examined the extent to which the English Language Development (ELD) Classroom Assessment used in a large urban school district in California measures the same constructs as the CELDT and concluded that the evidence gathered via the ELD Classroom Assessment is consistent with that provided by the CELDT, the standardized measure. For more information on the CELDT, consult the appendix at the end of this book.
The process of administering a comprehensive, valid, and fair assessment of ELD students continues to be perfected. Stringent budgets within departments of education worldwide predispose many in decision-making positions to rely on traditional standardized tests for ELD assessment, but rays of hope lie in the exploration of more student-centered approaches to learner assessment. Stack and colleagues (2000), for example, reported on a portfolio assessment system in the San Francisco Unified School District called the Language and Literacy Assessment Rubric (LALAR), in which multiple forms of evidence of students' work are collected. Teachers observe students year-round and record their observations on scannable forms. The use of the LALAR system provides useful data on students' performance at all grade levels for oral production and for reading and writing performance in elementary and middle school grades (1-8). Further research is ongoing for high school levels (grades 9-12).
We can find similar standards-based assessments in other countries, such as Hong Kong, China, which recently moved from norm-referenced to standards-based assessments in the Hong Kong Certificate of Education Examination (Davison, 2009). In several Latin American countries, including Peru and Ecuador, standards-based assessment is now commonplace, with all the advantages and disadvantages that come with such educational reform (Godoy, B. Lena, & M. Nagdany, personal communication, 2008).
Advantages include common criteria nationwide for teachers to pursue in their curricula, whereas the most obvious disadvantage is the temptation to teach to the test. In the United States, a perennial complaint of school teachers is the potential loss of the "art" of teaching (the freedom of the teacher to emphasize what he or she judges to be important) and instead having to give equal attention to 193 different competencies in a course unit (Brown, personal communication, 2008). These shortcomings notwithstanding, standards in their best intent are of course designed to be anchors, aligning curriculum, instruction, and assessment.
CASAS and SCANS
At the higher levels of education (colleges, community colleges, adult schools, language schools, and workplace settings), standards-based assessment systems have also had an enormous impact. The Comprehensive Adult Student Assessment System (CASAS), for example, is a program designed to provide broadly based
assessments of ESL curricula across the United States. The system includes more than 80 standardized assessment instruments used to place learners in programs, diagnose learners' needs, monitor progress, and certify mastery of functional basic skills. CASAS assessment instruments are used to measure functional reading, writing, listening, and speaking skills, as well as higher-order thinking skills. CASAS scaled scores report learners' language ability levels in employment and adult life skills contexts. For a review of the CASAS test, see Gorman and Ernst (2004); some details are also provided in the appendix at the end of this book.
A similar set of standards compiled by the U.S. Department of Labor, now known as the Secretary's Commission on Achieving Necessary Skills (SCANS), outlines competencies necessary for language in the workplace. The competencies cover language functions in terms of standards:
• resources (allocating time, materials, staff, etc.)
• interpersonal skills (teamwork, customer service, etc.)
• information (processing, evaluating data, organizing files, etc.)
• systems (e.g., understanding social and organizational systems)
• technology use and application
These five competencies are acquired and maintained through training in the basics, like reading, writing, listening, and speaking; thinking skills such as reasoning and creative problem solving; and personal qualities, such as self-esteem and sociability. For more information on SCANS, see Berton ele, & Thompeon (200).
Teacher Standards
In addition to the movement to create standards for learning, an equally strong movement has emerged to design standards for teaching. Cloud (2001) noted that a student's "performance [on an assessment] depends on the quality of the instructional program provided, . . . which depends on the quality of the professional development" (p. 3) of teachers. Kuhlman (2001) emphasized the importance of teacher standards in three domains:
1. linguistics and language development
2. culture and the interrelationship between language and culture
3. planning and managing instruction
In the education of new teachers, the University of California (2008) advocates attention to six domains, one of which is the assessment of student learning, with emphasis on five factors:
1. establishing and communicating learning goals for all students
2. collecting and using multiple sources of information to assess student learning
3. involving and guiding all students in assessing their own learning
4. using the results of assessment to guide instruction
5. communicating with students, families, and other audiences about student progress
Professional teaching standards have also been the focus of several committees in the international association of TESOL (Gottlieb et al., 2006). Correspondingly, the Australian Council of TESOL Associations has developed standards for teachers and outlines the expectations of TESOL practitioners in relation to three orientations: working in a multicultural society, second language education, and the practice of TESOL (Garment & Antenucd, 2006).
How to assess whether teachers have met standards remains a complex issue. Can pedagogical expertise be assessed through a traditional standardized test? In the first of Kuhlman's (2001) domains (linguistics and language development), knowledge can perhaps be so evaluated, but the cultural and interactive characteristics of effective teaching cannot be so easily assessed in such a test. TESOL's standards committee advocates performance-based assessment of teachers for the following reasons:
• Teachers can demonstrate the standards in their teaching.
• Teaching can be assessed through what teachers do with their learners in their classrooms or virtual classrooms (their performance).
• This performance can be detailed in what are called "indicators": examples of evidence that the teacher can meet a part of a standard.
• The processes used to assess teachers need to draw on complex evidence of performance. In other words, indicators are more than simple "how to" statements.
• Performance-based assessment of the standards is an integrated system. It is neither a checklist nor a series of discrete assessments.
• Each assessment within the system has performance criteria against which the performance can be measured.
• Performance criteria identify to what extent the teacher meets the standard.
• Student learning is at the heart of the teacher's performance.
The standards-based approach to teaching and assessment presents the profession with many challenges. However thorny those issues are, the social consequences of this movement cannot be ignored, especially in terms of student assessment.
THE CONSEQUENCES OF STANDARDS-BASED AND STANDARDIZED TESTING
We have already stated that standards-based assessments are not without their share of problems. Although standards are implemented to improve education, a growing body of research has found a number of unintended or negative consequences of
standards-based assessments (Jones, Jones, & Hargrove, 2003; Linn, 2001; Wang, Beckett, & Brown, 2006). Some studies have found that standards-based tests can narrow the curriculum, pushing instruction toward lower-order rather than higher-order cognitive skills. Further, lower test scores result in grade retention, which appears not to improve educational achievement for those students who are held back (Darling-Hammond, 2004).
One of the strongest arguments made against standards-based assessments is the issue of accountability. That is, test results are used to hold school districts accountable for raising student academic achievement and identifying schools in need of improvement. A prime example in the United States is the aforementioned No Child Left Behind Act. Thus for many schools, government funding and support depend on student performance, which puts pressure not only on the student but also on teachers, school principals, and school district administrators. To avoid being penalized, schools and school districts sometimes push low-scoring students into special education or retain or hold them back a grade (thereby encouraging them to drop out) so that the school's average test scores will look better. In this case some argue (e.g., Darling-Hammond, 2004) that the standards-based assessments do not improve student achievement but rather prohibit some students from [...]
Another major challenge in standards-based education is the close link to the standardized testing industry that we all are very familiar with. Stories abound, some of them of blockbuster spy-novel proportions, on the high price test-takers are willing to pay to pass such tests, dramatically illustrating the gatekeeping role of those tests. We have already alluded to the widespread global acceptance of standardized tests as valid procedures for assessing individuals in many walks of life. Those tests bring with them certain consequences that fall under the category of consequential validity, or impact, discussed in Chapter 2.
Some of those consequences are positive. Standardized tests offer high levels of practicality and reliability and are often supported by impressive construct validation studies. They are therefore capable of accurately placing hundreds of thousands of test-takers onto a norm-referenced scale with high reliability ratios (most ranging between 80 and 90 percent). For decades, university admissions offices around the world have relied on the results of tests such as the Scholastic Aptitude Test (SAT®), the Graduate Record Exam (GRE®), and the Test of English as a Foreign Language (TOEFL® Test) to screen applicants. The respectably moderate correlations between these tests and academic performance are used to justify determining the students' educational future on the basis of one relatively inexpensive sit-down multiple-choice test. Thus has emerged the term high-stakes testing, based on the gatekeeping function that standardized tests perform.
Are the institutions that produce and utilize high-stakes standardized tests justified in their decisions? An impressive array of research would seem to say yes. Consider the fact that correlations between "old" TOEFL scores (before the advent of the computer-based TOEFL) and academic performance in the first year of college were impressively high (Henning & Cascallar, 1992), and more recent validation
[...] (Henning & Cascallar, 1992). Institutions commonly use such findings [...]
1. Should the educational and business worlds be satisfied with high but not perfect probabilities of accurately assessing test-takers on standardized instruments? In other words, what about the small minority who are not fairly assessed?
2. Regardless of construct validation studies and correlation statistics, should further types of performance be elicited to obtain a more comprehensive picture of the test-taker?
3. Does the proliferation of standardized tests throughout a young person's life give rise to test-driven curricula, diverting the attention of students from creative or personal interests and in-depth pursuits?
4. Is the standardized test industry in effect promoting a cultural, social, and political agenda that maintains existing power structures by assuring opportunity to an elite (wealthy) class of people (Ghobamy, 2000)?
Test Bias
It is no secret that standardized tests can involve a number of types of test bias. The issue of fairness is not new in language testing, but a recent surge of interest [...] shows a widespread concern over test bias. Some researchers argue that both test developers and test users must strive to create fair and unbiased tests as well as use tests in a way that is fair to all test-takers. That bias, they say, can come in many forms: language, culture, race, gender, and learning styles (Kohn, 2000; Medina & Neill, 1990; Ross & Okabe, 2006). The National Center for Fair and Open Testing, in its newsletter Fair Test, regularly reports dozens of instances of claims of test bias from teachers, parents, students, and legal consultants. Likewise, the American Psychological Association's Joint Committee on Testing Practices has published guidelines for professionals to promote tests that are fair to all test-takers "regardless of age, gender, disability, race, ethnicity, national origin, religion, sexual orientation, linguistic
background, or other personal characteristics" (Code of Fair Testing Practices in Education, 2004, p. 1). For example, reading selections in standardized tests may use a passage from a literary piece that reflects a middle-class, white, Anglo-Saxon norm. Lectures used for listening stimuli can easily promote a biased sociopolitical view. Consider the following prompt for an essay in "general writing ability" on the International English Language Testing System (IELTS):
   You rent a house through an agency. The heating system has stopped working. You phoned the agency a week ago, but it has still not been mended. Write a letter to the agency. Explain the situation and tell them what you want them to do about it.
Although this task favorably illustrates the principle of authenticity, a number of cultural presuppositions are evident in such a prompt. For example, accepted norms for "complaining" to an agency or expressing power relationships between renters and landlords/agents may be unfamiliar discourse for test-takers, calling into question a potential cultural bias.
A further issue extends beyond the fact that a biased test measures features that are irrelevant to the test construct. Bias also can systematically harm one group of test-takers, thereby making the test unfair. In an era when we seek to recognize the multiple intelligences present within every student (Christison, 2005; H. Gardner, 1983, 1999), is it not likely that standardized tests promote logical-mathematical and verbal-linguistic intelligences to the virtual exclusion of the other contextualized, integrative intelligences? Only very recently have traditionally receptive tests begun to include written and oral production in their test battery, which is a positive sign. But is it enough? It is also clear that many otherwise "smart" people do not perform well on standardized tests. They may excel in cognitive styles that are not amenable to a standardized format. Perhaps they need to be assessed by such performance-based evaluation as interviews, portfolios, samples of work, demonstrations, and observation reports. Perhaps, as Weir (2001, p. 122) suggested, learners and teachers need to be given the freedom to choose more formative assessment rather than the summative assessment inherent in standardized tests.
Expanding test batteries to include such measures would help to solve the problem of test bias (which is extremely difficult to control for in standardized items) and account for the small but significant number of test-takers who are not fairly assessed by standardized tests. Those who are using the tests for gatekeeping purposes, with few if any other assessments, would do well to consider multiple measures before attributing infallible predictive power to standardized tests.
On the surface, such efforts sound laudable with only beneficial outcomes, as they have the potential for transferring the "gatekeeper" role of tests to that of "door opener" (Bachman & Purpura, 2008). On the other hand, assessment experts also warn that principles of practicality and validity can be threatened by [...] against a number of [...] We must somehow find [...]
Test-Driven Learning and Teaching
Yet another consequence of standardized testing is the danger of test-driven learning and teaching. When students and other test-takers know that one single measure of performance will determine their lives, they are less likely to take a positive attitude toward learning. The motives in such a context are almost exclusively extrinsic, with little likelihood of stirring intrinsic interests. Test-driven learning is a worldwide issue. In Japan, Korea, and Taiwan, [...] college entrance examination, [...] English (Chi, 2008; Kab, 2002). Little attention is given to any topic or task that does not directly contribute to passing that one exam. In the United States, high [...]
[...] get caught up in the wave of test-driven systems. [...] cash bonuses of $100 [...] high performance on the state-mandated [...] The effect of this policy was undue pressure on teachers [...] A teacher in such a school might actually be a superb teacher, and that teacher's students might make excellent progress through the school year, but because of the test-driven policy, the teacher would receive no reward at all.
ETHICAL ISSUES: CRITICAL LANGUAGE TESTING
Some researchers believe that one of the byproducts of a rapidly growing testing industry is the danger of an abuse of power. In a special report on "fallout from the testing explosion," Medina and Neill (1990) noted the following:
   [...]
[...] Shohamy (1997) [...] critical language testing (see Chapter 2 [...]). Proponents of [...]
• Psychometric traditions are challenged by interpretive, individualized procedures for predicting success and evaluating ability.
• Test designers have a responsibility to offer multiple modes of performance to account for varying styles and abilities among test-takers.
• Tests are deeply embedded in culture and ideology.
• Test-takers are political subjects in a political context.
These issues are not new. More than a century ago, British educator [...] examinations for university entrance. In recent years, the debate has heated up. Tests are more prevalent in our lives and are often used to make decisions that are significant. Thus it was not surprising that in 1997, an entire issue of the journal Language Testing was devoted to questions about ethics in language testing. More recently, in [...] in language testing. Furthermore, the International Language Testing Association [...] appropriate professional conduct. The code of ethics states that it "is neither a statute nor a regulation, and it does not provide guidelines for practice, but is intended to offer a benchmark of satisfactory ethical behavior by all language testers" (p. 1).
One of the problems highlighted by the push for critical language testing is the widespread conviction, already alluded to above, that carefully constructed standardized tests designed by reputable test manufacturers are infallible in their predictive validity. Too often one standardized test is deemed to be sufficient, and follow-up measures are considered to be too costly.