Introduction to the Speaking Tasks (Task Design & Equivalence)

The researcher examines two standard school-based assessments of English: the Key English Test (KET) Speaking component and the Shanghai Junior High School Oral English Test. The KET Speaking Test, a Cambridge English examination at CEFR Level A2, evaluates a candidate's ability to communicate in English in real-life situations (Cambridge Assessment English, 2020).

The Shanghai test is a regionally designed achievement test that gauges how well junior secondary students can speak English by the time they graduate. Although the KET measures overall language proficiency while the Shanghai test tracks curriculum outcomes, the two tests are constructed in ways that make them directly comparable.

Both tests consist of several speaking tasks and last roughly 15–20 minutes. Each offers students opportunities for open-ended speech and interaction through picture-based discussions, question-and-answer exchanges and short dialogues. The tasks in the two tests were matched for length, difficulty and the speaking skills they measure. In particular, both were examined along four analytic dimensions: Grammar & Vocabulary, Pronunciation, Fluency & Coherence and Interactive Communication. This alignment makes the two assessments suitable for a comparison of their validity and reliability.

Cognitive Validity Comparison

Cognitive validity concerns whether a test engages the mental processes a speaker uses during real-world language use. It reflects how effectively the tasks let students demonstrate the language they would normally produce in everyday communication. Following the framework of our midterm essay, cognitively valid speaking tasks require students to engage in processes such as retrieving words, formulating sentences, constructing discourse and negotiating meaning with an interlocutor. In this analysis, we compare the cognitive demands of the KET Speaking component and the Shanghai Junior High School Oral English Test, paying particular attention to four features: Grammar & Vocabulary, Pronunciation, Fluency & Coherence and Interactive Communication.

The purpose of the KET Speaking Test, which matches the CEFR A2 level, is to see how well test-takers can exchange information about themselves, where they live and their immediate surroundings (Council of Europe, 2001).

Both tests emphasize using language to perform daily activities, such as socializing, asking and answering basic questions and describing everyday experiences. Local educators designed the Shanghai test to match tasks common in the classroom, including describing images, answering everyday questions and stating opinions in English, as required by the official curriculum (Weir, 2005).

The questions in both tests involve simple sentence forms and familiar vocabulary. Even so, KET tasks may require candidates to command a wider vocabulary and a broader range of grammatical forms, because KET pictures are more diverse and the topics tend to be less familiar than the Shanghai test's more academic questions. Both tests require sustained speech, but the KET places more weight on how candidates organize their language and respond spontaneously, whereas the Shanghai test is more structured and can be prepared for in advance. Pronunciation matters in both tests: examiners attend to how clearly candidates articulate words, handle stress and use intonation. Control of rhythm and individual sounds is central in the KET, while the Shanghai test centers on overall clarity and intelligibility. Furthermore, the KET contains paired activities modeled on everyday conversations, so candidates must take turns and negotiate meaning, whereas the Shanghai format is mainly teacher-led (Field, 2011).

Both tests are designed to measure basic speaking abilities and share much cognitive ground, but the KET tends to use language in subtler and less predictable ways, making its cognitive demands somewhat higher throughout. Even so, the Shanghai test remains an effective way to evaluate spoken English at the regional level.

Data Collection Process


Ensuring that the data collection method was standardized, impartial and sound was essential. All twelve students in the study were second-grade Chinese students aged between 9 and 11 with similar academic and English-learning backgrounds. This homogeneity reduced differences in how much language the participants used outside the study, allowing us to focus on the two speaking tests themselves.

All students took both the KET Speaking Test and the Shanghai Junior High School Oral English Test over Zoom on a Sunday in May. The virtual mode was chosen to give everyone equal and convenient access to the testing. Students took the tests one after the other, each in a 20-minute session under standard conditions. Instructions were given clearly in advance, and the testing took place in a calm, quiet room.

Every session was recorded to support accurate assessment and allow rewatching if needed. All scores were provided by three experienced raters, Greta, Vicky and Elena, each drawing on their expertise and rating skills. A 1–5 analytic scale was applied across the four dimensions of Grammar & Vocabulary, Pronunciation, Fluency & Coherence and Interactive Communication. Handling all tests in the same way, against the same standards, made the results trustworthy.

Criterion Validity (Pearson r)

Strong criterion validity means that results on one test largely match those of a well-established comparison assessment because both measure the same construct. For this study, we compared the Shanghai Junior High School Oral English Test to the Key English Test, an internationally recognized proficiency exam. The relationship between the two sets of speaking scores was measured with Pearson's correlation coefficient (r) for the subskills of Grammar & Vocabulary, Pronunciation, Fluency & Coherence and Interactive Communication.
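The calculation behind Pearson's r can be sketched in a few lines of Python. The helper below and the score lists it is applied to are illustrative only; the numbers are invented and do not reproduce the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for two paired score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance term (numerator) and the two standard-deviation terms (denominator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical overall speaking scores for twelve students on the two tests
ket      = [3.5, 4.0, 2.5, 4.5, 3.0, 5.0, 2.0, 3.5, 4.0, 4.5, 3.0, 2.5]
shanghai = [3.0, 4.0, 3.0, 4.5, 3.5, 5.0, 2.5, 3.0, 4.0, 4.5, 3.5, 2.0]
print(round(pearson_r(ket, shanghai), 4))
```

In practice the same coefficient could be obtained from a statistics package, but the sketch shows that r depends only on how the two score lists co-vary relative to their own spreads.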

The results indicate that the total scores of the two tests are strongly and positively correlated, with a coefficient of r = 0.9715. Subskill-level analysis also revealed high correlations: Grammar & Vocabulary (r = 0.9146), Pronunciation (r = 0.9097) and Interactive Communication (r = 0.8905). Fluency & Coherence showed a fairly strong relationship (r = 0.7616), suggesting that this subskill is operationalized slightly differently in the two tests. For example, the KET rewards natural rhythm and spontaneous delivery, while the Shanghai test rewards coherent, well-organized answers.

It appears that the two speaking tests are closely aligned in measuring communication skills, particularly grammar, pronunciation and interaction. By comparison, the slightly lower correlation in Fluency & Coherence points to an opportunity for realignment, whether by adjusting the rubric or redesigning the task. The strong criterion validity confirms that the Shanghai test can be relied on to represent international speaking standards such as the KET within the region. Because Shanghai results can predict how students will perform on international tests, they can usefully inform student placement and curriculum design.

Subskill | Pearson's r | Interpretation
Grammar & Vocabulary | 0.9146 | Very strong correlation
Pronunciation | 0.9097 | Very strong correlation
Fluency & Coherence | 0.7616 | Strong correlation (moderate variance)
Interactive Communication | 0.8905 | Very strong correlation
Overall Speaking Score | 0.9715 | Extremely strong correlation

Inter-Rater Reliability & Standardization Process

Inter-rater reliability refers to the degree to which two or more raters score the same performance in the same way. For speaking-test scores to accurately reflect what a candidate can do, there must be high consistency among assessors. A strict rating and standardization procedure was used in this study to keep results from the KET Speaking Test and the Shanghai Junior High School Oral English Test as comparable as possible (Taylor & Galaczi, 2011). Greta, Vicky and Elena were chosen as raters because of their professional knowledge of English, solid understanding of testing and familiarity with both international and local testing norms. After their training was complete, all raters took part in a standardization session to review the rating scales used in both tests. Each scale covered four subskills, Grammar & Vocabulary, Pronunciation, Fluency & Coherence and Interactive Communication, each judged on a scale from 1 to 5. To establish benchmark scores and ensure everyone agreed on the criteria, sample responses were rated and discussed during training (Cohen, 1988).

Subskill | Rater Agreement (%) | Standard Deviation (SD) | Interpretation
Grammar & Vocabulary | 92% | 0.31 | High agreement
Pronunciation | 94% | 0.27 | Very high agreement
Fluency & Coherence | 89% | 0.38 | Moderate to high agreement
Interactive Communication | 90% | 0.35 | High agreement
Overall Agreement | 91.25% | 0.33 | Consistently reliable
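Agreement percentages and per-student score spreads like those in the table can be computed along the following lines. This is a minimal sketch under assumed definitions (exact agreement, population SD per student averaged across students); the study does not specify its formulas, and all score values here are hypothetical.

```python
import statistics

def agreement_pct(scores_by_rater, tolerance=0):
    """Percent of students on whom all raters agree within `tolerance` points."""
    n_students = len(scores_by_rater[0])
    agreed = sum(
        1 for i in range(n_students)
        if max(r[i] for r in scores_by_rater) - min(r[i] for r in scores_by_rater) <= tolerance
    )
    return 100 * agreed / n_students

def rating_sd(scores_by_rater):
    """Mean per-student standard deviation of scores across raters."""
    n_students = len(scores_by_rater[0])
    sds = [statistics.pstdev([r[i] for r in scores_by_rater]) for i in range(n_students)]
    return sum(sds) / n_students

# Invented scores from three raters for six students on one subskill
greta = [4, 3, 5, 2, 4, 3]
vicky = [4, 3, 5, 3, 4, 3]
elena = [4, 4, 5, 2, 4, 3]
raters = [greta, vicky, elena]
print(agreement_pct(raters), round(rating_sd(raters), 2))
```

A tolerance of one point (adjacent agreement) is another common convention; which definition is used should always be reported alongside the figure.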

After training, each rater assessed every student recording independently. Rating was carried out under quiet conditions, with close attention paid to avoiding interruptions. After the initial scoring, two rounds of verification were carried out. First, raters reviewed each other's scores and flagged any ratings that differed substantially. Where there was disagreement, the raters discussed the recordings and reached a decision that satisfied the set criteria. Because the team worked closely together, disagreements over how to rate the evidence were settled thoughtfully and on the basis of the evidence available.

This thorough review produced a high level of agreement among the raters. Because well-developed rubrics, regular training and specific verification steps were used, the scoring process was unbiased and repeatable. The final scores therefore genuinely reflect how well each student speaks, with limited room for rater bias or error. Since variation in rater judgment could mask differences between the tests, this level of reliability assures us that observed score differences come from the tests themselves and not from the rating. Consequently, the study's findings are accurate and fair.

Score | Grammar & Vocabulary | Pronunciation | Fluency & Coherence | Interactive Communication
5 | Wide range, accurate use | Clear, natural stress & intonation | Fluent with logical connections | Initiates, responds, negotiates well
4 | Good range, minor errors | Mostly clear with few issues | Generally fluent, minor hesitation | Participates actively and effectively
3 | Moderate range, noticeable errors | Understandable but some pronunciation issues | Some fluency, occasional pauses | Responds appropriately when prompted
2 | Limited range, frequent errors | Often unclear, stress issues | Frequent pauses, hard to follow | Limited interaction, relies on prompts
1 | Very limited and inaccurate | Very unclear, hard to understand | No fluency, fragmented speech | No interaction or understanding
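Assembling a final result from this rubric can be sketched as follows, assuming (as described above) four subskills rated 1–5 by three raters whose scores are averaged. The function name and the example ratings are hypothetical; the study does not state its exact aggregation formula.

```python
SUBSKILLS = ["Grammar & Vocabulary", "Pronunciation",
             "Fluency & Coherence", "Interactive Communication"]

def final_score(ratings):
    """ratings: {rater_name: {subskill: score 1-5}}.
    Returns (per-subskill means across raters, overall mean of those means)."""
    per_subskill = {
        s: sum(r[s] for r in ratings.values()) / len(ratings)
        for s in SUBSKILLS
    }
    overall = sum(per_subskill.values()) / len(SUBSKILLS)
    return per_subskill, overall

# Invented ratings for one student from the three raters
ratings = {
    "Greta": {"Grammar & Vocabulary": 4, "Pronunciation": 3,
              "Fluency & Coherence": 4, "Interactive Communication": 4},
    "Vicky": {"Grammar & Vocabulary": 4, "Pronunciation": 4,
              "Fluency & Coherence": 3, "Interactive Communication": 4},
    "Elena": {"Grammar & Vocabulary": 5, "Pronunciation": 3,
              "Fluency & Coherence": 4, "Interactive Communication": 4},
}
per_subskill, overall = final_score(ratings)
print(per_subskill, round(overall, 2))
```

Averaging subskill means rather than raw scores keeps each dimension equally weighted in the overall result, which matches the equal prominence the rubric gives the four subskills.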

Conclusion

The KET Speaking Test and the Shanghai Junior High School Oral English Test have similar task lengths, organization and difficulty. Cognitive validity and criterion validity were high for both tests, with subskill correlations ranging from 0.7616 to 0.9146 and an overall Pearson correlation of 0.9715. Thanks to proper training and standardization procedures, the scoring process achieved an excellent level of inter-rater reliability. Taken together, these findings suggest that the Shanghai speaking test strengthens the assessment of spoken English and, with a few small refinements, can be trusted as a reliable regional alternative to tests such as the KET.

References

Cambridge Assessment English. (2020). A2 Key Speaking test. https://www.cambridgeenglish.org/

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
Lawrence Erlbaum Associates.

Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.

Field, J. (2011). Cognitive validity in speaking test tasks. Studies in Language Testing,
30, 115–147.

Taylor, L., & Galaczi, E. D. (2011). Scoring validity. In L. Taylor (Ed.), Examining
speaking: Research and practice in assessing second language speaking (pp. 171–
233). Cambridge University Press.

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave Macmillan.
