Understanding Test Score Types

This document covers the different types of test scores: raw scores, percentile ranks, and standard scores (z-scores, T-scores, stanines, and normal curve equivalents). Test scores can be interpreted within norm-referenced or criterion-referenced frameworks: norm-referenced interpretations compare a student's performance to peers, while criterion-referenced interpretations determine whether a student has met a predefined standard of mastery. Both frameworks require defining the achievement domain measured and valid, reliable test items and assessment methods, though they differ in how scores are scaled and how mastery is determined.

UTILIZATION OF ASSESSMENT DATA
Chapter 10
TYPES OF TEST SCORES

Raw and Percentage Scores

• Raw scores are obtained by simply counting the number of correct
responses in a test, following the scoring directions. A percentage
score expresses the raw score as a percent of the total number of items.
Percentile Rank
• A percentile rank gives the percent of scores in a reference sample
that fall at or below a given raw or standard score; it is used to rank
students within that sample. It should not be confused with the
percentage of correct answers.
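The "at or below" definition can be sketched in a few lines of Python; the reference sample below is hypothetical:

```python
def percentile_rank(score, sample):
    """Percent of scores in the reference sample at or below the given score."""
    at_or_below = sum(1 for s in sample if s <= score)
    return 100 * at_or_below / len(sample)

scores = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(20, scores))  # 6 of 10 scores are at or below 20 -> 60.0
```

Note that a raw score of 20 here has a percentile rank of 60 even though the percentage-correct score could be something entirely different, which is exactly the confusion the text warns against.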
Standard Scores
• It is difficult to use raw scores when making comparisons between
groups on different tests, since tests may have different levels of
difficulty. To remedy this, raw scores may be transformed into derived
scores.
• A normal curve represents a normal distribution – a symmetric
theoretical distribution. The mean (arithmetic average), median
(score that separates the upper and lower 50%) and mode (most
frequently occurring score) are located at the center of the bell
curve where the peak is.
• The curve approaches the horizontal axis asymptotically. The empirical rule (68-
95-99.7 rule) shows the connection between the normal distribution and the
standard deviation. It states that 68% of the scores fall within 1 standard
deviation of the mean, 95% of the scores within 2 standard deviations, and
almost all (99.7%) of the scores within 3 standard deviations of the mean.
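A quick simulation illustrates the empirical rule. This is only a sketch using pseudo-random draws from a standard normal distribution, with a fixed seed so the proportions are reproducible:

```python
import random

random.seed(0)  # fixed seed for reproducible proportions
sample = [random.gauss(0, 1) for _ in range(100_000)]

# Fraction of draws within 1, 2 and 3 standard deviations of the mean
for k, expected in [(1, 68.0), (2, 95.0), (3, 99.7)]:
    pct = 100 * sum(1 for x in sample if abs(x) <= k) / len(sample)
    print(f"within {k} SD: {pct:.1f}% (rule says ~{expected}%)")
```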
STANDARD SCORES

A. Z-score
• The z-score gives the number of standard deviations a test score lies above
or below the mean. The formula is z = (X − M) / SD, where X is the test score,
M is the average score, and SD is the standard deviation. A negative z-score
means the score is below the average, while a positive z-score means it is
above the average.
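As a minimal sketch, the formula translates directly into code. The mean of 18 and SD of 4 below are hypothetical values, chosen so that a raw score of 12 yields the z-score of −1.5 used in the examples that follow:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical test: mean 18, standard deviation 4
print(z_score(12, 18, 4))  # -1.5, i.e. 1.5 SDs below the mean
```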
B. T-score
• In the T-score scale, the mean is set to 50 instead of 0, and the standard
deviation is 10. To transform a z-score to a T-score, multiply the z-score
by 10 and add 50, i.e., T = 10z + 50. A z-score of −1.5 thus converts to a
T-score of 35. Like the corresponding z-score, a T-score of 35 indicates that
the raw score of 12 is 1.5 standard deviations below the mean, implying that
the student performed below average on the test.
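The linear conversion T = 10z + 50 is a one-liner:

```python
def t_score(z):
    """Rescale a z-score to the T scale (mean 50, SD 10)."""
    return 10 * z + 50

print(t_score(-1.5))  # 35.0, as in the example above
```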
C. Stanine
• Stanine, short for standard nine, is a method of scaling scores on a nine-point
scale. A raw score is converted to a whole number from a low 1 to a high 9.
Unlike the z-score, whose mean and standard deviation are 0 and 1,
respectively, stanines have a mean of 5 and a standard deviation of 2.
• Stanine scores of 1, 2 and 3 are below average; 4, 5 and 6 are average; 7, 8
and 9 are above average. In the previous example, the stanine equivalent of a
raw score of 12 is 2. This can be calculated from the z-score by multiplying
it by 2 and adding 5, i.e. stanine = 2z + 5. A stanine value of 2 shows that
the student’s performance level is below average.
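A sketch of the 2z + 5 shortcut follows. Note that operational stanines are usually assigned from fixed percentile bands of a norm group, so rounding and clamping the linear formula is only an approximation:

```python
def stanine(z):
    """Approximate stanine: round 2z + 5 and clamp to the 1-9 scale."""
    return max(1, min(9, round(2 * z + 5)))

print(stanine(-1.5))  # 2 -> below average, matching the raw score of 12 above
```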
D. Normal Curve Equivalent
• The Normal Curve Equivalent (NCE) is a normalized standard score within
the range 1 – 99. It has a mean of 50 and a standard deviation of 21.06.
Caution should be exercised when converting a raw score to NCE because
the latter requires a representative national sample.
• NCE scores are preferred by some because of their equal-interval property:
differences between tests, and among subtests in a test battery, can be
meaningfully calculated and examined.
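Given a z-score that already comes from a representative national sample, the NCE scale is another linear rescaling. This sketch clamps the result to the 1-99 range:

```python
def nce(z):
    """Normal Curve Equivalent: mean 50, SD 21.06, restricted to 1-99."""
    return max(1.0, min(99.0, 50 + 21.06 * z))

print(round(nce(-1.5), 2))  # 18.41
```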
E. Developmental Scores
• A grade equivalent (GE) describes a learner's developmental growth, giving a
picture of where he/she is on an achievement continuum.
• Note that a GE is an estimate of a learner's location on the developmental
continuum, not the grade level where he/she should be placed (Frechtling &
Myerberg, 1983). One criticism of GE is that it assumes equal learning occurs
throughout the year, which is rarely the case.
• Further, it cannot be used to compare a student’s performance in different tests/subtests.
Analogous to grade equivalents are age equivalent scores. They are interpreted similarly.
• Developmental scores like grade and age equivalents promote typological thinking –
categorizing abilities and performance by age or grade without due consideration
of individual variation.
TYPES OF TEST SCORE INTERPRETATIONS
• A frame of reference is some well-defined performance domain (Mehrens &
Lehmann, 1985). This is needed to make sense of test scores. Scores and marks
may be explained in relation to a norm or criterion. These references were
conceived and differentiated by American psychologist Robert Glaser in 1963.
The use of norm and criterion-reference measures hinges on the purpose of
assessment.
• Rendering a judgement of whether a learner passes or fails is criterion-referenced.
• Selection is made based on norm-referenced scores.
A. NORM-REFERENCED INTERPRETATIONS
• The term “norm” originated from the Latin word norma which means precept or rule. By
definition, it pertains to the average score in a test. Apart from school average norm, there are
other types of norms that can be reported: international, national and local norm groups, and
special norm groups (e.g. students who are visually impaired).
• Norm-referenced interpretations are explanations of a learner’s performance in comparison with
other learners of the same age or grade. A learner’s knowledge is gauged in terms of his/her
position in the norm group. The use of standard scores and percentile rank are common in norm-
referenced interpretations.
• When using percentile ranks, Campbell (1995) cautioned test interpreters that (1) percentile units
are not necessarily equal in size and (2) it is of critical importance that the norms be developed
using a comparable cohort of students.
• As pointed out, norm-referenced evaluations determine the learner's place or rank.
Assessment instruments that lend themselves to this kind of interpretation include
standardized aptitude and achievement tests, teacher-made survey tests, interest
inventories and adjustment inventories.
• According to Kubiszyn & Borich (2010), a norm-referenced assessment tends to be
general as it covers a wider scope of content measuring a variety of skills. Because there
are several objectives covered, only one or two items are sampled for each learning
objective.
• Moreover, the difficulty of test items varies, which is likely to result in greater
variability in test scores. This implies that scores are dispersed, making relative
standings more telling. After all, the purpose of norm-referenced measures is to
discriminate between high and low achievers.
FIVE GUIDELINES WHEN INTERPRETING
NORM-REFERENCED TEST SCORES

1. Detect any unexpected pattern of scores.
2. Determine the reasons for score patterns.
3. Do not expect surprises for every student.
4. Small differences in subtest scores should be viewed as chance fluctuations.
5. Use information from various assessments and observations to explain
performance on other assessments.
B. CRITERION-REFERENCED INTERPRETATIONS
• The word “criterion” came from the Greek word kriterion, which means standard.
Criterion-referenced interpretations thus give meaning to test scores by describing what
the learner can and cannot do in light of a standard. Hence, test scores allow for
absolute rather than comparative interpretations.
• A learner's performance is explained in relation to a pre-determined criterion of
mastery. Mastery tests generate criterion-referenced interpretations, as do teacher-made
tests, skill tests, competency tests, performance assessments, and licensure examinations.
• Multiple-choice questions, alternate-response items, short-response items and essays
enable criterion-referenced interpretations, since these test formats can be used to
measure a specific body of knowledge or set of skills the students have acquired.
• What matters is the link between the test items and the criteria set forth. Unlike in
norm-referenced interpretations, where most students are categorized as average, it is
quite possible in a criterion-referenced test for most students to achieve an acceptable
or high level of proficiency, or for most to fall short of it.
• Criterion-referenced scores include percentage correct, speed of performance, quality
ratings and precision of performance (Nitko & Brookhart, 2011). When interpreting
standardized test scores, there is an assumption of normality in the distribution of test
scores.
• This is not necessary in a criterion-referenced framework. When the test scores
are negatively skewed (skewed to the left), there are more high scores. This
implies that students performed well and reflects the quality of instruction
they received.
• However, it may also mean that the test items were easy. If the scores are
positively skewed (skewed to the right), there are more low scores: the students
performed poorly, or the test items may have been difficult.
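The direction of skew can be checked numerically. This sketch computes the third standardized moment for a hypothetical set of scores in which most students scored high:

```python
def skewness(scores):
    """Sample skewness (third standardized moment); negative means a tail of low scores."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((x - mean) ** 2 for x in scores) / n) ** 0.5
    return sum(((x - mean) / sd) ** 3 for x in scores) / n

high_heavy = [9, 9, 10, 10, 10, 8, 9, 10, 4, 10]  # most scores high, one low outlier
print(skewness(high_heavy) < 0)  # True: negatively skewed (skewed to the left)
```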
• Criterion-referencing is used in diagnosing students’ needs and monitoring their
progress. It is likewise used in certification and program evaluation. It is the
preferred mode of assessment in an outcome-based education framework.
• The foregoing discussion reveals the differences between the two major frames
of reference. Despite the differences, there are also commonalities. Miller, Linn
& Gronlund (2009) stated that both require specification of the achievement
domain to be measured, as well as a relevant and representative sample of test
items. They added that the same item-writing rules (except for item difficulty)
are followed, and the principles of validity and reliability are still observed.
