
7

NORMS AND TEST SCALES

CHAPTER PREVIEW

• Meaning of Norm-Referencing and Criterion-Referencing

• Steps in Developing Norms

• Types of Norms and Test Scales

• Age-equivalent Norms

• Grade-equivalent Norms

• Percentile Norms (or Percentile-rank Norms)

• Standard Score Norms

• Computer Applications in Psychological Testing and Assessment

MEANING OF NORM-REFERENCING AND CRITERION-REFERENCING

An individual's performance on any psychological or educational test is recorded in terms of raw scores. Raw scores are expressed in different units, such as the number of trials taken within a specified period to reach a criterion, the number of correct responses given by the examinee, the number of wrong responses given, the total time taken in assembling the objects, and the like. By themselves, all these raw scores convey no meaning.

For example, if A has a score of 40 on an arithmetical reasoning test and a score of 30 on a history test, does it mean that A's performance on the arithmetical reasoning test is superior, inferior or equivalent to his performance on the history test? In the absence of some interpretative data, the question cannot be answered. Usually, there are two reference points that are applied in interpreting test scores (that is, in interpreting what the scores tell us about the examinees with respect to the characteristics being measured).

The first way is to compare an examinee's test score with the scores of a specific group of examinees on that test. This process is known as norm-referencing. When raw scores are compared with norms, a meaningful interpretation emerges. Norms may be defined as the average performance on a particular test made by a standardization sample. A standardization sample is a sample that is truly representative of the population and takes the test for the express purpose of providing data for comparison and subsequent interpretation of the test scores.

For adequate representation, the sample must include a cross-sectional representation of the different parts of the population. In order to compare the raw scores with the performance of the standardization sample, they are converted into what are called "derived scores". There are two reasons for such conversion. First, derived scores permit direct comparison of a person's own performances on different tests, because they are expressed in the same units for different tests (whereas raw scores cannot be expressed in the same unit for different tests). Second, derived scores denote the person's relative position in the standardization sample, and therefore his performance may be evaluated in comparison with other persons (raw scores do not provide this facility).
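To make the idea of derived scores concrete, the sketch below (in Python, with entirely hypothetical sample means and standard deviations) converts A's two raw scores from the earlier example into z-scores, one common kind of derived (standard) score, so that the two performances can be placed on the same scale.

```python
# Illustrative sketch: converting raw scores on two different tests into
# z-scores so that they can be compared directly.
# The sample means and standard deviations below are hypothetical.

def z_score(raw, sample_mean, sample_sd):
    """Express a raw score in standard-deviation units above or below the mean."""
    return (raw - sample_mean) / sample_sd

arithmetic = {"mean": 32.0, "sd": 5.0}   # arithmetical reasoning test (assumed)
history    = {"mean": 24.0, "sd": 4.0}   # history test (assumed)

# A's raw scores from the earlier example
z_arith = z_score(40, arithmetic["mean"], arithmetic["sd"])   # +1.60
z_hist  = z_score(30, history["mean"], history["sd"])         # +1.50

print(f"Arithmetic z = {z_arith:.2f}, History z = {z_hist:.2f}")
# Both performances are now expressed in the same unit and can be compared.
```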

The second way of interpreting a test score is to establish an external standard or criterion and compare the examinee's test score with it. This process is known as criterion-referencing. In a criterion-referenced test there is a fixed performance criterion. If an examinee correctly answers some predetermined number of items (the criterion), he is said to be capable of the total performance demanded by the test. Thus, a criterion-referenced test may be defined as one in which test performance is linked or related to some behavioural measures or referents (Glaser, 1963).
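A minimal sketch of criterion-referenced interpretation is given below (in Python); the cut-off of 35 correct items out of 50 is a purely hypothetical criterion chosen for illustration.

```python
# Criterion-referenced interpretation: each examinee is judged against a
# predetermined cut-off score, not against the performance of other examinees.
# The cut-off and item total below are hypothetical.

CUT_OFF = 35       # predetermined number of items to be answered correctly
TOTAL_ITEMS = 50

def interpret(correct_items):
    """Classify performance relative to the fixed criterion."""
    return "criterion met" if correct_items >= CUT_OFF else "criterion not met"

for examinee, score in {"A": 42, "B": 33}.items():
    print(f"{examinee}: {score}/{TOTAL_ITEMS} -> {interpret(score)}")
```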

One question which arises here is: where do the criteria with which a test is referenced come from? According to Cox & Vargas (1966), a major criterion for referencing a test is training that results in an increase in skill or proficiency. Suppose a group of examinees is given training and is measured by a test both before and after the training. If the post-training scores are better than the pre-training scores, the test can be considered sensitive to the results or outcomes of the training. In other words, the test scores can be interpreted as an indication of increased skill or proficiency. Such a test constitutes an example of a criterion-referenced test.

The important features of a criterion-referenced test are as follows:

1. The test is usually based upon a set of behavioural referents, which it intends to
measure.

2. The test represents the samples of actual behaviour or performance.

3. The performance on a criterion-referenced test can be explained in terms of predetermined cut-off scores.

It should be noted that a test cannot be automatically called a criterion-referenced test simply
because it has no norms. Further, there is no clear distinction between a norm-referenced test
and a criterion-referenced test. In some sense, these two types of tests can be regarded as
complementary. The main difference between these two tests is, however, that a criterion-
referenced test is always based upon a predetermined cut-off score whereas a norm-
referenced test is always based upon the performances of a normative group or
standardization sample.

STEPS IN DEVELOPING NORMS

Developing norms is certainly a very difficult task. However, this difficulty can be minimized if we follow the proper steps in developing norms. The following are the three important steps in developing norms.

1. Defining the target population

2. Selecting the sample from the target population


3. Standardizing the conditions

These steps may be discussed as follows:

1. Defining the target population:

The first step in developing norms is to define the composition of the target group. A test is intended to be used with a particular type of person or group of persons, and the composition of the target group (also called the normative group) is determined by the intended use of the test. Suppose, for example, that the test constructor has constructed the Test of English as a Foreign Language (TOEFL). Obviously, this test is intended for students whose native tongue is not English but who plan to study abroad where the medium of instruction is English. Thus, for the TOEFL the target population will consist of such students, who are relevant and appropriate. A population of PhD candidates would be an example of an inappropriate target population for the TOEFL.

2. Selecting the sample:

Once the test constructor has defined the target population, he proceeds to select a representative sample from the target population or group. To make the selected sample representative of the target population, a cross-sectional representation of the target population must be obtained, that is, one in which people from all sections of the target population are represented. To ensure the representativeness of the sample, various sampling techniques are employed. Generally, a larger sample is preferred for constructing norms. For constituting a large sample, a completely random sampling technique is the best one; but because of its impracticability, this technique is seldom employed. Generally, cluster sampling or one of its variations is preferred.
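The idea of cluster sampling can be sketched as follows (in Python): whole clusters, here hypothetical schools, are drawn at random and every student within a chosen cluster enters the norming sample.

```python
# One-stage cluster sampling for a norming study (hypothetical clusters).
import random

clusters = {
    "School 1": [f"s1_student_{i}" for i in range(1, 41)],
    "School 2": [f"s2_student_{i}" for i in range(1, 36)],
    "School 3": [f"s3_student_{i}" for i in range(1, 51)],
    "School 4": [f"s4_student_{i}" for i in range(1, 31)],
}

random.seed(0)                                  # reproducible illustration
chosen = random.sample(list(clusters), k=2)     # draw 2 whole clusters at random

norming_sample = [s for school in chosen for s in clusters[school]]
print("Chosen clusters:", chosen, "| sample size:", len(norming_sample))
```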

3. Standardizing conditions for proper administration of the test:

Unless the conditions of test administration are standardized, valid and proper comparisons of individual test scores with test norms are impossible. Therefore, factors like adequate sound control, lighting, ventilation and temperature of the working space must be properly controlled. Above all, factors like test timing, test security, adherence to the test-manual directions and ensuring that the examinees work on the proper test sections are most important for standardizing the testing conditions. Without these standardization procedures, norms cannot serve as a useful comparative device.

These are the important steps in developing norms of a test. For norms to be a useful
comparative device, these steps must be covered thoroughly.

TYPES OF NORMS AND TEST SCALES

Ordinarily, derived scores may be divided into four common types, and corresponding to each of these there is a type of norm. The four commonly used derived scores are age scores, grade scores, percentile scores and standard scores (Lyman, 1963). Accordingly, there are four types of norms: age norms, grade norms, percentile norms and standard-score norms. A detailed critical discussion of each of these is given below.

Age-equivalent Norms

Age-equivalent norms are defined as the average performance of a representative sample of a certain age level on the measure of a certain trait or ability. If, for example, we measure the weight of a representative sample of 10-year-old girls of the state of Bihar and compute the average of the obtained weights, we obtain the age norm for the weight of 10-year-old girls. Age norms are best suited to those traits or abilities which increase systematically with age. Since most physical traits like weight, height, etc., and cognitive abilities like general intelligence show such systematic change during childhood and adolescence, age norms can be appropriately used for these traits or abilities at the elementary level.
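A minimal sketch of how age-equivalent norms are constructed is given below (in Python); the scores are entirely hypothetical and simply illustrate that the norm for each age is the mean score of the representative sample at that age.

```python
# Age-equivalent norms: the norm for each age is the average performance of
# the representative sample at that age. All scores below are hypothetical.
from statistics import mean

scores_by_age = {            # age in years -> raw scores of the sample
    8:  [18, 20, 22, 19, 21],
    9:  [24, 25, 23, 26, 27],
    10: [29, 30, 31, 28, 32],
}

age_norms = {age: mean(scores) for age, scores in scores_by_age.items()}
print(age_norms)             # {8: 20, 9: 25, 10: 30}

# A child whose raw score is about 25 is then said to have an age equivalent
# of 9, because 25 is the average score of 9-year-olds in the sample.
```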

There are some disadvantages of age norms:

1. Age norms lack a standard and uniform unit throughout the period of growth of physical and psychological traits. As pointed out earlier, age norms are suited to traits or abilities which show progressive growth with advancing age. Consider the example of general intelligence. The growth in the level of general intelligence from age 8 to 9 is in no way equal to the growth from 3 to 4 or from 14 to 15, because growth at the earlier levels is faster than growth at later levels and it almost comes to a halt after 16 to 17. Moreover, the problem is further aggravated by the fact that even at a particular age level the rate of growth of general intelligence (or of traits like height and weight) is not uniform for all children and sometimes varies to a great extent. In such a situation we are forced to view with scepticism the meaning of the equality of age units used in age norms.

2. Another problem with age norms arises from the fact that the growth rates of some traits are not comparable. For example, progress in maze learning does not ordinarily take place after adolescence, but progress in vocabulary continues even after adolescence. In such a situation the age norms for these two traits cannot be compared.

3. A trait like acuity of vision cannot be expressed in terms of age norms because this trait
does not exhibit progressive change over the years. Many other personality traits would also
fall into this category.

Grade-equivalent Norms

Like age-equivalent norms, grade-equivalent norms are defined as the average performance of a representative sample of a certain grade or class. The test whose norms are being prepared is given to a representative sample selected from each of several grades or classes. The average performance of each grade on the test is then determined, and grade equivalents for the in-between scores are obtained arithmetically by interpolation. This average performance constitutes the grade-equivalent norm. Thus, if the average number of items answered correctly on an arithmetic test by a representative sample of sixth graders is 30, then a raw score of 30 corresponds to a grade equivalent of six. In this way grade-equivalent norms indicate the grade levels at which the performance of representative groups is average.
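The interpolation step can be sketched as follows (in Python); the grade averages used here are hypothetical.

```python
# Grade equivalents for in-between raw scores obtained by linear interpolation
# between the average scores of adjacent grades (hypothetical averages).

grade_averages = {5: 24.0, 6: 30.0, 7: 34.0}   # grade -> mean raw score of sample

def grade_equivalent(raw):
    """Linearly interpolate a grade equivalent from adjacent grade averages."""
    grades = sorted(grade_averages)
    for lo, hi in zip(grades, grades[1:]):
        lo_avg, hi_avg = grade_averages[lo], grade_averages[hi]
        if lo_avg <= raw <= hi_avg:
            return lo + (raw - lo_avg) / (hi_avg - lo_avg)
    raise ValueError("raw score lies outside the range covered by the norms")

print(grade_equivalent(30))   # 6.0  (exactly the sixth-grade average)
print(grade_equivalent(27))   # 5.5  (halfway between the grade 5 and 6 averages)
```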

Grade-equivalent norms also have some limitations.

1. Grade-equivalent norms of the same student in different subjects are not comparable. For example, a student's grade equivalent in social studies cannot be compared with the same student's grade equivalent in arithmetic, because everyday life experiences outside the school may contribute to knowledge of social studies, whereas knowledge of arithmetical concepts depends primarily upon formal training in class.

2. Grade-equivalent norms assume that all students of a class or grade have more or less
similar curriculum experiences. This assumption may be true in the elementary classes but it
may not be true for higher classes.

3. The grade-equivalent norm is not suited to those subjects in which there occurs rapid
growth in the elementary class and a very slow growth in the higher classes. For example, in
spelling and arithmetic there occurs rapid growth in the elementary grades but growth is slow
in the higher classes.

Despite these limitations grade-equivalent norms are common particularly among the
achievement tests and the educational tests. Such norms are also suited to the intelligence
tests.

Percentile Norms (or Percentile-rank Norm)

Percentile norms are the most popular and common type of norms used in psychological and educational tests. Such norms can be prepared for either adults or children and for any type of test. A percentile norm indicates, for each raw score, the percentage of the standardization sample that falls below that raw score. To illustrate, suppose Mohan has a score of 26 on a mechanical reasoning test; if 40% of the standardization sample scores below 26, then Mohan has a percentile rank of 40 (PR 40), and the 40th percentile corresponds to a raw score of 26. Percentile norms thus provide a basis for interpreting an individual's score on a test in terms of his standing in a particular standardization sample. If the percentile norm is to be meaningful, it should be based upon a sample which has been made homogeneous with respect to age, grade, sex, occupation and other factors; otherwise separate tables of percentile norms should be prepared for each age, grade, sex and occupation.
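A minimal sketch of the computation is given below (in Python); the standardization sample is hypothetical and has been arranged so that a raw score of 26 yields a percentile rank of 40, as in Mohan's example.

```python
# Percentile rank: the percentage of the standardization sample that falls
# below a given raw score. The sample scores below are hypothetical.

standardization_sample = [18, 20, 23, 25, 27, 28, 30, 31, 33, 35]

def percentile_rank(raw, sample):
    below = sum(1 for s in sample if s < raw)
    return 100.0 * below / len(sample)

print(percentile_rank(26, standardization_sample))   # 40.0 -> PR 40
```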

Sometimes percentile norms are also reported through the ogive, a graph which shows the cumulative percentage of scores falling below the upper limit of each class interval. The graph enables a person to read his percentile rank or cumulative frequency percentage (plotted on the Y axis) from his score (plotted on the X axis), or to read the score corresponding to a given percentile rank. The important advantages of percentile norms are that they are easy to construct, easy to understand, and can be used even by an untrained person. Despite these advantages, such norms have two distinct limitations:

1. Laymen as well as skilled persons sometimes fail to distinguish between the percentile and the percentage score, and the obvious result is confusion. The two should not be confused: a percentile is a derived or converted score expressed in terms of the percentage of persons in the standardization sample who fall below a given raw score, whereas a percentage score simply expresses the proportion of items answered correctly.

2. Percentile units are not equal throughout the distribution of scores; they are greatly compressed near the middle of the distribution and spread out at the extremes, so equal differences in percentile ranks do not represent equal differences in performance.

COMPUTER APPLICATIONS IN PSYCHOLOGICAL TESTING AND ASSESSMENT

Computers are now widely used in scoring and interpreting psychological tests. Test users can obtain printouts of diagnostic and interpretative statements about the examinee's personality together with the numerical scores.

Individualized interpretation of test scores is also done by what are called interactive computer systems, in which the individual is in direct contact with the computer by means of a response terminal and thus engages in a dialogue with the computer (Katz, 1974). In this technique the test scores are usually entered into the computer together with other information provided by the person. Subsequently, the computer combines all the available information about the individual with stored data, particularly about his educational and occupational background. It utilizes all relevant facts and relations in answering the individual's questions and also helps him in reaching certain decisions. A very good example of such an interactive computer system is SIGI (System of Interactive Guidance and Information).

Several researchers have reported that the presentation of test content and the scoring of the test by computer do not appreciably reduce the reliability or validity of test scores (Lee, Moreno & Sympson, 1986). However, certain applications of computers may lead to misuse and misinterpretation of test scores (Butcher, 1985; Matarazzo, 1986). To guard against such hazards, considerable attention has been given to developing guidelines for computer-based testing. The Standards for Educational and Psychological Testing, developed jointly by the AERA, APA and NCME, include several guidelines on computer-based testing.

Today most experts agree that, in computerized testing, score comparability and narrative interpretive scoring are the major concerns deserving special attention. Mazzeo et al. (1991) have pointed out that when the same test is administered in a computerized mode and in the traditional printed mode, the comparability of the scores needs to be carefully investigated; the reliability and the validity of the test may also vary. Thus it can be concluded that, unless the two modes are shown to yield fully equated test forms, the same set of norms may not be appropriate or applicable for both. Narrative interpretive scoring has also aroused some concern, because sometimes adequate information is not provided to test users for evaluating the reliability, validity and other technical properties of the interpretive system. Likewise, at other times it is not clear how the interpretive statements were derived from the scores. In the absence of such information, narrative interpretive score reports cease to be meaningful.

The potential contribution of computers to psychological assessment is impressive (Gutkin & Wise, 1991; Moreland, 1992). Farrell (1993) has identified several major applications of computers to psychological assessment, especially to cognitive-behavioural assessment. These applications are (i) collecting self-report data, (ii) directly recording behaviours, (iii) coding observational data, (iv) training, (v) organizing and synthesizing behavioural assessment data, (vi) analysing behavioural assessment data, and (vii) supporting decision making.

Computer technology has also led to the development and rapid growth of many new instruments for the assessment of cognitive functions, which are being used extensively in clinical neuropsychology as well as in the study of attention disorders and learning disabilities (Stoloff & Couch, 1992). Computers allow us not only to vary task-presentation conditions more precisely, so as to assess performance on different task components, but also to record and evaluate response parameters such as timing in ways not possible with traditional tests. MicroCog: Assessment of Cognitive Functioning, designed to screen for likely signs of cognitive impairment in adults, is one of the best examples of recently developed computer-administered batteries (Powell & Whitla, 1994).

However, some factors continue to impede the widespread application of computers in psychological assessment. First, lack of acceptance by some practitioners is one of the major factors. Second, evaluation of software is another obstacle (Farrell, 1992): most vendors are reluctant to make their products available for review, and consequently potential users do not have sufficient information to evaluate the quality of the software. Third, most systems of computer-assisted test interpretation tend to combine clinical and statistical procedures. The specific mix of quantitative data and clinical judgment varies among the systems, as do the technical quality of the database and the clinical expertise behind the judgments. Apart from this, the information needed to evaluate a particular system is often unavailable because of proprietary concerns.

Despite these obstacles, it is hoped that greater use of computers in both psychological testing and psychological assessment will be witnessed in the future.

Review Questions

1. What do you understand by norms of a test? Why are the norms needed for a
psychological test?

2. Make a distinction between age norms and grade norms.

3. What is meant by percentile norms? Discuss the advantages and disadvantages of percentile norms.

4. What is "standard" about a standard score? Discuss the relative utility of different types of standard-score norms.

5. Discuss the major computer applications in psychological testing and assessment.
