Measurement Reading Material

Contents
AN OVERVIEW OF MEASUREMENT AND EVALUATION
   Unit objectives
BLOOM’S TAXONOMY OF EDUCATIONAL OBJECTIVES
   Taxonomy of Educational Objectives
   Steps for Stating Instructional Objectives
CLASSROOM ACHIEVEMENT TESTS AND ASSESSMENTS
   Unit Objectives
TEST DEVELOPMENT – PLANNING THE CLASSROOM TEST
   Test Development – Planning the Classroom Test
ASSEMBLING, REPRODUCING, ADMINISTERING, AND SCORING OF CLASSROOM TESTS
   Unit objectives
   Arranging the test items
   Reproducing test items
SUMMARIZING AND INTERPRETING TEST SCORES
   Descriptive statistics
RELIABILITY AND VALIDITY OF A TEST
   Test Reliability
   Validity of a Test
JUDGING THE QUALITY OF A CLASSROOM TEST
   Judging the Quality of a Classroom Test
   Item Analysis and Criterion Referenced Mastery Tests
   Building a Test Item File (Item Bank)
UNIT ONE
AN OVERVIEW OF MEASUREMENT AND EVALUATION
Unit objectives
After reading through this unit and completion of the tasks and activities, you will be able to:
Test
A test is a measuring tool or instrument in education. More specifically, a test is a kind or class of measurement device typically used to find out something about a person. Most of the time, when you finish a lesson or lessons in a week, your teacher gives you a test. This test is an instrument given to you by the teacher in order to obtain data on which you are judged. It is a common type of educational device which an individual completes, the intent being to determine changes or gains resulting from instruction; related measurement instruments include the inventory, the questionnaire and the rating scale.
Testing
Testing, on the other hand, is the process of administering the test to the pupils. In other words, the process
of making you or letting you take the test in order to obtain a quantitative representation of the cognitive
or non-cognitive traits you possess is called testing. So, the instrument or tool is the test and the process
of administering the test is testing.
Assessment
Assessment is a systematic basis for making inferences about the learning and development of students… the process of defining, selecting, designing, collecting, analysing, interpreting and using information to increase students' learning and development.
Measurement
In simple terms, measurement refers to giving or assigning a number value to a certain attribute or
behaviour. It is a systematic process of obtaining the quantified degree to which a trait or an attribute is
present in an individual or object. In other words, it is a systematic assignment of numerical values or
figures to a trait or an attribute in a person or object. Measurement conveys a broader meaning.
Measurement uses a variety of ways to obtain information in a quantitative form. Measurement can use
paper and pencil test, rating scales, and observations to assign a number value to a given trait or
behaviour. Measurement can also refer to both the score obtained by the measuring device and the process used to obtain the score.
Evaluation
Evaluation is formative when conducted over small bodies of content to provide feedback in directing
further instruction and student learning. Formative evaluation then refers to an ongoing process which is
done before instruction, during instruction, and at the end of a term or unit. Summative evaluation, on the other hand, is an evaluation conducted over the larger outcomes of an extended instructional sequence, such as an entire course or a large part of it.
Summative evaluation may serve for reporting a student’s overall achievement, licensing and certifying,
predicting success in related courses, assigning marks, and reporting overall achievement of a class.
Evaluation in the classroom context is directed to the improvement of student learning by supporting the
instructional process as in the following.
Types of evaluation
The different types of evaluation are: placement, formative, diagnostic and summative evaluations.
Placement Evaluation
This is a type of evaluation carried out in order to place students in the appropriate group or class. In some schools, for instance, students are assigned to classes according to their subject combinations, such as Science, Technical, Arts, Commercial, etc. Before this is done, an examination is carried out, in the form of a pretest or aptitude test. It can also be a type of evaluation made by the teacher to find out the entry behaviour of his students before he starts teaching. This may help the teacher to adjust his lesson plan. Tests like readiness tests, ability tests, aptitude tests and achievement tests can be used.
Formative Evaluation
This is a type of evaluation designed to help both the student and the teacher to pinpoint areas where the student has failed to learn, so that the failure may be rectified. It provides feedback to the teacher and the student and thus estimates teaching success, e.g., through weekly tests, terminal examinations, etc.
Diagnostic Evaluation
This type of evaluation is carried out most of the time as a follow-up to formative evaluation. As a teacher, you may have used formative evaluation to identify some weaknesses in your students and applied some corrective measures which have not been successful. What you will now do is design a diagnostic test, which is applied during instruction to find out the underlying causes of students' persistent learning difficulties. These diagnostic tests can take the form of achievement tests, performance tests, self-ratings, interviews, observations, etc.
Summative evaluation:
This is the type of evaluation carried out at the end of the course of instruction to determine the extent to
which the objectives have been achieved. It is called a summarizing evaluation because it looks at the
entire course of instruction or program and can pass judgment on the teacher and students, the curriculum
and the entire system. It is used for certification. Think of the educational certificates you have acquired
from examination bodies. These were awarded to you after you had gone through some types of
examination. This is an example of summative evaluation.
Norm-referenced tests
These are tests used to compare the performance of an individual with those of other individuals of
comparable background. In other words, the score of an individual in a norm-referenced testing has
meaning only when it is viewed in relation to the scores of other individuals on the test. The success or
failure of an individual on this kind of test is, therefore, determined on the basis of how he/she performs
in relation to his/her colleagues’ performance on the test.
Criterion-referenced tests
In contrast to norm-referenced tests, criterion-referenced tests are tests in which the score of an individual on a given test is related to a specific performance standard for interpretation purposes. Such tests are labelled criterion-referenced, the criterion in this respect being the specific performance standard.
If a given score of an examinee is equal to or greater than a specified standard (i.e., the criterion) the
examinee is said to have passed; otherwise, she/he is deemed to have failed the test or examination.
Therefore, the success or failure of an individual on a criterion-referenced test (CRT) depends on what he/she scores on the test in relation to the set standard, and this may depend, to a large extent, on the test content itself or on the relative strictness of the marker if the test is not an objective one. On any criterion-referenced measure, therefore, it is possible for all the members of a class to pass or to fail the test, depending on a number of obvious reasons such as the easiness or difficulty of the test items.
SELF CHECK
1. Explain the difference between measurement and evaluation, assessment and testing.
2. What are the types of evaluation?
3. What is the major difference between test and testing?
4. In your own words define Assessment.
5. Give an example of a test.
6. What are the major differences and similarities between formative evaluation and diagnostic
evaluation?
7. List 5 purposes of measurement and evaluation.
8. List the instruments which you can use to measure the following: weight, height, length,
achievement in Mathematics, performance of students in technical drawing, attitude of workers
towards delay in the payment of salaries
UNIT TWO
BLOOM’S TAXONOMY OF EDUCATIONAL OBJECTIVES
Unit objectives
At the end of this unit, you will be able to:
Stating general instructional objectives
- Begin each general objective with a verb, like knows, applies, interprets.
- State each general objective to include only one general learning outcome; that is, objectives should be unitary (not "knows and understands").
- State each general objective at the proper level of generality; it should encompass a readily definable domain of responses.
Stating specific learning outcomes
- List beneath each general instructional objective a representative sample of specific learning outcomes that describe the terminal performance students are expected to demonstrate.
- Begin each specific learning outcome with an action verb that specifies observable performance, like identifies, describes.
- Make sure that each specific learning outcome is relevant to the general objective it describes.
Self-Check Exercises
Instruction I: Choose the best answer from the alternatives given for each item
1. Which one of the following action verbs can be used in synthesis level of cognitive domain?
A. Write B. Transfer C. Distinguish D. Interpret
2. Which one of the following action verbs does not indicate learning outcome at the
evaluation level?
A. Decide B. Conclude C. Validate D. Discriminate
3. At which one of the following levels of the affective domain is the learner expected to develop a
consistent philosophy of life?
A. Organization B. Characterization C. Valuing D. Responding
UNIT THREE
CLASSROOM ACHIEVEMENT TESTS AND ASSESSMENTS
Unit Objectives
After going through this unit, the learner should be able to:
List the different types of items used in classroom tests.
Describe the different types of objective questions.
Describe the different types of essay questions.
Compare the characteristics of objective and essay tests.
Explain the meaning of objective test.
TYPES OF TESTS USED IN THE CLASSROOM
There are different types of test formats used in the classroom: the essay test, the objective test, the norm-referenced test and the criterion-referenced test. But we are going to concentrate on the essay test and the objective test. These are the most common tests which you can easily construct for your purposes in the class.
Types of objective tests
The objective test can be classified into those that require the examinee to supply the answer to the test items (free-response type) and those that require the examinee to select the answer from a given number of alternatives (fixed-response type). The free-response type consists of short-answer and completion items, while the fixed-response type is commonly further divided into true-false (alternative-response) items, matching items and multiple-choice items.
Selection type items
This is the type where possible alternatives are provided for the testee to choose the most appropriate or the correct option. Can you mention them? Let us take them one by one.
True/False or two option items
The true-false type of test is representative of a somewhat larger group called alternative-response items, such as yes-no, correct-incorrect, agree-disagree, right-wrong, etc. This group consists of any question in which the student is confronted with two possible answers. Since most of the points discussed here are equally applicable to all alternative-response items, and since teachers are most familiar with the true-false type, the following discussion will concentrate on true-false items.
Advantages of true/false items
It is commonly used to measure the ability to identify the correctness of statements of fact, definitions of terms, statements of principles and other relatively simple learning outcomes for which a declarative statement might be used with any of the several methods of responding.
It is also used to measure an examinee's ability to distinguish fact from opinion, or superstition from scientific belief.
It is used to measure the ability to recognize cause-and-effect relationships.
It is best used in situations in which there are only two possible alternatives, such as right or wrong, more or less, and so on.
It is easy to construct alternative-response items, although the validity and reliability of such items depend on the skill of the item constructor; constructing unambiguous alternative-response items that measure significant learning outcomes requires much skill.
A large number of alternative-response items covering a wide area of sampled course material can be used, and examinees can respond to them in a short period of time.
Disadvantages of true/false items
It requires course material that can be phrased so that the statement is true or false without qualification or exception, which is often difficult in areas such as the Social Sciences.
It is limited to learning outcomes in the knowledge area, except for distinguishing between fact and opinion or identifying cause-and-effect relationships.
It is susceptible to guessing, with a fifty-fifty chance of the examinee selecting the correct answer by chance alone. The chance selection of correct answers has the following effects:
i. It reduces the reliability of each item, thereby making it necessary to include many items in order to obtain a reliable measure of achievement.
ii. The diagnostic value of answers to guessed test items is practically nil, because analysis based on such responses is meaningless.
iii. The validity of examinees' responses is also questionable because of response sets.
Guidelines for preparing true/false items
Avoid using negative statements, especially double negatives. Under the demands of the testing situation, students may fail to see the negative qualifier.
- Poor: None of the steps in the planning stage of test construction are not important. T F
- Better: All of the steps in the planning stage of test construction are important. T F
Test important ideas, knowledge, or understanding (rather than trivia, general knowledge, or common sense). Look at the following examples:
o Artists live longer than farmers. T F
o The coefficient of correlation shows the cause-and-effect relationship between two paired variables. T F
Avoid copying statements directly from the textbook and other written materials.
Keep the word length of true statements about the same as that of false statements.
Make sure that the statements used are entirely true or entirely false. (Partially or marginally true or false statements cause unnecessary ambiguity.)
Matching items
The matching test item usually consists of two parallel columns. One column contains a list of words, numbers, symbols or other stimuli (premises) to be matched to a word, sentence, phrase or other possible answer from the other column (the responses). The examinee is directed to match the responses to the appropriate premises. Usually, the two lists have some sort of relationship; although the basis for matching responses to premises is sometimes self-evident, more often it must be explained in the directions.
Advantages of matching items
It is used whenever learning outcomes emphasize the ability to identify the relationship
between things and a sufficient number of homogenous premises and responses can be
obtained.
It is essentially used to relate two things that have some logical basis for association.
It is adequate for measuring factual knowledge like testing the knowledge of terms,
definitions, dates, events, references to maps and diagrams.
The major advantage of matching exercise is that one matching item consists of many
problems. This compact form makes it possible to measure a large amount of related factual
material in a relatively short time.
It enables the sampling of larger content, which results in relatively higher content validity.
The guess factor can be controlled by skilfully constructing the items such that the correct
response for each premise must also serve as a plausible response for the other premises.
The scoring is simple and objective and can be done by machine.
Disadvantages of matching items
It is restricted to the measurement of factual information based on rote learning, because the material tested must lend itself to the listing of a number of important and related concepts.
Many topics are unique and cannot be conveniently grouped in homogenous matching clusters, and it is sometimes difficult to get homogenous clusters of premises and responses that match sufficiently, even for content that is adaptable to clustering.
It requires extreme care during construction in order to avoid encouraging serial
memorization rather than association and to avoid irrelevant clues to the correct answer.
Guidelines for preparing matching items
Use only homogeneous material in a set of matching items (i.e., dates and places should not
be in the same set).
Use the more involved expressions in the stem and keep the responses short and simple.
Supply directions that clearly state the basis for the matching, indicating whether or not a
response can be used more than once, and stating where the answer should be placed.
Make sure that there are never multiple correct responses for one stem (although a response
may be used as the correct answer for more than one stem).
Avoid giving inadvertent grammatical clues to the correct response (e.g., using a/an, singular/
plural verb forms).
Arrange items in the response column in some logical order (alphabetical, numerical, and
chronological) so that students can find them easily.
Avoid breaking a set of items (stems and responses) over two pages.
Use no more than 15 items in one set.
Provide more responses than stems to make process-of-elimination guessing less effective.
Number each stem for ease in later discussions.
Use capital letters for the response signs rather than lower-case letters.
Multiple-choice items
A multiple-choice item presents a problem (the stem) together with a list of suggested answers (the alternatives or options). The incorrect alternatives are those other than the correct or best option and are therefore called distracters, foils or decoys. These incorrect alternatives receive their name from their intended function – to distract the examinees who are in doubt about the correct answer.
Advantages of the MCQs
The multiple-choice item is the most widely used of the types of tests available. It can be
used to measure a variety of learning outcomes from simple to complex.
It is adaptable to any subject matter content and educational objective at the knowledge
and understanding levels.
It can be used to measure knowledge outcomes concerned with vocabulary, facts,
principles, method and procedures and also aspects of understanding relating to the
application and interpretation of facts, principles and methods.
Most commercially developed and standardized achievement and aptitude tests make use
of multiple-choice items.
The main advantage of multiple-choice test is its wide applicability in the measurement
of various phases of achievement.
It is the most desirable of all the test formats, being free of many of the disadvantages of other forms of objective items. For instance, it presents a more well-defined problem than the short-answer item, avoids the need for the homogenous material necessary for the matching item, reduces the clues and susceptibility to guessing characteristic of the true-false item, and is relatively free from response sets.
It is useful in diagnosis, and it enables fine discrimination among examinees on the basis of how much of the trait being measured they possess.
It can be scored with a machine.
Disadvantages/limitations of the MCQs
It measures problem-solving behaviour at the verbal level only.
It is inappropriate for measuring learning outcomes requiring the ability to recall, organize or present ideas, because it requires selection of the correct answer rather than construction of a response.
It is very difficult and time consuming to construct.
It requires more response time than any other type of objective item and may favour test-wise examinees if not adequately and skilfully constructed.
Measuring evaluation and synthesis can be difficult.
It is inappropriate for measuring outcomes that require skilled performance.
Guidelines for preparing multiple-choice items
Use the stem to present the problem or question as clearly as possible; eliminate excessive
wordiness and irrelevant information.
Use direct questions rather than incomplete statements for the stem.
Include as much of the item as possible in the stem so that alternatives can be kept brief.
Include in the stem words that would otherwise be repeated in each option.
In testing for definitions, include the term in the stem rather than as one of the alternatives.
List alternatives on separate lines rather than including them as part of the stem so that they
can be clearly distinguished.
Keep all alternatives in a similar format (e. g. All phrases, all sentences, etc.).
Make sure that all options are plausible responses to the stem. (Poor alternatives should not
be included just for the sake of having more options.)
Check to see that all choices are grammatically consistent with the stem.
Try to make alternatives for an item approximately the same length. (Making the correct
response consistently longer is a common error.)
Use misconceptions which students have indicated in class or errors commonly made by
students in the class as the basis for incorrect alternatives.
Use “all of the above” and “none of the above” sparingly since these alternatives are often chosen on the basis of incomplete knowledge. Words such as “all,” “always,” and “never” are likely to signal incorrect options.
Use capital letters (A, B, C, D, and E) on tests as response labels rather than lower-case letters (“a” gets confused with “d” and “c” with “e” if the type or duplication is poor). Instruct students to use capital letters when answering (for the same reason), or have them circle the letter or the whole correct answer, or use scannable answer sheets.
Try to write items with equal numbers of alternatives in order to avoid asking students to
continually adjust to a new pattern caused by different numbers.
When an incomplete statement is used as the stem, put the incomplete part at the end rather than at the beginning of the statement.
Use negatively stated items sparingly. (When they are used, it helps to underline or otherwise visually emphasize the negative word.)
Make sure that there is only one best or correct response to the stem. If there are multiple
correct responses, instruct students to “choose the best response.”
Limit the number of alternatives to five or less. (The more alternatives used, the lower the
probability of getting the correct answer by guessing. Beyond five alternatives, however,
confusion and poor alternatives are likely.)
Supply type items
This is the type of test item which requires the testee to give very brief answers to the questions. These answers may be a word, a phrase, a number, a symbol or symbols, etc. Supply test items can take the form of short-answer or completion items. Both are supply-type test items consisting of direct questions which require a short answer (short-answer type) or an incomplete statement or question to which a response must be supplied by the examinee (completion type). The answers to such questions could be a word, phrase, number or symbol. Such items are easy to develop and, if well developed, the answers are definite and specific and can be scored quickly and accurately.
Advantages of supply type items
They measure the ability to interpret diagrams, charts, graphs and pictorial data.
They are most effective for measuring specific learning outcomes, such as computational learning outcomes in mathematics and the sciences.
Guidelines for preparing supply type items
Word questions carefully so that all students understand the specific nature of the question asked and the answer required.
- Better: In what battle fought in 1853 did Tewodros II defeat Ras Ali?
Word completion or fill-in-the-blank questions so that the missing information is at, or near, the end of the sentence; this makes reading and responding easier.
- Better: If a room measures 7 meters by 4 meters, the perimeter is ______ meters (or m).
Do not use too many blanks in completion items. The emphasis should be on knowledge and comprehension, not mind reading.
- Consider: In the year __________, Prime Minister ___________ signed the _________, which led to a __________ which was ____________.
Word each item in specific terms with clear meanings so that the intended answer is the only one possible, and so that the answer is a single word, brief phrase, or number.
In supply items, present most of the statement and blank out the key word.
- Better: Proper nouns are words that refer to particular _________, __________ or ________.
- Best: Words that refer to particular persons, objects or things are ____________.
Types of essay test items
Essay items may be extended (unrestricted/open-ended/free response) or restricted (closed-ended).
Extended-response items:
- place no restrictions on the response;
- place no restrictions on the number of pages;
- require originality;
- are applicable in measuring higher-level learning outcomes of the cognitive domain, such as the analysis, synthesis and evaluation levels.
Examples:
- Describe the processes of producing or cutting screw threads in the school technical workshop.
- Why should the classroom teacher state his instructional objectives to cover the three domains of educational objectives?
- "Open and Distance Learning is a viable option for the eradication of illiteracy in Ethiopia." Discuss.
Restricted-response items are directional questions aimed at the desired responses, and measure outcomes at the knowledge, comprehension and analysis levels.
Examples:
Difficulty levels and discrimination powers
How to make essay questions less subjective
Avoid open-ended questions.
Let all the students answer the same questions; avoid optional questions/choices.
Use students' numbers instead of their names, to conceal their identity.
Score all the answers to each question for all students at one time.
Do not allow the score on one question to influence you while marking the next. Always rearrange the papers before you mark.
Do not allow your feelings or emotions to influence your marking.
Decide on a policy for handling irrelevant or incorrect responses.
SELF CHECK
1. Briefly explain the meaning of objective test.
2. What are the two major advantages of objective test items over the Essay test item?
3. What is the major feature of objective test that distinguishes it from essay test?
4. Which of the types of objective test items would you recommend for school wide test and why?
5. How would you make essay questions less subjective?
6. What are the two sub-divisions of the supply test items?
7. (a) Subjectivity in scoring is a major limitation of the essay test.
True / False
(b) Essay questions cover the content of a course and the objectives as comprehensively as possible.
True / False
(c) Grading of essay questions is time consuming.
True / False
(d) Multiple choice questions should have only two options.
True / False
8. Construct five multiple choice questions in any course of your choice.
9. Give 5 examples of free response or extended response questions
10. Briefly identify the two most outstanding weaknesses of essay test as a measuring instrument.
UNIT FOUR
TEST DEVELOPMENT – PLANNING THE CLASSROOM TEST
Unit objectives
By the time you finish this unit, you will be able to:
Identify the sequence of planning a classroom test,
Prepare a table of specifications for a classroom test in a given subject,
Recognize some common problems of teacher-made tests, and
Carry out a content survey in the development of a table of specifications.
Test Development – Planning the Classroom Test
The development of good questions or items for a classroom test cannot be taken for granted. An inexperienced teacher may write good items by chance, but this is not always possible. Development of good questions or items must follow a number of principles, without which no one can guarantee that the responses given to the tests will be relevant and consistent. In this unit, we shall examine the various aspects of the teacher's own test.
Some Pitfalls in Teacher-Made Tests
The following observations have been made about teacher-made tests. They are listed below so that you can avoid these pitfalls when you construct questions for your class tests.
Most teacher-made tests are not appropriate to the different levels of learning outcomes. Teachers specify their instructional objectives covering the whole range from simple recall to evaluation, yet their items fall within the recall of specific facts only.
Many of the test exercises fail to measure what they are supposed to measure. In other words, most teacher-made tests are not valid. You may wonder what validity is. It is a very important quality of a good test: a test is valid if it measures what it is supposed to measure. You will read about it in detail later in this course.
Some classroom tests do not comprehensively cover the topics taught. One of the qualities of a good test is that it should represent the entire content taught, but these tests cannot be said to be a representative sample of the whole topic taught.
Most tests prepared by teachers lack clarity in their wording. The questions are ambiguous, imprecise and often carelessly worded, and most of them are general or global questions.
Most teacher-made tests fail item analysis: they fail to discriminate properly and are not designed according to difficulty levels.
These are not the only pitfalls, but you should try to avoid both the ones mentioned here and others like them. Now let us look at how to develop test items.
Considerations in Planning a Classroom Test
To plan a classroom test that will be both practical and effective in providing evidence of
mastery of the instructional objectives and content covered requires relevant considerations.
Hence the following serves as guide in planning a classroom test.
Determine the purpose of the test;
Describe the instructional objectives and content to be measured;
Determine the relative emphasis to be given to each learning outcome;
Select the most appropriate item formats (essay or objective);
Develop the test blueprint to guide the test construction;
Prepare test items that are relevant to the learning outcomes specified in the test plan;
Decide on the pattern of scoring and the interpretation of results;
Decide on the length and duration of the test; and
Assemble the items into a test, prepare directions and administer the test.
Analysis of the Instructional Objectives
The instructional objectives of the course are critically considered while developing the test
items. This is because the instructional objectives are the intended behavioral changes or
intended learning outcomes of instructional programs which students are expected to possess at
the end of the course or program of study. The instructional objectives usually stated for the
assessment of behavior in the cognitive domain of educational objectives are classified by Bloom
(1956) in his taxonomy of educational objectives into knowledge, comprehension, application,
analysis, synthesis and evaluation. The objectives are also given relative weights according to the level of importance and emphasis placed on them. Educational objectives and the content of a course form the nucleus around which test development revolves.
Content Survey
This is an outline of the content (subject matter or topics) of a course or program to be covered in the test. The test developer assigns relative weights to the outlined content – the topics and subtopics to be covered in the test. This weighting depends on the importance and emphasis given to each content area. A content survey is necessary since the content is the means by which the objectives are to be achieved and the level of mastery is determined.
Planning the table of specifications/test blue print
The table of specifications is a two-dimensional table that specifies the levels of the objectives in relation to the content of the course. A well-planned table of specifications enhances the content validity of the test for which it is planned. The two dimensions (content and objectives) are put together by listing the objectives across the top of the table (horizontally) and the content down the table (vertically), to provide the complete framework for the development of the test items. The table of specifications is planned to take care of the coverage of content and objectives in the right proportion, according to the degree of relevance and emphasis (weight) attached to them in the teaching-learning process. A table may alternatively specify the type of items in relation to the content of the course; the two forms are called a table of specifications by objective and a table of specifications by test type, respectively. A hypothetical table of specifications by objective is illustrated in Table 4.1 below:
Table 4.1 A Hypothetical Test Blue Print/Table of Specifications by Objective

Content | Weight | Knowledge | Comprehension | Application | Analysis | Synthesis | Evaluation | Total
Set A   | 15%    | -         | 1             | -           | 2        | -         | -          | 3
Set B   | 15%    | -         | 1             | -           | 2        | -         | -          | 3
Set C   | 25%    | 1         | -             | 1           | 1        | 1         | 1          | 5
Set D   | 25%    | 1         | -             | 1           | 1        | 1         | 1          | 5
Set E   | 20%    | -         | 1             | 1           | -        | -         | 2          | 4
Total   | 100%   | 2         | 3             | 3           | 6        | 2         | 4          | 20

(The objective columns follow Bloom's levels as listed above: knowledge, comprehension, application, analysis, synthesis and evaluation; the cell entries are numbers of items.)
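To see how such a blueprint is used, the weights can be turned into item counts mechanically. The short Python sketch below does this for the hypothetical 20-item test in Table 4.1; it is an illustration, not part of the module.

```python
# Allocate items to content areas by weight (weights from Table 4.1).
weights = {"Set A": 0.15, "Set B": 0.15, "Set C": 0.25,
           "Set D": 0.25, "Set E": 0.20}
total_items = 20

for content, w in weights.items():
    # Set A/B -> 3 items, Set C/D -> 5 items, Set E -> 4 items,
    # matching the Total column of the table.
    print(content, round(w * total_items))
```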
Self-Check Exercises
Dear Trainees! You have now completed your study of chapter four. Therefore, you are expected to
answer the following self-test questions.
UNIT FIVE
ASSEMBLING, REPRODUCING, ADMINISTERING, AND
SCORING OF CLASSROOM TESTS
Unit objectives
By the time you finish this unit you will be able to:
Explain the meaning of test administration
State the steps involved in test administration
Identify the need for civility and credibility in test administration
State the factors to be considered for credible and civil test administration
Introduction
In most cases, tests cannot be administered orally or easily written on the chalkboard, so they must be reproduced. In assembling the test for reproduction, items of the same format should be grouped together, for the following reasons:
Younger children may not realize that the first set of directions is applicable to all items of a particular format and may become confused if formats are mixed;
It makes it easier for the examinee to maintain a particular mental set rather than having to change from one format to another;
It makes it easier for the teacher to score the test, mainly when hand scoring is done.
In arranging item formats due emphasis should be given to the complexity of mental activity they demand
in answering them. In this way we have to arrange item formats so that they progress from the simple to
the complex. For instance, items that measure simple recall should precede those that measure
understanding and application.
According to Gronlund (1985), item formats can be arranged in the following way, which roughly approximates the complexity of the instructional objectives measured. Hence:
Teachers should be aware of the significance of providing clear and concise directions. The directions should transmit clear information concerning what to do, how to do it and where to record answers. In other words, directions should tell students:
7. In the elementary grades, if workspace is needed to solve numerical problems, provide this space
in the test booklet rather than having examinees use scratch paper. This would minimize
recording errors that might occur when students transfer questions from test booklet to scratch
paper for computation.
8. All illustrative material used should be clear, legible, and accurate.
9. Proofread the test carefully before it is reproduced. If you can, it is better to have a teacher who teaches the same subject check for errors so that they can be corrected early. If errors are found after the test has been reproduced, they should be called to the students' attention before the actual test is begun.
10. Even for essay tests, every student should have a copy of the test; teachers should not write the questions on the blackboard.
Self-assessment exercise
1. What is test administration?
UNIT SIX
SUMMARIZING AND INTERPRETING TEST SCORES
Unit objectives
Dear learner, by the time you finish this unit, you will be able to:
Interpret classroom test scores in criterion-referenced or norm-referenced terms
Calculate the average result of a given class of test scores
Decide if there is a relationship between two factors
Convert raw scores to z-scores and T-scores
Compare one’s score with the score of a group
Descriptive statistics
One of the major purposes of statistics in test use is to allow us to describe and summarize data-
for example, test scores- in efficient and useful ways.
Statistics enable us to:
1. Describe
2. Interpret
3. Pass judgment
Measures of Central Location
What is the average / most popular / mean / typical / "middle" / most common data value? A measure of central location is a value (i.e., a single number) used to represent where the majority of the data values lie for a given random variable.
Three commonly used central location measures are:
1. Mean
2. Median
3. Mode
The Mean ($\bar{X}$)
The mean is the arithmetic average of the observed scores. It is the most popular, useful and commonly used measure of central location.
1. For ungrouped scores: $\bar{X} = \frac{\sum X}{N}$
2. For a frequency distribution: $\bar{X} = \frac{\sum fX}{N}$
where $\sum$ = summation, X = raw score, f = frequency, N = total number of students, and $\bar{X}$ = mean.
The Median
The median of a random variable is the value which divides ranked data into two equal
halves, i.e., it is the middle number of an ordered set of data.
The median is valid for quantitative random variables only.
It is the score that divides a score distribution into two equal parts.
It is most appropriate when dealing with small number of students.
It is easy to understand.
It is not affected by outliers
One of its disadvantages is that it can only be calculated for quantitative variables.
- Arrange the scores in decreasing/increasing order.
- With an even number of scores, take the two middle scores and divide their sum by two.
Example 1: Here are the scores of six people: 14, 11, 8, 6, 7, 9. Calculate the median.
Ordered: 6, 7, 8, 9, 11, 14
$Mdn = \frac{3^{rd}\ term + 4^{th}\ term}{2} = \frac{9 + 8}{2} = \frac{17}{2} = 8.5$
Example 2: Calculate the median of the following scores: 14, 11, 9, 6, 8, 7, 5.
Ordered: 5, 6, 7, 8, 9, 11, 14; here N = 7, so the median is the 4th term, which is 8.
Remark: $Mdn = \left(\frac{N+1}{2}\right)^{th}$ term of the ordered scores.
The Mode
It is the most frequent value; mode means most “popular”. The score or scores with the highest frequency form the mode of the distribution. The mode is valid for both quantitative and qualitative random variables. A set of data may have one mode, or two or more modes.
Advantages:
1. It is easy to calculate
2. It is valid for all data types.
3. It is not affected by outliers
4. Most appropriate when numerical values in a data set are labels for categories
(nominal)
Disadvantages:
1. There could be more than one mode, which could lead to confusion since no single value is then representative of the data
2. It could be a random event and not truly representative of the data, especially in a
relatively small data set.
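The three central-location measures above can be checked quickly in code. The following is a minimal Python sketch (not part of the module) using the standard statistics library; the second data set is hypothetical, chosen to show that a distribution may be bimodal. Note that multimode requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

scores = [14, 11, 8, 6, 7, 9]          # scores from Example 1 above

print(mean(scores))                     # 55 / 6 = 9.166...
print(median(scores))                   # middle pair of the ordered list: (8 + 9) / 2 = 8.5
print(multimode([3, 5, 5, 7, 9, 9]))    # two or more modes are possible -> [5, 9]
```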
Measures of Variability
Consider the following distributions of scores:
Distribution I: 37 37 37 37 37
Distribution II: 33 36 37 38 41
Note: these distributions have the same arithmetic mean (37), but there is a marked difference between them in how the scores spread.
The topic of dispersion of scores is therefore concerned with measures which show the amount of variability among data. The common measures are:
1. The Range
2. The Variance
3. The Standard Deviation
The range
The range is the difference between the highest and the lowest scores in a distribution. The
higher the value of the range, the greater the difference between the students in academic
achievement. However, it is a crude measure of variability.
The variance
Variance is the arithmetic mean of the squared deviations of individual scores from the mean. It is expressed in squared units. It shows the spread or dispersion of scores, i.e., a tendency for any set of scores to depart from a central point or any other point.
Definitional formulae:
$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$ (ungrouped scores); $\sigma^2 = \frac{\sum f(X - \bar{X})^2}{N}$ (frequency distribution)
where $\sigma^2$ = variance, X = raw score, $\bar{X}$ = mean of the distribution, f = frequency, and N = total number of students.
The standard deviation
The standard deviation is a measure of how much a set of scores varies on the average around the mean of the scores. In other words, it reveals how closely scores tend to vary from the mean. Standard deviation is the positive square root of the variance; it measures the extent to which scores tend to deviate from the mean. Its formulae are:
$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$; $\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$
- The larger the σ, the greater the difference in academic achievement; the smaller the standard deviation, the less the scores tend to vary from the mean.
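As a worked check on these formulae, here is a small Python sketch (not from the module) computing the range, variance and standard deviation of Distribution II above.

```python
from math import sqrt

scores = [33, 36, 37, 38, 41]    # Distribution II
n = len(scores)
mean = sum(scores) / n           # 185 / 5 = 37

score_range = max(scores) - min(scores)              # 41 - 33 = 8
variance = sum((x - mean) ** 2 for x in scores) / n  # 34 / 5 = 6.8
sd = sqrt(variance)                                  # about 2.61

print(score_range, variance, sd)
```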
Measures of Position
Percentiles and Percentile Ranks
Percentiles and percentile ranks are frequently used as indicators of performance in both the
academic and corporate worlds. Percentiles and percentile ranks provide information about how
a person or thing relates to a larger group. Relative measures of this type are often extremely
valuable to researchers employing statistical techniques.
Percentiles
A percentile is the point in a distribution at or below which a given percentage of scores is found; that is, the value below which P% of the values fall is called the Pth percentile. For example, the 5th percentile is denoted by P5, the 10th by P10 and the 95th by P95.
Percentile Rank
A percentile rank is used to determine where a particular score or value fits within a broader
distribution. For example: A student receives a score of 75 out of 100 on an exam and wishes to
determine how her score compares to the rest of the class. She calculates a percentile rank for a
score of 75 based on the reported scores of the entire class. Her percentile rank in this example
would be 80, meaning that 80 percent of scores on the exam were at or below 75.
Notes:
I. A Percentile is a value in the data set.
II. The percentile rank of a given value is a percent that indicates the percentage of the data at or below that value.
III. Percentiles are not the same as percentages.
Calculation of Percentiles and Percentile Ranks
A. In the case of ranked raw data:
The (approximate) value of the kth percentile, $P_k$, is calculated by the formula
$P_k = \text{value of the } \left(\frac{kn}{100}\right)^{th} \text{ term}$
where k is the percentile one wishes to calculate and n is the total number of values in the distribution.
The percentile rank (PR) of a given value, $x_i$, indicates the percentage of values in the distribution at or below $x_i$.
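The module's percentile-rank formula did not survive reproduction here, so the sketch below simply follows the worked example above (a score of 75 with 80 percent of scores at or below it has a percentile rank of 80). The data set and both helper functions are illustrative assumptions, not the module's official procedure.

```python
def percentile(ranked, k):
    """Approximate kth percentile: the (kn/100)th term of the ranked data."""
    position = max(1, round(k * len(ranked) / 100))  # 1-based position
    return ranked[position - 1]

def percentile_rank(ranked, value):
    """Percentage of data values at or below the given value."""
    at_or_below = sum(1 for x in ranked if x <= value)
    return 100 * at_or_below / len(ranked)

ranked = sorted([55, 60, 62, 68, 70, 71, 73, 75, 90, 95])
print(percentile(ranked, 50))        # (50 * 10) / 100 = 5th term -> 70
print(percentile_rank(ranked, 75))   # 8 of 10 scores at or below 75 -> 80.0
```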
The Z-Scores
The Z-score is the simple standard score which expresses test performance directly as the number of standard deviation units a raw score falls above or below the mean. The Z-score is computed by the formula
$Z = \frac{X - \bar{X}}{SD}$
where X = any raw score, $\bar{X}$ = arithmetic mean of the raw scores, and SD = standard deviation.
When the raw score is smaller than the mean, the Z-score is negative, which can cause serious problems if not well noted in test interpretation. Hence Z-scores are transformed into a standard score system that utilizes only positive values, such as the T-score.
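A short sketch of the conversion follows. The Z-score line implements the formula above; the T-score line uses the common convention T = 50 + 10Z, which the module implies but does not state, so treat that pair of constants as an assumption.

```python
from math import sqrt

scores = [33, 36, 37, 38, 41]    # reusing Distribution II from the earlier example
mean = sum(scores) / len(scores)
sd = sqrt(sum((x - mean) ** 2 for x in scores) / len(scores))

for x in scores:
    z = (x - mean) / sd    # negative when the raw score is below the mean
    t = 50 + 10 * z        # assumed convention; rescales Z to positive values
    print(x, round(z, 2), round(t, 1))
```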
Measures of Relationship
Measures of association provide a means of summarizing the size of the association between two
variables. Most measures of association are scaled so that they reach a maximum numerical
value of 1 when the two variables have a perfect relationship with each other. They are also
scaled so that they have a value of 0 when there is no relationship between two variables. While
there are exceptions to these rules, most measures of association are of this sort. Some measures
of association are constructed to have a range of only 0 to 1; other measures have a range from -1
to +1. The latter provide a means of determining whether the two variables have a positive or
negative association with each other.
Correlation
Chi-square
A. Correlation
A correlation coefficient is used to measure the strength of the relationship between numeric
variables (e.g., weight and height)
If the coefficient is between 0 and 1, as one variable increases, the other also increases. This
is called a positive correlation. For example, height and weight are positively correlated
because taller people usually weigh more.
If the correlation coefficient is between -1 and 0, as one variable increases the other
decreases. This is called a negative correlation. For example, age and hours slept per night
are negatively correlated because older people usually sleep fewer hours per night.
There are two common methods of computing correlation coefficient. These are:
Pearson Product-Moment Correlation.
Spearman Rank-Difference Correlation
Pearson Product-Moment Correlation:
This is the most widely used method, and the coefficient is denoted by the symbol r. This method is favoured when the number of scores is large, and it is also easier to apply to large groups. The computation is easier with ungrouped test scores and is illustrated here. The computation with grouped data appears more complicated and can be obtained from standard statistics textbooks.
The following steps will serve as a guide for computing a product-moment correlation coefficient (r) from ungrouped data.
Step 1 - Begin by writing the pairs of score to be studied in two columns. Make certain that the
pair of scores for each examinee is in the same row. Call one Column X and the other Y
Step 2 - Square each of the entries in the X column and enter the result in the X2 column
Step 3 - Square each of the entries in the Y column and enter the result in the Y2 column
Step 4 - In each row, multiply the entry in the X column by the entry in the Y column, and enter the result in the XY column
Step 5 - Add the entries in each column to find the sum (∑) of each column
Step 6 - Apply the following formula:
$r = \frac{\frac{\sum XY}{N} - \left(\frac{\sum X}{N}\right)\left(\frac{\sum Y}{N}\right)}{\sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} \cdot \sqrt{\frac{\sum Y^2}{N} - \left(\frac{\sum Y}{N}\right)^2}}$
OR
$r = \frac{\frac{\sum XY}{N} - (M_X)(M_Y)}{SD_X \cdot SD_Y}$
where
MX = mean of scores in X column
MY= mean of scores in Y column
SDX= standard deviation of scores in X column
SDY = standard deviation of scores in Y column
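Here is a minimal sketch of the second form of the formula, using hypothetical paired scores; it follows the six steps above for ungrouped data.

```python
from math import sqrt

X = [2, 4, 6, 8, 10]    # hypothetical scores, Column X
Y = [1, 3, 5, 7, 9]     # hypothetical scores, Column Y
N = len(X)

mx, my = sum(X) / N, sum(Y) / N
sd_x = sqrt(sum((x - mx) ** 2 for x in X) / N)
sd_y = sqrt(sum((y - my) ** 2 for y in Y) / N)

# r = (sum(XY)/N - Mx*My) / (SDx * SDy)
r = (sum(x * y for x, y in zip(X, Y)) / N - mx * my) / (sd_x * sd_y)
print(r)    # 1.0 -> a perfect positive relationship
```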
Spearman Rank-Difference Correlation:
This method is satisfactory when the number of scores to be correlated is small (less than 30). It is easier to compute with a small number of cases than the Pearson Product-Moment Correlation, and it is a simple, practical technique for most classroom purposes. To use the Spearman Rank-Difference Method, the following steps should be taken.
Computing Procedure for the Spearman Rank-Difference Correlation
Step 1 - Arrange the pairs of scores for each examinee in columns (Columns 1 and 2)
Step 2 - Rank examinees from 1 to N (number in group) for each set of scores
Step 3 - Find the difference (D) in ranks by subtracting the rank in the right-hand column from the rank in the left-hand column
Step 4 - Square each difference in rank to obtain the difference squared (D²)
Step 5 - Apply the formula
$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$
where D = difference in rank and N = total number of students
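Here is a sketch of the rank-difference procedure with hypothetical marks on two tests; the simple ranking used assumes no tied scores, the case the steps above describe.

```python
X = [88, 72, 95, 60, 81]    # hypothetical marks, test 1
Y = [84, 70, 90, 65, 80]    # hypothetical marks, test 2
N = len(X)

def ranks(values):
    # rank 1 = highest score; assumes no ties
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

d_squared = [(a - b) ** 2 for a, b in zip(ranks(X), ranks(Y))]
rho = 1 - (6 * sum(d_squared)) / (N * (N ** 2 - 1))
print(rho)    # 1.0 here: both tests rank the five examinees identically
```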
Self-Check Exercises
Dear Trainees! You have now completed your study of chapter six. Therefore, you are expected
to answer the following self-test questions.
UNIT SEVEN
RELIABILITY AND VALIDITY OF A TEST
Unit objectives
By the time you finish this unit you will be able to:
Define reliability of a test
State the various forms of reliability
Explain the factors that influence reliability measures
Compare and contrast the different forms of estimating reliability
Define validity as well as content, criterion and construct validity
Test Reliability
Reliability of a test may be defined as the degree to which a test is consistent, stable, dependable or trustworthy in measuring what it is measuring. This definition implies that the reliability of a test tries to answer questions like: How far can we rely on the results from the test? How dependable are scores from the test? How consistent are the items in the test in measuring whatever it is measuring? In general, reliability asks: if the abilities of a set of testees are determined by testing them at two different times using the same test, or by using two parallel forms of the same test, or by using scores on the same test marked by two different examiners, will the relative standing of the testees on each pair of scores remain the same?
Method           | Type of Reliability Measure     | Procedure
Test-retest      | Measure of stability            | Give the same test twice to the same group, with any time interval between tests
Equivalent-forms | Measure of equivalence          | Give two forms of the test to the same group in close succession
Split-half       | Measure of internal consistency | Give the test once; score two equivalent halves (say, odd- and even-numbered items); correct the reliability coefficient to fit the whole test by the Spearman-Brown formula
Kuder-Richardson | Measure of internal consistency | Give the test once; score the total test and apply a Kuder-Richardson formula
Factors Influencing Reliability Measures
The reliability of classroom tests is affected by several factors. These factors can be controlled through adequate care during test construction. Knowledge of these factors is therefore necessary for classroom teachers, to enable them to control the factors and thereby build more reliability into norm-referenced classroom tests.
Length of Test
The reliability of a test is affected by its length: the longer a test is, the higher its reliability will be. This is because a longer test provides a more adequate sample of the behaviour being measured, and the scores are apt to be less distorted by chance factors such as guessing. If the quality of the test items and the nature of the testees can be assumed to remain the same, then the relationship of reliability to length can be expressed by the simple formula stated as follows:
$r_{nn} = \frac{n \cdot r_{ii}}{1 + (n - 1)r_{ii}}$
where:
$r_{nn}$ = the reliability of a test n times as long as the original test
$r_{ii}$ = the reliability of the original test
n = the factor by which the length of the test is increased
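A small sketch of the formula, framed around its most common use named in the table above: in the split-half method, the two half-tests are correlated and the result is corrected to full length with n = 2.

```python
def spearman_brown(r_ii, n):
    """Reliability of a test n times as long as the original (formula above)."""
    return (n * r_ii) / (1 + (n - 1) * r_ii)

print(spearman_brown(0.60, 2))   # half-test r = .60 -> full test: 1.2 / 1.6 = 0.75
print(spearman_brown(0.75, 3))   # tripling a test with r = .75 -> 2.25 / 2.5 = 0.90
```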
Increasing the length of a test makes the scores depend more closely on the characteristics of the person being measured, so a more accurate appraisal of the person is obtained. However, lengthening a test is limited by a number of practical considerations: the amount of time available for testing, factors of fatigue and boredom on the part of the testees, and the inability of classroom teachers to construct more equally good test items. Nevertheless, within these limits, reliability can be increased as needed by lengthening the test.
Spread of Scores
The reliability coefficient of a test is directly influenced by the spread of scores in the group tested. The larger the spread of scores, the higher the estimate of reliability will be, if all other factors are kept constant. Larger reliability coefficients result when individuals tend to stay in the same relative position in the group from one testing to another. It therefore follows that anything that reduces the possibility of shifting positions in the group also contributes to a larger reliability coefficient: greater differences between the scores of individuals reduce the possibility of shifting positions. Hence, errors of measurement have less influence on the relative position of individuals when the differences among group members are large, that is, when there is a wide spread of scores.
Difficulty of Test
When a norm-referenced test is too easy or too difficult for the group taking it, it tends to produce scores of low reliability. This is so because both easy and difficult tests result in a restricted spread of scores. For an easy test, the scores are close together at the top of the scale, while for a difficult test, the scores are grouped together at the bottom end of the scale. Thus, for both easy and difficult tests, the differences among individuals are small and tend to be unreliable. Therefore, a norm-referenced test of ideal difficulty is desired, to enable the scores to spread out over the full range of the scale. This implies that classroom achievement tests should be designed to measure differences among testees. This can be achieved by constructing test items with an average score of about 50 percent and with scores ranging from zero to near perfect. Constructing tests that match this level of difficulty permits the full range of possible scores to be used in measuring differences among individuals, and the bigger the spread of scores, the greater the likelihood that the measured differences will be reliable.
Objectivity
This refers to the degree to which equally competent scorers obtain the same results in scoring a test. Objective tests easily lend themselves to objectivity because they are usually constructed so that they can be accurately scored by trained individuals and by the use of machines. For tests constructed using highly objective procedures, the reliability of the test results is not affected by the scoring procedures. The teacher-made classroom test likewise calls for objectivity, which is necessary for obtaining reliable measures of achievement. This is most obvious in essay testing and various observational procedures, where the results depend to a large extent on the person doing the scoring. Sometimes even the same scorer may get different results at different times. This inconsistency in scoring has an adverse effect on the reliability of the measures obtained: the resulting test scores reflect the opinions and biases of the scorer rather than the differences among testees in the characteristic being measured.
Validity of a Test
Validity is the most important quality you have to consider when constructing or selecting a test. It refers to the meaningfulness or appropriateness of the interpretations to be made from test scores and other evaluation results. Validity is the degree to which a test measures what it is intended to measure. It is always concerned with the specific use of the results and the soundness of our proposed interpretations. Hence, to the extent that a test score is determined by factors or abilities other than those the test was designed or used to measure, its validity is impaired.
Types of Validity
Validity has three basic concerns:
Determining the extent to which performance on a test represents the level of knowledge of the subject-matter content the test was designed to measure – the content validity of the test.
Determining the extent to which performance on a test represents the amount of the attribute being measured that the examinee possesses – the construct validity of the test.
Determining the extent to which performance on a test represents an examinee’s probable performance on some criterion task – the criterion-related (concurrent and predictive) validity of the test.
Factors Influencing Validity
Many factors tend to influence the validity of test interpretations, including factors in the test itself. The following factors in the test itself can prevent the test items from functioning as intended and thereby lower the validity of the interpretations made from the test scores:
Unclear directions on how examinees should respond to the test items
Reading vocabulary and sentence structure that are too difficult
Inappropriate level of difficulty of the test items
Poorly structured test items
Ambiguity leading to misinterpretation of test items
Test items inappropriate for the outcomes being measured
A test too short to provide an adequate sample
Improper arrangement of items in the test; and
Identifiable patterns of answers that encourage guessing.
SELF ASSESSMENT EXERCISE
PART I. READ THE FOLLOWING QUESTIONS AND GIVE SHORT AND PRECISE
ANSWERS.
1. Define the reliability of a test.
2. Mention methods of estimating reliability and the type of reliability measure associated
with each of them.
3. What are the factors that influence reliability measures?
4. Define the following terms:
i. Content Validity
ii. Criterion related Validity
iii. Construct Validity
5. What are the three main concerns of validity of a test?
6. What are the factors that affect validity?
7. Explain the relationship between reliability and validity.
PART II. READ EACH OF THE FOLLOWING QUESTIONS AND CHOOSE THE MOST
APPROPRIATE ANSWER FROM THE GIVEN ALTERNATIVES.
1. A form of correlation that shows equal-magnitude increases in both variables is called ______?
A. Perfect positive relation
B. Perfect negative relation
C. No relation at all
D. Very high positive relation
2. Which of the following is not true about correlation?
A. It measures association
B. It is scaled between −1 and +1
C. It shows the relation between two variables
D. It estimates the average value of two variables
3. Which one is not true about negative correlation?
A. As one variable increases, the other will decrease
B. The coefficient lies between −1 and 0
C. Both variables show increases in the same direction
D. Each variable may increase or decrease in the opposite direction
4. What is the meaning of correlation coefficient r=0.00?
A. Perfect positive relation
B. There is no relation at all
C. There is negligible relation
D. Perfect negative relation
5. Which reliability measure administers a single exam twice to the same group?
A. Equivalence
B. Stability
C. Internal consistency
D. Split half
6. In the split-half method of estimating reliability, the correlation between the even- and odd-numbered items is 0.64. What will be the reliability of the full test?
A. 0.74 B. 0.87 C. 0.64 D. 0.78
7. Which one of the following does not influence the reliability measures?
C. The variability of score distribution
D. The nature of the subject matter
9. Which type of validity is guaranteed if the test represents the entire content of the course?
A. Content validity
B. Criterion validity
C. Construct validity
D. None of the above
10. Which one of the following does not influence the reliability measures?
11. The method of computing the degree of relationship between two sets of scores is called __________?
12. The type of correlation in which, as one variable increases, the other decreases is called __________?
D. Comprehensiveness with which it measures the construct
17. A measure has high internal consistency reliability when:
A. Multiple observers obtain the same score every time they use the measure
B. Multiple observers make the same ratings using the measure
C. Participants score at the high end of the scale every time they complete the measure
D. Each of the items correlates with other items on the measure
UNIT EIGHT
JUDGING THE QUALITY OF A CLASSROOM TEST
Unit objectives
By the end of this unit, you will be able to:
Distinguish clearly among item difficulty, item discrimination and the distraction
power of an option
Recognize the need for item analysis, its place and importance in test development
Conduct item analysis of a classroom test
Calculate the value of each item parameter for different types of items
Appraise an item based on the results of item analysis
Judging the Quality of A Classroom Test
Item Analysis
Item analysis is the process of examining each test item to ascertain whether it is
functioning properly in measuring what the entire test measures. As already mentioned, item
analysis begins after the test has been administered and scored. It involves a detailed and
systematic examination of the testees’ responses to each item to determine the difficulty level
and discriminating power of the item.
Purpose and Uses of Item Analysis
Item analysis is usually designed to help determine whether an item functions as intended with respect to discriminating between high and low achievers in a norm-referenced test, and measuring the effects of instruction in a criterion-referenced test. It is also a means of identifying items that have the desirable qualities of a measuring instrument, items that need revision for future use, and even deficiencies in the teaching-learning process.
The Process of Item Analysis for Norm Referenced Classroom Test
To illustrate the method of item analysis, consider a class of 40 learners taking a 10-item test that has been administered and scored, with upper and lower groups of 25% each. The item analysis procedure might follow these basic steps (a short code sketch of the grouping steps follows the list).
Step 1. Arrange the 40 test papers by ranking them in order from the highest to the lowest score.
Step 2. Select the 10 papers with the highest total scores (upper 25% of the 40 testees) and the 10 papers with the lowest total scores (lower 25% of the 40 testees).
Step 3. Drop the middle 20 papers (the remaining 50% of the 40 testees) because they will no
longer be needed in the analysis.
Step 4. Draw a table as shown in table 8.1 in readiness for the tallying of responses for item
analysis.
Step 5. For each of the 10 test items, tabulate the number of testees in the upper and lower
groups who got the answer right or who selected each alternative (for multiple choice
items).
Step 6. Compute the difficulty of each item (percentage of testees who got the item right).
Step 7. Compute the discriminating power of each item (difference between the number of
testees in the upper and lower groups who got the item right).
Step 8. Evaluate the effectiveness of the distracters in each item (attractiveness of the incorrect
alternatives) for multiple choice test items.
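Steps 1 to 3 can be made concrete with a short sketch. The following Python fragment is illustrative only; the testee names and scores are hypothetical, not taken from the text.

```python
# A minimal sketch of Steps 1-3: rank the papers and form the upper and
# lower 25% groups. All names and scores here are hypothetical.
scores = {
    f"testee_{i:02d}": s
    for i, s in enumerate([9, 7, 4, 8, 6, 5, 10, 3, 7, 6] * 4)
}  # 40 testees on a 10-item test

ranked = sorted(scores, key=scores.get, reverse=True)  # Step 1: highest to lowest
k = int(len(ranked) * 0.25)                            # 25% of 40 = 10 papers
upper_group = ranked[:k]   # Step 2: the 10 papers with the highest scores
lower_group = ranked[-k:]  # Step 2: the 10 papers with the lowest scores
# Step 3: the middle 20 papers are dropped from the analysis.
```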
Computing Item Discriminating Power
The discriminating power of an item refers to the degree to which it discriminates between testees with high and low achievement. It is obtained from this formula:
\[ D = \frac{H - L}{n} \]
Where:
D = Item Discrimination Power
H = Number of high scorers who got the item right
L = Number of low scorers who got the item right
n = Number of testees in the upper or lower group
Hence, for item 1 in Table 8.1, the item discriminating power D is obtained thus:
\[ D = \frac{H - L}{n} = \frac{10 - 4}{10} = \frac{6}{10} = 0.60 \]
Item discrimination values range from −1.00 to +1.00. The higher the discrimination index, the better the item differentiates between high and low achievers. The item discriminating power takes a:
Positive value when a larger proportion of those in the high-scoring group get the item right compared to those in the low-scoring group;
Negative value when more testees in the lower group than in the upper group get the item right;
Zero (0) value when an equal number of testees in both groups get the item right; and
Value of one (1.00) when all testees in the upper group get the item right and all the testees in the lower group get the item wrong.
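As a computational sketch, the fragment below applies the formula to several items at once. The counts of correct answers (H and L) are hypothetical except for item 1, which reproduces the worked example above.

```python
# Item discrimination power D = (H - L) / n, where H and L are the numbers
# of correct answers in the upper and lower groups and n is the size of
# one group. Counts for items 2-5 are hypothetical.
upper_correct = [10, 8, 6, 9, 5]  # H for items 1-5
lower_correct = [4, 7, 6, 2, 8]   # L for items 1-5
n = 10                            # testees per group

for item, (h, lo) in enumerate(zip(upper_correct, lower_correct), start=1):
    d = (h - lo) / n
    print(f"Item {item}: D = {d:+.2f}")
# Item 1 gives D = (10 - 4) / 10 = +0.60, matching the worked example.
# The negative value for item 5 flags an item answered better by low scorers.
```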
Evaluating the Effectiveness of Distracters
The distraction power of a distracter is its ability to differentiate between those who do not know
and those who know what the item is measuring. That is, a good distracter attracts more testees
from the lower group than the upper group. The distraction power or the effectiveness of each
distracter (incorrect option) for each item could be obtained using the formula:
\[ D_o = \frac{L - H}{n} \]
Where:
$D_o$ = Distraction power of the option
H = Number of high scorers who selected the option
L = Number of low scorers who selected the option
n = Total number of examinees in the upper or lower group
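The same tabulation extends naturally to the distracters. In the sketch below, the option counts for a single four-option item are hypothetical, and option B is assumed to be the correct answer.

```python
# Distraction power Do = (L - H) / n for each incorrect option of one item.
upper_choices = {"A": 1, "B": 7, "C": 2, "D": 0}  # selections in the upper group
lower_choices = {"A": 4, "B": 2, "C": 3, "D": 1}  # selections in the lower group
key, n = "B", 10  # correct option and group size (both hypothetical)

for option in upper_choices:
    if option == key:
        continue  # the correct option is judged by D, not by Do
    do = (lower_choices[option] - upper_choices[option]) / n
    print(f"Option {option}: Do = {do:+.2f}")
# A positive Do means the distracter attracts more low scorers than high
# scorers, which is what an effective distracter should do.
```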
Item Analysis and Criterion Referenced Mastery Tests
The item analysis procedures we used earlier for norm-referenced tests are not directly applicable to criterion-referenced mastery tests. In this case, indexes of item difficulty and item discriminating power are less meaningful, because criterion-referenced tests are designed to describe learners in terms of the types of learning tasks they can perform, unlike norm-referenced tests, where a reliable ranking of testees is desired.
Item Difficulty
In criterion-referenced mastery tests, the desired level of difficulty of each test item is determined by the learning outcome it is designed to measure and not, as stated earlier for norm-referenced tests, by the item’s ability to discriminate between high and low achievers. The standard formula for determining item difficulty can still be applied here, but the results are not usually used to select test items or to manipulate item difficulty. Rather, the result is used for diagnostic purposes. Also, when instruction is effective, most items will have a large difficulty index, with a large percentage of the testees passing the test.
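For reference, the difficulty index here is the one defined in Step 6 of the norm-referenced procedure: the percentage of testees who got the item right. In symbols (the letters R and T are labels adopted here for illustration, not notation from the original),

\[ P = \frac{R}{T} \times 100\% \]

where R is the number of testees who answered the item correctly and T is the total number who attempted it. After effective instruction on a mastery objective, values of P near 100 percent are expected rather than avoided.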
Item Discriminating Power
As you know, the ability of test items to discriminate between high and low achievers is not crucial to evaluating the effectiveness of criterion-referenced tests; some of the best items might have low or zero indexes of discrimination. This usually occurs when all testees answer a test item correctly at the end of the teaching-learning process, implying that both the teaching-learning process and the item are effective. Such items provide useful information concerning the testees’ mastery, unlike in the norm-referenced test, where they would be eliminated for failing to discriminate between the high and the low achievers. Therefore, the traditional indexes of discriminating power are of little value for judging the quality of criterion-referenced test items, since the purpose and emphasis of a criterion-referenced test is to describe what learners can do rather than to discriminate among them.
Analysis of Criterion Referenced Mastery Items
Ideally, a criterion-referenced mastery test is analyzed to determine the extent to which the test items measure the effects of instruction. In order to provide such evidence, the same test items are given before instruction (pretest) and after instruction (posttest), and the results of the two administrations are compared. The analysis is done by the use of an item response chart. The chart is prepared by listing the item numbers across the top, listing the testees’ names or identification numbers down the side, and recording correct (+) and incorrect (−) responses for each testee on the pretest (B) and the posttest (A). This is illustrated in Table 8.2 for an arbitrary group of 10 testees.
An index of item effectiveness for each item is obtained by using the formula for a measure of
Sensitivity to Instructional Effects (S) given by:
\[ S = \frac{R_A - R_B}{T} \]
Where:
$R_A$ = Number of testees who got the item right after the teaching-learning process.
$R_B$ = Number of testees who got the item right before the teaching-learning process.
T = Total number of testees who tried the item both times.
Usually, for a criterion-referenced mastery test, the index of sensitivity to instructional effects is interpreted as follows:
An ideal item yields a value of 1.00;
Effective items fall between 0.00 and 1.00 – the higher the positive value, the more sensitive the item to instructional effects; and
Items with zero or negative values do not reflect the intended effects of instruction.
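As a worked illustration with hypothetical counts: if 2 of the 10 testees who tried an item both times got it right on the pretest and 9 got it right on the posttest, then

\[ S = \frac{R_A - R_B}{T} = \frac{9 - 2}{10} = 0.70 \]

a high positive value, indicating that the item is quite sensitive to the effects of instruction.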
Building A Test Item File (Item Bank)
This entails the gradual collection and compilation, over time, of items that have been administered, analyzed, and selected on the basis of their effectiveness and the psychometric characteristics identified through item analysis. A file of effective items can be built and maintained easily by recording each item on an item card, adding the item analysis information, and indicating both the objective and the content area the item measures, so that the file can be organized by both content and objective categories. This makes it possible to select items in accordance with any table of specifications in the particular area covered by the file.
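One simple way of keeping such records is sketched below. The fields follow the information this section says an item card should carry; the class name, field names, and example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ItemRecord:
    """One item-card entry in the test item file (item bank)."""
    stem: str              # the question text
    objective: str         # instructional objective the item measures
    content_area: str      # content category for the table of specifications
    difficulty: float      # difficulty index from item analysis
    discrimination: float  # item discrimination power D

# A hypothetical entry; in practice a card is added after each item analysis.
bank = [
    ItemRecord("Define the reliability of a test.", "Recall of definitions",
               "Reliability", 0.55, 0.40),
]
# Items can later be filtered by objective and content area to match
# any table of specifications.
reliability_items = [item for item in bank if item.content_area == "Reliability"]
```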
Building an item file is a gradual process that progresses over time. At first it seems to be additional work without immediate usefulness, but with time its usefulness becomes obvious, once it is possible to start using some of the items in the file and supplementing them with newly constructed ones. As the file grows into an item bank, most of the items can be selected from the bank without frequent repetition. Some of the advantages of an item bank are that:
Parallel tests can be generated from the bank, allowing learners who were ill or otherwise unavoidably absent for a test to take it later;
They are cost effective since new questions do not have to be generated at the same rate
from year to year;
The quality of items gradually improves with modification of the existing ones with time;
and
The burden of test preparation is considerably lightened when enough high quality items
have been assembled in the item bank.
SELF ASSESSMENT EXERCISE
PART I. READ THE FOLLOWING QUESTIONS AND GIVE SHORT AND PRECISE
ANSWERS.
1. Explain the meaning of item analysis.
2. List and explain the purposes of item analysis.
3. Describe the norm-referenced item analysis procedure.
4. When do teachers use the norm-referenced and the criterion-referenced item analysis procedures?
5. How can you compute item difficulty and item discrimination?
6. How do you evaluate the effectiveness of a distracter in item analysis?
7. Explain the purposes of building an item bank.
8. Explain the relationship between reliability and validity.
9. What is the difference between index of discriminating power (D) and index of
sensitivity to instructional effects (S)?
10. Do you think that item analysis could help teachers to improve their skill of classroom
test preparation? Why?