Assessment and
Evaluation of
Learning
DR OMUNDI ESTHER
What is Assessment?
• Assessment is the systematic collection, review, and use of
information about educational programs undertaken for the
purpose of improving learning and development. (Palomba
& Banta, 1999)
• Educational assessment is the process of determining the
extent to which learners have acquired specific skills,
knowledge and attitudes. It includes all the ways teachers
use to determine what students know and what they can do.
(Ogula & Munene, 1999)
Definition of Assessment Cont.
• Assessment is the process of gathering and discussing information from
multiple and diverse sources in order to develop a deep understanding of
what students know, understand, and can do with their knowledge as a
result of their educational experiences; the process culminates when
assessment results are used to improve subsequent learning. (The University of Oregon, USA)
• Task 1: Students to explain the main similarities that run through all these definitions of assessment
Types of Assessment
• Diagnostic Assessment: This is an assessment carried out before instruction to determine whether or not students possess certain entry behaviours, and during instruction to help the teacher determine the difficulties pupils are experiencing.
• Formative Assessment: This takes place during instruction to provide feedback to teachers and students on students' progress towards attainment of the desired objectives and to identify areas that need further attention.
Types of Assessment Contd.
• Summative assessment: This is carried out at the end of a unit, chapter, year, term or course to measure pupils' progress during a given time span. It is used mainly for the purposes of grading learners, certifying learners, judging teacher effectiveness and comparing the performance of pupils, schools and districts.
• Informal assessments: These are assessments that are not planned. They take place during instruction. Informal assessment is most often used to provide formative feedback. As such, it tends to be less threatening and thus less stressful to the student.
Types of Assessment Contd.
• Formal assessments: These are planned assessments. Formal assessment occurs when students are aware that the task they are doing is for assessment purposes, e.g., a written examination. Most formal assessments are also summative in nature and thus tend to have greater motivational impact and are associated with increased stress.
Types of Assessment Contd.
• Continuous assessment: Continuous assessment occurs throughout a learning experience (intermittent is probably a more realistic term). Continuous assessment is most appropriate when student and/or instructor knowledge of progress or achievement is needed to determine the subsequent progression or sequence of activities (McAlpine, 2002).
• Continuous assessment provides both students and teachers with the information needed to improve teaching and learning while they are in progress.
• Task 2: Students to discuss types of assessment and give
examples
Purposes of Assessment
• According to Ogula (1999) assessment is done for the following reasons:
1. To identify problem areas in student achievement in order to rectify them
2. To identify reasons why students are encountering specific difficulties in a subject, for purposes of providing feedback to teachers, students and parents
3. To reward good performance and encourage those with poor performance to improve
4. To provide feedback to teachers on the effectiveness of their teaching
5. To determine promotion to the next grade
Purposes of Assessment Contd.
6. To monitor the knowledge, attitudes and skills acquired by the students in the course of instruction
7. To evaluate students' progress at school and recommend ways to improve their learning
8. To evaluate the progress of schools, districts and counties in achieving curricular objectives
9. To identify curriculum areas that may need revision.
Features of a good assessment
• Validity
• Reliability
• Practicability
• Economy
• Comparability and interpretability
What is a valid test?
• A test is valid when it measures what it is supposed to measure. For example, if a test is supposed to measure the reading ability of class three pupils, then it is valid when it does that, and not when it measures the reading ability of class five pupils.
• Content validity of a test: This refers to the degree to
which an instrument measures the subject matter that has
been taught. The content validity of a test is established
by examining the test items to see whether they
correspond to what the user feels should be covered by
the test.
Content validity Continued
• There are two types of content validity, namely: Face
validity and sampling validity.
• Face validity: Face validity is concerned with the extent
to which an instrument measures what it appears to
measure according to the researcher’s subjective
assessment. To evaluate the face validity of your
instrument, you should review each item in the instrument
to assess the extent to which it is related to what you wish
to measure. After this, it may be necessary to consult
experts.
Content Validity continued
• Sampling validity refers to the degree to which a
measure adequately samples the subject matter content
and the behavior changes under consideration. Sampling
validity is commonly used in evaluating achievement tests.
The setter must ensure that the test includes questions
on all the materials covered in the course.
How can we determine content validity in the real world?
• It is done by asking a panel of experts in the field of study, such as teachers, inspectors and curriculum developers, to examine the items critically and judge whether they represent the content of the property being measured. Another method of determining the content validity of an instrument is to have a group of experts in the field rate each test item in terms of its relevance to the content taught, for example on a four-point scale: (i) not relevant, (ii) somewhat relevant, (iii) quite relevant, (iv) very relevant.
• The ratings of the same content by two different experts can then be correlated using Pearson's product-moment correlation coefficient.
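• A minimal sketch of how the two experts' ratings might be correlated in practice, assuming each expert has rated the same ten hypothetical items on the four-point scale above; the ratings, the variable names and the use of scipy are illustrative assumptions, not part of the procedure described in the slides.

```python
# Minimal sketch: correlating two experts' relevance ratings of the same test
# items using Pearson's product-moment correlation coefficient.
# Ratings are hypothetical: 1 = not relevant ... 4 = very relevant.
from scipy.stats import pearsonr

expert_a = [4, 3, 4, 2, 3, 4, 1, 3, 4, 2]   # Expert A's ratings of items 1-10
expert_b = [4, 3, 3, 2, 4, 4, 2, 3, 4, 1]   # Expert B's ratings of the same items

r, p_value = pearsonr(expert_a, expert_b)
print(f"Pearson's r = {r:.2f}")  # values close to +1 indicate that the experts agree
```
• A high positive correlation would suggest the experts agree on which items are relevant to the content taught, supporting the content validity of the test.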
Criterion-related validity
• This is the degree to which a measure is related to some other standard or criterion that is known to indicate the construct accurately. Criterion-related validity is established by comparing an instrument with another instrument that is known to be an accurate measure of the same construct. Suppose that a researcher wants to find out the attitudes of teachers towards the teaching profession. To evaluate the criterion-related validity of the attitude scale, the researcher should find out whether the new instrument is related to other measures of attitude towards the teaching profession.
Criterion-related validity of tests continued
• There are two types of criterion-related validity: concurrent validity and predictive validity.
• Concurrent validity: This refers to the extent to which a new measuring instrument is related to an existing measure of the same construct. To determine the concurrent validity of a test, you should administer the test to a sample of students and then correlate the results with several criterion scores. For example, if a teacher wants to establish whether students' scores on an achievement test are valid, he/she should correlate the results with the pupils' scores on other tests.
Criterion-related validity of tests continued
• Predictive validity: This refers to the extent to which a measuring instrument predicts future outcomes that are logically related to the construct. The predictive validity of a measuring instrument such as a test is evaluated by checking scores on the instrument against the subjects' future performance. For example, to establish the predictive validity of the KCPE examination, a researcher should determine whether pupils who scored highly in the KCPE also scored highly in the KCSE examination.
Construct validity of a test
• Construct validity refers to the accuracy with which a test measures some meaningful traits (or constructs) in the individual. Traits or constructs include aspects of human behaviour such as assertiveness, sociability and verbal reasoning. These fall under psychology and psychological tests and are therefore outside the concerns of this course.
Reliability of a test
• This is the accuracy or precision of a measuring instrument. Reliability refers to the consistency with which a measuring instrument yields the same results for an individual to whom the instrument is administered several times. Reliability is the degree to which an instrument yields the same results on repeated trials. When repeated measures of the same thing give similar results, the instrument is said to be reliable. However, if a teacher gives the same test on different occasions and gets different results, the instrument contains measurement errors. Reliability refers to the results obtained with a research instrument, not to the instrument itself.
Types of reliability of a test
• There are two types of reliability: Repeated measures and internal
consistency
• Repeated measures: In repeated measures, two tests must be administered and their results correlated to determine the extent to which they are related. There are basically three types of repeated measures:
• Test – retest
• Alternative forms
• Parallel forms
Types of reliability of a test
Continued
• Test-Retest
• The same instrument is administered to the same individuals on different occasions, and the scores obtained by the same persons on the two administrations are correlated.
• Alternative forms
• The questions in the test are reordered to create an alternative form of the first test.
• Parallel forms
• Measures which are exactly equivalent to each other and are administered
to the same group of individuals on the same occasion.
Types of reliability of a test
Continued
• Internal consistency is estimated by determining the degree to which each item in a scale correlates with each other item. Internal consistency reliability is based on a single administration of a measure. It determines the degree of homogeneity among the items measuring a construct. There are three types of internal consistency reliability: split-half, Kuder-Richardson and Cronbach's Alpha.
Types of reliability of a test
Continued
• Split-half: A measure is split into two parts. Each of them is treated as a separate scale and scored accordingly. Reliability is then estimated by correlating the scores of the two halves, and the Spearman-Brown formula is used to estimate the reliability of the full test (see the sketch below).
• The Kuder-Richardson method: This is used in tests where there is a right and a wrong answer.
• Cronbach's Alpha: Used where there is no right or wrong answer, such as in an attitude scale.
• In internal consistency methods, only one test is administered.
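• A minimal sketch of the split-half procedure and Cronbach's Alpha under stated assumptions: six hypothetical students' right/wrong item scores are split into odd- and even-numbered halves, the halves are correlated, and the Spearman-Brown formula, full-test reliability = 2r / (1 + r), is applied. The data, the odd/even split and the use of numpy are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: split-half reliability with the Spearman-Brown correction,
# plus Cronbach's Alpha, from a single test administration.
import numpy as np

# scores[i][j] = 1 if student i answered item j correctly, else 0 (hypothetical data).
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
])

# Split the test into two halves (here: odd-numbered vs even-numbered items).
half_a = scores[:, 0::2].sum(axis=1)   # each student's total on one half
half_b = scores[:, 1::2].sum(axis=1)   # each student's total on the other half

# Correlate the two half-test scores across students.
r_half = np.corrcoef(half_a, half_b)[0, 1]

# Spearman-Brown formula estimates the reliability of the full-length test.
r_full = 2 * r_half / (1 + r_half)

# Cronbach's Alpha: homogeneity among all items, also from one administration.
k = scores.shape[1]
alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                         / scores.sum(axis=1).var(ddof=1))

print(f"split-half r = {r_half:.2f}, Spearman-Brown = {r_full:.2f}, alpha = {alpha:.2f}")
```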
What is evaluation?
• Evaluation is the process of determining the value of something. More specifically, in the field of education, evaluation means measuring or observing a process in order to judge its value by comparing it with others or with some kind of standard (Weir & Roberts, 1994).
• The focus of evaluation is not on the grades themselves. Rather, it is a final process intended to establish the quality of the teaching and learning process. That quality is often judged through grades, so an evaluation may take the form of a paper that is given grades and tests the knowledge of each student.
Evaluation and assessment
• When you have your students write on a given topic, you are collecting information; this is what we mean here by assessment (Kizlik, 2010; Richards & Schmidt, 2002; Weir & Roberts, 1994).
• Evaluation, on the other hand, is recognized as a more scientific process aimed at determining what can be known about performance capabilities and how these are best measured. Evaluation is concerned with issues of validity, accuracy, reliability, analysis and reporting. It can therefore be seen as the systematic gathering of information for purposes of decision-making, using both quantitative methods (tests) and qualitative methods (observations, ratings and value judgements), with the purpose of judging the gathered information.
Why evaluation?
• To make judgements about a teaching and learning programme
• To make improvements to a programme
• To change the programme altogether
• Task 3: Learners to discuss the differences between assessment and evaluation and to give examples from their countries of how programme evaluations have led to improvements or changes in the curriculum
What is measurement?
• After collecting data from students, there is then the need to assign numbers or other symbols to certain characteristics of the objects of interest, according to some specified rules, in order to reflect quantities of those properties. This is called measurement, and it can be applied to students' achievement, personality traits or attitudes. Measurement, then, is the process of determining a quantitative or qualitative attribute of an individual or group of individuals that is of academic relevance.
Definition of measurement
contd.
• Stevens (1951) defines measurement as a procedure in which one assigns numerals, numbers or other symbols to empirical properties according to rules.
• Unlike evaluation and assessment, measurement does not require making a judgement about the scores obtained. For example, we measure students' achievement in mathematics by counting the number of items in a maths test that a student has answered correctly. If the student has answered 5 out of 10 correctly, we do not jump to the conclusion that the student is stupid.
The characteristics of
Measurement
• All measurements contain errors
• All measurements are approximate
• The results of repeated measurements do not agree
exactly
Sources of error in
measurement
• Personal errors such as the manipulative skills of a person
making the measurement.
• Errors in the instrument
• Using the instrument under circumstances or conditions
in which it was not meant to be used.
Measurement scales
Nominal Scale: The nominal scale is a set of names of categories which may be identified by numbers. In education, subjects and other entities are usually categorized. Numbers are then assigned to represent different categories, but these numbers do not have the properties of real number systems. Nominal scales merely identify different categories; they have no property of magnitude. Some of the characteristics which may be identified by numbers are sex, place of birth, location of the school, type of school, etc. We can assign 1 for male and 2 for female, and so on.
Measurement Scales Contd.
Ordinal Scales: They rank objects or observations in order of magnitude. Examples of ordinal scales are measurements ranked by grades A, B, C, D, etc. Ordinal scales have the properties of magnitude and identity, but they have no equal intervals or true zero point and consequently do not show the distances between the values.
Interval scale: Interval scales are scales of measurement which provide information about the distances between the units and the ordering of the magnitude of the measures, but they lack an absolute zero point. Examples of an interval scale are measurements of temperature, intelligence and attitude scores. The scale used to measure temperature does not have a true zero point indicating a total absence of heat; it has only an arbitrary zero point.
Measurement scales Contd.
• Ratio scale: The ratio scale has all the properties of measurement, namely identity, magnitude, equal intervals and a non-arbitrary zero point. Ratio scales involve mainly physical measurements such as height, weight, age and number of respondents.
• Task 4: Students to discuss and give examples of measurement scales at each level