CLASS PRESENTATION - Test Reliability
TEST RELIABILITY
Introduction
Today we are going to talk about test reliability. As future educators, we will be creating
tests and exams for our students, and it is important that we understand the concept of test
reliability to ensure that the assessments we create are valid and accurate.
For example, suppose you have two children, and each time you send them to get a piece of
information for you, one of them always returns with information you can trust, while the other
returns with information you cannot depend on. Which of the two children would you say is
reliable?
The same can be said about tests, exams, and research findings. One important aspect of our
research findings is the dependability or trustworthiness of the results obtained from them. To
establish a high level of trustworthiness in our tests or research findings, the measuring or
research instrument must itself possess the property of dependability or trustworthiness. This
property of dependability or trustworthiness is simply called reliability.
So, what exactly is test reliability? Test reliability refers to the consistency of results from
a test. In other words, if the same test is given to the same group of people at different times, the
results should be similar. In simple terms, reliability is consistency.
Test scores are reliable to the extent that they are consistent over repeated administrations of the test.
Why is test reliability important? Test reliability is important because it ensures that the
assessments we create are accurate and valid. If a test is not reliable, then the results may not be
an accurate representation of what the student actually knows. It also ensures that the results
obtained from a test are dependable, consistent, and free from errors.
Assumptions for Test Reliability
1. Stability Assumption:
Assumes that the trait being measured is stable over time and does not change
significantly between test administrations.
2. Homogeneity Assumption:
Assumes that the construct being measured is consistent across all items or tasks
within the test.
3. Consistency Assumption:
Assumes that the test-takers respond consistently to the test items, regardless of
external factors or variations in test administration.
The choice of a type of reliability may depend on the number of times the instrument will
be administered (which may be either once or twice), or on whether we are evaluating agreement
among persons (raters) and/or agreement among items.
Stability (Test-Retest) Reliability
This is the extent to which the scores on the same test are consistent over time. It involves
giving one group the same instrument or test at two different times, and then correlating the two
sets of scores. This type of reliability is, therefore, an evaluation of the stability of individuals'
scores on the same instrument across the two administrations.
There is no generally accepted rule of thumb for the time interval between the first and second
administrations, but a period of 2 to 6 weeks is commonly used.
The reliability statistic employed here is the bivariate correlation (e.g. Pearson’s Product
Moment Correlation, PPMC).
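As a quick illustration, here is a minimal Python sketch of the test-retest computation. The scores are hypothetical (ten students, two administrations a few weeks apart), and scipy.stats.pearsonr is used for the PPMC; the same correlation step would also apply to the equivalent-forms scores discussed next.

```python
# Minimal sketch: test-retest (stability) reliability via Pearson's r.
# The scores below are hypothetical, for illustration only.
from scipy.stats import pearsonr

# Scores of the same 10 students on the first and second administrations
first_admin = [12, 15, 9, 18, 14, 11, 16, 13, 17, 10]
second_admin = [13, 14, 10, 17, 15, 11, 15, 12, 18, 9]

# Pearson's Product Moment Correlation (PPMC) between the two sets of scores
r, p_value = pearsonr(first_admin, second_admin)
print(f"Test-retest reliability coefficient: r = {r:.2f}")
```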
Equivalence (Alternate Forms) Reliability
This is the degree to which two similar or equivalent forms of an instrument are consistent.
It could also be referred to as Parallel Forms Reliability. Hence, it establishes the relationship
between two versions of a test or research instrument (about the same construct) intended to be
equivalent. Alternate-forms reliability answers the question, “To what extent do test takers
who perform well on one edition of the test also perform well on another edition?”
This method involves administering the two versions of the same instrument once to a
single group at the same time or almost the same time. The two sets of scores obtained are
statistically correlated using bivariate correlation (e.g., PPMC).
Example: In a psychology class, one group of students takes Test A, and the same group
takes Test B. Both tests cover the same content but have different questions. Equivalent-forms
reliability helps determine whether the two tests yield similar results.
Equivalence and Stability
As the name implies, it somewhat combines the equivalent (alternate form) and stability
(test-retest) forms. It is aimed at establishing the relationship between equivalent versions of an
instrument administered to a single group at two different times; such that one version is
administered at a time, while the other version is administered at a later time.
Example: In a psychology class, one group of students takes Test A, and the same group
takes Test B several weeks later. Both tests cover the same content but have different questions.
Equivalence and stability reliability helps determine whether the two versions yield similar results over time.
Internal Consistency
It is the degree to which the items of an instrument are consistent among themselves and
with the test as a whole. It measures the extent to which the items are similar to one another in
content. It involves administering the instrument once to a single group, and then applying any of
these approaches:
Split-half reliability — after administering the instrument, (a) split the scores into two halves,
usually the scores on the odd- and even-numbered items, (b) correlate the two sets of half-test
scores using PPMC, (c) then apply the Spearman-Brown prophecy formula to estimate the
reliability of the full-length test (see the sketch below).
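Below is a minimal Python sketch of the split-half procedure, assuming hypothetical 0/1 item responses for six students on a ten-item test; numpy is used for the correlation, and the Spearman-Brown step-up is written out directly.

```python
# Minimal sketch: split-half reliability with the Spearman-Brown correction.
import numpy as np

# Hypothetical 0/1 responses: rows = students, columns = items 1..10
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1, 1, 1],
])

# (a) Split into halves: odd-numbered items vs. even-numbered items
odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, 7, 9
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10

# (b) Correlate the two sets of half-test scores (PPMC)
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# (c) Spearman-Brown prophecy formula: estimate full-length-test reliability
r_full = (2 * r_half) / (1 + r_half)
print(f"Half-test r = {r_half:.2f}, full-test reliability = {r_full:.2f}")
```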
Cronbach’s alpha (Kuder-Richardson) reliability — after administering the instrument once, the
reliability is estimated from the item statistics. For dichotomously scored items (the
Kuder-Richardson KR-20 form), the coefficient is

alpha = (k / (k − 1)) × (1 − Σ pi qi / σ²X),

where k is the number of items; pi, referred to as the item difficulty, is the proportion of
examinees who answered item i correctly; qi = 1 − pi; and σ²X is the variance of the total test
scores.

Example: To illustrate, suppose that a five-item multiple-choice exam was administered with the
following percentages of correct response: p1 = .40, p2 = .50, p3 = .60, p4 = .75, p5 = .85, and
σ²X = 1.84. Cronbach’s alpha would be calculated as follows:

Σ pi qi = (.40)(.60) + (.50)(.50) + (.60)(.40) + (.75)(.25) + (.85)(.15) = 1.045
alpha = (5 / 4) × (1 − 1.045 / 1.84) ≈ .54
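As a quick check of the worked example, here is a short Python sketch that reproduces the calculation from the item difficulties and total-score variance given above.

```python
# Check of the worked example: KR-20 form of Cronbach's alpha.
k = 5                                   # number of items
p = [0.40, 0.50, 0.60, 0.75, 0.85]      # item difficulties p_i from the example
var_total = 1.84                        # variance of the total scores, sigma^2_X

# For a dichotomous item, the item variance is p_i * (1 - p_i) = p_i * q_i
sum_item_var = sum(pi * (1 - pi) for pi in p)

alpha = (k / (k - 1)) * (1 - sum_item_var / var_total)
print(f"Cronbach's alpha = {alpha:.2f}")   # about 0.54
```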
Cronbach’s alpha ranges from 0 to 1.00, with values close to 1.00 indicating high
consistency. Professionally developed high-stakes standardized tests should have internal
consistency coefficients of at least .90. Lower-stakes standardized tests should have internal
consistencies of at least .80 or .85. For a classroom exam, it is desirable to have a reliability
coefficient of .70 or higher.
McDonald’s omega reliability - this is closely related to Cronbach’s alpha, but the McDonald’s
omega formula is applied instead.
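The sketch below shows how omega can be computed once a one-factor model has been fitted to the items; the standardized loadings used here are hypothetical values chosen only for illustration, not estimates from real data.

```python
# Minimal sketch: McDonald's omega for a unidimensional (one-factor) model.
# Hypothetical standardized factor loadings for five items
loadings = [0.70, 0.65, 0.80, 0.60, 0.75]

# Unique (error) variance of each item under standardized loadings
error_var = [1 - l ** 2 for l in loadings]

# Omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]
sum_loadings_sq = sum(loadings) ** 2
omega = sum_loadings_sq / (sum_loadings_sq + sum(error_var))
print(f"McDonald's omega = {omega:.2f}")
```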
Inter-Rater and Intra-Rater Reliability
Inter-rater (between raters) reliability evaluates the degree of agreement between two or more
raters who independently score the same measure, behaviour or test. On the other hand, intra-rater
(within rater) reliability evaluates how consistently the same rater assigns a score to a measure,
behaviour or test at two or more different times. The statistics often applied are Spearman’s rho,
Cohen’s kappa, Krippendorff’s alpha, or the Intra-class Correlation Coefficient (ICC).
Example: ICC can be used to assess the agreement among the scores assigned by different
judges in a figure skating competition (an inter-rater example).
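Below is a minimal Python sketch of a rater-agreement computation, assuming hypothetical pass/fail ratings from two judges on the same ten performances and using scikit-learn's cohen_kappa_score.

```python
# Minimal sketch: inter-rater agreement with Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings by two judges of the same 10 performances
rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "fail", "pass", "fail", "fail"]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```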
References
Chukwuedo, S. O. (2021). Conceptualizing the forms of reliability and its types in quantitative
behavioral, education and social research. Research-Statistics Mind.
https://youtu.be/0qcYNJa1a7l