RELIABILITY
Title: "Halley's Comet"
In a small rural town, excitement buzzed as news spread about the impending arrival of Halley's
Comet. The townspeople were eager to witness this once-in-a-lifetime event. Among them was
an elderly couple, Mr. and Mrs. Thompson.
Mr. Thompson, a retired astronomer, had spent weeks preparing his telescope for the comet's
arrival. He wanted to share this special moment with his wife, who had no interest in astronomy
but adored her husband. The couple's communication was usually spot-on, but this time,
miscommunication loomed.
One evening, Mr. Thompson excitedly called out to his wife, "Honey, the comet will be visible
tonight! Come outside and take a look through the telescope with me!"
Mrs. Thompson, engrossed in her book, replied without looking up, "Sure, dear, I'll come out
right after I finish this chapter."
Hours passed, and Mr. Thompson set up his telescope, eagerly anticipating his wife's arrival.
Meanwhile, Mrs. Thompson, unaware of the comet's significance, thought her husband had been
referring to a regular star. She finally closed her book and stepped outside, expecting a routine
stargazing session.
As she approached the telescope, Mr. Thompson excitedly pointed to the night sky. "There it is,
Halley's Comet! Isn't it magnificent?"
Mrs. Thompson squinted through the telescope and saw what appeared to be an ordinary star.
She nodded politely, not wanting to disappoint her husband, and said, "Yes, dear, it's lovely."
Unbeknownst to each other, their definitions of "lovely" differed drastically. Mr. Thompson saw
the breathtaking beauty of the comet, while Mrs. Thompson simply saw a distant twinkle in the
night sky.
As the comet moved across the heavens, Mr. Thompson continued to share fascinating facts
about its history and significance. Mrs. Thompson listened attentively, trying her best to engage
in the conversation even though she didn't fully understand or share her husband's enthusiasm.
In the end, they both walked back to their house with smiles on their faces, each believing they
had shared a special moment. Though they had seen the same celestial event, their perspectives
and the depth of their understanding remained worlds apart.
“Consistency in measurement.”
▪ Reliability Coefficient
▪ An index of reliability
▪ Best statistic to use? VARIANCE
▪ it describes sources of test score variability
▪ it can be broken into components of:
▪ true variance – variance from true differences
▪ error variance – variance from irrelevant sources
▪ Reliability: the proportion of the total score variance attributed to true variance
▪ Reliability Coefficient
▪ An index of reliability
▪ A proportion that indicates the ratio between the true score variance on a test and the total variance
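In formula form (a standard classical test theory expression, not written out in the original notes; σ² denotes variance and rxx the reliability coefficient, matching the symbols used in the SEM section below):
rxx = σ²(true) / σ²(total) = σ²(true) / [σ²(true) + σ²(error)]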
A. Sources of Error Variance
1. Test Construction: differences in test items; item sampling (variation among items within a test, e.g., wording); content sampling (variation among items between tests)
2. Test Administration: attention or motivation, test environment (room temperature,
level of lighting, amount of ventilation and noise), test taker variables (emotional
problems, physical discomfort, lack of sleep, medication), examiner-related variables
(physical appearance and demeanor, presence or absence)
3. Test Scoring: scorers (subjectivity) and scoring system (technical glitches if computer-scored; procedural errors if hand-scored)
4. Other Sources: sampling error (the sample does not represent the population, e.g., a voter sample); methodological error (e.g., untrained interviewers, ambiguous wording in a questionnaire); systematic error (internal agreement, underreporting, overreporting); nonsystematic error (forgetting, failing to recognize behavior as abusive, misunderstanding of instructions)
B. Reliability Estimates
1. Test-Retest Reliability
▪ it is an estimate of reliability obtained by correlating pairs of scores from the same
people on two different administrations of the test
▪ it is appropriate when evaluating the reliability of a test that purports to measure
something that is relatively stable over time
▪ Example: personality trait
▪ Purpose: to evaluate the stability of a measure
▪ Typical Use: when assessing the stability of various traits
▪ Method/Procedures: 1. Administer the psychological test; 2. Get results; 3. Allow an interval (time gap); 4. Re-administer; 5. Get results; 6. Correlate
▪ Statistical Procedures: Pearson r or Spearman rho (Spearman rank); called a coefficient of stability when the interval is more than 6 months
▪ Error Variance: passage of time and test administration
▪ Disadvantages:
▪ Checking of answers
▪ Practice Effect
▪ Passage of Time
▪ Memory
▪ Fatigue
▪ Motivation
Example:
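The worked example is not reproduced in these notes; as a minimal sketch with invented scores (not real data), a test-retest correlation could be computed like this:

```python
# Hypothetical scores for the same 5 examinees tested twice, a few weeks apart.
time1 = [12, 18, 25, 30, 22]
time2 = [14, 17, 27, 29, 21]

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# A high positive coefficient suggests the trait is stable over the interval.
print(round(pearson_r(time1, time2), 2))
```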
2. Parallel-Form and Alternate-Form Reliability
▪ Coefficient of Equivalence: the degree of relationship between various forms of a test, obtained by means of alternate or parallel forms
▪ Parallel Forms: exist when for each form of the test, the means and the variances of
observed test scores are equal.
▪ Alternate Forms: different versions of a test that have been constructed so as to be
parallel; designed to be equivalent with respect to content and level of difficulty
▪ How to make alternate form?
▪ Same Number of Items: test 1: 100; test 2: 100
▪ Same Format: Likert Scale, Dichotomous Items
▪ Same Type: Achievement, Intelligence
▪ Same Language: English, Filipino
▪ Same Content: Has items that measure the same variable
▪ Same Level of Difficulty: both forms contain easy and difficult items in comparable proportions
▪ How to construct an alternate form?
1. Construct/Administer Original Test
2. Compute Item difficulty for each item
3. Construct/Administer the Alternate Form (Clone) to the same population
4. Compute item difficulty of alternate form items
5. Match items according to difficulty
6. Find new population
7. Administer the original test and alternate form
8. Correlate scores of the original and alternate tests
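As a rough sketch of steps 2, 4, and 8 (item difficulty taken as the proportion of testtakers answering an item correctly, then total scores on the two forms correlated; all data are hypothetical):

```python
from statistics import correlation  # Python 3.10+

# Rows = testtakers, columns = items; 1 = right, 0 = wrong (hypothetical data).
original  = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 1, 0, 1]]
alternate = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]

def item_difficulty(responses):
    """Proportion of testtakers who got each item right (steps 2 and 4)."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(len(responses[0]))]

print(item_difficulty(original))    # use these p-values to match items (step 5)
print(item_difficulty(alternate))

# Step 8: correlate total scores on the original and alternate forms.
print(correlation([sum(row) for row in original], [sum(row) for row in alternate]))
```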
▪ Purpose: to evaluate the relationship between different forms of a measure
▪ Typical Use: when there is a need for different forms of a test and to avoid practice effects
▪ Method/Procedures: 1. Administer the first test; 2. Administer the alternate test; 3. Score both tests; 4. Correlate
▪ Statistical Procedures: Pearson r or Spearman rho
▪ Error Variance: test construction or test administration
▪ Advantages:
▪ minimizes the effect of memory for the content of a previously administered form of the test
▪ Disadvantages:
▪ hard to construct, time consuming, expensive
3. Inter-Scorer Reliability
▪ also known as scorer reliability, judge reliability, observer reliability, and inter-rater
reliability
▪ it is the degree of agreement or consistency between two or more scorers with regard
to a particular measure
▪ Coefficient of inter-scorer reliability is the degree of consistency among scorers in the
scoring of a test
▪ Kappa statistics
▪ the best method for assessing the level of agreement among several observers.
▪ κ = (po − pe) / (1 − pe)
▪ where po is the relative observed agreement among raters, and pe is the
hypothetical probability of chance agreement, using the observed data to calculate
the probabilities of each observer randomly saying each category. If the raters are
in complete agreement then κ = 1. If there is no agreement among the raters other
than what would be expected by chance (as given by pe), κ ≤ 0.
▪ Example
              Reader B
              Yes    No
Reader A Yes   20     5
         No    10    15
▪ Note that there were 20 proposals that were granted by both reader A and reader
B, and 15 proposals that were rejected by both readers. Thus, the observed
proportionate agreement is po = (20 + 15) / 50 = 0.70
▪ To calculate pe (the probability of random agreement) we note that:
▪ Reader A said "Yes" to 25 applicants and "No" to 25 applicants. Thus reader A
said "Yes" 50% of the time.
▪ Reader B said "Yes" to 30 applicants and "No" to 20 applicants. Thus reader B
said "Yes" 60% of the time.
▪ Therefore the probability that both of them would say "Yes" randomly is 0.50 · 0.60 = 0.30, and the probability that both of them would say "No" is 0.50 · 0.40 = 0.20. Thus the overall probability of chance agreement is pe = 0.3 + 0.2 = 0.5, which gives κ = (0.70 − 0.50) / (1 − 0.50) = 0.40.
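A minimal sketch of the same computation in Python, using the counts from the 2 × 2 table above:

```python
# Agreement counts from the example: rows = Reader A, columns = Reader B.
yes_yes, yes_no = 20, 5    # Reader A said Yes
no_yes,  no_no  = 10, 15   # Reader A said No
n = yes_yes + yes_no + no_yes + no_no            # 50 proposals

p_o = (yes_yes + no_no) / n                      # observed agreement = 0.70
a_yes = (yes_yes + yes_no) / n                   # Reader A says Yes 50% of the time
b_yes = (yes_yes + no_yes) / n                   # Reader B says Yes 60% of the time
p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)  # chance agreement = 0.50

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))                           # 0.4
```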
▪ Purpose: to evaluate the level of agreement between raters on a measure
▪ Typical Use: when researchers need to show that there is consensus in the way different raters view a particular behavior pattern
▪ Method/Procedures: 1. Get at least 2 raters; 2. Teach them the scoring system; 3. Administer the test to sample testtakers; 4. Let the 2 raters rate the test; 5. Correlate
▪ Statistical Procedures: Cohen’s kappa, Pearson r, or Spearman rho
▪ Error Variance: scoring and interpretation
4. Internal Consistency
▪ also known as inter-item consistency
▪ estimates the reliability of a test without developing an alternate form and without having to administer the test twice to the same people
▪ it is the consistency or homogeneity of the items of a test
▪ Purpose: to evaluate the extent to which items on a scale relate to one another
▪ Typical Use: when evaluating the homogeneity of a measure
▪ Method/Procedures: depends on the statistical procedure used
▪ Statistical Procedures: Pearson r between equivalent test halves with the Spearman-Brown correction, Kuder-Richardson for dichotomous items, coefficient alpha for multipoint items, or APD
▪ Error Variance: test construction
4. Internal Consistency (SPLIT-HALF RELIABILITY)
▪ mini parallel forms
▪ applicable only to assessment of homogeneity of the test
▪ obtained by correlating two pairs of scores obtained from equivalent halves of a single
test administered once
▪ Method:
1. Randomly group the test items into two halves
2. Administer the test once to the subjects
3. Total each half separately
4. Correlate the two half-scores!
Assignment of Items: Odd-even (odd-even reliability) or Random assignment or by content
and/or difficulty
▪ Advantages:
▪ Time efficient
▪ Addresses issues about two forms or two administrations
▪ Disadvantages:
▪ Reliability is based on 50% of the test
▪ Solution → Spearman Brown Formula
▪ estimates the reliability of the half test if it were lengthened to the WHOLE test
▪ METHOD: ×2 (the whole test is treated as twice the length of each half)
▪ Not for heterogeneous tests (tests that measure different factors)
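A sketch of the split-half procedure with the Spearman-Brown correction for doubled test length, rSB = 2·rhh / (1 + rhh); the half-test totals below are hypothetical:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical odd-item and even-item totals for 6 examinees on one administration.
odd_half  = [10, 14, 9, 16, 12, 11]
even_half = [11, 13, 10, 15, 13, 10]

r_half = correlation(odd_half, even_half)   # reliability based on 50% of the test
r_whole = (2 * r_half) / (1 + r_half)       # Spearman-Brown estimate for the WHOLE test
print(round(r_half, 2), round(r_whole, 2))
```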
4. Internal Consistency (KUDER-RICHARDSON KR20)
▪ For homogeneous tests as well, but primarily those with dichotomous items that can be scored right or wrong
▪ Method:
1. Administer the test
2. Score the test
3. Tally, for each item, how many testtakers got it right versus wrong (a RATIO, e.g., 2 right : 10 wrong)
4. Then compute
Disadvantages:
• More difficult computation
• Broader range of difficulty
Solution → KR21
• For items of equal difficulty
• For easier computation
Disadvantage of Both:
• does not work for non-objective tests
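A sketch of the KR20 computation using the usual formula KR20 = (k / (k − 1)) · (1 − Σ pq / σ²total), where p is the proportion answering an item correctly, q = 1 − p, and σ²total is the variance of total scores; the responses are hypothetical:

```python
from statistics import pvariance

# Rows = testtakers, columns = dichotomous items (1 = right, 0 = wrong); hypothetical data.
responses = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
]

k = len(responses[0])                                          # number of items
n = len(responses)                                             # number of testtakers
p = [sum(row[i] for row in responses) / n for i in range(k)]   # proportion right per item
pq_sum = sum(pi * (1 - pi) for pi in p)                        # sum of item variances (p * q)
total_var = pvariance([sum(row) for row in responses])         # variance of total scores

kr20 = (k / (k - 1)) * (1 - pq_sum / total_var)
print(round(kr20, 2))
```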
4. Internal Consistency (CRONBACH’S ALPHA)
▪ mean of all possible split-half correlations
▪ unlike KR20, it is appropriately used on tests containing nondichotomous items
▪ for objective and non-objective tests, e.g., Likert-type items
▪ the preferred statistic for obtaining an estimate of internal consistency reliability
▪ typically ranges from 0 (absolutely no similarity) to 1 (perfect similarity) to help answer questions about how similar sets of data are
▪ Method:
1. Administer the test to the subjects
2. Score every item and compute the variance of each item and the variance of the total scores
3. Apply the alpha formula
▪ Disadvantage:
▪ Does not measure degree of difference
▪ Using and Interpreting a Coefficient of Reliability
Cronbach's alpha Internal consistency
α ≥ 0.9 Excellent
0.9 > α ≥ 0.8 Good
0.8 > α ≥ 0.7 Acceptable
0.7 > α ≥ 0.6 Questionable
0.6 > α ≥ 0.5 Poor
0.5 > α Unacceptable
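A sketch of coefficient alpha for multipoint (e.g., Likert-type) items, using the common formula α = (k / (k − 1)) · (1 − Σ σ²item / σ²total); the ratings are hypothetical:

```python
from statistics import pvariance

# Rows = respondents, columns = Likert-type items scored 1-5 (hypothetical ratings).
ratings = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]

k = len(ratings[0])                                              # number of items
item_vars = [pvariance([row[i] for row in ratings]) for i in range(k)]
total_var = pvariance([sum(row) for row in ratings])             # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))   # interpret against the table above (e.g., >= 0.9 is excellent)
```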
4. Internal Consistency (AVERAGE PROPORTIONAL DISTANCE METHOD)
▪ rather than focusing on similarity between scores on items of a test, it focuses on the
degree of difference that exists between item scores
▪ Method:
1. Calculate the absolute difference between scores for all of the items
2. Average the difference between scores
3. Obtain the APD by dividing the average difference between scores by the number of
response options on the test, minus one.
.2 or lower is excellent internal consistency
.25 to .3 is the acceptable range
unlike Cronbach’s alpha, it is not connected to the number of items in a measure
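Under one reading of the three steps (pairwise absolute differences between item scores for each respondent, averaged, then divided by the number of response options minus one), a hypothetical APD computation might look like this:

```python
from itertools import combinations

# Rows = respondents, columns = items on a 5-point scale (hypothetical ratings).
ratings = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 4, 5],
]
n_options = 5

def apd(rows, options):
    """Average proportional distance across respondents."""
    person_means = []
    for row in rows:
        # Step 1: absolute difference between scores for every pair of items
        diffs = [abs(a - b) for a, b in combinations(row, 2)]
        # Step 2: average those differences
        person_means.append(sum(diffs) / len(diffs))
    # Step 3: divide by the number of response options minus one
    return (sum(person_means) / len(person_means)) / (options - 1)

print(round(apd(ratings, n_options), 2))   # .2 or lower = excellent; .25-.3 = acceptable
```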
What will make an item unreliable?
▪ Items are TOO LONG!
▪ Items are vaguely/unclearly written
How to increase reliability?
▪ Eliminate items that are unclear
▪ Standardize the conditions under which the test is taken
▪ Moderate the degree of difficulty of the tests
▪ Minimize the effects of external events
▪ Standardize instructions
▪ Maintain consistent scoring procedures
SUMMARY of Reliability Estimates
▪ Test-retest: 2 testing sessions; 1 test form; error variance from administration; Pearson r or Spearman rho
▪ Alternate-forms: 1 or 2 testing sessions; 2 test forms; error variance from test construction or administration; Pearson r or Spearman rho
▪ Internal consistency: 1 testing session; 1 test form; error variance from test construction; Pearson r or Spearman rho, Spearman Brown correction, KR21, α
▪ Inter-scorer: 1 testing session; 1 test form; error variance from scoring and interpretation; Pearson r or Spearman rho
C. Using and Interpreting a Coefficient of Reliability
▪ Purpose:
▪ If a specific test is designed for use at various times over the course of a period of time, it would be reasonable to expect the test to demonstrate reliability across time (test-retest).
▪ For a test designed for a single administration only, an estimate of internal
consistency would be the reliability of choice.
▪ If the purpose is to break down the error variance into its parts, then a number of reliability coefficients would have to be calculated.
▪ Nature
▪ Homogeneity vs Heterogeneity
▪ Homogeneity: uniform functionality/one factor (e.g. internal consistency)
▪ Heterogeneity: more than one factor (e.g, test-retest)
▪ Dynamic Characteristics vs Static Characteristics
▪ Dynamic Characteristics: a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
▪ Static Characteristics: relatively unchanging
▪ Restriction of Range vs Inflation of Range
▪ Restriction of Range: variance is restricted by the sampling procedure (restricted range → correlation coefficient goes down)
▪ Inflation of Range: variance is inflated by the sampling procedure (inflated range → correlation coefficient goes up)
▪ Speed Test vs Power Test
▪ Speed Test: time pressured; consistency of response speed
▪ Power Test: performance; right or wrong
▪ Criterion-Referenced Test
▪ provide an indication of where a testtaker stands with respect to some
variable or criterion
D. Alternative to the True Score Model
▪ True Score Theory – estimate the portion of a test score that is attributable to
error
▪ Domain Sampling Theory – estimate the extent to which specific sources of
variation under defined conditions are contributing to test score
▪ Generalizability Theory – given the exact same conditions of all the facets in
the universe, the exact same test score should be obtained
▪ Facets – number of items, amount of training, purpose of test
administration
▪ Universe – particular test situations
▪ Item Response Theory – the probability that a person with X amount of ability will be able to perform at a level of Y
▪ also known as latent-trait theory (“latent” = unobservable)
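As an illustration of the idea (the notes do not give a formula; the two-parameter logistic model below is a standard IRT form, with a as item discrimination and b as item difficulty):

```python
import math

def irt_2pl(theta, a, b):
    """Probability that a person with ability theta answers the item correctly (2PL model)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# When ability equals the item's difficulty, the probability of success is 0.5.
print(irt_2pl(theta=0.0, a=1.2, b=0.0))   # 0.5
print(irt_2pl(theta=1.0, a=1.2, b=0.0))   # higher ability -> higher probability
```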
E. Standard Error of Measurement
▪ Also known as Standard Error of Scores
▪ provides a measure of the precision of an observed test score
▪ or, provides an estimate of the amount of error inherent in an observed score of
measurement
▪ The higher the SEM, the lower the reliability, and vice versa
▪ Formula:
▪ σmeas = σ√(1 − rxx)
▪ σ is the standard deviation of test scores
▪ rxx is equal to the reliability coefficient of the test
▪ Confidence Intervals
▪ 68% → ±1σmeas
▪ 95% → ±2σmeas
▪ 99% → ±3σmeas
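A sketch of the SEM formula and the ±1/2/3 SEM confidence bands around an observed score (the standard deviation, reliability, and observed score below are hypothetical):

```python
import math

sd_scores   = 15      # standard deviation of test scores (hypothetical)
reliability = 0.91    # reliability coefficient rxx (hypothetical)
observed    = 106     # one examinee's observed score (hypothetical)

sem = sd_scores * math.sqrt(1 - reliability)   # sigma-meas = 15 * sqrt(0.09) = 4.5
print(round(sem, 2))

# Approximate confidence intervals following the +/-1, +/-2, +/-3 SEM rule above.
for z, level in [(1, "68%"), (2, "95%"), (3, "99%")]:
    print(level, observed - z * sem, "to", observed + z * sem)
```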
Connecting Sources of Error with Reliability Assessment Method
▪ Time sampling: Example – same test given at two points in time; Method – test-retest; How assessed – correlation (Pearson r or Spearman’s rho)
▪ Item sampling: Example – different items used to assess the same attribute; Method – alternate forms or parallel forms; How assessed – correlation (Pearson r or Spearman’s rho)
▪ Internal consistency: Example – consistency of items within the same test; Methods – 1. Split-half, 2. KR20, 3. Alpha; How assessed – 1. Ordinal/Composite, 2. Kuder-Richardson, 3. Cronbach’s Alpha
▪ Observer differences: Example – different observers recording; Method – kappa statistic; How assessed – kappa coefficient / percentage agreement