2/14/2019
RELIABILITY
Ardyne D. Tuazon, RPm, LPT

GOOD PSYCH TEST?
Good “psychometric properties”
- RELIABILITY
- VALIDITY
WHAT IS A RELIABLE TEST?
First try: 350 lbs
Second try: 125 lbs
Third try: 150 lbs
SAME WITH PSYCH TEST
INTELLIGENCE TEST
First administration: EXCELLENT
Second administration: LOW AVERAGE
Third administration: HIGH AVERAGE
RELIABILITY (?)
Give CONSISTENT result

What ASPECT to check?
Psychological test
- Testing procedure
- Scoring System
- Test Items

REVIEW YOUR CORRELATION
Main tool to check your reliability and validity.
CHECKING RELIABILITY
- Testing procedure
- Scoring System
- Test Items

TESTING PROCEDURE
- Environment
- Delivery of instructions
- Behavior of the examiner
Method #1: Test-Retest Method
GENERAL STEPS
Step 1. Administer the test
Step 2. Get results
Step 3. INTERVAL (TIME GAP)
Step 4. Re-administer the test
Step 5. Get result

Examinee   T1   T2
1          30   28
2          19   21
3          24   23
…          ?    ?
Correlate

Is the testing procedure reliable?
Correlation Coefficient: .93
Coefficient .93? RELIABLE
PROCEDURE IS GOOD
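The "Correlate" step on the T1/T2 table can be sketched in a few lines. This is a minimal illustration, assuming the Pearson correlation is the coefficient intended (the slides do not name it). The helper name `pearson_r` is made up for this example, and only the three rows visible on the slide are used, so the result will not reproduce the slide's .93, which presumably comes from the full data set.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Only the three rows shown on the slide; the elided rows ("…") are left out.
t1 = [30, 19, 24]  # first administration
t2 = [28, 21, 23]  # second administration, after the interval
r = pearson_r(t1, t2)
print(round(r, 3))
```

On these three rows the coefficient comes out at about .98, which would also count as a strong positive correlation.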
CONSIDERATION:
STABILITY OF THE VARIABLE
Length of the interval
If the variable is highly unstable (e.g., anxiety), the length of the interval is SHORTER
If the variable is highly stable (e.g., personality), the length of the interval is LONGER

Disadvantage of test-retest (?)
BETTER PERFORMANCE (?)
- Checking of answers
- Practice effect
RESULT: LOWERS CORRELATION
METHOD #2
Reliability of the TESTING PROCEDURE
SAME TEST but DIFFERENT items
Population 1 = Test A ----- Test A’
(Using a CLONE test)

ALTERNATE FORM?
“Equivalent”
- # of items
- Format
- Language
- Content
- LEVEL OF DIFFICULTY

GENERAL STEPS:
Step 1: Administer first test
Step 2: Administer ALTERNATE FORM (Equivalent) test
Step 3: Score both
Step 4: CORRELATE!
DISADVANTAGE OF ALTERNATE FORM?
Hard to construct! More time-consuming
(Test-Retest MORE PREFERRED)

Step 1: Construct/Administer original test
Step 2: Compute item difficulty for each item.
FORMULA?
Item difficulty = (Number of subjects who committed mistakes / N) X 100
Step 3: Construct items of Alternate (CLONE)
REMEMBER: Same nature
Step 4: Administer alternate form (to whom?)
SAME POPULATION as that of the original test!
Step 5: Compute item difficulty of the alternate form's items (same formula)
Step 6: MATCH items according to difficulty (between original and alternate)
Ex.
Item 1 (98%)   Item 25 (98%)
Item 2 (85%)   Item 3 (83%)
Step 7: Find new population
Step 8: Administer the original test and the alternate form (immediate, delayed)
Step 9: CORRELATE scores of original and alternate test

CHECKING RELIABILITY
- Testing procedure
- Scoring System
- Test Items
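The item-difficulty formula and the matching step of the alternate-form procedure can be sketched as follows. This is only an illustration. The function names and the answer sheet are hypothetical, and pairing items by rank order of difficulty is an assumed strategy, since the slides do not say how the match is made. The example also assumes that the left column of the matching example is the original form and the right column is the alternate form.

```python
def item_difficulty(responses):
    """Percent of subjects who made a mistake on each item.

    responses: one list per subject, True = correct, False = mistake.
    Follows the slide's formula: (number who committed mistakes / N) x 100.
    """
    n = len(responses)
    n_items = len(responses[0])
    return [
        100 * sum(1 for subject in responses if not subject[i]) / n
        for i in range(n_items)
    ]

def match_by_difficulty(original, alternate):
    """Pair items across forms by rank order of difficulty.

    original, alternate: dicts of item number -> difficulty (%).
    Sort-and-pair is an assumed strategy, not taken from the slides.
    """
    ranked_orig = sorted(original, key=original.get, reverse=True)
    ranked_alt = sorted(alternate, key=alternate.get, reverse=True)
    return list(zip(ranked_orig, ranked_alt))

# Hypothetical answers from 4 subjects on 2 items (not from the slides).
answers = [[True, False], [True, False], [False, True], [True, True]]
difficulty = item_difficulty(answers)

# Difficulty values taken from the matching example on the slide.
pairs = match_by_difficulty({1: 98, 2: 85}, {25: 98, 3: 83})
print(difficulty, pairs)
```

With the slide's values, Item 1 pairs with Item 25 and Item 2 pairs with Item 3, as in the example.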
Scoring system
• Guidelines on HOW TO score/compute the score of the test taker
(easy to understand?)

Is the scoring system reliable?
Someone took a psychological test…
Let 2 raters compute the score based on the manual
Same computation?

Method #3: Inter-scorer reliability
Step 1: Look for at least two raters.
Step 2: Teach the scoring system.
Step 3: Administer the test to sample subjects.
Step 4: Let the 2 raters rate the sample subjects.

Examinee   Rater 1   Rater 2
1          30        28
2          19        21
3          24        23
…          ?         ?
Correlate
If the correlation result is
= +.91
= -.91
= .10

Can we use more than 2?
YES! (How?)
Rater: 1 2 3 4 5
Correlate (1-2) (1-3) (1-4) (1-5) (2-3) (2-4) (2-5) (3-4) (3-5) (4-5)
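With more than two raters, every pair of raters is correlated. A rough sketch of that bookkeeping follows, assuming Pearson correlation and using hypothetical helper names. Five raters produce ten distinct pairs. The numeric example uses only the Rater 1 and Rater 2 scores shown on the slide, because no scores are given for the other raters.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_rater_correlations(scores_by_rater):
    """Correlate every pair of raters; keys are (rater_a, rater_b)."""
    return {
        (a, b): pearson_r(scores_by_rater[a], scores_by_rater[b])
        for a, b in combinations(sorted(scores_by_rater), 2)
    }

# All rater pairs for five raters: (1, 2), (1, 3), ..., (4, 5).
rater_pairs = list(combinations(range(1, 6), 2))

# Scores from the slide (three visible examinees, Rater 1 and Rater 2 only).
slide_scores = {1: [30, 19, 24], 2: [28, 21, 23]}
result = pairwise_rater_correlations(slide_scores)
print(len(rater_pairs), result)
```

One common way to summarize the result is to average the pairwise coefficients, though the slides do not specify a summary.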
CHECKING RELIABILITY
-Testing procedure
-Scoring System
-Test Items
Method #4: Internal consistency
Check reliability of individual test items.
Methods:
- Split half
- KR-20
- Cronbach’s Alpha

A) SPLIT HALF RELIABILITY
DIVIDING test items into 2 groups
Step 1: RANDOMLY group test items into 2
(why random?)
- To equalize the difficulty
Step 2: Administer each half to single subject
Step 3: Total each half
ITEM #    |  ITEM #
1    11   |  3    12
2    13   |  4    15
5    14   |  7    16
6    18   |  8    17
9    20   |  10   19

Administer to the same person
Get the score of each half

Examinee   First half (1,2,5,6,…)   Second half (3,4,7,8,…)
1          30                       28
2          19                       21
3          24                       23
…          ?                        ?

Step 4: Correlate
r = .90
IF WEAK: r = .10 (NO PATTERN)
ADVANTAGE
Time efficient (one sitting!)

DISADVANTAGE
1. Not applicable to a heterogeneous test (?)
- a test with many components (e.g., Test 1, Test 2, Test 3, etc.)
2. Reliability is based on only 50% of the test (i.e. Partner)
MEANING THE COEFFICIENT REFLECTS ONLY HALF OF THE TEST

Spearman-Brown formula:
Estimates the reliability of the half test if it becomes WHOLE
- Doubles the test length to estimate the reliability of the whole test
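The Spearman-Brown correction is short enough to write out. The sketch below uses the standard prophecy formula, r_whole = 2r / (1 + r) for a doubled test. The function name and the `factor` parameter are illustrative choices, not something the slides specify.

```python
def spearman_brown(r_half, factor=2):
    """Project reliability when the test is lengthened by `factor`.

    With factor=2 this is the split-half correction: 2r / (1 + r).
    """
    return factor * r_half / (1 + (factor - 1) * r_half)

# Half-test correlations taken from the split-half example.
strong = spearman_brown(0.90)
weak = spearman_brown(0.10)
print(round(strong, 3), round(weak, 3))
```

A half-test correlation of .90 projects to about .95 for the whole test, while a weak .10 only rises to about .18, so the correction does not rescue an inconsistent test.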
B) Kuder-Richardson (KR-20) Method
Strictly: OBJECTIVE TEST only
MAIN QUESTION: HOW CONSISTENTLY do people get an item right or wrong?

RATIO: Right to wrong answers per item
Example: 100 subjects
Item 1 = 1:1
Item 2 = 20:1
Poorer reliability?

SPSS INPUT
            ITEM 1     ITEM 2     ITEM 3     ITEM 4
Student 1   Wrong      Correct    Correct    Wrong
Student 2   Correct    Correct    Correct    Correct
Student 3   Wrong      Correct    Wrong      Wrong
Student 4   Wrong      Wrong      Correct    Wrong
Student 5   Correct    Correct    Correct    Correct
            KR Value   KR Value   KR Value   KR Value
(KR Value = right-wrong ratio)

2 disadvantages of KR-20
1. N/A when items are of unequal difficulty
Solution: KR-21
2. Doesn’t work for non-objective tests (e.g., personality tests)
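To make the right/wrong bookkeeping concrete, here is a minimal sketch of the standard KR-20 formula, KR-20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores), applied to the five-student table above with Correct coded as 1 and Wrong as 0. The function name is hypothetical, and the population variance (dividing by N) is an assumption, since the slides do not state which variance is used. The formula yields one coefficient for the whole test rather than one value per item.

```python
def kr20(responses):
    """KR-20 for dichotomous items (1 = correct, 0 = wrong).

    responses: one list of 0/1 item scores per student.
    Uses the population variance of total scores (an assumption).
    """
    n = len(responses)
    k = len(responses[0])
    # p = proportion correct per item, q = 1 - p.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# The SPSS input table from the slide, coded Correct = 1, Wrong = 0.
table = [
    [0, 1, 1, 0],  # Student 1
    [1, 1, 1, 1],  # Student 2
    [0, 1, 0, 0],  # Student 3
    [0, 0, 1, 0],  # Student 4
    [1, 1, 1, 1],  # Student 5
]
coefficient = kr20(table)
print(round(coefficient, 3))
```

For this small table the coefficient is about .75. With only five students and four items it is an arithmetic illustration, not a meaningful reliability estimate.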
C) Cronbach’s Alpha
For non-objective tests (Likert type)
AVERAGE of all possible split halves

PROBLEM: SPLIT HALF TECHNIQUE
Limited to (1) combination of half items
ITEM #    |  ITEM #
1    11   |  3    12
2    13   |  4    15
5    14   |  7    16
6    18   |  8    17
9    20   |  10   19
CRONBACH’S ALPHA
Gets all the possible split halves
ITEM #    |  ITEM #
2    13   |  1    11
4    14   |  3    12
5    15   |  8    16
6    19   |  9    17
7    20   |  10   18

ITEM #    |  ITEM #
1    11   |  3    12
2    13   |  4    15
5    14   |  7    16
6    18   |  8    17
9    20   |  10   19
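In practice alpha is not computed by enumerating every split. A common shortcut is the variance form, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The sketch below assumes that form. The function names and the Likert ratings are hypothetical, invented purely to show the arithmetic, and do not come from the slides.

```python
def variance(values):
    """Population variance (dividing by N)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def cronbach_alpha(ratings):
    """Cronbach's alpha from one list of item ratings per respondent."""
    k = len(ratings[0])
    item_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    total_variance = variance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# HYPOTHETICAL 5-point Likert ratings: 5 respondents x 4 items.
likert = [
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(likert)
print(round(alpha, 3))
```

For these made-up ratings alpha is about .96, because respondents who rate one item high tend to rate the others high as well.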
CLOSING QUESTION:
What will make a test item unreliable? (2)
Test TOO LONG (why?)
Item is vaguely/unclearly written!

EXAMPLE OF VAGUE ITEMS
(Psych Achievement Test)
TRUE or FALSE:
The unconscious always contains negative images which represents a person’s past negative experiences
SUMMARY
TEST ADMINISTRATION
- Test-retest
- Alternate form
TEST ITEMS
- Split Half
- Kuder Richardson
- Cronbach’s Alpha
SCORING SYSTEM
- Inter-rater reliability
END