
RELIABILITY

BY: HINA SAMREEN


RELIABILITY
• Synonym for dependability or consistency (Gatewood & Field, 2001).
• Refers to consistency in measurement and stability of results on a measure (Schwab, 2005).
• “The degree to which the results can be replicated under similar conditions” (McBride, 2010).
• Reliability – not an all-or-none matter.
• A test may be reliable in one context and unreliable in another.
• Reliability Coefficient – an index of reliability; a proportion that indicates the ratio
between the true score variance on a test and the total variance.
• Reliability is expressed as a coefficient ranging from 0 to 1.
• If we use X to represent an observed score, T to represent a true score,
and E to represent error, then the observed score equals the true score
plus error:
X = T + E
• VARIANCE – a measure of the variability of scores. Variance in scores affects the reliability of
the test.
• Variance from true differences is true variance and variance from
irrelevant, random sources is error variance.
• The term reliability refers to the proportion of the total variance attributed to
true variance; the greater this proportion, the more reliable the test (a short simulation sketch follows below).
• Because true differences are assumed to be stable, they are presumed to yield
consistent scores on repeated administrations of the same test as well as
on equivalent forms of the test.
• Error variance may increase or decrease a test score by varying
amounts – consistency of the test score or reliability may be affected.
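
To make the variance decomposition concrete, here is a minimal simulation sketch (not from the lecture; all numbers are hypothetical) showing that the reliability coefficient is the proportion of observed-score variance that is true-score variance:

```python
import numpy as np

# Minimal sketch: simulate X = T + E for a group of test takers and show that
# the reliability coefficient is the ratio of true-score variance to total
# observed-score variance. All values are hypothetical.
rng = np.random.default_rng(0)

true_scores = rng.normal(loc=50, scale=10, size=10_000)  # T: stable true differences
error = rng.normal(loc=0, scale=5, size=10_000)          # E: random error variance
observed = true_scores + error                           # X = T + E

reliability = true_scores.var() / observed.var()  # true variance / total variance
print(f"reliability coefficient: {reliability:.2f}")  # approx. 100 / (100 + 25) = 0.80
```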
• Measurement Error – all of the factors associated with the process of
measuring some variable, other than the variable being measured.
• Example: Consider an English-language test on the subject of 12th-grade algebra being
administered, in English, to a sample of 12th-grade students newly arrived in the United
States from China. The students in the sample are known to be whiz kids in algebra, yet all of
the students receive failing grades on the test for some reason.
• Do these failures indicate that these students are not whiz kids at
all? Perhaps this group of students did not do well on the algebra test
because they could neither read nor understand what was required
of them.
• In fact, the test was written and administered in
English – a source of measurement error. (The test should have been translated and administered
in the language of the test takers.)

Two Categories of Measurement Error

Measurement error may be random or systematic.
Random error. A source of error in measuring a targeted variable caused by
unpredictable fluctuations and inconsistencies of other variables in the measurement
process (examples: noise, unanticipated events happening within the test taker, etc.).
Systematic error. A source of error in measuring a variable that is typically
constant or proportionate to what is presumed to be the true value of the variable being
measured (example: the measuring instrument itself may be found to be a source of systematic
error).
Types of Reliability
• Test-Retest Reliability. An estimate of reliability obtained by correlating pairs
of scores from the same sample on two different administrations of the same test.
• Purpose of this type of reliability – to check whether participants’ scores are consistent across
multiple administrations of the same test over time.
• The test-retest measure is appropriate when evaluating the reliability of a test
that purports to measure something that is relatively stable over time, e.g., a
personality trait.
• If the characteristic being measured is assumed to fluctuate over time, test-retest
reliability is not suitable.
• As time passes, people change; e.g., people may learn new things, forget
some things, and acquire new skills.
• The main issue with test-retest reliability (Sturman et al., 2005):
– Differences in measures between the first and second administrations could impact the
reliability due to the following factors:
1. The time interval between test administrations (the passage of time can be a source of variance – the
longer the time interval between administrations of the same test, the lower the correlation between the
scores obtained on each testing).

2. The test itself or other factors associated with the participant.


• Test-retest reliability is appropriate for gauging the reliability of tests that employ outcome measures,
e.g., reaction time or perceptual judgements.
• Even when the time period between the two administrations of a test is relatively small,
various factors such as experience, fatigue, practice, and memory may intervene and confound
reliability (see the correlation sketch below).
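
A minimal sketch of the test-retest computation (the score arrays are hypothetical): the same group takes the same test twice, and the correlation between the two sets of scores serves as the reliability estimate.

```python
import numpy as np

# Minimal sketch: test-retest reliability as the correlation between the same
# participants' scores on two administrations of the same test (hypothetical data).
time_1 = np.array([23, 31, 28, 19, 35, 27, 30, 22])  # first administration
time_2 = np.array([25, 30, 27, 21, 34, 28, 29, 24])  # second administration, later

r_test_retest = np.corrcoef(time_1, time_2)[0, 1]
print(f"test-retest reliability estimate: {r_test_retest:.2f}")
```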
Parallel-Forms/Equivalent-Forms Reliability
• When a researcher creates two different but similar tests that measure the
same construct.
• Alternate forms are simply different versions of a test that have been
constructed so as to be parallel.
• The idea behind parallel or equivalent-forms reliability is to have two
conceptually identical tests that utilize separate questions to measure the
same construct of interest.
• Alternate forms of a test are typically designed to be equivalent with
respect to variables such as content and level of difficulty.
• Alternate forms reliability refers to an estimate of the extent to which
these different forms of the same test have been affected by item
sampling error or other error.
• Obtaining estimates of alternate-forms reliability is similar to test-retest reliability
in two ways:
– Two test administrations with the same group are required.
– Test scores may be affected by factors such as motivation, fatigue, or intervening
events like practice, learning, memory, or therapy.
An additional source of error variance is item sampling – test takers may do better or
worse on a specific form of the test not as a function of their true ability but
simply because of the particular items that were selected for that form.
• Having multiple items to measure the same construct can be a benefit of
using parallel or equivalent forms to create similar but different tests. The
challenge when multiple tests are created to measure the same construct is
that the items on the two versions of the test may not actually measure the
same construct (see the sketch below).
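
A minimal sketch of the parallel-forms computation, using hypothetical scores for one group that took both Form A and Form B. Comparing means and standard deviations is a rough check on equivalence of difficulty; the correlation between the forms is the reliability estimate.

```python
import numpy as np

# Minimal sketch: parallel-forms reliability for one group that took two
# alternate forms of the same test (hypothetical data).
form_a = np.array([42, 37, 45, 29, 38, 41, 33, 40])
form_b = np.array([40, 36, 46, 31, 37, 43, 34, 38])

print(f"Form A: mean={form_a.mean():.1f}, sd={form_a.std(ddof=1):.1f}")
print(f"Form B: mean={form_b.mean():.1f}, sd={form_b.std(ddof=1):.1f}")
print(f"parallel-forms reliability estimate: {np.corrcoef(form_a, form_b)[0, 1]:.2f}")
```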
Strengths & Limitations
• Developing alternate forms is time consuming and expensive.
• Test scores may be affected by test-taker variables.
• Once developed, alternate forms are advantageous for the test user:
– They minimize the effect of memory for the content of a previously
administered test, removing the carryover effect.
Split-half Reliability
• The purpose of split-half reliability is to divide the test/measure into two halves and test
the internal consistency of the items used.
• An estimate of split-half reliability is obtained by correlating two pairs of scores
obtained from two equivalent halves of a single test administered once.
• Split-half reliability differs from parallel or equivalent-forms reliability in a
couple of ways:
– Parallel or equivalent-forms reliability requires two versions of a test.
– With split-half, researchers conduct a single administration of the test/measure and split the test
into two equal halves (e.g., by the odd-even method).

• A useful measure of reliability when it is impractical to assess reliability with two tests
or to administer a test twice because of factors such as time and expense.
• Recommended for homogenous tests.
• One common criticism of this technique is the difficulty of determining where to split the test,
because the reliability estimate depends on how the items are divided within the measure.
• There is more than one way to split a test – but there are some ways a test should
never be split.
– Simply dividing the test in the middle is not recommended – it is likely that this procedure
would spuriously raise or lower the reliability coefficient.
• One acceptable way to split a test is to randomly assign items to one or the other half of the
test.
• Another acceptable way is to assign odd-numbered items to one half of
the test and even-numbered items to the other half. This method is referred to as
odd-even reliability (a minimal sketch follows below).
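
A minimal sketch of the odd-even split described above, using a hypothetical matrix of item scores from a single administration (rows are test takers, columns are items):

```python
import numpy as np

# Minimal sketch: odd-even split-half reliability from one administration.
# Rows are test takers, columns are eight items scored 0/1 (hypothetical data).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_split_half = np.corrcoef(odd_half, even_half)[0, 1]
print(f"split-half (odd-even) correlation: {r_split_half:.2f}")
```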
Alpha Coefficient
• Internal consistency reliability – coefficient alpha or Cronbach’s alpha.
• A widely reported measure of reliability (Hogan, Benjamin, & Brezinski, 2003).
• Similar to split-half reliability in that it also measures the internal consistency, or
correlation, between the items on a test.
• The main difference between split-half and coefficient alpha: the entire test is used to
estimate the correlation between the items, without splitting the test in half.
• Cronbach (1951) outlined that a coefficient alpha of greater than or equal to 0.7 is
generally acceptable.
• A very high Cronbach’s alpha may indicate redundancy among the items.
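
A minimal sketch of coefficient alpha, computed from all items of a single administration using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the item scores are hypothetical.

```python
import numpy as np

# Minimal sketch: Cronbach's alpha from one administration, using every item
# without splitting the test (hypothetical Likert-type data, rows = respondents).
items = np.array([
    [3, 4, 3, 4, 3],
    [2, 2, 3, 2, 2],
    [4, 5, 4, 4, 5],
    [1, 2, 1, 2, 1],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
], dtype=float)

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)
total_score_variance = items.sum(axis=1).var(ddof=1)

alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_score_variance)
print(f"Cronbach's alpha: {alpha:.2f}")  # the lecture cites >= 0.70 as generally acceptable
```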
Inter-rater reliability
• Also known as inter-observer reliability or inter-judge reliability, it
assesses the level of agreement or consistency between multiple raters
or observers when they independently assess the same phenomenon,
event, or data. It is used to determine the extent to which different
raters, who may have different perspectives or judgments, provide
similar assessments or insights.
• Measuring the consistency of ratings across different raters.
• High inter-rater reliability indicates that the judgments made by
different raters are in agreement.
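
A minimal sketch of an inter-rater check with two hypothetical raters who independently scored the same ten observations; simple percent agreement and the correlation between the raters are shown (other agreement indices exist but are not covered here).

```python
import numpy as np

# Minimal sketch: inter-rater reliability for two raters scoring the same
# ten observations on a 1-5 scale (hypothetical data).
rater_1 = np.array([4, 3, 5, 2, 4, 3, 5, 1, 2, 4])
rater_2 = np.array([4, 3, 4, 2, 4, 3, 5, 1, 3, 4])

percent_agreement = np.mean(rater_1 == rater_2)
r_between_raters = np.corrcoef(rater_1, rater_2)[0, 1]

print(f"percent agreement: {percent_agreement:.0%}")
print(f"correlation between raters: {r_between_raters:.2f}")
```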
Intra-rater reliability
• Also known as intra-observer reliability or test-retest reliability.
• It assesses the consistency of ratings or measurements made by the same rater or
observer on two or more occasions when assessing the same phenomenon or data.
• When a researcher examines the consistency of one particular individual’s rating at
multiple points in time.
• The purpose of intra-rater reliability is to determine the stability
of an individual’s ratings at two different points in time.
• It is used to determine whether a single rater’s judgments or measurements are
consistent over time.
• High intra-rater reliability indicates that the rater provides consistent results when
assessing the same thing on different occasions.
Thank you
