RELIABILITY
So,
XT = X∞ + Xe
Where,
XT = the obtained score
X∞ = the true score
Xe = the error score
The true score is the score which is free from errors occurring due to the chance factor as well as other kinds of errors. It is indicated by the mean of a large number of scores made by the same person on the same test:
X∞ = (X1 + X2 + X3 + . . . + Xn) / n
Usually, a person’s true score remains the same, but his obtained score may vary from trial to trial because the error score contributes to the obtained score in each trial.
The error score may be the result of two kinds of errors: random (or chance) errors and systematic (or constant) errors.
•Random errors: these chance errors work randomly in both the positive and negative directions and therefore sometimes inflate and sometimes depress the obtained score. In the long run these errors tend to cancel each other out, so the mean of all of these errors of measurement would be zero.
E.g., a malfunctioning electronic thermometer.
•Systematic errors: they work constantly in one direction and, therefore, they would
either tend to inflate or depress the score. The mean of such errors of measurement
would not be zero.
E.g., a height board with an incorrect baseline.
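The distinction can be illustrated with a small simulation. The sketch below is illustrative only; the true score, error sizes and number of trials are arbitrary assumptions rather than values from any real test.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 50.0                         # hypothetical true score X∞
random_errors = rng.normal(0, 3, 1000)    # chance errors, both positive and negative
systematic_error = 2.5                    # constant bias, e.g. a misaligned baseline

obtained_random = true_score + random_errors
obtained_biased = true_score + random_errors + systematic_error

print(round(random_errors.mean(), 2))    # ~0: random errors cancel out in the long run
print(round(obtained_random.mean(), 2))  # ~50: the mean of many trials recovers X∞
print(round(obtained_biased.mean(), 2))  # ~52.5: a systematic error does not cancel out
```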
Reliability is inversely related to the size of the error score: the smaller the error score, the more reliable the test or measuring instrument.
Reliability is defined, so to speak, through error: the more the error, the less the reliability; the less the error, the greater the reliability. Practically speaking, this means that if we can estimate the error variance of a measure, we can also estimate the measure's reliability.
VT = V∞ + Ve
R = V∞ / VT
Where, V = variance
• Reliability is the proportion of error variance to the total obtained variance yielded by a measuring instrument subtracted from 1.00, the index 1.00 indicating perfect reliability:
R = 1 - (Ve / VT)
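As a worked illustration with invented variance values, suppose the true-score variance is 32 and the error variance is 8; both forms of the formula then give the same reliability.

```python
# Illustrative numbers only: VT = V∞ + Ve
v_true, v_error = 32.0, 8.0
v_total = v_true + v_error               # VT = 40

r_from_true = v_true / v_total           # R = V∞ / VT
r_from_error = 1 - (v_error / v_total)   # R = 1 - (Ve / VT)
print(r_from_true, r_from_error)         # both equal 0.8
```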
RELIABILITY COEFFICIENT
Since all types of reliability are concerned with the degree of consistency or agreement between two
independently derived sets of scores, they can also be expressed in terms of a correlation coefficient.
“In psychometrics, a correlation coefficient or other numerical index of the reliability of
a test or measure.”
-Oxford dictionary of Psychology.
It varies from 0 to +1.00 (it cannot be negative).
How high the coefficient should be is judged by the purpose for which the test is given to the examinees. If the purpose is to segregate superiors and inferiors, i.e. to make individual diagnoses (e.g., intelligence, aptitude and achievement tests), a reliability coefficient of 0.90 or higher is regarded as the best. Likewise, where the purpose is to compare the means of two groups of narrow range, a reliability coefficient of 0.50-0.60 should suffice.
Theoretically, the correlation between the obtained scores and the true scores
should be perfect but this is rarely ever the case.
R1∞ = √R
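For example, if a test's reliability coefficient is R = 0.81, the estimated correlation between the obtained scores and the true scores is R1∞ = √0.81 = 0.90.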
TYPES OF RELIABILITY:
1. Test-retest reliability
2. Internal consistency reliability
• Split-half reliability
• Rulon formula
• Flanagan formula
• Kuder-Richardson formula
• Cronbach's alpha
3. Alternate forms reliability
4. Scorer reliability
5. ANOVA
TEST-RETEST RELIABILITY
“ A measure of a test’s reliability or more specifically its stability, based on the
correlation between scores of a group of respondents on two separate occasions.”
- Oxford dictionary of Psychology.
In test-retest reliability, a single form of the test is administered twice to the same sample with a reasonable time gap. In this way, two administrations of the same test yield two independent sets of scores. The two sets, when correlated, give the value of the reliability coefficient, which is also known as the temporal stability coefficient and indicates to what extent the examinees retain their relative position, as measured in terms of the test score, over a given period of time.
Criteria            Administration 1    Administration 2
High reliability    high                high
                    low                 low
Low reliability     high                low
                    low                 high
So, what is a ‘reasonable’ time gap between two administrations of the test?
When the time gap is too short, it is likely to inflate the reliability coefficient due to carry-over and practice effects; on the other hand, if the time gap is too long, it is likely to lower the reliability coefficient.
The most appropriate and convenient time gap between the two administrations is a fortnight, which is considered neither too long nor too short. There is evidence to support that this time interval yields a comparatively higher reliability coefficient.
● The test-retest method involves
● (1) administering a test to a group of individuals
● (2) readministering that same test to the same group at some later time, and
● (3) correlating the first set of scores with the second.
The correlation between scores on the first test and scores on the retest is used to
estimate the reliability of the test.
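A minimal sketch of this computation is given below; the scores are invented for illustration, and any routine for the Pearson product-moment correlation would serve equally well.

```python
import numpy as np

# Hypothetical scores of the same six examinees on two occasions
administration_1 = np.array([23, 31, 27, 35, 19, 29])
administration_2 = np.array([25, 30, 28, 36, 18, 27])

# Pearson correlation between the two administrations = temporal stability coefficient
r_test_retest = np.corrcoef(administration_1, administration_2)[0, 1]
print(round(r_test_retest, 3))
```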
● Since the same test is administered twice and every test is parallel with itself, differences between scores on the test and scores on the retest should be due solely to measurement error.
● This argument is often inappropriate for psychological measurement, since it is often
impossible to consider the second administration of a test a parallel measure to the first.
● Thus, it may be inaccurate to treat a test-retest correlation as a measure of reliability.
● The second administration of a psychological test might yield systematically different
scores than the first administration for several reasons.
● First, the characteristic or attribute that is being measured may change between the first
test and the retest.
● Second, the experience of taking the test itself can change a person’s true score; this is
referred to as reactivity.
● Third, one must be concerned with carryover effects, particularly if the interval between
test and retest is short.
● When retested, people may remember their original answers, which could affect their
answers the second time around.
Advantages of test retest reliability
⮚It is the most appropriate method of estimating reliability of both the speed test and the
power test.
⮚In case of heterogeneous tests, the test retest method has proved to be most suitable.
⮚The test-retest method is most useful when one is interested in the long-term stability of a
measure. For example, research on the accuracy of personnel selection tests is concerned
with the ability of the test to predict long-term job performance.
Disadvantages of test retest reliability
Contributing to error variance:
⮚Highly time consuming
⮚It assumes that the examinee and examiner’s physical and mental set up remains unchanged over
time.
⮚It does not account for uncontrollable environmental changes that may take place during either
administration
⮚Maturational effects.
Therefore the source of error variance in this method is time sampling.
GULLIKSEN (1950) has defined parallel tests as tests having equal means, equal variances and equal inter-item correlations.
FREEMAN (1962) has listed the following criteria for judging whether or not
the two forms of the test are parallel:
1. The number of items in each should be the same.
2. Items in both should have uniformity regarding the content, the range of
difficulty and the adequacy of sampling.
3. Distribution of the indices of difficulty of items in both should be similar.
4. Items in both should have an equal degree of homogeneity, which can be shown either by inter-item correlation or by correlating each item with subtest scores or total test scores.
5. Means and standard deviations of both the forms should be equal or nearly
so.
6. Mode of administration and scoring of both should be uniform.
SPLIT HALF METHOD
In this, a test is given and divided into two halves that are scored separately. The
results of one half of the test are then compared with the results of the other. The two halves of the test can be created in a variety of ways, the most common of which is the odd-even method, whereby one set of scores is obtained for the odd-numbered items in the test and another set for the even-numbered items.
Split-half methods of estimating reliability provide a simple solution to the two
practical problems that plague the alternate forms method:
(1) the difficulty in developing alternate forms and
(2) the need for two separate test administrations.
● The simplest way to create two alternate forms of a test is to split the existing test
in half and use the two halves as alternate forms.
● The split-half method of estimating reliability thus involves
(1) administering a test to a group of individuals,
(2) splitting the test in half, and
(3) correlating scores on one half of the test with scores on the other half. The
correlation between these two split halves is used in estimating the reliability of the
test.
ESTIMATING RELIABILITY BY THE SPLIT-HALF METHOD
Two sets of scores (odd and even) → product-moment correlation → reliability of the half test → reliability of the whole test (Spearman-Brown prophecy formula).
Spearman-Brown Prophecy Formula:
Reliability of whole test = (2 × reliability of half test) / (1 + reliability of half test)
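A minimal sketch of the whole odd-even procedure, including the Spearman-Brown correction; the 0/1 item responses below are invented for illustration.

```python
import numpy as np

# Hypothetical 0/1 item responses: rows = examinees, columns = items
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # scores on items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)   # scores on items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]   # reliability of the half test
r_whole = (2 * r_half) / (1 + r_half)             # Spearman-Brown prophecy formula
print(round(r_half, 3), round(r_whole, 3))
```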
Rulon formula
From a given test four types of scores are generated: the score on the even items, the score on the odd items, a difference score (the score on odd items minus the score on even items) and the total score (odd plus even). The variance of the difference scores and the variance of the total scores are then computed and put into the Rulon formula, R = 1 - (Vd / VT). Note that if the scores on the two halves were perfectly consistent, there would be no difference between the odd-item score and the even-item score, so the variance of the difference scores would be zero and the estimated R would equal 1. The ratio of the two variances in fact reflects the proportion of error variance which, when subtracted from 1, leaves the proportion of “true” variance, i.e. the reliability.
Flanagan formula
Advantages:
•No need to calculate reliability coefficients of the two halves
•Used to compute the reliability of alternate forms of the test
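Both formulas estimate the whole-test reliability directly from variances, which is why no half-test correlation is needed. Assuming the standard textbook forms, Rulon's formula is R = 1 - (Vd / VT), where Vd is the variance of the odd-minus-even difference scores, and Flanagan's is R = 2[1 - (Va + Vb) / VT], where Va and Vb are the variances of the two halves. A sketch with hypothetical half-test scores:

```python
import numpy as np

# Hypothetical odd-item and even-item totals for six examinees
odd_half = np.array([4, 1, 4, 1, 4, 1])
even_half = np.array([3, 2, 4, 0, 3, 1])

total = odd_half + even_half
diff = odd_half - even_half

v_total = total.var(ddof=1)
v_diff = diff.var(ddof=1)
v_odd, v_even = odd_half.var(ddof=1), even_half.var(ddof=1)

r_rulon = 1 - (v_diff / v_total)                    # Rulon: R = 1 - Vd/VT
r_flanagan = 2 * (1 - (v_odd + v_even) / v_total)   # Flanagan: R = 2[1 - (Va + Vb)/VT]
print(round(r_rulon, 3), round(r_flanagan, 3))      # the two estimates coincide
```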
INTERNAL CONSISTENCY RELIABILITY
“In psychometrics, an aspect of reliability associated with the degree to which the items of a test measure
the same construct or attribute.”-Oxford dictionary of Psychology.
So, it indicates the homogeneity of the test. If all the items of the test measure the same
function or trait, the test is said to be a homogeneous one and its internal consistency
would be high. From a single administration of a single form of the test it is possible to
arrive at a measure of reliability by various procedures.
Thus, the internal consistency method involves
● (1) administering a test to a group of individuals,
● (2) computing the correlations among all items and computing the average of those
intercorrelations, and
● (3) using Formula 7 or an equivalent formula to estimate reliability.
● This formula gives a standardized estimate; raw score formulas that take into account the
variance of different test items may provide slightly different estimates of internal
consistency reliability.
● There are both mathematical and conceptual ways of demonstrating the links between
internal consistency methods and the methods of estimating reliability discussed so far.
● First, internal consistency methods are mathematically linked to the split-half method. In
particular, coefficient alpha, which represents the most widely used and most general form
of internal consistency estimate, represents the mean reliability coefficient one would
obtain from all possible split halves.
● In particular, Cortina (1993) notes that alpha is equal to the mean of the split halves defined by formulas from Rulon (1939) and J. C. Flanagan (1937). In other words, if every possible split-half reliability coefficient for a 30-item test were computed, the average of those coefficients would equal coefficient alpha.
The difference between the split-half method and the internal
consistency method is, for the most part, a difference in unit of analysis.
Split-half methods compare one half-test with another; internal
consistency estimates compare each item with every other item.
In understanding the link between internal consistency and the general
concept of reliability, it is useful to note that internal consistency
methods suggest a fairly simple answer to the question, “Why is a test
reliable?” Remember that internal consistency estimates are a function
of (1) the number of test items and
● (2) the average intercorrelation among these test items.
If we think of each test item as an observation of behavior, internal
consistency estimates suggest that reliability is a function of
● (1) the number of observations that one makes and
● (2) the extent to which each item represents an observation of the
same thing observed by other test items. For example, if you wanted
to determine how good a bowler someone was, you would obtain
more reliable information by observing the person bowl many
frames than you would by watching the person roll the ball once.
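This relationship can be made explicit with the standardized form of coefficient alpha, alpha = k * r_bar / (1 + (k - 1) * r_bar), where k is the number of items and r_bar is the average inter-item correlation. The sketch below assumes an average inter-item correlation of 0.20, an arbitrary illustrative value.

```python
def standardized_alpha(n_items: int, avg_r: float) -> float:
    """Standardized coefficient alpha from the number of items
    and the average inter-item correlation."""
    return (n_items * avg_r) / (1 + (n_items - 1) * avg_r)

# Assumed average inter-item correlation of 0.20
for k in (1, 5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.20), 2))
# Reliability rises from 0.20 with one item to about 0.91 with forty items
```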
Kuder-Richardson formulae
KUDER and RICHARDSON (1937) carried out a series of studies to remove some of the difficulties of the split-half method. They devised their own formulae for estimating the internal consistency of a test:
KR20 is the basic formula for computing the reliability coefficient and KR21 is a modified form of KR20.
These techniques are based on an examination of performance on each item instead of two half scores.
Mathematically, the KR reliability coefficient is actually the mean of all split-half coefficients resulting from the different splittings of a test (Cronbach, 1951).
The KR formulae are applicable to tests whose items are scored as either 0 or +1 (wrong or right) or according to some other all-or-none system. Some tests, however, may have multiple-scored items, e.g. a personality inventory.
For calculating the reliability of such tests a generalized formula, i.e. Cronbach's alpha, is used. The sources of error variance here are content sampling and content heterogeneity.
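A minimal sketch of both computations, assuming the usual forms KR20 = [k/(k-1)] * [1 - Σpq / VT] and alpha = [k/(k-1)] * [1 - ΣVi / VT]. The response matrix is invented for illustration; for items scored 0/1 the two formulas give the same value.

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores
    (rows = examinees, columns = items)."""
    k = items.shape[1]
    p = items.mean(axis=0)                 # proportion passing each item
    q = 1 - p
    var_total = items.sum(axis=1).var()    # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / var_total)

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha: the generalization of KR20 to items
    that are not scored in an all-or-none fashion."""
    k = items.shape[1]
    item_vars = items.var(axis=0).sum()
    var_total = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - item_vars / var_total)

# Hypothetical dichotomous data: the two estimates coincide
data = np.array([[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1],
                 [0, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0]])
print(round(kr20(data), 3), round(cronbach_alpha(data), 3))
```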
SCORER RELIABILITY
It is the reliability which can be estimated by having a sample of tests independently scored by two or more examiners or scorers. The two sets of scores obtained from the two examiners are then correlated in the usual way, and the resulting correlation coefficient is known as scorer reliability. This method is most appropriate for tests where the judgment of the scorer is required, such as tests of creativity and projective tests. The source of error variance in scorer reliability is interscorer differences.
● Both the split-half and the internal consistency methods define measurement error
strictly in terms of consistency or inconsistency in the content of a test.
● Test-retest and alternate forms methods, both of which require two test
administrations, define measurement error in terms of three general factors:
● (1) the consistency or inconsistency of test content (in the test-retest method,
content is always consistent);
● (2) changes in examinees over time; and
● (3) the effects of the first test on responses to the second test. Thus, although each
method is concerned with reliability, each defines true score and error in a
somewhat different fashion.
● Schmidt, Le, and Ilies (2003) note that all of these sources of error can operate
simultaneously, and that their combined effects can be substantially larger than
reliability estimates that take only one or two effects into account.
● The principal advantage of internal consistency methods is their practicality.
● Since only one test administration is required, it is possible to estimate internal
consistency reliability every time the test is given.
● Although split-half methods can be computationally simpler, the widespread
availability of computers makes it easy to compute coefficient alpha, regardless of
the test length or the number of examinees. It therefore is possible to compute
coefficient alpha whenever a test is used in a new situation or population.
THE GENERALIZABILITY OF TEST SCORES
Reliability theory tends to classify all the factors that may affect test scores into two
components, true scores and random errors of measurement.
Although this sort of broad classification may be useful for studying physical measurements, it
is not necessarily the most useful way of thinking about psychological measurement
(Lumsden, 1976).
We typically think of the reliability coefficient as a ratio of true score to true score plus error,
but if the makeup of the true score and error parts of a measure change when we change our
estimation procedures, something is seriously wrong.
In reliability theory, the central question is how much random error there is in our measures.
In generalizability theory, the focus is on our ability to generalize from one set of measures
(e.g., a score on an essay test) to a set of other plausible measures (e.g., the same essay graded
by other teachers).
The central question in generalizability theory concerns the conditions over which one can generalize, or under what sorts of conditions we would expect results that are either similar to or different from those obtained here. Generalizability theory attacks this question by estimating how much of the variance in test scores is attributable to each of several plausible sources; this approach is also referred to as G theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; Shavelson & Webb, 1991). Generalizability theory is an extension of classical
test theory that uses analysis of variance (ANOVA) methods to
evaluate the combined effects of multiple sources of error variance on
test scores simultaneously. A distinct advantage that G theory has—
compared to the method for combining reliability estimates —is that
it also allows for the evaluation of the interaction effects from
different types of error sources. Thus, it is a more thorough procedure
for identifying the error variance component that may enter into
scores. On the other hand, in order to apply the experimental designs
that G theory requires, it is necessary to obtain multiple observations
for the same group of individuals on all the independent variables that
might contribute to error variance on a given test (e.g., scores across
occasions, across scorers, across alternate forms, etc.).
On the whole, however, when this is feasible, the results provide a
better estimate of score reliability than the approaches described
earlier.
ANOVA
Analysis of variance as a technique for estimating reliability has been used by HOYT (1941), JACKSON (1939) and ALEXANDER (1947). Four assumptions given by HOYT need to be kept in mind when using the ANOVA technique:
1. The total score of an examinee on a test can be divided into 4
independent components:
• A component which is common to all examinees and to
all items on the test
• A component associated with items only
• A component associated with examinees only
• The error component independent of the first three factors
2. The variance of the error component is equal for every item.
3. The error component for each item is symmetrical and normally distributed.
4. The error components of any two distinct items are independent.
Disadvantage: it cannot be used with a test where speed is an important factor.
Advantage: the ANOVA approach can be applied to data obtained from alternate forms and test-retest administrations.
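A minimal sketch of the persons-by-items ANOVA decomposition that Hoyt's method uses; the score matrix is invented for illustration. The reliability estimate is (MSpersons - MSresidual) / MSpersons, which for this design is algebraically equivalent to coefficient alpha.

```python
import numpy as np

def hoyt_reliability(scores: np.ndarray) -> float:
    """Hoyt's ANOVA estimate of reliability for a persons-by-items
    score matrix (rows = examinees, columns = items)."""
    n, k = scores.shape
    grand = scores.mean()

    ss_persons = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_items = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_persons - ms_residual) / ms_persons

# Hypothetical item scores for six examinees on four items
data = np.array([[3, 4, 3, 5], [2, 2, 1, 3], [4, 5, 4, 5],
                 [1, 2, 1, 2], [3, 3, 4, 4], [2, 1, 2, 2]])
print(round(hoyt_reliability(data), 3))
```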
The central assumption of reliability theory is that measurement
errors are essentially random.
This does not mean that errors arise from random or mysterious processes; rather, across a large number of individuals, the causes of measurement error are assumed to be so varied and complex that measurement errors act as random variables.
Thus, a theory that assumes that measurement errors are essentially
random may provide a pretty good description of their effects.
If errors have the essential characteristics of random variables, then it
is reasonable to assume that errors are equally likely to be positive or
negative and that they are not correlated with true scores or with
errors on other tests.
That is, it is assumed that
1. The mean error of measurement = 0
2. True scores and errors are uncorrelated
3. Errors on different measures are uncorrelated. On the basis of these
three assumptions, an extensive theory of test reliability has been
developed (Gulliksen, 1950; F. M. Lord & Novick, 1968).
FACTORS INFLUENCING RELIABILITY
EXTRINSIC FACTORS
Factors that lie outside the test itself and tend to make the test reliable or unreliable:
a) Group variability: when the group of examinees being tested is homogeneous in ability, the reliability of the test scores is likely to be lower. Only when there is some variability in the group are correlation and reliability possible.
b) Environmental conditions: the testing environment should be uniform. Arrangements should
be such that light, sound, and other comforts are equal and uniform to all examinees
otherwise it will tend to lower the reliability coefficient.
c) Momentary fluctuations in the examinee: E.g. a broken pencil, changes in anxiety level,
motivation, distraction.
d) Guessing by the examinees: this has two important effects upon the total test scores:
- it tends to raise the total score, making the reliability coefficient very high;
- it contributes to the measurement error, since examinees differ in their luck in guessing the correct answer.
E.g., true-false and multiple-choice items.
INTRINSIC FACTORS
Factors which lie within the test itself and influence the reliability of the test:
a) Range of the total scores: if the obtained total scores on the test are very close to each other
the reliability of the test is lowered.
b) Length of the test: longer tests tend to yield a higher reliability coefficient than a shorter test.
It has been demonstrated that averaging the test scores of several applications essentially
gives the same result as increasing the length of the test. When lengthening, care has to be
taken to see that added items should have the same variance and the same inter-item
correlations as the items of the original test. With the Spearman-Brown formula it is possible to estimate the length of the test required to achieve a given level of the reliability coefficient (a short sketch of this calculation follows this list).
The use of this formula makes two assumptions:
-new items added to the original test must have the same statistical properties as the original
test items, i.e. same average difficulty value and same inter item correlation.
-added items should not influence the examinee’s response.
Ebel (1972) has shown that doubling the length of a test quadruples true variance while only
doubling the error variance.
c) Homogeneity of items: this includes two things: item reliability (or inter-item correlation) and the homogeneity of the function or trait measured from one item to another. When the items measure different functions and the intercorrelations of the items are zero or near zero (a heterogeneous test), the reliability is zero or very low.
d) Difficulty value of items: items having indices of difficulty at 0.5 or close to it yield higher reliability than items with extreme indices of difficulty.
e) Discrimination value: when the test is composed of discriminating items, the inter-item correlation is likely to be high, and then the reliability is also likely to be high.
f) Scorer reliability, also known as reader reliability, means how closely two or more scorers agree in scoring or rating the same set of responses. If they do not agree, the reliability is likely to be lowered.
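A minimal sketch of the lengthening calculation mentioned under point (b) above, using the general Spearman-Brown formula; the current reliability of 0.60 and the target of 0.90 are assumed values for illustration.

```python
def spearman_brown(r: float, n: float) -> float:
    """Reliability of a test lengthened by a factor of n,
    given the reliability r of the current test."""
    return (n * r) / (1 + (n - 1) * r)

def lengthening_factor(r_current: float, r_desired: float) -> float:
    """How many times longer the test must be to reach r_desired."""
    return (r_desired * (1 - r_current)) / (r_current * (1 - r_desired))

n = lengthening_factor(0.60, 0.90)
print(round(n, 1))                        # 6.0: the test must be about six times longer
print(round(spearman_brown(0.60, n), 2))  # check: the lengthened test reaches 0.90
```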
IMPROVING RELIABILITY
The following suggestions are useful for controlling the factors that adversely affect the
reliability of the test:
1. The group of examinees should be heterogeneous, that is, they should vary widely in the ability or trait being measured.
2. Items should be homogeneous.
3. The test should preferably be a long one.
4. Items, as far as possible, should be of moderate difficulty values (0.4 to 0.6).
5. Items should be discriminatory ones.
Apart from these general suggestions there are two common approaches to improving
the reliability of the test:
•The first approach emphasizes increasing the length of the test and assumes that if new items similar to the original set of items are added, the reliability of the test will tend to increase.
•The second approach to improving reliability is to discard the items that pull down the reliability. It assumes that for increasing the reliability, it must be ensured that all items measure the same thing. Under this approach two techniques are commonly applied: factor analysis and item analysis.