Psychological Assessment

Seniors’ In-Service Training


Worksheet #6

Name: Bula, Tabac, Trocio
Date: Aug. 7, 2020

KEY TERM EXERCISE:


Reliability Coefficient: An index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.
Variance: A statistic useful in describing sources of test score variability; the standard deviation squared.
True Variance: Variance from true differences.
Error Variance: Variance from irrelevant and random sources.
Reliability: The proportion of the total variance attributed to true variance.
Measurement Error: Refers collectively to all of the factors associated with the process of measuring some variable, other than the variable being measured.
Random Error: A source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Systematic Error: A source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Item or Content Sampling: Terms that refer to variation among items within a test as well as to variation among items between tests.
Test-Retest Reliability: An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Coefficient of Stability: The test-retest reliability estimate obtained when the interval between testings is greater than six months and the subjects being measured and the measuring instrument remain precisely the same.
Coefficient of Equivalence: The reliability coefficient obtained when the degree of the relationship between various forms of a test is evaluated by means of an alternate-forms or parallel-forms estimate.
Parallel Forms: Exist when, for each form of the test, the means and the variances of observed test scores are equal.
Parallel Forms Reliability: An estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Alternate Forms: Simply different versions of a test that have been constructed so as to be parallel.
Alternate Forms Reliability: An estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
Split-Half Reliability: Obtained by correlating two pairs of scores from equivalent halves of a single test administered once.
Odd-Even Reliability: Obtained by splitting a test, assigning odd-numbered items to one half and even-numbered items to the other half.
Spearman-Brown Formula: Allows a test developer or user to estimate internal consistency reliability from the correlation of two halves of a test.
Inter-Item Consistency: The degree of correlation among all the items on a scale.
Test Homogeneity: When a test contains items that measure a single trait.
Test Heterogeneity: When a test is composed of items that measure more than one trait.
Kuder-Richardson Formula 20 (KR-20): Used for determining the inter-item consistency of dichotomous items, primarily items that can be scored right or wrong (such as multiple-choice items). If test items are more heterogeneous, KR-20 will yield lower reliability estimates than the split-half method.
Coefficient Alpha: May be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
Average Proportional Distance Method: A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
Inter-Scorer Reliability: The degree of agreement or consistency between two or more scorers with regard to a particular measure.
Coefficient of Inter-Scorer Reliability: The correlation coefficient used when determining the degree of consistency among scorers in the scoring of a test.
Dynamic Characteristic: A trait, state, or ability presumed to change as a function of situational or cognitive experiences.
Static Characteristic: A trait, state, or ability that is relatively unchanging.
Power Test: A test whose time limit is long enough to allow testtakers to attempt all items, while some items are so difficult that no testtaker is able to obtain a perfect score.
Speed Test: A test containing items of uniform level of difficulty so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
Criterion-Referenced Test: A test designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.
Classical Test Theory: Also referred to as the true score (or classical) model of measurement; built on the notion that everyone has a "true score" on a test, an idea that has had, and continues to have, great intuitive appeal.
True Score: A value that genuinely reflects an individual's ability (or trait) level as measured by a particular test.
Domain Sampling Theory: Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Generalizability Theory: Based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation.
Universe: In generalizability theory, the details of a particular test situation.
Facets: Include things like the number of items in the test, the amount of training of test scorers, and the purpose of the test administration.
Universe Score: Analogous to a true score in the true score model.
Item Response Theory: Procedures of this theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.
Latent-Trait Theory: Refers to a family of theories and methods, and quite a large family at that, with many other names used to distinguish specific approaches; a general psychometric theory contending that observed traits, such as intelligence, are reflections of more basic unobservable traits.
Discrimination: The degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
Dichotomous Test Items: Test items or questions that can be answered with only one of two alternative responses, such as true-false, yes-no, or correct-incorrect questions.
Polytomous Test Items: Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
Rasch Model: A reference to an IRT model with very specific assumptions about the underlying distribution.
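The variance terms defined above can be illustrated numerically. Below is a toy sketch with entirely hypothetical scores, assuming the classical decomposition where each observed score is a true score plus random error; the reliability coefficient is then the ratio of true variance to total variance.

```python
# Illustration with hypothetical data: reliability as the ratio of
# true-score variance to total observed-score variance.

true_scores = [80, 95, 70, 88, 77, 92, 65, 85]   # hypothetical true scores
errors      = [ 2, -3,  1, -1,  0,  2, -2,  1]   # hypothetical random errors

def variance(xs):
    """Population variance: mean squared deviation from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Observed score = true score + error.
observed = [t + e for t, e in zip(true_scores, errors)]

true_var  = variance(true_scores)
error_var = variance(errors)
total_var = variance(observed)

# When errors are uncorrelated with true scores, total variance is
# approximately true variance plus error variance, and
# reliability = true variance / total variance.
reliability = true_var / total_var
print(round(reliability, 3))  # 0.98
```

Because most of the total variance here comes from true differences between the hypothetical test takers, the reliability coefficient is close to 1.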

DISTINGUISHING BETWEEN RANDOM AND SYSTEMATIC ERRORS

Fill out the table below with your own examples of Random and Systematic Errors.

Random Errors:
- drowsiness of the test taker
- a sudden blackout occurring within the vicinity of the test venue
- hunger of the test taker
- a fire suddenly emerging from the test venue
- a tornado passing by within the vicinity of the test venue

Systematic Errors:
- using the same metal ruler in different climates (hot and cold)
- using a weighing scale that adds 11 kg every time you measure yourself
- a faulty thermometer that adds 2°C to every temperature check conducted
- the frequent occurrence of brownouts as electrical currents are consistently low
- a cloth measuring tape that has been overused, stretching 5 cm every year
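The distinction in the table can be simulated. The sketch below uses made-up numbers: random error fluctuates unpredictably around zero and averages out over many measurements, while systematic error, like the miscalibrated scale above that adds 11 kg, shifts every reading by the same amount.

```python
# Hypothetical simulation: random vs. systematic measurement error.
import random

random.seed(0)
true_weight = 60.0  # hypothetical true weight in kg

# Random error: an unpredictable fluctuation on each measurement.
random_readings = [true_weight + random.uniform(-0.5, 0.5) for _ in range(1000)]

# Systematic error: a miscalibrated scale adds a constant 11 kg every time.
systematic_readings = [true_weight + 11.0 for _ in range(1000)]

mean_random = sum(random_readings) / len(random_readings)
mean_systematic = sum(systematic_readings) / len(systematic_readings)

# Random errors average out toward zero; systematic errors do not.
print(round(mean_random - true_weight, 2))  # near 0
print(mean_systematic - true_weight)        # exactly 11.0
```

This is why systematic errors bias scores without necessarily harming consistency, whereas random errors reduce consistency from one measurement to the next.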

Fill out the table with characteristics of Parallel and Alternate Forms to reflect how they are similar
and how they are different in terms of definitions, descriptions, characteristics.

Parallel and Alternate Forms:
- The degree of the relationship between various forms of a test can be evaluated by this means.
- The alternate-forms or parallel-forms coefficient of reliability is often termed the coefficient of equivalence.

Similarities:
- Two test administrations with the same group are required.
- Test scores may be affected by factors such as motivation, fatigue, or intervening events such as practice, learning, or therapy (although not as much as when the same test is administered twice).
- Certain traits are presumed to be relatively stable in people over time, and we would expect tests measuring those traits, whether alternate forms, parallel forms, or otherwise, to reflect that stability.
- An additional source of error variance, item sampling, is inherent in the computation of an alternate- or parallel-forms reliability coefficient.
- Both are advantageous to the test user in that they minimize the effect of memory for the content of a previously administered form of the test.

Differences:
- Parallel forms of a test exist when, for each form of the test, the means and the variances of observed test scores are equal.
- The means of scores obtained on parallel forms correlate equally with the true score.
- Scores obtained on parallel forms correlate equally with other measures.
- Parallel forms reliability refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
- Alternate forms are designed to be equivalent with respect to variables such as content and level of difficulty.
- Alternate forms reliability refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.

TESTING YOUR UNDERSTANDING


Indicate whether the statement is True (T) or False (F)

1. The greater the proportion of the total variance attributed to true variance, the more reliable
the test. T
2. True differences are not presumed to yield consistent scores on repeated administrations of
the same test. F
3. Error variance can affect the reliability of a test. T
4. Systematic Errors affect score consistency. F
5. A challenge faced by a test developer is minimizing the proportion of the total variance that is
true variance. F
6. Test-Retest reliability is suitable for a test that measures a construct that is relatively stable
over time. T
7. Reliability always increases as test length increases. T
8. A measure of inter-item consistency is calculated from multiple administrations of a single
form of a test. F
9. The more homogeneous a test is, the more inter-item consistency it can be expected to have.
T
10. Where items are highly homogeneous, KR-20 and split-half reliability estimates will be
similar. T

SOURCES OF ERROR VARIANCE


Fill out the table by giving two examples for each of the possible sources of error variance that
are different from the ones provided in the book.

Test Construction:
- When evaluating the homogeneity of a measure (or, whether all items are tapping a single construct).
- If test questions are difficult, confusing, or ambiguous, reliability is negatively affected; some people read a question to mean one thing, whereas others read the same question to mean something else.

Test Administration:
- When assessing the stability of various personality traits.
- Instructions that interfere with accurately gathering information (such as a time limit when the measure the test is seeking has nothing to do with speed) reduce the reliability of a test.

Test Scoring:
- Results that are very far from the true score of the test taker.
- Error due to variation in the setting of the workpiece and the instrument.

Test Interpretation:
- Making unsupported conclusions in terms of test results.
- Coding of behavior.
LEARNING MORE ABOUT THE SPEARMAN BROWN FORMULA

In the table below, list what you have learned about the Spearman-Brown formula and its usefulness in testing.

- It enables a test developer or user to estimate internal consistency reliability from the correlation of two halves of a test.
- When one wants to shorten the length of a test, the formula can be utilized to calculate the effect of the shortening on the test's reliability.
- It can determine how many items are needed in order for the test to reach a desired level of reliability.
- The formula is also used to estimate whether newly added items will increase the reliability of the test.
- It is used to determine how homogeneous the items in a test are.
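The points above can be sketched in code. The Spearman-Brown formula is r_sb = n * r / (1 + (n - 1) * r), where r is the observed correlation (for example, between two test halves) and n is the factor by which the test is lengthened or shortened; the example values below are hypothetical.

```python
def spearman_brown(r, n):
    """Predicted reliability when a test is changed to n times its length."""
    return (n * r) / (1 + (n - 1) * r)

# A half-test correlation of .70 predicts the full-length reliability
# (this is the split-half correction mentioned in the worksheet).
full = spearman_brown(0.70, 2)
print(round(full, 3))  # 0.824

# Shortening a test to half its length lowers the predicted reliability.
short = spearman_brown(0.90, 0.5)
print(round(short, 3))  # 0.818
```

Note how doubling a test raises the predicted reliability above the half-test correlation, while halving it lowers the prediction, which is exactly why the formula is useful when deciding how many items a test needs.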
THE COEFFICIENT ALPHA, THE AVERAGE PROPORTIONAL DISTANCE, AND
THE KR-20

In the table below, list the definitions, similarities, and differences of the three methods
of estimating internal consistency

Coefficient Alpha:
- May be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula.
- Appropriate for use on tests containing non-dichotomous items; in contrast, KR-20 is appropriately used only on tests with dichotomous items.
- The preferred statistic for obtaining an estimate of internal consistency reliability.
- Widely used as a measure of reliability, in part because it requires only one administration of the test.
- Ranges in value from 0 to 1 and is calculated to help answer questions about how similar sets of data are (0 = absolutely no similarity; 1 = perfectly identical). In contrast to coefficient alpha, a Pearson r may be thought of as dealing conceptually with both dissimilarity and similarity.

Average Proportional Distance:
- A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
- Step 1: Calculate the absolute difference between scores for all of the items. Step 2: Average the differences between scores. Step 3: Obtain the APD by dividing the average difference between scores by the number of response options on the test, minus 1.
- The general rule of thumb for interpreting the APD: a value of .2 or lower is indicative of excellent internal consistency; a value of .2 to .25 is in the acceptable range; a value above .25 is suggestive of problems with the internal consistency of the test.
- One potential advantage of the APD method over Cronbach's alpha is that the APD index is not connected to the number of items on a measure; Cronbach's alpha will be higher when a measure has more than 25 items.

Kuder-Richardson 20:
- Used for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong (such as multiple-choice items).
- Where test items are highly homogeneous, KR-20 and split-half reliability estimates will be similar.
- The most widely used adaptation of the KR-20 is a statistic called coefficient alpha (coefficient α-20).
- Used for items that have varying difficulty; for example, some items might be very easy, others more challenging. It should only be used if there is a correct answer for each question; it should not be used for questions where partial credit is possible or for scales like the Likert scale.
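The relationship between coefficient alpha and KR-20 described above can be sketched in code. This is an illustration on made-up dichotomous (right/wrong) item data, not part of the worksheet; it uses the standard formulas, alpha = k/(k-1) * (1 - sum of item variances / total-score variance), with KR-20 substituting p*q for each item variance. For 0/1 items the two coincide.

```python
# Hypothetical data: rows are test takers, columns are 0/1 item scores.
data = [
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def coefficient_alpha(rows):
    k = len(rows[0])                              # number of items
    items = list(zip(*rows))                      # columns = item score lists
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

def kr20(rows):
    k = len(rows[0])
    items = list(zip(*rows))
    # For a dichotomous item, the variance equals p * q
    # (proportion answering right times proportion answering wrong).
    pq = sum((sum(c) / len(c)) * (1 - sum(c) / len(c)) for c in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - pq / total_var)

# On dichotomous data the two formulas agree.
print(round(coefficient_alpha(data), 3) == round(kr20(data), 3))  # True
```

On non-dichotomous (for example, Likert-type) data only the alpha function applies, which mirrors the contrast drawn in the table.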

SUMMARIZING WHAT YOU HAVE LEARNED ABOUT THE COEFFICIENTS OF RELIABILITY

Fill out the missing information in the table below.

Test-Retest
- Purpose: to review how stable a measure is
- Typical Uses: when assessing the stability of various personality traits
- Number of Testing Sessions: 2
- Sources of Error Variance: administration
- Statistical Procedures: Pearson r or Spearman rho

Alternate-Forms
- Purpose: to assess the relationship between the different forms of a measure
- Typical Uses: when there are alternatives for a certain test
- Number of Testing Sessions: 1 or 2
- Sources of Error Variance: test construction or administration
- Statistical Procedures: Pearson r or Spearman rho

Internal Consistency
- Purpose: to gauge how consistently the items of a measure relate to one another
- Typical Uses: when evaluating the homogeneity of a measure (or, whether all items are tapping a single construct)
- Number of Testing Sessions: 1
- Sources of Error Variance: test construction
- Statistical Procedures: Pearson r between test halves with the Spearman-Brown correction, Kuder-Richardson (dichotomous items), or coefficient alpha for multipoint items

Inter-Scorer
- Purpose: to evaluate the level of agreement between raters on a measure
- Typical Uses: when behavior is being coded and observed, to see how different raters rate a certain behavior pattern
- Number of Testing Sessions: 1
- Sources of Error Variance: scoring and interpretation
- Statistical Procedures: Cohen's kappa, Pearson r, or Spearman rho
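As the Statistical Procedures column shows, test-retest and alternate-forms estimates come down to correlating two sets of scores from the same group. Below is a minimal sketch with hypothetical scores, computing the Pearson r by hand rather than with a statistics library.

```python
# Test-retest reliability as the Pearson r between two administrations
# of the same test to the same (hypothetical) group of eight test takers.

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of SDs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

first_admin  = [12, 15, 9, 18, 11, 14, 16, 10]   # hypothetical scores, time 1
second_admin = [13, 14, 10, 17, 12, 15, 15, 9]   # hypothetical scores, time 2

r = pearson_r(first_admin, second_admin)
print(round(r, 3))
```

A high r here would indicate a stable measure; for a dynamic characteristic, a low test-retest r would be expected even from a good test, which is why the choice of coefficient depends on the nature of the trait.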

NATURE OF TESTS

Different kinds of reliability coefficients are used depending on the nature of the tests. For each characteristic, indicate the type of reliability coefficient you would use or what we would expect to see in terms of reliability.

- Homogeneous: high degree of internal consistency
- Heterogeneous: low degree of internal consistency
- Dynamic Trait: internal consistency
- Static Trait: test-retest or alternate forms
- Restricted Range: correlation coefficient is low
- Inflated Range: correlation coefficient is higher
- Criterion-Referenced: traditional ways of estimating reliability are not always appropriate for criterion-referenced tests and may vary with the variability of the test scores; the critical issue for the user of a mastery test is whether or not a certain criterion score has been achieved
- Norm-Referenced: traditional ways of estimating reliability are appropriate (e.g., test-retest, equivalent forms, etc.)
- Power Tests: mean, standard deviations, number of items
- Speed Tests: two independent testing periods, using any one of the following: test-retest reliability, alternate-forms reliability, or split-half reliability from two separately timed half tests. If a speed test is administered once and some measure of internal consistency, such as the Kuder-Richardson or a split-half correlation, is calculated, the result will be a spuriously high reliability coefficient.
COMPARING THEORIES

Fill out the table below with information that you can use to compare and contrast theories related to testing.

Classical Test Theory:
- Simple; gives the notion that everyone has a "true score" on a test.
- Its assumptions allow for its application in most situations; that they are easily met, and therefore applicable to so many measurement situations, can be advantageous, especially for the test developer in search of an appropriate model of measurement for a particular application.
- In psychometric parlance, CTT is considered weak compared to IRT, which has assumptions that are difficult to meet.
- Compatible and easy to use with widely used statistical techniques (as well as most currently available data analysis software).

Domain Sampling Theory:
- Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
- Reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample.
- Items in the domain are thought to have the same means and variances as those in the test that samples from the domain.
- Of the three types of estimates of reliability, measures of internal consistency are perhaps the most compatible with domain sampling theory.

Generalizability Theory:
- A "universe score," which is analogous to a true score, replaces that of a "true score."
- Given the exact same conditions of all the facets in the universe, the exact same test score should be obtained; based on the idea that a person's test scores vary from testing to testing because of variables in the testing situation.
- A test's reliability is very much a function of the circumstances under which the test is developed, administered, and interpreted.
- Tests should be developed with the aid of a generalizability study, where scores are examined for how generalizable they are if the test is administered in different situations, and then a decision study, where developers examine the usefulness of test scores in helping the test user make decisions.

Item Response Theory:
- Models the probability that a person with X amount of a particular personality trait will exhibit Y amount of that trait on a personality test designed to measure it.
- A synonym is latent-trait theory.
- Assumptions are made about the frequency distribution of test scores.
- Refers to a family of theories and methods, and quite a large family at that, with many other names used to distinguish specific approaches.
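The IRT idea in the last column can be made concrete. Under the one-parameter (Rasch) model, the probability that a person with ability theta answers an item of difficulty b correctly is exp(theta - b) / (1 + exp(theta - b)); the ability and difficulty values below are hypothetical.

```python
import math

def rasch_probability(theta, b):
    """P(correct) for a person of ability theta on an item of difficulty b."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

# When ability equals item difficulty, the probability of success is .50.
print(rasch_probability(1.0, 1.0))            # 0.5

# Ability one unit above the item's difficulty raises the probability.
print(round(rasch_probability(2.0, 1.0), 3))  # 0.731
```

This also illustrates discrimination in the IRT sense: as theta moves above or below b, the item separates people with higher levels of the trait from those with lower levels.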

