Psych Assessment
5. Various Sources of Errors are Part of the Assessment

Error – the assumption that factors other than what is intended to be measured will influence the result of the test.

Error Variance – unpredictable variation in test results due to uncontrolled factors that are not being measured by the study.

RELIABILITY
A test should produce consistent scores in measuring the same variable after being taken multiple times.

VALIDITY
The accuracy of a test; whether it measures what it intends to measure, ensuring that the results can be used to make accurate conclusions and predictions.
The standard set on a standardized test that helps interpret an individual's test results by comparing them to a larger group.

A group of people after second tryouts who are then used as norms during test score comparison.

The process of creating norms by administering the test to a representative sample for the purpose of establishing norms.

The representative average score based on the performance of a specific group of test takers who have already taken the test.

The process of selecting a portion of a group deemed to be representative of the whole population.
2. Cluster Sampling – selecting a group and including all of its members, commonly divided by geography.
3. Purposive Sampling – selecting a sample which we believe is representative of a certain population.
4. Incidental Sampling – selecting a sample because of convenience.
National Norms Vs. National Anchor Norms
National Norms – the average scores on a single test from a group of people in a country, used to compare individual scores.
National Anchor Norms – the common national standard used to compare individual scores from different tests in a country.
Subgroup Norms Vs. Local Norms
Subgroup Norms – a standardized reference used to compare the test scores of people based on specific traits that were initially used as criteria for selecting subjects for the sample.
Local Norms – a standardized reference used to compare the test scores of people coming from the same geographical location.
Criterion Referenced Testing and Assessment Vs. Norm Referenced Testing and
Assessment
Criterion-Referenced – derives meaning from a test by evaluating it on the basis of whether it has met a certain criterion created by experts. Also referred to as content-referenced testing and assessment.

Norm-Referenced – a set of standards derived from the performance of an already established normative sample, used as a reference to see how an individual performed relative to the normative sample's performance.
RELIABILITY ESTIMATES

Results for this test may encounter error due to factors such as practice and fatigue.

Coefficient of Equivalence – an estimate of the degree of relationship between different forms of a test.

2. Test of Homogeneity – ensures that all items measure a single trait or construct.

RESTRICTION OR INFLATION OF RANGE

1. Restriction of Range – when the range of scores in the sample is restricted, correlation and reliability coefficients decrease.

PURPOSE OF THE RELIABILITY COEFFICIENT

Not all reliability coefficients reflect the same sources of error variance.

The better the sample represents the full domain, the more accurate it is.

1. Confidence Interval – a range of test scores which likely contains a person's true score.
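A confidence interval around an observed score is commonly built from the standard error of measurement, SEM = SD × √(1 − reliability). The sketch below assumes illustrative values for the observed score, standard deviation, and reliability coefficient; it is not taken from the notes.

```python
import math

def true_score_interval(observed, sd, reliability, z=1.96):
    """Approximate a 95% confidence interval for a true score
    using the standard error of measurement (SEM)."""
    sem = sd * math.sqrt(1 - reliability)          # SEM = SD * sqrt(1 - r)
    return observed - z * sem, observed + z * sem  # range likely containing the true score

# Illustrative values: observed score of 110, SD = 15, reliability = .90
low, high = true_score_interval(110, 15, 0.90)
print(f"95% confidence interval: {low:.1f} to {high:.1f}")
```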
VALIDITY

Something is valid if it is grounded in evidence.

Validation – the process of gathering and evaluating evidence about validity.

Culture and Relativity of Content – some tests may be valid for some cultures, but not for others.

If the correlation is high, your test is valid.
1. Validation Studies – conducted by test developers with their own group of test takers.
2. Local Validation Studies – used when a test user plans to alter a test in terms of format, language, instructions, and so on, in ways that make it more suitable for the cultural background of the test takers.
3. Face Validity – if a test appears to measure what it intends to measure; judging the book by its cover.
4. Content Validity – the subsurface, representing the items of the psychological test; whether the items reflect the construct being studied.

Construct – the unobservable traits that a test developer may invoke to describe test behavior or criterion performance.

1. Incremental Validity – the test provides new information beyond what it already intends to measure or predicts about future performance.
2. Validity Coefficient – measures the correlation between the test score and the criterion measure's score (see the sketch after this list).
3. Concurrent Validity – the test and the criterion measure are taken at the same time and show the same result.
4. Predictive Validity – measures the degree to which a test score predicts a criterion measure obtained at a later time.

Base Rate – a group of people who…

Discriminant Evidence – obtained by comparing a test's results to other tests that measure unrelated concepts.
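As a concrete illustration of the validity coefficient mentioned above, the sketch below computes the Pearson correlation between test scores and criterion scores; the data are invented for illustration only.

```python
import numpy as np

# Hypothetical test scores and criterion scores for the same ten people
test_scores = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
criterion   = np.array([30, 35, 22, 44, 40, 28, 33, 41, 25, 37])

# The validity coefficient is the correlation between test and criterion scores
validity_coefficient = np.corrcoef(test_scores, criterion)[0, 1]
print(f"Validity coefficient: {validity_coefficient:.2f}")
```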
EVIDENCE OF CONSTRUCT VALIDITY

1. Evidence of Homogeneity – how uniform a test is in measuring one single concept.

8. Exploratory Factor Analysis Vs. Confirmatory Factor Analysis

Exploratory Factor Analysis – clustering all items related to each other; there is no set of data yet, and the items are still in the process of categorization.
Confirmatory Factor Analysis – confirming a set of data.
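One way to run an exploratory factor analysis in code is with scikit-learn's FactorAnalysis. The item responses below are randomly generated and the choice of two factors is an illustrative assumption, not something taken from the notes.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated responses of 200 people to 6 items (purely illustrative data)
rng = np.random.default_rng(0)
responses = rng.normal(size=(200, 6))

# Exploratory step: let the model estimate 2 underlying factors and inspect loadings
efa = FactorAnalysis(n_components=2, random_state=0)
efa.fit(responses)
print(efa.components_)  # rows = factors, columns = item loadings
```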
TEST UTILITY

Also referred to as the practical value of a test. The more valid a test is, the more useful it is likely to be; however, there are exceptions to this, for there are many variables that could affect a test's utility. Hence, valid tests are not always useful.

Costs include purchasing a test, a supply bank of test…

Utility Analysis – an analysis of the usefulness and practical value of a tool of assessment; evaluating whether the benefits outweigh the cost of using a certain psychological tool.
4. The Brogden-Cronbach-Gleser Formula – calculates the dollar amount of a utility gain.
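A minimal sketch of the utility-gain calculation in the form this formula is commonly presented, gain = (N)(T)(r)(SDy)(Z̄) − (N)(C). The variable names and the example values are illustrative assumptions, not figures from the notes.

```python
def utility_gain(n, tenure_years, validity, sd_dollars, mean_z_selected, cost_per_person):
    """Dollar utility gain in the commonly cited Brogden-Cronbach-Gleser form:
    gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C). All example values are illustrative."""
    benefit = n * tenure_years * validity * sd_dollars * mean_z_selected
    cost = n * cost_per_person  # cost of testing
    return benefit - cost

# Example: 10 people selected, 2-year tenure, validity .40,
# SD of job performance in dollars $10,000, mean z-score of those selected 1.0,
# $400 testing cost per person
print(utility_gain(10, 2, 0.40, 10_000, 1.0, 400))
```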
2. The Complexity of the Job – the same utility models are used for a variety of positions; however, the difficulty of the job affects how well people perform.

The cut scores in use are the following:
Relative Cut Score
Norm-Referenced Cut Score
Fixed Cut Score
Multiple Cut Score
Compensatory Model of Selection

1. The Angoff Method – relies on judgements of experts which are averaged to yield cut scores for the test. It considers the difficulty of each question and the test taker's ability in setting a fair and accurate cut score.
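A minimal sketch of how Angoff-style judgements could be turned into a cut score: each judge estimates, for every item, the probability that a minimally competent test taker would answer it correctly, the estimates are averaged per item, and the averages are summed. The judge ratings below are invented.

```python
# Rows = expert judges, columns = items; each entry is the judged probability that a
# minimally competent test taker answers the item correctly (illustrative numbers)
ratings = [
    [0.6, 0.8, 0.5, 0.9],  # judge 1
    [0.7, 0.7, 0.4, 0.8],  # judge 2
    [0.5, 0.9, 0.6, 0.9],  # judge 3
]

n_judges = len(ratings)
# Average the judgements for each item, then sum across items to get the cut score
item_means = [sum(col) / n_judges for col in zip(*ratings)]
cut_score = sum(item_means)
print(f"Angoff cut score: {cut_score:.1f} out of {len(item_means)} items")
```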
Tests are not just papers with carefully worded questions; a test is a scientific product involving human and economic resources, hence psychological tools are expensive.

TEST DEVELOPMENT PROCESS

1. Test Conceptualization – includes answering the following questions:
What is the test designed to measure?
What is the test's objective?
Is there a need for the test?
Who will take this test?
What content does the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training does the test user need to have to administer or interpret the test?
What types of response will be required from the test takers?

1.1. Pilot Work – refers to the preliminary research surrounding the creation of a prototype of the test; a test item is pilot studied to determine whether it should be included in the instrument's final form.

A test developer uses pilot work to determine how best to measure a targeted construct.

The process may include literature reviews and experimentation, as well as the creation, revision, and deletion of preliminary items.

2. Test Construction
Scaling – the process of setting rules for assigning numbers in measurement; the process by which a measuring device is calibrated so that numbers (scale values) are assigned to different amounts of the trait, attribute, or characteristic being measured.
Types of Scales
o Nominal, Ordinal, Interval, or Ratio (NOIR)
o Age-based Scale – if the test taker's performance as a function of age is of critical interest.
o Grade-based Scale – if the test taker's performance as a function of grade is of critical interest.

SCALING METHOD

1. Rating Scale
Likert Scale – presents five alternative responses (sometimes seven), usually on an agree-disagree or approve-disapprove continuum (see the scoring sketch after this list).
Paired Comparison – involves presenting pairs of stimuli to the test takers, which they are asked to compare; they must select one stimulus according to a rule.

2. Writing Items
Item Pool – the reservoir from which items will or will not be drawn for the final version of the test.

When constructing multiple-choice items, it is advised that the first draft contains double the number of items that the final version of the test will contain.

1. What range of content should the items cover?
2. Which of many different types of item formats should be employed?
3. How many items should be written in total and for each content area covered?
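Returning to the Likert Scale above, here is a minimal sketch of summing responses on a five-point agree-disagree continuum, assuming some items are reverse-keyed; the item keys and responses are invented for illustration.

```python
# Responses on a 1-5 agree-disagree continuum for one test taker (illustrative)
responses = [4, 2, 5, 1, 3]
# True marks items worded in the opposite direction, which must be reverse-scored
reverse_keyed = [False, True, False, True, False]

scored = [(6 - r) if rev else r           # 1<->5, 2<->4 on a five-point scale
          for r, rev in zip(responses, reverse_keyed)]
total = sum(scored)
print(f"Summed Likert score: {total}")
```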
ITEM FORMAT

Variables such as form, plan, structure, arrangement, and layout of individual test items.

1. Selected-response Format – requires selecting an answer from a set of alternative responses.
Multiple Choice Format – has three elements: (1) a stem, (2) a correct alternative or option, and (3) several incorrect alternatives referred to as distractors or foils.
Matching Item – presented in two columns; a test taker determines what response is best associated with the premise.
True-False (binary-choice item) – includes a statement that the test taker is asked to determine whether it is or is not a fact.

2. Constructed-response Format – requires the test taker to create the correct answer and not just select it.
Completion Item – requires test takers to provide a word or phrase that completes a sentence.
1. Short Item
2. Essay Item

3. Writing Items for Computer Administration – computer programs are designed to facilitate the construction of tests as well as their administration, scoring, and interpretation.
Computerized Adaptive Testing (CAT) – an interactive, computer-administered test wherein the items presented are based on the test taker's performance on previous items.

If you answer questions correctly, the next ones might be harder.

o Item Bank – a relatively large and easily accessible collection of test questions.
o Item Branching – the ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.
o Floor Effect – refers to the diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait, or other attribute being measured.
o Ceiling Effect – the diminished utility of a tool in distinguishing test takers at the high end of the ability, trait, or other attribute being measured.
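A minimal sketch of the item-branching idea behind CAT: items are drawn from an item bank ordered by difficulty, a correct answer routes the test taker to a harder item, and an incorrect answer routes to an easier one. The item bank and the answer pattern below are invented.

```python
# Illustrative item bank: (item id, difficulty), ordered from easiest to hardest
item_bank = [("i1", 0.2), ("i2", 0.35), ("i3", 0.5), ("i4", 0.65), ("i5", 0.8)]

def next_item_index(current, answered_correctly):
    """Branch to a harder item after a correct answer, an easier one after a miss."""
    step = 1 if answered_correctly else -1
    return max(0, min(len(item_bank) - 1, current + step))

# Simulated administration: start in the middle and follow an invented answer pattern
position = 2
for correct in [True, True, False, True]:
    item_id, difficulty = item_bank[position]
    print(f"Administered {item_id} (difficulty {difficulty}), correct={correct}")
    position = next_item_index(position, correct)
```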
SCORING ITEMS

1. Cumulative Model – each time a test taker answers a question in a certain way, they earn points, and these points add up to show how well they understand a certain skill or trait. The higher the test score, the better the person is at whatever the test is trying to measure.

2. Class Scoring (Category Scoring) – your answers on the test help decide which group or category you belong in. You are put in a group whose answers are similar to yours.

3. Ipsative Scoring – comparing your score in one part of the test to your score in another part of the same test.

1. Item-Difficulty Index – uses the p-value, calculated as the proportion of the total number of test takers who answered the item correctly. It ranges from 0 (if no one got the item right) to 1 (if everyone got it right). The larger the item-difficulty index, the easier the item.
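A minimal sketch of computing the item-difficulty index as the proportion of test takers who answered an item correctly; the response data are invented.

```python
# 1 = answered the item correctly, 0 = answered incorrectly (illustrative data)
item_responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

# p ranges from 0 (no one correct) to 1 (everyone correct); larger p = easier item
p = sum(item_responses) / len(item_responses)
print(f"Item-difficulty index: {p:.2f}")
```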
TEST TRYOUTS

After creating a pool of items from which the final version of the test will be developed, the test developer tries out the test to determine which items are good or bad.

2. Item-Reliability Index – indicates the internal consistency of a test; the higher the index, the greater the test's internal consistency. Includes Factor Analysis and inter-item consistency.
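Inter-item consistency is often summarized with Cronbach's alpha, computed from its standard formula α = (k/(k−1))·(1 − Σσ²_item / σ²_total). The item scores below are invented for illustration.

```python
import numpy as np

# Rows = test takers, columns = items (illustrative scores)
scores = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [3, 3, 2, 4],
    [4, 5, 4, 4],
])

k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)      # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")
```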
7. Qualitative Item Analysis – relies primarily on verbal rather than statistical procedures in analyzing items.
Think Aloud Test Administration – a one-on-one administration wherein the client is tasked to read a question aloud so the clinician can analyze how and why the client misinterpreted an item.
Expert Panels – a sensitivity review wherein the panel analyzes which items might stereotype or offend test takers.

3. Item-Validity Index – a statistic indicating whether a test measures what it intends to measure. The higher the item-validity index, the greater the test's criterion-related validity.
3.1. Item-Score Standard Deviation – used together with the correlation between the item score and the criterion score.
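In common presentations, the item-validity index is the product of the item-score standard deviation and the correlation between the item score and the criterion score; the sketch below computes it with invented data.

```python
import numpy as np

# Illustrative item scores (0/1) and criterion scores for ten test takers
item_scores = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
criterion   = np.array([34, 20, 31, 40, 22, 36, 25, 38, 30, 21])

s_item = item_scores.std(ddof=1)                       # item-score standard deviation
r_item_criterion = np.corrcoef(item_scores, criterion)[0, 1]
item_validity_index = s_item * r_item_criterion        # product of the two quantities
print(f"Item-validity index: {item_validity_index:.2f}")
```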
4. Item-Discrimination Index – indicates how an item separates or discriminates between high scorers and low scorers on a test.

TEST REVISION

CROSS-VALIDATION