
Introduction to Psychological Testing

PSYCHOLOGICAL ASSESSMENT is the gathering and integration of psychology-related data for the purpose
of making a psychological evaluation, with the use of tools such as:
 Test is a measuring device or procedure
Psychological tests are the devices or procedures designed to measure variables in psychology.
Item is the specific stimulus to which a person responds overtly.
Content is the focus of the test. Ex. personality, IQ.
Format pertains to the form, plan, structure, arrangement, & layout of test items. Also refers to
administration, such as computerized, pencil-paper, or some other form.
Score is a code or summary statement that reflects an evaluation of performance on a test, task,
interview, or other sample of behavior.
Scoring is the assigning of codes or meaning to performance, tasks, interview, or other behavior
samples.
Scales are tools that relate raw scores to some defined theoretical or empirical distribution.
Cut score is a reference point used to divide a set of data into two or more classifications.

 Interview is the method of gathering information through direct communication involving
reciprocal exchange.
-in a face-to-face interview, the interviewer takes note not only of the content of what is said but also of the way it is being said.
-takes note of both verbal and nonverbal behavior.
-nonverbal behavior includes body language, facial expressions, extent of eye contact, and apparent
willingness to cooperate.
-may also be conducted through telephone and other devices.

 Portfolio is a work product such as paper, canvas, film, video, audio, or some other medium.

 Case History Data refers to records, transcripts, and other accounts in written, pictorial, or other
form that preserve archival information, official and informal accounts, and other data relevant to an
assessee.
-may include files from schools, hospitals, employers, religious institutions, and criminal justice
agencies.
-a related term, case study, concerns the assembly of case history data into an illustrative
account.

 Behavioral observation is defined as monitoring the actions of others or oneself by visual or


electronic means while recording quantitative and/or qualitative information regarding the
actions.
-naturalistic observation is done in a natural setting.

 Errors of observers:
>Reactivity is the tendency of observers to behave differently (typically more accurately) when they know their observations are being checked.
>Drift is the tendency of observers to stray from the definitions of behavior they learned during training and to develop their own definitions of behaviors.
>Contrast effect is the tendency to rate the same behavior differently when observations are repeated in the same context.
>Halo effect is the tendency to ascribe positive attributes independently of the observed behavior.
 Computers in test administration, scoring, and interpretation.
- scoring may be done through local-processing or central processing.
- scores may be in the form of:
*simple scoring report is the mere listing of scores
*extended scoring report includes statistical analyses of the testtaker’s performance.
*interpretive report includes numerical or narrative interpretive statements in the
report.
*consultative report is written in language appropriate for communication between
professionals and may provide expert opinion concerning analysis of the data.
*integrative report integrates other collected data into the test report.
-CAPA (computer-assisted psychological assessment) refers to the assistance computers provide
to the test user.
-CAT computer adaptive testing refers to the computer’s ability to tailor the test to the
testtaker’s ability or testtaking pattern.

USES OF PSYCHOLOGICAL ASSESSMENT


 Classification or diagnosis
 Description, such as describing strengths and weaknesses or explaining characteristics
 Prediction, such as predicting tendencies of future behavior and informing treatment planning

PROCESS OF PSYCHOLOGICAL ASSESSMENT

 Phase I: Referral Question


-Sources of referral questions include psychiatrists, psychologists, teachers, counselors, judges, and corporate
resource specialists.
-Typically, there are one or more referral questions in an assessment.
 Phase II: Acquiring Knowledge Relating to the Content of the Problem
-Selecting the tools of assessment to answer the referral question/s.
 Phase III: Data Collection
-Must rely on multiple sources and use these to assess consistency of findings.
 Phase IV: Data Interpretation
-Describes client’s present level of functioning.
-Make objective and empirical inferences on data.
-Must be clear, relevant to the goal of assessment, and useful to the client.
-Give oral feedback.

APPLICATION OF PSYCHOLOGICAL ASSESSMENT

 Educational Setting
-to identify students with special needs, academic difficulty, emotional problems.
-to evaluate accomplishment or degree of learning that has taken place.
-for admission
 Geriatric settings
-To evaluate cognitive, psychological, adaptive, or other functioning.
 Clinical Setting
-Assessment occurs in public, private, and military hospitals, inpatient and outpatient clinics,
private-practice counseling rooms, schools, and other institutions.
-Used to help screen for or diagnose behavior problems.
 Counseling setting
-the focus is the improvement of the assessee in terms of adjustment, productivity, or some
related variables.
 Business and Military settings
-Decision making about the careers of personnel: hiring, promotion, transfer, job
satisfaction, and eligibility for further training.
 Government and organizational credentialing
-governmental licensing, certification, or general credentialing of professionals

WHO ARE PARTIES IN THE ASSESSMENT ENTERPRISE?


 Test developer- creates tests or other methods of assessment.
-According to APA, more than 20,000 psychological tests are developed each year.
-professional organizations have published standards of ethical behavior for responsible test
development and use.
 Test user- includes clinicians, counselors, psychologists, HR personnel, social workers.
 Testtaker- anyone who is the subject of an assessment or an evaluation.
 Society at large- has a stake in assessment as a means of organizing and systematizing the many-faceted
complexity of individual differences.

(Urbina, 2004)

HOW ARE ASSESSMENTS CONDUCTED?


-there should be an existing need to measure a particular variable.
-assessor must follow standards on how to prepare, administer, use scores, and how the entire record is
stored.
-before assessment: administrator/assessor must be familiar with test materials and procedures, and
with selection of tests, testing venue, and other materials.
-during: rapport, working relationship between examiner and examinee, is important to establish.
-after: safeguard test protocols and release results to third parties only with proper permission.
Assessment of people with disabilities- people with disabilities are assessed for exactly the same reasons as people with no
disabilities.
-for instance, accommodation may need to be made.
-accommodation is the adaptation of a test, procedure, or situation, or the substitution of one test for
another, to make the assessment more suitable for an assessee with exceptional needs.
-alternate assessment is a procedure that varies from the usual, customary, or standardized way a
measurement is derived either by virtue of some special accommodation made to the assessee or by
means of alternative methods designed to measure the same variable/s.

WHERE TO GO FOR AUTHORITATIVE INFORMATION: REFERENCE SOURCES


Test catalogues- contain brief descriptions of tests; they seldom contain the detailed technical information that a
prospective user might require.
Test manuals- contain detailed information about the development of the test and its technical properties.
Reference volumes- provide detailed information for each test listed: test publisher, author, test
purpose, intended test population, and administration time.
Journal articles- contain reviews of tests and updated or independent studies of psychometric soundness.

PSYCHOLOGICAL TESTING
 Psychological testing is the process of measuring psychology-related variables by means of
devices/procedures designed to obtain a sample of behavior (Cohen & Swerdlik, 2009).
 Psychological testing refers to all the possible uses, application, and under-lying concepts of
psychological and educational tests (Kaplan & Saccuzzo, 2011).
 Psychometrics is the science of psychological measurement.
 Psychometric soundness refers to how a test consistently and accurately measures what it
purports to measure.

HISTORICAL BACKGROUND OF TESTING


 206 B.C.E to 220 C.E.
Han Dynasty used test batteries in civil law, military affairs, agriculture, revenue, and geography.
 1368-1644 C.E
Ming Dynasty used tests to evaluate individuals eligible for public office.
 Greco-Roman (460-377B.C)
Hippocrates categorized people in terms of personality types: blood, black bile, phlegm, and
choler.
 Christian von Wolff anticipated psychology as a science and psychological measurement as a
specialty within that science.
 In 1859, Charles Darwin published his book On the Origin of Species, developing the concepts of
survival of the fittest and individual differences.
 In 1869, Francis Galton published the book Hereditary Genius and insisted that individual differences exist in
human sensory and motor functioning.
 19th century
 J. E. Herbart used mathematical models as the basis for educational theories.
 Ernst Weber attempted to demonstrate the existence of psychological threshold, the minimum
stimulus necessary to activate a sensory system.
 G.T. Fechner devised the law that the strength of a sensation grows as the logarithm of the
stimulus intensity.
 In 1879, Wilhelm Wundt set up a laboratory at the University of Leipzig and is credited with founding
the science of psychology.
 Early 20th Century
 In France, Alfred Binet developed the Binet-Simon Scale in 1905 to identify intellectually
subnormal individuals so that they could be provided with appropriate educational intervention.
 WWI prompted the creation of standardized group tests. Robert Yerkes developed the Army
Alpha, which required reading ability, and the Army Beta, which measured the intelligence of illiterate adults.
 1939, David Wechsler developed Wechsler-Bellevue Intelligence Scale.
 1920-1940, the period between the two world wars, personality tests began to blossom
 Robert S. Woodworth developed Personal Data Sheet to measure adjustment and emotional
stability.
 The Rorschach inkblot test was published by Hermann Rorschach of Switzerland in 1921.
 David Levy introduced Rorschach test in U.S
 The Thematic Apperception Test was developed in 1935 by Henry Murray and Christiana Morgan.
 In 1943, Minnesota Multiphasic Personality Inventory was developed by Starke Hathaway and
J.C. McKinley. The MMPI led the use of empirical methods to determine the meaning of a test
response.
 Factor analysis, a statistical procedure, became a trend in creating personality tests.
 J. P. Guilford made the first attempt to use factor analysis in the development of a structured
personality test.
 Raymond B. Cattell introduced the Sixteen Personality Factor Questionnaire (16PF).

TYPES OF TEST
 According to Type of Administration
Individual tests- administered one-to-one; may require an active and knowledgeable test administrator;
usually take a longer period of time to administer.

Group tests- administered to two or more testtakers at a time; may not require the test administrator to be present
while the testtakers independently do whatever the test requires.

 According to the Type of Behavior they Measure


Ability test- measures skills in terms of speed, accuracy, or the combination of the two.
a. Achievement test- pertains to previous learning
b. Aptitude test- future learning
c. Intelligence- potential to solve problems, adapt to different circumstances, and profit from
experience.
-all three are highly interrelated.

Personality test- pertains to typical behavior such as traits, temperaments, and dispositions.
a. Structured (objective)- provides self-report statements to which the person responds “true” or
“false,” “yes” or “no.”
b. Projective- provides an ambiguous test stimulus; the required response is open-ended rather than limited to fixed options.
USES OF PSYCHOLOGICAL TESTING
 Decision-making- involves value judgment on the part of one or more decision makers who
need to determine bases upon which to select, place, classify, diagnose.
 Psychological research- provides method of studying the nature, development, and
interrelationships of cognitive, affective, and behavioral traits.
 Self-understanding and personal development- applies in counseling and psychotherapeutic
settings.

TEST ADMINISTRATION
 The Examiner and the Examinee
-The behavior and the relationship of the examiner to the examinee can affect test scores.
-The examiner must build rapport with the examinee.
-Witmer, Bernstein & Dunham (1971), found in their study that children who received
disapproving comments such as “I thought you could do better than that” got lower scores in
their examination than did children who received approving comments such as “good” or “fine.”
 The race or ethnicity of the examiner has generated considerable attention.
-Sattler (2002) reviewed experimental studies and found that there is little evidence that the
race of the examiner significantly affects intelligence test scores.
 Different assessment procedures require training of test administrators. Many behavioral
assessments and psychological tests require training and evaluation, but not a formal degree or
diploma.
 Expectancy effect/Rosenthal effect can also alter data findings.
 Because reinforcement affects behavior, testers should always administer the test under controlled
conditions.
 Computer-assisted Test Administration- Locke & Gilbert (1995) found that students were less
likely to disclose socially undesirable information during a personal interview than when responding on a computer.
 The mode of administration should be constant within any evaluation of clients (Bowling, 2005).
-Studies on health have shown that measures administered by an interviewer are more likely to
show people in good health than are measures that are self-completed (Hanmer, Hays &
Fryback, 2007)
 Subject variables may be a serious source of errors.
-Test anxiety appears to have three components: worry, emotionality, and lack of self-
confidence (Oostdam & Meijer, 2003)
(Cohen & Swerdlik, 2010)
MEASUREMENT SCALES

Measurement is the act of assigning numbers or symbols to characteristics of things according to rules.
Variable is anything that varies (e.g., sex, personality, intelligence); a constant is anything that does not
(e.g., π = 3.1416).
 Discrete variables are those with finite range of values.
-dichotomous variables assume only two values. E.g. Male/female
-polytomous variables assume more than two values. E.g marital status, race, etc.
 Continuous variables have infinite ranges and really cannot be counted. E.g. anxiety,
extraversion.
Error is the collective influence of all the factors on a test score or measurement beyond those
specifically measured by the test or measurement.
Scale is a set of numbers (or symbols) whose properties model empirical properties of the object to
which the numbers are assigned.
Four levels of scale: NOIR of Stanley Smith Stevens (1946)
 Nominal scale involves classification or categorization based on characteristics. Ex. disorders in
DSM; pass/fail.
-only permissible arithmetic operation is counting the frequencies within each category.
 Ordinal scale has the property of magnitude but not equal intervals or an absolute zero. Permits
classification and rank ordering. E.g. birth order, level of academic performance.
-ordinal scores convey a precise meaning in terms of position, but they carry no information with regard to the
distance between positions.
-in psychological testing, rank-ordered test scores are reported as percentile rank scores.
-can be manipulated statistically in the same way as nominal data; in addition, Spearman’s rho
correlation coefficient can be computed for rank differences.
 Interval scale has properties of magnitude and equal interval, but not absolute 0.
-each unit on the scale is exactly equal to any other unit in the scale.
-a scale that is possible to average a set of measurements and obtain a meaningful result.
-e.g., clock time (one day consists of 24 hours); the difference between IQ scores of 80 and 100 is thought to be similar to
the difference between IQ scores of 100 and 120.
 Ratio scale has magnitude, equal interval and absolute zero.
-all mathematical operations can be performed.
-e.g., frequency counts or time intervals, both of which allow the possibility of true zeros.

NORMS & INTERPRETATION OF TEST SCORES

 Parameters are measures derived from populations.


 Statistics are measures derived from samples of data.
 Statistics is a branch of mathematics that organizes, depicts, summarizes, and analyzes
numerical data.

TYPES OF STATISTICS:
-Descriptive statistics are used to provide a concise description of a collection of quantitative
information.
-Inferential statistics are methods used to make inferences from observations of a small group of
people known as a sample to a population.

DESCRIPTIVE STATISTICS
 Distribution is a set of test scores arrayed for recording or study.
 Raw score is a straightforward, unmodified accounting of performance that is usually numerical.
 FREQUENCY DISTRIBUTION is a listing of scores alongside the number of times each score
occurred. It may be in tabular or graphic form.
-simple frequency distribution indicates that individual scores have been used and the scores
have not been grouped.
-grouped frequency distribution organizes scores into a still more compact form.
 Frequency distributions can also be illustrated graphically.
 Graph is a diagram or chart consists of lines, points, bars, or other symbols that describe data.
-Histogram is a graph with vertical lines drawn at the true limits of each test score (or class interval),
forming a series of contiguous rectangles.
-Bar graph shows the frequency on the y-axis, and the reference to some categorization (e.g.
yes/no/maybe, male/female) appears on the x-axis.
-Frequency polygon is expressed by a continuous line connecting the points where test scores
or class intervals (x-axis) meet frequencies (y-axis).
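A minimal sketch (illustrative, not from the original notes) of tallying a simple and a grouped frequency distribution with Python's standard library; the scores are made-up:

from collections import Counter

scores = [85, 90, 85, 70, 90, 95, 85, 70, 80, 90]   # hypothetical raw scores

# simple frequency distribution: each obtained score with the number of times it occurred
simple = Counter(scores)
for score in sorted(simple):
    print(score, simple[score])

# grouped frequency distribution: scores organized into class intervals of width 10
grouped = Counter((s // 10) * 10 for s in scores)
for lower in sorted(grouped):
    print(f"{lower}-{lower + 9}", grouped[lower])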

 MEASURES OF CENTRAL TENDENCY are statistics that indicate the average or midmost score
between the extreme scores in a distribution.
 Arithmetic mean, also known as the average, takes into account the actual numerical value of every
score.
-denoted by X̄, it is equal to the sum of the observations divided by the number of
observations.
-most appropriate measure for interval and ratio data when distributions are believed to be
approximately normal.
 Median is the middle score in a distribution.
-to determine the median, order the scores by magnitude, in either ascending or
descending order. If the number of scores is odd, the middle score is the median; if it is
even, find the arithmetic mean of the two middle scores.
-appropriate for ordinal, interval, and ratio data.
 Mode is the most frequently occurring score in a distribution of scores.
-a distribution is bimodal when two scores occur with the same highest frequency, and multimodal when more than two scores do.
-frequently used when dealing with qualitative or categorical variables.
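A minimal computational sketch of the three measures of central tendency, using Python's standard library and made-up scores:

import statistics

scores = [10, 12, 12, 14, 15, 15, 15, 18, 20]   # hypothetical score distribution

print(statistics.mean(scores))     # arithmetic mean: sum of observations / number of observations
print(statistics.median(scores))   # middle score (mean of the two middle scores if n is even)
print(statistics.mode(scores))     # most frequently occurring score (15 in this distribution)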

 MEASURES OF VARIABILITY are statistics that describe the amount of variation in a
distribution.
-Variability indicates how scores in a distribution are scattered or dispersed.
 Range is equal to the difference between the highest and the lowest scores.
-is the simplest measure of variability, but its use is limited.
-one extreme score might underestimate or overestimate the value of range.

 Interquartile range and semi-interquartile range- distribution of test scores can be divided into
four parts such that 25% of scores occur in each quarter.
-quartiles are the dividing points between the four quarters; a quarter is the interval between adjacent quartiles.
 Interquartile range is equal to the difference between Q3 and Q1. It is an ordinal statistic.
 Semi-interquartile range is equal to the interquartile range divided by 2.
-if the distances of Q1 and Q3 from the median are unequal, there is a lack of symmetry, referred to as skewness.
 Average deviation is the average distance between each value in the data set and the mean.

 Standard deviation is equal to the square root of the averaged squared deviations around the
mean.
-it is equal to square root of variance.

SD = √[Σ(X − X̄)² / n]   or, for a sample estimate,   s = √[Σ(X − X̄)² / (n − 1)]

-s, S, SD refer to sample standard deviation.


-σ (sigma) refers to population standard deviation.
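A small sketch of the variability measures above, with made-up scores whose mean is 5 (Python standard library):

import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores; mean = 5

print(max(scores) - min(scores))     # range = 7
print(statistics.pvariance(scores))  # variance: average of the squared deviations around the mean = 4.0
print(statistics.pstdev(scores))     # population standard deviation (σ): square root of the variance = 2.0
print(statistics.stdev(scores))      # sample standard deviation (s), which divides by n − 1 instead of n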

 Skewness is the extent to which symmetry is absent.


 Positive skew is when relatively few of the scores fall at the high end of the distribution.
 Negative skew is when relatively few of the scores fall at the low end of the distribution.

 Kurtosis is the steepness of the distribution in its center.


-platykurtic is relatively flat.
-leptokurtic is relatively peaked.
-mesokurtic shows an intermediate degree of dispersion.

THE NORMAL CURVE


 Abraham de Moivre and, later, the Marquis de Laplace began working on the idea of the normal curve in
the middle of the eighteenth century.
 At the beginning of the 19th century, Carl Friedrich Gauss contributed to the work, and
scientists then referred to it as the “Laplace-Gaussian curve.”
 Karl Pearson is credited with being the first to refer to it as the “normal curve.”
 Normal curve is a bell-shaped, smooth, mathematically defined curve that is highest at its
center.

STANDARD SCORES
 A standard score is a raw score that has been converted from one scale to another scale, where the latter scale
has an arbitrarily set mean and standard deviation.
-it gives meaning to the raw score relative to other scores.

 Z score has a mean set at 0 and a standard deviation set at 1.

 T score has a mean set at 50 and a standard deviation set at 10.


-none of the scores is negative.

T= 10Z + 50

 Stanine converts any set of scores into a transformed scale, which ranges from 1 to 9. It has a
mean of 5 and standard deviation of approximately 2.
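A minimal sketch converting raw scores to z, T, and stanine values, assuming the distribution's mean and standard deviation are already known (the numbers are illustrative only):

raw_scores = [35, 50, 65]   # hypothetical raw scores
mean, sd = 50.0, 10.0       # assumed distribution parameters

for x in raw_scores:
    z = (x - mean) / sd                           # z score: mean 0, SD 1
    t = 10 * z + 50                               # T score: mean 50, SD 10, no negative values in practice
    stanine = max(1, min(9, round(2 * z + 5)))    # stanine: mean 5, SD of about 2, clipped to the 1-9 range
    print(x, z, t, stanine)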

NORMS
 Norm-referenced testing and assessment is a process of deriving meaning from test scores by
evaluating an individual’s score and comparing it to scores of a group of testtakers.
-Norm, singular, refers to behavior that is usual, average, normal, standard, expected, or typical.
-Norms, plural, refer to the test performance data of a particular group of testtakers that are
designed for use as a reference when evaluating individual test scores.
-Normative sample refers to that group of people whose performance on a particular test is
analyzed for reference in evaluating the performance of individual testtakers.
-Norming is the process of deriving norms.

 Sampling to Develop Norms


-Standardization/ test standardization is the method of administering a test to a representative
sample of testtakers for the purpose of establishing norms.
-sample is the representative of the population.
-Sampling is the method of selecting the portion of the population deemed to be representative
of the whole.
o Stratified sampling is the method of dividing the population into subgroups (strata). E.g.
Filipinos: Ilocanos, Cebuanos, Batangueños, etc.
o Stratified-random sampling is stratified sampling done at random, that is, every
member of the population has the same chance of being included in the sample.
o Purposive sampling is the method of selecting a sample because it is believed to be
representative of the population.
o Convenience/incidental sampling is the method of selecting samples because of their
convenient accessibility and proximity.

 Developing Norms for a Standardized Test


-Establish a standard set of instructions for the test administration.
-provide a precise description of the standardization sample itself
-summarize the data using descriptive statistics.

 Types of Norms
-Percentile is the specific score or point within a distribution.
-Percentile rank is the percentage of scores that fall below a particular score.
-Age norms / age-equivalent scores indicate the average performance of different samples of
testtakers who were at various ages at the time the test was administered.
-Grade norms developed by administering the test to representative samples of children over a
range of consecutive grade levels; and to indicate the average test performance of testtakers in
a given school grade.
-developmental norms are norms developed on the basis of any trait, ability, skill, or other
characteristic that is presumed to develop, deteriorate, or otherwise be affected by
chronological age, school grade, or stage of life.
-national norms are based on a normative sample that was nationally representative of the
population at the time the norming study was conducted.
-subgroup norms are derived from segmented samples.
-local norms provide normative information about the local population’s performance on some
test.
 Fixed reference group is used as the basis for the calculation of test scores for future
administrations of the test.

 Criterion-referenced testing and assessment is a method of evaluation and a way of deriving


meaning from test scores by evaluating an individual’s score with reference to a set standard.

CORRELATION AND INFERENCE

 Scatterplot is a graphing of the coordinate points for values of X-variable and Y-variable.
It gives a quick indication of the direction and magnitude of the relationship between
two variables.
-Curvilinearity is the “eyeball gauge” of how curved the graph is.
-Outlier is an extremely atypical point located at a relatively long distance from the rest
of the coordinate points in a scatterplot.
 Correlation coefficient is a mathematical index that describes the direction and magnitude of a
relationship.
 Correlation is an expression of the degree and direction of correspondence between two things.
-(r) expresses a linear relationship.
-positive correlation means high scores on Y are associated with high scores on X, and low scores on Y with low scores on X.
-negative correlation means high scores on Y are associated with low scores on X, and low scores on Y with high scores on X.
-no correlation means two variables are not related.

o Pearson r/ pearson correlation coefficient/ pearson product-moment coefficient of


correlation is a measure of correlation when the relationship between the variables is
linear and the variables being correlated are continuous.

o Spearman’s rho is the measure for finding association between two sets of ranks.
o Biserial correlation tells the relationship between a continuous and artificial
dichotomous variable.
o Point biserial measures the relationship between a true dichotomous variable and a continuous
variable.
o Phi coefficient measures the relationship between true dichotomous and another true
dichotomous or true and artificial dichotomous variables.
o Tetrachoric measures the relationship of artificial variable to another artificial variable.

                          CONTINUOUS VARIABLE           ARTIFICIAL DICHOTOMOUS

CONTINUOUS VARIABLE       Pearson r                     Biserial correlation
ARTIFICIAL DICHOTOMOUS    Biserial correlation          Tetrachoric correlation
TRUE DICHOTOMOUS          Point biserial correlation    Phi coefficient
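A minimal sketch of computing Pearson r directly from its definition (the sum of deviation cross-products divided by the square root of the product of the two sums of squared deviations), with made-up paired scores:

import math

x = [2, 4, 6, 8, 10]   # hypothetical scores on variable X
y = [1, 3, 5, 9, 12]   # hypothetical scores on variable Y

n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # sum of deviation cross-products
sxx = sum((xi - mx) ** 2 for xi in x)                      # sum of squared deviations of X
syy = sum((yi - my) ** 2 for yi in y)                      # sum of squared deviations of Y

r = sxy / math.sqrt(sxx * syy)   # Pearson product-moment correlation
print(round(r, 3))               # close to +1: high X scores go with high Y scores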

 Regression is a measurement used to make predictions about scores on one variable from
knowledge of scores on another variable.
-regression line is the best-fitting straight line through a set of points in a scatter diagram; the
running mean or the line of least squares in two dimensions or in the space created by two
variables.
-Sum of squares is the sum of squared deviations around the mean.
-Covariance is the expression of how much two measures covary or vary together.
-Slope tells how much change is expected in Y each time X increases by one unit.
-Intercept (a) is the value of Y when X is 0.

o The best-fitting line is the straight line that comes closest to the set of data points in a scatter diagram.


o Residual is the difference between the observed and predicted score Y-Y’
o Principles of Least Squares is used to form best-fitting line by keeping the squared
residuals as small as possible.

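A minimal least-squares sketch with made-up data, showing the slope (b), intercept (a), predicted scores (Y'), and residuals (Y − Y'):

x = [1, 2, 3, 4, 5]   # hypothetical predictor scores (X)
y = [2, 4, 5, 4, 6]   # hypothetical criterion scores (Y)

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# slope = sum of deviation cross-products divided by the sum of squared deviations of X
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx   # intercept: value of Y when X = 0

predicted = [a + b * xi for xi in x]                     # Y' for each X
residuals = [yi - yp for yi, yp in zip(y, predicted)]    # Y - Y'

print(b, a)        # expected change in Y per one-unit increase in X, and Y at X = 0
print(residuals)   # the least-squares line keeps the sum of these, squared, as small as possible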
 Terms to Remember
o Residual is the difference between the predicted and observed score, that is, Y-Y’
o Standard Error of Estimate is the standard deviation of the residuals; it is a measure of
accuracy of prediction. Prediction is accurate if it is small.
o Coefficient of Determination (r²) tells the proportion of the total variation in scores on Y
that we know as a function of information about X.
o Coefficient of Alienation is the measure of nonassociation between variables, computed as √(1 − r²).
o Shrinkage is the amount of decrease observed when a regression equation is created
for one population and then applied to another.
o Cross validation is the use of regression equation to predict performance in a group of
subjects other than the ones to which the equation was applied.
 Issues in the Use of Correlation
o Correlation-Causation Problem- a correlation alone does not prove causality; a third variable
may underlie the association.
o Restricted Range (problem)- correlation requires variability. If the variability is
restricted, then significant correlations are difficult to find.
 Multivariate Analysis is the measurement of the relationship between combinations of three or
more variables. The goal is to find the linear combination of variables that provides
the best prediction of a criterion variable.
 Discriminant Analysis is used to find the linear combination of variables that best distinguishes between categories.
 Factor Analysis is used to study the interrelationships among a set of variables without
references to a criterion.

RELIABILITY
 Reliability refers to the consistency of measurement; it is a synonym for dependability and consistency.
-Reliability coefficient is the index of reliability, a proportion that indicates the ratio between
the true score variance on a test and the total variance.
X = T + E, where X is the observed score, T is the true score, and E is the error component.
 Sources of Error Variance
-Test Construction: error comes from item sampling or content sampling, which refers to
variation among items within a test as well as to variation among items between tests.
-Test Administration: test environment such as room temperature, noise, level of lighting, for
instance. Testtaker variables such as emotional problems, physical discomfort, lack of sleep.
Examiner-related variables such as physical appearance, presence or absence of an examiner.
-Test Scoring and Interpretation: error associated with scorers and scoring systems (e.g., hand scoring of paper-and-pencil tests).

 RELIABILITY ESTIMATES
 Test-Retest is the estimation of reliability from correlating pairs of scores from the same group
of people on two different administrations of the same test.
-appropriate when evaluating a construct that is relatively stable over time.
-coefficient of stability is the estimate of test-retest reliability
-for tests that employ outcome measures such as reaction time or perceptual judgment
(discriminations of brightness, loudness, or taste).

 Parallel-Forms and Alternate-Forms


 Parallel forms of tests have equal means and variances of observed test scores. Scores obtained
on parallel tests correlate equally with other measures.
 Alternate forms are different versions of a test that have been constructed so as to be parallel.
They are designed to be equivalent with respect to variables such as content and level of
difficulty.
-require two test administrations with the same group.
-Coefficient of equivalence is the degree of the relationship between alternate-forms of tests,
or parallel forms of tests.

 Split-Half reliability is the correlation of two pairs of scores obtained from equivalent halves of a
single test administered once.
-Odd-even system is the assigning of odd-numbered items to one half of the test and even-
numbered items to the other half.

 Spearman-Brown formula estimates the internal consistency reliability from a correlation of


two halves of a test. The estimate is based on a test that is twice as long as the original half test.
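The formula itself is standard (stated here for reference, not taken from these notes): for a split-half estimate, rSB = 2rhh / (1 + rhh), where rhh is the correlation between the two half-tests; more generally, rSB = nr / [1 + (n − 1)r], where n is the factor by which the test length is changed. For example, if the two halves correlate .70, the estimated full-length reliability is 2(.70) / (1 + .70) ≈ .82.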

 KR20, Kuder-Richardson formula 20 is the estimation of inter-item consistency of highly


homogeneous tests that have dichotomous items.

KR20 = [k / (k − 1)] × [1 − (Σpq / σ²)], where:

KR20= the reliability estimate (r)


σ²= variance of the total test score
k= the number of items
p= the proportion of the testtakers getting each item correct
q=the proportion of the testtakers getting each item incorrect
∑pq= the sum of the products of p times q for each item on the test.

 KR21 assumes that all the items are of equal difficulty, or the average level of difficulty is 50%.

KR21 = [k / (k − 1)] × [1 − (X̄(1 − X̄/k) / σ²)]

X̄ = mean test score

 Coefficient Alpha estimates the internal consistency of tests that are nondichotomous.
-mean of all possible split-half correlations
-is the preferred statistic for obtaining an estimate of internal consistency reliability.
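The usual computing formula is standard (stated here for reference): α = [k / (k − 1)] × [1 − (Σσᵢ² / σ²)], where k is the number of items, σᵢ² the variance of each item, and σ² the variance of total scores; with dichotomous items it reduces to KR20. A minimal sketch, assuming each row of the made-up matrix below holds one testtaker's item scores:

import statistics

# hypothetical item-score matrix: 4 testtakers x 3 nondichotomous items
data = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
]

k = len(data[0])                                                 # number of items
item_vars = [statistics.pvariance(col) for col in zip(*data)]    # variance of each item
total_var = statistics.pvariance([sum(row) for row in data])     # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)         # coefficient alpha
print(round(alpha, 3))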

 Inter-Scorer Reliability
-is the degree of consistency between two or more scorers/raters with regard to a particular
measure.
 Kappa statistic assesses the level of agreement among several observers. Introduced by J.
Cohen.
-values of kappa vary between 1 (perfect agreement) and −1 (less agreement than can be
expected on the basis of chance alone).
>.75 = excellent agreement
.40 - .75 = fair to good agreement
<.40 = poor
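The statistic itself is standard (stated here for reference): κ = (po − pe) / (1 − pe), where po is the observed proportion of agreement between raters and pe is the proportion of agreement expected by chance. For example, if two raters agree on 80% of cases and chance agreement would be 50%, κ = (.80 − .50) / (1 − .50) = .60, which falls in the "fair to good" range above.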

-Kappa calculation procedures can be found in Fleiss (1971) and Shrout, Spitzer, and Fleiss (1987).
-phi coefficient can give the approximation of the coefficient for the agreement between two
observers.

 Standard Error of Measurement, SEM is the tool used to estimate or infer the extent to which
an observed score deviates from a true score.
- it is the standard deviation of a theoretically normal distribution of test scores obtained by one
person on equivalent tests.
-SEM and reliability are inversely related: the higher the reliability, the smaller the SEM.
-if the standard deviation for the distribution of test scores is known, and if an estimate of the
reliability of the test is known, then an estimate of the standard error of a particular score can be
obtained by:

SEM = SD × √(1 − r), where r is the reliability coefficient of the test.
