Chapter 4 - Measurement and Statistics

This document provides an overview of key concepts in statistics, psychometrics, and measurement. It discusses scales of measurement, measures of central tendency and dispersion, correlations, regression, factor analysis, reliability, validity, and derived scores such as standard scores and percentiles. Measurement concepts covered include norm-referenced testing, criterion-referenced testing, standards-referenced testing, and reliability measures like the standard error of measurement.


Chapter 4

A Primer on Statistics and
Psychometrics
Scales of Measurement

Nominal
• Set of categories that do not have a sequential order.
• Items are identified by a name, number, or letter and are used for
classification only.
• Examples: gender, ethnicity, marital status.

Ordinal
• Used to classify and order items (describe magnitude).
• Examples: rank ordering, Likert scales, SES, movie ratings,
intelligence test scores.

Interval
• Used to classify and order, but adds an arbitrary zero point and has
equal intervals between points.
• Examples: temperature, altitude above sea level.

Ratio
• Highest level of measurement.
• Has a true zero point, has equal intervals between adjacent points,
and allows ordering and classification.
• Examples: height, weight, age, length.
Measures of Central Tendency

 Mean
The arithmetic average
 Median
The middle point
 Mode
The most frequently occurring
Dispersion
 Range (R)
Difference between highest and
lowest point
 Variance (S2)
Amount of variability of scores around
the mean
 Standard Deviation (SD)
Average distance of each score from
the mean
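The measures above can be computed directly with Python's standard library; the score list below is invented for illustration.

```python
from statistics import mean, median, mode, pstdev, pvariance

# Invented score list for illustration
scores = [85, 90, 90, 95, 100, 105, 110]

print("mean:", mean(scores))                 # arithmetic average
print("median:", median(scores))             # middle point
print("mode:", mode(scores))                 # most frequently occurring score
print("range:", max(scores) - min(scores))   # highest minus lowest point
print("variance:", pvariance(scores))        # variability around the mean
print("SD:", pstdev(scores))                 # square root of the variance
```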
Normal Curve
 Symmetrical distribution of scores
 More scores closer to the middle than at
the ends
 Mean, median, and mode are the same
Correlations
 Tell us about the degree of relationship
between two variables
 The higher the correlation between two
variables, the more accurately we can
predict the value of one variable when
we know the value of the other.
 Correlations of +1.00 or -1.00 are the
strongest.
Correlations (Cont.)

 A correlation of .00 does not allow for
any prediction.
 Correlations do not imply cause and
effect.
Correlation Coefficients
 Pearson product moment correlation
coefficient (r)
Two variables are continuous and
normally distributed
Linear relationship
Outliers kept to a minimum
Distribution of scores for each variable
is spread out to about the same extent
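Pearson's r can be computed from its definitional formula, r = covariance(X, Y) / (SD of X × SD of Y); the paired data values below are invented.

```python
# Pearson product-moment correlation coefficient (r). The 1/n factors
# in the covariance and SDs cancel, so raw sums of products suffice.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

hours = [1, 2, 3, 4, 5]        # hypothetical daily study hours
score = [55, 60, 70, 75, 90]   # hypothetical test scores
print(round(pearson_r(hours, score), 3))   # near +1.00: strong relationship
```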
Regression
 Regression equation
Ypred = bX + a
 Standard error of estimate
A measure of the accuracy of the
predicted Y scores in a regression
equation
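A minimal sketch of both ideas, using invented data: the least-squares values of b and a in Ypred = bX + a, and the standard error of estimate computed from the prediction errors (with n − 2 in the denominator).

```python
# Fit the regression line Ypred = bX + a by least squares and compute
# the standard error of estimate of the predicted Y scores.
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    errors = [yi - (b * xi + a) for xi, yi in zip(x, y)]
    see = (sum(e ** 2 for e in errors) / (n - 2)) ** 0.5
    return b, a, see

b, a, see = fit_line([1, 2, 3, 4, 5], [55, 60, 70, 75, 90])
print("Ypred = %.1fX + %.1f" % (b, a))
print("standard error of estimate:", round(see, 2))
```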
Multiple Correlation
 Is a statistical technique for determining
how well one variable can be predicted
using two or more variables.
 Example: Predicting a student’s GPA
based on his/her IQ plus the average
number of hours spent daily on
homework.
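The GPA example above can be sketched with NumPy: fit a least-squares equation with both predictors, then correlate the predicted values with the actual ones; that correlation is the multiple correlation R. All data values are invented.

```python
import numpy as np

# Predicting GPA from IQ plus average daily homework hours together.
iq    = np.array([95.0, 100.0, 105.0, 110.0, 120.0, 130.0])
hours = np.array([1.0, 2.0, 1.5, 2.5, 2.0, 3.0])
gpa   = np.array([2.4, 2.8, 2.9, 3.2, 3.4, 3.8])

X = np.column_stack([np.ones(len(iq)), iq, hours])  # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)      # least-squares weights
predicted = X @ coef
R = np.corrcoef(predicted, gpa)[0, 1]               # multiple correlation R
print(round(R, 3))
```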
Norm-Referenced
Measurement
Criterion-referenced interpretation
A child's performance is compared with an
objective criterion, such as 90% or better mastery.

Standards-referenced interpretation
A child's performance is evaluated in reference
to the degree to which defined standards are
met (e.g., below basic, basic, proficient,
advanced).

Norm-referenced measurement
A child's performance is compared with the
performance of a representative group of
children, referred to as a norm group or
standardization sample.
Norm-Referenced
Measurement
 Four important groups in norm-referenced measurement:
Population- the complete group.
Representative sample- a group drawn from the
population that represents the population accurately.
Random sample-sample obtained by selecting
members of the population based on random selection.
Reference group-norm group that serves as the
comparison group for computing standard scores,
percentile ranks, and related statistics.
Derived Scores
 Standard scores-raw scores that have been
transformed so they have a predetermined mean and
standard deviation.
 Percentile ranks-derived scores that permit us to
determine an individual’s position relative to the
standardization sample or any other specified sample.
 Normal-curve equivalents-are standard scores with a
M=50 and a SD=21.06.
 Stanines-provide a single-digit scoring system with
M=5 and SD=2. Stanine scores are expressed as whole
numbers from 1 to 9.
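The derived scores above are all transformations of a z score. A minimal sketch, using the common M = 100, SD = 15 standard-score metric as one illustrative choice:

```python
# Convert a z score into a standard score, a normal-curve equivalent
# (M = 50, SD = 21.06), and a stanine (M = 5, SD = 2, clipped to 1-9).
def derived_scores(z):
    standard = 100 + 15 * z                      # standard score (M=100, SD=15)
    nce = 50 + 21.06 * z                         # normal-curve equivalent
    stanine = min(9, max(1, round(5 + 2 * z)))   # whole number from 1 to 9
    return standard, nce, stanine

print(derived_scores(0.0))   # average performance: (100.0, 50.0, 5)
print(derived_scores(1.5))   # 1.5 SDs above the mean
```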
Derived Scores (Cont.)
 Age equivalents-are obtained by computing
the average raw scores obtained on a test by
children at different ages.
 Grade equivalents-are obtained by
computing the average raw scores obtained
on a test by children in different grades.
 Ratio IQs-ratios of mental age (MA) to
chronological age (CA), multiplied by 100 to
eliminate the decimal.
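The ratio-IQ formula as a one-line function, with a hypothetical child for illustration:

```python
# Ratio IQ = (mental age / chronological age) x 100.
def ratio_iq(mental_age, chronological_age):
    return 100 * mental_age / chronological_age

# A hypothetical 8-year-old performing at a 10-year-old level:
print(ratio_iq(10, 8))   # 125.0
```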
Inferential Statistics
 Statistical significance-refers to whether
scores differ from what would be expected
on the basis of chance alone.
 Effect size-is a statistical index based on
standard deviation units, independent of
sample size.
Cohen’s d
Correlation coefficient (r) and d
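A sketch of both indices mentioned above: Cohen's d computed from two groups using the pooled standard deviation, and the conventional conversion from r to d. The group scores are invented.

```python
from statistics import mean, variance

# Cohen's d: mean difference in pooled-standard-deviation units.
def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / pooled_var ** 0.5

# Conventional conversion between r and d.
def d_from_r(r):
    return 2 * r / (1 - r ** 2) ** 0.5

treated = [105, 110, 112, 118, 120]
control = [100, 102, 104, 108, 111]
print(round(cohens_d(treated, control), 2))
print(round(d_from_r(0.5), 2))
```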
Reliability
A measure may be:
Consistent within itself (internal
consistency)
Consistent over time (test-retest)
Consistent with an alternate or parallel
form of the measure (alternate-forms)
Consistent when used by various raters
or observers (interrater)
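One widely used index of internal consistency (the chapter names the concept but not a specific coefficient) is Cronbach's alpha: alpha = k/(k−1) × (1 − sum of item variances / variance of total scores). The item-by-examinee scores below are invented.

```python
from statistics import variance

# Cronbach's alpha for a set of items, each scored for the same examinees.
def cronbach_alpha(items):
    k = len(items)                                        # number of items
    item_vars = sum(variance(item) for item in items)
    totals = [sum(examinee) for examinee in zip(*items)]  # total score per person
    return k / (k - 1) * (1 - item_vars / variance(totals))

items = [
    [3, 4, 4, 5, 2],   # item 1, five examinees
    [2, 4, 3, 5, 1],   # item 2
    [3, 5, 4, 4, 2],   # item 3
]
print(round(cronbach_alpha(items), 2))
```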
Factors Affecting Reliability

 Length of test
 Homogeneity of items
 Test-retest interval
 Guessing
 Variability of scores
 Variation in the test situation
 Sample size
Standard Error of Measurement
(SEM)
 Is an estimate of the amount of error
inherent in a child’s obtained score
 Directly reflects the reliability of a test
 Represents the standard deviation of
the distribution of error scores
Confidence Intervals

 A band or range of scores around the
obtained score that likely includes a
child’s true score.
 Traditionally, a range is selected that
represents the 68%, 95%, or 99% level of
confidence.
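The two ideas above combine directly: SEM = SD × √(1 − reliability), and a confidence interval is the obtained score ± z × SEM (z = 1.00 for 68%, 1.96 for 95%, 2.58 for 99%). The SD = 15 metric and reliability of .91 below are illustrative values.

```python
# Standard error of measurement from a test's SD and reliability.
def sem(sd, reliability):
    return sd * (1 - reliability) ** 0.5

# Confidence interval around an obtained score (default: 95%).
def confidence_interval(score, sd, reliability, z=1.96):
    error = z * sem(sd, reliability)
    return score - error, score + error

print(round(sem(15, 0.91), 2))        # SEM on an SD = 15 metric
lo, hi = confidence_interval(100, 15, 0.91)
print(round(lo, 2), round(hi, 2))     # 95% band around a score of 100
```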
Practice Effects
 Definition:
when a test is re-administered, retest scores
may differ from those obtained on the initial test.
 Practice effects may be related to prior exposure to the
test.
 Practice
effects may occur because of intervening
events between the two administrations.
 Practice effects may not occur to the same extent in all
populations.
 Practice effects vary across different types of tasks.
Practice Effects (Cont.)
 Practice effects may be affected by
regression toward the mean.
 Practice effects may be difficult to
interpret when the initial test and the
retest are different.
 Practice effects may depend on the item
content covered throughout the test.
Item Response Theory
 Allows us to determine item difficulty and
item discrimination.
 It also allows us to establish a “guessing”
parameter.
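The three parameters above come together in the three-parameter logistic (3PL) model, one standard IRT formulation: the probability of a correct response given ability theta, difficulty b, discrimination a, and guessing parameter c. The parameter values below are invented.

```python
import math

# 3PL item response function: P(correct) = c + (1 - c) / (1 + e^(-a(theta - b)))
def p_correct(theta, a, b, c):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# A fairly hard item (b = 1.0) with some guessing (c = 0.2):
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, a=1.5, b=1.0, c=0.2), 3))
```

Even very low-ability examinees answer correctly with probability near c, which is why c is called the guessing parameter.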
Differential Item
Functioning (DIF)
 Is a statistical procedure designed to
reveal whether test items function
differently in different groups.
Validity
 Content validity-refers to whether the
items represent the domain being
assessed.
 Face validity-refers to whether a test
looks valid “on the face of it”.
 Construct validity-establishes the degree
to which a test measures a specified
psychological construct.
Validity (Cont.)
 Criterion-related validity-is based on how
adequately test scores correlate with some
type of criterion or outcome (such as
ratings, classifications, or test scores).
 Test utility-refers to the practical value a
test has as an aid in decision making.
 Predictive power-assesses the accuracy of
a decision made on the basis of a given
measure.
Factors Affecting Validity

 Reliability
 Range of attributes being measured
 Length of the interval between administration
of the test and of the criterion measure
 Range of variability in the criterion measure
Meta-Analysis
 Summarizes results of many studies.
 Uses research techniques to sum up and
integrate the findings of a body of studies
covering similar topics.
 Particularly useful in validity
generalization studies.
Factor Analysis
 Used to explain the pattern of intercorrelations
among a set of variables by deriving the smallest
number of meaningful variables or factors.
 Used to delineate patterns in a complex set of
data.
 Based on the assumption that a significant
correlation between two variables indicates a
common underlying factor shared by both
variables.
Components of Variance
Communality
• Refers to that part of the total variance that can be
attributed to common factors (those that appear in
more than one variable).

Specificity
• Refers to that part of the total variance that is due to
factors specific to a particular variable, not to
measurement error or common factors.

Error variance
• Refers to that part of the total variance that remains
when we subtract the reliability of the variable from
the total variance.
Other Useful Psychometric
Concepts
 Floor effect differences-refer to the number of easy
items available at the lowest level of a test to
distinguish among children with below-average ability.
 Ceiling effect differences-refer to the
number of difficult items available at the highest level
of a test to distinguish among children with above-
average ability.
 Item gradient differences-refer to the ratio of item
raw scores to standard scores, or the number of raw
score points required to earn 1 standard score point.
Other Useful Psychometric
Concepts (Cont.)
 Differences in layouts of norm tables-norm tables
may have different age-span layouts on different
tests.
 Differences in age-equivalent or grade-equivalent
scores-age equivalents or grade equivalents on
different tests may not coincide, even though the
standard scores are similar on the tests.
 Reliability differences-tests with low reliability will
produce less stable scores than tests with high
reliability.
Other Useful Psychometric
Concepts (Cont.)
 Differences in skill areas assessed-different tests may
measure different skills, even though they have the same
label for a skill area.
 Test
content differences-different tests may measure the
same skill area but contain different content.
 Differences resulting from publication date-tests published
in different years may yield scores that differ because of
changes in the abilities of the norm group.
 Sampling differences-tests normed in different samples
may yield different scores because the samples are not
comparable.
