Psychological Assessment

This document provides an overview of statistical methods, including descriptive and inferential statistics, measurement scales, and types of reliability. It covers statistical techniques for analyzing data, such as t-tests, ANOVA, and correlation coefficients, along with concepts like error variance and measures of central tendency, and it introduces the role of reliability in testing together with theories of test measurement and generalizability.

Statistics Refresher

Types of Statistical Methods


• Descriptive Statistics
- Provide a concise description of a collection of quantitative information.
• Inferential Statistics
- Provide a logical deduction about events that cannot be observed directly; make inferences from observations
of a sample to the population.

Measurement
- The act of assigning numbers or symbols to characteristics of things (people, events, whatever) according to rules.

Scale
- A set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers
are assigned.
- Types of scale:
o Continuous Scale – Numerical data that can theoretically be measured in infinitely small units. Blood
pressure is an example.
o Discrete Scale – Data that takes one of a set of particular values and can't be divided into fractions or
decimals. Examples include the number of children someone has and a credit card number.
o Nominal Scale – Used for labeling variables, with no quantitative value. Examples include marital status,
blood type, and gender.
o Ordinal Scale – Used when data can be ranked, but the differences between values are unknown or
unequal. Examples include education level and satisfaction rating.
o Interval Scale – Used for numerical data with evenly spaced intervals between values. Examples include
temperature in Fahrenheit and dates.
o Ratio Scale – Used for numerical data with a true zero and equal intervals. Examples include length, money,
age, and strength of hand grip as measured by a force-measuring instrument called a dynamometer.
- Properties of Scales:
o Magnitude – Property of "moreness"; a particular instance of the attribute represents more, less, or equal
amounts of the given quantity than does another instance.
o Equal Interval – The difference between two points at any place on the scale has the same meaning as the
difference between two other points that differ by the same number of scale units.
o Absolute Zero – Nothing of the property being measured exists.

Ratio or Interval Scale (parametric tests)


• Independent-samples t-test = 1 DV; 2 groups (1:2)
• Dependent (paired-samples) t-test = 2 DV measurements; 1 group (2:1)
• One-way ANOVA = 1 DV; 2 or more groups (1:2+)
• Repeated-measures ANOVA = 2 or more DV measurements; 1 group (2+:1)
• Two-way ANOVA = 2 IVs, each with 2 or more levels (2IV:2L)
• MANOVA = two or more DVs analyzed together

Ordinal Scale (nonparametric counterparts)
• Mann-Whitney U Test = 1 DV; 2 groups (1:2)
• Wilcoxon Signed-Rank Test = 2 DV measurements; 1 group (2:1)
• Kruskal-Wallis Test = 1 DV; 2 or more groups (1:2+)
• Friedman Test = 2 or more DV measurements; 1 group (2+:1)

Error
- Refers to the collective influence of all of the factors on a test score or measurement beyond those specifically
measured by the test or measurement.

Describing Data
• Distribution
- A set of test scores arrayed for recording or study.
• Raw Score
- A straightforward accounting of performance that is usually numerical.
Frequency Distribution
• Frequency Distribution
- All scores are listed alongside the number of times each score occurred.
• Grouped Frequency Distribution
- Test-score intervals, also called class intervals, replace the actual test scores.
• Histogram
- A graph with vertical lines drawn at the true limits of each test score (or class interval), forming a series of contiguous rectangles.
• Bar Graph
- A graph in which numbers indicating frequency appear on the Y-axis and reference to some categorization appears on the X-axis.
• Frequency Polygon
- A graph expressed by a continuous line connecting the points where test scores or class intervals (X-axis) meet frequencies (Y-axis).

Measures of Central Tendency


- A statistic that indicates the average or midmost score between the extreme scores in a distribution.
o Mean – Also referred to as arithmetic mean, which is referred to in everyday language as the “average.”
o Median – The middle score in a distribution.
o Mode – The most frequently occurring score in a distribution. A distribution in which two scores tie for the
highest frequency is a bimodal distribution.

Measures of Variability
- Variability is an indication of how scores in a distribution are scattered or dispersed.
- Statistics that describe the amount of variation in a distribution are referred to as measures of variability.
o Range – The distance between the highest and the lowest scores.
▪ Quartiles – Dividing points between the four quarters in the distribution.
• Interquartile Range – A measure of variability equal to the difference between Q3 and Q1.
Like the median, it is an ordinal statistic.
• Semi-interquartile Range – A related measure of variability that is equal to the interquartile
range divided by 2.
o Average Deviation – The average of the absolute deviations of scores from the mean of a data set:

AD = Σ|x| / n

where x = each score's deviation from the mean (X − X̄) and n = total number of scores.
o Standard Deviation – A measure of variability equal to the square root of the average squared deviations
about the mean.
o Variance – It is defined as the average of the squared difference from the mean; the standard deviation
squared.
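
To make these measures concrete, here is a minimal Python sketch (standard library only; the scores are hypothetical and chosen purely for illustration) computing each measure of variability defined above:

import statistics

scores = [7, 8, 9, 10, 10, 11, 12, 14, 15, 16]
mean = statistics.mean(scores)

# Range: distance between the highest and the lowest scores.
score_range = max(scores) - min(scores)

# Quartiles (one common convention; exact values vary slightly by method).
q1, q2, q3 = statistics.quantiles(scores, n=4)
interquartile_range = q3 - q1
semi_interquartile_range = interquartile_range / 2

# Average deviation: AD = sum(|x|) / n, where x = X - mean.
average_deviation = sum(abs(x - mean) for x in scores) / len(scores)

# Variance (average squared deviation) and its square root, the SD.
variance = statistics.pvariance(scores)
standard_deviation = statistics.pstdev(scores)

print(score_range, interquartile_range, semi_interquartile_range)
print(average_deviation, round(variance, 2), round(standard_deviation, 2))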

Skewness
- The nature and extent to which symmetry is absent in a distribution.
- Two types of skewness:
o Positively Skewed – The peak is on the left and the tail extends to the right; relatively few high scores, suggesting a difficult test.
o Negatively Skewed – The peak is on the right and the tail extends to the left; relatively few low scores, suggesting an easy test.

Kurtosis
- The steepness of a distribution in its center.
- Types of curves:
o Platykurtic – Relatively flat.
o Leptokurtic – Relatively peaked.
o Mesokurtic – Somewhere in the middle.

Normal Curve
- It is a bell-shaped, smooth, mathematically defined curve that is highest at its center.
- Also referred to as Gaussian.
- The area on the normal curve between 2 and 3 standard deviations above the mean (and, symmetrically, between 2 and 3 standard deviations below it) is referred to as a tail.
Standard Score
- It is a raw score that has been converted from one scale to another scale, where the latter scale has some arbitrarily set
mean and standard deviation.
- A standard score obtained by linear transformation retains a direct numerical relationship to the original raw score.
- A nonlinear transformation may be required when the data under consideration are not normally distributed yet
comparisons with normal distributions need to be made.
o Z Score – Results from the conversion of a raw score into a number indicating how many standard deviation
units the raw score is below or above the mean of the distribution:

z = (X − X̄) / s

where X = observed value, X̄ = mean, and s = standard deviation.
o T-Scores – Can be called a fifty plus or minus ten scale; that is, a scale with a mean set at 50 and a standard
deviation set at 10.
o Stanine – A contraction of the words standard and nine; a standard score scale divided into nine units, with
a mean of 5 and a standard deviation of approximately 2.
o Normalizing a distribution – Involves “stretching” the skewed curve into the shape of a normal curve and
creating a corresponding scale of standard scores, a scale that is technically referred to as a normalized
standard score scale.
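
The conversions above can be illustrated with a short Python sketch. The mean, standard deviation, and raw score below are hypothetical, and the stanine is computed with a simple linear approximation (mean 5, SD 2, clamped to 1-9) rather than the exact percentage-based bands:

# Hypothetical distribution parameters and one observed raw score.
mean, sd = 50.0, 10.0
raw = 65.0

# z score: standard deviation units above (+) or below (-) the mean.
z = (raw - mean) / sd

# T score: the "fifty plus or minus ten" scale (mean 50, SD 10).
t = 50 + 10 * z

# Stanine: linear approximation with mean 5 and SD about 2, clamped to 1-9.
stanine = max(1, min(9, round(5 + 2 * z)))

print(z, t, stanine)  # 1.5  65.0  8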

Correlational and Inference


• Correlation Coefficient
- Provides us with an index of the strength of the relationship between two things.
- The value obtained for the coefficient of correlation can be further interpreted by deriving from it what is
called a coefficient of determination, or r², which expresses the proportion of variance in one variable accounted for by the other.
o Correlation – An expression of the degree and direction of correspondence between two things.
▪ Positive Correlation – Both variables increase (or decrease) together.
▪ Negative Correlation – As one variable increases, the other decreases.
• Pearson R
- Also known as the Pearson correlation coefficient and the Pearson product-moment coefficient of correlation.
- A method of computing correlation when both variables are linearly related and continuous.
• Spearman Rho
- One commonly used alternative statistic is variously called a rank-order correlation coefficient, a rank-
difference correlation coefficient, or simply Spearman’s rho.
- A method of computing correlation used when sample sizes are small or when the variables are ordinal in nature.
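
A standard-library Python sketch of both coefficients follows. The data are hypothetical; Spearman's rho is computed as the Pearson r of the ranks (its general definition, which handles ties through average ranks), and squaring r yields the coefficient of determination mentioned above:

import statistics

x = [2, 4, 5, 7, 8, 10]
y = [1, 3, 4, 8, 9, 12]

def pearson_r(a, b):
    # Pearson product-moment correlation between two equal-length lists.
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a) * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

def ranks(values):
    # Average ranks for tied values (1 = smallest).
    order = sorted(range(len(values)), key=lambda i: values[i])
    rank = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            rank[order[k]] = avg
        i = j + 1
    return rank

r = pearson_r(x, y)
rho = pearson_r(ranks(x), ranks(y))  # Spearman rho = Pearson r of the ranks
print(round(r, 3), round(r ** 2, 3), round(rho, 3))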

Graphic Representation of Correlation


• Scatterplot
- A scatterplot is a simple graphing of the coordinate points for values of the X-variable (placed along the
graph’s horizontal axis) and the Y-variable (placed along the graph’s vertical axis).
- Scatterplots are useful in revealing the presence of curvilinearity (“eyeball gauge” of how curved a graph is)
in a relationship.
- An outlier is an extremely atypical point located at a relatively long distance—an outlying distance—from
the rest of the coordinate points in a scatterplot.

Meta Analysis
- Allows researchers to look at the relationship between variables across many studies.
- May be defined as a family of techniques used to statistically combine information across studies to produce single
estimates of the data under study.
- The estimates derived, referred to as effect size, may take several different forms.
- It promotes evidence-based practice, which may be defined as professional practice that is based on clinical and
research findings.
Reliability
- Reliability refers to the consistency in measurement.
- It is estimated through a reliability coefficient, an index of reliability: a proportion that indicates the ratio
between the true score variance on a test and the total variance.
- The greater the proportion of the total variance attributed to true variance, the more reliable the test.
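
In classical test theory notation (standard symbols, not drawn from this document), observed-score variance decomposes into true and error components, and the reliability coefficient is the true-score share of the total:

σ²_X = σ²_T + σ²_E
r_xx = σ²_T / σ²_X = 1 − (σ²_E / σ²_X)

where σ²_X = total observed-score variance, σ²_T = true score variance, and σ²_E = error variance.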

The Concept of Reliability


• Variance
- A statistic useful in describing sources of test score variability; error refers to the component of the
observed score that does not have to do with the test taker's true ability or the trait being measured.
o True Variance – Variance from true differences; the scores a test taker would have obtained if measurement were perfect.
o Error Variance – Variance from irrelevant, random sources.
• Measurement Error
- Refers to all of the factors associated with the process of measuring some variable, other than the variable
being measured.
o Random Error – A source of error in measuring a targeted variable caused by unpredictable fluctuations and
inconsistencies of other variables in the measurement process; affects precision, which is how reproducible
the same measurement is under equivalent circumstances.
o Systematic Error – Refers to a source of error in measuring a variable that is typically constant or
proportionate to what is presumed to be the true value of the variable being measured; affects the accuracy of
a measurement, or how close the observed value is to the true value.
o Transient Error – A source of error attributable to variations in the test taker’s feelings, moods, or mental
state over time.

Sources of Error Variance


• Test Construction
- One source of variance during test construction is item sampling or content sampling, which refers to
variation among items within a test as well as to variation among items between tests.
• Test Administration
• Test Scoring and Interpretation
• Other Sources of Error
- The error in such research may be a result of sampling error.

Types of Reliability
• Test-Retest Reliability
- An estimate of reliability obtained by correlating pairs of scores from the same people on two different
administrations of the same test.
- When the interval between testing is greater than six months, the estimate of test-retest reliability is often
referred to as the coefficient of stability.
• Parallel Forms and Alternate Forms Reliability
- Administering 2 forms of a test to the same group.
- The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms
or parallel-forms coefficient of reliability, which is often termed the coefficient of equivalence.
o Parallel Forms – For each form of the test, the means and variances of observed test scores are equal. The
term parallel forms reliability refers to an estimate of the extent to which item sampling and other errors
have affected test scores on versions of the same test when, for each form of the test, the means and variances
of observed test scores are equal.
o Alternate Forms – Different versions of a test that have been constructed so as to be parallel. The term
alternate forms reliability refers to an estimate of the extent to which these different forms of the same test
have been affected by item sampling error, or other error.
- An estimate of the reliability of a test can be obtained without administering the test twice to the same people.
Deriving this type of estimate entails an evaluation of the internal consistency (the degree to which the items
relate to one another) of the test items, through an internal consistency estimate of reliability, or estimate of inter-item consistency.
• Split-Half Reliability
- Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
- The computation of a coefficient of split-half reliability generally entails three steps:
▪ Step 1. Divide the test into equivalent halves.
▪ Step 2. Calculate a Pearson r between scores on the two halves of the test.
▪ Step 3. Adjust the half-test reliability using the Spearman–Brown formula (discussed shortly).
- The Spearman-Brown formula allows a test developer or user to estimate internal consistency reliability from a
correlation between two halves of a test:

r_SB = 2r_hh / (1 + r_hh)

where r_hh = the correlation between the two half-tests.

- Another acceptable way to split a test:


o Odd-Even Reliability – Assign odd-numbered items to one half of the test and even-numbered items to the
other half.
o Top-Bottom Method – Divide the test down the middle, assigning the first half of the items to one half and the second half to the other.
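
The three steps can be sketched in a few lines of Python (standard library only; the 0/1 item scores below are hypothetical), using the odd-even split described above:

import statistics

# Rows are examinees, columns are scored items (1 = correct, 0 = incorrect).
items = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]

# Step 1: odd-even split (0-based slicing: columns 0, 2, 4, ... vs. 1, 3, 5, ...).
odd_half = [sum(row[0::2]) for row in items]
even_half = [sum(row[1::2]) for row in items]

# Step 2: Pearson r between scores on the two halves.
def pearson_r(a, b):
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

r_hh = pearson_r(odd_half, even_half)

# Step 3: Spearman-Brown correction to estimate full-length reliability.
r_sb = (2 * r_hh) / (1 + r_hh)
print(round(r_hh, 3), round(r_sb, 3))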

Other Methods of Estimating Internal Consistency


• Inter-Item Consistency
- Refers to the degree of correlation among all the items on a scale.
- An index of inter-item consistency is useful in assessing the homogeneity (if the test contains items that
measure a single trait) or heterogeneity (test is composed of items that measure more than one trait) of the test.
• The Kuder-Richardson Formula 20
- The statistic of choice for determining the inter-item consistency of dichotomous items, that is, items that
have only two possible outcomes or scores.

• Coefficient Alpha
- Developed by Cronbach; estimates the internal consistency of items that are not scored dichotomously as 1 or 0,
and is widely used with Likert scales and other survey questionnaires (see the sketch after this list).
- In clinical settings, a Cronbach's alpha of 0.90 to 0.95 is considered acceptable; in research settings,
0.70 to 0.90 is acceptable.
• Average Proportional Distance (APD)
- A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists
between item scores.
• Inter-Scorer Reliability
- The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a
particular measure.
- Variously referred to as scorer reliability, judge reliability, observer reliability, and inter-rater reliability.
- The simplest way of determining the degree of consistency among scorers in the scoring of a test is to calculate
a coefficient of correlation, referred to as coefficient of inter-scorer reliability.
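
As a sketch of the coefficient alpha computation referenced above (hypothetical Likert-type item scores; with 0/1 items the same formula reduces to KR-20):

import statistics

# Rows are examinees, columns are their scores on four Likert-type items.
items = [
    [4, 3, 5, 4],
    [2, 2, 3, 3],
    [5, 4, 5, 5],
    [3, 3, 4, 2],
    [1, 2, 2, 1],
]

k = len(items[0])  # number of items
item_variances = [statistics.pvariance(col) for col in zip(*items)]
total_variance = statistics.pvariance([sum(row) for row in items])

# Coefficient alpha = (k / (k - 1)) * (1 - sum of item variances / total variance).
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(round(alpha, 3))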

Types of Characteristics
• Dynamic Characteristics
- It is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
• Static Characteristics
- It is a trait, state, or ability presumed to be relatively unchanging, such as intelligence.

Types of Tests
• Power Test
- A test with a time limit long enough to allow test takers to attempt all items, and with some items so difficult
that no test taker is able to obtain a perfect score.
• Speed Test
- Contains items of a uniform level of difficulty (typically uniformly low) so that, given generous time limits,
all test takers should be able to complete all the items correctly; in practice, the time limit is set so that few,
if any, test takers finish the entire test.
• Criterion-Referenced Test
- Designed to provide an indication of where a test taker stands with respect to some variable or criterion, such
as an educational or a vocational objective.

Classical Test Theory (CTT)


- The most widely used and accepted model in the psychometric literature today—rumors of its demise have been
greatly exaggerated.
- True Score – A value that, according to classical test theory, genuinely reflects an individual's ability (or trait)
level as measured by a particular test.
Domain Sampling Theory
- A test’s reliability is conceived of as an objective measure of how precisely the test score assesses the domain from
which the test draws a sample.

Generalizability Theory
- Based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation.
- Cronbach encouraged test developers and researchers to describe the details of the particular test situation or
universe leading to a specific test score. This universe is described in terms of its facets, which include things like
the number of items in the test, the amount of training the test scorers have had, and the purpose of the test
administration.
- A generalizability study examines how generalizable scores from a particular test are if the test is administered in
different situations.
- The influence of particular facets on the test score is represented by coefficients of generalizability.
- In a decision study, developers examine the usefulness of test scores in helping the test user make decisions.

Item Response Theory (IRT)


- A mathematical model of how people respond to test items, in which the probability of a particular response is a function of the test taker's standing on the underlying trait or ability.
- Discrimination signifies the degree to which an item differentiates among people with higher or lower levels of the
trait, ability, or whatever it is that is being measured.
- Rasch Model, a reference to an IRT model with very specific assumptions about the underlying distribution.
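
A small Python sketch of an item response function may help. The two-parameter logistic (2PL) model below is one standard IRT form: a is the item's discrimination (slope) and b its difficulty (location); fixing a = 1 for every item yields the Rasch model. The parameter values are illustrative only.

import math

def p_correct(theta, a, b):
    # Probability of a correct response given ability theta under the 2PL model.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A highly discriminating item (a = 2.0) separates abilities near b = 0 sharply,
# while a flat item (a = 0.5) barely differentiates among test takers.
for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(p_correct(theta, 2.0, 0.0), 3),
          round(p_correct(theta, 0.5, 0.0), 3))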

Types of Test Items


• Dichotomous Test Items
- Requires only 2 alternative answers.
• Polytomous Test Items
- Requires 3 or more alternative answers.

Standard Error of Measurement


- Also referred to as the standard error of a score; a tool used to estimate or infer the extent to which an observed
score deviates from a true score.
- The standard error of measurement provides such an estimate. Further, the standard error of measurement is useful in
establishing what is called a confidence interval: a range or band of test scores that is likely to contain the true score.
- Comparisons between scores are made using the standard error of the difference, a statistical measure that can aid
a test user in determining how large a difference should be before it is considered statistically significant.
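
A short Python sketch ties these ideas together. The test statistics are hypothetical; the formulas (SEM = SD × √(1 − r) and the standard error of the difference between two scores on the same test) are the standard textbook ones:

import math

sd = 15.0        # standard deviation of the test
r_xx = 0.90      # reliability coefficient
observed = 110.0 # one examinee's observed score

# Standard error of measurement and an approximate 95% confidence interval.
sem = sd * math.sqrt(1 - r_xx)
ci_low, ci_high = observed - 1.96 * sem, observed + 1.96 * sem

# Standard error of the difference between two scores on this same test.
sed = sd * math.sqrt(2 - r_xx - r_xx)

print(round(sem, 2), (round(ci_low, 1), round(ci_high, 1)), round(sed, 2))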
Validity
- Validity is a judgment or estimate of how well a test measures what it purports to measure in a particular context.
- The inference is a logical result or deduction.
- In this classic conception of validity, referred to as the trinitarian view (Guion, 1980), it might be useful to visualize
construct validity as being “umbrella validity” because every other variety of validity falls under it.
- There are many ways of approaching the process of test validation, and these different plans of attack are often
referred to as strategies.

Validation
- The process of gathering and evaluating evidence about validity.
- It may be more appropriate for test users to conduct their own validation studies with their own groups of test takers.
o Local Validation Studies – Absolutely necessary when the test user plans to alter in some way the format,
instructions, language, or content of the test.
- Ecological Validity refers to a judgment regarding how well a test measures what it purports to measure at the time
and place that the variable being measured (typically a behavior, cognition, or emotion) is actually emitted.
- Incremental Validity refers to the degree to which an additional predictor explains something about the criterion
being measured that is not explained by the predictors already in use.
- 3 categories of validity:
o Content Validity – A type of measurement validity that evaluates how well a test, survey, or measurement
method covers all relevant aspects of a construct; a test blueprint (the plan for the structure of the test)
specifies the content the test is supposed to cover.
o Criterion-Related Validity – A way to evaluate how well a test measures the outcome it was designed to
measure, based on a criterion: the standard against which a judgment or decision is made. When the criterion
measure is itself based, in whole or in part, on predictor measures, the result is called criterion contamination.
▪ Predictive Validity – Refers to the ability of a test to predict some event or outcome in the future.
▪ Concurrent Validity – Refers to the ability of a test to relate to a criterion measured at the same time (in the present).
o Construct Validity – Appropriateness of inferences drawn from test scores regarding individual standings
on a variable called construct; how well a test measures the concept it was designed to evaluate.
▪ Convergent Validity – Measures how well a test correlates with other tests that assess the same or
similar construct.
▪ Discriminant Validity – Measures how well a test correlates with tests that assess different
constructs.
o Face Validity – Relates more to what a test appears to measure to the person being tested than to what the
test actually measures; how relevant the items appear to be.

Types of Rates
• Base Rate
- The extent to which a particular trait, behavior, characteristic, or attribute exists in the population.
• Hit Rate
- Defined as the proportion of people a test accurately identifies as possessing or exhibiting a particular trait,
behavior, characteristic, or attribute.
• Miss Rate
- Defined as the proportion of people the test fails to identify as having, or not having, a particular characteristic
or attribute.
▪ False Positive – A miss wherein the test predicted that the test taker did possess the particular
characteristic or attribute being measured when in fact the test taker did not.
▪ False Negative – A miss wherein the test predicted that the test taker did not possess the particular
characteristic or attribute being measured when the test taker actually did.
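
These rates can be illustrated with a tiny Python sketch over hypothetical prediction/outcome pairs (1 = has the attribute, 0 = does not):

# Hypothetical test decisions versus actual status for ten people.
predicted = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
actual    = [1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

pairs = list(zip(predicted, actual))
hits = sum(p == a for p, a in pairs)                        # correct decisions
false_positives = sum(p == 1 and a == 0 for p, a in pairs)  # predicted yes, actually no
false_negatives = sum(p == 0 and a == 1 for p, a in pairs)  # predicted no, actually yes

n = len(pairs)
print(hits / n, false_positives / n, false_negatives / n)  # 0.7 0.2 0.1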

Method of Contrasted Groups


- One way of providing evidence for the validity of a test is to demonstrate that scores on the test vary in a predictable
way as a function of membership in some group.

Types of Evidence
• Convergent Evidence
- Exists when scores on the test undergoing construct validation tend to correlate highly in the predicted direction
with scores on older, more established, and already validated tests designed to measure the same construct.
• Discriminant Evidence
- A validity coefficient showing little (a statistically insignificant) relationship between test scores and/or other
variables with which scores on the test being construct-validated should not theoretically be correlated.

Multitrait-multimethod Matrix
- In 1959, Campbell and Fiske presented in Psychological Bulletin an experimental technique useful for examining
both convergent and discriminant validity evidence: the multitrait-multimethod matrix.

Factor Analysis
- A shorthand term for a class of mathematical procedures designed to identify factors or specific variables that are
typically attributes, characteristics, or dimensions on which people may differ.
o Exploratory Factor Analysis – Typically entails “estimating or extracting factors; deciding how many
factors to retain; and rotating factors to an interpretable orientation.”
o Confirmatory Factor Analysis – Researchers test the degree to which a hypothetical model (which includes
factors) fits the actual data.
o Factor Loading – Defined as “a sort of metaphor. Each test is thought of as a vehicle carrying a certain
amount of one or more abilities.”

Test Bias
- Bias is a factor inherent in a test that systematically prevents accurate, impartial measurement.

Rating Error
- A rating is a numerical or verbal judgment (or both) that places a person or an attribute along a continuum identified
by a scale of numerical or word descriptors known as a rating scale.
- A rating error is a judgment resulting from the intentional or unintentional misuse of a rating scale.
o Leniency Error – Also known as Generosity Error, which is an error in rating that arises from the tendency
on the part of the rater to be lenient in scoring, marking, and/or grading.
o Severity Error – A rating error that occurs when a rater consistently gives overly negative ratings, especially
when evaluating someone's performance or abilities.
o Central Tendency Error – A type of rater error that occurs when an evaluator rates most employees as
average or near the middle of a rating scale, regardless of their actual performance.
o Halo Effect – A cognitive bias that occurs when a person's overall impression of someone, something, or a
brand is based on a single positive characteristic.
Utility
- Utility refers to the usefulness or practical value of testing to improve efficiency.
- Utility Analysis is a family of techniques that entail a cost–benefit analysis designed to yield information relevant to
a decision about the usefulness and/or practical value of a tool of assessment.

Benefit
- Refers to profits, gains, or advantages. As we did in discussing costs associated with testing (and not testing), we can
view benefits in both economic and noneconomic terms.

Top-down Selection
- It is the process of awarding available positions to applicants whereby the highest scorer is awarded the first position,
the next highest scorer the next position, and so forth until all positions are filled.

Types of Tables
• Taylor-Russell Tables
- Provides an estimate of the extent to which inclusion of a particular test in the selection system will improve
selection.
▪ Selection Ratio – A numerical value that reflects the relationship between the number of people to
be hired and the number of people available to be hired.
▪ Base Rate – Refers to the percentage of people hired under the existing system for a particular
position.
• Naylor-Shine Tables
- Entails obtaining the difference between the means of the selected and unselected groups to derive an index
of what the test (or some other tool of assessment) is adding to already established procedures.

Brogden-Cronbach-Gleser Formula
- Used to calculate the dollar amount of a utility gain (an estimate of the benefit, monetary or otherwise, of using a
particular test or selection method) resulting from the use of a particular selection instrument under specified
conditions; see the formula sketched below.
- Productivity gain refers to an estimated increase in work output.
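
As conventionally presented in psychometrics texts (the symbols below are the standard ones, not taken from this document), the formula is:

utility gain = (N)(T)(r_xy)(SD_y)(Z̄_m) − (N)(C)

where N = the number of applicants selected, T = the average tenure (time the selectees stay on the job), r_xy = the validity coefficient of the selection test, SD_y = the standard deviation of job performance in dollar terms, Z̄_m = the mean standardized test score of the selected applicants, and C = the cost of testing one applicant.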

Cut Score
- Determines whether a test is passed or failed.
- At every stage in a multistage (or multiple hurdle) selection process, a cut score is in place for each predictor used.
o Relative Cut Score – A minimum passing score set on the basis of norm-related considerations, such as the
performance of other test takers, rather than a fixed standard.
o Norm-Referenced Cut Score – A score that compares a test taker's performance to their peers, regardless of
their understanding of the material.
o Fixed Cut Score – May also be referred to as absolute cut scores; made on the basis of having achieved a
minimum level of proficiency on a test.
o Multiple Cut Score – Refers to the use of two or more cut scores with reference to one predictor for the
purpose of categorizing test takers.

Compensatory Model of Selection


- An assumption is made that high scores on one attribute can, in fact, “balance out” or compensate for low scores on
another attribute.

Methods for Setting Cut Scores
o Angoff Method – Can be applied to personnel selection tasks as well as to questions regarding the presence
or absence of a particular trait, attribute, or ability.
o Known Groups Method – Entails collection of data on the predictor of interest from groups known to
possess, and not to possess, a trait, attribute, or ability of interest.
o Bookmark Method – An IRT-based method of setting cut scores that is more typically used in academic
applications.
o The Method of Predictive Yield – A technique for setting cut scores that takes into account the number of
positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of
applicant scores.
o Discriminant Analysis – An approach to setting cut scores that employs a family of statistical techniques to
determine which group (e.g., successful or unsuccessful) a test taker most closely resembles.
