Statistics (RDA) Formula Sheet
*A psychological test is essentially an objective & standardised measure of a sample of behaviour. The diagnostic or predictive value of a psychological test depends on the degree to which it serves as an indicator of a relatively broad & significant area of behaviour.

Test response format
* is it SELF-REPORT? completed by the person or not
* are there open- or closed-ended questions
* what type of answer is required e.g. MCQ, Likert

Instruments: lecturer evaluations, exams, admission tests, tests
Sourcing tests: publishers, Buros Institute, Mental Measurements Yearbook, Google etc.

Importance of tests
* tests are developed objectively
* present a collective understanding of a phenomenon
* best way to get info, make info gathering easy

Test scores (observed) reflect 2 factors
1. factors contributing to consistency – stable characteristics of the individual or attribute being measured
2. factors contributing to inconsistency – features of an individual or testing situation that affect test scores but have nothing to do with the attribute being measured

*Reliability – refers to the consistency of scores obtained by the same persons when re-examined with the same test on different occasions, or with different sets of equivalent items, or under other variable examining conditions.
Theory of true scores
*True score = not an indication of the true or real level of an attribute; it represents the combination of all factors leading to consistency in measurement.
Observed test score = true score + errors of measurement: X = T + e
*Variability = given the assumptions of the theory of true scores, reliability theory tells us the variance of observed scores is simply the variance of the true scores plus the variance of the errors of measurement. When a test is reliable, error variance has little effect on the variability of observed scores.
Reliability coefficient (rxx) = can be seen as the ratio of true score variance to the total variance of observed test scores.
Interpretation of the reliability coefficient = rxx tells us the proportion of the variance in observed test scores that is due to variance in true scores. The theory of true scores helps interpret the meaning of reliability info and understand what a reliability coefficient tells us.
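The true-score model and the variance-ratio definition of rxx above can be sketched numerically. This is a minimal illustration with simulated data (the means, standard deviations and sample size are arbitrary assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate the classical model X = T + e for 1,000 test takers.
true = rng.normal(50, 10, size=1000)   # stable attribute (true scores T)
error = rng.normal(0, 5, size=1000)    # random measurement error e
observed = true + error                # observed scores X

# Observed-score variance decomposes into true + error variance,
# and the reliability coefficient is the true/observed variance ratio.
var_x = observed.var(ddof=1)
r_xx = true.var(ddof=1) / var_x
print(round(r_xx, 2))  # roughly 10**2 / (10**2 + 5**2) = 0.80
```

With an error standard deviation half that of the true scores, rxx comes out near 0.8, i.e. about 80% of observed-score variance reflects true-score variance.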
Interpreting rxx:
0.9 – 1.0   Excellent reliability
0.8 – 0.9   Very good to excellent
0.7 – 0.8   Good to fair
0.6 – 0.7   Fair to poor
Below 0.6   Very poor

*If a test is unreliable, it can't be valid; a test can, however, be reliable & invalid.
* Bias = is said to exist when a test makes a systematic error of measurement or prediction. It is a statistical characteristic of test scores or predictions based on those scores. Can be investigated and corrected.
* Test bias in the GGAT = dated, American, middle-class content; abstract; writing- & language-based.
*Fairness – a test can be biased yet fair. Fairness refers to a value judgement regarding decisions or actions taken as a result of test scores – subjective.

Bias                          Fairness
Test scores/predictions       Actions/decisions
Statistical characteristic    Value judgements
Empirical def                 Philosophical def
Scientifically determined     Not scientifically determined

Bias
1. Bias in measurement = a test makes systematic errors in measuring a specific characteristic or attribute. A biased test systematically under- or overestimates the level of an attribute in individuals from particular groups.
2. Bias in prediction = a test makes systematic errors in predicting some criterion or outcome. A biased test systematically under- or over-predicts the level of an attribute in individuals from particular groups.
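One common way to check bias in prediction is to fit a single regression line to pooled data and inspect the residuals per group: systematic under- or over-prediction shows up as group residual means away from zero. A sketch with hypothetical, simulated selection-test data (group labels, effect sizes and sample sizes are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a selection test predicting job performance,
# where group B performs 5 points better at any given test score.
n = 500
test_a = rng.normal(100, 15, n)
perf_a = 0.5 * test_a + rng.normal(0, 5, n)      # group A
test_b = rng.normal(100, 15, n)
perf_b = 0.5 * test_b + 5 + rng.normal(0, 5, n)  # group B

scores = np.concatenate([test_a, test_b])
perf = np.concatenate([perf_a, perf_b])

# Fit one pooled prediction line, as a test user would.
slope, intercept = np.polyfit(scores, perf, 1)
residuals = perf - (slope * scores + intercept)

# Systematic bias in prediction: group residual means differ from 0.
bias_a = residuals[:n].mean()   # negative: group A is over-predicted
bias_b = residuals[n:].mean()   # positive: group B is under-predicted
print(round(bias_a, 1), round(bias_b, 1))
```

The pooled line splits the difference between the groups, so it consistently over-predicts one group and under-predicts the other, which is exactly the systematic error the notes describe.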
Types of reliability
1. Test-retest – also called stability estimates. These establish the degree of consistency of test scores over time. Method = needs at least 2 administrations; correlate the scores for individuals on each occasion to give an estimate of stability. The length of time between tests is NB. Sources of error = temporal factors, so it is only suitable for tests of stable constructs – genuine change in the attribute being measured can reduce the estimate. Reactivity occurs when the experience of having taken the test before changes the score the second time. Carry-over = the person remembers answers. Also impractical because it requires 2 administrations.
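The test-retest method above reduces to correlating the two administrations. A minimal sketch with simulated data (sample size and score distributions are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test-retest data: the same 100 people tested twice
# on a stable attribute, with independent error on each occasion.
true = rng.normal(25, 4, size=100)          # stable attribute
time1 = true + rng.normal(0, 2, size=100)   # administration 1
time2 = true + rng.normal(0, 2, size=100)   # administration 2

# The stability estimate is the correlation across occasions.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))
```

Because the attribute is simulated as stable, the correlation is high; genuine change in the attribute between occasions would pull this estimate down, as the notes warn.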
2. INTERNAL CONSISTENCY – MOST COMMONLY REPORTED TYPE. Internal consistency indicates how well performance by a person on each item of the test relates to their performance on all other items of the same test. Most commonly used = Cronbach's coefficient alpha, which is suitable for most test response formats; others include Kuder-Richardson 20, Spearman-Brown etc. Method = estimate reliability using the number of items in the test and the average inter-item correlation; highly practical; reliability is affected by the number of observations. Sources of error = content factors – inconsistency of test content, consistency & homogeneity.

3. Split-half – indicates how well performance by a person on one half of the test relates to their performance on the other half of the same test. Method = correlate persons' scores on both halves; requires a correction formula; practical & avoids problems such as carry-over. Sources of error = content factors (inconsistency of content), which split to use, underestimation (are the halves equivalent?).

Types of validity
1. Face validity – whether test items appear to users & subjects as appropriate for the purpose. Concerns the style and appropriateness of items – based on judgement. Impacts on test taking and rapport (taking the test seriously).
2. Content validity – determining whether the test content covers a representative sample of the behavioural domain measured. Degree of match between the test specification (a structured description of relevant content and domain) and the task specification (what the test actually requires by way of response); related to relevance of responses. Qualitative. What kinds of deviations are okay?
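The two item-based reliability estimates described above, coefficient alpha and the split-half correlation stepped up with the Spearman-Brown formula, can be sketched numerically. The item data here are simulated (the numbers of people and items are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical item-level data: 200 people answering a 10-item test
# whose items all tap one underlying attribute.
n_people, n_items = 200, 10
attribute = rng.normal(0, 1, (n_people, 1))
items = attribute + rng.normal(0, 1, (n_people, n_items))

# Cronbach's coefficient alpha: k/(k-1) * (1 - sum of item variances
# divided by the variance of the total score).
k = n_items
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)

# Split-half: correlate odd- and even-item halves, then step the
# half-length correlation up with the Spearman-Brown formula.
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
r_split = 2 * r_half / (1 + r_half)

print(round(alpha, 2), round(r_split, 2))
```

Both estimates land close together here because the simulated items are homogeneous; with heterogeneous content ("inconsistency of test content"), both would drop.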
4. Parallel or alternate forms – can only be calculated if there are 2 different forms of the same test; these establish the degree of consistency between performance on the two different versions. Method = correlate measurements on the two tests; reduces reactivity & carry-over. Sources of error = temporal factors (genuine change in the attribute being measured), carry-over, reactivity, content factors, underestimation; impractical (time & money) – super tests.

3. Construct validity – constructs are abstract summaries of some regularity in nature, related to or concerned with concrete, observable entities or events: ideas constructed to help summarise a group of related phenomena, e.g. gravity.
4. Criterion validity – assesses the validity of the test as a means of predicting a particular outcome, i.e. making a judgement or decision. Requires a direct and independent measure of whatever the test is being used to predict, e.g. occupational selection tests. 2 subtypes = predictive & concurrent.
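A criterion (predictive) validity check amounts to correlating test scores with a later, independently measured criterion. A sketch with hypothetical selection-test data (the criterion, effect size and sample size are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical predictive-validity study: selection-test scores
# gathered at hiring, job-performance ratings gathered later.
n = 150
test_scores = rng.normal(100, 15, n)
performance = 0.4 * test_scores + rng.normal(0, 10, n)

# The validity coefficient is the test-criterion correlation.
r_criterion = np.corrcoef(test_scores, performance)[0, 1]
print(round(r_criterion, 2))
```

For concurrent validity the same calculation applies; the only difference is that the criterion is measured at (roughly) the same time as the test.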
5. Inter-rater – or inter-scorer. Examines the consistency of scores between different raters or judges. Only applies to tests with subjective scoring rules; establishes the degree of similarity in the pattern of scores or ratings given by different scorers. Method = correlate the scores between raters; expect a strong positive correlation for good reliability. Sources of error = difficult to get 2 experienced raters; impractical; needs effective coding frames.

Investigating construct validity
Often requires psychological theory. Process = identify behaviour related to the construct; identify other constructs & decide whether each is related or not; identify behaviours related to those constructs and determine if and how each is related, e.g. extraversion and introversion.

2 subtypes of construct validity
1. Convergent validity = measures of the same or closely related constructs are strongly correlated; measures converge on similar results.
2. Discriminant validity = measures of unrelated constructs are not strongly correlated (ideally no relationship). Measures effectively discriminate between unrelated constructs and behaviours.
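The convergent/discriminant pattern above can be illustrated with a small simulation: two scales measuring the same construct should correlate strongly, while a scale for an unrelated construct should correlate near zero. The constructs and scale names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical scores for 300 people: two extraversion scales
# (related constructs) and one spatial-ability scale (unrelated).
n = 300
extraversion = rng.normal(0, 1, n)
scale_a = extraversion + rng.normal(0, 0.5, n)  # extraversion measure 1
scale_b = extraversion + rng.normal(0, 0.5, n)  # extraversion measure 2
spatial = rng.normal(0, 1, n)                   # unrelated construct

# Convergent validity: related measures correlate strongly.
r_convergent = np.corrcoef(scale_a, scale_b)[0, 1]
# Discriminant validity: unrelated measures correlate weakly.
r_discriminant = np.corrcoef(scale_a, spatial)[0, 1]
print(round(r_convergent, 2), round(r_discriminant, 2))
```

A measure passes this check when the convergent correlation is high and the discriminant correlation is close to zero, i.e. the scales discriminate between unrelated constructs.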