Stats Step 3
SECTION II  Public Health Sciences: Epidemiology and Biostatistics

Quantifying risk
  Definitions and formulas are based on the classic 2 × 2 or contingency table.

                                  Disease or outcome
                                      +        –
  Exposure or intervention    +       a        b
                              –       c        d

  Odds ratio = (a/c) / (b/d) = ad/bc
  Relative risk = [a/(a + b)] / [c/(c + d)]
  Attributable risk = a/(a + b) – c/(c + d)
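The three formulas can be checked with a few lines of Python. This is a minimal sketch: the function name `risk_measures` and the example counts are hypothetical and chosen only for illustration.

```python
def risk_measures(a, b, c, d):
    """Return odds ratio, relative risk, and attributable risk for a 2 x 2 table.

    a = exposed with disease,   b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_exposed / risk_unexposed,
        "attributable_risk": risk_exposed - risk_unexposed,
    }


# Hypothetical counts, for illustration only.
result = risk_measures(a=20, b=80, c=5, d=95)
print(result)
```

With these counts the odds ratio (4.75) is larger than the relative risk (4.0), which is typical when the outcome is not rare.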
Demographic transition
  As a country proceeds to higher levels of development, birth and mortality rates decline to varying degrees, changing the age composition of the population.
  [Figure: population pyramids (age vs % population, male and female) for three successive stages, compared by birth rate, mortality rate, and life expectancy (short, long, long).]
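Likelihood ratio
  LR+ = (probability of positive result in patient with disorder) / (probability of positive result in patient without disorder) = sensitivity / (1 – specificity) = TP rate / FP rate
  LR– = (probability of negative result in patient with disorder) / (probability of negative result in patient without disorder) = (1 – sensitivity) / specificity = FN rate / TN rate
  LR+ > 10 indicates a highly specific test, while LR– < 0.1 indicates a highly sensitive test.
  Pretest odds × LR = posttest odds, where pretest odds = pretest probability / (1 – pretest probability). Posttest probability = posttest odds / (posttest odds + 1).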
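The calculation runs through odds rather than probabilities: convert the pretest probability to odds, multiply by the LR, then convert back. The sketch below illustrates this. The function names and the test characteristics (90% sensitivity, 80% specificity, 20% pretest probability) are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def posttest_probability(pretest_probability, lr):
    """Convert a pretest probability to a posttest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (posttest_odds + 1)


# Hypothetical test: 90% sensitive, 80% specific, 20% pretest probability.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
p_after_positive = posttest_probability(0.20, lr_pos)
p_after_negative = posttest_probability(0.20, lr_neg)
print(round(lr_pos, 2), round(lr_neg, 3))
print(round(p_after_positive, 3), round(p_after_negative, 3))
```

Here a positive result raises the probability of disease from 20% to about 53%, and a negative result lowers it to about 3%.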
              Disease +    Disease –
  Test +         TP           FP        PPV = TP/(TP + FP)
  Test –         FN           TN        NPV = TN/(TN + FN)
Sensitivity (true-positive rate)
  Proportion of all people with disease who test positive, or the ability of a test to correctly identify those with the disease.
  Value approaching 100% is desirable for ruling out disease and indicates a low false-negative rate.
  = TP / (TP + FN)
  = 1 – FN rate
  SN-N-OUT = highly SeNsitive test, when Negative, rules OUT disease.
  High sensitivity test used for screening.
Specificity (true-negative rate)
  Proportion of all people without disease who test negative, or the ability of a test to correctly identify those without the disease.
  Value approaching 100% is desirable for ruling in disease and indicates a low false-positive rate.
  = TN / (TN + FP)
  = 1 – FP rate
  SP-P-IN = highly SPecific test, when Positive, rules IN disease.
  High specificity test used for confirmation after a positive screening test.
Positive predictive value
  Probability that a person who has a positive test result actually has the disease.
  PPV = TP / (TP + FP)
  PPV varies directly with pretest probability (baseline risk, such as prevalence of disease): high pretest probability → high PPV.
Negative predictive value
  Probability that a person with a negative test result actually does not have the disease.
  NPV = TN / (TN + FN)
  NPV varies inversely with prevalence or pretest probability.
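The dependence of PPV and NPV on prevalence is easiest to see with numbers. The following sketch applies one test to two populations. The helper names, the 90%/90% test, the two prevalences, and the population size of 10,000 are all hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2 x 2 test counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def counts_at_prevalence(prevalence, sensitivity, specificity, n=10_000):
    """Expected TP, FP, FN, TN in a population of n at a given prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity
    fn = diseased - tp
    tn = healthy * specificity
    fp = healthy - tn
    return tp, fp, fn, tn


# Hypothetical test (90% sensitive, 90% specific) at 1% and 30% prevalence.
low = diagnostic_metrics(*counts_at_prevalence(0.01, 0.90, 0.90))
high = diagnostic_metrics(*counts_at_prevalence(0.30, 0.90, 0.90))
print(round(low["ppv"], 3), round(low["npv"], 3))
print(round(high["ppv"], 3), round(high["npv"], 3))
```

Sensitivity and specificity are the same in both populations, but PPV rises and NPV falls as prevalence increases.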
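[Figure: possible cutoff values (A, B, C) for + vs – test result. Overlapping distributions of test results (x-axis) against number of people (y-axis) for people without and with disease, with the FN and FP regions marked.]
  A = 100% sensitivity cutoff value.
  Raising the cutoff value (B → C): ↑ specificity, ↑ PPV, ↓ sensitivity, ↓ NPV (↑ FN, ↓ FP).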
Receiver operating characteristic curve
  ROC curve demonstrates how well a diagnostic test can distinguish between 2 groups (eg, disease vs healthy). Plots the true-positive rate (sensitivity) against the false-positive rate (1 – specificity).
  A better-performing test has a larger area under the curve (AUC), with the curve closer to the upper left corner.
  In diseases diagnosed based on low lab values (eg, anemia), the curve is flipped: lowering the cutoff further → ↓ FP, ↑ FN; raising the cutoff → ↓ FN, ↑ FP.
  [Figure: ROC curve, TP rate (sensitivity) vs FP rate (1 – specificity), each from 0 to 1. Ideal test (AUC = 1); actual test (0.5 < AUC < 1); no predictive value (AUC = 0.5).]
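An ROC curve can be traced by sweeping the cutoff across the range of test results and recording the FP rate and TP rate at each step. The sketch below does this and estimates the AUC with the trapezoidal rule. It assumes that high values indicate disease. The function names and lab values are hypothetical.

```python
def roc_points(diseased, healthy, cutoffs):
    """Return sorted (FP rate, TP rate) pairs, one per cutoff.

    A result is called positive when value >= cutoff (high values indicate disease).
    """
    points = []
    for cutoff in cutoffs:
        tp_rate = sum(v >= cutoff for v in diseased) / len(diseased)
        fp_rate = sum(v >= cutoff for v in healthy) / len(healthy)
        points.append((fp_rate, tp_rate))
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical lab values, for illustration only.
diseased = [5, 6, 7, 8, 9]
healthy = [2, 3, 4, 5, 6]
cutoffs = range(1, 11)  # cutoff 1 calls everyone positive, 10 calls no one positive
points = roc_points(diseased, healthy, cutoffs)
print(points[0], points[-1])
print(round(auc(points), 2))
```

Because the two groups overlap only partly, the AUC falls between 0.5 and 1.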
Precision vs accuracy
Precision (reliability)
  The consistency and reproducibility of a test.
  The absence of random variation in a test.
  Random error ↓ precision in a test.
  ↑ precision → ↓ standard deviation.
  ↑ precision → ↑ statistical power (1 − β).
Accuracy (validity)
  The closeness of test results to the true values.
  The absence of systematic error or bias in a test.
  Systematic error ↓ accuracy in a test.
  [Figure: examples of high and low accuracy.]
survival time —
mortality —
Faster recovery time —
Extensive vaccine administration
risk factors
diagnostic sensitivity
New effective treatment started —
contact between patients with and without noninfectious disease — —
contact between infected and noninfected patients with airborne infectious disease
Statistical distribution
Measures of central tendency
  Mean = (sum of values)/(total number of values). Most affected by outliers (extreme values).
  Median = middle value of a list of data sorted from least to greatest. If there is an even number of values, the median will be the average of the middle two values.
  Mode = most common value. Least affected by outliers.
Measures of dispersion
  Standard deviation = how much variability exists in a set of values, around the mean of these values.
  Standard error = an estimate of how much variability exists in a (theoretical) set of sample means around the true population mean.
  σ = SD; n = sample size.
  Variance = (SD)².
  SE = σ/√n.
  SE ↓ as n ↑.
Normal distribution
  Gaussian, also called bell-shaped.
  Mean = median = mode.
  For normal distribution, mean is the best measure of central tendency.
  For skewed data, median is a better measure of central tendency.
  [Figure: normal curve marked at ±1σ, ±2σ, and ±3σ; 68% of values fall within ±1σ.]
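The measures of central tendency and dispersion can be computed with Python's standard `statistics` module. In this sketch the data set is hypothetical and includes one outlier to show how it affects each measure. The sample standard deviation is used as the estimate of σ in SE = σ/√n, which is an assumption of the example.

```python
import math
import statistics

values = [2, 3, 3, 4, 5, 6, 40]  # hypothetical data with one outlier (40)

mean = statistics.mean(values)      # pulled upward by the outlier
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most common value

sd = statistics.stdev(values)           # sample standard deviation
variance = statistics.variance(values)  # equals sd squared
se = sd / math.sqrt(len(values))        # standard error of the mean

print(mean, median, mode)
print(round(sd, 2), round(variance, 2), round(se, 2))
```

The outlier moves the mean to 9 while the median (4) and mode (3) stay near the bulk of the data.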
Testing errors
Type I error (α)
  Stating that there is an effect or difference when none exists (H0 incorrectly rejected in favor of H1).
  α is the probability of making a type I error (usually 0.05 is chosen). If P < α, then assuming H0 is true, the probability of obtaining the test results would be less than the probability of making a type I error. H0 is therefore rejected as false.
  Statistical significance ≠ clinical significance.
  Also called false-positive error.
  1st time boy cries wolf, the town believes there is a wolf, but there is not (false positive).
  You can never “prove” H1, but you can reject the H0 as being very unlikely.
  α level (statistical significance level).
Type II error (β)
  Stating that there is not an effect or difference when one exists (H0 is not rejected when it is in fact false).
  β is the probability of making a type II error. β is related to statistical power (1 – β), which is the probability of rejecting H0 when it is false.
  ↑ power and ↓ β by:
    ↑ sample size
    ↑ expected effect size
    ↑ precision of measurement
  Also called false-negative error.
  2nd time boy cries wolf, the town believes there is no wolf, but there is one.
  If you ↑ sample size, you ↑ power. There is power in numbers.
  Generally, when type I error increases, type II error decreases.
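How sample size, effect size, and measurement variability drive power can be illustrated with a one-sided, one-sample z-test. The text does not specify this test. It is chosen here only because its power has a simple closed form. The function name and all numeric inputs are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect, sd, n, alpha=0.05):
    """Power (1 - beta) of a one-sided, one-sample z-test.

    effect = true shift in the mean, sd = standard deviation of measurements,
    n = sample size, alpha = type I error rate.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)     # rejection threshold under H0
    shift = effect / (sd / n ** 0.5)  # true shift in standard-error units
    return 1 - z.cdf(z_crit - shift)  # probability of rejecting H0 when it is false


for n in (10, 40, 160):
    print(n, round(power_one_sided_z(effect=0.5, sd=2.0, n=n), 3))
```

Power rises as n grows. Calling the function with a larger `effect`, a smaller `sd`, or a larger `alpha` also increases power, which matches the relationships listed above.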
Statistical vs clinical significance
  Statistical significance: defined by the likelihood of study results being due to chance. If there is a high statistical significance, then there is a low probability that the results are due to chance.
  Clinical significance: measure of effect on treatment outcomes. An intervention with high clinical significance is likely to have a large impact on patient outcomes/measures.
  Some studies have a very high statistical significance, but the proposed intervention may not have any clinical impact/significance.
Confidence interval
  Range of values within which the true mean of the population is expected to fall, with a specified probability.
  Confidence level = 1 – α. The 95% CI (corresponding to α = 0.05) is often used. As sample size increases, CI narrows.
  CI for sample mean = x̄ ± Z(SE)
  For the 95% CI, Z = 1.96.
  For the 99% CI, Z = 2.58.
  H0 is rejected (and results are significant) when:
    95% CI for mean difference excludes 0
    95% CI for OR or RR excludes 1
    CIs between two groups do not overlap
  H0 is not rejected (and results are not significant) when:
    95% CI for mean difference includes 0
    95% CI for OR or RR includes 1
    CIs between two groups do overlap (usually)
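As a worked example of CI = mean ± Z(SE), the sketch below computes 95% and 99% intervals for a small sample. The function name and the measurements are hypothetical. The sample SD stands in for σ and the Z values are taken as given, both simplifying assumptions.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (lower, upper) bounds of the CI for the sample mean: mean ± Z(SE)."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * se, mean + z * se


# Hypothetical measurements, for illustration only.
sample = [118, 122, 125, 130, 121, 127, 119, 124]
ci95 = mean_confidence_interval(sample, z=1.96)
ci99 = mean_confidence_interval(sample, z=2.58)
print(ci95, ci99)
```

The 99% interval is wider than the 95% interval because a larger Z is needed to capture the true mean with higher probability.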
Meta-analysis A method of statistical analysis that pools summary data (eg, means, RRs) from multiple studies
for a more precise estimate of the size of an effect. Also estimates heterogeneity of effect sizes
between studies.
Improves power, strength of evidence, and generalizability (external validity) of study findings.
Limited by quality of individual studies and bias in study selection.
Pearson correlation coefficient
  A measure of the linear correlation between two variables. r is always between −1 and +1. The closer the absolute value of r is to 1, the stronger the linear correlation between the 2 variables.
  Variance is how much the measured values differ from the average value in a data set.
  Positive r value → positive correlation (as one variable ↑, the other variable ↑).
  Negative r value → negative correlation (as one variable ↑, the other variable ↓).
  Coefficient of determination = r² (amount of variance in one variable that can be explained by variance in another variable).
  [Figure: scatterplots for r = –0.8, r = –0.4, r = 0, r = +0.4, r = +0.8.]
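The coefficient can be computed directly from its definition, the covariance of the two variables divided by the product of their spreads. This is a minimal sketch. The function name `pearson_r` and the paired values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 6]    # tends to rise with x
y_down = [6, 4, 5, 4, 2]  # tends to fall as x rises

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 3), round(r_up ** 2, 3))
print(round(r_down, 3), round(r_down ** 2, 3))
```

The two data sets give r values of equal magnitude and opposite sign, so their coefficients of determination (r²) are identical.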
Decision-making capacity
  Physician must determine whether the patient is psychologically and legally capable of making a particular healthcare decision.
  Note that decisions made with capacity cannot be revoked simply if the patient later loses capacity.
  Intellectual disabilities and mental illnesses are not exclusion criteria for informed decision-making unless the patient’s condition presently impairs their ability to make healthcare decisions.
  Capacity is determined by a physician for a specific healthcare-related decision (eg, to refuse medical care).
  Competency is determined by a judge and usually refers to more global categories of decision-making (eg, legally unable to make any healthcare-related decision).
  Four major components of decision-making:
    Understanding
    Appreciation
    Reasoning
    Expressing a choice