DIMI 4015: INTRODUCTION
TO MEDICAL STATISTICS
Lecture 5 Updated
Richard Kollmar 2024-06-14
Today’s Quiz
Homework 2024-06-10-B
Q1: Can you find an explanation why the p-value for our parents’
age data was so much smaller with the paired-sample t-test than
with the independent-sample t-test?
When the same data are analyzed with a paired-sample t-test instead of an
independent-sample t-test, the paired test typically yields a lower p-value.
This is because the two values within a pair tend to be more alike than
values from independent samples. The paired t-test analyzes the within-pair
differences, so the variation between pairs drops out of the analysis. This
reduced variation increases the t-value, which decreases the p-value. The
paired-sample t-test thus takes the inherent connections within the data set
into account, like pre- and post-treatment measurements on the same
individuals, which enhances the test's ability to identify a difference even
though it is based on fewer data points (n differences instead of 2n values).
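To see this mechanism in numbers, here is a minimal Python sketch that computes both t statistics on the same small data set. The data are hypothetical (made up for illustration, not the course's parents' age data), the function names are illustrative, and the sketch stops at the t statistic and degrees of freedom; the p-values themselves would come from the t distribution, e.g. in SPSS.

```python
import math
import statistics

# Hypothetical paired data (made up for illustration, NOT the course data):
# two related measurements on each of six pairs, e.g. two parents per family.
first = [28, 31, 35, 40, 26, 33]
second = [30, 34, 36, 43, 29, 35]


def paired_t(x, y):
    """Paired-sample t: a one-sample t-test on the within-pair differences."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return statistics.mean(diffs) / se, n - 1


def independent_t(x, y):
    """Independent-sample (Student, pooled-variance) t; ignores the pairing."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(y) - statistics.mean(x)) / se, nx + ny - 2


t_p, df_p = paired_t(first, second)
t_i, df_i = independent_t(first, second)
print(f"paired:      t = {t_p:.2f}, df = {df_p}")  # t = 7.00, df = 5
print(f"independent: t = {t_i:.2f}, df = {df_i}")  # t = 0.80, df = 10
```

The mean difference (about 2.33) is identical in both analyses; only the standard error changes. Because the members of each pair move together, the differences vary far less (SD about 0.82) than the raw values (SD about 5), so the paired t is much larger, which more than compensates for its smaller df.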
Independent- vs. Paired-Sample t-Tests
and Sources of Variance
Homework 2024-06-10-B
• Filenames
• More than two levels in categorical grouping variable
• Categorical dependent variable
• Descriptive table not split by grouping variable
• Descriptive graph not split or averages only
• Non-normal distribution of dependent variable
• Statistical vs. clinical significance
• Contents and formatting of report derived from .spv output
Statistical Concepts
Hypothesis Formulation
Scientific Method, Null vs. Alternative Hypothesis, Iterative Testing, Falsification, …
Experimental Design
Variable Types, Measurement Scales, Study Designs, Types of Bias;
Sensitivity, Specificity, Predictive Values of Diagnostic Tests, …
Data Wrangling
Data Import, Cleaning, Labeling, Transformation, Recoding, Sorting, Filtering, …
Exploratory Data Analysis
Descriptive Statistics, Central Tendency, Dispersion, Shape, Graphing, …
Hypothesis Testing / Inference
Models, Assumptions, Probability, Probability Distributions, Significance, p-Value,
Confidence Interval, False Positive Risk, t-Test, ANOVA, Correlation, Regression,
χ²-Test, Non-Parametric Tests, Linear Mixed Models, …
Documentation & Reporting
Data Preservation, Scripting, Presentation, Manuscript Preparation, …
8. Hypothesis Testing … - cont’d
• Assumptions of the t-test:
• The subjects in each group (sample) were randomly selected
from, or are at least representative of, the larger populations.
• The subjects in one group were obtained independently from
the subjects in the other group. (We will explore this
assumption in more detail later on!)
• The subjects within each group were obtained independently
from each other.
• The values of the observations in the underlying populations
follow a normal (“Gaussian”) distribution.
• The variances/standard deviations of the values in the
underlying populations are equal.
Study Designs and Sources of Bias
• Study Design: The prior plan for performing various aspects of a
study—hypothesis formulation, subject recruitment, treatment
administration, outcome measurement, data analysis, …
• Bias: A flaw in the design, execution, or analysis of a study that
distorts the outcome and conclusions
• A critical consideration in developing a study design is the
avoidance of any form of bias
Common Forms of Bias
• Selection Bias
Systematic differences in the baseline characteristics of the groups that are being
compared. E.g., different sex ratios, age ranges, health status, etc.
• Performance Bias
Systematic differences in the way the groups are treated or exposed to other factors
outside of the intervention of interest. E.g., surgery group gets more attention than
non-surgery control group.
• Detection Bias
Systematic differences in the way outcomes are determined in the different groups. E.g.,
leading questions when staff is not blinded to group assignment.
• Attrition Bias
Systematic differences between groups regarding exclusion (omission from study) or
attrition (withdrawal from study). E.g., harsh side effects from drug but not from placebo
cause uneven withdrawal.
• Reporting Bias
Systematic differences between reported and unreported findings. E.g., negative studies
(no significant differences found) are hard to publish.
• Many others!
Source: Cochrane
Common Study Designs
• Case Report - single subject, no comparison in study
• Cross-Sectional and Ecological - single point-in-time survey of subjects or groups
• CaSe-Control - group by existing Status/disease, compare treatment/exposure
• CohorT - group by existing Treatment/exposure, compare status/disease
• Randomized Controlled - assign at random to treatment/exposure, compare status/
disease
• Retrospective (never randomized-controlled) vs. Prospective (never case-control)
- Retrospective: start at the time of exposure (historically) and follow forward to the point at which the outcome occurred (known in the present)
- Prospective: start at the time of exposure and follow into the future to assess for the outcome
• Non-Blinded vs. Blinded (subject) vs. Double-Blinded (subject and staff)
• Placebo-controlled
• Clinical equipoise and ethics of clinical research:
- Only study treatments if you truly don’t know whether they will be beneficial.
- Never assign study treatments (including none) worse than existing treatments.
Case-Control vs. Cohort Trial
Source: Wikipedia
Randomized Controlled Trial
Source: Wikipedia
Checking t-Test Assumptions: Normality
• Assumption: the collected data follow a normal distribution; this matters more for small sample sizes than for larger ones (normality tests report only a p-value)
• Decision rule: p-value > significance level (0.05) ➙ the distribution of the data is not significantly different from a normal distribution ➙ normality can be assumed
• Kolmogorov-Smirnov Test of Normality
Not significant ➙ Assume normal distribution of observations
Significant ➙ Assume non-normal distribution
• Shapiro-Wilk Test of Normality
Not significant ➙ Assume normal distribution of observations
Significant ➙ Assume non-normal distribution
• Problem with Kolmogorov-Smirnov and Shapiro-Wilk Tests
Too much sensitivity for large samples
Too little sensitivity for small samples
• Quantile-Quantile (Q-Q) Plot
Qualitative visual assessment
• Recommendation
Don’t worry about non-normality for large samples thanks to the Central Limit
Theorem
Use a non-parametric test or a randomization test or bootstrapping for small
samples
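Two of the tools above can be sketched in a few lines of Python, assuming a small hypothetical sample: the coordinates of a normal Q-Q plot and a percentile bootstrap confidence interval for the mean. This is an illustration only; the function names are hypothetical, and the plotting positions (i - 0.5)/n are one common convention, which may differ from the formula SPSS uses for its Q-Q plots.

```python
import random
import statistics

# Hypothetical small sample, made up for illustration.
sample = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.3]


def qq_points(data):
    """(theoretical standard-normal quantile, observed value) pairs for a Q-Q plot.

    Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    """
    n = len(data)
    std_normal = statistics.NormalDist()  # mean 0, SD 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]


def bootstrap_mean_ci(data, n_resamples=10_000, confidence=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    # Resample with replacement (same size as the data) and record each mean
    boot_means = sorted(statistics.fmean(rng.choices(data, k=n))
                        for _ in range(n_resamples))
    alpha = 1 - confidence
    # Cut off alpha/2 of the bootstrap means in each tail
    lower = boot_means[round(alpha / 2 * n_resamples)]
    upper = boot_means[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:4.1f}")
print("95% bootstrap CI for the mean:", bootstrap_mean_ci(sample))
```

Plotting the pairs, points that lie close to a straight line suggest approximate normality; the judgment stays qualitative. The bootstrap interval makes no normality assumption, which is why it suits small samples.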
Checking t-Test Assumptions: Homoscedasticity
• Homoscedasticity = Equality (Homogeneity) of Variances among
Groups
vs. Heteroscedasticity = Non-Equality of Variances
• Levene’s Test for Equality of Variances: tests the null hypothesis that the samples being compared come from populations with the same variance
Not significant ➙ Assume equal variances
Significant ➙ Assume unequal variances
• Problem with Levene’s Test
Too much sensitivity for large samples
Too little sensitivity for small samples
• Recommendation
Always use Welch’s t-test, which does not assume equal variances
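As a rough illustration of what lies behind these two procedures, the following Python sketch computes Levene's W statistic (the classic mean-centred form; the Brown-Forsythe variant centres on the median instead) and Welch's t with Welch-Satterthwaite degrees of freedom. The groups and function names are hypothetical, and p-values are not computed: W would be compared with an F(k-1, N-k) distribution and Welch's t with a t distribution.

```python
import math
import statistics


def levene_w(*groups):
    """Levene's W (mean-centred form) for k groups; compare with F(k-1, N-k)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean
    z = [[abs(x - statistics.fmean(g)) for x in g] for g in groups]
    z_means = [statistics.fmean(zg) for zg in z]
    z_grand = statistics.fmean([v for zg in z for v in zg])
    # One-way ANOVA on the absolute deviations
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within, (k - 1, n_total - k)


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    vx_n = statistics.variance(x) / len(x)  # squared SE of each group mean
    vy_n = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx_n + vy_n)
    df = (vx_n + vy_n) ** 2 / (vx_n ** 2 / (len(x) - 1) + vy_n ** 2 / (len(y) - 1))
    return t, df


# Hypothetical groups with very different spreads (made up for illustration)
narrow = [1, 2, 3, 4, 5]
wide = [0, 10, 20, 30, 40]
print("Levene W, df:", levene_w(narrow, wide))
print("Welch t, df:", welch_t(narrow, wide))
```

With these two groups, Welch's df drops to about 4.1, compared with 8 for the pooled-variance t-test, so the test automatically becomes more conservative when the spreads differ. When the variances are similar, Welch's df approaches the pooled value, so little is lost by always using it.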
Documentation & Reporting
• Make sure in SPSS::Edit > Options > Viewer that ‘Display
commands in the log’ is checked
• .sav Data file (for reproducibility)
• .sps Syntax file (for reproducibility)
• .spv Output file (for documentation)
• .pdf Report file
SPSS::File > Export…
check ‘Objects to Export’ > All visible
choose ‘Document Type’ > ‘Word/RTF (*.doc)’
click Browse… and choose folder and enter filename
open exported .doc file in Word and edit and annotate
save as .pdf file for submission
• The Report file must contain Syntax, tables, graphs, and other
meaningful SPSS output plus your explanations and conclusions
Homework Exercise 2024-06-14-B
Once I send you your graded homework from this week (by Saturday
evening), fix it and clean it up according to the ‘Documentation &
Reporting’ slide from this lecture and my individual comments.