Biostatistics For Public Health: Chapter 11 - Inference About A Mean

This chapter discusses inference about a mean. It covers estimating the standard error of the mean using sample standard deviation, the Student's t-distribution and how it differs from the z-distribution. It then explains how to perform one-sample t-tests to test hypotheses about a population mean, and how to construct confidence intervals for a mean. Finally, it discusses paired sample t-tests, which are used when samples are paired or dependent, such as samples taken from the same subject before and after an intervention.

Uploaded by

Javier Benavides
Copyright
© © All Rights Reserved

Biostatistics for Public Health

Chapter 11 - Inference About a Mean

Kevin Brooks MSc., PhD.


Objective

Perform and interpret one-sample, two-sample, and paired t hypothesis tests on means.
✓ Estimated Standard Error of the Mean
✓ Student’s t Distribution
✓ One-Sample t Test
✓ Confidence Interval for μ
✓ Paired Samples
✓ Conditions for Inference



Estimated Standard Error of the Mean

• We rarely know the population standard deviation σ
⇒ instead, we calculate the sample standard deviation s and use it as an estimate of σ
• We then use s to calculate the estimated standard error of the mean:
$SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
• Using s instead of σ adds a source of uncertainty
⇒ z procedures no longer apply
⇒ use t procedures instead
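The estimated standard error can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the function name `estimated_se` and the five sample values are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics


def estimated_se(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))


# Hypothetical measurements (not from the chapter), used only for illustration.
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
print(f"s  = {statistics.stdev(sample):.4f}")
print(f"SE = {estimated_se(sample):.4f}")
```

Note that `statistics.stdev` divides by n − 1, which matches the sample standard deviation s used here; `statistics.pstdev` would instead treat the data as a whole population.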

Student’s t distributions

• A family of distributions identified by “Student” (William Sealy Gosset) in 1908
• t family members are identified by their degrees of freedom, df.
• t distributions are similar to z distributions but with broader tails
• As df increases → t tails get skinnier → t becomes more like z
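The "broader tails" claim can be checked numerically. The sketch below evaluates the standard t density formula and the standard Normal density at a point far out in the tail (x = 3, an arbitrary choice) for 1, 9, and 1000 degrees of freedom; the helper names `t_pdf` and `z_pdf` are illustrative, not part of the chapter.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def z_pdf(x):
    """Density of the standard Normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Compare the height of each curve far out in the tail (x = 3).
for df in (1, 9, 1000):
    print(f"df = {df:>4}: t density at 3 = {t_pdf(3.0, df):.5f}")
print(f"z density at 3          = {z_pdf(3.0):.5f}")
```

The t densities shrink toward the z density as df grows, which is the behavior described in the last bullet.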

t probability density functions with 1, 9, and ∞ degrees of freedom.

t table (Table C)

• Use Table C to look up t values and probabilities


– Entries ⇒ t values
– Rows ⇒ df
– Columns ⇒ probabilities

Understanding Table C

Let $t_{df,p}$ ≡ a t value with df degrees of freedom and cumulative probability p. For example, $t_{9,0.90} = 1.383$.

Table C. Traditional t table

Cumulative p   0.75    0.80    0.85    0.90    0.95    0.975
Upper-tail p   0.25    0.20    0.15    0.10    0.05    0.025
------------   -----   -----   -----   -----   -----   -----
df = 9         0.703   0.883   1.100   1.383   1.833   2.262

The 10th and 90th percentiles on $t_9$.

Left tail: $\Pr(T_9 < -1.383) = 0.10$
Right tail: $\Pr(T_9 > 1.383) = 0.10$
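The table entries and tail probabilities above can be reproduced without a printed table. The following is a rough sketch, assuming that numerical integration of the t density with Simpson's rule is accurate enough for teaching purposes; `t_pdf`, `t_cdf`, and the `steps` parameter are illustrative names, and dedicated statistical software would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Cumulative probability Pr(T <= t) via composite Simpson's rule.

    Integrates the density from 0 to |t| and uses symmetry about 0.
    `steps` must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


# Row df = 9 of Table C as given in the text: cumulative p -> t value.
TABLE_C_DF9 = {0.75: 0.703, 0.80: 0.883, 0.85: 1.100,
               0.90: 1.383, 0.95: 1.833, 0.975: 2.262}

for p, t_value in TABLE_C_DF9.items():
    print(f"t = {t_value:.3f}: table p = {p}, computed p = {t_cdf(t_value, 9):.4f}")

print(f"Pr(T9 < -1.383) = {t_cdf(-1.383, 9):.3f}")
print(f"Pr(T9 >  1.383) = {1 - t_cdf(1.383, 9):.3f}")
```

Because the t distribution is symmetric about 0, the left-tail and right-tail probabilities agree, as the two percentile statements indicate.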

One-Sample t Test

A. Hypotheses. H0: µ = µ0 vs. Ha: µ ≠ µ0 (two-sided) [or Ha: µ < µ0 (left-sided) or Ha: µ > µ0 (right-sided)]
B. Test statistic.
$t_{\text{stat}} = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with df = n − 1
C. P-value. Convert tstat to a P-value [Table C or software].
Small P ⇒ strong evidence against H0
D. Significance level (optional). See Ch 9 for guidelines.
One-Sample t Test: Statement of the Problem

• Do SIDS babies have lower than average birth weights?
• We know from prior research that the mean birth weight of the non-SIDS babies in this population is 3300 grams
• We study n = 10 SIDS babies, determine their birth weights, and calculate x-bar = 2890.5 and s = 720.
• Do these data provide significant evidence that SIDS babies have different birth weights than the rest of the population?

One-Sample t Test: Example
A. H0: µ = 3300 versus Ha: µ ≠ 3300 (two-sided)
B. Test statistic
$t_{\text{stat}} = \dfrac{\bar{x} - \mu_0}{SE_{\bar{x}}} = \dfrac{2890.5 - 3300}{720 / \sqrt{10}} = -1.80$
df = n − 1 = 10 − 1 = 9
C. P = 0.1054 [next slide]
Weak evidence against H0
D. (optional) Data are not significant at α = 0.10
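The test statistic in step B can be reproduced from the summary statistics alone. A minimal sketch follows; the function name `one_sample_t` is hypothetical, and the inputs are the SIDS values quoted in the example.

```python
import math


def one_sample_t(xbar, mu0, s, n):
    """Return (t_stat, df) for a one-sample t test from summary statistics."""
    se = s / math.sqrt(n)  # estimated standard error of the mean
    return (xbar - mu0) / se, n - 1


t_stat, df = one_sample_t(xbar=2890.5, mu0=3300, s=720, n=10)
print(f"t_stat = {t_stat:.2f}, df = {df}")
```

The negative sign simply reflects that the sample mean lies below the hypothesized value of 3300 grams.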
Converting the tstat to a P-value

tstat ⇒ P-value via Table C. Wedge |tstat| between critical value landmarks on Table C. Here |tstat| = 1.80 falls between 1.383 and 1.833, so one-tailed 0.05 < P < 0.10 and two-tailed 0.10 < P < 0.20.

Table C. Traditional t table

Cumulative p   0.75    0.80    0.85    0.90    0.95    0.975
Upper-tail p   0.25    0.20    0.15    0.10    0.05    0.025
------------   -----   -----   -----   -----   -----   -----
df = 9         0.703   0.883   1.100   1.383   1.833   2.262

tstat ⇒ P-value via software. Use a software utility to determine that a t of −1.80 with 9 df has a two-tailed P-value of 0.1054.
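Both routes to the P-value can be sketched in Python. The first function mimics the table lookup by wedging |tstat| between the df = 9 landmarks; the second approximates what a software utility does, here by integrating the t density numerically with Simpson's rule. All function names are illustrative, and the integration approach is an assumption made for this sketch rather than the method any particular package uses.

```python
import math

# Row df = 9 of Table C from the text: (upper-tail p, t value), in increasing t.
TABLE_C_DF9 = [(0.25, 0.703), (0.20, 0.883), (0.15, 1.100),
               (0.10, 1.383), (0.05, 1.833), (0.025, 2.262)]


def bracket_upper_tail(t_stat, row):
    """Return (low, high) bounds on the one-tailed P-value from table landmarks."""
    t_abs = abs(t_stat)
    low, high = 0.0, 0.5  # bounds when |t| lies off either end of the row
    for p, t_crit in row:
        if t_abs >= t_crit:
            high = p  # |t| is beyond this landmark, so P is at most p
        else:
            low = p   # first landmark that |t| does not reach
            break
    return low, high


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Pr(T <= t) via composite Simpson's rule (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def two_tailed_p(t_stat, df):
    """Two-tailed P-value: twice the area beyond |t_stat|."""
    return 2 * (1 - t_cdf(abs(t_stat), df))


low, high = bracket_upper_tail(-1.80, TABLE_C_DF9)
print(f"one-tailed: {low} < P < {high}")
print(f"two-tailed: {2 * low} < P < {2 * high}")
print(f"two-tailed P by numerical integration: {two_tailed_p(-1.80, 9):.4f}")
```

The table route only brackets the P-value, while the numerical route gives a single value that should fall inside the two-tailed bracket.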

Two-tailed P-value, SIDS illustrative example

Confidence Interval for µ
$(1 - \alpha)100\%\ \text{CI for } \mu = \bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$
• Typical point “estimate ± margin of error” formula
• $t_{n-1,\,1-\alpha/2}$ is from the t table (see bottom row for conf. level)
• Similar to z procedure except uses s instead of σ
• Similar to z procedure except uses t instead of z
• Alternative formula:
$\bar{x} \pm t_{n-1,\,1-\alpha/2} \cdot SE_{\bar{x}}$ where $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
Confidence Interval: Example 1

Let us calculate a 95% confidence interval for μ for the birth weight of SIDS babies.

$\bar{x} = 2890.5$, $s = 720.0$, $n = 10$

$95\%\ \text{CI for } \mu = \bar{x} \pm t_{10-1,\,1-.05/2} \cdot \dfrac{s}{\sqrt{n}}$
$= 2890.5 \pm 2.262 \cdot \dfrac{720}{\sqrt{10}}$
$= 2890.5 \pm 515.1$
= (2375.4 to 3405.6) grams
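The interval can be recomputed from the summary statistics as a check. In this sketch the function name `t_confidence_interval` is hypothetical, and the critical value 2.262 is taken from the t table as quoted above. Because the table value is rounded to three decimals, the result may differ from the slide in the last printed digit.

```python
import math


def t_confidence_interval(xbar, s, n, t_crit):
    """Return (lower, upper, margin) for xbar ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin, margin


low, high, margin = t_confidence_interval(xbar=2890.5, s=720.0, n=10, t_crit=2.262)
print(f"margin of error = {margin:.1f}")
print(f"95% CI = ({low:.1f}, {high:.1f}) grams")
```

The interval includes 3300 grams, which is consistent with the two-sided test not reaching significance at α = 0.05.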
Confidence Interval: Example 2

Data are “% of ideal body weight” in 18 diabetics: {107, 119, 99, 114, 120, 104, 88, 114, 124, 116, 101, 121, 152, 100, 125, 114, 95, 117}. Based on these data we calculate a 95% CI for µ.

$\bar{x} = 112.778$, $s = 14.424$, $n = 18$

$SE_{\bar{x}} = \dfrac{s}{\sqrt{n}} = \dfrac{14.424}{\sqrt{18}} = 3.400$

$t_{n-1,\,1-\alpha/2} = t_{18-1,\,1-.05/2} = t_{17,\,.975} = 2.110$ (from t table)

$\bar{x} \pm (t_{n-1,\,1-\alpha/2})(SE_{\bar{x}}) = 112.778 \pm (2.110)(3.400) = 112.778 \pm 7.17 = (105.6, 120.0)$
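Starting from the raw data rather than the summary statistics, the same interval can be sketched with Python's `statistics` module. The variable names are illustrative; the critical value 2.110 is the table entry quoted in the example.

```python
import math
import statistics

# "% of ideal body weight" in 18 diabetics, as listed in the example.
pct_ideal_weight = [107, 119, 99, 114, 120, 104, 88, 114, 124, 116,
                    101, 121, 152, 100, 125, 114, 95, 117]

n = len(pct_ideal_weight)
xbar = statistics.mean(pct_ideal_weight)
s = statistics.stdev(pct_ideal_weight)  # sample standard deviation
se = s / math.sqrt(n)                   # estimated standard error
t_crit = 2.110                          # t_{17, .975} from the t table
margin = t_crit * se

print(f"n = {n}, x-bar = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"95% CI = {xbar:.3f} ± {margin:.2f} = ({xbar - margin:.1f}, {xbar + margin:.1f})")
```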


Paired Samples

• Paired samples: Each point in one sample is matched to a unique point in the other sample
• Pairs can be achieved via sequential samples within individuals (e.g., pre-test/post-test), cross-over trials, and matching procedures
• Also called “matched-pairs” and “dependent samples”

Example: Paired Samples

• A study addresses whether oat bran reduces LDL cholesterol with a cross-over design.
• Subjects “cross-over” from a cornflake diet to an oat bran diet.
– Half of the subjects start on CORNFLK, half on OATBRAN
– Two weeks on diet 1
– Measure LDL cholesterol
– Washout period
– Switch diet
– Two weeks on diet 2
– Measure LDL cholesterol
Example, Data
Subject   CORNFLK   OATBRAN
-------   -------   -------
1         4.61      3.84
2         6.42      5.57
3         5.40      5.85
4         4.54      4.80
5         3.98      3.68
6         3.82      2.96
7         5.01      4.41
8         4.34      3.72
9         3.80      3.49
10        4.56      3.84
11        5.35      5.26
12        3.89      3.73
Calculate Difference Variable “DELTA”

• Step 1 is to create difference variable “DELTA”
• Let DELTA = CORNFLK − OATBRAN
• Order of subtraction does not materially affect results (but does change sign of differences)
• Positive values represent lower LDL on oatbran
• Here are the first three observations:

ID     CORNFLK   OATBRAN   DELTA
----   -------   -------   -----
1      4.61      3.84      0.77
2      6.42      5.57      0.85
3      5.40      5.85      -0.45
↓      ↓         ↓         ↓
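Creating the difference variable for all twelve subjects can be sketched as follows, using the CORNFLK and OATBRAN values from the data table. The list names are illustrative, and rounding to two decimals is applied only to suppress floating-point noise.

```python
# LDL cholesterol for subjects 1-12, copied from the data table.
cornflk = [4.61, 6.42, 5.40, 4.54, 3.98, 3.82, 5.01, 4.34, 3.80, 4.56, 5.35, 3.89]
oatbran = [3.84, 5.57, 5.85, 4.80, 3.68, 2.96, 4.41, 3.72, 3.49, 3.84, 5.26, 3.73]

# DELTA = CORNFLK - OATBRAN; positive values mean lower LDL on oat bran.
delta = [round(c - o, 2) for c, o in zip(cornflk, oatbran)]

print("ID  CORNFLK  OATBRAN  DELTA")
for subject, (c, o, d) in enumerate(zip(cornflk, oatbran, delta), start=1):
    print(f"{subject:>2}  {c:7.2f}  {o:7.2f}  {d:+5.2f}")
```

Reversing the subtraction (OATBRAN − CORNFLK) would flip every sign but leave the magnitude of the later test statistic unchanged.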
Explore DELTA Values
Here are all twelve paired differences (DELTAs):

0.77, 0.85, −0.45, −0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16

[Plot of the DELTA values; horizontal axis from −0.5 to 1.5]

EDA shows a slight negative skew, a median of about 0.45, with results varying from about −0.45 to 0.86.
Descriptive stats for DELTA

• Data (DELTAs): 0.77, 0.85, −0.45, −0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16
• The subscript d will be used to denote statistics for the difference variable DELTA

$n = 12$
$\bar{x}_d = 0.3808$
$s_d = 0.4335$
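These descriptive statistics can be recomputed directly from the twelve differences. A short sketch using Python's `statistics` module follows; the variable names are illustrative, and the median and range are included only as a cross-check on the exploratory summary.

```python
import statistics

delta = [0.77, 0.85, -0.45, -0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16]

n = len(delta)
mean_d = statistics.mean(delta)
sd_d = statistics.stdev(delta)      # sample standard deviation of the differences
median_d = statistics.median(delta)

print(f"n = {n}")
print(f"mean of DELTA   = {mean_d:.4f}")
print(f"sd of DELTA     = {sd_d:.4f}")
print(f"median of DELTA = {median_d:.3f}")
print(f"range           = {min(delta)} to {max(delta)}")
```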
95% Confidence Interval for µd

• A t procedure directed toward the DELTA variable calculates the confidence interval for the mean difference.
$(1 - \alpha)100\%\ \text{CI for } \mu_d = \bar{x}_d \pm t_{n-1,\,1-\alpha/2} \cdot \dfrac{s_d}{\sqrt{n}}$
• “Oat bran” data:
For 95% confidence use $t_{12-1,\,1-.05/2} = t_{11,\,.975} = 2.201$ (from Table C)
$95\%\ \text{CI for } \mu_d = 0.3808 \pm 2.201 \cdot \dfrac{0.4335}{\sqrt{12}}$
$= 0.3808 \pm 0.2754$
= (0.105 to 0.656)
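The paired interval is an ordinary one-sample interval applied to the differences, which the sketch below makes explicit by working from the raw DELTA values. The function name `paired_ci` is hypothetical, and the critical value 2.201 is the Table C entry quoted above.

```python
import math
import statistics


def paired_ci(differences, t_crit):
    """CI for the mean difference: mean_d ± t_crit * s_d / sqrt(n)."""
    n = len(differences)
    mean_d = statistics.mean(differences)
    se_d = statistics.stdev(differences) / math.sqrt(n)
    margin = t_crit * se_d
    return mean_d - margin, mean_d + margin, margin


delta = [0.77, 0.85, -0.45, -0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16]
low, high, margin = paired_ci(delta, t_crit=2.201)  # t_{11, .975} from Table C

print(f"margin of error = {margin:.4f}")
print(f"95% CI for the mean difference = ({low:.3f} to {high:.3f})")
print("interval excludes 0:", not (low <= 0 <= high))
```

An interval that excludes 0 points in the same direction as a two-sided test of no mean difference at α = 0.05.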
Paired t Test

• Similar to one-sample t test
• µ0 is usually set to 0, representing “no mean difference”, i.e., H0: µd = 0
• Test statistic:
$t_{\text{stat}} = \dfrac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}}$
df = n − 1
Paired t Test: Example, “Oat bran” data

A. Hypotheses. H0: µd = 0 vs. Ha: µd ≠ 0
B. Test statistic.
$t_{\text{stat}} = \dfrac{\bar{x}_d - \mu_0}{s_d / \sqrt{n}} = \dfrac{0.38083 - 0}{0.4335 / \sqrt{12}} = 3.043$
df = n − 1 = 12 − 1 = 11
C. P-value. P = 0.011 (via computer). The evidence against H0 is statistically significant.
D. Significance level (optional). The evidence against H0 is significant at α = 0.05 but is not significant at α = 0.01
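Steps B and D can be sketched by computing the test statistic from the raw differences and comparing it with two-sided critical values for df = 11. The value 2.201 is quoted elsewhere in this chapter; 3.106 is the standard t-table entry for cumulative probability 0.995 and is not shown in these slides, so treat it as an outside assumption. The names `CRITICAL` and `t_stat` are illustrative.

```python
import math
import statistics

delta = [0.77, 0.85, -0.45, -0.26, 0.30, 0.86, 0.60, 0.62, 0.31, 0.72, 0.09, 0.16]
mu0 = 0  # null value: no mean difference

n = len(delta)
df = n - 1
se_d = statistics.stdev(delta) / math.sqrt(n)
t_stat = (statistics.mean(delta) - mu0) / se_d
print(f"t_stat = {t_stat:.3f}, df = {df}")

# Two-sided critical values for df = 11.
# 2.201 appears in the chapter; 3.106 (t_{11, .995}) is a standard table
# value that the slides do not show.
CRITICAL = {0.05: 2.201, 0.01: 3.106}
for alpha, t_crit in CRITICAL.items():
    verdict = "significant" if abs(t_stat) > t_crit else "not significant"
    print(f"alpha = {alpha}: |t| vs {t_crit} -> {verdict}")
```

A statistic that clears the 0.05 landmark but not the 0.01 landmark corresponds to a two-sided P-value between 0.01 and 0.05, which agrees with the reported P = 0.011.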
SPSS Output: Oat Bran data
● USE SAS TO PRODUCE THIS EXAMPLE LIVE

Conditions for Inference

t procedures require these conditions:

• SRS (individual observations or DELTAs)
• Valid information (no information bias)
• Normal population or large sample (central limit theorem)

The Normality Condition

• The Normality condition applies to the sampling distribution of the mean, not the population.
• Therefore, it is OK to use t procedures when:
– The population is Normal
– The population is not Normal but is symmetrical and n is at least 5 to 10
– The population is skewed and n is at least 30 to 100 (depending on the extent of the skew)

Can t procedures be used?

• If dataset is skewed and small: avoid t procedures
• If dataset has a mild skew and is moderate in size: use t procedures
• If dataset is highly skewed and is small: avoid t procedures

Thank You

For Viewing

Kevin Brooks MSc., PhD.


MPH Program,
Division of Public Health
College of Human Medicine
Michigan State University
[email protected]
