STATISTICAL FOUNDATIONS
SOST70151 – LECTURE 7
Prof. Natalie Shlomo and Dr. Kathrin Morosow
Overview
Hypothesis Testing
1. Statistical Inference: Estimation and Confidence Intervals
2. Statistical Inference: Testing for Significance
3. Decisions and errors
4. t-Test
1. Statistical Inference: Estimation
Point and Interval Estimation:
A point estimate is a single number that is the best guess
for the estimation of the parameter value (mean, median,
coefficient).
An interval estimate is an interval of numbers around the
point estimate that we believe contains the parameter
value in the population. This interval is called a confidence
interval.
According to the Central Limit Theorem, the sampling distribution
of these estimates is approximately Normal, so we can use the
Normal distribution for point and interval estimation
1. Statistical Inference: Estimation
Standard error: The standard error (SE), sometimes referred to as
the standard error of the mean (SEM), is the standard deviation of
the sampling distribution of the sample mean
As the sample size increases, the standard error gets smaller.
1. Statistical Inference: Estimation
A confidence interval for a parameter is an interval of numbers
within which the parameter in the population is believed to fall.
The probability that this method produces an interval that contains
the parameter is called the confidence level. This is a number
chosen to be close to 1, such as 0.95 or 0.99.

Form of confidence interval:

Point estimate ± Margin of error

x̄ ± z* σ/√n

where x̄ is the sample mean (point estimate),
σ is the population standard deviation,
n is the sample size, and
z* is the appropriate z-score from the standard normal
distribution for your desired confidence level.

Equivalently, the interval can be written as x̄ ± z*(se),
where se is the standard error.

The z-value multiplied by the SE is the margin of error.
1. Statistical Inference: Estimation
x̄ ± z* σ/√n
With greater confidence, the confidence interval is
wider because the z-score in the margin of error is
larger: for instance, z = 1.96 for 95% confidence and
z = 2.58 for 99% confidence
The larger the value of n, the smaller the margin of
error and the narrower the interval.
→ The width of a confidence interval ↑ as the
confidence level increases
→ The width of a confidence interval ↓ as the
sample size increases
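As an illustration of the formula above, here is a minimal Python sketch (scipy is assumed to be available; the sample mean, population SD and sample size are invented values) that computes a z-based confidence interval:

```python
# Minimal sketch of a z-based confidence interval, x̄ ± z* σ/√n.
# All numerical values below are hypothetical illustrations.
import math
from scipy.stats import norm

x_bar = 50.0   # sample mean (assumed)
sigma = 10.0   # population standard deviation (assumed known)
n = 100        # sample size (assumed)

se = sigma / math.sqrt(n)        # standard error of the mean
z_star = norm.ppf(0.975)         # z* for 95% confidence (≈ 1.96)
margin = z_star * se             # margin of error
print(f"95% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```

Using norm.ppf(0.995) instead would give the wider 99% interval (z ≈ 2.58), illustrating that the width increases with the confidence level and decreases with n.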
2. Statistical Inference: Significance Tests
Hypothesis
In statistics, a hypothesis is a statement about a parameter in the population. It takes the
form of a prediction that a parameter takes a particular numerical value or falls in a certain
range of values.
Significance Tests
A statistical significance test uses data to summarize the evidence about a hypothesis. It does
this by comparing point estimates of parameters to the values predicted by the hypothesis.
2. Statistical Inference: Significance Tests
The Five Parts of a Significance Test
The significance test method, also called a hypothesis test , or test for short. All tests have
five parts:
1. Assumptions
2. Hypotheses
3. Test statistic
4. p-value
5. Conclusion
2. Statistical Inference: Significance Tests
1. Assumptions
Each test makes certain assumptions or has certain conditions for the test to be valid. These pertain to
• Type of data: quantitative data
• Randomization: data gathering employed randomization, such as a random sample
• Population distribution: Some tests assume that the variable has a particular probability distribution, such
as the normal distribution
• Sample size: Many tests employ an approximate normal or t sampling distribution. The approximation is
adequate for any n when the population distribution is approximately normal, but it also holds for highly
nonnormal populations when the sample size is relatively large, by the Central Limit Theorem
2. Statistical Inference: Significance Tests
2. Hypotheses
Each significance test has two hypotheses about the value of a population parameter:
𝐻0 → The null hypothesis is a statement that the parameter takes a particular value. Usually the value in 𝐻0
corresponds, in a certain sense, to no effect (no change, no difference).
𝐻𝑎 → The alternative hypothesis states that the parameter falls in some alternative range of values. The values
in 𝐻𝑎 usually represent an effect of some type. 𝐻𝑎 is sometimes called the research hypothesis
(the investigator's belief).
H0: μ = μ0
Ha: μ > μ0 , positive one-sided test
Ha: μ < μ0 , negative one-sided test
Ha: μ ≠ μ0, two-sided test.
2. Statistical Inference: Significance Tests
3. Test Statistic
Compares point estimate to 𝐻0 parameter value
The parameter to which the hypotheses refer has a point estimate.
The test statistic summarizes how far that estimate falls from the parameter value in 𝐻0 .
Often this is expressed by the number of standard errors between the estimate and the 𝐻0
value.
2. Statistical Inference: Significance Tests
4a. p-Value
Weight of evidence against 𝐻0 , smaller p-value is stronger evidence
The p-value is the probability that the test statistic equals the observed value or a value even more extreme in
the direction predicted by 𝐻𝑎 . In other words, it is the area under the Normal curve from the test statistic into
the tail of the distribution. It is calculated by presuming that 𝐻0 is true.
A small p-value (such as p-value = 0.01) means that the data we observed would have been unusual if 𝐻0 were
true.
The smaller the p-value, the stronger the evidence is against 𝐻0 .
Can be any value between 0 and 1.
p-value = the probability, assuming 𝐻0 is true, of obtaining a test statistic at least as extreme as the one observed:
→ The smaller it is the more strongly it contradicts 𝐻0 .
→ By contrast, a moderate to large p-value means the data are consistent with 𝐻0 .
2. Statistical Inference: Significance Tests
4b. Critical Value
Alternatively, we can compare the test statistic with the critical value (z-score) associated with a given level of
significance
Often we use a 5% significance level for hypothesis tests so for a two-sided test, the critical value is 1.96 (or
-1.96 depending on the test statistic)
A smaller p-value is stronger evidence against 𝐻0; it means that the test statistic, in absolute value, is
greater than the critical value at the given level of significance
At a 5% significance level, we reject 𝐻0 if the p-value is less than 0.05, meaning that the test statistic is
greater than 1.96 (or less than -1.96), i.e. further into the tail of the Normal Distribution than the critical value
2. Statistical Inference: Significance Tests
5. Conclusion
Report and interpret p–value (or critical value).
Based on the p-value we can make a statement about the 𝐻0 .
Conclusion should interpret what the p-value tells us about the question motivating the test.
If the p-value is sufficiently small, for example below 1% or 5%, we reject 𝐻0 and accept 𝐻𝑎 .
We can never accept the null hypothesis – rather we ‘fail to reject’.
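The five parts can be mirrored in a short calculation. The Python sketch below (scipy assumed available; all numbers are hypothetical, not from the lecture examples) computes the test statistic, the two-sided p-value and the critical value for a z-test of H0: μ = μ0:

```python
# Sketch of a two-sided z-test: test statistic, p-value and critical value.
# Hypothetical example values throughout.
import math
from scipy.stats import norm

mu0, x_bar, sigma, n = 100.0, 102.5, 15.0, 120
alpha = 0.05

se = sigma / math.sqrt(n)
z = (x_bar - mu0) / se                 # test statistic
p_value = 2 * norm.sf(abs(z))          # two-sided p-value
z_crit = norm.ppf(1 - alpha / 2)       # critical value (≈ 1.96)

print(f"z = {z:.2f}, p-value = {p_value:.3f}, critical value = ±{z_crit:.2f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Note that the p-value rule (p ≤ α) and the critical-value rule (|z| ≥ critical value) always give the same decision for a two-sided z-test.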
2. Statistical Inference: Significance Tests
We can also carry out significance testing using confidence
intervals
Usual notation:
H0: θ = θ0
Ha: θ ≠ θ0
An α×100% significance test is given by the rule:
reject H0: θ = θ0 at the α×100% level
if the (1 − α)×100% confidence interval does not cover θ0;
otherwise: fail to reject H0: θ = θ0
2. Statistical Inference: Significance Tests
Significance testing using confidence intervals
Example: Height
Height in a population of men: X ~ N(μ, 0.072²)
Mean height in population: μ = 1.77 and σ = 0.072
Test H0: μ = 1.77
against Ha: μ ≠ 1.77
A 5% (= α) significance test is given by the rule:
reject H0: μ = 1.77 at the 5% significance level
if the 95% confidence interval for μ does not cover 1.77;
otherwise: fail to reject H0: μ = 1.77
Sample: x̄ = 1.50
2. Statistical Inference: Significance Tests
Significance testing using confidence intervals
Example: Height
Height of men: X ~ N(μ, 0.072²),
from an i.i.d. sample of n = 30 men with x̄ = 1.50.

95% CI:
x̄ ± 1.96 se(x̄) = x̄ ± 1.96 √(V(X)/n)
= 1.50 ± 1.96 × (0.072/√30) = 1.50 ± 0.03
= [1.47, 1.53]

Does this interval cover μ0 = 1.77? No, so we reject H0: μ = 1.77 at the 5% significance level.
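The same decision rule can be sketched in Python (scipy assumed available), using the figures from this example:

```python
# Sketch of the CI-based decision rule from the height example:
# reject H0 if the 95% CI does not cover mu0.
import math
from scipy.stats import norm

x_bar, sigma, n, mu0 = 1.50, 0.072, 30, 1.77
se = sigma / math.sqrt(n)
z_star = norm.ppf(0.975)
lower, upper = x_bar - z_star * se, x_bar + z_star * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")        # ≈ [1.47, 1.53]
print("Reject H0" if not (lower <= mu0 <= upper) else "Fail to reject H0")
```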
2. Statistical Inference: Significance Tests
Means or Proportions
                       Mean                               Proportion
1. Assumptions         Random sample,                     Random sample,
                       quantitative variable              categorical variable*
                       (outcome is continuous)            (outcome is dichotomous)
2. Hypotheses          H0: μ = μ0                         H0: π = π0
                       Ha: μ ≠ μ0, μ > μ0, or μ < μ0      Ha: π ≠ π0, π > π0, or π < π0
3. Test statistic      t = (ȳ − μ0)/se with               z = (π̂ − π0)/se0 with
                       se = s/√n, df = n − 1              se0 = √(π0(1 − π0)/n)
4. P-value             Two-tail probability in the sampling distribution for a two-sided test;
                       one-tail probability for a one-sided test
5. Conclusions         Reject 𝐻0 if p-value ≤ α-level such as 0.05

*Note: Here the proportion in the population is represented by 𝜋
2. Statistical Inference: Significance Tests
Significance testing for the mean:
Example: number of hours worked
Using UKHLS (n = 11000) we find that women do paid work
with 𝑌ത = 29 hours and 𝑠𝑑 = 11 hours
In the population, we know that average work hours are 36
hours and the standard deviation is 13
(note: if we don’t know the population SD we can estimate it
from the sample (𝑠𝑑 = 11), but then we need to use the t-
distribution, shown below. For large sample sizes (n > 30)
the t-distribution and the normal distribution are virtually the same)
2. Statistical Inference: Significance Tests
Significance testing for the mean:
Example: number of hours worked
Hypotheses:
H0: μ = 36
Ha: μ ≠ 36
Cut-off point (5%, two-sided test)? ±1.96

What is the z statistic?
z = (𝑌ത − μ0)/(σ/√n) = (29 − 36)/(13/√11000) = −7/0.124 ≈ −56

Conclusion:
(1) The test statistic of −56 is greater in absolute value than the critical value
of 1.96 at the 5% significance level, hence we reject the null hypothesis that the
average weekly work hours are 36
(2) The 95% confidence interval is 29 ± 1.96 × 0.124, or (28.8, 29.2), and does not
include the value of 36, hence we reject the null hypothesis
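For reference, a small Python sketch (scipy assumed available) reproduces the test statistic and confidence interval from the slide's figures:

```python
# Sketch of the hours-worked z-test (UKHLS example; the population SD of 13
# is treated as known).
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 29.0, 36.0, 13.0, 11000
se = sigma / math.sqrt(n)                       # ≈ 0.124
z = (x_bar - mu0) / se                          # ≈ -56
z_crit = norm.ppf(0.975)                        # ≈ 1.96
lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
print(f"z = {z:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
# |z| > 1.96 and 36 lies outside the CI, so H0 is rejected.
```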
2. Statistical Inference: Significance Tests
Significance testing for the mean:
Example: number of hours worked
p-value and sample size
If we had a smaller sample size, for example n = 30, the
test statistic would be much smaller in absolute value and
the p-value (the tail area beyond the test statistic) would
be larger:

z = (𝑌ത − μ0)/(σ/√n) = (29 − 36)/(13/√30) = −7/2.37 ≈ −2.95
2. Statistical Inference: Significance Tests
Significance testing for the mean:
One-sided test (5%)
H0: μ = μ0
H a : μ > μ0
For a one-sided test, all of the 5% significance
level goes into the respective tail:
the critical value is 1.645 for a 5% significance
level on a one-sided test vs 1.96 for a
two-sided test (where 2.5% of the significance
level lies in each tail)
Makes a stronger assumption about the direction of the effect
2. Statistical Inference: Significance Tests
One-sided vs two-sided test?
In most research articles, significance tests use two-sided p–values
• two-sided tests can detect an effect that falls in either direction
• one-sided tests detect either an increase or a decrease, but not both;
because a two-sided test splits α across both tails, it needs stronger
evidence (a more extreme test statistic) for a rejection of 𝐻0
Consider the context:
“Test whether the mean has changed” → two-sided alternative, to allow for
increase or decrease
“Test whether the mean has increased” → one-sided, 𝐻𝑎 : μ > μ0
For a two-sided test at the 5% significance level, the critical value is the z-score
that cuts off 2.5% of the area under the curve in each tail (z = 1.96)
For a one-sided test at the 5% significance level, the critical value is the z-score
that cuts off all of the 5% of the area under the curve in one tail (z = 1.645)
2. Statistical Inference: Significance Tests
Example: IQ
The Council claims that all elderly (65+) in Manchester are highly
intelligent. Available population-level data show that the average IQ
score for 65+ in Manchester is 100 with an SD of 5. Harry, a
gerontologist by training, wants to find out whether this is true.
Harry took a random sample of 100 people aged 65+ from
Manchester. He conducts an IQ test and finds that the average
performance of 65+ in his sample is 80 with a SD of 15. Does this
mean that the Manchester Council is not right?
2. Statistical Inference: Significance Tests
Example: IQ
1. Hypotheses
The average IQ of the 65+ population in Manchester is 100. Let's call this the null
hypothesis: H0: μ = 100 (i.e. Harry may have drawn an unusual sample by chance).
Does this mean that we should ignore Harry's survey?
▪ Is our hypothesis sensible?
To find a balance, let's state an alternative hypothesis:
HA: μ ≠ 100
2. Statistical Inference: Significance Tests
Example: IQ
2. Find the sample statistic: sample mean = 80
3. Calculate the TEST statistic:

z = (x̄ − μ0)/(σ/√n) = (80 − 100)/(5/√100) = −40

Testing at the 5% significance level, the critical value is ±1.96.
Since −40 is far below −1.96, it is very unlikely that the sample comes
from a population with mean 100 and SD 5, and we reject the null hypothesis.
2. Statistical Inference: Significance Tests
Example: IQ
Confidence Interval for the mean

x̄ ± 1.96 × σ/√n = 80 ± 1.96 × 5/√100 = 80 ± 0.98
= [79.02, 80.98]

Now what do you conclude?
The hypothesized value of 100 is not included in the
confidence interval, and we reject the null hypothesis that the sample
comes from a population with mean 100 and SD 5
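The IQ example can be sketched in Python as well (scipy assumed available), showing both the z statistic and the confidence interval:

```python
# Sketch of the IQ example: z-test and 95% CI (population SD of 5 treated as known).
import math
from scipy.stats import norm

x_bar, mu0, sigma, n = 80.0, 100.0, 5.0, 100
se = sigma / math.sqrt(n)                     # = 0.5
z = (x_bar - mu0) / se                        # = -40
z_crit = norm.ppf(0.975)                      # ≈ 1.96
lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
print(f"z = {z:.0f}, 95% CI = [{lower:.2f}, {upper:.2f}]")   # [79.02, 80.98]
# 100 is far outside the interval, so H0 is rejected.
```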
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Usual notation (the main difference from the mean test is the SE):
H0: π = π0
Ha: π ≠ π0
Or it can be one-sided:
Ha: π > π0
Ha: π < π0

Test statistic:
z = (𝜋ො − π0) / 𝜎𝜋ෝ = (𝜋ො − π0) / √(π0(1 − π0)/n)
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Example: Income
Jane, an educational expert, wants to find out whether the first-year
student performances at Manchester University are associated with
family incomes or not.
Jane takes a random sample of 50 first-year students. She finds that 15
students were from poor income households. In the UK as a whole, we
know that 23% of University entrants are coming from poor income
families. Can Jane make a conclusion that the findings from her sample
differ significantly from the rest of the UK?
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Example: Income
1. Hypotheses:
Null hypothesis: Ho: π = 0.23
(the proportion of students from poor families in Manchester is similar
to the national proportion)
Alternative hypothesis: HA: π ≠ 0.23
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Example: Income
2. Sample statistic
Proportion of Manchester students who are from poor families: p = 15/50 = 0.30
3. Test statistic (note that we use the population π0 for the SE)

z = (p − π0) / √(π0(1 − π0)/n) = (0.30 − 0.23) / √(0.23 × 0.77/50) = 0.07/0.0595 ≈ 1.18

The test statistic 1.18 is less than the critical value 1.96, therefore we fail
to reject the null hypothesis at the 5% significance level: the
proportion of students from poor families in Manchester is similar to the
national proportion
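A short Python sketch (scipy assumed available) of the proportion test above, using π0 in the standard error as in the formula:

```python
# Sketch of the one-sample proportion z-test for the income example.
import math
from scipy.stats import norm

p_hat, pi0, n = 15 / 50, 0.23, 50
se0 = math.sqrt(pi0 * (1 - pi0) / n)     # ≈ 0.060
z = (p_hat - pi0) / se0                  # ≈ 1.18
p_value = 2 * norm.sf(abs(z))            # two-sided p-value ≈ 0.24
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
# |z| < 1.96, so we fail to reject H0 at the 5% level.
```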
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Example: Voting for the Conservatives
A recent YouGuv survey found 40% would vote
conservative tomorrow (n: 2012).
Is the number significantly different from
previous elections (42.4%)?
2. Statistical Inference: Significance Tests
Significance testing for a proportion
Example: Voting for the Conservatives

Hypotheses:
H0: 𝜋 = 0.424
Ha: 𝜋 ≠ 0.424

What is the critical value for a two-sided 5% test? ±1.96

What is the test statistic?
z = (𝜋ො − 𝜋0) / √(𝜋0(1 − 𝜋0)/n) = −0.024 / √(0.424(1 − 0.424)/2012) = −2.17

Conclusion:
The test statistic −2.17 falls below the critical value of −1.96 and we reject
the null hypothesis at the 5% significance level.
In fact, the one-tail probability from the Normal table is 0.015, so the two-sided
p-value is about 0.03, which is less than the 5% significance level.
Note, however, that we would fail to reject the null hypothesis at the 1%
significance level (the critical value there is 2.576).
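The voting example can likewise be checked with a brief Python sketch (scipy assumed available):

```python
# Sketch of the voting example (YouGov poll, n = 2012, sample proportion 0.40,
# hypothesised proportion 0.424).
import math
from scipy.stats import norm

p_hat, pi0, n = 0.40, 0.424, 2012
se0 = math.sqrt(pi0 * (1 - pi0) / n)     # ≈ 0.011
z = (p_hat - pi0) / se0                  # ≈ -2.17
p_two_sided = 2 * norm.sf(abs(z))        # ≈ 0.03
print(f"z = {z:.2f}, two-sided p-value = {p_two_sided:.3f}")
# Reject H0 at the 5% level (|z| > 1.96) but not at the 1% level (|z| < 2.576).
```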
3. Decisions and Errors
Note: Why Don’t Statisticians Accept the Null
Possible decisions in a significance test with Hypothesis?
α-level=0.05 • A lack of evidence only means that you haven’t proven
that something exists
• It does not prove that something doesn’t exist. It might
Conclusions exist, but your study missed it.
P-value H0 Ha
Accepting the null hypothesis would indicate that you’ve
P ≤ 0.05 Reject Accept
proven an effect doesn’t exist. Instead, the strength of
P > 0.05 Do not reject Do not accept your evidence falls short of being able to reject the null
hypothesis. Consequently, we fail to reject it.
P ≤ 0.05 → statistically significant difference Failing to reject the null hypothesis indicates that our
P > 0.05 → no statistically significant difference sample did not provide sufficient evidence to conclude
that the effect exists. However, at the same time, that lack
of evidence doesn’t prove that the effect does not exist.
3. Decisions and Errors
Possible decisions in a significance test with α-level = 0.05

                          Decision (Action)
Condition of 𝐻0 (Truth)   Reject H0                              Do not reject H0
𝐻0 true                   Type I error                           Correct decision
𝐻0 false                  Correct decision (power of the test)   Type II error
Type I error: H0 wrongly rejected (false positive)
Type II error: H0 wrongly not rejected (false negative)
3. Decisions and Errors
Type I Error
(When H0 is true, a Type I error occurs if H0 is rejected)
The probability of a Type I error is the α-level for the test: 𝛼 = P(reject 𝐻0 | 𝐻0 true)
If the α-level is 0.05 and you run 100 statistical tests in situations where the null hypothesis is in
fact true, you expect about 5 out of the 100 (5%) to be Type I errors, i.e. rejections of a null
hypothesis that is actually true.
→ As P(Type I error) goes down, P(Type II error) goes up
With a smaller α in a significance test, we need a smaller p-value to reject H0.
So it becomes harder to reject H0, even if H0 is false.
The stronger the evidence required to reject, the more likely we are to make the wrong decision of
failing to reject a false H0 (a Type II error).
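A small simulation makes the 5-in-100 intuition concrete. The sketch below (Python, numpy and scipy assumed available; the normal population and all parameter values are invented for illustration) repeatedly tests a true H0 at α = 0.05 and counts how often it is wrongly rejected:

```python
# Simulation sketch: when H0 is true, a 5%-level two-sided z-test rejects
# in roughly 5% of repeated samples (the Type I error rate).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu0, sigma, n, alpha, n_sims = 0.0, 1.0, 50, 0.05, 10_000
z_crit = norm.ppf(1 - alpha / 2)

rejections = 0
for _ in range(n_sims):
    sample = rng.normal(mu0, sigma, n)          # data generated under H0
    z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1

print(f"Estimated Type I error rate: {rejections / n_sims:.3f}")   # close to 0.05
```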
3. Decisions and Errors
What does it mean to make α = P(reject H0 | H0 true) small?

For a random variable X ~ N(μ, σ²), the null hypothesis H0: μ = μ0 implies that
the sample mean has sampling distribution
X̄ ~ N(μ0, σ²/n) when H0: μ = μ0 is true

[Figure: Normal sampling distribution of X̄ centred at μ0. Values of x̄ close to μ0
are relatively likely if H0: μ = μ0 is true; values of x̄ far out in either tail are
unlikely if H0: μ = μ0 is true.]
3. Decisions and Errors
Set the critical value / rejection region

Under H0: μ = μ0 true, X̄ ~ N(μ0, σ²/n).
x̄ is ”far” from 𝜇0 if x̄ > Cu or x̄ < Cl.
Typically, in statistics, something is “large” if the probability of getting
something bigger is small.

α = P(reject H0 | H0 true)
  = P(X̄ < Cl | H0 true) + P(X̄ > Cu | H0 true)

[Figure: Normal curve centred at μ0 with rejection regions of area α/2 in each tail,
below Cl and above Cu.]
3. Decisions and Errors
Conclusion: setting too small an 𝛼, we risk
P(do not reject H0 | H0 false) = P(Type II error) getting too big...

Usual significance levels:
𝛼 = 0.1: we risk rejecting a true H0 in 10 out of 100 tests
𝛼 = 0.05: we risk rejecting a true H0 in 5 out of 100 tests
𝛼 = 0.01: we risk rejecting a true H0 in 1 out of 100 tests
→ The lower the 𝛼–level the stronger evidence needed to reject H0
→ 𝛼-level must be selected before analysing the data
3. Decisions and Errors
Type II Error
(When H0 is false, a type II error results from not rejecting it)
• Its probability has more than one value, because Ha contains a range of possible values
• The Type II error probability is inversely related to the Type I error probability and
decreases with sample size, but it is more complicated to calculate
• We often refer to the probability (1 − P(Type II error)) as the power of the test: the
probability of correctly rejecting the null hypothesis when there is in fact a deviation
and the alternative hypothesis is correct. The power is related to the sample size and
tells us how ‘distinguishable’ our decision to reject the null hypothesis is. For example,
at a 5% significance level, a test statistic of 2 corresponds to lower power than a test
statistic of 20.
3. Decisions and Errors
The smaller the 𝛽 (the probability of a Type II error),
the greater the power of the test
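One way to see the β/power trade-off is to compute the power of a two-sided z-test directly. The sketch below (Python, scipy assumed available; all parameter values are invented for illustration) evaluates P(reject H0 | true mean = mu_true):

```python
# Sketch: power of a two-sided z-test with known sigma,
# power = P(reject H0 | true mean = mu_true) = 1 - beta.
import math
from scipy.stats import norm

mu0, mu_true, sigma, n, alpha = 0.0, 0.5, 1.0, 50, 0.05
se = sigma / math.sqrt(n)
z_crit = norm.ppf(1 - alpha / 2)

# Rejection region for the sample mean under H0: below c_lower or above c_upper
c_lower, c_upper = mu0 - z_crit * se, mu0 + z_crit * se

# Probability of landing in the rejection region when the true mean is mu_true
power = norm.cdf(c_lower, loc=mu_true, scale=se) + norm.sf(c_upper, loc=mu_true, scale=se)
print(f"Power ≈ {power:.3f}")   # larger n or larger |mu_true - mu0| gives more power
```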
4. T-Test
T-distribution:
It is a bell-shaped distribution, similar to the normal but
with heavier tails, used for smaller sample sizes and when
the variance in the population is unknown.
• The normal distribution assumes that the
population standard deviation is known. The t-
distribution does not make this assumption.
• The t-distribution is defined by the degrees of
freedom (n-1). In this case, n=the sample size
• The t-distribution is most useful for small sample
sizes, when the population standard deviation is
not known, or both.
• As the sample size increases, the t-distribution
becomes more similar to a normal distribution
• A common rule of thumb is that for a sample size of
at least 30, one can use the normal distribution in
place of a t-distribution.
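The rule of thumb can be checked numerically: the sketch below (Python, scipy assumed available) prints the two-sided 5% critical value of the t-distribution for increasing degrees of freedom alongside the normal value of about 1.96:

```python
# Sketch: t critical values approach the normal critical value as df grows.
from scipy.stats import norm, t

for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: t critical value (two-sided 5%) = {t.ppf(0.975, df):.3f}")
print(f"Normal critical value (two-sided 5%) = {norm.ppf(0.975):.3f}")
```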
4. T-Test
Example: number of hours studied
We randomly ask 10 students how many hours they prepare for this
course: 𝑌ത = 3, 𝑠𝑑 = 0.05
We know they should spend 8 hours.
Is this significantly different?
Hypotheses:
𝐻0 : μ = 8
𝐻𝑎 : μ ≠ 8
Number of degrees of freedom?
df = 9
This is a two-sided test and we use the 5% significance level
We use the t-distribution table to determine the critical value
4. T-Test
Example: number of hours studied: T-tables
4. T-Test
Example: number of hours studied
Hypotheses:
𝐻0 : μ = 8
𝐻𝑎 : μ ≠ 8
Number of degrees of freedom?
df = 9
Cut off point (5% two-sided) → 2.262
What is the t statistic?
t = (𝑌ത − 𝜇0)/(s/√n) = (3 − 8)/(0.05/√10) = −5/0.016 ≈ −316
→ Conclusion?
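For completeness, a Python sketch (scipy assumed available) of the one-sample t-test using the slide's stated sample values (Ȳ = 3, s = 0.05, n = 10):

```python
# Sketch of the one-sample t-test for the hours-studied example.
import math
from scipy.stats import t

y_bar, s, n, mu0 = 3.0, 0.05, 10, 8.0
se = s / math.sqrt(n)
t_stat = (y_bar - mu0) / se                 # ≈ -316
t_crit = t.ppf(0.975, df=n - 1)             # ≈ 2.262 with df = 9
print(f"t = {t_stat:.1f}, critical value = ±{t_crit:.3f}")
# |t| far exceeds the critical value, so we reject H0 at the 5% level.
```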
Reading: Confidence Intervals, Testing for Significance
Decisions and errors, t-Test
Crawshaw & Chambers (2014)
Ch.: 9, 10 and 11
Agresti, A. (2018)
Ch.: 5 and 6