SB (Final-Note)
S2 = {1, 4} → x̄ = (1 + 4)/2 = 2.5; computed similarly for {1, 3}, {2, 3}, {2, 4}, {3, 4}
→ x̄(S1) = 1.5, where X̄ denotes the sample mean
µ_X̄ = (1.5 + 2 + 2.5 + 2.5 + 3 + 3.5)/6 = 2.5 = µ
Practice. P(X̄ = 1.5) = P({1, 2}) = 1/C(4, 2) = 1/6
⇒ C(4, 2) counts the ways to choose 2 numbers from the 4-number set above
Get samples from the population. Make inferences about population from sample.
*NOTE:
Uncontrollable (population parameters): µ, σ, π
Controllable (sample statistics): x̄, s, p
Sample mean: x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ
Sample SD: s = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / (n − 1) )
Sample proportion: p = x/n
Sampling error: the difference between an estimate and the corresponding population parameter
𝑠𝑎𝑚𝑝𝑙𝑖𝑛𝑔 𝑒𝑟𝑟𝑜𝑟 = 𝑥 − µ
Bias: the difference between the expected value of the estimator and the true parameter
Bias = E(X̄) − µ
● Central limit theorem: allows us to approximate the shape of the sampling distribution of 𝑋
Range of sample means
● Expected range of sample means: µ ± z·σ/√n
● We use the familiar z-values for the standard normal distribution. If we know µ and σ, the CLT allows us to predict the range of sample means for samples of size n:
90%: µ ± 1.645·σ/√n    95%: µ ± 1.960·σ/√n    99%: µ ± 2.576·σ/√n
n = 16: σ_x̄ = σ/√16 = σ/4
n = 64: σ_x̄ = σ/√64 = σ/8
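A minimal sketch of these expected ranges, assuming illustrative values µ = 100 and σ = 20 (numbers not from the notes):

```python
import math

# Assumed illustrative population parameters
mu, sigma = 100.0, 20.0

# The familiar z-values for 90%, 95%, and 99% coverage of sample means
for z, level in [(1.645, "90%"), (1.960, "95%"), (2.576, "99%")]:
    for n in (16, 64):
        se = sigma / math.sqrt(n)          # standard error sigma_xbar = sigma/sqrt(n)
        lo, hi = mu - z * se, mu + z * se  # expected range of sample means
        print(f"{level} n={n}: {lo:.2f} to {hi:.2f}")
```

Note how quadrupling n from 16 to 64 only halves the standard error, since n enters under a square root.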
Exercise. Consider a discrete uniform distribution consisting of integers {0, 1, 2, 3}. The population parameters
are µ = 1.5 and σ = 1.118
µ = (1/N) Σᵢ₌₁ᴺ xᵢ = (0 + 1 + 2 + 3)/4 = 1.5
σ = √( Σᵢ₌₁ᴺ (xᵢ − µ)² / N ) = √( ((0−1.5)² + (1−1.5)² + (2−1.5)² + (3−1.5)²) / 4 ) = 1.118
With n = 2, draw any 2 numbers (with replacement, order counted) from {0, 1, 2, 3} → 16 possible samples: (0, 0); (0, 1); (0, 2);...
Each sample mean: x̄ = (x₁ + x₂)/2, e.g. x̄ = (0 + 0)/2 = 0 for the sample (0, 0)
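The 16 equally likely samples can be enumerated to verify that the mean of the sampling distribution equals µ = 1.5 and that its standard deviation is σ/√n; a sketch using only the Python standard library:

```python
from itertools import product
from statistics import mean, pstdev

population = [0, 1, 2, 3]
mu = mean(population)       # 1.5
sigma = pstdev(population)  # ~1.118 (population SD, divide by N)

# All 16 ordered samples of size n = 2 drawn with replacement
samples = list(product(population, repeat=2))
sample_means = [(x1 + x2) / 2 for x1, x2 in samples]

print(len(samples))                    # 16 cases: (0,0), (0,1), ...
print(mean(sample_means))              # equals mu = 1.5
print(round(pstdev(sample_means), 4))  # sigma/sqrt(2), about 0.7906
```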
1 − α = 95% → α = 5% ⇒ α/2 = 0.025
z_{α/2} = qnorm(0.025) = −1.96
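qnorm is R's inverse-normal function; the Python standard library offers an equivalent:

```python
from statistics import NormalDist

# Python-stdlib equivalent of R's qnorm(): the inverse CDF of the standard normal
alpha = 0.05  # 1 - alpha = 95% confidence
z = NormalDist().inv_cdf(alpha / 2)
print(round(z, 2))  # -1.96
```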
Degrees of freedom
d.f. = n − 1 (degrees of freedom for a confidence interval for µ)
● As n increases, the range of sample proportion p = x/n narrows b/c n appears in the denominator of the
standard error:
σ_p = √( π(1 − π)/n )
→ therefore, the sampling variation can be reduced by increasing the sample size
*NOTE: the sample proportion p = x/n may be assumed normal if both 𝑛π ≥ 10 and 𝑛(1 − π) ≥ 10
Confidence interval for π: p ± z_{α/2} · √( p(1 − p)/n )
The width of the confidence interval for π depends on:
● Sample size
● Confidence level
● Sample proportion p
● If we want a narrower interval, we could either increase the sample size or reduce the confidence
level (e.g. from 95% to 90%)
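A small sketch of this interval and of how its width responds to the confidence level; the data (40 successes in 100 trials) are hypothetical:

```python
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Confidence interval for pi: p ± z_{alpha/2} * sqrt(p(1-p)/n)."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * (p * (1 - p) / n) ** 0.5
    return p - margin, p + margin

# Hypothetical data: 40 successes in 100 trials
lo, hi = proportion_ci(40, 100)
lo90, hi90 = proportion_ci(40, 100, confidence=0.90)
print(round(lo, 4), round(hi, 4))      # 95% interval
print(round(lo90, 4), round(hi90, 4))  # 90% interval is narrower
```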
● All business managers need at least a basic understanding of hypothesis testing b/c managers often
interact with specialists, read technical reports,...
Steps
★ Rejecting the null hypothesis when it is true → Type I error (false positive)
★ Failure to reject the null hypothesis when it is false → Type II error (false negative)
Decision rule
● Extreme outcomes occurring in the left tail would cause us to reject the null hypothesis in a left-tailed test; likewise, extreme outcomes in the right tail lead to rejection in a right-tailed test
● Rejection region: the area under the sampling distribution curve that defines an extreme outcome
● Test statistic: measures the difference between the sample statistic and the hypothesised parameter
Steps
5 Take action
P-value method
For a right-tailed test, the decision rule using the p-value approach is stated as:
𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 𝑃(𝑍 > 𝑧𝑐𝑎𝑙𝑐) < α, otherwise fail to reject 𝐻0
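The right-tailed decision rule can be sketched as follows; µ₀, σ, n, and x̄ are assumed numbers for illustration:

```python
from statistics import NormalDist

# Hypothetical right-tailed test: H0: mu = 50 vs H1: mu > 50
mu0, sigma, n, xbar = 50.0, 8.0, 36, 52.5  # assumed data, not from the notes

z_calc = (xbar - mu0) / (sigma / n ** 0.5)
p_value = 1 - NormalDist().cdf(z_calc)     # P(Z > z_calc)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z_calc, 3), round(p_value, 4), decision)
```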
Two-tailed test
Steps
5 Take action
P-value approach
In a two-tailed test, the decision rule using the p-value method is the same as one-tailed test:
𝑟𝑒𝑗𝑒𝑐𝑡 𝐻0 𝑖𝑓 𝑝 − 𝑣𝑎𝑙𝑢𝑒 < α, otherwise, do not reject 𝐻0
Testing proportion
Our rule is to assume normality if nπ₀ ≥ 10 and n(1 − π₀) ≥ 10
If we can assume a normal sampling distribution, then the test statistic would be the z-score. The sample
proportion is:
p = x/n = (number of successes)/(sample size)
Test statistic for a proportion
z_calc = (p − π₀)/σ_p = (p − π₀) / √( π₀(1 − π₀)/n )
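A sketch of the proportion test; the counts (64 successes in 100 trials against π₀ = 0.50) are hypothetical:

```python
from statistics import NormalDist

# Hypothetical two-tailed test: H0: pi = 0.50 vs H1: pi != 0.50
pi0, x, n = 0.50, 64, 100                     # assumed data
assert n * pi0 >= 10 and n * (1 - pi0) >= 10  # normality rule of thumb

p = x / n                                     # sample proportion
se = (pi0 * (1 - pi0) / n) ** 0.5             # uses pi0, not p, under H0
z_calc = (p - pi0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))  # two-tailed
print(round(z_calc, 2), round(p_value, 4))
```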
Two-sample tests
● Two-sample tests: compare 2 sample estimates with each other, whereas one-sample tests compare a sample estimate with a nonsample benchmark or target
E.g. manufacturer A’s sample mean was 510.5 with a SD of 147.2 in 18 tests, compare with manufacturer B’s
mean of 628.9 with a SD of 237.9 in 17 tests
Test procedure
Larger samples are always desirable because they permit us to reduce the chance of making either a Type I
error or Type II error, however, large samples take time and cost money, so we often must work with available
data.
For the common situation of testing for a zero difference (D₀ = 0) in 2 population means, the possible pairs of null and alternative hypotheses are:
Large samples
z_calc = (x̄₁ − x̄₂) / √( s₁²/n₁ + s₂²/n₂ )
● Unequal variances:
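The large-sample statistic can be checked against the manufacturer example quoted earlier (with n = 18 and 17 a t statistic would normally be preferred; z is used here purely to illustrate the formula):

```python
# Manufacturer example from the notes:
# A: xbar = 510.5, s = 147.2, n = 18;  B: xbar = 628.9, s = 237.9, n = 17
x1, s1, n1 = 510.5, 147.2, 18
x2, s2, n2 = 628.9, 237.9, 17

se = (s1**2 / n1 + s2**2 / n2) ** 0.5  # standard error of xbar1 - xbar2
z_calc = (x1 - x2) / se
print(round(z_calc, 3))                # about -1.76
```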
Paired t test
In the paired t test we define a new variable 𝑑 = 𝑋1 − 𝑋2 as the difference between 𝑋1 and 𝑋2
d̄ = (Σᵢ₌₁ⁿ dᵢ)/n   (mean of n differences)
s_d = √( Σᵢ₌₁ⁿ (dᵢ − d̄)² / (n − 1) )   (std. dev. of n differences)
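A sketch of the paired t statistic built from these two quantities; the before/after data are invented for illustration:

```python
from statistics import mean, stdev

# Hypothetical paired observations on the same 6 subjects
x1 = [210, 195, 220, 231, 199, 205]
x2 = [198, 190, 212, 225, 196, 201]

d = [a - b for a, b in zip(x1, x2)]  # new variable d = X1 - X2
dbar = mean(d)                       # mean of n differences
sd = stdev(d)                        # std. dev. of n differences (n-1 divisor)
n = len(d)
t_calc = dbar / (sd / n ** 0.5)      # test statistic with d.f. = n - 1
print(round(dbar, 3), round(t_calc, 3))
```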
Sample proportions
Pooled proportion
If H₀ is true, there is no difference between π₁ and π₂.
p_c = (x₁ + x₂)/(n₁ + n₂) = (# successes in combined samples)/(combined sample sizes)   (pooled proportion)
Test statistic
z_calc = (p₁ − p₂) / √( p_c(1 − p_c)(1/n₁ + 1/n₂) )
Confidence interval for π₁ − π₂:
(p₁ − p₂) ± z_{α/2} · √( p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂ )
The rule of thumb for assuming normality is that np ≥ 10 and n(1 − p) ≥ 10 for each sample
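A sketch of the two-proportion z test using the pooled proportion; the counts are hypothetical:

```python
from statistics import NormalDist

# Hypothetical samples: x1 successes of n1 trials vs x2 of n2
x1, n1, x2, n2 = 60, 200, 42, 200

p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: pi1 = pi2
se = (pc * (1 - pc) * (1 / n1 + 1 / n2)) ** 0.5
z_calc = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_calc)))  # two-tailed
print(round(z_calc, 3), round(p_value, 4))
```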
Comparing 2 variances
Format of hypotheses
An equivalent way to state these hypotheses is to look at the ratio of the 2 variances. A ratio near 1 would indicate equal variances.
The F test
If the null hypothesis of equal variances is true, this ratio should be near 1:
𝐹𝑐𝑎𝑙𝑐≌ 1 (𝑖𝑓 𝐻0 𝑖𝑠 𝑡𝑟𝑢𝑒)
If the test statistic F is much less than 1 or much greater than 1, we would reject the hypothesis of equal
population variances.
● The numerator s₁² has degrees of freedom df₁ = n₁ − 1, while the denominator s₂² has degrees of freedom df₂ = n₂ − 1
● F can't be negative, since s₁² and s₂² can't be negative
Two-tailed F test
The critical values for the F test are denoted F_L (left tail) and F_R (right tail)
Notice that the rejection regions are asymmetric
A right-tail critical value F_R may be found from Appendix F using df₁ and df₂ degrees of freedom:
F_R = F_{df₁, df₂}   (right-tail critical F)
F_L = 1/F_{df₂, df₁}   (left-tail critical F with reversed df₁ and df₂)
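Assuming SciPy is available, both critical values (and the reversed-df reciprocal relation for F_L) can be computed directly; df₁ = 17 and df₂ = 16 are example values:

```python
from scipy import stats

# Two-tailed F test critical values at alpha = 0.05
# (example degrees of freedom: df1 = 17, df2 = 16)
alpha, df1, df2 = 0.05, 17, 16

F_R = stats.f.ppf(1 - alpha / 2, df1, df2)      # right-tail critical value F_{df1, df2}
F_L = 1 / stats.f.ppf(1 - alpha / 2, df2, df1)  # reciprocal with reversed df
print(round(F_L, 3), round(F_R, 3))
```

The reciprocal trick exists because printed F tables only list right-tail values; F_L equals the lower α/2 quantile of F with df₁, df₂.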
Steps
Folded F test
The test statistic for the folded F test is:
F_calc = s²_max / s²_min,   reject H₀ if F_calc > F_{α/2}
*NOTE: the larger variance goes in the numerator and the smaller variance in the denominator
➢ 'Larger' refers to the variance
➢ But the hypotheses are the same as in a two-tailed test:
H₀: σ₁²/σ₂² = 1
H₁: σ₁²/σ₂² ≠ 1
One-tailed F test
Suppose that the firm was interested in knowing whether the new bumper had reduced the variance in collision
damage cost. We would then perform a left-tailed test.
Steps
In this chapter, you will learn how to compare more than 2 means simultaneously and how to trace sources of variation to potential explanatory factors by using analysis of variance (ANOVA)
Variation in the response variable about its mean either is explained by one or more categorical independent
variables (the factors) or is unexplained (random error)
𝑣𝑎𝑟𝑖𝑎𝑡𝑖𝑜𝑛 𝑖𝑛 𝑌 = 𝑒𝑥𝑝𝑙𝑎𝑖𝑛𝑒𝑑 𝑣𝑎𝑟𝑖𝑎𝑡𝑖𝑜𝑛 + 𝑢𝑛𝑒𝑥𝑝𝑙𝑎𝑖𝑛𝑒𝑑 𝑣𝑎𝑟𝑖𝑎𝑡𝑖𝑜𝑛
● Each possible value of a factor or combination of factors is a treatment
If we are only interested in comparing the means of C groups, we have a one-factor ANOVA
If subjects (individuals) are assigned randomly to treatments, we call this a completely randomised model (the most common ANOVA model)
The total number of observations:
𝑛 = 𝑛1 + 𝑛2 + 𝑛3 +... + 𝑛𝑐
If we are interested only in what happens to the response for the particular levels of the factor that were selected (fixed-effects model), the hypotheses to be tested are:
H₀: T₁ = T₂ = ... = T_C = 0
H₁: not all T_j are zero
If H₀ is true:
➢ Knowing that observation x came from treatment j does not help explain the variation in Y
➢ The ANOVA model collapses to:
y_ij = µ + ε_ij
Group means
ȳ_j = (1/n_j) Σᵢ₌₁^{n_j} y_ij   (mean of each group)
ȳ = (1/n) Σⱼ₌₁ᶜ Σᵢ₌₁^{n_j} y_ij = (1/n) Σⱼ₌₁ᶜ n_j ȳ_j   (overall sample mean)
⚠️ the sums SSB and SSE may be used to test the hypothesis that the treatment means differ from the grand
mean.
● The F test statistic is the ratio of the resulting mean squares.
Test statistic
Decision rule
Steps
5 Take action
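Assuming SciPy is available, the whole one-factor ANOVA can be run in one call; the data for c = 3 treatment groups are hypothetical:

```python
from scipy import stats

# Hypothetical observations for c = 3 treatment groups
g1 = [23, 25, 28, 24, 26]
g2 = [30, 32, 29, 35, 31]
g3 = [22, 20, 24, 23, 21]

# F = MS_between / MS_within; p-value from F with (c-1, n-c) d.f.
F_calc, p_value = stats.f_oneway(g1, g2, g3)

alpha = 0.05
print(round(F_calc, 2), round(p_value, 4))
if p_value < alpha:
    print("reject H0: not all treatment means are equal")
```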
Multiple comparisons
To maintain the desired overall probability of Type I error, we need to create a simultaneous confidence interval for the difference of means based on the pooled variances for all c groups
For all c groups, there are c(c − 1)/2 distinct pairs of means to be compared
For each pair of means µ_j and µ_k, we would reject H₀ if T_calc > T_{c, n−c}, where T_{c, n−c} is a critical value for the desired level of significance
Hartley's test statistic (for checking whether the group variances are equal) is:
H_calc = s²_max / s²_min
In this two-factor ANOVA without replication (or nonrepeated measures design), each factor combination is observed exactly once
👏 The random error is assumed to be normally distributed with zero mean and the same variance for all
treatments
ANOVA table
SS_total = SS_between + SS_within
df_between = SS_between / MS_between = 12471.6 / 2078.6 = 6