Chapter 12: Analysis of Variance (ANOVA)-one way
ANOVA
1. ANOVA is used to test the equality of three or more population
means simultaneously.
2. The test statistic follows the F-distribution.
3. Note: The F-distribution is continuous, cannot be negative, and is
positively skewed. Further, like the normal distribution, there is a
family of F-distributions, one for each pair of degrees of freedom.
Assumptions:
1. The populations follow the normal distribution.
2. The populations have equal standard deviations (SD).
3. The populations are independent.
Why ANOVA? Why not use t-tests to compare more than 2 population
means?
1. If we use the t-distribution to compare more than 2 population
means pair by pair, the overall probability of a Type I error
becomes larger than the significance level chosen for each test.
2. Let us consider 4 populations A, B, C & D with
corresponding means µ1, µ2, µ3 and µ4 respectively.
3. Using the t-distribution to compare the four population
means, we would have to conduct six different t-tests. That
is
(i) µ1 vs µ2 (ii) µ1 vs µ3 (iii) µ1 vs µ4 (iv) µ2 vs µ3
(v) µ2 vs µ4 and (vi) µ3 vs µ4
4. For each t-test, suppose we choose α = 0.05,
i.e. Type I error: P(reject H0 | H0 is true) = 0.05.
By the complement rule of probability,
P(do not reject H0 | H0 is true) = 1 − 0.05 = 0.95
5. If we treat the six separate tests as independent, the
probability that all six tests result in correct decisions is:
P(All correct) = (0.95)(0.95)(0.95)(0.95)(0.95)(0.95) = 0.735
6. Thus P(at least one incorrect decision due to sampling) = 1 −
0.735 = 0.265
7. To summarize, if we conduct six independent tests using the
t distribution, the likelihood of rejecting a true null
hypothesis because of sampling error is an unsatisfactory
0.265.
8. The ANOVA technique allows us to compare population
means simultaneously at a selected significance level. It
avoids the buildup of Type I error associated with testing
many hypotheses.
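The arithmetic in points 3 to 7 can be checked with a short Python sketch. It assumes, as the notes do, that the six tests are independent; the variable names are illustrative only.

```python
from math import comb

k = 4            # number of populations (A, B, C, D)
alpha = 0.05     # significance level of each individual t-test

n_tests = comb(k, 2)                     # number of pairwise comparisons
p_all_correct = (1 - alpha) ** n_tests   # assumes the tests are independent
p_at_least_one_error = 1 - p_all_correct

print(n_tests)                           # 6
print(round(p_all_correct, 3))           # 0.735
print(round(p_at_least_one_error, 3))    # 0.265
```

Changing k shows how quickly the overall Type I error grows as more populations are compared pair by pair.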
ANOVA Steps
1. Formulation of H0 & H1:
H0: All population means are equal,
i.e. µ1 = µ2 = µ3 = … = µk (k is the number of treatments)
H1: Not all population means are equal (at least two of them
differ)
2. Select the level of significance, e.g. α = 0.05, 0.01 or 0.10.
3. Test statistic: $F = \frac{MST}{MSE}$, which follows the F-distribution
with (k-1) df in the numerator and (n-k) df in the denominator.
The tabulated (critical) value is F(α; k-1, n-k).
4. Decision rule: if F(cal) > F(tab), H0 is rejected; otherwise H0 is
not rejected.
5. ANOVA table (we need to develop an ANOVA table to find the
F statistic)
ANOVA Table
Source of variation   df    SS (sum of squares)   MS (mean sum of squares)   F (cal)
Treatment (factor)    k-1   SST                   MST = SST/(k-1)            F = MST/MSE
Error                 n-k   SSE                   MSE = SSE/(n-k)
Total                 n-1   TSS                   TMS = TSS/(n-1)
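As a rough illustration of how the table entries depend on one another, the sketch below fills in df, MST, MSE and F from given sums of squares. The function name `anova_table` and the input numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def anova_table(sst, sse, k, n):
    """Fill in the one-way ANOVA table from SST, SSE, k treatments, n observations."""
    df_treat = k - 1
    df_error = n - k
    mst = sst / df_treat          # mean square due to treatment
    mse = sse / df_error          # mean square due to error
    return {
        "df": (df_treat, df_error, n - 1),
        "SS": (sst, sse, sst + sse),   # TSS = SST + SSE
        "MST": mst,
        "MSE": mse,
        "F": mst / mse,
    }

# Hypothetical inputs, not taken from any data set in these notes.
table = anova_table(sst=20.0, sse=30.0, k=3, n=13)
print(table["df"], table["MST"], table["MSE"], round(table["F"], 3))
```

The resulting F value is then compared with the tabulated value F(α; k-1, n-k).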
Mini Case (Source-Text):
Recently airlines cut services, such as meals and snacks during flights, and started charging for checked
luggage. A group of four carriers hired Brunner Marketing Research Inc. to survey passengers regarding
their level of satisfaction with a recent flight. The survey included questions on ticketing, boarding, in-flight
service, baggage handling, pilot communication, and so forth. Twenty-five questions offered a range of
possible answers: excellent, good, fair, or poor. A response of excellent was given a score of 4, good a 3,
fair a 2, and poor a 1. These responses were then totaled, so the total score was an indication of the
satisfaction with the flight. The greater the score, the higher the level of satisfaction with the service. The
highest possible score was 100. Brunner randomly selected and surveyed passengers from the four
airlines. Following is the sample information. Is there a difference in the mean satisfaction level
among the four airlines? Use the .01 significance level. Use the software output to answer the question.
One-way ANOVA: Northern, WTA, Pocono, Branson
Source   DF   SS       MS      F      P
Factor    3   890.7    296.9   8.99   0.001
Error    18   594.4    33.0
Total    21   1485.1
Solution (using statistical software output):
We will use the six-step hypothesis-testing procedure.
Step 1:
H0: μN = μW =μP = μB
H1: The mean scores are not all equal/ At least two mean scores are
not equal.
Step 2: Select the level of significance. Given, α= 0.01.
Step 3: Determine the test statistic, F.
$F = \frac{MST}{MSE}$ with (k-1), (n-k) df
The F-statistic can be read from the software output given above: F = 8.99
with (3, 18) df.
The critical value (CV) for the F-statistic is found from the F-distribution
table. At α = 0.01 with (3, 18) df, the CV for F is 5.09.
Note: We will discuss in the next class how to find the CV from the table.
Decision:
F(cal) = 8.99, F(tab) = 5.09. As F(cal) > F(tab), H0 is rejected.
Alternative decision rule (p-value):
As p = 0.001 < α = 0.01, the null hypothesis is rejected.
Interpretation: As the null hypothesis is rejected, we conclude that the mean
scores for the four airlines are not all equal. That means there is a
difference in at least one pair of mean scores. However, at this point we do
not know which pair or how many pairs differ.
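Both decision rules can be reproduced from the figures in the software output. The following Python sketch is only a check of that arithmetic; the critical value 5.09 is the table value quoted in the notes, not something the code computes, and the variable names are illustrative.

```python
# Values read from the software output.
ss_factor, df_factor = 890.7, 3
ss_error, df_error = 594.4, 18
p_value = 0.001

alpha = 0.01
f_critical = 5.09   # F-table value for alpha = 0.01 and (3, 18) df, as quoted in the notes

mst = ss_factor / df_factor
mse = ss_error / df_error
f_cal = mst / mse

reject_by_f = f_cal > f_critical   # critical-value rule
reject_by_p = p_value < alpha      # p-value rule

print(round(mst, 1), round(mse, 1), round(f_cal, 2))   # 296.9 33.0 8.99
print(reject_by_f, reject_by_p)                        # True True
```

The two rules always lead to the same decision when they use the same α.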
Developing the ANOVA table by hand (manually)
To complete the ANOVA table, we need to calculate the following
elements:
TSS (Total sum of squares) = The sum of squared differences between each
observation and the overall mean:
$TSS = \sum (x - \bar{x}_G)^2$
Here, $x$ is an individual observation and $\bar{x}_G$ is the grand (overall) mean, where
$\bar{x}_G = \frac{\sum x}{n}$
SST (Sum of squares due to treatment) = The sum of the squared
differences between each treatment mean and the grand mean, weighted by
the number of observations in the treatment:
$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_G)^2$
Here, $n_i$ is the number of observations in treatment $i$, $\bar{x}_i$ is the mean of
treatment $i$ and $\bar{x}_G$ is the grand mean.
SSE (Error sum of squares / random) = The sum of the squared differences
between each observation and its treatment mean:
$SSE = \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \dots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2$
Here, $x_{1j}$, $x_{2j}$ and $x_{3j}$ denote the individual observations of treatment 1,
treatment 2 and treatment 3 respectively, and $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$ denote the means of
treatment 1, treatment 2 and treatment 3 respectively.
Note: TSS = SST + SSE
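The three definitions translate directly into code. The sketch below is a minimal illustration on hypothetical toy data (three treatments with means 2, 3 and 7, grand mean 4); the function name `sums_of_squares` is an assumption of this example, not part of any library.

```python
def sums_of_squares(groups):
    """Return (TSS, SST, SSE) for a list of treatment samples."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)

    # TSS: every observation against the grand mean.
    tss = sum((x - grand_mean) ** 2 for x in all_obs)

    sst = 0.0
    sse = 0.0
    for g in groups:
        mean_i = sum(g) / len(g)
        sst += len(g) * (mean_i - grand_mean) ** 2   # treatment mean vs grand mean
        sse += sum((x - mean_i) ** 2 for x in g)     # observation vs its own treatment mean
    return tss, sst, sse

# Hypothetical toy data, used only to illustrate the formulas.
toy = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
tss, sst, sse = sums_of_squares(toy)
print(tss, sst, sse)   # 48.0 42.0 6.0
```

The printed values satisfy TSS = SST + SSE (48 = 42 + 6), which is a handy check on any hand calculation.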
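Calculations:
$\bar{x}_G = \frac{\sum x}{n} = \frac{94 + 90 + 85 + 80 + \dots + 65}{22} = \frac{1664}{22} = 75.64$
$\bar{x}_1 = \frac{\sum x_{1j}}{n_1} = \frac{94 + 90 + 85 + 80}{4} = \frac{349}{4} = 87.25$
$\bar{x}_2 = \frac{\sum x_{2j}}{n_2} = \frac{75 + 68 + 77 + 83 + 88}{5} = \frac{391}{5} = 78.2$
$\bar{x}_3 = \frac{\sum x_{3j}}{n_3} = \frac{70 + 73 + 76 + 78 + 80 + 68 + 65}{7} = \frac{510}{7} = 72.86$
$\bar{x}_4 = \frac{\sum x_{4j}}{n_4} = \frac{68 + 70 + 72 + 65 + 74 + 65}{6} = \frac{414}{6} = 69$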
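The grand mean and the four treatment means can be verified with a few lines of Python. This is only a check of the arithmetic above; the observations are the ones listed in the calculations, keyed by treatment number, and the variable names are illustrative.

```python
from statistics import fmean

# Observations as listed in the calculations, one list per treatment.
treatments = {
    1: [94, 90, 85, 80],
    2: [75, 68, 77, 83, 88],
    3: [70, 73, 76, 78, 80, 68, 65],
    4: [68, 70, 72, 65, 74, 65],
}

all_obs = [x for obs in treatments.values() for x in obs]
grand_mean = fmean(all_obs)
means = {i: fmean(obs) for i, obs in treatments.items()}

print(len(all_obs), sum(all_obs), round(grand_mean, 2))   # 22 1664 75.64
for i, m in means.items():
    print(i, len(treatments[i]), round(m, 2))
```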
TSS = $\sum (x - \bar{x}_G)^2$
= (94-75.64)^2 + (90-75.64)^2 + (85-75.64)^2 + (80-75.64)^2
+ (75-75.64)^2 + (68-75.64)^2 + (77-75.64)^2 + (83-75.64)^2 + (88-75.64)^2
+ (70-75.64)^2 + (73-75.64)^2 + (76-75.64)^2 + (78-75.64)^2 + (80-75.64)^2 + (68-75.64)^2 + (65-75.64)^2
+ (68-75.64)^2 + (70-75.64)^2 + (72-75.64)^2 + (65-75.64)^2 + (74-75.64)^2 + (65-75.64)^2
= 1485.09
The Error SS is
$SSE = \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \dots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2$
= (94-87.25)^2 + (90-87.25)^2 + (85-87.25)^2 + (80-87.25)^2
+ (75-78.2)^2 + (68-78.2)^2 + (77-78.2)^2 + (83-78.2)^2 + (88-78.2)^2
+ (70-72.86)^2 + (73-72.86)^2 + (76-72.86)^2 + (78-72.86)^2 + (80-72.86)^2 + (68-72.86)^2 + (65-72.86)^2
+ (68-69)^2 + (70-69)^2 + (72-69)^2 + (65-69)^2 + (74-69)^2 + (65-69)^2
= 594.41
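The Treatment SS is
$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_G)^2$
= 4(87.25-75.64)^2 + 5(78.2-75.64)^2 + 7(72.86-75.64)^2 + 6(69-75.64)^2
= 539.1684 + 32.768 + 54.0988 + 264.5376
= 890.57
This figure uses means rounded to two decimals. With unrounded means,
SST = 890.68, which agrees with the software output (890.7).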
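Because the hand calculation rounds every mean to two decimals, it is worth re-running the sums in full precision. The sketch below does that for the observations listed in the calculations and also recomputes SST with rounded means, to show how much the rounding shifts the result. Variable names are illustrative.

```python
groups = [
    [94, 90, 85, 80],
    [75, 68, 77, 83, 88],
    [70, 73, 76, 78, 80, 68, 65],
    [68, 70, 72, 65, 74, 65],
]
n = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / n
means = [sum(g) / len(g) for g in groups]

# Full-precision sums of squares.
tss = sum((x - grand) ** 2 for g in groups for x in g)
sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
sst = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))

# Same SST formula, but with means rounded to two decimals as in the hand calculation.
sst_rounded = sum(len(g) * (round(m, 2) - round(grand, 2)) ** 2
                  for g, m in zip(groups, means))

print(round(tss, 2), round(sse, 2), round(sst, 2))   # 1485.09 594.41 890.68
print(round(sst_rounded, 2))                         # 890.57
```

SST is the most sensitive of the three to rounding, because each rounding error in a treatment mean is squared and then multiplied by the treatment size.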
Source of variation   df                SS (sum of squares)   MS (mean sum of squares)                 F (cal)
Treatment (factor)    k-1 = 4-1 = 3     SST = 890.68          MST = SST/(k-1) = 890.68/3 = 296.89      F = MST/MSE = 296.89/33.02 = 8.99
Error                 n-k = 22-4 = 18   SSE = 594.41          MSE = SSE/(n-k) = 594.41/18 = 33.02
Total                 n-1 = 22-1 = 21   TSS = 1485.09
Decision: F(cal) = 8.99, F(0.01; 3, 18) = 5.09. Since F(cal) > F(tab), H0 is rejected.
Note: TSS = Total Sum of Squares, SSE = Error Sum of Squares, SST = Treatment Sum of
Squares, MST = Mean Sum of Squares due to Treatment, MSE = Mean Sum of Squares due
to Error
Special Note: Since H0 is rejected, not all population means are equal. At this stage
we can use the t-distribution to identify which pairs of means are not equal. The
following formula is used to do so.