Chapter No. 03 Experiments With A Single Factor - The Analysis of Variance (Presentation)
Fall-21
$y_{ij} - \bar{y}_{..} = (y_{ij} - \bar{y}_{i.}) + (\bar{y}_{i.} - \bar{y}_{..})$
In words, the total sum of squares (SST) equals the error sum of squares (SSE) plus
the sum of squares among groups (SSTreatments).
The identity above expresses how between-treatment and within-treatment variation
add to the total sum of squares.
SST
The total corrected sum of squares measures the total variability in the data.
SSTreatments
It is the sum of squares of the differences between the treatment averages and the
grand average, and it measures the differences between the treatment means.
SSE
It is the sum of squares of the differences of the observations within each treatment
from the treatment average. These within-treatment differences can be due only to
random error, so MSE is a pooled estimate of the common variance within each of the
a treatments.
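Written out, the three quantities (using the standard definitions for a balanced single-factor design with a treatments and n replicates) are
$SS_T = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2, \qquad SS_{Treatments} = n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS_E = \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$
with $MS_E = SS_E/[a(n-1)]$.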
If the null hypothesis is true, the between treatment variation (numerator) will not
considerably exceed the residual or error variation (denominator) and the F statistic will
be small. If the null hypothesis is false, then the F statistic will be large.
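Concretely, the test statistic is the ratio of the two mean squares,
$F_0 = \dfrac{MS_{Treatments}}{MS_E}$
which is compared against the upper tail of the F distribution with $a-1$ and $a(n-1)$ degrees of freedom.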
The Analysis of Variance
Reject $H_0$ if $F_0 > F_{\alpha,\,a-1,\,a(n-1)}$
The Analysis of Variance is Summarized in a Table
$a = 5$, $n = 5$, $a(n-1) = 20$
$F_{\alpha,\,a-1,\,a(n-1)} = F_{0.05,\,4,\,20} = 2.87$
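The tabulated critical value can be reproduced in software. A minimal sketch using SciPy (assuming SciPy is available; the numbers are the ones above):

from scipy import stats

a, n = 5, 5                     # treatments and replicates per treatment
alpha = 0.05
df_num = a - 1                  # treatment (numerator) degrees of freedom
df_den = a * (n - 1)            # error (denominator) degrees of freedom

# Upper-tail critical value F_{alpha, a-1, a(n-1)}
f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
print(round(f_crit, 2))         # 2.87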
❑If the model assumptions are reasonable, then the residuals
• Should appear to be normally distributed
• Should be close to statistically independent
• Should have a constant variance; the residual variance should not differ across
different treatment groups
❑Residual Plots (a QQ-plot sketch in code follows this list):
• Normal QQ Plot: Residuals should be scattered about a straight line.
• Run Chart: There should be no systematic pattern.
• Plot of Residuals Versus Fitted Values $\bar{y}_{i.}$: There should be no systematic
pattern.
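A minimal sketch of a normal probability (QQ) plot of residuals with SciPy and matplotlib; the residual values here are made up for illustration:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical residuals from a fitted one-way model
residuals = np.array([-1.5, 0.2, 1.3, -0.8, 0.5, 2.0, -1.0, 0.1, -0.4, -0.4])

# Points lying roughly on the reference line support the normality assumption
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()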
❑Remedial Measures:
• If the residuals are not close to normal, a data transformation might help (a
Box-Cox sketch follows this list).
• A pattern on the run chart may indicate that the measurements are not
independent. Run order may be an important factor; if so, it should be included in the
experimental design.
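As one example of a data transformation, the Box-Cox family can be fit directly from the responses. This is only a sketch with made-up positive data, not part of the original example:

import numpy as np
from scipy import stats

# Hypothetical positive responses (Box-Cox requires y > 0)
y = np.array([7.0, 7.9, 15.0, 11.2, 9.3, 21.7, 17.1, 12.4, 18.6, 25.3])

# Estimate the transformation parameter lambda and transform the data
y_transformed, lam = stats.boxcox(y)
print(f"estimated lambda: {lam:.2f}")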
The model errors are $\varepsilon_{ij} = y_{ij} - \mu_i$. For the completely randomized design, the residuals are
$e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{i.}$, for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n$.
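A minimal sketch of computing these residuals in Python; the data array is hypothetical, with one row per treatment and one column per replicate:

import numpy as np

# Hypothetical data: a = 3 treatments (rows), n = 4 replicates each (columns)
y = np.array([
    [ 9.0, 12.0, 10.0,  8.0],
    [14.0, 15.0, 13.0, 16.0],
    [11.0, 10.0, 12.0,  9.0],
])

treatment_means = y.mean(axis=1, keepdims=True)   # ybar_i.
residuals = y - treatment_means                   # e_ij = y_ij - ybar_i.
print(residuals)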
Systematic patterns on the run charts are not evident. Hence, the
independence assumption is not violated.
Plot of Residuals Versus Fitted Values
If the model is correct and the assumptions are satisfied, the residuals
should be structureless; in particular, they should be unrelated to any
other variable including the predicted response.
A defect that occasionally shows up on this plot is nonconstant
variance. Sometimes the variance of the observations increases as the
magnitude of the observation increases. This would be the case if the
error or background noise in the experiment was a constant percentage of
the size of the observation. (This commonly happens with many
measuring instruments—error is a percentage of the scale reading.)
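A sketch of the residuals-versus-fitted-values plot, again with hypothetical data (for the completely randomized design the fitted values are simply the treatment means):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: a = 3 treatments (rows), n = 4 replicates each (columns)
y = np.array([
    [ 9.0, 12.0, 10.0,  8.0],
    [14.0, 15.0, 13.0, 16.0],
    [11.0, 10.0, 12.0,  9.0],
])
treatment_means = y.mean(axis=1, keepdims=True)
fitted = np.broadcast_to(treatment_means, y.shape)   # ybar_i. for every observation
residuals = y - treatment_means                      # e_ij = y_ij - ybar_i.

# A funnel shape (spread growing with the fitted value) signals nonconstant variance
plt.scatter(fitted.ravel(), residuals.ravel())
plt.axhline(0.0, linestyle="--")
plt.xlabel("Fitted value (treatment mean)")
plt.ylabel("Residual")
plt.show()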
ANOVA table for Tensile Strength Example
Optional
An Example
$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$
$\sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n}\left[(\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.})\right]^2$
$= n\sum_{i=1}^{a}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n}(y_{ij} - \bar{y}_{i.})^2$
$SS_T = SS_{Treatments} + SS_E$
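The identity can be checked numerically. A small sketch with hypothetical balanced data:

import numpy as np

# Hypothetical balanced data: a = 3 treatments (rows), n = 4 replicates (columns)
y = np.array([
    [ 9.0, 12.0, 10.0,  8.0],
    [14.0, 15.0, 13.0, 16.0],
    [11.0, 10.0, 12.0,  9.0],
])
a, n = y.shape
grand_mean = y.mean()
trt_means = y.mean(axis=1)

ss_total = ((y - grand_mean) ** 2).sum()
ss_treat = n * ((trt_means - grand_mean) ** 2).sum()
ss_error = ((y - trt_means[:, None]) ** 2).sum()

print(np.isclose(ss_total, ss_treat + ss_error))  # True: SS_T = SS_Treatments + SS_E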
The Analysis of Variance
$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$
$H_1:$ At least one mean is different
The Analysis of Variance
• While sums of squares cannot be directly compared to test
the hypothesis of equal means, mean squares can be
compared.
• A mean square is a sum of squares divided by its degrees
of freedom:
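For the single-factor design with a treatments and n replicates per treatment, the two mean squares and the resulting test statistic are
$MS_{Treatments} = \dfrac{SS_{Treatments}}{a-1}, \qquad MS_E = \dfrac{SS_E}{a(n-1)}, \qquad F_0 = \dfrac{MS_{Treatments}}{MS_E}$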
• Design-Expert generates
the residuals
• Residual plots are very
useful
• Normal probability plot
of residuals
Other Important Residual Plots
Post-ANOVA Comparison of Means
If you get a large F you want to reject the null hypothesis; if you get a small F you do
not. Large values of F are those in the upper tail of the F distribution, and small values
are those in the lower tail, so rejection happens only when the F ratio falls in the upper
tail, never in the lower tail. The level of significance is the probability of wrongly
rejecting a true null hypothesis (the Type I error risk). Because rejection can occur
only on the right, you put all of your significance on the upper tail.
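A minimal sketch of this one-sided decision in code, assuming SciPy; the observed F ratio and degrees of freedom here are hypothetical:

from scipy import stats

f0 = 4.10                    # hypothetical observed F ratio
df_num, df_den = 4, 20       # a - 1 and a(n - 1) degrees of freedom
alpha = 0.05

# Rejection uses only the upper tail, so the p-value is the upper-tail area beyond f0
p_value = stats.f.sf(f0, df_num, df_den)
print(f"p-value = {p_value:.4f}, reject H0: {p_value < alpha}")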
How the F-Distribution Tells About Population Means