STT843 HW2 Solution YiChen

This document provides the solutions to several problems regarding multivariate analysis homework on grizzly bear data. It includes: 1) Calculating the T-squared statistic and determining its distribution to test a hypothesis about population means. 2) Verifying that replacing observations with a scaled version does not change the T-squared value. 3) Obtaining simultaneous and Bonferroni confidence intervals for multiple population means and a confidence ellipse/rectangle for two means using the given sample data, means, and covariance matrix of six measurements from grizzly bears.

Uploaded by

Sudi Imesha

Multivariate Analysis Homework 2

A49109720 Yi-Chen Zhang


March 25, 2018

5.1. (a) Evaluate $T^2$ for testing $H_0: \mu^T = (7, 11)$, using the data
\[
X = \begin{pmatrix} 2 & 12 \\ 8 & 9 \\ 6 & 9 \\ 8 & 10 \end{pmatrix}
\]

(b) Specify the distribution of $T^2$ for the situation in (a).

(c) Using (a) and (b), test $H_0$ at the $\alpha = 0.05$ level. What conclusion do you reach?
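Sol. (a) We have $n = 4$, $p = 2$, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \begin{pmatrix} 6 \\ 10 \end{pmatrix}$, and
\[
S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T
  = \begin{pmatrix} 8 & -\frac{10}{3} \\ -\frac{10}{3} & 2 \end{pmatrix}
\;\Rightarrow\;
S^{-1} = \begin{pmatrix} \frac{9}{22} & \frac{15}{22} \\ \frac{15}{22} & \frac{18}{11} \end{pmatrix},
\]
\[
T^2 = n(\bar{x} - \mu)^T S^{-1} (\bar{x} - \mu) = 13.6364.
\]

(b) Under $H_0$, $T^2$ is distributed as $\frac{(n-1)p}{n-p} F_{p,n-p}$. That is,
\[
T^2 \sim 3F_{2,2}.
\]

(c) Using R we calculate $F_{2,2}(0.05)$ = qf(1-0.05,df1=2,df2=2) = 19.

Since $T^2 = 13.6364 < 3F_{2,2}(0.05) = 57$, we do not reject $H_0$ at significance level $\alpha = 0.05$.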
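As a cross-check that is not part of the original solution, the same decision can be read off a p-value. The sketch below assumes only the values already obtained ($n = 4$, $p = 2$, $T^2 = 150/11 \approx 13.6364$); the variable names are illustrative.

```r
n <- 4; p <- 2
T2 <- 150 / 11                       # 13.6364, the value obtained in part (a)

# Rescale T^2 to an F_{p, n-p} variate: F = T^2 * (n - p) / ((n - 1) * p)
F_obs <- T2 * (n - p) / ((n - 1) * p)

# Upper-tail probability under H0
p_value <- pf(F_obs, df1 = p, df2 = n - p, lower.tail = FALSE)
print(c(F_obs = F_obs, p_value = p_value))
```

A p-value above 0.05 agrees with the conclusion that $H_0$ is not rejected.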

5.2. Using the data in Example 5.1, verify that $T^2$ remains unchanged if each observation $x_j$, $j = 1, 2, 3$, is replaced by $Cx_j$, where
\[
C = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}.
\]
Note that the observations
\[
Cx_j = \begin{pmatrix} x_{j1} - x_{j2} \\ x_{j1} + x_{j2} \end{pmatrix}
\]
yield the data matrix
\[
\begin{pmatrix} (6-9) & (10-6) & (8-3) \\ (6+9) & (10+6) & (8+3) \end{pmatrix}^T.
\]

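Sol. Let $z_j = Cx_j$ for $j = 1, 2, 3$ and $\mu_{z,0} = C\mu_0$. Then
\[
Z = \begin{pmatrix} (Cx_1)^T \\ (Cx_2)^T \\ (Cx_3)^T \end{pmatrix}
  = \begin{pmatrix} -3 & 15 \\ 4 & 16 \\ 5 & 11 \end{pmatrix}
\quad\text{and}\quad
\mu_{z,0} = C\mu_0 = \begin{pmatrix} 4 \\ 14 \end{pmatrix},
\]
\[
\bar{z} = \begin{pmatrix} 2 \\ 14 \end{pmatrix}, \quad
S_z = CSC^T = \begin{pmatrix} 19 & -5 \\ -5 & 7 \end{pmatrix}
\quad\text{and}\quad
S_z^{-1} = \begin{pmatrix} \frac{7}{108} & \frac{5}{108} \\ \frac{5}{108} & \frac{19}{108} \end{pmatrix},
\]
\[
T^{*} = n(\bar{z} - \mu_{z,0})^T S_z^{-1} (\bar{z} - \mu_{z,0}) = \frac{7}{9} = T^2.
\]
Thus, $T^2$ remains unchanged.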
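The agreement is not specific to these three observations. As a short sketch, assuming only that $C$ is nonsingular (here $\det C = 2$), the invariance follows from $\bar{z} = C\bar{x}$, $S_z = CSC^T$ and $\mu_{z,0} = C\mu_0$:

```latex
\begin{align*}
T^{*} &= n\,(C\bar{x} - C\mu_0)^T (C S C^T)^{-1} (C\bar{x} - C\mu_0) \\
      &= n\,(\bar{x} - \mu_0)^T C^T (C^T)^{-1} S^{-1} C^{-1} C\,(\bar{x} - \mu_0) \\
      &= n\,(\bar{x} - \mu_0)^T S^{-1} (\bar{x} - \mu_0) = T^2 .
\end{align*}
```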

5.9. Harry Roberts, a naturalist for the Alaska Fish and Game department, studies grizzly bears with the goal of maintaining a healthy population. Measurements on $n = 61$ bears provided the following summary statistics:

Variable         Weight   Body length   Neck    Girth   Head length   Head width
                 (kg)     (cm)          (cm)    (cm)    (cm)          (cm)
Sample mean x̄    95.52    164.38        55.69   93.39   17.98         31.13

Covariance matrix
\[
S = \begin{pmatrix}
3266.46 & 1343.97 & 731.54 & 1175.50 & 162.68 & 238.37 \\
1343.97 & 721.91 & 324.25 & 537.35 & 80.17 & 117.73 \\
731.54 & 324.25 & 179.28 & 281.17 & 39.15 & 56.80 \\
1175.50 & 537.35 & 281.17 & 474.98 & 63.73 & 94.85 \\
162.68 & 80.17 & 39.15 & 63.73 & 9.95 & 13.88 \\
238.37 & 117.73 & 56.80 & 94.85 & 13.88 & 21.26
\end{pmatrix}
\]

(a) Obtain the large sample 95% simultaneous confidence intervals for the six population mean body measurements.
(b) Obtain the large sample 95% simultaneous ellipse for mean weight and mean girth.
(c) Obtain the 95% Bonferroni confidence intervals for the six means in Part (a).
(d) Refer to Part (b). Construct the 95% Bonferroni confidence rectangle for the mean weight and mean girth using $m = 6$. Compare this rectangle with the confidence ellipse in Part (b).
(e) Obtain the 95% Bonferroni confidence interval for

mean head width − mean head length

using $m = 6 + 1 = 7$ to allow for this statement as well as statements about each individual mean.

Sol. (a) One can use either Scheffé's (exact) or the large-sample (approximate) simultaneous confidence intervals. Here we provide both solutions. To construct Scheffé's confidence interval we use
\[
a^T\bar{X} \pm \sqrt{\frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)}\,\sqrt{\frac{a^T S a}{n}}
\]
and to construct the large-sample confidence interval we use
\[
a^T\bar{X} \pm \sqrt{\chi^2_p(\alpha)}\,\sqrt{\frac{a^T S a}{n}}.
\]
The large-sample result is Result 5.5 on page 235 of the textbook. The two intervals will be very close because $\frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)$ and $\chi^2_p(\alpha)$ are approximately equal for $n$ large relative to $p$. Both intervals will contain $a^T\mu$, for every $a$, with probability approximately $100(1-\alpha)\%$. Since $n = 61$, $p = 6$, and $\alpha = 0.05$, the value of $F_{p,n-p}(\alpha)$ is

$F_{p,n-p}(\alpha)$ = qf(1-0.05,df1=6,df2=61-6) = 2.2687

and the value of $\chi^2_p(\alpha)$ is

$\chi^2_p(\alpha)$ = qchisq(1-0.05,df=6) = 12.5916.

The critical value of Scheffé's method is $\sqrt{\frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)} = 3.8535$ and that of the large-sample method is $\sqrt{\chi^2_p(\alpha)} = 3.5485$.


The $100(1-\alpha)\%$ Scheffé simultaneous confidence intervals for the six population mean body measurements, $\bar{x}_i \pm \sqrt{\frac{(n-1)p}{n-p} F_{p,n-p}(\alpha)}\,\sqrt{\frac{s_{ii}}{n}}$, are:
\begin{align*}
95.52 \pm 3.8535\sqrt{3266.46/61} &\;\Rightarrow\; 67.3210 \le \mu_1 \le 123.7190 \\
164.38 \pm 3.8535\sqrt{721.91/61} &\;\Rightarrow\; 151.1233 \le \mu_2 \le 177.6367 \\
55.69 \pm 3.8535\sqrt{179.28/61} &\;\Rightarrow\; 49.0837 \le \mu_3 \le 62.2963 \\
93.39 \pm 3.8535\sqrt{474.98/61} &\;\Rightarrow\; 82.6369 \le \mu_4 \le 104.1431 \\
17.98 \pm 3.8535\sqrt{9.95/61} &\;\Rightarrow\; 16.4237 \le \mu_5 \le 19.5364 \\
31.13 \pm 3.8535\sqrt{21.26/61} &\;\Rightarrow\; 28.8550 \le \mu_6 \le 33.4050
\end{align*}

The $100(1-\alpha)\%$ large-sample simultaneous confidence intervals for the six population mean body measurements, $\bar{x}_i \pm \sqrt{\chi^2_p(\alpha)}\,\sqrt{\frac{s_{ii}}{n}}$, are:
\begin{align*}
95.52 \pm \sqrt{12.5916}\sqrt{3266.46/61} &\;\Rightarrow\; 69.5535 \le \mu_1 \le 121.4865 \\
164.38 \pm \sqrt{12.5916}\sqrt{721.91/61} &\;\Rightarrow\; 152.1728 \le \mu_2 \le 176.5872 \\
55.69 \pm \sqrt{12.5916}\sqrt{179.28/61} &\;\Rightarrow\; 49.6067 \le \mu_3 \le 61.7733 \\
93.39 \pm \sqrt{12.5916}\sqrt{474.98/61} &\;\Rightarrow\; 83.4882 \le \mu_4 \le 103.2918 \\
17.98 \pm \sqrt{12.5916}\sqrt{9.95/61} &\;\Rightarrow\; 16.5469 \le \mu_5 \le 19.4131 \\
31.13 \pm \sqrt{12.5916}\sqrt{21.26/61} &\;\Rightarrow\; 29.0351 \le \mu_6 \le 33.2249
\end{align*}

(b) From Result 5.3, for Scheffé's method the $100(1-\alpha)\%$ simultaneous ellipse for $(\mu_i, \mu_k)$ belongs to the sample mean-centered ellipses
\[
n(\bar{x}_i - \mu_i,\; \bar{x}_k - \mu_k)
\begin{pmatrix} s_{ii} & s_{ik} \\ s_{ki} & s_{kk} \end{pmatrix}^{-1}
\begin{pmatrix} \bar{x}_i - \mu_i \\ \bar{x}_k - \mu_k \end{pmatrix}
\le \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha),
\]
and for the large-sample method the $100(1-\alpha)\%$ simultaneous ellipse for $(\mu_i, \mu_k)$ belongs to the sample mean-centered ellipses
\[
n(\bar{x}_i - \mu_i,\; \bar{x}_k - \mu_k)
\begin{pmatrix} s_{ii} & s_{ik} \\ s_{ki} & s_{kk} \end{pmatrix}^{-1}
\begin{pmatrix} \bar{x}_i - \mu_i \\ \bar{x}_k - \mu_k \end{pmatrix}
\le \chi^2_p(\alpha).
\]
We apply these two results and plot the 95% Scheffé simultaneous ellipse and the large-sample simultaneous ellipse for mean weight and mean girth in Figure 1.

[Figure 1: The 95% simultaneous ellipse with confidence rectangle. Scheffé: dotted blue line; large sample: solid black line. Horizontal axis: Weight; vertical axis: Girth.]

(c) The Bonferroni $100(1-\alpha)\%$ confidence interval for $\mu_i$ (see page 232 for details) is
\[
\bar{x}_i \pm t_{n-1}\!\left(\frac{\alpha}{2p}\right)\sqrt{\frac{s_{ii}}{n}}, \quad \text{for } i = 1, \dots, p.
\]
Since $n = 61$, $p = 6$, and $\alpha = 0.05$, the critical value $t_{n-1}\!\left(\frac{\alpha}{2p}\right)$ is

$t_{n-1}\!\left(\frac{\alpha}{2p}\right)$ = qt(1-0.05/(2*6),df=61-1) = 2.7286.

The Bonferroni confidence intervals for the six population mean body measurements are:
\begin{align*}
95.52 \pm 2.7286\sqrt{3266.46/61} &\;\Rightarrow\; 75.5533 \le \mu_1 \le 115.4867 \\
164.38 \pm 2.7286\sqrt{721.91/61} &\;\Rightarrow\; 154.9934 \le \mu_2 \le 173.7666 \\
55.69 \pm 2.7286\sqrt{179.28/61} &\;\Rightarrow\; 51.0123 \le \mu_3 \le 60.3677 \\
93.39 \pm 2.7286\sqrt{474.98/61} &\;\Rightarrow\; 85.7761 \le \mu_4 \le 101.0039 \\
17.98 \pm 2.7286\sqrt{9.95/61} &\;\Rightarrow\; 16.8780 \le \mu_5 \le 19.0820 \\
31.13 \pm 2.7286\sqrt{21.26/61} &\;\Rightarrow\; 29.5192 \le \mu_6 \le 32.7408
\end{align*}

(d) The 95% Bonferroni confidence rectangle for the mean weight and mean girth is plotted together with the confidence ellipses from Part (b) in Figure 2.

[Figure 2: The 95% simultaneous ellipse with Bonferroni confidence rectangle. Horizontal axis: Weight; vertical axis: Girth.]

From Figure 2, the sides of the Bonferroni confidence rectangle are shorter than the projections of both the Scheffé and the large-sample simultaneous ellipses onto the axes. This result is not surprising: the Bonferroni intervals only have to hold for the $m = 6$ individual means, whereas the ellipse-based intervals hold simultaneously for every linear combination $a^T\mu$ and are therefore wider.
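One way to see the ordering of the interval lengths is to compare the three critical multipliers that scale $\sqrt{s_{ii}/n}$. The sketch below recomputes them from $n = 61$, $p = 6$, $\alpha = 0.05$; the vector name `mult` and its labels are illustrative, not taken from the appendix.

```r
n <- 61; p <- 6; alpha <- 0.05

# Multipliers of sqrt(s_ii / n) for the three kinds of interval
mult <- c(
  scheffe    = sqrt((n - 1) * p / (n - p) * qf(1 - alpha, df1 = p, df2 = n - p)),
  chisq      = sqrt(qchisq(1 - alpha, df = p)),
  bonferroni = qt(1 - alpha / (2 * p), df = n - 1)
)
print(round(mult, 4))

# Ratio of each interval length to the Bonferroni length
print(round(mult / mult["bonferroni"], 3))
```

The smaller the multiplier, the shorter the interval for every variable, since all three methods share the same factor $\sqrt{s_{ii}/n}$.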

(e) The 95% Bonferroni confidence interval for linear combinations $a^T\mu$ (see page 234 for details) is
\[
a^T\bar{X} \pm t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{a^T S a}{n}}.
\]
The difference mean head width − mean head length is $\mu_6 - \mu_5$. Let $a^T = (0, 0, 0, 0, -1, 1)$; then the Bonferroni confidence interval for $\mu_6 - \mu_5$ is
\[
(\bar{x}_6 - \bar{x}_5) \pm t_{n-1}\!\left(\frac{\alpha}{2m}\right)\sqrt{\frac{s_{55} - s_{56} - s_{65} + s_{66}}{n}}.
\]
We note that the critical value is

$t_{n-1}\!\left(\frac{\alpha}{2m}\right)$ = qt(1-0.05/(2*7),df=n-1) = 2.7855.

So the 95% Bonferroni confidence interval for $\mu_6 - \mu_5$ is
\[
(31.13 - 17.98) \pm 2.7855\sqrt{\frac{9.95 - 13.88 - 13.88 + 21.26}{61}}
\;\Rightarrow\; 12.4876 \le \mu_6 - \mu_5 \le 13.8124.
\]

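6.8. Observations on two responses are collected for three treatments. The observation vectors $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ are

Treatment 1: $\begin{pmatrix} 6 \\ 7 \end{pmatrix}, \begin{pmatrix} 5 \\ 9 \end{pmatrix}, \begin{pmatrix} 8 \\ 6 \end{pmatrix}, \begin{pmatrix} 4 \\ 9 \end{pmatrix}, \begin{pmatrix} 7 \\ 9 \end{pmatrix}$
Treatment 2: $\begin{pmatrix} 3 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 6 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \end{pmatrix}$
Treatment 3: $\begin{pmatrix} 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 5 \\ 1 \end{pmatrix}, \begin{pmatrix} 3 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \end{pmatrix}$

(a) Break up the observations into mean, treatment, and residual components, as in (6-39). Construct the corresponding arrays for each variable.
(b) Using the information in Part (a), construct the one-way MANOVA table.
(c) Evaluate Wilks' lambda, $\Lambda^*$, and use Table 6.3 to test for treatment effects. Set $\alpha = 0.01$. Repeat the test using the chi-square approximation with Bartlett's correction. Compare the conclusions.
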
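Sol. (a) We calculate the mean of each treatment and the overall mean
\[
\bar{x}_1 = \begin{pmatrix} 6 \\ 8 \end{pmatrix}, \quad
\bar{x}_2 = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad
\bar{x}_3 = \begin{pmatrix} 3 \\ 2 \end{pmatrix}, \quad\text{and}\quad
\bar{x} = \begin{pmatrix} 4 \\ 5 \end{pmatrix},
\]
and break up the observations into mean, treatment, and residual components as $x_{lj} = \bar{x} + (\bar{x}_l - \bar{x}) + (x_{lj} - \bar{x}_l)$. The first variable can be decomposed as
\[
\begin{pmatrix} 6 & 5 & 8 & 4 & 7 \\ 3 & 1 & 2 & & \\ 2 & 5 & 3 & 2 & \end{pmatrix}
= \begin{pmatrix} 4 & 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & & \\ 4 & 4 & 4 & 4 & \end{pmatrix}
+ \begin{pmatrix} 2 & 2 & 2 & 2 & 2 \\ -2 & -2 & -2 & & \\ -1 & -1 & -1 & -1 & \end{pmatrix}
+ \begin{pmatrix} 0 & -1 & 2 & -2 & 1 \\ 1 & -1 & 0 & & \\ -1 & 2 & 0 & -1 & \end{pmatrix}
\]
and the second variable can be decomposed as
\[
\begin{pmatrix} 7 & 9 & 6 & 9 & 9 \\ 3 & 6 & 3 & & \\ 3 & 1 & 1 & 3 & \end{pmatrix}
= \begin{pmatrix} 5 & 5 & 5 & 5 & 5 \\ 5 & 5 & 5 & & \\ 5 & 5 & 5 & 5 & \end{pmatrix}
+ \begin{pmatrix} 3 & 3 & 3 & 3 & 3 \\ -1 & -1 & -1 & & \\ -3 & -3 & -3 & -3 & \end{pmatrix}
+ \begin{pmatrix} -1 & 1 & -2 & 1 & 1 \\ -1 & 2 & -1 & & \\ 1 & -1 & -1 & 1 & \end{pmatrix}
\]
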
(b) For the first variable we have
\begin{align*}
SS_{obs} &= 6^2 + 5^2 + 8^2 + 4^2 + 7^2 + 3^2 + 1^2 + 2^2 + 2^2 + 5^2 + 3^2 + 2^2 = 246 \\
SS_{mean} &= 12 \times 4^2 = 192 \\
SS_{trt} &= 5 \times 2^2 + 3 \times (-2)^2 + 4 \times (-1)^2 = 36 \\
SS_{res} &= 0^2 + (-1)^2 + 2^2 + (-2)^2 + 1^2 + 1^2 + (-1)^2 + 0^2 + (-1)^2 + 2^2 + 0^2 + (-1)^2 = 18
\end{align*}
The corrected total sum of squares is $SS_{total} = SS_{obs} - SS_{mean} = 246 - 192 = 54$.

Repeating this process on the second variable, we have
\begin{align*}
SS_{obs} &= 7^2 + 9^2 + 6^2 + 9^2 + 9^2 + 3^2 + 6^2 + 3^2 + 3^2 + 1^2 + 1^2 + 3^2 = 402 \\
SS_{mean} &= 12 \times 5^2 = 300 \\
SS_{trt} &= 5 \times 3^2 + 3 \times (-1)^2 + 4 \times (-3)^2 = 84 \\
SS_{res} &= (-1)^2 + 1^2 + (-2)^2 + 1^2 + 1^2 + (-1)^2 + 2^2 + (-1)^2 + 1^2 + (-1)^2 + (-1)^2 + 1^2 = 18
\end{align*}
The corrected total sum of squares is $SS_{total} = SS_{obs} - SS_{mean} = 402 - 300 = 102$.

The cross product terms are:
\begin{align*}
SS_{obs} &= 6 \times 7 + 5 \times 9 + 8 \times 6 + 4 \times 9 + 7 \times 9 + 3 \times 3 + 1 \times 6 + 2 \times 3 + 2 \times 3 + 5 \times 1 + 3 \times 1 + 2 \times 3 = 275 \\
SS_{mean} &= 12 \times 4 \times 5 = 240 \\
SS_{trt} &= 5 \times 2 \times 3 + 3 \times (-2) \times (-1) + 4 \times (-1) \times (-3) = 48 \\
SS_{res} &= 0 \times (-1) + (-1) \times 1 + 2 \times (-2) + (-2) \times 1 + 1 \times 1 + 1 \times (-1) + (-1) \times 2 \\
         &\quad + 0 \times (-1) + (-1) \times 1 + 2 \times (-1) + 0 \times (-1) + (-1) \times 1 = -13
\end{align*}
The corrected total cross product is $SS_{total} = SS_{obs} - SS_{mean} = 275 - 240 = 35$.

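The one-way MANOVA table

Source of variation    Sum of squares and cross products matrix                        Degrees of freedom
Treatment              $B = \begin{pmatrix} 36 & 48 \\ 48 & 84 \end{pmatrix}$          $3 - 1 = 2$
Residual               $W = \begin{pmatrix} 18 & -13 \\ -13 & 18 \end{pmatrix}$        $5 + 3 + 4 - 3 = 9$
Total (corrected)      $B + W = \begin{pmatrix} 54 & 35 \\ 35 & 102 \end{pmatrix}$     $12 - 1 = 11$
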
(c) To test for treatment effects, we consider the hypotheses
\[
H_0: \tau_1 = \tau_2 = \tau_3 = 0 \quad\text{vs.}\quad H_1: \text{at least one } \tau_l \ne 0.
\]
To carry out the test, we calculate Wilks' lambda
\[
\Lambda^* = \frac{|W|}{|W + B|} = \frac{155}{4283} = 0.0362.
\]
From Table 6.3, for $p = 2$ and $g = 3$ the test statistic is
\[
\left(\frac{\sum_{l=1}^{g} n_l - g - 1}{g - 1}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) = 17.0266.
\]
For $\alpha = 0.01$ the critical value $F_{2(g-1),\,2(\sum_{l=1}^{g} n_l - g - 1)}(\alpha)$ is

$F_{4,16}(0.01)$ = qf(1-0.01,df1=4,df2=16) = 4.7726

Since 17.0266 > 4.7726, we reject $H_0$ at significance level $\alpha = 0.01$ and conclude that at least one $\tau_l$ is not zero.

To carry out the chi-square approximation with Bartlett's correction, we calculate the test statistic
\[
-\left(n - 1 - \frac{p + g}{2}\right)\log\Lambda^* = 28.2114.
\]
For $\alpha = 0.01$ the critical value $\chi^2_{p(g-1)}(\alpha)$ is

$\chi^2_4(0.01)$ = qchisq(1-0.01,df=4) = 13.2767

Since 28.2114 > 13.2767, we reject $H_0$ at significance level $\alpha = 0.01$ and conclude that at least one $\tau_l$ is not zero.
Both tests show that a treatment difference exists. We note that the exact distribution of Wilks' lambda can be used for small sample sizes, whereas Bartlett's correction is a large-sample approximation.
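The matrices $B$ and $W$ and both test statistics can also be rebuilt directly from the decomposition in Part (a), without calling `manova`. The following is a minimal base-R sketch under that reading; names such as `fitted_means` and `effects` are illustrative and do not appear in the appendix.

```r
x <- cbind(x1 = c(6, 5, 8, 4, 7,  3, 1, 2,  2, 5, 3, 2),
           x2 = c(7, 9, 6, 9, 9,  3, 6, 3,  3, 1, 1, 3))
trt <- factor(rep(1:3, times = c(5, 3, 4)))
n <- nrow(x); p <- ncol(x); g <- nlevels(trt)

grand <- colMeans(x)
# Treatment mean of each variable, repeated so rows line up with observations
fitted_means <- apply(x, 2, function(col) ave(col, trt))
resid   <- x - fitted_means               # residual component
effects <- sweep(fitted_means, 2, grand)  # treatment component

W <- crossprod(resid)     # residual sums of squares and cross products
B <- crossprod(effects)   # treatment sums of squares and cross products

wilks    <- det(W) / det(B + W)
F_stat   <- ((n - g - 1) / (g - 1)) * (1 - sqrt(wilks)) / sqrt(wilks)
bartlett <- -(n - 1 - (p + g) / 2) * log(wilks)
print(round(c(wilks = wilks, F_stat = F_stat, bartlett = bartlett), 4))
```

Using `crossprod` on the component arrays gives the squared terms on the diagonal and the cross product term off the diagonal in one step.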

6.12. (Test for linear profiles, given that the profiles are parallel.) Let $\mu_1^T = (\mu_{11}, \mu_{12}, \dots, \mu_{1p})$ and $\mu_2^T = (\mu_{21}, \mu_{22}, \dots, \mu_{2p})$ be the mean responses to $p$ treatments for populations 1 and 2, respectively. Assume that the profiles given by the two mean vectors are parallel.

(a) Show that the hypothesis that the profiles are linear can be written as $H_0: (\mu_{1i} + \mu_{2i}) - (\mu_{1,i-1} + \mu_{2,i-1}) = (\mu_{1,i-1} + \mu_{2,i-1}) - (\mu_{1,i-2} + \mu_{2,i-2})$, $i = 3, \dots, p$, or as $H_0: C(\mu_1 + \mu_2) = 0$, where the $(p-2) \times p$ matrix
\[
C = \begin{pmatrix}
1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1
\end{pmatrix}
\]

(b) Following an argument similar to the one leading to (6-73), we reject $H_0: C(\mu_1 + \mu_2) = 0$ at level $\alpha$ if
\[
T^2 = (\bar{x}_1 + \bar{x}_2)^T C^T \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) C S_{pooled} C^T\right]^{-1} C(\bar{x}_1 + \bar{x}_2) > c^2
\]
where
\[
c^2 = \frac{(n_1 + n_2 - 2)(p - 2)}{n_1 + n_2 - p + 1} F_{p-2,\,n_1+n_2-p+1}(\alpha).
\]
Let $n_1 = 30$, $n_2 = 30$, $\bar{x}_1^T = (6.4, 6.8, 7.3, 7.0)$, $\bar{x}_2^T = (4.3, 4.9, 5.3, 5.1)$, and
\[
S_{pooled} = \begin{pmatrix}
0.61 & 0.26 & 0.07 & 0.16 \\
0.26 & 0.64 & 0.17 & 0.14 \\
0.07 & 0.17 & 0.81 & 0.03 \\
0.16 & 0.14 & 0.03 & 0.31
\end{pmatrix}
\]
Test for linear profiles, assuming that the profiles are parallel. Use $\alpha = 0.05$.

Sol. (a) Given that the profiles are parallel, one lies above the other (or they coincide) for all $i = 1, \dots, p$, that is,
\[
\mu_{1i} \ge \mu_{2i} \quad (\text{or } \mu_{1i} \le \mu_{2i}) \quad \text{for all } i = 1, \dots, p,
\]
and the two profiles are linear exactly when their sum $\mu_1 + \mu_2$ is linear. So the profiles will be linear only if the increment of the sum of the two population means from $i$ to $i+1$ is the same as the increment from $i+1$ to $i+2$, for all $i = 1, \dots, p-2$. Thus, we consider the difference of these increments of the sum:
\[
(\mu_{1,i+2} + \mu_{2,i+2}) - (\mu_{1,i+1} + \mu_{2,i+1}) = (\mu_{1,i+1} + \mu_{2,i+1}) - (\mu_{1i} + \mu_{2i})
\]
for $i = 1, \dots, p-2$. This is equivalent to testing
\[
H_0: (\mu_{1i} + \mu_{2i}) - (\mu_{1,i-1} + \mu_{2,i-1}) = (\mu_{1,i-1} + \mu_{2,i-1}) - (\mu_{1,i-2} + \mu_{2,i-2})
\]
for $i = 3, \dots, p$. We can also rewrite the above hypothesis as
\[
H_0: C(\mu_1 + \mu_2) = 0
\]
where $C$ is a $(p-2) \times p$ constant matrix,
\[
C = \begin{pmatrix}
1 & -2 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & 1 & -2 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & -2 & 1
\end{pmatrix}
\]
(b) The test statistic $T^2 = 16.8361$ is calculated in R. The code is compiled in the Appendix. Moreover, the quantile $F_{p-2,\,n_1+n_2-p+1}(\alpha)$ = qf(1-0.05,df1=2,df2=57) = 3.1588, and the critical value $c^2$ is
\[
c^2 = \frac{(n_1 + n_2 - 2)(p - 2)}{n_1 + n_2 - p + 1} F_{p-2,\,n_1+n_2-p+1}(\alpha) = \frac{58 \times 2}{57} \times 3.1588 = 6.4285.
\]
Since $T^2 = 16.8361 > c^2 = 6.4285$, we reject $H_0$ at significance level $\alpha = 0.05$ and conclude that the profiles are not linear, given that the profiles are parallel. The profile plot in Figure 3 also indicates that the profiles are not linear.

[Figure 3: Profile analysis for two treatments. Horizontal axis: Variables (1 to 4); vertical axis: Mean Value.]

6.20. The tail lengths in millimeters ($x_1$) and wing lengths in millimeters ($x_2$) for 45 male hook-billed kites are given in Table 6.11 on page 346. Similar measurements for female hook-billed kites were given in Table 5.12.

(a) Plot the male hook-billed kite data as a scatter diagram, and (visually) check for outliers. (Note, in particular, observation 31 with $x_1 = 284$.)
(b) Test for equality of mean vectors for the populations of male and female hook-billed kites. Set $\alpha = 0.05$. If $H_0: \mu_1 - \mu_2 = 0$ is rejected, find the linear combination most responsible for the rejection of $H_0$. (You may want to eliminate any outliers found in Part (a) for the male hook-billed kite data before conducting this test. Alternatively, you may want to interpret $x_1 = 284$ for observation 31 as a misprint and conduct the test with $x_1 = 184$ for this observation. Does it make any difference in this case how observation 31 for the male hook-billed kite data is treated?)
(c) Determine the 95% confidence region for $\mu_1 - \mu_2$ and 95% simultaneous confidence intervals for the components of $\mu_1 - \mu_2$.
(d) Are male or female birds generally larger?

Sol. (a) The scatter plot for the male hook-billed kite data is shown in Figure 4. It is clear that observation 31 with $x_1 = 284$ is an outlier.

[Figure 4: The scatter plot for the male hook-billed kite data. Horizontal axis: Tail lengths; vertical axis: Wing lengths.]

(b) We remove observation 31 from the male hook-billed kite data, since it is identified as an outlier. We run Box's M-test for homogeneity of covariance matrices. The test statistic $-2\rho\log\Lambda$ has an approximate $\chi^2_\nu$ distribution with $\nu$ degrees of freedom, where
\[
\rho = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(g-1)}\left(\sum_{l=1}^{g}\frac{1}{n_l - 1} - \frac{1}{\sum_{l=1}^{g}(n_l - 1)}\right),
\]
\[
\Lambda = \prod_{l=1}^{g}\left(\frac{|S_l|}{|S_{pooled}|}\right)^{\frac{n_l - 1}{2}}, \quad\text{and}\quad \nu = \frac{1}{2}p(p+1)(g-1).
\]
We found that $-2\rho\log\Lambda = 1.0431$ and $\nu = 3$, and the p-value is 0.7908. So we do not reject the null hypothesis and find no evidence that the covariance matrices of male and female hook-billed kites differ. It is reasonable to pool the covariance matrices of male and female hook-billed kites, and we denote the pooled matrix by
\[
S_{pooled} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2}.
\]
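The appendix relies on the `biotools` package for this test. As a hedged alternative, the formulas for $\rho$, $\Lambda$ and $\nu$ above can be coded directly in base R; the function name `box_m` and its arguments are hypothetical, and the call at the end uses made-up covariance matrices rather than the kite data.

```r
# Box's M-test from a list of sample covariance matrices and group sizes
box_m <- function(S_list, n_vec) {
  g <- length(S_list)
  p <- nrow(S_list[[1]])
  df_l <- n_vec - 1
  S_pooled <- Reduce(`+`, Map(function(S, d) d * S, S_list, df_l)) / sum(df_l)

  # log(Lambda) = sum_l (n_l - 1)/2 * (log|S_l| - log|S_pooled|)
  log_det <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  log_lambda <- sum(df_l / 2 * (sapply(S_list, log_det) - log_det(S_pooled)))

  rho <- 1 - (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) *
    (sum(1 / df_l) - 1 / sum(df_l))
  stat <- -2 * rho * log_lambda
  nu <- p * (p + 1) * (g - 1) / 2
  list(statistic = stat, df = nu,
       p.value = pchisq(stat, df = nu, lower.tail = FALSE))
}

# Illustrative call with hypothetical covariance matrices (not the kite data)
S1 <- matrix(c(100, 60, 60, 150), 2)
S2 <- matrix(c(110, 80, 80, 220), 2)
res <- box_m(list(S1, S2), n_vec = c(44, 45))
print(res)
```

With the actual sample covariance matrices of the two groups in place of `S1` and `S2`, the same call should give the statistic, degrees of freedom and p-value used in the text.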
To test for equality of mean vectors for the populations of male and female hook-billed kites, we consider the hypotheses
\[
H_0: \mu_1 - \mu_2 = 0 \quad\text{vs.}\quad H_1: \mu_1 - \mu_2 \ne 0.
\]
From Result 6.2 on page 286, we calculate the test statistic
\[
T^2 = (\bar{x}_1 - \bar{x}_2)^T \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} (\bar{x}_1 - \bar{x}_2) = 24.9649,
\]
the quantile $F_{p,\,n_1+n_2-p-1}(\alpha)$ = qf(1-0.05,df1=2,df2=86) = 3.1026, and the critical value
\[
\frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha) = 6.2773.
\]
Since $T^2 = 24.9649 > 6.2773$, we reject $H_0$ and conclude that the male and female hook-billed kite population mean vectors are not equal.

For testing $H_0: \mu_1 - \mu_2 = 0$, the linear combination $\hat{a}^T(\bar{x}_1 - \bar{x}_2)$, with coefficient vector $\hat{a} \propto S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2)$, quantifies the largest population difference. The linear combination of mean components most responsible for rejecting $H_0$ is given by the vector (see the remark on page 289, which builds on the argument at the top of page 225 in the textbook):
\[
\hat{a} \propto S_{pooled}^{-1}(\bar{X}_1 - \bar{X}_2) = \begin{pmatrix} -3.4842 \\ 2.0785 \end{pmatrix}.
\]

Alternatively, we revise observation 31 by changing it from $x_1 = 284$ to $x_1 = 184$ and repeat the whole process. For Box's M-test we found $-2\rho\log\Lambda = 1.2223$ with $\nu = 3$ and a p-value of 0.7477, so again there is no evidence that the covariance matrices of male and female hook-billed kites differ. For the test of equality of mean vectors, the test statistic is $T^2 = 25.6625$ and the critical value is 6.2739. Hence, we still conclude that the male and female hook-billed kite population mean vectors are not equal.
In this case the conclusion does not depend on how observation 31 for the male hook-billed kite data is treated. The R code for testing homogeneity of covariance matrices and for equality of mean vectors without (and with revised) observation 31 is compiled in the Appendix. In the following questions we only consider the data set with observation 31 deleted.
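(c) From Result 6.2 on page 286 of the textbook, we have
\[
T^2 = \left[(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)\right]^T \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} \left[(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)\right]
\sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}.
\]
A $100(1-\alpha)\%$ confidence region for $\mu_1 - \mu_2$ is
\[
\left\{ \mu_1 - \mu_2 : \left[(\mu_1 - \mu_2) - (\bar{X}_1 - \bar{X}_2)\right]^T \left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\right]^{-1} \left[(\mu_1 - \mu_2) - (\bar{X}_1 - \bar{X}_2)\right] \le c^2 \right\},
\]
where
\[
c^2 = \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha).
\]
The confidence region is an ellipse with center at $\bar{X}_1 - \bar{X}_2$, and the axes of the ellipse are
\[
\pm\sqrt{\lambda_i}\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2}\; e_i
\]
where $\lambda_i$ and $e_i$ are the eigenvalues and eigenvectors of $S_{pooled}$.
Here the 95% confidence ellipse for $\mu_1 - \mu_2$ is determined from the eigenvalue-eigenvector pairs $\lambda_1 = 262.5640$, $e_1^T = (0.5588, 0.8293)$ and $\lambda_2 = 33.0469$, $e_2^T = (-0.8293, 0.5588)$.
Since
\[
\sqrt{\lambda_1}\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2} = \sqrt{262.5640}\,\sqrt{\left(\frac{1}{44} + \frac{1}{45}\right) 6.2773} = 8.6072
\]
and
\[
\sqrt{\lambda_2}\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) c^2} = \sqrt{33.0469}\,\sqrt{\left(\frac{1}{44} + \frac{1}{45}\right) 6.2773} = 3.0536,
\]
we obtain the 95% confidence ellipse for $\mu_1 - \mu_2$ sketched in Figure 5.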
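The two half-lengths can be re-derived numerically. The sketch below assumes the reported eigenvalues of $S_{pooled}$ and the sample sizes $n_1 = 44$, $n_2 = 45$; the variable names are illustrative.

```r
n1 <- 44; n2 <- 45; p <- 2; alpha <- 0.05
lambda <- c(262.5640, 33.0469)   # eigenvalues of S_pooled as reported in the text

# Critical constant c^2 from Result 6.2
c2 <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) *
  qf(1 - alpha, df1 = p, df2 = n1 + n2 - p - 1)

# Half-length of each axis: sqrt(lambda_i) * sqrt((1/n1 + 1/n2) * c^2)
half_length <- sqrt(lambda) * sqrt((1 / n1 + 1 / n2) * c2)
print(round(c2, 4))
print(round(half_length, 4))
```

The axis directions are the eigenvectors $e_1$ and $e_2$, so the major axis of the ellipse runs along $e_1$.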
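Moreover, from Result 6.3, the $100(1-\alpha)\%$ simultaneous confidence intervals
\[
a^T(\bar{X}_1 - \bar{X}_2) \pm c\,\sqrt{a^T\left(\frac{1}{n_1} + \frac{1}{n_2}\right) S_{pooled}\, a}
\]
will cover $a^T(\mu_1 - \mu_2)$ for all $a$. In particular, $\mu_{1i} - \mu_{2i}$ will be covered by
\[
(\bar{X}_{1i} - \bar{X}_{2i}) \pm c\,\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{pooled,ii}} \quad\text{for } i = 1, \dots, p.
\]
With $\mu_1^T - \mu_2^T = (\mu_{11} - \mu_{21},\; \mu_{12} - \mu_{22})$, the 95% simultaneous confidence intervals for the population differences are
\[
\mu_{11} - \mu_{21}: \quad (187.1591 - 193.6222) \pm \sqrt{6.2773}\,\sqrt{\left(\frac{1}{44} + \frac{1}{45}\right) 104.7180}
\quad\text{or}\quad -11.8989 \le \mu_{11} - \mu_{21} \le -1.0274,
\]
\[
\mu_{12} - \mu_{22}: \quad (280.9545 - 279.7778) \pm \sqrt{6.2773}\,\sqrt{\left(\frac{1}{44} + \frac{1}{45}\right) 190.8930}
\quad\text{or}\quad -6.1623 \le \mu_{12} - \mu_{22} \le 8.5159.
\]
The 95% simultaneous confidence intervals are also sketched in Figure 5.

[Figure 5: The 95% simultaneous ellipse with confidence rectangle. Horizontal axis: $\mu_{11} - \mu_{21}$; vertical axis: $\mu_{12} - \mu_{22}$.]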

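(d) We conclude that there is a difference in tail lengths between male and female hook-billed kites: the simultaneous interval for $\mu_{11} - \mu_{21}$ lies entirely below zero, so females have longer tails on average, while the interval for wing length covers zero. In this sense female birds are generally larger than male birds.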

6.24. Researchers have suggested that a change in skull size over time is evidence of the interbreeding of a resident population with immigrant populations. Four measurements were made of male Egyptian skulls for three different time periods: period 1 is 4000 B.C., period 2 is 3300 B.C., and period 3 is 1850 B.C. The data are shown in Table 6.13 on page 349 (see the skull data on the website http://www.prenhall.com/statistics). The measured variables are

X1 = maximum breadth of skull (mm)
X2 = basibregmatic height of skull (mm)
X3 = basialveolar length of skull (mm)
X4 = nasal height of skull (mm)

Construct a one-way MANOVA of the Egyptian skull data. Use $\alpha = 0.05$. Construct 95% simultaneous confidence intervals to determine which mean components differ among the populations represented by the three time periods. Are the usual MANOVA assumptions realistic for these data? Explain.

Sol. The one-way MANOVA model is specified by
\[
X_{lj} = \mu + \tau_l + e_{lj}, \quad\text{for } j = 1, \dots, n_l \text{ and } l = 1, \dots, g,
\]
where the $e_{lj}$ are independent $N(0, \Sigma)$ variables. Here the parameter vector $\mu$ is an overall mean (level), and $\tau_l$ represents the $l$-th time period effect with $\sum_{l=1}^{g} n_l\tau_l = 0$.
The hypothesis of no time period effects is tested by considering the relative size of the time effect and residual sums of squares and cross products.
\[
H_0: \tau_1 = \cdots = \tau_g = 0 \quad\text{vs.}\quad H_1: \text{at least one } \tau_l \ne 0
\]
We summarize the calculations leading to the test statistic in a MANOVA table:

Source        Matrix of sums of squares and cross products                                                   Degrees of freedom
Time effect   $B = \sum_{l=1}^{g} n_l(\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})^T$                            $g - 1$
Residual      $W = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)^T$                  $\sum_{l=1}^{g} n_l - g$
Total         $B + W = \sum_{l=1}^{g}\sum_{j=1}^{n_l}(x_{lj} - \bar{x})(x_{lj} - \bar{x})^T$                  $\sum_{l=1}^{g} n_l - 1$

We reject $H_0$ if the ratio of generalized variances
\[
\Lambda^* = \frac{|W|}{|B + W|}
\]
is too small. The exact distribution of $\Lambda^*$ can be derived for the special cases listed in Table 6.3 on page 303 of the textbook.

Since $p = 4$ and $g = 3$, the distribution of Wilks' lambda $\Lambda^*$ is
\[
\left(\frac{\sum_{l=1}^{g} n_l - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2p,\,2(\sum_{l=1}^{g} n_l - p - 2)}.
\]
Using the manova function in R, we get $\Lambda^* = 0.8301$.

The test statistic is
\[
\left(\frac{\sum_{l=1}^{g} n_l - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right)
= \left(\frac{90 - 4 - 2}{4}\right)\left(\frac{1 - \sqrt{0.8301}}{\sqrt{0.8301}}\right) = 2.0491
\]
and the critical value is

$F_{2p,\,2(\sum_{l=1}^{g} n_l - p - 2)}(\alpha)$ = qf(1-0.05,df1=8,df2=168) = 1.9939

Since the test statistic 2.0491 > 1.9939, we reject $H_0$ and conclude that time effect differences exist: male Egyptian skulls differ across the three time periods.
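The arithmetic of this test can be replayed from the reported value of Wilks' lambda alone. The sketch below assumes $\Lambda^* = 0.8301$, $n = 90$, $p = 4$, $g = 3$ as stated above and adds a p-value, which is not part of the original solution; the variable names are illustrative.

```r
n <- 90; p <- 4; g <- 3; alpha <- 0.05
wilks <- 0.8301                   # value reported by manova() in the text

# Exact F transformation of Wilks' lambda for g = 3 (Table 6.3)
df1 <- 2 * p
df2 <- 2 * (n - p - 2)
F_stat  <- ((n - p - 2) / p) * (1 - sqrt(wilks)) / sqrt(wilks)
F_crit  <- qf(1 - alpha, df1 = df1, df2 = df2)
p_value <- pf(F_stat, df1 = df1, df2 = df2, lower.tail = FALSE)

print(round(c(F_stat = F_stat, F_crit = F_crit, p_value = p_value), 4))
```

Because the statistic only just exceeds the critical value, the rejection is marginal.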
For pairwise comparisons, the Bonferroni approach can be used to construct simultaneous confidence intervals for the components of the differences $\tau_k - \tau_l$. From Result 6.5, for the MANOVA model, with confidence level $100(1-\alpha)\%$,
\[
\tau_{ki} - \tau_{li} \;\text{ belongs to }\; \bar{x}_{ki} - \bar{x}_{li} \pm t_{n-g}\!\left(\frac{\alpha}{pg(g-1)}\right)\sqrt{\frac{w_{ii}}{n - g}\left(\frac{1}{n_k} + \frac{1}{n_l}\right)}.
\]
The intervals for each pair are calculated in R; the code is compiled in the Appendix.
\begin{align*}
\tau_{11} - \tau_{21} &\in (-4.4423, 2.4423), & \tau_{11} - \tau_{31} &\in (-6.5423, 0.3423), & \tau_{21} - \tau_{31} &\in (-5.5423, 1.3423) \\
\tau_{12} - \tau_{22} &\in (-2.6737, 4.4737), & \tau_{12} - \tau_{32} &\in (-3.7737, 3.3737), & \tau_{22} - \tau_{32} &\in (-4.6737, 2.4737) \\
\tau_{13} - \tau_{23} &\in (-3.6801, 3.8801), & \tau_{13} - \tau_{33} &\in (-0.6468, 6.9134), & \tau_{23} - \tau_{33} &\in (-0.7468, 6.8134) \\
\tau_{14} - \tau_{24} &\in (-2.0614, 2.6614), & \tau_{14} - \tau_{34} &\in (-2.3948, 2.3281), & \tau_{24} - \tau_{34} &\in (-2.6948, 2.0281)
\end{align*}

For $\alpha = 0.05$ all simultaneous confidence intervals cover zero, so no single component difference between time periods is significant at the Bonferroni level, even though the overall Wilks' lambda test rejects $H_0$. We further investigate the skull data and find that the normality assumption is violated for each period (see Figure 6). Hence we conclude that the usual MANOVA assumptions are not realistic for these data.

[Figure 6: QQ-plot for period 1, 2, and 3 from left to right. Panels: (a) 4000 B.C., (b) 3300 B.C., (c) 1850 B.C. Horizontal axis: Theoretical Quantiles; vertical axis: Sample Quantiles.]

6.37. Table 6.9 on page 344 contains the carapace measurements for 24 female and 24 male turtles. Use Box's M-test to test $H_0: \Sigma_1 = \Sigma_2 = \Sigma$, where $\Sigma_1$ is the population covariance matrix for carapace measurements for female turtles, and $\Sigma_2$ is the population covariance matrix for carapace measurements for male turtles. Set $\alpha = 0.05$.

Sol. We run Box's M-test for homogeneity of covariance matrices. The R code is compiled in the Appendix. We found that the test statistic is $-2\rho\log\Lambda = 23.405$ with $\nu = 6$ degrees of freedom, and the p-value is 0.0007. So we reject $H_0$ at significance level $\alpha = 0.05$ and conclude that the population covariance matrices of male and female turtles differ.

6.39. Anacondas are some of the largest snakes in the world. Jesus Ravis and his fellow researchers capture a snake and measure its (i) snout vent length (cm), or the length from the snout of the snake to its vent where it evacuates waste, and (ii) weight (kilograms). A sample of these measurements is shown in Table 6.19.

(a) Test for equality of means between males and females using $\alpha = 0.05$. Apply the large sample statistic.
(b) Is it reasonable to pool variances in this case? Explain.
(c) Find the 95% Bonferroni confidence intervals for the mean differences between males and females on both length and weight.

Sol. (a) We first run Box's M-test to test the equality of covariance matrices between males and females. The p-value is reported to be 0. Hence, we conclude that the covariance matrices of males and females are different.
To test for equality of means we consider the hypotheses
\[
H_0: \mu_1 - \mu_2 = 0 \quad\text{vs.}\quad H_1: \mu_1 - \mu_2 \ne 0.
\]
From Result 6.4 on page 292 of the textbook, under the large-sample setting we calculate the test statistic
\[
T^2 = (\bar{x}_1 - \bar{x}_2)^T \left[\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2\right]^{-1} (\bar{x}_1 - \bar{x}_2) = 76.9153,
\]
and the critical value $\chi^2_p(\alpha) = 5.9915$.

Since $T^2 = 76.9153 > 5.9915$, we reject $H_0$ and conclude that the means of at least one of the variables differ significantly between males and females.

(b) Though the covariance matrices of the female and male groups are significantly different, the sample sizes for the two groups are equal, so it is still reasonable to pool the variance. If $n_1 = n_2 = n$, then $(n-1)/(n+n-2) = 1/2$, so
\[
\frac{1}{n_1} S_1 + \frac{1}{n_2} S_2 = \frac{1}{n}(S_1 + S_2) = \frac{(n-1)S_1 + (n-1)S_2}{n + n - 2}\left(\frac{1}{n} + \frac{1}{n}\right) = S_{pooled}\left(\frac{1}{n} + \frac{1}{n}\right).
\]
With equal sample sizes, the large-sample procedure is essentially the same as the procedure based on the pooled covariance matrix.
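The identity is easy to confirm numerically. The sketch below uses two made-up covariance matrices and takes $n = 28$ per group, which is an assumption inferred from the 54 degrees of freedom used in Part (c); none of these values come from Table 6.19.

```r
# Hypothetical covariance matrices for two groups of equal size
S1 <- matrix(c(4, 1, 1, 3), 2)
S2 <- matrix(c(9, -2, -2, 5), 2)
n <- 28

S_pooled <- ((n - 1) * S1 + (n - 1) * S2) / (n + n - 2)

lhs <- S1 / n + S2 / n               # matrix used by the large-sample statistic
rhs <- (1 / n + 1 / n) * S_pooled    # matrix used by the pooled statistic
print(all.equal(lhs, rhs))
```

With unequal group sizes the two matrices no longer coincide, which is when the choice between pooling and not pooling starts to matter.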
(c) The Bonferroni $100(1-\alpha)\%$ simultaneous confidence intervals for the $p$ population mean differences (see page 291 in the textbook) are
\[
\mu_{1i} - \mu_{2i}: \quad (\bar{x}_{1i} - \bar{x}_{2i}) \pm t_{n_1+n_2-2}\!\left(\frac{\alpha}{2p}\right)\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right) s_{pooled,ii}}
\]
with the critical value

$t_{n_1+n_2-2}\!\left(\frac{\alpha}{2p}\right)$ = qt(1-0.05/(2*2),df=54) = 2.3056

The 95% Bonferroni confidence interval for the mean difference between males and females on length is $(-150.98, -88.06)$, and the 95% Bonferroni confidence interval for the mean difference between males and females on weight is $(-38.78, -21.17)$.

Appendix
R code for Problem 5.1.
> # (a)
> mu <- c(7,11)
> n <- 4
> p <- 2
> X <- matrix(c(2,8,6,8,12,9,9,10),nrow=n,ncol=p)
> xbar <- colMeans(X)
> S <- cov(X)
> Sinv <- solve(S)
> T2 <- n*(xbar-mu) %*% Sinv %*% (xbar-mu)
>
> # (c)
> alpha <- 0.05
> qf(1-alpha,df1=2,df2=2)
R code for Problem 5.2.
> n <- 3
> p <- 2
> mu <- c(9,5)
> X <- matrix(c(6,10,8,9,6,3),nrow=n,ncol=p)
> xbar <- colMeans(X)
> S <- cov(X)
> Sinv <- solve(S)
> T2 <- n*(xbar-mu) %*% Sinv %*% (xbar-mu)
>

> C <- matrix(c(1,1,-1,1),nrow=2,ncol=2)
> Z <- t(C%*%t(X))
> zbar <- colMeans(Z)
> muz <- C%*%mu
> Sz <- cov(Z)
> Szinv <- solve(Sz)
>
> Tstar <- n*t(zbar-muz) %*% Szinv %*% (zbar-muz)
>
> # (c)
> alpha <- 0.05
> qf(1-alpha,df1=2,df2=2)

R code for Problem 5.9.

> xbar <- c(95.52, 164.38, 55.69, 93.39, 17.98, 31.13)


> S <- matrix(c(3266.46, 1343.97, 731.54, 1175.50, 162.68, 238.37,
+ 1343.97, 721.91, 324.25, 537.35, 80.17, 117.73,
+ 731.54, 324.25, 179.28, 281.17, 39.15, 56.80,
+ 1175.50, 537.35, 281.17, 474.98, 63.73, 94.85,
+ 162.68, 80.17, 39.15, 63.73, 9.95, 13.88,
+ 238.37, 117.73, 56.80, 94.85, 13.88, 21.26),
+ nrow=6, ncol=6, byrow=TRUE)
> p <- 6
> n <- 61
> alpha <- 0.05
>
> # (a) Scheffe's
> SH.qlevel <- qf(1-alpha,df1=p,df2=n-p)
> for ( i in 1:p ){
+ SH.LCI <- xbar[i]-sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[i,i]/n)
+ SH.UCI <- xbar[i]+sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[i,i]/n)
+ print(c(SH.LCI, SH.UCI))
+ }
> # (a) Large sample
> qlevel <- qchisq(1-alpha,df=p)
> for ( i in 1:p ){
+ LCI <- xbar[i]-sqrt(qlevel)*sqrt(S[i,i]/n)
+ UCI <- xbar[i]+sqrt(qlevel)*sqrt(S[i,i]/n)
+ print(c(LCI, UCI))
+ }
>
> # (b)
> center <- xbar[c(1,4)]
> Sn2 <- S[c(1,4),c(1,4)]
> npoints <- 1000
> theta <- seq(0, 2*pi, length = npoints)
>
> # transform for points on ellipse for Scheffe's
> SH.r <- sqrt((n-1)*p/(n-p)*SH.qlevel/n)
> SH.v <- rbind(SH.r*cos(theta), SH.r*sin(theta))
> SH.z <- backsolve(chol(solve(Sn2)),SH.v)+center
>
> # calculate the 95% simultaneous confidence interval
> SH.LCIx <- xbar[1]-sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[1,1]/n)
> SH.UCIx <- xbar[1]+sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[1,1]/n)
> SH.LCIy <- xbar[4]-sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[4,4]/n)
> SH.UCIy <- xbar[4]+sqrt((n-1)*p/(n-p)*SH.qlevel)*sqrt(S[4,4]/n)
>
> # transform for points on ellipse for large sample
> r <- sqrt(qlevel/n)
> v <- rbind(r*cos(theta), r*sin(theta))
> z <- backsolve(chol(solve(Sn2)),v)+center
>
> # calculate the 95% simultaneous confidence interval
> LCIx <- xbar[1]-sqrt(qlevel)*sqrt(S[1,1]/n)
> UCIx <- xbar[1]+sqrt(qlevel)*sqrt(S[1,1]/n)
> LCIy <- xbar[4]-sqrt(qlevel)*sqrt(S[4,4]/n)
> UCIy <- xbar[4]+sqrt(qlevel)*sqrt(S[4,4]/n)
>
> # plot the ellipse for Scheffe's
> plot(t(SH.z), type='l', xlab='Weight', ylab='Girth', lty=2, col='blue')
>
> # plot 95% simultaneous confidence interval
> abline(v=SH.LCIx, col='blue')
> abline(v=SH.UCIx, col='blue')
> abline(h=SH.LCIy, col='blue')
> abline(h=SH.UCIy, col='blue')
>
> # plot the ellipse for large sample
> lines(t(z))
>
> # plot 95% simultaneous confidence interval
> abline(v=LCIx)
> abline(v=UCIx)
> abline(h=LCIy)
> abline(h=UCIy)
>
> # plot center of ellipse
> points(center[1], center[2], col='red')
>
> # (c)
> BF.qlevel <- qt(1-alpha/(2*p),df=n-1)
> for ( i in 1:p ){
+ BF.LCI <- xbar[i]-BF.qlevel*sqrt(S[i,i]/n)
+ BF.UCI <- xbar[i]+BF.qlevel*sqrt(S[i,i]/n)
+ print(c(BF.LCI,BF.UCI))
+ }
>
> # (d)
> # calculate the 95% simultaneous confidence interval for Bonferroni method

> BF.LCIx <- xbar[1]-BF.qlevel*sqrt(S[1,1]/n)
> BF.UCIx <- xbar[1]+BF.qlevel*sqrt(S[1,1]/n)
> BF.LCIy <- xbar[4]-BF.qlevel*sqrt(S[4,4]/n)
> BF.UCIy <- xbar[4]+BF.qlevel*sqrt(S[4,4]/n)
> plot(t(SH.z), type='l', xlab='Weight', ylab='Girth', lty=2, col='blue')
> lines(t(z))
> points(center[1], center[2], col='red')
>
> # plot 95% simultaneous confidence interval
> abline(v=BF.LCIx, lty=3, col='red')
> abline(v=BF.UCIx, lty=3, col='red')
> abline(h=BF.LCIy, lty=3, col='red')
> abline(h=BF.UCIy, lty=3, col='red')
>
> # (e)
> a <- c(0,0,0,0,-1,1)
> m <- 7
> dqlevel <- qt(1-alpha/(2*m),df=n-1)
> dLCI <- a%*%xbar-dqlevel*sqrt(a%*%S%*%a/n)
> dUCI <- a%*%xbar+dqlevel*sqrt(a%*%S%*%a/n)

R code for Problem 6.8.

> x1 <- c(6,5,8,4,7,3,1,2,2,5,3,2)
> x2 <- c(7,9,6,9,9,3,6,3,3,1,1,3)
> trt <- as.factor(c(1,1,1,1,1,2,2,2,3,3,3,3))
> mfit <- lm(cbind(x1,x2)~trt)
> mvafit <- manova(mfit)
> summary(mvafit, test='Wilks')

R code for Problem 6.12.

> alpha <- 0.05
> p <- 4
> n1 <- 30
> n2 <- 30
> xbar1 <- c(6.4,6.8,7.3,7.0)
> xbar2 <- c(4.3,4.9,5.3,5.1)
> Sp <- matrix(c(0.61,0.26,0.07,0.16,0.26,0.64,0.17,0.14,0.07,
+ 0.17,0.81,0.03,0.16,0.14,0.03,0.31),nrow=4,ncol=4)
> C <- matrix( c(1,0,-2,1,1,-2,0,1), nrow=2, ncol=4)
>
> mudiff <- C%*%(xbar1+xbar2)
> Sinv <- solve((1/n1+1/n2)*C%*%Sp%*%t(C))
> T2 <- t(mudiff) %*% Sinv %*% mudiff
> c2 <- (n1+n2-2)*(p-2)/(n1+n2-p+1)*qf(1-alpha,df1=p-2,df2=n1+n2-p+1)
>
> # plot the profile
> ymax <- max(xbar1,xbar2)
> ymin <- min(xbar1,xbar2)
> plot(1:4,xbar1,type='b',lty=2,ylim=c(ymin,ymax),xlab='Variables',
+ ylab='Mean Value',xaxt='n')

> lines(1:4,xbar2,type='b',pch=23)
> axis(side=1,at=1:4)
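
For reference, the quantities computed in the Problem 6.12 code above can be written out as below. This is only a restatement read off the code, assuming it implements what appears to be the test for linear profiles: $\bar{x}_1, \bar{x}_2$ stand for `xbar1`, `xbar2`, $S_p$ for `Sp`, and the variable `mudiff` holds the contrast matrix applied to the sum (not the difference) of the two sample mean vectors.

```latex
T^2 = \left[C(\bar{x}_1+\bar{x}_2)\right]^{\top}
      \left[\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right) C S_p C^{\top}\right]^{-1}
      C(\bar{x}_1+\bar{x}_2),
\qquad
c^2 = \frac{(n_1+n_2-2)(p-2)}{n_1+n_2-p+1}\,F_{p-2,\;n_1+n_2-p+1}(\alpha),
\qquad
C = \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{bmatrix}
```

Under this reading, the hypothesis is rejected at level $\alpha$ when $T^2 > c^2$, that is, when `T2` exceeds `c2`.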

R code for Problem 6.20.

> # (a) load the data and plot it


> male <- read.table('./T6-11.dat')
> colnames(male) <- c('Tail lengths','Wing lengths')
> female <- read.table('./T5-12.dat')
> colnames(female) <- c('Tail lengths','Wing lengths')
> plot(male)
>
> # (b) test equality of mean
> library('biotools')
> # number of variables and alpha
> p <- 2
> alpha <- 0.05
>
> # combine the two data sets
> n1 <- dim(male)[1]
> n2 <- dim(female)[1]
> bird <- rbind(male,female)
> bird$gender <- c(rep('male',n1),rep('female',n2))
>
> # remove obs. 31
> bird <- bird[-31,]
> n1 <- n1-1
> box <- boxM(bird[,-3], bird[,3])
>
> # test for equality of mean vector
> xbar1 <- colMeans(bird[bird$gender=='male',-3])
> xbar2 <- colMeans(bird[bird$gender=='female',-3])
> S1 <- cov(bird[bird$gender=='male',-3])
> S2 <- cov(bird[bird$gender=='female',-3])
> Sp <- ((n1-1)*S1+(n2-1)*S2)/(n1+n2-2)
> Spinv <- solve((1/n1+1/n2)*Sp)
>
> # test statistic
> T2 <- (xbar1-xbar2)%*%Spinv%*%(xbar1-xbar2)
>
> # critical value
> c2 <- (n1+n2-2)*p/(n1+n2-p-1)*qf(1-alpha,df1=p,df2=n1+n2-p-1)
>
> # Alternative method: combine the two data sets and correct obs. 31 instead of removing it
> n1 <- dim(male)[1]
> n2 <- dim(female)[1]
> bird <- rbind(male,female)
> bird$gender <- c(rep('male',n1),rep('female',n2))
>
> # change observation 31 from x1=284 to x1=184
> bird[31,1] <- 184

>
> # Run Box's M-test
> box <- boxM(bird[,-3], bird[,3])
>
> # test for equality of mean vector
> xbar1 <- colMeans(bird[bird$gender=='male',-3])
> xbar2 <- colMeans(bird[bird$gender=='female',-3])
> S1 <- cov(bird[bird$gender=='male',-3])
> S2 <- cov(bird[bird$gender=='female',-3])
> Sp <- ((n1-1)*S1+(n2-1)*S2)/(n1+n2-2)
> Spinv <- solve((1/n1+1/n2)*Sp)
>
> # test statistic
> T2 <- (xbar1-xbar2)%*%Spinv%*%(xbar1-xbar2)
>
> # critical value
> c2 <- (n1+n2-2)*p/(n1+n2-p-1)*qf(1-alpha,df1=p,df2=n1+n2-p-1)
>
> # (c) Determine the 95% confidence region for mu1-mu2 and
> # simultaneous confidence intervals for the components of mu1-mu2
> center <- xbar1-xbar2
> npoints <- 1000
> theta <- seq(0, 2*pi, length = npoints)
> qlevel <- qf(1-alpha,df1=p,df2=n1+n2-p-1)
>
> r <- sqrt((1/n1+1/n2)*(n1+n2-2)*p/(n1+n2-p-1)*qlevel)
> v <- rbind(r*cos(theta), r*sin(theta))
> z <- backsolve(chol(solve(Sp)),v)+center
>
> # calculate the 95% simultaneous confidence interval
> LCIx <- center[1]-r*sqrt(Sp[1,1])
> UCIx <- center[1]+r*sqrt(Sp[1,1])
> LCIy <- center[2]-r*sqrt(Sp[2,2])
> UCIy <- center[2]+r*sqrt(Sp[2,2])
>
> # plot the ellipse for Scheffé's method
> plot(t(z),type='l',xlab=expression(mu[11]-mu[21]),ylab=expression(mu[12]-mu[22]))
> points(center[1], center[2],pch=19)
>
> # plot 95% simultaneous confidence interval
> abline(v=LCIx, lty=2)
> abline(v=UCIx, lty=2)
> abline(h=LCIy, lty=2)
> abline(h=UCIy, lty=2)

R code for Problem 6.24.

> skull <- read.table('./T6-13.dat')
> colnames(skull) <- c('MB','BH','BL','NH','Period')
> skull$Period <- as.factor(skull$Period)
> n <- dim(skull)[1]

> p <- dim(skull[,-5])[2]
> mfit <- lm(cbind(MB,BH,BL,NH) ~ Period, data=skull)
> mvafit <- manova(mfit)
> mfitsummary <- summary(mvafit)
>
> B <- mfitsummary$SS$Period
> W <- mfitsummary$SS$Residuals
>
> # Wilks' Lambda
> Lambda <- det(W)/(det(B+W))
> FV <- ((n-p-2)/p)*((1-sqrt(Lambda))/sqrt(Lambda))
> alpha <- 0.05
> qf(1-alpha,df1=2*p,df2=2*(n-p-2))
>
> # or one can do this
> summary(mvafit,test='Wilks')
>
> # pair comparison
> g <- 3
> n1 <- length(which((skull$Period==1)))
> n2 <- length(which((skull$Period==2)))
> n3 <- length(which((skull$Period==3)))
> n <- n1+n2+n3
> xbar1 <- colMeans(skull[skull$Period==1,-5])
> xbar2 <- colMeans(skull[skull$Period==2,-5])
> xbar3 <- colMeans(skull[skull$Period==3,-5])
> xbar <- (n1*xbar1+n2*xbar2+n3*xbar3)/(n1+n2+n3)
> S1 <- cov(skull[skull$Period==1,-5])
> S2 <- cov(skull[skull$Period==2,-5])
> S3 <- cov(skull[skull$Period==3,-5])
> W <- (n1-1)*S1+(n2-1)*S2+(n3-1)*S3
> qtlevel <- qt(1-alpha/(p*g*(g-1)),df=n-g)
> for ( i in 1:p ){
+ # \tau_{1i}-\tau_{2i}
+ LCI12 <- (xbar1[i]-xbar2[i])-qtlevel*sqrt(W[i,i]/(n-g)*(1/n1+1/n2))
+ UCI12 <- (xbar1[i]-xbar2[i])+qtlevel*sqrt(W[i,i]/(n-g)*(1/n1+1/n2))
+ cat("tau1[",i,"]-tau2[",i,"] belongs to (",LCI12,",",UCI12,")\n",sep="")
+
+ # \tau_{1i}-\tau_{3i}
+ LCI13 <- (xbar1[i]-xbar3[i])-qtlevel*sqrt(W[i,i]/(n-g)*(1/n1+1/n3))
+ UCI13 <- (xbar1[i]-xbar3[i])+qtlevel*sqrt(W[i,i]/(n-g)*(1/n1+1/n3))
+ cat("tau1[",i,"]-tau3[",i,"] belongs to (",LCI13,",",UCI13,")\n",sep="")
+
+ # \tau_{2i}-\tau_{3i}
+ LCI23 <- (xbar2[i]-xbar3[i])-qtlevel*sqrt(W[i,i]/(n-g)*(1/n2+1/n3))
+ UCI23 <- (xbar2[i]-xbar3[i])+qtlevel*sqrt(W[i,i]/(n-g)*(1/n2+1/n3))
+ cat("tau2[",i,"]-tau3[",i,"] belongs to (",LCI23,",",UCI23,")\n",sep="")
+ }
>
> # Check normality for each group

> S1inv <- solve(S1)
> skull1 <- skull[skull$Period==1,-5]
> datachisq <- diag(t(t(skull1)-xbar1) %*% S1inv %*% (t(skull1)-xbar1))
> qqplot(qchisq(ppoints(500),df=p),datachisq, main="",
+ xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> # the quantile function's argument is named q so it does not shadow the dimension p
> qqline(datachisq,distribution=function(q) qchisq(q, df = p))
>
> S2inv <- solve(S2)
> skull2 <- skull[skull$Period==2,-5]
> datachisq <- diag(t(t(skull2)-xbar2) %*% S2inv %*% (t(skull2)-xbar2))
> qqplot(qchisq(ppoints(500),df=p),datachisq,main="",
+ xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> qqline(datachisq,distribution=function(q) qchisq(q, df = p))
>
> S3inv <- solve(S3)
> skull3 <- skull[skull$Period==3,-5]
> datachisq <- diag(t(t(skull3)-xbar3) %*% S3inv %*% (t(skull3)-xbar3))
> qqplot(qchisq(ppoints(500),df=p),datachisq,main="",
+ xlab="Theoretical Quantiles",ylab="Sample Quantiles")
> qqline(datachisq,distribution=function(q) qchisq(q, df = p))

R code for Problem 6.37.


> turtle <- read.table('./T6-9.dat')
>
> library('biotools')
> box <- boxM(turtle[,-4], turtle[,4])

R code for Problem 6.39.

> anacondas <- read.table('./T6-19.dat')
> colnames(anacondas) <- c('length', 'weight', 'sex')
> # test equality of the covariance matrices
> library('biotools')
> box <- boxM(anacondas[,-3], anacondas[,3])
>
> # (a)
> p <- 2
> alpha <- 0.05
> n1 <- dim(anacondas[anacondas$sex=='M',])[1]
> n2 <- dim(anacondas[anacondas$sex=='F',])[1]
> xbar1 <- colMeans(anacondas[anacondas$sex=='M',-3])
> xbar2 <- colMeans(anacondas[anacondas$sex=='F',-3])
> S1 <- cov(anacondas[anacondas$sex=='M',-3])
> S2 <- cov(anacondas[anacondas$sex=='F',-3])
> Sp <- S1/n1+S2/n2
> Spinv <- solve(Sp)
>
> # test statistic
> T2 <- (xbar1-xbar2) %*% Spinv %*% (xbar1-xbar2)
>

> # critical value
> qchisq(1-alpha,df=p)
>
> # or one can simply apply MANOVA model
> fit.lm <- lm(cbind(length,weight)~sex, data=anacondas)
> fit.manova <- manova(fit.lm)
> summary(fit.manova, test="Wilks")
>
> # (c)
> Sp <- ((n1-1)*S1+(n2-1)*S2)/(n1+n2-2)
> qlevel <- qt(1-alpha/(2*p),df=n1+n2-2)
> LCIx <- (xbar1[1]-xbar2[1])-qlevel*sqrt((1/n1+1/n2)*Sp[1,1])
> UCIx <- (xbar1[1]-xbar2[1])+qlevel*sqrt((1/n1+1/n2)*Sp[1,1])
> LCIy <- (xbar1[2]-xbar2[2])-qlevel*sqrt((1/n1+1/n2)*Sp[2,2])
> UCIy <- (xbar1[2]-xbar2[2])+qlevel*sqrt((1/n1+1/n2)*Sp[2,2])
