
STAT7005 Multivariate Methods

2 Multivariate Normal and Related Distributions

2.1 Multivariate Normal Distribution


Definition

A random vector x is said to have a multivariate normal distribution
(multinormal distribution) if every linear combination of its components has a
univariate normal distribution.

Suppose a = (a1, a2)′ and x = (x1, x2)′. The multinormality of x requires that
a′x = a1x1 + a2x2 be univariate normal for all a1 and a2. Graphically, every
one-dimensional projection of the joint density of x must trace a normal curve.


Properties

1. If x is multinormal with mean vector µ and covariance matrix Σ, then for any
   constant vector a,

      a′x ∼ N(a′µ, a′Σa).

   Proof: Since E(a′x) = a′µ and Var(a′x) = a′Σa, the result follows from the
   fact that a′x is univariate normal. □

2. The m.g.f. of a multinormal random vector x with mean vector µ and
   covariance matrix Σ is given by

      Mx(t) = exp( t′µ + (1/2) t′Σt ).

   Thus, a multinormal distribution is identified by its mean vector µ and
   covariance matrix Σ. We use the notation x ∼ Np(µ, Σ).

   Hint: Mx(t) = E(exp(t′x)) = E(exp(y)), where t = (t1, · · · , tp)′ and
   y = t′x ∼ N(t′µ, t′Σt), since y is a linear combination of the components of x.

   Recall that the m.g.f. of a univariate normal x ∼ N(µ, σ²) is
   Mx(t) = exp(µt + (σ²/2)t²), and the kth moment is generated by
   E(xᵏ) = dᵏMx(t)/dtᵏ evaluated at t = 0.

3. Given x ∼ Np(µ1, Σ1) and y ∼ Np(µ2, Σ2). If x and y are independent, then

      x + y ∼ Np(µ1 + µ2, Σ1 + Σ2).

   Hint: Use property (2).

4. If x ∼ Np(µ, Σ), then for any constant m × p matrix A and constant m × 1
   vector d,

      Ax + d ∼ Nm(Aµ + d, AΣA′).

   Hint: Use property (2).

5. Given a positive definite (i.e. non-singular square, or invertible) matrix Σ.
   Then, x ∼ Np(µ, Σ) iff there exists a non-singular matrix B and z ∼ Np(0, I)
   such that

      x = µ + Bz.

   In this case, Σ = BB′.

   Hint: Use property (4) and the fact that the decomposition Σ = BB′ always
   exists.

   In the univariate case, the result can be regarded as x = µ + σz or
   z = (x − µ)/σ with z ∼ N(0, 1).
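   Property (5) is the standard recipe for simulating multinormal draws. A minimal
   R sketch, assuming Σ is positive definite and taking B as the (lower-triangular)
   Cholesky factor; function and variable names are illustrative:

## Simulate n draws from N_p(mu, Sigma) via x = mu + Bz, with Sigma = BB'
rmvnorm_chol <- function(n, mu, Sigma) {
  p <- length(mu)
  B <- t(chol(Sigma))                  # chol() gives upper-triangular U with U'U = Sigma
  Z <- matrix(rnorm(n * p), nrow = p)  # p x n matrix of i.i.d. N(0, 1) entries
  t(mu + B %*% Z)                      # add mu to each column; return an n x p matrix
}

set.seed(1)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
X <- rmvnorm_chol(1000, mu = c(0, 5), Sigma = Sigma)
colMeans(X); cov(X)                    # should be close to mu and Sigma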


6. The pdf of x ∼ Np(µ, Σ) is given by

      f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(x − µ)′Σ⁻¹(x − µ) }   (Σ is p.d.).

   Hint: Use property (5) and the transformation z = B⁻¹(x − µ).


     
7. Let x = (x1′, x2′)′, µ = (µ1′, µ2′)′ and

      Σ = [ Σ11  Σ12 ]
          [ Σ21  Σ22 ],

   where x1 consists of the first q components and x2 consists of the last (p − q)
   components.

   (a) x1 and x2 are independent iff Cov(x1, x2) = Σ12 = 0.

       Proof: (⇒) independence ⇒ Cov(x1, x2) = 0;
       (⇐) use the definition: f(x1, x2) = f(x1)f(x2). □

   (b) x1 ∼ Nq(µ1, Σ11) and x2 ∼ Np−q(µ2, Σ22).

       Hint: Use property (4).

   (c) (x1 − Σ12Σ22⁻¹x2) is independent of x2 and is distributed as
       Nq(µ1 − Σ12Σ22⁻¹µ2, Σ11 − Σ12Σ22⁻¹Σ21).

       Hint: Use property 7(a).

   (d) Given x2, x1 ∼ Nq(µ1 + Σ12Σ22⁻¹(x2 − µ2), Σ11 − Σ12Σ22⁻¹Σ21).

       Hint: Use property 7(c).

   Property 7(d) implies E(x1|x2) = µ1 + Σ12Σ22⁻¹(x2 − µ2) and
   Var(x1|x2) = Σ11 − Σ12Σ22⁻¹Σ21, which does not change with the value of x2.
   Indeed, the results of property 7(d) are related to the multivariate linear
   regression model. To summarize, the marginal and conditional pdfs of a
   multivariate normal distribution are still multivariate normal.
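   Property 7(d) translates directly into a computation on the partitioned mean
   vector and covariance matrix. A minimal R sketch with illustrative numbers
   (q = 1, p = 3; all names are illustrative):

## Conditional mean and variance of x1 given x2 (property 7(d))
mu    <- c(1, 2, 3)
Sigma <- matrix(c(4, 1, 1,
                  1, 3, 1,
                  1, 1, 2), 3, 3)
q   <- 1
S12 <- Sigma[1:q, -(1:q), drop = FALSE]
S22 <- Sigma[-(1:q), -(1:q)]

x2 <- c(2.5, 2.0)   # an observed value of the last p - q components
cond_mean <- mu[1:q] + S12 %*% solve(S22, x2 - mu[-(1:q)])   # E(x1 | x2)
cond_var  <- Sigma[1:q, 1:q] - S12 %*% solve(S22, t(S12))    # Var(x1 | x2)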

8. Given E(x) = µ, Var(x) = Σ, and a p × p symmetric matrix A. Then,

      E(x′Ax) = µ′Aµ + tr(AΣ).

   (This statement is also true for any random x, not necessarily multinormal.)

   Exercise.

9. Given x ∼ Np(µ, Σ) with Σ p.d. Then,

      (x − µ)′Σ⁻¹(x − µ) ∼ χ²(p).

   Proof: From property (5), we have z = (z1, . . . , zp)′ = B⁻¹(x − µ) ∼ Np(0, I),
   and hence z1, . . . , zp are i.i.d. N(0, 1). The result follows from the fact that
   z′z = Σᵢ₌₁ᵖ zᵢ² ∼ χ²(p). □

10. Let x ∼ Np(µ, Σ) with Σ p.d. Then, for any m × p matrix A and n × p
    matrix B,

    (a) Ax is independent of Bx iff AΣB′ = 0.

    (b) x′Ax (A symmetric) is independent of Bx iff BΣA = 0.

    (c) x′Ax and x′Bx (A and B both symmetric) are independent iff AΣB = 0.

    (a) is an exercise; for proofs of (b) and (c), see Searle (1971, Linear Models,
    pp. 59–60).

2.2 Estimators of µ and Σ and Their Sampling Distributions

Suppose x1, . . . , xn are i.i.d. Np(µ, Σ) where Σ is positive definite. The sample
mean vector x̄ and the sample covariance matrix S defined in Section 1.4 are
unbiased estimators. By the Law of Large Numbers, these sample quantities
converge to µ and Σ.

As shown in what follows, the method of Maximum Likelihood Estimation (MLE)
also gives x̄ as the MLE of µ, but the MLE of Σ is slightly different from S,
namely (n − 1)S/n, which is close to S when n is large.

The likelihood function is

   L(µ, Σ) = f(x1, x2, . . . , xn)
           = f(x1)f(x2) · · · f(xn)        (x1, x2, . . . , xn are independent)
           = ∏ᵢ₌₁ⁿ f(xi)
           = ∏ᵢ₌₁ⁿ (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(xi − µ)′Σ⁻¹(xi − µ) }
           = (2π)^{−np/2} |Σ|^{−n/2} exp{ −(1/2) Σᵢ₌₁ⁿ (xi − µ)′Σ⁻¹(xi − µ) },


and thus the log-likelihood function is

   ℓ(µ, Σ) = log L(µ, Σ)
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) Σᵢ₌₁ⁿ (xi − µ)′Σ⁻¹(xi − µ)
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) Σᵢ₌₁ⁿ tr[(xi − µ)′Σ⁻¹(xi − µ)]
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) tr[ Σ⁻¹ Σᵢ₌₁ⁿ (xi − µ)(xi − µ)′ ].

By the method of completing squares,

   Σᵢ₌₁ⁿ (xi − µ)(xi − µ)′ = Σᵢ₌₁ⁿ (xi − x̄)(xi − x̄)′ + n(x̄ − µ)(x̄ − µ)′
                           = W + n(x̄ − µ)(x̄ − µ)′.

We therefore have

   ℓ(µ, Σ) = −(np/2) log(2π) − (n/2)[ log |Σ| + tr(Σ⁻¹ W/n) ]
             − (n/2)(x̄ − µ)′Σ⁻¹(x̄ − µ).                                (2.2.1)

Since (x̄ − µ)′Σ⁻¹(x̄ − µ) ≥ 0 for any µ, we have

   ℓ(µ, Σ) ≤ ℓ(x̄, Σ) = −(np/2) log(2π) − (n/2)[ log |Σ| + tr(Σ⁻¹ W/n) ].   (2.2.2)

Since the matrices Σ and W/n are positive definite, the function

   g(Σ) = log |Σ| + tr(Σ⁻¹ W/n)                                         (2.2.3)

attains its minimum at Σ = W/n by means of inequality (5) on page 14 of
Appendix 1 of Chapter 1. Then,

   ℓ(µ, Σ) ≤ ℓ(x̄, W/n) = −(np/2) log(2π) − np/2 − (n/2) log |W/n|.

Hence, the MLEs of µ and Σ are respectively

   µ̂ = x̄   and   Σ̂ = W/n = (n − 1)S/n.                               (2.2.4)
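In R, the MLEs in (2.2.4) can be computed from an n × p data matrix in two
lines; a minimal sketch (function name is illustrative):

## MLEs of mu and Sigma from an n x p data matrix X
mle_multinormal <- function(X) {
  n <- nrow(X)
  list(mu    = colMeans(X),            # MLE of mu: the sample mean vector
       Sigma = (n - 1) * cov(X) / n)   # MLE of Sigma: W/n = (n - 1)S/n
}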
Properties

1. x̄ and S are sufficient statistics for Np(µ, Σ), i.e., the conditional distribution
   of the sample (x1, . . . , xn) given x̄ and S does not depend on µ and Σ.

2. x̄ ∼ Np(µ, (1/n)Σ) and (n − 1)S ∼ Wp(n − 1, Σ), a central Wishart
   distribution, which is defined in the next section.

3. µ̂ is unbiased but Σ̂ is biased. However, S is unbiased for Σ.

4. The MLEs possess an invariance property: if the MLE of θ is θ̂, then the MLE
   of φ = h(θ) is φ̂ = h(θ̂), provided that h(·) is a one-to-one function.


2.3 Wishart Distribution

Definition

Suppose xi (i = 1, . . . , k) are independent Np(µi, Σ). Define a symmetric p × p
matrix V as

   V = Σᵢ₌₁ᵏ xi xi′ = X′X.

Then, V is said to follow a Wishart distribution, denoted by Wp(k, Σ, Ψ), where
Σ is called the scaling matrix, k the degrees of freedom, and Ψ = Σᵢ₌₁ᵏ µi µi′ the
(p × p symmetric) noncentrality matrix. Indeed, the Wishart distribution can be
considered a multivariate extension of the chi-squared distribution.

When Ψ = 0, the Wishart distribution is called the central Wishart distribution,
denoted simply by Wp(k, Σ).

Note that when p = 1 and Σ = σ², the Wishart distribution Wp(k, Σ, Ψ) reduces
to a scaled non-central chi-squared distribution, σ²χ²(k, Σᵢ₌₁ᵏ µᵢ²). In the
univariate case, the pdf of a central Wishart distribution therefore reduces to that
of a (scaled) central chi-squared distribution.

Properties

1. If V ∼ Wp(k, Σ), the pdf of V is given by

      f(V) = c(p, k) |Σ|^{−k/2} |V|^{(k−p−1)/2} exp{ −(1/2) tr(Σ⁻¹V) },

   where c(p, k) = [ 2^{kp/2} π^{p(p−1)/4} Γ(k/2) Γ((k−1)/2) · · · Γ((k−p+1)/2) ]⁻¹.


2. If V1 ∼ Wp(k1, Σ, Ψ1) and V2 ∼ Wp(k2, Σ, Ψ2) are independent, then
   V1 + V2 ∼ Wp(k1 + k2, Σ, Ψ1 + Ψ2).

   Proof: Write V1 = Σᵢ₌₁ᵏ¹ xi xi′ and V2 = Σᵢ₌₁ᵏ² yi yi′, and let x_{k1+i} = yi,
   i = 1, . . . , k2. Then V1 + V2 = Σᵢ₌₁ᵏ¹⁺ᵏ² xi xi′ and the result follows. □

   This implies that the sum of two independent Wishart random matrices with the
   same scaling matrix is again a Wishart random matrix.
3. If V ∼ Wp(k, Σ, Ψ), then AVA′ ∼ Wq(k, AΣA′, AΨA′) for any given q × p
   matrix A; it is central if, in addition, AΨA′ = 0.

   Exercise.

4. If V ∼ Wp(k, Σ, Ψ), then a′Va/a′Σa ∼ χ²(k, a′Ψa). In particular, for the ith
   diagonal element of V, vii/σii ∼ χ²(k, Ψii).

   N.B. a is not a random vector. Hint: Use property (3).

5. If y is any random vector independent of V ∼ Wp(k, Σ), then
   y′Vy/y′Σy ∼ χ²(k) and is independent of y.
6. Let X be the k × p data matrix given in the Definition and M = [µ1 · · · µk]′.
   Then, we have

   (a) For a symmetric A, X′AX ∼ Wp(r, Σ, Ψ) iff A² = A (i.e. A is idempotent),
       in which case r = rank(A) = tr(A) and Ψ = M′AM.

   (b) For symmetric idempotent matrices A and B, X′AX and X′BX are
       independent iff AB = 0.

   (c) For a symmetric idempotent A, X′AX and X′B are independent iff AB = 0.

   Proof: See Seber (1984, Multivariate Observations, pp. 24–25). □

7. Let x1, . . . , xn be a random sample from Np(µ, Σ). Then (see the simulation
   sketch at the end of these properties):

   (a) x̄ ∼ Np(µ, (1/n)Σ).

   (b) (n − 1)S (= W, the CSSP matrix) ∼ Wp(n − 1, Σ).

   (c) x̄ and S are independent.

   Hint: (a) Use x̄ = (x1 + · · · + xn)/n; (b) use property 6(a) with A = I − (1/n)J;
   (c) use property 6(c) with B = (1/n)1.
8. In the Cholesky decomposition V = TT′ of V ∼ Wp(k, I), where T = (tij)
   is lower triangular (i.e. tij = 0 if i < j), all non-zero elements are mutually
   independent, with tij ∼ N(0, 1) for i > j and tii ∼ χ(k − i + 1).

9. If y is any random vector independent of V ∼ Wp(k, Σ), k > p, then the ratio
   y′Σ⁻¹y / y′V⁻¹y ∼ χ²(k − p + 1) and is independent of y. [Note the difference
   from property (5) above.]
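Property 7(b) can be checked empirically. The sketch below (all names
illustrative) simulates many samples and compares the average of (n − 1)S with
the average of direct Wp(n − 1, Σ) draws from the base-R rWishart() sampler and
with the theoretical mean (n − 1)Σ:

## Empirical check of property 7(b): (n - 1)S ~ W_p(n - 1, Sigma)
set.seed(42)
n <- 10; reps <- 5000
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
U <- chol(Sigma)                           # upper triangular, U'U = Sigma

W_mean <- matrix(0, 2, 2)
for (r in 1:reps) {
  X <- matrix(rnorm(n * 2), n, 2) %*% U    # n rows, each a draw from N_2(0, Sigma)
  W_mean <- W_mean + (n - 1) * cov(X) / reps
}
W_mean                                           # average of simulated (n - 1)S
apply(rWishart(reps, n - 1, Sigma), 1:2, mean)   # average of W_p(n - 1, Sigma) draws
(n - 1) * Sigma                                  # theoretical mean of both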


2.4 Assessing the Normality Assumption

Before any statistical modeling, it is crucial to verify whether the data at hand
satisfy the underlying distributional assumptions. For most multivariate analyses,
it is important that the data indeed follow the multivariate normal distribution,
at least approximately if not exactly. Here are some commonly used methods.

1. Check each variable for univariate normality (necessary for multinormality
   but not sufficient)

   • Q-Q plot (quantile against quantile plot) for the normal distribution

     – sample quantiles are plotted against the expected sample quantiles of a
       standard normal distribution
     – a straight line indicates univariate normality
     – non-linearity may indicate a need to transform the variable.

     For example, if x is close to a normal distribution, the corresponding Q-Q
     plot is close to the reference line. Note that the x values in such plots are
     standardized, i.e. the mean is subtracted and the result is divided by the
     standard deviation.

   An alternative device is the P-P plot (probability against probability plot),
   which plots the sample cdf against the theoretical cdf.


In contrast, if x is not close to a normal distribution, the Q-Q plot bends away
from the reference line.

   • Shapiro-Wilk W test

     – the test statistic is a modified version of the squared sample correlation
       between the sample quantiles and the expected quantiles.

   • Kolmogorov-Smirnov-Lilliefors (KSL) test (large sample, at least hundreds)

     – the test statistic is the maximum difference between the empirical cdf
       and the normal cdf.

   • Test for zero skewness

     – for a symmetric distribution, the skewness statistic

          [n/((n − 1)(n − 2))] Σᵢ₌₁ⁿ (xi − x̄)³/s³,

       see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.

   • Test for zero excess kurtosis

     – for a normal distribution, the excess kurtosis statistic

          [n(n + 1)/((n − 1)(n − 2)(n − 3))] Σᵢ₌₁ⁿ (xi − x̄)⁴/s⁴ − 3(n − 1)²/((n − 2)(n − 3)),

       see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.
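   Minimal R sketches of these univariate checks, assuming the sample is stored
   in a numeric vector x (placeholder data shown; note that the plain ks.test()
   call does not apply the Lilliefors correction):

## Univariate normality checks for a numeric vector x
x <- rnorm(50)                # placeholder sample; replace with your data
z <- scale(x)                 # standardize: subtract mean, divide by sd
qqnorm(z); qqline(z)          # Q-Q plot; points near the line suggest normality
shapiro.test(x)               # Shapiro-Wilk W test
ks.test(z, "pnorm")           # K-S test against N(0, 1), without Lilliefors correction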


2. When n − p is large enough, we make use of property (9) of Section 2.1: check
   whether the squared generalized distances defined below follow a chi-squared
   distribution, using a Q-Q plot (a necessary and sufficient condition for very
   large sample sizes). A minimal R sketch follows this list.

   • Define the squared generalized distance as d²ᵢ = (xi − x̄)′S⁻¹(xi − x̄),
     i = 1, . . . , n.
   • Order d²1, d²2, . . . , d²n as d²(1) ≤ d²(2) ≤ · · · ≤ d²(n).
   • Plot χ²(i) vs d²(i), where χ²(i) is the 100(i − 1/2)/n percentile of the χ²(p)
     distribution.
   • A straight line indicates multivariate normality.
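   A minimal sketch of this chi-squared plot, assuming the data sit in an n × p
   matrix X (function name is illustrative):

## Chi-squared Q-Q plot of squared generalized (Mahalanobis) distances
chisq_plot <- function(X) {
  n <- nrow(X); p <- ncol(X)
  d2 <- mahalanobis(X, colMeans(X), cov(X))   # d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar)
  q  <- qchisq((1:n - 0.5) / n, df = p)       # 100(i - 1/2)/n percentiles of chi^2(p)
  plot(q, sort(d2), xlab = "Chi-square quantile",
       ylab = "Ordered squared Mahalanobis distance")
  abline(0, 1)                                # reference line for multinormality
}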

For example, if x = (x1, x2, x3)′ is close to a trivariate normal distribution, the
corresponding chi-squared plot is close to the reference line, where the "Ordered
Squared Mahalanobis Distance" and "Chi-square" axes represent the values of the
d²(i)'s and χ²(i)'s respectively. In contrast, if x is not close to a trivariate normal
distribution, the chi-squared plot departs markedly from the reference line.


3. Check each Principal Component (PC) for univariate normality (a necessary
   condition; and, if the sample size n is large enough, a sufficient condition).

2.5 Transformations to Near Normality

To achieve multinormality of the data, a univariate transformation is applied to
each variable individually. Afterwards, the multinormality of the transformed
variables is checked again. The following transformations are commonly used in
practice:

1. Use some common transformation such as log x.

2. Choose a transformation based on theory, for example a variance-stabilizing
   transformation:

   • Poisson count data: √x

   • Binomial proportion: sin⁻¹(√x)

3. Use the univariate Box-Cox transformation

   The transformed xi is denoted by xi^[λ], where

      xi^[λ] = (xi^λ − 1)/λ   for λ ≠ 0,   i = 1, . . . , n
      xi^[λ] = log xi         for λ = 0,   i = 1, . . . , n

   and λ is an unknown parameter. Typically, λ is chosen from prior information,
   or by searching among the following λ values:

      Power, λ   Transformation
      3          cube
      2          square
      1          no transform
      0.5        square root
      1/3        cube root
      0          log
      −1/3       inverse cube root
      −0.5       inverse square root
      −1         inverse
      −2         inverse square
      −3         inverse cube

   Since the xi^[λ]'s are on different scales for different λ, it is difficult to
   compare them directly to find the optimal λ value. We instead consider the
   following scaled power transformation:


      xi^[λ] = (xi^λ − 1) / (λ [GM(x)]^(λ−1))   for λ ≠ 0,   i = 1, . . . , n
      xi^[λ] = GM(x) log xi                     for λ = 0,   i = 1, . . . , n

   where GM(x) = (x1 x2 · · · xn)^(1/n) is the geometric mean of the observations
   x1, x2, . . . , xn. We choose the value of λ that minimizes the sum of squared
   residuals (equivalently, the sample variance) of the transformed values.
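   A minimal R sketch of this criterion, assuming a positive data vector x and a
   simple grid search (names illustrative; packaged routines such as the
   BoxCox.lambda() call used in Example 2.2 below give similar results):

## Scaled Box-Cox transform and grid search for the lambda that minimizes
## the sample variance of the transformed values
boxcox_scaled <- function(x, lambda) {
  gm <- exp(mean(log(x)))                     # geometric mean GM(x)
  if (abs(lambda) < 1e-8) gm * log(x)
  else (x^lambda - 1) / (lambda * gm^(lambda - 1))
}

best_lambda <- function(x, grid = seq(-3, 3, by = 0.05)) {
  grid[which.min(sapply(grid, function(l) var(boxcox_scaled(x, l))))]
}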

   For example, after a Box-Cox transformation of x with λ = 2, the transformed
   variable x² can become closer to the normal distribution (note that the y-axis
   scales of the before-and-after plots differ).

4. Box-Cox transformation for multivariate data

   The simplest way is to transform each variable by a univariate Box-Cox
   transformation with its own parameter λ, obtaining an individual optimal λ
   value for each of the x variables. Notice that each x variable being normally
   distributed after transformation does not imply that they jointly follow a
   multivariate normal distribution; rather, the multivariate normality of
   (X1^[λ1], X2^[λ2], . . . , Xp^[λp]) is assumed.


Remarks

1. The transformations above require x ≥ 0 and some of them require x > 0.

2. The Box-Cox transformation cannot guarantee that the transformed variable is
   close to the normal distribution.

For more general transformation methods, see Sakia (1992, The Statistician,
169–178).

Example 2.1 Mosteller and Tukey (1977) considered data from The Coleman
Report on the relationships between several variables and mean verbal test scores
for sixth graders at 20 schools in the New England and Mid-Atlantic regions of the
United States. Two variables, x1 = the percentage of sixth-graders’ fathers employed
in white collar jobs and x2 = one-half of the sixth-graders’ mothers’ mean number
of years of schooling, are given in the following table.

x1 x2 x1 x2
28.87 6.19 12.20 5.62
20.10 5.17 22.55 5.34
69.05 7.04 14.30 5.80
65.40 7.10 31.79 6.19
29.59 6.15 11.60 5.62
44.82 6.41 68.47 6.94
77.37 6.86 42.64 6.33
24.67 5.78 16.70 6.01
65.01 6.51 86.27 7.51
9.99 5.57 76.73 6.96

The estimated joint density of x1 and x2 is clearly bimodal. Transforming x1 into
x1^(1/100) improves the bimodal situation, even though the problem is not
completely solved.


Example 2.2 The data below were collected in an experiment on jute in Bishnupur
village of West Bengal, India, in which the weights of green jute plants (x1 ) and
their dry jute fibers (x2 ) were recorded for 20 randomly selected individual plants.

Plant No. x1 x2 Plant No. x1 x2


1 68 971 11 33 462
2 63 892 12 27 352
3 70 1125 13 21 305
4 6 82 14 5 84
5 65 931 15 14 229
6 9 112 16 27 332
7 10 162 17 17 185
8 12 321 18 53 703
9 20 315 19 62 872
10 30 375 20 65 740

1. Examine the univariate normality of x1 and x2 .

2. Examine the bivariate normality of x1 and x2 by chi-square plot.

3. Examine the bivariate normality of x1 and x2 by principal component method.

4. Assuming bivariate normality of (x1, x2),

   (a) Compute Hotelling's T² = n(x̄ − µ0)′S⁻¹(x̄ − µ0) statistic, where µ0 is
       the hypothesized mean vector, for testing H0: µ1 = 50, µ2 = 1000.

   (b) Find the maximum likelihood estimate for E(x1|x2) in terms of x2. What
       is the maximum likelihood estimate of E(x1|x2 = 200)?

5. Determine the optimal Box-Cox transformation for each of x1 and x2 . Examine


the bivariate normality of the transformed variables.


Solution

1. Graphically, the Q-Q plots for x1 and x2 are both far from the reference line,
   so neither variable appears to be normally distributed.

   We also have the results of the Kolmogorov-Smirnov and Shapiro-Wilk tests.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk x1 0.8585 0.0074 NO
2 Shapiro-Wilk x2 0.8893 0.0261 NO

> describe(plant[,2:3],type=2)
vars n mean sd median trimmed mad min max
x1 1 20 33.85 23.88 27 33.0 25.95 5 70
x2 2 20 477.50 335.27 342 455.5 303.93 82 1125
range skew kurtosis se
x1 65 0.42 -1.55 5.34
x2 1043 0.60 -1.09 74.97

One-sample Kolmogorov-Smirnov test


data: plant$x1
D = 0.18072, p-value = 0.5308
alternative hypothesis: two-sided

One-sample Kolmogorov-Smirnov test


data: plant$x2
D = 0.22009, p-value = 0.2482
alternative hypothesis: two-sided

Here, only the results of the Shapiro-Wilk test are considered, because the sample
size is small and the Kolmogorov-Smirnov test is valid only for large samples.
From the Shapiro-Wilk test, both x1 and x2 are non-normal.


2. The chi-square plot is clearly not close to the reference line. Therefore, x1
   and x2 may not be bivariate normal.

3. A principal component analysis is carried out, and the normality tests are
   applied to the principal component scores.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk Comp.1 0.8892 0.0260 NO
2 Shapiro-Wilk Comp.2 0.8813 0.0187 NO

vars n mean sd median trimmed mad min
PC1 1 20 0 336.08 135.65 22.01 305.00 -648.45
PC2 2 20 0 4.78 -0.01 0.22 2.76 -10.90
max range skew kurtosis se
PC1 396.48 1044.92 -0.6 -1.09 75.15
PC2 12.79 23.69 0.1 3.15 1.07

Neither the first nor the second component is normally distributed. Hence, x1 and
x2 are not bivariate normal.

4. The sample covariance matrix and sample mean vector are given as follows.

> sx
x1 x2
x1 570.450 7845.079
x2 7845.079 112404.263
> mx
x1 x2
33.85 477.50


(a) Using the formula T² = n(x̄ − µ0)′S⁻¹(x̄ − µ0) with µ0 = (50, 1000)′, the
    computed T² is 408.8485.
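    A sketch of this computation, reusing the sx and mx objects printed above:

## Hotelling's T^2 for H0: mu = (50, 1000)'
mu0 <- c(50, 1000)
n   <- 20
T2  <- n * t(mx - mu0) %*% solve(sx) %*% (mx - mu0)
T2                                   # 408.8485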
(b) The MLE of E(x1|x2) is given by

       Ê(x1|x2) = x̄1 + (s12/s22)(x2 − x̄2),

    where s12 = Σᵢ₌₁ⁿ (xi1 − x̄1)(xi2 − x̄2)/(n − 1) and
    s22 = Σᵢ₌₁ⁿ (xi2 − x̄2)²/(n − 1). Note that this formula is the same as that
    for simple linear regression.

    Given x2 = 200, we have Ê(x1|x2 = 200) = 14.4823.
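    The same estimate can be read off the sx and mx objects above; a sketch:

## MLE of E(x1 | x2 = 200) from the sample moments above
s12 <- sx[1, 2]; s22 <- sx[2, 2]
mx[1] + (s12 / s22) * (200 - mx[2])  # 14.4823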

5. The optimal λ’s for x1 and x2 are given as follows.

> lambda <- BoxCox.lambda( plant$x1, method="loglik" )
> lambda
[1] 0.25
> lambda <- BoxCox.lambda( plant$x2, method="loglik" )
> lambda
[1] 0.25

Therefore,

   x1^[λ] = (x1^0.25 − 1)/0.25,   x2^[λ] = (x2^0.25 − 1)/0.25.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk tx1 0.9166 0.0852 YES
2 Shapiro-Wilk tx2 0.9382 0.2217 YES

$Descriptives
n Mean Std.Dev Median Min Max 25th
tx1 20 5.142587 1.878541 5.118028 1.981395 7.57003 3.664219
tx2 20 13.771677 3.527091 13.200129 8.036867 19.16584 11.358266
75th Skew Kurtosis
tx1 7.235512 -0.08937166 -1.461265
tx2 17.081050 -0.05304673 -1.322734


The Shapiro-Wilk test shows that the transformed variables are univariate normal.
However, the chi-square plot shows that they are still not bivariate normal.
