
STAT7005 Multivariate Methods

2 Multivariate Normal and Related Distributions

2.1 Multivariate Normal Distribution


Definition

A random vector x is said to have a multivariate normal distribution
(multinormal distribution) if every linear combination of its components has a
univariate normal distribution.

Suppose a = (a1, a2)′ and x = (x1, x2)′. The multinormality of x requires that
a′x = a1x1 + a2x2 be univariate normal for all a1 and a2. Graphically, every
one-dimensional projection of the joint density of x must trace a normal curve.


Properties

1. If x is multinormal with mean vector µ and covariance matrix Σ, then for any
   constant vector a,

      a′x ∼ N(a′µ, a′Σa).

   Proof: Since E(a′x) = a′µ and Var(a′x) = a′Σa, the result follows from the
   fact that a′x is univariate normal. □

2. The m.g.f. of a multinormal random vector x with mean vector µ and
   covariance matrix Σ is given by

      Mx(t) = exp( t′µ + (1/2) t′Σt ).

   Thus, a multinormal distribution is identified by its mean vector µ and
   covariance matrix Σ. We use the notation x ∼ Np(µ, Σ).

   Hint: Mx(t) = E(exp(t′x)) = E(exp(y)), where t = (t1, · · · , tp)′ and
   y = t′x ∼ N(t′µ, t′Σt), since y is a linear combination of the components of x.

   Recall that the m.g.f. of a univariate normal x ∼ N(µ, σ²) is
   Mx(t) = exp(µt + (σ²/2)t²), and the kth moment is generated by
   E(xᵏ) = dᵏMx(t)/dtᵏ evaluated at t = 0.

3. Given x ∼ Np(µ1, Σ1) and y ∼ Np(µ2, Σ2). If x and y are independent, then

      x + y ∼ Np(µ1 + µ2, Σ1 + Σ2).

   Hint: Use property (2).

4. If x ∼ Np(µ, Σ), then for any constant m × p matrix A and constant m × 1
   vector d,

      Ax + d ∼ Nm(Aµ + d, AΣA′).

   Hint: Use property (2).

5. Given a positive definite (i.e. non-singular square, or invertible) matrix Σ.
   Then, x ∼ Np(µ, Σ) iff there exists a non-singular matrix B and z ∼ Np(0, I)
   such that

      x = µ + Bz.

   In this case, Σ = BB′.

   Hint: Use property (4) and the fact that the decomposition Σ = BB′ always
   exists.

   In the univariate case, the result can be regarded as x = µ + σz or
   z = (x − µ)/σ with z ∼ N(0, 1).
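   Property (5) is the standard recipe for simulating multinormal draws. A minimal
   R sketch, assuming Σ is positive definite and taking B as the (lower-triangular)
   Cholesky factor; function and variable names are illustrative:

## Simulate n draws from N_p(mu, Sigma) via x = mu + Bz, with Sigma = BB'
rmvnorm_chol <- function(n, mu, Sigma) {
  p <- length(mu)
  B <- t(chol(Sigma))                  # chol() gives upper-triangular U with U'U = Sigma
  Z <- matrix(rnorm(n * p), nrow = p)  # p x n matrix of i.i.d. N(0, 1) entries
  t(mu + B %*% Z)                      # add mu to each column; return an n x p matrix
}

set.seed(1)
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
X <- rmvnorm_chol(1000, mu = c(0, 5), Sigma = Sigma)
colMeans(X); cov(X)                    # should be close to mu and Sigma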


6. The pdf of x ∼ Np(µ, Σ) is given by

      f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(x − µ)′Σ⁻¹(x − µ) }   (Σ is p.d.).

   Hint: Use property (5) and the transformation z = B⁻¹(x − µ).


     
7. Let x = (x1′, x2′)′, µ = (µ1′, µ2′)′ and

      Σ = [ Σ11  Σ12 ]
          [ Σ21  Σ22 ],

   where x1 consists of the first q components and x2 consists of the last (p − q)
   components.

   (a) x1 and x2 are independent iff Cov(x1, x2) = Σ12 = 0.

       Proof: (⇒) independence ⇒ Cov(x1, x2) = 0;
       (⇐) use the definition: f(x1, x2) = f(x1)f(x2). □

   (b) x1 ∼ Nq(µ1, Σ11) and x2 ∼ Np−q(µ2, Σ22).

       Hint: Use property (4).

   (c) (x1 − Σ12Σ22⁻¹x2) is independent of x2 and is distributed as
       Nq(µ1 − Σ12Σ22⁻¹µ2, Σ11 − Σ12Σ22⁻¹Σ21).

       Hint: Use property 7(a).

   (d) Given x2, x1 ∼ Nq(µ1 + Σ12Σ22⁻¹(x2 − µ2), Σ11 − Σ12Σ22⁻¹Σ21).

       Hint: Use property 7(c).

   Property 7(d) implies E(x1|x2) = µ1 + Σ12Σ22⁻¹(x2 − µ2) and
   Var(x1|x2) = Σ11 − Σ12Σ22⁻¹Σ21, which does not change with the value of x2.
   Indeed, the results of property 7(d) are related to the multivariate linear
   regression model. To summarize, the marginal and conditional pdfs of a
   multivariate normal distribution are still multivariate normal.
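   Property 7(d) translates directly into a computation on the partitioned mean
   vector and covariance matrix. A minimal R sketch with illustrative numbers
   (q = 1, p = 3; all names are illustrative):

## Conditional mean and variance of x1 given x2 (property 7(d))
mu    <- c(1, 2, 3)
Sigma <- matrix(c(4, 1, 1,
                  1, 3, 1,
                  1, 1, 2), 3, 3)
q   <- 1
S12 <- Sigma[1:q, -(1:q), drop = FALSE]
S22 <- Sigma[-(1:q), -(1:q)]

x2 <- c(2.5, 2.0)   # an observed value of the last p - q components
cond_mean <- mu[1:q] + S12 %*% solve(S22, x2 - mu[-(1:q)])   # E(x1 | x2)
cond_var  <- Sigma[1:q, 1:q] - S12 %*% solve(S22, t(S12))    # Var(x1 | x2)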

8. Given E(x) = µ, Var(x) = Σ, and a p × p symmetric matrix A. Then,

      E(x′Ax) = µ′Aµ + tr(AΣ).

   (This statement is also true for any random x, not necessarily multinormal.)

   Exercise.

9. Given x ∼ Np(µ, Σ) with Σ p.d. Then,

      (x − µ)′Σ⁻¹(x − µ) ∼ χ²(p).

   Proof: From property (5), we have z = (z1, . . . , zp)′ = B⁻¹(x − µ) ∼ Np(0, I),
   and hence z1, . . . , zp are i.i.d. N(0, 1). The result follows from the fact that
   z′z = Σᵢ₌₁ᵖ zᵢ² ∼ χ²(p). □

10. Let x ∼ Np(µ, Σ) with Σ p.d. Then, for any m × p matrix A and n × p
    matrix B,

    (a) Ax is independent of Bx iff AΣB′ = 0.

    (b) x′Ax (A symmetric) is independent of Bx iff BΣA = 0.

    (c) x′Ax and x′Bx (A and B both symmetric) are independent iff AΣB = 0.

    (a) is an exercise; for proofs of (b) and (c), see Searle (1971, Linear Models,
    pp. 59–60).

2.2 Estimators of µ and Σ and Their Sampling Distributions

Suppose x1, . . . , xn are i.i.d. Np(µ, Σ) where Σ is positive definite. The sample
mean vector x̄ and the sample covariance matrix S defined in Section 1.4 are
unbiased estimators. By the Law of Large Numbers, these sample quantities
converge to µ and Σ.

As shown in what follows, the method of Maximum Likelihood Estimation (MLE)
also gives x̄ as the MLE of µ, but the MLE of Σ is slightly different from S,
namely (n − 1)S/n, which is close to S when n is large.

The likelihood function is

   L(µ, Σ) = f(x1, x2, . . . , xn)
           = f(x1)f(x2) · · · f(xn)        (x1, x2, . . . , xn are independent)
           = ∏ᵢ₌₁ⁿ f(xi)
           = ∏ᵢ₌₁ⁿ (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(xi − µ)′Σ⁻¹(xi − µ) }
           = (2π)^{−np/2} |Σ|^{−n/2} exp{ −(1/2) Σᵢ₌₁ⁿ (xi − µ)′Σ⁻¹(xi − µ) },


and thus the log-likelihood function is

   ℓ(µ, Σ) = log L(µ, Σ)
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) Σᵢ₌₁ⁿ (xi − µ)′Σ⁻¹(xi − µ)
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) Σᵢ₌₁ⁿ tr[(xi − µ)′Σ⁻¹(xi − µ)]
           = −(np/2) log(2π) − (n/2) log |Σ| − (1/2) tr[ Σ⁻¹ Σᵢ₌₁ⁿ (xi − µ)(xi − µ)′ ].

By the method of completing squares,

   Σᵢ₌₁ⁿ (xi − µ)(xi − µ)′ = Σᵢ₌₁ⁿ (xi − x̄)(xi − x̄)′ + n(x̄ − µ)(x̄ − µ)′
                           = W + n(x̄ − µ)(x̄ − µ)′.

We therefore have

   ℓ(µ, Σ) = −(np/2) log(2π) − (n/2)[ log |Σ| + tr(Σ⁻¹ W/n) ]
             − (n/2)(x̄ − µ)′Σ⁻¹(x̄ − µ).                                (2.2.1)

Since (x̄ − µ)′Σ⁻¹(x̄ − µ) ≥ 0 for any µ, we have

   ℓ(µ, Σ) ≤ ℓ(x̄, Σ) = −(np/2) log(2π) − (n/2)[ log |Σ| + tr(Σ⁻¹ W/n) ].   (2.2.2)

Since the matrices Σ and W/n are positive definite, the function

   g(Σ) = log |Σ| + tr(Σ⁻¹ W/n)                                         (2.2.3)

attains its minimum at Σ = W/n by means of inequality (5) on page 14 of
Appendix 1 of Chapter 1. Then,

   ℓ(µ, Σ) ≤ ℓ(x̄, W/n) = −(np/2) log(2π) − np/2 − (n/2) log |W/n|.

Hence, the MLEs of µ and Σ are respectively

   µ̂ = x̄   and   Σ̂ = W/n = (n − 1)S/n.                               (2.2.4)
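In R, the MLEs in (2.2.4) can be computed from an n × p data matrix in two
lines; a minimal sketch (function name is illustrative):

## MLEs of mu and Sigma from an n x p data matrix X
mle_multinormal <- function(X) {
  n <- nrow(X)
  list(mu    = colMeans(X),            # MLE of mu: the sample mean vector
       Sigma = (n - 1) * cov(X) / n)   # MLE of Sigma: W/n = (n - 1)S/n
}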
Properties

1. x̄ and S are sufficient statistics for Np(µ, Σ), i.e., the conditional distribution
   of the sample (x1, . . . , xn) given x̄ and S does not depend on µ and Σ.

2. x̄ ∼ Np(µ, (1/n)Σ) and (n − 1)S ∼ Wp(n − 1, Σ), a central Wishart
   distribution, which is defined in the next section.

3. µ̂ is unbiased but Σ̂ is biased. However, S is unbiased for Σ.

4. The MLEs possess an invariance property: if the MLE of θ is θ̂, then the MLE
   of φ = h(θ) is φ̂ = h(θ̂), provided that h(·) is a one-to-one function.


2.3 Wishart Distribution

Definition

Suppose xi (i = 1, . . . , k) are independent Np(µi, Σ). Define a symmetric p × p
matrix V as

   V = Σᵢ₌₁ᵏ xi xi′ = X′X.

Then, V is said to follow a Wishart distribution, denoted by Wp(k, Σ, Ψ), where
Σ is called the scaling matrix, k the degrees of freedom, and Ψ = Σᵢ₌₁ᵏ µi µi′ the
(p × p symmetric) noncentrality matrix. Indeed, the Wishart distribution can be
considered a multivariate extension of the chi-squared distribution.

When Ψ = 0, the Wishart distribution is called the central Wishart distribution,
denoted simply by Wp(k, Σ).

Note that when p = 1 and Σ = σ², the Wishart distribution Wp(k, Σ, Ψ) reduces
to a scaled non-central chi-squared distribution, σ²χ²(k, Σᵢ₌₁ᵏ µᵢ²). In the
univariate case, the pdf of a central Wishart distribution therefore reduces to that
of a (scaled) central chi-squared distribution.

Properties

1. If V ∼ Wp(k, Σ), the pdf of V is given by

      f(V) = c(p, k) |Σ|^{−k/2} |V|^{(k−p−1)/2} exp{ −(1/2) tr(Σ⁻¹V) },

   where c(p, k) = [ 2^{kp/2} π^{p(p−1)/4} Γ(k/2) Γ((k−1)/2) · · · Γ((k−p+1)/2) ]⁻¹.


2. If V1 ∼ Wp(k1, Σ, Ψ1) and V2 ∼ Wp(k2, Σ, Ψ2) are independent, then
   V1 + V2 ∼ Wp(k1 + k2, Σ, Ψ1 + Ψ2).

   Proof: Write V1 = Σᵢ₌₁ᵏ¹ xi xi′ and V2 = Σᵢ₌₁ᵏ² yi yi′, and let x_{k1+i} = yi,
   i = 1, . . . , k2. Then V1 + V2 = Σᵢ₌₁ᵏ¹⁺ᵏ² xi xi′ and the result follows. □

   This implies that the sum of two independent Wishart random matrices with the
   same scaling matrix is again a Wishart random matrix.
3. If V ∼ Wp(k, Σ, Ψ), then AVA′ ∼ Wq(k, AΣA′, AΨA′) for any given q × p
   matrix A; it is central if, in addition, AΨA′ = 0.

   Exercise.

4. If V ∼ Wp(k, Σ, Ψ), then a′Va/a′Σa ∼ χ²(k, a′Ψa). In particular, for the ith
   diagonal element of V, vii/σii ∼ χ²(k, Ψii).

   N.B. a is not a random vector. Hint: Use property (3).

5. If y is any random vector independent of V ∼ Wp(k, Σ), then
   y′Vy/y′Σy ∼ χ²(k) and is independent of y.
6. Let X be the k × p data matrix given in the Definition and M = [µ1 · · · µk]′.
   Then, we have

   (a) For a symmetric A, X′AX ∼ Wp(r, Σ, Ψ) iff A² = A (i.e. A is idempotent),
       in which case r = rank(A) = tr(A) and Ψ = M′AM.

   (b) For symmetric idempotent matrices A and B, X′AX and X′BX are
       independent iff AB = 0.

   (c) For a symmetric idempotent A, X′AX and X′B are independent iff AB = 0.

   Proof: See Seber (1984, Multivariate Observations, pp. 24–25). □

7. Let x1, . . . , xn be a random sample from Np(µ, Σ). Then (see the simulation
   sketch at the end of these properties):

   (a) x̄ ∼ Np(µ, (1/n)Σ).

   (b) (n − 1)S (= W, the CSSP matrix) ∼ Wp(n − 1, Σ).

   (c) x̄ and S are independent.

   Hint: (a) Use x̄ = (x1 + · · · + xn)/n; (b) use property 6(a) with A = I − (1/n)J;
   (c) use property 6(c) with B = (1/n)1.
8. In the Cholesky decomposition V = TT′ of V ∼ Wp(k, I), where T = (tij)
   is lower triangular (i.e. tij = 0 if i < j), all non-zero elements are mutually
   independent, with tij ∼ N(0, 1) for i > j and tii ∼ χ(k − i + 1).

9. If y is any random vector independent of V ∼ Wp(k, Σ), k > p, then the ratio
   y′Σ⁻¹y / y′V⁻¹y ∼ χ²(k − p + 1) and is independent of y. [Note the difference
   from property (5) above.]
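Property 7(b) can be checked empirically. The sketch below (all names
illustrative) simulates many samples and compares the average of (n − 1)S with
the average of direct Wp(n − 1, Σ) draws from the base-R rWishart() sampler and
with the theoretical mean (n − 1)Σ:

## Empirical check of property 7(b): (n - 1)S ~ W_p(n - 1, Sigma)
set.seed(42)
n <- 10; reps <- 5000
Sigma <- matrix(c(2, 1, 1, 3), 2, 2)
U <- chol(Sigma)                           # upper triangular, U'U = Sigma

W_mean <- matrix(0, 2, 2)
for (r in 1:reps) {
  X <- matrix(rnorm(n * 2), n, 2) %*% U    # n rows, each a draw from N_2(0, Sigma)
  W_mean <- W_mean + (n - 1) * cov(X) / reps
}
W_mean                                           # average of simulated (n - 1)S
apply(rWishart(reps, n - 1, Sigma), 1:2, mean)   # average of W_p(n - 1, Sigma) draws
(n - 1) * Sigma                                  # theoretical mean of both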


2.4 Assessing the Normality Assumption

Before any statistical modeling, it is crucial to verify whether the data at hand
satisfy the underlying distributional assumptions. For most multivariate analyses,
it is important that the data indeed follow the multivariate normal distribution,
at least approximately if not exactly. Here are some commonly used methods.

1. Check each variable for univariate normality (necessary for multinormality
   but not sufficient)

   • Q-Q plot (quantile against quantile plot) for the normal distribution

     – sample quantiles are plotted against the expected sample quantiles of a
       standard normal distribution
     – a straight line indicates univariate normality
     – non-linearity may indicate a need to transform the variable.

     For example, if x is close to a normal distribution, the corresponding Q-Q
     plot is close to the reference line. Note that the x values in such plots are
     standardized, i.e. the mean is subtracted and the result is divided by the
     standard deviation.

   An alternative device is the P-P plot (probability against probability plot),
   which plots the sample cdf against the theoretical cdf.


In contrast, if x is not close to a normal distribution, the Q-Q plot bends away
from the reference line.

   • Shapiro-Wilk W test

     – the test statistic is a modified version of the squared sample correlation
       between the sample quantiles and the expected quantiles.

   • Kolmogorov-Smirnov-Lilliefors (KSL) test (large sample, at least hundreds)

     – the test statistic is the maximum difference between the empirical cdf
       and the normal cdf.

   • Test for zero skewness

     – for a symmetric distribution, the skewness statistic

          [n/((n − 1)(n − 2))] Σᵢ₌₁ⁿ (xi − x̄)³/s³,

       see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.

   • Test for zero excess kurtosis

     – for a normal distribution, the excess kurtosis statistic

          [n(n + 1)/((n − 1)(n − 2)(n − 3))] Σᵢ₌₁ⁿ (xi − x̄)⁴/s⁴ − 3(n − 1)²/((n − 2)(n − 3)),

       see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.
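   Minimal R sketches of these univariate checks, assuming the sample is stored
   in a numeric vector x (placeholder data shown; note that the plain ks.test()
   call does not apply the Lilliefors correction):

## Univariate normality checks for a numeric vector x
x <- rnorm(50)                # placeholder sample; replace with your data
z <- scale(x)                 # standardize: subtract mean, divide by sd
qqnorm(z); qqline(z)          # Q-Q plot; points near the line suggest normality
shapiro.test(x)               # Shapiro-Wilk W test
ks.test(z, "pnorm")           # K-S test against N(0, 1), without Lilliefors correction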


2. When n − p is large enough, we make use of property (9) of Section 2.1: check
   whether the squared generalized distances defined below follow a chi-squared
   distribution, using a Q-Q plot (a necessary and sufficient condition for very
   large sample sizes). A minimal R sketch follows this list.

   • Define the squared generalized distance as d²ᵢ = (xi − x̄)′S⁻¹(xi − x̄),
     i = 1, . . . , n.
   • Order d²1, d²2, . . . , d²n as d²(1) ≤ d²(2) ≤ · · · ≤ d²(n).
   • Plot χ²(i) vs d²(i), where χ²(i) is the 100(i − 1/2)/n percentile of the χ²(p)
     distribution.
   • A straight line indicates multivariate normality.
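   A minimal sketch of this chi-squared plot, assuming the data sit in an n × p
   matrix X (function name is illustrative):

## Chi-squared Q-Q plot of squared generalized (Mahalanobis) distances
chisq_plot <- function(X) {
  n <- nrow(X); p <- ncol(X)
  d2 <- mahalanobis(X, colMeans(X), cov(X))   # d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar)
  q  <- qchisq((1:n - 0.5) / n, df = p)       # 100(i - 1/2)/n percentiles of chi^2(p)
  plot(q, sort(d2), xlab = "Chi-square quantile",
       ylab = "Ordered squared Mahalanobis distance")
  abline(0, 1)                                # reference line for multinormality
}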

For example, if x = (x1, x2, x3)′ is close to a trivariate normal distribution, the
corresponding chi-squared plot is close to the reference line, where the "Ordered
Squared Mahalanobis Distance" and "Chi-square" axes represent the values of the
d²(i)'s and χ²(i)'s respectively. In contrast, if x is not close to a trivariate normal
distribution, the chi-squared plot departs markedly from the reference line.


3. Check each Principal Component (PC) for univariate normality (a necessary
   condition; and, if the sample size n is large enough, a sufficient condition).

2.5 Transformations to Near Normality

To achieve multinormality of the data, a univariate transformation is applied to
each variable individually. Afterwards, the multinormality of the transformed
variables is checked again. The following transformations are commonly used in
practice:

1. Use some common transformation such as log x.

2. Choose a transformation based on theory, for example a variance-stabilizing
   transformation:

   • Poisson count data: √x

   • Binomial proportion: sin⁻¹(√x)

3. Use the univariate Box-Cox transformation

   The transformed xi is denoted by xi^[λ], where

      xi^[λ] = (xi^λ − 1)/λ   for λ ≠ 0,   i = 1, . . . , n
      xi^[λ] = log xi         for λ = 0,   i = 1, . . . , n

   and λ is an unknown parameter. Typically, λ is chosen from prior information,
   or by searching among the following λ values:

      Power, λ   Transformation
      3          cube
      2          square
      1          no transform
      0.5        square root
      1/3        cube root
      0          log
      −1/3       inverse cube root
      −0.5       inverse square root
      −1         inverse
      −2         inverse square
      −3         inverse cube

   Since the xi^[λ]'s are on different scales for different λ, it is difficult to
   compare them directly to find the optimal λ value. We instead consider the
   following scaled power transformation:


      xi^[λ] = (xi^λ − 1) / (λ [GM(x)]^(λ−1))   for λ ≠ 0,   i = 1, . . . , n
      xi^[λ] = GM(x) log xi                     for λ = 0,   i = 1, . . . , n

   where GM(x) = (x1 x2 · · · xn)^(1/n) is the geometric mean of the observations
   x1, x2, . . . , xn. We choose the value of λ that minimizes the sum of squared
   residuals (equivalently, the sample variance) of the transformed values.
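   A minimal R sketch of this criterion, assuming a positive data vector x and a
   simple grid search (names illustrative; packaged routines such as the
   BoxCox.lambda() call used in Example 2.2 below give similar results):

## Scaled Box-Cox transform and grid search for the lambda that minimizes
## the sample variance of the transformed values
boxcox_scaled <- function(x, lambda) {
  gm <- exp(mean(log(x)))                     # geometric mean GM(x)
  if (abs(lambda) < 1e-8) gm * log(x)
  else (x^lambda - 1) / (lambda * gm^(lambda - 1))
}

best_lambda <- function(x, grid = seq(-3, 3, by = 0.05)) {
  grid[which.min(sapply(grid, function(l) var(boxcox_scaled(x, l))))]
}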

   For example, after a Box-Cox transformation of x with λ = 2, the transformed
   variable x² can become closer to the normal distribution (note that the y-axis
   scales of the before-and-after plots differ).

4. Box-Cox transformation for multivariate data

   The simplest way is to transform each variable by a univariate Box-Cox
   transformation with its own parameter λ, obtaining an individual optimal λ
   value for each of the x variables. Notice that each x variable being normally
   distributed after transformation does not imply that they jointly follow a
   multivariate normal distribution; rather, the multivariate normality of
   (X1^[λ1], X2^[λ2], . . . , Xp^[λp]) is assumed.


Remarks

1. The transformations above require x ≥ 0 and some of them require x > 0.

2. The Box-Cox transformation cannot guarantee that the transformed variable is
   close to the normal distribution.

For more general transformation methods, see Sakia (1992, The Statistician,
169–178).

Example 2.1 Mosteller and Tukey (1977) considered data from The Coleman
Report on the relationships between several variables and mean verbal test scores
for sixth graders at 20 schools in the New England and Mid-Atlantic regions of the
United States. Two variables, x1 = the percentage of sixth-graders’ fathers employed
in white collar jobs and x2 = one-half of the sixth-graders’ mothers’ mean number
of years of schooling, are given in the following table.

x1 x2 x1 x2
28.87 6.19 12.20 5.62
20.10 5.17 22.55 5.34
69.05 7.04 14.30 5.80
65.40 7.10 31.79 6.19
29.59 6.15 11.60 5.62
44.82 6.41 68.47 6.94
77.37 6.86 42.64 6.33
24.67 5.78 16.70 6.01
65.01 6.51 86.27 7.51
9.99 5.57 76.73 6.96

The estimated joint density of x1 and x2 is clearly bimodal. Transforming x1 into
x1^(1/100) improves the bimodal situation, even though the problem is not
completely solved.


Example 2.2 The data below were collected in an experiment on jute in Bishnupur
village of West Bengal, India, in which the weights of green jute plants (x1 ) and
their dry jute fibers (x2 ) were recorded for 20 randomly selected individual plants.

Plant No. x1 x2 Plant No. x1 x2


1 68 971 11 33 462
2 63 892 12 27 352
3 70 1125 13 21 305
4 6 82 14 5 84
5 65 931 15 14 229
6 9 112 16 27 332
7 10 162 17 17 185
8 12 321 18 53 703
9 20 315 19 62 872
10 30 375 20 65 740

1. Examine the univariate normality of x1 and x2 .

2. Examine the bivariate normality of x1 and x2 by chi-square plot.

3. Examine the bivariate normality of x1 and x2 by principal component method.

4. Assuming bivariate normality of (x1, x2),

   (a) Compute Hotelling's T² = n(x̄ − µ0)′S⁻¹(x̄ − µ0) statistic, where µ0 is
       the hypothesized mean vector, for testing H0: µ1 = 50, µ2 = 1000.

   (b) Find the maximum likelihood estimate for E(x1|x2) in terms of x2. What
       is the maximum likelihood estimate of E(x1|x2 = 200)?

5. Determine the optimal Box-Cox transformation for each of x1 and x2 . Examine


the bivariate normality of the transformed variables.


Solution

1. Graphically, the Q-Q plots for x1 and x2 are both far from the reference line,
   so neither variable appears to be normally distributed.

   We also have the results of the Kolmogorov-Smirnov and Shapiro-Wilk tests.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk x1 0.8585 0.0074 NO
2 Shapiro-Wilk x2 0.8893 0.0261 NO

> describe(plant[,2:3],type=2)
vars n mean sd median trimmed mad min max
x1 1 20 33.85 23.88 27 33.0 25.95 5 70
x2 2 20 477.50 335.27 342 455.5 303.93 82 1125
range skew kurtosis se
x1 65 0.42 -1.55 5.34
x2 1043 0.60 -1.09 74.97

One-sample Kolmogorov-Smirnov test


data: plant$x1
D = 0.18072, p-value = 0.5308
alternative hypothesis: two-sided

One-sample Kolmogorov-Smirnov test


data: plant$x2
D = 0.22009, p-value = 0.2482
alternative hypothesis: two-sided

Here, only the results of the Shapiro-Wilk test are considered, because the sample
size is small and the Kolmogorov-Smirnov test is valid only for large samples.
From the Shapiro-Wilk test, both x1 and x2 are non-normal.


2. The chi-square plot is clearly not close to the reference line. Therefore, x1
   and x2 may not be bivariate normal.

3. A principal component analysis is carried out, and the normality tests are
   applied to the principal component scores.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk Comp.1 0.8892 0.0260 NO
2 Shapiro-Wilk Comp.2 0.8813 0.0187 NO

vars n mean sd median trimmed mad min
PC1 1 20 0 336.08 135.65 22.01 305.00 -648.45
PC2 2 20 0 4.78 -0.01 0.22 2.76 -10.90
max range skew kurtosis se
PC1 396.48 1044.92 -0.6 -1.09 75.15
PC2 12.79 23.69 0.1 3.15 1.07

Neither the first nor the second component is normally distributed. Hence, x1 and
x2 are not bivariate normal.

4. The sample covariance matrix and sample mean vector are given as follows.

> sx
x1 x2
x1 570.450 7845.079
x2 7845.079 112404.263
> mx
x1 x2
33.85 477.50


(a) Using the formula T² = n(x̄ − µ0)′S⁻¹(x̄ − µ0) with µ0 = (50, 1000)′, the
    computed T² is 408.8485.
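    A sketch of this computation, reusing the sx and mx objects printed above:

## Hotelling's T^2 for H0: mu = (50, 1000)'
mu0 <- c(50, 1000)
n   <- 20
T2  <- n * t(mx - mu0) %*% solve(sx) %*% (mx - mu0)
T2                                   # 408.8485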
(b) The MLE of E(x1|x2) is given by

       Ê(x1|x2) = x̄1 + (s12/s22)(x2 − x̄2),

    where s12 = Σᵢ₌₁ⁿ (xi1 − x̄1)(xi2 − x̄2)/(n − 1) and
    s22 = Σᵢ₌₁ⁿ (xi2 − x̄2)²/(n − 1). Note that this formula is the same as that
    for simple linear regression.

    Given x2 = 200, we have Ê(x1|x2 = 200) = 14.4823.
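    The same estimate can be read off the sx and mx objects above; a sketch:

## MLE of E(x1 | x2 = 200) from the sample moments above
s12 <- sx[1, 2]; s22 <- sx[2, 2]
mx[1] + (s12 / s22) * (200 - mx[2])  # 14.4823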

5. The optimal λ’s for x1 and x2 are given as follows.

> lambda <- BoxCox.lambda( plant$x1, method="loglik" )
> lambda
[1] 0.25
> lambda <- BoxCox.lambda( plant$x2, method="loglik" )
> lambda
[1] 0.25

Therefore,

   x1^[λ] = (x1^0.25 − 1)/0.25,   x2^[λ] = (x2^0.25 − 1)/0.25.

$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk tx1 0.9166 0.0852 YES
2 Shapiro-Wilk tx2 0.9382 0.2217 YES

$Descriptives
n Mean Std.Dev Median Min Max 25th
tx1 20 5.142587 1.878541 5.118028 1.981395 7.57003 3.664219
tx2 20 13.771677 3.527091 13.200129 8.036867 19.16584 11.358266
75th Skew Kurtosis
tx1 7.235512 -0.08937166 -1.461265
tx2 17.081050 -0.05304673 -1.322734


The Shapiro-Wilk test shows that the transformed variables are univariate normal.
However, the chi-square plot shows that they are still not bivariate normal.
