Chap2 Multivariate Normal and Related Distributions
HKU STAT7005 1
STAT7005 Multivariate Methods
2. Multinormal and Related Distributions
Properties
$a'x \sim N(a'\mu, a'\Sigma a)$.
$x + y \sim N_p(\mu_1 + \mu_2, \Sigma_1 + \Sigma_2)$.
Property 7(d) implies $E(x_1 \mid x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and
$\mathrm{Var}(x_1 \mid x_2) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, which does not change with the value of $x_2$.
Indeed, the result of Property 7(d) is related to the multivariate linear regression model.
To summarize, the marginal pdf and conditional pdf of a multivariate normal
distribution are still multivariate normal.
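As an illustration of the conditional formulas, the bivariate case ($p = 2$) can be written out explicitly. The symbols $\sigma_1$, $\sigma_2$ and $\rho$ are introduced here only for this sketch and assume $\Sigma_{11} = \sigma_1^2$, $\Sigma_{22} = \sigma_2^2$ and $\Sigma_{12} = \Sigma_{21} = \rho\sigma_1\sigma_2$.

```latex
\[
E(x_1 \mid x_2) = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}\,(x_2 - \mu_2),
\qquad
\mathrm{Var}(x_1 \mid x_2) = \sigma_1^2 - \frac{(\rho\sigma_1\sigma_2)^2}{\sigma_2^2}
= \sigma_1^2\,(1 - \rho^2).
\]
```

The conditional mean is linear in $x_2$, which is the link to linear regression, and the conditional variance is smaller than $\sigma_1^2$ whenever $\rho \neq 0$.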
10. Let $x \sim N_p(\mu, \Sigma)$ with $\Sigma$ positive definite (p.d.). Then, for any $m \times p$ matrix $A$ and $n \times p$
matrix $B$,
(a) Exercise. For proofs of (b) and (c), see Searle (1971, Linear Models, pp. 59-60).
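\[
\begin{aligned}
L(\mu, \Sigma) &= f(x_1, x_2, \ldots, x_n) \\
&= f(x_1) f(x_2) \cdots f(x_n) \qquad (x_1, x_2, \ldots, x_n \text{ are independent}) \\
&= \prod_{i=1}^{n} f(x_i) \\
&= \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right\} \\
&= \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}} \exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (x_i - \mu)' \Sigma^{-1} (x_i - \mu) \right\}.
\end{aligned}
\]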
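We therefore have
\[
\ell(\mu, \Sigma) = -\frac{np}{2} \log(2\pi) - \frac{n}{2} \left\{ \log|\Sigma| + \mathrm{tr}\left( \Sigma^{-1} \frac{W}{n} \right) \right\} - \frac{n}{2} (\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu). \tag{2.2.1}
\]
Since $(\bar{x} - \mu)' \Sigma^{-1} (\bar{x} - \mu) \ge 0$ for any $\mu$, we have
\[
\ell(\mu, \Sigma) \le \ell(\bar{x}, \Sigma) = -\frac{np}{2} \log(2\pi) - \frac{n}{2} \left\{ \log|\Sigma| + \mathrm{tr}\left( \Sigma^{-1} \frac{W}{n} \right) \right\}. \tag{2.2.2}
\]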
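Since the matrices $\Sigma$ and $W/n$ are positive definite, the function
\[
g(\Sigma) = \log|\Sigma| + \mathrm{tr}\left( \Sigma^{-1} \frac{W}{n} \right) \tag{2.2.3}
\]
attains its minimum at $\Sigma = W/n$, by inequality (5) on page 14 of
Appendix 1 of Chapter 1. Then,
\[
\ell(\mu, \Sigma) \le \ell\left( \bar{x}, \frac{W}{n} \right) = -\frac{np}{2} \log(2\pi) - \frac{np}{2} - \frac{n}{2} \log\left| \frac{W}{n} \right|.
\]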
Hence, the MLEs of $\mu$ and $\Sigma$ are, respectively,
\[
\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\Sigma} = \frac{W}{n} = \frac{(n-1)S}{n}. \tag{2.2.4}
\]
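The estimators in (2.2.4) can be sketched in a few lines of standard-library Python. This is only an illustration: the function names and the toy sample are hypothetical (not from the notes), and $W$ is assumed to be the matrix of corrected sums of squares and products, consistent with $W = (n-1)S$.

```python
def mean_vector(data):
    """Column means of a list of equal-length observation vectors."""
    n = len(data)
    p = len(data[0])
    return [sum(row[j] for row in data) / n for j in range(p)]


def scatter_matrix(data):
    """W = sum_i (x_i - xbar)(x_i - xbar)' (assumed definition of W)."""
    xbar = mean_vector(data)
    p = len(xbar)
    W = [[0.0] * p for _ in range(p)]
    for row in data:
        d = [row[j] - xbar[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                W[a][b] += d[a] * d[b]
    return W


def normal_mle(data):
    """Return (mu_hat, Sigma_hat, S) with Sigma_hat = W/n and S = W/(n-1)."""
    n = len(data)
    W = scatter_matrix(data)
    mu_hat = mean_vector(data)
    sigma_hat = [[w / n for w in row] for row in W]
    S = [[w / (n - 1) for w in row] for row in W]
    return mu_hat, sigma_hat, S


# Hypothetical toy sample (n = 4, p = 2), not taken from the notes.
sample = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 4.0]]
mu_hat, sigma_hat, S = normal_mle(sample)
print(mu_hat)      # sample mean vector
print(sigma_hat)   # MLE of Sigma (divisor n)
print(S)           # sample covariance matrix (divisor n - 1)
```

The only difference between `sigma_hat` and `S` is the divisor, so each entry of `sigma_hat` is $(n-1)/n$ times the corresponding entry of `S`.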
Properties
1. x̄ and S are sufficient statistics for Np (µ, Σ), i.e., the conditional distribution
of the sample (x1 , . . . , xn ) given x̄ and S does not depend on µ and Σ.
2. $\bar{x} \sim N_p(\mu, \frac{1}{n}\Sigma)$ and $(n - 1)S \sim W_p(n - 1, \Sigma)$, a central Wishart distribution,
which is defined in the next section.
3. µ̂ is unbiased but Σ̂ is biased. However, S is unbiased for Σ.
4. The MLEs possess an invariance property: if the MLE of $\theta$ is $\hat{\theta}$, then the MLE of
$\phi = h(\theta)$ is $\hat{\phi} = h(\hat{\theta})$, provided that $h(\cdot)$ is a one-to-one function.
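The size of the bias in Property 3 follows in one step from $\hat{\Sigma} = (n-1)S/n$ and the unbiasedness of $S$. The short derivation below is a sketch added for illustration.

```latex
\[
E(\hat{\Sigma}) = \frac{n-1}{n}\,E(S) = \frac{n-1}{n}\,\Sigma,
\qquad\text{so}\qquad
E(\hat{\Sigma}) - \Sigma = -\frac{1}{n}\,\Sigma \;\to\; 0 \quad \text{as } n \to \infty .
\]
```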
Properties
Hint: (a) Use $\bar{x} = (x_1 + \cdots + x_n)/n$; (b) use property 6(a) with $A = I - \frac{1}{n}J$;
(c) use property 6(c) with $B = \frac{1}{n}\mathbf{1}$.
8. In the Cholesky decomposition $V = TT'$ of $V \sim W_p(k, I)$, where $T = (t_{ij})$
is lower triangular (i.e. $t_{ij} = 0$ if $i < j$), all non-zero elements are mutually
independent, with $t_{ij} \sim N(0, 1)$ for $i > j$ and $t_{ii} \sim \chi(k - i + 1)$, i.e. $t_{ii}^2 \sim \chi^2(k - i + 1)$.
9. If $y$ is any random vector independent of $V \sim W_p(k, \Sigma)$, $k > p$, the ratio
$y'\Sigma^{-1}y \,/\, y'V^{-1}y \sim \chi^2(k - p + 1)$ and is independent of $y$. [Note the difference
with property (5) above.]
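Property 8 gives a direct way to simulate $V \sim W_p(k, I)$. The following is a minimal sketch using only the Python standard library. The function names, the seed and the values of `p`, `k` and `reps` are arbitrary choices for illustration. It uses the fact that a $\chi^2$ variable with `df` degrees of freedom is a Gamma variable with shape `df/2` and scale 2.

```python
import math
import random


def bartlett_factor(p, k, rng):
    """Draw the lower-triangular T of Property 8 for V ~ W_p(k, I)."""
    T = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i):
            T[i][j] = rng.gauss(0.0, 1.0)      # t_ij ~ N(0, 1) for i > j
        df = k - i                             # equals k - i + 1 for 1-based i
        # chi-square(df) = Gamma(shape=df/2, scale=2); t_ii is its square root
        T[i][i] = math.sqrt(rng.gammavariate(df / 2.0, 2.0))
    return T


def wishart_identity(p, k, rng):
    """Return V = T T' built from the factor above."""
    T = bartlett_factor(p, k, rng)
    return [[sum(T[a][m] * T[b][m] for m in range(p)) for b in range(p)]
            for a in range(p)]


rng = random.Random(7005)      # arbitrary seed, for reproducibility only
p, k, reps = 3, 10, 4000
avg = [[0.0] * p for _ in range(p)]
for _ in range(reps):
    V = wishart_identity(p, k, rng)
    for a in range(p):
        for b in range(p):
            avg[a][b] += V[a][b] / reps

# The Monte Carlo average should be close to E(V) = k I.
for row in avg:
    print([round(v, 2) for v in row])
```

Averaging many draws is only a sanity check: the diagonal entries should be near $k$ and the off-diagonal entries near 0.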
• Shapiro-Wilk W test
– the test statistic of this test is a modified version of the squared
sample correlation between the sample quantiles and the expected
quantiles.
• Kolmogorov-Smirnov-Lilliefors (KSL) test (large sample, at least
hundreds)
– the test statistic of this test is the maximum difference between the
empirical cdf and the normal cdf.
• Test for zero skewness
– for a symmetric distribution, the sample skewness
$\frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{s^3}$,
see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.
• Test for zero excess kurtosis
– for a normal distribution, the sample excess kurtosis
$\frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \frac{(x_i - \bar{x})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}$,
see Joanes and Gill (1998, The Statistician, 183–189), is close to 0.
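The two statistics above are straightforward to compute by hand. The sketch below is a direct transcription of the formulas into standard-library Python. The function names and the data `x` are hypothetical, and $s$ is taken to be the sample standard deviation with divisor $n - 1$.

```python
import math


def sample_sd(x):
    """Sample standard deviation with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    return math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))


def sample_skewness(x):
    """n / ((n-1)(n-2)) * sum(((x_i - xbar) / s)^3); needs n >= 3."""
    n = len(x)
    xbar = sum(x) / n
    s = sample_sd(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - xbar) / s) ** 3 for v in x)


def sample_excess_kurtosis(x):
    """Bias-adjusted excess kurtosis from the formula above; needs n >= 4."""
    n = len(x)
    xbar = sum(x) / n
    s = sample_sd(x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((v - xbar) / s) ** 4 for v in x) - correction


# Hypothetical symmetric data: the skewness is 0 up to rounding error.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_skewness(x))
print(sample_excess_kurtosis(x))
```

Values far from 0 in either statistic point away from normality. How far is "far" depends on $n$ and is not addressed by this sketch.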
2. When $n - p$ is large enough, we make use of property (9) of Section 2.1: check,
by a Q-Q plot, whether the squared generalized distances defined below follow a
chi-squared distribution (a necessary and sufficient condition for very
large sample sizes).
• Define the squared generalized distance as $d_i^2 = (x_i - \bar{x})' S^{-1} (x_i - \bar{x})$,
$i = 1, \ldots, n$.
• Order $d_1^2, d_2^2, \ldots, d_n^2$ as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
• Plot $\chi^2_{(i)}$ vs $d_{(i)}^2$, where $\chi^2_{(i)}$ is the $100(i - \frac{1}{2})/n$ percentile of the $\chi^2(p)$
distribution.
• A straight line indicates multivariate normality.
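The steps above can be sketched for the bivariate case as follows. The restriction to $p = 2$ is an assumption made to keep the example dependency-free: the $2 \times 2$ inverse has a closed form, and the $\chi^2(2)$ cdf is $1 - e^{-x/2}$, so its $q$-quantile is $-2\log(1 - q)$. The function names and the toy data are hypothetical.

```python
import math


def squared_distances(data):
    """d_i^2 = (x_i - xbar)' S^{-1} (x_i - xbar) for bivariate data."""
    n = len(data)
    m1 = sum(r[0] for r in data) / n
    m2 = sum(r[1] for r in data) / n
    s11 = sum((r[0] - m1) ** 2 for r in data) / (n - 1)
    s22 = sum((r[1] - m2) ** 2 for r in data) / (n - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    out = []
    for r in data:
        a, b = r[0] - m1, r[1] - m2
        # quadratic form with the explicit inverse of the 2 x 2 matrix S
        out.append((s22 * a * a - 2.0 * s12 * a * b + s11 * b * b) / det)
    return out


def chi2_qq_pairs(data):
    """Pairs (chi-square(2) quantile, ordered d^2) for the chi-square plot."""
    n = len(data)
    d2 = sorted(squared_distances(data))
    q = [-2.0 * math.log(1.0 - (i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(q, d2))


# Hypothetical toy data (n = 6, p = 2), not taken from the notes.
toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 4.0], [4.0, 4.5], [5.0, 2.5]]
for quantile, dist in chi2_qq_pairs(toy):
    print(round(quantile, 3), round(dist, 3))
```

Plotting the printed pairs and comparing them with the 45-degree line gives the chi-square plot. A useful check on the arithmetic is that the $d_i^2$ always sum to $(n - 1)p$.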
Power, λ    Transformation
3           cube
2           square
1           no transform
0.5         square root
1/3         cube root
0           log
−1/3        inverse of cube root
−0.5        inverse of square root
−1          inverse
−2          inverse of square
−3          inverse of cube
Since the $x^{[\lambda]}$ are on different scales for different $\lambda$, it is difficult to compare them
and find the optimal value of $\lambda$. We instead consider the scaled power transformation
\[
x_i^{[\lambda]} =
\begin{cases}
\dfrac{x_i^{\lambda} - 1}{\lambda\,[GM(x)]^{\lambda - 1}}, & \text{if } \lambda \neq 0, \\[2ex]
GM(x) \log x_i, & \text{if } \lambda = 0,
\end{cases}
\qquad i = 1, \ldots, n,
\]
where $GM(x) = (x_1 x_2 \cdots x_n)^{1/n}$ is the geometric mean of the observations
$x_1, x_2, \ldots, x_n$. We choose the value of $\lambda$ that minimizes the sum of squares of
residuals (equivalent to the sample variance) after transformation.
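A grid search over the powers in the table is one simple way to carry out this selection. The sketch below is illustrative only: the function names, the grid and the data are assumptions, and the data are constructed so that their logarithms are symmetric about 0, which makes the log transformation ($\lambda = 0$) the natural candidate.

```python
import math


def geometric_mean(x):
    """GM(x) = (x_1 ... x_n)^(1/n), computed on the log scale."""
    return math.exp(sum(math.log(v) for v in x) / len(x))


def scaled_power(x, lam):
    """Scaled power transformation of positive data x for a given lambda."""
    gm = geometric_mean(x)
    if lam == 0:
        return [gm * math.log(v) for v in x]
    return [(v ** lam - 1.0) / (lam * gm ** (lam - 1.0)) for v in x]


def sample_variance(y):
    """Sample variance with divisor n - 1."""
    n = len(y)
    ybar = sum(y) / n
    return sum((v - ybar) ** 2 for v in y) / (n - 1)


def best_lambda(x, grid):
    """Value in grid minimizing the variance of the transformed data."""
    return min(grid, key=lambda lam: sample_variance(scaled_power(x, lam)))


grid = [-3, -2, -1, -0.5, -1 / 3, 0, 1 / 3, 0.5, 1, 2, 3]
# Hypothetical positive data whose logs are symmetric about 0.
x = [math.exp(z) for z in (-1.5, -0.5, 0.0, 0.5, 1.5)]
lam_hat = best_lambda(x, grid)
print(lam_hat)
```

Because of the geometric-mean scaling, the variances for different $\lambda$ are on a common scale and can be compared directly. Without it, the comparison in `best_lambda` would not be meaningful.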
Remarks
For more general transformation methods, see Sakia (1992, The Statistician,
169–178).
Example 2.1 Mosteller and Tukey (1977) considered data from The Coleman
Report on the relationships between several variables and mean verbal test scores
for sixth graders at 20 schools in the New England and Mid-Atlantic regions of the
United States. Two variables, x1 = the percentage of sixth-graders’ fathers employed
in white collar jobs and x2 = one-half of the sixth-graders’ mothers’ mean number
of years of schooling, are given in the following table.
x1 x2 x1 x2
28.87 6.19 12.20 5.62
20.10 5.17 22.55 5.34
69.05 7.04 14.30 5.80
65.40 7.10 31.79 6.19
29.59 6.15 11.60 5.62
44.82 6.41 68.47 6.94
77.37 6.86 42.64 6.33
24.67 5.78 16.70 6.01
65.01 6.51 86.27 7.51
9.99 5.57 76.73 6.96
The bimodality of x1 and x2 is reduced, although the problem is not
completely solved.
Example 2.2 The data below were collected in an experiment on jute in Bishnupur
village of West Bengal, India, in which the weights of green jute plants (x1 ) and
their dry jute fibers (x2 ) were recorded for 20 randomly selected individual plants.
Solution
1. Graphically, the Q-Q plots for x1 and x2 are given below:
Neither Q-Q plot is close to the reference line, so x1 and x2 may not be
normally distributed.
We have the results of Kolmogorov-Smirnov test and Shapiro-Wilk test.
$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk x1 0.8585 0.0074 NO
2 Shapiro-Wilk x2 0.8893 0.0261 NO
> describe(plant[,2:3],type=2)
vars n mean sd median trimmed mad min max
x1 1 20 33.85 23.88 27 33.0 25.95 5 70
x2 2 20 477.50 335.27 342 455.5 303.93 82 1125
range skew kurtosis se
x1 65 0.42 -1.55 5.34
x2 1043 0.60 -1.09 74.97
Here, only the results of the Shapiro-Wilk test are considered because the sample
size is small (n = 20) and the Kolmogorov-Smirnov test is valid only for large samples.
The Shapiro-Wilk test rejects normality for both x1 and x2 at the 5% level
(p = 0.0074 and 0.0261).
The chi-square plot is clearly not close to the reference line. Therefore, x1
and x2 may not be bivariate normal.
3. A principal component analysis is applied and the normality tests are applied
to the principal component scores.
$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk Comp.1 0.8892 0.0260 NO
2 Shapiro-Wilk Comp.2 0.8813 0.0187 NO
4. The sample covariance matrix and sample mean vector are given as follows.
> sx
x1 x2
x1 570.450 7845.079
x2 7845.079 112404.263
> mx
x1 x2
33.85 477.50
$\widehat{E}(X_1 \mid X_2 = 200) = 14.4823$.
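This estimate can be reproduced by plugging the sample mean vector and sample covariance matrix from part 4 into the conditional-mean formula $\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$. The sketch below assumes that plug-in approach. The function names are hypothetical and the numbers are copied from the `mx` and `sx` output.

```python
def conditional_mean(mu1, mu2, s12, s22, x2):
    """Plug-in estimate of E(X1 | X2 = x2) for a bivariate normal."""
    return mu1 + s12 / s22 * (x2 - mu2)


def conditional_variance(s11, s12, s22):
    """Plug-in estimate of Var(X1 | X2); does not depend on x2."""
    return s11 - s12 ** 2 / s22


# Values taken from the mx and sx output in part 4.
mx1, mx2 = 33.85, 477.50
s11, s12, s22 = 570.450, 7845.079, 112404.263

est = conditional_mean(mx1, mx2, s12, s22, 200.0)
print(round(est, 4))                          # agrees with 14.4823 above
print(conditional_variance(s11, s12, s22))    # much smaller than s11
```

Since the normality checks above cast doubt on bivariate normality for these data, the number is best read as a linear-regression prediction rather than an exact conditional expectation.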
Therefore,
\[
x_1^{(\lambda)} = \frac{x_1^{0.25} - 1}{0.25}, \qquad x_2^{(\lambda)} = \frac{x_2^{0.25} - 1}{0.25}.
\]
$univariateNormality
Test Variable Statistic p value Normality
1 Shapiro-Wilk tx1 0.9166 0.0852 YES
2 Shapiro-Wilk tx2 0.9382 0.2217 YES
$Descriptives
n Mean Std.Dev Median Min Max 25th
tx1 20 5.142587 1.878541 5.118028 1.981395 7.57003 3.664219
tx2 20 13.771677 3.527091 13.200129 8.036867 19.16584 11.358266
75th Skew Kurtosis
tx1 7.235512 -0.08937166 -1.461265
tx2 17.081050 -0.05304673 -1.322734
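As a quick arithmetic check, the minimum and maximum of the transformed variables in the descriptives can be recovered from the raw minima and maxima reported earlier (5 and 70 for x1, 82 and 1125 for x2), because the transformation is increasing. The helper name `power_025` is hypothetical.

```python
def power_025(x):
    """The power transformation with lambda = 0.25 used above."""
    return (x ** 0.25 - 1.0) / 0.25


# Raw minima and maxima taken from the describe() output for x1 and x2.
tx1_min, tx1_max = power_025(5), power_025(70)
tx2_min, tx2_max = power_025(82), power_025(1125)

print(tx1_min, tx1_max)   # compare with Min and Max of tx1
print(tx2_min, tx2_max)   # compare with Min and Max of tx2
```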
The Shapiro-Wilk test does not reject univariate normality for either transformed
variable at the 5% level (p = 0.0852 and 0.2217). However, the chi-square plot shows
that they are not bivariate normal.