The Multivariate Normal Distribution
Why should we consider the multivariate normal distribution? It might seem that applied problems are so complex that the distribution would be of interest only from a mathematical perspective. There are several reasons to consider it:
1. It is mathematically tractable for a large number of problems, so progress toward answers to statistical questions can be made, even if only approximately.
2. Because it is tractable for so many problems, it provides insight into techniques based upon other distributions or even non-parametric techniques. For this reason, it often serves as a benchmark against which other methods are judged.
3. For some problems it serves as a reasonable model of the data. In other instances, transformations can be applied to the responses so that they conform more closely to multivariate normality.
4. The sampling distributions of many (multivariate) statistics are normal, regardless of the parent distribution (Multivariate Central Limit Theorems). Thus, for large sample sizes, we may be able to use results from the multivariate normal distribution to answer our statistical questions, even when the parent distribution is not multivariate normal.
Consider first the univariate normal distribution with parameters $\mu$ (the mean) and $\sigma^2$ (the variance) for the random variable $x$,
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}} \qquad (1)$$
for $-\infty < x < \infty$, $-\infty < \mu < \infty$, and $\sigma^2 > 0$.
Now rewrite the exponent $(x-\mu)^2/\sigma^2$ in the linear algebra form $(x-\mu)(\sigma^2)^{-1}(x-\mu)$. This formulation matches the generalized or Mahalanobis squared distance $(x-\mu)'\Sigma^{-1}(x-\mu)$, where both $x$ and $\mu$ are vectors. The multivariate normal distribution can be derived by substituting the Mahalanobis squared distance into the univariate formula and normalizing the distribution so that its total probability is 1. This yields
$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)} \qquad (2)$$
For the bivariate case ($p = 2$), the exponent expands to
$$(x-\mu)'\Sigma^{-1}(x-\mu) = \frac{1}{1-\rho_{12}^2}\left[\left(\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\right)^2 + \left(\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right)^2 - 2\rho_{12}\,\frac{x_1-\mu_1}{\sqrt{\sigma_{11}}}\,\frac{x_2-\mu_2}{\sqrt{\sigma_{22}}}\right]. \qquad (3)$$
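As a numerical check on equation (2), the following Python sketch evaluates the density directly from the Mahalanobis squared distance and compares it with scipy.stats.multivariate_normal; the parameter values are arbitrary, chosen only for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative bivariate parameters (p = 2); the values are arbitrary.
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x = np.array([1.5, 2.5])

# Equation (2) built directly from the Mahalanobis squared distance.
p = len(mu)
diff = x - mu
d2 = diff @ np.linalg.inv(Sigma) @ diff     # (x - mu)' Sigma^{-1} (x - mu)
fx = np.exp(-0.5 * d2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

# Cross-check against SciPy's implementation of the same density.
print(fx, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```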
Multivariate Normal Properties
Let X ∼ Np (µ, Σ) be p-variate multivariate normal with mean µ and variance-covariance matrix
Σ, where
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}.$$
1. The solid ellipsoid of all $x$ such that $(x-\mu)'\Sigma^{-1}(x-\mu) \le \chi^2_p(\alpha)$ contains $(1-\alpha)100\%$ of the probability in the distribution, where $\chi^2_p(\alpha)$ is the upper $(100\alpha)$th percentile of the chi-square distribution with $p$ degrees of freedom. This also implies that contours delineating regions of constant probability about $\mu$ are given by $(x-\mu)'\Sigma^{-1}(x-\mu) = \chi^2_p(\alpha)$.
2. The semiaxes of the ellipsoid containing $(1-\alpha)100\%$ of the probability are given by the eigenvalues ($\lambda_i$) and eigenvectors ($e_i$) of $\Sigma$, such that the semiaxes are $\pm c\sqrt{\lambda_i}\, e_i$, where $c^2 = \chi^2_p(\alpha)$ (see the sketch following this list).
3. Let $C$ be a $p \times q$ matrix of constants of rank $q$; then
$$C'X \sim N_q(C'\mu,\; C'\Sigma C).$$
4. Partition $X$ into subvectors $X_1$ ($p \times 1$) and $X_2$ ($q \times 1$), and partition $\mu$ and $\Sigma$ conformably, so that
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_{p+q}\!\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\right).$$
5. If two variates of the multivariate normal, say $X_1$ and $X_2$, are uncorrelated, so that $\rho_{12} = 0$ (and hence $\sigma_{12} = 0$), then $X_1$ and $X_2$ are independent. This property does not hold in general for other distributions. However, it is always true that if two variates are independent, then they are uncorrelated, no matter what their joint distribution is.
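A minimal sketch of property 2, assuming an arbitrary illustrative $\Sigma$ and $\alpha = 0.05$: the eigendecomposition of $\Sigma$ gives the directions $e_i$ and lengths $c\sqrt{\lambda_i}$ of the semiaxes, with $c^2 = \chi^2_p(\alpha)$ taken from the upper chi-square percentile.

```python
import numpy as np
from scipy.stats import chi2

# Illustrative covariance matrix and alpha level; the values are arbitrary.
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
alpha = 0.05
p = Sigma.shape[0]

# chi^2_p(alpha) is the upper (100*alpha)th percentile of the chi-square with p df.
c = np.sqrt(chi2.ppf(1 - alpha, df=p))

# eigh returns eigenvalues in ascending order and eigenvectors as columns.
lam, E = np.linalg.eigh(Sigma)

# Semiaxes of the (1 - alpha)100% ellipsoid: +/- c * sqrt(lambda_i) * e_i.
for i in range(p):
    print("semiaxis", i + 1, ":", c * np.sqrt(lam[i]) * E[:, i])
```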
Sampling Distribution Properties

1. Let
$$\bar{X} = \begin{pmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \vdots \\ \bar{X}_p \end{pmatrix}$$
be the vector of sample means from a sample of size $n$ from the multivariate normal distribution for $X$; then
$$\bar{X} \sim N_p\!\left(\mu, \tfrac{1}{n}\Sigma\right).$$
2. Let S be the sample variance-covariance matrix computed from a sample of size n from the
multivariate normal distribution for X, then
$$(n-1)S \sim W_{n-1}(\Sigma),$$
the Wishart distribution with (n − 1) degrees of freedom.
3. The Wishart density for $S$ does not exist when $n \le p$. Further, $S$ must be positive definite ($\lambda_i > 0$ for all $i = 1, 2, \ldots, p$) for the density to exist.
4. X̄ and S are stochastically independent.
5. Let $(n-1)S \sim W_{n-1}(\Sigma)$; then
$$(n-1)\,C'SC \sim W_{n-1}(C'\Sigma C).$$
6. Let $A_1 = (n_1-1)S_1 \sim W_{n_1-1}(\Sigma)$ and $A_2 = (n_2-1)S_2 \sim W_{n_2-1}(\Sigma)$, where $S_1$ and $S_2$ are independent estimates of $\Sigma$; then
$$A_1 + A_2 \sim W_{n_1+n_2-2}(\Sigma)$$
and
$$\frac{1}{n_1+n_2-2}\,(A_1 + A_2)$$
is a "pooled" estimate of $\Sigma$.
7. Let $X_1, X_2, \ldots, X_n$ be a simple random sample of size $n$, where $X_i \sim N_p(\mu, \Sigma)$; then, approximately for $(n-p)$ large,
$$n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu) \sim \chi^2_p.$$
A central limit theorem says that for very large n − p we can relax the requirement that the
Xi be multivariate normal. Further, for n − p large, an approximate (1 − α)100% confidence
region for µ is given by the set of all µ such that
$$n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu) \le \chi^2_p(\alpha).$$
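Property 7 can be checked by simulation. The sketch below (all parameter values arbitrary) repeatedly samples from $N_p(\mu, \Sigma)$, computes $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$, and estimates how often it falls below the $\chi^2_p(\alpha)$ cutoff; the coverage should be close to $1-\alpha$ for $n-p$ large.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)

# Illustrative parameters; the values are arbitrary.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
p, n, reps, alpha = len(mu), 100, 2000, 0.05
cutoff = chi2.ppf(1 - alpha, df=p)          # chi^2_p(alpha), the upper percentile

covered = 0
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)             # sample covariance matrix, divisor n - 1
    d = xbar - mu
    T2 = n * d @ np.linalg.inv(S) @ d       # n (xbar - mu)' S^{-1} (xbar - mu)
    covered += T2 <= cutoff

# For n - p large, the coverage should be close to 1 - alpha = 0.95.
print(covered / reps)
```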
Assessing Multivariate Normality
1. All marginal distributions must be normal. Check the normality of each variable. If a variable
does not conform to the normal distribution, then the set of variables cannot be multivariate
normal.
Steps for the q-q normal distribution plot:
(a) Order the observations from smallest to largest (X(1) ≤ X(2) ≤ . . . ≤ X(n) ). These
are the order statistics for this random variable and they estimate the quantiles of the
distribution from which they were sampled. The quantile is the value at which a certain
proportion of the distribution is less than or equal to that value.
(b) Estimate the proportion of the distribution that should be less than or equal to the value
of each order statistic. One such estimate is
$$\frac{i - 1/2}{n},$$
where $i$ is the rank of each observation.
(c) Compute the expected quantiles from the normal distribution as
$$q_i = \Phi^{-1}\!\left(\frac{i-1/2}{n}\right),$$
where Φ−1 is the inverse of the standard normal cumulative distribution function.
(d) Plot the observed quantiles, $X_{(i)}$, versus the expected quantiles, $q_i$, and check the plot for linearity. If the observed quantiles correspond to a normal distribution, the points will fall on a straight line; if not, reject (multivariate) normality. Note that you should have a minimum sample size of 20 to begin to have confidence in the plot. (A code sketch for this plot appears at the end of this section.)
2. All pairs of variables must be bivariate normal. Produce scatter plots of all pairs of variables.
Density regions should correspond roughly to elliptical patterns with linear relationships
among pairs of variables.
3. Linear combinations of the variables are normal. Check any meaningful linear combinations (sums, differences) for normality. Further, check the principal components (the linear combinations corresponding to the eigenvectors of $\Sigma$) for normality, and check pairs of linear combinations and principal components for bivariate normality. Rejection of normality or bivariate normality for linear combinations also rejects multivariate normality.
4. Squared distances about the population mean vector are distributed as chi-square with p
degrees of freedom. Estimate the population mean vector with the sample mean vector, and
estimate the population covariance matrix with the sample covariance matrix. Compute the
squared distances of each observation to the sample mean vector and check to see that they
are chi-square distributed.
Steps for the q-q chi-square distribution plot:

(a) Compute the squared distance of each observation from the sample mean vector, $d_i^2 = (x_i - \bar{x})'S^{-1}(x_i - \bar{x})$.

(b) Order the squared distances from smallest to largest, $d_{(1)}^2 \le d_{(2)}^2 \le \ldots \le d_{(n)}^2$.

(c) Compute the expected quantiles from the chi-square distribution with $p$ degrees of freedom, taking the quantile corresponding to probability $(i - 1/2)/n$ for the $i$th ordered distance.

(d) Plot the ordered squared distances versus the expected quantiles and check the plot for linearity, as with the q-q normal plot (a code sketch for both plots follows below).

These steps can be repeated for subsets of the variables and linear combinations. This may be useful in identifying whether "problems" of multivariate normality are associated with a set of variables or a single variable.
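The following Python sketch implements both q-q plots described above; the data are simulated here purely for illustration, and any $n \times p$ data matrix could be substituted.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm, chi2

rng = np.random.default_rng(0)

# Simulated illustrative data: any (n x p) data matrix X could be used instead.
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
n = 50
X = rng.multivariate_normal(mu, Sigma, size=n)
p = X.shape[1]
probs = (np.arange(1, n + 1) - 0.5) / n     # (i - 1/2)/n for i = 1, ..., n

# q-q normal plot for one variable: order statistics vs. Phi^{-1}((i - 1/2)/n).
plt.figure()
plt.scatter(norm.ppf(probs), np.sort(X[:, 0]))
plt.xlabel("expected normal quantiles")
plt.ylabel("observed order statistics")

# q-q chi-square plot: ordered squared distances vs. chi-square(p) quantiles.
xbar = X.mean(axis=0)
Sinv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - xbar
d2 = np.einsum("ij,jk,ik->i", diff, Sinv, diff)   # (x_i - xbar)' S^{-1} (x_i - xbar)
plt.figure()
plt.scatter(chi2.ppf(probs, df=p), np.sort(d2))
plt.xlabel("expected chi-square quantiles")
plt.ylabel("ordered squared distances")
plt.show()
```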
If the data do not conform well to multivariate normality, then:

1. check for outliers or errors in the data. If outliers are identified, then use methods for dealing
with outliers to determine their impact on the analysis. Alternative robust methods may also
be available. Remember that an outlier may be the most informative observation in the data
set as it is “different” from all the others.
2. consider transformations of one or more of the variables. A variable may, for example, follow
the lognormal distribution, so a logarithm transformation would be in order. Note that
transformations also affect the variable’s associations with the other variables.
3. consider robust or alternative multivariate methods if available. Some techniques are much less sensitive to outliers or the distribution of the data than others. See, for example, Silverman, B.W., 1986, Density Estimation for Statistics and Data Analysis, Chapman & Hall, New York, 175pp.
4. consider basing inference on results using resampling methods (Monte Carlo, bootstrap, or
permutation methods). See, for example, Manly, B.F.J., 1997, Randomization, Bootstrap and
Monte Carlo Methods in Biology, second edition, Chapman & Hall, New York, 399pp.
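As one concrete illustration of the resampling idea, here is a generic percentile-bootstrap sketch for the mean vector. It is not the specific procedure of the references above; the function name, defaults, and data are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_mean_ci(X, B=2000, alpha=0.05):
    """Percentile-bootstrap intervals for each component of the mean vector.

    A generic sketch: resample rows of X with replacement and take the
    empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    """
    n = X.shape[0]
    boot_means = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)    # resample observations with replacement
        boot_means[b] = X[idx].mean(axis=0)
    lo = np.quantile(boot_means, alpha / 2, axis=0)
    hi = np.quantile(boot_means, 1 - alpha / 2, axis=0)
    return lo, hi

# Illustrative use on skewed (lognormal) data, where normal-theory regions are suspect.
X = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 3))
print(bootstrap_mean_ci(X))
```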