Example
Suppose the random variables X1, X2, and X3 have the covariance matrix
\[
\Sigma = \begin{pmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]
Calculate the population principal components.
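The eigenvalue-eigenvector pairs (λi, ei) of Σ can be checked numerically in R with base eigen(); a minimal sketch:

## Minimal sketch: population principal components of the 3 x 3 Sigma above.
Sigma <- matrix(c( 1, -2, 0,
                  -2,  5, 0,
                   0,  0, 2), nrow = 3, byrow = TRUE)
e <- eigen(Sigma)
e$values                      # lambda_1 >= lambda_2 >= lambda_3
e$vectors                     # columns e_1, e_2, e_3 (signs are arbitrary)
e$values / sum(e$values)      # proportion of total variance per component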
The ellipsoids
\[
(x - \mu)' \Sigma^{-1} (x - \mu) = c^2
\]
have axes \(\pm c\sqrt{\lambda_i}\, e_i\), \(i = 1, 2, \ldots, p\), where the \((\lambda_i, e_i)\) are the eigenvalue-eigenvector pairs of \(\Sigma\). Assuming \(\mu = 0\), the equation above can be rewritten as
\[
c^2 = x' \Sigma^{-1} x
    = \frac{1}{\lambda_1}(e_1' x)^2 + \frac{1}{\lambda_2}(e_2' x)^2 + \cdots + \frac{1}{\lambda_p}(e_p' x)^2
    = \frac{1}{\lambda_1} y_1^2 + \frac{1}{\lambda_2} y_2^2 + \cdots + \frac{1}{\lambda_p} y_p^2.
\]
\[
\Sigma = \begin{pmatrix}
\sigma^2 & \rho\sigma^2 & \cdots & \rho\sigma^2 \\
\rho\sigma^2 & \sigma^2 & \cdots & \rho\sigma^2 \\
\vdots & \vdots & \ddots & \vdots \\
\rho\sigma^2 & \rho\sigma^2 & \cdots & \sigma^2
\end{pmatrix}
\]
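For this equicorrelation structure the eigenstructure is known in closed form: the largest eigenvalue is \(\sigma^2(1 + (p-1)\rho)\) with eigenvector proportional to the ones vector, and the remaining \(p - 1\) eigenvalues all equal \(\sigma^2(1 - \rho)\). A numeric check (p, σ², ρ are illustrative choices):

## Sketch: eigenvalues of the equicorrelation matrix match the closed form.
p <- 5; sigma2 <- 2; rho <- 0.6
Sigma <- sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, p, p))
eigen(Sigma)$values
c(sigma2 * (1 + (p - 1) * rho), rep(sigma2 * (1 - rho), p - 1))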
\[
\Sigma = \begin{pmatrix}
\sigma^2 & \phi\sigma^2 & \cdots & \phi^{p-1}\sigma^2 \\
\phi\sigma^2 & \sigma^2 & \cdots & \phi^{p-2}\sigma^2 \\
\vdots & \vdots & \ddots & \vdots \\
\phi^{p-1}\sigma^2 & \phi^{p-2}\sigma^2 & \cdots & \sigma^2
\end{pmatrix}
\]
There is no closed form, but for large p the eigenvectors are approximately sines and cosines.
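A quick way to see this numerically (φ and p are illustrative; plotting the leading eigenvectors against the index shows roughly sinusoidal shapes):

## Sketch: eigenvectors of an AR(1)-type covariance look like sines/cosines.
p <- 100; phi <- 0.7
Sigma <- phi^abs(outer(1:p, 1:p, "-"))   # sigma^2 = 1 for simplicity
e <- eigen(Sigma, symmetric = TRUE)
matplot(e$vectors[, 1:3], type = "l",
        xlab = "index i", ylab = "eigenvector entry")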
The sample principal components solve
\[
S \hat{e}_k = \hat{\lambda}_k \hat{e}_k,
\]
and
\[
\hat{y}_k = X_{dev}\, \hat{e}_k,
\]
where
\[
X_{dev} = X - \frac{1}{n} \mathbf{1}\mathbf{1}' X = \left(I - \frac{1}{n}\mathbf{1}\mathbf{1}'\right) X.
\]
Recall:
\[
S = \frac{1}{n-1} X_{dev}' X_{dev}
  = \left(\frac{1}{\sqrt{n-1}} X_{dev}\right)' \left(\frac{1}{\sqrt{n-1}} X_{dev}\right).
\]
Singular value decomposition:
\[
\frac{1}{\sqrt{n-1}} X_{dev} = U D V'.
\]
The diagonal entries of D are the square roots of the largest p eigenvalues of both \((n-1)^{-1} X_{dev}' X_{dev} = S\) and \((n-1)^{-1} X_{dev} X_{dev}'\).
The columns of V are the eigenvectors of \(X_{dev}' X_{dev}\).
Also
\[
X_{dev} V = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_p] = \sqrt{n-1}\, U D,
\]
so the singular value decomposition of \((n-1)^{-1/2} X_{dev}\) provides all the details of the sample principal components:
the coefficients V;
the values UD.
Similarly, if \(X^*\) is \(X_{dev}\) with its columns normalized (sum of squares = 1), then
\[
R = X^{*\prime} X^*,
\]
and the singular value decomposition of \(X^*\) gives the PCA of R.
That is, the principal component scores are the left singular vectors multiplied by the singular values (square roots of the eigenvalues of S). These are returned in the component p1$x.
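A sketch tying the pieces together, assuming X is a numeric n × p data matrix already in the workspace (p1 is the prcomp fit referred to above):

## Sketch: prcomp() agrees with the SVD of (n-1)^{-1/2} X_dev.
n  <- nrow(X)
Xd <- scale(X, center = TRUE, scale = FALSE)   # X_dev
sv <- svd(Xd / sqrt(n - 1))                    # U D V'
p1 <- prcomp(X)
all.equal(p1$sdev, sv$d)                       # sqrt eigenvalues of S
range(abs(p1$x) - abs(Xd %*% sv$v))            # scores agree up to column signs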
[Figure: screeplots of the sample principal components, two panels, with component variances on the vertical axis.]
[Figure: scatterplots of the first two sample principal components (PC1 vs PC2), two panels; observations labeled by number, with the variables gov, medS, prof, emp16, TOT indicated.]
[Figure: pairs plot of the first three principal component scores (PC1, PC2, PC3), with marginal rug plots.]
The cumulative proportion of total variance explained by the first k components is
\[
\psi_k = \frac{\lambda_1 + \cdots + \lambda_k}{\lambda_1 + \cdots + \lambda_p},
\]
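Given sample eigenvalues, ψk is one line in R (a sketch reusing the prcomp fit p1 from earlier):

## Sketch: cumulative proportion of variance explained.
lambda <- p1$sdev^2                 # sample eigenvalues
cumsum(lambda) / sum(lambda)        # psi_1, psi_2, ..., psi_p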
\[
\hat{\lambda} \sim N\!\left(\lambda,\; \frac{2\lambda^2}{n}\right).
\]
So we could also state that, approximately,
\[
\frac{n \hat{\lambda}_i}{\lambda_i} \sim \chi^2_n.
\]
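Inverting this approximation gives an approximate confidence interval for λi. A sketch (the function name and input values are illustrative):

## Sketch: approximate CI for lambda_i from the chi-square approximation.
ci_lambda <- function(lambda_hat, n, level = 0.95) {
  a <- (1 - level) / 2
  n * lambda_hat / qchisq(c(1 - a, a), df = n)   # (lower, upper)
}
ci_lambda(lambda_hat = 5.8, n = 100)             # illustrative values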
The matrix
\[
\Sigma_{A,ij} = \mathrm{Cov}(Z_{t_i}, Z_{t_j}) = R(t_i, t_j)
\]
is the covariance matrix and is determined by a function R, called the covariance kernel of Z.
\[
k(x, y) = (x \cdot y)^d,
\]
where the function \(\Phi(x)\) maps the data nonlinearly into a feature space.
We want \(C\varepsilon = \lambda \varepsilon\). We have
\[
C\varepsilon = \frac{1}{n} \sum_{k=1}^{n} \Phi(x_k) \Phi(x_k)' \varepsilon
             = \frac{1}{n} \sum_{k=1}^{n} \Phi(x_k)\, \alpha_k,
\]
so that
\[
\varepsilon = \sum_{k=1}^{n} \alpha_k \Phi(x_k).
\]
Define
\[
K_{ij} = (\Phi(x_i) \cdot \Phi(x_j));
\]
we arrive at
\[
n\lambda\, K\alpha = K^2 \alpha \tag{8}
\]
or equivalently
\[
n\lambda\, \alpha = K\alpha. \tag{9}
\]
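A compact sketch of this computation in R, using the polynomial kernel k(x, y) = (x · y)^d from above; the function and argument names are illustrative:

## Sketch: kernel PCA via the eigenproblem (9) on the centered kernel matrix.
kpca_poly <- function(X, d = 2, m = 2) {
  n  <- nrow(X)
  K  <- (X %*% t(X))^d                 # K_ij = (x_i . x_j)^d
  H  <- diag(n) - matrix(1 / n, n, n)  # centers Phi(x_k) in feature space
  Kt <- H %*% K %*% H                  # centered kernel matrix K-tilde
  e  <- eigen(Kt, symmetric = TRUE)    # Kt alpha = (n lambda) alpha
  alpha <- sweep(e$vectors[, 1:m], 2, sqrt(e$values[1:m]), "/")
  # rescaled so that alpha' Kt alpha = 1
  list(alpha = alpha, scores = Kt %*% alpha)
}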
The jth kernel PC eigenvector \(\alpha_j\) maximizes
\[
\frac{1}{n-1}\, \alpha' K \left(I - \frac{1}{n} \mathbf{1}\mathbf{1}'\right) K \alpha
\]
subject to \(\alpha' \tilde{K} \alpha = 1\).
or in matrix notation
\[
X - \mu = LF + \varepsilon.
\]
The coefficient lij is called the loading of the ith variable on the jth
factor, so the matrix L is the matrix of factor loadings.
Assumption
The observable X and the unobservable F are related by
\[
\mathrm{Cov}(X, F) = L,
\]
or
\[
\mathrm{Cov}(X_i, F_j) = l_{ij}.
\]
Note that if T is (m × m) orthogonal, then \((LT)(LT)' = LL'\), so the loadings LT generate the same \(\Sigma\) as L: loadings are not unique.
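A two-factor illustration of this non-uniqueness, assuming L is any p × 2 loadings matrix already in the workspace:

## Sketch: rotating the loadings leaves LL' (and hence Sigma) unchanged.
theta <- pi / 6
T <- matrix(c(cos(theta), sin(theta),
             -sin(theta), cos(theta)), 2, 2)     # orthogonal 2 x 2 rotation
all.equal(L %*% t(L), (L %*% T) %*% t(L %*% T))  # TRUE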
\[
\Sigma = LL'
\]
\[
\Sigma = LL' + \Psi
\]
The spectral decomposition of \(\Sigma\) gives an exact factorization \(\Sigma = LL'\) with m = p.
If \(\lambda_1 + \lambda_2 + \cdots + \lambda_m \gg \lambda_{m+1} + \cdots + \lambda_p\), and \(L_{(m)}\) is the first m columns of L, then
\[
\Sigma \approx L_{(m)} L_{(m)}'
\]
gives such an approximation with \(\Psi = 0\).
\[
\psi_i^* = 1 - \tilde{h}_i^{*2},
\]
or
\[
\Psi = I - \mathrm{diag}(L_r^* L_r^{*\prime}).
\]
This is the Principal Factor solution.
The Principal Component solution is the special case where the
initial communalities are all 1.
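A sketch of one pass of the principal factor method on a correlation matrix R, with m factors; initial communalities default to squared multiple correlations, and setting them all to 1 reproduces the principal component solution (function and argument names are illustrative):

## Sketch: principal factor solution with m factors.
principal_factor <- function(R, m, h2 = 1 - 1 / diag(solve(R))) {
  Rr <- R
  diag(Rr) <- h2                       # reduced correlation matrix
  e <- eigen(Rr, symmetric = TRUE)
  L <- e$vectors[, 1:m] %*% diag(sqrt(pmax(e$values[1:m], 0)), m)
  list(loadings = L, Psi = diag(1 - rowSums(L^2)))
}
## principal_factor(R, m = 2, h2 = rep(1, ncol(R))) gives the PC solution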
  Factor1   Factor2
1.9093697 0.8880550
Output description:
By default the program converts the sample covariance matrix S to a correlation matrix before computing. To override this behavior, you can supply the matrix yourself using the covmat argument.
The uniquenesses are the estimates of the diagonal elements of \(\Psi\). In the textbook these are called specific variances. The larger the specific variance, the less a particular variable is determined by the latent factors.
At the foot of the loadings, the SS loadings are the column sums of squares \(\sum_i \hat{l}_{ij}^2\).
Call:
factanal(x = stock, factors = 2, rotation = "none")

Uniquenesses:
   J P Morgan     Citibank  Wells Fargo  Royal Dutch Shell  Exxon Mobil
        0.417        0.275        0.542              0.005        0.530
               Factor1  Factor2
SS loadings      1.622    1.610
Proportion Var   0.324    0.322
Cumulative Var   0.324    0.646
so
\[
\frac{\text{Proportion of total sample}}{\text{variance due to jth factor}}
= \frac{\hat{l}_{1j}^{\,2} + \hat{l}_{2j}^{\,2} + \cdots + \hat{l}_{pj}^{\,2}}{s_{11} + s_{22} + \cdots + s_{pp}}. \tag{14}
\]
Since the analysis here is on the correlation scale, the denominator is p = 5, and 1.622/5 = 0.324 matches the Proportion Var row above.
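These rows can be reproduced from the estimated loadings; a sketch, assuming the fit above was saved as f <- factanal(stock, factors = 2, rotation = "none"):

## Sketch: recompute "SS loadings" and "Proportion Var" from the loadings.
L  <- unclass(f$loadings)       # p x m matrix of estimated loadings
ss <- colSums(L^2)              # SS loadings
ss / nrow(L)                    # Proportion Var (denominator p, correlation scale)
cumsum(ss) / nrow(L)            # Cumulative Var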
> det(R)
[1] 0.1752028
> det(L %*% t(L) + Psi)
[1] 0.1788174
Scaling and the Likelihood
If the maximum likelihood estimates for a data matrix X are \(\hat{L}\) and \(\hat{\Psi}\), and
\[
Y_{n \times p} = X_{n \times p}\, D
\]
is a scaled data matrix, with the columns of X scaled by the entries of the diagonal matrix D, then the maximum likelihood estimates for Y are \(D\hat{L}\) and \(D^2\hat{\Psi}\).
That is, the MLEs are equivariant under scaling:
\[
\hat{\Sigma}_Y = D \hat{\Sigma}_X D.
\]
Proof: \(L_Y(\mu, \Sigma) = L_X(D^{-1}\mu, D^{-1}\Sigma D^{-1})\).
Consequently, there is no real distinction between working with covariance and correlation matrices.
Write
\[
\Sigma^* = \Psi^{-1/2} \Sigma \Psi^{-1/2}
         = \Psi^{-1/2} (LL' + \Psi) \Psi^{-1/2}
         = (\Psi^{-1/2} L)(\Psi^{-1/2} L)' + I_p
         = L^* L^{*\prime} + I_p.
\]
\[
\Sigma^* - I_p = \Psi^{-1/2} (\Sigma - \Psi) \Psi^{-1/2},
\]
\[
\frac{n}{n-1}\, (I + \hat{\Delta}^{-1}).
\]
In particular, the sample correlations of the factor scores are zero.
Two estimation methods:
Regression Method
X and F have a joint multivariate normal distribution:
\[
\mathrm{Var}\begin{pmatrix} X - \mu \\ F \end{pmatrix}
= \begin{pmatrix} \Sigma = LL' + \Psi & L \\ L' & I_m \end{pmatrix},
\]
so
\[
E(F \mid X = x) = L' \Sigma^{-1} (x - \mu),
\]
which leads to (replacing the parameters by their estimates and \(\mu\) by \(\bar{x}\))
\[
\hat{f}_j = \hat{L}' (\hat{L}\hat{L}' + \hat{\Psi})^{-1} (x_j - \bar{x})
          = (I + \hat{L}' \hat{\Psi}^{-1} \hat{L})^{-1} \hat{L}' \hat{\Psi}^{-1} (x_j - \bar{x}).
\]
The least squares scores are related to the regression scores by
\[
\hat{f}_j^{\,LS} = \left[I + (\hat{L}' \hat{\Psi}^{-1} \hat{L})^{-1}\right] \hat{f}_j^{\,R}.
\]
> f2 <- factanal(stock, factors = 2, rotation = "none",
+                scores = "regression")
> pairs(f2$scores)
[Figure: pairs(f2$scores): scatterplot matrix of the estimated factor scores Factor1 and Factor2.]