Vdoc.pub Generalized Vectorization Cross Products and Matrix Calculus
Vdoc.pub Generalized Vectorization Cross Products and Matrix Calculus
This book presents the reader with new operators and matrices that arise in
the area of matrix calculus. The properties of these mathematical concepts
are investigated and linked with zero-one matrices such as the commutation
matrix. Elimination and duplication matrices are revisited and partitioned into
submatrices. Studying the properties of these submatrices facilitates achieving
new results for the original matrices themselves. Different concepts of matrix
derivatives are presented and transformation principles linking these concepts
are obtained. One of these concepts is used to derive new matrix calculus
results, some involving the new operators and others the derivatives of the
operators themselves. The last chapter contains applications of matrix calculus,
including optimization, differentiation of log-likelihood functions, iterative
interpretations of maximum likelihood estimators, and a Lagrangian multiplier
test for endogeneity.
DARRELL A. TURKINGTON
University of Western Australia
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9781107032002
C Darrell A. Turkington 2013
A catalog record for this publication is available from the British Library.
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee
that any content on such Web sites is, or will remain, accurate or appropriate.
Contents
Preface page ix
1 Mathematical Prerequisites 1
1.1 Introduction 1
1.2 Kronecker Products 2
1.3 Cross-Product of Matrices 6
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 13
1.4.1 Basic Operators 13
1.4.2 Vecs, Rvecs, and the Cross-Product Operator 15
1.4.3 Related Operators: Vech and v 17
1.4.4 Generalized Vecs and Generalized Rvecs 18
1.4.5 Generalized Vec Operators and the Cross-Product
Operator 25
2 Zero-One Matrices 28
2.1 Introduction 28
2.2 Selection Matrices and Permutation Matrices 28
2.3 The Elementary Matrix Eimn j 34
2.4 The Commutation Matrix 35
2.4.1 Commutation Matrices, Kronecker Products,
and Vecs 38
2.4.2 Commutation Matrices and Cross-Products 50
2.5 Generalized Vecs and Rvecs of the Commutation Matrix 57
2.5.1 Deriving Results for Generalized Vecs and Rvecs of
the Commutation Matrix 60
2.5.2 Generalized Vecs and Rvecs of the Commutation
Matrix and Cross-Products 68
2.5.3 KnG,G versus Rvecn KGn 70
2.5.4 The Matrix Nn 71
v
vi Contents
commutation matrix itself. This chapter introduces two new matrices whose
properties are investigated. One is similar to the commutation matrix in
that its submatrices are certain elementary matrices. The second, I call a
‘twining matrix’, a zero-one matrix that intertwines rows or columns of a
given set of matrices. Its relationship to the commutation matrix is clearly
shown.
Chapter 3 studies in some detail well-known matrices associated with
matrix calculus, namely elimination and duplication matrices. The
approach taken is to partition these matrices into interesting submatrices
and study the properties of these submatrices. This facilitates the inves-
tigation as to how these peculiar matrices interact with other matrices,
particularly Kronecker products. It also involves the introduction of new
matrix operators whose properties in turn are studied.
Chapter 4 looks at four concepts of the derivative of a matrix with respect
to another matrix that exists in the literature and develops transformation
principles that allow an easy movement from a result obtained using one
of the concepts to the corresponding results for the others. In doing so,
extensive use is made of results obtained in the first two chapters.
Chapter 5 derives new matrix calculus results with reference to general-
ized vecs and cross-products of matrices, and shows how those results can
be expanded into appropriate submatrices. The last section of this chapter
gives some simple, but powerful, theorems involving the concept of the
matrix derivative used in this book.
The final chapter presents applications of matrix calculus itself. It demon-
strates how matrix calculus can be used to efficiently solve complicated
optimization problems, but it is largely concerned with the use of matrix
calculus in statistics and econometrics. It explains how matrix differentia-
tion can be used in differentiating a log-likelihood function, involving as it
usually does a symmetric covariance matrix, in obtaining the score vector
and finally in obtaining the information matrix. This work calls on the
theorems of the last section of Chapter 5.
The second part of Chapter 6 uses matrix calculus to obtain iterative
interpretations of maximum likelihood estimators in simultaneous equa-
tion models in terms of econometric estimators. It looks at the computa-
tional convergence of the different interpretations. Finally, a new Lagrangian
multiplier test statistic is derived for testing for endogeneity in such models.
Two institutions should be mentioned in the preface: First, my home
university, the University of Western Australia, for allowing me time off
from teaching to concentrate on the manuscript; second, Nuffield College
Preface xi
Mathematical Prerequisites
1.1 Introduction
This chapter considers elements of matrix algebra, knowledge of which is
essential for discussions throughout this book. This body of mathematics
centres around the concepts of Kronecker products and vecs of a matrix.
From the elements of an m×n matrix A = {ai j } and a p×q matrix B = {bi j },
the Kronecker product forms a new mp×nq matrix. The vec operator forms
a column vector from the elements of a given matrix by stacking its columns
one underneath the other. This chapter discusses several new operators that
are derived from these basic operators.
The operator, which I call the cross-product operator, takes the sum of
Kronecker products formed from submatrices of two given matrices. The
rvec operator forms a row vector by stacking the rows of a given matrix
alongside each other. The generalized vec operator forms a new matrix
from a given matrix by stacking a certain number of its columns, taken as a
block, under each other. The generalized rvec operator forms a new matrix
by stacking a certain number of rows, again taken as a block, alongside each
other.
Although it is well known that Kronecker products and vecs are intimately
connected, this connection also holds for rvec and generalised operators as
well. The cross-product operator, as far as I know, is being introduced
by this book. As such, I present several theorems designed to investigate
the properties of this operator. This book’s approach is to list, without
proof, well-known properties of the mathematical operator or concept in
hand. However, I give a proof whenever I present the properties of a new
operator or concept, a property in a different light, or something new about a
concept.
1
2 Mathematical Prerequisites
where the notation we use for the ith row of a matrix throughout this book
′
is ai . Thus, from our definition of a Kronecker product
⎛ 1′ ⎞
a ⊗b
A⊗b=⎝ ..
⎠.
⎜ ⎟
.
′
am ⊗ b
A⊗b=⎝ ..
⎠.
⎜ ⎟
.
m′
b⊗a
Then, in general,
⎛ ⎞
A ⊗ B11 · · · A ⊗ B1r
A ⊗ B = ⎝ .. ..
⎠.
⎜ ⎟
. .
A ⊗ Bs1 · · · A ⊗ Bsr
= (a ⊗ B1 . . . a ⊗ Br ). (1.4)
A ⊗ B = (a1 ⊗ B1 . . . a1 ⊗ Br . . . an ⊗ B1 . . . an ⊗ Br ).
A ⊗ B = (a1 ⊗ b1 . . . a1 ⊗ bq . . . an ⊗ b1 . . . an ⊗ bq ). (1.5)
a ′ ⊗ B1
⎛ ⎞
..
a′ ⊗ B = ⎝ ⎠.
⎜ ⎟
.
a ′ ⊗ Bs
If A is m×n, then
′
⎛ ⎞
a 1 ⊗ B1
⎜ .. ⎟
⎜
⎜ . ⎟
⎟
⎜ 1′ ⎟
⎜ a ⊗ Bs ⎟
..
⎜ ⎟
A⊗B =⎜
⎜ ⎟
. ⎟
⎜ ′ ⎟
⎜ am ⊗ B ⎟
⎜ 1⎟
⎜ .. ⎟
⎜
⎝ . ⎟
⎠
′
a m ⊗ Bs
1.2 Kronecker Products 5
′
where, as before, ai refers to the ith row of A, i = 1, . . . , m. If B is partitioned
into its rows, then
⎛ 1′ ′ ⎞
a ⊗ b1
⎜ .. ⎟
⎜ 1′ . p′ ⎟
⎜ ⎟
⎜a ⊗b ⎟
⎜ ⎟
A⊗B =⎜
⎜ .. ⎟
(1.6)
⎜ ′ . ⎟
⎜ am ⊗ b1 ′ ⎟
⎟
⎜ ⎟
⎜ .. ⎟
⎝ . ⎠
′ ′
am ⊗ b p
′
where b j refers to this jth row of B, j = 1, . . . , p.
Let x be a column vector and A a matrix. As a consequence of these
′ ′
results, the ith row of x ′ ⊗ A is x ′ ⊗ ai , where ai is the ith row of A, and
the jth column of x ⊗ A is x ⊗ a j , where a j is the jth column of A.
Another useful property for Kronecker products is this: Suppose A and B
are m×n and p×q matrices respectively, and x is any column vector. Then,
A(In ⊗ x ′ ) = (A ⊗ 1)(In ⊗ x ′ ) = A ⊗ x ′
(x ⊗ Ip )B = (x ⊗ Ip )(1 ⊗ B) = x ⊗ B,
where In is the n×n identity matrix.
We can use these results to prove that for a, a n×1 column vector and
b a p×1 column vector,
(a ′ ⊗ IG )(b ′ ⊗ InG ) = b ′ ⊗ a ′ ⊗ IG .
Clearly,
(a ′ ⊗ IG )(b ′ ⊗ InG ) = (a ′ ⊗ IG )(b ′ ⊗ In ⊗ IG ) = a ′ (b ′ ⊗ In ) ⊗ IG
= (1 ⊗ a ′ )(b ′ ⊗ In ) ⊗ IG = b ′ ⊗ a ′ ⊗ IG .
Another notation used throughout this book is: I represent the ith column
of the n×n identity matrix In by ein and the jth row of this identity matrix
′
by e nj . Using this notation, a result that we find useful in our future work is
given by our first theorem.
for i = 1, . . . , n. Then,
′
In ⊗ emp = O e1n O . . . O enn O .
Proof: We have
′ ′
m′
m′ ′
In ⊗ emp = e1n ⊗ e m n
= e p ⊗ e1n . . . e m n
p . . . en ⊗ e p p ⊗ en
= O e1n O . . . O enn O .
The operator τ is the relevant operator to use when matrices are partitioned
into a ‘column’ of submatrices, where as τ is the appropriate operator to
use when matrices are partitioned into a ‘row’ of submatrices. The two
operators are intimately connected as
(AτGmn B) ′ = A1′ ⊗ B1′ + · · · + AG′ ⊗ BG′ = A ′ τGmn B ′ .
In this book, theorems are proved for τ operator and the equivalent results
for the τ operator can be obtained by taking transposes.
Sometimes, we have occasion to take the cross-products of very large
matrices. For example, suppose A is mrG× p and B is nG×q as previously
shown. Thus, if we partition A as
⎛ ⎞
A1
⎜ .. ⎟
A = ⎝ . ⎠,
AG
each of the submatrices in this partition is mr × p. To avoid confusion, signify
the cross-product between A and B, namely A1 ⊗ B1 + · · · + AG ⊗ BG as
AτG,mr,n B, and the cross-product between B and A, B1 ⊗ A1 + · · · + BG ⊗
AG as BτG,n,mr A.
Notice that in dealing with two matrices A and B, where A is mG× p
and B is mG×q, then it is possible to take two cross-products AτGmm B or
AτmGG B, but, of course, these are not the same. However, the following
theorem shows that in some cases the two cross-products are related.
Proof: Write
′ ⎞
d1
⎛
D = (d1 . . . ds ) = ⎝ ... ⎠ .
⎜ ⎟
′
dG
Then,
′ ⎛ 1′
d 1 ⊗ In
⎛ ⎞ ⎞
(d ⊗ In )B
(D ⊗ In )B = ⎝ .
.. ..
⎠B = ⎝ ⎠.
⎜ ⎟ ⎜ ⎟
.
′ ′
d G ⊗ In (d G ⊗ In )B
8 Mathematical Prerequisites
Partition A as
⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟
AG
where each submatrix Ai is m× p. Then,
′ ′
(D ⊗ In )B τGnm A = (d 1 ⊗ In )B ⊗ A1 + · · · + (d G ⊗ In )B ⊗ AG .
Now
⎛ ′ ⎞
d1′ ⊗ Im
⎛ ⎞
d1 ⊗ Im A
(D ′ ⊗ Im )A = ⎝ .. ..
⎠A = ⎝ . ⎠.
⎜ ⎟ ⎜ ⎟
.
ds′ ⊗ Im ds′ ⊗ Im A
But,
⎛ ⎞
A1
d j ⊗ Im A = d1 j Im . . . dG j Im ⎝ ... ⎠ = d1 j A1 + · · · + dG j AG ,
′ ⎜ ⎟
AG
so when we partition B as
⎛ ⎞
B1
B = ⎝ ... ⎠
⎜ ⎟
Bs
where each submatrix Bi is n×q, we have
Bτsnm (D ′ ⊗ Im )A
= B1 ⊗ (d11 A1 + · · · + dG1 AG ) + · · · + Bs ⊗ (d1s A1 + . . . dGs AG )
= B1 ⊗ d11 A1 + · · · + Bs ⊗ d1s A1 . . . B1 ⊗ dG1 AG + · · · + Bs ⊗ dGs AG
= (d11 B1 + · · · + d1s Bs ) ⊗ A1 + · · · + (dG1 B1 . . . + dGs Bs ) ⊗ AG
′ ′
= (d 1 ⊗ In )B ⊗ A1 + · · · + (d G ⊗ In )B ⊗ AG .
A = (C D . . . F )
AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG .
Ai ⊗ Bi = (Ci ⊗ Bi Di ⊗ Bi . . . Fi ⊗ Bi ).
Theorem 1.4 Let A and B be mG× p matrices, and let C and D be nG×q
matrices. Then,
and
Proof: Clearly,
and
Proof: Clearly,
AG C BG D
Likewise,
(E ⊗ F )(AτGmn B) = (E ⊗ F )(A1 ⊗ B1 + · · · + AG ⊗ BG )
= EA1 ⊗ F B1 + · · · + EAG ⊗ F BG
⎛ ⎞ ⎛ ⎞
EA1 F B1
= ⎝ ... ⎠ τGrs ⎝ ... ⎠
⎜ ⎟ ⎜ ⎟
EAG F BG
= (IG ⊗ E )A τGrs (IG ⊗ F )B.
(AG ) j .
1.3 Cross-Product of Matrices 11
That is, to form A( j ) where we stack the jth rows of the submatrices under
each other.
Notice if C is a r ×G matrix and D is a s×m matrix, then from Equation 1.6
′ ′ ⎞
c1 ⊗ d1
⎛
⎜ .. ⎟
⎜ 1′ . s ′ ⎟
⎜ ⎟
⎜c ⊗d ⎟
⎜ ⎟
C⊗D =⎜
⎜ .. ⎟
⎜ ′ . ⎟
⎜ cr ⊗ d 1′ ⎟
⎟
⎜ ⎟
⎜ .. ⎟
⎝ . ⎠
′ ′
cr ⊗ ds
so
′ ′ ⎞
c1 ⊗ d j
⎛
[(C ⊗ D)A]( j ) .. j′
=⎝ ⎠ A = (C ⊗ d )A. (1.9)
⎜ ⎟
.
′ ′
cr ⊗ d j
AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG
⎛ ⎞ ⎛ (1) ⎞
(A1 )1 . ⊗ B1 + · · · + (AG )1 . ⊗ BG A τG1n B
=⎝ .
. .
. ⎟ ⎜ .. .. .. ⎟ .
⎠=⎝ .
⎜
. . . .⎠
(m)
(A1 )m . ⊗ B1 + · · · + (AG )m . ⊗ BG A τG1n B
12 Mathematical Prerequisites
aτn1G B = (a ′ ⊗ IG )B.
Proof: Clearly,
aτn1G B = a1 ⊗ B1 + · · · + an ⊗ Bn
a1 ⊗ B1 = a1 B1
so
a τn1G B = a1 B1 + · · · + an Bn = (a ′ ⊗ IG )B.
aτn11 B = a ′ B = Bτn11 a.
Proof: If we partition B as
⎛ ⎞
B1
B = ⎝ ... ⎠
⎜ ⎟
Bm
C(Aτm1G B) = C(A1 . ⊗ B1 + · · · + Am . ⊗ Bm )
= A1 . ⊗ CB1 + · · · + Am . ⊗ CBm = Aτm1r (Im ⊗ C )B.
of the basic operators, which are particularly useful when we are deal-
ing with partitioned matrices. Theorems involving these operators and the
cross-product operator are presented in the following sections.
am
These basic relationships mean that results for one of the operators can be
readily obtained from results for the other operator.
Both operators are connected with the Kronecker product operator.
From
ab ′ = b ′ ⊗ a = a ⊗ b ′ ,
a property noted in Section 1.2, it is clear that the jth column of ab ′ is b j a
and the ith row of ab ′ is ai b ′ , so
vec ab ′ = vec(b ′ ⊗ a) = b ⊗ a (1.11)
14 Mathematical Prerequisites
and
rvec ab ′ = rvec(a ⊗ b ′ ) = a ′ ⊗ b ′ .
More generally, if A, B, and C are three matrices such that the product ABC
is defined, then
vec ABC = (C ′ ⊗ A)vec B
and
rvec ABC = rvec B(A ′ ⊗ C ).
Often, we will have occasion to take the vec of a partitioned matrix. Let
A be a m×np matrix and partition A so that A = (A1 . . . A p ), where each
submatrix is m×n. Then, it is clear that
⎛ ⎞
vec A1
⎜ . ⎟
vec A = ⎝ .. ⎠ .
vec A p
An application of this result follows. Suppose B is any n×q matrix and
consider
A(Ip ×B) = (A1 B . . . A p B).
Then,
⎛ ⎞ ⎛ ⎞
vec A1 B Iq ⊗ A1
.. ..
vec A(Ip ⊗ B) = ⎝ ⎠=⎝ ⎠ vec B.
⎜ ⎟ ⎜ ⎟
. .
vec A p B Iq ⊗ A p
If A is a m×n matrix and x is any vector, then
⎛ ⎞
a1 ⊗ x
vec(A ⊗ x) = vec(a1 ⊗ x . . . an ⊗ x) = ⎝ ... ⎠ = vec A ⊗ x
⎜ ⎟
an ⊗ x
′
vec(x ⊗ A) = vec(x1 A . . . xn A) = x ⊗ vec A. (1.12)
and
vec(A ⊗ x ′ ) = vec(a1 ⊗ x ′ . . . an ⊗ x ′ )
vec(a1 ⊗ x ′ )
⎛ ⎞ ⎛ ⎞
x ⊗ a1
.. ⎟ ⎜ .. ⎟
=⎝ ⎠ = ⎝ . ⎠ = vec(x ⊗ a1 . . . x ⊗ an )
⎜
.
vec(an ⊗ x ′ ) x ⊗ an
= vec(x ⊗ (a1 . . . an )) = vec(x ⊗ A), (1.13)
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 15
where in our analysis we have used Equations 1.11 and 1.5. Using Equations
1.12 and 1.13, we have that if x and y are any vectors
a1′ x
⎛ ⎞
an′ x
By taking transposes and using the fact that rvec A ′ = (vec A) ′ , we get the
corresponding results for the rvec operator.
Bn
so
(vec A ′ ⊗ IG )(Ip ⊗ B) = a11 B1 + · · · + an1 Bn . . . a1p B1 + · · · + anp Bn
= (a11 B1 . . . a1p B1 ) + · · · + (an1 Bn . . . anp Bn )
′ ′
= a1 ⊗ B1 + · · · + an ⊗ Bn = Aτn1G B.
Theorem 1.10 Let A and B be m×n and p×q matrices, respectively. Then,
Im τm1p (A ⊗ B) = rvec A ⊗ B.
A = (A1 . . . A p )
The operator that does this for us is called the generalized vec of order n,
denoted by vecn . To form vecn A, we stack columns of A underneath each
other taking n at a time. Clearly, this operator is only performable on A if
the number of columns of A is a multiple of n. Under this notation,
vec A = vec1 A.
In a similar fashion, if A is partitioned into its rows we know that the rvec
operator forms a row vector out of the elements of A by stacking the rows
of A alongside each other. If A has a large number of rows, say, A is mp×n
we often have occasion to partition A into p m×n matrices, so we write
⎛ ⎞
A1
⎜ .. ⎟
A=⎝ . ⎠
Ap
where each submatrix is m×n matrix. Again we may want to stack these
submatrices alongside each other instead of underneath each other, to form
the m×np matrix
(A1 . . . A p ).
The operator that does this for us is called the generalized rvec of order m
denoted by rvecm . To form rvecm A, we stack rows of A alongside each other
taking m at a time, so this operator is only performable on A if the number
of rows of A is a multiple of m. Under this notation,
rvec A = rvec1 A.
For a given matrix A, which is m×n, the number of generalized vecs (rvecs)
that can be performed on A clearly depends on the number of columns
n(rows m) of A. If n(m) is a prime number, then only two generalized
vec (rvec) operators can be performed on A, vec1 A = vec A and vecn A =
A, rvec1 A = rvec A, and rvecm A = A.
For n(m) any other number, the number of generalized vec (rvec) oper-
ators that can be performed on A is the number of positive integers that
divide into n(m).
As with the vec and rvec operators, the vecn and rvecn operators are
intimately connected. Let A be a m×np matrix and, as before, write
A = (A1 . . . A p )
20 Mathematical Prerequisites
an ⊗ B
As a special case, vecq (a ′ ⊗ B) = a ⊗ B.
′
Now write A = (a1 . . . am ) ′ , where ai is the ith row of A. Then,
⎛ 1′ ⎞
a ⊗B
A⊗B =⎝
⎜ .. ⎟
. ⎠
′
am ⊗ B
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 21
so
′ ′
rvec p (A ⊗ B) = (a1 ⊗ B . . . am ⊗ B) = rvec A ⊗ B, (1.17)
and as a special case rvec p (a ⊗ B) = a ′ ⊗ B.
The generalized vec of a matrix can be undone by taking the appropriate
generalized rvec of the vec. This property induced the author to originally
call generalized rvecs, generalized devecs (see Turkington (2005)). If A is
m×n, for example, then clearly
rvecm (vec A) = A.
In fact, if vec j A refers to a generalized vec operator that is performable on
A, then the following relationships exist between the two operators
rvec(vec A) = (vec A) ′ = rvec A ′ ,
rvecm (vec j A) = A
rvec(vec j A) = 1×mn vectors where elements are obtained
from a permutation of those of (vec A) ′ .
In a similar fashion, the generalized vec operator can be viewed as undoing
the rvec of a matrix.
If rveci A refers to a generalized rvec operator that is performable on A,
then we have
vec(rvec A) = vec A ′ = (rvec A) ′
vecn (rveci A) = A
vec(rveci A) = mn×1 vectors whose elements are obtained
from a permutation of those of vec A ′ .
There are some similarities between the behavior of vecs on the one hand
and that of generalized vecs on the other. For example, if A is an m×n
matrix, then as
A = A In In
we have
vec A = (In ⊗ A)vec In .
If A be an m×nG matrix, we have the following theorem:
An
by Equation 1.16.
AB = (AB1 . . . AB p )
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 23
so
⎛ ⎞
AB1
⎜ . ⎟
vecG AB = ⎝ .. ⎠ = (Ip ⊗ A)vecG B.
AB p
′ ′
= vecG c 1 ⊗ A1 D + · · · + c r ⊗ Ar D
⎛ ⎞ ⎛ ⎞
c11 A1 D cr1 Ar D
⎜ . ⎟ ⎜ . ⎟
= ⎝ .. ⎠ + · · · + ⎝ .. ⎠
c1p A1 D cr p Ar D
24 Mathematical Prerequisites
Ar
= (c1′ ⊗ Im )(vecs A)D.
The result follows.
ym Ip
Proof:
(AτGmn B) ′ = (A1 ⊗ B1 + · · · + AG ⊗ BG ) ′
= A1′ ⊗ B1′ + · · · + AG′ ⊗ BG′ .
Now A ′ = (A1′ . . . AG′ ) where each Ai′ is p×m so
⎛ ′⎞
A1
′ ⎜ .. ⎟
vecm A = ⎝ . ⎠ .
AG′
Similarly,
B1′
⎛ ⎞
vecn B ′ = ⎝ ... ⎠
⎜ ⎟
BG′
where each submatrix B ′j is q×n so the result holds.
So
′ ′
IG τG1m A = e1G ⊗ A1 + · · · + eGG ⊗ AG = (A1 . . . AG ).
Theorem 1.17 Let A and B be mG× p and nG×q matrices, respectively, and
partition A and B as in Equation 1.7. Then,
vecq (AτGmn B) = vec(rvecm A)τG,mp,n B.
vec AG
But
⎛ ⎞
vec A1
⎜ .. ⎟
⎝ . ⎠ = vec(A1 . . . AG ) = vec(rvecm A).
vec AG
Proof: Write
A(B ⊗ C ) = A(B ⊗ Im )(Iq ⊗ C )
and partition A as A = (A1 . . . AG ) where each submatrix in this partitioning
is p×m.
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 27
Then,
′
b1 ⊗ Im
⎛ ⎞
A(B ⊗ Im ) = (A1 . . . AG ) ⎝
⎜ .. ⎟
. ⎠
′
bG ⊗ Im
′ ′
= A1 b1 ⊗ Im + · · · + AG bG ⊗ Im
′ ′
= b1 ⊗ A1 + · · · + bG ⊗ AG = BτG1p vecm A,
so,
A(B ⊗ C ) = (BτG1p vecm A)(Iq ⊗ C ) = BτG1p (vecm A)C,
by Theorem 1.5.
TWO
Zero-One Matrices
2.1 Introduction
A matrix whose elements are all either one or zero is, naturally enough,
called a zero-one matrix. Such matrices have had a long association with
statistics and econometrics, although their prominence has really come to
the fore with the advent of matrix calculus. In this chapter, the intent is not to
give a list of all known zero-one matrices plus their properties. The reader
is referred to Magnus (1988), Magnus and Neudecker (1999), Lutkepohl
(1996), and Turkington (2005) for such material. Instead, what is presented
are zero-one matrices that may be new to the reader, but which I have found
useful in the evaluation of certain matrix calculus results. Having said that,
I do talk about some known zero-one matrices and their properties in
order for the reader to have a full understanding of the new matrices. The
later sections of this chapter are reserved for theorems linking the zero-one
matrices with the mathematical operators we looked at in Chapter 1.
AS = (a1 a4 a5 ) = B.
28
2.2 Selection Matrices and Permutation Matrices 29
when talking about the jth column of this matrix, we must specify the exact
location of this column. We do this by writing
j = (d − 1)q + j,¯
i = (c − 1)p + i¯
j = (d − 1)q + j¯
′ p′ q
= ecm Aedn ⊗ ei¯ Be j¯
= acd bi¯ j¯.
2.2 Selection Matrices and Permutation Matrices 31
7=1×4+3
9=1×5+4
Imn = Im ⊗ In
j = (d − 1)n + j¯
e mn
m n
m n
j = (Im ⊗ In ) ed ⊗ e j¯ = ed ⊗ e j¯ .
I6 = I3 ⊗ I2 .
5 = 2 × 2 + 1,
Sometimes, we wish to retrieve the element ai j from the vec A or from the
rvec A. This is a far simpler operation, as shown by Theorem 2.2.
32 Zero-One Matrices
Proof: We have
′
ai j = eim Ae nj .
′ ′
But ai j = vecai j = (e nj ⊗ eim )vec A. Also, ai j = rvec ai j = (rvec A)
(ei ⊗ e nj ).
m
The concept of a selection matrix can be generalized to handle the case where
our matrices are partitioned matrices. Suppose A is an m × nG matrix and
we partition A as
A = (A1 . . . An ) (2.1)
Ai = A ein ⊗ IG .
B = A(S ⊗ IG )
CG
If we wish to form
⎛ ⎞
C2
D = ⎝ C3 ⎠
C7
′ ⎞
e2G
⎛
′
we premultiply C by the selection matrix S ⊗ Im where S = ⎝ e3G ⎠.
′
e7G
Finally, staying with the same partition of C notice that
⎛ m′ ⎞ ⎛ ⎞
e j C1 (C1 ) j .
IG ⊗ e mj C = ⎝ ... ⎠ = ⎝ ... ⎠ = C ( j ) ,
′
⎜ ⎟ ⎜ ⎟
′
e mj CG (CG ) j .
where we use the notation introduced by Equation 1.8 in Chapter 1. That
′
is, (IG ⊗ e mj ) is the selection matrix that selects C ( j ) from C.
Sometimes instead of selecting rows or columns from a matrix A, we
want to rearrange the rows or columns of A. The zero-one matrix that
does this for us is called a permutation matrix. A permutation matrix P is
obtained from a permutation of the rows or columns of an identity matrix.
The result is a matrix in which each row and each column of the matrix
contains a single element, one, and all the remaining elements are zeros.
As the columns or rows of an identity matrix form an orthonormal set
of vectors, it is quite clear that every permutation matrix is orthogonal,
that is, P ′ = P −1 . Where a given matrix A is premultiplied (postmultiplied)
by a permutation matrix, formed from the rows (columns) of an identity
matrix, the result is a matrix whose rows (columns) are obtained from a
permutation of the rows (columns) of A.
As with selection matrices, the concept of permutation matrices can be
generalized to handle partitioned matrices. If A is m × nG and we partition
A as in Equation 2.1, and we want to rearrange the submatrices in this
partitioning, we can do this by post multiplying A by
P ⊗ IG
where P is the appropriate permutation matrix formed from the columns
of the identity matrix In .
Similarly, if we want to rearrange the submatrices in C given by Equation
2.2, we premultiply C by
P ⊗ Im
34 Zero-One Matrices
where P is the appropriate permutation matrix formed from the rows of the
identity matrix IG .
Also,
′ ′
Eimn
j = e nj eim = E nm
ji .
Similarly,
rvec p (a ⊗ b) = a′ ⊗ b = ba′
2.4 The Commutation Matrix 35
(A ⊗ B).j = ad ⊗ b j¯
whereas
a1
⎛ ⎞
vec A′ = ⎝ ... ⎠ .
⎜ ⎟
am
Clearly, both vec A and vec A′ contain all the elements of A, although
arranged in different orders. It follows that there exists a mn × mn
36 Zero-One Matrices
This matrix is called the commutation matrix. The order of the subscripts
is important. The notation is that Kmn is the commutation matrix associated
with an m × n matrix A and takes vec A to vec A′ . On the other hand, Knm is
the commutation matrix associated with an n × m matrix and as A′ is such
a matrix it follows that
Using Equation 2.7, it follows that the two commutation matrices are linked
by
−1 ′
so it follows that Knm = Kmn = Kmn , where the last equality comes about
because Kmn , like all permutation matrices, is orthogonal.
If the matrix A is a vector a, so m = 1, we have that
vec a = vec a′
so
K1n = Kn1 = In .
The commutation matrix can also be used to take us from a rvec to a vec.
For A as previously, we have
There are several explicit expressions for the commutation matrix. Two of
the most useful, particularly when working with partitioned matrices are
these:
′ ⎤
In ⊗ e1m
⎡
Kmn = ⎣ .. ⎥ n n
⎦ = Im ⊗ e1 . . . Im ⊗ en , (2.8)
⎢
.
m′
In ⊗ em
2.4 The Commutation Matrix 37
Kmn = ⎣ .. .. ..
⎦=⎣ ⎦.
⎢ ⎥ ⎢ ⎥
. . .
′ ′ ′
m
In ⊗ em e1n ⊗ em
m
... m
enn ⊗ em
We have occasion to use this expression for Kmn throughout this book.
′
Notice that Knn is symmetric and is its own inverse. That is, Knn = Knn and
Knn Knn = In2 , so Knn is a symmetric idempotent matrix.
For other expressions, see Magnus (1988), Graham (1981), and Hender-
son and Searle (1979).
Large commutation matrices can be written in terms of smaller commu-
tation matrices as the following result shows. (See Magnus (1988), Chapter
3).
Moreover,
Kpm (A ⊗ b) = b ⊗ A (2.13)
Kmp (b ⊗ A) = A ⊗ b. (2.14)
If B is m × n, then
Kpm (A ⊗ bc ′ ) = b ⊗ A ⊗ c ′
Kmp (bc ′ ⊗ A) = c ′ ⊗ A ⊗ b.
A ⊗ B = A ⊗ ⎝ ... ⎠ = ⎝ ..
⎠. (2.15)
⎜ ⎟ ⎜ ⎟
.
p′ p′
b A⊗b
2.4 The Commutation Matrix 39
However, the last matrix in Equation 2.15 can be achieved from A ⊗ B using
a commutation matrix as the following theorem shows.
Kpm (A ⊗ B) = ⎝ ..
⎠.
⎜ ⎟
.
′
A ⊗ bp
A ⊗ B = A ⊗ (b1 . . . bq ) = (A ⊗ b1 . . . A ⊗ bq ). (2.16)
(A ⊗ B)Knq = (A ⊗ b1 . . . A ⊗ bq ).
Notice that
′ ′ ⎞
a1 ⊗ b1
⎛
⎜ .. ⎟
⎜
⎜ m′ . ⎟
⎜ a ⊗ b1′ ⎟
⎟
⎜ ⎟
Kpm (A ⊗ B) = ⎜
⎜ .. ⎟
⎜ ′ . ⎟
⎜ a1 ⊗ b p′ ⎟
⎟
⎜ ⎟
⎜ .. ⎟
⎝ . ⎠
′ ′
am ⊗ b p
so using the operator introduced in Section 1.3 of Chapter 1, we have
⎛ j′ ′ ⎞ ⎛ 1′ j ′
a ⊗ b1
⎞
b (a ⊗ Iq )
(Kpm (A ⊗ B))( j ) = ⎝ .. .. j′
⎠=⎝ ⎠ = B(a ⊗ Iq ).
⎜ ⎟ ⎜ ⎟
. .
′ ′ ′ ′
a j ⊗ bp b p (a j ⊗ Iq )
(2.17)
This result will be useful to us in Chapter 5.
In our work in Chapter 4, we have occasion to consider the ith row
of Kpm (A ⊗ B). From Theorem 2.3, it is clear that in obtaining this row
we must specify exactly where the ith row is located in this matrix. If i is
′ ′
between 1 and m, the ith row is ai ⊗ b1 , if between m + 1 and 2m it is
′ ′
ai ⊗ b2 and so on, until i is between (p − 1)m and pm, in which case the
′ ′
ith row is ai ⊗ b p . To cater for all possibilities, we use the device introduced
in Section 2.2 of this chapter. We write
i = (c − 1)m + i¯
for c some value between 1 and p and i¯ some value between 1 and m. Then,
¯′ ′
Kpm (A ⊗ B) i. = ai ⊗ bc
(2.18)
where bc is q × 1. Taking the vecq of both sides of Equation 2.18, we have
¯ ′ p′ ′ mp
vecq Kpm (A ⊗ B) i. = ai bc = A′ eim
¯ ec B = A Eic
¯ B. (2.19)
In comparing Equation 2.19 with Equation 2.3, we note that the difference
in taking the vecq of [Kpm (A ⊗ B)]i . as compared to taking the vecq of
(A × B)i . is that the subscripts of the elementary matrix are interchanged.
Undoing the vecq by taking the rvec of each side, we get another way of
writing [Kpm (A ⊗ B)]i. namely
np
Kpm (A ⊗ B) i. = rvec A′ Eic¯ B.
(2.20)
2.4 The Commutation Matrix 41
We will also have occasion to consider the jth column of [(A ⊗ B)Knq ].
Referring to Theorem 2.4 again, we have to specify exactly where the jth
column is in the matrix. Conducting a similar analysis leads us to write
j = (d − 1)n + j¯
where d takes a suitable value between 1 and q and j¯ takes a suitable value
between 1 and n. Then,
(A ⊗ B)Knq . j = a j¯ ⊗ bd (2.21)
q ′ qn
rvec p (A ⊗ B)Knq . j = bd a j¯′ = Bed e nj¯ A′ = BEd j¯ A′ .
(2.22)
Again, comparing Equation 2.22 with Equation 2.5, we see that the sub-
scripts of the elementary matrix are interchanged. Undoing the rvec p by
taking the vec, we get another way of writing [(A ⊗ B)Knq ].j , namely
qn
(A ⊗ B)Knq . j = vec BEd j¯ A′ .
BG
Note that
⎛ ⎞⎛ ⎞
Kmp O b1 ⊗ A
.. ⎟ ⎜ .. ⎟
(Iq ⊗ Kmp )(vec B ⊗ A) = ⎝
⎜
. ⎠⎝ . ⎠
O Kmp bq ⊗ A
⎛ ⎞ ⎛ ⎞
Kmp (b1 ⊗ A) A ⊗ b1
.. ⎟ ⎜ .. ⎟
=⎝ ⎠ = ⎝ . ⎠,
⎜
.
Kmp (bq ⊗ A) A ⊗ bq
so we have
⎛ ⎞
A ⊗ b1
⎜ . ⎟
(Kqm ⊗ Ip )(A ⊗ vec B) = ⎝ .. ⎠ = (Ip ⊗ Kmp )(vec B ⊗ A). (2.23)
A ⊗ bq
A consequence of Theorems 2.3 and 2.5, which is useful for our work
throughout many chapters, is the following result.
2.4 The Commutation Matrix 43
by Theorem 2.3.
B = (B1 . . . BG )
Then,
(A ⊗ B)(KnG ⊗ Iq ) = (A ⊗ B1 . . . A ⊗ BG ).
= A ⊗ B e1G ⊗ Iq . . . A ⊗ B eGG ⊗ Iq
= (A ⊗ B1 . . . A ⊗ BG ).
product of vecs and vice versa. Partitioning both A and B into their columns,
we have:
A = (a1 . . . an ), B = (b1 . . . bq ).
A ⊗ B = (a1 ⊗ b1 . . . a1 ⊗ bq . . . an ⊗ b1 . . . an ⊗ bq ),
so
⎛ ⎞
a1 ⊗ b1
⎜ .. ⎟
⎜
⎜ . ⎟
⎟
⎜ a1 ⊗ bq ⎟
⎜ ⎟
vec(A ⊗ B) = ⎜ ..
⎟,
⎜ ⎟
⎜ . ⎟
⎜a ⊗ b ⎟
⎜ n 1⎟
⎜ .. ⎟
⎝ . ⎠
an ⊗ bq
whereas
⎛ ⎞⎞ ⎛
b1
⎜ a ⊗ ⎜ .. ⎟ ⎟
⎜ 1 ⎝ . ⎠⎟
⎜ ⎟
⎜
⎜ bq ⎟ ⎟
vec A ⊗ vec B = ⎜ ..
⎟.
⎜ ⎟
⎜ ⎛. ⎞ ⎟
⎜
⎜ b1 ⎟ ⎟
⎜ . ⎟⎟
⎝ an ⊗ ⎝ .. ⎠ ⎠
⎜
bq
Clearly, both vectors have the same elements, although these elements are
rearranged in moving from one vector to another. Each vector must then
be able to be obtained by premultiplying the other by a suitable zero-one
matrix.
An application of Corollary 2.1 gives the following theorem:
Theorem 2.8
Proof: We write
bq
⎛ ⎛ ⎛ ⎞⎞ ⎞
b1
⎜ (K ⊗ I ) ⎜a ⊗ ⎜ .. ⎟⎟ ⎟
⎜ qm p ⎝ 1 ⎝ . ⎠⎠ ⎟
⎜ ⎟
⎜
⎜ bq ⎟
⎟
⎜ ..
= ⎜. ⎟ = vec (A ⊗ B),
⎟
⎜ ⎛ ⎛ ⎞⎞ ⎟
⎜
⎜ b1 ⎟
⎟
⎜ .. ⎟⎟ ⎟
⎝ (Kqm ⊗ Ip ) ⎝an ⊗ ⎝ . ⎠⎠ ⎠
⎜ ⎜
bq
−1
using Corollary 2.1. As Kmq = Kqm the inverse of (In ⊗ Kqm ⊗ Ip ) is
(In ⊗ Kmq ⊗ Ip ), which gives the second result.
Theorem 2.8 and Equation 2.23 can also be used to show that vec(A ⊗ B)
can be written in terms of either vec A or vec B:
Theorem 2.9
⎡ ⎛ ⎞⎤
Im ⊗ b1
..
vec(A ⊗ B) = ⎣In ⊗ ⎝ ⎠⎦ vec A,
⎢ ⎜ ⎟⎥
.
Im ⊗ bq
and
⎡⎛ ⎞ ⎤
Iq ⊗ a1
vec(A ⊗ B) = ⎣⎝ ... ⎠ ⊗ Ip ⎦ vec B.
⎢⎜ ⎟ ⎥
Iq ⊗ an
2.4 The Commutation Matrix 47
Kmn A = ⎝ .. ..
⎠A = ⎝ . ⎠.
⎜ ⎟ ⎜ ⎟
.
m′ m′
In ⊗ em In ⊗ em A
But,
′ ⎞ ⎛ m′ ⎞ ⎛
e mj
⎛ ⎞⎛ ⎞
O A1 e j A1 (A1 ) j·
⎟ ⎜ .. ⎟ ⎜ .. ⎟ ⎜ .. ⎟
′
In ⊗ e mj A = ⎝
⎜ .. ⎠⎝ . ⎠ = ⎝ . ⎠ = ⎝ . ⎠.
.
′ ′
O e mj An e mj An (An ) j·
Notice that when we use this property of Kmn , the second subscript of the
commutation matrix refers to the number of submatrices in the partition
2.4 The Commutation Matrix 49
of A whereas the first subscript refers to the number of rows in each of the
submatrices of the partition of A. Thus,
⎛ (1) ⎞
A
⎜ .. ⎟
Knm A = ⎝ . ⎠
A(n)
where the stacking of the rows refer to a different partitioning of A namely
⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟
Am
and now each submatrix in this partitioning is n × p.
A similar discussion can be made from the case where we postmultiply
an p × mn matrix B by Kmn .
Consider the case where A is a Kronecker product, say A = B ⊗ C where
B is an n × r matrix and C is an m × s matrix. Then,
⎛ 1′ ⎞
b ⊗C
B ⊗C = ⎝
⎜ .. ⎟
. ⎠
′
bn ⊗ C
′
where each submatrix bi ⊗ C is m × rs, for i = 1, . . . , n. In Section 1.2 of
′ ′ ′
Chapter 1, we saw that the jth row of bi ⊗ C is bi ⊗ c j , so
′ ⎞
B ⊗ c1
⎛
Kmn (B ⊗ C ) = ⎝
⎜ .. ⎟
. ⎠
′
B ⊗ cm
which we already knew from Theorem 2.3.
Notice also that as Knm Kmn = Imn , we have
⎛ (1) ⎞ ⎛ ⎞
A A1
⎜ .. ⎟ ⎜ .. ⎟
Knm Kmn A = Knm ⎝ . ⎠ = ⎝ . ⎠ .
A(m) An
That is, premultiplying Kmn A as given in Theorem 2.10 by Knm takes us back
to the original partitioning.
More will be made of this property of the commutation matrix in Section
2.7 of this chapter where we discuss a new zero-one matrix called a twining
matrix.
50 Zero-One Matrices
C = (c1 . . . cG ).
Then,
Recall that Theorem 1.6 of Section 1.3 in Chapter 1 demonstrated that for
A a mG × p matrix and B a nG × q matrix, we can write A τGmn B in terms
of a vector of τG1n cross-products, namely
⎛ (1) ⎞
A τG1n B
A τGmn B = ⎝ ..
⎠.
⎜ ⎟
.
A(m) τG1n B
2.4 The Commutation Matrix 51
Knm (A τGmn B) = ⎝ ..
⎠.
⎜ ⎟
.
A τGm1 B (n)
Proof: We have
A τGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG
and from Theorem 2.3
⎛ ⎞
Ai ⊗ (Bi )1·
Knm (Ai ⊗ Bi ) = ⎝
⎜ .. ⎟
. ⎠
Ai ⊗ (Bi )n·
for i = 1, . . . , G. It follows that
Knm (A τGmn B)
A τGm1 B (1)
⎛ ⎞ ⎛ ⎞
A1 ⊗ (B1 )1· + · · · + AG ⊗ (BG )1·
=⎝ .. .. ..
⎠=⎝ ⎠.
⎜ ⎟ ⎜ ⎟
. . .
A1 ⊗ (B1 )n· + · · · + AG ⊗ (BG )n· A τGm1 B (n)
The following theorems tell us what happens when the commutation matrix
appears in the cross-product.
It follows that the left-hand side of Equation 2.29 can be written as:
vecb1′ A
⎛ ⎞
⎜ .. ⎟ ′ ′
′ ′
⎝ . ⎠ = vec (b1 A . . . bn A) = vec b1 . . . bn (In ⊗ A)
vecb′n A
= vec (vec B)′ (In ⊗ A) = (In ⊗ A′ )vec B = vec A′ B.
54 Zero-One Matrices
KmG =⎝
⎜ .. ⎟
. ⎠
m′
IG ⊗ em
so if we partition A as
⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟
Am
where each submatrix is n × p, we have
′
m′
KmG τmGn A = IG ⊗ e1m ⊗ A1 + · · · + IG ⊗ em
⊗ Am
⎛ m′ ⎞ ⎛ m′ ⎞
e1 ⊗ A1 O em ⊗ Am O
=⎝
⎜ . .. ⎠ + ··· + ⎝
⎟ ⎜ .. ⎟
. ⎠
′ ′
O e1m ⊗ A1 O m
em ⊗ Am
⎛ ⎞
(A1 O . . . O) O
=⎝
⎜ .. ⎠ + ...
⎟
.
O (A1 O . . . O)
⎛ ⎞
(O . . . OAm ) O
+⎝
⎜ .. ⎟
. ⎠
O (O . . . OAm )
⎛ ⎞
(A1 . . . Am ) O
=⎝
⎜ .. ⎠ = IG ⊗ rvecn A.
⎟
.
O (A1 . . . Am )
Now, by Theorems 2.11 and 2.16
AτmnG KmG = KnG (KmG τmGn A)KmG,p = KnG (IG ⊗ rvecn A)KmG,p .
2.4 The Commutation Matrix 55
and
Interchanging the n and G in the second of these equations gives the result
that
Proof: Write
AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG
= (A1 ⊗ In )(Ip ⊗ B1 ) + · · · + (AG ⊗ In )(Ip ⊗ BG )
⎛ ⎞
I p ⊗ B1
= (A1 ⊗ In . . . AG ⊗ In ) ⎝
⎜ .. ⎟
. ⎠
I p ⊗ BG
= ((A1 . . . AG ) ⊗ In )(KG p ⊗ In )(Ip ⊗ B),
It follows that
′ ′
vecm AτG pn (IG ⊗ B) = A1 ⊗ e1G ⊗ B + · · · + AG ⊗ eGG ⊗ B.
But from the definition of the commutation matrix given in Equation 2.8,
′
Im ⊗ e1G ⊗ Iq
⎛ ⎞
(A ⊗ B)(KGm ⊗ Iq ) = (A1 ⊗ B . . . AG ⊗ B) ⎝
⎜ .. ⎟
. ⎠
′
Im ⊗ eGG ⊗ Iq
′ ′
= A1 Im ⊗ e1G ⊗ B + · · · + AG Im ⊗ eGG ⊗ B
′ ′
= A1 ⊗ e1G ⊗ B + · · · + AG ⊗ eGG ⊗ B.
One final theorem involving cross-products and commutation matrices:
and
(A ⊗ B)Knq = Kmp (B ⊗ A) = ⎝ ..
⎠.
⎜ ⎟
.
′
B ⊗ am
Partitioning C as
⎛ ⎞
C1
C = ⎝ ... ⎠
⎜ ⎟
Cm
2.5 Generalized Vecs and Rvecs of the Commutation Matrix 57
Kpm (A ⊗ B) = ⎝
⎜ .. ⎟
. ⎠
p′
A⊗b
Kmn .. ⎥ n n
=⎣ ⎦ = Im ⊗ e1 . . . Im ⊗ en ,
⎢
.
m′
In ⊗ em
Im ⊗ e1n
⎡ ⎤
For example,
100000010000001000
rvec2 K32 = ,
000100000010000001
and
100
⎡ ⎤
⎢ 000 ⎥
⎢ ⎥
⎢ 010 ⎥
⎢ ⎥
⎢ 000 ⎥
⎢ ⎥
⎢ 001 ⎥
⎢ ⎥
⎢ 000 ⎥
vec3 K32 = ⎢
⎢ ⎥.
⎢ 000 ⎥
⎥
⎢ 100 ⎥
⎢ ⎥
⎢ 000 ⎥
⎢ ⎥
⎢ 010 ⎥
⎢ ⎥
⎣ 000 ⎦
001
Im ⊗ e1G
⎡ ⎛ ⎞⎤
vec(A ⊗ IG ) = ⎣In ⊗ ⎝ ..
⎠⎦ vec A.
⎢ ⎜ ⎟⎥
.
Im ⊗ eGG
In ⊗ e1G
⎡⎛ ⎞ ⎤
In ⊗ eGG
= (vecn KnG ⊗ Im )vec A, (2.34)
Theorem 2.20
Proof: Write
Im ⊗ e1n
⎞ ⎛ m
e1 ⊗ e1n m
⊗ e1n
⎛ ⎞
··· em
vecm Kmn .
.. .. ..
=⎝ ⎠=⎝ ⎠,
⎜ ⎟ ⎜ ⎟
. .
Im ⊗ enn e1m ⊗ enn ··· m
em ⊗ enn
so
e1m ⊗ e1n
⎛ ⎞
⎜ .. ⎟
⎜ m . n⎟
⎜ ⎟
⎜ e1 ⊗ en ⎟
⎜ ⎟
vec(vecm Kmn ) = ⎜
⎜ .. ⎟
⎜ . ⎟
⎟
⎜ em ⊗ en ⎟
⎜ m 1⎟
⎜ .. ⎟
⎝ . ⎠
m n
em ⊗ en
m
= vec e1m ⊗ e1n . . . e1m ⊗ enn . . . em m
⊗ e1n . . . em ⊗ enn
Now,
′ ′
vec(vecm Kmn )′ = vec Im ⊗ e1n . . . Im ⊗ enn
′ ⎞
vec Im ⊗ e1n vec e1n ⊗ Im
⎛ ⎛ ⎞
=⎝ .. ..
⎠=⎝
⎜ ⎟ ⎜ ⎟
. . ⎠
n′
n
vec Im ⊗ en vec en ⊗ Im
n n
= vec e1 ⊗ Im . . . en ⊗ Im = vec Inm ,
where in our working we have used Equation 1.13 of Chapter 1.
so the equivalent results for rvecn KGn are found by taking the transposes of
Equations 2.37 and 2.36. They are
rvecn KGn = [In ⊗ rvec IG ](KnG ⊗ IG ) = [rvec IG ⊗ In ](IG ⊗ KGn ).
(2.39)
Other results for vecn KnG and rvecG KnG can be obtained in a similar manner.
For example, if A is an m × n matrix and B is an p × q matrix, we know
that
Kpm (A ⊗ B) = (B ⊗ A)Kqn .
Then, taking the vecq of both sides, using Theorem 1.12 of Section 1.4.3 in
Chapter 1, we have
(In ⊗ Kpm )vecq (A ⊗ B) = (In ⊗ B ⊗ A)vecq Kqn .
That is,
(In ⊗ B ⊗ A)vecq Kqn = (In ⊗ Kpm )(vec A ⊗ B)
⎛ ⎞
B ⊗ a1
= ⎝ ... ⎠ ,
⎜ ⎟
B ⊗ an
by Equation 2.23.
If b is an p × 1 vector, we know that
Kpm (A ⊗ b) = b ⊗ A
Kmp (b ⊗ A) = A ⊗ b.
Taking the generalized rvecs of both sides of these equations, we have using
Equations 1.18 and 1.19 of Section 1.4.4 in Chapter 1, that
(rvecm Kpm )(Ip ⊗ A ⊗ b) = b′ ⊗ A (2.40)
(rvec p Kmp )(Im ⊗ b ⊗ A) = rvec A ⊗ b. (2.41)
Further results about generalized vecs and rvecs can be obtained by applying
the following theorem:
Theorem 2.21
(rvecG KnG )(KnG ⊗ In )KnG,n = (rvecG KnG )(In ⊗ KGn )Kn,nG = rvecG KnG
Kn,nG (KGn ⊗ In )vecG KGn = KnG,n (In ⊗ KnG )vecG KGn = vec G KGn .
62 Zero-One Matrices
To illustrate the use of Theorem 2.21, write the left-hand side of Equation
2.40 as:
rvecm Kpm [Ip ⊗ Kmp (b ⊗ A)]
= rvecm Kpm (Ip ⊗ Kmp )[Ip ⊗ (b ⊗ A)]
= rvecm Kpm (Ip ⊗ Kmp )Kp,mp ((b ⊗ A) ⊗ Ip )Knp
= rvecm Kpm (b ⊗ A ⊗ Ip )Knp ,
so from Equation 2.40, we have that
(rvecm Kpm )(b ⊗ A ⊗ Ip ) = (b′ ⊗ A)Kpn = A ⊗ b′ .
In a similar fashion, using Equation 2.41, we get
(rvec p Kmp )(A ⊗ b ⊗ Im ) = b ⊗ (vec A)′ .
Similar results can be achieved by taking the appropriate generalized vec of
both sides of Equation 2.40 and 2.41, but the details are left to the reader.
For a final example of the use of this technique, consider A an m × n
matrix and consider the basic definition of the commutation matrix Kmn ,
namely
Kmn vec A = vec A′ .
Taking the rvecn of both sides of this equation, we have
(rvecn Kmn )(Im ⊗ vec A) = rvecn vec A′ = A′
but
(rvecn Kmn )(Im ⊗ vec A) = (rvecn Kmn )(Im ⊗ Knm vec A′ )
= (rvecn Kmn )(Im ⊗ Knm )Km,mn (vec A′ ⊗ Im )
= rvecn Kmn (vec A′ ⊗ Im )
2.5 Generalized Vecs and Rvecs of the Commutation Matrix 63
so
as well.
Another theorem linking the generalized rvec of a commutation matrix
with other commutation matrices is as follows:
Theorem 2.22
Consider the first block in this matrix, which using the definition of the
generalized rvec of the commutation matrix given by Equation 2.30 can be
written as
′
Im ⊗ e1G ⊗ Iq
⎛ ⎞
O
q′ ′
..
Im ⊗ e1 . . . Im ⊗ eqq ⎝
⎜ ⎟
. ⎠
′
O Im ⊗ e1G ⊗ Iq
′ ′
q′ ′
= Im ⊗ e1 e1G ⊗ Iq . . . Im ⊗ eqq e1G ⊗ Iq
′ q′ ′ ′
= Im ⊗ e1G ⊗ e1 . . . Im ⊗ e1G ⊗ eqq
′ ′
q ′ ′
= e1 ⊗ Im ⊗ e1G Kq,mG . . . eqq ⊗ Im ⊗ e1G Kq,mG
′
q ′ ′ ′
= e1 ⊗ Im ⊗ e1G . . . eqq ⊗ Im ⊗ e1G (Iq ⊗ Kq,mG )
′
= rvec Iq ⊗ Im ⊗ e1G (Iq ⊗ Kq,mG ).
64 Zero-One Matrices
It follows this that the left-hand side of Equation 2.42 can be written as
′ ⎞
rvec Iq ⊗ Im ⊗ e1G
⎛
..
⎠ (Iq ⊗ Kq,mG )
⎜ ⎟
⎝ .
′
rvec Iq ⊗ Im ⊗ eGG
′
Im ⊗ e1G ⊗ rvec Iq
⎛ ⎞
..
=⎝ ⎠ KmG,q2 (Iq ⊗ Iq,mG )
⎜ ⎟
.
′
Im ⊗ eGG ⊗ rvec Iq
= (KGm ⊗ rvec Iq )(KmG,q ⊗ Iq )
= KmG (ImG ⊗ rvec Iq )(KmG,q ⊗ Iq )
= KmG rvecmG KmG,q
Proof: Write (rvecG KnG )(b ⊗ A) = (rvecG KnG )(b ⊗ InG )A.
Now,
⎛ ⎞
b1 InG
rvecG KnG (b ⊗ InG ) = IG ⊗ e1n . . . IG ⊗ enn ⎝ ... ⎠
′ ′
⎜ ⎟
bn InG
′
′
= b1 IG ⊗ e1n + · · · + bn IG ⊗ enn = IG ⊗ b′ .
= (A1 . . . AG ) ⎝ ..
⎠ = (rvecm A)KG p .
⎜ ⎟
.
′
Ip ⊗ eGG
(rvecn KGn )(D ⊗ B) = rvecn KGn (IG ⊗ B)(D ⊗ Iq ) = (rvecn KGn B)(D ⊗ Iq ).
The following theorem shows there are several ways of writing this matrix.
where a j is the jth column of A. Again using the Equation 2.30, we write
⎛ ⎞
a11 Imp ... a1G Imp
⎜ .. .. ⎟ .
p′ p ′
(rvecm Kpm )(A ⊗ Ipm ) = Im ⊗ e1 . . . Im ⊗ e p ⎝ . . ⎠
a p1 Imp ··· a pG Imp
(A ⊗ B)Knq = (A ⊗ b1 . . . A ⊗ bq ),
But the following theorem shows there are several ways of writing this
matrix, two involving a generalized vec of the commutation matrix.
Now,
But
where we have used Equation 2.9 of Section 2.3. Finally, using Theorem
1.13 of Section 1.4.4 in Chapter 1, we have
The equivalent results for generalized rvec operators are found by taking
transposes. If C is a n × m matrix and D is a q × p matrix, then
For further such theorems on generalized vecs and rvecs of the commuta-
tion, see Turkington (2005).
Now,
KGm AτGmn KGn B = (KGm τGmn KGn )(A ⊗ B) = (Im ⊗ rvecn KGn )(A ⊗ B)
by Theorem 2.16.
2.5 Generalized Vecs and Rvecs of the Commutation Matrix 69
Notice that Theorem 2.28 is easily reconciled with Theorem 2.17 using
Theorem 2.24.
AG
and
⎛ ⎞
A1 ⊗ B
..
′ ′
(rvecm KGm )(A ⊗ B) = Im ⊗ e1G . . . Im ⊗ eGG
⎜ ⎟
⎝ . ⎠
AG ⊗ B
1′ G′
= A1 ⊗ b + · · · + AG ⊗ b = AτGm1 B.
We finish this section with a theorem that gives yet another way of writing
the cross-product of AτGmn B involving this time rvecG KmG A.
A(1) τG1n B
⎛ ⎞
AτGmn B = ⎝ ..
⎠.
⎜ ⎟
.
A(m) τG1n B
70 Zero-One Matrices
A(1)
⎞ ⎛
A(m)
(A . . . A ) and rvec KmG AτG1n B = (A . . . A(m) )τG1n B = (A(1) τG1n
(1) (m) (1)
KnG,G (D ⊗ B) = ⎝ ..
⎠.
⎜ ⎟
.
′
D ⊗ bnG
The result for rvecn KGn is given by the following theorem.
Proof: Write
(rvecn KGn )(D ⊗ B) = (rvecn KnG )(D ⊗ InG )(Ir ⊗ B),
where
(rvecn KnG )(D ⊗ InG ) = (rvecn KnG )(d1 ⊗ InG . . . dr ⊗ InG ).
But,
rvecn KnG (d1 ⊗ InG ) = In ⊗ d1′
by Theorem 2.23.
Proof: Write
′ ⎞
IG ⊗ e1nG
⎛
KnG,G (B ⊗ D) = ⎝ ..
⎠ (B ⊗ D).
⎜ ⎟
.
nG ′
IG ⊗ enG
Consider the first submatrix
′
′ ′
′
′
IG ⊗ e1nG (B ⊗ D) = IG ⊗ e1n ⊗ e1G (B ⊗ D) = IG ⊗ e1n B ⊗ e1G D.
′
But from our work in selection matrices in Section 2.2, (IG ⊗ e1n )B = B (1)
′ ′
and e1G D = d 1 . The other submatrices are analysed in a similar fashion and
the result follows.
For example,
′ ′
⎛ ⎞
e13 ⊗ I3 + I3 ⊗ e13
1⎜⎜ e 3′ ⊗ I + I ⊗ e 3′ ⎟ .
⎟
N3 = 2 3 3 2 ⎠
2 ⎝
′ ′
e33 ⊗ I3 + I3 ⊗ e33
and
1 1
Nn′ = (In2 + Knn )′ = (In2 + Knn ) = Nn
2 2
1 2
Nn Nn = In2 + Knn + Knn + Knn = Nn ,
4
so Nn is symmetric idempotent.
Other properties for Nn can be derived from the corresponding properties
for Knn . If A and B are n × p and n × q matrices, respectively, then
⎛ 1′ ′ ⎞
a ⊗ B + A ⊗ b1
1 1⎜ ..
Nn (A ⊗ B) = (A ⊗ B) + Knn (A ⊗ B) = ⎝ ⎠,
⎟
2 2 .
n′ n′
a ⊗B+A⊗b
(2.48)
Nn (A ⊗ B)Nn = Nn (B ⊗ A)Nn
Nn (A ⊗ A)Nn = Nn (A ⊗ A) = (A ⊗ A)Nn
1
Nn (A ⊗ b) = Nn (b ⊗ A) = (A ⊗ b + b ⊗ A).
2
Additional properties of Nn can be found in Magnus (1988).
74 Zero-One Matrices
′ ′ ′
But eim e nj = e nj ⊗ eim = eim ⊗ e nj , so
′ ′
e1n ⊗ e1m enn ⊗ e1m rvec In ⊗ e1m
⎛ ⎞ ⎛ ⎞
···
Umn =⎝ .. .. .. (2.52)
⎠=⎝
⎜ ⎟ ⎜ ⎟
. . . ⎠
′ ′ m
e1n ⊗ em
m
··· m
enn ⊗ em rvec In ⊗ em
or
′ ′ ⎞
e1m ⊗ e1n e1m ⊗ enn
⎛
···
.. ..
n′ n′
Umn =⎝ = vec I ⊗ e . . . vec I ⊗ en .
⎜ ⎟
. . ⎠ m 1 m
′
m n′
em ⊗ e1n ··· m
em ⊗ en
(2.53)
2.7.1 Introduction
Often, in statistics and econometrics, we work with matrices that are formed
by intertwining the rows (columns) of a set of matrices.
To understand what I mean by intertwining rows of matrices, consider
two m × n matrices A = {ai j } and B = {bi j }. Suppose we want to form a
new matrix C from A and B by intertwining single rows of A and B together,
taking the first row of A as the first row of C. That is,
⎛ ⎞
a11 a12 ... a1n
⎜ ⎟
⎜b b12 ... b1n ⎟
⎜ 11 ⎟
⎜ ⎟
C = ⎜ ... .. .. ⎟ .
⎜
⎜ . . ⎟
⎟
⎜ ⎟
⎜a am2 ... amn ⎟
⎝ m1 ⎠
bm1 bm2 ... bmn
⎛ ⎞
a11 a12 ... a1n
⎜ ⎟
⎜ a21 a22 ... a2n ⎟
⎜ ⎟
⎜ ⎟
⎜ b11 b12 ... b1n ⎟
⎜ ⎟
⎜ ⎟
⎜ b21 b22 ... b2n ⎟
⎜ ⎟
⎜ . .. .. ⎟
⎜ ..
D=⎜ . . ⎟.
⎟
⎜ ⎟
⎜a am−12 ... am−1n ⎟
⎜ m−11 ⎟
⎜ ⎟
⎜ a am2 ... amn ⎟
⎜ m1 ⎟
⎜ ⎟
⎜b bm−12 ... bm−1n ⎟
⎝ m−11 ⎠
bm1 bm2 ... bmn
Clearly, from A and B we can form a new matrix by intertwining any r rows
at a time where r is a divisor of m.
2.7 Twining Matrices 77
AG BG
⎛ ⎞
A1
B ⎟
⎜ ⎟
⎜⎜ 1⎟
A ⎜ .. ⎟
T = ⎜ . ⎟.
B ⎜ ⎟
⎝ AG ⎠
⎜ ⎟
BG
78 Zero-One Matrices
Clearly,
G T is the
(m + p) × (m + p) permutation matrix, where m =
G
i=1 m i and p = j=1 p j given by
Im O ... O · O O ... O
⎛ ⎞
1
⎜O O ... O · Ip1 O ... O ⎟
⎜ ⎟
⎜O Im ... O · O O ... O ⎟
⎜ 2 ⎟
O O ... O · O Ip ... O
⎜ ⎟
T =⎜ 2
⎟.
⎜ .. .. .. .. .. ..
⎜ ⎟
⎜ . . . . . .
⎟
· ⎟
⎜ ⎟
⎝O O ... Im · O O ... O ⎠
G
O O ... O · O O Ip
G
A lot of the mathematics in this book concerns itself with the case where A is
an mG × p matrix and B is an nG × q matrix, and each of those matrices are
partitioned into G submatrices. For A, each submatrix is of order m × p and
for B, each submatrix is of order n × q. If p = q = ℓ say, then the twining
matrix can be written as
⎛ ⎛ ⎞ ⎛ ⎞⎞
Im O
m×n ⎠⎠
T = ⎝IG ⊗ ⎝ O ⎠ : IG ⊗ ⎝ (2.56)
n×m
In
Theorem 2.34
KmG O
TG,m,n = KG,m+n
O KnG
Proof: Write
′ ⎞
Im+n ⊗ e1G
⎛
KmG O .. ⎟ KmG O
KG,m+n =⎝
⎜
O KnG . ⎠
O KnG
G′
Im+n ⊗ eG
and consider
′
K I ⊗ eG ′ O
KmG
Im+n ⊗ e1G mG
= m 1
′
O O In ⊗ e1G O
G′
Im ⊗ e1 KmG
= ′ .
Im ⊗ e1G O
′ (1)
But, we saw in Section 2.2 that (Im ⊗ e1G )KmG = KmG , so
⎛ ′ ′
⎞
e1G ⊗ e1m
⎜ .. ⎟
⎜
⎜ . ⎟
⎟
⎜ G′ m′ ⎟
e e
G ′
KmG ⎜ 1 ⊗ m ⎟ G′ I
Im+n ⊗ e1 = ⎜ G′ ⎟ = e1 ⊗ m
O ⎜ e1 ⊗ 0 ⎟ ′ O
..
⎜ ⎟
⎜ ⎟
⎝ . ⎠
′
e1G ⊗ 0′
In a similar manner,
O
O
G′ G′
Im+n ⊗ e1 = e1 ⊗ .
KnG In
The result follows.
Im
Proof: Consider the first submatrix of TG,m,n , namely IG ⊗ O .
n×m
As n ≥ 1,
it follows that the only nonzero elements on the main diagonal of TG,m,n
arising from this submatrix are those of the main diagonal of Im . Likewise,
O
consider the second submatrix IG ⊗ m×p
In
. Again, as m ≥ 1, it follows that
the only nonzero elements on the main diagonal of TG,m,n arising from this
submatrix are those on the main diagonal of In . Thus, trTG,m,n = m + n.
= (−1) 12 G(G−1)[m(m−1)+n(n−1)+mn] .
T
G,m,n
1
Proof: The proof uses the fact that |Kmn | = (−1) 4 mn(m−1)(n−1) . (See
Henderson and Searle (1981)).
82 Zero-One Matrices
⎛ ⎞
A1
O
⎜ B 1
⎟
A
⎜ ⎟
TG,m,n =⎜
⎜ .. ⎟.
⎟
B ⎜ .
⎟
⎝ AG ⎠
O
BG
It follows that
⎛ ⎞
a11 c · · · a1ℓ c
⎜ a11 d · · · a1ℓ d ⎟
⎜
⎟
A⊗c ⎜ .. .
TG,m,n =⎜ . .. ⎟
A⊗d
⎟
⎜ ⎟
⎝ aG1 c · · · aGℓ c ⎠
aG1 d · · · aGℓ d
⎛ ⎞
c c
a
⎜ 11 d · · · a1ℓ
d ⎟
c
⎜ ⎟
.
..
=⎜ ⎟=A⊗ .
⎜ ⎟
⎜ ⎟ d
⎝ c c ⎠
aG1 · · · aGℓ
d d
This last result is a special case of a theorem on how twining matrices interact
with Kronecker products, a topic that concerns us in the next section.
Proof:
A⊗E KmG (A ⊗ E ) (E ⊗ A)Kℓr
T G,m,n = KG,m+n = KG,m+n
A⊗F KnG (A ⊗ F ) (F ⊗ A)Kℓr
E E
= KG,m+n ⊗ A Kℓr = A ⊗ . (2.59)
F F
Proof:
(B ⊗ (C D))TG,m,n
KmG O K O
= (B ⊗ (C D))KG,m+n = Krs (C ⊗ B D ⊗ B) mG
O KnG O KnG
= Krs ((C ⊗ B)KmG (D ⊗ B)KnG ) = Krs (Ksr (B ⊗ C ) Ksr (B ⊗ D))
= (B ⊗ C B ⊗ D),
as Krs−1 = Ksr .
Notice that if we take the transposes of both sides of Equations 2.59 and
2.58, we have
′ ′
B ⊗ C′
′ ′ C
TG,m,n B ⊗ =
D′ B′ ⊗ D ′
and
(A′ ⊗ E ′ A′ ⊗ F ′ )TG,m,n
′
= A′ ⊗ (E ′ F ′ ).
′ −1
That is, TG,m,n = TG,m,n undoes the transformation brought about by TG,m,n .
2.7.7 Generalizations
The results up to this point have to do largely with intertwining corre-
sponding submatrices from two partitioned matrices. Moreover, we have
concentrated on the case where the submatrices of each partitioned matrix
all have the same order. If we stick to the latter qualification, our results
easily generalize to the case where we intertwine corresponding submatri-
ces from any number of partitioned matrices. All that happens is that the
notation gets a little messy. Here, we content ourselves with generalizing the
definition of a twining matrix and the two explicit expressions we derived
for this matrix. The generalizations of the other results are obvious and are
left to the reader.
j
where each submatrix Ai is p j × ℓ for i = 1, . . . , G. The twining matrix,
denoted by TG,p ,...,p is defined by
1 r
A11
⎞ ⎛
⎜ .. ⎟
⎛ 1 ⎞ ⎜ .r ⎟
⎜ ⎟
A ⎜ A1 ⎟
⎜ ⎟
TG,p ...,p ⎝ ... ⎠ = ⎜ ... ⎟ (2.60)
⎜ ⎟ ⎜ ⎟
1 r ⎜ ⎟
Ar ⎜ A1 ⎟
⎜ G⎟
⎜ . ⎟
⎝ .. ⎠
AGr
⎛ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎞
Ip1 O O
⎜ ⎜ ⎟ ⎜ p1 ×p2 ⎟ ⎜ p1 ×pr ⎟⎟
⎜ O ⎟
⎜ Ip ⎟ ⎜ O ⎟⎟
⎜ ⎜ ⎟ ⎜ ⎟⎟
⎜ ⎜ p ×p ⎟
TG,p ,...,p = ⎜IG ⊗ ⎜ . ⎟ IG ⊗ ⎜ . ⎟ · · · IG ⊗ ⎜ p2 ×pr ⎟
⎜ ⎜ 2 1 ⎟ ⎜ 2 ⎟ ⎜
⎟⎟ , (2.61)
⎟
1 r
⎜ ⎜ . ⎟. ⎜ . ⎟. .
⎜ . ⎟⎟
⎝ ⎝ ⎠ ⎝ ⎠ ⎝ . ⎠⎠
O O I
pr ×p1 pr ×p2 pr
⎛ ⎞
Kp G O
1
⎜
⎜ Kp G ⎟
⎟
2
TG,p ,...,p = KG,p +···+p ⎜ ..
⎟. (2.62)
1 r 1 r ⎜
.
⎟
⎝ ⎠
O Kp G
r
and
D = (D1 . . . DG ) = B1′ . . . BG′
C = (C1 . . . CG )
CG
so
⎛ ⎞
(C1 ).1
⎜ .. ⎟
⎜ . ⎟
⎜ ⎟ ⎛ ⎞
⎜ (CG ).1 ⎟
⎜ ⎟ vecC(1)
vec(vecmC ) = ⎜ ... ⎟ = ⎝ ..
⎜ ⎟ ⎜ ⎟
⎜ ⎟ . ⎠
⎜ (C ). ⎟
⎜ 1 m⎟ vecC(m)
⎜ . ⎟
⎝ .. ⎠
(CG ).m
The corresponding results for rvecs is found by taking the transpose of both
sides of Equation 2.68 to obtain
rvec (vecmC )′ = rvec (CKmG )′ .
That is,
rvec (rvecm A) = rvec KGm A
where A is a mG × p matrix.
A more general analysis also applies to intertwining columns of matrices.
If we take the transpose of Equation 2.60, we have
′ ′
′
r′ 1′ r′
(A1 . . . Ar )TG,p
′
,...,p = A 1
1 . . . A 1 . . . A G . . . A G ,
1 r
′
′ j j′ j j
then letting C j = A j = (A1 . . . AG ) = (C1 . . . CG ), for j = 1, . . . , r, we
have
(C 1 . . . C r )TG,p
′
1 r 1 r
1
,...,p = C1 . . . C1 . . . CG . . . CG ,
r
3.1 Introduction
A special group of selection matrices is associated with the vec, vech, and
v(A) of a given square matrix A. These matrices are called elimination
matrices and duplication matrices. They are extremely important in the
application of matrix calculus to statistical models as we see in Chapter 6.
The purpose of this chapter is not to list all the known results for these
matrices. One can do no better than refer to Magnus (1988) for this. Rather,
we seek to present these matrices in a new light and in such a way that
facilitates the investigation as to how these matrices interact with other
matrices, particularly Kronecker products. The mathematics involved in
doing this entitles a new notation – well, at least it is new to me. But it
is hoped that the use of this notation makes it clear how the otherwise
complicated matrices behave.
Ln vec A = v(A).
89
90 Elimination and Duplication Matrices
1
Nn vec A = (vec A + vec A′ ) = vec A,
2
and
Finally, note that as vechA contains all the elements in v(A) and more, there
exists a 12 n(n − 1)× 12 n(n + 1) zero-one matrix Ln∗ such that
Then,
⎞⎛ ⎛ ⎞
⎛ ⎞ a11 a21
a11 ⎜ .. ⎟ ⎜ .. ⎟
⎜ .. ⎟ ⎜ . ⎟ ⎜ . ⎟
⎜ . ⎟ ⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ an1 ⎟ ⎜ an1 ⎟
⎜ an1 ⎟ ⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ a22 ⎟ ⎜ a32 ⎟
vec A = ⎜ ... ⎟ , vech A = ⎜ . ⎟ , v(A) = ⎜ . ⎟ ,
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ .. ⎟ ⎜ .. ⎟
⎜a ⎟ ⎜ ⎟ ⎜ ⎟
⎜ 1n ⎟ ⎜a ⎟ ⎜ a ⎟
⎜ . ⎟ n2 n2
⎝ .. ⎠
⎜ ⎟ ⎜ ⎟
⎜ . ⎟ ⎜ . ⎟
⎝ .. ⎠ ⎝ .. ⎠
ann
ann ann−1
which are n2 ×1, 12 n(n + 1)×1 and 21 n(n − 1)×1 vectors, respectively.
Comparing vechA with vecA, it is clear that Ln is the 12 n(n + 1)×n2 block
diagonal matrix given by
⎛ ⎞
In O
⎜ E1 ⎟
Ln = ⎜ ⎟, (3.1)
⎜ ⎟
..
⎝ . ⎠
O En−1
Ej = ( O In − j ) (3.2)
n−j ×j n − j ×n − j
for j = 1, . . . , n − 1.
Note, for convenience we only use one subscript j to identify Ej , the
second parameter n being obvious from the content. For example, if we are
dealing with L3 , then
0 1 0
E1 = ( 0 I2 ) = ,
2×1 2×2
0 0 1
E2 = ( 0 I1 ) = 0 0 1
1×2
92 Elimination and Duplication Matrices
and
⎛ ⎞
I3 O O
L3 = ⎝O E1 O⎠
O O E2
⎛ ⎞
1 0 0 0 0 0 0 0 0
⎜0 1 0 0 0 0 0 0 0⎟
⎜ ⎟
⎜0 0 1 0 0 0 0 0 0⎟
=⎜⎜0
⎟.
⎜ 0 0 0 1 0 0 0 0⎟⎟
⎝0 0 0 0 0 1 0 0 0⎠
0 0 0 0 0 0 0 0 1
′
Also, for mathematical convenience, we take E0 = In . Note that En−1 = enn .
The matrix E j itself can be regarded as an elimination matrix. If A and B
are n×m and p×n matrices, respectively, then
E j A = (A) j , j = 1, . . . , n − 1 (3.3)
where (A) j is the n − j ×m matrix formed from A by deleting the first j
rows of A, and
BE ′j = (B) j , j = 1, . . . , n − 1 (3.4)
where (B) j is the p×n − j matrix formed from B by deleting the first j
column of B. For mathematical convenience, we said we would take E0 = In ,
so this implies that we must also take
(A)0 = A
and
(B)0 = B.
Note that when we use this notation for j = 1, . . . , n − 1
j
(A)′j = (A′ )
and
′
(B) j = (B ′ )j .
In particular, with A an n×m matrix
⎛ ⎞′
a j+1
(vec A)′n j = ⎝ ... ⎠ = a′j+1 . . . am
′
= ((vec A)′ )n j = (rvec A′ )n j ,
⎜ ⎟
am
3.2 Elimination Matrices 93
and
⎛ j+1 ⎞
a
′ ′
n′ ′ ⎜ .. ⎟
(rvec A)m j = a j+1 = ⎝ . ⎠ = (rvec A)′ m j
··· a
an
= (vec A′ )m j .
When we apply this notation to Kronecker products, we have
(A ⊗ x ′ ) j = (A) j ⊗ x ′ (3.5)
(x ′ ⊗ A) j = x ′ ⊗ (A) j (3.6)
(B ⊗ x) j = (B) j ⊗ x
(x ⊗ B) j = x ⊗ (B) j .
When working with columns from identity matrices and indeed identity
matrices themselves, we have
n
e j i = e n−i
j−i i< j
=0 i≥ j (3.7)
and
(In ) j = O In−j = E j , (3.8)
for j = 1, . . . , n − 1. Also,
′ p′
(In ) j ⊗ emp = n−j×j
O In−j ⊗ em
p
n−j n−j
= O O e1 O ... O en−j O (3.9)
E j (x ′ ⊗ A) = x ′ ⊗ (A) j (3.10)
E j (A ⊗ x ′ ) = (A) j ⊗ x ′ (3.11)
(x ⊗ B)E j′ = x ⊗ (B) j
(B ⊗ x)E j′ = (B) j ⊗ x
94 Elimination and Duplication Matrices
A1
⎛ ⎞ ⎛ ⎞
A1
⎜ E A ⎟ ⎜ (A2 )1 ⎟
⎜ 1 2 ⎟ ⎜
Ln A = ⎜ . ⎟ = ⎜ . ⎟ .
⎟
⎝ .. ⎠ ⎝ .. ⎠
En−1 An (An )n−1
B = (B1 · · · Bn ) (3.13)
Now, if we partition C as
⎛ ⎞
C11 · · · C1n
⎜ .. .. ⎟
C=⎝ . . ⎠ (3.15)
Cn1 · · · Cnn
3.2 Elimination Matrices 95
b1 A
⎛ ⎞
⎜ b2 (A)1 ⎟
Ln (b ⊗ A) = ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠
bn (A)n−1
Likewise,
⎛ ⎞
In O ⎛ ⎞
⎜ E1 ⎟ b1 A
⎟ ⎜ .. ⎟
Ln (b ⊗ A) = ⎜
⎜
.. ⎟⎝ . ⎠
⎝ . ⎠
bn A
O En−1
b1 A b1 A
⎛ ⎛ ⎞ ⎞
⎜ b E A ⎟ ⎜ b (A) ⎟
⎜ 2 1 ⎟ ⎜ 2 1 ⎟
=⎜ .. ⎟=⎜ .. ⎟,
⎝ . ⎠ ⎝ . ⎠
bn En−1 A bn (A)n−1
from Equation 3.3.
Now,
⎛ ⎞
In O
⎜ E1′ ⎟
(A ⊗ b′ )Ln′ = (a1 ⊗ b′ ··· an ⊗ b′ ) ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠
′
O En−1
= a1 ⊗ b′ a2 ⊗ b′ E1′ . . . an ⊗ b′ En−1
= a1 ⊗ b′ a2 ⊗ (b′ )1 . . . an ⊗ (b′ )n−1 ,
using Equation 3.4. Finally,
⎛ ⎞
In O
⎜ E1′ ⎟
(b′ ⊗ A)Ln′ = b1 A
· · · bn A ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠
O En′
= b1 A b2 AE1′ . . . bn AEn−1
′
= b1 A b2 (A)1 . . . bn (A)n−1 .
Proof: Clearly,
′
a1 ⊗ B a12 (B)1 a1n (B)n−1
⎛ ⎞ ⎛ ⎞
a11 B ···
′ ⎜ .. ⎟ ′ ⎜ .. .. ..
(A ⊗ B)Ln = ⎝ . ⎠ Ln = ⎝ .
⎟
. . ⎠
′
an ⊗ B an1 B an2 (B)1 · · · ann (B)n−1
= a1 ⊗ B a2 ⊗ (B)1 ... an ⊗ (B)n−1 ,
where we have used Theorem 3.1.
Write
1 1
Ln Nn = Ln (In2 + Knn ) = (Ln + Ln Knn ).
2 2
Now, from Equation 2.8 of Chapter 2, we have
Ln Knn = Ln In ⊗ e1n . . . In ⊗ enn .
⎛ ′ ′ ⎞ ⎛
e1n ⊗ e1n ··· enn ⊗ e1n In ⊗ e1n
′ ⎞
⎜ n ′ ′ ⎟ n′ ⎟
⎜ e1 1 ⊗ e2n enn 1 ⊗ e2n ⎟ ⎜
··· ⎜ (In )1 ⊗ e2 ⎟
=⎜⎜
. .. ⎟=⎜
⎟ ..
⎝ ..
⎟
. ⎠ ⎝ . ⎠
′
′ ′ n
e1n n−1 ⊗ enn · · · enn n−1 ⊗ enn
(In )n−1 ⊗ en
for j = 2, . . . , n − 1.
But, we can write
′ ′
′ e nj ⊗ e nj
(In ) j−1 ⊗ e nj = ′
(In ) j ⊗ e nj
and using Equation 3.9, we have that
′
n′
0′ e nj 0′ 0 0′ · · · 0′ 0 0′
(In ) j−1 ⊗ e j = n− j n− j
O O O e1 O · · · O en− j O
′
e nj n− j+1 n− j+1
= O O e2 O · · · O en− j+1 O
O
so
′
en
n− j+1 n− j+1
Tj = O E j−1 + j O e2 O ... O en− j+1 O .
O
102 Elimination and Duplication Matrices
for j = 2, . . . , n − 1.
The other two submatrices are
Z21 · · · Zn1
T1 = R1 (3.23)
and
Tn = (0′ · · · 0′En−1 ) + en′ ⊗ en′
′ ′
= (0′ · · · 0′ enn ) + 0′ · · · 0′ enn
′
0′
= 1×n(n−1) 2enn . (3.24)
and
(C ⊗ D)Nn Ln′
1
= c ⊗ D + C ⊗ d1 c2 ⊗ (D)1 + (C )1 ⊗ d2 · · · cn ⊗ (D)n−1
2 1
+ (C )n−1 ⊗ dn . (3.26)
where we have used Theorem 3.2 and Equations 2.11 and 2.14 of Chapter 2
in our working.
104 Elimination and Duplication Matrices
1
= c ⊗ D + C ⊗ d1 c2 ⊗ (D)1
2 1
+ (C )1 ⊗ d2 · · · cn ⊗ (D)n−1 + (C )n−1 ⊗ dn .
1 1
· · · a jn (B) j−1 )n−1
Cj = a j1 (B) j−1 a j2 (B) j−1
4
1
· · · b jn (A) j−1 )n−1
+ b j1 (A) j−1 b j2 ((A) j−1
′ ′ ′
+ (a1 ) j−1 ⊗ b j (a2 ) j−1 ⊗ (b j )1 · · · (an ) j−1 ⊗ (b j )n−1
′ ′ ′
+ (b1 ) j−1 ⊗ a j (b2 ) j−1 ⊗ (a j )1 · · · (bn ) j−1 ⊗ (a j )n−1
for j = 1, . . . , n.
Ln Nn (A ⊗ B)Nn Ln′
′ ′ ′ ′
a1 ⊗ B + A ⊗ b1 + B ⊗ a1 + b1 ⊗ A
⎛ ⎞
2′ ′ ′ ′
1⎜ a ⊗ (B)1 + (A)1 ⊗ b2 + (B)1 ⊗ a2 + b2 ⊗ (A)1 ⎟
⎟ ′
= ⎟ Ln .
⎜
..
4⎝
⎜
. ⎠
′ ′ ′ ′
an ⊗ (B)n−1 + (A)n−1 ⊗ bn + (B)n−1 ⊗ an + bn ⊗ (A)n−1
1 j′ ′ ′ ′
= a ⊗ (B) j−1 + (A) j−1 ⊗ b j + (B) j−1 ⊗ a j + b j ⊗ (A) j−1 Ln′ .
4
Applying Theorem 3.1 gives the result.
Ln (A ⊗ A)Knn Ln′
′ ′
a1 ⊗ A A ⊗ a1
⎛ ⎞ ⎛ ⎞
′ ′
⎜ a2 ⊗ (A) ⎟ ⎜ (A)1 ⊗ a2 ⎟
1
⎟ Knn Ln′ = ⎜
⎟ ′
=⎜ ⎟ Ln
⎜ ⎟ ⎜
.. ..
⎝ . ⎠ ⎝ . ⎠
′ ′
an ⊗ (A)n−1 (A)n−1 ⊗ an
′ ′ ′
a1 ⊗ a 1 a2 ⊗ (a1 )1 an ⊗ (a1 )n−1
⎛ ⎞
···
⎜ (a ) ⊗ a2′ ′
(a2 )1 ⊗ (a2 )1 ···
′
(an )1 ⊗ (a2 )n−1 ⎟
⎜ 1 1
=⎜ ⎟.
⎟
.. .. ..
⎝ . . . ⎠
′ ′ ′
(a1 )n−1 ⊗ an (a2 )n−1 ⊗ (an )1 ··· (an )n−1 ⊗ (an )n−1
106 Elimination and Duplication Matrices
By Equation 3.27,
′ ′
a1 ⊗ a1 a2 ⊗ (a1 )1
1 1
L2 N2 (A ⊗ A)N2 L2′ = L2 (A ⊗ A)L2′ + ′ ′
2 2 (a1 )1 ⊗ a2 (a2 )1 ⊗ (a2 )1
⎛ ⎞
a11 (a11 a12 ) a21 a12
1 1
= L2 (A ⊗ A)L2′ + ⎝a21 (a11 a12 ) a22 a12 ⎠
2 2
a21 (a21 a22 ) a22 a22
⎛ 2 ⎞
a11 a11 a12 a21 a12
1 1
= L2 (A ⊗ A)L2′ + ⎝a21 a11 a21 a12 a22 a12 ⎠
2 2 2 2
a21 a21 a22 a22
⎛ 2 ⎞
a11 a11 a12 a21 a12
⎜
=⎜ a11 a22 + a21 a12 a21 a22 + a22 a12 ⎟ ⎟.
⎝a11 a21
2 2
⎠
2 2
a21 a21 a22 a22
(3.28)
Note if A is 2×2 and symmetric then only the (2, 2) element differs in these
two matrices.
3.2 Elimination Matrices 107
⎛ 1 n−1 ⎞
(C11 )1 ··· (C1n−1 )1
1 n−1 ⎟
⎜ (C21 )2 ··· (C2n−1 )2
⎜
′ ⎟
LnCLn = ⎜
⎜ .. ⎟.
.
⎟
⎝ ⎠
1 n−1
(Cn−11 )n−1 · · · (Cn−1n−1 )n−1
If A and B are n× p and n×q matrices, respectively, then
⎛ 1′ ⎞
a ⊗ (B)1
Ln (A ⊗ B) = ⎝
⎜ .. ⎟
. ⎠
′
an−1 ⊗ (B)n−1
and if C and D are r ×n and s×n matrices, respectively, then
′
(C ⊗ D)Ln = c1 ⊗ (D)1 ··· cn−1 ⊗ (D)n−1 .
108 Elimination and Duplication Matrices
Cn−1
then the submatrix Cj is a the n − j × 21 n(n − 1) given by
1 1 n−1
Cj = a j1 (B) j · · · a jn−1 (B) j
4
1 n−1
+ b j1 (A) j · · · b jn−1 (A) j
′ ′
+ (b1 ) j ⊗ (a j )1 ··· (bn−1 ) j ⊗ (a j )n−1
′ ′
+ (a1 ) j ⊗ (b j )1 ··· (an−1 ) j ⊗ (b j )n−1 ,
for j = 1, . . . , n − 1.
For the special case A ⊗ A, we have
′
Ln Nn (A ⊗ A)Nn Ln
1 ′ 1 ′ 1 ′
= Ln (A ⊗ A)Ln + Ln (A ⊗ A)Knn Ln = Ln (A ⊗ A)Ln
2 ⎛ 2 2
′ ′
(a1 )1 ⊗ (a1 )1 (an−1 )1 ⊗ (a1 )n−1
⎞
···
1⎜ .. ..
+ ⎝ ⎠.
⎟
2 . .
n−1′ 1 n−1′ n−1
(a1 )n−1 ⊗ (a ) · · · (an−1 )n−1 ⊗ (a )
3.2 Elimination Matrices 109
′ ′
Consider Ln (A ⊗ A)Ln and Ln Nn (A ⊗ A)Nn Ln for the 3×3 case. First,
1 2
′ a11 (A)1 a12 (A)1
L3 (A ⊗ A)L3 = 1 2
a21 (A)2 a22 (A)2
⎛ ⎞
a22 a23 a23
a a
= ⎝ 11 a32 a33 12
a33 ⎠
a21 a32 a33 a22 a33
⎛ ⎞
a11 a22 a11 a23 a12 a23
= ⎝a11 a32 a11 a33 a12 a33 ⎠ .
a21 a32 a21 a33 a22 a33
Now,
′ 1 ′ 2
′ (a1 )1 ⊗ a1 (a2 )1 ⊗ a1
L3 (A ⊗ A)K33 L3 = ′ 1 ′ 2
(a1 )2 ⊗ a2 (a2 )2 ⊗ a2
⎛ ⎞
a21 a12 a13 a22 a13
⎜
= ⎝a31 a12 a13 a32 a13 ⎠
⎟
a31 a22 a23 a32 a23
⎛ ⎞
a21 a12 a21 a13 a22 a13
= ⎝a31 a12 a31 a13 a32 a13 ⎠ ,
a31 a22 a31 a23 a32 a23
so
′
L3 N3 (A ⊗ A)N3 L3
⎛ ⎞
a a + a21 a12 a11 a23 + a21 a13 a12 a23 + a22 a13
1 ⎝ 11 22
= a11 a32 + a31 a12 a11 a33 + a31 a13 a12 a33 + a32 a13 ⎠ .
2
a21 a32 + a31 a22 a21 a33 + a31 a23 a22 a33 + a32 a23
with Ln Nn the elimination matrix that recognizes the fact that A is symmet-
ric.
To find an explicit expression for the matrix Ln Nn , we take a different
approach than the one we used for Ln Nn .
for j = 1, . . . , n − 1.
Clearly, if A is a n − j + 1 × p matrix, then
Fj A = (A)1 . (3.31)
Fj E j−1 = (E j−1 )1 = E j ,
F1 O ··· ··· 0 In · · · · · · O
⎛ ⎞⎛ ⎞
. ⎟ ⎜ .. .. ⎟
· · · .. ⎟
⎜
⎜O F2 · · · ⎜ . E1 . ⎟
Ln∗ Ln =⎜
⎜. .. . .
⎟⎜
.. ⎟ ⎜ .. .
⎟
⎝ .. .. .. ⎠
⎟
. . .⎠ ⎝ . .
′ ′
0 0 ··· Fn−1 0 O · · · · · · En−1
E1 O · · · ··· O
⎛ ⎞
⎜ O E2 · · · · · · O⎟
=⎜. .. ⎟ = Ln ,
⎜ ⎟
.. . .
⎝ .. . . .⎠
0′ 0′ · · · En−1 0′
so
Ln∗ Ln N = Ln N (3.32)
matrix that takes us from vechA to vecA for the case where A is a symmetric
matrix. Recall that vechA is the n(n + 1)/2 × 1 vector given by
$$
\operatorname{vech} A = \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \\ a_{22} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{nn} \end{pmatrix}
$$
whereas vec A is the $n^2 \times 1$ vector given by
$$
\operatorname{vec} A = \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \\ \vdots \\ a_{1n} \\ \vdots \\ a_{nn} \end{pmatrix}.
$$
Comparing vechA with vecA, we see that we can write Dn as follows:
⎛ ⎞
In O O ··· O 0
′
⎜e n 0′ 0′ · · · 0′ 0⎟
⎜2 ⎟
⎜O I O · · · O 0⎟
⎜ ′ n−1 ⎟
⎜e n 0′ 0′ · · · 0′ 0⎟
⎜3 ⎟
⎜ 0′ e n−1′ 0 ′
· · · 0 ′
0 ⎟
⎜ 2 ⎟
⎜O O In−2 · · · O 0⎟
⎜ ⎟
⎜ .. .. ⎟
. .⎟
⎜ ⎟
⎜
Dn = ⎜
⎜en n′
0 ′
0 ′
· · · 0 0⎟ .
′ ⎟ (3.33)
′
⎜ ′ n−1
0′ · · · 0′ 0⎟
⎟
⎜ 0 en−1
⎜ ⎟
⎜ ⎟
⎜ .. .. ⎟
⎜ ⎟
..
⎜ . . .⎟
⎜ .. .
⎜ ⎟
.. .. ⎟
⎜ . . ⎟
2′
⎜ ⎟
⎝ e2 ⎠
′ ′ ′ ′
0 0 0 ··· 0 1
For example,
$$
D_3 = \begin{pmatrix} I_3 & O & 0 \\ e_2^{3\prime} & 0' & 0 \\ O & I_2 & 0 \\ e_3^{3\prime} & 0' & 0 \\ 0' & e_2^{2\prime} & 0 \\ 0' & 0' & 1 \end{pmatrix}
= \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}.
$$
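The explicit form of $D_3$ can be checked numerically. What follows is a minimal NumPy sketch (an illustration, not taken from the text) that builds $D_n$ directly from the defining property $D_n \operatorname{vech} A = \operatorname{vec} A$ for symmetric $A$, using column-major vec and the on-and-below-the-diagonal ordering of vech used above; the helper names are hypothetical.

```python
import numpy as np

def vec(A):
    # column-major (column-stacking) vectorization
    return A.reshape(-1, order="F")

def vech(A):
    # stack the elements on and below the main diagonal, column by column
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def duplication_matrix(n):
    # D_n satisfies D_n vech(A) = vec(A) for every symmetric n x n matrix A
    pos, k = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = k
            k += 1
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            D[j * n + i, pos[(max(i, j), min(i, j))]] = 1.0
    return D

D3 = duplication_matrix(3)
A = np.array([[1., 2., 3.],
              [2., 4., 5.],
              [3., 5., 6.]])          # symmetric 3 x 3
assert np.allclose(D3 @ vech(A), vec(A))
print(D3.astype(int))                 # reproduces the 9 x 6 matrix displayed above
```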
A close inspection of Dn shows that we can write the matrix in the following
way:
⎞ ⎛
H1
Dn = ⎝ ... ⎠ = (M 1 · · · Mn ) (3.34)
⎜ ⎟
Hn
Hj = G j O (3.36)
′
e nj
⎛ ⎞
O
′
⎜
⎜ e n−1
j−1
⎟
⎟
..
⎜ ⎟
Gj = ⎜
⎜ . ⎟,
⎟ (3.37)
⎜ n− j+2′ ⎟
⎝ e2 ⎠
O In− j+1
for j = 2, . . . , n.
BH j = (BG j O)
where
′ ⎞′
b1 e nj
⎛
⎜ .. ⎟
⎜
BG j = ⎜
⎜ . ⎟
⎟
′⎟
n− j+2
⎝b j−1 e2 ⎠
(B) j−1
for j = 2, . . . , n, and
(x ⊗ B)H j = x ⊗ BH j
for j = 1, . . . , n.
We write the other matrix Mj as
⎛ ⎞
O
⎜ E ′j−1 ⎟
⎜ ⎟
j′ ⎟
Z
⎜
Mj = ⎜ 2 ⎟
⎜
⎜ .. ⎟
⎟
⎝ . ⎠
j′
Zn− j+1
j′
for j = 2, . . . , n − 1, where from Equation 3.20 Zi is the n × n − j + 1
matrix given by
⎞3
O
⎛
⎜ ( j−1)×(n− j+1)⎟
j′
⎜ ⎟
Zi = ⎜ n− j+1′
⎜ ei
⎟
⎟
⎝ ⎠
O
(n− j )×(n− j+1)
for i = 2, . . . , n − j + 1.
3
The remarks made about $E_j$ clearly refer to $H_j$, $G_j$, $M_j$ and $Z_i^j$ as well. All these matrices depend on n, but for simplicity of notation this is not indicated, the relevant n being clear from the context.
j
It is now time to investigate some of the properties of Zi . First, if A is a
n × p matrix, then
⎛ ′⎞
0
⎜ .. ⎟
⎜ . ⎟
n− j+1 j ′
j n− j+1
⎜ j ′ ⎟ th
Zi A = O ei O A = ei a =⎜ ⎜a ⎟ i
⎟ (3.44)
⎜ .. ⎟
⎝ . ⎠
0′
Clearly, if x is a n × 1 vector
′ ′
Zij (x ′ ⊗ A) = x ′ ⊗ Zij A = x ′ ⊗ ein− j+1 ⊗ a j .
and
′ ′
(x ⊗ B)Zij = x ⊗ ein− j+1 ⊗ b j .
for j = 1, . . . , n − 1 and
Mn′ A = En−1 An .
for j = 1, . . . , n − 1 and
′
Mn′ (A ⊗ B) = (an ⊗ B)n−1 .
As pointed out in Section 1.2 of Chapter 1,
′ ′ ′
(a j ⊗ B) j· = a j ⊗ b j .
From Equation 3.6, we have
′ ′
(a j ⊗ B) j = a j ⊗ (B) j
and from Equation 1.9 of Chapter 1, we have
′
(A ⊗ B)( j ) = A ⊗ b j
so using Equation 3.5, we have
′
(A ⊗ B)( j ) j = (A) j ⊗ b j
= c j ⊗ (D) j−1 + (0 (C ) j ) ⊗ d j
(3.50)
for j = 1, . . . , n − 1 and
(C ⊗ D)Mn = cn ⊗ (D)n−1 .
Note that as special cases if x and y are both n × 1 vectors
′ xjyj 0
M j (x ⊗ y) = = x j (y) j−1 + y j
x j (y) j + (x) j y j (x) j
for j = 1, . . . , n − 1 and
Mn′ (x ⊗ y) = xn yn
whereas
(x ′ ⊗ y ′ )M j = x j y j x j (y ′ ) j + y j (x ′ ) j = x j (y ′ ) j−1 + y j 0′ (x ′ ) j
(3.51)
for j = 1, . . . , n − 1 and
(x ′ ⊗ y ′ )Mn = xn yn .
Now, consider the case where A and B are both n × n matrices, so we can
form M j′ (A ⊗ B)Mℓ . From Equation 3.47
′ a ′j ⊗ b ′j
M j (A ⊗ B)Mℓ = Mℓ
a ′j ⊗ (B) j + (A) j ⊗ b ′j
and
ℓ
(A) j ⊗ b ′j Mℓ = b jℓ (aℓ ) j
(aℓ ) j ⊗ (b ′j )ℓ + (A) j b jℓ
for j = 1, . . . , n − 1, and ℓ = 1, . . . , n − 1.
The special cases are given by
Mn′ (A ⊗ B)Mℓ = anℓ bnℓ anℓ (bn′ )ℓ + bnℓ (an′ )ℓ
′ ℓ
′ ℓ−1 0
= anℓ (bn ) + bnℓ ′
an
for ℓ = 1, . . . , n − 1 and
a jn b jn 0
M j′ (A ⊗ B)Mn = = a jn (bn ) j−1 + b jn
a jn (bn ) j + b jn (an ) j (an ) j
for j = 1, . . . , n − 1 and
Mn′ (A ⊗ B)Mn = ann bnn .
(C ⊗ D)Dn = c1 ⊗ d1 c1 ⊗ (D)1 + (C )1 ⊗ d1 · · ·
= c1 ⊗ D + (O (C )1 ) ⊗ d1 · · ·
j ′ n−1 n−1 0 ′
⊗ bj
′
+ 0 (a ) ⊗ (bn−1 ) j−1 a jn ((B) j−1 ) +
(a1 ) j
1
0′
0 ′
+ 0 b j1 · · · ⊗ (b j )n−2
(A) j (an−1 ) j
n−1
0′
0 j ′ n−1
+ 0 b jn−1 ⊗ (b ) (3.52)
(A) j (an ) j
′ n−1
+ 0 (an )n−1 ⊗ (bn−1 )n−1 ann (B)n−1
′ n−2
= an1 (B)n−1 + 0 (an )1 ⊗ (b1 )n−1 · · · ann−1 (B)n−1
+ 0 ann bn−1n ann bnn . (3.53)
and
′
C2 = a21 (A)1 + 0 (a2 )1 (a1 )1 a22 ((A)1 )1
2
= a21 (a21 a22 ) + (0 a22 )a21 a22
2 2
= a21 2a21 a22 a22
so
$$
D_2'(A \otimes A)D_2 = \begin{pmatrix} a_{11}^2 & 2a_{11}a_{12} & a_{12}^2 \\ 2a_{11}a_{21} & 2a_{11}a_{22} + 2a_{21}a_{12} & 2a_{12}a_{22} \\ a_{21}^2 & 2a_{21}a_{22} & a_{22}^2 \end{pmatrix}. \tag{3.54}
$$
Comparing Equation 3.54 with Equation 3.28, we see that there are a lot of
similarities between L2 N2 (A ⊗ A)N2 L2′ and D2′ (A ⊗ A)D2 when A is sym-
metric. All the elements of these two matrices have the same combination
of the aij s, though the number of these combinations differs in the 2nd row
and the 2nd column. More will be made of this when we compare Ln Nn
with Dn as we do in Section 3.4.
Using our explicit expression for Ln , Ln Nn , and Dn , it is simple to prove
known results linking Dn with Ln and Nn .
For example,
H1
⎛ ⎞ ⎛ ⎞
In O ⎛ ⎞
H 1
⎜ E1 ⎟ ⎜ .. ⎟ ⎜ E1 H2 ⎟
⎟ ⎜ ⎟
Ln Dn = ⎜ = ⎟.
⎜
.. ⎠ .
⎟ ⎝ ⎠
⎝ ... ⎠
⎜
⎝ .
Hn
O En−1 En−1 Hn
But using Equations 3.3, 3.35 and 3.36, the matrix E j H j+1 is the n − j ×
1
2
n(n + 1) matrix given by
O In− j O
E j H j+1 = j 1
,
(n− j )× 2 (2n− j+1) (n− j )× 2 (n− j+1)(n− j )
so
⎛ ⎞
In O
⎜ In−1 ⎟
Ln Dn = ⎜ ⎟ = I 21 n(n+1) .
⎜ ⎟
..
⎝ . ⎠
O 1
Similarly,
⎞ ⎛
H1
1 ⎜ ⎟ 1
Ln Nn Dn = (P · · · Pn ) ⎝ ... ⎠ = (P1 H1 + · · · + Pn Hn )
2 1 2
Hn
R1 R1 O
P1 H1 = (In O) =
O O O
⎛
e nj
⎞
O
⎜
⎜ e n−1
j−1
⎟
⎟
⎜ .. ⎟
⎜
Pj H j = ⎜ . ⎟
⎟
⎜ n− j+2 ⎟
⎜O e2 ⎟
⎜ ⎟
⎝ Rj⎠
O ··· O
′
e nj
⎛ ⎞
O O
′
⎜
⎜ e n−1
j−1
⎟
⎟
⎜
×⎜ .. .. ⎟
⎜ . .⎟
⎟
⎜ n− j+2′ ⎟
⎝O e2 ⎠
O In− j+1 O
′
e nj e nj
⎛ ⎞
O O
′
⎜
⎜ e n−1 n−1
j−1 e j−1
⎟
⎟
⎜ .. .. ⎟
⎜
=⎜ . .⎟
⎟
⎜ n− j+2 n− j+2′ ⎟
⎜ O e2 e2 ⎟
⎜ ⎟
⎝ Rj ⎠
O ··· O O
for j = 2, . . . , n, so
′
e2n e2n
⎛ ⎞
⎛ ⎞ O
R1 O ⎜ R2 ⎟
⎜ O ⎟ ⎜ ⎟
2Ln Nn Dn = ⎜
⎜
⎟+⎜
⎟ ⎜ O ⎟
.. ⎟
⎝ . ⎠ ⎜ .. ⎟
⎝ . ⎠
O O
O O
⎛ n n′ ⎞
e3 e3 O
′
⎜
⎜ e2n−1 e2n−1 ⎟
⎟
+⎜
⎜ R3 ⎟ + ···
⎟
⎜ .. ⎟
⎝ . ⎠
O O
⎛ n n′ ⎞
en en O
⎜ . .. ⎟
+⎜
⎜ ⎟
′
⎟
2 2
⎝ e2 e2 ⎠
O Rn
⎛ ⎞
In O
⎜ In−1
⎟
= 2⎜ ⎟ = 2I 21 n(n+1) ,
⎜ ⎟
..
⎝ . ⎠
O 1
using Equation 3.17 and the fact that Rn = 2, which gives the result
Ln Nn Dn = I 1 n(n+1) . (3.55)
2
But such proofs are highly inefficient. A far more elegant approach, which
leads to simpler proofs is that of Magnus (1988), which concentrates on the
roles played by the various matrices. For example, for a symmetric matrix
A, we know that Nn vec A = vec A, Ln Nn vec A = vech A, and Dn vech A =
vec A. Thus, it follows that
Dn Ln Nn vec A = Nn vec A,
which gives the result
Dn Ln Nn = Nn .
For numerous other results linking $L_n$, $L_nN_n$, $\bar{L}_n$, $\bar{L}_nN_n$, and $D_n$ I can do no
better than refer the reader to Magnus (1988).
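Both results are also easy to confirm numerically; the following NumPy sketch (an illustration only, with hypothetical constructor names) builds $L_n$, $K_{nn}$, $N_n = \frac{1}{2}(I_{n^2}+K_{nn})$ and $D_n$ from their defining properties and checks Equation 3.55 together with $D_nL_nN_n = N_n$.

```python
import numpy as np

def elimination_matrix(n):
    # L_n vec(A) = vech(A): pick out the elements on and below the diagonal
    L = np.zeros((n * (n + 1) // 2, n * n))
    k = 0
    for j in range(n):
        for i in range(j, n):
            L[k, j * n + i] = 1.0
            k += 1
    return L

def commutation_matrix(n):
    # K_nn vec(A) = vec(A') for every n x n matrix A
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i * n + j, j * n + i] = 1.0
    return K

def duplication_matrix(n):
    # D_n vech(A) = vec(A) for symmetric A
    pos, k = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = k
            k += 1
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            D[j * n + i, pos[(max(i, j), min(i, j))]] = 1.0
    return D

n = 4
L, K, D = elimination_matrix(n), commutation_matrix(n), duplication_matrix(n)
N = 0.5 * (np.eye(n * n) + K)

assert np.allclose(L @ N @ D, np.eye(n * (n + 1) // 2))   # Equation 3.55
assert np.allclose(D @ L @ N, N)                           # D_n L_n N_n = N_n
```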
Our approach, investigating explicit expressions for elimination matrices
and duplication matrices, comes into its own when we want to highlight
Theorem 3.7
Ln∗ Dn′ = 2Ln Nn = 2Ln∗ Ln Nn
3.36, and 3.37, one cannot help but notice the similarities between 2Ln Nn
and Dn′ .
In fact, these two matrices have most of their elements the same and the
elements that differ are strategically placed in the two matrices being 2 in
the matrix 2Ln Nn and being 1 in the matrix Dn′ . The following theorem
conveys this result.
Theorem 3.8 The matrix $2L_nN_n - D_n'$ is the $\frac{1}{2}n(n+1)\times n^2$ block diagonal matrix given by
$$
2L_nN_n - D_n' = \begin{pmatrix}
e_1^{n}e_1^{n\prime} & & & & O\\
& e_1^{n-1}e_2^{n\prime} & & & \\
& & e_1^{n-2}e_3^{n\prime} & & \\
& & & \ddots & \\
O & & & & e_n^{n\prime}
\end{pmatrix}.
$$
2Ln Nn = (P1 · · · Pn )
and
Dn′ = H1′
· · · Hn′ .
′
It follows that R1 − In = e1n e1n , so
′
e1n e1n
P1 − H1′ = .
O
e nj O e nj O
⎛ ⎞ ⎛ ⎞
⎜ .. ..
.
⎟ ⎜
.
⎟
⎜ ⎟ ⎜ ⎟
Pj − H j′ = ⎜O n− j+2 ⎟ − ⎜O n− j+2
⎜ ⎟ ⎜ ⎟
⎜ e2 ⎟ ⎜ e2 ⎟
⎟
⎝ Rj⎠ ⎝ In− j+1 ⎠
O ··· ··· O O ··· ··· O
so
n− j+1 n− j+1′
R j − In− j+1 = e1 e1
and
O
⎛ ⎞
..
.
⎜ ⎟
⎜ ⎟
Pj − H j′ =⎜
⎜ ⎟
O ⎟
n− j+1 n− j+1′ ⎠
⎜ ⎟
⎝O ··· O e e
1 1
O ··· ··· O
for j = 2, . . . , n. But,
n− j+1 n− j+1′ n− j+1 n ′
O ··· O e1 e1 = e1 ej
for j = 2, . . . , n.
Theorem 3.8 can be used to investigate the different ways 2Ln Nn and Dn′
interact with Kronecker products. For example,
⎛ n n′ ⎞ ⎛ 1′ ⎞
e1 e1 O a ⊗B
′ ′
⎜ e1n−1 e2n ⎟ ⎜ a2 ⊗ B ⎟
2Ln Nn − Dn′ (A ⊗ B) = ⎜
⎜ ⎟⎜ ⎟
.. ⎟ ⎜ .. ⎟
⎝ . ⎠ ⎝ . ⎠
n′ n′
O en a ⊗B
⎛ 1′ 1′ ⎞
⎛ n 1′ a ⊗b
n′
⎞
e1 a ⊗ e1 B ⎜ O ⎟
⎜ ⎟
⎜e n−1 a2′ ⊗ e n ′ B⎟ ⎜ a2′ ⊗ b2′ ⎟
⎜1 2 ⎟
=⎜ ⎟ = ⎜ O ⎟.
⎜ ⎟
..
⎝ . ⎠ ⎜
⎜ ..
⎟
⎟
n′ n′ .
a ⊗ en B ⎝ ⎠
n′ n′
a ⊗b
If we partition Ln Nn (A ⊗ B) as in Theorem 3.5, then the jth submatrix of
2Ln Nn (A ⊗ B) is
′ ′
a j ⊗ (B) j−1 + (A) j−1 ⊗ b j .
To obtain the equivalent jth submatrix of Dn′ (A ⊗ B), we subtract
a j ′ ⊗ b j ′
from it. That is, Dn′ (A ⊗ B) is the same matrix as 2Ln Nn (A ⊗ B)
O
′
except in the jth submatrix of 2Ln Nn (A ⊗ B), the first row of (A) j−1 ⊗ b j ,
′ ′
which is a j ⊗ b j , is replaced by the null vector.
By a similar analysis,
(C ⊗ D)Dn
= 2(C ⊗ D)Nn Ln′ − (c1 ⊗ d1 O ··· cn−1 ⊗ dn−1 O cn ⊗ dn )
If we use the partitioning of (C ⊗ D)Nn Ln′ given by Theorem 3.5, then the
jth submatrix of 2(C ⊗ D)Nn Ln′ is c j ⊗ (D) j−1 + (C ) j−1 ⊗ d j .
To obtain the equivalent jth submatrix for (C ⊗ D)Dn , we subtract
cj ⊗ dj O .
In other words, (C ⊗ D)Dn is the same matrix as (C ⊗ D)Nn Ln′ except
in each jth submatrix the first column of (C ) j−1 ⊗ d j , which is c j ⊗ d j , is
replaced by the null vector.
Further comparisons can be made. If we continue to write, Dn′ =
(H1′ · · · Hn′ ) and 2Ln Nn = (P1 · · · Pn ), then
R1 I
P1 = and H1′ = n
O O
so, clearly
⎛ n′ ⎞
2e1
⎜ en ′ ⎟
P1 = H1′ ⎜ 2. ⎟ .
⎜ ⎟
⎝ .. ⎠
′
enn
But,
e nj
⎛ n
O ej O
⎛ ⎞ ⎞
⎜ .. ..
.
⎟ ⎜
.
⎟
⎜ ⎟ ⎜ ⎟
Pj = ⎜ n− j+2 ⎟ and H j′ = ⎜ n− j+2
⎜ ⎟ ⎜ ⎟
⎜ e2 ⎟ ⎜ e2 ⎟
⎟
⎝O Rj⎠ ⎝O In− j+1 ⎠
O ··· ··· O O ··· ··· O
so
′ ⎞
e1n
⎛
⎜ .. ⎟
⎜ . ⎟
′ ⎜ n′ ⎟
⎜ ⎟
Pj = H j ⎜2e j ⎟
⎜ . ⎟
⎝ .. ⎠
′
enn
for j = 2, . . . , n.
It follows that
⎛ n′ ⎞ ′
e1n
⎛ ⎛ ⎞⎞
2e1
⎜ ⎜ en ′ ⎟ ⎜ en ′ ⎟⎟
2Ln Nn = ⎜H1′ ⎜ 2. ⎟ · · · Hn′ ⎜ 2.
⎜ ⎜ ⎟ ⎜ ⎟⎟
⎝ ⎝ .. ⎠ ⎝ ..
⎟⎟
⎠⎠
′ ′
enn 2enn
⎛⎛ n ′ ⎞
2e1
⎞
⎜⎜ e n ′ ⎟ ⎟
⎜⎜ 2 ⎟ O ⎟
⎜⎜ .. ⎟ ⎟
⎜⎝ . ⎠ ⎟
⎜ n′ ⎟
⎜ en ⎟
⎜ ⎟
′ ⎜
= Dn ⎜ .. ⎟.
⎟
(3.56)
.
′
⎜ ⎞⎟
e1n
⎜ ⎛ ⎟
⎜ ⎟
′
e2n
⎜ ⎜ ⎟⎟
⎜ O
⎜ ⎜ ⎟⎟
⎜ . ⎟⎟
⎝ ⎝ .. ⎠⎠
′
2enn
Notice that the block diagonal matrix in Equation 3.56 is symmetric as its
transpose is
⎛ n n
2e1 e2 · · · enn
⎞
O
⎜ .. ⎟
⎝ . ⎠
n n
O e1 · · · 2en
which is the matrix itself and it is also non-singular, its inverse being
⎛⎛ 1 n ′ ⎞ ⎞
e
2 1
⎜⎜ e n ′ ⎟ ⎟
⎜⎜ 2 ⎟
O ⎟
⎟
⎜⎜ . ⎟
⎜⎝ .. ⎠ ⎟
⎜ ⎟
′
⎜ enn
⎜ ⎟
⎟
⎜ .. ⎟
⎟,
⎜
⎜ . ⎟
⎛ n ′ ⎞⎟
e1 ⎟
⎜
⎜
⎜ ⎟
⎜ ⎜ e n ′ ⎟⎟
⎜ O ⎜ 2 ⎟ ⎟
⎜ . ⎟⎟
⎝ .. ⎠⎠
⎜
⎝
1 n′
e
2 n
so we can write
′
e1n
⎛⎛ ⎞ ⎞
⎜⎜2e n ′ ⎟ ⎟
⎜⎜ 2 ⎟
O ⎟
⎟
⎜⎜ . ⎟
⎜⎝ .. ⎠ ⎟
⎜ ⎟
′
⎜ 2enn
⎜ ⎟
⎟
⎜
Dn′ = Ln Nn ⎜ .. ⎟
⎜ . ⎟
⎟
⎛ n ′ ⎞⎟
2e1 ⎟
⎜
⎜
⎜ ⎟
⎜ ⎜2e n ′ ⎟⎟
⎜ O ⎜ 2 ⎟⎟
⎜ . ⎟⎟
⎝ .. ⎠⎠
⎜
⎝
′
enn
if we like.
Suppose now we use our other expression for 2Ln Nn and Dn′ , namely
⎛ ⎞ ⎛ ′⎞
T1 M1
⎜ .. ⎟ ′ ⎜ .. ⎟
2Ln Nn = ⎝ . ⎠ , Dn = ⎝ . ⎠ .
Tn Mn′
so, clearly
Tn = 2Mn′ = Rn Mn′ .
and
j j
M j′ = O
(O In− j+1 ) Z2 · · · Zn− j+1 .
Consider for i = 2, . . . , n − j + 1
n− j+1
R j Zi = R j O ein− j+1
O = O R j ei O
and
⎛ ⎞
2 O
⎜ 1 ⎟
R j ein− j+1 = ⎜
⎟ n− j+1
⎟ ei = ein− j+1 ,
⎜
..
⎝ . ⎠
O 1
R1 M1′
⎛ ⎞ ⎛ ⎞
R1 O
2Ln Nn = ⎝ ... ⎠ = ⎝ .. ⎟ ′
⎠ Dn . (3.57)
⎜ ⎟ ⎜
.
Rn Mn′ O Rn
Matrix Calculus
4.1 Introduction
Let Y be an p×q matrix whose elements yi j s are differentiable functions of
the elements xrs s of an m×n matrix X. We write Y = Y (X ) and say Y is a
matrix function of X. Given such a setup, we have mnpq partial derivatives
that we can consider:
i = 1, . . . , p
∂yi j
j = 1, . . . , q
∂xrs
r = 1, . . . , m
s = 1, . . . , n.
4.2 Different Concepts of a Derivative of a Matrix
Concept 1 The derivative of the p×q matrix Y with respect to the m×n matrix X is the pq×mn matrix
$$
DY(X) = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{11}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{11}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{11}}{\partial x_{mn}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{p1}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{mn}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{1q}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{mn}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{pq}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{mn}}
\end{pmatrix}.
$$
Notice that under this concept the mnpq derivatives are arranged in such a way that a row of DY(X) gives the derivatives of a particular element of Y with respect to each of the elements of X.
$$
\frac{\delta Y}{\delta x} = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x} & \cdots & \dfrac{\partial y_{1q}}{\partial x}\\
\vdots & & \vdots\\
\dfrac{\partial y_{p1}}{\partial x} & \cdots & \dfrac{\partial y_{pq}}{\partial x}
\end{pmatrix}.
$$
Return now to the case where each element of Y is a function of the elements
of an m×n matrix X. We could then consider the derivative of Y with respect
to X as made up of the derivatives of Y with respect to each element of X. That is,
the mp×qn matrix
$$
\frac{\delta Y}{\delta X} = \begin{pmatrix}
\dfrac{\delta Y}{\delta x_{11}} & \cdots & \dfrac{\delta Y}{\delta x_{1n}}\\
\vdots & & \vdots\\
\dfrac{\delta Y}{\delta x_{m1}} & \cdots & \dfrac{\delta Y}{\delta x_{mn}}
\end{pmatrix}.
$$
Concept 2 The derivative of the p×q matrix Y with respect to the m×n matrix X is the mp×nq matrix
$$
\frac{\delta Y}{\delta X} = \begin{pmatrix}
\dfrac{\delta Y}{\delta x_{11}} & \cdots & \dfrac{\delta Y}{\delta x_{1n}}\\
\vdots & & \vdots\\
\dfrac{\delta Y}{\delta x_{m1}} & \cdots & \dfrac{\delta Y}{\delta x_{mn}}
\end{pmatrix},
$$
where
$$
\frac{\delta Y}{\delta x_{rs}} = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{rs}}\\
\vdots & & \vdots\\
\dfrac{\partial y_{p1}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{rs}}
\end{pmatrix}
$$
for r = 1, . . . , m, s = 1, . . . , n.
This concept of a matrix derivative is discussed in Dwyer and MacPhail
(1948), Dwyer (1967), Rogers (1980), and Graham (1981).
Suppose y is a scalar but a differentiable function of all the elements of an
m×n matrix X. Then, we could conceive of the derivative of y with respect
to X as the m×n matrix consisting of all the partial derivatives of y with
respect to the elements of X. Denote this m×n matrix as
$$
\frac{\gamma y}{\gamma X} = \begin{pmatrix}
\dfrac{\partial y}{\partial x_{11}} & \cdots & \dfrac{\partial y}{\partial x_{1n}}\\
\vdots & & \vdots\\
\dfrac{\partial y}{\partial x_{m1}} & \cdots & \dfrac{\partial y}{\partial x_{mn}}
\end{pmatrix}.
$$
Concept 3 The derivative of the p×q matrix Y with respect to the m×n
matrix X is the mp×nq matrix
$$
\frac{\gamma Y}{\gamma X} = \begin{pmatrix}
\dfrac{\gamma y_{11}}{\gamma X} & \cdots & \dfrac{\gamma y_{1q}}{\gamma X}\\
\vdots & & \vdots\\
\dfrac{\gamma y_{p1}}{\gamma X} & \cdots & \dfrac{\gamma y_{pq}}{\gamma X}
\end{pmatrix}.
$$
$$
\frac{\partial \ell}{\partial x} = \begin{pmatrix} \dfrac{\partial \ell}{\partial x_1} \\ \vdots \\ \dfrac{\partial \ell}{\partial x_n} \end{pmatrix}.
$$
$$
\frac{\partial y}{\partial x} = \begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_s}{\partial x_1}\\
\vdots & & \vdots\\
\dfrac{\partial y_1}{\partial x_r} & \cdots & \dfrac{\partial y_s}{\partial x_r}
\end{pmatrix}.
$$
$$
\frac{\partial\operatorname{vec}Y}{\partial\operatorname{vec}X} = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{11}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{11}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{m1}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{11}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{1n}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{1n}}\\
\vdots & & \vdots & & \vdots & & \vdots\\
\dfrac{\partial y_{11}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{p1}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{mn}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{mn}}
\end{pmatrix}.
$$
This concept of a matrix derivative was used by Graham (1983) and Turk-
ington (2005).
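As a concrete check on this arrangement, the sketch below (an illustration; the test function $Y(X) = X'X$, the step size, and the helper names are my own choices) builds $\partial\operatorname{vec}Y/\partial\operatorname{vec}X$ by finite differences, with row r holding the derivatives of every element of vec Y with respect to the r-th element of vec X, and compares it with the transpose of the standard result $\partial\operatorname{vec}(X'X)/\partial(\operatorname{vec}X)' = (I_{n^2}+K_{nn})(I_n\otimes X')$.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def commutation(n):
    # K_nn vec(A) = vec(A') for n x n A
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i * n + j, j * n + i] = 1.0
    return K

def d_vecY_d_vecX(f, X, eps=1e-6):
    # row r holds the derivatives of every element of vec Y
    # with respect to the r-th element of vec X
    y0 = vec(f(X))
    J = np.zeros((X.size, y0.size))
    for r in range(X.size):
        x = vec(X).copy()
        x[r] += eps
        J[r] = (vec(f(x.reshape(X.shape, order="F"))) - y0) / eps
    return J

m, n = 4, 3
X = np.random.default_rng(0).standard_normal((m, n))
J = d_vecY_d_vecX(lambda Z: Z.T @ Z, X)                       # Y = X'X
analytic = np.kron(np.eye(n), X) @ (np.eye(n * n) + commutation(n))
assert np.allclose(J, analytic, atol=1e-4)
```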
$$
DY(X) = \begin{pmatrix} Dy_1 \\ \vdots \\ Dy_q \end{pmatrix}
$$
and
⎛ ⎞
Dy11
⎜ . ⎟
⎜ .. ⎟
⎜ ⎟
⎜ ⎟ ⎛ ⎞
⎜Dy1q ⎟ DY1·
⎜ ⎟
⎜ . ⎟ ⎜ . ⎟
⎜ .. ⎟ = ⎝ .. ⎠ ,
Kpq DY (X ) = ⎜ ⎟ ⎜ ⎟
⎜ ⎟
⎜Dy p1 ⎟ DYp·
⎜ ⎟
⎜ . ⎟
⎜ . ⎟
⎝ . ⎠
Dy pq
Kpp DY (X ) = DY (X ).
Referring to Concept 2
⎛ δY δY1· ⎞
1·
···
⎜ δx11 δx1n ⎟
⎜ . .. ⎟
⎜ .
⎜ . . ⎟
⎟
⎛ ⎞ ⎜
⎜ δY1· δY1· ⎟
⎟
δY δY ⎛ δY ⎞
··· ⎜
⎜ δx ··· ⎟ 1·
⎜ δx11 δx1n ⎟ m1 δxmn ⎟ ⎜ δX ⎟
⎟
⎟ ⎜ .. .. ⎟ ⎜ .. ⎟
⎜ . .. ⎟ ⎜
δY ⎜
Kpm = Kpm ⎜ .. . ⎟=⎜ .
⎜ . ⎟
⎟ = ⎜ . ⎟.
⎜ ⎟
δX ⎜ ⎟ ⎜
⎝ δY δY ⎠ ⎜ δYp· δYp· ⎟ ⎟
⎝ δY ⎠
p·
··· ⎜ ··· ⎟
δxm1 δxmn ⎜ δx11
⎜ . δx1n ⎟ δX
⎜ . .. ⎟
⎜ . . ⎟
⎟
⎜ ⎟
⎝ δYp· δYp· ⎠
···
δxm1 δxmn
In a similar manner,
⎛ ⎞
γY
⎜ γX1· ⎟
⎜ . ⎟
γY ⎜ ⎟
Kmp = ⎜ .. ⎟ .
γX ⎜ ⎟
⎝ γY ⎠
γXm·
4.4 Relationships Between the Different Concepts
$$
\frac{\delta Y}{\delta x} = \frac{\gamma Y}{\gamma x} \quad\text{and}\quad DY(x) = \operatorname{vec}\frac{\delta Y}{\delta x},
$$
$$
\frac{\delta Y}{\delta x} = \operatorname{rvec}_p DY(x) = \frac{\gamma Y}{\gamma x}.
$$
Suppose Y is a scalar, say y. This case is far more common in statistics
and econometrics. Then again, Concept 2 and Concept 3 are the same and
Concept 1 is the transpose of the vec of either concept. That is, for y a scalar
and X an m×n matrix
$$
\frac{\delta y}{\delta X} = \frac{\gamma y}{\gamma X} \quad\text{and}\quad Dy(X) = \left(\operatorname{vec}\frac{\delta y}{\delta X}\right)'. \tag{4.2}
$$
As $\operatorname{vec}\frac{\delta y}{\delta X} = (Dy(X))'$ and, again, as a vec can always be undone by taking
the appropriate generalized rvec, $\operatorname{rvec}_m$ in this case, we have
$$
\frac{\delta y}{\delta X} = \frac{\gamma y}{\gamma X} = \operatorname{rvec}_m (Dy(X))'. \tag{4.3}
$$
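A small numerical illustration of Equations 4.2 and 4.3 (a sketch with a hypothetical scalar function $y = a'Xb$, for which $\gamma y/\gamma X = \delta y/\delta X = ab'$ and $Dy(X) = (\operatorname{vec} ab')'$):

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

m, n = 3, 4
rng = np.random.default_rng(2)
a, b, X = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal((m, n))

y = lambda Z: a @ Z @ b                        # hypothetical scalar function of X

# gamma y / gamma X, built element by element by forward differences
eps = 1e-6
G = np.zeros((m, n))
for r in range(m):
    for s in range(n):
        Xp = X.copy(); Xp[r, s] += eps
        G[r, s] = (y(Xp) - y(X)) / eps

assert np.allclose(G, np.outer(a, b), atol=1e-4)       # gamma y / gamma X = ab'
Dy = vec(np.outer(a, b))                               # (Dy(X))' = vec(delta y / delta X)
assert np.allclose(Dy.reshape((m, n), order="F"), G, atol=1e-4)   # rvec_m undoes the vec (Equation 4.3)
```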
The last case, where Y is in fact a scalar, is prevalent enough in statistics to
warrant us looking at specific examples of the relationships between our
three concepts. The matrix calculus results presented here, as indeed the
results presented throughout this chapter, can be found in books such as
Graham (1981), Lutkepohl (1996), Magnus and Neudecker (1988), Rogers
(1980), and Turkington (2005).
Examples where Y is a scalar:
These examples suffice to show that it is a trivial matter moving between the
different concepts of matrix derivatives when Y is a scalar. In the next section,
we derive transformation principles that allow us to move freely between
the three different concepts of matrix derivatives in more complicated cases.
These principles can be regarded as a generalization of the work done by
Dwyer and Macphail (1948) and by Graham (1980). In deriving these
principles, we call on the work we have done with regards to generalized
vecs and rvecs in Chapter 2, particularly with reference to the selection of
rows and columns of Kronecker products.
$$
\frac{\delta Y}{\delta x_{rs}} = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{rs}}\\
\vdots & & \vdots\\
\dfrac{\partial y_{p1}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{rs}}
\end{pmatrix},
$$
ℓ = (s − 1)m + r
If we are given DY (X ) and we can identify the ℓth column of this matrix,
then Equation 4.7 allows us to move from Concept 1 to Concept 2. If,
however, we have in hand δY/δX , we can identify the submatrix δY/δxrs
then Equation 4.8 allows us to move from Concept 2 to Concept 1.
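This correspondence is easy to see numerically. In the sketch below (an illustration; the test function and the finite-difference derivatives are my own choices), column $\ell = (s-1)m + r$ of $DY(X)$, reshaped column by column into a p×q matrix, reproduces $\delta Y/\delta x_{rs}$.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def DY(f, X, eps=1e-6):
    # Concept 1: rows indexed by vec Y, columns by vec X
    y0 = vec(f(X))
    J = np.zeros((y0.size, X.size))
    for l in range(X.size):
        x = vec(X).copy()
        x[l] += eps
        J[:, l] = (vec(f(x.reshape(X.shape, order="F"))) - y0) / eps
    return J

def dY_dx_rs(f, X, r, s, eps=1e-6):
    # Concept 2 submatrix: derivative of Y with respect to the single element x_rs
    Xp = X.copy()
    Xp[r, s] += eps
    return (f(Xp) - f(X)) / eps

m, n, p, q = 3, 4, 2, 5
rng = np.random.default_rng(1)
A, B = rng.standard_normal((p, m)), rng.standard_normal((n, q))
f = lambda X: A @ X @ B                       # hypothetical test function

X = rng.standard_normal((m, n))
r, s = 1, 2                                   # zero-based indices here
l = s * m + r                                 # column l = (s-1)m + r in 1-based terms
block = DY(f, X)[:, l].reshape((p, q), order="F")
assert np.allclose(block, dY_dx_rs(f, X, r, s), atol=1e-4)
```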
$$
\frac{\gamma y_{ij}}{\gamma X} = \begin{pmatrix}
\dfrac{\partial y_{ij}}{\partial x_{11}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{1n}}\\
\vdots & & \vdots\\
\dfrac{\partial y_{ij}}{\partial x_{m1}} & \cdots & \dfrac{\partial y_{ij}}{\partial x_{mn}}
\end{pmatrix}
$$
and the partial derivative $\partial y_{ij}/\partial x_{rs}$ is given by the (r, s)th element of this submatrix. That is,
$$
\frac{\partial y_{ij}}{\partial x_{rs}} = \left(\frac{\gamma y_{ij}}{\gamma X}\right)_{rs}.
$$
It follows that
$$
\frac{\delta Y}{\delta x_{rs}} = \begin{pmatrix}
\left(\dfrac{\gamma y_{11}}{\gamma X}\right)_{rs} & \cdots & \left(\dfrac{\gamma y_{1q}}{\gamma X}\right)_{rs}\\
\vdots & & \vdots\\
\left(\dfrac{\gamma y_{p1}}{\gamma X}\right)_{rs} & \cdots & \left(\dfrac{\gamma y_{pq}}{\gamma X}\right)_{rs}
\end{pmatrix}. \tag{4.11}
$$
$$
\frac{\delta Y}{\delta x_{rs}} = \begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{1q}}{\partial x_{rs}}\\
\vdots & & \vdots\\
\dfrac{\partial y_{p1}}{\partial x_{rs}} & \cdots & \dfrac{\partial y_{pq}}{\partial x_{rs}}
\end{pmatrix}
$$
and the partial derivative $\partial y_{ij}/\partial x_{rs}$ is the (i, j)th element of this submatrix. That is,
$$
\frac{\partial y_{ij}}{\partial x_{rs}} = \left(\frac{\delta Y}{\delta x_{rs}}\right)_{ij}.
$$
It follows that
$$
\frac{\gamma y_{ij}}{\gamma X} = \begin{pmatrix}
\left(\dfrac{\delta Y}{\delta x_{11}}\right)_{ij} & \cdots & \left(\dfrac{\delta Y}{\delta x_{1n}}\right)_{ij}\\
\vdots & & \vdots\\
\left(\dfrac{\delta Y}{\delta x_{m1}}\right)_{ij} & \cdots & \left(\dfrac{\delta Y}{\delta x_{mn}}\right)_{ij}
\end{pmatrix}. \tag{4.12}
$$
If we have in hand γy/γX , then Equation 4.11 allows us to build up the
submatrices we need for δY/δX. If, however, we have a result for δY/δX ,
then Equation 4.12 allows us to obtain the submatrices we need for γY/γX.
so
$$
\frac{\delta Y}{\delta X} = (\operatorname{vec} B)(\operatorname{rvec} A').
$$
In terms of Concept 3, for this case
$$
\frac{\gamma Y}{\gamma X} = \begin{pmatrix}
B'E_{11}^{pq}A & \cdots & B'E_{1q}^{pq}A\\
\vdots & & \vdots\\
B'E_{p1}^{pq}A & \cdots & B'E_{pq}^{pq}A
\end{pmatrix} = (I_p \otimes B')U_{pq}(I_q \otimes A) = (\operatorname{vec} B')(\operatorname{rvec} A).
$$
$$
DY(X) = A \otimes B
$$
$$
\frac{\delta Y}{\delta X} = (\operatorname{vec} B)(\operatorname{rvec} A')
$$
$$
\frac{\gamma Y}{\gamma X} = (\operatorname{vec} B')(\operatorname{rvec} A).
$$
$$
D(AXB) = B' \otimes A. \tag{4.15}
$$
It follows that
$$
\frac{\delta AXB}{\delta x_{rs}} = AE_{rs}^{mn}B
$$
and
$$
\frac{\gamma (AXB)_{ij}}{\gamma X} = A'E_{ij}^{pq}B'.
$$
Moreover,
$$
\frac{\delta AXB}{\delta X} = (\operatorname{vec} A)(\operatorname{rvec} B)
$$
$$
\frac{\gamma AXB}{\gamma X} = (\operatorname{vec} A')(\operatorname{rvec} B').
$$
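These expressions can be verified numerically. The sketch below (illustration only) checks $D(AXB) = B'\otimes A$ through the identity $\operatorname{vec}(AXB) = (B'\otimes A)\operatorname{vec}X$, and checks $\delta AXB/\delta X = (\operatorname{vec}A)(\operatorname{rvec}B)$ by assembling the block matrix of the submatrices $AE_{rs}^{mn}B$, taking $\operatorname{rvec}B$ to be $(\operatorname{vec}B')'$.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def rvec(M):
    # rvec stacks the rows of M side by side, i.e. rvec M = (vec M')'
    return M.T.reshape(-1, order="F")

p, m, n, q = 2, 3, 4, 2
rng = np.random.default_rng(3)
A, B, X = rng.standard_normal((p, m)), rng.standard_normal((n, q)), rng.standard_normal((m, n))

# Concept 1: vec(AXB) = (B' kron A) vec X, so D(AXB) = B' kron A
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Concept 2: the (r,s) block of delta(AXB)/delta X is A E_rs B, i.e. the outer
# product of column r of A with row s of B, and the whole block matrix is (vec A)(rvec B)
delta = np.block([[np.outer(A[:, r], B[s, :]) for s in range(n)] for r in range(m)])
assert np.allclose(delta, np.outer(vec(A), rvec(B)))
```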
D(X AX ) = X ′ A ′ ⊗ Im + In ⊗ X A (4.16)
δX AX
= (vec Im )(rvec AX ) + (vec X A)(rvec In )
δX
γX AX
= (vec Im )(rvec X ′ A ′ ) + (vec A ′ X ′ )(rvec In ).
γX
3. Y = X ⊗ IG where X is an m×n matrix.
We have seen in Equation 2.29 of Chapter 2 that vec (X ⊗ IG ) =
(In ⊗ vecm KmG )vec X , so
It follows that
δ(X ⊗ IG )
= (vecm KGm )Ersmn
δxrs
and
γ(X ⊗ IG )i j
= (vecm KGm )′ Eiknj where k = G 2 m.
γX
Moreover,
δ(X ⊗ IG )
= vec(vecm KmG )(rvec In ) = (vec ImG )(rvec In )
δX
γ(X ⊗ IG )
= vec(vecm KmG )′ (rvec In ) = (vec ImG )(rvec In ),
γX
where we have used Theorem 2.20 of Section 2.5 in Chapter 2.
4. Y = AX −1 B where A is p×n and B is n×q. Then, it is known that
γ(AX −1 B)i j ′ pq ′
= −X −1 A ′ Ei j B ′ X −1 .
γX
$$
DY(X) = K_{qp}(C \otimes E)
$$
$$
\frac{\delta Y}{\delta X} = (I_m \otimes C)K_{mn}(I_n \otimes E')
$$
$$
\frac{\gamma Y}{\gamma X} = (I_p \otimes E')K_{pq}(I_q \otimes C).
$$
As an example of the use of this second transformation principle, let Y =
AX ′ B where A is p×n and B is m×q. Then, it is known that
D(AX ′ B) = Kpq (A ⊗ B ′ ).
It follows that
δAX ′ B
= AEsrmn B
δxrs
and that
γ(AX ′ B)i j pq
= BE ji A.
γX
In terms of the entire matrices, we have
δY
= (In ⊗ A)Knm (Im ⊗ B)
δX
γY
= (Iq ⊗ B)Kqp (Ip ⊗ A).
γX
Principle 2 comes into its own when it is used in conjunction with Princi-
ple 1. Many matrix derivatives come in two parts: one where Principle 1 is
applicable and the other where Principle 2 is applicable.
For example, we often have
DY (X ) = A ⊗ B + Kqp (C ⊗ E ),
D(BYC ) = (C ′ ⊗ B)DY (X )
δBYC δY
=B C
δxrs δxrs
δBYC δY
= (In ⊗ B) (In ⊗ C ).
δX δX
The third concept of a matrix derivative is not so accommodating.
Certainly, there are rules that allow you to move from γYi j /γX and
γY /γX to γ(BYC )i j /γX and γBYC/γX respectively, but these are
more complicated.
The following results are not as well known:
5. Let Y = E ′ E where E = A + BXC with A p×q, B p×m, and C n×q.
Then, from Lutkepohl (1996), p. 191, we have
D(E ′ E ) = Kqq (C ′ ⊗ E ′ B) + C ′ ⊗ E ′ B.
6. Let Y = EE ′ where E is as in 5.
Then, from Lutkepohl (1996), p. 191, again we have
It follows that
δEE ′
l = EC ′ Esrnm B ′ + BErsmnCE ′
δxrs
γ(EE ′ )i j pp pp
= B ′ E ji EC ′ + B ′ Ei j EC ′
γX
or in terms of complete matrices
δEE ′
= (Im ⊗ EC ′ )Kmn (In ⊗ B ′ ) + (Im ⊗ B)Umn (In ⊗ CE ′ )
δX
= (Im ⊗ EC ′ )Kmn (In ⊗ B ′ ) + (vec B)(rvec CE ′ )
γEE ′
= (Ip ⊗ B ′ )Kpp (Ip ⊗ EC ′ ) + (Ip ⊗ B ′ )Upp (Ip ⊗ EC ′ )
γX
= (Ip ⊗ B ′ )Kpp (Ip ⊗ EC ′ ) + (vec B ′ )(rvec EC ′ ).
The next chapter looks at some new matrix calculus results or at least
old results expressed in a new way. We deal with matrix derivatives using
Concept 4 that involves cross-products and generalized vecs and rvecs.
As far as cross-products are concerned, we can apply our principles to
the transpose of every Kronecker product in the cross-product to get the
corresponding results for the other concepts of matrix derivatives.
∂yi j ∂yi j
Dyi j = ···
∂x1 ∂xm
is 1×m this matrix is p×qm. They then define a matrix of the second order
partial derivatives, the ∂ 2 yi j /∂xr ∂xs , as ∇ 2Y = ∇(∇Y ). That is, to form
and compare it with the matrix of second order partial derivatives that
would have been formed using Concept 1, namely the pqm×m matrix,
which written out in full is
⎛ ⎞
∂y11
⎜D ∂x ⎟
⎜ 1 ⎟
⎜ .. ⎟
⎜ . ⎟
⎜ ⎟
⎜ ∂y p1 ⎟
⎜D ⎟
⎜
⎜ ∂x1 ⎟ ⎟
⎜ .
.
⎟
⎜ . ⎟
⎜ ⎟
⎜
⎜D ∂y 1q ⎟
⎟
∂x1 ⎟
⎜ ⎟
⎜
⎜ .. ⎟
⎜ . ⎟
⎜ ⎟
⎜ ∂y ⎟
pq ⎟
⎜D
⎜
∂x1 ⎟
⎟
⎜
D2Y = D(vec D) = ⎜
⎜ . ⎟
⎜ .. ⎟ . (4.23)
⎟
⎜ ∂y11 ⎟ ⎟
⎜D
⎜
⎟
⎜ ∂xm ⎟
⎜ . ⎟
⎜ .. ⎟
⎜ ⎟
⎜ ∂y p1 ⎟
⎜D
⎜ ⎟
⎟
⎜ ∂xm ⎟
⎜ .. ⎟
.
⎜ ⎟
⎜ ⎟
⎜ ∂y ⎟
⎜ 1q ⎟
⎜D ⎟
⎜ ∂xm ⎟
..
⎜ ⎟
⎜ ⎟
⎜ . ⎟
∂y pq ⎠
⎜ ⎟
⎝
D
∂xm
Comparing Equations 4.22 and 4.23, we have that
D2Y = vecm [(∇ 2Y )T ′m,m,...,m ],
where Tm,m,...,m is the appropriate twining matrix. But from Equation 2.69
of Chapter 2,
T ′m,m,...,m = Kqm ⊗ Im
so, we have
D2Y = vecm [(∇ 2Y )(Kqm ⊗ Im )]. (4.24)
Using the chain rule of ordinary calculus and Equation 4.29, we obtain
n
D(ℓ(β)) = [x t′ − yt exp(x t′ β)x t′ ]. (4.30)
t =1
5.1 Introduction
In this chapter, we develop new matrix calculus results or at least view
existing results in a new light. We concentrate on results that involve the
mathematical concepts developed in Chapters 1 and 2, particularly results
that involve generalized vecs and rvecs on the one hand and cross-products
on the other.
We avoid as much as possible matrix calculus results that are well known.
If the reader wants to familiarize themselves with these, then I refer them to
Magnus and Neudecker (1999), Lutkepohl (1996), and Turkington (2005).
Having said this, however, because I want this book to be self-contained,
it is necessary for me to at least present matrix calculus results, which we
use all the time in our derivations. These results on the whole form rules,
which are the generalizations of the chain rule and the product rule of
ordinary calculus.
We saw in the last chapter that at least four different concepts of matrix
derivatives are prevalent in the literature and that using transformation
principles it is an easy matter to move from results derived for one concept to
the corresponding results for the other concepts. That is not to say, however,
that new results can be just as easily obtained regardless of what concept one
chooses to work with. Experience has shown that by far the easiest concept
to use in deriving results for difficult cases is Concept 1, or the transpose
of this concept, which we called Concept 4. In the following sections, we
develop basic rules for Concept 4.
For the general case given by Equation 5.1, where y and x are s×1 and
r ×1 vectors, respectively, the jth column of ∂y/∂x is the derivative of a
scalar function with respect to a vector, namely ∂y j /∂x, whereas the i th
row of the matrix ∂y/∂x is the derivative of a vector with respect to a scalar,
namely ∂y/∂xi .
In deriving results, where y = vec Y is a complicated vector function of
x = vec X , we need a few basic rules for ∂y/∂x, which I now intend to give
with proofs. For a more complete list of known matrix calculus results,
consult the references previously given.
The last section presents some simple theorems concerning ∂vec Y/
∂vec X. These theorems at first glance appear trivial, but taken together
they give a very effective method of finding new matrix calculus results.
This method is then applied to obtain new results for derivatives involving
vec A, vech A, and v(A) where A is an n×n matrix.
$$
\frac{\partial Ax}{\partial x} = A'
$$
$$
\frac{\partial x'Ax}{\partial x} = (A + A')x = 2Ax \quad\text{if } A \text{ is symmetric.}
$$
Proof: The jth element of $Ax$ is $\sum_k a_{jk}x_k$, so the jth column of $\partial Ax/\partial x$ is $A_{j\cdot}'$ and $\partial Ax/\partial x = A'$. The jth element of $\partial x'Ax/\partial x$ is $\partial x'Ax/\partial x_j = \sum_i a_{ij}x_i + \sum_\ell a_{j\ell}x_\ell$, so $\partial x'Ax/\partial x = (A + A')x$. Clearly, if A is symmetric,
the result becomes 2Ax.
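A finite-difference check of these two results (a sketch; recall that in this convention the rows of $\partial y/\partial x$ are indexed by the elements of x):

```python
import numpy as np

rng = np.random.default_rng(4)
r, s = 4, 3
A, x = rng.standard_normal((s, r)), rng.standard_normal(r)

def grad(f, x, eps=1e-6):
    # numerical d y / d x with rows indexed by the elements of x
    y0 = np.atleast_1d(f(x))
    J = np.zeros((x.size, y0.size))
    for i in range(x.size):
        xp = x.copy(); xp[i] += eps
        J[i] = (np.atleast_1d(f(xp)) - y0) / eps
    return J

assert np.allclose(grad(lambda v: A @ v, x), A.T, atol=1e-4)                 # d(Ax)/dx = A'
Q = rng.standard_normal((r, r))
assert np.allclose(grad(lambda v: v @ Q @ v, x).ravel(), (Q + Q.T) @ x, atol=1e-4)  # d(x'Qx)/dx = (Q+Q')x
```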
Theorem 5.2 (The Backward Chain Rule) Let x = (xi ), y = (yℓ ), and
z = (z j ) be r ×1, s×1, and t ×1 vectors, respectively. Suppose z is a vec-
tor function of y, which in turn is a vector function of x, so we can write
z = z[y(x)].
Then,
$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x}\,\frac{\partial z}{\partial y}.
$$
Hence,
$$
\frac{\partial z}{\partial x} = \frac{\partial y}{\partial x}\,\frac{\partial z}{\partial y}.
$$
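The backward chain rule can be illustrated with a hypothetical pair of functions $y = Bx$ and $z = y'Cy$, for which $\partial y/\partial x = B'$, $\partial z/\partial y = (C + C')y$, and hence $\partial z/\partial x = B'(C + C')Bx$; differentiating $z(x) = x'B'CBx$ directly gives the same vector. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
r, s = 3, 4
B, C, x = rng.standard_normal((s, r)), rng.standard_normal((s, s)), rng.standard_normal(r)

y = B @ x
dz_dx_chain = B.T @ ((C + C.T) @ y)                  # (dy/dx)(dz/dy) with dy/dx = B'
dz_dx_direct = (B.T @ C @ B + B.T @ C.T @ B) @ x     # direct differentiation of x'B'CBx
assert np.allclose(dz_dx_chain, dz_dx_direct)
```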
In developing the next rule, the product rule, it is useful for us to refer to
a generalization of the chain rule where z is a vector function of two vectors
u and v. This generalization is given by the following theorem.
Theorem 5.3 can now be used to obtain the following product rule.
where this last equality follows from the backward chain rule. The result
follows by noting that
Recall from Section 2.5 of Chapter 2 that for an m×n matrix X, we can
write
vec(X ⊗ IG ) = (In ⊗ vecm KmG )vec X
and
vec(IG ⊗ X ) = (vecm KnG ⊗ Im )vec X.
It follows using Theorem 5.1 that
∂vec(X ⊗ IG )
= (In ⊗ vecm KmG ) ′ = In ⊗ (vecm KmG ) ′ = In ⊗ rvecm KGm
∂vec X
(5.2)
and that
∂vec(IG ⊗ X )
= (vecn KnG ⊗ Im ) ′ = (vecn KnG ) ′ ⊗ Im = rvecn K Gn ⊗ Im .
∂vec X
(5.3)
These two results are the building blocks of numerous other results
involving the derivatives of the vecs of Kronecker products. We see that we
can write these derivatives either in terms of generalized rvecs or in terms
of cross-products, and that in both cases our results involve the commutation
matrix.
Consider now an p×G matrix A whose elements are not functions of the
elements of X. Then,
vec(X ⊗ A) = vec[(Im ⊗ A)(X ⊗ IG )] = (InG ⊗ Im ⊗ A)vec(X ⊗ IG ).
Using the backward chain rule, Theorem 4.2, we have
∂vec(X ⊗ A) ∂vec(X ⊗ IG )
= (InGm ⊗ A ′ ).
∂vec X ∂vec X
From Equation 5.2, we can now write
∂vec(X ⊗ A)
∂vec X
= (In ⊗ rvecm KGm )(In ⊗ IGm ⊗ A ′ )=In ⊗ (rvecm KGm )(IGm ⊗ A ′ ) (5.4)
= In ⊗ rvecm [KGm (Im ⊗ A ′ )], (5.5)
using Equation 1.19 of Chapter 1, which gives the derivative in terms of
generalized rvecs. If we want the equivalent result in terms of cross-products,
we apply Theorem 2.28 of Chapter 2 to Equation 5.4 to obtain
∂vec(X ⊗ A)
= KGn τGnm [KGm (Im ⊗ A ′ )]. (5.6)
∂vec X
∂vec(X ⊗ A)
= In ⊗ (Im ⊗ a1′ . . . Im ⊗ aG′ ).
∂vec X
Alternatively, as
′ ⎞
In ⊗ e1G Im ⊗ a1′
⎛ ⎛ ⎞
KGn .. ′ ..
=⎝ ⎠ and KGm (Im ⊗ A ) = ⎝
⎜ ⎟ ⎜ ⎟
. . ⎠
′
In ⊗ eGG Im ⊗ aG′
∂vec(X ⊗ A) ′ ′
= In ⊗ e1G ⊗ Im ⊗ a1′ + · · · + In ⊗ eGG ⊗ Im ⊗ aG′ .
∂vec X
In a similar manner,
∂vec(A ⊗ X )
∂vec X
∂vec(IG ⊗ X )
= (IGn ⊗ A ′ ⊗ Im ) = (rvecm KGn ⊗ Im )(IGn ⊗ A ′ ⊗ Im )
∂vec X
(5.7)
= (rvecm KGn )(IGn ⊗ A ′ ) ⊗ Im = rvecm [KGn (In ⊗ A ′ )] ⊗ Im , (5.8)
∂vec(A ⊗ X )
= IGn τGnm (A ′ ⊗ Im ). (5.9)
∂vec X
Again, we can investigate this result further by applying Theorem 2.25 of
Chapter 2 to Equation 5.8 to obtain
∂vec(A ⊗ X )
= (In ⊗ a1′ . . . In ⊗ aG′ ) ⊗ Im .
∂vec X
Suppose now A and B are mG× p and nG×q matrices whose elements are
not functions of the elements of the m×n matrix X. Consider
∂vec A ′ (IG ⊗ X )B
= (B1 ⊗ A1 ) + · · · + (BG ⊗ AG ).
∂vec X
The result for A ′ (X ⊗ IG )B is easily obtained by writing
∂vec A ′ (X ⊗ IG )B
= KGn BτGnm KGm A. (5.11)
∂vec X
Bs
∂vec A ′ (D ⊗ X )B
= (B1 ⊗ (d1′ ⊗ Im )A) + · · · + (Bs ⊗ (ds′ ⊗ Im )A)
∂vec X
where d j is the jth column of D for j = 1, . . . , s. Similarly, if A is now ms× p,
then
∂vec A ′ (X ⊗ D)B
= Ksn Bτsnm Ksm (Im ⊗ D ′ )A = B (1) ⊗ (Im ⊗ d1′ )A
∂vec X
+ · · · + B (s) ⊗ (Im ⊗ ds′ )A.
A ′ (X ⊗ X )B = A ′ (X ⊗ Im )(In ⊗ X )B.
∂vec A ′ (X ⊗ X )B ∂vec A ′ (X ⊗ Im )
= ((In ⊗ X )B ⊗ Ip )
∂vec X ∂vec X
∂vec(In ⊗ X )B
+ (Iq ⊗ (X ′ ⊗ Im )A)
∂vec X
= (Kmn τmnm Kmm A)((In ⊗ X )B ⊗ Ip )
+ (Bτnnm Imn )(Iq ⊗ (X ′ ⊗ Im )A)
by applying Equations 5.10 and 5.11. It follows from Theorem 1.5 of Chap-
ter 1 that
∂vec A ′ (X ⊗ X )B
= Kmn (In ⊗ X )Bτmnm Kmm A + Bτnnm (X ′ ⊗ Im )A.
∂vec X
(5.13)
=⎝
⎜ .. .. ⎟
. . ⎠
(Im ⊗ x1′ )B ⊗ (An )1· + · · · +(Im ⊗ xn′ )B ⊗ (An )n·
′ ′
B1 ⊗ x 1 A(1) + · · · +Bm ⊗ x m A(1)
⎛ ⎞
.. ..
+⎝ ⎠,
⎜ ⎟
. .
′ ′
B1 ⊗ x 1 A(n) + · · · +Bm ⊗ x m A(n)
where B = (B1′ . . . Bm′ ) ′ and each submatrix B j is m×q.
Consider
A ′ (X ′ ⊗ X )B = A ′ (X ′ ⊗ Im )(Im ⊗ X )B
so, applying the product rule yields
∂vec A ′ (X ′ ⊗ X )B ∂vec A ′ (X ′ ⊗ Im )
= [(Im ⊗ X )B ⊗ Ip ]
∂vec X ∂vec X
∂vec(Im ⊗ X )B
+ [Iq ⊗ (X ⊗ Im )A]. (5.15)
∂vec X
=⎝
⎜ .. .. ⎟
. . ⎠
′ ′
(Im ⊗ x 1 )B ⊗ (An )1· + · · · + (Im ⊗ x m )B ⊗ (An )m·
′ ′
(B1 )1· ⊗ (x 1 ⊗ Im )A + · · · + (Bm )1· ⊗ (x m ⊗ Im )A
⎛ ⎞
+⎝ .. ..
⎠.
⎜ ⎟
. .
1′ m′
(B1 )n· ⊗ (x ⊗ Im )A + · · · + (Bm )n· ⊗ (x ⊗ Im )A
Consider
A ′ (X ⊗ X ′ )B = A ′ (X ⊗ In )(In ⊗ X ′ )B
so, again applying the product rule gives
∂vec A ′ (X ⊗ X ′ )B ∂vec A ′ (X ⊗ In )
= [(In ⊗ X ′ )B ⊗ Ip ]
∂vec X ∂vec X
∂vec(In ⊗ X ′ )B
+ [Iq ⊗ (X ′ ⊗ In )A]. (5.17)
∂vec X
Now,
∂vec(In ⊗ X ′ )B ∂vec X ′ ∂vec(In ⊗ X ′ )B
= = Knm (B τnmn In2 ) (5.18)
∂vec X ∂vec X ∂vec X ′
∂vec A ′ (X ⊗ X ′ )B
= Knn (In ⊗ X ′ )B τnnm Knm A + Knm [B τnmn (X ′ ⊗ In )A].
∂vec X
Expanding this result requires a little work.
Using Theorem 2.13 of Chapter 2,
X ′ B1 τn1m Knm A
⎛ ⎞
where B = (B1′ . . . Bn′ ) ′ and each submatrix is m×q. From Equation 1.10 of
Chapter 1,
((X ′ ⊗ In )A)( j ) = X ′ A( j )
B τnm1 X ′ A(1)
⎛ ⎞
=⎝
⎜ .. .. ⎟
. . ⎠
x1′ Bn ⊗ A(1) + · · · +xn′ Bn ⊗ A(n)
B1 ⊗ x1′ A(1) + · · · +Bn ⊗ xn′ A(1)
⎛ ⎞
.. ..
+⎝ ⎠,
⎜ ⎟
. .
B1 ⊗ x1′ A(n) + · · · +Bn ⊗ xn′ A(n)
5.5.1 Introduction
When we take the rvec_m of an mG×p matrix A, we get an m×pG matrix, whereas if we take the vec_m of a q×mG matrix B, we get a Gq×m matrix, and, just like any other matrices, we can envisage taking the matrix derivatives of these generalized rvecs and vecs. If Y is such a matrix, that is, a generalized vec or generalized rvec, and the elements of Y are differentiable functions of the elements of X, then as in the previous section we work with ∂vecY/∂vecX.
For convenience, we divide this section into two parts. The first part deals
with ‘large X’, where X is mG× p or p×mG. The second part looks at
generalized rvecs and vecs involving a ‘small X’ where X is, say, p×q. As in
the previous section, we call on the results derived in Chapters 1 and 2 on
generalized vecs, rvecs, and cross-products together with results involving
the rvec of the commutation matrix.
5.5.2 Large X
rvecm X = (X1 . . . XG )
so
⎛ ⎞
vec X1
vec(rvecm X ) = ⎝ ... ⎠ .
⎜ ⎟
vec XG
say, for j = 1, . . . G, so
vec X j = (Ip ⊗ S j )vec X
and
⎛ ⎞
I p ⊗ S1
vec(rvecm X ) = ⎝ ..
⎠ vec X.
⎜ ⎟
.
I p ⊗ SG
Using Theorem 5.1, we obtain
∂vec(rvecm X )
= Ip ⊗ S1′ . . . Ip ⊗ SG′ = Ip ⊗ e1G ⊗ Im . . . Ip ⊗ eGG ⊗ Im
∂vec X
= Ip ⊗ e1G . . . Ip ⊗ eGG ⊗ Im = KpG ⊗ Im .
(5.21)
This result is the basic building block from which several other matrix
derivative results of generalized rvecs can be derived.
If X is now p×mG, then using the backward chain rule
∂vec(rvecm X ′ ) ∂vec X ′ ∂vec(rvecm X ′ )
= .
∂vec X ∂vec X ∂vec X ′
∂vec X ′ ′
But = Kp,mG = KmG,p so
∂vec X
∂vec(rvecm X ′ )
= KmG,p (KpG ⊗ Im ),
∂vec X
from Equation 5.21. But from Equation 2.9 of Chapter 2,
KmG,p (KpG ⊗ Im ) = (IG ⊗ Kmp )(KG p ⊗ Im )(KpG ⊗ Im ) = IG ⊗ Kmp ,
which gives our second result, namely
∂vec(rvecm X ′ )
= IG ⊗ Kmp . (5.22)
∂vec X
In a similar fashion, if X is mG×mG and nonsingular, then by the backward
chain rule
∂vec(rvecm X −1 ) ∂vec X −1 ∂vec(rvecm X −1 )
= .
∂vec X ∂vec X ∂vec X −1
∂vec X −1 ′
But = −(X −1 ⊗ X −1 ), by Equation 4.17 of Chapter 4 so using
∂vec X
Equation 5.21, we have
∂vec(rvecm X −1 ) ′
= −(X −1 ⊗ X −1 )(KmG,G ⊗ Im ).
∂vec X
= IG ⊗ ⎝ ..
⎠,
⎜ ⎟
.
′
m
A ⊗ em
by Theorem 2.3 of Chapter 2.
∂vec(rvecm X −1 A)
= −(X −1 ⊗ X −1 )(KmG,G ⊗ Im )(IG ⊗ A ⊗ Im )
∂vec X
′ ′
= −(X −1 ⊗ X 1 . . . X −1 ⊗ X G )(IG ⊗ A ⊗ Im )
⎛ ⎞
A ⊗ Im O
−1 1′ −1
= −(X ⊗ X . . . X ⊗ X ) ⎝ G′ ⎜ .. ⎟
. ⎠
O A ⊗ Im
′ ′
= −(X −1 A ⊗ X 1 . . . X −1 A ⊗ X G )
∂vec X ′ ′
= KmG,p = Kp ,mG
∂vec X
and
∂vec(vecm X ′ )
= Kp,mG (KGm ⊗ Ip ). (5.24)
∂vec X
so,
where in our analysis we have used Equation 5.24 and Theorem 2.3 of
Chapter 2.
Finally, if X is an mG×mG nonsingular matrix and A is an q×mG matrix
of constants, then
∂vec(vecm AX −1 )
∂vec X
∂vec(vecm X −1 ) ′
= (ImG ⊗ A ′ ) = −(X −1 KGm ⊗ X −1 )(ImG ⊗ A ′ )
∂vec X
′ ′ ′
= −(X −1 KGm ⊗ X −1 A ′ ) = X(1)
−1
⊗ X −1 A ′ . . . X(m)
−1
⊗ X −1 A ′
where in our working we have made use of Equation 5.25 and Equation
2.66 of Chapter 2.
5.5.3 Small X
so
(rvecm A) ′ = vecm A ′
so
∂vec(rvecm AX B)
∂vec X
AG
where each submatrix is m× p, then
A ′ = (A1′ . . . AG′ )
and
A1′
⎛ ⎞
vecm A ′ = ⎝ ... ⎠ .
⎜ ⎟
AG′
From our work of selection matrices in Section 2.2 of Chapter 2, we know
that
G′
e j ⊗ Ip vecm A ′ = A ′j
an s×Gm matrix, so it makes sense to take the vec_m of this product, which by Theorem 1.12 of Chapter 1 is given by
vecm AX B = (IG ⊗ AX )vecm B = (IG ⊗ A)(IG ⊗ X )vecm B.
Taking the vec of this matrix renders
vec(vecm AX B) = [(vecm B) ′ ⊗ (IG ⊗ A)]vec(IG ⊗ X ),
so by Theorem 5.1
∂vec(vecm AX B) ∂vec(IG ⊗ X )
= (vecm B ⊗ IG ⊗ A ′ ).
∂vec X ∂vec X
Applying Equation 5.3 allows us to write
∂vec(vecm AX B)
= (rvecm KGq ⊗ Ip )(vecm B ⊗ IG ⊗ A ′ ).
∂vec X
Applying Theorem 2.28 of Chapter 2, we obtain
∂vec(vecm AX B)
= vecm B τGqp (IG ⊗ A ′ ). (5.26)
∂vec X
If we expand this derivative further by partitioning B as B = (B1 . . . BG ),
where each submatrix is q×m, so writing out the cross-product of Equation
5.26 gives
∂vec(vecm AX B) ′ ′
= B1 ⊗ e1G ⊗ A ′ + · · · + BG ⊗ eGG ⊗ A ′ .
∂vec X
Suppose now X is q× p while A and B remain the same. Then, by the
backward chain rule
∂vec(vecm AX ′ B) ∂vec X ′ ∂vec(vecm AX ′ B)
= = Kpq (vecm BτGqp (IG ⊗ A ′ ))
∂vec X ∂vec X ∂vec X ′
by Equation 5.26. But by Equation 1.9 of Chapter 1, (IG ⊗ A ′ )( j ) = IG ⊗ a ′j
where a j is the jth column of A, so using Theorem 2.12 of Chapter 2 we
can write
vecm B τGq1 (IG ⊗ a1′ )
⎛ ⎞
′
∂vec(vecm AX B) ⎜ ..
=⎝ ⎠.
⎟
∂vec X .
′
vecm B τGq1 (IG ⊗ a p )
To elaborate further, we can expand the cross-products to obtain
′ ′
B1 ⊗ e1G ⊗ a1′ + · · · + BG ⊗ eGG ⊗ a1′
⎛ ⎞
′
∂vec(vecm AX B) ⎜ ..
=⎝ ⎠.
⎟
∂vec X .
G′ ′ G′ ′
B1 ⊗ e1 ⊗ a p + · · · + BG ⊗ eG ⊗ a p
so
∂vec(X1 ⊗ A1 )
= Ip ⊗ eG1 ⊗ rvecm Kqm Iqm ⊗ A1′
∂vec X
= Ip ⊗ eG1 ⊗ (rvecm Kqm ) Iqm ⊗ A1′
⎞
(rvecm Kqm ) Iqm ⊗ A1′
⎛
⎜ O ⎟
= Ip ⊗ ⎜ ⎟.
⎜ ⎟
..
⎝ . ⎠
O
It follows that
⎞
(rvecm Kqm ) Iqm ⊗ A1′
⎛
∂vec X τGmn A ..
= Ip ⊗ ⎝ ⎠.
⎜ ⎟
∂vec X .
′
(rvecm Kqm ) Iqm ⊗ AG
But using Equation 1.19 of Chapter 1, we can write
(rvecm Kqm ) Iqm ⊗ A1′ = rvecm Kqm Im ⊗ A1′
so
rvecm [Kqm (Im ⊗ A1′ )]
⎛ ⎞
∂vec X τGmn A ..
= Ip ⊗ ⎝ ⎠. (5.30)
⎜ ⎟
∂vec X .
′
rvecm [Kqm (Im ⊗ AG )]
If we wanted to write this result more succinctly note that
Iqm ⊗ A1′
⎛ ⎞
∂vec X τGmn A ..
= Ip ⊗ (IG ⊗ rvecm Kqm ) ⎝
⎜ ⎟
∂vec X . ⎠
Iqm ⊗ AG′
and from Theorem 2.5 of Chapter 2
Iqm ⊗ A1′
⎛ ⎞
.. ′
⎠ = (KG,qm ⊗ Iq )(Iqm ⊗ vecm A ),
⎜ ⎟
⎝ .
Iqm ⊗ AG′
allowing us to write
∂vec X τGmn A
= Ip ⊗ (IG ⊗ rvecm Kqm )(KG,qm ⊗ Iq )(Iqm ⊗ vecn A ′ ).
∂vec X
But by Theorem 2.22 of Chapter 2,
(IG ⊗ rvecm Kqm )(KG,qm ⊗ Iq ) = KGm rvecmG Kq,mG
so, more succinctly
∂vec X τGmn A
= Ip ⊗ KGm rvecmG Kq,mG (Iqm ⊗ vecm A ′ ). (5.31)
∂vec X
If, however, we wanted to break this result down further or write it
another way, we could return to Equation 5.30 and appeal to Equation 2.11
of Chapter 2, which then allows us to write
Im ⊗ (A1′ )1· · · · Im ⊗ (A1′ )q·
⎛ ⎞
∂vec X τGmn A .. ..
= Ip ⊗ ⎝
⎜ ⎟
∂vec X . . ⎠
′ ′
Im ⊗ (AG )1· · · · Im ⊗ (AG )q·
rvec A1′ ⊗ Im
⎛ ⎞
= Ip ⊗ ⎝ ..
⎠ (Iq ⊗ Knm ). (5.32)
⎜ ⎟
.
rvec AG′ ⊗ Im
Consider now
vec A τGnm X = vec(A1 ⊗ X1 ) + · · · + vec(AG ⊗ XG ).
∂vec(A1 ⊗ X1 )
= rvecm Kqp Iqp ⊗ A1′ ⊗ Im .
∂vec X1
so
and
∂vec A τGnm X ∂vec X τGmn A
= (Kpq ⊗ Kmn ).
∂vec X ∂vec X
Using Equation 5.30, we can write
∂vec(A τGnm X )
= (Ip ⊗ C )(Kpq ⊗ Kmn )
∂vec X
where
C=⎝ ..
⎠.
⎜ ⎟
.
rvecm [Kqm (Im ⊗ AG′ )]
so we write
∂vec(A τGnm X ) q
= Ip ⊗ C e1 ⊗ Kmn . . . Ip ⊗ C eqq ⊗ Kmn .
∂vec X
q
Consider the first block of the matrix C(e1 ⊗ Kmn ):
q
rvecm Kqm Im ⊗ A1′
e1 ⊗ Kmn
q
= (rvecm Kqm ) Iq ⊗ Im ⊗ A1′ e1 ⊗ Kmn
q
= (rvecm Kqm ) e1 ⊗ Im ⊗ A1′ Kmn
⎛
Im ⊗ A1′ Kmn
⎞
q′
⎜
′ ⎜ O ⎟
= Im ⊗ e1 . . . Im ⊗ eqq ⎜
⎟
.. ⎟
⎝ . ⎠
O
q′ ′
= Im ⊗ e1 A1 Kmn = Im ⊗ (A1 )1 . Kmn = A1′ 1 . ⊗ Im .
′
(KG p ⊗ Im ) ⎝Ip ⊗ ⎝
⎜ ⎜ .. ⎟⎟
.
⎠⎠
rvecm Kqm Im ⊗ AG′
⎞
Ip ⊗ rvecm Kqm Im ⊗ A1′
⎛
=⎝
⎜ .. ⎟
.
⎠
′
Ip ⊗ rvecm Kqm Im ⊗ AG
so
⎞
Ip ⊗ rvecm Kqm Im ⊗ A1′
⎛
∂vec X ′ τGmn A ..
= (IG ⊗ Kmp ) ⎝ ⎠.
⎜ ⎟
∂vec X . ′
Ip ⊗ rvecm Kqm Im ⊗ AG
Theorem 2.25 of Chapter 2 allows us to write this result another way and
break it down further. Applying this theorem we have,
Ip ⊗ rvec A1′ ⊗ Im
⎛ ⎞
∂vec X ′ τGmn A ..
= (IG ⊗ Kmp ) ⎝ ⎠ (Ipq ⊗ Knm )
⎜ ⎟
∂vec X .
′
Ip ⊗ rvec AG ⊗ Im
Ip ⊗ (Im ⊗ (A1′ )1· . . . Im ⊗ (A1′ )q· )
⎛ ⎞
= (IG ⊗ Kmp ) ⎝ ..
⎠.
⎜ ⎟
.
′ ′
Ip ⊗ (Im ⊗ (AG )1· . . . Im ⊗ (AG )q· )
(5.34)
so
and
∂vec AτGnm X ′ ∂vec X ′ τGmn A
= (Kpq ⊗ Kmn ). (5.35)
∂vec X ∂vec X
Substituting Equation 5.34 into Equation 5.35 and noting that Knm Kmn =
Imn , we have
Ip ⊗ rvec A1′ ⊗ Im
⎛ ⎞
′
∂vec AτGmn X ..
= (IG ⊗ Kmp ) ⎝ ⎠ (Kpq ⊗ Imn ).
⎜ ⎟
∂vec X .
′
Ip ⊗ rvec AG ⊗ Im
If we partition X −1 as
X1
⎛ ⎞
X −1 = ⎝ ... ⎠
⎜ ⎟
XG
∂vec X −1 τGm n A ′
= −X −1 ⊗ X −1 KGm rvecmG Kq,mG (Iqm ⊗ vecm A ′ ).
∂vec X
(5.36)
In a similar manner,
∂vec AτGnm X −1
∂vec X
∂vec AτGnm X −1
′
= −(X −1 ⊗ X −1 )
∂vec X −1
−1 ′
= −(X ⊗ X )(ImG ⊗ (vecn A ′ )(1) ⊗ Im . . . ImG ⊗ (vecn A ′ )(q) ⊗ Im )
−1
′ ′
= −X −1 ⊗ X −1 ((vecn A ′ )(1) ⊗ Im ) . . . − X −1 ⊗ X −1 (vecn A ′ )(q) ⊗ Im
A more succinct expression for this equation can be obtained using Equation
5.36 and the fact that
∂vec AτGnm X −1 ∂vec X −1 τGmn A
= (KmG,q ⊗ Kmn )
∂vec X ∂vec X
to obtain
∂vec AτGnm X −1
∂vec X
′
= −(X −1 ⊗ X −1 KGm rvecmG Kq,mG (Iqm ⊗ vecn A ′ ))(KmG,q ⊗ Kmn ).
X τGmm X = X1 ⊗ X1 + · · · + XG ⊗ XG
and
so
∂vec(X τGmm X ) ∂vec(X1 ⊗ X1 ) ∂vec(XG ⊗ XG )
= + ··· + .
∂vec X ∂vec X ∂vec X
∂vec(X1 ⊗ X1 )
Consider . By the backward chain rule,
∂vec X
∂vec(X1 ⊗ X1 ) ∂vec X1 ∂vec(X1 ⊗ X1 )
= .
∂vec X ∂vec X ∂vec X1
By Equation 5.29,
∂vec X1
= Ip ⊗ S1′ .
∂vec X
′
where S1 is the m×mG selection matrix e1G ⊗ Im .
From Equation 5.13 of Section 5.4,
∂vec(X1 ⊗ X1 )
= Kmp (Ip ⊗ X1 )τmpm Kmm + Ip2 τ ppm (X1′ ⊗ Im ).
∂vec X1
But using Theorem 2.19 of Chapter 2,
so
∂vec(X1 ⊗ X1 )
= (Ip ⊗ S1′ )[Ip ⊗ (X1 τm1m Kmm ) + Ip2 τ pp m (X1′ ⊗ Im )].
∂vec X
(5.37)
by Theorem 2.13 of Chapter 2. We can write the first part of our derivative
then as
Ip ⊗ (KmG X τmGm Kmm ) . (5.39)
Adding our two parts given by Equations 5.38 and 5.39 together yields,
∂vec(X τGmm X )
∂vec X
= Ip ⊗ (KmG X τmGm Kmm ) + Ip2 τ (vecm X ′ τG,p,Gm vecm S ′ ). (5.40)
p,p,Gm
But (X1′ )1· ⊗ S1′ = (X1′ )1· ⊗ e1G ⊗ Im = e1G (X1′ )1· ⊗ Im
so
Ip2 τ p,p,Gm X1′ ⊗ S1′ = Ip ⊗ e1G X1′ 1· . . . Ip ⊗ e1G X1′ p· ) ⊗ Im
⎡ ⎛ ′ ⎞ ⎛ ′ ⎞⎤
X1 1· (X1 ) p·
⎢ ⎜ O ⎟ ⎜ O ⎟⎥
= ⎢I p ⊗ ⎜ . ⎟ . . . Ip ⊗ ⎜ . ⎟⎥ ⊗ Im .
⎢ ⎜ ⎟ ⎜ ⎟⎥
⎣ ⎝ . . ⎠ ⎝ . . ⎠⎦
O O
It follows that the second part of ∂vec(X τGm m X )/∂vec X can be written as
⎡ ⎛ ′ ⎞ ⎛ ′ ⎞⎤
(X1 )1· (X1 ) p·
⎜ .. ⎟ ⎜ .. ⎟⎥
⎣Ip ⊗ ⎝ . ⎠ . . . Ip ⊗ ⎝ . ⎠⎦ ⊗ Im
⎢
so
∂vec X ′ ′
= Kp,mG = KmG,p
∂vec X
and using Equation 5.40, we have
∂vec X ′ τGm m X ′
= KmG,p Ip ⊗ (KmG X ′ τmGm Kmm )
∂vec X
+ Ip2 τ p,p,Gm (vecm X τG,p,Gm vecm S ′ ) .
(5.43)
and we can write the first matrix on the right-hand side of Equation 5.43
as
Now, as
⎛ ⎞
(X1 )1·
(vecm X )(1) = ⎝ ... ⎠
⎜ ⎟
(XG )1·
it follows from Theorem 2.3 of Chapter 2 that
⎛ ⎞
Ip ⊗ (X1 )1·
KG p Ip ⊗ (vecm X )(1) = ⎝
⎜ .. ⎟
. ⎠
Ip ⊗ (XG )1·
⎛ ⎞
Kmp (Ip ⊗ (X1 )1· ⊗ Im )
..
⎠.
⎜ ⎟
⎝ .
Kmp (Ip ⊗ (XG )1· ⊗ Im )
Returning now to Equation 5.43, it is clear that we can write the second
matrix of the right-hand side of Equation 5.43 as
⎛ ⎞
Kmp (Ip ⊗ (X1 )1· ⊗ Im ) ··· Kmp (Ip ⊗ (X1 ) p· ⊗ Im )
.. ..
⎠.
⎜ ⎟
⎝ . .
Kmp (Ip ⊗ (XG )1· ⊗ Im ) · · · Kmp (Ip ⊗ (XG ) p· ⊗ Im )
′
Kmp (Ip ⊗ X1′ τm1m Kmm ) = Kmp (Ip ⊗ [(X1′ )1· ⊗ Im ⊗ e1m
′
m
+ · · · + (X1′ )m· ⊗ Im ⊗ em ])
′ ′ ⎞
Ip ⊗ (X1′ )1· ⊗ e1m ⊗ e1m
⎛
′
Kmp (Ip ⊗ (X1′ )1· ⊗ Im ⊗ e1m ) = ⎝
⎜ .. ⎟
. ⎠
′ m′ m′
Ip ⊗ (X1 )1· ⊗ em ⊗ e1
so
′ ′ ′
m′
Ip ⊗ (X1′ )1· ⊗ e1m ⊗ e1m + · · · + (X1′ )m· ⊗ e1m ⊗ em
⎛ ⎞
=⎝
⎜ .. ⎟
. ⎠
m′ m′ m′ m′
′ ′
Ip ⊗ (X1 )1· ⊗ em ⊗ e1 + · · · + (X1 )m· ⊗ em ⊗ em
′
Ip ⊗ (X1′ ⊗ e1m )τm1m Im
⎛ ⎞
= ⎝ ... ⎠.
⎜ ⎟
m′
′
Ip ⊗ (X1 ⊗ em )τm1m Im
The first matrix on the right-hand side of Equation 5.47 can then be broken
down to
′
⎛ ⎞
Ip ⊗ (X1′ ⊗ e1m )τm11 Im
⎜ .. ⎟
⎜
⎜ . ⎟
⎟
⎜ I ⊗ (X ′ ⊗ e m ′ )τ I ⎟
⎜ p 1 m m11 m ⎟
⎜ .. ⎟
⎟.
.
⎜
⎜ ⎟
′
m
′
⎜ Ip ⊗ (XG ⊗ e1 )τm11 Im ⎟
⎜ ⎟
⎜ .. ⎟
.
⎜ ⎟
⎝ ⎠
′
m
′
Ip ⊗ (XG ⊗ em )τm11 Im
To expand the second matrix on the right-hand side of Equation 5.47 note
that by Equation 1.6 of Chapter 1,
′ ⎞
(X1 )1· ⊗ e1m
⎛
(X1′ )1· ⊗ Im = ⎝
⎜ .. ⎟
. ⎠
m′
(X1 )1· ⊗ em
′ ⎞
Ip ⊗ (X1 )1· ⊗ e1m
⎛
so
∂vec X −1 τGmm X −1
∂vec X
′
= −X −1 ⊗ X −1 (KmG X −1 τmGm Kmm )
′ ′
− (X −1 ⊗ X −1 )(I(mG)2 τmG,mG,m (vecm X −1 τG,Gm,Gm vecm S ′ )). (5.48)
Consider the first matrix on the right-hand side of this equation.
Suppose we write,
⎛ 1⎞
X
−1 ⎜ .. ⎟
X =⎝ . ⎠
XG
Consider now the second matrix on the right-hand side of Equation 5.48,
which using Equation 5.41, we can write as
′ ′ ′
−(X −1 ⊗ X −1 ) ImG ⊗ (vecm X −1 )(1) ⊗ Im . . . ImG ⊗ (vecm X −1 )(mG) ⊗ Im
′ ′ (1)
= −X −1 ⊗ X −1 vecm X −1
⊗ Im . . .
′ ′ (mG)
−X −1 ⊗ X −1 vecm X −1
⊗ Im .
∂vec X −1 τGm m X −1
∂vec X
′
= − X −1 ⊗ X −1 τG,m,mG vecm X −1 τm,mG,1 Im
′ (1) ′
− X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 . . .
′ (mG) ′
X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 .
so the first matrix on the right-hand side of our result can be written as
′ ′ ′
−X −1 ⊗ ((X 1 )1· ⊗ X 1 + · · · + (X G )1· ⊗ X G ) ⊗ e1m
′ ′ ′
+ · · · + ((X 1 )m· ⊗ X 1 + · · · + (X G )m· ⊗ X G ) ⊗ e m
and
′ (1) ′ ′ ′ ′ ′
vecm X −1 τG,1,mG vecm X −1 = X·11 ⊗ X 1 + · · · + X·1G ⊗ X G .
G′ ′
+ · · · + X·mG ⊗ XG .
5.7.1 Introduction
One of the advantages of working with the concept of a matrix deriva-
tive given by ∂vec Y/∂vec X is that if vec Y = Avec X where A is a matrix
of constants, then ∂ℓ/∂vec Y = A∂ℓ/∂vec X for several of the vectors and
matrices we encounter in our work. That is, often given the specialized
matrices and vectors we work with if y = Ax, and A is a matrix of con-
stants, then ∂ℓ/∂y = A∂ℓ/∂x for a scalar function ℓ. For example, if A
is a selection matrix or a permutation matrix, then y = Ax implies that
∂ℓ/∂y = A∂ℓ/∂x, for an arbitrary scalar function ℓ as well. In this section,
this property is investigated further. It is demonstrated that several theorems
can be derived from this property. On the face of it, these theorems appear
very simple and indeed their proofs are almost trivial. But taken together,
they form a powerful tool for deriving matrix calculus results. By way of
illustration, these theorems are used in Section 5.7.3 to derive results, some
of which are new, for derivatives involving the vectors studied in Section
1.4.3 of Chapter 1, namely vec A, vech A, and v(A) for A a n×n matrix.
They are also used in Section 5.7.4 to explain how results for derivatives
involving vec X where X is a symmetric matrix can be derived from known
results.
Proof: Clearly,
$$
\frac{\partial x}{\partial x} = \left(\frac{\partial x_1}{\partial x} \;\cdots\; \frac{\partial x_n}{\partial x}\right) = \left(e_1^n \;\cdots\; e_n^n\right) = I_n,
$$
where $e_j^n$ is the jth column of $I_n$.
Theorem 5.6 Suppose x and y are two column vectors such that y = Ax and
∂ℓ/∂y = A∂ℓ/∂x for A a matrix of constants and ℓ a scalar function. Let z be
a column vector. Then,
∂z ∂z
=A .
∂y ∂x
∂ℓ ∂ℓ
=A .
∂y ∂x
Write
′
z = z1 . . . zp .
Then,
∂z p
∂z ∂z1 ∂z1 ∂z p
= ... = A ... A
∂y ∂y ∂y ∂x ∂x
∂z1 ∂z p ∂z
=A ... =A .
∂x ∂x ∂x
Theorem 5.7 Suppose x and y are two column vectors such that y = Ax and
∂ℓ/∂y = A∂ℓ/∂x for A a matrix of constants and ℓ a scalar function. Suppose
the elements of x are distinct. Then,
′
∂y ∂x
= .
∂x ∂y
∂z ∂z
=A
∂y ∂x
∂x ∂x
=A
∂y ∂x
That is, to form vech A we stack the elements of A on and below the main diagonal one underneath the other. The vector v(A) is the $\frac{1}{2}n(n-1)\times 1$ vector given by
That is, we form v(A) by stacking the elements of A below the main diag-
onal, one beneath the other. These vectors are important for statisticians
and econometricians. If A is a covariance matrix, then vecA contains the
variances and covariances but with the covariances duplicated. The vector
vechA contains the variances and covariances without duplication and v(A)
contains the covariances without the variances.
Regardless as to whether A is symmetric or not, the elements in vechA
and v(A) are distinct. The elements in vecA are distinct provided A is
not symmetric. If A is symmetric, the elements of vecA are not distinct.
$$
\frac{\partial v(A)}{\partial v(A)} = I_{\frac{1}{2}n(n-1)} \quad\text{for all } A
$$
$$
\frac{\partial \operatorname{vec} A}{\partial \operatorname{vec} A} = I_{n^2} \quad\text{provided } A \text{ is not symmetric.}
$$
What $\partial\operatorname{vec}A/\partial\operatorname{vec}A$ is in the case where A is symmetric is discussed in Section 5.7.4.
In Section 3.2 of Chapter 3, we also saw that there exist $\frac{1}{2}n(n+1)\times n^2$ and $\frac{1}{2}n(n-1)\times n^2$ zero-one matrices $L_n$ and $\bar{L}_n$, respectively, such that
$$
L_n \operatorname{vec} A = \operatorname{vech} A
$$
and
$$
\bar{L}_n \operatorname{vec} A = v(A).
$$
If A is symmetric, then
Nn vec A = vec A
where $N_n = \frac{1}{2}(I_{n^2} + K_{nn})$ and $K_{nn}$ is a commutation matrix, so for this case
$$
L_nN_n \operatorname{vec} A = \operatorname{vech} A
$$
and
$$
\bar{L}_nN_n \operatorname{vec} A = v(A).
$$
Dn vech A = vec A.
Consider ℓ any scalar function. Then, reflexion shows that the same
relationships exist between ∂ℓ/∂vec A, ∂ℓ/∂vech A, and ∂ℓ/∂v(A) as exist
between vecA, vechA, and v(A), respectively.
Theorem 5.9
$$
\frac{\partial \operatorname{vec} A}{\partial \operatorname{vech} A} = D_n' \quad\text{if } A \text{ is symmetric}
$$
$$
\frac{\partial \operatorname{vec} A}{\partial \operatorname{vech} A} = L_n \quad\text{if } A \text{ is not symmetric.}
$$
Proof: If A is symmetric, $\operatorname{vec} A = D_n\operatorname{vech} A$ and the result follows. For the case where A is not symmetric, consider
$$
\operatorname{vech} A = L_n \operatorname{vec} A
$$
and
$$
\frac{\partial \operatorname{vec} A}{\partial \operatorname{vech} A} = L_n.
$$
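A numerical sketch of the symmetric case (illustration only, with $D_n$ built from its defining property): parameterize a symmetric A by vech A, map it to vec A, and the Jacobian of that map, with rows indexed by the elements of vech A, is $D_n'$.

```python
import numpy as np

def duplication_matrix(n):
    # D_n vech(A) = vec(A) for symmetric A (column-major vec, lower-triangle vech)
    pos, k = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = k; k += 1
    D = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            D[j * n + i, pos[(max(i, j), min(i, j))]] = 1.0
    return D

def vec_from_vech(h, n):
    # rebuild the symmetric matrix from vech and return its vec
    A = np.zeros((n, n)); k = 0
    for j in range(n):
        for i in range(j, n):
            A[i, j] = A[j, i] = h[k]; k += 1
    return A.reshape(-1, order="F")

n = 3
h = np.random.default_rng(6).standard_normal(n * (n + 1) // 2)
J = np.zeros((h.size, n * n))
for k in range(h.size):                                    # d vec A / d vech A
    e = np.zeros(h.size); e[k] = 1.0
    J[k] = vec_from_vech(h + e, n) - vec_from_vech(h, n)   # the map is linear, so this is exact
assert np.allclose(J, duplication_matrix(n).T)             # Theorem 5.9: equals D_n'
```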
Theorem 5.10
$$
\frac{\partial \operatorname{vech} A}{\partial \operatorname{vec} A} = D_n \quad\text{if } A \text{ is symmetric}
$$
$$
\frac{\partial \operatorname{vech} A}{\partial \operatorname{vec} A} = L_n' \quad\text{if } A \text{ is not symmetric.}
$$
Theorem 5.6 can also be used to quickly derive results about elimination
matrices, duplication matrices, and the matrix Nn . Consider, for example,
the case where A is a symmetric n×n matrix, so
Ln Nn vec A = vech A.
By Theorem 5.6, for any vector z,
∂z ∂z
= Ln Nn .
∂vech A ∂vec A
Take z = vechA. Then,
∂vech A ∂vech A
= Ln Nn = Ln Nn Dn
∂vech A ∂vec A
by Theorem 5.10.
But as the elements of vechA are distinct,
∂vech A
= I 1 n(n+1) ,
∂vech A 2
so
Ln Nn Dn = I 1 n(n+1) ,
2
Clearly, this matrix is not the identity matrix. What it is, is given by the
following theorem whose proof again calls on our results of Section 5.7.2.
vec X = Dn vech X
is a symmetric matrix. Then, the full import of Theorem 5.8 for this case is
given by the equation
∂y ∂x φy
= . (5.53)
∂x ∂x φx
Combining Equations 5.51 and 5.52 give the following theorem.
A few examples will suffice to illustrate the use of this theorem. (For the rules
referred to in these examples, see Turkington (2004), Lutkepohl (1996), or
Magnus and Neudecker (1999)).
For x with distinct elements and A a matrix of constants, we know that
∂x ′ Ax
= 2(A + A ′ )x.
∂x
It follows that when x = vecX and X is an n×n symmetric matrix
∂x ′ Ax
= 2Dn Dn′ (A + A ′ )x.
∂x
For X non-singular, but non-symmetric matrix
∂|X |
= |X |vec(X −1 ) ′
∂vec X
so for X non-singular, but symmetric
∂|X |
= |X |Dn Dn′ vec X −1 .
∂vec X
For X an n×n non-symmetric matrix, A and B matrices of constants
∂vec AX B
= B ⊗ A′
∂vec X
so for X an n×n symmetric matrix
∂vec AX B
= Dn Dn′ (B ⊗ A ′ ).
∂vec X
All results using either ∂vec Y/∂vec X or DY (in which case we have to take
transposes) can be adjusted in this way to allow for the case where X is a
symmetric matrix.
In the next chapter, the analysis of this section is brought together to
explain precisely how one should differentiate a log-likelihood function
using matrix calculus.
SIX
Applications
6.1 Introduction
As mentioned in the preface of this book, the main purpose of this work is
to introduce new mathematical operators and to present known matrices
that are important in matrix calculus in a new light. Much of this work
has concentrated on cross-products, generalized vecs and rvecs, and how
they interact and how they can be used to link different concepts of matrix
derivatives. Well-known matrices such as elimination matrices and duplica-
tion matrices have been revisited and presented in a form that enables one
to see precisely how these matrices interact with other matrices, particularly
Kronecker products. New matrix calculus results have also been presented
in this book.
Much of the work then has been of a theoretical nature and I hope it
can stand on its own. Having said this, however, I feel the book would
be incomplete without some indication as to how matrix calculus and the
specialized properties associated with it can be applied.
Matrix calculus can be applied to any area that requires extensive dif-
ferentiation. The advantage of using matrix calculus is that it substantially
speeds up the differentiation process and stacks the partial derivatives in
such a manner that one can easily identify the end result of the process.
Multivariate optimization springs to mind. In Section 6.2, we illustrate the
use of matrix calculus in a well-known optimization problem taken from
the area of finance.
The traditional areas, however, that use matrix calculus are to a large
extent statistics and econometrics. Classical statistical procedures centred
around the log-likelihood function such as maximum likelihood estima-
tion and the formation of classical test statistics certainly require extensive
differentiation. It is here that matrix calculus comes into its own.
What has been said for statistics holds more so for econometrics, where
the statistical models are complex and the log-likelihood function is a
very complicated function. Applying classical statistical procedures then
to econometric models is no trivial matter. Usually, it is beyond the scope
of ordinary calculus and requires matrix calculus.
As shown in Chapter 4, four different concepts of matrix calculus have
been used, particularly in statistics. In this chapter, as in Chapter 5, Concept
4 of Chapter 4 is used to derive the results.
No attempt is made in this chapter to provide an extensive list of the
applications of matrix calculus and zero-one matrices to models in statis-
tics and econometrics. For such applications, see Magnus and Neudecker
(1999) and Turkington (2005). Instead, what is offered in Section 6.3 is a
brief and non-rigorous summary of classical statistical procedures. Section
6.4 explains why these procedures are amenable to matrix calculus and the
standard approach one should adopt when using matrix calculus to form
the score vector and information matrix, the basic building blocks of clas-
sical statistical procedures. Sections 6.4, 6.5, and 6.6 present applications of
our technique to a statistical model, where we are sampling from a mul-
tivariate normal distribution and to two econometric models, the limited
information model and the full information matrix.
$$
\frac{\partial y}{\partial x} = 0.
$$
A given critical point is a local maximum if the Hessian matrix is negative
definite when evaluated at that point whereas the point is a local minimum
if the Hessian matrix is positive definite when evaluated at the point.
In complicated optimization problems, the rules of matrix calculus can
be used to obtain both the score vector and the Hessian matrix usually
far easier than if one was to use ordinary calculus. To illustrate, consider
a well-known problem taken from finance, namely finding the optimal
portfolio allocation. (This section is taken from Maller and Turkington
$$
x_1 + \cdots + x_n = 1
$$
so
$$
x_n = 1 - x_1 - \cdots - x_{n-1} = 1 - i_R' x_R
$$
where $i_R$ is an $(n-1)\times 1$ vector whose elements are all ones and $x_R$ is the $(n-1)\times 1$ vector given by $x_R = (x_1 \ldots x_{n-1})'$, and we can write
$$
x = \begin{pmatrix} x_R \\ 1 - i_R' x_R \end{pmatrix} = A x_R + d,
$$
where
$$
A = \begin{pmatrix} I_{n-1} \\ -i_R' \end{pmatrix} \quad\text{and}\quad d = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
$$
y ′µ 1
Max g(xR ) = % = y ′ µ(y ′ y)− 2
xR ′
y y
where y = AxR + d. Using the product rule of ordinary calculus plus the
backward chain rule of matrix calculus given by Theorem 5.2 of Chapter 5,
( −1 µ)R
xR∗ = ,
i ′ −1 µ
where, following our notation ( −1 µ)R denotes the vector consisting of the
first n − 1 elements of −1 µ. In terms of our original variables, the point
xR∗ corresponds to
′
x ∗ = −1 µ/i ′ −1 µ
∂ 2 g(xR ) ∗
x
∂xR ∂xR R
& '
µ′ −1 µ µµ′ µµ′ µ′ −1 µ 3µ′ −1 µµµ′
= −A 2 ′ −1 + ′ −1 + ′ −1 A
i ′ −1 µ i µ i µ i µ (i ′ −1 µ)3
5
(µ′ −1 µ) 2
× 5 .
((i ′ −1 µ)2 ) 2
5
Now ((i ′ −1 µ)2 ) 2 = (|i ′ −1 µ|)5 = sign(i ′ −1 µ)(i ′ −1 µ)5 , so
Well-known results from matrix algebra (see Horn and Johnson (1989))
ensure that the matrix A ′ ( − µ(µ′ −1 µ)−1 µ′ )A is positive definite, so
whether the Hessian matrix at xR∗ is negative definite or positive definitive
depends crucially on the sign of i ′ −1 µ. If i ′ −1 µ > 0, then xR∗ is a max-
imum and converting back to our original variables, x ∗ = −1 µ/i ′ −1 µ
would be the unique maximum of the %constrained problem. This gives the
maximum Sharpe ratio of f (x ∗ ) = µ′ −1 µ. If i ′ −1 µ < 0, then xR∗ is
a minimum and x ∗%gives a unique minimum of the constrained problem,
namely f (x ∗ ) = − µ′ −1 µ 1 .
1
Maller and Turkington (2002) were the first to recognize the possibility that x ∗ may give
rise to a minimum of the constrained problem rather than a maximum. Their expression
for the Hessian matrix ∂g(xR )/∂xR ∂xR contains a number of typos in it.
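The closed-form solution is easy to check numerically. In the sketch below (an illustration with made-up numbers; the symbol $\Sigma$ stands for the covariance matrix of returns, whose symbol has been lost from the displays above), the weights $x^* = \Sigma^{-1}\mu/(i'\Sigma^{-1}\mu)$ sum to one, attain the bound $\sqrt{\mu'\Sigma^{-1}\mu}$ in absolute value, and no other portfolio exceeds that bound, the sign of $i'\Sigma^{-1}\mu$ deciding whether $x^*$ is the constrained maximum or minimum, as discussed above.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
mu = rng.uniform(0.02, 0.10, n)                      # hypothetical mean returns
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)                      # hypothetical positive definite covariance
i = np.ones(n)

w = np.linalg.solve(Sigma, mu)                       # Sigma^{-1} mu
x_star = w / (i @ w)                                 # the candidate weights, summing to one

sharpe = lambda x: (x @ mu) / np.sqrt(x @ Sigma @ x)
bound = np.sqrt(mu @ w)                              # sqrt(mu' Sigma^{-1} mu)

assert np.isclose(i @ x_star, 1.0)
assert np.isclose(abs(sharpe(x_star)), bound)        # x* attains the Cauchy-Schwarz bound
for _ in range(100):                                 # no other direction does better in absolute value
    z = rng.standard_normal(n)
    assert abs(sharpe(z)) <= bound + 1e-9
```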
∂ℓ/∂θi . This vector we call the score vector. The Hessian matrix of ℓ(θ) is the
k×k matrix ∂ 2 ℓ/∂θ∂θ = ∂ (∂ℓ/∂θ)/∂θ whose (i, j)th element is ∂ 2 ℓ/∂θi ∂θ j .
The asymptotic information matrix is
2
1 ∂ ℓ
I (θ) = − lim E
n→∞ n ∂θ∂θ
where n denotes the sample size. Now, the limit of the expectation need
not be the same as the probability limit, but for the models we consider
in this chapter, based as they are on the multivariate normal distribution,
the two concepts are the same. Often it is more convenient to regard the
information matrix as
1 ∂ 2ℓ
I (θ) = −p lim .
n ∂θ∂θ
The inverse of this matrix, I −1 (θ) is called the asymptotic Cramer-Rao
lower bound and can be used in the following way. Suppose θˆ is a consistent
estimator of θ and that
√ d
n(θˆ − θ) → N (0, V ).2
Let θ˜ denote the MLE of θ. Then, θ˜ is consistent, and θ˜ is the BAN estimator
so
√ d
n(θ˜ − θ) → N 0, I −1 (θ) .
Note that the likelihood ratio test (LRT) statistic uses both the unconstrained
MLE θ˜ and the constrained MLE θ. If H0 is indeed true, it should not matter
6.3 Summary of Classical Statistical Procedures 221
6.6, we use the second approach and form the concentrated log-likelihood
function for our models.
∂vec
= DG′
∂v
so we would write Equation 6.3 as
∂ℓ ∂vec φℓ
=
∂v ∂v φvec
Using Equation 6.4 allows us to form the Hessian matrix of ℓ(θ). We have
∂ ∂ℓ φ (∂ℓ/∂δ)
= DG′
∂v ∂δ φvec
so
′ ′
∂ 2ℓ
∂ ∂ℓ φ (∂ℓ/∂δ)
= = DG
∂δ∂v ∂v ∂δ φvec
and
∂ 2ℓ
∂ ∂ℓ ′ φ ′ φℓ
= = DG DG
∂v∂v ∂v ∂v φvec φvec
φ2 ℓ
φ φℓ
= DG′ DG = DG′ D (6.5)
φvec φvec φvecφvec G
1 φ2 ℓ
C = −p lim .
n φvecφvec
Then, we can write the information matrix as
B ′ DG
A
I (θ) = .
DG′ B DG′ CDG
Often, see for example Turkington (2005), the matrices B and C will be
Kronecker products or at least involve Kronecker products, thus justifying
our study in Chapter 3 of how the duplication matric DG interacts with
Kronecker products. In fact, in many econometric models C = 21 ( ⊗ ).
Consider then the case where
C = (E ⊗ E )
thus justifying our study in Section 3.2.2 of Chapter 3 of how the elimination
matrix LG NG interacts with Kronecker products. In the case where B is not
the null matrix, then
−1 G S
I (θ) =
S′ J
where
The next derivative in the score vector, namely ∂ℓ/∂v, uses the technique
explained in the previous section. Consider
φℓ 1 φ log || 1 φ
=− n − tr −1 Z.
φvec 2 φvec 2 φvec
Now, from Equation 4.4 of Chapter 4
φ log ||
= vec −1
φvec
and using the backward chain rule together with Equations 4.5 and 4.16 of
Chapter 4
φ tr −1 Z φvec −1 φ tr −1 Z
= = −( −1 ⊗ −1 )vecZ
φvec φvec φvec −1
so
φℓ 1 1
= − n vec −1 + ( −1 ⊗ −1 )vecZ
φvec 2 2
1 −1
= ( ⊗ −1 )vec (Z − n)
2
and
∂ℓ 1
= DG′ ( −1 ⊗ −1 )vec(Z − n). (6.7)
∂v 2
Together, Equations 6.6 and 6.7 give the components of the score vector
∂ℓ ∂ℓ ′ ′
′
∂ℓ
= .
∂θ ∂µ ∂v
φvec −1 φvec −1 a
φ ∂ℓ
=
φvec ∂µ φvec φvec −1
n
with a = (yi − µ).
i=1
But using Theorem 5.1 of Chapter 5,
n φvec −1 1 φvec −1 Z −1
φ φℓ
=− +
φvec φvec 2 φvec 2 φvec
−1
1 φvec −1 Z −1
φvec nIG2
=− − . (6.8)
φvec 2 2 φvec −1
But from Equation 4.15 of Chapter 4,
φvec −1 Z −1
= −1 Z ⊗ IG + IG ⊗ −1 Z (6.9)
φvec −1
so from Equations 6.8 and 6.9,
∂ 2ℓ ( −1 Z ⊗ IG ) (IG ⊗ −1 Z )
nIG2
= DG′ ( −1 ⊗ −1 ) − − DG .
∂v∂v 2 2 2
The Information Matrix
From basic statistics,
1 1
E (a) = 0 E (Z ) = ,
n n
so the information matrix is
−1
1 O
I (θ) = − lim E (H (θ)) = 1 ′ .
n→∞ n O D ( −1 ⊗ −1 )DG
2 G
6.6 The Limited Information Model 229
by Equation 4.4 of Chapter 4 and using the backward chain rule given by
Theorem 5.2 of Chapter 5, we have
φ tr −1U ′U φvec −1 φ tr −1U ′U
= = −( −1 ⊗ −1 )vecU ′U
φvec φvec φvec −1
= −vec −1U ′U −1 ,
by Equations 4.5 and 4.16 of Chapter 4. It follows that
∂ℓ D′
= G (vec −1U ′U −1 − nvec −1 )
∂v 2
which equals the null vector, only if
˜ = U ′U
= . (6.13)
n
The second derivative is by the backward chain rule and Theorem 5.1 of
Chapter 5:
∂ℓ 1 ∂u u ′ ( −1 ⊗ In )u
=− = H ′ ( −1 ⊗ In )u.
∂δ 2 ∂δ δu
Setting this derivative to the null vector gives,
H ′ ( −1 ⊗ In )(y − H δ) = 0.
Solving for δ gives and iterative interpretation for the limited information
maximum likelihood (LIML) estimator δ˜ as a generalized least squares
estimator namely,
δ˜ = (H ′ ( −1 ⊗ In )H )−1 H ′ ( −1 ⊗ In )y. (6.14)
This interpretation of the LIML estimator was first obtained by Pagan
(1979).
Equations 6.13 and 6.14 form the basis of our iterative procedures, which
is outlined as follows:
Iterative Procedure 1
1. Apply two-stage least squares (2SLS) (or another consistent estimation
procedure) to y1 = H1 δ1 + u1 to obtain the 2SLSE δˆ1 . Apply ordinary
least squares (OLS) to the reduced form equation Y1 = X 1 + V1 and
obtain the OLSE ˆ 1 . Compute the residual matrices
û1 = y1 − H1 δˆ1 , ˆ 1.
V̂1 = Y1 − X
234 Applications
ˆ
δˆ = (H ′ (
ˆ −1 ⊗ In )H )−1 H ′ (
ˆ −1 ⊗ In )y,
ˆ
and compute ûˆ = y − H δˆ and Ûˆ = rvecn û.ˆ
4. Repeat steps 2 and 3 with Ûˆ in place of Û .
5. Continue in this manner until convergence is reached. The LIML
estimate of δ1 is then the first component of the estimate thus obtained
for δ.
n
ℓ∗ (δ) = − log det U ′U .
2
The first order condition for the maximization of this function is ∂ℓ∗ /∂δ =
0 and our iterative process is derived from the two components of this
equation. We have using the backward chain rule and Equation 4.5 of
Chapter 4
(U ′U )−1
(u1′ MV u1 )−1 −(u1′ MV u1 )−1 u1′ V1 (V1′V1 )−1
= 1 1
.
−(V1′ Mu V1 )−1V1′ u1 (u1′ u1 )−1 (V1′ Mu V1 )−1
1 1
6.6 The Limited Information Model 235
∂ℓ∗ n H1′ MV u1
= ′ (H1′ u1 − H1′ (u1′ V1 (V1′V1 )−1 ⊗ In ))v1 = n ′ 1
,
∂δ1 u1 MV u1 u1 MV u1
1 1
H1′ MV u1 = 0.
1
X ′ Mu V1 = 0.
1
Solving gives
˜ 1 = (X ′ Mu X )−1 X ′ Mu Y1
(6.16)
1 1
Equations 6.15 and 6.16 form the basis of our next iterative process. Before
we outline this process, it pays us to give an interpretation to the iterative
estimators portrayed in these equations.
We have assumed that the rows of U = (u1V1 ) are statistically indepen-
dently identically, normally distributed random vectors with mean 0 and
covariance matrix
2
σ η′
= .
η 1
V1 = u1 η ′ −1
1 +W
236 Applications
Y1 = X 1 + u1 + W , (6.18)
Iterative Procedure 2
û1 = y1 − H1 δˆ1
and
3. Form
6. Repeat steps 2, 3, 4, and 5 with δ˜1 in place of the original estimate δˆ1 .
7. Continue in this manner until convergence is obtained.
and hence
Ṽ1 = (In − X (X ′ Mu X )−1 X ′ Mu )Y1 .
1 1
It follows that
n
ℓ∗∗ (δ1 ) = − log det Ũ ′Ũ
2
where Ũ = (u1Ṽ1 ).
Before using matrix calculus to obtain the derivative ∂ℓ∗∗ /∂δ1 it pays us
to simplify this expression as much as possible. To this end, write
det Ũ ′ Ũ = u1′ u1 det Y1′ Mu In − Mu X (X ′ Mu X )−1 X ′ Mu Mu Y1 .
1 1 1 1 1
(6.19)
238 Applications
1
det (u1Y1 ) ′ M (u1Y1 ) .
= ′
u1 Mu1
Furthermore,
′
0′ 0′
1 1
(u1 Y1 ) ′ M (u1 Y1 ) = (y1 Y1 ) ′ M (Y1 y1 ) (6.20)
−β1 IG −β1 IG
1 1
where the first partitioned matrix on the right-hand side of Equation 6.20
has a determinant equal to one. Therefore,
which does not depend on δ1 . Thus, the log-likelihood function ℓ∗∗ (δ1 ) can
be written as
n u ′ Mu n
ℓ∗∗ (δ1 ) = k ∗ − log 1 ′ 1 = k ∗ − (log u1′ Mu1 − log u1′ u1 )
2 u1 u1 2
Similarly,
∂ log u1′ u1 2H ′ u
= − ′1 1 ,
∂δ1 u1 u1
so
∂ℓ∗∗
′
H1′ Mu1 (H1′ Nu1 u1′ u1 − H1′ u1 u1′ Nu1 )
H1 u1
= −n − = n ,
∂δ1 u1′ u1 u1′ Mu1 u1′ u1 u1′ Mu1
6.6 The Limited Information Model 239
If this is the case, then the LIML estimator of δ1 has an iterative instrumental
variable interpretation given by
δ˜1 = (H̃1′ H1 )−1 H̃1′ y1 .
−1
To establish our result, we expand X ′ Mu X to obtain
1
−1 Nu u ′ N
X X ′ Mu X X′ = N + ′1 1 .
1 u1 Mu1
Then, after a little algebra, we find that
′
−1
′ Nu1′ Mu1 − Nu1 u1′ M
X X Mu X X Mu = .
1 1 u1′ Mu1
Thus,
′
H1′ Nu1 u1′ Mu1 − H1′ Mu1 u1′ Nu1 H1 Nu1 u1′ u1 − H1′ u1 u1′ Nu1
H̃1′ u1 = =
u1′ Mu1 u1′ Mu1
as we require.
Our results give rise to a third iterative process for finding the LIML
estimator of δ1 , which is now outlined:
Iterative Procedure 3
1. Apply steps 1, 2, and 3 of iterative process 2.
2. Form
ˆ1
Ŷ1 = X
and
Ĥ1 = (Ŷ1 X1 )
240 Applications
and obtain
δ1 = (Ĥ1′ Ĥ1 )−1 Ĥ1′ y1 .
3. Repeat steps 1 and 2 with δ1 in place of the original estimate of δ1 .
4. Continue in this manner until convergence is achieved.
(OLS). For the other four sets of initial starting values, Procedure 1 and
Procedure 3 always converge with Procedure 3, again being vastly more
efficient than Procedure 1. Procedure 2 often would not converge. In the
case where it did, it was ranked in efficiency terms between Procedure 1 and
Procedure 3.
The message from these results seems clear. Iterative procedures based
on the first-order conditions derived from the maximization of the log-
likelihood function work, but are inefficient. More efficient iterative proce-
dures can be derived by working with concentrated log-likelihood functions.
But the most efficient procedure arises from the first-order conditions of the
maximization of the log-likelihood function concentrated in the parameters
of primary interest. Moreover, such a procedure seems relatively insensitive
to the initial starting value. Concentrating out a subset of nuisance parame-
ters can lend to a more efficient iterative procedure, but this procedure may
become sensitive to initial starting values. Arbitrary starting values may not
give rise to convergence.
yi = Yi βi + Xi γi + ui = Hi δi + ui , i = 1, . . . , G,
E (ut i us j ) = σi j if t = s
=0 if t = s
y = Hδ + u (6.21)
Y B + XŴ = U (6.22)
Y = −X ŴB−1 + U B−1 = X + V
y = (IG ⊗ X )π + v
′
where π = vec, and v = vecV = (B−1 ⊗ In )u.
The unknown parameters of our model are θ = (δ ′ v ′ ) ′ where v = vech.
Usually, δ is the vector of parameters of primary interest and v is the vector
of nuisance parameters.
The likelihood function is the joint probability function of y. We obtain
this function by starting with the joint probability density function of u.
We have assumed that u ∼ N (0, ⊗ In ), so the joint probability density
function of y is
1 1 ′ −1
f (y) = |det J| n 1 exp − u ( ⊗ In )u ,
(2π) 2 (det ⊗ In ) 2 2
244 Applications
Ŵ = −( T1 δ1 · · · TG δ1 ).
246 Applications
Moreover,
vec B = vec IG − W δ,
Returning to Equation 6.24 using Equations 6.25, 6.26, and 6.27, we find
we can write
∂ℓ∗ (δ) ′ ˜ ⊗ In u
= H − W ′ (IG ⊗ V ′ )
−1
∂δ
and H ′ − W ′ (IG ⊗ V ′ )W is the block matrix
⎛ ⎞ ⎛ ⎞
H1 − V W1 O X 1 X1 O
⎜ .. ⎠=⎝
⎟ ⎜ .. ⎠.
⎟
⎝ . .
O HG − V WG O (X G XG )
6.7 The Full Information Model 247
δ˜ = H˜ ˜ ⊗ In H −1 H˜ ˜ ⊗ In y
−1 −1
(6.28)
ˆ
1. Repeat process with δ in place of δ.
2. Continue in this manner until convergence is reached.
Number of iterations
Initial values until convergence
Y1′ X1′
⎛ ⎞ ⎛ ⎞
O O
′
Y =⎝
⎜ .. ⎟ ′
and X = ⎝
⎜ .. ⎟
. ⎠ . ⎠
O YG′ O XG′
6.7 The Full Information Model 249
so we can write
′
Y
T ′ = H ′,
X
where T is the appropriate twining matrix.
Recognising this relationship facilitates the mathematics required in
applying classical statistical procedures to our model. To illustrate, sup-
pose we want to test the null hypothesis
H0 : β = 0
against the alternative
HA : β = 0,
where β = (β′1 . . . β′a )′
The null hypothesis implies that the equations of our model contain no
right-hand current endogenous variables and thus our model under the
null collapses to the Seemingly Unrelated Regressions Equation Model, see
Turkington (2005). Suppose further we want to develop the Lagrangian
multiplier test statistic for this hypothesis and present it as an alternative
to other test statistics that would be using to test endogeneity such as the
Hausman test statistic (Hausman (1978)).
We are working with the concentrated log-likelihood function ℓ∗ (δ)
formed by concentrating out the nuisance parameters v. The test statistic
we seek to form is then
′
1 ∂ℓ∗ ββ ˆ ∂ℓ∗
T∗ = I ( θ)
n ∂β θˆ ∂β θˆ
where Ẽ = Ũ ′Ũ /n, Ũ = rvecn ũ with ũ = (ũ1′ · · · ũG′ ) ′ and ũi the ordi-
nary least squares residual vector from the ith equation, that is, ũi =
(In − Xi (Xi′ Xi )−1 Xi′ )yi .
This iterative interpretation regards the joint generalised least squares
estimator (JGLSE) as the starting point in the iterative process to the MLE.
The constrained MLE of θˆ (or at least the iterative asymptotic equivalent
of this estimator) is θˆ = (( 0 γ̂ ) ′ T ′ v̂) ′ where v̂ = vech
ˆ and
ˆ = Û ′Û /n,
Û = rvecn û, and û is the JGLS residual vector, that is, û = y − X γ̂. Notice
that the twining matrix T is involved in the expression for the constrained
MLE of θ.
Our twining matrix T comes into play again when we form the second
component of our test statistic, namely ∂ℓ/∂β. Let = (β ′ γ ′ ) ′ , then as
T = δ it follows that T ∂ℓ/∂ψ = ∂ℓ/∂δ and that ∂ℓ/∂ψ = T ′ ∂ℓ/∂δ.We
can then obtain the derivative we want using
∂ℓ ∂ℓ
=S
∂β ∂ψ
G
where S is the selection matrix (Im Om×p ) with m = i=1 Gi and p =
G
i=1 ki .
In summary,
∂ℓ ∂ℓ
=A (6.30)
∂β ∂δ
where
(IG OG ×k ) O
⎛ ⎞
1 1 1
A = ST ′ = ⎝
⎜ .. ⎠.
⎟
.
O (IG OG ×k )
G G G
∂ℓ∗ ′ −1
ˆ ⊗ In u
=H
∂δ
6.7 The Full Information Model 251
where H is a block diagonal matrix with H i = X i Xi in the ith block
diagonal position. It follows from Equation 6.30 that
⎛ ′ ′ ⎞
1 X1 O
∂ℓ ⎜ ..
⎟ ˆ −1
=⎝ . ⎠ ⊗ In u.
∂β
O G′ XG′
The third component of the quadratic form, that is, the Lagrangian multi-
plier test statistic, can also be expressed with the help of twining matrices.
As ∂ℓ/∂β = A∂ℓ/∂δ, we have that I ββ = AI ββ A ′ . From our discussion in
Section 6.2, it is clear that
−1
I (δ) = I δδ (θ).
It is well known, see for example Turkington (2005), that
−1
δδ 1 ′ −1
I (θ) = p lim H ( ⊗ N )H ,
n
where N is the projection matrix N = X (X ′ X )−1 X ′ . Moreover, H =
(Y X )T ′ so we can write
1 −1
I δδ = p lim ST ′ T (Y X ) ′ ( −1 ⊗ N )(Y X )T ′ T S′
n
1 −1
= p lim S (Y X ) ′ ( −1 ⊗ N )(Y X ) S ′ (6.31)
n
so in obtaining the part of the Cramer-Rao lower bound we want, we need
the (1,1) block matrix of the inverse of Equation 6.31. That is,
1 ′ ′
I = p lim Y −1 ⊗ N Y − Y −1 ⊗ N
ββ
n
′ −1 ′ −1
X X −1 ⊗ N X X −1 ⊗ N Y .
Evaluating the probability limit requires basic asymptotic theory. If from
here on, we use the notation that {Ai′ A j } stands for a partitioned matrix
whose (i, j)th block is Ai′ A j , then
ββ 1 i j ′ ′
I = p lim σ i Xi X j j
n
−1 i j −1
− σi j i′ Xi′ X j σi j Xi′ X j σ Xi′ X j j
.
ˆ ′ X1′
⎛ ⎞
G O
∂ℓ .. ⎟ ˆ −1
=⎝ ⎠ ( ⊗ In )û.
⎜
θˆ .
∂β
O ˆ ′ X′
G G
5. Form
ˆ = 1 i j ˆ ′ ′ ˆ −1
σ̂ i Xi X j j − σ̂i j
ˆ i′ Xi′ X j σ̂i j Xi′ X j
ββ
I (θ)
n
ˆ j −1 .
× σ̂i j Xi′ X j
6.7 The Full Information Model 253
AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG
⎛
⎞
A1
⎜ B1 ⎟
⎜ ⎟
A ⎜ .. ⎟
TG,m,n =⎜ . ⎟
B ⎜ ⎟
⎝ AG ⎠
BG
255
256 Symbols and Operators Used in this Book
⎛ ⎞
A1 j·
A ( j ) = ⎝ ... ⎠
⎜ ⎟
AG j·
rvecm A = A1 . . . AG
Let C = C1 . . . CG where each submatrix is q ×n.
C( j ) = C1 .j . . . CG .j
⎛ ⎞
C1
⎜ .. ⎟
vecnC = ⎝ . ⎠
CG
Special Matrices
Kmn commutation matrix
rvecn Kmn generalized rvec of the commutation matrix
vecm Kmn generalized vec of the commutation matrix
1
Nn = In2 + Knn
2
Ln , Ln Nn , L̄n Nn , Ln , Ln∗ elimination matrices
Dn , D̄n duplication matrices
TG,m,n twining matrix
O null matrix
0 null column vector
References
Byron, R. P. ‘On the Derived Reduced Form from Limited Information Maximum
Likelihood’, Australia National University Memo, 1978.
Bowden, R. and Turkington, D. A. ‘Instrumental Variables’, vol 8 of the Econometric
Society Monographs in Quantitative Economics. New York: Cambridge University Press,
1990.
Durbin, J. ‘Maximum Likelihood Estimator of the Parameters of a System of Simulta-
neous Regression Equations’, Econometric Theory 4 (1988): 159–70.
Dwyer, P. S. ‘Some Applications of Matrix Derivatives in Multivariate Analysis’. Journal
of the American Statistical Association 26 (1967): 607–25.
Dwyer, P. S. and MacPhail, M. S. ‘Symbolic Matrix Derivatives’. Annals of Mathematical
Statistics 19 (1948): 517–34.
Efron, B. ‘Defining the Curvature of a Statistical Problem (with Applications to Second
Order Efficiency)’, Annals of Statistics 3 (1975): 1189–242.
Fuller, W. ‘Some Properties of a Modification of the Limited Information Estimator’,
Econometrica 45 (1977): 939–56.
Graham, A. Kronecker Products and Matrix Calculus with Applications. Chichester, U.K.:
Ellis Horwood, 1981.
Graeme, W. H. Econometric Analysis, 7th edn. Pearson, N.J.: Prentice Hall, 2010.
Hausman, J. ‘Specification Tests in Econometrics’, Econometrica 46 (1978): 1251–71.
Henderson, H. V. and Searle, S. R. ‘Vec and Vech Operators for Matrices with Some Uses
in Jacobian and Multivariate Statistics’, Canadian Journal of Statistics 7 (1979): 65–81.
Henderson, H. V. and Searle, S. R. ‘The Vec-Permutation Matrix, the Vec Operator and
Kronecker Products: A Review’, Linear and Multilinear Algebra 9 (1981): 271–88.
Horn, R. A. and Johnson, C.R. Matrix Analysis. New York: Cambridge University Press,
1981.
Lutkepohl, H. Handbook of Matrices. New York: John Wiley & Sons, 1996.
Magnus, J. Linear Structures. New York: Oxford University Press, 1988.
Magnus, J. R. ‘On the Concept of Matrix Derivative’, Journal of Multivariate Analysis
101 (2010): 2200–06.
Magnus, J. R. and Neudecker, H. Matrix Differential Calculus with Applications in Statis-
tics and Econometrics, revised edn. New York: John Wiley & Sons, 1999.
Maller, R. A. and Turkington, D. A. ‘New Light on the Portfolio Allocation Problem’,
Mathematical Methods of Operations Research 56 (2002): 501–11.
257
258 References
259
260 Index
matrix calculus and, 135–138, 141, 143, nuisance parameters and, 221–223,
158, 160, 162, 165–168, 205–206, 208 231–232, 234, 237, 241, 243–245, 249
optimization and, 215 partitions and, 223
Score vector, 215–219, 223, 227 scalar functions and, 218
Searle, S. R., 37, 81 score vector and, 215–219, 223, 227
Seemingly Unrelated Regressions Equation selection matrices and, 28
Model, 249–250 symmetry and, 223–224
Selection matrices test procedures and, 214, 219–222,
definition of, 28 248–254
duplication matrices and, 89 (see also twining matrices and, 76
Duplication matrices) vec operators and, 207, 214–215
econometrics and, 28–29 v operator and, 18
elimination matrices and, 89 (see also Submatrices, 255
Elimination matrices) column of, 6–7, 100–101
full information model and, 245, 250, cross-product of matrices and, 6–12,
252 15–16, 25–26, 171–177, 186, 193, 195,
identity matrix and, 28–29, 31 199, 203
Kronecker products and, 29 duplication matrices and, 116, 118, 121,
properties of, 28–33, 38, 41, 71 128
statistics and, 28 elimination matrices and, 90, 94–95,
theorems for, 30–33 98–105, 108
Sharpe ratio, 216, 218 generalized vecs/rvecs and, 18–24,
Square matrices, 17–18, 89, 98, 193 171–181, 184–185
Srivastava, U. K., 157–158 Kronecker products and, 1, 3–6, 9
Statistics, 141 limited information model and, 248
chain rule and, 224 matrix calculus and, 143–147, 157,
classical procedures for, 218–226 171–181, 184–186, 193, 195, 199, 203
commutation matrices and, 35 recursive derivatives and, 157
concentrated likelihood function and, row of, 6–7, 100
222, 232 transformation principles and, 143–147
covariance matrix and, 219, 221 zero-one matrices and, 32–33, 41, 43,
Cramer-Rao lower bound and, 218–219, 47–57, 67, 69–72, 77–81, 84–87
221, 229, 249, 251 Symmetry
derivatives and, 220, 223–224 classical statistics and, 223–224
duplication matrices and, 121, 132 commutation matrices and, 37, 72–73
elimination matrices and, 105 derivatives and, 210–213
full information likelihood (FIML) duplication matrices and, 112, 121–122,
estimator and, 245–248 124, 130–132
Hessian matrix and, 160, 215–219, elimination matrices and, 90, 98, 106,
223–225, 227–228 109
information matrix and, 215, 219, 221, full information model and, 243
225, 228–229 idempotent matrices and, 37, 73, 132
likelihood ratio test statistic and, 220–223 log-likelihood functions and, 213, 224
limited information maximum likelihood matrix calculus and, 140, 160, 166,
(LIML) estimators and, 230–241 205–213, 223–224
log-likelihood functions and, 213–215, vech operators and, 18
218, 222–224, 226, 229–232, 234, vec operators and, 210–213
237–238, 241–245, 249
maximum likelihood estimators and, 161, Test procedures
214, 219–221, 230–240, 245–249 Hausman, 249
multivariate normal distribution and, Langrangian multiplier, 220–221,
215, 219, 225–230 248–254
266 Index