
GENERALIZED VECTORIZATION, CROSS-PRODUCTS,

AND MATRIX CALCULUS

This book presents the reader with new operators and matrices that arise in
the area of matrix calculus. The properties of these mathematical concepts
are investigated and linked with zero-one matrices such as the commutation
matrix. Elimination and duplication matrices are revisited and partitioned into
submatrices. Studying the properties of these submatrices facilitates achieving
new results for the original matrices themselves. Different concepts of matrix
derivatives are presented and transformation principles linking these concepts
are obtained. One of these concepts is used to derive new matrix calculus
results, some involving the new operators and others the derivatives of the
operators themselves. The last chapter contains applications of matrix calculus,
including optimization, differentiation of log-likelihood functions, iterative
interpretations of maximum likelihood estimators, and a Lagrangian multiplier
test for endogeneity.

Darrell A. Turkington is a professor of economics at the University of Western Australia. His numerous publications include articles in leading international
journals such as the Journal of the American Statistical Association, the Interna-
tional Economic Review, and the Journal of Econometrics. He is also the author
of Instrumental Variables (Cambridge University Press, 1985, with Roger J.
Bowden), Matrix Calculus and Zero-One Matrices: Statistical and Econometric
Applications (Cambridge University Press, 2002), and Mathematical Tools for
Economics (2007). Professor Turkington received his PhD in theoretical econo-
metrics from the University of California, Berkeley.
Generalized Vectorization, Cross-Products,
and Matrix Calculus

DARRELL A. TURKINGTON
University of Western Australia
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9781107032002


© Darrell A. Turkington 2013

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2013

Printed in the United States of America

A catalog record for this publication is available from the British Library.

Library of Congress Cataloging in Publication data


Turkington, Darrell A., author.
Generalized vectorization, cross-products, and matrix calculus /
Darrell A. Turkington.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-03200-2 (hardback)
1. Matrices. 2. Vector analysis. I. Title.
QA188.T8645 2012
515′ .63–dc23 2012022017

ISBN 978-1-107-03200-2 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for
external or third-party Internet Web sites referred to in this publication and does not guarantee
that any content on such Web sites is, or will remain, accurate or appropriate.
Contents

Preface page ix
1 Mathematical Prerequisites 1
1.1 Introduction 1
1.2 Kronecker Products 2
1.3 Cross-Product of Matrices 6
1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs 13
1.4.1 Basic Operators 13
1.4.2 Vecs, Rvecs, and the Cross-Product Operator 15
1.4.3 Related Operators: Vech and v 17
1.4.4 Generalized Vecs and Generalized Rvecs 18
1.4.5 Generalized Vec Operators and the Cross-Product
Operator 25
2 Zero-One Matrices 28
2.1 Introduction 28
2.2 Selection Matrices and Permutation Matrices 28
2.3 The Elementary Matrix E_{ij}^{mn} 34
2.4 The Commutation Matrix 35
2.4.1 Commutation Matrices, Kronecker Products,
and Vecs 38
2.4.2 Commutation Matrices and Cross-Products 50
2.5 Generalized Vecs and Rvecs of the Commutation Matrix 57
2.5.1 Deriving Results for Generalized Vecs and Rvecs of
the Commutation Matrix 60
2.5.2 Generalized Vecs and Rvecs of the Commutation
Matrix and Cross-Products 68
2.5.3 K_{nG,G} versus Rvec_n K_{Gn} 70
2.5.4 The Matrix N_n 71


2.6 The Matrix U_{mn} 74


2.7 Twining Matrices 76
2.7.1 Introduction 76
2.7.2 Definition and Explicit Expressions for a Twining
Matrix 77
2.7.3 Twining Matrix T_{G,m,n} and the Commutation Matrix 79
2.7.4 Properties of the Twining Matrix T_{G,m,n} 80
2.7.5 Some Special Cases 82
2.7.6 Kronecker Products and Twining Matrices 83
2.7.7 Generalizations 84
2.7.8 Intertwining Columns of Matrices 86
3 Elimination and Duplication Matrices 89
3.1 Introduction 89
3.2 Elimination Matrices 89
3.2.1 The Elimination Matrix L_n 90
3.2.2 The Elimination Matrix L_n N_n 98
3.2.3 The Elimination Matrices L_n and L_n N_n 107
3.2.4 The Elimination Matrices L_{n∗} 110
3.3 Duplication Matrices 111
3.3.1 The Duplication Matrix D_n 111
3.3.2 The Elimination Matrix L_n N_n and the Duplication
Matrix D_n 125
3.3.3 The Duplication Matrix D_n 132
4 Matrix Calculus 134
4.1 Introduction 134
4.2 Different Concepts of a Derivative of a Matrix with Respect
to Another Matrix 135
4.3 The Commutation Matrix and the Concepts of Matrix
Derivatives 139
4.4 Relationships Between the Different Concepts 141
4.5 Transformation Principles Between the Concepts 143
4.5.1 Concept 1 and Concept 2 143
4.5.2 Concept 1 and Concept 3 144
4.5.3 Concept 2 and Concept 3 146
4.6 Transformation Principle One 147
4.7 Transformation Principle Two 152
4.8 Recursive Derivatives 157

5 New Matrix Calculus Results 164


5.1 Introduction 164
5.2 Concept of a Matrix Derivative Used 164
5.3 Some Basic Rules of Matrix Calculus 166
5.4 Matrix Calculus Results Involving Generalized Rvecs or
Cross-Products 168
5.5 Matrix Derivatives of Generalized Vecs and Rvecs 178
5.5.1 Introduction 178
5.5.2 Large X 178
5.5.3 Small X 183
5.6 Matrix Derivatives of Cross-Products 186
5.6.1 Basic Cross-Products 186
5.6.2 Cross-Products Involving X′ 190
5.6.3 Cross-Products Involving X^{-1} 193
5.6.4 The Cross-Product X τ_{Gmm} X 195
5.6.5 The Cross-Product X′ τ_{Gmm} X′ 198
5.6.6 The Cross-Product X^{-1} τ_{Gmm} X^{-1} 202
5.7 Results with Reference to ∂ vec Y/∂ vec X 205
5.7.1 Introduction 205
5.7.2 Simple Theorems Involving ∂ vec Y/∂ vec X 205
5.7.3 Theorems Concerning Derivatives Involving Vec A,
Vech A, and v 207
5.7.4 Theorems Concerning Derivatives Involving Vec X
Where X Is Symmetric 210
6 Applications 214
6.1 Introduction 214
6.2 Optimization Problems 215
6.3 Summary of Classical Statistical Procedures 218
6.3.1 The Score Vector, the Information Matrix, and the
Cramer-Rao Lower Bound 218
6.3.2 Maximum Likelihood Estimators and Test Procedures 219
6.3.3 Nuisance Parameters 221
6.4 Matrix Calculus and Classical Statistical Procedures 223
6.5 Sampling from a Multivariate Normal Distribution 226
6.6 The Limited Information Model 229
6.6.1 The Model and the Log-Likelihood Function 229
6.6.2 Iterative Interpretations of Limited Information
Maximum Likelihood Estimators 230
6.6.3 Comparison of the Three Iterative Procedures 240

6.7 The Full Information Model 242


6.7.1 The Model and the Log-Likelihood Function 242
6.7.2 The Full Information Maximum Likelihood
Estimator As an Iterative Instrumental Variable
Estimator 245
6.7.3 A Lagrangian Multiplier Test for Endogeneity 248

Symbols and Operators Used in this Book 255


References 257
Index 259
Preface

This book can be regarded as a sequel to my previous book, Matrix Calculus


and Zero-One Matrices: Statistical and Econometric Applications, which was
published by Cambridge University Press in 2002 (with a paperback edition
published in 2005). It largely concerns itself with the mathematics behind
matrix calculus. Several new matrix operators and matrices are introduced
in this book and their properties are studied. This forms the substance
of the first three chapters of the book. Chapter 4 may be regarded as an
application of some of these mathematical concepts. Chapter 5 gives new
matrix calculus results pertaining to the new operators. The last chapter
gives some applications of matrix calculus itself.
Aiming to have a self-contained book, I cannot avoid presenting some
known theorems and definitions along with some results from my previous
book.
The outline of the chapters in more detail follows: The first chapter intro-
duces a new matrix operator, which I call a cross-product of matrices. It
sums Kronecker products formed from two partitioned matrices. General-
ized vecs and rvecs are presented. These matrix operators are generalizations
of the vec and rvec operators, and come into their own when we are dealing
with partitioned matrices.
Chapter 2 deals with well-known zero-one matrices such as selection
matrices, permutation matrices, elementary matrices, and commutation
matrices. A number of theorems are given involving commutation matrices
and cross-products of matrices. This chapter also looks at zero-one matri-
ces that the reader may not be as familiar with, namely generalized vecs
and rvecs of the commutation matrix. These concepts were introduced in
my previous book. The chapter builds on this work presenting many new
theorems about generalized vecs and rvecs of the commutation matrix, and
methods for finding results for these matrices from known results of the

commutation matrix itself. This chapter introduces two new matrices whose
properties are investigated. One is similar to the commutation matrix in
that its submatrices are certain elementary matrices. The second, which I call a
‘twining matrix’, is a zero-one matrix that intertwines rows or columns of a
given set of matrices. Its relationship to the commutation matrix is clearly
shown.
Chapter 3 studies in some detail well-known matrices associated with
matrix calculus, namely elimination and duplication matrices. The
approach taken is to partition these matrices into interesting submatrices
and study the properties of these submatrices. This facilitates the inves-
tigation as to how these peculiar matrices interact with other matrices,
particularly Kronecker products. It also involves the introduction of new
matrix operators whose properties in turn are studied.
Chapter 4 looks at four concepts of the derivative of a matrix with respect
to another matrix that exist in the literature and develops transformation
principles that allow an easy movement from a result obtained using one
of the concepts to the corresponding results for the others. In doing so,
extensive use is made of results obtained in the first two chapters.
Chapter 5 derives new matrix calculus results with reference to general-
ized vecs and cross-products of matrices, and shows how those results can
be expanded into appropriate submatrices. The last section of this chapter
gives some simple, but powerful, theorems involving the concept of the
matrix derivative used in this book.
The final chapter presents applications of matrix calculus itself. It demon-
strates how matrix calculus can be used to efficiently solve complicated
optimization problems, but it is largely concerned with the use of matrix
calculus in statistics and econometrics. It explains how matrix differentia-
tion can be used in differentiating a log-likelihood function, involving as it
usually does a symmetric covariance matrix, in obtaining the score vector
and finally in obtaining the information matrix. This work calls on the
theorems of the last section of Chapter 5.
The second part of Chapter 6 uses matrix calculus to obtain iterative
interpretations of maximum likelihood estimators in simultaneous equa-
tion models in terms of econometric estimators. It looks at the computa-
tional convergence of the different interpretations. Finally, a new Lagrangian
multiplier test statistic is derived for testing for endogeneity in such models.
Two institutions should be mentioned in the preface: First, my home
university, the University of Western Australia, for allowing me time off
from teaching to concentrate on the manuscript; second, Nuffield College

Oxford. As an academic visitor there, I first conceived the notion of this


book. During a second visit, I put the finishing touches to it.
Several individuals must also be thanked. Anna Wiechecki and Rebecca
Doran-Wu for their skill at typing this work; Holly Elsholz for proofreading
the manuscript; and finally, my family, Sonia, Joshua, and Nikola for their
support.
ONE

Mathematical Prerequisites

1.1 Introduction
This chapter considers elements of matrix algebra, knowledge of which is
essential for discussions throughout this book. This body of mathematics
centres around the concepts of Kronecker products and vecs of a matrix.
From the elements of an m×n matrix A = {a_{ij}} and a p×q matrix B = {b_{ij}},
the Kronecker product forms a new mp×nq matrix. The vec operator forms
a column vector from the elements of a given matrix by stacking its columns
one underneath the other. This chapter discusses several new operators that
are derived from these basic operators.
The operator, which I call the cross-product operator, takes the sum of
Kronecker products formed from submatrices of two given matrices. The
rvec operator forms a row vector by stacking the rows of a given matrix
alongside each other. The generalized vec operator forms a new matrix
from a given matrix by stacking a certain number of its columns, taken as a
block, under each other. The generalized rvec operator forms a new matrix
by stacking a certain number of rows, again taken as a block, alongside each
other.
Although it is well known that Kronecker products and vecs are intimately
connected, this connection also holds for rvec and generalized operators.
The cross-product operator, as far as I know, is being introduced
by this book. As such, I present several theorems designed to investigate
the properties of this operator. This book’s approach is to list, without
proof, well-known properties of the mathematical operator or concept in
hand. However, I give a proof whenever I present the properties of a new
operator or concept, a property in a different light, or something new about a
concept.


1.2 Kronecker Products


Let A = {a_{ij}} be an m×n matrix and B be a p×q matrix. The mp×nq matrix given by
\[
\begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}
\]
is called the Kronecker product of A and B, denoted by A ⊗ B.
Well-known properties of Kronecker products are as follows:
A ⊗ (B ⊗ C ) = (A ⊗ B) ⊗ C = A ⊗ B ⊗ C;
(A + B) ⊗ (C + D) = A ⊗ C + A ⊗ D + B ⊗ C + B ⊗ D; and
(A ⊗ B)(C ⊗ D) = AC ⊗ BD provided AC and BD exist. (1.1)
The transpose of a Kronecker product is the Kronecker product of transposes
(A ⊗ B) ′ = A ′ ⊗ B ′ .
If A and B are non-singular, the inverse of a Kronecker product is the
Kronecker product of the inverses
(A ⊗ B)−1 = A−1 ⊗ B−1 .
If A is an n×n matrix and B is a p×p matrix, then
\[
\mathrm{tr}(A \otimes B) = \mathrm{tr}\,A \cdot \mathrm{tr}\,B,
\]
while the determinant of the Kronecker product is given by
\[
|A \otimes B| = |A|^{p}\,|B|^{n}.
\]
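These rules are easy to confirm numerically. The following is a minimal NumPy sketch (the code and variable names are illustrative additions, not the book's):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])          # 2 x 2
B = np.array([[0., 1., 2.], [3., 4., 5.]])  # 2 x 3
C = np.random.rand(2, 4)
D = np.random.rand(3, 5)

# Mixed-product rule: (A ⊗ B)(C ⊗ D) = AC ⊗ BD
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Transpose rule: (A ⊗ B)' = A' ⊗ B'
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# Trace and determinant rules, with A n x n (n = 2) and S p x p (p = 3)
S = np.random.rand(3, 3)
assert np.isclose(np.trace(np.kron(A, S)), np.trace(A) * np.trace(S))
assert np.isclose(np.linalg.det(np.kron(A, S)),
                  np.linalg.det(A) ** 3 * np.linalg.det(S) ** 2)
```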
Notice that, in general, this operator does not obey the commutative law; that is, A ⊗ B ≠ B ⊗ A. One exception to this rule is that if a and b are column
vectors, not necessarily of the same order, then
a ′ ⊗ b = b ⊗ a ′ = ba ′ . (1.2)
This exception allows us to write A ⊗ b in an interesting way, where A is an
m×n matrix and b is a column vector. Partition A into its rows so
\[
A = \begin{pmatrix} a^{1\prime} \\ \vdots \\ a^{m\prime} \end{pmatrix}
\]
where the notation we use for the ith row of a matrix throughout this book is a^{i′}. Thus, from our definition of a Kronecker product,
\[
A \otimes b = \begin{pmatrix} a^{1\prime} \otimes b \\ \vdots \\ a^{m\prime} \otimes b \end{pmatrix}.
\]
By using Equation 1.2, we can write
\[
A \otimes b = \begin{pmatrix} b \otimes a^{1\prime} \\ \vdots \\ b \otimes a^{m\prime} \end{pmatrix}.
\]

As far as partitioned matrices are concerned, suppose we partition A into


submatrices as follows:
\[
A = \begin{pmatrix} A_{11} & \ldots & A_{1K} \\ \vdots & & \vdots \\ A_{L1} & \ldots & A_{LK} \end{pmatrix}.
\]
Therefore, from our definition it is clear that
\[
A \otimes B = \begin{pmatrix} A_{11} \otimes B & \ldots & A_{1K} \otimes B \\ \vdots & & \vdots \\ A_{L1} \otimes B & \ldots & A_{LK} \otimes B \end{pmatrix}. \tag{1.3}
\]
Likewise, suppose we partition B into an arbitrary number of submatrices, say,
\[
B = \begin{pmatrix} B_{11} & \ldots & B_{1r} \\ \vdots & & \vdots \\ B_{s1} & \ldots & B_{sr} \end{pmatrix}.
\]
Then, in general,
\[
A \otimes B \neq \begin{pmatrix} A \otimes B_{11} & \cdots & A \otimes B_{1r} \\ \vdots & & \vdots \\ A \otimes B_{s1} & \cdots & A \otimes B_{sr} \end{pmatrix}.
\]

One exception to this rule can be formulated as follows: Suppose B is a p×q


matrix and we write B = (B1 . . . Br ), where each submatrix of B has p rows.
Furthermore, let a be any column vector, say m×1. Then,
\[
a \otimes B = \begin{pmatrix} a_1 (B_1 \ldots B_r) \\ \vdots \\ a_m (B_1 \ldots B_r) \end{pmatrix}
= \begin{pmatrix} a_1 B_1 & \ldots & a_1 B_r \\ \vdots & & \vdots \\ a_m B_1 & \ldots & a_m B_r \end{pmatrix}
= (a \otimes B_1 \ \ldots \ a \otimes B_r). \tag{1.4}
\]

Staying with the same partitioning of B, consider A, an m×n matrix parti-


tioned into its columns A = (a1 . . . an ). Therefore, using Equations 1.3 and
1.4, it is clear that

A ⊗ B = (a1 ⊗ B1 . . . a1 ⊗ Br . . . an ⊗ B1 . . . an ⊗ Br ).

If, for example, B is partitioned into its columns, then B = (b1 . . . bq ), so


we can write

A ⊗ B = (a1 ⊗ b1 . . . a1 ⊗ bq . . . an ⊗ b1 . . . an ⊗ bq ). (1.5)

Another exception to the rule is a ′ ⊗ B, where we now partition B as B =


(B_1′ . . . B_s′)′ and each submatrix has q columns. Therefore,
\[
a' \otimes B = \begin{pmatrix} a' \otimes B_1 \\ \vdots \\ a' \otimes B_s \end{pmatrix}.
\]
If A is m×n, then
\[
A \otimes B = \begin{pmatrix} a^{1\prime} \otimes B_1 \\ \vdots \\ a^{1\prime} \otimes B_s \\ \vdots \\ a^{m\prime} \otimes B_1 \\ \vdots \\ a^{m\prime} \otimes B_s \end{pmatrix}
\]
where, as before, a^{i′} refers to the ith row of A, i = 1, . . . , m. If B is partitioned into its rows, then
\[
A \otimes B = \begin{pmatrix} a^{1\prime} \otimes b^{1\prime} \\ \vdots \\ a^{1\prime} \otimes b^{p\prime} \\ \vdots \\ a^{m\prime} \otimes b^{1\prime} \\ \vdots \\ a^{m\prime} \otimes b^{p\prime} \end{pmatrix} \tag{1.6}
\]
where b^{j′} refers to the jth row of B, j = 1, . . . , p.
Let x be a column vector and A a matrix. As a consequence of these results, the ith row of x′ ⊗ A is x′ ⊗ a^{i′}, where a^{i′} is the ith row of A, and the jth column of x ⊗ A is x ⊗ a_j, where a_j is the jth column of A.
Another useful property for Kronecker products is this: Suppose A and B
are m×n and p×q matrices respectively, and x is any column vector. Then,
A(In ⊗ x ′ ) = (A ⊗ 1)(In ⊗ x ′ ) = A ⊗ x ′
(x ⊗ Ip )B = (x ⊗ Ip )(1 ⊗ B) = x ⊗ B,
where In is the n×n identity matrix.
We can use these results to prove that for a, an n×1 column vector, and
b a p×1 column vector,
(a ′ ⊗ IG )(b ′ ⊗ InG ) = b ′ ⊗ a ′ ⊗ IG .
Clearly,
(a ′ ⊗ IG )(b ′ ⊗ InG ) = (a ′ ⊗ IG )(b ′ ⊗ In ⊗ IG ) = a ′ (b ′ ⊗ In ) ⊗ IG
= (1 ⊗ a ′ )(b ′ ⊗ In ) ⊗ IG = b ′ ⊗ a ′ ⊗ IG .
Another notation used throughout this book is the following: I represent the ith column of the n×n identity matrix I_n by e_i^n and the jth row of this identity matrix by e_j^{n′}. Using this notation, a result that we find useful in our future work is given by our first theorem.

Theorem 1.1 Consider the n×m matrix given by
\[
\bigl( O_{n\times(p-1)} \;\; e_i^n \;\; O_{n\times(m-p)} \bigr)
\]
for i = 1, . . . , n. Then,
\[
I_n \otimes e_p^{m\prime} = \bigl( O \;\, e_1^n \;\, O \;\; \ldots \;\; O \;\, e_n^n \;\, O \bigr).
\]

Proof: We have
\[
I_n \otimes e_p^{m\prime} = \bigl( e_1^n \otimes e_p^{m\prime} \;\; \ldots \;\; e_n^n \otimes e_p^{m\prime} \bigr)
= \bigl( e_p^{m\prime} \otimes e_1^n \;\; \ldots \;\; e_p^{m\prime} \otimes e_n^n \bigr)
= \bigl( O \;\, e_1^n \;\, O \;\; \ldots \;\; O \;\, e_n^n \;\, O \bigr). \qquad\blacksquare
\]


1.3 Cross-Product of Matrices


Much of this book’s discussions involve partitioned matrices. A matrix
operator that I find very useful when working with such matrices is the
cross-product operator. This section introduces this operator and presents
several theorems designed to portray its properties.
Let A be an mG× p matrix and B be an nG×q matrix. Partition these
matrices as follows:
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 \\ \vdots \\ B_G \end{pmatrix}
\]
where each submatrix A_i of A is m×p for i = 1, . . . , G and each submatrix B_j of B is n×q for j = 1, . . . , G. The cross-product of A and B, denoted by A τ_{Gmn} B, is the mn×pq matrix given by
\[
A \,\tau_{Gmn}\, B = A_1 \otimes B_1 + \cdots + A_G \otimes B_G.
\]
Notice the first subscript attached to the operator refers to the number of
submatrices in the partitions of the two matrices, the second subscript refers
to the number of rows in each submatrix of A, and the third subscript refers
to the number of rows in each of the submatrices of B.
A similar operator can be defined when the two matrices in question are partitioned into a row of submatrices, instead of a column of submatrices as previously discussed. Let C be a p×mG matrix and D be a q×nG matrix, and partition these matrices as follows:
\[
C = (C_1 \ \ldots \ C_G), \qquad D = (D_1 \ \ldots \ D_G),
\]
where each submatrix C_i of C is p×m for i = 1, . . . , G and each submatrix D_j of D is q×n for j = 1, . . . , G. Then, this cross-product is defined as
\[
C \,\bar{\tau}_{Gmn}\, D = C_1 \otimes D_1 + \cdots + C_G \otimes D_G.
\]
The operator τ is the relevant operator to use when matrices are partitioned into a ‘column’ of submatrices, whereas τ̄ is the appropriate operator to use when matrices are partitioned into a ‘row’ of submatrices. The two operators are intimately connected, as
\[
(A \,\tau_{Gmn}\, B)' = A_1' \otimes B_1' + \cdots + A_G' \otimes B_G' = A' \,\bar{\tau}_{Gmn}\, B'.
\]
In this book, theorems are proved for the τ operator, and the equivalent results for the τ̄ operator can be obtained by taking transposes.
Sometimes, we have occasion to take the cross-products of very large
matrices. For example, suppose A is mrG× p and B is nG×q as previously
shown. Thus, if we partition A as
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix},
\]
each of the submatrices in this partition is mr×p. To avoid confusion, we signify the cross-product between A and B, namely A_1 ⊗ B_1 + ··· + A_G ⊗ B_G, as A τ_{G,mr,n} B, and the cross-product between B and A, B_1 ⊗ A_1 + ··· + B_G ⊗ A_G, as B τ_{G,n,mr} A.
Notice that in dealing with two matrices A and B, where A is mG× p
and B is mG×q, then it is possible to take two cross-products AτGmm B or
AτmGG B, but, of course, these are not the same. However, the following
theorem shows that in some cases the two cross-products are related.

Theorem 1.2 Let A be an mG×p matrix, B be an ns×q matrix, and D be a G×s matrix. Then,
\[
B \,\tau_{snm}\, (D' \otimes I_m)A = (D \otimes I_n)B \,\tau_{Gnm}\, A.
\]

Proof: Write
\[
D = (d_1 \ \ldots \ d_s) = \begin{pmatrix} d^{1\prime} \\ \vdots \\ d^{G\prime} \end{pmatrix}.
\]
Then,
\[
(D \otimes I_n)B = \begin{pmatrix} d^{1\prime} \otimes I_n \\ \vdots \\ d^{G\prime} \otimes I_n \end{pmatrix} B = \begin{pmatrix} (d^{1\prime} \otimes I_n)B \\ \vdots \\ (d^{G\prime} \otimes I_n)B \end{pmatrix}.
\]
Partition A as
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}
\]
where each submatrix A_i is m×p. Then,
\[
(D \otimes I_n)B \,\tau_{Gnm}\, A = (d^{1\prime} \otimes I_n)B \otimes A_1 + \cdots + (d^{G\prime} \otimes I_n)B \otimes A_G.
\]
Now
\[
(D' \otimes I_m)A = \begin{pmatrix} d_1' \otimes I_m \\ \vdots \\ d_s' \otimes I_m \end{pmatrix} A = \begin{pmatrix} (d_1' \otimes I_m)A \\ \vdots \\ (d_s' \otimes I_m)A \end{pmatrix}.
\]
But
\[
(d_j' \otimes I_m)A = (d_{1j} I_m \ \ldots \ d_{Gj} I_m)\begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix} = d_{1j}A_1 + \cdots + d_{Gj}A_G,
\]
so when we partition B as
\[
B = \begin{pmatrix} B_1 \\ \vdots \\ B_s \end{pmatrix}
\]
where each submatrix B_i is n×q, we have
\[
B \,\tau_{snm}\, (D' \otimes I_m)A = B_1 \otimes (d_{11}A_1 + \cdots + d_{G1}A_G) + \cdots + B_s \otimes (d_{1s}A_1 + \cdots + d_{Gs}A_G)
\]
\[
= B_1 \otimes d_{11}A_1 + \cdots + B_s \otimes d_{1s}A_1 + \cdots + B_1 \otimes d_{G1}A_G + \cdots + B_s \otimes d_{Gs}A_G
\]
\[
= (d_{11}B_1 + \cdots + d_{1s}B_s) \otimes A_1 + \cdots + (d_{G1}B_1 + \cdots + d_{Gs}B_s) \otimes A_G
= (d^{1\prime} \otimes I_n)B \otimes A_1 + \cdots + (d^{G\prime} \otimes I_n)B \otimes A_G. \qquad\blacksquare
\]

In the following theorems, unless specified, A is mG× p and B is nG×q,


and
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 \\ \vdots \\ B_G \end{pmatrix} \tag{1.7}
\]

where each submatrix Ai of A is m× p and each submatrix B j of B is n×q,


for i = 1, . . . , G and j = 1, . . . , G. The proofs of these theorems are derived
using the properties of Kronecker products.

Theorem 1.3 Partition A differently as

A = (C D . . . F )

where each submatrix C, D, . . . , F has mG rows. Then,

AτGmn B = (CτGmn B DτGmn B . . . F τGmn B).

Proof: From our definition,

AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG .

Writing Ai = (Ci Di . . . Fi ) for i = 1, . . . , G, we have from the properties of


Kronecker products that

Ai ⊗ Bi = (Ci ⊗ Bi Di ⊗ Bi . . . Fi ⊗ Bi ).

The result follows immediately. 

Theorem 1.4 Let A and B be mG× p matrices, and let C and D be nG×q
matrices. Then,

(A + B)τGmnC = AτGmnC + BτGmnC

and

AτGmn (C + D) = AτGmnC + AτGmn D.

Proof: Clearly,

(A + B)τGmnC = (A1 + B1 ) ⊗ C1 + · · · + (AG + BG ) ⊗ CG


= A1 ⊗ C1 + · · · + AG ⊗ CG + B1 ⊗ C1 + · · · + BG ⊗ CG
= AτGmnC + BτGmnC.

The second result is proved similarly. 

Theorem 1.5 Let A and B be specified in Equation 1.7, let C, D, E, F be


p×r, q×s, r ×m, and s×n matrices, respectively. Then,

\[
(A \,\tau_{Gmn}\, B)(C \otimes D) = AC \,\tau_{Gmn}\, BD
\]
and
\[
(E \otimes F)(A \,\tau_{Gmn}\, B) = (I_G \otimes E)A \,\tau_{Grs}\, (I_G \otimes F)B.
\]

Proof: Clearly,
\[
(A \,\tau_{Gmn}\, B)(C \otimes D) = (A_1 \otimes B_1 + \cdots + A_G \otimes B_G)(C \otimes D)
= A_1 C \otimes B_1 D + \cdots + A_G C \otimes B_G D
\]
\[
= \begin{pmatrix} A_1 C \\ \vdots \\ A_G C \end{pmatrix} \tau_{Gmn} \begin{pmatrix} B_1 D \\ \vdots \\ B_G D \end{pmatrix} = AC \,\tau_{Gmn}\, BD.
\]
Likewise,
\[
(E \otimes F)(A \,\tau_{Gmn}\, B) = (E \otimes F)(A_1 \otimes B_1 + \cdots + A_G \otimes B_G)
= EA_1 \otimes FB_1 + \cdots + EA_G \otimes FB_G
\]
\[
= \begin{pmatrix} EA_1 \\ \vdots \\ EA_G \end{pmatrix} \tau_{Grs} \begin{pmatrix} FB_1 \\ \vdots \\ FB_G \end{pmatrix}
= (I_G \otimes E)A \,\tau_{Grs}\, (I_G \otimes F)B. \qquad\blacksquare
\]

A standard notation that is regularly used in this book is

Ai . = i-th row of the matrix A


A.j = j-th column of the matrix A.

For the next theorem, it is advantageous to introduce a new notation that we


will find useful for our work throughout most chapters. We are considering A, an mG×p matrix, which we have partitioned as
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}
\]
where each submatrix A_i in this partitioning is m×p. We denote by A^{(j)} the G×p matrix given by
\[
A^{(j)} = \begin{pmatrix} (A_1)_{j.} \\ \vdots \\ (A_G)_{j.} \end{pmatrix}. \tag{1.8}
\]
That is, to form A^{(j)} we stack the jth rows of the submatrices under each other.
Notice that if C is an r×G matrix and D is an s×m matrix, then from Equation 1.6
\[
C \otimes D = \begin{pmatrix} c^{1\prime} \otimes d^{1\prime} \\ \vdots \\ c^{1\prime} \otimes d^{s\prime} \\ \vdots \\ c^{r\prime} \otimes d^{1\prime} \\ \vdots \\ c^{r\prime} \otimes d^{s\prime} \end{pmatrix}
\]
so
\[
[(C \otimes D)A]^{(j)} = \begin{pmatrix} c^{1\prime} \otimes d^{j\prime} \\ \vdots \\ c^{r\prime} \otimes d^{j\prime} \end{pmatrix} A = (C \otimes d^{j\prime})A. \tag{1.9}
\]
A special case of interest to us is when D is an identity matrix, in which case
\[
[(C \otimes I_m)A]^{(j)} = \bigl(C \otimes e_j^{m\prime}\bigr)A = C\bigl(I_G \otimes e_j^{m\prime}\bigr)A = C A^{(j)}. \tag{1.10}
\]
(C ⊗ Im )A

Using this notation, we have the following theorem, which demonstrates


that we can write AτGmn B in terms of a vector of τG1n cross-products.

Theorem 1.6 For A and B as previously specified,
\[
A \,\tau_{Gmn}\, B = \begin{pmatrix} A^{(1)} \,\tau_{G1n}\, B \\ \vdots \\ A^{(m)} \,\tau_{G1n}\, B \end{pmatrix}.
\]

Proof: Using the properties of Kronecker products, we write
\[
A \,\tau_{Gmn}\, B = A_1 \otimes B_1 + \cdots + A_G \otimes B_G
= \begin{pmatrix} (A_1)_{1.} \otimes B_1 + \cdots + (A_G)_{1.} \otimes B_G \\ \vdots \\ (A_1)_{m.} \otimes B_1 + \cdots + (A_G)_{m.} \otimes B_G \end{pmatrix}
= \begin{pmatrix} A^{(1)} \,\tau_{G1n}\, B \\ \vdots \\ A^{(m)} \,\tau_{G1n}\, B \end{pmatrix}. \qquad\blacksquare
\]

Theorem 1.7 Let a be an n×1 vector. Then,

aτn1G B = (a ′ ⊗ IG )B.

Proof: Clearly,

aτn1G B = a1 ⊗ B1 + · · · + an ⊗ Bn

where now we partition B as B = (B1′ . . . Bn′ ) ′ each of the submatrices being


G×q.
But,

a1 ⊗ B1 = a1 B1

so

a τn1G B = a1 B1 + · · · + an Bn = (a ′ ⊗ IG )B. 

A special case of this theorem is when G = 1 so B is n×q. Then,

aτn11 B = a ′ B = Bτn11 a.

Theorem 1.8 Let A, B, and C be m× p, mG×q, and r ×G matrices, respec-


tively. Then,

C(Aτm1G B) = Aτm1r (Im ⊗ C )B.

Proof: If we partition B as
\[
B = \begin{pmatrix} B_1 \\ \vdots \\ B_m \end{pmatrix}
\]
where each submatrix in this partitioning is G×q, then
\[
C(A \,\tau_{m1G}\, B) = C(A_{1.} \otimes B_1 + \cdots + A_{m.} \otimes B_m)
= A_{1.} \otimes CB_1 + \cdots + A_{m.} \otimes CB_m = A \,\tau_{m1r}\, (I_m \otimes C)B. \qquad\blacksquare
\]

The cross-product operator, like the Kronecker product, is intimately con-


nected with the vec operator. In the next section, we look at the vec operator
that works with columns of a given matrix, stacking them underneath each
other. The rvec operator works with rows of a matrix, stacking them along-
side each other. The generalized vec and rvec operators are generalizations
of the basic operators, which are particularly useful when we are dealing
with partitioned matrices. Theorems involving these operators and the
ing with partitioned matrices. Theorems involving these operators and the
cross-product operator are presented in the following sections.

1.4 Vecs, Rvecs, Generalized Vecs, and Rvecs

1.4.1 Basic Operators



Let A be an m×n matrix with a_i its ith column and a^{j′} its jth row. Then, vec A is the mn×1 vector given by
\[
\mathrm{vec}\,A = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}.
\]
That is, the vec operator transforms A into an mn×1 column vector by stacking the columns of A one underneath the other. Similarly, rvec A is the 1×mn row vector
\[
\mathrm{rvec}\,A = \bigl(a^{1\prime} \ \ldots \ a^{m\prime}\bigr).
\]
That is, the rvec operator transforms A into a 1×mn row vector by stacking
the rows of A alongside each other.
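In NumPy these two operators amount to column-major and row-major reshapes; the sketch below (my own helper names, not the book's) is reused in later examples.

```python
import numpy as np

def vec(A):
    """Stack the columns of A underneath each other (mn x 1)."""
    return A.reshape(-1, 1, order='F')

def rvec(A):
    """Stack the rows of A alongside each other (1 x mn)."""
    return A.reshape(1, -1, order='C')

A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(vec(A).ravel())   # [1 4 2 5 3 6]  -- columns stacked
print(rvec(A).ravel())  # [1 2 3 4 5 6]  -- rows stacked
```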
Both operators are intimately connected, as
\[
(\mathrm{vec}\,A)' = (a_1' \ \ldots \ a_n') = \mathrm{rvec}\,A'
\]
and
\[
\mathrm{vec}\,A' = \begin{pmatrix} a^1 \\ \vdots \\ a^m \end{pmatrix} = (\mathrm{rvec}\,A)'.
\]
These basic relationships mean that results for one of the operators can be
readily obtained from results for the other operator.
Both operators are connected with the Kronecker product operator.
From
ab ′ = b ′ ⊗ a = a ⊗ b ′ ,
a property noted in Section 1.2, it is clear that the jth column of ab ′ is b j a
and the ith row of ab ′ is ai b ′ , so
vec ab ′ = vec(b ′ ⊗ a) = b ⊗ a (1.11)

and
rvec ab ′ = rvec(a ⊗ b ′ ) = a ′ ⊗ b ′ .
More generally, if A, B, and C are three matrices such that the product ABC
is defined, then
vec ABC = (C ′ ⊗ A)vec B
and
rvec ABC = rvec B(A ′ ⊗ C ).
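These two identities are simple to check with the vec/rvec helpers sketched earlier (again my own code, not the book's):

```python
import numpy as np

def vec(X):
    return X.reshape(-1, 1, order='F')

def rvec(X):
    return X.reshape(1, -1, order='C')

A, B, C = np.random.rand(2, 3), np.random.rand(3, 4), np.random.rand(4, 5)

# vec ABC = (C' ⊗ A) vec B
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

# rvec ABC = rvec B (A' ⊗ C)
assert np.allclose(rvec(A @ B @ C), rvec(B) @ np.kron(A.T, C))
```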
Often, we will have occasion to take the vec of a partitioned matrix. Let A be an m×np matrix and partition A so that A = (A_1 . . . A_p), where each submatrix is m×n. Then, it is clear that
\[
\mathrm{vec}\,A = \begin{pmatrix} \mathrm{vec}\,A_1 \\ \vdots \\ \mathrm{vec}\,A_p \end{pmatrix}.
\]
An application of this result follows. Suppose B is any n×q matrix and consider
\[
A(I_p \otimes B) = (A_1 B \ \ldots \ A_p B).
\]
Then,
\[
\mathrm{vec}\,A(I_p \otimes B) = \begin{pmatrix} \mathrm{vec}\,A_1 B \\ \vdots \\ \mathrm{vec}\,A_p B \end{pmatrix} = \begin{pmatrix} I_q \otimes A_1 \\ \vdots \\ I_q \otimes A_p \end{pmatrix} \mathrm{vec}\,B.
\]
If A is an m×n matrix and x is any vector, then
\[
\mathrm{vec}(A \otimes x) = \mathrm{vec}(a_1 \otimes x \ \ldots \ a_n \otimes x) = \begin{pmatrix} a_1 \otimes x \\ \vdots \\ a_n \otimes x \end{pmatrix} = \mathrm{vec}\,A \otimes x
\]
and
\[
\mathrm{vec}(x' \otimes A) = \mathrm{vec}(x_1 A \ \ldots \ x_n A) = x \otimes \mathrm{vec}\,A. \tag{1.12}
\]
Moreover,
\[
\mathrm{vec}(A \otimes x') = \mathrm{vec}(a_1 \otimes x' \ \ldots \ a_n \otimes x')
= \begin{pmatrix} \mathrm{vec}(a_1 \otimes x') \\ \vdots \\ \mathrm{vec}(a_n \otimes x') \end{pmatrix}
= \begin{pmatrix} x \otimes a_1 \\ \vdots \\ x \otimes a_n \end{pmatrix}
= \mathrm{vec}(x \otimes a_1 \ \ldots \ x \otimes a_n)
\]
\[
= \mathrm{vec}\bigl(x \otimes (a_1 \ldots a_n)\bigr) = \mathrm{vec}(x \otimes A), \tag{1.13}
\]

where in our analysis we have used Equations 1.11 and 1.5. Using Equations
1.12 and 1.13, we have that if x and y are any vectors

vec(y ′ ⊗ x ′ ) = vec(x ⊗ y ′ ) = vec(y ′ ⊗ x) = y ⊗ x.

Finally, if A is n×n and x is n×1, then

\[
\mathrm{vec}\,x'A = \mathrm{vec}(x'a_1 \ \ldots \ x'a_n) = \begin{pmatrix} a_1'x \\ \vdots \\ a_n'x \end{pmatrix} = A'x.
\]

By taking transposes and using the fact that rvec A ′ = (vec A) ′ , we get the
corresponding results for the rvec operator.

1.4.2 Vecs, Rvecs, and the Cross-Product Operator


Just as Kronecker products are intimately connected with vecs and rvecs, so
are cross-products. The following theorem gives this basic connection.

Theorem 1.9 Let A be n× p and B be nG×q. Then,

Aτn1G B = ((vec A) ′ ⊗ IG )(Ip ⊗ B).

Proof: Write A = (a_1 . . . a_p), where a_j is the jth column of A. Then,
\[
((\mathrm{vec}\,A)' \otimes I_G)(I_p \otimes B) = (a_1' \otimes I_G \ \ldots \ a_p' \otimes I_G)\begin{pmatrix} B & & O \\ & \ddots & \\ O & & B \end{pmatrix}
= \bigl((a_1' \otimes I_G)B \ \ldots \ (a_p' \otimes I_G)B\bigr).
\]
Partition B such that
\[
B = \begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix}
\]
where each submatrix in this partition is G×q. Then,
\[
(a_j' \otimes I_G)B = (a_{1j}I_G \ \ldots \ a_{nj}I_G)\begin{pmatrix} B_1 \\ \vdots \\ B_n \end{pmatrix} = a_{1j}B_1 + \cdots + a_{nj}B_n,
\]
so
\[
((\mathrm{vec}\,A)' \otimes I_G)(I_p \otimes B) = \bigl(a_{11}B_1 + \cdots + a_{n1}B_n \ \ \ldots \ \ a_{1p}B_1 + \cdots + a_{np}B_n\bigr)
\]
\[
= (a_{11}B_1 \ \ldots \ a_{1p}B_1) + \cdots + (a_{n1}B_n \ \ldots \ a_{np}B_n)
= a^{1\prime} \otimes B_1 + \cdots + a^{n\prime} \otimes B_n = A \,\tau_{n1G}\, B. \qquad\blacksquare
\]

A special case of this theorem is when B is n×q so G = 1. We have then that


Aτn11 B = (vec A) ′ (Ip ⊗ B) = ((Ip ⊗ B ′ )vec A) ′
= (vec B ′ A) ′ = (rvec In )(A ⊗ B).
In a similar vein, if C is r ×m and D is s×m, then
vec C τmrs vec D = vec DC ′ = (C ⊗ D)vec Im .
Another theorem involving cross-products and rvecs that will be useful in
our future work is the following:

Theorem 1.10 Let A and B be m×n and p×q matrices, respectively. Then,
Im τm1p (A ⊗ B) = rvec A ⊗ B.

Proof: From our definition of cross-products given in Section 1.3,
\[
I_m \,\tau_{m1p}\, (A \otimes B) = e_1^{m\prime} \otimes (a^{1\prime} \otimes B) + \cdots + e_m^{m\prime} \otimes (a^{m\prime} \otimes B)
\]
\[
= \bigl(a^{1\prime} \otimes B \ \ O \ \ldots \ O\bigr) + \cdots + \bigl(O \ \ldots \ O \ \ a^{m\prime} \otimes B\bigr)
= \bigl(a^{1\prime} \otimes B \ \ldots \ a^{m\prime} \otimes B\bigr) = \bigl(a^{1\prime} \ \ldots \ a^{m\prime}\bigr) \otimes B = \mathrm{rvec}\,A \otimes B. \qquad\blacksquare
\]

Cross-products come into their own when we are dealing with partitioned
matrices. Often with a partitioned matrix, we want to stack submatrices
in the partition underneath each other or alongside each other. Operators
that do this are called generalized vec or generalized rvec operators. Section
1.4.4 looks at these operators in detail and later we see that there are several
theorems linking cross-products with those generalized operators.
To finish this section, we briefly look at expressing traces in terms of our
vec and rvec operators. It is easily shown that
trAB = (vec A ′ ) ′ vec B = rvec A vec B.
When it comes to the trace of a product of three matrices, we can write
trABC = rvec A vec BC = rvec A(I ⊗ B)vec C

for an appropriate identity matrix I. Other expressions for trABC in terms


of vecs and rvecs can be similarly achieved using the fact that
trABC = trCAB = trBCA.

1.4.3 Related Operators: Vech and v


In taking the vec of a square matrix A, we form a column vector by using all
the elements of A. The vech and the v operators form column vectors from
select elements of A.
Let A be the n×n matrix
\[
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.
\]

Then, vech A is the ½n(n + 1)×1 vector
\[
\mathrm{vech}\,A = \begin{pmatrix} a_{11} \\ \vdots \\ a_{n1} \\ a_{22} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{nn} \end{pmatrix},
\]
that is, we form vechA by stacking the elements of A on and below the main
diagonal, one underneath the other.
The vector v(A) is the ½n(n − 1)×1 vector given by
\[
v(A) = \begin{pmatrix} a_{21} \\ \vdots \\ a_{n1} \\ a_{32} \\ \vdots \\ a_{n2} \\ \vdots \\ a_{n\,n-1} \end{pmatrix},
\]
that is, we form v(A) by stacking the elements of A below the main diagonal,
one underneath the other.
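For concreteness, a minimal NumPy sketch of vech and v (the function names are mine, not the book's); both simply pick out lower-triangular elements column by column.

```python
import numpy as np

def vech(A):
    """Elements on and below the main diagonal, stacked column by column."""
    n = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(n)])

def v(A):
    """Elements strictly below the main diagonal, stacked column by column."""
    n = A.shape[0]
    return np.concatenate([A[j + 1:, j] for j in range(n - 1)])

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(vech(A))  # [1 4 7 5 8 9]   length n(n+1)/2 = 6
print(v(A))     # [4 7 8]         length n(n-1)/2 = 3
```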

If A is symmetric, that is, A ′ = A, then ai j = a ji and the elements of


A below the main diagonal are duplicated by the elements above the main
diagonal. Often, we wish to form a vector from A that consists of the essential
elements of A without duplication. Clearly, the vech operator allows us to
do this.
An obvious example is in statistics where A is the covariance matrix. The
unknown parameters associated with the covariance matrix are given by
vechA. If we wished to form a vector consisting of only the covariances of
the covariance matrix, but not the variances, then we take v(A).
Before we leave this section, note that for a square matrix A, not necessarily
symmetric, vecA contains all the elements in vechA and in v(A), and more.
It follows that we can obtain vechA and v(A) by premultiplying vecA
by matrices whose elements are zeros or ones strategically placed. Like-
wise, v(A) can be obtained from vechA by premultiplying vechA by such a
matrix. These matrices are examples of zero-one matrices called elimination
matrices.
If A is symmetric then, as previously noted, vechA contains all the essential
elements of A. It follows that there exists a matrix, whose elements are all
zeros or ones such that when we premultiply vechA by this matrix we obtain
vecA. In a similar manner, if A is strictly lower triangular, then v(A) contains
the essential elements of A apart from zeros, so we must be able to obtain
vecA by premultiplying v(A) by a matrix whose elements are zeros or ones
suitably placed. Such matrices are called duplication matrices.
Chapter 3 studies elimination matrices and duplication matrices, perhaps
in a new way.

1.4.4 Generalized Vecs and Generalized Rvecs


When dealing with a matrix that has been partitioned into its columns,
we often have occasion to stack the columns of the matrix underneath
each other. If A is a large matrix, we often partition A into a number of
submatrices. For example, if A is m×np, we may write

A = (A1 . . . A p )

where each submatrix in this partition is m×n. Often, we want to stack


these submatrices underneath each other to form the mp×n matrix
\[
\begin{pmatrix} A_1 \\ \vdots \\ A_p \end{pmatrix}.
\]

The operator that does this for us is called the generalized vec of order n,
denoted by vecn . To form vecn A, we stack columns of A underneath each
other taking n at a time. Clearly, this operator is only performable on A if
the number of columns of A is a multiple of n. Under this notation,

vec A = vec1 A.

In a similar fashion, if A is partitioned into its rows we know that the rvec
operator forms a row vector out of the elements of A by stacking the rows
of A alongside each other. If A has a large number of rows, say, A is mp×n
we often have occasion to partition A into p m×n matrices, so we write
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_p \end{pmatrix}
\]
where each submatrix is an m×n matrix. Again, we may want to stack these
submatrices alongside each other instead of underneath each other, to form
the m×np matrix

(A1 . . . A p ).

The operator that does this for us is called the generalized rvec of order m
denoted by rvecm . To form rvecm A, we stack rows of A alongside each other
taking m at a time, so this operator is only performable on A if the number
of rows of A is a multiple of m. Under this notation,

rvec A = rvec1 A.

For a given matrix A, which is m×n, the number of generalized vecs (rvecs)
that can be performed on A clearly depends on the number of columns
n(rows m) of A. If n(m) is a prime number, then only two generalized
vec (rvec) operators can be performed on A, vec1 A = vec A and vecn A =
A, rvec1 A = rvec A, and rvecm A = A.
For n(m) any other number, the number of generalized vec (rvec) oper-
ators that can be performed on A is the number of positive integers that
divide into n(m).
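A minimal NumPy sketch of the generalized operators (the names `gvec` and `grvec` are mine): vec_n stacks blocks of n columns underneath each other, and rvec_m places blocks of m rows alongside each other.

```python
import numpy as np

def gvec(A, n):
    """Generalized vec of order n: stack blocks of n columns under each other."""
    m, cols = A.shape
    assert cols % n == 0
    return np.vstack([A[:, j:j + n] for j in range(0, cols, n)])

def grvec(A, m):
    """Generalized rvec of order m: stack blocks of m rows alongside each other."""
    rows, _ = A.shape
    assert rows % m == 0
    return np.hstack([A[i:i + m, :] for i in range(0, rows, m)])

A = np.arange(12).reshape(2, 6)   # 2 x 6
print(gvec(A, 3).shape)           # (4, 3): two 2x3 blocks stacked
print(grvec(gvec(A, 3), 2))       # recovers A: rvec_m undoes vec_n
```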
As with the vec and rvec operators, the vecn and rvecn operators are
intimately connected. Let A be a m×np matrix and, as before, write

A = (A1 . . . A p )
where each submatrix A_i is m×n. Then,
\[
\mathrm{vec}_n A = \begin{pmatrix} A_1 \\ \vdots \\ A_p \end{pmatrix}
\]
so
\[
(\mathrm{vec}_n A)' = (A_1' \ \ldots \ A_p') = \mathrm{rvec}_n A'. \tag{1.14}
\]
Similarly, if B is mp×n and we partition B as
\[
B = \begin{pmatrix} B_1 \\ \vdots \\ B_p \end{pmatrix}
\]
where each submatrix B_j is m×n, then
\[
\mathrm{vec}_m B' = \begin{pmatrix} B_1' \\ \vdots \\ B_p' \end{pmatrix} = (\mathrm{rvec}_m B)'. \tag{1.15}
\]
As before, we need only derive theorems for one of these operators. Then,
using Equations 1.14 or 1.15, we can readily obtain the corresponding results
for the other operator.
Clearly, we can take generalized vecs of matrices, which are Kronecker
products. Let A and B be m×n and p×q matrices, respectively, and write
A = (a1 . . . an ), where a j is the jth column of A. Then, we can write
A ⊗ B = (a1 ⊗ B . . . an ⊗ B)
so
\[
\mathrm{vec}_q(A \otimes B) = \begin{pmatrix} a_1 \otimes B \\ \vdots \\ a_n \otimes B \end{pmatrix} = \mathrm{vec}\,A \otimes B. \tag{1.16}
\]
As a special case, vec_q(a′ ⊗ B) = a ⊗ B.
Now write A = (a^1 . . . a^m)′, where a^{i′} is the ith row of A. Then,
\[
A \otimes B = \begin{pmatrix} a^{1\prime} \otimes B \\ \vdots \\ a^{m\prime} \otimes B \end{pmatrix}
\]
so
\[
\mathrm{rvec}_p(A \otimes B) = \bigl(a^{1\prime} \otimes B \ \ldots \ a^{m\prime} \otimes B\bigr) = \mathrm{rvec}\,A \otimes B, \tag{1.17}
\]
and as a special case rvec p (a ⊗ B) = a ′ ⊗ B.
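Equations 1.16 and 1.17 can be checked with the gvec/grvec sketches given above (my code, not the book's):

```python
import numpy as np

def gvec(A, n):
    return np.vstack([A[:, j:j + n] for j in range(0, A.shape[1], n)])

def grvec(A, m):
    return np.hstack([A[i:i + m, :] for i in range(0, A.shape[0], m)])

A = np.random.rand(3, 4)   # m x n
B = np.random.rand(2, 5)   # p x q
vecA = A.reshape(-1, 1, order='F')
rvecA = A.reshape(1, -1, order='C')

# (1.16): vec_q(A ⊗ B) = vec A ⊗ B
assert np.allclose(gvec(np.kron(A, B), B.shape[1]), np.kron(vecA, B))
# (1.17): rvec_p(A ⊗ B) = rvec A ⊗ B
assert np.allclose(grvec(np.kron(A, B), B.shape[0]), np.kron(rvecA, B))
```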
The generalized vec of a matrix can be undone by taking the appropriate
generalized rvec of the vec. This property induced the author to originally
call generalized rvecs, generalized devecs (see Turkington (2005)). If A is
m×n, for example, then clearly
rvecm (vec A) = A.
In fact, if vec j A refers to a generalized vec operator that is performable on
A, then the following relationships exist between the two operators
rvec(vec A) = (vec A)′ = rvec A′,
rvec_m(vec_j A) = A,
rvec(vec_j A) = a 1×mn vector whose elements are obtained from a permutation of those of (vec A)′.
In a similar fashion, the generalized vec operator can be viewed as undoing the rvec of a matrix. If rvec_i A refers to a generalized rvec operator that is performable on A, then we have
vec(rvec A) = vec A′ = (rvec A)′,
vec_n(rvec_i A) = A,
vec(rvec_i A) = an mn×1 vector whose elements are obtained from a permutation of those of vec A′.
There are some similarities between the behavior of vecs on the one hand
and that of generalized vecs on the other. For example, if A is an m×n
matrix, then as
A = A In In
we have
vec A = (In ⊗ A)vec In .
If A is an m×nG matrix, we have the following theorem:

Theorem 1.11 For A an m×nG matrix,
\[
\mathrm{vec}_G A = (I_n \otimes A)(\mathrm{vec}_G I_{nG}).
\]

Proof: Partition A as A = (A_1 . . . A_n) so
\[
\mathrm{vec}_G A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix},
\]
where each submatrix A_i is m×G. Now
\[
A_i = (A_1 \ \ldots \ A_n)\begin{pmatrix} O \\ \vdots \\ I_G \\ \vdots \\ O \end{pmatrix} = A\bigl(e_i^n \otimes I_G\bigr)
\]
so
\[
\mathrm{vec}_G A = \begin{pmatrix} A(e_1^n \otimes I_G) \\ \vdots \\ A(e_n^n \otimes I_G) \end{pmatrix} = (I_n \otimes A)\begin{pmatrix} e_1^n \otimes I_G \\ \vdots \\ e_n^n \otimes I_G \end{pmatrix}
= (I_n \otimes A)(\mathrm{vec}\,I_n \otimes I_G) = (I_n \otimes A)\,\mathrm{vec}_G I_{nG},
\]

by Equation 1.16. 

Also, for A m×n and B n× p then as AB = ABIp , we have

vec AB = (Ip ⊗ A)vec B.

For A an m×n matrix and B an n×Gp matrix, we have the following


theorem.

Theorem 1.12 If A and B are m×n and n×G p matrices, respectively,


then

vecG AB = (Ip ⊗ A)vecG B.

Proof: Partition B as B = (B1 . . . B p ) where each submatrix B j is n×G.


Then,

AB = (AB1 . . . AB p )
so
\[
\mathrm{vec}_G AB = \begin{pmatrix} AB_1 \\ \vdots \\ AB_p \end{pmatrix} = (I_p \otimes A)\,\mathrm{vec}_G B. \qquad\blacksquare
\]

However, the similarities end here. There appear to be no equivalent theorems for generalized vecs that correspond to
\[
\mathrm{vec}\,AB = (B' \otimes I_m)\mathrm{vec}\,A
\]
or
\[
\mathrm{vec}\,ABC = (C' \otimes A)\mathrm{vec}\,B.
\]
Notice in Theorem 1.12 that if B itself is a Kronecker product, say B = C ⊗ D where C and D are r×p and s×G matrices, respectively, so n = rs, then, using Equation 1.16,
\[
\mathrm{vec}_G[A(C \otimes D)] = (I_p \otimes A)\,\mathrm{vec}_G(C \otimes D) = (I_p \otimes A)(\mathrm{vec}\,C \otimes D). \tag{1.18}
\]
We can write this generalized vec another way as shown by the following
theorem.

Theorem 1.13 Let A, C and D be m×rs, r × p, and s×G matrices res-


pectively. Then,
vecG [A(C ⊗ D)] = (C ′ ⊗ Im )(vecs A)D.

Proof: Partition A as A = (A_1 . . . A_r) where each submatrix is m×s. Then,
\[
\mathrm{vec}_G[A(C \otimes D)] = \mathrm{vec}_G\Bigl[(A_1 \ \ldots \ A_r)\begin{pmatrix} c^{1\prime} \otimes D \\ \vdots \\ c^{r\prime} \otimes D \end{pmatrix}\Bigr]
= \mathrm{vec}_G\bigl[A_1(c^{1\prime} \otimes D) + \cdots + A_r(c^{r\prime} \otimes D)\bigr]
\]
\[
= \mathrm{vec}_G\bigl[(c^{1\prime} \otimes A_1 D) + \cdots + (c^{r\prime} \otimes A_r D)\bigr]
= \begin{pmatrix} c_{11}A_1D \\ \vdots \\ c_{1p}A_1D \end{pmatrix} + \cdots + \begin{pmatrix} c_{r1}A_rD \\ \vdots \\ c_{rp}A_rD \end{pmatrix}
\]
where C = {c_{ij}}. Consider the first submatrix
\[
c_{11}A_1D + \cdots + c_{r1}A_rD = (c_1' \otimes I_m)\begin{pmatrix} A_1 \\ \vdots \\ A_r \end{pmatrix} D = (c_1' \otimes I_m)(\mathrm{vec}_s A)D.
\]
The result follows. ∎

The equivalent results for generalized rvecs are listed as follows.
If A is an mG×n matrix, then
\[
\mathrm{rvec}_G A = (\mathrm{rvec}\,I_m \otimes I_G)(I_m \otimes A) = (\mathrm{rvec}_G I_{mG})(I_m \otimes A).
\]
If B is an n×p matrix, then
\[
\mathrm{rvec}_G AB = \mathrm{rvec}_G A(I_m \otimes B). \tag{1.19}
\]
If C and D are m×r and G×s, respectively, and n = rs, then
\[
\mathrm{rvec}_G[(C \otimes D)B] = (\mathrm{rvec}\,C \otimes D)(I_m \otimes B) = D(\mathrm{rvec}_s B)(C' \otimes I_p).
\]
If C and D are m×r and G×G, respectively, so n = rG, then
\[
\mathrm{rvec}_G[(C \otimes D)B] = \mathrm{rvec}_G[(I_r \otimes D)B](C' \otimes I_p).
\]
This section finishes with a result that is useful in dealing with a partitioned
vector.

Theorem 1.14 Let x be a mp×1 vector and y be a m×1 vector. Then,


x ′ (y ⊗ Ip ) = y ′ vec p x ′ .

Proof: Partition x as x = (x_1′ . . . x_m′)′ where each subvector is p×1. Then,
\[
x'(y \otimes I_p) = (x_1' \ \ldots \ x_m')\begin{pmatrix} y_1 I_p \\ \vdots \\ y_m I_p \end{pmatrix} = y_1 x_1' + \cdots + y_m x_m' = y'\,\mathrm{vec}_p\, x'. \qquad\blacksquare
\]
Note from Theorem 1.7, we have that
\[
y \,\tau_{m11}\, \mathrm{vec}_p\, x' = x'(y \otimes I_p).
\]
For further theorems on generalized vecs and rvecs, see Turkington (2005).

1.4.5 Generalized Vec Operators and the Cross-Product Operator


Generalized vec operators like the cross-product operator really come into
their own when we are dealing with large partitioned matrices. In this
section, we present theorems that link the operators.
First, if we take the transpose of a cross-product, we get a cross-product
of generalized vecs as the following theorem shows.

Theorem 1.15 Let A be mG× p and B be nG×q, partitioned as in Equa-


tion 1.7 of Section 1.3. Then,
(AτGmn B) ′ = vecm (A ′ )τG pq vecn (B ′ ).

Proof:
\[
(A \,\tau_{Gmn}\, B)' = (A_1 \otimes B_1 + \cdots + A_G \otimes B_G)' = A_1' \otimes B_1' + \cdots + A_G' \otimes B_G'.
\]
Now A′ = (A_1′ . . . A_G′) where each A_i′ is p×m, so
\[
\mathrm{vec}_m A' = \begin{pmatrix} A_1' \\ \vdots \\ A_G' \end{pmatrix}.
\]
Similarly,
\[
\mathrm{vec}_n B' = \begin{pmatrix} B_1' \\ \vdots \\ B_G' \end{pmatrix}
\]
where each submatrix B_j′ is q×n, so the result holds. ∎

A generalized vec or rvec can be written as a cross-product as the following


theorem shows.

Theorem 1.16 Let A be a mG× p matrix. Then,


rvecm A = IG τG1m A.

Proof: Partitioning A as in Equation 1.7, we have


rvecm A = (A1 . . . AG ).
But I_G = (e_1^G . . . e_G^G)′, where e_j^G refers to the jth column of I_G. So
\[
I_G \,\tau_{G1m}\, A = e_1^{G\prime} \otimes A_1 + \cdots + e_G^{G\prime} \otimes A_G = (A_1 \ \ldots \ A_G). \qquad\blacksquare
\]

A consequence of this theorem is that some cross-products can be written


as generalized rvecs.
When we take the generalized vec of a cross-product, we get another
cross-product that involves a vec of a generalized rvec, as the following
theorem shows.

Theorem 1.17 Let A and B be mG× p and nG×q matrices, respectively, and
partition A and B as in Equation 1.7. Then,
vecq (AτGmn B) = vec(rvecm A)τG,mp,n B.

Proof: As AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG , we have, using Equation


1.16,
\[
\mathrm{vec}_q(A \,\tau_{Gmn}\, B) = \mathrm{vec}_q(A_1 \otimes B_1) + \cdots + \mathrm{vec}_q(A_G \otimes B_G)
= \mathrm{vec}\,A_1 \otimes B_1 + \cdots + \mathrm{vec}\,A_G \otimes B_G
= \begin{pmatrix} \mathrm{vec}\,A_1 \\ \vdots \\ \mathrm{vec}\,A_G \end{pmatrix} \tau_{G,mp,n}\, B.
\]
But
\[
\begin{pmatrix} \mathrm{vec}\,A_1 \\ \vdots \\ \mathrm{vec}\,A_G \end{pmatrix} = \mathrm{vec}(A_1 \ \ldots \ A_G) = \mathrm{vec}(\mathrm{rvec}_m A). \qquad\blacksquare
\]


One final theorem involving cross-products and generalized vecs:

Theorem 1.18 Let A, B, C be p×mG, G×q, and m×r matrices, respectively.


Then,
A(B ⊗ C ) = BτG1p (vecm A)C.

Proof: Write
\[
A(B \otimes C) = A(B \otimes I_m)(I_q \otimes C)
\]
and partition A as A = (A_1 . . . A_G), where each submatrix in this partitioning is p×m. Then,
\[
A(B \otimes I_m) = (A_1 \ \ldots \ A_G)\begin{pmatrix} b^{1\prime} \otimes I_m \\ \vdots \\ b^{G\prime} \otimes I_m \end{pmatrix}
= A_1(b^{1\prime} \otimes I_m) + \cdots + A_G(b^{G\prime} \otimes I_m)
= b^{1\prime} \otimes A_1 + \cdots + b^{G\prime} \otimes A_G = B \,\tau_{G1p}\, \mathrm{vec}_m A,
\]
so
\[
A(B \otimes C) = (B \,\tau_{G1p}\, \mathrm{vec}_m A)(I_q \otimes C) = B \,\tau_{G1p}\, (\mathrm{vec}_m A)C,
\]
by Theorem 1.5. ∎
TWO

Zero-One Matrices

2.1 Introduction
A matrix whose elements are all either one or zero is, naturally enough,
called a zero-one matrix. Such matrices have had a long association with
statistics and econometrics, although their prominence has really come to
the fore with the advent of matrix calculus. In this chapter, the intent is not to
give a list of all known zero-one matrices plus their properties. The reader
is referred to Magnus (1988), Magnus and Neudecker (1999), Lutkepohl
(1996), and Turkington (2005) for such material. Instead, what is presented
are zero-one matrices that may be new to the reader, but which I have found
useful in the evaluation of certain matrix calculus results. Having said that,
I do talk about some known zero-one matrices and their properties in
order for the reader to have a full understanding of the new matrices. The
later sections of this chapter are reserved for theorems linking the zero-one
matrices with the mathematical operators we looked at in Chapter 1.

2.2 Selection Matrices and Permutation Matrices


Probably the first zero-one matrix to appear in statistics and econometrics
was a selection matrix. A selection matrix is a matrix whose (rows) columns
are a selection of the (rows) columns of an identity matrix. Consider A, an
m × n matrix, and write A = (a1 . . . an ) where ai is the ith column of A.
Suppose from A we wish to form a new matrix, B, whose columns consist of
the first, fourth, and fifth columns of A. Let S be the selection matrix given
by S = (e1n e4n e5n ), where e nj is the jth column of the n × n identity matrix
In . Then,

AS = (a1 a4 a5 ) = B.
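A small NumPy illustration of selecting columns with such a matrix (my own example, not from the book):

```python
import numpy as np

n = 6
A = np.arange(18).reshape(3, n)            # a 3 x 6 matrix
I_n = np.eye(n)
S = I_n[:, [0, 3, 4]]                      # selection matrix (e1 e4 e5)

B = A @ S                                  # picks out columns 1, 4 and 5 of A
assert np.array_equal(B, A[:, [0, 3, 4]])
```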


Selection matrices have an obvious application in econometrics. The matrix


A, for example, may represent the observations on all the endogenous
variables in an econometric model, and the matrix B may represent the
observations on the endogenous variables that appear on the right-hand
side of a particular equation in the model. Often, it is mathematically
convenient to use selection matrices and write B = AS.
Similarly, if we premultiply a matrix A by a selection matrix made up of
rows of the identity matrix, we get a new matrix consisting of a selection of
the rows of A.
Selection matrices can be used to select the (i, j)th element of a matrix. Let A be an m×n matrix; then as e_i^{m′}A selects the ith row of A and (e_i^{m′}A)e_j^n selects the jth element of this row vector, it follows that
\[
a_{ij} = e_i^{m\prime} A e_j^n.
\]
When it comes to selecting the (i, j)th element from a Kronecker product, we have to specify exactly where the ith row and jth column are located in the matrix. Let A be an m×n matrix and B be a p×q matrix. Then,
\[
A \otimes B = \begin{pmatrix} a^{1\prime} \otimes B \\ \vdots \\ a^{m\prime} \otimes B \end{pmatrix},
\]
so if i takes a value between 1 and p the ith row is a^{1′} ⊗ b^{i′}, if i takes a value between p + 1 and 2p the ith row is a^{2′} ⊗ b^{ī′}, and so on until the last possibility where i takes a value between (m − 1)p and pm, in which case the ith row is a^{m′} ⊗ b^{ī′}. To cater for all of these possibilities, we write
\[
i = (c - 1)p + \bar{i},
\]
where c is some value between 1 and m and ī is some value between 1 and p. By setting c = 1 and letting ī range from 1 to p, then setting c = 2 and letting ī take the same values, and so on until we set c = m and let ī take values between 1 and p, we generate all possible values for i, namely i = 1, 2, . . . , mp.
If we do this, setting i = (c − 1)p + ī, then the ith row of A ⊗ B is a^{c′} ⊗ b^{ī′}. But a^{c′} = e_c^{m′}A and b^{ī′} = e_{\bar{i}}^{p′}B, so the ith row of A ⊗ B is
\[
\bigl(e_c^{m\prime} \otimes e_{\bar{i}}^{p\prime}\bigr)(A \otimes B).
\]

A similar analysis can be carried out for the columns of A ⊗ B. As


A ⊗ B = (a1 ⊗ B . . . an ⊗ B)
when talking about the jth column of this matrix, we must specify the exact location of this column. We do this by writing
\[
j = (d - 1)q + \bar{j},
\]
for suitable d between 1 and n, and suitable j̄ between 1 and q. If we set d = 1, let j̄ range from 1 to q, set d = 2 and let j̄ range over the same values, then continue in this manner until we set d = n and let j̄ take the values 1 to q, we generate all possible values for j, namely j = 1, 2, . . . , nq. Writing j in this manner, the jth column of A ⊗ B is a_d ⊗ b_{j̄}. But a_d = A e_d^n and b_{j̄} = B e_{j̄}^q, so the jth column of A ⊗ B is
\[
(A \otimes B)\bigl(e_d^n \otimes e_{\bar{j}}^q\bigr).
\]


We can put our analysis together in the following theorem.

Theorem 2.1 Let A be an m×n matrix and B be a p×q matrix. In selecting the (i, j)th element of A ⊗ B, write
\[
i = (c - 1)p + \bar{i}, \qquad j = (d - 1)q + \bar{j}
\]
for suitable c between 1 and m, suitable ī between 1 and p, suitable d between 1 and n, and suitable j̄ between 1 and q. Then,
\[
(A \otimes B)_{ij} = a_{cd}\, b_{\bar{i}\bar{j}}.
\]

Proof: The ith row of A ⊗ B is given by
\[
(A \otimes B)_{i.} = \bigl(e_c^{m\prime} \otimes e_{\bar{i}}^{p\prime}\bigr)(A \otimes B),
\]
and the jth column of A ⊗ B is given by
\[
(A \otimes B)_{.j} = (A \otimes B)\bigl(e_d^n \otimes e_{\bar{j}}^q\bigr).
\]
Putting these two results together gives
\[
(A \otimes B)_{ij} = \bigl(e_c^{m\prime} \otimes e_{\bar{i}}^{p\prime}\bigr)(A \otimes B)\bigl(e_d^n \otimes e_{\bar{j}}^q\bigr)
= e_c^{m\prime} A e_d^n \otimes e_{\bar{i}}^{p\prime} B e_{\bar{j}}^q
= a_{cd}\, b_{\bar{i}\bar{j}}. \qquad\blacksquare
\]

To illustrate the use of this theorem, suppose A is 2 × 3 and B is 4 × 5. If


we want to find (A ⊗ B)79 , we would write

7=1×4+3

so c = 2 and i¯ = 3 and we would write

9=1×5+4

so d = 2 and j¯ = 4. According to the theorem,

(A ⊗ B)79 = a22 b34 .
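The worked example is easy to confirm numerically (a sketch of mine, not the book's; note the indices become zero-based in code):

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(4, 5)
K = np.kron(A, B)

# (A ⊗ B)_{7,9}: 7 = 1*4 + 3 and 9 = 1*5 + 4, so c = 2, i_bar = 3, d = 2, j_bar = 4
assert np.isclose(K[6, 8], A[1, 1] * B[2, 3])   # zero-based equivalents
```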

An important application of this analysis comes about when we are dealing with large identity matrices, I_{mn} say. Write
\[
I_{mn} = \bigl(e_1^{mn} \ \ldots \ e_{mn}^{mn}\bigr)
\]
where, as per usual, e_j^{mn} refers to the jth column of this matrix. The question is, can we get an expression for e_j^{mn} in terms of columns of smaller identity matrices? Writing I_{mn} as a Kronecker product, we have
\[
I_{mn} = I_m \otimes I_n
\]
so our expression for e_j^{mn} depends on the exact location of this jth column. If we write
\[
j = (d - 1)n + \bar{j}
\]
for suitable d between 1 and m and suitable j̄ between 1 and n, then
\[
e_j^{mn} = (I_m \otimes I_n)\bigl(e_d^m \otimes e_{\bar{j}}^n\bigr) = e_d^m \otimes e_{\bar{j}}^n.
\]
For example, consider I_6 and suppose we write
\[
I_6 = I_3 \otimes I_2.
\]
If we are interested in the 5th column of I_6, we write
\[
5 = 2 \times 2 + 1,
\]
so d = 3 and j̄ = 1, and we can write
\[
e_5^6 = e_3^3 \otimes e_1^2.
\]

Sometimes, we wish to retrieve the element ai j from the vec A or from the
rvec A. This is a far simpler operation, as shown by Theorem 2.2.
Theorem 2.2 Let A be an m×n matrix. Then,
\[
a_{ij} = \bigl(e_j^{n\prime} \otimes e_i^{m\prime}\bigr)\,\mathrm{vec}\,A = (\mathrm{rvec}\,A)\bigl(e_i^m \otimes e_j^n\bigr).
\]

Proof: We have
\[
a_{ij} = e_i^{m\prime} A e_j^n.
\]
But a_{ij} = vec a_{ij} = (e_j^{n′} ⊗ e_i^{m′}) vec A. Also, a_{ij} = rvec a_{ij} = (rvec A)(e_i^m ⊗ e_j^n). ∎

The concept of a selection matrix can be generalized to handle the case where
our matrices are partitioned matrices. Suppose A is an m × nG matrix and
we partition A as

A = (A1 . . . An ) (2.1)

where each submatrix is m×G. To select the submatrix A_i from A, we postmultiply A by e_i^n ⊗ I_G, where e_i^n is the ith column of I_n. That is,
\[
A_i = A\bigl(e_i^n \otimes I_G\bigr).
\]

Suppose now we want to form the matrix B = (A1 A4 A5 ) from A. Then,

B = A(S ⊗ IG )

where S = (e1n e4n e5n ).


In like manner, consider C an mG×n matrix partitioned as
\[
C = \begin{pmatrix} C_1 \\ \vdots \\ C_G \end{pmatrix} \tag{2.2}
\]
where each submatrix is m×n. If from C we wish to select C_j, we premultiply C by e_j^{G′} ⊗ I_m. That is,
\[
C_j = \bigl(e_j^{G\prime} \otimes I_m\bigr)C.
\]

If we wish to form
\[
D = \begin{pmatrix} C_2 \\ C_3 \\ C_7 \end{pmatrix}
\]
we premultiply C by the selection matrix S ⊗ I_m, where
\[
S = \begin{pmatrix} e_2^{G\prime} \\ e_3^{G\prime} \\ e_7^{G\prime} \end{pmatrix}.
\]
Finally, staying with the same partition of C, notice that
\[
\bigl(I_G \otimes e_j^{m\prime}\bigr)C = \begin{pmatrix} e_j^{m\prime} C_1 \\ \vdots \\ e_j^{m\prime} C_G \end{pmatrix} = \begin{pmatrix} (C_1)_{j.} \\ \vdots \\ (C_G)_{j.} \end{pmatrix} = C^{(j)},
\]
where we use the notation introduced by Equation 1.8 in Chapter 1. That is, (I_G ⊗ e_j^{m′}) is the selection matrix that selects C^{(j)} from C.
Sometimes instead of selecting rows or columns from a matrix A, we
want to rearrange the rows or columns of A. The zero-one matrix that
does this for us is called a permutation matrix. A permutation matrix P is
obtained from a permutation of the rows or columns of an identity matrix.
The result is a matrix in which each row and each column of the matrix
contains a single element, one, and all the remaining elements are zeros.
As the columns or rows of an identity matrix form an orthonormal set
of vectors, it is quite clear that every permutation matrix is orthogonal,
that is, P ′ = P −1 . Where a given matrix A is premultiplied (postmultiplied)
by a permutation matrix, formed from the rows (columns) of an identity
matrix, the result is a matrix whose rows (columns) are obtained from a
permutation of the rows (columns) of A.
As with selection matrices, the concept of permutation matrices can be
generalized to handle partitioned matrices. If A is m × nG and we partition
A as in Equation 2.1, and we want to rearrange the submatrices in this
partitioning, we can do this by post multiplying A by
P ⊗ IG
where P is the appropriate permutation matrix formed from the columns
of the identity matrix In .
Similarly, if we want to rearrange the submatrices in C given by Equation
2.2, we premultiply C by
P ⊗ Im

where P is the appropriate permutation matrix formed from the rows of the
identity matrix IG .

2.3 The Elementary Matrix E_{ij}^{mn}

Sometimes, it is convenient to express an m×n matrix A as a sum involving its elements. A zero-one matrix that allows us to do this is the elementary matrix (not to be confused with the elementary matrices that give rise to elementary row or column operations). The elementary matrix E_{ij}^{mn} is the m×n matrix whose elements are all zeros except the (i, j)th element, which is 1. That is, E_{ij}^{mn} is defined by
\[
E_{ij}^{mn} = e_i^m e_j^{n\prime}.
\]
Clearly, if A = {a_{ij}} is an m×n matrix, then
\[
A = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}\, E_{ij}^{mn}.
\]
Also,
\[
\bigl(E_{ij}^{mn}\bigr)' = e_j^n e_i^{m\prime} = E_{ji}^{nm}.
\]
Notice that if A is an m×n matrix and B is a p×q matrix, then
\[
A E_{ij}^{np} B = a_i b^{j\prime}.
\]
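A quick NumPy sketch of the elementary matrix and the decomposition of A (my own code and names, not the book's):

```python
import numpy as np

def E(i, j, m, n):
    """Elementary matrix E_ij^{mn}: zeros except for a one in position (i, j)."""
    M = np.zeros((m, n))
    M[i, j] = 1.0            # zero-based indices here
    return M

m, n = 3, 4
A = np.random.rand(m, n)

# A = sum_{i,j} a_ij E_ij^{mn}
recon = sum(A[i, j] * E(i, j, m, n) for i in range(m) for j in range(n))
assert np.allclose(A, recon)
```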
The ith row and the jth column of a Kronecker product can be written in terms of elementary matrices. Note that
\[
\mathrm{vec}_q(a' \otimes b') = a \otimes b' = ab'
\]
where b is q×1. Returning to the ith row of A ⊗ B, which we wrote as
\[
(A \otimes B)_{i.} = a^{c\prime} \otimes b^{\bar{i}\prime}
\]
where b^{ī} is q×1, it follows that
\[
\mathrm{vec}_q (A \otimes B)_{i.} = a^c b^{\bar{i}\prime} = A' e_c^m e_{\bar{i}}^{p\prime} B = A' E_{c\bar{i}}^{mp} B. \tag{2.3}
\]
If we undo the vec_q by taking the rvec of both sides, we have
\[
(A \otimes B)_{i.} = \mathrm{rvec}\bigl(A' E_{c\bar{i}}^{mp} B\bigr). \tag{2.4}
\]
Similarly,
\[
\mathrm{rvec}_p(a \otimes b) = a' \otimes b = ba'
\]
where b is now p×1. Returning to the jth column of A ⊗ B, which we wrote as
\[
(A \otimes B)_{.j} = a_d \otimes b_{\bar{j}}
\]
where b_{j̄} is p×1, it follows that
\[
\mathrm{rvec}_p (A \otimes B)_{.j} = b_{\bar{j}}\, a_d' = B e_{\bar{j}}^q e_d^{n\prime} A' = B E_{\bar{j}d}^{qn} A'. \tag{2.5}
\]
Undoing the rvec_p by taking the vec of both sides gives
\[
(A \otimes B)_{.j} = \mathrm{vec}\bigl(B E_{\bar{j}d}^{qn} A'\bigr). \tag{2.6}
\]

These results will be important for us in our work in Chapter 4 where we


look at different concepts of matrix derivatives.
Elementary matrices will be important for us as certain concepts of matrix
differentiation make use of these matrices.

2.4 The Commutation Matrix


One of the most useful permutation matrices for statistics, econometrics,
and matrix calculus is the commutation matrix. Consider an m × n matrix
A, which using our notation for columns and rows, we write as
$$A = (a_1 \;\ldots\; a_n) = \begin{pmatrix} a^{1\prime} \\ \vdots \\ a^{m\prime} \end{pmatrix},$$

where $a_j$ is the jth column of A and $a^{i\prime}$ is the ith row of A. Then,

$$\mathrm{vec}\, A = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}$$

whereas

$$\mathrm{vec}\, A' = \begin{pmatrix} a^1 \\ \vdots \\ a^m \end{pmatrix}.$$

Clearly, both vec A and vec A′ contain all the elements of A, although
arranged in different orders. It follows that there exists an mn × mn
permutation matrix Kmn that has the property

Kmn vec A = vec A′ . (2.7)

This matrix is called the commutation matrix. The order of the subscripts
is important. The notation is that Kmn is the commutation matrix associated
with an m × n matrix A and takes vec A to vec A′ . On the other hand, Knm is
the commutation matrix associated with an n × m matrix and as A′ is such
a matrix it follows that

Knm vec A′ = vec (A′ )′ = vec A.

Using Equation 2.7, it follows that the two commutation matrices are linked
by

Knm Kmn vec A = vec A

so it follows that $K_{nm} = K_{mn}^{-1} = K_{mn}'$, where the last equality comes about because $K_{mn}$, like all permutation matrices, is orthogonal.
If the matrix A is a vector a, so m = 1, we have that

vec a = vec a′

so

K1n = Kn1 = In .

The commutation matrix can also be used to take us from a rvec to a vec.
For A as previously, we have

(rvec A)Kmn = (vec A′ )′ Kmn = (Kmn vec A)′ Kmn


= (vec A)′ Knm Kmn = (vecA)′ .

There are several explicit expressions for the commutation matrix. Two of
the most useful, particularly when working with partitioned matrices are
these:
$$K_{mn} = \begin{bmatrix} I_n \otimes e_1^{m\prime} \\ \vdots \\ I_n \otimes e_m^{m\prime} \end{bmatrix} = \bigl[\, I_m \otimes e_1^n \;\;\ldots\;\; I_m \otimes e_n^n \,\bigr], \tag{2.8}$$
where, as always in this book, $e_j^m$ is the jth column of the m × m identity matrix $I_m$. For example,

$$K_{32} = \begin{bmatrix} I_2 \otimes e_1^{3\prime} \\ I_2 \otimes e_2^{3\prime} \\ I_2 \otimes e_3^{3\prime} \end{bmatrix} = \bigl[\, I_3 \otimes e_1^2 \;\; I_3 \otimes e_2^2 \,\bigr] = \begin{bmatrix} 1&0&0&0&0&0 \\ 0&0&0&1&0&0 \\ 0&1&0&0&0&0 \\ 0&0&0&0&1&0 \\ 0&0&1&0&0&0 \\ 0&0&0&0&0&1 \end{bmatrix}.$$
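A small NumPy sketch (ours) that builds $K_{mn}$ from its defining property and checks Equation 2.7, orthogonality, and the explicit expression 2.8:

    import numpy as np

    def vec(X):
        return X.reshape(-1, 1, order="F")      # stack the columns of X

    def commutation(m, n):
        # K_{mn}: the mn x mn permutation matrix with K_{mn} vec A = vec A', A of order m x n
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    rng = np.random.default_rng(3)
    m, n = 3, 2
    A = rng.standard_normal((m, n))
    K32 = commutation(m, n)

    assert np.allclose(K32 @ vec(A), vec(A.T))                    # Equation 2.7
    assert np.allclose(commutation(n, m) @ K32, np.eye(m * n))    # K_{nm} K_{mn} = I_{mn}
    assert np.allclose(np.linalg.inv(K32), K32.T)                 # permutation matrices are orthogonal

    # Explicit expression 2.8: K_{mn} = [I_n ⊗ e_1^{m'} ; ... ; I_n ⊗ e_m^{m'}]
    rows = np.vstack([np.kron(np.eye(n), np.eye(m)[[j], :]) for j in range(m)])
    assert np.array_equal(K32, rows)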

The commutation matrix $K_{mn}$ can be written in terms of elementary matrices. Write

$$K_{mn} = \begin{bmatrix} I_n \otimes e_1^{m\prime} \\ \vdots \\ I_n \otimes e_m^{m\prime} \end{bmatrix} = \begin{bmatrix} e_1^{n\prime} \otimes e_1^{m\prime} & \ldots & e_n^{n\prime} \otimes e_1^{m\prime} \\ \vdots & & \vdots \\ e_1^{n\prime} \otimes e_m^{m\prime} & \ldots & e_n^{n\prime} \otimes e_m^{m\prime} \end{bmatrix}.$$

From Equation 1.2 of Section 1.2 in Chapter 1, for vectors a and b, $a' \otimes b = b \otimes a' = ba'$, so

$$K_{mn} = \begin{bmatrix} e_1^n e_1^{m\prime} & \ldots & e_n^n e_1^{m\prime} \\ \vdots & & \vdots \\ e_1^n e_m^{m\prime} & \ldots & e_n^n e_m^{m\prime} \end{bmatrix} = \begin{bmatrix} E_{11}^{nm} & \ldots & E_{n1}^{nm} \\ \vdots & & \vdots \\ E_{1m}^{nm} & \ldots & E_{nm}^{nm} \end{bmatrix}.$$

We have occasion to use this expression for Kmn throughout this book.

Notice that $K_{nn}$ is symmetric and is its own inverse. That is, $K_{nn}' = K_{nn}$ and $K_{nn}K_{nn} = I_{n^2}$, so $K_{nn}$ is a symmetric idempotent matrix.
For other expressions, see Magnus (1988), Graham (1981), and Hender-
son and Searle (1979).
Large commutation matrices can be written in terms of smaller commu-
tation matrices as the following result shows. (See Magnus (1988), Chapter
3).

Kst ,n = (Is ⊗ Kt n )(Ksn ⊗ It ) = (It ⊗ Ksn )(Kt n ⊗ Is ). (2.9)

Moreover,

Kst ,n = Ks,t n Kt ,ns = Kt ,ns Ks,t n . (2.10)



2.4.1 Commutation Matrices, Kronecker Products, and Vecs


Some of the most interesting properties of commutation matrices are con-
cerned with how they interact with Kronecker products. Using commuta-
tion matrices, we can interchange matrices in a Kronecker product as the
following well-known results illustrate (see Neudecker and Magnus (1988),
p.47 and Magnus (1988), Chapter 3).
Let A be an m × n matrix, B be an p × q matrix, and b be an p × 1 vector.
Then,

Kpm (A ⊗ B) = (B ⊗ A)Kqn (2.11)

Kpm (A ⊗ B)Knq = B ⊗ A (2.12)

Kpm (A ⊗ b) = b ⊗ A (2.13)

Kmp (b ⊗ A) = A ⊗ b. (2.14)

If B is m × n, then

tr(Kmn (A′ ⊗ B)) = trA′ B.

Note if c is a q × 1 vector, then as bc ′ = c ′ ⊗ b = b ⊗ c ′ , we have using


Equations 2.13 and 2.14 that

Kpm (A ⊗ bc ′ ) = b ⊗ A ⊗ c ′

Kmp (bc ′ ⊗ A) = c ′ ⊗ A ⊗ b.

Secondly, interesting properties of commutation matrices with respect to


Kronecker products, perhaps not so well known, can be achieved by writing
the commutation matrix as we did in Equation 2.8 and calling on the work
we did on selection matrices in Section 2.2. Consider A and B as previously
shown and partition B into its rows:
$$B = \begin{pmatrix} b^{1\prime} \\ \vdots \\ b^{p\prime} \end{pmatrix}.$$
Then, we know that in general

$$A \otimes B = A \otimes \begin{pmatrix} b^{1\prime} \\ \vdots \\ b^{p\prime} \end{pmatrix} \neq \begin{pmatrix} A \otimes b^{1\prime} \\ \vdots \\ A \otimes b^{p\prime} \end{pmatrix}. \tag{2.15}$$

However, the last matrix in Equation 2.15 can be achieved from A ⊗ B using
a commutation matrix as the following theorem shows.

Theorem 2.3 If A and B are m × n and p × q matrices, respectively, then

$$K_{pm}(A \otimes B) = \begin{pmatrix} A \otimes b^{1\prime} \\ \vdots \\ A \otimes b^{p\prime} \end{pmatrix}.$$

Proof: Using Equation 2.8,

$$K_{pm}(A \otimes B) = \begin{pmatrix} I_m \otimes e_1^{p\prime} \\ \vdots \\ I_m \otimes e_p^{p\prime} \end{pmatrix}(A \otimes B) = \begin{pmatrix} A \otimes e_1^{p\prime}B \\ \vdots \\ A \otimes e_p^{p\prime}B \end{pmatrix} = \begin{pmatrix} A \otimes b^{1\prime} \\ \vdots \\ A \otimes b^{p\prime} \end{pmatrix}. \qquad\square$$


Similarly, if we partition B into its columns so $B = (b_1 \ldots b_q)$, then

$$A \otimes B = A \otimes (b_1 \;\ldots\; b_q) \neq (A \otimes b_1 \;\ldots\; A \otimes b_q). \tag{2.16}$$

However, the last matrix of Equation (2.16) can be achieved from A ⊗ B


using the commutation matrix.

Theorem 2.4 If A and B are m × n and p × q matrices, respectively, then

(A ⊗ B)Knq = (A ⊗ b1 . . . A ⊗ bq ).

Proof: Using Equation 2.8, we write

$$(A \otimes B)K_{nq} = (A \otimes B)\bigl[\, I_n \otimes e_1^q \;\ldots\; I_n \otimes e_q^q \,\bigr] = \bigl[\, A \otimes B e_1^q \;\ldots\; A \otimes B e_q^q \,\bigr] = (A \otimes b_1 \;\ldots\; A \otimes b_q). \qquad\square$$

An interesting implication of Theorems 2.3 and 2.4 is that

$$\begin{pmatrix} A \otimes b^{1\prime} \\ \vdots \\ A \otimes b^{p\prime} \end{pmatrix} = (B \otimes a_1 \;\ldots\; B \otimes a_n).$$

This result follows as $K_{pm}(A \otimes B) = (B \otimes A)K_{qn}$.


Notice that

$$K_{pm}(A \otimes B) = \begin{pmatrix} a^{1\prime} \otimes b^{1\prime} \\ \vdots \\ a^{m\prime} \otimes b^{1\prime} \\ \vdots \\ a^{1\prime} \otimes b^{p\prime} \\ \vdots \\ a^{m\prime} \otimes b^{p\prime} \end{pmatrix}$$

so, using the operator introduced in Section 1.3 of Chapter 1, we have

$$\bigl(K_{pm}(A \otimes B)\bigr)^{(j)} = \begin{pmatrix} a^{j\prime} \otimes b^{1\prime} \\ \vdots \\ a^{j\prime} \otimes b^{p\prime} \end{pmatrix} = \begin{pmatrix} b^{1\prime}\bigl(a^{j\prime} \otimes I_q\bigr) \\ \vdots \\ b^{p\prime}\bigl(a^{j\prime} \otimes I_q\bigr) \end{pmatrix} = B\bigl(a^{j\prime} \otimes I_q\bigr). \tag{2.17}$$
This result will be useful to us in Chapter 5.
In our work in Chapter 4, we have occasion to consider the ith row of $K_{pm}(A \otimes B)$. From Theorem 2.3, it is clear that in obtaining this row we must specify exactly where the ith row is located in this matrix. If i is between 1 and m, the ith row is $a^{i\prime} \otimes b^{1\prime}$; if between m + 1 and 2m, it is $a^{i\prime} \otimes b^{2\prime}$; and so on, until i is between (p − 1)m + 1 and pm, in which case the ith row is $a^{i\prime} \otimes b^{p\prime}$. To cater for all possibilities, we use the device introduced in Section 2.2 of this chapter. We write

$$i = (c - 1)m + \bar i$$

for c some value between 1 and p and $\bar i$ some value between 1 and m. Then,

$$\bigl[K_{pm}(A \otimes B)\bigr]_{i\cdot} = a^{\bar i\prime} \otimes b^{c\prime} \tag{2.18}$$

where $b^c$ is q × 1. Taking the $\mathrm{vec}_q$ of both sides of Equation 2.18, we have

$$\mathrm{vec}_q\bigl[K_{pm}(A \otimes B)\bigr]_{i\cdot} = a^{\bar i} b^{c\prime} = A' e_{\bar i}^m e_c^{p\prime} B = A' E_{\bar i c}^{mp} B. \tag{2.19}$$

In comparing Equation 2.19 with Equation 2.3, we note that the difference in taking the $\mathrm{vec}_q$ of $[K_{pm}(A \otimes B)]_{i\cdot}$ as compared to taking the $\mathrm{vec}_q$ of $(A \otimes B)_{i\cdot}$ is that the subscripts of the elementary matrix are interchanged. Undoing the $\mathrm{vec}_q$ by taking the rvec of each side, we get another way of writing $[K_{pm}(A \otimes B)]_{i\cdot}$, namely

$$\bigl[K_{pm}(A \otimes B)\bigr]_{i\cdot} = \mathrm{rvec}\, A' E_{\bar i c}^{mp} B. \tag{2.20}$$

We will also have occasion to consider the jth column of [(A ⊗ B)Knq ].
Referring to Theorem 2.4 again, we have to specify exactly where the jth
column is in the matrix. Conducting a similar analysis leads us to write

$$j = (d - 1)n + \bar j$$

where d takes a suitable value between 1 and q and $\bar j$ takes a suitable value between 1 and n. Then,

$$\bigl[(A \otimes B)K_{nq}\bigr]_{\cdot j} = a_{\bar j} \otimes b_d \tag{2.21}$$

where $b_d$ is p × 1. Taking the $\mathrm{rvec}_p$ of both sides of Equation 2.21 gives

$$\mathrm{rvec}_p\bigl[(A \otimes B)K_{nq}\bigr]_{\cdot j} = b_d\, a_{\bar j}' = B e_d^q e_{\bar j}^{n\prime} A' = B E_{d\bar j}^{qn} A'. \tag{2.22}$$

Again, comparing Equation 2.22 with Equation 2.5, we see that the subscripts of the elementary matrix are interchanged. Undoing the $\mathrm{rvec}_p$ by taking the vec, we get another way of writing $[(A \otimes B)K_{nq}]_{\cdot j}$, namely

$$\bigl[(A \otimes B)K_{nq}\bigr]_{\cdot j} = \mathrm{vec}\, B E_{d\bar j}^{qn} A'.$$
 

In Section 2.2, we saw that when it comes to partitioned matrices, selection matrices of the form $e_i^{G\prime} \otimes I_m$ or $e_j^G \otimes I_n$ are useful in selecting submatrices. Using these matrices, we can generalize Theorems 2.3 and 2.4 to the case where B is a large partitioned matrix.

Theorem 2.5 Let A be an m × n matrix and B be a pG × q matrix, and partition B as follows:

$$B = \begin{pmatrix} B_1 \\ \vdots \\ B_G \end{pmatrix},$$

where each submatrix $B_i$ is p × q. Then,

$$(K_{Gm} \otimes I_p)(A \otimes B) = \begin{pmatrix} A \otimes B_1 \\ \vdots \\ A \otimes B_G \end{pmatrix}.$$
Proof: Using Equation 2.8, we write

$$(K_{Gm} \otimes I_p)(A \otimes B) = \begin{pmatrix} \bigl(I_m \otimes e_1^{G\prime}\bigr) \otimes I_p \\ \vdots \\ \bigl(I_m \otimes e_G^{G\prime}\bigr) \otimes I_p \end{pmatrix}(A \otimes B) = \begin{pmatrix} A \otimes \bigl(e_1^{G\prime} \otimes I_p\bigr)B \\ \vdots \\ A \otimes \bigl(e_G^{G\prime} \otimes I_p\bigr)B \end{pmatrix} = \begin{pmatrix} A \otimes B_1 \\ \vdots \\ A \otimes B_G \end{pmatrix}. \qquad\square$$

The following corollary of this theorem is important for us:

Corollary 2.1 Let A be an m × n matrix and B be p × q. Then,

$$(K_{qm} \otimes I_p)(A \otimes \mathrm{vec}\, B) = \begin{pmatrix} A \otimes b_1 \\ \vdots \\ A \otimes b_q \end{pmatrix}$$

where $b_j$ is the jth column of B. $\square$

Note that

$$(I_q \otimes K_{mp})(\mathrm{vec}\, B \otimes A) = \begin{pmatrix} K_{mp} & & O \\ & \ddots & \\ O & & K_{mp} \end{pmatrix}\begin{pmatrix} b_1 \otimes A \\ \vdots \\ b_q \otimes A \end{pmatrix} = \begin{pmatrix} K_{mp}(b_1 \otimes A) \\ \vdots \\ K_{mp}(b_q \otimes A) \end{pmatrix} = \begin{pmatrix} A \otimes b_1 \\ \vdots \\ A \otimes b_q \end{pmatrix},$$

so we have

$$(K_{qm} \otimes I_p)(A \otimes \mathrm{vec}\, B) = \begin{pmatrix} A \otimes b_1 \\ \vdots \\ A \otimes b_q \end{pmatrix} = (I_q \otimes K_{mp})(\mathrm{vec}\, B \otimes A). \tag{2.23}$$

A consequence of Theorems 2.3 and 2.5, which is useful for our work
throughout many chapters, is the following result.

Theorem 2.6 Let A and B be matrices as specified in Theorem 2.5. Then,

$$K_{pG,m}(A \otimes B) = \begin{pmatrix} A \otimes (B_1)_{1\cdot} \\ \vdots \\ A \otimes (B_1)_{p\cdot} \\ \vdots \\ A \otimes (B_G)_{1\cdot} \\ \vdots \\ A \otimes (B_G)_{p\cdot} \end{pmatrix}.$$

Proof: Using Equation 2.9, we can write

$$K_{pG,m}(A \otimes B) = (I_G \otimes K_{pm})(K_{Gm} \otimes I_p)(A \otimes B) = \begin{pmatrix} K_{pm} & & O \\ & \ddots & \\ O & & K_{pm} \end{pmatrix}\begin{pmatrix} A \otimes B_1 \\ \vdots \\ A \otimes B_G \end{pmatrix}$$

by Theorem 2.5. Thus, we have

$$K_{pG,m}(A \otimes B) = \begin{pmatrix} K_{pm}(A \otimes B_1) \\ \vdots \\ K_{pm}(A \otimes B_G) \end{pmatrix} = \begin{pmatrix} A \otimes (B_1)_{1\cdot} \\ \vdots \\ A \otimes (B_1)_{p\cdot} \\ \vdots \\ A \otimes (B_G)_{1\cdot} \\ \vdots \\ A \otimes (B_G)_{p\cdot} \end{pmatrix}$$

by Theorem 2.3. $\square$

The generalization of Theorem 2.4 to the case where B is a p × qG matrix


is as follows:

Theorem 2.7 Let A be an m × n matrix and B be an p × qG matrix parti-


tioned as follows:

B = (B1 . . . BG )

where each submatrix B j is p × q.



Then,
(A ⊗ B)(KnG ⊗ Iq ) = (A ⊗ B1 . . . A ⊗ BG ).

Proof: Using Equation 2.8, we write

$$(A \otimes B)(K_{nG} \otimes I_q) = (A \otimes B)\bigl[\,\bigl(I_n \otimes e_1^G\bigr) \otimes I_q \;\ldots\; \bigl(I_n \otimes e_G^G\bigr) \otimes I_q\,\bigr] = \bigl[\, A \otimes B\bigl(e_1^G \otimes I_q\bigr) \;\ldots\; A \otimes B\bigl(e_G^G \otimes I_q\bigr)\,\bigr] = (A \otimes B_1 \;\ldots\; A \otimes B_G). \qquad\square$$

A corollary of this theorem is as follows:

Corollary 2.2 Let A be m × n and B be p × q. Then,

$$(A \otimes \mathrm{rvec}\, B)(K_{np} \otimes I_q) = \bigl(A \otimes b^{1\prime} \;\ldots\; A \otimes b^{p\prime}\bigr).$$

Note that

$$(\mathrm{rvec}\, B \otimes A)(I_p \otimes K_{qn}) = \bigl(b^{1\prime} \otimes A \;\ldots\; b^{p\prime} \otimes A\bigr)\begin{pmatrix} K_{qn} & & O \\ & \ddots & \\ O & & K_{qn} \end{pmatrix} = \bigl(\bigl(b^{1\prime} \otimes A\bigr)K_{qn} \;\ldots\; \bigl(b^{p\prime} \otimes A\bigr)K_{qn}\bigr) = \bigl(A \otimes b^{1\prime} \;\ldots\; A \otimes b^{p\prime}\bigr),$$

so we have

$$(A \otimes \mathrm{rvec}\, B)(K_{np} \otimes I_q) = \bigl(A \otimes b^{1\prime} \;\ldots\; A \otimes b^{p\prime}\bigr) = (\mathrm{rvec}\, B \otimes A)(I_p \otimes K_{qn}). \tag{2.24}$$

The result corresponding to Theorem 2.6 is that for A and B specified as in


Theorem 2.7, then
(A ⊗ B)Kn,Gq
= (A ⊗ (B1 )·1 . . . A ⊗ (B1 )·q . . . A ⊗ (BG )·1 . . . A ⊗ (BG )·q ). (2.25)
The basic properties of commutation matrices with respect to Kronecker
products as presented by Equations 2.11 to 2.14 can be generalized in a
similar fashion.
Corollary 2.1 and Equation 2.23 allow us to come up with the zero-
one matrix that converts the vec of a Kronecker product into the Kronecker

product of vecs and vice versa. Partitioning both A and B into their columns,
we have:

A = (a1 . . . an ), B = (b1 . . . bq ).

We saw in Section 1.2 of Chapter 1, that we can write

A ⊗ B = (a1 ⊗ b1 . . . a1 ⊗ bq . . . an ⊗ b1 . . . an ⊗ bq ),

so

$$\mathrm{vec}(A \otimes B) = \begin{pmatrix} a_1 \otimes b_1 \\ \vdots \\ a_1 \otimes b_q \\ \vdots \\ a_n \otimes b_1 \\ \vdots \\ a_n \otimes b_q \end{pmatrix},$$

whereas

$$\mathrm{vec}\, A \otimes \mathrm{vec}\, B = \begin{pmatrix} a_1 \otimes \mathrm{vec}\, B \\ \vdots \\ a_n \otimes \mathrm{vec}\, B \end{pmatrix} = \begin{pmatrix} a_1 \otimes \begin{pmatrix} b_1 \\ \vdots \\ b_q \end{pmatrix} \\ \vdots \\ a_n \otimes \begin{pmatrix} b_1 \\ \vdots \\ b_q \end{pmatrix} \end{pmatrix}.$$

Clearly, both vectors have the same elements, although these elements are
rearranged in moving from one vector to another. Each vector must then
be able to be obtained by premultiplying the other by a suitable zero-one
matrix.
An application of Corollary 2.1 gives the following theorem:

Theorem 2.8

vec(A ⊗ B) = (In ⊗ Kqm ⊗ Ip )(vec A ⊗ vec B)


(In ⊗ Kmq ⊗ Ip )vec(A ⊗ B) = vec A ⊗ vec B.

Proof: We write

$$(I_n \otimes K_{qm} \otimes I_p)(\mathrm{vec}\, A \otimes \mathrm{vec}\, B) = \begin{pmatrix} K_{qm} \otimes I_p & & O \\ & \ddots & \\ O & & K_{qm} \otimes I_p \end{pmatrix}\begin{pmatrix} a_1 \otimes \mathrm{vec}\, B \\ \vdots \\ a_n \otimes \mathrm{vec}\, B \end{pmatrix} = \begin{pmatrix} (K_{qm} \otimes I_p)(a_1 \otimes \mathrm{vec}\, B) \\ \vdots \\ (K_{qm} \otimes I_p)(a_n \otimes \mathrm{vec}\, B) \end{pmatrix} = \mathrm{vec}(A \otimes B),$$

using Corollary 2.1. As $K_{mq} = K_{qm}^{-1}$, the inverse of $(I_n \otimes K_{qm} \otimes I_p)$ is $(I_n \otimes K_{mq} \otimes I_p)$, which gives the second result. $\square$
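For readers who want a quick numerical confirmation of Theorem 2.8, the following sketch (ours; helper names are not the book's) rebuilds the commutation matrix and checks both identities on random matrices.

    import numpy as np

    def vec(X):
        return X.reshape(-1, 1, order="F")

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    rng = np.random.default_rng(4)
    m, n, p, q = 2, 3, 4, 2
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((p, q))

    T = np.kron(np.eye(n), np.kron(commutation(q, m), np.eye(p)))
    assert np.allclose(vec(np.kron(A, B)), T @ np.kron(vec(A), vec(B)))
    T_inv = np.kron(np.eye(n), np.kron(commutation(m, q), np.eye(p)))
    assert np.allclose(T_inv @ vec(np.kron(A, B)), np.kron(vec(A), vec(B)))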

Theorem 2.8 and Equation 2.23 can also be used to show that vec(A ⊗ B) can be written in terms of either vec A or vec B:

Theorem 2.9

$$\mathrm{vec}(A \otimes B) = \Bigl[I_n \otimes \begin{pmatrix} I_m \otimes b_1 \\ \vdots \\ I_m \otimes b_q \end{pmatrix}\Bigr]\mathrm{vec}\, A,$$

and

$$\mathrm{vec}(A \otimes B) = \Bigl[\begin{pmatrix} I_q \otimes a_1 \\ \vdots \\ I_q \otimes a_n \end{pmatrix} \otimes I_p\Bigr]\mathrm{vec}\, B.$$

Proof: By Theorem 2.8


vec(A ⊗ B) = (In ⊗ Kqm ⊗ Ip )(vec A ⊗ vec B)
= (In ⊗ Kqm ⊗ Ip )vec[vec B(vec A)′ ],
by Equation 1.11 from Chapter 1.
But,
vec[vec B(vec A)′ ] = [In ⊗ (Im ⊗ vec B)]vec A = [vec A ⊗ Iq ⊗ Ip ]vec B,
so
vec(A ⊗ B) = {In ⊗ [(Kqm ⊗ Ip )(Im ⊗ vec B)]}vec A
= {[(In ⊗ Kqm )(vec A ⊗ Iq )] ⊗ Ip }vec B
Applying Equation 2.23 gives the result. 

Notice from Theorem 2.4 that


(Im ⊗ B)Kmq = (Im ⊗ b1 . . . Im ⊗ bq )
and each of the submatrices Im ⊗ b j are mp × m.
It follows that
$$\mathrm{vec}_m\bigl[(I_m \otimes B)K_{mq}\bigr] = \begin{pmatrix} I_m \otimes b_1 \\ \vdots \\ I_m \otimes b_q \end{pmatrix}$$
so from Theorem 2.9, we can write
vec(A ⊗ B) = {In ⊗ vec m [(Im ⊗ B)Kmq ]}vec A.
But all vecs can be undone by applying a suitable rvec, in this case, rvecmp ,
so we have
A ⊗ B = rvecmp {In ⊗ vec m [(Im ⊗ B)Kmq ]}vec A.
In like manner,
vec(A ⊗ B) = [vecq [(Iq ⊗ A)Kqn ] ⊗ Ip ]vec B (2.26)
and
A ⊗ B = rvecmp {vecq [(Iq ⊗ A)Kqn ] ⊗ Ip }vec B. (2.27)
By taking transposes we can get equivalent results in terms of vecmp , rvecq ,
and rvec p , but the details are left to the reader.

A final property for the commutation matrix that is important for us


concentrates on the fact that the commutation matrix is a permutation
matrix. In Section 2.2, we noted that when a matrix A is premultiplied
by a permutation matrix, the result is a matrix whose rows are obtained
from a permutation of the rows of A. It is of interest to us then to see
what permutation of the rows of A that results from A being premultiplied
by the commutation matrix Kmn . The answer is provided by the following
theorem, which calls on the notation introduced in Equation 1.8 of Sec-
tion 1.3.

Theorem 2.10 Let A be an mn × p matrix and partition A as

$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix} \tag{2.28}$$

where each submatrix is m × p. Then,

$$K_{mn}A = \begin{pmatrix} (A_1)_{1\cdot} \\ \vdots \\ (A_n)_{1\cdot} \\ \vdots \\ (A_1)_{m\cdot} \\ \vdots \\ (A_n)_{m\cdot} \end{pmatrix} = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(m)} \end{pmatrix}.$$

Proof: Using Equation 2.8, we have

$$K_{mn}A = \begin{pmatrix} I_n \otimes e_1^{m\prime} \\ \vdots \\ I_n \otimes e_m^{m\prime} \end{pmatrix}A = \begin{pmatrix} \bigl(I_n \otimes e_1^{m\prime}\bigr)A \\ \vdots \\ \bigl(I_n \otimes e_m^{m\prime}\bigr)A \end{pmatrix}.$$

But,

$$\bigl(I_n \otimes e_j^{m\prime}\bigr)A = \begin{pmatrix} e_j^{m\prime} & & O \\ & \ddots & \\ O & & e_j^{m\prime} \end{pmatrix}\begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix} = \begin{pmatrix} e_j^{m\prime}A_1 \\ \vdots \\ e_j^{m\prime}A_n \end{pmatrix} = \begin{pmatrix} (A_1)_{j\cdot} \\ \vdots \\ (A_n)_{j\cdot} \end{pmatrix}. \qquad\square$$


Notice that when we use this property of $K_{mn}$, the second subscript of the commutation matrix refers to the number of submatrices in the partition of A, whereas the first subscript refers to the number of rows in each of the submatrices of the partition of A. Thus,

$$K_{nm}A = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(n)} \end{pmatrix}$$

where the stacking of the rows refers to a different partitioning of A, namely

$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_m \end{pmatrix}$$

and now each submatrix in this partitioning is n × p.
A similar discussion can be made for the case where we postmultiply a p × mn matrix B by $K_{mn}$.

Consider the case where A is a Kronecker product, say A = B ⊗ C, where B is an n × r matrix and C is an m × s matrix. Then,

$$B \otimes C = \begin{pmatrix} b^{1\prime} \otimes C \\ \vdots \\ b^{n\prime} \otimes C \end{pmatrix}$$

where each submatrix $b^{i\prime} \otimes C$ is m × rs, for i = 1, ..., n. In Section 1.2 of Chapter 1, we saw that the jth row of $b^{i\prime} \otimes C$ is $b^{i\prime} \otimes c^{j\prime}$, so

$$K_{mn}(B \otimes C) = \begin{pmatrix} B \otimes c^{1\prime} \\ \vdots \\ B \otimes c^{m\prime} \end{pmatrix}$$

which we already knew from Theorem 2.3.
Notice also that as $K_{nm}K_{mn} = I_{mn}$, we have

$$K_{nm}K_{mn}A = K_{nm}\begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(m)} \end{pmatrix} = \begin{pmatrix} A_1 \\ \vdots \\ A_n \end{pmatrix}.$$
That is, premultiplying Kmn A as given in Theorem 2.10 by Knm takes us back
to the original partitioning.
More will be made of this property of the commutation matrix in Section
2.7 of this chapter where we discuss a new zero-one matrix called a twining
matrix.
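A short numerical check of Theorem 2.10 (our sketch; 0-based indices): premultiplying a stacked matrix by $K_{mn}$ interleaves the rows of its blocks, and premultiplying by $K_{nm}$ undoes this.

    import numpy as np

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    rng = np.random.default_rng(5)
    m, n, p = 2, 3, 4
    blocks = [rng.standard_normal((m, p)) for _ in range(n)]    # A_1, ..., A_n, each m x p
    A = np.vstack(blocks)                                       # A is mn x p

    # K_{mn} A stacks the first rows of A_1, ..., A_n, then the second rows, and so on.
    expected = np.vstack([blocks[k][[j], :] for j in range(m) for k in range(n)])
    assert np.allclose(commutation(m, n) @ A, expected)
    # Premultiplying by K_{nm} returns the original partitioning.
    assert np.allclose(commutation(n, m) @ commutation(m, n) @ A, A)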

2.4.2 Commutation Matrices and Cross-Products


The basic properties of commutation matrices with respect to Kronecker
products, as presented in Equations 2.11 to 2.14, can be used to illustrate
how commutation matrices interact with cross-products:

Theorem 2.11 Let A be an mG × p matrix and B be an nG × q matrix. Let C be an r × G matrix. Then,

Knm (A τGmn B) = (B τGnm A)Kqp


Knm (A τGmn B)Kpq = B τGnm A

Krm (A τGmr vec C ) = vec C τGrm A

Kmr (vec C τGrm A) = A τGmr vec C

Proof: Partition A and B as in Equation 1.7 of Section 1.3. Then,

Knm (A τGmn B) = Knm (A1 ⊗ B1 + · · · + AG ⊗ BG )


= Knm (A1 ⊗ B1 ) + · · · + Knm (AG ⊗ BG )

= (B1 ⊗ A1 )Kqp + · · · + (BG ⊗ AG )Kqp = (B τGnm A)Kqp .

The second result is proved in a similar manner.


Now, partition C into its columns:

C = (c1 . . . cG ).

Then,

Krm (A τGmr vec C ) = Krm (A1 ⊗ c1 + · · · + AG ⊗ cG )


= c1 ⊗ A1 + · · · + cG ⊗ AG = vec CτGrm A.

The final result is proved similarly. 
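The cross-product τ is not a library primitive; the helper below (ours) simply implements the defining sum $A_1 \otimes B_1 + \cdots + A_G \otimes B_G$ used in the proof, and then checks the first two relations of Theorem 2.11 numerically.

    import numpy as np

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    def cross(A, B, G, m, n):
        # A tau_{Gmn} B, with A (mG x p) and B (nG x q) each split into G row blocks
        return sum(np.kron(A[i*m:(i+1)*m, :], B[i*n:(i+1)*n, :]) for i in range(G))

    rng = np.random.default_rng(6)
    G, m, n, p, q = 3, 2, 2, 3, 4
    A = rng.standard_normal((m * G, p))
    B = rng.standard_normal((n * G, q))

    lhs = commutation(n, m) @ cross(A, B, G, m, n)
    assert np.allclose(lhs, cross(B, A, G, n, m) @ commutation(q, p))
    assert np.allclose(lhs @ commutation(p, q), cross(B, A, G, n, m))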

Recall that Theorem 1.6 of Section 1.3 in Chapter 1 demonstrated that for
A a mG × p matrix and B a nG × q matrix, we can write A τGmn B in terms
of a vector of τG1n cross-products, namely
$$A\,\tau_{Gmn}\,B = \begin{pmatrix} A^{(1)}\,\tau_{G1n}\,B \\ \vdots \\ A^{(m)}\,\tau_{G1n}\,B \end{pmatrix}.$$

Contrast this with the following theorem:

Theorem 2.12 Let A be an mG × p matrix and B be an nG × q matrix partitioned as in Equation 1.7 of Section 1.3. Then,

$$K_{nm}(A\,\tau_{Gmn}\,B) = \begin{pmatrix} A\,\tau_{Gm1}\,B^{(1)} \\ \vdots \\ A\,\tau_{Gm1}\,B^{(n)} \end{pmatrix}.$$

Proof: We have

$$A\,\tau_{Gmn}\,B = A_1 \otimes B_1 + \cdots + A_G \otimes B_G$$

and from Theorem 2.3

$$K_{nm}(A_i \otimes B_i) = \begin{pmatrix} A_i \otimes (B_i)_{1\cdot} \\ \vdots \\ A_i \otimes (B_i)_{n\cdot} \end{pmatrix}$$

for i = 1, ..., G. It follows that

$$K_{nm}(A\,\tau_{Gmn}\,B) = \begin{pmatrix} A_1 \otimes (B_1)_{1\cdot} + \cdots + A_G \otimes (B_G)_{1\cdot} \\ \vdots \\ A_1 \otimes (B_1)_{n\cdot} + \cdots + A_G \otimes (B_G)_{n\cdot} \end{pmatrix} = \begin{pmatrix} A\,\tau_{Gm1}\,B^{(1)} \\ \vdots \\ A\,\tau_{Gm1}\,B^{(n)} \end{pmatrix}. \qquad\square$$


The following theorems tell us what happens when the commutation matrix
appears in the cross-product.

Theorem 2.13 Let A and B be mG × p and nm × q matrices, respectively. Partition A as

$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}$$

where each submatrix in this partitioning is m × p. Then,

$$K_{mG}A\,\tau_{mGn}\,B = \begin{pmatrix} A_1\,\tau_{m1n}\,B \\ \vdots \\ A_G\,\tau_{m1n}\,B \end{pmatrix}.$$

Proof: Given our partitioning of A, we have from Theorem 2.10 that

$$K_{mG}A = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(m)} \end{pmatrix}$$

where each submatrix $A^{(j)}$ is G × p, so if we partition B as

$$B = \begin{pmatrix} B_1 \\ \vdots \\ B_m \end{pmatrix}$$

and each submatrix in this partitioning is n × q, then

$$K_{mG}A\,\tau_{mGn}\,B = A^{(1)} \otimes B_1 + \cdots + A^{(m)} \otimes B_m = \begin{pmatrix} (A_1)_{1\cdot} \otimes B_1 + \cdots + (A_1)_{m\cdot} \otimes B_m \\ \vdots \\ (A_G)_{1\cdot} \otimes B_1 + \cdots + (A_G)_{m\cdot} \otimes B_m \end{pmatrix} = \begin{pmatrix} A_1\,\tau_{m1n}\,B \\ \vdots \\ A_G\,\tau_{m1n}\,B \end{pmatrix}. \qquad\square$$


Theorem 2.14 Let A and B be nm × p and nG × q matrices, respectively,


and partition B as
⎛ ⎞
B1
⎜ .. ⎟
B=⎝ . ⎠
BG
where each submatrix is n × q. Then,
⎛ ⎞
A τnm1 B1
KGm (A τnmG KnG B) = ⎝ ..
⎠.
⎜ ⎟
.
A τnm1 BG

Proof: From Theorem 2.10 given our partitioning of B


⎛ (1) ⎞
B
⎜ .. ⎟
KnG B = ⎝ . ⎠
B (n)

where each submatrix B ( j ) is G × q. Applying Theorem 2.12 gives the


result. 

Notice that if A is n × p in Theorem 2.14, so m = 1 and $K_{G1} = I_G$, we have

$$A\,\tau_{n1G}\,K_{nG}B = \begin{pmatrix} A\,\tau_{n11}\,B_1 \\ \vdots \\ A\,\tau_{n11}\,B_G \end{pmatrix},$$

a result that will be useful to us in our future work.
The following theorem demonstrates that a vec of a product matrix can
be written as a cross-product of vectors involving the commutation matrix:

Theorem 2.15 Let A be an G × m matrix and B be an G × n matrix. Then,


KGn vec B τGnm KGm vec A = vec A′ B.

Proof: Partitioning A and B into their columns, we have A = (a1 . . . am )


and B = (b1 . . . bn ), so
⎛ (1) ⎞ ⎛ (1) ⎞
b a
⎜ .. ⎟ ⎜ .. ⎟
KGn vec B = ⎝ . ⎠ and KGm vec A = ⎝ . ⎠
b(G) a (G)
and
KGn vec B τGnm KGm vec A = b(1) ⊗ a (1) + · · · + b(G) ⊗ a (G) . (2.29)
Consider the first block of the right-hand side of Equation 2.29:
(b1 )1 ⊗ a (1) + · · · + (b1 )G ⊗ a (G) = (b1 )1 a (1) + · · · + (b1 )G a (G)
⎛ (1) ⎞
a
 ′ ⎜ . ⎟
= b1 ⊗ Im ⎝ .. ⎠
a (G)
= b1′ ⊗ Im KGm vec A
 

= Im ⊗ b1′ vec A = vec b1′ A.


 

It follows that the left-hand side of Equation 2.29 can be written as:
vecb1′ A
⎛ ⎞
⎜ .. ⎟ ′ ′
 ′ ′
 
⎝ . ⎠ = vec (b1 A . . . bn A) = vec b1 . . . bn (In ⊗ A)
vecb′n A
= vec (vec B)′ (In ⊗ A) = (In ⊗ A′ )vec B = vec A′ B.
 


The following theorem demonstrates what happens if the commutation


matrix itself is part of a cross-product.

Theorem 2.16 Let A be an mn × p matrix. Then,


KmG τmGn A = IG ⊗ rvecn A
AτmnG KmG = (rvecn A ⊗ IG )Km,G p .

Proof: Recall that


′ ⎞
IG ⊗ e1m

KmG =⎝
⎜ .. ⎟
. ⎠
m′
IG ⊗ em
so if we partition A as
⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟

Am
where each submatrix is n × p, we have

m′
KmG τmGn A = IG ⊗ e1m ⊗ A1 + · · · + IG ⊗ em
   
⊗ Am
⎛ m′ ⎞ ⎛ m′ ⎞
e1 ⊗ A1 O em ⊗ Am O
=⎝
⎜ . .. ⎠ + ··· + ⎝
⎟ ⎜ .. ⎟
. ⎠
′ ′
O e1m ⊗ A1 O m
em ⊗ Am
⎛ ⎞
(A1 O . . . O) O
=⎝
⎜ .. ⎠ + ...

.
O (A1 O . . . O)
⎛ ⎞
(O . . . OAm ) O
+⎝
⎜ .. ⎟
. ⎠
O (O . . . OAm )
⎛ ⎞
(A1 . . . Am ) O
=⎝
⎜ .. ⎠ = IG ⊗ rvecn A.

.
O (A1 . . . Am )
Now, by Theorems 2.11 and 2.16
AτmnG KmG = KnG (KmG τmGn A)KmG,p = KnG (IG ⊗ rvecn A)KmG,p .

But KmG,p = KG,mp Km,G p by Equation 2.10, so we can write

AτmnG KmG = KnG (IG ⊗ rvecn A)KG,mp Km,G p = (rvecn A ⊗ IG )Km,G p . 

Notice in Theorem 2.16, if we let A = Kmn , then we have

KmG τmGn Kmn = IG ⊗ rvecn Kmn

and

Kmn τmnG KmG = (rvecn Kmn ⊗ IG )Km,Gmn .

Interchanging the n and G in the second of these equations gives the result
that

KmG τmGn Kmn = IG ⊗ rvecn Kmn = (rvecG KmG ⊗ In )Km,nmG .

Cross-products can be written as an expression involving Kronecker prod-


ucts, one of which involves the commutation matrix, as the following two
theorems show.

Theorem 2.17 Let A be an mG × p matrix and B be an nG × q matrix


partitioned as in Equation 1.7 of Section 1.3.
Then,

AτGmn B = [(rvecm A)KG p ⊗ In ](Ip ⊗ B).

Proof: Write

AτGmn B = A1 ⊗ B1 + · · · + AG ⊗ BG
= (A1 ⊗ In )(Ip ⊗ B1 ) + · · · + (AG ⊗ In )(Ip ⊗ BG )
⎛ ⎞
I p ⊗ B1
= (A1 ⊗ In . . . AG ⊗ In ) ⎝
⎜ .. ⎟
. ⎠
I p ⊗ BG
= ((A1 . . . AG ) ⊗ In )(KG p ⊗ In )(Ip ⊗ B),

by Theorem 2.5. It follows that AτGmn B = ((rvecm A)(KG p ⊗ In )(Ip ⊗ B).




Theorem 2.18 If A is an p × mG matrix and B is an n × q matrix, then

vecm AτG pn (IG ⊗ B) = (A ⊗ B)(KGm ⊗ Iq ).



Proof: Partition A as A = (A1 . . . AG ) where each submatrix is p × m, so


⎛ ⎞
A1
⎜ .. ⎟
vecm A = ⎝ . ⎠ .
AG

It follows that
′ ′
vecm AτG pn (IG ⊗ B) = A1 ⊗ e1G ⊗ B + · · · + AG ⊗ eGG ⊗ B.

But from the definition of the commutation matrix given in Equation 2.8,

Im ⊗ e1G ⊗ Iq
⎛ ⎞

(A ⊗ B)(KGm ⊗ Iq ) = (A1 ⊗ B . . . AG ⊗ B) ⎝
⎜ .. ⎟
. ⎠

Im ⊗ eGG ⊗ Iq
′ ′
= A1 Im ⊗ e1G ⊗ B + · · · + AG Im ⊗ eGG ⊗ B
 
′ ′
= A1 ⊗ e1G ⊗ B + · · · + AG ⊗ eGG ⊗ B.

One final theorem involving cross-products and commutation matrices:

Theorem 2.19 Let A, B, C and D be m × n, p × q, mr × s, and pr × t


matrices respectively. Then,

Cτmr p (A ⊗ B)Knq = (C ⊗ B)τm,pr,1 A

and

Kpm (A ⊗ B)τ pmr D = A ⊗ (Bτ p1r D).

Proof: From the property of the commutation matrices given by Equation


2.11 and Theorem 2.3, we have
′ ⎞
B ⊗ a1

(A ⊗ B)Knq = Kmp (B ⊗ A) = ⎝ ..
⎠.
⎜ ⎟
.

B ⊗ am
Partitioning C as
⎛ ⎞
C1
C = ⎝ ... ⎠
⎜ ⎟

Cm

where each submatrix in this partitioning is r × s, enables us to write


′ ′
Cτmr p (A ⊗ B)Knq = C1 ⊗ B ⊗ a1 + · · · + Cm ⊗ B ⊗ am
= (C ⊗ B)τm,pr,1 A

as C j ⊗ B is pr × sq for j = 1, . . . , m. Using Theorem 2.3 again, we write


′ ⎞
A ⊗ b1

Kpm (A ⊗ B) = ⎝
⎜ .. ⎟
. ⎠
p′
A⊗b

whereas we partition D as follows:


⎛ ⎞
D1
⎜ . ⎟
D = ⎝ .. ⎠
Dp

where each submatrix in this partitioning is r × t . It follows that


′ ′
Kpm (A ⊗ B)τ pmr D = (A ⊗ b1 ) ⊗ D1 + · · · + (A ⊗ b p ) ⊗ D p
′ ′
= A ⊗ (b1 ⊗ D1 ) + · · · + A ⊗ (b p ⊗ D p )
′ ′
= A ⊗ (b1 ⊗ D1 + · · · + b p ⊗ D p ) = A ⊗ (Bτ p1r D).


2.5 Generalized Vecs and Rvecs of the Commutation Matrix


At the start of Section 2.3, we saw that we can write the commutation matrix $K_{mn}$ as

$$K_{mn} = \begin{bmatrix} I_n \otimes e_1^{m\prime} \\ \vdots \\ I_n \otimes e_m^{m\prime} \end{bmatrix} = \bigl[\, I_m \otimes e_1^n \;\;\ldots\;\; I_m \otimes e_n^n \,\bigr],$$

where $e_j^n$ is the jth column of the n × n identity matrix $I_n$. It follows that $\mathrm{rvec}_n K_{mn}$ is the $n \times nm^2$ matrix given by

$$\mathrm{rvec}_n K_{mn} = \bigl[\, I_n \otimes e_1^{m\prime} \;\;\ldots\;\; I_n \otimes e_m^{m\prime} \,\bigr] \tag{2.30}$$

and $\mathrm{vec}_m K_{mn}$ is the $mn^2 \times m$ matrix given by

$$\mathrm{vec}_m K_{mn} = \begin{bmatrix} I_m \otimes e_1^n \\ \vdots \\ I_m \otimes e_n^n \end{bmatrix}. \tag{2.31}$$

For example,

$$\mathrm{rvec}_2 K_{32} = \begin{bmatrix} 1&0&0&0&0&0&0&1&0&0&0&0&0&0&1&0&0&0 \\ 0&0&0&1&0&0&0&0&0&0&1&0&0&0&0&0&0&1 \end{bmatrix},$$

and

$$\mathrm{vec}_3 K_{32} = \begin{bmatrix} 1&0&0 \\ 0&0&0 \\ 0&1&0 \\ 0&0&0 \\ 0&0&1 \\ 0&0&0 \\ 0&0&0 \\ 1&0&0 \\ 0&0&0 \\ 0&1&0 \\ 0&0&0 \\ 0&0&1 \end{bmatrix}.$$

These matrices will be important for us in matrix calculus in deriving


derivatives of expressions involving vec(A ⊗ IG ) or vec(IG ⊗ A).
From Theorem 2.9, we see that for an m × n matrix A,

$$\mathrm{vec}(A \otimes I_G) = \Bigl[I_n \otimes \begin{pmatrix} I_m \otimes e_1^G \\ \vdots \\ I_m \otimes e_G^G \end{pmatrix}\Bigr]\mathrm{vec}\, A.$$

Using Equation 2.31, we can now write

vec(A ⊗ IG ) = (In ⊗ vec m KmG )vec A. (2.32)

Note for the special case in which A is an m × 1 vector a, we have

vec(a ⊗ IG ) = (vecm KmG )a. (2.33)



In a similar fashion, we can write

$$\mathrm{vec}(I_G \otimes A) = \Bigl[\begin{pmatrix} I_n \otimes e_1^G \\ \vdots \\ I_n \otimes e_G^G \end{pmatrix} \otimes I_m\Bigr]\mathrm{vec}\, A = (\mathrm{vec}_n K_{nG} \otimes I_m)\,\mathrm{vec}\, A, \tag{2.34}$$

and for the special case where a is m × 1, we have

$$\mathrm{vec}(I_G \otimes a) = (\mathrm{vec}\, I_G \otimes I_m)\,a. \tag{2.35}$$
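A numerical sketch of Equations 2.32 and 2.34 (ours; the generalized vec helper gvec simply stacks blocks of columns, and the commutation matrix is built directly):

    import numpy as np

    def vec(X):
        return X.reshape(-1, 1, order="F")

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    def gvec(X, m):
        # generalized vec_m: stack the m-column blocks of X underneath each other
        return np.vstack([X[:, k*m:(k+1)*m] for k in range(X.shape[1] // m)])

    rng = np.random.default_rng(7)
    m, n, G = 2, 3, 2
    A = rng.standard_normal((m, n))

    # Equation 2.32: vec(A ⊗ I_G) = (I_n ⊗ vec_m K_{mG}) vec A
    assert np.allclose(vec(np.kron(A, np.eye(G))),
                       np.kron(np.eye(n), gvec(commutation(m, G), m)) @ vec(A))
    # Equation 2.34: vec(I_G ⊗ A) = (vec_n K_{nG} ⊗ I_m) vec A
    assert np.allclose(vec(np.kron(np.eye(G), A)),
                       np.kron(gvec(commutation(n, G), n), np.eye(m)) @ vec(A))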

In Chapter 4, we have occasion to take the vec of a generalized vec of a


commutation matrix. The following theorem tells us what happens when
we do this.

Theorem 2.20

vec(vecm Kmn ) = vec Imn = vec (vecm Kmn )′ = vec Inm .

Proof: Write

Im ⊗ e1n
⎞ ⎛ m
e1 ⊗ e1n m
⊗ e1n
⎛ ⎞
··· em
vecm Kmn .
.. .. ..
=⎝ ⎠=⎝ ⎠,
⎜ ⎟ ⎜ ⎟
. .
Im ⊗ enn e1m ⊗ enn ··· m
em ⊗ enn
so
e1m ⊗ e1n
⎛ ⎞
⎜ .. ⎟
⎜ m . n⎟
⎜ ⎟
⎜ e1 ⊗ en ⎟
⎜ ⎟
vec(vecm Kmn ) = ⎜
⎜ .. ⎟
⎜ . ⎟

⎜ em ⊗ en ⎟
⎜ m 1⎟
⎜ .. ⎟
⎝ . ⎠
m n
em ⊗ en
m
= vec e1m ⊗ e1n . . . e1m ⊗ enn . . . em m
⊗ e1n . . . em ⊗ enn
 

= vec e1m ⊗ e1n . . . enn . . . em


m
⊗ e1n . . . enn
    

= vec e1m ⊗ In . . . emm


 
⊗ In = vec (Im ⊗ In ) = vec Imn ,

where in our working we have used Equation 1.4 of Chapter 1.



Now,
 ′ ′

vec(vecm Kmn )′ = vec Im ⊗ e1n . . . Im ⊗ enn
′ ⎞
vec Im ⊗ e1n vec e1n ⊗ Im
⎛  ⎛  ⎞

=⎝ .. ..
⎠=⎝
⎜ ⎟ ⎜ ⎟
. . ⎠
n′
   n 
vec Im ⊗ en vec en ⊗ Im
 n n

= vec e1 ⊗ Im . . . en ⊗ Im = vec Inm ,
where in our working we have used Equation 1.13 of Chapter 1. 

2.5.1 Deriving Results for Generalized Vecs and Rvecs


of the Commutation Matrix
Recall that for A a m × n matrix and B an n × G p matrix
vecG AB = (Ip ⊗ A)vecG B.
We can use this property to derive results for generalized vecs of com-
mutation matrices from known results about commutation matrices. For
example, as
KGn KnG = InG
we have taking the vecn of both sides that
(IG ⊗ KGn )vecn KnG = vecn InG = (vec IG ⊗ In ),
using Equation 1.16 of Section 1.4.1 in Chapter 1, so we can now write
vecn KnG in terms of the commutation matrix KnG as follows
vecn KnG = (IG ⊗ KnG )(vec IG ⊗ In ). (2.36)
An alternative expression in terms of the commutation matrix KGn can be
obtained by noting that
vec IG ⊗ In = KG2 n (In ⊗ vec IG )
so using Equation 2.9 of Section 2.3, we have
vecn KnG = (IG ⊗ KnG )(IG ⊗ KGn )(KGn ⊗ IG )(In ⊗ vec IG )
= (KGn ⊗ IG )(In ⊗ vec IG ). (2.37)
Recall also that for A an m × np matrix
rvecn A′ = (vecn A)′ (2.38)

so the equivalent results for rvecn KGn are found by taking the transposes of
Equations 2.37 and 2.36. They are
rvecn KGn = [In ⊗ rvec IG ](KnG ⊗ IG ) = [rvec IG ⊗ In ](IG ⊗ KGn ).
(2.39)
Other results for vecn KnG and rvecG KnG can be obtained in a similar manner.
For example, if A is an m × n matrix and B is an p × q matrix, we know
that
Kpm (A ⊗ B) = (B ⊗ A)Kqn .
Then, taking the vecq of both sides, using Theorem 1.12 of Section 1.4.3 in
Chapter 1, we have
(In ⊗ Kpm )vecq (A ⊗ B) = (In ⊗ B ⊗ A)vecq Kqn .
That is,
(In ⊗ B ⊗ A)vecq Kqn = (In ⊗ Kpm )(vec A ⊗ B)
⎛ ⎞
B ⊗ a1
= ⎝ ... ⎠ ,
⎜ ⎟

B ⊗ an
by Equation 2.23.
If b is an p × 1 vector, we know that
Kpm (A ⊗ b) = b ⊗ A
Kmp (b ⊗ A) = A ⊗ b.
Taking the generalized rvecs of both sides of these equations, we have using
Equations 1.18 and 1.19 of Section 1.4.4 in Chapter 1, that
(rvecm Kpm )(Ip ⊗ A ⊗ b) = b′ ⊗ A (2.40)
(rvec p Kmp )(Im ⊗ b ⊗ A) = rvec A ⊗ b. (2.41)
Further results about generalized vecs and rvecs can be obtained by applying
the following theorem:

Theorem 2.21
(rvecG KnG )(KnG ⊗ In )KnG,n = (rvecG KnG )(In ⊗ KGn )Kn,nG = rvecG KnG
Kn,nG (KGn ⊗ In )vecG KGn = KnG,n (In ⊗ KnG )vecG KGn = vec G KGn .

Proof: Using Equation 2.39 and Equation 2.9


(rvecG KnG )(KnG ⊗ In )KnG,n = [IG ⊗ (vec In )′ ](KGn ⊗ In )(KnG ⊗ In )KnG,n
= [IG ⊗ (vec In′ )](IG ⊗ Knn )(KGn ⊗ In )
= [IG ⊗ (vec In )′ Knn ](KGn ⊗ In )
= [IG ⊗ (vec In )′ ](KGn ⊗ In ) = rvecG KnG .
The second result for rvecG KnG can be achieved in a similar manner provided
we note that Kn,nG = (KnG,n )′ .
The equivalent results for vecG KGn can then be obtained by taking trans-
poses. 

To illustrate the use of Theorem 2.21, write the left-hand side of Equation
2.40 as:
rvecm Kpm [Ip ⊗ Kmp (b ⊗ A)]
= rvecm Kpm (Ip ⊗ Kmp )[Ip ⊗ (b ⊗ A)]
= rvecm Kpm (Ip ⊗ Kmp )Kp,mp ((b ⊗ A) ⊗ Ip )Knp
= rvecm Kpm (b ⊗ A ⊗ Ip )Knp ,
so from Equation 2.40, we have that
(rvecm Kpm )(b ⊗ A ⊗ Ip ) = (b′ ⊗ A)Kpn = A ⊗ b′ .
In a similar fashion, using Equation 2.41, we get
(rvec p Kmp )(A ⊗ b ⊗ Im ) = b ⊗ (vec A)′ .
Similar results can be achieved by taking the appropriate generalized vec of
both sides of Equation 2.40 and 2.41, but the details are left to the reader.
For a final example of the use of this technique, consider A an m × n
matrix and consider the basic definition of the commutation matrix Kmn ,
namely
Kmn vec A = vec A′ .
Taking the rvecn of both sides of this equation, we have
(rvecn Kmn )(Im ⊗ vec A) = rvecn vec A′ = A′
but
(rvecn Kmn )(Im ⊗ vec A) = (rvecn Kmn )(Im ⊗ Knm vec A′ )
= (rvecn Kmn )(Im ⊗ Knm )Km,mn (vec A′ ⊗ Im )
= rvecn Kmn (vec A′ ⊗ Im )

so

rvecn Kmn (vec A′ ⊗ Im ) = A′

as well.
Another theorem linking the generalized rvec of a commutation matrix
with other commutation matrices is as follows:

Theorem 2.22

(IG ⊗ rvecm Kqm )(KG,qm ⊗ Iq ) = KGm rvecmG Kq,mG (2.42)

Proof: Using the definition of the commutation matrix given by Equation


2.8, we can write the left-hand side of Equation 2.42 as

Iqm ⊗ e1G ⊗ Iq
⎛ ⎞⎛ ⎞
rvecm Kqm O
⎜ .. ⎟⎜ .. ⎟
⎝ . ⎠⎝ . ⎠

O rvecm Kqm Iqm ⊗ eGG ⊗ Iq
⎛  ′
⎞
rvecm Kqm Iqm ⊗ e1G ⊗ Iq
⎜ ⎟
=⎜ ..
⎟.
⎜ ⎟
⎝  . ′
 ⎠
rvecm Kqm Iqm ⊗ eGG ⊗ Iq

Consider the first block in this matrix, which using the definition of the
generalized rvec of the commutation matrix given by Equation 2.30 can be
written as

Im ⊗ e1G ⊗ Iq
⎛ ⎞
  O
q′ ′
..
Im ⊗ e1 . . . Im ⊗ eqq ⎝
⎜ ⎟
. ⎠

O Im ⊗ e1G ⊗ Iq
 ′   ′ 
q′ ′
= Im ⊗ e1 e1G ⊗ Iq . . . Im ⊗ eqq e1G ⊗ Iq
′ q′ ′ ′
= Im ⊗ e1G ⊗ e1 . . . Im ⊗ e1G ⊗ eqq
 ′   ′ 
q ′ ′
= e1 ⊗ Im ⊗ e1G Kq,mG . . . eqq ⊗ Im ⊗ e1G Kq,mG
 ′ 
q ′ ′ ′
= e1 ⊗ Im ⊗ e1G . . . eqq ⊗ Im ⊗ e1G (Iq ⊗ Kq,mG )
 ′

= rvec Iq ⊗ Im ⊗ e1G (Iq ⊗ Kq,mG ).


It follows this that the left-hand side of Equation 2.42 can be written as
′ ⎞
rvec Iq ⊗ Im ⊗ e1G

..
⎠ (Iq ⊗ Kq,mG )
⎜ ⎟
⎝ .

rvec Iq ⊗ Im ⊗ eGG

Im ⊗ e1G ⊗ rvec Iq
⎛ ⎞
..
=⎝ ⎠ KmG,q2 (Iq ⊗ Iq,mG )
⎜ ⎟
.

Im ⊗ eGG ⊗ rvec Iq
= (KGm ⊗ rvec Iq )(KmG,q ⊗ Iq )
= KmG (ImG ⊗ rvec Iq )(KmG,q ⊗ Iq )
= KmG rvecmG KmG,q

where in the working we have used Equations 2.9 and 2.39. 

Other theorems stand on their own:

Theorem 2.23 Let b be an n × 1 vector and let A be an Gn × p matrix. Then,

(rvecG KnG )(b ⊗ A) = (IG ⊗ b′ )A.

Proof: Write (rvecG KnG )(b ⊗ A) = (rvecG KnG )(b ⊗ InG )A.
Now,
⎛ ⎞
b1 InG
rvecG KnG (b ⊗ InG ) = IG ⊗ e1n . . . IG ⊗ enn ⎝ ... ⎠
 ′ ′

⎜ ⎟

bn InG
 ′
  ′

= b1 IG ⊗ e1n + · · · + bn IG ⊗ enn = IG ⊗ b′ . 

Theorem 2.24 Let A and B be mG × p and nG × q matrices, respectively,


and let C and D be G × r and G × s matrices respectively.
Then,

(rvecm KGm )(A ⊗ C ) = (rvecm A)KG p (Ip ⊗ C )

(rvecn KGn )(D ⊗ B) = (rvecn KGn B)(D ⊗ Iq ).



Proof: Partition A as in Equation 1.7 of Section 1.3 in Chapter 1.


Then,
⎛ ⎞
A1 ⊗ IG
..
 ′ ′

(rvecm KGm )(A ⊗ IG ) = Im ⊗ e1G . . . Im ⊗ eGG ⎝
⎜ ⎟
. ⎠
AG ⊗ IG
′ ′
= A1 ⊗ e1G + · · · + AG ⊗ eGG
′ ⎞
Ip ⊗ e1G

= (A1 . . . AG ) ⎝ ..
⎠ = (rvecm A)KG p .
⎜ ⎟
.

Ip ⊗ eGG

The first result follows.


Write

(rvecn KGn )(D ⊗ B) = rvecn KGn (IG ⊗ B)(D ⊗ Iq ) = (rvecn KGn B)(D ⊗ Iq ).


The equivalent results for generalized vecs are

(b′ ⊗ A′ )vecG KGn = A′ (IG ⊗ b).

(A′ ⊗ C ′ )vecm KmG = (Ip ⊗ C ′ )KpG vec m A′

(D′ ⊗ B′ )vecn KnG = (D′ ⊗ Iq )vecn B′ KnG .

Suppose now A is an p × G matrix. Then, by Equation 1.19 of Section 1.3


in Chapter 1,

rvecm KGm (Im ⊗ A′ ) = (rvecm KGm )(IGm ⊗ A′ ).


 

The following theorem shows there are several ways of writing this matrix.

Theorem 2.25 Let A be an p × G matrix. Then,

(rvecm KGm )(IGm ⊗ A′ ) = (rvecm Kpm )(A ⊗ Ipm ) = Im ⊗ a1′ . . . Im ⊗ aG′


= (rvec A′ ⊗ Im )(IG ⊗ Kpm ),

where a j is the jth column of A.



Proof: Using Equation 2.30, we write

(rvecm KGm )(IGm ⊗ A′ )


Im ⊗ A′
⎛ ⎞
  O

= Im ⊗ e1G . . . Im ⊗ eGG
′ ⎜ .. ⎟
⎝ . ⎠
O Im ⊗ A′
′ ′
= Im ⊗ e1G A′ . . . Im ⊗ eGG A′ = Im ⊗ a1′ . . . Im ⊗ aG′

where a j is the jth column of A. Again using the Equation 2.30, we write
⎛ ⎞
a11 Imp ... a1G Imp
⎜ .. .. ⎟ .
 
p′ p ′
(rvecm Kpm )(A ⊗ Ipm ) = Im ⊗ e1 . . . Im ⊗ e p ⎝ . . ⎠
a p1 Imp ··· a pG Imp

Consider the jth block of this expression, which is


   
p′ p′
a1 j Im ⊗ e1 + · · · + a p j Im ⊗ e p
 
p′ p′
= Im ⊗ ai j e1 + · · · + a p j e p = Im ⊗ a j′

so the result follows.


Finally,

Im ⊗ a1′ . . . Im ⊗ aG′ = (a1′ ⊗ Im )Kpm . . . (aG′ ⊗ Im )Kpm


= (a1′ . . . aG′ ) ⊗ Im (IG ⊗ Kpm )
 

= (rvec A′ ⊗ Im )(IG ⊗ Kpm ). 

Theorem 2.26 Let A be an mG × p matrix. Then,

(rvecqG Km,qG )(Iqm ⊗ A) = Iq ⊗ A(1) . . . Iq ⊗ A(m) .

Proof: From the definition of the generalized rvec of the commutation


matrix given by Equation 2.30, we can write

(rvecqG Km,qG )(Iqm ⊗ A)


⎛ ⎞
  Iq ⊗ A O
m′ m′ ⎜
= IqG ⊗ e1 . . . IqG ⊗ em ⎝ .. ⎟
. ⎠
O Iq ⊗ A
 ′
  ′

= Iq ⊗ IG ⊗ e1m A . . . Iq ⊗ IG ⊗ em
m
A = Iq ⊗ A(1) . . . Iq ⊗ A(m) ,

as in Section 2.2, we saw that


 ′

IG ⊗ e mj A = A( j ) . 

From Theorem 2.4, if A is m × n and B is p × q, then

(A ⊗ B)Knq = (A ⊗ b1 . . . A ⊗ bq ),

where each submatrix A ⊗ b j is mp × n.


It follows that
⎛ ⎞
A ⊗ b1
⎜ . ⎟
 
vecn (A ⊗ B)Knq = ⎝ .. ⎠ .
A ⊗ bq

But the following theorem shows there are several ways of writing this
matrix, two involving a generalized vec of the commutation matrix.

Theorem 2.27 Let A and B be m × n and p × q matrices, respectively. Then,

vecn [(A ⊗ B)Knq ] = (Iq ⊗ A ⊗ B)vecn Knq = (Iq ⊗ Kmp )(vec B ⊗ A)


= (Kqm ⊗ Ip )(A ⊗ vec B) = (B′ ⊗ Imp )(vecm Kmp )A.

Proof: From the properties of generalized vec operators, we have

vecn [(A ⊗ B)Knq ] = (Iq ⊗ A ⊗ B)vecn Knq .

Now,

vecn [(A ⊗ B)Knq ] = vec n [Kmp (B ⊗ A)] = (Iq ⊗ Kmp )vecn (B ⊗ A)


= (Iq ⊗ Kmp )(vec B ⊗ A).

But

(Iq ⊗ Kmp )(vec B ⊗ A) = (Iq ⊗ Kmp )Kpq,m (A ⊗ vec B)


= (Kqm ⊗ Ip )(A ⊗ vec B),

where we have used Equation 2.9 of Section 2.3. Finally, using Theorem
1.13 of Section 1.4.4 in Chapter 1, we have

vecn [Kmp (B ⊗ A)] = (B′ ⊗ Imp )(vecm Kmp )A.




The equivalent results for generalized rvec operators are found by taking
transposes. If C is a n × m matrix and D is a q × p matrix, then

rvecn [Kqn (C ⊗ D)] = [(rvecn Kqn )(Iq ⊗ C ⊗ D)]


= (rvec D ⊗ C )(Iq ⊗ Kpm ) = (C ⊗ rvec D)(Kmq ⊗ Ip )
′ ′
= C(rvecm Kpm )(D′ ⊗ Imp ) = (C ⊗ d 1 . . . C ⊗ d q ).

For further such theorems on generalized vecs and rvecs of the commuta-
tion, see Turkington (2005).

2.5.2 Generalized Vecs and Rvecs of the Commutation Matrix


and Cross-Products
In this section, we demonstrate that there are intimate connections between
the cross-product AτGmn B, the Kronecker product A ⊗ B, and the general-
ized rvec of the commutation matrix.
The next two theorems clearly bring out the relationships that exist
between these concepts.

Theorem 2.28 Let A be an mG × p matrix and B be an nG × q matrix.


Then,

AτGmn B = (rvecm KGm ⊗ In )(A ⊗ B)


KGm AτGmn KGn B = (Im ⊗ rvecn KGn )(A ⊗ B).

Proof: Using Equation 2.30 of Section 2.5 and partitioning A and B as in


Equation 1.7 of Section 1.3 in Chapter 1, we write

(rvecm KGm ⊗ In )(A ⊗ B)


⎛ ⎞
A1 ⊗ B
..
 ′ ′

= Im ⊗ e1G ⊗ In . . . Im ⊗ eGG ⊗ In
⎜ ⎟
⎝ . ⎠
AG ⊗ B
 ′
  ′

= A1 ⊗ e1G ⊗ In B + · · · + AG ⊗ eGG ⊗ In B
= A1 ⊗ B1 + · · · + AG ⊗ BG = AτGmn B.

Now,

KGm AτGmn KGn B = (KGm τGmn KGn )(A ⊗ B) = (Im ⊗ rvecn KGn )(A ⊗ B)

by Theorem 2.16. 

Notice that Theorem 2.28 is easily reconciled with Theorem 2.17 using
Theorem 2.24.

Theorem 2.29 Suppose A be an Gm × p matrix and B be an G × q matrix.


Then,

(rvecm KGm )(A ⊗ B) = AτGm1 B.

Proof: Partition A as follows


⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟

AG

where each submatrix is m × p, so


⎛ ⎞
A1 ⊗ B
A⊗B =⎝ ..
⎠,
⎜ ⎟
.
AG ⊗ B

and
⎛ ⎞
A1 ⊗ B
..
 ′ ′

(rvecm KGm )(A ⊗ B) = Im ⊗ e1G . . . Im ⊗ eGG
⎜ ⎟
⎝ . ⎠
AG ⊗ B
1′ G′
= A1 ⊗ b + · · · + AG ⊗ b = AτGm1 B. 

We finish this section with a theorem that gives yet another way of writing
the cross-product of AτGmn B involving this time rvecG KmG A.

Theorem 2.30 Let A be mG × p and B be nG × q. Then,


 
AτGmn B = vec pq (rvecG KmG A)τG1n B .

Proof: From Theorem 1.6 of Section 1.3 in Chapter 1

A(1) τG1n B
⎛ ⎞

AτGmn B = ⎝ ..
⎠.
⎜ ⎟
.
A(m) τG1n B

A(1)
⎞ ⎛

But, we saw in Theorem 2.10 that KmG A = ⎝ ... ⎠ so rvecG KmG A =


⎜ ⎟

A(m)
(A . . . A ) and rvec KmG AτG1n B = (A . . . A(m) )τG1n B = (A(1) τG1n
(1) (m) (1)

B . . . A(m) τG1n B), by Theorem 1.3 of Section 1.3 in Chapter 1. As each


of the submatrices A(i) τG1n B is 1 × pq for i = 1, . . . , m it follows that
vec pq [(rvecG KmG A)τG1n B] = AτGmn B by Theorem 1.6 of Chapter 1. 

2.5.3 KnG,G versus Rvecn KGn


Both KnG,G and rvecn KGn have nG 2 columns so it is of some interest to
contrast what happens to a Kronecker product with nG 2 rows when it is
premultiplied by these matrices. Let D be G × r and B be nG × q, then
D ⊗ B is such a matrix. From Theorem 2.3 of Section 2.4.1,
′ ⎞
D ⊗ b1

KnG,G (D ⊗ B) = ⎝ ..
⎠.
⎜ ⎟
.

D ⊗ bnG
The result for rvecn KGn is given by the following theorem.

Theorem 2.31 Let D be an G × r matrix and B be an nG × p matrix. Then,


(rvecn KGn )(D ⊗ B) = (In ⊗ d1′ )B . . . (In ⊗ dr′ )B
 

where d j is the jth column of D.

Proof: Write
(rvecn KGn )(D ⊗ B) = (rvecn KnG )(D ⊗ InG )(Ir ⊗ B),
where
(rvecn KnG )(D ⊗ InG ) = (rvecn KnG )(d1 ⊗ InG . . . dr ⊗ InG ).
But,
rvecn KnG (d1 ⊗ InG ) = In ⊗ d1′
by Theorem 2.23. 

From Theorem 2.29, we have that


(rvecn KGn )(B ⊗ D) = BτGn1 D.

The result for KnG,G (B ⊗ D) is given by the following theorem.

Theorem 2.32 Let D and B be G × r and nG × p matrices, respectively.


Then,
′ ⎞
B ⊗ d1
⎛ (1)
⎜ .. ⎟
⎜ . ⎟
⎜ B ⊗ dG′ ⎟
⎜ (1) ⎟
⎜ ⎟
KnG,G (B ⊗ D) = ⎜ ..
⎟.
⎜ ⎟
.
⎜ B (n) ⊗ d 1′ ⎟
⎜ ⎟
⎜ ⎟
⎜ .. ⎟
⎝ . ⎠

B (n) ⊗ d G

Proof: Write
′ ⎞
IG ⊗ e1nG

KnG,G (B ⊗ D) = ⎝ ..
⎠ (B ⊗ D).
⎜ ⎟
.
nG ′
IG ⊗ enG
Consider the first submatrix
 ′
  ′ ′
  ′
 ′
IG ⊗ e1nG (B ⊗ D) = IG ⊗ e1n ⊗ e1G (B ⊗ D) = IG ⊗ e1n B ⊗ e1G D.

But from our work in selection matrices in Section 2.2, (IG ⊗ e1n )B = B (1)
′ ′
and e1G D = d 1 . The other submatrices are analysed in a similar fashion and
the result follows. 

2.5.4 The Matrix Nn


Associated with the commutation matrix $K_{nn}$ is the $n^2 \times n^2$ matrix $N_n$, which is defined by

$$N_n = \tfrac{1}{2}\bigl(I_{n^2} + K_{nn}\bigr).$$

From Equation 2.8, it is clear that we can write

$$N_n = \frac{1}{2}\begin{pmatrix} e_1^{n\prime} \otimes I_n + I_n \otimes e_1^{n\prime} \\ \vdots \\ e_n^{n\prime} \otimes I_n + I_n \otimes e_n^{n\prime} \end{pmatrix} \tag{2.43}$$

$$\phantom{N_n} = \frac{1}{2}\bigl[\, e_1^n \otimes I_n + I_n \otimes e_1^n \;\;\ldots\;\; e_n^n \otimes I_n + I_n \otimes e_n^n \,\bigr]. \tag{2.44}$$

For example,

$$N_3 = \frac{1}{2}\begin{pmatrix} e_1^{3\prime} \otimes I_3 + I_3 \otimes e_1^{3\prime} \\ e_2^{3\prime} \otimes I_3 + I_3 \otimes e_2^{3\prime} \\ e_3^{3\prime} \otimes I_3 + I_3 \otimes e_3^{3\prime} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2&0&0&0&0&0&0&0&0 \\ 0&1&0&1&0&0&0&0&0 \\ 0&0&1&0&0&0&1&0&0 \\ 0&1&0&1&0&0&0&0&0 \\ 0&0&0&0&2&0&0&0&0 \\ 0&0&0&0&0&1&0&1&0 \\ 0&0&1&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0&1&0 \\ 0&0&0&0&0&0&0&0&2 \end{pmatrix}.$$
Clearly, $N_n$ is not a zero-one matrix. It is an important matrix for us as

$$N_n\, \mathrm{vec}\, A = \tfrac{1}{2}\,\mathrm{vec}(A + A'), \tag{2.45}$$

and if A is symmetric,

$$N_n\, \mathrm{vec}\, A = \mathrm{vec}\, A. \tag{2.46}$$
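A minimal NumPy check of Equations 2.45 and 2.46 and of the symmetric-idempotent property noted below (our sketch; helper names are ours):

    import numpy as np

    def vec(X):
        return X.reshape(-1, 1, order="F")

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    n = 3
    N = 0.5 * (np.eye(n * n) + commutation(n, n))
    A = np.random.default_rng(8).standard_normal((n, n))

    assert np.allclose(N @ vec(A), vec(0.5 * (A + A.T)))      # Equation 2.45
    S = A + A.T                                               # a symmetric matrix
    assert np.allclose(N @ vec(S), vec(S))                    # Equation 2.46
    assert np.allclose(N, N.T) and np.allclose(N @ N, N)      # N_n is symmetric idempotent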
Suppose A is an n2 × p matrix and we partition A as
⎛ ⎞
A1
⎜ .. ⎟
A=⎝ . ⎠
An
where each submatrix in this partitioning is n × p. Then, using Theorem
2.10 of the previous section, we have that
⎛ (1) ⎞
A1 + A(1)
⎛ ⎞ ⎛ ⎞
A1 A
1 1⎜ ⎟ 1⎜ ⎟ 1⎜
Nn A = (In2 + Knn )A = ⎝ ... ⎠ + ⎝ ... ⎠ = ⎝ .. ⎟
2 2 2 2 . ⎠
(n) (n)
An A An + A
(2.47)
That is, the jth submatrix of Nn A is formed by adding onto A j the matrix
consisting of the jth rows of the submatrices of A.
Notice also that as Knn is symmetric and its own inverse
Nn Knn = Nn = Knn Nn .

and
1 1
Nn′ = (In2 + Knn )′ = (In2 + Knn ) = Nn
2 2
1 2

Nn Nn = In2 + Knn + Knn + Knn = Nn ,
4
so Nn is symmetric idempotent.
Other properties for Nn can be derived from the corresponding properties
for Knn . If A and B are n × p and n × q matrices, respectively, then
⎛ 1′ ′ ⎞
a ⊗ B + A ⊗ b1
1   1⎜ ..
Nn (A ⊗ B) = (A ⊗ B) + Knn (A ⊗ B) = ⎝ ⎠,

2 2 .
n′ n′
a ⊗B+A⊗b
(2.48)

where we have used Theorem 2.3. Similarly, if C and D are p × n and q × n


matrices, respectively,
then
1
(C ⊗ D)Nn = (c ⊗ D + C ⊗ d1 . . . cn ⊗ D + C ⊗ dn ) (2.49)
2 1
where we have used Theorem 2.4.
If A and B are both n × n matrices, then
⎛ 1′ ′ ⎞
a ⊗ B + A ⊗ b1
1⎜ ..
Nn (A ⊗ B)Nn = ⎝ ⎠ Nn

2 .
n′ n′
a ⊗B+A⊗b
⎛ 1′ ′ ′ ′
a ⊗ B + A ⊗ b1 + B ⊗ a1 + b1 ⊗ A

1⎜ ..
= ⎝ ⎠. (2.50)

4 .
n′ n′ n′ n′
a ⊗B+A⊗b +B⊗a +b ⊗A

From these properties of Nn , it is clear that if A and B are n × n matrices


and b is a n × 1 vector, then

Nn (A ⊗ B)Nn = Nn (B ⊗ A)Nn
Nn (A ⊗ A)Nn = Nn (A ⊗ A) = (A ⊗ A)Nn
1
Nn (A ⊗ b) = Nn (b ⊗ A) = (A ⊗ b + b ⊗ A).
2
Additional properties of Nn can be found in Magnus (1988).

2.6 The Matrix Umn


In Section 2.4, we wrote the commutation matrix $K_{mn}$ in terms of elementary matrices as

$$K_{mn} = \begin{bmatrix} E_{11}^{nm} & \cdots & E_{n1}^{nm} \\ \vdots & & \vdots \\ E_{1m}^{nm} & \cdots & E_{nm}^{nm} \end{bmatrix}.$$

As each of these elementary matrices is n × m, the commutation matrix, as we know, is mn × mn. But in the commutation matrix, the subscripts of these elementary matrices appear out of natural order. For example, in the first block row we have $E_{11}^{nm} \ldots E_{n1}^{nm}$, whilst in the last block row we have $E_{1m}^{nm} \ldots E_{nm}^{nm}$. Moreover, the superscripts of the elementary matrices appear out of natural order as nm instead of mn.

A matrix that appears often in our work in Chapter 4 is made up of elementary matrices whose subscripts and superscripts appear in natural order. For want of a better symbol, the author denotes this matrix by $U_{mn}$. That is, $U_{mn}$ is the $m^2 \times n^2$ matrix given by

$$U_{mn} = \begin{pmatrix} E_{11}^{mn} & \cdots & E_{1n}^{mn} \\ \vdots & & \vdots \\ E_{m1}^{mn} & \cdots & E_{mn}^{mn} \end{pmatrix}. \tag{2.51}$$
For example,

$$U_{32} = \begin{pmatrix} 1&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 1&0&0&1 \end{pmatrix}.$$

There are several ways we can write this matrix. Substituting $E_{ij}^{mn} = e_i^m e_j^{n\prime}$ into the matrix, we have

$$U_{mn} = \begin{pmatrix} e_1^m e_1^{n\prime} & \cdots & e_1^m e_n^{n\prime} \\ \vdots & & \vdots \\ e_m^m e_1^{n\prime} & \cdots & e_m^m e_n^{n\prime} \end{pmatrix}.$$

But $e_i^m e_j^{n\prime} = e_j^{n\prime} \otimes e_i^m = e_i^m \otimes e_j^{n\prime}$, so

$$U_{mn} = \begin{pmatrix} e_1^{n\prime} \otimes e_1^m & \cdots & e_n^{n\prime} \otimes e_1^m \\ \vdots & & \vdots \\ e_1^{n\prime} \otimes e_m^m & \cdots & e_n^{n\prime} \otimes e_m^m \end{pmatrix} = \begin{pmatrix} \mathrm{rvec}\, I_n \otimes e_1^m \\ \vdots \\ \mathrm{rvec}\, I_n \otimes e_m^m \end{pmatrix} \tag{2.52}$$

or

$$U_{mn} = \begin{pmatrix} e_1^m \otimes e_1^{n\prime} & \cdots & e_1^m \otimes e_n^{n\prime} \\ \vdots & & \vdots \\ e_m^m \otimes e_1^{n\prime} & \cdots & e_m^m \otimes e_n^{n\prime} \end{pmatrix} = \bigl(\mathrm{vec}\, I_m \otimes e_1^{n\prime} \;\ldots\; \mathrm{vec}\, I_m \otimes e_n^{n\prime}\bigr). \tag{2.53}$$

Using the same property again, we can write Equation 2.52 as

$$U_{mn} = \mathrm{vec}\, I_m \otimes \mathrm{rvec}\, I_n,$$

whereas Equation 2.53 can be written as

$$U_{mn} = \mathrm{rvec}\, I_n \otimes \mathrm{vec}\, I_m.$$

One final application of this property renders

$$U_{mn} = \mathrm{vec}\, I_m \otimes \mathrm{rvec}\, I_n = \mathrm{rvec}\, I_n \otimes \mathrm{vec}\, I_m = (\mathrm{vec}\, I_m)(\mathrm{rvec}\, I_n). \tag{2.54}$$

Notice that as $\bigl(E_{ij}^{mn}\bigr)' = E_{ji}^{nm}$, we have $U_{mn}' = U_{nm}$.

j ) = E ji , we have Umn = Unm .
In our work in Chapter 4, we need to know how Umn interacts with
Kronecker products. The following theorem tells us how.

Theorem 2.33 Let A be an r × m matrix and B be an s × m matrix whilst


C and D are n × u and n × v matrices, respectively. Then,

(A ⊗ B)Umn (C ⊗ D) = (vec BA′ )(rvec C ′ D) = (vec Aτmrs vec B)(Cτn11 D).

Proof: From Equation 2.54

(A ⊗ B)Umn (C ⊗ D) = (A ⊗ B)vec Im rvec In (C ⊗ D).

But (A ⊗ B)vecIm = vec BA′ and rvec In (C ⊗ D) = rvec C ′ D. In Section


1.4.2 of Chapter 1, we saw that rvec In (C ⊗ D) = Cτn11 D and that
(A ⊗ B)vec Im = vec Aτmrs vec B. 

2.7 Twining Matrices

2.7.1 Introduction
Often, in statistics and econometrics, we work with matrices that are formed
by intertwining the rows (columns) of a set of matrices.
To understand what I mean by intertwining rows of matrices, consider
two m × n matrices A = {ai j } and B = {bi j }. Suppose we want to form a
new matrix C from A and B by intertwining single rows of A and B together,
taking the first row of A as the first row of C. That is,
$$C = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ b_{11} & b_{12} & \ldots & b_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix}.$$

Suppose we form a new matrix D from A and B by intertwining rows of A and B, taking two rows at a time, assuming m is even, so

$$D = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m-1\,1} & a_{m-1\,2} & \ldots & a_{m-1\,n} \\ a_{m1} & a_{m2} & \ldots & a_{mn} \\ b_{m-1\,1} & b_{m-1\,2} & \ldots & b_{m-1\,n} \\ b_{m1} & b_{m2} & \ldots & b_{mn} \end{pmatrix}.$$

Clearly, from A and B we can form a new matrix by intertwining any r rows
at a time where r is a divisor of m.

Suppose, more generally now, A is m × n and B is 2m × n, and I want to form a new matrix E by intertwining rows of A and B, taking one row from A and two rows from B, so

$$E = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ b_{11} & b_{12} & \ldots & b_{1n} \\ b_{21} & b_{22} & \ldots & b_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \\ b_{2m-1\,1} & b_{2m-1\,2} & \ldots & b_{2m-1\,n} \\ b_{2m\,1} & b_{2m\,2} & \ldots & b_{2m\,n} \end{pmatrix}.$$

In this section, it is shown that such intertwining of any number of rows of


A and B, where the number of rows from A may differ from those of B, can
be achieved by premultiplying the matrix (A′ B′ )′ by a permutation matrix
which, as I say, I call a twining matrix.

2.7.2 Definition and Explicit Expressions for a Twining Matrix


Let A and B be two matrices and partition these matrices as follows:

$$A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}, \qquad B = \begin{pmatrix} B_1 \\ \vdots \\ B_G \end{pmatrix} \tag{2.55}$$

where each submatrix $A_i$ is $m_i \times \ell$ for i = 1, ..., G, and each submatrix $B_j$ is $p_j \times \ell$ for j = 1, ..., G. Then, T is a twining matrix if

$$T\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} A_1 \\ B_1 \\ \vdots \\ A_G \\ B_G \end{pmatrix}.$$
Clearly, T is the $(m + p) \times (m + p)$ permutation matrix, where $m = \sum_{i=1}^{G} m_i$ and $p = \sum_{j=1}^{G} p_j$, given by

$$T = \begin{pmatrix} I_{m_1} & O & \ldots & O & \vdots & O & O & \ldots & O \\ O & O & \ldots & O & \vdots & I_{p_1} & O & \ldots & O \\ O & I_{m_2} & \ldots & O & \vdots & O & O & \ldots & O \\ O & O & \ldots & O & \vdots & O & I_{p_2} & \ldots & O \\ \vdots & & & \vdots & & \vdots & & & \vdots \\ O & O & \ldots & I_{m_G} & \vdots & O & O & \ldots & O \\ O & O & \ldots & O & \vdots & O & O & \ldots & I_{p_G} \end{pmatrix}.$$

A lot of the mathematics in this book concerns itself with the case where A is
an mG × p matrix and B is an nG × q matrix, and each of those matrices are
partitioned into G submatrices. For A, each submatrix is of order m × p and
for B, each submatrix is of order n × q. If p = q = ℓ say, then the twining
matrix can be written as
$$T = \Bigl(\, I_G \otimes \begin{pmatrix} I_m \\ \underset{n\times m}{O} \end{pmatrix} \;\; \vdots \;\; I_G \otimes \begin{pmatrix} \underset{m\times n}{O} \\ I_n \end{pmatrix} \Bigr). \tag{2.56}$$

We now introduce the following notation:

Notation: Denote the twining matrix given by Equation 2.56 as TG,m,n .

In this notation, the first subscript refers to the common number of


submatrices in the partitions of the two matrices, the second subscript
refers to the number of rows in each submatrix of A and the third subscript
refers to the number of rows in each of the submatrices of B.
An example is

$$T_{2,1,3} = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \end{pmatrix}.$$
Like all other permutation matrices, the twining matrix is orthogonal, that is, $T_{G,m,n}^{-1} = T_{G,m,n}'$. Note also that $T_{1,m,n} = \begin{pmatrix} I_m & O \\ O & I_n \end{pmatrix} = I_{m+n}$. It is also of some interest that $T_{G,m,n}$ is an intertwined matrix itself. As $T_{G,m,n} I_{G(m+n)} = T_{G,m,n}$, the twining matrix is formed by intertwining submatrices of $(I_{Gm}\;\; O)$ and $(O\;\; I_{Gn})$.
(O IGn ).

2.7.3 Twining Matrix TG,m,n and the Commutation Matrix


The special twining matrix TG,m,n is intimately connected with the commu-
tation matrix as the following theorem demonstrates.

Theorem 2.34

$$T_{G,m,n} = K_{G,m+n}\begin{pmatrix} K_{mG} & O \\ O & K_{nG} \end{pmatrix}.$$

Proof: Write
′ ⎞
Im+n ⊗ e1G 

  
KmG O .. ⎟ KmG O
KG,m+n =⎝

O KnG . ⎠
O KnG
G′
Im+n ⊗ eG
and consider
 ′
  K   I ⊗ eG ′ O

KmG

Im+n ⊗ e1G mG
= m 1

O O In ⊗ e1G O
 
G′

Im ⊗ e1 KmG
= ′ .
Im ⊗ e1G O


′ (1)
But, we saw in Section 2.2 that (Im ⊗ e1G )KmG = KmG , so
⎛ ′ ′

e1G ⊗ e1m
⎜ .. ⎟

⎜ . ⎟

 ⎜ G′ m′ ⎟
e e
  

G ′
 KmG ⎜ 1 ⊗ m ⎟ G′ I
Im+n ⊗ e1 = ⎜ G′ ⎟ = e1 ⊗ m
O ⎜ e1 ⊗ 0 ⎟ ′ O
..
⎜ ⎟
⎜ ⎟
⎝ . ⎠

e1G ⊗ 0′

by Equation 1.6 of Section 1.2 in Chapter 1.



In a similar manner,
  O   
O
G′ G′
Im+n ⊗ e1 = e1 ⊗ .
KnG In
The result follows. 
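As a numerical check of Theorem 2.34, and of the inverse given in Theorem 2.35 below, the following sketch (ours, under the constructions of the commutation and twining matrices used in our earlier snippets) verifies both identities directly.

    import numpy as np

    def commutation(m, n):
        K = np.zeros((m * n, m * n))
        for i in range(m):
            for j in range(n):
                K[i * n + j, j * m + i] = 1.0
        return K

    def twining(G, m, n):
        left = np.kron(np.eye(G), np.vstack([np.eye(m), np.zeros((n, m))]))
        right = np.kron(np.eye(G), np.vstack([np.zeros((m, n)), np.eye(n)]))
        return np.hstack([left, right])

    def block_diag(X, Y):
        Z = np.zeros((X.shape[0] + Y.shape[0], X.shape[1] + Y.shape[1]))
        Z[:X.shape[0], :X.shape[1]] = X
        Z[X.shape[0]:, X.shape[1]:] = Y
        return Z

    G, m, n = 3, 2, 4
    T = twining(G, m, n)
    assert np.allclose(T, commutation(G, m + n) @ block_diag(commutation(m, G), commutation(n, G)))
    assert np.allclose(np.linalg.inv(T),
                       block_diag(commutation(G, m), commutation(G, n)) @ commutation(m + n, G))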

In fact, the commutation matrix itself can be considered as a twining matrix.


In Theorem 2.10 of Section 2.4.1, we saw that for A a mG × p matrix
partitioned as
⎛ ⎞
A1
⎜ .. ⎟
A=⎝ . ⎠
AG
where each submatrix in this partitioning is m × p, then
⎛ (1) ⎞
A
⎜ .. ⎟
KmG A = ⎝ . ⎠ . (2.57)
(m)
A
Therefore, the commutation matrix KmG can be regarded as a twining
matrix, which intertwines not two matrices but G matrices. In this inter-
twining, a new matrix is formed by taking one row at a time from each of
the G submatrices of A.
If we return to the case in hand, where we are intertwining two matrices
only, we have
   
K1G O I O
TG,1,1 = KG2 = KG2 G = KG2 .
O K1G O IG
−1 ′
In Section 2.3, we saw that KGn = KGn = KnG . Using this result, we have
from Theorem 2.34 that
 −1  
KmG O KGm O
KG,m+n = TG,m,n = TG,m,n .
O KnG O KGn
That is, KG,m+n is formed by an intertwining of (KGm O) and (O KGn ).

2.7.4 Properties of the Twining Matrix TG,m,n .


Properties of the twining matrix TG,m,n are easily derived from the properties
of the commutation matrix itself. We present these properties as a series of
theorems.

Theorem 2.35 The inverse of TG,m,n is given by


 
−1 KGm O
TG,m,n = Km+n,G .
O KGn

Proof: As TG,m,n is a permutation matrix, it is orthogonal so



   
−1 ′ KmG O ′ KGm O
TG,m,n = TG,m,n = ′ KG,m+n = Km+n,G
O KnG O KGn

as KmG = KGm . 

Theorem 2.36 The trace of $T_{G,m,n}$ is given by $\operatorname{tr} T_{G,m,n} = m + n$.

 Im 
Proof: Consider the first submatrix of TG,m,n , namely IG ⊗ O .
n×m
As n ≥ 1,
it follows that the only nonzero elements on the main diagonal of TG,m,n
arising from this submatrix are those of the main diagonal of Im . Likewise,
 O 
consider the second submatrix IG ⊗ m×p
In
. Again, as m ≥ 1, it follows that
the only nonzero elements on the main diagonal of TG,m,n arising from this
submatrix are those on the main diagonal of In . Thus, trTG,m,n = m + n. 

An interesting observation can be made from Theorem 2.36. The trace


of a commutation matrix is a complicated expression. (See, for example,
Henderson and Searle (1979, 1981) and Magnus (1988)). It is

trKmn = 1 + gcd(m − 1, n − 1),

where gcd(m, n) is the greatest common divisor of m and n. However,


 
KmG O
trKG,m+n =m+n
O KnG

is a very simple expression.

Theorem 2.37 The determinant of TG,m,n is

 = (−1) 12 G(G−1)[m(m−1)+n(n−1)+mn] .
 
T
G,m,n

1
Proof: The proof uses the fact that |Kmn | = (−1) 4 mn(m−1)(n−1) . (See
Henderson and Searle (1981)).

From Theorem 2.34, we have


 
 KmG O 
|TG,m,n | = |KG,m+n |   = |K
G,m+n | · |KmG | · |KnG |
O KnG 
1 1 1
= (−1) 4 G(m+n)(G−1)(m+n−1) (−1) 4 mG(m−1)(G−1) (−1) 4 nG(n−1)(G−1)
1
= (−1) 2 G(G−1)[m(m−1)+n(n−1)+mn] .


2.7.5 Some Special Cases


Consider the case where both A and B are block diagonal matrices given by
⎛ ⎞ ⎛ ⎞
A1 O B1 O
A=⎝
⎜ .. ⎠ and ⎝
⎟ ⎜ .. ⎟
. . ⎠
O AG O BG

where each Ai is m × ℓi and each Bi is n × ℓi for i = 1, . . . , G. Then,


TG,m,n AB is the block diagonal matrix given by
 

⎛  ⎞
A1
O
  ⎜ B 1

A
⎜ ⎟
TG,m,n =⎜
⎜ .. ⎟.

B ⎜ .
 ⎟
⎝ AG ⎠
O
BG

The next case of interest involves Kronecker products. Suppose A is a G × ℓ


matrix and c and d are m × 1 and n × 1 vectors respectively. Then, A ⊗ c is
the mG × ℓ matrix given by
⎛ ⎞
a11 c ··· a1ℓ c
⎜ .. .. ⎟ .
A⊗c =⎝ . . ⎠
aG1 c ··· aGℓ c

Likewise, A ⊗ d is the nG × ℓ matrix given by


⎛ ⎞
a11 d ··· a1ℓ d
⎜ .. .. ⎟ .
A⊗d =⎝ . . ⎠
aa1 d ··· aGℓ d

It follows that
⎛ ⎞
a11 c · · · a1ℓ c
⎜ a11 d · · · a1ℓ d ⎟
 ⎜
 ⎟
A⊗c ⎜ .. .
TG,m,n =⎜ . .. ⎟
A⊗d

⎜ ⎟
⎝ aG1 c · · · aGℓ c ⎠
aG1 d · · · aGℓ d
⎛    ⎞
c c
a
⎜ 11 d · · · a1ℓ
d ⎟  
c
⎜ ⎟
.
..
=⎜ ⎟=A⊗ .
⎜ ⎟
⎜    ⎟ d
⎝ c c ⎠
aG1 · · · aGℓ
d d
This last result is a special case of a theorem on how twining matrices interact
with Kronecker products, a topic that concerns us in the next section.

2.7.6 Kronecker Products and Twining Matrices


As to be expected, there are a number of results about twining matrices and
Kronecker products when one of the matrices in the Kronecker product is
a partitioned matrix.

Theorem 2.38 Consider matrices A, E, and F whose orders are G × r, m × ℓ,


and n × ℓ, respectively. Then,
   
$$T_{G,m,n}\begin{pmatrix} A \otimes E \\ A \otimes F \end{pmatrix} = A \otimes \begin{pmatrix} E \\ F \end{pmatrix}. \tag{2.58}$$

Proof:
     
A⊗E KmG (A ⊗ E ) (E ⊗ A)Kℓr
T G,m,n = KG,m+n = KG,m+n
A⊗F KnG (A ⊗ F ) (F ⊗ A)Kℓr
    
E E
= KG,m+n ⊗ A Kℓr = A ⊗ . (2.59)
F F


Theorem 2.39 Consider B, C, and D where orders are r × G, s × m, and


s × n, respectively. Then,
(B ⊗ (C D))TG,m,n = (B ⊗ C B ⊗ D).

Proof:
(B ⊗ (C D))TG,m,n
   
KmG O K O
= (B ⊗ (C D))KG,m+n = Krs (C ⊗ B D ⊗ B) mG
O KnG O KnG
= Krs ((C ⊗ B)KmG (D ⊗ B)KnG ) = Krs (Ksr (B ⊗ C ) Ksr (B ⊗ D))
= (B ⊗ C B ⊗ D),
as Krs−1 = Ksr . 

Notice that if we take the transposes of both sides of Equations 2.59 and
2.58, we have
 ′   ′
B ⊗ C′
 
′ ′ C
TG,m,n B ⊗ =
D′ B′ ⊗ D ′
and
(A′ ⊗ E ′ A′ ⊗ F ′ )TG,m,n

= A′ ⊗ (E ′ F ′ ).
′ −1
That is, TG,m,n = TG,m,n undoes the transformation brought about by TG,m,n .

2.7.7 Generalizations
The results up to this point have to do largely with intertwining corre-
sponding submatrices from two partitioned matrices. Moreover, we have
concentrated on the case where the submatrices of each partitioned matrix
all have the same order. If we stick to the latter qualification, our results
easily generalize to the case where we intertwine corresponding submatri-
ces from any number of partitioned matrices. All that happens is that the
notation gets a little messy. Here, we content ourselves with generalizing the
definition of a twining matrix and the two explicit expressions we derived
for this matrix. The generalizations of the other results are obvious and are
left to the reader.

A More General Definition of a Twining Matrix


Let A1 , A2 , . . . , Ar be G p1 × ℓ, G p2 × ℓ, . . . , G pr × ℓ matrices, respectively,
and partition A j as follows:
⎛ j ⎞
A1
j ⎜ .. ⎟
A = ⎝ . ⎠ , j = 1, . . . r,
j
AG

j
where each submatrix Ai is p j × ℓ for i = 1, . . . , G. The twining matrix,
denoted by TG,p ,...,p is defined by
1 r

A11
⎞ ⎛
⎜ .. ⎟
⎛ 1 ⎞ ⎜ .r ⎟
⎜ ⎟
A ⎜ A1 ⎟
⎜ ⎟
TG,p ...,p ⎝ ... ⎠ = ⎜ ... ⎟ (2.60)
⎜ ⎟ ⎜ ⎟
1 r ⎜ ⎟
Ar ⎜ A1 ⎟
⎜ G⎟
⎜ . ⎟
⎝ .. ⎠
AGr

Two explicit expressions for TG,p ,...,p


1 r

⎛ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎞
Ip1 O O
⎜ ⎜ ⎟ ⎜ p1 ×p2 ⎟ ⎜ p1 ×pr ⎟⎟
⎜ O ⎟
⎜ Ip ⎟ ⎜ O ⎟⎟
⎜ ⎜ ⎟ ⎜ ⎟⎟
⎜ ⎜ p ×p ⎟
TG,p ,...,p = ⎜IG ⊗ ⎜ . ⎟ IG ⊗ ⎜ . ⎟ · · · IG ⊗ ⎜ p2 ×pr ⎟
⎜ ⎜ 2 1 ⎟ ⎜ 2 ⎟ ⎜
⎟⎟ , (2.61)

1 r
⎜ ⎜ . ⎟. ⎜ . ⎟. .
⎜ . ⎟⎟
⎝ ⎝ ⎠ ⎝ ⎠ ⎝ . ⎠⎠
O O I
pr ×p1 pr ×p2 pr

⎛ ⎞
Kp G O
1

⎜ Kp G ⎟

2
TG,p ,...,p = KG,p +···+p ⎜ ..
⎟. (2.62)
1 r 1 r ⎜
.

⎝ ⎠
O Kp G
r

Consider the special case where p1 = . . . = pr = n. Then,


⎛ ⎞
KnG O
TG,n,...,n = KG,nr ⎝
⎜ .. ⎠ = KG,nr (Ir ⊗ KnG ).

.
O KnG

But from Equation 2.9 in Chapter 2,

KG,nr = (KGr ⊗ In )(Ir ⊗ KGn )


′ −1
and as KnG = KGn = KnG , we have that

TG,n,...,n = (KGr ⊗ In ). (2.63)



2.7.8 Intertwining Columns of Matrices


Our discussion up to this point has focused on intertwining rows of matrices
but, of course, a similar discussion would involve columns of matrices.
Suppose A is $mG\times p$ and B is $nG\times q$, and partition these matrices as follows:
$$A=\begin{pmatrix}A_1\\ \vdots\\ A_G\end{pmatrix}\quad\text{and}\quad B=\begin{pmatrix}B_1\\ \vdots\\ B_G\end{pmatrix}$$
where each submatrix of A is $m\times p$ and each submatrix of B is $n\times q$. Then, by definition,
$$T_{G,m,n}\begin{pmatrix}A\\B\end{pmatrix}=\begin{pmatrix}A_1\\B_1\\ \vdots\\A_G\\B_G\end{pmatrix}.\tag{2.64}$$
Taking the transpose of this equation gives
$$(A'\;\;B')\,T'_{G,m,n}=\bigl(A_1'\;\;B_1'\;\cdots\;A_G'\;\;B_G'\bigr).\tag{2.65}$$
Let C and D be the $p\times mG$ and $q\times nG$ matrices defined by $C=A'$ and $D=B'$. Then,
$$C=(C_1\;\cdots\;C_G)=(A_1'\;\cdots\;A_G')$$
and
$$D=(D_1\;\cdots\;D_G)=(B_1'\;\cdots\;B_G'),$$
so Equation 2.65 yields
$$(C\;\;D)\,T'_{G,m,n}=(C_1\;\;D_1\;\cdots\;C_G\;\;D_G).$$
That is, when the matrix $(C\;\;D)$ is postmultiplied by $T'_{G,m,n}$, the columns of C and D are intertwined.
A special case that will be important for us in our future discussions is obtained by taking the transpose of both sides of Equation 2.57. We get
$$A'K_{Gm}=\bigl(A^{(1)\prime}\;\cdots\;A^{(m)\prime}\bigr).$$

Again, letting $C=A'$, we have
$$CK_{Gm}=\bigl(C^{(1)}\;\cdots\;C^{(m)}\bigr)\tag{2.66}$$
where our notation is
$$C^{(j)}=\bigl((C_1)_{.j}\;\cdots\;(C_G)_{.j}\bigr).\tag{2.67}$$
That is, $C^{(j)}$ is formed by stacking the jth columns of the submatrices $C_1,\dots,C_G$ alongside each other. This notation is used in the proof of the following theorem.
following theorem.

Theorem 2.40 Let C be a $p\times mG$ matrix. Then,
$$\operatorname{vec}(\operatorname{vec}_mC)=\operatorname{vec}\bigl(CK_{Gm}\bigr).\tag{2.68}$$

Proof: Partition C as follows:
$$C=(C_1\;\cdots\;C_G)$$
where each submatrix is $p\times m$. Then,
$$\operatorname{vec}_mC=\begin{pmatrix}C_1\\ \vdots\\ C_G\end{pmatrix}$$
so
$$\operatorname{vec}(\operatorname{vec}_mC)=\begin{pmatrix}(C_1)_{.1}\\ \vdots\\ (C_G)_{.1}\\ \vdots\\ (C_1)_{.m}\\ \vdots\\ (C_G)_{.m}\end{pmatrix}
=\begin{pmatrix}\operatorname{vec}C^{(1)}\\ \vdots\\ \operatorname{vec}C^{(m)}\end{pmatrix}$$
from Equation 2.67. But from Equation 2.66,
$$\operatorname{vec}\bigl(CK_{Gm}\bigr)=\begin{pmatrix}\operatorname{vec}C^{(1)}\\ \vdots\\ \operatorname{vec}C^{(m)}\end{pmatrix}.$$


The corresponding result for rvecs is found by taking the transpose of both sides of Equation 2.68 to obtain
$$\operatorname{rvec}\bigl((\operatorname{vec}_mC)'\bigr)=\operatorname{rvec}\bigl((CK_{Gm})'\bigr).$$
That is, since $(CK_{Gm})'=K_{mG}C'$,
$$\operatorname{rvec}(\operatorname{rvec}_mA)=\operatorname{rvec}\bigl(K_{mG}A\bigr)$$
where A is an $mG\times p$ matrix.
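A quick numerical check of Theorem 2.40 and its rvec analogue may be helpful. The sketch below is my own Python/NumPy code, not the book's: the helper names (`vec_m`, `rvec_m`, `commutation`) are mine, and the commutation matrix again uses the convention $K_{mn}\operatorname{vec}A=\operatorname{vec}A'$.

```python
import numpy as np

def vec(A):      return A.reshape(-1, 1, order='F')       # stack columns under each other
def rvec(A):     return A.reshape(1, -1, order='C')       # stack rows alongside each other
def vec_m(C, m): # C is p x mG: stack its G blocks of m columns under each other
    return np.vstack([C[:, k:k + m] for k in range(0, C.shape[1], m)])
def rvec_m(A, m):# A is mG x p: place its G blocks of m rows alongside each other
    return np.hstack([A[k:k + m, :] for k in range(0, A.shape[0], m)])

def commutation(m, n):
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1      # K vec(A) = vec(A') for A of order m x n
    return K

rng = np.random.default_rng(0)
p, m, G = 2, 3, 4
C = rng.standard_normal((p, m * G))
# Theorem 2.40: vec(vec_m C) = vec(C K_{Gm})
assert np.allclose(vec(vec_m(C, m)), vec(C @ commutation(G, m)))
# Companion result: rvec(rvec_m A) = rvec(K_{mG} A) for A of order mG x p
A = C.T
assert np.allclose(rvec(rvec_m(A, m)), rvec(commutation(m, G) @ A))
print("Theorem 2.40 and its rvec analogue check out")
```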
A more general analysis also applies to intertwining columns of matrices. If we take the transpose of Equation 2.60, we have
$$\bigl(A^{1\prime}\;\cdots\;A^{r\prime}\bigr)T'_{G,p_1,\dots,p_r}
=\bigl(A^{1\prime}_1\;\cdots\;A^{r\prime}_1\;\cdots\;A^{1\prime}_G\;\cdots\;A^{r\prime}_G\bigr),$$
then letting $C^j=A^{j\prime}=(A^{j\prime}_1\;\cdots\;A^{j\prime}_G)=(C^j_1\;\cdots\;C^j_G)$, for $j=1,\dots,r$, we have
$$\bigl(C^1\;\cdots\;C^r\bigr)T'_{G,p_1,\dots,p_r}
=\bigl(C^1_1\;\cdots\;C^r_1\;\cdots\;C^1_G\;\cdots\;C^r_G\bigr),$$
where from Equation 2.62
$$T'_{G,p_1,\dots,p_r}=\begin{pmatrix}K_{Gp_1}& & &O\\ &K_{Gp_2}& &\\ & &\ddots&\\ O& & &K_{Gp_r}\end{pmatrix}K_{p_1+\cdots+p_r,\,G},\tag{2.69}$$
and from Equation 2.63
$$T'_{G,n,\dots,n}=(K_{Gr}\otimes I_n)'=K_{rG}\otimes I_n.$$
THREE

Elimination and Duplication Matrices

3.1 Introduction
A special group of selection matrices is associated with the vec, vech, and
v(A) of a given square matrix A. These matrices are called elimination
matrices and duplication matrices. They are extremely important in the
application of matrix calculus to statistical models as we see in Chapter 6.
The purpose of this chapter is not to list all the known results for these
matrices. One can do no better than refer to Magnus (1988) for this. Rather,
we seek to present these matrices in a new light and in such a way that
facilitates the investigation as to how these matrices interact with other
matrices, particularly Kronecker products. The mathematics involved in
doing this entails a new notation – well, at least it is new to me. But it
is hoped that the use of this notation makes it clear how these otherwise
complicated matrices behave.

3.2 Elimination Matrices


Consider A, an $n\times n$ matrix. As noted in Section 1.4.3 of Chapter 1, vec A contains all the elements of A. It is the $n^2\times1$ vector formed by stacking the columns of A underneath each other. The vech A is the $\tfrac12n(n+1)\times1$ vector formed by stacking the elements on and beneath the main diagonal under each other. Finally, v(A) is the $\tfrac12n(n-1)\times1$ vector formed by stacking the elements beneath the main diagonal under each other. Clearly, vec A contains all the elements in vech A and v(A). It follows that there exist zero-one matrices $L_n$ and $\bar L_n$ whose orders are $\tfrac12n(n+1)\times n^2$ and $\tfrac12n(n-1)\times n^2$, respectively, such that
$$L_n\operatorname{vec}A=\operatorname{vech}A$$
$$\bar L_n\operatorname{vec}A=v(A).$$

Variations of these matrices are used in the case when A is a symmetric


matrix.
Recall from Section 2.5.7 in Chapter 2 that
$$N_n\operatorname{vec}A=\tfrac12(\operatorname{vec}A+\operatorname{vec}A')=\operatorname{vec}A,$$
if A is symmetric. It follows that when A is a symmetric matrix
$$L_nN_n\operatorname{vec}A=L_n\operatorname{vec}A=\operatorname{vech}A$$
and
$$\bar L_nN_n\operatorname{vec}A=\bar L_n\operatorname{vec}A=v(A).$$

Finally, note that as vech A contains all the elements in v(A) and more, there exists a $\tfrac12n(n-1)\times\tfrac12n(n+1)$ zero-one matrix $L^*_n$ such that
$$L^*_n\operatorname{vech}A=v(A).$$
All these matrices $L_n$, $L_nN_n$, $\bar L_n$, $\bar L_nN_n$, and $L^*_n$ are called elimination matrices, and in this section we study some of their properties in detail. We are particularly interested in how these matrices interact with Kronecker products. The approach taken here differs from that taken by other authors such as Magnus (1988) and Magnus and Neudecker (1988). What we seek to do is to break these matrices down into smaller submatrices. Studying the properties of these submatrices facilitates achieving new results for the elimination matrices themselves.
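Because each of these elimination matrices is defined entirely by which elements of vec A (or vech A) it picks out, and, in the case of $L_nN_n$ and $\bar L_nN_n$, how it averages $a_{ij}$ and $a_{ji}$, they are simple to generate and test numerically. The sketch below is mine, not the book's; it builds $L_n$, $\bar L_n$, $L^*_n$ and $N_n$ for general n in Python/NumPy and confirms the defining relations just listed.

```python
import numpy as np

def vec(A):  return A.reshape(-1, order='F')
def vech(A): return np.concatenate([A[j:, j] for j in range(A.shape[0])])    # on/below diagonal
def v(A):    return np.concatenate([A[j+1:, j] for j in range(A.shape[0])])  # strictly below

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):        # L_n vec A = vech A
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

def Lbar(n):     # Lbar_n vec A = v(A)
    M = np.zeros((n * (n - 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j + 1, n):
            M[r, j * n + i] = 1; r += 1
    return M

def Lstar(n):    # L*_n vech A = v(A): simply drop the diagonal rows of vech
    keep = [r for r, (i, j) in enumerate((i, j) for j in range(n) for i in range(j, n)) if i > j]
    return np.eye(n * (n + 1) // 2)[keep, :]

n = 4
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
A = np.random.default_rng(1).standard_normal((n, n))
S = A + A.T                                            # a symmetric matrix
assert np.allclose(L(n) @ vec(A), vech(A))
assert np.allclose(Lbar(n) @ vec(A), v(A))
assert np.allclose(Lstar(n) @ vech(A), v(A))
assert np.allclose(L(n) @ Nn @ vec(S), vech(S))        # L_n N_n eliminates a symmetric A
assert np.allclose(Lbar(n) @ Nn @ vec(S), v(S))
print("elimination-matrix identities verified for n =", n)
```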

3.2.1 The Elimination Matrix Ln


To begin, consider the following $n\times n$ matrix:
$$A=\begin{pmatrix}a_{11}&\cdots&a_{1n}\\ \vdots&\ddots&\vdots\\ a_{n1}&\cdots&a_{nn}\end{pmatrix}.$$

Then,
$$\operatorname{vec}A=\begin{pmatrix}a_{11}\\ \vdots\\ a_{n1}\\ \vdots\\ a_{1n}\\ \vdots\\ a_{nn}\end{pmatrix},\qquad
\operatorname{vech}A=\begin{pmatrix}a_{11}\\ \vdots\\ a_{n1}\\ a_{22}\\ \vdots\\ a_{n2}\\ \vdots\\ a_{nn}\end{pmatrix},\qquad
v(A)=\begin{pmatrix}a_{21}\\ \vdots\\ a_{n1}\\ a_{32}\\ \vdots\\ a_{n2}\\ \vdots\\ a_{nn-1}\end{pmatrix},$$
which are $n^2\times1$, $\tfrac12n(n+1)\times1$ and $\tfrac12n(n-1)\times1$ vectors, respectively.
Comparing vech A with vec A, it is clear that $L_n$ is the $\tfrac12n(n+1)\times n^2$ block diagonal matrix given by
$$L_n=\begin{pmatrix}I_n& & &O\\ &E_1& &\\ & &\ddots&\\ O& & &E_{n-1}\end{pmatrix},\tag{3.1}$$
where $E_j$ is the $(n-j)\times n$ matrix given by
$$E_j=\bigl(\underset{n-j\times j}{O}\;\;\underset{n-j\times n-j}{I_{n-j}}\bigr)\tag{3.2}$$
for $j=1,\dots,n-1$.
Note, for convenience we only use one subscript j to identify $E_j$, the second parameter n being obvious from the content. For example, if we are dealing with $L_3$, then
$$E_1=(\underset{2\times1}{0}\;\;\underset{2\times2}{I_2})=\begin{pmatrix}0&1&0\\0&0&1\end{pmatrix},\qquad
E_2=(\underset{1\times2}{0'}\;\;I_1)=(0\;\;0\;\;1)$$

and
$$L_3=\begin{pmatrix}I_3&O&O\\O&E_1&O\\O&O&E_2\end{pmatrix}
=\begin{pmatrix}
1&0&0&0&0&0&0&0&0\\
0&1&0&0&0&0&0&0&0\\
0&0&1&0&0&0&0&0&0\\
0&0&0&0&1&0&0&0&0\\
0&0&0&0&0&1&0&0&0\\
0&0&0&0&0&0&0&0&1
\end{pmatrix}.$$

Also, for mathematical convenience, we take $E_0=I_n$. Note that $E_{n-1}=e_n^{n\prime}$.
The matrix $E_j$ itself can be regarded as an elimination matrix. If A and B are $n\times m$ and $p\times n$ matrices, respectively, then
$$E_jA=(A)_j,\qquad j=1,\dots,n-1,\tag{3.3}$$
where $(A)_j$ is the $(n-j)\times m$ matrix formed from A by deleting the first j rows of A, and
$$BE_j'=(B)^j,\qquad j=1,\dots,n-1,\tag{3.4}$$
where $(B)^j$ is the $p\times(n-j)$ matrix formed from B by deleting the first j columns of B. For mathematical convenience, we said we would take $E_0=I_n$, so this implies that we must also take $(A)_0=A$ and $(B)^0=B$.
Note that when we use this notation, for $j=1,\dots,n-1$,
$$(A)_j'=(A')^j$$
and
$$(B)^{j\prime}=(B')_j.$$
In particular, with A an $n\times m$ matrix,
$$(\operatorname{vec}A)_{nj}'=\begin{pmatrix}a_{j+1}\\ \vdots\\ a_m\end{pmatrix}'
=\bigl(a_{j+1}'\;\cdots\;a_m'\bigr)=\bigl((\operatorname{vec}A)'\bigr)^{nj}=(\operatorname{rvec}A')^{nj},$$

and
⎛ j+1 ⎞
a
′ ′
n′ ′ ⎜ .. ⎟ 
(rvec A)m j = a j+1 = ⎝ . ⎠ = (rvec A)′ m j
   
··· a
an
= (vec A′ )m j .
When we apply this notation to Kronecker products, we have
$$(A\otimes x')_j=(A)_j\otimes x'\tag{3.5}$$
$$(x'\otimes A)_j=x'\otimes(A)_j\tag{3.6}$$
$$(B\otimes x)^j=(B)^j\otimes x$$
$$(x\otimes B)^j=x\otimes(B)^j.$$
When working with columns from identity matrices, and indeed identity matrices themselves, we have
$$\bigl(e_j^n\bigr)_i=e_{j-i}^{n-i}\quad\text{if }i<j,\qquad \bigl(e_j^n\bigr)_i=0\quad\text{if }i\ge j,\tag{3.7}$$
and
$$(I_n)_j=\bigl(O\;\;I_{n-j}\bigr)=E_j,\tag{3.8}$$
for $j=1,\dots,n-1$. Also,
$$(I_n)_j\otimes e_m^{p\prime}=\bigl(\underset{n-j\times j}{O}\;\;I_{n-j}\bigr)\otimes e_m^{p\prime}
=\bigl(O\;\;O\;\;e_1^{n-j}\;\;O\;\cdots\;O\;\;e_{n-j}^{n-j}\;\;O\bigr)\tag{3.9}$$
by Theorem 1.1 of Chapter 1.
Returning to $E_j$, we have that for any vector x,
$$E_j(x'\otimes A)=x'\otimes(A)_j\tag{3.10}$$
$$E_j(A\otimes x')=(A)_j\otimes x'\tag{3.11}$$
$$(x\otimes B)E_j'=x\otimes(B)^j$$
$$(B\otimes x)E_j'=(B)^j\otimes x.$$
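The deletion notation can be mimicked with simple slicing, which makes the identities in Equations 3.3–3.11 easy to confirm. In the sketch below (my own code, not the book's), `below(A, j)` plays the role of $(A)_j$ (first j rows deleted) and `right(B, j)` the role of $(B)^j$ (first j columns deleted).

```python
import numpy as np

def E(j, n):
    # E_j = (O  I_{n-j}): the (n-j) x n matrix that deletes the first j rows.
    return np.hstack([np.zeros((n - j, j)), np.eye(n - j)])

def below(A, j):  return A[j:, :]    # (A)_j
def right(B, j):  return B[:, j:]    # (B)^j

n, m, p, j = 4, 3, 2, 2
rng = np.random.default_rng(2)
A, B = rng.standard_normal((n, m)), rng.standard_normal((p, n))
x = rng.standard_normal((n, 1))

assert np.allclose(E(j, n) @ A, below(A, j))                                # Equation 3.3
assert np.allclose(B @ E(j, n).T, right(B, j))                              # Equation 3.4
assert np.allclose(E(j, n) @ np.kron(x.T, A), np.kron(x.T, below(A, j)))    # Equation 3.10
assert np.allclose(E(j, n) @ np.kron(A, x.T), np.kron(below(A, j), x.T))    # Equation 3.11
assert np.allclose(np.kron(x, B) @ E(j, n).T, np.kron(x, right(B, j)))
assert np.allclose(np.kron(B, x) @ E(j, n).T, np.kron(right(B, j), x))
print("E_j identities verified")
```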

Using the properties of $E_j$, involving as they do our newly introduced notation, we can obtain properties for the elimination matrix $L_n$ itself. Suppose A is $n^2\times p$ and we partition it as
$$A=\begin{pmatrix}A_1\\ \vdots\\ A_n\end{pmatrix}\tag{3.12}$$
where each submatrix in this partitioning is $n\times p$; then from Equation 3.1,
$$L_nA=\begin{pmatrix}A_1\\ E_1A_2\\ \vdots\\ E_{n-1}A_n\end{pmatrix}
=\begin{pmatrix}A_1\\ (A_2)_1\\ \vdots\\ (A_n)_{n-1}\end{pmatrix}.$$
Similarly, if B is a $q\times n^2$ matrix and we partition B as
$$B=(B_1\;\cdots\;B_n)\tag{3.13}$$
where each submatrix in this partitioning is $q\times n$, then
$$BL_n'=\bigl(B_1\;\;B_2E_1'\;\cdots\;B_nE_{n-1}'\bigr)
=\bigl(B_1\;\;(B_2)^1\;\cdots\;(B_n)^{n-1}\bigr).$$
If C is an $n^2\times n^2$ matrix and we initially partition C as
$$C=\begin{pmatrix}C_1\\ \vdots\\ C_n\end{pmatrix},\tag{3.14}$$
where each submatrix in this partitioning is $n\times n^2$, then
$$L_nCL_n'=\begin{pmatrix}C_1\\ (C_2)_1\\ \vdots\\ (C_n)_{n-1}\end{pmatrix}L_n'.$$
Now, if we partition C as
$$C=\begin{pmatrix}C_{11}&\cdots&C_{1n}\\ \vdots& &\vdots\\ C_{n1}&\cdots&C_{nn}\end{pmatrix}\tag{3.15}$$

where each submatrix $C_{ij}$ in this partitioning is $n\times n$, then
$$L_nCL_n'=\begin{pmatrix}
C_{11}&(C_{12})^1&\cdots&(C_{1n})^{n-1}\\
(C_{21})_1&\bigl((C_{22})_1\bigr)^1&\cdots&\bigl((C_{2n})_1\bigr)^{n-1}\\
\vdots& & &\vdots\\
(C_{n1})_{n-1}&\bigl((C_{n2})_{n-1}\bigr)^1&\cdots&\bigl((C_{nn})_{n-1}\bigr)^{n-1}
\end{pmatrix}.$$
Of course, $n^2\times p$, $q\times n^2$ and $n^2\times n^2$ matrices often arise in Kronecker products. The following four theorems tell us how the elimination matrix interacts with Kronecker products.

Theorem 3.1 Let A be an $n\times n$ matrix and b be an $n\times1$ vector. Then,
$$L_n(A\otimes b)=\begin{pmatrix}a^{1\prime}\otimes b\\ a^{2\prime}\otimes(b)_1\\ \vdots\\ a^{n\prime}\otimes(b)_{n-1}\end{pmatrix},\qquad
L_n(b\otimes A)=\begin{pmatrix}b_1A\\ b_2(A)_1\\ \vdots\\ b_n(A)_{n-1}\end{pmatrix},$$
$$(A\otimes b')L_n'=\bigl(a_1\otimes b'\;\;a_2\otimes(b')^1\;\cdots\;a_n\otimes(b')^{n-1}\bigr),$$
$$(b'\otimes A)L_n'=\bigl(b_1A\;\;b_2(A)^1\;\cdots\;b_n(A)^{n-1}\bigr).$$

Proof: Using Equation 3.1


⎛ ⎞
In O ⎛ ′ ⎞
⎜ E1 ⎟ a1 ⊗ b
⎟ ⎜ .. ⎟
Ln (A ⊗ b) = ⎜

.. ⎟⎝ . ⎠
⎝ . ⎠ n′
a ⊗b
O En−1
′ ′
a1 ⊗ b a1 ⊗ b
⎛ ⎞ ⎛ ⎞
′ ′
⎜ a 2 ⊗ E1 b ⎟ ⎜ a2 ⊗ (b)1 ⎟
=⎜ ⎟=⎜ ⎟,
⎜ ⎟ ⎜ ⎟
.. ..
⎝ . ⎠ ⎝ . ⎠
′ ′
an ⊗ En−1 b an ⊗ (b)n−1

where we have used Equation 3.3.



Likewise,
⎛ ⎞
In O ⎛ ⎞
⎜ E1 ⎟ b1 A
⎟ ⎜ .. ⎟
Ln (b ⊗ A) = ⎜

.. ⎟⎝ . ⎠
⎝ . ⎠
bn A
O En−1
b1 A b1 A
⎛ ⎛ ⎞ ⎞
⎜ b E A ⎟ ⎜ b (A) ⎟
⎜ 2 1 ⎟ ⎜ 2 1 ⎟
=⎜ .. ⎟=⎜ .. ⎟,
⎝ . ⎠ ⎝ . ⎠
bn En−1 A bn (A)n−1
from Equation 3.3.
Now,
⎛ ⎞
In O
⎜ E1′ ⎟
(A ⊗ b′ )Ln′ = (a1 ⊗ b′ ··· an ⊗ b′ ) ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠

O En−1
= a1 ⊗ b′ a2 ⊗ b′ E1′ . . . an ⊗ b′ En−1
= a1 ⊗ b′ a2 ⊗ (b′ )1 . . . an ⊗ (b′ )n−1 ,
using Equation 3.4. Finally,
⎛ ⎞
In O
⎜ E1′ ⎟
(b′ ⊗ A)Ln′ = b1 A

· · · bn A ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠
O En′
= b1 A b2 AE1′ . . . bn AEn−1

= b1 A b2 (A)1 . . . bn (A)n−1 . 

We are now in a position to represent Ln (A ⊗ B), (A ⊗ B)Ln′ , and


Ln (A ⊗ B)Ln′ in an informative way, where A and B are n × n matrices.

Theorem 3.2 If A and B are both $n\times n$ matrices, then
$$L_n(A\otimes B)=\begin{pmatrix}a^{1\prime}\otimes B\\ a^{2\prime}\otimes(B)_1\\ \vdots\\ a^{n\prime}\otimes(B)_{n-1}\end{pmatrix}.$$

Proof: Using Equation 3.1, we write



a1 ⊗ B
⎛ ⎞⎛ ⎞
In O

⎜ E1 ⎟ ⎜ a2 ⊗ B ⎟
Ln (A ⊗ B) = ⎜
⎜ ⎟⎜ ⎟
.. ⎟ ⎜ .. ⎟
⎝ . ⎠⎝ . ⎠

O En−1 an ⊗ B
′ ′
a1 ⊗ B a1 ⊗ B
⎛ ⎞ ⎛ ⎞
⎜ E (a2′ ⊗ B) ⎟ ⎜ a2′ ⊗ (B) ⎟
⎜ 1 1 ⎟
=⎜ ⎟=⎜ ⎟,
⎟ ⎜
.. ..
⎝ . ⎠ ⎝ . ⎠
′ ′
En−1 (an ⊗ B) an ⊗ (B)n−1
where we have used Equation 3.10. 

Theorem 3.3 If A and B are both $n\times n$ matrices, then
$$(A\otimes B)L_n'=\bigl(a_1\otimes B\;\;a_2\otimes(B)^1\;\cdots\;a_n\otimes(B)^{n-1}\bigr).$$

Proof: Clearly,

a1 ⊗ B a12 (B)1 a1n (B)n−1
⎛ ⎞ ⎛ ⎞
a11 B ···
′ ⎜ .. ⎟ ′ ⎜ .. .. ..
(A ⊗ B)Ln = ⎝ . ⎠ Ln = ⎝ .

. . ⎠

an ⊗ B an1 B an2 (B)1 · · · ann (B)n−1
= a1 ⊗ B a2 ⊗ (B)1 ... an ⊗ (B)n−1 ,
where we have used Theorem 3.1. 

Theorem 3.4 If A and B are both $n\times n$ matrices, then
$$L_n(A\otimes B)L_n'=\begin{pmatrix}
a_{11}B&a_{12}(B)^1&\cdots&a_{1n}(B)^{n-1}\\
a_{21}(B)_1&a_{22}\bigl((B)_1\bigr)^1&\cdots&a_{2n}\bigl((B)_1\bigr)^{n-1}\\
\vdots& & &\vdots\\
a_{n1}(B)_{n-1}&a_{n2}\bigl((B)_{n-1}\bigr)^1&\cdots&a_{nn}\bigl((B)_{n-1}\bigr)^{n-1}
\end{pmatrix}.$$

Proof: From Theorem 3.2



(a1 ⊗ B)Ln′
⎛ ⎞

⎜ (a2 ⊗ (B)1 )Ln′ ⎟
Ln (A ⊗ B)Ln′ = ⎜ ⎟.
⎜ ⎟
..
⎝ . ⎠

(an ⊗ (B)n−1 )Ln′

But from Theorem 3.1


 j′ 1 n−1
a ⊗ (B) j−1 Ln′ = a j1 (B) j−1 a j2 (B) j−1 . . . a jn (B) j−1
  
,
for j = 2, . . . , n. 
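Theorems 3.2–3.4 can be confirmed numerically by assembling the block expressions directly from slices of A and B and comparing them with $L_n(A\otimes B)$, $(A\otimes B)L_n'$ and $L_n(A\otimes B)L_n'$. The following sketch is my own Python/NumPy code (the helper for $L_n$ is redefined so the snippet is self-contained).

```python
import numpy as np

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

n = 4
rng = np.random.default_rng(3)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
Ln, AkB = L(n), np.kron(A, B)

# Theorem 3.2: row blocks a^{j'} (x) (B)_{j-1}
thm32 = np.vstack([np.kron(A[j:j+1, :], B[j:, :]) for j in range(n)])
assert np.allclose(Ln @ AkB, thm32)

# Theorem 3.3: column blocks a_j (x) (B)^{j-1}
thm33 = np.hstack([np.kron(A[:, j:j+1], B[:, j:]) for j in range(n)])
assert np.allclose(AkB @ Ln.T, thm33)

# Theorem 3.4: the (i,j) block of L_n(A (x) B)L_n' is a_{ij} ((B)_{i-1})^{j-1}
thm34 = np.vstack([np.hstack([A[i, j] * B[i:, j:] for j in range(n)]) for i in range(n)])
assert np.allclose(Ln @ AkB @ Ln.T, thm34)
print("Theorems 3.2, 3.3 and 3.4 verified for n =", n)
```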

3.2.2 The Elimination Matrix Ln Nn


Recall that $N_n=\tfrac12(I_{n^2}+K_{nn})$ is the $n^2\times n^2$ matrix with the property that for a square $n\times n$ matrix A
$$N_n\operatorname{vec}A=\tfrac12\operatorname{vec}(A+A'),$$
so if A is a symmetric matrix
$$N_n\operatorname{vec}A=\operatorname{vec}A.$$
It follows that for a symmetric matrix A
$$L_n\operatorname{vec}A=L_nN_n\operatorname{vec}A=\operatorname{vech}A.$$
So, $L_nN_n$ itself can be regarded as an elimination matrix for symmetric matrices. The difference in the operation of $L_n$ and $L_nN_n$ is this: for $i>j$, the elimination matrix $L_n$ picks $a_{ij}$ from vec A directly. The matrix $L_nN_n$, however, recognises that A is symmetric and chooses $a_{ij}$ for vech A by picking $a_{ij}$ and $a_{ji}$ from vec A and forming $(a_{ij}+a_{ji})/2$. Because $L_nN_n$ recognises the symmetry in A, it is the elimination matrix that should be used in dealing with such symmetric matrices. We proceed in much the same way as we did for $L_n$. Our first task is to form explicit expressions for $L_nN_n$.
From Equation 2.44 in Chapter 2, we can write
1 
Ln Nn = Ln In ⊗ e1n + e1n ⊗ In . . . In ⊗ enn + enn ⊗ In .

2
Consider twice the jth submatrix in this matrix, which is
Ln In ⊗ e nj + e nj ⊗ In .
 

Using Theorem 3.1, we can write the matrix as


⎛ ⎞
′ O
e1n ⊗ e nj
⎛ ⎞
⎜ .. ⎟
⎜ ′
e2n ⊗ e nj 1
  ⎟ ⎜ . ⎟
n n
  ⎜ ⎟ ⎜ ⎟
Ln In ⊗ e j + e j ⊗ In = ⎜
⎜ ..
⎟ + ⎜(In ) j−1 ⎟ jth.
⎟ ⎜ ⎟
⎝ .  ⎠ ⎜ .. ⎟

enn ⊗ e nj n−1
 ⎝ . ⎠
O

Using Equations 3.7 and 3.8, we obtain



⎛ ⎞
e1n ⊗ e nj ⎛
O

⎜ ′ ⎟
⎜ e n ⊗ e n−1 ⎟ ⎜ O ⎟
⎜ 2 j−1 ⎟
⎟ ⎜ .. ⎟
⎜ ⎟
⎜ ..
. ⎟ ⎜ . ⎟
⎜ ⎟ ⎜ ⎟
n n
  ⎜
Ln In ⊗ e j + e j ⊗ In = ⎜e n ⊗ e n− j+1 ⎟ + ⎜ E j−1 ⎟
⎜ ′ ⎟ ⎜
⎟ = Pj ,
⎜ j 1 ⎟ ⎜ O ⎟
⎜ O ⎟ ⎜
⎟ ⎜ . ⎟

.. ⎟ ⎝ .. ⎠


⎝ . ⎠
O O

where Pj is the 12 n(n + 1)×n matrix given by


⎛⎛ n ⎞1
ej O

⎜⎜ .. ⎟
O⎟
⎜⎝ . ⎠ ⎟
Pj = ⎜ O n− j+2 (3.16)
⎜ ⎟
⎜ e2 ⎟

⎝ O R⎠ j
O O

for j = 2, . . . , n − 1, and Rj is the n − j + 1 × n − j + 1 matrix given by


⎛ ⎞
2 O
n− j+1
⎜ 1 ⎟
R j = e1 + E j−1 = ⎜ (3.17)
⎜ ⎟
. . ⎟
⎝ . ⎠
O 1
for j = 1, 2, . . . , n − 1.
Additionally, let
 
R1
P1 = . (3.18)
O
and
enn
⎛ ⎞
O
⎜ .. ⎟
Pn = ⎜
⎜ . ⎟

⎝ e22 ⎠
O Rn
with Rn = 2.
1
The comments made with regard to Ej apply equally well for Pj and Rj .

Under this notation, an explicit expression for $L_nN_n$ is
$$L_nN_n=\tfrac12(P_1\;\cdots\;P_n).\tag{3.19}$$
For example,
$$L_3N_3=\tfrac12(P_1\;\;P_2\;\;P_3)$$
where
$$P_1=\begin{pmatrix}R_1\\O\end{pmatrix}=\begin{pmatrix}2&0&0\\0&1&0\\0&0&1\\0&0&0\\0&0&0\\0&0&0\end{pmatrix},\qquad
P_2=\begin{pmatrix}e_2^3&O\\O&R_2\\0&0'\end{pmatrix}=\begin{pmatrix}0&0&0\\1&0&0\\0&0&0\\0&2&0\\0&0&1\\0&0&0\end{pmatrix},\qquad
P_3=\begin{pmatrix}e_3^3&0&0\\0&e_2^2&0\\0&0&R_3\end{pmatrix}=\begin{pmatrix}0&0&0\\0&0&0\\1&0&0\\0&0&0\\0&1&0\\0&0&2\end{pmatrix},$$
so
$$L_3N_3=\frac12\begin{pmatrix}
2&0&0&0&0&0&0&0&0\\
0&1&0&1&0&0&0&0&0\\
0&0&1&0&0&0&1&0&0\\
0&0&0&0&2&0&0&0&0\\
0&0&0&0&0&1&0&1&0\\
0&0&0&0&0&0&0&0&2
\end{pmatrix}.$$
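This explicit expression is easily reproduced numerically: computing $L_3N_3$ from its definition gives exactly the ½-scaled pattern of twos and paired ones displayed above. The short sketch below is my own Python/NumPy check.

```python
import numpy as np

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

n = 3
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
expected = 0.5 * np.array([
    [2, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 2, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 2]])
assert np.allclose(L(n) @ Nn, expected)
print(2 * L(n) @ Nn)   # twice L_3 N_3: each row holds either a single 2 or a pair of 1s
```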
The explicit expression obtained for Ln Nn writes this matrix as a ‘row’
of submatrices. An alternative expression for Ln Nn writes the matrix as a
‘column’ of submatrices.

Write
1 1
Ln Nn = Ln (In2 + Knn ) = (Ln + Ln Knn ).
2 2
Now, from Equation 2.8 of Chapter 2, we have
Ln Knn = Ln In ⊗ e1n . . . In ⊗ enn .
 

Using Theorem 3.1, we can then write Ln Knn as


⎛ n′ ′ ⎞
e1 ⊗ e1n ··· e1n ⊗ enn
⎜ n′  n ′
· · · e2n ⊗ enn 1 ⎟
  ⎟
⎜ e2 ⊗ e1 1
Ln Knn = ⎜

.. ..


⎝ .  .  ⎠
′ ′
enn ⊗ e1n n−1 · · · enn ⊗ enn n−1
 

⎛ ′ ′ ⎞ ⎛
e1n ⊗ e1n ··· enn ⊗ e1n In ⊗ e1n
′ ⎞
⎜ n ′ ′ ⎟ n′ ⎟
⎜ e1 1 ⊗ e2n enn 1 ⊗ e2n ⎟ ⎜
   
··· ⎜ (In )1 ⊗ e2 ⎟
=⎜⎜
. .. ⎟=⎜
⎟ ..
⎝  ..

. ⎠ ⎝ . ⎠

′ ′ n
e1n n−1 ⊗ enn · · · enn n−1 ⊗ enn
 
(In )n−1 ⊗ en

so if we write 2Ln Nn as a ‘column’ of submatrices, the jth submatrix would


be, using Equation 3.1
 
O E j−1 O ′
T j = n− j+1×n( j−1) n− j+1×n(n− j )
+ (In ) j−1 ⊗ e nj ,
n−j+1×n

for j = 2, . . . , n − 1.
But, we can write
′ ′
 
′ e nj ⊗ e nj
(In ) j−1 ⊗ e nj = ′
(In ) j ⊗ e nj
and using Equation 3.9, we have that
 ′

n′
0′ e nj 0′ 0 0′ · · · 0′ 0 0′
(In ) j−1 ⊗ e j = n− j n− j
O O O e1 O · · · O en− j O
  ′ 
e nj n− j+1 n− j+1
= O O e2 O · · · O en− j+1 O
O
so

en
   
n− j+1 n− j+1
Tj = O E j−1 + j O e2 O ... O en− j+1 O .
O

Clearly, from Equation 3.2



en
 
E j−1 + j
 
= O Rj
O
where Rj is the n − j + 1 × n − j + 1 matrix given by Equation 3.17.
j
If we let Zi be the n − j + 1 × n matrix given by
 2
n− j+1
j
Zi = O ei O (3.20)
n− j+1× j−1 n− j+1×n − j

for i = 2, . . . , n − j + 1 and for j = 2, . . . , n − 1 and we let


 n
Zi1 = ei n×n−1 O

(3.21)

for the same values of i then we can write


 j j

O
Tj = n − j+1×n( (O R j ) Z2 ··· Zn − j+1 (3.22)
j−1) n − j+1×n

for j = 2, . . . , n − 1.
The other two submatrices are
Z21 · · · Zn1
 
T1 = R1 (3.23)
and
Tn = (0′ · · · 0′En−1 ) + en′ ⊗ en′
′ ′
= (0′ · · · 0′ enn ) + 0′ · · · 0′ enn
 
 ′
0′
= 1×n(n−1) 2enn . (3.24)

The second explicit expression for Ln Nn is then


⎛ ⎞
T1
1⎜ ⎟
Ln Nn = ⎝ ... ⎠ . (3.25)
2
Tn
j j
The matrices Z2 . . . Zn− j+1 are interesting in themselves but we reserve our
study of the properties of these matrices until a future section on duplication
matrices where they appear again. The reader may also be surprised that after
all the trouble we have gone through to obtain explicit expressions for Ln Nn ,
we do not use them to get insights into how the elimination matrix Ln Nn
2
Comments made about E j apply equally well to Zij . Each Zij depends on n, but this
parameter will be clear from the content.

interacts with Kronecker products. The mathematics is simpler if we use


known properties of Ln and Nn . However, we shall certainly use the explicit
expressions for Ln Nn in the last subsection where we make comparisons
between 2 Ln Nn and the duplication matrix Dn .
But first Kronecker products and Ln Nn . The notation introduced in Equa-
tions 3.3 and 3.4 is used extensively in the theorems that follow.

Theorem 3.5 Let A, B, C, and D be $n\times p$, $n\times q$, $r\times n$, and $s\times n$ matrices, respectively. Then,
$$L_nN_n(A\otimes B)=\frac12\begin{pmatrix}
a^{1\prime}\otimes B+A\otimes b^{1\prime}\\
a^{2\prime}\otimes(B)_1+(A)_1\otimes b^{2\prime}\\
\vdots\\
a^{n\prime}\otimes(B)_{n-1}+(A)_{n-1}\otimes b^{n\prime}
\end{pmatrix}$$
and
$$(C\otimes D)N_nL_n'=\frac12\bigl(c_1\otimes D+C\otimes d_1\;\;\;c_2\otimes(D)^1+(C)^1\otimes d_2\;\cdots\;c_n\otimes(D)^{n-1}+(C)^{n-1}\otimes d_n\bigr).\tag{3.26}$$

Proof: From the definition of Nn given in Section 2.5.7 of Chapter 2, we


have
1 1
Ln Nn (A ⊗ B) = Ln (A ⊗ B + Knn (A ⊗ B)) = Ln (A ⊗ B + (B ⊗ A)Kqp )
2 2
′ ′
a1 ⊗ B b1 ⊗ A
⎡⎛ ⎞ ⎛ ⎞ ⎤
′ ′
⎜ 2 ⎟ ⎜ 2
1⎢⎢⎜ a ⊗ (B)1 ⎟ ⎜ b ⊗ (A)1 ⎟
⎟ ⎥
= ⎢⎜ + K

.. .. qp ⎥
2 ⎣⎝
⎟ ⎜ ⎟
. ⎠ ⎝ . ⎠ ⎦
′ ′
an ⊗ (B)n−1 bn ⊗ (A)n−1
′ ′
a1 ⊗ B + A ⊗ b1
⎛ ⎞
′ ′
1⎜ a2 ⊗ (B)1 + (A)1 ⊗ b2 ⎟
=
⎜ ⎟
..
2⎝
⎜ ⎟
. ⎠
′ ′
an ⊗ (B)n−1 + (A)n−1 ⊗ bn

where we have used Theorem 3.2 and Equations 2.11 and 2.14 of Chapter 2
in our working.

In a similar manner, using Theorem 3.3


1
(C ⊗ D)Nn Ln′ = (C ⊗ D + Krs (D ⊗ C )Ln′
2
1 
c1 ⊗ D c2 ⊗ (D)1 · · · cn ⊗ (D)n−1

=
2

+ Krs d1 ⊗ C d2 ⊗ (C )1 · · · dn ⊗ (C )n−1


1
= c ⊗ D + C ⊗ d1 c2 ⊗ (D)1
2 1

+ (C )1 ⊗ d2 · · · cn ⊗ (D)n−1 + (C )n−1 ⊗ dn . 

Theorem 3.6 Let A and B be n×n matrices and write


⎛ ⎞
C1
′ ⎜ .. ⎟
Ln Nn (A ⊗ B)Nn Ln = ⎝ . ⎠ .
Cn

Then, the jth submatrix Cj is the n − j + 1 × 12 n(n + 1) matrix given by

1  1
· · · a jn (B) j−1 )n−1
  
Cj = a j1 (B) j−1 a j2 (B) j−1
4
1
· · · b jn (A) j−1 )n−1
  
+ b j1 (A) j−1 b j2 ((A) j−1
′ ′ ′
+ (a1 ) j−1 ⊗ b j (a2 ) j−1 ⊗ (b j )1 · · · (an ) j−1 ⊗ (b j )n−1
 

′ ′ ′ 
+ (b1 ) j−1 ⊗ a j (b2 ) j−1 ⊗ (a j )1 · · · (bn ) j−1 ⊗ (a j )n−1


for j = 1, . . . , n.

Proof: From Equation 2.50 of Chapter 2, we have


⎛ 1′ ′ ′ ′
a ⊗ B + A ⊗ b1 + B ⊗ a1 + b1 ⊗ A

1 ⎜ .. ..
Ln Nn (A ⊗ B)Nn Ln′ = Ln ⎝
⎟′
. . ⎠Ln
4 n′ n′ n′ n′
a ⊗B+A⊗b + B⊗a +b ⊗A
′ ′ ′ ′
a1 ⊗ B + A ⊗ b1 + B ⊗ a1 + b1 ⊗ A
⎛ ⎞
′ ′ ′ ′
2 2 2 2
1⎜⎜ E1 (a ⊗ B + A ⊗ b + B ⊗ a + b ⊗ A) ⎟ ′

= ⎜ .. ⎟Ln
4⎝ . ⎠
′ ′ ′ ′
En−1 (an ⊗ B + A ⊗ bn + B ⊗ an + bn ⊗ A)

where we have used the representation of Ln given by Equation 3.1. Using


the properties of Ej given by Equations 3.10 and 3.11, we write

Ln Nn (A ⊗ B)Nn Ln′
′ ′ ′ ′
a1 ⊗ B + A ⊗ b1 + B ⊗ a1 + b1 ⊗ A
⎛ ⎞
2′ ′ ′ ′
1⎜ a ⊗ (B)1 + (A)1 ⊗ b2 + (B)1 ⊗ a2 + b2 ⊗ (A)1 ⎟
⎟ ′
= ⎟ Ln .

..
4⎝

. ⎠
′ ′ ′ ′
an ⊗ (B)n−1 + (A)n−1 ⊗ bn + (B)n−1 ⊗ an + bn ⊗ (A)n−1

The jth submatrix of this matrix is

1  j′ ′ ′ ′

= a ⊗ (B) j−1 + (A) j−1 ⊗ b j + (B) j−1 ⊗ a j + b j ⊗ (A) j−1 Ln′ .
4
Applying Theorem 3.1 gives the result. 

Often, in the application of matrix calculus to statistics, as we shall see


in Chapter 6, we are confronted with matrices like Ln Nn (A ⊗ A)Nn Ln′ and
Ln (A ⊗ A)Ln′ .
It is informative to spend a little time comparing these two matrices for
this special case. From the properties of Nn discussed in Section 2.5.7 of
Chapter 2, we have that

Ln Nn (A ⊗ A)Nn Ln′ = Ln (A ⊗ A)Nn Ln′


1 1
= Ln (A ⊗ A)Ln′ + Ln (A ⊗ A)Knn Ln′ .
2 2
Using Theorem 3.2, we have

Ln (A ⊗ A)Knn Ln′
′ ′
a1 ⊗ A A ⊗ a1
⎛ ⎞ ⎛ ⎞
′ ′
⎜ a2 ⊗ (A) ⎟ ⎜ (A)1 ⊗ a2 ⎟
1
⎟ Knn Ln′ = ⎜
⎟ ′
=⎜ ⎟ Ln
⎜ ⎟ ⎜
.. ..
⎝ . ⎠ ⎝ . ⎠
′ ′
an ⊗ (A)n−1 (A)n−1 ⊗ an
′ ′ ′
a1 ⊗ a 1 a2 ⊗ (a1 )1 an ⊗ (a1 )n−1
⎛ ⎞
···
⎜ (a ) ⊗ a2′ ′
(a2 )1 ⊗ (a2 )1 ···

(an )1 ⊗ (a2 )n−1 ⎟
⎜ 1 1
=⎜ ⎟.

.. .. ..
⎝ . . . ⎠
′ ′ ′
(a1 )n−1 ⊗ an (a2 )n−1 ⊗ (an )1 ··· (an )n−1 ⊗ (an )n−1

Putting our pieces together, we have that


1
Ln Nn (A ⊗ A)Nn Ln′ = Ln (A ⊗ A)Ln′
2
′ ′ ′
a1 ⊗ a1 a2 ⊗ (a1 )1 an ⊗ (a1 )n−1
⎛ ⎞
···
′ ′ ′
2
1⎜⎜ (a1 )1 ⊗ a (a2 )1 ⊗ (a2 )1 ··· (an )1 ⊗ (a2 )n−1 ⎟
+ ⎜ ⎟.

.. .. ..
2⎝ . . . ⎠
′ ′ ′
(a1 )n−1 ⊗ an (a2 )n−1 ⊗ (an )1 ··· (an )n−1 ⊗ (an )n−1
(3.27)
Consider, for example, the case where A is a 2×2 matrix. By Theorem 3.4,
we have
a12 (A)1
 
′ a11 A
L2 (A ⊗ A)L2 =
a21 (A)1 a22 ((A)1 )1
⎛    ⎞
a11 a12 a
⎜a11 a a21 12 ⎟
=⎝ 21 a22 a22 ⎠
 
a21 a21 a22 a22 a22
⎛ 2 ⎞
a11 a11 a12 a21 a12
= ⎝a11 a21 a11 a22 a21 a22 ⎠ .
2 2
a21 a21 a22 a22

By Equation 3.27,
′ ′
a1 ⊗ a1 a2 ⊗ (a1 )1
 
1 1
L2 N2 (A ⊗ A)N2 L2′ = L2 (A ⊗ A)L2′ + ′ ′
2 2 (a1 )1 ⊗ a2 (a2 )1 ⊗ (a2 )1
⎛ ⎞
a11 (a11 a12 ) a21 a12
1 1
= L2 (A ⊗ A)L2′ + ⎝a21 (a11 a12 ) a22 a12 ⎠
2 2
a21 (a21 a22 ) a22 a22
⎛ 2 ⎞
a11 a11 a12 a21 a12
1 1
= L2 (A ⊗ A)L2′ + ⎝a21 a11 a21 a12 a22 a12 ⎠
2 2 2 2
a21 a21 a22 a22
⎛ 2 ⎞
a11 a11 a12 a21 a12

=⎜ a11 a22 + a21 a12 a21 a22 + a22 a12 ⎟ ⎟.
⎝a11 a21
2 2

2 2
a21 a21 a22 a22
(3.28)
Note if A is 2×2 and symmetric then only the (2, 2) element differs in these
two matrices.
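The 2×2 comparison just made generalizes and is easy to inspect numerically. The sketch below (my own Python/NumPy code) builds both matrices for a randomly generated symmetric A and prints an element-by-element comparison; for n = 2 the two matrices indeed agree everywhere except in the (2,2) position.

```python
import numpy as np

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

n = 2
Ln = L(n)
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
A0 = np.random.default_rng(4).standard_normal((n, n))
A = A0 + A0.T                                   # symmetric
M1 = Ln @ np.kron(A, A) @ Ln.T                  # L_n (A (x) A) L_n'
M2 = Ln @ Nn @ np.kron(A, A) @ Nn @ Ln.T        # L_n N_n (A (x) A) N_n L_n'
print(np.isclose(M1, M2))                       # False only in the (2,2) position
```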

3.2.3 The Elimination Matrices $\bar L_n$ and $\bar L_nN_n$

Comparing vec A with v(A), we see that $\bar L_n$ is the $\tfrac12n(n-1)\times n^2$ matrix given by
$$\bar L_n=\begin{pmatrix}
E_1&O&\cdots&\cdots&O\\
O&E_2&\cdots&\cdots&O\\
\vdots&\vdots&\ddots&\vdots&\vdots\\
0'&0'&\cdots&E_{n-1}&0'
\end{pmatrix}.$$
For example,
$$\bar L_3=\begin{pmatrix}E_1&O&O\\0'&E_2&0'\end{pmatrix}
=\begin{pmatrix}0&1&0&0&0&0&0&0&0\\0&0&1&0&0&0&0&0&0\\0&0&0&0&0&1&0&0&0\end{pmatrix}.$$

Properties of Ln can be obtained from the properties of E j in much the same


way as we derived properties for Ln .
If A, B, and C are the matrices given by Equations 3.12, 3.13, and 3.14,
respectively, then
⎛ ⎞
(A1 )1
⎜ (A2 )2 ⎟
Ln A = ⎜
⎜ ⎟
.. ⎟
⎝ . ⎠
(An−1 )n−1

= (B1 ) . . . (Bn−1 )n−1
1
 
BLn

⎛  1  n−1 ⎞
(C11 )1 ··· (C1n−1 )1
 1  n−1 ⎟
⎜ (C21 )2 ··· (C2n−1 )2

′ ⎟
LnCLn = ⎜
⎜ .. ⎟.
.

⎝ ⎠
 1  n−1
(Cn−11 )n−1 · · · (Cn−1n−1 )n−1
If A and B are n× p and n×q matrices, respectively, then
⎛ 1′ ⎞
a ⊗ (B)1
Ln (A ⊗ B) = ⎝
⎜ .. ⎟
. ⎠

an−1 ⊗ (B)n−1
and if C and D are r ×n and s×n matrices, respectively, then

(C ⊗ D)Ln = c1 ⊗ (D)1 ··· cn−1 ⊗ (D)n−1 .

Finally, if A and B are both n×n matrices, then


⎛  1  n−1 ⎞
a11 (B)1 ··· a1n−1 (B)1

Ln (A ⊗ B)Ln = ⎝ .. ..
⎠.
⎜ ⎟
. .
 1  n−1
an−11 (B)n−1 · · · an−1n−1 (B)n−1
In a similar manner, properties can be obtained for the elimination matrix
Ln Nn . If A and B n× p and n×q matrices, respectively, then
′ ′
a1 ⊗ (B)1 + (A)1 ⊗ b1
⎡ ⎤
1⎢ ..
Ln Nn (A ⊗ B) = ⎣ ⎦.

2 .
′ ′
n−1 n−1
a ⊗ (B)n−1 + (A)n−1 ⊗ b
If C and D are r ×n and s×n matrices, respectively, then
′ 1
(C ⊗ D)Nn Ln = c ⊗ (D)1 + (C )1 ⊗ d1 · · · cn−1 ⊗ (D)n−1
2 1
+ (C )n−1 ⊗ dn−1 .


If A and B are both n×n matrices and we write


⎛ ⎞
C1
Ln Nn (A ⊗ B)Nn Ln = ⎝ ... ⎠
′ ⎜ ⎟

Cn−1
then the submatrix Cj is a the n − j × 21 n(n − 1) given by
1  1  n−1
Cj = a j1 (B) j · · · a jn−1 (B) j
4
 1  n−1
+ b j1 (A) j · · · b jn−1 (A) j
′ ′
+ (b1 ) j ⊗ (a j )1 ··· (bn−1 ) j ⊗ (a j )n−1
′ ′

+ (a1 ) j ⊗ (b j )1 ··· (an−1 ) j ⊗ (b j )n−1 ,

for j = 1, . . . , n − 1.
For the special case A ⊗ A, we have

Ln Nn (A ⊗ A)Nn Ln
1 ′ 1 ′ 1 ′
= Ln (A ⊗ A)Ln + Ln (A ⊗ A)Knn Ln = Ln (A ⊗ A)Ln
2 ⎛ 2 2
′ ′
(a1 )1 ⊗ (a1 )1 (an−1 )1 ⊗ (a1 )n−1

···
1⎜ .. ..
+ ⎝ ⎠.

2 . .
n−1′ 1 n−1′ n−1
(a1 )n−1 ⊗ (a ) · · · (an−1 )n−1 ⊗ (a )

′ ′
Consider Ln (A ⊗ A)Ln and Ln Nn (A ⊗ A)Nn Ln for the 3×3 case. First,
  1  2 
′ a11 (A)1 a12 (A)1
L3 (A ⊗ A)L3 =  1  2
a21 (A)2 a22 (A)2
⎛    ⎞
a22 a23 a23
a a
= ⎝ 11 a32 a33 12
a33 ⎠
 
a21 a32 a33 a22 a33
⎛ ⎞
a11 a22 a11 a23 a12 a23
= ⎝a11 a32 a11 a33 a12 a33 ⎠ .
a21 a32 a21 a33 a22 a33

Now,
  ′ 1  ′ 2 
′ (a1 )1 ⊗ a1 (a2 )1 ⊗ a1
L3 (A ⊗ A)K33 L3 =  ′ 1  ′ 2
(a1 )2 ⊗ a2 (a2 )2 ⊗ a2
⎛   ⎞
a21 a12 a13 a22 a13
⎜  
= ⎝a31 a12 a13 a32 a13 ⎠

 
a31 a22 a23 a32 a23
⎛ ⎞
a21 a12 a21 a13 a22 a13
= ⎝a31 a12 a31 a13 a32 a13 ⎠ ,
a31 a22 a31 a23 a32 a23

so

L3 N3 (A ⊗ A)N3 L3
⎛ ⎞
a a + a21 a12 a11 a23 + a21 a13 a12 a23 + a22 a13
1 ⎝ 11 22
= a11 a32 + a31 a12 a11 a33 + a31 a13 a12 a33 + a32 a13 ⎠ .
2
a21 a32 + a31 a22 a21 a33 + a31 a23 a22 a33 + a32 a23

The comments we made regarding $L_n$ and $L_nN_n$ for a symmetric matrix A hold for $\bar L_n$ as well. If A is symmetric,
$$\bar L_n\operatorname{vec}A=\bar L_nN_n\operatorname{vec}A=v(A),$$
with $\bar L_nN_n$ the elimination matrix that recognizes the fact that A is symmetric.
To find an explicit expression for the matrix $\bar L_nN_n$, we take a different approach than the one we used for $L_nN_n$.

Using Equation 2.43 of Chapter 2, we write


′ ′
In ⊗ e1n + e1n ⊗ In
⎛ ⎞⎛ ⎞
E1 O O
1⎜ .. .. ⎟ ⎜ ..
Ln Nn = ⎝

. . .
2
⎠⎝ ⎠
′ ′
O En−1 0 ′ n n
In ⊗ en + en ⊗ In
n′ n′
⎛   ⎞
E1 In ⊗ e1 + e1 ⊗ In
1⎜ ..
= ⎝

2 . ⎠
′ ′
n n
 
En−1 In ⊗ en−1 + en−1 ⊗ In
′ ′
(In )1 ⊗ e1n + e1n ⊗ (In )1
⎛ ⎞
1⎜ ..
= ⎝ ⎠,

2 .
n′ n′
(In )n−1 ⊗ en−1 + en−1 ⊗ (In )n−1
where we have used Equations 3.10 and 3.11.
We have seen that (In ) j is the n − j × n matrix given by (In ) j =
O In−j = E , so the jth block of this matrix is the n − j × n2 matrix
 
n−j × j j
given by
′ ′
O In− j ⊗ e nj + e nj ⊗ O In− j
   
   
n′
 
= O In−j ⊗ e j + O O In−j O
n−j× jn n−j×( j−1)n n−j×n(n−j )
 
n′
= O In−j In− j ⊗ e j = Qj, (3.29)
(n−j )× j(n+1)−n

say, for j = 1, . . . , n − 1. Our explicit expression for Ln Nn is then


⎛ ⎞
Q1
1⎜
Ln Nn = ⎝ ... ⎠ .

2
Qn−1

3.2.4 The Elimination Matrix $L^*_n$

Comparing vech A with v(A), we see that $L^*_n$ is the $\tfrac12n(n-1)\times\tfrac12n(n+1)$ matrix given by
$$L^*_n=\begin{pmatrix}
F_1&O&\cdots&\cdots&0\\
O&F_2&\cdots&\cdots&0\\
\vdots& &\ddots& &\vdots\\
0'&0'&\cdots&F_{n-1}&0
\end{pmatrix}\tag{3.30}$$

where $F_j$ is the $(n-j)\times(n-j+1)$ matrix given by
$$F_j=\bigl(\underset{n-j\times1}{O}\;\;\underset{n-j\times n-j}{I_{n-j}}\bigr),$$
for $j=1,\dots,n-1$.
Clearly, if A is an $(n-j+1)\times p$ matrix, then
$$F_jA=(A)_1.\tag{3.31}$$
It follows then that
$$F_jE_{j-1}=(E_{j-1})_1=E_j,$$
for $j=2,\dots,n-1$, and that $F_1=E_1$.


Using these properties, we have
$$L^*_nL_n=\begin{pmatrix}
F_1&O&\cdots&\cdots&0\\
O&F_2&\cdots&\cdots&0\\
\vdots& &\ddots& &\vdots\\
0'&0'&\cdots&F_{n-1}&0
\end{pmatrix}
\begin{pmatrix}
I_n&\cdots&\cdots&O\\
\vdots&E_1& &\vdots\\
\vdots& &\ddots&\vdots\\
O&\cdots&\cdots&E_{n-1}
\end{pmatrix}
=\begin{pmatrix}
E_1&O&\cdots&\cdots&O\\
O&E_2&\cdots&\cdots&O\\
\vdots& &\ddots& &\vdots\\
0'&0'&\cdots&E_{n-1}&0'
\end{pmatrix}=\bar L_n,$$
so
$$L^*_nL_nN_n=\bar L_nN_n.\tag{3.32}$$

3.3 Duplication Matrices

3.3.1 The Duplication Matrix Dn


A matrix as complicated as Ln Nn is the duplication matrix Dn (in fact, we
shall see in the last section of this chapter that the two matrices bear many
similarities). The duplication matrix Dn is the n2 × n(n + 1)/2 zero-one

matrix that takes us from vechA to vecA for the case where A is a symmetric
matrix. Recall that vechA is the n(n + 1)/2 × 1 vector given by
⎛ ⎞
a11
⎜ .. ⎟
⎜ . ⎟
⎜ ⎟
⎜an1 ⎟
⎜ ⎟
⎜ a22 ⎟
vech A = ⎜ . ⎟
⎜ ⎟
⎜ .. ⎟
⎜ ⎟
⎜a ⎟
⎜ n2 ⎟
⎜ . ⎟
⎝ .. ⎠
ann
whereas vecA is n2 × 1 vector given by
⎛ ⎞
a11
⎜ .. ⎟
⎜ . ⎟
⎜ ⎟
⎜an1 ⎟
⎜ ⎟
vec A = ⎜ ... ⎟ .
⎜ ⎟
⎜ ⎟
⎜a ⎟
⎜ 1n ⎟
⎜ . ⎟
⎝ .. ⎠
ann
Comparing vechA with vecA, we see that we can write Dn as follows:
⎛ ⎞
In O O ··· O 0

⎜e n 0′ 0′ · · · 0′ 0⎟
⎜2 ⎟
⎜O I O · · · O 0⎟
⎜ ′ n−1 ⎟
⎜e n 0′ 0′ · · · 0′ 0⎟
⎜3 ⎟
⎜ 0′ e n−1′ 0 ′
· · · 0 ′
0 ⎟
⎜ 2 ⎟
⎜O O In−2 · · · O 0⎟
⎜ ⎟
⎜ .. .. ⎟
. .⎟
⎜ ⎟

Dn = ⎜
⎜en n′
0 ′
0 ′
· · · 0 0⎟ .
′ ⎟ (3.33)

⎜ ′ n−1
0′ · · · 0′ 0⎟

⎜ 0 en−1
⎜ ⎟
⎜ ⎟
⎜ .. .. ⎟
⎜ ⎟
..
⎜ . . .⎟
⎜ .. .
⎜ ⎟
.. .. ⎟
⎜ . . ⎟
2′
⎜ ⎟
⎝ e2 ⎠
′ ′ ′ ′
0 0 0 ··· 0 1

For example,
$$D_3=\begin{pmatrix}
I_3&O&0\\
e_2^{3\prime}&0'&0\\
O&I_2&0\\
e_3^{3\prime}&0'&0\\
0'&e_2^{2\prime}&0\\
0'&0'&1
\end{pmatrix}
=\begin{pmatrix}
1&0&0&0&0&0\\
0&1&0&0&0&0\\
0&0&1&0&0&0\\
0&1&0&0&0&0\\
0&0&0&1&0&0\\
0&0&0&0&1&0\\
0&0&1&0&0&0\\
0&0&0&0&1&0\\
0&0&0&0&0&1
\end{pmatrix}.$$
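Generating $D_n$ from its defining property — place the element $a_{ij}$ of vech A into both the (i, j) and (j, i) slots of vec A — reproduces the matrix just displayed. The construction below is my own Python/NumPy sketch; it checks $D_3$ by printing it and verifies $D_n\operatorname{vech}A=\operatorname{vec}A$ for a symmetric A.

```python
import numpy as np

def vec(A):  return A.reshape(-1, order='F')
def vech(A): return np.concatenate([A[j:, j] for j in range(A.shape[0])])

def D(n):
    # D_n vech(A) = vec(A) for symmetric A.
    pos, r = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = r; r += 1          # position of a_{ij} (i >= j) in vech
    M = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            M[j * n + i, pos[(max(i, j), min(i, j))]] = 1
    return M

n = 3
A0 = np.random.default_rng(5).standard_normal((n, n))
A = A0 + A0.T
assert np.allclose(D(n) @ vech(A), vec(A))
print(D(3).astype(int))      # reproduces the 9 x 6 matrix displayed above
```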

A close inspection of $D_n$ shows that we can write the matrix in the following way:
$$D_n=\begin{pmatrix}H_1\\ \vdots\\ H_n\end{pmatrix}=(M_1\;\cdots\;M_n)\tag{3.34}$$
where $H_1$ is the $n\times\tfrac12n(n+1)$ matrix given by
$$H_1=\bigl(I_n\;\;O\bigr)\tag{3.35}$$
and $H_j$ is the $n\times\tfrac12n(n+1)$ matrix given by
$$H_j=\bigl(G_j\;\;O\bigr)\tag{3.36}$$
where $G_j$ is the $n\times\tfrac{j}{2}(2n+1-j)$ matrix given by
$$G_j=\begin{pmatrix}
e_j^{n\prime}& & & &O\\
 &e_{j-1}^{(n-1)\prime}& & &\\
 & &\ddots& &\\
 & & &e_2^{(n-j+2)\prime}&\\
O& & & &I_{n-j+1}
\end{pmatrix},\tag{3.37}$$
for $j=2,\dots,n$.

In the alternate representation, Mj is the n2 × n − j + 1 matrix given by


⎛ ⎞
O
⎜n( j−1)×(n− j+1)⎟
⎜ O
⎜ ( j−1)×(n−

j+1) ⎟
O
⎜ ⎟ ⎛ ⎞
⎜ I ⎟
⎜ n− j+1 ⎟
⎟ ⎜ O ⎟
O

⎜ ⎟ ⎜ ⎟
⎜ ( j−1)×(n− j+1) ⎟ ⎜ In− j+1 ⎟
⎜ ′
⎟ ⎜ ⎟
M j = ⎜ e2n− j+1 ⎟ = ⎜ Z j′ ⎟ , (3.38)
⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ 2 ⎟
O ⎟ ⎜ . ⎟
⎜ (n− j )×(n− j+1) ⎟ ⎜ .. ⎟

⎜ .. ⎟ ⎝ ⎠
. j′
⎜ ⎟
⎜ ⎟ Zn− j+1

⎜ O ′ ⎟ ⎟
⎜ e n− j+1 ⎟
⎝ n− j+1 ⎠
O
for j = 2, . . . , n − 1 and
In


⎛⎞

⎜Z 1 ⎟ 0
and Mn = ⎝ ... ⎠ .
⎜ 2⎟
M1 = ⎜ . ⎟ (3.39)
⎜ ⎟
⎝ .. ⎠
′ En−1′
Zn1
j j
The matrices Z2 . . . Zn− j+1 are given by Equations 3.20 and 3.21 for j =
1, . . . , n − 1.
The matrices Hj and Mj for j = 1, . . . , n are interesting in themselves and
worthy of study. Consider the former one first. If A is a n × p matrix, then
   
′ In A
H1 A = A= (3.40)
O O
and
   
G ′j G ′j A
H j′ A = A= (3.41)
O O
where
⎛ ⎞
e nj O ′
e nj a1
⎛ 1′ ⎞ ⎛ ⎞
a

⎜ e n−1
j−1

⎟⎜ . ⎟ ⎜ ..
⎜ .. ⎟ ⎜ .. ⎟ ⎜ .

G ′j A = ⎜ =

. ⎟⎜
⎟ ⎝ j−1′ ⎟ ⎜ n− j+2
j−1 ′

⎟ a e2 a ⎠
⎜ ⎠ ⎝
⎜ n− j+2
⎝ e2 ⎠
(A) j−1 (A) j−1
O In− j+1
(3.42)

for j = 2, . . . , n. Also, if x is a n × 1 matrix


H j′ (x ′ ⊗ A) = x ′ ⊗ H j′ A, (3.43)
for j = 1, . . . , n.
Taking the transposes of Equations 3.40, 3.41, 3.42, and 3.43, we get that
if B is a p × n matrix, then
BH1 = (B O)

BH j = (BG j O)
where
′ ⎞′
b1 e nj

⎜ .. ⎟

BG j = ⎜
⎜ . ⎟

′⎟
n− j+2
⎝b j−1 e2 ⎠
(B) j−1
for j = 2, . . . , n. and
(x ⊗ B)H j = x ⊗ BH j
for j = 1, . . . , n.
We write the other matrix Mj as
⎛ ⎞
O
⎜ E ′j−1 ⎟
⎜ ⎟
j′ ⎟
Z

Mj = ⎜ 2 ⎟

⎜ .. ⎟

⎝ . ⎠
j′
Zn− j+1
j′
for j = 2, . . . , n − 1, where from Equation 3.20 Zi is the n × n − j + 1
matrix given by
⎞3
O

⎜ ( j−1)×(n− j+1)⎟
j′
⎜ ⎟
Zi = ⎜ n− j+1′
⎜ ei


⎝ ⎠
O
(n− j )×(n− j+1)

for j = 2, . . . , n − j + 1.
3
The remarks made about Ej clearly refer to Hj , Gj , Mj and Zij as well. All these matrices are
dependent on n, but for simplicity of notation, this is not indicated, the relevant n being
clear from the content.

j
It is now time to investigate some of the properties of Zi . First, if A is a
n × p matrix, then
⎛ ′⎞
0
⎜ .. ⎟
  ⎜ . ⎟
n− j+1 j ′
j n− j+1
⎜ j ′ ⎟ th
Zi A = O ei O A = ei a =⎜ ⎜a ⎟ i
⎟ (3.44)
⎜ .. ⎟
⎝ . ⎠
0′

Clearly, if x is a n × 1 vector
′ ′
Zij (x ′ ⊗ A) = x ′ ⊗ Zij A = x ′ ⊗ ein− j+1 ⊗ a j .

Taking the transposes, we get


′ ′ ′ ′
BZij = b j ein− j+1 = b j ⊗ ein− j+1 = ein− j+1 ⊗ b j = (0 · · · b j · · · 0)

and
′ ′
(x ⊗ B)Zij = x ⊗ ein− j+1 ⊗ b j .

If A is a n2 × p matrix and we partition A as


⎛ ⎞
A1
⎜ .. ⎟
A=⎝ . ⎠
An

where each submatrix is n × p, then


⎛ ⎞
A1
⎜ .. ⎟
⎜ . ⎟
 ⎜
⎜ Aj ⎟

j j
M j′ A = O E j−1 Z2 · · · Zn− j+1 ⎜
⎜A ⎟

⎜ j+1 ⎟
⎜ . ⎟
⎝ .. ⎠
An
j j
= E j−1 A j + Z2 A j+1 + ··· + Zn− j+1 An

for j = 1, . . . , n − 1 and

Mn′ A = En−1 An .

Using Equations 3.3 and 3.44, we have


⎛ ⎞
O ⎛ ⎞
⎜(A j+1 ) j· ⎟ O
⎜ ⎟ ⎜ O ⎟
M j′ A = (A j ) j−1 + ⎜ O ⎟ + · · · + ⎜ .. ⎟
⎜ ⎟ ⎜ ⎟
⎜ .. ⎟ ⎝ . ⎠
⎝ . ⎠
(An ) j·
O
⎛ ⎞ ⎛ ⎞
(A j ) j· O
⎜(A j ) j+1· ⎟ ⎜(A j+1 ) j· ⎟
=⎜ ⎟+⎜
⎜ ⎟ ⎜ ⎟
.. .. ⎟
⎝ . ⎠ ⎝ . ⎠
(A j )n· (An ) j·
    ′ 
=
(A j )j· 
= (A j ) j−1 +  0( j )  (3.45)
(A j ) j + A( j ) j A j

for j = 1, . . . , n − 1, where we are using the notation introduced in Equa-


tion1.8 of Chapter 1, and
Mn′ A = (An )n−1 . (3.46)
If A and B are n × p and n × q matrices, respectively,
 ′

′ (a j ⊗ B) j ·
M j (A ⊗ B) = ′
(a j ⊗ B) j + (A ⊗ B)( j ) j
 

for j = 1, . . . , n − 1 and

Mn′ (A ⊗ B) = (an ⊗ B)n−1 .
As pointed out in Section 1.2 of Chapter 1,
′ ′ ′
(a j ⊗ B) j· = a j ⊗ b j .
From Equation 3.6, we have
′ ′
(a j ⊗ B) j = a j ⊗ (B) j
and from Equation 1.9 of Chapter 1, we have

(A ⊗ B)( j ) = A ⊗ b j
so using Equation 3.5, we have

(A ⊗ B)( j ) j = (A) j ⊗ b j
 

for j = 1, . . . , n − 1. Hence, we write


′ ′
a j ⊗ bj
   ′ 
j′ 0 ′
M j′ (A ⊗ B) = ′ ′ = a ⊗ (B) + ⊗ bj
a j ⊗ (B) j + (A) j ⊗ b j j−1 (A) j
(3.47)
for j = 1, . . . , n − 1 and

Mn′ (A ⊗ B) = an ⊗ (B)n−1 . (3.48)
If we take the transposes of both sides of Equations 3.45, 3.46, 3.47, and
3.48, we have the following results. If B is a p × n2 matrix and we partition
B as
 
B = B 1 · · · Bn
where each submatrix is p × n, then
BM j = (B j )· j (B j ) j + (B( j ) ) j
 
(3.49)
for j = 1, . . . , n − 1, where we have used the notation introduced by Equa-
tion 2.67 of Chapter 2, and
BMn = (Bn )n−1 .
If C and D are p × n and q × n matrices, respectively, then
(C ⊗ D)M j = c j ⊗ d j c j ⊗ (D) j + (C ) j ⊗ d j
 

= c j ⊗ (D) j−1 + (0 (C ) j ) ⊗ d j
 
(3.50)
for j = 1, . . . , n − 1 and
(C ⊗ D)Mn = cn ⊗ (D)n−1 .
Note that as special cases if x and y are both n × 1 vectors
    
′ xjyj 0
M j (x ⊗ y) = = x j (y) j−1 + y j
x j (y) j + (x) j y j (x) j
for j = 1, . . . , n − 1 and
Mn′ (x ⊗ y) = xn yn
whereas
  
(x ′ ⊗ y ′ )M j = x j y j x j (y ′ ) j + y j (x ′ ) j = x j (y ′ ) j−1 + y j 0′ (x ′ ) j
 

(3.51)

for j = 1, . . . , n − 1 and
(x ′ ⊗ y ′ )Mn = xn yn .
Now, consider the case where A and B are both n × n matrices, so we can
form M j′ (A ⊗ B)Mℓ . From Equation 3.47
 
′ a ′j ⊗ b ′j
M j (A ⊗ B)Mℓ = Mℓ
a ′j ⊗ (B) j + (A) j ⊗ b ′j

for j = 1, . . . , n − 1. But using Equation 3.51, we can write


 ′   ℓ  ℓ 
a j ⊗ b ′j Mℓ = a jℓ b jℓ a jℓ b ′j + b jℓ a ′j


for ℓ = 1, . . . , n − 1, whereas using Equation 3.50, we have


 ′    ℓ  ℓ 
a j ⊗ (B) j Mℓ = a jℓ (bℓ ) j a jℓ (B) j + a ′j ⊗ (bℓ ) j

and
 ℓ 
(A) j ⊗ b ′j Mℓ = b jℓ (aℓ ) j
  
(aℓ ) j ⊗ (b ′j )ℓ + (A) j b jℓ

again for j = 1, . . . , n − 1. Putting our pieces together we have that


M j′ (A ⊗ B)Mℓ
⎛ ′ ′

a jℓ b jℓ a (b j )ℓ + b jℓ (a j )ℓ
=⎝  ℓ  jℓ ℓ ′ ′

a jℓ (bℓ ) j + b jℓ (aℓ ) j a jℓ (B) j + b jℓ (A) j + (a j )ℓ ⊗ (bℓ ) j + (aℓ ) j ⊗ (b j )ℓ
    ′ ℓ   
0 0 0 ′
= a jℓ (bℓ ) j−1 + b jℓ (a ) a jℓ ((B) j−1 )ℓ + b jℓ ((A) j−1 )ℓ + j ′ ⊗ (bℓ ) j + (a ) ⊗ (b j )ℓ
ℓ j a ℓ j

for j = 1, . . . , n − 1, and ℓ = 1, . . . , n − 1.
The special cases are given by
Mn′ (A ⊗ B)Mℓ = anℓ bnℓ anℓ (bn′ )ℓ + bnℓ (an′ )ℓ
 
  ′ ℓ 
′ ℓ−1 0
= anℓ (bn ) + bnℓ ′
an

for ℓ = 1, . . . , n − 1 and
    
a jn b jn 0
M j′ (A ⊗ B)Mn = = a jn (bn ) j−1 + b jn
a jn (bn ) j + b jn (an ) j (an ) j
for j = 1, . . . , n − 1 and
Mn′ (A ⊗ B)Mn = ann bnn .

Using these properties for M1 , . . . , Mn , we can investigate how Dn interact


with Kronecker products. Consider two matrices A and B, which are n × p
and n × q, respectively, then
⎛ ′ ⎞
M1 (A ⊗ B)
Dn′ (A ⊗ B) = ⎝ ..
⎠.
⎜ ⎟
.
Mn′ (A ⊗ B)

Using the properties of Mj and Mn given by Equations 3.47 and 3.48, we


can write
′ ′
a1 ⊗ b1
⎛ ⎞
⎜a1′ ⊗ (B)1 + (A)1 ⊗ b1′ ⎟
⎜ ′ ′ ⎟

⎜ a2 ⊗ b2 ⎟
Dn (A ⊗ B) = ⎜a2 ⊗ (B) + (A) ⊗ b2 ⎟
⎜ ′ ′

⎜ 2 2 ⎟
⎜ .. ⎟
⎝ . ⎠

an ⊗ (B)n−1
 
O
⎛ ⎞
1′ ′
a ⊗B+ ⊗ b1
(A)
 1 
⎜ ⎟
⎜ ⎟

2′ O 2′

⎜ a ⊗ (B)1 + ⊗b ⎟
⎜ (A) 2

=⎜
⎜ ⎟
⎜ .
..


⎜   ⎟

⎜an−1′ ⊗ (B) O ⎟
n−1′ ⎟
n−2 + ⊗b ⎠
⎝ (A)n−1

an ⊗ (B)n−1

and if C and D are r × n and s × n matrices, respectively, then

(C ⊗ D)Dn = c1 ⊗ d1 c1 ⊗ (D)1 + (C )1 ⊗ d1 · · ·


cn−1 ⊗ dn−1 cn−1 ⊗ (D)n−1 + (C )n−1 ⊗ dn−1 cn ⊗ (D)n−1




= c1 ⊗ D + (O (C )1 ) ⊗ d1 · · ·


cn−1 ⊗ (D)n−2 + (O (C )n−1 ) ⊗ dn−1 cn ⊗ (D)n−1 .




If A and B are both n × n matrices , so Dn′ (A ⊗ B)Dn exists and if we write


this matrix as
⎛ ⎞
C1
′ ⎜ .. ⎟
Dn (A ⊗ B)Dn = ⎝ . ⎠ ,
Cn

then the submatrix Cj is the n − j + 1 × 21 n(n + 1) matrix given by



′ n−2
C j = a j1 (B) j−1 + 0′ (a j )1 ⊗ (b1 ) j−1 · · · a jn−1 (B) j−1
  

  
j ′ n−1 n−1 0 ′
⊗ bj
 ′ 
+ 0 (a ) ⊗ (bn−1 ) j−1 a jn ((B) j−1 ) +
(a1 ) j
  1 
0′
 
0 ′
+ 0 b j1 · · · ⊗ (b j )n−2
(A) j (an−1 ) j
  n−1

0′
   
0 j ′ n−1
+ 0 b jn−1 ⊗ (b ) (3.52)
(A) j (an ) j

for j = 1,. . . . , n and


 ′
Cn = an1 (B)n−1 + 0 (an )1 ⊗ (b1 )n−1 · · · ann−1 ((B)n−1 )n−2
 

′ n−1 
+ 0 (an )n−1 ⊗ (bn−1 )n−1 ann (B)n−1
  
 ′ n−2
= an1 (B)n−1 + 0 (an )1 ⊗ (b1 )n−1 · · · ann−1 (B)n−1
  

  
+ 0 ann bn−1n ann bnn . (3.53)

We see in Chapter 6 that the application of matrix calculus to statistics often


gives rise to the matrix Dn′ (A ⊗ A)Dn and in this expression, more often
than not, A is a symmetric matrix. Consider the case where A is a 2 × 2
matrix, not necessarily symmetric. Then, by Equations 3.52 and 3.53
 
′ C1
D2 (A ⊗ A)D2 =
C2
where

C1 = a11 (A)0 + 0 (a1 )1 ⊗ (a1 )0 a12 ((A)0 )1
   
    1    
0 ′ 0 0 ′
+ ⊗ a1 + 0 a11 ⊗ (a1 )1
(a1 )1 (A)1 (a2 )1
      
a a12 a a
= a11 11 + 0 a12 ⊗ 11 a12 12
 
a21 a22 a21 a22
      
0   0 0 0
+ ⊗ a11 a12 + a a
a21 0 a22 11 a22 12
 2 2

a11 2a11 a12 a12
=
2a21 a11 2a11 a22 + 2a21 a12 2a12 a22

and

C2 = a21 (A)1 + 0 (a2 )1 (a1 )1 a22 ((A)1 )1
   

2
 
= a21 (a21 a22 ) + (0 a22 )a21 a22
 2 2

= a21 2a21 a22 a22

so
2 2
⎛ ⎞
a11 2a11 a12 a12
D2′ (A ⊗ A)D2 = ⎝2a11 a21 2a11 a22 + 2a21 a12 2a12 a22 ⎠ . (3.54)
⎜ ⎟
2 2
a21 2a21 a22 a22

Comparing Equation 3.54 with Equation 3.28, we see that there are a lot of
similarities between L2 N2 (A ⊗ A)N2 L2′ and D2′ (A ⊗ A)D2 when A is sym-
metric. All the elements of these two matrices have the same combination
of the aij s, though the number of these combinations differs in the 2nd row
and the 2nd column. More will be made of this when we compare Ln Nn
with Dn as we do in Section 3.4.
Using our explicit expression for Ln , Ln Nn , and Dn , it is simple to prove
known results linking Dn with Ln and Nn .
For example,

H1
⎛ ⎞ ⎛ ⎞
In O ⎛ ⎞
H 1
⎜ E1 ⎟ ⎜ .. ⎟ ⎜ E1 H2 ⎟
⎟ ⎜ ⎟
Ln Dn = ⎜ = ⎟.

.. ⎠ .
⎟ ⎝ ⎠
⎝ ... ⎠

⎝ .
Hn
O En−1 En−1 Hn

But using Equations 3.3, 3.35 and 3.36, the matrix E j H j+1 is the n − j ×
1
2
n(n + 1) matrix given by
 
O In− j O
E j H j+1 = j 1
,
(n− j )× 2 (2n− j+1) (n− j )× 2 (n− j+1)(n− j )

so
⎛ ⎞
In O
⎜ In−1 ⎟
Ln Dn = ⎜ ⎟ = I 21 n(n+1) .
⎜ ⎟
..
⎝ . ⎠
O 1

Similarly,

⎞ ⎛
H1
1 ⎜ ⎟ 1
Ln Nn Dn = (P · · · Pn ) ⎝ ... ⎠ = (P1 H1 + · · · + Pn Hn )
2 1 2
Hn

Now, from Equations 3.18 and 3.35,

   
R1 R1 O
P1 H1 = (In O) =
O O O

and from Equations 3.16, 3.36, and 3.37.


e nj

O

⎜ e n−1
j−1


⎜ .. ⎟

Pj H j = ⎜ . ⎟

⎜ n− j+2 ⎟
⎜O e2 ⎟
⎜ ⎟
⎝ Rj⎠
O ··· O


e nj
⎛ ⎞
O O


⎜ e n−1
j−1



×⎜ .. .. ⎟
⎜ . .⎟

⎜ n− j+2′ ⎟
⎝O e2 ⎠
O In− j+1 O


e nj e nj
⎛ ⎞
O O


⎜ e n−1 n−1
j−1 e j−1


⎜ .. .. ⎟

=⎜ . .⎟

⎜ n− j+2 n− j+2′ ⎟
⎜ O e2 e2 ⎟
⎜ ⎟
⎝ Rj ⎠
O ··· O O

for j = 2, . . . , n, so

e2n e2n
⎛ ⎞
⎛ ⎞ O
R1 O ⎜ R2 ⎟
⎜ O ⎟ ⎜ ⎟
2Ln Nn Dn = ⎜

⎟+⎜
⎟ ⎜ O ⎟
.. ⎟
⎝ . ⎠ ⎜ .. ⎟
⎝ . ⎠
O O
O O
⎛ n n′ ⎞
e3 e3 O


⎜ e2n−1 e2n−1 ⎟

+⎜
⎜ R3 ⎟ + ···

⎜ .. ⎟
⎝ . ⎠
O O
⎛ n n′ ⎞
en en O
⎜ . .. ⎟
+⎜
⎜ ⎟


2 2
⎝ e2 e2 ⎠
O Rn
⎛ ⎞
In O
⎜ In−1

= 2⎜ ⎟ = 2I 21 n(n+1) ,
⎜ ⎟
..
⎝ . ⎠
O 1
using Equation 3.17 and the fact that Rn = 2, which gives the result
$$L_nN_nD_n=I_{\frac12n(n+1)}.\tag{3.55}$$

But such proofs are highly inefficient. A far more elegant approach, which
leads to simpler proofs is that of Magnus (1988), which concentrates on the
roles played by the various matrices. For example, for a symmetric matrix
A, we know that Nn vec A = vec A, Ln Nn vec A = vech A, and Dn vech A =
vec A. Thus, it follows that
Dn Ln Nn vec A = Nn vec A,
which gives the result
Dn Ln Nn = Nn .
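Both identities are quickly confirmed numerically; the sketch below (my own Python/NumPy helper functions, repeated here so the snippet is self-contained) also checks $L_nD_n=I_{\frac12n(n+1)}$, which was obtained on the way.

```python
import numpy as np

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

def D(n):
    pos, r = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = r; r += 1
    M = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            M[j * n + i, pos[(max(i, j), min(i, j))]] = 1
    return M

n = 4
Ln, Dn = L(n), D(n)
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
half = n * (n + 1) // 2
assert np.allclose(Ln @ Dn, np.eye(half))          # L_n D_n = I
assert np.allclose(Ln @ Nn @ Dn, np.eye(half))     # Equation 3.55
assert np.allclose(Dn @ Ln @ Nn, Nn)               # D_n L_n N_n = N_n
print("L_n D_n = I, L_n N_n D_n = I and D_n L_n N_n = N_n verified")
```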
For numerous other results linking $L_n$, $L_nN_n$, $\bar L_n$, $\bar L_nN_n$ and $D_n$, I can do no better than refer the reader to Magnus (1988).
Our approach, investigating explicit expression for elimination matrices
and duplication matrices, comes into its own when we want to highlight

the interaction of these matrices with Kronecker products. It also greatly


facilitates comparisons of the various zero-one matrices, particularly Ln Nn
and Dn as we see in the next section.
But first a new result, involving as it does Ln∗ .

Theorem 3.7
$$L^*_nD_n'=2\bar L_nN_n=2L^*_nL_nN_n.$$

Proof: Using Equations 3.30 and 3.34, we write


⎞ ⎛ ′⎞ ⎛
F1 M1′
⎛ ⎞
F1 O 0 M1
Ln∗ Dn′ = ⎝ .. .. ⎟ ⎜ .. ⎟ = ⎜ ..
⎠.
⎜ ⎟
. . ⎠⎝ . ⎠ ⎝ .
′ ′ ′ ′
0 Fn−1 0 Mn Fn−1 Mn−1
But using Equations 3.31 and 3.38,
 
n− j+1 n− j+1
Fj M j′ = Fj O O In− j+1 O e2 O · · · O en− j+1 O
  n− j+1   n− j+1  
= O O (In− j+1 )1 O e2 1
O · · · O en− j+1 1 O
 ′
= O In− j In− j ⊗ e nj = Q ,
(n− j )×( j(n+1)−n) j

the matrix given by Equation 3.29.


In obtaining this result, we have used Theorem 1.1 of Chapter 1. The
second part of the theorem was obtained earlier in Equation 3.32. 

Important consequences of Theorem 3.7 follow. If A and B are n × p and


n × q matrices, respectively, then

Ln∗ Dn′ (A ⊗ B) = 2Ln∗ Ln Nn (A ⊗ B)

and if C and D are r × n and s × n matrices respectively, then

(C ⊗ D)Dn Ln∗′ = 2(C ⊗ D)Nn Ln′ Ln∗′ .

If, however, A and B are both n × n matrices, then

Ln∗ Dn′ (A ⊗ B)Dn Ln∗′ = 4Ln∗ Ln Nn (A ⊗ B)Nn Ln′ Ln∗′ .

3.3.2 The Elimination Matrix Ln Nn and the Duplication Matrix Dn


If one compares Equations 3.19, 3.16, and 3.17 with Equations 3.34 and
3.39, or Equations 3.25, 3.22, 3.23, and 3.24 with Equations 3.34, 3.35,

3.36, and 3.37, one cannot help but notice the similarities between 2Ln Nn
and Dn′ .
In fact, these two matrices have most of their elements the same and the
elements that differ are strategically placed in the two matrices being 2 in
the matrix 2Ln Nn and being 1 in the matrix Dn′ . The following theorem
conveys this result.

Theorem 3.8 The matrix $2L_nN_n-D_n'$ is the $\tfrac12n(n+1)\times n^2$ block diagonal matrix given by
$$2L_nN_n-D_n'=\begin{pmatrix}
e_1^ne_1^{n\prime}& & & &O\\
 &e_1^{n-1}e_2^{n\prime}& & &\\
 & &e_1^{n-2}e_3^{n\prime}& &\\
 & & &\ddots&\\
O& & & &e_n^{n\prime}
\end{pmatrix}.$$

Proof: Consider Ln Nn and Dn as given by Equations 3.19 and 3.34, it follows


that

2Ln Nn = (P1 · · · Pn )

and

Dn′ = H1′
 
· · · Hn′ .

Now, from Equations 3.18 and 3.35,


   
R1 In
P1 − H1′ = − ,
O O

where R1 is the n × n matrix given by


⎛ ⎞
2 O
⎜ 1 ⎟
R1 = ⎜ ⎟.
⎜ ⎟
..
⎝ . ⎠
O 1


It follows that R1 − In = e1n e1n , so
 ′

e1n e1n
P1 − H1′ = .
O

From Equations 3.16, 3.36, and 3.37,

e nj O e nj O
⎛ ⎞ ⎛ ⎞
⎜ .. ..
.
⎟ ⎜
.

⎜ ⎟ ⎜ ⎟
Pj − H j′ = ⎜O n− j+2 ⎟ − ⎜O n− j+2
⎜ ⎟ ⎜ ⎟
⎜ e2 ⎟ ⎜ e2 ⎟

⎝ Rj⎠ ⎝ In− j+1 ⎠
O ··· ··· O O ··· ··· O

recalling that Rj is the n − j + 1 × n − j + 1 matrix, given by


⎛ ⎞
2 O
⎜ 1 ⎟
Rj = ⎜ ⎟,
⎜ ⎟
..
⎝ . ⎠
O 1

so
n− j+1 n− j+1′
R j − In− j+1 = e1 e1

and

O
⎛ ⎞
..
.
⎜ ⎟
⎜ ⎟
Pj − H j′ =⎜
⎜ ⎟
O ⎟
n− j+1 n− j+1′ ⎠
⎜ ⎟
⎝O ··· O e e
1 1
O ··· ··· O

for j = 2, . . . , n. But,
 
n− j+1 n− j+1′ n− j+1 n ′
O ··· O e1 e1 = e1 ej

for j = 2, . . . , n. 
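Theorem 3.8 can also be checked by direct computation: the difference $2L_nN_n-D_n'$ is block diagonal with exactly one unit entry per block, sitting in the first row of the block and in the j-th column of its n columns. The sketch below is my own Python/NumPy code printing the difference for n = 3.

```python
import numpy as np

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

def D(n):
    pos, r = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = r; r += 1
    M = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            M[j * n + i, pos[(max(i, j), min(i, j))]] = 1
    return M

n = 3
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
diff = 2 * L(n) @ Nn - D(n).T
print(diff.astype(int))
# Each diagonal block e_1^{n-j+1} e_j^{n'} contributes one unit entry,
# in the first row of the block and the j-th column of its n columns.
```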

Theorem 3.8 can be used to investigate the different ways 2Ln Nn and Dn′
interact with Kronecker products. For example,
⎛ n n′ ⎞ ⎛ 1′ ⎞
e1 e1 O a ⊗B
′ ′
⎜ e1n−1 e2n ⎟ ⎜ a2 ⊗ B ⎟
2Ln Nn − Dn′ (A ⊗ B) = ⎜
  ⎜ ⎟⎜ ⎟
.. ⎟ ⎜ .. ⎟
⎝ . ⎠ ⎝ . ⎠
n′ n′
O en a ⊗B
⎛ 1′ 1′ ⎞
⎛ n 1′ a ⊗b
n′

e1 a ⊗ e1 B ⎜ O ⎟
⎜ ⎟
⎜e n−1 a2′ ⊗ e n ′ B⎟ ⎜ a2′ ⊗ b2′ ⎟
⎜1 2 ⎟
=⎜ ⎟ = ⎜ O ⎟.
⎜ ⎟
..
⎝ . ⎠ ⎜
⎜ ..


n′ n′ .
a ⊗ en B ⎝ ⎠
n′ n′
a ⊗b
If we partition Ln Nn (A ⊗ B) as in Theorem 3.5, then the jth submatrix of
2Ln Nn (A ⊗ B) is
′ ′
a j ⊗ (B) j−1 + (A) j−1 ⊗ b j .
To obtain the equivalent jth submatrix of Dn′ (A ⊗ B), we subtract
a j ′ ⊗ b j ′ 
from it. That is, Dn′ (A ⊗ B) is the same matrix as 2Ln Nn (A ⊗ B)
O

except in the jth submatrix of 2Ln Nn (A ⊗ B), the first row of (A) j−1 ⊗ b j ,
′ ′
which is a j ⊗ b j , is replaced by the null vector.
By a similar analysis,
(C ⊗ D)Dn
= 2(C ⊗ D)Nn Ln′ − (c1 ⊗ d1 O ··· cn−1 ⊗ dn−1 O cn ⊗ dn )
If we use the partitioning of (C ⊗ D)Nn Ln′ given by Theorem 3.5, then the
jth submatrix of 2(C ⊗ D)Nn Ln′ is c j ⊗ (D) j−1 + (C ) j−1 ⊗ d j .
 To obtain the equivalent jth submatrix for (C ⊗ D)Dn , we subtract
cj ⊗ dj O .
In other words, (C ⊗ D)Dn is the same matrix as (C ⊗ D)Nn Ln′ except
in each jth submatrix the first column of (C ) j−1 ⊗ d j , which is c j ⊗ d j , is
replaced by the null vector.
Further comparisons can be made. If we continue to write, Dn′ =
(H1′ · · · Hn′ ) and 2Ln Nn = (P1 · · · Pn ), then
   
R1 I
P1 = and H1′ = n
O O

so, clearly
⎛ n′ ⎞
2e1
⎜ en ′ ⎟
P1 = H1′ ⎜ 2. ⎟ .
⎜ ⎟
⎝ .. ⎠

enn
But,
e nj
⎛ n
O ej O
⎛ ⎞ ⎞
⎜ .. ..
.
⎟ ⎜
.

⎜ ⎟ ⎜ ⎟
Pj = ⎜ n− j+2 ⎟ and H j′ = ⎜ n− j+2
⎜ ⎟ ⎜ ⎟
⎜ e2 ⎟ ⎜ e2 ⎟

⎝O Rj⎠ ⎝O In− j+1 ⎠
O ··· ··· O O ··· ··· O
so
′ ⎞
e1n

⎜ .. ⎟
⎜ . ⎟
′ ⎜ n′ ⎟
⎜ ⎟
Pj = H j ⎜2e j ⎟
⎜ . ⎟
⎝ .. ⎠

enn
for j = 2, . . . , n.
It follows that
⎛ n′ ⎞ ′
e1n
⎛ ⎛ ⎞⎞
2e1
⎜ ⎜ en ′ ⎟ ⎜ en ′ ⎟⎟
2Ln Nn = ⎜H1′ ⎜ 2. ⎟ · · · Hn′ ⎜ 2.
⎜ ⎜ ⎟ ⎜ ⎟⎟
⎝ ⎝ .. ⎠ ⎝ ..
⎟⎟
⎠⎠
′ ′
enn 2enn
⎛⎛ n ′ ⎞
2e1

⎜⎜ e n ′ ⎟ ⎟
⎜⎜ 2 ⎟ O ⎟
⎜⎜ .. ⎟ ⎟
⎜⎝ . ⎠ ⎟
⎜ n′ ⎟
⎜ en ⎟
⎜ ⎟
′ ⎜
= Dn ⎜ .. ⎟.

(3.56)
.

⎜ ⎞⎟
e1n
⎜ ⎛ ⎟
⎜ ⎟

e2n
⎜ ⎜ ⎟⎟
⎜ O
⎜ ⎜ ⎟⎟
⎜ . ⎟⎟
⎝ ⎝ .. ⎠⎠

2enn

Notice that the block diagonal matrix in Equation 3.56 is symmetric as its
transpose is
⎛ n n
2e1 e2 · · · enn
 ⎞
O
⎜ .. ⎟
⎝ . ⎠
 n n

O e1 · · · 2en

which is the matrix itself and it is also non-singular, its inverse being
⎛⎛ 1 n ′ ⎞ ⎞
e
2 1
⎜⎜ e n ′ ⎟ ⎟
⎜⎜ 2 ⎟
O ⎟

⎜⎜ . ⎟
⎜⎝ .. ⎠ ⎟
⎜ ⎟

⎜ enn
⎜ ⎟

⎜ .. ⎟
⎟,

⎜ . ⎟
⎛ n ′ ⎞⎟
e1 ⎟


⎜ ⎟
⎜ ⎜ e n ′ ⎟⎟
⎜ O ⎜ 2 ⎟ ⎟
⎜ . ⎟⎟
⎝ .. ⎠⎠


1 n′
e
2 n

so we can write

e1n
⎛⎛ ⎞ ⎞
⎜⎜2e n ′ ⎟ ⎟
⎜⎜ 2 ⎟
O ⎟

⎜⎜ . ⎟
⎜⎝ .. ⎠ ⎟
⎜ ⎟

⎜ 2enn
⎜ ⎟


Dn′ = Ln Nn ⎜ .. ⎟
⎜ . ⎟

⎛ n ′ ⎞⎟
2e1 ⎟


⎜ ⎟
⎜ ⎜2e n ′ ⎟⎟
⎜ O ⎜ 2 ⎟⎟
⎜ . ⎟⎟
⎝ .. ⎠⎠



enn

if we like.
Suppose now we use our other expression for 2Ln Nn and Dn′ , namely
⎛ ⎞ ⎛ ′⎞
T1 M1
⎜ .. ⎟ ′ ⎜ .. ⎟
2Ln Nn = ⎝ . ⎠ , Dn = ⎝ . ⎠ .
Tn Mn′

Then, from Equations 3.24 and 3.39


′ ′
Tn = 0′ 2enn , enn ,
 
Mn = 0′

so, clearly

Tn = 2Mn′ = Rn Mn′ .

For j = 2, . . . , n − 1 from Equations 3.22 and 3.38


 j j 
T j = O (O R j ) Z2 · · · Zn− j+1

and
j j
M j′ = O
 
(O In− j+1 ) Z2 · · · Zn− j+1 .

Consider for i = 2, . . . , n − j + 1
n− j+1
R j Zi = R j O ein− j+1
   
O = O R j ei O

and
⎛ ⎞
2 O
⎜ 1 ⎟
R j ein− j+1 = ⎜
⎟ n− j+1
⎟ ei = ein− j+1 ,

..
⎝ . ⎠
O 1

so R j Zi = Zi and Tj = R j M j′ for the values of the subscripts we consider.


This comparison then gives

R1 M1′
⎛ ⎞ ⎛ ⎞
R1 O
2Ln Nn = ⎝ ... ⎠ = ⎝ .. ⎟ ′
⎠ Dn . (3.57)
⎜ ⎟ ⎜
.
Rn Mn′ O Rn

As R j is a symmetric matrix, the block matrix in the right-hand side of


Equation 3.57 is symmetric. It is also nonsingular, with its inverse being
⎛⎛ 1 ⎞ ⎞
2
⎜⎜ 1 ⎟ ⎟
O⎟
⎜⎜ ⎟ ⎟
⎜⎜ .. ⎟
⎜⎝
⎜ . ⎠ ⎟


⎜ 1 ⎟

⎜ .. ⎟
⎝ . ⎠
O 1

so if we like we could write


⎛⎛ ⎞ ⎞
1
⎜⎜ 2 ⎟ ⎟
O⎟
⎜⎜ ⎟ ⎟
⎜⎜ .. ⎟

⎜⎝ . ⎠ ⎟
Dn = ⎜ ⎟L N .
⎟ n n

⎜ 2 ⎟
⎜ .. ⎟
⎝ . ⎠
O 2
The matrices $L_nN_n$ and $D_n$ are linked in another way. We saw in Section 2.5.7 of Chapter 2 that $N_n$ is symmetric and idempotent, and in the previous section we saw that $L_nN_nD_n=I_{\frac12n(n+1)}$ and $D_nL_nN_n=N_n$. These results mean that if A is an $n\times n$ nonsingular matrix, then
$$\bigl(D_n'(A\otimes A)D_n\bigr)^{-1}=L_nN_n(A^{-1}\otimes A^{-1})N_nL_n'.\tag{3.58}$$
To establish this result, consider
$$D_n'(A\otimes A)D_nL_nN_n(A^{-1}\otimes A^{-1})N_nL_n'
=D_n'(A\otimes A)N_n(A^{-1}\otimes A^{-1})N_nL_n'$$
$$=D_n'N_n(A\otimes A)(A^{-1}\otimes A^{-1})N_nL_n'=D_n'N_nL_n'=I_{\frac12n(n+1)},$$
where we have used the fact that $N_n(A\otimes A)=(A\otimes A)N_n$.
Similarly,
$$L_nN_n(A^{-1}\otimes A^{-1})N_nL_n'\,D_n'(A\otimes A)D_n=I_{\frac12n(n+1)}.$$

This result is found in Magnus (1988) and is important in the application


of matrix calculus to statistical models, as discussed in Chapter 4.
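Result 3.58 is worth a numerical sanity check, since matrices of the form $D_n'(A\otimes A)D_n$ are inverted repeatedly when symmetric parameter matrices appear in the statistical applications. The sketch below is my own Python/NumPy code, with the helpers redefined so the snippet stands alone.

```python
import numpy as np

def commutation(n, m):
    K = np.zeros((n * m, n * m))
    for i in range(n):
        for j in range(m):
            K[i * m + j, j * n + i] = 1
    return K

def L(n):
    M = np.zeros((n * (n + 1) // 2, n * n)); r = 0
    for j in range(n):
        for i in range(j, n):
            M[r, j * n + i] = 1; r += 1
    return M

def D(n):
    pos, r = {}, 0
    for j in range(n):
        for i in range(j, n):
            pos[(i, j)] = r; r += 1
    M = np.zeros((n * n, n * (n + 1) // 2))
    for j in range(n):
        for i in range(n):
            M[j * n + i, pos[(max(i, j), min(i, j))]] = 1
    return M

n = 3
A = np.random.default_rng(6).standard_normal((n, n)) + n * np.eye(n)   # safely nonsingular
Ln, Dn = L(n), D(n)
Nn = 0.5 * (np.eye(n * n) + commutation(n, n))
lhs = np.linalg.inv(Dn.T @ np.kron(A, A) @ Dn)
rhs = Ln @ Nn @ np.kron(np.linalg.inv(A), np.linalg.inv(A)) @ Nn @ Ln.T
assert np.allclose(lhs, rhs)                                           # Equation 3.58
print("Equation 3.58 verified for n =", n)
```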

3.3.3 The Duplication Matrix Dn


There is another duplication matrix and we finish this chapter by quickly
looking at this matrix. Fortunately, it is a far simpler matrix than Dn . It
is associated with strictly lower triangular matrices rather than symmetric
matrices as Dn . A n × n matrix A is strictly lower triangular if
⎛ ⎞
0 0 ··· ··· 0
⎜a21 0 · · · · · · 0⎟
⎜ ⎟
A = ⎜a31 a32 0⎟ ⎟.

⎜ .. .. .. .. ⎟
⎝ . . . .⎠
an1 an2 · · · ann−1 0

The vec of such a matrix is the n2 × 1 vector given by


⎛ ⎞
0
⎜ a ⎟
⎜ 21 ⎟
⎜ . ⎟
⎜ .. ⎟
⎜ ⎟
⎜ a ⎟
⎜ n1 ⎟
⎜ 0 ⎟
⎜ ⎟
⎜ 0 ⎟
⎜ ⎟
⎜ a ⎟
⎜ 32 ⎟
⎜ . ⎟
⎜ . ⎟
⎜ . ⎟
⎜ an2 ⎟
⎜ ⎟
vec A = ⎜ . ⎟

⎟. (3.59)
⎜ .. ⎟
⎜ 0 ⎟
⎜ ⎟
⎜ 0 ⎟
⎜ ⎟
⎜ . ⎟
⎜ . ⎟
⎜ . ⎟
⎜ ⎟
⎜ 0 ⎟
⎜ ⎟
⎜ann−1 ⎟
⎜ ⎟
⎜ 0 ⎟
⎜ .. ⎟
⎜ ⎟
⎝ . ⎠
0
As v(A) contains all the essential elements of A, there exists an $n^2\times\tfrac12n(n-1)$ duplication matrix $\bar D_n$ such that $\bar D_n\,v(A)=\operatorname{vec}A$.
Comparing vec A given by Equation 3.59 with v(A), we see that
$$\bar D_n=\begin{pmatrix}
E_1'& & &O\\
 &E_2'& &\\
 & &\ddots&\\
O& & &E_{n-1}'\\
0'&\cdots&\cdots&0'
\end{pmatrix}=\bar L_n'.$$
Properties of $\bar D_n$ can then be obtained from the properties we have already obtained for the elimination matrix $\bar L_n$.
obtained for the elimination matrix Ln .
FOUR

Matrix Calculus

4.1 Introduction
Let Y be a $p\times q$ matrix whose elements $y_{ij}$ are differentiable functions of the elements $x_{rs}$ of an $m\times n$ matrix X. We write $Y=Y(X)$ and say Y is a matrix function of X. Given such a setup, we have mnpq partial derivatives that we can consider:
$$\frac{\partial y_{ij}}{\partial x_{rs}},\qquad i=1,\dots,p,\quad j=1,\dots,q,\quad r=1,\dots,m,\quad s=1,\dots,n.$$

The question is how to arrange these derivatives. Different arrangements


give rise to different concepts of derivatives in matrix calculus.
At least four concepts of a derivative of Y with respect to X are used in the
literature. In the first part of this chapter, we show how the mathematical
tools discussed in Chapters 1 and 2 can be used to analyze relationships
that exist between the four concepts. In particular, generalized vec and rvec
operators are useful concepts in establishing transformation principles that
allow us to move from a result for one of the concepts of matrix derivatives
to the corresponding results for the other three concepts.
In doing all this, it is not our intention to develop a table of known
matrix calculus results. Such results are available elsewhere (see Rogers
(1980), Graham (1981), Magnus and Neudecker (1988), Lutkepohl (1996)
and Turkington (2005)).
Having said that, known matrix calculus results are presented without
proof to illustrate the transformation principles we develop.


4.2 Different Concepts of a Derivative of a Matrix with Respect


to Another Matrix
As mentioned in the introduction to this chapter, there are several concepts
of the derivative of a p×q matrix Y with respect to another m×n matrix X
depending on how we arrange the partial derivatives ∂yi j /∂xrs .
The first concept we consider starts with a differentiable real-valued scalar function y of an $n\times1$ vector x. It then defines the derivative of y with respect to the vector x as the $1\times n$ row vector
$$Dy=\Bigl(\frac{\partial y}{\partial x_1}\;\cdots\;\frac{\partial y}{\partial x_n}\Bigr)$$
where $x_1,\dots,x_n$ are the elements of x. Consider now $y=\operatorname{vec}Y$ where Y is our $p\times q$ matrix whose elements are differentiable functions of the elements, the $x_{rs}$'s, of a matrix X. Each element of vec Y is a differentiable function of x, where $x=\operatorname{vec}X$, so the derivative of the ith element with respect to x can be defined as
$$D(\operatorname{vec}Y)_i=\Bigl(\frac{\partial(\operatorname{vec}Y)_i}{\partial x_{11}}\;\cdots\;\frac{\partial(\operatorname{vec}Y)_i}{\partial x_{m1}}\;\cdots\;\frac{\partial(\operatorname{vec}Y)_i}{\partial x_{1n}}\;\cdots\;\frac{\partial(\operatorname{vec}Y)_i}{\partial x_{mn}}\Bigr)$$
for $i=1,\dots,pq$. Stacking these row vectors under each other gives us our first concept of the derivative of Y with respect to X.

Concept 1 The derivative of the $p\times q$ matrix Y with respect to the $m\times n$ matrix X is the $pq\times mn$ matrix
$$DY(X)=\begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{11}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{11}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{11}}{\partial x_{mn}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{p1}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{mn}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{1q}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{mn}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{pq}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{mn}}
\end{pmatrix}.$$
Notice that under this concept the mnpq derivatives are arranged in such a way that a row of $DY(X)$ gives the derivatives of a particular element of
Notice that under this concept the mnpq derivatives are arranged in such
a way that a row of DY (X ) gives the derivatives of a particular element of

Y with respect to each element of X and a column gives the derivatives of


all the elements of Y with respect to a particular element of X. Notice also
in talking about the derivatives of yi j , we have to specify exactly where the
ith row is located in this matrix. The device we first used in Theorem 2.1 of
Section 2.2 of Chapter 2 comes in handy here. Likewise, when talking of the
derivatives of all the elements of Y with respect to particular element xrs of
X, again, we have to specify exactly where the sth column is located in this
matrix. Again, the device introduced in Section 2.2 of Chapter 2 comes in
handy in doing this.
This concept of a matrix derivative is strongly advocated by Magnus and
Neudecker (see, for example Magnus and Neudecker (1985) and Magnus
(2010)). The feature they like about it is that DY (X ) is a straightforward
matrix generalization of the Jacobian Matrix for y = y(x) where y is a p×1
vector, which is a real value differentiable function of an m×1 vector x.
Consider now the case where the elements of the p×q matrix Y are all
differentiable functions of a scalar x. Then, we could consider the derivative
of Y with respect to x as the matrix of the derivatives of the elements of Y
with respect to x.
Denote this p×q matrix as

∂y11
⎛ ∂y1q ⎞
⎜ ∂x ···
⎜ . ∂x ⎟
δY .. ⎟
⎜ ..
=⎜ . ⎟.

δx ⎝ ∂y
p1 ∂y ⎠
pq
···
∂x ∂x

Return now to the case where each element of Y is a function of the elements
of an m×n matrix X. We could then consider the derivative of Y with respect
to X as made up of derivatives Y with respect to each element in X. That is,
the mp×qn matrix
⎛ ⎞
δY δY
⎜ δx ···
⎜ 11 δx1n ⎟

= ⎜ ... .. ⎟ .
δY ⎜
δX ⎜ . ⎟

⎝ δY δY ⎠
···
δxm1 δxmn

This leads us to Concept 2 of the derivative of Y with respect to X.



Concept 2 The derivative of the $p\times q$ matrix Y with respect to the $m\times n$ matrix X is the $mp\times nq$ matrix
$$\frac{\delta Y}{\delta X}=\begin{pmatrix}
\dfrac{\delta Y}{\delta x_{11}}&\cdots&\dfrac{\delta Y}{\delta x_{1n}}\\
\vdots& &\vdots\\
\dfrac{\delta Y}{\delta x_{m1}}&\cdots&\dfrac{\delta Y}{\delta x_{mn}}
\end{pmatrix}$$
where $\delta Y/\delta x_{rs}$ is the $p\times q$ matrix given by
$$\frac{\delta Y}{\delta x_{rs}}=\begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{rs}}\\
\vdots& &\vdots\\
\dfrac{\partial y_{p1}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{rs}}
\end{pmatrix}$$
for $r=1,\dots,m$, $s=1,\dots,n$.
This concept of a matrix derivative is discussed in Dwyer and MacPhail
(1948), Dwyer (1967), Rogers (1980), and Graham (1981).
Suppose y is a scalar but a differentiable function of all the elements of an
m×n matrix X. Then, we could conceive of the derivative of y with respect
to X as the m×n matrix consisting of all the partial derivatives of y with
respect to the elements of X. Denote this m×n matrix as

⎛ ⎞
∂y ∂y
⎜ ∂x ···
⎜ 11 ∂x1n ⎟
γy ⎟
= ⎜ ... .. ⎟ .

γX ⎜ . ⎟

⎝ ∂y ∂y ⎠
···
∂xm1 ∂xmn

We could then conceive of the derivative of Y with respect to X as the matrix


made up of the γyi j /γX. Denote this mp×qn matrix by γy/γX. This leads
to the third concept of the derivative of Y with respect to X.

Concept 3 The derivative of the $p\times q$ matrix Y with respect to the $m\times n$ matrix X is the $mp\times nq$ matrix
$$\frac{\gamma Y}{\gamma X}=\begin{pmatrix}
\dfrac{\gamma y_{11}}{\gamma X}&\cdots&\dfrac{\gamma y_{1q}}{\gamma X}\\
\vdots& &\vdots\\
\dfrac{\gamma y_{p1}}{\gamma X}&\cdots&\dfrac{\gamma y_{pq}}{\gamma X}
\end{pmatrix}.$$

This is the concept of a matrix derivative studied in detail by MacRae


(1974) and discussed by Dwyer (1967), Roger (1980), Graham (1981), and
others.
From a theoretical point of view, Parring (1992) argues that all three
concepts are permissible as operators depending on which matrix or vector
space we are operating in and how this space is normed.

Concept 4 Consider ℓ a scalar function of an n×1 vector x. This concept


defines the derivative of ℓ with respect to x as the n×1 vector

⎛ ⎞
∂ℓ
⎜ ∂x ⎟
1⎟
∂ℓ ⎜
= ⎜ ... ⎟ .
⎜ ⎟
∂x ⎜ ⎟
⎝ ∂ℓ ⎠
∂xn

Let y = (yi ) be an s×1 vector whose elements are differentiable functions


of the elements of an r ×1 vector x = (xi ). We write y = y(x) and say y is a
vector function of x.
Then, consider this concept the derivative of y with respect to x as the
r ×s matrix

⎛ ⎞
∂y1 ∂ys
⎜ ∂x ···
⎜ 1 ∂x1 ⎟
∂y ⎟
= ⎜ ... .. ⎟ .

∂x ⎜ . ⎟

⎝ ∂y1 ∂ys ⎠
···
∂xr ∂xr

For a $p\times q$ matrix Y which is a matrix function of X, this concept considers the vectors vec Y and vec X and defines the derivative of Y with respect to X as
$$\frac{\partial\operatorname{vec}Y}{\partial\operatorname{vec}X}=\begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{11}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{m1}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{1n}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{mn}}
\end{pmatrix}.$$

This concept of a matrix derivative was used by Graham (1983) and Turkington (2005).

As this is just the transpose of Concept 1, we do not include it in our


discussions on the different concepts of matrix derivatives. However, we
take it up again in Chapter 5.

4.3 The Commutation Matrix and the Concepts


of Matrix Derivatives
We saw in Equation 2.57 of Chapter 2 that the commutation matrix can
be regarded as a twining matrix that intertwines a number of matrices
taking one row at a time. Suppose we partition the p×q matrix Y into its
columns, so Y = (y1 . . . yq ). If we let x = vecX and y = vecY , then using
Concept 1

\[
DY(X)=\begin{pmatrix}Dy_{1}\\ \vdots\\ Dy_{q}\end{pmatrix}
\]

and
\[
K_{pq}\,DY(X)=\begin{pmatrix}Dy_{11}\\ \vdots\\ Dy_{1q}\\ \vdots\\ Dy_{p1}\\ \vdots\\ Dy_{pq}\end{pmatrix}=\begin{pmatrix}DY_{1\cdot}\\ \vdots\\ DY_{p\cdot}\end{pmatrix},
\]

where Y j· is the jth row of Y for j = 1, . . . , p. If Y is an p× p symmetric


matrix, so yi j = y ji , then

Kpp DY (X ) = DY (X ).
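The symmetry property above is easy to check numerically. The sketch below (my own, with ad hoc helper names; not part of the text) builds K_pp directly from its defining property K_mn vec A = vec A', forms DY(X) by forward differences for the symmetric matrix function Y = X'X, and confirms that premultiplying by K_pp leaves DY(X) unchanged.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def commutation(m, n):
    # K_{mn} vec(A) = vec(A') for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def D(F, X, eps=1e-6):
    # Concept 1 derivative: rows indexed by vec Y, columns by vec X
    Y0 = F(X)
    out = np.zeros((Y0.size, X.size))
    for l in range(X.size):
        dX = np.zeros(X.size)
        dX[l] = eps
        out[:, [l]] = (vec(F(X + dX.reshape(X.shape, order='F'))) - vec(Y0)) / eps
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))            # m = 4, n = 3
DY = D(lambda X: X.T @ X, X)               # Y = X'X is 3 x 3 and symmetric, p = 3
K = commutation(3, 3)
print(np.allclose(K @ DY, DY, atol=1e-4))  # True: K_pp DY(X) = DY(X)
```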

Referring to Concept 2,
\[
K_{pm}\frac{\delta Y}{\delta X}=K_{pm}\begin{pmatrix}\dfrac{\delta Y}{\delta x_{11}}&\cdots&\dfrac{\delta Y}{\delta x_{1n}}\\ \vdots& &\vdots\\ \dfrac{\delta Y}{\delta x_{m1}}&\cdots&\dfrac{\delta Y}{\delta x_{mn}}\end{pmatrix}
=\begin{pmatrix}\dfrac{\delta Y_{1\cdot}}{\delta x_{11}}&\cdots&\dfrac{\delta Y_{1\cdot}}{\delta x_{1n}}\\ \vdots& &\vdots\\ \dfrac{\delta Y_{1\cdot}}{\delta x_{m1}}&\cdots&\dfrac{\delta Y_{1\cdot}}{\delta x_{mn}}\\ \vdots& &\vdots\\ \dfrac{\delta Y_{p\cdot}}{\delta x_{11}}&\cdots&\dfrac{\delta Y_{p\cdot}}{\delta x_{1n}}\\ \vdots& &\vdots\\ \dfrac{\delta Y_{p\cdot}}{\delta x_{m1}}&\cdots&\dfrac{\delta Y_{p\cdot}}{\delta x_{mn}}\end{pmatrix}
=\begin{pmatrix}\dfrac{\delta Y_{1\cdot}}{\delta X}\\ \vdots\\ \dfrac{\delta Y_{p\cdot}}{\delta X}\end{pmatrix}.
\]

In a similar manner,
\[
K_{mp}\frac{\gamma Y}{\gamma X}=\begin{pmatrix}\dfrac{\gamma Y}{\gamma X_{1\cdot}}\\ \vdots\\ \dfrac{\gamma Y}{\gamma X_{m\cdot}}\end{pmatrix}.
\]

4.4 Relationships Between the Different Concepts


Suppose X is a scalar, say x. This case is rather exceptional, but we include
it for the sake of completeness. Then, it is easily seen that Concept 2 and
Concept 3 are the same and Concept 1 is the vec of the others. That is, for x
a scalar and Y an p×q matrix

\[
\frac{\delta Y}{\delta x}=\frac{\gamma Y}{\gamma x}
\quad\text{and}\quad
DY(x)=\operatorname{vec}\frac{\delta Y}{\delta x}.
\]

As a vec can always be undone by taking the appropriate generalized rvec,


rvec p , in this case, we also have

\[
\frac{\delta Y}{\delta x}=\operatorname{rvec}_p\,DY(x)=\frac{\gamma Y}{\gamma x}.
\]
Suppose Y is a scalar, say y. This case is far more common in statistics
and econometrics. Then again, Concept 2 and Concept 3 are the same and
Concept 1 is the transpose of the vec of either concept. That is, for y a scalar
and X an m×n matrix

\[
\frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}
\quad\text{and}\quad
Dy(X)=\Bigl(\operatorname{vec}\frac{\delta y}{\delta X}\Bigr)'. \tag{4.2}
\]

As $\operatorname{vec}\dfrac{\delta y}{\delta X}=(Dy(X))'$ and, again, as a vec can always be undone by taking
the appropriate generalized rvec, $\operatorname{rvec}_m$ in this case, we have
\[
\frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}=\operatorname{rvec}_m\,(Dy(X))'. \tag{4.3}
\]
The last case, where Y is in fact a scalar is prevalent enough in statistics to
warrant us looking at specific examples of the relationships between our
three concepts. The matrix calculus results presented here, as indeed the
results presented throughout this chapter, can be found in books such as
Graham (1981), Lutkepohl (1996), Magnus and Neudecker (1988), Rogers
(1980), and Turkington (2005).
Examples where Y is a scalar:

1. Suppose y is the determinant of a non-singular matrix. That is, y = |X|
   where X is a non-singular matrix. Then,
   \[
   Dy(X)=|X|\bigl(\operatorname{vec}(X^{-1})'\bigr)'. \tag{4.4}
   \]

   From Equation 4.3, it follows immediately that
   \[
   \frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}=|X|(X^{-1})'.
   \]
2. Consider y = |Y| where Y = X'AX is non-singular. Then,
   \[
   \frac{\delta y}{\delta X}=|Y|\bigl(AXY^{-1}+A'X(Y^{-1})'\bigr).
   \]
   It follows from Equation 4.2 that
   \[
   Dy(X)=|Y|\bigl[\bigl((Y^{-1})'\otimes A+Y^{-1}\otimes A'\bigr)\operatorname{vec}X\bigr]'
   =|Y|(\operatorname{vec}X)'\bigl(Y^{-1}\otimes A'+(Y^{-1})'\otimes A\bigr). \tag{4.5}
   \]
3. Consider y = |Z| where Z = XBX'. Then,
   \[
   Dy(X)=|Z|(\operatorname{vec}X)'\bigl(B\otimes(Z^{-1})'+B'\otimes Z^{-1}\bigr).
   \]
   It follows from Equation 4.3 that
   \[
   \frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}=|Z|\bigl(Z^{-1}XB+(Z^{-1})'XB'\bigr).
   \]
4. Let y = tr AX. Then,
   \[
   \frac{\delta y}{\delta X}=A'.
   \]
   It follows from Equation 4.2 that
   \[
   Dy(X)=(\operatorname{vec}A')'. \tag{4.6}
   \]
5. Let y = tr X'AX. Then,
   \[
   Dy(X)=(\operatorname{vec}(A'X+AX))'.
   \]
   It follows from Equation 4.3 that
   \[
   \frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}=A'X+AX.
   \]
6. Let y = tr XAX'B. Then,
   \[
   \frac{\delta y}{\delta X}=\frac{\gamma y}{\gamma X}=B'XA'+BXA.
   \]
   It follows from Equation 4.2 that
   \[
   Dy(X)=(\operatorname{vec}(B'XA'+BXA))'.
   \]
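As a small numerical sanity check of Example 1 above (my own sketch; the helper names and the random test matrix are not from the text), the forward-difference derivative of y = |X| matches both |X|(vec(X^{-1})')' and, after reshaping, |X|(X^{-1})'.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

rng = np.random.default_rng(2)
X = rng.standard_normal((3, 3))
eps = 1e-6

# Forward-difference row vector of partial derivatives of y = |X| (Concept 1).
Dy = np.zeros((1, X.size))
for l in range(X.size):
    dX = np.zeros(X.size); dX[l] = eps
    Dy[0, l] = (np.linalg.det(X + dX.reshape(X.shape, order='F')) - np.linalg.det(X)) / eps

analytic = np.linalg.det(X) * vec(np.linalg.inv(X).T).T      # |X| (vec (X^{-1})')'
print(np.allclose(Dy, analytic, atol=1e-3))                   # True
print(np.allclose(Dy.reshape(X.shape, order='F'),             # delta y / delta X
                  np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-3))
```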

These examples suffice to show that it is a trivial matter moving between the
different concepts of matrix derivatives when Y is a scalar. In the next section,
we derive transformation principles that allow us to move freely between
the three different concepts of matrix derivatives in more complicated cases.
These principles can be regarded as a generalization of the work done by
Dwyer and MacPhail (1948) and by Graham (1980). In deriving these
principles, we call on the work we have done with regards to generalized
vecs and rvecs in Chapter 2, particularly with reference to the selection of
rows and columns of Kronecker products.

4.5 Transformation Principles Between the Concepts


We can use our generalized vec and rvec operators to spell out the rela-
tionships that exist between our three concepts of matrix derivatives. We
consider the concepts two at a time.

4.5.1 Concept 1 and Concept 2


The submatrices in δY/δX are
\[
\frac{\delta Y}{\delta x_{rs}}=\begin{pmatrix}\dfrac{\partial y_{11}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{rs}}\\ \vdots& &\vdots\\ \dfrac{\partial y_{p1}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{rs}}\end{pmatrix}
\]

for r = 1, . . . , m and s = 1, . . . , n. In forming the submatrix δY/δxrs , we


need the partial derivatives of the elements of Y with respect to xrs . When
we turn to Concept 1, we note that these partial derivatives all appear in
a column of DY (X ). Just as we did in locating a column of a Kronecker
product, we have to specify exactly where this column is located in the matrix
DY (X ). If s is 1, then the partial derivatives appear in the rth column, if
s is 2, then they appear in the m + rth column, if s is 3 in the 2m + rth
column, and so on until s is n, in which case the partial derivatives appear
in the (n − 1)m + rth column. To cater for all these possibilities, we say the partial derivatives with respect to x_rs appear in the ℓth column of DY(X), where
\[
\ell = (s-1)m + r
\]

and s = 1, ..., n. The partial derivatives we seek appear in that column as the column vector
\[
\begin{pmatrix}\dfrac{\partial y_{11}}{\partial x_{rs}}\\ \vdots\\ \dfrac{\partial y_{p1}}{\partial x_{rs}}\\ \vdots\\ \dfrac{\partial y_{1q}}{\partial x_{rs}}\\ \vdots\\ \dfrac{\partial y_{pq}}{\partial x_{rs}}\end{pmatrix}.
\]

If we take the rvec_p of this vector, we get δY/δx_rs, so
\[
\frac{\delta Y}{\delta x_{rs}}=\operatorname{rvec}_p\,(DY(X))_{\cdot\ell} \tag{4.7}
\]
where ℓ = (s − 1)m + r, for s = 1, ..., n and r = 1, ..., m.
Now, this generalized rvec can be undone by taking the vec, so
\[
(DY(X))_{\cdot\ell}=\operatorname{vec}\Bigl(\frac{\delta Y}{\delta x_{rs}}\Bigr). \tag{4.8}
\]

If we are given DY(X) and we can identify the ℓth column of this matrix,
then Equation 4.7 allows us to move from Concept 1 to Concept 2. If,
however, we have in hand δY/δX, so we can identify the submatrix δY/δx_rs,
then Equation 4.8 allows us to move from Concept 2 to Concept 1.
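The bookkeeping in Equations 4.7 and 4.8 can be illustrated numerically. In the sketch below (my own, assuming the helper names shown; not the book's code), column ℓ = (s − 1)m + r of DY(X) is reshaped by rvec_p into δY/δx_rs and then vec'd back again.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def rvec_p(v, p):
    """Generalized rvec of a (pq x 1) column: cut it into q blocks of p rows
    and lay the blocks side by side, recovering a p x q matrix."""
    return v.reshape(p, -1, order='F')

def D(F, X, eps=1e-6):
    Y0 = F(X)
    out = np.zeros((Y0.size, X.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[:, [l]] = (vec(F(X + dX.reshape(X.shape, order='F'))) - vec(Y0)) / eps
    return out

rng = np.random.default_rng(3)
A, B = rng.standard_normal((2, 4)), rng.standard_normal((3, 2))  # Y = A X B is 2 x 2
X = rng.standard_normal((4, 3))                                   # m = 4, n = 3
F = lambda X: A @ X @ B
DY = D(F, X)                                                      # Concept 1, pq x mn

r, s, m, p = 1, 2, 4, 2
l = s * m + r                                  # zero-based version of l = (s-1)m + r
dY_dxrs = rvec_p(DY[:, [l]], p)                # Equation 4.7

Xp = X.copy(); Xp[r, s] += 1e-6                # direct finite difference agrees
print(np.allclose(dY_dxrs, (F(Xp) - F(X)) / 1e-6, atol=1e-4))
print(np.allclose(vec(dY_dxrs), DY[:, [l]]))   # Equation 4.8
```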

4.5.2 Concept 1 and Concept 3


The submatrices in γY/γX are
\[
\frac{\gamma y_{ij}}{\gamma X}=\begin{pmatrix}\dfrac{\partial y_{ij}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{1n}}\\ \vdots& &\vdots\\ \dfrac{\partial y_{ij}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{mn}}\end{pmatrix}
\]

for i = 1, . . . , p and j = 1, . . . , q. In forming the submatrix γyi j /γX , we


need the partial derivative of yi j with respect to the elements of X. When we
examine DY (X ), we see that these derivatives appear in a row of DY (X ).
Again, we have to specify exactly where this row is located in the matrix
DY (X ). If j is 1, then the partial derivatives appear in the ith row, if j = 2,
then they appear in the p + ith row, if j = 3, then in the 2p + ith row, and so
on until j = q, in which case the partial derivative appears in (q − 1)p + ith
row. To cater for all possibilities, we say the partial derivatives appear in the tth row of DY(X), where
\[
t = (j-1)p + i
\]
and j = 1, ..., q. In this row, they appear as the row vector
\[
\Bigl(\dfrac{\partial y_{ij}}{\partial x_{11}}\ \cdots\ \dfrac{\partial y_{ij}}{\partial x_{m1}}\ \cdots\ \dfrac{\partial y_{ij}}{\partial x_{1n}}\ \cdots\ \dfrac{\partial y_{ij}}{\partial x_{mn}}\Bigr).
\]
If we take the vec_m of this vector, we obtain the matrix
\[
\begin{pmatrix}\dfrac{\partial y_{ij}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{m1}}\\ \vdots& &\vdots\\ \dfrac{\partial y_{ij}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{mn}}\end{pmatrix},
\]
which is (γy_ij/γX)'. So, we have
\[
\frac{\gamma y_{ij}}{\gamma X}=\bigl(\operatorname{vec}_m(DY(X))_{t\cdot}\bigr)' \tag{4.9}
\]
where t = (j − 1)p + i, for j = 1, ..., q and i = 1, ..., p.
As
\[
\operatorname{vec}_m(DY(X))_{t\cdot}=\Bigl(\frac{\gamma y_{ij}}{\gamma X}\Bigr)'
\]
and a generalized vec can be undone by taking the rvec, we have
\[
(DY(X))_{t\cdot}=\operatorname{rvec}\Bigl(\frac{\gamma y_{ij}}{\gamma X}\Bigr)'. \tag{4.10}
\]
If we have in hand DY (X ) and if we can identify the tth row of this
matrix, then Equation 4.9 allows us to move from Concept 1 to Concept 3.
If, however, we have obtained γY/γX so we can identify the submatrix
γyi j /γX of this matrix, then Equation 4.10 allows us to move from Concept
3 to Concept 1.

4.5.3 Concept 2 and Concept 3


Returning to Concept 3, the submatrices of γY/γX are
\[
\frac{\gamma y_{ij}}{\gamma X}=\begin{pmatrix}\dfrac{\partial y_{ij}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{1n}}\\ \vdots& &\vdots\\ \dfrac{\partial y_{ij}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{ij}}{\partial x_{mn}}\end{pmatrix}
\]

and the partial derivative ∂y_ij/∂x_rs is given by the (r, s)th element of this submatrix. That is,
\[
\frac{\partial y_{ij}}{\partial x_{rs}}=\Bigl(\frac{\gamma y_{ij}}{\gamma X}\Bigr)_{rs}.
\]

It follows that
\[
\frac{\delta Y}{\delta x_{rs}}=\begin{pmatrix}\Bigl(\dfrac{\gamma y_{11}}{\gamma X}\Bigr)_{rs}&\cdots&\Bigl(\dfrac{\gamma y_{1q}}{\gamma X}\Bigr)_{rs}\\ \vdots& &\vdots\\ \Bigl(\dfrac{\gamma y_{p1}}{\gamma X}\Bigr)_{rs}&\cdots&\Bigl(\dfrac{\gamma y_{pq}}{\gamma X}\Bigr)_{rs}\end{pmatrix}. \tag{4.11}
\]

Starting now with Concept 2, the submatrices of δY/δX are
\[
\frac{\delta Y}{\delta x_{rs}}=\begin{pmatrix}\dfrac{\partial y_{11}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{rs}}\\ \vdots& &\vdots\\ \dfrac{\partial y_{p1}}{\partial x_{rs}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{rs}}\end{pmatrix}
\]

and the partial derivative ∂y_ij/∂x_rs is the (i, j)th element of this submatrix. That is,
\[
\frac{\partial y_{ij}}{\partial x_{rs}}=\Bigl(\frac{\delta Y}{\delta x_{rs}}\Bigr)_{ij}.
\]

It follows that
\[
\frac{\gamma y_{ij}}{\gamma X}=\begin{pmatrix}\Bigl(\dfrac{\delta Y}{\delta x_{11}}\Bigr)_{ij}&\cdots&\Bigl(\dfrac{\delta Y}{\delta x_{1n}}\Bigr)_{ij}\\ \vdots& &\vdots\\ \Bigl(\dfrac{\delta Y}{\delta x_{m1}}\Bigr)_{ij}&\cdots&\Bigl(\dfrac{\delta Y}{\delta x_{mn}}\Bigr)_{ij}\end{pmatrix}. \tag{4.12}
\]
If we have in hand γY/γX, then Equation 4.11 allows us to build up the
submatrices we need for δY/δX. If, however, we have a result for δY/δX,
then Equation 4.12 allows us to obtain the submatrices we need for γY/γX.

4.6 Transformation Principle One


Several matrix calculus results when we use Concept 1 involve Kronecker
products whereas the equivalent results, using Concepts 2 and 3, involve the
elementary matrices we looked at in the start of Chapter 2. In this section,
we see that this is no coincidence.
We have just seen that
\[
\frac{\delta Y}{\delta x_{rs}}=\operatorname{rvec}_p\,(DY)_{\cdot\ell} \tag{4.13}
\]
where ℓ = (s − 1)m + r and that
\[
\frac{\gamma y_{ij}}{\gamma X}=\bigl(\operatorname{vec}_m(DY)_{t\cdot}\bigr)' \tag{4.14}
\]
where t = (j − 1)p + i.
where t = ( j − 1)p + i. Suppose now that DY (X ) = A ⊗ B where A is a
q×n matrix and B is an p×m matrix. Then, we can call on the work we did
in Sections 2.2 and 2.3 of Chapter 2 to locate the ℓth column and tth row of
this Kronecker product. Using Equation 2.6 of that chapter, we have
\[
(A\otimes B)_{\cdot\ell}=\operatorname{vec}\bigl(BE_{rs}^{mn}A'\bigr).
\]
Undoing the vec by taking the rvec_p, we have
\[
\operatorname{rvec}_p(A\otimes B)_{\cdot\ell}=BE_{rs}^{mn}A',
\]
so using Equation 4.13, we have that
\[
\frac{\delta Y}{\delta x_{rs}}=BE_{rs}^{mn}A'.
\]
Using Equation 2.4 of Chapter 2, we have
\[
(A\otimes B)_{t\cdot}=\operatorname{rvec}\bigl(A'E_{ji}^{qp}B\bigr).
\]

Undoing the rvec by taking the vec_m, we have
\[
\operatorname{vec}_m(A\otimes B)_{t\cdot}=A'E_{ji}^{qp}B,
\]
so from Equation 4.14,
\[
\frac{\gamma y_{ij}}{\gamma X}=\bigl(A'E_{ji}^{qp}B\bigr)'=B'E_{ij}^{pq}A,
\]
as $(E_{ji}^{qp})'=E_{ij}^{pq}$.
This leads us to our first transformation principle.

The First Transformation Principle


Let A be an q×n matrix and B be an p×m matrix. Whenever
\[
DY(X)=A\otimes B,
\]
regardless of whether A and B are matrix functions of X or not,
\[
\frac{\delta Y}{\delta x_{rs}}=BE_{rs}^{mn}A'
\quad\text{and}\quad
\frac{\gamma y_{ij}}{\gamma X}=B'E_{ij}^{pq}A,
\]
and the converse statements are true also.

For this case,
\[
\frac{\delta Y}{\delta X}=\begin{pmatrix}BE_{11}^{mn}A'&\cdots&BE_{1n}^{mn}A'\\ \vdots& &\vdots\\ BE_{m1}^{mn}A'&\cdots&BE_{mn}^{mn}A'\end{pmatrix}
=(I_m\otimes B)\,U_{mn}\,(I_n\otimes A'),
\]
where U_mn is the m²×n² matrix introduced in Section 2.6 of Chapter 2, given by
\[
U_{mn}=\begin{pmatrix}E_{11}^{mn}&\cdots&E_{1n}^{mn}\\ \vdots& &\vdots\\ E_{m1}^{mn}&\cdots&E_{mn}^{mn}\end{pmatrix}.
\]
We saw in Theorem 2.33 of Chapter 2 that
(A ⊗ B)Umn (C ⊗ D) = (vec BA ′ )(rvec C ′ D),

so
\[
\frac{\delta Y}{\delta X}=(\operatorname{vec}B)(\operatorname{rvec}A').
\]
In terms of Concept 3, for this case,
\[
\frac{\gamma Y}{\gamma X}=\begin{pmatrix}B'E_{11}^{pq}A&\cdots&B'E_{1q}^{pq}A\\ \vdots& &\vdots\\ B'E_{p1}^{pq}A&\cdots&B'E_{pq}^{pq}A\end{pmatrix}
=(I_p\otimes B')\,U_{pq}\,(I_q\otimes A)
=(\operatorname{vec}B')(\operatorname{rvec}A).
\]

In terms of the entire matrices, we can express the First Transformation
Principle by saying that the following statements are equivalent:
\[
DY(X)=A\otimes B,
\qquad
\frac{\delta Y}{\delta X}=(\operatorname{vec}B)(\operatorname{rvec}A'),
\qquad
\frac{\gamma Y}{\gamma X}=(\operatorname{vec}B')(\operatorname{rvec}A).
\]

Examples of the Use of the First Transformation Principle

1. Y = AXB for A p×m and B n×q. Then, it is known that
   \[
   D(AXB)=B'\otimes A. \tag{4.15}
   \]
   It follows that
   \[
   \frac{\delta AXB}{\delta x_{rs}}=AE_{rs}^{mn}B
   \quad\text{and}\quad
   \frac{\gamma (AXB)_{ij}}{\gamma X}=A'E_{ij}^{pq}B'.
   \]
   Moreover,
   \[
   \frac{\delta AXB}{\delta X}=(\operatorname{vec}A)(\operatorname{rvec}B)
   \quad\text{and}\quad
   \frac{\gamma AXB}{\gamma X}=(\operatorname{vec}A')(\operatorname{rvec}B').
   \]
   (A numerical check of this example appears after the list.)

2. If Y = XAX where X is an m×n matrix, then
   \[
   \frac{\delta XAX}{\delta x_{rs}}=E_{rs}^{mn}AX+XAE_{rs}^{mn}.
   \]
   It follows that
   \[
   \frac{\gamma (XAX)_{ij}}{\gamma X}=E_{ij}^{mn}X'A'+A'X'E_{ij}^{mn}
   \]
   and that
   \[
   D(XAX)=X'A'\otimes I_m+I_n\otimes XA, \tag{4.16}
   \]
   \[
   \frac{\delta XAX}{\delta X}=(\operatorname{vec}I_m)(\operatorname{rvec}AX)+(\operatorname{vec}XA)(\operatorname{rvec}I_n),
   \]
   \[
   \frac{\gamma XAX}{\gamma X}=(\operatorname{vec}I_m)(\operatorname{rvec}X'A')+(\operatorname{vec}A'X')(\operatorname{rvec}I_n).
   \]
3. Y = X ⊗ I_G where X is an m×n matrix.
   We have seen in Equation 2.29 of Chapter 2 that $\operatorname{vec}(X\otimes I_G)=(I_n\otimes \operatorname{vec}_m K_{mG})\operatorname{vec}X$, so
   \[
   D(X\otimes I_G)=I_n\otimes \operatorname{vec}_m K_{mG}.
   \]
   It follows that
   \[
   \frac{\delta (X\otimes I_G)}{\delta x_{rs}}=(\operatorname{vec}_m K_{Gm})E_{rs}^{mn}
   \quad\text{and}\quad
   \frac{\gamma (X\otimes I_G)_{ij}}{\gamma X}=(\operatorname{vec}_m K_{Gm})'E_{ij}^{kn},\quad\text{where }k=G^2m.
   \]
   Moreover,
   \[
   \frac{\delta (X\otimes I_G)}{\delta X}=\operatorname{vec}(\operatorname{vec}_m K_{mG})(\operatorname{rvec}I_n)=(\operatorname{vec}I_{mG})(\operatorname{rvec}I_n),
   \]
   \[
   \frac{\gamma (X\otimes I_G)}{\gamma X}=\operatorname{vec}(\operatorname{vec}_m K_{mG})'(\operatorname{rvec}I_n)=(\operatorname{vec}I_{mG})(\operatorname{rvec}I_n),
   \]
   where we have used Theorem 2.20 of Section 2.5 in Chapter 2.
4. Y = AX^{-1}B where A is p×n and B is n×q. Then, it is known that
   \[
   \frac{\gamma (AX^{-1}B)_{ij}}{\gamma X}=-(X^{-1})'A'E_{ij}^{pq}B'(X^{-1})'.
   \]

   It follows straight away that
   \[
   \frac{\delta AX^{-1}B}{\delta x_{rs}}=-AX^{-1}E_{rs}^{nn}X^{-1}B,
   \]
   and that
   \[
   D(AX^{-1}B)=-B'(X^{-1})'\otimes AX^{-1}. \tag{4.17}
   \]
   Moreover,
   \[
   \frac{\delta AX^{-1}B}{\delta X}=-(\operatorname{vec}AX^{-1})(\operatorname{rvec}X^{-1}B)
   \]
   and
   \[
   \frac{\gamma AX^{-1}B}{\gamma X}=-(\operatorname{vec}(X^{-1})'A')(\operatorname{rvec}B'(X^{-1})').
   \]
5. Y = AXBXC where X is m×n, A is p×m, B is n×m, and C is n×q.
   Then, it is well known that
   \[
   \frac{\delta AXBXC}{\delta x_{rs}}=AE_{rs}^{mn}BXC+AXBE_{rs}^{mn}C.
   \]
   It follows that
   \[
   \frac{\gamma (AXBXC)_{ij}}{\gamma X}=A'E_{ij}^{pq}C'X'B'+B'X'A'E_{ij}^{pq}C'
   \]
   and
   \[
   D(AXBXC)=(C'X'B'\otimes A)+(C'\otimes AXB).
   \]
   Moreover,
   \[
   \frac{\delta AXBXC}{\delta X}=(\operatorname{vec}A)(\operatorname{rvec}BXC)+(\operatorname{vec}AXB)(\operatorname{rvec}C)
   \]
   and
   \[
   \frac{\gamma AXBXC}{\gamma X}=(\operatorname{vec}A')(\operatorname{rvec}C'X'B')+(\operatorname{vec}B'X'A')(\operatorname{rvec}C').
   \]
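As promised in Example 1, here is a small numerical check (my own sketch, not from the text; the helpers and test matrices are assumptions) that D(AXB) = B'⊗A and that the Concept 2 derivative of AXB is (vec A)(rvec B).

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def rvec(M):
    # rvec M = (vec M')', i.e. the rows of M laid out in one long row
    return M.reshape(1, -1, order='C')

def D(F, X, eps=1e-6):
    Y0 = F(X)
    out = np.zeros((Y0.size, X.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[:, [l]] = (vec(F(X + dX.reshape(X.shape, order='F'))) - vec(Y0)) / eps
    return out

def concept2(F, X, eps=1e-6):
    m, n = X.shape
    Y0 = F(X); p, q = Y0.shape
    out = np.zeros((m * p, n * q))
    for r in range(m):
        for s in range(n):
            Xp = X.copy(); Xp[r, s] += eps
            out[r*p:(r+1)*p, s*q:(s+1)*q] = (F(Xp) - Y0) / eps
    return out

rng = np.random.default_rng(4)
A, B = rng.standard_normal((2, 3)), rng.standard_normal((4, 2))   # A p x m, B n x q
X = rng.standard_normal((3, 4))                                    # m = 3, n = 4
F = lambda X: A @ X @ B

print(np.allclose(D(F, X), np.kron(B.T, A), atol=1e-4))            # D(AXB) = B' (x) A
print(np.allclose(concept2(F, X), vec(A) @ rvec(B), atol=1e-4))    # (vec A)(rvec B)
```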
I hope these examples make clear that this transformation principle ensures
that it is a very easy matter to move from a result involving one of the
concepts of matrix derivatives to the corresponding results for the other
two concepts. Although this principle covers a lot of cases, it does not cover
them all. Several matrix calculus results for Concept 1 involve multiplying a
Kronecker product by a commutation matrix. The following transformation
principle covers this case.

4.7 Transformation Principle Two


Suppose then that
\[
DY(X)=K_{qp}(C\otimes E)=(E\otimes C)K_{mn}
\]
where C is an p×n matrix and E is an q×m matrix. Forming δY/δx_rs from
this matrix requires that we first obtain the ℓth column of this matrix, where
ℓ = (s − 1)m + r, and we take the rvec_p of this column. Again, we can call
on the work we did in Chapter 2. From Equation 2.22 of that chapter,
\[
\frac{\delta Y}{\delta x_{rs}}=CE_{sr}^{nm}E'.
\]
δxrs
In forming γy_ij/γX from DY, we first have to obtain the tth row of this
matrix, for t = (j − 1)p + i, and then we take the vec_m of this row. The
required matrix γy_ij/γX is the transpose of the matrix thus obtained.
Again, we call on the work we did in Chapter 2. From Equation 2.19 of that
chapter,
\[
\frac{\gamma y_{ij}}{\gamma X}=\bigl(C'E_{ij}^{pq}E\bigr)'=E'E_{ji}^{qp}C.
\]
This leads us to our second transformation principle.

The Second Transformation Principle


Let C be an p×n matrix and E be an q×m matrix. Whenever
\[
DY(X)=K_{qp}(C\otimes E),
\]
regardless of whether C and E are matrix functions of X or not,
\[
\frac{\delta Y}{\delta x_{rs}}=CE_{sr}^{nm}E'
\quad\text{and}\quad
\frac{\gamma y_{ij}}{\gamma X}=E'E_{ji}^{qp}C,
\]
and the converse statements are true also.

For this case,
\[
\frac{\delta Y}{\delta X}=\begin{pmatrix}CE_{11}^{nm}E'&\cdots&CE_{n1}^{nm}E'\\ \vdots& &\vdots\\ CE_{1m}^{nm}E'&\cdots&CE_{nm}^{nm}E'\end{pmatrix}
=(I_m\otimes C)\begin{pmatrix}E_{11}^{nm}&\cdots&E_{n1}^{nm}\\ \vdots& &\vdots\\ E_{1m}^{nm}&\cdots&E_{nm}^{nm}\end{pmatrix}(I_n\otimes E')
=(I_m\otimes C)K_{mn}(I_n\otimes E').
\]

In terms of γY/γX, we have
\[
\frac{\gamma Y}{\gamma X}=\begin{pmatrix}E'E_{11}^{qp}C&\cdots&E'E_{q1}^{qp}C\\ \vdots& &\vdots\\ E'E_{1p}^{qp}C&\cdots&E'E_{qp}^{qp}C\end{pmatrix}
=(I_p\otimes E')K_{pq}(I_q\otimes C).
\]

In terms of the full matrices, we can express the Second Transformation
Principle as saying that the following statements are equivalent:
\[
DY(X)=K_{qp}(C\otimes E),
\qquad
\frac{\delta Y}{\delta X}=(I_m\otimes C)K_{mn}(I_n\otimes E'),
\qquad
\frac{\gamma Y}{\gamma X}=(I_p\otimes E')K_{pq}(I_q\otimes C).
\]
As an example of the use of this second transformation principle, let Y = AX'B where A is p×n and B is m×q. Then, it is known that
\[
D(AX'B)=K_{qp}(A\otimes B').
\]
It follows that
\[
\frac{\delta AX'B}{\delta x_{rs}}=AE_{sr}^{nm}B
\quad\text{and}\quad
\frac{\gamma (AX'B)_{ij}}{\gamma X}=BE_{ji}^{qp}A.
\]
In terms of the entire matrices, we have
\[
\frac{\delta AX'B}{\delta X}=(I_m\otimes A)K_{mn}(I_n\otimes B)
\quad\text{and}\quad
\frac{\gamma AX'B}{\gamma X}=(I_p\otimes B)K_{pq}(I_q\otimes A).
\]
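A quick numerical check of this example is given below (my own sketch; the helper names are assumptions, not the book's code). It confirms the form used above, D(AX'B) = K_qp(A⊗B') = (B'⊗A)K_mn, by comparing both expressions with a forward-difference derivative.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def commutation(m, n):
    # K_{mn} vec(A) = vec(A') for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def D(F, X, eps=1e-6):
    Y0 = F(X)
    out = np.zeros((Y0.size, X.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[:, [l]] = (vec(F(X + dX.reshape(X.shape, order='F'))) - vec(Y0)) / eps
    return out

rng = np.random.default_rng(5)
p, n, m, q = 2, 3, 4, 2
A, B = rng.standard_normal((p, n)), rng.standard_normal((m, q))
X = rng.standard_normal((m, n))
DY = D(lambda X: A @ X.T @ B, X)                       # Y = A X' B is p x q

lhs = commutation(q, p) @ np.kron(A, B.T)              # K_qp (A (x) B')
rhs = np.kron(B.T, A) @ commutation(m, n)              # (B' (x) A) K_mn
print(np.allclose(DY, lhs, atol=1e-4), np.allclose(DY, rhs, atol=1e-4))   # True True
```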
Principle 2 comes into its own when it is used in conjunction with Princi-
ple 1. Many matrix derivatives come in two parts: one where Principle 1 is
applicable and the other where Principle 2 is applicable.
For example, we often have

DY (X ) = A ⊗ B + Kqp (C ⊗ E ),

so we would apply Principle 1 to the A ⊗ B part and Principle 2 to the


Kqp (C ⊗ E ) part.

Examples of the Combined Use of Principles One and Two


1. Let Y = X'AX where X is m×n and A is m×m. Then, it is well known that
   \[
   D(X'AX)=K_{nn}(I_n\otimes X'A')+(I_n\otimes X'A).
   \]
   It follows that
   \[
   \frac{\delta X'AX}{\delta x_{rs}}=E_{sr}^{nm}AX+X'AE_{rs}^{mn}
   \]
   and that
   \[
   \frac{\gamma (X'AX)_{ij}}{\gamma X}=AXE_{ji}^{nn}+A'XE_{ij}^{nn}.
   \]
   Moreover,
   \[
   \frac{\delta X'AX}{\delta X}=K_{mn}(I_n\otimes AX)+(I_m\otimes X'A)U_{mn}
   =K_{mn}(I_n\otimes AX)+(\operatorname{vec}X'A)(\operatorname{rvec}I_n),
   \]
   \[
   \frac{\gamma X'AX}{\gamma X}=(I_n\otimes AX)K_{nn}+(I_n\otimes A'X)U_{nn}
   =(I_n\otimes AX)K_{nn}+(\operatorname{vec}A'X)(\operatorname{rvec}I_n).
   \]
2. Let Y = XAX' where X is m×n and A is n×n. Then, it is known that
   \[
   \frac{\delta XAX'}{\delta x_{rs}}=XAE_{sr}^{nm}+E_{rs}^{mn}AX'.
   \]
   It follows that
   \[
   \frac{\gamma (XAX')_{ij}}{\gamma X}=E_{ji}^{mm}XA+E_{ij}^{mm}XA'
   \]
   and
   \[
   D(XAX')=K_{mm}(XA\otimes I_m)+(XA'\otimes I_m). \tag{4.18}
   \]
   Moreover,
   \[
   \frac{\delta XAX'}{\delta X}=(I_m\otimes XA)K_{mn}+U_{mn}(I_n\otimes AX')
   =(I_m\otimes XA)K_{mn}+(\operatorname{vec}I_m)(\operatorname{rvec}AX'),
   \]
   and
   \[
   \frac{\gamma XAX'}{\gamma X}=K_{mm}(I_m\otimes XA)+U_{mm}(I_m\otimes XA')
   =K_{mm}(I_m\otimes XA)+(\operatorname{vec}I_m)(\operatorname{rvec}XA').
   \]

3. Let Y = BX'AXC where B is p×n, A is m×m, and C is n×q. Then, it is known that
   \[
   \frac{\gamma (BX'AXC)_{ij}}{\gamma X}=AXCE_{ji}^{qp}B+A'XB'E_{ij}^{pq}C'.
   \]
   It follows using our principles that
   \[
   \frac{\delta BX'AXC}{\delta x_{rs}}=BE_{sr}^{nm}AXC+BX'AE_{rs}^{mn}C
   \]
   and that
   \[
   D(BX'AXC)=K_{qp}(B\otimes C'X'A')+(C'\otimes BX'A).
   \]
   In terms of the entire matrices, we have
   \[
   \frac{\delta BX'AXC}{\delta X}=(I_m\otimes B)K_{mn}(I_n\otimes AXC)+(I_m\otimes BX'A)U_{mn}(I_n\otimes C)
   =(I_m\otimes B)K_{mn}(I_n\otimes AXC)+(\operatorname{vec}BX'A)(\operatorname{rvec}C),
   \]
   \[
   \frac{\gamma BX'AXC}{\gamma X}=(I_p\otimes AXC)K_{pq}(I_q\otimes B)+(I_p\otimes A'XB')U_{pq}(I_q\otimes C')
   =(I_p\otimes AXC)K_{pq}(I_q\otimes B)+(\operatorname{vec}A'XB')(\operatorname{rvec}C').
   \]
4. Let Y = BXAX'C where B is p×m, A is n×n, and C is m×q. Then, it is well known that
   \[
   D(BXAX'C)=K_{qp}(BXA\otimes C')+(C'XA'\otimes B).
   \]
   Using our principles, we obtain
   \[
   \frac{\delta BXAX'C}{\delta x_{rs}}=BXAE_{sr}^{nm}C+BE_{rs}^{mn}AX'C
   \]
   and
   \[
   \frac{\gamma (BXAX'C)_{ij}}{\gamma X}=CE_{ji}^{qp}BXA+B'E_{ij}^{pq}C'XA'.
   \]
   Moreover, we have
   \[
   \frac{\delta BXAX'C}{\delta X}=(I_m\otimes BXA)K_{mn}(I_n\otimes C)+(I_m\otimes B)U_{mn}(I_n\otimes AX'C)
   =(I_m\otimes BXA)K_{mn}(I_n\otimes C)+(\operatorname{vec}B)(\operatorname{rvec}AX'C),
   \]
   \[
   \frac{\gamma BXAX'C}{\gamma X}=(I_p\otimes C)K_{pq}(I_q\otimes BXA)+(I_p\otimes B')U_{pq}(I_q\otimes C'XA')
   =(I_p\otimes C)K_{pq}(I_q\otimes BXA)+(\operatorname{vec}B')(\operatorname{rvec}C'XA').
   \]

Comparing Example 1 with Example 3, and Example 4 with Example 2,


points to rules that pertain to the different concepts of derivatives
themselves. If Y (X ) is an p×q matrix function of an m×n matrix X,
and A, B, and C are matrices of constants, then

\[
D(BYC)=(C'\otimes B)\,DY(X),
\qquad
\frac{\delta BYC}{\delta x_{rs}}=B\,\frac{\delta Y}{\delta x_{rs}}\,C,
\qquad
\frac{\delta BYC}{\delta X}=(I_m\otimes B)\,\frac{\delta Y}{\delta X}\,(I_n\otimes C).
\]
The third concept of a matrix derivative is not so accommodating.
Certainly, there are rules that allow you to move from γYi j /γX and
γY /γX to γ(BYC )i j /γX and γBYC/γX respectively, but these are
more complicated.
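The first of these rules is easily verified numerically. The sketch below (my own illustration with hypothetical helper names, not the book's code) checks D(BYC) = (C'⊗B)DY(X) for Y = X'AX.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def D(F, X, eps=1e-6):
    # Concept 1 derivative by forward differences: rows indexed by vec Y, columns by vec X
    Y0 = F(X)
    out = np.zeros((Y0.size, X.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[:, [l]] = (vec(F(X + dX.reshape(X.shape, order='F'))) - vec(Y0)) / eps
    return out

rng = np.random.default_rng(6)
m, n = 3, 2
A = rng.standard_normal((m, m))
B, C = rng.standard_normal((4, n)), rng.standard_normal((n, 5))   # constants around Y
X = rng.standard_normal((m, n))

Y   = lambda X: X.T @ A @ X          # Y(X) is n x n
BYC = lambda X: B @ Y(X) @ C

print(np.allclose(D(BYC, X), np.kron(C.T, B) @ D(Y, X), atol=1e-4))   # D(BYC) = (C'(x)B) DY(X)
```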
The following results are not as well known:
5. Let Y = E'E where E = A + BXC with A p×q, B p×m, and C n×q.
   Then, from Lutkepohl (1996), p. 191, we have
   \[
   D(E'E)=K_{qq}(C'\otimes E'B)+C'\otimes E'B.
   \]
   Using our principles, we obtain
   \[
   \frac{\delta E'E}{\delta x_{rs}}=C'E_{sr}^{nm}B'E+E'BE_{rs}^{mn}C
   \]
   and
   \[
   \frac{\gamma (E'E)_{ij}}{\gamma X}=B'EE_{ji}^{qq}C'+B'EE_{ij}^{qq}C'.
   \]
   In terms of the complete matrices, we have
   \[
   \frac{\delta E'E}{\delta X}=(I_m\otimes C')K_{mn}(I_n\otimes B'E)+(I_m\otimes E'B)U_{mn}(I_n\otimes C)
   =(I_m\otimes C')K_{mn}(I_n\otimes B'E)+(\operatorname{vec}E'B)(\operatorname{rvec}C),
   \]
   \[
   \frac{\gamma E'E}{\gamma X}=(I_q\otimes B'E)K_{qq}(I_q\otimes C')+(I_q\otimes B'E)U_{qq}(I_q\otimes C')
   =(I_q\otimes B'E)K_{qq}(I_q\otimes C')+(\operatorname{vec}B'E)(\operatorname{rvec}C').
   \]

6. Let Y = EE' where E is as in 5. Then, from Lutkepohl (1996), p. 191, again we have
   \[
   D(EE')=K_{pp}(EC'\otimes B)+(EC'\otimes B).
   \]



   It follows that
   \[
   \frac{\delta EE'}{\delta x_{rs}}=EC'E_{sr}^{nm}B'+BE_{rs}^{mn}CE',
   \]
   \[
   \frac{\gamma (EE')_{ij}}{\gamma X}=B'E_{ji}^{pp}EC'+B'E_{ij}^{pp}EC',
   \]
   or in terms of complete matrices,
   \[
   \frac{\delta EE'}{\delta X}=(I_m\otimes EC')K_{mn}(I_n\otimes B')+(I_m\otimes B)U_{mn}(I_n\otimes CE')
   =(I_m\otimes EC')K_{mn}(I_n\otimes B')+(\operatorname{vec}B)(\operatorname{rvec}CE'),
   \]
   \[
   \frac{\gamma EE'}{\gamma X}=(I_p\otimes B')K_{pp}(I_p\otimes EC')+(I_p\otimes B')U_{pp}(I_p\otimes EC')
   =(I_p\otimes B')K_{pp}(I_p\otimes EC')+(\operatorname{vec}B')(\operatorname{rvec}EC').
   \]

The next chapter looks at some new matrix calculus results or at least
old results expressed in a new way. We deal with matrix derivatives using
Concept 4 that involves cross-products and generalized vecs and rvecs.
As far as cross-products are concerned, we can apply our principles to
the transpose of every Kronecker product in the cross-product to get the
corresponding results for the other concepts of matrix derivatives.

4.8 Recursive Derivatives


Let Y (x) be an p×q matrix function of an m×1 vector x. Then, Rilstone,
Srivastava, and Ullah (1996) consider a derivative of Y with respect to x that
is a variation of Concept 3. That is, they define this derivative as
\[
\nabla Y=\begin{pmatrix}Dy_{11}&\cdots&Dy_{1q}\\ \vdots& &\vdots\\ Dy_{p1}&\cdots&Dy_{pq}\end{pmatrix} \tag{4.19}
\]

where y_ij is the (i, j)th element of Y. As each submatrix
\[
Dy_{ij}=\Bigl(\frac{\partial y_{ij}}{\partial x_1}\ \cdots\ \frac{\partial y_{ij}}{\partial x_m}\Bigr)
\]
is 1×m, this matrix is p×qm. They then define a matrix of the second-order
partial derivatives, the ∂²y_ij/∂x_r∂x_s, as ∇²Y = ∇(∇Y). That is, to form

∇²Y, we take D of every element in ∇Y, so ∇²Y is p×qm². Matrices of
higher-order partial derivatives can be defined recursively by
\[
\nabla^{v}Y=\nabla(\nabla^{v-1}Y)
\]
where ∇^v Y is a p×qm^v matrix. If we let y_{ij}^{v-1} denote the (i, j)th element
of ∇^{v-1}Y for i = 1, ..., p and j = 1, ..., qm^{v-1}, then in ∇^v Y this becomes
the 1×m vector Dy_{ij}^{v-1}.
In this section, we want to look at the relationships between ∇Y and
higher order derivatives as defined by Rilstone, Srivastava, and Ullah (1996),
on the one hand, with those derived using the more conventional concept
of a matrix derivative, Concept 1 studied in the previous sections.
Consider Concept 1 for this case, which is the pq×m matrix
\[
DY(x)=\begin{pmatrix}Dy_{11}\\ \vdots\\ Dy_{p1}\\ \vdots\\ Dy_{1q}\\ \vdots\\ Dy_{pq}\end{pmatrix}. \tag{4.21}
\]
By comparing Equations 4.19 and 4.21, we see that
∇Y = rvec p DY (x),
and
DY (x) = vec∇Y.
Two special cases come to mind. If Y is a scalar function of x, so p = q = 1,
or if Y is a vector function of x, so q = 1, then
∇Y = DY (x).
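The relationship ∇Y = rvec_p DY(x) can be checked numerically. The sketch below (my own, with assumed helper names and an arbitrary test function; not part of the text) builds both operators by forward differences for a small matrix function and compares them.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def rvec_p(M, p):
    """Generalized rvec: cut M into blocks of p rows and lay the blocks side by side."""
    return np.hstack([M[k*p:(k+1)*p, :] for k in range(M.shape[0] // p)])

def D(F, x, eps=1e-6):
    """Concept 1 derivative of the matrix function F of a vector x: rows indexed by vec Y."""
    Y0 = F(x)
    out = np.zeros((Y0.size, x.size))
    for l in range(x.size):
        dx = np.zeros(x.size); dx[l] = eps
        out[:, [l]] = (vec(F(x + dx)) - vec(Y0)) / eps
    return out

def nabla(F, x, eps=1e-6):
    """Rilstone-Srivastava-Ullah operator: the (i, j) block of nabla Y is the 1 x m row Dy_ij."""
    Y0 = F(x)
    p, q = Y0.shape
    out = np.zeros((p, q * x.size))
    for j in range(q):
        for i in range(p):
            for l in range(x.size):
                dx = np.zeros(x.size); dx[l] = eps
                out[i, j * x.size + l] = (F(x + dx)[i, j] - Y0[i, j]) / eps
    return out

rng = np.random.default_rng(7)
x = rng.standard_normal(3)                                  # m = 3
F = lambda x: np.outer(x, x)[:2, :]                         # an arbitrary 2 x 3 matrix function
print(np.allclose(nabla(F, x), rvec_p(D(F, x), 2), atol=1e-4))   # nabla Y = rvec_p DY(x)
```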
Consider now the p×qm² matrix
\[
\nabla^2Y=\begin{pmatrix}
D\Bigl(\dfrac{\partial y_{11}}{\partial x_1}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{11}}{\partial x_m}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{1q}}{\partial x_1}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{1q}}{\partial x_m}\Bigr)\\
\vdots& &\vdots& &\vdots& &\vdots\\
D\Bigl(\dfrac{\partial y_{p1}}{\partial x_1}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{p1}}{\partial x_m}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{pq}}{\partial x_1}\Bigr)&\cdots&D\Bigl(\dfrac{\partial y_{pq}}{\partial x_m}\Bigr)
\end{pmatrix} \tag{4.22}
\]

and compare it with the matrix of second-order partial derivatives that
would have been formed using Concept 1, namely the pqm×m matrix,
which written out in full is
\[
D^2Y=D(\operatorname{vec}DY)=\begin{pmatrix}
D\Bigl(\dfrac{\partial y_{11}}{\partial x_1}\Bigr)\\ \vdots\\ D\Bigl(\dfrac{\partial y_{p1}}{\partial x_1}\Bigr)\\ \vdots\\
D\Bigl(\dfrac{\partial y_{1q}}{\partial x_1}\Bigr)\\ \vdots\\ D\Bigl(\dfrac{\partial y_{pq}}{\partial x_1}\Bigr)\\ \vdots\\
D\Bigl(\dfrac{\partial y_{11}}{\partial x_m}\Bigr)\\ \vdots\\ D\Bigl(\dfrac{\partial y_{p1}}{\partial x_m}\Bigr)\\ \vdots\\
D\Bigl(\dfrac{\partial y_{1q}}{\partial x_m}\Bigr)\\ \vdots\\ D\Bigl(\dfrac{\partial y_{pq}}{\partial x_m}\Bigr)
\end{pmatrix}. \tag{4.23}
\]
Comparing Equations 4.22 and 4.23, we have that
\[
D^2Y=\operatorname{vec}_m\bigl[(\nabla^2Y)T'_{m,m,\ldots,m}\bigr],
\]
where T_{m,m,...,m} is the appropriate twining matrix. But from Equation 2.69 of Chapter 2,
\[
T'_{m,m,\ldots,m}=K_{qm}\otimes I_m,
\]
so we have
\[
D^2Y=\operatorname{vec}_m\bigl[(\nabla^2Y)(K_{qm}\otimes I_m)\bigr]. \tag{4.24}
\]

Moreover, as a generalized vec can always be undone by a generalized rvec,
rvec_p in this case, we have
\[
\operatorname{rvec}_p(D^2Y)=\nabla^2Y\,(K_{qm}\otimes I_m),
\]
so
\[
\nabla^2Y=\operatorname{rvec}_p(D^2Y)(K_{mq}\otimes I_m). \tag{4.25}
\]
If Y(x) = y(x) is a vector function of x, so q = 1, then as K_{1m} = K_{m1} = I_m,
we have
\[
D^2y=\operatorname{vec}_m\nabla^2y
\quad\text{and}\quad
\nabla^2y=\operatorname{rvec}_p\,D^2y.
\]
If Y(x) = ℓ(x) is a scalar function of x, so p = q = 1, then we have
\[
D^2\ell(x)=\operatorname{vec}_m\nabla^2\ell
\quad\text{and}\quad
\nabla^2\ell=\operatorname{rvec}\,(D^2\ell(x)).
\]
Notice that using Concept 1,
\[
D(D\ell(x))'
\]
is the Hessian matrix of the function ℓ(x). That is, it is the m×m symmetric
matrix whose (i, j)th element is ∂²ℓ/∂x_j∂x_i = ∂²ℓ/∂x_i∂x_j. Again, notice also that
using Concept 1, if ℓ(x) is a scalar function of an n×1 vector x, b is a p×1
vector of constants, and A is an m×q matrix of constants, then
\[
D(b\ell(x))=bD(\ell(x)) \tag{4.28}
\]
and
\[
D(A\ell(x))=\operatorname{vec}A\,D(\ell(x)). \tag{4.29}
\]
Continuing with the special case Y(x) = y(x) further, suppose we denote
the pm^{v-2}×m matrix of (v − 1)th-order partial derivatives we get using Concept 1
by D^{v-1}y(x). Then, the pm^{v-1}×m matrix of vth-order partial derivatives we
get using this concept is
\[
D^{v}y=D(\operatorname{vec}D^{v-1}y(x))
\]

and a little reflection shows that
\[
D^{v}y=\operatorname{vec}_m\nabla^{v}y \tag{4.26}
\]
and
\[
\nabla^{v}y=\operatorname{rvec}_p(D^{v}y) \tag{4.27}
\]
for all higher-order derivatives v ≥ 2.
We illustrate these results by way of the example provided by Rilstone
et al. (1996). They consider the version of the exponential regression model
where the probability density function for y_t is
\[
f(y_t;\beta)=\exp\bigl(x_t'\beta-y_t\exp(x_t'\beta)\bigr),
\]
where x_t is a K×1 vector of constants and β is a K×1 vector of unknown parameters.¹
The probability density function of our sample is
\[
f(y;\beta)=\prod_{t=1}^{n}\exp\bigl[x_t'\beta-y_t\exp(x_t'\beta)\bigr],
\]
so the log-likelihood function is
\[
\ell(\beta)=\sum_{t=1}^{n}\bigl[x_t'\beta-y_t\exp(x_t'\beta)\bigr].
\]

Sticking with Concept 1,
\[
D(\ell(\beta))=\sum_{t=1}^{n}D\bigl[x_t'\beta-y_t\exp(x_t'\beta)\bigr].
\]
Using the chain rule of ordinary calculus and Equation 4.29, we obtain
\[
D(\ell(\beta))=\sum_{t=1}^{n}\bigl[x_t'-y_t\exp(x_t'\beta)x_t'\bigr]. \tag{4.30}
\]

Taking the transpose of Equation 4.30, the maximum likelihood estimator $\hat\beta$ of β solves
\[
\frac{1}{n}\sum_{t=1}^{n}q_t(\hat\beta)=0,
\]

where the K×1 vector q_t(β) is given by
\[
q_t(\beta)=x_t-y_t\exp(x_t'\beta)x_t.
\]

¹ Rilstone et al. actually consider the case where x_t is a random vector and f(y_t;β) is the conditional density function, but as we are illustrating matrix calculus techniques we take the simpler case.
Rilstone et al. (1996) obtain successively higher-order derivatives of q_t(β)
using their recursive operator ∇. Here, instead, the derivatives are obtained
using Concept 1 and the corresponding results for ∇ are obtained using
Equation 4.27.
Write
\[
q_t(\beta)=x_t-x_t\mu_t(\beta),
\]
where μ_t(β) is the scalar function of β given by μ_t(β) = y_t exp(x_t'β). Using
the chain rule of ordinary calculus and Equation 4.29, we have
\[
D\mu_t(\beta)=y_t\exp(x_t'\beta)x_t'=x_t'\mu_t,
\]
so using Concept 1 and Equation 4.28,
\[
Dq_t(\beta)=-x_tx_t'\mu_t=\nabla q_t(\beta).
\]
Differentiating again using Concept 1 gives
\[
D^2q_t(\beta)=D(\operatorname{vec}(-x_tx_t'\mu_t))=-(\operatorname{vec}x_tx_t')x_t'\mu_t.
\]
But from Equation 1.11 of Chapter 1, vec x_tx_t' = x_t ⊗ x_t, so
\[
D^2q_t(\beta)=A_1\mu_t,
\]
where A_1 = −x_t ⊗ x_tx_t'. Differentiating again using Equation 4.28 gives
\[
D^3q_t(\beta)=D(\operatorname{vec}A_1\mu_t)=(\operatorname{vec}A_1)x_t'\mu_t.
\]
Again, by Equation 1.11 of Chapter 1,
\[
\operatorname{vec}A_1=-\operatorname{vec}(x_t\otimes x_t)x_t'=-x_t\otimes x_t\otimes x_t,
\]
so
\[
D^3q_t(\beta)=A_2\mu_t,
\]
where A_2 = −x_t ⊗ x_t ⊗ x_tx_t'. Continuing in this manner, it is clear that
\[
D^{v}q_t(\beta)=A_{v-1}\mu_t=-(x_t\otimes\cdots\otimes x_t)x_t'\mu_t
\]
for v ≥ 2, where
\[
A_{v-1}=-\underbrace{x_t\otimes\cdots\otimes x_t}_{v-1}\otimes\,x_tx_t'.
\]

Using Equation 4.27, we have
\[
\nabla^{v}q_t(\beta)=(\operatorname{rvec}_K A_{v-1})\mu_t.
\]
But from Equation 1.17 of Chapter 1,
\[
\operatorname{rvec}_K A_{v-1}=-(x_t\otimes\cdots\otimes x_t)'\otimes x_tx_t'
=-x_t'\otimes\cdots\otimes x_t'\otimes x_tx_t'
=-x_t(x_t'\otimes\cdots\otimes x_t'),
\]
so
\[
\nabla^{v}q_t(\beta)=-x_t(x_t'\otimes\cdots\otimes x_t')\mu_t,\qquad v\ge 2.
\]
Notice that for this special example,
\[
\nabla^{v}q_t(\beta)=(D^{v}q_t(\beta))'.
\]
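A small simulation sketch closes the example (my own; the simulated data, sample size, and helper names are assumptions, not part of the text). It confirms the score in Equation 4.30 and the first derivative Dq_t(β) = −x_t x_t' μ_t for this exponential regression model.

```python
import numpy as np

rng = np.random.default_rng(8)
n, K = 200, 3
X = rng.standard_normal((n, K))                 # rows are the x_t'
beta = np.array([0.5, -0.3, 0.2])
y = rng.exponential(scale=np.exp(-X @ beta))    # E[y_t] = exp(-x_t' beta) under this model
eps = 1e-6

def loglik(b):
    return np.sum(X @ b - y * np.exp(X @ b))

def score(b):
    # Equation 4.30 transposed to a column: sum_t [x_t - y_t exp(x_t' b) x_t]
    mu = y * np.exp(X @ b)
    return X.T @ (1.0 - mu)

num = np.array([(loglik(beta + eps * np.eye(K)[k]) - loglik(beta)) / eps for k in range(K)])
print(np.allclose(num, score(beta), atol=1e-3))  # True

def q_t(b, t):
    return X[t] - y[t] * np.exp(X[t] @ b) * X[t]

t = 0
Dq_num = np.array([(q_t(beta + eps * np.eye(K)[k], t) - q_t(beta, t)) / eps
                   for k in range(K)]).T
mu_t = y[t] * np.exp(X[t] @ beta)
print(np.allclose(Dq_num, -np.outer(X[t], X[t]) * mu_t, atol=1e-3))   # Dq_t = -x_t x_t' mu_t
```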
FIVE

New Matrix Calculus Results

5.1 Introduction
In this chapter, we develop new matrix calculus results or at least view
existing results in a new light. We concentrate on results that involve the
mathematical concepts developed in Chapters 1 and 2, particularly results
that involve generalized vecs and rvecs on the one hand and cross-products
on the other.
We avoid as much as possible matrix calculus results that are well known.
If the reader wants to familiarize themselves with these, then I refer them to
Magnus and Neudecker (1999), Lutkepohl (1996), and Turkington (2005).
Having said this, however, because I want this book to be self-contained,
it is necessary for me to at least present matrix calculus results, which we
use all the time in our derivations. These results on the whole form rules,
which are the generalizations of the chain rule and the product rule of
ordinary calculus.
We saw in the last chapter that at least four different concepts of matrix
derivatives are prevalent in the literature and that using transformation
principles is an easy matter to move from results derived for one concept to
the corresponding results for the other concepts. That is not to say, however,
that new results can be just as easily obtained regardless of what concept one
chooses to work with. Experience has shown that by far the easiest concept
to use in deriving results for difficult cases is Concept 1, or the transpose
of this concept, which we called Concept 4. In the following sections, we
develop basic rules for Concept 4.

5.2 Concept of a Matrix Derivative Used


In this chapter the concept of a matrix derivative used is Concept 4.
Recall that if y = y(x) is an s×1 vector function of x, an r ×1 vector

then under this concept we define ∂y/∂x as
\[
\frac{\partial y}{\partial x}=\begin{pmatrix}\dfrac{\partial y_1}{\partial x_1}&\cdots&\dfrac{\partial y_s}{\partial x_1}\\ \vdots& &\vdots\\ \dfrac{\partial y_1}{\partial x_r}&\cdots&\dfrac{\partial y_s}{\partial x_r}\end{pmatrix}. \tag{5.1}
\]
(See Graham 1981.) If Y is a p×q matrix whose elements y_ij are differentiable
functions of the elements x_rs of an m×n matrix X, then the derivative of Y
with respect to X we work with is the mn×pq matrix
\[
\frac{\partial \operatorname{vec}Y}{\partial \operatorname{vec}X}=\begin{pmatrix}
\dfrac{\partial y_{11}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{11}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{11}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{m1}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{m1}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{1n}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{1n}}\\
\vdots& &\vdots& &\vdots& &\vdots\\
\dfrac{\partial y_{11}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{p1}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{1q}}{\partial x_{mn}}&\cdots&\dfrac{\partial y_{pq}}{\partial x_{mn}}
\end{pmatrix},
\]
where ∂yi j /∂xrs is the partial derivative of yi j with respect to xrs . A column
of this matrix gives the derivatives of yi j with respect to all the elements of
X , x11 . . . xm1 . . . x1n . . . xmn . A row of this matrix gives the derivatives of
y11 . . . y p1 . . . y1q... y pq with respect to xrs , a single element of X.
If y is a scalar, then y(x) is a scalar function of x and ∂y/∂x is the r×1 vector given by
\[
\frac{\partial y}{\partial x}=\begin{pmatrix}\dfrac{\partial y}{\partial x_1}\\ \vdots\\ \dfrac{\partial y}{\partial x_r}\end{pmatrix},
\]
which is often called the gradient vector of y. Similarly, if x is a scalar, ∂y/∂x
is the 1×s vector
\[
\frac{\partial y}{\partial x}=\Bigl(\frac{\partial y_1}{\partial x}\ \cdots\ \frac{\partial y_s}{\partial x}\Bigr).
\]

For the general case given by Equation 5.1, where y and x are s×1 and
r ×1 vectors, respectively, the jth column of ∂y/∂x is the derivative of a
scalar function with respect to a vector, namely ∂y j /∂x, whereas the i th
row of the matrix ∂y/∂x is the derivative of a vector with respect to a scalar,
namely ∂y/∂xi .
In deriving results, where y = vec Y is a complicated vector function of
x = vec X , we need a few basic rules for ∂y/∂x, which I now intend to give
with proofs. For a more complete list of known matrix calculus results,
consult the references previously given.
The last section presents some simple theorems concerning ∂vec Y/
∂vec X. These theorems at first glance appear trivial, but taken together
they give a very effective method of finding new matrix calculus results.
This method is then applied to obtain new results for derivatives involving
vec A, vech A, and v(A) where A is an n×n matrix.

5.3 Some Basic Rules of Matrix Calculus


Theorem 5.1 Let x be an r×1 vector and let A be a matrix of constants; that
is, the elements of A = {a_ij} are not scalar functions of x. Then
\[
\frac{\partial Ax}{\partial x}=A',
\qquad
\frac{\partial x'Ax}{\partial x}=(A+A')x=2Ax\ \text{ if A is symmetric.}
\]

Proof: The jth element of Ax is $\sum_k a_{jk}x_k$, so the jth column of ∂Ax/∂x is $A_{j\cdot}'$
and ∂Ax/∂x = A'. The jth element of ∂x'Ax/∂x is
$\partial x'Ax/\partial x_j=\sum_i a_{ij}x_i+\sum_\ell a_{j\ell}x_\ell$, so ∂x'Ax/∂x = (A + A')x.
Clearly, if A is symmetric, the result becomes 2Ax. □

The next rule represents a generalization of the chain rule of ordinary


calculus.

Theorem 5.2 (The Backward Chain Rule) Let x = (xi ), y = (yℓ ), and
z = (z j ) be r ×1, s×1, and t ×1 vectors, respectively. Suppose z is a vec-
tor function of y, which in turn is a vector function of x, so we can write
z = z[y(x)].

Then,
\[
\frac{\partial z}{\partial x}=\frac{\partial y}{\partial x}\,\frac{\partial z}{\partial y}.
\]

Proof: The (i, j)th element of the matrix ∂z/∂x is
\[
\Bigl(\frac{\partial z}{\partial x}\Bigr)_{ij}=\frac{\partial z_j}{\partial x_i}
=\sum_{k=1}^{s}\frac{\partial y_k}{\partial x_i}\frac{\partial z_j}{\partial y_k}
=\Bigl(\frac{\partial y}{\partial x}\Bigr)_{i\cdot}\Bigl(\frac{\partial z}{\partial y}\Bigr)_{\cdot j}
=\Bigl(\frac{\partial y}{\partial x}\frac{\partial z}{\partial y}\Bigr)_{ij}.
\]
Hence,
\[
\frac{\partial z}{\partial x}=\frac{\partial y}{\partial x}\,\frac{\partial z}{\partial y}. \qquad\square
\]
In developing the next rule, the product rule, it is useful for us to refer to
a generalization of the chain rule where z is a vector function of two vectors
u and v. This generalization is given by the following theorem.

Theorem 5.3 (Generalized Chain Rule) Let z be a t×1 vector function of
two vectors u and v, which are h×1 and k×1, respectively. Suppose u and v
are both vector functions of an r×1 vector x, so z = z[u(x), v(x)]. Then,
\[
\frac{\partial z}{\partial x}
=\frac{\partial u}{\partial x}\frac{\partial z}{\partial u}+\frac{\partial v}{\partial x}\frac{\partial z}{\partial v}
=\frac{\partial z}{\partial x}\Big|_{v\ \text{constant}}+\frac{\partial z}{\partial x}\Big|_{u\ \text{constant}}.
\]

Theorem 5.3 can now be used to obtain the following product rule.

Theorem 5.4 (The Product Rule) Let X be an m×n matrix and Y be an
n×p matrix, and suppose that the elements of both these matrices are scalar
functions of a vector z. Then,
\[
\frac{\partial \operatorname{vec}XY}{\partial z}
=\frac{\partial \operatorname{vec}X}{\partial z}(Y\otimes I_m)+\frac{\partial \operatorname{vec}Y}{\partial z}(I_p\otimes X').
\]

Proof: By Theorem 5.3, we have
\[
\frac{\partial \operatorname{vec}XY}{\partial z}
=\frac{\partial \operatorname{vec}XY}{\partial z}\Big|_{\operatorname{vec}Y\ \text{constant}}
+\frac{\partial \operatorname{vec}XY}{\partial z}\Big|_{\operatorname{vec}X\ \text{constant}}
=\frac{\partial \operatorname{vec}X}{\partial z}\frac{\partial \operatorname{vec}XY}{\partial \operatorname{vec}X}\Big|_{\operatorname{vec}Y\ \text{constant}}
+\frac{\partial \operatorname{vec}Y}{\partial z}\frac{\partial \operatorname{vec}XY}{\partial \operatorname{vec}Y}\Big|_{\operatorname{vec}X\ \text{constant}}
\]

where this last equality follows from the backward chain rule. The result
follows by noting that

vec XY = (Y ′ ⊗ Im )vec X = (Ip ⊗ X )vec Y

and applying Theorem 5.1. 

Theorem 5.4 has the following useful corollary.

Corollary to Theorem 5.4 Let x be an n×1 vector, f(x) be a scalar function
of x, u(x) and v(x) be m×1 vector functions of x, and A(x) and B(x) be
p×m and m×q matrices, respectively, whose elements are scalar functions of
x. Then,
\[
\frac{\partial f(x)x}{\partial x}=f(x)\otimes I_n+\frac{\partial f(x)}{\partial x}x'=f(x)I_n+\frac{\partial f(x)}{\partial x}x',
\]
\[
\frac{\partial f(x)u(x)}{\partial x}=\frac{\partial u(x)}{\partial x}(f(x)\otimes I_m)+\frac{\partial f(x)}{\partial x}u(x)'
=f(x)\frac{\partial u(x)}{\partial x}+\frac{\partial f(x)}{\partial x}u(x)',
\]
\[
\frac{\partial u(x)'v(x)}{\partial x}=\frac{\partial u(x)}{\partial x}v(x)+\frac{\partial v(x)}{\partial x}u(x),
\]
\[
\frac{\partial \operatorname{vec}u(x)v(x)'}{\partial x}=\frac{\partial u(x)}{\partial x}(v(x)'\otimes I_m)+\frac{\partial v(x)}{\partial x}(I_m\otimes u(x)'),
\]
\[
\frac{\partial A(x)u(x)}{\partial x}=\frac{\partial \operatorname{vec}A(x)}{\partial x}(u(x)\otimes I_p)+\frac{\partial u(x)}{\partial x}A(x)',
\]
\[
\frac{\partial \operatorname{vec}u(x)'B(x)}{\partial x}=\frac{\partial B(x)'u(x)}{\partial x}
=\frac{\partial \operatorname{vec}[B(x)']}{\partial x}(u(x)\otimes I_q)+\frac{\partial u(x)}{\partial x}B(x).
\]
These few basic results will suffice in the derivations that follow.
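The product rule of Theorem 5.4 is easy to verify numerically. The sketch below (my own; the linear parameterisations of X(z) and Y(z) and the helper names are arbitrary choices, not the book's) compares a forward-difference derivative of vec XY with the two-term formula.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def dvec_dz(F, z, eps=1e-6):
    """Concept 4 derivative d vec F(z) / d z: rows indexed by z, columns by vec F(z)."""
    V0 = vec(F(z))
    out = np.zeros((z.size, V0.size))
    for i in range(z.size):
        dz = np.zeros(z.size); dz[i] = eps
        out[i, :] = ((vec(F(z + dz)) - V0) / eps).ravel()
    return out

rng = np.random.default_rng(9)
m, n, p = 2, 3, 2
z = rng.standard_normal(4)
Cx = rng.standard_normal((m * n, z.size))   # arbitrary linear maps so X(z), Y(z) are smooth
Cy = rng.standard_normal((n * p, z.size))
X = lambda z: (Cx @ z).reshape(m, n, order='F')
Y = lambda z: (Cy @ z).reshape(n, p, order='F')
XY = lambda z: X(z) @ Y(z)

lhs = dvec_dz(XY, z)
rhs = dvec_dz(X, z) @ np.kron(Y(z), np.eye(m)) + dvec_dz(Y, z) @ np.kron(np.eye(p), X(z).T)
print(np.allclose(lhs, rhs, atol=1e-4))     # True: Theorem 5.4
```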

5.4 Matrix Calculus Results Involving Generalized


Rvecs or Cross-Products
In this section, we obtain a number of matrix calculus results that can be
expressed in terms of cross-products or generalized vecs and rvecs. In so
doing, we call on the work we have done involving these operators together
with work we did on generalized vecs and rvecs of the commutation matrix.
These results pertain to the matrix differentiation of vecs of Kronecker
products.

Recall from Section 2.5 of Chapter 2 that for an m×n matrix X, we can write
\[
\operatorname{vec}(X\otimes I_G)=(I_n\otimes \operatorname{vec}_m K_{mG})\operatorname{vec}X
\]
and
\[
\operatorname{vec}(I_G\otimes X)=(\operatorname{vec}_n K_{nG}\otimes I_m)\operatorname{vec}X.
\]
It follows using Theorem 5.1 that
\[
\frac{\partial \operatorname{vec}(X\otimes I_G)}{\partial \operatorname{vec}X}
=(I_n\otimes \operatorname{vec}_m K_{mG})'=I_n\otimes(\operatorname{vec}_m K_{mG})'=I_n\otimes \operatorname{rvec}_m K_{Gm} \tag{5.2}
\]
and that
\[
\frac{\partial \operatorname{vec}(I_G\otimes X)}{\partial \operatorname{vec}X}
=(\operatorname{vec}_n K_{nG}\otimes I_m)'=(\operatorname{vec}_n K_{nG})'\otimes I_m=\operatorname{rvec}_n K_{Gn}\otimes I_m. \tag{5.3}
\]
These two results are the building blocks of numerous other results
involving the derivatives of the vecs of Kronecker products. We see that we
can write these derivatives either in terms of generalized rvecs or in terms
of cross-products, and that in both cases our results involve the commutation
matrix.
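Equation 5.2 can be checked numerically. In the sketch below (my own; rvec_m, the commutation matrix builder and the Concept 4 differencer are ad hoc helpers, not the book's code), the forward-difference derivative of vec(X⊗I_G) with respect to vec X reproduces I_n ⊗ rvec_m K_Gm.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def commutation(m, n):
    # K_{mn} vec(A) = vec(A') for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def rvec_m(M, m):
    """Generalized rvec: cut M into blocks of m rows and lay them side by side."""
    return np.hstack([M[k*m:(k+1)*m, :] for k in range(M.shape[0] // m)])

def concept4(F, X, eps=1e-6):
    """d vec F(X) / d vec X: rows indexed by vec X, columns by vec F(X)."""
    V0 = vec(F(X))
    out = np.zeros((X.size, V0.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[l, :] = ((vec(F(X + dX.reshape(X.shape, order='F'))) - V0) / eps).ravel()
    return out

m, n, G = 2, 3, 2
X = np.random.default_rng(10).standard_normal((m, n))
num = concept4(lambda X: np.kron(X, np.eye(G)), X)
analytic = np.kron(np.eye(n), rvec_m(commutation(G, m), m))   # I_n (x) rvec_m K_Gm
print(np.allclose(num, analytic, atol=1e-4))                   # True: Equation 5.2
```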
Consider now a p×G matrix A whose elements are not functions of the elements of X. Then,
\[
\operatorname{vec}(X\otimes A)=\operatorname{vec}[(I_m\otimes A)(X\otimes I_G)]=(I_{nG}\otimes I_m\otimes A)\operatorname{vec}(X\otimes I_G).
\]
Using the backward chain rule, Theorem 5.2, we have
\[
\frac{\partial \operatorname{vec}(X\otimes A)}{\partial \operatorname{vec}X}
=\frac{\partial \operatorname{vec}(X\otimes I_G)}{\partial \operatorname{vec}X}(I_{nGm}\otimes A').
\]
From Equation 5.2, we can now write
\[
\frac{\partial \operatorname{vec}(X\otimes A)}{\partial \operatorname{vec}X}
=(I_n\otimes \operatorname{rvec}_m K_{Gm})(I_n\otimes I_{Gm}\otimes A')
=I_n\otimes (\operatorname{rvec}_m K_{Gm})(I_{Gm}\otimes A') \tag{5.4}
\]
\[
\phantom{\frac{\partial \operatorname{vec}(X\otimes A)}{\partial \operatorname{vec}X}}
=I_n\otimes \operatorname{rvec}_m\bigl[K_{Gm}(I_m\otimes A')\bigr], \tag{5.5}
\]
using Equation 1.19 of Chapter 1, which gives the derivative in terms of
generalized rvecs. If we want the equivalent result in terms of cross-products,
we apply Theorem 2.28 of Chapter 2 to Equation 5.4 to obtain
\[
\frac{\partial \operatorname{vec}(X\otimes A)}{\partial \operatorname{vec}X}
=K_{Gn}\,\tau_{Gnm}\,\bigl[K_{Gm}(I_m\otimes A')\bigr]. \tag{5.6}
\]

We can investigate this result further by applying Theorem 2.25 of Chapter 2


to Equation 5.5 to obtain

∂vec(X ⊗ A)
= In ⊗ (Im ⊗ a1′ . . . Im ⊗ aG′ ).
∂vec X
Alternatively, as
′ ⎞
In ⊗ e1G Im ⊗ a1′
⎛ ⎛ ⎞

KGn .. ′ ..
=⎝ ⎠ and KGm (Im ⊗ A ) = ⎝
⎜ ⎟ ⎜ ⎟
. . ⎠

In ⊗ eGG Im ⊗ aG′

using Equation 5.6, we can write the same result as

∂vec(X ⊗ A)  ′ ′
= In ⊗ e1G ⊗ Im ⊗ a1′ + · · · + In ⊗ eGG ⊗ Im ⊗ aG′ .
    
∂vec X
In a similar manner,

vec(A ⊗ X ) = vec[(A ⊗ Im )(IG ⊗ X )] = (IGn ⊗ A ⊗ Im )vec(IG ⊗ X ).

Using the backward chain rule and Equation 5.3, we have

∂vec(A ⊗ X )
∂vec X
∂vec(IG ⊗ X )
= (IGn ⊗ A ′ ⊗ Im ) = (rvecm KGn ⊗ Im )(IGn ⊗ A ′ ⊗ Im )
∂vec X
(5.7)
= (rvecm KGn )(IGn ⊗ A ′ ) ⊗ Im = rvecm [KGn (In ⊗ A ′ )] ⊗ Im , (5.8)

by Equation 1.19 of Chapter 1, which gives the derivative in terms of gener-


alized rvecs. If we want the equivalent result in terms of cross-products, we
apply Theorem 2.28 of Chapter 2 to Equation 5.7 to obtain

∂vec(A ⊗ X )
= IGn τGnm (A ′ ⊗ Im ). (5.9)
∂vec X
Again, we can investigate this result further by applying Theorem 2.25 of
Chapter 2 to Equation 5.8 to obtain

∂vec(A ⊗ X )
= (In ⊗ a1′ . . . In ⊗ aG′ ) ⊗ Im .
∂vec X

An alternative way of writing this result uses the cross-product given in


Equation 5.9. Write
⎛ ′ ⎞ ⎛ ′ ⎞
e1 ⊗ In a1 ⊗ Im
IGn = IG ⊗ In = ⎝ ... ⎠ , A ′ ⊗ Im = ⎝ ..
⎜ ⎟ ⎜ ⎟
. ⎠
eG′ ⊗ In aG′ ⊗ Im
so
∂vec(A ⊗ X )
= IGn τGnm (A ′ ⊗ Im ) = (e1′ ⊗ In ) ⊗ (a1′ ⊗ Im )
∂vec X
+ · · · + (eG′ ⊗ In ) ⊗ (aG′ ⊗ Im ).

Suppose now A and B are mG× p and nG×q matrices whose elements are
not functions of the elements of the m×n matrix X. Consider

vec A ′ (IG ⊗ X )B = (B ′ ⊗ A ′ )vec(IG ⊗ X ).

Applying the backward chain rule we have,

∂vec A ′ (IG ⊗ X )B ∂vec(IG ⊗ X )


= (B ⊗ A) = (rvecm KGn ⊗ Im )(B ⊗ A)
∂vec X ∂vec X
= BτGnm A, (5.10)

by Theorem 2.28 of Chapter 2.


If we partition A and B as follows,
⎛ ⎞ ⎛ ⎞
A1 B1
⎜ .. ⎟ ⎜ .. ⎟
A = ⎝ . ⎠ and B = ⎝ . ⎠
AG BG

where each submatrix Ai is m× p and each submatrix B j is n×q, then

∂vec A ′ (IG ⊗ X )B
= (B1 ⊗ A1 ) + · · · + (BG ⊗ AG ).
∂vec X
The result for A ′ (X ⊗ IG )B is easily obtained by writing

A ′ (X ⊗ IG )B = A ′ KmG (IG ⊗ X )KGn B,

so using Equation 5.10, we have

∂vec A ′ (X ⊗ IG )B
= KGn BτGnm KGm A. (5.11)
∂vec X

If we want to expand this cross-product, recall from Theorem 2.10 of Chap-


ter 2 that
⎛ (1) ⎞
A
⎜ .. ⎟
KGm A = ⎝ . ⎠
A(G)

where the A( j ) s refer to the partitioning A = (A1′ . . . Am′ ) ′ where each Ai is


G× p. So
∂vec A ′ (X ⊗ IG )B
= B (1) ⊗ A(1) + · · · + B (G) ⊗ A(G)
∂vec X
where the B ( j ) s refer to the partitioning B = (B1′ . . . Bn′ ), in which each Bi
is G×q.
Special cases of the last two results are worthy of mention.
If a is an mG×1 vector and b is an nG×1 vector, then Equation 5.10
gives
∂a ′ (IG ⊗ X )b
= bτGnm a
∂vec X
whereas Equation 5.11 gives
∂a ′ (X ⊗ IG )b
= KGn bτGnm KGm a = vec A ′ B
∂vec X
by Theorem 2.15 where A = rvecG a and B = rvecG b.
Suppose D is an G×s matrix of constants and B is now an ns×q matrix,
A and X as previously. Then, expressions such as A ′ (D ⊗ X )B are easily
handled. We write

vec A ′ (D ⊗ X )B = vec A ′ (D ⊗ Im )(Is ⊗ X )B

so, using Equation 5.10, we have


∂vec A ′ (D ⊗ X )B
= Bτsn m (D ′ ⊗ Im )A.
∂vec X
If we partition B as,
⎛ ⎞
B1
B = ⎝ ... ⎠ (5.12)
⎜ ⎟

Bs

where each submatrix B j is n×q, then

∂vec A ′ (D ⊗ X )B
= (B1 ⊗ (d1′ ⊗ Im )A) + · · · + (Bs ⊗ (ds′ ⊗ Im )A)
∂vec X
where d j is the jth column of D for j = 1, . . . , s. Similarly, if A is now ms× p,
then

A ′ (X ⊗ D)B = A ′ (Im ⊗ D)(X ⊗ Is )B

and from Equation 5.11

∂vec A ′ (X ⊗ D)B
= Ksn Bτsnm Ksm (Im ⊗ D ′ )A = B (1) ⊗ (Im ⊗ d1′ )A
∂vec X
+ · · · + B (s) ⊗ (Im ⊗ ds′ )A.

when the B ( j ) s refer to the previous partitioning given by Equation 5.12.


So far our results have been acquired with the application of Theorem
5.1 and Theorem 5.2, or the backward chain rule. Further results bring in
the product rule as presented in Theorem 5.4.
Suppose that A is m2 × p and B is n2 ×q, and both these matrices are
matrices of constants.
In obtaining the derivative of A ′ (X ⊗ X )B, we write

A ′ (X ⊗ X )B = A ′ (X ⊗ Im )(In ⊗ X )B.

Applying the product rule, we have

∂vec A ′ (X ⊗ X )B ∂vec A ′ (X ⊗ Im )
= ((In ⊗ X )B ⊗ Ip )
∂vec X ∂vec X
∂vec(In ⊗ X )B
+ (Iq ⊗ (X ′ ⊗ Im )A)
∂vec X
= (Kmn τmnm Kmm A)((In ⊗ X )B ⊗ Ip )
+ (Bτnnm Imn )(Iq ⊗ (X ′ ⊗ Im )A)

by applying Equations 5.10 and 5.11. It follows from Theorem 1.5 of Chap-
ter 1 that
∂vec A ′ (X ⊗ X )B
= Kmn (In ⊗ X )Bτmnm Kmm A + Bτnnm (X ′ ⊗ Im )A.
∂vec X
(5.13)

We could investigate this result further by expanding the cross-products to


obtain
∂vec A ′ (X ⊗ X )B ′ ′
= (In ⊗ x 1 )B ⊗ A(1) + · · · + (In ⊗ x m )B ⊗ A(m)
∂vec X
+ B1 ⊗ (x1′ ⊗ Im )A + · · · + Bn ⊗ (xn′ ⊗ Im )A,

where we have partitioned A and B as


⎛ ⎞ ⎛ ⎞
A1 B1
⎜ .. ⎟ ⎜ .. ⎟
A=⎝ . ⎠ B=⎝ . ⎠
Am Bn
with each submatrix A j being m× p and each submatrix B j being n×q.
With this basic result, we can easily obtain several others using the chain
rule.
Suppose X is n×n and nonsingular, and A and B are n2 × p and n2 ×q
matrices of constants. Then, using the backward chain rule, we have
∂vec A ′ (X −1 ⊗ X −1 )B ∂vec X −1 ∂vec A ′ (X −1 ⊗ X −1 )B
= .
∂vec X ∂vec X ∂vec X −1
By Equation 4.17 of Chapter 4,
∂vec X −1 ′
= −X −1 ⊗ X −1 ,
∂vec X
so, applying Equation 5.13, we have
∂vec A ′ (X −1 ⊗ X −1 )B ′
= −(X −1 ⊗ X −1 )[Knn (In ⊗ X −1 )Bτnnn Knn A
∂vec X

+ Bτnnn (X −1 ⊗ In )A].

Using Theorem 1.5 of Chapter 1 and Equation 2.11 of Chapter 2, we have


∂vecA ′ (X −1 ⊗ X −1)B ′
= −(In ⊗ X −1)Knn (In ⊗ X −1)Bτnnn (In ⊗ X −1 )Knn A
∂vec X
′ ′
−(In ⊗ X −1 )Bτnnn (I ⊗ X −1 )(X −1 ⊗ In )A

= −(X −1 ⊗ X −1 )Knn Bτnnn (In ⊗ X −1 )Knn A
′ ′
−(In ⊗ X −1 )Bτnnn (X −1 ⊗ X −1 )A.

Several results can be achieved in a similar manner. In what follows, X is an


m×n matrix and the orders of the matrices of constants, A and B, can be
inferred from the example in hand.

For the derivative of A ′ (X ′ ⊗ X ′ )B, we write


∂vec A ′ (X ′ ⊗ X ′ )B ∂vec X ′ ∂vec A ′ (X ′ ⊗ X ′ )B
= . (5.14)
∂vec X ∂vec X ∂vec X ′
Recall that Kmn vec X = vec X ′ , so
∂vec X ′
= (Kmn ) ′ = Knm .
∂vec X
By substituting in Equation 5.14 and appealing to Equation 5.13, we have
∂vecA ′ (X ′ ⊗ X ′ )B
= Knm [Knm (Im ⊗ X ′ )B τnmn Knn A + B τmmn (X ⊗ In )A].
∂vec X
If we wanted to break this derivative down further, we would appeal to
Theorems 2.12 and 2.14 of Chapter 2 and Equation 1.10 of Chapter 1, which
allow us to write
′ (1)
⎛ ⎞ ⎛ ⎞
K nm (Im ⊗ X )Bτnm1 A 1 Bτ mm1 X A
∂vec A ′ (X ′ ⊗ X ′ )B ⎜ .. ..
=⎝ ⎠+⎝ ⎠,
⎟ ⎜ ⎟
∂vec X . .
′ (n)
Knm (Im ⊗ X )Bτnm1 An Bτmm1 X A
where A( j ) s refer to the partitioning A = (A1′ . . . An′ ) ′ and each A j is n× p,
so
∂vec A ′ (X ′ ⊗ X ′ )B
⎛∂vec X ′
(Im ⊗ x1 )B ⊗ (A1 )1· + · · · +(Im ⊗ xn′ )B ⊗ (A1 )n·

=⎝
⎜ .. .. ⎟
. . ⎠
(Im ⊗ x1′ )B ⊗ (An )1· + · · · +(Im ⊗ xn′ )B ⊗ (An )n·
′ ′
B1 ⊗ x 1 A(1) + · · · +Bm ⊗ x m A(1)
⎛ ⎞
.. ..
+⎝ ⎠,
⎜ ⎟
. .
′ ′
B1 ⊗ x 1 A(n) + · · · +Bm ⊗ x m A(n)
where B = (B1′ . . . Bm′ ) ′ and each submatrix B j is m×q.
Consider
A ′ (X ′ ⊗ X )B = A ′ (X ′ ⊗ Im )(Im ⊗ X )B
so, applying the product rule yields
∂vec A ′ (X ′ ⊗ X )B ∂vec A ′ (X ′ ⊗ Im )
= [(Im ⊗ X )B ⊗ Ip ]
∂vec X ∂vec X
∂vec(Im ⊗ X )B
+ [Iq ⊗ (X ⊗ Im )A]. (5.15)
∂vec X

The backward chain rule yields


∂vecA ′ (X ′ ⊗ Im ) ∂vec X ′ ∂vec A ′ (X ′ ⊗ Im )
= = Knm (Kmm τmmm Kmn A)
∂vec X ∂vec X ∂vec X ′
(5.16)
where we have used Equation 5.11. Substituting Equation 5.16 in Equation
5.15 and using Equation 5.10 and Theorem 1.5 of Chapter 1, we obtain
∂vecA ′ (X ′ ⊗ X )B
= Knm [Kmm (Im ⊗ X )Bτmmn Kmn A] + Bτmnm (X ⊗ Im )A.
∂vec X
We can expand this result further by appealing to Theorem 2.14 of Chap-
ter 2 and Theorem 1.6 of Chapter 1 to obtain
⎛ ⎞ ⎛ (1) ⎞
K mm (Im ⊗ X )B τmm1 A 1 B τ m1m (X ⊗ Im )A
∂vec A ′ (X ′ ⊗ X )B ⎜ .. ..
=⎝ ⎠+⎝
⎟ ⎜ ⎟
∂vec X . . ⎠
Kmm (Im ⊗ X )B τmm1 An B (n) τm1m (X ⊗ Im )A
where A = (A1′ . . . An′ ) ′ with each submatrix A j being m× p and the B ( j ) s
refer to the partitioning B = (B1′ . . . Bm′ ) ′ where each submatrix is n×q, so
∂vec A ′ (X ′ ⊗ X )B
∂vec X
′ ′
(Im ⊗ x 1 )B ⊗ (A1 )1· + · · · + (Im ⊗ x m )B ⊗ (A1 )m·
⎛ ⎞

=⎝
⎜ .. .. ⎟
. . ⎠
′ ′
(Im ⊗ x 1 )B ⊗ (An )1· + · · · + (Im ⊗ x m )B ⊗ (An )m·
′ ′
(B1 )1· ⊗ (x 1 ⊗ Im )A + · · · + (Bm )1· ⊗ (x m ⊗ Im )A
⎛ ⎞

+⎝ .. ..
⎠.
⎜ ⎟
. .
1′ m′
(B1 )n· ⊗ (x ⊗ Im )A + · · · + (Bm )n· ⊗ (x ⊗ Im )A
Consider
A ′ (X ⊗ X ′ )B = A ′ (X ⊗ In )(In ⊗ X ′ )B
so, again applying the product rule gives
∂vec A ′ (X ⊗ X ′ )B ∂vec A ′ (X ⊗ In )
= [(In ⊗ X ′ )B ⊗ Ip ]
∂vec X ∂vec X
∂vec(In ⊗ X ′ )B
+ [Iq ⊗ (X ′ ⊗ In )A]. (5.17)
∂vec X
Now,
∂vec(In ⊗ X ′ )B ∂vec X ′ ∂vec(In ⊗ X ′ )B
= = Knm (B τnmn In2 ) (5.18)
∂vec X ∂vec X ∂vec X ′

where we have used Equation 5.10. Substituting Equation 5.18 in Equation


5.17 and using Equation 5.11, we have

∂vec A ′ (X ⊗ X ′ )B
= Knn (In ⊗ X ′ )B τnnm Knm A + Knm [B τnmn (X ′ ⊗ In )A].
∂vec X
Expanding this result requires a little work.
Using Theorem 2.13 of Chapter 2,

X ′ B1 τn1m Knm A
⎛ ⎞

Knn (In ⊗ X ′ )B τnnm Knm A = ⎝ ..


⎠, (5.19)
⎜ ⎟
.

X Bn τn1m Knm A

where B = (B1′ . . . Bn′ ) ′ and each submatrix is m×q. From Equation 1.10 of
Chapter 1,

((X ′ ⊗ In )A)( j ) = X ′ A( j )

where A( j ) refers to the partitioning A = (A1′ . . . Am′ ) with each submatrix


A j being n× p. Using Theorem 2.12 of Chapter 2, then

B τnm1 X ′ A(1)
⎛ ⎞

Knm [B τnmn (X ′ ⊗ In )A] = ⎝ .. (5.20)


⎠.
⎜ ⎟
.
′ (n)
B τnm1 X A

Joining Equations 5.19 and 5.20 together, we have

X ′ B1 τn1m Knm A B τnm1 X ′ A(1)


⎛ ⎞ ⎛ ⎞
′ ′
∂vec A (X ⊗ X )B ⎜ .. ..
=⎝ ⎠+⎝
⎟ ⎜ ⎟
∂vec X . . ⎠
X ′ Bn τn1m Knm A B τnm1 X ′ A(n)
⎛ ′
x1 B1 ⊗ A(1) + · · · +xn′ B1 ⊗ A(n)

=⎝
⎜ .. .. ⎟
. . ⎠
x1′ Bn ⊗ A(1) + · · · +xn′ Bn ⊗ A(n)
B1 ⊗ x1′ A(1) + · · · +Bn ⊗ xn′ A(1)
⎛ ⎞
.. ..
+⎝ ⎠,
⎜ ⎟
. .
B1 ⊗ x1′ A(n) + · · · +Bn ⊗ xn′ A(n)

using Theorem 2.12 again.



5.5 Matrix Derivatives of Generalized Vecs and Rvecs

5.5.1 Introduction
When we take the rvecm of an mG× p matrix A, we get an m× pG matrix.
Whereas if we take the vecm of q×mG matrix B, we get an Gq×m matrix and
just like any other matrices we can envisage taking the matrix derivatives of
these generalized rvecs and vecs. If Y is such a matrix, that is a generalized vec
or generalized rvec and the elements of Y are differentiable functions of the
elements of X, then as in the previous section we work with ∂vecY/∂vec X.
For convenience, we divide this section into two parts. The first part deals
with ‘large X’, where X is mG× p or p×mG. The second part looks at
generalized rvecs and vecs involving a ‘small X’ where X is, say, p×q. As in
the previous section, we call on the results derived in Chapters 1 and 2 on
generalized vecs, rvecs, and cross-products together with results involving
the rvec of the commutation matrix.

5.5.2 Large X

Results for Generalized rvecs


Suppose X is an mG×p matrix and we partition X as follows:
\[
X=\begin{pmatrix}X_1\\ \vdots\\ X_G\end{pmatrix},
\]
where each submatrix is m×p. It follows that rvec_m X is the m×pG matrix given by
\[
\operatorname{rvec}_m X=(X_1\ \ldots\ X_G),
\]
so
\[
\operatorname{vec}(\operatorname{rvec}_m X)=\begin{pmatrix}\operatorname{vec}X_1\\ \vdots\\ \operatorname{vec}X_G\end{pmatrix}.
\]
From our work on selection matrices in Section 2.2 of Chapter 2, we know that
\[
X_j=\bigl(e_j^{G\,\prime}\otimes I_m\bigr)X=S_jX,
\]

say, for j = 1, . . . G, so
vec X j = (Ip ⊗ S j )vec X
and
⎛ ⎞
I p ⊗ S1
vec(rvecm X ) = ⎝ ..
⎠ vec X.
⎜ ⎟
.
I p ⊗ SG
Using Theorem 5.1, we obtain
∂vec(rvecm X ) 
= Ip ⊗ S1′ . . . Ip ⊗ SG′ = Ip ⊗ e1G ⊗ Im . . . Ip ⊗ eGG ⊗ Im
  
∂vec X
= Ip ⊗ e1G . . . Ip ⊗ eGG ⊗ Im = KpG ⊗ Im .
 
(5.21)
This result is the basic building block from which several other matrix
derivative results of generalized rvecs can be derived.
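Equation 5.21 can be confirmed numerically in the same spirit as before. The sketch below (my own, with assumed helper names, not part of the text) compares a forward-difference Concept 4 derivative of vec(rvec_m X) with K_pG ⊗ I_m.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, 1, order='F')

def commutation(m, n):
    # K_{mn} vec(A) = vec(A') for A of shape (m, n)
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            K[i * n + j, j * m + i] = 1.0
    return K

def rvec_m(M, m):
    """Generalized rvec: cut M into blocks of m rows and lay them side by side."""
    return np.hstack([M[k*m:(k+1)*m, :] for k in range(M.shape[0] // m)])

def concept4(F, X, eps=1e-6):
    """d vec F(X) / d vec X: rows indexed by vec X, columns by vec F(X)."""
    V0 = vec(F(X))
    out = np.zeros((X.size, V0.size))
    for l in range(X.size):
        dX = np.zeros(X.size); dX[l] = eps
        out[l, :] = ((vec(F(X + dX.reshape(X.shape, order='F'))) - V0) / eps).ravel()
    return out

m, G, p = 2, 3, 2
X = np.random.default_rng(11).standard_normal((m * G, p))
num = concept4(lambda X: rvec_m(X, m), X)
print(np.allclose(num, np.kron(commutation(p, G), np.eye(m)), atol=1e-4))   # Equation 5.21
```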
If X is now p×mG, then using the backward chain rule
∂vec(rvecm X ′ ) ∂vec X ′ ∂vec(rvecm X ′ )
= .
∂vec X ∂vec X ∂vec X ′
∂vec X ′ ′
But = Kp,mG = KmG,p so
∂vec X
∂vec(rvecm X ′ )
= KmG,p (KpG ⊗ Im ),
∂vec X
from Equation 5.21. But from Equation 2.9 of Chapter 2,
KmG,p (KpG ⊗ Im ) = (IG ⊗ Kmp )(KG p ⊗ Im )(KpG ⊗ Im ) = IG ⊗ Kmp ,
which gives our second result, namely
∂vec(rvecm X ′ )
= IG ⊗ Kmp . (5.22)
∂vec X
In a similar fashion, if X is mG×mG and nonsingular, then by the backward
chain rule
∂vec(rvecm X −1 ) ∂vec X −1 ∂vec(rvecm X −1 )
= .
∂vec X ∂vec X ∂vec X −1
∂vec X −1 ′
But = −(X −1 ⊗ X −1 ), by Equation 4.17 of Chapter 4 so using
∂vec X
Equation 5.21, we have
∂vec(rvecm X −1 ) ′
= −(X −1 ⊗ X −1 )(KmG,G ⊗ Im ).
∂vec X

If we want to breakdown this result further, we would partition X −1 as


follows
⎛ 1⎞
X
−1 ⎜ .. ⎟
X =⎝ . ⎠
XG
where each submatrix is m×mG, so
′ ′ ′
X −1 = (X 1 . . . X G )
and we can call on Theorem 2.7 of Chapter 2 to write
∂vec(rvecm X −1 ) ′ ′
= −(X −1 ⊗ X 1 . . . X −1 ⊗ X G ).
∂vec X
Matrices of constants can now be introduced. Let A be such a matrix. If
X is mG× p and A is p×q, then by Equation 1.19 of Chapter 1
rvecm X A = (rvecm X )(IG ⊗ A)
so
vec(rvecm X A) = (IG ⊗ A ′ ⊗ Im )vec(rvecm X )
and by Theorem 5.1 and Equation 5.21
∂vec(rvecm X A)
= (KpG ⊗ Im )(IG ⊗ A ⊗ Im ) = KpG (IG ⊗ A) ⊗ Im .
∂vec X
We can expand this result further by calling on Theorem 2.3 of Chapter 2
to write

IG ⊗ a1 ⊗ Im
⎛ ⎞
∂vec(rvecm X A) ⎜ ..
=⎝ ⎠.

∂vec X .
p′
IG ⊗ a ⊗ Im
In a similar manner, if X is p×mG and A is an p×q matrix of constants
∂vec(rvecm X ′ A)
= (IG ⊗ Kmp )(IG ⊗ A ⊗ Im ) = IG ⊗ Kmp (A ⊗ Im )
∂vec X
′ ⎞
A ⊗ e1m

= IG ⊗ ⎝ ..
⎠,
⎜ ⎟
.

m
A ⊗ em
by Theorem 2.3 of Chapter 2.

By a similar analysis if X is a nonsingular mG×mG matrix and A is a


mG×q matrix of constants,

∂vec(rvecm X −1 A)
= −(X −1 ⊗ X −1 )(KmG,G ⊗ Im )(IG ⊗ A ⊗ Im )
∂vec X
′ ′
= −(X −1 ⊗ X 1 . . . X −1 ⊗ X G )(IG ⊗ A ⊗ Im )
⎛ ⎞
A ⊗ Im O
−1 1′ −1
= −(X ⊗ X . . . X ⊗ X ) ⎝ G′ ⎜ .. ⎟
. ⎠
O A ⊗ Im
′ ′
= −(X −1 A ⊗ X 1 . . . X −1 A ⊗ X G )

Results for Generalized vecs


Suppose now X is an p×mG matrix and we partition X as X = (X1 . . . XG ),
where each submatrix is p×m. It follows that vecm X is the pG×m matrix
given by
⎛ ⎞
X1
⎜ .. ⎟
vecm X = ⎝ . ⎠ .
XG

From Theorem 2.40 of Chapter 2,

vec(vecm X ) = vec X KGm = (KmG ⊗ Ip )vec X

so using Theorem 5.1


∂vec(vecm X )
= (KmG ⊗ Ip ) ′ = KGm ⊗ Ip . (5.23)
∂vec X
If X is an mG× p matrix, then by the backward chain rule
∂vec(vecm X ′ ) ∂vec X ′ ∂vecm X ′
= .
∂vec X ∂vec X ∂vec X ′
But KmG,p vec X = vec X ′ so

∂vec X ′ ′
= KmG,p = Kp ,mG
∂vec X
and
∂vec(vecm X ′ )
= Kp,mG (KGm ⊗ Ip ). (5.24)
∂vec X

If X is an mG×mG nonsingular matrix, then


∂vec(vecm X −1 ) ′ ′
= −(X −1 ⊗ X −1 )(KGm ⊗ ImG ) = −X −1 KGm ⊗ X −1
∂vec X
 −1 ′
⊗ X −1 . . . X(m)
−1
⊗ X −1

= − X(1) (5.25)

by Equation 2.66 of Section 2.7.7 of Chapter 2.


As with rvecs, we are now in a position to introduce a matrix of constants
A. If X is an p×mG matrix and A is an q× p matrix of constants, then by
Theorem 1.12 of Chapter 1

vecm AX = (IG ⊗ A)vecm X

so,

vec(vecm AX ) = (Im ⊗ IG ⊗ A)vec(vecm X )

and by Theorem 5.1 and Equation 5.23


∂vec(vecm AX ) ∂vec(vecm X )
= (Im ⊗ IG ⊗ A ′ )
∂vec X ∂vec X
= (KGm ⊗ Ip )(ImG ⊗ A ′ ) = KGm ⊗ A ′ .

If X is an mG× p matrix and A is an q× p matrix of constants, then in a


similar manner
∂vec(vecm AX ′ ) ∂vec(vecm X ′ )
= (Im ⊗ IG ⊗ A ′ )
∂vec X ∂vec X
= Kp,mG (KGm ⊗ Ip )(ImG ⊗ A ′ )
KGm ⊗ a1′
⎛ ⎞
..
= Kp,mG (KGm ⊗ A ′ ) = ⎝
⎜ ⎟
. ⎠
KGm ⊗ a p′

where in our analysis we have used Equation 5.24 and Theorem 2.3 of
Chapter 2.
Finally, if X is an mG×mG nonsingular matrix and A is an q×mG matrix
of constants, then
∂vec(vecm AX −1 )
∂vec X
∂vec(vecm X −1 ) ′
= (ImG ⊗ A ′ ) = −(X −1 KGm ⊗ X −1 )(ImG ⊗ A ′ )
∂vec X
′ ′ ′
= −(X −1 KGm ⊗ X −1 A ′ ) = X(1)
 −1
⊗ X −1 A ′ . . . X(m)
−1
⊗ X −1 A ′


where in our working we have made use of Equation 5.25 and Equation
2.66 of Chapter 2.

5.5.3 Small X

Results for Generalized rvecs


The matrix X may be part of a matrix product and it may also be the case
that we are considering a generalized rvec of this product. The question is:
what is the matrix derivative of such a matrix?
Suppose then that A and B are mG× p and q×r matrices of constants,
respectively, and that X is an p×q matrix so it makes sense to take the rvecm
of AX B, which from Equation 1.19 of Chapter 1 is given by

rvecm AX B = (rvecm A)(IG ⊗ X B) = (rvecm A)(IG ⊗ X )(IG ⊗ B),

so

vec(rvecm AX B) = (IG ⊗ B ′ ⊗ rvecm A)vec(IG ⊗ X )

and by Theorem 5.1


∂vec(rvecm AX B) ∂vec(IG ⊗ X )
= (IG ⊗ B ⊗ (rvecm A) ′ ).
∂vec X ∂vec X
Recall from Equation 5.3 that
∂vec(IG ⊗ X )
= (rvecq KGq ⊗ Ip )
∂vec X
and from our work in Section 1.4 of Chapter 1 that

(rvecm A) ′ = vecm A ′

so
∂vec(rvecm AX B)
∂vec X

= (rvecq KGq ⊗ Ip )(IG ⊗ B ⊗ vecm A ′ )


B ⊗ vecm A ′
⎛ ⎞
O
 ′ ′
= Iq ⊗ e1G ⊗ Ip . . . Iq ⊗ eGG ⊗ Ip ⎝
 ⎜ .. ⎟
. ⎠
O B ⊗ vecm A ′
 ′  ′
= B ⊗ e1G ⊗ Ip vecm A ′ . . . B ⊗ eGG ⊗ Ip vecm A ′ .
 

Now, if we partition A as follows


⎛ ⎞
A1
A = ⎝ ... ⎠
⎜ ⎟

AG
where each submatrix is m× p, then
A ′ = (A1′ . . . AG′ )
and
A1′
⎛ ⎞

vecm A ′ = ⎝ ... ⎠ .
⎜ ⎟

AG′
From our work of selection matrices in Section 2.2 of Chapter 2, we know
that
 G′
e j ⊗ Ip vecm A ′ = A ′j


which gives our matrix derivative result:


∂vec(rvecm AX B)
= B ⊗ A1′ . . . B ⊗ AG′ .
∂vec X
Using this result as our building block, we can derive others. If X is now
q× p, then by the backward chain rule
∂vec(rvecm AX ′ B) ∂vec X ′ ∂ (rvecm AX ′ B)
= = Kpq (B ⊗ A1′ . . . B ⊗ AG′ )
∂vec X ∂vec X ∂vec X ′
= (A1′ ⊗ B)Kmr . . . (AG′ ⊗ B)Kmr = (A ′ ⊗ B)(IG ⊗ Kmr ).
Finally, if X is p× p and nonsingular and B is p×r, then
∂vec(rvecm AX −1 B) ∂vec X −1 ∂vec(rvecm AX −1 B)
=
∂vec X ∂vec X ∂vec X −1
−1 ′
= −(X ⊗ X )(B ⊗ A1′ . . . B ⊗ AG′ )
−1
′ ′
= −(X −1 B ⊗ X −1 A1′ . . . X −1 B ⊗ X −1 AG′ ).

Result for Generalized vecs


As with rvecs, we now want to take a generalized vec of a suitable product
matrix that involves X. We then want to derive the matrix derivative of such
a matrix.
Suppose that A and B are s× p and q×Gm matrices of constants, respec-
tively, and that X is an p×q matrices. The product matrix AX B will then be

an s×Gm matrix, so it makes sense that the vecm of this product, which by
Theorem 1.12 of Chapter 1 is given by
vecm AX B = (IG ⊗ AX )vecm B = (IG ⊗ A)(IG ⊗ X )vecm B.
Taking the vec of this matrix renders
vec(vecm AX B) = [(vecm B) ′ ⊗ (IG ⊗ A)]vec(IG ⊗ X ),
so by Theorem 5.1
∂vec(vecm AX B) ∂vec(IG ⊗ X )
= (vecm B ⊗ IG ⊗ A ′ ).
∂vec X ∂vec X
Applying Equation 5.3 allows us to write
∂vec(vecm AX B)
= (rvecm KGq ⊗ Ip )(vecm B ⊗ IG ⊗ A ′ ).
∂vec X
Applying Theorem 2.28 of Chapter 2, we obtain
∂vec(vecm AX B)
= vecm B τGqp (IG ⊗ A ′ ). (5.26)
∂vec X
If we expand this derivative further by partitioning B as B = (B1 . . . BG ),
where each submatrix is q×m, so writing out the cross-product of Equation
5.26 gives
∂vec(vecm AX B) ′ ′
= B1 ⊗ e1G ⊗ A ′ + · · · + BG ⊗ eGG ⊗ A ′ .
∂vec X
Suppose now X is q× p while A and B remain the same. Then, by the
backward chain rule
∂vec(vecm AX ′ B) ∂vec X ′ ∂vec(vecm AX ′ B)
= = Kpq (vecm BτGqp (IG ⊗ A ′ ))
∂vec X ∂vec X ∂vec X ′
by Equation 5.26. But by Equation 1.9 of Chapter 1, (IG ⊗ A ′ )( j ) = IG ⊗ a ′j
where a j is the jth column of A, so using Theorem 2.12 of Chapter 2 we
can write
vecm B τGq1 (IG ⊗ a1′ )
⎛ ⎞

∂vec(vecm AX B) ⎜ ..
=⎝ ⎠.

∂vec X .

vecm B τGq1 (IG ⊗ a p )
To elaborate further, we can expand the cross-products to obtain
′ ′
B1 ⊗ e1G ⊗ a1′ + · · · + BG ⊗ eGG ⊗ a1′
⎛ ⎞

∂vec(vecm AX B) ⎜ ..
=⎝ ⎠.

∂vec X .
G′ ′ G′ ′
B1 ⊗ e1 ⊗ a p + · · · + BG ⊗ eG ⊗ a p

Finally, if X is p× p and nonsingular and B is p×Gm, then


∂vec(vecm AX −1 B) ∂vec X −1 ∂vec(vecm AX −1 B)
=
∂vec X ∂vec X ∂vec X −1
−1 ′
= −(X ⊗ X )vecm B τG pp (IG ⊗ A ′ )
−1

= −(IG ⊗ X −1 )vecm B τG pp (IG ⊗ (AX −1 ) ′ ),


using Equation 4.17 of Chapter 4, Equation 5.26, and Theorem 1.5 of
Chapter 1 consecutively. Expanding this cross-product gives
∂vec(vecm AX −1 B) ′ ′
= − X −1 B1 ⊗ e1G ⊗ (AX −1

∂vec X −1

+ · · · + X −1 BG ⊗ eGG ⊗ (AX −1 ) ′ ).

5.6 Matrix Derivatives of Cross-Products

5.6.1 Basic Cross-Products


Cross-products, as we know, involve the sums of Kronecker products, so it
follows that we can use the results obtained for the derivatives of vecs of
Kronecker products in Section 5.3 to develop matrix derivatives of cross-
products. This work will rely heavily on the results concerning selection
matrices presented in Section 2.2 and the results about generalized vecs and
rvecs of the commutation matrix presented in Section 2.5.
To get started, let X be an mG× p matrix and A be an nG×q matrix of
constants and partition these matrices as follows:
⎛ ⎞ ⎛ ⎞
X1 A1
⎜ .. ⎟ ⎜ .. ⎟
X = ⎝ . ⎠, A = ⎝ . ⎠,
XG AG
where in these partitions each submatrix Xi is m× p and each submatrix A j
is n×q for i = 1, . . . , G and j = 1, . . . , G. Then, we know that
X τGm n A = X1 ⊗ A1 + · · · + XG ⊗ AG
so
vec(X τGmn A) = vec(X1 ⊗ A1 ) + · · · + vec(XG ⊗ AG ),
and
∂vec(X τGm n A) ∂vec(X1 ⊗ A1 ) ∂vec(XG ⊗ AG )
= + ··· + .
∂vec X ∂vec X ∂vec X
5.6 Matrix Derivatives of Cross-Products 187

Consider ∂vec(X1 ⊗ A1 )/∂vec X , which using the backward chain rule,


Theorem 5.2, we can write as
∂vec(X1 ⊗ A1 ) ∂vec X1 ∂vec(X1 ⊗ A1 )
= . (5.27)
∂vec X1 ∂vec X ∂vec X1
Using Equation 5.5 of Section 5.2, we can write
∂vec(X1 ⊗ A1 )
= Ip ⊗ (rvecm Kqm )(Iqm ⊗ A1′ ). (5.28)
∂vec X1
From our work on selection matrices in Section 2.2, we know that
X1 = S1 X

where S1 = eG1 ⊗ Im so vec X1 = (Ip ⊗ S1 )vec X and
∂vec X1
= Ip ⊗ S1′ . (5.29)
∂vec X
Substituting Equations 5.29 and 5.28 in Equation 5.27, we have
∂vec(X1 ⊗ A1 )
= Ip ⊗ (S1′ rvecm Kqm )(Iqm ⊗ A1′ ).
∂vec X
Now,
S1′ rvecm Kqm = eG1 ⊗ Im rvecm Kqm = eG1 ⊗ rvecm Kqm
 

so
∂vec(X1 ⊗ A1 )
= Ip ⊗ eG1 ⊗ rvecm Kqm Iqm ⊗ A1′
  
∂vec X
= Ip ⊗ eG1 ⊗ (rvecm Kqm ) Iqm ⊗ A1′
 
 ⎞
(rvecm Kqm ) Iqm ⊗ A1′

⎜ O ⎟
= Ip ⊗ ⎜ ⎟.
⎜ ⎟
..
⎝ . ⎠
O
It follows that
 ⎞
(rvecm Kqm ) Iqm ⊗ A1′

∂vec X τGmn A ..
= Ip ⊗ ⎝ ⎠.
⎜ ⎟
∂vec X . 

(rvecm Kqm ) Iqm ⊗ AG
But using Equation 1.19 of Chapter 1, we can write
(rvecm Kqm ) Iqm ⊗ A1′ = rvecm Kqm Im ⊗ A1′
    
188 New Matrix Calculus Results

so
rvecm [Kqm (Im ⊗ A1′ )]
⎛ ⎞
∂vec X τGmn A ..
= Ip ⊗ ⎝ ⎠. (5.30)
⎜ ⎟
∂vec X .

rvecm [Kqm (Im ⊗ AG )]
If we wanted to write this result more succinctly note that
Iqm ⊗ A1′
⎛ ⎞
∂vec X τGmn A ..
= Ip ⊗ (IG ⊗ rvecm Kqm ) ⎝
⎜ ⎟
∂vec X . ⎠
Iqm ⊗ AG′
and from Theorem 2.5 of Chapter 2
Iqm ⊗ A1′
⎛ ⎞
.. ′
⎠ = (KG,qm ⊗ Iq )(Iqm ⊗ vecm A ),
⎜ ⎟
⎝ .
Iqm ⊗ AG′
allowing us to write
∂vec X τGmn A
= Ip ⊗ (IG ⊗ rvecm Kqm )(KG,qm ⊗ Iq )(Iqm ⊗ vecn A ′ ).
∂vec X
But by Theorem 2.22 of Chapter 2,
(IG ⊗ rvecm Kqm )(KG,qm ⊗ Iq ) = KGm rvecmG Kq,mG
so, more succinctly
∂vec X τGmn A
= Ip ⊗ KGm rvecmG Kq,mG (Iqm ⊗ vecm A ′ ). (5.31)
∂vec X
If, however, we wanted to break this result down further or write it
another way, we could return to Equation 5.30 and appeal to Equation 2.11
of Chapter 2, which then allows us to write
Im ⊗ (A1′ )1· · · · Im ⊗ (A1′ )q·
⎛ ⎞
∂vec X τGmn A .. ..
= Ip ⊗ ⎝
⎜ ⎟
∂vec X . . ⎠
′ ′
Im ⊗ (AG )1· · · · Im ⊗ (AG )q·
rvec A1′ ⊗ Im
⎛ ⎞

= Ip ⊗ ⎝ ..
⎠ (Iq ⊗ Knm ). (5.32)
⎜ ⎟
.
rvec AG′ ⊗ Im
Consider now
vec A τGnm X = vec(A1 ⊗ X1 ) + · · · + vec(AG ⊗ XG ).
5.6 Matrix Derivatives of Cross-Products 189

We could proceed as we did for vec X τGmn A and compute

∂vec(A1 ⊗ X1 ) ∂vec X1 ∂vec(A1 ⊗ X1 )


=
∂vec X ∂vec X ∂vec X1

and use Equation 5.8 to write

∂vec(A1 ⊗ X1 ) 
= rvecm Kqp Iqp ⊗ A1′ ⊗ Im .
 
∂vec X1

Alternatively, we could start by using the properties of cross-products as


presented in Section 2.42 of Chapter 2. We saw in this section that

A τGnm X = Knm (X τGmn A)Kpq

so

vec A τGnm X = (Kqp ⊗ Knm )vec X τGmn A

and
∂vec A τGnm X ∂vec X τGmn A
= (Kpq ⊗ Kmn ).
∂vec X ∂vec X
Using Equation 5.30, we can write

∂vec(A τGnm X )
= (Ip ⊗ C )(Kpq ⊗ Kmn )
∂vec X
where

rvecm [Kqm (Im ⊗ A1′ )]


⎛ ⎞

C=⎝ ..
⎠.
⎜ ⎟
.
rvecm [Kqm (Im ⊗ AG′ )]

But from the definition of the commutation matrix given by Equation


2.8 of Chapter 2,
q
Kpq = Ip ⊗ e1 . . . Ip ⊗ eqq

so we write
∂vec(A τGnm X )  q
= Ip ⊗ C e1 ⊗ Kmn . . . Ip ⊗ C eqq ⊗ Kmn .
  
∂vec X
190 New Matrix Calculus Results

q
Consider the first block of the matrix C(e1 ⊗ Kmn ):
 q
rvecm Kqm Im ⊗ A1′
   
e1 ⊗ Kmn
 q
= (rvecm Kqm ) Iq ⊗ Im ⊗ A1′ e1 ⊗ Kmn
 
 q 
= (rvecm Kqm ) e1 ⊗ Im ⊗ A1′ Kmn
 
⎛ 
Im ⊗ A1′ Kmn

q′

′ ⎜ O ⎟
= Im ⊗ e1 . . . Im ⊗ eqq ⎜
 ⎟
.. ⎟
⎝ . ⎠
O
q′ ′ 
= Im ⊗ e1 A1 Kmn = Im ⊗ (A1 )1 . Kmn = A1′ 1 . ⊗ Im .

    

It follows that the matrix in question can be written as


⎛ ′ ⎞
(A1 )1· ⊗ Im
 q  ⎜
C e1 ⊗ Kmn = ⎝ .. ′ (1)
⎠ = (vecn A ) ⊗ Im ,

.
(AG′ )1· ⊗ Im
and that
∂vec(A τGnm X )
= Ip ⊗ (vecn A ′ )(1) ⊗ Im . . . Ip ⊗ (vecn A ′ )(q) ⊗ Im
∂vec X
= (Ip ⊗ (vecn A ′ )(1) . . . Ip ⊗ (vecn A ′ )(q) ) ⊗ Im . (5.33)
Appealing to Theorem 2.26 allows us to write this more succinctly as
∂vec(A τGnm X )
= (rvec pG Kq,pG )(Ipq ⊗ vecn A ′ ) ⊗ Im .
∂vec X

5.6.2 Cross-Products Involving X ′


Having obtained ways of writing the derivatives of basic cross-products, we
can now expand our analysis for cross-products that involve X ′ .
Let X now be an p×mG matrix. Then,
∂vec X ′ τGm n A ∂vec X ′ ∂vec X ′ τGm n A
= .
∂vec X ∂vec X ∂vec X ′
Now,
Kp,mG vec X = vec X ′
so
∂vec X ′ ′
= Kp,mG = KmG,p ,
∂vec X
5.6 Matrix Derivatives of Cross-Products 191

and using Equation 5.30, we have


   ⎞⎞
rvecm Kqm Im ⊗ A1′
⎛ ⎛

∂vec X τGm n A ..
= KmG,p ⎝Ip ⊗ ⎝ ⎠⎠ .
⎜ ⎜ ⎟⎟
∂vec X  . ′

rvecm Kqm Im ⊗ AG

But from Equation 2.9 of Chapter 2,

KmG,p = (IG ⊗ Kmp )(KG p ⊗ Im )

and by Theorem 2.5 of Chapter 2


   ⎞⎞
rvecm Kqm Im ⊗ A1′
⎛ ⎛

(KG p ⊗ Im ) ⎝Ip ⊗ ⎝
⎜ ⎜ .. ⎟⎟
.
⎠⎠
 
rvecm Kqm Im ⊗ AG′
   ⎞
Ip ⊗ rvecm Kqm Im ⊗ A1′

=⎝
⎜ .. ⎟
 .  


Ip ⊗ rvecm Kqm Im ⊗ AG
so
   ⎞
Ip ⊗ rvecm Kqm Im ⊗ A1′

∂vec X ′ τGmn A ..
= (IG ⊗ Kmp ) ⎝ ⎠.
⎜ ⎟
∂vec X  .  ′

Ip ⊗ rvecm Kqm Im ⊗ AG

Theorem 2.25 of Chapter 2 allows us to write this result another way and
break it down further. Applying this theorem we have,

rvecm Kqm Im ⊗ A1′ = rvec A1′ ⊗ Im (Iq ⊗ Knm )


    

= Im ⊗ A1′ 1· . . . Im ⊗ (A1′ )q·


 

so, another way of writing our result is

Ip ⊗ rvec A1′ ⊗ Im
⎛ ⎞
∂vec X ′ τGmn A ..
= (IG ⊗ Kmp ) ⎝ ⎠ (Ipq ⊗ Knm )
⎜ ⎟
∂vec X .

Ip ⊗ rvec AG ⊗ Im
Ip ⊗ (Im ⊗ (A1′ )1· . . . Im ⊗ (A1′ )q· )
⎛ ⎞

= (IG ⊗ Kmp ) ⎝ ..
⎠.
⎜ ⎟
.
′ ′
Ip ⊗ (Im ⊗ (AG )1· . . . Im ⊗ (AG )q· )
(5.34)
192 New Matrix Calculus Results

Finally, appealing to Theorem 2.3 of Chapter 2, we can write


 ′   ′
Ip ⊗ e1m ⊗ A1′ 1· . . . e1m ⊗ A1′ q·
⎛   ⎞
⎜ .. ⎟

⎜ . ⎟

⎜ I ⊗ em ′ ⊗ A ′ . . . em ′ ⊗ A ′
      ⎟
p m 1 1· m 1 q· ⎟


∂vec X τGm n A ⎜ .. ⎟
=⎜ .

∂vec X ⎜  m′  ′  ′

⎜ Ip ⊗ e1 ⊗ AG 1· . . . e1m ⊗ AG′ q· ⎟
⎜    ⎟
..
⎜ ⎟
⎜ ⎟
 m′  ′  .
⎝ ⎠
m′
  
Ip ⊗ em ⊗ AG 1· . . . em ⊗ AG′ q·

Appealing to the properties of cross-products, we now write

AτGnm X ′ = Knm (X ′ τGmn A)Kpq

so

vec AτGnm X ′ = (Kqp ⊗ Knm )vec X ′ τGm n A

and
∂vec AτGnm X ′ ∂vec X ′ τGmn A
= (Kpq ⊗ Kmn ). (5.35)
∂vec X ∂vec X
Substituting Equation 5.34 into Equation 5.35 and noting that Knm Kmn =
Imn , we have

Ip ⊗ rvec A1′ ⊗ Im
⎛ ⎞

∂vec AτGmn X ..
= (IG ⊗ Kmp ) ⎝ ⎠ (Kpq ⊗ Imn ).
⎜ ⎟
∂vec X .

Ip ⊗ rvec AG ⊗ Im

The first block of this matrix is

Kmp Ip ⊗ rvec A1′ Kpq ⊗ In ⊗ Im


   

and appealing to Corollary 2.2 of Chapter 2, we can write this block as

Kmp Ip ⊗ A1′ 1· . . . Ip ⊗ A1′ q· ⊗ Im .


      

Theorem 2.3 of Chapter 2 allows us to write this first block as


′ ⎞
(Ip ⊗ (A1′ )1· . . . Ip ⊗ (A1′ )q· ) ⊗ e1m

⎜ .. ⎟
⎝ . ⎠

m
(Ip ⊗ (A1′ )1· . . . Ip ⊗ (A1′ )q· ) ⊗ em
5.6 Matrix Derivatives of Cross-Products 193

so the derivative can be broken down to give



⎛ ⎞
(Ip ⊗ (A1′ )1· . . . Ip ⊗ (A1′ )q· ) ⊗ e1m
⎜ .. ⎟

⎜ . ⎟

⎜ (I ⊗ (A ′ ) . . . I ⊗ (A ′ ) ) ⊗ e m ′ ⎟
p 1 1· p 1 q· m ⎟
∂vec AτGm n X ′ ⎜

=⎜ ⎜ .
..

⎟.
∂vec X ⎟
′ ⎟
⎜ (Ip ⊗ (AG′ )1· . . . Ip ⊗ (AG′ )q· ) ⊗ e1m ⎟

⎜ .. ⎟
.
⎜ ⎟
⎝ ⎠

m
(Ip ⊗ (AG′ )1· . . . Ip ⊗ (AG′ )q· ) ⊗ em

5.6.3 Cross-Products Involving X −1


Cross-products can be formed from the inverse of X provided, of course, X
is square and nonsingular. It is of some interest then to derive the derivative
of such cross-products.
Suppose X is mG×mG and nonsingular. Then, by the backward chain
rule and using Equation 5.32,
∂vec X −1 τGmn A
∂vec X
∂vec X −1 ∂vec X −1 τGmn A
=
∂vec X ∂vec X −1
Im ⊗ (A1′ )1· . . . Im ⊗ (A1′ )q·
⎡ ⎛ ⎞⎤
′ ⎢
= −(X −1 ⊗ X −1 ) ⎣ImG ⊗ ⎝ .. ..
⎠⎦ .
⎜ ⎟⎥
. .
′ ′
Im ⊗ (AG )1· . . . Im ⊗ (AG )q·

If we partition X −1 as
X1
⎛ ⎞

X −1 = ⎝ ... ⎠
⎜ ⎟

XG

where each submatrix X j is m×mG, then we can write


∂vec X −1 τGmn A
∂vec X

= −X −1 ⊗ [X 1 (Im ⊗ (A1′ )1· ) + · · ·
′ ′ ′
+ X G (Im ⊗ (AG′ )1· ) · · · X1 (Im ⊗ (A1′ )q· ) + · · · + X G (Im ⊗ (AG′ )q· )].
194 New Matrix Calculus Results

If we want a more succinct expression, we can use Equation 5.31 to obtain

∂vec X −1 τGm n A ′
= −X −1 ⊗ X −1 KGm rvecmG Kq,mG (Iqm ⊗ vecm A ′ ).
∂vec X
(5.36)

In a similar manner,
∂vec AτGnm X −1
∂vec X
∂vec AτGnm X −1

= −(X −1 ⊗ X −1 )
∂vec X −1
−1 ′
= −(X ⊗ X )(ImG ⊗ (vecn A ′ )(1) ⊗ Im . . . ImG ⊗ (vecn A ′ )(q) ⊗ Im )
−1
′ ′
= −X −1 ⊗ X −1 ((vecn A ′ )(1) ⊗ Im ) . . . − X −1 ⊗ X −1 (vecn A ′ )(q) ⊗ Im

where in our working we have used Equation 5.33.


Consider
⎛ ′ ⎞
(A1 )1· ⊗ Im
′ ′ ′ ⎜
X −1 ((vecn A ′ )(1) ⊗ Im ) = (X 1 . . . X G ) ⎝ .. ⎟
. ⎠
(AG′ )1· ⊗ Im
′ ′
= X 1 ((A1′ )1· ⊗ Im ) + · · · + X G ((AG′ )1· ⊗ Im ).

It follows then that


∂vec AτGm n X −1
∂vec X
′  ′ ′
= −X −1 ⊗ X 1 ((A1′ )1· ⊗ Im ) + · · · + X G ((AG′ )1· ⊗ Im ) . . .

′  ′ ′
−X −1 ⊗ X 1 ((A1′ )q· ⊗ Im ) + · · · + X G ((AG′ )q· ⊗ Im ) .


A more succinct expression for this equation can be obtained using Equation
5.36 and the fact that
∂vec AτGnm X −1 ∂vec X −1 τGmn A
= (KmG,q ⊗ Kmn )
∂vec X ∂vec X
to obtain
∂vec AτGnm X −1
∂vec X

= −(X −1 ⊗ X −1 KGm rvecmG Kq,mG (Iqm ⊗ vecn A ′ ))(KmG,q ⊗ Kmn ).
5.6 Matrix Derivatives of Cross-Products 195

5.6.4 The Cross-Product X τGm m X


If X is mG× p, then we can form the cross-product X τGmm X. In this section,
we derive an expression for the derivative of this cross-product. Write
⎛ ⎞
X1
⎜ .. ⎟
X =⎝ . ⎠
XG
where each submatrix in this partitioning is m× p. Then,

X τGmm X = X1 ⊗ X1 + · · · + XG ⊗ XG

and

vec(X τGmm X ) = vec(X1 ⊗ X1 ) + · · · + vec(XG ⊗ XG ),

so
∂vec(X τGmm X ) ∂vec(X1 ⊗ X1 ) ∂vec(XG ⊗ XG )
= + ··· + .
∂vec X ∂vec X ∂vec X
∂vec(X1 ⊗ X1 )
Consider . By the backward chain rule,
∂vec X
∂vec(X1 ⊗ X1 ) ∂vec X1 ∂vec(X1 ⊗ X1 )
= .
∂vec X ∂vec X ∂vec X1
By Equation 5.29,
∂vec X1
= Ip ⊗ S1′ .
∂vec X

where S1 is the m×mG selection matrix e1G ⊗ Im .
From Equation 5.13 of Section 5.4,
∂vec(X1 ⊗ X1 )
= Kmp (Ip ⊗ X1 )τmpm Kmm + Ip2 τ ppm (X1′ ⊗ Im ).
∂vec X1
But using Theorem 2.19 of Chapter 2,

Kmp (Ip ⊗ X1 )τmpm Kmm = Ip ⊗ (X1 τm1m Kmm )

so
∂vec(X1 ⊗ X1 )
= (Ip ⊗ S1′ )[Ip ⊗ (X1 τm1m Kmm ) + Ip2 τ pp m (X1′ ⊗ Im )].
∂vec X
(5.37)
196 New Matrix Calculus Results

The second part of ∂vec(X1 ⊗ X1 )/∂vec X given by Equation 5.37 is


(Ip ⊗ S1′ )[Ip2 τ pp m (X1′ ⊗ Im )] = Ip2 τ pp,Gm (Ip ⊗ S1′ )(X1′ ⊗ Im )
= Ip2 τ pp,Gm (X1′ ⊗ S1′ )
so the corresponding second part of ∂vec(X τGm m X )/∂vec X is
Ip2 τ p,p,Gm (X1′ ⊗ S1′ ) + · · · + Ip2 τ p,p,Gm (XG′ ⊗ SG′ )
= Ip2 τ p,p,Gm [X1′ ⊗ S1′ + · · · + XG′ ⊗ SG′ ],

by Theorem 1.4 of Chapter 1 and where S ′j = e Gj ⊗ Im for j = 1, . . . , G.


If we write,
⎛ ⎞
S1
⎜ .. ⎟
S=⎝ . ⎠
SG
then
X1′ ⊗ S1′ + · · · + XG′ ⊗ SG′ = vecm X ′ τG,p,Gm vecm S ′
so this second part can be written as
Ip2 τ p,p,Gm (vecm X ′ τG,p,Gm vecm S ′ ). (5.38)
Consider the first matrix on this right-hand side of Equation 5.37, which
we can write as
Ip ⊗ S1′ (X1 τm1m Kmm ) = Ip ⊗ e1G ⊗ Im (X1 τm1m Kmm )
 

= Ip ⊗ e1G ⊗ X1 τm1m Kmm


so the corresponding part of ∂vec(X τGm m X )/∂vec X is
Ip ⊗ e1G ⊗ X1 τm1m Kmm + · · · + Ip ⊗ eGG ⊗ XG τm1m Kmm
= Ip ⊗ e1G ⊗ X1 τm1m Kmm + · · · + eGG ⊗ XG τm1m Kmm
 
⎛ ⎞
X1 τm1m Kmm
= Ip ⊗ ⎝ ..
⎠.
⎜ ⎟
.
XG τm1m Kmm
But,
⎛ ⎞
X1 τm1m Kmm
..
⎠ = KmG X τmGm Kmm ,
⎜ ⎟
⎝ .
XG τm1m Kmm
5.6 Matrix Derivatives of Cross-Products 197

by Theorem 2.13 of Chapter 2. We can write the first part of our derivative
then as
Ip ⊗ (KmG X τmGm Kmm ) . (5.39)
Adding our two parts given by Equations 5.38 and 5.39 together yields,
∂vec(X τGmm X )
∂vec X
= Ip ⊗ (KmG X τmGm Kmm ) + Ip2 τ (vecm X ′ τG,p,Gm vecm S ′ ). (5.40)
p,p,Gm

To break this result down further consider,


Ip2 τ p,p,Gm X1′ ⊗ S1′
 
 p′  p′
= e1 ⊗ Ip ⊗ X1′ 1· ⊗ S1′ + · · · + e p ⊗ Ip ⊗ X1′ p· ⊗ S1′
       

= Ip ⊗ X1′ 1· ⊗ S1′ . . . Ip ⊗ (X1′ p· ⊗ S1′ .


    

But (X1′ )1· ⊗ S1′ = (X1′ )1· ⊗ e1G ⊗ Im = e1G (X1′ )1· ⊗ Im
so
Ip2 τ p,p,Gm X1′ ⊗ S1′ = Ip ⊗ e1G X1′ 1· . . . Ip ⊗ e1G X1′ p· ) ⊗ Im
      
⎡ ⎛  ′ ⎞ ⎛ ′ ⎞⎤
X1 1· (X1 ) p·
⎢ ⎜ O ⎟ ⎜ O ⎟⎥
= ⎢I p ⊗ ⎜ . ⎟ . . . Ip ⊗ ⎜ . ⎟⎥ ⊗ Im .
⎢ ⎜ ⎟ ⎜ ⎟⎥
⎣ ⎝ . . ⎠ ⎝ . . ⎠⎦
O O
It follows that the second part of ∂vec(X τGm m X )/∂vec X can be written as
⎡ ⎛ ′ ⎞ ⎛ ′ ⎞⎤
(X1 )1· (X1 ) p·
⎜ .. ⎟ ⎜ .. ⎟⎥
⎣Ip ⊗ ⎝ . ⎠ . . . Ip ⊗ ⎝ . ⎠⎦ ⊗ Im

(XG′ )1· (XG′ ) p·


 
= Ip ⊗ (vecm X ′ )(1) . . . Ip ⊗ (vecm X ′ )(p) ⊗ Im . (5.41)

Also using Theorem 2.10 and Equation 2.8 of Chapter 2,


′ ′
KmG X τmGm Kmm = X (1) ⊗ Im ⊗ e1m + · · · + X (m) ⊗ Im ⊗ em
m
. (5.42)
Combining Equations 5.41 and 5.42 allows us to write
∂vec X τGmm X ′
m′
= Ip ⊗ X (1) ⊗ Im ⊗ e1m + · · · + X (m) ⊗ Im ⊗ em
 
∂vec X
+ Ip ⊗ (vecm X ′ )(1) . . . Ip ⊗ (vecm X ′ )(p) ⊗ Im .
 
198 New Matrix Calculus Results

5.6.5 The Cross-Product X ′ τGm m X ′


Suppose now X is p×mG, so we can form X ′ τGm m X ′ . The derivative of
this cross-product can be obtained from

∂vec X ′ τGm m X ′ ∂vec X ′ ∂vec X ′ τGm m X ′


= .
∂vec X ∂vec X ∂vec X ′
As in the previous section,

vec X ′ = Kp,mG vec X

so
∂vec X ′ ′
= Kp,mG = KmG,p
∂vec X
and using Equation 5.40, we have

∂vec X ′ τGm m X ′
= KmG,p Ip ⊗ (KmG X ′ τmGm Kmm )

∂vec X
+ Ip2 τ p,p,Gm (vecm X τG,p,Gm vecm S ′ ) .

(5.43)

Now, from Equation 2.9 of Chapter 2,

KmG,p = (IG ⊗ Kmp )(KG p ⊗ Im )

and we can write the first matrix on the right-hand side of Equation 5.43
as

(IG ⊗ Kmp )(KG p ⊗ Im ) Ip ⊗ (KmG X ′ τmGm Kmm ) .


 

From Equation 2.8 of Chapter 2,

(KG p ⊗ Im )(Ip ⊗ (KmG X ′ τmGm Kmm ))



Ip ⊗ e1G ⊗ Im
⎛ ⎞
.. ⎟ ′

=⎝ ⎠ Ip ⊗ (KmG X τmGm Kmm ) (5.44)

.

Ip ⊗ eGG ⊗ Im

and so the first block of this matrix is


 ′
Ip ⊗ e1G ⊗ Im (KmG X ′ τmGm Kmm ) .
  
5.6 Matrix Derivatives of Cross-Products 199

Now, from Theorem 1.5 of Chapter 1,


 G′ ′
e1 ⊗ Im (KmG X ′ τmGm Kmm ) = Im ⊗ e1G KmG X ′ τm1m Kmm
 
 ′
= e1G ⊗ Im X ′ τm1m Kmm = X1′ τm1m Kmm


where we have partitioned X as X = (X1 . . . XG ) each submatrix being


p×m, so we can write the right-hand side of Equation 5.44 as
Ip ⊗ X1′ τm1m Kmm
⎛ ⎞
..
⎠,
⎜ ⎟
⎝ .
Ip ⊗ XG′ τm1m Kmm

and the first matrix on the right-hand side of Equation 5.43 as


 ⎞
Kmp Ip ⊗ X1′ τm1m Kmm

⎜ .. ⎟

⎜  . ⎟
⎟ (5.45)
⎝ Kmp Ip ⊗ XG′ τm1m Kmm ⎠
.
The second matrix on the right-hand side of Equation 5.43 is

KmG,p (Ip2 τ p,p,Gm (vecm X τG,p,Gm vecm S ′ ))

which using Equation 5.41, we can write as

KmG,p Ip ⊗ (vecm X )(1) ⊗ Im . . . Ip ⊗ (vecm X )(p) ⊗ Im .


 
(5.46)

The first block of this matrix using Equation 2.9 of Chapter 2

(IG ⊗ Kmp )(KG p ⊗ Im ) Ip ⊗ (vecm X )(1) ⊗ Im


 

= (IG ⊗ Kmp ) KG p (Ip ⊗ (vecm X )(1) ) ⊗ Im .


 

Now, as
⎛ ⎞
(X1 )1·
(vecm X )(1) = ⎝ ... ⎠
⎜ ⎟

(XG )1·
it follows from Theorem 2.3 of Chapter 2 that
⎛ ⎞
Ip ⊗ (X1 )1·
KG p Ip ⊗ (vecm X )(1) = ⎝
  ⎜ .. ⎟
. ⎠
Ip ⊗ (XG )1·
200 New Matrix Calculus Results

so we can write our first block as

⎛ ⎞
Kmp (Ip ⊗ (X1 )1· ⊗ Im )
..
⎠.
⎜ ⎟
⎝ .
Kmp (Ip ⊗ (XG )1· ⊗ Im )

Returning now to Equation 5.43, it is clear that we can write the second
matrix of the right-hand side of Equation 5.43 as

⎛ ⎞
Kmp (Ip ⊗ (X1 )1· ⊗ Im ) ··· Kmp (Ip ⊗ (X1 ) p· ⊗ Im )
.. ..
⎠.
⎜ ⎟
⎝ . .
Kmp (Ip ⊗ (XG )1· ⊗ Im ) · · · Kmp (Ip ⊗ (XG ) p· ⊗ Im )

Combining this with Equation 5.45 gives the following result,

Kmp (Ip ⊗ X1′ τm1m Kmm )


⎛ ⎞
∂vec X ′ τGmm X ′ ⎜ ..
=⎝

∂vec X . ⎠
Kmp (Ip ⊗ XG′ τm1m Kmm )
⎛ ⎞
Kmp (Ip ⊗ (X1 )1· ⊗ Im ) · · · Kmp (Ip ⊗ (X1 ) p· ⊗ Im )
+⎝ .. .. (5.47)
⎠.
⎜ ⎟
. .
Kmp (Ip ⊗ (XG )1· ⊗ Im ) · · · Kmp (Ip ⊗ (XG ) p· ⊗ Im )

We can break this result down further by noting that


Kmp (Ip ⊗ X1′ τm1m Kmm ) = Kmp (Ip ⊗ [(X1′ )1· ⊗ Im ⊗ e1m

m
+ · · · + (X1′ )m· ⊗ Im ⊗ em ])

and by Theorem 2.3 of Chapter 2

′ ′ ⎞
Ip ⊗ (X1′ )1· ⊗ e1m ⊗ e1m


Kmp (Ip ⊗ (X1′ )1· ⊗ Im ⊗ e1m ) = ⎝
⎜ .. ⎟
. ⎠
′ m′ m′
Ip ⊗ (X1 )1· ⊗ em ⊗ e1
5.6 Matrix Derivatives of Cross-Products 201

so

Kmp Ip ⊗ X1′ τm1m Kmm


 

′ ′ ′
m′
Ip ⊗ (X1′ )1· ⊗ e1m ⊗ e1m + · · · + (X1′ )m· ⊗ e1m ⊗ em
⎛  ⎞

=⎝
⎜ .. ⎟
. ⎠
m′ m′ m′ m′
 ′ ′

Ip ⊗ (X1 )1· ⊗ em ⊗ e1 + · · · + (X1 )m· ⊗ em ⊗ em

Ip ⊗ (X1′ ⊗ e1m )τm1m Im
⎛  ⎞

= ⎝ ... ⎠.
⎜ ⎟
m′
 ′ 
Ip ⊗ (X1 ⊗ em )τm1m Im

The first matrix on the right-hand side of Equation 5.47 can then be broken
down to


⎛ ⎞
Ip ⊗ (X1′ ⊗ e1m )τm11 Im

⎜ .. ⎟

⎜ . ⎟

⎜ I ⊗ (X ′ ⊗ e m ′ )τ I  ⎟
⎜ p 1 m m11 m ⎟
⎜ .. ⎟
⎟.
.

⎜ ⎟

m
 ′ 
⎜ Ip ⊗ (XG ⊗ e1 )τm11 Im ⎟
⎜ ⎟
⎜ .. ⎟
.
⎜ ⎟
⎝ ⎠

m
 ′ 
Ip ⊗ (XG ⊗ em )τm11 Im

To expand the second matrix on the right-hand side of Equation 5.47 note
that by Equation 1.6 of Chapter 1,

′ ⎞
(X1 )1· ⊗ e1m

(X1′ )1· ⊗ Im = ⎝
⎜ .. ⎟
. ⎠
m′
(X1 )1· ⊗ em

so, by Theorem 2.3 of Chapter 2

′ ⎞
Ip ⊗ (X1 )1· ⊗ e1m

Kmp (Ip ⊗ (X1 )1· ⊗ Im ) = ⎝ ..


⎠.
⎜ ⎟
.
m′
Ip ⊗ (X1 )1· ⊗ em
202 New Matrix Calculus Results

It follows that this second matrix can be written as


′ ′
⎛ ⎞
Ip ⊗ (X1 )1· ⊗ e1m · · · Ip ⊗ (X1 ) p· ⊗ e1m
⎜ .. .. ⎟

⎜ . . ⎟

⎜ I ⊗ (X ) ⊗ e m ′ · · · I ⊗ (X ) ⊗ e m ′ ⎟
⎜ p 1 1· m p 1 p· m ⎟
⎜ .. .. ⎟
⎟.
. .

⎜ ⎟
m′ m′ ⎟
⎜ Ip ⊗ (XG )1· ⊗ e1 · · · Ip ⊗ (XG ) p· ⊗ e1 ⎟

⎜ .. .. ⎟
. .
⎜ ⎟
⎝ ⎠
′ ′
m m
Ip ⊗ (XG )1· ⊗ em · · · Ip ⊗ (XG ) p· ⊗ em

5.6.6 The Cross-Product X −1 τGmm X −1


Suppose now X is mG×mG and nonsingular, so X −1 τGmm X −1 can be
formed. In this section, we obtain the derivative of this matrix.
By the backward chain rule,
∂vec X −1 τGmm X −1 ∂vec X −1 ∂vec X −1 τGmm X −1
= .
∂vec X ∂vec X ∂vec X −1
We know that
∂vec X −1 ′
= −X −1 ⊗ X −1
∂vec X
and from Equation 5.40 that
∂vec X −1 τGmn X −1
= ImG ⊗ (KmG X −1 τmGm Kmm )
∂vec X −1

+ I(mG)2 τmG,mG,m vecm X −1 τG,Gm,Gm vecm S ′
 

so
∂vec X −1 τGmm X −1
∂vec X

= −X −1 ⊗ X −1 (KmG X −1 τmGm Kmm )
′ ′
− (X −1 ⊗ X −1 )(I(mG)2 τmG,mG,m (vecm X −1 τG,Gm,Gm vecm S ′ )). (5.48)
Consider the first matrix on the right-hand side of this equation.
Suppose we write,
⎛ 1⎞
X
−1 ⎜ .. ⎟
X =⎝ . ⎠
XG
5.6 Matrix Derivatives of Cross-Products 203

where each submatrix in this partitioning is m×mG. It follows that


′ ′ ′
X −1 = (X 1 . . . X G )

and using Theorem 2.13 of Chapter 2



X −1 (KmG X −1 τmGm Kmm )
⎛ 1 ⎞
X τm1m Kmm
′ ′ ⎜
= (X 1 . . . X G ) ⎝ .. ⎟
. ⎠
X G τm1m Kmm
′ ′
= X 1 (X 1 τm1m Kmm ) + · · · + X G (X G τm1m Kmm ).

By Theorem 1.8 of Chapter 1,


′ ′ ′
X 1 (X 1 τm1m Kmm ) = X 1 τm,1,mG (Im ⊗ X 1 )Kmm = (X 1 ⊗ X 1 )τm,mG,1 Im

by Theorem 2.19 of Chapter 2.


It follows that the first matrix on the right-hand side of Equation 5.48
can be written as
′ ′
−X −1 ⊗ X 1 ⊗ X 1 )τm,mG,1 Im + · · · + X G ⊗ X G τm,mG,1 Im
  
′ ′ ′
= −X −1 ⊗ X 1 ⊗ X 1 + · · · + X G ⊗ X G τm,mG,1 Im
 
′
= −X −1 ⊗ X −1 τG,m,mG vecm X −1 τm,mG,1 Im .
 
(5.49)

Consider now the second matrix on the right-hand side of Equation 5.48,
which using Equation 5.41, we can write as
′  ′ ′
−(X −1 ⊗ X −1 ) ImG ⊗ (vecm X −1 )(1) ⊗ Im . . . ImG ⊗ (vecm X −1 )(mG) ⊗ Im

′  ′  (1)
= −X −1 ⊗ X −1 vecm X −1

⊗ Im . . .
′  ′  (mG)
−X −1 ⊗ X −1 vecm X −1

⊗ Im .

By Theorem 1.18 of Chapter 1,


′ ′  (1) ′  (1) ′
X −1 vecm X −1 ⊗ Im = vecm X −1 τG,1,mG vecm X −1
  

so we can write this second matrix as


′  (1) ′
−X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 . . .

′  (mG) ′
−X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 .

(5.50)
204 New Matrix Calculus Results

Combining Equations 5.49 and 5.50 gives us our result, namely

∂vec X −1 τGm m X −1
∂vec X
′
= − X −1 ⊗ X −1 τG,m,mG vecm X −1 τm,mG,1 Im
 
′  (1) ′
− X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 . . .
 
′  (mG) ′
X −1 ⊗ vecm X −1 τG,1,mG vecm X −1 .


To break this result down further note that


′ ′
X1 ⊗ X1 + ··· + XG ⊗ XG
⎛ 1 ′ ′ ⎞
(X )1· ⊗ X 1 + · · · + (X G )1· ⊗ X G
.. ..
=⎝ ⎠,
⎜ ⎟
. .
′ ′
(X 1 )m· ⊗ X 1 + · · · + (X G )m· ⊗ X G

so the first matrix on the right-hand side of our result can be written as
′ ′ ′
−X −1 ⊗ ((X 1 )1· ⊗ X 1 + · · · + (X G )1· ⊗ X G ) ⊗ e1m

′ ′ ′
+ · · · + ((X 1 )m· ⊗ X 1 + · · · + (X G )m· ⊗ X G ) ⊗ e m

As far as the second matrix is concerned, note that


⎛ 1′ ⎞
X
−1 ′ ⎜ .. ⎟
vecm X =⎝ . ⎠

XG
so
′ ⎞ ⎛ 1′ ⎞
(X 1 )1·

X.1

vecm X −1
′ (1) .
.. . ⎟
=⎝ ⎠ = ⎝ .. ⎠
⎜ ⎟ ⎜
′ ′
(X G )1· X.G1

and
′  (1) ′ ′ ′ ′ ′
vecm X −1 τG,1,mG vecm X −1 = X·11 ⊗ X 1 + · · · + X·1G ⊗ X G .


We can break this second matrix of our result down as


 ′ ′ ′ ′  1′ ′
− X −1 ⊗ X·11 ⊗ X 1 + · · · + X·1G ⊗ X G . . . X −1 ⊗ X·mG ⊗ X1


G′ ′ 
+ · · · + X·mG ⊗ XG .
5.7 Results with Reference to ∂ vec Y/∂ vec X 205

5.7 Results with Reference to ∂ vec Y/∂ vec X

5.7.1 Introduction
One of the advantages of working with the concept of a matrix deriva-
tive given by ∂vec Y/∂vec X is that if vec Y = Avec X where A is a matrix
of constants, then ∂ℓ/∂vec Y = A∂ℓ/∂vec X for several of the vectors and
matrices we encounter in our work. That is, often given the specialized
matrices and vectors we work with if y = Ax, and A is a matrix of con-
stants, then ∂ℓ/∂y = A∂ℓ/∂x for a scalar function ℓ. For example, if A
is a selection matrix or a permutation matrix, then y = Ax implies that
∂ℓ/∂y = A∂ℓ/∂x, for an arbitrary scalar function ℓ as well. In this section,
this property is investigated further. It is demonstrated that several theorems
can be derived from this property. On the face of it, these theorems appear
very simple and indeed their proofs are almost trivial. But taken together,
they form a powerful tool for deriving matrix calculus results. By way of
illustration, these theorems are used in Section 5.7.3 to derive results, some
of which are new, for derivatives involving the vectors studied in Section
1.4.3 of Chapter 1, namely vec A, vech A, and v(A) for A a n×n matrix.
They are also used in Section 5.7.4 to explain how results for derivatives
involving vec X where X is a symmetric matrix can be derived from known
results.

5.7.2 Simple Theorems Involving ∂vec Y/∂vec X


Theorem 5.5 Let x be an n×1 vector whose elements are distinct. Then,
∂x
= In .
∂x

Proof: Clearly,
 
∂x ∂x1 ∂xn
= e1n . . . enn = In ,
 
= ...
∂x ∂x ∂x
where e nj is the jth column of In . 

Theorem 5.6 Suppose x and y are two column vectors such that y = Ax and
∂ℓ/∂y = A∂ℓ/∂x for A a matrix of constants and ℓ a scalar function. Let z be
a column vector. Then,
∂z ∂z
=A .
∂y ∂x
206 New Matrix Calculus Results

Proof: We know that for any scalar ℓ,

∂ℓ ∂ℓ
=A .
∂y ∂x

Write
 ′
z = z1 . . . zp .

Then,

∂z p
   
∂z ∂z1 ∂z1 ∂z p
= ... = A ... A
∂y ∂y ∂y ∂x ∂x
 
∂z1 ∂z p ∂z
=A ... =A . 
∂x ∂x ∂x

Theorem 5.7 Suppose x and y are two column vectors such that y = Ax and
∂ℓ/∂y = A∂ℓ/∂x for A a matrix of constants and ℓ a scalar function. Suppose
the elements of x are distinct. Then,
 ′
∂y ∂x
= .
∂x ∂y

Proof: Using the concept of a matrix derivative ∂y/∂x = A ′ . But from


Theorem 5.6,

∂z ∂z
=A
∂y ∂x

for any vector z. Taking z = x gives

∂x ∂x
=A
∂y ∂x

and as the elements of x are distinct by Theorem 1, the derivative ∂x/∂x is


the identity matrix
so
 ′
∂x ∂y
=A= .
∂y ∂x

Taking transposes gives the result. 


5.7 Results with Reference to ∂ vec Y/∂ vec X 207

In using the concept of a matrix derivative we have, a backward chain rule


applies, which is just the transpose of the chain rule reported by Magnus
(see Magnus (2010)). That is, if y is a vector function of u and u is a vector
function of x, so y = y(u(x)), then
∂y ∂u ∂y
= .
∂x ∂x ∂u
Using this result gives us the following theorem.

Theorem 5.8 For any vectors x and y,


∂y ∂x ∂y
= .
∂x ∂x ∂x

Proof: Write y = y(x(x)) and apply the backward chain rule. 

5.7.3 Theorems Concerning Derivatives Involving VecA, VechA,


and v
Let A = {ai j } be an n×n matrix and partition A into its columns, so A =
(a1 . . . an ) where aj is the jth column of A for j = 1, . . . , n. Then, recall
from Section 1.4.3 of Chapter 1 that vecA is the n2 ×1 vector given by
vec A = (a1′ . . . an′ ) ′ , that is, to form vecA we stack the columns of A
underneath each other. VechA is the 12 n(n + 1)×1 vector given by

vech A = (a11 . . . an1 a22 . . . an2 . . . ann ) ′ .

That is, to form vechA we stack the elements of A on and below the main
diagonal one underneath the other. The vector v(A) is the 12 n(n − 1)×1
vector given by

v(A) = (a21 . . . an1 a32 . . . an2 . . . ann−1 ) ′ .

That is, we form v(A) by stacking the elements of A below the main diag-
onal, one beneath the other. These vectors are important for statisticians
and econometricians. If A is a covariance matrix, then vecA contains the
variances and covariances but with the covariances duplicated. The vector
vechA contains the variances and covariances without duplication and v(A)
contains the covariances without the variances.
Regardless as to whether A is symmetric or not, the elements in vechA
and v(A) are distinct. The elements in vecA are distinct provided A is
not symmetric. If A is symmetric, the elements of vecA are not distinct.
208 New Matrix Calculus Results

So, from Theorem 5.5, we have


∂vech A
= I 1 n(n+1) for all A
∂vech A 2

∂v(A)
= I 1 n(n−1) for all A
∂v(A) 2

∂vec A
= In2 provided A is not symmetric.
∂vec A
What ∂vec A/∂vec A is in the case where A symmetric is discussed in Sec-
tion 5.7.4.
In Section 3.2 of Chapter 3, we also saw that there exists 21 n(n + 1)×n2
and 12 n(n − 1)×n2 zero-one matrices Ln and Ln , respectively, such that

Ln vec A = vech A

and

Ln vec A = v(A).

If A is symmetric, then

Nn vec A = vec A

where Nn = 12 (In2 + Knn ) and Knn is a commutation matrix, so for this case

Ln Nn vec A = vech A

and

Ln Nn vec A = v(A).

The matrices Ln Nn and Ln Nn are not zero-one matrices. However, as we


know from Chapter 3 along with Ln and Ln , they form a group of matrices
known as elimination matrices. Finally, in Section 3.3 of Chapter 3, we
saw that for special cases there exists zero-one matrices called duplication
matrices, which take us back from vechA and v(A) to vecA. If A is symmetric,
there exists an n2 × 12 n(n + 1) zero-one matrix Dn such that

Dn vech A = vec A.

Consider ℓ any scalar function. Then, reflexion shows that the same
relationships exist between ∂ℓ/∂vec A, ∂ℓ/∂vech A, and ∂ℓ/∂v(A) as exist
between vecA, vechA, and v(A), respectively.
5.7 Results with Reference to ∂ vec Y/∂ vec X 209

Thus, for general A


∂ℓ ∂ℓ
= Ln
∂vech A ∂vec A
∂ℓ ∂ℓ
= Ln .
∂v(A) ∂vec A
For symmetric A,
∂ℓ ∂ℓ
= Ln Nn
∂vech A ∂vec A
∂ℓ ∂ℓ
= Ln Nn
∂v(A) ∂vec A
∂ℓ ∂ℓ
= Dn . (5.51)
∂vec A ∂vech A
Using the Theorems of Section 3, we can prove the following results.

Theorem 5.9
∂vec A
= Dn′ if A is symmetric
∂vech A
∂vec A
= Ln if A is not symmetric.
∂vech A

Proof: If A is symmetric vec A = Dn vech A and the result follows. For the
case where A is not symmetric, consider

vech A = Ln vec A.

By Theorem 5.6, we have that for any vector z


∂z ∂z
= Ln .
∂vech A ∂vec A
Taking z = vecA gives
∂vec A ∂vec A
= Ln
∂vech A ∂vec A
and as A is not symmetric the elements of vecA are distinct, so by Theo-
rem 5.5
∂vec A
= In2
∂vec A
210 New Matrix Calculus Results

and
∂vec A
= Ln . 
∂vech A
Theorem 5.10
∂vech A
= Dn if A is symmetric
∂vec A
∂vech A
= Ln′ if A is not symmetric
∂vec A

Proof: A trivial application of Theorem 5.7. 

Theorem 5.6 can also be used to quickly derive results about elimination
matrices, duplication matrices, and the matrix Nn . Consider, for example,
the case where A is a symmetric n×n matrix, so
Ln Nn vec A = vech A.
By Theorem 5.6, for any vector z,
∂z ∂z
= Ln Nn .
∂vech A ∂vec A
Take z = vechA. Then,
∂vech A ∂vech A
= Ln Nn = Ln Nn Dn
∂vech A ∂vec A
by Theorem 5.10.
But as the elements of vechA are distinct,
∂vech A
= I 1 n(n+1) ,
∂vech A 2

so
Ln Nn Dn = I 1 n(n+1) ,
2

a result we knew already from Equation 3.55 of Chapter 3.

5.7.4 Theorems Concerning Derivatives Involving VecX


where X Is Symmetric
Consider X an n×n symmetric matrix and let x = vecX. Then, the elements
of x are not distinct and one of the implications of this is that
∂x
= In2 .
∂x
5.7 Results with Reference to ∂ vec Y/∂ vec X 211

Consider the 2×2 case. Then,


 
x11 x21
X =
x21 x22

and x = (x11 x21 x21 x22 ) ′ , so


⎛ ⎞
  1 0 0 0
∂x ∂x11 ∂x21 ∂x21 ∂x22 ⎜0 1 1 0⎟
= =⎜
⎝0
⎟.
∂x ∂x ∂x ∂x ∂x 1 1 0⎠
0 0 0 1

Clearly, this matrix is not the identity matrix. What it is, is given by the
following theorem whose proof again calls on our results of Section 5.7.2.

Theorem 5.11 Let X be an n×n symmetric matrix. Then,


∂vec X
= Dn Dn′ .
∂vec X

Proof: As X is an n×n symmetric matrix,

vec X = Dn vech X

so it follows from Theorem 5.6 that for any vector z


∂z ∂z
= Dn .
∂vec X ∂vech X
Take z = vecX, so
∂vec X ∂vec X
= Dn = Dn Dn′ (5.52)
∂vec X ∂vech X
by Theorem 5.9. 

The fact that in the case where X is an n×n symmetric matrix


∂vec X /∂vec X = Dn Dn′ means that all the usual rules of matrix calcu-
lus, regardless of what concept of a matrix derivative one is using, do not
apply for vecX where X is symmetric. However, Theorem 5.8, coupled with
Theorem 5.11, provide a quick and easy method for finding the results for
this case using known matrix calculus results.
Consider again x = vecX with X a symmetric matrix. Let φy/φx denote
the matrix derivative, we would get if we differentiated y with respect to x
using the concept of differentiation advocated but ignoring the fact that X
212 New Matrix Calculus Results

is a symmetric matrix. Then, the full import of Theorem 5.8 for this case is
given by the equation
∂y ∂x φy
= . (5.53)
∂x ∂x φx
Combining Equations 5.51 and 5.52 give the following theorem.

Theorem 5.12 Consider y = y(x) with x = vecX and X is a n×n symmetric


matrix. Let φy/φx denote the derivative of y with respect to x obtained when
we ignore the fact that X is a symmetric matrix. Then,
∂y φy
= Dn Dn′ .
∂x φx

A few examples will suffice to illustrate the use of this theorem. (For the rules
referred to in these examples, see Turkington (2004), Lutkepohl (1996), or
Magnus and Neudecker (1999)).
For x with distinct elements and A a matrix of constants, we know that
∂x ′ Ax
= 2(A + A ′ )x.
∂x
It follows that when x = vecX and X is an n×n symmetric matrix
∂x ′ Ax
= 2Dn Dn′ (A + A ′ )x.
∂x
For X non-singular, but non-symmetric matrix
∂|X |
= |X |vec(X −1 ) ′
∂vec X
so for X non-singular, but symmetric
∂|X |
= |X |Dn Dn′ vec X −1 .
∂vec X
For X an n×n non-symmetric matrix, A and B matrices of constants
∂vec AX B
= B ⊗ A′
∂vec X
so for X an n×n symmetric matrix
∂vec AX B
= Dn Dn′ (B ⊗ A ′ ).
∂vec X
5.7 Results with Reference to ∂ vec Y/∂ vec X 213

All results using either ∂vec Y/∂vec X or DY (in which case we have to take
transposes) can be adjusted in this way to allow for the case where X is a
symmetric matrix.
In the next chapter, the analysis of this section is brought together to
explain precisely how one should differentiate a log-likelihood function
using matrix calculus.
SIX

Applications

6.1 Introduction
As mentioned in the preface of this book, the main purpose of this work is
to introduce new mathematical operators and to present known matrices
that are important in matrix calculus in a new light. Much of this work
has concentrated on cross-products, generalized vecs and rvecs, and how
they interact and how they can be used to link different concepts of matrix
derivatives. Well-known matrices such as elimination matrices and duplica-
tion matrices have been revisited and presented in a form that enables one
to see precisely how these matrices interact with other matrices, particularly
Kronecker products. New matrix calculus results have also been presented
in this book.
Much of the work then has been of a theoretical nature and I hope it
can stand on its own. Having said this, however, I feel the book would
be incomplete without some indication as to how matrix calculus and the
specialized properties associated with it can be applied.
Matrix calculus can be applied to any area that requires extensive dif-
ferentiation. The advantage of using matrix calculus is that it substantially
speeds up the differentiation process and stacks the partial derivatives in
such a manner that one can easily identify the end result of the process.
Multivariate optimization springs to mind. In Section 6.2, we illustrate the
use of matrix calculus in a well-known optimization problem taken from
the area of finance.
The traditional areas, however, that use matrix calculus are to a large
extent statistics and econometrics. Classical statistical procedures centred
around the log-likelihood function such as maximum likelihood estima-
tion and the formation of classical test statistics certainly require extensive
differentiation. It is here that matrix calculus comes into its own.

214
6.2 Optimization Problems 215

What has been said for statistics holds more so for econometrics, where
the statistical models are complex and the log-likelihood function is a
very complicated function. Applying classical statistical procedures then
to econometric models is no trivial matter. Usually, it is beyond the scope
of ordinary calculus and requires matrix calculus.
As shown in Chapter 4, four different concepts of matrix calculus have
been used, particularly in statistics. In this chapter, as in Chapter 5, Concept
4 of Chapter 4 is used to derive the results.
No attempt is made in this chapter to provide an extensive list of the
applications of matrix calculus and zero-one matrices to models in statis-
tics and econometrics. For such applications, see Magnus and Neudecker
(1999) and Turkington (2005). Instead, what is offered in Section 6.3 is a
brief and non-rigorous summary of classical statistical procedures. Section
6.4 explains why these procedures are amenable to matrix calculus and the
standard approach one should adopt when using matrix calculus to form
the score vector and information matrix, the basic building blocks of clas-
sical statistical procedures. Sections 6.4, 6.5, and 6.6 present applications of
our technique to a statistical model, where we are sampling from a mul-
tivariate normal distribution and to two econometric models, the limited
information model and the full information matrix.

6.2 Optimization Problems


Consider scalar function of many variables y = f (x) where x is an n×1
vector. Then using our concept of matrix derivative, the score vector is
∂y/∂x and the Hessian matrix is ∂ 2 y/∂x∂x = ∂ (∂y/∂x)/∂x.
A critical point, (vectors are called points in this context), of the function
is any point x such that

∂y
= 0.
∂x
A given critical point is a local maximum if the Hessian matrix is negative
definite when evaluated at that point whereas the point is a local minimum
if the Hessian matrix is positive definite when evaluated at the point.
In complicated optimization problems, the rules of matrix calculus can
be used to obtain both the score vector and the Hessian matrix usually
far easier than if one was to use ordinary calculus. To illustrate, consider
a well-known problem taken from finance, namely finding the optimal
portfolio allocation. (This section is taken from Maller and Turkington
216 Applications

(2002)). Given an n×1 vector µ of expected asset returns and an associated


n×n positive definite matrix , the portfolio optimization problem is to
choose a n×1 vector x of asset weights, whose elements add to one such
that expected return µp = µ′ x is maximized
√ when this return is discounted
by the portfolio standard deviation σp = x ′ x. That is, our problem is as
follows:
µ′ x
Maximize f (x) = √ ,
x ′ x
subject to i ′ x = 1,

where i is an n×1 vector whose elements are all 1. The ratio µ′ x/ x ′ x is
called the Sharpe ratio.
As it stands, the problem is a constrained optimization problem, but it
is easily converted to an unconstrained problem by using the constraint to
eliminate one of the variables, say, the last one, xn . We have

x1 + · · · + xn = 1

so

xn = 1 − x1 − · · · − xn−1 = 1 − iR′ xR

where iR is an n − 1×1 vector whose elements are all ones and xR is the
n − 1×1 vector given by xR = (x1 . . . xn−1 ) ′ , and we can write
 
xR
x= = AxR + d,
1 − iR′ xR

where
   
In−1 0
A= and d = .
−iR′ 1

The constrained optimization problem then becomes the following uncon-


strained optimization problem.

y ′µ 1
Max g(xR ) = % = y ′ µ(y ′ y)− 2
xR ′
y y

where y = AxR + d. Using the product rule of ordinary calculus plus the
backward chain rule of matrix calculus given by Theorem 5.2 of Chapter 5,
6.2 Optimization Problems 217

we have that the score vector is given by


∂g(xR ) ∂y ∂µ′ y ′ 1 1 ′
3 ∂y ∂y y
= (y y)− 2 + y ′ µ(y ′ y)− 2
∂xR ∂xR ∂y 2 ∂xR ∂y
1 3
= A ′ µ(y ′ y)− 2 − y ′ µ(y ′ y)− 2 A ′ y
A ′ µ(y ′ y) − y ′ µA ′ y
= 3 .
(y ′ y) 2
A critical point of g(xR ) is any point xR such that ∂g(xR )/∂xR = 0, that is,
any point xR such that A ′ µ(y ′ y) − y ′ µA ′ y = 0. Maller and Turkington
(2002) shows that g(xR ) has a unique critical point

( −1 µ)R
xR∗ = ,
i ′  −1 µ
where, following our notation ( −1 µ)R denotes the vector consisting of the
first n − 1 elements of  −1 µ. In terms of our original variables, the point
xR∗ corresponds to

x ∗ =  −1 µ/i ′  −1 µ

which in turn is a critical point of f (x).


Next, we want to determine the nature of this critical point by evaluat-
ing the Hessian matrix of g(xR ) at xR∗ . Again, the rules of matrix calculus
substantially help in determining this matrix. The Hessian matrix is
∂ 2 g(x)
 
∂ ∂g(x) ∂  ′ ′ − 3 
A µy y − y ′ µA ′ y y ′ y 2 .

= =
∂xR ∂xR ∂xR ∂xR ∂xR
Using the product rule of ordinary calculus, the product rule of matrix
calculus as presented in the corollary of Theorem 5.4 of Chapter 5 and the
backward chain rule of matrix calculus Theorem 5.2 of that chapter, we
have
∂ 2 g (x) ∂y ∂y ′ y ′ ∂y ∂A ′ y ′ ∂y ∂µ′ y ′
 
− 3
y A y ′ y 2

= µA− y µ−
∂xR ∂xR ∂xR ∂y ∂xR ∂y ∂xR ∂y
3 ∂y ∂y ′ y  ′ − 25  ′ ′
µ Ay y − y ′ µy ′ A

− y y
2 ∂x ∂y
 ′ R ′
= 2A yµ A − A ′ Ay ′ µ − A ′ µy ′ A y ′ y
 
5
− 3A ′ y µ′ Ay ′ y − y ′ µy ′ A / y ′ y 2
  
5
= −A ′ y ′ y yµ′ + µy ′  + µ′ y − 3µ′ yyy ′  A/ y ′ y 2 .
    
218 Applications

At the critical point xR∗ , y =  −1 µ/i ′  −1 µ, so evaluating the Hessian matrix


at xR∗ , we have

∂ 2 g(xR )  ∗
x
∂xR ∂xR R
& '
µ′  −1 µ µµ′ µµ′ µ′  −1 µ 3µ′  −1 µµµ′
 
= −A  2 ′ −1 + ′ −1 + ′ −1 A
i ′  −1 µ i µ i µ i µ (i ′  −1 µ)3
5
(µ′  −1 µ) 2
× 5 .
((i ′  −1 µ)2 ) 2

5
Now ((i ′  −1 µ)2 ) 2 = (|i ′  −1 µ|)5 = sign(i ′  −1 µ)(i ′  −1 µ)5 , so

∂ 2 g(xR )  ∗ sign(i ′  −1 µ)(i ′  −1 µ)2 ′


xR = − 1 A ( − µ(µ′  −1 µ)−1 µ′ )A.
∂xR ∂xR′ (µ′  −1 µ) 2

Well-known results from matrix algebra (see Horn and Johnson (1989))
ensure that the matrix A ′ ( − µ(µ′  −1 µ)−1 µ′ )A is positive definite, so
whether the Hessian matrix at xR∗ is negative definite or positive definitive
depends crucially on the sign of i ′  −1 µ. If i ′  −1 µ > 0, then xR∗ is a max-
imum and converting back to our original variables, x ∗ =  −1 µ/i ′  −1 µ
would be the unique maximum of the %constrained problem. This gives the
maximum Sharpe ratio of f (x ∗ ) = µ′  −1 µ. If i ′  −1 µ < 0, then xR∗ is
a minimum and x ∗%gives a unique minimum of the constrained problem,
namely f (x ∗ ) = − µ′  −1 µ 1 .

6.3 Summary of Classical Statistical Procedures

6.3.1 The Score Vector, the Information Model, and the


Cramer-Rao Lower Bound
Let θ be an k×1 vector of unknown parameters associated with a statistical
model and let ℓ(θ) be the log-likelihood function of the model. We assume
that this scalar function satisfies certain regularity conditions and that it
is twice differentiable. Then, ∂ℓ/∂θ is an k×1 vector whose ith element is

1
Maller and Turkington (2002) were the first to recognize the possibility that x ∗ may give
rise to a minimum of the constrained problem rather than a maximum. Their expression
for the Hessian matrix ∂g(xR )/∂xR ∂xR contains a number of typos in it.
6.3 Summary of Classical Statistical Procedures 219

∂ℓ/∂θi . This vector we call the score vector. The Hessian matrix of ℓ(θ) is the
k×k matrix ∂ 2 ℓ/∂θ∂θ = ∂ (∂ℓ/∂θ)/∂θ whose (i, j)th element is ∂ 2 ℓ/∂θi ∂θ j .
The asymptotic information matrix is
 2 
1 ∂ ℓ
I (θ) = − lim E
n→∞ n ∂θ∂θ
where n denotes the sample size. Now, the limit of the expectation need
not be the same as the probability limit, but for the models we consider
in this chapter, based as they are on the multivariate normal distribution,
the two concepts are the same. Often it is more convenient to regard the
information matrix as
1 ∂ 2ℓ
I (θ) = −p lim .
n ∂θ∂θ
The inverse of this matrix, I −1 (θ) is called the asymptotic Cramer-Rao
lower bound and can be used in the following way. Suppose θˆ is a consistent
estimator of θ and that
√ d
n(θˆ − θ) → N (0, V ).2

Then, the matrix V is the asymptotic covariance matrix of θˆ and it exceeds


the Cramer-Rao lower bound in the sense that V − I −1 (θ) is a positive-
semidefinite matrix. If V = I −1 (θ), then θˆ is an asymptotically efficient
estimator and θˆ is called a best asymptotically normally distributed estimator
(BAN estimator for short).

6.3.2 Maximum Likelihood Estimators and Test Procedures


Classical statisticians prescribed a procedure for obtaining a BAN estima-
tor, namely the maximum-likelihood procedure. Let ⊕ denote the param-
eter space. Then, any value of θ that maximizes ℓ(θ) over ⊕ is called a
maximum-likelihood estimate, and the underlying estimator is called the
maximum-likelihood estimator (MLE). The first-order conditions for this
maximization are given by
∂ℓ(θ)
= 0.
∂θ

2
A shortcut notation is being used here. The more formally correct notation is n(θ˜ −
d
θ) → xN (0, V ).
220 Applications

Let θ˜ denote the MLE of θ. Then, θ˜ is consistent, and θ˜ is the BAN estimator
so
√ d
n(θ˜ − θ) → N 0, I −1 (θ) .
 

Let h be a G×1 vector whose elements are differentiable functions of the


elements of θ. That is, h is a vector function of θ, h = h(θ). Suppose we are
interested in developing test statistics for the null hypothesis
H0 : h(θ) = 0
against the alternative
HA : h(θ) = 0.
Let θ˜ denote the MLE of θ and θ denote the constrained MLE of θ; that is, θ
is the MLE of θ after we impose H0 in our model. Now, using our concept
of a matrix derivative, ∂h(θ)/∂θ is the k×G matrix whose (i, j)th element is
∂h j /∂θi . Then, classical statisticians prescribed three competing procedures
for obtaining a test statistic for H0 . These are as follows.

Lagrangian Multiplier Test Statistic


1 ∂ℓ(θ) ′ −1 ∂ℓ(θ)
T1 = I (θ) .
n ∂θ ∂θ
Note that the LMT statistic uses the constrained MLE of θ. If H0 is true, θ
should be close to θ˜ and as, by the first-order conditions, ∂ℓ(θ)/∂θ
˜ = 0, the
derivative ∂ℓ(θ)/∂θ evaluated at θ should also be close to the null vector.
The test statistic is a measure of the distance ∂ℓ(θ)/∂θ is from the null
vector.

Wald Test Statistic


& '−1
˜ ′
∂h(θ) ˜
˜
T2 = nh(θ) ′ ˜ ∂h(θ)
I −1 (θ) ˜
h(θ).
∂θ ∂θ
Note that the Wald test statistic uses the (unconstrained) MLE of θ. Essen-
√ ˜ under H , the
tially, it is based on the asymptotic distribution of nh(θ) 0
˜
statistic itself measuring the distance h(θ) is from the null vector.

Likelihood Ratio Test Statistic  


˜ − ℓ(θ) .
T3 = 2 ℓ(θ)

Note that the likelihood ratio test (LRT) statistic uses both the unconstrained
MLE θ˜ and the constrained MLE θ. If H0 is indeed true, it should not matter
6.3 Summary of Classical Statistical Procedures 221

whether we impose it or not, so ℓ(θ) ˜ should be approximately the same as


ℓ(θ). The test statistic T3 measures the distance between ℓ(θ) ˜ and ℓ(θ).
All three test statistics are asymptotically equivalent in the sense that,
under H0 , they all have the same limiting χ2 distribution and under HA , with
local alternatives, they have the same limiting noncentral χ2 distribution.
Usually, imposing the null hypothesis on our model leads to a simpler
statistical model, and thus constrained MLEs θ are more obtainable than
the θ˜ MLEs. For this reason, the LMT statistic is often the easiest statistic to
form. Certainly, it is the one that has been most widely used in econometrics.

6.3.3 Nuisance Parameters


Let us now partition θ into θ = (α ′ β ′ ) ′ , where α is an k1 ×1 vector of param-
eters of primary interest and β is an k2 ×1 vector of nuisance parameters,
k1 + k2 = k. The terms used here do not imply that the parameters in β are
unimportant to our statistical model. Rather, they indicate that the purpose
of our analysis is to make statistical inference about the parameters in α
instead of those in β.
In this situation, two approaches can be taken. First, we can derive the
information matrix I (θ) and the Cramer-Rao lower bound I −1 (θ).
Let
   αα αβ 
Iαα Iαβ −1 I I
I (θ) = , I (θ) =
Iβα Iββ I βα I ββ
be these matrices partitioned according to our partition of θ. As far as
α is concerned, we can now work with Iαα and I αα in place of I (θ) and
I −1 (θ), respectively. For example, I αα is the Cramer-Rao lower bound for
the asymptotic covariance matrix of a consistent estimator of α. If α̃ is the
MLE of α, then
√ d
n(α̃ − α) → N (0, I αα ),
and so on.
A particular null hypothesis that has particular relevance for us is
H0 : α = 0
against
HA : α = 0.
Under this first approach, the classical test statistics for this null hypothesis
would be the following test statistics.
222 Applications

Langrangian Test Statistic


1 ∂ℓ(θ) ′ αα ∂ℓ(θ)
T1 = I (θ) .
n ∂α ∂α
Wald Test Statistic
˜ −1 α̃.
T2 = nα̃ ′ I αα (θ)

Likelihood Ration Test Statistic


˜ − ℓ(θ) .
 
T3 = 2 ℓ(θ)
Under H0 , all three test statistics would have a limiting χ2 distribution with
k1 degrees of freedom, and the nature of the tests insists that we use the
upper tail of this distribution to find the appropriate critical region.
The second approach is to work with the concentrated log-likelihood
function. Here, we undertake a stepwise maximization of the log-likelihood
function. We first maximize ℓ(θ) with respect to the nuisance parameters β
to obtain β = β(α). The vector β is then placed back in the log-likelihood
function to obtain
 
ℓ(α) = ℓ α, β(α) .

The function ℓ(α) is called the concentrated likelihood function. Our


analysis can now be reworked with ℓ(α) in place of ℓ(θ).
For example, let
1 ∂ℓ
I = −p lim
n ∂α∂α ′
and let α̂ be any consistent estimator of α such that
√ d  
n(α̂ − α) → N 0, Vα .
Then, Vα ≥ I −1 in the sense that their difference is a positive-semidefinite
matrix. If α̃ is the MLE of α, then α̃ is obtained from
∂ℓ
=0
∂α
√ d
n(α̃ − α) → N (0, I −1 ),
and so on. As far as test procedures go for the null hypothesis H0 : α = 0,
under this second approach we rewrite the test statistics by using ℓ and I in
place of ℓ(θ) and I (θ), respectively. In our application in Sections 6.5 and
6.4 Matrix Calculus and Classical Statistical Procedures 223

6.6, we use the second approach and form the concentrated log-likelihood
function for our models.

6.4 Matrix Calculus and Classical Statistical Procedures


Classical statistical procedures involve much differentiation. The score vec-
tor ∂ℓ/∂θ, the Hessian matrix ∂ 2 ℓ/∂θ∂θ, and ∂h/∂θ all involve working out
partial derivatives and it is at this stage that difficulties can arise in applying
these procedures to econometric models. As noted in the introduction, the
log-likelihood function ℓ(θ) for most econometric models is a complicated
function and it is no trivial matter to obtain the derivatives required for
our application. Although in some cases it can be done (see, for example,
Rothenberg and Lenders (1964)), what often happens when one attempts
to do the differentiation using ordinary calculus is that one is confronted
with a hopeless mess. It is here that matrix calculus comes into its own.
In most econometric models, we can partition θ, the vector containing
the parameters of the model, as θ = (δ ′ v ′ ) ′ where v = vech and  is a
covariance matrix associated with the model. Usually, though not always, the
vector v represents the nuisance parameters of the model and the primary
aim of our analysis is to make statistical inference about the parameters
in δ. Nuisance parameters or not, v represents a problem in that the log
likelihood function is never expressed in terms of v. Rather, it is written up
in terms of .
The question is then how do we form ∂ℓ/∂v. The results of the last section
of Chapter 5 present us with a method of doing this. As  is symmetric and
assuming it is G×G, then from Theorem 5.11 of Chapter 5, we have that
∂ℓ φℓ
= DG DG′ (6.1)
∂vec φvec
recalling that φℓ/φvec is the derivative obtained when we ignore the fact
that  is symmetric. But from Equation 5.51 of the same chapter,
∂ℓ ∂ℓ
= LG NG (6.2)
∂v ∂vec
so, combining Equations 6.1 and 6.2, we have that
∂ℓ φℓ φℓ
= LG NG DG DG′ = DG′ (6.3)
∂v φvec φvec
as by Equation 3.55 of Chapter 3, LG NG DG = I 1 G(G+1) .
2
224 Applications

Our method then is to differentiate the log-likelihood function with


respect to vec ignoring the fact that  is symmetric. Then, premultiply
the result obtained by DG′ .
Note from theorem of Chapter 5

∂vec
= DG′
∂v
so we would write Equation 6.3 as

∂ℓ ∂vec φℓ
=
∂v ∂v φvec

which resembles a backward chain rule. This is the approach taken by


Turkington (2005) in forming matrix derivatives associated with econo-
metric models.
Consider now an p×1 vector x = (x1 · · · x p ) ′ whose elements are
differentiable functions of v but the vector itself is expressed in terms of .
Then, by Equation (6.3)
   
∂x ∂x1 ∂x p ′ φx1 ′
φx p
= ··· = DG · · · DG
∂v ∂v ∂v φv φv
 
φx1 φx p φx
= DG′ ··· = DG′ . (6.4)
φv φv φv

Using Equation 6.4 allows us to form the Hessian matrix of ℓ(θ). We have
 
∂ ∂ℓ φ (∂ℓ/∂δ)
= DG′
∂v ∂δ φvec
so
 ′ ′
∂ 2ℓ
  
∂ ∂ℓ φ (∂ℓ/∂δ)
= = DG
∂δ∂v ∂v ∂δ φvec

and
∂ 2ℓ
   
∂ ∂ℓ ′ φ ′ φℓ
= = DG DG
∂v∂v ∂v ∂v φvec φvec
φ2 ℓ
 
φ φℓ
= DG′ DG = DG′ D (6.5)
φvec φvec φvecφvec G

where in our working we have used Theorem 5.1 of Chapter 5.


6.4 Matrix Calculus and Classical Statistical Procedures 225

The Hessian matrix of ℓ(θ) is then


∂ 2ℓ φ (∂ℓ/∂δ) ′
⎛   ⎞
DG ⎟

H (θ) = ⎜ ∂δ∂δ φvec ⎟.
2

′ φ(∂ℓ/∂δ) ′ φℓ ⎠
DG DG DG
φvec φvecφvec
As far as the asymptotic information matrix is concerned, if we assume the
underlying distribution is the multivariate normal distribution, we know
that we can write this matrix as
1 ∂ 2ℓ 1
I (θ) = −p lim ′
= −p lim H (θ).
n ∂θ∂θ n
If we let
1 ∂ 2ℓ
 
1 φ (∂ℓ/∂δ)
A = −p lim , B = −p lim
n ∂δ∂δ ′ n φvec

1 φ2 ℓ
C = −p lim .
n φvecφvec
Then, we can write the information matrix as
B ′ DG
 
A
I (θ) = .
DG′ B DG′ CDG

Often, see for example Turkington (2005), the matrices B and C will be
Kronecker products or at least involve Kronecker products, thus justifying
our study in Chapter 3 of how the duplication matric DG interacts with
Kronecker products. In fact, in many econometric models C = 21 ( ⊗ ).
Consider then the case where

C = (E ⊗ E )

where we assume that E is nonsingular. Then, we saw in Equation 3.58 of


Section 3.4 of Chapter 3 that

(DG′ (E ⊗ E )DG )−1 = LG NG (E −1 ⊗ E −1 )NG LG′ .

In some statistical and econometric models, B is the null matrix. In this


special case, the information matrix is
 −1 
A  −1 O −1 
I −1 (θ) =
O LG NG E ⊗ E NG LG′
226 Applications

thus justifying our study in Section 3.2.2 of Chapter 3 of how the elimination
matrix LG NG interacts with Kronecker products. In the case where B is not
the null matrix, then
 
−1 G S
I (θ) =
S′ J

where

G = (A − B ′ DG LG NG (E −1 ⊗ E −1 )NG LG′ DG′ B)−1


= (A − B ′ NG (E −1 ⊗ E −1 )NG B)−1

as in Section 3.2.2, we saw that DG LG NG = NG , S = −GB ′ DG LG NG (E −1 ⊗


E −1 )NG LG′ = −GB ′ NG (E −1 ⊗ E −1 )NG LG′
and

J = LG NG (E −1 ⊗ E −1 )NG LG′ − LG NG (E −1 ⊗ E −1 )NG S.

Again, we see that application of classical statistical procedures justifies


the study, in some detail of NG (A ⊗ B)NG , LG NG (A ⊗ B)NG LG′ and
DG (A ⊗ B)DG′ as was conducted in Sections 3.2 and 3.3 of Chapter 3.

6.5 Sampling from a Multivariate Normal Distribution


A simple example shows how our analysis works in practice. The matrix
calculus rules used in this example are found by taking the transposes of the
equivalent rules reported in Section 4.3 of Chapter 4. We consider a sample
of size n from the G dimensional distribution of a random vector y with
mean vector µ and a positive definite covariance matrix . The parameters
of this model are θ = (µ′ v ′ ) ′ where v = vech and the log-likelihood
function, apart from a constant is
n
1 1  ′
ℓ(θ) = n log || − yi − µ  −1 (yi − µ)
2 2 i=1
1 1
= n log || − tr  −1 Z
2 2
with
n

Z= (yi − µ)(yi − µ) ′ .
i=1
6.5 Sampling from a Multivariate Normal Distribution 227

The Score Vector


Now, using Theorem 5.1 of Chapter 5
n n
∂ℓ 1 ∂ 
=− (yi − µ) ′  −1 (yi − µ) =  −1 (yi − µ). (6.6)
∂µ 2 i=1 ∂µ i=1

The next derivative in the score vector, namely ∂ℓ/∂v, uses the technique
explained in the previous section. Consider
φℓ 1 φ log || 1 φ
=− n − tr  −1 Z.
φvec 2 φvec 2 φvec
Now, from Equation 4.4 of Chapter 4
φ log ||
= vec −1
φvec
and using the backward chain rule together with Equations 4.5 and 4.16 of
Chapter 4
φ tr  −1 Z φvec −1 φ tr  −1 Z
= = −( −1 ⊗  −1 )vecZ
φvec φvec φvec −1
so
φℓ 1 1
= − n vec −1 + ( −1 ⊗  −1 )vecZ
φvec 2 2
1 −1
= ( ⊗  −1 )vec (Z − n)
2
and
∂ℓ 1
= DG′ ( −1 ⊗  −1 )vec(Z − n). (6.7)
∂v 2
Together, Equations 6.6 and 6.7 give the components of the score vector
∂ℓ ∂ℓ ′ ′
 ′ 
∂ℓ
= .
∂θ ∂µ ∂v

The Hessian Matrix


The first component of this matrix is
  n
∂ ∂ℓ  ∂µ
= − −1 = −n −1 ,
∂µ ∂µ i=1
∂µ
228 Applications

and using the backward chain rule, we can write

φvec −1 φvec −1 a
 
φ ∂ℓ
=
φvec ∂µ φvec φvec −1
n

with a = (yi − µ).
i=1
But using Theorem 5.1 of Chapter 5,

φvec −1 a φ(a ′ ⊗ IG )vec −1


= = a ⊗ IG ,
φvec −1 φvec −1
so the second component of the Hessian matrix is
∂ 2ℓ
= −( −1 a ⊗  −1 ) ′ DG = −(a ′  −1 ⊗  −1 )DG .
∂µ∂v
The last component of the Hessian matrix is computed by first considering

n φvec −1 1 φvec −1 Z −1
 
φ φℓ
=− +
φvec φvec 2 φvec 2 φvec
−1 
1 φvec −1 Z −1

φvec nIG2
=− − . (6.8)
φvec 2 2 φvec −1
But from Equation 4.15 of Chapter 4,
φvec −1 Z −1
=  −1 Z ⊗ IG + IG ⊗  −1 Z (6.9)
φvec −1
so from Equations 6.8 and 6.9,
∂ 2ℓ ( −1 Z ⊗ IG ) (IG ⊗  −1 Z )
 
nIG2
= DG′ ( −1 ⊗  −1 ) − − DG .
∂v∂v 2 2 2
The Information Matrix
From basic statistics,
1 1
E (a) = 0 E (Z ) = ,
n n
so the information matrix is
 −1
 
1 O
I (θ) = − lim E (H (θ)) = 1 ′ .
n→∞ n O D ( −1 ⊗  −1 )DG
2 G
6.6 The Limited Information Model 229

The Cramer-Rao Lower Bound


Inverting the information matrix gives the Cramer-Rao lower bound
 
−1  O
I (θ) = .
O 2LG NG ( ⊗ ) NG LG′
These results were derived by Magnus and Neudecker (1980) though their
approach using differentials to obtain the derivatives.

6.6 The Limited Information Model

6.6.1 The Model and the Log-Likelihood Function


The limited information model is the statistical model behind a single
behavioural economic equation. In this model, it is assumed that all we
have specified is this one equation, which presumably belongs to a larger
linear economic model. The other equations in this model are not, however,
available to us. Instead, what is given is that certain exogenous or pre-
determined variables enter the reduced forms of the endogenous variables
on the right-hand side of our specified equation.
We write the limited information model as
y1 = Y1 β1 + X1 γ1 + u1 = H1 δ1 + u1
Y1 = X 1 + V1 (6.10)
where Y1 is an n×G1 matrix of observations of G1 current endogenous
variables, X1 is an n×K1 matrix of observations on K1 predetermined vari-
ables appearing in the equation, X is the n×K matrix of observations on
all the predetermined variables appearing in the system, H1 = (Y1 X1 ) and
δ1 = (β1′ γ1′ ) ′ . The second equation Y1 = X 1 + V1 is the reduced-form
equation for Y1 .
We assume the rows of (u1 V1 ) are statistically, independently, and iden-
tically normally distributed random vectors with mean 0 and covariance
matrix
 2
σ η′

= .
η 1
as always let v = vech.
Alternatively, taking the vec of both sides of Equation 6.10, we can write the
model as
y1 = H1 δ1 + u1
y2 = (IG ⊗ X )π1 + v1
1
230 Applications

where y2 = vecY1 , π1 = vec1 , and v1 = vecV1 . Using this notation, we can


then write the model more succinctly as
y = H δ + u, (6.11)
 ′  ′
where y = (y1′ y2′ ) ′ , δ = δ1′ π1′ , u = u1′ v1′ , and
 
H1 0
H= .
0 IG ⊗ X
1

Under our assumption u has a multivariate normal distribution with mean


0 and covariance matrix ψ =  ⊗ In , so the probability density function of
u is
 
1 1
f (u) = (2π)−n det ψ− 2 exp − u ′ ψ−1 u
2
 
−n − n2 1 ′ −1
= (2π) (det ) exp − u ψ u .
2
It follows that the probability density function of y is
 
n 1
g(y) = |det J| (2π)−n (det )− 2 exp − (y − H δ) ′ ψ(y − H δ)
2
where J is the Jacobian matrix ∂u/∂y. But from Equation 6.11, ∂u/∂y is
the identity matrix so det J = 1 and the log-likelihood function, ignoring a
constant, is
n 1
ℓ(v, δ) = − log det  − tr  −1U ′U , (6.12)
2 2
where in this function U is set equal to (y1 − H δ1 Y1 − X 1 ).

6.6.2 Iterative Interpretations of Limited Information Maximum
Likelihood Estimators³
It has been known for some time that mathematical manipulation of the
first order conditions for the maximization of the log-likelihood function
associated with an econometric model leads to an iterative interpretation of
the maximum-likelihood estimator (see for example Byron (1978), Bowden
and Turkington (1990), Durbin (1988), Hausman (1975), and Turkington
(2002)). This interpretation is couched in terms of the econometric estima-
tor developed for the parameters of primary interest of the model and is
³ I should like to acknowledge my research assistant Stephane Verani for his excellent
programming used in this section and the next.

often used as a justification for the econometric estimator. The economet-


ric estimator can thus be viewed as the first step in an iterative procedure
that leads to the maximum likelihood estimator. In terms of second order
asymptotic efficiency, we know that for some cases at least the maximum
likelihood estimator dominates the econometric estimator (see for example
Efron (1975) and Fuller (1977)).
But what seems to have been overlooked in this literature is the iterative
procedure itself. Several questions can be asked of such a procedure: How
quickly does it converge from the econometric estimator to the maximum
likelihood estimator? Does it converge if we start the iterative process with
estimates obtained from inconsistent estimators, or if we choose any value
as the starting point? In a statistical model, which is complicated in the sense
that it has several sets of nuisance parameters, should we work with iterative
processes derived from the log-likelihood function or from concentrated
log-likelihood functions? That is, do further mathematical manipulations
lead to more efficient iterative procedures?
In this section, as another application, we seek to investigate these matters
using the limited information model, which is suited for this study as it has
two sets of nuisance parameters. One then has the choice of deriving iterative
procedures for the maximum likelihood estimators of the parameters of
primary interest from the log-likelihood function or from two concentrated
log-likelihood functions. The data used in this study is that associated with
Klein’s 1950 model. Klein’s model and data are readily available in textbooks
such as Theil (1971) or Greene (2010) and in the econometric package
Gretl.4
In the log-likelihood function obtained in the previous section, the
parameters of primary interest are contained in the vector δ1 . As far as
classical statistics is concerned, what makes this function difficult to handle
mathematically is that it contains two sets of nuisance parameters: those
contained in the vector π1 , which are the reduced form parameters of the
right-hand current endogenous variables, and those contained in the vector
v, which are the unknown parameters in the covariance matrix Ω. Two sets
of nuisance parameters mean that we are presented with a choice in the way
we obtain the maximum likelihood estimator for the parameters of primary
interest δ1 :
1. We can work with the first order conditions arising from the maxi-
mization of the log-likelihood function ℓ(v, δ).
4
Gretl is an open source econometric package developed by Allen Cottrell. It is available
free of charge at http://gretl.sourceforge.net/.

2. We can use a step-wise maximization procedure, where we first


maximize ℓ(v, δ) with respect to the nuisance parameters v and
form the concentrated log likelihood function ℓ∗ (δ), concentrated
in δ = (δ1′ π1′ ) ′ . We then work with the first-order conditions for the
maximization of this function.
3. Finally, we can start with the concentrated log-likelihood function
ℓ∗ (δ) and maximize this first with respect to the second set of nuisance
parameters π1 to form the concentrated likelihood function ℓ∗∗ (δ1 )
concentrated in the parameters of primary interest δ1 . We then work
with the first-order conditions for the maximization of this function.5

All three procedures led to iterative processes, which can be interpreted


in terms of econometric estimators. We now deal with each procedure
in turn, again using the rules reported in Chapter 4 and the method for
differentiating a log-likelihood function developed in Section 6.3.

Limited Information Maximum Likelihood Estimator As an Iterative
Generalized Least Squares Estimator

The simplest iterative procedure is obtained from the first order conditions
\[
\frac{\partial\ell}{\partial v} = 0 \quad\text{and}\quad \frac{\partial\ell}{\partial\delta} = 0.
\]
From Section 6.3, we know that
\[
\frac{\partial\ell}{\partial v} = D_G'\,\frac{\partial\ell}{\partial\,\mathrm{vec}\,\Omega},
\]
and using the log-likelihood function written as in Equation 6.12, we have
\[
\frac{\partial\ell}{\partial\,\mathrm{vec}\,\Omega}
= -\frac{n}{2}\,\frac{\partial\log\det\Omega}{\partial\,\mathrm{vec}\,\Omega}
  -\frac{1}{2}\,\frac{\partial\,\mathrm{tr}\,\Omega^{-1}U'U}{\partial\,\mathrm{vec}\,\Omega}.
\]
Now,
\[
\frac{\partial\log\det\Omega}{\partial\,\mathrm{vec}\,\Omega}
= \frac{1}{\det\Omega}\,\frac{\partial\det\Omega}{\partial\,\mathrm{vec}\,\Omega}
= \mathrm{vec}\,\Omega^{-1}
\]
by Equation 4.4 of Chapter 4, and using the backward chain rule given by
Theorem 5.2 of Chapter 5, we have
\[
\frac{\partial\,\mathrm{tr}\,\Omega^{-1}U'U}{\partial\,\mathrm{vec}\,\Omega}
= \frac{\partial\,\mathrm{vec}\,\Omega^{-1}}{\partial\,\mathrm{vec}\,\Omega}\,
  \frac{\partial\,\mathrm{tr}\,\Omega^{-1}U'U}{\partial\,\mathrm{vec}\,\Omega^{-1}}
= -\left(\Omega^{-1}\otimes\Omega^{-1}\right)\mathrm{vec}\,U'U
= -\mathrm{vec}\,\Omega^{-1}U'U\Omega^{-1},
\]
by Equations 4.5 and 4.16 of Chapter 4. It follows that
\[
\frac{\partial\ell}{\partial v}
= \frac{D_G'}{2}\left(\mathrm{vec}\,\Omega^{-1}U'U\Omega^{-1} - n\,\mathrm{vec}\,\Omega^{-1}\right),
\]
which equals the null vector only if
\[
\Omega = \tilde{\Omega} = \frac{U'U}{n}. \tag{6.13}
\]

⁵ There is one further possibility. First, maximize the log-likelihood function ℓ(v, δ) with
respect to the nuisance parameters π1 and form the concentrated log-likelihood function
ℓ(v, δ1) concentrated in v and δ1. An iterative process can then be derived from the first order
conditions of the maximization of this function. However, this procedure did not easily
lend itself to an interpretation in terms of known estimators and for this reason was not
included in this study.
The second derivative is, by the backward chain rule and Theorem 5.1 of
Chapter 5,
\[
\frac{\partial\ell}{\partial\delta}
= -\frac{1}{2}\,\frac{\partial u}{\partial\delta}\,
  \frac{\partial\,u'\left(\Omega^{-1}\otimes I_n\right)u}{\partial u}
= H'\left(\Omega^{-1}\otimes I_n\right)u.
\]
Setting this derivative to the null vector gives
\[
H'\left(\Omega^{-1}\otimes I_n\right)(y - H\delta) = 0.
\]
Solving for δ gives an iterative interpretation of the limited information
maximum likelihood (LIML) estimator δ̃ as a generalized least squares
estimator, namely
\[
\tilde{\delta} = \left(H'\left(\Omega^{-1}\otimes I_n\right)H\right)^{-1}H'\left(\Omega^{-1}\otimes I_n\right)y. \tag{6.14}
\]
This interpretation of the LIML estimator was first obtained by Pagan
(1979).
Equations 6.13 and 6.14 form the basis of our first iterative procedure, which
is outlined as follows:

Iterative Procedure 1
1. Apply two-stage least squares (2SLS) (or another consistent estimation
procedure) to y1 = H1 δ1 + u1 to obtain the 2SLSE δˆ1 . Apply ordinary
least squares (OLS) to the reduced form equation Y1 = XΠ1 + V1 and
obtain the OLSE Π̂1. Compute the residual matrices
\[
\hat u_1 = y_1 - H_1\hat\delta_1, \qquad \hat V_1 = Y_1 - X\hat\Pi_1.
\]

2. Form the matrices
\[
\hat U = (\hat u_1\;\hat V_1) \quad\text{and}\quad \hat\Omega = \frac{\hat U'\hat U}{n}.
\]
3. Compute the GLSE
\[
\hat{\hat\delta} = \left(H'\left(\hat\Omega^{-1}\otimes I_n\right)H\right)^{-1}H'\left(\hat\Omega^{-1}\otimes I_n\right)y,
\]
and compute \(\hat{\hat u} = y - H\hat{\hat\delta}\) and \(\hat{\hat U} = \mathrm{rvec}_n\,\hat{\hat u}\).
4. Repeat steps 2 and 3 with \(\hat{\hat U}\) in place of \(\hat U\).
5. Continue in this manner until convergence is reached. The LIML
estimate of δ1 is then the first component of the estimate thus obtained
for δ.
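
A minimal NumPy sketch of Iterative Procedure 1 is given below (ours, not the GAUSS program used later in this chapter; the function and argument names are illustrative). It assumes that the stacked vector y, the block matrix H of Equation 6.11 and the residual matrix Û from step 1 are supplied.

```python
# Iterative Procedure 1 as a sketch: iterated GLS starting from 2SLS/OLS residuals.
import numpy as np

def iterate_gls(y, H, U_hat, tol=1e-6, max_iter=10_000):
    n = U_hat.shape[0]
    delta_old = np.full(H.shape[1], np.inf)
    for it in range(max_iter):
        Omega = U_hat.T @ U_hat / n                         # step 2
        W = np.kron(np.linalg.inv(Omega), np.eye(n))        # Omega^{-1} kron I_n
        delta = np.linalg.solve(H.T @ W @ H, H.T @ W @ y)   # step 3: GLSE
        u = y - H @ delta
        U_hat = u.reshape(-1, n).T                          # rvec_n of the residual vector
        if np.max(np.abs(delta - delta_old)) < tol:         # step 5: convergence
            return delta, it + 1
        delta_old = delta
    return delta, max_iter
```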

LIML Estimator As an Iterative OLS Estimator


We have seen that maximization of the log-likelihood function ℓ(v, δ) with
respect to the nuisance parameters v gives Ω = Ω̃ = U′U/n. If we substitute
this into the log-likelihood function as given by Equation 6.12, we get the
concentrated log-likelihood function, concentrated in δ = (δ1′ π1′)′. This
function is, apart from a constant,
\[
\ell^{*}(\delta) = -\frac{n}{2}\log\det U'U.
\]

The first order condition for the maximization of this function is ∂ℓ*/∂δ = 0,
and our iterative process is derived from the two components of this
equation. Using the backward chain rule and Equation 4.5 of Chapter 4, we have
\[
\frac{\partial\ell^{*}}{\partial\delta}
= -\frac{n}{2}\,\frac{\partial u}{\partial\delta}\,
  \frac{\partial\log\det U'U}{\partial u}
= nH'\,\mathrm{vec}\,U(U'U)^{-1}
= nH'\left((U'U)^{-1}\otimes I\right)u.
\]

From the inverse of a partitioned matrix, we obtain
\[
(U'U)^{-1} = \begin{pmatrix}
(u_1'M_{V_1}u_1)^{-1} & -(u_1'M_{V_1}u_1)^{-1}u_1'V_1(V_1'V_1)^{-1} \\[2pt]
-(V_1'M_{u_1}V_1)^{-1}V_1'u_1(u_1'u_1)^{-1} & (V_1'M_{u_1}V_1)^{-1}
\end{pmatrix},
\]
where \(M_{V_1} = I_n - V_1(V_1'V_1)^{-1}V_1'\) and \(M_{u_1} = I_n - u_1(u_1'u_1)^{-1}u_1'\). The first
component of ∂ℓ*/∂δ can then be written as

\[
\frac{\partial\ell^{*}}{\partial\delta_1}
= \frac{n}{u_1'M_{V_1}u_1}\left(H_1'u_1 - H_1'\left(u_1'V_1(V_1'V_1)^{-1}\otimes I_n\right)v_1\right)
= n\,\frac{H_1'M_{V_1}u_1}{u_1'M_{V_1}u_1},
\]
which is equal to the null vector when
\[
H_1'M_{V_1}u_1 = 0.
\]
Solving for δ1 gives
\[
\tilde{\delta}_1 = \left(H_1'M_{V_1}H_1\right)^{-1}H_1'M_{V_1}y_1. \tag{6.15}
\]

In a similar manner, the second component of ∂ℓ*/∂δ can be written as
\[
\frac{\partial\ell^{*}}{\partial\pi_1}
= n\,\mathrm{vec}\,X'M_{u_1}V_1\left(V_1'M_{u_1}V_1\right)^{-1},
\]
which is equal to the null vector when
\[
X'M_{u_1}V_1 = 0.
\]
Solving gives
\[
\tilde{\Pi}_1 = \left(X'M_{u_1}X\right)^{-1}X'M_{u_1}Y_1. \tag{6.16}
\]

Equations 6.15 and 6.16 form the basis of our next iterative process. Before
we outline this process, it pays us to give an interpretation to the iterative
estimators portrayed in these equations.
We have assumed that the rows of U = (u1 V1) are statistically independent,
identically normally distributed random vectors with mean 0 and
covariance matrix
\[
\Omega = \begin{pmatrix} \sigma^2 & \eta' \\ \eta & \Omega_1 \end{pmatrix}.
\]
It follows that we can write
\[
u_1 = V_1\,\frac{\eta}{\sigma^2} + \omega,
\]
where ω is a random vector whose elements are independent of those of V1.
Similarly, we can write
\[
V_1 = u_1\,\eta'\Omega_1^{-1} + W,
\]

where the elements of u1 are independent of those of W. Consider the


artificial equation,
\[
y_1 = H_1\delta_1 + V_1\,\frac{\eta}{\sigma^2} + \omega,
\]
and suppose for the moment we assume V1 is known. Then, applying OLS
to this equation gives

\[
\tilde{\delta}_1 = \left(H_1'M_{V_1}H_1\right)^{-1}H_1'M_{V_1}y_1. \tag{6.17}
\]
In a similar manner, write the second equation as
\[
Y_1 = X\Pi_1 + u_1\,\eta'\Omega_1^{-1} + W. \tag{6.18}
\]
Again, assuming that u1 is known and applying OLS to this equation gives
\[
\tilde{\Pi}_1 = \left(X'M_{u_1}X\right)^{-1}X'M_{u_1}Y_1.
\]

What maximum likelihood estimation appears to do is to take account of
the dependence of the disturbance terms u1 and V1, in the way previously
outlined, and apply OLS. Of course, this interpretation is iterative, as we
have not really solved for Π1 and δ1: Π̃1 still depends on δ1 through u1,
and δ̃1 still depends on Π1 through V1. Moreover, Equations 6.17 and 6.18
are artificial in that we have no observations on u1 and V1 (if we did, we
would not have a statistical problem!). But our results clearly give rise to
the following iterative process:

Iterative Procedure 2

1. Apply 2SLS (or some other consistent estimation procedure) to
y1 = H1δ1 + u1 to obtain the estimate δ̂1.
2. Form the residual vector
\[
\hat u_1 = y_1 - H_1\hat\delta_1
\]
and
\[
\hat M_{u_1} = I_n - \hat u_1\left(\hat u_1'\hat u_1\right)^{-1}\hat u_1'.
\]
3. Form
\[
\hat\Pi_1 = \left(X'\hat M_{u_1}X\right)^{-1}X'\hat M_{u_1}Y_1.
\]

4. Form the residual matrix
\[
\hat V_1 = Y_1 - X\hat\Pi_1.
\]
5. Obtain
\[
\tilde\delta_1 = \left(H_1'\hat M_{V_1}H_1\right)^{-1}H_1'\hat M_{V_1}y_1.
\]
6. Repeat steps 2, 3, 4, and 5 with δ̃1 in place of the original estimate δ̂1.
7. Continue in this manner until convergence is obtained.
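
The following sketch (again ours, with illustrative names) implements Iterative Procedure 2; it assumes y1, H1, X, Y1 and a consistent starting estimate δ̂1 are supplied.

```python
# Iterative Procedure 2 as a sketch: alternating OLS on the two artificial equations.
import numpy as np

def annihilator(A):
    """M_A = I - A (A'A)^{-1} A'."""
    n = A.shape[0]
    return np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)

def iterate_ols(y1, H1, X, Y1, delta1, tol=1e-6, max_iter=10_000):
    for it in range(max_iter):
        u1 = (y1 - H1 @ delta1).reshape(-1, 1)                     # step 2
        Mu1 = annihilator(u1)
        Pi1 = np.linalg.solve(X.T @ Mu1 @ X, X.T @ Mu1 @ Y1)       # step 3
        V1 = Y1 - X @ Pi1                                          # step 4
        MV1 = annihilator(V1)
        new = np.linalg.solve(H1.T @ MV1 @ H1, H1.T @ MV1 @ y1)    # step 5
        if np.max(np.abs(new - delta1)) < tol:
            return new, it + 1
        delta1 = new                                               # step 6
    return delta1, max_iter
```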

LIML Estimator As an Iterative Instrumental Variable Estimator


In obtaining our last iterative process, we conducted a stepwise maximization
procedure where we first maximized the log-likelihood function ℓ(v, δ)
with respect to the nuisance parameter v = vech Ω and obtained the concentrated
log-likelihood function ℓ*(δ). We then maximized this function with
respect to δ. But if our statistical interest is centred on δ1 , then π1 should
really be considered as a vector of nuisance parameters as well. Suppose now
we continue with stepwise maximization and maximize the concentrated
log-likelihood function ℓ∗ (δ) with respect to this second vector of nuisance
parameters π1 . We then form the concentrated log-likelihood function
ℓ∗∗ (δ1 ) concentrated in the parameters of primary interest. In what follows,
we show that the first order conditions of maximizing this function with
respect to δ1 leads to an iterative instrumental variable interpretation of the
LIML estimator.
We have seen in the previous subsection that maximizing ℓ*(δ) with
respect to π1 gives
\[
\tilde\Pi_1 = \left(X'M_{u_1}X\right)^{-1}X'M_{u_1}Y_1,
\]
and hence
\[
\tilde V_1 = \left(I_n - X\left(X'M_{u_1}X\right)^{-1}X'M_{u_1}\right)Y_1.
\]
It follows that
\[
\ell^{**}(\delta_1) = -\frac{n}{2}\log\det\tilde U'\tilde U,
\]
where Ũ = (u1 Ṽ1).
Before using matrix calculus to obtain the derivative ∂ℓ**/∂δ1, it pays us
to simplify this expression as much as possible. To this end, write
\[
\det\tilde U'\tilde U
= u_1'u_1\,\det\left[\,Y_1'M_{u_1}\left(I_n - M_{u_1}X\left(X'M_{u_1}X\right)^{-1}X'M_{u_1}\right)M_{u_1}Y_1\right]. \tag{6.19}
\]

Consider now the artificial regression equation of Y1 on X and u1 given
by Equation 6.18. Let M = In − X(X′X)⁻¹X′. Then, we know that the
residual sum of squares from the regression of \(M_{u_1}Y_1\) on \(M_{u_1}X\) is equal
to the residual sum of squares from the regression of \(MY_1\) on \(Mu_1\). So, the
determinant on the right side of Equation 6.19 is equal to
\[
\det\left[\,Y_1'M\left(I_n - Mu_1\left(u_1'Mu_1\right)^{-1}u_1'M\right)MY_1\right]
= \frac{1}{u_1'Mu_1}\,\det\left[(u_1\;Y_1)'M(u_1\;Y_1)\right].
\]

Furthermore,
\[
(u_1\;Y_1)'M(u_1\;Y_1)
= \begin{pmatrix} 1 & 0' \\ -\beta_1 & I_{G_1} \end{pmatrix}'
  (y_1\;Y_1)'M(y_1\;Y_1)
  \begin{pmatrix} 1 & 0' \\ -\beta_1 & I_{G_1} \end{pmatrix}, \tag{6.20}
\]
where the first partitioned matrix on the right-hand side of Equation 6.20
has a determinant equal to one. Therefore,
\[
\det\left[(u_1\;Y_1)'M(u_1\;Y_1)\right] = \det\left[(y_1\;Y_1)'M(y_1\;Y_1)\right],
\]
which does not depend on δ1 . Thus, the log-likelihood function ℓ∗∗ (δ1 ) can
be written as
\[
\ell^{**}(\delta_1) = k^{*} - \frac{n}{2}\log\frac{u_1'Mu_1}{u_1'u_1}
= k^{*} - \frac{n}{2}\left(\log u_1'Mu_1 - \log u_1'u_1\right),
\]
where k* does not depend on δ1. Obtaining our derivative is now a simple
matter.
Using the backward chain rule,
\[
\frac{\partial\log u_1'Mu_1}{\partial\delta_1}
= \frac{1}{u_1'Mu_1}\,\frac{\partial u_1}{\partial\delta_1}\,\frac{\partial\,u_1'Mu_1}{\partial u_1}
= -\frac{2H_1'Mu_1}{u_1'Mu_1}.
\]
Similarly,
\[
\frac{\partial\log u_1'u_1}{\partial\delta_1} = -\frac{2H_1'u_1}{u_1'u_1},
\]
so
\[
\frac{\partial\ell^{**}}{\partial\delta_1}
= -n\left(\frac{H_1'u_1}{u_1'u_1} - \frac{H_1'Mu_1}{u_1'Mu_1}\right)
= n\,\frac{H_1'Nu_1\,u_1'u_1 - H_1'u_1\,u_1'Nu_1}{u_1'u_1\,u_1'Mu_1},
\]

where N = X (X ′ X )−1 X ′ . The maximum likelihood estimator of δ1 then


satisfies the equation
 ′ 
H1 Nu1 u1′ u1 − H1′ u1 u1′ Nu1
= 0.
u1′ Mu1
We now prove that this equation is the same as
H̃1′ u1 = 0,
where
−1
˜ 1 X1 ).
H̃1 = X X ′ Mu X X ′ Mu H1 = (Ỹ1 X1 ) = (X 

1 1

If this is the case, then the LIML estimator of δ1 has an iterative instrumental
variable interpretation given by
δ˜1 = (H̃1′ H1 )−1 H̃1′ y1 .
To establish our result, we expand \(\left(X'M_{u_1}X\right)^{-1}\) to obtain
\[
X\left(X'M_{u_1}X\right)^{-1}X' = N + \frac{Nu_1u_1'N}{u_1'Mu_1}.
\]
Then, after a little algebra, we find that
\[
X\left(X'M_{u_1}X\right)^{-1}X'M_{u_1} = \frac{N\,u_1'Mu_1 - Nu_1u_1'M}{u_1'Mu_1}.
\]
Thus,
\[
\tilde H_1'u_1 = \frac{H_1'Nu_1\,u_1'Mu_1 - H_1'Mu_1\,u_1'Nu_1}{u_1'Mu_1}
= \frac{H_1'Nu_1\,u_1'u_1 - H_1'u_1\,u_1'Nu_1}{u_1'Mu_1},
\]
as we require.
Our results give rise to a third iterative process for finding the LIML
estimator of δ1 , which is now outlined:

Iterative Procedure 3
1. Apply steps 1, 2, and 3 of Iterative Procedure 2.
2. Form
\[
\hat Y_1 = X\hat\Pi_1
\quad\text{and}\quad
\hat H_1 = (\hat Y_1\;X_1),
\]
and obtain
\[
\delta_1 = \left(\hat H_1'\hat H_1\right)^{-1}\hat H_1'y_1.
\]
3. Repeat steps 1 and 2 with this δ1 in place of the original estimate of δ1.
4. Continue in this manner until convergence is achieved.
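
A corresponding sketch of Iterative Procedure 3 (ours, with illustrative names) differs from the previous one only in the updating step.

```python
# Iterative Procedure 3 as a sketch: iterated IV with instruments (X Pi1_hat, X1).
import numpy as np

def iterate_iv(y1, H1, X, Y1, X1, delta1, tol=1e-6, max_iter=10_000):
    n = X.shape[0]
    for it in range(max_iter):
        u1 = (y1 - H1 @ delta1).reshape(-1, 1)
        Mu1 = np.eye(n) - u1 @ np.linalg.solve(u1.T @ u1, u1.T)    # annihilator of u1
        Pi1 = np.linalg.solve(X.T @ Mu1 @ X, X.T @ Mu1 @ Y1)
        H1_hat = np.hstack([X @ Pi1, X1])                          # (Y1_hat  X1)
        new = np.linalg.solve(H1_hat.T @ H1_hat, H1_hat.T @ y1)    # step 2
        if np.max(np.abs(new - delta1)) < tol:
            return new, it + 1
        delta1 = new
    return delta1, max_iter
```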

6.6.3 Comparison of the Three Iterative Procedures


The model and data used to compare our three procedures are those associ-
ated with the Klein (1950) model. This model consisted of three equations:
a consumption equation, an investment equation, and a wage equation.
For each equation, our three iterative procedures were started up with the
following initial values:
1. The two-stage least squares estimates
2. The ordinary least squares estimates
3. The null vector
4. A vector of ones
5. A vector of arbitrary near values
6. A vector of arbitrary far values.
The arbitrary near values were obtained from a point arbitrarily chosen
from the 95 percent concentration ellipsoid of the parameters obtained
using the LIML estimators. Likewise, the arbitrary far values were obtained
from a point arbitrarily chosen outside the 99 percent concentration ellip-
soid of the parameters obtained from the LIML estimators. Each iterative
procedure was run with each initial value until convergence was achieved
or until it was clear that the procedure was not going to converge. Con-
vergence was defined as taking place when the values obtained from the
procedure all were within 0.000001 of the LIML estimates. No convergence
was defined as taking place when this did not happen after 10,000 iterations.
For each case, the number of iterations was counted for the procedure in
question to move from the initial values to the LIML estimates. The pro-
grams were written in GAUSS. The results are presented in Tables 6.1, 6.2,
and 6.3.
Focusing our attention on Tables 6.1, 6.2, and 6.3, we see that all three
procedures converge when estimates are used as the initial starting values.
Procedure 3 is far more efficient in terms of number of iterations until
convergence than Procedure 2, which in turn is more efficient than Pro-
cedure 1. Moreover, it makes little difference whether the estimates used
are derived from consistent estimators (2SLS) or inconsistent estimators

Table 6.1. Consumption equation

Number of iterations until convergence


Initial values Procedure 1 Procedure 2 Procedure 3

2SLS estimates 829 559 6


OLS estimates 836 565 7
Null vector 712 No Conv. 6
Vector of ones 841 No Conv. 7
Arbitrary near values 825 514 6
Arbitrary far values 872 599 6

(OLS). For the other four sets of initial starting values, Procedure 1 and
Procedure 3 always converge with Procedure 3, again being vastly more
efficient than Procedure 1. Procedure 2 often would not converge. In the
case where it did, it was ranked in efficiency terms between Procedure 1 and
Procedure 3.
The message from these results seems clear. Iterative procedures based
on the first-order conditions derived from the maximization of the log-
likelihood function work, but are inefficient. More efficient iterative proce-
dures can be derived by working with concentrated log-likelihood functions.
But the most efficient procedure arises from the first-order conditions of the
maximization of the log-likelihood function concentrated in the parameters
of primary interest. Moreover, such a procedure seems relatively insensitive
to the initial starting value. Concentrating out a subset of nuisance parame-
ters can lead to a more efficient iterative procedure, but this procedure may
become sensitive to initial starting values. Arbitrary starting values may not
give rise to convergence.

Table 6.2. Investment equation

Number of iterations until convergence


Initial values Procedure 1 Procedure 2 Procedure 3

2SLS estimates 135 75 4


OLS estimates 142 81 5
Null vector 129 86 5
Vector of ones 143 No Conv. 6
Arbitrary near values 139 No Conv. 5
Arbitrary far values 158 87 5

Table 6.3. Wage equation

Number of iterations until convergence


Initial values Procedure 1 Procedure 2 Procedure 3

2SLS estimates 137 33 33


OLS estimates 137 34 34
Null vector 152 No Conv. 45
Vector of ones 167 No Conv. 48
Arbitrary near values 143 No Conv. 37
Arbitrary far values 120 37 37

6.7 The Full Information Model

6.7.1 The Model and the Log-Likelihood Function


The full information model is the statistical model behind a linear economic
model. Assuming this model contains G jointly dependent current endoge-
nous variables and k predetermined variables, we write the ith equation of
the full information model as

yi = Yi βi + Xi γi + ui = Hi δi + ui , i = 1, . . . , G,

where yi is an n × 1 vector of sample observations on a current endogenous


variable, Yi is an n × Gi matrix of observations on the other Gi current
endogenous variables in the ith equation, Xi is an n × ki matrix of ki
predetermined variables in the ith equation, ui is an n×1 vector of random
disturbances, Hi is the n × (Gi + ki ) matrix (Yi Xi ) and δi is the (Gi + ki ) ×1
vector (βi′ γi′ ) ′ . It is assumed that the ui s are normal random vectors with
expectations equal to the null vectors, and that they are contemporaneously
correlated. That is, if ut i and us j are the tth element and sth element of ui
and u j , respectively, then

E(u_{ti} u_{sj}) = σ_{ij}  if t = s,  and  E(u_{ti} u_{sj}) = 0  if t ≠ s.

Writing our model succinctly, we have

y = Hδ + u (6.21)

E(u) = 0,  V(u) = Σ ⊗ I,  u ∼ N(0, Σ ⊗ I),



where y = (y1′ . . . yG′ ) ′ , u = (u1′ . . . uG′ ) ′ , δ = (δ1′ . . . δG′ ) ′ , H is the block


diagonal matrix
\[
H = \begin{pmatrix} H_1 & & O \\ & \ddots & \\ O & & H_G \end{pmatrix},
\]
and Σ is a symmetric, positive definite matrix whose (i, j)th element is σij.


A different way of writing our model is

YB + XΓ = U (6.22)

where Y is the n × G matrix of observations on the G current endogenous


variables, X is the n × k matrix of observations on the k predetermined
variables, B is the G × G matrix of coefficients on the current endogenous
variables in our equations, Γ is a k × G matrix of coefficients of the
predetermined variables in our equations, and U is the n × G matrix
(u1 . . . uG ). Some of the elements of B are known a priori to be one or
zero as yi has a coefficient of one in the ith equation and some current
endogenous variables are excluded from certain equations. Similarly, as
certain predetermined variables are excluded from each equation, some of
the elements of Γ are known to be zero. We assume B is non-singular.
The reduced-form of our model is
\[
Y = -X\Gamma B^{-1} + UB^{-1} = X\Pi + V,
\]
or, taking the vec of both sides,
\[
y = (I_G\otimes X)\pi + v,
\]
where π = vec Π and v = vec V = ((B′)⁻¹ ⊗ In)u.
The unknown parameters of our model are θ = (δ′ v′)′, where v = vech Σ.
Usually, δ is the vector of parameters of primary interest and v is the vector
of nuisance parameters.
The likelihood function is the joint probability function of y. We obtain
this function by starting with the joint probability density function of u.
We have assumed that u ∼ N(0, Σ ⊗ In), so the joint probability density
function of y is
\[
f(y) = |\det J|\,\frac{1}{(2\pi)^{\frac{n}{2}}\left(\det\Sigma\otimes I_n\right)^{\frac{1}{2}}}
       \exp\left(-\tfrac{1}{2}u'\left(\Sigma\otimes I_n\right)^{-1}u\right),
\]

with u set equal to y − Hδ and where
\[
\det J = \det\frac{\partial u}{\partial y}.
\]
Our first application of matrix calculus to this model involves working out
the Jacobian matrix ∂u/∂y. Taking the vec of both sides of U = YB + XΓ,
we have
\[
u = (B'\otimes I_n)y + (\Gamma'\otimes I_n)x,
\]
where u = vec U, y = vec Y, and x = vec X. It follows that
\[
\frac{\partial u}{\partial y} = (B\otimes I_n),
\]
and that
\[
f(y) = \frac{|\det(B\otimes I_n)|}{(2\pi)^{\frac{n}{2}}\left(\det\Sigma\otimes I_n\right)^{\frac{1}{2}}}
       \exp\left(-\tfrac{1}{2}u'\left(\Sigma\otimes I_n\right)^{-1}u\right).
\]
However, from the properties of the determinant of a Kronecker product,
we have det(Σ ⊗ In) = (det Σ)ⁿ, so
\[
f(y) = \frac{|\det B|^{\,n}}{(2\pi)^{\frac{n}{2}}\left(\det\Sigma\right)^{\frac{n}{2}}}
       \exp\left(-\tfrac{1}{2}u'\left(\Sigma^{-1}\otimes I_n\right)u\right),
\]
with u set equal to y − Hδ in this expression. This is the likelihood function
L(θ). The log-likelihood function, apart from a constant, is
\[
\ell(\theta) = n\log|\det B| - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\,u'\left(\Sigma^{-1}\otimes I_n\right)u,
\]
with u set equal to y − H δ.
An alternative way of writing this function is
\[
\ell(\theta) = n\log|\det B| - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\,\mathrm{tr}\,\Sigma^{-1}U'U. \tag{6.23}
\]
Although this function has an extra term in it, namely n log|det B|, when
compared with the corresponding log-likelihood function of the limited
information model as given by Equation (6.12), it is far easier to manipulate
mathematically than the latter. The reason for this is that this log-likelihood
function contains only one set of nuisance parameters, whereas that of
the limited information model contained two sets of nuisance parameters.
However, this means that the log-likelihood function of the full information
model does not lend itself to a variety of iterative processes. In the next

subsection, we develop a single iterative process for the full information
maximum likelihood (FIML) estimator.
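
The two facts used above, vec(YB + XΓ) = (B′ ⊗ In)vec Y + (Γ′ ⊗ In)vec X and det(B ⊗ In) = (det B)ⁿ, are easy to confirm numerically; the following small sketch (ours, with arbitrary random matrices) does exactly that.

```python
# A quick numerical confirmation (ours) of the vec identity and the Jacobian factor.
import numpy as np

rng = np.random.default_rng(1)
n, G, k = 5, 3, 4
Y = rng.standard_normal((n, G))
X = rng.standard_normal((n, k))
B = rng.standard_normal((G, G))
Gamma = rng.standard_normal((k, G))

u = (Y @ B + X @ Gamma).flatten(order="F")            # vec stacks columns
rhs = np.kron(B.T, np.eye(n)) @ Y.flatten(order="F") \
    + np.kron(Gamma.T, np.eye(n)) @ X.flatten(order="F")
print(np.allclose(u, rhs))                             # True

print(np.isclose(np.linalg.det(np.kron(B, np.eye(n))),
                 np.linalg.det(B) ** n))               # True
```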

6.7.2 The Full Information Maximum Likelihood Estimator


As an Iterative Instrumental Variable Estimator
The term n log |det B| of the log-likelihood function given by Equation
6.23 is a function of δ, but not of v. It follows that our derivative of the log-
likelihood function with respect to v is the same as that derived in the limited
information model and that the concentrated log-likelihood function for
the model in hand is, apart from a constant,
\[
\ell^{*}(\delta) = n\log|\det B| - \frac{n}{2}\log\det U'U.
\]
Now,
\[
\frac{\partial\ell^{*}(\delta)}{\partial\delta}
= n\,\frac{\partial\log|\det B|}{\partial\delta}
- \frac{n}{2}\,\frac{\partial\log\det U'U}{\partial\delta}, \tag{6.24}
\]
so our first task is to express the matrix B in terms of the vector δ so we can
evaluate the first derivative on the right-hand side of this equation.
To this end, we write the ith equation of our model as
\[
y_i = Y\bar W_i\beta_i + X\bar T_i\gamma_i + u_i,
\]
where W̄i and T̄i are G×Gi and k×ki selection matrices, respectively, with
the properties that
\[
Y\bar W_i = Y_i, \qquad X\bar T_i = X_i.
\]
Alternatively, we can write
\[
y_i = YW_i\delta_i + XT_i\delta_i + u_i,
\]
where Wi and Ti are the G×(Gi + ki) and k×(Gi + ki) selection matrices
given by Wi = (W̄i  O) and Ti = (O  T̄i), respectively.
Under this notation, we can write
\[
Y = (y_1\cdots y_G) = Y(W_1\delta_1\cdots W_G\delta_G) + X(T_1\delta_1\cdots T_G\delta_G) + U.
\]
It follows then that
\[
B = I_G - (W_1\delta_1\cdots W_G\delta_G), \qquad
\Gamma = -(T_1\delta_1\cdots T_G\delta_G).
\]

Moreover,
\[
\mathrm{vec}\,B = \mathrm{vec}\,I_G - W\delta,
\]
where W is the block diagonal matrix
\[
W = \begin{pmatrix} W_1 & & O \\ & \ddots & \\ O & & W_G \end{pmatrix}.
\]
Returning to our derivative now, clearly
\[
\frac{\partial\,\mathrm{vec}\,B}{\partial\delta} = -W',
\]
and as
\[
\frac{\partial\log|\det B|}{\partial\delta}
= \frac{\partial\,\mathrm{vec}\,B}{\partial\delta}\,
  \frac{\partial\log|\det B|}{\partial\,\mathrm{vec}\,B},
\]
we obtain
\[
\frac{\partial\log|\det B|}{\partial\delta} = -W'\,\mathrm{vec}\,(B^{-1})'. \tag{6.25}
\]
From our work on the limited information model,
\[
\frac{\partial\log\det U'U}{\partial\delta} = -\frac{2H'}{n}\left(\tilde\Sigma^{-1}\otimes I_n\right)u. \tag{6.26}
\]
Now,
\[
W'\,\mathrm{vec}\,(B^{-1})'
= \frac{1}{n}\,W'\left(I_G\otimes V'\right)\left(\tilde\Sigma^{-1}\otimes I_n\right)u. \tag{6.27}
\]

Returning to Equation 6.24 and using Equations 6.25, 6.26, and 6.27, we find
we can write
\[
\frac{\partial\ell^{*}(\delta)}{\partial\delta}
= \left[H' - W'\left(I_G\otimes V'\right)\right]\left(\tilde\Sigma^{-1}\otimes I_n\right)u,
\]
and H − (I_G ⊗ V)W is the block diagonal matrix
\[
\begin{pmatrix} H_1 - VW_1 & & O \\ & \ddots & \\ O & & H_G - VW_G \end{pmatrix}
= \begin{pmatrix} (X\Pi_1\;X_1) & & O \\ & \ddots & \\ O & & (X\Pi_G\;X_G) \end{pmatrix}.
\]

Let H̄i = (XΠi Xi) and H̄ be the block diagonal matrix with H̄i in the
ith block diagonal position. Then,
\[
\frac{\partial\ell^{*}(\delta)}{\partial\delta} = \bar H'\left(\tilde\Sigma^{-1}\otimes I_n\right)u.
\]
Setting this derivative equal to the null vector and solving for δ gives
\[
\tilde\delta = \left(\tilde H'\left(\tilde\Sigma^{-1}\otimes I_n\right)H\right)^{-1}\tilde H'\left(\tilde\Sigma^{-1}\otimes I_n\right)y, \tag{6.28}
\]
where H̃ is the block diagonal matrix with H̃i = (XΠ̃i Xi) in the ith block
diagonal position, Π̃i being the MLE of Πi. Clearly, Equation 6.28 gives
an iterative IVE interpretation of the FIML estimator of δ, where XΠi is
used as an IV for Yi. The iterative process arising from this equation can be
outlined as follows:
1. Run three-stage least squares (or some other consistent estimation
procedure) on y = Hδ + u to obtain the estimate δ̂. Compute the
residual vector û = y − Hδ̂, Û = rvecn û, and Σ̂ = Û′Û/n.
2. Using δ̂, compute B̂, Γ̂, Π̂ = −Γ̂B̂⁻¹, and Ŷi = XΠ̂W̄i.
3. Compute Ĥi = (Ŷi Xi) and Ĥ = diag(Ĥ1, . . . , ĤG).
4. Compute
\[
\delta = \left(\hat H'\left(\hat\Sigma^{-1}\otimes I_n\right)H\right)^{-1}\hat H'\left(\hat\Sigma^{-1}\otimes I_n\right)y.
\]
5. Repeat the process with this δ in place of δ̂.
6. Continue in this manner until convergence is reached.
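
A compact sketch of this scheme is given below (ours, not the author's GAUSS program; all names are illustrative). It assumes the stacked data y, the block-diagonal regressor matrix H of Equation 6.21, the matrix X of all predetermined variables, the per-equation selection matrices W̄i and T̄i, and a consistent starting estimate of δ.

```python
# FIML as iterated IV: a sketch of the procedure above.
import numpy as np

def fiml_iterate(y, H, X, Wbar, Tbar, delta, tol=1e-6, max_iter=10_000):
    n, k = X.shape
    G = len(Wbar)
    sizes = [Wbar[i].shape[1] + Tbar[i].shape[1] for i in range(G)]
    for it in range(max_iter):
        # rebuild B and Gamma from the current delta (step 2)
        B, Gamma = np.eye(G), np.zeros((k, G))
        pos = 0
        for i in range(G):
            Gi, ki = Wbar[i].shape[1], Tbar[i].shape[1]
            B[:, i] -= Wbar[i] @ delta[pos:pos + Gi]         # B = I_G - (W_1 d_1 ...)
            Gamma[:, i] = -(Tbar[i] @ delta[pos + Gi:pos + Gi + ki])
            pos += Gi + ki
        u = y - H @ delta                                    # step 1 residuals
        U = u.reshape(-1, n).T                               # rvec_n of u
        Sigma = U.T @ U / n
        Pi = -Gamma @ np.linalg.inv(B)                       # reduced-form coefficients
        blocks = [np.hstack([X @ Pi @ Wbar[i], X @ Tbar[i]]) for i in range(G)]
        H_hat = np.zeros((n * G, sum(sizes)))                # step 3: block-diagonal instruments
        r = c = 0
        for i, b in enumerate(blocks):
            H_hat[r:r + n, c:c + sizes[i]] = b
            r, c = r + n, c + sizes[i]
        W = np.kron(np.linalg.inv(Sigma), np.eye(n))
        new = np.linalg.solve(H_hat.T @ W @ H, H_hat.T @ W @ y)   # step 4
        if np.max(np.abs(new - delta)) < tol:
            return new, it + 1
        delta = new
    return delta, max_iter
```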

The Performance of Our Iterative Procedure


In this subsection, we use real data to investigate the efficiency of our
iterative procedure. We wish to determine whether the procedure is sensitive
to the initial starting values and if indeed any of these values result in non-
convergence.
As in the previous section, the model and data used in this study are those
of the Klein (1950) model. The following initial starting values were tried:
1. The three-stage least squares estimates
2. The two-stage least squares estimates
3. The ordinary least squares estimates
4. The limited-information maximum likelihood estimates
5. The null vector
6. A vector of arbitrary near values
7. A vector of arbitrary far values

Table 6.4. Full-information model

Number of iterations
Initial values until convergence

3SLS estimates 107


2SLS estimates 108
OLS estimates 109
Null vector 108
Arbitrary near values 106
Arbitrary far values 105

As in the limited information case, arbitrary near values come from an


arbitrary point chosen from the 95 percent concentration ellipsoid of the
parameters of the model obtained using the FIML estimate; arbitrary far
values come from an arbitrary point outside the 99 percent concentration
ellipsoid. Unlike the limited information model, a vector of ones was not
tried as initial values as this violates one of the assumptions of the full
information model, namely that the matrix B of Equation 6.22 is non-
singular. Again, the program was written in GAUSS, and the same definition
of convergence was used. Our results are presented in Table 6.4.
Table 6.4 clearly indicates that this iterative procedure is insensitive to the
initial values used in starting the procedures and that all initial values lead
to convergence.

6.7.3 A Lagrangian Multiplier Test for Endogeneity


In this section, we develop a Lagrangian multiplier test for endogeneity
in the full-information model. Our analysis calls on the work we did on
twining matrices in Section 2.7 of Chapter 2.
Several of the matrices in the full-information model are in fact inter-
twined matrices. The block diagonal matrix H ′ , when we write the model as
in Equation 6.21, is obtained by intertwining the submatrices of the block
diagonal matrices

\[
\mathbf{Y}' = \begin{pmatrix} Y_1' & & O \\ & \ddots & \\ O & & Y_G' \end{pmatrix}
\quad\text{and}\quad
\mathbf{X}' = \begin{pmatrix} X_1' & & O \\ & \ddots & \\ O & & X_G' \end{pmatrix},
\]
so we can write
\[
T'\begin{pmatrix} \mathbf{Y}' \\ \mathbf{X}' \end{pmatrix} = H',
\]
where T is the appropriate twining matrix.
where T is the appropriate twining matrix.
Recognising this relationship facilitates the mathematics required in
applying classical statistical procedures to our model. To illustrate, sup-
pose we want to test the null hypothesis
H0 : β = 0
against the alternative
HA : β = 0,
where β = (β′1 . . . β′a )′
The null hypothesis implies that the equations of our model contain no
right-hand current endogenous variables and thus our model under the
null collapses to the Seemingly Unrelated Regressions Equation Model, see
Turkington (2005). Suppose further we want to develop the Lagrangian
multiplier test statistic for this hypothesis and present it as an alternative
to other test statistics that could be used to test endogeneity, such as the
Hausman test statistic (Hausman (1978)).
We are working with the concentrated log-likelihood function ℓ*(δ)
formed by concentrating out the nuisance parameters v. The test statistic
we seek to form is then
\[
T^{*} = \frac{1}{n}\,
\frac{\partial\ell^{*}}{\partial\beta}\bigg|_{\hat\theta}'\;
I^{\beta\beta}(\hat\theta)\;
\frac{\partial\ell^{*}}{\partial\beta}\bigg|_{\hat\theta},
\]
where θ̂ refers to the constrained maximum-likelihood estimators (CMLEs),
that is, the MLEs formed after we impose H0 on our model, and I^{ββ} refers
to that part of the asymptotic Cramer-Rao lower bound corresponding to
β. Thus, I^{ββ} is the appropriate component of I(δ)⁻¹, where
\[
I(\delta) = -\mathop{\mathrm{p\,lim}}\frac{1}{n}\,\frac{\partial^{2}\ell^{*}}{\partial\delta\,\partial\delta'}.
\]
In forming θ̂, we set β = 0, so from Equation 6.21 our model becomes
\[
y = \mathbf{X}\gamma + u, \tag{6.29}
\]
where γ = (γ1′ . . . γG′)′ and
\[
E(u) = 0, \qquad V(u) = \Sigma\otimes I_n, \qquad u \sim N\left(0,\;\Sigma\otimes I_n\right),
\]

which is the seemingly unrelated regressions equations model (Zellner
1962). An iterative interpretation of the constrained MLE of γ is then
found by replacing H̃ and H by \(\mathbf{X}\) in Equation 6.28 to obtain
\[
\hat\gamma = \left(\mathbf{X}'\left(\tilde\Sigma^{-1}\otimes I_n\right)\mathbf{X}\right)^{-1}
\mathbf{X}'\left(\tilde\Sigma^{-1}\otimes I_n\right)y,
\]
where Σ̃ = Ũ′Ũ/n, Ũ = rvecn ũ with ũ = (ũ1′ ··· ũG′)′, and ũi the ordinary
least squares residual vector from the ith equation, that is,
ũi = (In − Xi(Xi′Xi)⁻¹Xi′)yi.
This iterative interpretation regards the joint generalised least squares
estimator (JGLSE) as the starting point in the iterative process to the MLE.
The constrained MLE of θ (or at least the iterative asymptotic equivalent
of this estimator) is θ̂ = ((0′ γ̂′)T′  v̂′)′, where v̂ = vech Σ̂, Σ̂ = Û′Û/n,
Û = rvecn û, and û is the JGLS residual vector, that is, û = y − \(\mathbf{X}\hat\gamma\). Notice
that the twining matrix T is involved in the expression for the constrained
MLE of θ.
Our twining matrix T comes into play again when we form the second
component of our test statistic, namely ∂ℓ/∂β. Let ψ = (β′ γ′)′; then as
Tψ = δ, it follows that T∂ℓ/∂ψ = ∂ℓ/∂δ and that ∂ℓ/∂ψ = T′∂ℓ/∂δ. We
can then obtain the derivative we want using
\[
\frac{\partial\ell}{\partial\beta} = S\,\frac{\partial\ell}{\partial\psi},
\]
where S is the selection matrix (Im  Om×p) with m = Σ_{i=1}^{G} Gi and
p = Σ_{i=1}^{G} ki.
In summary,
\[
\frac{\partial\ell}{\partial\beta} = A\,\frac{\partial\ell}{\partial\delta}, \tag{6.30}
\]

where
\[
A = ST' = \begin{pmatrix}
(I_{G_1}\;\;O_{G_1\times k_1}) & & O \\
& \ddots & \\
O & & (I_{G_G}\;\;O_{G_G\times k_G})
\end{pmatrix}.
\]
Returning to Equation 6.24, we have that
\[
\frac{\partial\ell^{*}}{\partial\delta} = \bar H'\left(\hat\Sigma^{-1}\otimes I_n\right)u,
\]

 
where H is a block diagonal matrix with H i = X i Xi in the ith block
diagonal position. It follows from Equation 6.30 that
⎛ ′ ′ ⎞
1 X1 O
∂ℓ ⎜ ..

⎟ ˆ −1

=⎝ . ⎠  ⊗ In u.
∂β
O G′ XG′
The third component of the quadratic form, that is, the Lagrangian multiplier
test statistic, can also be expressed with the help of twining matrices.
As ∂ℓ/∂β = A∂ℓ/∂δ, we have that I^{ββ} = A I^{δδ} A′. From our discussion in
Section 6.2, it is clear that
\[
I(\delta)^{-1} = I^{\delta\delta}(\theta).
\]
It is well known, see for example Turkington (2005), that
\[
I^{\delta\delta}(\theta) = \left[\mathop{\mathrm{p\,lim}}\frac{1}{n}H'\left(\Sigma^{-1}\otimes N\right)H\right]^{-1},
\]
where N is the projection matrix N = X(X′X)⁻¹X′. Moreover,
H = (\(\mathbf{Y}\;\mathbf{X}\))T′, so we can write
\[
I^{\beta\beta} = \mathop{\mathrm{p\,lim}}\, ST'\left[T\,\frac{1}{n}(\mathbf{Y}\;\mathbf{X})'\left(\Sigma^{-1}\otimes N\right)(\mathbf{Y}\;\mathbf{X})\,T'\right]^{-1}TS'
= \mathop{\mathrm{p\,lim}}\, S\left[\frac{1}{n}(\mathbf{Y}\;\mathbf{X})'\left(\Sigma^{-1}\otimes N\right)(\mathbf{Y}\;\mathbf{X})\right]^{-1}S', \tag{6.31}
\]
so in obtaining the part of the Cramer-Rao lower bound we want, we need
the (1,1) block matrix of the inverse of the matrix in Equation 6.31. That is,
\[
I^{\beta\beta} = \left[\mathop{\mathrm{p\,lim}}\frac{1}{n}\left(
\mathbf{Y}'\left(\Sigma^{-1}\otimes N\right)\mathbf{Y}
- \mathbf{Y}'\left(\Sigma^{-1}\otimes N\right)\mathbf{X}
\left(\mathbf{X}'\left(\Sigma^{-1}\otimes N\right)\mathbf{X}\right)^{-1}
\mathbf{X}'\left(\Sigma^{-1}\otimes N\right)\mathbf{Y}\right)\right]^{-1}.
\]
Evaluating the probability limit requires basic asymptotic theory. If, from
here on, we use the notation that {Ai′Aj} stands for a partitioned matrix
whose (i, j)th block is Ai′Aj, then
\[
I^{\beta\beta} = \left[\mathop{\mathrm{p\,lim}}\frac{1}{n}\left(
\left\{\sigma^{ij}\Pi_i'X'X\Pi_j\right\}
- \left\{\sigma^{ij}\Pi_i'X'X_j\right\}
\left\{\sigma^{ij}X_i'X_j\right\}^{-1}
\left\{\sigma^{ij}X_i'X\Pi_j\right\}\right)\right]^{-1}.
\]

The Reduced Form Parameters under H0


Before we can evaluate our test statistic further it must be noted that both
ββ
∂ℓ∗ /∂β and I involve reduced form parameters, and so we must investigate
252 Applications

the nature of these parameters when we impose β = 0 on the model. Clearly,


β = 0 implies that B = I, so Y = −X Ŵ + U and  is −Ŵ and U is V. But,
from Equation 6.29
Y = (y1 . . . yG ) = (X1 γ1 . . . XG γG ) + U .
Consider now the selection matrix Qi such that XQi = Xi for i = 1, . . . , G.
Then,
Y = (XQ1 γ1 . . . XQG γG ) + U = X (Q1 γ1 . . . QG γG ) + U ,
so under the null hypothesis
 = (Q1 γ1 . . . QG γG ).
Moreover, as Yi = Y W i
i = (Q1 γ1 . . . QG γG )W i
under the null hypothesis.
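
The construction of Π under the null hypothesis is easily mechanised. The sketch below (ours, with a purely hypothetical inclusion pattern) builds the selection matrices Qi and W̄i and forms Π and Πi = ΠW̄i.

```python
# Building Q_i, Wbar_i and the reduced-form matrix Pi under H0: a sketch of ours.
import numpy as np

def selection(size, idx):
    """Columns of the size x size identity matrix picked out by idx."""
    return np.eye(size)[:, idx]

k, G = 5, 3
pred_idx = [[0, 1], [1, 2, 3], [0, 4]]         # hypothetical: X columns in each equation
endo_idx = [[1], [0, 2], [1]]                  # hypothetical: RHS endogenous variables
gammas = [np.ones(len(p)) for p in pred_idx]   # hypothetical gamma_i values

Q = [selection(k, p) for p in pred_idx]        # X Q_i = X_i
Wbar = [selection(G, e) for e in endo_idx]     # Y Wbar_i = Y_i
Pi = np.column_stack([Q[i] @ gammas[i] for i in range(G)])   # Pi = (Q_1 g_1 ... Q_G g_G)
Pi_i = [Pi @ Wbar[i] for i in range(G)]        # Pi_i = Pi Wbar_i
print(Pi.shape, [P.shape for P in Pi_i])
```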

Procedure for Forming the Lagrangian Multiplier Test Statistic


We are now in a position to form the LMT statistic. Taking the com-
ββ
ponents ∂ℓ∗ /∂β and I , and evaluating these at the constrained MLE
θˆ = ((0 γ̂) ′ T ′ v̂) ′ leads to the following procedure.
1. Apply JGLS to the equations y = X γ + u and form the JGLSE γ̂ =

(X ( ˜ −1 ⊗ In )X )−1 X ′ (
˜ −1 ⊗ In )y, together with the residual vector
û = y − X γ̂.
2. Form  ˆ = Û ′Û /n where Û = rvecn û, and  ˆ −1 is the G×G matrix
whose (i, j)th elements is σ̂i j .
3. Form the selection matrices Q1 , . . . , QG and W 1 , . . . , W G . Using these
and γ̂ form ˆ i = (Q1 γ̂1 . . . QG γ̂G )W i .
4. Form

ˆ ′ X1′
⎛ ⎞
 G O
∂ℓ  .. ⎟ ˆ −1
=⎝ ⎠ ( ⊗ In )û.

θˆ .
∂β
O ˆ ′ X′
 G G

5. Form
ˆ = 1  i j ˆ ′ ′ ˆ −1
σ̂ i Xi X j  j − σ̂i j 
ˆ i′ Xi′ X j σ̂i j Xi′ X j
ββ  
I (θ)
n
ˆ j −1 .
× σ̂i j Xi′ X j 
 

6. To obtain the LMT statistic, derive the quadratic form
\[
T^{*} = \frac{1}{n}\,
\frac{\partial\ell^{*}}{\partial\beta}\bigg|_{\hat\theta}'\;
I^{\beta\beta}(\hat\theta)\;
\frac{\partial\ell^{*}}{\partial\beta}\bigg|_{\hat\theta}.
\]
Under H0, our test statistic for large sample sizes approximately has a
chi-squared distribution with G degrees of freedom, so the upper tail of the
distribution is used to get the appropriate critical value.
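
The final quadratic form and its comparison with the chi-squared critical value can be coded directly; the short sketch below (ours; `score`, `I_bb` and `df` are illustrative names for the quantities from steps 4 and 5 and the stated degrees of freedom) shows the computation.

```python
# Forming T* and carrying out the upper-tail chi-squared test: a sketch of ours.
import numpy as np
from scipy.stats import chi2

def lmt_statistic(score, I_bb, n):
    """T* = (1/n) score' I^{bb}(theta_hat) score, with score a 1-D vector."""
    return float(score @ I_bb @ score) / n

def lmt_decision(T_star, df, level=0.05):
    critical = chi2.ppf(1.0 - level, df)       # upper-tail critical value
    return T_star > critical, critical
```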
Symbols and Operators Used in this Book

With Respect to a Matrix A



Ai· , ai ith row of A
A·j , a j jth column of A
(A) j matrix formed by deleting the first j rows of A
(A) j matrix formed by deleting the first j columns of A

With respect to the Identity Matrix

eiG ith column of the G ×G identity matrix IG



eiG ith row of the G ×G identity matrix IG

With respect to Partition Matrices


Let
\[
A = \begin{pmatrix} A_1 \\ \vdots \\ A_G \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} B_1 \\ \vdots \\ B_G \end{pmatrix},
\]
where each submatrix of A is m × p and each submatrix of B is n × p.
\[
A\,\tau_{Gmn}\,B = A_1\otimes B_1 + \cdots + A_G\otimes B_G
\]
\[
T_{G,m,n}\begin{pmatrix} A \\ B \end{pmatrix}
= \begin{pmatrix} A_1 \\ B_1 \\ \vdots \\ A_G \\ B_G \end{pmatrix}
\]
\[
A(j) = \begin{pmatrix} (A_1)_{j\cdot} \\ \vdots \\ (A_G)_{j\cdot} \end{pmatrix}
\]
\[
\mathrm{rvec}_m\,A = \left(A_1\;\ldots\;A_G\right)
\]
Let C = (C1 . . . CG), where each submatrix is q × n.
\[
C(j) = \left((C_1)_{\cdot j}\;\ldots\;(C_G)_{\cdot j}\right)
\]
\[
\mathrm{vec}_n\,C = \begin{pmatrix} C_1 \\ \vdots \\ C_G \end{pmatrix}
\]
Special Matrices
Kmn commutation matrix
rvecn Kmn generalized rvec of the commutation matrix
vecm Kmn generalized vec of the commutation matrix
Nn = ½(In² + Knn)
Ln , Ln Nn , L̄n Nn , Ln , Ln∗ elimination matrices
Dn , D̄n duplication matrices
TG,m,n twining matrix
O null matrix
0 null column vector
References

Byron, R. P. ‘On the Derived Reduced Form from Limited Information Maximum
Likelihood’, Australia National University Memo, 1978.
Bowden, R. and Turkington, D. A. ‘Instrumental Variables’, vol 8 of the Econometric
Society Monographs in Quantitative Economics. New York: Cambridge University Press,
1990.
Durbin, J. ‘Maximum Likelihood Estimator of the Parameters of a System of Simulta-
neous Regression Equations’, Econometric Theory 4 (1988): 159–70.
Dwyer, P. S. ‘Some Applications of Matrix Derivatives in Multivariate Analysis’. Journal
of the American Statistical Association 26 (1967): 607–25.
Dwyer, P. S. and MacPhail, M. S. ‘Symbolic Matrix Derivatives’. Annals of Mathematical
Statistics 19 (1948): 517–34.
Efron, B. ‘Defining the Curvature of a Statistical Problem (with Applications to Second
Order Efficiency)’, Annals of Statistics 3 (1975): 1189–242.
Fuller, W. ‘Some Properties of a Modification of the Limited Information Estimator’,
Econometrica 45 (1977): 939–56.
Graham, A. Kronecker Products and Matrix Calculus with Applications. Chichester, U.K.:
Ellis Horwood, 1981.
Greene, W. H. Econometric Analysis, 7th edn. Upper Saddle River, N.J.: Prentice Hall, 2010.
Hausman, J. ‘Specification Tests in Econometrics’, Econometrica 46 (1978): 1251–71.
Henderson, H. V. and Searle, S. R. ‘Vec and Vech Operators for Matrices with Some Uses
in Jacobian and Multivariate Statistics’, Canadian Journal of Statistics 7 (1979): 65–81.
Henderson, H. V. and Searle, S. R. ‘The Vec-Permutation Matrix, the Vec Operator and
Kronecker Products: A Review’, Linear and Multilinear Algebra 9 (1981): 271–88.
Horn, R. A. and Johnson, C.R. Matrix Analysis. New York: Cambridge University Press,
1981.
Lutkepohl, H. Handbook of Matrices. New York: John Wiley & Sons, 1996.
Magnus, J. Linear Structures. New York: Oxford University Press, 1988.
Magnus, J. R. ‘On the Concept of Matrix Derivative’, Journal of Multivariate Analysis
101 (2010): 2200–06.
Magnus, J. R. and Neudecker, H. Matrix Differential Calculus with Applications in Statis-
tics and Econometrics, revised edn. New York: John Wiley & Sons, 1999.
Maller, R. A. and Turkington, D. A. ‘New Light on the Portfolio Allocation Problem’,
Mathematical Methods of Operations Research 56 (2002): 501–11.


Pagan, A. ‘Some Consequences of Viewing LIML as an Iterated Aitken Estimator’,


Economics Letters (1979): 369–72.
Parring, A. M. ‘About the Concept of the Matrix Derivative’. Linear Algebra and its
Applications 176 (1992): 223–35.
Rilstone, P., Srivastava, U. K., and Ullah, A. ‘The Second-order Bias and Mean Squared
Error of Nonlinear Estimators’, Journal of Econometrics 75 (1996):369–95.
Rogers, G. S. Matrix Derivatives. New York: Marcel Dekker, 1980.
Theil, H. Principles of Econometrics. New York: John Wiley & Sons, 1971.
Turkington, D. A. Matrix Calculus and Zero-One Matrices, Statistical and Econometric
Applications, paperback edn. New York: Cambridge University Press, 2005.
Zellner, A. ‘An Efficient Method of Estimating Seemingly Unrelated Regressions and
Tests for Aggregation Bias’. Journal of the American Statistical Association 57 (1962):
348–68.
Index

Applications generalized, 167–168, 179, 181, 184–185


classical statistics and, 218–226 limited information model and, 233–234,
full information model and, 242–254 238
information matrix and, 215, 219, 221, multivariate normal distribution and,
225, 228–229 227–228
iterative procedures and, 230–240 optimization and, 216–217
Klein model and, 231, 240, 247 vec operators and, 179, 181, 184–185, 207
limited information model and, 229–242 vector function and, 167
log-likelihood functions and, 213–215, Chi–square distribution, 253
218, 222–224, 226, 229–232, 234, Commutation matrices, 256
237–238, 241–245, 249 cross-product of matrices and, 50–57,
matrix calculus and, 223–226 68–70
multivariate normal distribution and, definition of, 36
215, 219, 225–230 derivatives and, 58, 139–141
nuisance parameters and, 221–223 derived results for, 60–68
optimization problems and, 214–218 econometrics and, 35
Sharpe ratio and, 216, 218 elementary matrix and, 37, 40–41, 74
Asset returns, 216 explicit expressions for, 36
Asymptotic Cramer-Rao lower bound, identity matrix and, 37, 57
218–219, 221, 229, 249, 251 Kronecker products and, 38–50, 55, 68,
70
Block diagonal matrices, 82, 126, 130, 243, Nn and, 71–73
246–248, 251 permutation matrices and, 35–36, 48
Bowden, R., 230 properties of, 35–49
Byron, R. P., 230 rvec operators and, 36, 57–73
statistics and, 35
Chain rule symmetry and, 37, 72–73
backward, 166–171, 173–174, 176, 179, theorems for, 39–57, 59–71
181, 184–185, 187, 193, 195, 202, 207, transposes and, 47, 61–62, 68, 139
216–217, 224, 227–228, 233–234, 238 twining matrices and, 79–80, 139
classical statistics and, 224 Umn matrix and, 74–75
cross-product of matrices and, 169–171, vec operators and, 36, 38–49, 57–73
173–174, 176, 187, 193, 195, 202 Concentrated likelihood function, 222,
derivatives and, 161–162, 164–176, 179, 232
181, 184–185, 187, 193, 195, 202, 207, Constrained maximum-likelihood
216–217, 224, 227–228, 233–234, 238 estimators (CMLE), 249


Convergence different concepts of, 135–166, 214–215


full information model and, 247–248 elimination matrices and, 35
limited information model and, 234, 237, full information model and, 245–247,
240–241 250
Cottrell, A., 231n4 gradient vector and, 165
Covariance matrix Hessian matrix and, 215
classical statistics and, 219, 221 Kronecker products and, 143, 147, 151,
limited information model and, 229–231, 157, 168–169, 186
235 limited information model and, 233,
multivariate normal distributions and, 237–238
226 multivariate normal distribution and,
nuisance parameters and, 221 227, 229
rvec operators and, 18 optimization problems and, 215
vec operators and, 18, 207 partial, 134–135, 137, 143–146, 157–160,
Cramer-Rao lower bound, 218–219, 221, 165, 205–214, 223
229, 249, 251 probability density function and, 161
Cross-product of matrices, ix product rule and, 164, 167, 173, 175–176,
basic, 186–190 216–217
chain rule and, 169–171, 173–174, 176, recursive, 157–163
187, 193, 195, 202 rvec operators and, 141, 169–170, 173,
commutation matrices and, 50–57, 175, 178–186
68–70 score vectors and, 215
definition of, 6, 16 symmetry and, 210–213
derivatives and, 169–170, 173, 175, theorems for, 205–213
186–204 transformation principles and, 143–157
identity matrix and, 11 vec operators and, 141, 178–186,
Kronecker products and, 1, 9, 11–12, 50, 205–213
55, 186 v operators and, 207–210
matrix calculus and, 168–177, 186–204 Duplication matrices, x, 18, 208, 210, 214,
partitions and, 6–13, 171–177, 186, 195, 226, 256
199, 203 block diagonal matrices and, 126, 130
properties of, 6–13 classical statistics and, 225
rvec operators and, 15–17, 68–70, 141, Dn and, 125–132
168–191, 194 Dn and, 132–133
submatrices and, 6–12, 15–16, 25–26, elimination matrices and, 112,
171–177, 186, 193, 195, 199, 203 124–132
theorems for, 7–13, 25–27, 50–57 explicit expressions for, 122, 124
transposes and, 7, 15, 25 importance of, 89
vec operators and, 12–13, 15–17, 25–27, Kronecker products and, 120, 125,
68–70, 168–177, 190, 194 128
very large, 7 partitions and, 116, 118, 128
properties of, 111–133
Derivatives, x statistics and, 121, 132
arranging, 134 submatrices and, 116, 118, 121, 128
chain rule and, 161–162, 164–176, 179, symmetry and, 112, 121–122, 124,
181, 184–185, 187, 193, 195, 202, 207, 130–132
216–217, 224, 227–228, 233–234, 238 theorems for, 125–133
classical statistics and, 220, 223–224 transposes and, 115–116, 118, 130
commutation matrices and, 58, 139–141 vech operators and, 112, 124
cross-product of matrices and, 169–170, Durbin, J., 230
173, 175, 186–204 Dwyer, P. S., 137–138, 143

Econometrics, 214–215 zero-one matrices and, 29


classical statistics and, 221, 223–225 Explicit expressions
commutation matrices and, 35 commutation matrices and, 36
Gretl and, 231 duplication matrices and, 122, 124
limited information maximum likelihood elimination matrices and, 98–103, 105,
(LIML) estimators and, 230–231 109–110
limited information model and, 230–232 twining matrices and, 77–79, 84–85
matrix calculus and, 141, 207, 223–225 Exponential regression model, 161
selection matrices and, 28–29
twining matrices and, 76 Fuller, W., 231
zero-one matrices and, 28–29, 35, 76 Full information maximum-likelihood
Efron, B., 231 (FIML) estimator, 245–248
Elementary matrix Full information model
commutation matrices and, 37, 40–41, 74 convergence and, 247–248
definition of, 34 derivatives and, 245–247, 250
Kronecker products and, 34 description of, 242–245
properties of, 34–35, 37, 40–41, 74, 147 endogeneity and, 242–243, 248–253
rvec operators and, 34–35 instrumental variable estimator and,
zero-one matrices and, 34–35, 37, 40–41, 245–248
74, 147 iterative procedures and, 245–248
Elimination matrices, x, 208, 210, 214, 256 Klein model and, 247
classical statistics and, 226 Kronecker products and, 244
definition of, 90 log-likelihood functions and, 242–245,
derivatives and, 35 249
duplication matrices and, 112, 124–132 nuisance parameters and, 243–245, 249
explicit expressions for, 98–103, 105, partitions and, 251
109–110 reduced form parameters and, 251–252
identity matrix and, 93 rvecn and, 250, 252
importance of, 89 selection matrices and, 245, 250, 252
Kronecker products and, 89–90, 93, 95, symmetry and, 243
103 twining matrices and, 248–251, 250
Ln , 90–98 vech operators and, 243, 250
Ln∗ , 110–111
Ln , 107–110 GAUSS program, 240
Ln Nn , 98–106, 125–132 Generalized least squares estimators,
Ln Nn , 107–110 232–234
matrix calculus and, 178, 184, 186–187, Generalized rvec of order m (rvecm )
195, 205 matrix calculus and, 141, 169–172,
partitions and, 94–95 178–191, 194
properties of, 89–111 properties of, 19–21, 24–26
statistics and, 105 zero-one matrices and, 47, 55, 61–69, 88
submatrices and, 90, 94–95, 98–105, 108 Generalized rvec of order n (rvecn ), 256
symmetry and, 90, 98, 106, 109 full information model and, 250, 252
theorems for, 95–111 limited information model and, 234
vech operators and, 89–91, 98, 110 properties of, 19–20
Endogeneity zero-one matrices and, 54–55, 57, 60–65,
full information model and, 242–243, 68, 70
248–253 Generalized vec of order n (vecn ), 256
Lagrangian multiplier test for, 248–253 cross-product of matrices and, 190, 194
limited information model and, 229, 231 properties of, 19–21, 25
reduced form parameters and, 251–252 zero-one matrices and, 59–61, 65, 67

Gradient vector, 165 cross-product of matrices and, 1, 9,


Graham, A., 37, 134, 137–139, 141, 143, 11–12, 50, 55, 186
165 definition of, 2–3
Greene, 231 derivatives and, 143, 147, 151, 157,
Gretl package, 231 168–169, 186
determinant of, 2
Hausman, J., 230, 249 duplication matrices and, 120, 125, 128
Henderson, H. V., 37, 81 elementary matrix and, 34
Hessian matrix, 160 elimination matrices and, 89–90, 93, 95,
backward chain rule and, 216–217, 224, 103
227–228 full information model and, 244
matrix calculus and, 215–219, 223–225, inverse of, 2
227–228 matrix calculus and, 143, 147, 151, 157,
multivariate normal distribution and, 168–169, 186, 214, 225–226
227–228 matrix Umn and, 75
Sharpe ratio and, 216, 218 partitions and, 2–5
properties of, 2–6
Idempotent matrices, 37, 73, 132 rvecs and, 1
Identity matrix, 255 selection matrices and, 29
commutation matrices and, 37, 57 submatrices and, 1, 3–6, 9
cross-product of matrices and, 11 theorems for, 5–6, 83–84
definition of, 5 transformation principles and, 147, 151,
elimination matrices and, 93 157
limited information model and, transposes and, 2
230 twining matrices and, 82–84
matrix calculus and, 206, 211 vec operators and, 38–49
permutation matrices and, 33–34
rvec operators and, 17 Lagrangian multiplier test (LMT), 220–221,
selection matrices and, 28–29, 31 248–254
vec operators and, 17 Least-squares estimation
Information matrix, 215, 219, 221, 225, limited information maximum likelihood
228–229 estimator and, 232–240
Iterative procedures ordinary, 233–236, 241–242, 248
full information model and, 245–248 three-stage, 248
generalized least squares and, 232–234 two-stage, 233, 236, 240–242, 248
instrumental variable estimators and, Lenders, 223
237–240, 245–248 Likelihood ratio test statistic, 220–223
limited information maximum likelihood Limited information maximum likelihood
(LIML) estimators and, 230–241 (LIML) estimators
ordinary least squares and, 234–237, convergence and, 234, 237, 240–241
240–241 econometrics and, 230–231
performance and, 247–248 as generalized least squares estimator,
232–234
Jacobian matrix, 136, 230, 244 as instrumental variable estimator,
Joint generalised least squares estimator 237–240
(JGLSE), 250, 252 iterative interpretations of, 230–241
Klein model and, 231, 240
Klein model, 231, 240, 247 log-likelihood function and, 230–232,
Kronecker products, ix 234, 237–238
commutation matrices and, 38–50, 55, as ordinary least squares (OLS) estimator,
68, 70 234–237

Limited information model Matrix calculus and Zero-one Matrices,


chain rule and, 233–234, 238 Statistical and Econometric Application
convergence and, 234, 237, 240–241 (Turkington), ix
covariance matrix and, 229–231, 235 Matrix calculus
derivatives and, 233, 237–238 basic rules of, 166–168
description of, 229–230 chain rule and, 161–176, 179, 181,
econometrics and, 230–232 184–185, 187, 193, 195, 202, 207,
endogeneity and, 229, 231 216–217, 224, 227–228, 233–234,
identity matrix and, 230 238
least squares estimation and, 232–240 classical statistics and, 223–226
log-likelihood functions and, 229–232, cross-product of matrices and, 168–177,
234, 237–238, 241, 244 186–204
nuisance parameters and, 231–232, 234, econometrics and, 141, 207, 223–225
237, 241 gradient vector and, 165
partitions and, 234, 238 identity matrix and, 206, 211
rvecn and, 234 iterative procedures and, 230–240
submatrices and, 248 Jacobian matrix and, 136, 230, 244
Log-likelihood functions, x Kronecker products and, 143, 147, 151,
complexity and, 215 157, 168–169, 186, 214, 225–226
concentrated, 222–223, 231–232, 241 log-likelihood functions and, 213–215,
full information model and, 242–245, 218, 222–224, 226, 229–232, 234,
249 237–238, 241–245, 249
instrumental variable estimators and, matrix function and, 134, 139, 148, 152,
237–240 156–157
iterative procedures and, 230–232, 234, multivariate normal distribution and,
237–238 215, 219, 225–230
limited information maximum likelihood nuisance parameters and, 221–223,
(LIML) estimators and, 230–232, 234, 231–232, 234, 237, 241, 243–245, 249
237–238 partial derivatives and, 134–135, 137,
limited information model and, 229–232, 143–146, 157–160, 165, 223 (see also
234, 237–238, 241, 244 Derivatives)
matrix calculus and, 213–215, 218, permutation matrices and, 205
222–224, 226, 229–232, 234, 237–238, probability density function and, 161,
241–245, 249 230, 243
maximization of, 231–232, 232n5 product rule and, 164, 167, 173, 175–176,
multivariate normal distribution and, 216–217
226 rvec operators and, 168–177
ordinary least squares (OLS) estimators scalar functions and, 135–138, 141, 143,
and, 234–237 158, 160, 162, 165–168, 205–206, 208
scalar functions and, 218 selection matrices and, 178, 184,
symmetry and, 213, 224 186–187, 195, 205
Lutkepohl, H., 28, 134, 141, 156, 164, 212 symmetry and, 140, 160, 166, 205–213,
223–224
MacPhail, M. S., 137, 143 theorems for, 166–168, 205–213
Magnus, J. transformation principles and, 143–157
duplication/elimination matrices and, vech operators and, 166, 205, 207–211,
89–90, 124, 132 223
matrix calculus and, 134, 136, 141, 164, vec operators and, 168–186, 205–213
207, 212, 215, 229 v operators and, 207–210
zero-one matrices and, 28, 37–38, 73, 81 Matrix function, 134, 139, 148, 152,
Maller, R. A., 215–217, 218n1 156–157

Maximum-likelihood estimators (MLEs) vec operators and, 14–26, 178–181,


constrained, 249 184–185, 207
first-order conditions for, 219–220 zero-one matrices and, 32–33, 36–45,
joint generalised least squares (JGLSE) 48–57, 65, 68–69, 72, 77–80, 83–87
and, 250, 252 Permutation matrices
likelihood ratio test statistic and, 220–223 commutation matrices and, 35–36, 48
statistics and, 161, 214, 219–223, definition of, 33
230–240, 245–249 identity matrix and, 33–34
test procedures and, 219–223 matrix calculus and, 205
vector function and, 220 properties of, 33–36
Multivariate normal distribution, 215, 219, twining matrices and, 77–81
225–230 zero-one matrices and, 33–36, 48, 77–81
Portfolios, 215–218
Neudecker, H., 28, 38, 90, 134, 136, 141, Positive-semidefinite matrix, 222
164, 212, 215, 229 Probability density function, 161, 230, 243
Nonsingular matrix, 132, 141, 182 Product rule, 164, 167, 173, 175–176,
Nn matrix, 71–73 216–217
Nuffield College, x–xi
Nuisance parameters Random vectors, 161n1, 226, 229, 235, 242
full information model and, 243–245, Recursive derivatives, 157–163
249 Regression analysis, 161, 238, 249–250
limited information model and, 231–232, Rilstone, P., 157–158, 161–162
234, 237, 241 Rogers, G. S., 134, 137–138, 141
statistics and, 221–223, 231–232, 234, Rothenberg, T., 223
237, 241, 243–245, 249 Rvec operators, ix–x
Null hypothesis, 220–222, 249, 252 basic operators and, 13–15
Null matrix, 225–226, 256 chain rule and, 179, 181, 184–185
Null vector, 128, 220, 233, 235, 240–242, commutation matrices and, 36, 57–73
247–248, 256 cross-product of matrices and, 12–13,
15–17, 68–70
Optimization, 214–218 derivatives and, 60–68, 141, 169–170,
Ordinary least squares (OLS) estimation, 173, 175, 178–186
233–236, 241–242, 248 elementary matrix and, 34–35
generalized, 1, 16, 18–24, 26, 57–73, 61,
Parring, A. M., 138 63, 66, 68, 141, 144, 160, 168–179, 183,
Partial derivatives 256
applications and, 214, 223 identity matrix and, 17
classical statistics and, 223 Kronecker products and, 1
matrix calculus and, 134–135, 137, large X, 178–181
143–146, 157–160, 165, 223 matrix calculus and, 168–177
Partitions, 255 partitions and, 14–26, 171–181, 184–185
classical statistics and, 223 small X, 183–184
cross-product of matrices and, 6–13, submatrices and, 14–16, 18–20, 22–26
171–177, 186, 195, 199, 203 theorems for, 15–17, 59–67
duplication matrices and, 116, 118, 128 vech operators and, 17–18
elimination matrices and, 94–95 v operators and, 17–18
full information model and, 251 zero-one matrices and, 57–73
Kronecker products and, 2–5
limited information model and, 234, 238 Scalar functions
nuisance parameters and, 221–223 classical statistics and, 218
rvec operators and, 14–26, 171–181, gradient vector and, 165
184–185 log-likelihood functions and, 218

  matrix calculus and, 135–138, 141, 143, 158, 160, 162, 165–168, 205–206, 208
  optimization and, 215
Score vector, 215–219, 223, 227
Searle, S. R., 37, 81
Seemingly Unrelated Regressions Equation Model, 249–250
Selection matrices
  definition of, 28
  duplication matrices and, 89 (see also Duplication matrices)
  econometrics and, 28–29
  elimination matrices and, 89 (see also Elimination matrices)
  full information model and, 245, 250, 252
  identity matrix and, 28–29, 31
  Kronecker products and, 29
  properties of, 28–33, 38, 41, 71
  statistics and, 28
  theorems for, 30–33
Sharpe ratio, 216, 218
Square matrices, 17–18, 89, 98, 193
Srivastava, U. K., 157–158
Statistics, 141
  chain rule and, 224
  classical procedures for, 218–226
  commutation matrices and, 35
  concentrated likelihood function and, 222, 232
  covariance matrix and, 219, 221
  Cramer-Rao lower bound and, 218–219, 221, 229, 249, 251
  derivatives and, 220, 223–224
  duplication matrices and, 121, 132
  elimination matrices and, 105
  full information likelihood (FIML) estimator and, 245–248
  Hessian matrix and, 160, 215–219, 223–225, 227–228
  information matrix and, 215, 219, 221, 225, 228–229
  likelihood ratio test statistic and, 220–223
  limited information maximum likelihood (LIML) estimators and, 230–241
  log-likelihood functions and, 213–215, 218, 222–224, 226, 229–232, 234, 237–238, 241–245, 249
  maximum likelihood estimators and, 161, 214, 219–221, 230–240, 245–249
  multivariate normal distribution and, 215, 219, 225–230
  nuisance parameters and, 221–223, 231–232, 234, 237, 241, 243–245, 249
  partitions and, 223
  scalar functions and, 218
  score vector and, 215–219, 223, 227
  selection matrices and, 28
  symmetry and, 223–224
  test procedures and, 214, 219–222, 248–254
  twining matrices and, 76
  vec operators and, 207, 214–215
  v operator and, 18
Submatrices, 255
  column of, 6–7, 100–101
  cross-product of matrices and, 6–12, 15–16, 25–26, 171–177, 186, 193, 195, 199, 203
  duplication matrices and, 116, 118, 121, 128
  elimination matrices and, 90, 94–95, 98–105, 108
  generalized vecs/rvecs and, 18–24, 171–181, 184–185
  Kronecker products and, 1, 3–6, 9
  limited information model and, 248
  matrix calculus and, 143–147, 157, 171–181, 184–186, 193, 195, 199, 203
  recursive derivatives and, 157
  row of, 6–7, 100
  transformation principles and, 143–147
  zero-one matrices and, 32–33, 41, 43, 47–57, 67, 69–72, 77–81, 84–87
Symmetry
  classical statistics and, 223–224
  commutation matrices and, 37, 72–73
  derivatives and, 210–213
  duplication matrices and, 112, 121–122, 124, 130–132
  elimination matrices and, 90, 98, 106, 109
  full information model and, 243
  idempotent matrices and, 37, 73, 132
  log-likelihood functions and, 213, 224
  matrix calculus and, 140, 160, 166, 205–213, 223–224
  vech operators and, 18
  vec operators and, 210–213

Test procedures
  Hausman, 249
  Lagrangian multiplier, 220–221, 248–254
266 Index

  likelihood ratio, 220, 222–223
  maximum-likelihood estimators and, 161, 214, 219–221, 230–240, 245–249
  statistics and, 214, 219–222, 248–254
  Wald, 220, 222
Theil, H., 231
Three-stage least squares (3SLS), 248
Transformation principles
  applications of, 149–151, 154–157
  combined use of, 154–157
  derivative concepts and, 143–147, 151–153, 156–163
  Kronecker products and, 147, 151, 157
  matrix calculus and, 143–157
  One, 147–151
  submatrices and, 143–147
  Two, 152–157
Transposes, 141, 164
  commutation matrices and, 47, 61–62, 68, 139
  cross-product of matrices and, 7, 15, 25
  duplication matrices and, 115–116, 118, 130
  Kronecker products and, 2
  multivariate normal distribution and, 226
  recursive derivatives and, 161
  twining matrices and, 84, 86, 88
  vec operators and, 206–207, 213
Turkington, D. A., ix, 251
  commutation matrix and, 68
  econometric methods and, 224
  generalized devecs and, 21
  generalized rvecs and, 24, 68
  generalized vecs and, 24, 68
  Kronecker products and, 225
  matrix calculus and, 134, 139, 141, 164, 212, 215, 217, 218n1
  maximum-likelihood estimators and, 230
  Seemingly Unrelated Regressions Equation Model and, 249
  zero-one matrices and, 28, 68
Twining matrices, 256
  commutation matrices and, 79–80, 139
  definition of, 77–79, 84–85
  determinant of, 81–82
  econometrics and, 76
  explicit expressions and, 77–79, 84–85
  full information model and, 248–251
  inverse of, 81
  Kronecker products and, 82–84
  notation for, 78–79
  permutation matrices and, 77–81
  properties of, 76–88
  recursive derivatives and, 159
  special cases of, 82–83
  statistics and, 76
  theorems for, 79–84, 87–88
  trace of, 81
  transposes and, 84, 86, 88
Two-stage least squares (2SLS), 233, 236, 240–242, 248

Ullah, A., 157–158
University of Western Australia, x
Umn matrix, 74–75

Vech operators
  elimination matrices and, 89–91, 98, 110
  full information model and, 243, 250
  limited information maximum likelihood estimator and, 237
  matrix calculus and, 166, 205, 207–211, 223
  multivariate normal distribution and, 226
  properties of, 17–18
  symmetry and, 18
Vec operators, ix–x. See also Rvec operators
  basic operators and, 13–15
  chain rule and, 179, 181, 184–185, 207
  commutation matrices and, 36, 38–49, 57–73
  commutation matrix and, 57–73
  cross-product of matrices and, 12–13, 15–17, 25–27, 68–70, 168–177, 190, 194
  derivatives and, 60–68, 141, 178–186, 205–213
  generalized, 1, 12–27, 57–73, 134, 143, 145, 157, 160, 164, 168–186, 214, 256
  identity matrix and, 17
  Kronecker products and, 38–49
  matrix calculus and, 168–186, 205–213
  partitions and, 14–26, 178–181, 184–185, 207
  statistics and, 207, 214–215
  submatrices and, 14–16, 18–20, 22–26
  symmetry and, 210–213
  theorems for, 15–17, 21–27, 59–67, 205–213
  transposes and, 206–207, 213
  v operators and, 17–18
  zero-one matrices and, 57–73
Vector function
  chain rule and, 167
  maximum likelihood estimators and, 220
  of x, 138, 158, 160, 164–166, 168, 207
Verani, S., 230n3
v operators
  derivatives and, 207–210
  matrix calculus and, 207–210
  properties of, 17–18
  statistics and, 18
  theorems for, 207–210

Wald test statistic, 220, 222

Zellner, A., 250
Zero-one matrices, ix–x, 215
  commutation matrix and, 35–73
  econometrics and, 28–29, 35, 76
  elementary matrix and, 34–35, 37, 40–41, 74, 147
  generalized vecs/rvecs and, 57–73
  partitions and, 32–33, 36–45, 48–57, 65, 68–69, 72, 77–80, 83–87
  permutation matrices and, 33–36, 48, 77–81
  rvecm and, 47, 55, 61–69, 88
  rvecn and, 54–55, 57, 60–65, 68, 70
  selection matrices and, 38, 41, 71
  submatrices and, 32–33, 41, 43, 47–57, 67, 69–72, 77–81, 84–87
  twining matrices and, 76–88
  Umn and, 74–75
  vecn and, 59–61, 65, 67