Matrix Calculus
Sourya Dey
1 Notation
• Scalars are written as lower case letters.
• Vectors are written as lower case bold letters, such as x, and can be either row (dimensions
1×n) or column (dimensions n×1). Column vectors are the default choice, unless otherwise
mentioned. Individual elements are indexed by subscripts, such as xi (i ∈ {1, · · · , n}).
• Matrices are written as upper case bold letters, such as X, and have dimensions m × n
corresponding to m rows and n columns. Individual elements are indexed by double
subscripts for row and column, such as Xij (i ∈ {1, · · · , m}, j ∈ {1, · · · , n}).
• Occasionally higher order tensors occur, such as 3rd order with dimensions m × n × p, etc.
Note that a matrix is a 2nd order tensor. A row vector is a matrix with 1 row, and a column
vector is a matrix with 1 column. A scalar is a matrix with 1 row and 1 column. Essentially,
scalars and vectors are special cases of matrices.
The derivative of $f$ with respect to $x$ is $\frac{\partial f}{\partial x}$. Both $x$ and $f$ can be a scalar, vector, or matrix, leading to 9 types of derivatives. The gradient of $f$ w.r.t. $x$ is $\nabla_x f = \left( \frac{\partial f}{\partial x} \right)^T$, i.e. the gradient is the transpose of the derivative. The gradient at any point $x_0$ in the domain has a physical interpretation: its direction is the direction of maximum increase of the function $f$ at the point $x_0$, and its magnitude is the rate of increase in that direction. We do not generally deal with the gradient when $x$ is a scalar.
2 Basic Rules
We’ll first state the most general matrix-matrix derivative type. All other types are sim-
plifications since scalars and vectors are special cases of matrices. Consider a function F (·)
which maps $m \times n$ matrices to $p \times q$ matrices, i.e. domain $\subset \mathbb{R}^{m \times n}$ and range $\subset \mathbb{R}^{p \times q}$. So, $F(\cdot): \underset{m \times n}{X} \to \underset{p \times q}{F(X)}$. Its derivative $\frac{\partial F}{\partial X}$ is a 4th order tensor of dimensions $p \times q \times n \times m$. This is an outer matrix of dimensions $n \times m$ (transposed dimensions of the denominator $X$), with each element being a $p \times q$ inner matrix (same dimensions as the numerator $F$). It is given as:

$$\frac{\partial F}{\partial X} = \begin{bmatrix} \dfrac{\partial F}{\partial X_{1,1}} & \cdots & \dfrac{\partial F}{\partial X_{m,1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F}{\partial X_{1,n}} & \cdots & \dfrac{\partial F}{\partial X_{m,n}} \end{bmatrix} \tag{1a}$$
which has n rows and m columns, and the (i, j)th element is given as:
$$\frac{\partial F}{\partial X_{i,j}} = \begin{bmatrix} \dfrac{\partial F_{1,1}}{\partial X_{i,j}} & \cdots & \dfrac{\partial F_{1,q}}{\partial X_{i,j}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_{p,1}}{\partial X_{i,j}} & \cdots & \dfrac{\partial F_{p,q}}{\partial X_{i,j}} \end{bmatrix} \tag{1b}$$
which has p rows and q columns.
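To make this layout concrete, here is a small numerical sketch. It assumes NumPy is available, and the helper name `matrix_derivative` is our own invention, not from the text; it builds the $n \times m$ outer grid of $p \times q$ inner matrices by central finite differences, for the linear map $F(X) = AX$:

```python
import numpy as np

def matrix_derivative(F, X, eps=1e-6):
    """Build dF/dX numerically, following (1a)-(1b): an n x m outer grid
    (transposed dimensions of X) whose (i, j) entry is the p x q inner
    matrix dF / dX[j, i] (note the swapped indices into X)."""
    m, n = X.shape
    p, q = F(X).shape
    D = np.zeros((n, m, p, q))
    for i in range(n):
        for j in range(m):
            dX = np.zeros_like(X)
            dX[j, i] = eps
            D[i, j] = (F(X + dX) - F(X - dX)) / (2 * eps)  # central difference
    return D

# Linear map F(X) = A @ X from 2x3 matrices to 4x3 matrices (p = 4, q = 3)
A = np.arange(8.0).reshape(4, 2)
X = np.ones((2, 3))
D = matrix_derivative(lambda M: A @ M, X)
print(D.shape)  # (3, 2, 4, 3): outer n x m, inner p x q
```

The resulting shape `(n, m, p, q)` matches the convention: transposed dimensions of the denominator $X$ on the outside, dimensions of the numerator $F$ on the inside.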
Whew! Now that that’s out of the way, let’s get to some general rules (for the following, x and
y can represent scalar, vector or matrix):
• The derivative $\frac{\partial y}{\partial x}$ always has outer matrix dimensions = transposed dimensions of denominator $x$, and each individual element (inner matrix) has dimensions = same dimensions of numerator $y$. If you do a calculation and the dimensions don't come out right, the answer is not correct.
• Derivatives usually obey the chain rule, i.e. $\dfrac{\partial f(g(x))}{\partial x} = \dfrac{\partial f(g(x))}{\partial g(x)} \dfrac{\partial g(x)}{\partial x}$.
• Derivatives usually obey the product rule, i.e. $\dfrac{\partial f(x) g(x)}{\partial x} = f(x) \dfrac{\partial g(x)}{\partial x} + g(x) \dfrac{\partial f(x)}{\partial x}$.
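The product rule can be sanity-checked numerically for scalar-valued functions of a vector. This is a sketch assuming NumPy; the helper `row_grad` is our own name, and it returns the derivative as a $1 \times m$ row vector per the dimension rule above:

```python
import numpy as np

def row_grad(h, x, eps=1e-6):
    """Numerical derivative of scalar h w.r.t. vector x: a 1 x m
    row vector (transposed dimensions of the denominator x)."""
    g = np.zeros((1, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        g[0, i] = (h(x + dx) - h(x - dx)) / (2 * eps)  # central difference
    return g

f = lambda x: float(np.sin(x).sum())   # scalar-valued
g = lambda x: float((x ** 2).sum())    # scalar-valued
x = np.array([0.3, -1.2, 2.0])

lhs = row_grad(lambda v: f(v) * g(v), x)                 # d(fg)/dx
rhs = f(x) * row_grad(g, x) + g(x) * row_grad(f, x)      # product rule
print(np.allclose(lhs, rhs, atol=1e-4))  # True
```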
3 Types of derivatives
3.1 Scalar by scalar
Nothing special here. The derivative is a scalar, and can also be written as $f'(x)$. For example, if $f(x) = \sin x$, then $f'(x) = \cos x$.
3.3 Vector by scalar

$$\frac{\partial f}{\partial x} = \begin{bmatrix} \dfrac{\partial f_1}{\partial x} \\ \dfrac{\partial f_2}{\partial x} \\ \vdots \\ \dfrac{\partial f_n}{\partial x} \end{bmatrix} \tag{3}$$
For a vectorized scalar function applied element-wise, i.e. $f(x) = [f(x_1), f(x_2), \cdots, f(x_m)]^T$, both the derivative and gradient are the same $m \times m$ diagonal matrix, given as:

$$\nabla_x f = \frac{\partial f}{\partial x} = \begin{bmatrix} f'(x_1) & & & \\ & f'(x_2) & & \\ & & \ddots & \\ & & & f'(x_m) \end{bmatrix} \tag{6}$$

where $f'(x_i) = \dfrac{\partial f(x_i)}{\partial x_i}$.
Note: Some texts take the derivative of a vectorized scalar function by taking element-wise derivatives to get an $m \times 1$ vector. To avoid confusion with (6), we will refer to this as $f'(x)$:

$$f'(x) = \begin{bmatrix} f'(x_1) \\ f'(x_2) \\ \vdots \\ f'(x_m) \end{bmatrix} \tag{7}$$
To realize the effect of this, let’s say we want to multiply the gradient from (6) with some
m-dimensional vector a. This would result in:
$$\nabla_x f \, a = \begin{bmatrix} f'(x_1)\, a_1 \\ f'(x_2)\, a_2 \\ \vdots \\ f'(x_m)\, a_m \end{bmatrix} \tag{8}$$
Achieving the same result with f 0 (x) from (7) would require the Hadamard product ◦, defined
as element-wise multiplication of 2 vectors:
$$f'(x) \circ a = \begin{bmatrix} f'(x_1)\, a_1 \\ f'(x_2)\, a_2 \\ \vdots \\ f'(x_m)\, a_m \end{bmatrix} \tag{9}$$
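The equivalence of (8) and (9) can be checked in a few lines. This sketch assumes NumPy, where `*` on arrays is exactly the Hadamard (element-wise) product:

```python
import numpy as np

x = np.array([0.5, 1.0, 2.0])
a = np.array([3.0, -1.0, 4.0])
fprime = np.cos(x)                 # element-wise f'(x) for f = sin, as in (7)

left = np.diag(fprime) @ a         # (8): diagonal gradient of (6) times a
right = fprime * a                 # (9): Hadamard product f'(x) ∘ a
print(np.allclose(left, right))    # True
```

In practice the Hadamard form (9) is preferred since it avoids materializing the $m \times m$ diagonal matrix.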
Consider the type of function in Sec. 3.2, i.e. $f(\cdot): \underset{m \times 1}{x} \to \underset{1 \times 1}{f(x)}$. Its gradient is a vector-to-vector function given as $\nabla_x f(\cdot): \underset{m \times 1}{x} \to \underset{m \times 1}{\nabla_x f(x)}$. The transpose of its derivative is the Hessian:

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_m \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_m^2} \end{bmatrix} \tag{10}$$

i.e. $H = \left( \dfrac{\partial \nabla_x f}{\partial x} \right)^T$. If the derivatives are continuous, then $\dfrac{\partial^2 f}{\partial x_i \partial x_j} = \dfrac{\partial^2 f}{\partial x_j \partial x_i}$, so the Hessian is symmetric.
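The symmetry of the Hessian is easy to observe numerically. This is a sketch assuming NumPy; the helper `hessian` is our own name, building each entry by a second-order central difference:

```python
import numpy as np

def hessian(f, x, eps=1e-4):
    """Numerical Hessian of a scalar f of a vector x, built entry-wise
    as H[i, j] = d^2 f / (dx_i dx_j) via central differences."""
    m = x.size
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            ei = np.zeros(m); ei[i] = eps
            ej = np.zeros(m); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return H

f = lambda x: x[0] ** 2 * x[1] + np.sin(x[1]) * x[2]  # smooth scalar function
x = np.array([1.0, 2.0, 0.5])
H = hessian(f, x)
print(np.allclose(H, H.T, atol=1e-5))  # True: mixed partials commute
```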
3.5 Scalar by matrix
$$\frac{\partial f}{\partial X} = \begin{bmatrix} \dfrac{\partial f}{\partial X_{1,1}} & \cdots & \dfrac{\partial f}{\partial X_{m,1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f}{\partial X_{1,n}} & \cdots & \dfrac{\partial f}{\partial X_{m,n}} \end{bmatrix} \tag{11}$$
The gradient has the same dimensions as the input matrix, i.e. m × n.
3.6 Matrix by scalar

$$\frac{\partial F}{\partial x} = \begin{bmatrix} \dfrac{\partial F_{1,1}}{\partial x} & \cdots & \dfrac{\partial F_{1,q}}{\partial x} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_{p,1}}{\partial x} & \cdots & \dfrac{\partial F_{p,q}}{\partial x} \end{bmatrix} \tag{12}$$
3.7 Vector by matrix

$f(\cdot): \underset{m \times n}{X} \to \underset{p \times 1}{f(X)}$. In this case, the derivative is a 3rd-order tensor with dimensions $p \times n \times m$. This is the same $n \times m$ matrix as in (11), but with the scalar $f$ replaced by the $p$-dimensional vector $f$, i.e.:

$$\frac{\partial f}{\partial X} = \begin{bmatrix} \dfrac{\partial f}{\partial X_{1,1}} & \cdots & \dfrac{\partial f}{\partial X_{m,1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f}{\partial X_{1,n}} & \cdots & \dfrac{\partial f}{\partial X_{m,n}} \end{bmatrix} \tag{13}$$
3.8 Matrix by vector

$F(\cdot): \underset{m \times 1}{x} \to \underset{p \times q}{F(x)}$. In this case, the derivative is a 3rd-order tensor with dimensions $p \times q \times m$. This is the same $1 \times m$ row vector as in (2), but with the scalar $f$ replaced by the $p \times q$ matrix $F$, i.e.:

$$\frac{\partial F}{\partial x} = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} & \dfrac{\partial F}{\partial x_2} & \cdots & \dfrac{\partial F}{\partial x_m} \end{bmatrix} \tag{14}$$
4 Operations and Examples
4.1 Commutation
If things normally don’t commute (such as for matrices, AB 6= BA), then order should be
maintained when taking derivatives. If things normally commute (such as for vector inner
product, a·b = b·a), their order can be switched when taking derivatives. Output dimensions
must always come out right.
For example, let $f(x) = (a^T x)\, b$, where $a^T$ is $1 \times m$, $x$ is $m \times 1$, and $b$ is $n \times 1$, so $f(x)$ is $n \times 1$. The derivative $\frac{\partial f}{\partial x}$ should be an $n \times m$ matrix. Keeping order fixed, we get $\frac{\partial f}{\partial x} = a^T \frac{\partial x}{\partial x} b = a^T I b = a^T b$. This is a scalar, which is wrong! The solution? Note that $a^T x$ is a scalar, which can sit either to the right or the left of vector $b$, i.e. ordering doesn't really matter. Rewriting $f(x) = b\, (a^T x)$, we get $\frac{\partial f}{\partial x} = b\, a^T \frac{\partial x}{\partial x} = b a^T I = b a^T$, which is the correct $n \times m$ matrix.
If this seems confusing, it might be useful to take a simple example with low values for m and
n, and write out the full derivative in matrix form as shown in (4). The resulting matrix will
be baT .
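The claim can be verified with small concrete vectors. This sketch assumes NumPy and checks the finite-difference derivative of $f(x) = (a^T x)\, b$ against $b a^T$; the particular values of $a$, $b$, $x$ are arbitrary choices of ours:

```python
import numpy as np

a = np.array([[1.0], [-2.0], [0.5]])            # m x 1, m = 3
b = np.array([[2.0], [0.0], [-1.0], [3.0]])     # n x 1, n = 4
x = np.array([[0.1], [0.2], [0.3]])             # m x 1

f = lambda v: (a.T @ v) * b                     # n x 1: scalar (a^T v) times b

# Column j of the n x m derivative is df / dx_j
eps = 1e-6
D = np.zeros((4, 3))
for j in range(3):
    dx = np.zeros_like(x)
    dx[j] = eps
    D[:, [j]] = (f(x + dx) - f(x - dx)) / (2 * eps)

print(np.allclose(D, b @ a.T, atol=1e-8))  # True
```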
The derivative of a transposed vector w.r.t. itself is the identity matrix, but the transpose gets applied to everything after. For example, let $f(w) = (y - w^T x)^2 = y^2 - (w^T x)\, y - y\, (w^T x) + (w^T x)(w^T x)$, where $y$ and $x$ are not functions of $w$. Taking the derivative of the terms individually:
• $\dfrac{\partial y^2}{\partial w} = 0^T$, i.e. a row vector of all 0s.
• $\dfrac{\partial (w^T x) y}{\partial w} = \dfrac{\partial w^T}{\partial w}\, x y = (x y)^T = y^T x^T$. Since $y$ is a scalar, this is simply $y x^T$.
• $\dfrac{\partial y (w^T x)}{\partial w} = y\, \dfrac{\partial w^T}{\partial w}\, x = y x^T$
• $\dfrac{\partial (w^T x)(w^T x)}{\partial w} = \dfrac{\partial w^T}{\partial w}\, x\, (w^T x) + (w^T x)\, \dfrac{\partial w^T}{\partial w}\, x = (x^T w)\, x^T + (w^T x)\, x^T$. Since vector inner products commute, this is $2\, (w^T x)\, x^T$.

So $\dfrac{\partial f}{\partial w} = -2 y x^T + 2\, (w^T x)\, x^T$
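The final expression can be checked against a finite-difference derivative. This sketch assumes NumPy; the values of $w$, $x$, $y$ are arbitrary choices of ours:

```python
import numpy as np

w = np.array([[0.5], [-1.0], [2.0]])
x = np.array([[1.0], [2.0], [-0.5]])
y = 0.7

f = lambda v: float((y - v.T @ x) ** 2)

# Analytic result from the text: a 1 x m row vector
analytic = -2 * y * x.T + 2 * float(w.T @ x) * x.T

# Central-difference check
eps = 1e-6
numeric = np.zeros((1, 3))
for i in range(3):
    dw = np.zeros_like(w)
    dw[i] = eps
    numeric[0, i] = (f(w + dw) - f(w - dw)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```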
of dimensions p × q, i.e. for each inner matrix, pre-multiply with a 1 × p row vector and post-
multiply with a q × 1 column vector to get a scalar. This gives a final matrix of dimensions
n × m.
Example: $f(W) = a^T W b$, where $a^T$ is $1 \times m$, $W$ is $m \times n$, and $b$ is $n \times 1$. This is a scalar, so $\frac{\partial f}{\partial W}$ should be a matrix which has transposed dimensions as $W$, i.e. $n \times m$. Now, $\frac{\partial f}{\partial W} = a^T \frac{\partial W}{\partial W} b$, where $\frac{\partial W}{\partial W}$ has dimensions $m \times n \times n \times m$. For example, if $m = 3$, $n = 2$, then:

$$\frac{\partial W}{\partial W} = \begin{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \end{bmatrix} \\[2ex] \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} & \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \end{bmatrix} \tag{15}$$

Note that the $(i,j)$th inner matrix has a 1 in its $(j,i)$th position. Pre- and post-multiplying the $(i,j)$th inner matrix with $a^T$ and $b$ gives $a_j b_i$, where $i \in \{1, 2\}$ and $j \in \{1, 2, 3\}$. So:

$$a^T \frac{\partial W}{\partial W} b = \begin{bmatrix} a_1 b_1 & a_2 b_1 & a_3 b_1 \\ a_1 b_2 & a_2 b_2 & a_3 b_2 \end{bmatrix} \tag{16}$$

Thus, $\dfrac{\partial f}{\partial W} = b a^T$.
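The result $\frac{\partial f}{\partial W} = b a^T$ can also be confirmed numerically for the same $m = 3$, $n = 2$ case. This sketch assumes NumPy; the concrete values of $a$, $b$, $W$ are arbitrary choices of ours:

```python
import numpy as np

a = np.array([[1.0], [2.0], [-1.0]])                  # m x 1, m = 3
b = np.array([[0.5], [3.0]])                          # n x 1, n = 2
W = np.array([[1.0, 2.0], [0.0, -1.0], [4.0, 0.5]])   # m x n

f = lambda M: float(a.T @ M @ b)

# Entry (i, j) of the n x m derivative is df / dW[j, i]
eps = 1e-6
D = np.zeros((2, 3))
for i in range(2):
    for j in range(3):
        dW = np.zeros_like(W)
        dW[j, i] = eps
        D[i, j] = (f(W + dW) - f(W - dW)) / (2 * eps)

print(np.allclose(D, b @ a.T, atol=1e-8))  # True
```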
Example: let $f(x) = \|x - a\|_2 = \sqrt{(x-a)^T (x-a)}$, the Euclidean distance from $a$ to $x$. Using the chain rule with $\dfrac{\partial (x-a)^T (x-a)}{\partial x} = 2\, (x^T - a^T)$:

$$\frac{\partial f}{\partial x} = \frac{x^T - a^T}{\sqrt{(x-a)^T (x-a)}}$$

So $\nabla_x f = \dfrac{x - a}{\|x - a\|_2}$, which is basically the unit displacement vector from $a$ to $x$. This means that to get maximum increase in $f(x)$, one should move away from $a$ along the straight line joining $a$ and $x$. Alternatively, to get maximum decrease in $f(x)$, one should move from $x$ directly towards $a$, which makes sense geometrically.
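This gradient can also be checked by finite differences. This sketch assumes NumPy; the vectors $x$ and $a$ are arbitrary choices of ours:

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0], [4.0]])
a = np.array([[0.0], [1.0], [1.0], [2.0]])

f = lambda v: float(np.sqrt((v - a).T @ (v - a)))   # ||v - a||_2

analytic = (x - a) / np.linalg.norm(x - a)          # unit vector from a to x

# Central-difference gradient (column vector, same dims as x)
eps = 1e-6
numeric = np.zeros((4, 1))
for i in range(4):
    dx = np.zeros_like(x)
    dx[i] = eps
    numeric[i, 0] = (f(x + dx) - f(x - dx)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```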
5 Notes and Further Reading
The chain rule and product rule do not always hold when dealing with matrices. However,
some modified forms can hold when using the T race(·) function. For a full list of derivatives,
the reader should consult a textbook or websites such as Wikipedia’s page on Matrix calculus.
Keep in mind that some texts may use denominator layout convention, where results will look
different.