Kenji Nakahira
Quantum Information Science Research Center,
Quantum ICT Research Institute, Tamagawa University
6-1-1 Tamagawa-gakuen, Machida, Tokyo 194-8610 Japan
E-mail: [email protected]
arXiv:2207.04377v1 [eess.SP] 10 Jul 2022

Abstract—We propose a diagrammatic notation for matrix differentiation. Our new notation enables us to derive formulas for matrix differentiation more easily than the usual matrix (or index) notation. We demonstrate the effectiveness of our notation through several examples.

I. Introduction

Matrix differentiation (or matrix calculus) is widely accepted as an essential tool in various fields, including estimation theory, signal processing, and machine learning. Matrix differentiation provides a convenient way to collect the derivative of each component of the dependent variable with respect to each component of the independent variable, where the dependent and independent variables can each be a scalar, a vector, or a matrix. However, the usual matrix (or index) notation often suffers from cumbersome calculations and difficulty in the intuitive interpretation of the final results. It is known that diagrammatic representations using string diagrams can be successfully applied in linear algebra (see [1] and references therein). In this paper, we provide a simple diagrammatic approach to deriving useful formulas for matrix differentiation.

Here we mention some related work. Ref. [2] presents a way of graphically representing the del operator (i.e., $\nabla$), in which calculations are limited to the case of three-dimensional Euclidean space. Ref. [3] presents a diagrammatic notation for manipulating tensor derivatives with respect to a single parameter. We adopt a notation similar to those given in these references.

II. Definition of matrix differentiation

Let $\mathbb{R}$ be the set of all real numbers and $\mathbb{R}^{m\times n}$ be the set of all $m\times n$ real matrices. Also, let $\{|i\rangle\}_{i=1}^{m}$ denote the standard basis of $\mathbb{R}^{m}$. We are concerned only with finite-dimensional real Hilbert spaces. Given a map $f$ from $\mathbb{R}^{m\times n}$ to $\mathbb{R}$ and a matrix $X \in \mathbb{R}^{m\times n}$ of independent variables, we denote by $\frac{\partial}{\partial X} f(X)$ the $m\times n$ real matrix whose $(i,j)$-th component is $\frac{\partial}{\partial X_{i,j}} f(X)$, where $X_{i,j} := \langle i|X|j\rangle$ is the $(i,j)$-th component of $X$. We have

$$\frac{\partial}{\partial X} f(X) = \sum_{i=1}^{m}\sum_{j=1}^{n} |i\rangle\langle j|\, \frac{\partial}{\partial X_{i,j}} f(X) = \begin{pmatrix} \frac{\partial}{\partial X_{1,1}} f(X) & \frac{\partial}{\partial X_{1,2}} f(X) & \cdots & \frac{\partial}{\partial X_{1,n}} f(X) \\ \frac{\partial}{\partial X_{2,1}} f(X) & \frac{\partial}{\partial X_{2,2}} f(X) & \cdots & \frac{\partial}{\partial X_{2,n}} f(X) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial}{\partial X_{m,1}} f(X) & \frac{\partial}{\partial X_{m,2}} f(X) & \cdots & \frac{\partial}{\partial X_{m,n}} f(X) \end{pmatrix}. \quad (1)$$

In the special case of $n = 1$, $X$ is a column vector, which is denoted by $|x\rangle$. In this case, we have

$$\frac{\partial}{\partial |x\rangle} f(|x\rangle) = \sum_{i=1}^{m} |i\rangle\, \frac{\partial}{\partial x_{i}} f(|x\rangle) = \begin{pmatrix} \frac{\partial}{\partial x_{1}} f(|x\rangle) \\ \frac{\partial}{\partial x_{2}} f(|x\rangle) \\ \vdots \\ \frac{\partial}{\partial x_{m}} f(|x\rangle) \end{pmatrix}, \quad (2)$$

where $x_{i} := \langle i|x\rangle$.

A similar notation is used when $f$ is a map from $\mathbb{R}^{m\times n}$ to $\mathbb{R}^{m'\times n'}$. For such $f$, $\frac{\partial}{\partial X} f(X)$ is an $m \times n \times m' \times n'$ fourth-order tensor with components $\{\frac{\partial}{\partial X_{i,j}} \langle i'|f(X)|j'\rangle\}_{i,j,i',j'}$. This can be written as the following $mm' \times nn'$ matrix:

$$\frac{\partial}{\partial X} f(X) = \begin{pmatrix} \frac{\partial}{\partial X_{1,1}} f(X) & \frac{\partial}{\partial X_{1,2}} f(X) & \cdots & \frac{\partial}{\partial X_{1,n}} f(X) \\ \frac{\partial}{\partial X_{2,1}} f(X) & \frac{\partial}{\partial X_{2,2}} f(X) & \cdots & \frac{\partial}{\partial X_{2,n}} f(X) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial}{\partial X_{m,1}} f(X) & \frac{\partial}{\partial X_{m,2}} f(X) & \cdots & \frac{\partial}{\partial X_{m,n}} f(X) \end{pmatrix}, \quad (3)$$

where, for each $i$ and $j$, $\frac{\partial}{\partial X_{i,j}} f(X)$ is the $m' \times n'$ matrix whose $(i',j')$-th component is $\frac{\partial}{\partial X_{i,j}} \langle i'|f(X)|j'\rangle$.

III. Diagrammatic notation

In diagrammatic terms, a matrix is represented as a box with an input wire at the bottom and an output wire at the top. Column vectors, row vectors, and scalars are regarded as special cases of matrices. For example, $A \in \mathbb{R}^{m\times n}$, $|x\rangle \in \mathbb{R}^{m} := \mathbb{R}^{m\times 1}$, $\langle y| \in \mathbb{R}^{m*} := \mathbb{R}^{1\times m}$, and $p \in \mathbb{R}$ are diagrammatically depicted as

[diagram]. (4)

The Hilbert space $\mathbb{R}^{m}$ is represented by the wire with label $m$, while the Hilbert space $\mathbb{R}$ is represented by
‘no wire’. For a scalar, the box will be omitted. Matrix multiplication and tensor products are represented as the sequential and parallel compositions, respectively. The identity matrix $1 \in \mathbb{R}^{m\times m}$ is depicted as

[diagram]. (5)

We often use a special column vector $|\cup_{n}\rangle \in \mathbb{R}^{n} \otimes \mathbb{R}^{n}$, called a cup, and a special row vector $\langle\cap_{n}| \in \mathbb{R}^{n*} \otimes \mathbb{R}^{n*}$, called a cap. The cup $|\cup_{n}\rangle$ is depicted as

[diagram]. (6)

The cap $\langle\cap_{n}|$ is the transpose of $|\cup_{n}\rangle$, which is depicted as

[diagram]. (7)

We have that, for any $X \in \mathbb{R}^{m\times n}$,

[diagram], (8)

[diagram], (9)

and the same argument works for the right equality. Equation (8) implies that the transpose acts diagrammatically by rotating boxes $180^{\circ}$. Substituting $X = 1$ into Eq. (8) yields

[diagram]. (10)

The trace of $X \in \mathbb{R}^{m\times m}$ satisfies $\mathrm{Tr}\, X = \langle\cap| X \otimes 1 |\cup\rangle$, i.e.,

[diagram]. (11)

We also use the swap matrix $\times_{n,m}$, depicted by

[diagram], (12)

and the matrix called a “spider”, depicted by

[diagram]. (13)

For details regarding the properties of these matrices, see, e.g., Ref. [1].

IV. Diagrammatic notation for matrix differentiation

We write $\frac{\partial}{\partial X} f(X)$ with a map $f : \mathbb{R}^{m\times n} \to \mathbb{R}^{m'\times n'}$ as

[diagram]. (15)

A. Derivatives of A and X

For any matrix $A$ that is independent of $X$, $\frac{\partial}{\partial X} A = 0$, i.e.,

[diagram]. (16)

The derivative of $X$ itself is depicted as

[diagram]. (17)

B. Rules for sums and products

The following sum rule holds:

$$\frac{\partial}{\partial X}[f(X) + g(X)] = \frac{\partial}{\partial X} f(X) + \frac{\partial}{\partial X} g(X), \quad (18)$$

which is diagrammatically represented as

[diagram]. (19)

As for matrix multiplication and tensor products, we have

$$\frac{\partial}{\partial X}[f(X)\, g(X)] = \left[\frac{\partial}{\partial X} f(X)\right] g(X) + f(X) \left[\frac{\partial}{\partial X} g(X)\right], \quad (20)$$

$$\frac{\partial}{\partial X}[f(X) \otimes h(X)] = \left[\frac{\partial}{\partial X} f(X)\right] \otimes h(X) + f(X) \otimes \left[\frac{\partial}{\partial X} h(X)\right], \quad (21)$$

which are depicted as

[diagram]. (22)

C. Chain rule

For a map $Y$ from $\mathbb{R}^{m\times n}$ to $\mathbb{R}^{k\times l}$, the chain rule gives

$$\frac{\partial}{\partial X_{i,j}} f[Y(X)] = \sum_{i'=1}^{k} \sum_{j'=1}^{l} \frac{\partial f[Y(X)]}{\partial Y_{i',j'}}\, \frac{\partial Y_{i',j'}}{\partial X_{i,j}}, \quad (23)$$

where $Y_{i',j'} := \langle i'|Y(X)|j'\rangle$. Thus, $\frac{\partial}{\partial X} f[Y(X)]$ can be diagrammatically represented by

[diagram]. (24)

All the formulas presented in this paper can be obtained using the above-mentioned equations. It is noteworthy that this paper is focused on matrix differentiation, but our notation can easily be extended to the case of higher-order tensors.

V. Other basic formulas

We derive several basic formulas.

A. Derivatives of matrix multiplication and tensor products

We immediately obtain

[diagram]. (25)
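The sum, product, and chain rules above can be sanity-checked numerically. Below is a minimal NumPy sketch using central finite differences; the test maps $f(X) = \mathrm{Tr}(AX)$, $g(X) = \mathrm{Tr}(BX)$, and $Y(X) = A_2 X B_2$ are illustration choices of ours, not taken from the paper.

```python
import numpy as np

# Finite-difference check of the product rule (20) and chain rule (23)
# for scalar-valued maps; the test functions are arbitrary choices.
rng = np.random.default_rng(0)
m, n, eps = 3, 4, 1e-6
A = rng.standard_normal((n, m))
B = rng.standard_normal((n, m))
X = rng.standard_normal((m, n))

def grad(h, X):
    """(i,j)-th entry is the central-difference estimate of dh/dX[i,j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X); E[i, j] = eps
            G[i, j] = (h(X + E) - h(X - E)) / (2 * eps)
    return G

f = lambda X: np.trace(A @ X)   # df/dX = A.T
g = lambda X: np.trace(B @ X)   # dg/dX = B.T

# Product rule (20): d(f g)/dX = (df/dX) g + f (dg/dX)
lhs1 = grad(lambda Y: f(Y) * g(Y), X)
rhs1 = grad(f, X) * g(X) + f(X) * grad(g, X)

# Chain rule (23): h(X) = Tr(Y Y^T) with Y(X) = A2 @ X @ B2
# assembles to dh/dX = A2^T (2 Y) B2^T.
A2 = rng.standard_normal((3, m))
B2 = rng.standard_normal((n, 2))
Y = lambda X: A2 @ X @ B2
h = lambda X: np.trace(Y(X) @ Y(X).T)
lhs2 = grad(h, X)
rhs2 = A2.T @ (2 * Y(X)) @ B2.T

print(np.allclose(lhs1, rhs1, atol=1e-4), np.allclose(lhs2, rhs2, atol=1e-4))  # True True
```

Both test functions are quadratic in $X$, so the central differences agree with the analytic gradients up to floating-point roundoff.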
B. Derivative of $X^{T}$

Since $X^{T}$ is represented by

[diagram], (26)

we have

[diagram; the steps follow from Eqs. (25) and (26)].
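Although the diagrams themselves are not reproduced here, the flattening convention of Eq. (3) can be checked numerically for $f(X) = X^{T}$: the resulting $mm' \times nn'$ matrix has the swap-type structure $\sum_{i,j} E_{ij} \otimes E_{ji}$. The construction below is our illustration of that convention, not the paper's diagram.

```python
import numpy as np

# For f(X) = X^T (so m' = n, n' = m), build the m m' x n n' matrix of
# Eq. (3) by finite differences and compare it with the swap-type matrix
# sum_{i,j} E_ij (x) E_ji.
m, n, eps = 3, 4, 1e-6
rng = np.random.default_rng(1)
X = rng.standard_normal((m, n))
f = lambda X: X.T
mp, nq = n, m   # output dimensions m', n'

D = np.zeros((m * mp, n * nq))
S = np.zeros((m * mp, n * nq))
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n)); E[i, j] = eps
        # (i,j)-th block is the m' x n' matrix df/dX[i,j]
        D[i*mp:(i+1)*mp, j*nq:(j+1)*nq] = (f(X + E) - f(X - E)) / (2 * eps)
        Eij = np.zeros((m, n)); Eij[i, j] = 1
        Eji = np.zeros((n, m)); Eji[j, i] = 1
        S += np.kron(Eij, Eji)
print(np.allclose(D, S))  # True
```

Since the transpose is linear in $X$, the finite differences are exact here up to roundoff.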
[diagrams for Eqs. (29)–(34); the steps cite Eqs. (17), (10), and (22)]

3) Other important examples:

[diagram]
A. Derivatives with respect to column vectors

1) $\frac{\partial}{\partial |x\rangle} \langle a|x\rangle = |a\rangle$:

Substituting $n = 1$ into Eq. (17) gives

[diagram]. (37)

Note that $\langle a|^{T} = |a\rangle$ holds since $|a\rangle$ is a real column vector.

2) $\frac{\partial}{\partial X} \mathrm{Tr}(AX) = A^{T}$:

[diagram; the steps follow from Eqs. (17), (8), and (31)]. Thus, we have

[diagram]. (38)

¹The second line follows from substituting $u := \| |x\rangle - |b\rangle \|_{2}^{2}$ into

$$\frac{\partial}{\partial |x\rangle} \sqrt{u} = \frac{\partial u}{\partial |x\rangle} \cdot \frac{\partial \sqrt{u}}{\partial u} = \frac{\partial u}{\partial |x\rangle} \cdot \frac{1}{2\sqrt{u}}, \quad (35)$$

which is immediately obtained by the chain rule.
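Examples 1) and 2) admit quick finite-difference checks. This is a sketch of ours; the random vectors and matrices are arbitrary.

```python
import numpy as np

# Finite-difference checks of examples 1) and 2).
rng = np.random.default_rng(2)
m, n, eps = 4, 3, 1e-6

# 1) d/d|x> <a|x> = |a>
a = rng.standard_normal(m)
x = rng.standard_normal(m)
g1 = np.array([(a @ (x + eps*e) - a @ (x - eps*e)) / (2*eps) for e in np.eye(m)])

# 2) d/dX Tr(AX) = A^T
A = rng.standard_normal((n, m))
X = rng.standard_normal((m, n))
G2 = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros((m, n)); E[i, j] = eps
        G2[i, j] = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2*eps)

print(np.allclose(g1, a, atol=1e-6), np.allclose(G2, A.T, atol=1e-6))  # True True
```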
3) $\frac{\partial}{\partial X} \mathrm{Tr}(XX^{T}) = 2X$:

[diagram; the steps follow from Eqs. (8) and (24)]. (39)

4) $\frac{\partial}{\partial X} \mathrm{Tr}(AXBX) = A^{T} X^{T} B^{T} + B^{T} X^{T} A^{T}$:

[diagram; the steps follow from Eqs. (17) and (8)]. (40)

5) $\frac{\partial}{\partial X} X^{-1} = -(1 \otimes X^{-1})\, |\cup\rangle\langle\cap|\, (1 \otimes X^{-1})$:

Letting $Z := \frac{\partial}{\partial X} X^{-1}$ and differentiating $X^{-1} = X^{-1} X X^{-1}$ with respect to $X$ gives $Z = Z + X^{-1} \left[\frac{\partial}{\partial X} X\right] X^{-1} + Z$ (with the middle term suitably tensored), so that $Z = -X^{-1} \left[\frac{\partial}{\partial X} X\right] X^{-1}$. Thus, we have

[diagram; the steps follow from Eqs. (30) and (17)]. (41)

6) $\frac{\partial}{\partial X} \mathrm{Tr}[(X + A)^{-1}] = -[(X + A)^{-2}]^{T}$:

[diagram]. (42)

7) $\frac{\partial}{\partial X} \mathrm{Tr}(A \circ X) = A \circ 1$, where $\circ$ denotes the Hadamard product:

[diagram]. (43)
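The scalar-valued formulas above, together with the Hessian identity of example 8) below, can be verified by finite differences. Example 5) yields a fourth-order tensor and is skipped here. The random matrices and the diagonal shift (added only to keep $X + A$ well-conditioned) are illustration choices of ours.

```python
import numpy as np

# Finite-difference checks of examples 3), 4), 6), 7), and the Hessian of 8).
rng = np.random.default_rng(3)
n, eps = 4, 1e-5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + 10*np.eye(n)   # keep X + A well-conditioned

def grad(h, X):
    G = np.zeros_like(X)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(X); E[i, j] = eps
            G[i, j] = (h(X + E) - h(X - E)) / (2*eps)
    return G

inv = np.linalg.inv
ok = [
    np.allclose(grad(lambda Y: np.trace(Y @ Y.T), X), 2*X, atol=1e-4),         # 3)
    np.allclose(grad(lambda Y: np.trace(A @ Y @ B @ Y), X),
                A.T @ X.T @ B.T + B.T @ X.T @ A.T, atol=1e-4),                 # 4)
    np.allclose(grad(lambda Y: np.trace(inv(Y + A)), X),
                -(inv(X + A) @ inv(X + A)).T, atol=1e-4),                      # 6)
    np.allclose(grad(lambda Y: np.trace(A * Y), X), A * np.eye(n), atol=1e-4), # 7)
]

# 8) Hessian of <x|A|x> + <b|x> is A + A^T (second-order central differences)
b = rng.standard_normal(n)
x = rng.standard_normal(n)
q = lambda v: v @ A @ v + b @ v
h = 1e-3
I = np.eye(n)
H = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        H[i, j] = (q(x + h*I[i] + h*I[j]) - q(x + h*I[i] - h*I[j])
                   - q(x - h*I[i] + h*I[j]) + q(x - h*I[i] - h*I[j])) / (4*h*h)
ok.append(np.allclose(H, A + A.T, atol=1e-5))
print(all(ok))  # True
```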
8) $\frac{\partial^{2}}{\partial |x\rangle\, \partial \langle x|} (\langle x|A|x\rangle + \langle b|x\rangle) = A + A^{T}$:

[diagram; the steps follow from Eqs. (34), (32), (31), and (10)]. (44)

This formula shows that the Hessian matrix of the quadratic function $\langle x|A|x\rangle + \langle b|x\rangle + c$ with $A \in \mathbb{R}^{m\times m}$, $|b\rangle \in \mathbb{R}^{m}$, and $c \in \mathbb{R}$ is $A + A^{T}$.

9) Other important examples:

[diagram]

References

[3] A. Toumi, R. Yeung, and G. de Felice, “Diagrammatic differentiation for quantum machine learning,” arXiv preprint arXiv:2103.07960, 2021.