Multivariate Chain Rule

In the multivariate case, where $x \in \mathbb{R}^n$, the basic differentiation rules that we know from school (e.g., sum rule, product rule, chain rule; in the scalar case, $(fg)' = f'g + fg'$, $(f+g)' = f' + g'$, and $(g \circ f)' = g'(f)\,f'$) still apply. However, we need to pay attention because now we have to deal with matrices, where multiplication is no longer commutative, i.e., the order matters.

$$\text{Product Rule:}\quad \frac{\partial}{\partial x}\bigl(f(x)g(x)\bigr) = \frac{\partial f}{\partial x}g(x) + f(x)\frac{\partial g}{\partial x} \tag{1}$$

$$\text{Sum Rule:}\quad \frac{\partial}{\partial x}\bigl(f(x) + g(x)\bigr) = \frac{\partial f}{\partial x} + \frac{\partial g}{\partial x} \tag{2}$$

$$\text{Chain Rule:}\quad \frac{\partial}{\partial x}(g \circ f)(x) = \frac{\partial}{\partial x}g(f(x)) = \frac{\partial g}{\partial f}\frac{\partial f}{\partial x} \tag{3}$$
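To make these rules concrete, here is a minimal numerical sanity check in Python (a sketch, not part of the original text; the choices $f = \sin$, $g = \exp$, and the test point are arbitrary assumptions):

```python
# Check rules (1)-(3) in the scalar case against a central
# finite-difference approximation of the derivative.
import math

def diff(fn, x, h=1e-6):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

f, g = math.sin, math.exp      # arbitrary example functions
df, dg = math.cos, math.exp    # their known derivatives
x0 = 0.7                       # arbitrary test point

# Product rule: (fg)' = f'g + fg'
assert abs(diff(lambda x: f(x) * g(x), x0)
           - (df(x0) * g(x0) + f(x0) * dg(x0))) < 1e-8

# Sum rule: (f + g)' = f' + g'
assert abs(diff(lambda x: f(x) + g(x), x0) - (df(x0) + dg(x0))) < 1e-8

# Chain rule: (g o f)' = g'(f) f'
assert abs(diff(lambda x: g(f(x)), x0) - dg(f(x0)) * df(x0)) < 1e-8
```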
Let us have a closer look at the chain rule. The chain rule formula (3) resembles to some degree the rules for matrix multiplication, where "neighboring" dimensions have to match for matrix multiplication to be defined. If we go from left to right, the chain rule exhibits similar properties: $\partial f$ shows up in the "denominator" of the first factor and in the "numerator" of the second factor. If we multiply the factors together, multiplication is defined, i.e., the dimensions of $\partial f$ match, and $\partial f$ "cancels", such that $\partial g/\partial x$ remains. (This is only an intuition, not a mathematically correct statement, since the partial derivative is not a fraction.)
Consider a function $f : \mathbb{R}^2 \to \mathbb{R}$ of two variables $x_1, x_2$. Furthermore, $x_1(t)$ and $x_2(t)$ are themselves functions of $t$. To compute the gradient of $f$ with respect to $t$, we need to apply the chain rule (3) for multivariate functions as
" #
df h i ∂x1 (t) ∂f ∂x1 ∂f ∂x2
∂f ∂f ∂t
= ∂x1 ∂x2 ∂x2 (t) = + (4)
dt ∂t
∂x1 ∂t ∂x 2 ∂t
where d denotes the gradient and ∂ partial derivatives.
Example:
Consider $f(x_1, x_2) = x_1^2 + 2x_2$, where $x_1 = \sin t$ and $x_2 = \cos t$; then

$$\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t} \tag{5}$$
$$= 2\sin t\,\frac{\partial \sin t}{\partial t} + 2\,\frac{\partial \cos t}{\partial t} \tag{6}$$
$$= 2\sin t\cos t - 2\sin t = 2\sin t(\cos t - 1) \tag{7}$$

is the corresponding derivative of $f$ with respect to $t$.
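As a quick sanity check (a Python sketch, not from the original text; the test point $t_0 = 1.3$ is an arbitrary assumption), we can compare the closed form (7) against a finite-difference approximation:

```python
# Numerical check of (7): df/dt for f(x1, x2) = x1^2 + 2*x2
# with x1 = sin(t), x2 = cos(t).
import math

def f_of_t(t):
    x1, x2 = math.sin(t), math.cos(t)
    return x1**2 + 2 * x2

t0 = 1.3                                                 # arbitrary test point
h = 1e-6
numeric = (f_of_t(t0 + h) - f_of_t(t0 - h)) / (2 * h)    # central difference
analytic = 2 * math.sin(t0) * (math.cos(t0) - 1)         # result (7)
assert abs(numeric - analytic) < 1e-8
```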
If $f(x_1, x_2)$ is a function of $x_1$ and $x_2$, where $x_1(s, t)$ and $x_2(s, t)$ are themselves functions of two variables $s$ and $t$, the chain rule yields

$$\frac{\mathrm{d}f}{\mathrm{d}s} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial s} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial s}, \tag{8}$$

$$\frac{\mathrm{d}f}{\mathrm{d}t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \frac{\partial f}{\partial x_2}\frac{\partial x_2}{\partial t}, \tag{9}$$
which can be expressed as the matrix multiplication

$$\frac{\mathrm{d}f}{\mathrm{d}(s,t)} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial (s,t)} = \underbrace{\begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{bmatrix}}_{=\,\frac{\partial f}{\partial x}}\; \underbrace{\begin{bmatrix} \dfrac{\partial x_1}{\partial s} & \dfrac{\partial x_1}{\partial t} \\[4pt] \dfrac{\partial x_2}{\partial s} & \dfrac{\partial x_2}{\partial t} \end{bmatrix}}_{=\,\frac{\partial x}{\partial (s,t)}}. \tag{10}$$
This compact way of writing the chain rule as a matrix multiplication only makes sense if the gradient is defined as a row vector. Otherwise, we need to start transposing gradients for the matrix dimensions to match. This may still be straightforward as long as the gradient is a vector or a matrix; however, when the gradient becomes a tensor (we will discuss this in the following), the transpose is no longer trivial.
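The following Python sketch illustrates (10) with the gradient laid out as a row vector; the concrete choices $x_1(s,t) = st$, $x_2(s,t) = s + t^2$, $f(x_1, x_2) = x_1^2 + 2x_2$, and the evaluation point are assumptions for illustration, not from the text:

```python
# Chain rule (10) as a matrix product, with the gradient as a ROW vector.
import numpy as np

def f(x):
    return x[0]**2 + 2 * x[1]

def x_of_st(s, t):
    return np.array([s * t, s + t**2])

s0, t0 = 0.5, -1.2                          # arbitrary evaluation point
x0 = x_of_st(s0, t0)

# df/dx: 1x2 row vector [df/dx1, df/dx2], evaluated at x(s0, t0).
df_dx = np.array([[2 * x0[0], 2.0]])

# dx/d(s,t): 2x2 Jacobian of (x1, x2) with respect to (s, t).
dx_dst = np.array([[t0, s0],                # dx1/ds, dx1/dt
                   [1.0, 2 * t0]])          # dx2/ds, dx2/dt

grad = df_dx @ dx_dst                       # 1x2 row vector [df/ds, df/dt]

# Finite-difference check of df/ds and df/dt.
h = 1e-6
fd = np.array([[(f(x_of_st(s0 + h, t0)) - f(x_of_st(s0 - h, t0))) / (2 * h),
                (f(x_of_st(s0, t0 + h)) - f(x_of_st(s0, t0 - h))) / (2 * h)]])
assert np.allclose(grad, fd, atol=1e-6)
```

Note how the row-vector convention makes the shapes line up without any transposes: $(1 \times 2)$ times $(2 \times 2)$ gives the $(1 \times 2)$ gradient with respect to $(s, t)$.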
Example (Gradient of a Linear Model):
Let us consider the linear model

$$y = \Phi\theta, \tag{11}$$

where $\theta \in \mathbb{R}^D$ is a parameter vector, $\Phi \in \mathbb{R}^{N \times D}$ are input features, and $y \in \mathbb{R}^N$ are the corresponding observations. We define the following functions:

$$L(e) := \|e\|^2, \tag{12}$$
$$e(\theta) := y - \Phi\theta. \tag{13}$$
We seek $\partial L/\partial \theta$, and we will use the chain rule for this purpose.
Before we start any calculation, we determine the dimensionality of the gradient as

$$\frac{\partial L}{\partial \theta} \in \mathbb{R}^{1 \times D}. \tag{14}$$
The chain rule allows us to compute the gradient as
$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial e}\frac{\partial e}{\partial \theta}. \tag{15}$$
We know that $\|e\|^2 = e^\top e$ and determine

$$\frac{\partial L}{\partial e} = 2e^\top \in \mathbb{R}^{1 \times N}. \tag{16}$$
Furthermore, we obtain
$$\frac{\partial e}{\partial \theta} = -\Phi \in \mathbb{R}^{N \times D}, \tag{17}$$
such that our desired derivative is
$$\frac{\partial L}{\partial \theta} = -2e^\top \Phi \overset{(13)}{=} -2\,\underbrace{(y^\top - \theta^\top \Phi^\top)}_{1 \times N}\,\underbrace{\Phi}_{N \times D} \in \mathbb{R}^{1 \times D}. \tag{18}$$
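As a sanity check (a Python sketch with arbitrary random data; the sizes $N = 5$, $D = 3$ and the seed are assumptions, not from the text), we can compare (18) against finite differences:

```python
# Numerical check of (18): dL/dtheta = -2 e^T Phi for L(e) = ||e||^2,
# e(theta) = y - Phi @ theta.
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed
N, D = 5, 3                                # arbitrary sizes
Phi = rng.standard_normal((N, D))
y = rng.standard_normal(N)
theta = rng.standard_normal(D)

def L(theta):
    e = y - Phi @ theta
    return e @ e                           # ||e||^2 = e^T e

e = y - Phi @ theta
analytic = -2 * e @ Phi                    # row vector (18), shape (D,)

# Finite-difference check, one coordinate of theta at a time.
h = 1e-6
fd = np.array([(L(theta + h * np.eye(D)[i]) - L(theta - h * np.eye(D)[i]))
               / (2 * h) for i in range(D)])
assert np.allclose(analytic, fd, atol=1e-5)
```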
Remark. We would have obtained the same result without using the chain rule by immediately looking at the function

$$L_2(\theta) := \|y - \Phi\theta\|^2 = (y - \Phi\theta)^\top (y - \Phi\theta). \tag{19}$$

This approach is still practical for simple functions like $L_2$ but becomes impractical for deep function compositions.
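For completeness (this worked step is not in the original text, but it follows from standard matrix calculus with the row-vector gradient convention), expanding (19) and differentiating directly gives

$$\frac{\partial L_2}{\partial \theta} = \frac{\partial}{\partial \theta}\left(y^\top y - 2y^\top \Phi\theta + \theta^\top \Phi^\top \Phi\theta\right) = -2y^\top \Phi + 2\theta^\top \Phi^\top \Phi = -2(y^\top - \theta^\top \Phi^\top)\Phi,$$

which matches (18).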