AI for Medicine
Lecture 21:
Deep Learning – Part II
April 12, 2021
Mohammad Hammoud
Carnegie Mellon University in Qatar
Today…
• Last Monday’s Session:
• Deep Learning – Part I
• Today’s Session:
• Deep Learning – Part II
• Announcements:
• Assignment 3 is due on Wednesday, April 14, by midnight
• Quiz II is on April 19
Outline
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
The Flow of Computations in Neural Networks
• The flow of computations in a neural network goes in two directions:
  1. Left-to-right: This is referred to as forward propagation, which results in computing the output of the network
  2. Right-to-left: This is referred to as backward propagation (or backpropagation), which results in computing the gradients (or derivatives) of the parameters in the network
• The intuition behind this two-way flow of computations can be explained through the concept of "computation graphs"
• What is a computation graph?
What is a Computation Graph?
• Let us assume we want to compute the following function:
  J(a, b, c) = 3(a + bc)
• We can decompose J into three simpler computations, each represented as a node in a graph:
  u = bc
  v = a + u
  J = 3v
• With the inputs a = 2, b = 4, and c = 3, the graph evaluates to u = 12, v = 14, and J = 42
Forward Propagation
• Let us compute the function by moving through the computation graph from left to right, evaluating each node from its inputs:
  u = bc = 4 × 3 = 12
  v = a + u = 2 + 12 = 14
  J = 3v = 3 × 14 = 42
• Forward propagation allows computing the output J of the network
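To make the left-to-right flow concrete, below is a minimal Python sketch (not from the slides; the function name forward and the cached tuple are illustrative choices) that evaluates the same graph one node at a time:

```python
# Forward propagation through J(a, b, c) = 3(a + bc), one node at a time.

def forward(a, b, c):
    u = b * c          # first node:  u = bc
    v = a + u          # second node: v = a + u
    J = 3 * v          # output node: J = 3v
    return J, (u, v)   # cache the intermediate values for backpropagation

J, (u, v) = forward(a=2, b=4, c=3)
print(u, v, J)         # 12 14 42, matching the slide
```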
Backward Propagation
• Let us now compute the derivatives of the variables through the computation graph, moving from right to left
• Start with dJ/dv, the derivative of J with respect to v
• If we change v a little bit (from 14 to 14.001), J changes from 42 to 42.003
• Hence, dJ/dv = 3
• That is, to compute the derivative of J with respect to v, we went back to v, nudged it, and measured the corresponding resultant increase in J
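This nudge can be replayed numerically; here is a small illustrative sketch (not from the slides):

```python
# Nudge check for dJ/dv: perturb v by 0.001 and see how J responds.
J = lambda v: 3 * v
v, eps = 14, 0.001
print(J(v), J(v + eps))            # 42 and ~42.003
print((J(v + eps) - J(v)) / eps)   # ~3.0, i.e., dJ/dv = 3
```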
Backward Propagation
• Next, consider dJ/da, the derivative of J with respect to a
• If we change a a little bit (from 2 to 2.001), v changes from 14 to 14.001, and J changes from 42 to 42.003
• The change in a caused a change in v, and the change in v caused a change in J
• This multiplicative relationship is denoted as the chain rule in calculus:
  dJ/da = dJ/dv × dv/da = 3 × 1 = 3
• In essence, to compute the derivative of J with respect to a, we went back to v, nudged it a little bit, and measured the corresponding resultant increase in J; then, we went back to a, nudged it a little bit, and measured the corresponding resultant increase in v; finally, we multiplied the changes together (i.e., we applied the chain rule!)
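The two nudges can also be done numerically and multiplied together; a small illustrative sketch (not from the slides):

```python
# Chain-rule check for dJ/da: nudge a to estimate dv/da, nudge v to estimate
# dJ/dv, then multiply the two local changes.
v_of = lambda a, u: a + u     # v = a + u
J_of = lambda v: 3 * v        # J = 3v

a, u, v, eps = 2, 12, 14, 0.001
dv_da = (v_of(a + eps, u) - v_of(a, u)) / eps   # ~1
dJ_dv = (J_of(v + eps) - J_of(v)) / eps         # ~3
print(dJ_dv * dv_da)                            # ~3, i.e., dJ/da = 3
```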
Backward Propagation
• Similarly, consider dJ/du, the derivative of J with respect to u
• If we change u a little bit (from 12 to 12.001), v changes from 14 to 14.001, and J changes from 42 to 42.003
• By the chain rule:
  dJ/du = dJ/dv × dv/du = 3 × 1 = 3
• Same as before, we had to go back to v and then to u in order to compute the derivative of J
Backward Propagation
• Now consider dJ/db, the derivative of J with respect to b
• If we change b a little bit (from 4 to 4.001), u changes from 12 to 12.003 (since u = bc and c = 3), v changes from 14 to 14.003, and J changes from 42 to 42.009
• By the chain rule:
  dJ/db = dJ/dv × dv/du × du/db = 3 × 1 × 3 = 9
Backward Propagation
• Finally, consider dJ/dc, the derivative of J with respect to c
• If we change c a little bit (from 3 to 3.001), u changes from 12 to 12.004 (since u = bc and b = 4), v changes from 14 to 14.004, and J changes from 42 to 42.012
• By the chain rule:
  dJ/dc = dJ/dv × dv/du × du/dc = 3 × 1 × 4 = 12
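Putting the four derivatives together, below is a minimal Python sketch (not from the slides; the function names are illustrative) of one full backward pass through the graph, with a numerical nudge check at the end:

```python
# Backward propagation through J(a, b, c) = 3(a + bc) via the chain rule.

def forward(a, b, c):
    return 3 * (a + b * c)

def backward(a, b, c):
    # Local derivatives, read directly off each node of the graph:
    dJ_dv = 3        # J = 3v
    dv_da = 1        # v = a + u
    dv_du = 1
    du_db = c        # u = bc
    du_dc = b
    # Chain rule, applied right to left:
    dJ_da = dJ_dv * dv_da            # 3 * 1     = 3
    dJ_db = dJ_dv * dv_du * du_db    # 3 * 1 * 3 = 9
    dJ_dc = dJ_dv * dv_du * du_dc    # 3 * 1 * 4 = 12
    return dJ_da, dJ_db, dJ_dc

print(backward(2, 4, 3))   # (3, 9, 12)

# Nudge check: perturbing c by 0.001 should change J by about 0.012
eps = 0.001
print((forward(2, 4, 3 + eps) - forward(2, 4, 3)) / eps)   # ~12
```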
Outline
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
The Computation Graph of Logistic Regression
• Let us translate logistic regression (which is a neural network with only 1 neuron) into a computation graph:
  z = wᵀx + b  →  a = σ(z)  →  L(a, y)
• [Diagram: the inputs x₁, x₂, x₃, weighted by w₁, w₂, w₃ and shifted by the bias b, feed into z = wᵀx + b, which passes through the sigmoid to produce a = σ(z) = ŷ]
• Here, x is the input vector, w and b are the parameters, a = σ(z) is the prediction ŷ, and L(a, y) is the cost (or loss) function
Forward Propagation
• The loss function can be computed by moving from left to right through the graph:
  z = wᵀx + b  →  a = σ(z)  →  L(a, y) = −y log(a) − (1 − y) log(1 − a)
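Below is a minimal Python sketch (not from the slides; the function name and the cached return values are illustrative) of this left-to-right pass for a single example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, y):
    z = np.dot(w, x) + b                                # z = w^T x + b
    a = sigmoid(z)                                      # a = sigma(z) = y_hat
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # cross-entropy loss
    return z, a, loss                                   # cache z and a for backprop
```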
Backward Propagation
• The derivatives can be computed by moving from right to left through the graph
• Start with ∂L/∂a, the partial derivative of L with respect to a:
  ∂L/∂a = ∂/∂a (−y log(a) − (1 − y) log(1 − a)) = −y/a + (1 − y)/(1 − a)
Backward Propagation
• Next, compute ∂L/∂z, the partial derivative of L with respect to z, via the chain rule:
  ∂L/∂z = ∂L/∂a × ∂a/∂z = (−y/a + (1 − y)/(1 − a)) × a(1 − a) = a − y
• Here, ∂a/∂z = a(1 − a) is the derivative of the sigmoid function
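As a sanity check, the analytic result ∂L/∂z = a − y can be compared against a finite-difference estimate; a small illustrative sketch (not from the slides):

```python
import numpy as np

def loss_of_z(z, y):
    a = 1.0 / (1.0 + np.exp(-z))                       # a = sigma(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y, eps = 0.7, 1.0, 1e-6
numeric = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)
analytic = 1.0 / (1.0 + np.exp(-z)) - y                # a - y
print(numeric, analytic)                               # both ~ -0.3318
```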
Backward Propagation
• Next, compute ∂L/∂b, the partial derivative of L with respect to b:
  ∂L/∂b = ∂L/∂a × ∂a/∂z × ∂z/∂b = (a − y) × 1 = a − y
Backward Propagation
• Finally, compute ∂L/∂w, the partial derivative of L with respect to w:
  ∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w = (a − y) x
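Combining the two passes, below is a minimal, self-contained Python sketch (not from the slides; the toy weights and inputs are illustrative) of one full forward and backward pass, applying exactly the three derivatives derived above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(w, b, x, y):
    # Forward propagation (left to right)
    z = np.dot(w, x) + b
    a = sigmoid(z)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))
    # Backward propagation (right to left, via the chain rule)
    dz = a - y          # dL/dz = a - y
    dw = dz * x         # dL/dw = (a - y) x
    db = dz             # dL/db = a - y
    return loss, dw, db

# Toy example with 3 features, mirroring the x1, x2, x3 diagram
w = np.array([0.1, -0.2, 0.3])
x = np.array([1.0, 2.0, 3.0])
loss, dw, db = forward_backward(w, 0.0, x, y=1)
print(loss, dw, db)
```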
Next Monday’s Lecture…
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
• Continue…