AI for Medicine
Lecture 21:
Deep Learning – Part II
April 12, 2021
Mohammad Hammoud
Carnegie Mellon University in Qatar
Today…
• Last Monday’s Session:
• Deep Learning – Part I
• Today’s Session:
• Deep Learning – Part II
• Announcements:
• Assignment 3 is due on Wednesday, April 14, by midnight
• Quiz II is on April 19
Outline
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
The Flow of Computations in Neural Networks
• The flow of computations in a neural network goes in two directions:
  1. Left-to-right: This is referred to as forward propagation, which results in computing the output of the network
  2. Right-to-left: This is referred to as backward propagation (or backpropagation), which results in computing the gradients (or derivatives) of the parameters in the network
• The intuition behind this two-way flow of computations can be explained through the concept of "computation graphs"
• What is a computation graph?
What is a Computation Graph?
• Let us assume we want to compute the following function:
  J(a, b, c) = 3(a + bc)
• We can decompose J into three simpler computations, each represented as a node in a graph:
  u = bc
  v = a + u
  J = 3v
• With the inputs a = 2, b = 4, and c = 3, the graph evaluates to u = 12, v = 14, and J = 42
Forward Propagation
• Let us compute the function by moving through the computation graph from left to right, evaluating each node from its inputs:
  u = bc = 4 × 3 = 12
  v = a + u = 2 + 12 = 14
  J = 3v = 3 × 14 = 42
• Forward propagation allows computing the output J of the network
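To make the left-to-right flow concrete, below is a minimal Python sketch (not from the slides; the function name forward and the cached tuple are illustrative choices) that evaluates the same graph one node at a time:

```python
# Forward propagation through J(a, b, c) = 3(a + bc), one node at a time.

def forward(a, b, c):
    u = b * c          # first node:  u = bc
    v = a + u          # second node: v = a + u
    J = 3 * v          # output node: J = 3v
    return J, (u, v)   # cache the intermediate values for backpropagation

J, (u, v) = forward(a=2, b=4, c=3)
print(u, v, J)         # 12 14 42, matching the slide
```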
Backward Propagation
• Let us now compute the derivatives of the variables through the computation graph, moving from right to left
• Start with dJ/dv, the derivative of J with respect to v
• If we change v a little bit (from 14 to 14.001), J changes from 42 to 42.003
• Hence, dJ/dv = 3
• That is, to compute the derivative of J with respect to v, we went back to v, nudged it, and measured the corresponding resultant increase in J
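This nudge can be replayed numerically; here is a small illustrative sketch (not from the slides):

```python
# Nudge check for dJ/dv: perturb v by 0.001 and see how J responds.
J = lambda v: 3 * v
v, eps = 14, 0.001
print(J(v), J(v + eps))            # 42 and ~42.003
print((J(v + eps) - J(v)) / eps)   # ~3.0, i.e., dJ/dv = 3
```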
Backward Propagation
• Next, consider dJ/da, the derivative of J with respect to a
• If we change a a little bit (from 2 to 2.001), v changes from 14 to 14.001, and J changes from 42 to 42.003
• The change in a caused a change in v, and the change in v caused a change in J
• This multiplicative relationship is denoted as the chain rule in calculus:
  dJ/da = dJ/dv × dv/da = 3 × 1 = 3
• In essence, to compute the derivative of J with respect to a, we went back to v, nudged it a little bit, and measured the corresponding resultant increase in J; then, we went back to a, nudged it a little bit, and measured the corresponding resultant increase in v; finally, we multiplied the changes together (i.e., we applied the chain rule!)
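The two nudges can also be done numerically and multiplied together; a small illustrative sketch (not from the slides):

```python
# Chain-rule check for dJ/da: nudge a to estimate dv/da, nudge v to estimate
# dJ/dv, then multiply the two local changes.
v_of = lambda a, u: a + u     # v = a + u
J_of = lambda v: 3 * v        # J = 3v

a, u, v, eps = 2, 12, 14, 0.001
dv_da = (v_of(a + eps, u) - v_of(a, u)) / eps   # ~1
dJ_dv = (J_of(v + eps) - J_of(v)) / eps         # ~3
print(dJ_dv * dv_da)                            # ~3, i.e., dJ/da = 3
```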
Backward Propagation
• Similarly, consider dJ/du, the derivative of J with respect to u
• If we change u a little bit (from 12 to 12.001), v changes from 14 to 14.001, and J changes from 42 to 42.003
• By the chain rule:
  dJ/du = dJ/dv × dv/du = 3 × 1 = 3
• Same as before, we had to go back to v and then to u in order to compute the derivative of J
Backward Propagation
• Now consider dJ/db, the derivative of J with respect to b
• If we change b a little bit (from 4 to 4.001), u changes from 12 to 12.003 (since u = bc and c = 3), v changes from 14 to 14.003, and J changes from 42 to 42.009
• By the chain rule:
  dJ/db = dJ/dv × dv/du × du/db = 3 × 1 × 3 = 9
Backward Propagation
• Finally, consider dJ/dc, the derivative of J with respect to c
• If we change c a little bit (from 3 to 3.001), u changes from 12 to 12.004 (since u = bc and b = 4), v changes from 14 to 14.004, and J changes from 42 to 42.012
• By the chain rule:
  dJ/dc = dJ/dv × dv/du × du/dc = 3 × 1 × 4 = 12
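Putting the four derivatives together, below is a minimal Python sketch (not from the slides; the function names are illustrative) of one full backward pass through the graph, with a numerical nudge check at the end:

```python
# Backward propagation through J(a, b, c) = 3(a + bc) via the chain rule.

def forward(a, b, c):
    return 3 * (a + b * c)

def backward(a, b, c):
    # Local derivatives, read directly off each node of the graph:
    dJ_dv = 3        # J = 3v
    dv_da = 1        # v = a + u
    dv_du = 1
    du_db = c        # u = bc
    du_dc = b
    # Chain rule, applied right to left:
    dJ_da = dJ_dv * dv_da            # 3 * 1     = 3
    dJ_db = dJ_dv * dv_du * du_db    # 3 * 1 * 3 = 9
    dJ_dc = dJ_dv * dv_du * du_dc    # 3 * 1 * 4 = 12
    return dJ_da, dJ_db, dJ_dc

print(backward(2, 4, 3))   # (3, 9, 12)

# Nudge check: perturbing c by 0.001 should change J by about 0.012
eps = 0.001
print((forward(2, 4, 3 + eps) - forward(2, 4, 3)) / eps)   # ~12
```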
Outline
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
The Computation Graph of Logistic Regression
• Let us translate logistic regression (which is a neural network with only 1 neuron) into a computation graph:
  z = wᵀx + b  →  a = σ(z)  →  L(a, y)
• [Diagram: the inputs x₁, x₂, x₃, weighted by w₁, w₂, w₃ and shifted by the bias b, feed into z = wᵀx + b, which passes through the sigmoid to produce a = σ(z) = ŷ]
• Here, x is the input vector, w and b are the parameters, a = σ(z) is the prediction ŷ, and L(a, y) is the cost (or loss) function
Forward Propagation
• The loss function can be computed by moving from left to right through the graph:
  z = wᵀx + b  →  a = σ(z)  →  L(a, y) = −y log(a) − (1 − y) log(1 − a)
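Below is a minimal Python sketch (not from the slides; the function name and the cached return values are illustrative) of this left-to-right pass for a single example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x, y):
    z = np.dot(w, x) + b                                # z = w^T x + b
    a = sigmoid(z)                                      # a = sigma(z) = y_hat
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # cross-entropy loss
    return z, a, loss                                   # cache z and a for backprop
```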
Backward Propagation
• The derivatives can be computed by moving from right to left through the graph
• Start with ∂L/∂a, the partial derivative of L with respect to a:
  ∂L/∂a = ∂/∂a (−y log(a) − (1 − y) log(1 − a)) = −y/a + (1 − y)/(1 − a)
Backward Propagation
• Next, compute ∂L/∂z, the partial derivative of L with respect to z, via the chain rule:
  ∂L/∂z = ∂L/∂a × ∂a/∂z = (−y/a + (1 − y)/(1 − a)) × a(1 − a) = a − y
• Here, ∂a/∂z = a(1 − a) is the derivative of the sigmoid function
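As a sanity check, the analytic result ∂L/∂z = a − y can be compared against a finite-difference estimate; a small illustrative sketch (not from the slides):

```python
import numpy as np

def loss_of_z(z, y):
    a = 1.0 / (1.0 + np.exp(-z))                       # a = sigma(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y, eps = 0.7, 1.0, 1e-6
numeric = (loss_of_z(z + eps, y) - loss_of_z(z - eps, y)) / (2 * eps)
analytic = 1.0 / (1.0 + np.exp(-z)) - y                # a - y
print(numeric, analytic)                               # both ~ -0.3318
```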
Backward Propagation
• Next, compute ∂L/∂b, the partial derivative of L with respect to b:
  ∂L/∂b = ∂L/∂a × ∂a/∂z × ∂z/∂b = (a − y) × 1 = a − y
Backward Propagation
• Finally, compute ∂L/∂w, the partial derivative of L with respect to w:
  ∂L/∂w = ∂L/∂a × ∂a/∂z × ∂z/∂w = (a − y) x
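Combining the two passes, below is a minimal, self-contained Python sketch (not from the slides; the toy weights and inputs are illustrative) of one full forward and backward pass, applying exactly the three derivatives derived above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(w, b, x, y):
    # Forward propagation (left to right)
    z = np.dot(w, x) + b
    a = sigmoid(z)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))
    # Backward propagation (right to left, via the chain rule)
    dz = a - y          # dL/dz = a - y
    dw = dz * x         # dL/dw = (a - y) x
    db = dz             # dL/db = a - y
    return loss, dw, db

# Toy example with 3 features, mirroring the x1, x2, x3 diagram
w = np.array([0.1, -0.2, 0.3])
x = np.array([1.0, 2.0, 3.0])
loss, dw, db = forward_backward(w, 0.0, x, y=1)
print(loss, dw, db)
```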
Next Monday’s Lecture…
• Deep Learning:
  • Overview
  • Computation Graph
  • Gradient Descent
  • Vectorization
• Continue…