WEEK 9
Topic
Lecture 41: Popular CNN Models V
Concepts Covered:
CNN
AlexNet
VGG Net
Transfer Learning
Challenges in Deep Learning
GoogLeNet
ResNet
etc.
Challenges
Deep learning is data hungry.
Overfitting or lack of generalization.
Vanishing/Exploding Gradient Problem.
Appropriate Learning Rate.
Covariate Shift.
Effective training.
Vanishing Gradient Problem
Network: X → f1 → f2 → f3 → f4 → O, with weights W1, W2, W3, W4 feeding f1, f2, f3, f4.

∂O/∂W1 = X · f1′ · W2 · f2′ · W3 · f3′ · W4 · f4′
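To see how this product behaves, here is a small illustrative sketch (not from the lecture) that multiplies the activation derivatives along a deep chain, comparing a saturating sigmoid with ReLU. The depth of 20 layers and the pre-activation value are arbitrary assumptions, and the weights are taken as 1 so only the activation derivatives matter.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)                           # never exceeds 0.25

    # Product of activation derivatives along the chain, as in dO/dW1 above.
    z = np.full(20, 1.5)                               # 20 stacked layers (assumed)
    print("sigmoid:", np.prod(sigmoid_prime(z)))       # of the order of 1e-17: the gradient vanishes
    print("ReLU:   ", np.prod(np.ones_like(z)))        # 1.0: ReLU derivative is 1 for z > 0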
Vanishing Gradient Problem
Choice of activation function: ReLU instead of Sigmoid.
Appropriate initialization of weights (a small initialization sketch is given below).
Intelligent Back Propagation Learning Algorithm.
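As a concrete illustration of the weight-initialization remedy, the sketch below uses He initialization (standard deviation sqrt(2/fan_in)), a common choice for ReLU networks; the layer sizes are assumptions for the example, not from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)

    def he_init(fan_in, fan_out):
        # He/Kaiming initialization: std = sqrt(2 / fan_in), a common choice
        # for ReLU layers; it keeps the activation variance roughly constant
        # with depth, so gradients are less likely to vanish or explode early.
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

    # Assumed layer sizes for the example: 784 -> 256 -> 64
    W1 = he_init(784, 256)
    W2 = he_init(256, 64)
    print(W1.std(), W2.std())   # close to sqrt(2/784) and sqrt(2/256)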
GoogLeNet
ILSVRC 2014 Winner
GoogLeNet
(Architecture figure: convolution layers, feature concatenation, and a final softmax layer; 27 layers in total including the max-pool layers.)
GoogLeNet
Inception Module
Inception Module
Computes 1×1, 3×3, and 5×5 convolutions within the same module of the network.
Covers a larger area while preserving fine resolution for the small details in the image.
Uses convolution kernels of different sizes in parallel, from the finest detail (1×1) to a larger context (5×5).
The 1×1 convolutions also reduce computation.
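A minimal sketch of such a parallel-branch module, written in PyTorch, is given below. It is not the exact GoogLeNet configuration; the branch channel counts are assumptions chosen only so the example output has 512 channels on a 14×14 feature map.

    import torch
    import torch.nn as nn

    class InceptionSketch(nn.Module):
        """Parallel 1x1, 3x3 and 5x5 convolutions plus pooling, concatenated
        along the channel dimension. The 1x1 convolutions in front of the
        3x3 and 5x5 branches reduce the number of input channels and hence
        the computation."""
        def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
            super().__init__()
            self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
            self.b3 = nn.Sequential(
                nn.Conv2d(in_ch, c3_red, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(c3_red, c3, kernel_size=3, padding=1))
            self.b5 = nn.Sequential(
                nn.Conv2d(in_ch, c5_red, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(c5_red, c5, kernel_size=5, padding=2))
            self.bp = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                nn.Conv2d(in_ch, pool_proj, kernel_size=1))

        def forward(self, x):
            # Every branch preserves the spatial size, so the outputs can be
            # stacked along the channel dimension.
            return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

    # Example with assumed channel counts: 480 input channels, 14x14 feature map.
    x = torch.randn(1, 480, 14, 14)
    y = InceptionSketch(480, 192, 96, 208, 16, 48, 64)(x)
    print(y.shape)   # torch.Size([1, 512, 14, 14])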
Inception Module
With a 1×1 reduction (480 → 16 channels) before the 5×5 convolution:
Number of operations for the 1×1 = (14×14×16)×(1×1×480) = 1.5M
Number of operations for the 5×5 = (14×14×48)×(5×5×16) = 3.8M
Total number of operations = 1.5M + 3.8M = 5.3M

Without the reduction, a direct 5×5 convolution would need:
Number of operations = (14×14×48)×(5×5×480) = 112.9M
https://medium.com/coinmonks/paper-review-of-googlenet-inception-v1-winner-of-ilsvlc-2014-image-classification-c2b3565a64e7
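The figures above can be reproduced with a few lines of arithmetic (multiply–accumulate counts only; biases and activations ignored):

    # 14x14 output, 480 input channels, 48 output channels, 5x5 kernels.
    direct_5x5 = (14 * 14 * 48) * (5 * 5 * 480)       # 112,896,000  (~112.9M)
    reduce_1x1 = (14 * 14 * 16) * (1 * 1 * 480)       #   1,505,280  (~1.5M), 480 -> 16 channels
    then_5x5   = (14 * 14 * 48) * (5 * 5 * 16)        #   3,763,200  (~3.8M), 16 -> 48 channels
    print(direct_5x5, reduce_1x1 + then_5x5)          # 112896000 5268480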
Inception Module
The outputs of these filters are then stacked along the channel dimension.
Multi-level feature extractor.
There are 9 such inception modules.
Top-5 error rate of less than 7%.
GoogLeNet
Auxiliary Classifier
Auxiliary Classifier
Due to the large depth of the network, the ability to propagate gradients back through all the layers was a concern.
Auxiliary classifiers are smaller CNNs put on top of the middle Inception modules.
Adding auxiliary classifiers in the middle exploits the discriminative power of the features produced by the intermediate layers.
Auxiliary Classifier
During training, the losses of the auxiliary classifiers are added to the total loss of the network.
Losses from the auxiliary classifiers are weighted by 0.3.
Auxiliary classifiers are discarded at inference time.
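A minimal sketch of how the weighted auxiliary losses might be combined during training; the function name and the three logits tensors are placeholders, not the lecture's code.

    import torch.nn.functional as F

    def training_loss(main_logits, aux1_logits, aux2_logits, targets):
        # Auxiliary losses are added with weight 0.3 during training only;
        # at inference time the auxiliary heads are simply not evaluated.
        main = F.cross_entropy(main_logits, targets)
        aux = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
        return main + 0.3 * aux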
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur
Topic
Lecture 42: Popular CNN Models VI
Concepts Covered:
CNN
Challenges in Deep Learning
GoogLeNet
ResNet
Momentum Optimizer
Challenges
Deep learning is data hungry.
Overfitting or lack of generalization.
Vanishing/Exploding Gradient Problem.
Appropriate Learning Rate.
Covariate Shift.
Effective training.
Vanishing Gradient Problem
Choice of activation function: ReLU instead of Sigmoid.
Appropriate initialization of weights.
Intelligent Back Propagation Learning Algorithm.
GoogLeNet
Inception Module
GoogLeNet
Auxiliary Classifier
ResNet
ResNet
The core idea is the introduction of a skip connection (identity shortcut connection) that skips one or more layers.
Stacking layers should not degrade performance compared to the shallow counterpart.
The weight layers learn the residual F(x) = H(x) − x.
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
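A minimal residual block sketch in PyTorch, assuming equal input and output channels so the identity shortcut applies directly; the channel counts and the use of batch normalization are assumptions, not the exact ResNet block.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualBlock(nn.Module):
        """Two weight layers learn the residual F(x); the identity shortcut
        adds x back, so the block outputs H(x) = F(x) + x."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + x)      # skip connection: add the input back

    x = torch.randn(1, 64, 56, 56)      # assumed input size
    print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])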
ResNet
By stacking identity mappings, the resultant deep network should give at least the same performance as its shallow counterpart.
A deeper network should not give higher training error than a shallower network.
During learning, the gradient can flow back to any earlier layer through the shortcut connections, alleviating the vanishing gradient problem.
ResNet
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035
ResNet
Forward flow (skip connection from layer l−2 to layer l with weights W^(l−2,l); normal path through layer l−1 with weights W^(l−1,l)):

a^l = f( W^(l−1,l) · a^(l−1) + b^l + W^(l−2,l) · a^(l−2) )
    = f( z^l + W^(l−2,l) · a^(l−2) )

a^l = f( z^l + a^(l−2) )    if the dimensions are the same (identity shortcut)
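The same forward flow written with plain arrays, assuming the dimensions of layer l and layer l−2 match so that the identity skip applies; the sizes and weights are arbitrary values for illustration.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    rng = np.random.default_rng(0)
    d = 8                                      # assumed width of all three layers
    a_lm2 = rng.normal(size=d)                 # a^(l-2), activation of layer l-2
    W_lm1 = 0.1 * rng.normal(size=(d, d))      # weights into layer l-1
    W_l   = 0.1 * rng.normal(size=(d, d))      # W^(l-1,l), weights into layer l
    b_lm1 = np.zeros(d)
    b_l   = np.zeros(d)

    a_lm1 = relu(W_lm1 @ a_lm2 + b_lm1)        # normal path through layer l-1
    z_l   = W_l @ a_lm1 + b_l
    a_l   = relu(z_l + a_lm2)                  # a^l = f(z^l + a^(l-2)), identity skip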
ResNet
Backward propagation:

∇W^(l−1,l) = −a^(l−1) · δ^l    (normal path)
∇W^(l−2,l) = −a^(l−2) · δ^l    (skip path)
Optimizing Gradient Descent
Gradient Descent Challenges
Challenges of Mini-batch Gradient Descent
Choice of Proper Learning Rate:
Too small a learning rate leads to slow convergence.
A large learning rate may lead to oscillation around the minima or may even diverge.
Gradient Descent Challenges
Learning Rate Schedules: changing the learning rate according to some predefined schedule (a minimal step-decay sketch is given below).
The same learning rate applies to all parameter updates.
The data may be sparse and different features have very different frequencies.
Updating all of them to the same extent might not be proper.
Larger updates for rarely occurring features might be a better choice.
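As an illustration of a predefined schedule, a simple step decay is sketched below; the decay factor and step size are arbitrary assumptions.

    def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
        # Halve the learning rate every 10 epochs, independent of the data:
        # the same global rate is still applied to every parameter.
        return initial_lr * (drop ** (epoch // epochs_per_drop))

    print([step_decay(0.1, e) for e in range(0, 40, 10)])   # [0.1, 0.05, 0.025, 0.0125]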
Gradient Descent Challenges
Avoiding getting trapped in suboptimal local minima.
Difficulty arises from saddle points, i.e. points where one dimension slopes up and another slopes down.
These saddle points are usually surrounded by a plateau of the same error, which makes it hard for SGD to escape, as the gradient is close to zero in all dimensions.
Momentum Optimizer
(Figure: loss contours L(W) in the W1–W2 plane.)
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur
Topic
Lecture 43: Optimizing Gradient Descent I
Challenges
Deep learning is data hungry.
Overfitting or lack of generalization.
Vanishing/Exploding Gradient Problem.
Appropriate Learning Rate.
Covariate Shift.
Effective training.
Concepts Covered:
CNN
ResNet
Gradient Descent Challenges
Momentum Optimizer
Nesterov Accelerated Gradient
Adagrad.
etc.
Gradient Descent Challenges
Challenges of Mini-batch Gradient Descent
(Figure: loss L(W) plotted against the parameter W.)
Choice of Proper Learning Rate:
Too small a learning rate leads to slow convergence.
A large learning rate may lead to oscillation around the minima or may even diverge.
Gradient Descent Challenges
Learning Rate Schedules: changing the learning rate according to some predefined schedule.
The same learning rate applies to all parameter updates.
The data may be sparse and different features have very different frequencies.
Updating all of them to the same extent might not be proper.
Larger updates for rarely occurring features might be a better choice.
Gradient Descent Challenges
Avoiding getting trapped in suboptimal local minima.
Difficulty arises from saddle points, i.e. points where one dimension slopes up and another slopes down.
These saddle points are usually surrounded by a plateau of the same error, which makes it hard for SGD to escape, as the gradient is close to zero in all dimensions.
Optimizing Gradient Descent
Concepts Covered:
CNN
ResNet
Gradient Descent Challenges
Momentum Optimizer
Adagrad.
etc.
Momentum Optimizer
(Figures: gradient-descent trajectories on the loss contours L(W) in the W1–W2 plane; the final frame compares plain SGD with SGD with Momentum.)
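A minimal sketch of the momentum update; the toy loss surface, learning rate, and momentum coefficient are assumptions for illustration.

    import numpy as np

    def sgd_momentum_step(w, v, grad_fn, lr=0.01, alpha=0.9):
        # The velocity accumulates an exponentially decaying average of past
        # gradients, damping oscillations across the narrow direction of the
        # loss surface while speeding up progress along the consistent one.
        g = grad_fn(w)
        v = alpha * v - lr * g
        return w + v, v

    # Toy quadratic bowl L(W) = 0.5 * (10 * w1^2 + w2^2), assumed for illustration.
    grad = lambda w: np.array([10.0 * w[0], 1.0 * w[1]])
    w, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(200):
        w, v = sgd_momentum_step(w, v, grad)
    print(w)   # close to the minimum at the origin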
Nesterov Accelerated Gradient (NAG)
(Figure: NAG update trajectory in the W1–W2 plane.)
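A minimal sketch of the Nesterov update, which evaluates the gradient at the look-ahead point W + α·v instead of at W; the hyper-parameters and the toy loss are assumed as before.

    import numpy as np

    def nag_step(w, v, grad_fn, lr=0.01, alpha=0.9):
        # Look ahead along the current velocity, evaluate the gradient there,
        # then update the velocity and the parameters.
        g = grad_fn(w + alpha * v)
        v = alpha * v - lr * g
        return w + v, v

    grad = lambda w: np.array([10.0 * w[0], 1.0 * w[1]])   # same toy bowl as before
    w, v = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(200):
        w, v = nag_step(w, v, grad)
    print(w)   # converges with less overshoot than plain momentum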
Problem with Momentum Optimizer/NAG
Both algorithms require the hyper-parameters to be set manually.
These hyper-parameters decide the learning rate.
The algorithms use the same learning rate for all dimensions.
The high-dimensional (mostly) non-convex nature of the loss function may lead to different sensitivity in different dimensions.
We may require the learning rate to be small in some dimensions and large in others.
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur
Topic
Lecture 44: Optimizing Gradient Descent II
Concepts Covered:
CNN
Gradient Descent Challenges
Momentum Optimizer
Nesterov Accelerated Gradient
Adagrad
RMSProp
etc.
Momentum Optimizer
(Figures: gradient-descent trajectories in the W1–W2 plane; plain SGD vs. SGD with Momentum.)
Nesterov Accelerated Gradient (NAG)
(Figure: NAG update trajectory in the W1–W2 plane.)
Problem with Momentum Optimizer/NAG
Both algorithms require the hyper-parameters to be set manually.
These hyper-parameters decide the learning rate.
The algorithms use the same learning rate for all dimensions.
The high-dimensional (mostly) non-convex nature of the loss function may lead to different sensitivity in different dimensions.
We may require the learning rate to be small in some dimensions and large in others.
Adagrad
Adagrad
Adagrad adaptively scales the learning rate for different dimensions.
The scale factor of a parameter is inversely proportional to the square root of the sum of the historical squared values of its gradient.
The parameters with the largest partial derivative of the loss have a rapid decrease in their learning rate.
Parameters with small partial derivatives have a relatively small decrease in learning rate.
Adagrad

g_t = (1/n) · Σ_{X ∈ minibatch} ∇_W L(W_t, X)

r_t = Σ_{τ=1…t} g_τ ⊙ g_τ

W_{t+1} = W_t − ( η / (ε + √r_t) ) ⊙ g_t

⊙ → element-wise product; the square root and the division are also taken element-wise, and ε is a small constant for numerical stability.
Adagrad

The same update written per dimension, for parameters W^(1), …, W^(d):

W_{t+1}^(1) = W_t^(1) − ( η / (ε + √r_t^(1)) ) · g_t^(1)
W_{t+1}^(2) = W_t^(2) − ( η / (ε + √r_t^(2)) ) · g_t^(2)
 ⋮
W_{t+1}^(d) = W_t^(d) − ( η / (ε + √r_t^(d)) ) · g_t^(d)
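The per-dimension Adagrad step written as a sketch; ε, η, and the toy gradient function are assumptions.

    import numpy as np

    def adagrad_step(w, r, grad_fn, lr=0.1, eps=1e-8):
        # r accumulates the squared gradients of every past step, so each
        # dimension's effective rate lr / sqrt(r) can only shrink over time.
        g = grad_fn(w)
        r = r + g * g
        w = w - lr * g / (eps + np.sqrt(r))
        return w, r

    grad = lambda w: np.array([10.0 * w[0], 0.1 * w[1]])   # very different slopes per dimension
    w, r = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(100):
        w, r = adagrad_step(w, r, grad)
    print(w)   # both dimensions make comparable progress despite the different slopes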
Adagrad
Positive side:
Adagrad adaptively scales the learning rate for different dimensions by normalizing with respect to the gradient magnitude in the corresponding dimension.
Adagrad eliminates the need to manually tune the learning rate.
It reduces the learning rate faster for parameters showing a large slope and more slowly for parameters with a smaller slope.
Adagrad converges rapidly when applied to convex functions.
Adagrad
Negative side:
If the function is non-convex, the trajectory may pass through many complex terrains, eventually arriving at a locally convex region.
By then the learning rate may have become too small due to the accumulation of gradients from the beginning of training.
So at some point the model may stop learning.
Course Name: Deep Learning
Faculty Name: Prof. P. K. Biswas
Department : E & ECE, IIT Kharagpur
Topic
Lecture 45: Optimizing Gradient Descent III
Concepts Covered:
CNN
Gradient Descent Challenges
Momentum Optimizer
Nesterov Accelerated Gradient
Adagrad
RMSProp
etc.
Adagrad

g_t = (1/n) · Σ_{X ∈ minibatch} ∇_W L(W_t, X)

r_t = Σ_{τ=1…t} g_τ ⊙ g_τ

W_{t+1} = W_t − ( η / (ε + √r_t) ) ⊙ g_t

⊙ → element-wise product; the square root and the division are also taken element-wise, and ε is a small constant for numerical stability.
Adagrad
Positive side:
Adagrad adaptively scales the learning rate for different dimensions by normalizing with respect to the gradient magnitude in the corresponding dimension.
Adagrad eliminates the need to manually tune the learning rate.
It reduces the learning rate faster for parameters showing a large slope and more slowly for parameters with a smaller slope.
Adagrad converges rapidly when applied to convex functions.
Adagrad
Negative side:
If the function is non-convex, the trajectory may pass through many complex terrains, eventually arriving at a locally convex region.
By then the learning rate may have become too small due to the accumulation of gradients from the beginning of training.
So at some point the model may stop learning.
RMSProp
RMSProp
RMSProp uses an exponentially decaying average of the squared gradient and discards history from the extreme past.
It converges rapidly once it finds a locally convex bowl.
It treats this as an instance of the Adagrad algorithm initialized within that bowl.
RMSProp

g_t = (1/n) · Σ_{X ∈ minibatch} ∇_W L(W_t, X)

r_t = β r_{t−1} + (1 − β) g_t ⊙ g_t

W_{t+1} = W_t − ( η / (ε + √r_t) ) ⊙ g_t
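A sketch of the RMSProp update with the exponentially decaying average; β, η, ε, and the toy gradient are assumptions.

    import numpy as np

    def rmsprop_step(w, r, grad_fn, lr=0.01, beta=0.9, eps=1e-8):
        # Exponentially decaying average of squared gradients: old history is
        # forgotten, so the effective rate can recover, unlike in Adagrad.
        g = grad_fn(w)
        r = beta * r + (1.0 - beta) * g * g
        w = w - lr * g / (eps + np.sqrt(r))
        return w, r

    grad = lambda w: np.array([10.0 * w[0], 0.1 * w[1]])
    w, r = np.array([1.0, 1.0]), np.zeros(2)
    for _ in range(300):
        w, r = rmsprop_step(w, r, grad)
    print(w)   # both coordinates end up close to the minimum at the origin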
RMSProp with Nesterov Momentum

W̃ = W_t + α v_t                  (look-ahead point)

g_t = (1/n) · Σ_{X ∈ minibatch} ∇_W L(W̃, X)

r_t = β r_{t−1} + (1 − β) g_t ⊙ g_t

v_{t+1} = α v_t − ( η / (ε + √r_t) ) ⊙ g_t

W_{t+1} = W_t + v_{t+1}
Adaptive Moments (Adam)
Adam
Variant of the combination of RMSProp and Momentum.
Incorporates the first-order moment (with exponential weighting) of the gradient (the momentum term).
Momentum is incorporated into RMSProp by adding momentum to the rescaled gradients.
Both the first and second moments are corrected for bias to account for their initialization to zero.
Adam

g_t = (1/n) · Σ_{X ∈ minibatch} ∇_W L(W_t, X)

s_t = β1 s_{t−1} + (1 − β1) g_t

r_t = β2 r_{t−1} + (1 − β2) g_t ⊙ g_t
Adam

Bias-corrected first and second moments:

ŝ_t = s_t / (1 − β1^t)        r̂_t = r_t / (1 − β2^t)

W_{t+1} = W_t − η · ŝ_t / (ε + √r̂_t)
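The full Adam step as a sketch, including the bias correction; the hyper-parameter values are the commonly used defaults, assumed here rather than taken from the lecture.

    import numpy as np

    def adam_step(w, s, r, t, grad_fn, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # First moment s (momentum term) and second moment r (RMSProp term),
        # both bias-corrected because they are initialized to zero.
        g = grad_fn(w)
        s = beta1 * s + (1.0 - beta1) * g
        r = beta2 * r + (1.0 - beta2) * g * g
        s_hat = s / (1.0 - beta1 ** t)
        r_hat = r / (1.0 - beta2 ** t)
        w = w - lr * s_hat / (eps + np.sqrt(r_hat))
        return w, s, r

    grad = lambda w: np.array([10.0 * w[0], 0.1 * w[1]])
    w, s, r = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)
    for t in range(1, 2001):
        w, s, r = adam_step(w, s, r, t, grad)
    print(w)   # both coordinates approach the minimum at the origin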
Momentum Optimizer
Animation source: https://imgur.com/a/Hqolp