Part 1.3. Optimization of Learning Algorithms
Instructor:
Assoc. Prof. Dr. Truong Ngoc Son
Chapter 3
Optimization of the learning process
Outline
The challenges in Deep learning
Momentum
ADAGRAD – Adaptive Gradient Descent
RMSPROP (Root Mean Squared Propagation)
ADAM
Dropout
The challenge in Deep Learning
Local minima
The objective function of deep learning usually has many local minima
When the numerical solution of an optimization problem is near a local
optimum, the solution obtained at the final iteration may only minimize the
objective function locally rather than globally, because the gradient of the
objective function approaches or becomes zero there
[Figure: loss curve L(w) versus w, showing several local minima and the global minimum]
The challenge in Deep Learning
Vanishing Gradient
As more layers using certain activation functions are added to a neural network,
the gradients of the loss function approach zero, making the network hard to
train
The simplest solution is to use other activation functions, such as ReLU, whose
derivative does not shrink toward zero for positive inputs
Residual networks are another solution: they provide residual (skip) connections
straight to earlier layers, which let gradients flow back more easily
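To make the ReLU point concrete, a quick illustration (not from the slides): backpropagation multiplies one activation derivative per layer, and the sigmoid derivative is at most 0.25, so the product shrinks geometrically with depth, whereas ReLU contributes a factor of 1 for positive pre-activations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Multiply one sigmoid derivative per layer (each <= 0.25), evaluated at
# random pre-activations; the product shrinks geometrically with depth.
rng = np.random.default_rng(0)
factor = 1.0
for layer in range(1, 11):
    z = rng.standard_normal()
    s = sigmoid(z)
    factor *= s * (1.0 - s)   # sigmoid'(z) = s * (1 - s) <= 0.25
    print(f"layer {layer:2d}: product of derivatives = {factor:.2e}")
# ReLU'(z) is 1 for z > 0, so it does not add this shrinking factor.
```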
The challenge in Deep Learning
Overfitting and underfitting
Overfitting is a modeling error in statistics that occurs when a function is fitted
too closely to a limited set of data points. As a result, the model is useful only
for its initial data set, and not for any other data sets
The model fits the training data well but does not perform well on the test data
Underfitting is a scenario in data science where a data model is unable to capture
the relationship between the input and output variables accurately, generating a
high error rate on both the training set and unseen data
Momentum
The method of momentum is designed to accelerate learning.
The momentum algorithm accumulates an exponentially decaying moving average
of past gradients and continues to move in their direction
[Figure: two panels showing gradient descent trajectories near local minima]
Gradient descent update:
$w_t = w_{t-1} - \Delta w_t, \qquad \Delta w_t = \eta \nabla C(w)$

Adaptive learning rate (RMSProp-style running average of the squared gradient):
$w_t = w_{t-1} - \eta' \frac{\partial L}{\partial w_{t-1}}$
where
$\eta' = \frac{\eta}{\sqrt{\alpha_t + \varepsilon}}, \qquad \alpha_t = \beta \alpha_{t-1} + (1 - \beta)\left(\frac{\partial L}{\partial w_{t-1}}\right)^2$
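A minimal NumPy sketch of these two update rules (the function names, hyperparameter values, and the toy loss are illustrative assumptions, not from the slides):

```python
import numpy as np

def momentum_update(w, v, grad, lr=0.1, beta=0.9):
    """Momentum: accumulate an exponentially decaying average of past gradients."""
    v = beta * v + (1.0 - beta) * grad
    return w - lr * v, v

def rmsprop_update(w, alpha, grad, lr=0.1, beta=0.9, eps=1e-8):
    """Adaptive learning rate: divide by the running average of squared gradients."""
    alpha = beta * alpha + (1.0 - beta) * grad ** 2
    return w - lr * grad / np.sqrt(alpha + eps), alpha

# Toy usage on L(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_update(w, v, 2.0 * (w - 3.0))
print(w)  # close to 3.0
```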
Adam — Adaptive Moment Estimation
Adam combines two stochastic gradient descent extensions: Adaptive Gradient
(AdaGrad) and Root Mean Square Propagation (RMSProp)
Adam also keeps an exponentially decaying average of past gradients, similar to
SGD with momentum
$v_{dW} = \beta_1 v_{dW} + (1 - \beta_1)\frac{\partial L}{\partial w}$
$s_{dW} = \beta_2 s_{dW} + (1 - \beta_2)\left(\frac{\partial L}{\partial w}\right)^2$
$\hat{v}_{dW} = \frac{v_{dW}}{1 - \beta_1^t}, \qquad \hat{s}_{dW} = \frac{s_{dW}}{1 - \beta_2^t}$
$W = W - \eta \frac{\hat{v}_{dW}}{\sqrt{\hat{s}_{dW}} + \varepsilon}$
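A minimal NumPy sketch of this Adam step (the function name, state variables, and default hyperparameter values are illustrative assumptions):

```python
import numpy as np

def adam_update(W, v, s, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: moment estimates, bias correction, parameter update."""
    v = beta1 * v + (1.0 - beta1) * grad        # moving average of gradients
    s = beta2 * s + (1.0 - beta2) * grad ** 2   # moving average of squared gradients
    v_hat = v / (1.0 - beta1 ** t)              # bias correction, t starts at 1
    s_hat = s / (1.0 - beta2 ** t)
    return W - lr * v_hat / (np.sqrt(s_hat) + eps), v, s
```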
Dropout
Dropout helps avoid the overfitting problem
Probabilistically dropping out nodes in the network is a simple and effective
regularization method
Dropout is implemented per-layer in a neural network
A common value is a probability of 0.5 for retaining the output of each node in a
hidden layer
[Figure: a single unit in a standard NN vs. a dropout NN; during training the unit appears with probability p (weight W), while at test time it always appears and its outgoing weights are scaled to pW]
Dropout
How to apply dropout
Standard network (layer $l$ to $l+1$):
$z_i^{(l+1)} = w_i^{(l+1)} y^{(l)} + b_i^{(l+1)}$
$y_i^{(l+1)} = f(z_i^{(l+1)})$

Dropout network:
$r_j^{(l)} \sim \mathrm{Bernoulli}(p)$
$\tilde{y}^{(l)} = r^{(l)} * y^{(l)}$
$z_i^{(l+1)} = w_i^{(l+1)} \tilde{y}^{(l)} + b_i^{(l+1)}$
$y_i^{(l+1)} = f(z_i^{(l+1)})$

[Figure: the same unit in a standard NN and in a dropout NN, where each input $y_j^{(l)}$ is multiplied by a Bernoulli mask $r_j^{(l)}$ before the weighted sum]
PYTHON CODE
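A minimal NumPy sketch of the dropout forward pass from the previous slide, including the test-time weight scaling W → pW (the layer sizes, the ReLU choice for f, and p = 0.5 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(y, W, b, p=0.5, train=True):
    """Forward one layer with dropout: mask inputs with Bernoulli(p) during
    training; at test time keep every unit and scale the weights by p."""
    if train:
        r = rng.binomial(1, p, size=y.shape)   # r_j ~ Bernoulli(p), 1 means "keep"
        y_tilde = r * y                        # drop a fraction (1 - p) of the inputs
        z = W @ y_tilde + b
    else:
        z = (p * W) @ y + b                    # expected activation: W -> pW
    return np.maximum(z, 0.0)                  # f = ReLU (illustrative choice)

# Toy usage: a layer with 4 inputs and 3 outputs
y = rng.standard_normal(4)
W = rng.standard_normal((3, 4))
b = np.zeros(3)
print(dropout_layer(y, W, b, train=True))      # training pass with a random mask
print(dropout_layer(y, W, b, train=False))     # test pass with scaled weights
```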
Assignments
Design a multilayer neural network and apply the
learning-optimization algorithms.
(input layer, 2 hidden layers (sigmoid, ReLU), output layer)
Optimization: Momentum, Adagrad, Dropout (+
Momentum, Adagrad)
Compare: accuracy, (convergence time)
Dataset: MNIST
Assignments
Week 8: Submit assignment
Week 9: Quiz, decision on the final project
Week 10-14: CNN
Week 15-17: Final project presentations