Gradient Descent
Deep Learning
By
T.K. Damodharan
Vice President, RBS
Reg.No: PC2013003013008
Under the guidance of
Dr V.Rajasekar,
Associate Professor,
Department of Computer Science & Engineering,
SRM Institute of Science and Technology-Vadapalani Campus.
Gradient Descent
Gradient descent is by far the most
popular optimization strategy used in
machine learning and deep learning at the
moment.
It is used when training models, can be
combined with almost every algorithm and is easy to
understand and implement.
Everyone working with machine learning
should understand its concept.
Gradient Descent
Gradient Descent is an optimization algorithm
for finding a local minimum of a differentiable
function.
Gradient descent is simply used to find the
values of a function's parameters (coefficients)
that minimize a cost function as far as possible.
It's based on a convex function and tweaks its
parameters iteratively to minimize a given
function to its local minimum.
What is a Gradient
"A gradient measures how much the output of a
function changes if you change the inputs a little
bit." — Lex Fridman (MIT)
A gradient simply measures the change in all
weights with regard to the change in error.
You can also think of a gradient as the slope of a
function. The higher the gradient, the steeper the
slope and the faster a model can learn.
But if the slope is zero, the model stops learning.
In mathematical terms, a gradient is the vector of
partial derivatives of a function with respect to its inputs.
Imagine a blindfolded man who wants to climb to
the top of a hill in as few steps as possible.
He might start climbing the hill by taking really
big steps in the steepest direction, which he can do
as long as he is not close to the top.
As he comes closer to the top, however, his steps
will get smaller and smaller to avoid overshooting
it.
This process can be described mathematically
using the gradient.
What is a Gradient
Imagine the image below illustrates our hill from a
top-down view and the red arrows are the steps of
our climber.
Think of a gradient in this context as a vector that
contains the direction of the steepest step the
blindfolded man can take and also how long
that step should be.
What is a Gradient
Note that the gradient ranging from X0 to X1 is
much longer than the one reaching from X3 to X4.
This is because the steepness/slope of the hill,
which determines the length of the vector,
decreases as the climber nears the top.
This perfectly represents the example of the hill:
the hill is getting less steep the higher it's climbed.
Therefore a reduced gradient goes along with a
reduced slope and a reduced step size for the hill
climber.
How Gradient Descent Works
Instead of climbing up a hill, think of gradient
descent as hiking down to the bottom of a valley.
This is a better analogy because it is a
minimization algorithm that minimizes a given
function.
Equation: b = a − γ ∇F(a)
b is the next position of our climber,
while a represents his current position.
The minus sign refers to the minimization part of gradient descent.
The gamma in the middle is a weighting factor (the learning rate),
and the gradient term ∇F(a) is the direction of steepest
ascent at the current position.
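The climber's update can be sketched in a few lines of Python; the function f(x) = x² (with gradient 2x) and the step size are illustrative choices, not from the slides:

```python
# A minimal sketch of the update b = a - gamma * grad_F(a),
# using the convex function f(x) = x**2 as a stand-in.

def grad_f(x):
    return 2.0 * x  # derivative of f(x) = x**2

def gradient_descent(start, gamma=0.1, steps=100):
    a = start
    for _ in range(steps):
        a = a - gamma * grad_f(a)  # b = a - gamma * grad_F(a)
    return a

x_min = gradient_descent(start=5.0)  # converges towards the minimum at 0
```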
Gradient Descent
More details and Types of Gradient Descent
https://builtin.com/data-science/gradient-descent
Step by step Video guide:
https://youtu.be/sDv4f4s2SB8
Linear Models
A strong high-bias assumption is linear separability:
in 2 dimensions, can separate classes by a line
in higher dimensions, need hyperplanes
A linear model is a model that assumes the data is linearly
separable
Linear Regression
DATASET
inputs      outputs
x1 = 1      y1 = 1
x2 = 3      y2 = 2.2
x3 = 2      y3 = 2
x4 = 1.5    y4 = 1.9
x5 = 4      y5 = 3.1
Linear regression assumes that the expected value of
the output given an input, E[y|x], is linear.
Simplest case: Out(x) = wx for some unknown w.
Given the data, we can estimate w.
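A minimal sketch of estimating w from this dataset: minimizing squared error for Out(x) = wx has the closed-form least-squares solution w = Σ xy / Σ x² (the variable names are illustrative):

```python
# The slide's dataset
xs = [1.0, 3.0, 2.0, 1.5, 4.0]
ys = [1.0, 2.2, 2.0, 1.9, 3.1]

# Least-squares estimate of w for the model Out(x) = w * x:
# w = sum(x*y) / sum(x**2)
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
```

For this data the estimate comes out a little below 1, so the fitted line passes close to all five points.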
Copyright © 2001, 2003, Andrew W. Moore Neural Networks: Slide 14
Linear models
A linear model in n-dimensional space (i.e. n
features) is defined by n+1 weights:
In two dimensions, a line:
0 = w1 f1 + w2 f2 + b (where b = -a)
In three dimensions, a plane:
0 = w1 f1 + w2 f2 + w3 f3 + b
In m dimensions, a hyperplane:
0 = b + Σ_{j=1}^{m} w_j f_j
Which line will it find?
Which line will it find?
Only guaranteed to find some
line that separates the data
Linear models
Perceptron algorithm is one example of a linear
classifier
Many, many other algorithms that learn a line (i.e. a
setting of a linear combination of weights)
Goals:
Explore a number of linear training algorithms
Understand why these algorithms work
Linear models in general
1. pick a model
0 = b + Σ_{j=1}^{m} w_j f_j
These are the parameters we want to learn
2. pick a criteria to optimize (aka objective function)
Some notation: indicator function
1[x] = 1 if x = True, 0 if x = False
Convenient notation for turning T/F answers into numbers/counts:
drinks_to_bring_for_class = Σ_{x ∈ class} 1[x >= 21]
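In code, the indicator function is just a boolean turned into a 0/1 value, so summing it counts how many elements satisfy the condition; the ages below are a made-up roster:

```python
# 1[x >= 21] as code: each comparison contributes 1 if true, 0 if false
ages = [18, 22, 25, 19, 30]  # hypothetical class, not from the slides
drinks_to_bring = sum(1 if age >= 21 else 0 for age in ages)
```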
Some notation: dot-product
Sometimes it is convenient to use vector notation
We represent an example f1, f2, …, fm as a single vector, x
Similarly, we can represent the weight vector w1, w2, …, wm as a single
vector, w
The dot-product between two vectors a and b is defined as:
a · b = Σ_{j=1}^{m} a_j b_j
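The dot-product definition translates directly into a few lines (pure Python, no libraries assumed):

```python
def dot(a, b):
    # a . b = sum over j of a_j * b_j
    return sum(aj * bj for aj, bj in zip(a, b))

result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])  # 4 + 10 + 18
```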
Linear models
1. pick a model
0 = b + Σ_{j=1}^{n} w_j f_j
These are the parameters we want to learn
2. pick a criteria to optimize (aka objective function)
Σ_{i=1}^{n} 1[y_i (w · x_i + b) ≤ 0]
What does this equation say?
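The equation counts how many training examples the linear model gets wrong: with labels ±1, the quantity y_i (w · x_i + b) is positive exactly when the prediction lands on the correct side. A sketch on a tiny made-up dataset:

```python
def num_errors(w, b, xs, ys):
    # counts examples where y_i * (w . x_i + b) <= 0,
    # i.e. the model predicts the wrong side (labels are +1 / -1)
    def dot(a, c):
        return sum(aj * cj for aj, cj in zip(a, c))
    return sum(1 for x, y in zip(xs, ys) if y * (dot(w, x) + b) <= 0)

# hypothetical 2-feature examples with +1/-1 labels
xs = [[1.0, 1.0], [-1.0, -1.0], [2.0, 0.5]]
ys = [1, -1, -1]
errs = num_errors([1.0, 1.0], 0.0, xs, ys)  # third example is misclassified
```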
Convex functions
Convex functions look something like:
One definition: The line segment between any
two points on the function is above the function
Finding the minimum
You’re blindfolded, but you can see out of the bottom of the
blindfold to the ground right by your feet. I drop you off
somewhere and tell you that you’re in a convex shaped valley
and escape is at the bottom/minimum. How do you get out?
Finding the minimum
How do we do this for a function?
One approach: gradient descent
Partial derivatives give us the
slope (i.e. direction to move)
in that dimension
One approach: gradient descent
Partial derivatives give us the
slope (i.e. direction to move) in
that dimension
Approach:
pick a starting point (w)
repeat:
pick a dimension
move a small amount in that
dimension towards decreasing loss
(using the derivative)
Gradient descent
pick a starting point (w)
repeat until loss doesn’t decrease in all dimensions:
pick a dimension
move a small amount in that dimension towards decreasing loss
(using the derivative)
w_j = w_j − η · d/dw_j loss(w)
What does this do?
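The loop above can be sketched as follows; the quadratic loss (minimum at (3, −1)) and the step size are illustrative choices, not from the slides:

```python
# loss(w) = (w1 - 3)^2 + (w2 + 1)^2, minimized at w = (3, -1)
def grad(w):
    # partial derivatives, one per dimension
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

eta = 0.1          # learning rate
w = [0.0, 0.0]     # starting point
for _ in range(200):
    g = grad(w)
    # w_j = w_j - eta * d/dw_j loss(w), in every dimension
    w = [wj - eta * gj for wj, gj in zip(w, g)]
```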
Gradient descent
pick a starting point (w)
repeat until loss doesn’t decrease in all dimensions:
pick a dimension
move a small amount in that dimension towards decreasing loss
(using the derivative)
w_j = w_j − η · d/dw_j loss(w)
learning rate (how much we want to move in the error
direction, often this will change over time)
Some maths
d/dw_j loss = d/dw_j Σ_{i=1}^{n} exp(−y_i (w · x_i + b))

= Σ_{i=1}^{n} exp(−y_i (w · x_i + b)) · d/dw_j [−y_i (w · x_i + b)]

= Σ_{i=1}^{n} −y_i x_ij exp(−y_i (w · x_i + b))
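The analytic derivative can be sanity-checked numerically; a sketch with made-up data, comparing the partial derivative to a central finite-difference approximation:

```python
import math

def dot(a, b):
    return sum(aj * bj for aj, bj in zip(a, b))

def loss(w, b, xs, ys):
    # exponential loss: sum_i exp(-y_i (w . x_i + b))
    return sum(math.exp(-y * (dot(w, x) + b)) for x, y in zip(xs, ys))

def dloss_dwj(w, b, xs, ys, j):
    # the slide's result: sum_i -y_i * x_ij * exp(-y_i (w . x_i + b))
    return sum(-y * x[j] * math.exp(-y * (dot(w, x) + b))
               for x, y in zip(xs, ys))

# hypothetical data and weights
xs = [[1.0, 2.0], [-1.0, 0.5]]
ys = [1, -1]
w, b, j = [0.1, -0.2], 0.0, 0

# central finite-difference approximation in dimension j
eps = 1e-6
w_hi = [w[0] + eps, w[1]]
w_lo = [w[0] - eps, w[1]]
numeric = (loss(w_hi, b, xs, ys) - loss(w_lo, b, xs, ys)) / (2 * eps)
```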
Gradient descent
pick a starting point (w)
repeat until loss doesn’t decrease in all dimensions:
pick a dimension
move a small amount in that dimension towards decreasing loss
(using the derivative)
w_j = w_j + η Σ_{i=1}^{n} y_i x_ij exp(−y_i (w · x_i + b))
What is this doing?
Exponential update rule
w_j = w_j + η Σ_{i=1}^{n} y_i x_ij exp(−y_i (w · x_i + b))
for each example x_i:
w_j = w_j + η y_i x_ij exp(−y_i (w · x_i + b))
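A sketch of the per-example update on a tiny, made-up linearly separable dataset (b is held at 0 for brevity):

```python
import math

def dot(a, b):
    return sum(aj * bj for aj, bj in zip(a, b))

# hypothetical separable data with +1/-1 labels
xs = [[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.0]]
ys = [1, 1, -1, -1]

eta, b = 0.1, 0.0
w = [0.0, 0.0]
for _ in range(50):                      # passes over the data
    for x, y in zip(xs, ys):             # one update per example
        scale = math.exp(-y * (dot(w, x) + b))
        # w_j = w_j + eta * y_i * x_ij * exp(-y_i (w . x_i + b))
        w = [wj + eta * y * xj * scale for wj, xj in zip(w, x)]

# after training, no example should satisfy y_i (w . x_i + b) <= 0
errors = sum(1 for x, y in zip(xs, ys) if y * (dot(w, x) + b) <= 0)
```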
Summary
Gradient descent minimization algorithm
requires that our loss function is convex
makes small updates towards lower losses
Gradient descent
pick a starting point (w)
repeat until loss doesn’t decrease in all dimensions:
pick a dimension
move a small amount in that dimension towards decreasing loss
(using the derivative)
w_i = w_i − η · d/dw_i (loss(w) + regularizer(w, b))

With an L2 regularizer (λ/2)·‖w‖², this gives:

w_j = w_j + η Σ_{i=1}^{n} y_i x_ij exp(−y_i (w · x_i + b)) − ηλ w_j
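A sketch of one batch update including the regularization term; the tiny dataset and hyperparameters are illustrative:

```python
import math

def dot(a, b):
    return sum(aj * bj for aj, bj in zip(a, b))

def step(w, b, xs, ys, eta, lam):
    # w_j = w_j + eta * sum_i y_i x_ij exp(-y_i (w . x_i + b)) - eta*lam*w_j
    new_w = []
    for j, wj in enumerate(w):
        grad_term = sum(y * x[j] * math.exp(-y * (dot(w, x) + b))
                        for x, y in zip(xs, ys))
        new_w.append(wj + eta * grad_term - eta * lam * wj)
    return new_w

# hypothetical one-feature-active data with +1/-1 labels
xs = [[1.0, 0.0], [-1.0, 0.0]]
ys = [1, -1]
w = step([0.0, 0.0], 0.0, xs, ys, eta=0.1, lam=0.01)
```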
The update
w_j = w_j + η y_i x_ij exp(−y_i (w · x_i + b)) − ηλ w_j
η: learning rate
y_i x_ij: direction to update
exp(−y_i (w · x_i + b)): constant: how far from wrong
ηλ w_j: regularization
What effect does the regularizer have?
The update
w_j = w_j + η y_i x_ij exp(−y_i (w · x_i + b)) − ηλ w_j
η: learning rate
y_i x_ij: direction to update
exp(−y_i (w · x_i + b)): constant: how far from wrong
ηλ w_j: regularization
If w_j is positive, the regularization term reduces w_j;
if w_j is negative, it increases w_j;
either way, it moves w_j towards 0.
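The shrinking effect can be seen by applying only the −ηλ w_j term to one positive and one negative weight (values are illustrative):

```python
eta, lam = 0.1, 0.5
w_pos, w_neg = 2.0, -2.0
for _ in range(100):
    w_pos -= eta * lam * w_pos   # positive weight decreases towards 0
    w_neg -= eta * lam * w_neg   # negative weight increases towards 0
```

Each step multiplies the weight by (1 − ηλ), so both weights decay geometrically towards 0 without ever crossing it.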