
Mathematics for Machine Learning - Assignment 1

Module 3: Vector Calculus and Deep Learning


1. Gradients in Deep Learning & Backpropagation
The gradient of a function f : ℝⁿ → ℝ is a vector of partial derivatives representing the function's rate of change with respect to its input variables. The gradient is defined as:

∇f(x) = (∂f/∂x₁, ∂f/∂x₂, ..., ∂f/∂xₙ)
In deep learning, gradients dictate the weight adjustments during training
using the gradient descent algorithm:

w ← w − η∇L
where:
- w: weights vector
- η: learning rate (step size)
- L: loss function
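A minimal sketch of this update rule in NumPy (the toy quadratic loss, starting weights, and learning rate below are assumptions chosen for illustration, not part of the assignment):

import numpy as np

# Assumed toy objective: L(w) = ||w||^2, so grad L(w) = 2w
def grad_L(w):
    return 2 * w

w = np.array([1.0, -2.0])    # arbitrary initial weights
eta = 0.1                    # learning rate (step size)

for _ in range(5):
    w = w - eta * grad_L(w)  # w <- w - eta * grad(L)

print(w)  # weights shrink toward the minimum at 0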

Backpropagation Mechanism
For a basic neural network, the output can be represented as:

y = f (wx + b)
where f is an activation function and b is the bias. The loss function L(y, ŷ)
compares the predicted output ŷ with the true output y.
To compute gradients using the chain rule, the gradient of the loss with
respect to the weights is calculated:
dL/dw = dL/dy · dy/dw
For a multi-layer neural network, the chain rule is applied layer by layer:
1. For the Output Layer:

δ = ∇L(y, ŷ) · f′(z)

where z = wx + b and f′ is the derivative of the activation function.
2. For the Hidden Layer:

δ_hidden = (wᵀδ) · f′(z_hidden)
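A short NumPy sketch of these two delta computations for a network with one hidden layer (the layer sizes, sigmoid activation, and squared-error loss are assumptions for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))          # input
y_true = np.array([[1.0]])           # target

W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))   # hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))   # output layer

# Forward pass
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
y_hat = sigmoid(z2)

# Backward pass for squared-error loss L = 0.5 * (y_hat - y_true)^2
delta_out = (y_hat - y_true) * y_hat * (1 - y_hat)      # grad L(y, y_hat) * f'(z)
delta_hidden = (W2.T @ delta_out) * a1 * (1 - a1)       # (W^T delta) * f'(z_hidden)

dW2 = delta_out @ a1.T      # gradient for output-layer weights
dW1 = delta_hidden @ x.T    # gradient for hidden-layer weights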

Different activation functions affect the gradient calculations:
- **ReLU**:

f(x) = x if x > 0, and 0 otherwise

Derivative:

f′(x) = 1 if x > 0, and 0 otherwise

- **Sigmoid**:

f(x) = 1 / (1 + e⁻ˣ)

Derivative:

f′(x) = f(x) · (1 − f(x))
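A brief NumPy sketch of both activations and their derivatives exactly as defined above (the test inputs are arbitrary):

import numpy as np

def relu(x):
    return np.where(x > 0, x, 0.0)

def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)   # 1 if x > 0, 0 otherwise

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)               # f'(x) = f(x) * (1 - f(x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), relu_grad(x))           # [0. 0. 3.] [0. 0. 1.]
print(sigmoid(x), sigmoid_grad(x))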

2. Automatic Differentiation in TensorFlow/PyTorch


**Forward Mode Autodiff** computes derivatives alongside the function evaluation and is efficient when a function has few inputs, while **Reverse Mode Autodiff** propagates derivatives backward from the output through the computational graph, which makes it efficient for deep learning, where a scalar loss depends on many parameters.
Example in PyTorch:
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x**3 + 2*x + 5
y.backward()
print(x.grad) # dy/dx = 14
Example in TensorFlow:

import tensorflow as tf
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x**3 + 2*x + 5
grad = tape.gradient(y, x)
print(grad.numpy()) # Output: 14

3. Comparison of Differentiation Methods


Differentiation Type | Approach | Advantages | Limitations
Symbolic | Manipulates exact symbolic expressions | Exact and precise | Computationally expensive
Numerical | Central difference: (f(x+h) − f(x−h)) / (2h) | Simple to implement | Prone to approximation errors
Automatic | Chain rule over computational graphs | Efficient for complex functions | Requires extra memory
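To see the numerical and automatic approaches side by side, here is a small PyTorch sketch; the test function x³ + 2x + 5 is taken from the earlier example, and the step size h is an assumption:

import torch

def f(x):
    return x**3 + 2*x + 5

# Numerical: central difference (f(x+h) - f(x-h)) / (2h)
x0, h = 2.0, 1e-4
numerical = (f(x0 + h) - f(x0 - h)) / (2 * h)

# Automatic: reverse-mode autodiff over the computational graph
x = torch.tensor(x0, requires_grad=True)
f(x).backward()

print(numerical, x.grad.item())  # both are close to the exact value 14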

Module 4: Probability & Distributions in Deep Learning
1. Common Probability Distributions
1. Gaussian (Normal) Distribution: - Probability density function:

f(x) = (1 / √(2πσ²)) · e^(−(x−µ)² / (2σ²))

- Used in weight initialization (e.g., Xavier, He Initialization).
2. Bernoulli Distribution: - Probability mass function:

P (X = 1) = p, P (X = 0) = 1 − p

- Applicable for binary classification.


3. Exponential Distribution: - Probability density function:

f(x; λ) = λe^(−λx), for x ≥ 0

- Used for modeling the time until an event occurs.
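A brief sketch of drawing samples from these three distributions using torch.distributions (the parameter values are arbitrary illustrations):

import torch

normal = torch.distributions.Normal(loc=0.0, scale=1.0)     # Gaussian(mu, sigma)
bernoulli = torch.distributions.Bernoulli(probs=0.3)        # P(X = 1) = p
exponential = torch.distributions.Exponential(rate=2.0)     # f(x; lambda) = lambda * e^(-lambda x)

print(normal.sample((3,)))        # weight-initialization-style draws
print(bernoulli.sample((3,)))     # 0/1 outcomes, as in binary labels
print(exponential.sample((3,)))   # non-negative waiting times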

2. Gaussian Distribution-Based Regularization


- Dropout Regularization: Reduces overfitting by randomly deactivating neurons:

y^(d) = y · Dropout(p)

where p is the dropout probability.
- Batch Normalization: Normalizes activations using:

ŷ = (y − µ) / √(σ² + ϵ)

where µ is the mean, σ² is the variance, and ϵ is a small value to prevent division by zero.
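A minimal PyTorch sketch showing both layers in use (the batch size, feature size, and dropout probability are assumptions for illustration):

import torch
import torch.nn as nn

x = torch.randn(8, 16)            # a batch of 8 activation vectors of size 16

dropout = nn.Dropout(p=0.5)       # randomly zeroes neurons with probability p
batchnorm = nn.BatchNorm1d(16)    # normalizes with (y - mu) / sqrt(var + eps)

dropout.train()                   # dropout is only active in training mode
y = batchnorm(dropout(x))

print(y.mean(dim=0), y.var(dim=0))  # per-feature mean near 0, variance near 1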

3. Bayesian Deep Learning for Uncertainty Estimation


Bayesian Neural Networks (BNNs) assign probability distributions to weights
rather than deterministic values: - A weight w can be represented as:

w ∼ N(µ, σ²)

- Provides uncertainty quantification in predictions.
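A tiny sketch of the idea: draw a weight from N(µ, σ²) repeatedly and use the spread of the resulting predictions as an uncertainty estimate. The linear model, µ, and σ below are assumptions; real BNNs learn these distributions, e.g. with variational inference or MCMC:

import torch

mu, sigma = 0.5, 0.1                       # assumed posterior parameters for one weight
weight_dist = torch.distributions.Normal(mu, sigma)

x = torch.tensor(2.0)                      # a single input
preds = torch.stack([weight_dist.sample() * x for _ in range(1000)])

print(preds.mean().item())                 # predictive mean
print(preds.std().item())                  # predictive uncertainty (std of predictions)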

Module 5: Optimization in Deep Learning
1. Impact of Learning Rate in Gradient Descent
Gradient Descent Update Rule:

w ← w − η∇L

A learning rate that is too small slows convergence, while one that is too large can overshoot the minimum or diverge. Adaptive Learning Rate Methods address this:


1. Momentum: Accelerates gradient descent using past gradients:

v ← βv + (1 − β)∇L

w ← w − ηv
where β is the momentum term (typically set to 0.9).
2. Adam Optimizer: A combination of momentum and RMSProp. Updates
are computed as:
vₜ = β₁·vₜ₋₁ + (1 − β₁)·∇L
sₜ = β₂·sₜ₋₁ + (1 − β₂)·(∇L)²

w ← w − η·vₜ / (√sₜ + ϵ)
where β1 and β2 are the decay rates for the moving averages, and ϵ is a small
constant for numerical stability.
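A compact NumPy sketch of this Adam-style update exactly as written above (without bias correction, matching the formulas here); the toy gradient and hyperparameter values are assumptions:

import numpy as np

w = np.array([1.0, -1.0])
v = np.zeros_like(w)                 # first moment (momentum term)
s = np.zeros_like(w)                 # second moment (RMSProp term)
eta, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

def grad_L(w):                       # assumed toy gradient of L(w) = ||w||^2
    return 2 * w

for _ in range(100):
    g = grad_L(w)
    v = beta1 * v + (1 - beta1) * g          # v_t
    s = beta2 * s + (1 - beta2) * g**2       # s_t
    w = w - eta * v / (np.sqrt(s) + eps)     # w <- w - eta * v_t / (sqrt(s_t) + eps)

print(w)  # weights move toward the minimum at 0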
