Activation Functions
Last update: 28-05-2022 • Dr. Ghulam Gilanie Janjua • PhD (Artificial Intelligence)
Activation functions are one of the building blocks of ML.
Activation Functions
[Figure: input information passes through the activation function, which decides what is "useful", "less-useful", or "not-so-useful" before passing it on to the neural network.]
The output from the activation function moves to the next hidden layer and the same process is repeated. This forward movement of information is known as "forward propagation".
The output generated may be far away from the actual value. Using the output from forward propagation, the error is calculated. Based on this error value, the weights and biases of the neurons are updated. This process is known as "back-propagation".
Can we do it without an activation function?
- The activation function introduces an additional step at each layer during forward propagation and increases the complexity.
- Without activation functions, every neuron would only perform a linear transformation on the inputs using the weights and biases.
- Although linear transformations make the neural network simpler, such a network would be less powerful and would not be able to learn complex patterns from the data.
- A neural network without an activation function is essentially just a linear regression model (see the sketch below).
- Thus, we apply a non-linear transformation to the inputs of the neuron, and this non-linearity in the network is introduced by an activation function.
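To make the "linear regression" point concrete, here is a small illustrative check (all names and shapes are assumptions): stacking two layers with no activation in between collapses into a single linear map, so depth alone adds no expressive power.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Two stacked layers with no activation function in between.
two_layers = W2 @ (W1 @ x + b1) + b2

# The equivalent single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))   # True: same result as one linear layer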
Activation functions and when to use them?
1. Binary Step
2. Linear
3. Sigmoid
4. Tanh
5. ReLU
6. Leaky ReLU
7. Parameterised ReLU
8. Exponential Linear Unit
9. Swish
10. Softmax

WHAT IS A GRADIENT?
In machine learning, a gradient is a derivative of a function that has more than one input variable. Known as the slope of a function in mathematical terms, the gradient simply measures the change in all weights with regard to the change in error.
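As an illustrative aside (names and numbers assumed), the snippet below approximates the gradient of a squared error with respect to a single weight by a finite difference and compares it with the analytic derivative:

# Squared error of a single linear neuron: E(w) = (w * x - y)^2
x, y = 2.0, 3.0
error = lambda w: (w * x - y) ** 2

w, eps = 0.5, 1e-6
numeric_grad = (error(w + eps) - error(w - eps)) / (2 * eps)
analytic_grad = 2 * (w * x - y) * x

print(numeric_grad, analytic_grad)   # both approximately -8.0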
Binary Step or Step function or threshold
f(x) = 1, x >= 0
f(x) = 0, x < 0
• If the input to the activation function is greater than a threshold, the neuron is activated, else it is deactivated, i.e., its output is not considered for the next hidden layer.
• Simplest activation function, which can be implemented with a single if-else condition.

def binary_step(x):
    if x < 0:
        return 0
    else:
        return 1

binary_step(5)
Output?
binary_step(-1)
Output?
Binary Step or Step function or threshold
• Suitable for a binary classifier.
• Not useful for multiple classes in the target variable.
• Gradients are calculated to update the weights and biases during the backpropagation process.
• The gradient of the step function is zero, so no backpropagation is possible.
• If you calculate the derivative of f(x) with respect to x, it comes out to be 0.
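To illustrate the zero-gradient point above, a quick finite-difference check (names assumed) shows the derivative of the step function is 0 at any point away from the threshold, so no gradient signal can flow back:

def binary_step(x):
    return 1 if x >= 0 else 0

eps = 1e-6
for x in [-2.0, 3.0]:
    grad = (binary_step(x + eps) - binary_step(x - eps)) / (2 * eps)
    print(x, grad)   # 0.0 at both points, so the weights receive no update signal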
Linear Function
• Add a component proportional to x in place of the hard threshold of the binary step function: "a linear function".
f(x) = ax

def linear_function(x):
    return 4*x

linear_function(4), linear_function(-2)
Output?

• The gradient is not zero now, but it is a constant which does not depend upon the input value x at all.
f'(x) = a
• Weights and biases will be updated during the backpropagation process, but the updating factor would be the same.
• The network will not really improve the error since the gradient is the same for every iteration.
• A linear function might be ideal for simple tasks where interpretability is highly desired.
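A finite-difference check (illustrative only) confirms the constant-gradient bullet above: the gradient of f(x) = 4x is 4 regardless of the input, so every iteration updates the weights by the same factor:

def linear_function(x):
    return 4 * x

eps = 1e-6
for x in [-3.0, 0.5, 10.0]:
    grad = (linear_function(x + eps) - linear_function(x - eps)) / (2 * eps)
    print(x, round(grad, 6))   # 4.0 for every input value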
Sigmoid
• One of the most widely used "non-linear" activation functions.
• Transforms the values into the range between 0 and 1.
f(x) = 1/(1+e^(-x))

import numpy as np

def sigmoid_function(x):
    z = 1/(1 + np.exp(-x))
    return z

sigmoid_function(7), sigmoid_function(-22)
Output? (0.9990889488055994, 2.7894680920908113e-10)

• A smooth S-shaped function that is continuously differentiable.
f'(x) = sigmoid(x)*(1-sigmoid(x))
• The gradient values are significant only in the range -3 to 3.
• For values greater than 3 or less than -3, the function has very small gradients.
• Not symmetric around zero, so the output of all the neurons will be of the same sign.
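To illustrate the vanishing-gradient claim above, a short illustrative check (names assumed) evaluates f'(x) = sigmoid(x)*(1-sigmoid(x)) at a few points and shows how quickly it shrinks outside the range -3 to 3:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

for x in [0, 3, 5, 10]:
    # Gradient falls from 0.25 at x=0 to about 0.045 at x=3 and about 0.000045 at x=10.
    print(x, sigmoid_grad(x))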
Tanh
• Similar to the sigmoid function, but symmetric around the origin.
• Transforms the values into the range between -1 and 1.
tanh(x) = 2*sigmoid(2x) - 1
tanh(x) = 2/(1+e^(-2x)) - 1

tanh_function(0.5), tanh_function(-1)
Output? (0.4621171572600098, -0.7615941559557646)
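The slide does not show the code for tanh_function; a minimal definition assumed directly from the formula above:

import numpy as np

def tanh_function(x):
    # tanh(x) = 2*sigmoid(2x) - 1 = 2/(1 + e^(-2x)) - 1
    return 2 / (1 + np.exp(-2 * x)) - 1

tanh_function(0.5), tanh_function(-1)
# Output: approximately (0.4621, -0.7616)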
• Inputs to the next layers will not always be of the same sign.
• All other properties of the tanh function are the same as those of the sigmoid function; it is continuous and differentiable at all points.
• The gradient of the tanh function is steeper as compared to the sigmoid function.
• Being zero-centered, tanh is preferred over the sigmoid function, and the gradients are not restricted to move in a certain direction.
ReLU
• Non-linear activation function that has gained popularity in the machine learning domain.
• The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time.
f(x) = max(0, x)

relu_function(7), relu_function(-7)
Output? (7, 0)
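A minimal sketch of the relu_function assumed by the call above (the slide does not show its definition):

def relu_function(x):
    # Pass positive inputs through unchanged; clamp negative inputs to zero.
    return max(0, x)

relu_function(7), relu_function(-7)
# Output: (7, 0)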
• For the negative input values, the result is zero, which means the neuron does not get activated.
• The ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
• On the negative side, the gradient value is zero. During the backpropagation process, the weights and biases for some neurons are therefore not updated. This can create dead neurons which never get activated. This is taken care of by the Leaky ReLU function.
Leaky ReLU
• An improved version of the ReLU function.
• For the ReLU function, the gradient is 0 for x < 0, which would deactivate the neurons in that region.
• Instead of defining the ReLU function as 0 for negative values of x, we define it as an extremely small linear component of x.
f(x) = 0.01x, x < 0
     = x,     x >= 0
f'(x) = 1,    x >= 0
      = 0.01, x < 0

leaky_relu_function(7), leaky_relu_function(-7)
Output? (7, -0.07)
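A minimal sketch of the leaky_relu_function assumed by the call above (not shown on the slide):

def leaky_relu_function(x):
    # Small linear component (slope 0.01) for negative inputs instead of zero.
    if x < 0:
        return 0.01 * x
    return x

leaky_relu_function(7), leaky_relu_function(-7)
# Output: (7, -0.07)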
• Apart from Leaky ReLU, there are a few other variants of ReLU; the two most popular are the Parameterised ReLU function and the Exponential Linear Unit.
Parameterised ReLU
• Another variant of ReLU that solves the problem of gradients becoming zero for the left half of the axis.
• As the name suggests, it introduces a new parameter as the slope of the negative part of the function.
f(x) = x,  x >= 0      f'(x) = 1, x >= 0
     = ax, x < 0              = a, x < 0
• When the value of a is fixed to 0.01, the function acts as a Leaky ReLU function. In the case of a parameterised ReLU function, 'a' is also a trainable parameter: the network learns the value of 'a' for faster and more optimum convergence.
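A hypothetical sketch of the parameterised ReLU forward pass; the function name and the example slope a=0.05 are assumptions (in a real network 'a' would be learned during training):

def parameterised_relu(x, a=0.05):
    # 'a' is the trainable slope of the negative part; 0.05 is only illustrative.
    return x if x >= 0 else a * x

parameterised_relu(7), parameterised_relu(-7)
# Output: (7, -0.35)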
Swish
• Lesser-known activation function, discovered by researchers at Google.
• As computationally efficient as ReLU and shows better performance than ReLU on deeper models.
• The values for swish range from negative infinity to infinity.
f(x) = x*sigmoid(x)
f(x) = x/(1+e^(-x))

swish_function(-67), swish_function(4)
Output? approximately (-5.35e-28, 3.93)
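A minimal sketch of the swish_function assumed by the call above, written directly from f(x) = x*sigmoid(x):

import numpy as np

def swish_function(x):
    # x multiplied by sigmoid(x): large negative inputs give tiny negative outputs.
    return x / (1 + np.exp(-x))

swish_function(-67), swish_function(4)
# Output: approximately (-5.35e-28, 3.93)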
• The curve of the function is smooth, and the function is differentiable at all points. This is helpful during the model optimization process and is one of the reasons that swish outperforms ReLU.
• The swish function is not monotonic: the value of the function may decrease even when the input values are increasing.
Softmax
• Described as a combination of multiple sigmoids.
• Sigmoid returns values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular class.
• Sigmoid is widely used for binary classification problems.
• The softmax function can be used for multiclass classification problems.
• Returns the probability for a data point belonging to each individual class.
• For a multiclass problem, the output layer would have as many neurons as the number of classes in the target; if we have three classes, there would be three neurons in the output layer, producing raw outputs such as [1.2, 0.9, 0.75].
• The softmax function applied to [1.2, 0.9, 0.75] will give the following result: [0.42, 0.31, 0.27], representing the probability of the data point belonging to each class; the sum of all the values is 1.
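A minimal numpy sketch (function name assumed) that reproduces the example above:

import numpy as np

def softmax_function(x):
    # Exponentiate each score, then normalise so the outputs sum to 1.
    e = np.exp(x)
    return e / e.sum()

print(np.round(softmax_function([1.2, 0.9, 0.75]), 2))
# Output: [0.42 0.31 0.27]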
Choosing the right Activation Function
• Good or bad – there is no rule of thumb.
• Depending upon the properties of the problem, we might be able to make a better choice for easier and quicker convergence of the network.
• Sigmoid functions and their combinations generally work better in the case of classifiers.
• Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem.
• ReLU is a general-purpose activation function and is used in most cases these days.
• If we encounter a case of dead neurons in our network, the leaky ReLU function is the best choice.
• Always keep in mind that the ReLU function should only be used in the hidden layers.
• You can begin with the ReLU function and then move to other activation functions if ReLU doesn't provide optimum results.