The Step Function is an activation function used in binary classification tasks: a neuron activates (outputs 1) if the input is greater than or equal to 0 and deactivates (outputs 0) otherwise. It is simple and can be implemented with an if-else condition, but it cannot be used for multi-class classification and has a gradient of zero everywhere, which makes it unsuitable for backpropagation in deep learning as it prevents learning through gradient descent. The function is mathematically represented as:
· f(x) = 1 if x ≥ 0
· f(x) = 0 if x < 0
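A minimal NumPy sketch of the step function described above (the function name is illustrative, not from the original notes):

```python
import numpy as np

def step(x):
    # Outputs 1 where x >= 0, otherwise 0; the gradient is zero everywhere,
    # so this activation cannot be trained with gradient descent.
    return np.where(x >= 0, 1, 0)

print(step(np.array([-2.0, 0.0, 3.5])))  # [0 1 1]
```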
2) Sigmoid Activation Function:-
It is one of the most widely used activation functions, as it is non-linear. The sigmoid function transforms values into the range 0 to 1. It can be defined as: f(x) = 1/(1 + e^(−x)). The sigmoid function is continuously differentiable and a smooth S-shaped function. The derivative of the function is: f′(x) = sigmoid(x)·(1 − sigmoid(x)). Also, the sigmoid function is not symmetric about zero, which means that the signs of all output values of the neurons will be the same. This issue can be improved by scaling the sigmoid function.
3) TanH Activation Function:-
It is the Hyperbolic Tangent function. The tanh function is similar to the sigmoid function but is symmetric around the origin. This results in outputs of different signs from previous layers, which are then fed as inputs to the next layer. It can be defined as: f(x) = 2·sigmoid(2x) − 1. The tanh function is continuous and differentiable, and its values lie in the range −1 to 1. Compared to the sigmoid function, the gradient of the tanh function is steeper. Tanh is preferred over the sigmoid function as its gradients are not restricted to vary in only one direction and it is zero-centered.
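A small NumPy sketch of sigmoid and tanh following the definitions above (the helper names are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Squashes inputs into (0, 1); derivative is sigmoid(x) * (1 - sigmoid(x)).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered variant; equal to 2 * sigmoid(2x) - 1, with range (-1, 1).
    return 2.0 * sigmoid(2.0 * x) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # approx [0.119, 0.5, 0.881]
print(tanh(x))     # approx [-0.964, 0.0, 0.964]
```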
4) ReLU Activation Function:-
ReLU stands for rectified linear unit and is a non-linear activation function that is widely used in neural networks. The advantage of using the ReLU function is that not all neurons are activated at the same time: a neuron is deactivated whenever the output of the linear transformation is less than zero. It can be defined mathematically as: f(x) = max(0, x). ReLU is more efficient than other functions because not all neurons are activated at the same time; only a certain number of neurons are active at any given time. However, for negative inputs the gradient is zero, due to which the corresponding weights and biases are not updated during the backpropagation step in neural network training.
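A minimal NumPy sketch of ReLU as defined above (function name is illustrative):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): negative inputs are zeroed out, positive inputs pass through.
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```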
5) Leaky ReLU Activation Function:-
Leaky ReLU is an improved version of the ReLU function where, for negative values of x, instead of defining the function's value as zero, it is defined as an extremely small linear component of x. It can be expressed mathematically as:
f(x) = 0.01x, x < 0
f(x) = x, x ≥ 0
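A minimal NumPy sketch of Leaky ReLU with the 0.01 slope used above (the function name and default slope parameter are illustrative):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Negative inputs keep a small slope (0.01 by default) instead of being zeroed,
    # so some gradient still flows for x < 0.
    return np.where(x >= 0, x, slope * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03  0.    2.5 ]
```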
Batch Normalization:-
Batch normalization (BN) is a powerful technique to address issues like vanishing and exploding gradients and internal covariate shift in training deep neural networks. The core idea is to normalize the inputs to a layer such that the mean and variance are controlled, thereby stabilizing and speeding up training.
Key Concepts of Batch Normalization:
1. Internal Covariate Shift:
   1. During training, as parameters are updated, the hidden layer inputs change, leading to instability and slower convergence. This shift is called internal covariate shift.
   2. Batch normalization helps mitigate this by normalizing the input at each layer, ensuring that the distribution of inputs remains more stable.
2. Normalization Layer:
   1. The BN layer is introduced between hidden layers in the network. It normalizes inputs such that they have a mean of zero and a standard deviation of one over each mini-batch of training data.
   2. The normalization is followed by a scaling and shifting operation using learnable parameters βi (shift) and γi (scale).
3. Choices for Normalization:
   1. Post-activation normalization: The values are normalized after applying the activation function.
   2. Pre-activation normalization: The normalization is applied right after the linear transformation (before applying the activation function).
   3. Research suggests that normalizing pre-activation values is more effective, leading to faster and more stable convergence during training.
4. Mathematical Operations:
   1. For each unit i, the mini-batch mean μi and variance σi² are computed, and each pre-activation value vi is normalized as v̂i = (vi − μi)/√(σi² + ε), where ε is a small constant added for numerical stability.
   2. The normalized value is then scaled and shifted using the learnable parameters: ai = γi · v̂i + βi. A minimal NumPy sketch of this forward pass is given after this list.
5. Back propagation Through Batch Normalization:
   1. BN involves additional parameters βi and γi that must be updated during backpropagation.
   2. To update these parameters, the gradients of the loss L with respect to βi and γi are computed as ∂L/∂βi = Σ ∂L/∂ai and ∂L/∂γi = Σ (∂L/∂ai) · v̂i, where the sums run over the mini-batch.
   3. Backpropagation through the BN layer also requires computing the gradients with respect to the mean μi and variance σi², which are batch-dependent. This non-linearity adds complexity but still allows the network to backpropagate through these layers.
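The forward pass from step 4 can be sketched in NumPy as follows (the function name, array shapes, and ε value are illustrative assumptions, not from the original notes):

```python
import numpy as np

def batch_norm_forward(v, gamma, beta, eps=1e-5):
    """Normalize a mini-batch of pre-activations v of shape (batch, units),
    then scale by gamma and shift by beta (both of shape (units,))."""
    mu = v.mean(axis=0)                     # per-unit mini-batch mean
    var = v.var(axis=0)                     # per-unit mini-batch variance
    v_hat = (v - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * v_hat + beta             # learnable scale and shift

rng = np.random.default_rng(0)
v = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
out = batch_norm_forward(v, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0))  # approximately 0 for each unit
print(out.std(axis=0))   # approximately 1 for each unit
```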
Key Benefits of Batch Normalization:
Faster Training: By reducing internal covariate shift, BN allows higher learning rates, leading to faster training.
Improved Stability: The controlled input distributions reduce the risk of vanishing or exploding gradients.
Regularization Effect: BN introduces a slight regularization effect, sometimes reducing the need for dropout.
Ensemble Methods
Ensemble methods help improve classifier performance by
addressing the bias-variance trade-off. Bagging reduces variance, while Boosting reduces bias. Neural networks, which tend to have low bias but high variance, benefit from ensemble methods to enhance generalization.
4.5.1 Bagging and Subsampling
Bagging: Creates multiple models by sampling the training data
with replacement and averaging the predictions to reduce variance.
o Customarily, the sample size is s = n (the size of the original data), but a smaller s can work better.
Subsampling: Similar to bagging but without replacement.
Preferable when sufficient data is available.
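A short scikit-learn sketch of bagging versus subsampling (the dataset and hyperparameters are illustrative; bootstrap=True samples with replacement, bootstrap=False samples without replacement):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: each tree sees a bootstrap sample drawn with replacement (s = n).
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                            max_samples=1.0, bootstrap=True, random_state=0)

# Subsampling: each tree sees a smaller sample drawn without replacement.
subsampling = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                                max_samples=0.6, bootstrap=False, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(subsampling, X, y, cv=5).mean())
```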
4.5.2 Parametric Model Selection and Averaging
Model selection involves finding the best configuration from a set of hyperparameters. Model averaging instead averages the predictions from the top-k configurations for more robust predictions.
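A minimal sketch of averaging predictions from the top-k configurations, assuming each configuration's model exposes predicted class probabilities (the helper name, scores, and k are placeholders):

```python
import numpy as np

def average_top_k(prob_list, val_scores, k=3):
    """prob_list: list of (n_samples, n_classes) probability arrays, one per configuration.
    val_scores: validation score of each configuration."""
    top_k = np.argsort(val_scores)[::-1][:k]           # indices of the k best configurations
    stacked = np.stack([prob_list[i] for i in top_k])  # shape (k, n_samples, n_classes)
    return stacked.mean(axis=0)                        # averaged ensemble prediction

# Toy example: three configurations predicting probabilities for 2 samples, 2 classes.
probs = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([[0.6, 0.4], [0.5, 0.5]])]
print(average_top_k(probs, val_scores=[0.91, 0.88, 0.75], k=2))
```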
4.5.3 Randomized Connection Dropping
Randomly drop connections between layers in a neural network.
Diverse models are generated, and averaging their predictions improves accuracy.
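A tiny NumPy sketch of randomly dropping connections (masking individual weights) to build diverse ensemble members; the function name and drop probability are illustrative:

```python
import numpy as np

def drop_connections(W, drop_prob=0.3, rng=None):
    # Zero out each weight (connection) independently with probability drop_prob,
    # yielding a different thinned network each time this is called.
    rng = rng or np.random.default_rng()
    mask = rng.random(W.shape) >= drop_prob
    return W * mask

W = np.ones((4, 3))              # toy weight matrix between two layers
print(drop_connections(W, 0.3))  # a random subset of entries set to 0
```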
4.5.4 Dropout
· Node Sampling: Dropout randomly drops nodes (input and hidden) along with their connections to create different neural networks during training.
· Weight Sharing: Different sampled networks share the same weights, updated via backpropagation.
· Sampling Process: Each node is sampled with a probability, typically between 20-50%, and all edges connected to dropped nodes are also removed.
· Training with Dropout: A new neural network is sampled for each mini-batch, making the number of sampled networks large.
· Weight Scaling Inference Rule: At inference, the base network (no dropping) is used with re-scaled weights to approximate the ensemble's output (see the sketch after this list).
· Regularization: Dropout acts as a regularizer by introducing noise (setting random nodes to 0), preventing overfitting and feature co-adaptation.
· Feature Co-Adaptation: Dropout prevents complex dependencies between features by encouraging the network to rely on subsets of features, increasing generalization.
· Larger Models Needed: Due to the regularization effect, larger models and more units are often required.
· Performance Improvement: Commonly improves model performance by around 2% on large datasets like ImageNet.
· DropConnect: A variation that applies dropout to weights instead of nodes.
4.5.5 Data Perturbation Ensembles:
Noise on Input Data: Add small amounts of noise to the input data, train multiple models, and average their predictions for better generalization.
Noise in Hidden Layers: Inject noise into the hidden layers (e.g., Dropout), but careful calibration is needed to avoid degrading performance.
Dropout: Randomly drops nodes in neural networks, indirectly adding noise to the hidden layers and enhancing robustness.
Data Augmentation: Apply transformations such as rotations and translations to increase the dataset size, improving model generalization (used in CNNs).
Denoising Autoencoders: Common in unsupervised learning; reconstruct input data from noisy versions to enhance feature learning.
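A small sketch of the input-noise idea: train several copies of a model on noise-perturbed inputs and average their predictions (the dataset, learner, and noise scale are placeholders standing in for a real setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
rng = np.random.default_rng(0)

# Train several models, each on a noise-perturbed copy of the input data.
models = []
for _ in range(5):
    X_noisy = X + rng.normal(scale=0.1, size=X.shape)  # small Gaussian input noise
    models.append(LogisticRegression(max_iter=1000).fit(X_noisy, y))

# Average the predicted probabilities of the perturbed-data models.
avg_proba = np.mean([m.predict_proba(X) for m in models], axis=0)
ensemble_pred = avg_proba.argmax(axis=1)
print((ensemble_pred == y).mean())  # ensemble accuracy on the training data
```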