Activation Functions:
• Artificial neurons are the elementary units of an artificial neural
network. An artificial neuron receives one or more inputs, weights
each input separately, and sums them. The weighted sum is then passed
through a function known as an activation function (or transfer
function) to produce the output.
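A minimal sketch of this forward pass in Python (the function names, example
weights, and bias below are illustrative, not taken from the slides):
def neuron_output(inputs, weights, bias, activation):
    # Weight each input separately, sum, then pass through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Example with a simple threshold (step) activation.
step = lambda z: 1.0 if z >= 0.0 else 0.0
print(neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.2, activation=step))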
Threshold activation function
• The threshold activation function is defined by f(x) = 1 if x ≥ θ and
f(x) = 0 if x < θ, where θ is the threshold value.
Unit step functions
• Sometimes the threshold activation function is defined as a unit step
function (threshold at zero), in which case it is called a unit-step
activation function.
Sigmoid activation function (logistic function):
• One of the most commonly used activation functions is the sigmoid
(logistic) activation function, defined as σ(x) = 1 / (1 + e^(-x)).
• Its graph is an ‘S’-shaped curve, and its output always lies between 0 and 1.
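A minimal sketch of the sigmoid in Python (a plain illustrative implementation,
not library code):
import math

def sigmoid(x):
    # Logistic function: maps any real value into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, 0.0, 5.0):
    print('sigmoid(%.1f) = %.4f' % (x, sigmoid(x)))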
GRADIENT DESCENT IN MACHINE LEARNING
• Gradient descent is one of the most commonly used optimization
algorithms for training machine learning models; it works by minimizing
the error between the actual and predicted results.
• In mathematical terms, an optimization algorithm minimizes (or maximizes)
an objective function f(x) with respect to its parameters x. In machine
learning, optimization is the task of minimizing the cost function with
respect to the model's parameters.
• Gradient descent is an iterative optimization algorithm, widely used for
training machine learning and deep learning models, that helps find a
local minimum of a function.
• If we move in the direction of the negative gradient (away from the
gradient) of the function at the current point, we approach a local
minimum of that function.
• If we move in the direction of the positive gradient (towards the
gradient) of the function at the current point, we approach a local
maximum of that function.
Cost function
The cost function measures the difference (error) between the actual
values and the predicted values at the current parameter position.
Direction & Learning Rate
• The gradient gives the direction of steepest increase, so each update
step moves in the opposite (negative gradient) direction. The learning
rate controls how large each step is: too large a rate can overshoot the
minimum, while too small a rate makes training slow.
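A minimal sketch of the gradient descent update rule, w ← w − α·∇J(w), on a
one-dimensional cost function (the example function, learning rate, and step
count below are illustrative assumptions):
def gradient_descent(grad, w0, learning_rate=0.1, steps=50):
    # Repeatedly step in the direction of the negative gradient.
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Example: minimize J(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
print(gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0))  # approaches 3.0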
Types of Gradient Descent
• Batch gradient descent
• Stochastic gradient descent
• Mini-batch gradient descent
1. Batch Gradient Descent:
• Batch gradient descent (BGD) computes the error for every point in the
training set and updates the model only after evaluating all training
examples.
• One complete pass through the training set is known as a training epoch.
2. Stochastic Gradient Descent:
• Stochastic gradient descent (SGD) is a type of gradient descent that
updates the model using a single training example per iteration.
3. Mini-Batch Gradient Descent:
• Mini-batch gradient descent combines batch gradient descent and
stochastic gradient descent: it divides the training dataset into small
batches and performs an update on each batch separately (see the sketch
after this list).
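A minimal sketch contrasting the three variants on a linear least-squares cost;
the data, learning rate, epoch count, and batch sizes below are illustrative
assumptions:
import numpy as np

def gradient(w, X, y):
    # Gradient of the mean squared error (1/n) * ||X @ w - y||^2.
    return 2.0 * X.T @ (X @ w - y) / len(y)

def train(X, y, batch_size, learning_rate=0.1, epochs=100):
    # batch_size = len(y)      -> batch gradient descent (one update per epoch)
    # batch_size = 1           -> stochastic gradient descent (noisier updates)
    # 1 < batch_size < len(y)  -> mini-batch gradient descent
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = np.random.permutation(len(y))
        for start in range(0, len(y), batch_size):
            idx = order[start:start + batch_size]
            w -= learning_rate * gradient(w, X[idx], y[idx])
    return w

X = np.random.randn(100, 2)
y = X @ np.array([2.0, -1.0])
print(train(X, y, batch_size=len(y)))   # batch
print(train(X, y, batch_size=1))        # stochastic
print(train(X, y, batch_size=16))       # mini-batch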
Challenges with Gradient Descent
1. Local Minima and Saddle Points:
• Whenever the slope of the cost function is zero or very close to zero,
the model stops learning further, even though the point may not be the
global minimum.
2. Vanishing and Exploding Gradients:
Vanishing gradient:
• A vanishing gradient occurs when the gradient becomes far smaller than
expected, so the earlier layers receive almost no update.
Exploding gradient:
• An exploding gradient is the opposite of a vanishing gradient: it occurs
when the gradient grows too large, which creates an unstable model.
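A minimal numeric sketch of why this happens: during backpropagation the
gradient is multiplied by a factor at every layer, so per-layer factors below 1
shrink it exponentially and factors above 1 blow it up (the factors and layer
count below are illustrative):
def backprop_scale(per_layer_factor, num_layers):
    # The gradient reaching the first layer is scaled by the product
    # of the per-layer factors.
    return per_layer_factor ** num_layers

print(backprop_scale(0.25, 20))  # ~9.1e-13: vanishing gradient
print(backprop_scale(2.0, 20))   # ~1.0e+06: exploding gradient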
UNIT SATURATION
• Unit saturation occurs when the output of an activation
function reaches its maximum or minimum value and stops
responding to changes in input. This can hinder the learning
process of neural networks.
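A minimal sketch of saturation with the sigmoid: for large |x| the output
flattens out near 0 or 1 and the derivative, σ(x)·(1 − σ(x)), is almost zero
(the sample inputs below are illustrative):
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (0.0, 5.0, 10.0):
    s = sigmoid(x)
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x)).
    print('x=%5.1f  output=%.6f  derivative=%.6f' % (x, s, s * (1.0 - s)))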
The problem
• As more layers using certain activation functions (such as the sigmoid)
are added to a neural network, the gradients of the loss function approach
zero, making the network hard to train.
Solution
• The simplest solution is to use other activation functions, such as
ReLU, which does not produce a small derivative for positive inputs.
Residual networks are another solution, as they provide residual (skip)
connections straight to earlier layers.
ReLU
• The rectified linear unit, or ReLU, is one of the landmarks of the deep
learning revolution. It is simple, yet in deep hidden layers it generally
works better than earlier activation functions such as sigmoid or tanh.
• The ReLU formula is: f(x) = max(0, x)
• If the function receives a negative input, it returns 0; if it receives
any positive value x, it returns that value unchanged.
def relu(x):
    # Return x for positive inputs and 0.0 otherwise.
    return max(0.0, x)
To test the function, let’s run it on a few inputs.
x = 1.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -10.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 0.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = 15.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
x = -20.0
print('Applying Relu on (%.1f) gives %.1f' % (x, relu(x)))
Advantages of ReLU:
• ReLU is used in the hidden layers instead of sigmoid or tanh, because
using sigmoid or tanh in the hidden layers leads to the infamous
"vanishing gradient" problem.
• Simpler computation and lower computational cost
• Mitigates the vanishing gradient problem
Disadvantages of ReLU:
• Exploding gradient
• Dying ReLU
• Sensitivity to outliers
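A minimal sketch of the "dying ReLU" issue: the derivative of ReLU is zero for
negative inputs, so a unit whose pre-activation stays negative receives no
gradient and stops learning (the values below are illustrative):
def relu(x):
    return max(0.0, x)

def relu_derivative(x):
    # Gradient of ReLU: 1 for positive inputs, 0 for negative inputs.
    return 1.0 if x > 0.0 else 0.0

for z in (2.0, -3.0):
    print('pre-activation %.1f -> output %.1f, gradient %.1f'
          % (z, relu(z), relu_derivative(z)))
# A unit stuck with a negative pre-activation always gets gradient 0,
# so its weights are never updated ("dying ReLU").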
HYPERPARAMETER TUNING
• Hyperparameters express important properties of the model, such as its
complexity or how fast it should learn.
• They are usually fixed before the actual training process begins, and
tuning them means searching for the combination of values that gives the
best model performance.
EXAMPLE:
• Grid Search CV
It searches for the best set of hyperparameters from a grid of
hyperparameter values, evaluating each combination with cross-validation.
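A minimal sketch using scikit-learn's GridSearchCV (the estimator, parameter
grid, and dataset below are illustrative choices, not prescribed by the slides):
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid of hyperparameter values to try for an SVM classifier.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}

# Evaluate every combination with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)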
DROPOUT
• "Dropout" in machine learning refers to the process of randomly
ignoring (dropping) certain nodes in a layer during training, which helps
prevent overfitting.
• Reference: "Dropout: A Simple Way to Prevent Neural Networks from
Overfitting" (Srivastava et al., 2014).
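A minimal sketch of (inverted) dropout applied to a layer's activations; the
dropout rate and activation values below are illustrative:
import numpy as np

def dropout(activations, rate, training=True):
    # During training, randomly zero out a fraction `rate` of the units
    # and scale the survivors so the expected activation stays the same.
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = (np.random.rand(*activations.shape) < keep_prob)
    return activations * mask / keep_prob

layer_output = np.array([0.5, 1.2, -0.3, 0.8, 2.0])
print(dropout(layer_output, rate=0.4))                   # training: some units zeroed
print(dropout(layer_output, rate=0.4, training=False))   # inference: unchanged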
