
Unit 7: Introduction to Neural Networks
Prof. Bhavna Bose
What is a neural network?
• A neural network is a machine learning program, or model, that
makes decisions in a manner similar to the human brain, by using
processes that mimic the way biological neurons work together to
identify phenomena, weigh options and arrive at conclusions.
• Every neural network consists of layers of nodes, or artificial
neurons—an input layer, one or more hidden layers, and an output
layer. Each node connects to others, and has its own associated
weight and threshold.
• If the output of any individual node is above the specified threshold
value, that node is activated, sending data to the next layer of the
network.
• Otherwise, no data is passed along to the next layer of the network.
https://www.ibm.com/think/topics/neural-networks
Understanding Neural Networks
These networks are built from several key components:
1. Neurons: The basic units that receive inputs; each neuron is governed by a threshold and an activation function.
2. Connections: Links between neurons that carry information, regulated by weights and biases.
3. Weights and Biases: These parameters determine the strength and influence of connections.
4. Propagation Functions: Mechanisms that help process and transfer data across layers of neurons.
5. Learning Rule: The method that adjusts weights and biases over time to improve accuracy.

https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/
Learning in neural networks follows a structured, three-stage process:
1. Input Computation: Data is fed into the network.
2. Output Generation: Based on the current parameters, the network generates an output.
3. Iterative Refinement: The network refines its output by adjusting weights and biases, gradually improving its performance on diverse tasks.

https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input neuron in the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of these outputs varies depending on the specific task (e.g., classification, regression).

https://www.ibm.com/think/topics/neural-networks
Think of each individual node as its own linear regression model,
composed of input data, weights, a bias (or threshold), and an
output. The formula would look something like this:
∑wixi + bias = w1x1 + w2x2 + w3x3 + bias
• output = f(x) = 1 if ∑wixi + b >= 0; 0 if ∑wixi + b < 0

https://www.ibm.com/think/topics/neural-networks
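As a rough illustration of this formula (not part of the original source), the computation of a single node can be sketched in Python. The input values, weights, and bias below are made-up numbers:

```python
import numpy as np

# Hypothetical inputs, weights, and bias for a single node
x = np.array([0.5, 0.2, 0.8])   # input features x1..x3
w = np.array([0.4, -0.6, 0.9])  # weights w1..w3
b = -0.3                        # bias (acts as a negative threshold)

z = np.dot(w, x) + b            # weighted sum: w1*x1 + w2*x2 + w3*x3 + bias
output = 1 if z >= 0 else 0     # step activation: the node "fires" if z >= 0
print(z, output)
```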
• Once an input layer is determined, weights are assigned.
• These weights help determine the importance of any given variable,
with larger ones contributing more significantly to the output
compared to other inputs.
• All inputs are then multiplied by their respective weights and summed.
• Afterward, the output is passed through an activation function,
which determines the output.
• If that output exceeds a given threshold, it “fires” (or activates) the
node, passing data to the next layer in the network.
• This results in the output of one node becoming the input of the next node.
• This process of passing data from one layer to the next layer defines
this neural network as a feedforward network.

https://www.ibm.com/think/topics/neural-networks
• Biological neurons are pivotal in artificial neural network research,
mirroring the intricate structures responsible for brain functions.
• Soma, axons, dendrites, and synapses are part of neurons that help
process information. The McCulloch-Pitts Neuron is an early
computational model that simulates the basic operations of these
biological units.

https://www.analyticsvidhya.com/blog/2024/07/mcculloch-pitts-neuron/#h-what-are-biological-neurons
What are Biological Neurons?
• Biological neurons are the fundamental units of the brain. They
consist of:
• Dendrite: Receives signals from other neurons.
• Soma: Processes the information.
• Axon: Transmits the output to other neurons.
• Synapse: Connection points to other neurons.
• A neuron functions like a tiny biological computer, taking input
signals, processing them, and passing on the output.
What is McCulloch-Pitts Neuron?
The McCulloch-Pitts Neuron is the first computational model of a
neuron. It can be divided into two parts:
1.Aggregation: The neuron aggregates multiple boolean inputs (0 or 1).
2.Threshold Decision: Based on the aggregated value, the neuron
makes a decision using a threshold function.
Example Scenario
Imagine wanting to predict whether to watch a football game. The
inputs (boolean values) could be:
• X1: Is Premier League on? (1 if yes, 0 if no)
• X2: Is it a friendly game? (1 if yes, 0 if no)
• X3: Are you not at home? (1 if yes, 0 if no)
• X4: Is Manchester United playing? (1 if yes, 0 if no)
• Each input can be excitatory or inhibitory. For instance, X3 is inhibitory: if you are not at home, you cannot watch the game.
• Thresholding Logic
• The neuron fires (outputs 1) if the aggregated sum of inputs meets or
exceeds a threshold value (θ). For example, if you always watch the
game when at least two conditions are met, θ would be 2.
Boolean Functions Using the McCulloch-Pitts
Neuron
The McCulloch-Pitts Neuron can represent various boolean functions:
• AND Function: Fires when all inputs are ON (( x1 + x2 + x3 >= 3 )).
• OR Function: Fires when any input is ON (( x1 + x2 + x3 >= 1 )).
• Inhibitory Input Function: Fires only when specific conditions are met
(e.g., ( x1 ) AND NOT ( x2 )).
• NOR Function: Fires when all inputs are OFF.
• NOT Function: Inverts the input.
• The McCulloch-Pitts model of a neural network is a simple binary (on-
off) model designed to simulate basic logical operations. Below, I’ll
walk you through a step-by-step process to calculate a Boolean
function using this model. Let's consider the Boolean function F = A∧B (AND gate) as an example.
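A minimal sketch of the McCulloch-Pitts model in Python (an illustration, not from the slides), assuming boolean inputs and an integer threshold θ; the thresholds shown follow the AND and OR examples above:

```python
def mcculloch_pitts(inputs, theta):
    """Fire (return 1) if the sum of boolean inputs meets the threshold theta."""
    return 1 if sum(inputs) >= theta else 0

# AND over two inputs: fires only when all inputs are ON (theta = number of inputs)
print(mcculloch_pitts([1, 1], theta=2))  # 1  (A AND B)
print(mcculloch_pitts([1, 0], theta=2))  # 0

# OR over two inputs: fires when any input is ON (theta = 1)
print(mcculloch_pitts([0, 1], theta=1))  # 1  (A OR B)
```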
Geometric Interpretation
The McCulloch-Pitts Neuron can be visualized geometrically by plotting
inputs in a multi-dimensional space and drawing a decision boundary:
• OR Function: In 2D, the decision boundary is a line (( x1 + x2 = 1 )).
• AND Function: The decision boundary is a line (( x1 + x2 = 2 )).
• Generalization: The decision boundary becomes a plane in higher
dimensions for more inputs.
Single layer perceptron learning

w1 = [1, −1, 0, 0.5]ᵀ
• https://youtu.be/KKSCmPUyczU?si=RK6nGan4-bXP1Rw3
• https://www.youtube.com/watch?app=desktop&v=ItkSCYzSD34
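A hedged sketch of the single-layer perceptron learning rule (w ← w + η(t − y)x). The slides and linked videos use their own worked example; here the AND-gate training data, zero initialization, and learning rate are made-up illustrations:

```python
import numpy as np

# Training data for an AND gate; the first column is a bias input fixed at 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
t = np.array([0, 0, 0, 1])   # target outputs
w = np.zeros(3)              # weights [bias, w1, w2]
eta = 0.1                    # learning rate

for epoch in range(10):
    for xi, ti in zip(X, t):
        y = 1 if np.dot(w, xi) >= 0 else 0  # step activation
        w += eta * (ti - y) * xi            # perceptron update rule
print(w)  # learned weights separate the AND-gate classes
```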

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• A neuron, or artificial neuron, is a more generalized version of the
perceptron and is the building block of modern deep learning
architectures.
• Neurons in deep learning are part of multi-layer neural networks,
which can have multiple hidden layers.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
Key Differences and Features of a Neuron:
• Activation Function: Unlike the perceptron, which uses a simple step function for activation, neurons in modern neural networks can use a variety of activation functions, such as sigmoid, tanh, ReLU, and softmax.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Multi-layer Networks:
• Neurons are part of more sophisticated architectures called multi-
layer perceptrons (MLPs) or deep neural networks, where neurons
are organized into layers (input layer, hidden layers, and output layer).
• Each layer performs computations, and the output of one layer is fed
as input to the next.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Continuous Output: Neurons can output continuous values, unlike
the binary output of a perceptron. This makes them more versatile for
tasks like regression, multi-class classification, and complex feature
extraction.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Learning through Backpropagation:
• Neurons in deep learning models are trained using backpropagation
and gradient descent, which adjusts the weights based on the error
between the predicted and actual outputs.
• The perceptron uses a simpler update rule that works only for linearly
separable problems.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
Multi-layer neural network
• A multilayer neural network is an advanced model used in artificial
intelligence and machine learning.
• Unlike a perceptron, which has only one layer of neurons, a multilayer
neural network has multiple layers stacked on top of each other.
• Each layer receives input from the previous layer and applies a
mathematical operation called an activation function, such as the sigmoid
function.
• This allows the network to capture complex relationships between inputs
and outputs.
• To make the network learn, we use back-propagation, a technique that
adjusts the weights connecting the neurons based on the error between
the predicted and actual output.
• This adjustment is controlled by gradient descent or other algorithms.
• Through this iterative process, the network improves its ability to make
accurate predictions, ultimately enabling it to solve complex problems.
https://muneebsa.medium.com/deep-learning-101-lesson-9-multi-layer-neural-network-7cd3a53066c8
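A minimal sketch (not from the source) of a forward pass through one hidden layer with the sigmoid activation mentioned above; the layer sizes, random weights, and input sample are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input-to-hidden parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden-to-output parameters

x = np.array([0.2, 0.7, 0.1])  # one made-up input sample
h = sigmoid(W1 @ x + b1)       # hidden layer: weighted sum + activation
y = sigmoid(W2 @ h + b2)       # each layer's output feeds the next layer
print(y)
```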
Multilayer Perceptrons
• A multilayer perceptron is a type of feedforward neural network consisting of fully connected neurons with nonlinear activation functions. It is widely used to distinguish data that is not linearly separable.
• MLPs have been widely used in various fields, including image
recognition, natural language processing, and speech recognition,
among others. Their flexibility in architecture and ability to
approximate any function under certain conditions make them a
fundamental building block in deep learning and neural network
research. Let's take a deeper dive into some of its key concepts.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Input layer
• The input layer consists of nodes or neurons that receive the initial
input data. Each neuron represents a feature or dimension of the
input data. The number of neurons in the input layer is determined by
the dimensionality of the input data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Hidden layer
• Between the input and output layers, there can be one or more layers
of neurons. Each neuron in a hidden layer receives inputs from all
neurons in the previous layer (either the input layer or another
hidden layer) and produces an output that is passed to the next layer.
The number of hidden layers and the number of neurons in each
hidden layer are hyperparameters that need to be determined during
the model design phase.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Output layer
• This layer consists of neurons that produce the final output of the
network. The number of neurons in the output layer depends on the
nature of the task. In binary classification, there may be either one or two neurons, depending on the activation function, representing the probability of belonging to one class; in multi-class classification tasks, there can be multiple neurons in the output layer, typically one per class.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Weights
• Neurons in adjacent layers are fully connected to each other. Each
connection has an associated weight, which determines the strength
of the connection. These weights are learned during the training
process.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Bias neurons
• In addition to the input and hidden neurons, each layer (except the
input layer) usually includes a bias neuron that provides a constant
input to the neurons in the next layer. Bias neurons have their own
weight associated with each connection, which is also learned during
training.
• The bias neuron effectively shifts the activation function of the
neurons in the subsequent layer, allowing the network to learn an
offset or bias in the decision boundary. By adjusting the weights
connected to the bias neuron, the MLP can learn to control the
threshold for activation and better fit the training data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Activation function
• Typically, each neuron in the hidden layers and the output layer
applies an activation function to its weighted sum of inputs. Common
activation functions include sigmoid, tanh, ReLU (Rectified Linear
Unit), and softmax. These functions introduce nonlinearity into the
network, allowing it to learn complex patterns in the data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Training with backpropagation
• MLPs are trained using the backpropagation algorithm, which
computes gradients of a loss function with respect to the model's
parameters and updates the parameters iteratively to minimize the
loss.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
Working of an MLP
• Input layer
• The input layer of an MLP receives input data, which could be
features extracted from the input samples in a dataset. Each neuron
in the input layer represents one feature.
• Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
• Hidden layers
• The hidden layers of an MLP consist of interconnected neurons that
perform computations on the input data.
• Each neuron in a hidden layer receives input from all neurons in the
previous layer. The inputs are multiplied by corresponding weights,
denoted as w. The weights determine how much influence the input
from one neuron has on the output of another.
• In addition to weights, each neuron in the hidden layer has an
associated bias, denoted as b. The bias provides an additional input to
the neuron, allowing it to adjust its output threshold. Like weights,
biases are learned during training.
• For each neuron in a hidden layer or the output layer, the weighted sum of its inputs is computed. This involves multiplying each input by its corresponding weight, summing up these products, and adding the bias: z = w1x1 + w2x2 + … + wnxn + b.
• The weighted sum is then passed through an activation function,
denoted as f. The activation function introduces nonlinearity into the
network, allowing it to learn and represent complex relationships in
the data. The activation function determines the output range of the
neuron and its behavior in response to different input values. The
choice of activation function depends on the nature of the task and
the desired properties of the network.
• Output layer
• The output layer of an MLP produces the final predictions or outputs
of the network. The number of neurons in the output layer depends
on the task being performed (e.g., binary classification, multi-class
classification, regression).
• Each neuron in the output layer receives input from the neurons in
the last hidden layer and applies an activation function. This
activation function is usually different from those used in the hidden
layers and produces the final output value or prediction.
• During the training process, the network learns to adjust the weights
associated with each neuron's inputs to minimize the discrepancy
between the predicted outputs and the true target values in the
training data. By adjusting the weights and learning the appropriate
activation functions, the network learns to approximate complex
patterns and relationships in the data, enabling it to make accurate
predictions on new, unseen samples.
Sivanandam, S. N., and S. N. Deepa. Principles of soft computing (with CD). John Wiley & Sons, 2007.
Backpropagation
• The backpropagation learning algorithm is one of the most important developments in neural networks.
• This learning algorithm is applied to a multilayer feed-forward network consisting of processing elements with continuous differentiable activation functions.
• Neural networks associated with the back-propagation learning algorithm are called back-propagation networks (BPNs).
• For a given set of training input-output pairs, this algorithm provides a procedure for changing the weights in a BPN to classify the given input patterns correctly.
• The basic concept for this weight update algorithm is simply the gradient descent method as used in the case of simple perceptron networks with differentiable units.
• This is a method where the error is propagated back to the hidden units.
• The aim is to train the network to achieve a balance between the network's ability to respond to the training inputs (memorization) and its ability to give reasonable responses to input that is similar, but not identical, to that used in training (generalization).
Sivanandam, S. N., and S. N. Deepa. Principles of soft computing (with CD). John Wiley & Sons, 2007.
• The training of the BPN is done in three stages:
• the feed-forward of the input training pattern,
• the calculation and back-propagation of the error,
• and the updating of weights.
• The testing of the BPN involves computation of the feed-forward phase only.
• There can be more than one hidden layer (which can be beneficial), but one hidden layer is sufficient.
• Even though training is very slow, once the network is trained it can produce its outputs very rapidly.
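A compact numpy sketch of these three stages on a tiny two-layer network (an illustration, not the textbook's worked example); the XOR data, layer sizes, and learning rate are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # illustrative inputs (XOR)
T = np.array([[0], [1], [1], [0]], dtype=float)              # illustrative targets

W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)  # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)  # hidden -> output
eta = 0.5                                      # learning rate

for _ in range(5000):
    # Stage 1: feed-forward of the input training pattern
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Stage 2: calculation and back-propagation of the error
    dY = (Y - T) * Y * (1 - Y)      # output-layer delta (sigmoid derivative)
    dH = (dY @ W2.T) * H * (1 - H)  # error propagated back to the hidden units
    # Stage 3: updating of weights (gradient descent)
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(Y.round(2))  # outputs should approach the XOR targets
```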
• https://www.youtube.com/watch?v=tUoUdOdTkRw
• https://www.youtube.com/watch?v=ItkSCYzSD34
• https://www.youtube.com/watch?v=tTjcakAuHPI
What is a Convolutional Neural Network?
• A Convolutional Neural Network (CNN) is a type of artificial neural network especially good at processing images and videos. CNNs draw inspiration from the structure of the human visual cortex.
• You can use CNNs in many applications, including image recognition,
facial recognition, and medical imaging analysis. They are able to
automatically extract features from images, which makes them very
powerful tools.

• https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks
• There are several reasons why CNNs are important in the modern world, as highlighted
below:
• CNNs are distinguished from classic machine learning algorithms such
as SVMs and decision trees by their ability to autonomously extract features at a large
scale, bypassing the need for manual feature engineering and thereby enhancing
efficiency.
• The convolutional layers grant CNNs their translation-invariant characteristics,
empowering them to identify and extract patterns and features from data irrespective of
variations in position, orientation, scale, or translation.
• A variety of pre-trained CNN architectures, including VGG-16, ResNet50, Inceptionv3,
and EfficientNet, have demonstrated top-tier performance. These models can be
adapted to new tasks with relatively little data through a process known as fine-tuning.
• Beyond image classification tasks, CNNs are versatile and can be applied to a range of
other domains, such as natural language processing, time series analysis, and speech
recognition.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Key Components of a CNN

• The convolutional neural network is made of four main parts, which help CNNs mimic how the human brain operates to recognize patterns and features in images:

• Convolutional layers
• Rectified Linear Unit (ReLU for short)
• Pooling layers
• Fully connected layers
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Convolution layers
• This is the first building block of a CNN. As the name suggests, the
main mathematical task performed is called convolution, which is the
application of a sliding window function to a matrix of pixels
representing an image. The sliding function applied to the matrix is called a kernel or filter; the two terms are used interchangeably.
• In the convolution layer, several filters of equal size are applied, and
each filter is used to recognize a specific pattern from the image, such
as the curving of the digits, the edges, the whole shape of the digits,
and more.
• It requires a few components, which are input data, a filter and
a feature map.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• Put simply, in the convolution layer, we use small grids (called filters
or kernels) that move over the image. Each small grid is like a mini
magnifying glass that looks for specific patterns in the photo, like
lines, curves, or shapes. As it moves across the photo, it creates a new
grid that highlights where it found these patterns.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• The kernel used for the convolution is a matrix with a dimension of 3x3. The weights of each element of the kernel are represented in the grid.
• In real life, the weights of the kernels are determined during the training process of the neural network.
• Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
1.Apply the kernel matrix from the top-left corner to the right.
2.Perform element-wise multiplication.
3.Sum the values of the products.
4.The resulting value corresponds to the first value (top-left corner) in the convoluted
matrix.
5.Move the kernel down with respect to the size of the sliding window.
6.Repeat steps 1 to 5 until the image matrix is fully covered.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
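A minimal numpy sketch of these steps (stride 1, no padding), not from the source; the 5x5 "image" and 3x3 kernel values are made up:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and, at each position, take the
    element-wise product with the window and sum the results."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # illustrative 3x3 kernel
print(conv2d(image, kernel))                      # 3x3 feature map
```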
• Another name associated with the kernel in the literature is feature
detector because the weights can be fine-tuned to detect specific
features in the input image.
• For instance:
• A kernel that averages neighboring pixels can be used to blur the input image.
• A kernel that subtracts neighboring pixels is used to perform edge detection.
• The more convolution layers the network has, the better it is at detecting more abstract features.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
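Two classic kernels that illustrate these effects, usable with the conv2d sketch above; the exact weights are illustrative, not from the source:

```python
import numpy as np

# Averaging kernel: each output pixel is the mean of a 3x3 neighborhood,
# which blurs the image.
blur_kernel = np.ones((3, 3)) / 9.0

# Laplacian-style kernel: subtracts neighboring pixels from the center,
# which responds strongly at edges.
edge_kernel = np.array([[ 0.0, -1.0,  0.0],
                        [-1.0,  4.0, -1.0],
                        [ 0.0, -1.0,  0.0]])
```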
• Note that the weights in the
feature detector remain fixed as
it moves across the image,
which is also known as
parameter sharing.
• Some parameters, such as the weight values, are adjusted during training through the process of backpropagation and gradient descent. However, there are three hyperparameters affecting the volume size of the output that need to be set before the training of the neural network begins.

https://www.ibm.com/think/topics/convolutional-neural-networks
• Another convolution layer can follow the
initial convolution layer.
• When this happens, the structure of the
CNN can become hierarchical as the later
layers can see the pixels within the
receptive fields of prior layers.
• As an example, let’s assume that we’re
trying to determine if an image contains a
bicycle.
• You can think of the bicycle as a sum of
parts.
• It is comprised of a frame, handlebars,
wheels, pedals, and so on.
• Each individual part of the bicycle makes up
a lower-level pattern in the neural net, and
the combination of its parts represents a
higher-level pattern, creating a feature
hierarchy within the CNN.
• Ultimately, the convolutional layer converts
the image into numerical values, allowing
the neural network to interpret and extract
relevant patterns.

https://www.ibm.com/think/topics/convolutional-neural-networks
• These include:
• 1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three.
• 2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
• 3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last convolution is dropped if dimensions do
not align.
• Same padding: This padding ensures that the output layer has the same size as the input layer.
• Full padding: This type of padding increases the size of the output by adding zeros to the border of the input.
• After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU)
transformation to the feature map, introducing nonlinearity to the model.

https://www.ibm.com/think/topics/convolutional-neural-networks
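These three hyperparameters determine the output size through the standard relation output = (W − F + 2P) / S + 1 (input width W, filter size F, padding P, stride S). A small sketch with assumed sizes, added here for illustration, makes the effect concrete:

```python
def conv_output_size(w, f, stride, padding):
    """Standard relation: output width = (W - F + 2P) / S + 1."""
    return (w - f + 2 * padding) // stride + 1

print(conv_output_size(32, 3, stride=1, padding=0))  # valid padding -> 30
print(conv_output_size(32, 3, stride=1, padding=1))  # same padding  -> 32
print(conv_output_size(32, 3, stride=2, padding=1))  # larger stride -> 16
```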
Activation function
• A ReLU activation function is applied after each convolution
operation.
• This function helps the network learn non-linear relationships
between the features in the image, hence making the network more
robust for identifying different patterns.
• It also helps to mitigate the vanishing gradient problems.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Pooling layer
• The goal of the pooling layer is to pull the most significant features from the
convoluted matrix.
• This is done by applying some aggregation operations, which reduce the
dimension of the feature map (convoluted matrix), hence reducing the memory
used while training the network.
• Pooling is also relevant for mitigating overfitting.
• The most common aggregation functions that can be applied are:
• Max pooling, which is the maximum value of the feature map
• Sum pooling corresponds to the sum of all the values of the feature map
• Average pooling is the average of all the values.
• Also, the dimension of the feature map becomes smaller as the pooling function
is applied.
• The last pooling layer flattens its feature map so that it can be processed by the
fully connected layer.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the number of parameters in the input. Similar to the
convolutional layer, the pooling operation sweeps a filter across the entire
input, but the difference is that this filter does not have any weights.
Instead, the kernel applies an aggregation function to the values within
the receptive field, populating the output array. There are two main types
of pooling:
• Max pooling: As the filter moves across the input, it selects the pixel with the
maximum value to send to the output array. As an aside, this approach tends to
be used more often compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the average
value within the receptive field to send to the output array.
• While a lot of information is lost in the pooling layer, it also has a number
of benefits to the CNN. They help to reduce complexity, improve
efficiency, and limit risk of overfitting.

https://www.ibm.com/think/topics/convolutional-neural-networks
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
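A minimal numpy sketch (not from the source) of both pooling types over non-overlapping 2x2 windows; the feature-map values are made up:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Aggregate non-overlapping size x size windows (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h*size, :w*size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))  # max pooling keeps the largest value
    return windows.mean(axis=(1, 3))     # average pooling keeps the mean

fmap = np.array([[1., 3., 2., 0.],
                 [4., 8., 1., 1.],
                 [0., 2., 6., 5.],
                 [1., 1., 3., 7.]])
print(pool2d(fmap, mode="max"))      # [[8. 2.] [2. 7.]]
print(pool2d(fmap, mode="average"))  # mean of each 2x2 window
```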
Fully connected layers
• These layers form the last stage of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer.
• ReLU activation functions are applied to them for non-linearity.
• Finally, a softmax prediction layer is used to generate probability
values for each of the possible output labels, and the final label
predicted is the one with the highest probability score.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
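A minimal sketch of the softmax computation described here; the scores are made-up values, and subtracting the maximum is a standard numerical-stability trick not mentioned in the source:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three labels
probs = softmax(scores)
print(probs, probs.argmax())        # the highest-probability label is predicted
```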
Overfitting and Regularization in CNNs
• Overfitting is a common challenge in machine learning models and CNN deep learning projects. It happens when the model learns the training data too well ("learning by heart"), including its noise and outliers. Such learning leads to a model that performs well on the training data but badly on new, unseen data.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Several regularization techniques help mitigate overfitting:
• Dropout: This consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
• Batch normalization: Overfitting is reduced to some extent by normalizing the layer inputs, adjusting and scaling the activations. This approach is also used to speed up and stabilize the training process.
• Pooling Layers: This can be used to reduce the spatial dimensions of the
input image to provide the model with an abstracted form of
representation, hence reducing the chance of overfitting.
• Early stopping: This consists of consistently monitoring the model’s
performance on validation data during the training process and stopping
the training whenever the validation error does not improve anymore.
• Noise injection: This process consists of adding noise to the inputs or the outputs of hidden layers during training to make the model more robust and prevent weak generalization.
• L1 and L2 regularization: Both L1 and L2 add a penalty to the loss function based on the size of the weights. More specifically, L1 encourages the weights to be sparse, leading to better feature selection. On the other hand, L2 (also called weight decay) encourages the weights to be small, preventing them from having too much influence on the predictions.
• Data augmentation: This is the process of artificially increasing the size
and diversity of the training dataset by applying random transformations
like rotation, scaling, flipping, or cropping to the input images.
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
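A hedged Keras sketch (assuming TensorFlow is installed) showing three of these techniques together: an L2 weight penalty, dropout, and early stopping. Layer sizes and hyperparameters are illustrative, and the fit call is left commented since no dataset is defined:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on weights
    layers.Dropout(0.5),            # randomly drop neurons during training
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Stop training when the validation error no longer improves
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])
```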
• Image classification: Convolutional
neural networks are used for image
categorization, where images are
assigned to predefined categories.
One use of such a scenario is
automatic photo organization in
social media platforms.
• Object detection: CNNs are able to
identify and locate multiple objects
within an image. This capability is crucial in scenarios such as shelf scanning in retail to identify out-of-stock items.
• Facial recognition: this is also one of the main application areas of CNNs. For instance, this technology can be embedded into security systems for efficient access control based on facial features.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Recurrent Neural Network
• A recurrent neural network or RNN is a deep neural network trained
on sequential or time series data to create a machine learning (ML)
model that can make sequential predictions or conclusions based
on sequential inputs.
• An RNN might be used to predict daily flood levels based on past
daily flood, tide and meteorological data. But RNNs can also be used
to solve ordinal or temporal problems such as language
translation, natural language processing (NLP), sentiment
analysis, speech recognition and image captioning.
• https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

https://www.ibm.com/think/topics/recurrent-neural-networks
• Like traditional neural networks, such as feedforward neural
networks and convolutional neural networks (CNNs), recurrent
neural networks use training data to learn. They are
distinguished by their “memory” as they take information from
prior inputs to influence the current input and output.
• While traditional deep learning networks assume that inputs
and outputs are independent of each other, the output of recurrent neural networks depends on the prior elements within the sequence.

https://www.ibm.com/think/topics/recurrent-neural-networks
• Let’s take an idiom, such as “feeling under the weather,” which is
commonly used when someone is ill to aid us in the explanation of
RNNs. For the idiom to make sense, it needs to be expressed in that
specific order. As a result, recurrent networks need to account for
the position of each word in the idiom, and they use that information
to predict the next word in the sequence.
• Each word in the phrase "feeling under the weather" is part of a
sequence, where the order matters. The RNN tracks the context by
maintaining a hidden state at each time step. A feedback loop is created by passing the hidden state from one time step to the next.
The hidden state acts as a memory that stores information about
previous inputs. At each time step, the RNN processes the current
input (for example, a word in a sentence) along with the hidden
state from the previous time step. This allows the RNN to
"remember" previous data points and use that information to
influence the current output.

https://www.ibm.com/think/topics/recurrent-neural-networks
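A minimal numpy sketch (not from the source) of this feedback loop: at each time step the hidden state is updated from the current input and the previous hidden state, h_t = tanh(Wxh·x_t + Whh·h_{t−1} + b). All sizes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
Wxh = rng.normal(size=(4, 3))  # input-to-hidden weights (shared across steps)
Whh = rng.normal(size=(4, 4))  # hidden-to-hidden weights (the feedback loop)
b = np.zeros(4)

h = np.zeros(4)                    # initial hidden state (the "memory")
sequence = rng.normal(size=(5, 3)) # made-up sequence of 5 inputs
for x_t in sequence:
    # current input and previous hidden state jointly determine the new state
    h = np.tanh(Wxh @ x_t + Whh @ h + b)
print(h)  # the final hidden state summarizes the whole sequence
```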
• Another distinguishing characteristic of recurrent networks is
that they share parameters across each layer of the network.
• While feedforward networks have different weights across
each node, recurrent neural networks share the same weight
parameter within each layer of the network.
• That said, these weights are still adjusted through the processes of backpropagation and gradient descent to facilitate learning.

https://www.ibm.com/think/topics/recurrent-neural-networks
• Recurrent neural networks use forward propagation and
backpropagation through time (BPTT) algorithms to determine
the gradients (or derivatives), which is slightly different from
traditional backpropagation as it is specific to sequence data.
• The principles of BPTT are the same as
traditional backpropagation, where the model trains itself by
calculating errors from its output layer to its input layer.
• These calculations allow us to adjust and fit the parameters of
the model appropriately.
• BPTT differs from the traditional approach in that BPTT sums
errors at each time step whereas feedforward networks do not
need to sum errors as they do not share parameters across
each layer.

https://www.ibm.com/think/topics/recurrent-neural-networks
Types of RNNs
• Feedforward networks map inputs
and outputs one-to-one, and while
we’ve visualized recurrent neural
networks in this way in the diagrams
before this, they do not have this
constraint. Instead, their inputs and
outputs can vary in length, and
different types of RNNs are used for
different use cases, such as music
generation, sentiment classification
and machine translation.
• Popular recurrent neural network
architecture variants include:
• Standard RNNs
• Bidirectional recurrent neural networks (BRNNs)
• Long short-term memory (LSTM)
• Gated recurrent units (GRUs)
• Encoder-decoder RNN

https://www.ibm.com/think/topics/recurrent-neural-networks
Standard RNNs
• The most basic version of an RNN, where the output at each time step
depends on both the current input and the hidden state from the previous
time step, suffers from problems such as vanishing gradients, making it difficult to learn long-term dependencies.
• They excel in simple tasks with short-term dependencies, such as
predicting the next word in a sentence (for short, simple sentences) or the
next value in a simple time series.
• RNNs are good for tasks that process data sequentially in real time, such as
processing sensor data to detect anomalies in short time frames, where
inputs are received one at a time and predictions need to be made
immediately based on the most recent inputs.
Bidirectional recurrent neural networks
(BRNNs)
• While unidirectional RNNs can only draw on previous inputs to make predictions about the current state, bidirectional RNNs (BRNNs) also pull in future data to improve accuracy.
• Returning to the example of “feeling under the weather”, a
model based on a BRNN can better predict that the second
word in that phrase is “under” if it knows that the last word in
the sequence is “weather.”
Long short-term memory (LSTM)
• LSTM is a popular RNN architecture, which was introduced by Sepp
Hochreiter and Juergen Schmidhuber as a solution to the vanishing
gradient problem. This work addressed the problem of long-term
dependencies. That is, if the previous state that is influencing the current
prediction is not in the recent past, the RNN model might not be able to
accurately predict the current state.
• As an example, let’s say we wanted to predict the italicized words in,
“Alice is allergic to nuts. She can’t eat peanut butter.” The context of a nut
allergy can help us anticipate that the food that cannot be eaten contains
nuts. However, if that context was a few sentences prior, then it would
make it difficult or even impossible for the RNN to connect the
information.
• To remedy this, LSTM networks have “cells” in the hidden layers of the
artificial neural network, which have 3 gates: an input gate, an output gate
and a forget gate. These gates control the flow of information that is
needed to predict the output in the network. For example, if a gender pronoun, such as "she", was repeated multiple times in prior sentences, the network might exclude that from the cell state.
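A hedged numpy sketch of one LSTM cell step with the three gates described above plus the candidate state; the parameter shapes, dictionary layout, and values are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; W, U, b hold parameters for the four
    components: forget (f), input (i), output (o) gates and candidate (g)."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: what to discard
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: what to store
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: what to emit
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate cell state
    c = f * c_prev + i * g   # cell state mixes old memory with new input
    h = o * np.tanh(c)       # hidden state is a gated view of the cell state
    return h, c

# Illustrative driver with made-up sizes (hidden size 4, input size 3)
rng = np.random.default_rng(3)
n, d = 4, 3
W = {k: rng.normal(size=(n, d)) for k in "fiog"}
U = {k: rng.normal(size=(n, n)) for k in "fiog"}
b = {k: np.zeros(n) for k in "fiog"}
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
print(h, c)
```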
Gated recurrent units (GRUs)
• A GRU is similar to an LSTM as it also works to address the short-term
memory problem of RNN models. Instead of using a “cell state” to
regulate information, it uses hidden states, and instead of 3 gates, it
has 2: a reset gate and an update gate. Similar to the gates within
LSTMs, the reset and update gates control how much and which
information to retain.
• Because of its simpler architecture, GRUs are computationally more
efficient and require fewer parameters compared to LSTMs. This
makes them faster to train and often more suitable for certain real-
time or resource-constrained applications.
Encoder-decoder RNNs
• These are commonly used for sequence-to-sequence tasks, such as
machine translation. The encoder processes the input sequence into
a fixed-length vector (context), and the decoder uses that context to
generate the output sequence. However, the fixed-length context
vector can be a bottleneck, especially for long input sequences.
Fitting a neural network
• Fitting a neural network involves training it to map inputs to outputs by
adjusting its parameters (weights and biases) using a training dataset and
optimization algorithms like gradient descent.
• Process:
• 1. Data Preparation:
• Define the Problem:
• Clearly identify the input and output variables, and the type of problem
(regression, classification, etc.).
• Collect and Prepare Data:
• Gather a sufficient amount of relevant data, preprocess it (e.g., scaling,
normalization), and split it into training, validation, and testing sets.
• 2. Neural Network Architecture:
• Choose the Network Type:
• Select the appropriate type of neural network (e.g., feedforward,
convolutional, recurrent) based on the problem.
• Define Layers and Neurons:
• Determine the number of layers, the number of neurons in each layer, and
the activation functions for each layer.
• 3. Training:
• Initialization: Randomly initialize the weights and biases of the network.
• Forward Propagation: Feed the input data through the network to
generate an output.
• Loss Function: Define a loss function that quantifies the difference between the
predicted and actual outputs.
• Backpropagation: Calculate the gradient of the loss function with respect to the
weights and biases using backpropagation.
• Optimization: Update the weights and biases using an optimization algorithm
(e.g., stochastic gradient descent) to minimize the loss function.
• Iteration: Repeat the forward and backward propagation steps until the network
converges to a satisfactory solution.
• 4. Evaluation and Tuning:
• Validation:
• Use the validation set to monitor the network's performance during training and
prevent overfitting.
• Testing:
• Evaluate the trained network's performance on the test set to assess its
generalization ability.
• Tuning:
• Adjust hyperparameters (e.g., learning rate, number of layers, number of
neurons) to improve the network's performance.
• Tools and Libraries:
• MathWorks (MATLAB): Provides tools for creating, visualizing, and training
neural networks.
• OriginLab (OriginPro): Offers a Neural Network Fitting App for fitting data
with neural networks.
• Python Libraries: TensorFlow, PyTorch, Scikit-learn, Keras.
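A minimal end-to-end fitting sketch in Keras (assuming TensorFlow is installed) following the steps above: prepare and split data, define the architecture, compile with a loss and optimizer, fit, and evaluate. The regression data, layer sizes, and hyperparameters are all made-up illustrations:

```python
import numpy as np
import tensorflow as tf

# 1. Data preparation: made-up regression data, split into train and test sets
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# 2. Architecture: layers, neurons, and activation functions
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear output for regression
])

# 3. Training: loss function + optimizer, then iterative forward/backward passes
model.compile(optimizer="sgd", loss="mse")
model.fit(X_train, y_train, validation_split=0.2, epochs=20, verbose=0)

# 4. Evaluation: assess generalization on the held-out test set
print(model.evaluate(X_test, y_test, verbose=0))
```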
