
Unit 7: Introduction to Neural Networks
Prof. Bhavna Bose
What is a neural network?
• A neural network is a machine learning program, or model, that
makes decisions in a manner similar to the human brain, by using
processes that mimic the way biological neurons work together to
identify phenomena, weigh options and arrive at conclusions.
• Every neural network consists of layers of nodes, or artificial
neurons—an input layer, one or more hidden layers, and an output
layer. Each node connects to others, and has its own associated
weight and threshold.
• If the output of any individual node is above the specified threshold
value, that node is activated, sending data to the next layer of the
network.
• Otherwise, no data is passed along to the next layer of the network.
https://www.ibm.com/think/topics/neural-networks
Understanding Neural Networks
These networks are built from several key components:
1. Neurons: The basic units that receive inputs; each neuron is governed by a threshold and an activation function.
2. Connections: Links between neurons that carry information, regulated by weights and biases.
3. Weights and Biases: These parameters determine the strength and influence of connections.
4. Propagation Functions: Mechanisms that help process and transfer data across layers of neurons.
5. Learning Rule: The method that adjusts weights and biases over time to improve accuracy.

https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/
Learning in neural networks follows a structured, three-stage process:
1. Input Computation: Data is fed into the network.
2. Output Generation: Based on the current parameters, the network generates an output.
3. Iterative Refinement: The network refines its output by adjusting weights and biases, gradually improving its performance on diverse tasks.

https://www.geeksforgeeks.org/neural-networks-a-beginners-guide/
Layers in Neural Network Architecture
1. Input Layer: This is where the network receives its input data. Each input neuron in the layer corresponds to a feature in the input data.
2. Hidden Layers: These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.
3. Output Layer: The final layer produces the output of the model. The format of these outputs varies depending on the specific task (e.g., classification, regression).

https://www.ibm.com/think/topics/neural-networks
Think of each individual node as its own linear regression model,
composed of input data, weights, a bias (or threshold), and an
output. The formula would look something like this:
∑wixi + bias = w1x1 + w2x2 + w3x3 + bias
• output = f(x) = 1 if ∑wixi + b >= 0; 0 if ∑wixi + b < 0

https://www.ibm.com/think/topics/neural-networks
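As a rough illustration of this formula (not part of the original source), the computation of a single node can be sketched in Python. The input values, weights, and bias below are made-up numbers:

```python
import numpy as np

# Hypothetical inputs, weights, and bias for a single node
x = np.array([0.5, 0.2, 0.8])   # input features x1..x3
w = np.array([0.4, -0.6, 0.9])  # weights w1..w3
b = -0.3                        # bias (acts as a negative threshold)

z = np.dot(w, x) + b            # weighted sum: w1*x1 + w2*x2 + w3*x3 + bias
output = 1 if z >= 0 else 0     # step activation: the node "fires" if z >= 0
print(z, output)
```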
• Once an input layer is determined, weights are assigned.
• These weights help determine the importance of any given variable,
with larger ones contributing more significantly to the output
compared to other inputs.
• All inputs are then multiplied by their respective weights and summed.
• Afterward, the output is passed through an activation function,
which determines the output.
• If that output exceeds a given threshold, it “fires” (or activates) the
node, passing data to the next layer in the network.
• This results in the output of one node becoming the input of the next node.
• This process of passing data from one layer to the next layer defines
this neural network as a feedforward network.

https://www.ibm.com/think/topics/neural-networks
• Biological neurons are pivotal in artificial neural network research,
mirroring the intricate structures responsible for brain functions.
• Soma, axons, dendrites, and synapses are part of neurons that help
process information. The McCulloch-Pitts Neuron is an early
computational model that simulates the basic operations of these
biological units.

https://www.analyticsvidhya.com/blog/2024/07/mcculloch-pitts-neuron/#h-what-are-biological-neurons
What are Biological Neurons?
• Biological neurons are the fundamental units of the brain. They
consist of:
• Dendrite: Receives signals from other neurons.
• Soma: Processes the information.
• Axon: Transmits the output to other neurons.
• Synapse: Connection points to other neurons.
• A neuron functions like a tiny biological computer, taking input
signals, processing them, and passing on the output.
What is McCulloch-Pitts Neuron?
The McCulloch-Pitts Neuron is the first computational model of a
neuron. It can be divided into two parts:
1.Aggregation: The neuron aggregates multiple boolean inputs (0 or 1).
2.Threshold Decision: Based on the aggregated value, the neuron
makes a decision using a threshold function.
Example Scenario
Imagine wanting to predict whether to watch a football game. The
inputs (boolean values) could be:
• X1: Is Premier League on? (1 if yes, 0 if no)
• X2: Is it a friendly game? (1 if yes, 0 if no)
• X3: Are you not at home? (1 if yes, 0 if no)
• X4: Is Manchester United playing? (1 if yes, 0 if no)
• Each input can be excitatory or inhibitory. For instance, X3 is inhibitory: if you are not at home, you cannot watch the game.
• Thresholding Logic
• The neuron fires (outputs 1) if the aggregated sum of inputs meets or
exceeds a threshold value (θ). For example, if you always watch the
game when at least two conditions are met, θ would be 2.
Boolean Functions Using the McCulloch-Pitts
Neuron
The McCulloch-Pitts Neuron can represent various boolean functions:
• AND Function: Fires when all inputs are ON (( x1 + x2 + x3 >= 3 )).
• OR Function: Fires when any input is ON (( x1 + x2 + x3 >= 1 )).
• Inhibitory Input Function: Fires only when specific conditions are met
(e.g., ( x1 ) AND NOT ( x2 )).
• NOR Function: Fires when all inputs are OFF.
• NOT Function: Inverts the input.
• The McCulloch-Pitts model of a neural network is a simple binary (on-
off) model designed to simulate basic logical operations. Below, I’ll
walk you through a step-by-step process to calculate a Boolean
function using this model. Let's consider the Boolean function F = A∧B (AND gate) as an example.
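A minimal sketch of the McCulloch-Pitts model in Python (an illustration, not from the slides), assuming boolean inputs and an integer threshold θ; the thresholds shown follow the AND and OR examples above:

```python
def mcculloch_pitts(inputs, theta):
    """Fire (return 1) if the sum of boolean inputs meets the threshold theta."""
    return 1 if sum(inputs) >= theta else 0

# AND over two inputs: fires only when all inputs are ON (theta = number of inputs)
print(mcculloch_pitts([1, 1], theta=2))  # 1  (A AND B)
print(mcculloch_pitts([1, 0], theta=2))  # 0

# OR over two inputs: fires when any input is ON (theta = 1)
print(mcculloch_pitts([0, 1], theta=1))  # 1  (A OR B)
```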
Geometric Interpretation
The McCulloch-Pitts Neuron can be visualized geometrically by plotting
inputs in a multi-dimensional space and drawing a decision boundary:
• OR Function: In 2D, the decision boundary is a line (( x1 + x2 = 1 )).
• AND Function: The decision boundary is a line (( x1 + x2 = 2 )).
• Generalization: The decision boundary becomes a plane in higher
dimensions for more inputs.
Single layer perceptron learning

w1 = [1, −1, 0, 0.5]ᵀ
• https://youtu.be/KKSCmPUyczU?si=RK6nGan4-bXP1Rw3
• https://www.youtube.com/watch?app=desktop&v=ItkSCYzSD34
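A hedged sketch of the single-layer perceptron learning rule (w ← w + η(t − y)x). The slides and linked videos use their own worked example; here the AND-gate training data, zero initialization, and learning rate are made-up illustrations:

```python
import numpy as np

# Training data for an AND gate; the first column is a bias input fixed at 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
t = np.array([0, 0, 0, 1])   # target outputs
w = np.zeros(3)              # weights [bias, w1, w2]
eta = 0.1                    # learning rate

for epoch in range(10):
    for xi, ti in zip(X, t):
        y = 1 if np.dot(w, xi) >= 0 else 0  # step activation
        w += eta * (ti - y) * xi            # perceptron update rule
print(w)  # learned weights separate the AND-gate classes
```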

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• A neuron, or artificial neuron, is a more generalized version of the
perceptron and is the building block of modern deep learning
architectures.
• Neurons in deep learning are part of multi-layer neural networks,
which can have multiple hidden layers.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
Key Differences and Features of a Neuron:
• Activation Function: Unlike the perceptron, which uses a simple step function for activation, neurons in modern neural networks can use a variety of activation functions, such as sigmoid, tanh, ReLU, and softmax.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Multi-layer Networks:
• Neurons are part of more sophisticated architectures called multi-
layer perceptrons (MLPs) or deep neural networks, where neurons
are organized into layers (input layer, hidden layers, and output layer).
• Each layer performs computations, and the output of one layer is fed
as input to the next.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Continuous Output: Neurons can output continuous values, unlike
the binary output of a perceptron. This makes them more versatile for
tasks like regression, multi-class classification, and complex feature
extraction.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
• Learning through Backpropagation:
• Neurons in deep learning models are trained using backpropagation
and gradient descent, which adjusts the weights based on the error
between the predicted and actual outputs.
• The perceptron uses a simpler update rule that works only for linearly
separable problems.

https://medium.com/@abhishekjainindore24/perceptron-vs-neuron-single-layer-perceptron-and-multi-layer-perceptron-68ce4e8db5ea
Multi-layer neural network
• A multilayer neural network is an advanced model used in artificial
intelligence and machine learning.
• Unlike a perceptron, which has only one layer of neurons, a multilayer
neural network has multiple layers stacked on top of each other.
• Each layer receives input from the previous layer and applies a
mathematical operation called an activation function, such as the sigmoid
function.
• This allows the network to capture complex relationships between inputs
and outputs.
• To make the network learn, we use back-propagation, a technique that
adjusts the weights connecting the neurons based on the error between
the predicted and actual output.
• This adjustment is controlled by gradient descent or other algorithms.
• Through this iterative process, the network improves its ability to make
accurate predictions, ultimately enabling it to solve complex problems.
https://muneebsa.medium.com/deep-learning-101-lesson-9-multi-layer-neural-network-7cd3a53066c8
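A minimal sketch (not from the source) of a forward pass through one hidden layer with the sigmoid activation mentioned above; the layer sizes, random weights, and input sample are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs -> 4 hidden neurons -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input-to-hidden parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden-to-output parameters

x = np.array([0.2, 0.7, 0.1])  # one made-up input sample
h = sigmoid(W1 @ x + b1)       # hidden layer: weighted sum + activation
y = sigmoid(W2 @ h + b2)       # each layer's output feeds the next layer
print(y)
```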
Multilayer Perceptrons
• A multilayer perceptron is a type of feedforward neural network consisting of fully connected neurons with nonlinear activation functions. It is widely used to distinguish data that is not linearly separable.
• MLPs have been widely used in various fields, including image
recognition, natural language processing, and speech recognition,
among others. Their flexibility in architecture and ability to
approximate any function under certain conditions make them a
fundamental building block in deep learning and neural network
research. Let's take a deeper dive into some of its key concepts.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Input layer
• The input layer consists of nodes or neurons that receive the initial
input data. Each neuron represents a feature or dimension of the
input data. The number of neurons in the input layer is determined by
the dimensionality of the input data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Hidden layer
• Between the input and output layers, there can be one or more layers
of neurons. Each neuron in a hidden layer receives inputs from all
neurons in the previous layer (either the input layer or another
hidden layer) and produces an output that is passed to the next layer.
The number of hidden layers and the number of neurons in each
hidden layer are hyperparameters that need to be determined during
the model design phase.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Output layer
• This layer consists of neurons that produce the final output of the
network. The number of neurons in the output layer depends on the
nature of the task. In binary classification, there may be either one or two neurons, depending on the activation function, representing the probability of belonging to one class; in multi-class classification tasks, there can be multiple neurons in the output layer, typically one per class.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Weights
• Neurons in adjacent layers are fully connected to each other. Each
connection has an associated weight, which determines the strength
of the connection. These weights are learned during the training
process.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Bias neurons
• In addition to the input and hidden neurons, each layer (except the
input layer) usually includes a bias neuron that provides a constant
input to the neurons in the next layer. Bias neurons have their own
weight associated with each connection, which is also learned during
training.
• The bias neuron effectively shifts the activation function of the
neurons in the subsequent layer, allowing the network to learn an
offset or bias in the decision boundary. By adjusting the weights
connected to the bias neuron, the MLP can learn to control the
threshold for activation and better fit the training data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Activation function
• Typically, each neuron in the hidden layers and the output layer
applies an activation function to its weighted sum of inputs. Common
activation functions include sigmoid, tanh, ReLU (Rectified Linear
Unit), and softmax. These functions introduce nonlinearity into the
network, allowing it to learn complex patterns in the data.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
• Training with backpropagation
• MLPs are trained using the backpropagation algorithm, which
computes gradients of a loss function with respect to the model's
parameters and updates the parameters iteratively to minimize the
loss.

https://www.datacamp.com/tutorial/multilayer-perceptrons-in-machine-learning
Working of an MLP
• Input layer
• The input layer of an MLP receives input data, which could be
features extracted from the input samples in a dataset. Each neuron
in the input layer represents one feature.
• Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
• Hidden layers
• The hidden layers of an MLP consist of interconnected neurons that
perform computations on the input data.
• Each neuron in a hidden layer receives input from all neurons in the
previous layer. The inputs are multiplied by corresponding weights,
denoted as w. The weights determine how much influence the input
from one neuron has on the output of another.
• In addition to weights, each neuron in the hidden layer has an
associated bias, denoted as b. The bias provides an additional input to
the neuron, allowing it to adjust its output threshold. Like weights,
biases are learned during training.
• For each neuron in a hidden layer or the output layer, the weighted sum of its inputs is computed. This involves multiplying each input by its corresponding weight, summing up these products, and adding the bias: z = w1x1 + w2x2 + … + wnxn + b.
• The weighted sum is then passed through an activation function,
denoted as f. The activation function introduces nonlinearity into the
network, allowing it to learn and represent complex relationships in
the data. The activation function determines the output range of the
neuron and its behavior in response to different input values. The
choice of activation function depends on the nature of the task and
the desired properties of the network.
• Output layer
• The output layer of an MLP produces the final predictions or outputs
of the network. The number of neurons in the output layer depends
on the task being performed (e.g., binary classification, multi-class
classification, regression).
• Each neuron in the output layer receives input from the neurons in
the last hidden layer and applies an activation function. This
activation function is usually different from those used in the hidden
layers and produces the final output value or prediction.
• During the training process, the network learns to adjust the weights
associated with each neuron's inputs to minimize the discrepancy
between the predicted outputs and the true target values in the
training data. By adjusting the weights and learning the appropriate
activation functions, the network learns to approximate complex
patterns and relationships in the data, enabling it to make accurate
predictions on new, unseen samples.
Sivanandam, S. N., and S. N. Deepa. Principles of soft computing (with CD). John Wiley & Sons, 2007.
Backpropagation
• The backpropagation learning algorithm is one of the most important developments in neural networks.
• This learning algorithm is applied to a multilayer feed-forward network consisting of processing elements with continuous differentiable activation functions.
• Neural networks associated with the back-propagation learning algorithm are called back-propagation networks (BPNs).
• For a given set of training input-output pairs, this algorithm provides a procedure for changing the weights in a BPN to classify the given input patterns correctly.
• The basic concept for this weight update algorithm is simply the gradient descent method as used in the case of simple perceptron networks with differentiable units.
• This is a method where the error is propagated back to the hidden units.
• The aim is to train the network to achieve a balance between the network's ability to respond to the training inputs (memorization) and its ability to give reasonable responses to input that is similar, but not identical, to that used in training (generalization).
Sivanandam, S. N., and S. N. Deepa. Principles of soft computing (with CD). John Wiley & Sons, 2007.
• The training of the BPN is done in three stages:
• the feed-forward of the input training pattern,
• the calculation and back-propagation of the error,
• and the updating of weights.
• The testing of the BPN involves computation of the feed-forward phase only.
• There can be more than one hidden layer (which can be beneficial), but one hidden layer is sufficient.
• Even though training is very slow, once the network is trained it can produce its outputs very rapidly.
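A compact numpy sketch of these three stages on a tiny two-layer network (an illustration, not the textbook's worked example); the XOR data, layer sizes, and learning rate are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # illustrative inputs (XOR)
T = np.array([[0], [1], [1], [0]], dtype=float)              # illustrative targets

W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)  # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)  # hidden -> output
eta = 0.5                                      # learning rate

for _ in range(5000):
    # Stage 1: feed-forward of the input training pattern
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Stage 2: calculation and back-propagation of the error
    dY = (Y - T) * Y * (1 - Y)      # output-layer delta (sigmoid derivative)
    dH = (dY @ W2.T) * H * (1 - H)  # error propagated back to the hidden units
    # Stage 3: updating of weights (gradient descent)
    W2 -= eta * H.T @ dY; b2 -= eta * dY.sum(axis=0)
    W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)

print(Y.round(2))  # outputs should approach the XOR targets
```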
• https://www.youtube.com/watch?v=tUoUdOdTkRw
• https://www.youtube.com/watch?v=ItkSCYzSD34
• https://www.youtube.com/watch?v=tTjcakAuHPI
What is a Convolutional Neural Network?
• A Convolutional Neural Network (CNN) is a type of artificial neural network especially good at processing images and videos. CNNs draw inspiration from the structure of the human visual cortex.
• You can use CNNs in many applications, including image recognition,
facial recognition, and medical imaging analysis. They are able to
automatically extract features from images, which makes them very
powerful tools.

• https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks
• There are several reasons why CNNs are important in the modern world, as highlighted
below:
• CNNs are distinguished from classic machine learning algorithms such
as SVMs and decision trees by their ability to autonomously extract features at a large
scale, bypassing the need for manual feature engineering and thereby enhancing
efficiency.
• The convolutional layers grant CNNs their translation-invariant characteristics,
empowering them to identify and extract patterns and features from data irrespective of
variations in position, orientation, scale, or translation.
• A variety of pre-trained CNN architectures, including VGG-16, ResNet50, Inceptionv3,
and EfficientNet, have demonstrated top-tier performance. These models can be
adapted to new tasks with relatively little data through a process known as fine-tuning.
• Beyond image classification tasks, CNNs are versatile and can be applied to a range of
other domains, such as natural language processing, time series analysis, and speech
recognition.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Key Components of a CNN

• The convolutional neural network is made of four main parts, which help CNNs mimic how the human brain operates to recognize patterns and features in images:

• Convolutional layers
• Rectified Linear Unit (ReLU for short)
• Pooling layers
• Fully connected layers
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Convolution layers
• This is the first building block of a CNN. As the name suggests, the
main mathematical task performed is called convolution, which is the
application of a sliding window function to a matrix of pixels
representing an image. The sliding function applied to the matrix is called a kernel or filter; the two terms are used interchangeably.
• In the convolution layer, several filters of equal size are applied, and
each filter is used to recognize a specific pattern from the image, such
as the curving of the digits, the edges, the whole shape of the digits,
and more.
• It requires a few components, which are input data, a filter and
a feature map.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• Put simply, in the convolution layer, we use small grids (called filters
or kernels) that move over the image. Each small grid is like a mini
magnifying glass that looks for specific patterns in the photo, like
lines, curves, or shapes. As it moves across the photo, it creates a new
grid that highlights where it found these patterns.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• The kernel used for the convolution is a matrix with a dimension of 3x3. The weights of each element of the kernel are represented in the grid.
• In real life, the weights of the kernels are determined during the training process of the neural network.
• Using these two matrices, we can perform the convolution operation by applying the dot product, which works as follows:
1.Apply the kernel matrix from the top-left corner to the right.
2.Perform element-wise multiplication.
3.Sum the values of the products.
4.The resulting value corresponds to the first value (top-left corner) in the convoluted
matrix.
5.Move the kernel down with respect to the size of the sliding window.
6.Repeat steps 1 to 5 until the image matrix is fully covered.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
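A minimal numpy sketch of these steps (stride 1, no padding), not from the source; the 5x5 "image" and 3x3 kernel values are made up:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image and, at each position, take the
    element-wise product with the window and sum the results."""
    h, w = kernel.shape
    out_h = image.shape[0] - h + 1
    out_w = image.shape[1] - w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+h, j:j+w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # illustrative 3x3 kernel
print(conv2d(image, kernel))                      # 3x3 feature map
```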
• Another name associated with the kernel in the literature is feature
detector because the weights can be fine-tuned to detect specific
features in the input image.
• For instance:
• A kernel that averages neighboring pixels can be used to blur the input image.
• A kernel that subtracts neighboring pixels is used to perform edge detection.
• The more convolution layers the network has, the better it is at detecting more abstract features.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
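Two classic kernels that illustrate these effects, usable with the conv2d sketch above; the exact weights are illustrative, not from the source:

```python
import numpy as np

# Averaging kernel: each output pixel is the mean of a 3x3 neighborhood,
# which blurs the image.
blur_kernel = np.ones((3, 3)) / 9.0

# Laplacian-style kernel: subtracts neighboring pixels from the center,
# which responds strongly at edges.
edge_kernel = np.array([[ 0.0, -1.0,  0.0],
                        [-1.0,  4.0, -1.0],
                        [ 0.0, -1.0,  0.0]])
```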
• Note that the weights in the
feature detector remain fixed as
it moves across the image,
which is also known as
parameter sharing.
• Some parameters, such as the weight values, are adjusted during training through the process of backpropagation and gradient descent. However, there are three hyperparameters affecting the volume size of the output that need to be set before the training of the neural network begins.

https://www.ibm.com/think/topics/convolutional-neural-networks
• Another convolution layer can follow the
initial convolution layer.
• When this happens, the structure of the
CNN can become hierarchical as the later
layers can see the pixels within the
receptive fields of prior layers.
• As an example, let’s assume that we’re
trying to determine if an image contains a
bicycle.
• You can think of the bicycle as a sum of
parts.
• It is comprised of a frame, handlebars,
wheels, pedals, and so on.
• Each individual part of the bicycle makes up
a lower-level pattern in the neural net, and
the combination of its parts represents a
higher-level pattern, creating a feature
hierarchy within the CNN.
• Ultimately, the convolutional layer converts
the image into numerical values, allowing
the neural network to interpret and extract
relevant patterns.

https://www.ibm.com/think/topics/convolutional-neural-networks
• These include:
• 1. The number of filters affects the depth of the output. For example, three distinct filters
would yield three different feature maps, creating a depth of three.
• 2. Stride is the distance, or number of pixels, that the kernel moves over the input matrix. While stride values of two or greater are rare, a larger stride yields a smaller output.
• 3. Zero-padding is usually used when the filters do not fit the input image. This sets all
elements that fall outside of the input matrix to zero, producing a larger or equally sized
output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last convolution is dropped if dimensions do
not align.
• Same padding: This padding ensures that the output layer has the same size as the input layer.
• Full padding: This type of padding increases the size of the output by adding zeros to the border of the input.
• After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU)
transformation to the feature map, introducing nonlinearity to the model.

https://www.ibm.com/think/topics/convolutional-neural-networks
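These three hyperparameters determine the output size through the standard relation output = (W − F + 2P) / S + 1 (input width W, filter size F, padding P, stride S). A small sketch with assumed sizes, added here for illustration, makes the effect concrete:

```python
def conv_output_size(w, f, stride, padding):
    """Standard relation: output width = (W - F + 2P) / S + 1."""
    return (w - f + 2 * padding) // stride + 1

print(conv_output_size(32, 3, stride=1, padding=0))  # valid padding -> 30
print(conv_output_size(32, 3, stride=1, padding=1))  # same padding  -> 32
print(conv_output_size(32, 3, stride=2, padding=1))  # larger stride -> 16
```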
Activation function
• A ReLU activation function is applied after each convolution
operation.
• This function helps the network learn non-linear relationships
between the features in the image, hence making the network more
robust for identifying different patterns.
• It also helps to mitigate the vanishing gradient problems.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Pooling layer
• The goal of the pooling layer is to pull the most significant features from the
convoluted matrix.
• This is done by applying some aggregation operations, which reduce the
dimension of the feature map (convoluted matrix), hence reducing the memory
used while training the network.
• Pooling is also relevant for mitigating overfitting.
• The most common aggregation functions that can be applied are:
• Max pooling, which is the maximum value of the feature map
• Sum pooling corresponds to the sum of all the values of the feature map
• Average pooling is the average of all the values.
• Also, the dimension of the feature map becomes smaller as the pooling function
is applied.
• The last pooling layer flattens its feature map so that it can be processed by the
fully connected layer.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
• Pooling layers, also known as downsampling layers, conduct dimensionality reduction, reducing the number of parameters in the input. Similar to the
convolutional layer, the pooling operation sweeps a filter across the entire
input, but the difference is that this filter does not have any weights.
Instead, the kernel applies an aggregation function to the values within
the receptive field, populating the output array. There are two main types
of pooling:
• Max pooling: As the filter moves across the input, it selects the pixel with the
maximum value to send to the output array. As an aside, this approach tends to
be used more often compared to average pooling.
• Average pooling: As the filter moves across the input, it calculates the average
value within the receptive field to send to the output array.
• While a lot of information is lost in the pooling layer, it also has a number
of benefits to the CNN. They help to reduce complexity, improve
efficiency, and limit risk of overfitting.

https://www.ibm.com/think/topics/convolutional-neural-networks
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
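A minimal numpy sketch (not from the source) of both pooling types over non-overlapping 2x2 windows; the feature-map values are made up:

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Aggregate non-overlapping size x size windows (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h*size, :w*size].reshape(h, size, w, size)
    if mode == "max":
        return windows.max(axis=(1, 3))  # max pooling keeps the largest value
    return windows.mean(axis=(1, 3))     # average pooling keeps the mean

fmap = np.array([[1., 3., 2., 0.],
                 [4., 8., 1., 1.],
                 [0., 2., 6., 5.],
                 [1., 1., 3., 7.]])
print(pool2d(fmap, mode="max"))      # [[8. 2.] [2. 7.]]
print(pool2d(fmap, mode="average"))  # mean of each 2x2 window
```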
Fully connected layers
• These layers form the last stage of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer.
• ReLU activation functions are applied to them for non-linearity.
• Finally, a softmax prediction layer is used to generate probability
values for each of the possible output labels, and the final label
predicted is the one with the highest probability score.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
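A minimal sketch of the softmax computation described here; the scores are made-up values, and subtracting the maximum is a standard numerical-stability trick not mentioned in the source:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three labels
probs = softmax(scores)
print(probs, probs.argmax())        # the highest-probability label is predicted
```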
Overfitting and Regularization in CNNs
• Overfitting is a common challenge in machine learning models and CNN deep learning projects. It happens when the model learns the training data too well ("learning by heart"), including its noise and outliers. Such learning leads to a model that performs well on the training data but badly on new, unseen data.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Several regularization techniques help mitigate overfitting:
• Dropout: This consists of randomly dropping some neurons during the training process, which forces the remaining neurons to learn new features from the input data.
• Batch normalization: Overfitting is reduced to some extent by normalizing the layer inputs, adjusting and scaling the activations. This approach is also used to speed up and stabilize the training process.
• Pooling Layers: This can be used to reduce the spatial dimensions of the
input image to provide the model with an abstracted form of
representation, hence reducing the chance of overfitting.
• Early stopping: This consists of consistently monitoring the model’s
performance on validation data during the training process and stopping
the training whenever the validation error does not improve anymore.
• Noise injection: This process consists of adding noise to the inputs or the outputs of hidden layers during training to make the model more robust and prevent weak generalization.
• L1 and L2 regularization: Both L1 and L2 add a penalty to the loss function based on the size of the weights. More specifically, L1 encourages the weights to be sparse, leading to better feature selection. On the other hand, L2 (also called weight decay) encourages the weights to be small, preventing them from having too much influence on the predictions.
• Data augmentation: This is the process of artificially increasing the size
and diversity of the training dataset by applying random transformations
like rotation, scaling, flipping, or cropping to the input images.
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
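A hedged Keras sketch (assuming TensorFlow is installed) showing three of these techniques together: an L2 weight penalty, dropout, and early stopping. Layer sizes and hyperparameters are illustrative, and the fit call is left commented since no dataset is defined:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on weights
    layers.Dropout(0.5),            # randomly drop neurons during training
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Stop training when the validation error no longer improves
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3)
# model.fit(x_train, y_train, validation_split=0.2, callbacks=[early_stop])
```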
• Image classification: Convolutional
neural networks are used for image
categorization, where images are
assigned to predefined categories.
One use of such a scenario is
automatic photo organization in
social media platforms.
• Object detection: CNNs are able to
identify and locate multiple objects
within an image. This capability is crucial in scenarios such as shelf scanning in retail to identify out-of-stock items.
• Facial recognition: this is also one of the main application areas of CNNs. For instance, this technology can be embedded into security systems for efficient access control based on facial features.

https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Recurrent Neural Network
• A recurrent neural network or RNN is a deep neural network trained
on sequential or time series data to create a machine learning (ML)
model that can make sequential predictions or conclusions based
on sequential inputs.
• An RNN might be used to predict daily flood levels based on past
daily flood, tide and meteorological data. But RNNs can also be used
to solve ordinal or temporal problems such as language
translation, natural language processing (NLP), sentiment
analysis, speech recognition and image captioning.
• https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

https://www.ibm.com/think/topics/recurrent-neural-networks
• Like traditional neural networks, such as feedforward neural
networks and convolutional neural networks (CNNs), recurrent
neural networks use training data to learn. They are
distinguished by their “memory” as they take information from
prior inputs to influence the current input and output.
• While traditional deep learning networks assume that inputs
and outputs are independent of each other, the output of recurrent neural networks depends on the prior elements within the sequence.

https://www.ibm.com/think/topics/recurrent-neural-networks
• Let’s take an idiom, such as “feeling under the weather,” which is
commonly used when someone is ill to aid us in the explanation of
RNNs. For the idiom to make sense, it needs to be expressed in that
specific order. As a result, recurrent networks need to account for
the position of each word in the idiom, and they use that information
to predict the next word in the sequence.
• Each word in the phrase "feeling under the weather" is part of a
sequence, where the order matters. The RNN tracks the context by
maintaining a hidden state at each time step. A feedback loop is created by passing the hidden state from one time step to the next.
The hidden state acts as a memory that stores information about
previous inputs. At each time step, the RNN processes the current
input (for example, a word in a sentence) along with the hidden
state from the previous time step. This allows the RNN to
"remember" previous data points and use that information to
influence the current output.

https://www.ibm.com/think/topics/recurrent-neural-networks
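A minimal numpy sketch (not from the source) of this feedback loop: at each time step the hidden state is updated from the current input and the previous hidden state, h_t = tanh(Wxh·x_t + Whh·h_{t−1} + b). All sizes and values are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
Wxh = rng.normal(size=(4, 3))  # input-to-hidden weights (shared across steps)
Whh = rng.normal(size=(4, 4))  # hidden-to-hidden weights (the feedback loop)
b = np.zeros(4)

h = np.zeros(4)                    # initial hidden state (the "memory")
sequence = rng.normal(size=(5, 3)) # made-up sequence of 5 inputs
for x_t in sequence:
    # current input and previous hidden state jointly determine the new state
    h = np.tanh(Wxh @ x_t + Whh @ h + b)
print(h)  # the final hidden state summarizes the whole sequence
```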
• Another distinguishing characteristic of recurrent networks is
that they share parameters across each layer of the network.
• While feedforward networks have different weights across
each node, recurrent neural networks share the same weight
parameter within each layer of the network.
• That said, these weights are still adjusted through the processes of backpropagation and gradient descent to facilitate learning.

https://www.ibm.com/think/topics/recurrent-neural-networks
• Recurrent neural networks use forward propagation and
backpropagation through time (BPTT) algorithms to determine
the gradients (or derivatives), which is slightly different from
traditional backpropagation as it is specific to sequence data.
• The principles of BPTT are the same as
traditional backpropagation, where the model trains itself by
calculating errors from its output layer to its input layer.
• These calculations allow us to adjust and fit the parameters of
the model appropriately.
• BPTT differs from the traditional approach in that BPTT sums
errors at each time step whereas feedforward networks do not
need to sum errors as they do not share parameters across
each layer.

https://www.ibm.com/think/topics/recurrent-neural-networks
Types of RNNs
• Feedforward networks map inputs
and outputs one-to-one, and while
we’ve visualized recurrent neural
networks in this way in the diagrams
before this, they do not have this
constraint. Instead, their inputs and
outputs can vary in length, and
different types of RNNs are used for
different use cases, such as music
generation, sentiment classification
and machine translation.
• Popular recurrent neural network
architecture variants include:
• Standard RNNs
• Bidirectional recurrent neural networks (BRNNs)
• Long short-term memory (LSTM)
• Gated recurrent units (GRUs)
• Encoder-decoder RNN

https://www.ibm.com/think/topics/recurrent-neural-networks
Standard RNNs
• The most basic version of an RNN, where the output at each time step
depends on both the current input and the hidden state from the previous
time step, suffers from problems such as vanishing gradients, making it difficult to learn long-term dependencies.
• They excel in simple tasks with short-term dependencies, such as
predicting the next word in a sentence (for short, simple sentences) or the
next value in a simple time series.
• RNNs are good for tasks that process data sequentially in real time, such as
processing sensor data to detect anomalies in short time frames, where
inputs are received one at a time and predictions need to be made
immediately based on the most recent inputs.
Bidirectional recurrent neural networks
(BRNNs)
• While unidirectional RNNs can only draw on previous inputs to make predictions about the current state, bidirectional RNNs (BRNNs) also pull in future data to improve accuracy.
• Returning to the example of “feeling under the weather”, a
model based on a BRNN can better predict that the second
word in that phrase is “under” if it knows that the last word in
the sequence is “weather.”
Long short-term memory (LSTM)
• LSTM is a popular RNN architecture, which was introduced by Sepp
Hochreiter and Juergen Schmidhuber as a solution to the vanishing
gradient problem. This work addressed the problem of long-term
dependencies. That is, if the previous state that is influencing the current
prediction is not in the recent past, the RNN model might not be able to
accurately predict the current state.
• As an example, let’s say we wanted to predict the italicized words in,
“Alice is allergic to nuts. She can’t eat peanut butter.” The context of a nut
allergy can help us anticipate that the food that cannot be eaten contains
nuts. However, if that context was a few sentences prior, then it would
make it difficult or even impossible for the RNN to connect the
information.
• To remedy this, LSTM networks have “cells” in the hidden layers of the
artificial neural network, which have 3 gates: an input gate, an output gate
and a forget gate. These gates control the flow of information that is
needed to predict the output in the network. For example, if a gender pronoun, such as "she", was repeated multiple times in prior sentences, the network might exclude that from the cell state.
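A hedged numpy sketch of one LSTM cell step with the three gates described above plus the candidate state; the parameter shapes, dictionary layout, and values are illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step; W, U, b hold parameters for the four
    components: forget (f), input (i), output (o) gates and candidate (g)."""
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: what to discard
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: what to store
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: what to emit
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate cell state
    c = f * c_prev + i * g   # cell state mixes old memory with new input
    h = o * np.tanh(c)       # hidden state is a gated view of the cell state
    return h, c

# Illustrative driver with made-up sizes (hidden size 4, input size 3)
rng = np.random.default_rng(3)
n, d = 4, 3
W = {k: rng.normal(size=(n, d)) for k in "fiog"}
U = {k: rng.normal(size=(n, n)) for k in "fiog"}
b = {k: np.zeros(n) for k in "fiog"}
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
print(h, c)
```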
Gated recurrent units (GRUs)
• A GRU is similar to an LSTM as it also works to address the short-term
memory problem of RNN models. Instead of using a “cell state” to
regulate information, it uses hidden states, and instead of 3 gates, it
has 2: a reset gate and an update gate. Similar to the gates within
LSTMs, the reset and update gates control how much and which
information to retain.
• Because of its simpler architecture, GRUs are computationally more
efficient and require fewer parameters compared to LSTMs. This
makes them faster to train and often more suitable for certain real-
time or resource-constrained applications.
Encoder-decoder RNNs
• These are commonly used for sequence-to-sequence tasks, such as
machine translation. The encoder processes the input sequence into
a fixed-length vector (context), and the decoder uses that context to
generate the output sequence. However, the fixed-length context
vector can be a bottleneck, especially for long input sequences.
Fitting a neural network
• Fitting a neural network involves training it to map inputs to outputs by
adjusting its parameters (weights and biases) using a training dataset and
optimization algorithms like gradient descent.
• Process:
• 1. Data Preparation:
• Define the Problem:
• Clearly identify the input and output variables, and the type of problem
(regression, classification, etc.).
• Collect and Prepare Data:
• Gather a sufficient amount of relevant data, preprocess it (e.g., scaling,
normalization), and split it into training, validation, and testing sets.
• 2. Neural Network Architecture:
• Choose the Network Type:
• Select the appropriate type of neural network (e.g., feedforward,
convolutional, recurrent) based on the problem.
• Define Layers and Neurons:
• Determine the number of layers, the number of neurons in each layer, and
the activation functions for each layer.
• 3. Training:
• Initialization: Randomly initialize the weights and biases of the network.
• Forward Propagation: Feed the input data through the network to
generate an output.
• Loss Function: Define a loss function that quantifies the difference between the
predicted and actual outputs.
• Backpropagation: Calculate the gradient of the loss function with respect to the
weights and biases using backpropagation.
• Optimization: Update the weights and biases using an optimization algorithm
(e.g., stochastic gradient descent) to minimize the loss function.
• Iteration: Repeat the forward and backward propagation steps until the network
converges to a satisfactory solution.
• 4. Evaluation and Tuning:
• Validation:
• Use the validation set to monitor the network's performance during training and
prevent overfitting.
• Testing:
• Evaluate the trained network's performance on the test set to assess its
generalization ability.
• Tuning:
• Adjust hyperparameters (e.g., learning rate, number of layers, number of
neurons) to improve the network's performance.
• Tools and Libraries:
• MathWorks (MATLAB): Provides tools for creating, visualizing, and training
neural networks.
• OriginLab (OriginPro): Offers a Neural Network Fitting App for fitting data
with neural networks.
• Python Libraries: TensorFlow, PyTorch, Scikit-learn, Keras.
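A minimal end-to-end fitting sketch in Keras (assuming TensorFlow is installed) following the steps above: prepare and split data, define the architecture, compile with a loss and optimizer, fit, and evaluate. The regression data, layer sizes, and hyperparameters are all made-up illustrations:

```python
import numpy as np
import tensorflow as tf

# 1. Data preparation: made-up regression data, split into train and test sets
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# 2. Architecture: layers, neurons, and activation functions
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # single linear output for regression
])

# 3. Training: loss function + optimizer, then iterative forward/backward passes
model.compile(optimizer="sgd", loss="mse")
model.fit(X_train, y_train, validation_split=0.2, epochs=20, verbose=0)

# 4. Evaluation: assess generalization on the held-out test set
print(model.evaluate(X_test, y_test, verbose=0))
```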
