Deep Learning & Neural Network
Unit-3

Neha Gupta
April 3, 2025
Deep learning
“Deep learning is a collection of statistical techniques of
machine learning for learning feature hierarchies that are
actually based on artificial neural networks.”

Deep-learning architectures such as deep neural networks, deep
belief networks, deep reinforcement learning, recurrent neural
networks, convolutional neural networks and Transformers have
been applied to fields including computer vision, speech
recognition, natural language processing, machine translation,
bioinformatics, drug design, medical image analysis, climate
science, material inspection and board game programs, where they
have produced results comparable to, and in some cases
surpassing, human expert performance.

Deep learning is a class of machine learning methods that uses
many layers of nonlinear processing units for feature extraction
and transformation. Each successive layer takes the output of the
preceding layer as its input.

Deep learning models can learn the relevant features themselves
with little guidance from the programmer, and they help with the
curse of dimensionality. Deep learning algorithms are used
especially when there is a large number of inputs and outputs.

Deep learning is implemented with neural networks. The idea
behind a neural network is motivated by the biological neuron,
that is, a brain cell.

We provide the raw image data to the input layer. The input layer then
determines patterns of local contrast, that is, it differentiates on the
basis of colors, luminosity, etc. The 1st hidden layer then determines the
face features, i.e., it fixates on the eyes, nose, lips, etc., and matches
those face features to the correct face template. The 2nd hidden layer
determines the correct face, as can be seen in the above image, after
which the result is sent to the output layer.
CNN Architectures
• Deep Neural Networks
A deep neural network is a neural network with a certain level
of complexity: several hidden layers sit between the input and
output layers. Such networks are highly proficient at modeling
and processing non-linear associations.
• Deep Belief Networks
A deep belief network (DBN) is a class of deep neural network
that comprises multiple layers of belief networks. Steps to train
a DBN:
• With the help of the Contrastive Divergence algorithm, a
layer of features is learned from the visible units.
• Next, the previously learned features are treated as visible
units, and a further layer of features is learned from them.
• Lastly, when the learning of the final hidden layer is
accomplished, the whole DBN is trained.

• Recurrent Neural Networks
A recurrent neural network permits parallel as well as sequential
computation, and it loosely resembles the human brain (a large
feedback network of connected neurons). Because it can remember
important information about the inputs it has received, it can
make more precise predictions on sequential data.
Types of Deep Learning Networks:
1. Feed Forward Neural Network
A feed-forward neural network is an artificial neural network in
which the nodes do not form a cycle. All the perceptrons are
organized in layers: the input layer takes the input, and the
output layer generates the output. The layers in between are
called hidden layers because they have no link with the outside
world.
Each perceptron in one layer is connected to every node in the
subsequent layer, so the layers are fully connected. There are no
connections between nodes in the same layer, and there are no
back-loops in the feed-forward network. To minimize the
prediction error, the backpropagation algorithm can be used to
update the weight values.
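A minimal sketch of one forward pass through such a network may help. The 2-3-1 layer sizes, the weights and the tanh activation are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every output node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def tanh_layer(values):
    """Apply the tanh non-linearity to each node."""
    return [math.tanh(v) for v in values]

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output.
x = [0.5, -1.0]
w_hidden = [[0.1, 0.4], [-0.3, 0.2], [0.5, 0.5]]  # one row per hidden node
b_hidden = [0.0, 0.1, -0.1]
w_out = [[0.3, -0.2, 0.6]]                        # one row per output node
b_out = [0.05]

# Data flows strictly forward: input -> hidden -> output, with no loops.
hidden = tanh_layer(dense(x, w_hidden, b_hidden))
output = tanh_layer(dense(hidden, w_out, b_out))
print(hidden, output)
```

Backpropagation would adjust w_hidden, b_hidden, w_out and b_out to reduce the prediction error; that step is left out of this sketch.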
Applications:
• Data Compression
• Pattern Recognition
• Computer Vision
• Sonar Target Recognition
• Speech Recognition
• Handwritten Characters Recognition

2. Recurrent Neural Network
Recurrent neural networks are a variation of feed-forward networks
in which each neuron in the hidden layers receives an input with a
specific delay in time. A recurrent neural network mainly accesses
information from the preceding steps of the sequence. For example,
to guess the next word in a sentence, one must know the words that
came before it. It not only processes the inputs but also shares the
same weights across time steps, so the size of the model does not
grow with the size of the input.
However, a recurrent neural network is slow to compute, and it does
not consider any future input when computing the current state. It
also has trouble remembering information from far back in the
sequence.
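The weight sharing described above can be sketched with a one-unit recurrent cell. The scalar weights w_x, w_h and b are hypothetical values, and real networks use vectors and matrices instead of single numbers.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: combine the current input with the previous state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Apply the same three weights at every time step of the sequence."""
    h = 0.0           # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the hidden state carries information forward in time.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The parameter count stays at three whether the sequence has 3 or 3000 steps, which is why the model size does not grow with the input length.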
Applications: Machine Translation, Robot Control, Time Series
Prediction, Speech Recognition, Speech Synthesis, Time Series
Anomaly Detection, Rhythm Learning, Music Composition

3. Convolutional Neural Network
Convolutional Neural Networks are a special kind of neural
network mainly used for image classification, clustering of
images and object recognition. DNNs enable unsupervised
construction of hierarchical image representations. For image
tasks, deep convolutional neural networks are often preferred
over other neural networks because of their accuracy.
Applications:
• Identify Faces, Street Signs, Tumors.
• Image Recognition.
• Video Analysis.
• NLP.
• Anomaly Detection.
• Drug Discovery.
• Checkers Game.
• Time Series Forecasting.

4. Autoencoders
An autoencoder is another kind of unsupervised machine learning
network. The number of hidden cells is smaller than the number of
input cells, while the number of input cells equals the number of
output cells. An autoencoder is trained to produce an output similar
to the input it was fed, which forces it to find common patterns and
generalize the data. Autoencoders are mainly used to obtain a smaller
representation of the input, from which the original data can be
reconstructed. The algorithm is comparatively simple, as its training
target is just the input itself.
• Encoder: Converts the input data to a lower dimension.
• Decoder: Reconstructs the data from the compressed representation.
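The encoder/decoder bottleneck can be illustrated with a hand-built toy. The pair-averaging encoder below is a fixed, hypothetical mapping and is not learned; a real autoencoder would learn both mappings by minimizing the reconstruction error.

```python
def encode(x):
    """Encoder: 4 inputs -> 2 hidden values (average of each pair)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]

def decode(code):
    """Decoder: 2 hidden values -> 4 outputs (repeat each value twice)."""
    return [c for c in code for _ in range(2)]

def reconstruction_error(x, x_hat):
    """Mean squared difference between input and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [1.0, 3.0, 4.0, 4.0]
code = encode(x)        # smaller representation of the input
x_hat = decode(code)    # same length as the input
error = reconstruction_error(x, x_hat)
print(code, x_hat, error)
```

The output has as many cells as the input, the hidden code has fewer, and the error measures what the compression lost.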
Applications:
• Classification.
• Clustering.
• Feature Compression.
Deep learning applications
• Self-Driving Cars
A self-driving car captures images of its surroundings, processes this
large amount of data, and then decides which action to take: turn left,
turn right, or stop. Acting on these decisions can help reduce the
accidents that happen every year.
• Voice Controlled Assistance
When we talk about voice-controlled assistance, Siri is the first thing
that comes to mind. You can tell Siri what you want it to do, and it
will search for it and display the result for you.
• Automatic Image Caption Generation
For whatever image you upload, the algorithm generates a matching
caption. For example, for an image of a blue-colored eye, it will display
a caption describing a blue-colored eye at the bottom of the image.
• Automatic Machine Translation
With the help of deep learning, automatic machine translation converts
text from one language into another.

• Limitations
• It only learns through observations.
• It can suffer from bias issues.
• Advantages
• It reduces the need for feature engineering.
• It eliminates unnecessary costs.
• It easily identifies defects that are difficult to detect.
• It delivers best-in-class performance on many problems.
• Disadvantages
• It requires a large amount of data.
• It is quite expensive to train.
• It does not have a strong theoretical groundwork.

Basic Architecture of CNN
• A convolution tool that separates and identifies the various
features of the image for analysis, in a process called
Feature Extraction.
• The feature extraction network consists of many pairs of
convolutional or pooling layers.
• A fully connected layer that uses the output of the
convolution process and predicts the class of the image
based on the features extracted in previous stages.
• This feature extraction aims to reduce the number of
features present in a dataset. It creates new features that
summarize the existing features contained in the original
set of features. There are many CNN layers, as shown in the
CNN architecture diagram.

• Convolution Layers
Three types of layers make up a CNN: convolutional layers,
pooling layers, and fully-connected (FC) layers. When these
layers are stacked, a CNN architecture is formed. In addition
to these three layers, there are two more important
components: the dropout layer and the activation function.

1. Convolutional Layer
This is the first layer, used to extract the various features from
the input images. In this layer, the mathematical operation of
convolution is performed between the input image and a filter of a
particular size MxM. The filter slides over the input image, and at
each position the dot product is taken between the filter and the
MxM patch of the input image that it covers.

The output is termed the feature map, which gives us information
about the image such as its corners and edges. Later, this feature
map is fed to other layers to learn several other features of the
input image.

After applying the convolution operation to its input, the
convolution layer passes the result to the next layer. Convolutional
layers are useful because they keep the spatial relationship between
the pixels intact.
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer.
The primary aim of this layer is to decrease the size of the convolved
feature map to reduce the computational costs. It does so by
decreasing the connections between layers, and it operates
independently on each feature map. Depending on the method used, there
are several types of pooling operations. Pooling basically summarizes
the features generated by a convolution layer.

In Max Pooling, the largest element is taken from each section of the
feature map. Average Pooling calculates the average of the elements in
a section of predefined size. Sum Pooling computes the total sum of the
elements in the predefined section. The Pooling Layer usually serves as
a bridge between the Convolutional Layer and the FC Layer.

The pooling layer generalizes the features extracted by the convolution
layer and helps the network to recognize the features independently.
It also reduces the computations in the network.

3. Fully Connected Layer
The Fully Connected (FC) layer consists of weights and biases
along with neurons, and it connects the neurons of two
different layers. These layers are usually placed before the
output layer and form the last few layers of a CNN
architecture.

Here the output of the previous layers is flattened and fed
to the FC layer. The flattened vector then passes through a
few more FC layers, where the mathematical operations usually
take place. In this stage, the classification process begins.
Two fully connected layers are often used because they tend
to perform better than a single one. These layers in a CNN
reduce the need for human supervision.

4. Dropout
Usually, when all the features are connected to the FC
layer, it can cause overfitting on the training dataset.
Overfitting occurs when a model fits the training data so
closely that its performance suffers when it is used on new
data.
To overcome this problem, a dropout layer is used, in which
a few neurons are dropped from the neural network at random
during training, so that a smaller network is trained at
each step. With a dropout of 0.3, 30% of the nodes are
dropped out randomly from the neural network.
Dropout improves the performance of a machine learning model
on new data because it prevents overfitting by making the
network simpler during training.
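A small sketch of dropout with a rate of 0.3 follows. It uses the common "inverted dropout" variant, which rescales the surviving activations by 1 / (1 - rate); that rescaling is an addition not mentioned in the slides, and the seeded random generator is only there to make the illustration repeatable.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` (training time only).

    Survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged.
    """
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

rng = random.Random(0)          # seeded only for repeatability
activations = [1.0] * 1000
dropped = dropout(activations, 0.3, rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
print(f"fraction of nodes dropped: {fraction_zeroed:.2f}")  # close to 0.30
```

At prediction time dropout is switched off and every neuron is used.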

5. Activation Functions
Finally, one of the most important components of the CNN model
is the activation function. Activation functions are used to learn
and approximate any kind of continuous and complex relationship
between the variables of the network. In simple words, an
activation function decides which information should be passed
forward through the network and which should not.

It adds non-linearity to the network. There are several commonly
used activation functions, such as ReLU, Softmax, tanh and
Sigmoid. Each of these functions has a specific usage. For a
binary classification CNN model, the sigmoid and softmax functions
are preferred, and for multi-class classification, softmax is
generally used. In simple terms, activation functions in a CNN
model determine whether a neuron should be activated or not: they
decide, using mathematical operations, whether the input to the
network is important for the prediction.
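The four functions named above can be written out directly. This is a plain-Python sketch of their standard definitions; the input scores are made-up values.

```python
import math

def relu(x):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any real number into the range (-1, 1)."""
    return math.tanh(x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    top = max(scores)                        # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])             # hypothetical class scores
print(relu(-2.0), sigmoid(0.0), tanh(0.0), probs)
```

Sigmoid yields one probability, which suits a binary decision, while softmax yields one probability per class.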

Building Blocks of CNN
The Convolutional Neural Network is one of the main approaches to
image classification and image recognition with neural networks.
Scene labeling, object detection, and face recognition are some of
the areas where convolutional neural networks are widely used.

A CNN takes an image as input, processes it, and classifies it under
a certain category such as dog, cat, lion, tiger, etc. The computer
sees an image as an array of pixels whose size depends on the
resolution of the image. Based on the image resolution, it sees
h * w * d, where h = height, w = width and d = dimension (the number
of channels). For example, an RGB image may be a 6 * 6 * 3 array, and
a grayscale image a 4 * 4 * 1 array.

In a CNN, each input image passes through a sequence of convolution
layers with filters (also known as kernels), pooling layers, and fully
connected layers. After that, the softmax function is applied to
classify the object with probabilistic values between 0 and 1.
Convolution Layer
The convolution layer is the first layer to extract features from an input
image. By learning image features using small squares of input data, the
convolutional layer preserves the relationship between pixels. Convolution
is a mathematical operation that takes two inputs: an image matrix and a
kernel (or filter).
• The dimension of the image matrix is h×w×d.
• The dimension of the filter is fh×fw×d.
• The dimension of the output is (h-fh+1)×(w-fw+1)×1.

Let's start by considering a 5*5 image whose pixel values are 0 and 1,
and a 3*3 filter matrix.

• The convolution of the 5*5 image matrix with the 3*3 filter matrix
is called the "Feature Map" and is shown as the output.

Convolving an image with different filters can perform operations such
as blur, sharpen, and edge detection.
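The sliding dot product can be traced in a few lines of plain Python. The slide's figure with the actual pixel values did not survive, so the 5*5 image and 3*3 filter below are assumed values picked for illustration. As in most deep-learning libraries, the filter is not flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take
    the dot product with the patch under it at each position."""
    fh, fw = len(kernel), len(kernel[0])
    out_h = len(image) - fh + 1        # h - fh + 1
    out_w = len(image[0]) - fw + 1     # w - fw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(fh) for b in range(fw)))
        feature_map.append(row)
    return feature_map

# Assumed example values (not recovered from the slide).
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The 3*3 result matches the (h-fh+1)×(w-fw+1) output size for h = w = 5 and fh = fw = 3.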

Strides
Stride is the number of pixels by which the filter shifts over the input matrix.
When the stride is 1, we move the filter 1 pixel at a time; similarly, when the
stride is 2, we move the filter 2 pixels at a time. The following figure shows how
the convolution works with a stride of 2.
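The effect of the stride on the output size can be checked with a short sketch. The 7*7 input and the all-ones 3*3 filter are hypothetical values, since the slide's figure is not available.

```python
def convolve2d_strided(image, kernel, stride):
    """Convolution without padding; the filter jumps `stride` pixels."""
    fh, fw = len(kernel), len(kernel[0])
    out_h = (len(image) - fh) // stride + 1
    out_w = (len(image[0]) - fw) // stride + 1
    return [[sum(image[i * stride + a][j * stride + b] * kernel[a][b]
                 for a in range(fh) for b in range(fw))
             for j in range(out_w)]
            for i in range(out_h)]

# Hypothetical 7x7 input holding the values 0..48, and a 3x3 filter of ones.
image = [[r * 7 + c for c in range(7)] for r in range(7)]
kernel = [[1, 1, 1],
          [1, 1, 1],
          [1, 1, 1]]

stride1 = convolve2d_strided(image, kernel, 1)   # 5x5 output
stride2 = convolve2d_strided(image, kernel, 2)   # 3x3 output
print(len(stride1), len(stride2))
```

With a stride of 2 the filter visits every second position, so stride2[0][1] equals stride1[0][2] and the output is smaller.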

Padding
Padding plays a crucial role in building a convolutional neural
network. Every convolution shrinks the image, so in a neural
network with hundreds of layers, only a small image would be left
after the final filtering.
What happens if we place a three-by-three filter on top of a grayscale image
and convolve?

• It is clear from the above picture that a pixel in the corner is covered only
once, whereas a pixel in the middle is covered more than once. This means that
we get more information from the middle pixels, so there are two downsides:
• Shrinking outputs
• Losing information at the corners of the image.
To overcome this, padding is introduced. "Padding is an additional layer that
can be added to the border of an image."
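A short sketch of zero padding and its effect on the output size follows. The helper names zero_pad and output_size are hypothetical, and the size formula floor((n + 2p - f) / s) + 1 is the standard one for an n-wide input, f-wide filter, padding p and stride s.

```python
def zero_pad(image, p):
    """Surround the image with a border of p rows/columns of zeros."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded

def output_size(n, f, padding=0, stride=1):
    """Output width/height of a convolution along one dimension."""
    return (n + 2 * padding - f) // stride + 1

image = [[1] * 5 for _ in range(5)]      # hypothetical 5x5 image
padded = zero_pad(image, 1)              # becomes 7x7

print(output_size(5, 3))                 # 3: the output shrinks
print(output_size(5, 3, padding=1))      # 5: padding keeps the size
```

With one ring of zeros, the corner pixels are covered by more filter positions and the 3*3 convolution no longer shrinks the 5*5 image.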

Pooling Layer
The pooling layer plays an important role in the pre-processing of
an image. It reduces the number of parameters when the images are
too large. Pooling is a "downscaling" of the image obtained from
the previous layers. It can be compared to shrinking an image to
reduce its pixel density. Spatial pooling is also called
downsampling or subsampling; it reduces the dimensionality of each
map but retains the important information. There are the following
types of spatial pooling:
• Max Pooling
Max pooling is a sample-based discretization process. Its main
objective is to downscale an input representation, reducing its
dimensionality and allowing assumptions to be made about the
features contained in the binned sub-regions.
Max pooling is done by applying a max filter to non-overlapping
sub-regions of the initial representation.
• Average Pooling
Average pooling downscales the input by dividing it into
rectangular pooling regions and computing the average value of
each region.
• Sum Pooling
The sub-regions for sum pooling or mean pooling are set exactly
the same as for max pooling, but instead of the max function we
use the sum or the mean.
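The three pooling variants differ only in the function applied to each sub-region, which the following sketch makes explicit. The 4*4 feature map holds made-up values and the regions are non-overlapping 2*2 blocks.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size sub-region."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[reduce_fn([feature_map[i + a][j + b]
                        for a in range(size) for b in range(size)])
             for j in range(0, cols - size + 1, size)]
            for i in range(0, rows - size + 1, size)]

def mean(values):
    return sum(values) / len(values)

# Hypothetical 4x4 feature map.
feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 5, 8]]

max_pooled = pool2d(feature_map, 2, max)    # [[6, 4], [7, 9]]
avg_pooled = pool2d(feature_map, 2, mean)   # [[3.75, 2.25], [4.0, 5.75]]
sum_pooled = pool2d(feature_map, 2, sum)    # [[15, 9], [16, 23]]
print(max_pooled, avg_pooled, sum_pooled)
```

Each variant turns the 4*4 map into a 2*2 map, one value per region.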

Fully Connected Layer
• In the fully connected layer, the input from the other
layers is flattened into a vector and passed on. The layer
transforms the output into the desired number of classes
for the network.
In the above diagram, the feature map matrix is converted into a vector
x1, x2, x3, ..., xn, which is fed to the fully connected layers. We
combine these features to create a model and apply an activation
function such as softmax or sigmoid to classify the outputs as a car,
dog, truck, etc.
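The flatten, fully connected and softmax stages can be sketched end to end. The pooled feature map, the weights and the three class names below are hypothetical values used only to show the data flow.

```python
import math

def flatten(feature_maps):
    """Turn a stack of 2-D feature maps into one vector x1, x2, ..., xn."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["car", "dog", "truck"]
feature_maps = [[[6, 4], [7, 9]]]          # one hypothetical 2x2 pooled map
weights = [[0.1, 0.0, 0.2, 0.1],           # one row of weights per class
           [0.0, 0.3, 0.1, 0.0],
           [0.2, 0.1, 0.0, 0.05]]
biases = [0.0, 0.0, 0.0]

x = flatten(feature_maps)                  # [6, 4, 7, 9]
probs = softmax(dense(x, weights, biases))
predicted = classes[probs.index(max(probs))]
print(x, probs, predicted)
```

The class with the largest probability is reported as the prediction.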
Link:
https://www.youtube.com/watch?v=Y1qxI-Df4Lk&t=341s

Training of Convolutional Neural Network Model
The following steps train our CNN model:
Step 1:
In the first step of the training section, we specify the device
with the help of torch.device(). We check for CUDA: if CUDA is
available, we use CUDA; otherwise we use the CPU.
*CUDA: With the CUDA Toolkit, you can develop, optimize, and deploy your
applications on GPU-accelerated embedded systems, desktop workstations,
enterprise data.
Step 2:
In the next step, we assign our model to our device.
Step 3:
Now we define our loss function, in the same way as for our previous
model, in which we used a deep neural network. After that, we use the
familiar optimizer, i.e., Adam.
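Steps 1 to 3 might look roughly like the following in PyTorch. The code from the original slide is not available, so the small model, the choice of CrossEntropyLoss and the learning rate are all assumptions made for illustration.

```python
import torch
from torch import nn

# Step 1: use the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Step 2: assign the model to the device. This tiny CNN is a hypothetical
# stand-in for 1x28x28 images with 10 classes, not the slide's model.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # -> 8 x 28 x 28
    nn.ReLU(),
    nn.MaxPool2d(2),                            # -> 8 x 14 x 14
    nn.Flatten(),
    nn.Dropout(0.3),
    nn.Linear(8 * 14 * 14, 10),                 # -> 10 class scores
).to(device)

# Step 3: loss function and Adam optimizer (both settings are assumptions).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

Inputs and labels must later be moved to the same device with .to(device) before they are passed to the model.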
Step 4:
In the next step, we specify the number of epochs. We initialize the
number of epochs and analyze the loss at every epoch with a plot.
We initialize two lists, i.e., loss_history and correct_history.
Step 5:
We start by iterating through every epoch, and for every epoch we
iterate through every training batch provided by the training
loader. Each training batch in the training loader contains one
hundred images as well as one hundred labels.
Step 6:
We are dealing with a convolutional neural network, to which the
inputs are passed first. We pass the images as four-dimensional
tensors, so there is no need to flatten them.
As we have assigned our model to our device, we assign the inputs
and labels to our device in the same way.

Step 7:
In the next step, we perform the optimization step in the same way as we did
before in image recognition.
Step 8:
To keep track of the losses at every epoch, we initialize a variable for the
loss, i.e., running_loss. Every loss that is computed per batch is added up
over all batches, and from this sum the final loss for the epoch is computed.
Now we append this accumulated loss for the entire epoch to our losses list
(loss_history). For this, we use an else statement after the loop. Once the
for loop is finished, the else statement is executed. In this else statement
we print the accumulated loss that was computed for the entire dataset at
that specific epoch.
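The for/else bookkeeping of Step 8 can be shown without any model. In this sketch the per-batch losses are made-up numbers standing in for the values a real training step would produce, and run_epochs is a hypothetical helper name.

```python
def run_epochs(batch_losses_per_epoch):
    """Accumulate batch losses and record one average loss per epoch."""
    loss_history = []
    for epoch, batch_losses in enumerate(batch_losses_per_epoch):
        running_loss = 0.0
        for batch_loss in batch_losses:
            running_loss += batch_loss          # add up every batch loss
        else:
            # Runs once the inner for loop has finished without a break.
            epoch_loss = running_loss / len(batch_losses)
            loss_history.append(epoch_loss)
            print('epoch:', epoch + 1, 'training_loss: {:.4f}'.format(epoch_loss))
    return loss_history

# Two epochs with two batches each (made-up loss values).
loss_history = run_epochs([[1.0, 0.5], [0.5, 0.25]])
```

A for loop's else branch is skipped only when the loop exits through break, so here it always runs at the end of each epoch.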
Step 9:
In the next step, we find the accuracy of our network. We initialize a
variable named correct and assign it the value zero. We compare the
predictions made by the model for each training image to the actual labels
of the images, to show how many of them are correct within an epoch.

For each image, we take the maximum score value. The call returns a tuple.
The first value is the actual top value, the maximum score that the model
produced for each image in this batch of images. We are not interested in
this first value. The second value holds the top predictions made by the
model, which we call preds: the index of the maximum score for each image.
Step 10:
Each image output is a collection of values with indices ranging from 0 to
9, because the MNIST dataset contains the classes 0 to 9. The index at which
the maximum value occurs corresponds to the prediction made by the model. We
compare all of these predictions to the actual labels of the images to see
how many of them are correct.

This gives the number of correct predictions for every single batch of
images. We define the epoch accuracy in the same way as the epoch loss and
print both epoch loss and accuracy. Dividing by len(training_loader), the
number of batches, gives the average number of correct predictions per
batch; with batches of one hundred images, this equals the accuracy in
percent.
• epoch_acc=correct.float()/len(training_loader)
• print('training_loss:{:.4f},{:.4f}'.format(epoch_loss,epoch_acc.item()))
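The counting of correct predictions in Steps 9 and 10 can be sketched in plain Python. The helper top_prediction is a hypothetical stand-in that mirrors the (maximum value, index) pair described above, and the scores and labels are made-up values for a batch of three images.

```python
def top_prediction(scores):
    """Return (max_score, index_of_max) for one image's class scores."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores[best], best

# Made-up scores for 3 images over the 10 classes 0..9.
batch_scores = [
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1, 2.5, 0.2, 0.4],  # highest at index 7
    [0.2, 0.1, 3.1, 0.3, 0.0, 0.1, 0.2, 0.1, 0.0, 0.1],  # highest at index 2
    [0.0, 1.2, 0.1, 1.9, 0.3, 0.2, 0.1, 0.0, 0.4, 0.1],  # highest at index 3
]
labels = [7, 2, 1]

# Keep only the index (second tuple element) as the predicted class.
preds = [top_prediction(scores)[1] for scores in batch_scores]
correct = sum(p == y for p, y in zip(preds, labels))
accuracy = correct / len(labels)
print(preds, correct, accuracy)
```

Here two of the three predictions match their labels, so correct is 2.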

This will give the expected result as:

Step 11:
Now, we will append the accuracy for the entire epoch into our
correct_history list, and for better visualization, we will plot both epoch loss
and accuracy as
• plt.plot(loss_history,label='Running Loss History')
• plt.plot(correct_history,label='Running correct History')
• Case Study
Link 1: Diabetic Retinopathy
https://www.coursera.org/lecture/machine-learning-duke/motivaion-diadetic-retinopathy-C183X

Link 2: Case study: Smart speaker
https://www.coursera.org/lecture/ai-for-everyone/case-study-smart-speaker-ahvm7

Link 3: Self Driving Cars
https://neptune.ai/blog/self-driving-cars-with-convolutional-neural-networks-cnn
