586 114 216 Convolutional Neural Networks

Uploaded by g4gowthamkumar
Copyright © All Rights Reserved

Convolutional Neural Networks (CNN)
• A Convolutional Neural Network (ConvNet/CNN) is a deep learning
algorithm that can take in an input image, assign importance
(learnable weights and biases) to various aspects/objects in the image,
and differentiate one from another.
• The pre-processing required by a ConvNet is much lower than that
required by other classification algorithms.
• Convolutional neural networks (ConvNets or CNNs) are most often
used for classification and computer vision tasks.
• The role of the ConvNet is to reduce images into a form that is
easier to process, without losing the features that are critical for a
good prediction.
Why CNN for image processing tasks?
• Its built-in convolutional layers reduce the high dimensionality of images
without losing the information that matters for the task.
• A regular artificial neural network (ANN) also learns through weights, which
are updated after each training iteration.
• With an ANN, image classification becomes difficult because 2-dimensional
images must be converted to 1-dimensional vectors, and every input pixel is
connected to every neuron in the next layer. This dramatically increases the
number of trainable parameters, which costs storage and processing capability.
• In comparison, a CNN does not connect every neuron to every pixel. It slides
small filters with shared weights across the image, so the number of
parameters depends on the filter size and the number of filters, not on the
image size.
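To make the parameter argument concrete, here is a minimal sketch that counts trainable parameters (weights plus biases) for a hypothetical 32x32x3 input feeding either a dense layer of 12 units or a convolutional layer of 12 filters of size 3x3. The sizes and the helper names `dense_params` and `conv_params` are assumptions chosen for illustration.

```python
# Compare trainable parameters of a dense layer and a conv layer on the
# same input. Sizes are illustrative assumptions, not from the slides.

def dense_params(height, width, channels, units):
    """Weights + biases when every pixel connects to every unit."""
    inputs = height * width * channels  # image flattened to a 1-D vector
    return inputs * units + units

def conv_params(kernel, channels, filters):
    """Weights + biases for a conv layer: one shared kernel per filter.

    Note that the image height and width do not appear at all.
    """
    return kernel * kernel * channels * filters + filters

dense = dense_params(32, 32, 3, 12)
conv = conv_params(3, 3, 12)
print("dense:", dense)
print("conv: ", conv)
```

The dense layer needs 36876 parameters, while the convolutional layer needs 336, and the convolutional count would stay the same for a larger image.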
CNN
• Convolutional Neural Networks are very similar to ordinary Neural
Networks: they are made up of neurons that have learnable weights
and biases.
• Each neuron receives some inputs, performs a dot product and
optionally follows it with a non-linearity.
• ConvNet architectures make the explicit assumption that the inputs
are images, which allows us to encode certain properties into the
architecture.
Example
[Figure: example CNN architectures]
CNN
• Convolutional neural networks are distinguished from other neural
networks by their superior performance with image, speech, or audio
signal inputs. They have three main types of layers, which are:
• Convolutional layer
• Pooling layer
• Fully-connected (FC) layer
• The convolutional layer is the first layer of a convolutional network.
While convolutional layers can be followed by additional
convolutional layers or pooling layers, the fully-connected layer is the
final layer.
Layers used to build ConvNets
Example: an input image of 32x32x3

• INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image
of width 32, height 32, and three color channels R, G, B.
• CONV layer will compute the output of neurons that are connected to local
regions in the input, each computing a dot product between its weights and the
small region it is connected to in the input volume. This may result in a volume
such as [32x32x12] if we decide to use 12 filters.
• RELU layer will apply an elementwise activation function, such as max(0, x), which
thresholds at zero. This leaves the size of the volume unchanged ([32x32x12]).
• POOL layer is responsible for reducing the spatial size of the convolved
feature. It downsamples along the spatial dimensions (width, height); with 2x2
pooling and stride 2 this results in a volume such as [16x16x12].
• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of
size [1x1x10], where each of the 10 numbers corresponds to a class score.
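The layer walk-through above can be traced as a sequence of volume shapes. This is a sketch under stated assumptions: the CONV layer uses 3x3 filters with stride 1 and padding 1 (so it keeps 32x32), and the POOL layer uses 2x2 windows with stride 2. The function names are hypothetical.

```python
# Trace volume shapes through INPUT -> CONV -> RELU -> POOL -> FC for the
# 32x32x3 example. Assumed: 3x3 conv, stride 1, padding 1; 2x2 pool, stride 2.

def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling window."""
    return (size - kernel + 2 * pad) // stride + 1

def trace_shapes(width=32, height=32, filters=12, classes=10):
    shapes = [("INPUT", (width, height, 3))]
    w, h = conv_out(width, 3, 1, 1), conv_out(height, 3, 1, 1)
    shapes.append(("CONV", (w, h, filters)))   # depth = number of filters
    shapes.append(("RELU", (w, h, filters)))   # elementwise: shape unchanged
    w, h = conv_out(w, 2, 2, 0), conv_out(h, 2, 2, 0)
    shapes.append(("POOL", (w, h, filters)))   # spatial size halved
    shapes.append(("FC", (1, 1, classes)))     # one score per class
    return shapes

for name, shape in trace_shapes():
    print(f"{name:5s} {shape[0]}x{shape[1]}x{shape[2]}")
```

Running it lists 32x32x3, 32x32x12, 32x32x12, 16x16x12 and 1x1x10, matching the bullets.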
Convolutional Layer

• The convolutional layer is the core building block of a CNN, and it is
where the majority of computation occurs.
• It requires a few components: input data, a filter, and a
feature map.
• Let’s assume that the input will be a color image, which is made up of
a 3D matrix of pixels.
• This means that the input will have three dimensions: a height, a
width, and a depth, where the depth corresponds to the three RGB channels of the image.
[Figure: image representation in 3D form]
[Figure: convolutional layer, including a stride-2 example]
[Figure: convolution layer, the kernel]
[Figure: convolutional operation]
Convolutional Layer

• We also have a feature detector, also known as a kernel or a filter,
which moves across the receptive fields of the image, checking whether
the feature is present. This process is known as a convolution.
• The feature detector is a two-dimensional (2-D) array of weights
that is applied to a small part of the image. (For a color input, the
filter also extends through all the input channels.)
• While filters can vary in size, a filter is typically a 3x3 matrix; its size
also determines the size of the receptive field.
• The filter is applied to an area of the image, and a dot product is
calculated between the input pixels and the filter.
Convolutional Layer
• This dot product is then fed into an output array.
• Afterwards, the filter shifts by a stride, repeating the process until the
kernel has swept across the entire image.
• The final output from the series of dot products from the input and
the filter is known as a feature map, activation map, or a convolved
feature.
• Note that the weights in the feature detector (filter) remain fixed as it
moves across the image, which is also known as parameter sharing.
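The sliding dot product described above can be sketched in pure Python for a single-channel image with no padding. Like most CNN libraries, it computes cross-correlation (the kernel is not flipped). The function name `convolve2d` and the toy values are assumptions for illustration.

```python
# Minimal single-channel convolution as used in CNNs (cross-correlation:
# the kernel is not flipped). No padding; `stride` sets the step size.

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, taking a dot product at each position."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    feature_map = []
    for top in range(0, ih - kh + 1, stride):
        row = []
        for left in range(0, iw - kw + 1, stride):
            # Dot product between the kernel and the current receptive field.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0],
          [0, 1]]  # the same weights are reused at every position
print(convolve2d(image, kernel))            # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
print(convolve2d(image, kernel, stride=2))  # [[2, 6], [6, 2]]
```

The same `kernel` is used at every position, which is the parameter sharing mentioned in the last bullet; increasing `stride` shrinks the feature map.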
Convolutional Layer
• Some parameters, like the weight values, are adjusted during training
through backpropagation and gradient descent.
• However, there are three hyperparameters that affect the volume
size of the output and must be set before training of the neural
network begins. These include:
• 1. The number of filters affects the depth of the output. For example,
three distinct filters would yield three different feature maps, creating
a depth of three.
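The link between the number of filters and the output depth can be sketched as follows: each filter spans all input channels and produces one 2-D feature map, and the maps are stacked. The input (a 3-channel 4x4 volume of ones), the three constant filters, and the name `conv_multi` are hypothetical values chosen so the result is easy to check.

```python
# Each filter spans all input channels and yields one 2-D feature map;
# stacking the maps gives an output depth equal to the number of filters.
# Stride 1, no padding. Values are hypothetical.

def conv_multi(volume, filters):
    """volume: [channels][h][w]; filters: [n][channels][k][k] -> [n][oh][ow]."""
    channels, h, w = len(volume), len(volume[0]), len(volume[0][0])
    outputs = []
    for f in filters:
        k = len(f[0])
        fmap = []
        for top in range(h - k + 1):
            row = []
            for left in range(w - k + 1):
                total = 0
                # Sum the dot product over every input channel.
                for c in range(channels):
                    for i in range(k):
                        for j in range(k):
                            total += volume[c][top + i][left + j] * f[c][i][j]
                row.append(total)
            fmap.append(row)
        outputs.append(fmap)
    return outputs

# A 3-channel 4x4 input of ones and three 3x3x3 filters filled with 1, 0, -1.
volume = [[[1] * 4 for _ in range(4)] for _ in range(3)]
filters = [[[[v] * 3 for _ in range(3)] for _ in range(3)] for v in (1, 0, -1)]
out = conv_multi(volume, filters)
print(len(out), len(out[0]), len(out[0][0]))  # depth, height, width: 3 2 2
```

Three filters give three feature maps, so the output depth is three, as the bullet states.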
Convolutional Layer

• 2. Stride is the distance, or number of pixels, that the kernel moves
over the input matrix. While stride values of two or greater are rare, a
larger stride yields a smaller output.

• When the stride is 1, we move the filters one pixel at a time.
When the stride is 2 (or, uncommonly, 3 or more), the filters jump 2 pixels
at a time as we slide them around. This produces spatially smaller output volumes.
PADDING
• During convolution, the size of the output feature map is determined by the size
of the input feature map, the size of the kernel, and the stride.
• If we simply apply the kernel to the input feature map, the output feature
map will be smaller than the input.
• This can result in the loss of information at the borders of the input feature map.
• In order to preserve the border information, we use padding.
• Padding, a term relevant to convolutional neural networks, refers to the
number of pixels added around an image when it is being processed by the kernel of
a CNN.
• For example, with zero padding, every pixel value that is added has the
value zero.
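Zero padding itself is a simple operation. Below is a minimal sketch that surrounds a 2-D image with `pad` rings of zeros; the function name `zero_pad` is hypothetical.

```python
# Zero padding: surround a 2-D image with `pad` rings of zeros.

def zero_pad(image, pad):
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]            # top rows
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)  # left/right columns
    padded.extend([0] * width for _ in range(pad))        # bottom rows
    return padded

for row in zero_pad([[1, 2], [3, 4]], 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```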
[Figure: padding]
Convolutional Layer
• 3. Zero-padding is usually used when the filters do not fit the input image.
It sets all elements that fall outside of the input matrix to zero, producing
a larger or equally sized output. There are three types of padding:
• Valid padding: This is also known as no padding. In this case, the last
convolution is dropped if dimensions do not align. Here the output feature
map is smaller than the input feature map. This is useful when we want to
reduce the spatial dimensions of the feature maps.
• Same padding: Padding is added to the input feature map such that the
size of the output feature map is the same as the input feature map. This
is useful when we want to preserve the spatial dimensions of the feature maps.
• Full padding: F - 1 zeros are added on every side of the input, so the output
feature map is larger than the input (W + F - 1 for stride 1).
Why Padding?
• Padding helps retain the information at the borders of the image.
• The amount of padding required to keep the output the same size as the input
(with stride 1) is given by the formula P = (F - 1)/2, where F is the filter size.
• To let the kernel process the whole image, padding is added to the outer frame
of the image, giving the filter more space to cover the image, including its edges.
• Adding padding to an image processed by a CNN lets the border pixels contribute
more fully, which can make the analysis of images more accurate.
• Each convolutional operation without padding produces a matrix smaller than its
input. If we apply multiple convolutional operations, the resulting image becomes very
small and eventually useless, so to retain the shape or size of the original image we
need padding.
• We can compute the spatial size of the output volume as a function of
the input volume size (W), the receptive field or filter size of the Conv
Layer neurons (F), the stride with which they are applied (S), and the
amount of zero padding used (P) on the border.
• Without padding and with stride 1, the number of neurons that “fit” is
W - F + 1. With padding and stride, it is (W - F + 2P)/S + 1.
• For example for a 7x7 input and a 3x3 filter with stride 1 and pad 0 we
would get a 5x5 output. With stride 2 we would get a 3x3 output.
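The output-size formula (W - F + 2P)/S + 1 can be checked in a few lines. Raising an error when (W - F + 2P) is not divisible by S is a design choice assumed here (it flags hyperparameters for which the filter cannot tile the input evenly); libraries may instead round down. The name `output_size` is hypothetical.

```python
# Output size formula (W - F + 2P) / S + 1.
# Assumption: reject settings where the filter cannot tile the input evenly.

def output_size(w, f, s=1, p=0):
    span = w - f + 2 * p
    if span % s != 0:
        raise ValueError(f"invalid: (W-F+2P)={span} is not divisible by S={s}")
    return span // s + 1

print(output_size(7, 3, s=1, p=0))   # 5
print(output_size(7, 3, s=2, p=0))   # 3
print(output_size(32, 3, s=1, p=1))  # 32: P = (F-1)/2 keeps the size
```

For example, a 7x7 input with a 3x3 filter and stride 3 gives (7 - 3)/3 + 1, which is not an integer, so this sketch raises `ValueError`.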
Importance of Padding
• It is often convenient to pad the input volume with zeros around the border.
The size of this zero-padding is a hyperparameter.
• The nice feature of zero padding is that it allows us to control the spatial size of the
output volumes (most commonly, we use it to exactly preserve the spatial size of the
input volume, so the input and output width and height are the same).
• Use of zero-padding: consider an example where the input dimension is 5 and the
output dimension is also 5. This works because the receptive field is 3 and zero
padding of 1 is used. If no zero-padding were used, the output volume would have a
spatial dimension of only 3, because that is how many neurons would “fit” across the
original input. In general, setting zero padding to P = (F - 1)/2 when the stride is
S = 1 ensures that the input volume and output volume have the same spatial size.
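The rule P = (F - 1)/2 follows directly from the output-size formula with S = 1. Setting the output size equal to the input size and solving for P, then plugging in the 5x5 example:

```latex
\begin{aligned}
W_{\text{out}} &= \frac{W - F + 2P}{S} + 1, \qquad S = 1 \\
W_{\text{out}} = W \;&\Longrightarrow\; W = W - F + 2P + 1
  \;\Longrightarrow\; P = \frac{F - 1}{2} \\
F = 3,\ W = 5,\ P = 1: \quad W_{\text{out}} &= \frac{5 - 3 + 2}{1} + 1 = 5 \\
F = 3,\ W = 5,\ P = 0: \quad W_{\text{out}} &= \frac{5 - 3}{1} + 1 = 3
\end{aligned}
```

P is a whole number only when F is odd, which is one reason odd-sized filters are preferred.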
About filter
• There is no specific rule for choosing the dimensions or sizes of filters.
• In general, we prefer smaller, odd-sized kernel filters.
• A common choice is to keep the kernel size at 3x3 or 5x5.
• 2x2 and 4x4 are generally not preferred because odd-sized filters
symmetrically divide the previous layer's pixels around the output pixel,
whereas an even-sized filter leads to asymmetry. For example, when a 3x3
filter is used, there would be a padding of 1 on all sides (if you want the
output to be the same size as the input).
• Because they require much longer training time and are expensive, very large
kernel sizes are rarely used anymore.
CONVOLUTIONAL OPERATION
• The objective of the convolution operation is to extract features from the
input image, from low-level features such as edges in the first layers to
higher-level features in deeper layers.
• There are two types of results to the operation: one in which the
convolved feature is reduced in dimensionality compared to the
input, and one in which the dimensionality is either increased or
remains the same (valid padding in the former case, same padding in the latter).
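As an illustration of edge extraction, the sketch below applies a vertical-edge kernel to a toy image that is dark on the left and bright on the right. The kernel values are a common hand-made choice (an assumption here); in a CNN such weights are learned rather than hand-set. Valid padding is used, so the 6x6 input shrinks to a 4x4 feature map.

```python
# A vertical-edge kernel responding to a dark-to-bright boundary.
# Valid padding, stride 1; assumes a square image and square kernel.

def convolve2d(image, kernel):
    k = len(kernel)
    size = len(image) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(size)]
            for r in range(size)]

# 6x6 image: dark (0) on the left half, bright (10) on the right half.
image = [[0, 0, 0, 10, 10, 10] for _ in range(6)]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
for row in convolve2d(image, vertical_edge):
    print(row)  # [0, 30, 30, 0] on every row
```

The feature map is large only in the columns where the window straddles the boundary, and zero in flat regions.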
Pooling Layer

• Similar to the Convolutional Layer, the Pooling layer is responsible for reducing
the spatial size of the Convolved Feature.
• This is to decrease the computational power required to process the
data through dimensionality reduction.
• Furthermore, it is useful for extracting dominant features.
• There are two main types of pooling: max pooling and average pooling.
• Max pooling returns the maximum value from the portion of the image
covered by the kernel.
• Average pooling returns the average of all the values from the portion of the
image covered by the kernel.
• In practice, max pooling often performs better than average pooling.
• Pooling involves selecting a pooling operation, much like a filter to be
applied to feature maps.
• The size of the pooling operation or filter is smaller than the size of
the feature map; specifically, it is almost always 2×2 pixels applied
with a stride of 2 pixels.
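The two pooling types can share one sliding-window routine that differs only in how each window is reduced. This sketch uses the usual 2x2 window with stride 2; the names `pool2d` and `average` and the feature-map values are hypothetical.

```python
# 2x2 pooling with stride 2. `reduce` picks the operation:
# max for max pooling, average for average pooling.

def average(values):
    return sum(values) / len(values)

def pool2d(fmap, reduce, size=2, stride=2):
    out = []
    for top in range(0, len(fmap) - size + 1, stride):
        row = []
        for left in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[top + i][left + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [1, 4, 3, 8]]
print(pool2d(fmap, max))      # [[6, 4], [7, 9]]
print(pool2d(fmap, average))  # [[3.75, 2.25], [3.5, 5.0]]
```

Either way the 4x4 feature map is reduced to 2x2, halving each spatial dimension.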
[Figure: max pooling]
Why do we use Max Pooling?
• Max pooling helps extract low-level features from the data, such as
edges and points. In image processing, max pooling extracts the
sharpest features of the image, and the sharpest features are the best
lower-level representation of the image.
• Loosely, the low-level feature extraction is based on signal/image
processing techniques, while the high-level feature extraction is based
on machine learning techniques.
[Figure: min pooling]
Min Pooling
• In min pooling, the layer works with the least prominent features of
the feature map provided by the convolutional layer.
• Put simply, it selects the minimum-valued element from the region
captured by the filter in each feature map.
• Min pooling can be used, for example, to pick out the background color of an image.
• As in the example image, min pooling is used where we need to extract
the least prominent features from the data; for image data, it helps
extract features with low intensity values, or edgeless features, from the
image.
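Min pooling uses the same 2x2 window with stride 2 but keeps the smallest value in each window. A minimal sketch, with a hypothetical function name and feature map:

```python
# Min pooling: 2x2 window, stride 2, keep the smallest value per window.

def min_pool(fmap, size=2, stride=2):
    return [[min(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, stride)]
            for r in range(0, len(fmap) - size + 1, stride)]

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [1, 4, 3, 8]]
print(min_pool(fmap))  # [[1, 1], [1, 0]]
```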
Average pooling
• Pooling is faster to compute than convolution.
• Average pooling helps extract smooth features.
• For image data, applying average pooling layers gives outputs that combine
all the colors present in the region covered by the pooling window.
• So if the distribution of the data points, or of the colors in an image,
is smooth, average pooling can give good results.
Fully connected Layers
• After going through the above process, we have successfully enabled the
model to understand the features.
• Moving on, we are going to flatten the final output and feed it to a regular
Neural Network for classification purposes.
• Now that we have converted our input image into a suitable form for our
Multi-Layer Perceptron, we flatten the image into a column vector.
• The flattened output is fed to a feed-forward neural network, and
backpropagation is applied at every iteration of training.
• Over a series of epochs, the model is able to distinguish between
dominating and certain low-level features in images and classify them using
the Softmax Classification technique.
[Figure: fully connected layers]
Fully connected Layers
• In fully connected layers, the neuron applies a linear transformation
to the input vector through a weights matrix.
• A non-linear transformation is then applied to the product through a
non-linear activation function f.
• Neurons in a fully connected layer have full connections to all
activations in the previous layer, as seen in regular Neural Networks.
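The classification head described in these slides can be sketched as flatten, then one fully connected layer (weights matrix times input plus bias, followed by a non-linearity f, here ReLU), then softmax. The feature maps, weights, and biases below are hypothetical; in practice the weights are learned by backpropagation.

```python
# Flatten pooled feature maps, apply one fully connected layer
# f(W x + b) with f = ReLU, then softmax to get class probabilities.
import math

def flatten(volume):
    """[depth][h][w] nested lists -> one flat list (a column vector)."""
    return [v for fmap in volume for row in fmap for v in row]

def dense(x, weights, biases, activation=None):
    """Linear transform through a weights matrix, then an optional non-linearity."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out

def relu(v):
    return max(0.0, v)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

volume = [[[1, 0], [0, 1]], [[0, 2], [1, 0]]]  # 2 feature maps of 2x2
x = flatten(volume)                             # 8 inputs
weights = [[1, 0, 0, 1, 0, 0, 0, 0],            # 3 classes x 8 inputs
           [0, 0, 0, 0, 0, 1, 1, 0],
           [0, 0, 0, 0, 0, 0, 0, 0]]
biases = [0.0, 0.0, 0.0]
probs = softmax(dense(x, weights, biases, relu))
print(x)
print([round(p, 3) for p in probs])
```

Every output neuron reads all 8 flattened inputs, which is what "full connections to all activations in the previous layer" means; the class with the largest score (here the second one) gets the highest probability.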
Implementation
• https://keras.io/examples/vision/mnist_convnet/
