13-Convolutional Neural Network-09-08-2024
Dr.S.ALBERT ALEXANDER
SCHOOL OF ELECTRICAL ENGINEERING
[email protected]
Dr.S.ALBERT ALEXANDER-
SELECT-VIT 1
Module 2
Artificial Neural Networks
❖ Perceptron Learning Algorithm
2.4 Convolutional Neural Network
❖ The purpose of the hidden units is to learn non-linear
combinations of the original inputs; this is called feature
extraction or feature construction
❖ These hidden features are then passed as input to the final
generalized linear model
❖ This approach is particularly useful for problems where the
original input features are not very individually informative
❖ For example, each pixel in an image is not very informative
❖ It is the combination of pixels that tells us what objects are
present
❖ For a task such as document classification, each feature
(word count) is informative on its own, so extracting “higher
order” features is less important
Convolutional Neural Network
❖ Much of the work in neural networks has been motivated
by visual pattern recognition
❖ A form of MLP which is particularly well suited to 1D
signals like speech or text, or 2D signals like images, is the
convolutional neural network
❖ This is an MLP in which the hidden units have local
receptive fields (as in the primary visual cortex), and in
which the weights are tied or shared across the image, in
order to reduce the number of parameters
❖ Intuitively, the effect of such spatial parameter tying is that
any useful features that are “discovered” in some portion of
the image can be re-used everywhere else without having
to be independently learned
Convolutional Neural Network
❖ The resulting network then exhibits translation
invariance, meaning it can classify patterns no matter
where they occur inside the input image
❖ Figure shown below gives an example of a convolutional
network, designed by Simard and colleagues (2003)
Convolutional Neural Network
❖ It has 5 layers (4 layers of adjustable parameters)
designed to classify 29×29 gray-scale images of
handwritten digits from a dataset
❖ In layer 1, we have 6 feature maps each of which has size
13×13
❖ Each hidden node in one of these feature maps is
computed by convolving the image with a 5×5 weight
matrix (sometimes called a kernel), adding a bias, and
then passing the result through some form of nonlinearity
❖ There are 13×13×6 = 1014 neurons in Layer 1, and
(5×5+1)×6 = 156 weights (The "+1" is for the bias)
❖ If we did not share these parameters, there would be 1014
× 26 = 26,364 weights at the first layer
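The layer 1 counts above can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the stride of 2 with no padding is an assumption (the slide does not state it), chosen because it is what maps a 29×29 input to a 13×13 map with a 5×5 kernel.

```python
# Layer 1 of the network described above: 29x29 input, 5x5 kernels, 6 feature maps.
# ASSUMPTION: stride 2 and no padding, which is what yields 13x13 feature maps.
input_size, kernel_size, stride, num_maps = 29, 5, 2, 6

map_size = (input_size - kernel_size) // stride + 1   # side length of one feature map
neurons = map_size * map_size * num_maps              # 13 * 13 * 6
weights_per_kernel = kernel_size * kernel_size + 1    # 25 weights + 1 bias
shared_weights = weights_per_kernel * num_maps        # one kernel per feature map
unshared_weights = neurons * weights_per_kernel       # one private kernel per neuron

print(map_size, neurons, shared_weights, unshared_weights)
# prints: 13 1014 156 26364
```

Sharing one kernel per feature map cuts the first-layer parameter count from 26,364 to 156, a factor of 169 (the 13×13 positions of each map).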
Convolutional Neural Network
❖ In layer 2, we have 50 feature maps, each of which is
obtained by convolving each feature map in layer 1 with a
5×5 weight matrix, adding them up, adding a bias, and
passing through a nonlinearity
❖ There are 5×5×50 = 1250 neurons in Layer 2, (5×5+1)×6×50 = 7800 adjustable weights (one kernel for each pair of feature maps in layers 1 and 2), and 1250×26 = 32,500 connections
Convolutional Neural Network
❖ Layer 3 is fully connected to layer 2, and has 100 neurons
and 100×(1250+1)= 125,100 weights
❖ Finally, layer 4 is also fully connected, and has 10
neurons, and 10×(100+1)=1010 weights
❖ Adding the above numbers, there are a total of 3,215
neurons, 134,066 adjustable weights, and 184,974
connections
❖ This model is usually trained using stochastic gradient
descent
❖ A single pass over the data set is called an epoch
❖ When Mike O’Neill did these experiments in 2006, he
found that a single epoch took about 40 minutes
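The totals quoted above can be reproduced by summing the per-layer figures. The sketch below makes two assumptions that the slides do not state explicitly: the 29×29 = 841 input units are counted as neurons (this is what makes the total come to 3,215), and in the fully connected layers each weight counts as one connection. The dictionary layout is purely illustrative.

```python
# (neurons, adjustable weights, connections) per layer, using the figures in the text.
# ASSUMPTIONS: the input units count as neurons; in fully connected layers
# the number of connections equals the number of weights.
layers = {
    "input":   (29 * 29, 0, 0),
    "layer 1": (13 * 13 * 6, (5 * 5 + 1) * 6, 13 * 13 * 6 * 26),
    "layer 2": (5 * 5 * 50, (5 * 5 + 1) * 6 * 50, 1250 * 26),
    "layer 3": (100, 100 * (1250 + 1), 100 * (1250 + 1)),
    "layer 4": (10, 10 * (100 + 1), 10 * (100 + 1)),
}

total_neurons = sum(n for n, _, _ in layers.values())
total_weights = sum(w for _, w, _ in layers.values())
total_connections = sum(c for _, _, c in layers.values())

print(total_neurons, total_weights, total_connections)
# prints: 3215 134066 184974
```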
Convolutional Neural Network
❖ Since it took about 30 epochs for the error rate to
converge, the total training time was about 20 hours
❖ Using this technique, they obtained a misclassification rate
on the 10,000 test cases of about 1.40%
Why Convolutional Neural Network?
❖ A CNN is a type of neural network that enables machines to interpret visual input
❖ CNNs are used to analyze images and other visual data
❖ A typical application is object detection
Layers in CNN
Input Layer:
❖ The input layer of a CNN holds the image data
Kernel
❖ A kernel is a filter that is used to extract features from an image
❖ The kernel is a matrix that moves over the input data, takes the dot product with each sub-region of the input that it covers, and outputs the matrix of these dot products
❖ The kernel moves across the input data in steps given by the stride value
Input 5x5:
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1
Kernel 3x3:
0 1 2
2 2 0
0 1 2
Kernel
Input 5x5:
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1
Kernel 3x3:
0 1 2
2 2 0
0 1 2
Feature Map 3x3:
12 12 17
10 17 19
9 6 14
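The sliding dot product described on the Kernel slides can be sketched in plain Python using the 5x5 input and 3x3 kernel shown there. The function name `convolve2d_valid` is hypothetical, and the sketch assumes a square input, a stride of 1, no padding, and that the kernel is applied as drawn (not flipped).

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the matrix of dot products.
    ASSUMPTION: square image and kernel; kernel applied without flipping."""
    n, f = len(image), len(kernel)
    out_size = n - f + 1
    output = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            # Dot product of the kernel with the f x f window at (r, c).
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(f) for j in range(f))
            row.append(total)
        output.append(row)
    return output

image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
# prints:
# [12, 12, 17]
# [10, 17, 19]
# [9, 6, 14]
```

For instance, the top-left value is 0·3 + 1·3 + 2·2 + 2·0 + 2·0 + 0·1 + 0·3 + 1·1 + 2·2 = 12.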
Solved Example 1
Calculate the size of the feature map if the input image size is
5x5 and kernel is 3x3.
SOLUTION:
❖ Output=[size of input - size of kernel] + 1
❖ O=[I-k]+1
❖ O=[5-3]+1= 3, so the feature map is 3x3
❖ If input image has a size of nxn and filters size fxf and p is
the padding amount and s is the stride, then the dimension
of the feature map is given by:
❖ Dimension = floor[ ((n-f+2p)/s)+1] x floor[ ((n-f+2p)/s)+1]
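The dimension formula can be wrapped in a small helper. This is a sketch only: `feature_map_size` is a hypothetical name, and the parameters n, f, p and s follow the symbols used above.

```python
def feature_map_size(n, f, p=0, s=1):
    """Side length of the feature map for an n x n input, f x f filter,
    padding p and stride s: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1  # integer division performs the floor

print(feature_map_size(5, 3))        # 5x5 input, 3x3 kernel -> 3
print(feature_map_size(12, 3))       # 12x12 input, 3x3 filter -> 10
print(feature_map_size(6, 3, s=2))   # (6-3)/2 = 1.5, floor -> 1, plus 1 -> 2
```

The floor only matters when (n - f + 2p) is not divisible by s; in that case the last partial window is simply dropped.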
Solved Example 2
An input image has been converted into a matrix of size
12x12 and is convolved with a filter of size 3x3 with a stride of 1.
Determine the size of the convolved matrix.
SOLUTION:
❖ C = ((n-f+2p)/s)+1
❖ Here n = 12, f = 3, p = 0, s = 1
❖ C = ((12-3+0)/1)+1 = 10, so the convolved matrix is 10x10
Convolution
❖ Take filter/kernel (3×3 matrix) and apply it to the input image to get the
convolved feature
❖ This convolved feature is passed on to the next layer
Convolution (Case: R,G,B)
Stride
❖ The filter is moved across the image left to right, top to
bottom, with a one-pixel column change on the horizontal
movements, then a one-pixel row change on the vertical
movements
❖ The amount of movement between applications of the filter
to the input image is referred to as the stride, and it is
almost always symmetrical in height and width dimensions
❖ The default stride or strides in two dimensions is (1,1) for
the height and the width movement
Stride (Considering stride = 2)
Input 5x5:
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1
Kernel 3x3:
0 1 2
2 2 0
0 1 2
With stride = 2 the kernel is applied at four positions only (window starting at rows 1 and 3, columns 1 and 3)
Feature Map 2x2:
12 17
9 14
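The effect of the stride can be sketched by letting the window jump `stride` pixels between applications. The function name `convolve2d_strided` is hypothetical; the sketch assumes a square input, no padding, and the same 5x5 input and 3x3 kernel as on the slide.

```python
def convolve2d_strided(image, kernel, stride):
    """Apply the kernel every `stride` pixels in both directions (no padding).
    ASSUMPTION: square image and kernel; kernel applied without flipping."""
    n, f = len(image), len(kernel)
    out_size = (n - f) // stride + 1
    return [[sum(image[r * stride + i][c * stride + j] * kernel[i][j]
                 for i in range(f) for j in range(f))
             for c in range(out_size)]
            for r in range(out_size)]

image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]

stride2_map = convolve2d_strided(image, kernel, stride=2)
print(stride2_map)
# prints: [[12, 17], [9, 14]]
```

These four values are the corner entries of the stride-1 feature map, since a stride of 2 skips every other window position.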
Solved Example 3
Calculate the size of the feature map if the input image size is
5x5, kernel is 3x3 and stride=2.
SOLUTION:
❖ Output=[(size of input - size of kernel)/s] + 1
❖ O=[(I-k)/s]+1
❖ O=[(5-3)/2]+1= 2
Solved Example 4
Illustrate the use of stride 1 and stride 2 with input data
{1,3,3,2,1,0,1} and filter {1,1,1}
SOLUTION:
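❖ Stride 1: the filter is applied at every position, giving {7, 8, 6, 3, 2}
❖ Stride 2: the filter is applied at every second position, giving {7, 6, 2}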
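Solved Example 4 can be worked through with a one-dimensional version of the sliding dot product. The function name `convolve1d` is hypothetical, and the sketch assumes no padding.

```python
def convolve1d(signal, kernel, stride):
    """Slide the kernel along the signal, moving `stride` samples per step,
    and return the list of dot products (no padding)."""
    f = len(kernel)
    return [sum(signal[start + j] * kernel[j] for j in range(f))
            for start in range(0, len(signal) - f + 1, stride)]

signal = [1, 3, 3, 2, 1, 0, 1]
kernel = [1, 1, 1]

stride1 = convolve1d(signal, kernel, stride=1)
stride2 = convolve1d(signal, kernel, stride=2)
print(stride1)  # prints: [7, 8, 6, 3, 2]
print(stride2)  # prints: [7, 6, 2]
```

With a filter of all ones each output is the sum of three neighbouring inputs, for example 1 + 3 + 3 = 7 for the first window. The output lengths, 5 and 3, agree with [(7-3)/s]+1 for s = 1 and s = 2.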
Padding
❖ Padding adds, around the outside of the image, the extra pixels that the convolutional kernel needs in order to process the edge pixels; the added pixels can copy the values at the edge of the image or simply be zeros (zero padding)
❖ Padding fixes the border effect problem, in which edge pixels are covered by fewer kernel positions than interior pixels
Padding
Input 5x5 with zero padding (p = 1), giving 7x7:
0 0 0 0 0 0 0
0 3 3 2 1 0 0
0 0 0 1 3 1 0
0 3 1 2 2 3 0
0 2 0 0 2 2 0
0 2 0 0 0 1 0
0 0 0 0 0 0 0
Kernel 3x3:
0 1 2
2 2 0
0 1 2
First row of the feature map:
6 14 17 11 3
Padding
Padded input 7x7:
0 0 0 0 0 0 0
0 3 3 2 1 0 0
0 0 0 1 3 1 0
0 3 1 2 2 3 0
0 2 0 0 2 2 0
0 2 0 0 0 1 0
0 0 0 0 0 0 0
Kernel 3x3:
0 1 2
2 2 0
0 1 2
Feature Map 5x5:
6 14 17 11 3
14 12 12 17 11
8 10 17 19 13
11 9 6 14 12
6 4 4 6 4
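The padded convolution on this slide can be reproduced by surrounding the input with a border of zeros and then sliding the kernel as before. The helper names `zero_pad` and `convolve2d_valid` are hypothetical; the sketch assumes zero padding with p = 1, a stride of 1, and a square input.

```python
def zero_pad(image, p):
    """Return a copy of the image with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    padded += [[0] * p + list(row) + [0] * p for row in image]
    padded += [[0] * width for _ in range(p)]
    return padded

def convolve2d_valid(image, kernel):
    """Stride-1 sliding dot product without further padding (kernel not flipped)."""
    n, f = len(image), len(kernel)
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(f) for j in range(f))
             for c in range(n - f + 1)]
            for r in range(n - f + 1)]

image = [[3, 3, 2, 1, 0],
         [0, 0, 1, 3, 1],
         [3, 1, 2, 2, 3],
         [2, 0, 0, 2, 2],
         [2, 0, 0, 0, 1]]
kernel = [[0, 1, 2],
          [2, 2, 0],
          [0, 1, 2]]

padded_map = convolve2d_valid(zero_pad(image, 1), kernel)
for row in padded_map:
    print(row)
# prints:
# [6, 14, 17, 11, 3]
# [14, 12, 12, 17, 11]
# [8, 10, 17, 19, 13]
# [11, 9, 6, 14, 12]
# [6, 4, 4, 6, 4]
```

With p = 1 and a 3x3 kernel the output keeps the 5x5 size of the input, and its inner 3x3 block equals the feature map obtained without padding.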
Solved Example 5
Calculate the size of the feature map if the input image size is
5x5, kernel is 3x3, padding =1 and stride=1.
SOLUTION:
❖ Output=[(size of input - size of kernel+2P)/s] + 1
❖ O=[(I-k+2P)/s]+1
❖ O=[(5-3+2*1)/1]+1= 5, so the feature map is 5x5
Feature Map 5x5:
6 14 17 11 3
14 12 12 17 11
8 10 17 19 13
11 9 6 14 12
6 4 4 6 4
Solved Example 6
Calculate the size of the feature map if the input image size is
28x28, kernel is 5x5, padding =0 and stride=1.
SOLUTION:
❖ Output=[(size of input - size of kernel+2P)/s] + 1
❖ O=[(I-k+2P)/s]+1
❖ O=[(28-5+2*0)/1]+1= 24, so the feature map is 24x24
Pooling
❖ Pooling is required to down sample the detection of
features in feature maps
❖ Pooling layers provide an approach to down sampling
feature maps by summarizing the presence of features in
patches of the feature map
❖ Two common pooling methods are average pooling and
max pooling that summarize the average presence of a
feature and the most activated presence of a feature
respectively
Pooling
❖ Similar to the convolutional layer, the pooling layer is
responsible for reducing the spatial size of the convolved
feature
❖ This is to decrease the computational power required to
process the data by reducing the dimensions
❖ There are two types of pooling: average pooling and max
pooling
Pooling
❖ Max Pooling returns the maximum value from the portion of the image covered by the kernel
❖ Max Pooling also acts as a noise suppressant
❖ It discards the noisy activations altogether, so it performs de-noising along with dimensionality reduction
❖ Average Pooling returns the average of all the values from the portion of the image covered by the kernel
❖ Average Pooling suppresses noise only through the dimensionality reduction itself
Pooling : An example
Pooling (2x2)
Input feature map 5x5:
6 14 17 11 3
14 12 12 17 11
8 10 17 19 13
11 9 6 14 12
6 4 4 6 4
MAX POOLING [2X2]:
14 17
11 19
AVERAGE POOLING [2X2]:
11.5 14.25
9.5 14.0
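Both pooling variants can be sketched with one helper that takes the summarizing function as an argument. The name `pool2d` is hypothetical. The sketch assumes non-overlapping 2x2 windows (stride 2), so the fifth row and column of the 5x5 feature map are not covered by any window.

```python
def pool2d(feature_map, size, stride, reduce_fn):
    """Summarize each size x size window of the feature map with reduce_fn.
    ASSUMPTION: square feature map; windows that do not fit are dropped."""
    n = len(feature_map)
    out_size = (n - size) // stride + 1
    result = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        result.append(row)
    return result

feature_map = [[6, 14, 17, 11, 3],
               [14, 12, 12, 17, 11],
               [8, 10, 17, 19, 13],
               [11, 9, 6, 14, 12],
               [6, 4, 4, 6, 4]]

max_pooled = pool2d(feature_map, 2, 2, max)
avg_pooled = pool2d(feature_map, 2, 2, lambda w: sum(w) / len(w))
print(max_pooled)  # prints: [[14, 17], [11, 19]]
print(avg_pooled)  # prints: [[11.5, 14.25], [9.5, 14.0]]
```

The lower-left window contains 8, 10, 11 and 9, so its maximum is 11 and its average is 38/4 = 9.5.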
Flattening
❖ Once the pooled feature map is obtained, the next step is
to flatten it
❖ It involves transforming the entire pooled feature map
matrix into a single column which is then fed to the neural
network for processing
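Flattening amounts to reading the pooled feature map out into one long list. The sketch below uses a 2x2 max-pooled map as sample data and assumes row-by-row (row-major) order, which the slide does not specify; the name `flatten` is hypothetical.

```python
def flatten(feature_map):
    """Unroll a 2D feature map into a single vector, row by row.
    ASSUMPTION: row-major order."""
    return [value for row in feature_map for value in row]

pooled = [[14, 17],
          [11, 19]]

vector = flatten(pooled)
print(vector)       # prints: [14, 17, 11, 19]
print(len(vector))  # prints: 4
```

When a layer produces several pooled feature maps, each one is unrolled in the same way and the results are joined into one vector.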
Solved Example 7
A 3-channel input image of dimension 39x39 is given to the
network. Determine the dimension of the flattened vector that is
passed on to the fully connected layer of the architecture.
Consider the following hyper-parameters:
Layer-1: Filter Size – 3x3, Number of Filters – 10, Stride – 1, Padding – 0
Layer-2: Filter Size – 5x5, Number of Filters – 20, Stride – 2, Padding – 0
Layer-3: Filter Size – 5x5, Number of Filters – 40, Stride – 2, Padding – 0
SOLUTION:
❖ Here the input image of dimension 39x39x3 is convolved with 10 filters of size 3x3, using a stride of 1 and no padding
❖ C = ((n-f+2p)/s)+1=((39-3+0)/1)+1= 37
❖ After this operation, we will get an output of 37x37x10
Solved Example 7
❖ C = ((n-f+2p)/s)+1=((37-5+0)/2)+1= 17
❖ After these operations, we will get an output of 17x17x20
❖ C = ((n-f+2p)/s)+1=((17-5+0)/2)+1= 7
❖ After these operations, we will get an output of 7x7x40
❖ Then unroll them into a large vector of 7x7x40 = 1960 elements, and pass it to a
classifier that will make predictions
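The three layers of Solved Example 7 can be chained in a short loop that tracks the spatial size and the number of channels. The names `conv_output_size` and `layer_specs` are hypothetical; the sketch assumes the number of output channels of each layer equals its number of filters.

```python
def conv_output_size(n, f, p, s):
    """Spatial size after a convolution: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1

# (filter size, number of filters, stride, padding) for layers 1 to 3
layer_specs = [(3, 10, 1, 0), (5, 20, 2, 0), (5, 40, 2, 0)]

size, depth = 39, 3          # input image: 39x39 with 3 channels
shapes = []
for f, num_filters, s, p in layer_specs:
    size = conv_output_size(size, f, p, s)
    depth = num_filters      # one output channel per filter
    shapes.append((size, size, depth))
    print(f"{size}x{size}x{depth}")
# prints:
# 37x37x10
# 17x17x20
# 7x7x40

vector_length = size * size * depth
print(vector_length)  # prints: 1960
```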
Solved Example 8
Consider the following data and mask
Data 4x4:
1 2 3 4
5 7 8 9
10 11 12 11
8 6 4 3
Mask 2x2:
1 0
0 1
Result 3x3 (stride 1, no padding):
8 10 12
16 19 19
16 15 15
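Solved Example 8 can be checked by sliding the 2x2 mask over the 4x4 data. The sketch assumes a stride of 1 and no padding, which is consistent with the result rows shown on the slide; the function name `apply_mask` is hypothetical.

```python
def apply_mask(data, mask):
    """Slide the mask over the data (stride 1, no padding) and return
    the matrix of dot products. ASSUMPTION: square data and mask."""
    n, f = len(data), len(mask)
    return [[sum(data[r + i][c + j] * mask[i][j]
                 for i in range(f) for j in range(f))
             for c in range(n - f + 1)]
            for r in range(n - f + 1)]

data = [[1, 2, 3, 4],
        [5, 7, 8, 9],
        [10, 11, 12, 11],
        [8, 6, 4, 3]]
mask = [[1, 0],
        [0, 1]]

result = apply_mask(data, mask)
for row in result:
    print(row)
# prints:
# [8, 10, 12]
# [16, 19, 19]
# [16, 15, 15]
```

This mask adds each value to its lower-right diagonal neighbour, for example 1 + 7 = 8 in the top-left position. Because the mask is unchanged by a 180-degree rotation, flipping it (as in a strict mathematical convolution) would give the same result.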