13-Convolutional Neural Network-09-08-2024

The document provides an overview of Convolutional Neural Networks (CNNs), detailing their architecture, including layers such as convolutional, ReLU, pooling, and fully connected layers. It explains the significance of feature extraction, the role of kernels, and the importance of parameters like stride and padding in processing image data. Additionally, it includes examples and calculations related to feature map sizes and the convolution process.


BEEE410L MACHINE LEARNING

Dr.S.ALBERT ALEXANDER
SCHOOL OF ELECTRICAL ENGINEERING
[email protected]

Module 2
Artificial Neural Networks
❖ Perceptron Learning Algorithm

❖ Multi-layer Perceptron, Feed-forward Network, Feedback Network
❖ Backpropagation Algorithm

❖ Convolutional Neural Network (CNN)

❖ Recurrent Neural Network (RNN)

2.4 Convolutional Neural Network
❖ The purpose of the hidden units is to learn non-linear
combinations of the original inputs; this is called feature
extraction or feature construction
❖ These hidden features are then passed as input to the final
generalized linear model
❖ This approach is particularly useful for problems where the
original input features are not individually very informative
❖ For example, each pixel in an image is not very informative
❖ It is the combination of pixels that tells us what objects are
present
❖ For a task such as document classification, each feature
(word count) is informative on its own, so extracting “higher
order” features is less important
Convolutional Neural Network
❖ Much of the work in neural networks has been motivated
by visual pattern recognition
❖ A form of MLP which is particularly well suited to 1D
signals like speech or text, or 2D signals like images, is the
convolutional neural network
❖ This is an MLP in which the hidden units have local
receptive fields (as in the primary visual cortex), and in
which the weights are tied or shared across the image, in
order to reduce the number of parameters
❖ Intuitively, the effect of such spatial parameter tying is that
any useful features that are “discovered” in some portion of
the image can be re-used everywhere else without having
to be independently learned
Convolutional Neural Network
❖ The resulting network then exhibits translation
invariance, meaning it can classify patterns no matter
where they occur inside the input image
❖ Figure shown below gives an example of a convolutional
network, designed by Simard and colleagues (2003)

Convolutional Neural Network
❖ It has 5 layers (4 layers of adjustable parameters) designed to
classify 29×29 gray-scale images of handwritten digits from the
MNIST dataset
❖ In layer 1, we have 6 feature maps each of which has size
13×13
❖ Each hidden node in one of these feature maps is
computed by convolving the image with a 5×5 weight
matrix (sometimes called a kernel), adding a bias, and
then passing the result through some form of nonlinearity
❖ There are 13×13×6 = 1014 neurons in Layer 1, and
(5×5+1)×6 = 156 weights (The "+1" is for the bias)
❖ If we did not share these parameters, there would be 1014
× 26 = 26,364 weights at the first layer
Convolutional Neural Network
❖ In layer 2, we have 50 feature maps, each of which is
obtained by convolving each feature map in layer 1 with a
5×5 weight matrix, adding them up, adding a bias, and
passing through a nonlinearity
❖ There are 5×5×50 = 1250 neurons in Layer 2, (5×5+1)×6×
50 = 7800 adjustable weights (one kernel for each pair of
feature maps in layers 1 and 2)
❖ 1250×26 = 32,500 connections

Convolutional Neural Network
❖ Layer 3 is fully connected to layer 2, and has 100 neurons
and 100×(1250+1)= 125,100 weights
❖ Finally, layer 4 is also fully connected, and has 10
neurons, and 10×(100+1)=1010 weights
❖ Adding the above numbers, there are a total of 3,215
neurons, 134,066 adjustable weights, and 184,974
connections
❖ This model is usually trained using stochastic gradient
descent
❖ A single pass over the data set is called an epoch
❖ When Mike O’Neill did these experiments in 2006, he
found that a single epoch took about 40 minutes

Convolutional Neural Network
❖ Since it took about 30 epochs for the error rate to
converge, the total training time was about 20 hours
❖ Using this technique, they obtained a misclassification rate
on the 10,000 test cases of about 1.40%
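As a quick check on the arithmetic in the preceding slides, here is a small Python sketch (not part of the original slides; note that the quoted total of 3,215 neurons includes the 29×29 input pixels):

```python
# Reproduce the neuron, weight, and connection counts quoted on the slides.
input_neurons = 29 * 29                         # 841 input pixels (counted as a layer)
l1_neurons = 13 * 13 * 6                        # 1014
l1_weights = (5 * 5 + 1) * 6                    # 156 shared weights (the +1 is the bias)
l1_connections = l1_neurons * (5 * 5 + 1)       # 26,364 weights if none were shared
l2_neurons = 5 * 5 * 50                         # 1250
l2_weights = (5 * 5 + 1) * 6 * 50               # 7800 (one kernel per pair of feature maps)
l2_connections = l2_neurons * (5 * 5 + 1)       # 32,500
l3_neurons, l3_weights = 100, 100 * (1250 + 1)  # fully connected: 125,100
l4_neurons, l4_weights = 10, 10 * (100 + 1)     # fully connected: 1010

print(input_neurons + l1_neurons + l2_neurons + l3_neurons + l4_neurons)  # 3215
print(l1_weights + l2_weights + l3_weights + l4_weights)                  # 134066
print(l1_connections + l2_connections + l3_weights + l4_weights)          # 184974
```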

Why Convolutional Neural Network?
❖ A CNN is another type of neural network that can be used to enable machines to visualize things
❖ CNNs are used to perform analysis on images and visuals
❖ These classes of neural networks can take a multi-channel image as input and work on it easily with minimal preprocessing required
CNNs are widely used in:
❖ Image recognition and image classification
❖ Object detection
❖ Recognition of faces, etc.
Therefore, a CNN takes an image as an input, processes it, and classifies it under certain categories.
Framework

Layers in CNN

Input Layer:
❖ The input layer in CNN should contain image data

❖ Image data is represented by a three-dimensional matrix

❖ We have to reshape the image into a single column

➢ For example: suppose we have an image of dimension 28 x 28 = 784 pixels; we need to convert it into a 784 x 1 column before feeding it into the input (see the sketch below)
➢ If we have "k" training examples in the dataset, then the dimension of the input will be (784, k)
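A minimal sketch of this reshaping step; NumPy and the variable names are illustrative assumptions, not part of the slides:

```python
import numpy as np

image = np.random.rand(28, 28)      # one 28x28 gray-scale image
column = image.reshape(784, 1)      # 28 * 28 = 784 pixels as a single column
print(column.shape)                 # (784, 1)

k = 100                             # k training examples
batch = np.random.rand(k, 28, 28)
inputs = batch.reshape(k, 784).T    # input dimension (784, k), as on the slide
print(inputs.shape)                 # (784, 100)
```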
Layers in CNN
❖ Convolutional Layer: This layer performs the convolution operation, sliding several small windows (filters) over the input data
❖ ReLU Layer: A Rectified Linear Unit (ReLU) layer applies a non-linear activation function element-wise to its input
❖ This layer introduces non-linearity to the network and converts all negative pixel values to zero
❖ The final output is a rectified feature map
❖ Pooling Layer: Pooling is a down-sampling operation that
reduces the dimensionality of the feature map
❖ Fully Connected Layer: This layer identifies and classifies
the objects in the image
Layers in CNN
❖ Softmax / Logistic Layer: The softmax or Logistic layer is
the last layer of CNN
❖ It resides at the end of the fully connected layer
❖ Logistic is used for binary classification problems
❖ Softmax is used for multi-class classification problems
❖ Output Layer: This layer contains the label in the form of a one-hot encoded vector (see the sketch below)
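A minimal sketch of the layer sequence listed on the last two slides, written in PyTorch purely for illustration (the slides do not name a framework, and the filter counts, sizes, and 28x28 input below are assumed example values):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5),  # convolutional layer
    nn.ReLU(),                                                # ReLU layer
    nn.MaxPool2d(kernel_size=2),                              # pooling layer
    nn.Flatten(),                                             # flattening
    nn.Linear(6 * 12 * 12, 10),                               # fully connected layer
    nn.Softmax(dim=1),                                        # softmax layer (multi-class output)
)

x = torch.randn(1, 1, 28, 28)   # one 28x28 gray-scale image
print(model(x).shape)           # torch.Size([1, 10])
```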

Kernel
❖ Kernel is nothing but a filter that is used to extract the
features from the images
❖ The kernel is a matrix that moves over the input data,
performs the dot product with the sub-region of input data,
and gets the output as the matrix of dot products
❖ The kernel moves over the input data in steps given by the stride value

Input (5x5):
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

Kernel (3x3):
0 1 2
2 2 0
0 1 2

Kernel
Sliding the 3x3 kernel over the top-left 3x3 window of the 5x5 input:

= (3x0) + (3x1) + (2x2) + (0x2) + (0x2) + (1x0) + (3x0) + (1x1) + (2x2)
= 0 + 3 + 4 + 0 + 0 + 0 + 0 + 1 + 4 = 12

Repeating this for every 3x3 window (stride 1) gives the feature map (reproduced in the sketch below):

12 12 17
10 17 19
9  6  14

Feature Map (3x3)
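The same sliding-window computation can be written as a short NumPy sketch (NumPy is an assumption; the slide shows only the hand calculation):

```python
import numpy as np

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

feature_map = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        # element-wise product of the kernel with the 3x3 window, then sum
        feature_map[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(feature_map)   # [[12 12 17] [10 17 19] [ 9  6 14]]
```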

Solved Example 1
Calculate the size of the feature map if the input image size is
5x5 and kernel is 3x3.
SOLUTION:
❖ Output=[size of input - size of kernel] + 1

❖ O=[I-k]+1
❖ O=[5-3]+1= 3

❖ If input image has a size of nxn and filters size fxf and p is
the padding amount and s is the stride, then the dimension
of the feature map is given by:
❖ Dimension = floor[ ((n-f+2p)/s)+1] x floor[ ((n-f+2p)/s)+1]
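The formula can be wrapped in a small helper function; the sketch below (Python, not part of the slides) checks it against the solved examples that follow in this section:

```python
import math

def feature_map_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial size of the output feature map along one dimension."""
    return math.floor((n - f + 2 * p) / s) + 1

print(feature_map_size(5, 3))             # Solved Example 1: 3
print(feature_map_size(12, 3, p=0, s=1))  # Solved Example 2: 10
print(feature_map_size(5, 3, p=0, s=2))   # Solved Example 3: 2
print(feature_map_size(5, 3, p=1, s=1))   # Solved Example 5: 5
print(feature_map_size(28, 5, p=0, s=1))  # Solved Example 6: 24
```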
Solved Example 2
An input image has been converted into a matrix of size
12x12 along with a filter of size 3x3 with a stride of 1.
Determine the size of the convoluted matrix.
SOLUTION:
❖ C = ((n-f+2p)/s)+1

❖ C is the size of the convoluted matrix

❖ n is the size of the input matrix

❖ f is the size of the filter matrix

❖ p is the padding amount.

❖ s is the stride applied.

❖ Here n = 12, f = 3, p = 0, s = 1
❖ C = ((12-3+2*0)/1)+1 = 10
❖ Therefore the size of the convoluted matrix is 10x10

Convolution

❖ Take filter/kernel (3×3 matrix) and apply it to the input image to get the
convolved feature
❖ This convolved feature is passed on to the next layer
Convolution (Case: R,G,B)

Stride
❖ The filter is moved across the image left to right, top to
bottom, with a one-pixel column change on the horizontal
movements, then a one-pixel row change on the vertical
movements
❖ The amount of movement between applications of the filter
to the input image is referred to as the stride, and it is
almost always symmetrical in height and width dimensions
❖ The default stride or strides in two dimensions is (1,1) for
the height and the width movement

Stride (Considering stride = 2)
Input (5x5):
3 3 2 1 0
0 0 1 3 1
3 1 2 2 3
2 0 0 2 2
2 0 0 0 1

Kernel (3x3):
0 1 2
2 2 0
0 1 2

Moving the kernel two pixels at a time, only four windows are visited, giving a 2x2 feature map:

12 17
9  14
Solved Example 3
Calculate the size of the feature map if the input image size is
5x5, kernel is 3x3 and stride=2.
SOLUTION:
❖ Output=[(size of input - size of kernel)/s] + 1

❖ O=[(I-k)/s]+1
❖ O=[(5-3)/2]+1= 2

Solved Example 4
Illustrate the use of stride 1 and stride 2 with input data
{1,3,3,2,1,0,1} and filter {1,1,1}
SOLUTION:
❖ Stride 1: output = {1+3+3, 3+3+2, 3+2+1, 2+1+0, 1+0+1} = {7, 8, 6, 3, 2}
❖ Stride 2: output = {1+3+3, 3+2+1, 1+0+1} = {7, 6, 2} (see the sketch below)
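A short NumPy sketch of this 1-D strided convolution (NumPy is assumed for illustration):

```python
import numpy as np

x = np.array([1, 3, 3, 2, 1, 0, 1])
w = np.array([1, 1, 1])

def conv1d(signal, kernel, stride):
    """Slide the kernel over the signal, taking stride-sized steps."""
    return [int(np.dot(signal[i:i + len(kernel)], kernel))
            for i in range(0, len(signal) - len(kernel) + 1, stride)]

print(conv1d(x, w, stride=1))  # [7, 8, 6, 3, 2]
print(conv1d(x, w, stride=2))  # [7, 6, 2]
```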

Padding
❖ Padding adds the extra pixels that the convolutional kernel needs to process the edge pixels onto the outside of the image, either as zeros (zero padding, used in the example below) or by copying the pixels from the edge of the image
❖ Padding fixes the border-effect problem

Padding (zero padding, p = 1)
Adding one border of zeros around the 5x5 input gives a 7x7 padded input:

0 0 0 0 0 0 0
0 3 3 2 1 0 0
0 0 0 1 3 1 0
0 3 1 2 2 3 0
0 2 0 0 2 2 0
0 2 0 0 0 1 0
0 0 0 0 0 0 0

Sliding the 3x3 kernel over the padded input with stride 1, the first row of the output is:

6 14 17 11 3

Padding
Convolving the full 7x7 zero-padded input with the same 3x3 kernel (stride 1) gives a 5x5 feature map:

6  14 17 11 3
14 12 12 17 11
8  10 17 19 13
11 9  6  14 12
6  4  4  6  4
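A NumPy sketch of zero padding followed by a stride-1 convolution, reproducing the 5x5 map above (NumPy is assumed; np.pad adds the border of zeros):

```python
import numpy as np

image = np.array([[3, 3, 2, 1, 0],
                  [0, 0, 1, 3, 1],
                  [3, 1, 2, 2, 3],
                  [2, 0, 0, 2, 2],
                  [2, 0, 0, 0, 1]])
kernel = np.array([[0, 1, 2],
                   [2, 2, 0],
                   [0, 1, 2]])

padded = np.pad(image, pad_width=1, mode='constant', constant_values=0)  # 7x7

out = np.zeros((5, 5), dtype=int)
for i in range(5):
    for j in range(5):
        out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)

print(out)
# [[ 6 14 17 11  3]
#  [14 12 12 17 11]
#  [ 8 10 17 19 13]
#  [11  9  6 14 12]
#  [ 6  4  4  6  4]]
```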

Solved Example 5
Calculate the size of the feature map if the input image size is
5x5, kernel is 3x3, padding =1 and stride=1.
SOLUTION:
❖ Output=[(size of input - size of kernel+2P)/s] + 1

❖ O=[(I-k+2P)/s]+1
❖ O=[(5-3+2*1)/1]+1 = 5, so the feature map is 5x5:

6  14 17 11 3
14 12 12 17 11
8  10 17 19 13
11 9  6  14 12
6  4  4  6  4

Solved Example 6
Calculate the size of the feature map if the input image size is
28x28, kernel is 5x5, padding =0 and stride=1.
SOLUTION:
❖ Output=[(size of input - size of kernel+2P)/s] + 1

❖ O=[(I-k+2P)/s]+1
❖ O=[(28-5+2*0)/1]+1 = 24, so the feature map is 24x24

Pooling
❖ Pooling is required to down sample the detection of
features in feature maps
❖ Pooling layers provide an approach to down sampling
feature maps by summarizing the presence of features in
patches of the feature map
❖ Two common pooling methods are average pooling and
max pooling that summarize the average presence of a
feature and the most activated presence of a feature
respectively

Pooling
❖ Similar to the convolutional layer, the pooling layer is
responsible for reducing the spatial size of the convolved
feature
❖ This is to decrease the computational power required to
process the data by reducing the dimensions
❖ There are two types of pooling: average pooling and max pooling

Pooling
❖ In Max Pooling, we take the maximum pixel value from the portion of the image covered by the kernel
❖ Max Pooling also performs as a noise suppressant
❖ It discards the noisy activations altogether, performing de-noising along with dimensionality reduction
❖ Average Pooling returns the average of all the values from the portion of the image covered by the kernel
❖ Average Pooling simply performs dimensionality reduction as a noise-suppressing mechanism
❖ In practice, Max Pooling usually performs better than Average Pooling

Pooling: An example

Pooling (2x2)
Feature map (5x5):

6  14 17 11 3
14 12 12 17 11
8  10 17 19 13
11 9  6  14 12
6  4  4  6  4

MAX POOLING [2x2]:

14 17
11 19

AVERAGE POOLING [2x2]:

11.5 14.25
9.5  14.0
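A NumPy sketch of both pooling operations on this feature map (NumPy is assumed; the reshape groups the top-left 4x4 region into four 2x2 patches, dropping the last row and column as on the slide):

```python
import numpy as np

fmap = np.array([[ 6, 14, 17, 11,  3],
                 [14, 12, 12, 17, 11],
                 [ 8, 10, 17, 19, 13],
                 [11,  9,  6, 14, 12],
                 [ 6,  4,  4,  6,  4]])

# Group the 4x4 top-left region into a 2x2 grid of 2x2 patches
patches = fmap[:4, :4].reshape(2, 2, 2, 2).swapaxes(1, 2)

print(patches.max(axis=(2, 3)))   # max pooling:     [[14 17] [11 19]]
print(patches.mean(axis=(2, 3)))  # average pooling: [[11.5  14.25] [ 9.5  14.  ]]
```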

Flattening
❖ Once the pooled featured map is obtained, the next step is
to flatten it
❖ It involves transforming the entire pooled feature map
matrix into a single column which is then fed to the neural
network for processing

Solved Example 7
If we give the network a 3-D input image of dimension 39x39x3, determine the dimension of the vector that is passed to the fully connected layer of the architecture. Consider the hyper-parameters as follows:
Layer-1: Filter Size – 3x3, Number of Filters – 10, Stride – 1, Padding – 0
Layer-2: Filter Size – 5x5, Number of Filters – 20, Stride – 2, Padding – 0
Layer-3: Filter Size – 5x5, Number of Filters – 40, Stride – 2, Padding – 0
SOLUTION:
❖ Here the input image of dimension 39x39x3 is convolved with 10 filters of size 3x3, with a stride of 1 and no padding
❖ C = ((n-f+2p)/s)+1=((39-3+0)/1)+1= 37

❖ After these operations, we will get an output of 37x37x10

Solved Example 7

❖ C = ((n-f+2p)/s)+1=((37-5+0)/2)+1= 17
❖ After these operations, we will get an output of 17x17x20
❖ C = ((n-f+2p)/s)+1=((17-5+0)/2)+1= 7
❖ After these operations, we will get an output of 7x7x40
❖ Then unroll the 7x7x40 = 1,960 values into a large vector and pass it to a classifier that will make predictions (see the sketch below)
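A small Python sketch (not part of the slides) that walks through these layer sizes:

```python
def out_size(n, f, p, s):
    """Feature map size: ((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1

n, channels = 39, 3
for f, k, s, p in [(3, 10, 1, 0), (5, 20, 2, 0), (5, 40, 2, 0)]:
    n, channels = out_size(n, f, p, s), k
    print(f"{n}x{n}x{channels}")   # 37x37x10, then 17x17x20, then 7x7x40

print(n * n * channels)            # 1960 values in the unrolled vector
```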
Solved Example 8
Consider the following data

1  2  3  4
5  7  8  9
10 11 12 11
8  6  4  3

and the 2x2 mask:

1 0
0 1

Show the results of the convolution operation.


SOLUTION:
8  10 12
16 19 19
16 15 15
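A NumPy sketch verifying this result (NumPy is assumed for illustration):

```python
import numpy as np

data = np.array([[ 1,  2,  3,  4],
                 [ 5,  7,  8,  9],
                 [10, 11, 12, 11],
                 [ 8,  6,  4,  3]])
mask = np.array([[1, 0],
                 [0, 1]])

result = np.array([[np.sum(data[i:i + 2, j:j + 2] * mask) for j in range(3)]
                   for i in range(3)])
print(result)   # [[ 8 10 12] [16 19 19] [16 15 15]]
```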

