Understanding of Convolutional Neural Network (CNN) - Deep Learning
CNN image classification takes an input image, processes it and classifies it under certain categories (e.g., dog, cat, tiger, lion). A computer sees an input image as an array of pixels whose size depends on the image resolution. Based on the resolution, it sees h x w x d (h = height, w = width, d = depth, i.e. the number of channels). For example, a 6 x 6 x 3 array represents an RGB image (the 3 refers to the red, green and blue channels), and a 4 x 4 x 1 array represents a grayscale image.
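As a minimal sketch, assuming NumPy is available, the two example shapes above can be held as arrays. The pixel values are all zero here, chosen only for illustration; a real image would be loaded from a file.

```python
import numpy as np

# A hypothetical 6 x 6 RGB image: height x width x depth (3 colour channels).
rgb_image = np.zeros((6, 6, 3), dtype=np.uint8)

# A hypothetical 4 x 4 grayscale image: a single channel, so depth is 1.
gray_image = np.zeros((4, 4, 1), dtype=np.uint8)

h, w, d = rgb_image.shape
print(h, w, d)            # 6 6 3
print(gray_image.shape)   # (4, 4, 1)
```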
Technically, to train and test deep learning CNN models, each input image is passed through a series of convolution layers with filters (kernels), pooling layers and fully connected (FC) layers, and a softmax function is then applied to classify the object with probability values between 0 and 1. The figure below shows the complete flow of a CNN that processes an input image and classifies the objects based on these values.
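The softmax step at the end of that flow can be sketched as follows, assuming NumPy; the class names and raw scores are hypothetical stand-ins for what the last layer of a trained network would produce.

```python
import numpy as np

def softmax(scores):
    """Turn raw class scores into probabilities between 0 and 1 that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow in exp().
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for the classes dog, cat, tiger, lion.
classes = ["dog", "cat", "tiger", "lion"]
scores = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(scores)

for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
print("predicted:", classes[int(np.argmax(probs))])  # predicted: dog
```

The predicted class is simply the one with the highest probability.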
Convolution Layer
Convolution is the first layer to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs: an image matrix and a filter (or kernel).
Consider a 5 x 5 image whose pixel values are 0 or 1, and a 3 x 3 filter matrix, as shown below.
Figure 4: Image matrix multiplies kernel or filter matrix
The convolution of the 5 x 5 image matrix with the 3 x 3 filter matrix produces an output called a “Feature Map” (3 x 3 here, with a stride of 1 and no padding), as shown below.
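A minimal sketch of this operation, assuming NumPy: the filter slides over the image one pixel at a time, and at each position the element-wise products are summed into one cell of the feature map. The image and filter values below are illustrative assumptions, since the original figure's numbers are not reproduced in the text.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the
    element-wise products at each position. Like most CNN libraries, this
    does not flip the kernel (strictly, it is cross-correlation)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    feature_map = np.zeros((oh, ow), dtype=image.dtype)
    for i in range(oh):
        for j in range(ow):
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

# Illustrative 5 x 5 binary image and 3 x 3 filter (assumed values).
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

print(convolve2d(image, kernel))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

The nested loops are written for readability; deep learning frameworks compute the same result with much faster vectorised routines.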
Convolving an image with different filters can perform operations such as edge detection, blurring and sharpening. The example below shows the convolved images obtained after applying different types of filters (kernels).
Figure 7: Some common filters
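To see how the choice of filter changes the result, here is a sketch, assuming NumPy, that applies three commonly used 3 x 3 kernels to a hypothetical image with a dark left half and a bright right half. These kernels are standard textbook examples and are not necessarily the exact ones shown in Figure 7.

```python
import numpy as np

def convolve2d(image, kernel):
    """Stride-1, no-padding convolution (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Commonly used 3 x 3 kernels (assumed; not necessarily those in Figure 7).
kernels = {
    "edge": np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float),
    "sharpen": np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
}

# Hypothetical 5 x 5 grayscale image: dark left half, bright right half.
image = np.array([[0, 0, 0, 10, 10]] * 5, dtype=float)

for name, k in kernels.items():
    print(name)
    print(convolve2d(image, k))
```

The edge kernel returns 0 over the flat dark region and non-zero values only where the brightness changes, while the box blur averages each 3 x 3 neighbourhood and so smooths the jump.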
Strides
Stride is the number of pixels by which the filter shifts over the input matrix. When the stride is 1, we move the filter 1 pixel at a time; when the stride is 2, we move it 2 pixels at a time, and so on. The figure below shows how convolution works with a stride of 2.
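For an n x n input, an f x f filter and a stride s (no padding), the output is floor((n - f) / s) + 1 pixels wide. The sketch below, assuming NumPy and a hypothetical 7 x 7 input, checks this for strides 1 and 2.

```python
import numpy as np

def convolve2d_strided(image, kernel, stride=1):
    """No-padding convolution where the filter moves `stride` pixels per step."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride  # top-left corner of the current window
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical 7 x 7 input (values 0..48) and a 3 x 3 filter of ones.
image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))

print(convolve2d_strided(image, kernel, stride=1).shape)  # (5, 5)
print(convolve2d_strided(image, kernel, stride=2).shape)  # (3, 3)
```

A larger stride skips positions, so the feature map shrinks and less computation is needed.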
Padding
Sometimes the filter does not fit the input image perfectly. We have two options:
Pad the picture with zeros (zero-padding) so that the filter fits.
Drop the part of the image where the filter did not fit. This is called valid padding, which keeps only the valid part of the image.
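The two options can be compared with a short sketch, assuming NumPy and a hypothetical 5 x 5 input. With zero-padding of p pixels on each side, the output width becomes floor((n + 2p - f) / s) + 1, so p = 1 keeps a 3 x 3 filter's output at the input size; valid padding (p = 0) shrinks it.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """No-padding ('valid') convolution with the given stride."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.ones((5, 5))   # hypothetical 5 x 5 input
kernel = np.ones((3, 3))  # hypothetical 3 x 3 filter

# Option 1: zero-padding. A border of p = 1 zeros lets a 3 x 3 filter
# visit every pixel, so the output keeps the 5 x 5 size.
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
same_out = convolve2d(padded, kernel)

# Option 2: valid padding. No border is added, so positions where the
# filter would hang over the edge are dropped and the output shrinks.
valid_out = convolve2d(image, kernel)

print(padded.shape, same_out.shape)  # (7, 7) (5, 5)
print(valid_out.shape)               # (3, 3)
```

With zero-padding the corner outputs are smaller (only 4 real pixels fall under the filter), which is the price paid for keeping the border.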
Non-Linearity (ReLU)
ReLU stands for Rectified Linear Unit, a non-linear operation. Its output is f(x) = max(0, x).
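Why ReLU is important: ReLU’s purpose is to introduce non-linearity into our ConvNet, since most of the real-world data we want our ConvNet to learn is non-linear. Convolution itself is a linear operation, so a non-linear function is needed; ReLU provides it by replacing every negative value in the feature map with zero.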
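A minimal sketch of ReLU applied to a hypothetical feature map, assuming NumPy. The second part illustrates why a non-linearity is needed at all: without one, two stacked linear layers (random matrices here, purely for illustration) collapse into a single linear layer.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: f(x) = max(0, x)."""
    return np.maximum(0, x)

# Hypothetical feature map containing negative and positive values.
feature_map = np.array([
    [-3.0, 1.5, 0.0],
    [2.0, -0.5, 4.0],
    [-1.0, 3.0, -2.0],
])
rectified = relu(feature_map)
print(rectified)

# Without a non-linearity, two stacked linear layers collapse into one:
rng = np.random.default_rng(0)
W1, W2, x = rng.normal(size=(4, 3)), rng.normal(size=(2, 4)), rng.normal(size=3)
print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))  # True
```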
Other non-linear functions, such as tanh or sigmoid, can also be used instead of ReLU. Most data scientists use ReLU because it usually performs better than the other two in practice: it is cheaper to compute and does not saturate for positive inputs.
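One way to see the saturation point is to compare the derivatives that backpropagation uses. This sketch, assuming NumPy and a few hypothetical input values, shows the sigmoid and tanh gradients shrinking towards zero for large inputs while the ReLU gradient stays at 1 for every positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Derivatives of each activation, used during backpropagation.
def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2

def d_relu(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, -1.0, 0.5, 10.0])
print("sigmoid'", d_sigmoid(x))
print("tanh'   ", d_tanh(x))
print("relu'   ", d_relu(x))
```

This is only one factor; which activation works best still depends on the network and data.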
Pooling Layer
Pooling layers reduce the spatial size of the feature maps, and with it the number of parameters and the amount of computation in later layers, when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map but retains the important information. Spatial pooling can be of different types:
Max Pooling
Average Pooling
Sum Pooling
Max pooling takes the largest element from each window of the rectified feature map. Average pooling instead takes the average of the elements in each window, and sum pooling takes their sum.
Figure 8: Max Pooling
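The three pooling types differ only in how each window is reduced. A sketch, assuming NumPy, a 2 x 2 window, a stride of 2 and a hypothetical 4 x 4 rectified feature map:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size window over the map and reduce each window
    with max, average or sum."""
    reducers = {"max": np.max, "average": np.mean, "sum": np.sum}
    reduce_window = reducers[mode]
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = reduce_window(feature_map[r:r + size, c:c + size])
    return out

# Hypothetical 4 x 4 rectified feature map.
fm = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
], dtype=float)

for mode in ("max", "average", "sum"):
    print(mode, pool2d(fm, mode=mode).tolist())
# max [[6.0, 8.0], [3.0, 4.0]]
# average [[3.25, 5.25], [2.0, 2.0]]
# sum [[13.0, 21.0], [8.0, 8.0]]
```

Each 4 x 4 map becomes 2 x 2, a four-fold reduction in the number of values passed to the next layer.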
Fully Connected Layer
In the layer we call the FC layer, we flatten our matrix into a vector and feed it into a fully connected layer, as in a regular neural network.
In the diagram above, the feature map matrix is converted into a vector (x1, x2, x3, …). With the fully connected layers, we combine these features together to create a model. Finally, an activation function such as softmax or sigmoid classifies the outputs as cat, dog, car, truck, etc.
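A minimal sketch of flattening followed by one fully connected layer and softmax, assuming NumPy. The pooled feature maps, weights and class list are hypothetical; random weights stand in for the values a trained network would have learned.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical input: 2 pooled feature maps of size 2 x 2 (depth x height x width).
pooled = rng.random((2, 2, 2))

# Flatten into a vector (x1, x2, x3, ...).
x = pooled.reshape(-1)  # shape (8,)

# Fully connected layer: every input value is connected to every output class.
# Random weights stand in for trained ones.
classes = ["cat", "dog", "car", "truck"]
W = rng.normal(size=(len(classes), x.size))
b = np.zeros(len(classes))
scores = W @ x + b

# Softmax turns the scores into class probabilities.
exps = np.exp(scores - scores.max())
probs = exps / exps.sum()

print(x.shape, scores.shape)  # (8,) (4,)
print(classes[int(np.argmax(probs))], probs.round(3))
```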
Figure 10: Complete CNN architecture
Summary
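Provide the input image to the convolution layer.
Choose the parameters and apply filters with strides, and padding if required. Perform convolution on the image and apply ReLU activation to the resulting matrix.
Perform pooling to reduce the dimensionality.
Flatten the output and feed it into a fully connected layer (FC Layer).
Output the class using an activation function (logistic regression with cost functions) and classify the images.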
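Putting the steps together, the sketch below runs one forward pass (convolution, ReLU, max pooling, flatten, fully connected layer, softmax) on a hypothetical 8 x 8 grayscale image, assuming NumPy. The two filters and the FC weights are random and untrained, so the resulting probabilities are meaningless; the point is the flow of shapes. Training (learning the filters and weights through a cost function and backpropagation) is not shown.

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=0):
    """Zero-pad the image, then slide the kernel with the given stride."""
    image = np.pad(image, pad, mode="constant")
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def relu(x):
    return np.maximum(0, x)

def max_pool(fm, size=2, stride=2):
    oh = (fm.shape[0] - size) // stride + 1
    ow = (fm.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            r, c = i * stride, j * stride
            out[i, j] = fm[r:r + size, c:c + size].max()
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(image, kernels, W, b):
    """Convolution + ReLU + max pooling per filter, then flatten, FC, softmax."""
    maps = [max_pool(relu(conv2d(image, k, stride=1, pad=1))) for k in kernels]
    x = np.concatenate([m.reshape(-1) for m in maps])  # flatten
    return softmax(W @ x + b)                          # FC layer + softmax

rng = np.random.default_rng(0)
image = rng.random((8, 8))                             # hypothetical 8 x 8 image
kernels = [rng.normal(size=(3, 3)) for _ in range(2)]  # two untrained 3 x 3 filters
# 8x8 -> conv (pad 1) 8x8 -> pool 4x4; two maps give 32 features; 3 classes.
W = rng.normal(size=(3, 32))
b = np.zeros(3)

probs = forward(image, kernels, W, b)
print(probs, probs.sum())
```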
In the next post, I would like to talk about some popular CNN architectures such as
AlexNet, VGGNet, GoogLeNet and ResNet.