Machine Learning - Introduction to Convolutional Neural Networks
The document provides an overview of convolutional neural networks (CNNs) and their application in image classification, detailing their historical background and structure. It explains the process of convolution, feature detection, pooling, and the significance of flattening in preparing image data for neural networks. Key concepts such as strides, feature maps, and different pooling techniques (max and mean pooling) are also discussed to illustrate how CNNs efficiently process and classify images.
Background
• 1988 – Invented by Yann LeCun at AT&T Bell Laboratories.
• Used for Image Classification.
• Acts as a preprocessing front-end that prepares image data for a neural network efficiently.
• Consists of:
• Convolution
• Max Pooling
• Flattening
Convolutional Neural Network (CNN)
Convolution is a front-end to a Neural Network for Image Classification
[Figure: CNN pipeline. Image Input feeds a Convolutional Front-End, which preprocesses images into a vector of real values. That vector enters the Input Layer (x1 … xn); the Hidden Layer turns recognition of the real-value inputs into a classification of the image inputs; the Output Layer (z1 … zk) feeds a Softmax, which squashes the output into a set of classification probabilities; the result is categorical outputs (e.g., Cat, Dog) and probabilities.]
Image Data
• Consists of pixel values.
• Pixel values are in a grid layout (2D array).
• One layer (grid) per color.
• BW images use pixel values 0 (black) and 1 (white).
• Grayscale and color (RGB) images use values 0 .. 255.
[Figure: BW image, 4 x 4 pixels; pixel = 0 (black), pixel = 1 (white). Grayscale image, 4 x 4 pixels; pixel = 0 (black), pixel = 255 (white).]
Color Image Data
Color (RGB) images are made of 3 layers (grids, also called planes or channels).
[Figure: Red, Green, and Blue layers, each 4 x 4 pixels; pixel = 0 (none of that color), pixel = 255 (maximum of that color).]
All colors are made up of some combination of red, green, and blue. This is the same as the color spectrum of the three types of cones in the retina.
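As a sketch of this representation, a 4 x 4 RGB image can be stored as three stacked 4 x 4 channel grids. The pixel values below are illustrative, not taken from the slides:

```python
import numpy as np

# Illustrative 4 x 4 RGB image, stored as (channels, height, width):
# one 4 x 4 grid (layer/plane/channel) each for red, green, and blue.
red   = np.array([[255,   0,   0,   0],
                  [  0, 255,   0,   0],
                  [  0,   0, 255,   0],
                  [  0,   0,   0, 255]], dtype=np.uint8)
green = np.zeros((4, 4), dtype=np.uint8)       # no green anywhere
blue  = np.full((4, 4), 128, dtype=np.uint8)   # medium blue everywhere

image = np.stack([red, green, blue])  # 3 layers, each 4 x 4
print(image.shape)                    # (3, 4, 4)
```

Deep-learning libraries differ on whether channels come first or last; the channels-first layout here is just one convention.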
Convolution - Feature Detectors
[Figure: feature detectors (also known as image filters) are applied across a layer of 5 x 5 image data; the output is a collection of feature maps, one per filter.]
Feature Maps
• Convolution preserves the spatial relationship between pixels by learning image features using small squares of data.
• (Image) Feature Detector Types
• Edges (Lines) - Detect edges (lines) in the image.
• Curves – Detect curves in the image.
• Sharpen - TBA
• Blur - TBA
• Typically 3x3 pixel shape, but can be 5x5 or 7x7.
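To make the detector types above concrete, here is what such 3x3 kernels look like as numbers. These specific values are standard textbook examples, not taken from the slides:

```python
import numpy as np

# Common 3x3 feature detectors (filters); values are standard examples.
edge_detect = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])   # responds strongly at edges

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])       # boosts the center pixel

blur = np.full((3, 3), 1 / 9)            # averages the 3x3 neighborhood

# On a flat (constant) patch there is no edge, so the edge detector
# responds with 0 when multiplied element-wise and summed.
patch = np.full((3, 3), 7.0)
print(np.sum(edge_detect * patch))       # 0.0
```

In a trained CNN the filter values are learned rather than hand-designed; fixed kernels like these just illustrate the mechanism.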
Feature Maps – Stride
Move the feature detector across the image as a sliding window. Each movement of the feature detector across the image (across and down) is called a stride. Moving one pixel at a time is called a stride of 1.
Feature Maps – Stride – Example
BW image data, 5 x 5 pixels:

0 1 1 0 1
1 0 0 0 1
1 1 0 0 0
0 1 1 1 0
0 0 1 0 1

Filter (feature detector), 3 x 3:

1 0 0
0 1 0
0 1 0

Apply the 3x3 filter as an element-wise multiply-and-sum on the first 3x3 grid in the image. The number of matching pixels (1) goes into the first cell of the feature map. Slide the filter one pixel to the right: the second cell holds the matching pixels (1) from the second stride.
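The stride-by-stride computation can be sketched in NumPy. This assumes the reconstruction of the slide's 5x5 image and 3x3 filter shown in this example; only the first two feature-map cells (both 1) are stated on the slide:

```python
import numpy as np

# 5x5 BW image and 3x3 filter, as reconstructed from the slide.
image = np.array([[0, 1, 1, 0, 1],
                  [1, 0, 0, 0, 1],
                  [1, 1, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 0, 1]])

filt = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 1, 0]])

# Slide the filter over the image with a stride of 1. Each feature-map
# cell is the element-wise product of the filter and the 3x3 patch
# under it, summed: the count of matching pixels.
fm = np.array([[np.sum(image[i:i+3, j:j+3] * filt)
                for j in range(3)]
               for i in range(3)])

print(fm[0, 0], fm[0, 1])   # 1 1  (first and second stride, as on the slide)
```

A 5x5 image with a 3x3 filter and stride 1 yields a 3x3 feature map, which is why the feature map is smaller than the input.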
Convolutional Layer
Assemble and collect complete feature maps, one per feature detector.
• The complete feature map is substantially smaller in size than the image.
• Each stride maps to the corresponding placement in the complete feature map, preserving the spatial relationship.
• Convolutional layer: the collection of complete feature maps, one per feature detector.
ReLU Step
The feature maps are processed by a ReLU (Rectified Linear Unit) function, which replaces all negative values with 0.
• The ReLU step increases non-linearity in the feature maps.
• Enhances features such as borders and elements.
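A minimal sketch of the ReLU step; the feature-map values here are made up for illustration:

```python
import numpy as np

# Feature map with some negative values (illustrative).
feature_map = np.array([[-3.0,  1.0],
                        [ 2.5, -0.5]])

# ReLU: every negative value is replaced with 0; positives pass through.
rectified = np.maximum(feature_map, 0)
print(rectified)
```

Applying max(0, x) element-wise is all ReLU does; the non-linearity comes from the kink at zero.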
Pooling
• Adds spatial invariance to feature maps.
• Enables recognizing a feature regardless of angle, direction, or skew.
• Does not care where the feature is, as long as it maintains its relative position to other features.
Pooling
• Uses a window (typically 2x2 pixels) that is slid across the feature map.
• Finds the pixel with the highest value within the window.
• Places the highest-value pixel into a pooled map at the same relative position.
• Generally uses a stride of 2.
[Figure: a 2x2 window slides across the feature map with a stride of 2.]

Feature map, 4 x 4:

0 1 2 0
1 0 0 4
1 2 0 0
0 1 1 3
Pooling Options
• Max Pooling
• Finds the pixel with the highest value within the window (also known as downsampling). For the window [2 0; 0 4], the pooled value is 4.
• Mean Pooling
• Calculates the average value of all pixels within the window (also known as subsampling).
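Both options can be sketched on the 4 x 4 feature map from the figure with a 2x2 window and stride 2. The slide only shows the max value 4 for the [2 0; 0 4] window; the full pooled maps below follow from the definitions:

```python
import numpy as np

# 4x4 feature map from the slide.
fm = np.array([[0, 1, 2, 0],
               [1, 0, 0, 4],
               [1, 2, 0, 0],
               [0, 1, 1, 3]])

# 2x2 window, stride 2: view the map as a 2x2 grid of 2x2 blocks,
# then reduce each block with max or mean.
blocks = fm.reshape(2, 2, 2, 2).swapaxes(1, 2)  # (block_row, block_col, 2, 2)

max_pooled  = blocks.max(axis=(2, 3))    # [[1 4]
                                         #  [2 3]]
mean_pooled = blocks.mean(axis=(2, 3))   # [[0.5 1.5]
                                         #  [1.  1. ]]
print(max_pooled)
print(mean_pooled)
```

Either way, the 4 x 4 feature map shrinks to 2 x 2, which is where the downsampling/subsampling names come from.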
Flattening
[Figure: the convolution layer is pooled into a pooled layer, which is flattened into a single vector (1 4 0 2 …).]
• Flattening takes the pooled layer and flattens it in sequential order into a single vector.
• The vector is used as the input to the neural network.
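A minimal sketch of the flattening step, assuming an illustrative 2x2 pooled map:

```python
import numpy as np

# Illustrative 2x2 pooled map.
pooled = np.array([[1, 4],
                   [2, 3]])

# Flatten in sequential (row-major) order into a single vector;
# this vector becomes the input to the fully connected neural network.
vector = pooled.flatten()
print(vector)   # [1 4 2 3]
```

With several pooled maps (one per feature detector), each is flattened the same way and the results are concatenated into one long input vector.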