Machine Learning
Convolutional Neural Networks
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
August, 2017
Background
• 1988 – Invented by Yann LeCun at AT&T Bell Laboratories.
• Used for Image Classification.
• Acts as a front-end to a neural network, preprocessing images
efficiently.
• Consists of:
• Convolution
• Max Pooling
• Flattening
Convolutional Neural Network (CNN)
Convolution is a front-end to a Neural Network for Image Classification
[Diagram, left to right:]
• Image Input
• Convolutional Front-End: preprocesses images into a vector of real values.
• Input Layer (x1, x2, … xn)
• Hidden Layer: recognition of real value inputs into classification of image inputs.
• Output Layer (z1, z2, z3, … zk)
• Softmax: squashes output into a set of classification probabilities.
• Categorical Outputs (e.g., Cat, Dog) and probabilities.
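The softmax step can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own code: the two raw output values and the labels are made up for the example.

```python
import math

def softmax(z):
    """Squash raw output-layer values z1..zk into probabilities that sum to 1."""
    m = max(z)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for two categories (assumed values).
labels = ["Cat", "Dog"]
z = [2.0, 1.0]
probs = softmax(z)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```

The larger raw value always receives the larger probability, and the probabilities always sum to 1.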
Image Data
• Consists of Pixel Values
• Pixel Values in a Grid Layout (2D array).
• One Layer (Grid) per color.
• BW pixel values are 0 (black) or 1 (white).
• Grayscale and color (RGB) pixel values range from 0 to 255.
[Figure: BW Image 4 x 4 pixels. Pixel = 0 (black), Pixel = 1 (white).]
[Figure: Grayscale Image 4 x 4 pixels. Pixel = 0 (black), Pixel = 255 (white).]
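As a rough sketch, a single-layer image can be held as a 2D list of pixel values. The pixel values below are invented for illustration, and the helper names `shape` and `value_range` are hypothetical.

```python
# A 4 x 4 BW image: one grid, each pixel is 0 (black) or 1 (white).
bw_image = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

# A 4 x 4 grayscale image: one grid, each pixel is 0 (black) .. 255 (white).
gray_image = [
    [0, 64, 128, 255],
    [64, 128, 255, 128],
    [128, 255, 128, 64],
    [255, 128, 64, 0],
]

def shape(grid):
    """Return (rows, columns) of a 2D grid."""
    return len(grid), len(grid[0])

def value_range(grid):
    """Return the smallest and largest pixel value in the grid."""
    values = [v for row in grid for v in row]
    return min(values), max(values)

print(shape(bw_image), value_range(bw_image))
print(shape(gray_image), value_range(gray_image))
```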
Color Image Data
• Color (RGB) is made of 3 layers (grids, also called planes or channels).
[Figure: Red Layer, 4 x 4 pixels. Pixel = 0 (no red), Pixel = 255 (max red).]
[Figure: Green Layer, 4 x 4 pixels. Pixel = 0 (no green), Pixel = 255 (max green).]
[Figure: Blue Layer, 4 x 4 pixels. Pixel = 0 (no blue), Pixel = 255 (max blue).]
• All colors are made up of some combination of red, green and blue.
• This mirrors the three types of cones in the retina, each of which
responds to a different part of the color spectrum.
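One way to picture the three layers is to split a small color image into its red, green and blue planes. This is a sketch with made-up pixel values, and `split_channels` is a hypothetical helper name.

```python
# A 2 x 2 color image stored as (red, green, blue) pixels; values are made up.
pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 0)],
]

def split_channels(image):
    """Return the red, green and blue planes as three separate 2D grids."""
    planes = []
    for c in range(3):  # 0 = red, 1 = green, 2 = blue
        planes.append([[px[c] for px in row] for row in image])
    return planes

red, green, blue = split_channels(pixels)
print("red  ", red)
print("green", green)
print("blue ", blue)
```

Each plane has the same grid layout as the image, so a feature detector can be applied to one plane at a time.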
Convolution - Feature Detectors
[Diagram: Image Data 5x5 pixels (values 0 to 255) → Apply Feature Detector Filters → Collection of Feature Maps (output of application of filters).]
• Feature Detectors are also known as [image] filters.
• Apply feature filters across a layer of image data.
Feature Maps
• Convolution preserves the spatial relationship between
pixels by learning image features using small squares of
data.
• (Image) Feature Detector Types
• Edges (Lines) - Detect edges (lines) in the image.
• Curves – Detect curves in the image.
• Sharpen - TBA
• Blur - TBA
• Feature detectors are typically 3x3 pixels, but can be 5x5 or 7x7.
Feature Maps – Stride
• Move the feature detector across the image as a sliding window.
• The number of pixels the feature detector moves at each step (across and down the image) is called the stride.
• Moving one pixel at a time is called a stride of 1.
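The sliding window can be sketched by listing the top-left corner of every position the feature detector visits. The function name `window_positions` is hypothetical, and the sketch assumes a square image with no padding.

```python
def window_positions(image_size, filter_size, stride=1):
    """Top-left (row, col) of every window the filter visits, row by row."""
    last = image_size - filter_size  # last start index where the filter still fits
    return [(r, c)
            for r in range(0, last + 1, stride)
            for c in range(0, last + 1, stride)]

# A 3x3 filter on a 5x5 image with a stride of 1 visits 9 positions.
positions = window_positions(5, 3, stride=1)
print(len(positions))
print(positions[:3])

# With a stride of 2 the same filter visits only 4 positions.
print(window_positions(5, 3, stride=2))
```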
Feature Maps – Stride - Example
BW Image Data 5x5 pixels:
0 1 1 0 1
1 0 0 0 1
1 1 0 0 0
0 1 1 1 0
0 0 1 0 1

Filter (Feature Detector) 3x3:
1 0 0
0 1 0
0 1 0

• Apply the 3x3 filter to the first 3x3 grid in the image: multiply element by element and sum the products. For a BW image this counts the pixels where both the image and the filter are 1 (the pixel match).
• First cell holds the matching pixels from the first stride: 1.
• Second cell holds the matching pixels from the second stride: 1.

Feature map so far:
1 1
Feature Maps – Stride - Example
Continuing with the same 5x5 BW image and 3x3 filter:
• Completed first horizontal pass: 1 1 1
• First cell of the second row: 3 (matched 3 pixels, a high detection).

Finished Feature Map (also known as convolved feature or activation map):
1 1 1
3 1 1
2 3 1
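The worked example can be reproduced with a short sketch. It assumes the 5x5 image reads row by row as laid out in the code, and it uses the multiply-and-sum form common in CNNs (no flipping of the filter). The function name `convolve` is hypothetical.

```python
image = [
    [0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1],
]

kernel = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]

def convolve(image, kernel, stride=1):
    """Slide the kernel over the image; each cell is the sum of element-wise products."""
    k = len(kernel)
    feature_map = []
    for r in range(0, len(image) - k + 1, stride):
        row = []
        for c in range(0, len(image[0]) - k + 1, stride):
            match = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(match)
        feature_map.append(row)
    return feature_map

feature_map = convolve(image, kernel)
for row in feature_map:
    print(row)
```

With a stride of 1 this yields the 3x3 feature map from the example, with the value 3 marking the strongest match.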
Convolutional Layer
• Assemble and collect complete feature maps, one per feature detector.
• The result of each stride maps to the corresponding placement in the complete feature map, preserving the spatial relationship.
• The complete feature map for a single feature detector is substantially smaller in size than the image.
• Convolutional Layer: the collection of complete feature maps, one per feature detector.
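How much smaller the feature map is can be worked out from the image size, filter size and stride. The sketch below assumes a square image, a square filter and no padding; `feature_map_size` is a hypothetical helper name.

```python
def feature_map_size(image_size, filter_size, stride=1):
    """Width (and height) of the feature map, assuming no padding."""
    return (image_size - filter_size) // stride + 1

# The 5x5 image with a 3x3 filter and a stride of 1 gives a 3x3 feature map.
print(feature_map_size(5, 3, stride=1))

# Larger filters or larger strides shrink the feature map further.
print(feature_map_size(5, 3, stride=2))
print(feature_map_size(28, 5, stride=1))
```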
ReLU Step
The feature maps are processed by a ReLU (Rectified Linear Unit) function.
[Diagram: Convolution Layer → Rectified Linear Unit Step. All negative values are replaced with 0.]
• The ReLU step increases non-linearity in the feature maps.
• Enhances features such as borders and elements.
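A minimal sketch of the ReLU step applied to a feature map. The example values are made up; negative values can appear when a filter contains negative weights.

```python
def relu(feature_map):
    """Replace every negative value with 0; leave other values unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical feature map containing negative values.
example = [
    [-2, 1, 0],
    [3, -1, 2],
]
rectified = relu(example)
for row in rectified:
    print(row)
```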
Pooling
• Adds spatial invariance to feature maps.
• Spatial invariance: being able to recognize a feature regardless of
angle, direction or skew.
• Does not care where a feature is, as long as it maintains its
relative position to other features.
Pooling
• Uses a window (typically 2x2 pixels) that is slid across
the feature map.
• Finds the pixel with the highest value within the window.
• Places the highest value pixel into a pooled map at the same
relative position.
• Generally uses a stride of 2.
Feature Map (5x5):
0 1 2 0 1
1 0 0 4 1
1 2 0 0 0
0 1 1 3 0
0 0 1 0 1
Pooling Example
A 2x2 window is slid across the 5x5 feature map with a stride of 2:
• The highest value in each window is placed in the corresponding position in the pooled map.
• At the right and bottom edges the window slides off the edge and covers only the remaining pixels.

Completed Pooled Feature Map:
1 4 1
2 3 0
0 1 1
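The pooling example can be sketched as follows. It assumes the 5x5 feature map reads row by row as laid out in the code, and that a window sliding off the edge simply uses the pixels that remain. The function name `max_pool` is hypothetical.

```python
feature_map = [
    [0, 1, 2, 0, 1],
    [1, 0, 0, 4, 1],
    [1, 2, 0, 0, 0],
    [0, 1, 1, 3, 0],
    [0, 0, 1, 0, 1],
]

def max_pool(feature_map, size=2, stride=2):
    """Max pooling; windows that slide off the edge use the pixels that remain."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows, stride):
        pooled_row = []
        for c in range(0, cols, stride):
            window = [feature_map[i][j]
                      for i in range(r, min(r + size, rows))
                      for j in range(c, min(c + size, cols))]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

pooled_map = max_pool(feature_map)
for row in pooled_map:
    print(row)
```

The 5x5 feature map shrinks to a 3x3 pooled map, and each value keeps its relative position.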
Pooling Options
• Max Pooling
• Finds the pixel with the highest value within the window (also
known as downsampling).
• Example: the window [2 0 / 0 4] gives 4.
• Mean Pooling
• Calculates the average value of all pixels within the window
(also known as subsampling).
• Example: the window [2 0 / 0 4] gives (2 + 0 + 0 + 4) / 4 = 1.5.
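The two options differ only in how a window is reduced to a single value. A small sketch on one 2x2 window; the helper names are hypothetical.

```python
def max_pool_window(window):
    """Highest pixel value in the window."""
    return max(v for row in window for v in row)

def mean_pool_window(window):
    """Average of all pixel values in the window, zeros included."""
    values = [v for row in window for v in row]
    return sum(values) / len(values)

window = [
    [2, 0],
    [0, 4],
]
max_value = max_pool_window(window)
mean_value = mean_pool_window(window)
print(max_value, mean_value)
```

Max pooling keeps only the strongest detection, while mean pooling is pulled down by the zeros in the window.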
Flattening
[Diagram: Convolution Layer → Pooled Layer → Flatten → Single Vector (1 4 0 2 …)]
• Flattening takes the pooled layer and flattens it in
sequential order into a single vector.
• The vector is used as the input to the neural network.
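Flattening can be sketched as reading every pooled map row by row into one list. The pooled values below are made up for illustration (the first map was chosen to start with 1 4 0 2, as in the diagram), and `flatten` is a hypothetical helper name.

```python
def flatten(pooled_layer):
    """Flatten a list of 2D pooled maps into one vector, in sequential order."""
    return [v for pooled_map in pooled_layer for row in pooled_map for v in row]

# Hypothetical pooled layer holding two 2x2 pooled maps.
pooled_layer = [
    [[1, 4],
     [0, 2]],
    [[3, 1],
     [2, 2]],
]
vector = flatten(pooled_layer)
print(vector)
```

The length of the vector sets the number of inputs (x1 … xn) of the neural network.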
