Topic 5 Computer Vision
The slides are mostly adapted from material developed by:
Dr. Akram F. Ahmed, COE Dept.
Dr. Aiman El-Maleh and Dr. Abdul Jabbar Siddiqui, KFUPM, COE Department
Outline
❖ Image Representation
❖ What is Computer Vision?
❖ Computer Vision Challenges
❖ Computer Vision Tasks
❖ Computer Vision Applications
❖ Convolutional Neural Networks (CNNs)
Digital Cameras and Imaging
❖ A digital camera's image sensor contains from a few hundred thousand to a few million photocells; each photocell measures the light falling on it to produce one pixel
Digital Images
❖ A digital image is made up of
pixels
❖ Grayscale Images
➢ Pixel values represent the
intensity values
▪ 0-255, or 0-1
❖ Color Images
➢ RGB (Red-Green-Blue), HSV
(Hue-Saturation-Value), or
other color spaces
Image Representation
❖ In image processing, the coordinate system is flipped in the vertical direction
➢ The origin is at the top-left corner: rows increase downward and columns increase to the right
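A minimal NumPy sketch of how a grayscale image is stored and indexed with the origin at the top-left corner (the pixel values below are made up for illustration):

```python
import numpy as np

# A tiny 3x4 grayscale image: one 8-bit intensity value (0-255) per pixel
img = np.array([[  0,  50, 100, 150],
                [ 25,  75, 125, 175],
                [255, 200, 150, 100]], dtype=np.uint8)

# The origin is the top-left corner: index as img[row, col],
# with rows growing downward and columns growing to the right
print(img.shape)   # (3, 4) -> 3 rows, 4 columns
print(img[0, 0])   # 0   (top-left pixel)
print(img[2, 3])   # 100 (bottom-right pixel)
```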
Colored Images Representation
❖ How does a computer see an RGB colored image?
➢ Computers see three matrices
stacked on top of each other;
▪ 3 Channels – one for each color (red,
green, blue)
➢ Each channel is a 2D matrix of
numbers
➢ How many bits are used to
represent intensity values?
▪ 8 bits = 1 Byte
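A minimal NumPy sketch of an RGB image stored as three stacked 2D matrices of 8-bit intensity values (the pixel values below are chosen arbitrarily):

```python
import numpy as np

# A 2x2 RGB image: height x width x 3 channels, 8 bits (1 byte) per value
rgb = np.array([[[255,   0,   0], [  0, 255,   0]],
                [[  0,   0, 255], [255, 255, 255]]], dtype=np.uint8)

# Three stacked 2D matrices: one channel each for red, green, blue
red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
print(rgb.shape)            # (2, 2, 3)
print(red)                  # the 2x2 red-channel matrix
print(rgb.dtype.itemsize)   # 1 byte = 8 bits per intensity value
```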
Colored Images Representation
❖ A color image is stored as three separate channel matrices: Red, Green, and Blue
What is Computer Vision (CV)?
❖ Computer vision aims to enable computers to extract meaningful information from digital images and videos
➢ For example, given a photo of a scene, a computer vision system can describe it with labels such as: garden, spring, bridge, water, trees, flower, green, etc.
Computer Vision Challenges
❖ Viewpoint Variation
➢ When viewed from different viewpoints, the image "matrix" may
not have the same values
Computer Vision Challenges
❖ Background Clutter
➢ Makes it difficult to distinguish an object of interest from the
background
Computer Vision Challenges
❖ Illumination
➢ The effects due to illumination changes and variations make it
difficult to detect/recognize objects of interest
Computer Vision Challenges
❖ Occlusion
➢ Occlusion may be simply defined as other object(s) hiding the
object(s) of interest either partially or completely
Computer Vision Challenges
❖ Deformation
➢ An object of interest may be present in different shapes/forms
▪ For example, a cat may not always appear in a fixed shape/form
➢ A good computer vision method therefore has to take these variations into account if it is to robustly detect/recognize cats, for example
Computer Vision Challenges
❖ Intraclass Variation
➢ Intraclass Variation refers to
the issue of an object class
exhibiting a variety of
appearances
➢ For example, objects of the
class "Cat" can be of various
colors, patterns, shapes, sizes,
etc.
▪ A good computer vision method
should be able to recognize “Cats”
despite the variations of “Cat”
objects
Computer Vision Tasks
❖ Low Level Vision
➢ Measurements
➢ Enhancements
➢ Region segmentation
➢ Feature Extraction
❖ Mid Level Vision
➢ Reconstruction (e.g., structure estimation from photos)
➢ Depth Estimation
➢ Motion Estimation
❖ High Level Vision
➢ Category detection
➢ Activity recognition
➢ Deep understanding
➢ Pose estimation
Computer Vision Applications
❖ Laptop: Biometrics auto-login (face recognition), Optical character
recognition
❖ Smartphones: QR codes, panorama construction, face and smile
detection, Google Tango (3D reconstruction), Snapchat filters (face
tracking)
❖ Web: Image search, Google Photos (face recognition, object
recognition, scene recognition, geolocalization from vision),
Facebook (image captioning), YouTube (content categorization)
❖ VR/AR: Outside-in tracking (HTC VIVE), inside-out tracking
(simultaneous localization and mapping, HoloLens), object
occlusion (dense depth estimation)
Computer Vision Applications
❖ Optical Character Recognition (OCR): technology to convert images of text into machine-readable text; scanners typically come with OCR software
➢ Example: live camera translation
Computer Vision Applications
❖ Sports
❖ Medical Imaging
Computer Vision Applications
❖ Drones and Mobile Robots
❖ Virtual & Augmented Reality
http://www.robocup.org/
https://www.skydio.com/
https://qltyss.com/
Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNNs)
❖ A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning network that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other
❖ Advantage: the pre-processing required by a ConvNet is much lower compared to other classification algorithms
❖ The architecture of a ConvNet is analogous to that of
the connectivity pattern of Neurons in the Human
Brain and was inspired by the organization of the Visual
Cortex
❖ Individual neurons respond to stimuli only in a
restricted region of the visual field known as the
Receptive Field
➢ A collection of such fields overlap to cover the entire
visual area
Why CNN?
Applying Filters on Images
CNN
❖ The role of the CNN is to reduce
the images into a form which is
easier to process, without losing
features which are critical for
getting a good prediction
❖ Filters in primitive CNNs are
often hand-engineered to
extract desired features
❖ Advanced CNNs can learn the
Filters by training them
Image Convolution
❖ Suppose we have an image of size 5x5x1
❖ Scanning an image with a “filter”
➢ A filter is really just a perceptron, with weights and a bias
➢ We have selected a filter as a 3x3 block
➢ At each location, the “filter and underlying
image or map values are multiplied component
wise, and the products are added along with
the bias”
➢ The filter may proceed by more than 1 pixel at
a time
▪ E.g. with a “hop” of two pixels per shift, called stride
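A minimal NumPy sketch of the scanning operation described above, assuming a single-channel input map; the 5x5 image and 3x3 filter values are made up for illustration:

```python
import numpy as np

def convolve2d(image, kernel, bias=0.0, stride=1):
    """Scan `kernel` over `image`; at each location multiply component-wise,
    add the products and the bias, and hop by `stride` pixels (no padding)."""
    h, w = image.shape
    m = kernel.shape[0]
    out_h = (h - m) // stride + 1
    out_w = (w - m) // stride + 1
    result = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride + m, j*stride:j*stride + m]
            result[i, j] = np.sum(patch * kernel) + bias
    return result

image  = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input map
kernel = np.ones((3, 3)) / 9.0                      # toy 3x3 averaging filter
print(convolve2d(image, kernel).shape)              # (3, 3) with stride 1
print(convolve2d(image, kernel, stride=2).shape)    # (2, 2) with a "hop" of 2
```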
Image Convolution
❖ “Stride” between adjacent scanned locations need not be 1
Image Convolution
❖ Applying an appropriate 3x3 filter to an image performs edge detection
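The slide's exact filter values are not reproduced here; the sketch below assumes a commonly used 3x3 Laplacian-style edge-detection kernel, which gives zero on flat regions and large responses at edges:

```python
import numpy as np
from scipy.signal import convolve2d

# A commonly used 3x3 edge-detection (Laplacian-style) kernel; this is an
# assumption, since the slide's exact filter values are not shown in the text
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

# Toy image with a vertical step edge: flat regions give 0, the edge gives
# strong positive/negative responses
img = np.array([[10, 10, 10, 90, 90],
                [10, 10, 10, 90, 90],
                [10, 10, 10, 90, 90],
                [10, 10, 10, 90, 90],
                [10, 10, 10, 90, 90]], dtype=float)

print(convolve2d(img, edge_kernel, mode="valid"))
```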
Size of Convolution Result
❖ Image size: 5x5
❖ Filter: 3x3
❖ Stride: 1
❖ Output size = (5 - 3)/1 + 1 = 3
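A small sketch of the same output-size computation (no padding), checked against this 5x5/3x3 case and the strided 4x4/2x2 example that appears later:

```python
def conv_output_size(n, m, stride=1):
    # n x n input, m x m filter, given stride, no padding
    return (n - m) // stride + 1

print(conv_output_size(5, 3, stride=1))  # 3  (5x5 image, 3x3 filter, stride 1)
print(conv_output_size(4, 2, stride=2))  # 2  (4x4 input, 2x2 filter, stride 2)
```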
Basic Components of a CNN
❖ Convolution Layer
➢ Instead of feeding the input image directly to a neural network, it is fed into convolution layers, which apply some number of image filters to the original image, producing what are called feature maps
➢ Each feature map may be extracting some different relevant features from the input image/map
➢ In a CNN’s training, the values of
the filters are learnt to extract the
most useful/relevant information
Convolution Layer
❖ We can view the joint processing of the various maps as processing
a stacked arrangement of planes using a three-dimensional filter
❖ The computation of the convolutional map at any location sums
the convolutional outputs at all planes
Convolution Layer
❖ Consider a convolution layer with 2 input channels of size 4x4 each and one output channel, with a filter size of 2x2 and a stride of 2. Determine the resulting output from this convolution operation for the given input values and the given filter values
➢ Size of output channel = (4 - 2)/2 + 1 = 2, i.e., a 2x2 output
➢ 1st value = (2*1 + 1*2 + 1*1 + 2*2) + (1*5 + 2*6 + 2*5 + 1*6) = 42
➢ 2nd value = (2*3 + 1*4 + 1*3 + 2*4) + (1*7 + 2*8 + 2*7 + 1*8) = 66
➢ 3rd value = (2*1 + 1*2 + 1*1 + 2*2) + (1*5 + 2*6 + 2*5 + 1*6) = 42
➢ 4th value = (2*3 + 1*4 + 1*3 + 2*4) + (1*7 + 2*8 + 2*7 + 1*8) = 66
➢ Output: [[42, 66], [42, 66]]
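A NumPy sketch that reproduces the arithmetic above. The input and filter values are assumptions inferred from the products shown, since the original figure is not included here:

```python
import numpy as np

# Assumed 2-channel 4x4 input and 2-channel 2x2 filter, consistent with the
# products shown above (the slide's figure is not reproduced in the text)
x = np.stack([np.tile([1, 2, 3, 4], (4, 1)),    # input channel 1
              np.tile([5, 6, 7, 8], (4, 1))])   # input channel 2
w = np.array([[[2, 1], [1, 2]],                 # filter channel 1
              [[1, 2], [2, 1]]])                # filter channel 2

stride, m = 2, 2
out = (x.shape[1] - m) // stride + 1            # (4 - 2)/2 + 1 = 2
y = np.zeros((out, out))
for i in range(out):
    for j in range(out):
        patch = x[:, i*stride:i*stride + m, j*stride:j*stride + m]
        y[i, j] = np.sum(patch * w)             # sum over both channels
print(y)   # [[42. 66.] [42. 66.]]
```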
Convolution Layer
❖ Each filter is convolved across the full depth of the input to produce one activation map
❖ Consider a second, green filter: it produces a second activation map
Convolution Layer
❖ For example, if we had six 5x5x3 filters, we’ll get six separate
activation maps
❖ We stack these up to get a “new image” of size 28x28x6
Convolution Layer
❖ If we have 3 input channels (NI) of size 32x32 (NxN) and 6 output channels
(NO), using a filter of size 5x5 (MxM), then
➢ The number of needed filters = 6
➢ The total number of filter channels is
▪ 3 channels per filter x 6 filters = 18 filter channels, each of size 5x5, i.e., one filter
channel for each input-output combination
➢ The number of neurons = 6
➢ The number of inputs to each neuron is 5x5x3 = 75
➢ The number of network parameters to be trained (ignoring biases) is
▪ 5x5x3x6 = 450
➢ For each convolution layer, the number of trained network parameters (ignoring biases) = M × M × N_I × N_O
Convolution Layer
❖ Example: If we have 3 input channels of size 6x6 and 2 output channels, using a filter of size 4x4, then:
➢ The number of needed filters = 2
➢ The total number of filter channels = 3 × 2 = 6, each of size 4x4
➢ The number of inputs to each neuron = 4 × 4 × 3 = 48
➢ The number of network parameters to be trained (ignoring biases) = 4 × 4 × 3 × 2 = 96
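A tiny sketch of the parameter-count formula, checked against both examples (biases ignored, as in the slides):

```python
def conv_params(m, n_in, n_out):
    # One m x m filter channel per (input channel, output channel) pair
    return m * m * n_in * n_out

print(conv_params(5, 3, 6))   # 450  (3 input channels, six 5x5 filters)
print(conv_params(4, 3, 2))   # 96   (3 input channels, two 4x4 filters)
```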
Pooling Layer
❖ Similar to the Convolutional Layer, the
Pooling layer is responsible for
reducing the spatial size of the
Convolved Feature Map
❖ Pooling:
➢ Decrease the computational power
required to process the data through
dimensionality reduction
➢ Useful for extracting dominant
features which are rotational and
positional invariant
Pooling
❖ There are two types of Pooling:
➢ Max Pooling returns the maximum value
from the portion of the image covered by
the filter
➢ Average Pooling returns the average of
all the values from the portion of the
image covered by the Kernel
➢ Max Pooling performs a lot better than
Average Pooling since it discards the noisy
activations altogether and also performs
de-noising along with dimensionality
reduction
Max and Mean/Average Pooling
❖ There are no parameters to be learned in a pooling layer
❖ The number of output channels of a pooling layer is the same as its number of input channels
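A NumPy sketch of 2x2 max and average pooling with stride 2, applied to a made-up 4x4 feature map:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    # Assumes a square input map; no padding
    out = (x.shape[0] - size) // stride + 1
    y = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = x[i*stride:i*stride + size, j*stride:j*stride + size]
            y[i, j] = patch.max() if mode == "max" else patch.mean()
    return y

fmap = np.array([[1, 3, 2, 4],
                 [5, 7, 6, 8],
                 [9, 2, 1, 0],
                 [3, 4, 5, 6]], dtype=float)

print(pool2d(fmap, mode="max"))    # [[7. 8.] [9. 6.]]
print(pool2d(fmap, mode="mean"))   # [[4. 5.] [4.5 3.]]
```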
Flattening
❖ An image is nothing but a matrix of pixel values
❖ Flattening refers to turning a higher-dimensional array into a one-dimensional vector
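A one-line NumPy sketch of flattening a stack of feature maps into a vector:

```python
import numpy as np

fmaps = np.arange(16).reshape(4, 2, 2)   # e.g., four 2x2 feature maps
flat = fmaps.flatten()                   # one-dimensional vector of 16 values
print(fmaps.shape, "->", flat.shape)     # (4, 2, 2) -> (16,)
```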
CNN Structure
❖ Multiple layers of Convolution+Pooling are often used
➢ In each layer/step, different types of features may be
learnt/extracted
CNN Learned Features
❖ Initial convolution layers extract low-level features (e.g., edges)
❖ Intermediate layers extract mid-level features (e.g., object parts, shapes)
❖ Deeper convolution layers extract higher-level features (e.g., objects)
➢ Visualizations of feature maps at different layers of a CNN show this progression
CNN Structure
❖ Convolution, pooling, and flattening, followed by feeding the result to (often) a fully connected feed-forward neural network for classification
CNN Example: LeNet-5
❖ Digit recognition on MNIST
(32x32 images)
➢ Conv1: 6 5x5 filters in first conv layer,
stride 1 => 6 28x28 maps
➢ Pool1: 2x2 max pooling, stride 2 => 6
14x14 maps
➢ Conv2: 16 5x5 filters in second conv
layer, stride 1 => 16 10x10 maps
➢ Pool2: 2x2 max pooling with stride 2
for second conv layer =>16 5x5 maps
(400 values in all)
➢ FC: Final MLP: 3 layers: 120 neurons,
84 neurons, and finally 10 output
neurons
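A minimal PyTorch sketch of the layer sizes listed above. It follows the slides in using max pooling, and assumes ReLU activations; the original LeNet-5 used average pooling and tanh:

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1),   # 32x32x1 -> 28x28x6
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),                  # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),  # -> 10x10x16
            nn.ReLU(),
            nn.MaxPool2d(2, stride=2),                  # -> 5x5x16 = 400 values
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),                 # 10 output neurons
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sanity check on a dummy 32x32 grayscale input
print(LeNet5()(torch.zeros(1, 1, 32, 32)).shape)        # torch.Size([1, 10])
```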
Example: Detecting Flower in an Image
❖ Each neuron scans the image and “redraws” the input with some
features enhanced
➢ The specific features that the neuron detects
Example: Detecting Flower in an Image
❖ The first layer looks at small subregions of the input image, sufficient to detect, say, petals, and enhances those
❖ The second layer looks at regions of output of the first layer
➢ To put the petals together into a flower
➢ This corresponds to looking at a larger region of original input image
❖ We may have any number of layers in this fashion
Example: Detecting Flower in an Image
❖ The rectangular maps of the neurons in the final layer of the
scanning network will generally be reorganized into a vector before
passing them to the final softmax or MLP
❖ The flower will be detected regardless of its position in the image
CNN Demo
❖ CNN Explainer
❖ Demo Video
❖ https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.
html
Summary
❖ The convolutional neural network is a supervised version of a
computational model of mammalian vision
❖ It includes
➢ Convolutional layers comprising learned filters that scan the
outputs of the previous layer
➢ Downsampling layers that operate over groups of outputs from
the convolutional layer to reduce network size
❖ The parameters of the network can be learned through regular backpropagation
Acknowledgments
❖ Slides have been used from:
➢ https://deeplearning.cs.cmu.edu/F20/index.html
➢ http://cs231n.stanford.edu/syllabus.html
➢ https://towardsdatascience.com/a-comprehensive-guide-to-
convolutional-neural-networks-the-eli5-way-3bd2b1164a53