
COE 292

Introduction to Artificial Intelligence

Computer Vision
The slides are mostly adapted from material developed by Dr. Akram F. Ahmed, Dr. Aiman El-Maleh, and Dr. Abdul Jabbar Siddiqui, COE Department, KFUPM

Outline

❖ Image Representation
❖ What is Computer Vision
❖ Computer Vision Challenges
❖ Computer Vision Tasks
❖ Computer Vision Applications
❖ Convolutional Neural Networks (CNNs)

Digital Cameras and Imaging
❖ A digital camera sensor contains on the order of a few hundred thousand to a few million photocells
Digital Images
❖ A digital image is made up of pixels
❖ Grayscale Images
➢ Pixel values represent the intensity values
▪ 0-255, or 0-1
❖ Color Images
➢ RGB (Red-Green-Blue), HSV (Hue-Saturation-Value), or other color spaces
Image Representation
❖ In image processing, the coordinate system is flipped in the vertical direction
➢ The origin of the axes is at the top-left corner
❖ Each square in the coordinate plane is known as a pixel
❖ Each pixel holds a quantized value acquired when capturing an image
❖ A grayscale image is a scalar image with 0 = black and 2⁸ − 1 = 255 = white
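To make the convention concrete, here is a minimal NumPy sketch (the pixel values are made up for illustration) showing a grayscale image indexed as [row, column] with the origin at the top-left:

```python
import numpy as np

# A 4x5 grayscale image: one 8-bit intensity value per pixel (0 = black, 255 = white).
img = np.array([
    [  0,  32,  64,  96, 128],
    [ 16,  48,  80, 112, 144],
    [ 32,  64,  96, 128, 160],
    [255, 224, 192, 160, 128],
], dtype=np.uint8)

print(img.shape)   # (4, 5) -> (rows, columns)
print(img[0, 0])   # 0   -> top-left corner (the origin)
print(img[3, 0])   # 255 -> bottom-left corner (the row index grows downward)
```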
Colored Images Representation

❖ Grayscale images have only one channel
❖ In a colored image, the color is typically composed of multiple channels
➢ For example, in RGB images, there are three channels: red, green, and blue
Colored Images Representation
❖ How does a computer see an RGB colored image?
➢ Computers see three matrices stacked on top of each other
▪ 3 channels – one for each color (red, green, blue)
➢ Each channel is a 2D matrix of numbers
➢ How many bits are used to represent intensity values?
▪ 8 bits = 1 Byte
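A short NumPy sketch of the three stacked matrices (the pixel values are random placeholders) and how the red, green, and blue channels can be separated:

```python
import numpy as np

# A 4x4 RGB image: three 2D matrices (channels) stacked along the last axis,
# each entry stored in 8 bits (1 byte), so values range from 0 to 255.
rgb = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)

red, green, blue = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
print(rgb.shape)   # (4, 4, 3) -> height x width x channels
print(red.shape)   # (4, 4)    -> each channel is a 2D matrix of numbers
```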
Colored Images Representation
(Figure: an RGB image decomposed into its Red, Green, and Blue channel matrices)
What is Computer Vision (CV)?

❖ Concerned with the automatic extraction, analysis, and understanding of useful information from single images or sequences of images
(Figure: Venn diagram placing Computer Vision at the overlap of Artificial Intelligence and Machine Learning)
❖ Algorithms to achieve automatic visual understanding
➢ Involves converting images to more understandable things like distance, edges, directions, or other features
❖ From a Biological/Science perspective:
➢ Computer Vision aims to develop computational models of the human visual system
What is Computer Vision (CV)?

Image (or video) → Sensing device → Interpreting device → Interpretations
(e.g., garden, spring, bridge, water, trees, flower, green, etc.)
What is Computer Vision (CV)?

❖ From an Engineering perspective:
➢ Computer Vision aims to build autonomous systems to perform some of the tasks which the human visual system can perform, and even surpass it in many cases
❖ Many vision tasks involve extraction of 3D and temporal information from time-varying 2D data, i.e., videos
❖ Properties and characteristics of the human visual system give inspiration to engineers designing computer vision systems
Computer Vision Challenges
❖ Viewpoint Variation
➢ When viewed from different viewpoints, the image "matrix" may
not have the same values

Computer Vision Challenges
❖ Background Clutter
➢ Makes it difficult to distinguish an object of interest from the
background

Computer Vision Challenges
❖ Illumination
➢ The effects due to illumination changes and variations make it
difficult to detect/recognize objects of interest

Computer Vision Challenges
❖ Occlusion
➢ Occlusion may be simply defined as other object(s) hiding the
object(s) of interest either partially or completely

Computer Vision Challenges
❖ Deformation
➢ An object of interest may be present in different shapes/forms
▪ For example, a cat may not always appear in a fixed shape/form
➢ A good computer vision method therefore has to take these variations into account if it is to robustly detect/recognize cats, for example
Computer Vision Challenges
❖ Intraclass Variation
➢ Intraclass Variation refers to
the issue of an object class
exhibiting a variety of
appearances
➢ For example, objects of the
class "Cat" can be of various
colors, patterns, shapes, sizes,
etc.
▪ A good computer vision method
should be able to recognize “Cats”
despite the variations of “Cat”
objects

Computer Vision Tasks
❖ Low Level Vision
➢ Measurements
➢ Enhancements
➢ Region segmentation
➢ Feature extraction
❖ Mid Level Vision
➢ Reconstruction
➢ Depth estimation
➢ Motion estimation
❖ High Level Vision
➢ Category detection
➢ Activity recognition
➢ Deep understanding
➢ Pose estimation
Computer Vision Applications
❖ Laptop: Biometrics auto-login (face recognition), Optical character recognition
❖ Smartphones: QR codes, panorama construction, face and smile detection, Google Tango (3D reconstruction), Snapchat filters (face tracking)
❖ Web: Image search, Google Photos (face recognition, object recognition, scene recognition, geolocalization from vision), Facebook (image captioning), YouTube (content categorization)
❖ VR/AR: Outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization and mapping, HoloLens), object occlusion (dense depth estimation)
Computer Vision Applications

❖ Medical imaging: CAT / MRI reconstruction, assisted diagnosis, automatic pathology, endoscopic surgery
❖ Industry: Vision-based robotics (marker-based), machine-assisted router (jig), surveillance, drones, shopping
❖ Transportation: Assisted driving, face tracking for drowsiness
❖ Motion: Gesture/Activity recognition
❖ Media: Visual effects for film, TV (reconstruction), virtual sports replay (reconstruction)
https://www.cs.ubc.ca/~lowe/vision.html
Computer Vision Applications
❖ Optical Character Recognition (OCR): Technology to convert images of text into text. Scanners come with OCR software.
➢ Examples: live camera translation, mail digit recognition (AT&T Labs), license plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Computer Vision Applications
❖ Sports: Sportvision first down line, GoalControl-4D system
❖ Medical Imaging: 3D imaging, image-guided surgery, MRI, CT
Computer Vision Applications
❖ Drones and Mobile Robots: http://www.robocup.org/, https://www.skydio.com/, https://qltyss.com/
❖ Virtual & Augmented Reality: Oculus Quest, Microsoft HoloLens 2
Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs)
❖ Convolutional Neural Network (ConvNet/CNN) is a Deep Learning network
which can take in an input image, assign importance (learnable weights and
biases) to various aspects/objects in the image and be able to differentiate one
from the other
❖ Advantage: pre-processing required in a ConvNet is much lower as compared
to other classification algorithms
❖ The architecture of a ConvNet is analogous to that of
the connectivity pattern of Neurons in the Human
Brain and was inspired by the organization of the Visual
Cortex
❖ Individual neurons respond to stimuli only in a
restricted region of the visual field known as the
Receptive Field
➢ A collection of such fields overlap to cover the entire
visual area
Why CNN?

❖ A CNN can successfully capture the spatial and temporal dependencies in an image or a video through the application of relevant filters
➢ Spatial dependency means that things closer together in an image are more similar than things further apart
➢ Temporal dependency means that when a frame changes in a video, pixel values remain the same if there is no movement between frames; in other words, a pixel's value depends on its value in the previous frame
Applying Filters on Images

❖ Applying a filter h on an image F would change the pixel values
❖ In traditional Computer Vision, filters were hand-engineered to
➢ Pre-process images
▪ E.g., blurring, sharpening, brightness/contrast changes, etc.
➢ or, Extract features from images
▪ E.g., edges, lines, corners, etc.
CNN
❖ The role of the CNN is to reduce
the images into a form which is
easier to process, without losing
features which are critical for
getting a good prediction
❖ Filters in primitive CNNs are
often hand-engineered to
extract desired features
❖ Advanced CNNs can learn the
Filters by training them
Image Convolution

❖ Image convolution is applying a filter that adds each pixel value of an image to its neighbors, weighted according to a kernel matrix
❖ The filter is defined as an M × M kernel matrix
❖ Convolving the filter with the image means sliding it over the image spatially, computing dot products
❖ Doing so detects some desired features from the image and can help the neural network process it
❖ Different filters can detect different features
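A minimal NumPy sketch of this sliding dot product, assuming a single channel, a square image and kernel, and no padding (the function name convolve2d is just for illustration):

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide an MxM kernel over an NxN image, taking a dot product at each location."""
    n, m = image.shape[0], kernel.shape[0]
    out_size = (n - m) // stride + 1                 # output size per side
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            r, c = i * stride, j * stride
            patch = image[r:r + m, c:c + m]          # underlying image values
            out[i, j] = np.sum(patch * kernel)       # component-wise product, then sum
    return out

image = np.arange(25).reshape(5, 5)                  # a 5x5 toy image
kernel = np.ones((3, 3)) / 9.0                       # a 3x3 averaging (blur) filter
print(convolve2d(image, kernel, stride=1).shape)     # (3, 3)
print(convolve2d(image, kernel, stride=2).shape)     # (2, 2)
```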
Image Convolution
❖ Suppose we have an image of size 5x5x1
❖ Scanning an image with a “filter”
➢ A filter is really just a perceptron, with weights and a bias
➢ We have selected a filter as a 3x3 block
➢ At each location, the “filter and underlying image or map values are multiplied component-wise, and the products are added along with the bias”
➢ The filter may proceed by more than 1 pixel at a time
▪ E.g., with a “hop” of two pixels per shift, called the stride
Image Convolution
❖ “Stride” between adjacent scanned locations need not be 1
(Figure: convolution scans with the filter shifted by more than one pixel per step)
What stride is being used in these convolution operations?
Answer: Stride = 2
Image Convolution
❖ Applying a suitable 3x3 filter on an image performs edge detection
(Figure: a 3x3 edge-detection filter applied to an original image, producing a convolved image)
The idea here is that when a pixel is similar to all its neighbors, they should cancel each other, giving a value of 0. Therefore, the more similar the pixels, the darker that part of the resulting convolved image, and the more different they are, the lighter it is
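The slide's exact filter values are not reproduced here, but a commonly used edge-detection kernel with the cancellation property described above is the Laplacian-style kernel in this sketch:

```python
import numpy as np

edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

flat_patch = np.full((3, 3), 100)          # pixel identical to all its neighbors
edge_patch = np.array([[0, 0, 255],        # a vertical edge inside the patch
                       [0, 0, 255],
                       [0, 0, 255]])

print(np.sum(flat_patch * edge_kernel))    # 0    -> neighbors cancel (dark output)
print(np.sum(edge_patch * edge_kernel))    # -765 -> large magnitude at the edge (light output)
```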
Size of Convolution Result
❖ Simple convolution size pattern:
➢ Image size: N x N
➢ Filter: M x M
➢ Stride: S
➢ Output size (each side) = ⌊(N − M)/S⌋ + 1
▪ Assuming you’re not allowed to go beyond the edge of the input
❖ Results in a reduction in the output size
➢ Even if S = 1
Size of Convolution Result
❖ Image size: 5x5
❖ Filter: 3x3
❖ Stride: 1
❖ Output size = (5 − 3)/1 + 1 = 3

❖ Image size: 5x5
❖ Filter: 3x3
❖ Stride: 2
❖ Output size = (5 − 3)/2 + 1 = 2
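As a quick check, a tiny helper (the function name is just for illustration) reproduces both results under the same no-padding assumption:

```python
def conv_output_size(n: int, m: int, s: int) -> int:
    """Output side length for an NxN image, MxM filter, stride S (no padding)."""
    return (n - m) // s + 1

print(conv_output_size(5, 3, 1))  # 3
print(conv_output_size(5, 3, 2))  # 2
```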
Basic Components of a CNN
❖ Convolution Layer
❖ Pooling
❖ Flattening

(Figure: Input Image → Convolution → Pooling → Flattening → Neural Network)
Basic Components of a CNN
❖ Convolution Layer
➢ Instead of feeding the input image directly to a neural network, it is fed into convolution layers, which apply some number of image filters to the original image, producing what are called feature maps
➢ Each feature map may be extracting some different relevant features from the input image/map
➢ In a CNN’s training, the values of the filters are learnt so as to extract the most useful/relevant information
Convolution Layer
❖ We can view the joint processing of the various maps as processing
a stacked arrangement of planes using a three-dimensional filter
❖ The computation of the convolutional map at any location sums
the convolutional outputs at all planes

Convolution Layer
❖ Consider a convolution layer with 2 input channels of size 4x4 each and one output channel, with a filter size of 2x2 and a stride of 2. Determine the resulting output from this convolution operation for the given input values and the given filter values
Size of output channel = (4 − 2)/2 + 1 = 2, so the output is 2x2
1st value = (2*1 + 1*2 + 1*1 + 2*2) + (1*5 + 2*6 + 2*5 + 1*6) = 42
2nd value = (2*3 + 1*4 + 1*3 + 2*4) + (1*7 + 2*8 + 2*7 + 1*8) = 66
3rd value = (2*1 + 1*2 + 1*1 + 2*2) + (1*5 + 2*6 + 2*5 + 1*6) = 42
4th value = (2*3 + 1*4 + 1*3 + 2*4) + (1*7 + 2*8 + 2*7 + 1*8) = 66
Output: [[42, 66], [42, 66]]
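The full input and filter matrices are not shown in the text; the matrices below are a plausible reconstruction inferred from the products written out above, and the sketch sums the per-channel convolutions exactly as the example does:

```python
import numpy as np

# Reconstructed (not given explicitly) 4x4 input channels and 2x2 filter channels.
x1 = np.array([[1, 2, 3, 4]] * 4)           # input channel 1
x2 = np.array([[5, 6, 7, 8]] * 4)           # input channel 2
f1 = np.array([[2, 1], [1, 2]])             # filter channel 1
f2 = np.array([[1, 2], [2, 1]])             # filter channel 2

out = np.zeros((2, 2))                      # output size = (4 - 2)/2 + 1 = 2
for i in range(2):
    for j in range(2):
        r, c = 2 * i, 2 * j                 # stride of 2
        out[i, j] = np.sum(x1[r:r+2, c:c+2] * f1) + np.sum(x2[r:r+2, c:c+2] * f2)

print(out)   # [[42. 66.]
             #  [42. 66.]]
```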
Convolution Layer

❖ Filters always extend the full depth of the input volume
❖ Each input channel has its own filter channel
Convolution Layer
(Figures: the filter slides over the input volume, computing a dot product at each location to produce an activation map; a second, green filter produces a second activation map)
Convolution Layer
❖ For example, if we had six 5x5x3 filters, we’ll get six separate
activation maps
❖ We stack these up to get a “new image” of size 28x28x6

Convolution Layer
❖ If we have 3 input channels (N_I) of size 32x32 (N x N) and 6 output channels (N_O), using a filter of size 5x5 (M x M), then
➢ The number of needed filters = 6
➢ The total number of filter channels is
▪ 3 channels per filter x 6 filters = 18 filter channels, each of size 5x5, i.e., one filter channel for each input-output combination
➢ The number of neurons = 6
➢ The number of inputs to each neuron is 5x5x3 = 75
➢ The number of network parameters to be trained (ignoring biases) is
▪ 5x5x3x6 = 450
➢ For each convolution layer: number of trained network parameters, ignoring biases, = M × M × N_I × N_O
Convolution Layer
❖ Example: If we have 3 input channels of size 6x6 and 2 output channels, using a filter of size 4x4, then:
➢ The output size = (6 − 4)/1 + 1 = 3
➢ The number of needed filters is 2
➢ The number of needed filter channels is 2x3 = 6, each of size 4x4
➢ The number of neurons is 2
➢ The number of inputs to each neuron is 4x4x3 = 48
➢ The number of network parameters to be trained (ignoring biases) is 4x4x3x2 = 96
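Both examples can be verified with a small helper that applies the formulas above (the function name, and the assumption of stride 1 with no padding, are for illustration):

```python
def conv_layer_stats(n: int, n_i: int, n_o: int, m: int, stride: int = 1) -> dict:
    """Filter-channel, parameter, and output-size counts for one convolution layer."""
    return {
        "output_size": (n - m) // stride + 1,
        "filters": n_o,                      # one filter (neuron) per output channel
        "filter_channels": n_i * n_o,        # one filter channel per input-output pair
        "inputs_per_neuron": m * m * n_i,
        "parameters_no_bias": m * m * n_i * n_o,
    }

print(conv_layer_stats(n=32, n_i=3, n_o=6, m=5))  # 450 parameters, 75 inputs per neuron
print(conv_layer_stats(n=6,  n_i=3, n_o=2, m=4))  # 96 parameters, output size 3
```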
Pooling Layer
❖ Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial size of the Convolved Feature Map
❖ Pooling:
➢ Decreases the computational power required to process the data through dimensionality reduction
➢ Is useful for extracting dominant features which are rotationally and positionally invariant
Pooling
❖ There are two types of Pooling:
➢ Max Pooling returns the maximum value from the portion of the image covered by the filter
➢ Average Pooling returns the average of all the values from the portion of the image covered by the kernel
➢ Max Pooling performs a lot better than Average Pooling since it discards the noisy activations altogether, performing de-noising along with dimensionality reduction
Max and Mean/Average Pooling
❖ There are no parameters to be learned in the pooling layer
❖ The number of output channels in a pooling layer is the same as the number of input channels of the pooling layer
(Figure: examples of Max Pooling and Mean Pooling)
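A minimal NumPy sketch of 2x2 max and mean pooling with a stride of 2 on a single channel (the values are arbitrary):

```python
import numpy as np

x = np.array([[1, 3, 2, 9],
              [5, 6, 1, 7],
              [4, 2, 8, 0],
              [3, 1, 6, 5]], dtype=float)

# Group the 4x4 map into non-overlapping 2x2 blocks, then reduce each block.
blocks = x.reshape(2, 2, 2, 2).swapaxes(1, 2)   # (block row, block col, 2, 2)
print(blocks.max(axis=(2, 3)))                  # max pooling  -> [[6. 9.] [4. 8.]]
print(blocks.mean(axis=(2, 3)))                 # mean pooling -> [[3.75 4.75] [2.5 4.75]]
```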
Flattening
❖ An image is nothing but a matrix of pixel values
❖ Flattening refers to turning a higher-dimensional array into a one-dimensional array, as shown
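In code, flattening is just a reshape of the (possibly multi-channel) feature maps into a single vector, for example:

```python
import numpy as np

feature_maps = np.arange(16).reshape(2, 2, 4)   # e.g., 2 channels of 2x4 values
flat = feature_maps.reshape(-1)                 # one-dimensional vector of 16 values
print(flat.shape)                               # (16,)
```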
CNN Structure
❖ Multiple layers of Convolution + Pooling are often used
➢ In each layer/step, different types of features may be learnt/extracted
CNN Learned Features

❖ Initial convolution layers learn low-level features (e.g., edges)
❖ Intermediate layers learn mid-level features (e.g., object parts, shapes)
❖ Deeper convolution layers learn higher-level features (e.g., objects)
CNN Learned Features
This figure shows visualizations of feature maps at different layers in a CNN
The final output at the rightmost indicates the fully connected layer's output, i.e., the probability scores for the respective classes
In this example, the "car" class scores the highest and hence the CNN will classify the given image as a "car" object
CNN Structure
❖ Convolution, pooling, flattening and then feeding to often a fully
connected feed forward neural network for classification

CNN Example: LeNet-5
(Figure: LeNet-5 architecture diagram)
❖ Digit recognition on MNIST
(32x32 images)
➢ Conv1: 6 5x5 filters in first conv layer,
stride 1 => 6 28x28 maps
➢ Pool1: 2x2 max pooling, stride 2 => 6
14x14 maps
➢ Conv2: 16 5x5 filters in second conv
layer, stride 1 => 16 10x10 maps
➢ Pool2: 2x2 max pooling with stride 2
for second conv layer =>16 5x5 maps
(400 values in all)
➢ FC: Final MLP: 3 layers: 120 neurons,
84 neurons, and finally 10 output
neurons

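A hedged PyTorch sketch of this architecture, following the layer sizes listed above; the max pooling follows the slide, while the ReLU activations are a modern substitution (the original LeNet-5 used tanh/sigmoid-style units):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # Conv1: 32x32x1 -> 6 maps of 28x28
            nn.ReLU(),
            nn.MaxPool2d(2, 2),               # Pool1: -> 6 maps of 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # Conv2: -> 16 maps of 10x10
            nn.ReLU(),
            nn.MaxPool2d(2, 2),               # Pool2: -> 16 maps of 5x5 (400 values)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(400, 120),              # FC: 120 neurons
            nn.ReLU(),
            nn.Linear(120, 84),               # FC: 84 neurons
            nn.ReLU(),
            nn.Linear(84, num_classes),       # 10 output neurons (digit classes)
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# A random 32x32 grayscale batch produces 10 class scores per image.
print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```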
Example: Detecting Flower in an Image
❖ Each neuron scans the image and “redraws” the input with some
features enhanced
➢ The specific features that the neuron detects

Example: Detecting Flower in an Image
❖ The first layer looks at small sub regions of the input image; sufficient to detect, say,
petals and enhances those
❖ The second layer looks at regions of output of the first layer
➢ To put the petals together into a flower
➢ This corresponds to looking at a larger region of original input image
❖ We may have any number of layers in this fashion

Example: Detecting Flower in an Image
❖ The rectangular maps of the neurons in the final layer of the
scanning network will generally be reorganized into a vector before
passing them to the final softmax or MLP
❖ The flower will be detected regardless of its position in the image

CNN Demo

❖ CNN Explainer

❖ Demo Video

❖ https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.
html

Summary
❖ The convolutional neural network is a supervised version of a
computational model of mammalian vision
❖ It includes
➢ Convolutional layers comprising learned filters that scan the
outputs of the previous layer
➢ Downsampling layers that operate over groups of outputs from
the convolutional layer to reduce network size
❖ The parameters of the network can be learned through regular
back propagation

Acknowledgments
❖ Slides have been used from:
➢ https://deeplearning.cs.cmu.edu/F20/index.html
➢ http://cs231n.stanford.edu/syllabus.html
➢ https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
