Chapter 7. Object Recognition
Hà Nội, 2021 1
Chapter 7. Object Recognition
❖1. Introduction
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 2
1. Introduction
▪ Object recognition: localizing and classifying objects in images.
▪ General concept:
➢ training datasets contain images with known, labelled objects;
➢ the chosen algorithm extracts different types of information (colours, edges, geometric forms) from these images;
➢ for any new image, the same information is gathered and compared to the training dataset to find the most suitable classification.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 3
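A minimal sketch of this train-then-compare idea, assuming the images have already been reduced to feature vectors; the vectors, labels and the use of scikit-learn's k-nearest-neighbour classifier are illustrative, not part of the original slides.

```python
# Minimal sketch of the train-then-compare idea (hypothetical feature vectors).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Pretend each training image has already been reduced to a feature vector
# (colours, edge statistics, geometric measurements, ...).
train_features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
train_labels = ["car", "car", "pedestrian", "pedestrian"]

classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(train_features, train_labels)

# For a new image, gather the same kind of information and pick the closest class.
new_feature = np.array([[0.85, 0.15]])
print(classifier.predict(new_feature))   # -> ['car']
```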
1. Introduction
▪ Applications:
➢ robots in industrial environments,
➢ face or handwriting recognition
➢ autonomous systems such as modern cars, which use object recognition for pedestrian detection, emergency brake assist and so on,
➢…
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 4
1. Introduction
▪ General Object Recognition Strategies: Appearance-based Method
➢ Typical applications: face or handwriting recognition
➢ Uses reference training images
➢ This dataset is compressed to obtain a lower-dimensional subspace, also called the eigenspace.
➢ Parts of new input images are projected onto the eigenspace and the correspondence is then examined.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 6
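A small sketch of the eigenspace idea, assuming flattened grayscale training images and a plain SVD; the image size, the number of kept eigenvectors and the random data are illustrative only.

```python
# Sketch of the eigenspace (subspace) idea on hypothetical flattened images.
import numpy as np

train = np.random.rand(100, 64 * 64)          # 100 hypothetical 64x64 training images
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
eigenspace = Vt[:20]                          # keep the 20 strongest eigenvectors

def project(image_vec):
    """Project an image (or patch) onto the low-dimensional eigenspace."""
    return eigenspace @ (image_vec - mean)

# A new image is projected and compared to the projected training set.
coords_train = (train - mean) @ eigenspace.T
new_image = np.random.rand(64 * 64)
coords_new = project(new_image)
best_match = np.argmin(np.linalg.norm(coords_train - coords_new, axis=1))
```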
1. Introduction
▪ General Object Recognition Strategies: Feature-based Method
➢ Uses features that are characteristic of each object:
➢ colours, contour lines, geometric forms or edges.
➢ The basic concept of feature-based object recognition strategies is the following:
• every input image is searched for a specific type of feature;
• the detected features are then compared to a database containing models of the objects, in order to verify whether any objects are recognised.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 7
1. Introduction
A neural network containing one input layer, two hidden layers and one output layer.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 14
1. Introduction
▪ Performance Analysis
➢ Invariances and Robustness
➢ Complexity
➢ Reliability and Accuracy
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 18
1. Introduction
❖ Performance Analysis: Invariances and Robustness
▪ First, the algorithms are analysed to check which invariances they offer and what level of robustness they have.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 19
1. Introduction
❖ Performance Analysis: Complexity
▪ Second, the algorithms are compared with regard to complexity, especially in terms of computational load and memory usage.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 22
1. Introduction
❖ Performance Analysis: Reliability and Accuracy
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 26
Chapter 7. Object Recognition
❖2. Pattern Matching
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 28
2. Pattern Matching
❖ Template matching is a technique for finding the areas of an image that match (are similar to) a template image (patch).
❖ How does it work?
▪ We need two primary components:
▪ Source image (I): the image in which we expect to find a match to the template image.
▪ Template image (T): the patch image which will be compared to the source image.
▪ Our goal is to detect the highest-matching area.
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 29
2. Pattern Matching
❖ Template matching
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 30
2. Pattern Matching
❖ Template matching
▪ To identify the matching area, we have to compare the template image against the source
image by sliding it:
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 31
2. Pattern Matching
❖ Template matching
▪ By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how "good" or "bad" the match at that location is (or how similar the patch is to that particular area of the source image).
▪ For each location of T over I, you store the metric in the result matrix R. Each location (x,y) in R contains the match metric.
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 32
2. Pattern Matching
❖ Template matching
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 33
2. Pattern Matching
❖ Template matching
▪ The image above shows the result R of sliding the patch with the metric TM_CCORR_NORMED. The brightest locations indicate the highest matches. As you can see, the location marked by the red circle is probably the one with the highest value, so that location (the rectangle formed by that point as a corner, with width and height equal to the patch image) is considered the match.
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 34
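A short sketch of this workflow with OpenCV's matchTemplate, using the TM_CCORR_NORMED metric mentioned above; the file names are placeholders.

```python
# Sketch of OpenCV template matching (file names are placeholders).
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)       # source image I
templ = cv2.imread("patch.png", cv2.IMREAD_GRAYSCALE)     # template image T

# Slide T over I and store the match metric at every location in R.
R = cv2.matchTemplate(img, templ, cv2.TM_CCORR_NORMED)

# For TM_CCORR_NORMED the best match is the maximum of R
# (for TM_SQDIFF-style metrics it would be the minimum).
_, max_val, _, max_loc = cv2.minMaxLoc(R)
h, w = templ.shape
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, 255, 2)
cv2.imwrite("result.png", img)
```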
2. Pattern Matching
❖ Template matching
▪ Which matching methods are available in OpenCV? There are six: TM_SQDIFF, TM_CCORR and TM_CCOEFF, each in a plain and a _NORMED (normalised) variant.
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 35
2. Pattern Matching
❖ Template matching
▪ Which are the matching methods available in OpenCV?
https://docs.opencv.org/4.3.0/de/da9/tutorial_template_matching.html 36
Chapter 7. Object Recognition
❖3. Feature-based Methods
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 37
3. Feature-based Methods
▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching
38
3. Feature-based Methods
❖ Feature detectors
Image pairs with extracted patches below. Notice how some patches
can be localized or matched with higher accuracy than others.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 39
3. Feature-based Methods
❖ Feature detectors
▪ The simplest possible matching criterion for comparing two image patches is the weighted summed square difference,
$E_{\mathrm{WSSD}}(\mathbf{u}) = \sum_i w(\mathbf{x}_i)\,[I_1(\mathbf{x}_i + \mathbf{u}) - I_0(\mathbf{x}_i)]^2$,
where $I_0$ and $I_1$ are the two images being compared, $\mathbf{u} = (u, v)$ is the displacement vector, $w(\mathbf{x})$ is a spatially varying weighting (or window) function, and the summation $i$ is over all the pixels in the patch.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 40
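A small numpy sketch of this criterion, assuming grayscale arrays, an odd-sized window w centred at (x0, y0), and a patch that stays inside both images.

```python
import numpy as np

def wssd(I0, I1, x0, y0, u, v, w):
    """E_WSSD = sum_i w(x_i) * (I1(x_i + u) - I0(x_i))**2 over a patch centred at (x0, y0)."""
    r = w.shape[0] // 2                      # patch "radius" from the (odd) window size
    p0 = I0[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
    p1 = I1[y0 + v - r:y0 + v + r + 1, x0 + u - r:x0 + u + r + 1]
    return np.sum(w * (p1 - p0) ** 2)

# Example: a flat 5x5 weighting window, comparing a patch at (20, 20) under displacement (1, 0).
I0 = np.random.rand(40, 40)
I1 = np.roll(I0, 1, axis=1)                  # I1 is I0 shifted right by one pixel
w = np.ones((5, 5)) / 25.0
print(wssd(I0, I1, 20, 20, 1, 0, w))         # close to 0 for the correct displacement
```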
3. Feature-based Methods
❖ Feature detectors
Aperture problems for different image patches: (a) stable (“corner-like”) flow; (b) classic aperture problem
(barber-pole illusion); (c) textureless region. The two images I0 (yellow) and I1 (red) are overlaid. The red
vector u indicates the displacement between the patch centers and the w(xi) weighting function (patch
window) is shown as a dark circle.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 41
3. Feature-based Methods
❖ Feature detectors
▪ auto-correlation function or surface
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 42
3. Feature-based Methods
❖ Feature detectors
▪ auto-correlation function or surface
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 43
3. Feature-based Methods
❖ Feature detectors
▪ Förstner–Harris
Interest operator responses: (a) Sample image, (b) Harris response, and (c) DoG response. The circle sizes
and colors indicate the scale at which each interest point was detected. Notice how the two detectors
tend to respond at complementary locations.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 44
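A brief sketch of computing the Harris interest operator response with OpenCV's cornerHarris; the parameter values and the threshold are typical choices, not prescribed by the slides.

```python
# Sketch: Harris corner response with OpenCV (file name is a placeholder).
import cv2
import numpy as np

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# blockSize: neighbourhood for the auto-correlation matrix, ksize: Sobel aperture,
# k: Harris detector free parameter.
response = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Keep strong local responses as interest points.
corners = np.argwhere(response > 0.01 * response.max())
```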
3. Feature-based Methods
❖ Feature detectors
▪ Adaptive non-maximal suppression (ANMS)
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 45
3. Feature-based Methods
❖ Feature detectors
▪ Scale invariance
Multi-scale oriented patches (MOPS) extracted at five pyramid levels (Brown, Szeliski, and
Winder 2005). The boxes show the feature orientation and the region from which the
descriptor vectors are sampled.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 46
3. Feature-based Methods
❖ Feature detectors
▪ Scale invariance
Scale-space feature detection using a sub-octave Difference of Gaussian pyramid (Lowe 2004): (a) Adjacent
levels of a sub-octave Gaussian pyramid are subtracted to produce Difference of Gaussian images; (b) extrema
(maxima and minima) in the resulting 3D volume are detected by comparing a pixel to its 26 neighbors.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 47
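A rough sketch of building a sub-octave Difference-of-Gaussian stack with OpenCV; the base sigma, scale step and number of levels are illustrative, not Lowe's exact settings.

```python
# Sketch of a Difference-of-Gaussian scale space (sigma schedule is illustrative).
import cv2
import numpy as np

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

k = 2 ** 0.5                                   # sub-octave scale step
sigmas = [1.6 * k ** i for i in range(5)]
gaussians = [cv2.GaussianBlur(gray, (0, 0), s) for s in sigmas]

# Adjacent Gaussian levels are subtracted to form the DoG images;
# extrema are then sought across space and scale (26 neighbours per pixel).
dogs = [g1 - g0 for g0, g1 in zip(gaussians[:-1], gaussians[1:])]
```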
3. Feature-based Methods
❖ Feature detectors
▪ Rotational invariance and orientation estimation
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 48
3. Feature-based Methods
❖ Feature detectors
▪ Rotational invariance and orientation estimation
Affine region detectors used to match two images taken from dramatically different viewpoints
(Mikolajczyk and Schmid 2004)
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 49
3. Feature-based Methods
❖ Feature detectors
▪ Affine invariance
Affine normalization using the second moment matrices, as described by Mikolajczyk, Tuytelaars, Schmid et al. (2005): after image coordinates are transformed using the matrices $A_0^{-1/2}$ and $A_1^{-1/2}$, they are related by a pure rotation R, which can be estimated using a dominant orientation technique.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 50
3. Feature-based Methods
❖ Feature detectors
▪ Affine invariance
Maximally stable extremal regions (MSERs) extracted and matched from a number of images
(Matas, Chum, Urban et al. 2004)
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 51
3. Feature-based Methods
▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching
52
3. Feature-based Methods
❖ Feature descriptors
Feature matching: how can we extract local descriptors that are invariant to inter-image variations and yet
still discriminative enough to establish correct correspondences?
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 53
3. Feature-based Methods
❖ Feature descriptors
▪ Bias and gain normalization (MOPS)
MOPS descriptors are formed using an 8×8 sampling of bias and gain normalized intensity values, with a
sample spacing of five pixels relative to the detection scale (Brown, Szeliski, and Winder 2005). This low
frequency sampling gives the features some robustness to interest point location error and is achieved by
sampling at a higher pyramid level than the detection scale.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 54
3. Feature-based Methods
❖ Feature descriptors
▪ Scale invariant feature transform (SIFT)
A schematic representation of Lowe’s
(2004) scale invariant feature transform
(SIFT): (a) Gradient orientations and
magnitudes are computed at each pixel
and weighted by a Gaussian fall-off
function (blue circle). (b) A weighted
gradient orientation histogram is then
computed in each subregion, using trilinear
interpolation. While this figure shows an 8
× 8 pixel patch and a 2 × 2 descriptor array,
Lowe’s actual implementation uses 16 × 16
patches and a 4 × 4 array of eight-bin
histograms.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 55
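A minimal sketch of extracting SIFT keypoints and descriptors with OpenCV; it assumes OpenCV ≥ 4.4 (where SIFT_create is available) and a placeholder file name.

```python
# Sketch: detecting SIFT keypoints and computing descriptors with OpenCV.
import cv2

gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
# Each descriptor is a 128-dimensional vector (4 x 4 subregions x 8 orientation bins).
print(len(keypoints), descriptors.shape)
```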
3. Feature-based Methods
❖ Feature descriptors
▪ Gradient location-orientation histogram (GLOH)
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 56
3. Feature-based Methods
❖ Feature descriptors
Spatial summation blocks for SIFT, GLOH, and some newly developed feature descriptors (Winder and Brown 2007): (a)
The parameters for the new features, e.g., their Gaussian weights, are learned from a training database of (b) matched
real-world image patches obtained from robust structure from motion applied to Internet photo collections (Hua, Brown,
and Winder 2007).
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 57
3. Feature-based Methods
▪ Feature Detectors
▪ Feature Descriptors
▪ Feature Matching
58
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
Recognizing objects in a cluttered scene (Lowe 2004). Two of the training images in the database are shown on the left.
These are matched to the cluttered scene in the middle using SIFT features, shown as small squares in the right image.
The affine warp of each recognized database image onto the scene is shown as a larger parallelogram in the right image.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 59
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 60
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
The number of matches correctly and incorrectly estimated by a feature matching algorithm, showing the number of
true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The columns sum up to the actual
number of positives (P) and negatives (N), while the rows sum up to the predicted number of positives (P’) and
negatives (N’). The formulas for the true positive rate (TPR), the false positive rate (FPR), the positive predictive value
(PPV), and the accuracy (ACC) are given in the text.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 61
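A tiny sketch of the rates named in the caption, computed from raw confusion counts; the example numbers are made up.

```python
# The standard definitions of the rates described above.
def matching_rates(tp, fp, fn, tn):
    tpr = tp / (tp + fn)                    # true positive rate (recall)
    fpr = fp / (fp + tn)                    # false positive rate
    ppv = tp / (tp + fp)                    # positive predictive value (precision)
    acc = (tp + tn) / (tp + fp + fn + tn)   # accuracy
    return tpr, fpr, ppv, acc

print(matching_rates(tp=18, fp=4, fn=2, tn=76))
```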
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 62
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
ROC curve and its related rates: (a) The ROC curve plots the true positive rate against the false positive rate for a particular combination of feature extraction and matching algorithms. Ideally, the true positive rate should be close to 1, while the false positive rate is close to 0. The area under the ROC curve (AUC) is often used as a single (scalar) measure of algorithm performance. Alternatively, the equal error rate is sometimes used. (b) The distribution of positives (matches) and negatives (non-matches) as a function of inter-feature distance d. As the threshold θ is increased, the number of true positives (TP) and false positives (FP) increases.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 63
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 64
3. Feature-based Methods
❖ Feature matching
▪ Matching strategy and error rates
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 65
3. Feature-based Methods
❖ Feature matching
▪ Nearest neighbor distance ratio:
$\mathrm{NNDR} = d_1 / d_2 = \|D_A - D_B\| \,/\, \|D_A - D_C\|$,
where $d_1$ and $d_2$ are the nearest and second nearest neighbor distances, $D_A$ is the target descriptor, and $D_B$ and $D_C$ are its closest two neighbors.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 66
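A short sketch of applying this ratio test with OpenCV's brute-force matcher; the file names are placeholders and the 0.8 threshold is Lowe's commonly quoted suggestion, not a value taken from the slides.

```python
# Sketch: NNDR (ratio test) matching of SIFT descriptors between two images.
import cv2

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
_, des1 = sift.detectAndCompute(img1, None)
_, des2 = sift.detectAndCompute(img2, None)

bf = cv2.BFMatcher(cv2.NORM_L2)
candidates = bf.knnMatch(des1, des2, k=2)      # d1 and d2 for each target descriptor D_A

# Keep a match only when the nearest neighbour is clearly better than the second nearest.
good = [m1 for m1, m2 in candidates if m1.distance / m2.distance < 0.8]
```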
3. Feature-based Methods
❖ Feature matching
Performance of the feature descriptors evaluated by Mikolajczyk and Schmid (2005), shown for three matching
strategies: (a) fixed threshold; (b) nearest neighbor; (c) nearest neighbor distance ratio (NNDR). Note how the
ordering of the algorithms does not change that much, but the overall performance varies significantly between the
different matching strategies.
Richard Szeliski, Computer Vision Algorithms and Applications, Springer-Verlag London Limited 2011. 67
Chapter 7. Object Recognition
❖4. Artificial Neural Networks
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 68
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network
▪ (Deep) convolutional neural networks (CNNs): the term deep means that there is at least one hidden layer, and convolutional implies the use of convolution layers. The basic principles of CNNs are inspired by the biological visual cortex of humans.
▪ The architecture of an example CNN can be seen on Slide 70. Input images with 28x28 pixels are convolved with a filter to obtain 3D feature maps. The succeeding sub-sampling (often called pooling) layer further reduces the amount of data. This procedure is continued until a one-dimensional vector, which represents the different classes, is obtained.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 69
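A compact sketch of such a 28x28 pipeline, written in PyTorch as one possible framework; the number of filters and layer sizes are illustrative and not the exact architecture shown on Slide 70.

```python
# Sketch: convolution + pooling stages ending in a 1D class-score vector.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # 1x28x28 -> 8x28x28 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 8x28x28 -> 8x14x14 (sub-sampling)
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # 8x14x14 -> 16x14x14
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x14x14 -> 16x7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # one-dimensional vector of class scores
)

scores = model(torch.randn(1, 1, 28, 28))        # -> shape (1, 10)
```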
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network
One example architecture of a convolutional neural network using subsampling and convolution hidden layers.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 70
4. Artificial Neural Networks
❖ CNN - Convolutional Neural Network
Intermediate results from hidden layers. From left to right: low-level, mid-level and high-level features.
Simon Achatz, State of the art of object recognition techniques, Technische Universität München. 72
4. Artificial Neural Networks
A ConvNet arranges its neurons in three dimensions (width, height, depth), as visualized in one of the layers.
Every layer of a ConvNet transforms the 3D input volume to a 3D output volume of neuron activations. In this
example, the red input layer holds the image, so its width and height would be the dimensions of the image,
and the depth would be 3 (Red, Green, Blue channels).
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 75
4. Artificial Neural Networks
➢ Convolutional Layer
▪ Stride
Stride of 1 with a 3x3 filter on a 7x7 image; stride of 2 with a 3x3 filter on a 7x7 image.
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 86
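The output width follows (N − F)/stride + 1 for an NxN image and FxF filter with no padding; a quick check of the 7x7 example above.

```python
# Output size of a convolution without padding.
def conv_output_size(n, f, stride):
    assert (n - f) % stride == 0, "filter does not fit evenly"
    return (n - f) // stride + 1

print(conv_output_size(7, 3, 1))   # 5: stride 1, 3x3 filter on a 7x7 image
print(conv_output_size(7, 3, 2))   # 3: stride 2, 3x3 filter on a 7x7 image
```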
4. Artificial Neural Networks
➢ Convolutional Layer
▪ Stride
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 87
4. Artificial Neural Networks
➢ Convolutional Layer
▪ Padding (Zero-padding)
• same convolution: preserves the dimensions of the image
• wide convolution: adds zero-padding
• narrow convolution: uses no zero-padding
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 89
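A small sketch of "same" zero-padding, assuming an odd filter size F so that P = (F − 1)/2 is an integer.

```python
# "Same" convolution: pad with (F - 1) / 2 zeros so the spatial size is preserved.
import numpy as np

def pad_same(image, f):
    p = (f - 1) // 2                       # assumes an odd filter size F
    return np.pad(image, p, mode="constant")

img = np.random.rand(7, 7)
print(pad_same(img, 3).shape)              # (9, 9): a 3x3 filter then gives 7x7 back
```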
4. Artificial Neural Networks
➢ Convolutional Layer
▪ Number of filter (depth of next layer)
• Example: a 6x6x3 image with four 3x3 filters.
• After convolving, we get a 4x4xn output, where n depends on the number of filters (in other words, the number of feature detectors) used. In this case, n is 4.
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 90
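The same 6x6x3, four-filter example checked with PyTorch shapes; PyTorch is used here only as a convenient way to verify the dimensions.

```python
# Four 3x3 filters over a 3-channel 6x6 image give a 4x4x4 output volume.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)  # four 3x3(x3) filters
x = torch.randn(1, 3, 6, 6)                                     # one 6x6x3 image
print(conv(x).shape)                                            # torch.Size([1, 4, 4, 4])
```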
4. Artificial Neural Networks
➢ Convolutional Layer
▪ Size of the filter
• The filter size is usually an odd number, so that the filter has a "central pixel"/"central vision" that marks its position.
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 91
4. Artificial Neural Networks
➢ Activation Function
▪ ReLU Activation Function
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 96
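A one-line sketch of ReLU, f(x) = max(0, x), in numpy.

```python
# ReLU simply clamps negative activations to zero.
import numpy as np

def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # [0.  0.  0.  1.5]
```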
4. Artificial Neural Networks
Max-pooling on a 2D image.
Stanford University CS231n: Convolutional Neural Networks for Visual Recognition, 2020 97
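A pure-numpy sketch of 2x2 max-pooling with stride 2 on a made-up 4x4 feature map.

```python
# 2x2 max-pooling with stride 2: keep the maximum of each 2x2 block.
import numpy as np

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 7],
              [8, 9, 4, 3],
              [6, 5, 2, 1]])

pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 7]
                #  [9 4]]
```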