Object Detection Using CNN

Lecture 9 discusses object detection and image segmentation, focusing on semantic segmentation as a core task in computer vision. It introduces various methods for semantic segmentation, including sliding window techniques and fully convolutional networks, which aim to classify each pixel in an image. The lecture also covers the importance of downsampling and upsampling in network design to maintain spatial resolution while processing images.

Lecture 9: Object Detection and Image Segmentation

Fei-Fei Li, Jiajun Wu, Ruohan Gao Lecture 9 - 1 April 26, 2022
Image Classification: A core task in Computer Vision

(assume given a set of possible labels)

{dog, cat, truck, plane, ...}

cat

This image by Nikita is licensed under CC-BY 2.0

Computer Vision Tasks

Classification: CAT. No spatial extent.
Semantic Segmentation: GRASS, CAT, TREE, SKY. No objects, just pixels.
Object Detection: DOG, DOG, CAT. Multiple objects.
Instance Segmentation: DOG, DOG, CAT. Multiple objects.

This image is CC0 public domain

Semantic Segmentation: The Problem

Paired training data: for each training image, each pixel is labeled with a semantic category (GRASS, CAT, TREE, SKY, ...).

At test time, classify each pixel of a new image.

Semantic Segmentation Idea: Sliding Window

A single pixel of the full image is impossible to classify without context.

Q: how do we include context?
Q: how do we model this?

Extract a patch around each pixel of the full image and classify the center pixel with a CNN (e.g. Cow, Cow, Grass).

Problem: Very inefficient! Not reusing shared features between overlapping patches.

Farabet et al, “Learning Hierarchical Features for Scene Labeling,” TPAMI 2013
Pinheiro and Collobert, “Recurrent Convolutional Neural Networks for Scene Labeling”, ICML 2014

Semantic Segmentation Idea: Convolution

An intuitive idea: encode the entire image with a conv net, and do semantic segmentation on top.

Problem: classification architectures often reduce feature spatial sizes to go deeper, but semantic segmentation requires the output size to be the same as the input size.

Semantic Segmentation Idea: Fully Convolutional

Design a network with only convolutional layers without downsampling operators to make predictions for pixels all at once!

Input: 3 x H x W -> Conv -> Conv -> Conv -> Conv (Convolutions: D x H x W) -> Scores: C x H x W -> argmax -> Predictions: H x W

Problem: convolutions at original image resolution will be very expensive ...

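The last step of this pipeline, turning C x H x W scores into H x W predictions with an argmax over the class dimension, can be sketched in plain Python. The function name and the toy score values are made up for illustration; a real network would produce the scores.

```python
def predict_labels(scores):
    """Turn C x H x W class scores into an H x W map of class indices."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]

# Toy scores for C = 2 classes on a 2 x 2 image (values are made up).
scores = [
    [[0.9, 0.2], [0.4, 0.1]],  # class 0
    [[0.1, 0.8], [0.6, 0.9]],  # class 1
]
labels = predict_labels(scores)  # [[0, 1], [1, 1]]
```
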
Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: ???

Input: 3 x H x W -> High-res: D1 x H/2 x W/2 -> Med-res: D2 x H/4 x W/4 -> Low-res: D3 x H/4 x W/4 -> Med-res: D2 x H/4 x W/4 -> High-res: D1 x H/2 x W/2 -> C x H x W -> Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

In-Network upsampling: “Unpooling”

Nearest Neighbor

Input: 2 x 2
1 2
3 4

Output: 4 x 4
1 1 2 2
1 1 2 2
3 3 4 4
3 3 4 4

“Bed of Nails”

Input: 2 x 2
1 2
3 4

Output: 4 x 4
1 0 2 0
0 0 0 0
3 0 4 0
0 0 0 0

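Both unpooling variants are simple enough to sketch in plain Python. The function names and the `factor` parameter are illustrative choices, not part of the lecture.

```python
def nearest_neighbor_unpool(grid, factor=2):
    """Copy each input value into a factor x factor block of the output."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def bed_of_nails_unpool(grid, factor=2):
    """Put each input value in the top-left corner of its block, zeros elsewhere."""
    h, w = len(grid), len(grid[0])
    out = [[0] * (w * factor) for _ in range(h * factor)]
    for i in range(h):
        for j in range(w):
            out[i * factor][j * factor] = grid[i][j]
    return out
```

Calling either function on the 2 x 2 input `[[1, 2], [3, 4]]` gives the corresponding 4 x 4 output from the slide.
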
In-Network upsampling: “Max Unpooling”

Max Pooling: Remember which element was max!

Input: 4 x 4
1 2 6 3
3 5 2 1
1 2 2 1
7 3 4 8

Output: 2 x 2
5 6
7 8

… Rest of the network …

Max Unpooling: Use positions from pooling layer

Input: 2 x 2
1 2
3 4

Output: 4 x 4
0 0 2 0
0 1 0 0
0 0 0 0
3 0 0 4

Corresponding pairs of downsampling and upsampling layers

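A minimal sketch of the pooling and unpooling pair, assuming 2 x 2 windows with stride 2 and even input sizes. The function names are hypothetical. Ties inside a window are broken by tuple comparison, which is an arbitrary choice.

```python
def max_pool_2x2(grid):
    """2 x 2 max pooling with stride 2. Returns pooled values and argmax positions."""
    pooled, positions = [], []
    for i in range(0, len(grid), 2):
        pooled_row, pos_row = [], []
        for j in range(0, len(grid[0]), 2):
            window = [(grid[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, pos = max(window)  # remember which element was max
            pooled_row.append(value)
            pos_row.append(pos)
        pooled.append(pooled_row)
        positions.append(pos_row)
    return pooled, positions


def max_unpool_2x2(values, positions, out_h, out_w):
    """Write each value back at the position remembered by the pooling layer."""
    out = [[0] * out_w for _ in range(out_h)]
    for value_row, pos_row in zip(values, positions):
        for value, (r, c) in zip(value_row, pos_row):
            out[r][c] = value
    return out


grid = [[1, 2, 6, 3], [3, 5, 2, 1], [1, 2, 2, 1], [7, 3, 4, 8]]
pooled, positions = max_pool_2x2(grid)
unpooled = max_unpool_2x2([[1, 2], [3, 4]], positions, 4, 4)
```
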
Learnable Upsampling

Recall: Normal 3 x 3 convolution, stride 1 pad 1
Input: 4 x 4, Output: 4 x 4
Dot product between filter and input

Recall: Normal 3 x 3 convolution, stride 2 pad 1
Input: 4 x 4, Output: 2 x 2
Dot product between filter and input
Filter moves 2 pixels in the input for every one pixel in the output
Stride gives ratio between movement in input and output

We can interpret strided convolution as “learnable downsampling”.

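The input and output sizes quoted for the two convolutions follow from the standard output-size formula for a convolution along one dimension. The helper below is a small illustration; its name is made up.

```python
def conv_output_size(in_size, kernel, stride, pad):
    """Output length of a convolution along one spatial dimension."""
    return (in_size + 2 * pad - kernel) // stride + 1


stride_1 = conv_output_size(4, kernel=3, stride=1, pad=1)  # 4
stride_2 = conv_output_size(4, kernel=3, stride=2, pad=1)  # 2
```
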
Learnable Upsampling: Transposed Convolution

3 x 3 transposed convolution, stride 2 pad 1
Input: 2 x 2, Output: 4 x 4

Input gives weight for filter
Sum where output overlaps
Filter moves 2 pixels in the output for every one pixel in the input
Stride gives ratio between movement in output and input

Q: Why is it called transposed convolution?

Learnable Upsampling: 1D Example

Input: (a, b)
Filter: (x, y, z)
Output: (ax, ay, az + bx, by, bz)

Output contains copies of the filter weighted by the input, summing where they overlap in the output.

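The 1D example can be checked numerically with a short sketch. It assumes stride 2 and no cropping of the output; the function name and the numbers standing in for (a, b) and (x, y, z) are arbitrary.

```python
def transposed_conv1d(inputs, kernel, stride=2):
    """Each input scales a copy of the kernel; overlapping copies are summed."""
    out = [0] * ((len(inputs) - 1) * stride + len(kernel))
    for i, value in enumerate(inputs):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# (a, b) = (2, 3) and (x, y, z) = (1, 10, 100)
result = transposed_conv1d([2, 3], [1, 10, 100])  # [2, 20, 203, 30, 300]
```

The middle entry 203 is az + bx = 2 * 100 + 3 * 1, the one position where the two filter copies overlap.
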
Convolution as Matrix Multiplication (1D Example)

We can express convolution in terms of a matrix multiplication.
Example: 1D conv, kernel size=3, stride=2, padding=1

Transposed convolution multiplies by the transpose of the same matrix:
Example: 1D transposed conv, kernel size=3, stride=2, padding=0

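A sketch of this matrix view, under the assumption that the convolution matrix acts on the zero-padded input. All helper names are illustrative, and the numbers stand in for the filter (x, y, z) and the input (a, b, c, d).

```python
def conv_matrix(kernel, padded_len, stride):
    """One row per output position: the kernel shifted right by `stride` each row."""
    rows = []
    for start in range(0, padded_len - len(kernel) + 1, stride):
        row = [0] * padded_len
        row[start:start + len(kernel)] = kernel
        rows.append(row)
    return rows


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


kernel = [1, 10, 100]          # (x, y, z)
padded = [0, 1, 2, 3, 4, 0]    # (a, b, c, d) with padding=1
X = conv_matrix(kernel, len(padded), stride=2)

conv_out = matvec(X, padded)                   # (ay + bz, bx + cy + dz)
transposed_out = matvec(transpose(X), [2, 3])  # (ax, ay, az + bx, by, bz, 0) for (a, b) = (2, 3)
```

With stride 2 the transpose is no longer a normal convolution: it spreads each input over the output, which is why it upsamples.
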
Semantic Segmentation Idea: Fully Convolutional

Design network as a bunch of convolutional layers, with downsampling and upsampling inside the network!

Downsampling: Pooling, strided convolution
Upsampling: Unpooling or strided transposed convolution

Input: 3 x H x W -> High-res: D1 x H/2 x W/2 -> Med-res: D2 x H/4 x W/4 -> Low-res: D3 x H/4 x W/4 -> Med-res: D2 x H/4 x W/4 -> High-res: D1 x H/2 x W/2 -> Predictions: H x W

Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation”, CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015

Semantic Segmentation: Summary

Label each pixel in the image with a category label (e.g. Sky, Trees, Cat, Cow, Grass).

Don’t differentiate instances, only care about pixels.

This image is CC0 public domain

Object Detection
Object Detection: Single Object
(Classification + Localization)

Vector: 4096, feeding two fully connected heads:

- Class Scores: Fully Connected, 4096 to 1000 (Cat: 0.9, Dog: 0.05, Car: 0.01, ...). Softmax Loss against the correct label: Cat
- Box Coordinates (x, y, w, h): Fully Connected, 4096 to 4. L2 Loss against the correct box: (x’, y’, w’, h’)

Treat localization as a regression problem!

Multitask Loss: Softmax Loss + L2 Loss

This image is CC0 public domain

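A minimal sketch of the multitask loss for one image: a softmax (cross-entropy) loss on the class scores plus an L2 loss on the box. The slide simply adds the two terms; the `box_weight` parameter is a hypothetical knob added here for illustration, and all function names are made up.

```python
import math


def softmax_cross_entropy(scores, correct_index):
    """Negative log-probability of the correct class under a softmax."""
    shift = max(scores)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(s - shift) for s in scores))
    return -(scores[correct_index] - shift - log_sum)


def l2_loss(pred_box, true_box):
    """Squared distance between predicted and correct (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box))


def multitask_loss(scores, correct_index, pred_box, true_box, box_weight=1.0):
    # box_weight is an assumption of this sketch, not something stated in the lecture.
    return softmax_cross_entropy(scores, correct_index) + box_weight * l2_loss(pred_box, true_box)
```
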
Object Detection: Multiple Objects

Each image needs a different number of outputs!

CAT: (x, y, w, h) -> 4 numbers
DOG: (x, y, w, h), DOG: (x, y, w, h), CAT: (x, y, w, h) -> 12 numbers
DUCK: (x, y, w, h), DUCK: (x, y, w, h), …. -> Many numbers!

Object Detection: Multiple Objects

Apply a CNN to many different crops of the image, CNN classifies each crop as object or background.

Example crops:
- Dog? NO, Cat? NO, Background? YES
- Dog? YES, Cat? NO, Background? NO
- Dog? YES, Cat? NO, Background? NO
- Dog? NO, Cat? YES, Background? NO

Q: What’s the problem with this approach?

Problem: Need to apply CNN to huge number of locations, scales, and aspect ratios, very computationally expensive!

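To get a feel for how large the number of candidate crops is, one can count every axis-aligned box with integer pixel boundaries in an H x W image. This count is an illustration added here, not a figure from the lecture, and the function name is made up.

```python
def count_boxes(height, width):
    """Number of axis-aligned boxes with integer pixel boundaries in an H x W image.

    A box of height h has (H - h + 1) vertical placements, and summing over
    h = 1..H gives H * (H + 1) / 2. The same holds horizontally.
    """
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)


small = count_boxes(2, 2)      # 9
large = count_boxes(600, 800)  # tens of billions of candidate crops
```
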
Region Proposals: Selective Search
● Find “blobby” image regions that are likely to contain objects
● Relatively fast to run; e.g. Selective Search gives 2000 region proposals in a few seconds on CPU

Alexe et al, “Measuring the objectness of image windows”, TPAMI 2012
Uijlings et al, “Selective Search for Object Recognition”, IJCV 2013
Cheng et al, “BING: Binarized normed gradients for objectness estimation at 300fps”, CVPR 2014
Zitnick and Dollar, “Edge boxes: Locating object proposals from edges”, ECCV 2014

R-CNN (“Slow” R-CNN)

1. Input image
2. Regions of Interest (RoI) from a proposal method (~2k)
3. Warped image regions (224x224 pixels)
4. Forward each region through ConvNet (ImageNet-pretrained)
5. Classify regions with SVMs
6. Bbox reg: Predict “corrections” to the RoI: 4 numbers: (dx, dy, dw, dh)

Problem: Very slow! Need to do ~2k independent forward passes for each image!

Idea: Pass the image through convnet before cropping! Crop the conv feature instead!

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
Figure copyright Ross Girshick, 2015; source. Reproduced with permission.

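The slides do not spell out how the four “correction” numbers (dx, dy, dw, dh) are applied to an RoI. One common choice, assumed here, is the scale-invariant form described in the R-CNN paper: shift the center by a fraction of the box size and scale width and height in log space. The function name and the center-based box format are choices of this sketch.

```python
import math


def apply_box_correction(roi, delta):
    """roi = (cx, cy, w, h) with (cx, cy) the box center; delta = (dx, dy, dw, dh).

    Assumed parameterization: center shifts are relative to the box size,
    and width/height are scaled by exp(dw), exp(dh).
    """
    cx, cy, w, h = roi
    dx, dy, dw, dh = delta
    return (cx + w * dx, cy + h * dy, w * math.exp(dw), h * math.exp(dh))


unchanged = apply_box_correction((10, 20, 4, 8), (0, 0, 0, 0))      # all-zero correction keeps the RoI
shifted = apply_box_correction((10, 20, 4, 8), (0.5, -0.25, 0, 0))  # center moves to (12, 18)
```
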
Fast R-CNN

1. Input image
2. Run whole image through ConvNet (“Backbone” network: AlexNet, VGG, ResNet, etc) to get “conv5” features
3. Regions of Interest (RoIs) from a proposal method
4. Crop + Resize features
5. Per-Region Network (CNN)
6. Object category: Linear + softmax. Box offset: Linear

Shown for comparison: “Slow” R-CNN

Girshick, “Fast R-CNN”, ICCV 2015. Figure copyright Ross Girshick, 2015; source. Reproduced with permission.

Cropping Features: RoI Pool

Input Image (e.g. 3 x 640 x 480) -> CNN -> Image features: C x H x W (e.g. 512 x 20 x 15)

1. Project proposal onto features
2. “Snap” to grid cells
3. Divide into 2x2 grid of (roughly) equal subregions
4. Max-pool within each subregion

Q: how do we resize the 512 x 5 x 4 region to, e.g., a 512 x 2 x 2 tensor?

Region features (here 512 x 2 x 2; in practice e.g. 512 x 7 x 7)
Region features always the same size even if input regions have different sizes!

Girshick, “Fast R-CNN”, ICCV 2015.

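A simplified single-channel sketch of RoI Pool. The snapping rule (rounding to the nearest cell) and the way subregion boundaries are chosen are assumptions of this sketch; real implementations differ in such details. The function name is made up, and the RoI is assumed to be already projected into feature-map coordinates.

```python
def roi_pool(features, roi, out_size=2):
    """features: H x W nested list (one channel); roi: (x0, y0, x1, y1) in feature-map units."""
    # "Snap" the projected proposal to whole grid cells (rounding is an assumed choice).
    x0, y0, x1, y1 = (int(round(v)) for v in roi)
    # Boundaries of out_size roughly equal subregions per side.
    # Assumption: the snapped region spans at least out_size cells per side.
    xs = [x0 + (k * (x1 - x0)) // out_size for k in range(out_size + 1)]
    ys = [y0 + (k * (y1 - y0)) // out_size for k in range(out_size + 1)]
    # Max-pool within each subregion.
    return [
        [max(features[y][x]
             for y in range(ys[i], ys[i + 1])
             for x in range(xs[j], xs[j + 1]))
         for j in range(out_size)]
        for i in range(out_size)
    ]


# Toy 4 x 6 feature map whose value encodes its position: 10 * row + column.
features = [[10 * y + x for x in range(6)] for y in range(4)]
pooled = roi_pool(features, (0.8, 0.2, 5.9, 3.9))  # region snaps to 5 cells wide, 4 cells tall
```

The output is 2 x 2 regardless of the size of the input region, which is the point made on the slide.
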
Cropping Features: RoI Align

Input Image (e.g. 3 x 640 x 480) -> CNN -> Image features: C x H x W (e.g. 512 x 20 x 15)

1. Project proposal onto features. No “snapping”!
2. Sample at regular points in each subregion using bilinear interpolation
3. Max-pool within each subregion

Feature f_xy for point (x, y) is a linear combination of features at its four neighboring grid cells: f_11 ∈ R^512 at (x1, y1), f_21 ∈ R^512 at (x2, y1), f_12 ∈ R^512 at (x1, y2), f_22 ∈ R^512 at (x2, y2).

Region features (here 512 x 2 x 2; in practice e.g. 512 x 7 x 7)

He et al, “Mask R-CNN”, ICCV 2017

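The bilinear sampling step can be sketched for a single channel. Assuming unit spacing between grid cells, the standard bilinear weights give f_xy = sum over i, j in {1, 2} of f_ij * max(0, 1 - |x - x_i|) * max(0, 1 - |y - y_j|). The function name and the clamping at the feature-map border are choices of this sketch.

```python
import math


def bilinear_sample(features, x, y):
    """features: H x W nested list (one channel); (x, y) is a real-valued location."""
    x1, y1 = int(math.floor(x)), int(math.floor(y))
    x2 = min(x1 + 1, len(features[0]) - 1)  # clamp at the right border
    y2 = min(y1 + 1, len(features) - 1)     # clamp at the bottom border
    total = 0.0
    for xi in {x1, x2}:      # sets avoid counting a clamped neighbor twice
        for yj in {y1, y2}:
            weight = max(0.0, 1 - abs(x - xi)) * max(0.0, 1 - abs(y - yj))
            total += weight * features[yj][xi]
    return total


grid = [[0, 10], [20, 30]]
center = bilinear_sample(grid, 0.5, 0.5)  # average of the four neighbors
```
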
R-CNN vs Fast R-CNN

Problem: Runtime dominated by region proposals!

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014.
He et al, “Spatial pyramid pooling in deep convolutional networks for visual recognition”, ECCV 2014
Girshick, “Fast R-CNN”, ICCV 2015

Faster R-CNN: Make CNN do proposals!

Insert Region Proposal Network (RPN) to predict proposals from features

Otherwise same as Fast R-CNN: Crop features for each proposal, classify each one

Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
Figure copyright 2015, Ross Girshick; reproduced with permission

Region Proposal Network

Input Image (e.g. 3 x 640 x 480) -> CNN -> Image features (e.g. 512 x 20 x 15)

Imagine an anchor box of fixed size at each point in the feature map.

At each point, predict whether the corresponding anchor contains an object (binary classification).
Conv output, “Anchor is an object?”: 1 x 20 x 15

For positive boxes, also predict a correction from the anchor to the ground-truth box (regress 4 numbers per pixel).
Conv output, box corrections: 4 x 20 x 15

In practice use K different anchor boxes of different size / scale at each point.
Anchor is an object?: K x 20 x 15
Box transforms: 4K x 20 x 15

Sort the K * 20 * 15 boxes by their “objectness” score, take top ~300 as our proposals.

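The final selection step, sorting all anchors by objectness and keeping the top ~300, can be sketched as follows. The function name and the (score, anchor index, row, column) tuple layout are illustrative, and the toy scores are made up.

```python
def top_proposals(objectness, num_keep=300):
    """objectness: K x H x W nested lists of scores.

    Returns up to num_keep (score, k, y, x) tuples, highest score first.
    """
    scored = [
        (score, k, y, x)
        for k, plane in enumerate(objectness)
        for y, row in enumerate(plane)
        for x, score in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:num_keep]


# Toy example with K = 2 anchors per location on a 2 x 2 feature map.
toy_scores = [
    [[0.1, 0.9], [0.3, 0.2]],
    [[0.8, 0.05], [0.4, 0.7]],
]
best = top_proposals(toy_scores, num_keep=3)
```

In a full detector each kept entry would then be combined with its anchor shape and predicted box transform to produce the actual proposal box.
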
Faster R-CNN: Make CNN do proposals!

Jointly train with 4 losses:

1. RPN classify object / not object
2. RPN regress box coordinates
3. Final classification score (object classes)
4. Final box coordinates

Glossing over many details:

- Ignore overlapping proposals with non-max suppression
- How are anchors determined?
- How do we sample positive / negative samples for training the RPN?
- How to parameterize bounding box regression?

Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015
Figure copyright 2015, Ross Girshick; reproduced with permission

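Non-max suppression is only named in the lecture. A common greedy version, assumed here, keeps the highest-scoring box and drops any remaining box whose intersection-over-union (IoU) with an already kept box exceeds a threshold. The corner box format, the function names, and the 0.5 threshold are choices of this sketch.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) corners."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = non_max_suppression(boxes, [0.9, 0.8, 0.7])  # second box overlaps the first and is dropped
```
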
Faster R-CNN: Make CNN do proposals!

Faster R-CNN is a two-stage object detector

First stage: Run once per image

- Backbone network
- Region proposal network

Second stage: Run once per region

- Crop features: RoI pool / align
- Predict object class
- Prediction bbox offset

Do we really need the second stage?

Single-Stage Object Detectors: YOLO / SSD / RetinaNet

Input image: 3 x H x W
Divide image into grid: 7 x 7
Imagine a set of base boxes centered at each grid cell. Here B = 3

Within each grid cell:
- Regress from each of the B base boxes to a final box with 5 numbers: (dx, dy, dh, dw, confidence)
- Predict scores for each of C classes (including background as a class)
- Looks a lot like RPN, but category-specific!

Output: 7 x 7 x (5 * B + C)

Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
Liu et al, “SSD: Single-Shot MultiBox Detector”, ECCV 2016
Lin et al, “Focal Loss for Dense Object Detection”, ICCV 2017

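The output size is simple arithmetic over the grid size, the number of base boxes B, and the number of classes C. In the sketch below, B = 3 matches the slide, while the default of 20 classes is only an illustrative assumption.

```python
def single_stage_output_shape(grid=7, num_boxes=3, num_classes=20):
    """Each grid cell predicts 5 numbers per base box plus one score per class."""
    return (grid, grid, 5 * num_boxes + num_classes)


shape = single_stage_output_shape()  # (7, 7, 35) for B = 3 and an assumed C = 20
```
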
Object Detection: Lots of variables ...

Backbone Network: VGG16, ResNet-101, Inception V2, Inception V3, Inception ResNet, MobileNet
“Meta-Architecture”: Two-stage: Faster R-CNN. Single-stage: YOLO / SSD. Hybrid: R-FCN
Image Size
# Region Proposals
…

Takeaways:
- Faster R-CNN is slower but more accurate
- SSD is much faster but not as accurate
- Bigger / Deeper backbones work better

Huang et al, “Speed/accuracy trade-offs for modern convolutional object detectors”, CVPR 2017
Zou et al, “Object Detection in 20 Years: A Survey”, arXiv 2019
R-FCN: Dai et al, “R-FCN: Object Detection via Region-based Fully Convolutional Networks”, NIPS 2016
Inception-V2: Ioffe and Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, ICML 2015
Inception V3: Szegedy et al, “Rethinking the Inception Architecture for Computer Vision”, arXiv 2016
Inception ResNet: Szegedy et al, “Inception-V4, Inception-ResNet and the Impact of Residual Connections on Learning”, arXiv 2016
MobileNet: Howard et al, “Efficient Convolutional Neural Networks for Mobile Vision Applications”, arXiv 2017

Instance Segmentation
- Classification: CAT (no spatial extent)
- Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
- Object Detection: DOG, DOG, CAT (multiple objects)
- Instance Segmentation: DOG, DOG, CAT (multiple objects)

Object Detection:
Faster R-CNN

Instance Segmentation: Mask Prediction

Mask R-CNN

Add a small mask network that operates on each RoI and predicts a 28x28 binary mask

He et al, “Mask R-CNN”, ICCV 2017

Mask R-CNN
CNN + RPN -> RoI Align -> 256 x 14 x 14 -> Conv -> 256 x 14 x 14 -> Conv -> C x 28 x 28
Outputs per RoI:
- Classification Scores: C
- Box coordinates (per class): 4 * C
- Predict a mask for each of C classes: C x 28 x 28
He et al, “Mask R-CNN”, arXiv 2017
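
The tensor sizes along the mask branch can be traced with the standard convolution size formulas. This is a rough sketch, not the paper's exact head: the kernel sizes, the use of a stride-2 transposed convolution for the 14 to 28 upsampling step, the 0.5 threshold, and all function names are assumptions made for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def mask_head_shapes(num_classes, roi_size=14, channels=256):
    """Trace (channels, height, width) through an assumed mask head."""
    shapes = [(channels, roi_size, roi_size)]        # RoI Align output
    s = conv_out(roi_size, kernel=3, padding=1)      # assumed 3x3 conv, same size
    shapes.append((channels, s, s))
    s = deconv_out(s, kernel=2, stride=2)            # assumed 2x upsampling
    shapes.append((num_classes, s, s))               # one mask per class
    return shapes


def select_binary_mask(class_masks, predicted_class, threshold=0.5):
    """Keep only the predicted class's mask and binarize it."""
    mask = class_masks[predicted_class]
    return [[1 if p >= threshold else 0 for p in row] for row in mask]


if __name__ == "__main__":
    print(mask_head_shapes(num_classes=80))
```

Predicting one mask per class means the mask branch does not have to decide the category itself; the classification branch picks which of the C masks to keep.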

Mask R-CNN: Example Mask Training Targets
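
The slide shows the targets only as images. One common way to build such a target is to crop the ground-truth instance mask to the proposal box and resize the crop to 28x28. The sketch below assumes nearest-neighbor resizing, integer pixel boxes in (x0, y0, x1, y1) form with exclusive far edges, and a hypothetical function name; real implementations may use different interpolation.

```python
def mask_training_target(gt_mask, box, out_size=28):
    """Crop a binary ground-truth mask to a box and resize it.

    gt_mask: 2D list of 0/1 values (H x W).
    box: (x0, y0, x1, y1) in pixels, x1 and y1 exclusive (assumed convention).
    Pixels of the box that fall outside the image are treated as background.
    """
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    if box_w <= 0 or box_h <= 0:
        raise ValueError("box must have positive width and height")
    height, width = len(gt_mask), len(gt_mask[0])
    target = []
    for r in range(out_size):
        # Map the center of each output row back into the box.
        src_y = y0 + int((r + 0.5) * box_h / out_size)
        row = []
        for c in range(out_size):
            src_x = x0 + int((c + 0.5) * box_w / out_size)
            inside = 0 <= src_y < height and 0 <= src_x < width
            row.append(gt_mask[src_y][src_x] if inside else 0)
        target.append(row)
    return target


if __name__ == "__main__":
    gt = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
    target = mask_training_target(gt, (0, 0, 2, 2))
    print(len(target), len(target[0]))
```

Because the target is defined relative to the proposal box, the same object yields a different 28x28 target for each proposal that covers it.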

Mask R-CNN: Very Good Results!

He et al, “Mask R-CNN”, ICCV 2017

Mask R-CNN
Also does pose estimation

He et al, “Mask R-CNN”, ICCV 2017

Open Source Frameworks
Lots of good implementations on GitHub!

TensorFlow Detection API:
https://github.com/tensorflow/models/tree/master/research/object_detection
Faster R-CNN, SSD, R-FCN, Mask R-CNN, ...

Detectron2 (PyTorch):
https://github.com/facebookresearch/detectron2
Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN, ...

Finetune on your own dataset with pre-trained models

Beyond 2D Object Detection...

Object Detection + Captioning
= Dense Captioning

Johnson, Karpathy, and Fei-Fei, “DenseCap: Fully Convolutional Localization Networks for Dense Captioning”, CVPR 2016
Figure copyright IEEE, 2016. Reproduced for educational purposes.

Dense Video Captioning

Ranjay Krishna et al., “Dense-Captioning Events in Videos”, ICCV 2017

Objects + Relationships = Scene Graphs

Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz,
Stephanie Chen et al. "Visual genome: Connecting language and vision using
crowdsourced dense image annotations." International Journal of Computer Vision 123,
no. 1 (2017): 32-73.

Scene Graph Prediction

Xu, Zhu, Choy, and Fei-Fei, “Scene Graph Generation by Iterative Message Passing”, CVPR 2017
Figure copyright IEEE, 2018. Reproduced for educational purposes.

3D Object Detection
2D Object Detection:
2D bounding box
(x, y, w, h)

3D Object Detection:
3D oriented bounding box
(x, y, z, w, h, l, r, p, y), where r, p, y are roll, pitch, and yaw

Simplified bbox: no roll & pitch

Much harder problem than 2D object detection!
This image is CC0 public domain
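
The simplified box (no roll and pitch) leaves seven numbers: center, size, and yaw. As an illustration only, the sketch below computes the eight corners of such a box. The class name and the axis convention (z is up, yaw rotates about z, l runs along the local x axis and w along the local y axis) are assumptions; datasets differ on these conventions.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Simplified 3D box: center (x, y, z), size (w, h, l), yaw only."""
    x: float
    y: float
    z: float
    w: float
    h: float
    l: float
    yaw: float = 0.0

    def corners(self):
        """Return the 8 corner points as (X, Y, Z) tuples."""
        cos_y, sin_y = math.cos(self.yaw), math.sin(self.yaw)
        points = []
        for sx in (-1, 1):
            for sy in (-1, 1):
                for sz in (-1, 1):
                    # Corner in the box's local frame.
                    lx, ly, lz = sx * self.l / 2, sy * self.w / 2, sz * self.h / 2
                    # Rotate about the (assumed) vertical z axis, then translate.
                    points.append((
                        self.x + lx * cos_y - ly * sin_y,
                        self.y + lx * sin_y + ly * cos_y,
                        self.z + lz,
                    ))
        return points


if __name__ == "__main__":
    box = Box3D(x=0, y=0, z=0, w=2, h=2, l=4, yaw=math.pi / 2)
    for corner in box.corners():
        print(corner)
```

Compared with the 2D box (x, y, w, h), the detector now has to regress depth, a third size dimension, and an orientation, which is part of why the 3D problem is harder.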

3D Object Detection: Simple Camera Model
A point on the image plane corresponds to a ray in the 3D space.

A 2D bounding box on an image is a frustum in the 3D space.

Localize an object in 3D: the object can be anywhere in the camera viewing frustum!

Figure labels: camera, image plane, 2D point, 3D ray, camera viewing frustum

Image source: https://www.pcmag.com/encyclopedia_images/_FRUSTUM.GIF
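
To make the point-to-ray and box-to-frustum correspondence concrete, here is a minimal sketch under a pinhole camera model. The intrinsics fx, fy, cx, cy, the function names, and the reading of (x, y) as the top-left corner of the 2D box are all assumptions for illustration; the slide does not specify a camera model beyond the figure.

```python
import math


def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction of the 3D ray through pixel (u, v), camera frame."""
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)


def box_to_frustum_rays(box, fx, fy, cx, cy):
    """Four corner rays bounding the frustum of a 2D box (x, y, w, h).

    (x, y) is assumed to be the top-left corner in pixels.
    """
    x, y, w, h = box
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return [pixel_to_ray(u, v, fx, fy, cx, cy) for u, v in corners]


def point_on_ray(ray, depth):
    """3D point on the ray whose z coordinate equals the given depth."""
    dx, dy, dz = ray
    t = depth / dz
    return (dx * t, dy * t, depth)


if __name__ == "__main__":
    rays = box_to_frustum_rays((270, 190, 100, 100), fx=500, fy=500, cx=320, cy=240)
    for depth in (5.0, 20.0):
        # The same 2D box is consistent with objects at any depth.
        print([point_on_ray(r, depth) for r in rays])
```

Every depth along these rays projects to the same 2D box, which is the ambiguity a 3D detector has to resolve.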

3D Object Detection: Monocular Camera

Faster R-CNN

- Same idea as Faster R-CNN, but proposals are in 3D

- 3D bounding box proposal, regress 3D box parameters + class score
Chen, Xiaozhi, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, and Raquel
Urtasun. "Monocular 3d object detection for autonomous driving." CVPR 2016.

3D Shape Prediction: Mesh R-CNN

Gkioxari et al, “Mesh R-CNN”, ICCV 2019

Recap: Lots of computer vision tasks!
- Classification: CAT (no spatial extent)
- Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
- Object Detection: DOG, DOG, CAT (multiple objects)
- Instance Segmentation: DOG, DOG, CAT (multiple objects)
This image is CC0 public domain

Next time: Recurrent Neural Networks
