
10 R CNN

The document discusses the R-CNN family of object detection models, including R-CNN, Fast R-CNN, and Faster R-CNN. R-CNN was one of the first applications of convolutional neural networks to object detection. It used selective search to first generate region proposals, then extracted CNN features from each proposal and classified them with an SVM. Fast R-CNN and Faster R-CNN improved on R-CNN by performing the feature extraction once on the full image rather than individually on each proposal to increase speed. Faster R-CNN also integrated a region proposal network to generate proposals, removing selective search and allowing end-to-end training.


R-CNN: “Regions with CNN Features” - “Region-Based Convolutional Neural Network”
DR. OUIEM BCHIR

https://machinelearningmastery.com/object-recognition-with-deep-learning/
R-CNN Model Family
The R-CNN family of methods takes its name from R-CNN, which may stand for “Regions with CNN
Features” or “Region-Based Convolutional Neural Network,” developed by Ross Girshick, et al.
The family includes R-CNN, Fast R-CNN, and Faster R-CNN, designed and demonstrated
for object localization and object recognition.
R-CNN
The R-CNN was described in the 2014 paper by Ross Girshick, et al. from UC Berkeley titled “Rich
feature hierarchies for accurate object detection and semantic segmentation.”
It may have been one of the first large and successful applications of convolutional neural
networks to the problem of object localization, detection, and segmentation.
The approach was demonstrated on benchmark datasets, achieving then state-of-the-art results
on the VOC-2012 dataset and the 200-class ILSVRC-2013 object detection dataset.
R-CNN
Their proposed R-CNN model is comprised of three modules; they are:
Module 1: Region Proposal. Generate and extract category-independent region proposals, e.g.
candidate bounding boxes.
Module 2: Feature Extractor. Extract features from each candidate region, e.g. using a deep
convolutional neural network.
Module 3: Classifier. Classify features as one of the known classes, e.g. with a linear SVM
classifier model.
The architecture of the model is summarized in the image below, taken from the paper.
Architecture
selective search
To bypass the problem of selecting a huge number of regions, Ross Girshick et al. proposed a
method that uses selective search to extract just 2000 regions from the image, which they called
region proposals.
Therefore, instead of trying to classify a huge number of regions, you can work with just
2000 regions.
A computer vision technique called “selective search” is used to propose candidate regions, or
bounding boxes, of potential objects in the image.
selective search
These 2000 region proposals are generated using the selective search algorithm, which is
outlined below.

Selective Search:
1. Generate an initial sub-segmentation of the image, producing many candidate regions
2. Use a greedy algorithm to recursively combine similar regions into larger ones
3. Use the generated regions to produce the final candidate region proposals
The design is flexible, however, and allows other region proposal algorithms to be used.
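The greedy merging in step 2 can be illustrated with a toy sketch. This is not the actual selective search implementation: it assumes each initial segment is described only by a bounding box and a mean colour, uses Euclidean colour distance as the similarity measure, and ignores adjacency, texture, and size. The function name greedy_merge and the sample inputs are hypothetical.

```python
import numpy as np

def greedy_merge(boxes, colors):
    """Toy greedy grouping: repeatedly merge the two most similar regions.

    boxes  : list of (x1, y1, x2, y2) initial segments (hypothetical input)
    colors : list of mean-colour vectors, one per segment
    Returns every box seen along the way; these act as candidate proposals.
    """
    # Each region is (box, mean colour, number of initial segments it contains).
    regions = [(tuple(b), np.asarray(c, dtype=float), 1) for b, c in zip(boxes, colors)]
    proposals = [r[0] for r in regions]
    while len(regions) > 1:
        # Find the pair with the smallest colour distance (assumed similarity).
        best = None
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                d = np.linalg.norm(regions[i][1] - regions[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (b1, c1, n1), (b2, c2, n2) = regions[i], regions[j]
        # The merged region is the enclosing box with a weighted mean colour.
        merged_box = (min(b1[0], b2[0]), min(b1[1], b2[1]),
                      max(b1[2], b2[2]), max(b1[3], b2[3]))
        merged_color = (c1 * n1 + c2 * n2) / (n1 + n2)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append((merged_box, merged_color, n1 + n2))
        proposals.append(merged_box)
    return proposals

segments = [(0, 0, 10, 10), (10, 0, 20, 10), (30, 30, 40, 40)]
mean_colors = [(255, 0, 0), (250, 5, 0), (0, 0, 255)]
proposals = greedy_merge(segments, mean_colors)
```

With these inputs the two reddish segments merge first, then the result merges with the blue one, so the proposals contain regions at several scales.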
Feature extractor
Each of the 2000 candidate region proposals is warped into a square and fed into a convolutional
neural network that produces a 4096-dimensional feature vector as output.
The CNN acts as a feature extractor, and the output dense layer consists of the features
extracted from the region.
The feature extractor used by the model was the AlexNet deep CNN that won the ILSVRC-2012
image classification competition.
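The warping step can be sketched as a crop followed by a resize that ignores the aspect ratio. The sketch below is a minimal NumPy illustration, assuming nearest-neighbour sampling and a 227 x 227 target size (the input size reported for the AlexNet-based R-CNN; it is not stated in these slides). The function name warp_region is hypothetical.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop box = (x1, y1, x2, y2) from an H x W x C image and resize the crop
    to size x size with nearest-neighbour sampling, ignoring aspect ratio."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    # For each output row/column, pick the source row/column it maps to.
    rows = np.arange(size) * crop.shape[0] // size
    cols = np.arange(size) * crop.shape[1] // size
    return crop[rows][:, cols]

image = np.random.rand(480, 640, 3)              # stand-in for a real image
patch = warp_region(image, (100, 50, 300, 400))  # a 200 x 350 proposal
print(patch.shape)
```

Every proposal, whatever its shape, ends up as a square patch of the same size, which is what lets a CNN with a fixed input size process it.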
Classify regions
The output of the CNN is a 4,096-element vector that describes the contents of the region; it
is fed to a linear SVM for classification.
One SVM is trained for each known class.
Each SVM classifies the presence of its object class within the candidate region proposal.
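Scoring regions with one linear SVM per class amounts to a single matrix product. The sketch below uses random stand-ins for the CNN features and the SVM weights; the choice of 20 classes and the decision threshold of 0 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_regions, feat_dim, num_classes = 2000, 4096, 20  # 20 classes is an assumption

features = rng.standard_normal((num_regions, feat_dim))  # stand-in for CNN features
W = rng.standard_normal((num_classes, feat_dim)) * 0.01  # one weight vector per class SVM
b = np.zeros(num_classes)                                # one bias per class SVM

scores = features @ W.T + b           # (2000, 20): margin of each region for each class
best_class = scores.argmax(axis=1)    # highest-scoring class per region
is_object = scores.max(axis=1) > 0    # regions where at least one SVM fires
```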
In addition to predicting the presence of an object within the region proposals, the algorithm
also predicts four values which are offset values to increase the precision of the bounding box.
For example, given a region proposal, the algorithm might predict the presence of a
person even though the face of that person is cut in half by the edge of the proposal.
Therefore, the offset values help in adjusting the bounding box of the region proposal.
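How four offset values adjust a box can be sketched as follows. The parameterization assumed here is the one commonly attributed to the R-CNN paper: the first two offsets shift the box centre in units of the box width and height, and the last two rescale the width and height on a log scale. The function name apply_offsets and the sample values are hypothetical.

```python
import numpy as np

def apply_offsets(box, offsets):
    """Refine box = (x1, y1, x2, y2) with offsets = (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    # Shift the centre proportionally to the box size.
    cx, cy = cx + dx * w, cy + dy * h
    # Rescale width and height; exp keeps them positive.
    w, h = w * np.exp(dw), h * np.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Grow the box upward by 20% of its height, e.g. to include a cut-off face.
refined = apply_offsets((100, 100, 200, 300), (0.0, -0.1, 0.0, np.log(1.2)))
print(refined)  # approximately (100, 60, 200, 300)
```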
Problems with R-CNN
Training still takes a huge amount of time, as the network has to process 2000
region proposals per image.
It cannot be run in real time, as it takes around 47 seconds for each test image.
The selective search algorithm is fixed, so no learning happens at that
stage. This could lead to the generation of bad candidate region proposals.
Fast R-CNN
https://towardsdatascience.com/r-cnn-fast-r-cnn-faster-r-cnn-yolo-object-detection-algorithms-36d53571365e
Fast R-CNN
Given the great success of R-CNN, Ross Girshick, then at Microsoft Research, proposed an
extension to address the speed issues of R-CNN in a 2015 paper titled “Fast R-CNN.”
The paper opens with a review of the limitations of R-CNN, which can be summarized as follows:
Training is a multi-stage pipeline. It involves the preparation and operation of three separate
models.
Training is expensive in space and time. Training a deep CNN on so many region proposals per
image is very slow.
Object detection is slow. Making predictions using a deep CNN on so many region proposals is
very slow.
Fast R-CNN
The approach is similar to the R-CNN algorithm.
But, instead of feeding the region proposals to the CNN, we feed the input image to the CNN to
generate a convolutional feature map.
On the convolutional feature map, we locate the regions that correspond to the proposals.
Using a RoI pooling layer, we reshape each region to a fixed size so that it can be fed into a fully
connected layer.
From the RoI feature vector, a softmax layer predicts the class of the proposed region, and a
parallel linear output predicts the offset values for the bounding box.
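RoI pooling can be sketched as splitting the region of the feature map into a fixed grid and taking the maximum in each cell. The NumPy version below is a minimal illustration, not the layer from the paper: it assumes the RoI is already given in feature-map cells and uses a 7 x 7 output grid (an assumed value). The function name roi_max_pool is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=7):
    """feature_map: (C, H, W); roi: (x1, y1, x2, y2) in feature-map cells,
    with exclusive end coordinates. Returns a (C, out_size, out_size) array."""
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    c, h, w = region.shape
    # Bin edges that split the region into out_size x out_size cells.
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((c, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        for j in range(out_size):
            # Force each bin to cover at least one cell so max() is defined.
            y_lo, y_hi = ys[i], max(ys[i + 1], ys[i] + 1)
            x_lo, x_hi = xs[j], max(xs[j + 1], xs[j] + 1)
            out[:, i, j] = region[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return out

rng = np.random.default_rng(0)
feature_map = rng.random((8, 38, 50))   # stand-in for a conv feature map
pooled = roi_max_pool(feature_map, (5, 3, 30, 20))
print(pooled.shape)
```

Regions of any size come out with the same shape, so the fully connected layers that follow always receive a vector of the same length.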
Architecture
Fast R-CNN is proposed as a single model instead of a pipeline to learn and output regions and
classifications directly.
The architecture of the model takes the image and a set of region proposals as input.
They are passed through a deep convolutional neural network.
A pre-trained CNN, such as a VGG-16, is used for feature extraction.
The end of the deep CNN is a custom layer called a Region of Interest Pooling Layer, or RoI
Pooling, that extracts features specific for a given input candidate region.
Architecture
The output of the CNN is then interpreted by a fully connected layer.
Then the model bifurcates into two outputs:
◦ one for the class prediction via a softmax layer, and
◦ another with a linear output for the bounding box.

This process is repeated for each region of interest in a given image.
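The two output branches can be sketched with plain matrix products. The weights below are random stand-ins, and the sizes are assumptions chosen for illustration: 4096-dimensional RoI features, 21 classes (20 object classes plus background), and four box offsets per class.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
num_rois, feat_dim, num_classes = 4, 4096, 21   # assumed sizes

roi_features = rng.standard_normal((num_rois, feat_dim))        # output of the FC layers
W_cls = rng.standard_normal((feat_dim, num_classes)) * 0.01     # class branch weights
W_box = rng.standard_normal((feat_dim, 4 * num_classes)) * 0.01 # box branch weights

class_probs = softmax(roi_features @ W_cls)                          # (4, 21)
box_offsets = (roi_features @ W_box).reshape(num_rois, num_classes, 4)  # (4, 21, 4)
```

Both branches read the same RoI feature vector, which is why the class prediction and the box refinement can be trained together in one model.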
Comparison

From the above graphs, you can infer that Fast R-CNN is significantly faster than R-CNN in both
training and testing.
When you look at the performance of Fast R-CNN at test time, including the region proposal step
slows the algorithm down significantly compared to excluding it.
Therefore, region proposals become the bottleneck in the Fast R-CNN algorithm, limiting its
speed.
Discussion
“Fast R-CNN” is faster than R-CNN because you don’t have to feed 2000 region
proposals to the convolutional neural network every time. Instead, the convolution operation is
done only once per image and a feature map is generated from it.
The model is significantly faster to train and to make predictions, yet still requires a set of
candidate regions to be proposed along with each input image.
Faster R-CNN
The model architecture was further improved for both speed of training and detection by
Shaoqing Ren, et al. at Microsoft Research in the 2016 paper titled “Faster R-CNN: Towards Real-
Time Object Detection with Region Proposal Networks.”
The architecture was the basis for the first-place results achieved on both the ILSVRC-2015 and
MS COCO-2015 object recognition and detection competition tasks.
Faster R-CNN
Both R-CNN and Fast R-CNN use selective search to generate the region proposals.
Selective search is a slow and time-consuming process that limits the speed of the
network.
Therefore, Shaoqing Ren et al. came up with an object detection algorithm that eliminates the
selective search algorithm and lets the network learn the region proposals.
Faster R-CNN
Similar to Fast R-CNN, the image is provided as an input to a convolutional network which
provides a convolutional feature map.
Instead of using the selective search algorithm to identify the region proposals, a
separate network is used to predict the region proposals from the feature map.
The predicted region proposals are then reshaped using a RoI pooling layer, whose output is used to
classify the image within the proposed region and predict the offset values for the bounding
boxes.
Faster R-CNN
The architecture was designed to both propose and refine region proposals as part of the
training process, referred to as a Region Proposal Network, or RPN.
These regions are then used in concert with a Fast R-CNN model in a single model design.
These improvements both reduce the number of region proposals and accelerate the test-time
operation of the model to near real-time with then state-of-the-art performance.
Architecture
Although it is a single unified model, the architecture is comprised of two modules:
Module 1: Region Proposal Network. Convolutional neural network for proposing regions and
the type of object to consider in the region.
Module 2: Fast R-CNN. Convolutional neural network for extracting features from the proposed
regions and outputting the bounding box and class labels.
Both modules operate on the same output of a deep CNN.
The region proposal network acts as an attention mechanism for the Fast R-CNN network,
informing the second network of where to look or pay attention.
RPN
The RPN works by taking the output of a pre-trained deep CNN, such as VGG-16, passing a
small network over the feature map, and outputting multiple region proposals and a class
prediction for each.
Region proposals are bounding boxes, based on so-called anchor boxes or pre-defined shapes
designed to accelerate and improve the proposal of regions.
The class prediction is binary, indicating the presence of an object, or not, so-called “objectness”
of the proposed region.
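Anchor boxes can be sketched as a fixed set of box shapes repeated at every feature-map position. The values below (stride 16, three scales, three aspect ratios, giving 9 anchors per position) are assumptions taken from the commonly cited Faster R-CNN configuration, not from these slides. The function name make_anchors is hypothetical.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x1, y1, x2, y2) in image
    pixels, where k = len(scales) * len(ratios)."""
    shapes = []
    for s in scales:
        for r in ratios:
            # Keep the area equal to s * s while setting height / width = r.
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    shapes = np.array(shapes)                              # (k, 2) as (w, h)
    # Anchor centres sit at the middle of each feature-map cell.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                           # each (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (N, 2)
    half = shapes[None, :, :] / 2                          # (1, k, 2)
    boxes = np.concatenate([centers[:, None, :] - half,
                            centers[:, None, :] + half], axis=2)  # (N, k, 4)
    return boxes.reshape(-1, 4)

anchors = make_anchors(38, 50)   # e.g. a 38 x 50 feature map
print(anchors.shape)
```

The RPN then only has to predict, for each anchor, an objectness score and four offsets relative to that anchor, rather than a box from scratch.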
Faster R-CNN
A procedure of alternating training is used where both sub-networks are trained at the same
time, although interleaved.
This allows the parameters in the feature detector deep CNN to be tailored or fine-tuned for
both tasks at the same time.
The Faster R-CNN architecture is the pinnacle of the region-based family of models and continues to
achieve near state-of-the-art results on object recognition tasks.
A further extension adds support for image segmentation, described in the 2017 paper
“Mask R-CNN.”
From the above graph, you can see that Faster R-CNN is much faster than its
predecessors. Therefore, it can even be used for real-time object detection.
