Selective Search for Object Detection | R-CNN
Last Updated: 22 Jul, 2021
Object localization is one of the most difficult parts of object detection. One approach is to slide windows of different sizes over the image to locate objects, which is called exhaustive search. It is computationally very expensive because even a small image requires searching thousands of windows. Some optimizations exist, such as varying the window sizes by ratio (instead of growing them by a few pixels), but the number of windows still makes the approach inefficient. This article looks at the selective search algorithm, which combines the strengths of exhaustive search and segmentation (a method that separates objects of different shapes in the image by assigning them different colors).
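To get a feel for why exhaustive search is expensive, the short sketch below counts the windows a sliding-window search would visit. The image size, scales, aspect ratios and stride are hypothetical values chosen for illustration; they do not come from any particular detector.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count the windows an exhaustive sliding-window search would visit."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window does not fit inside the image
        steps_x = (img_w - win_w) // stride + 1
        steps_y = (img_h - win_h) // stride + 1
        total += steps_x * steps_y
    return total


# Hypothetical setup: 4 scales, each in 3 aspect ratios, on a 500 x 375 image.
scales = [32, 64, 128, 256]
ratios = [(1, 1), (2, 1), (1, 2)]
window_sizes = [(s * rw, s * rh) for s in scales for rw, rh in ratios]

print(count_sliding_windows(500, 375, window_sizes, stride=4))
```

Even with a stride of 4 pixels and only a handful of window shapes, the count runs well into the tens of thousands, and each window would still need to be classified.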
Algorithm Of Selective Search :
- Generate an initial sub-segmentation of the input image using the method described by Felzenszwalb and Huttenlocher in their paper "Efficient Graph-Based Image Segmentation".
Image and its Segmentation (Source: selective Search Paper)
- Iteratively combine smaller, similar regions into larger ones. A greedy algorithm is used to merge the regions, as listed below.
Greedy Algorithm :
1. From the set of regions, choose the two that are most similar.
2. Combine them into a single, larger region.
3. Repeat the above steps until the whole image becomes a single region.
Image Segmentation after combining similar regions (Source: selective Search Paper)
- Use the segmented region proposals to generate candidate object locations.
Image showing segmentation and their candidate proposals (Source: selective Search Paper)
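The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: regions are plain sets of (x, y) pixels, the adjacency test is a simple 4-neighbour check, and the `similarity` function is supplied by the caller. Restricting merges to neighbouring regions follows the paper; the helper names (`bounding_box`, `are_neighbours`, `greedy_grouping`) are made up for this example. Boxes are (x_min, y_min, x_max, y_max) with the maximum coordinates exclusive.

```python
from itertools import combinations


def bounding_box(pixels):
    """Return (x_min, y_min, x_max, y_max) around a set of (x, y) pixels."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def are_neighbours(a, b):
    """Two regions touch if a pixel of a is 4-adjacent to a pixel of b."""
    for x, y in a:
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if n in b:
                return True
    return False


def greedy_grouping(initial_regions, similarity):
    """Merge the most similar neighbouring pair until one region remains.

    Every region that exists at any point contributes one box proposal.
    """
    regions = [frozenset(r) for r in initial_regions]
    proposals = [bounding_box(r) for r in regions]
    while len(regions) > 1:
        pairs = [(a, b) for a, b in combinations(regions, 2)
                 if are_neighbours(a, b)]
        if not pairs:
            break  # the remaining regions are not connected
        a, b = max(pairs, key=lambda p: similarity(p[0], p[1]))
        merged = a | b
        regions = [r for r in regions if r is not a and r is not b]
        regions.append(merged)
        proposals.append(bounding_box(merged))
    return proposals


# Toy input: three small regions side by side on a 4 x 2 image.
r0 = {(0, 0), (0, 1)}
r1 = {(1, 0), (1, 1)}
r2 = {(2, 0), (2, 1), (3, 0), (3, 1)}


# Toy similarity that prefers merging small regions first.
def prefer_small(a, b):
    return -(len(a) + len(b))


for box in greedy_grouping([r0, r1, r2], prefer_small):
    print(box)
```

Because a box is recorded for every intermediate region, the proposals cover several scales: the three initial segments, the merged pair, and finally the whole image.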
Similarity in Segmentation:
The selective search paper considers four types of similarity when combining the initial small segments into larger ones. These similarities are:
- Color Similarity: For each region, we generate a histogram for each color channel of the image. The paper uses 25 bins per channel, which gives 75 bins in total (25 each for R, G and B); the channels are concatenated into one vector (n = 75) per region. The similarity is then computed using the equation below:
$$S_{color}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k)$$
where $c_i^k$ and $c_j^k$ are the $k^{th}$ histogram bin values of regions $r_i$ and $r_j$ respectively.
- Texture Similarity: Texture similarity is computed from Gaussian derivatives of the image in 8 orientations, with a 10-bin histogram extracted for each orientation of each color channel. This gives a 10 x 8 x 3 = 240-dimensional vector for each region. The similarity is derived using this equation:
$$S_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k)$$
where $t_i^k$ and $t_j^k$ are the $k^{th}$ texture histogram bin values of regions $r_i$ and $r_j$ respectively.
- Size Similarity: The basic idea of size similarity is to make smaller regions merge early. Without it, a single large region would keep absorbing its neighbours one by one, and region proposals at multiple scales would be generated at that location only.
$$S_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(img)}$$
where $size(r_i)$, $size(r_j)$ and $size(img)$ are the sizes in pixels of regions $r_i$, $r_j$ and the image respectively.
- Fill Similarity: Fill similarity measures how well two regions fit into each other. If two regions fit well together (for example, one region is contained in the other), they should be merged; if they do not even touch each other, they should not be merged.
$$S_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(img)}$$
where $size(BB_{ij})$ is the size of the bounding box around $r_i$ and $r_j$.
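The individual measures above translate almost directly into code. The sketch below is illustrative only: the function names are made up for this example, histograms are plain lists that are assumed to be normalised already, and boxes are (x_min, y_min, x_max, y_max) with the maximum coordinates exclusive.

```python
def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima; the same formula serves S_color and S_texture."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def size_similarity(size_i, size_j, size_img):
    """S_size: close to 1 when both regions are small relative to the image."""
    return 1.0 - (size_i + size_j) / size_img


def bounding_box_size(box_i, box_j):
    """Area of the tight box BB_ij around two boxes."""
    x_min = min(box_i[0], box_j[0])
    y_min = min(box_i[1], box_j[1])
    x_max = max(box_i[2], box_j[2])
    y_max = max(box_i[3], box_j[3])
    return (x_max - x_min) * (y_max - y_min)


def fill_similarity(size_i, size_j, box_i, box_j, size_img):
    """S_fill: close to 1 when the two regions leave little empty space in BB_ij."""
    gap = bounding_box_size(box_i, box_j) - size_i - size_j
    return 1.0 - gap / size_img


# Toy values on an 8 x 8 image (64 pixels); both regions are 4 x 2 rectangles.
print(histogram_intersection([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))
print(size_similarity(8, 8, 64))
print(fill_similarity(8, 8, (0, 0, 4, 2), (4, 0, 8, 2), 64))  # adjacent regions
print(fill_similarity(8, 8, (0, 0, 4, 2), (4, 6, 8, 8), 64))  # far-apart regions
```

The last two calls show the intent of the fill measure: the adjacent pair fills its joint bounding box completely, while the far-apart pair leaves most of it empty and scores much lower.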
The four similarities above are combined to form the final similarity:
$$S(r_i, r_j) = a_1 S_{color}(r_i, r_j) + a_2 S_{texture}(r_i, r_j) + a_3 S_{size}(r_i, r_j) + a_4 S_{fill}(r_i, r_j)$$
where $a_i$ is either 0 or 1, depending on whether that similarity is considered or not.
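As a small worked example of the combination, the sketch below switches individual measures on and off with 0/1 weights. The dictionary keys and the function name are assumptions made for this illustration, and the similarity values are made-up numbers.

```python
MEASURES = ("color", "texture", "size", "fill")


def combined_similarity(similarities, weights):
    """Weighted sum a_1*S_color + a_2*S_texture + a_3*S_size + a_4*S_fill."""
    for name in MEASURES:
        if weights[name] not in (0, 1):
            raise ValueError("each weight a_i must be 0 or 1")
    return sum(weights[name] * similarities[name] for name in MEASURES)


# Made-up similarity values for one pair of regions.
sims = {"color": 0.5, "texture": 0.25, "size": 0.75, "fill": 1.0}

all_on = {"color": 1, "texture": 1, "size": 1, "fill": 1}
color_and_size = {"color": 1, "texture": 0, "size": 1, "fill": 0}

print(combined_similarity(sims, all_on))
print(combined_similarity(sims, color_and_size))
```

Each choice of weights defines a different grouping strategy, which is what makes it possible to run the algorithm several times with different settings and pool the proposals.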
Results :
To measure the performance of this method, the paper uses an evaluation metric known as MABO (Mean Average Best Overlap).
There are two versions of selective search: Fast and Quality. Quality generates many more bounding boxes than Fast, so it takes more time to compute but achieves higher recall, ABO (Average Best Overlap) and MABO. ABO for a class is the average, over all ground-truth boxes of that class, of the best overlap between each ground-truth box and the generated proposals; MABO is the mean of ABO over all classes.
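A minimal sketch of how ABO and MABO could be computed is shown below. It assumes that overlap means intersection over union (IoU) and that boxes are (x_min, y_min, x_max, y_max) tuples with the maximum coordinates exclusive; the function names and the toy boxes are made up for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(ground_truth_boxes, proposals):
    """ABO: mean over ground-truth boxes of the best IoU with any proposal."""
    best = [max((iou(g, p) for p in proposals), default=0.0)
            for g in ground_truth_boxes]
    return sum(best) / len(best)


def mean_average_best_overlap(gt_by_class, proposals):
    """MABO: mean of the per-class ABO values."""
    abos = [average_best_overlap(boxes, proposals)
            for boxes in gt_by_class.values()]
    return sum(abos) / len(abos)


# Toy data: one ground-truth box per class and three proposals.
gt_by_class = {"cat": [(0, 0, 4, 4)], "dog": [(4, 4, 8, 8)]}
proposals = [(0, 0, 4, 4), (0, 0, 2, 4), (4, 4, 8, 6)]

print(average_best_overlap(gt_by_class["cat"], proposals))
print(average_best_overlap(gt_by_class["dog"], proposals))
print(mean_average_best_overlap(gt_by_class, proposals))
```

Here the "cat" box is matched exactly by a proposal, while the best proposal for the "dog" box covers only half of it, so the two ABO values differ and MABO lands between them.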

The paper's results show that using all the similarities in combination gives the best MABO. We can also conclude that RGB is not the best color space for this method: HSV, Lab and rgI all perform better than RGB because they are less sensitive to shadows and brightness changes.
When we diversify and combine these different similarities, color spaces and threshold values (k), we get the following results:
Selective search Result on different combination of similarities (Credits : Selective search paper)
In the selective search paper, a greedy method based on MABO is applied over the different strategies to get the above results. Combining different strategies gives a better MABO, but the run time also increases considerably.
Selective Search In Object Recognition :
In the selective search paper, the authors apply this algorithm to object detection. They train an SVM classifier using ground-truth examples as positives and sampled hypotheses that overlap 20-50% with a ground-truth box as negatives, and the classifier is then retrained to reject false positives. The architecture of the model is given below.
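The selection of training examples described above can be sketched as follows. This is a simplified illustration that assumes overlap means intersection over union (IoU); the function names, the box format (x_min, y_min, x_max, y_max) and the toy boxes are made up for this example, and the later retraining step is not shown.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_training_examples(ground_truth_boxes, proposals, low=0.2, high=0.5):
    """Ground-truth boxes are positives; proposals whose best overlap with
    a ground-truth box lies in [low, high] become negatives."""
    positives = list(ground_truth_boxes)
    negatives = []
    for p in proposals:
        best = max((iou(p, g) for g in ground_truth_boxes), default=0.0)
        if low <= best <= high:
            negatives.append(p)
    return positives, negatives


# Toy data: one ground-truth object and four proposals.
ground_truth = [(0, 0, 10, 10)]
proposals = [
    (0, 0, 10, 10),    # exact match, not a negative
    (5, 0, 15, 10),    # partial overlap
    (20, 20, 30, 30),  # no overlap, ignored
    (0, 0, 10, 4),     # partial overlap
]

positives, negatives = select_training_examples(ground_truth, proposals)
print(positives)
print(negatives)
```

Proposals that partially cover an object make useful negatives because they are the ones a classifier is most likely to confuse with a correct detection.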
Object Recognition Architecture (Source : Selective Search paper)
The results generated on the VOC 2007 test set are:
Selective search result on different parameter (Credit)
As we can see, it produces a very high recall and the best MABO on the VOC 2007 test set, and it requires far fewer windows to be processed than other algorithms that achieve similar recall and MABO.
Applications :
Selective Search was widely used in early state-of-the-art architectures such as R-CNN and Fast R-CNN. However, because of the number of windows it processes, it takes anywhere from 1.8 to 3.7 seconds (Selective Search Fast) to generate region proposals, which is not fast enough for a real-time object detection system.