The Basics of Object Detection YOLO SSD R-CNN

Uploaded by

Asnaku Arja


towardsdatascience.com/the-basics-of-object-detection-yolo-ssd-r-cnn-6def60f51c0b

Hari Devanathan, 10/12/2022


The Basics of Object Detection: YOLO, SSD, R-CNN

Overview of how object detection works, and where to get started

Hari Devanathan

Towards Data Science


Note: This assumes you have a general idea of what convolutional neural networks are. If you need a
refresher, this IBM post is excellent.

You know how convolutional neural networks (CNNs) classify images. You can extend a CNN so that it also
detects the individual objects within an image.

How is object detection different from image recognition?

Say you have an image of 4 cows. Image recognition looks at the image as a whole and classifies it as
"cow".

However, image recognition can’t tell where in the image the cow is. Furthermore, it won’t be able to tell
that there are 4 cows instead of 1.
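
To make the difference concrete, here is a minimal sketch of what the two kinds of output might look like for the cow image. The Detection class, the scores, and the pixel coordinates are all made up for illustration; real libraries each use their own output format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, for illustration only)."""
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


# Image recognition: a single label (and score) for the whole image.
classification = ("cow", 0.97)  # made-up score

# Object detection: one entry per object, each with its own bounding box.
detections = [
    Detection("cow", 0.95, (12, 40, 110, 160)),
    Detection("cow", 0.91, (130, 35, 240, 150)),
    Detection("cow", 0.88, (260, 50, 370, 170)),
    Detection("cow", 0.84, (390, 45, 500, 165)),
]

print("Recognition:", classification)
for d in detections:
    print("Detection:", d.label, d.score, d.box)
```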

On a side note, object detection is NOT meant for counting. There are other computer vision techniques
like density estimation and extracting patches that are meant for counting. Object detection just detects if
there are multiple objects in the same image.

Why do I need to learn this? How is this useful in the real world?

There are a lot of scammers. Some get by with fake cashier’s checks or dollar bills. Image
recognition can only say whether the replica looks like a check or a dollar bill. Object detection can
help search for the individual components that determine whether the check or bill is fake.

If you’re fascinated by medicine, then object detection can identify shadows or abnormalities in X-rays.
Even if it can’t diagnose the disease, object detection can assist doctors in finding abnormalities in lung
tissue. Doctors can then catch lung cancer in patients at an earlier stage.

If you’re fascinated by drones and self-driving cars, then object detection is very useful for detecting
obstacles and boundaries in a video stream. It helps keep the car within its driving lane and avoid accidents.

Sounds awesome! But how does object detection work?

Images carry a lot of information. CNNs aim to reduce the image data while keeping the information that is
important. They do this with a stack of convolutional, pooling, and dense layers.

At the end of these layers, the image is reduced enough to make predictions on. For image recognition,
an activation function like Softmax turns the final outputs into class probabilities. For object detection,
an algorithm predicts bounding boxes around the detected objects before it classifies each object.
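
As a rough illustration of the Softmax step, the sketch below turns raw scores from a final layer into class probabilities using only the standard library. The class names and scores are made up; a real CNN would produce the scores itself.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["cow", "horse", "sheep"]  # hypothetical classes
logits = [2.0, 1.0, 0.1]                 # made-up outputs of the last layer

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(class_names[best], round(probs[best], 3))
```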

What algorithms are there?

There are many object detection algorithms, but we’ll cover three main ones.

YOLO: You Only Look Once
SSD: Single Shot Detector
R-CNN: Region-based Convolutional Neural Network

YOLO is the simplest object detection architecture. It predicts bounding boxes through a grid-based
approach after the image goes through the CNN. It divides each image into an SxS grid, with each grid
cell predicting N boxes that may contain an object. For each of those SxSxN boxes, it scores every
class and picks the class with the highest probability.
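
Here is a small sketch of the grid idea in plain Python. The values S=7 and N=2 come from the original YOLOv1 paper, not from this post, and the rule that the cell containing an object's center is the one responsible for it is also taken from that paper. The function names and the example numbers are hypothetical.

```python
def responsible_cell(x_center, y_center, img_w, img_h, s):
    """Return (row, col) of the SxS grid cell that contains the box center."""
    col = min(int(x_center / img_w * s), s - 1)
    row = min(int(y_center / img_h * s), s - 1)
    return row, col


def best_class(class_probs):
    """Pick the class index with the highest probability."""
    idx = max(range(len(class_probs)), key=class_probs.__getitem__)
    return idx, class_probs[idx]


S, N = 7, 2              # assumption: grid size and boxes per cell from YOLOv1
total_boxes = S * S * N  # number of candidate boxes per image

# A made-up object centered at (300, 100) in a 448x448 image.
cell = responsible_cell(300, 100, 448, 448, S)
winner = best_class([0.1, 0.7, 0.2])  # made-up class probabilities

print("candidate boxes:", total_boxes)
print("responsible cell (row, col):", cell)
print("best class index and probability:", winner)
```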

SSD is similar to YOLO, but uses the feature maps of several convolutional layers (the output of each
layer's filters) to predict the bounding boxes. After collecting these feature maps, it runs 3x3 convolutional
kernels on them to predict bounding boxes and classification probabilities. Single-shot detectors are a
family of algorithms, with RetinaNet being a popular choice.
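
To get a feel for what predicting from several feature maps means, the sketch below counts the candidate boxes SSD produces. The feature map sizes and boxes per location are the SSD300 configuration from the SSD paper, which this post doesn't cover, so treat them as an assumption.

```python
# Assumption: SSD300 configuration as reported in the SSD paper.
feature_map_sizes = [38, 19, 10, 5, 3, 1]   # each map is size x size
boxes_per_location = [4, 6, 6, 6, 4, 4]     # default boxes at every location

# Each feature map contributes size * size * boxes candidate boxes.
per_layer = [s * s * b for s, b in zip(feature_map_sizes, boxes_per_location)]
total = sum(per_layer)

for size, count in zip(feature_map_sizes, per_layer):
    print(f"{size}x{size} feature map -> {count} boxes")
print("total candidate boxes:", total)
```

The large early maps cover small objects and the small late maps cover large ones, which is how SSD handles several scales in a single pass.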

R-CNN takes a different approach: it works on regions of the image that are likely to contain an object.
It uses one neural network to suggest potential locations for objects to be detected
(a region proposal network). It uses a second neural network to classify and detect objects based on those
proposed regions. In the Mask R-CNN variant, this second network also adds a pixel mask that gives shape
to the object that needs to be classified.

Note that researchers categorize these algorithms differently. Some consider YOLO part of
the SSD family because both process the image exactly once. Others keep YOLO separate from the
SSD family. Some say that region-based networks like R-CNN are instance segmentation methods
rather than object detection methods.

Which one should I use?

This depends heavily on your data, your goals, and your compute budget.

YOLO is blazing fast and uses little memory. While YOLOv1 was less accurate than SSD,
YOLOv3 and YOLOv5 have surpassed SSD in accuracy and speed. However, YOLO can predict only 1
class per grid cell. If multiple objects fall in the same grid cell, YOLO fails. Finally, YOLO struggles to
detect small objects.

SSD can handle objects of various scales. It uses feature maps from multiple convolutional layers, and each
layer operates at a different scale. It is also not computationally heavy. However, SSD also struggles to
detect small objects. Furthermore, SSD becomes slower as it contains more convolutional layers.

R-CNN is the most accurate. However, it is computationally expensive. It needs a lot of storage and
processing power for detection. It’s also slower than YOLO and SSD.

There are tradeoffs for each. If accuracy isn’t a huge concern, YOLO is the best bet. If your images are
black-and-white or have easily identifiable objects on a clear background, YOLO would be very accurate
in those scenarios. If you have complex images and care about accuracy (such as cancer detection from
X-rays), then R-CNN would be the best fit.

Do I have to rewrite these algorithms from scratch?

No. These are open source, and pre-trained versions exist. They can all be loaded with OpenCV. OpenCV
documentation and samples exist for YOLO, SSD, and Mask R-CNN.
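
As a rough sketch of what this can look like, the code below loads a Darknet YOLO model through OpenCV's dnn module and filters the raw output. It assumes opencv-python is installed and that you have downloaded a config and weights file yourself; the file names in the comment are hypothetical. The row layout [cx, cy, w, h, objectness, class scores...] is what YOLOv3 Darknet models return through OpenCV, and other models will differ.

```python
def load_yolo(cfg_path, weights_path):
    """Load a Darknet YOLO model with OpenCV's dnn module."""
    import cv2  # imported lazily; requires the opencv-python package
    return cv2.dnn.readNetFromDarknet(cfg_path, weights_path)


def run_yolo(net, image, input_size=416):
    """Run one forward pass and return the raw output arrays."""
    import cv2
    blob = cv2.dnn.blobFromImage(
        image, 1 / 255.0, (input_size, input_size), swapRB=True, crop=False
    )
    net.setInput(blob)
    return net.forward(net.getUnconnectedOutLayersNames())


def keep_confident(rows, threshold=0.5):
    """Keep rows whose best class score reaches the threshold.

    Each row is assumed to be [cx, cy, w, h, objectness, class scores...].
    """
    kept = []
    for row in rows:
        scores = [float(v) for v in row[5:]]
        class_id = max(range(len(scores)), key=scores.__getitem__)
        if scores[class_id] >= threshold:
            box = tuple(float(v) for v in row[:4])
            kept.append((class_id, scores[class_id], box))
    return kept


# Typical use (hypothetical file names, not run here):
#   net = load_yolo("yolov3.cfg", "yolov3.weights")
#   outputs = run_yolo(net, image)
#   for output in outputs:
#       print(keep_confident(output))

# Made-up rows so the filtering step can be tried without a model.
fake_rows = [
    [0.5, 0.5, 0.2, 0.2, 0.9, 0.1, 0.8],
    [0.1, 0.1, 0.1, 0.1, 0.2, 0.3, 0.1],
]
print(keep_confident(fake_rows))
```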

Wait, if these models are pre-trained, then how do I use them on new data that is specific to
my use case?

You need to create two datasets:

A custom training dataset that includes the images and the labels/annotations of the objects in each image.
A custom test dataset for the model to predict on, with labels/annotations to verify the accuracy of the model’s predictions.
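
One common way to check a predicted box against an annotated box is intersection over union (IoU). The post doesn't name a specific metric, so take this as one possible sketch. Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at 0 when boxes don't touch.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


annotated = (0, 0, 10, 10)   # made-up label from the test dataset
predicted = (5, 5, 15, 15)   # made-up model prediction
score = iou(annotated, predicted)
print(round(score, 3))
```

A score of 1.0 means the boxes match exactly and 0.0 means they don't overlap at all. Where you draw the line for a correct detection is up to you.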

It’s also highly recommended that you create a custom validation dataset.
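
If all your annotated images sit in one folder, a simple random split can produce the training, validation, and test sets. This is only a sketch: the 80/10/10 ratio, the fixed seed, and the file names are assumptions, not a recommendation from any particular library.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle the items and split them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable

    n_test = round(len(items) * test_fraction)
    n_val = round(len(items) * val_fraction)

    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


image_files = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = split_dataset(image_files)
print(len(train), len(val), len(test))
```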

This sounds like a lot of work! How would I go about doing this?

You use a tool to add the annotations. After you’re done, you can export the dataset in whatever format
your model needs.

My favorite tool is Label Studio. You can annotate image objects for computer vision, text for natural
language processing, audio for transcription, and more. I’ve used it only for annotating objects, and it’s
excellent.

You can export the datasets in various formats: CSV, JSON, XML, Pascal VOC XML, etc. There’s even a
format specifically for YOLO.
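
For a sense of how these formats differ, here is a sketch that converts a Pascal VOC style box (pixel corners) into a YOLO style text line (class id followed by the normalized center, width, and height). The function name and the example values are made up.

```python
def voc_to_yolo(box, img_w, img_h, class_id):
    """Convert (x_min, y_min, x_max, y_max) in pixels to a YOLO label line."""
    x_min, y_min, x_max, y_max = box

    # YOLO stores the box center and size, each divided by the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A made-up box in a 640x480 image, with class id 0.
line = voc_to_yolo((100, 200, 300, 400), 640, 480, 0)
print(line)
```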

You can read more on how to get started with Label Studio. It’s super easy to set up. You install Label
Studio via the command pip install -U label-studio, and you then launch it via label-studio. The UI is
intuitive enough to figure things out on the go.

Label Studio Documentation: Get Started with Label Studio (labelstud.io)
Label Studio is an open source data labeling tool for labeling and exploring multiple types of data.

Sweet! But how would I visualize the annotations of these objects?

Use the Python package matplotlib!

If you need help figuring it out, other Towards Data Science contributors have written their own
methods for drawing annotations on images. See below.

How To Use Matplotlib For Plotting Samples From An Object Detection Dataset: a short tutorial on how to
draw plots using Matplotlib for object detection tasks (towardsdatascience.com)
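
If you just want something quick, here is a minimal sketch of drawing annotated boxes with matplotlib. It assumes matplotlib is installed and that boxes are (x_min, y_min, x_max, y_max) in pixels. The function names, colors, and output path are made up for illustration.

```python
def rect_params(box):
    """Convert (x_min, y_min, x_max, y_max) to the corner, width, and height
    that matplotlib's Rectangle expects."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min), x_max - x_min, y_max - y_min


def draw_boxes(image, annotations, out_path="annotated.png"):
    """Draw labeled boxes on an image array and save the figure.

    annotations is a list of (label, box) pairs.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    fig, ax = plt.subplots()
    ax.imshow(image)
    for label, box in annotations:
        xy, width, height = rect_params(box)
        ax.add_patch(Rectangle(xy, width, height, fill=False,
                               edgecolor="red", linewidth=2))
        ax.text(xy[0], xy[1], label, color="white", backgroundcolor="red")
    ax.axis("off")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)


# Typical use (hypothetical file name, not run here):
#   import matplotlib.pyplot as plt
#   image = plt.imread("cows.jpg")
#   draw_boxes(image, [("cow", (12, 40, 110, 160))])

print(rect_params((10, 20, 110, 220)))
```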

Now, you have the tools needed to train existing object detection models on your own custom datasets.


