
Term Project Report on

RASPBERRY PI CROWD
DETECTION SYSTEM
B. Tech. EC
Sem. VII

Submitted by
Name: Sruthi Cheruvullil          Name: Snigdha Lokre
Roll No.: EC010                   Roll No.: EC031
ID No.: 18ECUOS096                ID No.: 18ECUOS036

DEPARTMENT OF ELECTRONICS AND COMMUNICATION


FACULTY OF TECHNOLOGY
DHARMSINH DESAI UNIVERSITY, NADIAD

Certificate
This is to certify that the project entitled “Raspberry Pi Crowd Detection System” is a bonafide
work of Miss CHERUVULLIL SRUTHI GOKULANDHAN, Roll No.: EC010, Identity
No.: 18ECUOS096, of B.Tech. Semester VII in the branch of
“Electronics and Communication” during the academic year 2021-2022.

Staff In-Charge: Prof. Nisarg K. Bhatt

Head of the Department: Dr. Purvang Dalal

Date: 27-11-2021

ACKNOWLEDGEMENT

The success and final outcome of any project requires the right guidance and assistance from
many people, and we are extremely privileged to have received it throughout the completion of
our project. All that we have accomplished is due to such supervision and assistance, and we
would like to thank everyone involved. We respect and thank Prof. Nisarg Bhatt for providing us
the opportunity to carry out the project work and for giving us all the support and guidance that
helped us complete the project on time.
We would like to express our sincere gratitude to our Head of Department, Dr. Purvang Dalal,
for providing us with an inventive atmosphere. Last but not least, we are highly
thankful to the Faculty of Technology, DDU for providing us a platform to put our
classroom-learned skills into practice.

INDEX

Certificate
Acknowledgement
List of Figures
Abbreviations
Abstract
References

Chapter 1 - Introduction
1.1 Background
1.2 Objective
1.3 Scope of the Project
1.4 Research and Principle
1.4.1 Related Works
1.4.2 Existing Methods and Proposed Methodology
1.5 Organization of Thesis

Chapter 2 - Raspberry Pi Crowd Detection System
2.1 Internal Architecture of the Project
2.1.1 Code Flow
2.1.2 Program Code
2.1.3 Working
2.1.4 Troubleshooting

Chapter 3 - Accomplishments
3.1 Results
3.1.1 Batch Processing (Non Real Time) Implementation - Google Colab
3.1.2 Real Time Implementation - RPi and Spyder
3.2 Observations

Chapter 4 - Conclusion and Scope of the Project
4.1 Limitations
4.2 Applications
4.3 Conclusion

LIST OF FIGURES

Fig 1.1 Flow diagram of crowd detection using image processing technique

Fig 1.2 Small sized object

Fig 1.3 Big sized object. What size do we choose for our sliding window detector?

Fig 1.4 Sliding Window Pyramid

Fig 1.5 Working of R- CNN

Fig 1.6 Working of SPP-net

Fig 1.7 Object Detection by YOLO

Fig 1.8 Accuracy and Speed trade-off

Fig 1.9 YOLO vs SSD vs Faster-RCNN for various sizes

Fig 2.1 Blob Detection Method

Fig 2.2 Block diagram

Fig 3.1 Image given as input in Google Colab

Fig 3.2 Crowd detected output in Google Colab

Fig 3.3 Input image captured by RPi camera

Fig 3.4 Crowd detected output image in RPi

ABBREVIATIONS

RPi = Raspberry Pi

COCO = Common Objects in Context

YOLO = You Only Look Once

VNC = Virtual Network Computing

CNN = Convolutional Neural Networks

SSD = Single Shot Detector

PC = Personal Computer

GPU = Graphics Processing Unit

Spyder = Scientific Python Development Environment

WiFi = Wireless Fidelity

SSH = Secure Shell

HDMI = High Definition Multimedia Interface

ABSTRACT

With the expanding population and the several problems that arise in crowded situations, the
need for crowd detection is on the rise. Crowd detection involves estimating the number of
individuals in a group as well as the distribution of crowd density across different regions of the
group. Human monitoring can be quite tiresome and expensive, and this is where automated
crowd surveillance comes into the picture. Crowd density can be estimated from an image or
video of the crowded scene. Our project proposes a real-time approach to such problems related
to dense crowds. It uses live video captured by a Raspberry Pi camera attached to a Raspberry Pi
4 Model B, and attempts to estimate the crowd density of an area by applying image processing
concepts, the COCO classes and YOLO models.

CHAPTER 1

INTRODUCTION

1.1 BACKGROUND

By crowd, we mainly refer to the number of individuals present in a particular place at a
particular time. Such crowds can lead to the spread of diseases like the novel coronavirus, or to
various accidents, and miscreants often use crowds to carry out inhuman activities. Video-based
crowd monitoring is therefore of great importance for maintaining human safety in crowded
situations. To avoid hazardous situations caused by crowds, the need for crowd analysis systems
that can handle dense crowds is on the rise. For any crowd analysis system, crowd counting is a
must. This includes evaluating the aggregate number of people in the crowd, along with the
density of the crowd in different parts of the area. Potential mishaps can be avoided by giving an
advance warning whenever the crowd density of an area exceeds a safe limit, which in turn helps
maintain the overall management and infrastructure of the area.

1.2 OBJECTIVE
Our proposed project uses a Raspberry Pi as the processing board and a Raspberry Pi
camera as the device to capture live video of the place where crowd detection is to
be carried out. The primary objective is to carry out crowd detection without human
intervention in order to increase accuracy and precision.

1.3 SCOPE OF THE PROJECT

The Raspberry Pi crowd detection system can be extended in the future by
including emotional analysis of individual detected people, detection of the velocity of moving
objects (helpful at traffic signals), and classification of objects into human beings, two-wheelers,
three-wheelers, etc.

1.4 RESEARCH AND PRINCIPLE


1.4.1 Related Works

a. Crowd counting for high-density images by Ankan et al. [1], which is inefficient
for images containing mutual occlusion.
b. A deep learning approach by Shao et al. [2] for understanding crowded scenes
from video sequences.
c. A crowd counting technique proposed by Fu et al. [3], which works only for
low-density images and not for high-density ones.
d. A promising crowd counting method proposed by Huiyuan Fu [4], where
probable head regions are found using a depth camera. The system is
infeasible due to the cost overhead of the depth camera, and it does
not work for large regions either.
e. The count of moving objects was estimated in the methods proposed in [5, 6].
These methods use the pattern of moving objects obtained from video streams,
which requires a good frame rate that is tough to achieve, and they do not
work on still images.

f. Zhang et al. [7] introduce a method utilizing a deep network trained using
perspective maps of images.

The methodologies proposed above (a through e) involve the use of image processing
algorithms, namely thresholding (image segmentation) and erosion, as shown in the flowchart below:

Fig 1.1 Flow diagram of crowd detection using image processing technique
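For illustration, the following is a minimal sketch of such a classical pipeline in OpenCV; the file name, kernel size and iteration count are assumptions chosen only for this example, not values from the project itself.

import cv2
import numpy as np

# read the scene in grayscale ("sample.jpg" is a placeholder file name)
img = cv2.imread("sample.jpg", cv2.IMREAD_GRAYSCALE)

# segment the foreground from the background with Otsu thresholding
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# erode to remove noise and separate touching blobs (3x3 kernel is arbitrary)
kernel = np.ones((3, 3), np.uint8)
eroded = cv2.erode(mask, kernel, iterations=2)

# count the remaining connected components as a rough person estimate
num_labels, _ = cv2.connectedComponents(eroded)
print("Estimated blobs (people):", num_labels - 1)  # subtract the background label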

However, image processing using the above techniques works best on images with high levels of
contrast, which is not always the case in real-time scenarios, where images may be distorted
depending on the camera quality or the monitored area may be overcrowded.

Therefore, our main focus would be on state-of-the-art methods, all of which use neural
networks and Deep Learning.

1.4.2 Existing Methods and Proposed Methodology

A few of the important concepts in object detection are a sliding window pyramid
and aspect ratio.

Object detection can be modeled as a classification problem where we take windows of fixed
sizes from the input image at all possible locations and feed these patches to an image classifier.

Each window is fed to the classifier, which predicts the class of the object in the window (or
background if none is present). Hence, we know both the class and the location of the objects in
the image. But how do we choose the window size so that it always contains the object?
Let us look at some examples:

Fig 1.2 Small sized object

Fig 1.3 Big sized object. What size do we choose for our sliding window detector?

As we can see, the object can be of varying sizes. To solve this problem, an image pyramid is
created by scaling the image. The idea is that we resize the image at multiple scales and count on
the fact that our chosen window size will completely contain the object in one of these resized
images. Most commonly, the image is downsampled (reduced in size) until a certain condition,
typically a minimum size, is reached. On each of these images, a fixed-size window detector is
run. It is common to have as many as 64 levels in such pyramids. All of these windows are then
fed to a classifier to detect the object of interest, which solves the problem of size and location.

Fig 1.4 Sliding Window Pyramid
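A minimal sketch of this idea is shown below; the scale factor, minimum size, step and window size are assumed values for illustration, not the settings of any particular detector.

import cv2

def pyramid(image, scale=1.5, min_size=(64, 64)):
    # yield progressively smaller versions of the image
    yield image
    while True:
        w = int(image.shape[1] / scale)
        image = cv2.resize(image, (w, int(image.shape[0] / scale)))
        if image.shape[0] < min_size[1] or image.shape[1] < min_size[0]:
            break
        yield image

def sliding_window(image, step=32, window=(64, 128)):
    # slide a fixed-size window across the image, top-left to bottom-right
    for y in range(0, image.shape[0] - window[1] + 1, step):
        for x in range(0, image.shape[1] - window[0] + 1, step):
            yield (x, y, image[y:y + window[1], x:x + window[0]])

# every patch from every pyramid level would be passed to the classifier
# img = cv2.imread("scene.jpg")   # placeholder file name
# for level in pyramid(img):
#     for (x, y, patch) in sliding_window(level):
#         pass  # classify the patch here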

The other problem is the aspect ratio. Objects can appear in various shapes; for example, a
sitting person will have a different aspect ratio than a standing or sleeping person.

The following methods exist in the field of object detection:

1. Object detection by classification by building a pipeline: Here, object detection is
handled as a classification problem by building a pipeline in which object proposals
are first generated and then sent to classification/regression heads.

a. Object Detection using Hog Features.

HOG features are good for many real-world problems. On each window obtained from
running the sliding window over the pyramid, we calculate HOG features, which are fed to an
SVM (support vector machine) to create a classifier.
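As an aside, OpenCV ships a pre-trained HOG + linear SVM pedestrian detector, so a minimal sketch of this approach (with an assumed input file name) looks roughly like this:

import cv2

# initialise the HOG descriptor with OpenCV's default people detector (a pre-trained SVM)
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("scene.jpg")  # placeholder file name

# detectMultiScale internally runs the sliding window over an image pyramid
rects, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
print("People found:", len(rects))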

b. Region-based Convolutional Neural Networks (R-CNN).

Compared to HOG-based classifiers, CNN-based classifiers are far more accurate, but CNNs are
too slow and computationally expensive to run on the huge number of patches generated by a
sliding window detector. R-CNN solves this problem by using an object proposal algorithm
called Selective Search, which reduces the number of bounding boxes that are fed to the classifier.

The 3 important parts of R-CNN are:

I. Run Selective Search to generate probable object regions (proposals).
II. Feed these patches to a CNN, followed by an SVM, to predict the class of each patch.
III. Optimize the patches by training a bounding box regression model separately.

Fig 1.5 Working of R- CNN
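For reference, a hedged sketch of generating Selective Search proposals with OpenCV is given below; it requires the opencv-contrib-python package, and the file name is a placeholder.

import cv2

img = cv2.imread("scene.jpg")  # placeholder file name

# create the Selective Search segmentation object (opencv-contrib required)
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()  # fast mode trades some recall for speed

# each rect is (x, y, w, h); R-CNN would crop these regions and classify each one
rects = ss.process()
print("Total region proposals:", len(rects))
for (x, y, w, h) in rects[:100]:  # draw only the first 100 proposals
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)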

c. Spatial Pyramid Pooling (SPP-net).

Still, R-CNN is very slow. With SPP-net, we calculate the CNN representation of the entire
image only once, and use it to compute the representation of each patch generated by Selective
Search by performing a pooling operation on just the section of the last convolutional layer's
feature map that corresponds to the region.
It also uses spatial pyramid pooling after the last convolutional layer, as opposed to the
traditionally used max-pooling, because a fixed-size input is needed for the fully connected
layers of the CNN.

Fig 1.6 Working of SPP-net
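To make the pooling idea concrete, here is a small NumPy sketch (with assumed pyramid levels of 1x1, 2x2 and 4x4 bins) that turns a feature map of arbitrary spatial size into a fixed-length vector:

import numpy as np

def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    # fmap: feature map of shape (H, W, C) with arbitrary H and W
    H, W, C = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                # bin boundaries chosen so the n x n bins cover the whole map
                y0, y1 = int(np.floor(i * H / n)), int(np.ceil((i + 1) * H / n))
                x0, x1 = int(np.floor(j * W / n)), int(np.ceil((j + 1) * W / n))
                pooled.append(fmap[y0:y1, x0:x1, :].max(axis=(0, 1)))
    # output length is C * (1 + 4 + 16) regardless of H and W
    return np.concatenate(pooled)

print(spatial_pyramid_pool(np.random.rand(13, 9, 256)).shape)   # (5376,)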

d. Fast R-CNN.

With SPP-net, it is not trivial to perform back-propagation through the spatial pooling layer, so
the network only fine-tunes its fully connected part. Fast R-CNN combines the ideas of SPP-net
and R-CNN and fixes this key problem, i.e. it makes the network trainable end-to-end. In
addition, Fast R-CNN folds the bounding box regression into the neural network training itself,
reducing the overall training time and increasing accuracy compared to SPP-net thanks to the
end-to-end learning of the CNN.

e. Faster R-CNN.

Even though Fast R-CNN is fast and accurate, its slowest part is Selective Search (or Edge
Boxes). Faster R-CNN replaces Selective Search with a very small convolutional network,
called the Region Proposal Network (RPN), to generate regions of interest.

It also introduces the idea of anchor boxes. At each location, the original paper uses 3 scales of
anchor boxes, 128x128, 256x256 and 512x512, and three aspect ratios, 1:1, 2:1 and 1:2. So in
total, at each location there are 9 boxes for which the RPN predicts the probability of being
background or foreground. As a result, Faster R-CNN is about 10 times faster than Fast R-CNN
with similar accuracy.
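A short NumPy sketch of generating these 9 anchor boxes, centred on an assumed location, illustrates the scale/aspect-ratio combinations:

import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    # return the 9 anchor boxes (x1, y1, x2, y2) centred at (cx, cy)
    boxes = []
    for s in scales:
        for r in ratios:
            # keep the anchor area equal to s*s while making height:width = r
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

print(anchors_at(300, 300).shape)   # (9, 4): one box per scale/ratio pair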

2. Regression-based object detectors: This includes methods that pose detection as a
regression problem.

f. YOLO (You Only Look Once).

For YOLO, detection is a simple regression problem which takes an input image and learns
the class probabilities and bounding box coordinates.

It divides each image into an S x S grid, and each grid cell predicts N bounding boxes and a
confidence score. The confidence reflects the accuracy of the bounding box and whether the
bounding box actually contains an object (regardless of class). YOLO also predicts a
classification score for each box for every class seen in training; combining the two gives the
probability of each class being present in a predicted box.

In total, S x S x N boxes are predicted. However, most of these boxes have low confidence
scores, and if we set a threshold, say 30% confidence, we can remove most of them, as shown
in the example below.

Fig 1.7 Object Detection by YOLO
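As a toy illustration of this confidence-filtering step (the grid size of 7x7 with 2 boxes per cell and the random scores below are made-up values, not YOLO's actual output):

import numpy as np

# hypothetical confidence scores for S x S x N = 7 x 7 x 2 = 98 predicted boxes
confidences = np.random.rand(7, 7, 2)

# keep only the boxes whose confidence exceeds the 30% threshold
keep = confidences > 0.30
print("Boxes kept:", int(keep.sum()), "out of", confidences.size)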

At runtime, the image is passed through the CNN only once, so YOLO is very fast and can run in
real time. Another key difference is that YOLO sees the complete image at once, as opposed to
looking at generated region proposals as in the previous methods, and this contextual
information helps it avoid false positives. However, one limitation of YOLO is that it predicts
only one class per grid cell, so it struggles with very small objects.

g. Single Shot Detector (SSD).

The Single Shot Detector achieves a good balance between speed and accuracy. SSD runs a
convolutional network on the input image only once and computes a feature map. To handle
scale, it predicts bounding boxes after multiple convolutional layers. Since each convolutional
layer operates at a different scale, it is able to detect objects of various scales.

Coming back to the main question: “What method should we adopt in order to monitor the
crowd?” Below is a comparison of all the above-mentioned methods, as given by [8].

Fig 1.8 Accuracy and Speed trade-off

Fig 1.9 YOLO vs SSD vs Faster-RCNN for various sizes

Keeping in mind that crowd images may be mutually occluded or distorted (depending on
camera resolution) and that crowds are continuously moving, speed seems to be somewhat more
important than accuracy in this respect. Hence, our chosen methodology for implementation is
YOLO based on the DarkNet framework (Darknet is an open-source neural network framework).

However, it is a challenging task to model and implement the YOLO CNN from scratch,
especially for beginners, as it requires the development of many customized model elements
for training and for prediction. For example, even using a pre-trained model directly requires
sophisticated code to distill and interpret the predicted bounding boxes output by the model.
Instead of developing this code from scratch, we use a third-party implementation, i.e. YOLOv3
trained on the COCO dataset, which consists of 80 labels, including, but not limited
to:

People; bicycles; cars and trucks; airplanes; stop signs and fire hydrants; animals, including
cats, dogs, birds, horses, cows, and sheep; kitchen and dining objects, such as wine glasses,
cups, forks, knives, spoons, etc.

Since we only need to detect people and exclude all other classes, we specify in our code:

detect_people(frame, net, ln, personIdx=LABELS.index("person"))

This call detects people (and only people) in the frame.
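For context, LABELS above comes from the coco.names file shipped alongside the YOLO weights; a minimal sketch of how the person index is obtained (the relative path is a placeholder) is:

# read the 80 COCO class names, one per line (path is a placeholder)
LABELS = open("yolo-coco/coco.names").read().strip().split("\n")

# "person" is the first COCO class, so this index is 0
personIdx = LABELS.index("person")
print(personIdx, LABELS[personIdx])   # 0 person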

1.5 ORGANIZATION OF THESIS

Chapter 1 gives an overview of the crowd detection system; its need in today's world
is briefly discussed.

Chapter 2 describes the internal architecture of the software part, and its working is
discussed thoroughly with the help of a block diagram.

Chapter 3 shows the outputs obtained and what can be observed from them.

Chapter 4, at the end, gives a proper conclusion along with the future scope and aspects of
the proposed project.

CHAPTER 2

RASPBERRY PI CROWD DETECTION SYSTEM

2.1 INTERNAL ARCHITECTURE OF THE PROJECT

2.1.1 CODE FLOW

The implementation of the code was carried out on Google Colab initially, and then on
Raspbian, which was installed on the RPi and accessed over SSH using PuTTY and VNC. The
code is divided into three parts.

1) Setting the variable values

a. A minimum confidence value is set, below which weak detections are filtered out.
b. A threshold value is set for non-maxima suppression, used when drawing a box around the
detected object.

2) Creating the people detection function

a. Import the necessary packages.
b. Grab the dimensions of the frame and initialize the results list.

Fig 2.1 Blob Detection from Image

c. Construct a blob from the input frame and pass it to the YOLO object detector.
d. Set the box coordinates, centroids and confidences (probability of object detection).
e. Extract the class ID and confidence.
f. Filter detections by ensuring that the detected object is a human and that the minimum
confidence is met.
g. Apply non-maxima suppression to prevent nearby boxes from overlapping and causing
confusion.
h. Update the results list if at least one person is detected.

3) Grabbing frames and making predictions of detected people

a. Import the necessary packages.
b. Construct the argument parser and parse the arguments.
c. Load the COCO class labels on which the YOLO model used is based.
d. Derive the paths to the recorded video and the YOLO weights and configuration.
e. Initialize the video stream.
f. Resize each frame and then detect people in it.

2.1.2 PROGRAM CODE

1) Google Colab - GPU simulation on recorded video clip:


from google.colab import drive
drive.mount('/content/drive')

# -------------------- Setting up the variable values ------------------- #

MIN_CONF = 0.3
NMS_THRESH = 0.3

# ----------------- Creating the People Detection Function -------------- #

# import the necessary packages
import numpy as np
import cv2

def detect_people(frame, net, ln, personIdx=0):
    # grab the dimensions of the frame and initialize the list of results
    (H, W) = frame.shape[:2]
    results = []

    # construct a blob from the input frame and perform a forward pass of YOLO
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)

    # initialize lists of detected bounding boxes, centroids and confidences
    boxes = []
    centroids = []
    confidences = []

    # loop over each of the layer outputs and each detection
    for output in layerOutputs:
        for detection in output:
            # extract the class ID and confidence of the current detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # filter detections by (1) ensuring that the object detected
            # was a person and (2) that the minimum confidence is met
            if classID == personIdx and confidence > MIN_CONF:
                # scale the bounding box coordinates back to the size of the image
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")

                # derive the top-left corner from the centre, width and height
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                # update the lists of bounding box coordinates,
                # centroids, and confidences
                boxes.append([x, y, int(width), int(height)])
                centroids.append((centerX, centerY))
                confidences.append(float(confidence))

    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, MIN_CONF, NMS_THRESH)

    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            # update our results list to consist of the person prediction
            # probability, bounding box coordinates, and the centroid
            r = (confidences[i], (x, y, x + w, y + h), centroids[i])
            results.append(r)

    # return the list of results
    return results

# ----- Grab frames from video and make prediction of detected people ----- #

# import the necessary packages
from google.colab.patches import cv2_imshow
from scipy.spatial import distance as dist
import numpy as np
import argparse
import imutils
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", type=str, default="",
                help="path to (optional) input video file")
ap.add_argument("-o", "--output", type=str, default="",
                help="path to (optional) output video file")
ap.add_argument("-d", "--display", type=int, default=1,
                help="whether or not output frame should be displayed")
args = vars(ap.parse_args(["--input", "/content/drive/My Drive/social-distance-detector/pedestrians.mp4",
                           "--output", "my_output.avi", "--display", "1"]))

# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join(["/content/drive/My Drive/social-distance-detector/yolo-coco/coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join(["/content/drive/My Drive/social-distance-detector/yolo-coco/yolov3.weights"])
configPath = os.path.sep.join(["/content/drive/My Drive/social-distance-detector/yolo-coco/yolov3.cfg"])

print("Loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

# determine only the output layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# initialize the video stream and pointer to output video file
print("Accessing video stream...")
vs = cv2.VideoCapture(args["input"] if args["input"] else 0)
writer = None

# loop over the frames from the video stream
while True:
    # read the next frame from the file
    (grabbed, frame) = vs.read()

    # if the frame was not grabbed, then we have reached the end of the stream
    if not grabbed:
        break

    # resize the frame and then detect people (and only people) in it
    frame = imutils.resize(frame, width=700)
    results = detect_people(frame, net, ln, personIdx=LABELS.index("person"))

    # extract all centroids from the results and compute the
    # Euclidean distances between all pairs of the centroids
    if len(results) >= 2:
        centroids = np.array([r[2] for r in results])
        D = dist.cdist(centroids, centroids, metric="euclidean")

    # loop over the results
    for (i, (prob, bbox, centroid)) in enumerate(results):
        # extract the bounding box and centroid coordinates, then
        # initialize the color of the annotation
        (startX, startY, endX, endY) = bbox
        (cX, cY) = centroid
        color = (0, 255, 0)

        # draw (1) a bounding box around the person and (2) the
        # centroid coordinates of the person
        cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
        cv2.circle(frame, (cX, cY), 5, color, 1)

    if args["display"] > 0:
        # show the output frame
        cv2_imshow(frame)

    # if an output video file path has been supplied and the video
    # writer has not been initialized, do so now
    if args["output"] != "" and writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(args["output"], fourcc, 25,
                                 (frame.shape[1], frame.shape[0]), True)

    # if the video writer is not None, write the frame to the output video file
    if writer is not None:
        writer.write(frame)

2) Spyder - Real time simulation on RPi on captured image:


# import the necessary packages
from scipy.spatial import distance as dist
import numpy as np
import argparse
import imutils
import cv2
import os

# -------------------- Setting up the variable values ------------------- #

MIN_CONF = 0.3
NMS_THRESH = 0.3

# ----------------- Creating the People Detection Function -------------- #

def detect_people(frame, net, ln, personIdx=0):
    # grab the dimensions of the frame and initialize the list of results
    (H, W) = frame.shape[:2]
    results = []

    # construct a blob from the input frame and perform a forward pass of YOLO
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(ln)

    # initialize lists of detected bounding boxes, centroids and confidences
    boxes = []
    centroids = []
    confidences = []

    # loop over each of the layer outputs and each detection
    for output in layerOutputs:
        for detection in output:
            # extract the class ID and confidence of the current detection
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]

            # filter detections by (1) ensuring that the object detected
            # was a person and (2) that the minimum confidence is met
            if classID == personIdx and confidence > MIN_CONF:
                # scale the bounding box coordinates back to the size of the image
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")

                # derive the top-left corner from the centre, width and height
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))

                # update the lists of bounding box coordinates,
                # centroids, and confidences
                boxes.append([x, y, int(width), int(height)])
                centroids.append((centerX, centerY))
                confidences.append(float(confidence))

    # apply non-maxima suppression to suppress weak, overlapping bounding boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, MIN_CONF, NMS_THRESH)

    # ensure at least one detection exists
    if len(idxs) > 0:
        # loop over the indexes we are keeping
        for i in idxs.flatten():
            # extract the bounding box coordinates
            (x, y) = (boxes[i][0], boxes[i][1])
            (w, h) = (boxes[i][2], boxes[i][3])

            # update our results list to consist of the person prediction
            # probability, bounding box coordinates, and the centroid
            r = (confidences[i], (x, y, x + w, y + h), centroids[i])
            results.append(r)

    # return the list of results
    return results

# --------------- Load module and initialize picamera --------------- #

import time
import picamera

with picamera.PiCamera() as camera:
    camera.start_preview()
    try:
        # capture numbered images; break after the first frame (i == 0)
        for i, filename in enumerate(camera.capture_continuous('image{counter:02d}.jpg')):
            print(filename)
            time.sleep(1)
            if i == 0:
                break
    finally:
        camera.stop_preview()

# ---- Grab the captured frame and make prediction of detected people ---- #

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", type=str, default="",
                help="path to (optional) input image file")
ap.add_argument("-o", "--output", type=str, default="",
                help="path to (optional) output image file")
ap.add_argument("-d", "--display", type=int, default=1,
                help="whether or not output frame should be displayed")
args = vars(ap.parse_args(["--input", "/home/pi/sem7/tp/cds/cds1/CDS_orig/image01.jpg",
                           "--output", "my_output.jpg", "--display", "1"]))

# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join(["yolo-coco/coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")

# derive the paths to the YOLO weights and model configuration
weightsPath = os.path.sep.join(["yolo-coco/yolov3.weights"])
configPath = os.path.sep.join(["yolo-coco/yolov3.cfg"])

print("Loading YOLO from disk...")
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)

# determine only the output layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# read the captured image and point to the output image file
print("Accessing Captured Image...")
imgpath = "/home/pi/sem7/tp/cds/cds1/CDS_orig/image01.jpg"
frame = cv2.imread(imgpath, 1)
# if args["display"] > 0:
cv2.imshow('Image', frame)
cv2.waitKey(0)
cv2.destroyAllWindows()
writer = None

# resize the frame and then detect people (and only people) in it
frame = imutils.resize(frame, width=700)
results = detect_people(frame, net, ln, personIdx=LABELS.index("person"))

# extract all centroids from the results and compute the
# Euclidean distances between all pairs of the centroids
if len(results) >= 2:
    centroids = np.array([r[2] for r in results])
    D = dist.cdist(centroids, centroids, metric="euclidean")

# loop over the results
for (i, (prob, bbox, centroid)) in enumerate(results):
    # extract the bounding box and centroid coordinates, then
    # initialize the color of the annotation
    (startX, startY, endX, endY) = bbox
    (cX, cY) = centroid
    color = (0, 255, 0)

    # draw (1) a bounding box around the person and (2) the
    # centroid coordinates of the person
    cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
    cv2.circle(frame, (cX, cY), 5, color, 1)

if args["display"] > 0:
    # show the output frame
    cv2.imshow("Frame", frame)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

# if an output image file path has been supplied and the writer
# has not been initialized, save the annotated frame
if args["output"] != "" and writer is None:
    outpath = "/home/pi/sem7/tp/cds/cds1/CDS_orig/my_output.jpg"
    cv2.imwrite(outpath, frame)

# if the writer is not None, write the frame to the output image file
if writer is not None:
    writer.write(frame)

2.1.3 WORKING

The Raspberry Pi camera is used along with the Raspberry Pi to capture live video. The live
video is then processed frame by frame using image processing concepts.

Fig 2.2 Block diagram

The block diagram shown in Fig 2.2 outlines how the whole project works. It starts with
a Raspberry Pi camera which records real-time video, which is passed on to the Raspberry
Pi board connected to an external power supply. Normally, a PC is not capable of supplying
enough current to the RPi, so we connect an external power supply with an output current
greater than 3 A.

The Raspberry Pi is connected to the PC over WiFi through the SSH client PuTTY, and it
carries out the necessary processing and computation to detect humans. Raspbian is operated
through VNC Viewer. When the code is run in Spyder or any similar software, an image is
captured by the Pi camera, saved, and displayed on the screen via VNC. The next step is to
process this image with the help of the COCO labels and the YOLO framework; this processing
is done frame by frame. Once the processing is complete, the frame is annotated with the
bounding boxes and centroids and finally displayed on screen as the output, which is saved to
the specified path.

2.1.4 TROUBLESHOOTING

1. Segmentation fault errors because of bounding box size. We faced this error initially when
running the code in Spyder on Windows. We then shifted to Google Colab, because it offers GPU
simulation, and later to Spyder on Raspbian, where we did not face the same issue.

2. swapRB type-conflict error. This error occurred because the ‘mean’ argument of the
blobFromImage function takes values in RGB order, while OpenCV uses BGR order.
Setting swapRB=True made both formats consistent.

3. Input file error. This error occurred because of an incorrect drive file path where the
YOLO weights and config files were located.

4. Google Colab cannot be interfaced with microcontrollers. Although it now seems obvious
that Colab is a virtual/cloud platform intended only for simulating code and catching possible
errors before actual implementation on hardware, we did not know this initially, so we had to
carry out the entire process of Raspbian installation and configuration.

5. The cable required for the RPi 4 is HDMI to micro-HDMI (HDMI0), while the regular cables
used are HDMI1.

6. Raspbian installation. During Raspbian installation we faced various minor issues. For
example, we already had an 8 GB SD card, but when we tried writing the Raspbian OS image, it
could not be written because the disk image was larger than the card's capacity, and we could
not find software to compress a disk image. After numerous such problems, we configured the
RPi using PuTTY and VNC [16].

7. ‘Cannot display desktop’ error. We changed the resolution and reinstalled the session manager
LXSession [17].

8. Real-time crowd detection on a live video stream. Live video capture is not possible on
Colab, and since the main reason for using Colab was its GPU simulation capability, running
the final code on Raspbian ultimately meant running on a CPU, which is extremely slow for
intensive processing and computation like this. Whenever we tried to capture the live video
stream, the RPi would get extremely hot before the output frames could even start to display.
For this reason, real-time testing was done on a single image/frame instead of a video.

Although the live-stream version could easily be run on a CUDA GPU, according to this
article [19] it appears that none of the GPUs available on the market can be initialized from an
RPi 4 Model B.

9. OpenCV version. At one point we faced an error related to the OpenCV version:

cv2.error: OpenCV(4.5.3) /tmp/pip-wheel-pd499c91/opencv-python_3a15e83eee864e65b7311a199a94e9f1/opencv/modules/core/src/array.cpp:2494: error: (-206:Bad flag (parameter or structure field)) Unrecognized or unsupported array type in function 'cvGetMat'

From the error alone it was not clear that the OpenCV version was the problem. We tried
running a very basic part of the code in Spyder on Windows, and when it produced the output
successfully, we updated the OpenCV version on Raspbian to match the one we were using in
Spyder on Windows.

After updating the version, we got another error:

ImportError: libjasper.so.1: cannot open shared object file: No such file or directory on RPi

which we were able to solve with the help of this thread [18].

CHAPTER 3

ACCOMPLISHMENTS

3.1 RESULTS

3.1.1 Batch Processing (Non Real Time) Implementation - Google Colab

Fig 3.1 Image given as input in Google Colab

Fig 3.2 Crowd detected output in Google Colab

3.1.2 Real Time Implementation - RPi and Spyder


Fig 3.3 Input image captured by RPi camera

Fig 3.4 Crowd detected output image in RPi

3.2 OBSERVATIONS

Object Detection Metrics are often useful in making observations related to object detection.
A few related terms are:

Precision
Precision is the ability of a model to identify only the relevant objects. It is the percentage of
correct positive predictions and is given by:

Precision = TP / (TP + FP)

Recall
Recall is the ability of a model to find all the relevant cases (all ground-truth bounding
boxes). It is the percentage of true positives detected among all relevant ground truths and is
given by:

Recall = TP / (TP + FN)

True Positive, False Positive, False Negative and True Negative

True Positive (TP): A correct detection; a detection with IoU ≥ threshold.
False Positive (FP): A wrong detection; a detection with IoU < threshold.
False Negative (FN): A ground truth that was not detected.
True Negative (TN): Does not apply. It would represent a correctly rejected misdetection. In the
object detection task there are many possible bounding boxes that should not be detected
within an image, so TN would be all the possible bounding boxes that were correctly not
detected (a huge number of boxes per image). That is why it is not used by the metrics.

Threshold
Depending on the metric, it is usually set to 50%, 75% or 95%.
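As a rough sketch of how these quantities are computed (the boxes, threshold and counts below are made-up values used only for illustration):

def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2); returns intersection-over-union
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

# hypothetical detection vs ground truth, judged at a 50% IoU threshold
detection, ground_truth = (50, 50, 150, 200), (60, 55, 160, 210)
tp = 1 if iou(detection, ground_truth) >= 0.5 else 0

# precision = TP / (TP + FP), recall = TP / (TP + FN), with made-up counts
TP, FP, FN = 8, 2, 3
print("precision:", TP / (TP + FP), "recall:", TP / (TP + FN))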

With the help of the procedure mentioned in [20], we were able to generate confidence,
precision and recall values for our code on 5 sample images, some captured with our own
camera and some taken at random from the internet.

Image No.   Ground Truth   Confidence   TP or FP   Precision   Recall
1           10             54%          TP         1.00        0.63
2           54             67%          TP         0.95        1.00
3           59             44%          FP         0.87        0.94
4           38             71%          FP         0.88        0.60
5           153            95%          TP         0.80        0.74

Therefore the average accuracy from a sample of just 5 images processed using YOLO is
approximately 88%.
CHAPTER 4
CONCLUSION AND SCOPE OF THE PROJECT

4.1 LIMITATIONS

1. Unable to detect objects which are masked (partially hidden).
If the whole body of a person is not visible, the detection will not be carried out
successfully.

2. Object classification.
Any object appearing would be detected without classifying it as a human being, car,
bike, etc.

3. RPi computational requirements.
The computation on the Raspberry Pi requires high power.

4. Network interruption.
If the WiFi connection is interrupted, PuTTY, the SSH client for Windows, becomes
inactive and has to be restarted.

5. Accuracy and confidence.
The confidence is good, but the accuracy of the system is comparatively slightly lower.
This also depends on the camera resolution.

4.2 APPLICATIONS

Several applications rely on a robust and efficient crowd management and monitoring system.

1. Maintaining public order in crowded places such as airports, carnivals, sports events and
railway stations is essential, and counting people is an essential factor in any crowd
management system. Particularly in smaller areas, an increase in the number of people can
cause problems such as fatalities and physical injuries. Detecting such unnecessary social
gatherings/public events and alerting the required authorities can be done easily.

2. Obtaining real-world data for revenue opportunity analysis. This would help places like
a cafeteria, where knowing the number of people present at the food counter supports
better decisions regarding service offerings and advertisement, and helps streamline staffing
levels.

3. The number of fighter jets, soldiers, and moving drones, and their motion, can be estimated
through a proper crowd management system; thus the strength of armed forces can be
estimated.

4. Crowd monitoring systems are used to minimize terror attacks at public gatherings.
Traditional machine learning methods do not perform well in these situations, so methods
suited to this kind of detection activity can be explored.

5. Detection of high traffic on roads.

6. Observing the features of the persons entering (male/female).

4.3 CONCLUSION

Crowd image analysis is an essential task for several applications. Crowd analysis
provides sufficient information for several tasks, including counting, localization, and
behaviour analysis. Our proposed project is easy to use, affordable and efficient, and it can
resolve the tiresome job of manual human monitoring.
From the perspective of the speed and accuracy required of crowd detection systems, we
conclude that YOLO can be considered a potential solution. However, with the right kind
of computing tools, such as a GPU and better microcontrollers and/or microprocessors, this
implementation could be even more efficient and smooth.

REFERENCES

[1] Antic B, Letic D, Culibrk D, Crnojevic V (2009) K- means based segmentation for real
time zenithal people counting. In: Proceedings of IEEE international conference on image
processing, pp 2565–2568

[2] Shao J (2017) Crowded scene understanding by deeply learned volumetric slices. IEEE
Trans Circuits Syst Video Technol 27(3):613–623

[3] Bansal A, Venkatesh KS (2009) People counting in high density crowds from still
images. In: Proceedings of. IEEE international conference on computer vision and pattern
recognition, pp 1093–1100

[4] Fu H, Ma H, Xiao H (2014a) Crowd counting via head detection and motion flow
estimation. In: Proceedings of 22nd ACM international conference on Multimedia, Florida,
pp 877–880

[5] Fu H, Ma H, Xiao H (2014b) Real-time accurate crowd counting based on RGBD
information. In: Proceedings of IEEE international conference on image processing, pp
2585–2568

[6] Chauhan RV, Kumar S, Singh SK (2016) Human count estimation in high density crowd
images and videos. In: Proceedings of fourth international conference on parallel, distributed
and grid computing (PDGC), Waknaghat, pp 343–347

[7] Brostow GJ, Cipolla R (2006) Unsupervised bayesian detection of independent motion in
crowds. In: Proceedings IEEE computer society conference on computer vision and pattern
recognition (CVPR’06), pp 594–601

[8] Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep
convolutional neural networks. In: Proceedings of IEEE conference on computer vision and
pattern recognition (CVPR), Boston, MA, pp 833–841

[9] Zero to Hero: Guide to Object Detection using Deep Learning: Faster R-CNN, YOLO, SSD
- CV-Tricks.com, 2021

[10] Advances and Trends in Real Time Visual Crowd Analysis, MDPI

[11] Real-Time Crowd Detection and Surveillance System, IEEE

[12] Methods for Crowd Video Surveillance and Analysis, IEEE

[13] Simple object tracking with OpenCV

[14] Darknet Github Repository

[15] Perform Object Detection With YOLO in Keras

[16] Set Up a Headless Raspberry Pi using PuTTY and VNC

[17] Fix Raspberry Pi's 'Cannot Currently Show the Desktop' Error

[18] ImportError: libjasper.so.1: cannot open shared object file: No such file or directory on
RPi

[19] External GPUs and the Raspberry Pi Compute Module 4
