RASPBERRY PI CROWD DETECTION SYSTEM
B. Tech. EC
Sem. VII
Submitted by
Name: Sruthi Cheruvullil        Name: Snigdha Lokre
Roll No.: EC010                 Roll No.: EC031
ID No.: 18ECUOS096              ID No.: 18ECUOS036
Certificate
This is to certify that the project entitled “Raspberry Pi Crowd Detection System” is a
bonafide work of Miss. Cheruvullil Sruthi Gokulandhan, Roll No.: EC010, Identity
No.: 18ECUOS096, of B.Tech. Semester VII in the branch of “Electronics and
Communication” during the academic year 2021-2022.
Date: 27-11-2021
ACKNOWLEDGEMENT
The success and final outcome of any project requires the right guidance and assistance from
a lot of people, and we are extremely privileged to have received it throughout the completion
of our project. All that we have done is only due to such supervision and assistance, and we
would like to thank all of them. We respect and thank Prof. Nisarg Bhatt for providing us an
opportunity to carry out the project work and giving us all the support and guidance that
enabled us to complete the project duly.
We would like to express our sincere gratitude to our Head of Department, Dr. Purvang Dalal,
for providing us with an inventive atmosphere. Last but not the least, we are highly
thankful to the Faculty of Technology, DDU for providing us such a platform to give life to
the skills we learned in the classroom.
INDEX
Certificate
Acknowledgement
List of Figures
Abbreviations
Abstract
Chapter 1 - Introduction
1.1 Background
1.2 Objective
Chapter 2
2.1.3 Working
2.1.4 Troubleshooting
Chapter 3 - Accomplishments
3.1 Results
3.1.2 Real Time Implementation - RPi and Spyder
3.2 Observations
Chapter 4 - Conclusion and Scope of the Project
4.1 Limitations
4.2 Applications
4.3 Conclusion
References
LIST OF FIGURES
Fig 1.1 Flow diagram of crowd detection using image processing technique
Fig 1.2 Small sized object
Fig 1.3 Big sized object. What size do we choose for our sliding window detector?
Fig 1.4 Sliding Window Pyramid
Fig 1.8 Accuracy and Speed trade-off
Fig 2.2 Block diagram
ABBREVIATIONS
RPi = Raspberry Pi
PC = Personal Computer
ABSTRACT
With the expanding population and the several problems arising from crowded situations, the
necessity of crowd detection is also on the rise. It involves estimating the number of
individuals in a crowd as well as the distribution of crowd density across different regions
of the scene. Human monitoring can be quite tiresome and expensive. This is where automated
crowd surveillance comes into the picture. Such crowd density can be estimated from an image
or video of the crowded scene. Our project proposes a real-time approach to solve such
problems related to dense crowds. It uses live video capture with a Raspberry Pi camera and a
Raspberry Pi 4 Model B, and attempts to estimate the crowd density of an area by applying
image processing concepts, the COCO classes and YOLO models.
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
1.2 OBJECTIVE
Our proposed project uses the Raspberry Pi as the processing board and the Raspberry Pi
camera as the device to capture live video of a place where crowd detection is to
be carried out. The primary objective is to carry out crowd detection without human
interference in order to increase accuracy and precision.
The Raspberry Pi crowd detection system can extend its facets in the future by
including individual emotional analysis of detected people, detection of the velocity of
moving objects (helpful at traffic signals) and object classification into human beings,
two-wheelers, three-wheelers, etc.
Several approaches to crowd counting and analysis have been proposed in the literature:
a. A crowd count method for high-density images by Ankan et al. [1], which is inefficient
for images containing mutual occlusion.
b. A deep learning approach by Shao et al. [2] for understanding crowded scenes
from video sequences.
c. A crowd counting technique proposed by Fu et al. [3], which works only for
low-density images and not for high-density ones.
d. A promising crowd counting method proposed by Huiyuan Fu [4], where
probable head regions are found using a depth camera. The system is
infeasible due to the cost overhead of the depth camera, and it does
not work for large regions either.
e. Methods that estimate the count of moving objects [5, 6]. These methods use the
pattern of moving objects obtained from video streams, require a good frame
rate, which is difficult to achieve, and do not work on still images.
f. Zhang et al. [7] introduce a method utilizing a deep network trained using
perspective maps of images.
Fig 1.1 Flow diagram of crowd detection using image processing technique
However, image processing using the above techniques is most appropriate for images
with high levels of contrast, which is not always the case in real-time scenarios, where
images may be distorted depending on the camera quality, or the monitored places
may be overcrowded.
Therefore, our main focus is on state-of-the-art methods, all of which use neural
networks and deep learning.
A few of the important concepts in object detection are the sliding window, the image
pyramid and the aspect ratio.
Each window is fed to a classifier which predicts the class of the object in the window (or
background if none is present). Hence, we know both the class and the location of the objects
in the image. But how do we choose the size of the window so that it always contains the
object? Let us look at examples:
Fig 1.2 Small sized object
Fig 1.3 Big sized object. What size do we choose for our sliding window detector?
As we can see, the object can be of varying sizes. To solve this problem, an image
pyramid is created by scaling the image. The idea is that we resize the image at multiple
scales and count on the fact that our chosen window size will completely contain the object
in one of these resized images. Most commonly, the image is downsampled (reduced in size)
until a certain condition, typically a minimum size, is reached. On each of these images, a
fixed-size window detector is run. It is common to have as many as 64 levels in such
pyramids. Now, all these windows are fed to a classifier to detect the object of interest.
This solves the problem of size and location.
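To make the idea concrete, below is a minimal sketch of an image pyramid with a sliding
window, assuming OpenCV is installed; the scale factor, minimum size, window size and file
name are illustrative choices, not values from our final system.
import cv2
def pyramid(image, scale=1.5, min_size=(64, 64)):
    # yield the image at progressively smaller scales
    yield image
    while True:
        w = int(image.shape[1] / scale)
        h = int(image.shape[0] / scale)
        if w < min_size[0] or h < min_size[1]:
            break
        image = cv2.resize(image, (w, h))
        yield image
def sliding_window(image, step=32, win=(64, 64)):
    # yield (x, y, patch) tuples from a fixed-size window slid over the image
    for y in range(0, image.shape[0] - win[1] + 1, step):
        for x in range(0, image.shape[1] - win[0] + 1, step):
            yield (x, y, image[y:y + win[1], x:x + win[0]])
# every patch from every pyramid level would be fed to a classifier
image = cv2.imread("crowd.jpg")
for layer in pyramid(image):
    for (x, y, patch) in sliding_window(layer):
        pass  # classify(patch) here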
Fig 1.4 Sliding Window Pyramid
The other problem is the aspect ratio. Objects can be present in various shapes: a
sitting person will have a different aspect ratio than a standing person or a sleeping person.
HOG features are good for many real-world problems. On each window obtained from
running the sliding window on the pyramid, we calculate HOG features, which are fed to an
SVM (Support Vector Machine) to create classifiers.
CNN-based classifiers are far more accurate than HOG-based ones, but they are slow and
computationally very expensive: it is impractical to run a CNN on the huge number of patches
generated by a sliding window detector. R-CNN solves this problem by using an object
proposal algorithm called Selective Search, which reduces the number of bounding boxes
that are fed to the classifier.
Still, R-CNN is very slow. With SPP-net, we calculate the CNN representation for the entire
image only once and use it to derive the CNN representation of each patch
generated by Selective Search, by performing a pooling-type operation on just the section
of the feature maps of the last convolutional layer that corresponds to the region.
SPP-net also uses spatial pyramid pooling after the last convolutional layer, as opposed to
the traditionally used max-pooling, because we need to generate a fixed-size input for the
fully connected layers of the CNN.
Fast R-CNN
With SPP-net, it is not trivial to perform back-propagation through the spatial pooling
layer; hence, the network only fine-tunes its fully connected part. Fast R-CNN
uses the ideas from SPP-net and R-CNN and fixes this key problem in SPP-net, i.e. it makes
it possible to train the network end-to-end. Along with this, Fast R-CNN adds bounding box
regression to the neural network training itself, hence reducing the overall training time
and increasing the accuracy in comparison to SPP-net because of the end-to-end learning of
the CNN.
Faster R-CNN
Even though Fast R-CNN is fast and accurate, its slowest part is Selective Search or Edge
Boxes. Faster R-CNN replaces Selective Search with a very small convolutional network
called the Region Proposal Network (RPN) to generate regions of interest.
It introduces the idea of anchor boxes. At each location, the original paper uses three
scales of anchor boxes, 128×128, 256×256 and 512×512, and three aspect ratios, 1:1, 2:1 and
1:2. So, in total, at each location we have 9 boxes, for which the RPN predicts the
probability of being background or foreground. Hence, Faster R-CNN is
10 times faster than Fast R-CNN with similar accuracy.
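As a quick illustration of these anchor shapes, the following sketch enumerates the 9 boxes
(3 scales × 3 aspect ratios) used at each location; it only prints the box dimensions and is
not taken from any Faster R-CNN implementation.
# enumerate the 9 anchor shapes: 3 scales x 3 aspect ratios (height:width)
scales = [128, 256, 512]
ratios = [(1, 1), (2, 1), (1, 2)]
for s in scales:
    for (rh, rw) in ratios:
        area = s * s                      # keep the area fixed per scale
        h = int((area * rh / rw) ** 0.5)
        w = int((area * rw / rh) ** 0.5)
        print(f"anchor {w}x{h} (scale {s}, ratio {rh}:{rw})")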
For YOLO, detection is a simple regression problem which takes an input image and learns
the class probabilities and bounding box coordinates.
It divides each image into an S x S grid, and each grid cell predicts N bounding boxes and
confidence scores. The confidence reflects the accuracy of the bounding box and whether the
bounding box actually contains an object (regardless of class). YOLO also predicts the
classification score of each box for every class seen in training. Combining the two gives
the probability of each class being present in a predicted box.
So, in total, S x S x N boxes are predicted. However, most of these boxes have low
confidence scores, and if we set a threshold, say 30% confidence, we can remove most of
them, as shown in the sketch below.
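A small numeric sketch of this filtering step, assuming the S = 7, N = 2 configuration of
the original YOLO paper and random stand-in confidences for illustration:
import numpy as np
S, N = 7, 2                              # grid size and boxes per cell (YOLOv1)
confidences = np.random.rand(S * S * N)  # stand-in for predicted confidences
THRESHOLD = 0.3
kept = confidences[confidences > THRESHOLD]
print(f"{S * S * N} boxes predicted, {kept.size} kept above {THRESHOLD:.0%}")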
At runtime, we run our image through the CNN only once. Hence, YOLO is super fast and can
run in real time. Another key difference is that YOLO sees the complete image at once, as
opposed to looking at the generated region proposals as in the previous methods. This
contextual information helps in avoiding false positives. However, one limitation of YOLO
is that it only predicts one class per grid cell; hence, it struggles with very small objects.
The Single Shot Detector (SSD) achieves a good balance between speed and accuracy. SSD runs
a convolutional network on the input image only once and calculates a feature map. In order
to handle scale, it predicts bounding boxes after multiple convolutional layers. Since each
convolutional layer operates at a different scale, it is able to detect objects of various
scales.
Coming back to the main question: “What method should we adopt in order to monitor the
crowd?” Below is a comparison of all the above-mentioned methods, as given by [8].
Fig 1.8 Accuracy and Speed trade-off
Keeping in mind that crowd images may be mutually occluded or distorted (depending on
camera resolution) and that crowds are continuously moving, speed seems to be somewhat more
important than accuracy in this respect. Hence, our choice of methodology for
implementation is YOLO, based on the DarkNet framework (DarkNet is an open-source neural
network framework).
However, it is a challenging task to model and implement a YOLO CNN from scratch,
especially for beginners, as it requires the development of many customized model elements
for training and for prediction. For example, even using a pre-trained model directly
requires sophisticated code to distill and interpret the predicted bounding boxes output by
the model. Instead of developing this code from scratch, we use a third-party YOLOv3 model
trained on the COCO dataset, which consists of 80 labels, including, but not limited to:
people; bicycles; cars and trucks; airplanes; stop signs and fire hydrants; animals,
including cats, dogs, birds, horses, cows and sheep; and kitchen and dining objects such as
wine glasses, cups, forks, knives, spoons, etc.
Since we only need to detect people and exclude other classes, we will specify in our code:
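The relevant fragment, assembled from the full listing in Chapter 2, looks up the index of
the "person" label in the COCO names file and passes it to the detection routine so that
all other classes are ignored:
# keep only the "person" class out of the 80 COCO labels
LABELS = open(labelsPath).read().strip().split("\n")
personIdx = LABELS.index("person")
results = detect_people(frame, net, ln, personIdx=personIdx)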
The report is organized as follows. Chapter 1 gives an overview of the crowd detection
system; its need in today's world is briefly discussed.
Chapter 2 describes the internal architecture of the software part, and its working is
discussed thoroughly with the help of a block diagram.
Chapter 3 shows the outputs obtained and what can be observed from them.
Chapter 4, at the end, gives a proper conclusion and the future scope and aspects of
the proposed project.
CHAPTER 2
The implementation of the code was carried out on Google Colab initially and then on
Raspbian, which was installed on the RPi and accessed over SSH using VNC and PuTTY. The
code is divided into three parts.
a. A minimum confidence value is set, below which weak detections are filtered out.
b. A minimum NMS threshold value is set, and a box is drawn around each detected object.
c. Construct a blob from the input frame and then pass it through the YOLO object detector.
d. Set the box coordinates, centroids and confidences (probability of object detection).
e. Extract the class ID and confidence.
f. Filter detections by ensuring that the detected object is a human and that the minimum
confidence is met.
g. Apply non-maxima suppression to prevent overlapping of nearby boxes and avoid
confusion.
h. Update the results list if at least one person is detected (steps f-h are sketched in
the consolidated snippet below).
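A minimal consolidated sketch of steps f-h is given below, built around OpenCV's
cv2.dnn.NMSBoxes; the variable names follow the code fragments later in this chapter, and
the lists are assumed to have been filled while parsing the YOLO layer outputs.
import cv2
# step f: keep only confident "person" detections
keep = [i for i, c in enumerate(classIDs)
        if c == personIdx and confidences[i] > MIN_CONF]
boxes = [boxes[i] for i in keep]
confidences = [confidences[i] for i in keep]
centroids = [centroids[i] for i in keep]
# step g: non-maxima suppression drops overlapping boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, MIN_CONF, NMS_THRESH)
# step h: update the results list if at least one person was detected
results = []
if len(idxs) > 0:
    for i in idxs.flatten():
        (x, y, w, h) = boxes[i]
        results.append((confidences[i], (x, y, x + w, y + h), centroids[i]))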
# detection parameters
MIN_CONF = 0.3      # minimum confidence to accept a detection
NMS_THRESH = 0.3    # non-maxima suppression threshold
import numpy as np
import cv2
# lists filled while parsing the YOLO layer outputs
boxes = []
centroids = []
confidences = []
# inside the loop over detections: class scores start at index 5
scores = detection[5:]
classID = np.argmax(scores)
confidence = scores[classID]
import argparse
import imutils
import cv2
import os
from scipy.spatial import distance as dist      # needed for dist.cdist below
from google.colab.patches import cv2_imshow     # Colab's replacement for cv2.imshow
# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join(["/content/drive/My Drive/social-distance-detector/yolo-coco/coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")
# determine only the output layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]  # OpenCV < 4.5.4 indexing
# if the frame was not grabbed, then we have reached the end
# of the stream
if not grabbed:
    break
# resize the frame and then detect people (and only people) in it
frame = imutils.resize(frame, width=700)
results = detect_people(frame, net, ln, personIdx=LABELS.index("person"))
if len(results) >= 2:
    # extract all centroids from the results and compute the
    # Euclidean distances between all pairs of the centroids
    centroids = np.array([r[2] for r in results])
    D = dist.cdist(centroids, centroids, metric="euclidean")
# draw (1) a bounding box around the person and (2) the
# centroid coordinates of the person
cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
cv2.circle(frame, (cX, cY), 5, color, 1)
if args["display"] > 0:
    # show the output frame
    cv2_imshow(frame)
# if an output video file path has been supplied and the video
# writer has not been initialized
if args["output"] != "" and writer is None:
    # initialize our video writer
    fourcc = cv2.VideoWriter_fourcc(*"MJPG")
    writer = cv2.VideoWriter(args["output"], fourcc, 25,
        (frame.shape[1], frame.shape[0]), True)
# if the video writer is not None, write the frame to the output
# video file
if writer is not None:
    writer.write(frame)
MIN_CONF = 0.3
NMS_THRESH = 0.3
# inside detect_people(frame, net, ln, personIdx): grab the frame dimensions
(H, W) = frame.shape[:2]
results = []
boxes = []
centroids = []
confidences = []
# for each detection, the class scores start at index 5
scores = detection[5:]
classID = np.argmax(scores)
confidence = scores[classID]
# each kept detection r = (confidence, bounding box, centroid)
results.append(r)
import time
import picamera
# capture a single still image from the Pi camera
with picamera.PiCamera() as camera:
    camera.start_preview()
    try:
        for i, filename in enumerate(
                camera.capture_continuous('image{counter:02d}.jpg')):
            print(filename)
            time.sleep(1)
            if i == 0:
                break  # stop after the first captured image
    finally:
        camera.stop_preview()
# load the COCO class labels our YOLO model was trained on
labelsPath = os.path.sep.join(["yolo-coco/coco.names"])
LABELS = open(labelsPath).read().strip().split("\n")
# determine only the output layer names that we need from YOLO
ln = net.getLayerNames()
ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]  # OpenCV < 4.5.4 indexing
# resize the frame and then detect people (and only people) in it
frame = imutils.resize(frame, width=700)
results = detect_people(frame, net, ln, personIdx=LABELS.index("person"))
if len(results) >= 2:
    # extract all centroids from the results and compute the
    # Euclidean distances between all pairs of the centroids
    centroids = np.array([r[2] for r in results])
    D = dist.cdist(centroids, centroids, metric="euclidean")
# draw (1) a bounding box around the person and (2) the
# centroid coordinates of the person
cv2.rectangle(frame, (startX, startY), (endX, endY), color, 2)
cv2.circle(frame, (cX, cY), 5, color, 1)
if args["display"] > 0:
    # show the output frame
    cv2.imshow("Frame", frame)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
# if an output image file path has been supplied and the writer has not been initialized
if args["output"] != "" and writer is None:
    outpath = "/home/pi/sem7/tp/cds/cds1/CDS_orig/my_output.jpg"
    cv2.imwrite(outpath, frame)
2.1.3 WORKING
The Raspberry Pi camera is used along with the Raspberry Pi to capture live video. The live
video is then processed frame by frame using image processing concepts.
Fig 2.2 Block diagram
The block diagram shown in Fig 2.2 briefs how the whole project works. It starts with
a Raspberry Pi camera which records real-time video that is passed on to the Raspberry
Pi board, which has an external power supply connected to it. Normally, a PC is not capable
of providing enough current to the RPi, so we connect an external power supply with an
output current greater than 3 A.
The Raspberry Pi is connected to the PC over WiFi through the SSH client PuTTY, and
carries out the necessary processing and computations to detect humans. Raspbian
is operated through VNC Viewer. When the code is run in Spyder or similar software, an
image is captured by the Pi camera, saved, and displayed on the screen in VNC. The next
step is to process this image with the help of the COCO labels and the YOLO framework. This
processing is done frame by frame. Upon completing the processing, the frame is annotated
with the bounding boxes and centroids and finally displayed on screen as the output. The
output is saved at the specified path.
2.1.4 TROUBLESHOOTING
1. Segmentation fault errors because of the bounding box size. We faced this error
initially when we were running the code in Spyder on Windows. We then shifted to Google
Colab, because it offers GPU simulation, and later to Spyder on Raspbian, where we did not
face the same issue.
2. swapRB type-conflict error. This error occurred because the 'mean' argument of the
blobFromImage function takes its value in RGB order, while OpenCV uses BGR order.
Setting swapRB=True made both formats consistent.
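For reference, a blob-construction call with the fix applied might look like the following;
the 1/255 scale factor and 416×416 input size are the standard YOLOv3 values, shown here as
an illustrative sketch:
# build the network input blob; swapRB=True converts OpenCV's BGR to RGB
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
    swapRB=True, crop=False)
net.setInput(blob)
layerOutputs = net.forward(ln)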
3. Input file error. This error occurred because of an incorrect drive file path where the
YOLO weights and config files were located.
4. Google Colab cannot be interfaced with microcontrollers. Although it now seems obvious
that Colab is a virtual/cloud platform meant to enable code simulation and catch possible
errors before actual implementation on hardware, we did not know this initially, so we had
to carry out the entire process of Raspbian installation and configuration.
5. The required cable for the RPi 4 is HDMI to micro-HDMI, and it must be connected to the
HDMI0 port rather than the more commonly used HDMI1.
6. Raspbian installation. During the Raspbian installation we faced various minor errors.
For example, we already had an 8 GB SD card, so when we tried writing the Raspbian OS
image, it could not be written because the disc image was larger than the card's capacity.
We also could not find software to compress a disc image. After numerous such problems, we
configured the RPi using PuTTY and VNC [16].
7. 'Cannot display desktop' error. We changed the resolution and reinstalled the session
manager LXSession [17].
8. Real-time crowd detection on a live video stream. Live video capture is not possible on
Colab, and since the main reason for using Colab was its GPU capability, running the
final code on Raspbian meant running on a CPU, which is extremely slow when it
comes to intensive processing and computations like this. Whenever we tried to capture the
live video stream, the RPi would get extremely hot before the output frames could start to
display. For this reason, real-time testing was done on a single image/frame instead of
video. Although the live-stream processing could easily be done using a CUDA GPU,
according to this article [19], it appears that none of the GPUs available in the market
can be initialized from the RPi 4 Model B.
9. OpenCV version. At one point we faced the following error on the RPi:
ImportError: libjasper.so.1: cannot open shared object file: No such file or directory
From the error message alone, it was not clear that the problem lay with the OpenCV
version. We tried running a very basic part of the code in Spyder on Windows, and when it
successfully gave the output, we updated the OpenCV version on Raspbian to the same one
that we were using in Spyder on Windows. We were also able to resolve the ImportError
itself with the help of this thread [18].
CHAPTER 3
ACCOMPLISHMENTS
3.1 RESULTS
3.2 OBSERVATIONS
Object detection metrics are often useful in making observations about detection quality.
A few related terms are:
Precision
Precision is the ability of a model to identify only the relevant objects. It is the
percentage of correct positive predictions and is given by:
Precision = TP / (TP + FP)
Recall
Recall is the ability of a model to find all the relevant cases (all ground-truth bounding
boxes). It is the percentage of true positives detected among all relevant ground truths
and is given by:
Recall = TP / (TP + FN)
Threshold
The IoU threshold decides whether a detection counts as a true positive; depending on the
metric, it is usually set to 50%, 75% or 95%.
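A minimal sketch of these formulas in code, assuming each detection has already been marked
TP or FP against the ground truth (the labels and counts below are illustrative, not our
measured data):
# compute precision and recall from per-detection TP/FP labels (sketch)
detections = ["TP", "TP", "FP", "TP", "FP"]  # illustrative labels
ground_truths = 4                            # total ground-truth boxes
tp = detections.count("TP")
fp = detections.count("FP")
precision = tp / (tp + fp)          # TP / (TP + FP)
recall = tp / ground_truths         # TP / (TP + FN), FN = ground_truths - TP
print(f"precision = {precision:.2f}, recall = {recall:.2f}")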
With the help of the procedure mentioned in [20], we were able to generate confidence,
precision and recall values for our code on 5 sample images: some captured with our own
camera and some picked at random from the internet.
Image No. | Ground Truth | Confidences | TP or FP | Precision | Recall
Therefore, the average accuracy over a sample of just 5 images processed using YOLO is
approximately 88%.
CHAPTER 4
CONCLUSION AND SCOPE OF THE PROJECT
4.1 LIMITATIONS
2. Object Classification
Any object appearing would be detected without classification of it being a
human being, car, bike, etc.
4. Network Interruption
If the WiFi is interrupted, PuTTY, the SSH client for Windows, becomes inactive
and has to be restarted.
5. Accuracy and Confidence
The confidence is good, but the accuracy of the system is comparatively slightly lower.
This also depends on the camera resolution.
4.2 APPLICATIONS
1. Maintaining public order in crowded places such as airports, carnivals, sports
events and railway stations is essential, and counting people is an essential
factor in a crowd management system. Particularly in smaller areas, an increase in the
number of people creates problems such as fatalities, physical injuries, etc. Detecting
such unnecessary social gatherings/public events and alerting the required authorities
can be done easily.
2. Obtaining real-world data for revenue opportunity analysis, which would help places
like a cafeteria: if the number of people present at the food counter is known, it helps
in making better decisions regarding service offerings and advertisement, and in
streamlining staffing levels.
3. The numbers of fighter jets, soldiers and moving drones, their motion, etc. can be
estimated through proper crowd management systems. Thus the strength of armed
forces can be estimated through this system.
4. Crowd monitoring systems are used to minimize terror attacks at public gatherings.
Traditional machine learning methods do not perform well in these situations, so methods
suited to proper monitoring of such detection activities can be explored.
4.3 CONCLUSION
Crowd image analysis is an essential task for several applications. Crowd analysis
provides sufficient information for several tasks, including counting, localization,
behaviour analysis, etc. Our proposed project is easy to use, affordable and efficient, and
resolves the tiresome job of human-performed detection.
From the perspective of the speed and accuracy required of crowd detection systems, we
conclude that YOLO can be considered a potential solution. However, with the right kind
of computing tools, like a GPU, and better microcontrollers and/or microprocessors, this
implementation could be even more efficient and smooth.
REFERENCES
[1] Antic B, Letic D, Culibrk D, Crnojevic V (2009) K- means based segmentation for real
time zenithal people counting. In: Proceedings of IEEE international conference on image
processing, pp 2565–2568
[2] Shao J (2017) Crowded scene understanding by deeply learned volumetric slices. IEEE
Trans Circuits Syst Video Technol 27(3):613–623
[3] Bansal A, Venkatesh KS (2009) People counting in high density crowds from still
images. In: Proceedings of. IEEE international conference on computer vision and pattern
recognition, pp 1093–1100
[4] Fu H, Ma H, Xiao H (2014a) Crowd counting via head detection and motion flow
estimation. In: Proceedings of 22nd ACM international conference on Multimedia, Florida,
pp 877–880
[6] Chauhan RV, Kumar S, Singh SK (2016) Human count estimation in high density crowd
images and videos. In: Proceedings of fourth international conference on parallel, distributed
and grid computing (PDGC), Waknaghat, pp 343–347
[7] Brostow GJ, Cipolla R (2006) Unsupervised bayesian detection of independent motion in
crowds. In: Proceedings IEEE computer society conference on computer vision and pattern
recognition (CVPR’06), pp 594–601
[8] Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep
convolutional neural networks. In: Proceedings of IEEE conference on computer vision and
pattern recognition (CVPR), Boston, MA, pp 833–841
[9] Zero to Hero: Guide to Object Detection using Deep Learning: Faster R-CNN, YOLO, SSD
- CV-Tricks.com, 2021
[10] Advances and Trends in Real Time Visual Crowd Analysis, MDPI
[17] Fix Raspberry Pi's 'Cannot Currently Show the Desktop' Error
[18] ImportError: libjasper.so.1: cannot open shared object file: No such file or directory on
RPi
[19] External GPUs and the Raspberry Pi Compute Module 4