HUMAN DETECTION SYSTEM
Submitted in partial fulfillment of the requirements
of the degree of
Bachelor of Engineering
by
Mr. VIKAS SHARMA ARMIET/IT21/SV222
Miss. JYOTI JADHAV ARMIET/IT21/JJ219
Miss. POURNIMA GHUDE ARMIET/IT21/GP218
Mr. SANDIP PASHTE ARMIET/IT21/SP226
Under the Guidance of
PROF. MAYANK MANGAL
ALAMURI RATNAMALA INSTITUTE OF ENGINEERING
AND TECHNOLOGY
Affiliated to
UNIVERSITY OF MUMBAI
Department of Information Technology
ALAMURI RATNAMALA INSTITUTE OF ENGINEERING AND TECHNOLOGY
Academic Year – 2022-2023
ALAMURI RATNAMALA INSTITUTE OF ENGINEERING AND TECHNOLOGY
CERTIFICATE
This mini project report entitled “HUMAN DETECTION SYSTEM” by Mr. Vikas
Sharma, Miss. Jyoti Jadhav, Miss. Pournima Ghude and Mr. Sandip Pashte is approved for the
degree of Bachelor of Engineering in Information Technology (Third Year) for the
academic year 2022-2023.
Examiners
1.
2.
Supervisor
1.
Prof. Mayank Mangal
Head of the Department
Principal
Date:
Place:
ALAMURI RATNAMALA INSTITUTE OF ENGINEERING AND TECHNOLOGY
Declaration
I declare that this written submission represents my ideas in my own words and where others' ideas or
words have been included, I have adequately cited and referenced the original sources. I also declare
that I have adhered to all principles of academic honesty and integrity and have not misrepresented or
fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of
the above will be cause for disciplinary action by the Institute and can also evoke penal action from
the sources which have thus not been properly cited or from whom proper permission has not been
taken when needed.
________________ _________________ ____________________ _________________
Mr. Vikas Sharma Miss. Jyoti Jadhav Miss. Pournima Ghude Mr. Sandip Pashte
ARMIET/IT21/SV222 ARMIET/IT21/JJ219 ARMIET/IT21/GP218 ARMIET/IT21/SP226
Date:
Place:
ALAMURI RATNAMALA INSTITUTE OF ENGINEERING AND TECHNOLOGY
Acknowledgement
A mini project is something that could not have materialized without the cooperation of many
people. This project would be incomplete if we did not convey our heartfelt gratitude to the people
from whom we have received considerable support and encouragement.
It is a matter of great pleasure for us to have the respected Prof. MAYANK MANGAL as our project
guide. We are thankful to our guide for being a constant source of inspiration.
We would also like to give our sincere thanks to Prof. Mayank Mangal, Head of Department, for their
kind support.
Last but not least, we would also like to thank all the staff of the ARMIET College of Engineering
(Information Technology Department), whose valuable guidance, interest and suggestions
encouraged us.
1. Mr. Vikas Sharma
2. Miss. Jyoti Jadhav
3. Miss. Pournima Ghude
4. Mr. Sandip Pashte
CONTENTS
List of Figures
List of Tables
List of Abbreviations
1 INTRODUCTION
1.1 Objective of the Project
1.2 Background Research
1.3 Main Purpose
1.4 Future Scope
2 SURVEY OF TECHNOLOGY
2.1 Existing System
2.2 Proposed System
2.3 Overall Design
3 REQUIREMENT ANALYSIS
3.1 Problem Definition
3.2 Requirement Specification
3.3 Software Requirements and Hardware Requirements
3.3.1 Hardware Requirements
3.3.2 Software Requirements
4 SYSTEM DESIGN
4.1 Methodology
5 REVIEW OF LITERATURE
5.1 Review of Literature
6 UNIFIED MODELING LANGUAGE (UML)
6.1 Flow Chart
6.2 Use Case Diagram
6.3 System Architecture
6.4 Block Diagram
6.5 Class Diagram
7 SYSTEM DESIGN AND IMPLEMENTATION
7.1 System Implementation
8 RESULTS & OUTPUTS
8.1 Results
CONCLUSION
REFERENCES
LIST OF FIGURES
1 Flow Chart
2 Use Case Diagram
3 System Architecture
4 Block Diagram
5 Class Diagram
ABSTRACT
Real-time object detection is a vast, vibrant and complex area of computer vision. If there is a single object to be
detected in an image, the task is known as image localization; if there are multiple objects in an image, it is
object detection. Object detection locates the semantic objects of a class in digital images and videos. The applications of
real-time object detection include object tracking, video surveillance, pedestrian detection, people counting, self-
driving cars, face detection, ball tracking in sports and many more. Convolutional Neural Networks (CNNs) are a
representative deep learning tool for detecting objects, and OpenCV (Open Source Computer Vision) is a
library of programming functions aimed mainly at real-time computer vision.
INTRODUCTION
1.1 Objective of the Project:
The motive of object detection is to recognize and locate all known objects in a scene, preferably in 3D space;
recovering the pose of individual objects in 3D is very important for robotic control systems. Imparting intelligence to
machines and making robots more and more autonomous and independent has been a long-standing technological
dream of mankind. It is our dream to let robots take on tedious, boring or dangerous work so that we can
devote our time to more creative tasks. Unfortunately, the intelligent part still seems to be lagging behind. In real
life, to achieve this goal, besides hardware development, we need software that gives the robot the
intelligence to do the work and act independently. One of the crucial components in this regard is vision, apart
from other types of intelligence such as learning and cognitive thinking.
A robot cannot be truly intelligent if it cannot see and adapt to a dynamic environment. The search or recognition
process in a real-time scenario is very difficult, and so far no fully effective solution has been found for this problem. Despite
a lot of research in this area, the methods developed so far are not efficient, require long training times, are not
suitable for real-time application, and do not scale to a large number of classes.
1.2 Background Research:
Various techniques have been applied to construct fast and reliable person detectors for surveillance applications.
Classification techniques can, for example, be applied to decide whether a given image region contains a person; amongst
others, Support Vector Machines and dynamic point distribution models have been used to approach this problem.
An alternative to modeling the appearance of an entire person is to design detectors for specific body parts and to combine
their results. The idea of learning part detectors using AdaBoost and a set of weak classifiers has also been reported; a learning
approach is then used to combine the weak classifiers into body part detectors, which are further
combined using a probabilistic person model. All these approaches require a fair amount of training data to learn
the parameters of the underlying model. Although these classifiers are robust to limited occlusions, they are not
suitable for segmenting a group of people into individuals.
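As a concrete illustration of this classical line of work, OpenCV ships a person detector of exactly this kind: HOG features combined with a pre-trained linear SVM. The following minimal sketch uses only OpenCV's public API; the image path is a placeholder, not a file from this project.

# Classical HOG + linear SVM pedestrian detector built into OpenCV.
# "people.jpg" is an illustrative placeholder path.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("people.jpg")
rects, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in rects:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow("HOG + SVM person detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()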
1.3 Main Purpose:
A number of surveillance scenarios require the detection and tracking of people. Although person detection and
counting systems are commercially available today, there is a need for further research to address the challenges of
real world scenarios. The focus of this work is the segmentation of groups of people into individuals and tracking
them over time. The relevant applications of this algorithm are people counting and event detection. Experiments
document that the presented approach leads to robust people counts.
1.4 Future Scope:
Big data applications occupy a large share of industry and research activity. Among the widespread
examples of big data, the role of video streams from CCTV cameras is as important as other sources such as
social media data, sensor data, agriculture data, medical data and data from space research. Surveillance
videos make a major contribution to unstructured big data. CCTV cameras are installed in all places where
security is of high importance, and manual surveillance is tedious and time-consuming. Security can be defined
differently in different contexts, such as theft identification, violence detection, chances of explosion, etc.
In crowded public places the term security covers almost all types of abnormal events. Among them, violence
detection is difficult to handle since it involves group activity. Analysis of anomalous or abnormal activity in a
crowded video scene is very difficult due to several real-world constraints. This report includes a survey
which starts from object recognition, moves through action recognition and crowd analysis, and finally covers violence
detection in a crowd environment. The majority of the papers reviewed in this survey are based on deep learning
techniques, and various deep learning methods are compared in terms of their algorithms and models.
SURVEY OF TECHNOLOGY
2.1 Existing System:
Most existing digital video surveillance systems rely on human observers for detecting specific activities in a real-
time video scene. However, there are limitations in the human capability to monitor simultaneous events in
surveillance displays [1]. Hence, human motion analysis in automated video surveillance has become one of the
most active and attractive research topics in the area of computer vision and pattern recognition.
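As a small, self-contained example of such automated motion analysis, the sketch below uses OpenCV's MOG2 background subtractor to flag frames containing motion; the video path and the pixel-count threshold are illustrative assumptions only.

# Simple automated motion analysis with background subtraction.
# "surveillance.mp4" and the 5000-pixel threshold are placeholders.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25)
cap = cv2.VideoCapture("surveillance.mp4")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)          # per-frame foreground mask
    if cv2.countNonZero(mask) > 5000:       # crude activity measure
        print("Motion detected in this frame")
cap.release()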
2.2 Proposed System:
When presented with an image or video, TensorFlow object detection works by identifying all instances of
known objects with the help of computer vision. The history of object detection is about as recent as the internet: one of the
first recorded neural networks for object detection was OverFeat, built on the belief that object detection would help
improve image identification. TensorFlow object detection combines deep learning and machine learning for object
recognition. Through its APIs, developers can run TensorFlow pre-trained models rather than build models from scratch, which
saves them a lot of time and improves predictive accuracy. The TensorFlow object detection tutorials also make it possible for
anyone to apply a model without prior knowledge of machine learning, although a basic
understanding of Python is still needed.
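As a rough sketch of that workflow, the snippet below loads a detector exported from the TensorFlow Detection Model Zoo with the TF2 SavedModel API and counts detected people. The model directory and the image path are assumptions for illustration and are not part of this project's code, which instead uses a TF1 frozen graph (see persondetection.py).

# Hedged sketch: run a pre-trained TensorFlow Object Detection API model.
# "ssd_mobilenet_v2/saved_model" and "street.jpg" are assumed paths.
import cv2
import numpy as np
import tensorflow as tf

detect_fn = tf.saved_model.load("ssd_mobilenet_v2/saved_model")

image = cv2.cvtColor(cv2.imread("street.jpg"), cv2.COLOR_BGR2RGB)
input_tensor = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.uint8)

detections = detect_fn(input_tensor)
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(int)

# Class 1 in the COCO label map corresponds to "person".
people = int(np.sum((classes == 1) & (scores > 0.5)))
print("People detected:", people)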
2.3 Overall Design
The system consists of four main components: a standard low-level foreground estimation algorithm, an
autocalibration module, a template-based tracker, and a crowd segmentation module. All four components are
combined into a tightly coupled framework. For each frame, the foreground estimation algorithm detects a set of
consistent foreground rectangles, each of which is assumed to be either a person or a group of people. The
tracker maintains the trajectory of each person over time; the head and foot locations of each tracked person are sent to the
autocalibration module. The resulting calibration parameters are used by the crowd segmentation module to
segment a group of people into individuals. The segmented individuals are then tracked via data
association, and eventually these trajectories are used in the people counting and event detection applications. In the
following, we describe the autocalibration, the tracker and the integrated crowd segmentation components in
more detail.
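The per-frame data flow described above can be summarized in the following illustrative skeleton; the component objects (foreground estimator, autocalibration, tracker, crowd segmenter) are hypothetical placeholders used only to show how the modules hand data to each other, not names from the actual code base.

# Illustrative skeleton only: fg, calib, tracker and segmenter stand for the four
# components described above and are assumed to expose the methods used here.
def process_frame(frame, fg, calib, tracker, segmenter):
    regions = fg.detect(frame)                              # foreground rectangles
    for region in regions:
        # Split a possible group of people into individuals using the calibration.
        for person in segmenter.split(region, calib.parameters()):
            track = tracker.update(person)                  # data association over time
            calib.observe(track.head, track.foot)           # feed head/foot points back
    return tracker.trajectories()                           # input to counting / events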
REQUIREMENT ANALYSIS
3.1 Problem Definition:
Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level
understanding from digital images or videos. It is one of the most widely used fields of artificial intelligence (AI), enabling
computers and systems to derive meaningful information from digital images, videos and other visual inputs.
Different types of computer vision tasks include image segmentation, object detection, facial recognition, edge detection,
pattern detection, image classification and feature matching. The aim here is to build an AI and machine learning based model
that allows the user to upload any real-time video, after which the model reports the traffic condition of the area
in the video. Using such a model, one can take control of traffic, prevent chaos and accidents, and
help regulate traffic flow in that area.
3.2 Requirement Specification:
Python 3
tkinter
tkinter.messagebox
PIL (Pillow)
cv2 (OpenCV)
argparse
matplotlib.pyplot
numpy
time
os
tensorflow
fpdf
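For reference, the third-party packages above can typically be installed with pip (tkinter, messagebox, argparse, time and os ship with the Python standard library); the exact package versions are an assumption, as the project does not pin them:

pip install opencv-python pillow matplotlib numpy tensorflow fpdf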
3.3 Software Requirements and Hardware Requirements:
3.3.1 Hardware Requirements:
Processor – Intel Core i5 or above
Hard Disk – 100 GB or above
GPU (CUDA compatible) – e.g. NVIDIA RTX 2060
Memory – 32 GB RAM
Camera – High Definition
3.3.2 Software Requirements:
Windows 11/Debian
PyCharm
SYSTEM DESIGN
4.1 Methodology:
TensorFlow is an open-source framework from Google that is widely used for solving machine learning tasks that
involve deep neural networks. The TensorFlow Object Detection API is an open-source library built on
TensorFlow that supports the training and evaluation of object detection models. In this project we use the
“TensorFlow Detection Model Zoo”, a collection of pre-trained models compatible with the TensorFlow
Object Detection API. PyCharm is an integrated development environment (IDE) for Python that helps translate an abstract idea
into a program design we can see on screen. Development here follows a three-step approach: design the
appearance of the application, assign property settings to the objects of the program, and write the code that carries out
specific tasks at runtime.
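A hedged sketch of obtaining such a pre-trained model is shown below; the model name and download URL follow the public TF1 Detection Model Zoo and are assumptions rather than part of this report. The extracted frozen_inference_graph.pb is the file that persondetection.py loads.

# Download and unpack a pre-trained detector from the TF1 Detection Model Zoo.
# MODEL and URL are assumed values; adjust them to the model you actually want.
import tarfile
import urllib.request

MODEL = "faster_rcnn_inception_v2_coco_2018_01_28"
URL = f"http://download.tensorflow.org/models/object_detection/{MODEL}.tar.gz"

urllib.request.urlretrieve(URL, f"{MODEL}.tar.gz")
with tarfile.open(f"{MODEL}.tar.gz") as tar:
    tar.extract(f"{MODEL}/frozen_inference_graph.pb")
# Place the extracted frozen_inference_graph.pb where DetectorAPI expects it.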
REVIEW OF LITERATURE
5.1 Review of Literature:
Extracting high-level features is an important topic in video indexing and retrieval. Identifying the presence of
humans in video is one of these high-level features, and it facilitates the understanding of other aspects concerning
people or the interactions between people. Our work proposes a method for identifying the presence of humans in
videos. Experimental results demonstrate the success of the algorithm used and its capability of detecting
faces under different challenges. The proposed work is useful in many applications concerned mainly with
human activities and can serve as a basic step for them. For this purpose, an algorithm has been proposed to detect the
presence of humans in a video sequence.
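A minimal frame-by-frame version of this idea can be sketched with an OpenCV Haar cascade face detector, as below; the video path is a placeholder and the bundled frontal-face cascade is used for illustration rather than the detector employed in this project.

# Frame-by-frame face presence detection with OpenCV's bundled Haar cascade.
# "clip.mp4" is an illustrative placeholder path.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture("clip.mp4")

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) > 0:
        print("Human face present in this frame")
cap.release()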
UNIFIED MODELING LANGUAGE (UML)
6.1 Flowchart:
6.2 Use Case Diagram:
6.3 System Architecture:
6.4 Block Diagram:
6.5 Class Diagram:
SYSTEM DESIGN AND IMPLEMENTATION
7.1 System Implementation:
Code: Main.py
# Smart Detection System
# imported necessary library
from tkinter import *
import tkinter as tk
import tkinter.messagebox as mbox
from tkinter import filedialog
from PIL import ImageTk, Image
import cv2
import argparse
from persondetection import DetectorAPI
import matplotlib.pyplot as plt
from fpdf import FPDF
# Main Window & Configuration
window = tk.Tk()
window.title("Smart Detection System")
window.iconbitmap('Images/icon.ico')
window.geometry('1000x700')
# top label
start1 = tk.Label(text="SMART DETECTION SYSTEM", font=("Arial", 50, "underline"), fg="black") # same way bg
start1.place(x=70, y=10)
# function defined to start the main application
def start_fun():
window.destroy()
# created a start button
Button(window, text="▶ START", command=start_fun, font=("Arial", 25), bg="orange", fg="blue", cursor="hand2",
borderwidth=3, relief="raised").place(x=130, y=570)
# image on the main window
path1 = "Images/front2.png"
img2 = ImageTk.PhotoImage(Image.open(path1))
panel1 = tk.Label(window, image=img2)
panel1.place(x=90, y=250)
# image on the main window
path = "Images/front1.png"
img1 = ImageTk.PhotoImage(Image.open(path))
panel = tk.Label(window, image=img1)
panel.place(x=380, y=180)
exit1 = False
# function created for exiting from window
def exit_win():
global exit1
if mbox.askokcancel("Exit", "Do you want to exit?"):
exit1 = True
window.destroy()
# exit button created
Button(window, text="❌ EXIT", command=exit_win, font=("Arial", 25), bg="red", fg="blue", cursor="hand2",
borderwidth=3,
relief="raised").place(x=680, y=570)
window.protocol("WM_DELETE_WINDOW", exit_win)
window.mainloop()
if exit1 == False:
# Main Window & Configuration of window1
window1 = tk.Tk()
window1.title("Smart Detection System")
window1.iconbitmap('Images/icon.ico')
window1.geometry('1000x700')
filename = ""
filename1 = ""
filename2 = ""
def argsParser():
arg_parse = argparse.ArgumentParser()
arg_parse.add_argument("-v", "--video", default=None, help="path to Video File ")
arg_parse.add_argument("-i", "--image", default=None, help="path to Image File ")
arg_parse.add_argument("-c", "--camera", default=False, help="Set true if you want to use the camera.")
arg_parse.add_argument("-o", "--output", type=str, help="path to optional output video file")
args = vars(arg_parse.parse_args())
return args
# ---------------------------- image section ------------------------------------------------------------
def image_option():
# new windowi created for image section
windowi = tk.Tk()
windowi.title("Human Detection from Image")
windowi.iconbitmap('Images/icon.ico')
windowi.geometry('1000x700')
max_count1 = 0
framex1 = []
county1 = []
max1 = []
avg_acc1_list = []
max_avg_acc1_list = []
max_acc1 = 0
max_avg_acc1 = 0
# function defined to open the image
def open_img():
global filename1, max_count1, framex1, county1, max1, avg_acc1_list, max_avg_acc1_list, max_acc1, max_avg_acc1
max_count1 = 0
framex1 = []
county1 = []
max1 = []
avg_acc1_list = []
max_avg_acc1_list = []
max_acc1 = 0
max_avg_acc1 = 0
filename1 = filedialog.askopenfilename(title="Select Image file", parent=windowi)
path_text1.delete("1.0", "end")
path_text1.insert(END, filename1)
# function defined to detect the image
def det_img():
global filename1, max_count1, framex1, county1, max1, avg_acc1_list, max_avg_acc1_list, max_acc1, max_avg_acc1
max_count1 = 0
framex1 = []
county1 = []
max1 = []
avg_acc1_list = []
max_avg_acc1_list = []
max_acc1 = 0
max_avg_acc1 = 0
image_path = filename1
if (image_path == ""):
mbox.showerror("Error", "No Image File Selected!", parent=windowi)
return
info1.config(text="Status : Detecting...")
# info2.config(text=" ")
mbox.showinfo("Status", "Detecting, Please Wait...", parent=windowi)
# time.sleep(1)
detectByPathImage(image_path)
# main detection process here
def detectByPathImage(path):
global filename1, max_count1, framex1, county1, max1, avg_acc1_list, max_avg_acc1_list, max_acc1, max_avg_acc1
max_count1 = 0
framex1 = []
county1 = []
max1 = []
avg_acc1_list = []
max_avg_acc1_list = []
max_acc1 = 0
max_avg_acc1 = 0
# function defined to plot the enumeration of people detected
def img_enumeration_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex1, county1, label="Human Count", color="green", marker='o', markerfacecolor='blue')
plt.plot(framex1, max1, label="Max. Human Count", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Human Count')
plt.legend()
plt.title("Enumeration Plot")
plt.get_current_fig_manager().canvas.set_window_title("Plot for Image")
plt.show()
def img_accuracy_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex1, avg_acc1_list, label="Avg. Accuracy", color="green", marker='o',
markerfacecolor='blue')
plt.plot(framex1, max_avg_acc1_list, label="Max. Avg. Accuracy", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Avg. Accuracy')
plt.title('Avg. Accuracy Plot')
plt.legend()
plt.get_current_fig_manager().canvas.set_window_title("Plot for Image")
plt.show()
def img_gen_report():
pdf = FPDF(orientation='P', unit='mm', format='A4')
pdf.add_page()
pdf.set_font("Arial", "", 20)
pdf.set_text_color(128, 0, 0)
pdf.image('Images/Crowd_Report.png', x=0, y=0, w=210, h=297)
pdf.text(125, 150, str(max_count1))
pdf.text(105, 163, str(max_acc1))
pdf.text(125, 175, str(max_avg_acc1))
if (max_count1 > 25):
pdf.text(26, 220, "Max. Human Detected is greater than MAX LIMIT.")
pdf.text(70, 235, "Region is Crowded.")
else:
pdf.text(26, 220, "Max. Human Detected is in range of MAX LIMIT.")
pdf.text(65, 235, "Region is not Crowded.")
pdf.output('Crowd_Report.pdf')
mbox.showinfo("Status", "Report Generated and Saved Successfully.", parent=windowi)
odapi = DetectorAPI()
threshold = 0.7
image = cv2.imread(path)
img = cv2.resize(image, (image.shape[1], image.shape[0]))
boxes, scores, classes, num = odapi.processFrame(img)
person = 0
acc = 0
for i in range(len(boxes)):
if classes[i] == 1 and scores[i] > threshold:
box = boxes[i]
person += 1
cv2.rectangle(img, (box[1], box[0]), (box[3], box[2]), (255, 0, 0), 2) # cv2.FILLED #BGR
cv2.putText(img, f'P{person, round(scores[i], 2)}', (box[1] - 30, box[0] - 8),
cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 0, 255), 1) # (75,0,130),
acc += scores[i]
if (scores[i] > max_acc1):
max_acc1 = scores[i]
if (person > max_count1):
max_count1 = person
if (person >= 1):
if ((acc / person) > max_avg_acc1):
max_avg_acc1 = (acc / person)
cv2.imshow("Human Detection from Image", img)
info1.config(text=" ")
info1.config(text="Status : Detection & Counting Completed")
# info2.config(text=" ")
# info2.config(text="Max. Human Count : " + str(max_count1))
cv2.waitKey(0)
cv2.destroyAllWindows()
for i in range(20):
framex1.append(i)
county1.append(max_count1)
max1.append(max_count1)
avg_acc1_list.append(max_avg_acc1)
max_avg_acc1_list.append(max_avg_acc1)
Button(windowi, text="Enumeration\nPlot", command=img_enumeration_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=100, y=530)
Button(windowi, text="Avg. Accuracy\nPlot", command=img_accuracy_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=700, y=530)
Button(windowi, text="Generate Crowd Report", command=img_gen_report, cursor="hand2", font=("Arial", 20),
bg="light gray", fg="blue").place(x=325, y=550)
def prev_img():
global filename1
img = cv2.imread(filename1, 1)
cv2.imshow("Selected Image Preview", img)
# for images ----------------------
lbl1 = tk.Label(windowi, text="DETECT FROM\nIMAGE", font=("Arial", 50, "underline"), fg="brown")
lbl1.place(x=230, y=20)
lbl2 = tk.Label(windowi, text="Selected Image", font=("Arial", 30), fg="green")
lbl2.place(x=80, y=200)
path_text1 = tk.Text(windowi, height=1, width=37, font=("Arial", 30), bg="light yellow", fg="orange",
borderwidth=2, relief="solid")
path_text1.place(x=80, y=260)
Button(windowi, text="SELECT", command=open_img, cursor="hand2", font=("Arial", 20), bg="light green",
fg="blue").place(x=220, y=350)
Button(windowi, text="PREVIEW", command=prev_img, cursor="hand2", font=("Arial", 20), bg="yellow",
fg="blue").place(x=410, y=350)
Button(windowi, text="DETECT", command=det_img, cursor="hand2", font=("Arial", 20), bg="orange",
fg="blue").place(x=620, y=350)
info1 = tk.Label(windowi, font=("Arial", 30), fg="gray")
info1.place(x=100, y=445)
# info2 = tk.Label(windowi,font=("Arial", 30), fg="gray")
# info2.place(x=100, y=500)
def exit_wini():
if mbox.askokcancel("Exit", "Do you want to exit?", parent=windowi):
windowi.destroy()
windowi.protocol("WM_DELETE_WINDOW", exit_wini)
# ---------------------------- video section ------------------------------------------------------------
def video_option():
# new windowv created for video section
windowv = tk.Tk()
windowv.title("Human Detection from Video")
windowv.iconbitmap('Images/icon.ico')
windowv.geometry('1000x700')
max_count2 = 0
framex2 = []
county2 = []
max2 = []
avg_acc2_list = []
max_avg_acc2_list = []
max_acc2 = 0
max_avg_acc2 = 0
# function defined to open the video
def open_vid():
global filename2, max_count2, framex2, county2, max2, avg_acc2_list, max_avg_acc2_list, max_acc2, max_avg_acc2
max_count2 = 0
framex2 = []
county2 = []
max2 = []
avg_acc2_list = []
max_avg_acc2_list = []
max_acc2 = 0
max_avg_acc2 = 0
filename2 = filedialog.askopenfilename(title="Select Video file", parent=windowv)
path_text2.delete("1.0", "end")
path_text2.insert(END, filename2)
# function defined to detect inside the video
def det_vid():
global filename2, max_count2, framex2, county2, max2, avg_acc2_list, max_avg_acc2_list, max_acc2, max_avg_acc2
max_count2 = 0
framex2 = []
county2 = []
max2 = []
avg_acc2_list = []
max_avg_acc2_list = []
max_acc2 = 0
max_avg_acc2 = 0
video_path = filename2
if (video_path == ""):
mbox.showerror("Error", "No Video File Selected!", parent=windowv)
return
info1.config(text="Status : Detecting...")
# info2.config(text=" ")
mbox.showinfo("Status", "Detecting, Please Wait...", parent=windowv)
# time.sleep(1)
args = argsParser()
writer = None
if args['output'] is not None:
writer = cv2.VideoWriter(args['output'], cv2.VideoWriter_fourcc(*'MJPG'), 10, (600, 600))
detectByPathVideo(video_path, writer)
# the main process of detection in video takes place here
def detectByPathVideo(path, writer):
global filename2, max_count2, framex2, county2, max2, avg_acc2_list, max_avg_acc2_list, max_acc2, max_avg_acc2
max_count2 = 0
framex2 = []
county2 = []
max2 = []
avg_acc2_list = []
max_avg_acc2_list = []
max_acc2 = 0
max_avg_acc2 = 0
# function defined to plot the people detected in video
def vid_enumeration_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex2, county2, label="Human Count", color="green", marker='o', markerfacecolor='blue')
plt.plot(framex2, max2, label="Max. Human Count", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Human Count')
plt.title('Enumeration Plot')
plt.legend()
plt.get_current_fig_manager().canvas.set_window_title("Plot for Video")
plt.show()
def vid_accuracy_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex2, avg_acc2_list, label="Avg. Accuracy", color="green", marker='o',
markerfacecolor='blue')
plt.plot(framex2, max_avg_acc2_list, label="Max. Avg. Accuracy", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Avg. Accuracy')
plt.title('Avg. Accuracy Plot')
plt.legend()
plt.get_current_fig_manager().canvas.set_window_title("Plot for Video")
plt.show()
def vid_gen_report():
pdf = FPDF(orientation='P', unit='mm', format='A4')
pdf.add_page()
pdf.set_font("Arial", "", 20)
pdf.set_text_color(128, 0, 0)
pdf.image('Images/Crowd_Report.png', x=0, y=0, w=210, h=297)
pdf.text(125, 150, str(max_count2))
pdf.text(105, 163, str(max_acc2))
pdf.text(125, 175, str(max_avg_acc2))
if (max_count2 > 25):
pdf.text(26, 220, "Max. Human Detected is greater than MAX LIMIT.")
pdf.text(70, 235, "Region is Crowded.")
else:
pdf.text(26, 220, "Max. Human Detected is in range of MAX LIMIT.")
pdf.text(65, 235, "Region is not Crowded.")
pdf.output('Crowd_Report.pdf')
mbox.showinfo("Status", "Report Generated and Saved Successfully.", parent=windowv)
video = cv2.VideoCapture(path)
odapi = DetectorAPI()
threshold = 0.7
check, frame = video.read()
if check == False:
print('Video Not Found. Please Enter a Valid Path (Full path of Video Should be Provided).')
return
x2 = 0
while video.isOpened():
# check is True if reading was successful
check, frame = video.read()
if (check == True):
img = cv2.resize(frame, (800, 500))
boxes, scores, classes, num = odapi.processFrame(img)
person = 0
acc = 0
for i in range(len(boxes)):
# print(boxes)
# print(scores)
# print(classes)
# print(num)
# print()
if classes[i] == 1 and scores[i] > threshold:
box = boxes[i]
person += 1
cv2.rectangle(img, (box[1], box[0]), (box[3], box[2]), (255, 0, 0), 2) # cv2.FILLED
cv2.putText(img, f'P{person, round(scores[i], 2)}', (box[1] - 30, box[0] - 8),
cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 0, 255), 1) # (75,0,130),
acc += scores[i]
if (scores[i] > max_acc2):
max_acc2 = scores[i]
if (person > max_count2):
max_count2 = person
county2.append(person)
x2 += 1
framex2.append(x2)
if (person >= 1):
avg_acc2_list.append(acc / person)
if ((acc / person) > max_avg_acc2):
max_avg_acc2 = (acc / person)
else:
avg_acc2_list.append(acc)
if writer is not None:
writer.write(img)
cv2.imshow("Human Detection from Video", img)
key = cv2.waitKey(1)
if key & 0xFF == ord('q'):
break
else:
break
video.release()
info1.config(text=" ")
# info2.config(text=" ")
info1.config(text="Status : Detection & Counting Completed")
# info2.config(text="Max. Human Count : " + str(max_count2))
cv2.destroyAllWindows()
for i in range(len(framex2)):
max2.append(max_count2)
max_avg_acc2_list.append(max_avg_acc2)
Button(windowv, text="Enumeration\nPlot", command=vid_enumeration_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=100, y=530)
Button(windowv, text="Avg. Accuracy\nPlot", command=vid_accuracy_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=700, y=530)
Button(windowv, text="Generate Crowd Report", command=vid_gen_report, cursor="hand2", font=("Arial", 20),
bg="gray", fg="blue").place(x=325, y=550)
# function defined to preview the selected video
def prev_vid():
global filename2
cap = cv2.VideoCapture(filename2)
while (cap.isOpened()):
ret, frame = cap.read()
if ret == True:
img = cv2.resize(frame, (800, 500))
cv2.imshow('Selected Video Preview', img)
if cv2.waitKey(25) & 0xFF == ord('q'):
break
else:
break
cap.release()
cv2.destroyAllWindows()
lbl1 = tk.Label(windowv, text="DETECT FROM\nVIDEO", font=("Arial", 50, "underline"), fg="brown")
lbl1.place(x=230, y=20)
lbl2 = tk.Label(windowv, text="Selected Video", font=("Arial", 30), fg="green")
lbl2.place(x=80, y=200)
path_text2 = tk.Text(windowv, height=1, width=37, font=("Arial", 30), bg="light yellow", fg="orange",
borderwidth=2, relief="solid")
path_text2.place(x=80, y=260)
Button(windowv, text="SELECT", command=open_vid, cursor="hand2", font=("Arial", 20), bg="light green",
fg="blue").place(x=220, y=350)
Button(windowv, text="PREVIEW", command=prev_vid, cursor="hand2", font=("Arial", 20), bg="yellow",
fg="blue").place(x=410, y=350)
Button(windowv, text="DETECT", command=det_vid, cursor="hand2", font=("Arial", 20), bg="orange",
fg="blue").place(x=620, y=350)
info1 = tk.Label(windowv, font=("Arial", 30), fg="gray") # same way bg
info1.place(x=100, y=440)
# info2 = tk.Label(windowv, font=("Arial", 30), fg="gray") # same way bg
# info2.place(x=100, y=500)
# function defined to exit from windowv section
def exit_winv():
if mbox.askokcancel("Exit", "Do you want to exit?", parent=windowv):
windowv.destroy()
windowv.protocol("WM_DELETE_WINDOW", exit_winv)
# ---------------------------- camera section ------------------------------------------------------------
def camera_option():
# new window created for camera section
windowc = tk.Tk()
windowc.title("Human Detection from Camera")
windowc.iconbitmap('Images/icon.ico')
windowc.geometry('1000x700')
max_count3 = 0
framex3 = []
county3 = []
max3 = []
avg_acc3_list = []
max_avg_acc3_list = []
max_acc3 = 0
max_avg_acc3: int = 0
# function defined to open the camera
def open_cam():
global max_count3, framex3, county3, max3, avg_acc3_list, max_avg_acc3_list, max_acc3, max_avg_acc3
max_count3 = 0
framex3 = []
county3 = []
max3 = []
avg_acc3_list = []
max_avg_acc3_list = []
max_acc3 = 0
max_avg_acc3 = 0
args = argsParser()
info1.config(text="Status : Opening Camera...")
# info2.config(text=" ")
mbox.showinfo("Status", "Opening Camera...Please Wait...", parent=windowc)
# time.sleep(1)
writer = None
if args['output'] is not None:
writer = cv2.VideoWriter(args['output'], cv2.VideoWriter_fourcc(*'MJPG'), 10, (600, 600))
if True:
detectByCamera(writer)
# function defined to detect from camera
def detectByCamera(writer):
global max_count3, framex3, county3, max3, avg_acc3_list, max_avg_acc3_list, max_acc3, max_avg_acc3
max_count3 = 0
framex3 = []
county3 = []
max3 = []
avg_acc3_list = []
max_avg_acc3_list = []
max_acc3 = 0
max_avg_acc3 = 0
# function defined to plot the people count in camera
def cam_enumeration_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex3, county3, label="Human Count", color="green", marker='o', markerfacecolor='blue')
plt.plot(framex3, max3, label="Max. Human Count", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Human Count')
plt.legend()
plt.title("Enumeration Plot")
plt.get_current_fig_manager().canvas.set_window_title("Plot for Camera")
plt.show()
def cam_accuracy_plot():
plt.figure(facecolor='orange', )
ax = plt.axes()
ax.set_facecolor("yellow")
plt.plot(framex3, avg_acc3_list, label="Avg. Accuracy", color="green", marker='o',
markerfacecolor='blue')
plt.plot(framex3, max_avg_acc3_list, label="Max. Avg. Accuracy", linestyle='dashed', color='fuchsia')
plt.xlabel('Time (sec)')
plt.ylabel('Avg. Accuracy')
plt.title('Avg. Accuracy Plot')
plt.legend()
plt.get_current_fig_manager().canvas.set_window_title("Plot for Camera")
plt.show()
def cam_gen_report():
pdf = FPDF(orientation='P', unit='mm', format='A4')
pdf.add_page()
pdf.set_font("Arial", "", 20)
pdf.set_text_color(128, 0, 0)
pdf.image('Images/Crowd_Report.png', x=0, y=0, w=210, h=297)
pdf.text(125, 150, str(max_count3))
pdf.text(105, 163, str(max_acc3))
pdf.text(125, 175, str(max_avg_acc3))
if (max_count3 > 25):
pdf.text(26, 220, "Max. Human Detected is greater than MAX LIMIT.")
pdf.text(70, 235, "Region is Crowded.")
else:
pdf.text(26, 220, "Max. Human Detected is in range of MAX LIMIT.")
pdf.text(65, 235, "Region is not Crowded.")
pdf.output('Crowd_Report.pdf')
mbox.showinfo("Status", "Report Generated and Saved Successfully.", parent=windowc)
video = cv2.VideoCapture(0)
odapi = DetectorAPI()
threshold = 0.7
x3 = 0
while True:
check, frame = video.read()
img = cv2.resize(frame, (800, 600))
boxes, scores, classes, num = odapi.processFrame(img)
person = 0
acc = 0
for i in range(len(boxes)):
if classes[i] == 1 and scores[i] > threshold:
box = boxes[i]
person += 1
cv2.rectangle(img, (box[1], box[0]), (box[3], box[2]), (255, 0, 0), 2) # cv2.FILLED
cv2.putText(img, f'P{person, round(scores[i], 2)}', (box[1] - 30, box[0] - 8),
cv2.FONT_HERSHEY_COMPLEX, 0.5, (0, 0, 255), 1) # (75,0,130),
acc += scores[i]
if (scores[i] > max_acc3):
max_acc3 = scores[i]
if (person > max_count3):
max_count3 = person
if writer is not None:
writer.write(img)
cv2.imshow("Human Detection from Camera", img)
key = cv2.waitKey(1)
if key & 0xFF == ord('q'):
break
county3.append(person)
x3 += 1
framex3.append(x3)
if (person >= 1):
avg_acc3_list.append(acc / person)
if ((acc / person) > max_avg_acc3):
max_avg_acc3 = (acc / person)
else:
avg_acc3_list.append(acc)
video.release()
info1.config(text=" ")
# info2.config(text=" ")
info1.config(text="Status : Detection & Counting Completed")
# info2.config(text="Max. Human Count : " + str(max_count3))
cv2.destroyAllWindows()
for i in range(len(framex3)):
max3.append(max_count3)
max_avg_acc3_list.append(max_avg_acc3)
Button(windowc, text="Enumeration\nPlot", command=cam_enumeration_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=100, y=530)
Button(windowc, text="Avg. Accuracy\nPlot", command=cam_accuracy_plot, cursor="hand2", font=("Arial", 20),
bg="orange", fg="blue").place(x=700, y=530)
Button(windowc, text="Generate Crowd Report", command=cam_gen_report, cursor="hand2", font=("Arial", 20),
bg="gray", fg="blue").place(x=325, y=550)
lbl1 = tk.Label(windowc, text="DETECT FROM\nCAMERA", font=("Arial", 50, "underline"),
fg="brown") # same way bg
lbl1.place(x=230, y=20)
Button(windowc, text="OPEN CAMERA", command=open_cam, cursor="hand2", font=("Arial", 20), bg="light green",
fg="blue").place(x=370, y=230)
info1 = tk.Label(windowc, font=("Arial", 30), fg="gray") # same way bg
info1.place(x=100, y=330)
# info2 = tk.Label(windowc, font=("Arial", 30), fg="gray") # same way bg
# info2.place(x=100, y=390)
# function defined to exit from the camera window
def exit_winc():
if mbox.askokcancel("Exit", "Do you want to exit?", parent=windowc):
windowc.destroy()
windowc.protocol("WM_DELETE_WINDOW", exit_winc)
# options -----------------------------
lbl1 = tk.Label(text="OPTIONS", font=("Arial", 50, "underline"), fg="brown") # same way bg
lbl1.place(x=340, y=20)
# image on the main window
pathi = "Images/image1.jpg"
imgi = ImageTk.PhotoImage(Image.open(pathi))
paneli = tk.Label(window1, image=imgi)
paneli.place(x=90, y=110)
# image on the main window
pathv = "Images/image2.png"
imgv = ImageTk.PhotoImage(Image.open(pathv))
panelv = tk.Label(window1, image=imgv)
panelv.place(x=700, y=260) # 720, 260
# image on the main window
pathc = "Images/image3.jpg"
imgc = ImageTk.PhotoImage(Image.open(pathc))
panelc = tk.Label(window1, image=imgc)
panelc.place(x=90, y=415)
# created button for all three option
Button(window1, text="DETECT FROM IMAGE ➡", command=image_option, cursor="hand2", font=("Arial", 30),
bg="light green", fg="blue").place(x=350, y=150)
Button(window1, text="DETECT FROM VIDEO ➡", command=video_option, cursor="hand2", font=("Arial", 30),
bg="light blue", fg="blue").place(x=110, y=300) # 90, 300
Button(window1, text="DETECT FROM CAMERA ➡", command=camera_option, cursor="hand2", font=("Arial", 30),
bg="light green", fg="blue").place(x=350, y=450)
# function defined to exit from window1
def exit_win1():
if mbox.askokcancel("Exit", "Do you want to exit?"):
window1.destroy()
# created exit button
Button(window1, text="❌ EXIT", command=exit_win1, cursor="hand2", font=("Arial", 25), bg="red", fg="blue").place(
x=440, y=600)
window1.protocol("WM_DELETE_WINDOW", exit_win1)
window1.mainloop()
Code: persondetection.py
import numpy as np
import time
import os
# TensorFlow 1.x graph/session API, accessed through the TF2 compatibility module
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
class DetectorAPI:
def __init__(self):
# locate the frozen graph next to this file
path = os.path.dirname(os.path.realpath(__file__))
self.path_to_ckpt = os.path.join(path, 'frozen_inference_graph.pb')
self.detection_graph = tf.Graph()
with self.detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(self.path_to_ckpt, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
self.default_graph = self.detection_graph.as_default()
self.sess = tf.Session(graph=self.detection_graph)
# Definite input and output Tensors for detection_graph
self.image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
self.detection_boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
self.detection_scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
self.detection_classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
self.num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
def processFrame(self, image):
# Expand dimensions since the trained_model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image, axis=0)
# Actual detection.
start_time = time.time()
(boxes, scores, classes, num) = self.sess.run(
[self.detection_boxes, self.detection_scores,
self.detection_classes, self.num_detections],
feed_dict={self.image_tensor: image_np_expanded})
end_time = time.time()
# print("Elapsed Time:", end_time-start_time)
# print(self.image_tensor, image_np_expanded)
im_height, im_width, _ = image.shape
boxes_list = [None for i in range(boxes.shape[1])]
for i in range(boxes.shape[1]):
boxes_list[i] = (int(boxes[0, i, 0] * im_height), int(boxes[0, i, 1] * im_width),
int(boxes[0, i, 2] * im_height), int(boxes[0, i, 3] * im_width))
return boxes_list, scores[0].tolist(), [int(x) for x in classes[0].tolist()], int(num[0])
def close(self):
self.sess.close()
self.default_graph.close()
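A short usage sketch of the DetectorAPI class above is given below, assuming frozen_inference_graph.pb is in place; the test image path is a placeholder.

# Minimal usage sketch for DetectorAPI; "test.jpg" is an assumed image path.
import cv2
from persondetection import DetectorAPI

odapi = DetectorAPI()
img = cv2.imread("test.jpg")
boxes, scores, classes, num = odapi.processFrame(img)

# Count detections of class 1 ("person" in the COCO label map) above a threshold.
people = sum(1 for i in range(len(boxes)) if classes[i] == 1 and scores[i] > 0.7)
print("People detected:", people)
odapi.close()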
RESULTS & OUTPUTS
(Screenshots of the application windows and detection results appear here in the original report.)
CONCLUSION
In the last section of the project we generate a Crowd Report, which gives a message based on the
results obtained from the detection process. For this we take a threshold human count and produce different
messages for the different human counts obtained from the detection process. Regarding the future scope of
this project or application: since we take any image, video or camera feed, detect humans
and obtain their count along with the accuracy, some possible directions are the following. The system can be used in malls
and other areas to analyse the maximum people count and then impose restrictions on the number of people
allowed at a time in that place. It can replace various manual monitoring jobs, which machines can do more
efficiently. When implemented in an area, this will ultimately lead to a degree of crowd control in such places.
REFERENCES
IEEE References:
[1] https://ieeexplore.ieee.org/document/9760635
[2] https://ieeexplore.ieee.org/document/9730709