Tracking and Counting People in
Video Surveillance Data
Group 10
Urvidh Narula
Kavya Bhandari
Bhuyashi Deka
Objective
Take raw video footage and accurately return
the number of individuals present in the frame
at any given moment, along with a bounding
box around each individual
Overview
1. Background subtraction
2. Cleaning up foreground images
3. Finding connected objects & their bounding boxes
4. Running SVM on segments contained in bounding boxes
5. Getting final count and tracking
Selected Frame 1
Selected Frame 2
Reference Background
Background subtraction
● Background images from the surveillance camera are collected for an appropriate amount of time, to account for the change in lighting conditions with time, and averaged to get a representative reference background
● This reference is then subtracted from each frame of the video, which helps to detect moving objects
● Based on a threshold, obtained through observation, the image is then divided into foreground and background
● A resultant binary image is created
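The averaging and thresholding steps can be sketched as follows. This is a minimal illustration in plain Python rather than the MATLAB code used in the project; the function names, the tiny 2x3 frames and the threshold value of 30 are all assumptions made for the example (the slides only say the threshold was chosen by observation).

```python
def average_background(frames):
    """Pixel-wise mean of equally sized grayscale frames (lists of rows)."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def foreground_mask(frame, background, threshold):
    """1 where |frame - background| exceeds the threshold, else 0."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# Three tiny 2x3 background frames with slightly different lighting.
background_frames = [
    [[10, 10, 10], [20, 20, 20]],
    [[12, 12, 12], [22, 22, 22]],
    [[14, 14, 14], [24, 24, 24]],
]
reference = average_background(background_frames)

# A new frame in which a bright object covers the middle column.
frame = [[13, 200, 11], [21, 180, 23]]
THRESHOLD = 30  # hypothetical value, not taken from the slides
mask = foreground_mask(frame, reference, THRESHOLD)
print(mask)  # prints [[0, 1, 0], [0, 1, 0]]
```

Small lighting differences (a few intensity levels) stay below the threshold and are labelled background, while the object pixels are labelled foreground.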
Binary Image 1
Binary Image 2
Cleaning foreground images
● The binary image is divided into blocks of 8x8 pixels
● If fewer than 5 pixels in a block are labelled foreground, then all pixels in the block are relabelled background
● This mechanism counters changes in pixel intensity caused by changing lighting conditions, eliminates local noise, etc.
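A minimal sketch of the block-wise cleaning rule, assuming the binary image is stored as a list of rows of 0/1 values. The function name and the handling of partial blocks at the image border are assumptions; the block size of 8 and the limit of 5 pixels come from the slide.

```python
def clean_blocks(mask, block=8, min_foreground=5):
    """Relabel as background every block with fewer than min_foreground foreground pixels."""
    height, width = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    for top in range(0, height, block):
        for left in range(0, width, block):
            # Blocks at the border may be smaller than block x block (an assumption).
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            count = sum(mask[y][x] for y in rows for x in cols)
            if count < min_foreground:
                for y in rows:
                    for x in cols:
                        cleaned[y][x] = 0
    return cleaned


# 8x16 mask: the left block holds 3 noisy pixels, the right block holds 6.
mask = [[0] * 16 for _ in range(8)]
for y, x in [(0, 0), (3, 4), (7, 7)]:
    mask[y][x] = 1
for y, x in [(2, 9), (2, 10), (3, 9), (3, 10), (4, 9), (4, 10)]:
    mask[y][x] = 1

cleaned = clean_blocks(mask)
left_count = sum(cleaned[y][x] for y in range(8) for x in range(8))
right_count = sum(cleaned[y][x] for y in range(8) for x in range(8, 16))
print(left_count, right_count)  # prints 0 6
```

The three isolated pixels in the left block are removed, while the denser cluster in the right block is kept unchanged.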
Processed Image 1
Processed Image 2
Finding connected objects & their bounding boxes
● Sections of the foreground that are connected with each other are identified and labelled as a single object
● Each object is then surrounded by a bounding box
● An object with fewer than 550 foreground pixels is eliminated. This threshold is chosen because it is the minimum number of pixels we expect an identifiable human object to have
● As the pictures ahead show, the bounding boxes are tight around the humans identified; we found that enlarging the bounding boxes by 10 pixels in each direction improves the performance
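The labelling, size filtering and box enlargement can be sketched as below. This is an illustrative plain-Python version, not the project's MATLAB code: the function name, the use of 8-connectivity and the clipping of enlarged boxes to the image border are assumptions, while the 550-pixel minimum and the 10-pixel enlargement come from the slide.

```python
from collections import deque


def find_objects(mask, min_pixels=550, pad=10):
    """Return padded bounding boxes (top, left, bottom, right) of large connected regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill over 8-connected neighbours (an assumption).
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            size = 0
            top, left, bottom, right = y0, x0, y0, x0
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Objects with fewer than min_pixels foreground pixels are dropped.
            if size >= min_pixels:
                boxes.append((max(top - pad, 0), max(left - pad, 0),
                              min(bottom + pad, height - 1),
                              min(right + pad, width - 1)))
    return boxes


# 100x100 mask with a 30x20 blob (600 pixels) and a 10x10 blob (100 pixels).
mask = [[0] * 100 for _ in range(100)]
for y in range(40, 70):
    for x in range(50, 70):
        mask[y][x] = 1
for y in range(5, 15):
    for x in range(5, 15):
        mask[y][x] = 1

boxes = find_objects(mask)
print(boxes)  # prints [(30, 40, 79, 79)]
```

Only the 600-pixel blob survives the size filter, and its tight box (40, 50, 69, 69) is grown by 10 pixels on every side.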
Bounding Boxes 1
Bounding Boxes 2
Running SVM on segments contained in bounding boxes
● In MATLAB, the SVM used for human identification was trained using the Caltech Pedestrian Dataset by extracting Aggregate Channel Features (ACF) from the training data set
● The SVM is trained to identify images of size at least 50x21 pixels; thus, all segments with smaller dimensions are discarded
● The SVM is then run only on the regions of interest marked by the bounding boxes. This greatly speeds up human detection (from about 3.5 fps to nearly 5 fps), as the sliding windows have to be run across only a small portion of the frame
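To illustrate why restricting the detector to the regions of interest reduces work, the sketch below counts sliding-window placements at a single scale. The 50x21 window and the 8-pixel stride are taken from the slides; the frame size, the region sizes and the function name are hypothetical, and a real ACF detector also scans several scales, so the numbers only indicate the trend and do not predict the measured frame rates.

```python
WINDOW_H, WINDOW_W = 50, 21   # minimum detection size stated in the slides
STRIDE = 8                    # sliding-window stride stated in the slides


def window_positions(height, width, win_h=WINDOW_H, win_w=WINDOW_W, stride=STRIDE):
    """Number of window placements at a single scale inside a height x width region."""
    if height < win_h or width < win_w:
        return 0  # region is smaller than the detector window, so it is discarded
    return ((height - win_h) // stride + 1) * ((width - win_w) // stride + 1)


# Hypothetical frame size and regions of interest, given as (height, width).
frame_h, frame_w = 576, 768
rois = [(120, 60), (90, 45), (40, 30)]  # the last one is too small to classify

full_frame = window_positions(frame_h, frame_w)
roi_only = sum(window_positions(h, w) for h, w in rois)
print(full_frame, roi_only)  # prints 6204 69
```

Under these assumptions the detector evaluates 69 windows instead of 6204, and the 40x30 region is skipped entirely because it is smaller than the 50x21 window.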
Getting final count and tracking
● The number of people detected in each bounding box is summed over all the boxes to obtain the final count
● The detection window of the SVM also serves to track the detected individual
● Multiple detections of the same individual are avoided by selecting a suitably large (8x8 pixels) stride for the sliding window
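The counting step can be sketched as follows, assuming the detector returns, for each bounding box, a list of detection windows relative to that crop. The data layout, the function name and the example values are hypothetical; the sketch only shows the summation over boxes and the shift of each detection window back to frame coordinates so that it can be drawn on the full frame.

```python
def to_frame_coords(box, local_detections):
    """Shift (top, left, height, width) detections from crop to frame coordinates."""
    box_top, box_left = box[0], box[1]
    return [(box_top + top, box_left + left, h, w)
            for top, left, h, w in local_detections]


# Hypothetical detector output: bounding box (top, left, bottom, right) mapped to
# the detections found inside that crop, each as (top, left, height, width).
detections_per_box = {
    (30, 40, 79, 79): [(5, 8, 50, 21)],
    (100, 200, 220, 300): [(10, 4, 60, 25), (12, 55, 58, 24)],
    (300, 10, 360, 50): [],
}

people = []
for box, local in detections_per_box.items():
    people.extend(to_frame_coords(box, local))

count = len(people)  # final count is the sum of detections over all boxes
print(count, people[0])  # prints 3 (35, 48, 50, 21)
```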
Predicted Count 1
Predicted Count 2
Results
● Excluding the time taken to read and save each frame, the
computation time was observed to be about 0.21 secs/frame. This
allows us to process the video at a frame rate of 4.67 fps. Such results are
comparable to other computationally inexpensive techniques that do
not require a GPU
● No false positives were detected in dense crowds using the trained
SVM, but they were more frequent in sparse crowds
● Significant improvement in the performance of the SVM was
observed in sparser crowd conditions without much occlusion
Tracking & Counting 1
Tracking & Counting 2
Challenges
● Speeding up this technique is difficult, as the subsections of the
program depend on each other and must be run sequentially rather than in parallel
● The method is not accurate enough for dense crowds
● Needs buffer time initially in order to collect data for forming the
reference background image
● Unstable detection of the same individual across frames due to
varying states of occlusion
Applications for Make In India
● Providing information about foot traffic, occupancy and demand trends for a physical area, which can be exploited to a great extent for the development of Smart Cities
● Estimating the number of people in each checkout lane of big supermarkets and displaying the number so that people can choose a less crowded lane
● Mapping path trajectories, which can be utilized for in-store optimization to improve customer experience
● Sending alert messages to officials in railway stations upon encroachment on prohibited areas such as the railway tracks
● Using the collected data for shopping mall analytics, which measure the quality of relationships between the mall and the store
● Live tracking of crowd density at railway stations to allow commuters to make an informed choice about mode of transport
● Monitoring attendance in classrooms to match with official records
References
Literature References
● https://ieeexplore.ieee.org/document/5946681
● http://bigwww.epfl.ch/chaudhury/ME_thesis_KunalNC.pdf
● https://in.mathworks.com/help/
Datasets
● http://www.cvg.reading.ac.uk/PETS2009/a.html
● http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/