CHAPTER 3
TARGET DETECTION AND TRACKING
Detection and tracking of moving objects in a video stream is the
first relevant step of information extraction in computer vision applications
including people tracking, video surveillance, traffic monitoring and semantic
annotation of videos. In these applications, the algorithms for detection and
tracking of objects should be characterized by the following important features:
High precision
Flexibility in different scenarios (indoor, outdoor)
Adaptivity to different lighting conditions
Efficiency that makes real-time operation feasible
The fast execution and flexibility in different scenarios should be
considered as basic requirements to be met. Object detection makes tracking
more reliable (the same object can be identified more reliably from frame to
frame if its shape and position are accurately detected) and faster.
The input video data is obtained from an unmanned aerial vehicle (UAV). A dataset of aerial videos with varying circumstances and properties is collected by deploying the UAV in a real-time environment and fed as input
data to the designed algorithms. The target in the video frame is defined by
the user. The respective target is detected by the proposed Running Gaussian
background subtraction (RGBS) technique. The technique is compared with a few other existing techniques, namely Temporal frame differencing (TFD), Running average background subtraction (RABS) and Temporal median filtering (TMF). The same video is fed to the target tracking
algorithm. An Adaptive background mean shift tracking (ABMST) technique
is proposed and implemented to track the target of interest. The algorithm is
compared with traditional mean shift tracking technique (TMST) and
Continuously adaptive mean shift technique (CAM Shift) to justify the
efficiency of the proposed algorithm. The work flow diagram is shown in
Figure 3.1.
[Figure 3.1 shows the work flow: the input video data from the UAV feeds the target detection and target tracking algorithms, which are evaluated with object based and frame based performance metrics respectively.]
Figure 3.1 Work flow diagram
In this chapter the proposed target detection and tracking
algorithms are discussed in detail along with other target detection and
tracking techniques which are used for comparison.
3.1 TARGET DETECTION
Detection of moving objects from a video stream is a fundamental yet critical task for surveillance applications. A generic approach to the detection of moving objects is background subtraction, where each incoming video frame is compared with a generated background model that is assumed as the reference. Pixels of the current video frame that deviate significantly from the reference frame are considered elements of the moving object. In case
of aerial surveillance, the algorithm should adapt to various issues like change
in illumination, cluttered background, cast shadow, snow and fog. Also, to
accommodate real time feasibility the algorithm should be computationally
inexpensive with low memory requirements.
The basic idea of background subtraction is to subtract the current image from a reference image that models the background scene. The basic steps, shown in Figure 3.2, involve background modeling, thresholding and the subtraction operation.
Background modeling step involves construction of a
reference image that represents the background.
In threshold selection step, determination of threshold value
for subtraction operation is done.
In subtraction step classification of the pixels into background
or moving object is done.
[Figure 3.2 shows the pipeline: background modeling, thresholding and background subtraction.]
Figure 3.2 Steps in Background subtraction
Traditional background subtraction is a method typically used to
segment moving regions in image sequences taken from a static camera by
comparing each new frame to a model of the scene background. In this thesis,
a novel non-parametric background modeling and a background subtraction
approach is proposed and implemented. The model can handle situations
where the background of the scene is cluttered and not completely static but
contains small motions such as trees and bushes.
3.1.1 Temporal Frame Differencing
Temporal or adjacent frame differencing is the basic technique for segmentation of the background from the target. It involves subtraction of consecutive frames to generate a background model. The initial frame is assumed as the background model, and the next incoming frame is subtracted from this assumed background model. The resultant frame becomes the new background model and is compared with the next frame. This procedure is repeated over the ‘n’ frames of video data. The concept is explained in Figure 3.3.
Consider the input video consisting of ‘n’ frames f1, f2…, fn.
Initially the first incoming frame fi is assumed as the initial background
model bi.
$b_i = f_i, \quad i = 1$   (3.1)
The final background model $b_{i+1}$ is obtained as the difference between two consecutive frames $f_i$ and $f_{i+1}$.
$b_{i+1} = f_{i+1} - f_i, \quad i = 1, 2, \ldots, n$   (3.2)
where $f_i$ is the $i^{th}$ frame and $f_{i+1}$ is the $(i+1)^{th}$ frame.
The update $g_i$ is obtained by subtracting the resultant background model $b_i$ from the frame $f_i$.
$g_i = f_i - b_i$   (3.3)
The resultant binary image Ri is obtained by applying a fixed
threshold T. The threshold is predefined ranging from 100 to 120.
$R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \leq T \end{cases}$   (3.4)
[Figure 3.3 illustrates the flow: the initial frame forms the initial background model; each new frame is differenced with the current model to produce the update $g_i$, which is compared against the threshold T to label pixels as background (0) or foreground (1).]
Figure 3.3 Concept of Temporal Frame differencing
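The procedure can be sketched as follows; this is a minimal illustration with OpenCV, assuming a grayscale pipeline, a hypothetical input file name and a fixed threshold of 110 chosen from the 100 to 120 range mentioned above.

```python
import cv2

# Minimal sketch of temporal frame differencing (Equations 3.1-3.4).
# "uav_video.avi" and T = 110 are illustrative assumptions.
T = 110

cap = cv2.VideoCapture("uav_video.avi")
ok, frame = cap.read()
background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # b_1 = f_1 (Eq. 3.1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    update = cv2.absdiff(gray, background)              # g_i = |f_i - b_i| (Eq. 3.3)
    _, mask = cv2.threshold(update, T, 1, cv2.THRESH_BINARY)  # R_i (Eq. 3.4)
    background = gray           # the current frame becomes the next background model
cap.release()
```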
3.1.2 Running Average Background Subtraction
Running average background subtraction, also known as the adaptive mean background subtraction method, involves subtraction of consecutive frames from a background model that is updated with a learning rate ‘α’. The thresholding is based on the mean intensity value of the background model generated over the ‘n’ frames of the input video. The concept of adaptive mean background subtraction is described in Figure 3.4.
The first frame of the input video is assumed as background model
bi initially.
$b_i = f_i, \quad i = 1$   (3.5)
The final background model $b_{i+1}$ is obtained by blending the frame $f_i$ with the initially assumed background model $b_i$ using a proportionate fraction α, named the tuning factor. The value of α decides the extent of the contribution of the two frames to the final background model. The value of α ranges between 0 and 1.
$b_{i+1} = \alpha f_i + (1 - \alpha) b_i$   (3.6)
The update $g_i$ is obtained by subtracting the background model $b_{i+1}$ from the subsequent frame $f_i$.
$g_i = f_i - b_{i+1}$   (3.7)
The resultant binary image Ri is obtained based on the value of
threshold T.
$R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \leq T \end{cases}$   (3.8)
[Figure 3.4 illustrates the flow: the initial frame forms the initial background model; each new frame is blended with the current model using α to form the new model, the update $g_i$ is computed and compared against the adaptive threshold $T = \text{mean}(b_{i+1})$ to label pixels as background (0) or foreground (1).]
Figure 3.4 Concept of Running Average background subtraction
Here the threshold is not fixed. It is an adaptive measure obtained
by calculating the mean of all the pixels in each frame of the background
model.
$T = \text{mean}(b_{i+1})$   (3.9)
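A minimal sketch of this running average scheme with the mean-based adaptive threshold is given below; the value α = 0.05 and the file name are assumptions made only for illustration.

```python
import cv2
import numpy as np

# Minimal sketch of running average background subtraction (Equations 3.5-3.9).
# alpha = 0.05 and "uav_video.avi" are illustrative assumptions.
alpha = 0.05

cap = cv2.VideoCapture("uav_video.avi")
ok, frame = cap.read()
background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # b_1 = f_1

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    background = alpha * gray + (1.0 - alpha) * background   # b_{i+1} (Eq. 3.6)
    update = np.abs(gray - background)                       # g_i (Eq. 3.7)
    T = background.mean()                                    # adaptive threshold (Eq. 3.9)
    mask = (update > T).astype(np.uint8)                     # R_i (Eq. 3.8)
cap.release()
```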
3.1.3 Temporal Median Filtering
The Temporal median filtering (TMF) technique involves classification of targets and shadows and definition of the background model accordingly. The concept of TMF is described in Figure 3.5. The following entities are defined:
Moving visual object (MVO): A set of connected points belonging to an object.
Uncovered Background: A set of visible scene points
currently not in motion.
Background (B): The computed model of the background.
Ghost (G): A set of connected points detected as in motion by
means of background subtraction, but not corresponding to
any real moving object.
Shadow: A set of connected background points modified by a
shadow cast over them by a moving object. Shadows can be
classified as a shadow connected with an MVO and ghost
shadow (GSH), being a shadow not connected with any real
MVO.
The known object set is defined as
$KO_t = \{MVO_t\} \cup \{MVO_t^{sh}\} \cup \{G_t\} \cup \{G_t^{sh}\}$   (3.10)
For the set of S elements, It(p) is the object pixel, Bt(p) is the
background pixel and wb is the weight.
$S = \{I_t(p), I_{t-\Delta t}(p), \ldots, I_{t-n\Delta t}(p)\} \cup w_b\{B_t(p)\}$   (3.11)
The final background model is
$B^{t+\Delta t}(p) = \begin{cases} B^{t}(p), & p \in \{MVO_t\} \cup \{MVO_t^{sh}\} \\ B_s^{t+\Delta t}(p), & p \in \{G_t\} \cup \{G_t^{sh}\} \end{cases}$   (3.12)
[Figure 3.5 illustrates the flow: incoming frames are stored in a buffer, a median-based histogram model is built, the cumulative histogram $C_{sum}$ is compared against the threshold $h_n$, and pixels are labelled as background (0) or foreground (1).]
Figure 3.5 Concept of Temporal Median filtering
Temporal Median Filter computes the median intensity for each
pixel from all the stored frames in the buffer. Considering the complexity in
computation and storage limitations it is not possible to store all the incoming
video frames and make the decision accordingly. Hence the frames are stored
in a limited size buffer. The estimated background model will be closer to the
real background scene as we grow the size of the buffer. However the speed
of the process will get reduced and also higher capacity storage devices will
be required.
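A minimal sketch of this buffered median computation is shown below; the buffer length and the fixed threshold are illustrative assumptions, and the ghost and shadow classification described above is omitted for brevity.

```python
import collections
import cv2
import numpy as np

# Minimal sketch of temporal median filtering over a fixed-size frame buffer.
# BUFFER_SIZE = 25, the threshold 30 and the file name are illustrative assumptions.
BUFFER_SIZE = 25
buffer = collections.deque(maxlen=BUFFER_SIZE)

cap = cv2.VideoCapture("uav_video.avi")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    buffer.append(gray)
    if len(buffer) < BUFFER_SIZE:
        continue                              # wait until the buffer fills
    background = np.median(np.stack(buffer), axis=0).astype(np.uint8)  # per-pixel median
    update = cv2.absdiff(gray, background)
    mask = (update > 30).astype(np.uint8)     # assumed fixed threshold
cap.release()
```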
3.1.4 Running Gaussian Background Subtraction
In general, in region based background subtraction techniques, the regions other than the object of interest are stationary, so obtaining the difference in each frame is straightforward. But in aerial surveillance an adaptive measure is required to update the background frame periodically to achieve effective object detection. The adaptive measure is applied in the thresholding step of the proposed technique. The concept is described in Figure 3.6. The basic
steps include background modeling, update, subtraction and thresholding. The
‘n’ frames f1, f2, …, fn of the video sequence are considered for analysis.
The input frames are converted into Gray scale. Background modeling is the
first step in background subtraction. It constructs the reference image
representing the background. Let bi be the ith background model. It is obtained
from the frame fi.
$b_i = f_i, \quad i = 1$   (3.13)
The final background model $b_{i+1}$ is obtained from the input frame $f_i$ and the assumed background model $b_i$. The value of α decides the extent to which both frames influence the next background model. In general, α varies from 0 to 1.
[Figure 3.6 illustrates the flow: the initial frame forms the initial background model; each new frame is blended with the current model using α to form the new model, the update $g_i$ is computed and compared against the adaptive threshold $T = \frac{1}{n}\sum_{i=1}^{n}\sigma_i$ to label pixels as background (0) or foreground (1).]
Figure 3.6 Concept of Adaptive Gaussian background subtraction
$b_{i+1} = \alpha f_i + (1 - \alpha) b_i$   (3.14)
If α = 0, then $b_{i+1} = b_i$, so there is no update of the background model, resulting in a ghosting effect. If α = 1, then $b_{i+1} = f_i$; the background model and the input frame are then identical and the result reduces to binary 0. Thus it is essential to tune the factor to obtain the desired result. Let the background update be $g_i$; it is obtained by subtracting the tuned background model $b_i$ from the subsequent input frame $f_i$.
$g_i = f_i - b_i$   (3.15)
Thresholding is a process used to classify the input pixels into
background or object based on the value assumed as threshold ‘T’. We
consider an adaptive approach to update the threshold value based on the
statistical properties of the input frame and background model. The resultant
binary image Ri is obtained based on the value of threshold T.
$R_i = \begin{cases} 1, & g_i > T \\ 0, & g_i \leq T \end{cases}$   (3.16)
The adaptive threshold is decided as the average standard deviation
of each video frame. The value of σ is updated for each frame.
$T = \dfrac{1}{n}\sum_{i=1}^{n} \sigma_i$   (3.17)
$\sigma_{i+1}^2 = \alpha (f_i - b_i)^2 + (1 - \alpha)\,\sigma_i^2$   (3.18)
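A minimal sketch of the scheme with the running variance update and the averaged standard deviation threshold is given below; the value α = 0.53 anticipates the optimized tuning factor reported in Table 3.1, and the file name and initial variance are assumptions.

```python
import cv2
import numpy as np

# Minimal sketch of the running Gaussian background subtraction (Equations 3.13-3.18).
# alpha = 0.53, the initial variance and "uav_video.avi" are illustrative assumptions.
alpha = 0.53

cap = cv2.VideoCapture("uav_video.avi")
ok, frame = cap.read()
background = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)  # b_1 = f_1
variance = np.full_like(background, 50.0)       # assumed initial sigma_i^2
sigma_mean_sum, count = 0.0, 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    background = alpha * gray + (1.0 - alpha) * background                  # Eq. 3.14
    update = np.abs(gray - background)                                      # Eq. 3.15
    variance = alpha * (gray - background) ** 2 + (1.0 - alpha) * variance  # Eq. 3.18
    sigma_mean_sum += float(np.sqrt(variance).mean())
    count += 1
    T = sigma_mean_sum / count                  # average standard deviation (Eq. 3.17)
    mask = (update > T).astype(np.uint8)                                    # Eq. 3.16
cap.release()
```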
3.1.4.1 Tuning factor optimization
Optimization of the tuning factor α is a very important step in background modeling and updating. Since the new background model is based on previous background models and incoming frames, if the value of α is updated periodically, the noise due to motion-induced distortion is added to the image parameters, which degrades the generation of the background model. Thus an optimal value of α that suits all cases of the aerial platform is needed. Optimization is done with the assumption that
statistical properties of the object differ widely from the background.
Consider a 3 dimensional RGB color space of a pixel wise
statistically modeled background image, as in Figure 3.7. Consider a pixel ‘i’ in the color space. Let $E_i$ be the expected color of the pixel ‘i’ and $I_i$ be the observed color intensity value for the pixel ‘i’. The difference between $I_i$ and $E_i$ is decomposed into brightness (α) and chromaticity (D) components.
[Figure 3.7 shows the RGB color space of a pixel ‘i’: the expected color $E_i$, the observed color $I_i$, the brightness component α along the expected chromaticity line and the chromaticity distortion $D_i$.]
Figure 3.7 RGB color space of a pixel ‘i’
The expected RGB color value for the pixel ‘i’ in the background
image is defined as
$E_i = [E_{R_i}, E_{G_i}, E_{B_i}]$   (3.19)
Generalizing the expectation parameter for n pixels,
$E_i = E[X_i(n)]; \quad 1 \leq n \leq N$   (3.20)
The line OE is the expected chromaticity line. The RGB intensity value of the pixel ‘i’ is
$I_i = [I_{R_i}, I_{G_i}, I_{B_i}]$   (3.21)
The brightness distortion is a scalar that brings the observed color closest to the expected chromaticity line; it is denoted by $\phi(\alpha_i)$,
$\phi(\alpha_i) = (I_i - \alpha_i E_i)^2$   (3.22)
Let us compute the distortion of $X_i(n)$ from its mean $E_i$ by considering the orthogonal distortion parameter $\alpha_i(n)$,
$\alpha_i(n) = [X_i(n) - E_i] \cdot U_y$   (3.23)
where $U_y$ is the unit vector along the Y axis and $\alpha_i(n)$ is the brightness parameter.
Thus the factor α for the RGB space is defined as
$\alpha_R(n) = \dfrac{X_{iR}(n) - E_{iR}}{\sigma_{iR}}$   (3.24)
$\alpha_G(n) = \dfrac{X_{iG}(n) - E_{iG}}{\sigma_{iG}}$   (3.25)
$\alpha_B(n) = \dfrac{X_{iB}(n) - E_{iB}}{\sigma_{iB}}$   (3.26)
Since RGB color space is highly correlated, the values of α
obtained in RGB scale are converted into HSI color space. This is done by
normalizing the equations 3.24, 3.25 and 3.26.
$I = \max(R, G, B)$   (3.27)
I is the intensity value in HSI space which contains the
predominant information for processing. Usually H, S values are eliminated
(Chowdhury et al 2011). Thus the final tuning factor ‘α’ is obtained as,
$\alpha_I(n) = \dfrac{X_i(n) - E_i}{\sigma_i}$   (3.28)
If the value of α is 1, the brightness of the sample pixel is the same as that of the reference pixel. If it is less than 1, there is a reasonable change in brightness between the two pixels considered.
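A minimal sketch of this per-pixel brightness factor computed on the intensity channel is shown below; the array shape and the function name are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

# Minimal sketch of the brightness factor of Eq. 3.28,
# alpha_I(n) = (X_i(n) - E_i) / sigma_i, using the intensity I = max(R, G, B)
# of Eq. 3.27. "frames" is assumed to be an array of shape (n, H, W, 3)
# holding the sampled video frames; the function name is hypothetical.
def brightness_factor(frames):
    intensity = frames.max(axis=3).astype(np.float32)  # I = max(R, G, B), Eq. 3.27
    expected = intensity.mean(axis=0)                   # E_i: expected intensity per pixel
    sigma = intensity.std(axis=0) + 1e-6                # sigma_i, guarded against zero
    return (intensity - expected) / sigma               # alpha for every frame and pixel
```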
The value of α is optimized and varies from 0 to 1 for both static
and dynamic background models. Four videos are used for analysis and in
each video 100 frames are considered. In each frame 500 pixels at random
locations are considered and their intensities are taken for processing. Based
on the considered pixel values the statistical calculations are made to
determine mean, standard deviation and variance. Based on these three factors
the tuning factor is determined and the results are shown in Table 3.1.
Table 3.1 Optimization of tuning factor

Video     Mean      Variance   Standard Deviation   Tuning factor
Video 1   135.091   5123.039   71.575               0.602
Video 2   137.864   2772.981   52.659               0.507
Video 3   125.727   1084.398   32.930               0.498
Video 4   124.955   1786.522   42.267               0.528
In Table 3.1, for the Adaptive Gaussian background subtraction algorithm, the tuning factors for videos 1 to 4 are 0.602, 0.507, 0.498 and 0.528 respectively. The overall tuning factor is approximately 0.53. Thus the value of α may be assigned in the range 0.5 to 0.6 for efficient segmentation.
3.2 TARGET TRACKING
Target tracking is a method to track single or multiple objects in a
sequence of frames. Tracking can be done in forward or backward motion.
Mean Shift Tracking is a forward tracking method. It estimates the position of
the object in the current frame based on the previous frame. It is a gradient ascent approach that models the image region to be tracked as a histogram and finds the local maxima of the density function from the data samples. Mean Shift Tracking is a non-parametric, real-time, user-interfaced, kernel based tracking technique which provides accurate localization and matching of the target without an expensive search. It works in an iterative fashion: it computes the mean shift vector for the current position, shifts to the new position given by that vector, and continues until the convergence condition is satisfied.
Here, a kernel function is defined which calculates the distance between the sample points considered and the mean shift point. A weight coefficient, inversely proportional to this distance, is included: the closer the distance, the larger the weight coefficient.
Mean Shift Tracking plays a vital role in the area of target tracking
due to its robustness and computational efficiency. However, traditional mean shift tracking assumes that the target differs significantly from the background. In cases like aerial videos it is difficult to discriminate between the background and the target. Thus, the traditional technique cannot adaptively
catch up to the dynamic changes and thus results in failure. The concept of
mean shift clustering is depicted in Figure 3.8.
[Figure 3.8 shows the mean shift clustering idea: from frame 1 to frame n, the initial center point is iteratively shifted to the mean-shifted center point.]
Figure 3.8 Concept of mean shift clustering
3.2.1 Traditional mean shift tracking
The traditional mean shift tracking tracks the target region which is
the region of interest of the entire image. It is a simple iterative procedure that
shifts the position of the data point to the mean position of the data cluster.
Here the target model is the density of the previous region in a frame and the candidate model is the density of the region in the next frame. The target model and the candidate model are defined by the same kernel function over the region. The tracking procedure involves definition of the target and candidate
model, calculation of similarity, defining new position to the target model,
calculation of the distance from current position to the mean position in next
frame and shifting to the new mean position in the next frame. When the
distance between the position of target in current frame and next frame
exceeds a threshold, the current region becomes the new previous region and
the procedure is repeated till it converges to the mean position of the target.
Mean Shift Tracking involves two steps: target appearance description and tracking. The color histogram description is obtained by classifying all the pixels in the target area and estimating the probability of each color. In the next frame, the most similar pixels are found by Mean Shift Tracking using a similarity measure function. The shift vector that maximizes the similarity between the target histogram and the candidate histogram is then calculated, and the tracker converges to the position with maximum similarity. The classical MST is insensitive to non-rigid transformations, target location changes and overlap. The work flow is described in Figure 3.9.
Let the point $\hat{y}_0$ be the initial position in the previous frame. The respective target model is defined as $\{\hat{q}_u\}_{u=1,\ldots,m}$ for the m bins of the color histogram. The candidate model is defined as $\{\hat{p}_u(\hat{y}_0)\}$ for position $\hat{y}_0$, and the distance measure is evaluated as
$\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u}$   (3.29)
The weight vector $\{w_i\}_{i=1,\ldots,n_h}$ for the $n_h$ pixels within bandwidth ‘h’ is derived as
$w_i = \sum_{u=1}^{m} \sqrt{\dfrac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}\; \delta[b(x_i) - u]$   (3.30)
The next location for the candidate model is defined as $\hat{y}_1$,
$\hat{y}_1 = \dfrac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\dfrac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\dfrac{\hat{y}_0 - x_i}{h}\right\|^2\right)}$   (3.31)
[Figure 3.9 shows the workflow: the target and candidate models are initialized and localized, the next locations of both models are computed, the similarity distance is measured, and the window is shifted to the new position.]
Figure 3.9 Traditional Mean shift tracking workflow diagram
The similarity is calculated by the Bhattacharyya coefficient (Equations 3.59-3.62). For the new position $\hat{y}_1$, the distance metric is calculated as
$\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_1)\, \hat{q}_u}$   (3.32)
The distance metrics for the current frame and the incoming frame are compared. If $\rho[\hat{p}(\hat{y}_1), \hat{q}] < \rho[\hat{p}(\hat{y}_0), \hat{q}]$, then the new position $\hat{y}_1$ is
$\hat{y}_1 = \dfrac{1}{2}\left(\hat{y}_0 + \hat{y}_1\right)$   (3.33)
If $\|\hat{y}_0 - \hat{y}_1\|$ is sufficiently small, the iteration is concluded. If the shift exceeds the threshold, then the incoming frame is assumed as the previous frame ($\hat{y}_0 \leftarrow \hat{y}_1$) and the procedure is repeated.
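This iterative procedure can be sketched with OpenCV's meanShift routine, which performs an analogous iterative shift over a histogram back-projection; in the minimal illustration below the video file name and the initial target window coordinates are assumptions.

```python
import cv2
import numpy as np

# Minimal sketch of traditional mean shift tracking on a histogram back-projection,
# in the spirit of Equations 3.29-3.33. Window coordinates and file name are assumed.
cap = cv2.VideoCapture("uav_video.avi")
ok, frame = cap.read()
x, y, w, h = 300, 200, 60, 40                             # user-defined target window
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
q = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])    # m-bin target histogram q_u
cv2.normalize(q, q, 0, 255, cv2.NORM_MINMAX)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    weights = cv2.calcBackProject([hsv], [0], q, [0, 180], 1)   # candidate weights
    _, (x, y, w, h) = cv2.meanShift(weights, (x, y, w, h), criteria)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cap.release()
```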
3.2.2 Continuously adaptive mean shift tracking
The Continuously Adaptive Mean Shift algorithm (CAM Shift) is an adaptation of the Mean Shift algorithm for object tracking in an arbitrary number and type of feature spaces. The algorithm is based on a modification of the mean shift concept: it calculates the probability density distribution of the image and iterates in the direction of the maximum probability density (mode). The concept is described in Figure 3.10.
[Figure 3.10 shows the steps: define the ROI of the target model, define the color histogram, calculate the moments, calculate the ratioed histogram, calculate orientation and scale, and shift to the new position.]
Figure 3.10 Concept of CAM Shift algorithm
The first step in CAM Shift algorithm is the definition of
histogram. It is done by associating the pixel value to the corresponding
histogram bin. The m-bin histogram defined for a pixel location xi is
computed as
$\hat{q}_u = \sum_{i=1}^{n} \delta\left[c(x_i^*) - u\right]$   (3.34)
After generation of m-bin histogram, the mean location (centroid)
within the search window is defined. The zero, first and second order
moments for the pixel (x,y) with intensity I are computed as follows
$M_{00} = \sum_x \sum_y I(x, y)$   (3.35)
$M_{10} = \sum_x \sum_y x\, I(x, y)$   (3.36)
$M_{01} = \sum_x \sum_y y\, I(x, y)$   (3.37)
$M_{20} = \sum_x \sum_y x^2 I(x, y)$   (3.38)
$M_{02} = \sum_x \sum_y y^2 I(x, y)$   (3.39)
$M_{11} = \sum_x \sum_y x y\, I(x, y)$   (3.40)
Thus the mean search window location is
$x_c = \dfrac{M_{10}}{M_{00}}$   (3.41)
$y_c = \dfrac{M_{01}}{M_{00}}$   (3.42)
Thus the target model based on the histogram is defined by
$\hat{q}_u = \sum_{i=1}^{n} k\left(\left\|x_i^*\right\|^2\right)\, \delta\left[c(x_i^*) - u\right]$   (3.43)
Here, k(x) is a convex, monotonically decreasing kernel profile that assigns higher values to pixels near the center of the search window,
$k(x) = \begin{cases} 1 - r, & r < 1 \\ 0, & \text{otherwise} \end{cases}$   (3.44)
The weighted histogram alone is not sufficient to localize the target. A ratio histogram solves the issue by assigning lower weight to color features belonging to the background. With ‘a’ as the scaling factor and ‘h’ as the bandwidth,
$k(x) = \begin{cases} a r, & 1 < r \leq h \\ 0, & \text{otherwise} \end{cases}$   (3.45)
Thus the background weighted histogram becomes
$\hat{q}_u = \hat{w}_u \sum_{i=1}^{n} k\left(\left\|x_i^*\right\|^2\right)\, \delta\left[c(x_i^*) - u\right]$   (3.46)
The orientation (θ), scale and length are defined from the following intermediate values:
$a = \dfrac{M_{20}}{M_{00}} - x_c^2$   (3.47)
$b = 2\left(\dfrac{M_{11}}{M_{00}} - x_c y_c\right)$   (3.48)
$c = \dfrac{M_{02}}{M_{00}} - y_c^2$   (3.49)
From these intermediate values the orientation is defined as
$\theta = \dfrac{1}{2}\tan^{-1}\left(\dfrac{b}{a - c}\right)$   (3.50)
The distances $d_1$ and $d_2$ of the distribution are defined as
$d_1 = \dfrac{(a + c) + \sqrt{b^2 + (a - c)^2}}{2}$   (3.51)
$d_2 = \dfrac{(a + c) - \sqrt{b^2 + (a - c)^2}}{2}$   (3.52)
Based on these values of scale and orientation, the new position is
determined and shifted.
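OpenCV provides a CamShift routine that follows this moment-based adaptation of scale and orientation; a minimal usage sketch is given below, where the initial window coordinates and the file name are assumptions.

```python
import cv2
import numpy as np

# Minimal sketch of CAM Shift tracking. CamShift returns a rotated rectangle
# ((cx, cy), (width, height), angle) plus the updated search window.
cap = cv2.VideoCapture("uav_video.avi")
ok, frame = cap.read()
track_window = (300, 200, 60, 40)                        # assumed initial target window
roi = frame[200:240, 300:360]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
q = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])   # target hue histogram
cv2.normalize(q, q, 0, 255, cv2.NORM_MINMAX)
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], q, [0, 180], 1)
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, criteria)
    box = cv2.boxPoints(rot_rect).astype(np.int32)       # oriented box of the target
    cv2.polylines(frame, [box], True, (0, 255, 0), 2)
cap.release()
```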
3.2.3 Adaptive background mean shift tracking
The accuracy in mean shift tracking depends on generation of
candidate model which is used to compare with the position of target in
incoming frames. The position of target may fall into three cases as shown in
Figure 3.11.
Case 1: The current position is a target (Vehicle A in Figure 3.11)
Case 2: The position is not a target, but possesses the same color properties as the target (Vehicle B in Figure 3.11)
Case 3: The current position is not a target and has no similarity with the target (Vehicle C in Figure 3.11)
The method of defining the background position depends on information from the previous frame. The first frame detects the target and creates the target model. In the next frame the candidate model is created. Based on the difference between the target model and the candidate model the window is shifted. In the CAM Shift technique the new position is calculated from the scale and orientation of the centroid position, which is suitable only for cases where the frame movement is linear and stable.
Figure 3.11 Target definition - mark A is the target, B is not the target but has properties similar to the target, and C is totally different from the target
In cases of unmanned aerial vehicle based videos, there will be
shear and angular displacement. Thus the new position of a single point does
not help in exact localization of the target in the incoming frame. A total
model of the background of target which includes the distance and angle at
each pixel in the defined target window is required. This requirement is
63
fulfilled in adaptive background mean shift tracking technique. The concept is
depicted in Figure 3.12.
Step 1: Initialization Phase:
Assume the target model with center position ‘y’. The bin size, area
and the color system of the target model is defined. The initialization is done
with Epanechnikov kernel shown in Figure 3.13.
$E(x) = \begin{cases} 1 - \|x\|^2, & \text{if } 0 \leq \|x\| \leq 1 \\ 0, & \text{otherwise} \end{cases}$   (3.53)
[Figure 3.12 shows the flow: the target model (color system, bin size, object definition) and the candidate model are created and their similarity is checked; if they are similar the target candidate model is retained, otherwise the area and a new background model are defined and the window is shifted to the new position.]
Figure 3.12 Concept of Adaptive background mean shift tracking
Step 2 : Generation of Models:
The target model, denoted $q_u^*$, and the candidate model, denoted $p_u^*$, are generated. The initial window is stored. The target model $\{q_u^*\}_{u=1,2,\ldots,m}$ is generated for all bins of an m-bin histogram. An m-bin histogram is a chart with ‘m’ bins where each bin depicts the number of pixels of the same color.
Let $f(x_i)$ be the color index at position $x_i$, ‘m’ the number of bins, ‘h’ the bandwidth and ‘n’ the number of pixels defined in the current window,
$q_u^* = N \sum_{i=1}^{n} k\left(\left\|\dfrac{y - x_i}{h}\right\|\right)\, \delta_u\left[f(x_i)\right]$   (3.54)
Here, $\delta_u(x)$ is the Kronecker delta function, a function of two variables which is 1 if they are equal. It is defined as
$\delta_u(x) = \begin{cases} 1, & x - u = 0 \\ 0, & \text{otherwise} \end{cases}$   (3.55)
Figure 3.13 Epanechnikov Kernel
N is the normalization constant, since the sum of all pixel probabilities yields 1. Thus, the constant N becomes
$N = \dfrac{1}{\sum_{i=1}^{n} k\left(\left\|\dfrac{y - x_i}{h}\right\|\right)}$   (3.56)
After creation of target model, the candidate model pu* is
generated. Since the candidate model varies with each frame an index is
assigned for each frame. Let f ( x p ) be the index of the bin in previous frame
and f ( xc ) be the index of the bin in current frame.
$p_u^*(y_0^*) = \begin{cases} E \displaystyle\sum_{i=1}^{n} k\left(\left\|\dfrac{y - x_i}{h}\right\|\right) \delta_u\left(f(x_c)\right)\, g(x_i)\, \delta_u(x_p), & f(x_c) = f(x_p) \\ E \displaystyle\sum_{i=1}^{n} k\left(\left\|\dfrac{y - x_i}{h}\right\|\right) \delta_u\left(f(x_c)\right), & \text{otherwise} \end{cases}$   (3.57)
For $p_u^*(y_0^*) \neq 0$,
$p_u^*(y_0^*) = \dfrac{p_u^*(y_0^*)}{\sum_{u=1}^{m} p_u^*(y_0^*)}$   (3.58)
Step 3: Similarity Definition and Transition to new position:
The similarity function defines the distance between the target model and the candidate model. The maximum product of the probabilities of the target model $q_u^*$ and the candidate model $p_u^*$ gives the least error. For an m x m window in an m-bin histogram, the histogram probability is calculated for the target model $q_u^*$ against each candidate model $p_u^*$. The similarity is calculated by the Bhattacharyya coefficient; the associated distance measures the dissimilarity between distributions of features such as color and texture. It has a simple geometric interpretation as the cosine of the angle between two N dimensional vectors. For two identical distributions,
$\cos(\theta) = \sum_{i=1}^{N} \sqrt{p(i)\, p^*(i)} = \sum_{i=1}^{N} \sqrt{p(i)\, p(i)} = \sum_{i=1}^{N} p(i) = 1$   (3.59)
Thus the distance axiom is defined as
$d(p, p^*) = \sqrt{1 - \rho(p, p^*)}$   (3.60)
By the axiom of the Bhattacharyya coefficient,
$\rho[p_u^*(y_0^*), q_u^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_0^*)\, q_u^*}$   (3.61)
$d(y) = \sqrt{1 - \rho(p^*(y), q^*)}$   (3.62)
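A minimal sketch of the Bhattacharyya coefficient and distance of Equations 3.59-3.62 for two m-bin histograms is shown below; the function and argument names are illustrative.

```python
import numpy as np

# Minimal sketch of the Bhattacharyya coefficient (Eq. 3.61) and distance
# (Eqs. 3.60, 3.62) between two m-bin histograms p and q.
def bhattacharyya(p, q):
    p = p / p.sum()                     # normalize so each histogram sums to 1
    q = q / q.sum()
    rho = np.sum(np.sqrt(p * q))        # coefficient (Eq. 3.61)
    distance = np.sqrt(1.0 - rho)       # distance (Eqs. 3.60, 3.62)
    return rho, distance
```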
After determining the distance between the target model qu* and
candidate model pu* , the new position y1* for the ‘n’ pixels in the current
window is defined.
$y_1^* = \dfrac{\displaystyle\sum_{i=1}^{n} x_i \sum_{u=1}^{m} \sqrt{\dfrac{q_u^*}{p_u^*(y_0^*)}}\; \delta_u\left(f(x_i)\right)}{\displaystyle\sum_{i=1}^{n} \sum_{u=1}^{m} \sqrt{\dfrac{q_u^*}{p_u^*(y_0^*)}}\; \delta_u\left(f(x_i)\right)}$   (3.63)
Step 4: Creating new Background position:
Let J(i) be the new background position for a pixel index ‘i’.
Consider an area $A_{xy}$ in the new window $y_1^*$. For the pixels ranging from 1 to n (i = 1, …, n), if the areas of the windows $y_1^*$ and $y_0^*$ are equal, then J(i) = 0; otherwise J(i) = 1.
$J(i) = \begin{cases} 1, & A_{xy_1} - A_{xy_0} \neq 0 \\ 0, & \text{otherwise} \end{cases}$   (3.64)
If J(i) is 1, then there is a reasonable shift in position of
background. Thus the shift vector ‘l’ is defined as follows,
$\Delta x = \dfrac{y_{1x}^* - y_{0x}^*}{2}$   (3.65)
$\Delta y = \dfrac{y_{1y}^* - y_{0y}^*}{2}$   (3.66)
$l = \sqrt{\Delta x^2 + \Delta y^2}$   (3.67)
The concept of defining background is described in Figure 3.14.
[Figure 3.14 shows how the new background position is obtained from the current frame and the next frame using the length and slope between the window centers.]
Figure 3.14 Generation of new Background
Thus the new temporary center $y_2^*$ is obtained by defining the distance and slope,
$m = \dfrac{y_{2y}^* - y_{1y}^* - \Delta y}{y_{2x}^* - y_{1x}^* - \Delta x}$   (3.68)
$l^2 = \left(y_{2y}^* - y_{1y}^*\right)^2 + \left(y_{2x}^* - y_{1x}^*\right)^2$   (3.69)
Generalizing for all frames,
$y_n^* = \begin{cases} y_{nx}^* = y_{(n-1)x}^* + \dfrac{l}{\sqrt{m^2 + 1}},\;\; y_{ny}^* = y_{(n-1)y}^* + \dfrac{l\,m}{\sqrt{m^2 + 1}}, & \Delta x \neq 0 \\ y_{nx}^* = y_{(n-1)x}^*,\;\; y_{ny}^* = y_{(n-1)y}^* + l, & \Delta x = 0,\ \Delta y > 0 \\ y_{nx}^* = y_{(n-1)x}^*,\;\; y_{ny}^* = y_{(n-1)y}^* - l, & \Delta x = 0,\ \Delta y < 0 \\ y_{nx}^* = y_{(n-1)x}^*,\;\; y_{ny}^* = y_{(n-1)y}^*, & \Delta x = 0,\ \Delta y = 0 \end{cases}$   (3.70)
Step 5: Reassigning the position of window:
The similarity at the new position $y_1^*$ of the candidate model is
$\rho[p^*(y_1^*), q^*] = \sum_{u=1}^{m} \sqrt{p_u^*(y_1^*)\, q_u^*}$   (3.71)
If $\rho[p^*(y_1^*), q^*] < \rho[p^*(y_0^*), q^*]$, then the position $y_1^*$ is updated as
$y_1^* = \dfrac{1}{2}\left(y_1^* + y_0^*\right)$   (3.72)
Thus, in general,
$y_n^* = \dfrac{1}{2}\left(y_n^* + y_{n-1}^*\right)$   (3.73)
The similarity between the target model and the candidate model is evaluated, and the position is adjusted until the models match.
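A minimal sketch of the core position update of Eq. 3.63 and the back-off rule of Eqs. 3.71-3.73 is given below; the array names are illustrative assumptions, with pixels holding the coordinates of the n pixels in the current window, bins their histogram bin indices f(x_i), and q, p the m-bin target and candidate histograms.

```python
import numpy as np

# Minimal sketch of the weighted position update (Eq. 3.63) and the
# position-halving back-off rule (Eqs. 3.71-3.73).
def new_position(pixels, bins, q, p):
    weights = np.sqrt(q[bins] / np.maximum(p[bins], 1e-12))        # sqrt(q_u / p_u(y0))
    return (weights[:, None] * pixels).sum(axis=0) / weights.sum()  # Eq. 3.63

def back_off(y1, y0, rho_y1, rho_y0):
    # Eq. 3.72: if the similarity at y1 is lower than at y0, average the positions
    if rho_y1 < rho_y0:
        y1 = 0.5 * (y1 + y0)
    return y1
```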