41st Southeastern Symposium on System Theory M1C.3
University of Tennessee Space Institute
Tullahoma, TN, USA, March 15-17, 2009
Airplane Detection and Tracking Using Wavelet Features
and SVM Classifier
Saeed Rastegar (1), Amir Babaeian (2), Mojtaba Bandarabadi (3), Yashar Toopchi (4)
(1,3,4) Department of ECE, Noshirvani Inst. of Tech., University of Mazandaran, Shariati, Babol, Iran
(2) Department of Electrical Engineering, Amirkabir University of Technology, 15875-4413, Hafez, Tehran, Iran
S.Rastegar36@[Link] [Link]@[Link]
[Link]@[Link] Yashar_toopchi@[Link]
Abstract
In this paper we describe a fully automatic system for airplane detection and tracking based on the wavelet transform and a Support Vector Machine (SVM). Using 50 airplane images taken in different situations, models are developed to recognize the airplane in the first frame of a video sequence. Feature vectors are built to train an SVM classifier that separates object pixels from background pixels. The learned model can be used to detect the airplane in the original video and in novel images. For the original video, the system can be considered a generalized tracker; for novel images, it can be interpreted as a method for learning models for object recognition. After the airplane is detected in the first frame, the feature vectors of this frame are used to train the SVM classifier. For each new video frame, the SVM is applied to test the pixels and form a confidence map. The 4th-level Daubechies wavelet coefficients of the input image are used as features. Simulations demonstrate that airplane detection and tracking based on the wavelet transform and SVM classification achieve acceptable and efficient performance. The experimental results agree with the theoretical results.
1. INTRODUCTION
Object detection and tracking is a crucial and basic ingredient of computer vision and has been involved in many research fields in recent years, including target tracking. Generally, tracking is the task of finding the object states (position, scale, velocity, and many other characterizing parameters) from observed image sequences. Humans can easily recognize and track an object immediately, even in the presence of high clutter, occlusion, and non-linear variations in the background as well as in the shape, direction, or even size of the target object. However, such object tracking can be a difficult and challenging task for a machine. If we consider tracking as a classification problem, the choice of a good classifier helps to distinguish the object from its background [1-3]. Recently, Javed et al. proposed a solution based on wavelet features and Artificial Neural Network (ANN) algorithms [4]. Wavelet coefficients of the Daubechies 4 wavelet transform were selected as features to train the ANN. To obtain the wavelet coefficients of the entire image, each image (incoming frame) is processed. A specific airplane was the object to be tracked in the experiments reported in [4]. First, color frames were converted into gray-level images, and then the wavelet transform was applied; furthermore, the ANN was trained for each frame. In [9], by contrast, we proposed a novel approach to manipulating wavelet coefficients. For each color band, we processed each image to obtain the wavelet coefficients. In addition, the SVM classifier [7] was trained on the first-frame coefficients and tested on the coefficients of the other frames. In [9] we drew an ellipse around the desired object in the first frame in order to separate the target from its surrounding background, so the target region in the first frame is an ellipse. We then trained the SVM classifier with feature vectors derived from the first frame and tested it with feature vectors derived from the subsequent frames. Our method in [9] performed real-time tracking and was shown to be capable of tracking objects over long periods of time. However, in [9] we distinguished the object from the background in the first frame manually; we therefore sought a fully automatic system that can detect and track different kinds of targets in complicated backgrounds. In this paper we introduce a new method for target detection and tracking based on wavelet features and an SVM classifier. The object tracked for the purpose of demonstration is a specific airplane. We use an airplane library [10] to train the SVM classifier. We choose 50 airplane images in different positions and then
we extract their features. We then train the SVM classifier with these feature vectors. The learned model can be used to detect the airplane in the first frame. After the airplane is detected in the first frame, we train the SVM classifier again with the feature vectors of this frame. When a new video frame is received, the SVM is used to test the pixels and form a confidence map. The proposed method offers several advantages. For instance, it can be very robust against difficulties such as partial occlusion, blurring caused by camera shake, and deformation of the object position. This is due to the use of color information in the proposed technique. Moreover, it can recognize objects with large aspect change. In addition, the proposed method can be used for both rigid and non-rigid objects.
The paper is organized as follows: our procedure for creating an airplane library is explained in Section 2. A short review of the wavelet transform and our methodology for feature extraction are presented in Section 3. The SVM classifier is described in Section 4. The simulation results are given in Section 5. Section 6 concludes.
2. AIRPLANES
As our first step, to evaluate descriptors on small image patches, we create a library of airplane images. Images were taken from [Link]. We choose 50 airplane images from this database, trying to include different kinds of airplanes against different backgrounds. The size of all the images is 480×640.
3. WAVELET TRANSFORM
3.1. Overview of the Wavelet Transform
Unlike the Fourier transform, in which the basis functions are sinusoids and redundant, wavelet transforms are based on short-duration waves, called wavelets, of different frequency and restricted duration. This characteristic makes them a favorable choice for providing frequency as well as temporal information about a given signal, as reported in [5]. They can be implemented using multi-resolution techniques, as reported in [6]. The advantage of such an approach is that some features which might not be detected at one resolution may be found at another resolution. The 2-D Fast Wavelet Transform (FWT) results in four sub-band images at each level: the approximation and the horizontal, vertical, and diagonal details, the latter three denoted dH, dV, and dD, respectively. Each of the four sub-band images has half the width and half the height of the input image.
3.2. Wavelet Coefficients from Each RGB Channel
As our first step, we resize each image to 480×640 and then divide it into its three RGB channels. At the first resolution level, we apply the 4th-level Daubechies wavelet to each RGB channel to obtain images of 240×320 pixels. These images are the approximation, horizontal detail, vertical detail, and diagonal detail. The same procedure is applied to the 240×320 approximation image of each RGB channel to obtain the next images of 120×160 pixels. We continue this procedure level by level to obtain images of sizes 60×80 and 30×40. Fig. 1 shows how the four final images of size 30×40 are obtained from each RGB channel of the original image. We also use this method for the G and B channels. As the final step depicted in Fig. 1, we have twelve sub-images of size 30×40 for each color image. This yields a feature set of size 60000 for the 50 images in our library (1200 for each image, i.e. 30×40) to train the SVM classifier, which is discussed in the following section.
Fig. 1. Mechanism used to get the wavelet coefficients for training. [Flow diagram: the original image is resized and split into its R, G, and B channels (480×640 each); the wavelet transform is applied to each channel, then repeatedly to the resulting approximation image (240×320, and so on), ending with four final images per channel (approximation, horizontal detail, vertical detail, and diagonal detail) of size 30×40.]
4. SUPPORT VECTOR MACHINE
The principle of the Support Vector Machine (SVM) relies on a linear separation in a high-dimensional feature space into which the data have previously been mapped, in order to take into account the possible non-linearities of the problem. If we assume that the training set $X = (x_i)_{i=1}^{l} \subset \mathbb{R}^r$, where $l$ is the number of training
vectors, $\mathbb{R}$ denotes the real line, and $r$ is the number of modalities, is labeled with two-class targets $Y = (y_i)_{i=1}^{l}$, where $y_i \in \{-1, 1\}$, then
$$\Phi : \mathbb{R}^r \to F \tag{1}$$
maps the data into a feature space $F$. Vapnik proved that maximizing the minimum distance in the space $F$ between $\Phi(X)$ and the separating hyperplane $H(w, b)$ can be viewed as an appropriate means to reduce the generalization risk, where
$$H(w, b) = \{ f \in F \mid \langle w, f \rangle_F + b = 0 \} \tag{2}$$
and $\langle \cdot, \cdot \rangle_F$ is the inner product in $F$. Vapnik also proved that the optimal hyperplane can be obtained by solving the following convex quadratic programming (QP) problem:
$$\text{Minimize} \quad \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \tag{3}$$
$$\text{with} \quad y_i \left( \langle w, \Phi(x_i) \rangle_F + b \right) \ge 1 - \xi_i, \quad i = 1, \dots, l,$$
where the constant $C$ and the slack variables $\xi_i$ are introduced to take into account the possible non-separability of $\Phi(X)$ in $F$. In practice, however, such a criterion is softened to the minimization of a cost factor involving both the complexity of the classifier and the degree to which marginal points are misclassified. The tradeoff between these factors is managed through a margin-of-error parameter (usually designated $C$), which is tuned through cross-validation procedures. Although the SVM classifier is based upon a linear discriminator, it is not restricted to making linear hypotheses. Non-linear decisions are made possible by a non-linear mapping of the data to a higher-dimensional space. The phenomenon is analogous to folding a flat sheet of paper into a three-dimensional shape and then cutting it into two halves; the resulting non-linear boundary in the two-dimensional space is revealed by unfolding the pieces. The SVM's non-parametric mathematical formulation allows these transformations to be applied efficiently and implicitly: the SVM's objective is a function of the dot product between pairs of vectors, so substituting the original dot products with those computed in another space eliminates the need to transform the original data points explicitly to the higher-dimensional space. The computation of dot products between vectors without explicitly mapping to another space is performed by a kernel function, which thereby carries out the non-linear projection of the data. The most commonly used kernel functions include the linear kernel, the polynomial kernel $K(x, y) = (\langle x, y \rangle_{\mathbb{R}^r} + 1)^d$, and the sigmoid kernel $K(x, y) = \tanh(\langle x, y \rangle_{\mathbb{R}^r} + a)$, where $x$ and $y$ are feature vectors in the input space. Another popular kernel is the Gaussian Radial Basis Function (RBF) kernel, defined as
$$K(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right) \tag{4}$$
where $\sigma$ is the scale parameter and $x$, $y$ are feature vectors in the input space. The Gaussian kernel has two hyperparameters that control performance: $C$ and the scale parameter $\sigma$. In this paper we employed the RBF as our SVM kernel function.
5. EXPERIMENTAL RESULTS
5.1. SVM Train and Test for Airplane Detection
In the first step, for each image of our airplane library, we manually separate the airplane from its surrounding background. The points inside the border are then labeled with one and the points outside are labeled with zero. Fig. 2 shows four examples of labeled airplane images from our library. This results in a binary classification problem, which can appropriately be solved using an SVM classifier. We train the SVM classifier with the feature vectors derived from the 50 airplane images of the library, described in Section II.B [8]. We evaluated the proposed algorithm on the first frame of video sequences. The results demonstrate that our method is effective for airplane detection in video images taken with a moving camera. Fig. 3 shows the results of airplane detection.
Fig. 2. (a) Original images of the airplane library. (b) The labeled region of the target.
Fig. 3. (a) Original image in the first frame. (b) Result of airplane detection in the first frame.
5.2. SVM Train for Airplane Tracking
In this phase, we use the results of the previous stage (airplane detection). After the airplane is detected in the first frame, we use the feature vectors of this frame as the input of the SVM classifier and train the SVM with them.
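The feature vectors used for training are the final-level wavelet sub-band coefficients of the three color channels. The following is only a minimal pure-Python sketch of that extraction, not the authors' implementation. It assumes the 4-tap Daubechies (D4) filter, periodic boundary extension, and one 12-value vector per position of the final sub-bands; the paper states none of these details. The function names, and the choice of which detail band is called horizontal, are illustrative.

```python
import math

# Assumed filter: the 4-tap Daubechies (D4) low-pass analysis filter.
_S3, _NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
LO = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# High-pass filter from the quadrature-mirror relation g[n] = (-1)^n h[3 - n].
HI = [(-1) ** n * LO[3 - n] for n in range(4)]


def dwt1d(signal):
    """One analysis level with periodic extension; len(signal) must be even."""
    n = len(signal)
    approx = [sum(LO[j] * signal[(2 * k + j) % n] for j in range(4)) for k in range(n // 2)]
    detail = [sum(HI[j] * signal[(2 * k + j) % n] for j in range(4)) for k in range(n // 2)]
    return approx, detail


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dwt2d(image):
    """One 2-D level: returns (approximation, dH, dV, dD), each half-size."""
    lo_rows, hi_rows = zip(*(dwt1d(row) for row in image))

    def filter_columns(rows):
        lo, hi = zip(*(dwt1d(col) for col in transpose(rows)))
        return transpose(lo), transpose(hi)

    # Assumed naming: dH is low-pass along rows, high-pass along columns.
    approx, d_h = filter_columns(lo_rows)
    d_v, d_d = filter_columns(hi_rows)
    return approx, d_h, d_v, d_d


def pixel_features(channels, levels=4):
    """Stack the final-level sub-bands of every channel into per-position vectors."""
    bands = []
    for channel in channels:  # e.g. R, G, B
        approx = channel
        for _ in range(levels):
            approx, d_h, d_v, d_d = dwt2d(approx)
        bands.extend([approx, d_h, d_v, d_d])
    rows, cols = len(bands[0]), len(bands[0][0])
    return [[band[r][c] for band in bands] for r in range(rows) for c in range(cols)]


def final_size(rows, cols, levels=4):
    """Sub-band size after `levels` halvings in each dimension."""
    return rows // 2 ** levels, cols // 2 ** levels


# Small constant 32x48 channels keep the pure-Python run fast.
channels = [[[v] * 48 for _ in range(32)] for v in (1.0, 2.0, 3.0)]
features = pixel_features(channels)
print(len(features), len(features[0]))  # 6 positions, 12 values each
print(final_size(480, 640))             # (30, 40), i.e. 1200 positions per image
```

For a 480×640 frame the same procedure gives 30×40 = 1200 positions per image, which matches the per-image count stated in Section 3.2. A practical implementation would more likely use an optimized wavelet library than these explicit loops.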
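Once the SVM has been trained on the first frame, each later frame is scored position by position to form a confidence map. The sketch below illustrates only the scoring step, using the Gaussian RBF kernel of Eq. (4) and the standard kernel-SVM decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. The support vectors, coefficients, and bias are made-up placeholders standing in for the result of solving the QP problem; thresholding the score at zero is likewise an assumption, not a step the paper specifies.

```python
import math

SIGMA = 5.0  # kernel scale parameter reported in Section 5.3


def rbf_kernel(x, y, sigma=SIGMA):
    """Gaussian RBF kernel of Eq. (4): exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def decision_value(x, support_vectors, coefficients, bias, sigma=SIGMA):
    """Kernel SVM output f(x) = sum_i coefficients[i] * K(sv_i, x) + bias."""
    return bias + sum(c * rbf_kernel(sv, x, sigma)
                      for sv, c in zip(support_vectors, coefficients))


def confidence_map(features, rows, cols, support_vectors, coefficients, bias):
    """Score every feature vector and reshape the scores to rows x cols."""
    scores = [decision_value(f, support_vectors, coefficients, bias) for f in features]
    return [scores[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical trained model: one object and one background support vector.
support_vectors = [[0.0, 0.0], [10.0, 10.0]]
coefficients = [1.0, -1.0]  # alpha_i * y_i, placeholder values
bias = 0.0

# Four 2-D feature vectors standing in for a 2x2 "frame".
features = [[0.0, 0.0], [6.0, 6.0], [10.0, 10.0], [0.0, 1.0]]
cmap = confidence_map(features, 2, 2, support_vectors, coefficients, bias)
object_mask = [[score > 0 for score in row] for row in cmap]
print(object_mask)
```

In the paper's setting each feature vector would hold the wavelet coefficients of one position rather than two toy values, and the model parameters would come from training with $C = 50$ and $\sigma = 5$.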
5.3. Airplane Tracking
We evaluated the proposed algorithm on several video sequences. The results demonstrate that our method is effective for airplane tracking in video images taken with a moving camera. Fig. 4 shows the results for several frames from a 250-frame color video sequence in which the size of each frame is set to 480×640. It can be concluded from Fig. 4 that the proposed method can efficiently handle different occlusions. Note that although aspect changes occur in some frames, the proposed method can efficiently distinguish the airplane from the background. The parameter values used for the proposed method in our experiments are as follows: kernel parameter $\sigma = 5$, $C = 50$.
Fig. 4. An example of airplane tracking. (a) Three frames (frames 50, 100, and 150) from a 250-frame sequence. (b) The corresponding confidence maps.
6. CONCLUSIONS AND FUTURE WORKS
In this paper, we presented a new method for airplane detection and tracking based on the wavelet transform and an SVM classifier. The proposed method employs both color and spatial information obtained from the image. Object tracking is treated as a classification problem by labeling the object in the first frame. The SVM is then trained using the training vectors obtained from image frames and finally applied to the other frames to distinguish the object from its background. We can create a digital library of different kinds of targets and then examine our method for detecting and tracking targets in complicated backgrounds. The training process for tracking was done in the first frame. Training the SVM in frames other than the first one is left for future work, and we expect that this may improve the performance of the tracking algorithm. In addition, this method can be used as a motion estimation model of the target, after which the precise border of the target can be found using other methods such as active contours.
REFERENCES
[1] M. Shell, "How to Use the IEEEtran LaTeX Class," Journal of LaTeX Class Files, vol. 1, no. 11, pp. 1-21, November 2002.
[2] [Link], [Link], and [Link], "Vision-Based Preceding Vehicle Detection and Tracking," IEEE Int'l Conf. Computer Vision, 2006.
[3] [Link], "Ensemble Tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 2, February 2007.
[4] [Link], M.N. Jafri, and J. Ahmad, "Target Tracking in an Image Sequence Using Wavelet Features and a Neural Network," IEEE Conf. TENCON, November 2005.
[5] R.C. Gonzalez and R.E. Woods, Digital Image Processing, 2nd ed., Prentice-Hall, Inc., 2002.
[6] S. Mallat, "A compact multiresolution representation: the wavelet model," Proc. IEEE Computer Society Workshop on Computer Vision.
[7] V.N. Vapnik, The Nature of Statistical Learning Theory, 2nd ed., New York: Springer-Verlag, 1999.
[8] A. Bayesteh Tashk and P. Mowlaee, "Pattern Classification Using SVM with GMM Data Selection Training Method," IEEE International Conference on Signal Processing and Communications (ICSPC 2007), November 2007.
[9] [Link] and A. Bayesteh Tashk, "Target Tracking Using Wavelet Features and SVM Classifier," International Symposium on Communication Systems, Networks and Digital Signal Processing, 2008. Accepted.
[10] [Link]