Similarity Search On Time Series Data
Motivations
Real motivation?
Distance functions
Important feature:
L2 distance is preserved under orthonormal transforms (among the Lp norms, only p = 2 satisfies this property)
Orthonormal transforms: K-L transform, DFT, DWT
Widely used
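The preservation property follows in one line. As a sketch, write U for the matrix of any orthonormal transform (U is a symbol introduced here, not taken from the slides), so that U^H U = I:

```latex
\|Ux - Uy\|_2^2
  = (x - y)^H \, U^H U \, (x - y)
  = (x - y)^H (x - y)
  = \|x - y\|_2^2
```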
Figures taken from: A comparison of DFT and DWT based similarity search in Time-Series Databases (also the figures on slides 9, 17, 18, 24, 25)
DFT definition
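n-point DFT ($X_f$ is the frequency domain, $x_t$ is the time domain):
$$X_f = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x_t \exp(-j 2\pi f t / n), \quad f = 0, 1, \ldots, n-1$$
Inverse DFT:
$$x_t = \frac{1}{\sqrt{n}} \sum_{f=0}^{n-1} X_f \exp(j 2\pi f t / n), \quad t = 0, 1, \ldots, n-1$$
Energy E(x):
$$E(x) = \|x\|^2 = \sum_{t=0}^{n-1} |x_t|^2$$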
Parseval's theorem
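Parseval's theorem says the energy of a sequence is the same in the time and frequency domains. With the 1/sqrt(n) scaling of the DFT definition this also means L2 distances are unchanged. The following is a minimal numerical check, assuming NumPy is available; the helper names `dft` and `energy` and the two sample sequences are made up for illustration.

```python
import numpy as np

def dft(x):
    """Orthonormal n-point DFT computed directly from the definition."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    idx = np.arange(n)
    # W[f, t] = exp(-j * 2*pi * f * t / n) / sqrt(n)
    W = np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)
    return W @ x

def energy(x):
    """E(x) = sum of |x_t|^2."""
    return float(np.sum(np.abs(x) ** 2))

# Two arbitrary sample sequences (illustrative values only).
x = np.array([9.0, 7.0, 3.0, 5.0])
y = np.array([8.0, 6.0, 4.0, 5.0])
X, Y = dft(x), dft(y)

# Energy and L2 distance before and after the transform.
print(energy(x), energy(X))
print(np.linalg.norm(x - y), np.linalg.norm(X - Y))
```

Each printed pair should agree up to floating-point rounding. `np.fft.fft(x, norm="ortho")` computes the same scaled transform in O(n log n).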
Building Index
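The slide body is not preserved here, so the following is only a sketch of one common way to build such an index, under the assumption that each sequence is reduced to its first k orthonormal DFT coefficients. Because dropping coefficients can only shrink the L2 distance, filtering in feature space produces no false dismissals; false alarms are removed by checking the raw data. The function names, the default k = 3, and the linear scan (standing in for a spatial index) are illustrative assumptions.

```python
import numpy as np

def dft_features(x, k):
    """First k coefficients of the orthonormal DFT of x."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]

def range_query(db, q, eps, k=3):
    """Indices of sequences in db within L2 distance eps of q."""
    q = np.asarray(q, dtype=float)
    fq = dft_features(q, k)
    hits = []
    for i, s in enumerate(db):
        s = np.asarray(s, dtype=float)
        # Filter step: feature distance is a lower bound on the true distance.
        if np.linalg.norm(dft_features(s, k) - fq) <= eps:
            # Verification step: discard false alarms using the raw data.
            if np.linalg.norm(s - q) <= eps:
                hits.append(i)
    return hits
```

In a real system the filter step would be a range query against a multidimensional index over the k-coefficient feature vectors rather than a loop.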
Can we do better?
Where
$$\psi_i^j(x) = \psi(2^j x - i), \quad i = 0, \ldots, 2^j - 1$$
$$\psi(t) = \begin{cases} 1 & 0 < t < 0.5 \\ -1 & 0.5 < t < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Average      Coefficients
(9 7 3 5)
(8 4)        (1 -1)
(6)          (2)
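The averages and coefficients in this example come from repeated pairwise averaging and differencing. A minimal sketch, assuming the input length is a power of two; the function name `haar_decompose` is made up for illustration, and this unnormalized form would need each level rescaled by a power of sqrt(2) to be orthonormal.

```python
def haar_decompose(values):
    """Return (overall average, detail coefficients ordered coarse to fine)."""
    averages = list(values)
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Half the difference of each pair is the detail coefficient;
        # coarser levels are placed in front of finer ones.
        details = [(a - b) / 2 for a, b in pairs] + details
        # The mean of each pair is the next, lower-resolution average.
        averages = [(a + b) / 2 for a, b in pairs]
    return averages[0], details

print(haar_decompose([9, 7, 3, 5]))  # (6.0, [2.0, 1.0, -1.0])
```

The result matches the example: overall average 6, then coefficient 2 at the coarsest level and (1 -1) at the next.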
                                           Haar wavelet                    DFT
Preserve L2 distance                       Yes                             Yes
Feature                                    Can capture localized feature   Only global feature
Computation time                           O(n)                            O(n log n)
Energy concentration for first few params  Low resolution                  Low frequency
Performance comparison
Query:
Data:
Naive method
ST-index
Prefix search
Simply use a prefix of the query to do the search
Multipiece search
If |q| >= k, split q into k pieces, search the DB for each piece with tolerance ε/√k, and join the results.
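The reduced tolerance in multipiece search comes from a counting argument: if the whole query is within ε of a subsequence, the squared distances of the k pieces sum to at most ε², so at least one piece must be within ε/√k of its aligned part. A minimal sketch of that idea, assuming equal-length pieces and using a linear scan where a real system would query the index; the function name and parameters are illustrative.

```python
import numpy as np

def multipiece_search(data, q, k, eps):
    """Start offsets s with ||data[s:s+len(q)] - q||_2 <= eps."""
    data = np.asarray(data, dtype=float)
    q = np.asarray(q, dtype=float)
    w = len(q) // k
    assert w * k == len(q), "sketch assumes len(q) is divisible by k"
    radius = eps / np.sqrt(k)
    last_start = len(data) - len(q)

    candidates = set()
    for i in range(k):
        piece = q[i * w:(i + 1) * w]
        # Linear scan stands in for a range query against the index.
        for s in range(last_start + 1):
            pos = s + i * w  # where piece i sits if the match starts at s
            if np.linalg.norm(data[pos:pos + w] - piece) <= radius:
                candidates.add(s)

    # Join the per-piece results, then verify against the full query.
    return sorted(s for s in candidates
                  if np.linalg.norm(data[s:s + len(q)] - q) <= eps)
```

The union of per-piece candidates can contain false alarms but no false dismissals, which is why the final verification pass is enough to give exact answers.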
Multi-resolution index
Figure taken from: Optimizing similarity search for arbitrary length time series queries
(also figures on three slides)
Improved algorithm
[Figure labels: q2, q3; 16, 32, 128]
Summary
Related papers
Orthonormal transform
From: http://www.math.iitb.ac.in/~suneel/final_report/node15.html