The VLDB Journal manuscript No.
(will be inserted by the editor)
Time Series Indexing By Dynamic Covering with
Cross-Range Constraints
Tao Sun · Hongbo Liu · Seán McLoone · Shaoxiong Ji · Xindong Wu
Received: date / Accepted: date
Abstract Time series indexing plays an important role in querying and pattern mining of big data. This paper proposes a novel structure for tightly covering a given set of time series under the dynamic time warping similarity measurement. The structure, referred to as Dynamic Covering with cross-Range Constraints (DCRC), enables more efficient and scalable indexing to be developed than current hypercube based partitioning approaches. In particular, a lower bound of the DTW distance from a given query time series to a DCRC-based cover set is introduced. By virtue of its tightness, which is proven theoretically, the lower bound can be used for pruning when querying on an indexing tree. If the DCRC based Lower Bound (LB_DCRC) of an upper node in an index tree is larger than a given threshold, all child nodes can be pruned, yielding a significant reduction in computational time. A Hierarchical DCRC (HDCRC) structure is proposed to generate the DCRC-tree based indexing and used to develop time series indexing and insertion algorithms. Experimental results for a selection of benchmark time series datasets are presented to illustrate the tightness of LB_DCRC, as well as the pruning efficiency on the DCRC-tree, especially when the time series have large deformations.

Keywords Time Series · Dynamic Time Warping · Indexing · R-Tree · Dynamic Covering · Cross-Range Constraints

T. Sun
School of Innovation and Entrepreneurship, Dalian University of Technology, Dalian 116023, China
E-mail: [email protected]

H. Liu (corresponding author)
Institute of Cognitive Information Technology, Dalian Maritime University, Dalian 116026, China
E-mail: [email protected]

S. McLoone
School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Northern Ireland, BT9 5AH, UK
E-mail: [email protected]

S. Ji
Department of Computer Science, Aalto University, Espoo 02150, Finland
E-mail: shaoxiong.ji@aalto.fi

X. Wu
School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, LA 70504-3694, USA
E-mail: [email protected]

1 Introduction

With the dramatic growth in the volume of data, and the opportunities for data driven decision making afforded by such data, particularly when it comes to social networks and e-commerce [18,40], it is vital to have algorithms that are able to efficiently mine big data [2,36]. In many practical applications, mining of data that is in the form of time series [5,10] is of interest, and this has led to the development of bespoke approaches for tasks such as pattern discovery and clustering [37,21,9], classification [7,20], rule discovery [30,34], and summarisation [13]. As with standard data mining, indexing is a fundamental technique for efficiently accessing and querying data when performing these tasks [6,4]. However, when indexing time series data the choice of similarity measurement is a key consideration [23], particularly when the series are not aligned temporally. In these circumstances, the classical Euclidean distance, as introduced in [1], can result in large differences between two time series even when they are quite similar in shape [14]. Consequently, dynamic time warping (DTW), which addresses this deficiency, has become a popular method of measuring the similarity between time series [25,22,35,24].
When indexing big time series datasets, performing a direct linear scan of all the time series is generally computationally intractable and a more considered approach is needed. This usually involves mapping the data to a tree-like structure with partitions, and then extracting a small number of time series from these partitions for linear scanning [26,39]. A partition is defined as a low-complexity structure covering a set of relatively similar time series. For a given query time series, a lower bound with respect to each partition can then be employed during indexing instead of directly measuring the similarity between the query time series and each element of the partitions. Using this approach, efficient pruning procedures can be implemented, substantially reducing the computational complexity of indexing and enabling fast data access and querying [14]. The speed-ups achievable using time series partitioning very much depend on how the partitions are defined, the approach used to generate tree-like indexing using these partitions, and the complexity of the lower bound calculation; hence improving on each of these remains an important area of research, and is the focus of this paper.

In the classical methods [14,39,15], when computing the lower bound of DTW from a query time series q to a set S of time series, the range [L_i, U_i] is computed for each dimension i. The set of dimensional ranges [L_i, U_i], i = 1, ..., m, defines a hyper rectangular area, denoted by C, which can serve as a partition in an indexing structure. In fact, the lower bound of DTW is exactly the Hausdorff distance from q to C. However, a partition represented by a hyper rectangle is often not optimal in terms of DTW distance. With the deformation of the time axis during DTW matching, the volume of partition C can be so large that C might still include quite dissimilar time series, even if the elements of S are similar, which results in inefficient indexing.

As an alternative to hyper rectangles, we propose the use of Dynamic Covering with Cross-Range Constraints (DCRC) to partition time series for indexing. For a given set S, let an approximately central element in terms of the DTW distance be the "reference" time series, denoted c. DCRC is defined as a series of sets V1, V2, ..., Vm. Each element of set Vi is a 3-tuple (l, u, p), where p is a dimensional subscript of reference c, and [l, u] denotes a dimensional range. A tuple (v1, v2, ...) over the Cartesian product V1 × V2 × ... corresponds to an m-dimensional hyper rectangle. We only consider tuples satisfying the "Alignment", "Continuity" and "Monotonicity" conditions, called the ACM-relationship. These tuples under the ACM-relationship correspond to multiple m-dimensional hyper rectangles.

In contrast to the classic method, DCRC proposes a "tight" structure composed of multiple hyper rectangles. For a given set S of similar time series in terms of DTW distance, any element of the corresponding DCRC must be similar to the elements of S. The tightness makes it possible to efficiently prune unnecessary samples when partitioning for DTW indexing.

We determine the lower bound of the DTW between a given query time series and the cover set of a given DCRC structure, denoted as LB_DCRC, and then introduce the hierarchical DCRC (HDCRC) structure. This is composed of multiple layers, with the upper DCRC structure covering all the elements covered by the DCRC structures of its sub-layers. Based on the DCRC and HDCRC structures, we further present a novel tree-like indexing and its insertion and node splitting algorithms. Given a time series set S and a query time series q, from the root down to its sub-layers in the indexing tree, if the LB_DCRC (DCRC based Lower Bound of DTW) of an upper layer is larger than a given acceptable range query tolerance, then all of its sub-layers are accordingly pruned, with the result that only a few remaining leaves on the indexing tree need to be sequentially scanned using the DTW distance. This leads to significant reductions in computational time.

In summary, the novel contributions of the paper are as follows:

(a) We develop the theory of DCRC-based covering of a given set of time series, and prove that a DCRC-based covering has significantly lower volume than other methods, that is, if all the elements are similar to the reference c, any element of the corresponding DCRC-based cover set is also similar to c.
(b) The corresponding lower bound of the DTW between a given query time series and a given time series set, namely LB_DCRC, is proposed. This bound outperforms other lower bounds in terms of tightness.
(c) Since the number of feasible ACM-relationships for a given DCRC usually grows exponentially, we propose a novel polynomial time algorithm to compute the lower bound of the DTW between a given query time series and the cover set of a given DCRC structure.
(d) We then present the hierarchical DCRC (HDCRC) structure, HDCRC-based tree indexing and its insertion and node splitting algorithms, and demonstrate with extensive numerical studies that the proposed DCRC based indexing method performs efficient pruning for range querying, and outperforms linear scanning and other indexing methods in terms of computational time.
The remainder of the paper is organized as follows. Related work is reviewed in section 2. The key DCRC concepts and algorithms are introduced in section 3. Then the HDCRC structure and the indexing approach based on the DCRC-tree are developed in section 4. The relevant theorems on DCRC and HDCRC are presented in section 5. Using benchmark datasets from the UCR Time Series Classification Archive, experimental results are provided in section 6 to demonstrate the efficiency of our approaches. Finally, conclusions are provided in section 7.

2 Related Work

DTW is a more robust measure of the similarity between two time series than the Euclidean distance as it takes account of time axis shifting between time series. Generally, the warping path of DTW is defined by a number of global and/or local constraints. Two of the most popular global constraints are the Itakura parallelogram [12] and the Sakoe-Chiba band [28]. In contrast to the traditional form of DTW, this paper adopts the form DTW_p [16,32] to denote the Lp norm of the monotonic DTW distance (p = 2).

Despite its limitation with respect to scalability to high dimensional data sets, in recent years DTW has been widely applied, particularly for high-dimensional data indexing [33] and stream matching [19,11]. However, since DTW does not obey the triangle inequality, and therefore is not suitable for indexing with a metric access method, researchers have switched their attention to developing indexing approaches that work with suitably defined DTW lower bounds, rather than DTW itself. In recent years, much research has focused on the DTW lower bound.

The idea of using a lower bound function was first proposed by Yi et al. [38]. In their lower bound, denoted as LB_Yi, the maximum and minimum elements of a sequence are used to represent the sequence.

Keogh et al. proposed a lower bound function (denoted as LB_Keogh) [14], together with an exact indexing method based on their lower bound function. For two given time series x and y, let Y be a range series, each entry Y_i of which denotes the i-th envelope, i.e. the range between the minimum and the maximum of the warping window with center y_i. In fact, LB_Keogh corresponds to the Hausdorff distance from x to Y. Lemire proposed the LB_IMPROVED lower bound [16], which imports an additional time series x′ from x and Y, and the lower bound is represented by LB_Keogh(x, y) + LB_Keogh(y, x′).

Based on the common features of LB_Kim, LB_Yi and LB_Keogh, Zhou and Wong [39] proposed several boundary-based lower bound functions, including a non-elaborate version (denoted as LB_Corner) and an elaborate version (denoted as LB_ECorner). Li and Yang [17] proposed two extensions of LB_Kim and LB_Keogh (denoted respectively as LB_NKim and LB_NKeogh). In 2018 Shen et al. proposed a new lower bound (LB_NEW) [29]. In contrast to LB_Keogh, LB_NEW defines Y_i as all the elements of the warping window with center y_i, instead of the i-th envelope Y_i in LB_Keogh. Therefore, LB_NEW is usually tighter than LB_Keogh. Tan et al. [32] proposed the LB_ENHANCED lower bound. In this algorithm, Y_i is represented by left bands or right bands, assuring a relatively tight lower bound.

In the traditional time series indexing methods [14], the dataset S of sample time series is stored in an R-tree like structure, each tree node of which corresponds to a minimal boundary rectangle (MBR) containing a subset of S. Given a query time series q, retrieving the subset {s ∈ S | DTW(q, s) ≤ ε} involves two steps:

(1) Search the nodes based on the lower bound between q and the MBR in a top-down approach.
(2) All the feasible time series are linearly scanned using an efficient method [27].

3 Dynamic Covering with Cross-Range Constraints (DCRC)

3.1 DTW

Given a time series x represented by [x1, x2, ..., xn], let x(i) denote the i-th entry of x, x_i, and let x(i1 : i2) denote the subsequence [x_{i1}, x_{i1+1}, ..., x_{i2}]. Here, n is the length of the time series, also referred to as its "dimension".

DTW measures the similarity between two time series [31]. For two given time series x = [x1, x2, ..., xm] and y = [y1, y2, ..., yn], let W denote a warping path from x to y. Let (i_k, j_k) be the k-th element of W and K be the length of W (1 ≤ k ≤ K). The warping path in DTW is required to satisfy a set of constraints, referred to as alignment, continuity and monotonicity constraints. These are defined as follows:

(a) (i_1, j_1) = (1, 1) and (i_K, j_K) = (m, n);
(b) i_{k+1} − i_k ≤ 1 and j_{k+1} − j_k ≤ 1, k = 1, 2, ..., K − 1;
(c) i_{k+1} − i_k ≥ 0 and j_{k+1} − j_k ≥ 0, k = 1, 2, ..., K − 1.
The ratio of the width of the Sakoe-Chiba Band to the length of the time series, denoted by λ (0 < λ ≤ 1), imposes an additional constraint which is defined as follows:

(d) |(n/m)·i_k − j_k| ≤ λn, k = 1, 2, ..., K.

The DTW path distance is obtained subject to these constraints by solving the dynamic programming problem given in Equ. (1), where δ(i, j) = (x_i − y_j)^2, √µ(i, j) represents the DTW distance between x(1 : i) and y(1 : j), and DTW(x, y) = √µ(m, n).

µ(i, j) = min{ δ(i, j) + µ(i−1, j−1), δ(i, j) + µ(i−1, j), δ(i, j) + µ(i, j−1) }   (1)
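For concreteness, the recursion in Equ. (1), combined with the band constraint (d), maps directly onto a small dynamic program. The following C++ sketch is our own illustration (function name, container choices and the equal treatment of both series are our assumptions); the squared costs with a final square root follow the definitions above.

#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Constrained DTW following Equ. (1) with the band of constraint (d).
// x has length m, y has length n; lambda is the band ratio (0 < lambda <= 1).
double dtw(const std::vector<double>& x, const std::vector<double>& y, double lambda) {
    const int m = (int)x.size(), n = (int)y.size();
    const double INF = std::numeric_limits<double>::infinity();
    // mu[i][j] accumulates squared costs for x(1:i) and y(1:j); mu(0,0) = 0.
    std::vector<std::vector<double>> mu(m + 1, std::vector<double>(n + 1, INF));
    mu[0][0] = 0.0;
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            // Band constraint (d): |(n/m)*i - j| <= lambda * n.
            if (std::fabs((double)n / m * i - j) > lambda * n) continue;
            double delta = (x[i - 1] - y[j - 1]) * (x[i - 1] - y[j - 1]);
            double best = std::min({mu[i - 1][j - 1], mu[i - 1][j], mu[i][j - 1]});
            if (best < INF) mu[i][j] = delta + best;
        }
    }
    return std::sqrt(mu[m][n]);  // DTW(x, y) = sqrt(mu(m, n))
}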
3.2 ACM-Relationship

Definition 1 (ACM-Relationship) Consider the Cartesian product P1 × P2 × ... × Pm, where Pi = {1, 2, ..., n} for i = 1, 2, ..., m. Let R(m, n) denote the relationship on the Cartesian product, each element r[r1, r2, ..., rm] of which satisfies the Alignment, Continuity and Monotonicity (ACM) conditions as follows.

(a) Alignment. r1 = 1, rm = n;
(b) Continuity. r_{i+1} − r_i ≤ 1 for i = 1, 2, ..., m − 1;
(c) Monotonicity. r_{i+1} − r_i ≥ 0 for i = 1, 2, ..., m − 1.

Given a time series x[x1, x2, ..., xn] of length n, and a relationship r[r1, r2, ..., rm] ∈ R(m, n), let

τ(x, r) = [x_{r1}, x_{r2}, ..., x_{rm}]   (2)

Given a time series x[x1, x2, ..., xm] of length m, and a time series y[y1, y2, ..., yn] of length n, let

R(x, y) = argmin_{r ∈ R(m,n)} ||x, τ(y, r)||
D(x, y) = min_{r ∈ R(m,n)} ||x, τ(y, r)||   (3)

In Equ. (3), r is an ACM-relationship, τ(y, r) is a time series of length m while |y| = n < m, and ||x, τ(y, r)|| is the Euclidean distance between the two m-length time series x and τ(y, r). D(x, y) is the minimum Euclidean distance with respect to relationship r, and R(x, y) is the corresponding value of r.

Fig. 1 shows examples of the ACM-relationship. In each sub-figure of Fig. 1, 5 columns correspond to the 5 sets P1, P2, ..., P5, and the black dots correspond to the elements of Pi. The black dots on the black path represent elements of the Cartesian product P1 × P2 × ... × P5. The two series represented by Figs. 1(a) and 1(b) satisfy the ACM-relationships. However, the two series in Figs. 1(c) and 1(d) do not satisfy the ACM-relationships.

Fig. 1 Examples of legal and illegal ACM-relationships: (a) a legal ACM-relationship; (b) a legal ACM-relationship; (c) an illegal ACM-relationship as it violates "Monotonicity"; (d) an illegal ACM-relationship as it violates "Continuity"

3.3 Approximate Subsequence

Let A(i1 : i2) denote the mean of the entries of x(i1 : i2) and let E(i1 : i2) denote the sum of squares of deviations from the mean of the entries of x(i1 : i2), as defined in Equ. (4).

A(i1 : i2) = (Σ_{j=i1}^{i2} x_j) / (i2 − i1 + 1),   E(i1 : i2) = Σ_{j=i1}^{i2} (x_j − A(i1 : i2))^2   (4)
Algorithm 1 Minimization for ACM-Relationship
Input: A given time series x[x1, x2, ..., xm] of length m, and a time series y[y1, y2, ..., yn] of length n (n < m).
Output: r[r1, r2, ..., rm] = R(x, y) and d = D(x, y).
1: Let µ_00 = 0, let µ_i0 = ∞ for i = 1, 2, ..., m, and let µ_0j = ∞ for j = 1, 2, ..., n;
2: for i = 1 to m, j = 1 to n do
3:   Let p = argmin_{q ∈ {j−1, j}} µ(i − 1, q);
4:   Let r_{i−1} = p;
5:   Let µ_{ij} = δ(i, j) + µ_{i−1,p};
6: end for
7: Let r_m = n;
8: return r = [r1, r2, ..., rm], and d = √µ_{mn};

Definition 2 (Approximate Subsequence) For a given m-length time series x and a given integer n (0 < n < m), the n-length Approximate Subsequence of x, denoted by AS(x, n), is defined as

AS(x, n) = argmin_{|y|=n} D(x, y)   (5)

From Definition 2, the approximate subsequence of x is the approximate time series of x. The optimal solution to Equ. (5), and hence AS(x, n), is obtained by solving the dynamic program:

ν(i, j) = min_k (ν(k − 1, j − 1) + E(k : i))   (6)

where k ∈ {j, j + 1, ..., i} and ν(i, j) = D^2(x(1 : i), AS(x(1 : i), j)). The procedure for computing AS(x, n) is given in Algorithm 2.

Algorithm 2 Approximate subsequence for a given time series
Input: m-length time series x.
Output: n-length approximate subsequence.
1: Initialise an n-length time series y;
2: Let i = m;
3: for j = n down to 1 do
4:   Let p = i;
5:   Let i = argmin_k (ν(k − 1, j − 1) + E(k : i));
6:   Let y(j) = A(i : p);
7:   Let i = i − 1;
8: end for
9: return y
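As an illustration of the minimisation behind Algorithm 1, the following C++ sketch computes D(x, y) and the corresponding ACM-relationship r = R(x, y) with the same kind of dynamic program; the naming, the explicit backtracking step and the assumption n ≤ m are our own, not part of the paper.

#include <algorithm>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Find r (r1 = 1, rm = n, 0 <= r_{i+1} - r_i <= 1) minimising ||x - tau(y, r)||,
// returning both D(x, y) and r. Assumes y is shorter than or equal to x.
std::pair<double, std::vector<int>> acm_min(const std::vector<double>& x,
                                            const std::vector<double>& y) {
    const int m = (int)x.size(), n = (int)y.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> mu(m + 1, std::vector<double>(n + 1, INF));
    mu[0][0] = 0.0;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= std::min(i, n); ++j) {
            double best = std::min(mu[i - 1][j], mu[i - 1][j - 1]);
            if (best < INF)
                mu[i][j] = best + (x[i - 1] - y[j - 1]) * (x[i - 1] - y[j - 1]);
        }
    // Backtrack r from (m, n): each step either keeps j or decrements it by one.
    std::vector<int> r(m);
    for (int i = m, j = n; i >= 1; --i) {
        r[i - 1] = j;
        if (i > 1 && mu[i - 1][j - 1] <= mu[i - 1][j]) --j;
    }
    return {std::sqrt(mu[m][n]), r};
}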
3.4 Covering Set

Consider a given set of m-length time series S = {s1, s2, ..., s_{|S|}}, where s_i consists of [s_{i1}, s_{i2}, ..., s_{im}]. In this section, we focus on defining a structure that can tightly cover set S using the DTW distance.

Given a positive integer n (n < m), we define a new structure V to store different dimensional ranges. Assume V = [V1, V2, ..., Vm], where each element v of Vi is represented by v(p, l, u). The component p ∈ {1, 2, ..., n} denotes a dimensional subscript, and [l, u] denotes an interval on the real line of the p-th dimension. We stipulate that for any given v1, v2 ∈ Vi, v1.p = v2.p if and only if v1 = v2.

Fig. 2 shows an example of the structure. As shown in Fig. 2(a), the structure is composed of 5 sets V1, V2, ..., V5, with each set containing a number of 3-tuples. Take V1 for example in Fig. 2(b). There are two rectangles representing the two 3-tuples. The upper edge and the lower edge of each rectangle denote the range [v.l, v.u], and the number in the rectangle denotes a subscript of the reference time series.

Fig. 2 Illustration of a DCRC structure: (a) A DCRC structure with 5 tuple sets; (b) A tuple v(p, l, u) in set V1

For the sake of convenience, we introduce the following notation.

V.n = max{v.p | v ∈ Vm}
V.P_i = {v.p | v ∈ V_i}
V.v_i^j = v(p, l, u) s.t. (v ∈ V_i ∧ v.p = j)   (7)
V.L_i^j = V.v_i^j.l
V.U_i^j = V.v_i^j.u

For a given r ∈ R(m, n), let Rect_r(V, r), as defined in Equ. (8), be an m-dimensional hyper rectangular range.

Rect_r(V, r) = { [x1, ..., xm] | x_i ∈ [L_i^{r_i}, U_i^{r_i}] }   (8)

Fig. 3 illustrates the set in a hyper dimensional rectangle defined in Equ. (8). The first row denotes a matching path of DTW, the second row illustrates a DCRC structure, and the third row illustrates a 5-dimensional hyper rectangle. The lower and upper edges of each rectangle denote the corresponding range of each dimension. Take the third column for example. The value of the first row is 2, and then in the second row the rectangle with label 2 is selected as the range corresponding to the third row.

Fig. 3 Illustration of hyper rectangle Rect_r for 5-dimensional DCRC V

Rect_r(V, r) corresponds to an m-dimensional cube for a given tuple r, which covers a set of time series. In fact, not all tuples are permitted; a "legal" tuple r must obey the so-called ACM-relationships.

volume(V) = Π_{j=1}^{n} ( max{ v.u − v.l | v.p = j ∧ v ∈ V_i ∈ V } )   (9)

The structure V stores different dimensional ranges from the given set of time series, from which we can dynamically obtain a "legal" and "tight" cover of the given set. The "Cover" function is defined by Equ. (10).

Cover(V) = { x | x ∈ Rect_r(V, r), r ∈ R(m, n) }   (10)

where Cover is a dynamic combination of Rect_r(V, r), with r subject to the ACM-relationship. Hence, we refer to the covering structure as the "Dynamic Covering with Cross-Range Constraints" (DCRC for short).
3.5 DCRC of Time Series

In this section, a feasible and optimal algorithm for computing the DCRC for a given general set of time series is proposed. The required steps are set out in detail in Algorithm 3.

For a given set S of similar time series, the number of possible values that can be assigned to a DCRC structure grows exponentially with the dimension. In our method, the creation of the DCRC depends on a so-called reference time series c, which is understood to be a lower dimensional contour of all the samples of S. The ACM-relationship r_t is just a many-to-one function from s_t to c. In fact, the greater the similarity between the reference and the samples, the tighter the DCRC structure. The relevant theory is established in Theorem 3.

At Line 1, c is the reference time series for set S. To simplify the computation, c is assigned to the n-length approximate subsequence of an s_k randomly selected from set S. At Line 4, r_t is an ACM-relationship computed by Algorithm 1. For each dimension i, the tuple set V_i of V is created or updated by the steps at Lines 5-13.

Table 1 illustrates a sample DCRC structure building procedure. r_t corresponds to the matching from s_t to c satisfying Equ. (3). X_i is the set of matchings (r_{ti}, s_{ti}). Y_i represents the merged set {(r, G_r)} of X_i such that r ∈ {r_{ti}} and G_r = {s | (r, s) ∈ X_i}, and V_i denotes the i-th entry of the DCRC structure.

Algorithm 3 DCRC Structure for a Given Set of Time Series
Input: A given reference time series c of n-length;
Input: A given set of m-length time series S = {s1, s2, ..., sT}, with each element, s_t, represented by s_t = [s_{t1}, s_{t2}, ..., s_{tm}], where t = 1, 2, ..., T.
Output: DCRC structure V.
1: If c = nil, let c = AS(s_k, n) (n < m) by Algorithm 2;
2: Initialise series V = [{}, {}, ..., {}] of m-length;
3: for t = 1 to T do
4:   Let r_t = R(s_t, c) by Algorithm 1;
5:   for i = 1 to m do
6:     if (r_{ti} ∈ V.P_i) then
7:       Let v = V.v_i^{r_{ti}};
8:       Let v.l = min(s_{ti}, v.l);
9:       Let v.u = max(s_{ti}, v.u);
10:    else
11:      Let V_i = V_i ∪ {(r_{ti}, s_{ti}, s_{ti})};
12:    end if
13:  end for
14: end for
15: return V
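To make the construction in Algorithm 3 concrete, the following C++ sketch stores each V_i as a map from the reference subscript p to the interval [l, u], updates the structure per sample, and evaluates the volume of Equ. (9). The container choices and helper names are our own assumptions, not code from the paper; the relationship r = R(s, c) is computed with the same dynamic program as Algorithm 1.

#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// A DCRC structure: V[i] maps a reference subscript p to the interval [l, u].
struct DCRC {
    int n = 0;                                                        // length of the reference c
    std::vector<std::map<int, std::pair<double, double>>> V;          // one map per dimension
};

// ACM-relationship r = R(s, c), as in Algorithm 1 (dynamic program + backtracking).
static std::vector<int> acm_relationship(const std::vector<double>& s,
                                         const std::vector<double>& c) {
    const int m = (int)s.size(), n = (int)c.size();
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> mu(m + 1, std::vector<double>(n + 1, INF));
    mu[0][0] = 0.0;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= std::min(i, n); ++j) {
            double best = std::min(mu[i - 1][j], mu[i - 1][j - 1]);
            if (best < INF) mu[i][j] = best + (s[i - 1] - c[j - 1]) * (s[i - 1] - c[j - 1]);
        }
    std::vector<int> r(m);
    for (int i = m, j = n; i >= 1; --i) {
        r[i - 1] = j;
        if (i > 1 && mu[i - 1][j - 1] <= mu[i - 1][j]) --j;
    }
    return r;
}

// Insert one sample s (Algorithm 3, Lines 4-13): widen the interval keyed by r_i,
// or create a degenerate interval [s_i, s_i] when the subscript is new.
void dcrc_insert(DCRC& D, const std::vector<double>& c, const std::vector<double>& s) {
    if (D.V.empty()) { D.V.resize(s.size()); D.n = (int)c.size(); }
    std::vector<int> r = acm_relationship(s, c);
    for (size_t i = 0; i < s.size(); ++i) {
        auto it = D.V[i].find(r[i]);
        if (it == D.V[i].end()) D.V[i][r[i]] = {s[i], s[i]};
        else {
            it->second.first  = std::min(it->second.first,  s[i]);
            it->second.second = std::max(it->second.second, s[i]);
        }
    }
}

// volume(V) of Equ. (9): product over reference subscripts of the widest interval.
double dcrc_volume(const DCRC& D) {
    std::vector<double> widest(D.n + 1, 0.0);
    for (const auto& Vi : D.V)
        for (const auto& kv : Vi)
            widest[kv.first] = std::max(widest[kv.first], kv.second.second - kv.second.first);
    double vol = 1.0;
    for (int j = 1; j <= D.n; ++j) vol *= widest[j];
    return vol;
}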
Table 1 Example of computing a DCRC structure from three 5-length time series and a 3-length reference time series

Dimension i | 1 | 2 | 3 | 4 | 5
c | 4.00 | 6.00 | 5.00 | |
s1 | 4.11 | 4.12 | 4.13 | 6.14 | 5.15
s2 | 4.21 | 6.22 | 6.23 | 5.24 | 5.25
s3 | 4.31 | 6.32 | 6.33 | 6.34 | 5.35
r1 | 1 (s11→c1) | 1 (s12→c1) | 1 (s13→c1) | 2 (s14→c2) | 3 (s15→c3)
r2 | 1 (s21→c1) | 2 (s22→c2) | 2 (s23→c2) | 3 (s24→c3) | 3 (s25→c3)
r3 | 1 (s31→c1) | 2 (s32→c2) | 2 (s33→c2) | 2 (s34→c2) | 3 (s35→c3)
[Xi] | {(1,s11),(1,s21),(1,s31)} | {(1,s12),(2,s22),(2,s32)} | {(1,s13),(2,s23),(2,s33)} | {(2,s14),(3,s24),(2,s34)} | {(3,s15),(3,s25),(3,s35)}
[Yi] | {Y1^1{s11,s21,s31}} | {Y2^1{s12}, Y2^2{s22,s32}} | {Y3^1{s13}, Y3^2{s23,s33}} | {Y4^2{s14,s34}, Y4^3{s24}} | {Y5^3{s15,s25,s35}}
[Vi] | {(1,minY1^1,maxY1^1)} | {(1,minY2^1,maxY2^1),(2,minY2^2,maxY2^2)} | {(1,minY3^1,maxY3^1),(2,minY3^2,maxY3^2)} | {(2,minY4^2,maxY4^2),(3,minY4^3,maxY4^3)} | {(3,minY5^3,maxY5^3)}
[Vi] | {(1,4.11,4.31)} | {(1,4.12,4.12),(2,6.22,6.32)} | {(1,4.13,4.13),(2,6.23,6.33)} | {(2,6.14,6.34),(3,5.24,5.24)} | {(3,5.15,5.35)}

4 Time Series Indexing with DCRC

4.1 DCRC based DTW Lower Bound (LB_DCRC)

Given a set S of m-length time series and a DCRC structure V determined by Equ. (10), a lower bound of DTW from a given time series q to the elements of S can be defined as the minimal DTW distance from q to the elements of Cover(V), as defined in Equ. (11).

LB_DCRC(q, S) = min_{x ∈ Cover(V)} DTW(q, x)   (11)

The DCRC based lower bound of classic DTW, namely LB_DCRC, is summarized in Algorithm 4. Given the ratio of the width of the Sakoe-Chiba Band to the length of the time series, denoted by λ, the time complexity of the algorithm is O(λm²n).

Note that, in a given DCRC structure V, the number of feasible relationships grows with the power of m and n, i.e. is O(ϕ^{mn}), where ϕ is a positive constant. However, the computation of LB_DCRC does not directly enumerate all the relationships, and achieves polynomial complexity by using dynamic programming.

In Algorithm 4, √a_{ijk} represents the lower bound of DTW from the i-length time series q(1 : i) to the j-length DCRC V′(V1′, V2′, ..., Vj′), satisfying V_l′ = {v ∈ V_l | v.p ≤ k} for l = 1, 2, ..., j. Then a_{ijk} is computed by the recursive formula at Line 16.

We will prove that Algorithm 4 satisfies Equ. (11) by Theorem 4 in Sec. 5. In Fig. 4, the dotted line shows a solution for LB_DCRC. The computation of point (i, j, k) depends on the five points (i−1, j−1, k), (i−1, j−1, k−1), (i−1, j, k), (i, j−1, k) and (i, j−1, k−1). Let (i_1, j_1, k_1), (i_2, j_2, k_2), ..., (i_L, j_L, k_L) be an optimized path and g[g_1, g_2, ..., g_m] the optimized time series in Equ. (11). As j_p = j_q ⇒ k_p = k_q (p ≠ q), assume r = [r_1, r_2, ..., r_m] satisfies ∀p ∈ {1, 2, ..., m} ∃q (j_q = p ∧ k_q = r_p); then we have g_p ∈ [L_p^{r_p}, U_p^{r_p}] for p = 1, 2, ..., m. Furthermore, (i_1, j_1), (i_2, j_2), ..., (i_L, j_L) is exactly the DTW path between the query time series q and the optimal solution g.

Fig. 4 A feasible matching path from (0, 0, 0) to (i, j, k) for the computation of LB_DCRC

4.2 Hierarchical DCRC (HDCRC)

Consider a given series of sets S1, S2, ..., ST, where S_t (t = 1, 2, ..., T) is a set of m-length time series, and a given series of DCRC structures V1, V2, ..., VT, where V_t.n = n and S_t ⊆ Cover(V_t) for t = 1, 2, ..., T.

The problem is how to obtain a DCRC structure V satisfying ∪_{t=1}^{T} S_t ⊆ Cover(V) and V.n = n′ (n′ ≤ n) according to V1, V2, ..., VT only, and not the entire set of elements of S1, S2, ..., ST. The hierarchical structure is illustrated in Fig. 5. Algorithm 5 sets out the procedure for determining the DCRC structure.

Fig. 5 Illustration of Hierarchical DCRC with two layers

At Line 3, the reference time series c of length n′ is converted from the reference x1 of V1 by Algorithm 2. The components of V are built by the steps at Lines 9-19. For the i-th set in V, if j ∈ V_t.P_i, we have r_j ∈ V.P_i.
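A minimal C++ sketch of the merge step described above (Lines 9-19 of Algorithm 5) is given below; it assumes the relationship r = R(x_t, c) between the child reference and the parent reference has already been computed (e.g. with Algorithm 1), and it uses our own map-based representation of a DCRC rather than anything prescribed by the paper.

#include <algorithm>
#include <map>
#include <utility>
#include <vector>

// V[i] maps a reference subscript to an interval [l, u].
using Dcrc = std::vector<std::map<int, std::pair<double, double>>>;

// Merge a child DCRC into the parent: every child tuple keyed by subscript j of
// the child reference x_t is re-keyed to k = r_j and its interval is widened in.
void hdcrc_merge(Dcrc& parent, const Dcrc& child, const std::vector<int>& r) {
    if (parent.empty()) parent.resize(child.size());
    for (size_t i = 0; i < child.size(); ++i) {
        for (const auto& kv : child[i]) {
            int k = r[kv.first - 1];                 // child subscript j -> parent subscript k = r_j
            auto it = parent[i].find(k);
            if (it == parent[i].end()) parent[i][k] = kv.second;
            else {
                it->second.first  = std::min(it->second.first,  kv.second.first);
                it->second.second = std::max(it->second.second, kv.second.second);
            }
        }
    }
}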
Algorithm 4 DCRC based lower bound of DTW (LB_DCRC)
Input: Set S of m-length time series;
Input: DCRC structure V = [V1, V2, ..., Vm] satisfying S ⊂ Cover(V);
Input: Ratio λ (0 < λ ≤ 1) of band width to m; an m-length query time series q = [q1, q2, ..., qm].
Output: LB_DCRC(q, S).
1: Let A = [a_{ijk}] be an m×m×n-size array, with each a_{ijk} = +∞ initially;
2: Let B be an empty set;
3: for i = 1 to m, j = 1 to m do
4:   if |i − j| ≤ λm then
5:     for each k in V.P_j do
6:       B = B ∪ {(i, j, k)};
7:     end for
8:   end if
9: end for
10: for each (i, j, k) in B do
11:   Let η1 = α(i − 1, j − 1, k);
12:   Let η2 = α(i − 1, j − 1, k − 1);
13:   Let η3 = α(i − 1, j, k);
14:   Let η4 = α(i, j − 1, k);
15:   Let η5 = α(i, j − 1, k − 1);
16:   Let a_{ijk} = min(η1, η2, η3, η4, η5) + γ(i, j, k);
17: end for
18: return √a_{mmn}
19:
20: function α(i, j, k)
21:   if i = j = k = 0 then return 0;
22:   else if (i, j, k) ∈ B then return a_{ijk};
23:   else return +∞;
24:   end if
25: end function
26:
27: function γ(i, j, k)
28:   Let x = q_i;
29:   Let y0 = V.L_j^k;
30:   Let y1 = V.U_j^k;
31:   if x < y0 then return (y0 − x)^2;
32:   else if x > y1 then return (x − y1)^2;
33:   else return 0;
34:   end if
35: end function

Algorithm 5 Hierarchical DCRC
Input: Time series c of n′-length;
Input: Set (V1, V2, ..., VT) of m-length DCRC structures, where V_t.n = n (n′ ≤ n) for t = 1, 2, ..., T.
Output: A DCRC structure V satisfying ∪_{t=1}^{T} Cover(V_t) ⊆ Cover(V) and V.n = n′.
1: if c = nil then
2:   Let x1 be the reference time series of V1;
3:   Let c = AS(x1, n′) by Algorithm 2;
4: end if
5: Initialise V = [V1, V2, ..., Vm] such that V_i = ϕ for i = 1, 2, ..., m;
6: for t = 1 to T do
7:   Let x_t be the reference time series of V_t;
8:   Let r[r1, r2, ..., rn] = R(x_t, c);
9:   for i = 1 to m do
10:    for each j in V_t.P_i do
11:      Let k = r_j;
12:      if (k ∈ V.P_i) then
13:        Let V.L_i^k = min(V.L_i^k, V_t.L_i^j);
14:        Let V.U_i^k = max(V.U_i^k, V_t.U_i^j);
15:      else
16:        Let V_i = V_i ∪ {(k, V_t.L_i^j, V_t.U_i^j)};
17:      end if
18:    end for
19:  end for
20: end for
21: return V
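The dynamic program of Algorithm 4 can be sketched in C++ as follows. The state array a[i][j][k], the band test and the γ term mirror the listing above, while the container types and names are our own assumptions (and the O(m²n) memory used here could be reduced in a tuned implementation).

#include <algorithm>
#include <cmath>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// V[j] maps a reference subscript k to the interval [l, u] of tuple v(k, l, u);
// V must hold one map per covering dimension, i.e. m entries in total.
using Dcrc = std::vector<std::map<int, std::pair<double, double>>>;

double lb_dcrc(const std::vector<double>& q, const Dcrc& V, int n, double lambda) {
    const int m = (int)q.size();
    const double INF = std::numeric_limits<double>::infinity();
    // a[i][j][k]; a[0][0][0] = 0 plays the role of alpha(0, 0, 0).
    std::vector<std::vector<std::vector<double>>> a(
        m + 1, std::vector<std::vector<double>>(m + 1, std::vector<double>(n + 1, INF)));
    a[0][0][0] = 0.0;
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= m; ++j) {
            if ((i > j ? i - j : j - i) > lambda * m) continue;   // band |i - j| <= lambda*m
            for (const auto& kv : V[j - 1]) {                     // k runs over V.P_j
                int k = kv.first;
                double lo = kv.second.first, hi = kv.second.second;
                double x = q[i - 1], gamma = 0.0;                 // gamma(i, j, k) of Algorithm 4
                if (x < lo) gamma = (lo - x) * (lo - x);
                else if (x > hi) gamma = (x - hi) * (x - hi);
                double best = std::min({a[i - 1][j - 1][k], a[i - 1][j - 1][k - 1],
                                        a[i - 1][j][k], a[i][j - 1][k], a[i][j - 1][k - 1]});
                if (best < INF) a[i][j][k] = best + gamma;
            }
        }
    return std::sqrt(a[m][m][n]);    // Line 18: sqrt(a_mmn)
}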
The implementation of insert node( N , N ′ ) is as
follows:
4.3 DCRC-Tree and Relevant Functions
(a) If N .c = nil, then let N .c = AS(N ′ .c, |N .c|);
Based on the HDCRC structure, an R-tree [8] like in- (b) Let N .Children = N .Children ∪ {N ′ };
dexing tree, named DCRC-tree, is proposed for efficient (c) Let N .V = update hdcrc(N .V, N .c, N ′ .V);
querying. Each node in a DCRC-tree corresponds to a (d) Let N ′ .Parent = N .
DCRC structure V (See Sec. 3.5), rather than a minimal
boundary rectangle (MBR) as used in R-trees. When 4.4 Node Splitting and Insertion in a DCRC-Tree
searching a time series from the DCRC-tree, we still
adopt the classic DTW (with global constraints). Motivated by the idea of node splitting in R-trees, we
A tree node of the DCRC-tree is represented by tu- develop a node splitting algorithm for DCRC-trees. Let
ple N (d, V, c, Parent, Children, Series), where the M be the maximal number of child nodes (not including
components are as defined in Table 2. The relevant ba- leaves) of each tree node. There are two cases of node
sic operators of the DCRC-Tree are given in Table 3. splitting.
The implementation of function create dcrc(c, S) u- The first case is when node N is a leaf node satis-
tilizes Algorithm 3 with c, S as input parameters. The fying |N .Series| = M , then it is split into nodes N1
4.4 Node Splitting and Insertion in a DCRC-Tree

Motivated by the idea of node splitting in R-trees, we develop a node splitting algorithm for DCRC-trees. Let M be the maximal number of child nodes (not including leaves) of each tree node. There are two cases of node splitting.

The first case is when node N is a leaf node satisfying |N.Series| = M; it is then split into nodes N1 and N2, with both N1.Series and N2.Series containing M/2 time series. Algorithm 6 details the node splitting algorithm.

The second case is when N is a non-leaf node satisfying |N.Children| = M; it is then split into nodes N1 and N2, such that N1.Children and N2.Children respectively contain M/2 tree nodes. The corresponding node splitting algorithm for the set of tree nodes is similar to Algorithm 6.

Algorithm 7 summarizes the steps for inserting a time series into a given DCRC-tree. These are similar to the steps used with R-trees. From the root, the child node with the minimal increase in volume is selected recursively, until the current node is a leaf. Then, the time series is inserted into the leaf node, and, from bottom to top, the parent node is split if the number of its children exceeds a pre-given maximal limit and the depth of the tree is less than a pre-given maximal limit. Therefore the leaf nodes might contain a huge number of time series, which are relatively similar to each other in terms of DTW distance.

For the R-Tree and the DCRC-Tree, consider the tree node covering a set of time series. In a tree node of an R-Tree:

(1) The covering set is an MBR; each i-th component is a range interval derived from the bands centered on the i-th entry.
(2) The volume is the product of the i-th range intervals. When the elements are similar, but have large time axis deformation, we have a relatively large volume.
(3) The lower bound of DTW to a given query time series is computed by different Hausdorff-distance-like methods, including LB_Keogh [14], LB_NEW [29], LB_ENHANCED [32], etc.

In a tree node of a DCRC-Tree:

(1) The covering set is a DCRC structure; each i-th component is a set of tuples, and each tuple is a range interval and a subscript.
(2) The volume is computed as defined in Equ. (9). When the elements are similar, but have large time axis deformation, as long as the reference time series is similar to these elements, we have a relatively small volume.
(3) The lower bound of DTW to a given query time series is computed by LB_DCRC using a dynamic programming method.

Hence, the DCRC-Tree based on HDCRC is a tighter structure for covering time series samples than an R-Tree like structure. Consequently, this leads to more efficient pruning when performing a query.

5 Theorems for DCRC

For the algorithms in Secs. 3 and 4.2, we will prove their correctness and efficiency in this section. Theorem 1 assures that the DCRC structure can cover a given set. Theorems 2 and 3 prove the tightness of the DCRC covering. Considering the lower bound of DTW between a given query time series and a given DCRC structure by Algorithm 4, Theorem 4 proves its correctness, and Theorem 5 proves that the hierarchical structure generated by Algorithm 5 is still a DCRC structure, which is used to generate an indexing tree.

Theorem 1 For a given set S of m-length time series, let V be the return value of Algorithm 3; then S ⊆ Cover(V).

Proof Given s_t = [s_{t1}, s_{t2}, ..., s_{tm}] ∈ S where t ∈ {1, 2, ..., T}, let r[r1, r2, ..., rm] be the ACM-relationship at Line 4 in Algorithm 3. From the loop at Lines 5 to 13, we have s_{ti} ∈ [L_i^{r_i}, U_i^{r_i}] for i = 1, 2, ..., m. From r ∈ R(m, n) (defined in Definition 1) and the definition of Rect_r(V, r), s_t ∈ Rect_r(V, r), i.e., s_t ∈ Cover(V) from Equ. (10).
Algorithm 6 Node Splitting for a Time Series Set
Input: DCRC-Tree node N (|N.Series| = M, N.d < dmax).
Output: The updated DCRC-Tree nodes N and N′ after splitting.
1: Let vol = volume(N);
2: Let X_i = create_dcrc(N.c, {N.Series[i]}), for i = 1, 2, ..., M;
3: Let j1 = argmin_i volume(X_i), j2 = argmax_i volume(X_i);
4: Let x1 = N.Series[j1], and let x2 = N.Series[j2];
5: Let S′ = N.Series − {x1} − {x2};
6: Let N.Series = ϕ;
7: Create a new tree node N′, let |N′.c| = |N.c|, and N′.d = N.d;
8: insert_series(N, x1);
9: if vol < ε then
10:   Let N′.c = N.c;
11: end if
12: insert_series(N′, x2);
13: for each s in S′ do
14:   Let v1(s) = volume(create_dcrc(x1, {x1, s}));
15:   Let v2(s) = volume(create_dcrc(x2, {x2, s}));
16:   Denote ω(s) = v1(s) − v2(s);
17: end for
18: Let y1, y2, ..., y_{M−2} be the permutation of the elements of S′ satisfying |ω(y_i)| ≥ |ω(y_{i+1})| for i = 1, 2, ..., M − 3;
19: for i = 1 to M − 2 do
20:   if |N.Series| = M/2 then
21:     insert_series(N′, y_i);
22:   else if |N′.Series| = M/2 then
23:     insert_series(N, y_i);
24:   else
25:     if ω(y_i) < 0 then
26:       insert_series(N, y_i);
27:     else
28:       insert_series(N′, y_i);
29:     end if
30:   end if
31: end for
32: return N, N′

Algorithm 7 Insertion into a DCRC-tree
Input: Time series s of m-length;
Input: Root T of the DCRC-tree.
Output: The updated root T after insertion.
1: if T = nil then
2:   Create a new DCRC-tree node T;
3:   insert_series(T, s);
4:   return T;
5: end if
6: Let N = T;
7: while N.Children ≠ ϕ do
8:   for each N_i in N.Children do
9:     Copy N_i.V to X_i;
10:    Let Y_i = update_dcrc(X_i, N_i.c, s);
11:  end for
12:  Let N = N.Children[k], k = argmin_i volume(Y_i);
13: end while
14: insert_series(N, s);
15: Let N′ = nil;
16: if T.d < dmax and |N.Series| = M then
17:   Split node N into N and N′;
18: end if
19: while true do
20:   Let N_t = N.Parent;
21:   if N_t = nil then
22:     if N′ ≠ nil then
23:       Create a new node T, let T.Parent = nil;
24:       Let T.d = N.d + 1;
25:       Let |T.c| = |N.c|/2;
26:       insert_node(T, N);
27:       insert_node(T, N′);
28:     end if
29:     return T;
30:   else
31:     if N′ = nil then
32:       Update N_t.V with N_t.Children by Algorithm 5;
33:     else
34:       insert_node(N_t, N′);
35:       if |N_t.Children| = M then
36:         Split node N_t into N_t and N′;
37:       end if
38:     end if
39:     Let N = N_t;
40:   end if
41: end while

Lemma 1 Let x1 = [x_{11}, x_{12}, ..., x_{1m1}] and x2 = [x_{21}, x_{22}, ..., x_{2m2}] be two given time series of length m1 and m2 (m1 ≤ m2), and let y be a constant. If α = √(Σ_{i=1}^{m1} (x_{1i} − y)²) and β = √(Σ_{i=1}^{m2} (x_{2i} − y)²), we have DTW²(x1, x2) ≤ 2⌈m2/m1⌉(α² + β²).

Proof Denote d = DTW(x1, x2). Consider a matching path W of length m2 (which might not be a DTW warping path) from x2 to x1, such as (1, i_1), (2, i_2), ..., (m2, i_{m2}), where i_k = ⌈k·m1/m2⌉. We have d² ≤ Σ_{k=1}^{m2} (x_{1 i_k} − x_{2k})² ≤ Σ_{k=1}^{m2} 2((x_{1 i_k} − y)² + (x_{2k} − y)²). As i_k = ⌈k·m1/m2⌉, we have d² ≤ 2 Σ_{k=1}^{m2} (x_{2k} − y)² + 2⌈m2/m1⌉ Σ_{k=1}^{m1} (x_{1k} − y)² ≤ 2⌈m2/m1⌉(α² + β²). Therefore, DTW²(x1, x2) ≤ 2⌈m2/m1⌉(α² + β²).

Consider three time series x1 = [x_{11}, x_{12}, ..., x_{1m1}], x2 = [x_{21}, x_{22}, ..., x_{2m2}] and y = [y_1, y_2, ..., y_n] of length m1, m2 and n, respectively, with n < m1, m2.

Theorem 2 If D(x1, y) = α and D(x2, y) = β (where function D is defined in Equ. (3)), we have DTW(x1, x2) ≤ √(2(m2 − n)(α² + β²)).

Proof Let r1[r_{11}, r_{12}, ..., r_{1m1}] = R(x1, y), let r2[r_{21}, r_{22}, ..., r_{2m2}] = R(x2, y), and let W denote a matching path from x1 to x2, which is divided into n segments. Let the t-th segment correspond to the set X_{pt} = {k | r_{pk} = t}, and let a_{pt}, b_{pt} denote the minimum and maximum of X_{pt}, respectively, where p = 1, 2. Let α_t² = Σ_{k=a_{1t}}^{b_{1t}} (x_{1k} − y_t)² and β_t² = Σ_{k=a_{2t}}^{b_{2t}} (x_{2k} − y_t)².

We have 1 ≤ |X_{1t}|, |X_{2t}| ≤ m2 − n. From Lemma 1, we have DTW²(x1(a_{1t} : b_{1t}), x2(a_{2t} : b_{2t})) ≤ 2⌈|X_{2t}|/|X_{1t}|⌉(α_t² + β_t²). Then DTW²(x1, x2) ≤ Σ_{t=1}^{n} DTW²(x1(a_{1t} : b_{1t}), x2(a_{2t} : b_{2t})) ≤ 2(m2 − n) Σ_{t=1}^{n} (α_t² + β_t²) = 2(m2 − n)(α² + β²). Then DTW(x1, x2) ≤ √(2(m2 − n)(α² + β²)).
Theorem 3 Considering Algorithm 3, let the set S = {s1, s2, ..., sT} of m-length time series and the time series c of n-length be the input parameters, and let V be the output DCRC structure. Assume D(s_t, c) ≤ α for all t ∈ {1, 2, ..., T}. If x = [x1, x2, ..., xm] ∈ Cover(V), then D(x, c) ≤ √m · α. D is defined in Equ. (3).

Proof Firstly, we will prove that for every V.v_i^j (denoted v), we have v.l ≥ c_j − α and v.u ≤ c_j + α, where c_j is the j-th entry of c. From the computation of the warping path W at Line 4, and the assumption D²(s_t, c) = Σ_{i=1}^{m} (s_{ti} − c_{r_i})² ≤ α², we have c_{r_i} − α ≤ s_{ti} ≤ c_{r_i} + α. For each v (V.v_i^j), from the assignments at Lines 7-9 and 11, we have v.l ≥ c_j − α and v.u ≤ c_j + α.

If x ∈ Cover(V), there exists a series r = [r1, r2, ..., rm] ∈ R(m, n) satisfying x ∈ Rect_r(V, r). From the definition of D, D²(x, c) ≤ Σ_{i=1}^{m} (x_i − c_{r_i})². From Equ. (8), we have x_i ∈ [v.l, v.u]. As v.l ≥ c_{r_i} − α and v.u ≤ c_{r_i} + α, then D²(x, c) ≤ Σ_{i=1}^{m} (x_i − c_{r_i})² ≤ mα². Then D(x, c) ≤ √m · α.

From Theorems 2, 3, and Algorithm 3, we can conclude that if the elements of the DCRC are all similar to the reference c as measured by function D, the elements are also similar to each other in terms of the DTW distance.

Theorem 4 The return value of Algorithm 4 is LB_DCRC(q, S) as defined in Equ. (11).

Proof Firstly, we will prove that √a_{ijk} = min_x DTW(q(1 : i), x(1 : j)) s.t. x(1 : j) ∈ Cover(V_j) and x_j ∈ [L_j^k, U_j^k], where V_j = [V1, V2, ..., Vj] and x(1 : j) = [x1, x2, ..., xj].

Mathematical induction. Assume a_{i′j′k′} satisfies the above equation for all (i′, j′, k′) with (i′ ≤ i ∧ j′ ≤ j ∧ k′ ≤ k) ∧ ((i′, j′, k′) ≠ (i, j, k)). We will prove that a_{ijk} also satisfies the equation.

At Line 16, a_{ijk} is recursively represented by the sum of γ(i, j, k) and a_{i′j′k′}. Considering the subscript pair (i′, j′) of a_{i′j′k′}, there are three cases: (i−1, j−1), (i−1, j) and (i, j−1). In the case of (i′, j′) = (i−1, j−1), from the definition of Cover in Equ. (10) and the ACM-relationships in Definition 1, we have (k−1) ∈ V.P_{j−1} or k ∈ V.P_{j−1}. The two cases correspond to η1 and η2, respectively. Similarly, η3, η4 and η5 correspond to the other cases.

Note that η6 = α(i−1, j, k−1) and η7 = α(i, j, k−1) are excluded. Consider that a_{ijk} is the lower bound of DTW from q(1 : i) to x(1 : j). As the optimum x ∈ Cover(V), there exists r = [r1, r2, ..., rj] ∈ R(j, k) satisfying x_t ∈ V.v_t^{r_t} for t = 1, 2, ..., j. If η6 or η7 were adopted in the computation of a_{ijk}, then (j, k−1) and (j, k) would appear in r1, r2, ..., rj at the same time, which contradicts the definition of the ACM-relationship.

Using dynamic programming, a_{ijk} also satisfies the minimal assumption. Finally, √a_{mmn} at Line 18 is the minimum of Equ. (11).

Theorem 5 The return value V of Algorithm 5 satisfies ∪_{t=1}^{T} Cover(V_t) ⊆ Cover(V).

Proof For any given s = [s1, s2, ..., sm] ∈ Cover(V_t), there exists b = [b1, b2, ..., bm] satisfying s ∈ Rect_r(V_t, b), i.e., s_i ∈ V_t.[L_i^{b_i}, U_i^{b_i}] for i = 1, 2, ..., m. In addition, b satisfies the ACM-relationships.

Consider Line 8 in Algorithm 5, and let r = [r1, r2, ..., rn]. From Definition 1, we have that r satisfies the ACM-relationships.

The series [r_{b1}, r_{b2}, ..., r_{bm}] can be shown to satisfy the ACM-relationships as follows. As r1 = 1, b1 = 1, rn = n′ and bm = n, then r_{b1} = 1 and r_{bm} = n′, i.e., "Alignment" is satisfied. As 0 ≤ r_{i+1} − r_i ≤ 1 for i = 1, 2, ..., n − 1 and 0 ≤ b_{i+1} − b_i ≤ 1 for i = 1, 2, ..., m − 1, then 0 ≤ r_{b_{i+1}} − r_{b_i} ≤ 1, i.e., "Continuity" and "Monotonicity" are satisfied. From the assignments at Lines 12-17, we have s_i ∈ V.[L_i^{r_{b_i}}, U_i^{r_{b_i}}], i.e., s ∈ Cover(V).

According to Theorem 5, if the LB_DCRC of an upper layer is larger than a given acceptable range query tolerance, then all of its sub-layers can be pruned to reduce the computational load.

6 Experiments

In order to illustrate the effectiveness of our algorithms and indexing structure, experiments are carried out in this section. We use LB_NEW [29] and LB_ENHANCED [32] for comparisons. The experiments are divided into two parts. The first part, presented in Sec. 6.2, provides a comparison of the different DTW lower bounds. In addition, we also perform experiments to analyze the impact of parameters, including the length of the time series, the ratio λ of the width of the Sakoe-Chiba Band to the length of the time series, and the acceptable query tolerance ε. The second part, presented in Sec. 6.3, shows the performance of the different index trees.

6.1 Setup
The datasets selected for our experiments are from the UCR Time Series Classification Archive [3]. Firstly, we compute the average of the LB_DCRC distances from the query time series to the DCRC structure using Algorithm 4. Then we compute the average DTW from the query time series to all the samples in the dataset S.

The computed LB_DCRC and actual DTW values for different λ are shown in Table 4. The average lower bound distance of LB_DCRC is lower than DTW for the 20 datasets. The time series have different lengths m_i. The dimension of V of the DCRC is set to m_i and the dimension of the reference of the DCRC is set to m_i/2.

6.2 Distance and Tightness

In terms of distance, we compute the average distance between the query time series and the candidate set of time series using four methods: DTW, LB_NEW, LB_ENHANCED and LB_DCRC. Table 5, which shows the results of the average distance when λ = 0.2, demonstrates that LB_DCRC achieves better performance than LB_NEW and LB_ENHANCED for all datasets.

Definition 3 (Tightness of the DTW Lower Bound) Given a method LB of obtaining a lower bound of DTW, a set S of time series, and a query time series q, let the tightness of LB for q and S be defined as LB(q, S) / min_{s∈S} DTW(s, q).

Using this definition, Fig. 6 shows the average tightness of LB_NEW, LB_ENHANCED, and LB_DCRC for different λ (i.e., 0.2, 0.4, 0.6). From the charts, it is clear that LB_DCRC is superior to LB_ENHANCED and LB_NEW on all datasets. When λ increases, the tightness of LB_NEW and LB_ENHANCED decreases significantly. In contrast, the width of the Sakoe-Chiba band has little impact on LB_DCRC, i.e. when the time series have relatively large deformations, LB_DCRC is still a tight lower bound of DTW. The dimensions of these datasets are distributed in the range 60 to 637, but this variation in dimension does not impact the performance of LB_DCRC relative to the other methods.

Fig. 6 Comparison of lower bound tightness under different ratios λ of warping windows over the 20 datasets: (a) λ = 0.2; (b) λ = 0.4; (c) λ = 0.6

Definition 4 (Pruning Power for a Query Set) Given a candidate data set S of time series, and a query set of time series Q, the pruning power of LB for set Q is defined as |{q ∈ Q | LB(q, S) > ε}| / |Q|, where ε is a predefined tolerance.
Given a tolerance ε, higher pruning power means more query time series can be directly excluded after the computation of the DTW lower bound. Fig. 7 shows a comparison of the pruning power of each approach with increasing ε. The pruning power of LB_NEW and LB_ENHANCED decreases dramatically, while the decline in LB_DCRC is much more gradual. Fig. 8 shows the average pruning power as a function of ε and the average tightness as a function of λ computed over the datasets.

Fig. 9 shows how the tightness changes with the ratio of the Sakoe-Chiba Band for the first 4 datasets employed in our experiments, while Fig. 10 shows the corresponding variation in pruning power as a function of query tolerance. In all cases the curves in Figs. 9 and 10 decrease monotonically and LB_DCRC substantially outperforms its counterparts.
Table 4 Average LB DCRC / DTW values for different λ
Dataset Dimension λ=0.2 λ=0.6 λ=1.0
synthetic control 60 2.189/5.757 1.987/5.603 1.987/5.603
Gun Point 150 0.317/0.845 0.287/0.820 0.287/0.820
CBF 128 2.497/4.715 2.413/4.645 2.413/4.645
FaceAll 131 2.196/5.743 1.917/5.616 1.906/5.616
OSULeaf 427 1.416/5.543 1.326/5.411 1.326/5.411
SwedishLeaf 128 0.322/1.306 0.319/1.305 0.319/1.305
50Words 270 7.139/8.897 5.002/6.793 4.866/6.686
Trace 275 10.281/10.864 10.051/10.656 10.051/10.656
MedicalImages 99 1.696/3.545 1.379/3.209 1.363/3.204
ShapeletSim 500 8.148/13.396 8.135/13.396 8.135/13.396
FaceFour 350 4.951/7.250 4.794/7.237 4.794/7.237
Lighting2 637 4.858/8.804 3.828/7.842 3.828/7.842
Lighting7 319 6.770/9.794 5.223/8.268 5.195/8.268
FacesUCR 131 3.696/6.407 3.522/6.361 3.494/6.355
Adiac 176 0.899/1.179 0.899/1.179 0.899/1.179
MoteStrain 84 1.560/4.222 1.478/4.048 1.478/4.048
Fish 463 0.416/0.992 0.416/0.992 0.416/0.992
Plane 144 2.762/3.543 2.698/3.485 2.698/3.485
Car 577 0.663/1.243 0.663/1.243 0.663/1.243
Beef 470 3.112/3.908 3.099/3.894 3.099/3.894
Table 5 Average distance for λ = 0.2
Dataset Dimension DTW LB DCRC LB NEW LB ENHANCED
synthetic control 60 5.757 2.189 0.771 0.942
Gun Point 150 0.845 0.317 0.148 0.154
CBF 128 4.715 2.497 0.168 0.218
FaceAll 131 5.743 2.196 1.112 1.127
OSULeaf 427 5.543 1.416 0.033 0.042
SwedishLeaf 128 1.306 0.322 0.079 0.070
50Words 270 8.897 7.139 0.957 0.406
Trace 275 10.864 10.281 3.951 7.755
MedicalImages 99 3.545 1.696 0.807 0.769
ShapeletSim 500 13.396 8.148 0.075 0.082
FaceFour 350 7.250 4.951 0.340 0.368
Lighting2 637 8.804 4.858 0.014 0.070
Lighting7 319 9.794 6.770 0.749 1.068
FacesUCR 131 6.407 3.696 1.689 1.885
Adiac 176 1.179 0.899 0.710 0.386
MoteStrain 84 4.222 1.560 0.416 0.544
Fish 463 0.992 0.416 0.056 0.060
Plane 144 3.543 2.762 1.088 1.197
Car 577 1.243 0.663 0.020 0.021
Beef 470 3.908 3.112 0.659 1.346
6.3 Indexing Tree Comparisons

By default, the length of each leaf is reduced to 20 by PAA [14]. Let the maximum number of child nodes be M = 20 and let the maximal depth of the tree be dmax = 3 in Algorithm 7. For each R-Tree node, the maximum number of child nodes M is set to 20. The time series for our experiments are randomly selected from the UCR Archive by the random walk method until the resulting dataset reaches 1 GB. All experiments were optimised and implemented in ANSI C++ and conducted on a 64-bit Windows 10 operating system with a 2.4 GHz main frequency, 8 CPUs, 64 GB RAM and a 4 TB hard disk.

For LB_NEW and LB_ENHANCED, we construct the corresponding index structures as R-trees, while for LB_DCRC we use a DCRC-tree. If the depth of the DCRC-tree in Algorithm 7 reaches the given maximum, the time series contained in the leaf nodes will not be split, i.e., a leaf node might contain a huge number of time series which need to be stored in the same hard disk file.

Fig. 11 compares the performance of the indexing trees as a function of query tolerance. In plots (a) and (b), the horizontal axis is the query tolerance and the vertical axis is the pruning power, where the ratio of the warping window λ is 0.1 in plot (a) and 1.0 in plot (b). For all the algorithms considered, pruning power decreases with increasing query tolerance because more samples are accepted.
From the two plots, the pruning power of the DCRC-Tree is higher than the others, i.e. LB_DCRC has a tighter lower bound. After querying in the indexing tree (R-Tree or DCRC-Tree), the remaining unpruned time series are sequentially scanned using the UCR suite method [27].

While searching for a given query time series on the DCRC-tree, visiting the non-leaf nodes only costs about 800 milliseconds of computation time. Therefore, the querying time cost of linear scanning is determined by the pruning power; more pruning power leads to a lower time cost. Plots (c) (λ = 0.1) and (d) (λ = 1.0) provide a comparison of the computation time for the different algorithms. Again, the DCRC-Tree outperforms the other methods. The curves are all monotonically increasing, which reflects the fact that as ε increases, more candidate data are retrieved.

Fig. 7 Comparison of pruning power under different acceptable tolerances over the 20 datasets: (a) ε = 0.1, λ = 0.2; (b) ε = 0.5, λ = 0.2; (c) ε = 1.0, λ = 0.2

Fig. 8 Pruning power as a function of ε and tightness as a function of λ averaged over the 20 datasets: (a) pruning power as a function of ε (λ = 1.0); (b) tightness as a function of λ

Fig. 12 shows the pruning power with varying λ for the different indexing structures. In plots (a) and (b), the horizontal axis is the ratio of the warping window λ and the vertical axis is the pruning power, where the tolerance ε is 0.1 and 1.0 in plots (a) and (b), respectively. The pruning power decreases with increasing λ because, as the warping window λ increases, the lower bound becomes lower so that more candidates are accepted. From the two plots, it is evident that the pruning power of the DCRC-Tree is greater than that of the other methods.

In plots (c) and (d) (the tolerance ε is set to 0.1 and 1.0, respectively), the DCRC-Tree significantly reduces the number of candidates, which greatly reduces the time complexity of indexing as only a small part of the dataset needs to be linearly scanned.
Fig. 9 The relationship between tightness and warping window for 4 selected datasets: (a) Synthetic Control; (b) Gun Point; (c) CBF; (d) Face All

Fig. 10 The relationship between pruning power and the query tolerance for 4 selected datasets (λ = 1.0): (a) Synthetic Control; (b) Gun Point; (c) CBF; (d) Face All
120
R-Ttree(LB NEW) 104 R-Ttree(LB NEW)
R-Tree(LB ENHANCED) R-Tree(LB ENHANCED)
110 DCRC-Tree DCRC-Tree
P runingP ower(%)
P runingP ower (%)
102
100
100
98
90
96
80
0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1
ε λ
(a) Pruning power for varying of query toler- (a) Pruning power for varying of warping win-
ance ε (λ=0.1) dow λ (ε = 0.1)
120
R-Ttree(LB NEW) R-Ttree(LB NEW)
100
R-Tree(LB ENHANCED) R-Tree(LB ENHANCED)
110
DCRC-Tree DCRC-Tree
P runingP ower(%)
P runingP ower(%)
100 90
90
80
80
70 70
0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1
ε λ
(b) Pruning power for varying of query toler- (b) Pruning power for varying of warping win-
ance ε (λ=1.0) dow λ (ε = 1.0)
15
12 R-Ttree(LB NEW) R-Ttree(LB NEW)
R-Tree(LB ENHANCED) R-Tree(LB ENHANCED)
Time Consumption (sec)
Time Consumption (sec)
10 DCRC-Tree DCRC-Tree
10
8
6
4 5
2
0 0
0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1
ε
λ
(c) Computation Time for varying of query
(c) Computation Time for varying of warping
tolerance ε (λ=0.1)
window λ (ε = 0.1)
R-Ttree(LB NEW) 80
60 R-Tree(LB ENHANCED) R-Ttree(LB NEW)
Time Consumption (sec)
DCRC-Tree R-Tree(LB ENHANCED)
Time Consumption (sec)
60 DCRC-Tree
40
40
20
20
0
0.2 0.4 0.6 0.8 1 0
ε 0.2 0.4 0.6 0.8 1
(d) Computation Time for varying of query λ
tolerance ε (λ=1.0) (d) Time Consumption for varying of warping
window λ (ε = 1.0)
Fig. 11 The impact of query tolerance on indexing perfor-
mance for λ = 0.1 and λ = 1.0 Fig. 12 Indexing performance comparisons with different
warping windows when ε = 0.1 and ε = 1.0
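Throughout these experiments, λ denotes the ratio of the warping window to the series length. For reference, the sketch below shows DTW restricted to a Sakoe-Chiba band of half-width ⌈λn⌉ [28]; it is only an illustrative implementation under that assumption, not the constrained distance used by the authors. It makes the trend in Fig. 12 concrete: widening the band admits more warping paths, so the resulting distance can only decrease, the corresponding lower bounds loosen, and fewer candidates are pruned.

    import numpy as np

    # Illustrative DTW with a Sakoe-Chiba band; cell (i, j) is admissible only
    # when |i - j| <= w, where w = ceil(lam * n) for the window ratio lam.
    def dtw_band(x, y, lam):
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        n = len(x)                          # assumes equal-length series
        w = max(1, int(np.ceil(lam * n)))
        D = np.full((n + 1, n + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(max(1, i - w), min(n, i + w) + 1):
                cost = (x[i - 1] - y[j - 1]) ** 2
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return np.sqrt(D[n, n])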
Due to the computational cost and the “curse of dimensionality” [8, 14], the node dimension of tree-like indexing structures is usually set to 20. In Fig. 13 we compare the influence of the tree node dimension on pruning power. The horizontal axis is the node dimension, which varies from 10 to 30, and the vertical axis is the pruning power, with ε = 1.0 and with λ = 0.2 in plot (a) and λ = 1.0 in plot (b). The dimensionality reduction adopts the PAA algorithm [8, 14], and the unpruned results are linearly scanned [27]. The results show that, as expected, the time consumption increases with increasing dimension, and that the LB DCRC tree substantially outperforms the other methods across the full range of dimensions considered.

Fig. 13 Indexing performance comparisons with varying tree node dimension: (a) pruning power for varying dimension (λ = 0.1), (b) pruning power for varying dimension (λ = 1.0), (c) computation time for varying dimension (λ = 0.1), (d) time consumption for varying dimension (λ = 1.0)
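The fixed node dimension referred to above is obtained by reducing each series with the PAA representation [8, 14]. A minimal sketch of PAA under the usual equal-width-segment assumption (again illustrative, not the authors' code) is given below.

    import numpy as np

    # Piecewise Aggregate Approximation: reduce a length-n series to d segment
    # means (assumes d <= n; segments are nearly equal-width when d does not
    # divide n).
    def paa(series, d):
        x = np.asarray(series, dtype=float)
        n = len(x)
        segment = (np.arange(n) * d) // n   # map each point to a segment 0..d-1
        return np.array([x[segment == k].mean() for k in range(d)])

For example, paa(x, 20) maps a length-128 series to the 20-dimensional representation used as the tree node dimension in Fig. 13.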
7 Conclusion

Dynamic time warping has become a popular approach for measuring the similarity of time series, with lower-bound-based techniques used to speed up its application by pruning series during search. This paper has presented DCRC as a novel structure for tightly covering a given set of time series under the DTW distance and, based on this structure, proposed the Hierarchical DCRC (HDCRC) to generate the DCRC-tree indexing. We have also introduced a lower bound on the DTW distance from a query time series to a given DCRC-based cover set. The tightness of the lower bound, which we have proven theoretically, makes it highly suited to pruning when querying on indexing trees. With the aid of extensive experimental studies we have illustrated that LB DCRC has more stable performance than competing methods for time series indexing.

Our future research will focus on multivariate time series, an increasingly important topic in time series data mining, with a view to extending the DCRC structure to cover sets of multivariate time series. Since multivariate time series have both variable-based and time-based dimensions, we will endeavor to explore a new way to represent them appropriately.

Acknowledgements The authors sincerely thank the editors and the anonymous reviewers for the very helpful and kind comments that have enhanced the presentation of our paper. The authors would also like to thank the UCR time series classification archive and Prof. Keogh for providing the datasets used in the study. This work is supported in part by the National Natural Science Foundation of China (Grant Nos. 61751205, 61572540, 61772102), and in part by the U.S. National Science Foundation (Grant No. IIS-1613950).

References

1. Agrawal, R., Faloutsos, C., Swami, A.: Efficient similarity search in sequence databases. In: Proceedings of International Conference on Foundations of Data Organization and Algorithms, pp. 69–84. Springer, Boston, MA (1993)
2. Chen, C.L.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: A survey on big data. Information Sciences 275, 314–347 (2014)
3. Dau, H.A., Keogh, E., Kamgar, K., Yeh, C.C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Chen, Y., Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G.: The UCR time series classification archive (2018). URL www.cs.ucr.edu/~eamonn/time_series_data_2018
4. Edstrom, J., Chen, D., Gong, Y., Wang, J., Gong, N.: Data-pattern enabled self-recovery low-power storage system for big video data. IEEE Transactions on Big Data 5(1), 95–105 (2019)
5. Esling, P., Agon, C.: Time-series data mining. ACM Computing Surveys 45(1), 12:1–34 (2012)
6. Fu, T.C.: A review on time series data mining. Engineering Applications of Artificial Intelligence 24(1), 164–181 (2011)
7. Grabocka, J., Wistuba, M., Schmidt-Thieme, L.: Fast classification of univariate and multivariate time series through shapelet discovery. Knowledge and Information Systems 49(2), 429–454 (2016)
8. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: ACM SIGMOD International Conference on Management of Data, pp. 47–57. ACM, New York, NY (1984)
9. He, H., Tan, Y.: Unsupervised classification of multivariate time series using VPCA and fuzzy clustering with spatial weighted matrix distance. IEEE Transactions on Cybernetics 50(3), 1096–1105 (2020)
10. Hu, J., Yang, B., Guo, C., Jensen, C.S.: Risk-aware path selection with time-varying, uncertain travel costs: A time series approach. VLDB Journal 27(2), 179–200 (2018)
11. Ignatov, A.: Real-time human activity recognition from accelerometer data using convolutional neural networks. Applied Soft Computing 62, 915–922 (2018)
12. Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 23(1), 67–72 (1975)
13. Kacprzyk, J., Wilbik, A., Zadrożny, S.: Linguistic summarization of time series using a fuzzy quantifier driven aggregation. Fuzzy Sets and Systems 159(12), 1485–1499 (2008)
14. Keogh, E., Ratanamahatana, C.A.: Exact indexing of dynamic time warping. Knowledge and Information Systems 7(3), 358–386 (2005)
15. Keogh, E., Wei, L., Xi, X., Vlachos, M., Lee, S.H., Protopapas, P.: Supporting exact indexing of arbitrarily rotated shapes and periodic time series under Euclidean and warping distance measures. VLDB Journal 18(3), 611–630 (2009)
16. Lemire, D.: Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recognition 42, 2169–2180 (2009)
17. Li, H., Yang, L.: Extensions and relationships of some existing lower-bound functions for dynamic time warping. Journal of Intelligent Information Systems 43(1), 59–79 (2014)
18. Li, Q., Chen, Y., Wang, J., Chen, Y., Chen, H.C.: Web media and stock markets: A survey and future directions from a big data perspective. IEEE Transactions on Knowledge and Data Engineering 30(2), 381–399 (2018)
19. Lin, S.C., Yeh, M.Y., Chen, M.S.: Non-overlapping subsequence matching of stream synopses. IEEE Transactions on Knowledge and Data Engineering 30(1), 101–114 (2018)
20. Liu, M., Zhang, X., Xu, G.: Continuous motion classification and segmentation based on improved dynamic time warping algorithm. International Journal of Pattern Recognition and Artificial Intelligence 32(2), 1850002 (2018)
21. Mikalsen, K.Ø., Bianchi, F.M., Soguero-Ruiz, C., Jenssen, R.: Time series cluster kernel for learning similarities between multivariate time series with missing data. Pattern Recognition 76, 569–581 (2018)
22. Mondal, T., Ragot, N., Ramel, J.Y., Pal, U.: Comparative study of conventional time series matching techniques for word spotting. Pattern Recognition 73, 47–64 (2018)
23. Mori, U., Mendiburu, A., Lozano, J.A.: Similarity measure selection for clustering time series databases. IEEE Transactions on Knowledge and Data Engineering 28(1), 181–195 (2016)
24. Mueen, A., Chavoshi, N., Abu-El-Rub, N., Hamooni, H., Minnich, A., MacCarthy, J.: Speeding up dynamic time warping distance for sparse time series data. Knowledge and Information Systems 54(1), 237–263 (2018)
25. Mueen, A., Keogh, E.: Extracting optimal performance from dynamic time warping. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2129–2130. ACM, New York, NY (2016)
26. Park, S., Lee, D., Chu, W.W.: Fast retrieval of similar subsequences in long sequence databases. In: Proceedings of 1999 Workshop on Knowledge and Data Engineering Exchange, pp. 60–67. IEEE, Chicago, IL (1999)
27. Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., Keogh, E.: Searching and mining trillions of time series subsequences under dynamic time warping. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 262–270. ACM, New York, NY (2012)
28. Sakoe, H., Chiba, S.: Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26(1), 43–49 (1978)
29. Shen, Y., Chen, Y., Keogh, E., Jin, H.: Accelerating time series searching with large uniform scaling. In: Proceedings of the 2018 SIAM International Conference on Data Mining, pp. 234–242. SIAM, Bologna, Italy (2018)
30. Son, N.T., Anh, D.T.: Discovery of time series k-motifs based on multidimensional index. Knowledge and Information Systems 46(1), 59–86 (2016)
31. Sun, T., Liu, H., Yu, H., Chen, C.L.P.: Degree-pruning dynamic planning approaches to central time series through minimizing dynamic time warping distance. IEEE Transactions on Cybernetics 47(7), 1719–1729 (2017)
32. Tan, C.W., Petitjean, F., Webb, G.: Elastic bands across the path: A new framework and method to lower bound DTW. In: Proceedings of the 2019 SIAM International Conference on Data Mining, pp. 522–530. SIAM, Alberta, Canada (2019)
33. Tan, C.W., Webb, G.I., Petitjean, F.: Indexing and classifying gigabytes of time series under time warping. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 282–290. SIAM, Houston, TX (2017)
34. Tan, Z., Wang, Y., Zhang, Y., Zhou, J.: A novel time series approach for predicting the long-term popularity of online videos. IEEE Transactions on Broadcasting 62(2), 436–445 (2016)
35. Tang, J., Cheng, H., Zhao, Y., Guo, H.: Structured dynamic time warping for continuous hand trajectory gesture recognition. Pattern Recognition 80, 21–31 (2018)
36. Wu, X., Zhu, X., Wu, G., Ding, W.: Data mining with big data. IEEE Transactions on Knowledge and Data Engineering 26(1), 97–107 (2014)
37. Wu, Y., Tong, Y., Zhu, X., Wu, X.: NOSEP: Nonoverlapping sequence pattern mining with gap constraints. IEEE Transactions on Cybernetics 48(10), 2809–2822 (2018)
38. Yi, B.K., Jagadish, H.V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In: Proceedings of the 14th International Conference on Data Engineering, pp. 201–208. IEEE, Orlando, FL (1998)
39. Zhou, M., Wong, M.H.: Boundary-based lower-bound functions for dynamic time warping and their indexing. Information Sciences 181(19), 4175–4196 (2011)
40. Zoumpatianos, K., Lou, Y., Ileana, I., Palpanas, T., Gehrke, J.: Generating data series query workloads. VLDB Journal 27(6), 823–846 (2018)