Theoretical Computer Science 410 (2009) 900–913

Contents lists available at ScienceDirect

Theoretical Computer Science

journal homepage: www.elsevier.com/locate/tcs

Efficient algorithms to compute compressed longest common substrings and compressed palindromes
Wataru Matsubara a , Shunsuke Inenaga b,∗ , Akira Ishino a,1 , Ayumi Shinohara a ,
Tomoyuki Nakamura a , Kazuo Hashimoto a
a Graduate School of Information Sciences, Tohoku University, Japan
b Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan

Article info

Article history: Received 22 April 2008. Received in revised form 28 November 2008. Accepted 8 December 2008. Communicated by M. Crochemore.

Keywords: Text compression; String processing algorithms; Longest common substring; Palindromes; Straight line program

Abstract

This paper studies two problems on compressed strings described in terms of straight line programs (SLPs). One is to compute the length of the longest common substring of two given SLP-compressed strings, and the other is to compute all palindromes of a given SLP-compressed string. In order to solve these problems efficiently (in polynomial time w.r.t. the compressed size) decompression is never feasible, since the decompressed size can be exponentially large. We develop combinatorial algorithms that solve these problems in $O(n^4 \log n)$ time with $O(n^3)$ space, and in $O(n^4)$ time with $O(n^2)$ space, respectively, where $n$ is the size of the input SLP-compressed strings.
© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Algorithms for compressed texts have recently grown in importance due to the massive increase of data that are treated in compressed form. Of the various text compression schemes introduced so far, the straight line program (SLP) is one of the most powerful and general. An SLP is a context-free grammar whose rules are of either of the forms $X \to YZ$ or $X \to a$, where $a$ is a constant. SLPs allow exponential compression, i.e., the original (uncompressed) string length $N$ can be exponentially large w.r.t. the corresponding SLP size $n$. In addition, the encodings produced by most grammar- and dictionary-based text compression methods, such as the LZ-family [13,14], run-length encoding, multi-level pattern matching code [5], Sequitur [10] and so on, can quickly be transformed into SLPs [2,12,3]. Therefore, it is of great interest to analyze what kind of problems on SLP-compressed strings can be solved in polynomial time w.r.t. $n$. Moreover, for those that are polynomially solvable, it is of great importance to design efficient algorithms. In so doing, one has to note that decompression is never feasible, since it can require exponential time and space w.r.t. $n$.
The first polynomial time algorithm for SLP-compressed strings was given by Plandowski [11], which tests the equality of two SLP-compressed strings in $O(n^4)$ time. Later on Karpinski et al. [4] presented an $O(n^4 \log n)$-time algorithm for the substring pattern matching problem for two SLP-compressed strings. Then it was improved to $O(n^4)$ time by Miyazaki et al. [9] and recently to $O(n^3)$ time by Lifshits [6]. The problem of computing the minimum period of a given SLP-compressed

∗Corresponding author. Tel.: +81 92 802 3668.

E-mail addresses: [email protected] (W. Matsubara), [email protected] (S. Inenaga), [email protected]
(A. Ishino), [email protected] (A. Shinohara), [email protected] (T. Nakamura), [email protected] (K. Hashimoto).
1 Presently at Google Japan Inc.

0304-3975/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.tcs.2008.12.016
string was shown to be solvable in $O(n^4 \log n)$ time [4], and lately in $O(n^3 \log N)$ time [6]. Gąsieniec et al. [2] claimed that all
squares of a given SLP-compressed string can be computed in $O(n^6 \log^5 N)$ time.
On the other hand, there are some hardness results on SLP-compressed string processing. Lifshits and Lohrey [7] showed
that the subsequence pattern matching problem for SLP-compressed strings is NP-hard, and that computing the length of
the longest common subsequence of two SLP-compressed strings is also NP-hard. Lifshits [6] showed that computing the
Hamming distance between two SLP-compressed strings is #P-complete.
In this paper we tackle the following two problems: one is to compute the length of the longest common substring of two SLP-compressed strings, and the other is to find all maximal palindromes of an SLP-compressed string. The first problem was listed as an open problem in [6]. This paper closes the problem by giving an algorithm that runs in $O(n^4 \log n)$ time with $O(n^3)$ space. For the second problem of computing all maximal palindromes, we give an algorithm that runs in $O(n^4)$ time with $O(n^2)$ space.
Comparison with previous work. A composition system is a generalization of SLP which also allows "truncations" in the production rules. Namely, a rule of a composition system is of one of the following forms: $X \to Y^{[i]} Z_{[j]}$, $X \to YZ$, or $X \to a$, where $Y^{[i]}$ and $Z_{[j]}$ denote the prefix of length $i$ of $Y$ and the suffix of length $j$ of $Z$, respectively. Gąsieniec et al. [2] presented an algorithm that computes all maximal palindromes from a given composition system in $O(n \log^2 N \times Eq(n))$ time, where $Eq(n)$ denotes the time needed for the equality test of composition systems. Since $Eq(n) = O(n^4 \log^2 N)$ in [2], the overall time cost is $O(n^5 \log^4 N)$.
Limited to SLPs, $Eq(n) = O(n^3)$ due to the recent work by Lifshits [6]. Still, computing all maximal palindromes takes $O(n^4 \log^2 N)$ time in total, and therefore our solution with $O(n^4)$ time is faster than the previously known ones (recall that $N = O(2^n)$). The space requirement of the algorithm by Gąsieniec et al. [2] is unclear. However, since the equality test algorithm of [6] takes $O(n^2)$ space, the above-mentioned $O(n^4 \log^2 N)$-time solution takes at least as much space as ours.
A preliminary version of this work appeared in [8].

2. Preliminaries

2.1. Notations on strings

For any set $U$ of pairs of integers, we denote $U \oplus k = \{(i + k, j + k) \mid (i, j) \in U\}$. We denote by $\langle a, d, t \rangle$ the arithmetic progression with the minimal element $a$, the common difference $d$ and the number of elements $t$, that is, $\langle a, d, t \rangle = \{a + (i - 1)d \mid 1 \le i \le t\}$. When $t = 0$, let $\langle a, d, t \rangle = \emptyset$.
Let Σ be a finite alphabet. An element of Σ ∗ is called a string. The length of a string T is denoted by |T |. The empty string
ε is a string of length 0, namely, |ε| = 0. For a string T = XYZ , X , Y and Z are called a prefix, substring, and suffix of T ,
respectively. The i-th character of a string T is denoted by T [i] for 1 ≤ i ≤ |T |, and the substring of a string T that begins at
position $i$ and ends at position $j$ is denoted by $T[i : j]$ for $1 \le i \le j \le |T|$. For any string $T$, let $T^R$ denote the reversed string
of $T$, namely, $T^R = T[|T|] \cdots T[2]T[1]$.
For any two strings T , S, let LCPref (T , S ), LCStr (T , S ), and LCSuf (T , S ) denote the length of the longest common prefix,
substring and suffix of T and S, respectively.
A period of a string T is an integer p (1 ≤ p ≤ |T |) such that T [i] = T [i + p] for any i = 1, 2, . . . , |T | − p.
A non-empty string $T$ such that $T = T^R$ is said to be a palindrome. When $|T|$ is even, $T$ is said to be an even palindrome, that is, $T = SS^R$ for some $S \in \Sigma^+$. Similarly, when $|T|$ is odd, $T$ is said to be an odd palindrome, that is, $T = ScS^R$ for some $S \in \Sigma^*$ and $c \in \Sigma$. For any string $T$ and its substring $T[i : j]$ such that $T[i : j] = T[i : j]^R$, $T[i : j]$ is said to be the maximal palindrome w.r.t. the center $\lfloor \frac{i+j}{2} \rfloor$, if either $T[i - 1] \ne T[j + 1]$, $i = 1$, or $j = |T|$. In particular, $T[1 : j]$ is said to be a prefix palindrome of $T$, and $T[i : |T|]$ a suffix palindrome of $T$.

2.2. Text compression by straight line programs

In this paper, we treat strings described in terms of straight line programs (SLPs). A straight line program $\mathcal{T}$ is a sequence of assignments such that
\[ X_1 = expr_1,\ X_2 = expr_2,\ \ldots,\ X_n = expr_n, \]
where each $X_i$ is a variable and each $expr_i$ is an expression in either of the following forms:
• $expr_i = a$ ($a \in \Sigma$), or
• $expr_i = X_\ell X_r$ ($\ell, r < i$).
Denote by $T$ the string derived from the last variable $X_n$ of the program $\mathcal{T}$. The size of the program $\mathcal{T}$ is the number $n$ of assignments in $\mathcal{T}$. We remark that $|T| = O(2^n)$.
Example 1. SLP $\mathcal{T} = \{X_i\}_{i=1}^{7}$ with $X_1 = a$, $X_2 = b$, $X_3 = X_1 X_2$, $X_4 = X_1 X_3$, $X_5 = X_3 X_4$, $X_6 = X_4 X_5$, and $X_7 = X_6 X_5$ generates string $T$ = aababaababaab. The derivation tree of SLP $\mathcal{T}$ is shown in Fig. 1.
When it is not confusing, we identify a variable $X_i$ with the string derived from $X_i$. Then, $|X_i|$ denotes the length of the string derived from $X_i$.
Fig. 1. The derivation tree of SLP $\mathcal{T}$ of Example 1 that generates the string $T$ = aababaababaab.

For any variable $X_i$ of $\mathcal{T}$ with $1 \le i \le n$, we define $X_i^R$ as follows:
\[ X_i^R = \begin{cases} a & \text{if } X_i = a\ (a \in \Sigma), \\ X_r^R X_\ell^R & \text{if } X_i = X_\ell X_r\ (\ell, r < i). \end{cases} \]
Let $\mathcal{T}^R$ be the SLP consisting of variables $X_i^R$ for $1 \le i \le n$. The following lemma is important for our algorithms which will be given later on.

Lemma 1. SLP $\mathcal{T}^R$ derives string $T^R$.
Proof. By induction on the variables $X_i^R$. Let $\Sigma_T$ be the set of characters appearing in $T$. For any $1 \le i \le |\Sigma_T|$, we have $X_i = a$ for some $a \in \Sigma_T$, thus $X_i^R = a$ and $a = a^R$. Let $T_i$ denote the string derived from $X_i$. For the induction hypothesis, assume that $X_j^R$ derives $T_j^R$ for any $1 \le j \le i$. Now consider variable $X_{i+1} = X_\ell X_r$. Note $T_{i+1} = T_\ell T_r$, which implies $T_{i+1}^R = T_r^R T_\ell^R$. By definition, we have $X_{i+1}^R = X_r^R X_\ell^R$. Since $\ell, r < i + 1$, by the induction hypothesis $X_{i+1}^R$ derives $T_r^R T_\ell^R = T_{i+1}^R$. Thus, $\mathcal{T}^R = X_n^R$ derives $T_n^R = T^R$.
Example 2. For SLP $\mathcal{T} = \{X_i\}_{i=1}^{7}$ of Example 1, its reversed SLP $\mathcal{T}^R = \{X_i^R\}_{i=1}^{7}$ consists of $X_1^R = a$, $X_2^R = b$, $X_3^R = X_2^R X_1^R$, $X_4^R = X_3^R X_1^R$, $X_5^R = X_4^R X_3^R$, $X_6^R = X_5^R X_4^R$, and $X_7^R = X_5^R X_6^R$. SLP $\mathcal{T}^R$ generates the reversed string $T^R$ = (aababaababaab)$^R$ = baababaababaa.
Note that SLP $\mathcal{T}^R$ can be easily computed from SLP $\mathcal{T}$ in $O(n)$ time.
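The $O(n)$ construction of the reversed SLP amounts to swapping the two sides of every binary rule. A minimal Python sketch, assuming a hypothetical list encoding in which each rule is a character or a pair of 0-based indices of earlier rules (the names `reverse_slp` and `derive` are illustrative only):

```python
# Hypothetical encoding of Example 1: a character, or a pair (l, r) of
# 0-based indices of earlier rules.
EXAMPLE_1 = ["a", "b", (0, 1), (0, 2), (2, 3), (3, 4), (5, 4)]


def reverse_slp(slp):
    """Swap the children of every binary rule: X_i^R = X_r^R X_l^R."""
    return [rule if isinstance(rule, str) else (rule[1], rule[0]) for rule in slp]


def derive(slp):
    """Decompress the last variable (for checking only, exponential in general)."""
    out = []
    for rule in slp:
        out.append(rule if isinstance(rule, str) else out[rule[0]] + out[rule[1]])
    return out[-1]


if __name__ == "__main__":
    print(derive(reverse_slp(EXAMPLE_1)))
```

The decompression in `derive` is only there to check Lemma 1 on a small input; the reversal itself never looks at the derived strings.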

3. Computing longest common substring of two SLP compressed strings

Let $\mathcal{T}$ and $\mathcal{S}$ be the SLPs of sizes $n$ and $m$, which describe strings $T$ and $S$, respectively. Without loss of generality we assume that $n \ge m$.
In this section we tackle the following problem:
Problem 1. Given two SLPs $\mathcal{T}$ and $\mathcal{S}$, compute $LCStr(T, S)$.
In what follows we present an algorithm that solves Problem 1 in $O(n^4 \log n)$ time and $O(n^3)$ space. Let $X_i$ and $Y_j$ denote any variable of $\mathcal{T}$ and $\mathcal{S}$ for $1 \le i \le n$ and $1 \le j \le m$.

3.1. Overlaps between two strings

For any two strings $X$ and $Y$, we define the set $OL(X, Y)$ as follows:
\[ OL(X, Y) = \{k > 0 \mid X[|X| - k + 1 : |X|] = Y[1 : k]\}. \]
Namely, $OL(X, Y)$ is the set of lengths of overlaps of suffixes of $X$ and prefixes of $Y$.
Example 3. For strings $X$ = ababbab and $Y$ = babbabb, $OL(X, Y) = \{1, 3, 6\}$ since b, bab and babbab are both suffixes of $X$ and prefixes of $Y$.
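The definition of $OL$ can be checked directly on uncompressed strings. The sketch below is a naive quadratic-time reference, not the SLP-based computation of [4]; the function name `overlaps` is an assumption made for illustration.

```python
def overlaps(x, y):
    """Naive OL(X, Y): all k > 0 such that the length-k suffix of x
    equals the length-k prefix of y."""
    return [k for k in range(1, min(len(x), len(y)) + 1)
            if x[len(x) - k:] == y[:k]]


if __name__ == "__main__":
    print(overlaps("ababbab", "babbabb"))
```

On the strings of Example 3 this reproduces the set given in the text.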
Karpinski et al. [4] gave the following results for computation of $OL$ for strings described by SLPs.
Lemma 2 ([4]). For any variables $X_i$ and $X_j$ of an SLP $\mathcal{T}$, $OL(X_i, X_j)$ can be represented by $O(n)$ arithmetic progressions.
Theorem 1 ([4]). For any SLP $\mathcal{T}$, $OL(X_i, X_j)$ can be computed in a total of $O(n^4 \log n)$ time and $O(n^3)$ space for all $1 \le i \le n$ and $1 \le j \le n$.
In order to solve Problem 1 it is useful to compute $OL(X_i, Y_j)$ and $OL(Y_j, X_i)$ for each $1 \le i \le n$ and $1 \le j \le m$. In so doing, we produce a new variable $V = X_n Y_m$, that is, $V$ is a concatenation of SLPs $\mathcal{T}$ and $\mathcal{S}$. Then we compute $OL$ for each pair of variables in the new SLP of size $n + m$. On the assumption that $n \ge m$, it takes $O(n^4 \log n)$ time and $O(n^3)$ space in total.

3.2. The FM function

For any two SLP variables $X_i$, $Y_j$ and any integer $k$ with $1 \le k \le |X_i|$, we define the function $FM(X_i, Y_j, k)$, which returns the number of characters that match before the first mismatch when $Y_j$ is compared with $X_i$ starting at position $k$ of $X_i$. Namely, $FM(X_i, Y_j, k)$ equals the length of the longest common prefix of $X_i[k : |X_i|]$ and $Y_j$:

\[ FM(X_i, Y_j, k) = LCPref(X_i[k : |X_i|], Y_j). \]

Example 4. Consider variables $X_6$ = aababaab and $X_5$ = abaab of Example 1. Then $FM(X_6, X_5, 2) = 3$ as $LCPref(X_6[2 : |X_6|], X_5) = LCPref(\text{ababaab}, \text{abaab}) = 3$.
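For reference, $FM$ on uncompressed strings is a plain longest-common-prefix scan. The sketch below is a naive illustration under that assumption (names `lcpref` and `fm` are hypothetical); the paper's point is that the same value can be obtained from the SLP without decompression.

```python
def lcpref(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def fm(x, y, k):
    """Naive FM(X, Y, k) = LCPref(X[k : |X|], Y), with 1-based k."""
    return lcpref(x[k - 1:], y)


if __name__ == "__main__":
    print(fm("aababaab", "abaab", 2))
```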
Lemma 3 ([4]). For any variables $X_i$, $Y_j$ and integer $k$, $FM(X_i, Y_j, k)$ can be computed in $O(n \log n)$ time, provided that $OL(X_{i'}, Y_{j'})$ is already computed for any $1 \le i' \le i$ and $1 \le j' \le j$.

3.3. Efficient computation of longest common substrings

The main idea of our algorithm for computing LCStr (T , S ) is based on the following observation.

Observation 1. For any substring $Z$ of string $T$, there always exists a variable $X_i = X_{\ell_i} X_{r_i}$ of SLP $\mathcal{T}$ such that:

• $Z$ is a substring of $X_i$ and
• $Z$ touches or covers the boundary between $X_{\ell_i}$ and $X_{r_i}$.
Example 5. Consider SLP $\mathcal{T}$ of Example 1 generating $T$ = aababaababaab. Substring baababaab of $T$ is a substring of $X_7 = X_6 X_5$ and covers the boundary between $X_6$ and $X_5$. Substring baab of $T$ is a substring of $X_5 = X_3 X_4$ and covers the boundary between $X_3$ and $X_4$. Substring $T[7]$ = a of $T$ is a substring of $X_3 = X_1 X_2$ and touches the boundary between $X_1$ and $X_2$. (See also Fig. 1.)

It directly follows from Observation 1 that any common substring of strings $T$, $S$ touches or covers both of the boundaries in $X_i$ and $Y_j$ for some $1 \le i \le n$ and $1 \le j \le m$.
For any SLP variables $X_i = X_{\ell_i} X_{r_i}$, $Y_j = Y_{\ell_j} Y_{r_j}$ and any non-negative integer $k$, let $h_1$ and $h_2$ be the maximum values such that $X_i[|X_{\ell_i}| - k - h_1 + 1 : |X_{\ell_i}| + h_2] = Y_j[|Y_{\ell_j}| - h_1 + 1 : |Y_{\ell_j}| + k + h_2]$. That is,

\[ h_1 = LCSuf(X_{\ell_i}[1 : |X_{\ell_i}| - k], Y_{\ell_j}) \quad \text{and} \quad h_2 = LCPref(X_{r_i}, Y_{r_j}[k + 1 : |Y_{r_j}|]). \]

Then let

\[ Ext_{X_i,Y_j}(k) = \begin{cases} k + h_1 + h_2 & \text{if } X_i = X_{\ell_i} X_{r_i} \text{ and } Y_j = Y_{\ell_j} Y_{r_j}, \\ k & \text{if } X_i \text{ or } Y_j \text{ is constant.} \end{cases} \]

For a set $S$ of integers, we define $Ext_{X_i,Y_j}(S) = \{Ext_{X_i,Y_j}(k) \mid k \in S\}$. $Ext_{Y_j,X_i}(k)$ and $Ext_{Y_j,X_i}(S)$ are defined similarly.
The next observation follows from the above arguments (see also Fig. 2):

Observation 2. For any strings $T$ and $S$, $LCStr(T, S)$ equals the maximum element of the set
\[ \bigcup_{1 \le i \le n,\ 1 \le j \le m} \left( Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})) \cup Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})) \cup Ext_{X_i,Y_j}(0) \right). \]

Based on Observation 2, our strategy for computing $LCStr(T, S)$ is to compute $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$, $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$, and $Ext_{X_i,Y_j}(0)$ for each pair of $X_i$ and $Y_j$. Notice that $Ext_{X_i,Y_j}(0)$ can be computed in $O(n \log n)$ time due to Lemma 3, provided that the reversed SLP $\mathcal{T}^R$ and $Occ^{\triangle}(X_i^R, X_j^R)$ are already computed for each pair of variables $X_i^R$ and $X_j^R$ in $\mathcal{T}^R$.
Lemma 4 below shows how to compute $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$ and $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$ using $FM$.
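Observation 2 can be sanity-checked on small inputs by decompressing every variable and evaluating the three kinds of candidates naively. The Python sketch below does exactly that and compares the result with a brute-force longest common substring. It is only a reference for the decomposition, not the polynomial-time algorithm: the rule encoding (a character or a pair of 0-based indices), the second SLP `EXAMPLE_S`, and all function names are assumptions made for this illustration, and it presumes that every variable is reachable from the last one.

```python
EXAMPLE_1 = ["a", "b", (0, 1), (0, 2), (2, 3), (3, 4), (5, 4)]  # aababaababaab
EXAMPLE_S = ["a", "b", (1, 0), (2, 2), (3, 1)]                  # babab (made up)


def expand(slp):
    """Strings derived from all variables (for checking only)."""
    out = []
    for rule in slp:
        out.append(rule if isinstance(rule, str) else out[rule[0]] + out[rule[1]])
    return out


def lcpref(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def lcsuf(a, b):
    return lcpref(a[::-1], b[::-1])


def overlaps(x, y):
    return [k for k in range(1, min(len(x), len(y)) + 1) if x[len(x) - k:] == y[:k]]


def ext(xl, xr, yl, yr, k):
    """Ext_{X,Y}(k) = k + h1 + h2 for X = xl xr and Y = yl yr."""
    h1 = lcsuf(xl[:len(xl) - k], yl)
    h2 = lcpref(xr, yr[k:])
    return k + h1 + h2


def lcstr_by_boundaries(slp_t, slp_s):
    xs, ys = expand(slp_t), expand(slp_s)
    best = 0
    for rx in slp_t:
        if isinstance(rx, str):
            continue  # Ext is k = 0 for constants
        xl, xr = xs[rx[0]], xs[rx[1]]
        for ry in slp_s:
            if isinstance(ry, str):
                continue
            yl, yr = ys[ry[0]], ys[ry[1]]
            cands = [ext(xl, xr, yl, yr, 0)]
            cands += [ext(xl, xr, yl, yr, k) for k in overlaps(xl, yr)]
            cands += [ext(yl, yr, xl, xr, k) for k in overlaps(yl, xr)]
            best = max(best, max(cands))
    return best


def lcstr_bruteforce(t, s):
    return max((lcpref(t[i:], s[j:]) for i in range(len(t)) for j in range(len(s))),
               default=0)


if __name__ == "__main__":
    print(lcstr_by_boundaries(EXAMPLE_1, EXAMPLE_S))
```

Each candidate is the length of an actual common substring of some $X_i$ and $Y_j$, so the maximum can never overshoot; Observation 1 is what guarantees that the true optimum appears among the candidates.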

Lemma 4. For any variables $X_i = X_{\ell_i} X_{r_i}$ and $Y_j = Y_{\ell_j} Y_{r_j}$, we can compute $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$ and $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$ in $O(n^2 \log n)$ time.

Proof. Here we concentrate on computing $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$, as the case of $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$ is symmetric. Let $\langle a, d, t \rangle$ be any of the $O(n)$ arithmetic progressions of $OL(X_{\ell_i}, Y_{r_j})$.
Assume that $t > 1$ and $a < d$. The cases where $t = 1$ or $a = d$ are easier to show. Let $u = Y_{r_j}[1 : a]$ and $v = Y_{r_j}[a + 1 : d]$. For any string $w$, let $w^*$ denote an infinite repetition of $w$, that is, $w^* = www \cdots$.
Let $e_1, e_2$ be the largest integers such that $X_i[|X_{\ell_i}| - e_2 + 1 : |X_{\ell_i}| + e_1]$ is the longest substring of $X_i$ that contains $X_i[|X_{\ell_i}| - d + 1 : |X_{\ell_i}|]$ and has a period $d$. Similarly, let $e_3, e_4$ be the largest integers such that $Y_j[|Y_{\ell_j}| - e_4 + 1 : |Y_{\ell_j}| + e_3]$ is the longest substring of $Y_j$ that contains $Y_j[|Y_{\ell_j}| + 1 : |Y_{\ell_j}| + d]$ and has a period $d$. More formally,

\[ e_1 = LCPref(X_{r_i}, (vu)^*) = \begin{cases} FM(Y_{r_j}, X_{r_i}, a + 1) & \text{if } FM(Y_{r_j}, X_{r_i}, a + 1) < d, \\ FM(X_{r_i}, X_{r_i}, d + 1) + d & \text{otherwise,} \end{cases} \]
\[ e_2 = LCSuf(X_{\ell_i}, (vu)^*) = FM(X_{\ell_i}^R, X_{\ell_i}^R, d + 1) + d, \]
\[ e_3 = LCPref(Y_{r_j}, (uv)^*) = FM(Y_{r_j}, Y_{r_j}, d + 1) + d, \]
\[ e_4 = LCSuf(Y_{\ell_j}, (uv)^*) = \begin{cases} FM(X_{\ell_i}^R, Y_{\ell_j}^R, a + 1) & \text{if } FM(X_{\ell_i}^R, Y_{\ell_j}^R, a + 1) < d, \\ FM(Y_{\ell_j}^R, Y_{\ell_j}^R, d + 1) + d & \text{otherwise.} \end{cases} \]

(See also Fig. 3.) As above, we can compute $e_1, e_2, e_3, e_4$ by at most 6 calls of $FM$.

Fig. 2. Illustration of Observation 2. Each candidate for $LCStr(T, S)$ can be computed by extending either some overlap between $X_{\ell_i}$ and $Y_{r_j}$ or some overlap between $Y_{\ell_j}$ and $X_{r_i}$, or concatenating $LCSuf(X_{\ell_i}, Y_{\ell_j})$ and $LCPref(X_{r_i}, Y_{r_j})$.

Fig. 3. Illustration for the proof of Lemma 4. The dark rectangles represent the overlaps between $X_{\ell_i}$ and $Y_{r_j}$. Case 6 is the special case where cases 4 and 5 happen at the same time and case 3 does not exist.
Let $k \in \langle a, d, t \rangle$. We categorize $Ext_{X_i,Y_j}(k)$ depending on the value of $k$, as follows.

case 1: When $k < \min\{e_3 - e_1, e_2 - e_4\}$. If $k - d \in \langle a, d, t \rangle$, it is not difficult to see $Ext_{X_i,Y_j}(k) = Ext_{X_i,Y_j}(k - d) + d$. Therefore, we have
\[ A = \max\{Ext_{X_i,Y_j}(k) \mid k < \min\{e_3 - e_1, e_2 - e_4\}\} = Ext_{X_i,Y_j}(k'), \]
where $k' = \max\{k \mid k < \min\{e_3 - e_1, e_2 - e_4\}\}$.

case 2: When $k > \max\{e_3 - e_1, e_2 - e_4\}$. If $k + d \in \langle a, d, t \rangle$, it is not difficult to see $Ext_{X_i,Y_j}(k) = Ext_{X_i,Y_j}(k + d) + d$. Therefore, we have
\[ B = \max\{Ext_{X_i,Y_j}(k) \mid k > \max\{e_3 - e_1, e_2 - e_4\}\} = Ext_{X_i,Y_j}(k''), \]
where $k'' = \min\{k \mid k > \max\{e_3 - e_1, e_2 - e_4\}\}$.

case 3: When $\min\{e_3 - e_1, e_2 - e_4\} < k < \max\{e_3 - e_1, e_2 - e_4\}$. In this case we have $Ext_{X_i,Y_j}(k) = \min\{e_1 + e_2, e_3 + e_4\}$ for any such $k$. Thus
\[ C = \max\{Ext_{X_i,Y_j}(k) \mid \min\{e_3 - e_1, e_2 - e_4\} < k < \max\{e_3 - e_1, e_2 - e_4\}\} = \min\{e_1 + e_2, e_3 + e_4\}. \]

case 4: When $k = e_3 - e_1$. In this case we have
\[ D = Ext_{X_i,Y_j}(k) = k + \min\{e_2 - k, e_4\} + LCPref(Y_{r_j}[k + 1 : |Y_{r_j}|], X_{r_i}) = k + \min\{e_2 - k, e_4\} + FM(Y_{r_j}, X_{r_i}, k + 1). \]

case 5: When $k = e_2 - e_4$. In this case we have
\[ E = Ext_{X_i,Y_j}(k) = k + LCSuf(X_{\ell_i}[1 : |X_{\ell_i}| - k], Y_{\ell_j}) + \min\{e_1, e_3 - k\} = k + FM(X_{\ell_i}^R, Y_{\ell_j}^R, k + 1) + \min\{e_1, e_3 - k\}. \]

case 6: When $k = e_3 - e_1 = e_2 - e_4$. In this case we have
\[ F = Ext_{X_i,Y_j}(k) = k + LCSuf(X_{\ell_i}[1 : |X_{\ell_i}| - k], Y_{\ell_j}) + LCPref(Y_{r_j}[k + 1 : |Y_{r_j}|], X_{r_i}) = k + FM(X_{\ell_i}^R, Y_{\ell_j}^R, k + 1) + FM(Y_{r_j}, X_{r_i}, k + 1). \]

Then clearly the following inequality holds (see also Fig. 3):
\[ F \ge \max\{D, E\} \ge C \ge \max\{A, B\}. \tag{1} \]
A membership query to the arithmetic progression $\langle a, d, t \rangle$ can be answered in constant time. Also, an element $k \in \langle a, d, t \rangle$ such that $\min\{e_3 - e_1, e_2 - e_4\} < k < \max\{e_3 - e_1, e_2 - e_4\}$ of case 3 can be found in constant time, if such exists. $k'$ and $k''$ of case 1 and case 2, respectively, can be computed in constant time as well. Therefore, based on inequality (1), we can compute $\max(Ext_{X_i,Y_j}(\langle a, d, t \rangle))$ by at most 2 calls of $FM$, provided that $e_1, e_2, e_3, e_4$ are already computed.
Since $OL(X_{\ell_i}, Y_{r_j})$ contains $O(n)$ arithmetic progressions by Lemma 2, and each call of $FM$ takes $O(n \log n)$ time by Lemma 3, $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$ can be computed in $O(n^2 \log n)$ time.
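The constant-time queries on an arithmetic progression used in the proof can be sketched as follows. This is an illustrative Python fragment, assuming a progression $\langle a, d, t \rangle$ with $d \ge 1$; the class `AP` and the function names are hypothetical and not taken from the paper.

```python
from typing import NamedTuple, Optional


class AP(NamedTuple):
    """Arithmetic progression <a, d, t> = {a + (i - 1) d | 1 <= i <= t}, d >= 1."""
    a: int
    d: int
    t: int


def contains(ap: AP, k: int) -> bool:
    """Membership query in O(1)."""
    off = k - ap.a
    return ap.t > 0 and off >= 0 and off % ap.d == 0 and off // ap.d < ap.t


def smallest_above(ap: AP, lo: int) -> Optional[int]:
    """Smallest element strictly greater than lo (plays the role of k'')."""
    if ap.t == 0:
        return None
    idx = 0 if lo < ap.a else (lo - ap.a) // ap.d + 1
    return ap.a + idx * ap.d if idx < ap.t else None


def largest_below(ap: AP, hi: int) -> Optional[int]:
    """Largest element strictly smaller than hi (plays the role of k')."""
    if ap.t == 0 or hi <= ap.a:
        return None
    idx = min(ap.t - 1, (hi - ap.a - 1) // ap.d)
    return ap.a + idx * ap.d


def element_between(ap: AP, lo: int, hi: int) -> Optional[int]:
    """Some element k with lo < k < hi, if one exists (case 3)."""
    c = smallest_above(ap, lo)
    return c if c is not None and c < hi else None


if __name__ == "__main__":
    ap = AP(3, 4, 5)  # {3, 7, 11, 15, 19}
    print(contains(ap, 11), largest_below(ap, 11), smallest_above(ap, 11))
```

All four helpers use only a constant number of integer operations, which is what keeps the per-progression cost in Lemma 4 dominated by the calls of $FM$.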

A pseudo-code of our algorithm is given in Algorithm 1.

Algorithm 1: Computing $LCStr(T, S)$.

Input: SLPs $\mathcal{T} = \{X_i\}_{i=1}^{n}$, $\mathcal{S} = \{Y_j\}_{j=1}^{m}$
Output: Length of longest common substring of strings $T$ and $S$
1  for $i = 1$ to $n$ do
2      for $j = 1$ to $m$ do
3          compute $OL(X_i, Y_j)$ and $OL(Y_j, X_i)$;
4
5  $L = \emptyset$;
6  for $i = 1$ to $n$ do
7      for $j = 1$ to $m$ do
8          $L = L \cup \max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j}))) \cup \max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i}))) \cup Ext_{X_i,Y_j}(0)$;
9
10 return $\max(L)$;

Now we obtain the main result of this section.

Fig. 4. Illustration of Observation 3. Any maximal palindrome of $X_i$ is a non-suffix maximal palindrome of $X_\ell$ (like $p_1$), a maximal palindrome of $X_i$ covering or touching the boundary of $X_i$ (like $p_2$), or a non-prefix maximal palindrome of $X_r$ (like $p_3$).

Theorem 2. Algorithm 1 solves Problem 1 in $O(n^4 \log n)$ time with $O(n^3)$ space.
Proof. The correctness of the algorithm is clear from lines 6–10, which correspond to Observation 2.
It follows from Theorem 1 that lines 1–4 take $O(n^4 \log n)$ time and $O(n^3)$ space.
For any variables $X_i = X_{\ell_i} X_{r_i}$ and $Y_j = Y_{\ell_j} Y_{r_j}$, $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$ and $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$ can be computed in $O(n^2 \log n)$ time by Lemma 4. Since each of $\max(Ext_{X_i,Y_j}(OL(X_{\ell_i}, Y_{r_j})))$ and $\max(Ext_{Y_j,X_i}(OL(Y_{\ell_j}, X_{r_i})))$ is a singleton, we have $|L| = O(n^2)$. Hence lines 6–10 take $O(n^4 \log n)$ time.
Overall, the algorithm works in $O(n^4 \log n)$ time with $O(n^3)$ space.
The following corollary is immediate from Theorem 2.
Corollary 1. Given two SLPs $\mathcal{T}$ and $\mathcal{S}$ describing strings $T$ and $S$ respectively, the beginning and ending positions of a longest common substring of $T$ and $S$ can be computed in $O(n^4 \log n)$ time with $O(n^3)$ space.

4. Computing palindromes from SLP compressed strings

In this section we present an efficient algorithm that computes a succinct representation of all maximal palindromes of string $T$, when its corresponding SLP $\mathcal{T}$ is given as input. The algorithm runs in $O(n^4)$ time and $O(n^2)$ space, where $n$ is the size of the input SLP $\mathcal{T}$.

4.1. The problem

For any string $T$, let $Pals(T)$ denote the set of pairs of the beginning and ending positions of all maximal palindromes in $T$, namely,
\[ Pals(T) = \{(p, q) \mid T[p : q] \text{ is the maximal palindrome centered at } \lfloor \tfrac{p+q}{2} \rfloor\}. \]
Note that $|Pals(T)| = O(|T|) = O(2^n)$. Thus we consider a succinct representation of $Pals(T)$ in the sequel.

Let $PPals(T)$ and $SPals(T)$ denote the set of pairs of the beginning and ending positions of the prefix and suffix palindromes of $T$, respectively, that is,
\[ PPals(T) = \{(1, q) \in Pals(T) \mid 1 \le q \le |T|\}, \text{ and} \]
\[ SPals(T) = \{(p, |T|) \in Pals(T) \mid 1 \le p \le |T|\}. \]

Example 6. For string T = aababaababaab, PPals(T ) = {(1, q) | q ∈ {1, 2, 7, 12}}, since a, aa, aababaa, and
aababaababaa are prefix palindromes. Also, SPals(T ) = {(p, 13) | p ∈ {5, 10, 13}}, since baababaab, baab and b are
suffix palindromes.
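The sets of Example 6 can be reproduced with a direct check on the uncompressed string. The following Python sketch is a naive quadratic-time reference (function names are assumptions for illustration), using 1-based inclusive position pairs as in the text.

```python
def is_palindrome(s):
    return s == s[::-1]


def ppals(t):
    """Pairs (1, q) such that T[1 : q] is a prefix palindrome."""
    return [(1, q) for q in range(1, len(t) + 1) if is_palindrome(t[:q])]


def spals(t):
    """Pairs (p, |T|) such that T[p : |T|] is a suffix palindrome."""
    n = len(t)
    return [(p, n) for p in range(1, n + 1) if is_palindrome(t[p - 1:])]


if __name__ == "__main__":
    print(ppals("aababaababaab"))
    print(spals("aababaababaab"))
```

Every prefix palindrome is automatically maximal for its center because it starts at position 1, and likewise for suffix palindromes, so no extension step is needed here.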
It is easy to see that for any non-empty string T , PPals(T ), SPals(T ) and Pals(T ) are non-empty sets.
Let $X_i$ denote a variable in $\mathcal{T}$ for $1 \le i \le n$. For any variable $X_i = X_\ell X_r$, let $Pals^{\triangle}(X_i)$ be the set of pairs of beginning and ending positions of maximal palindromes of $X_i$ that cover or touch the boundary between $X_\ell$ and $X_r$, namely,
\[ Pals^{\triangle}(X_i) = \{(p, q) \in Pals(X_i) \mid 1 \le p \le |X_\ell| + 1,\ |X_\ell| \le q \le |X_i|,\ p \le q\}. \]
Example 7. Consider variable $X_6 = X_4 X_5$ = aababaab of Example 1, where $X_4$ = aab and $X_5$ = abaab. $Pals^{\triangle}(X_6) = \{(2, 4), (1, 7), (4, 6)\}$ since $X_6[2 : 4]$ = aba, $X_6[1 : 7]$ = aababaa, and $X_6[4 : 6]$ = aba are the maximal palindromes that touch or cover the boundary of $X_4$ and $X_5$.
We have the following observation for the decomposition of Pals(Xi ) (see Fig. 4).
Fig. 5. $(1, q) \in PPals(X)$ implies $X[i : j] = X[q - j + 1 : q - i + 1]^R$.

Observation 3. For any variable $X_i = X_\ell X_r$,

\[ Pals(X_i) = (Pals(X_\ell) - SPals(X_\ell)) \cup Pals^{\triangle}(X_i) \cup ((Pals(X_r) - PPals(X_r)) \oplus |X_\ell|). \]

Thus, the desired output $Pals(T) = Pals(X_n)$ can be represented as a combination of $\{Pals^{\triangle}(X_i)\}_{i=1}^{n}$, $\{PPals(X_i)\}_{i=1}^{n}$ and $\{SPals(X_i)\}_{i=1}^{n}$. Therefore, computing $Pals(T)$ is reduced to computing $Pals^{\triangle}(X_i)$, $PPals(X_i)$ and $SPals(X_i)$, for every $i = 1, 2, \ldots, n$. The problem to be tackled in this section follows:
Problem 2. Given an SLP $\mathcal{T}$ of size $n$, compute succinct representations of $\{Pals^{\triangle}(X_i)\}_{i=1}^{n}$, $\{PPals(X_i)\}_{i=1}^{n}$ and $\{SPals(X_i)\}_{i=1}^{n}$.

Note that the sizes of $\{Pals^{\triangle}(X_i)\}_{i=1}^{n}$, $\{PPals(X_i)\}_{i=1}^{n}$ and $\{SPals(X_i)\}_{i=1}^{n}$ can be $O(2^n)$. Thus we output succinct representations of these sets which are polynomial in $n$. In the following sections we show how to succinctly represent and compute these sets.
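To make the decomposition of Observation 3 concrete, the sketch below enumerates all maximal palindromes of an uncompressed string by expanding around every odd and even center, filters those that touch or cover a boundary, and checks the decomposition on the variable $X_6 = X_4 X_5$ of Example 7. It is a naive reference in Python; the function names are assumptions made for illustration, and positions are 1-based inclusive pairs as in the text.

```python
def pals(t):
    """All maximal palindromes of t as 1-based inclusive pairs (p, q)."""
    n = len(t)
    out = set()
    for c in range(n):
        # (c, c) is an odd center, (c, c + 1) an even center (0-based).
        for lo, hi in ((c, c), (c, c + 1)):
            while lo >= 0 and hi < n and t[lo] == t[hi]:
                lo -= 1
                hi += 1
            lo, hi = lo + 1, hi - 1  # step back to the last matching pair
            if lo <= hi:             # skip empty even palindromes
                out.add((lo + 1, hi + 1))
    return out


def shift(pairs, k):
    """U (+) k = {(i + k, j + k) | (i, j) in U}."""
    return {(p + k, q + k) for (p, q) in pairs}


def pals_boundary(xl, xr):
    """Maximal palindromes of xl + xr touching or covering the boundary."""
    b = len(xl)
    return {(p, q) for (p, q) in pals(xl + xr) if p <= b + 1 and q >= b}


def pals_by_observation_3(xl, xr):
    spals_l = {(p, q) for (p, q) in pals(xl) if q == len(xl)}
    ppals_r = {(p, q) for (p, q) in pals(xr) if p == 1}
    return ((pals(xl) - spals_l)
            | pals_boundary(xl, xr)
            | shift(pals(xr) - ppals_r, len(xl)))


if __name__ == "__main__":
    print(sorted(pals_boundary("aab", "abaab")))
```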

4.2. Succinct representations of PPals(X ) and SPals(X )

Gąsieniec et al. [2] claimed that $PPals(X)$ and $SPals(X)$ can be represented by $O(\log |X|)$ arithmetic progressions for any string $X$. However, they gave no proof of this claim. Although they stated that a proof would be given in a full version of the paper, unfortunately it has never appeared. This section supplies a full proof that $PPals(X)$ and $SPals(X)$ can be represented by $O(\log |X|)$ arithmetic progressions.
Let us focus on the space requirement of PPals(X ), as that of SPals(X ) can be shown similarly. Recall that PPals(X ) is the
set of pairs of the beginning and ending positions of all prefix palindromes of X .
The following lemma is obvious but is quite helpful to prove Lemma 6.
Lemma 5. For any integer $q$ such that $(1, q) \in PPals(X)$ and any integers $i, j$ with $1 \le i < j \le q$, we have $X[i : j] = X[q - j + 1 : q - i + 1]^R$.
Proof. Since $(1, q)$ is a prefix palindrome of $X$, we have $X[i] = X[q - i + 1]$ for any $i$ with $1 \le i \le q$, which implies that:
\[ X[i : j] = X[i]\, X[i + 1] \cdots X[j - 1]\, X[j] \]
\[ = X[q - i + 1]\, X[q - i] \cdots X[q - j + 2]\, X[q - j + 1] \]
\[ = (X[q - j + 1]\, X[q - j + 2] \cdots X[q - i]\, X[q - i + 1])^R \]
\[ = X[q - j + 1 : q - i + 1]^R \]
(see also Fig. 5).

Lemma 6. For any positive integers $a$ and $d$, if $(1, a), (1, a + d) \in PPals(X)$ and $a - d \ge 0$, then $(1, a - d) \in PPals(X)$.
Proof. We show $X[1 : a - d] = X[1 : a - d]^R$, which yields that $a - d$ is the length of a prefix palindrome in $X$. By applying Lemma 5, we have
\[ X[1 : a - d] = X[a - (a - d) + 1 : a - 1 + 1]^R \tag{2} \]
\[ = X[d + 1 : a]^R \]
\[ = (X[(a + d) - a + 1 : (a + d) - (d + 1) + 1]^R)^R \tag{3} \]
\[ = X[d + 1 : a] \]
\[ = X[1 : a - d]^R, \]
where Eq. (2) comes from $(1, a) \in PPals(X)$, whereas Eq. (3) comes from $(1, a + d) \in PPals(X)$ (see also Fig. 6).
Let $a_1, a_2, \ldots, a_k$ be the sequence of integers in increasing order, such that $PPals(X) = \{(1, a_1), (1, a_2), \ldots, (1, a_k)\}$. We define $d_i$ as the progression differences for $a_i$, that is, $d_i = a_{i+1} - a_i$ for $1 \le i < k$. The next lemma states that the sequence $\{d_i\}_{i=1}^{k-1}$ is monotonically non-decreasing.
Lemma 7. $d_i \le d_{i+1}$ for any $1 \le i < k - 1$.
Proof. Suppose $d_i > d_{i+1}$ holds for some $1 \le i < k - 1$. Since $(1, a_{i+1}) \in PPals(X)$ and $(1, a_{i+2}) = (1, a_{i+1} + d_{i+1}) \in PPals(X)$, Lemma 6 claims that $(1, a_{i+1} - d_{i+1}) \in PPals(X)$. However, $a_i = a_{i+1} - d_i < a_{i+1} - d_{i+1} < a_{i+1}$, which contradicts the definition that $(1, a_{i+1})$ is the next element to $(1, a_i)$ in $PPals(X)$ in increasing order (see also Fig. 7).
Fig. 6. $(1, a) \in PPals(X)$ and $(1, a + d) \in PPals(X)$ imply $(1, a - d) \in PPals(X)$.

Fig. 7. $d_i > d_{i+1}$ contradicts the definition of $\{a_i\}_{i=1}^{k}$.

Lemma 8. If $d_{i+1} \ne d_i$, then $d_{i+1} \ge d_i + d_{i-1}$.

Proof. By Lemma 6, we have $(1, a_{i+1} - d_{i+1}) \in PPals(X)$ since $(1, a_{i+1}) \in PPals(X)$ and $(1, a_{i+2}) = (1, a_{i+1} + d_{i+1}) \in PPals(X)$. Therefore, $a_{i+1} - d_{i+1} = a_j$ for some $1 \le j \le i$, so that $d_{i+1} = a_{i+1} - a_j = \sum_{\ell=j}^{i} (a_{\ell+1} - a_\ell) = \sum_{\ell=j}^{i} d_\ell$. If $d_{i+1} \ne d_i$, we have $j < i$, which implies $d_{i+1} = \sum_{\ell=j}^{i} d_\ell \ge d_i + d_{i-1}$.

The following is a key lemma of this subsection:

Lemma 9. For any variable X , PPals(X ) and SPals(X ) can be represented by O(log |X |) arithmetic progressions.

Proof. We show that $PPals(X)$ can be represented by $O(\log |X|)$ arithmetic progressions. The case of $SPals(X)$ can be proved similarly.
It follows from Lemma 6 that, for any positive integer $r$ such that $a_i - r d_i > 0$, we have $(1, a_i - r d_i) \in PPals(X)$. For any $a_i$ and $d_i$, let $t_i = \max\{y \mid a_i - (y - 1) d_i > 0\}$ and $a'_i = a_i - (t_i - 1) d_i$. That is, $a'_i$ is the smallest element of the arithmetic progression $\langle a'_i, d_i, t_i \rangle$. Then, if $d_i = d_{i+1}$, it holds that $\langle a'_i, d_i, t_i \rangle \cup \{a_{i+1}\} = \langle a'_{i+1}, d_{i+1}, t_{i+1} \rangle$. For any integers $p, q$ and any arithmetic progression $\langle a, d, t \rangle$ such that $p \le a$ and $q \ge a + (t - 1) d$, let

\[ (p, \langle a, d, t \rangle) = \{(p, a + (i - 1) d) \mid 1 \le i \le t\}, \text{ and} \]
\[ (\langle a, d, t \rangle, q) = \{(a + (i - 1) d, q) \mid 1 \le i \le t\}. \]

Then we have
\[ PPals(X) = \bigcup_{1 \le i \le n} (1, \langle a'_i, d_i, t_i \rangle) = \bigcup_{i \in \{i \mid d_i \ne d_{i+1}\}} (1, \langle a'_i, d_i, t_i \rangle). \]
The worst case scenario in terms of the number of arithmetic progressions in $PPals(X)$ is that $d_i \ne d_{i+1}$ for each $i$. By Lemma 8, the actual worst case is given by the following sequence $\{d_i\}_{i=1}^{k-1}$:

\[ d_i = \begin{cases} 2 & \text{for } i = 1, \\ 3 & \text{for } i = 2, \\ d_{i-1} + d_{i-2} & \text{for } i > 2. \end{cases} \]

Now, let $F_j$ denote the $j$-th Fibonacci number, namely,

\[ F_j = \begin{cases} 1 & \text{for } j = 1, 2, \\ F_{j-1} + F_{j-2} & \text{for } j > 2. \end{cases} \]

It is a well-known fact that $F_i = \frac{\varphi^i - (1 - \varphi)^i}{\sqrt{5}} = \lfloor \frac{\varphi^i}{\sqrt{5}} + \frac{1}{2} \rfloor$, where $\varphi = \frac{\sqrt{5} + 1}{2}$.

Clearly $d_i = F_{i+2}$. Therefore, the general term of $\{a_i\}$ can be represented as follows:
\[ a_i = a_{i-1} + d_{i-1} = a_{i-2} + d_{i-2} + d_{i-1} = \cdots = a_1 + \sum_{k=1}^{i-1} d_k = a_1 + \sum_{k=3}^{i+1} F_k \]
\[ = a_1 + \sum_{k=1}^{i+1} F_k - F_1 - F_2 = 1 + F_{i+1+2} - 1 - 1 - 1 = F_{i+3} - 2. \]

Now we have the following formula for the largest element $a_k$ of $\{a_i\}_{i=1}^{k}$:
\[ a_k = F_{k+3} - 2 = \left\lfloor \frac{\varphi^{k+3}}{\sqrt{5}} + \frac{1}{2} \right\rfloor - 2 > \frac{\varphi^{k+3}}{\sqrt{5}} + \frac{1}{2} - 1 - 2. \]
Since $a_k \le |X|$ and $\varphi > 1$, we have that $k = O(\log_\varphi |X|) = O(\log |X|)$.
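The structure behind Lemmas 7 to 9 can be observed on small uncompressed strings. The Python sketch below lists the prefix palindrome lengths $a_1 < \cdots < a_k$, checks that the differences $d_i$ are non-decreasing (Lemma 7), and greedily groups the lengths into runs with a constant difference. The greedy grouping is a simplification chosen for this illustration: it is not the exact $\langle a'_i, d_i, t_i \rangle$ construction of the proof, and all function names are hypothetical.

```python
def prefix_palindrome_lengths(x):
    """Lengths q such that X[1 : q] is a palindrome, in increasing order."""
    return [q for q in range(1, len(x) + 1) if x[:q] == x[:q][::-1]]


def differences_nondecreasing(values):
    """Check d_i <= d_{i+1} for d_i = a_{i+1} - a_i."""
    d = [b - a for a, b in zip(values, values[1:])]
    return all(u <= v for u, v in zip(d, d[1:]))


def group_into_progressions(values):
    """Greedily split an increasing list into runs (a, d, t) with a constant
    difference; a run with a single element gets d = 0."""
    runs = []
    i = 0
    while i < len(values):
        if i + 1 == len(values):
            runs.append((values[i], 0, 1))
            break
        d = values[i + 1] - values[i]
        j = i + 1
        while j + 1 < len(values) and values[j + 1] - values[j] == d:
            j += 1
        runs.append((values[i], d, j - i + 1))
        i = j + 1
    return runs


if __name__ == "__main__":
    lens = prefix_palindrome_lengths("aababaababaab")
    print(lens, group_into_progressions(lens))
```

For the string of Example 1 the lengths 1, 2, 7, 12 have differences 1, 5, 5, so two runs suffice.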

4.3. Efficient computation of Pals4 (Xi ), PPals(Xi ) and SPals(Xi )

In this section we show how to efficiently compute $Pals^{\triangle}(X_i)$, $PPals(X_i)$ and $SPals(X_i)$.
The next lemma points out that $SPals(X_\ell)$ and $PPals(X_r)$ are useful to compute $Pals^{\triangle}(X_i)$.
Lemma 10. For any variable $X_i = X_\ell X_r$ and any $(p, q) \in Pals^{\triangle}(X_i)$, there exists an integer $l \ge 0$ such that $(p + l, q - l) \in SPals(X_\ell) \cup (PPals(X_r) \oplus |X_\ell|) \cup \{(|X_\ell|, |X_\ell| + 1)\}$.
Proof. Since $X_i[p : q]$ is a palindrome, $X_i[p + l : q - l]$ is also a palindrome for any $0 \le l < \lfloor \frac{p+q}{2} \rfloor$. Then we have the following three cases:
(1) When $\lfloor \frac{p+q}{2} \rfloor < |X_\ell|$, for $l = q - |X_\ell|$, we have $(p + l, q - l) \in SPals(X_\ell)$.
(2) When $\lfloor \frac{p+q}{2} \rfloor > |X_\ell|$, for $l = |X_\ell| - p + 1$, we have $(p + l, q - l) \in PPals(X_r) \oplus |X_\ell|$.
(3) When $\lfloor \frac{p+q}{2} \rfloor = |X_\ell|$, if $q - p + 1$ is odd, then the same arguments as in case 1 apply, since $X_\ell[|X_\ell|] = X_\ell[|X_\ell|]^R$ and $(|X_\ell|, |X_\ell|) \in SPals(X_\ell)$. If $q - p + 1$ is even, let $l = |X_\ell| - p$. In this case, we have $p + q = 2|X_\ell| + 1$. Thus, $p + l = |X_\ell|$ and $q - l = |X_\ell| + 1$.
By Lemma 10, $Pals^{\triangle}(X_i)$ can be computed by "extending" all palindromes in $SPals(X_\ell)$ and $PPals(X_r)$ to the maximal within $X_i$, and finding the maximal even palindromes centered at $|X_\ell|$ in $X_i$. In so doing, for any (maximal or non-maximal) palindrome $P = X_i[p : q]$, we define function $Ext_{X_i}$ as
\[ Ext_{X_i}(p, q) = (p - h, q + h), \]
where $h \ge 0$ and $X_i[p - h : q + h]$ is the maximal palindrome centered at position $\lfloor \frac{p+q}{2} \rfloor$ in $X_i$. For any $p, q$ with $X_i[p : q]$ not being a palindrome, we leave $Ext_{X_i}(p, q)$ undefined. The function $Ext_{X_i}$ has the following natural properties:
• the input and output palindromes are centered at the same position;
• if $|P| = q - p + 1$ is odd, then $Ext_{X_i}(p, q)$ is also an odd palindrome;
• if $|P| = q - p + 1$ is even, then $Ext_{X_i}(p, q)$ is also an even palindrome.
For a set $S$ of pairs of integers, let $Ext_{X_i}(S) = \{Ext_{X_i}(p, q) \mid (p, q) \in S\}$.
Let
\[ Pals^{*}(X_i) = \{(|X_\ell| - k + 1, |X_\ell| + k) \in Pals(X_i) \mid k \ge 1\}. \]
The next observation gives us a procedure to compute $Pals^{\triangle}(X_i)$.
Observation 4. For any variable $X_i = X_\ell X_r$,
\[ Pals^{\triangle}(X_i) = Ext_{X_i}(SPals(X_\ell)) \cup Ext_{X_i}(PPals(X_r) \oplus |X_\ell|) \cup Pals^{*}(X_i). \tag{4} \]
See also Fig. 8, which illustrates Observation 4.
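Eq. (4) can be tried out on uncompressed strings with a character-by-character extension. The Python sketch below is a naive stand-in for the $FM$-based computation: `ext_pal` extends a palindrome one position at a time, and `pals_star` takes the longest common prefix of $X_r$ and the reversal of $X_\ell$ to find the even palindrome centered at the boundary. All function names are assumptions made for illustration.

```python
def lcpref(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def ext_pal(x, p, q):
    """Ext_X(p, q): extend the palindrome X[p : q] (1-based, inclusive) to the
    maximal palindrome with the same center; None if X[p : q] is not one."""
    s = x[p - 1:q]
    if not s or s != s[::-1]:
        return None
    while p > 1 and q < len(x) and x[p - 2] == x[q]:
        p -= 1
        q += 1
    return (p, q)


def ppals(x):
    return [(1, q) for q in range(1, len(x) + 1) if x[:q] == x[:q][::-1]]


def spals(x):
    n = len(x)
    return [(p, n) for p in range(1, n + 1) if x[p - 1:] == x[p - 1:][::-1]]


def pals_star(xl, xr):
    """Maximal even palindrome centered at the boundary, if non-empty."""
    k = lcpref(xr, xl[::-1])
    return set() if k == 0 else {(len(xl) - k + 1, len(xl) + k)}


def pals_boundary(xl, xr):
    """Right-hand side of Eq. (4) for X = xl + xr."""
    x, b = xl + xr, len(xl)
    out = {ext_pal(x, p, q) for (p, q) in spals(xl)}
    out |= {ext_pal(x, p + b, q + b) for (p, q) in ppals(xr)}
    return out | pals_star(xl, xr)


if __name__ == "__main__":
    print(sorted(pals_boundary("aab", "abaab")))
```

On $X_6 = X_4 X_5$ of Example 7 the even palindrome at the boundary is empty, so the result comes entirely from extending the single suffix palindrome of $X_4$ and the two prefix palindromes of $X_5$.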
In what follows we show how to efficiently execute the $\mathit{Ext}$ functions in Eq. (4). Let us first briefly recall the work of [9,6].
For any variables $X_i = X_\ell X_r$ and $X_j$, we define the set $\mathit{Occ}^{\triangle}(X_i, X_j)$ of all occurrences of $X_j$ that cover or touch the boundary between $X_\ell$ and $X_r$, namely,
\[ \mathit{Occ}^{\triangle}(X_i, X_j) = \{s > 0 \mid X_i[s : s + |X_j| - 1] = X_j,\ |X_\ell| - |X_j| + 1 \le s \le |X_\ell|\}. \]
Theorem 3 ([6]). $\mathit{Occ}^{\triangle}(X_i, X_j)$ can be computed for all variables $X_i$ and $X_j$ in a total of $O(n^3)$ time and $O(n^2)$ space.
Lemma 11 ([9]). For any variables $X_i$, $X_j$ and integer $k$, $\mathit{FM}(X_i, X_j, k)$ can be computed in $O(n^2)$ time, provided that $\mathit{Occ}^{\triangle}(X_{i'}, X_{j'})$ is already computed for any $1 \le i' \le i$ and $1 \le j' \le j$.
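To make the two primitives concrete, here is a minimal Python sketch that evaluates them directly on explicit strings. It follows the definition $\mathit{FM}(X_i, Y_j, k) = \mathit{LCPref}(X_i[k : |X_i|], Y_j)$ from Section 3.2 and the definition of $\mathit{Occ}^{\triangle}$ above. It is a naive reference under the assumption that the strings are available uncompressed, not the compressed-domain procedures of [6,9]; the function names are hypothetical.

```python
def lcpref(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def fm(x: str, y: str, k: int) -> int:
    """FM(X, Y, k) = LCPref(X[k:|X|], Y), with k 1-indexed."""
    return lcpref(x[k - 1:], y)


def occ_boundary(x_left: str, x_right: str, pattern: str) -> list:
    """Start positions s > 0 of pattern in X_l X_r with |X_l| - |pattern| + 1 <= s <= |X_l|."""
    x = x_left + x_right
    m, left_len = len(pattern), len(x_left)
    first = max(1, left_len - m + 1)
    return [s for s in range(first, left_len + 1) if x[s - 1:s - 1 + m] == pattern]
```

A naive evaluation like this takes time proportional to the uncompressed lengths, which is exactly what the compressed-domain bounds of Theorem 3 and Lemma 11 avoid.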
Lemma 12. For any variable $X_i = X_\ell X_r$ and any arithmetic progression $\langle a, d, t \rangle$ with $(1, \langle a, d, t \rangle) \subseteq \mathit{PPals}(X_r)$, $\mathit{Ext}_{X_i}((1, \langle a, d, t \rangle))$ can be represented by at most 2 arithmetic progressions and a pair of the beginning and ending positions of a maximal palindrome, and can be computed by at most 4 calls of $\mathit{FM}$. The same holds for any arithmetic progression $\langle a', d', t' \rangle$ with $(\langle a', d', t' \rangle, |X_\ell|) \subseteq \mathit{SPals}(X_\ell)$.
The above lemma essentially follows from Lemma 3.4 of [1]. However, for the sake of completeness we supply a full proof of the lemma in the Appendix.

Fig. 8. Illustration of Observation 4. Any element of $\mathit{Pals}^{\triangle}(X_i)$ can be computed by extending either some suffix palindrome in $\mathit{SPals}(X_\ell)$ or some prefix palindrome in $\mathit{PPals}(X_r)$, or it is the maximal even palindrome centered at $|X_\ell|$ in $X_i$.

Fig. 9. Illustration of Observation 5. Any element of $\mathit{PPals}(X_i)$ is either an element of $\mathit{PPals}(X_\ell)$ or an element of $\mathit{Pals}^{\triangle}(X_i)$ whose beginning position is 1. Similar arguments hold for $\mathit{SPals}(X_i)$.

We are now ready to prove the following lemma:

Lemma 13. For any variable $X_i = X_\ell X_r$, $\mathit{Pals}^{\triangle}(X_i)$ can be represented by $O(\log |X_i|)$ arithmetic progressions and can be computed in $O(n^2 \log |X_i|)$ time.
Proof. Recall Observation 4. It is clear from the definition that $\mathit{Pals}^{*}(X_i)$ is either a singleton or empty. When it is a singleton, it consists of the maximal even palindrome centered at $|X_\ell|$. Let $k = \mathit{FM}(X_r, X_\ell^R, 1)$. Then we have
\[ \mathit{Pals}^{*}(X_i) = \begin{cases} \emptyset & \text{if } k = 0, \\ \{(|X_\ell| - k + 1, |X_\ell| + k)\} & \text{otherwise.} \end{cases} \]
Due to Lemma 11, $\mathit{Pals}^{*}(X_i)$ can be computed in $O(n^2)$ time.
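The case distinction above can be sketched in a few lines of Python. The sketch assumes explicit strings for $X_\ell$ and $X_r$ and replaces the call $\mathit{FM}(X_r, X_\ell^R, 1)$ by a naive longest-common-prefix computation; `pals_star` and `lcpref` are hypothetical names.

```python
def lcpref(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def pals_star(x_left: str, x_right: str) -> set:
    """Pals*(X_i) for X_i = X_l X_r: the maximal even palindrome centered at |X_l|, if any."""
    k = lcpref(x_right, x_left[::-1])  # naive stand-in for FM(X_r, X_l^R, 1)
    left_len = len(x_left)
    return set() if k == 0 else {(left_len - k + 1, left_len + k)}
```

With $X_\ell$ = `xab` and $X_r$ = `bay`, the common prefix of `bay` and the reversal `bax` has length 2, so the result is the pair delimiting `abba` in `xabbay`.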
Now we consider $\mathit{Ext}_{X_i}(\mathit{PPals}(X_r) \oplus |X_\ell|)$. It follows from Lemma 12 that each subset $\mathit{Ext}_{X_i}((1, \langle a, d, t \rangle)) \subseteq \mathit{Ext}_{X_i}(\mathit{PPals}(X_r) \oplus |X_\ell|)$ can be represented by $O(1)$ arithmetic progressions. Also, $\mathit{Ext}_{X_i}((1, \langle a, d, t \rangle))$ can be computed in $O(n^2)$ time due to Lemma 11 and Lemma 12. It follows from Lemma 9 that $\mathit{PPals}(X_r)$ consists of $O(\log |X_r|)$ arithmetic progressions. Thus $\mathit{Ext}_{X_i}(\mathit{PPals}(X_r) \oplus |X_\ell|)$ can be computed in $O(n^2 \log |X_r|)$ time. Similar arguments hold for $\mathit{Ext}_{X_i}(\mathit{SPals}(X_\ell))$.
Hence, by Observation 4, $\mathit{Pals}^{\triangle}(X_i)$ can be represented by $O(\log |X_i|)$ arithmetic progressions and can be computed in $O(n^2 \log |X_i|)$ time.
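On the other hand, $\mathit{PPals}(X_i)$ and $\mathit{SPals}(X_i)$ can be computed using $\mathit{Pals}^{\triangle}(X_i)$, as follows:
Observation 5. For any variable $X_i = X_\ell X_r$,
\[ \mathit{PPals}(X_i) = (\mathit{PPals}(X_\ell) - (1, |X_\ell|)) \cup \{(1, q) \in \mathit{Pals}^{\triangle}(X_i)\} \text{ and} \]
\[ \mathit{SPals}(X_i) = ((\mathit{SPals}(X_r) - (1, |X_r|)) \oplus |X_\ell|) \cup \{(p, |X_i|) \in \mathit{Pals}^{\triangle}(X_i)\}. \]
See also Fig. 9, which illustrates Observation 5.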

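Algorithm 2: Computing succinct representation of $\mathit{Pals}(T)$.

Input: SLP $\mathcal{T} = \{X_i\}_{i=1}^{n}$
Output: Succinct representation of $\mathit{Pals}(T)$ for string $T$
1 for $i = 1$ to $n$ do
2     for $j = 1$ to $n$ do
3         compute $\mathit{Occ}^{\triangle}(X_i, X_j)$;
4
5 for $i = 1$ to $n$ do
6     $\mathit{SPals}(X_i) = \emptyset$; $\mathit{PPals}(X_i) = \emptyset$; $\mathit{Pals}^{\triangle}(X_i) = \emptyset$;
7 for $i = 1$ to $n$ do
8     if $X_i = a$ then /* $X_i$ is constant */
9         $\mathit{SPals}(X_i) = \langle 1, 1, 1 \rangle$; $\mathit{PPals}(X_i) = \langle 1, 1, 1 \rangle$; $\mathit{Pals}^{\triangle}(X_i) = \langle 1, 1, 1 \rangle$;
10     else /* $X_i = X_\ell X_r$ */
11         $\mathit{Pals}^{\triangle}(X_i) = \mathit{Ext}_{X_i}(\mathit{SPals}(X_\ell)) \cup \mathit{Ext}_{X_i}(\mathit{PPals}(X_r) \oplus |X_\ell|) \cup \mathit{Pals}^{*}(X_i)$;
12         $\mathit{PPals}(X_i) = \mathit{PPals}(X_\ell) \cup \{(1, q) \in \mathit{Pals}^{\triangle}(X_i)\}$;
13         $\mathit{SPals}(X_i) = (\mathit{SPals}(X_r) \oplus |X_\ell|) \cup \{(p, |X_i|) \in \mathit{Pals}^{\triangle}(X_i)\}$;
14
15 return $\{\mathit{Pals}^{\triangle}(X_i)\}_{i=1}^{n}$, $\{\mathit{SPals}(X_i)\}_{i=1}^{n}$, $\{\mathit{PPals}(X_i)\}_{i=1}^{n}$;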
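The recurrences of lines 8 to 13 can be checked on small inputs with the following Python sketch. It is emphatically not the algorithm of this paper: it decompresses every variable and stores explicit sets of pairs, so its cost grows with the uncompressed length, whereas Algorithm 2 works on arithmetic progressions and FM calls. The encoding of the SLP as a list `rules` (a character, or a pair of earlier 0-based indices) and all function names are assumptions made for the illustration.

```python
def lcpref(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def ext(s, p, q):
    """Extend the palindrome s[p:q] (1-indexed, inclusive) to the maximal one with the same center."""
    while p > 1 and q < len(s) and s[p - 2] == s[q]:
        p, q = p - 1, q + 1
    return (p, q)


def palindromes_of_slp(rules):
    """rules[i] is a character, or a pair (l, r) of earlier 0-based indices meaning X_i = X_l X_r.

    Returns the expansions and, per variable, explicit sets for Pals^triangle, PPals and SPals.
    """
    text, tri, pp, sp = [], [], [], []
    for rule in rules:
        if isinstance(rule, str):  # lines 8-9: X_i is a constant
            text.append(rule)
            tri.append({(1, 1)})
            pp.append({(1, 1)})
            sp.append({(1, 1)})
            continue
        l, r = rule  # line 10: X_i = X_l X_r
        x, left_len, right_len = text[l] + text[r], len(text[l]), len(text[r])
        k = lcpref(text[r], text[l][::-1])  # naive stand-in for FM(X_r, X_l^R, 1)
        star = {(left_len - k + 1, left_len + k)} if k > 0 else set()
        # Line 11 (Observation 4): extend suffix palindromes of X_l and shifted prefix palindromes of X_r.
        t = ({ext(x, p, q) for p, q in sp[l]}
             | {ext(x, p + left_len, q + left_len) for p, q in pp[r]}
             | star)
        # Lines 12-13 (Observation 5).
        new_pp = (pp[l] - {(1, left_len)}) | {(p, q) for p, q in t if p == 1}
        shifted = {(p + left_len, q + left_len) for p, q in sp[r] - {(1, right_len)}}
        new_sp = shifted | {(p, q) for p, q in t if q == left_len + right_len}
        text.append(x)
        tri.append(t)
        pp.append(new_pp)
        sp.append(new_sp)
    return text, tri, pp, sp
```

For example, `rules = ["a", "b", (0, 1), (2, 0), (3, 2)]` derives `abaab`; the last variable then has the prefix palindromes `a` and `aba`, the suffix palindromes `b` and `baab`, and four maximal palindromes that cover or touch the boundary between `aba` and `ab`.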

The main result of this section is the following theorem.

Theorem 4. Algorithm 2 solves Problem 2 in $O(n^4)$ time with $O(n^2)$ space.
Proof. The correctness of the algorithm follows from lines 11–13, which correspond to Observations 4 and 5.
Now we analyze the time complexity. It follows from Theorem 3 that lines 1–4 take $O(n^3)$ time in total. By Lemma 13, line 11 takes $O(n^2 \log |X_i|)$ time. Also, by Lemma 14, lines 12–13 take $O(\log |X_i|)$ time. Since $|X_i| \le 2^n$, we have $\log |X_i| = O(n)$, so each of the $n$ iterations of the for loop of line 7 takes $O(n^3)$ time. Therefore the time complexity for the for loop of line 7 is $O(n^4)$. Hence the overall time cost is $O(n^4)$.
The total space complexity is as follows. It follows from Theorem 3 that lines 1–4 take $O(n^2)$ space. By Lemma 13, line 11 takes $O(\log |X_i|)$ space. Also, by Lemma 9, lines 12–13 take $O(\log |X_i|)$ space. Therefore the space complexity for the for loop of line 7 is $O(n^2)$. Hence the overall space requirement is $O(n^2)$.
The following two theorems are obtained by slightly modifying the algorithm of the previous subsections.
Theorem 5. Given an SLP $\mathcal{T}$ that describes string $T$, whether $T$ is a palindrome or not can be determined with extra $O(1)$ space and without increasing the asymptotic time complexity of the algorithm.
Proof. It suffices to see if $(1, |T|) \in \mathit{PPals}(T) = \mathit{PPals}(X_n)$. By Lemma 9, $\mathit{PPals}(X_n)$ can be represented by $O(n)$ arithmetic progressions. It is not difficult to see that $T$ is a palindrome if and only if $a + (t - 1)d = |T|$ for the arithmetic progression $\langle a, d, t \rangle$ of the largest common difference among those in $\mathit{PPals}(X_n)$. Such an arithmetic progression can easily be found during the computation of $\mathit{PPals}(X_n)$ without increasing the asymptotic time complexity of the algorithm.
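The test in this proof can be illustrated on an explicit string. The Python sketch below groups the ending positions of the prefix palindromes into progressions $\langle a, d, t \rangle$ and then applies the criterion $a + (t-1)d = |T|$ to the progression with the largest common difference. The grouping rule (each position is tagged with its gap to the previous one, the first with its gap to 0) is an assumption chosen for the illustration and need not coincide with the exact representation of Lemma 9; it relies on the gaps being non-decreasing, as stated in Section 4.2. All names are hypothetical.

```python
def prefix_palindrome_ends(s: str) -> list:
    """Ending positions q (1-indexed) such that s[1:q] is a palindrome, in increasing order."""
    return [q for q in range(1, len(s) + 1) if s[:q] == s[:q][::-1]]


def to_progressions(ends: list) -> list:
    """Group increasing positions into runs (a, d, t) of equal gap d to the previous position."""
    progs, prev = [], 0
    for q in ends:
        d = q - prev
        if progs and progs[-1][1] == d:
            a, _, t = progs[-1]
            progs[-1] = (a, d, t + 1)  # same gap: extend the current run
        else:
            progs.append((q, d, 1))    # new gap: start a new run at q
        prev = q
    return progs


def is_palindrome_via_progressions(progs: list, n: int) -> bool:
    """Criterion of Theorem 5: the run with the largest d must end exactly at n = |T|."""
    a, d, t = max(progs, key=lambda prog: prog[1])
    return a + (t - 1) * d == n
```

For `aabaa` the ending positions 1, 2, 5 are grouped as (1, 1, 2) and (5, 3, 1), and the run with the largest difference ends at 5, the length of the string.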
Theorem 6. Given an SLP $\mathcal{T}$ that describes string $T$, the position pair $(p, q)$ of the longest palindrome in $T$ can be found with extra $O(1)$ space and without increasing the asymptotic time complexity of the algorithm.
Proof. We compute the beginning and ending positions of the longest palindrome in $\mathit{Pals}^{\triangle}(X_i)$ for $i = 1, 2, \ldots, n$. It takes $O(n)$ time for each $X_i$. If its length exceeds the length of the currently kept palindrome, we update the beginning and ending positions.
Provided that $\{\mathit{PPals}(X_i)\}_{i=1}^{n}$, $\{\mathit{SPals}(X_i)\}_{i=1}^{n}$, and $\{\mathit{Pals}^{\triangle}(X_i)\}_{i=1}^{n}$ are already computed, we have the following result:
Theorem 7. Given a pair $(p, q)$ of integers, it can be answered in $O(n)$ time whether or not the substring $T[p:q]$ is a maximal palindrome of $T$.
Proof. We binary search the derivation tree of SLP $\mathcal{T}$ until finding the variable $X_i = X_\ell X_r$ such that $1 + \mathit{offset} \le p \le |X_\ell| + \mathit{offset}$ and $1 + \mathit{offset} + |X_\ell| \le q \le |X_i| + \mathit{offset}$. This takes $O(n)$ time. Due to Observation 4, for each variable $X_i$, $\mathit{Pals}^{\triangle}(X_i)$ can be represented by $O(n)$ arithmetic progressions plus a pair of the beginning and ending positions of a maximal palindrome. Thus, we can check if $(p, q) \in \mathit{Pals}^{\triangle}(X_i)$ in $O(n)$ time.
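The descent used in this proof needs only the lengths of the variables, which can be computed without decompression. The following Python sketch shows one way to carry it out; the list encoding `rules` of the SLP (a character, or a pair of earlier 0-based indices) and the names `lengths_of` and `locate` are assumptions for the illustration. It returns the variable whose boundary is covered by $T[p:q]$ together with its offset, or a single-character variable when the descent ends at a leaf.

```python
def lengths_of(rules: list) -> list:
    """|X_i| for every variable, computed bottom-up without expanding any string."""
    lens = []
    for rule in rules:
        lens.append(1 if isinstance(rule, str) else lens[rule[0]] + lens[rule[1]])
    return lens


def locate(rules: list, lens: list, root: int, p: int, q: int):
    """Descend from the root until [p, q] covers the boundary of X_i = X_l X_r.

    Returns (i, offset); positions relative to X_i are (p - offset, q - offset).
    """
    i, offset = root, 0
    while not isinstance(rules[i], str):
        l, r = rules[i]
        mid = offset + lens[l]       # last position of X_l inside T
        if q <= mid:                 # interval lies entirely in X_l
            i = l
        elif p > mid:                # interval lies entirely in X_r
            i, offset = r, mid
        else:                        # p <= mid < q: boundary is covered
            return i, offset
    return i, offset                 # reached a single character
```

Each step moves to a variable with a smaller index, so the loop runs at most $n$ times, in line with the $O(n)$ bound stated above. The membership test of $(p - \mathit{offset}, q - \mathit{offset})$ in the stored progressions is not shown.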

Fig. 10. Illustration for the proof of Lemma 12.

5. Conclusions and further work

In this paper we considered strings compressed by straight line programs (SLPs). Since SLP-compressed strings can be exponentially small w.r.t. the uncompressed (original) strings, it is important to process SLP-compressed strings without decompression and in time polynomial in the compressed size $n$. In this paper, we showed the first polynomial time algorithm to compute the longest common substring of two given SLP-compressed strings, which runs in $O(n^4 \log n)$ time and $O(n^3)$ space. In addition, we presented an $O(n^4)$-time $O(n^2)$-space algorithm to compute all maximal palindromes of a given SLP-compressed string. This is faster than the $O(n^4 \log N)$-time solution obtained by combining the results of Gąsieniec et al. [2] and Lifshits [6].
Our future work includes extending our results to computing all squares from a given SLP-compressed string. Gąsieniec et al. [2] claimed that all squares can be found in $O(n^6 \log^5 N)$ time from strings compressed by composition systems, which are generalizations of SLPs. The time complexity would be improved to $O(n^5 \log^3 N)$ in combination with the algorithm by Lifshits [6]. Still, it might be possible to produce a faster solution using the techniques presented in this paper.

Appendix

This Appendix gives a complete proof of Lemma 12. To prove this lemma, we first need to show the following lemma:

Lemma 15. For any variable $X_i$ and $\{(1, q) \mid q \in \langle a, d, t \rangle\} \subseteq \mathit{PPals}(X_i)$, there exist palindromes $u, v$ and a non-negative integer $k$, such that $(uv)^{t+k-1}u$ is a prefix of $X_i$, $|uv| = d$ and $|(uv)^k u| = a$.

Proof. Let $k = \max\{h \mid a - hd > 0\}$ and $a' = a - kd$. It is not difficult to see that $\langle a', d, t + k \rangle \subseteq \mathit{PPals}(X_i)$. Let $w = X_i[1 : d]$, $u = X_i[1 : a']$, and $v = X_i[a' + 1 : d]$. Then, $a = a' + kd = |u| + k|uv| = |(uv)^k u|$.
Since $(1, a' + d) \in \mathit{PPals}(X_i)$, $X_i[d + 1 : a' + d] = u^R$. Also, for any $1 \le j \le t + k - 2$, since $(1, a' + (j+1)d) \in \mathit{PPals}(X_i)$, we have
\[ X_i[a' + jd + 1 : a' + (j + 1)d] = w^R. \]
Thus $uvu^R(w^R)^{t+k-2}$ is a prefix of $X_i$.
Since $(1, a') \in \mathit{PPals}(X_i)$, $u$ is a palindrome. Since $(1, a' + d) \in \mathit{PPals}(X_i)$, $uvu^R$ is a palindrome, which implies that $v$ is also a palindrome. Consequently,
\[ uvu^R(w^R)^{t+k-2} = uvu((uv)^R)^{t+k-2} = uvu(v^R u^R)^{t+k-2} = uvu(vu)^{t+k-2} = u(vu)^{t+k-1} = (uv)^{t+k-1}u. \]
Therefore, $(uv)^{t+k-1}u$ is a prefix of $X_i$.

In the above lemma, clearly $|uv| = d$ is a period of the string $(uv)^t u$.
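The construction in the proof of Lemma 15 is easy to replay on an explicit string. The Python sketch below computes $k$, $a'$, $u$ and $v$ exactly as defined in the proof, assuming the caller supplies a progression $\langle a, d, t \rangle$ of ending positions of prefix palindromes; the function name is hypothetical and the string is assumed to be uncompressed.

```python
def lemma15_decomposition(x: str, a: int, d: int, t: int):
    """Return (u, v, k) as in the proof of Lemma 15 for prefix palindromes ending at a, a+d, ..., a+(t-1)d."""
    k = (a - 1) // d       # max{h | a - h*d > 0}
    a0 = a - k * d         # a' in the proof, with 1 <= a' <= d
    u = x[:a0]             # u = X[1 : a']
    v = x[a0:d]            # v = X[a'+1 : d], possibly empty
    return u, v, k
```

For `abababax` with the progression $\langle 3, 2, 3 \rangle$ (prefix palindromes `aba`, `ababa`, `abababa`), this yields $u$ = `a`, $v$ = `b` and $k = 1$, and $(uv)^{t+k-1}u$ = `abababa` is indeed a prefix.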

We are now ready to prove Lemma 12. (See also Fig. 10.)

Proof. Let us consider $\mathit{Ext}_{X_i}((1, \langle a, d, t \rangle))$. By Lemma 15, $X_r[1 : a + (t - 1)d] = (uv)^{t+k-1}u$, where $|uv| = d$ and $|(uv)^k u| = a$.
Let $x$ be the maximum integer such that $X_r[1 : x]$ has a period $|uv|$. Namely, $X_r[1 : x]$ is the longest prefix of $X_r$ that has a period $|uv|$. Then $x$ can be computed by using $\mathit{FM}$ as follows:
\[ x = \mathit{FM}(X_r, X_r, d + 1) + d. \]

Let $y$ be the largest integer such that $(uv)^y$ is a prefix of $X_\ell^R$. Then $y$ can be computed by at most 2 calls of $\mathit{FM}$, as follows. First, we call $\mathit{FM}$ to check whether or not the string $uv$ is a prefix of $X_\ell^R$. If $\mathit{FM}(X_r, X_\ell^R, 1) < d$, then $y = \mathit{FM}(X_r, X_\ell^R, 1)$. Otherwise, by Lemma 1 we can compute $y$ by:
\[ y = \mathit{FM}(X_\ell^R, X_\ell^R, d + 1) + d. \]
Let $e_\ell = |X_\ell| - y + 1$ and $e_r = |X_\ell| + x$. Then, clearly the string $X_i[e_\ell : e_r]$ has a period $d$. Let
\[ \langle a, d, t \rangle = \langle a_1, d, t_1 \rangle \cup \langle a_2, d, t_2 \rangle \cup \langle a_3, d, t_3 \rangle = \langle a, d, t_1 \rangle \cup \langle a + t_1 d, d, t_2 \rangle \cup \langle a + (t_1 + t_2)d, d, t_3 \rangle, \]
such that
\[ |X_\ell| - e_\ell + 1 < e_r - q_1 \text{ for any } q_1 \in \langle a_1, d, t_1 \rangle, \]
\[ |X_\ell| - e_\ell + 1 = e_r - q_2 \text{ for any } q_2 \in \langle a_2, d, t_2 \rangle, \]
\[ |X_\ell| - e_\ell + 1 > e_r - q_3 \text{ for any } q_3 \in \langle a_3, d, t_3 \rangle, \]
and $t_1 + t_2 + t_3 = t$. For the first and the last arithmetic progressions, we have:
\[ \mathit{Ext}_{X_i}((1, \langle a_1, d, t_1 \rangle)) = \{(e_\ell, q_1 + |X_\ell| - e_\ell + 1) \mid q_1 \in \langle a_1, d, t_1 \rangle\} = \{(e_\ell, \langle a + |X_\ell| - e_\ell + 1, d, t_1 \rangle)\} \text{ and} \]
\[ \mathit{Ext}_{X_i}((1, \langle a_3, d, t_3 \rangle)) = \{(|X_\ell| + e_r - q_3, |X_\ell| + e_r) \mid q_3 \in \langle a_3, d, t_3 \rangle\} = \{(\langle |X_\ell| + e_r - a - (t - 1)d, d, t_3 \rangle, |X_\ell| + e_r)\}. \]
Now let us consider $\langle a + t_1 d, d, t_2 \rangle$. It is easy to see that $t_2 \le 1$. We consider the case where $t_2 = 1$ and $a_2 = a + t_1 d = q_2$. Notice that the palindrome $(1, a_2)$ can be expanded beyond the periodicity w.r.t. $d$. Thus,
\[ \mathit{Ext}_{X_i}((1, a_2)) = \{(|X_\ell| - z + 1, |X_\ell| + a_2 + z)\} = \{(|X_\ell| - z + 1, |X_\ell| + a + t_1 d + z)\}, \]
where $z = \mathit{FM}(X_\ell^R, X_r, a_2 + 1) + a_2$. Therefore, the set of expanded palindromes can be represented as follows:
\[ \mathit{Ext}_{X_i}((1, \langle a, d, t \rangle) \oplus |X_\ell|) = \{(e_\ell, \langle a + |X_\ell| - e_\ell + 1, d, t_1 \rangle)\} \cup \{(\langle |X_\ell| + e_r - a - (t - 1)d, d, t_3 \rangle, |X_\ell| + e_r)\} \cup \{(|X_\ell| - z + 1, |X_\ell| + a + t_1 d + z)\}. \]
Hence $\mathit{Ext}_{X_i}((1, \langle a, d, t \rangle))$ can be represented by at most 2 arithmetic progressions and a palindrome, which in total require constant space. We remark that similar arguments hold for $\mathit{Ext}_{X_i}((\langle a', d', t' \rangle, |X_\ell|))$.

References

[1] A. Apostolico, D. Breslauer, Z. Galil, Parallel detection of all palindromes in a string, Theoretical Computer Science 141 (1995) 163–173.
[2] L. Gąsieniec, M. Karpinski, W. Plandowski, W. Rytter, Efficient algorithms for Lempel-Ziv encoding, in: Proc. 5th Scandinavian Workshop on
Algorithm Theory, SWAT’96, in: Lecture Notes in Computer Science, vol. 1097, Springer-Verlag, 1996, pp. 392–403.
[3] S. Inenaga, A. Shinohara, M. Takeda, An efficient pattern matching algorithm on a subclass of context free grammars, in: Proc. Eighth International
Conference on Developments in Language Theory, DLT’04, in: Lecture Notes in Computer Science, vol. 3340, Springer-Verlag, 2004, pp. 225–236.
[4] M. Karpinski, W. Rytter, A. Shinohara, An efficient pattern-matching algorithm for strings with short descriptions, Nordic Journal of Computing 4
(1997) 172–186.
[5] J. Kieffer, E. Yang, G. Nelson, P. Cosman, Universal lossless compression via multilevel pattern matching, IEEE Transactions on Information Theory 46
(4) (2000) 1227–1245.
[6] Y. Lifshits, Processing compressed texts: A tractability border, in: Proc. 18th Annual Symposium on Combinatorial Pattern Matching, CPM’07,
in: Lecture Notes in Computer Science, vol. 4580, Springer-Verlag, 2007, pp. 228–240.
[7] Y. Lifshits, M. Lohrey, Querying and embedding compressed texts, in: Proc. 31st International Symposium on Mathematical Foundations of Computer
Science, MFCS’06, in: Lecture Notes in Computer Science, vol. 4162, Springer-Verlag, 2006, pp. 681–692.
[8] W. Matsubara, S. Inenaga, A. Ishino, A. Shinohara, T. Nakamura, K. Hashimoto, Computing longest common substring and all palindromes from
compressed strings, in: Proc. 34th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM’08, in: Lecture
Notes in Computer Science, vol. 4910, Springer-Verlag, 2008, pp. 364–375.
[9] M. Miyazaki, A. Shinohara, M. Takeda, An improved pattern matching algorithm for strings in terms of straight-line programs, in: Proc. 8th Annual
Symposium on Combinatorial Pattern Matching, CPM’97, in: Lecture Notes in Computer Science, vol. 1264, Springer-Verlag, 1997, pp. 1–11.
[10] C.G. Nevill-Manning, I.H. Witten, D.L. Maulsby, Compression by induction of hierarchical grammars, in: Data Compression Conference ’94, IEEE
Computer Society, 1994, pp. 244–253.
[11] W. Plandowski, Testing equivalence of morphisms on context-free languages, in: Proc. Second Annual European Symposium on Algorithms, ESA’94,
in: Lecture Notes in Computer Science, vol. 855, Springer-Verlag, 1994, pp. 460–470.
[12] W. Rytter, Grammar compression, LZ-encodings, and string algorithms with implicit input, in: Proc. 31st International Colloquium on Automata,
Languages and Programming, ICALP’04, in: Lecture Notes in Computer Science, vol. 3142, Springer-Verlag, 2004, pp. 15–27.
[13] J. Ziv, A. Lempel, A universal algorithm for sequential data compression, IEEE Transactions on Information Theory, IT 23 (3) (1977) 337–349.
[14] J. Ziv, A. Lempel, Compression of individual sequences via variable-length coding, IEEE Transactions on Information Theory 24 (5) (1978) 530–536.
