Page 34 - 2024F
P. 34
UEC Int’l Mini-Conference No.53 27
Since 2q + 1 ≤ S[l..r] ≤ j − i + 1, we need not
S' = consider values k ∈ N so that 2kp+1 > j −i+1:
the maximum value for k that needs to be con-
e
sidered is ⌈ ⌉−1.By then adding up the previous
Figure 1: Structure of S assuming b ≤ d. 2
′
sums for each possible q = kp, we finally obtain
e
⌈ ⌉−1 j−i+1−2kp
2
X X
|W | = n
distinct (a, b, c, d) ∈ I. (i,j,p)
k=1 n=1
Consider any such (l, r, q). We now construct a
′
tuple (a, b, c, d) such that (a, b, c, d) ∈ I ⊂ Q : We use all the previous results in designing
Let (a, b, c, d) = (l, r − q, l + q, r). Then we have Algorithm 2. We can thus state the theorem:
l ≤ r − q and l + q ≤ r from 2q + 1 ≤ r − l + 1
and S[l..r − q] = S[l + q..r] since q is a period Theorem 11 Problem 1 can be solved in
of w := S[l..r] and thus w [i] = w [i + q] for all O(N) time.
′
′
′
1 ≤ i ≤ r−q. Thus we have (l, r−q, l+q, r) ∈ Q . Proof. Algorithm 2 computes the desired value
′
Since |S[l..r]| = r − l + 1 < r − l + 2 = ans = |Q|. The correctness of Algorithm 1 is
r−l+1−2q+(2q+1) ≤ r−l+1−2q+(r−l+1) = given by Lemma 4. The step 4 is correct by
2r − 2l − 2q + 2 = |S[l..r − q]| + |S[l + q..r]|, Lemma 10. Lemma 7 proves the correctness of
there must be some intersection, hence the step 6.
[l, r − q] ∩ [l + q, r] ̸= ∅. It follows that The suffix tree of the string S$ in step 1 is com-
(l, r − q, l + q, r) ∈ I. Thus, for every (l, r, q) puted in O(N) time using Ukkonen’s algorithm
element of any W (i,j,p) , there is a distinct [4]. Algorithm 1 runs in linear time by Lemma
(a, b, c, d) ∈ I. 4. The Runs(S) is computed in O(N) time us-
ing the algorithm by Bannai et al. [1] Step 5
To count (a, b, c, d) ∈ I, we now compute for is done in linear time since both the number of
each run (i, j, p) ∈ Runs(S) the value |W (i,j,p) | runs of S and the sum of exponents e of all runs
representing the number of tuples (a, b, c, d) ∈ I is in O(N) per Bannai et al. and Step 6 takes
which are contained in the run. Then ans 2 = constant time. Thus the entire algorithm runs
X
|W (i,j,p) |. in O(N) time.
(i,j,p)∈Runs(S)
Algorithm 2: The algorithm for solving
Lemma 10 Let (i, j, p) ∈ Runs(S) Problem 1.
with exponent e be any run of S. Then 1 build suffix tree on the string S$ ;
e 2 compute ans 1 using Algorithm 1 ;
⌈ ⌉−1 j−i+1−2kp
2
X X
|W (i,j,p) | = n . 3 compute the set Runs(S) of runs of S ;
k=1 n=1 4 compute the values |W (i,j,p) | ;
Proof. For run (i, j, p) ∈ Runs(S), first con- 5 compute ans 2 as X |W | ;
sider the tuples (l, r, q) ∈ W (i,j,p) with a fixed (i,j,p)∈Runs(S) (i,j,p)
q = kp for some k ∈ N. With each length 6 compute ans = (ans 1 − ans 2 )/2 ;
2kp + 1 ≤ |S[l..r]| ≤ j − i + 1 of the substring
S[l..r] of S[i..j], the number of possible starting
positions l varies: for |S[l..r]| = j − i + 1,
the substring can be in exactly one position, 4 Conclusions and Future
for S[l..r] = j − i, two positions are possible.
Through the summation of the number of Work
possible positions, 1 through j − i + 1 − 2kp, for
j−i+1−2kp In this paper we show a linear time algorithm
X
all valid lengths of S[l..r], we obtain n, that counts the number of gapped repeats in a
string.
n=1
the number of tuples (l, r, kp) ∈ W (i,j,p) . Now One open problem is counting the number of
consider the possible values k so that q = kp. gapped repeats uvu with a constrained gap size