Page 34 - 2024F
P. 34

UEC Int’l Mini-Conference No.53                                                               27







                                                              Since 2q + 1 ≤ S[l..r] ≤ j − i + 1, we need not
              S' =                                            consider values k ∈ N so that 2kp+1 > j −i+1:
                                                              the maximum value for k that needs to be con-
                                                                        e
                                                              sidered is ⌈ ⌉−1.By then adding up the previous
               Figure 1: Structure of S assuming b ≤ d.                 2
                                      ′
                                                              sums for each possible q = kp, we finally obtain
                                                                         e
                                                                        ⌈ ⌉−1  j−i+1−2kp
                                                                         2
                                                                         X        X
                                                              |W     | =                n
            distinct (a, b, c, d) ∈ I.                          (i,j,p)
                                                                         k=1      n=1
            Consider any such (l, r, q). We now construct a
                                                       ′
            tuple (a, b, c, d) such that (a, b, c, d) ∈ I ⊂ Q :  We use all the previous results in designing
            Let (a, b, c, d) = (l, r − q, l + q, r). Then we have  Algorithm 2. We can thus state the theorem:
            l ≤ r − q and l + q ≤ r from 2q + 1 ≤ r − l + 1
            and S[l..r − q] = S[l + q..r] since q is a period  Theorem 11     Problem 1 can be solved in
            of w := S[l..r] and thus w [i] = w [i + q] for all  O(N) time.
                ′
                                            ′
                                     ′
            1 ≤ i ≤ r−q. Thus we have (l, r−q, l+q, r) ∈ Q .  Proof. Algorithm 2 computes the desired value
                                                       ′
            Since |S[l..r]| = r − l + 1 < r − l + 2 =         ans = |Q|. The correctness of Algorithm 1 is
            r−l+1−2q+(2q+1) ≤ r−l+1−2q+(r−l+1) =              given by Lemma 4. The step 4 is correct by
            2r − 2l − 2q + 2 = |S[l..r − q]| + |S[l + q..r]|,  Lemma 10. Lemma 7 proves the correctness of
            there  must   be  some   intersection,  hence     the step 6.
            [l, r − q] ∩ [l + q, r] ̸= ∅.  It follows that    The suffix tree of the string S$ in step 1 is com-
            (l, r − q, l + q, r) ∈ I. Thus, for every (l, r, q)  puted in O(N) time using Ukkonen’s algorithm
            element of any W   (i,j,p) , there is a distinct  [4]. Algorithm 1 runs in linear time by Lemma
            (a, b, c, d) ∈ I.                                 4. The Runs(S) is computed in O(N) time us-
                                                              ing the algorithm by Bannai et al. [1] Step 5
              To count (a, b, c, d) ∈ I, we now compute for   is done in linear time since both the number of
            each run (i, j, p) ∈ Runs(S) the value |W (i,j,p) |  runs of S and the sum of exponents e of all runs
            representing the number of tuples (a, b, c, d) ∈ I  is in O(N) per Bannai et al. and Step 6 takes
            which are contained in the run. Then ans 2 =      constant time. Thus the entire algorithm runs
                X
                        |W (i,j,p) |.                         in O(N) time.
            (i,j,p)∈Runs(S)
                                                               Algorithm 2: The algorithm for solving
            Lemma      10 Let    (i, j, p)  ∈   Runs(S)        Problem 1.
            with exponent e be any run of S.        Then      1 build suffix tree on the string S$ ;
                        e                                     2 compute ans 1 using Algorithm 1 ;
                       ⌈ ⌉−1  j−i+1−2kp
                        2
                       X        X
            |W (i,j,p) | =             n .                    3 compute the set Runs(S) of runs of S ;
                       k=1      n=1                           4 compute the values |W (i,j,p) | ;
            Proof. For run (i, j, p) ∈ Runs(S), first con-    5 compute ans 2 as    X       |W     | ;
            sider the tuples (l, r, q) ∈ W (i,j,p)  with a fixed                (i,j,p)∈Runs(S)  (i,j,p)
            q = kp for some k ∈ N. With each length           6 compute ans = (ans 1 − ans 2 )/2 ;
            2kp + 1 ≤ |S[l..r]| ≤ j − i + 1 of the substring
            S[l..r] of S[i..j], the number of possible starting
            positions l varies: for |S[l..r]| = j − i + 1,
            the substring can be in exactly one position,     4    Conclusions         and       Future
            for S[l..r] = j − i, two positions are possible.
            Through the summation of the number of                 Work
            possible positions, 1 through j − i + 1 − 2kp, for
                                              j−i+1−2kp       In this paper we show a linear time algorithm
                                                X
            all valid lengths of S[l..r], we obtain   n,      that counts the number of gapped repeats in a
                                                              string.
                                                n=1
            the number of tuples (l, r, kp) ∈ W (i,j,p) . Now  One open problem is counting the number of
            consider the possible values k so that q = kp.    gapped repeats uvu with a constrained gap size
   29   30   31   32   33   34   35   36   37   38   39