Page 32 - 2024F
P. 32
UEC Int’l Mini-Conference No.53 25
common prefix of the strings str(v 1 ) and str(v 2 ). have that S[a + l − 1] ̸= S[c + l − 1] for any
For each suffix S[i..N] of S, there is a node v l > L, so for any assignment b := a + l − 1 and
such that S[i..N] = str(v) and by the previous d := c + l − 1 the property S[a..b] = S[c..d]
property, the node v is unique, thus we denote does not hold. For other values of b and d, the
it by ℓ(i). We term ℓ(i) the associated node of strings S[a..b] and S[c..d] have different length.
the suffix S[i..N]. If S is of the form S $ where Thus, the number of tuples (a, b, c, d) with the
′
$ is a character that does not appear anywhere desired property is |{1..L}|, which, in turn, is
in S , then the associated node ℓ(i) of a suffix equal to |lcp(S[a..N], S[c..N])|.
′
S[i..N] of S is a leaf. We denote the root of the
tree by v r . Let T be the suffix tree built on the string S$
We denote by Tree(v) the set of all nodes of length N + 1, where the character $ does not
in the subtree of v, i.e. v and all its de- appear in the string S.
scendant nodes. By the property of the lcp Then pairs of suffixes with a certain longest com-
function we have that lcp(S[i..N], S[j..N]) = mon prefix can be found using Remark 3:
str(lca(ℓ(i), ℓ(j))).
Assuming the alphabet of a string is of a Remark 3 For two suffixes S[i..N] and
constant size, the suffix tree can be constructed S[j..N], we have that lca(ℓ(i), ℓ(j)) = u is equiv-
in O(N) time using Ukkonen’s algorithm [4]. alent to the existence of two nodes v 1 and v 2
Given a suffix tree of a string, Gusfield [2] shows children of the node u such that ℓ(i) ∈ Tree(v 1 )
how to compute lcp(S[i..N], S[j..N]) in O(1) and ℓ(j) ∈ Tree(v 2 ).
time. Since we have that lcp(str(ℓ(i)), str(ℓ(j))) =
str(lca(ℓ(i), ℓ(j))), any such pair i, j indicates
We define an integer p ≥ 1 to be a period suffixes S[i..N], S[j..N] with the same longest
of string S if S[i] = S[i + p] for all 1 ≤ i ≤ common prefix.
|S|−p. We also call a triple r = (i, j, p) a run of
string S if the smallest period p of S[i..j] both Algorithm 1 The Algorithm 1 is comprised
satisfies |S[i..j]| ≥ 2p and the periodicity cannot of the procedure iterate() that is ran with the
be extended to the left or right, i.e. either i = 1 root node v r of T as the argument, traversing the
or S[i − 1] ̸= S[i + p − 1], and either j = N or suffix tree in a depth-first manner and counting
S[j + 1] ̸= S[j + p + 1]. The rational number excluding the condition b < c:
j−i+1 is called the exponent of r. Finally, let
p
Runs(S) denote the set of runs of string S. Procedure iterate(v)
Data: ans 1 ← 0;
Sf [u] - the number of suffixes in the
3 Methodology
subtree of the node u
1 sf ← 0;
Problem 1 Let S be a given string of length
N. Count how many tuples (a, b, c, d) exist such 2 cnt ← 0;
that 1 ≤ a ≤ b < c ≤ d ≤ N and S[a..b] = 3 if v is leaf then
S[c..d]. We denote the set of all such tuples by 4 sf ← 1;
5 else
Q.
6 for ch ← child of n do
7 iterate(ch);
Lemma 2 For fixed values a and c, the num-
8 cnt ← cnt + Sf [ch] · sf ;
ber of tuples (a, b, c, d) such that S[a..b] = S[c..d]
9 sf ← sf + Sf [ch];
is |lcp(S[a..N], S[c..N])|.
10 end
Proof. Let L = |lcp(S[a..N], S[c..N])|.
For any l ∈ {1..L} we have that 11 end
S[a..a + l − 1] = S[c..c + l − 1], thus as- 12 ans 1 ← ans 1 + cnt · |str(v)|;
signing b := a + l − 1 and d := c + l − 1 13 Sf [v] ← sf ;
the property S[a..b] = S[c..d] holds. Also we