Page 69 - 2024S

P. 69

62 UEC Int’l Mini-Conference No.52

Implementation of the Sliding-Window Suﬃx Tree
Construction and Its Applications

Leon Flaack , Prof. Takuya Mieno
Department of Computer and Network Engineering
University of Electro-Communications, Tokyo, Japan

Introduction Suﬃx Trees
A Suﬃx Tree is a data structure that enables the eﬃcient solution A suﬃx tree of a string is the compacted trie of all the suﬃxes of
of many problems and tasks related to Strings, such as pattern the string. Thus, it is a tree structure corresponding to a String of
matching or ﬁnding the longest common substring of multiple characters where following the path from the root to each leaf
Strings. yields a unique nonempty suﬃx of the string:
The implemented algorithm solves the problem of maintaining a
suﬃx tree for a sliding window of a given size over an incoming
stream of text of any length.
While it is possible to construct a suﬃx tree for the whole input
string, this is not reasonable especially for longer strings. This is
because suﬃx trees grow very large with regard to their
corresponding string.
In addition, classic suﬃx trees tend to have limitations when it
comes to adding characters after their completion, which makes
the sliding window uniquely ﬁt for stream data.

Algorithm Outline Deletion
A suﬃx tree for the sliding window is maintained mainly by The deletion of the old character makes use of the algorithm
executing two steps: proposed by Leonard [1] using leaf pointers that point to any
descendant leaf for every inner node:
1.Inserting the new character into the tree
2.Deleting the old character from the tree 1.Delete the deepest leaf node and the edge connecting to it
2.If the inner node the leaf was connected to has only one
By doing this every time the window slides by a character, the remaining child, delete the node and connect its incoming edge
tree stays in correspondence with the current window's content. to its outgoing edge
In order to enable an eﬃcient execution of these steps, concepts 3.If the active point was on the deleted edge or node, create
like suﬃx links and leaf pointers are used. corresponding nodes and edges to it.

Insertion
The insertion of a new character into the suﬃx tree is conducted
using Ukkonen's Algorithm [2]. This makes use of the so-called
"Active Point" to track the location of the longest repeating suﬃx
(the longest suﬃx that is not a unique suﬃx), where the next Expected Results
character would be inserted.
The algorithm roughly operates as follows, utilizing suﬃx links to It has been proven in theory that this algorithm can be used to
navigate between nodes: iterate over the whole input text T in O(|T| log σ) time and with a
space complexity of O(|W|), where W is the window and σ the size
1.Consider the location of the active point of the underlying alphabet [1].
2.Try to append the new character to the current location
3.Follow the active point's parent's suﬃx link to the next node To verify the correct implementation, experiments for diﬀerent
4.Repeat steps 2. and 3. until the new character can be traversed strings of varying lengths will be conducted and their results
from the current location. analyzed to conﬁrm the calculated runtime.

References
[1] L. Leonard, S. Inenaga, H. Bannai, and T. Mieno, “Constant-time edge
label and leaf pointer maintenance on sliding suﬃx trees.” arXiv, Feb. 29,
2024. doi: 10.48550/arXiv.2307.01412.
[2] E. Ukkonen, “On-line construction of suﬃx trees,” Algorithmica, vol. 14,
no. 3, pp. 249–260, Sep. 1995, doi: 10.1007/BF01206331.

64 65 66 67 68 69 70 71 72 73 74