Page 69 - 2024S
P. 69
62 UEC Int’l Mini-Conference No.52
Implementation of the Sliding-Window Suffix Tree
Construction and Its Applications
Leon Flaack , Prof. Takuya Mieno
Department of Computer and Network Engineering
University of Electro-Communications, Tokyo, Japan
Introduction Suffix Trees
A Suffix Tree is a data structure that enables the efficient solution A suffix tree of a string is the compacted trie of all the suffixes of
of many problems and tasks related to Strings, such as pattern the string. Thus, it is a tree structure corresponding to a String of
matching or finding the longest common substring of multiple characters where following the path from the root to each leaf
Strings. yields a unique nonempty suffix of the string:
The implemented algorithm solves the problem of maintaining a
suffix tree for a sliding window of a given size over an incoming
stream of text of any length.
While it is possible to construct a suffix tree for the whole input
string, this is not reasonable especially for longer strings. This is
because suffix trees grow very large with regard to their
corresponding string.
In addition, classic suffix trees tend to have limitations when it
comes to adding characters after their completion, which makes
the sliding window uniquely fit for stream data.
Algorithm Outline Deletion
A suffix tree for the sliding window is maintained mainly by The deletion of the old character makes use of the algorithm
executing two steps: proposed by Leonard [1] using leaf pointers that point to any
descendant leaf for every inner node:
1.Inserting the new character into the tree
2.Deleting the old character from the tree 1.Delete the deepest leaf node and the edge connecting to it
2.If the inner node the leaf was connected to has only one
By doing this every time the window slides by a character, the remaining child, delete the node and connect its incoming edge
tree stays in correspondence with the current window's content. to its outgoing edge
In order to enable an efficient execution of these steps, concepts 3.If the active point was on the deleted edge or node, create
like suffix links and leaf pointers are used. corresponding nodes and edges to it.
Insertion
The insertion of a new character into the suffix tree is conducted
using Ukkonen's Algorithm [2]. This makes use of the so-called
"Active Point" to track the location of the longest repeating suffix
(the longest suffix that is not a unique suffix), where the next Expected Results
character would be inserted.
The algorithm roughly operates as follows, utilizing suffix links to It has been proven in theory that this algorithm can be used to
navigate between nodes: iterate over the whole input text T in O(|T| log σ) time and with a
space complexity of O(|W|), where W is the window and σ the size
1.Consider the location of the active point of the underlying alphabet [1].
2.Try to append the new character to the current location
3.Follow the active point's parent's suffix link to the next node To verify the correct implementation, experiments for different
4.Repeat steps 2. and 3. until the new character can be traversed strings of varying lengths will be conducted and their results
from the current location. analyzed to confirm the calculated runtime.
References
[1] L. Leonard, S. Inenaga, H. Bannai, and T. Mieno, “Constant-time edge
label and leaf pointer maintenance on sliding suffix trees.” arXiv, Feb. 29,
2024. doi: 10.48550/arXiv.2307.01412.
[2] E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14,
no. 3, pp. 249–260, Sep. 1995, doi: 10.1007/BF01206331.