Page 69 - 2024S
P. 69

62                                                                UEC Int’l Mini-Conference No.52




                            Implementation of the Sliding-Window Suffix Tree
                                       Construction and Its Applications


                                                  Leon Flaack , Prof. Takuya Mieno
                                           Department of Computer and Network Engineering
                                           University of Electro-Communications, Tokyo, Japan


                              Introduction                                     Suffix Trees
               A Suffix Tree is a data structure that enables the efficient solution   A suffix tree of a string is the compacted trie of all the suffixes of
               of many problems and tasks related to Strings, such as pattern   the string. Thus, it is a tree structure corresponding to a String of
               matching or finding the longest common substring of multiple   characters where following the path from the root to each leaf
               Strings.                                        yields a unique nonempty suffix of the string:
               The implemented algorithm solves the problem of maintaining a
               suffix tree for a sliding window of a given size over an incoming
               stream of text of any length.
               While it is possible to construct a suffix tree for the whole input
               string, this is not reasonable especially for longer strings. This is
               because suffix trees grow very large with regard to their
               corresponding string.
               In addition, classic suffix trees tend to have limitations when it
               comes to adding characters after their completion, which makes
               the sliding window uniquely fit for stream data.

                           Algorithm Outline                                     Deletion
               A suffix tree for the sliding window is maintained mainly by   The deletion of the old character makes use of the algorithm
               executing two steps:                            proposed by Leonard [1] using leaf pointers that point to any
                                                               descendant leaf for every inner node:
               1.Inserting the new character into the tree
               2.Deleting the old character from the tree      1.Delete the deepest leaf node and the edge connecting to it
                                                               2.If the inner node the leaf was connected to has only one
               By doing this every time the window slides by a character, the   remaining child, delete the node and connect its incoming edge
               tree stays in correspondence with the current window's content.  to its outgoing edge
               In order to enable an efficient execution of these steps, concepts   3.If the active point was on the deleted edge or node, create
               like suffix links and leaf pointers are used.     corresponding nodes and edges to it.







                                Insertion
               The insertion of a new character into the suffix tree is conducted
               using Ukkonen's Algorithm [2]. This makes use of the so-called
               "Active Point" to track the location of the longest repeating suffix
               (the longest suffix that is not a unique suffix), where the next   Expected Results
               character would be inserted.
               The algorithm roughly operates as follows, utilizing suffix links to   It has been proven in theory that this algorithm can be used to
               navigate between nodes:                         iterate over the whole input text T in O(|T| log σ) time and with a
                                                               space complexity of O(|W|), where W is the window and σ the size
              1.Consider the location of the active point      of the underlying alphabet [1].
              2.Try to append the new character to the current location
              3.Follow the active point's parent's suffix link to the next node  To verify the correct implementation, experiments for different
              4.Repeat steps 2. and 3. until the new character can be traversed   strings of varying lengths will be conducted and their results
               from the current location.                      analyzed to confirm the calculated runtime.


                                                                                References
                                                                [1] L. Leonard, S. Inenaga, H. Bannai, and T. Mieno, “Constant-time edge
                                                               label and leaf pointer maintenance on sliding suffix trees.” arXiv, Feb. 29,
                                                               2024. doi: 10.48550/arXiv.2307.01412.
                                                                [2] E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14,
                                                               no. 3, pp. 249–260, Sep. 1995, doi: 10.1007/BF01206331.
   64   65   66   67   68   69   70   71   72   73   74