UEC Int'l Mini-Conference No.52

[General Presentation] BEYOND WORD COUNT: EXPLORING
APPROXIMATED TARGET LENGTHS FOR CIF-RNNT
☆Wen Shen TEO, Yasuhiro MINAMI (The University of Electro-Communications)

CIF-RNNT

Pipeline: Weight Prediction → Weight Scaling → Quantity Loss → Integration → RNN-T Loss.

With per-frame weights \alpha_t, target length N^{*}, and predicted length

    \hat{N} = \sum_{t=0}^{T-1} \alpha_t ,

the weights are scaled as

    \alpha'_t = \alpha_t \cdot N^{*} / \hat{N}   if training,
    \alpha'_t = \alpha_t                         otherwise,

and the quantity loss is \mathcal{L}_{qua} = | N^{*} - \hat{N} |.

Idea

Past — Conventional RNN-T: decode at every fixed time interval. CIF-RNNT: decode when a word is uttered.
Current — CIF-RNNT: decode when enough new information (equivalent to a word) is accumulated.
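The scaling, quantity loss, and integrate-and-fire steps above can be sketched as follows. This is a minimal illustration assuming the standard CIF formulation; the firing threshold of 1.0 and all names are illustrative, not taken from the poster.

```python
# Minimal sketch of CIF: weight scaling, quantity loss, and integration.
import numpy as np

def cif(weights, states, target_len=None, threshold=1.0):
    """Continuous Integrate-and-Fire over per-frame weights.

    weights: (T,) non-negative per-frame weights alpha_t
    states:  (T, D) encoder states
    target_len: N*, used to scale weights during training (None at inference)
    Returns (fired, qua_loss): list of integrated (D,) vectors and
    the quantity loss |N* - sum(alpha)| (0.0 when target_len is None).
    """
    pred_len = weights.sum()                   # N-hat = sum_t alpha_t
    qua_loss = 0.0
    if target_len is not None:                 # training: scale weights to sum to N*
        qua_loss = abs(target_len - pred_len)  # L_qua = |N* - N-hat|
        weights = weights * (target_len / pred_len)
    fired, acc_w, acc_s = [], 0.0, np.zeros(states.shape[1])
    for a, h in zip(weights, states):
        if acc_w + a < threshold:              # keep accumulating information
            acc_w += a
            acc_s = acc_s + a * h
        else:                                  # fire: spend only the weight needed
            need = threshold - acc_w
            fired.append(acc_s + need * h)
            acc_w = a - need                   # leftover weight starts the next token
            acc_s = acc_w * h
    return fired, qua_loss
```

At inference no target length is available, which is exactly why the poster approximates N^{*} from token count or self-information during training.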
                                                                                    Methodology

Approximate the word count for the target length N^{*} from:
• Token count (Token #)
• Self-information from n-gram LMs: I(w; n) = −log P_n(w)

Total training loss: \mathcal{L} = \mathcal{L}_{RNNT} + \mathcal{L}_{qua}.

The approximation slope k is calculated by linear regression on reference text:
the approximated word count for utterance i is \tilde{N}^{*}_i = k \cdot x_i,
where x_i is the token count or self-information of utterance i.

Approximation Parameters
                  LibriSpeech                 CSJ
  Type      slope    MSE    1/slope   slope    MSE    1/slope
  Token #   0.504  28.128    1.983    0.639   8.772    1.564
  1gram     0.214  23.449    4.667    0.271  14.326    3.689
  2gram     0.281  27.387    3.560    0.417  16.199    2.401
  3gram     0.364  24.877    2.750    0.558  17.173    1.791
  4gram     0.455  24.435    2.196    0.666  14.771    1.502
  5gram     0.507  28.751    1.971    0.722  14.405    1.385
[Scatter plots: word count vs. 3-gram self-information −log P_3gram(w), LibriSpeech and CSJ]
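The regression above can be sketched as a least-squares fit through the origin, so word_count ≈ k · x with x the token count or n-gram self-information. The unigram stand-in and all helper names are assumptions for illustration, not the poster's implementation.

```python
# Illustrative sketch: fit the approximation slope k with y ≈ k * x.
import math

def self_information(tokens, unigram_probs):
    """Sum of -log P(token); a unigram stand-in for the poster's n-gram LMs."""
    return sum(-math.log(unigram_probs[t]) for t in tokens)

def fit_slope(xs, word_counts):
    """Least-squares slope through the origin: k = sum(x*y) / sum(x*x)."""
    num = sum(x * y for x, y in zip(xs, word_counts))
    den = sum(x * x for x in xs)
    return num / den

def mse(xs, word_counts, k):
    """Mean squared error of the approximation N~* = k * x."""
    return sum((y - k * x) ** 2 for x, y in zip(xs, word_counts)) / len(xs)
```

The table's 1/slope column then reads as "units of x per word", e.g. roughly 2 tokens per word on LibriSpeech.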
                                                           Results
                  LibriSpeech                          CSJ (fluent)
  Type      clean  other  RTF (×10⁻⁴)   eval1  eval2  eval3  RTF (×10⁻⁵)
  No CIF     3.22   8.08     5.31        4.53   3.81   3.77     13.08
  Word #     3.49   8.59     3.50          -      -      -        -
  Token #    3.54   8.57     3.37        4.43   3.90   4.12      9.20
  1gram      3.50   8.41     3.34        4.47   3.77   4.10      9.20
  2gram      3.47   8.52     3.34        4.42   3.80   4.04      9.09
  3gram      3.38   8.23     3.33        4.55   3.79   3.94      9.10
  4gram      3.38   8.45     3.37        4.52   3.99   4.06      9.07
  5gram      3.47   8.50     3.38        4.66   4.03   4.04      9.08


                                                        Findings

[Figures: Word Segmentation (Naïve vs. Actual); Impact of Chunk Size against Error Rates]