60 UEC Int’l Mini-Conference No.52
[General Presentation] BEYOND WORD COUNT: EXPLORING APPROXIMATED TARGET LENGTHS FOR CIF-RNNT
☆Wen Shen TEO, Yasuhiro MINAMI (The University of Electro-Communications)
CIF-RNNT Idea

The Continuous Integrate-and-Fire (CIF) module predicts a weight α_t for every encoder frame, scales the weights during training, penalizes the weight sum with a quantity loss, and integrates the weights until enough information for one token has accumulated:

  S = Σ_{t=0}^{T−1} α_t  (predicted length),   T* = target length
  α′_t = α_t · T*/S if training, else α_t
  ℒ_qua = |S − T*|

Pipeline: Weight Prediction → Weight Scaling → Quantity Loss → Integration.

RNN-T decodes at every fixed time interval; CIF-RNNT decodes only when enough new information (equivalent to a word) has been accumulated, i.e. roughly when a word is uttered. Conventional CIF-RNNT uses the reference word count as the target length T*.
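The integrate-and-fire loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the firing threshold β = 1.0 and the toy weights are assumptions.

```python
def cif_integrate(alphas, target_len=None, beta=1.0):
    """Continuous Integrate-and-Fire: accumulate per-frame weights and
    'fire' (emit a token boundary) whenever the accumulator reaches beta.
    During training (target_len given), the weights are first scaled so
    they sum to the target length, matching the scaling rule above."""
    if target_len is not None:
        s = sum(alphas)
        alphas = [a * target_len / s for a in alphas]
    fire_frames, acc = [], 0.0
    for t, a in enumerate(alphas):
        acc += a
        while acc >= beta:          # enough information for one token
            fire_frames.append(t)
            acc -= beta
    return fire_frames

def quantity_loss(alphas, target_len):
    """Quantity loss L_qua = |sum(alpha) - T*| from the equations above."""
    return abs(sum(alphas) - target_len)

weights = [0.25, 0.5, 0.5, 0.25, 0.5, 0.5, 0.5]   # toy per-frame weights
print(cif_integrate(weights, target_len=3))        # -> [2, 4, 6]
print(quantity_loss(weights, 3))                   # -> 0.0
```

Note that the quantity loss pushes the unscaled weight sum toward the target length, so the number of fired tokens matches the target at convergence.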
Methodology

Approximate the word count used as the target length T* from:
• Token count (Token #)
• Self-information from n-gram LMs: I(y) = −log P(y)

Training loss: ℒ = ℒ_RNNT + ℒ_qua
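A toy illustration of self-information under a unigram LM; the poster uses 1- to 5-gram LMs, and the corpus and tokenization below are assumptions for illustration only.

```python
import math
from collections import Counter

def self_information(sentence, counts, total):
    """I(y) = -log P(y). Under a unigram LM the probability factorizes
    over tokens, so I(y) = -sum(log P(token))."""
    return -sum(math.log(counts[tok] / total) for tok in sentence.split())

# Toy corpus statistics (assumed purely for illustration).
corpus = "the cat sat on the mat the dog sat".split()
counts, total = Counter(corpus), len(corpus)

# Longer utterances with rarer tokens accumulate more self-information,
# which is why it can stand in for utterance length.
print(self_information("the cat sat", counts, total))
print(self_information("the dog sat on the mat", counts, total))
```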
The approximation slope a is calculated by linear regression on reference text: the approximated word count is T̃* = a · x, where x is the token count or the self-information of the utterance. Slopes are fit separately for LibriSpeech and CSJ.
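The slope fit can be sketched as a least-squares regression through the origin (whether the poster's regression includes an intercept is not stated; the data below are made up):

```python
def fit_slope(x, y):
    """Least-squares slope for y ≈ a*x with no intercept: a = Σxy / Σx²."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Made-up data: token count vs. reference word count per utterance.
tokens = [10, 20, 30, 40]
words = [5, 10, 16, 20]

a = fit_slope(tokens, words)
print(a)        # -> 0.51
print(a * 25)   # approximated word count for a 25-token utterance -> 12.75
```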
Approximation Parameters (slope a, regression MSE, and 1/slope):

                 LibriSpeech                    CSJ
Type       slope    MSE     1/slope    slope    MSE     1/slope
Token #    0.504   28.128    1.983     0.639    8.772    1.564
1-gram     0.214   23.449    4.667     0.271   14.326    3.689
2-gram     0.281   27.387    3.560     0.417   16.199    2.401
3-gram     0.364   24.877    2.750     0.558   17.173    1.791
4-gram     0.455   24.435    2.196     0.666   14.771    1.502
5-gram     0.507   28.751    1.971     0.722   14.405    1.385

[Figure: scatter plots of reference word count vs. −log P under the 3-gram LM, for LibriSpeech and CSJ.]
Results

Error rates (%) and real-time factor (RTF) on LibriSpeech and CSJ (fluent):

             LibriSpeech                         CSJ (fluent)
Type     clean   other   RTF (×10⁻⁵)    eval1   eval2   eval3   RTF (×10⁻⁴)
No CIF    3.22    8.08      5.31         4.53    3.81    3.77     13.08
Word #    3.49    8.59      3.50          -       -       -         -
Token #   3.54    8.57      3.37         4.43    3.90    4.12      9.20
1-gram    3.50    8.41      3.34         4.47    3.77    4.10      9.20
2-gram    3.47    8.52      3.34         4.42    3.80    4.04      9.09
3-gram    3.38    8.23      3.33         4.55    3.79    3.94      9.10
4-gram    3.38    8.45      3.37         4.52    3.99    4.06      9.07
5-gram    3.47    8.50      3.38         4.66    4.03    4.04      9.08
Findings

[Figure: word segmentation, naïve vs. actual.]
[Figure: impact of chunk size on error rates.]