UEC Int’l Mini-Conference No.52

Continuous Sign Language Recognition using Squeeze-and-Excitation Networks


Nguyen-Tu NAM∗, Hiroki TAKAHASHI
Department of Informatics
The University of Electro-Communications
Tokyo, Japan


             Keywords: Continuous Sign Language Recognition (CSLR), CNN, GRU, CTC.



                                                        Abstract

Continuous Sign Language Recognition (CSLR) is an actively studied task that aims to minimize the communication gap between the hearing and the Deaf and Hard of Hearing (DHH) communities by translating sign language videos into written or spoken language. The complexity of sign language, which combines hand movements and facial expressions, makes effectively learning spatial-temporal features from video inputs a significant challenge for CSLR systems. This paper proposes an efficient CSLR framework that leverages Squeeze-and-Excitation (SE) networks to enhance feature representation. SE blocks are integrated into a 2D CNN feature extraction module. Subsequently, a 1D CNN and a Bidirectional Gated Recurrent Unit (BiGRU) component capture short-term and long-term dependencies, respectively, followed by a classifier trained with Connectionist Temporal Classification (CTC) loss. The SE module computes an attention weight for each channel to emphasize the most informative features, allowing the network to focus on the critical aspects of the sign gestures. The effectiveness of the proposed model was evaluated on two benchmark datasets, RWTH-PHOENIX-Weather and RWTH-PHOENIX-Weather 2014T, both in German Sign Language. The results demonstrate that the proposed framework achieves state-of-the-art performance.
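The abstract gives no code, but the per-channel attention computation that an SE module performs (squeeze by global average pooling, excitation through a bottleneck MLP with a sigmoid, then channel-wise rescaling) can be sketched as follows. The function name `se_block`, the weight names `w1`/`w2`, and all dimensions are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    """Logistic function used to map excitation outputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Apply Squeeze-and-Excitation to one (C, H, W) feature map.

    w1 (C//r x C) and w2 (C x C//r) are the bottleneck weights of the
    excitation MLP, with r the reduction ratio (biases omitted here).
    """
    # Squeeze: global average pooling collapses each channel to a scalar.
    z = feature_map.mean(axis=(1, 2))              # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives per-channel weights.
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))      # shape (C,), values in (0, 1)
    # Recalibrate: scale every channel by its attention weight.
    return feature_map * s[:, None, None]

# Toy example: 8 channels, a 4x4 spatial map, reduction ratio r = 2.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
y = se_block(x, w1, w2)   # same shape as x, channels rescaled
```

In a full CSLR pipeline such a block would sit inside the 2D CNN, recalibrating channels frame by frame before the 1D CNN and BiGRU model the temporal dependencies.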

∗ The author is supported by the (AiQusci) MEXT Scholarship.