UEC Int’l Mini-Conference No.52
Continuous Sign Language Recognition using Squeeze-and-Excitation Networks
Nguyen-Tu NAM∗, Hiroki TAKAHASHI
Department of Informatics
The University of Electro-Communications
Tokyo, Japan
Keywords: Continuous Sign Language Recognition (CSLR), CNN, GRU, CTC.
Abstract
Continuous Sign Language Recognition (CSLR) is an important task, studied in depth to narrow the communication gap between the hearing and the Deaf and Hard of Hearing (DHH) communities by translating sign language videos into written or spoken language. The complexity of sign language, which combines hand movements and facial expressions, makes effectively learning spatial-temporal features from video input a significant challenge for CSLR systems. This paper proposes an efficient CSLR framework that leverages Squeeze-and-Excitation (SE) networks to enhance feature representation. SE blocks are integrated into a 2D CNN feature extraction module. Subsequently, a 1D CNN and a Bidirectional Gated Recurrent Unit (BiGRU) component capture short-term and long-term dependencies, respectively, followed by a classifier trained with Connectionist Temporal Classification (CTC) loss. The SE module computes attention weights for each channel to emphasize the most informative features, allowing the network to focus on the critical aspects of the sign gestures. The effectiveness of our proposed model was evaluated on two German Sign Language benchmark datasets, RWTH-PHOENIX-Weather and RWTH-PHOENIX-Weather 2014T. The results demonstrate that our proposed framework achieves state-of-the-art performance.
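
To make the described architecture concrete, the sketch below gives one plausible PyTorch realization of the pipeline: a frame-wise 2D CNN with SE blocks, a 1D temporal convolution for short-term dependencies, a BiGRU for long-term dependencies, and a gloss classifier trained with CTC loss. The layer sizes, the toy backbone, the gloss vocabulary size, and names such as SEBlock and CSLRSketch are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: re-weights CNN channels by learned attention.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):                            # x: (N, C, H, W)
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                 # emphasize the most informative channels

class CSLRSketch(nn.Module):
    # Illustrative CSLR pipeline: 2D CNN + SE -> 1D CNN -> BiGRU -> CTC classifier.
    def __init__(self, num_glosses, hidden=512):
        super().__init__()
        # Toy frame-wise feature extractor; the paper's 2D CNN backbone is not specified here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            SEBlock(64),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            SEBlock(128),
            nn.AdaptiveAvgPool2d(1),
        )
        # 1D convolution over time for short-term dependencies.
        self.temporal = nn.Sequential(
            nn.Conv1d(128, hidden, kernel_size=5, padding=2), nn.ReLU(inplace=True),
        )
        # BiGRU for long-term dependencies.
        self.bigru = nn.GRU(hidden, hidden // 2, num_layers=2,
                            bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol.
        self.classifier = nn.Linear(hidden, num_glosses + 1)

    def forward(self, frames):                       # frames: (N, T, 3, H, W)
        n, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(n, t, -1)    # (N, T, 128)
        feats = self.temporal(feats.transpose(1, 2)).transpose(1, 2)  # (N, T, hidden)
        feats, _ = self.bigru(feats)
        return self.classifier(feats).log_softmax(-1)                 # (N, T, glosses + 1)

A minimal training step with CTC loss, again with made-up shapes and vocabulary size, could look like this:

model = CSLRSketch(num_glosses=1232)                # vocabulary size is illustrative only
ctc = nn.CTCLoss(blank=1232, zero_infinity=True)    # blank index = last class
video = torch.randn(2, 16, 3, 112, 112)             # (batch, frames, C, H, W)
log_probs = model(video).permute(1, 0, 2)           # CTCLoss expects (T, N, classes)
targets = torch.randint(0, 1232, (2, 5))            # dummy gloss label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 16),
           target_lengths=torch.full((2,), 5))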
∗ The author is supported by the (AiQusci) MEXT Scholarship.