UEC Int’l Mini-Conference No.52
Continuous Sign Language Recognition using Squeeze-and-Excitation Networks
Nguyen-Tu NAM∗, Hiroki TAKAHASHI
Department of Informatics
The University of Electro-Communications
Tokyo, Japan
Keywords: Continuous Sign Language Recognition (CSLR), CNN, GRU, CTC.
Abstract
Continuous Sign Language Recognition (CSLR) is an important task, studied in depth to narrow the communication gap between the hearing and the Deaf and Hard of Hearing (DHH) communities by translating sign language videos into written or spoken language. The complexity of sign language, which combines hand movements and facial expressions, makes effectively learning spatial-temporal features from video input a significant challenge for CSLR systems. This paper proposes an efficient CSLR framework that leverages Squeeze-and-Excitation (SE) networks to enhance feature representation. SE blocks are integrated into a 2D CNN feature extraction module. Subsequently, a 1D CNN and a Bidirectional Gated Recurrent Unit (BiGRU) component capture short-term and long-term dependencies, respectively, followed by a classifier trained with Connectionist Temporal Classification (CTC) loss. The SE module computes attention weights for each channel to emphasize the most informative features, allowing the network to focus on the critical aspects of the sign gestures. The effectiveness of our proposed model was evaluated on two German Sign Language benchmark datasets, RWTH-PHOENIX-Weather and RWTH-PHOENIX-Weather 2014T. The results demonstrate that our proposed framework achieves state-of-the-art performance.
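
To make the described architecture concrete, the sketch below gives one plausible PyTorch realization of the pipeline: a frame-wise 2D CNN with SE blocks, a 1D temporal convolution for short-term dependencies, a BiGRU for long-term dependencies, and a gloss classifier trained with CTC loss. The layer sizes, the toy backbone, the gloss vocabulary size, and names such as SEBlock and CSLRSketch are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: re-weights CNN channels by learned attention.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global spatial average
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x):                            # x: (N, C, H, W)
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                                 # emphasize the most informative channels

class CSLRSketch(nn.Module):
    # Illustrative CSLR pipeline: 2D CNN + SE -> 1D CNN -> BiGRU -> CTC classifier.
    def __init__(self, num_glosses, hidden=512):
        super().__init__()
        # Toy frame-wise feature extractor; the paper's 2D CNN backbone is not specified here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            SEBlock(64),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            SEBlock(128),
            nn.AdaptiveAvgPool2d(1),
        )
        # 1D convolution over time for short-term dependencies.
        self.temporal = nn.Sequential(
            nn.Conv1d(128, hidden, kernel_size=5, padding=2), nn.ReLU(inplace=True),
        )
        # BiGRU for long-term dependencies.
        self.bigru = nn.GRU(hidden, hidden // 2, num_layers=2,
                            bidirectional=True, batch_first=True)
        # One extra output class for the CTC blank symbol.
        self.classifier = nn.Linear(hidden, num_glosses + 1)

    def forward(self, frames):                       # frames: (N, T, 3, H, W)
        n, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(n, t, -1)    # (N, T, 128)
        feats = self.temporal(feats.transpose(1, 2)).transpose(1, 2)  # (N, T, hidden)
        feats, _ = self.bigru(feats)
        return self.classifier(feats).log_softmax(-1)                 # (N, T, glosses + 1)

A minimal training step with CTC loss, again with made-up shapes and vocabulary size, could look like this:

model = CSLRSketch(num_glosses=1232)                # vocabulary size is illustrative only
ctc = nn.CTCLoss(blank=1232, zero_infinity=True)    # blank index = last class
video = torch.randn(2, 16, 3, 112, 112)             # (batch, frames, C, H, W)
log_probs = model(video).permute(1, 0, 2)           # CTCLoss expects (T, N, classes)
targets = torch.randint(0, 1232, (2, 5))            # dummy gloss label sequences
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 16),
           target_lengths=torch.full((2,), 5))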
∗ The author is supported by the (AiQusci) MEXT Scholarship.