
UEC Int'l Mini-Conference No.54

Advanced EEG-to-Text Translation Using Pre-trained
Language Models and Multi-Modal Transformers

Jose Manuel Carrichi Chavez 1, Prof. Toru Nakashika 2

1 UEC Exchange Study Program JUSST, Instituto Politécnico Nacional, Mexico.
2 Department of Computer and Network Engineering, Graduate School of Informatics and
Engineering, The University of Electro-Communications, Japan



Introduction

Brain-Computer Interfaces (BCIs) seek to enhance communication for people with motor or speech impairments. While EEG-to-text systems have advanced, they still rely on closed vocabularies and eye-tracking. EEG2TEXT [1] improved open-vocabulary accuracy using EEG-specific pretraining and a multi-view transformer. Meanwhile, LLMs such as GPT and LLaMA have transformed NLP through strong multimodal capabilities.

We propose combining an EEG2TEXT-style encoder with LLaMA as the decoder, using the ZuCo dataset, which contains EEG recorded during reading tasks.

Figure 2. Transformer Pretraining: The convolutional transformer will be pretrained in a self-supervised manner using a masking strategy.

Objectives

• Develop and evaluate an EEG-to-text translation system that integrates a transformer-based EEG encoder with a state-of-the-art large language model (LLM) as the decoder.
• Improve the accuracy, fluency, and semantic coherence of text generated from non-invasive EEG signals, surpassing existing methods.
• Achieve semantically enriched EEG representations through a self-supervised pretraining phase of the EEG encoder, using tasks such as masked EEG signal reconstruction.
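The masked-reconstruction pretraining task can be sketched as follows. This is a minimal numpy illustration, not the actual EEG2TEXT setup: the array shapes, the 15% mask ratio, and zeroing as the corruption strategy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy EEG segment: 105 channels x 500 time steps (illustrative shapes).
eeg = rng.standard_normal((105, 500))

# Mask ~15% of time steps (assumption; the real ratio is a hyperparameter).
mask = rng.random(500) < 0.15
corrupted = eeg.copy()
corrupted[:, mask] = 0.0  # masked positions are zeroed before encoding

def reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the masked time steps."""
    diff = pred[:, mask] - target[:, mask]
    return float(np.mean(diff ** 2))

# A perfect reconstruction gives zero loss; the corrupted input does not.
assert reconstruction_loss(eeg, eeg, mask) == 0.0
loss = reconstruction_loss(corrupted, eeg, mask)
```

During pretraining, the encoder would be trained to minimize this loss, forcing it to infer masked EEG segments from their surrounding context.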
                            Methodology











Figure 1. Spatiotemporal Convolution: Since a standard transformer cannot handle extremely long sequences, a CNN encoder is used to compress the signal into a more manageable sequence for the transformer.

Figure 3. Spatial Modeling with Multi-View Transformer: A multi-view transformer will be implemented by dividing EEG signals into 12 groups based on brain regions (e.g., frontal, parietal, occipital).
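These two preprocessing ideas, temporal compression and region-wise splitting, can be sketched in numpy. The shapes, the averaging window, and the index-based grouping are illustrative assumptions; the real system uses a learned CNN and groups electrodes by anatomical location.

```python
import numpy as np

rng = np.random.default_rng(1)
eeg = rng.standard_normal((105, 500))  # channels x time (toy shapes)

def compress_time(x, window=10):
    """Stand-in for the CNN encoder: strided averaging shortens the
    sequence (here 500 steps -> 50) so the transformer can handle it."""
    c, t = x.shape
    return x[:, : t - t % window].reshape(c, -1, window).mean(axis=2)

def split_regions(x, n_groups=12):
    """Multi-view input: partition channels into 12 region groups
    (a real system would group by electrode location, not index)."""
    return np.array_split(x, n_groups, axis=0)

# Each brain-region "view" is compressed independently in time.
views = [compress_time(g) for g in split_regions(eeg)]
```

Each view then feeds its own transformer branch, letting the model attend within and across brain regions.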






Equation 1. Self-supervised pretraining objective.
Equation 2. Conditional probability of text generation given EEG input.
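The equation images did not survive extraction; the following is a standard formulation consistent with the two captions, not necessarily the exact notation on the poster. For Equation 1, $M$ denotes the set of masked time steps and $\hat{x}_t$ the encoder's reconstruction; for Equation 2, $\mathrm{Enc}(X)$ is the EEG encoder's output conditioning the autoregressive decoder.

```latex
% Eq. 1: self-supervised pretraining (masked EEG reconstruction)
\mathcal{L}_{\text{pre}} = \frac{1}{|M|} \sum_{t \in M} \left\| \hat{x}_t - x_t \right\|_2^2

% Eq. 2: conditional probability of text y given EEG input X
p(y \mid X) = \prod_{t=1}^{T} p\!\left(y_t \mid y_{<t}, \mathrm{Enc}(X)\right)
```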




             References

[1] H. Liu, D. Hajialigol, B. Antony, A. Han, and X. Wang, "EEG2Text: Open vocabulary EEG-to-text translation with multi-view transformer", in 2024 IEEE International Conference on Big Data (BigData), 2024, pp. 1824–1833.