
UEC Int'l Mini-Conference No.54

Advanced EEG-to-Text Translation Using Pre-trained
Language Models and Multi-Modal Transformers

Jose Manuel Carrichi Chavez 1, Prof. Toru Nakashika 2

1 UEC Exchange Study Program JUSST, Instituto Politécnico Nacional, Mexico.
2 Department of Computer and Network Engineering, Graduate School of Informatics and
Engineering, The University of Electro-Communications, Japan



Introduction

Brain-Computer Interfaces (BCIs) seek to enhance communication for people with motor or speech impairments. While EEG-to-text systems have advanced, they still rely on closed vocabularies and eye-tracking. EEG2TEXT [1] improved open-vocabulary accuracy using EEG-specific pretraining and a multi-view transformer. Meanwhile, LLMs such as GPT and LLaMA have transformed NLP through strong multimodal capabilities.

We propose combining an EEG2TEXT-style encoder with LLaMA as the decoder, using the ZuCo dataset, which contains EEG recorded during reading tasks.

Figure 2. Transformer Pretraining: The convolutional transformer will be pretrained in a self-supervised manner using a masking strategy.

Objectives

• Develop and evaluate an EEG-to-text translation system that integrates a transformer-based EEG encoder with a state-of-the-art large language model (LLM) as the decoder.
• Improve the accuracy, fluency, and semantic coherence of text generated from non-invasive EEG signals, surpassing existing methods.
• Achieve semantically enriched EEG representations through a self-supervised pretraining phase of the EEG encoder, using tasks such as masked EEG signal reconstruction.
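The masked-reconstruction pretraining task can be sketched as follows. This is a minimal numpy illustration, not the actual EEG2TEXT setup: the array shapes, the 15% mask ratio, and zeroing as the corruption strategy are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy EEG segment: 105 channels x 500 time steps (illustrative shapes).
eeg = rng.standard_normal((105, 500))

# Mask ~15% of time steps (assumption; the real ratio is a hyperparameter).
mask = rng.random(500) < 0.15
corrupted = eeg.copy()
corrupted[:, mask] = 0.0  # masked positions are zeroed before encoding

def reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over the masked time steps."""
    diff = pred[:, mask] - target[:, mask]
    return float(np.mean(diff ** 2))

# A perfect reconstruction gives zero loss; the corrupted input does not.
assert reconstruction_loss(eeg, eeg, mask) == 0.0
loss = reconstruction_loss(corrupted, eeg, mask)
```

During pretraining, the encoder would be trained to minimize this loss, forcing it to infer masked EEG segments from their surrounding context.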
                            Methodology











Figure 1. Spatiotemporal Convolution: Since a standard transformer cannot handle extremely long sequences, a CNN encoder is used to compress the signal into a more manageable sequence for the transformer.

Figure 3. Spatial Modeling with Multi-View Transformer: A multi-view transformer will be implemented by dividing EEG signals into 12 groups based on brain regions (e.g., frontal, parietal, occipital).
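These two preprocessing ideas, temporal compression and region-wise splitting, can be sketched in numpy. The shapes, the averaging window, and the index-based grouping are illustrative assumptions; the real system uses a learned CNN and groups electrodes by anatomical location.

```python
import numpy as np

rng = np.random.default_rng(1)
eeg = rng.standard_normal((105, 500))  # channels x time (toy shapes)

def compress_time(x, window=10):
    """Stand-in for the CNN encoder: strided averaging shortens the
    sequence (here 500 steps -> 50) so the transformer can handle it."""
    c, t = x.shape
    return x[:, : t - t % window].reshape(c, -1, window).mean(axis=2)

def split_regions(x, n_groups=12):
    """Multi-view input: partition channels into 12 region groups
    (a real system would group by electrode location, not index)."""
    return np.array_split(x, n_groups, axis=0)

# Each brain-region "view" is compressed independently in time.
views = [compress_time(g) for g in split_regions(eeg)]
```

Each view then feeds its own transformer branch, letting the model attend within and across brain regions.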






Equation 1. Self-supervised pretraining objective.
Equation 2. Conditional probability of text generation given EEG input.
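The equation images did not survive extraction; the following is a standard formulation consistent with the two captions, not necessarily the exact notation on the poster. For Equation 1, $M$ denotes the set of masked time steps and $\hat{x}_t$ the encoder's reconstruction; for Equation 2, $\mathrm{Enc}(X)$ is the EEG encoder's output conditioning the autoregressive decoder.

```latex
% Eq. 1: self-supervised pretraining (masked EEG reconstruction)
\mathcal{L}_{\text{pre}} = \frac{1}{|M|} \sum_{t \in M} \left\| \hat{x}_t - x_t \right\|_2^2

% Eq. 2: conditional probability of text y given EEG input X
p(y \mid X) = \prod_{t=1}^{T} p\!\left(y_t \mid y_{<t}, \mathrm{Enc}(X)\right)
```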




             References

[1] H. Liu, D. Hajialigol, B. Antony, A. Han, and X. Wang, "EEG2Text: Open vocabulary EEG-to-text translation with multi-view transformer", in 2024 IEEE International Conference on Big Data (BigData), 2024, pp. 1824–1833.