UEC Int’l Mini-Conference No.52








                 Odor Detection Based on Semantic Context in Video Subtitles


Yi Chen Shen∗1, Haruka MATSUKURA2, and Maki SAKAMOTO2
                                  1 UEC Exchange Study Program (JUSST Program)
                                   2 Graduate School of Informatics and Engineering
                              The University of Electro-Communications, Tokyo, Japan





                                                       Abstract


                   Our research introduces an innovative approach to enhancing multimedia experiences by developing
                a software system that uses advanced language models to detect and predict relevant odors from
                video subtitles. This enables the addition of olfactory information to conventional multimedia, which
                typically comprises only visual and audio elements. By applying fine-tuning and modern prompt
                engineering techniques to large language models, we achieved over 95% accuracy in odor prediction
                tasks. Our comprehensive evaluation, including both model comparisons and a user study with a
                13-component olfactory display, demonstrates the system’s effectiveness in terms of accuracy, cost-
                efficiency, and user engagement. This work showcases the potential for further advancements in
                multimedia experiences through synchronized odor integration, paving the way for more immersive
                and engaging content consumption.
            Keywords: Olfactory Display, Virtual Reality, Artificial Intelligence, Large Language Model, Human-
            Computer Interaction.

1 Introduction

In today’s world, multimedia experiences are evolving beyond traditional audiovisual content to create more immersive and engaging interactions. While significant advancements have been made in visual and auditory technologies, the integration of olfactory stimuli (our sense of smell) remains largely unexplored and underutilized. This research aims to bridge this sensory gap by developing an innovative software system that analyzes video subtitles to detect and suggest corresponding scents, thereby enhancing the overall multimedia experience.

Our approach harnesses the power of advanced language models to understand the context and nuances of language used in subtitles. By identifying phrases or words that hint at specific scents, our system enables a dynamic and responsive scent experience synchronized with video content. This paper explores the development of our software, focusing on how it processes textual information and predicts relevant odor labels using state-of-the-art language models, fine-tuning techniques, and modern prompt engineering methods.

The key objectives of this research are:

1. To develop a robust system for detecting and predicting odors based on semantic analysis of video subtitles.

2. To evaluate and compare the performance of various large language models in the context of odor prediction.

3. To assess the real-world effectiveness and user perception of olfactory-enhanced multimedia through a comprehensive user study.

Our work builds upon existing research in olfactory displays and multi-modal sensory integration, while introducing novel techniques

∗ The author is supported by JASSO Scholarship.
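The subtitle-to-odor mapping described above can be sketched as follows. This is a minimal illustrative stand-in: the label set, keyword lexicon, and function name are assumptions for the sketch, and simple keyword matching substitutes here for the fine-tuned language models the system actually uses.

```python
from typing import Optional

# Hypothetical lexicon mapping scent cues in subtitle text to odor labels.
# These labels and keywords are illustrative only, not the actual
# 13-component configuration used with the olfactory display.
ODOR_LEXICON = {
    "coffee": ("coffee", "espresso", "latte"),
    "ocean": ("sea", "ocean", "beach", "waves"),
    "floral": ("rose", "flower", "blossom", "bouquet"),
    "smoke": ("fire", "smoke", "burning", "campfire"),
}


def predict_odor(subtitle: str) -> Optional[str]:
    """Return the first odor label whose cue words appear in the
    subtitle line, or None when no scent cue is detected."""
    text = subtitle.lower()
    for label, cues in ODOR_LEXICON.items():
        if any(cue in text for cue in cues):
            return label
    return None
```

In the full system this lookup would be performed by a prompted or fine-tuned language model rather than a fixed lexicon, but the interface stays the same: a subtitle line goes in, and an odor label (or no label) comes out for the olfactory display to act on.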