
UEC Int’l Mini-Conference No.54









                Swin-UNet++: A Multi-Scale Transformer-CNN Hybrid for Brain
                                     Tumor Segmentation in MRI Scans


                       Rinvi Jaman Riti∗1, Norihiro Koizumi2, Yu Nishiyama3, and Peiji Chen4
                                   1 UEC Exchange Study Program (JUSST Program)
                            2,4 Department of Mechanical and Intelligent Systems Engineering
                                   3 Department of Computer and Network Engineering
                                The University of Electro-Communications, Tokyo, Japan




             Keywords: Brain Tumor Segmentation, Swin Transformer, U-Net++, MRI Scans, Deep Learning,
             Medical Image Analysis



                                                        Abstract
                    Brain tumor segmentation is a key step in clinical diagnosis and therapy planning. Manual
                 segmentation of tumors in MRI scans, however, is labor-intensive, time-consuming, and prone to
                 inter-observer variability. We therefore propose Swin-UNet++, a hybrid deep learning model that
                 combines the complementary strengths of convolutional neural networks and Swin Transformer
                 blocks. The architecture achieves accurate and efficient brain tumor segmentation by capturing
                 both local and global image features: the shifted-window attention of the Swin Transformer
                 preserves fine-grained details, while the dense skip connections of U-Net++ provide global
                 contextual connectivity. We developed and evaluated the model on the BraTS 2021 Task 1 dataset,
                 which contains multi-modal magnetic resonance imaging scans with tumor masks precisely
                 annotated by healthcare professionals. In our experiments, Swin-UNet++ achieved a Dice score of
                 0.89, an Intersection-over-Union (IoU) of 0.83, and high pixel-wise accuracy, demonstrating
                 better precision and robustness than conventional CNN-based designs. This hybrid methodology
                 shows strong potential for real-world clinical applications and sets a precedent for
                 Transformer-CNN fusion in medical image segmentation.
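The reported Dice score (0.89) and IoU (0.83) are standard overlap metrics for binary segmentation masks. As a point of reference, the following is a minimal NumPy sketch of how these metrics are typically computed; the function names and the toy masks are illustrative and not taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) over two binary masks."""
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    """IoU (Jaccard) = |A ∩ B| / |A ∪ B| over two binary masks."""
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Toy 2x3 binary masks: prediction and ground-truth annotation.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]], dtype=bool)
target = np.array([[1, 0, 0],
                   [0, 1, 1]], dtype=bool)

print(dice_coefficient(pred, target))  # ≈ 0.667 (overlap of 2 pixels, 3+3 total)
print(iou(pred, target))               # ≈ 0.5   (overlap of 2 pixels, union of 4)
```

The two metrics are monotonically related for a single mask pair (IoU = Dice / (2 − Dice)), which is why papers often report both: Dice weights the overlap more generously, while IoU penalizes false positives and false negatives more strictly.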





















               ∗ The author is supported by JASSO Scholarship.