Swin-UNet++: A Multi-Scale Transformer-CNN Hybrid for Brain
Tumor Segmentation in MRI Scans
Rinvi Jaman Riti∗1, Norihiro Koizumi2, Yu Nishiyama3, and Peiji Chen4
1 UEC Exchange Study Program (JUSST Program)
2,4 Department of Mechanical and Intelligent Systems Engineering
3 Department of Computer and Network Engineering
The University of Electro-Communications, Tokyo, Japan
Keywords: Brain Tumor Segmentation, Swin Transformer, U-Net++, MRI Scans, Deep Learning,
Medical Image Analysis
Abstract
Brain tumor segmentation is an essential step in clinical diagnosis and therapy planning. Manual
delineation of tumors in MRI scans, however, is labor-intensive, time-consuming, and prone to
inter-observer variability. We introduce Swin-UNet++, a hybrid deep learning model that combines
the strengths of convolutional neural networks with Swin Transformer blocks. The architecture
targets accurate and efficient brain tumor segmentation by capturing both local and global image
features: the shifted-window attention of the Swin Transformer preserves fine-grained detail, while
the dense skip connections of U-Net++ supply global contextual connectivity across scales. We
developed and evaluated the model on the BraTS 2021 Task 1 dataset, which contains multi-modal
magnetic resonance scans with tumor masks precisely annotated by healthcare professionals. In our
experiments, Swin-UNet++ achieved a Dice score of 0.89, an Intersection-over-Union (IoU) of 0.83,
and high pixel-wise accuracy, showing better precision and robustness than conventional CNN-based
designs. The hybrid methodology shows strong potential for real-world clinical applications and sets
a precedent for Transformer-CNN fusion in medical image segmentation.
∗ The author is supported by a JASSO Scholarship.
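As a point of reference for the evaluation metrics quoted in the abstract, the sketch below (not taken
from the paper) shows how the Dice score and IoU are conventionally computed for binary segmentation
masks. The function names and the use of NumPy are our own assumptions, not part of the original work.

import numpy as np

def dice_score(pred, target, eps=1e-7):
    # Dice = 2|P & T| / (|P| + |T|); eps guards against empty masks.
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    # IoU = |P & T| / |P | T|; eps guards against empty masks.
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

# Hypothetical usage: compare a binarized model output against an expert mask.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 1, 0], [0, 0, 1]])
print(f"Dice: {dice_score(pred, target):.3f}, IoU: {iou_score(pred, target):.3f}")

For a single mask pair the two metrics are related by Dice = 2·IoU/(1 + IoU); when scores are averaged
over a dataset the identity holds only approximately, which is consistent with the reported 0.89/0.83 pair.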