Page 16 - 2025S

UEC Int’l Mini-Conference No.54                                                                9







tures to generate Chinese or Latin fonts using multiple reference images. Their model is effective for simpler scripts, but such models require a large number of examples and often struggle with structural consistency in more complex scripts. In our study, most of these methods failed to generate Bengali fonts with complex structures, such as characters with many looped and curved stroke patterns.

2.2   Diffusion Model Methods

In recent years, Denoising Diffusion Probabilistic Models (DDPMs) [5] have achieved state-of-the-art results in image synthesis. Several diffusion-based methods have since been proposed. Okkhor-Diffusion [5] is a framework for class-guided generation of isolated Bangla handwritten characters using a DDPM. The model outperformed StyleGAN2-ADA in visual and structural quality on multiple datasets, including BanglaLekha-Isolated. It achieved strong results on FID, MS-SSIM, and LPIPS, and introduced a new metric, BCAFID, designed specifically for Bangla evaluation. Similarly, VecFusion [13] employed a two-stage cascaded diffusion model to generate editable vector fonts from glyph codepoints and font style inputs. Its discrete-continuous representation enables precise control point prediction across diverse glyph structures. In the context of font generation, FontDiffuser [19] applied a conditional diffusion model with multi-scale content fusion and contrastive style supervision to Chinese fonts. The method handles complex characters and large style variations effectively. Experiments on multiple benchmarks show that FontDiffuser outperforms state-of-the-art GAN-based methods in quality and generalization. Our method, BengaliDiff, also uses a diffusion model and extends this concept with new architectural elements suited to Bengali script.

2.3   Indic Scripts and Bengali Typography

Representing Bengali script is extremely difficult: the same basic character can take radically different visual forms depending on the absence or manner of application of a diacritic mark or conjunct. Few prior works have addressed Bengali font generation beyond handwriting or optical character recognition (OCR). One such method is a VAE-GAN-based model [1] that generates Bangla printed characters from handwritten inputs to support OCR and preserve rare compound characters. Using the CMATERdb [12] dataset and a printed font set, its architecture, with a U-Net-based generator and a CNN-based discriminator, learns image-to-image translation effectively. The authors list improving image quality, increasing model depth, and applying the system to full document digitization as future work. Moreover, BBCNet-15 [11] introduced a deep convolutional neural network for Bangla handwritten basic character recognition. The model, consisting of 15 layers including convolutional, pooling, and fully connected layers with dropout regularization, was trained on the CMATERdb 3.1.2 dataset. It achieved a test accuracy of 96.40%, outperforming previous methods, including SVM and earlier CNN models. Our approach differs from these previous approaches in that it concentrates on contemporary typefaces, although it also uses a diffusion model and reference font images for Bengali font development.

3   Methodology

3.1   Proposed Method Overview

Our proposed method, BengaliDiff, is an image generation method based on a conditional diffusion model. It takes a reference style glyph and a source content glyph as input, and generates a new glyph that imitates the reference style while preserving the source content. Many recent font generation methods have achieved strong results. In particular, FontDiffuser [19] has shown impressive results on various fonts, especially for complex fonts and styles, which require more substantial adjustments than those handled by earlier methods. We therefore use FontDiffuser as our base network for generating Bengali fonts. Fig. 1 shows the overall framework of the base method, FontDiffuser. This architec-