Page 15 - 2025S

8                                                                 UEC Int’l Mini-Conference No.54







Font [6]. However, when these models are directly applied to Bengali without explicitly accounting for the script's structural features, they frequently generalize poorly. In this paper, we propose BengaliDiff, a diffusion-based font generation framework designed to operate in a one-shot setting. BengaliDiff introduces the following training and architectural improvements:

• Adversarial supervision discriminator: Diffusion models produce realistic images by iteratively refining them; however, their outputs can lack pixel-level realism at the stroke level. To address this, we add a patch-level discriminator that discourages artifacts and encourages sharper results.

• Cross-Attention Content Fusion: At several levels of the U-Net architecture, our transformer-based cross-attention fuses the style and content features. This makes it easier for the model to synthesize glyphs that retain their intricate structure while adopting the desired style.

Our contributions are summarized as follows:

1. We propose a diffusion model tailored specifically to the Bengali script.

2. We introduce a joint hybrid training objective that combines diffusion denoising, adversarial learning, and contrastive style guidance.

3. We achieve state-of-the-art performance on Bengali few-shot font generation benchmarks.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents our training technique and model architecture. Section 4 describes the experimental design together with the quantitative and qualitative results. Section 5 offers a broader discussion and future directions, and Section 6 concludes the paper.

2    Related Work

To the best of our knowledge, there is currently no literature on Bengali font generation using a diffusion approach. It should be noted, however, that GANs have been studied for generating handwritten Bengali characters. This section provides an overview of the studies related to this work, BengaliDiff.

2.1   Font Generation

Previous approaches to font generation [3] applied direct style-content disentanglement, for example VAE-GAN approaches [1] or supervised component-based GANs [7]. Among representative works, LF-Font [10] used a factorization strategy, and CG-GAN [7] decomposed glyphs into shared primitives or strokes, which were used to transfer styles between glyph sets [2]. Diff-Font [6] is a one-shot font generation framework based on a multi-attribute conditional diffusion model. Unlike GAN-based methods, Diff-Font achieves stable training and high-quality generation, especially for complex glyphs in Chinese and Korean. However, it is hindered by low inference speed, unreliable recognition of rare characters, and poor generalization to unseen glyphs due to its token dependence.

Moreover, MSD-Font [4] proposed a multi-stage few-shot font generation model built on latent diffusion and inspired by the workflow of expert designers. This model separates the generative process into structure construction, font transfer, and refinement stages. Its limitations include higher inference time and larger model size, owing to the complexity of diffusion models. DG-Font [17] presented an unsupervised font generation model that leverages deformable convolution and a novel Feature Deformation Skip Connection (FDSC) to capture geometric style variations. Additionally, a number of approaches have been proposed to extend the above work. XMPFont [8] presented a self-supervised cross-modality pre-training method. MX-Font++ [15] introduced an enhanced few-shot font generation model that uses Heterogeneous Aggregation Experts (HAE) for improved feature extraction; it also contributes a novel content-style homogeneity loss to better disentangle content and style in the latent space. Early works such as Auto-Encoder Guided GAN [9] and MC-GAN [2] applied encoder-decoder and GAN-based architec-