UEC Int’l Mini-Conference No.54 (2025S)
Font [6]. However, when these models are applied directly to Bengali without explicitly modeling the structural features of its glyphs, they frequently generalize poorly. In this paper, we propose BengaliDiff, a diffusion-based font generation framework designed to operate in a one-shot setting. BengaliDiff introduces the following training and architectural improvements.

• Adversarial supervision discriminator: Diffusion models generate realistic images by iteratively refining them, yet their outputs are not always realistic at the pixel level, particularly in fine stroke details. To address this, we add a patch-level discriminator that discourages artifacts and encourages sharper results.

• Cross-Attention Content Fusion: At several levels of the U-Net architecture, our transformer-based cross-attention fuses the style and content features. This helps the model synthesize glyphs that retain their intricate structure while adopting the desired style.

Our contributions are summarized as follows:

1. We propose a novel Bengali script-specific diffusion model.

2. We introduce a new hybrid joint training objective that combines diffusion denoising, adversarial learning, and contrastive style guidance.

3. We achieve state-of-the-art performance on Bengali few-shot font generation benchmarks.

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes our training technique and model architecture. Section 4 presents the experimental design along with the quantitative and qualitative results. Section 5 provides a broader discussion and future directions, and Section 6 concludes the paper.

2 Related Work

To the best of our knowledge, there is currently no literature on Bengali font generation using a diffusion approach, although GANs have been studied for generating handwritten Bengali characters. This section provides an overview of work related to BengaliDiff.

2.1 Font Generation

Previous approaches to font generation [3] applied direct style-content disentanglement, such as VAE-GAN approaches [1] or supervised component-based GANs [7]. Representative works such as LF-Font [10] used a factorization strategy, and CG-GAN [7] decomposed glyphs into shared primitives or strokes, which were used to transfer styles between glyph sets [2].

Diff-Font [6] is a one-shot font generation framework based on a multi-attribute conditional diffusion model. Unlike GAN-based methods, Diff-Font achieves stable training and high-quality generation, especially for complex glyphs in Chinese and Korean. However, it is hindered by slow inference, unreliable recognition of rare characters, and poor generalization to unseen glyphs due to its dependence on tokens.

Moreover, MSD-Font [4] proposed a multi-stage few-shot font generation model built on latent diffusion and inspired by the workflow of expert designers. This model separates the generative process into structure construction, font transfer, and refinement stages. Its limitations include longer inference time and larger model size, owing to the complexity of diffusion models. DG-Font [17] presented an unsupervised font generation model that leverages deformable convolution and a novel Feature Deformation Skip Connection (FDSC) to capture geometric style variations. Additionally, a number of approaches have been proposed to extend the above work. XMPFont [8] presented a self-supervised cross-modality pre-training method. MX-Font++ [15] introduced an enhanced few-shot font generation model that uses Heterogeneous Aggregation Experts (HAE) for improved feature extraction. This method also contributes a novel content-style homogeneity loss to better disentangle content and style in the latent space. Early works such as Auto-Encoder Guided GAN [9] and MC-GAN [2] applied encoder-decoder and GAN-based architec-