UEC Int’l Mini-Conference No.54 9
tures to generate Chinese or Latin fonts using multiple reference images. These models are effective for simpler scripts, but they require a large number of examples and often struggle with structural consistency in more complex scripts. In our study, most of these methods failed to generate Bengali fonts with complex structures, such as characters with many looped and curved stroke patterns.

2.2 Diffusion Model Methods

In recent years, Denoising Diffusion Probabilistic Models (DDPMs) [5] have achieved state-of-the-art results in image synthesis. One example is Okkhor-Diffusion [5], a framework for class-guided generation of Bangla isolated handwritten characters using a DDPM. The model outperformed StyleGAN2-ADA in visual and structural quality on multiple datasets, including BanglaLekha-Isolated. It achieved strong results on FID, MS-SSIM, and LPIPS, and its authors introduced a new metric, BCAFID, designed specifically for Bangla evaluation. Similarly, VecFusion [13] employed a two-stage cascaded diffusion model to generate editable vector fonts from glyph codepoints and font style inputs. Its discrete-continuous representation enables precise control point prediction across diverse glyph structures. In the context of font generation, FontDiffuser [19] applied a conditional diffusion model with multi-scale content fusion and contrastive style supervision to Chinese fonts. The method effectively handles complex characters and large style variations. Experiments on multiple benchmarks show that FontDiffuser outperforms state-of-the-art GAN-based methods in quality and generalization. Because our method also uses a diffusion model, BengaliDiff builds on this concept by adding new architectural elements suited to Bengali scripts.

2.3 Indic Scripts and Bengali Typography

Representing the Bengali script is extremely difficult: the same basic character can take radically different visual forms depending on whether, and how, a diacritic mark or conjunct is applied. Few prior works have addressed Bengali font generation beyond handwriting or optical character recognition (OCR). One example is a VAE-GAN-based model [1] that generates Bangla printed characters from handwritten inputs to support OCR and preserve rare compound characters. Using the CMATERdb [12] dataset and a printed font set, its architecture, with a U-Net-based generator and a CNN-based discriminator, learns image-to-image translation effectively. The authors' stated future work includes improving image quality, increasing model depth, and applying the system to full document digitization. Moreover, BBCNet-15 [11] introduced a deep convolutional neural network for Bangla handwritten basic character recognition. The model, consisting of 15 layers including convolutional, pooling, and fully connected layers with dropout regularization, was trained on the CMATERdb 3.1.2 dataset. It achieved a test accuracy of 96.40%, outperforming previous methods, including SVM and earlier CNN models. Although our approach also uses a diffusion model and reference font images for Bengali font development, it differs from the previous approaches in that it concentrates on contemporary typefaces.

3 Methodology

3.1 Proposed Method Overview

Our proposed BengaliDiff method is a conditional diffusion model-based image generation method. It takes a reference style glyph and a source content glyph as input, and generates a new glyph that imitates the reference style while preserving the source content. Recently, many font generation methods have achieved significant results. FontDiffuser [19] has shown impressive results on various fonts. It produces high-quality output, especially for complex fonts and styles, which demand larger adjustments than earlier methods could handle. We therefore use FontDiffuser as our base network for generating Bengali fonts. Fig. 1 shows the overall framework of the base method, FontDiffuser. This architec-