Page 19 - 2025S


12 UEC Int’l Mini-Conference No.54

Figure 3: Overview of our proposed method. The Cross-Attention Content Fusion module and the Patch-Level Discriminator are shown in the blue dashed boxes.

our BengaliDiff model extends it with a patch-based discriminator that is trained to classify small regions (patches) of a glyph as real or generated. The discriminator gives the model additional feedback on font texture and stroke clarity. This especially improves the sharpness and fine detail of the generated images, particularly in difficult regions such as matras, curved tails, and complex ligatures. The improvement is visible both in visual comparisons and in our quantitative results (LPIPS and FID). In summary, the patch-level discriminator acts as a corrective mechanism, guiding the BengaliDiff generator toward glyphs that are not only structurally accurate but also visually clean and typographically robust, surpassing the outputs of diffusion-only baselines.

4 Experiments and Results

4.1 Datasets and Evaluation Metrics

We collected 55 Bengali fonts (styles) from Kaggle. For the training set, we randomly choose 50 fonts (the "seen fonts"), each containing 800 unique characters. We evaluate the methods on two test sets: one comprises 5 unseen fonts with 162 unseen characters ("UFUC", unseen fonts with unseen characters), and the other comprises 10 seen fonts with 120 characters that are not seen during training ("SFUC", seen fonts with unseen characters). For quantitative analysis, we use FID, SSIM, LPIPS, RMSE, and L1 loss.

4.2 Implementation Details

We train FontDiffuser using the AdamW optimizer with β1 = 0.9 and β2 = 0.999. Input images are 96 × 96 pixels. In phase 1, we train the model for a total of 90,000 steps with a batch size of 16 and a learning rate of 1e−4 under a linear schedule. In the second phase, the learning rate is set to 1e−5, and training uses 16 negative samples, 30,000 steps in total, and a batch size of 16. A single RTX 3090 GPU is used for both training and testing.
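
The font and character split of Section 4.1 can be sketched as follows. This is a minimal illustration, not the authors' data pipeline: all font and character identifiers are hypothetical placeholders, the random seeds are arbitrary, and we assume the UFUC and SFUC character lists are disjoint from the 800 training characters (the text states only that they are unseen during training).

```python
import random

def split_fonts(font_names, n_seen=50, seed=0):
    """Randomly partition fonts into seen (training) and unseen fonts."""
    rng = random.Random(seed)
    shuffled = list(font_names)
    rng.shuffle(shuffled)
    return shuffled[:n_seen], shuffled[n_seen:]

def make_pairs(fonts, chars):
    """Every (font, character) combination to be rendered as a glyph."""
    return [(f, c) for f in fonts for c in chars]

# Hypothetical placeholder IDs; real data would be Bengali fonts and characters.
fonts = [f"font_{i:02d}" for i in range(55)]
train_chars = [f"train_char_{i}" for i in range(800)]
ufuc_chars = [f"ufuc_char_{i}" for i in range(162)]  # assumed held out
sfuc_chars = [f"sfuc_char_{i}" for i in range(120)]  # assumed held out

seen, unseen = split_fonts(fonts)                # 50 seen, 5 unseen fonts
sfuc_fonts = random.Random(1).sample(seen, 10)   # 10 of the seen fonts

train_set = make_pairs(seen, train_chars)
ufuc_set = make_pairs(unseen, ufuc_chars)
sfuc_set = make_pairs(sfuc_fonts, sfuc_chars)

print(len(train_set), len(ufuc_set), len(sfuc_set))  # 40000 810 1200
```

Under these assumptions the training set holds 50 × 800 glyphs, UFUC holds 5 × 162, and SFUC holds 10 × 120.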
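
Of the five metrics, L1 and RMSE are simple pixel-wise distances between a generated glyph and its ground truth. A minimal sketch, assuming grayscale images stored as nested lists of intensities on a common scale (the text does not say whether pixels are normalized to [0, 1] or [0, 255], which changes the absolute values). SSIM, LPIPS, and FID require dedicated implementations or pretrained networks and are not shown.

```python
import math

def _pixel_pairs(a, b):
    """Yield aligned pixel pairs from two equally sized 2-D images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    for ra, rb in zip(a, b):
        for x, y in zip(ra, rb):
            yield x, y

def l1_distance(a, b):
    """Mean absolute pixel difference."""
    diffs = [abs(x - y) for x, y in _pixel_pairs(a, b)]
    return sum(diffs) / len(diffs)

def rmse(a, b):
    """Square root of the mean squared pixel difference."""
    sq = [(x - y) ** 2 for x, y in _pixel_pairs(a, b)]
    return math.sqrt(sum(sq) / len(sq))

generated = [[0.0, 1.0], [1.0, 0.0]]
target = [[0.0, 0.0], [1.0, 1.0]]
print(l1_distance(generated, target))  # 0.5
print(rmse(generated, target))         # ~0.7071
```

Because RMSE squares each difference before averaging, it penalizes a few large stroke errors more heavily than L1 does.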
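
The patch-level discriminator described above scores local regions of a glyph rather than the whole image. The sketch below shows only the loss side of that idea, under assumptions not stated in the text: the discriminator network itself (not shown) is assumed to output a grid of logits, one per patch, and the standard binary cross-entropy GAN objective is assumed. The actual method may use a different adversarial loss (e.g. hinge or least squares) or weighting; all function names are hypothetical.

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy on a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def patch_adversarial_loss(logit_grid, is_real):
    """Average BCE over a grid of per-patch logits (one logit per patch)."""
    target = 1.0 if is_real else 0.0
    losses = [bce_with_logits(z, target) for row in logit_grid for z in row]
    return sum(losses) / len(losses)

def discriminator_loss(real_grid, fake_grid):
    """Discriminator objective: real patches -> 1, generated patches -> 0."""
    return (patch_adversarial_loss(real_grid, True)
            + patch_adversarial_loss(fake_grid, False))

def generator_adversarial_loss(fake_grid):
    """Generator objective: push every patch of a generated glyph toward 'real'."""
    return patch_adversarial_loss(fake_grid, True)

# Toy 2x2 logit grid for one generated glyph: three patches look real,
# one (e.g. a blurry matra) is confidently judged fake.
fake_logits = [[3.0, 3.0], [3.0, -4.0]]
print(round(generator_adversarial_loss(fake_logits), 3))  # 1.041
```

Averaging over the grid means each patch contributes its own error signal, so a single poorly rendered region (here the fourth patch) dominates the generator loss even when the rest of the glyph looks convincing.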
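
The hyperparameters of Section 4.2 can be collected into a small configuration, together with one common form of linear learning-rate schedule. This is a hedged sketch: the names `PhaseConfig` and `linear_lr` are hypothetical, the schedule is assumed to decay linearly from the base rate to zero (optionally after a linear warmup, which the text does not mention), and no schedule is assumed for phase 2 beyond its base learning rate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhaseConfig:
    """Hyperparameters for one training phase (names are hypothetical)."""
    lr: float
    total_steps: int
    batch_size: int
    num_negative_samples: int = 0  # reported only for phase 2

ADAMW_BETAS = (0.9, 0.999)
IMAGE_SIZE = (96, 96)
PHASE1 = PhaseConfig(lr=1e-4, total_steps=90_000, batch_size=16)
PHASE2 = PhaseConfig(lr=1e-5, total_steps=30_000, batch_size=16,
                     num_negative_samples=16)

def linear_lr(step, base_lr, total_steps, warmup_steps=0):
    """Assumed linear schedule: optional linear warmup, then linear decay to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)

for step in (0, 45_000, 90_000):
    print(step, linear_lr(step, PHASE1.lr, PHASE1.total_steps))
```

In PyTorch, these values would plug into `torch.optim.AdamW(params, lr=PHASE1.lr, betas=ADAMW_BETAS)`, with `torch.optim.lr_scheduler.LambdaLR` using `lambda step: linear_lr(step, 1.0, PHASE1.total_steps)` as its multiplicative factor.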
