UEC Int’l Mini-Conference No.54 15
Figure 5: Qualitative comparison on UFUC between our method and the previous state-of-the-art method, FontDiffuser. The red boxes highlight areas that are challenging for FontDiffuser.
titative evaluation results of ablation studies.
Specifically, Fig. 6 compares the visual quality of generated Bangla characters across several model configurations: Baseline, Baseline + Cross-Attention (CA), and Baseline + Cross-Attention with Discriminator (CA + D), alongside the ground truth (Target) and the reference font. The input character and the style are shown in the source and reference columns, respectively. In the Baseline column, the overall shape of the characters remains intact, but there are many distortions in the complicated strokes, and ligatures are misrepresented. This indicates that the baseline fails to capture detailed style information.

Figure 6: Qualitative evaluation results of ablation studies, illustrating the effect of several modules. CA and D denote Cross-Attention and Discriminator, respectively. Red boxes mark the missing strokes, while green boxes mark the corresponding improvements.

When we add the Cross-Attention (CA) module, the visual results improve significantly. The CA module allows better alignment between the reference style and the target character structure, leading to clearer glyph shapes and more accurate stroke positioning. For example, in the second row, the complex character shows better integration of the matra (horizontal stroke) and conjunct formation compared to the baseline. However, although CA improves style transfer, some inconsistencies remain in the finer details,