
UEC Int’l Mini-Conference No.54                                                               13







4.3 Comparison with State-of-the-Art Method

We compare our approach with the baseline method, FontDiffuser. It was set up with a similar collection of Bengali letters. Fig. 4 shows detailed visual examples of the FontDiffuser results, the BengaliDiff results, and the target glyphs. The comparison demonstrates that BengaliDiff has clear advantages in several important aspects of font generation. Although it preserves overall shape effectively, FontDiffuser often struggles with the smaller visual elements of the Bengali script. For example, FontDiffuser renders glyphs with malformed loops, interrupted strokes, and even misplaced matras within the characters. Its glyphs are also sometimes clipped or structurally degraded because FontDiffuser's generation is not carefully supervised, so the output is either blurry or inconsistent in stroke thickness, especially where the design is curved or runs diagonally. This issue is most noticeable in complicated conjuncts and ligature-heavy characters. BengaliDiff, in contrast, outputs glyphs that are structurally and stylistically close to the ground truth. The discriminator corrects the distortions that often appear in the outputs of FontDiffuser and guides the model to generate more precise and cleaner stroke edges. This yields visually cleaner characters, as can be seen in our versions of complex characters, which clearly retain minor decorations such as loops and hooks. In addition, thanks to the cross-attention approach, BengaliDiff can effectively match specific areas of content, that is, simple consonants and matras, with their corresponding stylistic components in the reference. This is most evident in glyphs where matras are smoothly integrated into the character structure with the right proportions of thickness, curvature, and alignment. By including multi-scale content features and style-aware attention, BengaliDiff accurately captures both local texture and global layout.

A key advantage of BengaliDiff is its consistency within a character set: size and stroke thickness are uniform, and the rhythm of the strokes is the same. This contrasts with FontDiffuser, whose original and generated strokes were not always the same but differed greatly in stroke thickness or in the interpretation of character shape, no matter how close the two forms were. This uniformity is very important for professional font design and typesetting in Bengali, since it preserves intelligibility and aesthetic harmony. Taken together, the visual results in Fig. 4 show that BengaliDiff not only overcomes the weaknesses of FontDiffuser but also produces characters that are aesthetically appealing, structurally consistent, and typographically correct. These improvements justify treating the discriminator and cross-attention as important elements of our model.

Table 1: Quantitative Comparison of FontDiffuser and Our Method

Model          FID↓     LPIPS↓   L1↓      RMSE↓    SSIM↑
FontDiffuser   0.8685   0.3223   0.2377   0.3151   0.7131
Ours           0.7063   0.3568   0.2547   0.3240   0.6751

To compare our method quantitatively with FontDiffuser, we used five commonly adopted image quality metrics: FID, LPIPS, L1, RMSE, and SSIM. FID (Fréchet Inception Distance) measures the difference in distribution between generated and real images, providing a strong indicator of overall visual realism. LPIPS evaluates perceptual similarity based on deep features, capturing differences that matter to human perception. L1 and RMSE compute pixel-level errors between the generated and ground truth images, assessing low-level fidelity. SSIM measures structural similarity, emphasizing the preservation of local textures and shapes.

Table 2: Quantitative evaluation results on UFUC with FontDiffuser methods.

Model          FID↓     LPIPS↓   L1↓      RMSE↓    SSIM↑
FontDiffuser   0.9573   0.3738   0.2419   0.3142   0.6855
Ours           0.7280   0.3406   0.2706   0.3357   0.6491

We conducted experiments on a set of 100 characters of varying nature, including simple and compound Bangla characters, as shown in Table 1. Our method achieved significantly better results in