
UEC Int’l Mini-Conference No.52                                                               31

            Table 3: Performance comparison (AUC, %) of different feature extractors in the architecture proposed in [2]

                                                32 segments                    48 segments

                  Model                AUC-BERT+MIL     AUC-MIL       AUC-BERT+MIL     AUC-MIL
                  C3D [9]                     75.93       75.91              75.83       75.82
                  I3D [5]                     77.15       74.35              76.25       68.99
                  TSM [7]                     77.36       70.60              78.10       71.07
                  UniFormer-S [6]             79.74       76.47              79.68       77.95
                  UniFormer-B [6]             80.52       77.94              79.45       77.88
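For each segment setting, the difference between the AUC-BERT+MIL and AUC-MIL columns can be read as the gain obtained by adding the BERT stage on top of a given feature extractor. As an illustration only, the short Python sketch below transcribes the values of Table 3 and recomputes these gains. The names TABLE3, bert_gain, best_model, and largest_gain are hypothetical helpers introduced here; they are not part of the implementation in [2].

```python
# Hypothetical helpers for reading Table 3; values are AUC (%) copied from the table.
# Each model maps a segment count to the pair (AUC-BERT+MIL, AUC-MIL).
TABLE3 = {
    "C3D":         {32: (75.93, 75.91), 48: (75.83, 75.82)},
    "I3D":         {32: (77.15, 74.35), 48: (76.25, 68.99)},
    "TSM":         {32: (77.36, 70.60), 48: (78.10, 71.07)},
    "UniFormer-S": {32: (79.74, 76.47), 48: (79.68, 77.95)},
    "UniFormer-B": {32: (80.52, 77.94), 48: (79.45, 77.88)},
}


def bert_gain(model: str, segments: int) -> float:
    """AUC gain (percentage points) of BERT+MIL over MIL alone, rounded to 2 decimals."""
    with_bert, mil_only = TABLE3[model][segments]
    return round(with_bert - mil_only, 2)


def best_model(segments: int, with_bert: bool = True) -> str:
    """Feature extractor with the highest AUC in the chosen column for a segment count."""
    column = 0 if with_bert else 1
    return max(TABLE3, key=lambda m: TABLE3[m][segments][column])


def largest_gain(segments: int) -> str:
    """Feature extractor whose AUC improves most when BERT is added."""
    return max(TABLE3, key=lambda m: bert_gain(m, segments))


if __name__ == "__main__":
    for segments in (32, 48):
        print(f"{segments} segments (best with BERT: {best_model(segments)})")
        for model in TABLE3:
            print(f"  {model:<12} +{bert_gain(model, segments):.2f}")
```

Run as a script, it prints one gain per extractor and segment setting; by this arithmetic the gains range from +0.01 points (C3D, 48 segments) to +7.26 points (I3D, 48 segments), while the highest absolute AUC comes from UniFormer-B at 32 segments and UniFormer-S at 48 segments.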



            [2] Tan W, Yao Q, Liu J. Overlooked video classification in weakly
                supervised video anomaly detection. In: Proceedings of the
                IEEE/CVF Winter Conference on Applications of Computer Vision.
                2024. p. 202-210.

            [3] Tian Y, Pang G, Chen Y, Singh R, Verjans JW, Carneiro G.
                Weakly-supervised video anomaly detection with robust temporal
                feature magnitude learning. In: Proceedings of the IEEE/CVF
                International Conference on Computer Vision (ICCV). 2021.
                p. 4975-4986.

            [4] Karim H, Doshi K, Yilmaz Y. Real-time weakly supervised video
                anomaly detection. In: Proceedings of the IEEE/CVF Winter
                Conference on Applications of Computer Vision. 2024.
                p. 6848-6856.

            [5] Carreira J, Zisserman A. Quo vadis, action recognition? A new
                model and the Kinetics dataset. In: Proceedings of the IEEE
                Conference on Computer Vision and Pattern Recognition. 2017.
                p. 6299-6308.

            [6] Li K, Wang Y, Gao P, Song G, Liu Y, Li H, Qiao Y. UniFormer:
                Unified transformer for efficient spatiotemporal representation
                learning. CoRR. 2022;abs/2201.04676. Available from:
                https://arxiv.org/abs/2201.04676.

            [7] Lin J, Gan C, Han S. TSM: Temporal shift module for efficient
                video understanding. In: Proceedings of the IEEE/CVF
                International Conference on Computer Vision. 2019.
                p. 7083-7093.

            [8] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of
                deep bidirectional transformers for language understanding.
                arXiv preprint arXiv:1810.04805. 2018.

            [9] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning
                spatiotemporal features with 3D convolutional networks. In:
                Proceedings of the IEEE International Conference on Computer
                Vision. 2015. p. 4489-4497.

            [10] Öztürk HI, Can AB. ADNet: Temporal anomaly detection in
                 surveillance videos. In: Pattern Recognition. ICPR
                 International Workshops and Challenges: Virtual Event,
                 January 10–15, 2021, Proceedings, Part IV. Springer; 2021.
                 p. 88-101.

            [11] Lin J, Gan C, Han S. TSM: Temporal shift module for efficient
                 video understanding. In: Proceedings of the IEEE/CVF
                 International Conference on Computer Vision. 2019.
                 p. 7083-7093.

            [12] NVIDIA. Jetson Orin Nano Series [Online]. Available from:
                 https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/
                 [May 2024].

            [13] Carreira J, Noland E, Banki-Horvath A, Hillier C, Zisserman A.
                 A short note about Kinetics-600. arXiv preprint
                 arXiv:1808.01340. 2018.