UEC Int’l Mini-Conference No.52 31
Table 3: Performance Comparison of Different Feature Extractors in the Proposed Architecture of [2] (AUC, %)

                   32 segments                   48 segments
Model              AUC-BERT+MIL   AUC-MIL        AUC-BERT+MIL   AUC-MIL
C3D [9]            75.93          75.91          75.83          75.82
I3D [5]            77.15          74.35          76.25          68.99
TSM [7]            77.36          70.60          78.10          71.07
UniFormer-S [6]    79.74          76.47          79.68          77.95
UniFormer-B [6]    80.52          77.94          79.45          77.88
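The difference between the two AUC columns of Table 3 can be read as the change in AUC when the BERT stage is added on top of MIL, assuming the two columns differ only in that component. A minimal sketch that tabulates this gap from the Table 3 values follows. The dictionary `TABLE3` and the helpers `bert_gain` and `best_model` are hypothetical illustrations, not code from [2].

```python
# AUC values (%) copied from Table 3.
# Hypothetical layout: backbone -> {segments: (AUC-BERT+MIL, AUC-MIL)}.
TABLE3 = {
    "C3D":         {32: (75.93, 75.91), 48: (75.83, 75.82)},
    "I3D":         {32: (77.15, 74.35), 48: (76.25, 68.99)},
    "TSM":         {32: (77.36, 70.60), 48: (78.10, 71.07)},
    "UniFormer-S": {32: (79.74, 76.47), 48: (79.68, 77.95)},
    "UniFormer-B": {32: (80.52, 77.94), 48: (79.45, 77.88)},
}


def bert_gain(model: str, segments: int) -> float:
    """AUC-BERT+MIL minus AUC-MIL in percentage points, rounded to 2 decimals."""
    with_bert, mil_only = TABLE3[model][segments]
    return round(with_bert - mil_only, 2)


def best_model(segments: int) -> str:
    """Backbone with the highest AUC-BERT+MIL for the given segment count."""
    return max(TABLE3, key=lambda m: TABLE3[m][segments][0])


if __name__ == "__main__":
    for segments in (32, 48):
        for model in TABLE3:
            print(f"{segments} seg  {model:<12} gain = {bert_gain(model, segments):+.2f}")
        print(f"{segments} seg  best AUC-BERT+MIL: {best_model(segments)}")
```

Read this way, the gap is largest for the weaker MIL-only backbones (I3D and TSM). For C3D it is nearly zero at both segment counts.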
[2] Tan W, Yao Q, Liu J. Overlooked video classification in weakly supervised video anomaly detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024. p. 202-210.
[3] Tian Y, Pang G, Chen Y, Singh R, Verjans JW, Carneiro G. Weakly-supervised video anomaly detection with robust temporal feature magnitude learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021. p. 4975-4986.
[4] Karim H, Doshi K, Yilmaz Y. Real-time weakly supervised video anomaly detection. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024. p. 6848-6856.
[5] Carreira J, Zisserman A. Quo vadis, action recognition? A new model and the Kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 6299-6308.
[6] Li K, Wang Y, Gao P, Song G, Liu Y, Li H, Qiao Y. UniFormer: Unified transformer for efficient spatiotemporal representation learning. CoRR. 2022;abs/2201.04676. Available from: https://arxiv.org/abs/2201.04676.
[7] Lin J, Gan C, Han S. TSM: Temporal shift module for efficient video understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. p. 7083-7093.
[8] Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 2018.
[9] Tran D, Bourdev L, Fergus R, Torresani L, Paluri M. Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision. 2015. p. 4489-4497.
[10] Öztürk HI, Can AB. ADNet: Temporal anomaly detection in surveillance videos. In: Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part IV. Springer; 2021. p. 88-101.
[11] Lin J, Gan C, Han S. TSM: Temporal shift module for efficient video understanding. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019. p. 7083-7093.
[12] NVIDIA. Jetson Orin Nano Series. [Online]. Available at: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/. [May 2024].
[13] Carreira J, Noland E, Banki-Horvath A, Hillier C, Zisserman A. A short note about Kinetics-600. arXiv preprint arXiv:1808.01340. 2018.