Page 37 - 2024S

30 UEC Int’l Mini-Conference No.52

Table 2: Overview of Edge Devices

                  Jetson Orin NX 16GB   Jetson Orin NX 8GB   Jetson Orin Nano 8GB
AI Performance    100 TOPS              70 TOPS              40 TOPS
GPU               1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores (all three devices)
CPU               8-core Arm Cortex     6-core Arm Cortex    6-core Arm Cortex
Power             25 W                  20 W                 15 W
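
The real-time discussion that follows reduces to simple throughput arithmetic, which can be reproduced in a few lines of Python. This is a minimal sketch, not the authors' benchmarking code. It computes peak TOPS per watt from Table 2 and the headroom of each reported clips-per-second figure over the real-time requirement. The 16-frame clip length (FRAMES_PER_CLIP) is an assumption not stated in the text. It was chosen because 30 fps divided by 16 frames gives 1.875 clips per second, matching the quoted 1.87. The helper function names are likewise hypothetical.

```python
# Sketch: sanity-check the real-time throughput claims of this section.
# ASSUMPTION: each clip spans 16 frames (typical for I3D/C3D-style extractors);
# the text only states "30 fps (1.87 clips per second)".

FPS = 30
FRAMES_PER_CLIP = 16  # assumed, not stated in the text

# Table 2: peak AI performance (TOPS) and listed power (W) per device.
DEVICES = {
    "Jetson Orin NX 16GB": (100, 25),
    "Jetson Orin NX 8GB": (70, 20),
    "Jetson Orin Nano 8GB": (40, 15),
}

# Throughputs reported in the text, in clips per second.
REPORTED = {
    ("I3D", "RTX 3090"): 101.23,
    ("UniFormer-S", "RTX 3090"): 65.31,
    ("I3D", "Jetson Orin NX 16GB"): 16.92,
    ("I3D", "Jetson Orin Nano"): 11.34,
}


def required_clips_per_second(fps: float, frames_per_clip: int) -> float:
    """Clips that must be processed per second to keep up with a live stream."""
    return fps / frames_per_clip


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak AI performance per watt at the listed power setting."""
    return tops / watts


def headroom(clips_per_second: float, required: float) -> float:
    """How many times faster than real time a model runs (>= 1 means real time)."""
    return clips_per_second / required


if __name__ == "__main__":
    need = required_clips_per_second(FPS, FRAMES_PER_CLIP)
    print(f"Required throughput: {need:.3f} clips/s")
    for name, (tops, watts) in DEVICES.items():
        print(f"{name}: {tops_per_watt(tops, watts):.2f} TOPS/W")
    for (model, device), cps in REPORTED.items():
        h = headroom(cps, need)
        status = "real time" if h >= 1 else "too slow"
        print(f"{model} on {device}: {cps:.2f} clips/s, {h:.1f}x ({status})")
```

With these numbers every reported configuration exceeds the requirement; I3D on the Jetson Orin Nano, the slowest listed case, runs at roughly six times the required rate. A headroom above 1 only means a single stream can be kept up with at this clip length; preprocessing, the downstream VAD classifier, and multiple camera streams are not accounted for.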

On the RTX 3090, I3D processed 101.23 clips per second with 27.877 GFLOPs, outperforming UniFormer-B, C3D, and TSM. UniFormer-S, the closest to I3D, processed 65.31 clips per second with 28.707 GFLOPs.

On the edge devices, I3D processed 16.92 clips per second on the Jetson Orin NX 16GB and 11.34 clips per second on the Jetson Orin Nano, demonstrating its viability for real-time deployment. Although I3D is the most efficient, UniFormer-S is also viable: it processes clips efficiently on all devices with a computational complexity similar to that of I3D. Since conventional surveillance video runs at 30 fps (1.87 clips per second), all models except UniFormer-B are fast enough for real-time operation. UniFormer-S and I3D have a further advantage on mobile devices because of their lower computational requirements.

I3D and UniFormer-S have the most potential for real-world integration due to their efficiency; however, efficiency alone does not account for feature quality or for performance once the extractor is integrated into the VAD model.

Classification Efficiency of Feature Extractors: We aimed to select the best feature extractor by balancing processing speed, computational complexity, and feature quality. Each feature extractor was integrated into the architecture proposed by [2], using the training and inference procedure described in Section 3. We tested both 32 and 48 fixed segments per video to evaluate performance under different temporal conditions (Table 3).

UniFormer-B provided the richest features, reaching a frame-level AUC of 80.52% with 32 segments using BERT+MIL, but its slow processing speed made it impractical for real-world environments.

I3D processed clips quickly with minimal resources, but its feature quality was suboptimal: its AUC dropped from 77.15% to 76.25% when the number of segments increased from 32 to 48, indicating sensitivity to segment length. I3D is better suited to environments that do not require immediate action.

UniFormer-S achieved a high AUC (79.74%) with 32 segments in BERT+MIL mode and was robust to temporal changes, maintaining consistent performance. This makes it the best option for real-world integration and deployment on mobile devices.

5 Conclusions

This study evaluated VAD feature extractors in terms of accuracy and computational performance on the UCF-Crime dataset. Tests on an RTX 3090 GPU and on edge devices (Jetson Orin Nano and Jetson Orin NX) showed that I3D was the fastest but was less robust to temporal variation. UniFormer-B achieved the highest AUC (80.52%) with BERT+MIL using 32 segments but, because of its high computational complexity, is suitable only for controlled environments. UniFormer-S offered balanced performance, with an AUC of 79.74%, and is well suited to mobile integration on the Jetson Orin NX 8GB at 20 W. Future work will optimize a VAD model for real-time anomaly detection using UniFormer-S on the Jetson Orin NX 8GB, with the aim of improving robustness and accuracy.

References

[1] Sultani W, Chen C, Shah M. Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 6479-6488.
