Page 37 - 2024S

30                                                                UEC Int’l Mini-Conference No.52







                              Table 2: Overview of Edge Devices

                    Jetson Orin NX 16GB      Jetson Orin NX 8GB       Jetson Orin Nano 8GB

  AI Performance    100 TOPS                 70 TOPS                  40 TOPS
  GPU               1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores (all three devices)
  CPU               8-core Arm Cortex        6-core Arm Cortex        6-core Arm Cortex
  Power             25 W                     20 W                     15 W
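
As a reading aid for Table 2, the short sketch below divides each device's peak AI performance by its listed power budget. The values are copied from the table; the TOPS-per-watt ratio itself is a derived, illustrative figure that the study does not report, and the names `DEVICES` and `tops_per_watt` are hypothetical.

```python
# Peak ratings copied from Table 2. TOPS per watt is a derived figure used
# only for illustration; it is not a measurement from the study.
DEVICES = {
    "Jetson Orin NX 16GB": {"tops": 100, "watts": 25},
    "Jetson Orin NX 8GB": {"tops": 70, "watts": 20},
    "Jetson Orin Nano 8GB": {"tops": 40, "watts": 15},
}


def tops_per_watt(spec):
    """Return peak TOPS divided by the listed power budget in watts."""
    return spec["tops"] / spec["watts"]


if __name__ == "__main__":
    for name, spec in DEVICES.items():
        print(f"{name}: {tops_per_watt(spec):.2f} TOPS/W")
```

Because both numbers are peak ratings, the ratio only ranks the nominal efficiency of the boards; it says nothing about the throughput a particular feature extractor actually reaches on them.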



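The real-time argument in this section compares measured throughput (clips per second) with the rate at which a 30 fps camera produces clips. The sketch below makes that comparison explicit for the I3D figures reported here. It assumes a clip length of 16 frames, which is an inference from the stated 1.87 clips per second (30 / 16 = 1.875) and not a value given on this page; `FRAMES_PER_CLIP`, `required_clips_per_second`, and `realtime_margin` are hypothetical names.

```python
# Assumption: one clip = 16 frames, inferred from "30 fps (1.87 clips per second)".
FRAMES_PER_CLIP = 16
CAMERA_FPS = 30

# I3D throughput in clips per second, as reported in this section.
I3D_THROUGHPUT = {
    "RTX 3090": 101.23,
    "Jetson Orin NX 16GB": 16.92,
    "Jetson Orin Nano": 11.34,
}


def required_clips_per_second(fps=CAMERA_FPS, frames_per_clip=FRAMES_PER_CLIP):
    """Clips per second that a single camera stream produces."""
    return fps / frames_per_clip


def realtime_margin(measured_clips_per_second, fps=CAMERA_FPS,
                    frames_per_clip=FRAMES_PER_CLIP):
    """Ratio of measured throughput to the required rate; above 1 keeps up."""
    return measured_clips_per_second / required_clips_per_second(fps, frames_per_clip)


if __name__ == "__main__":
    print(f"required: {required_clips_per_second():.3f} clips/s")
    for device, rate in I3D_THROUGHPUT.items():
        print(f"{device}: {realtime_margin(rate):.2f}x real time")
```

A margin above 1 means the extractor alone keeps pace with one stream; the remaining headroom still has to cover decoding, the VAD classifier, and any additional streams.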
On the RTX 3090, I3D processed 101.23 clips per second with 27.877 GFLOPs, outperforming UniFormer-B, C3D, and TSM. UniFormer-S, the closest to I3D, processed 65.31 clips per second with 28.707 GFLOPs.

On the edge devices, I3D processed 16.92 clips per second on the Jetson Orin NX 16GB and 11.34 clips per second on the Jetson Orin Nano, showing its viability for real-time implementations.

Although I3D is the most efficient, UniFormer-S is also viable: it has similar computational complexity and processes clips efficiently on all devices. Given that conventional surveillance video is captured at 30 fps (about 1.87 clips per second), all models except UniFormer-B are fast enough for real-time work. UniFormer-S and I3D have an additional advantage on mobile devices because of their lower computational requirements.

I3D and UniFormer-S therefore have the most potential for real-world integration in terms of efficiency, but this ranking does not account for feature quality or for performance once the extractor is integrated into the VAD model.

Classification Efficiency of Feature Extractors: We aimed to select the best feature extractor by balancing processing speed, computational complexity, and feature quality. The feature extractors were integrated into the architecture proposed by [2], using the training and inference process from Section 3. We tested with both 32 and 48 fixed segments per video to evaluate performance under different temporal conditions (Table 3).

UniFormer-B provided the richest features, with a frame-level AUC of 80.52% with 32 segments using BERT+MIL, but was impractical for real-world environments due to its slow processing speed.

I3D processed clips quickly with minimal resources, but its feature quality was suboptimal: AUC decreased from 77.15% to 76.25% when the number of segments increased from 32 to 48, indicating sensitivity to segment length. I3D is therefore more suitable for environments that do not require immediate action.

UniFormer-S achieved a high AUC (79.74%) with 32 segments in BERT+MIL mode and was robust to temporal changes, maintaining consistent performance. This makes it the best option for real-world integration and mobile device implementation.

5 Conclusions

This study evaluated VAD feature extractors for accuracy and computational performance using the UCF-Crime dataset. Tests on an RTX 3090 GPU and on edge devices (Jetson Orin Nano and NX) showed that I3D was the fastest but handled temporal variation poorly. UniFormer-B achieved the highest AUC (80.52%) with BERT+MIL using 32 segments but is suitable only for controlled environments due to its high computational complexity. UniFormer-S offered balanced performance, with an AUC of 79.74%, and is well suited to mobile integration on the Jetson Orin NX 8GB at 20 W. Future work will optimize a VAD model for real-time anomaly detection using UniFormer-S and the Jetson Orin NX 8GB, enhancing robustness and accuracy.

References

[1] Sultani W, Chen C, Shah M. Real-world anomaly detection in surveillance videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018. p. 6479-6488.