Page 73 - 2024S

UEC Int’l Mini-Conference No.52

                Skeleton-Based Action Classification in Baseball


                           Sergio HUESCA 1,2,*, Gibran BENITEZ 1, Hiroki TAKAHASHI 1, and Mariko NAKANO 2
                         1  Department of Informatics, The University of Electro-Communications, Tokyo, Japan
                                       2  Instituto Politécnico Nacional, Mexico City, Mexico
                                                  * u2495004@gl.cc.uec.ac.jp
























                        Fig. 1. General pipeline of a skeleton-based baseball action classification model using deep learning.
                               Introduction

              ❑ Multi-label classification
                In baseball, a single pitch can have multiple simultaneous outcomes
                (curveball, strike, swing, hit, foul).

              ❑ Human Action Recognition and Classification
                Identifying actions performed by subjects in videos.
                • Classical computer vision approaches: SIFT, HOG.
                • Deep learning approaches: optical flow, CNNs, RNNs, LSTMs.

              ❑ Skeleton-based Action Recognition and Classification
                Human joint data of subjects in a video is used to recognize and
                classify actions.
                ➢ Spatial-Temporal Graph Convolutional Network (ST-GCN) [1]: GCN + TCN.

              ❑ Human Pose Estimation (HPE)
                Detecting and estimating human joints (keypoints) in images, in 2D or 3D.
                ➢ Multi-person pose estimation using OpenPose (Cao et al., 2019).

                               Methodology

              ❑ Objective: Develop a skeleton-based deep learning model to classify
                actions within a baseball game from different, unseen points of view.

              ❑ Metrics and Evaluation. Metrics used for comparison include:
                • Accuracy: Proportion of correct predictions among all predictions.
                • Precision: Proportion of true positives among predicted positives.
                • Recall: Proportion of true positives among actual positives.
                • F1-score: Harmonic mean of precision and recall.
                • Computational efficiency: Training time, inference speed, and
                  resource requirements.

              ❑ Dataset: Training and testing are conducted on the MLB-YouTube
                dataset [2].

              ❑ Experimental Setup
                • Data preprocessing: Skeleton data will be processed to prepare it for
                  training and testing.
                • Model training: The ST-GCN model is trained from scratch on the
                  dataset.
                • Evaluation: Performance is evaluated using the defined metrics to
                  assess the model's effectiveness.
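
The metrics listed under Methodology can be illustrated for the multi-label case with a minimal sketch. The poster does not say how the metrics are averaged across labels, so the choices here are assumptions: precision, recall, and F1 are micro-averaged over all label decisions, and accuracy is taken as exact-match accuracy (a sample counts as correct only if every label matches). The label names come from the poster's pitch example; the function name `multilabel_metrics` and the toy data are hypothetical.

```python
from typing import Dict, List

# Illustrative label order, taken from the poster's pitch example.
LABELS = ["curveball", "strike", "swing", "hit", "foul"]


def multilabel_metrics(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 plus exact-match accuracy (assumed averaging)."""
    tp = fp = fn = exact = 0
    for truth, pred in zip(y_true, y_pred):
        # Exact-match accuracy: the whole label vector must agree.
        exact += int(truth == pred)
        for t, p in zip(truth, pred):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": exact / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example: two pitches, one binary indicator per label in LABELS.
y_true = [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
y_pred = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 1]]
metrics = multilabel_metrics(y_true, y_pred)
print(metrics)
```

Macro-averaging (computing each metric per label and then averaging) would weight rare outcomes more heavily and may be preferable if the label distribution is strongly imbalanced.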
                                                    Current Progress











               Fig. 2. Pose estimation with a PyTorch implementation of OpenPose: a) estimation from the unmodified implementation; b) modified
                   implementation, less noisy but still detecting bodies in the background; c) modified search scale, with no bodies detected in the background.
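
As a possible next step after pose estimation, the per-frame keypoints would need to be arranged into an array that a skeleton-based model can consume. The sketch below is an illustration only and is not taken from the poster: it assumes 18 body joints per person, each given as (x, y, confidence) in pixel coordinates, centers the coordinates around zero, and rearranges a clip into a channel-first layout (channels, frames, joints). The helper names `normalize_frame` and `to_channel_first`, the joint count, and the image size are all hypothetical.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence), assumed format
NUM_JOINTS = 18  # assumed joint count per person


def normalize_frame(keypoints: List[Keypoint], width: int, height: int) -> List[Keypoint]:
    """Scale pixel coordinates to roughly [-0.5, 0.5]; zero out undetected joints."""
    out = []
    for x, y, c in keypoints:
        if c <= 0.0:
            # Undetected joint: keep a neutral placeholder.
            out.append((0.0, 0.0, 0.0))
        else:
            out.append((x / width - 0.5, y / height - 0.5, c))
    return out


def to_channel_first(frames: List[List[Keypoint]]) -> List[List[List[float]]]:
    """Rearrange frames x joints x channels into channels x frames x joints."""
    n_channels = 3  # x, y, confidence
    return [
        [[joint[ch] for joint in frame] for frame in frames]
        for ch in range(n_channels)
    ]


# Toy clip: frame 0 has every joint at the image center, frame 1 has no detections.
raw_frames = [
    [(320.0, 240.0, 0.9)] * NUM_JOINTS,
    [(0.0, 0.0, 0.0)] * NUM_JOINTS,
]
clip = to_channel_first([normalize_frame(f, width=640, height=480) for f in raw_frames])
print(len(clip), len(clip[0]), len(clip[0][0]))
```

Keeping the confidence as a third channel lets the model down-weight unreliable joints; whether that helps on baseball footage is something the experiments would have to show.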
              References
              [1] S. Yan, Y. Xiong, and D. Lin, “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition,” Jan. 25, 2018, arXiv:1801.07455. doi: 10.48550/arXiv.1801.07455.
              [2] A. Piergiovanni and M. S. Ryoo, “Fine-Grained Activity Recognition in Baseball Videos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA: IEEE, Jun. 2018, pp. 1821–18218. doi: 10.1109/CVPRW.2018.00226.