Page 73 - 2024S
66 UEC Int’l Mini-Conference No.52
Skeleton-Based Action Classification in Baseball
Sergio HUESCA 1,2,*, Gibran BENITEZ 1, Hiroki TAKAHASHI 1, and Mariko NAKANO 2
1 Department of Informatics, The University of Electro-Communications, Tokyo, Japan
2 Instituto Politécnico Nacional, Mexico City, Mexico
* u2495004@gl.cc.uec.ac.jp
Fig. 1. General pipeline of a skeleton-based baseball action classification model using deep learning.
Introduction
❑ Multi-label classification
In baseball, a single pitch can have multiple simultaneous outcomes (curveball, strike, swing, hit, foul).
❑ Human Action Recognition and Classification
Identifying actions performed by subjects in videos.
• Classical Computer Vision approaches: SIFT, HOG.
• Deep Learning approaches: Optical Flows, CNNs, RNNs, LSTM.
❑ Skeleton-based Action Recognition and Classification
Human joint data of subjects in a video is used to recognize and classify actions.
➢ Spatial-Temporal Graph Convolutional Network (ST-GCN) [1]: GCN + TCN.
❑ Human Pose Estimation (HPE)
Detecting and estimating human joints (keypoints) in images, in 2D or 3D.
➢ Multi-Person Pose Estimation using OpenPose (Cao et al., 2019).
Methodology
❑ Objective: Develop a skeleton-based deep learning model to classify actions within a baseball game from different, previously unseen points of view.
❑ Metrics and Evaluation. Metrics used for comparison include:
• Accuracy: Proportion of correct predictions.
• Precision: Proportion of true positives among predicted positives.
• Recall: Proportion of true positives among actual positives.
• F1-score: Harmonic mean of precision and recall.
• Computational Efficiency: Training time, inference speed, and resource requirements.
❑ Dataset: Training and testing are conducted on the MLB-YouTube dataset [2].
❑ Experimental Setup
• Data Preprocessing: Skeleton data will be processed to prepare for training and testing.
• Model Training: Train the ST-GCN model from scratch with the dataset.
• Evaluation: Performance is evaluated using the defined metrics to assess the model's effectiveness.
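Because a pitch can carry several labels at once, precision, recall, and F1-score are commonly computed per label. The following is a minimal sketch of that computation in plain Python, not the evaluation code of this work. The function names, the binary-vector encoding of labels, and the use of exact-match (subset) accuracy are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def per_label_metrics(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    labels: Sequence[str],
) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each label, treated independently.

    Each row of y_true / y_pred is a binary vector with one entry per label
    (assumed encoding: 1 = label present, 0 = label absent).
    """
    results: Dict[str, Dict[str, float]] = {}
    for j, name in enumerate(labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[name] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def subset_accuracy(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> float:
    """Fraction of samples whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


if __name__ == "__main__":
    # Toy values made up for illustration; they are not results from the poster.
    labels: List[str] = ["strike", "swing", "hit", "foul"]
    y_true = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1]]
    y_pred = [[1, 1, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
    for name, scores in per_label_metrics(y_true, y_pred, labels).items():
        print(name, scores)
    print("subset accuracy:", subset_accuracy(y_true, y_pred))
```

Subset accuracy is strict, since one wrong label makes the whole sample count as incorrect. Per-label scores are therefore often reported alongside it.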
Current Progress
Fig. 2. Pose estimation produced with a PyTorch implementation of OpenPose: a) estimation from the unmodified OpenPose implementation; b) modified implementation, less noisy but still detecting bodies in the background; c) modified search scale, with no bodies detected in the background.
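Background bodies such as those in Fig. 2 could also be discarded after pose estimation, for example by dropping skeletons that are small relative to the frame. The sketch below illustrates that post-processing idea only. It is not the search-scale modification shown in the figure. The keypoint representation, the function names, and the `min_height_ratio` threshold are all hypothetical, and they do not reflect the output format of OpenPose.

```python
from typing import List, Optional, Tuple

# Assumed representation: one (x, y) pixel coordinate per joint,
# or None when the joint was not detected.
Keypoint = Optional[Tuple[float, float]]
Skeleton = List[Keypoint]


def skeleton_height(skeleton: Skeleton) -> float:
    """Vertical extent in pixels spanned by the detected joints."""
    ys = [kp[1] for kp in skeleton if kp is not None]
    return max(ys) - min(ys) if ys else 0.0


def keep_foreground(
    skeletons: List[Skeleton],
    image_height: int,
    min_height_ratio: float = 0.25,  # hypothetical threshold, to be tuned per camera view
) -> List[Skeleton]:
    """Keep skeletons at least min_height_ratio * image_height tall."""
    threshold = min_height_ratio * image_height
    return [s for s in skeletons if skeleton_height(s) >= threshold]


if __name__ == "__main__":
    # Made-up coordinates for a 720-pixel-high frame.
    pitcher: Skeleton = [(640.0, 200.0), (640.0, 400.0), None, (650.0, 600.0)]
    spectator: Skeleton = [(100.0, 50.0), (102.0, 80.0), (101.0, 110.0)]
    kept = keep_foreground([pitcher, spectator], image_height=720)
    print(len(kept), "skeleton(s) kept")
```

A fixed height ratio is a crude heuristic. It would likely need adjusting for wide shots where the players themselves appear small.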
References
[1] S. Yan, Y. Xiong, and D. Lin, “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition,” Jan. 25, 2018, arXiv:1801.07455. doi: 10.48550/arXiv.1801.07455.
[2] A. Piergiovanni and M. S. Ryoo, “Fine-Grained Activity Recognition in Baseball Videos,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA: IEEE, Jun. 2018, pp. 1821–18218. doi: 10.1109/CVPRW.2018.00226.