Fig. 1: Proposed Method Overview.
III. RESULTS

A. Dataset

In this paper we employ the MLB-YouTube dataset [7], which is designed for fine-grained activity recognition in baseball and focuses on classifying play outcomes while also providing annotations for six pitch types. The dataset contains pitch clips extracted from 20 different TV-broadcast baseball games from the 2017 MLB postseason available on YouTube. Each clip averages 6 seconds in length, recorded at 60 fps with a resolution of 1280×720. The dataset is split into 4,547 training clips and 1,150 test clips, with significant class imbalance, e.g., 2,582 and 177 samples for fastball and sinker, respectively.
B. Implementation Details

We used an Intel Core i7-12700 desktop with a single NVIDIA GTX 1660 Super GPU to train the ST-GCN model for 200 epochs with a batch size of 8. Optimization was performed using the Adam optimizer [12] with a learning rate of 0.0001 and the cross-entropy loss function.
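A minimal sketch of this training configuration is given below, assuming a PyTorch setup; the STGCN class and train_dataset are placeholders for the actual ST-GCN implementation and the skeleton-sequence dataset, and are not part of the original implementation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholders: any ST-GCN implementation returning logits over the six
# pitch types, and a dataset yielding (skeleton_sequence, label) pairs.
model = STGCN(in_channels=3, num_classes=6).to(device)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)   # batch size 8

criterion = nn.CrossEntropyLoss()                                      # cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)              # Adam, lr = 0.0001

for epoch in range(200):                                               # 200 epochs
    model.train()
    for skeletons, labels in train_loader:
        skeletons, labels = skeletons.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(skeletons), labels)
        loss.backward()
        optimizer.step()
```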
All video frames were downsampled to 30 fps and resized to 340×256 to reduce computational cost, with this resolution used consistently for both training and testing. For joint keypoint extraction, we used OpenPose with weights pretrained on the COCO dataset [13].
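A minimal sketch of this preprocessing step is shown below, assuming OpenCV for decoding; the resulting 30 fps, 340×256 frames are then passed to the pose extractor. The function name and the strategy of keeping every second frame are illustrative assumptions, not details from the original implementation.

```python
import cv2

def preprocess_clip(path, size=(340, 256), frame_step=2):
    """Decode a 60 fps clip, keep every `frame_step`-th frame (60 -> 30 fps),
    and resize each kept frame to 340x256."""
    cap = cv2.VideoCapture(path)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frame_step == 0:                 # drop every other frame: 60 fps -> 30 fps
            frames.append(cv2.resize(frame, size))  # dsize = (width, height) = (340, 256)
        index += 1
    cap.release()
    return frames
```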
C. Preliminary Experiments

To evaluate the behavior of the proposed approach, we conducted preliminary experiments by systematically varying key components of our method, all using the ST-GCN architecture. As a baseline, we considered the simplest approach for pitch classification: applying OpenPose to detect the four most frequently identified poses in each frame, selecting the ones with the highest confidence scores, without any isolation. This baseline was chosen as it provides a straightforward method to extract pose information without focusing specifically on the pitcher. Our proposed method, which isolates only the pitcher's pose, significantly improves classification, achieving an overall accuracy of 68.2%, compared to 60.2% with the baseline, as shown in Table I. By concentrating on a single, consistent subject, our isolation approach reduces interference from other individuals in the frame.
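One possible reading of the baseline's per-frame selection step is sketched below: keep the four detected poses with the highest confidence scores. The (num_people, 18, 3) keypoint layout, the mean-confidence ranking, and the zero-padding are assumptions for illustration, not details taken from the original implementation.

```python
import numpy as np

def top_k_poses(pose_keypoints, k=4):
    """Keep the k detected poses with the highest mean joint confidence.

    pose_keypoints: assumed array of shape (num_people, 18, 3), where the
    last channel stores (x, y, confidence) for each of the 18 COCO joints.
    """
    if pose_keypoints is None or len(pose_keypoints) == 0:
        return np.zeros((k, 18, 3), dtype=np.float32)      # no detections in this frame
    scores = pose_keypoints[..., 2].mean(axis=1)           # mean confidence per person
    order = np.argsort(scores)[::-1][:k]                   # indices of the k most confident poses
    selected = pose_keypoints[order]
    if len(selected) < k:                                  # pad when fewer than k people detected
        pad = np.zeros((k - len(selected), 18, 3), dtype=np.float32)
        selected = np.concatenate([selected, pad], axis=0)
    return selected
```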
We further explored the effect of using cropped video clips that focus solely on the pitcher by applying OpenPose directly to the cropped frames. This modification yielded an accuracy of 63.8%, suggesting that while cropping improves focus, directly isolating the pitcher's pose through our method is more effective. Lastly, to explore the impact of different graph configurations, we also experimented with Google's MediaPipe [14] on the cropped frames, which extracts 33 keypoints compared to the 18 provided by OpenPose. This resulted in a slight decrease in accuracy to 63.1%, indicating that the increased number of keypoints and connections did not necessarily lead to better performance for this specific task. Table I summarizes these results, demonstrating that our isolated pitcher's pose strategy with OpenPose outperformed both the baseline, with higher precision (68.0% vs. 59.0%) and F1-score (66.0% vs. 55.0%), and the cropped-frame approaches.
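A minimal sketch of the 33-keypoint extraction on the pitcher-cropped frames is given below, using MediaPipe's Pose solution; the function name, the BGR input format, and the (x, y, visibility) output layout are assumptions for illustration.

```python
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def extract_mediapipe_keypoints(cropped_frames):
    """Run MediaPipe Pose on pitcher-cropped BGR frames and return, per frame,
    the 33 (x, y, visibility) landmarks (or None if no pose is detected)."""
    keypoints = []
    with mp_pose.Pose(static_image_mode=False, model_complexity=1) as pose:
        for frame in cropped_frames:
            result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB
            if result.pose_landmarks is None:
                keypoints.append(None)
                continue
            keypoints.append([(lm.x, lm.y, lm.visibility)
                              for lm in result.pose_landmarks.landmark])   # 33 landmarks
    return keypoints
```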