
32                                                                UEC Int’l Mini-Conference No.53

                                           Fig. 1: Proposed Method Overview.



III. RESULTS

A. Dataset

In this paper we employ the MLB-YouTube dataset [7]. It was designed for fine-grained activity recognition in baseball, and it focuses on classifying play outcomes while also providing annotations for six pitch types. The dataset contains pitch clips extracted from 20 televised baseball games of the 2017 MLB postseason that are available on YouTube. Each clip averages 6 seconds in length and is recorded at 60 fps with a resolution of 1280×720. The dataset is split into 4,547 training clips and 1,150 test clips. The classes are highly imbalanced: for example, fastball and sinker have 2,582 and 177 samples, respectively.

B. Implementation Details

We trained the ST-GCN model for 200 epochs with a batch size of 8 on an Intel Core i7-12700 desktop with a single NVIDIA GTX 1660 Super GPU. Optimization used the Adam optimizer [12] with a learning rate of 0.0001 and the cross-entropy loss. To reduce computational cost, all video frames were downsampled to 30 fps and resized to 340×256, and this resolution was used for both training and testing. For joint keypoint extraction, we used OpenPose with weights pretrained on the COCO dataset [13].

C. Preliminary Experiments

To evaluate the behavior of the proposed approach, we conducted preliminary experiments that systematically vary key components of our method, all using the ST-GCN architecture. As a baseline, we considered the simplest approach to pitch classification. This approach applies OpenPose to each frame and keeps the four detected poses with the highest confidence scores, without isolating any individual. We chose this baseline because it extracts pose information in a straightforward way, without focusing specifically on the pitcher. Our proposed method, which isolates only the pitcher’s pose, substantially improves classification. As shown in Table I, it achieves an overall accuracy of 68.2%, compared to 60.2% for the baseline (a gain of 8.0 percentage points). By concentrating on a single, consistent subject, our isolation approach reduces interference from other individuals in the frame.

We further explore the effect of using cropped video clips that focus solely on the pitcher, applying OpenPose directly to the cropped frames. This modification yields an accuracy of 63.8%. This suggests that cropping improves focus, but directly isolating the pitcher’s pose through our method is more effective. Lastly, to explore the impact of different graph configurations, we also experimented with Google’s MediaPipe [14] on the cropped frames. MediaPipe extracts 33 keypoints, compared to the 18 provided by OpenPose. This resulted in a slight decrease in accuracy to 63.1%, indicating that more keypoints and connections did not necessarily lead to better performance for this task. Table I summarizes these results. Our isolated-pitcher-pose strategy with OpenPose outperformed the baseline, with higher precision (68.0% vs. 59.0%) and F1-score (66.0% vs. 55.0%), and it also outperformed the cropped-frame approach.
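The baseline in Section III-C keeps, for every frame, the four OpenPose detections with the highest confidence. The sketch below shows one way that per-frame selection could be packed into a skeleton sequence, together with the 60 fps to 30 fps downsampling from Section III-B. It rests on several assumptions that are not taken from the paper:

- each detection is a list of 18 (x, y, confidence) triples in OpenPose COCO-18 order;
- a pose's score is its mean joint confidence;
- the output uses the (C, T, V, M) layout common in ST-GCN code, zero-padded when fewer than four people are detected.

The names `select_top_poses`, `pose_score`, and `downsample` are hypothetical helpers, not OpenPose or ST-GCN APIs.

```python
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)
Pose = List[Keypoint]                  # assumed: 18 keypoints in OpenPose COCO-18 order
NUM_JOINTS = 18
MAX_PERSONS = 4                        # the baseline keeps four poses per frame


def pose_score(pose: Pose) -> float:
    """Mean keypoint confidence (an assumed ranking criterion)."""
    return sum(c for _, _, c in pose) / len(pose)


def downsample(frames: list, src_fps: int = 60, dst_fps: int = 30) -> list:
    """Keep every (src_fps // dst_fps)-th frame; assumes an integer ratio."""
    step = src_fps // dst_fps
    return frames[::step]


def select_top_poses(frames: List[List[Pose]], max_persons: int = MAX_PERSONS):
    """Return nested lists indexed as data[c][t][v][m] with C=3, V=18, M=max_persons.

    Person slots without a detection stay 0.0, so sparse frames are zero-padded.
    """
    num_frames = len(frames)
    data = [[[[0.0] * max_persons for _ in range(NUM_JOINTS)]
             for _ in range(num_frames)]
            for _ in range(3)]
    for t, poses in enumerate(frames):
        # Rank this frame's detections by score, highest first, and keep the top ones.
        ranked = sorted(poses, key=pose_score, reverse=True)[:max_persons]
        for m, pose in enumerate(ranked):
            for v, keypoint in enumerate(pose):
                for c in range(3):  # c = 0: x, 1: y, 2: confidence
                    data[c][t][v][m] = keypoint[c]
    return data
```

With a 6-second clip at 60 fps, `downsample` would reduce 360 frames to 180 before pose selection. Ranking by mean confidence is only one plausible criterion; summing confidence over the whole sequence would be an equally reasonable alternative.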
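Switching from OpenPose (18 keypoints) to MediaPipe (33 keypoints) changes the skeleton graph that ST-GCN convolves over. As an illustration, the sketch below builds the OpenPose COCO-18 graph and a row-normalized adjacency D^{-1}(A + I).

- **Edge list.** This is our reading of the "openpose" layout in the public ST-GCN reference code. It should be checked against that code before reuse.
- **Normalization.** Row normalization is a simplification. ST-GCN additionally splits each neighborhood into spatial partitions.
- **MediaPipe.** A MediaPipe graph would be built the same way, with 33 nodes and MediaPipe's own connection list, which is not reproduced here.

```python
from typing import List, Tuple

# OpenPose COCO-18 keypoints: 0 nose, 1 neck, 2-4 right arm, 5-7 left arm,
# 8-10 right leg, 11-13 left leg, 14/15 eyes, 16/17 ears.
NUM_NODES = 18
EDGES: List[Tuple[int, int]] = [
    (4, 3), (3, 2), (7, 6), (6, 5), (13, 12), (12, 11), (10, 9), (9, 8),
    (11, 5), (8, 2), (5, 1), (2, 1), (0, 1), (15, 0), (14, 0), (17, 15), (16, 14),
]


def adjacency_with_self_loops(num_nodes: int, edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Undirected adjacency matrix A + I as nested lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def row_normalize(a: List[List[float]]) -> List[List[float]]:
    """Compute D^-1 (A + I) so that every row sums to 1."""
    out = []
    for row in a:
        degree = sum(row)  # includes the self-loop, so it is never zero
        out.append([x / degree for x in row])
    return out


a_hat = row_normalize(adjacency_with_self_loops(NUM_NODES, EDGES))
```

Because the 17 edges connect all 18 joints into a tree, every joint is reachable from the neck. A denser graph such as MediaPipe's gives each graph convolution more neighbors to average over, which does not automatically translate into better features.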
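Table I reports accuracy, precision, and F1-score, but the text does not say how precision and F1 are averaged over the six pitch types. The sketch below assumes macro averaging, an unweighted mean over classes. Under that choice, a rare class such as sinker (177 samples) counts as much as fastball (2,582 samples). The function name `classification_metrics` is illustrative, not the authors' evaluation code.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Undefined ratios (no predictions or no true samples) are treated as 0.
        p = tp[label] / pred_count[label] if pred_count[label] else 0.0
        r = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For example, with true labels `["fastball", "fastball", "fastball", "sinker"]` and predictions `["fastball", "fastball", "sinker", "sinker"]`, accuracy is 0.75 and macro precision is also 0.75, but macro F1 is about 0.733. Under heavy class imbalance, accuracy and macro-averaged scores can diverge, which is one reason to report both.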