Shot segmentation is an important and challenging task in video understanding, aiming to split videos into classified shots (e.g., close-up, close shot, full view, audience, transition, zooming, and others).
Existing datasets and benchmarks either focus only on shot boundary detection (e.g., ClipShots and SHOT) or lack well-defined shot categories for segmentation (e.g., SoccerNet-v2). A new dataset is therefore needed, with fine-grained shot categories for shot segmentation and well-defined shot boundaries for shot boundary detection.
To this end, we propose SportsShot, a fine-grained dataset for shot segmentation and shot boundary detection in multiple sports scenes. SportsShot features well-defined shot boundaries, fine-grained and complex shot categories, and consistent, high-quality annotations, which make both shot segmentation and boundary detection more challenging. For shot segmentation, we define seven semantic categories that are complex yet close to human understanding. For shot boundary detection, we treat both hard cuts and gradual transitions as boundaries and annotate them as intervals.
This track is provided by MCG Group @ Nanjing University.
We use accuracy and segmental F1 scores to analyze shot segmentation performance following the standard practice in the temporal action segmentation task[1], in which accuracy evaluates the predictions in a frame-wise manner, while segmental F1-scores measure the temporal overlap between predicted and ground truth segments at different thresholds.
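The two segmentation metrics described above can be sketched as follows. This is a minimal illustration that assumes the matching rule commonly used in temporal action segmentation (each predicted segment is compared with same-label ground-truth segments, and each ground-truth segment can be matched at most once); it is not the official evaluation script, and the function names `to_segments`, `frame_accuracy`, and `segmental_f1` are hypothetical.

```python
def to_segments(labels):
    """Collapse per-frame labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def frame_accuracy(pred, gt):
    """Fraction of frames whose predicted label equals the ground-truth label."""
    correct = sum(p == g for p, g in zip(pred, gt))
    return correct / len(gt)


def segmental_f1(pred, gt, threshold):
    """F1 over segments; a prediction is a true positive if its IoU with an
    unmatched same-label ground-truth segment reaches the threshold."""
    pred_segs = to_segments(pred)
    gt_segs = to_segments(gt)
    matched = [False] * len(gt_segs)
    tp = fp = 0
    for label, ps, pe in pred_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, gs, ge) in enumerate(gt_segs):
            if g_label != label:
                continue
            inter = max(0, min(pe, ge) - max(ps, gs))
            union = max(pe, ge) - min(ps, gs)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = matched.count(False)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the predicted cut is one frame early.
gt = ["close_up"] * 4 + ["full_view"] * 4
pred = ["close_up"] * 3 + ["full_view"] * 5
print(frame_accuracy(pred, gt))
print(segmental_f1(pred, gt, threshold=0.5))
```

In the toy example a single mislabeled frame lowers the frame-wise accuracy, while the segmental F1 at a moderate threshold is unaffected because both predicted segments still overlap their ground-truth counterparts well.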
For shot boundary detection, we use precision, recall, and F1 scores following previous datasets [2, 3], since it is important to detect shot boundaries both precisely and thoroughly. To take the duration of gradual transitions into account, we count a detected shot boundary as a positive result only when its temporal IoU with a ground-truth shot boundary is over 0.5.
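The matching criterion for boundary detection can be illustrated with a short sketch. It assumes that each boundary is a (start frame, end frame) pair with inclusive frame indices and that each ground-truth boundary can be matched by at most one detection; both are assumptions, and the names `temporal_iou` and `boundary_prf` are hypothetical rather than part of any official evaluation code.

```python
def temporal_iou(a, b):
    """Temporal IoU of two intervals with inclusive frame indices (assumed)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def boundary_prf(pred, gt, iou_threshold=0.5):
    """Precision, recall, and F1; a detection is positive only if its temporal
    IoU with a still-unmatched ground-truth boundary exceeds the threshold."""
    matched = set()
    tp = 0
    for p in pred:
        for idx, g in enumerate(gt):
            if idx not in matched and temporal_iou(p, g) > iou_threshold:
                matched.add(idx)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: one gradual transition found slightly late, one hard cut found
# exactly, one spurious detection, and one missed boundary.
gt = [(21, 25), (78, 79), (122, 123)]
pred = [(22, 25), (78, 79), (200, 201)]
print(boundary_prf(pred, gt))
```

Because boundaries are intervals, a detection that covers only a small part of a long gradual transition falls below the 0.5 threshold and counts as both a false positive and a missed boundary.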
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Please ensure that the submitted data contains results for all test videos.
Expected submission data:
ZIP_THIS_FOLDER
    seg
        v_0mcffpH2VTw_0.txt
        v_0mcffpH2VTw_1.txt
    det
        v_0mcffpH2VTw_0.txt
        v_0mcffpH2VTw_1.txt
The submission data for shot segmentation should contain one label per line, one line for each frame:
close_up
...
close_up
full_view
...
full_view
...
The submission data for shot boundary detection should contain one boundary per line, given as its start frame and end frame:
21 25
78 79
122 123
...
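One way to produce files in this layout is sketched below. The helper names `write_submission` and `zip_submission` are hypothetical, and the sketch assumes that the `seg` and `det` folders sit at the top level of the archive; check this against the submission system if it expects the enclosing folder instead.

```python
import zipfile
from pathlib import Path


def write_submission(root, seg_preds, det_preds):
    """Write per-video prediction files under root/seg and root/det.

    seg_preds maps a video id to a list of per-frame labels.
    det_preds maps a video id to a list of (start_frame, end_frame) pairs.
    """
    root = Path(root)
    for sub in ("seg", "det"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    for video_id, labels in seg_preds.items():
        text = "".join(f"{label}\n" for label in labels)  # one label per frame
        (root / "seg" / f"{video_id}.txt").write_text(text)
    for video_id, intervals in det_preds.items():
        text = "".join(f"{start} {end}\n" for start, end in intervals)
        (root / "det" / f"{video_id}.txt").write_text(text)


def zip_submission(root, zip_path):
    """Zip all .txt files under root, keeping the seg/ and det/ subfolders."""
    root = Path(root)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.txt")):
            zf.write(path, path.relative_to(root).as_posix())


# Example with a single video and made-up predictions.
seg = {"v_0mcffpH2VTw_0": ["close_up", "close_up", "full_view"]}
det = {"v_0mcffpH2VTw_0": [(21, 25), (78, 79)]}
write_submission("submission", seg, det)
zip_submission("submission", "submission.zip")
```

Before uploading, it is worth checking that every test video has a file in both `seg` and `det`, since results are expected for all test videos.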
Start: Nov. 25, 2024, midnight
Description: To submit, upload a .zip file containing text files with the prediction.
Never