This challenge targets video semantic segmentation (VSS) in the wild, i.e., assigning pre-defined semantic labels to the pixels of all frames in a given video. The central difficulty of the Video Scene Parsing task is leveraging temporal information for high predictive accuracy, and we expect participants to deliver accuracy beyond what image-based semantic segmentation methods achieve. For more details about the PVUW2024 challenge, see https://www.vspwdataset.com/Workshop2024.html. When registering for this competition, please use an institutional email (e.g., mit.edu, ox.ac.uk, microsoft.com). Registrations from email providers such as gmail.com, yahoo.com, hotmail.com, and 163.com will not be accepted.
This challenge is part of the PVUW workshop at CVPR 2024; see the workshop site for more details.
Please refer to the following papers if you participate in this challenge or use the dataset in your approach:
@inproceedings{miao2021vspw,
  title={VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild},
  author={Miao, Jiaxu and Wei, Yunchao and Wu, Yu and Liang, Chen and Li, Guangrui and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2021}
}
@inproceedings{miao2022large,
  title={Large-scale Video Panoptic Segmentation in the Wild: A Benchmark},
  author={Miao, Jiaxu and Wang, Xiaohan and Wu, Yu and Li, Wei and Zhang, Xu and Wei, Yunchao and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2022}
}
Any questions? Please email jiaxumiao@zju.edu.cn
For video scene parsing, we use mean IoU to evaluate segmentation performance and Video Consistency (VC) to evaluate the stability of predictions.
Mean IoU (mIoU) is the intersection-over-union between the predicted and ground-truth pixels, averaged over all classes.
Video Consistency (VC) measures the category consistency of predictions among long-range adjacent frames.
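As a concrete reference, below is a minimal NumPy sketch of the mIoU computation. It is not the official evaluation script: it assumes single-channel label maps with values 0-123 and an ignore value of 255, which may differ from the organizers' conventions. VC is defined precisely in the VSPW paper [1] and is omitted here.

```python
# A minimal mIoU sketch (not the official evaluation script), assuming
# single-channel label maps with values 0..123 and an ignore value of 255.
import numpy as np

NUM_CLASSES = 124  # evaluated categories, "others" excluded
IGNORE = 255

def confusion_matrix(gt, pred, num_classes=NUM_CLASSES):
    """Accumulate a (C, C) confusion matrix from one frame pair."""
    mask = gt != IGNORE
    idx = num_classes * gt[mask].astype(np.int64) + pred[mask].astype(np.int64)
    return np.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN), averaged over classes that occur."""
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    iou = tp / np.maximum(union, 1)
    return iou[union > 0].mean()

# Usage: sum confusion_matrix(...) over every frame of every video,
# then call mean_iou once on the accumulated matrix.
```

Accumulating a single confusion matrix over all frames before averaging keeps rare classes from being swamped by per-frame averaging.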
We provide labeled training data and validation data; the provided test data is unlabeled. You may use the validation data for training your model.
There are two phases (see the schedule below).
You only need to submit the prediction results (no code). The ranking is evaluated according to mIoU.
The submission file is a zip file named result_submission.zip. The structure of the folder is:
|----result_submission/
|    |----video1/
|    |    |----image1.png
|    |    |----image2.png
......
Note that the folder name MUST be "result_submission".
NOTE: The label values in your submission must be in the range 0-123. The category-number dictionary "label_num_dic_final.json" in the VSPW dataset maps 0 to "others", 1 to "wall", 2 to "ceiling", and so on. However, "others" is not evaluated, so the category number for submission should be the original number minus one, i.e., 0 denotes "wall", 1 denotes "ceiling", and so on.
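A minimal remapping sketch under this convention is shown below. It assumes your model outputs the original VSPW IDs (0 = "others") as single-channel PNGs; since "others" is not evaluated, how you handle those pixels is your choice, and folding them into class 0 here is purely illustrative.

```python
# A minimal remapping sketch, assuming the model predicts the ORIGINAL
# VSPW IDs (0 = "others", 1 = "wall", 2 = "ceiling", ...) as single-channel PNGs.
# The submission ID is the original ID minus one, so "wall" becomes 0.
import numpy as np
from PIL import Image

def remap_for_submission(in_png, out_png):
    ids = np.asarray(Image.open(in_png), dtype=np.int16) - 1  # shift all IDs down by one
    ids[ids < 0] = 0  # assumption: fold unevaluated "others" pixels into class 0
    Image.fromarray(ids.astype(np.uint8)).save(out_png)
```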
You must submit predicted results for all of the test data. An example submission file is provided under Participate - Files - Starting Kit, and baseline code for VSPW is available here.
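Putting the pieces together, here is a hypothetical packaging sketch. The input folder predictions/ and its per-video layout are assumptions about your own pipeline; the script simply re-roots the remapped frames under the required result_submission/ folder and zips it.

```python
# Hypothetical packaging sketch: re-root predictions/<video>/<frame>.png
# under result_submission/ and zip with the required top-level folder name.
import shutil
import zipfile
from pathlib import Path

src = Path("predictions")        # assumed input layout: predictions/video1/image1.png
dst = Path("result_submission")  # the required top-level folder name

if dst.exists():
    shutil.rmtree(dst)
for png in src.rglob("*.png"):
    out = dst / png.relative_to(src)   # -> result_submission/video1/image1.png
    out.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(png, out)

with zipfile.ZipFile("result_submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for f in sorted(dst.rglob("*")):
        if f.is_file():
            zf.write(f, f.as_posix())  # arcname keeps the "result_submission/" prefix
```

Zipping the folder itself rather than its contents keeps the mandatory "result_submission/" prefix inside the archive.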
Rules:
1. For the 2nd Pixel-level Video Understanding in the Wild challenge, the ranking is evaluated according to mIoU.
2. The validation data may be used for training your model.
3. Other datasets are allowed for training, but participants must state which extra datasets they used.
[1] VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild. CVPR 2021
[2] Large-scale Video Panoptic Segmentation in the Wild: A Benchmark. CVPR 2022
Development phase
Start: Feb. 1, 2024, 6 p.m.
Description: Directly submit results on the entire test data. Results are evaluated on test_part1.

Final phase
Start: May 15, 2024, 6 p.m.
Description: Directly submit results on the entire test data. Results on the test set will be revealed when the organizers make them available.

Competition ends: May 25, 2024, 8 p.m.
# | Username | Score
---|---|---
1 | SiegeLion | 0.6783 |
2 | lieflat | 0.6727 |
3 | kevin1234 | 0.6392 |