PVUW2024 VSS Track

Organized by miaomm

Pixel-level Video Understanding in the Wild Challenge (VSS Track)

This challenge focuses on the challenging task of video semantic segmentation (VSS) in the wild, i.e., assigning pre-defined semantic labels to the pixels of all frames in a given video. The main challenge of video scene parsing is how to leverage temporal information for high predictive accuracy. We expect participants to achieve higher accuracy than image-based semantic segmentation methods. For more details about the PVUW2024 challenge, see https://www.vspwdataset.com/Workshop2024.html. If you register for this competition, please use an institutional email address (for example: mit.edu, ox.ac.uk, microsoft.com). Registrations with email providers such as gmail.com, yahoo.com, hotmail.com, and 163.com will not be accepted.


This challenge is part of the PVUW workshop at CVPR 2024, see the workshop site for more details.

Please refer to the following paper if you participate in this challenge or use the dataset for your approach:

 

@inproceedings{miao2021vspw,
  title={VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild},
  author={Miao, Jiaxu and Wei, Yunchao and Wu, Yu and Liang, Chen and Li, Guangrui and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

@inproceedings{miao2022large,
  title={Large-scale Video Panoptic Segmentation in the Wild: A Benchmark},
  author={Miao, Jiaxu and Wang, Xiaohan and Wu, Yu and Li, Wei and Zhang, Xu and Wei, Yunchao and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2022}
}

 

Any questions? Please send an email to jiaxumiao@zju.edu.cn.

Evaluation

For video scene parsing, we use mean IoU (mIoU) to evaluate segmentation performance and Video Consistency (VC) to evaluate the stability of predictions.

Mean IoU (mIoU) is the intersection-over-union between the predicted and ground-truth pixels, averaged over all classes.
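For concreteness, here is a minimal Python sketch of class-averaged IoU, assuming predictions and ground truths are per-frame integer label maps in the 0-123 submission numbering; the official evaluation script may differ in details such as ignore-region handling.

import numpy as np

def mean_iou(preds, gts, num_classes=124):
    # Accumulate per-class intersection and union over all frames, then
    # average IoU over the classes that actually appear in the data.
    inter = np.zeros(num_classes, dtype=np.int64)
    union = np.zeros(num_classes, dtype=np.int64)
    for pred, gt in zip(preds, gts):
        for c in range(num_classes):
            p, g = pred == c, gt == c
            inter[c] += np.logical_and(p, g).sum()
            union[c] += np.logical_or(p, g).sum()
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())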

Video Consistency (VC) measures the category consistency among long-range adjacent frames.
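For intuition, the sketch below computes VC over a single clip, assuming the definition from the VSPW paper: within each window of n consecutive frames, a pixel counts as consistent if its ground-truth label is constant across the window and the prediction matches the ground truth in every frame. Treat it as an illustration, not the official scorer.

import numpy as np

def video_consistency(preds, gts, n=8):
    # VC_n for one clip (the VSPW paper reports VC_8 and VC_16): among
    # pixels whose GT label is constant over a sliding window of n frames,
    # the fraction whose prediction matches the GT in every frame of the
    # window, averaged over all windows in the clip.
    scores = []
    for i in range(len(gts) - n + 1):
        gt_win = np.stack(gts[i:i + n])      # shape (n, H, W)
        pred_win = np.stack(preds[i:i + n])
        gt_stable = (gt_win == gt_win[0]).all(axis=0)
        pred_right = (pred_win == gt_win).all(axis=0)
        denom = gt_stable.sum()
        if denom > 0:
            scores.append((gt_stable & pred_right).sum() / denom)
    return float(np.mean(scores)) if scores else 0.0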

We provide labeled training and validation data; the provided test data is unlabeled. The validation data may also be used for training your model.

 

There are 2 phases:

  • Phase 1: development phase. We provide labeled training/validation data and unlabeled test data. Since the labeled validation data is released, the test data is split into two parts in this phase, and you will receive feedback on your performance on only one part of the test set. Which videos belong to that part will not be disclosed, so you must submit results for the entire test set. The performance of your best submission is displayed on the leaderboard.
  • Phase 2: final phase. The scores of the development phase will not be automatically copied over, so you must re-submit your solution to the final phase if you want to appear on the final leaderboard.

You only need to submit the prediction results (no code). The ranking is evaluated according to mIoU.

The submission file is a zip file named result_submission.zip. The structure of the folder is:

|----result_submission/
|        |----video1/
|        |        |----image1.png
|        |        |----image2.png
|        |        |----...
|        |----video2/
|        |        |----...

Note that the folder name MUST be "result_submission".
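As a convenience, here is a small packaging sketch; "predictions" is a hypothetical local folder that holds one sub-folder of per-frame PNG masks per test video.

import shutil
from pathlib import Path

# Copy local predictions into a folder with the required name, then zip it
# so that "result_submission/" sits at the top level of the archive.
src = Path("predictions")          # hypothetical local output folder
out = Path("result_submission")    # the folder name MUST be exactly this
if out.exists():
    shutil.rmtree(out)
shutil.copytree(src, out)
shutil.make_archive("result_submission", "zip", root_dir=".", base_dir="result_submission")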

NOTE: The label values in your submission must lie in the range 0-123. The category-number dictionary "label_num_dic_final.json" in the VSPW dataset shows that 0 denotes "others", 1 denotes "wall", 2 denotes "ceiling", and so on. However, "others" is not evaluated, so the category number for submission should be the original number minus one, i.e., 0 denotes "wall", 1 denotes "ceiling", and so on.
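A minimal remapping sketch, assuming pred_orig is a hypothetical (H, W) integer array in the original dataset numbering; pixels predicted as "others" have no valid submission index, so they are clamped to 0 here only to keep values in range ("others" is not evaluated anyway).

import numpy as np
from PIL import Image

def to_submission(pred_orig):
    # Shift every category down by one: dataset numbering (1="wall",
    # 2="ceiling", ...) becomes submission numbering (0="wall", ...).
    pred = pred_orig.astype(np.int32) - 1
    pred[pred < 0] = 0             # former "others" pixels, not evaluated
    assert pred.min() >= 0 and pred.max() <= 123
    return pred.astype(np.uint8)

pred_orig = np.random.randint(1, 125, size=(480, 853))  # dummy prediction
Image.fromarray(to_submission(pred_orig), mode="L").save("image1.png")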

 

You must submit predicted results for all of the test data. An example submission file is available at Participate - Files - Starting Kit. Baseline code for VSPW is available here.

Rules:

  • For the 2nd Pixel-level Video Understanding in the Wild challenge, the ranking is evaluated according to mIoU.
  • The validation data is allowed for training your model.
  • Other datasets are allowed for training, but participants must state which extra datasets they used.

Terms and Conditions

  • You agree that if you place in the top-3 at the end of the challenge you will submit a report to introduce your method.
  • You agree to us storing your submission results for evaluation purposes.
  • For academic use of the datasets within and outside this competition, please cite the following papers.

        [1] VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild. CVPR 2021

        [2] Large-scale Video Panoptic Segmentation in the Wild: A Benchmark. CVPR 2022 

Development

Start: Feb. 1, 2024, 6 p.m. UTC

Description: Development phase: directly submit results on the entire test set. Results are evaluated on test_part1.

Final

Start: May 15, 2024, 6 p.m. UTC

Description: Final phase: directly submit results on the entire test set. The final test-set results will be revealed when the organizers make them available.

Competition Ends

May 25, 2024, 8 p.m. UTC

Leaderboard

#  Username   Score (mIoU)
1  SiegeLion  0.6783
2  lieflat    0.6727
3  kevin1234  0.6392