MeViS Track of the 4th PVUW challenge 2025

Organized by ntuLC

First phase (Valid): starts March 1, 2025, midnight UTC

Competition ends: March 25, 2025, 11:59 p.m. UTC

MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

🏠 [Project page] · 📄 [arXiv] · 📄 [ICCV PDF]


Abstract

This paper strives for Motion Expressions guided Video Segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object datasets typically focus on salient objects and use language expressions that contain excessive static attributes that could potentially enable the target object to be identified in a single frame. These datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. We benchmarked 5 existing referring video object segmentation (RVOS) methods and conducted a comprehensive comparison on the MeViS dataset. The results show that current RVOS methods cannot effectively address motion expression-guided video segmentation. We further analyze the challenges and propose a baseline approach for the proposed MeViS dataset. The goal of our benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes.

BibTeX

Please consider citing MeViS if it helps your research.

@inproceedings{MeViS,
  title={{MeViS}: A Large-scale Benchmark for Video Segmentation with Motion Expressions},
  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Loy, Chen Change},
  booktitle={ICCV},
  year={2023}
}
@inproceedings{GRES,
  title={{GRES}: Generalized Referring Expression Segmentation},
  author={Liu, Chang and Ding, Henghui and Jiang, Xudong},
  booktitle={CVPR},
  year={2023}
}
@article{VLT,
  title={{VLT}: Vision-language transformer and query generation for referring segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  publisher={IEEE}
}

A majority of videos in MeViS are from MOSE: Complex Video Object Segmentation Dataset.

@inproceedings{MOSE,
  title={{MOSE}: A New Dataset for Video Object Segmentation in Complex Scenes},
  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Torr, Philip HS and Bai, Song},
  booktitle={ICCV},
  year={2023}
}

The data of MeViS is released for non-commercial research purposes only.

Please check the Data page under the Participate tab.


Valid

Start: March 1, 2025, midnight UTC

Description: val set (140 videos). ⚠️ Final ranking will be based on J&F.

CVPR 2025 PVUW MeViS Track

Start: March 14, 2025, 11:59 p.m. UTC

Description: ⚠️ Final ranking will be based on J&F.

Competition Ends

March 25, 2025, 11:59 p.m. UTC
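The J&F score used for ranking averages region similarity (J, the IoU of predicted and ground-truth masks) with contour accuracy (F, a boundary F-measure). Below is a minimal per-mask sketch in NumPy; note that the official DAVIS-style evaluation toolkit matches boundaries using a distance-transform-based tolerance, which the crude square dilation here only approximates, and that `np.roll` wraps at image borders (harmless for objects away from the edge, as in this sketch).

```python
import numpy as np

def j_measure(pred, gt):
    """Region similarity J: IoU of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

def _boundary(mask):
    """Boundary pixels: foreground pixels with a 4-neighbour in the background."""
    m = np.pad(mask.astype(bool), 1)
    core = m[1:-1, :-2] & m[1:-1, 2:] & m[:-2, 1:-1] & m[2:, 1:-1]
    return mask.astype(bool) & ~core

def _dilate(mask, r):
    """Crude square dilation by r pixels (stand-in for the official
    distance-transform-based boundary tolerance)."""
    out = mask.copy()
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

def f_measure(pred, gt, tol=1):
    """Contour accuracy F: boundary precision/recall within a tolerance."""
    pb, gb = _boundary(pred), _boundary(gt)
    if pb.sum() == 0 and gb.sum() == 0:
        return 1.0  # both masks empty: perfect agreement
    prec = (pb & _dilate(gb, tol)).sum() / max(pb.sum(), 1)
    rec = (gb & _dilate(pb, tol)).sum() / max(gb.sum(), 1)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def jf_mean(pred, gt):
    """J&F: the mean of region similarity and contour accuracy."""
    return (j_measure(pred, gt) + f_measure(pred, gt)) / 2
```

In the benchmark, per-mask scores are averaged over all frames and video-expression pairs to produce the final J&F ranking value.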
