Throughout the lectures, various topics concerning Neural Networks (NNs) for computer vision have been covered. In this competition, we challenge you to put your newly acquired skills to the test by working with a real-world dataset.
You will have the opportunity to work with the Cityscapes dataset, a large-scale collection of images capturing various aspects of urban environments, such as street scenes, buildings, vehicles, and more. By participating in this competition, you will gain valuable experience in applying your knowledge to real-world problems and improve your skills in computer vision.
This competition focuses on multiple problems that are relevant but often under-addressed in a research setting, yet have proven to be crucial for real-world applications. For each of these problems we have created a specific benchmarking test set related to Cityscapes.
Fig 1. A sample from the dataset, shown together with its semantic label map.
Before diving into the main benchmarks, you must first create a baseline submission. This step ensures that your workflow is correctly set up, your model training and evaluation pipeline is functional, and your submissions adhere to the competition requirements. The baseline phase is critical to verify that you are on the right track. It also helps establish a reference point for your subsequent improvements. Please note that there is a strict deadline for submitting the baseline results to ensure everyone progresses smoothly toward the final assignment.
After providing the first baseline results, you can choose a specific direction for your final assignment. The challenge consists of four benchmarks, each addressing a different aspect:
This is the first benchmark and will also serve as the baseline submission. This subtask is typically the focus of computer vision challenges. It assesses the model's performance on a clean test set, and evaluation is based on the model's ability to achieve the highest segmentation scores on a designated test set.
The robustness benchmark evaluates the model's resilience against various conditions, such as changes in lighting, weather, or decreased image quality. Consistent performance across different scenarios is needed to excel in this category.
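For local testing of robustness you can, for example, re-evaluate your model on artificially degraded copies of validation images. The corruptions below (blur, low light, low contrast via torchvision transforms) are purely illustrative and are not necessarily those used in the official robustness test set.

```python
import torch
from torchvision import transforms

# Illustrative degradations; the official robustness set may use different
# corruptions (e.g. rain, fog, motion blur, compression artefacts).
degradations = {
    "blur": transforms.GaussianBlur(kernel_size=9, sigma=3.0),
    "low_light": transforms.ColorJitter(brightness=(0.3, 0.3)),
    "low_contrast": transforms.ColorJitter(contrast=(0.4, 0.4)),
}

def degraded_variants(image: torch.Tensor) -> dict:
    """Return corrupted copies of a (C, H, W) image tensor with values in [0, 1]."""
    return {name: transform(image) for name, transform in degradations.items()}
```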
Efficiency is crucial for the practical deployment of computer vision models. This benchmark evaluates models on their size. Given large amounts of data, larger models often outperform their smaller counterparts; however, on edge devices such as FPGAs, very large models cannot be used. This benchmark therefore focuses on reducing model size while maintaining acceptable performance.
Models often encounter data that differ significantly from the training distribution, leading to unreliable predictions. This benchmark assesses the model's ability to detect out-of-distribution samples so they can be excluded from the final evaluation.
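One simple baseline for this detection step is to threshold the mean per-pixel maximum softmax probability of the segmentation output. The sketch below assumes this approach and a hand-picked threshold; neither the method nor the threshold value is prescribed by the competition.

```python
import torch
import torch.nn.functional as F

def is_out_of_distribution(logits: torch.Tensor, threshold: float = 0.5) -> bool:
    """Flag an image as out-of-distribution when the mean per-pixel maximum
    softmax probability drops below a chosen threshold (placeholder value)."""
    probs = F.softmax(logits, dim=1)             # logits: (1, num_classes, H, W)
    confidence = probs.max(dim=1).values.mean()  # average of per-pixel max probabilities
    return confidence.item() < threshold
```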
Fig 2. Examples of challenging conditions that the network will face when deployed in a real-world setting but that are not present in the training set.
We wish you the best of luck in this competition, and we look forward to seeing your innovative solutions!
In this competition, the primary evaluation metric is the Dice coefficient.
The formula is given by:

$$\mathrm{Dice}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}$$
where X represents the prediction set of pixels and Y represents the ground truth.
This metric is commonly used in image segmentation tasks and measures the overlap between the predicted segmentation and the ground-truth segmentation. It is closely related to the Intersection over Union (IoU).
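As an illustration, the Dice score between two binary masks can be computed as in the following sketch; the official evaluation script and the way scores are averaged over classes and images are defined by the competition, not by this snippet.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary masks of identical shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# Example: a prediction covering two columns vs. a ground truth covering one.
pred = np.array([[1, 1, 0, 0]] * 4)
gt = np.array([[1, 0, 0, 0]] * 4)
print(dice_score(pred, gt))  # 2 * 4 / (8 + 4) ≈ 0.667
```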
For the peak-performance part of the challenge, we will only consider the highest mean Dice score over the whole test set as the performance metric.
For the robustness part of the challenge, we will consider the mean Dice score over the whole test set. However, the images are more challenging and represent a more heterogeneous test set.
For the efficiency part of the challenge, the primary ranking criterion will be the efficiency score, defined as the mean Dice score over the whole test set divided by the number of FLOPs of the model. In addition, the mean Dice score must be at least as high as that of our baseline model.
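As an illustration of how this score could be computed locally, the sketch below divides a placeholder mean Dice score by a placeholder FLOP count; in practice the FLOPs would come from a profiling tool such as fvcore's FlopCountAnalysis, and the baseline Dice value used here is an assumption, not the official threshold.

```python
def efficiency_score(mean_dice: float, flops: float, baseline_dice: float) -> float:
    """Efficiency score as described above: mean Dice over the test set divided by
    the model's FLOPs, valid only when the baseline mean Dice is reached."""
    if mean_dice < baseline_dice:
        raise ValueError("Model does not reach the baseline mean Dice score.")
    return mean_dice / flops

# Placeholder numbers: a model with a mean Dice of 0.70 and roughly 60 GFLOPs,
# against an assumed baseline mean Dice of 0.60.
print(efficiency_score(mean_dice=0.70, flops=60e9, baseline_dice=0.60))  # ≈ 1.17e-11
```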
For this benchmark, evaluation works slightly differently than for the other benchmarks. The test set contains images from the original Cityscapes test set (in-distribution data), Cityscapes images in rain and fog (near out-of-distribution), and completely different images from random sources (far out-of-distribution). Your model should provide both a segmentation output and a classification token indicating whether each image should be evaluated. The primary evaluation criterion is the Dice score over all images included for evaluation. If an image from the original test set is incorrectly classified as out-of-distribution and excluded from evaluation, it receives a Dice score of 0. If near out-of-distribution data is included for evaluation, it receives its Dice score, which will likely be lower than that of images from the original Cityscapes test set. If far out-of-distribution data is classified as in-distribution and included in the evaluation, it also receives a Dice score of 0.
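A per-image reading of these rules is sketched below. This is not the official evaluation code, and the handling of correctly excluded images (returned here as None, i.e. simply not counted) is an assumption.

```python
from typing import Optional

def ood_benchmark_score(domain: str, included: bool, dice: float) -> Optional[float]:
    """Per-image score under the rules described above (a sketch, not the official code).
    `domain` is 'in', 'near_ood', or 'far_ood'; `included` is the model's decision to
    submit the image for evaluation; `dice` is the Dice score of its segmentation."""
    if domain == "in":
        return dice if included else 0.0   # wrongly excluding in-distribution data costs a 0
    if domain == "near_ood":
        return dice if included else None  # if evaluated, the Dice score will likely be lower
    if domain == "far_ood":
        return 0.0 if included else None   # wrongly including far-OOD data costs a 0
    raise ValueError(f"Unknown domain: {domain}")
```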
Welcome to the competition! Below are the rules and bonuses explained:
Participants are required to submit a minimum of 1 entry for the Peak Performance benchmark and at least one other benchmark. Each participant can submit a maximum of 1 entry per day for all benchmarks. Keep this in mind if you plan to compare multiple solutions for your report, and avoid waiting until the deadline day.
We encourage early submissions. This allows us to provide assistance promptly, helping you to refine your solution for the competition.
Participants who rank in the top 3 on any leaderboard for a benchmark will receive an additional bonus of 0.25 added to their final assignment grade. If you achieve the top-performing solution on any benchmark, you will receive a higher bonus of 0.5 added to your final assignment grade.
For example, if you have the best performance in the 'Peak Performance' benchmark and rank in the top 3 of any other benchmark, you will receive a 0.75 bonus added to your final assignment grade.
Baseline
Start: Jan. 3, 2025, midnight
Description: Establish a functional pipeline and verify your model's segmentation accuracy on a clean test set to ensure you're on the right track.

Peak Performance
Start: March 20, 2025, midnight
Description: Focus on achieving the highest segmentation accuracy on a clean, standard test set to demonstrate peak performance.

Robustness
Start: March 20, 2025, midnight
Description: Test your model's resilience by evaluating performance under conditions of reduced image quality or challenging environments.

Efficiency
Start: March 20, 2025, midnight
Description: Evaluate your model's performance relative to its efficiency, emphasizing design for practical deployment.

Out-of-Distribution Detection
Start: March 20, 2025, midnight
Description: Assess your model's ability to detect and handle out-of-distribution samples effectively.

End: July 5, 2025, midnight