Dear VIP Cup 2023 Participants,
We are pleased to announce Phase 2 of the competition and invite you all to participate!
Here is the Zenodo link for the new test data and submission file: https://zenodo.org/record/8289533
Submission Logistics: Please note that Phase 2 is closed and will not be hosted on Codalab. Final submissions must be made on CMS by Sept. 3 AoE. Each submission must include the following:
1. A submission file with results on the new Phase 2 test set. Use the images in the Zenodo repository linked above to produce biomarker predictions for each image and save them in the provided submission template CSV. The CSV file uses the same format as in Phase 1.
2. A PDF document with the names, emails, and affiliations of all participants in the team. Additionally, the document can describe the general approach used in the work, the system specifications of the devices used to train and test the model, and any other details that relate to the implementation and the deployed algorithms. Please note: We are planning to invite the top 10 teams to submit a non-archival 4-page paper that details their methodology and approach. We will host it on our website, or the authors can host it on arXiv and we will link to it there. More details on this will follow after Sept. 3.
3. All registered teams on CMS website can participate in Phase 2 and submit their results.
Winners will be declared based on Phase 2 dataset results only.
New Test Set: In Phase 2 of the competition, we introduce a new test set. This new test set is labeled by medical experts in the same manner as the test set from Phase 1, with every image associated with 6 biomarkers. However, the key difference between the new and old test sets is that the new data has greater patient diversity (167 unique patients in Phase 2 vs. 40 in Phase 1) while having fewer images (250 vs. 3871). While Phase 1 had a larger set of images, the redundancy between images was higher because many instances were drawn from the same patients. Furthermore, the original test set was drawn from a clinical trial with population demographics similar to those of the training set. Both considerations limit the ability of the original test set to rigorously test the generalizability and personalizability of the models created in this competition. The new test set is drawn from a larger patient pool and population base, which allows us to better determine a true winner for the VIP Cup.
Evaluation Metric: We will use the patient-wise macro-F1 score (as originally described in https://alregib.ece.gatech.edu/2023-vip-cup/) to evaluate results from Phase 2. The difference from the Phase 1 metric is that the score is averaged across patients rather than computed over the whole test set. For instance, in the Phase 1 evaluation of the old test set on Codalab, the baseline model (available on GitHub) achieved a macro-F1 score of 0.6256. In contrast, the Phase 2 patient-wise evaluation of the baseline model on the same old test set gives 0.7514.
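To make the distinction between the two metrics concrete, here is a minimal Python sketch. It is not the official evaluation script. It assumes that the patient-wise score is obtained by computing macro-F1 over the biomarkers separately for each patient's images and then taking the unweighted mean over patients, and that an F1 with no positives in either labels or predictions is counted as 0. The function names and the toy data (2 biomarkers instead of 6) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def f1_binary(y_true, y_pred):
    """F1 for one binary biomarker; returns 0.0 when undefined (assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(rows_true, rows_pred):
    """Unweighted mean of per-biomarker F1 over a set of images."""
    n_labels = len(rows_true[0])
    scores = [
        f1_binary([r[j] for r in rows_true], [r[j] for r in rows_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels

def patient_wise_macro_f1(patient_ids, rows_true, rows_pred):
    """Macro-F1 computed per patient, then averaged over patients (assumed definition)."""
    groups = defaultdict(list)
    for i, pid in enumerate(patient_ids):
        groups[pid].append(i)
    per_patient = [
        macro_f1([rows_true[i] for i in idx], [rows_pred[i] for i in idx])
        for idx in groups.values()
    ]
    return sum(per_patient) / len(per_patient)

# Toy data: three images from two patients, two biomarkers per image.
patient_ids = ["A", "A", "B"]
labels      = [[1, 0], [1, 1], [1, 1]]
predictions = [[1, 0], [0, 1], [1, 1]]

pooled = macro_f1(labels, predictions)                              # whole test set
per_patient = patient_wise_macro_f1(patient_ids, labels, predictions)
print(f"pooled macro-F1:       {pooled:.4f}")
print(f"patient-wise macro-F1: {per_patient:.4f}")
```

Under these assumptions, the same predictions can score differently under the two metrics, because a patient with many images no longer outweighs a patient with few.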
New Results on Phase 2 Test Set: On the new test set, our internal baseline scores 0.6315 under the Phase 1 metric. This is comparable to the 0.6256 obtained on the Phase 1 test set. However, under the Phase 2 patient-wise metric, the baseline scores 0.6943 on the new test set. This is noticeably lower than the 0.7514 obtained on the old test set. The difference arises because the new test set contains 167 patients compared to 40 in the old test set.
Codalab going forward: We encourage all participants to use the Phase 1 test set on Codalab as a validation set. As seen in the baseline results above, the Phase 1 metric is comparable between the two test sets. Building models that generalize well on Codalab will undoubtedly help with Phase 2 results on the new test set. However, be careful not to overfit, since the patient-wise metric can lead to different results.
We wish you all the best of luck going forward!
Posted by: OLIVES @ Aug. 28, 2023, 2:33 p.m.