Dear VIP Cup 2023 Participants,
As you might be aware, the Phase 1 test labels were recently leaked. We, the organizers, explain below how the leak happened and describe our initial response.
Regarding test labels becoming public:
The compromised test labels were part of a private internal commit that was used to test the repository before making it public. This was back in June, before the release of the competition. The commit history stayed in the repository even after the labels were deleted from the main GitHub page, so participants would have had to go through the commit history to find the test labels. We did not know that the private commits remained accessible after the deletion. We only realized the labels were accessible once the participant with the name ‘cxxxxxxxxx2’ (name is disguised for privacy) submitted 100% accurate results and we systematically went through all possible avenues of a leak. This is described chronologically below.
Timeline and the organizers' actions
1. When we noticed 100% accurate results on the leaderboard, our immediate action was to check the participant who had submitted them. There were three immediate red flags: 1) The name of the participant was coincidentally identical to the name of one of the co-organizers in charge of maintaining CodaLab, and clearly the organizer did not submit results, 2) The name was not registered either through cms or through the registration form, and 3) The participant’s email came from a temporary email domain called nezid.com. We were unaware of this site and, upon researching online, found a number of sources that claimed the domain is used for fraudulent purposes (https://www.ipqualityscore.com/domain-reputation/nezid.com). Our first assumption was that CodaLab's internal evaluation had been hacked by the participant, so our first action was to immediately revoke the access of the two participants registered from nezid.com. Revoking access meant that any forum post, submission, or result by the revoked participant was automatically deleted. This is an automatic process within CodaLab.
2. In the following hour, we systematically went through all the places where the leak could have occurred. We had no way of testing CodaLab security from our side. We first went through the dataset hosting site, Zenodo, and confirmed there were no test labels available in any version. Then we went through the public GitHub repo and confirmed there were no test set labels available. Finally, we went through the multiple GitHub versions, all the way back to June, where we discovered that the private commit with the test labels was still accessible. Our first action when we realized that the private commits were accessible to all was to refresh the GitHub repository so that the private commits were no longer available to view and download. This was done within 30 minutes of revoking access to ‘cxxxxxxxxx2’.
We are committed to a fair evaluation of the VIP Cup competition and recognize the efforts of all participants. As such, we have taken steps to provide a new test set for Phase 2 that will be used to rank the registered teams. Obtaining medical labels, however, is not an easy task, and we have been working with our medical partners since the leak to obtain labeled data. A separate forum post detailing Phase 2 is up.
Posted by: OLIVES @ Aug. 28, 2023, 2:30 p.m.