You train a model on the training set of each protocol and generate predictions on the corresponding development and test sets, so you need to train a total of three models. The scores submitted to the website are a combination of six files, namely: dev.txt (p1), test.txt (p1), dev.txt (p2.1), test.txt (p2.1), dev.txt (p2.2), and test.txt (p2.2).
Perhaps some teams have merged all the datasets and trained only one model, which is invalid at the code validation stage.
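As a rough illustration of the workflow described above, here is a minimal Python sketch: one model per protocol, six score files, one submission archive. The functions train_model and score_split are placeholders for each team's own code, and the data paths and output file names are assumptions, not the official submission spec:

```python
# Sketch of the per-protocol workflow. train_model and score_split are
# stand-ins for your own training/inference code; the paths and output
# file names are assumptions, not the official spec.
import zipfile

PROTOCOLS = ["p1", "p2.1", "p2.2"]

def train_model(protocol):
    """Placeholder: train one model on this protocol's training set only."""
    raise NotImplementedError

def score_split(model, protocol, split):
    """Placeholder: return (image_path, score) pairs for one split list."""
    raise NotImplementedError

def main():
    outputs = []
    for protocol in PROTOCOLS:          # three separate models, never merged
        model = train_model(protocol)
        for split in ("dev", "test"):
            out_name = f"{split}_{protocol}.txt"   # assumed naming scheme
            with open(out_name, "w") as f:
                for path, score in score_split(model, protocol, split):
                    f.write(f"{path} {score:.6f}\n")
            outputs.append(out_name)
    # six files total, bundled into one submission archive
    with zipfile.ZipFile("submission.zip", "w") as z:
        for name in outputs:
            z.write(name)

if __name__ == "__main__":
    main()
```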
Posted by: AjianLiu @ Feb. 26, 2024, 1:47 a.m.

There's no point in this ranked version if everyone keeps their original top score.
Posted by: Sonwe1e @ Feb. 27, 2024, 3:06 a.m.

It is perplexing that we were expected to train three separate models rather than integrate all the datasets and train a single one. What adds to the confusion is that this issue is only being pointed out at this stage.
Different datasets, that is, different protocols, use different splits into training, development, and test sets: some images in the test set of one protocol may appear in the training set of another. Mixing the datasets together therefore duplicates data between the training and test sets, which is why so many AUCs above 99% appear on the leaderboards.
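As a rough illustration of how such cross-protocol duplication could be checked, here is a minimal sketch that hashes image contents and intersects the resulting sets, so duplicates are caught even when file names differ between protocols. The list-file paths and the "<path> <label>" line format are assumptions about the data layout, not the organizers' actual verification procedure:

```python
# Minimal leakage check: hash image bytes from two split lists and
# report images that appear in both. Paths and list format are assumed.
import hashlib
from pathlib import Path

def content_hashes(list_file):
    """Map MD5 of each image's bytes to its path for one split list."""
    hashes = {}
    for line in Path(list_file).read_text().splitlines():
        if not line.strip():
            continue
        img_path = line.split()[0]          # assumed format: "<path> <label>"
        digest = hashlib.md5(Path(img_path).read_bytes()).hexdigest()
        hashes[digest] = img_path
    return hashes

train_p1 = content_hashes("data/p1/train.txt")    # hypothetical paths
test_p21 = content_hashes("data/p2.1/test.txt")

leaked = set(train_p1) & set(test_p21)
print(f"{len(leaked)} test images of p2.1 also appear in p1's training set")
for h in list(leaked)[:10]:
    print(train_p1[h], "<->", test_p21[h])
```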
Posted by: Studetns @ Feb. 27, 2024, 4:41 a.m.

Curious how the organizers would verify whether submitted models were trained on images leaked from the test set?
Posted by: zzhang27 @ Feb. 28, 2024, 4:18 a.m.

Given the proposed rule changes, it is suggested that the organizers revoke all scores from the semifinals and extend the competition.
Posted by: Darrenlu @ Feb. 28, 2024, 7:17 a.m.