Hi,
I came across an unexpected behaviour yesterday after submitting the same model's predictions twice (for the second submission, due to a mistake on my side, up to a quarter of all voxels' predictions were left at zero!).
Unexpectedly, the scores obtained were respectively 54.2031580345 (for the model predicting all voxels) & 57.318775426 (for the model whose predictions were partially filled with zeros).
Therefore, I have the feeling that the correlation scores are computed by removing all NaNs beforehand (instead of setting them to 0), which in my case led to a better score for my 2nd model partially filled with zeros.
If this is the case, quite unfortunately, the competition may turn into a "pattern-of-zeros optimisation", where one only keeps the voxels that are well predicted in order to artificially boost the score.
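To illustrate what I suspect is happening (this is only my own sketch of the behaviour, not the organisers' code): a voxel whose predictions are constant, e.g. all zeros, has zero variance, so its Pearson correlation with the ground truth is 0/0 = NaN, and a NaN-ignoring median then simply drops it:
```
import numpy as np

rng = np.random.default_rng(0)
gt = rng.normal(size=(100, 4))                      # ground truth: 100 stimuli x 4 voxels
pred = gt + rng.normal(scale=0.5, size=gt.shape)    # reasonable predictions for all voxels
pred[:, 2:] = 0.0                                   # "mistake": last 2 voxels left at zero

# per-voxel Pearson correlation; constant columns give 0/0 = NaN
r = np.array([np.corrcoef(pred[:, v], gt[:, v])[0, 1] for v in range(gt.shape[1])])
print(r)                # e.g. [~0.9, ~0.9, nan, nan]
print(np.nanmedian(r))  # NaN voxels are silently dropped, so only the good voxels count
```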
Dear all, have you encountered a similar behaviour?
PS: for the organisers, the two mentioned submissions are numbers 15 & 16 if you want to check for yourselves.
Posted by: ChanceLevel @ June 12, 2023, 8:28 a.m.
Hi ChanceLevel, thanks for bringing this up. We are looking into it. Can you double-check the names of the two submissions that you are referring to? The CodaLab output log from your submissions 15 and 16 shows different numbers than what you mentioned in the post. We also ran those submissions on our local copy, and they agree with the CodaLab output log. We see your submission 15 as "average_model_per_roi.zip" and submission 16 as "alexnet_Conv3.zip".
Once we can reproduce your results we will look into what may be causing this behavior.
Thanks!
Oh! I guess there is a mix-up between the order of the submissions on my side and how they appear on CodaLab.
I was referring to the two average_model_per_roi.zip submissions:
- the 1st one, predicting all the voxels, submitted on 06/09/2023 10:17:46 and obtaining a score of 54.2031580345
- the 2nd one, with a quarter of the voxels filled with zeros, submitted on 06/10/2023 06:09:39 and obtaining a score of 57.318775426
I hope this will help.
Posted by: ChanceLevel @ June 12, 2023, 3:32 p.m.
Dear organisers,
Any news on what may account for this strange behaviour?
Posted by: ChanceLevel @ June 14, 2023, 9:21 a.m.
Hi ChanceLevel,
We'll get back to you on this by the end of this week / beginning of next week. Thanks for your patience!
The Algonauts Team
Posted by: giffordale95 @ June 15, 2023, 7:41 a.m.
```
r = corr(y, gt)              # per-voxel correlation between predictions y and ground truth gt
score = r ** 2 / nc          # normalise by the per-voxel noise ceiling nc
score = np.nanmedian(score)  # median that silently ignores NaN voxels
```
if this is the code, filling the bottom half of `y` with np.nan is all you need
a fix should be:
```
r = corr(y, gt)
score = r ** 2 / nc
score = np.nan_to_num(score)  # NaN voxels become 0 instead of being dropped
score = np.median(score)
```
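For comparison, here is a small self-contained example of how much the two snippets above can differ (the noise-ceiling values are made up, and `corr`, `y`, `gt`, `nc` are of course only stand-ins for whatever the organisers actually use):
```
import numpy as np

r = np.array([0.9, 0.8, np.nan, np.nan])  # two zero-filled voxels -> NaN correlations
nc = np.array([0.85, 0.80, 0.70, 0.60])   # made-up per-voxel noise ceilings

score_current = np.nanmedian(r ** 2 / nc)            # NaNs dropped: only good voxels remain
score_fixed = np.median(np.nan_to_num(r ** 2 / nc))  # NaNs counted as 0
print(score_current, score_fixed)                    # ~0.88 vs ~0.4
```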
Dear Organiser,
I tested the potential breach with carefully selected voxels, using the zeros strategy I explained earlier in this thread. I reached a score of 100%.
This proves without a doubt that it is urgent to fix this breach, as it otherwise makes the challenge cheatable.
Thanks in advance. If further information is needed, I remain fully available.
Hi ChanceLevel,
Thanks for running this simulation and empirically proving your point! We would like to reassure everyone that we are aware of this problem, and that we are fixing it (unfortunately this will take some more days). Please be patient, and we will get back to you with this as soon as possible.
As for your simulation, I understand that you carefully selected (and retained) vertices that explained the test data with 100% accuracy, and set all other vertices to zeros, thus resulting in a challenge score of 100%. However, how did you carefully select the vertices with a 100% score?
The Algonauts Team
Posted by: giffordale95 @ June 19, 2023, 7 a.m.
Dear Organiser,
I selected only the 4 voxels of subj06 LH mTL-words which I knew, from the "scores_subj_hemishphere_roi.txt" file, had been predicted at 100% in one of my previous submissions.
And indeed, I left all the remaining voxels at 0.
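Concretely, the exploit boils down to something like the sketch below (the file name and voxel indices are purely hypothetical, just to show the mechanism):
```
import numpy as np

preds = np.load("lh_pred_test.npy")   # hypothetical predictions array: stimuli x voxels
keep = [1203, 1204, 1205, 1206]       # hypothetical indices of the 4 well-predicted voxels
exploit = np.zeros_like(preds)
exploit[:, keep] = preds[:, keep]     # every other voxel stays constant -> NaN -> ignored
np.save("lh_pred_test_exploit.npy", exploit)
```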
Posted by: ChanceLevel @ June 19, 2023, 8:23 a.m.
I do not know how easily this could be done, but I guess it will be paramount to re-score every submission uploaded on CodaLab, as other teams may also have used this breach (wilfully or unknowingly).
Posted by: ChanceLevel @ June 19, 2023, 8:28 a.m.
Hi ChanceLevel,
Thanks for the clarification! As for the past submissions, we will re-score them using the fixed evaluation metric.
The Algonauts Team
Posted by: giffordale95 @ June 19, 2023, 8:40 a.m.
To add to this, I think there may be something else wrong with the scoring currently. As a sanity check I made a submission (02 for me) where I used the test predictions straight from the tutorial Colab for Subject 1, RH, with no modifications, and got a median noise-normalized score for that hemisphere of only 11.35, whereas in the organizer baseline the score is closer to 40. I know the Colab method is not exactly the same as the organizer baseline, since the Colab uses the output of only one AlexNet layer instead of appending across all layers, but based on my own testing there should definitely not be that extreme a difference between the two.
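For reference, this is roughly what I mean by appending features across layers instead of using a single one (my own sketch with arbitrarily chosen layers, not the tutorial's exact code):
```
import torch
from torchvision.models import alexnet, AlexNet_Weights

model = alexnet(weights=AlexNet_Weights.IMAGENET1K_V1).eval()

feats = {}
def save_output(name):
    def hook(module, inp, out):
        feats[name] = out.flatten(start_dim=1)   # flatten each image's activations
    return hook

layer_ids = [2, 5, 12]                           # arbitrary subset of model.features stages
for i in layer_ids:
    model.features[i].register_forward_hook(save_output(f"features.{i}"))

with torch.no_grad():
    imgs = torch.rand(8, 3, 224, 224)            # stand-in for a batch of preprocessed images
    model(imgs)

# concatenate the flattened activations of all chosen layers per image
x = torch.cat([feats[f"features.{i}"] for i in layer_ids], dim=1)
print(x.shape)                                   # (8, summed feature dimensions)
```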
Posted by: alex12341 @ June 19, 2023, 1:44 p.m.
Dear Organiser,
Thanks a lot for addressing the scoring breach. But I came across a new strange behaviour with the new scoring system.
My previous best model has seen its per-ROI scores fluctuate erratically (some improved & some decreased), even though it was predicting all voxels (and thus should not have been impacted by the replacement of NaNs by zeros in the computation of the correlation score).
A few examples:
--------------------
Previous Submission Scores:
subj08 RH early: 57.63809967524254
subj08 RH midventral: 61.86311264462127
subj08 RH midlateral: 49.02751915068904
subj08 RH midparietal: 75.6852446831322
subj08 RH ventral: 63.86677500042189
subj08 RH lateral: 59.3715714069598
subj08 RH parietal: 45.79069244623951
New Submission Scores:
subj08 RH early: 61.09974203828644
subj08 RH midventral: 61.17616047755064
subj08 RH midlateral: 50.30978087656773
subj08 RH midparietal: 73.15454389717613
subj08 RH ventral: 60.649239039585225
subj08 RH lateral: 58.81975927162134
subj08 RH parietal: 49.43616222946191
More glaringly, the ROI that previously enabled me to obtain 100% accuracy has dropped drastically (even though I had no zeros or constant predictions):
Previous Submission Score:
subj06 LH mTL-words: 100.0
New Scoring:
subj06 LH mTL-words: 67.65403162824377
Can you release the scoring code so that we can all participate in some peer review? (In order to make sure that the scoring corresponds to what was announced in the challenge's description.)
Thanks in advance,
ChanceLevel
Hi ChanceLevel,
We will email all participants detailed info about the modifications to the evaluation code shortly. Briefly, we (1) addressed the NaN correlations and (2) changed the median calculation to a mean. The latter fix is causing the differences you posted about.
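Roughly, the two changes amount to something like the sketch below (illustrative variable names only, not the exact evaluation code; the percentage scaling is likewise only indicative):
```
import numpy as np

def challenge_score(r, nc):
    # r: per-vertex Pearson correlations, nc: per-vertex noise ceilings
    nn = r ** 2 / nc
    nn = np.nan_to_num(nn)    # fix (1): NaN correlations now count as 0 instead of being dropped
    return np.mean(nn) * 100  # fix (2): mean over vertices instead of median (scaled to %)
```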
While modifying the code is trivial, integrating the new code into the challenge is not. Thank you for your patience, details coming soon!
Best,
The Algonauts Organizers