There are still a few episodes in the testing set where the baseline algorithm underperforms the random algorithm:
> awk -F, '$25 < $34 {print $3,$25,$34}' scenarios_test/metadata.csv
Test_1/Level_13.pkl 598.0017832208407 669.2615903392324
Test_3/Level_18.pkl 695.2904784472344 731.046553731317
Test_5/Level_16.pkl 704.4258233194461 729.7328683875704
Test_36/Level_6.pkl 4141.512907519388 4147.332887691794
---episode file---------random score--------baseline score
The awk output was for the test dataset available from this link: https://airliftchallenge.com/scenarios/scenarios_test.zip
I think there might be a discrepancy for Test_36/Level_6. In my submission evaluation runs, I did not get a negative normalized score for that episode. However, using the random and baseline scores from the zip archive, the normalized score is negative. If you could provide the random and baseline scores you're using to evaluate this episode, that would be helpful.
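To illustrate why the sign flips, here is a rough sketch. It assumes the normalized score is computed as (random - solution) / (random - baseline) and that lower raw scores are better; neither is confirmed in this thread, so treat the formula as my assumption. The solution score of 4000.0 is a made-up value for illustration, not one of my actual results.

```python
def normalized_score(solution: float, random_score: float, baseline_score: float) -> float:
    """Assumed normalization: 0 at the random score, 1 at the baseline score.

    This formula is an assumption, not taken from the evaluator's code.
    """
    return (random_score - solution) / (random_score - baseline_score)


# Test_36/Level_6 values from scenarios_test/metadata.csv (see the awk output)
random_score = 4141.512907519388
baseline_score = 4147.332887691794

# Hypothetical solution score that beats the random score (lower is assumed better)
solution = 4000.0

# The denominator (random - baseline) is negative for this episode,
# so a solution that beats random ends up with a negative normalized score.
print(normalized_score(solution, random_score, baseline_score))
```

Under that assumption, any episode where the baseline score is higher than the random score would give a negative normalized score to a solution that beats random, which matches what I see with the values from the zip archive.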
That zip file has only 20 tests in it.
Posted by: jkolen @ Feb. 22, 2023, 5:50 p.m.
Also, the scenarios_test.zip file is accessible from the CodaLab competition page.
Posted by: jkolen @ Feb. 22, 2023, 5:51 p.m.