> Test corpus released and test baselines published, ten days to go!

Hello participants,

We just released the test data (without labels). The new data file also contains the development data with labels, in addition to the training data. The data is split approximately 70%-10%-20% into training, dev, and test sets. The files are published in the "Files" section.

Additionally, we just submitted the baseline for the test set. The method is the same as the one described for the dev set (see again https://github.com/mmaguero/genovardis-baseline).

Remember you have ten days to submit your results for this phase! The evaluation phase ends on June 7, so you can upload your results until June 6. During this phase, you can submit up to 10 predictions for the test data.

We encourage all participants to submit their test results, regardless of their dev results. Keep in mind that the test data may differ somewhat from the train and dev data. The train and dev sets comprise the translation (from English to Spanish) and manual curation of the tmVar3 PubMed abstract annotations [Wei et al., 2022], with their associated diseases and symptoms, whereas the test set comprises the manual annotation of 136 PubMed abstracts originally written in Spanish (published between 2014 and 2024). So you might get very different results on the test set.

Good luck!
The GenoVarDis team

Posted by: mmaguero @ May 27, 2024, 10:36 a.m.