QuALES@IberLEF2022: Question Answering Learning from Examples in Spanish Forum


> Problems with the evaluation script

Dear organizers,

We have been trying to make some submissions during this dev period, using the provided dev data (the dataset_covid_qa_dev.json file).

We have found that the script tries to match the predicted answer text against the context, and if it does not find it there, it stops with a failure.
There may be a reason for this hard check, but some QA approaches cannot guarantee a verbatim span from the context (e.g. generative models, or models that pick tokens individually), and since the evaluation stops at the first mismatch, such approaches cannot be used at all.
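To make the behaviour concrete, here is a minimal sketch of the check we believe the script performs; the function name and exact code are our assumption, inferred only from the error output shown below:

--------------
def check_answer(answer: str, context: str) -> None:
    # Hypothetical reconstruction of the evaluation check:
    # the predicted answer must appear verbatim in the context,
    # otherwise the whole evaluation stops.
    if answer not in context:
        print(answer)
        print(context)
        raise ValueError("answer not in context")
--------------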

Also, related to the context match, there seems to be an error in the dev data. The script complains about the following:
--------------
trabajo doméstico], los sectores vinculados al turismo o los servicios
el trabajo doméstico, los sectores vinculados al turismo o los servicios
answer not in context
--------------

In the version of the dev data we have (downloaded from Codalab), the closing square bracket ']' is present (maybe a typo?), but it seems that in the version used by the evaluation script it is not, which raises the mismatch error.

Thank you very much.

Posted by: mcuadros @ April 28, 2022, 8:41 a.m.

Dear participant,

We have fixed the problem with the bracket; it was indeed a typo. Thank you very much for reporting it.

Concerning the requirement that the predicted answer belong to the context: we enforce it to stay consistent with the definition of the task (answers are required to be actual spans of the text). We understand that other, less strict approaches can also be very interesting, but, having defined the task in this way, we believe it is pertinent to check it. We hope that you can adapt your models to this constraint; one possible adaptation is sketched below.
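As an illustration only (this helper is hypothetical and not part of the official evaluation), a free-form answer can be projected onto the most similar verbatim span of the context before submission:

--------------
import difflib
import re

def closest_span(answer: str, context: str, window: int = 5) -> str:
    # Hypothetical helper: slide word-aligned windows over the
    # context and return the candidate most similar to the
    # free-form answer. Candidates are sliced directly from the
    # context, so the returned string is always a verbatim span.
    words = list(re.finditer(r"\S+", context))
    n = max(1, len(answer.split()))
    best, best_ratio = "", 0.0
    for size in range(max(1, n - window), n + window + 1):
        for i in range(len(words) - size + 1):
            cand = context[words[i].start():words[i + size - 1].end()]
            ratio = difflib.SequenceMatcher(None, answer, cand).ratio()
            if ratio > best_ratio:
                best, best_ratio = cand, ratio
    return best
--------------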

Best regards,
The organizers

Posted by: pln_udelar @ April 28, 2022, 11:01 a.m.