This shared task focuses on automatic methods for estimating the quality of neural machine translation output at run-time, without relying on reference translations. It will cover estimation at the sentence and word levels, as well as critical error detection. This year we put particular emphasis on:
This year we will release test sets for the following language pairs:
In addition to generally advancing the state of the art in quality estimation, our specific goals are:
For all tasks, the datasets and NMT models that generated the translations will be made publicly available.
Participants are also allowed to explore any additional data and resources deemed relevant. Below are the three QE tasks addressing these goals.
Here is some open-source software for QE that might be useful to participants:
For questions regarding the organisation of the task and/or issues with submissions to this CodaLab, please use the Forum.
We will use the Matthews correlation coefficient (MCC) as the primary metric and the F1 score as a secondary metric.
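For reference, here is a minimal sketch of how these metrics can be computed with scikit-learn. The encoding of OK/BAD tags as 1/0 is our assumption for illustration; the official evaluation script may encode or average differently.

    # Minimal sketch: computing MCC (primary) and F1 (secondary) for word-level tags.
    # Assumption: gold and predicted tags are encoded as 1 (OK) and 0 (BAD); the
    # official evaluation script may differ.
    from sklearn.metrics import matthews_corrcoef, f1_score

    gold = [1, 0, 1, 1, 0, 1]  # hypothetical gold word-level tags
    pred = [1, 0, 0, 1, 0, 1]  # hypothetical system predictions

    print("MCC:", matthews_corrcoef(gold, pred))
    print("F1: ", f1_score(gold, pred))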
Submission Format
The competition will take place on CodaLab.
This year we will also require participants to fill in a form describing their model and data choices for each submission.
For each submission you wish to make (under “Participate > Submit” on CodaLab), please upload a single zip file with the predictions and the system metadata.
For the metadata, we expect a ‘metadata.txt’ file with exactly two non-empty lines: the team name and a short system description. The first line of metadata.txt must contain your team name; you can use your CodaLab username as your team name. The second line must contain a short description (2-3 sentences) of the system you used to generate the results. This description will not be shown to other participants. Note that submissions without a description will be invalid. It is fine to reuse the same submission across multiple submissions/phases if you use the same model (e.g. a multilingual or multi-task model).
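For illustration, a metadata.txt could look as follows (the team name and system description here are hypothetical):

    awesome-qe-team
    A single multilingual transformer-based QE model fine-tuned on the provided training data; no external data was used. One model is shared across all language pairs.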
For the predictions, the exact expected format is described separately for each subtask:
We expect a single TSV file, named ‘predictions.txt’, for each submitted QE system output (submitted online in the respective CodaLab competition).
You can submit different systems for any of the MQM or post-edited language pairs independently. The output of your system should be the predicted word-level tags, formatted in the following way:
Line 1: <DISK FOOTPRINT (in bytes, without compression)>
Line 2: <NUMBER OF PARAMETERS>
Line 3: <NUMBER OF ENSEMBLED MODELS> (set to 1 if there is no ensemble)
Lines 4 to n, one line per token (word) in the test samples: <LANGUAGE PAIR> <METHOD NAME> <TYPE> <SEGMENT NUMBER> <WORD INDEX> <WORD> <BINARY SCORE>
Where:
Each field should be delimited by a single tab character (\t).
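To make the format concrete, below is a minimal, unofficial sketch that writes a predictions.txt in this layout and bundles it with metadata.txt into a single zip for upload. All concrete values (the language pair, method name, the ‘MT’ value for <TYPE>, and OK/BAD as the binary score) are assumptions for illustration only:

    # Unofficial sketch: write predictions.txt in the required format and package
    # it with metadata.txt into one zip for CodaLab upload. All concrete values
    # below (language pair, method name, <TYPE> value, OK/BAD tags) are hypothetical.
    import zipfile

    disk_footprint = 1_200_000_000  # Line 1: model size on disk, in bytes (uncompressed)
    num_parameters = 300_000_000    # Line 2: number of parameters
    num_ensembled = 1               # Line 3: set to 1 if there is no ensemble

    # One tuple per token: (segment number, word index, word, binary score)
    predictions = [
        (0, 0, "Das", "OK"),
        (0, 1, "Haus", "BAD"),
    ]

    with open("predictions.txt", "w", encoding="utf-8") as f:
        f.write(f"{disk_footprint}\n{num_parameters}\n{num_ensembled}\n")
        for seg, idx, word, tag in predictions:
            # <LANGUAGE PAIR> <METHOD NAME> <TYPE> <SEGMENT NUMBER> <WORD INDEX> <WORD> <BINARY SCORE>
            f.write("\t".join(["en-de", "my_method", "MT", str(seg), str(idx), word, tag]) + "\n")

    with open("metadata.txt", "w", encoding="utf-8") as f:
        f.write("awesome-qe-team\n")                                  # line 1: team name
        f.write("A short 2-3 sentence description of the system.\n")  # line 2: description

    with zipfile.ZipFile("submission.zip", "w") as zf:
        zf.write("predictions.txt")
        zf.write("metadata.txt")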