Welcome to the 2022 edition of the AmericasNLP shared task
The Second AmericasNLP Competition on Speech-to-Text Translation for Indigenous Languages of the Americas is an official NeurIPS 2022 competition aimed at encouraging the development of machine translation (MT) systems for Indigenous languages of the Americas. The overall goal is to develop new speech-to-text translation technology for Indigenous languages, and participants will build systems for three tasks: (1) automatic speech recognition (ASR) for an Indigenous language (Task 1), (2) text-to-text translation between an Indigenous language and a high-resource language (Task 2), and (3) speech-to-text translation between an Indigenous language and a high-resource language (Task 3, our main task).
Many Indigenous languages of the Americas are so-called low-resource languages: the parallel data with other languages that is needed to train speech-to-text MT systems is limited. This means that many approaches designed for translating between high-resource languages – such as English, Spanish, or Portuguese – are not directly applicable or perform poorly. Additionally, many Indigenous languages exhibit linguistic properties that are uncommon among the languages frequently studied in natural language processing (NLP); for example, many are polysynthetic or tonal. This constitutes an additional difficulty. We want to motivate researchers to take on the challenge of developing speech-to-text MT systems for Indigenous languages.
We invite submissions of speech-to-text MT results (as well as results for the subtasks of ASR and text-to-text translation) obtained by systems built for Indigenous languages. We will provide training and evaluation data to the participants, but there are no limits on what outside resources – such as additional data or pretrained systems – participants can use, with the exception of the datasets listed here. This should go without saying, but we ask that participants don't translate (or transcribe, in the case of ASR) the test input by hand. The main metrics of this competition are chrF (Popović, 2015) for Tasks 2 and 3 and character error rate for Task 1. Participants can submit results for as many language pairs as they like, but only teams that participate in all language pairs of a task will enter the official ranking for that task. We provide an evaluation script and a baseline MT system to help participants get started quickly. If you are interested in this competition, please register here.
The competition will have two tracks:
Both tracks are equivalent, and therefore the prizes described below are valid.
The following language pairs are featured in the NeurIPS–AmericasNLP 2022 competition:
The submissions in this competition will be evaluated and scored using:
- Task 1: character error rate (CER)
- Tasks 2 and 3: chrF
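To make the two metrics concrete, here is a minimal Python sketch of both. It is not the official evaluation script, and the function names are hypothetical. Several details are assumptions: CER is computed at corpus level (total edits divided by total reference characters), and chrF is a simplified sentence-level version with character n-grams up to order 6 and beta = 2, keeping whitespace as ordinary characters. The provided evaluation script may differ on any of these points, so use it for reported numbers.

```python
from collections import Counter


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses) -> float:
    """Corpus-level CER (assumption): total edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


def char_ngrams(text: str, n: int) -> Counter:
    """Multiset of character n-grams of order n (whitespace is kept)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(reference: str, hypothesis: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF: F-beta over averaged n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        ref = char_ngrams(reference, n)
        hyp = char_ngrams(hypothesis, n)
        if not ref or not hyp:
            continue  # skip orders longer than either string
        overlap = sum((ref & hyp).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Lower is better for CER and higher is better for chrF. With beta = 2, recall is weighted more heavily than precision.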
The organizers of the task are:
If you have any questions, contact us at americasnlp-sharedtask-organizers@googlegroups.com
Submit a zip file with the following format:
The files [language].txt should contain the hypotheses of the submitted systems, one per line. If your system generates an empty output for an input, please keep the empty line. Important: please keep the order of the test inputs.
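Before zipping, it can help to check that a hypothesis file still lines up with the test inputs. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes one test input per line and one hypothesis per line, with lines separated by `\n`. It verifies line counts only and cannot detect reordered outputs.

```python
def split_lines(text: str) -> list:
    """Split on '\\n' only, keeping empty lines; drop the final trailing newline."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the file ended with a newline, not with an extra empty line
    return lines


def check_alignment(input_text: str, hypothesis_text: str):
    """Return (aligned, n_inputs, n_hypotheses) for two file contents."""
    n_inputs = len(split_lines(input_text))
    n_hypotheses = len(split_lines(hypothesis_text))
    return n_inputs == n_hypotheses, n_inputs, n_hypotheses
```

For example, `check_alignment("x\ny\nz\n", "a\n\nc\n")` reports three inputs and three hypotheses, because the empty second hypothesis is preserved. In practice the two strings would come from reading the test input file and the corresponding [language].txt as UTF-8 text.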
The input data can be found here: http://turing.iimas.unam.mx/americasnlp/TestInputs/download_test.html
Start: June 6, 2022, midnight
Description: Development phase.
Start: Sept. 16, 2022, midnight
Description: Evaluation phase.
Start: Oct. 15, 2022, noon
Description: Open Post-Evaluation phase.
# | Username | Score |
---|---|---|
1 | monirome | 0.2685 |
2 | a.legchenko | 0.4059 |