AmericasNLP2022-v2 Task1: ASR for Indigenous Languages of the Americas, Track 2

Organized by aoncevay


Second AmericasNLP Competition: Speech-to-Text Translation for Indigenous Languages of the Americas

Welcome to the 2022 edition of the shared task AmericasNLP

What?

The Second AmericasNLP Competition on Speech-to-Text Translation for Indigenous Languages of the Americas is an official NeurIPS 2022 competition aimed at encouraging the development of machine translation (MT) systems for indigenous languages of the Americas. The overall goal is to develop new speech-to-text translation technology for Indigenous languages, and participants will build systems for 3 tasks: (1) automatic speech recognition (ASR) for an Indigenous language (Task 1), (2) text-to-text translation between an Indigenous language and a high-resource language (Task 2), and (3) speech-to-text translation between an Indigenous language and a high-resource language (Task 3, our main task).

Why?

Many Indigenous languages of the Americas are so-called low-resource languages: the parallel data with other languages that is needed to train speech-to-text MT systems is limited. This means that many approaches designed for translating between high-resource languages – such as English, Spanish, or Portuguese – are not directly applicable or perform poorly. Additionally, many Indigenous languages exhibit linguistic properties uncommon among languages frequently studied in natural language processing (NLP), e.g., many are polysynthetic or tonal. This constitutes an additional difficulty. We want to motivate researchers to take on the challenge of developing speech-to-text MT systems for Indigenous languages.

How?

We invite submissions of speech-to-text MT results (as well as of results for the subtasks of ASR and text-to-text translation) obtained by systems built for Indigenous languages. We will provide training and evaluation data to the participants, but there are no limits on what outside resources – such as additional data or pretrained systems – participants can use, with the exception of the datasets listed here. This should go without saying, but we ask that participants don't translate (or transcribe, in the case of ASR) the test input by hand. The main metrics of this competition are chrF (Popović, 2015) for Tasks 2 and 3 and character error rate for Task 1. Participants can submit results for as many language pairs as they like, but only teams that participate for all language pairs of a task enter the official ranking. We provide an evaluation script and a baseline MT system to help participants get started quickly. If you are interested in this competition, please register here.
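To illustrate what chrF measures: it is an F-score over character n-grams (n = 1..6 by default, with recall weighted by β = 2). The sketch below is a simplified sentence-level version for intuition only; the official evaluation presumably uses the provided script or a standard implementation such as sacrebleu.

```python
from collections import Counter


def char_ngrams(text, n):
    # Standard chrF ignores whitespace, so strip it before extracting n-grams.
    text = "".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified sentence-level chrF (after Popović, 2015)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if sum(hyp.values()) == 0 or sum(ref.values()) == 0:
            continue  # no n-grams of this order on one side
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average n-gram precision
    r = sum(recalls) / len(recalls)        # average n-gram recall
    if p + r == 0:
        return 0.0
    # F-score with recall weighted beta times as much as precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

A perfect match scores 1.0, and scores degrade smoothly with character-level differences, which makes chrF more forgiving of morphological variation than word-level metrics.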

Tracks

The competition will have two tracks:

  • Track 1: External data and pre-trained models are allowed. In this track, teams aim to train the best system possible and may collect or create any external data they can. The only constraint is the list of prohibited datasets.
  • Track 2: Only pre-trained models are allowed. In this track, teams can use the provided dataset, Spanish/Portuguese monolingual data, and well-established pre-trained models (models published in any ML venue).

Both tracks carry equal standing, and the prizes described below apply to both.

Languages

The following language pairs are featured in the NeurIPS–AmericasNLP 2022 competition:

  • Bribri–Spanish
  • Guaraní–Spanish
  • Kotiria–Portuguese
  • Wa'ikhana–Portuguese
  • Quechua–Spanish

Evaluation Criteria

The submissions in this competition will be evaluated and scored using:
  • Task 1: character error rate (CER)
  • Tasks 2 and 3: chrF
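For reference, character error rate is the Levenshtein (edit) distance between the hypothesis and reference transcriptions, normalized by the reference length. A minimal sketch (the official scores are computed by the organizers' evaluation script):

```python
def cer(hypothesis, reference):
    """Character error rate: edit distance divided by reference length."""
    h, r = list(hypothesis), list(reference)
    # Dynamic-programming edit distance, keeping only one previous row.
    prev = list(range(len(r) + 1))
    for i, hc in enumerate(h, 1):
        curr = [i]
        for j, rc in enumerate(r, 1):
            cost = 0 if hc == rc else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost  # substitution / match
                            ))
        prev = curr
    return prev[-1] / max(len(r), 1)
```

Lower is better: 0.0 means a perfect transcription, and scores above 1.0 are possible when the hypothesis is much longer than the reference.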

Terms and Conditions

  • By submitting results to this competition, you consent to the public release of your scores at this website and at the AmericasNLP 2022 shared task and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgments, qualitative judgments, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
  • You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgment that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
  • Each team must create and use exactly one CodaLab account.
  • Team constitution (members of a team) cannot be changed after the evaluation phase has begun.
  • During the evaluation phase, each team can make up to ten submissions per day; the top-scoring submission will be considered the official submission to the competition.
  • The organizers and their affiliated organizations make no warranties regarding the provided datasets, including but not limited to warranties of correctness or completeness. They cannot be held liable for providing access to the datasets or for their usage.

Schedule

  • May 23, 2022: Release of pilot data and evaluation script.
  • June 6, 2022: Release of training and development data and baseline systems.
  • September 16, 2022: Release of test input/start of evaluation phase.
  • October 14, 2022 (extended from September 30, 2022): Submission of translations by participants/end of competition.
  • October 12, 2022 (updated from October 4, 2022): Announcement of results.
  • December, 2022: Competition track meeting at NeurIPS (virtual event).

Organizers

The organizers of the task are:

  • Manuel Mager. University of Stuttgart, Germany.
  • Katharina Kann. University of Colorado Boulder, US.
  • Abteen Ebrahimi. University of Colorado Boulder, US.
  • Arturo Oncevay. University of Edinburgh, UK.
  • Rodolfo Zevallos. Pompeu Fabra University, Spain.
  • Adam Wiemerslage. University of Colorado Boulder, US.
  • Pavel Denisov. University of Stuttgart, Germany.
  • John E. Ortega. New York University, US.
  • Kristine Stenzel. University of Colorado Boulder, US.
  • Aldo Alvarez. Universidad Nacional de Itapúa, Paraguay.
  • Luis Chiruzzo. Universidad de la República, Uruguay.
  • Rolando Coto-Solano. Dartmouth College, US.
  • Hilaria Cruz. University of Louisville, US.
  • Sofía Flores-Solórzano. Office of Indigenous Education, Ministry of Education of Costa Rica, Costa Rica.
  • Ivan Vladimir Meza Ruiz. Universidad Nacional Autónoma de México (UNAM), México.
  • Alexis Palmer. University of Colorado Boulder, US.
  • Ngoc Thang Vu. University of Stuttgart, Germany.

Contact

If you have any questions, contact us at americasnlp-sharedtask-organizers@googlegroups.com

Submissions

Submit a zip file with the following format:

  • hyp/
    • Bribri.txt
    • Guarani.txt
    • Kotiria.txt
    • Waikhana.txt
    • Quechua.txt

The files [language].txt should contain the hypotheses of the submitted systems, one per line. If your system generates an empty output, please keep the empty line. Important: please keep the order of the test inputs.

The input data can be found here: http://turing.iimas.unam.mx/americasnlp/TestInputs/download_test.html


Development

Start: June 6, 2022, midnight

Description: Development phase.

Evaluation

Start: Sept. 16, 2022, midnight

Description: Evaluation phase.

Post-Evaluation

Start: Oct. 15, 2022, noon

Description: Open Post-Evaluation phase.

Competition Ends

Never

#  Username     Score
1  monirome     0.2685
2  a.legchenko  0.4059