NADI-2023 Shared Task (Subtask 1) Closed Country-level Dialect Identification


Welcome to Subtask 1 (Closed Country-level Dialect ID) of NADI-2023 shared task!

Arabic is a rich language with a wide collection of dialects, many of which remain under-studied, primarily due to limited resources (research funding, datasets, etc.). The goal of the Nuanced Arabic Dialect Identification (NADI) shared task series (Abdul-Mageed et al., 2020, 2021, 2022) is to alleviate this bottleneck by providing datasets and modeling opportunities for participants to carry out dialect identification and other dialect processing tasks. Dialect identification is the task of automatically detecting the source variety of a given text or speech segment. In addition to nuanced dialect identification at the country level, NADI 2022 offered a new subtask focused on country-level sentiment analysis. NADI 2023 continues this tradition of extending to tasks beyond dialect identification: namely, we propose a new open-track subtask focused on machine translation (MT) from dialectal Arabic (DA) to Modern Standard Arabic (MSA). In this open-track subtask, we allow participants to create datasets mapping MSA into DA under particular conditions and use them to train their MT systems.

While we invite participation in either of the two subtasks, we hope that teams will submit systems to both tasks (i.e., participate in the two tasks rather than only one). By offering two subtasks, we hope to receive systems that exploit diverse methods and machine learning architectures. These could include multi-task learning systems as well as sequence-to-sequence architectures in a single model, such as text-to-text Transformers (e.g., mT5, AraT5). Many other approaches are also possible, and we look forward to creative approaches to the subtasks. We introduce the two subtasks next.

(To receive access to the data, teams intending to participate are invited to fill in the registration form on the official website of the NADI shared task.)

Shared Task (Subtask 1): 

This CodaLab competition hosts Subtask 1 of the NADI-2023 shared task: Closed Country-level Dialect Identification.

Subtask 1 (Closed Country-level Dialect ID): In this subtask, we provide a new Twitter dataset (NADI-2023-TWT) that covers 18 dialects (a total of 23.4K tweets). We split this dataset into Train (18K), Dev (1.8K), and Test (3.6K). In addition, we provide external data from the NADI 2020 (Abdul-Mageed et al., 2020), NADI 2021 (Abdul-Mageed et al., 2021), and MADAR (Bouamor et al., 2018) training datasets. We refer to these additional datasets as NADI-2020-TWT, NADI-2021-TWT, and MADAR-2018, respectively. In other words, participants are not allowed to use any external data beyond what we provide to train their systems.

Metrics:

For Subtask 1, the evaluation metrics will include precision, recall, F1, and accuracy. Macro-averaged F1 will be the official metric.
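As an illustration, the official metric (macro-averaged F1) computes F1 per class and then takes the unweighted mean over classes. Below is a minimal pure-Python sketch; the country labels shown are placeholders, not the official NADI label set:

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: per-class F1, averaged with equal weight per class."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Illustrative labels only:
gold = ["Egypt", "Egypt", "Iraq", "Morocco", "Iraq", "Morocco"]
pred = ["Egypt", "Iraq", "Iraq", "Morocco", "Iraq", "Egypt"]
print(round(macro_f1(gold, pred), 4))  # 0.6556
```

Note that, unlike accuracy, the macro average weights every dialect class equally regardless of how many tweets it has, so performance on low-frequency dialects matters as much as on frequent ones.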

This is a closed track. Participating teams will be provided with a common training data set and a common development set. No external manually labelled data sets are allowed. A blind test data set will be used to evaluate the output of the participating teams. All teams are required to report on the development and test set in their writeups.

The shared task will be hosted through CodaLab. Teams will be provided with a CodaLab link for each subtask.

  • CodaLab link for NADI-2023 Shared Task Subtask 1: https://codalab.lisn.upsaclay.fr/competitions/14449
  • CodaLab link for NADI-2023 Shared Task Subtask 2: https://codalab.lisn.upsaclay.fr/competitions/14643
  • CodaLab link for NADI-2023 Shared Task Subtask 3: https://codalab.lisn.upsaclay.fr/competitions/14648

Important dates:

  • July 18, 2023: Shared task announcement. Release of training data and scoring script.
  • August 7, 2023: Registration deadline.

  • August 14, 2023: Test set made available.
  • August 30, 2023: Codalab TEST system submission deadline.
  • September 5, 2023: Shared task system paper submissions due.
  • October 12, 2023: Notification of acceptance.
  • October 30, 2023: Camera-ready version.
  • TBA: WANLP 2023 Conference.

   All deadlines are 11:59 PM UTC-12:00 (Anywhere On Earth).

 

Contact:

Please visit the official website of the NADI shared task for more information.

For any questions related to this task, please contact the organizers directly using the following email address: ubc.nadi2020@gmail.com 

Organizers:

Muhammad Abdul-Mageed, Chiyu Zhang, El Moatez Billah Nagoudi, Abdelrahim Elmadany (The University of British Columbia, Canada), Nizar Habash (New York University Abu Dhabi), and Houda Bouamor (Carnegie Mellon University, Qatar)

Evaluation Criteria

Metrics: The evaluation metrics will include precision, recall, F1, and accuracy. Macro-averaged F1 will be the official metric.

Terms and Conditions

To receive access to the data, teams intending to participate are invited to fill in the form on the official website of the NADI shared task.

Copyright (c) 2023 The University of British Columbia, Canada; Carnegie Mellon University Qatar; New York University Abu Dhabi. All rights reserved.

Development

Start: July 1, 2023, midnight

Description: Development phase: Develop your models and submit prediction labels on the DEV set of Subtask 1. Note: your submission should be named 'teamname_subtask1_dev_numberOFsubmission.zip' and contain a text file of your predictions (e.g., team UBC's first submission would be 'UBC_subtask1_dev_1.zip', containing 'UBC_subtask1_dev_1.txt').
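The expected archive can be produced, for example, with Python's standard zipfile module. This is a hedged sketch of the naming convention above; the team name, label values, and file contents are placeholders:

```python
import zipfile

# Placeholders: replace with your team name, phase, and submission number.
team, subtask, phase, n = "UBC", "subtask1", "dev", 1
txt_name = f"{team}_{subtask}_{phase}_{n}.txt"
zip_name = f"{team}_{subtask}_{phase}_{n}.zip"

# Write one predicted label per line (labels here are illustrative).
with open(txt_name, "w", encoding="utf-8") as f:
    f.write("Egypt\nIraq\nMorocco\n")

# Package the predictions file under the required zip name.
with zipfile.ZipFile(zip_name, "w") as zf:
    zf.write(txt_name)
```

Keeping the inner text file's name aligned with the zip name (differing only in extension) matches the example given in the phase description.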

Test

Start: Aug. 14, 2023, midnight

Description: Test phase: Submit your prediction labels on the TEST set of Subtask 1. Each team is allowed a maximum of 3 submissions. Note: your submission should be named 'teamname_subtask1_test_numberOFsubmission.zip' and contain a text file of your predictions (e.g., team UBC's first submission would be 'UBC_subtask1_test_1.zip', containing 'UBC_subtask1_test_1.txt').

Post-Evaluation

Start: Aug. 31, 2023, noon

Description: Post-Evaluation: Submit your prediction labels on the TEST set of Subtask 1 after the competition. Note: your submission should be named 'teamname_subtask1_test_numberOFsubmission.zip' and contain a text file of your predictions (e.g., team UBC's first submission would be 'UBC_subtask1_test_1.zip', containing 'UBC_subtask1_test_1.txt').

Competition Ends

Never

Leaderboard (macro F1):

  1. asalhi85: 0.8586
  2. Samah: 0.8543
  3. Dilshod: 0.8476