NER Shared Task 2024 (Subtask 1 - Closed-Track Flat Fine-Grain NER)

*Please refer to https://dlnlp.ai/st/wojood/ for the full description of the shared task.*

 

NER Shared Task 2024

Subtask-1

(Closed-Track Flat Fine-Grain NER)

 

 

INTRODUCTION

Named Entity Recognition (NER) plays a crucial role in various Natural Language Processing (NLP) applications. This process involves identifying mentions of named entities in unstructured text and categorizing them into predefined classes, such as PERSON, ORGANIZATION, GPE, LOCATION, EVENT, and DATE. Given the relative scarcity of resources for Arabic NLP, research in Arabic NER has predominantly concentrated on "flat" entities and has been limited to a few "coarse-grained" entity types, namely PERSON, ORGANIZATION, and LOCATION. To address this limitation, the WojoodNER shared task series was initiated (Jarrar et al., 2023). It aims to enrich Arabic NER research by introducing Wojood and Wojood-Fine, nested and fine-grained Arabic NER corpora.

DATASET

In this year's shared task (WojoodNER 2024), a new version of the Wojood corpus, called Wojood-Fine, will be released. Wojood-Fine enhances the original Wojood corpus by offering fine-grained entity types that are more granular than those provided in WojoodNER 2023. For instance, GPE is now divided into 7 subtypes (COUNTRY, STATE-OR-PROVINCE, TOWN, NEIGHBORHOOD, CAMP, GPE_ORG, and SPORT). Similarly, LOCATION, ORGANIZATION, and FACILITY are also divided into subtypes. The corpus contains 550K tokens, 75K entity mentions covering the parent types, and 47K subtype entity mentions. It is also important to highlight that Wojood-Fine is a full re-annotation of Wojood using new annotation guidelines. This means the Wojood dataset cannot be (re-)used in this shared task. More details about the Wojood-Fine corpus can be found in our paper (Liqreina et al., 2023).

WojoodNER 2024 SHARED TASK

WojoodNER 2024 expands on the scope of WojoodNER 2023, venturing beyond traditional NER tasks. This year, we introduce three subtasks, all centered on Arabic Fine-Grained NER. Among these, the "open track" subtask stands out by allowing participants to develop or use external datasets and leverage external tools to craft innovative systems.
While participation in any individual subtask is encouraged, we especially hope that teams will engage in all three, bringing a comprehensive approach to the competition. The introduction of multiple subtasks is designed to foster diverse methodologies and machine-learning architectures. These could include multi-task learning systems as well as advanced sequence-to-sequence models, such as those based on Transformer architectures and Large Language Models (LLMs).
We believe this variety will not only challenge but also inspire participants to explore a wide array of approaches, from leveraging existing models to pioneering new techniques. As we delve into the specifics of these subtasks, we eagerly anticipate the creative solutions that participants will contribute to Arabic NLP research in addressing the nuanced demands of Arabic Fine-Grained NER.
Subtask-1 (Closed-Track Flat Fine-Grain NER): In this subtask, we provide the Wojood-Fine Flat train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). The flat NER dataset uses the same train/dev/test split as the nested NER dataset. The only difference is that in the flat NER data each token is assigned a single tag, namely the first high-level tag assigned to that token in the nested NER dataset (a small illustrative sketch of this flattening appears after the subtask descriptions below). This subtask is a closed track: participants are not allowed to use any external datasets beyond the ones we provide to train their systems. It is also important to note that the Wojood dataset shared last year cannot be used because of the different annotation guidelines.
Subtask-2 (Closed-Track Nested Fine-Grain NER): This subtask is similar to Subtask-1: we provide the Wojood-Fine Nested train (70%) and development (10%) datasets. The final evaluation will be on the test set (20%). This subtask is also a closed track.
Subtask-3 (Open-Track NER - Gaza War): In this subtask, we aim to allow participants to reflect on the utility of NER in the context of real-world events, to use external resources, and to apply generative models in different ways (fine-tuning, zero-shot learning, in-context learning, etc.). The goal of focusing on generative models in this particular subtask is to help the Arabic NLP research community better understand the capabilities and performance gaps of LLMs in information extraction, an area that is currently understudied.
We provide development and test data related to the current War on Gaza. This is motivated by the assumption that discourse about recent global events will involve mentions from a different data distribution. For this subtask, we include data from five different news domains related to the War on Gaza, but we keep the names of the domains hidden. Participants will be given a development dataset (10K tokens, 2K from each of the five domains) and a test dataset (50K tokens, 10K from each domain). Both development and test sets are manually annotated with fine-grained named entities using the same annotation guidelines as Subtask-1 and Subtask-2 (also described in Liqreina et al., 2023).
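To make the flattening step mentioned in Subtask-1 concrete, here is a minimal illustrative sketch (not the official preprocessing code) of deriving a flat tag for each token by keeping its first high-level tag. It assumes a nested file with one token per line followed by one IOB2 tag column per entity type, with blank lines between segments.

# Illustrative sketch only, not the official preprocessing script.
# Assumption: one token per line, followed by one IOB2 tag column per
# entity type, with blank lines separating segments.

def flatten_nested_row(columns):
    """columns: [token, tag_1, tag_2, ..., tag_n] for a single token."""
    token, tags = columns[0], columns[1:]
    flat_tag = next((t for t in tags if t != "O"), "O")  # first high-level tag
    return token, flat_tag

def flatten_nested_file(nested_path, flat_path):
    with open(nested_path, encoding="utf-8") as fin, \
         open(flat_path, "w", encoding="utf-8") as fout:
        for line in fin:
            line = line.rstrip("\n")
            if not line.strip():        # blank line separates segments
                fout.write("\n")
                continue
            token, flat_tag = flatten_nested_row(line.split())
            fout.write(f"{token} {flat_tag}\n")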

METRICS

The evaluation metrics will include precision, recall, and F1-score. However, our official metric will be the micro F1-score. The evaluation of the shared tasks is hosted on CODALAB.
- CODALAB link for Subtask 1 (Flat Fine-Grain NER): TBA
- CODALAB link for Subtask 2 (Nested Fine-Grain NER): TBA
- CODALAB link for Subtask 3 (Open Track NER): TBA
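For local development, entity-level precision, recall, and micro F1 over IOB2 tags can be approximated with the seqeval library. The snippet below is a hedged sketch; the official scoring script released with the data may differ in details.

# Hedged sketch: entity-level micro-averaged scores with seqeval
# (pip install seqeval). Not the official shared-task scorer.
from seqeval.metrics import precision_score, recall_score, f1_score

gold = [["B-ORG", "I-ORG", "O", "B-DATE", "I-DATE"]]
pred = [["B-ORG", "I-ORG", "O", "B-DATE", "O"]]

print("precision:", precision_score(gold, pred, average="micro"))
print("recall:   ", recall_score(gold, pred, average="micro"))
print("micro-F1: ", f1_score(gold, pred, average="micro"))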

BASELINES

Two baseline models trained on Wojood-Fine (flat and nested) are provided (see Liqreina et al., 2023). The code used to produce these baselines is available on GitHub.

Subtask                              Precision  Recall   Micro-F1
Flat Fine-Grain NER (Subtask 1)      0.8870     0.8966   0.8917
Nested Fine-Grain NER (Subtask 2)    0.9179     0.9279   0.9229

 

GOOGLE COLAB NOTEBOOKS

To allow you to experiment with the baseline, we authored four Google Colab notebooks that demonstrate how to train and evaluate our baseline models.
[1] Train Flat Fine-Grain NER: This notebook can be used to train our ArabicNER model on the flat fine-grain NER task using the sample Wojood_Fine data.
[2] Evaluate Flat Fine-Grain NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.
[3] Train Nested Fine-Grain NER: This notebook can be used to train our ArabicNER model on the nested fine-grain NER task using the sample Wojood_Fine data.
[4] Evaluate Nested Fine-Grain NER: This notebook uses the trained model saved by the notebook above to perform evaluation on an unseen dataset.

SUBMISSION DATA FORMAT

Your submission to the task will include one file: the predictions of your model in CoNLL format. If you are submitting to both subtasks, you will have two submissions. The CoNLL file should consist of multiple space-separated columns. The IOB2 scheme should be used for the submission, which is the same format used in the Wojood dataset. Do not include a header in your submitted file. Segments should be separated by a blank line, as in the sample data found in the repository.
Note that we will validate your submission to verify that the number of segments and the number of tokens within each segment match the test dataset. We will also verify that the token on each line maps to the same token in the test dataset.
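The following is a minimal local sanity check along the same lines, under the assumption that the first column of both files is the token and segments are separated by blank lines. File names are placeholders, and this is not the official validator.

# Hedged pre-submission check: same number of segments, same number of
# tokens per segment, and identical tokens as the test file.
# File names are placeholders.

def read_segments(path):
    segments, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                if current:
                    segments.append(current)
                    current = []
            else:
                current.append(line.split()[0])  # token is the first column
    if current:
        segments.append(current)
    return segments

test_segments = read_segments("subtask1_test.txt")      # placeholder path
pred_segments = read_segments("flat_predictions.txt")

assert len(test_segments) == len(pred_segments), "segment count mismatch"
for i, (t, p) in enumerate(zip(test_segments, pred_segments)):
    assert t == p, f"token mismatch in segment {i}"
print("Submission structure is consistent with the test file.")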

Flat and Nested NER: the first column is the token (word), followed by 51 columns, one per entity type. These 51 columns should appear in the following order:

 

Column #   Value
1 Token
2 AIRPORT or O
3 BOUNDARY or O
4 BUILDING-OR-GROUNDS or O
5 CAMP or O
6 CARDINAL or O
7 CELESTIAL or O
8 CLUSTER or O
9 COM or O
10 CONTINENT or O
11 COUNTRY or O
12 CURR or O
13 DATE or O
14 EDU or O
15 ENT or O
16 EVENT or O
17 FAC or O
18 GOV or O
19 GPE or O
20 GPE_ORG or O
21 LAND-REGION-NATURAL or O
22 LANGUAGE or O
23 LAW or O
24 LOC or O
25 MED or O
26 MONEY or O
27 NEIGHBORHOOD or O
28 NONGOV or O
29 NORP or O
30 OCC or O
31 ORDINAL or O
32 ORG or O
33 ORG_FAC or O
34 PATH or O
35 PERCENT or O
36 PERS or O
37 PLANT or O
38 PRODUCT or O
39 QUANTITY or O
40 REGION-GENERAL or O
41 REGION-INTERNATIONAL or O
42 REL or O
43 SCI or O
44 SPO or O
45 SPORT or O
46 STATE-OR-PROVINCE or O
47 SUBAREA-FACILITY or O
48 TIME or O
49 TOWN or O
50 UNIT or O
51 WATER-BODY or O
52 WEBSITE or O

 

Table 2. Column order for the flat and nested NER submission

Example data files

flat_predictions.txt and nested_prediction.txt
جريدة O O O O O O O O O O O O O B-ORG O O O O O O O
فلسطين O O O O O B-GPE O O O O O O O I-ORG O O O O O O O
/ O O O O O O O O O O O O O O O O O O O O O
نيسان O O B-DATE O O O O O O O O O O O O O O O O O O
( O O I-DATE O O O O O O O O O O O O O O O O O O
26 O O I-DATE O O O O O O O O O O O O O O O O O O
/ O O I-DATE O O O O O O O O O O O O O O O O O O
4 O O I-DATE O O O O O O O O O O O O O O O O O O
/ O O I-DATE O O O O O O O O O O O O O O O O O O
1947 O O I-DATE O O O O O O O O O O O O O O O O O O
) O O I-DATE O O O O O O O O O O O O O O O O O O
. O O O O O O O O O O O O O O O O O O O O O
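The multi-column rows above can be generated mechanically from per-token predictions: each predicted IOB2 tag is written into the column of its entity type and every other column gets O. The sketch below illustrates this; the ENTITY_COLUMNS list is abbreviated and must be completed in the exact order of Table 2, and the helper names are hypothetical, not part of the official tooling.

# Illustrative formatting sketch, not the official submission writer.
# ENTITY_COLUMNS is abbreviated; the real list must contain all 51 entity
# types in the exact order of Table 2.
ENTITY_COLUMNS = ["AIRPORT", "BOUNDARY", "BUILDING-OR-GROUNDS", "CAMP",
                  "CARDINAL", "CELESTIAL", "CLUSTER", "COM", "CONTINENT",
                  "COUNTRY", "CURR", "DATE"]  # ...complete per Table 2

def format_row(token, tag):
    """tag is an IOB2 label such as 'B-ORG', 'I-DATE', or 'O'."""
    entity_type = tag.split("-", 1)[1] if tag != "O" else None
    cells = [tag if col == entity_type else "O" for col in ENTITY_COLUMNS]
    return " ".join([token] + cells)

def write_predictions(segments, out_path):
    """segments: list of segments, each a list of (token, tag) pairs."""
    with open(out_path, "w", encoding="utf-8") as f:
        for segment in segments:
            for token, tag in segment:
                f.write(format_row(token, tag) + "\n")
            f.write("\n")  # blank line between segments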

 

REGISTRATION

Participants need to register via this form (NERSharedTask 2024). Participating teams will be provided with common training and development datasets. No external manually labelled datasets are allowed. A blind test set will be used to evaluate the output of the participating teams. Each team is allowed a maximum of 3 submissions. All teams are required to report on the development and test sets (after results are announced) in their write-ups.

FAQ

For any questions related to this task, please check our Frequently Asked Questions page.

IMPORTANT DATES

February 25, 2024: Shared task announcement.
March 10, 2024: Release of training data, development sets, scoring script, and Codalab links.
April 5, 2024: Registration deadline.
April 26, 2024: Test set made available.
May 3, 2024: Codalab Test system submission deadline.
May 10, 2024: Shared task system paper submissions due.
June 17, 2024: Notification of acceptance.
July 1, 2024: Camera-ready version.
August 16, 2024: ArabicNLP 2024 conference in Thailand.

 

CONTACT

For any questions related to this task, please contact the organizers directly using the following email address: NERSharedtask@gmail.com .

ORGANIZERS

         - Mustafa Jarrar, Birzeit University
         - Muhammad Abdul-Mageed, University of British Columbia & MBZUAI
         - Mohammed Khalilia, Birzeit University
         - Bashar Talafha, University of British Columbia
         - AbdelRahim Elmadany, University of British Columbia
         - Nagham Hamad, Birzeit University

 

EVALUATION

The evaluation metrics will include precision, recall, and F1-score. However, our official metric will be the micro F1-score.

Terms and Conditions

To receive access to the data, teams intending to participate are invited to fill in this form.

Development

Start: March 10, 2024, midnight

Description: Development phase: develop your models and submit prediction labels on the DEV set of Subtask 1. Note: your submission should be named 'teamname_subtask1_dev_pred_numberOFsubmission.zip' and should contain a text file with your predictions. The filename must include the substring _pred_ (e.g., 'UBC_subtask1_dev_pred_1.zip' is the zip file of the first prediction, 'UBC_subtask1_dev_pred_1.txt').
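The expected archive can be produced with a few lines of Python; this is only a convenience sketch, and the team name, phase, and submission number below are placeholders.

# Hedged sketch: package a prediction file using the naming convention above.
# Team name, phase, and submission number are placeholders.
import zipfile

team, phase, n = "UBC", "dev", 1
pred_file = f"{team}_subtask1_{phase}_pred_{n}.txt"   # must already exist
with zipfile.ZipFile(f"{team}_subtask1_{phase}_pred_{n}.zip", "w") as zf:
    zf.write(pred_file)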

Test

Start: April 25, 2024, 10 a.m.

Description: Test phase: submit your prediction labels on the TEST set of Subtask 1. Each team is allowed a maximum of 4 submissions. Note: your submission should be named 'teamname_subtask1_test_pred_numberOFsubmission.zip' and should contain a text file with your predictions. The filename must include the substring _pred_ (e.g., 'UBC_subtask1_test_pred_1.zip' is the zip file of the prediction, 'UBC_subtask1_test_pred_1.txt').

Post-Evaluation

Start: May 15, 2024, noon

Description: Post-evaluation phase: submit your prediction labels on the TEST set of Subtask 1. Note: your submission should be named 'teamname_subtask1_test_pred_numberOFsubmission.zip' and should contain a text file with your predictions. The filename must include the substring _pred_ (e.g., 'UBC_subtask1_test_pred_1.zip' is the zip file of the prediction, 'UBC_subtask1_test_pred_1.txt').

Competition Ends

Never

# Username Score
1 Issam 0.90
2 hadikhamoud 0.87
3 Norah_Alshammari 0.86