The KSAA-CAD shared task highlights the Contemporary Arabic Language Dictionary in the context of developing a Reverse Dictionary (RD) system and enhancing Word Sense Disambiguation (WSD) capabilities. The first KSAA-RD (Al-Matham et al., 2023) highlighted significant gaps in the domain of reverse dictionaries, which are designed to retrieve words by their meanings or definitions. This shared task comprises two tasks: RD and WSD. The RD task focuses on identifying the word embeddings that most accurately match a given Arabic definition, termed a "gloss." The WSD task, in contrast, involves determining the specific meaning of a word in context, particularly when the word has multiple meanings. KSAA-CAD presents novel directions for researchers to investigate and an opportunity to make significant contributions to the discipline.
Participants must register via this link: https://forms.gle/SrApXnMrU7AHtRux6
1.1 Task 1: Reverse Dictionary
RDs, characterized by their sequence-to-vector format, offer a different strategy from traditional dictionary lookup methods. The RD task concentrates on the conversion of human-readable glosses into word embedding vectors.
This process entails reconstructing the word embedding vector corresponding to the defined word, a methodology aligning with the approaches of (Mickus et al., 2022; Zanzotto et al., 2010; Hill et al., 2016).
The dataset includes lemmas, lemma vector representations, and their respective glosses.
The developed model is expected to generate novel lemma vector representations for the unseen human-readable definitions in the test set. Such a strategy allows users to search for words based on anticipated definitions or meanings.
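To illustrate how a predicted vector could support word search, here is a minimal sketch that ranks known lemmas by cosine similarity to a predicted embedding. The lemma names, the three-dimensional toy vectors, and the function names are all hypothetical; the task itself only requires producing the vectors, so this lookup step is an assumption about downstream use.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_lemmas(predicted, lemma_vectors, k=3):
    """Return the k lemmas whose stored vectors are most similar to the predicted one."""
    scored = [(lemma, cosine_similarity(predicted, vec))
              for lemma, vec in lemma_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy data: hypothetical lemmas with made-up 3-dimensional embeddings.
toy_vectors = {
    "kitab": [0.9, 0.1, 0.0],
    "qalam": [0.1, 0.9, 0.0],
    "bayt": [0.0, 0.1, 0.9],
}
# Stand-in for the vector a model would predict from a gloss.
predicted = [0.8, 0.2, 0.1]
top = nearest_lemmas(predicted, toy_vectors, k=2)
print(top)
```

In practice the embeddings would come from the provided dataset and have far more dimensions; the ranking logic stays the same.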
1.2 Task 2: Word Sense Disambiguation
WSD focuses on identifying the specific sense of a word in a given context. The WSD gloss-based approach is categorized as a knowledge-based WSD method. This approach utilizes external resources, especially dictionaries. This technique involves determining a word's intended meaning by calculating the overlap between its contextual use and the provided gloss or definition.
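The overlap idea described above can be sketched in a few lines. This is a deliberately simplified illustration in the spirit of Lesk-style overlap counting, not the method used by the organizers: it tokenizes on whitespace, ignores stop words and morphology, and uses an English toy example with hypothetical sense IDs.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a real system would need proper Arabic tokenization."""
    return set(text.lower().split())

def gloss_overlap(context, gloss):
    """Number of distinct tokens shared by the context and the gloss."""
    return len(tokenize(context) & tokenize(gloss))

def disambiguate(context, sense_glosses):
    """Pick the sense ID whose gloss shares the most tokens with the context."""
    return max(sense_glosses, key=lambda sid: gloss_overlap(context, sense_glosses[sid]))

# Hypothetical senses of the word "bank" with illustrative glosses.
senses = {
    "sense.1": "a financial institution that accepts deposits of money",
    "sense.2": "the sloping land beside a river",
}
context = "he sat on the bank of the river and watched the water"
best = disambiguate(context, senses)
print(best)
```

Raw token overlap is a weak signal on its own, which is why stop-word filtering or embedding-based similarity is usually layered on top.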
For contemporary Arabic, dictionaries have been used to build gloss-based WSD datasets, as in the works of Jarrar et al. (2023) and El-Razzaz et al. (2021). These studies employed the Ahmed Mokhtar Omar dictionary (Omar, 2008). Jarrar et al. (2023) also incorporated the Al-Ghani Al-Zaher dictionary (Abul-Azm, 2014).
The dataset consists of the form, the context, the context ID, and the corresponding sense ID. The developed model is expected to retrieve the suitable sense ID for the form in the context from the WSD dictionary.
RD: The model evaluation process follows a hierarchy of metrics. The primary metric is the ranking metric, which is used to assess how well the model ranks predictions compared to ground truth values. If models have similar rankings, the secondary metric, mean squared error (MSE), is considered. Lastly, if further differentiation is needed, the tertiary metric, cosine similarity, provides additional insights. This approach ensures the selection of a top-performing and well-rounded model.
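The three RD metrics can be sketched as follows. MSE and cosine similarity are standard; the ranking metric is not defined on this page, so the version below is an assumption: for each prediction, it counts the fraction of gold vectors that are more cosine-similar to the prediction than its own gold vector (lower is better). The official scorer may differ, and all function names here are illustrative.

```python
import math

def mse(pred, gold):
    """Mean squared error between one predicted and one gold vector."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_score(preds, golds):
    """Assumed ranking metric: average fraction of other gold vectors that are
    closer (by cosine) to a prediction than its own gold vector."""
    total = 0.0
    for i, pred in enumerate(preds):
        own = cosine(pred, golds[i])
        closer = sum(1 for j, g in enumerate(golds)
                     if j != i and cosine(pred, g) > own)
        total += closer / len(golds)
    return total / len(preds)

def evaluate(preds, golds):
    """Compute all three metrics, averaged over the evaluation set."""
    n = len(preds)
    return {
        "rank": rank_score(preds, golds),
        "mse": sum(mse(p, g) for p, g in zip(preds, golds)) / n,
        "cos": sum(cosine(p, g) for p, g in zip(preds, golds)) / n,
    }

# Toy gold and predicted vectors (made up for illustration).
golds = [[1.0, 0.0], [0.0, 1.0]]
preds = [[0.9, 0.1], [0.2, 0.8]]
scores = evaluate(preds, golds)
print(scores)
```

Under this reading, systems would be compared on `rank` first, then `mse`, then `cos`, mirroring the hierarchy described above.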
WSD: Accuracy is the primary metric. It measures whether the sense ID of a word is correctly identified, calculated as the proportion of correct predictions over all predictions. The evaluation of the shared tasks will be hosted on CODALAB. Here are the CODALAB links for each task:
RD: During the evaluation phase, submissions are expected to reconstruct the same JSON format. The test JSON files will contain the "id" and the embedding keys. The participants should construct JSON files that contain at least the two following keys:
WSD: During the evaluation phase, submissions are expected to reconstruct the same JSON format. The test JSON files will contain the "context_id" and two corresponding "gloss_id" entries, each accompanied by their ranking scores. The following is a detailed example for clarification:
{
"context_id":"context.301",
"gloss_id":"gloss.305",
"ranking_score":0.9
}
{
"context_id":"context.301",
"gloss_id":"gloss.466",
"ranking_score":0.7
}
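As a rough sketch of how such records might be produced, the snippet below keeps the two highest-scoring glosses for a context and serializes them with the three keys shown in the example. Wrapping the records in a JSON list is an assumption, since the page does not specify the top-level structure, and `gloss.12` and the function name are hypothetical.

```python
import json

def build_wsd_records(context_id, scored_glosses, top_n=2):
    """Keep the top_n glosses by score and format them like the example records."""
    ranked = sorted(scored_glosses.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return [
        {"context_id": context_id, "gloss_id": gloss_id, "ranking_score": score}
        for gloss_id, score in ranked
    ]

# Scores a model might assign to candidate glosses; "gloss.12" is a made-up extra candidate.
candidate_scores = {"gloss.305": 0.9, "gloss.466": 0.7, "gloss.12": 0.2}
records = build_wsd_records("context.301", candidate_scores)

# ensure_ascii=False keeps any Arabic text readable in the output file.
payload = json.dumps(records, ensure_ascii=False, indent=2)
print(payload)
```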
A detailed description is provided at this link: https://arai.ksaa.gov.sa/sharedTask2024/
I understand that this data is copyrighted by KSAA;
I will use the datasets only for academic research purposes, and will not use them for any other purpose;
I am not allowed to share or re-publish any part of the data, for whatever reason and by whatever means.
We are pleased to announce the awards for the Arabic Reverse Dictionary Shared Task at ArabicNLP 2023. The top-ranked teams in each task will receive cash prizes as follows:
Task 1: Arabic Reverse Dictionary (RD)
Task 2: Word Sense Disambiguation (WSD)
The winners will be determined based on the official evaluation metrics specified for each task. Best of luck to all the teams, and we look forward to announcing the winners at the conclusion of the competition!
Start: March 15, 2024, midnight
Description: Development phase: create models for subtask 1 (Arabic RD) and directly submit results on validation data; in this phase, feedback is provided on the validation set only. The platform accepts the ZIP file format. Kindly compress your JSON file into a ZIP archive before uploading.
Start: April 15, 2024, midnight
Description: Test phase: create models for subtask 1 (Arabic RD) and directly submit results on test data; in this phase, feedback is provided on the test set only. The platform accepts the ZIP file format. Kindly compress your JSON file into a ZIP archive before uploading.
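Compressing a JSON file before upload can be done with Python's standard library. The sketch below is one way to do it; the file names `predictions.json` and `submission.zip` are placeholders, not names required by the platform, and the sample record simply reuses the WSD example from this page as filler content.

```python
import json
import tempfile
import zipfile
from pathlib import Path

def zip_submission(records, out_dir, json_name="predictions.json", zip_name="submission.zip"):
    """Write records to a JSON file and wrap that file in a ZIP archive."""
    out_dir = Path(out_dir)
    json_path = out_dir / json_name
    json_path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
    zip_path = out_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname stores the file at the archive root rather than under its full path.
        archive.write(json_path, arcname=json_name)
    return zip_path

# Filler record; real submissions would contain the model's predictions.
sample = [{"context_id": "context.301", "gloss_id": "gloss.305", "ranking_score": 0.9}]

with tempfile.TemporaryDirectory() as tmp:
    archive_path = zip_submission(sample, tmp)
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
        restored = json.loads(archive.read("predictions.json").decode("utf-8"))

print(names)
```

Reading the archive back, as done at the end, is a cheap sanity check before uploading.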
# | Username | Score |
---|---|---|
1 | SerrySibaee | 0.27 |