Social networks are increasingly playing crucial roles in people's lives, transforming the dynamics of communication and information sharing. Analyzing the content generated on these platforms has become a hot research topic for the computational linguistics community. However, despite the notable advances made in recent years, there are still open challenges that merit further research for better treatment or deeper understanding. One such challenge is the detection of abusive content, which encompasses hate speech, aggression, offensive language, and other related phenomena.
Given the multimodal nature of social media platforms, we aim to promote the research and development of multimodal computational models for the detection of abusive content in Mexican Spanish, particularly hate, offensive, and vulgar memes. Memes are defined as the combination of a text and an image that together convey a joint meaning. This meaning is predominantly humorous or ironic, and the absence of either the text or the image may alter its interpretation. Accordingly, combining information from both modalities to identify a meme as abusive represents an exciting and challenging problem.
Subtasks
DIMEMEX comprises two subtasks:
Subtask 1: coarse-grained classification of memes into hate speech, inappropriate content, or none.
Subtask 2: fine-grained classification that distinguishes the type of hate speech (classism, racism, sexism, or other), in addition to the inappropriate content and none categories.
Both subtasks will rely on the DIMEMEX corpus, and participants can approach either or both tasks.
During the development phase, submissions will be evaluated on the validation partition, and participants will receive immediate feedback on the performance of their submissions. During the final phase, submissions will be evaluated on the test partition. Results from the final phase will be used to determine the final and official ranking.
The following evaluation measures will be used for both subtasks:
For each submission, participants are free to use text, images, or a combination of both modalities; please remember to specify this information in your filenames (see the "Format of submissions" section). A single leaderboard will be maintained for each subtask, but at the end of the competition we will announce the modalities used by the different participants.
By registering for this competition, you agree to use the data exclusively for the purpose of participating in this competition. The data may not be stored after the competition, nor shared or distributed under any circumstances.
By submitting results to this competition, you consent to the public release of your scores at the IberLEF workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatically and manually calculated quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that if your team has several members, each of them will register for the competition and join a single competition team, and that if you are a single participant you will create a team with a single member.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
Each team can make up to 10 submissions in the final phase. During the validation phase, a maximum of 100 submissions is allowed, with at most 5 per day. Files to be uploaded must be compressed in a .zip file. This is the expected format for submissions:
Subtask 1: predictions must be submitted as a CSV file with three predictions per line, each associated with a single category (1 for the presence of the category and 0 otherwise), separated by commas within the line. The order of the categories (columns in the label files) is: hate speech, inappropriate content, and none, for columns 1 to 3, respectively. Use the same order as the corresponding data files, i.e., line 1 in the prediction file must correspond to meme 1 in the data file. The file must be compressed into a .zip file for submission; please do not use folders. A short script sketch follows the example below.
Example:
0,0,1
1,0,0
0,1,0
1,0,0
0,1,0
1,0,0
...
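For illustration only, the following minimal Python sketch writes a Subtask 1 prediction file in this three-column format and packs it into a .zip archive without folders. The filenames predictions_subtask1_text.csv and submission_subtask1.zip are hypothetical; adapt them to the naming convention that encodes the modalities you used.

```python
import csv
import zipfile

# Hypothetical predictions: one binary value per category, in the order
# hate speech, inappropriate content, none (columns 1 to 3).
# Line i must correspond to meme i in the data file.
predictions = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]

csv_name = "predictions_subtask1_text.csv"  # assumed filename; encode your modality here
with open(csv_name, "w", newline="") as f:
    csv.writer(f).writerows(predictions)

# The submission must be a .zip file containing the CSV at the top level (no folders).
with zipfile.ZipFile("submission_subtask1.zip", "w") as zf:
    zf.write(csv_name, arcname=csv_name)
```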
Subtask 2: predictions must be submitted as a CSV file with six predictions per line, each associated with a single category (1 for the presence of the category and 0 otherwise), separated by commas within the line. The order of the categories (columns in the label files) is: classism, racism, sexism, other (hate speech), inappropriate content, and none, for columns 1 to 6, respectively. This order corresponds to the order in the training data. Use the same order as the corresponding data files, i.e., line 1 in the prediction file must correspond to meme 1 in the data file. The file must be compressed into a .zip file for submission; please do not use folders. A short format-check sketch follows the example below.
Example:
0,0,1,0,0,0
0,0,1,0,0,0
1,0,0,0,0,0
1,0,0,0,0,0
0,0,1,0,0,0
0,0,0,0,0,1
...
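Before zipping a Subtask 2 file, it can be useful to verify that every line contains exactly six binary values. The sketch below performs such a check under the assumption of a hypothetical file named predictions_subtask2.csv; it is not part of the official submission pipeline.

```python
import csv

# Expected column order for Subtask 2 (columns 1 to 6).
COLUMNS = ["classism", "racism", "sexism", "other (hate speech)",
           "inappropriate content", "none"]

with open("predictions_subtask2.csv", newline="") as f:  # assumed filename
    for i, row in enumerate(csv.reader(f), start=1):
        assert len(row) == len(COLUMNS), f"line {i}: expected 6 values, got {len(row)}"
        assert all(v in ("0", "1") for v in row), f"line {i}: values must be 0 or 1"
```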
You can check the development data to see the expected format for both the data and the submissions by looking at the reference (ground-truth) files.
Contact: please use the forum to contact the organizers and other participants (preferred contact channel).
Format details will be communicated shortly, according to the specifications of the IberLEF organizers.
Download | Size (MB) | Phase
---|---|---
Public Data | 0.132 | #1 Development (Subtask 1)
Public Data | 0.132 | #1 Development (Subtask 2)
Public Data | 0.035 | #2 Final (Subtask 1)
Public Data | 0.035 | #2 Final (Subtask 2)
Development phase (Subtasks 1 and 2) starts: March 15, 2024, 1 a.m.
Final phase (Subtasks 1 and 2) starts: May 10, 2024, 1 a.m.
Competition ends: May 21, 2024, noon