Group B - UIT Data Science Challenge 2024

Organized by nhn_nlp_uit

Multimodal Sarcasm Detection on Vietnamese Social Media Texts

Multimodal learning is an interesting topic in machine learning. This field studies how to mimic the human brain's ability to receive and process information across various data modalities. Many tasks have been proposed to evaluate multimodal models. Among them, detecting sarcasm in multimedia data from social networks is challenging. In this task, sarcasm can appear in the status, the images, or the comments of a post. We therefore have to determine whether the sarcasm is only in the image, only in the text, in both the image and the text, or whether there is no sarcasm at all.

In this task, participating teams have to research and propose a multimodal method that can effectively detect sarcasm in a Vietnamese image-text dataset collected from social media platforms.

Evaluation Metrics

The ranking of each proposed method is determined by computing the Precision, Recall, and F1 score of its predicted labels against the provided labels. As the provided dataset is imbalanced, participating teams must use the micro-averaged version of these metrics. Submitted methods are ranked by F1 score. If there is a tie, Precision will be considered. If both F1 score and Precision are tied, Recall will be used.
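
The scoring and tie-breaking rules above can be sketched in a few lines of Python. This is only an illustration, not the organizers' evaluation script: the function names `micro_prf` and `rank_teams` are hypothetical, and treating a sample with no prediction as a false negative is an assumption of this sketch.

```python
LABELS = ("text-sarcasm", "image-sarcasm", "multi-sarcasm", "not-sarcasm")


def micro_prf(gold, pred, labels=LABELS):
    """Return micro-averaged (precision, recall, f1).

    gold and pred map sample id -> label. A sample missing from pred
    counts as a false negative only (assumption of this sketch).
    """
    tp = fp = fn = 0
    for sample_id, gold_label in gold.items():
        pred_label = pred.get(sample_id)
        # Pool the per-label counts before dividing: that is the micro average.
        for label in labels:
            if pred_label == label and gold_label == label:
                tp += 1
            elif pred_label == label:
                fp += 1
            elif gold_label == label:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def rank_teams(scores):
    """Order team names by F1, then Precision, then Recall (all descending).

    scores maps team name -> (precision, recall, f1).
    """
    return sorted(
        scores,
        key=lambda team: (scores[team][2], scores[team][0], scores[team][1]),
        reverse=True,
    )


if __name__ == "__main__":
    gold = {"1": "text-sarcasm", "2": "not-sarcasm"}
    pred = {"1": "text-sarcasm", "2": "multi-sarcasm"}
    print(micro_prf(gold, pred))
```

When every sample receives exactly one predicted label from the allowed set, each error adds one false positive and one false negative, so micro-averaged Precision, Recall, and F1 all equal plain accuracy. In this sketch the tie-breakers only differ when some predictions are missing.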

Submission Guideline

Output from your proposed methods must be saved in a JSON file with the following form:

    {
        "results": {
            id: label,
            …
        },
        "phase": phase
    }

where "id" is the id of the corresponding sample, "label" is an element of (text-sarcasm, image-sarcasm, multi-sarcasm, not-sarcasm), and "phase" is the submission phase (dev for the Development Phase or test for the Test Phase).

This JSON file must be named results.json and then zipped as results.zip for submission to the CodaLab evaluation system.
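
As a rough illustration of the packaging steps, the following Python sketch writes results.json in the form shown above and zips it as results.zip. The helper name `write_submission`, the `out_dir` parameter, and the conversion of sample ids to strings (JSON object keys must be strings) are assumptions of this sketch, not part of the official guideline.

```python
import json
import zipfile
from pathlib import Path

ALLOWED_LABELS = {"text-sarcasm", "image-sarcasm", "multi-sarcasm", "not-sarcasm"}
ALLOWED_PHASES = {"dev", "test"}


def write_submission(predictions, phase, out_dir="."):
    """Write results.json and results.zip; return the path to the zip file.

    predictions maps sample id -> label (hypothetical input format).
    """
    if phase not in ALLOWED_PHASES:
        raise ValueError(f"phase must be one of {sorted(ALLOWED_PHASES)}")
    bad = {k: v for k, v in predictions.items() if v not in ALLOWED_LABELS}
    if bad:
        raise ValueError(f"unknown labels: {bad}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: ids are serialized as strings, since JSON keys must be strings.
    payload = {
        "results": {str(k): v for k, v in predictions.items()},
        "phase": phase,
    }
    json_path = out_dir / "results.json"
    json_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=4), encoding="utf-8"
    )

    # Store the file at the archive root under the required name.
    zip_path = out_dir / "results.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(json_path, arcname="results.json")
    return zip_path
```

For example, `write_submission({"101": "not-sarcasm"}, "dev")` would create both files in the current directory; the id "101" is made up for illustration.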

 

Terms and Conditions

- Every team has to provide comprehensive information about its members. Each team may have at most 5 members. Important information includes the full name, student ID, major, faculty, and university of each member, as well as the group name. Team leaders have to use the Gmail account provided by their university to register their team. Any changes to a team after the registration form closes must be reported to the organizers via email at dsc@uit.edu.vn.

- The team leader's information is used for contact, for sending information about the challenge, and for honoring and awarding teams at the Award and Closing Ceremony.

- Participants have to use the email address they registered in advance and name their team as registered in order to be approved to join the competition on CodaLab.

- All teams are only allowed to use the ViMMSD dataset provided by the UIT Data Science Challenge 2024 organizers.

- Teams are not allowed to manually annotate the public test data or the private test data, or to use any data augmentation techniques.

- Participating teams have to propose methods that are trained or fine-tuned on the ViMMSD dataset.

- Only pre-trained models on the list provided by the organizers may be used in the challenge. Any team that uses a pre-trained model not on that list will have its final results in the private test phase rejected.

- To participate in the CodaLab evaluation system of the challenge, team leaders have to use the email address given in the registration form to create a CodaLab account and name their team with the registered name. In addition, every team's representative has to sign the Dataset User Agreement to be recognized in the private test phase.

- The top 5 teams are required to provide their source code so that the final results can be verified. The top 3 teams are asked to give a presentation at the Award and Closing Ceremony of UIT Data Science Challenge 2024.

- All teams must provide the pre-trained embeddings and pre-trained language models they use in this challenge by September 30, 2024, and must not use any external resources for training their methods other than the data provided by the organizers. If a team uses pre-trained embeddings or pre-trained models that are not in the list provided by the participating teams, its final result will not be accepted.

- In the private test phase, each team may submit at most three JSON prediction files. Each team's final result is the highest score among its prediction files.

- Every team participating in the challenge has to pay 50,000 VND to cover expenses.

Registration for pre-trained embeddings and pre-trained language models

Please fill out this form by September 30, 2024. A list of pre-trained embeddings and pre-trained models will be provided to all teams by October 03, 2024.

Warmup

Start: Sept. 30, 2024, midnight

Public Test Phase

Start: Oct. 7, 2024, midnight

Private Test Phase

Start: Nov. 9, 2024, midnight

Post Evaluation

Start: Nov. 12, 2024, midnight

Competition Ends

Never

# Username Score
1 vinhthuanly123 0.5747
2 variphx 0.4658
3 Anh_Tran 0.4507