Abusive Comment Detection in Tamil and Telugu-DravidianLangTech@RANLP 2023

Organized by DravidianLangTech

First phase: Feb. 21, 2022, midnight UTC

Competition Ends: Aug. 31, 2023, 11 p.m. UTC

Shared Task on Abusive Comment Detection in Tamil and Telugu at DravidianLangTech@RANLP 2023

The amount of digital information distributed through social media platforms has increased significantly in recent years. Online social networks (OSNs) have grown in importance, becoming a go-to source for news, information, and entertainment. However, despite the numerous benefits of OSNs, a growing body of evidence suggests that an increasing number of malevolent actors exploit these networks to spread toxic content and cause harm to other individuals. The term "hate speech" (HS) refers to any form of communication that is abusive, insulting, or intimidating, or that incites violence or discrimination, and that disparages an individual or a vulnerable group on the basis of characteristics such as ethnicity, gender, sexual orientation, or religious affiliation. Because hate speech targets such diverse characteristics, we refer to its thematic foci as topics. Examples of topics include misogyny, sexism, racism, transphobia, homophobia, and xenophobia.

 

The goal of this task is to identify whether a given comment contains abusive content. A comment / post within the corpus may contain more than one sentence, but the average number of sentences per comment in the corpus is 1. The annotations in the corpus are made at the comment / post level.

Participants will be provided with development, training, and test datasets in Tamil and Tamil-English. To download the data and participate, go to CodaLab and click the "Participate" tab. As far as we know, this is the first shared task on abusive comment detection in Tamil at this fine-grained level.

Task:

This is a comment / post level classification task. Given a YouTube comment, the systems submitted by the participants should classify it into one of the abusive categories. To download the data and participate, go to the Participate tab.

 

Paper name format should be: TEAM_NAME@DravidianLangTech@RANLP 2023: Title of the paper.

Example: NUIG_ULD@DravidianLangTech@RANLP 2023: Abusive Comment Detection in Tamil and Telugu

For electronic submission of papers to DravidianLangTech workshop please use this link:

Following are some general guidelines to keep in mind while submitting the working notes.
- Perform a basic sanity check for grammatical errors and reported results.
- Papers should have sufficient information for reproducing the mentioned results.
- Papers should follow the appropriate style (We will use ACL 2022 style: details below).
- Check the papers for text reuse / plagiarism. This includes self-plagiarism as well. We would like to stress this point as ACL is quite strict about it. Any paper found to have plagiarized content will be rejected without further consideration.
- Please ensure the author names do not have any salutations like Dr., Prof., etc., in the final version.
 
All submissions should be in the double-column RANLP 2023 format. Authors should use one of the RANLP 2023 templates below:
- Overleaf:  https://www.overleaf.com/latex/templates/instructions-for-ranlp-2023-proceedings/dwjrqsgfrrgm
 
 
Email: bharathiraja.akr@gmail.com and b_premjith@cb.amrita.edu
 

DravidianLangTech@RANLP 2023

Evaluation Criteria

The submission should be a zip file named after your team, containing one TSV file per language named 'teamname_language.tsv'; e.g., the zip file may contain teamname_tamil.tsv (for Tamil), teamname_telugu.tsv (for Telugu), etc.
Submissions will be evaluated with the macro-averaged F1-score.
Test results are accepted only through the Google Form.
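
The packaging step described above could be sketched as follows. This is only an illustration under stated assumptions: the function name write_submission, the two-column layout (id, label), the header row, and the sample label strings are all hypothetical, since the page specifies only the zip and TSV file naming. Follow the column layout of the released test data for an actual submission.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def write_submission(team, predictions, out_dir):
    """Write one TSV per language and bundle them into <team>.zip.

    `predictions` maps a language name to a list of (comment_id, label) pairs.
    ASSUMPTION: the two-column layout and the "id"/"label" header are
    illustrative only; the task page does not define the TSV columns.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    zip_path = out_dir / f"{team}.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for language, rows in predictions.items():
            # File naming follows the 'teamname_language.tsv' pattern.
            tsv_name = f"{team}_{language.lower()}.tsv"
            tsv_path = out_dir / tsv_name
            with open(tsv_path, "w", encoding="utf-8", newline="") as handle:
                writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
                writer.writerow(["id", "label"])
                writer.writerows(rows)
            archive.write(tsv_path, arcname=tsv_name)
    return zip_path


if __name__ == "__main__":
    # Hypothetical predictions; ids and label strings are placeholders.
    demo = {
        "tamil": [("1", "Misogyny"), ("2", "None-of-the-above")],
        "telugu": [("1", "None-of-the-above")],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = write_submission("teamname", demo, tmp)
        with zipfile.ZipFile(path) as archive:
            print(sorted(archive.namelist()))
```

Writing the files with UTF-8 encoding keeps Tamil and Telugu script intact if the comment text is ever included alongside the labels.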
 
The classification system's performance will be measured in terms of macro-averaged Precision, macro-averaged Recall, and macro-averaged F-Score across all classes. Participants are encouraged to check their systems with the scikit-learn classification report: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
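
To make the metric concrete, here is a minimal dependency-free sketch of macro-averaged precision, recall, and F1: each score is computed per class and then averaged with equal weight, so rare classes count as much as frequent ones. The function name macro_scores and the sample label strings are hypothetical. The sketch is intended to mirror the "macro avg" row of the scikit-learn report linked above (treating undefined per-class scores as 0), but it is not the organizers' evaluation script.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return (macro precision, macro recall, macro F1).

    Classes are the union of labels seen in y_true and y_pred.
    A per-class score with a zero denominator is counted as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one example is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gains a false positive
            fn[gold] += 1  # gold class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Hypothetical gold labels and predictions; label strings are placeholders.
    gold = ["Misogyny", "Misogyny", "Xenophobia", "None-of-the-above"]
    pred = ["Misogyny", "None-of-the-above", "Xenophobia", "None-of-the-above"]
    precision, recall, f1 = macro_scores(gold, pred)
    print(f"macro P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because every class contributes equally, a system that ignores a small class is penalized more under macro averaging than under accuracy or weighted averaging.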
 
Submit your results using the following Google Form:
 

 

Please cite the following paper if you use our data:

 

@inproceedings{priyadharshini-etal-2022-overview,
title = "Overview of Abusive Comment Detection in {T}amil-{ACL} 2022",
author = "Priyadharshini, Ruba and
Chakravarthi, Bharathi Raja and
Cn, Subalalitha and
Durairaj, Thenmozhi and
Subramanian, Malliga and
Shanmugavadivel, Kogilavani and
U Hegde, Siddhanth and
Kumaresan, Prasanna",
booktitle = "Proceedings of the Second Workshop on Speech and Language Technologies for Dravidian Languages",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.dravidianlangtech-1.44",
doi = "10.18653/v1/2022.dravidianlangtech-1.44",
pages = "292--298",
abstract = "The social media is one of the significant digital platforms that create a huge impact in peoples of all levels. The comments posted on social media is powerful enough to even change the political and business scenarios in very few hours. They also tend to attack a particular individual or a group of individuals. This shared task aims at detecting the abusive comments involving, Homophobia, Misandry, Counter-speech, Misogyny, Xenophobia, Transphobic. The hope speech is also identified. A dataset collected from social media tagged with the above said categories in Tamil and Tamil-English code-mixed languages are given to the participants. The participants used different machine learning and deep learning algorithms. This paper presents the overview of this task comprising the dataset details and results of the participants.",
}

 

Terms and Conditions

By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses, or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.


Important Dates for shared task:

Task announcement: Feb 20, 2023

Release of Training data: Feb 28, 2023

Release of Test data: May 10, 2023

Run submission deadline: June 1, 2023

Results declared: June 10, 2023

Paper submission: July 1, 2023

Peer review notification: August 5, 2023

Camera-ready paper due: August 25, 2023

Workshop Dates: September 7 - 8, 2023

Organizers:

Ruba Priyadharshini, Gandhigram Rural Institute-Deemed to be University, India

Bharathi Raja Chakravarthi, Insight SFI Research Centre for Data Analytics, School of Computer Science, University of Galway, Ireland

Malliga Subramanian, Kongu Engineering College, Tamil Nadu, India

Subalalitha Chinnaudayar Navaneethakrishnan, SRM Institute of Science and Technology, Kattankulathur, Chennai, India

Kogilavani Shanmugavadivel, Kongu Engineering College, Tamil Nadu, India

Premjith B, Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Abirami Murugappan, Department of Information Science and Technology, Anna University

Prasanna Kumar Kumaresan, Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, Ireland

 

Student Volunteers:

Karnati Sai Prashanth, Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Rishith, Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Janakiram Chandu, Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

 


Rank list:

 
