Hope Speech is the type of speech that can defuse a hostile environment (Palakodety et al., 2019) and that helps, offers suggestions to, and inspires many people in times of illness, stress, loneliness or depression (Chakravarthi, 2020). Detecting it automatically, so that positive comments can be disseminated more widely, can have a significant effect in combating sexual or racial discrimination and in fostering less bellicose environments (Palakodety et al., 2019).
On social media, offensive messages are posted against people because of their race, color, ethnicity, gender, sexual orientation, nationality, or religion. As Chakravarthi (2020) noted, studies of the social media lives of vulnerable groups, such as people belonging to the Lesbian, Gay, Bisexual, and Transgender (LGBT) community, racial minorities or people with disabilities, have found that a vulnerable individual's social media activity plays an essential role in shaping the individual's personality and how he or she views society (Burnap et al., 2017; Kitzie, 2018; Milne et al., 2016). Moreover, this is a hot topic on social networks in many languages.
This shared task concerns the inclusion of vulnerable groups and focuses on the detection of hope speech, in pursuit of equality, diversity and inclusion. Given a text written in Spanish or English, the goal is to identify whether it contains hope speech or not.
This task was previously organized at the second workshop on Language Technology for Equality, Diversity and Inclusion (LT-EDI-2022), as part of ACL 2022, for five languages: Tamil, Malayalam, Kannada, English and Spanish. The novelties of this shared task are threefold: i) it is organized in two languages, Spanish and English; ii) it provides an expanded and improved dataset; and iii) it is directed to the IberLEF community.
The general challenges proposed for this first edition are as follows:
To promote research in inclusive Language Technologies (LT).
To adopt and adapt appropriate LT models to suit Hope Speech.
To provide opportunities for researchers from the LT community around the world to collaborate with other researchers to identify and propose possible solutions for the challenges of Hope Speech.
Some specific challenges of the task are the following:
Identifying Hope Speech in texts written in two languages: Spanish and English.
Dealing with two different social networks: Twitter and YouTube.
Lack of context: Tweets are short (up to 280 characters).
Informal language: Misspellings, emojis and onomatopoeias are common.
Given a Spanish tweet, this subtask consists of identifying whether it contains hope speech or not.
The possible categories for each text are:
HS: Hope Speech.
NHS: Non Hope Speech.
Given an English YouTube comment, this subtask consists of identifying whether it contains hope speech or not.
The possible categories for each text are:
HS: Hope Speech.
NHS: Non Hope Speech.
In both subtasks there will be a real-time leaderboard, and participants may make a maximum of 10 submissions through CodaLab, from which each team must select the best one for ranking.
Evaluation measures: Precision, Recall and F1-score will be measured per category and averaged using the macro-average method. Systems will be ranked using the macro-F1 score.
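The macro-averaged measures described above can be illustrated with a minimal sketch in plain Python. It assumes that macro-F1 is the unweighted mean of the per-category F1 scores (the usual definition); the official scorer is not shown here and may differ in detail. The function names and the toy labels are hypothetical.

```python
from typing import Sequence, Tuple

# The two categories defined by the task.
LABELS = ("HS", "NHS")


def per_class_prf(gold: Sequence[str], pred: Sequence[str], label: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one category, treated as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_scores(gold: Sequence[str], pred: Sequence[str],
                 labels: Sequence[str] = LABELS) -> Tuple[float, float, float]:
    """Unweighted mean of per-category precision, recall and F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = [per_class_prf(gold, pred, label) for label in labels]
    n = len(labels)
    return (sum(s[0] for s in scores) / n,
            sum(s[1] for s in scores) / n,
            sum(s[2] for s in scores) / n)


# Toy example (hypothetical data, not taken from the task dataset).
gold = ["HS", "HS", "NHS", "NHS"]
pred = ["HS", "NHS", "NHS", "NHS"]
macro_p, macro_r, macro_f1 = macro_scores(gold, pred)
print(f"macro-P={macro_p:.4f} macro-R={macro_r:.4f} macro-F1={macro_f1:.4f}")
```

Because every category contributes equally to the average, a system that ignores the less frequent category is penalized even when its overall accuracy is high.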
Miguel Ángel García Cumbreras (SINAI, Universidad de Jaén)
Daniel García-Baena (SINAI, Universidad de Jaén)
Bharathi Raja Chakravarthi (University of Galway)
Salud María Jiménez-Zafra (SINAI, Universidad de Jaén)
José Antonio García-Díaz (TECNOMOD, Universidad de Murcia)
Rafael Valencia-García (TECNOMOD, Universidad de Murcia)
L. Alfonso Ureña-López (SINAI, Universidad de Jaén)
Miguel Ángel García Cumbreras, Full Professor, SINAI, Computer Science Department, Universidad de Jaén, Spain (magc@ujaen.es). His research is focused on Natural Language Processing and Text Categorization, especially on Question Answering, Sentiment Analysis, Hope-Speech, Social Media Analysis and Language Resource Generation. He was the promoter and organizer of the eight editions of TASS, and has served on the committees of IberLEF (2019, 2020) and of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at ACL LT-EDI 2022.
Daniel García-Baena. He is a secondary school computer science teacher and a doctoral student at the University of Jaén. He participates in the committee of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at ACL LT-EDI 2022.
Bharathi Raja Chakravarthi is a permanent Lecturer-Above-the-Bar/Assistant Professor at the School of Computer Science at the University of Galway, Ireland. Before this, he was a Postdoctoral Fellow at the Insight SFI Research Centre for Data Analytics, Data Science Institute, University of Galway, Ireland. He completed his PhD at the Data Science Institute, University of Galway, Ireland. His recent research focuses on text classification, multimodal machine learning, sentiment analysis, abusive/offensive language detection, bias in natural language processing tasks, inclusive language detection, positivity in social media platforms, machine translation, and multilingualism. He has published multiple international conference papers (COLING, LREC, MTSUMMIT, DSAA, LDK, GWC, AICS, FIRE, etc.) and papers in highly reputed journals (Computer Speech & Language, Language Resources and Evaluation, Social Network Analysis and Mining, Multimedia Tools and Applications, International Journal of Data Science and Analytics, etc.). He received the Best Application Paper Award at the IEEE and ACM-funded DSAA 2020 conference. Dr. Chakravarthi served as chair and lead organizer for the 1st and 2nd Workshop on Language Technology for Equality, Diversity and Inclusion (https://sites.google.com/view/lt-edi-2022) and the Workshop on Speech and Language Technologies for Dravidian Languages (https://dravidianlangtech.github.io/2022/). He has served on program committees for a number of ACL conferences and workshops. He has also served as guest editor for special issues of the journals Computer Speech & Language, Language Resources and Evaluation, and ACM Transactions on Asian and Low-Resource Language Information Processing.
Salud María Jiménez-Zafra, SINAI, Computer Science Department, Universidad de Jaén, Spain (sjzafra@ujaen.es). Her research is focused on Natural Language Processing and Text Categorization, especially on Negation Processing, Sentiment Analysis, Offensive Language, Hope-Speech, Social Media Analysis and Language Resource Generation. She has been part of the organizing committee of the three editions of the NEGES workshop, of SemEval-2016 Task 5: Aspect Based Sentiment Analysis, of the 32nd International Conference of the Spanish Society for Natural Language Processing (SEPLN 2016), of TASS at IberLEF 2020, of EmoEvalEs at IberLEF 2021, of the 2020, 2021 and 2022 editions of the Doctoral Symposium on NLP from the PLN.net thematic network, of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2022 - ACL 2022 and of PoliticES at IberLEF 2022. Currently, she is part of the organizing committee of the 39th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2023), of IberLEF 2023 and of EVALITA 2023 Task - PoliticIT.
José Antonio García-Díaz. Member of TECNOMOD (Knowledge Modelling, Processing and Management Technologies) Research Group, Universidad de Murcia, Spain (joseantonio.garcia8@um.es). His research is focused on Natural Language Processing and Automatic Document Classification. Moreover, he has participated in several tasks regarding humor detection, misogyny detection, hate-speech, and author profiling among others. In addition, he was part of the organizing committee of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2022 - ACL 2022 and part of the organizing committee of PoliticES 2022 - IberLEF 2022. Currently, he is part of the organizing committee of EVALITA 2023 Task - PoliticIT.
Rafael Valencia-García, Full Professor. Department of Informatics and Systems. Universidad de Murcia (valencia@um.es). His research interests are focused on Natural Language Processing, Sentiment Analysis, Semantic Web and Recommender Systems. He was the General Chair of the SEPLN 2017 conference held in Murcia. He has participated in more than 35 research projects and published over 150 articles in journals, conferences, and book chapters. He has been guest editor of several NLP related Special Issues in different JCR-indexed journals such as PMC, CSI, IJSEKE, JRPIT, JUCS or SCP. He was part of the organizing committee of the Shared Task on Hope Speech Detection for Equality, Diversity, and Inclusion at LT-EDI 2022 - ACL 2022 and part of the organizing committee of PoliticES 2022 - IberLEF 2022. Currently, he is part of the organizing committee of EVALITA 2023 Task - PoliticIT.
L. Alfonso Ureña-López, Professor of Computer Science, director of the SINAI research group of Universidad de Jaén, Spain (laurena@ujaen.es). He is president of SEPLN (Spanish Society for Natural Language Processing). His research is focused on Natural Language Processing, Word Sense Disambiguation, Text Categorization, Sentiment Analysis, Offensive Language and Hope-Speech, among others. He has organized and chaired numerous congresses in the area of NLP. Likewise, he has served and continues to serve on numerous scientific committees of conferences and workshops in NLP. Currently, he is part of the organizing committee of EVALITA 2023 Task - PoliticIT.
Each team can participate with up to 10 submissions (except during development, where 100 submissions are allowed). The expected format for submissions is a .zip file (with no folders inside) containing the predictions in a .csv (comma-separated) file.
IMPORTANT!!
1. The .csv file MUST be named results.csv
2. The .csv file MUST have the following header:
id,category
Example:
id,category
id1,HS
id2,NHS
...
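The submission format above can be produced with the Python standard library alone. The following is a minimal sketch, not an official tool: the function name `build_submission`, the default archive name `submission.zip` and the sample ids are assumptions made for illustration. Only the file name `results.csv`, the `id,category` header and the HS/NHS labels come from the rules stated above.

```python
import csv
import io
import os
import tempfile
import zipfile
from typing import Iterable, Tuple

VALID_LABELS = {"HS", "NHS"}


def build_submission(predictions: Iterable[Tuple[str, str]],
                     zip_path: str = "submission.zip") -> str:
    """Write (id, category) pairs to results.csv inside a flat .zip archive."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "category"])  # required header
    for item_id, label in predictions:
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected category {label!r} for id {item_id!r}")
        writer.writerow([item_id, label])
    # writestr with a bare file name keeps results.csv at the archive root,
    # so the .zip contains no folders.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("results.csv", buffer.getvalue())
    return zip_path


# Usage with hypothetical ids, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = build_submission([("id1", "HS"), ("id2", "NHS")],
                            os.path.join(tmp_dir, "submission.zip"))
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
        content = archive.read("results.csv").decode("utf-8")
```

Validating the labels before writing catches typos such as a lowercase category locally, before a submission is spent.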
By downloading the data or by accessing it in any manner, you agree not to redistribute the data except for non-commercial and academic-research purposes. The data must not be used for providing surveillance, analyses or research that isolates a group of individuals or any single individual for any unlawful or discriminatory purpose.
You should cite this paper if you are using our data:
Start: Feb. 12, 2023, midnight
Start: March 13, 2023, midnight
Start: March 29, 2023, midnight
# | Username | Score |
---|---|---|
1 | JL_DomOlmedo | 0.5040 |
2 | NLP_URJC | 0.5026 |
3 | ronghao | 0.4954 |