GUA-SPA@IberLEF2023: Guarani-Spanish Code Switching Analysis Forum


> Baseline results published

Dear participants,

We just uploaded some baseline results for each task. The baselines work like this:

* Task 1:
Assign each word the category (gn, es, mix, ne, other, foreign) it most frequently has in the training corpus. If it's an unknown word, choose "other".
It has the following performance on the dev corpus:
- Accuracy: 0.672
- Weighted F1: 0.703
- Macro F1: 0.511

* Task 2:
Treat any sequence of consecutive tokens labeled "ne" in Task 1 as an entity. Assign it the category (per, org, loc) most frequently seen in the training corpus for the first word of the sequence. If that word is unknown, choose "per".
It has the following performance on the dev corpus:
- Labeled F1: 0.422
- Unlabeled F1: 0.502
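
A rough sketch of how the Task 2 baseline could be put together, under some assumptions: spans are maximal runs of "ne" labels, and the training statistics are counted over the first word of each gold entity. The helper names and the toy sentence are hypothetical, not taken from the official code.

```python
from collections import Counter, defaultdict

def extract_spans(labels, target):
    """Return (start, end) pairs, end exclusive, for maximal runs of `target`."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:  # a run that reaches the end of the sentence
        spans.append((start, len(labels)))
    return spans

def train_first_word(first_word_category_pairs):
    """Map the first word of each training entity to its most frequent category."""
    counts = defaultdict(Counter)
    for first_word, category in first_word_category_pairs:
        counts[first_word][category] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def label_entities(words, labels, lexicon, default="per"):
    """Turn runs of "ne" tokens into (start, end, category) entities."""
    return [(start, end, lexicon.get(words[start], default))
            for start, end in extract_spans(labels, "ne")]

# Toy data, invented for illustration only
entity_lexicon = train_first_word([("Asunción", "loc"), ("Asunción", "loc"), ("Asunción", "org")])
words = ["Luis", "Chiruzzo", "oho", "Asunción", "pe"]
labels = ["ne", "ne", "gn", "ne", "gn"]
print(label_entities(words, labels, entity_lexicon))  # [(0, 2, 'per'), (3, 4, 'loc')]
```

Since the spans come straight from the Task 1 predictions, any "ne" mistakes there carry over to the entity boundaries.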

* Task 3:
Treat any sequence of consecutive tokens labeled "es" in Task 1 as a Spanish span. Assign it the category (cc, ul) most frequently seen in the training corpus for the first word of the sequence. If that word is unknown, choose "cc".
It has the following performance on the dev corpus:
- Labeled F1: 0.233
- Unlabeled F1: 0.417
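
The Task 3 baseline follows the same pattern over "es" tokens. Below is a small sketch using itertools.groupby to find the runs; the per-word counts, function names and example sentence are invented for illustration, and it assumes the counts were collected from the first word of each gold Spanish span in the training corpus.

```python
from collections import Counter
from itertools import groupby

def spanish_spans(labels):
    """Return (start, end) pairs, end exclusive, for maximal runs of "es" labels."""
    spans, pos = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label == "es":
            spans.append((pos, pos + length))
        pos += length
    return spans

def classify_spans(words, labels, first_word_counts, default="cc"):
    """Give each Spanish span the most frequent category of its first word."""
    result = []
    for start, end in spanish_spans(labels):
        counts = first_word_counts.get(words[start])
        # Unknown first word: fall back to the default category
        category = counts.most_common(1)[0][0] if counts else default
        result.append((start, end, category))
    return result

# Hypothetical counts and sentence, for illustration only
first_word_counts = {"la": Counter({"ul": 3, "cc": 1})}
words = ["che", "la", "casa", "ha", "el", "perro"]
labels = ["gn", "es", "es", "gn", "es", "es"]
print(classify_spans(words, labels, first_word_counts))  # [(1, 3, 'ul'), (4, 6, 'cc')]
```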

We encourage all participants to send their results for the development set as well!

Regards,
Luis

Posted by: luischir @ May 3, 2023, 9:28 p.m.