Dear participants,
We have just uploaded baseline results for each task. The baselines work as follows:
* Task 1:
For each word, choose the category (gn, es, mix, ne, other, foreign) it is most frequently labeled with in the training corpus. If the word does not appear in the training corpus, choose "other".
It has the following performance on the dev corpus:
- Accuracy: 0.672
- Weighted F1: 0.703
- Macro F1: 0.511
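The Task 1 baseline can be sketched roughly as below. This is a minimal illustration, not the organizers' actual script: the function names (`train_most_frequent`, `predict`) and the toy data are made up, and details such as tie-breaking and case normalization are assumptions, since the description above does not specify them.

```python
from collections import Counter, defaultdict

def train_most_frequent(tagged_words):
    """Map each training word to the label it carries most often.

    tagged_words: iterable of (word, label) pairs.
    Ties are broken by whichever label was seen first (an assumption).
    """
    counts = defaultdict(Counter)
    for word, label in tagged_words:
        counts[word][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def predict(words, lexicon, default="other"):
    """Label known words from the lexicon and unknown words with `default`."""
    return [lexicon.get(word, default) for word in words]

# Toy training data, invented for illustration only.
train = [
    ("che", "gn"), ("che", "gn"),
    ("casa", "es"), ("casa", "gn"), ("casa", "es"),
]
lexicon = train_most_frequent(train)
print(predict(["che", "casa", "xyz"], lexicon))  # ['gn', 'es', 'other']
```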
* Task 2:
Treat any sequence of consecutive tokens labeled "ne" in Task 1 as an entity. Assign it the category (per, org, loc) that is most frequent in the training corpus for the first word of the sequence. If that word does not appear in the training corpus, choose "per".
It has the following performance on the dev corpus:
- Labeled F1: 0.422
- Unlabeled F1: 0.502
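A rough sketch of the Task 2 baseline follows. It is an illustration under stated assumptions, not the organizers' code: the names `train_first_word_categories` and `extract_spans` are hypothetical, spans are taken to be maximal runs of "ne" labels, and "most frequent category for the first word" is read as counting the categories of training entities that start with that word.

```python
from collections import Counter, defaultdict

def train_first_word_categories(spans):
    """spans: iterable of (tokens, category) pairs from the training corpus.

    Returns the most frequent category among training spans that start
    with each word (one possible reading of the baseline description).
    """
    counts = defaultdict(Counter)
    for tokens, category in spans:
        counts[tokens[0]][category] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def extract_spans(tokens, labels, lexicon, trigger="ne", default="per"):
    """Return (start, end, category) for each maximal run of `trigger` labels.

    `end` is exclusive. The category comes from the first token of the run.
    """
    spans, start = [], None
    # A trailing None sentinel closes a run that reaches the end of the input.
    for i, label in enumerate(list(labels) + [None]):
        if label == trigger and start is None:
            start = i
        elif label != trigger and start is not None:
            spans.append((start, i, lexicon.get(tokens[start], default)))
            start = None
    return spans

# Toy data, invented for illustration only.
train_spans = [(["Asunción"], "loc"), (["Asunción", "FC"], "org"), (["Asunción"], "loc")]
ne_lexicon = train_first_word_categories(train_spans)

tokens = ["Luis", "oiko", "Asunción", "del", "Paraguay", "pe"]
labels = ["ne", "gn", "ne", "ne", "ne", "gn"]
print(extract_spans(tokens, labels, ne_lexicon))  # [(0, 1, 'per'), (2, 5, 'loc')]
```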
* Task 3:
Treat any sequence of consecutive tokens labeled "es" in Task 1 as a Spanish span. Assign it the category (cc, ul) that is most frequent in the training corpus for the first word of the sequence. If that word does not appear in the training corpus, choose "cc".
It has the following performance on the dev corpus:
- Labeled F1: 0.233
- Unlabeled F1: 0.417
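To illustrate why unlabeled F1 is higher than labeled F1 for Tasks 2 and 3, here is a small sketch of span-level scoring. It assumes exact-boundary matching, with the category ignored in the unlabeled case. The official scorer may be defined differently, and `span_f1` is a hypothetical helper, not part of the task's tooling.

```python
def span_f1(gold, pred, labeled=True):
    """F1 over spans given as (start, end, category) tuples.

    Assumption: a predicted span counts as correct only if its boundaries
    match a gold span exactly; with labeled=True the category must match too.
    """
    if not labeled:
        gold = [(start, end) for start, end, _ in gold]
        pred = [(start, end) for start, end, _ in pred]
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Toy spans, invented for illustration: both boundaries are right,
# but the first span gets the wrong category.
gold = [(0, 2, "cc"), (4, 5, "ul")]
pred = [(0, 2, "ul"), (4, 5, "ul")]
print(span_f1(gold, pred, labeled=True))   # 0.5
print(span_f1(gold, pred, labeled=False))  # 1.0
```

Under this definition, every span that counts for labeled F1 also counts for unlabeled F1, so the unlabeled score can never be lower.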
We encourage all participants to send their results for the development set as well!
Regards,
Luis