Russian Nested Named Entities

Organized by datapaf

First phase: Dev, April 2, 2024, midnight UTC

End: Competition Ends, April 29, 2024, midnight UTC

NLP Course Assignment 3: Russian Nested Named Entities

Named entity extraction is one of the most widely used information extraction tasks in practice: it involves finding mentions of personal names, organizations, toponyms, and other entities in text. This assignment is devoted to extracting nested named entities. The annotation scheme allows one named entity to contain another. For example, the Organization entity “Moscow Drama Theater named after M. N. Yermolova” contains a nested Person entity, “M. N. Yermolova”.
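To make the notion of nesting concrete, here is a minimal sketch that treats each entity as a character span and reports which spans lie inside others. The span representation (start, end, label with an exclusive end offset), the function name find_nested_pairs, and the label strings are illustrative assumptions, not something the competition prescribes.

```python
from typing import List, Tuple

# Assumed representation: (start, end, label), character offsets, end exclusive.
Span = Tuple[int, int, str]


def find_nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (outer, inner) pairs where inner lies fully within outer."""
    pairs = []
    for outer in spans:
        for inner in spans:
            same_extent = (outer[0], outer[1]) == (inner[0], inner[1])
            contained = outer[0] <= inner[0] and inner[1] <= outer[1]
            if contained and not same_extent:
                pairs.append((outer, inner))
    return pairs


text = "Moscow Drama Theater named after M. N. Yermolova"
person = "M. N. Yermolova"
person_start = text.index(person)

spans = [
    (0, len(text), "ORGANIZATION"),  # the whole theater name
    (person_start, person_start + len(person), "PERSON"),  # nested person
]

nested = find_nested_pairs(spans)
for outer, inner in nested:
    print(f"{inner[2]} {text[inner[0]:inner[1]]!r} is nested in {outer[2]}")
```

The quadratic loop is fine for a single document; a flat sequence tagger that assigns one label per token cannot represent both spans at once, which is why nested extraction needs a span-based or layered formulation.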

Data 

The competition is based on the NEREL [1] corpus, collected from Russian-language WikiNews texts. NEREL contains 29 entity classes, and entities are nested up to 6 levels deep.

Data is provided to participants as annotated documents in the BRAT format.
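As a rough illustration of reading such annotations, the sketch below parses entity lines from a BRAT standoff .ann file. In that format, text-bound annotations start with an ID such as T1, followed by a tab, the type and character offsets, another tab, and the covered text; discontinuous spans separate fragments with semicolons. The sample annotation string, the relation label EXAMPLE_RELATION, and the names Entity and parse_brat_entities are made up for this example and are not taken from the competition data.

```python
from typing import List, NamedTuple, Tuple


class Entity(NamedTuple):
    ann_id: str
    label: str
    fragments: List[Tuple[int, int]]  # (start, end) character offsets
    text: str


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Extract text-bound (T) annotations from the contents of a .ann file."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes, and blank lines
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # malformed line, ignore in this sketch
        ann_id, type_and_offsets, surface = parts[0], parts[1], parts[2]
        label, offsets = type_and_offsets.split(" ", 1)
        fragments = []
        for fragment in offsets.split(";"):  # ';' separates discontinuous parts
            start, end = fragment.split()
            fragments.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, fragments, surface))
    return entities


# Hypothetical annotation for the example sentence used in the task description.
sample_ann = (
    "T1\tORGANIZATION 0 48\tMoscow Drama Theater named after M. N. Yermolova\n"
    "T2\tPERSON 33 48\tM. N. Yermolova\n"
    "R1\tEXAMPLE_RELATION Arg1:T1 Arg2:T2\n"
)

entities = parse_brat_entities(sample_ann)
for entity in entities:
    print(entity.ann_id, entity.label, entity.fragments, entity.text)
```

Because nested entities are simply separate T lines with overlapping offsets, no special handling is needed at parse time; nesting can be recovered afterwards by comparing the offsets.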

 

Problem statement

The task is to extract nested named entities. In the training set, most entity types occur frequently, while a few specially selected types occur only a handful of times. In the test set, all entity types are equally represented.

Thus, you have to develop extraction models for nested named entities.

 

Useful Links

  1. Loukachevitch, Natalia, Ekaterina Artemova, Tatiana Batura, Pavel Braslavski, Ilia Denisov, Vladimir Ivanov, Suresh Manandhar, Alexander Pugachev, and Elena Tutubalina. "NEREL: A Russian Dataset with Nested Named Entities and Relations." In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021) (pp. 876-885). https://aclanthology.org/2021.ranlp-main.100.pdf

Algorithm evaluation

Submissions are evaluated with Macro-F1.
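The sketch below shows one common way to compute Macro-F1 for entity extraction: F1 is calculated separately for each entity type and then averaged with equal weight, so rare types count as much as frequent ones. The exact-match criterion (same document, offsets, and label) and the choice to average over every label that appears in either the gold or the predicted set are assumptions; the official scorer may differ in both respects.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed mention representation: (doc_id, start, end, label).
Mention = Tuple[int, int, int, str]


def macro_f1(gold: Iterable[Mention], pred: Iterable[Mention]) -> float:
    """Average per-label F1, counting a prediction as correct on exact match."""
    gold_by_label: Dict[str, Set[Tuple[int, int, int]]] = defaultdict(set)
    pred_by_label: Dict[str, Set[Tuple[int, int, int]]] = defaultdict(set)
    for doc_id, start, end, label in gold:
        gold_by_label[label].add((doc_id, start, end))
    for doc_id, start, end, label in pred:
        pred_by_label[label].add((doc_id, start, end))

    labels = set(gold_by_label) | set(pred_by_label)
    if not labels:
        return 0.0

    scores = []
    for label in labels:
        gold_set = gold_by_label.get(label, set())
        pred_set = pred_by_label.get(label, set())
        true_positives = len(gold_set & pred_set)
        precision = true_positives / len(pred_set) if pred_set else 0.0
        recall = true_positives / len(gold_set) if gold_set else 0.0
        if precision + recall == 0.0:
            scores.append(0.0)
        else:
            scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores)


gold_mentions = [(0, 0, 48, "ORGANIZATION"), (0, 33, 48, "PERSON")]
# The PERSON prediction is off by one character, so it does not match.
pred_mentions = [(0, 0, 48, "ORGANIZATION"), (0, 33, 47, "PERSON")]

score = macro_f1(gold_mentions, pred_mentions)
print(f"Macro-F1: {score:.2f}")
```

Since the test set represents all entity types equally and the metric weights them equally, performance on the rarely seen training types affects the final score as much as performance on the frequent ones.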

To submit a solution, create a test.jsonl file and zip it with the command "zip test test.jsonl". The resulting test.zip is ready for submission.
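For those who prefer to build the archive from a script, the following sketch writes one JSON object per line to test.jsonl and packs it into test.zip with Python's standard library, which is equivalent to the zip command above. The record schema is not specified on this page: the fields "id" and "ners" and the [start, end, label] triple layout used here are purely hypothetical placeholders, so check the provided data for the actual expected format. The function name write_submission is also an illustrative choice.

```python
import json
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, List


def write_submission(predictions: List[Dict], out_dir: Path) -> Path:
    """Write predictions to test.jsonl and pack it into test.zip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    jsonl_path = out_dir / "test.jsonl"
    with jsonl_path.open("w", encoding="utf-8") as handle:
        for record in predictions:
            # ensure_ascii=False keeps Cyrillic text readable in the file
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")

    zip_path = out_dir / "test.zip"
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # arcname puts the file at the archive root, like "zip test test.jsonl"
        archive.write(jsonl_path, arcname="test.jsonl")
    return zip_path


# Hypothetical record layout; the real field names may differ.
example_predictions = [
    {"id": 0, "ners": [[0, 48, "ORGANIZATION"], [33, 48, "PERSON"]]},
    {"id": 1, "ners": []},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_file = write_submission(example_predictions, Path(tmp_dir))
    with zipfile.ZipFile(zip_file) as archive:
        archive_members = archive.namelist()
        written_lines = archive.read("test.jsonl").decode("utf-8").splitlines()

print(archive_members)
```

The demo writes into a temporary directory only so that it leaves no files behind; for a real submission, pass the directory where test.zip should be created.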

Rules

  • Students may use any materials and any pretrained models, but must not use explicit annotation of the test set.
  • Cheating is prohibited!

Dev

Start: April 2, 2024, midnight

Description: Development phase

Test

Start: April 26, 2024, midnight

Description: Testing phase

Competition Ends

April 29, 2024, midnight
