FoRC-Subtask-I@NSLP2024

Organized by raia.aa


Field of Research Classification (FoRC) - Subtask I @ NSLP 2024 

Update: This competition has ended. Thank you to everyone who participated and submitted their systems! 

If you'd like to run more experiments and re-use the dataset, the full dataset with labels for the test set can be found here.

---------------------------------------------------------------------------------------

This shared task is part of the Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024) workshop, which is co-located with ESWC 2024.

A core application of NSLP is classifying scientific articles by their field of research. While some repositories already use a classification system, these are often limited either in the taxonomy they use or in the underlying classification model. The Field of Research Classification (FoRC) shared task aims to tackle this problem by offering two distinct subtasks:

  • Subtask I: Single-label multi-class classification of general scholarly papers. 
  • Subtask II: Multi-label classification of Computational Linguistics scholarly papers.

This competition page is dedicated to Subtask I. To participate in Subtask II, follow this link.

 

In subtask I, participants are asked to develop classifiers that take (a subset of) the available metadata of articles as input and output one of 123 predefined hierarchical classes from the ORKG taxonomy of research fields. Classifiers will be trained and tested using a dataset of 59.3K English scientific papers constructed by fetching metadata from ORKG (CC BY-SA 4.0) and arXiv (CC0 1.0).

The following metadata fields are available:

  • title
  • authors
  • abstract
  • DOI
  • URL to the full paper (if available and based on license)
  • publisher
  • publication month and year
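
For orientation, here is a minimal baseline sketch in Python (scikit-learn) that trains on title and abstract only. The file names and the "title", "abstract", and "label" columns are assumptions about the dataset layout, not the confirmed schema of the Zenodo files, so adjust them accordingly:

# Minimal baseline sketch: TF-IDF over title + abstract, linear classifier.
# NOTE: the file names and the "title"/"abstract"/"label" columns are
# assumptions about the dataset layout, not a confirmed schema.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = pd.read_csv("train.csv")
val = pd.read_csv("val.csv")

def combine(df):
    # Concatenate title and abstract into one text field per paper.
    return df["title"].fillna("") + " " + df["abstract"].fillna("")

model = make_pipeline(
    TfidfVectorizer(max_features=50000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(combine(train), train["label"])
val_pred = model.predict(combine(val))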

Systems will be evaluated using accuracy as well as weighted scores of recall, precision, and F1.
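
For reference, these metrics can be computed locally with scikit-learn; the sketch below uses hypothetical label strings purely for illustration:

# Sketch: computing accuracy and weighted precision/recall/F1 locally.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy gold and predicted labels (hypothetical ORKG field names).
y_true = ["Machine Learning", "Genetics", "Machine Learning"]
y_pred = ["Machine Learning", "Machine Learning", "Machine Learning"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")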

 

The phases of the competition are as follows: 

1. Development phase: during this phase, participants will develop classification models using the provided train and validation sets. Results of this phase will not count toward the final standings.

2. Testing phase: from January 10, 2024 until February 29, 2024 (extended from February 22, 2024), a test set will be released. Participants are expected to use their developed models and upload their results. A leaderboard will be formed from these results and will decide the final standings of the challenge.

Participants are encouraged to submit a short paper describing their systems (up to 8 pages in length, excluding references) to the NSLP 2024 workshop, which will be co-located with ESWC 2024 in Crete. Please refer to the guidelines here. 

The training and validation datasets can be accessed here: https://zenodo.org/records/10438530

The testing dataset can be accessed here: https://zenodo.org/records/10469550 

 

Note: When uploading your predictions, make sure that they are in the "predictions.csv" format. For more details, please go to the Evaluation tab. Sample code for preparing the predictions of the validation data can be accessed here: https://drive.google.com/file/d/1bqLz0Nt33pVOMV8LTGeFipwmupjy5cgb/view

Important: The total number of system submissions is limited to 10.

 

If you have any questions, feel free to contact us:

  • Raia Abu Ahmad (raia.abu_ahmad@dfki.de)
  • Ekaterina Borisova (ekaterina.borisova@dfki.de)
  • Georg Rehm (georg.rehm@dfki.de)

Or write your question in the dedicated Forums tab.  

Evaluation Criteria

Systems will be evaluated using accuracy as well as weighted scores of precision, recall, and F1.

When uploading your predictions, please make sure they are in the correct format.

You should upload a "predictions.csv" file inside a "predictions.zip" file. Save the final predictions using the text of the original labels from the ORKG taxonomy (not numerical categories), in a column named "target".
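
A minimal packaging sketch in Python (the label values below are hypothetical; write the actual ORKG taxonomy labels your model predicts):

# Sketch: write predictions.csv with a "target" column of text labels,
# then zip it as predictions.zip for upload.
import zipfile
import pandas as pd

# Hypothetical predicted labels, for illustration only.
predicted_labels = ["Machine Learning", "Genetics"]

pd.DataFrame({"target": predicted_labels}).to_csv("predictions.csv", index=False)
with zipfile.ZipFile("predictions.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.csv")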

Sample code for training a model with the dataset and preparing the predictions of the validation data can be accessed here: https://drive.google.com/file/d/1bqLz0Nt33pVOMV8LTGeFipwmupjy5cgb/view

Important dates:

  • Release of training data: January 2, 2024
  • Release of testing data: January 10, 2024
  • Deadline for system submissions: February 29, 2024 (extended from February 22, 2024)
  • Paper submission deadline: March 14, 2024 (extended from March 7, 2024)
  • Notification of acceptance: April 4, 2024
  • Camera-ready submission: April 18, 2024
  • Workshop: May 26 or May 27, 2024 (TBC)

Phases

  • Validation start: Jan. 2, 2024, midnight UTC
  • Testing start: Jan. 10, 2024, midnight UTC
  • Competition ends: March 1, 2024, midnight UTC
