TESTLINK at IberLEF2023 is a relation extraction task based on clinical cases taken from the E3C corpus. These clinical cases are written documents in Spanish and Basque that describe various aspects of clinical practice, such as the patient's visit, exam, diagnosis and treatment. The task requires identifying test results and measurements and linking them to the textual mentions of the tests they belong to.
Clinical narratives document laboratory tests and measurements that are often used to diagnose diseases and disorders. These tests and measurements reveal the patients' condition at a specific stage of the disorder's progression, but they have been neglected in recent research. This task poses a novel challenge for data analysis, since it involves not only identifying named entities, but also interpreting numerical values and ranges.
The training dataset consists of 81 documents in Spanish and 90 documents in Basque from the E3C corpus. The annotated data is provided in a tab-delimited text file based on the PubTator format. Each document starts on a new line, and a blank line separates consecutive documents. Below the text, each relation is on a separate line and is represented as an ordered pair of entity mentions (i.e. RML, event). Each entity mention in the relation is expressed by its start and end character offsets. The mention text can be included, but it is not mandatory. The structure is as follows:
|t|
REL <RML_START>-<RML_END> <EVENT_START>-<EVENT_END> <RML_TEXT> <EVENT_TEXT>
[...]
|t|
REL <RML_START>-<RML_END> <EVENT_START>-<EVENT_END> <RML_TEXT> <EVENT_TEXT>
[...]
For example:
100001|t|Paciente de 65 a. de edad, que presentaba una elevación progresiva de las cifras de PSA desde 6 ng/ml a 12 ng/ml en el último año. Dicho paciente había sido sometido un año antes a una biopsia transrectal de próstata ecodirigida por sextantes que fue negativa. Se decide, ante la elevación del PSA, realizar una E-RME previa a la 2ª biopsia transrectal, en la que se objetiva una lesión hipointensa que abarca zona central i periférica del ápex del lóbulo D prostático. El estudio espectroscópico de ésta lesión mostró una curva de colina discretamente más elevada que la curva de citrato, con un índice de Ch-Cr/Ci > 0,80, que sugería la presencia de lesión neoplásica, por lo que se biopsia dicha zona por ecografía transrectal. La AP de la biopsia confirmó la presencia de un ADK próstata Gleason 6.
100001 REL 94-101 84-87 6 ng/ml PSA
100001 REL 104-112 84-87 12 ng/ml PSA
100001 REL 251-259 185-192 negativa biopsia
100001 REL 619-623 598-604 0,80 índice
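The format described above can be read with a few lines of Python. The following is a minimal sketch, not the official starting kit: the names Relation, Document, parse_span and parse_documents are hypothetical, and the sample is a shortened version of document 100001. It assumes that fields are separated by tabs (falling back to spaces, as in the example as displayed here) and that offsets are 0-based with an exclusive end, which is consistent with the example (characters 84-87 are "PSA").

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    rml: tuple[int, int]    # (start, end) offsets of the RML mention, e.g. "6 ng/ml"
    event: tuple[int, int]  # (start, end) offsets of the event mention, e.g. "PSA"


@dataclass
class Document:
    doc_id: str
    text: str
    relations: list[Relation] = field(default_factory=list)


def parse_span(token: str) -> tuple[int, int]:
    """Turn a 'START-END' token into a pair of integers."""
    start, end = token.split("-")
    return int(start), int(end)


def parse_documents(raw: str) -> list[Document]:
    """Parse blocks of 'ID|t|TEXT' followed by 'ID REL RML EVENT ...' lines."""
    docs: list[Document] = []
    current = None
    for line in raw.split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            current = None  # a blank line closes the current document
            continue
        if current is None:
            doc_id, _, text = line.partition("|t|")
            current = Document(doc_id, text)
            docs.append(current)
            continue
        # Assumption: tab-delimited fields; fall back to whitespace otherwise.
        fields = line.split("\t") if "\t" in line else line.split()
        if len(fields) >= 4 and fields[1] == "REL":
            current.relations.append(
                Relation(parse_span(fields[2]), parse_span(fields[3]))
            )
    return docs


# Shortened version of the example document, for illustration only.
sample = (
    "100001|t|Paciente de 65 a. de edad, que presentaba una elevación "
    "progresiva de las cifras de PSA desde 6 ng/ml a 12 ng/ml en el último año.\n"
    "100001\tREL\t94-101\t84-87\t6 ng/ml\tPSA\n"
    "100001\tREL\t104-112\t84-87\t12 ng/ml\tPSA\n"
)

docs = parse_documents(sample)
for rel in docs[0].relations:
    (rs, re_), (es, ee) = rel.rml, rel.event
    print(docs[0].text[rs:re_], "->", docs[0].text[es:ee])
```

Only the two offset fields are read from each relation line; the optional mention texts are recovered by slicing the document text, which also serves as a quick check that the offsets are interpreted correctly.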
You can download the data and a starting kit allowing you to generate a baseline submission from the "Files" tab.
The TESTLINK data is derived from the E3C corpus, which is released under CC-BY-NC-4.0 licence. Participants are free to use any data for model training, including the data provided for the CLinkaRT twin task. However, they are required to report all the data sources they utilized in their system report.
For more information about this shared task, please visit the TESTLINK webpage or the IberLEF website.
If you have any questions or comments about this shared task, please write to the contact person at altuna@fbk.eu.
This work has been partially supported by the European Language Grid project through its open call for pilot projects (EU grant no. 825627, E3C project), and by the Basque Government post-doctoral grant POS 2021 2 0030.
The task has been defined as a relation extraction (RE) task in which both the elements taking part in the relation and the directionality of the relation are considered. Participating systems are provided with raw text from clinical cases as input and asked to return a list of entity mention pairs for which a relation exists in the text. The system output includes both the input document and the annotated relations, as follows:
100001|t|Paciente de 65 a. de edad, que presentaba una elevación progresiva de las cifras de PSA desde 6 ng/ml a 12 ng/ml en el último año. Dicho paciente había sido sometido un año antes a una biopsia transrectal de próstata ecodirigida por sextantes que fue negativa. Se decide, ante la elevación del PSA, realizar una E-RME previa a la 2ª biopsia transrectal, en la que se objetiva una lesión hipointensa que abarca zona central i periférica del ápex del lóbulo D prostático. El estudio espectroscópico de ésta lesión mostró una curva de colina discretamente más elevada que la curva de citrato, con un índice de Ch-Cr/Ci > 0,80, que sugería la presencia de lesión neoplásica, por lo que se biopsia dicha zona por ecografía transrectal. La AP de la biopsia confirmó la presencia de un ADK próstata Gleason 6.
100001 REL 94-101 84-87 6 ng/ml PSA
100001 REL 104-112 84-87 12 ng/ml PSA
100001 REL 251-259 185-192 negativa biopsia
100001 REL 619-623 598-604 0,80 índice
Please note: In the annotated relations, the mention text span (e.g., 6 ng/ml, PSA) can be set but it is not used for evaluation.
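As an illustration of how a system might write its predictions in this format, here is a minimal Python sketch. The function name format_relation and its parameters are hypothetical and not part of any official tooling. It assumes tab-delimited fields and 0-based offsets with an exclusive end, which matches the example above; the optional mention texts are taken directly from the document text.

```python
def format_relation(doc_id, text, rml, event, include_text=True):
    """Build one 'REL' line for an ordered (RML, event) pair of spans.

    rml and event are (start, end) character offsets into text.
    Assumption: fields are joined with tabs, as in a tab-delimited file.
    """
    fields = [doc_id, "REL", f"{rml[0]}-{rml[1]}", f"{event[0]}-{event[1]}"]
    if include_text:
        # The mention text is optional and is not used for evaluation.
        fields += [text[rml[0]:rml[1]], text[event[0]:event[1]]]
    return "\t".join(fields)


# First sentence of the example document 100001.
text = (
    "Paciente de 65 a. de edad, que presentaba una elevación progresiva "
    "de las cifras de PSA desde 6 ng/ml a 12 ng/ml en el último año."
)
line = format_relation("100001", text, (94, 101), (84, 87))
print(line)
```

Because the order of the two spans is part of the evaluation, the RML span is always written first and the event span second.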
The task is divided into two tracks depending on the language: a Spanish track and a Basque track.
Please note that the data structure, annotation scheme and evaluation criteria are the same in both tracks.
We measure RE performance by standard Precision, Recall and F1 measure. A relation prediction is considered correct if the start and end character offsets of the two related entity mentions, as well as their order in the relation, are correct. We use the scorer of the BioCreative V CDR task to perform the evaluation:
eval_relation.sh PubTator gold_file prediction_file
Download the evaluation scorer from here.
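For intuition about the matching criterion, the following Python sketch computes the three measures over exact (document, RML span, event span) triples. It is not the BioCreative V CDR scorer and may differ from it in details; in particular, pooling all relations across documents (micro-averaging) is an assumption. The function name score and the predicted relations are hypothetical; the gold relations are those of the example document.

```python
def score(gold, pred):
    """Precision, recall and F1 over exact-match relation triples.

    Each relation is (doc_id, (rml_start, rml_end), (event_start, event_end)).
    A prediction counts only if both spans and their order match exactly.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [
    ("100001", (94, 101), (84, 87)),
    ("100001", (104, 112), (84, 87)),
    ("100001", (251, 259), (185, 192)),
    ("100001", (619, 623), (598, 604)),
]
# Hypothetical system output: two correct relations, one with the two
# mentions in the wrong order, and one with a wrong end offset.
pred = [
    ("100001", (94, 101), (84, 87)),
    ("100001", (104, 112), (84, 87)),
    ("100001", (185, 192), (251, 259)),
    ("100001", (619, 624), (598, 604)),
]
print(score(gold, pred))  # (0.5, 0.5, 0.5)
```

The swapped pair and the off-by-one span both count as errors, which shows why exact offsets and directionality matter for the final score.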
The TESTLINK data has been extracted from the E3C corpus, which is released under CC-BY-NC-4.0 licence.
There is no data usage restriction for model training. The training data released for the CLinkaRT twin task can be used in this task, as well as any other dataset the participants consider relevant. Nonetheless, all the data used for training must be specified in the system report.
Start: March 17, 2023, midnight
Description: The scorer is not working; measure your results locally with the scorer from the "Evaluation" page.
Start: April 17, 2023, midnight
Description: Spanish track. Submit your annotated file through this page. It will be assessed locally and the results will be posted at http://e3c.fbk.eu
Start: April 17, 2023, midnight
Description: Basque track. Submit your annotated file through this page. It will be assessed locally and the results will be posted at http://e3c.fbk.eu
May 8, 2023, midnight