MEDDOPLACE stands for MEDical DOcument PLAce-related Content Extraction. It is a shared task and set of resources focused on the detection of different kinds of places, and related types of information such as nationalities or patient movements, in medical documents in Spanish.
For more information about the task, data examples, schedule, etc. please visit https://temu.bsc.es/meddoplace.
To contact the organizers directly, you can write to <salvador.limalopez [at] gmail.com> or <krallinger.martin [at] gmail.com>.
MEDDOPLACE offers four different sub-tasks, each focused on a specific problem:
- SUB-TASK 1: Named Entity Recognition
This is a classic NER task where participants have to detect entities in text, using the full-text documents as input.
- SUB-TASK 2: Entity Linking / Toponym Resolution
In this sub-task, participants must assign a unique identifier to every text mention in order to disambiguate it. This sub-task is divided into three tracks: GeoNames normalization (Sub-task 2.1), Plus Codes normalization (Sub-task 2.2) and SNOMED CT normalization (Sub-task 2.3).
- SUB-TASK 3: Entity Classification
This sub-task involves further classifying location entities in text into pre-defined classes of clinical relevance, such as birth place, residence or healthcare attention.
- SUB-TASK 4: End-to-end
This sub-task challenges participants to do all three previous sub-tasks at once. That is, they must detect entities in text, normalize them to the corresponding ontology and classify the appropriate entities. The main difference with the previous sub-tasks is that the normalization and classification systems will not be evaluated in a vacuum, but will instead depend on the mentions detected by the NER system.
For more information about each of them, please refer to the MEDDOPLACE website and the Task Guide.
To allow the individual evaluation of normalization and classification systems, the evaluation will be divided into two phases: Phase 1 covers Sub-tasks 1 and 4, which work from the raw text files, and Phase 2 covers Sub-tasks 2 and 3, which work from a fixed list of entities.
Because of this, the test data will be released in two parts. For Phase 1, only the test text files will be released. Once Phase 1 is finished, the list of entities found in the test files will be published so that participants can produce normalization and classification predictions for Phase 2. Participants are allowed to re-use their systems for Sub-task 4 in the rest of the tasks.
The schedule for these phases is available on https://temu.bsc.es/meddoplace/schedule/.
You will find that in this CodaLab there is a Practice and a Final submission for each sub-task. Use the Practice submission to check that everything works correctly and to evaluate your system on a reduced version of the test set. Once you feel you are ready, use the Final submission to upload the predictions that you want to count towards the task; these will be evaluated on the complete test set. If you need to run some evaluations on your own, you can also use the MEDDOPLACE scorer.
For more information on the evaluation and sub-tasks, including data formats and tips, please read the Task Guide.
Tasks 1 and 4 will use strict, micro-averaged precision, recall and F1 score as their main metric, while Tasks 2 and 3 will use accuracy (the percentage of correct mentions out of the total); a small scoring sketch is included after the table below. In addition, some sub-tasks include additional metrics that may help further understand and interpret the results:
| Sub-task | Additional Metrics |
|---|---|
| Sub-task 1 (NER) | Overlapping, micro-averaged precision, recall and F1 score |
| Sub-task 2.1 (GeoNames normalization) | Accuracy@161km, Area Under the Curve (AUC), Mean and Median Error |
| Sub-task 2.2 (PlusCodes normalization) | Accuracy@161km, Area Under the Curve (AUC), Mean and Median Error |
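To make the main metric for Sub-tasks 1 and 4 concrete, here is a minimal sketch of strict, micro-averaged precision, recall and F1: a prediction only counts if the document, the span boundaries and the label all match exactly. The tuple layout and the label names are assumptions made for illustration; the official MEDDOPLACE scorer remains the reference implementation.

```python
# Minimal sketch of strict, micro-averaged precision, recall and F1.
# A mention is represented here as a (doc_id, start, end, label) tuple;
# this layout and the labels below are illustrative, not the official format.

def strict_micro_prf(gold_mentions, pred_mentions):
    gold, pred = set(gold_mentions), set(pred_mentions)
    tp = len(gold & pred)                     # exact span + label matches only
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {("caso_001", 10, 19, "LOC"), ("caso_001", 40, 52, "LOC")}
pred = {("caso_001", 10, 19, "LOC"), ("caso_001", 60, 66, "LOC")}
print(strict_micro_prf(gold, pred))           # (0.5, 0.5, 0.5)
```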
The additional metrics given for the GeoNames and PlusCodes normalizations are distance-based metrics, which are more appropriate for Toponym Resolution. They are described in:
Gritta, M., Pilehvar, M.T. & Collier, N. A pragmatic guide to geoparsing evaluation. Lang Resources & Evaluation 54, 683–712 (2020). https://doi.org/10.1007/s10579-019-09475-3 [pages 694-697]
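The distance-based metrics can be sketched as follows once GeoNames or Plus Codes predictions have been resolved to latitude/longitude pairs. The haversine distance, the 161 km threshold and the mean/median error follow the description above; the function names, input layout and example coordinates are assumptions, and the AUC of the error curve from Gritta et al. is not reproduced here.

```python
# Sketch of distance-based toponym-resolution metrics (Accuracy@161km,
# mean and median error). Inputs are (lat, lon) pairs in decimal degrees.
# Illustrative approximation only; the AUC metric is omitted.
from math import radians, sin, cos, asin, sqrt
from statistics import mean, median

def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))         # Earth radius of ~6371 km

def distance_metrics(gold_coords, pred_coords, threshold_km=161.0):
    errors = [haversine_km(g, p) for g, p in zip(gold_coords, pred_coords)]
    return {
        "acc@161km": sum(e <= threshold_km for e in errors) / len(errors),
        "mean_km": mean(errors),
        "median_km": median(errors),
    }

gold = [(41.3874, 2.1686), (40.4168, -3.7038)]   # approx. Barcelona, Madrid
pred = [(41.40, 2.17), (48.8566, 2.3522)]        # near Barcelona, Paris
print(distance_metrics(gold, pred))              # acc@161km = 0.5
```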
Predictions must be submitted as .TSV files with one annotation per row (the same format as the training data). You must include headers as the first row. These are the columns for each sub-task (provided fields are shown in regular type, while the fields that participants must predict are in italics):
There is more information on format specifics and the meaning of each of the columns in the Task Guide.
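Purely as an illustration of the "one annotation per row, headers first" layout, the snippet below writes a tab-separated predictions file with Python's csv module. The column names and values are placeholders; the actual columns required for each sub-task are the ones defined in the Task Guide.

```python
# Illustrative only: writes a headered, tab-separated predictions file.
# The column names below are placeholders; check the Task Guide for the
# exact columns required by each sub-task.
import csv

COLUMNS = ["filename", "label", "start_span", "end_span", "text"]  # hypothetical

predictions = [
    {"filename": "caso_001", "label": "LOC", "start_span": 120,
     "end_span": 129, "text": "Barcelona"},
]

with open("predictions.tsv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(predictions)
```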
With your Final submissions, you also need to add a .TXT file that includes:
Package one .TSV file (and one .TXT file) per submission into a .ZIP file and give it a recognizable name (e.g. your team's name + a short description + a number; teambsc_roberta_01.zip). It is important that there is only one .TSV file inside your .ZIP file; otherwise the scorer will fail.
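One way to assemble the final archive is sketched below, using example file names (predictions.tsv, description.txt, teambsc_roberta_01.zip); the check at the end mirrors the requirement that the ZIP contains exactly one .TSV file.

```python
# Packages one .TSV and one .TXT into a submission ZIP and verifies that the
# archive contains exactly one .TSV file (the scorer fails otherwise).
# File names are examples only.
import zipfile

def package_submission(tsv_path, txt_path, zip_name):
    with zipfile.ZipFile(zip_name, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tsv_path)
        zf.write(txt_path)
    with zipfile.ZipFile(zip_name) as zf:
        tsvs = [n for n in zf.namelist() if n.lower().endswith(".tsv")]
        assert len(tsvs) == 1, "The ZIP must contain exactly one .TSV file"

package_submission("predictions.tsv", "description.txt", "teambsc_roberta_01.zip")
```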
| Phase | Start | Description |
|---|---|---|
| Task 1 NER [Practice] | May 1, 2023, midnight | Use this task for non-final results. |
| Task 1 NER [Final] | May 1, 2023, midnight | Submit your final results to this task. |
| Task 2.1 GeoNames [Practice] | June 8, 2023, midnight | Use this task for non-final results. |
| Task 2.1 GeoNames [Final] | June 8, 2023, midnight | Submit your final results to this task. |
| Task 2.2 PlusCodes [Practice] | June 8, 2023, midnight | Use this task for non-final results. |
| Task 2.2 PlusCodes [Final] | June 8, 2023, midnight | Submit your final results to this task. |
| Task 2.3 Snomed-CT [Practice] | June 8, 2023, midnight | Use this task for non-final results. |
| Task 2.3 Snomed-CT [Final] | June 8, 2023, midnight | Submit your final results to this task. |
| Task 3 Classification [Practice] | June 8, 2023, midnight | Use this task for non-final results. |
| Task 3 Classification [Final] | June 8, 2023, midnight | Submit your final results to this task. |
| Task 4 End-to-End [Practice] | May 1, 2023, midnight | Use this task for non-final results. |
| Task 4 End-to-End [Final] | May 1, 2023, midnight | Submit your final results to this task. |
End: June 17, 2023, noon