ClinSpEn 2022 (subtrack of Biomedical WMT @ EMNLP)

Organized by darrylestrada97

First phase

Task 1 - ClinSpEn-Clinical Cases [EN->ES]
Aug. 1, 2022, midnight UTC

End

Competition Ends
Sept. 1, 2023, 11 p.m. UTC

ClinSpEn is part of the Biomedical WMT 2022 shared task. It aims to promote the development and evaluation of machine translation systems adapted to the medical domain through three highly relevant sub-tracks: clinical cases, medical controlled vocabularies/ontologies, and clinical terms and entities extracted from medical content.

- ClinSpEn sub-track website: https://temu.bsc.es/clinspen/
- Biomedical WMT 2022 website: https://statmt.org/wmt22/biomedical-translation-task.html

Motivation

Machine translation applied to the clinical domain is an especially challenging task due to the complexity of medical language and the heavy use of health-related technical terms and medical expressions. For this reason, there is a large community of specialized medical translators who can handle medical narratives, terminologies, and ambiguous abbreviations and acronyms.

Given the relevance, impact and diversity of health-related content, as well as the rapidly growing number of publications, EHRs, clinical trials, informed consent documents and medical terminologies, there is a pressing need for more robust medical machine translation resources together with independent quality evaluation scenarios.

Recent advances in machine translation technologies, together with the use of other NLP components, are showing promising results. Domain adaptation of MT approaches can therefore have a significant impact in unlocking key information from medical content.

The ClinSpEn data covers three types of data that are highly relevant to the biomedical domain: clinical cases, clinical terminology and ontology concepts.

Sub-tracks

ClinSpEn comprises three sub-tracks:

  • ClinSpEn-CC (clinical cases): EN>ES translation of clinical cases using a collection of 202 parallel COVID-19 clinical case reports.
  • ClinSpEn-CT (clinical terms): ES>EN translation of clinical terminology using a collection of over 19 000 parallel terms obtained from biomedical literature and electronic health records.
  • ClinSpEn-OC (ontology concepts): EN>ES translation using a collection of over 2 000 parallel concepts obtained from different biomedical ontologies.

The sample, test and background data for each sub-track can be found under the Participate tab.

Competition flow

Participant systems are evaluated on each sub-track separately using five metrics: COMET, METEOR, SacreBLEU, BLEU and ROUGE. SacreBLEU is the primary metric. Participants may upload up to 7 predictions for each sub-track.
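To give a rough idea of what a BLEU-style score measures, here is a minimal sketch of unsmoothed corpus-level BLEU in plain Python. It is not the official evaluation script: it assumes whitespace tokenization and a single reference per segment, so its numbers will generally differ from SacreBLEU, which applies its own tokenization and smoothing. The function names `ngrams` and `corpus_bleu` are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) found in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the official script): whitespace
    tokenization, one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty only applies when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

Calling `corpus_bleu(predictions, references)` on two equally long lists of strings returns one score for the whole set, which is how corpus-level metrics are usually reported on a leaderboard.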

The evaluation script was kindly shared by the organizers of MedMTEval, a competition focused on the automatic translation of medical texts from Russian to English and part of the AINL 2022 conference. For more information, please check their article "E. Ezhergina, M. Fedorova, V. Malykh, and D. Petrova. Findings of Biomedical Russian-English MT Competition. To appear in AINL 2022 Proceedings." Special thanks to Tom Kocmi, one of the developers of the OCELOT evaluation tool, for his support.

All submissions must be uploaded as a ZIP file containing a TSV file. Depending on the sub-track, the TSV file must include the following columns:

- Sub-track 1 (Clinical Cases): document number, line number, predicted translation.

- Sub-track 2 (Clinical Terms): term number, predicted translation.

- Sub-track 3 (Ontology Concepts): concept number, predicted translation.

A header row is optional.
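The submission format described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the file name `predictions.tsv`, the UTF-8 encoding, and the helper name `build_submission` are not specified by the organizers, so the submission instructions document takes precedence.

```python
import io
import zipfile


def build_submission(rows, tsv_name="predictions.tsv"):
    """Return the bytes of a ZIP archive holding one TSV file built from rows.

    Each row is a sequence of column values, for example
    (document number, line number, predicted translation) for sub-track 1.
    The TSV file name and UTF-8 encoding are assumptions.
    """
    lines = []
    for row in rows:
        fields = [str(field) for field in row]
        # A tab or newline inside a field would break the column layout.
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError(f"field contains a tab or newline: {fields!r}")
        lines.append("\t".join(fields))
    tsv_text = "\n".join(lines) + "\n"

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tsv_name, tsv_text.encode("utf-8"))
    return zip_buffer.getvalue()


# Hypothetical sub-track 1 rows: document number, line number, translation.
example_rows = [
    (1, 1, "Paciente de 45 años con fiebre y tos."),
    (1, 2, "Se confirmó infección por SARS-CoV-2."),
]
submission_bytes = build_submission(example_rows)
```

Writing `submission_bytes` to a file opened in binary mode (for example `open("submission.zip", "wb")`) gives an archive ready to upload. For sub-tracks 2 and 3 the rows would hold two values each: the term or concept number and the predicted translation.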

Please check the submission instructions document.

Task 1 - ClinSpEn-Clinical Cases [EN->ES]

Start: Aug. 1, 2022, midnight

Description: EN -> ES translation of clinical cases, using a collection of COVID-19 clinical case reports, plus Background Set. The leaderboard will be shared with the participants on the day of the conference.

Task 2 - ClinSpEn-Clinical Terms [ES->EN]

Start: Aug. 1, 2022, midnight

Description: ES -> EN translation of clinical terminology, using a collection of parallel terms obtained from biomedical literature and electronic health records, plus Background Set. The leaderboard will be shared with the participants on the day of the conference.

Task 3 - ClinSpEn-Ontology Concepts [EN->ES]

Start: Aug. 1, 2022, midnight

Description: EN -> ES translation of a collection of parallel concepts obtained from different biomedical ontologies, plus Background Set. The leaderboard will be shared with the participants on the day of the conference.

Competition Ends

Sept. 1, 2023, 11 p.m.