Important dates: (tentative)
The dysmorphology physical examination is a critical component of the diagnostic evaluation in clinical genetics. The examination catalogues the often minor morphological differences of the patient's face or body, but it may also identify more general medical signs such as neurologic dysfunction. The findings make it possible to match the patient against known rare genetic diseases. They therefore directly influence clinical diagnosis, the selection of genetic testing, and the interpretation of results, particularly when testing reveals variants of uncertain clinical significance. Beyond the clinic, such information is also useful to researchers attempting to delineate undescribed genetic conditions or to further our understanding of existing ones.
Although these medical findings are key information, they are nearly always captured within the electronic health record (EHR) as unstructured free text, making them unavailable for downstream computational analysis. Advanced Natural Language Processing methods are therefore required to retrieve the information from the records.
For the BioCreative VIII shared task, we call for automated systems to extract and normalize the key findings in observations written during dysmorphology physical examinations.
Dysmorphology physical examinations are frequently documented in the EHR as a series of organ system observations. For example:
PHYSICAL EXAMINATION
FACE: slightly inverted triangular face shape
EYES: long palpebral fissures with slight downslant. Sparse lateral eyebrows.
EARS: Thin inferior helices, low-set
NOSE: Short, wide nasal bridge. Anteverted nares.
MOUTH: thin upper lip; palate intact
CHEST: supernumerary nipple inferior to left nipple
HANDS FEET: Long fingers, normal toes
NEUROLOGIC: Resting tremor. Wide-based, unsteady gait.
Similar to clinical workflows, we will standardize the description of dysmorphic findings using the Human Phenotype Ontology, an ontology specially designed for human genetics.
A successful system should extract the spans of text referring to the key positive findings and normalize them to term IDs in the HPO. The system should ignore the normal findings. For example, in the organ system observation: EYES: long palpebral fissures with slight downslant. Normal eyebrows. A system should extract the spans of the two key findings [long palpebral fissures] and [palpebral fissures with slight downslant], and normalize them to the term IDs HP:0000637 and HP:0000494, respectively. The system should ignore the normal finding [Normal eyebrows].
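As a rough illustration of the expected output for this observation, the sketch below stores each finding as character offsets plus an HPO term ID. The `Finding` class, the `locate` helper, and the offset convention (0-based, end-exclusive) are assumptions made for this example, not the official submission format of the task.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical record: one key finding and its normalized HPO term ID."""
    segments: Tuple[Tuple[int, int], ...]  # (start, end) offsets, end exclusive
    hpo_id: str

    def text(self, observation: str) -> str:
        # Rebuild the surface form by joining the segments with a space.
        return " ".join(observation[s:e] for s, e in self.segments)


def locate(observation: str, phrase: str) -> Tuple[int, int]:
    """Return the offsets of the first occurrence of phrase (illustration only)."""
    start = observation.index(phrase)
    return (start, start + len(phrase))


observation = "EYES: long palpebral fissures with slight downslant. Normal eyebrows."

# The two key positive findings named in the text; the normal finding
# "Normal eyebrows" is deliberately left out.
findings = [
    Finding((locate(observation, "long palpebral fissures"),), "HP:0000637"),
    Finding((locate(observation, "palpebral fissures with slight downslant"),), "HP:0000494"),
]

for f in findings:
    print(f.hpo_id, f.segments, f.text(observation))
```

Note that the two spans share the characters of "palpebral fissures", so a representation that allows only one label per token would not be able to hold both findings.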
During the competition, participants will be able to perform one of the following subtasks:
Details:
By submitting results to this competition, you consent to the public release of your scores at the BioCreative workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers.
You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science.
You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers.
You further agree to submit and present a short paper describing your system during the workshop.
You agree not to redistribute the training and test data except in the manner prescribed by its licence.
Both steps, the extraction and the normalization, are particularly difficult on dysmorphology physical examinations given the current state-of-the-art of natural language processing.
Extraction. This step is challenging due to the descriptive style of the examinations and their polarity. The observations are short reports where, for conciseness, the span of a finding can be disjoint or can overlap with the span of another finding. The EYES observation above is an example of overlapping findings, with the span palpebral fissures contributing to both the HP:0000637 and HP:0000494 terms. For disjoint findings, i.e. findings defined by non-consecutive segments of text, consider the term Short nasal bridge - HP:0003194 in the observation NOSE: Short, wide nasal bridge. Anteverted nares. Extractors should go beyond the standard sequence labeling approach which, being designed to extract contiguous and mutually exclusive named entities, fails to capture the disjoint and overlapping terms. As an additional challenge, the extractor should also resolve the polarity of the findings, that is, automatically detect and ignore normal findings and return only the key positive findings.
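The disjoint and overlapping cases can be made concrete with a small sketch on the NOSE observation. The helper names (`find`, `surface`, `is_disjoint`, `overlaps`) and the offset convention are hypothetical and chosen only for this illustration.

```python
observation = "NOSE: Short, wide nasal bridge. Anteverted nares."


def find(phrase, text):
    """Return (start, end) offsets, end exclusive, of the first occurrence."""
    start = text.index(phrase)
    return (start, start + len(phrase))


# "Short nasal bridge" (HP:0003194) is spread over two non-consecutive segments.
short_bridge = [find("Short", observation), find("nasal bridge", observation)]
# The second finding in the same sentence is contiguous.
wide_bridge = [find("wide nasal bridge", observation)]


def surface(segments, text):
    """Join the segments of a finding into a readable string."""
    return " ".join(text[s:e] for s, e in segments)


def is_disjoint(segments):
    """A finding is disjoint when it needs more than one segment."""
    return len(segments) > 1


def overlaps(a, b):
    """True if any segment of finding a shares characters with a segment of b."""
    return any(s1 < e2 and s2 < e1 for s1, e1 in a for s2, e2 in b)


print(surface(short_bridge, observation), is_disjoint(short_bridge))
print(surface(wide_bridge, observation), is_disjoint(wide_bridge))
print(overlaps(short_bridge, wide_bridge))
```

Here the segment "nasal bridge" belongs to both findings, and the first finding skips over "wide". A single BIO tag per token cannot encode either situation, which is why a list-of-segments representation is used in this sketch.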
Normalization. This step is also challenging, due to both the large scale of the HPO and its incompleteness. Standard strategies for multi-label classification are designed to assign small sets of classes to input instances. However, to be successful in our task, a normalizer should adapt traditional strategies to assign one term from among the 17,000 terms in the HPO to each finding detected in an observation. This must frequently occur without supervision, since our training set does not provide examples of use for all terms in the HPO. Furthermore, while specifically designed for human genetics and constantly improving, the HPO does not have standardized levels of term detail. As a consequence, a key finding may need to be matched with a close ancestor in the hierarchy of the ontology. Strict string matching is ineffective in such cases, since the label of the ancestor in the HPO differs from the string of the key finding in the observation. For example, there exist both Nevus flammeus of the eyelid - HP:0010733 and Nevus flammeus of the forehead - HP:0007413, but no term for the nose, leaving only the generic Nevus flammeus - HP:0001052 to normalize this abnormality of the nose when it is mentioned in an observation.
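The fallback to a more generic term can be illustrated with a toy heuristic over the three terms mentioned above. This is only a sketch under strong assumptions: the `TOY_HPO` dictionary holds three labels rather than the full ontology, the token-subset rule stands in for a real use of the HPO hierarchy, and it is not the method used or recommended by the organizers.

```python
# Toy excerpt of the ontology: only the three terms named in the text,
# with labels lowercased and the spelling "nevus" used throughout.
TOY_HPO = {
    "HP:0010733": "nevus flammeus of the eyelid",
    "HP:0007413": "nevus flammeus of the forehead",
    "HP:0001052": "nevus flammeus",
}


def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return set(text.lower().split())


def normalize(finding):
    """Pick the most specific toy term whose label tokens all occur in the finding.

    An exact match wins because it covers the most tokens; otherwise the
    function falls back to a shorter, more generic label. Returns None when
    no label is fully contained in the finding.
    """
    finding_tokens = tokens(finding)
    best_id, best_size = None, 0
    for hpo_id, label in TOY_HPO.items():
        label_tokens = tokens(label)
        if label_tokens <= finding_tokens and len(label_tokens) > best_size:
            best_id, best_size = hpo_id, len(label_tokens)
    return best_id


print(normalize("Nevus flammeus of the forehead"))  # specific term exists
print(normalize("Nevus flammeus of the nose"))      # falls back to generic term
print(normalize("Long fingers"))                    # not covered by the toy table
```

A realistic normalizer would also have to cope with synonyms, spelling variants, and paraphrases, which a token-subset rule does not handle.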
This task is a part of a bigger competition: BioCreative VIII. To register, please follow the link "team registration page" in the section TEAM REGISTRATION of the BioCreative VIII page.
Registration is free but required to access the training and evaluation data. If you have any questions, please contact Davy Weissenbacher.
The results of the task will be released during the BioCreative VIII workshop and published in the proceedings of the event.
Start: April 14, 2023, midnight
Start: Sept. 14, 2023, 11 p.m.
Start: Sept. 19, 2023, 1 a.m.
Start: July 14, 2023, midnight
Start: Sept. 14, 2023, 11 p.m.
Start: Sept. 19, 2023, 1 a.m.
# | Username | Score |
---|---|---|
1 | QJW | - |
2 | DUTIR-BioNLP | - |