SMM4H-HeaRD 2025 Task 4 - Detection of Insomnia in Clinical Notes

Organized by guilopgar


Overview

This shared task aims at developing automatic systems to identify patients potentially suffering from insomnia using electronic health records (EHRs). It is structured as a text classification challenge in which participants analyze a clinical note to determine whether the patient is likely to have insomnia.

We have developed a comprehensive set of rules (Insomnia rules) to facilitate the identification of patients potentially suffering from insomnia. These rules incorporate both direct and indirect symptoms of insomnia and include information about commonly prescribed hypnotic medications. For this task, we have curated an annotated corpus of 210 clinical notes from the MIMIC-III database, adhering to the Insomnia rules during the annotation process. Each note is annotated with a binary label indicating the patient's overall insomnia status ("yes" or "no"), as well as at the rule level, indicating whether each rule is satisfied based on the note's content. Additionally, to enhance the explainability of participating NLP systems, we provide textual evidence from the clinical notes that supports each annotation, ensuring that system outputs can be effectively justified.

Participants are encouraged to use large language models (LLMs) to tackle the Insomnia detection task. This shared task serves as an exceptional benchmark to assess the reasoning capabilities of LLMs in medicine, applying a realistic set of diagnostic guidelines to real-world clinical data.

Task

This text classification shared task is divided into three distinct subtasks:

  • Subtask 1: Binary text classification. Assess whether the patient described in a clinical note is likely to have insomnia ("yes" or "no"). Evaluation is based on the F1 score, treating "yes" as the positive class.
  • Subtask 2A: Multi-label text classification. Evaluate each clinical note against the defined Insomnia rules: Definition 1, Definition 2, Rule A, Rule B, and Rule C, predicting "yes" or "no" for each item. The micro-average F1 score is the primary metric, with "yes" treated as the positive class.
  • Subtask 2B: Evidence-Based Classification. This task extends Subtask 2A by requiring not only classification of each item but also the identification and extraction of text evidence from the clinical note that supports each classification. For items Definition 1, Definition 2, Rule B, and Rule C, participants must provide a label ("yes" or "no") and include specific text spans from the note that justify the classification. The alignment of text spans with the reference spans from the clinical notes will be mainly assessed using the ROUGE metric. This subtask focuses on promoting transparency and explainability in NLP models by requiring justification for each decision made.
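The Subtask 2A metric can be illustrated with a short sketch. The snippet below computes micro-average Precision, Recall, and F1 by pooling true/false positives and false negatives across all five rule items, with "yes" as the positive class. The gold and predicted labels used are hypothetical examples, not real annotations; the official scorer lives in the task's GitHub repository.

```python
# Illustrative sketch of the Subtask 2A metric: micro-average P/R/F1 with
# "yes" as the positive class, pooled over all five Insomnia rule items.

ITEMS = ["Definition 1", "Definition 2", "Rule A", "Rule B", "Rule C"]

def micro_f1(gold, pred):
    """gold/pred: {note_id: {item: "yes"/"no"}}. Returns (precision, recall, f1)."""
    tp = fp = fn = 0
    for note_id, gold_items in gold.items():
        pred_items = pred[note_id]
        for item in ITEMS:
            g = gold_items[item] == "yes"
            p = pred_items[item] == "yes"
            tp += g and p            # predicted "yes", gold "yes"
            fp += (not g) and p      # predicted "yes", gold "no"
            fn += g and (not p)      # predicted "no", gold "yes"
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because the counts are pooled before computing the scores (micro-averaging), every rule-level decision contributes equally, regardless of which item it belongs to.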

To participate in #SMM4H 2025 Task 4, please register your team here with the same e-mail address as your CodaLab account. When your registration is approved, you will be invited to a Google group, where the training, validation, and test data will be made available. Please check the #SMM4H 2025 website for important dates.

GitHub repository: https://github.com/guilopgar/SMM4H-HeaRD-2025-Task-4-Insomnia

Evaluation

  • Subtask 1: Binary text classification. The performance in this subtask is evaluated using the Precision, Recall and F1 scores. The "yes" label is treated as the positive class.
  • Subtask 2A: Multi-label text classification. The micro-average Precision, Recall and F1 scores serve as the evaluation metrics. The "yes" label is considered the positive class for each item in the Insomnia rules (Definition 1, Definition 2, Rule A, Rule B, and Rule C).
  • Subtask 2B: Evidence-Based Classification. The alignment of text spans provided by participants with the reference spans from the clinical notes is assessed using macro-average ROUGE-L Precision, Recall and F1 scores.

The evaluation scripts are available in the evaluation folder within the GitHub repository for the task.
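For Subtask 2B, ROUGE-L scores a predicted span against a reference span via their longest common subsequence (LCS) of tokens. The minimal sketch below uses simple whitespace tokenization purely for illustration; the official evaluation scripts in the GitHub repository define the authoritative tokenization and aggregation.

```python
# Minimal ROUGE-L sketch: precision = LCS/len(pred), recall = LCS/len(ref),
# F1 = harmonic mean. Whitespace tokenization is an assumption for clarity.

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(pred_span, ref_span):
    pred, ref = pred_span.split(), ref_span.split()
    lcs = lcs_length(pred, ref)
    precision = lcs / len(pred) if pred else 0.0
    recall = lcs / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Note that the LCS need not be contiguous, so a predicted span that interleaves extra words with the reference wording is penalized in precision but can still achieve high recall.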

Terms and Conditions

By submitting results to this competition, you consent to the public release of your scores at the SMM4H'25 workshop and in the associated proceedings, at the task organizers' discretion. Scores may include, but are not limited to, automatic and manual quantitative judgements, qualitative judgements, and such other metrics as the task organizers see fit. You accept that the ultimate decision of metric choice and score value is that of the task organizers. You further agree that the task organizers are under no obligation to release scores and that scores may be withheld if it is the task organizers' judgement that the submission was incomplete, erroneous, deceptive, or violated the letter or spirit of the competition's rules. Inclusion of a submission's scores is not an endorsement of a team or individual's submission, system, or science. You further agree that your system may be named according to the team name provided at the time of submission, or to a suitable shorthand as determined by the task organizers. You further agree to submit and present a short paper describing your system during the workshop. You agree not to redistribute the training and test data without the explicit approval of the organizers.

Contact Information

Guillermo Lopez-Garcia, Cedars-Sinai Medical Center, USA (Guillermo.LopezGarcia@cshs.org)

Submission Format

For each subtask, ground truth annotations are provided in JSON format. Participants are required to submit their system outputs following the same format as the ground truth annotations provided by the organizers.

  • Subtask 1: Binary text classification. System predictions should be submitted as a ZIP file containing a single JSON file named "subtask_1.json". Participants must predict the Insomnia status of each note by providing a "yes" or "no" label. Please see this sample JSON file for guidance on how to format the predictions for Subtask 1.
  • Subtask 2A: Multi-label text classification. System predictions should be submitted as a ZIP file containing a single JSON file named "subtask_2a.json". Participants must predict a "yes" or "no" label for each item of the Insomnia rules: Definition 1, Definition 2, Rule A, Rule B, and Rule C. Please see this sample JSON file for guidance on how to format the predictions for Subtask 2A.
  • Subtask 2B: Evidence-Based Classification. System predictions should be submitted as a ZIP file containing a single JSON file named "subtask_2b.json". For items Definition 1, Definition 2, Rule B, and Rule C, participants must provide a label ("yes" or "no") and include a list of specific text spans from the note that justify each classification. Note that text spans are required only when the corresponding item is assigned a "yes" label. For items assigned a "no" label, participants should submit an empty list [] for the text spans, indicating no justification is required. Please see this sample JSON file for guidance on how to format the predictions for Subtask 2B.
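Packaging a submission can be sketched as follows. The exact JSON schema is defined by the organizers' sample files; the flat note-ID-to-label mapping below is an assumed, illustrative format only, shown for Subtask 1.

```python
# Hypothetical sketch of packaging a Subtask 1 submission: a ZIP archive
# containing a single JSON file named "subtask_1.json". The prediction
# format here is an assumption; follow the organizers' sample JSON file.
import json
import zipfile

predictions = {"note_001": "yes", "note_002": "no"}  # placeholder predictions

with open("subtask_1.json", "w") as f:
    json.dump(predictions, f, indent=2)

with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write("subtask_1.json")
```

The same pattern applies to Subtasks 2A and 2B, substituting "subtask_2a.json" or "subtask_2b.json" and the corresponding prediction structure (rule-level labels, plus evidence span lists for 2B).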

Subtask 1 - Binary, Practice

Start: March 1, 2025, midnight

Description: Practice phase: please submit predictions on validation data.

Subtask 1 - Binary, Evaluation

Start: April 7, 2025, midnight

Description: Evaluation phase: please submit predictions on test data. The results obtained here will be used for the official evaluation of the competition.

Subtask 1 - Binary, Post-Eval

Start: April 15, 2025, midnight

Description: Post-Evaluation phase: please submit predictions on test data. This phase starts after the end of the competition.

Subtask 2A - Multi-label, Practice

Start: March 1, 2025, midnight

Description: Practice phase: please submit predictions on validation data.

Subtask 2A - Multi-label, Evaluation

Start: April 7, 2025, midnight

Description: Evaluation phase: please submit predictions on test data. The results obtained here will be used for the official evaluation of the competition.

Subtask 2A - Multi-label, Post-Eval

Start: April 15, 2025, midnight

Description: Post-Evaluation phase: please submit predictions on test data. This phase starts after the end of the competition.

Subtask 2B - Evidence-based, Practice

Start: March 1, 2025, midnight

Description: Practice phase: please submit predictions on validation data.

Subtask 2B - Evidence-based, Evaluation

Start: April 7, 2025, midnight

Description: Evaluation phase: please submit predictions on test data. The results obtained here will be used for the official evaluation of the competition.

Subtask 2B - Evidence-based, Post-Eval

Start: April 15, 2025, midnight

Description: Post-Evaluation phase: please submit predictions on test data. This phase starts after the end of the competition.

Competition Ends

Never

Leaderboard

Rank  Username      Score
1     prajaktakini  0.53
2     RBG-AI        0.45
3     swendelken    0.41