This competition uses data from MIMIC-CXR-JPG v2.0.0, which requires credentialing through PhysioNet and a signed data use agreement (DUA) for MIMIC-CXR-JPG. To participate in this competition, you must follow these steps:
If you have completed these steps correctly, you will be admitted to the competition and we will provide links to download the necessary data by email! You are not permitted to share these labels in any way.
Chest radiography, like many diagnostic medical exams, produces a long-tailed distribution of clinical findings; while a small subset of diseases is routinely observed, the vast majority of diseases are relatively rare [1]. This poses a challenge for standard deep learning methods, which exhibit bias toward the most common classes at the expense of the important but rare "tail" classes [2]. Many methods have been proposed to tackle this specific type of imbalance [3], though only recently has attention been given to long-tailed medical image recognition problems [4-6]. Diagnosis on chest X-rays (CXRs) is also a multi-label problem, as patients often present with multiple disease findings simultaneously; however, only a select few studies incorporate knowledge of label co-occurrence into the learning process [7-10]. Since most large-scale image classification benchmarks contain single-label images with a mostly balanced distribution of labels, many standard deep learning methods fail to accommodate the class imbalance and co-occurrence problems posed by the long-tailed, multi-label nature of tasks like disease diagnosis on CXRs [2]. This task evaluates a model's ability to perform "in-distribution" long-tailed, multi-label disease classification on CXRs: models are assessed on a large test set with noisy, automatically text-mined labels whose classes were all encountered during training.
This challenge will use an expanded version of MIMIC-CXR-JPG [11], a large benchmark dataset for automated thorax disease classification. Each CXR study in the dataset was labeled with 26 newly added disease findings (see figure above) extracted from the associated radiology reports. The resulting long-tailed (LT) dataset contains 377,110 CXRs, each labeled with at least one of 40 clinical findings (including a "Normal" class).
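To get a feel for the label distribution, a short sketch like the one below can tally per-class counts. The file name train_labels.csv and the column layout (a dicom_id column followed by one binary column per finding) are assumptions for illustration only, not the exact released format.

import pandas as pd

# Hypothetical layout: one row per image, a "dicom_id" column followed by one
# binary (0/1) column per clinical finding; adjust names to the released files.
labels = pd.read_csv("train_labels.csv")
finding_cols = [c for c in labels.columns if c != "dicom_id"]

# Per-class positive counts, sorted to expose the long tail.
print(labels[finding_cols].sum().sort_values(ascending=False))

# Fraction of images carrying more than one finding (multi-label co-occurrence).
print((labels[finding_cols].sum(axis=1) > 1).mean())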
Given a CXR, detect all clinical findings. If no findings are present, predict "Normal", which simply means that no cardiopulmonary disease or abnormality was found (excluding "Support Devices"). To do this, you will train multi-label thorax disease classifiers on the provided labeled training data.
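As a rough, non-official sketch of this setup, a 40-way sigmoid head on any image backbone trained with binary cross-entropy handles the multi-label aspect; the ResNet-50 backbone, class count, and hyperparameters below are illustrative assumptions only, not a prescribed baseline.

import torch
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 40  # 39 findings plus the "Normal" class (assumed split)

# Any image backbone works; a ResNet-50 is used here purely for illustration.
backbone = models.resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

# Multi-label training: one independent sigmoid/BCE term per finding,
# rather than a softmax over mutually exclusive classes.
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(backbone.parameters(), lr=1e-4)

def train_step(images, targets):
    """One optimization step; `targets` is a float tensor of shape (B, 40) with 0/1 entries."""
    optimizer.zero_grad()
    logits = backbone(images)          # (B, 40) raw scores
    loss = criterion(logits, targets)  # averaged over classes and batch
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference time, per-class probabilities come from a sigmoid over the logits:
# probs = torch.sigmoid(backbone(images))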
This challenge is held in conjunction with MICCAI 2024. After the challenge concludes, participants will be invited to submit their solutions for potential presentation at the CXR-LT 2024 event at MICCAI 2024. Additionally, we plan to coordinate a publication summarizing the challenge results, with invitations extended to the top-performing teams to serve as coauthors. We intend to select the top 3 teams for oral presentations at MICCAI 2024 in Morocco. For more information about the CXR-LT 2024 challenge at MICCAI 2024, click here.
Participants will upload image-level predictions on the provided test sets for evaluation. Since this is a multi-label classification problem with severe imbalance, the primary evaluation metric will be mean Average Precision (mAP), i.e., "macro-averaged" AP across the 40 classes. While Area Under the Receiver Operating Characteristic Curve (AUC) is a standard metric for related datasets, AUC can be heavily inflated in the presence of strong imbalance. mAP is more appropriate for the long-tailed, multi-label setting since it (i) measures performance across decision thresholds and (ii) is not artificially inflated by class imbalance. For thoroughness, mean AUC (mAUC) and mean F1 score (mF1), using a decision threshold of 0.5 for each class, will be calculated and will appear on the leaderboard but will not contribute to team rankings. Mean expected calibration error (mECE) will also be computed to assess model calibration.
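The official scoring code is authoritative; the sketch below is only a rough approximation of these macro-averaged metrics using scikit-learn, and the equal-width binning used for the per-class calibration error is one common variant that may differ from the challenge's implementation.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score, f1_score

def evaluate(y_true, y_prob, threshold=0.5, n_bins=10):
    """y_true, y_prob: arrays of shape (n_images, 40). Returns macro-averaged metrics."""
    n_classes = y_true.shape[1]
    aps, aucs, f1s, eces = [], [], [], []
    for k in range(n_classes):
        t, p = y_true[:, k], y_prob[:, k]
        aps.append(average_precision_score(t, p))
        aucs.append(roc_auc_score(t, p))
        f1s.append(f1_score(t, (p >= threshold).astype(int)))
        # Expected calibration error: bin predictions by predicted probability and
        # compare the mean predicted probability with the empirical positive rate per bin.
        bins = np.clip((p * n_bins).astype(int), 0, n_bins - 1)
        ece = 0.0
        for b in range(n_bins):
            mask = bins == b
            if mask.any():
                ece += mask.mean() * abs(p[mask].mean() - t[mask].mean())
        eces.append(ece)
    return {"mAP": np.mean(aps), "mAUC": np.mean(aucs),
            "mF1": np.mean(f1s), "mECE": np.mean(eces)}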
There will be two phases of the competition; see the phase schedule below for dates and details.
By registering for this competition, you also agree to the following terms and conditions:
All CodaLab submissions are required to be in .zip format. For this competition, this compressed .zip file must contain (i) a predictions .csv file and (ii) a "code/" directory with all of your training and inference code. The required file structure is as follows:
xxx.csv # predictions .csv file
code/ # code directory
├── yyy.py
├── zzz.py
├── ...
To create the final submission .zip file, you might then run zip -r submission.zip xxx.csv code. Please note that the names of your individual submission files do not matter, though the code directory must be named "code".
Your predictions .csv file must contain image-level predictions of the probability that each of the 40 classes is present in a given image. Specifically, each row should contain an image's dicom_id followed by the predicted probability for each of the 40 classes.
Please see the "Starting Kit" under "Participate" -> "Files" for a full, valid sample submission once registered for the competition.
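For reference, a minimal sketch of assembling a submission is shown below, assuming the column layout mirrors the sample submission (a dicom_id column followed by one probability column per class); the placeholder dicom_ids, class names, and random probabilities are stand-ins to be replaced with real values from the released files and your model.

import os
import zipfile
import numpy as np
import pandas as pd

# Placeholders: replace with the released dicom_ids, the 40 class names in the
# order used by the sample submission, and your model's predicted probabilities.
dicom_ids = ["dummy-id-1", "dummy-id-2"]
class_names = [f"finding_{i}" for i in range(40)]
probs = np.random.rand(len(dicom_ids), 40)  # probabilities in [0, 1]

# Build the predictions .csv: one row per image.
sub = pd.DataFrame(probs, columns=class_names)
sub.insert(0, "dicom_id", dicom_ids)
sub.to_csv("xxx.csv", index=False)

# Package the predictions file plus the code/ directory into the final .zip.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("xxx.csv")
    for root, _, files in os.walk("code"):
        for f in files:
            zf.write(os.path.join(root, f))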
Start: May 1, 2024, midnight
Description: Development Phase: Train models with the given labeled training data and upload your predictions on the unlabeled development set. See the sample submission under "Participate" -> "Files" for an example of a properly formatted submission.
Start: Aug. 26, 2024, midnight
Description: Test Phase: Train models with the given labeled training data and upload your predictions on the unlabeled test set (to be released after the Development Phase ends). For this phase, the leaderboard will be kept private, though you will receive feedback on your submissions by clicking "Download output from scoring step" on a successful submission. You are only allowed 5 successful submissions during this phase, so be very careful! See the sample submission under "Participate" -> "Files" for an example of a properly formatted submission. Make absolutely sure that your submission contains predictions and dicom_ids for the *test set* (not the development set)!
Competition Ends: Sept. 6, 2024, midnight