Contact: bodymaps.official@gmail.com
Variations in organ sizes and shapes can indicate a range of medical conditions, from benign anomalies to life-threatening diseases. Precise organ volume measurement is fundamental for effective patient care, but manual organ contouring is extremely time-consuming and exhibits considerable variability among expert radiologists. Artificial Intelligence (AI) holds the promise of improving volume measurement accuracy and reducing manual contouring efforts. We formulate our challenge as a semantic segmentation task, which automatically identifies and delineates the boundaries of various anatomical structures, a capability essential for numerous downstream applications such as disease diagnosis and treatment planning. Our primary goal is to promote the development of advanced AI algorithms and to benchmark the state of the art in this field.
The BodyMaps challenge particularly focuses on assessing and improving the generalizability and efficiency of AI algorithms for medical segmentation across diverse clinical settings and patient demographics. In light of this, the innovations of our BodyMaps challenge include (1) large-scale, diverse datasets for training and evaluating AI algorithms, (2) novel evaluation metrics that emphasize the accuracy of hard-to-segment anatomical structures, and (3) penalties for algorithms with extended inference times. Specifically, this challenge involves two unique datasets. First, AbdomenAtlas, the largest annotated dataset [Qu et al., 2023, Li et al., 2023], contains a total of 10,142 three-dimensional computed tomography (CT) volumes. In each CT volume, 25 anatomical structures are annotated by voxel. AbdomenAtlas is a multi-domain dataset of pre, portal, arterial, and delayed phase CT volumes collected from 88 hospitals in 9 countries, diverse in age, pathological condition, body part, and racial background. The AbdomenAtlas dataset will be released to the public in stages for AI development; in each stage we will release 1,000 annotated CT volumes. Second, W-1K is a proprietary collection of 1,000 CT volumes in which 15 anatomical structures are annotated by voxel. The CT volumes and annotations of W-1K are reserved for external validation of AI algorithms. The final score will be calculated on the W-1K dataset, measuring both the segmentation performance and the inference speed of the AI algorithms. Note that the segmentation score will not be limited to average segmentation performance; it will also prioritize performance on hard-to-segment structures. We hope our BodyMaps challenge can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community.
The segmentation accuracy metrics:
Weighted Dice Similarity Coefficient (wDSC). This metric evaluates the overlap between the algorithm output and the ground truth, with a weighting factor that reflects the segmentation difficulty of each structure. The weight for each structure's DSC is estimated from the per-class segmentation performance reported in our preliminary experiments. We average the per-class performance of UNet [Ronneberger et al., 2015], Swin UNETR [Hatamizadeh et al., 2021], and SegResNet [Myronenko et al., 2019] and subtract it from one to measure the difficulty of each structure. We then apply a softmax to the result to obtain weights that sum to one. Some structures are inherently more difficult to segment than others due to blurry boundaries, small sizes, and tubular shapes. Our weighted metric is novel compared to the common practice in segmentation challenges, where the average DSC is calculated uniformly across all classes.
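For illustration only, a minimal sketch of how such difficulty-based weights could be derived and applied is given below; the baseline scores, array shapes, and function names are placeholders, not the official evaluation code.

```python
# Hedged sketch of the difficulty-based weighting described above. The baseline
# per-class DSC values come from UNet, Swin UNETR, and SegResNet in the official
# setup; here they are placeholders supplied by the caller.
import numpy as np

def difficulty_weights(baseline_dsc):
    """baseline_dsc: (num_models, num_classes) array of per-class DSC scores."""
    difficulty = 1.0 - np.mean(baseline_dsc, axis=0)   # harder structures -> larger values
    exp = np.exp(difficulty - difficulty.max())        # numerically stable softmax
    return exp / exp.sum()                             # weights sum to one

def weighted_dsc(per_class_dsc, weights):
    """Weighted average of a submission's per-class DSC scores (the wDSC)."""
    return float(np.sum(np.asarray(per_class_dsc) * np.asarray(weights)))
```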
Weighted Normalized Surface Distance (wNSD): The wNSD emphasizes the accuracy of the boundary delineation between the predicted segmentation and the ground truth. This is particularly important for precise organ volume measurement and subsequent surgical planning.
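As a rough illustration (not the official evaluation pipeline), the per-class NSD could be computed with the open-source surface-distance package and combined with the same difficulty weights as above; the tolerance value below is a placeholder.

```python
# Hedged sketch of a per-class weighted NSD, assuming the open-source
# surface-distance package (https://github.com/deepmind/surface-distance).
# The tolerance (in mm) and the weights are placeholders, not official settings.
import numpy as np
from surface_distance import (compute_surface_distances,
                              compute_surface_dice_at_tolerance)

def weighted_nsd(pred, gt, spacing_mm, weights, num_classes, tolerance_mm=1.0):
    """pred, gt: integer label volumes; weights: per-class weights summing to one."""
    nsd_per_class = []
    for c in range(1, num_classes + 1):
        distances = compute_surface_distances(gt == c, pred == c, spacing_mm)
        nsd_per_class.append(compute_surface_dice_at_tolerance(distances, tolerance_mm))
    return float(np.sum(np.asarray(weights) * np.asarray(nsd_per_class)))
```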
The segmentation efficiency metrics:
Standardized Running Time: Taking into account the time required for Docker startup, data reading, and saving segmentation results, we recommend that the total time consumption per case should, on average, be within 90 seconds. The standardized running time for each case is considered a critical efficiency metric. It is calculated by dividing the actual time consumption by 90 seconds. Furthermore, to accommodate the workload during the testing stage, a dynamic time limit is established for each case, determined by the spacing and size of that test case. During inference, if the inference time exceeds this limit for more than 20% of the cases, the inference will be terminated and the submission will be classified as a failed submission.
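A minimal sketch of how the standardized running time and the failure condition could be computed per case is shown below; the 90-second budget follows the text, while the per-case limits and helper names are placeholders.

```python
# Hedged sketch of the standardized running time and the 20% failure rule.
# TIME_BUDGET_SEC follows the 90-second recommendation above; the per-case
# dynamic limits and the run_case callable are placeholders.
import time

TIME_BUDGET_SEC = 90.0

def standardized_running_time(run_case, case_path):
    """run_case: callable that segments one CT volume and saves the result."""
    start = time.perf_counter()
    run_case(case_path)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / TIME_BUDGET_SEC      # (seconds, standardized time)

def is_failed_submission(elapsed_times, per_case_limits, max_violation_ratio=0.2):
    """A submission fails if inference exceeds the limit for more than 20% of cases."""
    violations = sum(t > limit for t, limit in zip(elapsed_times, per_case_limits))
    return violations / len(elapsed_times) > max_violation_ratio
```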
Standardized Area Under GPU Memory-Time Curve (MB) [Ma et al. FLARE 2023]: The memory efficiency of the algorithm is evaluated over time, taking into account the computational resources utilized, as indicated by the GPU memory-time curve. It is recommended that GPU memory consumption be kept below 24 GB, aligning with the affordability and availability of such GPUs in most medical centers. The standardized Area Under GPU Memory-Time Curve for each case is considered another critical efficiency metric. It is calculated by dividing the Area Under GPU Memory-Time Curve by 24*1024*90 (i.e., 24 GB expressed in MB multiplied by the 90-second time budget).
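As an illustration (assuming the pynvml bindings; the polling interval and function names are placeholders), the area under the GPU memory-time curve could be approximated by sampling GPU memory during inference and integrating over time:

```python
# Hedged sketch: approximate the area under the GPU memory-time curve by polling
# GPU memory with pynvml in a background thread. The normalization constant
# follows the text (24 GB in MB x the 90-second budget); the polling interval
# and run_case callable are placeholders.
import threading
import time

import numpy as np
import pynvml

def standardized_memory_auc(run_case, case_path, interval_s=0.5, gpu_index=0):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    used_mb, stamps, stop = [], [], threading.Event()

    def poll():
        while not stop.is_set():
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            used_mb.append(info.used / 1024 ** 2)          # GPU memory in MB
            stamps.append(time.perf_counter())
            time.sleep(interval_s)

    monitor = threading.Thread(target=poll, daemon=True)
    monitor.start()
    run_case(case_path)                                    # the inference being measured
    stop.set()
    monitor.join()
    pynvml.nvmlShutdown()

    auc = np.trapz(used_mb, stamps)                        # MB * seconds
    return auc, auc / (24 * 1024 * 90)                     # (AUC, standardized AUC)
```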
The submitted Docker containers will be evaluated on an Ubuntu 18.04 server. Detailed information is listed as follows:
CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz x 20
GPU: Quadro RTX 8000 (48G)
RAM: 377GB
Driver Version: 510.108.03
CUDA Version: 11.6
Singularity version: 4.0.2-focal
The challenge submission is based on Docker containers, so participants should demonstrate basic segmentation skills and the ability to encapsulate their methods in Docker. We provide a playground for participants to practice. To register, participants should send us the signed challenge rule document.
After reviewing your submission, we will get back to you with an Entry Number, then you can join the Challenge. We also provide a step-by-step tutorial if you are not familiar with 3D image segmentation.
There are three datasets used in our challenge. First, the training dataset, AbdomenAtlas, is provided on Google Drive, where you can download the ground-truth masks and follow the corresponding instructions to download the CT scans. Note that you can use external datasets to train your model for better performance. Second, the validation datasets, TotalSegmentator and DAP Atlas, are public datasets used to evaluate your model. Last, the private testing dataset, W-1K, reflects a real-world, diverse patient population that encompasses a broad spectrum of pathological conditions, age groups, and demographic backgrounds.
After you finish the registration in the first step, the download link to AbdomenAtlas will be sent to you along with the Entry Number. As for the validation dataset, please follow the instructions to download TotalSegmentator and DAP Atlas.
Before you start developing your model, please check our repo for more details and requirements.
The task is to develop a model that can predict high-quality segmentations of abdominal organs. The training data consists of several thousand examples on which models can be trained and validated.
Teams are allowed to use other data in addition to the official training set to construct their models; however, that data must have been publicly available as of 1/16/2024. The same applies to pre-trained weights -- they must have been publicly available before the AbdomenAtlas dataset was released on 1/16/2024. This is to prevent unfair advantages for teams that may have amassed large private datasets. All external data use must be described in detail in each team's accompanying paper (described in the following section).
Based on the performance we achieved by directly training a model on AbdomenAtlas and evaluating it on TotalSegmentator (without post-processing), your model's performance should be higher than the results in the following table:
The table will be available soon.
Wondering where to start? Some useful tutorials and previous methods are given below:
Strategies to improve the segmentation performance:
Strategies to improve the computational efficiency:
Training Details and Techniques
Some useful tutorials:
The primary goal of challenges like BodyMaps is to objectively assess the performance of competing methods. This is only possible if teams provide a complete description of the methods they use. Teams should follow the provided template [Overleaf, Google Docs] (coming soon) and provide satisfactory answers to every field. Papers should otherwise follow the ISBI main conference guidelines for paper formatting. Drafts of these papers must be submitted by 04/15/2024.
The submission should include: (Email subject: YourTeamName-TeamLeaderName-Testing Submission)
(1) a download link to your Docker container (teamname.tar.gz). If the Docker container does not work, we will return the error information to the participants. Participants with a technical failure are allowed to resubmit their algorithm one additional time. When the evaluation is finished, we will return the evaluation metrics via email. All valid submission results will be reported on the leaderboard.
PLEASE REFER TO OUR REPO FOR MORE DETAILS.
(2) a sanity test video recording (download example: Google Drive, Baidu Netdisk). Please test your Docker container on the validation cases in DAP_Atlas: AutoPET_01140d52d8_56839.nii.gz, AutoPET_04ab5c61c9_42241.nii.gz, and AutoPET_0011f3deaf_10445.nii.gz, and record the prediction process.
(3) a methodology paper (template). Please carefully read the template and the common issues before writing the manuscript. The evaluation process mainly focuses on the paper's completeness. Do not worry about low wDSC/wNSD scores; since this segmentation task is very challenging, all attempts are worth sharing with readers. We will not reject papers because of low wDSC/wNSD.
The submitted Docker container will be evaluated with the following commands. If the Docker container does not work or the paper does not include all the information necessary to reproduce the method, we will return the error information and review comments to the participants.
singularity build teamname.sif docker-archive://teamname.tar
singularity exec --nv -B $PWD/BodyMaps2024_Test/:/workspace/inputs/ -B $PWD/teamname_outputs/:/workspace/outputs/ teamname.sif bash /workspace/predict.sh
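For reference, predict.sh typically just launches the team's own inference script. A minimal sketch of such a script is given below; only the I/O contract (read cases from /workspace/inputs/, write same-named label maps to /workspace/outputs/) is taken from the command above, and the model call is a placeholder.

```python
# Hedged sketch of an inference entry point that predict.sh might launch inside
# the container. Only the I/O contract (/workspace/inputs/ -> /workspace/outputs/)
# is taken from the evaluation command; the model call is a placeholder.
import os

import nibabel as nib
import numpy as np

INPUT_DIR = "/workspace/inputs"
OUTPUT_DIR = "/workspace/outputs"

def predict_volume(ct_array):
    """Placeholder: replace with your trained model's (e.g., sliding-window) inference."""
    return np.zeros(ct_array.shape, dtype=np.uint8)

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    for name in sorted(os.listdir(INPUT_DIR)):
        if not name.endswith(".nii.gz"):
            continue
        image = nib.load(os.path.join(INPUT_DIR, name))
        labels = predict_volume(image.get_fdata())
        nib.save(nib.Nifti1Image(labels, image.affine),
                 os.path.join(OUTPUT_DIR, name))

if __name__ == "__main__":
    main()
```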
Wenxuan Li (Johns Hopkins University)
Yu-Cheng Chou (Johns Hopkins University)
Jieneng Chen (Johns Hopkins University)
Qi Chen (University of Science and Technology of China)
Chongyu Qu (Johns Hopkins University)
Alan Yuille (Johns Hopkins University)
Zongwei Zhou (Johns Hopkins University)
Yaoyao Liu (Johns Hopkins University)
Angtian Wang (Johns Hopkins University)
Junfei Xiao (Johns Hopkins University)
Yucheng Tang (NVIDIA)
Xiaoxi Chen (Shanghai Jiao Tong University)
Jincheng Wang (The First Affiliated Hospital, Zhejiang University School of Medicine)
Huimin Xue (The First Hospital of China Medical University)
Yixiong Chen (Johns Hopkins University)
Yujiu Ma (Shengjing Hospital of China Medical University)
Yuxiang Lai (Southeast University)
Hualin Qiao (Rutgers University)
Yining Cao (China Medical University)
Haoqi Han (China Medical University)
Meihua Li (China Medical University)
Xiaorui Lin (China Medical University)
Yutong Tang (China Medical University)
Jinghui Xu (China Medical University)
The data download link to AbdomenAtlas will be sent to approved teams via email. Please make sure that you can download large files from Google Drive or Baidu Netdisk and have enough space and computing resources to process them.
Additional data and pre-trained models are allowed!
The challenge data is acquired from patients represented in the AbdomenAtlas [Qu et al., 2023, Li et al., 2023] and W-1K datasets, encompassing a broad spectrum of pathological conditions, age groups, and demographic backgrounds. This ensures that the challenge reflects a real-world, diverse patient population. Detailed statistics can be found in the corresponding publications.
For AbdomenAtlas, we will provide 75K masks and 1.2M annotated images taken from 68 hospitals worldwide, spanning four distinct phases: pre, portal, arterial, and delayed.
W-1K contains a total of 1,000 CT volumes, in each of which 15 anatomical structures are annotated by voxel.
For the class-index mapping of W-1K, please refer to the table below:
| Class | Index | Class | Index | Class | Index |
|---|---|---|---|---|---|
| Spleen | 1 | Stomach | 6 | Left adrenal gland | 11 |
| Right kidney | 2 | Aorta | 7 | Duodenum | 12 |
| Left kidney | 3 | Inferior Vena Cava | 8 | Colon | 13 |
| Gallbladder | 4 | Pancreas | 9 | Intestine | 14 |
| Liver | 5 | Right adrenal gland | 10 | Celiac Trunk | 15 |
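For convenience, the same mapping can be written as a Python dictionary (e.g., when converting your model's outputs to the expected label indices); the dictionary below is transcribed directly from the table above.

```python
# W-1K class-to-index mapping, transcribed from the table above.
W1K_LABELS = {
    "spleen": 1, "right kidney": 2, "left kidney": 3, "gallbladder": 4,
    "liver": 5, "stomach": 6, "aorta": 7, "inferior vena cava": 8,
    "pancreas": 9, "right adrenal gland": 10, "left adrenal gland": 11,
    "duodenum": 12, "colon": 13, "intestine": 14, "celiac trunk": 15,
}
```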
Q: How long does it take for a participation request to be approved after sending the signed challenge rules?
A: The request will be approved within 2-4 working days if the signed challenge rule document is filled out correctly.
Q: I'm only interested in the challenge dataset but I do not want to join the challenge. Can I download the dataset without joining the challenge?
A: Thanks for your interest. To ensure enough submissions, the dataset is only available to participants during the challenge.
Q: How many people can form a team?
A: Each team can have at most 10 people. The author list of your paper should match the team member list.
Q: I have joined the challenge and downloaded the dataset. Can I quit the challenge?
A: No! Please respect the signed agreement. If registered participants do not make a successful submission, all team members will be added to a dishonesty list.
Q: Can we use other datasets or pre-trained models to develop the segmentation algorithms?
A: Yes.
Q: During the testing phase, can I modify the methods and the paper?
A: Yes, you can make modifications before the testing submission. After making the testing submission, you cannot make modifications.
Waiting for submission!
We will provide cash prizes (or, alternatively, Amazon gift cards of equal value) for the top 5 teams: first prize: 500 USD; second prize: 300 USD; third prize: 200 USD; 4th-5th: 100 USD each. A certificate will be awarded to the top 10 teams. The top 10 performing teams will be announced publicly and invited to give oral presentations during the ISBI 2024 conference. All participating teams have the opportunity to publish their results in the ISBI 2024 proceedings and other vision conference proceedings.
Start: Jan. 10, 2024, midnight
Description: Testing phase: create models and submit them via email; feedback is provided on the test set only; submissions on Codalab are not allowed.
End: Never