Contact: bodymaps.official@gmail.com
Variations in organ sizes and shapes can indicate a range of medical conditions, from benign anomalies to life-threatening diseases. Precise organ volume measurement is fundamental for effective patient care, but manual organ contouring is extremely time-consuming and exhibits considerable variability among expert radiologists. Artificial Intelligence (AI) holds the promise of improving volume measurement accuracy and reducing manual contouring efforts. We formulate our challenge as a semantic segmentation task, which automatically identifies and delineates the boundaries of various anatomical structures, a capability essential for numerous downstream applications such as disease diagnosis and treatment planning. Our primary goal is to promote the development of advanced AI algorithms and to benchmark the state of the art in this field.
The BodyMaps challenge particularly focuses on assessing and improving the generalizability and efficiency of AI algorithms for medical segmentation across diverse clinical settings and patient demographics. In light of this, the innovations of our BodyMaps challenge include (1) large-scale, diverse datasets for training and evaluating AI algorithms, (2) novel evaluation metrics that emphasize the accuracy of hard-to-segment anatomical structures, and (3) penalties for algorithms with extended inference times. Specifically, this challenge involves two unique datasets. First, AbdomenAtlas, the largest annotated dataset [Qu et al., 2023, Li et al., 2023], contains a total of 10,142 three-dimensional computed tomography (CT) volumes. In each CT volume, 25 anatomical structures are annotated by voxel. AbdomenAtlas is a multi-domain dataset of pre, portal, arterial, and delayed phase CT volumes collected from 88 hospitals in 9 countries, diverse in age, pathological condition, body part, and racial background. The AbdomenAtlas dataset will be released to the public in stages for AI development; in each stage we will release 1,000 annotated CT volumes. Second, W-1K is a proprietary collection of 1,000 CT volumes in which 15 anatomical structures are annotated by voxel. The CT volumes and annotations of W-1K are reserved for external validation of AI algorithms. The final score will be calculated on the W-1K dataset, measuring both the segmentation performance and the inference speed of the AI algorithms. Note that the segmentation score will not be limited to average segmentation performance; it will also prioritize performance on hard-to-segment structures. We hope our BodyMaps challenge can set the stage for larger-scale clinical trials and offer exceptional opportunities to practitioners in the medical imaging community.
The segmentation accuracy metrics:
Weighted Dice Similarity Coefficient (wDSC). This metric evaluates the overlap between the algorithm output and the ground truth, with a weighting factor that reflects the segmentation difficulty of each structure. The weight for each structure's DSC is estimated from the per-class segmentation performance reported in our preliminary experiments. We average the per-class performance of UNet [Ronneberger et al., 2015], Swin UNETR [Hatamizadeh et al., 2021], and SegResNet [Myronenko et al., 2019] and subtract it from one to measure the difficulty of each structure. We then apply a softmax to the result to obtain weights that sum to one. Some structures are inherently more difficult to segment than others due to blurry boundaries, small sizes, and tubular shapes. Our weighted metric is novel compared to the common practice in segmentation challenges, where the average DSC is calculated uniformly across all classes.
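For illustration only, a minimal sketch of how such difficulty-based weights could be derived and applied is given below; the baseline scores, array shapes, and function names are placeholders, not the official evaluation code.

```python
# Hedged sketch of the difficulty-based weighting described above. The baseline
# per-class DSC values come from UNet, Swin UNETR, and SegResNet in the official
# setup; here they are placeholders supplied by the caller.
import numpy as np

def difficulty_weights(baseline_dsc):
    """baseline_dsc: (num_models, num_classes) array of per-class DSC scores."""
    difficulty = 1.0 - np.mean(baseline_dsc, axis=0)   # harder structures -> larger values
    exp = np.exp(difficulty - difficulty.max())        # numerically stable softmax
    return exp / exp.sum()                             # weights sum to one

def weighted_dsc(per_class_dsc, weights):
    """Weighted average of a submission's per-class DSC scores (the wDSC)."""
    return float(np.sum(np.asarray(per_class_dsc) * np.asarray(weights)))
```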
Weighted Normalized Surface Distance (wNSD): The wNSD emphasizes the accuracy of the boundary delineation between the predicted segmentation and the ground truth. This is particularly important for precise organ volume measurement and subsequent surgical planning.
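As a rough illustration (not the official evaluation pipeline), the per-class NSD could be computed with the open-source surface-distance package and combined with the same difficulty weights as above; the tolerance value below is a placeholder.

```python
# Hedged sketch of a per-class weighted NSD, assuming the open-source
# surface-distance package (https://github.com/deepmind/surface-distance).
# The tolerance (in mm) and the weights are placeholders, not official settings.
import numpy as np
from surface_distance import (compute_surface_distances,
                              compute_surface_dice_at_tolerance)

def weighted_nsd(pred, gt, spacing_mm, weights, num_classes, tolerance_mm=1.0):
    """pred, gt: integer label volumes; weights: per-class weights summing to one."""
    nsd_per_class = []
    for c in range(1, num_classes + 1):
        distances = compute_surface_distances(gt == c, pred == c, spacing_mm)
        nsd_per_class.append(compute_surface_dice_at_tolerance(distances, tolerance_mm))
    return float(np.sum(np.asarray(weights) * np.asarray(nsd_per_class)))
```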
The segmentation efficiency metrics:
Standardized Running Time: Taking into account the time required for Docker startup, data reading, and saving segmentation results, we recommend that the total time consumption per case should, on average, be within 90 seconds. The standardized running time for each case is considered a critical efficiency metric. It is calculated by dividing the actual time consumption by 90 seconds. Furthermore, to accommodate the workload during the testing stage, a dynamic time limit is established for each case, determined by the spacing and size of that test case. During inference, if the inference time exceeds this limit for more than 20% of the cases, the inference will be terminated and the submission will be classified as a failed submission.
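A minimal sketch of how the standardized running time and the failure condition could be computed per case is shown below; the 90-second budget follows the text, while the per-case limits and helper names are placeholders.

```python
# Hedged sketch of the standardized running time and the 20% failure rule.
# TIME_BUDGET_SEC follows the 90-second recommendation above; the per-case
# dynamic limits and the run_case callable are placeholders.
import time

TIME_BUDGET_SEC = 90.0

def standardized_running_time(run_case, case_path):
    """run_case: callable that segments one CT volume and saves the result."""
    start = time.perf_counter()
    run_case(case_path)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed / TIME_BUDGET_SEC      # (seconds, standardized time)

def is_failed_submission(elapsed_times, per_case_limits, max_violation_ratio=0.2):
    """A submission fails if inference exceeds the limit for more than 20% of cases."""
    violations = sum(t > limit for t, limit in zip(elapsed_times, per_case_limits))
    return violations / len(elapsed_times) > max_violation_ratio
```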
Standardized Area Under GPU Memory-Time Curve (MB) [Ma et al. FLARE 2023]: The memory efficiency of the algorithm is evaluated over time, taking into account the computational resources utilized, as indicated by the GPU memory-time curve. It is recommended that GPU memory consumption be kept below 24 GB, aligning with the affordability and availability of such GPUs in most medical centers. The standardized Area Under GPU Memory-Time Curve for each case is considered another critical efficiency metric. It is calculated by dividing the Area Under GPU Memory-Time Curve by 24*1024*90 (i.e., 24 GB expressed in MB multiplied by the 90-second time budget).
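As an illustration (assuming the pynvml bindings; the polling interval and function names are placeholders), the area under the GPU memory-time curve could be approximated by sampling GPU memory during inference and integrating over time:

```python
# Hedged sketch: approximate the area under the GPU memory-time curve by polling
# GPU memory with pynvml in a background thread. The normalization constant
# follows the text (24 GB in MB x the 90-second budget); the polling interval
# and run_case callable are placeholders.
import threading
import time

import numpy as np
import pynvml

def standardized_memory_auc(run_case, case_path, interval_s=0.5, gpu_index=0):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    used_mb, stamps, stop = [], [], threading.Event()

    def poll():
        while not stop.is_set():
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            used_mb.append(info.used / 1024 ** 2)          # GPU memory in MB
            stamps.append(time.perf_counter())
            time.sleep(interval_s)

    monitor = threading.Thread(target=poll, daemon=True)
    monitor.start()
    run_case(case_path)                                    # the inference being measured
    stop.set()
    monitor.join()
    pynvml.nvmlShutdown()

    auc = np.trapz(used_mb, stamps)                        # MB * seconds
    return auc, auc / (24 * 1024 * 90)                     # (AUC, standardized AUC)
```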
The submitted Docker containers will be evaluated on an Ubuntu 18.04 server. Detailed information is listed as follows:
CPU: Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz x 20
GPU: Quadro RTX 8000 (48G)
RAM: 377GB
Driver Version: 510.108.03
CUDA Version: 11.6
Singularity version: 4.0.2-focal
The challenge submission is based on Docker containers, so participants should demonstrate basic segmentation skills and the ability to encapsulate their methods in Docker. We provide a playground for participants to practice. To register, participants should send us the signed challenge rule document.
After reviewing your submission, we will get back to you with an Entry Number, then you can join the Challenge. We also provide a step-by-step tutorial if you are not familiar with 3D image segmentation.
There are three datasets used in our challenge. First, the training dataset, AbdomenAtlas, is provided on Google Drive, where you can download the ground-truth masks and follow the corresponding instructions to download the CT scans. Note that you can use external datasets to train your model for better performance. Second, the validation datasets, TotalSegmentator and DAP Atlas, are public datasets used to evaluate your model. Last, the private testing dataset, W-1K, reflects a real-world, diverse patient population that encompasses a broad spectrum of pathological conditions, age groups, and demographic backgrounds.
After you finish the registration in the first step, the download link to AbdomenAtlas will be sent to you along with the Entry Number. As for the validation dataset, please follow the instructions to download TotalSegmentator and DAP Atlas.
Before you start developing your model, please check our repo for more details and requirements.
The task is to develop a model that can predict high-quality segmentations of abdominal organs. The training data consists of several thousand examples on which models can be trained and validated.
Teams are allowed to use other data in addition to the official training set to construct their models; however, that data must have been publicly available as of 1/16/2024. The same applies to pre-trained weights -- they must have been publicly available before the AbdomenAtlas dataset was released on 1/16/2024. This is to prevent unfair advantages for teams that may have amassed large private datasets. All external data use must be described in detail in each team's accompanying paper (described in the following section).
Based on the performance we achieved by directly training a model on AbdomenAtlas and evaluating it on TotalSegmentator (without post-processing), your model's performance should be higher than the results in the following table:
The table will be available soon.
Wondering where to start? Some useful tutorials and previous methods are given below:
Strategies to improve the segmentation performance:
Strategies to improve the computational efficiency:
Training Details and Techniques
Some useful tutorials:
The primary goal of challenges like BodyMaps is to objectively assess the performance of competing methods. This is only possible if teams provide a complete description of the methods they use. Teams should follow the provided template [Overleaf, Google Docs] (coming soon) and provide satisfactory answers to every field. Papers should otherwise follow the ISBI main conference guidelines for paper formatting. Drafts of these papers must be submitted by 04/15/2024.
The submission should include: (Email subject: YourTeamName-TeamLeaderName-Testing Submission)
(1) a download link to your Docker container (teamname.tar.gz). If the Docker container does not work, we will return the error information to the participants. Participants with a technical failure are allowed to resubmit their algorithm one additional time. When the evaluation is finished, we will return the evaluation metrics via email. All valid submission results will be reported on the leaderboard.
PLEASE REFER TO OUR REPO FOR MORE DETAILS.
(2) a sanity test video recording (download example: Google Drive, Baidu Netdisk). Please test your Docker container on the validation cases in DAP_Atlas: AutoPET_01140d52d8_56839.nii.gz, AutoPET_04ab5c61c9_42241.nii.gz, and AutoPET_0011f3deaf_10445.nii.gz, and record the prediction process.
(3) a methodology paper (template). Please carefully read the template and the common issues before writing the manuscript. The evaluation process mainly focuses on the paper's completeness. Do not worry about low wDSC/wNSD scores; since this segmentation task is very challenging, all attempts are worth sharing with readers. We will not reject papers because of low wDSC/wNSD.
The submitted Docker container will be evaluated with the following commands. If the Docker container does not work or the paper does not include all the information necessary to reproduce the method, we will return the error information and review comments to the participants.
singularity build teamname.sif docker-archive://teamname.tar
singularity exec --nv -B $PWD/BodyMaps2024_Test/:/workspace/inputs/ -B $PWD/teamname_outputs/:/workspace/outputs/ teamname.sif bash /workspace/predict.sh
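For reference, predict.sh typically just launches the team's own inference script. A minimal sketch of such a script is given below; only the I/O contract (read cases from /workspace/inputs/, write same-named label maps to /workspace/outputs/) is taken from the command above, and the model call is a placeholder.

```python
# Hedged sketch of an inference entry point that predict.sh might launch inside
# the container. Only the I/O contract (/workspace/inputs/ -> /workspace/outputs/)
# is taken from the evaluation command; the model call is a placeholder.
import os

import nibabel as nib
import numpy as np

INPUT_DIR = "/workspace/inputs"
OUTPUT_DIR = "/workspace/outputs"

def predict_volume(ct_array):
    """Placeholder: replace with your trained model's (e.g., sliding-window) inference."""
    return np.zeros(ct_array.shape, dtype=np.uint8)

def main():
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    for name in sorted(os.listdir(INPUT_DIR)):
        if not name.endswith(".nii.gz"):
            continue
        image = nib.load(os.path.join(INPUT_DIR, name))
        labels = predict_volume(image.get_fdata())
        nib.save(nib.Nifti1Image(labels, image.affine),
                 os.path.join(OUTPUT_DIR, name))

if __name__ == "__main__":
    main()
```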
Wenxuan Li (Johns Hopkins University)
Yu-Cheng Chou (Johns Hopkins University)
Jieneng Chen (Johns Hopkins University)
Qi Chen (University of Science and Technology of China)
Chongyu Qu (Johns Hopkins University)
Alan Yuille (Johns Hopkins University)
Zongwei Zhou (Johns Hopkins University)
Yaoyao Liu (Johns Hopkins University)
Angtian Wang (Johns Hopkins University)
Junfei Xiao (Johns Hopkins University)
Yucheng Tang (NVIDIA)
Xiaoxi Chen (Shanghai Jiao Tong University)
Jincheng Wang (The First Affiliated Hospital, Zhejiang University School of Medicine)
Huimin Xue (The First Hospital of China Medical University)
Yixiong Chen (Johns Hopkins University)
Yujiu Ma (Shengjing Hospital of China Medical University)
Yuxiang Lai (Southeast University)
Hualin Qiao (Rutgers University)
Yining Cao (China Medical University)
Haoqi Han (China Medical University)
Meihua Li (China Medical University)
Xiaorui Lin (China Medical University)
Yutong Tang (China Medical University)
Jinghui Xu (China Medical University)
The data download link to AbdomenAtlas will be sent to approved teams via email. Please make sure that you can download large files from Google Drive or Baidu Netdisk and have enough space and computing resources to process them.
Additional data and pre-trained models are allowed!
The challenge data is acquired from patients represented in the AbdomenAtlas [Qu et al., 2023, Li et al., 2023] and W-1K datasets, encompassing a broad spectrum of pathological conditions, age groups, and demographic backgrounds. This ensures that the challenge reflects a real-world, diverse patient population. Detailed statistics can be found in the corresponding publications.
For AbdomenAtlas, we will provide 75K masks and 1.2M annotated images taken from 68 hospitals worldwide, spanning four distinct phases: pre, portal, arterial, and delayed.
W-1K contains a total of 1,000 CT volumes, in each of which 15 anatomical structures are annotated by voxel.
For the class-index mapping of W-1K, please refer to the table below:
| Class | Index | Class | Index | Class | Index |
|---|---|---|---|---|---|
| Spleen | 1 | Stomach | 6 | Left adrenal gland | 11 |
| Right kidney | 2 | Aorta | 7 | Duodenum | 12 |
| Left kidney | 3 | Inferior Vena Cava | 8 | Colon | 13 |
| Gallbladder | 4 | Pancreas | 9 | Intestine | 14 |
| Liver | 5 | Right adrenal gland | 10 | Celiac Trunk | 15 |
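For convenience, the same mapping can be written as a Python dictionary (e.g., when converting your model's outputs to the expected label indices); the dictionary below is transcribed directly from the table above.

```python
# W-1K class-to-index mapping, transcribed from the table above.
W1K_LABELS = {
    "spleen": 1, "right kidney": 2, "left kidney": 3, "gallbladder": 4,
    "liver": 5, "stomach": 6, "aorta": 7, "inferior vena cava": 8,
    "pancreas": 9, "right adrenal gland": 10, "left adrenal gland": 11,
    "duodenum": 12, "colon": 13, "intestine": 14, "celiac trunk": 15,
}
```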
Q: How long does it take for a participation request to be approved after sending the signed challenge rules?
A: The request will be approved within 2-4 working days if the signed challenge rule document is filled out correctly.
Q: I'm only interested in the challenge dataset but I do not want to join the challenge. Can I download the dataset without joining the challenge?
A: Thanks for your interest. To ensure enough submissions, the dataset is only available to participants during the challenge.
Q: How many people can form a team?
A: Each team can have at most 10 people. The author list of your paper should match the team member list.
Q: I have joined the challenge and downloaded the dataset. Can I quit the challenge?
A: No! Please respect the signed agreement. If registered participants do not make a successful submission, all team members will be added to a dishonesty list.
Q: Can we use other datasets or pre-trained models to develop the segmentation algorithms?
A: Yes.
Q: During the testing phase, can I modify the methods and the paper?
A: Yes, you can make modifications before the testing submission. After making the testing submission, you cannot make modifications.
Waiting for submission!
We will provide cash prizes (or, alternatively, Amazon gift cards of equal value) for the top 5 teams: first prize: 500 USD; second prize: 300 USD; third prize: 200 USD; 4th-5th: 100 USD each. A certificate will be awarded to the top 10 teams. The top 10 performing teams will be announced publicly and invited to give oral presentations during the ISBI 2024 conference. All participating teams have the opportunity to publish their results in the ISBI 2024 proceedings and other vision conference proceedings.
Start: Jan. 10, 2024, midnight
Description: Testing phase: create models and submit them via email; feedback is provided on the test set only; submissions on Codalab are not allowed.
End: Never