MICO - CIFAR-10

Organized by micochallenge
Reward $3,000

First phase: Development
Starts Nov. 1, 2022, midnight UTC

Competition ends: Jan. 27, 2023, noon UTC

MICO

Welcome to the Microsoft Membership Inference Competition (MICO)! In this competition, you will evaluate the effectiveness of differentially private model training as a mitigation against white-box membership inference attacks.

What is Membership Inference?

Membership inference is a widely-studied class of threats against Machine Learning (ML) models. The goal of a membership inference attack is to infer whether a given record was used to train a specific ML model. An attacker might have full access to the model and its weights (known as "white-box" access), or might only be able to query the model on inputs of their choice ("black-box" access). In either case, a successful membership inference attack could have negative consequences, especially if the model was trained on sensitive data.

Membership inference attacks vary in complexity. In a simple case, the model might have overfitted to its training data, so that it outputs higher confidence predictions when queried on training records than when queried on records that the model has not seen during training. Recognizing this, an attacker could simply query the model on records of their interest, establish a threshold on the model's confidence, and infer that records with higher confidence are likely members of the training data. In a white-box setting, as is the case for this competition, the attacker can use more sophisticated strategies that exploit access to the internals of the model.
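To make the thresholding idea concrete, here is a minimal sketch of such a confidence-thresholding attack in Python. The `predict_proba` callable and the threshold value are illustrative placeholders, not part of the challenge API.

```python
import numpy as np

def confidence_threshold_attack(predict_proba, records, labels, threshold=0.9):
    """Toy confidence-thresholding membership inference attack.

    predict_proba: callable mapping a batch of records to class probabilities
                   (hypothetical stand-in for the target model's interface).
    records:       array of candidate records, shape (n, ...).
    labels:        true class labels for the candidate records, shape (n,).
    threshold:     confidence above which a record is guessed to be a member.
    Returns a boolean array: True = "predicted member".
    """
    probs = predict_proba(records)                       # shape (n, num_classes)
    # Confidence the model assigns to the correct class of each record.
    confidence = probs[np.arange(len(labels)), labels]
    # Overfitted models tend to be more confident on records they were trained
    # on, so high confidence is taken as evidence of membership.
    return confidence >= threshold
```

In the white-box setting of this competition you also have the model's weights, so the same thresholding logic can be applied to richer membership signals (for example, per-example loss or gradient norms) rather than output confidence alone.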

What is MICO?

In MICO, your goal is to perform white-box membership inference against a series of trained ML models that we provide. Specifically, given a model and a set of challenge points, the aim is to decide which of these challenge points were used to train the model.

You can compete on any of four separate membership inference tasks: three against classification models for image, text, and tabular data, plus a special Differential Privacy Distinguisher task spanning all three modalities. Each task will be scored separately. You do not need to participate in all of them, and can choose to take part in as many as you like. Throughout the competition, submissions will be scored on a subset of the evaluation data and ranked on a live scoreboard. When submission closes, the final scores will be computed on a separate subset of the evaluation data.

The winner of each task will be eligible for an award of $2,000 USD from Microsoft and the runner-up of each task for an award of $1,000 USD from Microsoft (in the event of tied entries, these awards may be adjusted). This competition is co-located with the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) 2023, and the winners will be invited to present their strategies at the conference.

Getting started

Please select the "Participate" tab above, and register for the competition. Once registered, you will be given URLs from which to download the challenge data.

The accompanying repository contains starting-kit Jupyter notebooks that will guide you through making your first submission. To use them, clone the repository and follow the steps below:

  • pip install -r requirements.txt. You may want to do this in a virtualenv.
  • pip install -e .
  • cd starting-kit/
  • pip install -r requirements-starting-kit.txt
  • The corresponding starting kit notebook illustrates how to load the challenge data, run a basic membership inference attack, and prepare an archive to submit to CodaLab.
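As a rough illustration of that last step, the sketch below packs one file of membership confidences per challenge model into a single zip archive. The directory layout and file names used here (`predictions/<model_id>.csv`) are hypothetical; follow the starting-kit notebook for the exact format CodaLab expects.

```python
import zipfile
import numpy as np

def write_submission(confidences_by_model, out_path="submission.zip"):
    """Pack per-model membership confidences into a zip archive (hypothetical layout).

    confidences_by_model: dict mapping a model identifier to a 1-D array of
                          membership confidences in [0.0, 1.0], one value per
                          challenge point, in challenge-point order.
    """
    with zipfile.ZipFile(out_path, "w") as archive:
        for model_id, scores in confidences_by_model.items():
            scores = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
            # One confidence value per line; the entry path is a placeholder,
            # not the official submission layout.
            archive.writestr(f"predictions/{model_id}.csv",
                             "\n".join(f"{s:.6f}" for s in scores))
    return out_path
```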
Mico Argentatus (Silvery Marmoset) - William Warby/Flickr

Evaluation

Submissions will be ranked based on their performance in white-box membership inference against the provided models.

There are three sets of challenges: train, dev, and final. For models in train, we reveal the full training dataset, and consequently the ground truth membership data for challenge points. These models can be used by participants to develop their attacks. For models in the dev and final sets, no ground truth is revealed and participants must submit their membership predictions for challenge points.

During the competition, there will be a live scoreboard based on the dev challenges. The final ranking will be decided on the final set; scoring for this dataset will be withheld until the competition ends.

For each challenge point, the submission must provide a value indicating the confidence that the challenge point is a member. Each value must be a floating-point number in the range [0.0, 1.0], where 1.0 indicates certainty that the challenge point is a member and 0.0 indicates certainty that it is a non-member.

Submissions will be evaluated according to their True Positive Rate at 10% False Positive Rate (i.e., TPR @ 0.1 FPR). In this context, positive challenge points are members and negative challenge points are non-members. For each submission, the scoring program concatenates the confidence values for all models (dev and final treated separately) and compares them to the reference ground truth. It then determines the minimum confidence threshold for membership such that at most 10% of the non-member challenge points are incorrectly classified as members. The score is the True Positive Rate achieved at this threshold, i.e., the proportion of member challenge points correctly classified as members. The live scoreboard shows additional scores (TPR at other FPRs, membership inference advantage, accuracy, AUC-ROC); these are shown for information only and do not affect the ranking.
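To make the metric concrete, the following sketch computes TPR @ 0.1 FPR from a vector of submitted confidences and ground-truth membership labels. It mirrors the description above but is not the official scoring program.

```python
import numpy as np

def tpr_at_fpr(confidences, is_member, max_fpr=0.1):
    """Unofficial sketch of the TPR @ max_fpr metric.

    confidences: 1-D array of membership confidences in [0.0, 1.0].
    is_member:   1-D boolean array of ground-truth membership (True = member).
    """
    scores = np.asarray(confidences, dtype=float)
    members = np.asarray(is_member, dtype=bool)

    best_tpr = 0.0
    # Try every distinct score as the membership threshold (predict "member"
    # when score >= threshold) and keep the best TPR whose FPR stays in budget.
    for threshold in np.unique(scores):
        predicted_member = scores >= threshold
        fpr = np.count_nonzero(predicted_member & ~members) / np.count_nonzero(~members)
        tpr = np.count_nonzero(predicted_member & members) / np.count_nonzero(members)
        if fpr <= max_fpr:
            best_tpr = max(best_tpr, tpr)
    return best_tpr
```

Note how a submission with high overall accuracy can still score poorly: only members identified while staying within the strict false-positive budget count toward the score.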

You are allowed to make multiple submissions, but only your latest submission will be considered. In order for a submission to be valid, you must submit confidence values for all challenge points in all three scenarios of the task.

Hints and tips:

  • We do realize that the score of a submission leaks some information about the ground truth. However, using this information to optimize a submission based only on the live scoreboard (i.e., on dev) is a bad strategy, as this score has no relevance to the final ranking.
  • Pay special attention to the evaluation metric (TPR @ 0.1 FPR). Your average accuracy at predicting membership may be misleading. Your attack should aim to maximize the number of predicted members while remaining below the specified FPR.

Winner Selection

Winners will be selected independently for each task (i.e. if you choose not to participate in certain tasks, this will not affect your rank for the tasks in which you do participate). For each task, the winner will be the one achieving the highest average score (TPR @ 0.1 FPR) across the three scenarios.

Terms & Conditions

  • This challenge is subject to the Microsoft Bounty Terms and Conditions.

  • Microsoft employees and students/employees of Imperial College London may submit solutions, but are not eligible to receive awards.

  • Submissions will be evaluated by a panel of judges according to the aims of the competition.

  • Winners may be asked to provide their code and/or a description of their strategy to the judges for verification purposes.

Development

Start: Nov. 1, 2022, midnight

Description: Development phase: submit membership inference predictions for `dev` and `final`. The live scoreboard shows scores on `dev` only.

Final

Start: Jan. 27, 2023, noon

Description: Final phase: submissions from the previous phase are automatically migrated and used to compute the score on `final` and determine the final ranking. Final scores will be revealed by the organizers after the competition ends.

Competition Ends

Jan. 27, 2023, noon
