Trojan Detection Challenge 2023 - Red Teaming Track (Base Model Subtrack)

Organized by mmazeika
Reward: $30,000

First phase: Development, starting July 26, 2023, 7 a.m. UTC

End: competition ends Nov. 6, 2023, noon UTC

Development

Start: July 26, 2023, 7 a.m. UTC

Description: In this phase, participants can submit test cases for the dev phase LLM. Submissions are evaluated using behavior classifiers that identify whether a test case elicited a particular behavior from the LLM. This leaderboard does not determine the final ranking and is primarily for developing red teaming algorithms and comparing to other participants and the baselines. Participants can make 5 submissions per day. All values in the leaderboard are percentages.
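As a rough illustration of how a percentage on such a leaderboard could arise, the sketch below computes, for each behavior, the share of submitted test cases whose LLM output a classifier flags. It rests on assumptions: the function name success_rate_percent, the generate callable (standing in for the dev phase LLM) and the elicits_behavior callable (standing in for a behavior classifier) are hypothetical, and the challenge's actual scoring may combine other terms that this page does not describe.

```python
from typing import Callable, Dict, List


def success_rate_percent(
    test_cases: Dict[str, List[str]],
    generate: Callable[[str], str],
    elicits_behavior: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Return, per behavior, the percentage of test cases flagged as successful.

    test_cases maps a behavior name to the prompts submitted for it.
    generate is a hypothetical stand-in for querying the LLM.
    elicits_behavior is a hypothetical stand-in for a behavior classifier;
    it receives the behavior name and the LLM output.
    """
    rates: Dict[str, float] = {}
    for behavior, prompts in test_cases.items():
        if not prompts:
            # No submissions for this behavior: report 0 rather than divide by zero.
            rates[behavior] = 0.0
            continue
        hits = sum(
            1 for prompt in prompts if elicits_behavior(behavior, generate(prompt))
        )
        rates[behavior] = 100.0 * hits / len(prompts)
    return rates
```

Passing the model and the classifier in as callables keeps the sketch independent of any particular LLM or classifier library, so it can be exercised locally with toy stand-ins before a submission is spent.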

Test

Start: Nov. 1, 2023, noon UTC

Description: In this phase, participants can submit test cases for the test phase LLM. Submissions are evaluated automatically using behavior classifiers that identify whether a test case elicited a particular behavior from the LLM. This leaderboard determines the rankings used to select the top ten teams for manual evaluation. Manual scores then determine the final rankings used for awarding prizes. Participants can make 5 submissions in total. All values in the leaderboard are percentages.

Competition Ends

Nov. 6, 2023, noon UTC
