The second DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) challenge entails a first-of-its-kind task: performing speaker and language diarization on the same data, which contains multi-speaker social conversations in multi-lingual code-mixed speech. DISPLACE2024 reflects the theme of Interspeech 2024, “Speech and Beyond”.
In multi-lingual communities, social conversations frequently involve code-mixed and code-switched speech. Code-mixing happens when words or morphemes from one language (secondary) are used within a sentence of another language (primary), whereas code-switching refers to switching languages at the sentence or phrase level, leading to a shift in the conversational language. In such cases, speech processing systems need to perform speaker and language segmentation before any downstream task. Current speaker diarization systems are not equipped to handle multi-lingual conversations, while language recognition systems may not be able to handle the same talker speaking in multiple languages within the same recording. With this motivation, the DISPLACE challenge attempts to benchmark and improve Speaker Diarization (SD) in multi-lingual settings, Language Diarization (LD) in multi-speaker scenarios, and Automatic Speech Recognition (ASR) in multi-accent settings, using the same underlying dataset (audio modality only).
Participation in this challenge is open to all who are interested in contributing towards a new milestone in speaker diarization, language diarization and ASR. The challenge provides datasets (development and evaluation), a leaderboard (for online evaluation) and baseline systems to all participating teams. The results of the challenge will be presented at the Interspeech conference, to be held on Kos Island, Greece, from 1st to 5th September 2024. More information about the challenge can be found on the DISPLACE2024 website.
The DISPLACE challenge is organized as an open evaluation where blind evaluation sets will be sent to each team. The evaluation will be done in two phases, namely, Phase-1 and Phase-2. The overall timeline for the challenge can be found here, whereas the relevant evaluation deadlines for these phases are as follows:
Phase | Dates |
---|---|
Phase-1 Evaluation Opens | 1 Feb 2024 |
Phase-1 Evaluation Closes | 28 Feb 2024 |
Phase-2 Evaluation Opens | 1 Apr 2024 |
Phase-2 Evaluation Closes | 20 Apr 2024 |
Each team needs to evaluate their systems on the Phase-1 and Phase-2 blind sets locally and upload their output RTTMs and transcription files (with the specified file naming and folder structure) to the DISPLACE online leaderboard for evaluation.
The submitted compressed file (in .zip) must have the following directory structure:
For Track-1 (Speaker diarization in multilingual scenarios)
SPEAKER.zip/<session1_name>_SPEAKER_sys.rttm
SPEAKER.zip/<session2_name>_SPEAKER_sys.rttm
For Track-2 (Language diarization in multi-speaker settings)
LANGUAGE.zip/<session1_name>_LANGUAGE_sys.rttm
LANGUAGE.zip/<session2_name>_LANGUAGE_sys.rttm
For Track-3 (Automatic Speech Recognition in multi-accent settings)
For Closefield Data:
ASR_closefield.zip/<session1_name>_<LANGUAGEID>.wrd.trn
ASR_closefield.zip/<session2_name>_<LANGUAGEID>.wrd.trn
NOTE: Please merge all speaker transcription files of a specific language within a session into a single transcription file when submitting for evaluation. For example, concatenate adjq_S1_hi.wrd.trn, adjq_S2_hi.wrd.trn and adjq_S3_hi.wrd.trn into adjq_hi.wrd.trn
For Farfield Data:
ASR_farfield.zip/<session1_name>_<LANGUAGEID>.wrd.trn
ASR_farfield.zip/<session2_name>_<LANGUAGEID>.wrd.trn
The zipped submission folders for closefield Dev data are to be submitted in the 'ASR evaluation on Dev - Closefield' phase tab in the Submit/View Results section. Similarly, the zipped submission folders for farfield Dev data are to be submitted in the 'ASR evaluation on Dev - Farfield' phase tab in the Submit/View Results section. The same applies to Eval closefield and farfield data as well. It is mandatory to make submissions for both closefield and farfield data.
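The merge described in the Track-3 note can be scripted. The following is a minimal sketch, not an official tool: it assumes per-speaker files are named `<session>_S<k>_<LANGUAGEID>.wrd.trn` (inferred only from the `adjq_S1_hi.wrd.trn` example), and the function name `merge_speaker_transcripts` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed per-speaker naming, inferred from the example in the note:
# <session>_S<k>_<LANGUAGEID>.wrd.trn  (e.g. adjq_S1_hi.wrd.trn)
PER_SPEAKER = re.compile(
    r"^(?P<session>.+)_S(?P<spk>\d+)_(?P<lang>[^_.]+)\.wrd\.trn$"
)


def merge_speaker_transcripts(src_dir, dst_dir):
    """Concatenate per-speaker files into <session>_<LANGUAGEID>.wrd.trn."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Group files by (session, language); remember the speaker index for ordering.
    groups = defaultdict(list)
    for path in src.glob("*.wrd.trn"):
        m = PER_SPEAKER.match(path.name)
        if m:
            groups[(m["session"], m["lang"])].append((int(m["spk"]), path))

    merged = []
    for (session, lang), items in sorted(groups.items()):
        out = dst / f"{session}_{lang}.wrd.trn"
        with out.open("w", encoding="utf-8") as fh:
            for _, path in sorted(items):  # S1, S2, S3, ...
                text = path.read_text(encoding="utf-8")
                # Make sure each source file ends with a newline so that
                # lines from different speakers never run together.
                fh.write(text if text.endswith("\n") or not text else text + "\n")
        merged.append(out)
    return merged
```

Calling `merge_speaker_transcripts("per_speaker", "merged")` on a folder holding `adjq_S1_hi.wrd.trn`, `adjq_S2_hi.wrd.trn` and `adjq_S3_hi.wrd.trn` would write `merged/adjq_hi.wrd.trn`. Files that do not match the assumed pattern are skipped rather than guessed at.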
NOTE: The compressed file must not contain the following structure:
SPEAKER.zip/<Directory>/<session1_name>_SPEAKER_sys.rttm
LANGUAGE.zip/<Directory>/<session1_name>_LANGUAGE_sys.rttm
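One way to avoid the nested layout shown above is to write every file into the archive under its bare file name. The sketch below uses Python's standard `zipfile` module; the helper names `build_flat_zip` and `nested_entries` are hypothetical and not part of any challenge tooling.

```python
import zipfile
from pathlib import Path


def build_flat_zip(files, zip_path):
    """Write the given files at the top level of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # arcname=f.name drops all parent directories from the stored path.
            zf.write(f, arcname=f.name)
    return zip_path


def nested_entries(zip_path):
    """Return archive entries that sit inside a directory (should be empty)."""
    with zipfile.ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if "/" in name]
```

For example, `build_flat_zip(Path("out").glob("*_SPEAKER_sys.rttm"), "SPEAKER.zip")` followed by a check that `nested_entries("SPEAKER.zip")` is empty gives a quick sanity test before uploading.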
Also make sure to check out the complete evaluation protocol and guidelines in the Evaluation Plan.
The data for the challenge is provided as described in the DISPLACE dataset description document under the terms of the MIT license. All participants are requested to read the terms and conditions document thoroughly to ensure valid submissions.
The registered participants will get the datasets through email.