DIMT25@ICDAR-OCR-based(Translation-LLM)

Organized by liangyupu

First phase

Start: March 19, 2025, noon UTC
Competition ends: April 21, 2025, noon UTC

Overview

This is the official submission website for Track 1 (OCR-based DIMT, Translation) of the DIMT25@ICDAR challenge (Competition website Link).

About Data Download

Please download the End User License Agreement (here), fill it out and send it to dimt2025.contact@gmail.com to access the data.

We will review your application and get in touch as soon as possible.

Contact email: dimt2025.contact@gmail.com

(Please make sure to use the same email address both for registering your Hugging Face account and for submitting your download application on Hugging Face.)

About Submitting Results

Participants are required to translate all the images in testset_wo_label.json into Simplified Chinese (zh-CN), based on the OCR results provided in that JSON file, and fill in the answer.json file. The file should then be zipped and submitted to Codalab.

In the answer.json file, each key is an image file name, and the corresponding value is the Chinese translation after jieba (jieba GitHub Link) word segmentation, with the tokens produced by the default mode of jieba.cut joined by single spaces. Each image corresponds to a single string.
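
As an illustration, here is a minimal sketch of producing a value in this format with jieba's default mode (the helper name and example string are placeholders, not taken from the dataset):

    import jieba

    def segment_translation(zh_text: str) -> str:
        # jieba.cut in its default (accurate) mode yields word tokens;
        # joining them with single spaces gives the expected answer.json value.
        return " ".join(jieba.cut(zh_text))

    # Hypothetical zh-CN translation string, for illustration only.
    print(segment_translation("这是一个文档图像翻译的例子"))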

The testset_wo_label.json can be downloaded from huggingface: Track 1 Dataset Huggingface Link. The answer.json can be downloaded from the public data in Participate → Files → Public Data.

The file name should be answer.json. You should compress answer.json into answer.zip (no subdirectories) and submit answer.zip to Codalab. We provide an example answer.zip in the public data.
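
As a rough end-to-end sketch of preparing a submission (the structure of the testset_wo_label.json entries and the translate helper are assumptions; substitute your own model and field handling):

    import json
    import zipfile

    import jieba

    def translate(ocr_entry) -> str:
        # Placeholder for your own OCR-based document image translation model.
        raise NotImplementedError

    # The keys of testset_wo_label.json are assumed here to be the image file names.
    with open("testset_wo_label.json", encoding="utf-8") as f:
        testset = json.load(f)

    answer = {}
    for image_name, ocr_entry in testset.items():
        zh_translation = translate(ocr_entry)
        # One space-separated, jieba-segmented string per image.
        answer[image_name] = " ".join(jieba.cut(zh_translation))

    with open("answer.json", "w", encoding="utf-8") as f:
        json.dump(answer, f, ensure_ascii=False)

    # answer.zip must contain answer.json at the top level, with no subdirectories.
    with zipfile.ZipFile("answer.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("answer.json", arcname="answer.json")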

About Evaluation

BLEU is used as the evaluation metric. We implement the evaluation script with corpus_bleu in NLTK (Link).

The evaluation script evaluate.py can be downloaded from the public data in Participate → Files → Public Data.
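
For reference, here is a minimal sketch of scoring a submission with corpus_bleu (illustrative only; the provided evaluate.py is authoritative, and reference.json is a hypothetical name for the ground-truth file):

    import json

    from nltk.translate.bleu_score import corpus_bleu

    # Hypothetical ground-truth file mapping image file names to
    # space-separated, jieba-segmented reference translations.
    with open("reference.json", encoding="utf-8") as f:
        references = json.load(f)
    with open("answer.json", encoding="utf-8") as f:
        hypotheses = json.load(f)

    keys = sorted(references)
    list_of_references = [[references[k].split()] for k in keys]  # one reference per image
    hypothesis_tokens = [hypotheses.get(k, "").split() for k in keys]

    print("BLEU:", corpus_bleu(list_of_references, hypothesis_tokens))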

Rules

You may make up to 20 submissions per day.

This challenge is governed by the general ChaLearn contest rules.
