The Track 2 re-ranking task provides many captions related to each image. Can we use this data (for example, through retrieval augmentation) to improve performance on zero-shot captioning? How would one judge whether someone directly uses captions from Track 2 to handle the Track 1 task?
Posted by: panzeyu2013 @ Feb. 18, 2024, 7:37 a.m.
Hi,
As mentioned in the challenge descriptions, the candidate captions provided in Track 2 were generated with AI models.
Those captions can be used in phase 1, but please note that they are not guaranteed to improve zero-shot image captioning performance.