NTIRE 2022 Image Inpainting Challenge Track 2 Semantic Guidance Forum


> Evaluation metrics and dataset

There will be four metrics used for ranking (LPIPS/FID/MOS/mIoU), right? How important is mIoU in the final ranking?

Besides, it is said that "Track 2 is only evaluated over the Places dataset" at https://codalab.lisn.upsaclay.fr/competitions/1608#participate. But it is also said that Track 2 is "evaluated over those datasets with semantic labels, that is, FFHQ and Places" at https://codalab.lisn.upsaclay.fr/competitions/1608#learn_the_details-evaluation. Could you clarify which one is accurate?

Thanks

Posted by: divx @ Feb. 26, 2022, 2:41 p.m.

Hello, for simplicity when tuning your parameters, only Places is considered at validation time. At test time, both datasets are considered.

Posted by: afromero @ Feb. 26, 2022, 3:18 p.m.

Regarding your first question, we are aware that the semantic maps for the Places dataset are far from accurate; they only serve as a proxy. Consequently, on this dataset, the mIoU will not play an important role. Conversely, the semantic maps on FFHQ are more reliable, so the mIoU there is important.

Posted by: afromero @ Feb. 26, 2022, 3:23 p.m.

Thanks for the answers. Regarding FFHQ, the training data is 1024x1024, but the validation set is 512x512. Are there any specifics about the resizing process, or code that all participants could use?

Posted by: divx @ Feb. 27, 2022, 1:52 a.m.

There is complete freedom in that regard.
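For reference, one possible way to do it (just a sketch, not an official script; the folder names are placeholders, and Pillow with Lanczos resampling is an arbitrary choice):

```python
from pathlib import Path
from PIL import Image

SRC = Path("ffhq_1024")   # placeholder: folder with the 1024x1024 training images
DST = Path("ffhq_512")    # placeholder: output folder for 512x512 images
DST.mkdir(exist_ok=True)

for p in sorted(SRC.glob("*.png")):
    img = Image.open(p).convert("RGB")
    # Lanczos is one reasonable downsampling filter; any other choice is equally valid
    img.resize((512, 512), Image.LANCZOS).save(DST / p.name)
```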

Posted by: afromero @ Feb. 27, 2022, 3:31 p.m.

Hi, the link you provided for computing the FID only supports FFHQ. What should we do for the Places dataset?

Posted by: min01 @ March 9, 2022, 6:35 a.m.

Hello,
The link is just meant to point you to one way of computing the FID; it is not the only one. The FID uses a pretrained network (typically InceptionV3) to extract feature vectors, which are then used to compute some statistics.
The provided link assumes those statistics for the reference dataset are pre-computed: https://github.com/GaParmar/clean-fid/blob/45eebe437a88a1031dd5a4eac4903fb1ef33fb50/cleanfid/features.py#L58, which is why only a handful of datasets are available in that repo.
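Concretely, those statistics are the mean and covariance of the feature vectors, and the FID is the Fréchet distance between the two resulting Gaussians. A minimal sketch of that last step (assuming NumPy/SciPy, and that you have already computed mu/sigma for the real and generated features):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts due to numerical error
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```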

You can of course pre-compute those for Places, or you can alternatively use another FID implementation that does everything for you: https://github.com/mseitzer/pytorch-fid; take a look at https://github.com/mseitzer/pytorch-fid/blob/master/src/pytorch_fid/fid_score.py#L246
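For example, with pytorch-fid the whole computation between two image folders is a single command (folder names are placeholders):

```
python -m pytorch_fid path/to/real_images path/to/generated_images
```

clean-fid offers the equivalent in Python: `from cleanfid import fid; score = fid.compute_fid("path/to/real_images", "path/to/generated_images")`.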

Posted by: afromero @ March 9, 2022, 11:51 a.m.

Hi, since there are millions of training images for Places, how will you calculate the statistics? Does a subset work? Also, should we calculate one FID score per mask type or a single FID score over all generated images? Thanks!

Posted by: min01 @ March 9, 2022, 8:55 p.m.

Hello, the FID assumes you use a sample of data large enough to represent your distribution well. Even for a dataset as large as ImageNet, it is common to take only several thousand images (~30k, ~50k, ...).
For instance, the inpainting method LaMa takes 30k for Places: https://github.com/saic-mdal/lama#places. An image synthesis method that uses ImageNet (https://arxiv.org/pdf/2202.00273.pdf, Appendix C) computes the FID between 50k real images and 50k generated ones, whereas, for the same dataset, another method (https://arxiv.org/pdf/2105.05233.pdf) computes the FID between 10k real images and 50k generated ones. With a small sample (~1k or so), the FID is not that reliable.

TL;DR: You do not have to take millions of images to compute the FID; several thousand will suffice.
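For instance, with clean-fid you can sample a subset once, cache its statistics under a custom name, and reuse them for every evaluation (a sketch; the paths and the "places_30k" name are placeholders):

```python
import random, shutil
from pathlib import Path
from cleanfid import fid

# 1) Sample ~30k reference images once (placeholder paths; assumes unique filenames)
src = sorted(Path("places_train").rglob("*.jpg"))
sub = Path("places_30k_subset")
sub.mkdir(exist_ok=True)
for p in random.Random(0).sample(src, 30_000):
    shutil.copy(p, sub / p.name)

# 2) Pre-compute and cache the reference statistics under a custom name
fid.make_custom_stats("places_30k", str(sub), mode="clean")

# 3) Score any folder of generated images against the cached statistics
score = fid.compute_fid("generated_images", dataset_name="places_30k",
                        mode="clean", dataset_split="custom")
print(score)
```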

Posted by: afromero @ March 9, 2022, 11:49 p.m.

And regarding the last part of your question, it would be interesting to have both, that is, a general FID for your whole method and independent FIDs for each type of mask. The former lets you compare your method directly to others, while the latter can give you insights into which mask types your solution handles best and worst.

Posted by: afromero @ March 9, 2022, 11:55 p.m.