Hello everyone, I'm sharing some insights into the data that I got from running a tool called fastdup on the labeled and unlabeled images.
TLDR
> I found many duplicates, especially in the unlabeled data.
> There are a number of images that are too dark/bright to be useful.
> There are clusters of images that appear to be similar.
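For anyone who wants a quick sanity check on the duplicates point without installing anything, here is a minimal sketch using only the Python standard library. Note that this is not how fastdup works: it only catches files that are byte-for-byte identical, so it will miss resized or re-encoded copies. The function name `find_exact_duplicates` and the `images/` folder in the usage comment are made up for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Group files under image_dir whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            # Identical bytes give an identical SHA-256 digest.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in groups.values() if len(paths) > 1]


# Example usage (hypothetical folder name):
# for group in find_exact_duplicates("images/"):
#     print([p.name for p in group])
```

Each returned group is a list of paths with the same content, so you can keep one file per group and drop the rest.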
I've shared my findings in my GitHub repo. I hope it helps anyone who is getting started.
https://github.com/dnth/mafat-fastdup-blogpost
Let me know if you have other insights.
Posted by: dickson_neoh @ March 7, 2023, 11:29 a.m.