> Sharing some data insights

Hello everyone, I'm sharing some data insights I got after running fastdup on the labeled and unlabeled images.

> I found many duplicates, especially in the unlabeled data.
> There are a number of images that are too dark/bright to be useful.
> There are clusters of images that appear to be similar.
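
For anyone who wants a quick sanity check before setting up a dedicated tool, here is a rough sketch in plain Python. It is not fastdup and does not reproduce its analysis: it only catches byte-identical files (not near-duplicates or similarity clusters) and flags brightness from the mean grayscale value. The function names and the thresholds (30 and 225) are my own assumptions for illustration, so tune them for your data.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_exact_duplicates(image_dir):
    """Group files under image_dir whose bytes are identical.

    Stand-in for a real duplicate detector: resized or re-encoded
    copies of the same image will NOT be caught by this check.
    """
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only hashes shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]


def brightness_label(gray_pixels, dark_below=30.0, bright_above=225.0):
    """Label an image from its 0-255 grayscale pixel values.

    The thresholds are hypothetical defaults, not values from any tool.
    """
    pixels = list(gray_pixels)
    if not pixels:
        raise ValueError("no pixels given")
    mean = sum(pixels) / len(pixels)
    if mean < dark_below:
        return "too dark"
    if mean > bright_above:
        return "too bright"
    return "ok"
```

`find_exact_duplicates` takes a folder path and returns a list of groups of identical files. `brightness_label` expects the grayscale pixel values of an already decoded image, so you would load the image with whatever library you normally use and pass in the flattened values.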

I've shared my findings in my GitHub repo. I hope it helps anyone who is getting started.

Let me know if you have other insights.

Posted by: dickson_neoh @ March 7, 2023, 11:29 a.m.
