Commit 9e5d7a8

Update README.md
1 parent 76d0893 commit 9e5d7a8

1 file changed: +7 -11 lines changed

README.md

Lines changed: 7 additions & 11 deletions
@@ -8,29 +8,25 @@ From the authors of [GraphLab](https://github.com/jegonzal/PowerGraph) and [Turi
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/mscoco_duplicates-min.png)
 *Duplicates and near duplicates identified in [MS-COCO](https://cocodataset.org/#home) and [Imagenet-21K](https://www.image-net.org) dataset*
 
-![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/landmark_outliers-min.png)
-*Outliers in the [Google Landmark Recognition 2021 dataset](https://www.kaggle.com/competitions/landmark-recognition-2021) (dataset intention is to capture recognizable landmarks, like the empire state building etc.)*
-
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/imdb_outliers-min.png)
 *[IMDB-WIKI](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/ ) outliers (data goal is for face detection, gender and age detection)*
 
+![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/landmark_outliers-min.png)
+*Outliers in the [Google Landmark Recognition 2021 dataset](https://www.kaggle.com/competitions/landmark-recognition-2021) (dataset intention is to capture recognizable landmarks, like the empire state building etc.)*
+
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/red_wine.png)
 *Cluster of wrong labels in the [Imagenet-21K](https://www.image-net.org) dataset. No human can tell those red wine flavors from their image.*
 
-![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/daisy.png)
-*Cluster of wrong labels in the [Imagenet-21K](https://www.image-net.org) dataset.*
-
-
-
-
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/imagenet21k_wrong_labels-min.png)
-*Wrong labels in the [Imagenet-21K](https://www.image-net.org) dataset*
+*Wrong labels in the [Imagenet-21K](https://www.image-net.org) dataset* Different labels to visaully similar daisy flower images.
+
+![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/daisy.png)
+*Cluster of wrong labels in the [Imagenet-21K](https://www.image-net.org) dataset.* Different labels to visually similar red-wine images.
 
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/imagenet21k_funny-min.png)
 *Fun labels in the [Imagenet-21K](https://www.image-net.org) dataset*
 
 
-
 ## Results on Key Datasets
 We have thoroughly tested fastdup across various famous visual datasets. Ranging from pilar Academic datasets to Kaggle competitions. A key finding we have made using FastDup is that there are ~1.2M (!) duplicate images on the ImageNet-21K dataset, out of which 104K pairs belong both to the train and to the val splits (this amounts to 20% of the validation set). This is a new unknown result! Full results are below. * train/val splits are taken from https://github.com/Alibaba-MIIL/ImageNet21 .
