Commit a84de50

Update README.md
1 parent 590d95a commit a84de50

1 file changed: +40 -42 lines changed

README.md

Lines changed: 40 additions & 42 deletions
@@ -1,13 +1,45 @@
11

2-
# FastDup Manual
2+
# FastDup
33

4-
FastDup is a tool for fast detection of duplicate and near duplicate images.
4+
FastDup is a tool for fast detection of duplicate and near-duplicate images. It scales to millions of images running on CPU only.
55

66
![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/git_main-min.png)
77

8-
# FastDup is FAST
8+
## Quick Installation
9+
For Python 3.7 and 3.8
10+
```bash
11+
pip install fastdup
12+
```
13+
14+
[Install from stable release](INSTALL.md)
915

10-
Experiments on a 32 core Google cloud machine, with 128GB RAM (no GPU required).
16+
17+
## Running the code
18+
19+
### Python
20+
```python
21+
# Run the lines below inside a python3 interpreter session.
22+
import fastdup
23+
fastdup.run(input_dir="/path/to/your/folder", work_dir="/path/to/your/folder")  # main running function
24+
```
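
A slightly fuller sketch of the Python call above, for the case where generated files should be kept apart from the images. Only `fastdup.run(input_dir=..., work_dir=...)` comes from this README; `resolve_dirs` and `run_fastdup` are hypothetical helper names, and the rule that `work_dir` must sit outside `input_dir` is a precaution assumed here, not a documented fastdup requirement.

```python
from pathlib import Path
from typing import Tuple


def resolve_dirs(input_dir: str, work_dir: str) -> Tuple[Path, Path]:
    """Validate the image folder and create a separate output folder.

    Hypothetical helper: keeping work_dir outside input_dir is an
    assumption made for this sketch, not a fastdup rule.
    """
    src = Path(input_dir).expanduser().resolve()
    out = Path(work_dir).expanduser().resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"input_dir does not exist: {src}")
    if out == src or src in out.parents:
        raise ValueError("work_dir should live outside input_dir")
    out.mkdir(parents=True, exist_ok=True)
    return src, out


def run_fastdup(input_dir: str, work_dir: str):
    """Check the folders, then call fastdup.run as shown in the README."""
    src, out = resolve_dirs(input_dir, work_dir)
    import fastdup  # imported lazily so resolve_dirs works without fastdup installed
    return fastdup.run(input_dir=str(src), work_dir=str(out))
```

A call would look like `run_fastdup("/path/to/your/folder", "/tmp/fastdup_files")`, mirroring the paths used in the C++ example.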
25+
26+
### C++
27+
```bash
28+
/usr/bin/fastdup /path/to/your/folder --work_dir="/tmp/fastdup_files"
29+
```
30+
31+
[Detailed running instructions](RUN.md)
32+
33+
34+
35+
### Support for S3 / Google Cloud Storage
36+
[Detailed instructions](CLOUD.md)
37+
38+
39+
## Results on Key Datasets
40+
We have thoroughly tested FastDup across a range of well-known computer-vision datasets, from academic datasets to Kaggle competitions. A key finding is that there are ~1.2M (!) duplicate images in the ImageNet21K dataset, a previously unknown result! Full results are below.
41+
42+
### FastDup is FAST
1143

1244
|Dataset |Total Images |Owner |Image Res |cost [$]|spot cost [$]|processing [sec]|throughput [1/sec]|
1345
|-----------------------|---------------|-----------------------|--------------|--------|-------|-------|-----|
@@ -21,9 +53,11 @@ Experiments on a 32 core Google cloud machine, with 128GB RAM (no GPU required).
2153
|[visualgenome](https://visualgenome.org/) |108,079 |stanford |334x500 |0.05 |0.01 |124 |872|
2254
|[sku110k](https://github.com/eg4000/SKU110K_CVPR19) |11,743 |trax |4160x2340 |0.03 |0.01 |77 |153|
2355

24-
We run on the full ImageNet dataset (11.5M images) to compare all pairs of images in less than 3 hours WITHOUT a GPU (with Google cloud cost of 5$).
56+
* Experiments ran on a 32-core Google Cloud machine with 128GB RAM (no GPU required).
2557

26-
# FastDup is ACCURATE
58+
* We ran FastDup on the full ImageNet dataset (11.5M images), comparing all pairs of images in less than 3 hours WITHOUT a GPU (at a Google Cloud cost of $5).
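
As a quick sanity check on the numbers above: the throughput column appears to be total images divided by processing seconds. That reading is an inference from the two table rows visible in this hunk, not something the README states. The sketch below recomputes those two rows and derives the lower bound on throughput implied by the ImageNet claim; `throughput` and `all_pairs` are illustrative helper names, and the pair count only shows the size of the comparison space, not how FastDup covers it.

```python
def throughput(total_images: int, processing_sec: float) -> float:
    """Images processed per second."""
    return total_images / processing_sec


def all_pairs(n: int) -> int:
    """Number of unordered image pairs among n images: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# (total images, processing seconds) for the two rows visible in this hunk
rows = {"visualgenome": (108_079, 124), "sku110k": (11_743, 77)}
for name, (images, seconds) in rows.items():
    print(name, round(throughput(images, seconds)), "images/sec")

# ImageNet claim: 11.5M images in under 3 hours gives a lower bound on throughput
imagenet_images = 11_500_000  # approximate figure quoted in the README
print("ImageNet lower bound:", round(throughput(imagenet_images, 3 * 3600)), "images/sec")
print("ImageNet unordered pairs:", all_pairs(imagenet_images))
```

The rounded results for the two rows match the 872 and 153 shown in the table, which supports the images-per-second reading of that column.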
59+
60+
### FastDup is ACCURATE
2761

2862

2963
Dataset| Identical Pairs| Near-Identical Pairs
@@ -42,39 +76,3 @@ Dataset| Identical Pairs| Near-Identical Pairs
4276
[snakeclef2022-fgvc9](https://www.kaggle.com/competitions/snakeclef2022/data) |6,953 |33,128
4377
[fungiclef2022-fgvc9](https://www.kaggle.com/competitions/fungiclef2022/data) |2,205 |75
4478
[hotel-id-to-combat-human-trafficking-2022-fgvc9](https://www.kaggle.com/competitions/hotel-id-to-combat-human-trafficking-2022-fgvc9/data)| 3,544 |2,704
45-
46-
47-
FastDup identifies 1,200,000 duplicate images in the ImageNet dataset, a previously unknown result!
48-
49-
50-
# Installing the code
51-
For Python 3.7 and 3.8
52-
```bash
53-
pip install fastdup
54-
```
55-
56-
[Install from stable release](INSTALL.md)
57-
58-
59-
# Running the code
60-
61-
## Python
62-
```python
63-
# Run the lines below inside a python3 interpreter session.
64-
import fastdup
65-
fastdup.run(input_dir="/path/to/your/folder", work_dir="/path/to/your/folder")  # main running function
66-
```
67-
68-
## C++
69-
```bash
70-
/usr/bin/fastdup /path/to/your/folder --work_dir="/tmp/fastdup_files"
71-
```
72-
73-
[Detailed running instructions](RUN.md)
74-
75-
76-
77-
# Support for s3 cloud/ google storage
78-
[Detailed instructions](CLOUD.md)
79-
80-
