Commit d795ce6

cleaning README
Author: dbickson
Parents: 1063c18 + 1cd709e

1 file changed

+22
-3
lines changed

README.md

Lines changed: 22 additions & 3 deletions
@@ -15,23 +15,42 @@ We run on the full ImageNet dataset (11.5M images) to compare all pairs of image
 
 ![alt text](https://github.com/visualdatabase/fastdup/blob/main/gallery/fastdup_duplicates.png)
 
-FastDup identifies 1,200,000 duplicate images on the ImageNet dataset.
-
+FastDup identifies 1,200,000 duplicate images on the ImageNet dataset, a new unknown resut!
 
 
+# Installing the code
+For Python 3.7 and 3.8
+```
+pip install fastdup
+```
 
+[Install from stable release](INSTALL.md)
 
 
 # Running the code
+
+## Python
 ```
 > python3
 > import fastdup
 > fastdup.__version__ # prints the version number
-> fastdup.run(“/path/to/your/folder”) #main running function
+> fastdup.run(input_dir=“/path/to/your/folder”, work_dir="/path/to/your/folder") #main running function
+```
+
+## C++
+```
+/usr/bin/fastdup /path/to/your/folder --work_dir="/tmp/fastdup_files"
+
 ```
 
 [Detailed running instructions](RUN.md)
 
+
+
+# Support for s3 cloud/ google storage
+[Detailed instructions](CLOUD.md)
+
+
 ## Error handling
 When bad images are encountered, namely corrupted images that can not be read, an additional csv output file is generated called features.dat.bad. The bad images filenames are stored there. In addition there is a printout that states the number of good and bad images encountered. The good images filenames are stored in the file features.dat.csv file. Namely the bad images are excluded from the total images listing. The function fastdup.load_binary_features() reads the features corresponding to the good images and returns a list of all the good images, and a numpy array of all their corresponding features.
 The output file similarity.csv with the list of all similar pairs does not include any of the bad images.
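
The "Error handling" context lines of the README describe two listings in the work directory: features.dat.csv for good images and features.dat.bad for unreadable ones. The sketch below is one possible way to sanity-check that split after a run. It does not call fastdup itself; the helper names `read_filenames` and `summarize_outputs` are hypothetical, and the file layout (one filename per row, first column, no header) is an assumption that should be checked against the real output files.

```python
import csv
from pathlib import Path


def read_filenames(csv_path, column=0, has_header=False):
    """Return the values of one column of a CSV file.

    Assumption: each row holds one image filename in `column`.
    Adjust `column` and `has_header` to match the actual file layout.
    """
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    if has_header:
        rows = rows[1:]
    return [row[column] for row in rows]


def summarize_outputs(work_dir):
    """Count good and bad images listed in a work directory.

    features.dat.bad is only written when bad images are encountered,
    so a missing file is treated as "no bad images".
    """
    work_dir = Path(work_dir)
    good = read_filenames(work_dir / "features.dat.csv")
    bad_path = work_dir / "features.dat.bad"
    bad = read_filenames(bad_path) if bad_path.exists() else []
    # The README states that bad images are excluded from the good
    # listing, so this overlap is expected to be empty.
    overlap = sorted(set(good) & set(bad))
    return {"good": len(good), "bad": len(bad), "overlap": overlap}
```

Calling `summarize_outputs("/tmp/fastdup_files")` after a run would return the two counts plus any filename that appears in both listings; a non-empty overlap would suggest that the layout assumption above is wrong.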
