Commit 2664071

Author: dbickson
Commit message: fixing
1 parent c6cb091 commit 2664071

1 file changed, +10 -1 lines changed


RUN.md

Lines changed: 10 additions & 1 deletion
@@ -19,7 +19,12 @@ Alternatively, it is also possible to give a location of a file listing images f

 num_images (int): Number of images to run on. Default is -1 which means run on all the images in the image_dir folder.

-nnmodel (str): Nearest Neighbor model for clustering the features together, when using turi (has no effect when using faiss). Supported options are brute_force (exact), ball_tree and lsh (both approximate). Default is brute_force.
+turi_param (str): Optional additional parameters for turi. Supported parameters are:
+- nnmodel=0|1|2 Nearest Neighbor model for clustering the features together, when using turi (has no effect when using faiss). Supported options are 0=brute_force (exact), 1=ball_tree and 2=lsh (both approximate). Default is 0 (brute_force).
+- ccthreshold=XX where XX is in the range [0,1]. Construct the similarities graph from pairs whose similarity > XX.
+- run_cc=0|1 Disable/enable connected components computation on the graph of similarities.
+- run_pagerank=0|1 Disable/enable pagerank computation on the graph of similarities.
+- run_degree=0|1 Disable/enable degree distribution computation on the graph of similarities.

 distance (str): Distance metric for the Nearest Neighbors algorithm. Default is cosine. Other distances are euclidean, squared_euclidean, manhattan.

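The `turi_param` options added in this hunk are `key=value` pairs packed into a single string. As a minimal sketch of how such a string might be assembled and range-checked before it is passed on, the helper below is hypothetical (`build_turi_param` is not part of fastdup), and the comma separator between pairs is an assumption, since the diff does not state how multiple options are joined.

```python
def build_turi_param(nnmodel=0, ccthreshold=None, run_cc=None,
                     run_pagerank=None, run_degree=None, sep=","):
    """Hypothetical helper: join turi options into one key=value string.

    The separator `sep` is an assumption; the documented text does not
    say how several options are combined.
    """
    if nnmodel not in (0, 1, 2):
        raise ValueError("nnmodel must be 0 (brute_force), 1 (ball_tree) or 2 (lsh)")
    parts = [f"nnmodel={nnmodel}"]
    if ccthreshold is not None:
        # The documented range for ccthreshold is [0, 1].
        if not 0.0 <= ccthreshold <= 1.0:
            raise ValueError("ccthreshold must be in the range [0, 1]")
        parts.append(f"ccthreshold={ccthreshold}")
    # Each run_* flag is documented as 0 (disable) or 1 (enable).
    for name, flag in (("run_cc", run_cc),
                       ("run_pagerank", run_pagerank),
                       ("run_degree", run_degree)):
        if flag is not None:
            parts.append(f"{name}={int(bool(flag))}")
    return sep.join(parts)


param = build_turi_param(nnmodel=0, ccthreshold=0.96, run_cc=True, run_pagerank=False)
print(param)  # nnmodel=0,ccthreshold=0.96,run_cc=1,run_pagerank=0
```

Options left as `None` are omitted from the string, so whatever defaults the tool applies stay in effect for them.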
@@ -92,6 +97,10 @@ def load_binary_feature(filename):
 Faiss index files
 - When using faiss an additional intermediate results file is created: `faiss.index`.

+Graph computation
+- When enabling connected components, a file named `components_info.csv` is created with the number of nodes (=images) per component.
+- A file named `connected_components.csv` includes the output of pagerank, degree distribution and connected component assignments. The first column is the index in the `features.dat.csv` file (the image list). This file is sorted according to the list.
+
 ## Error handling
 When bad images are encountered, namely corrupted images that cannot be read, an additional csv output file is generated called features.dat.bad. The bad images filenames are stored there. In addition there is a printout that states the number of good and bad images encountered. The good images filenames are stored in the features.dat.csv file. Namely the bad images are excluded from the total images listing. The function fastdup.load_binary_features() reads the features corresponding to the good images and returns a list of all the good images, and a numpy array of all their corresponding features.
 The output file similarity.csv with the list of all similar pairs does not include any of the bad images.
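Since the first column of `connected_components.csv` is described as an index into the image list in `features.dat.csv`, the two files can be joined by row position. The sketch below illustrates that lookup under several assumptions that the diff does not confirm: both files are comma-separated, both start with a header row, and the index is 0-based. The function name `attach_filenames` and the sample column names and values are made up for illustration.

```python
import csv
import io


def attach_filenames(components_file, image_list_file, has_header=True):
    """Pair each graph-output row with the image-list row it refers to.

    Assumes (not confirmed by the source) comma-separated files and a
    0-based index in the first column of the components file.
    """
    images = list(csv.reader(image_list_file))
    rows = list(csv.reader(components_file))
    if has_header:
        images, rows = images[1:], rows[1:]
    # First column: index into the image list. Remaining columns are kept as-is.
    return [(images[int(row[0])], row[1:]) for row in rows]


# Made-up stand-ins for features.dat.csv and connected_components.csv.
image_list = io.StringIO("filename\na.jpg\nb.jpg\nc.jpg\n")
components = io.StringIO("index,component\n0,7\n1,7\n2,9\n")

joined = attach_filenames(components, image_list)
for image_row, graph_values in joined:
    print(image_row[0], graph_values)
```

With the real files, the `io.StringIO` objects would be replaced by `open(...)` handles, and the header and index conventions should be checked against the actual output first.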
