RUN.md: 10 additions & 1 deletion
@@ -19,7 +19,12 @@ Alternatively, it is also possible to give a location of a file listing images f
num_images (int): Number of images to run on. Default is -1 which means run on all the images in the image_dir folder.
-
nnmodel (str): Nearest Neighbor model for clustering the features together, when using turi (has no effect when using faiss). Supported options are brute_force (exact), ball_tree and lsh (both approximate). Default is brute_force.
+
turi_param (str): Optional additional parameters for turi. Supported parameters are:
+
- nnmodel=0|1|2 Nearest Neighbor model for clustering the features together, when using turi (has no effect when using faiss). Supported options are 0=brute_force (exact), 1=ball_tree and 2=lsh (both approximate). Default is 0 (brute_force).
+
- ccthreshold=XX where XX is in the range [0,1]. The graph of similarities is constructed from pairs whose similarity > XX.
+
- run_cc=0|1 Disable/enable connected components computation on the graph of similarities.
+
- run_pagerank=0|1 Disable/enable pagerank computation on the graph of similarities.
+
- run_degree=0|1 Disable/enable degree distribution computation on the graph of similarities.
distance (str): Distance metric for the Nearest Neighbors algorithm. Default is cosine. Other distances are euclidean, squared_euclidean, manhattan.
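
The `turi_param` options listed above are `key=value` settings packed into a single string. Below is a minimal sketch of a helper that validates the documented ranges and assembles such a string. The helper name `build_turi_param` is hypothetical, and the comma separator between options is an assumption: this section does not state how multiple options are joined.

```python
def build_turi_param(nnmodel=0, ccthreshold=None, run_cc=None,
                     run_pagerank=None, run_degree=None, sep=","):
    """Assemble a turi_param string from the documented key=value options.

    `sep` is an ASSUMPTION (the separator is not documented in this section).
    Options left as None are omitted from the string.
    """
    if nnmodel not in (0, 1, 2):
        raise ValueError("nnmodel must be 0 (brute_force), 1 (ball_tree) or 2 (lsh)")
    parts = [f"nnmodel={int(nnmodel)}"]

    if ccthreshold is not None:
        if not 0.0 <= ccthreshold <= 1.0:
            raise ValueError("ccthreshold must be in the range [0,1]")
        parts.append(f"ccthreshold={ccthreshold}")

    # The three graph computations are plain 0|1 switches.
    for name, flag in (("run_cc", run_cc),
                       ("run_pagerank", run_pagerank),
                       ("run_degree", run_degree)):
        if flag is None:
            continue
        if flag not in (0, 1):
            raise ValueError(f"{name} must be 0 or 1")
        parts.append(f"{name}={int(flag)}")

    return sep.join(parts)


if __name__ == "__main__":
    print(build_turi_param(nnmodel=0, ccthreshold=0.96, run_cc=1))
```

The resulting string would then be passed as the `turi_param` argument; validating up front keeps an out-of-range `ccthreshold` or an unknown `nnmodel` code from reaching the run.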
- When using faiss an additional intermediate results file is created: `faiss.index`.
+
Graph computation
+
- When connected components computation is enabled, a file named `components_info.csv` is created with the number of nodes (= images) per component.
+
- A file named `connected_components.csv` includes the output of pagerank, degree distribution and connected component assignments. The first column is the index in the `features.dat.csv` file (the image list). The rows are sorted in the same order as that list.
+
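
To make the graph outputs above more concrete, here is a small sketch of what thresholding a similarity list and computing connected components means. It illustrates the idea only and is not fastdup's implementation. The function names, the `(index_a, index_b, similarity)` pair layout and the union-find approach are all assumptions made for the example.

```python
from collections import Counter


def connected_components(num_images, pairs, ccthreshold):
    """Label each image index with the smallest index in its component.

    `pairs` is an iterable of (index_a, index_b, similarity) tuples
    (an assumed layout). Only pairs with similarity > ccthreshold
    become edges of the graph.
    """
    parent = list(range(num_images))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, similarity in pairs:
        if similarity > ccthreshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the smaller index as the root so labels are stable.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return [find(i) for i in range(num_images)]


def component_sizes(labels):
    """Number of nodes (= images) per component."""
    return Counter(labels)


if __name__ == "__main__":
    pairs = [(0, 1, 0.98), (1, 2, 0.97), (3, 4, 0.90)]
    labels = connected_components(5, pairs, ccthreshold=0.95)
    print(labels)
    print(component_sizes(labels))
```

In this toy input, images 0, 1 and 2 end up in one component, while the pair (3, 4) falls below the threshold, so images 3 and 4 each remain in a component of size one.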
## Error handling
When bad images are encountered, namely corrupted images that cannot be read, an additional csv output file called features.dat.bad is generated, and the bad image filenames are stored there. In addition, a printout states the number of good and bad images encountered. The good image filenames are stored in the file features.dat.csv; that is, the bad images are excluded from the total image listing. The function fastdup.load_binary_features() reads the features corresponding to the good images and returns a list of all the good images and a numpy array of their corresponding features.
The output file similarity.csv with the list of all similar pairs does not include any of the bad images.
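
The guarantees described above (good and bad listings are disjoint, and no bad image appears among the similar pairs) can be checked after a run. The sketch below works on filename lists that have already been loaded. The function name `check_bad_image_handling` is hypothetical, and how the filenames are read out of features.dat.csv, features.dat.bad and similarity.csv is left out because their column layout is not described here.

```python
def check_bad_image_handling(good_files, bad_files, similar_pairs):
    """Summarize good/bad counts and flag any violation of the exclusions.

    good_files    -- filenames taken from features.dat.csv
    bad_files     -- filenames taken from features.dat.bad
    similar_pairs -- (filename_a, filename_b) tuples taken from similarity.csv
    How these are parsed from disk is an ASSUMPTION left to the caller.
    """
    bad = set(bad_files)
    # A file listed as both good and bad would contradict the documentation.
    listed_twice = sorted(bad & set(good_files))
    # Pairs that mention a bad image should not exist.
    pairs_with_bad = [(a, b) for a, b in similar_pairs if a in bad or b in bad]
    return {
        "good": len(good_files),
        "bad": len(bad_files),
        "listed_twice": listed_twice,
        "pairs_with_bad": pairs_with_bad,
    }


if __name__ == "__main__":
    report = check_bad_image_handling(
        good_files=["a.jpg", "b.jpg", "c.jpg"],
        bad_files=["broken.jpg"],
        similar_pairs=[("a.jpg", "b.jpg")],
    )
    print(report)
```

An empty `listed_twice` and an empty `pairs_with_bad` indicate that the outputs are consistent with the behavior described in this section.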