using exact similarity search

Thank you for providing the coding and implementation for density and coverage. It is awesome to have the code ready for use in practice. I cite the paper whenever possible.

I took the liberty to research possible improvements. I found that using an exact similarity search as offered by [`faiss`](https://github.com/facebookresearch/faiss) can speed up the calculation of density and coverage by a great deal.

Here are my results for `num_real_samples = num_fake_samples = 1024`, `feature_dim = 12`, `nearest_k = 5`:

```
--------------------------------------------------------------------------------------- benchmark: 4 tests ---------------------------------------------------------------------------------------
Name (time in ms)                 Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_my_coverage        10.7225 (1.0)       53.4196 (1.09)      13.9665 (1.0)       8.7931 (1.0)       11.1186 (1.0)       1.4884 (1.0)           1;4  71.5998 (1.0)          24           1
test_bench_my_density         11.9193 (1.11)      49.0908 (1.0)       16.7892 (1.20)      9.0503 (1.03)      12.8918 (1.16)      3.9669 (2.67)          2;3  59.5619 (0.83)         18           1
test_bench_prdc_coverage     316.7985 (29.55)    400.5574 (8.16)     354.7417 (25.40)    31.7475 (3.61)     355.2325 (31.95)    43.6299 (29.31)         2;0   2.8190 (0.04)          5           1
test_bench_prdc_density      365.5958 (34.10)    400.6876 (8.16)     382.3611 (27.38)    12.6120 (1.43)     380.4541 (34.22)    12.9641 (8.71)          2;0   2.6153 (0.04)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

I tagged, any algorithm using `faiss` with `test_bench_my`. Using a similarity tree approach, this line
```
 real_nearest_neighbour_distances = compute_nearest_neighbour_distances(
        real_features, nearest_k)
```
in the original code is accelerated big time due to the efficient lookup of samples with the tree structure. 

As such a change would drag in a dependency to `faiss`, I am reluctant to send a PR to this repo. Let me know what you think!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

using exact similarity search #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

using exact similarity search #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions