README.md: 53 additions & 56 deletions
@@ -9,6 +9,7 @@ A Julia package for evaluating distances(metrics) between vectors.
This package also provides optimized functions to compute column-wise and pairwise distances, which are often substantially faster than a straightforward loop implementation (see the benchmark section below for details).
+
## Supported distances
* Euclidean distance
@@ -37,11 +38,11 @@ This package also provides optimized functions to compute column-wise and pairwi
For ``Euclidean distance``, ``Squared Euclidean distance``, ``Cityblock distance``, ``Minkowski distance``, and ``Hamming distance``, a weighted version is also provided.
+
## Basic Use
The library supports three ways of computation: *computing the distance between two vectors*, *column-wise computation*, and *pairwise computation*.
-
#### Computing the distance between two vectors
Each distance corresponds to a *distance type*. You can compute any supported distance between two vectors using the following syntax:
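As a minimal illustration, assuming the package is installed and loaded with `using Distances`, a distance is computed by constructing a distance type and passing it to `evaluate` together with the two vectors. The vectors `x` and `y` below are made-up data chosen so the results are easy to check by hand.

```julia
using Distances

# Made-up example vectors (not from the README).
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]

# Construct a distance type, then evaluate it on the two vectors.
r = evaluate(Euclidean(), x, y)   # sqrt(3^2 + 4^2 + 0^2)
c = evaluate(Cityblock(), x, y)   # |3| + |4| + |0|
```

Swapping `Euclidean()` for another distance type changes the distance without changing the calling pattern.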
@@ -93,7 +94,6 @@ R = pairwise(dist, X)
This statement will result in an ``m-by-m`` matrix, where ``R[i,j]`` is the distance between ``X[:,i]`` and ``X[:,j]``.
``pairwise(dist, X)`` is typically more efficient than ``pairwise(dist, X, X)``, because the former takes advantage of symmetry when ``dist`` is a semi-metric (which includes every metric).
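The saving from symmetry can be sketched in plain Julia. The helper `symmetric_pairwise` and the stand-in distance `cityblock` below are hypothetical names used only for illustration; this is not the package's implementation. The sketch evaluates the distance only for `i < j` and mirrors each value, which is valid when `d(x, y) == d(y, x)` and `d(x, x) == 0`.

```julia
# Hypothetical helper (not part of Distances.jl): pairwise distances among
# the columns of X, evaluating d only for i < j and mirroring the result.
function symmetric_pairwise(d, X::AbstractMatrix)
    m = size(X, 2)
    R = zeros(Float64, m, m)
    for j in 1:m, i in 1:j-1
        R[i, j] = d(view(X, :, i), view(X, :, j))
        R[j, i] = R[i, j]            # symmetry: d(x, y) == d(y, x)
    end
    return R                         # diagonal stays 0 because d(x, x) == 0
end

cityblock(a, b) = sum(abs.(a .- b))  # simple stand-in distance

X = [0.0 1.0 3.0;
     0.0 1.0 1.0]
R = symmetric_pairwise(cityblock, X)
```

For `m` columns this makes `m(m-1)/2` distance evaluations instead of `m^2`, roughly halving the work.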
-
#### Computing column-wise and pairwise distances in place
If the vector or matrix that will hold the results is pre-allocated, you can write into that storage (without creating a new array) using the following syntax:
@@ -107,7 +107,6 @@ pairwise!(R, dist, X)
Note the difference: the functions for in-place computation are ``colwise!`` and ``pairwise!`` (instead of ``colwise`` and ``pairwise``).
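The pre-allocation pattern itself can be shown without the package. The function `colwise_euclidean!` below is a hypothetical stand-in written in plain Julia, not the package's `colwise!`; it follows the Julia convention that a trailing `!` marks a function that overwrites one of its arguments.

```julia
# Hypothetical in-place helper (illustration only, not the package's code):
# writes the Euclidean distance between corresponding columns into r.
function colwise_euclidean!(r::AbstractVector, X::AbstractMatrix, Y::AbstractMatrix)
    size(X) == size(Y) || throw(DimensionMismatch("X and Y must have the same size"))
    length(r) == size(X, 2) || throw(DimensionMismatch("r needs one entry per column"))
    for j in axes(X, 2)
        s = zero(eltype(r))
        for i in axes(X, 1)
            s += abs2(X[i, j] - Y[i, j])
        end
        r[j] = sqrt(s)
    end
    return r
end

X = [0.0 1.0; 0.0 1.0]
Y = [3.0 1.0; 4.0 2.0]
r = zeros(2)                  # allocated once
colwise_euclidean!(r, X, Y)   # fills r without creating a new array
```

Reusing `r` across repeated calls avoids allocating a fresh result vector each time, which matters most inside tight loops.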
-
## Distance type hierarchy
The distances are organized into a type hierarchy.
The implementation has been carefully optimized based on benchmarks. The Julia scripts ``test/bench_colwise.jl`` and ``test/bench_pairwise.jl`` run the benchmarks on a variety of distances, respectively under column-wise and pairwise settings.
+## Benchmarks

-Here are benchmarks obtained running Julia 0.5.1 on a late-2016 MacBook Pro running MacOS 10.12.3 with an quad-core Intel Core i7 processor @ 2.9 GHz.
+The implementation has been carefully optimized based on benchmarks. The script in `benchmarks/benchmark.jl` defines a benchmark suite
+for a variety of distances, under column-wise and pairwise settings.
+
+Here are benchmarks obtained running Julia 0.6 on a computer with a quad-core Intel Core i5-2500K processor @ 3.3 GHz.
+The tables below can be replicated using the script in `benchmarks/print_table.jl`.
#### Column-wise benchmark
The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance between corresponding columns in two ``200-by-10000`` matrices.
We can see that using ``colwise`` instead of a simple loop yields considerable gain (2x - 4x), especially when the internal computation of each distance is simple. Nonetheless, when the computation of a single distance is heavy enough (e.g. *KLDivergence*, *RenyiDivergence*), the gain is not as significant.
#### Pairwise benchmark
The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance in a pairwise manner between the columns of a ``100-by-200`` matrix and a ``100-by-250`` matrix, which results in a ``200-by-250`` distance matrix.
For distances whose computation is dominated by a quadratic form (e.g. *Euclidean*, *CosineDist*, *Mahalanobis*), the performance can be drastically improved by restructuring the computation and delegating the core part to ``GEMM`` in *BLAS*. This strategy can easily lead to a 100x performance gain over simple loops (see the highlighted part of the table above).
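The restructuring can be sketched for the squared Euclidean distance using the identity `‖x − y‖² = ‖x‖² + ‖y‖² − 2xᵀy`: all the inner products `xᵀy` come from a single matrix product. The function names below are hypothetical and the code is an illustration of the idea under these assumptions, not the package's actual implementation. It uses only the `LinearAlgebra` standard library, where the product of two `Float64` matrices is handled by BLAS.

```julia
using LinearAlgebra

# Pairwise squared Euclidean distances between the columns of X and Y,
# using ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x'y.
function pairwise_sqeuclidean_gemm(X::Matrix{Float64}, Y::Matrix{Float64})
    sx = vec(sum(abs2, X; dims=1))        # squared norms of the columns of X
    sy = vec(sum(abs2, Y; dims=1))        # squared norms of the columns of Y
    R = X' * Y                            # all inner products in one matrix product
    R .= max.(sx .+ sy' .- 2 .* R, 0.0)   # clamp small negative round-off to zero
    return R
end

# Reference: straightforward loop over all column pairs.
function pairwise_sqeuclidean_loop(X, Y)
    [sum(abs2, view(X, :, i) .- view(Y, :, j)) for i in axes(X, 2), j in axes(Y, 2)]
end

# Same shapes as the pairwise benchmark: 100-by-200 and 100-by-250.
X = rand(100, 200)
Y = rand(100, 250)
R = pairwise_sqeuclidean_gemm(X, Y)
```

The clamp to zero is a design choice in this sketch: floating-point cancellation can produce tiny negative values for nearly identical columns, which would otherwise break a later `sqrt`.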