* Bray-Curtis dissimilarity
* Bregman divergence

For `Euclidean distance`, `Squared Euclidean distance`, `Cityblock distance`, `Minkowski distance`, and `Hamming distance`, a weighted version is also provided.

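To make the weighted variants concrete, here is a sketch of the formula they implement (plain Julia, no package required; in *Distances.jl* itself these are exposed as, e.g., `WeightedEuclidean(w)` or the convenience function `weuclidean(x, y, w)` — example values are arbitrary):

```julia
# Weighted Euclidean distance: sqrt(sum(w[i] * (x[i] - y[i])^2 for i)).
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w = [1.0, 0.5, 0.25]          # per-coordinate weights

r = sqrt(sum(w .* (x .- y) .^ 2))
# sqrt(1*1 + 0.5*4 + 0.25*9) = sqrt(5.25) ≈ 2.2913
```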
## Basic use
```julia
r = evaluate(dist, x, y)
r = dist(x, y)
```

Here, `dist` is an instance of a distance type. For example, the type for Euclidean distance is `Euclidean` (more distance types are introduced in the next section), so you can compute the Euclidean distance between `x` and `y` as

```julia
r = evaluate(Euclidean(), x, y)
r = euclidean(x, y)
```

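As a package-free sanity check of what `Euclidean` computes (values arbitrary):

```julia
x = [3.0, 0.0]
y = [0.0, 4.0]

# Euclidean distance formula: sqrt(sum((x[i] - y[i])^2))
r = sqrt(sum(abs2, x .- y))
# r == 5.0
```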
#### Computing distances between corresponding columns
Suppose you have two `m-by-n` matrices `X` and `Y`; then you can compute all distances between corresponding columns of `X` and `Y` in one batch, using the `colwise` function, as

```julia
r = colwise(dist, X, Y)
```

The output `r` is a vector of length `n`. In particular, `r[i]` is the distance between `X[:,i]` and `Y[:,i]`. The batch computation typically runs considerably faster than calling `evaluate` column by column.

Note that either `X` or `Y` can be just a single vector -- then the `colwise` function will compute the distance between this vector and each column of the other argument.

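Semantically, `colwise` is equivalent to the following comprehension (a package-free sketch; `sqeuclid` is a hypothetical stand-in for a Distances metric):

```julia
sqeuclid(x, y) = sum(abs2, x .- y)   # stand-in for a Distances metric

X = [1.0 2.0; 3.0 4.0]               # two 2-element columns
Y = [1.0 0.0; 3.0 0.0]

r = [sqeuclid(X[:, i], Y[:, i]) for i in 1:size(X, 2)]
# r == [0.0, 20.0]
```

The package's `colwise` produces the same values but avoids the per-column temporaries, which is where the batch speedup comes from.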
#### Computing pairwise distances
Let `X` and `Y` respectively have `m` and `n` columns. Then the `pairwise` function with the `dims=2` argument computes distances between each pair of columns in `X` and `Y`:

```julia
R = pairwise(dist, X, Y, dims=2)
```

In the output, `R` is a matrix of size `(m, n)`, such that `R[i,j]` is the distance between `X[:,i]` and `Y[:,j]`. Computing distances for all pairs using the `pairwise` function is often remarkably faster than evaluating each pair individually.

If you just want to compute distances between the columns of a matrix `X`, you can write

```julia
R = pairwise(dist, X, dims=2)
```

This statement will result in an `m-by-m` matrix, where `R[i,j]` is the distance between `X[:,i]` and `X[:,j]`. `pairwise(dist, X)` is typically more efficient than `pairwise(dist, X, X)`, as the former will take advantage of the symmetry when `dist` is a semi-metric (including metric).

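Likewise, `pairwise` with `dims=2` matches this double comprehension (package-free sketch; `sqeuclid` is a hypothetical stand-in metric):

```julia
sqeuclid(x, y) = sum(abs2, x .- y)   # stand-in for a Distances metric

X = [0.0 1.0; 0.0 1.0]               # m = 2 columns
Y = [0.0 3.0 0.0; 4.0 0.0 0.0]       # n = 3 columns

R = [sqeuclid(X[:, i], Y[:, j]) for i in 1:size(X, 2), j in 1:size(Y, 2)]
# R == [16.0 9.0 0.0; 10.0 5.0 2.0]
```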
For performance reasons, it is recommended to use matrices with observations in columns (as shown above). Indeed, the `Array` type in Julia is column-major, making it more efficient to access memory column by column. However, matrices with observations stored in rows are also supported via the argument `dims=1`.

#### Computing column-wise and pairwise distances in-place

If the vector/matrix to store the results is pre-allocated, you may reuse that storage (without creating a new array) using the following syntax (`i` being either `1` or `2`):

```julia
colwise!(r, dist, X, Y)
pairwise!(R, dist, X, Y, dims=i)
pairwise!(R, dist, X, dims=i)
```

Please pay attention to the difference: the functions for in-place computation are `colwise!` and `pairwise!` (instead of `colwise` and `pairwise`).

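What the in-place variants do can be sketched package-free with Julia's usual preallocation idiom (`sqeuclid` is a hypothetical stand-in metric):

```julia
sqeuclid(x, y) = sum(abs2, x .- y)        # stand-in for a Distances metric

X, Y = rand(3, 5), rand(3, 5)
r = Vector{Float64}(undef, size(X, 2))    # pre-allocated output, as for colwise!
for i in eachindex(r)
    r[i] = sqeuclid(view(X, :, i), view(Y, :, i))   # fills r without allocating a new array
end
```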
## Distance type hierarchy
The distances are organized into a type hierarchy.

At the top of this hierarchy is an abstract type **PreMetric**, which is defined to be a function `d` that satisfies

    d(x, x) == 0    for all x
    d(x, y) >= 0    for all x, y
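These properties can be spot-checked numerically for any candidate `d`; for example, with squared Euclidean (which satisfies both):

```julia
d(x, y) = sum(abs2, x .- y)   # squared Euclidean distance

x, y = rand(4), rand(4)
@assert d(x, x) == 0          # d(x, x) == 0 for all x
@assert d(x, y) >= 0          # d(x, y) >= 0 for all x, y
```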
The tables below can be replicated using the script in `benchmark/print_table.jl`.

#### Column-wise benchmark

The table below compares the performance (measured as the average elapsed time per iteration) of a straightforward loop implementation and the optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance between corresponding columns in two `200-by-10000` matrices.

| distance | loop | colwise | gain |
|----------- | -------| ----------| -------|
| Mahalanobis | 0.082180s | 0.019618s | 4.1891 |
| BrayCurtis | 0.004464s | 0.001121s | 3.9809 |

We can see that using `colwise` instead of a simple loop yields considerable gain (2x - 4x), especially when the internal computation of each distance is simple. Nonetheless, when the computation of a single distance is heavy enough (e.g. *KLDivergence*, *RenyiDivergence*), the gain is not as significant.

#### Pairwise benchmark

The table below compares the performance (measured as the average elapsed time per iteration) of a straightforward loop implementation and the optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance in a pairwise manner between the columns of a `100-by-200` matrix and a `100-by-250` matrix, which results in a `200-by-250` distance matrix.

| distance | loop | pairwise | gain |
|----------- | -------| ----------| -------|

For distances of which a major part of the computation is a quadratic form (e.g. *Euclidean*, *CosineDist*, *Mahalanobis*), the performance can be drastically improved by restructuring the computation and delegating the core part to `GEMM` in *BLAS*. The use of this strategy can easily lead to a 100x performance gain over simple loops (see the highlighted part of the table above).