* test_dists.jl: Fixes errors that could arise in the unlikely event that all values in p are < 0.3.
* Introduce tests to make sure that claims of PreMetric, SemiMetric and Metric-ness for the distance measures are justified (a sketch of such property checks follows this list).
* Improved and more widespread fix to the problem of the fixed 0.3 threshold for setting `p` to 0.
* Tidy test code to the standard Julia style used in the rest of the package: floating-point literals end in `.0` rather than a bare `.`.
* Force exact computation of tests, which now results in test failures.
* Individual columns of P matrix can still be zero, so fix to remove elements per column.
* Removing duplicated tests for equality with zero now we have PreMetric tests.
* Improve documentation by describing argument types of distance measures.
* Remove floating-point rounding errors (`sum(p) != 1`) in RenyiDivergence that caused very small negative divergences. RenyiDivergence now internally normalises its arguments.
* Add in some extra checks for RenyiDivergence, and remove one which no longer applies post-normalisation.
* CosineDist and CorrDist can both have small rounding errors - corrected for now by increasing error tolerance.
* Add length-match checks for the Bhattacharyya and Mahalanobis distances, especially since the Bhattacharyya implementation relies on matching lengths because it uses `@inbounds`.
* Add in metric-ness tests for Jaccard, SpanNormDist and RogersTanimoto.
* Update benchmarks to include Renyi divergences, and pass them probability distributions.
* Update benchmark processor information.
* Add in new DimensionMismatch checks to confirm implementation.
* Fix indentation problems.
* typo
* More indentation fixes - mostly to make @testsets indent.
* Remove a final equal-to-zero test.
* Fixed indent.
* There are more rounding errors resulting in dxz ≤ dxy + dyz failing because dxz ≈ dxy + dyz but on the wrong side of the equality. For the time being, we just need to accept this.
* Fix README.md argument explanation.
* Indent.
* Add in Renyi divergence docs.
* Fix benchmarks.
| type name | convenient syntax | math definition |
| --------- | ----------------- | --------------- |
| WeightedHamming | `whamming(x, y, w)` | `sum((x .!= y) .* w)` |
**Note:** The formulas above use *Julia* syntax. They are meant to convey the mathematical concepts concisely; the actual implementations may use faster methods. The arguments `x` and `y` are arrays of real numbers; `k` and `l` are arrays of distinct elements of any kind; `a` and `b` are arrays of `Bool`s; and finally, `p` and `q` are arrays forming discrete probability distributions and are therefore each expected to sum to one.
The implementation has been carefully optimized based on benchmarks. The Julia scripts ``test/bench_colwise.jl`` and ``test/bench_pairwise.jl`` run the benchmarks on a variety of distances, respectively under column-wise and pairwise settings.
Here are benchmarks obtained running Julia 0.5.1 on a late-2016 MacBook Pro running macOS 10.12.3 with a quad-core Intel Core i7 processor @ 2.9 GHz.
#### Column-wise benchmark
The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance between corresponding columns in two ``200-by-10000`` matrices.
We can see that using ``colwise`` instead of a simple loop yields considerable gain (2x - 4x), especially when the internal computation of each distance is simple. Nonetheless, when the computation of a single distance is heavy enough (e.g. *KLDivergence*, *RenyiDivergence*), the gain is not as significant.
#### Pairwise benchmark
The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance in a pairwise manner between the columns of a ``100-by-200`` matrix and a ``100-by-250`` matrix, resulting in a ``200-by-250`` distance matrix.
For distances whose computation is dominated by a quadratic form (e.g. *Euclidean*, *CosineDist*, *Mahalanobis*), performance can be drastically improved by restructuring the computation and delegating the core part to ``GEMM`` in *BLAS*. This strategy can easily lead to a 100x performance gain over simple loops (see the highlighted part of the table above).