
Commit c634d89

split out benchmark into own function (#70)
* split out benchmark into own function
* modernize benchmark suite
1 parent bc17903 commit c634d89

6 files changed: +271 -210 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions

@@ -2,3 +2,4 @@
*.jl.*.cov
*.jl.mem
*.ji
+benchmarks/params.jld

README.md

Lines changed: 53 additions & 56 deletions

@@ -9,6 +9,7 @@ A Julia package for evaluating distances (metrics) between vectors.

This package also provides optimized functions to compute column-wise and pairwise distances, which are often substantially faster than a straightforward loop implementation. (See the benchmark section below for details.)

+
## Supported distances

* Euclidean distance

@@ -37,11 +38,11 @@ This package also provides optimized functions to compute column-wise and pairwise

For ``Euclidean distance``, ``Squared Euclidean distance``, ``Cityblock distance``, ``Minkowski distance``, and ``Hamming distance``, a weighted version is also provided.

+
## Basic Use

The library supports three ways of computation: *computing the distance between two vectors*, *column-wise computation*, and *pairwise computation*.

-
#### Computing the distance between two vectors

Each distance corresponds to a *distance type*. You can always compute a certain distance between two vectors using the following syntax
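
For illustration, a minimal sketch of that call pattern (not part of this commit; the ``Euclidean()`` instance and the random vectors are arbitrary):

    using Distances
    x = rand(10)
    y = rand(10)
    # evaluate(dist, x, y) returns the scalar distance between the two vectors
    r = evaluate(Euclidean(), x, y)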

@@ -93,7 +94,6 @@ R = pairwise(dist, X)
This statement will result in an ``m-by-m`` matrix, where ``R[i,j]`` is the distance between ``X[:,i]`` and ``X[:,j]``.
``pairwise(dist, X)`` is typically more efficient than ``pairwise(dist, X, X)``, as the former takes advantage of the symmetry when ``dist`` is a semi-metric (including metrics).

-
#### Computing column-wise and pairwise distances in place

If the vector/matrix to store the results is pre-allocated, you may reuse that storage (without creating a new array) using the following syntax:
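
A minimal sketch of that in-place pattern (not part of this commit; the sizes and the ``SqEuclidean()`` choice are arbitrary):

    using Distances
    X = rand(3, 100); Y = rand(3, 100)
    r = zeros(100)        # pre-allocated vector, one distance per column pair
    colwise!(r, SqEuclidean(), X, Y)
    R = zeros(100, 100)   # pre-allocated matrix for all pairwise distances
    pairwise!(R, SqEuclidean(), X, Y)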

@@ -107,7 +107,6 @@ pairwise!(R, dist, X)
Please pay attention to the difference: the functions for in-place computation are ``colwise!`` and ``pairwise!`` (instead of ``colwise`` and ``pairwise``).


-
## Distance type hierarchy

The distances are organized into a type hierarchy.
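
For orientation, a small REPL sketch of that hierarchy (not part of this diff; the supertype chain ``Metric <: SemiMetric <: PreMetric`` is taken from the package's abstract types):

    using Distances
    Euclidean()   isa Metric        # true
    SqEuclidean() isa SemiMetric    # true -- squared Euclidean is a semi-metric, not a metric
    supertype(Metric)               # SemiMetric
    supertype(SemiMetric)           # PreMetric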

@@ -190,75 +189,73 @@ julia> pairwise(Euclidean(1e-12), x, x)
0.0
```

-## Benchmarks
-

-The implementation has been carefully optimized based on benchmarks. The Julia scripts ``test/bench_colwise.jl`` and ``test/bench_pairwise.jl`` run the benchmarks on a variety of distances, respectively under column-wise and pairwise settings.
+## Benchmarks

-Here are benchmarks obtained running Julia 0.5.1 on a late-2016 MacBook Pro running MacOS 10.12.3 with an quad-core Intel Core i7 processor @ 2.9 GHz.
+The implementation has been carefully optimized based on benchmarks. The script in `benchmarks/benchmark.jl` defines a benchmark suite
+for a variety of distances, under column-wise and pairwise settings.
+
+Here are benchmarks obtained running Julia 0.6 on a computer with a quad-core Intel Core i5-2500K processor @ 3.3 GHz.
+The tables below can be replicated using the script in `benchmarks/print_table.jl`.

#### Column-wise benchmark

The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance between corresponding columns in two ``200-by-10000`` matrices.

| distance | loop | colwise | gain |
|----------- | -------| ----------| -------|
-| SqEuclidean | 0.007267s | 0.002000s | 3.6334 |
-| Euclidean | 0.007471s | 0.002042s | 3.6584 |
-| Cityblock | 0.007239s | 0.001980s | 3.6556 |
-| Chebyshev | 0.011396s | 0.005274s | 2.1606 |
-| Minkowski | 0.022127s | 0.017161s | 1.2894 |
-| Hamming | 0.006777s | 0.001841s | 3.6804 |
-| CosineDist | 0.008709s | 0.003046s | 2.8592 |
-| CorrDist | 0.012766s | 0.014199s | 0.8991 |
-| ChiSqDist | 0.007321s | 0.002042s | 3.5856 |
-| KLDivergence | 0.037239s | 0.033535s | 1.1105 |
-| RenyiDivergence(0) | 0.014607s | 0.009587s | 1.5237 |
-| RenyiDivergence(1) | 0.044142s | 0.040953s | 1.0779 |
-| RenyiDivergence(2) | 0.019056s | 0.012029s | 1.5842 |
-| RenyiDivergence(∞) | 0.014469s | 0.010906s | 1.3267 |
-| JSDivergence | 0.077435s | 0.081599s | 0.9490 |
-| BhattacharyyaDist | 0.009805s | 0.004355s | 2.2514 |
-| HellingerDist | 0.010007s | 0.004030s | 2.4832 |
-| WeightedSqEuclidean | 0.007435s | 0.002051s | 3.6254 |
-| WeightedEuclidean | 0.008217s | 0.002075s | 3.9591 |
-| WeightedCityblock | 0.007486s | 0.002058s | 3.6378 |
-| WeightedMinkowski | 0.024653s | 0.019632s | 1.2557 |
-| WeightedHamming | 0.008467s | 0.002962s | 2.8587 |
-| SqMahalanobis | 0.101976s | 0.031780s | 3.2088 |
-| Mahalanobis | 0.105060s | 0.031806s | 3.3032 |
+| SqEuclidean | 0.007467s | 0.002171s | 3.4393 |
+| Euclidean | 0.007421s | 0.002185s | 3.3961 |
+| Cityblock | 0.007442s | 0.002168s | 3.4328 |
+| Chebyshev | 0.011494s | 0.005846s | 1.9662 |
+| Minkowski | 0.174122s | 0.143938s | 1.2097 |
+| Hamming | 0.007586s | 0.002249s | 3.3739 |
+| CosineDist | 0.008581s | 0.002853s | 3.0076 |
+| CorrDist | 0.014991s | 0.011402s | 1.3148 |
+| ChiSqDist | 0.012990s | 0.006910s | 1.8800 |
+| KLDivergence | 0.051694s | 0.047433s | 1.0898 |
+| RenyiDivergence | 0.021406s | 0.017845s | 1.1996 |
+| RenyiDivergence | 0.031397s | 0.027801s | 1.1294 |
+| JSDivergence | 0.115657s | 0.495861s | 0.2332 |
+| BhattacharyyaDist | 0.019273s | 0.013195s | 1.4606 |
+| HellingerDist | 0.018883s | 0.012921s | 1.4613 |
+| WeightedSqEuclidean | 0.007559s | 0.002256s | 3.3504 |
+| WeightedEuclidean | 0.007624s | 0.002325s | 3.2796 |
+| WeightedCityblock | 0.007803s | 0.002248s | 3.4709 |
+| WeightedMinkowski | 0.154231s | 0.147579s | 1.0451 |
+| WeightedHamming | 0.009042s | 0.003182s | 2.8417 |
+| SqMahalanobis | 0.070869s | 0.019199s | 3.6913 |
+| Mahalanobis | 0.070980s | 0.019305s | 3.6768 |

We can see that using ``colwise`` instead of a simple loop yields considerable gain (2x - 4x), especially when the internal computation of each distance is simple. Nonetheless, when the computation of a single distance is heavy enough (e.g. *KLDivergence*, *RenyiDivergence*), the gain is not as significant.

#### Pairwise benchmark

The table below compares the performance (measured in terms of average elapsed time of each iteration) of a straightforward loop implementation and an optimized implementation provided in *Distances.jl*. The task in each iteration is to compute a specific distance in a pairwise manner between columns of a ``100-by-200`` and a ``100-by-250`` matrix, which results in a ``200-by-250`` distance matrix.

-| distance | loop | pairwise | gain |
+| distance | loop | pairwise | gain |
|----------- | -------| ----------| -------|
-| SqEuclidean | 0.022982s | 0.000145s | **158.9554** |
-| Euclidean | 0.022155s | 0.000843s | **26.2716** |
-| Cityblock | 0.022382s | 0.003899s | 5.7407 |
-| Chebyshev | 0.034491s | 0.014600s | 2.3624 |
-| Minkowski | 0.065968s | 0.046761s | 1.4107 |
-| Hamming | 0.021016s | 0.003139s | 6.6946 |
-| CosineDist | 0.024394s | 0.000828s | **29.4478** |
-| CorrDist | 0.039089s | 0.000852s | **45.8839** |
-| ChiSqDist | 0.022152s | 0.004361s | 5.0793 |
-| KLDivergence | 0.096694s | 0.086728s | 1.1149 |
-| RenyiDivergence(0) | 0.042658s | 0.023323s | 1.8290 |
-| RenyiDivergence(1) | 0.122015s | 0.104527s | 1.1673 |
-| RenyiDivergence(2) | 0.052896s | 0.033865s | 1.5620 |
-| RenyiDivergence(∞) | 0.039993s | 0.027331s | 1.4632 |
-| JSDivergence | 0.211276s | 0.204046s | 1.0354 |
-| BhattacharyyaDist | 0.030378s | 0.011189s | 2.7151 |
-| HellingerDist | 0.029592s | 0.010109s | 2.9273 |
-| WeightedSqEuclidean | 0.025619s | 0.000217s | **117.8128** |
-| WeightedEuclidean | 0.023366s | 0.000264s | **88.3711** |
-| WeightedCityblock | 0.026213s | 0.004610s | 5.6855 |
-| WeightedMinkowski | 0.068588s | 0.050033s | 1.3708 |
-| WeightedHamming | 0.025936s | 0.007225s | 3.5895 |
-| SqMahalanobis | 0.520046s | 0.000939s | **553.6694** |
-| Mahalanobis | 0.480556s | 0.000954s | **503.6009** |
+| SqEuclidean | 0.019217s | 0.000196s | **97.9576** |
+| Euclidean | 0.019287s | 0.000505s | **38.1874** |
+| Cityblock | 0.019376s | 0.002532s | 7.6521 |
+| Chebyshev | 0.032814s | 0.014811s | 2.2155 |
+| Minkowski | 0.382199s | 0.361059s | 1.0586 |
+| Hamming | 0.019826s | 0.003047s | 6.5072 |
+| CosineDist | 0.024012s | 0.000367s | **65.3661** |
+| CorrDist | 0.041356s | 0.000421s | **98.3049** |
+| ChiSqDist | 0.035105s | 0.017882s | 1.9632 |
+| KLDivergence | 0.131773s | 0.117640s | 1.1201 |
+| RenyiDivergence | 0.057569s | 0.042694s | 1.3484 |
+| RenyiDivergence | 0.082862s | 0.067811s | 1.2220 |
+| JSDivergence | 0.292014s | 0.276898s | 1.0546 |
+| BhattacharyyaDist | 0.051302s | 0.033043s | 1.5526 |
+| HellingerDist | 0.049518s | 0.031856s | 1.5545 |
+| WeightedSqEuclidean | 0.019959s | 0.000218s | **91.7298** |
+| WeightedEuclidean | 0.020336s | 0.000557s | **36.5405** |
+| WeightedCityblock | 0.020391s | 0.003118s | 6.5404 |
+| WeightedMinkowski | 0.387738s | 0.366898s | 1.0568 |
+| WeightedHamming | 0.024456s | 0.007403s | 3.3033 |
+| SqMahalanobis | 0.113107s | 0.000366s | **309.3621** |
+| Mahalanobis | 0.114646s | 0.000686s | **167.0595** |

For distances where a major part of the computation is a quadratic form (e.g. *Euclidean*, *CosineDist*, *Mahalanobis*), the performance can be drastically improved by restructuring the computation and delegating the core part to ``GEMM`` in *BLAS*. This strategy can easily lead to a 100x performance gain over simple loops (see the highlighted entries in the table above).
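
To make the restructuring concrete, here is an illustrative sketch (not part of this commit and not the implementation in *Distances.jl*) based on the identity d(x, y)^2 = ||x||^2 + ||y||^2 - 2*x'y, which lets a single ``GEMM`` call produce all cross terms at once:

    # Pairwise squared Euclidean distances via one matrix product (BLAS GEMM).
    function pairwise_sqeuclidean_gemm(X, Y)
        sqnormX = [sum(abs2, view(X, :, i)) for i in 1:size(X, 2)]  # squared column norms of X
        sqnormY = [sum(abs2, view(Y, :, j)) for j in 1:size(Y, 2)]  # squared column norms of Y
        G = X' * Y                             # all cross terms x'y in one GEMM call
        R = sqnormX .+ sqnormY' .- 2 .* G      # broadcast the norms over the Gram matrix
        return max.(R, 0)                      # clamp tiny negative values caused by rounding
    end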

benchmarks/benchmark.jl

Lines changed: 147 additions & 0 deletions

@@ -0,0 +1,147 @@
using BenchmarkTools
using Distances

const SUITE = BenchmarkGroup()

function create_distances(w, Q)
    dists = [
        SqEuclidean(),
        Euclidean(),
        Cityblock(),
        Chebyshev(),
        Minkowski(3.0),
        Hamming(),

        CosineDist(),
        CorrDist(),
        ChiSqDist(),

        BhattacharyyaDist(),
        HellingerDist(),

        WeightedSqEuclidean(w),
        WeightedEuclidean(w),
        WeightedCityblock(w),
        WeightedMinkowski(w, 3.0),
        WeightedHamming(w),

        SqMahalanobis(Q),
        Mahalanobis(Q),
    ]

    divs = [
        KLDivergence(),
        RenyiDivergence(0),
        RenyiDivergence(1),
        RenyiDivergence(2),
        RenyiDivergence(Inf),
        JSDivergence(),
    ]
    return dists, divs
end

###########
# Colwise #
###########

SUITE["colwise"] = BenchmarkGroup()

function evaluate_colwise(dist, x, y)
    n = size(x, 2)
    T = typeof(evaluate(dist, x[:, 1], y[:, 1]))
    r = Vector{T}(n)
    for j = 1:n
        r[j] = evaluate(dist, x[:, j], y[:, j])
    end
    return r
end

function add_colwise_benchmarks!(SUITE)

    m = 200
    n = 10000

    x = rand(m, n)
    y = rand(m, n)

    p = x
    q = y
    for i = 1:n
        p[:, i] /= sum(x[:, i])
        q[:, i] /= sum(y[:, i])
    end

    w = rand(m)

    Q = rand(m, m)
    Q = Q' * Q

    _dists, divs = create_distances(w, Q)

    for (dists, (a, b)) in [(_dists, (x,y)), (divs, (p,q))]
        for dist in (dists)
            Tdist = typeof(dist)
            SUITE["colwise"][Tdist] = BenchmarkGroup()
            SUITE["colwise"][Tdist]["loop"] = @benchmarkable evaluate_colwise($dist, $a, $b)
            SUITE["colwise"][Tdist]["specialized"] = @benchmarkable colwise($dist, $a, $b)
        end
    end
end

add_colwise_benchmarks!(SUITE)


############
# Pairwise #
############

SUITE["pairwise"] = BenchmarkGroup()

function evaluate_pairwise(dist, x, y)
    nx = size(x, 2)
    ny = size(y, 2)
    T = typeof(evaluate(dist, x[:, 1], y[:, 1]))
    r = Matrix{T}(nx, ny)
    for j = 1:ny
        for i = 1:nx
            r[i, j] = evaluate(dist, x[:, i], y[:, j])
        end
    end
    return r
end

function add_pairwise_benchmarks!(SUITE)
    m = 100
    nx = 200
    ny = 250

    x = rand(m, nx)
    y = rand(m, ny)

    p = x
    for i = 1:nx
        p[:, i] /= sum(x[:, i])
    end

    q = y
    for i = 1:ny
        q[:, i] /= sum(y[:, i])
    end

    w = rand(m)
    Q = rand(m, m)
    Q = Q' * Q

    _dists, divs = create_distances(w, Q)

    for (dists, (a, b)) in [(_dists, (x,y)), (divs, (p,q))]
        for dist in (dists)
            Tdist = typeof(dist)
            SUITE["pairwise"][Tdist] = BenchmarkGroup()
            SUITE["pairwise"][Tdist]["loop"] = @benchmarkable evaluate_pairwise($dist, $a, $b)
            SUITE["pairwise"][Tdist]["specialized"] = @benchmarkable pairwise($dist, $a, $b)
        end
    end
end

add_pairwise_benchmarks!(SUITE)
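
The suite is meant to be driven with BenchmarkTools; a minimal interactive sketch (not part of this commit; the exact workflow is an assumption):

    using BenchmarkTools
    include("benchmarks/benchmark.jl")   # defines and populates SUITE
    tune!(SUITE)                         # choose evals/samples for each benchmark
    results = run(SUITE, verbose = true)
    # Results are keyed by group and distance type, e.g.:
    minimum(results["colwise"][SqEuclidean])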

benchmarks/print_table.jl

Lines changed: 70 additions & 0 deletions

@@ -0,0 +1,70 @@
using BenchmarkTools
using Distances

include("benchmark.jl")

# BenchmarkTools stores things in a Dict so it loses ordering but we want to print the table
# in a special order. Therefore define an order here:

order = [
    :SqEuclidean,
    :Euclidean,
    :Cityblock,
    :Chebyshev,
    :Minkowski,
    :Hamming,
    :CosineDist,
    :CorrDist,
    :ChiSqDist,
    :KLDivergence,
    :RenyiDivergence,
    :RenyiDivergence,
    :RenyiDivergence,
    :RenyiDivergence,
    :JSDivergence,
    :BhattacharyyaDist,
    :HellingerDist,
    :WeightedSqEuclidean,
    :WeightedEuclidean,
    :WeightedCityblock,
    :WeightedMinkowski,
    :WeightedHamming,
    :SqMahalanobis,
    :Mahalanobis,
]

BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0 # Long enough

# Tuning
if !isfile(@__DIR__, "params.jld")
    tuning = tune!(SUITE; verbose = true);
    BenchmarkTools.save("params.jld", "SUITE", params(SUITE))
end
loadparams!(SUITE, BenchmarkTools.load("params.jld", "SUITE"), :evals, :samples);

# Run and judge
results = run(SUITE; verbose = true)
judgement = minimum(results)

# Output the comparison table
getname(T::DataType) = T.name.name

function print_table(judgement)
    for typ in ("colwise", "pairwise")
        io = IOBuffer()
        println(io, "| distance | loop | $typ | gain |")
        println(io, "|----------- | -------| ----------| -------|")
        sorted_distances = sort(collect(judgement[typ]), by = y -> findfirst(x -> x == getname(y[1]), order))

        for (dist, result) in sorted_distances
            t_loop = BenchmarkTools.time(result["loop"])
            t_spec = BenchmarkTools.time(result["specialized"])
            print(io, "| ", getname(dist), " |")
            print(io, @sprintf("%9.6fs | %9.6fs | %7.4f |\n", t_loop / 1e9, t_spec / 1e9, (t_loop / t_spec)))
        end
        print(STDOUT, String(take!(io)))
        println()
    end
end

print_table(judgement)
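
To regenerate the README tables, the script can be included from the ``benchmarks/`` directory (an illustrative sketch, not part of this commit; it assumes the working directory is ``benchmarks/`` so that the relative ``params.jld`` path resolves):

    cd("benchmarks")
    include("print_table.jl")   # tunes (or loads cached tuning), runs the suite, prints both tables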
