This repository was archived by the owner on Jul 4, 2023. It is now read-only.

Commit 1089a25

Tweak plotting to collect outliers into a single bin; drop UnicodePlots dependency (#13)
* Add `simpleunicodehistogram`
* tweak `simpleunicodehistogram` for outliers
* rename things, add configuration, fix tests, update readme
* use Julia 1.0 compatible syntax
* fix omission of minimum
* bump version

Co-authored-by: C. Brenhin Keller <[email protected]>
1 parent c86970b commit 1089a25

6 files changed, +199 -86 lines changed

Project.toml

Lines changed: 1 addition & 3 deletions
@@ -1,17 +1,15 @@
 name = "BenchmarkHistograms"
 uuid = "a80a1652-aad8-438d-b80b-ecb1a674e33b"
 authors = ["Eric Hanson <[email protected]> and contributors"]
-version = "0.1.1"
+version = "0.2.0"
 
 [deps]
 BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
 Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
-UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228"
 
 [compat]
 BenchmarkTools = "0.7, 1.0"
-UnicodePlots = "1.3"
 julia = "1"
 
 [extras]

README.md

Lines changed: 91 additions & 65 deletions
@@ -3,19 +3,19 @@
 
 # BenchmarkHistograms
 
-Wraps [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl/) to provide a UnicodePlots.jl-powered `show` method for `@benchmark`. This is accomplished by a custom `@benchmark` method which wraps the output in a `BenchmarkPlot` struct with a custom show method.
+Wraps [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl/) to provide a unicode histogram `show` method for `@benchmark`. This is accomplished by a custom `@benchmark` method which wraps the output in a `BenchmarkPlot` struct with a custom show method.
 
 This means one should not call `using` on both BenchmarkHistograms and BenchmarkTools in the same namespace, or else these `@benchmark` macros will conflict ("WARNING: using `BenchmarkTools.@benchmark` in module Main conflicts with an existing identifier.")
 
-However, BenchmarkHistograms re-exports all of BenchmarkTools (including the module `BenchmarkTools` itself), so you can simply call `using BenchmarkHistograms` instead.
+However, BenchmarkHistograms re-exports all the export of BenchmarkTools, so you can simply call `using BenchmarkHistograms`.
 
 Providing this functionality in BenchmarkTools itself was discussed in <https://github.com/JuliaCI/BenchmarkTools.jl/pull/180>.
+Thanks to @brenhinkeller for providing the initial plotting code there.
 
-Use the setting `BenchmarkHistograms.NBINS[]` to change the number of histogram bins used, e.g.
-```julia
-BenchmarkHistograms.NBINS[] = 10
-```
-to use 10 bins.
+Use the setting `BenchmarkHistograms.NBINS` to change the number of histogram bins used, e.g. `BenchmarkHistograms.NBINS[] = 10` for 10 bins.
+
+Likewise use the setting `BenchmarkHistograms.OUTLIER_QUANTILE` to tweak which values count as outliers and may be grouped into a single bin.
+For example, `BenchmarkHistograms.OUTLIER_QUANTILE[] = 0.99` counts any values past the 99 percentile as possible outliers. This value defaults to `0.999` and is disabled by setting it to `1.0`.
 
 ## Example
 
@@ -29,22 +29,27 @@ using BenchmarkHistograms
 
 ```
 samples: 10000; evals/sample: 1000; memory estimate: 0 bytes; allocs estimate: 0
-┌ ┐
-[ 4.0, 6.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 7823
-[ 6.0, 8.0) ┤▇▇▇▇▇▇▇ 1643
-[ 8.0, 10.0) ┤▇▇ 529
-[10.0, 12.0) ┤ 2
-[12.0, 14.0) ┤ 2
-ns [14.0, 16.0) ┤ 0
-[16.0, 18.0) ┤ 0
-[18.0, 20.0) ┤ 0
-[20.0, 22.0) ┤ 0
-[22.0, 24.0) ┤ 0
-[24.0, 26.0) ┤ 0
-[26.0, 28.0) ┤ 1
-└ ┘
-Counts
-min: 4.916 ns (0.00% GC); mean: 5.724 ns (0.00% GC); median: 5.208 ns (0.00% GC); max: 27.458 ns (0.00% GC).
+ns
+
+(8.04 - 8.53 ] ██████████████████████████████▏7673
+(8.53 - 9.02 ] ▌109
+(9.02 - 9.51 ] ▏3
+(9.51 - 10.01] 0
+(10.01 - 10.5 ] 0
+(10.5 - 10.99] █████▋1431
+(10.99 - 11.48] ██▌624
+(11.48 - 11.97] ▍70
+(11.97 - 12.46] ▎38
+(12.46 - 12.95] ▏4
+(12.95 - 13.44] ▏1
+(13.44 - 13.93] ▏2
+(13.93 - 14.42] ▏7
+(14.42 - 14.92] ▏22
+(14.92 - 21.88] ▏16
+
+Counts
+
+min: 8.041 ns (0.00% GC); mean: 8.812 ns (0.00% GC); median: 8.166 ns (0.00% GC); max: 21.875 ns (0.00% GC).
 ```
 
 That benchmark does not have a very interesting distribution, but it's not hard to find more interesting cases.
@@ -54,18 +59,26 @@ That benchmark does not have a very interesting distribution, but it's not hard
 ```
 
 ```
-samples: 3192; evals/sample: 1000; memory estimate: 0 bytes; allocs estimate: 0
-┌ ┐
-[ 0.0, 500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2036
-[ 500.0, 1000.0) ┤ 0
-[1000.0, 1500.0) ┤ 0
-ns [1500.0, 2000.0) ┤ 0
-[2000.0, 2500.0) ┤ 0
-[2500.0, 3000.0) ┤ 0
-[3000.0, 3500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1156
-└ ┘
-Counts
-min: 1.875 ns (0.00% GC); mean: 1.141 μs (0.00% GC); median: 4.521 ns (0.00% GC); max: 3.315 μs (0.00% GC).
+samples: 3110; evals/sample: 1000; memory estimate: 0 bytes; allocs estimate: 0
+ns
+
+(0.0 - 280.0 ] ██████████████████████████████ 1964
+(280.0 - 570.0 ] 0
+(570.0 - 850.0 ] 0
+(850.0 - 1130.0] 0
+(1130.0 - 1410.0] 0
+(1410.0 - 1690.0] 0
+(1690.0 - 1970.0] 0
+(1970.0 - 2250.0] 0
+(2250.0 - 2540.0] 0
+(2540.0 - 2820.0] 0
+(2820.0 - 3100.0] 0
+(3100.0 - 3380.0] █████████████████1105
+(3380.0 - 3660.0] ▊41
+
+Counts
+
+min: 2.500 ns (0.00% GC); mean: 1.181 μs (0.00% GC); median: 5.334 ns (0.00% GC); max: 3.663 μs (0.00% GC).
 ```
 
 Here, we see a bimodal distribution; in the case `5` is indeed in the vector, we find it very quickly, in the 0-1000 ns range (thanks to `sort` which places it at the front). In the case 5 is not present, we need to check every entry to be sure, and we end up in the 3000-4000 ns range.
@@ -77,18 +90,26 @@ Without the `sort`, we end up with more of a uniform distribution:
 ```
 
 ```
-samples: 2461; evals/sample: 999; memory estimate: 0 bytes; allocs estimate: 0
-┌ ┐
-[ 0.0, 500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 364
-[ 500.0, 1000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇ 327
-[1000.0, 1500.0) ┤▇▇▇▇▇▇▇▇▇▇ 266
-ns [1500.0, 2000.0) ┤▇▇▇▇▇▇▇▇ 214
-[2000.0, 2500.0) ┤▇▇▇▇▇▇▇▇ 213
-[2500.0, 3000.0) ┤▇▇▇▇▇ 146
-[3000.0, 3500.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 931
-└ ┘
-Counts
-min: 8.842 ns (0.00% GC); mean: 1.972 μs (0.00% GC); median: 2.154 μs (0.00% GC); max: 3.364 μs (0.00% GC).
+samples: 2393; evals/sample: 1000; memory estimate: 0 bytes; allocs estimate: 0
+ns
+
+(0.0 - 310.0 ] ███████▏214
+(310.0 - 610.0 ] ██████▍191
+(610.0 - 910.0 ] █████▊173
+(910.0 - 1220.0] █████▊174
+(1220.0 - 1520.0] █████▏155
+(1520.0 - 1830.0] ████▍133
+(1830.0 - 2130.0] ████119
+(2130.0 - 2430.0] ███▍100
+(2430.0 - 2740.0] ██▉86
+(2740.0 - 3040.0] ███▍102
+(3040.0 - 3350.0] ██████████████████████████████ 912
+(3350.0 - 3650.0] █30
+(3650.0 - 5870.0] ▎4
+
+Counts
+
+min: 2.334 ns (0.00% GC); mean: 2.037 μs (0.00% GC); median: 2.236 μs (0.00% GC); max: 5.869 μs (0.00% GC).
 ```
 
 This function gives a somewhat more Gaussian distribution of times, kindly supplied by Mason Protter:
@@ -100,28 +121,33 @@ f() = sum((sin(i) for i in 1:round(Int, 1000 + 100*randn())))
 ```
 
 ```
-samples: 10000; evals/sample: 1; memory estimate: 0 bytes; allocs estimate: 0
-┌ ┐
-[ 8000.0, 9000.0) ┤ 12
-[ 9000.0, 10000.0) ┤▇ 117
-[10000.0, 11000.0) ┤▇▇▇▇▇▇▇ 635
-[11000.0, 12000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1810
-[12000.0, 13000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2959
-[13000.0, 14000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2460
-ns [14000.0, 15000.0) ┤▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1451
-[15000.0, 16000.0) ┤▇▇▇▇▇ 456
-[16000.0, 17000.0) ┤▇ 89
-[17000.0, 18000.0) ┤ 9
-[18000.0, 19000.0) ┤ 1
-[19000.0, 20000.0) ┤ 0
-[20000.0, 21000.0) ┤ 1
-└ ┘
-Counts
-min: 8.109 μs (0.00% GC); mean: 12.865 μs (0.00% GC); median: 12.820 μs (0.00% GC); max: 20.459 μs (0.00% GC).
+samples: 10000; evals/sample: 3; memory estimate: 0 bytes; allocs estimate: 0
+ns
+
+(7030.0 - 7480.0 ] ▏11
+(7480.0 - 7930.0 ] █▍128
+(7930.0 - 8380.0 ] ████████▏788
+(8380.0 - 8830.0 ] █████████████████████▏2044
+(8830.0 - 9280.0 ] ██████████████████████████████ 2916
+(9280.0 - 9730.0 ] ███████████████████████▉2309
+(9730.0 - 10180.0] ████████████▎1182
+(10180.0 - 10630.0] ████▎413
+(10630.0 - 11080.0] █▌140
+(11080.0 - 11530.0] ▌44
+(11530.0 - 11980.0] ▏6
+(11980.0 - 12430.0] ▏3
+(12430.0 - 12880.0] 0
+(12880.0 - 13330.0] ▏5
+(13330.0 - 18330.0] ▏11
+
+Counts
+
+min: 7.028 μs (0.00% GC); mean: 9.184 μs (0.00% GC); median: 9.153 μs (0.00% GC); max: 18.333 μs (0.00% GC).
 ```
 
 See also <https://tratt.net/laurie/blog/entries/minimum_times_tend_to_mislead_when_benchmarking.html> for another example of where looking at the whole histogram can be useful in benchmarking.
 
 ---
 
 *This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*
+
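
The two settings described in the README hunk above might be combined as in the following minimal sketch. It assumes BenchmarkHistograms v0.2.0 (the version introduced by this commit) is installed; the benchmarked expression `sum(rand(100))` is an arbitrary placeholder and is not taken from the README.

```julia
using BenchmarkHistograms  # re-exports the BenchmarkTools exports, including `@benchmark`

# Use exactly 10 bins instead of the automatic choice.
BenchmarkHistograms.NBINS[] = 10

# Treat values past the 99th percentile as possible outliers,
# which may then be grouped into the last bin.
BenchmarkHistograms.OUTLIER_QUANTILE[] = 0.99

# Placeholder workload (an assumption for illustration only).
# At the REPL, the result is displayed as a unicode histogram.
@benchmark sum(rand(100))

# Restore the defaults documented in the README and docstrings.
BenchmarkHistograms.NBINS[] = 0
BenchmarkHistograms.OUTLIER_QUANTILE[] = 0.999
```

Both settings are `Ref`s, so they are read and written with `[]` and apply to every histogram shown afterwards in the same session.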

generate_readme/README.jl

Lines changed: 6 additions & 2 deletions
@@ -3,15 +3,19 @@
 
 # # BenchmarkHistograms
 
-# Wraps [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl/) to provide a UnicodePlots.jl-powered `show` method for `@benchmark`. This is accomplished by a custom `@benchmark` method which wraps the output in a `BenchmarkPlot` struct with a custom show method.
+# Wraps [BenchmarkTools.jl](https://github.com/JuliaCI/BenchmarkTools.jl/) to provide a unicode histogram `show` method for `@benchmark`. This is accomplished by a custom `@benchmark` method which wraps the output in a `BenchmarkPlot` struct with a custom show method.
 
 # This means one should not call `using` on both BenchmarkHistograms and BenchmarkTools in the same namespace, or else these `@benchmark` macros will conflict ("WARNING: using `BenchmarkTools.@benchmark` in module Main conflicts with an existing identifier.")
 
 # However, BenchmarkHistograms re-exports all the export of BenchmarkTools, so you can simply call `using BenchmarkHistograms`.
 
 # Providing this functionality in BenchmarkTools itself was discussed in <https://github.com/JuliaCI/BenchmarkTools.jl/pull/180>.
+# Thanks to @brenhinkeller for providing the initial plotting code there.
 
-# Use the setting `BenchmarkHistograms.NBINS[] = 10` to change the number of histogram bins used.
+# Use the setting `BenchmarkHistograms.NBINS` to change the number of histogram bins used, e.g. `BenchmarkHistograms.NBINS[] = 10` for 10 bins.
+
+# Likewise use the setting `BenchmarkHistograms.OUTLIER_QUANTILE` to tweak which values count as outliers and may be grouped into a single bin.
+# For example, `BenchmarkHistograms.OUTLIER_QUANTILE[] = 0.99` counts any values past the 99 percentile as possible outliers. This value defaults to `0.999` and is disabled by setting it to `1.0`.
 
 # ## Example
 

src/BenchmarkHistograms.jl

Lines changed: 14 additions & 3 deletions
@@ -1,6 +1,5 @@
 module BenchmarkHistograms
 
-using UnicodePlots
 using Statistics
 using Printf
 using BenchmarkTools: BenchmarkTools
@@ -20,10 +19,18 @@ export @benchmark
 const NBINS = Ref(0)
 
 Controls the number of histogram bins used.
-When `NBINS[] <= 0`, the number is chosen automatically by UnicodePlots.
+When `NBINS[] <= 0`, the number is chosen automatically by Sturge's rule (i.e. `log2(length(data))+1`).
 """
 const NBINS = Ref(0)
 
+"""
+OUTLIER_QUANTILE = Ref(0.999)
+
+Controls which benchmarking times count as outliers and may be grouped into a single bin.
+Set `OUTLIER_QUANTILE[] = 1.0` to avoid this behavior.
+"""
+const OUTLIER_QUANTILE = Ref(0.999)
+
 struct BenchmarkHistogram
     trial::BenchmarkTools.Trial
 end
@@ -53,7 +60,8 @@ function Base.show(io::IO, ::MIME"text/plain", bp::BenchmarkHistogram; nbins=NBI
     println(io, "samples: ", length(t), "; evals/sample: ", t.params.evals, "; memory estimate: ", memorystr, "; allocs estimate: ", allocsstr)
     if length(t) > 0
         bin_arg = nbins <= 0 ? NamedTuple() : (; nbins=nbins)
-        show(io, histogram(t.times; ylabel="ns", xlabel="Counts", bin_arg...))
+        simple_unicode_histogram(io, t.times; ylabel="ns", xlabel="Counts",
+                                 outlier_quantile=OUTLIER_QUANTILE[], bin_arg...)
         println(io)
     end
     print(io, "min: ", minstr, "; mean: ", meanstr, "; median: ", medstr, "; max: ", maxstr, ".")
@@ -70,4 +78,7 @@ end
 # so that we don't have to rely on internals.
 include("vendor.jl")
 
+# The code to draw the histograms
+include("simple_unicode_histogram.jl")
+
 end
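
The automatic bin count mentioned in the `NBINS` docstring can be checked by hand. The sketch below uses a hypothetical helper name, `sturges_nbins`, which does not appear in the commit; it only evaluates the expression `ceil(Int, log2(length(x))+1)` that `simple_unicode_histogram` uses as its default for `nbins`.

```julia
# Hypothetical helper (not part of the commit): the default number of bins
# for `n` samples, following the formula used in `simple_unicode_histogram`.
sturges_nbins(n::Integer) = ceil(Int, log2(n) + 1)

bins_10000 = sturges_nbins(10_000)  # log2(10000) ≈ 13.29, so 15 bins
bins_3110 = sturges_nbins(3110)     # log2(3110) ≈ 11.60, so 13 bins

println((bins_10000, bins_3110))
```

These values agree with the README output in this commit, where the 10000-sample histograms have 15 rows and the 3110-sample histogram has 13.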

src/simple_unicode_histogram.jl

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+# Modified from https://github.com/JuliaCI/BenchmarkTools.jl/pull/180#issuecomment-711128281 by @brenhinkeller
+
+const BLOCKS = [" ","","","","","","","","",""]
+
+function simple_unicode_histogram(io::IO, x::AbstractArray;
+                                  nbins::Integer=ceil(Int, log2(length(x))+1),
+                                  plot_width::Integer=30, show_counts::Bool=true,
+                                  outlier_quantile = 0.999,
+                                  xlabel="", ylabel="")
+    # Find bounds. Our naive attempt is to use equal width
+    # bins from the minimum to the maximum.
+    l, M = extrema(x)
+    # our lower bounds are exclusive, so we want to be sure to get the min
+    l = prevfloat(l)
+
+    # Now, we check: if we don't have some big outliers, we'd expect
+    # the 99.9 percentile, `Q`, to be within a few bins of the maximum.
+    # Here, we choose 2. If it is not, then we decide that indeed
+    # there are outliers. We will instead divide the range from
+    # the minimum to `Q` equally with `nbins-1` bins, and then reserve
+    # the last bin to hold everything greater than `Q`.
+    Q = quantile(x, outlier_quantile)
+    initial_dx = (M - l) / nbins
+    truncate = M - Q > 2*initial_dx
+
+    # our "upper bound"
+    u = truncate ? Q : M
+
+    # Fill histogram
+    hist_counts = fill(0, nbins)
+    dx = truncate ? (u - l) / (nbins - 1) : initial_dx
+    for xi in x
+        index = ceil(Int, (xi - l) / dx)
+        if 1 <= index <= nbins
+            hist_counts[index] += 1
+        else
+            hist_counts[end] += 1
+        end
+    end
+
+    if truncate
+        bin_edges = [range(l;stop=u,length=nbins); M]
+    else
+        bin_edges = range(l;stop=u,length=nbins+1)
+    end
+
+    # Print the histogram
+    d = ceil(Int, -log10(u-l))+1
+    scale = plot_width/maximum(hist_counts)
+    lower_labels = string.(round.(bin_edges[1:end-1], digits=d+ceil(Int,log10(nbins)-1)))
+    upper_labels = string.(round.(bin_edges[2:end], digits=d+ceil(Int,log10(nbins)-1)))
+    longest_lower = maximum(length.(lower_labels))
+    longest_upper = maximum(length.(upper_labels))
+    !isempty(ylabel) && println(io, ylabel, "\n")
+    for i=1:nbins
+        nblocks = hist_counts[i] * scale
+        block_string = repeat("█", floor(Int, nblocks)) * BLOCKS[ceil(Int,(nblocks - floor(nblocks))*8)+1]
+        print(io, " (", lower_labels[i], " "^(longest_lower - length(lower_labels[i])))
+        print(io, " - ", upper_labels[i], " "^(longest_upper - length(upper_labels[i])), "] ")
+        printstyled(io, block_string; color=:green)
+        if show_counts
+            print(io, hist_counts[i])
+        end
+        println(io)
+    end
+    isempty(xlabel) || println(io, "\n", " "^max(plot_width ÷2 + 6 - length(xlabel)÷2, 0), xlabel)
+    return nothing
+end
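
To see when the last bin is reserved for outliers, the decision step of `simple_unicode_histogram` can be isolated. The sketch below is illustrative only: the helper name `outlier_bin_plan`, its return value, and the sample data are assumptions made here and are not part of the commit. It follows the same arithmetic as the diff (compare `M - Q` against two initial bin widths) and skips all printing.

```julia
using Statistics: quantile

# Hypothetical helper (not part of the commit): reproduces only the
# bin-planning step of `simple_unicode_histogram`.
function outlier_bin_plan(x; nbins::Integer=ceil(Int, log2(length(x)) + 1),
                          outlier_quantile=0.999)
    l, M = extrema(x)
    # Lower bounds are exclusive, so step just below the minimum.
    # `float` is added here so that integer input also works.
    l = prevfloat(float(l))
    Q = quantile(x, outlier_quantile)
    # Bin width if the whole range were divided evenly.
    initial_dx = (M - l) / nbins
    # Outliers are assumed when the maximum lies more than 2 bins past `Q`.
    truncate = M - Q > 2 * initial_dx
    u = truncate ? Q : M
    # With truncation, `nbins - 1` equal bins cover (l, Q]; the last bin takes the rest.
    dx = truncate ? (u - l) / (nbins - 1) : initial_dx
    return (truncate=truncate, upper=u, dx=dx)
end

# Made-up data: 999 samples between 10 and 11, plus one far outlier at 1000.
times = [collect(range(10.0; stop=11.0, length=999)); 1000.0]

plan = outlier_bin_plan(times; nbins=10, outlier_quantile=0.99)
plan_disabled = outlier_bin_plan(times; nbins=10, outlier_quantile=1.0)

println(plan.truncate, " ", plan_disabled.truncate)
```

With `outlier_quantile=1.0` the quantile equals the maximum, so `M - Q` is zero and the truncation never triggers, which is why the README describes `1.0` as disabling the behavior.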
