Commit 1192bdf

do benchmarking only once
1 parent 4b8b840 commit 1192bdf

2 files changed: +104 -31 lines changed

docs/src/lecture_01/lab.md

Lines changed: 6 additions & 3 deletions
@@ -600,9 +600,12 @@ nothing #hide
```

There are other options to import a function/macro from a different package; however, for now let's keep it simple with the `using Module` syntax, which brings into the REPL all the variables/functions/macros exported by the `BenchmarkTools` pkg. If `@btime` is exported, which it is, it can be accessed without qualification, i.e. just by calling `@btime` instead of `BenchmarkTools.@btime`. More on the architecture of pkg/module loading in the package development lecture.
-```@repl lab01_base
-using BenchmarkTools
-@btime polynomial(aexp, x)
+```julia
+julia> using BenchmarkTools
+
+julia> @btime polynomial(aexp, x)
+  97.119 ns (1 allocation: 16 bytes)
+3.004165230550543
```
The output gives us the execution time averaged over multiple runs (the number of samples is determined automatically based on run time), as well as the number of allocations and the return value of the function being benchmarked.

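As an aside, a minimal sketch of the alternative import forms mentioned above (the `sum(rand(100))` call is just a stand-in expression to benchmark):

```julia
import BenchmarkTools                 # brings only the module name into scope
BenchmarkTools.@btime sum(rand(100))  # the macro must then be qualified

using BenchmarkTools: @btime          # brings in just the one macro
@btime sum(rand(100))                 # usable without qualification
```
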
docs/src/lecture_02/lecture.md

Lines changed: 98 additions & 28 deletions
@@ -95,12 +95,15 @@ one observes the second version produces more optimal code. Why is that?
This difference will indeed have an impact on the time of code execution.
On my i5-8279U CPU, the difference (as measured by BenchmarkTools) is

-````@example lecture
+```julia
using BenchmarkTools
-#@btime energy(a);
-#@btime energy(b);
-nothing #hide
-````
+@btime energy(a)
+@btime energy(b)
+```
+```
+  159.669 ns (0 allocations: 0 bytes)
+  44.571 ns (0 allocations: 0 bytes)
+```

Which nicely demonstrates that the choice of types affects performance. Does it mean that we should always use `Tuples` instead of `Arrays`? Surely not; it is just that each is better for different use-cases. Using tuples means that the compiler will compile a special function for each length of tuple and each combination of item types it contains, which is clearly wasteful.

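A quick sketch of that per-length specialization pressure (illustrative literals):

```julia
typeof((1.0, 2.0))       # Tuple{Float64, Float64}
typeof((1.0, 2.0, 3.0))  # Tuple{Float64, Float64, Float64} -- a new concrete type
typeof([1.0, 2.0])       # Vector{Float64}
typeof([1.0, 2.0, 3.0])  # Vector{Float64} -- same type for any length
```
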
@@ -117,11 +120,14 @@ nothing # hide

`wolfpack_a` carries the type `Vector{Wolf}` while `wolfpack_b` has the type `Vector{Any}`. This means that in the first case, the compiler knows that all items are of the type `Wolf` and it can specialize functions using this information. In the case of `wolfpack_b`, it does not know which animal it will encounter (although all are of the same type), and therefore it needs to dynamically resolve the type of each item upon its use. This ultimately leads to less performant code.

-````@example lecture
-#@btime energy(wolfpack_a)
-#@btime energy(wolfpack_b)
-nothing # hide
-````
+```julia
+@btime energy(wolfpack_a)
+@btime energy(wolfpack_b)
+```
+```
+  40.279 ns (0 allocations: 0 bytes)
+  159.407 ns (0 allocations: 0 bytes)
+```

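The element type of the container is exactly what the compiler gets to see; a small sketch (with a hypothetical minimal `Wolf` standing in for the lecture's definition):

```julia
struct Wolf          # stand-in for the lecture's Wolf type
    name::String
    energy::Int
end

pack_a = [Wolf("1", 1), Wolf("2", 2)]     # Vector{Wolf}
pack_b = Any[Wolf("1", 1), Wolf("2", 2)]  # Vector{Any}

isconcretetype(eltype(pack_a))  # true  -- functions over it can be specialized
isconcretetype(eltype(pack_b))  # false -- element types resolved at run time
```
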
To conclude, Julia is indeed a dynamically typed language, **but** if the compiler can infer
all types in a called function in advance, it does not have to perform the type resolution
@@ -258,25 +264,49 @@ end

This works as the definition above, except that the arguments are not converted to `Float64` now. One can store different values in `x` and `y`, for example `String` (e.g. VaguePosition("Hello","world")). Although the above definition might be convenient, it limits the compiler's ability to specialize, as the type `VaguePosition` does not carry information about the types of `x` and `y`, which has a negative impact on performance. For example

-````@example lecture
+```julia
using BenchmarkTools
move(a,b) = typeof(a)(a.x+b.x, a.y+b.y)
x = [PositionF64(rand(), rand()) for _ in 1:100]
y = [VaguePosition(rand(), rand()) for _ in 1:100]
@benchmark reduce(move, x)
@benchmark reduce(move, y)
-````
+```
+```
+BenchmarkTools.Trial: 10000 samples with 9 evaluations.
+ Range (min … max):  2.245 μs … 428.736 μs  ┊ GC (min … max): 0.00% … 99.35%
+ Time  (median):     2.306 μs               ┊ GC (median):    0.00%
+ Time  (mean ± σ):   2.538 μs ±   8.488 μs  ┊ GC (mean ± σ):  6.68% ±  1.99%
+
+  ▁▂▄▂▆█▇▇▃▂▁▁▁▂▂▁▂▂▂▁▂▂▃▂▂▂▂▂▂▂▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
+  2.25 μs         Histogram: frequency by time        2.75 μs <
+
+ Memory estimate: 3.12 KiB, allocs estimate: 199.
+```

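The type instability can be inspected directly; a sketch using `@code_warntype` (assuming the `VaguePosition` and `move` definitions above):

```julia
using InteractiveUtils  # provides @code_warntype (preloaded in the REPL)

@code_warntype move(VaguePosition(1.0, 2.0), VaguePosition(3.0, 4.0))
# Body::Any -- a.x and b.x infer as Any, so `+` is dispatched at run time
```
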

Giving the fields of a composite type an abstract type does not really solve the problem of the compiler not knowing the type. In this example, it still does not know if it should use instructions for `Float64` or `Int8`.

-````@example lecture
+```julia
struct LessVaguePosition
    x::Real
    y::Real
end
z = [LessVaguePosition(rand(), rand()) for _ in 1:100];
@benchmark reduce(move, z)
-````
+```
+```
+BenchmarkTools.Trial: 10000 samples with 1 evaluation.
+ Range (min … max):  16.542 μs …  5.043 ms  ┊ GC (min … max): 0.00% … 99.57%
+ Time  (median):     16.959 μs              ┊ GC (median):    0.00%
+ Time  (mean ± σ):   17.903 μs ± 50.271 μs  ┊ GC (mean ± σ):  2.80% ±  1.00%
+
+  ▆▇███▇▅▃▂▁▂▃▄▂▄▄▄▄▃▃▂▁▁ ▁▁ ▁▁▁ ▁                            ▂
+  ███████████████████████████▇▇▄▇▇▇▇▆▄▆▇▇▇▆▇▇▇▇█████████▇▇▇▆▅ █
+  16.5 μs    Histogram: log(frequency) by time       21.3 μs <
+
+ Memory estimate: 9.31 KiB, allocs estimate: 496.
+```

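A sketch of why an abstract field type cannot help: no concrete memory layout can be chosen, so values are stored behind pointers (boxed).

```julia
isconcretetype(Real)           # false -- many possible layouts
fieldtypes(LessVaguePosition)  # (Real, Real) -- stored values must be boxed
isbitstype(LessVaguePosition)  # false -- the struct holds pointers, not raw bits
```
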
From the perspective of generating optimal code, both definitions are equally uninformative to the compiler, as it cannot assume anything about the fields. However, `LessVaguePosition` will ensure that the position contains only numbers, hence catching trivial errors like instantiating `VaguePosition` with non-numeric types for which arithmetic operators are not defined (recall the discussion at the beginning of the lecture).

@@ -309,23 +339,27 @@ Note, that the memory layout of mutable structures is different, as fields now c
### Parametric types
So far, we had to trade off flexibility for generality in type definitions. Can we have both? The answer is affirmative. The way to achieve this **flexibility** in the definition of the type while still being able to generate optimal code is to **parametrize** the type definition. This is achieved by replacing types with a parameter (typically a single uppercase character) and specifying the parameter in curly brackets after the type name. For example

-````@example lecture
+```julia
struct PositionT{T}
    x::T
    y::T
end
u = [PositionT(rand(), rand()) for _ in 1:100]
-#@btime reduce(move, u)
-nothing #hide
-````
+@btime reduce(move, u)
+```
+```
+  116.285 ns (1 allocation: 32 bytes)
+```

Notice that the compiler can take advantage of specializing for different types (which has no effect here, since on modern processors addition of `Float` and `Int` takes the same time).

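What "specializing" means can be seen in the typed IR; a sketch (assuming the `move` and `PositionT` definitions above):

```julia
using InteractiveUtils

uf = PositionT(1.0, 2.0)
ui = PositionT(1, 2)
@code_typed move(uf, uf)  # lowers to Base.add_float -- a Float64 specialization
@code_typed move(ui, ui)  # lowers to Base.add_int   -- an Int64 specialization
```
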
-````@example lecture
+```julia
v = [PositionT(rand(1:100), rand(1:100)) for _ in 1:100]
-#@btime reduce(move, v)
-nothing #hide
-````
+@btime reduce(move, v)
+```
+```
+  116.892 ns (1 allocation: 32 bytes)
+```

The above definition suffers the same problem as `VaguePosition`: it allows us to instantiate `PositionT` with non-numeric types, e.g. `String`. We solve this by restricting the type parameter `T` to be a subtype of some supertype, in this case `Real`

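A sketch of that restricted definition (named `PositionR` here to avoid clashing with the lecture's own version):

```julia
struct PositionR{T<:Real}
    x::T
    y::T
end

PositionR(1.0, 2.0)  # PositionR{Float64}
PositionR(1, 2)      # PositionR{Int64}
PositionR("a", "b")  # MethodError -- String is not a subtype of Real
```
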
@@ -457,17 +491,41 @@ If the compiler cannot narrow the types of arguments to concrete types, it has t
100 ns for a method with two arguments).
Recall the above example

-````@example lecture
+```julia
wolfpack_a = [Wolf("1", 1), Wolf("2", 2), Wolf("3", 3)]
@benchmark energy(wolfpack_a)
-````
+```
+```
+BenchmarkTools.Trial: 10000 samples with 991 evaluations.
+ Range (min … max):  40.195 ns … 66.641 ns  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     40.742 ns              ┊ GC (median):    0.00%
+ Time  (mean ± σ):   40.824 ns ±  1.025 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+   ▂▃ ▃▅▆▅▆█▅▅▃▂▂                                            ▂
+  ▇██████████████▇▇▅▅▁▅▄▁▅▁▄▄▃▄▅▄▅▃▅▃▅▁▃▁▄▄▃▁▁▅▃▃▄▃▄▃▄▆▆▇▇▇▇█ █
+  40.2 ns      Histogram: log(frequency) by time      43.7 ns <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+```

and

-````@example lecture
+```julia
wolfpack_b = Any[Wolf("1", 1), Wolf("2", 2), Wolf("3", 3)]
@benchmark energy(wolfpack_b)
-````
+```
+```
+BenchmarkTools.Trial: 10000 samples with 800 evaluations.
+ Range (min … max):  156.406 ns … 212.344 ns  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     157.136 ns               ┊ GC (median):    0.00%
+ Time  (mean ± σ):   158.114 ns ±   4.023 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+  ▅█▆▅▄▂ ▃▂▁                                                   ▂
+  ██████▆▇██████▇▆▇█▇▆▆▅▅▅▅▅▃▄▄▅▄▄▄▄▅▁▃▄▄▃▃▄▃▃▃▄▄▄▅▅▅▅▁▅▄▃▅▄▄▅▅ █
+  156 ns        Histogram: log(frequency) by time        183 ns <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+```

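The cost difference above comes from run-time dispatch; a sketch of how to see it (assuming the lecture's `energy` that sums the animals' `energy` fields):

```julia
using InteractiveUtils

@code_warntype energy(wolfpack_b)
# field accesses infer as ::Any, so each `+` goes through dynamic dispatch
```
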
An interesting intermediate between a fully abstract and a fully concrete type occurs when the compiler knows that the arguments have an abstract type composed of a small number of concrete types. This case is called Union-Splitting, and it happens when there is just a little bit of uncertainty. Julia will do something like
```julia
@@ -481,11 +539,23 @@ end
```
For example

-````@example lecture
+```julia
const WolfOrSheep = Union{Wolf, Sheep}
wolfpack_c = WolfOrSheep[Wolf("1", 1), Wolf("2", 2), Wolf("3", 3)]
@benchmark energy(wolfpack_c)
-````
+```
+```
+BenchmarkTools.Trial: 10000 samples with 991 evaluations.
+ Range (min … max):  43.600 ns … 73.494 ns  ┊ GC (min … max): 0.00% … 0.00%
+ Time  (median):     44.106 ns              ┊ GC (median):    0.00%
+ Time  (mean ± σ):   44.279 ns ±  0.931 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
+
+      █ ▁  ▃                                                  ▃
+  ▂▂▂▆▃██▅▃▄▄█▅█▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂ ▃
+  43.6 ns       Histogram: frequency by time       47.4 ns <
+
+ Memory estimate: 0 bytes, allocs estimate: 0.
+```

Thanks to union splitting, Julia is able to have performant operations on arrays with undefined / missing values, for example

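A sketch of that case with `missing` (plain `Base` functionality):

```julia
v = [rand() < 0.1 ? missing : rand() for _ in 1:100]
eltype(v)           # Union{Missing, Float64} -- a small union, so it splits
sum(skipmissing(v)) # remains fast despite the non-concrete element type
```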