`docs/src/lecture_05/lab.md` (83 additions, 45 deletions)
````diff
@@ -33,16 +33,15 @@ polynomial(a, x)
 
 - Float number valued arguments
 ```@example lab05_polynomial
-af = Float64.(a)
 xf = 3.0
-polynomial(af, xf)
+polynomial(a, xf)
 ```
 
 The result they produce is numerically the "same"; however, it differs in the output type. Though you have probably not noticed it, there should also be a difference in runtime (assuming that you have run it once more after its compilation). It is probably a surprise to no one that one of the compiled methods is type unstable. This can be checked with the `@code_warntype` macro:
 
 ```@repl lab05_polynomial
 using InteractiveUtils #hide
 @code_warntype polynomial(a, x) # type stable
-@code_warntype polynomial(af, xf) # type unstable
+@code_warntype polynomial(a, xf) # type unstable
 ```
 
 We are getting a little ahead of ourselves in this lab, as understanding these expressions is part of a future [lecture](@ref introspection) and [lab](@ref introspection_lab). Anyway, the output basically shows what the compiler thinks of each variable in the code, albeit in a less readable form than the original code. The redder the type info, the less certain the inferred type is. Our main focus should be the return type of the function, which is shown at the start of the output next to the keyword `Body`. In the first case the return type is `Int64`, whereas in the second example the compiler is unsure whether the type is `Float64` or `Int64`, marked as the `Union` of the two. Fortunately for us, this type instability can be fixed with a single-line edit, but we will see later that this is not always the case.
 
````
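The single-line fix alluded to above is the `polynomial_stable` variant used later in this diff; its definition lies outside the captured hunks. A minimal sketch of the idea, assuming the culprit is the `Int`-typed accumulator of the original `polynomial` (the loop shown here is a reconstruction, not necessarily the lab's exact code):

```julia
function polynomial_stable(a, x)
    # seed the accumulator with a value whose type already combines eltype(a) and typeof(x),
    # so the loop never has to widen it from Int to Float64 at runtime
    accumulator = zero(eltype(a)) * zero(x)
    for i in length(a):-1:1
        accumulator += x^(i - 1) * a[i]
    end
    return accumulator
end
```

With such an initialization, `@code_warntype polynomial_stable(a, 3.0)` should report a concrete `Float64` return type instead of `Union{Float64, Int64}`.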
````diff
@@ -79,21 +78,21 @@ end
 
 ```@repl lab05_polynomial
 @code_warntype polynomial_stable(a, x) # type stable
-@code_warntype polynomial_stable(af, xf) # type stable
+@code_warntype polynomial_stable(a, xf) # type stable
 ```
 
 ```@repl lab05_polynomial
-polynomial(af, xf) #hide
-polynomial_stable(af, xf) #hide
-@time polynomial(af, xf)
-@time polynomial_stable(af, xf)
+polynomial(a, xf) #hide
+polynomial_stable(a, xf) #hide
+@time polynomial(a, xf)
+@time polynomial_stable(a, xf)
 ```
 
 The difference is only really visible when evaluating multiple times.
 ```@repl lab05_polynomial
 using BenchmarkTools
-@btime polynomial($af, $xf)
-@btime polynomial_stable($af, $xf)
+@btime polynomial($a, $xf)
+@btime polynomial_stable($a, $xf)
 ```
 The difference is only a few nanoseconds.
 
````
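As a side note on the `$` signs above: BenchmarkTools interpolates the marked variables into the benchmarked expression, so the measurement is not skewed by the cost of looking up and dynamically dispatching on non-constant globals. A self-contained illustration using `sum` (the variable name `data` exists only in this example):

```julia
using BenchmarkTools

data = rand(1000)

@btime sum(data)   # `data` is a non-constant global: extra dispatch overhead is measured too
@btime sum($data)  # interpolated: measures just the call itself
```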
````diff
@@ -139,24 +138,6 @@ using BenchmarkTools #hide
 @btime sum(A) # global variable A has to be inferred in each evaluation
 ```
 
-### Setting up benchmarks to our liking
-In order to control the number of samples/evaluation and the amount of time given to a given benchmark, we can simply append these as keyword arguments to `@btime` or `@benchmark` in the following way
-which runs the code repeatedly for up to `5s`, where each of the `10` samples in the trial is composed of `10` evaluations. Setting up these parameters ourselves creates a more controlled environment in which performance regressions can be more easily identified.
-
-Another axis of customization is needed when we are benchmarking mutable operations such as `sort!`, which sorts an array in-place. One way of achieving a consistent benchmark is by omitting the interpolation such as
-```@repl lab05_bench
-@benchmark sort!(rand(1000))
-```
-however now we are again measuring the data generation as well. A better way of doing such timing is using the built in `setup` keyword, into which you can put a code that has to be run before each sample and which won't be measured.
-```@repl lab05_bench
-@benchmark sort!(y) setup=(y=rand(1000))
-A = rand(1000) #hide
-@benchmark sort!(AA) setup=(AA=copy($A))
-```
-
 ## Profiling
 Profiling in Julia is part of the standard library, in the `Profile` module. It implements a fairly simple sampling-based profiler, which in a nutshell asks at regular intervals where the code execution currently is. As a result we get an array of stacktraces (chains of function calls), which allows us to make sense of where the execution spent the most time. The number of samples that can be stored and the sampling period in seconds can be checked after loading `Profile` into the session with the `init()` function.
 
````
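To make the closing remark about `init()` concrete, here is a small sketch of querying and adjusting the profiler settings; the numbers are illustrative, not recommended values:

```julia
using Profile

Profile.init()                        # query the current (number of samples, delay in seconds)
Profile.init(n = 10^7, delay = 0.01)  # enlarge the sample buffer and sample every 10 ms
```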
````diff
@@ -175,7 +156,7 @@ Let's look at our favorite `polynomial` function or rather it's type stable variant
 
 ```@repl lab05_polynomial
 Profile.clear() # clear the last trace (does not have to be run on fresh start)
-@profile polynomial_stable(af, xf)
+@profile polynomial_stable(a, xf)
 Profile.print() # text based output of the profiler
 ```
 Unless the machine you run the code on is really slow, the resulting output contains nothing, or only some internals of Julia's interactive REPL. This is due to the fact that our `polynomial` function takes only a few nanoseconds to run. When we want to profile something that takes only a few nanoseconds, we have to execute it repeatedly.
````
````diff
@@ -187,11 +168,11 @@ function run_polynomial_stable(a, x, n)
     end
 end
 
-af = Float64.(rand(-10:10, 10)) # using longer polynomial
+a = rand(-10:10, 10) # using longer polynomial
 
-run_polynomial_stable(af, xf, 10) #hide
+run_polynomial_stable(a, xf, 10) #hide
 Profile.clear()
-@profile run_polynomial_stable(af, xf, Int(1e5))
+@profile run_polynomial_stable(a, xf, Int(1e5))
 Profile.print()
 ```
 
````
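Only the closing `end`s of `run_polynomial_stable` fall inside the hunk above. Judging from the surrounding text, its body simply evaluates the polynomial `n` times so that the profiler has something to sample; a plausible reconstruction (the lab's exact wording may differ):

```julia
function run_polynomial_stable(a, x, n)
    for _ in 1:n
        polynomial_stable(a, x)
    end
end
```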
````diff
@@ -228,8 +209,8 @@ end
 ```
 
 ```@example lab05_polynomial
-run_polynomial(af, xf, 10) #hide
-@profview run_polynomial(af, xf, Int(1e5)) # clears the profile for us
+run_polynomial(a, xf, 10) #hide
+@profview run_polynomial(a, xf, Int(1e5)) # clears the profile for us
@@ … @@
-- [ProfileView](https://github.com/timholy/ProfileView.jl) - close cousin of `ProfileSVG`, spawns gtk window with interactive FlameGraph
+- [ProfileView](https://github.com/timholy/ProfileView.jl) - close cousin of `ProfileSVG`, spawns a GTK window with an interactive FlameGraph
 - [VSCode](https://www.julia-vscode.org/docs/stable/release-notes/v0_17/#Profile-viewing-support-1) - always imported `@profview` macro, flamegraphs (js extension required), filtering, one-click access to source code
 - [PProf](https://github.com/vchuravy/PProf.jl) - serializes the profiler output to protobuf and loads it in the `pprof` web app, graph visualization of stacktraces
 
````
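Since `ProfileSVG` is the backend this lab relies on (its `save` call appears in a later hunk), a tiny usage sketch may help; the workload and the file name are placeholders:

```julia
using Profile, ProfileSVG

Profile.clear()
@profile [sum(rand(1_000)) for _ in 1:100_000]  # any workload that runs long enough to be sampled
ProfileSVG.save("my_profile.svg")               # write the current trace as a flamegraph SVG
```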
````diff
@@ -282,18 +263,18 @@ Speed up:
 - 420ns -> 12ns ~ 15x on real valued input
 
 ```@repl lab05_polynomial
-@btime polynomial($(Int.(af)), $(Int(xf)))
-@btime polynomial_stable($(Int.(af)), $(Int(xf)))
-@btime polynomial($af, $xf)
-@btime polynomial_stable($af, $xf)
+@btime polynomial($a, $x)
+@btime polynomial_stable($a, $x)
+@btime polynomial($a, $xf)
+@btime polynomial_stable($a, $xf)
 ```
 These numbers will differ on different hardware.
 
 **BONUS**: The profile trace does not even contain the calls to the mathematical operators and is mainly dominated by the iteration utilities. In this case we had to increase the number of runs to `1e6` to get a meaningful trace.
 
 ```@example lab05_polynomial
-run_polynomial(af, xf, 10) #hide
-@profview run_polynomial(af, xf, Int(1e6))
+run_polynomial(a, xf, 10) #hide
+@profview run_polynomial(a, xf, Int(1e6))
 ProfileSVG.save("./scalar_prof_horner.svg") #hide
 ```
 
````
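The `_horner` suffix of the saved SVG and the quoted 420 ns to 12 ns speed-up appear to refer to the exercise's Horner-scheme rewrite, whose solution is not part of the captured hunks. For reference, a generic sketch of Horner's method (the lab's own solution may differ in details):

```julia
function polynomial_horner(a, x)
    accumulator = a[end] * one(x)             # start from the highest-order coefficient
    for i in length(a)-1:-1:1
        accumulator = accumulator * x + a[i]  # one multiplication and one addition per coefficient
    end
    return accumulator
end
```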
````diff
@@ -339,7 +320,7 @@ Precompile everything by running one step of our simulation and run the profiler
 
 ```@example lab05_ecosystem
 simulate!(world, 1)
-@profview simulate!(world, 100)
+@profview simulate!(world, 10)
 ```
 
 Red bars indicate type instabilities; however, unless the bars stacked on top of them are high, narrow and not filling the whole width, the problem should not be that serious. In our case the worst offender is the `filter` method inside the `EcosystemCore.find_rand` function, either when called from `EcosystemCore.find_food` or `EcosystemCore.find_mate`. In both cases the bars on top of it are narrow and do not span the full width, meaning that not much time has really been spent working; instead it was spent inferring the types inside the function at runtime.
````
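The `EcosystemCore` internals are not shown in this diff, so here is a generic, made-up illustration of why a `filter` over a container with an abstract element type tends to show up as a red bar: the element type of the filtered result, and hence of a randomly chosen item, cannot be inferred as a concrete type. None of the names below belong to the package's API:

```julia
abstract type Agent end
struct Wolf  <: Agent end
struct Sheep <: Agent end

# a heterogeneous population forces an abstract element type
population = Agent[Wolf(), Sheep(), Wolf(), Sheep()]

# similar in spirit to a find_rand-style helper: filter, then pick one element at random
pick_random(f, xs) = rand(filter(f, xs))

pick_random(x -> x isa Sheep, population)  # the return type can only be inferred as Agent
```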
````diff
@@ -428,7 +409,7 @@ Let's profile the simulation again
@@ -448,7 +429,7 @@ Benchmark different versions of the `find_rand` function in a simulation 10 step
 
 **HINTS**:
 - use `Random.seed!` to fix the global random number generator before each run of the simulation
-- use `setup` keyword and `deepcopy` to initiate the `world` variable to the same state in each evaluation
+- use the `setup` keyword and `deepcopy` to initialize the `world` variable to the same state in each evaluation (see the resources at the end of this page for more information)
 
 ```@raw html
 </div></div>
````
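A sketch of how the two hints might combine in practice. `simulate!`, `world`, and the seed value are stand-ins for the lab's Ecosystem objects, so this will not run outside that context; the BenchmarkTools pattern itself is the point:

```julia
using BenchmarkTools, Random

@benchmark begin
    Random.seed!(42)   # fix the RNG so every measured run simulates the same trajectory
    simulate!(w, 10)   # hypothetical call: 10 steps of the lab's simulation
end setup=(w = deepcopy(world)) evals=1  # a fresh copy of the world before every measured run
```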
````diff
@@ -500,6 +481,7 @@ julia --track-allocation=user
 ```
 Use the steps above to obtain a memory allocation map. Investigate the results of allocation tracking inside the `EcosystemCore` source files. Where is the line with the most allocations?
 
+**HINT**: In order to locate the source files, consult the useful resources at the end of this page.
 **BONUS**: Use the pkg `Coverage.jl` to process the resulting files from within `EcosystemCore`.
````
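For orientation, a sketch of the usual `--track-allocation` workflow the exercise refers to; `simulate!` and `world` are again stand-ins from the lab:

```julia
# run in a session started as: julia --track-allocation=user
using Profile

simulate!(world, 1)           # run once so that compilation-related allocations happen now
Profile.clear_malloc_data()   # reset the counters, discarding the compilation noise
simulate!(world, 10)          # the run whose allocations we actually want to attribute
exit()                        # on exit, *.mem files with per-line byte counts appear next to the sources
```

Afterwards, Coverage.jl can post-process the generated `.mem` files (its `analyze_malloc` helper sorts entries by allocated bytes), which is one way to answer the question about the most allocating line.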
````diff
@@ … @@
+As most of Julia is written in Julia itself, it is sometimes helpful to look inside for some details or inspiration. The code of `Base` and the stdlib pkgs is located just next to Julia's installation in the `./share/julia` subdirectory.
+```bash
+./julia-1.6.2/
+├── bin
+├── etc
+│   └── julia
+├── include
+│   └── julia
+│       └── uv
+├── lib
+│   └── julia
+├── libexec
+└── share
+    ├── appdata
+    ├── applications
+    ├── doc
+    │   └── julia       # offline documentation (https://docs.julialang.org/en/v1/)
+    └── julia
+        ├── base        # base library
+        ├── stdlib      # standard library
+        └── test
+```
````
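To jump to those files from a running session rather than browsing the installation directory, the standard reflection utilities can be used; this is a general Julia aside, not something added by this diff:

```julia
using InteractiveUtils  # provides @which (loaded automatically in the REPL)

@which sum([1.0, 2.0, 3.0])               # prints the method and its file:line inside Base
functionloc(sum, Tuple{Vector{Float64}})  # the same location as a (path, line) tuple

using BenchmarkTools
pathof(BenchmarkTools)                    # entry file of an installed package
```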
````diff
+Other packages installed through the Pkg interface are located in the `.julia/` directory inside your `$HOMEDIR`, i.e. `/home/$(user)/.julia/` on Unix-based systems and `C:\Users\$(user)\.julia\` on Windows.
+```bash
+~/.julia/
+├── artifacts
+├── compiled
+├── config          # startup.jl lives here
+├── environments
+├── logs
+├── packages        # packages are here
+└── registries
+```
````
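A quick way to confirm where this depot lives on a given machine (plain Julia, nothing lab-specific):

```julia
first(DEPOT_PATH)                        # usually the absolute path of ~/.julia
joinpath(first(DEPOT_PATH), "packages")  # where installed package sources end up
```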
````diff
+If you are using VSCode, the paths visible in the REPL can be clicked through to the actual source code. Moreover, in that environment the documentation is usually available upon hovering over code.
+
````
````diff
+### Setting up benchmarks to our liking
+In order to control the number of samples/evaluations and the amount of time given to a given benchmark, we can simply append these as keyword arguments to `@btime` or `@benchmark` in the following way,
+which runs the code repeatedly for up to `5s`, where each of the `10` samples in the trial is composed of `10` evaluations. Setting up these parameters ourselves creates a more controlled environment in which performance regressions can be more easily identified.
+
+Another axis of customization is needed when we are benchmarking mutating operations such as `sort!`, which sorts an array in-place. One way of achieving a consistent benchmark is by omitting the interpolation, such as
+```@repl lab05_bench
+@benchmark sort!(rand(1000))
+```
+however now we are again measuring the data generation as well. A better way of doing such timing is to use the built-in `setup` keyword, into which you can put code that has to be run before each sample and which won't be measured.
````
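The example block the first sentence of this section refers to is not visible in this view. Based on the `5s`, `10` samples, and `10` evaluations it mentions, it presumably resembles the following, with the benchmarked expression being a placeholder:

```julia
using BenchmarkTools

x = rand(1000)  # placeholder subject of the benchmark

@benchmark sum($x) samples=10 evals=10 seconds=5
```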