Commit dfc3b6a

Lab5 'final' checks.
1 parent 9916abc commit dfc3b6a


docs/src/lecture_05/lab.md

Lines changed: 83 additions & 45 deletions
@@ -33,16 +33,15 @@ polynomial(a, x)
 
 - Float number valued arguments
 ```@example lab05_polynomial
-af = Float64.(a)
 xf = 3.0
-polynomial(af, xf)
+polynomial(a, xf)
 ```
 
 The results they produce are numerically the "same", however they differ in output type. Though you have probably not noticed it, there should also be a difference in runtime (assuming you have run each version once more after its compilation). It is probably a surprise to no one that one of the compiled methods is type unstable. This can be checked with the `@code_warntype` macro:
 ```@repl lab05_polynomial
 using InteractiveUtils #hide
 @code_warntype polynomial(a, x) # type stable
-@code_warntype polynomial(af, xf) # type unstable
+@code_warntype polynomial(a, xf) # type unstable
 ```
 We are getting a little ahead of ourselves in this lab, as understanding these expressions is part of the future [lecture](@ref introspection) and [lab](@ref introspection_lab). For now, the output basically shows what the compiler thinks of each variable in the code, albeit in a less readable form than the original source. The redder the type info, the less certain the inferred type is. Our main focus should be the return type of the function, shown at the start of the output after the keyword `Body`. In the first case the return type is `Int64`, whereas in the second example the compiler is unsure whether it is `Float64` or `Int64`, marked as a `Union` of the two. Fortunately this type instability can be fixed with a single-line edit, but we will see later that this is not always the case.
 
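For illustration, a minimal toy function (hypothetical, not from the lab) whose return type depends on a runtime value shows the kind of instability `@code_warntype` flags:

```julia
using InteractiveUtils

# The branch returns an Int64 or a Float64 depending on a runtime value,
# so inference can only narrow the result to Union{Float64, Int64}.
unstable(flag) = flag ? 1 : 2.0

@code_warntype unstable(true)  # Body::Union{Float64, Int64}, highlighted in red
```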
@@ -79,21 +78,21 @@ end
 
 ```@repl lab05_polynomial
 @code_warntype polynomial_stable(a, x) # type stable
-@code_warntype polynomial_stable(af, xf) # type stable
+@code_warntype polynomial_stable(a, xf) # type stable
 ```
 
 ```@repl lab05_polynomial
-polynomial(af, xf) #hide
-polynomial_stable(af, xf) #hide
-@time polynomial(af, xf)
-@time polynomial_stable(af, xf)
+polynomial(a, xf) #hide
+polynomial_stable(a, xf) #hide
+@time polynomial(a, xf)
+@time polynomial_stable(a, xf)
 ```
 
 The difference is only really visible when evaluating the functions multiple times.
 ```@repl lab05_polynomial
 using BenchmarkTools
-@btime polynomial($af, $xf)
-@btime polynomial_stable($af, $xf)
+@btime polynomial($a, $xf)
+@btime polynomial_stable($a, $xf)
 ```
 The difference is only a few nanoseconds.
 
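As an aside, the `$` signs above interpolate the arguments into the benchmark, so BenchmarkTools measures the calls as if the inputs were local values rather than untyped globals. A minimal sketch of the difference (illustrative only, timings vary by machine):

```julia
using BenchmarkTools

v = rand(1000)   # non-constant global variable
@btime sum(v)    # v is resolved as a global during every evaluation
@btime sum($v)   # $v is interpolated, so only sum itself is measured
```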
@@ -139,24 +138,6 @@ using BenchmarkTools #hide
 @btime sum(A) # global variable A has to be inferred in each evaluation
 ```
 
-### Setting up benchmarks to our liking
-In order to control the number of samples/evaluations and the amount of time given to a benchmark, we can simply append these as keyword arguments to `@btime` or `@benchmark` in the following way
-```@repl lab05_bench
-@benchmark sum($(rand(1000))) evals=100 samples=10 seconds=5
-```
-which runs the code repeatedly for up to `5s`, where each of the `10` samples in the trial is composed of `100` evaluations. Setting up these parameters ourselves creates a more controlled environment in which performance regressions can be more easily identified.
-
-Another axis of customization is needed when we are benchmarking mutating operations such as `sort!`, which sorts an array in-place. One way of achieving a consistent benchmark is to omit the interpolation, such as
-```@repl lab05_bench
-@benchmark sort!(rand(1000))
-```
-however now we are again measuring the data generation as well. A better way of doing such timing is the built-in `setup` keyword, into which you can put code that runs before each sample and is not measured.
-```@repl lab05_bench
-@benchmark sort!(y) setup=(y=rand(1000))
-A = rand(1000) #hide
-@benchmark sort!(AA) setup=(AA=copy($A))
-```
-
 ## Profiling
 Profiling in Julia is part of the standard library in the `Profile` module. It implements a fairly simple sampling-based profiler which, in a nutshell, asks at regular intervals where the code execution currently is. As a result we get an array of stacktraces (= chains of function calls), which allows us to make sense of where the execution spent the most time. The number of samples that can be stored and the sampling period in seconds can be checked, after loading `Profile` into the session, with the `init()` function.
 
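For instance, a short sketch of inspecting and adjusting those settings (the numbers shown are illustrative, not guaranteed defaults):

```julia
using Profile

Profile.init()                          # returns the current (n_samples, delay), e.g. (10000000, 0.001)
Profile.init(n = 10^7, delay = 0.0005)  # store up to 10^7 samples, sample every 0.5 ms
```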
@@ -175,7 +156,7 @@ Let's look at our favorite `polynomial` function or rather its type stable variant
 
 ```@repl lab05_polynomial
 Profile.clear() # clear the last trace (does not have to be run on a fresh start)
-@profile polynomial_stable(af, xf)
+@profile polynomial_stable(a, xf)
 Profile.print() # text based output of the profiler
 ```
 Unless the machine that you run the code on is really slow, the resulting output contains nothing, or only some internals of Julia's interactive REPL. This is due to the fact that our `polynomial` function takes only a few nanoseconds to run. When we want to profile something that takes only a few nanoseconds, we have to execute the function repeatedly.
@@ -187,11 +168,11 @@ function run_polynomial_stable(a, x, n)
     end
 end
 
-af = Float64.(rand(-10:10, 10)) # using longer polynomial
+a = rand(-10:10, 10) # using longer polynomial
 
-run_polynomial_stable(af, xf, 10) #hide
+run_polynomial_stable(a, xf, 10) #hide
 Profile.clear()
-@profile run_polynomial_stable(af, xf, Int(1e5))
+@profile run_polynomial_stable(a, xf, Int(1e5))
 Profile.print()
 ```
 
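For reference, the hunk only shows the closing `end`s of the helper; presumably it is a simple repetition wrapper along these lines (a sketch consistent with the visible signature, not the file's verbatim code):

```julia
# Repeatedly evaluate the polynomial so the profiler collects enough samples.
function run_polynomial_stable(a, x, n)
    for _ in 1:n
        polynomial_stable(a, x)
    end
end
```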
@@ -228,8 +209,8 @@ end
 ```
 
 ```@example lab05_polynomial
-run_polynomial(af, xf, 10) #hide
-@profview run_polynomial(af, xf, Int(1e5)) # clears the profile for us
+run_polynomial(a, xf, 10) #hide
+@profview run_polynomial(a, xf, Int(1e5)) # clears the profile for us
 ProfileSVG.save("./scalar_prof_unstable.svg") #hide
 nothing #hide
 ```
@@ -241,7 +222,7 @@ nothing #hide
 ```
 
 Other options for viewing profiler outputs:
-- [ProfileView](https://github.com/timholy/ProfileView.jl) - close cousin of `ProfileSVG`, spawns gtk window with interactive FlameGraph
+- [ProfileView](https://github.com/timholy/ProfileView.jl) - close cousin of `ProfileSVG`, spawns a GTK window with an interactive FlameGraph
 - [VSCode](https://www.julia-vscode.org/docs/stable/release-notes/v0_17/#Profile-viewing-support-1) - always-imported `@profview` macro, flamegraphs (js extension required), filtering, one-click access to source code
 - [PProf](https://github.com/vchuravy/PProf.jl) - serializes the profiler output to a protobuffer and loads it in the `pprof` web app, graph visualization of stacktraces
 
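As a sketch of the PProf route (the `pprof()` entry point and the output file name follow PProf.jl's README; treat the details as assumptions):

```julia
using Profile, PProf

@profile run_polynomial_stable(a, xf, Int(1e5))  # collect a trace first
pprof()  # serializes the trace to profile.pb.gz and opens the pprof web UI
```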
@@ -282,18 +263,18 @@ Speed up:
 - 420ns -> 12ns ~ 35x on real valued input
 
 ```@repl lab05_polynomial
-@btime polynomial($(Int.(af)), $(Int(xf)))
-@btime polynomial_stable($(Int.(af)), $(Int(xf)))
-@btime polynomial($af, $xf)
-@btime polynomial_stable($af, $xf)
+@btime polynomial($a, $x)
+@btime polynomial_stable($a, $x)
+@btime polynomial($a, $xf)
+@btime polynomial_stable($a, $xf)
 ```
 These numbers will differ on different hardware.
 
 **BONUS**: The profile trace no longer even contains the calls to the mathematical operators and is dominated mainly by the iteration utilities. In this case we had to increase the number of runs to `1e6` to get a meaningful trace.
 
 ```@example lab05_polynomial
-run_polynomial(af, xf, 10) #hide
-@profview run_polynomial(af, xf, Int(1e6))
+run_polynomial(a, xf, 10) #hide
+@profview run_polynomial(a, xf, Int(1e6))
 ProfileSVG.save("./scalar_prof_horner.svg") #hide
 ```
 ![profile_horner](./scalar_prof_horner.svg)
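For context, the speedup above comes from rewriting `polynomial` (the saved SVG name `scalar_prof_horner.svg` points to Horner's scheme). A sketch of such a rewrite, with a hypothetical name and assuming `a[1]` holds the constant coefficient as in the lab:

```julia
# Horner's scheme: ((aₙ·x + aₙ₋₁)·x + …)·x + a₁ needs one multiply-add per
# coefficient, and muladd keeps the accumulator's type stable.
function polynomial_horner(a, x)
    acc = a[end]
    for i in length(a)-1:-1:1
        acc = muladd(x, acc, a[i])  # acc * x + a[i]
    end
    return acc
end
```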
@@ -339,7 +320,7 @@ Precompile everything by running one step of our simulation and run the profiler
 
 ```@example lab05_ecosystem
 simulate!(world, 1)
-@profview simulate!(world, 100)
+@profview simulate!(world, 10)
 ```
 
 Red bars indicate type instabilities; however, unless the bars stacked on top of them are tall, narrow, and far from filling the whole width, the problem should not be that serious. In our case the worst offender is the `filter` method inside the `EcosystemCore.find_rand` function, either when called from `EcosystemCore.find_food` or from `EcosystemCore.find_mate`. In both cases the bars on top of it are narrow and do not span the full width, meaning that not much time was really spent working; instead it was spent inferring the types in the function itself at runtime.
@@ -428,7 +409,7 @@ Let's profile the simulation again
 ```@example lab05_ecosystem
 world = create_world();
 simulate!(world, 1)
-@profview simulate!(world, 100)
+@profview simulate!(world, 10)
 ProfileSVG.save("./ecosystem_nofilter.svg") #hide
 ```
 ![profile_ecosystem_nofilter](./ecosystem_nofilter.svg)
@@ -448,7 +429,7 @@ Benchmark different versions of the `find_rand` function in a simulation 10 step
 
 **HINTS**:
 - use `Random.seed!` to fix the global random number generator before each run of the simulation
-- use `setup` keyword and `deepcopy` to initiate the `world` variable to the same state in each evaluation
+- use the `setup` keyword and `deepcopy` to initialize the `world` variable to the same state in each evaluation (see the resources at the end of this page for more information, and the sketch below)
 
 ```@raw html
 </div></div>
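A sketch of how such a benchmark could be set up (`create_world` and `simulate!` follow the lab; the seed and keyword choices are assumptions, not the reference solution):

```julia
using BenchmarkTools, Random

world = create_world()

# `setup` runs before each sample and is excluded from the timing; `deepcopy`
# hands every evaluation an identical, unmutated world, and seeding the global
# RNG makes each run see the same randomness. `evals=1` ensures the fresh copy
# is used exactly once per measurement.
@benchmark simulate!(w, 10) setup=(Random.seed!(7); w = deepcopy($world)) evals=1
```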
@@ -500,6 +481,7 @@ julia --track-allocation=user
 ```
 Use the steps above to obtain a memory allocation map. Investigate the results of allocation tracking inside the `EcosystemCore` source files. Where is the line with the most allocations?
 
+**HINT**: In order to locate the source files, consult the useful resources at the end of this page.
 **BONUS**: Use the `Coverage.jl` package to process the resulting files from within `EcosystemCore`.
 ```@raw html
 </div></div>
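As a rough sketch of that workflow (the script name is hypothetical): after starting `julia --track-allocation=user`, run the workload once so it compiles, clear the counters, and run it again so the emitted `.mem` files reflect only runtime allocations.

```julia
# started as: julia --track-allocation=user
using Profile

include("simulation.jl")     # hypothetical workload script; first run compiles it
Profile.clear_malloc_data()  # discard allocations attributable to compilation
include("simulation.jl")     # run again; .mem files are written when julia exits
```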
@@ -588,4 +570,60 @@ julia> analyze_malloc(expanduser("~/.julia/packages/EcosystemCore/8dzJF/src"))
 </p></details>
 ```
 
-## Resources
+---
+
+## Useful resources
+
+### Where to find source code?
+As most of Julia is written in Julia itself, it is sometimes helpful to look inside for details or inspiration. The code of `Base` and the stdlib packages is located just next to Julia's installation in the `./share/julia` subdirectory
+```bash
+./julia-1.6.2/
+├── bin
+├── etc
+│   └── julia
+├── include
+│   └── julia
+│       └── uv
+├── lib
+│   └── julia
+├── libexec
+└── share
+    ├── appdata
+    ├── applications
+    ├── doc
+    │   └── julia   # offline documentation (https://docs.julialang.org/en/v1/)
+    └── julia
+        ├── base    # base library
+        ├── stdlib  # standard library
+        └── test
+```
+Other packages installed through the Pkg interface live in the `.julia/` directory inside your `$HOMEDIR`, i.e. `/home/$(user)/.julia/` on Linux, `/Users/$(user)/.julia/` on macOS, and `C:\Users\$(user)\.julia\` on Windows.
+```bash
+~/.julia/
+├── artifacts
+├── compiled
+├── config        # startup.jl lives here
+├── environments
+├── logs
+├── packages      # packages are here
+└── registries
+```
+If you are using VSCode, the paths visible in the REPL can be clicked through to the actual source code. Moreover, in that environment the documentation is usually available upon hovering over code.
+
+### Setting up benchmarks to our liking
+In order to control the number of samples/evaluations and the amount of time given to a benchmark, we can simply append these as keyword arguments to `@btime` or `@benchmark` in the following way
+```@repl lab05_bench
+@benchmark sum($(rand(1000))) evals=100 samples=10 seconds=5
+```
+which runs the code repeatedly for up to `5s`, where each of the `10` samples in the trial is composed of `100` evaluations. Setting up these parameters ourselves creates a more controlled environment in which performance regressions can be more easily identified.
+
+Another axis of customization is needed when we are benchmarking mutating operations such as `sort!`, which sorts an array in-place. One way of achieving a consistent benchmark is to omit the interpolation, such as
+```@repl lab05_bench
+@benchmark sort!(rand(1000))
+```
+however now we are again measuring the data generation as well. A better way of doing such timing is the built-in `setup` keyword, into which you can put code that runs before each sample and is not measured.
+```@repl lab05_bench
+@benchmark sort!(y) setup=(y=rand(1000))
+A = rand(1000) #hide
+@benchmark sort!(AA) setup=(AA=copy($A))
+```
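One subtlety with `setup` worth noting (an illustrative variant, not part of the lab): `setup` runs once per sample, so all evaluations within a sample reuse the same `y`, and after the first `sort!` the array is already sorted. Forcing one evaluation per sample avoids this:

```julia
@benchmark sort!(y) setup=(y=rand(1000)) evals=1
```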
