Commit 50fba09

pevnak authored and janfrancu committed
added example why allocation on heaps cripples the performance
1 parent 662a722 commit 50fba09

1 file changed

+55
-10
lines changed

docs/src/lecture_10/lecture.md

Lines changed: 55 additions & 10 deletions
@@ -7,6 +7,8 @@ Julia offers different levels of parallel programming
77

88
In this lecture, we will focus mainly on the first two, since SIMD instructions are mainly used for low-level optimization (such as writing your own very performant BLAS library), and task switching is not true parallelism, but it allows a different task to run while one task is waiting, for example, for IO.
99

10+
**The most important lesson is that before you jump into parallelism, you should make sure your code is fast sequentially.**
11+
1012
## Process-level parallelism
1113
Process-level parallelism means that Julia runs several compilers in different processes. By default, different processes *do not share anything*, meaning no libraries and no variables. Everything therefore has to be set up on all processes.
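The "nothing is shared" rule can be illustrated with a minimal sketch using the `Distributed` standard library. The function names `square_main` and `square_all` and the number of workers are assumptions made for this illustration only; they do not appear in the lecture.

```julia
using Distributed

# Start two local worker processes (the count is arbitrary for this sketch).
addprocs(2)

# Defined only on the main process: the workers know nothing about it.
square_main(x) = x^2

# Defined on every process, including the workers.
@everywhere square_all(x) = x^2

w = first(workers())

# Succeeds, because `square_all` exists on the worker.
r = remotecall_fetch(square_all, w, 3)

# Expected to fail, because `square_main` was never defined on the worker.
failed = try
    remotecall_fetch(square_main, w, 3)
    false
catch
    true
end

# Shut the workers down again.
rmprocs(workers())
```

The same applies to packages: anything the workers need has to be loaded or defined on them explicitly, for example with `@everywhere`.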
1214

@@ -321,11 +323,10 @@ end
321323
remotecall_fetch(g -> eval(:(g = $(g))), 2, g)
322324
@everywhere show_secret()
323325
```
324-
which is implemented in the
326+
which is implemented, along with other variants, in `ParallelDataTransfer.jl`. In general, though, this construct should be avoided.
325327

326328
## Practical advice
327-
Recall that (i) workers are started as clean processes and (ii) they might not share the same environment with the main process. The latter is due to the fact that files describing the environment (`Project.toml` and `Manifest.toml`) might not be available on remote machines.
328-
We recommend:
329+
Recall that (i) workers are started as clean processes and (ii) they might not share the same environment with the main process. The latter is because remote machines might have a different directory structure. Our best practices are:
329330
- to have a shared directory (shared home) with the code and to share the location of packages
330331
- to place all code for the workers in one file, let's call it `worker.jl` (the author also includes the code for the master process in it).
331332
- to put code activating the specified environment at the beginning of `worker.jl`, as
@@ -350,13 +351,6 @@ where `main()` is the function defined in `worker.jl` to be executed on the main
350351

351352
A complete example can be seen in [`juliaset_p.jl`](juliaset_p.jl).
352353

353-
354-
## Multi-Threadding
355-
- Locks / lock-free multi-threadding
356-
- Show the effect of different schedullers
357-
- intra-model parallelism
358-
- sucks when operating with Heap
359-
360354
## Julia sets
361355
An example adapted from [Eric Aubanel](http://www.cs.unb.ca/~aubanel/JuliaMultithreadingNotes.html).
362356

@@ -500,15 +494,66 @@ julia> @btime juliaset(-0.79, 0.15, 1000, juliaset_folds!);
500494
10.421 ms (3582 allocations: 1.20 MiB)
501495
```
502496

497+
## Garbage collector is single-threaded
498+
Keep in mind that while threads are very easy and convenient to use, there are use cases where you might be better off with processes, even though there will be some communication overhead. One such case occurs when you need to allocate and free a lot of memory. This is because Julia's garbage collector is single-threaded. Imagine the task of making a histogram of bytes in a directory.
499+
For a fair comparison, we will use `Transducers`, since it offers both thread-based and process-based parallelism
500+
```julia
using Distributed   # provides `@everywhere`
using Transducers
@everywhere begin
    function histfile(filename)
        # one counter per distinct byte value seen in the file
        h = Dict{UInt8,Int}()
        foreach(open(read, filename, "r")) do b
            h[b] = get(h, b, 0) + 1
        end
        h
    end
end

files = filter(isfile, readdir("/Users/tomas.pevny/Downloads/", join = true))
@elapsed foldxd(mergewith(+), files |> Map(histfile))
# 150.863183701 (seconds)
```
516+
and using the multi-threaded version of `map`
517+
```julia
@elapsed foldxt(mergewith(+), files |> Map(histfile))
# 205.309952618 (seconds)
```
521+
we see that threading is actually worse than process-based parallelism, despite us paying the price for serialization and deserialization of the `Dict`. Needless to say, changing `Dict` to `Vector` as
522+
```julia
using Distributed   # provides `@everywhere`
using Transducers
@everywhere begin
    function histfile(filename)
        # 256 counters, one per byte value; arrays are 1-based, hence `b + 1`
        h = zeros(Int, 256)
        foreach(open(read, filename, "r")) do b
            h[b + 1] += 1
        end
        h
    end
end
files = filter(isfile, readdir("/Users/tomas.pevny/Downloads/", join = true))
# per-file vectors are combined by elementwise `+` instead of `mergewith(+)`
@elapsed foldxd(+, files |> Map(histfile))
# 36.224765744 (seconds)
@elapsed foldxt(+, files |> Map(histfile))
# 23.257072067 (seconds)
```
539+
is much better.
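To compare the two histogram variants without touching the file system, the following minimal sketch counts the bytes of an in-memory random buffer. The function names `hist_dict` and `hist_vector` and the buffer size are assumptions made for illustration; they are not part of the lecture's code.

```julia
# Dict-based histogram: a hash table keyed by the byte value.
function hist_dict(bytes::Vector{UInt8})
    h = Dict{UInt8,Int}()
    for b in bytes
        h[b] = get(h, b, 0) + 1
    end
    return h
end

# Vector-based histogram: a fixed array of 256 counters.
function hist_vector(bytes::Vector{UInt8})
    h = zeros(Int, 256)
    for b in bytes
        # byte values are 0..255, Julia arrays are 1-based
        h[Int(b) + 1] += 1
    end
    return h
end

bytes = rand(UInt8, 10^6)   # stand-in for the contents of a file
hd = hist_dict(bytes)
hv = hist_vector(bytes)

# Both variants must agree on every byte value that occurred.
same = all(hv[Int(k) + 1] == v for (k, v) in hd)
```

After a warm-up call, `@time hist_dict(bytes)` and `@time hist_vector(bytes)` give a rough single-threaded comparison of time and allocations on your machine.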
540+
541+
542+
## Multi-threading
543+
- Locks / lock-free multi-threading
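
As a minimal sketch of the two approaches named above, the following code increments a shared counter from several threads, once guarded by a `ReentrantLock` and once with an atomic integer. The function names `count_with_lock` and `count_with_atomic` are hypothetical and chosen only for this illustration.

```julia
using Base.Threads

# Lock-based: a plain counter whose updates are serialized by a lock.
function count_with_lock(n)
    counter = Ref(0)
    lk = ReentrantLock()
    @threads for i in 1:n
        lock(lk) do
            counter[] += 1
        end
    end
    return counter[]
end

# Lock-free: an atomic integer updated with an atomic add.
function count_with_atomic(n)
    counter = Atomic{Int}(0)
    @threads for i in 1:n
        atomic_add!(counter, 1)
    end
    return counter[]
end

locked_total = count_with_lock(10_000)
atomic_total = count_with_atomic(10_000)
```

Both functions return the same total regardless of the number of threads Julia was started with (for example `julia -t 4`). Without the lock or the atomic, the plain `counter[] += 1` would be a data race.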
544+
545+
503546
## Take-away message
504547
When deciding what kind of parallelism to employ, consider the following:
505548
- for tightly coupled computation over shared data, multi-threading is more suitable, since data are not shared between processes
506549
- but if the computation requires frequent allocation and freeing of memory, or IO, separate processes are more suitable, since garbage collectors are independent between processes
550+
- Keeping all cores busy while achieving an ideally linear speedup is difficult and requires a lot of experience and knowledge. Tooling and profilers that support debugging of parallel processes are not well developed.
507551
- `Transducers` strives for (almost) the same code to support thread- and process-based parallelism.
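
The last point can be sketched as follows, assuming the `Transducers.jl` package is installed. The input range and the squaring function are arbitrary choices for illustration. Only the name of the fold changes between the sequential and the threaded version.

```julia
using Transducers

xs = 1:1000
squares = xs |> Map(x -> x^2)

# Sequential fold.
s_seq = foldxl(+, squares)

# Thread-based fold: same reducing function, same transducer.
s_thr = foldxt(+, squares)
```

The process-based counterpart used earlier in the lecture, `foldxd`, takes the same arguments, but the functions it calls have to be defined on the worker processes.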
508552

509553
### Materials
510554
- http://cecileane.github.io/computingtools/pages/notes1209.html
511555
- https://lucris.lub.lu.se/ws/portalfiles/portal/61129522/julia_parallel.pdf
556+
- http://igoro.com/archive/gallery-of-processor-cache-effects/
512557
- https://www.csd.uwo.ca/~mmorenom/cs2101a_moreno/Parallel_computing_with_Julia.pdf
513558
- Threads: https://juliahighperformance.com/code/Chapter09.html
514559
- Processes: https://juliahighperformance.com/code/Chapter10.html
