Julia offers different levels of parallel programming
In this lecture, we will focus mainly on the first two, since SIMD instructions are mainly used for low-level optimization (such as writing your own very performant BLAS library), and task switching is not true parallelism, but it allows running a different task when one task is waiting, for example, for IO.
**The most important lesson is that before you jump into parallelism, be certain you have made your sequential code as fast as possible.**
## Process-level parallelism
Process-level parallelism means we run several instances of Julia (in different processes) and they communicate with each other using inter-process communication (IPC). The implementation of IPC differs depending on whether the parallel Julia instances share the same machine, or are on different machines spread over the network. By default, different processes *do not share any libraries or any variables*. They are loaded clean and it is up to the user to set up all needed code and data.
Julia's default modus operandi is a single *main* instance controlling several workers. This main instance has `myid() == 1`; worker processes receive higher numbers. Julia can be started with multiple workers from the very beginning, using the `-p` switch as
```julia
julia -p 4
```
As we have mentioned, workers are loaded without libraries. We can see that by running
```julia
@everywhere InteractiveUtils.varinfo()
```
which fails, but after loading `InteractiveUtils` everywhere
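a minimal sketch of that step is

```julia
# load InteractiveUtils on the main process and on all workers,
# then list the variables known to each process
@everywhere using InteractiveUtils
@everywhere InteractiveUtils.varinfo()
```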
we see that `Statistics` was loaded only on the main process. Thus, there is no magical sharing of data and code.
With the `@everywhere` macro we can define functions and variables, and import libraries on workers as
```julia
@everywhere begin
    foo(x, y) = x * y + sin(y)
An interesting feature of `fetch` is that it re-throws an exception raised on a different process. For example (the body of `exfoo` below is an illustrative reconstruction):

```julia
@everywhere function exfoo()
    error("exception from a worker")
end

r = @spawnat 2 exfoo()
```
where we have used `@spawnat` instead of `remotecall`. It is a higher-level alternative that executes a closure around the expression (in this case `exfoo()`) on a specified worker, in this case 2. Coming back to the example, when we fetch the result `r`, the exception is thrown on the main process, not on the worker
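For comparison, the same remote execution through the lower-level `remotecall` could be sketched as (reusing `exfoo` from above):

```julia
r = remotecall(exfoo, 2)   # returns a Future immediately, without blocking
fetch(r)                   # re-throws the worker's exception on the main process
```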
which has slightly better timing than the version based on `@spawnat` and `fetch` (as explained below in the section about `Threads`, the parallel computation of the Julia set suffers from each pixel taking a different time to compute, which can be relieved by dividing the work into more parts --- `@btime juliaset_pmap(-0.79, 0.15, 1000, 16);`).
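The load-balancing idea behind `pmap` can be sketched as follows; the escape-time kernel and the coordinate mapping are illustrative, not the lecture's exact code:

```julia
using Distributed
addprocs(4)

@everywhere function escape_count(z, c)
    # count iterations of z <- z^2 + c before divergence
    for i in UInt8(0):UInt8(254)
        abs2(z) > 4 && return i
        z = z * z + c
    end
    return UInt8(255)
end

function juliaset_pmap(x, y, n = 1000, nparts = 16)
    c = x + y * im
    # more parts than workers lets pmap balance uneven columns
    batches = collect(Iterators.partition(1:n, cld(n, nparts)))
    parts = pmap(batches) do cols
        [escape_count(-2 + 4 * (i - 1) / n + (-2 + 4 * (j - 1) / n) * im, c)
         for i in 1:n, j in cols]
    end
    hcat(parts...)
end
```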
## Shared memory
When the main process and all workers are located on the same machine, and the OS supports sharing memory between processes (by sharing memory pages), we can use `SharedArrays` to avoid sending the matrix with results.
```julia
@everywhere begin
    using SharedArrays
    function juliaset_shared(x, y, n = 1000)
        c = x + y*im
        img = SharedArray(Array{UInt8,2}(undef, n, n))
        @sync @distributed for j in 1:n
            juliaset_column!(img, c, n, j, j)
        end
        return img
    end
end
julia> @elapsed juliaset_shared(-0.79, 0.15);
0.021699503
```
The code for the main process will look like
```julia
function juliaset_channels(x, y, n = 1000, np = nworkers())
    c = x + y*im
    # split the columns into np batches and push them as jobs
    columns = Iterators.partition(1:n, cld(n, np))
    instructions = RemoteChannel(() -> Channel(np))
    foreach(cols -> put!(instructions, (c, n, cols)), columns)
    results = RemoteChannel(() -> Channel(np))
    rfuns = [@spawnat i juliaset_channel_worker(instructions, results) for i in workers()]
    img = Array{UInt8,2}(undef, n, n)
    for i in 1:np
        cols, impart = take!(results)
        img[:, cols] .= impart
    end
    img
end
269
271
270
272
julia> @btime juliaset_channels(-0.79, 0.15);
```
The execution time is much higher than what we have observed in the previous cases, and changing the number of workers does not help much. What went wrong? The reason is that setting up the infrastructure around remote channels is a costly process. Consider the following alternative, where (i) we let workers run endlessly and (ii) the channel infrastructure is set up once and wrapped into an anonymous function
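That alternative could be sketched as follows (names and structure are illustrative; `juliaset_columns` computes the requested columns):

```julia
function juliaset_init(x, y, n = 1000, np = nworkers())
    c = x + y*im
    instructions = RemoteChannel(() -> Channel(np))
    results = RemoteChannel(() -> Channel(np))
    # start endless workers once; each blocks on take! waiting for jobs
    foreach(w -> @spawnat(w, while true
                job = take!(instructions)
                put!(results, (job[3], juliaset_columns(job[1], job[2], job[3])))
            end), workers())
    parts = collect(Iterators.partition(1:n, cld(n, np)))
    # the returned closure reuses the channel infrastructure on every call
    function compute()
        foreach(cols -> put!(instructions, (c, n, cols)), parts)
        img = Array{UInt8,2}(undef, n, n)
        for _ in 1:length(parts)
            cols, impart = take!(results)
            img[:, cols] .= impart
        end
        img
    end
    compute
end
```

Calling `juliaset_init(-0.79, 0.15)` pays the setup cost once; the returned closure can then be called (and timed) repeatedly.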
In some use-cases, the alternative can be to put all jobs into the `RemoteChannel` before the workers are started, and then stop the workers when the remote channel is empty, as
```julia
@everywhere begin
    function juliaset_channel_worker(instructions, results)
        # keep taking jobs until the remote channel is drained
        while isready(instructions)
            c, n, cols = take!(instructions)
            put!(results, (cols, juliaset_columns(c, n, cols)))
        end
    end
end
```
## Sending data
Sending parameters to and receiving results from remotely called functions might incur a significant cost.
and
```julia
Bref = @spawnat :any rand(1000, 1000)^2;
```
2. It is not only the volume of data (in terms of the number of bytes) that matters, but also the complexity of the objects being sent. Serialization can be very time consuming, so an efficient conversion to something simpler might be worth it
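As a sketch of that idea, one can strip a rich wrapper down to its plain array before sending (the `Wrapped` type below is illustrative):

```julia
struct Wrapped
    payload::Matrix{Float64}
    metadata::Dict{Symbol,Any}
end

w = Wrapped(rand(100, 100), Dict(:origin => :main))
# send only the plain matrix, which serializes cheaply,
# instead of the whole wrapper with its metadata
r = remotecall(sum, 2, w.payload)
fetch(r)
```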
which is implemented in `ParallelDataTransfer.jl` together with other variants, but in general, this construct should be avoided.
## Practical advice
Recall that (i) workers are started as clean processes and (ii) they might not share the same environment with the main process. The latter is due to the possibility that remote machines have a different directory structure. We can inspect the environment active on each worker as
```julia
@everywhere begin
    using Pkg
    println(Pkg.project().path)
end
```
Our advice, earned by practice, is:
- to have a shared directory (shared home) with the code and to share the location of packages
- to place all code for workers in one file, let's call it `worker.jl` (the author includes the code for the main process there as well).
- to put at the beginning of `worker.jl` code activating the specified environment, as
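Such a preamble could look like the following sketch (using `@__DIR__` as the project path is an assumption, not the lecture's exact code):

```julia
using Pkg
Pkg.activate(@__DIR__)   # activate the environment next to worker.jl
Pkg.instantiate()        # make sure all dependencies are installed
```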
```julia
julia> @btime juliaset_forkjoin(-0.79, 0.15);
  10.326 ms (142 allocations: 986.83 KiB)
```
Unfortunately, the `LoggingProfiler` does not handle task migration at the moment, which means that we cannot visualize the results. Due to task switching overhead, increasing the granularity might not pay off.
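The granularity trade-off can be sketched with a simple fork-join over chunks (the function and chunk size are illustrative):

```julia
using Base.Threads

function sum_forkjoin(xs, chunksize = 256)
    # each spawned task adds scheduling overhead, so very fine
    # granularity (small chunks) may cost more than it saves
    tasks = [Threads.@spawn sum(view(xs, r))
             for r in Iterators.partition(eachindex(xs), chunksize)]
    sum(fetch.(tasks))
end
```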