docs/src/lecture_10/lecture.md
Lines changed: 71 additions & 9 deletions
@@ -8,7 +8,7 @@ Julia offers different levels of parallel programming
8
8
In this lecture, we will focus mainly on the first two, since SIMD instructions are mainly used for low-level optimization (such as writing your own very performant BLAS library), and task switching is not true parallelism, but it allows another task to run while one task is waiting, for example for IO.
9
9
10
10
## Process-level parallelism
11
-
Process-level paralelism means that Julia runs several compilers in different processes. Compilers *do not share anything by default*, where by anything we mean no libraries, not variables. Everyhing has to be set-up, but Julia offers tooling for remote execution and communication primitives.
11
+
Process-level parallelism means that Julia runs several compilers in different processes. By default, different processes *do not share anything*, meaning neither libraries nor variables. Everything therefore has to be set up on all processes.
12
12
13
13
Out of the box, Julia supports a mode where a single *main* process controls several workers. This main process has `myid() == 1`, and worker processes receive higher numbers. Julia can be started with multiple workers from the very beginning, using the `-p` switch as
14
14
```julia
@@ -47,7 +47,7 @@ end
47
47
```
48
48
Alternatively, we can put the code into a separate file and load it on all workers using `-L filename.jl`
49
49
50
-
A real benefit from multi-processing is when we can *schedulle* an execution of a function and return the control immeadiately to do something else. A low-level function providing this functionality is `remotecall(fun, worker_id, args...)`. For example
50
+
Julia's multi-processing model is based on the message-passing paradigm, but the abstraction is more akin to procedure calls. This means that users are saved from prepending messages with headers and implementing logic that decides which function should be called for which header. Instead, we can *schedule* the execution of a function on a remote worker and return control immediately to continue with our own work. A low-level function providing this functionality is `remotecall(fun, worker_id, args...)`. For example
51
51
```julia
52
52
@everywhere begin
53
53
function delayed_foo(x, y, n)
@@ -57,10 +57,16 @@ A real benefit from multi-processing is when we can *schedulle* an execution of
57
57
end
58
58
r = remotecall(delayed_foo, 2, 1, 1, 60)
59
59
```
60
-
which terminates immediately. `r` does not contain result of `foo(1, 1)`, but a struct `Future`. The `Future` can be seen as a *handle* allowing to retrieve the result later, using `fetch`, which either fetches the result or wait until the result is available.
60
+
returns immediately, even though the function will take at least 60 seconds. `r` does not contain the result of `foo(1, 1)`, but a struct `Future`, which is a *remote reference* in Julia's terminology. It points to data located on some machine, indicates whether they are available, and allows us to `fetch` them from the remote worker. `fetch` is blocking, which means that execution is blocked until the data are available (if they never become available, the process can wait forever). The presence of data can be checked using `isready`, which, for a `Future` returned from `remotecall`, indicates that the computation has finished.
61
61
```julia
62
+
isready(r)
62
63
fetch(r) == foo(1, 1)
63
64
```
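The life cycle of a `Future` described above can be sketched as follows. This is a minimal illustration under stated assumptions: it runs on a single machine, starts one local worker if none exists, and uses a hypothetical helper `slow_add` (not part of the lecture) with a one-second delay instead of sixty.

```julia
using Distributed

# Assumption: a single local machine; add one worker if none is running yet.
nprocs() == 1 && addprocs(1)
w = first(workers())

# Hypothetical helper, defined on all processes: waits n seconds, then adds.
@everywhere function slow_add(x, y, n)
    sleep(n)
    return x + y
end

r = remotecall(slow_add, w, 1, 1, 1)  # returns a Future without waiting
ready_before = isready(r)             # non-blocking check; usually false at this point
result = fetch(r)                     # blocks until the worker has finished
ready_after = isready(r)              # true once the value has been fetched
```

Checking `isready` before `fetch` lets the main process do other work and collect the result only when it is known to be available.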
65
+
An advantage of the remote reference is that it can be freely shared among processes, and the result can be retrieved on a different node than the one that issued the call.
An interesting feature of `fetch` is that it re-throws an exception raised on a different process.
65
71
```julia
66
72
@everywhere begin
@@ -71,9 +77,18 @@ end
71
77
r = @spawnat 2 exfoo()
72
78
```
73
79
where `@spawnat` is an alternative to `remotecall`, which executes a closure around an expression (in this case `exfoo()`) on a specified worker (in this case 2). Fetching the result `r` throws an exception on the main process.
74
-
```jula
80
+
```julia
81
+
fetch(r)
82
+
```
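How the re-thrown exception looks on the main process can be sketched as below. The sketch assumes a single local machine and starts one worker if needed; the error message and the variable names `w` and `caught` are illustrative only. On the calling side, the failure arrives wrapped in a `RemoteException`, which records the id of the process where it was raised.

```julia
using Distributed

# Assumption: a single local machine; add one worker if none is running yet.
nprocs() == 1 && addprocs(1)
w = first(workers())

# The expression fails on the worker; nothing is thrown here yet.
r = @spawnat w error("failure on worker")

# The exception surfaces only when the result is fetched.
caught = try
    fetch(r)
    nothing
catch e
    e
end

caught isa RemoteException  # the wrapper type used for remote failures
caught.pid == w             # id of the process that raised the error
```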
83
+
`@spawnat` can be executed with `:any` to signal that the user does not care where the function will be executed, leaving the choice of worker up to Julia.
84
+
```julia
85
+
r = @spawnat :any foo(1, 1)
75
86
fetch(r)
76
87
```
88
+
Finally, if for some reason you need to wait for the computed value, you can use
89
+
```julia
90
+
remotecall_fetch(foo, 2, 1, 1)
91
+
```
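As a small sketch of the relationship between the two forms, the following compares `remotecall_fetch` with an explicit `remotecall` followed by `fetch`. It assumes a single local machine with one worker added on demand, and uses `+` and `myid` only to keep the example self-contained.

```julia
using Distributed

# Assumption: a single local machine; add one worker if none is running yet.
nprocs() == 1 && addprocs(1)
w = first(workers())

a = fetch(remotecall(+, w, 1, 1))      # two steps: schedule, then wait
b = remotecall_fetch(+, w, 1, 1)       # one blocking call with the same result
where_ran = remotecall_fetch(myid, w)  # id of the process that ran the call
```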
77
92
78
93
## Example: Julia sets
79
94
Our example for explaining the mechanisms of distributed computing will be the computation of the Julia set fractal. The computation of the fractal can be easily parallelized, since the value of each pixel is independent of the others. The example is adapted from [Eric Aubanel](http://www.cs.unb.ca/~aubanel/JuliaMultithreadingNotes.html).
@@ -256,10 +271,57 @@ end
256
271
By default, Julia does not provide any facility to kill a remote execution other than sending `ctrl-c` to the remote worker using `interrupt(pids::Integer...)`.
257
272
258
273
## Sending data
259
-
- Do not send `randn(1000, 1000)`
260
-
- Sending references and ObjectID would not work
261
-
- Serialization is very time consuming, an efficient converstion to something simple might be wort
262
-
- Dict("a" => [1,2,3], "b" = [2,3,4,5]) -> (Array of elements, array of bounds, keys)
274
+
Sending function parameters to remotely called functions and receiving their results might incur a significant cost.
275
+
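1. Try to minimize data movement as much as possible. A prototypical example is the difference between the following two snippets: the first creates `A` on the main process and has to send it to the worker, whereas the second creates the matrix directly on the worker.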
276
+
```julia
277
+
A = rand(1000, 1000);
278
+
Bref = @spawnat :any A^2;
279
+
```
280
+
and
281
+
```julia
282
+
Bref = @spawnat :any rand(1000, 1000)^2;
283
+
```
284
+
2. It is not only the volume of data (in terms of the number of bytes) that matters, but also the complexity of the objects being sent. Serialization can be very time consuming, so an efficient conversion to something simple might be worth it.
3. Some types of objects cannot be properly serialized and deserialized
305
+
```julia
306
+
a = IdDict(
307
+
:a => rand(1, 1),
308
+
)
309
+
b = remotecall_fetch(identity, 2, a)
310
+
a[:a] === a[:a]  # true: the very same array
311
+
a[:a] === b[:a]  # false: b holds a deserialized copy
312
+
```
313
+
4. If you need to send data to a worker, i.e. you want to define (or overwrite) a global variable there
314
+
```julia
315
+
@everywhere begin
316
+
g = rand()
317
+
show_secret() =println("secret of ", myid(), " is ", g)
318
+
end
319
+
@everywhere show_secret()
320
+
321
+
remotecall_fetch(g -> eval(:(g = $(g))), 2, g)
322
+
@everywhere show_secret()
323
+
```
324
+
which is implemented in the
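
A possible alternative for defining a global on a worker, given here only as a sketch, is the `$` interpolation that `@everywhere` supports for local values. The names `secret`, `w`, and `seen` are illustrative, and the example assumes a single local machine with one worker added on demand.

```julia
using Distributed

# Assumption: a single local machine; add one worker if none is running yet.
nprocs() == 1 && addprocs(1)
w = first(workers())

secret = 42                    # value living on the main process
@everywhere [w] g = $secret    # define (or overwrite) the global `g` on worker w only

# Read the global back by evaluating the symbol `g` in the worker's Main module.
seen = remotecall_fetch(Core.eval, w, Main, :g)
```

Interpolating with `$` copies the current value into the expression before it is sent, so the worker receives the value rather than a reference to the variable on the main process.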
263
325
264
326
## Practical advice
265
327
Recall that (i) workers are started as clean processes and (ii) they might not share the same environment with the main process. The latter is due to the fact that files describing the environment (`Project.toml` and `Manifest.toml`) might not be available on remote machines.
@@ -298,7 +360,7 @@ A complete example can be seen in [`juliaset_p.jl`](juliaset_p.jl).
298
360
## Julia sets
299
361
An example adapted from [Eric Aubanel](http://www.cs.unb.ca/~aubanel/JuliaMultithreadingNotes.html).
300
362
301
-
For ilustration, we will use Julia set fractals, ad they can be easily paralelized. Some fractals (Julia set, Mandelbrot) are determined by properties of some complex-valued functions. Julia set counts, how many iteration is required for ``f(z)=z^2+c`` to be bigger than two in absolute value, ``|f(z)>=2``. The number of iterations can then be mapped to the pixel's color, which creates a nice visualization we know.
363
+
For illustration, we will use Julia set fractals, as they can be easily parallelized. Some fractals (Julia set, Mandelbrot) are determined by properties of certain complex-valued functions. The Julia set computation counts how many iterations are required for ``f(z) = z^2+c`` to reach at least two in absolute value, ``|f(z)| \geq 2``. The number of iterations can then be mapped to the pixel's color, which creates the familiar visualization.