Julia offers different levels of parallel programming:
- SIMD instructions
- Task switching.
In this lecture, we will focus mainly on the first two, since SIMD instructions are mainly used for low-level optimization (such as writing your own very performant BLAS library), and task switching is not true parallelism, but rather allows running a different task while another task is waiting, for example, for IO.
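To illustrate the task-switching point (a minimal sketch, not part of the original example), a task that waits for IO can yield to another task on the same process:

```julia
# while the first task sleeps (imitating waiting for IO), the second one runs
@sync begin
    @async (sleep(1); println("slow IO finished"))
    @async println("other work runs in the meantime")
end
```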
## Process-level parallelism
Process-level parallelism means that Julia runs several independent instances of itself in different processes. These processes *do not share anything by default*, and by anything we mean no libraries and no variables. Everything has to be set up explicitly, but Julia offers tooling for remote execution and communication primitives.
Julia off-the-shelf supports a mode where a single *main* process controls several workers. The main process has `myid() == 1`; worker processes receive higher numbers. Julia can be started with multiple workers from the very beginning using the `-p` switch

```julia
julia -p n
```
where `n` is the number of workers. Alternatively, you can add workers after Julia has been started by
```julia
using Distributed
addprocs(4)
```
(When Julia is started with `-p`, the `Distributed` library is loaded by default on the main process.) You can also remove workers using `rmprocs`. Workers can run on the same physical machine or on different machines. Julia offers integration with most scheduling systems via `ClusterManagers.jl`.
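Workers on other machines are typically added by passing machine specifications to `addprocs`; a minimal sketch, assuming passwordless SSH access and a made-up host name:

```julia
using Distributed
# add 2 workers on a remote machine reachable via SSH ("user@example.com" is a placeholder)
addprocs([("user@example.com", 2)]; exename = "julia", dir = "/tmp")
```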
If you want to evaluate a piece of code on all workers, including the main process, the convenience macro `@everywhere` is offered.
```julia
@everywhere @show myid()
```
As we have mentioned, workers start without any libraries loaded. We can see that by running
```julia
@everywhere InteractiveUtils.varinfo()
```
which fails, but
```julia
@everywhere begin
    using InteractiveUtils
    println(InteractiveUtils.varinfo())
end
```
The `@everywhere` macro allows us to define functions and variables, and to import libraries on workers, as

```julia
@everywhere begin
    foo(x, y) = x * y + sin(y)
    foo(x) = foo(x, myid())
end
@everywhere @show foo(1.0)
```
Alternatively, we can put the code into a separate file and load it on all workers using `-L filename.jl`
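If the code already lives in a file, the same can also be done from a running session; a small sketch with a placeholder file name:

```julia
# load the file on all currently running processes
@everywhere include("my_functions.jl")
```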
A real benefit of multi-processing comes when we can *schedule* the execution of a function and return control immediately in order to do something else. A low-level function providing this functionality is `remotecall(fun, worker_id, args...)`. For example
```julia
@everywhere begin
    function delayed_foo(x, y, n)
        sleep(n)
        foo(x, y)
    end
end
r = remotecall(delayed_foo, 2, 1, 1, 60)
```
60
+
which returns immediately. `r` does not contain the result of `foo(1, 1)`, but a struct `Future`. The `Future` can be seen as a *handle* that allows the result to be retrieved later using `fetch`, which either returns the result or waits until it becomes available.
```julia
fetch(r) == foo(1, 1)
```
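A few more operations on `Future`s are worth knowing (a small illustrative sketch; `r2` is just another handle created for this purpose):

```julia
r2 = remotecall(delayed_foo, 2, 1, 1, 5)
isready(r2)   # false until the worker finishes
wait(r2)      # block until the result is available, without transferring it
fetch(r2)     # transfer the result; repeated fetches reuse the locally cached value
```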
64
+
An interesting feature of `fetch` is that it re-throws an exception raised on a different process.
```julia
@everywhere begin
    function exfoo()
        throw("Exception from $(myid())")
    end
end
r = @spawnat 2 exfoo()
```
73
+
where `@spawnat` is an alternative to `remotecall`, which executes a closure around the expression (in this case `exfoo()`) on a specified worker (in this case 2). Fetching the result `r` throws the remote exception on the main process.
```julia
fetch(r)
```
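The exception arrives wrapped in a `RemoteException`, which also records the id of the worker where it was raised; a hedged sketch of catching it:

```julia
try
    fetch(r)
catch e
    # e.pid is the worker id, e.captured holds the original exception
    e isa RemoteException && println("worker $(e.pid) failed with: ", e.captured.ex)
end
```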
## Example: Julia sets
Our example for explaining the mechanisms of distributed computing will be the computation of the Julia set fractal. The computation can be easily parallelized, since the value of each pixel is independent of the others. The example is adapted from [Eric Aubanel](http://www.cs.unb.ca/~aubanel/JuliaMultithreadingNotes.html).
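The definitions of the helper functions are not shown in this excerpt; a sketch consistent with how they are used below (the exact iteration limits and coordinate ranges are assumptions) could look as follows:

```julia
@everywhere begin
    # escape-time iteration for a single pixel
    function juliaset_pixel(z₀, c)
        z = z₀
        for i in 1:255
            abs2(z) > 4.0 && return (i - 1) % UInt8
            z = z * z + c
        end
        return UInt8(255)
    end

    # compute the pixels of the columns `columns` of an n×n image
    function juliaset_columns(c, n, columns)
        img = Array{UInt8,2}(undef, n, length(columns))
        for (k, j) in enumerate(columns)
            x = -2.0 + (j - 1) * 4.0 / (n - 1)
            for i in 1:n
                y = -2.0 + (i - 1) * 4.0 / (n - 1)
                @inbounds img[i, k] = juliaset_pixel(x + im * y, c)
            end
        end
        img
    end
end

# serial baseline used in the timings below (also an assumption)
juliaset(x, y, n = 1000) = juliaset_columns(x + y * im, n, 1:n)
```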
To parallelize the computation, we split the columns into bands, spawn the computation of each band on a worker, and concatenate the fetched results:

```julia
function juliaset_spawn(x, y, n = 1000)
    # (opening lines reconstructed by analogy with `juliaset_pmap` below;
    # they are elided in this excerpt)
    c = x + y*im
    columns = Iterators.partition(1:n, div(n, nworkers()))
    r_bands = [@spawnat w juliaset_columns(c, n, cols) for (w, cols) in enumerate(columns)]
    slices = map(fetch, r_bands)
    reduce(hcat, slices)
end
```
we observe some speed-up over the serial version, but it is not linear in the number of workers
```julia
julia> @btime juliaset(-0.79, 0.15);
  38.699 ms (2 allocations: 976.70 KiB)

julia> @btime juliaset_spawn(-0.79, 0.15);
  21.521 ms (480 allocations: 1.93 MiB)
```
In the above example, we spawn one function call on each worker and collect the results. In essence, we are performing a `map` over bands. For this use case, Julia offers a parallel version of map, `pmap`. With it, our example can look like
```julia
function juliaset_pmap(x, y, n = 1000, np = nworkers())
    c = x + y*im
    columns = Iterators.partition(1:n, div(n, np))
    slices = pmap(cols -> juliaset_columns(c, n, cols), columns)
    reduce(hcat, slices)
end

julia> @btime juliaset_pmap(-0.79, 0.15);
  17.597 ms (451 allocations: 1.93 MiB)
```
which has slightly better timing than the version based on `@spawnat` and `fetch` (as explained below in the section about `Threads`, the parallel computation of the Julia set suffers from each pixel taking a different time to compute, which can be relieved by dividing the work into more parts, e.g. `@btime juliaset_pmap(-0.79, 0.15, 1000, 16);`).
## Synchronization / Communication primitives
The orchestration of a complicated computation might be difficult with relatively low-level remote calls. The producer / consumer paradigm is a synchronization pattern built around queues: consumers fetch work instructions from one queue and push results to another. Julia supports this paradigm with the `Channel` and `RemoteChannel` primitives. Importantly, putting to and taking from a queue are atomic operations, hence we do not have to take care of race conditions.
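The worker function used below is not shown in this excerpt; a sketch consistent with how it is called (the exact body is an assumption) would keep taking work items while any are queued:

```julia
@everywhere function juliaset_channel_worker(instructions, results)
    # process work items (c, n, cols) as long as some are queued
    while isready(instructions)
        c, n, cols = take!(instructions)
        put!(results, (cols, juliaset_columns(c, n, cols)))
    end
end
```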
```julia
function juliaset_channels(x, y, n = 1000, np = nworkers())
    # (opening lines reconstructed by analogy with the previous versions;
    # they are elided in this excerpt)
    c = x + y*im
    columns = Iterators.partition(1:n, div(n, np))
    instructions = RemoteChannel(() -> Channel{Tuple}(np))
    foreach(cols -> put!(instructions, (c, n, cols)), columns)
    results = RemoteChannel(() -> Channel{Tuple}(np))
    rfuns = [@spawnat i juliaset_channel_worker(instructions, results) for i in workers()]

    img = Array{UInt8,2}(undef, n, n)
    while isready(results)
        cols, impart = take!(results)
        img[:, cols] .= impart
    end
    img
end

julia> @btime juliaset_channels(-0.79, 0.15);
  254.151 μs (254 allocations: 987.09 KiB)
```
The execution time is much higher than what we have observed in the previous cases, and changing the number of workers does not help much. What went wrong? The reason is that setting up the infrastructure around remote channels is a costly process. Consider the following alternative, where (i) we let the workers run endlessly and (ii) the channel infrastructure is set up once and wrapped into an anonymous function
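The endless variant of the worker is again not shown in this excerpt; a sketch of the redefinition (an assumption consistent with the description above) blocks on `take!` until new work arrives:

```julia
@everywhere function juliaset_channel_worker(instructions, results)
    while true
        c, n, cols = take!(instructions)
        put!(results, (cols, juliaset_columns(c, n, cols)))
    end
end
```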
```julia
function juliaset_init(x, y, n = 1000, np = nworkers())
    # (opening lines reconstructed by analogy with the previous versions;
    # they are elided in this excerpt)
    c = x + y*im
    columns = Iterators.partition(1:n, div(n, np))
    instructions = RemoteChannel(() -> Channel{Tuple}(np))
    results = RemoteChannel(() -> Channel{Tuple}(np))
    rfuns = [@spawnat p juliaset_channel_worker(instructions, results) for p in workers()]
    () -> begin   # the returned closure performs one computation of the fractal
        img = Array{UInt8,2}(undef, n, n)
        foreach(cols -> put!(instructions, (c, n, cols)), columns)
        for i in 1:np
            cols, impart = take!(results)
            img[:, cols] .= impart
        end
        img
    end
end

t = juliaset_init(-0.79, 0.15)

julia> @btime t();
  17.697 ms (776 allocations: 1.94 MiB)
```
with which we obtain comparable speed.
Instead of `@spawnat`, we can also use `remote_do`, as in `foreach(p -> remote_do(juliaset_channel_worker, p, instructions, results), workers())`, which executes the function `juliaset_channel_worker` on worker `p` with the arguments `instructions` and `results`, but does not return a handle for receiving the future result.
A few further points are worth keeping in mind when working with channels:
- Channels and their guarantees: `put!` and `take!` are atomic, so several workers may safely share one channel.
- How to orchestrate workers by channels (as demonstrated above).
- How to stop a remote worker through a channel, for example by closing the channel it takes work from.

We can send the indices of columns that should be calculated to the remote processes over one queue, from which they pick them up and send the results back over the results channel.
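Regarding stopping the workers: a minimal sketch (assuming the `instructions` channel from `juliaset_init` is kept accessible, e.g. also returned by it) is to simply close the channel; the blocked `take!` inside the endless worker loop then throws and the worker task terminates.

```julia
# closing the work queue ends the endlessly running workers:
# their blocked take!(instructions) throws once the channel is closed and drained
close(instructions)
```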
## Tooling

- How to set up workers:
  - how to load functions and modules on them
  - `julia -p 16 -L load_my_script.jl`
- How to send data / how to define a variable on a remote process
## Sending data
- Do not send `randn(1000, 1000)`
- Serialization is very time consuming; an efficient conversion to something simple might be worth it.
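A minimal sketch illustrating both points (the function choice is arbitrary): instead of serializing a large matrix from the main process, send only the instruction and generate the data directly on the worker.

```julia
# expensive: randn(1000, 1000) is created on the main process and serialized to worker 2
r_bad  = remotecall(sum, 2, randn(1000, 1000))

# cheaper: only the (tiny) closure is sent; the matrix is generated on worker 2
r_good = @spawnat 2 sum(randn(1000, 1000))

fetch(r_good)
```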