docs/src/lecture_10/lecture.md
Lines changed: 73 additions & 21 deletions
@@ -189,21 +189,24 @@ end
189
189
julia> @btime juliaset_channels(-0.79, 0.15);
190
190
254.151 μs (254 allocations: 987.09 KiB)
191
191
```
192
-
The execution tim is much higher then the we have observed in the previous cases and changing the number of workers do not help much. What went wrong? The reason is that setting up the infrastructure around remote channels is a costly process. Consider the following alternative, where (i) we let workers to run end-lessly and (ii) the channel infrastructure is set-up once and wrapped into an anonymous function
192
+
The execution time is much higher than what we have observed in the previous cases, and changing the number of workers does not help much. What went wrong? The reason is that setting up the infrastructure around remote channels is a costly process. Consider the following alternative, where (i) we let workers run endlessly and (ii) the channel infrastructure is set up once and wrapped into an anonymous function
@@ -220,29 +223,72 @@ t = juliaset_init(-0.79, 0.15)
220
223
julia> @btime t();
221
224
17.697 ms (776 allocations: 1.94 MiB)
222
225
```
223
-
with which we obtain the comparable speed.
224
-
Instead of `@spawnat` we can also use `remote_do` as foreach`(p -> remote_do(juliaset_channel_worker, p, instructions, results), workers)`, which executes the function `juliaset_channel_worker` at worker `p` with parameters `instructions` and `results` but does not return handle to receive the future results.
225
-
226
-
- Channels and their guarantees
227
-
- How to orchestrate workers by channels
228
-
- how to kill the remote process with channel
229
-
230
-
I can send indices of columns that should be calculated on remote processes over the queue, from which they can pick it up and send it back over the channel.
231
-
232
-
226
+
with which we obtain a speed comparable to the `pmap` approach.
227
+
!!! info
228
+
### `remote_do` vs `remotecall`
229
+
Instead of `@spawnat` (`remotecall`) we can also use `remote_do`, as in `foreach(p -> remote_do(juliaset_channel_worker, p, instructions, results), workers)`, which executes the function `juliaset_channel_worker` at worker `p` with parameters `instructions` and `results` but does not return a `Future` handle with which to receive the results.
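The difference between the two calls can be seen in a minimal sketch. It is not taken from the lecture code: to stay runnable without extra workers it targets the current process (`myid()`), and the `done` channel and the summing closure are hypothetical names used only for illustration.

```julia
using Distributed

# `remotecall` returns a `Future`; the result is obtained with `fetch`.
fut = remotecall(+, myid(), 1, 2)
sum_via_future = fetch(fut)

# `remote_do` returns `nothing`, so the result has to travel back some
# other way, here through a channel (hypothetical helper `done`).
done = Channel{Int}(1)
ret = remote_do((a, b) -> put!(done, a + b), myid(), 1, 2)
sum_via_channel = take!(done)
```

With `remote_do` there is no handle to wait on, which is why it pairs naturally with channels that carry the results.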
233
230
231
+
!!! info
232
+
### `Channel` and `RemoteChannel`
233
+
`AbstractChannel` has to implement the interface `put!`, `take!`, `fetch`, `isready` and `wait`, i.e. it should behave like a queue. `Channel` is an implementation of an `AbstractChannel` that facilitates communication within a single process (for the purpose of multithreading and task switching). A channel can be easily created by `Channel{T}(capacity)`, where the capacity can be infinite. The storage of a channel can be seen in the `data` field, but direct access will of course break all guarantees, such as the atomicity of `take!` and `put!`. For communication between processes, the `<:AbstractChannel` has to be wrapped in a `RemoteChannel`. The constructor `RemoteChannel(f::Function, pid::Integer=myid())` takes as its first argument a function (without arguments) which constructs the `Channel` (or something like that) on the remote machine identified by `pid`, and it returns the `RemoteChannel`. The storage thus resides on the machine specified by `pid`, and the handle provided by the `RemoteChannel` can be freely passed to any process. (For the curious, the `ProcessGroup` `Distributed.PGRP` contains information about channels on machines.)
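The queue-like interface described above can be tried out in a short sketch. It assumes a single process only, so the `RemoteChannel` is created on `myid()`; the variable names are illustrative and not part of the lecture code.

```julia
using Distributed

# A buffered local channel that holds up to 4 Ints.
ch = Channel{Int}(4)
put!(ch, 1)
put!(ch, 2)
first_item = take!(ch)     # removes and returns the oldest item
peeked = fetch(ch)         # returns the next item without removing it
still_ready = isready(ch)  # an item is still buffered

# The same kind of channel wrapped in a RemoteChannel. The function passed
# to the constructor builds the storage on the process given by the pid.
rc = RemoteChannel(() -> Channel{Int}(4), myid())
put!(rc, 42)
remote_item = take!(rc)
```

With real workers, the second argument would be the id of the process that should own the storage, and `rc` could be passed as an argument to functions running on any worker.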
234
234
235
-
## tooling
236
-
- how to set up workers,
237
-
+ how to load functions, modules
238
-
+ julia -p 16 -L load_my_script.jl
239
-
- how to send data / how to define variable on remote process
235
+
In the above example, `juliaset_channel_worker` defined as
put!(results, (cols, juliaset_columns(c, n, cols)))
241
+
end
242
+
end
243
+
```
244
+
runs forever due to the `while true` loop. To stop the computation, we usually extend the type accepted by the `instructions` channel so that it also accepts a stopping token (e.g. `:stop`), and the worker leaves the loop when it receives it.
put!(results, (cols, juliaset_columns(c, n, cols)))
252
+
end
253
+
put!(results, :stop)
254
+
end
255
+
```
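The stop-token idea can also be tried locally. The following is a minimal single-process sketch, not the lecture's worker: it uses plain `Channel`s and a task instead of remote workers, and `square_all` and `stoppable_worker` are hypothetical stand-ins for the real column computation and worker function.

```julia
# Hypothetical stand-in for the real per-column computation.
square_all(cols) = [c^2 for c in cols]

function stoppable_worker(instructions, results)
    while true
        job = take!(instructions)
        job === :stop && break            # leave the loop on the stop token
        put!(results, (job, square_all(job)))
    end
    put!(results, :stop)                  # tell the consumer we are done
end

# The instruction channel accepts either a job or the stop token.
instructions = Channel{Union{Symbol, UnitRange{Int}}}(8)
results = Channel{Any}(8)
worker_task = @async stoppable_worker(instructions, results)

put!(instructions, 1:3)
put!(instructions, :stop)

first_result = take!(results)
last_result = take!(results)
wait(worker_task)
```

Forwarding the token to `results` lets whoever collects the results know that no more items will arrive from this worker.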
256
+
By default, Julia does not provide any facility to kill a remote execution other than sending `ctrl-c` to the remote worker with `interrupt(pids::Integer...)`.
240
257
241
258
## Sending data
242
259
- Do not send `randn(1000, 1000)`
260
+
- Sending references and ObjectID would not work
243
261
- Serialization is very time consuming; an efficient conversion to something simple might be worth it
244
262
- `Dict("a" => [1,2,3], "b" => [2,3,4,5])` -> (array of elements, array of bounds, keys)
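The last bullet can be illustrated with a small sketch. The helper names `flatten_dict` and `unflatten_dict` are hypothetical, and the layout (one flat array of elements, the end index of each key's slice, and the keys in a fixed order) is only one possible reading of "array of elements, array of bounds, keys".

```julia
# Flatten a Dict of vectors into three plain arrays that are cheap to serialize.
function flatten_dict(d::Dict{String, Vector{Int}})
    ks = sort!(collect(keys(d)))                      # fix an order of keys
    elements = reduce(vcat, (d[k] for k in ks); init = Int[])
    bounds = cumsum([length(d[k]) for k in ks])       # end index of each slice
    return elements, bounds, ks
end

# Rebuild the Dict on the receiving side.
function unflatten_dict(elements, bounds, ks)
    starts = [1; bounds[1:end-1] .+ 1]
    return Dict(k => elements[s:e] for (k, s, e) in zip(ks, starts, bounds))
end

d = Dict("a" => [1, 2, 3], "b" => [2, 3, 4, 5])
elements, bounds, ks = flatten_dict(d)
restored = unflatten_dict(elements, bounds, ks)
```

Whether such a conversion pays off depends on the data; it is worth measuring against plain serialization of the original structure.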
245
263
264
+
## Practical advice
265
+
Recall that (i) workers are started as clean processes and (ii) they might not share the same environment with the main process. The latter is due to the fact that files describing the environment (`Project.toml` and `Manifest.toml`) might not be available on remote machines.
266
+
We recommend:
267
+
- to have a shared directory (shared home) with the code and to share the location of packages
268
+
- to place all code for workers into one file, let's call it `worker.jl` (the author of this text includes the code for the master as well).
269
+
- to put, at the beginning of `worker.jl`, code activating the specified environment as
270
+
```julia
271
+
using Pkg
272
+
Pkg.activate(@__DIR__)
273
+
```
274
+
and optionally
275
+
```julia
276
+
Pkg.resolve()
277
+
Pkg.instantiate()
278
+
```
279
+
- to run julia as
280
+
```bash
281
+
julia -p ?? -L worker.jl main.jl
282
+
```
283
+
where `main.jl` is the script to be executed on the main node. Or
284
+
```bash
285
+
julia -p ?? -L worker.jl -e "main()"
286
+
```
287
+
where `main()` is the function defined in `worker.jl` to be executed on the main node.
288
+
289
+
A complete example can be seen in [`juliaset_p.jl`](juliaset_p.jl).
When deciding what kind of parallelism to employ, consider the following:
443
+
- for tightly coupled computation over shared data, multithreading is more suitable, since data are not shared between processes
444
+
- but if the computation requires frequent allocation and freeing of memory, or IO, separate processes are more suitable, since garbage collectors are independent between processes
445
+
- `Transducers` strives for (almost) the same code to support thread- and process-based parallelism.