Skip to content

Commit 3b57a49

Browse files
exaexavchuravy
andauthored
avoid using @sync_add on remotecalls (JuliaLang/julia#44671)
* avoid using `@sync_add` on remotecalls It seems like @sync_add adds the Futures to a queue (Channel) for @sync, which in turn calls wait() for all the futures synchronously. Not only that is slightly detrimental for network operations (latencies add up), but in case of Distributed the call to wait() may actually cause some compilation on remote processes, which is also wait()ed for. In result, some operations took a great amount of "serial" processing time if executed on many workers at once. For me, this closes JuliaLang/julia#44645. The major change can be illustrated as follows: First add some workers: ``` using Distributed addprocs(10) ``` and then trigger something that, for example, causes package imports on the workers: ``` using SomeTinyPackage ``` In my case (importing UnicodePlots on 10 workers), this improves the loading time over 10 workers from ~11s to ~5.5s. This is a far bigger issue when worker count gets high. The time of the processing on each worker is usually around 0.3s, so triggering this problem even on a relatively small cluster (64 workers) causes a really annoying delay, and running `@everywhere` for the first time on reasonable clusters (I tested with 1024 workers, see JuliaLang/julia#44645) usually takes more than 5 minutes. Which sucks. Anyway, on 64 workers this reduces the "first import" time from ~30s to ~6s, and on 1024 workers this seems to reduce the time from over 5 minutes (I didn't bother to measure that precisely now, sorry) to ~11s. Related issues: - Probably fixes JuliaLang/julia#39291. - JuliaLang/julia#42156 is a kinda complementary -- it removes the most painful source of slowness (the 0.3s precompilation on the workers), but the fact that the wait()ing is serial remains a problem if the network latencies are high. May help with JuliaLang/julia#38931 Co-authored-by: Valentin Churavy <[email protected]>
1 parent 32e4dd1 commit 3b57a49

File tree

3 files changed

+5
-5
lines changed

3 files changed

+5
-5
lines changed

src/Distributed.jl

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@ import Base: getindex, wait, put!, take!, fetch, isready, push!, length,
1010
hash, ==, kill, close, isopen, showerror
1111

1212
# imports for use
13-
using Base: Process, Semaphore, JLOptions, buffer_writes, @sync_add,
13+
using Base: Process, Semaphore, JLOptions, buffer_writes, @async_unwrap,
1414
VERSION_STRING, binding_module, atexit, julia_exename,
1515
julia_cmd, AsyncGenerator, acquire, release, invokelatest,
1616
shell_escape_posixly, shell_escape_csh,
@@ -76,7 +76,7 @@ function _require_callback(mod::Base.PkgId)
7676
# broadcast top-level (e.g. from Main) import/using from node 1 (only)
7777
@sync for p in procs()
7878
p == 1 && continue
79-
@sync_add remotecall(p) do
79+
@async_unwrap remotecall_wait(p) do
8080
Base.require(mod)
8181
nothing
8282
end

src/clusterserialize.jl

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -243,7 +243,7 @@ An exception is raised if a global constant is requested to be cleared.
243243
"""
244244
function clear!(syms, pids=workers(); mod=Main)
245245
@sync for p in pids
246-
@sync_add remotecall(clear_impl!, p, syms, mod)
246+
@async_unwrap remotecall_wait(clear_impl!, p, syms, mod)
247247
end
248248
end
249249
clear!(sym::Symbol, pid::Int; mod=Main) = clear!([sym], [pid]; mod=mod)

src/macros.jl

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -222,10 +222,10 @@ function remotecall_eval(m::Module, procs, ex)
222222
if pid == myid()
223223
run_locally += 1
224224
else
225-
@sync_add remotecall(Core.eval, pid, m, ex)
225+
@async_unwrap remotecall_wait(Core.eval, pid, m, ex)
226226
end
227227
end
228-
yield() # ensure that the remotecall_fetch have had a chance to start
228+
yield() # ensure that the remotecalls have had a chance to start
229229

230230
# execute locally last as we do not want local execution to block serialization
231231
# of the request to remote nodes.

0 commit comments

Comments
 (0)