Commit 5ae9eb8

Change UCX environment variables in __init__, document in knownissues.md (#370)
While the RPATH issue is now fixed in UCX 1.7.0 and later, it appears that the memory cache still does not work correctly. Switching to `UCX_MEMTYPE_CACHE=no` produces fewer warning messages.
1 parent 3fa9be4 commit 5ae9eb8

3 files changed, 46 additions and 7 deletions

docs/make.jl

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ makedocs(
         "index.md",
         "installation.md",
         "usage.md",
+        "knownissues.md",
         "Examples" => EXAMPLES,
         "Reference" => [
             "library.md",

docs/src/knownissues.md

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Known issues
+
+## UCX
+
+[UCX](https://www.openucx.org/) is a communication framework used by several MPI implementations.
+
+### Memory cache
+
+When used with CUDA, UCX intercepts `cudaMalloc` so it can determine whether the pointer passed to MPI is on the host (main memory) or the device (GPU). Unfortunately, there are several known issues with how this works with Julia:
+- https://github.com/openucx/ucx/issues/5061
+- https://github.com/openucx/ucx/issues/4001 (fixed in UCX v1.7.0)
+
+By default, MPI.jl disables this by setting
+```
+ENV["UCX_MEMTYPE_CACHE"] = "no"
+```
+at `__init__` which may result in reduced performance, especially for smaller messages.
+
+### Multi-threading and signal handling
+
+When using [Julia multi-threading](https://docs.julialang.org/en/v1/manual/parallel-computing/#man-multithreading-1), the Julia garbage collector internally [uses `SIGSEGV` to synchronize threads](https://docs.julialang.org/en/v1/devdocs/debuggingtips/#Dealing-with-signals-1).
+
+By default, UCX will error if this signal is raised ([#337](https://github.com/JuliaParallel/MPI.jl/issues/337)), resulting in a message such as:
+```
+Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xXXXXXXXX)
+```
+
+This signal interception can be controlled by setting the environment variable `UCX_ERROR_SIGNALS`: if not already defined, MPI.jl will set it as:
+```
+ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"
+```
+at `__init__`. If set externally, it should be modified to exclude `SIGSEGV` from the list.
+
+## Microsoft MPI
+
+### Custom operators on 32-bit Windows
+
+It is not possible to use [custom operators with 32-bit Microsoft MPI](https://github.com/JuliaParallel/MPI.jl/issues/246), as it uses the `stdcall` calling convention, which is not supported by [Julia's C-compatible function pointers](https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/index.html#Creating-C-Compatible-Julia-Function-Pointers-1).
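
The signal-handling note in `knownissues.md` says that an externally set `UCX_ERROR_SIGNALS` should be edited to exclude `SIGSEGV`. One way a user might do that from Julia is sketched below. This is only an illustration: `without_sigsegv` is a hypothetical helper, not part of MPI.jl or UCX, and the sample input string is invented.

```julia
# Hypothetical helper (not part of MPI.jl): remove SIGSEGV from a
# comma-separated signal list such as the value of UCX_ERROR_SIGNALS.
function without_sigsegv(signals::AbstractString)
    names = strip.(split(signals, ','))
    kept = filter(name -> !isempty(name) && name != "SIGSEGV", names)
    return join(kept, ",")
end

# Invented example value that still lists SIGSEGV.
cleaned = without_sigsegv("SIGILL, SIGSEGV,SIGBUS,SIGFPE")
println(cleaned)

# Touch the real environment only when the variable was set externally.
# Assumption: this runs before MPI is initialized, so UCX sees the new value.
if haskey(ENV, "UCX_ERROR_SIGNALS")
    ENV["UCX_ERROR_SIGNALS"] = without_sigsegv(ENV["UCX_ERROR_SIGNALS"])
end
```

When the variable is not defined at all, nothing needs to be done, since MPI.jl then applies its own default as described in the note.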

src/MPI.jl

Lines changed: 7 additions & 7 deletions
@@ -77,14 +77,14 @@ function __init__()
         # though that would probably trigger a race condition
         error("MPI library has changed, please restart Julia")
     end
-
-    # disable UCX memory hooks since it can mess up dlopen
-    # https://github.com/openucx/ucx/issues/4001
-    ENV["UCX_MEM_MMAP_RELOC"] = "no"
-    ENV["UCX_MEM_MALLOC_HOOKS"] = "no"
-    ENV["UCX_MEM_MALLOC_RELOC"] = "no"
-    ENV["UCX_MEM_EVENTS"] = "no"
 
+
+    # disable UCX memory cache, since it doesn't work correctly
+    # https://github.com/openucx/ucx/issues/5061
+    if !haskey(ENV, "UCX_MEMTYPE_CACHE")
+        ENV["UCX_MEMTYPE_CACHE"] = "no"
+    end
+
     # Julia multithreading uses SIGSEGV to sync thread
     # https://docs.julialang.org/en/v1/devdocs/debuggingtips/#Dealing-with-signals-1
     # By default, UCX will error if this occurs (issue #337)
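
The `haskey` guard added in this hunk means that a `UCX_MEMTYPE_CACHE` value already present in the environment takes precedence over the MPI.jl default. A minimal sketch of that behaviour follows, using plain `Dict`s as stand-ins for `ENV` so the real environment is not modified. `set_default!` is a hypothetical name, not an MPI.jl function, and the `"yes"` value is purely illustrative.

```julia
# Hypothetical helper mirroring the guard in the diff: assign `value`
# only when `key` is not already present, then return the stored value.
function set_default!(env::AbstractDict, key, value)
    if !haskey(env, key)
        env[key] = value
    end
    return env[key]
end

# Stand-in for ENV with no prior setting: the default is applied.
fresh = Dict{String,String}()
set_default!(fresh, "UCX_MEMTYPE_CACHE", "no")

# Stand-in for ENV where the user already chose a value: it is left alone.
preset = Dict("UCX_MEMTYPE_CACHE" => "yes")
set_default!(preset, "UCX_MEMTYPE_CACHE", "no")

println(fresh["UCX_MEMTYPE_CACHE"], " ", preset["UCX_MEMTYPE_CACHE"])
```

By contrast, the removed lines assigned their `UCX_MEM_*` variables unconditionally, overwriting anything set externally.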
