
Commit 9af96c6: Merge branch 'master' into sb/mpipref
2 parents: 147deb0 + 477a406
16 files changed: +195 -132 lines

.github/workflows/Documenter.yml

Lines changed: 6 additions & 0 deletions
@@ -5,7 +5,13 @@ on:
     branches:
       - master
     tags: '*'
+    paths:
+      - 'docs/**'
+      - 'src/**'
   pull_request:
+    paths:
+      - 'docs/**'
+      - 'src/**'
 
 jobs:
   docs-build:

.github/workflows/ShellCheck.yml

Lines changed: 4 additions & 0 deletions
@@ -5,7 +5,11 @@ on:
     branches:
       - master
     tags: '*'
+    paths:
+      - 'bin/**'
   pull_request:
+    paths:
+      - 'bin/**'
 
 jobs:
   shellcheck:

.github/workflows/UnitTests.yml

Lines changed: 11 additions & 2 deletions
@@ -1,11 +1,20 @@
-name: Unit Tests
+name: MPI.jl Unit Tests
 
 on:
   pull_request:
-
+    paths:
+      - 'bin/**'
+      - 'deps/**'
+      - 'src/**'
+      - 'test/**'
   push:
     branches:
       - master
+    paths:
+      - 'bin/**'
+      - 'deps/**'
+      - 'src/**'
+      - 'test/**'
 
 jobs:
   test-default:

docs/make.jl

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,12 @@ for (example_title, example_md) in EXAMPLES
     example_jl = example_md[1:end-2]*"jl"
     @info "Building $example_md"
     open(joinpath(@__DIR__, "src", example_md), "w") do mdfile
+        println(mdfile, """
+        ```@meta
+        EditURL = "https://github.com/JuliaParallel/MPI.jl/blob/master/docs/$(example_jl)"
+        ```
+        """
+        )
         println(mdfile, "# $example_title")
         println(mdfile)
         println(mdfile, "```julia")
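For reference, a minimal Julia sketch of what the added header lines compute for a single generated page; the file name `01-hello.md` is hypothetical and chosen only for illustration:

```julia
# Sketch only: reproduce the EditURL computation from docs/make.jl above
# for one hypothetical generated page, "01-hello.md".
example_md = "01-hello.md"
example_jl = example_md[1:end-2] * "jl"   # -> "01-hello.jl"
editurl = "https://github.com/JuliaParallel/MPI.jl/blob/master/docs/$(example_jl)"
println(editurl)
```

Documenter.jl picks up the `EditURL` value from the generated page's `@meta` block, so the page's edit link points at the source script `docs/$(example_jl)` rather than at the generated Markdown file.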

docs/src/configuration.md

Lines changed: 40 additions & 34 deletions
@@ -6,13 +6,52 @@ By default, MPI.jl will download and link against the following MPI implementati
 
 This is suitable for most single-node use cases, but for larger systems, such as HPC
 clusters or multi-GPU machines, you will probably want to configure against a
-system-provided MPI implementation in order to exploit feature such as fast network
+system-provided MPI implementation in order to exploit features such as fast network
 interfaces and CUDA-aware MPI interfaces.
 
 MPI.jl will attempt to detect when you are running on an HPC cluster, and warn the user
 about this. To disable this warning, set the environment variable
 `JULIA_MPI_CLUSTER_WARN=n`.
 
+## Julia wrapper for `mpiexec`
+
+Since you can configure `MPI.jl` to use one of several MPI implementations, you
+may have different Julia projects using different implementations. Thus, it may
+be cumbersome to find out which `mpiexec` executable is associated with a specific
+project. To make this easy, on Unix-based systems `MPI.jl` comes with a thin
+project-aware wrapper around `mpiexec`, called `mpiexecjl`.
+
+### Installation
+
+You can install `mpiexecjl` with [`MPI.install_mpiexecjl()`](@ref). The default
+destination directory is `joinpath(DEPOT_PATH[1], "bin")`, which usually
+translates to `~/.julia/bin`, but check the value on your system. You can also
+tell `MPI.install_mpiexecjl` to install to a different directory.
+
+```sh
+$ julia
+julia> using MPI
+julia> MPI.install_mpiexecjl()
+```
+
+To quickly call this wrapper, we recommend adding the destination directory
+to your [`PATH`](https://en.wikipedia.org/wiki/PATH_(variable)) environment
+variable.
+
+### Usage
+
+`mpiexecjl` has the same syntax as the `mpiexec` binary that will be called, but
+it additionally takes a `--project` option to call the specific binary associated
+with the `MPI.jl` version in the given project. If no `--project` flag is used,
+the `MPI.jl` in the global Julia environment will be used instead.
+
+After installing `mpiexecjl` and adding its directory to `PATH`, you can run it
+with:
+
+```sh
+$ mpiexecjl --project=/path/to/project -n 20 julia script.jl
+```
+
 ## Using a system-provided MPI
 
 ### Requirements
@@ -74,36 +113,3 @@ The test suite can also be modified by the following variables:
 - `JULIA_MPIEXEC_TEST_ARGS`: Additional arguments to be passed to the MPI launcher for the tests only.
 - `JULIA_MPI_TEST_ARRAYTYPE`: Set to `CuArray` to test the CUDA-aware interface with
   [`CUDA.CuArray`](https://github.com/JuliaGPU/CUDA.jl) buffers.
-
-## Julia wrapper for `mpiexec`
-
-Since you can configure `MPI.jl` to use one of several MPI implementations, you
-may have different Julia projects using different implementation. Thus, it may
-be cumbersome to find out which `mpiexec` executable is associated to a specific
-project. To make this easy, on Unix-based systems `MPI.jl` comes with a thin
-project-aware wrapper around `mpiexec`, called `mpiexecjl`.
-
-### Installation
-
-You can install `mpiexecjl` with [`MPI.install_mpiexecjl()`](@ref). The default
-destination directory is `joinpath(DEPOT_PATH[1], "bin")`, which usually
-translates to `~/.julia/bin`, but check the value on your system. You can also
-tell `MPI.install_mpiexecjl` to install to a different directory.
-
-To quickly call this wrapper we recommend you to add the destination directory
-to your [`PATH`](https://en.wikipedia.org/wiki/PATH_(variable)) environment
-variable.
-
-### Usage
-
-`mpiexecjl` has the same syntax as the `mpiexec` binary that will be called, but
-it takes in addition a `--project` option to call the specific binary associated
-to the `MPI.jl` version in the given project. If no `--project` flag is used,
-the `MPI.jl` in the global Julia environment will be used instead.
-
-After installing `mpiexecjl` and adding its directory to `PATH`, you can run it
-with:
-
-```
-$ mpiexecjl --project=/path/to/project -n 20 julia script.jl
-```
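As a usage illustration of the relocated section above, here is a minimal sketch of installing the wrapper into a custom directory; the `destdir` keyword and the directory path are assumptions for illustration, not part of this diff:

```julia
# Sketch only: install mpiexecjl into a custom directory.
# The `destdir` keyword and "/home/user/bin" are illustrative assumptions;
# check the docstring of MPI.install_mpiexecjl on your MPI.jl version.
using MPI
MPI.install_mpiexecjl(destdir = "/home/user/bin")
```

With that directory on `PATH`, a project-specific launch then looks like the `sh` example above, e.g. `mpiexecjl --project=/path/to/project -n 20 julia script.jl`.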

docs/src/knownissues.md

Lines changed: 44 additions & 1 deletion
@@ -65,8 +65,51 @@ ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"
 ```
 at `__init__`. If set externally, it should be modified to exclude `SIGSEGV` from the list.
 
+## CUDA-aware MPI
+
+### Memory pool
+
+Using CUDA-aware MPI on multi-GPU nodes with recent CUDA.jl may trigger (see [here](https://github.com/JuliaGPU/CUDA.jl/issues/1053#issue-946826096))
+```
+The call to cuIpcGetMemHandle failed. This means the GPU RDMA protocol
+cannot be used.
+cuIpcGetMemHandle return value: 1
+```
+in the MPI layer, or fail with a segmentation fault (see [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060)) with
+```
+[1642930332.032032] [gcn19:4087661:0] gdr_copy_md.c:122 UCX ERROR gdr_pin_buffer failed. length :65536 ret:22
+```
+This is due to the MPI implementation using legacy `cuIpc*` APIs, which are incompatible with the stream-ordered allocator that is now the default in CUDA.jl; see [UCX issue #7110](https://github.com/openucx/ucx/issues/7110).
+
+To circumvent this, make sure the CUDA memory pool is set to `none`:
+```
+export JULIA_CUDA_MEMORY_POOL=none
+```
+_More about CUDA.jl [memory environment-variables](https://cuda.juliagpu.org/stable/usage/memory/#Memory-pool)._
+
+### Hints to ensure CUDA-aware MPI is functional
+
+Make sure to:
+- Have the MPI and CUDA that were used to build the CUDA-aware MPI on your path (or their modules loaded)
+- Set:
+```
+export JULIA_CUDA_MEMORY_POOL=none
+export JULIA_MPI_BINARY=system
+export JULIA_CUDA_USE_BINARYBUILDER=false
+```
+- Add the CUDA and MPI packages in Julia. Build MPI.jl in verbose mode to check whether the correct versions are built/used:
+```
+julia -e 'using Pkg; pkg"add CUDA"; pkg"add MPI"; Pkg.build("MPI"; verbose=true)'
+```
+- Then, in Julia, after loading the MPI and CUDA modules, check
+  - the CUDA version: `CUDA.versioninfo()`
+  - whether MPI has CUDA support: `MPI.has_cuda()`
+  - whether you are using the correct MPI implementation: `MPI.identify_implementation()`
+
+After that, it may be preferable to run the Julia MPI script (as suggested [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/11)) by launching it from a shell script (as suggested [here](https://discourse.julialang.org/t/cuda-aware-mpi-works-on-system-but-not-for-julia/75060/4)).
+
 ## Microsoft MPI
 
 ### Custom operators on 32-bit Windows
 
-It is not possible to use [custom operators with 32-bit Microsoft MPI](https://github.com/JuliaParallel/MPI.jl/issues/246), as it uses the `stdcall` calling convention, which is not supported by [Julia's C-compatible function pointers](https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/index.html#Creating-C-Compatible-Julia-Function-Pointers-1).
+It is not possible to use [custom operators with 32-bit Microsoft MPI](https://github.com/JuliaParallel/MPI.jl/issues/246), as it uses the `stdcall` calling convention, which is not supported by [Julia's C-compatible function pointers](https://docs.julialang.org/en/v1/manual/calling-c-and-fortran-code/index.html#Creating-C-Compatible-Julia-Function-Pointers-1).
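To go with the checklist added above, a minimal sketch of the in-Julia checks (assuming CUDA.jl and a system CUDA-aware MPI build are already configured as described):

```julia
# Sketch only: verify the CUDA-aware MPI setup described above.
using MPI, CUDA

MPI.Init()
CUDA.versioninfo()                    # which CUDA toolkit/driver is in use
@show MPI.has_cuda()                  # true if the MPI build is CUDA-aware
@show MPI.identify_implementation()   # which MPI implementation was picked up
MPI.Finalize()
```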

docs/src/topology.md

Lines changed: 2 additions & 3 deletions
@@ -1,13 +1,12 @@
 # Topology
 
 ```@docs
-MPI.Cart_coords
-MPI.Cart_coords!
+MPI.Dims_create
 MPI.Cart_create
 MPI.Cart_get
+MPI.Cart_coords
 MPI.Cart_rank
 MPI.Cart_shift
 MPI.Cart_sub
 MPI.Cartdim_get
-MPI.Dims_create!
 ```

src/deprecated.jl

Lines changed: 11 additions & 0 deletions
@@ -224,3 +224,14 @@ import Base: @deprecate
 @deprecate(recv(source::Integer, tag::Integer, comm::Comm), recv(comm, MPI.Status; source=source, tag=tag), false)
 @deprecate(Sendrecv!(sendbuf, dest::Integer, sendtag::Integer, recvbuf, source::Integer, recvtag::Integer, comm::Comm),
            Sendrecv!(sendbuf, recvbuf, comm, MPI.Status; dest=dest, sendtag=sendtag, source=source, recvtag=recvtag)[2], false)
+
+@deprecate(Cart_create(comm_old::Comm, ndims::Integer, dims::MPIBuffertype{Cint}, periods::MPIBuffertype{Cint}, reorder),
+           Cart_create(comm_old, dims; periodic=periods, reorder=reorder), false)
+@deprecate(Cart_create(comm_old::Comm, dims::AbstractArray{T}, periods::Array{T}, reorder) where {T <: Integer},
+           Cart_create(comm_old, dims; periodic=periods, reorder=reorder), false)
+@deprecate(Dims_create!(nnodes::Integer, ndims::Integer, dims::MPIBuffertype{T}) where {T<:Integer},
+           dims .= Dims_create(nnodes, dims), false)
+@deprecate(Dims_create!(nnodes::Integer, dims::AbstractArray{T,N}) where {T<:Integer, N},
+           dims .= Dims_create(nnodes, dims), false)
+@deprecate(Cart_coords!(comm::Comm, rank::Integer, coords::MPIBuffertype{Cint}),
+           coords .= Cart_coords(comm, rank), false)
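The new deprecations map the old mutating topology API onto keyword-based, value-returning methods. A minimal sketch of the replacement calls they forward to follows; the 12-rank 2D grid, the periodicity flags, and the assumption that the script runs under e.g. `mpiexecjl -n 12 julia script.jl` are illustrative, not part of this diff:

```julia
# Sketch only: the non-mutating calls that the deprecations above point to.
using MPI
MPI.Init()

dims = MPI.Dims_create(12, [0, 0])          # replaces Dims_create!(12, dims)
comm_cart = MPI.Cart_create(MPI.COMM_WORLD, dims;
                            periodic = [1, 0],  # old `periods`-style flags, per the mapping above
                            reorder = false)
rank = MPI.Comm_rank(comm_cart)
coords = MPI.Cart_coords(comm_cart, rank)   # replaces Cart_coords!(comm, rank, coords)

MPI.Finalize()
```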
