Merged
Changes from 5 commits
13 changes: 11 additions & 2 deletions docs/software/communication/cray-mpich.md
@@ -79,12 +79,21 @@ Cray MPICH may sometimes hang on larger runs.

Performance may be negatively affected by this option.

#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication

This error message is sometimes seen in applications that use GPU Direct MPI calls and hit a bug in gdrcopy (a low-level library used to copy buffers between GPUs).
Setting the following option disables gdrcopy completely.
Note that this has a performance impact for small message sizes, so the option should only be set on a case-by-case basis.
```bash
export FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=0
```
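
Because the option costs performance for small messages, one way to keep it case-by-case is to set it for a single launch instead of exporting it for the whole session. The following is a minimal sketch, assuming a bash shell; the helper name `run_without_gdrcopy` and the application name in the comment are hypothetical placeholders.

```bash
# Hypothetical helper: run one command with gdrcopy disabled.
# The variable is set only in the environment of that command,
# so later launches from the same shell are unaffected.
run_without_gdrcopy() {
    FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=0 "$@"
}

# Example usage (placeholder application name):
# run_without_gdrcopy srun ./my_gpu_app
```

Scoping the variable this way means that only the runs that actually hit the error pay the small-message penalty.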

### Resolved issues

#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication

??? info "The issue has been resolved on the 7th of October 2024 with a system update"
The issue was caused by a system misconfiguration.
??? info "The issue has been resolved on the 7th of October 2024 with a system
update" The issue was caused by a system misconfiguration.

When doing inter-node GPU-aware communication with Cray MPICH after the update on the 30th of September 2024 on Alps, applications will fail with:
```bash