diff --git a/docs/software/communication/cray-mpich.md b/docs/software/communication/cray-mpich.md
index e06880fd..be3adb8e 100644
--- a/docs/software/communication/cray-mpich.md
+++ b/docs/software/communication/cray-mpich.md
@@ -79,6 +79,15 @@ Cray MPICH may sometimes hang on larger runs.
 
 Performance may be negatively affected by this option.
 
+#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication
+
+Applications that make GPU-aware MPI calls can sometimes hit a bug in gdrcopy (a low-level library for fast copies between host and GPU memory), which produces this error message.
+Setting the following environment variable disables the use of gdrcopy entirely:
+Because this degrades performance for small messages, set it only for applications that actually hit this error, not as a default.
+```bash
+export FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=0
+```
+
 ### Resolved issues
 
 #### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication