#### Slow intra-node host communication with Cray MPICH

Cray MPICH can perform poorly for intra-node communication between buffers in CPU memory.

!!! info "Workaround"
    In some situations Cray MPICH performs better when communication goes over the NICs, even within a node.
    To force Cray MPICH to use the NICs for all communication, set:

    ```bash
    export MPIR_CVAR_NO_LOCAL=1
    ```

Whenever possible, prefer GPU-GPU communication over CPU-CPU communication.
It can even be beneficial to copy a buffer that originally resides in CPU memory to the GPU solely for the communication.
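
To limit the `MPIR_CVAR_NO_LOCAL` workaround to specific runs rather than the whole shell session, the variable can be set for a single command only. The following is a minimal sketch; the helper name `with_no_local` and the application `./my_app` are hypothetical placeholders, not part of Cray MPICH.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run one command with MPIR_CVAR_NO_LOCAL=1 in its
# environment, without exporting the variable in the calling shell.
with_no_local() {
    env MPIR_CVAR_NO_LOCAL=1 "$@"
}

# Example usage (./my_app is a placeholder for your application):
#   with_no_local srun ./my_app
```

Scoping the variable this way makes it easier to compare the same application with and without the workaround inside one job script.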

#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication

Applications that use GPU-aware MPI calls sometimes hit this error when they trigger a bug in gdrcopy (a low-level library for copying data to and from GPU memory).
Setting the following option disables gdrcopy completely.
Note that this reduces performance for small message sizes, so set it only on a case-by-case basis.
```bash
export FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=0
```
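
Because disabling gdrcopy costs performance for small messages, one way to apply the option case by case is to set it only when a run actually reports the error. The following is a rough sketch, assuming the error message appears on the application's standard output or standard error; the helper name `run_with_gdrcopy_fallback` and the application `./my_gpu_app` are hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command once as-is. If its output contains
# "cxil_map: write error", run it a second time with gdrcopy disabled.
run_with_gdrcopy_fallback() {
    local log status
    log=$(mktemp)

    # First attempt: show the output and keep a copy for inspection.
    "$@" 2>&1 | tee "$log"
    status=${PIPESTATUS[0]}

    if grep -q "cxil_map: write error" "$log"; then
        echo "cxil_map error detected, retrying with gdrcopy disabled" >&2
        # The variable is set for the retried command only.
        env FI_CXI_SAFE_DEVMEM_COPY_THRESHOLD=0 "$@"
        status=$?
    fi

    rm -f "$log"
    return "$status"
}

# Example usage (./my_gpu_app is a placeholder for your application):
#   run_with_gdrcopy_fallback srun ./my_gpu_app
```

Note that the first attempt's standard error is merged into standard output so that the message can be detected. For long jobs it may be more practical to check the log of a failed job by hand and resubmit with the variable set.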

### Resolved issues

#### `"cxil_map: write error"` when doing inter-node GPU-aware MPI communication