Commit ef7ca32

Add an FAQ entry about a certain GPU-aware MPI error (#4811)
1 parent 73c6f64 commit ef7ca32

1 file changed, with 7 additions and 0 deletions.

Docs/sphinx_documentation/source/Faq.rst

Lines changed: 7 additions & 0 deletions
@@ -172,6 +172,13 @@ to the device and pass a device pointer function object `DevicePtrIF` into the `

**Q.** I'm getting errors when running with GPU-aware MPI.

**A.** While other problems may exist, one thing to check, if the machine is using Slurm, is the `cgroup.conf` file. If it contains `ConstrainDevices=yes`, then inter-process communication (IPC) can be impacted, which means bindings such as `--gpu-bind=closest` should not be used. Instead, try `--gpu-bind=none`.

`This Slurm issue provides more information <https://support.schedmd.com/show_bug.cgi?id=17875>`_.
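
The check described in the answer can be sketched as a small script. This is a minimal illustration, not part of the commit: the function names `constrain_devices_enabled` and `suggested_gpu_bind` are hypothetical, and the parsing rules (case-insensitive keys, `#` starting a comment, optional spaces around `=`, last occurrence winning) are lenient assumptions rather than a statement of how Slurm itself reads `cgroup.conf`.

```python
from typing import Optional


def constrain_devices_enabled(cgroup_conf_text: str) -> bool:
    """Return True if the given cgroup.conf text sets ConstrainDevices=yes.

    Parsing is deliberately lenient (an assumption, not Slurm's own parser):
    keys and values are compared case-insensitively, and anything after '#'
    is treated as a comment.
    """
    enabled = False
    for raw_line in cgroup_conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue
        key, value = (part.strip().lower() for part in line.split("=", 1))
        if key == "constraindevices":
            enabled = value == "yes"  # assumption: the last occurrence wins
    return enabled


def suggested_gpu_bind(cgroup_conf_text: str) -> Optional[str]:
    """Return the binding flag the FAQ suggests trying, or None if no change is implied."""
    if constrain_devices_enabled(cgroup_conf_text):
        return "--gpu-bind=none"
    return None


# Hypothetical file contents, used only for illustration.
sample = """
# example cgroup.conf
ConstrainCores=yes
ConstrainDevices=yes
"""

if __name__ == "__main__":
    print(constrain_devices_enabled(sample))
    print(suggested_gpu_bind(sample))
```

To use it on a real system, read the site's `cgroup.conf` into a string and pass it to `suggested_gpu_bind`. The file's location is site-specific, so ask the system administrators if it is not obvious.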

More Questions
--------------
