benchmark/fluid/README.md
11 lines changed: 11 additions & 0 deletions
@@ -58,3 +58,14 @@ kubectl create -f myjob/
```
The job shall start.

## Notes for Running Fluid Distributed with NCCL2 and RDMA

Before running NCCL2 distributed jobs, check whether your node has multiple network interfaces. If it does, set the environment variable `export NCCL_SOCKET_IFNAME=eth0` so that NCCL uses your actual network device.
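
A minimal sketch of what this looks like before launching the trainer process; `eth0` is only an example, so substitute the interface that actually carries traffic between your nodes (check with `ifconfig` or `ip addr`):

```bash
# Tell NCCL which network interface to use for inter-node communication.
# Replace eth0 with your actual device name if it differs.
export NCCL_SOCKET_IFNAME=eth0
```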
To run high-performance distributed training, your hardware environment must support RDMA-enabled network communication; see [this guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/fluid/howto/cluster/nccl2_rdma_training.md) for details.
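
As a quick sanity check (assuming the libibverbs utilities from `rdma-core` are installed on the node), you can list the RDMA-capable devices before launching the job:

```bash
# List RDMA-capable devices and their details; an empty result or a missing
# command usually means the node is not yet set up for RDMA.
ibv_devices
ibv_devinfo
```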