init: Set NCCL_PROTO to simple when GDR is unsupported
Force NCCL_PROTO to simple when GDR is not supported. While NCCL
disables the LL128 protocol when using host buffers, it leaves the LL
protocol enabled and polls on host memory directly. Many host-only
providers, like tcp and sockets, do not guarantee data delivery
ordering and can cause corruption when used with the LL protocol. Since
polling on host memory is expensive anyway, this has no real
performance implications and avoids dealing with the lack of a data
ordering hint in the Libfabric API.
Note: The warning for a pre-existing NCCL_PROTO setting was removed because
modern NCCL supports complex protocol formats (e.g. "^LL", "allreduce:simple").
We can consider adding protocol format validation and an INFO message later.
Signed-off-by: Mozar Huang <[email protected]>
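
As a rough illustration of the behavior described in the commit message, the sketch below sets NCCL_PROTO to "simple" only when GDR is unavailable. The helper name `force_simple_proto_without_gdr` and its boolean parameter are hypothetical and are not taken from the plugin's source. It also assumes that a value the user already exported is left untouched, which the commit message does not state explicitly.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not the plugin's actual function.
 * When GDR is not supported, default NCCL_PROTO to "simple" so that
 * NCCL does not pick the LL protocol over host buffers.
 * Returns 0 on success, -1 if setenv() fails.
 */
static int force_simple_proto_without_gdr(bool gdr_supported)
{
	if (gdr_supported) {
		/* Device buffers are usable; leave protocol selection to NCCL. */
		return 0;
	}

	/*
	 * Assumption: overwrite = 0, so a value the user already set
	 * (for example "^LL" or "allreduce:simple") is preserved.
	 * setenv() is a POSIX call.
	 */
	if (setenv("NCCL_PROTO", "simple", 0) != 0) {
		perror("setenv");
		return -1;
	}

	return 0;
}
```

The variable has to be set before NCCL reads its environment, so a call like this would belong in the plugin's initialization path rather than at communicator creation time.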
-		NCCL_OFI_WARN("NCCL_PROTO was set to \"LL/LL128\", but the Libfabric endpoint does not support 128 byte in-order aligned stores. This endpoint may corrupt data during communication");
-	}
-
-	return 0;
-}
-
 /*
  * Try to set one of the in-order flags for either send/recv or rdma
  * on the current endpoint to true. have_ordering will be the