When using uenvs such as `prgenv-gnu`, either use the `default` view, which loads `aws-ofi-nccl` automatically, or, if using the `modules` view, load the module explicitly with `module load aws-ofi-nccl`.
If the plugin is found correctly, running the application with `NCCL_DEBUG=INFO` should print a line similar to the following (the node name and IDs will differ):
```console
nid006352:34610:34631 [0] NCCL INFO Using network AWS Libfabric
```
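The check above can be scripted. The following is a minimal sketch, assuming the application's output (run with `NCCL_DEBUG=INFO`) has been redirected to a log file. The function names `check_nccl_env` and `check_nccl_log` are hypothetical helpers, not part of NCCL or the uenvs, and `./my_app` in the usage comment is a placeholder. The first function fails early if `NCCL_NET_PLUGIN` is set, which causes problems with uenvs as explained in the warning that follows; the second searches the log for the `Using network AWS Libfabric` line shown above.

```shell
# Minimal sketch of pre-flight and post-run checks for the NCCL network plugin.
# Hypothetical helpers: source this file from a job script.

# Fails (returns 1) if NCCL_NET_PLUGIN is set at all, even to an empty value.
check_nccl_env() {
    if [ -n "${NCCL_NET_PLUGIN+set}" ]; then
        echo "NCCL_NET_PLUGIN is set to '${NCCL_NET_PLUGIN}'; with uenvs, unset it and use NCCL_NET=\"AWS Libfabric\" instead" >&2
        return 1
    fi
    return 0
}

# Succeeds (returns 0) if the given log file contains the line NCCL prints
# when it selects the AWS Libfabric network; a missing file counts as failure.
check_nccl_log() {
    local log_file="$1"
    if grep -q "NCCL INFO Using network AWS Libfabric" "$log_file"; then
        echo "OK: NCCL is using the AWS Libfabric plugin"
        return 0
    fi
    echo "AWS Libfabric plugin not selected in ${log_file}; check that aws-ofi-nccl is loaded" >&2
    return 1
}

# Example usage in a job script (./my_app is a placeholder for your application):
#   export NCCL_DEBUG=INFO
#   check_nccl_env && srun ./my_app > nccl.log 2>&1
#   check_nccl_log nccl.log
```

Matching on a fixed substring keeps the check independent of the node name and process IDs that prefix each NCCL log line.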
!!! warning "`NCCL_NET_PLUGIN="ofi"` with uenvs"
    When using uenvs, do not set `NCCL_NET_PLUGIN="ofi"`, either instead of or in addition to `NCCL_NET="AWS Libfabric"`.
    If you do, your application will fail to start, because NCCL will:

    1. fail to find the plugin, because the name of the shared library in the uenv does not match the name NCCL looks for, and
    2. prefer `NCCL_NET_PLUGIN` over `NCCL_NET`, so it will fail to find the plugin even if `NCCL_NET="AWS Libfabric"` is correctly set.

    When both environment variables are set, the error message printed with `NCCL_DEBUG=WARN` looks similar to the one printed when the plugin is not available: