
Commit f38035c

[distributed][rl] remove nccl cumem env var override (vllm-project#24141)
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
1 parent 426cc86 · commit f38035c

File tree

2 files changed (+1, -19 lines)

docs/usage/troubleshooting.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -295,4 +295,4 @@ This indicates vLLM failed to initialize the NCCL communicator, possibly due to
 ## Known Issues
 
 - In `v0.5.2`, `v0.5.3`, and `v0.5.3.post1`, there is a bug caused by [zmq](https://github.com/zeromq/pyzmq/issues/2000), which can occasionally cause vLLM to hang depending on the machine configuration. The solution is to upgrade to the latest version of `vllm` to include the [fix](gh-pr:6759).
-- To circumvent a NCCL [bug](https://github.com/NVIDIA/nccl/issues/1234), all vLLM processes will set an environment variable `NCCL_CUMEM_ENABLE=0` to disable NCCL's `cuMem` allocator. It does not affect performance but only gives memory benefits. When external processes want to set up a NCCL connection with vLLM's processes, they should also set this environment variable, otherwise, inconsistent environment setup will cause NCCL to hang or crash, as observed in the [RLHF integration](https://github.com/OpenRLHF/OpenRLHF/pull/604) and the [discussion](gh-issue:5723#issuecomment-2554389656).
+- To address a memory overhead issue in older NCCL versions (see [bug](https://github.com/NVIDIA/nccl/issues/1234)), vLLM versions `>= 0.4.3, <= 0.10.1.1` would set the environment variable `NCCL_CUMEM_ENABLE=0`. External processes connecting to vLLM also needed to set this variable to prevent hangs or crashes. Since the underlying NCCL bug was fixed in NCCL 2.22.3, this override was removed in newer vLLM versions to allow for NCCL performance optimizations.
```
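For illustration only, here is a minimal sketch of what an external process (for example, an RLHF trainer that joins a NCCL group with vLLM workers) could do to stay consistent with the old behavior when it is still linked against an affected NCCL build. The helper name `align_nccl_cumem_with_old_vllm` and the reliance on `torch.cuda.nccl.version()` returning a `(major, minor, patch)` tuple are assumptions of this sketch, not part of the commit; the `2.22.3` threshold is the NCCL release mentioned in the updated docs above.

```python
# Illustrative sketch only (not part of this commit): an external process
# mirroring the NCCL_CUMEM_ENABLE override that older vLLM applied, but only
# when the linked NCCL build still predates the cuMem fix.
# Assumes a recent PyTorch where torch.cuda.nccl.version() returns a
# (major, minor, patch) tuple; 2.22.3 is the release that fixed
# https://github.com/NVIDIA/nccl/issues/1234.
import os

import torch

NCCL_CUMEM_FIX = (2, 22, 3)


def align_nccl_cumem_with_old_vllm() -> None:
    """Disable NCCL's cuMem allocator only when the linked NCCL predates the fix."""
    if "NCCL_CUMEM_ENABLE" in os.environ:
        # Respect an explicit user choice; mismatched values across processes
        # are exactly what caused the hangs described in the docs above.
        return
    if tuple(torch.cuda.nccl.version()) < NCCL_CUMEM_FIX:
        os.environ["NCCL_CUMEM_ENABLE"] = "0"


if __name__ == "__main__":
    align_nccl_cumem_with_old_vllm()
    print("NCCL_CUMEM_ENABLE =", os.environ.get("NCCL_CUMEM_ENABLE", "<unset>"))
```

Whatever value is chosen, the key point from the docs still applies: it must be set identically in every process of the NCCL group, and before the communicator is created.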

vllm/env_override.py

Lines changed: 0 additions & 18 deletions
```diff
@@ -13,24 +13,6 @@
 # that interact with vllm workers.
 # they are executed whenever `import vllm` is called.
 
-if os.environ.get('NCCL_CUMEM_ENABLE', '0') != '0':
-    logger.warning(
-        "NCCL_CUMEM_ENABLE is set to %s, skipping override. "
-        "This may increase memory overhead with cudagraph+allreduce: "
-        "https://github.com/NVIDIA/nccl/issues/1234",
-        os.environ['NCCL_CUMEM_ENABLE'])
-elif not os.path.exists('/dev/nvidia-caps-imex-channels'):
-    # NCCL requires NCCL_CUMEM_ENABLE to work with
-    # multi-node NVLink, typically on GB200-NVL72 systems.
-    # The ultimate way to detect multi-node NVLink is to use
-    # NVML APIs, which are too expensive to call here.
-    # As an approximation, we check the existence of
-    # /dev/nvidia-caps-imex-channels, used by
-    # multi-node NVLink to communicate across nodes.
-    # This will still cost some GPU memory, but it is worthwhile
-    # because we can get very fast cross-node bandwidth with NVLink.
-    os.environ['NCCL_CUMEM_ENABLE'] = '0'
-
 # see https://github.com/vllm-project/vllm/pull/15951
 # it avoids unintentional cuda initialization from torch.cuda.is_available()
 os.environ['PYTORCH_NVML_BASED_CUDA_CHECK'] = '1'
```
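As a companion to the removed block above, the following diagnostic sketch reproduces its decision logic as a standalone script, so users upgrading across this change can see which behavior their environment would previously have received. The helper name `describe_nccl_cumem_decision` is made up for illustration; the `/dev/nvidia-caps-imex-channels` check is the same heuristic the deleted code used to approximate multi-node NVLink (e.g. GB200-NVL72).

```python
# Diagnostic sketch (not part of vLLM): re-implements the decision logic of
# the removed import-time override so you can see what older vLLM versions
# would have done on this machine.
import os


def describe_nccl_cumem_decision() -> str:
    user_value = os.environ.get("NCCL_CUMEM_ENABLE")
    if user_value not in (None, "0"):
        # Old vLLM logged a warning and left the user's value untouched.
        return f"user set NCCL_CUMEM_ENABLE={user_value}; old vLLM would warn and skip the override"
    if os.path.exists("/dev/nvidia-caps-imex-channels"):
        # Heuristic for multi-node NVLink, where cuMem must stay enabled.
        return "multi-node NVLink detected; old vLLM would leave NCCL's cuMem allocator enabled"
    return "no multi-node NVLink detected; old vLLM would set NCCL_CUMEM_ENABLE=0"


if __name__ == "__main__":
    print(describe_nccl_cumem_decision())
```

After this commit, none of this runs at `import vllm` time any more; NCCL's own defaults apply unless the user sets `NCCL_CUMEM_ENABLE` explicitly.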
