
Commit 14762e0

[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (#9294)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
1 parent 03331bc commit 14762e0

File tree

5 files changed: +9 −9 lines changed

docker/Dockerfile.multi

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ FROM base AS devel
 
 #
 # NB: PyTorch requires this to be < 1.0
-ENV PYTORCH_CUDA_ALLOC_CONF="garbage_collection_threshold:0.99999"
+ENV PYTORCH_ALLOC_CONF="garbage_collection_threshold:0.99999"
 
 # Copy all installation scripts at once to reduce layers
 COPY docker/common/install.sh \
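The renamed variable keeps the same `key:value[,key:value]` syntax, so existing settings carry over unchanged. A minimal sketch of how such a config string breaks down into key/value pairs — `parse_alloc_conf` is a hypothetical helper for illustration only; PyTorch parses the variable internally when its CUDA caching allocator initializes:

```python
import os


def parse_alloc_conf(conf: str) -> dict:
    """Split a 'key:value,key:value' allocator config string into a dict.

    Hypothetical helper, not part of PyTorch's API.
    """
    pairs = {}
    for entry in conf.split(","):
        if not entry:
            continue
        key, _, value = entry.partition(":")
        pairs[key.strip()] = value.strip()
    return pairs


# The value set by the Dockerfile above:
os.environ["PYTORCH_ALLOC_CONF"] = "garbage_collection_threshold:0.99999"
conf = parse_alloc_conf(os.environ["PYTORCH_ALLOC_CONF"])
print(conf["garbage_collection_threshold"])  # -> 0.99999
```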

docs/source/deployment-guide/deployment-guide-for-deepseek-r1-on-trtllm.md

Lines changed: 1 addition & 1 deletion
@@ -250,7 +250,7 @@ Here is an example response, showing that the TensorRT LLM server returns “New
 ### Troubleshooting Tips
 
 * If you encounter CUDA out-of-memory errors, try reducing `max_batch_size` or `max_seq_len`.
-* For running input/output sequence lengths of 8K/1K on H200, there is a known CUDA Out-Of-Memory issue caused by the PyTorch CUDA Caching Allocator fragmenting memory. As a workaround, you can set the environment variable `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:8192`. For more details, please refer to the [PyTorch documentation on optimizing memory usage](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf).
+* For running input/output sequence lengths of 8K/1K on H200, there is a known CUDA Out-Of-Memory issue caused by the PyTorch CUDA Caching Allocator fragmenting memory. As a workaround, you can set the environment variable `PYTORCH_ALLOC_CONF=max_split_size_mb:8192`. For more details, please refer to the [PyTorch documentation on optimizing memory usage](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf).
 * Ensure your model checkpoints are compatible with the expected format.
 * For performance issues, check GPU utilization with nvidia-smi while the server is running.
 * If the container fails to start, verify that the NVIDIA Container Toolkit is properly installed.
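The workaround in the troubleshooting tip above only takes effect if the variable is in the process environment before PyTorch initializes its CUDA allocator. A minimal sketch, assuming the server is launched as a child process from a hypothetical `serve.py` entry point:

```python
import os

# Set the allocator config before any CUDA allocation happens; in practice,
# set it before `import torch` (or export it in the launching shell).
os.environ["PYTORCH_ALLOC_CONF"] = "max_split_size_mb:8192"

# Hypothetical launch: a child process inherits the variable.
# import subprocess
# subprocess.run(["python", "serve.py"])  # serve.py is an assumed entry point

print(os.environ["PYTORCH_ALLOC_CONF"])  # -> max_split_size_mb:8192
```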

enroot/Makefile

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ run_sqsh:
 		--container-image "$(SQSH_PATH)" \
 		--container-mounts "$(SOURCE_DIR):$(CODE_DIR)" --container-workdir $(CODE_DIR) \
 		--container-mount-home --container-remap-root \
-		--export PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.99999 \
+		--export PYTORCH_ALLOC_CONF=garbage_collection_threshold:0.99999 \
 		$(RUN_CMD)
 
 endif

jenkins/current_image_tags.properties

Lines changed: 4 additions & 4 deletions
@@ -13,7 +13,7 @@
 # images are adopted from PostMerge pipelines, the abbreviated commit hash is used instead.
 IMAGE_NAME=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm
 
-LLM_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.10-py3-x86_64-ubuntu24.04-trt10.13.3.9-skip-tritondevel-202511200955-9055
-LLM_SBSA_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.10-py3-aarch64-ubuntu24.04-trt10.13.3.9-skip-tritondevel-202511200955-9055
-LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-13.0.2-devel-rocky8-x86_64-rocky8-py310-trt10.13.3.9-skip-tritondevel-202511200955-9055
-LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-13.0.2-devel-rocky8-x86_64-rocky8-py312-trt10.13.3.9-skip-tritondevel-202511200955-9055
+LLM_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.10-py3-x86_64-ubuntu24.04-trt10.13.3.9-skip-tritondevel-202511271125-9294
+LLM_SBSA_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:pytorch-25.10-py3-aarch64-ubuntu24.04-trt10.13.3.9-skip-tritondevel-202511271125-9294
+LLM_ROCKYLINUX8_PY310_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-13.0.2-devel-rocky8-x86_64-rocky8-py310-trt10.13.3.9-skip-tritondevel-202511271125-9294
+LLM_ROCKYLINUX8_PY312_DOCKER_IMAGE=urm.nvidia.com/sw-tensorrt-docker/tensorrt-llm:cuda-13.0.2-devel-rocky8-x86_64-rocky8-py312-trt10.13.3.9-skip-tritondevel-202511271125-9294

tensorrt_llm/_torch/pyexecutor/_util.py

Lines changed: 2 additions & 2 deletions
@@ -965,15 +965,15 @@ def _adjust_torch_mem_fraction():
     # torch.cuda._set_allocator_settings (added in PyTorch 2.8.0-rc1)
     # or a similar API is available, the warning below should be removed
     # and the allocator GC threshold be set via the new API instead.
-    torch_allocator_config = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
+    torch_allocator_config = os.environ.get("PYTORCH_ALLOC_CONF", "")
     torch_mem_threshold_advised = (
         torch.cuda.get_allocator_backend() == "native"
         and "expandable_segments:True" not in torch_allocator_config)
     torch_mem_threshold_set = "garbage_collection_threshold:" in torch_allocator_config
     if torch_mem_threshold_advised and not torch_mem_threshold_set:
         logger.warning(
             "It is recommended to incl. 'garbage_collection_threshold:0.???' or 'backend:cudaMallocAsync'"
-            " or 'expandable_segments:True' in PYTORCH_CUDA_ALLOC_CONF.")
+            " or 'expandable_segments:True' in PYTORCH_ALLOC_CONF.")
 
     # NOTE: Even if a memory threshold was not set (cf. warning above), setting a memory
     # fraction < 1.0 is beneficial, because
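The advisory in `_adjust_torch_mem_fraction` reduces to a pure string check on the config value. A standalone sketch of that logic, with the allocator backend passed in as a parameter instead of queried via `torch.cuda.get_allocator_backend()` so it runs without a GPU:

```python
def gc_threshold_advice(allocator_backend: str, alloc_conf: str) -> bool:
    """Return True when the PYTORCH_ALLOC_CONF warning should fire.

    Mirrors the check above: a GC threshold is advised only for the
    native caching allocator without expandable segments, and only
    when no threshold is configured yet.
    """
    advised = (allocator_backend == "native"
               and "expandable_segments:True" not in alloc_conf)
    already_set = "garbage_collection_threshold:" in alloc_conf
    return advised and not already_set


# No config set: the warning fires for the native allocator.
print(gc_threshold_advice("native", ""))  # -> True
# The Dockerfile default already sets a threshold: no warning.
print(gc_threshold_advice("native", "garbage_collection_threshold:0.99999"))  # -> False
# cudaMallocAsync backend: no warning either.
print(gc_threshold_advice("cudaMallocAsync", ""))  # -> False
```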
