TensorRT trying to gobble 100% GPU VRAM without considering what's already in use #4663

@Mashrien

Description


trtllm-build tries to allocate the entire GPU's memory without considering that some of it may already be in use or reserved, and there is no way to pass a VRAM limit to the build command.

I tried hacking builder.py to read an environment variable and cap the max workspace when the builder config is created, but no dice. I then tried hooking both cuMemGetInfo_v2 and cudaMemGetInfo via LD_PRELOAD, but it seems those functions are never called.
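
For reference, below is a minimal sketch of the kind of builder.py patch attempted. The environment variable name TRTLLM_MAX_WORKSPACE_BYTES is made up for illustration, and the sketch assumes TensorRT's standard Python API for memory-pool limits; in practice it did not change the allocation behaviour described here.

import os
import tensorrt as trt

def create_capped_builder_config(builder: trt.Builder) -> trt.IBuilderConfig:
    # Build the config as usual, then cap the WORKSPACE pool if the
    # (hypothetical) TRTLLM_MAX_WORKSPACE_BYTES env var is set.
    config = builder.create_builder_config()
    limit = os.environ.get("TRTLLM_MAX_WORKSPACE_BYTES")
    if limit is not None:
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, int(limit))
    return config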

Environment

NVIDIA-supplied TRT-LLM 1.2.0rc4 docker

TensorRT Version: 10.x

NVIDIA GPU: 5070

NVIDIA Driver Version: 581.57

CUDA Version: 13.0.97

CUDNN Version: latest

Operating System: Win10 + WSL (or docker, have tried both)

Python Version (if applicable): 3.12

Tensorflow Version (if applicable):

PyTorch Version (if applicable): 2.9.0

Baremetal or Container (if so, version): Bare and Container

Relevant Files

Model link: https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B/tree/main

Steps To Reproduce

Run trtllm-build under Windows + WSL (where Windows, WDDM, etc. reserve roughly 0.5-1.5 GB of VRAM just for OS display and compositing) with a decent-sized model, in my case Kunoichi 7B.

Commands or scripts:

trtllm-build \
  --checkpoint_dir /mnt/c/ai/models/ckpt_kuno \
  --output_dir /mnt/c/ai/models/kuno_eng_fp16 \
  --max_batch_size 1 \
  --max_input_len 1024 \
  --max_seq_len 1024 \
  --max_num_tokens 1024 \
  --kv_cache_type paged \
  --paged_kv_cache enable \
  --gpt_attention_plugin float16 \
  --gemm_plugin float16 \
  --monitor_memory \
  --log_level info
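
As a quick diagnostic before running the build, the free-vs-total VRAM gap can be checked with PyTorch (already installed in this environment). This is just a sketch to show how much memory the Windows display stack is already holding, not part of trtllm-build:

import torch

# torch.cuda.mem_get_info wraps cudaMemGetInfo and returns (free, total) in bytes.
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free:               {free_bytes / 2**30:.2f} GiB")
print(f"total:              {total_bytes / 2**30:.2f} GiB")
print(f"reserved elsewhere: {(total_bytes - free_bytes) / 2**30:.2f} GiB")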

Have you tried the latest release?: Yes

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Unknown


The short and skinny is simply this: virtualMemoryBuffer.cpp tries to allocate essentially the entire GPU's memory without checking how much is already in use, and offers no way to set a hard cap. (For reference, the requested 11665408000 bytes is about 11.7 GB, essentially the whole 12 GB card.) I've even tried hooking both cuMemGetInfo_v2 and cudaMemGetInfo with LD_PRELOAD, sadly with no luck; those two functions don't appear to be called by the memory manager, which seems to use some internal mechanism to determine the size to allocate.

[12/06/2025-22:12:21] [TRT] [E] [virtualMemoryBuffer.cpp::resizePhysical::154] Error Code 2: OutOfMemory (Requested size was 11665408000 bytes.)
[12/06/2025-22:12:21] [TRT] [E] [virtualMemoryBuffer.cpp::resizePhysical::141] Error Code 1: Cuda Driver (In resizePhysical at optimizer/builder/virtualMemoryBuffer.cpp:141)
[12/06/2025-22:12:21] [TRT] [W] Requested amount of GPU memory (11665408000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/06/2025-22:12:21] [TRT] [E] [globWriter.cpp::makeResizableGpuMemory::514] Error Code 2: OutOfMemory (Requested size was 11665408000 bytes.)
Traceback (most recent call last):
  File "/home/mash/ai/trtllm_env312/bin/trtllm-build", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 542, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 381, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 356, in build_and_save
    engine = build_model(build_config,
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 349, in build_model
    return build(model, build_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/builder.py", line 1288, in build
    engine = None if build_config.dry_run else builder.build_engine(
                                               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/_common.py", line 210, in decorated
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/builder.py", line 425, in build_engine
    assert engine is not None, 'Engine building failed, please check the error log.'
           ^^^^^^^^^^^^^^^^^^
AssertionError: Engine building failed, please check the error log.
