Description
trtllm-build tries to allocate the entire GPU's memory without considering that some of it may already be in use or reserved, and there is no way to pass a VRAM limit to the build command.
I tried hacking builder.py to read an environment variable and cap the maximum workspace when the builder config is created, but no dice. I then tried hooking both cuMemGetInfo_v2 and cudaMemGetInfo via LD_PRELOAD, but it seems those functions are never called.
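For reference, this is roughly the kind of patch I was attempting inside builder.py (a minimal sketch, not the actual code; TRTLLM_MAX_WORKSPACE_BYTES is a made-up variable name, and the real hook point in trtllm-build may differ):

import os
import tensorrt as trt

def apply_workspace_cap(config: trt.IBuilderConfig) -> None:
    # Hypothetical env var; trtllm-build does not actually read this.
    cap = os.environ.get("TRTLLM_MAX_WORKSPACE_BYTES")
    if cap is not None:
        # Clamp the TensorRT workspace memory pool to the requested size.
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, int(cap))

Even with the workspace pool capped this way, the allocation in virtualMemoryBuffer.cpp below does not appear to respect it.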
Environment
NVIDIA-supplied TRT-LLM 1.2.0rc4 Docker image
TensorRT Version: 10.x
NVIDIA GPU: 5070
NVIDIA Driver Version: 581.57
CUDA Version: 13.0.97
CUDNN Version: latest
Operating System: Windows 10 + WSL (or Docker, have tried both)
Python Version (if applicable): 3.12
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.9.0
Baremetal or Container (if so, version): Bare and Container
Relevant Files
Model link: https://huggingface.co/SanjiWatsuki/Kunoichi-DPO-v2-7B/tree/main
Steps To Reproduce
Run trtllm-build under Windows + WSL (where Windows, WDDM, and the desktop compositor reserve roughly 0.5-1.5 GB of VRAM for OS display alone) with a decent-sized model, in my case Kunoichi 7B.
Commands or scripts:
trtllm-build \
  --checkpoint_dir /mnt/c/ai/models/ckpt_kuno \
  --output_dir /mnt/c/ai/models/kuno_eng_fp16 \
  --max_batch_size 1 \
  --max_input_len 1024 \
  --max_seq_len 1024 \
  --max_num_tokens 1024 \
  --kv_cache_type paged \
  --paged_kv_cache enable \
  --gpt_attention_plugin float16 \
  --gemm_plugin float16 \
  --monitor_memory \
  --log_level info
Have you tried the latest release?: Yes
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Unknown
The short and skinny is this: virtualMemoryBuffer.cpp tries to allocate essentially the entire GPU's memory without checking how much is already in use, and offers no way to set a hard cap. I even tried hooking both cuMemGetInfo_v2 and cudaMemGetInfo with LD_PRELOAD, sadly with no luck; those two functions apparently are never called by the memory manager, which seems to use some internal mechanism to decide how much to allocate. (A quick way to see how much VRAM is already reserved before the build starts is sketched after the traceback below.)
[12/06/2025-22:12:21] [TRT] [E] [virtualMemoryBuffer.cpp::resizePhysical::154] Error Code 2: OutOfMemory (Requested size was 11665408000 bytes.)
[12/06/2025-22:12:21] [TRT] [E] [virtualMemoryBuffer.cpp::resizePhysical::141] Error Code 1: Cuda Driver (In resizePhysical at optimizer/builder/virtualMemoryBuffer.cpp:141)
[12/06/2025-22:12:21] [TRT] [W] Requested amount of GPU memory (11665408000 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[12/06/2025-22:12:21] [TRT] [E] [globWriter.cpp::makeResizableGpuMemory::514] Error Code 2: OutOfMemory (Requested size was 11665408000 bytes.)
Traceback (most recent call last):
File "/home/mash/ai/trtllm_env312/bin/trtllm-build", line 7, in <module>
sys.exit(main())
^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 542, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 381, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 356, in build_and_save
engine = build_model(build_config,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/commands/build.py", line 349, in build_model
return build(model, build_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/builder.py", line 1288, in build
engine = None if build_config.dry_run else builder.build_engine(
^^^^^^^^^^^^^^^^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/_common.py", line 210, in decorated
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/mash/ai/trtllm_env312/lib/python3.12/site-packages/tensorrt_llm/builder.py", line 425, in build_engine
assert engine is not None, 'Engine building failed, please check the error log.'
^^^^^^^^^^^^^^^^^^
AssertionError: Engine building failed, please check the error log.
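To confirm how much memory is actually available before the build, here is a minimal diagnostic sketch (assuming PyTorch, which is already in the TRT-LLM environment). It prints free vs. total device memory; on this setup the gap is the VRAM that WDDM and the compositor hold, which the requested allocation above does not account for:

import torch

# Wraps cudaMemGetInfo: returns (free, total) in bytes for device 0.
free, total = torch.cuda.mem_get_info(0)
print(f"total : {total / 2**30:.2f} GiB")
print(f"free  : {free / 2**30:.2f} GiB")
print(f"held by OS/other processes: {(total - free) / 2**30:.2f} GiB")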