
[Bug]: NameError in memory profiler on Jetson device. #9432

@bonginn

Description


System Info

  • Hardware: Jetson Orin NX (16GB)
  • OS: L4T 36.4.0 (JetPack 6.1)
  • CUDA Version: 12.6
  • TensorRT Version: 10.3.0
  • TensorRT-LLM version: 0.12.0
  • Python Version: 3.10

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Set up the environment on a Jetson Orin NX device (JetPack 6.x).
  2. Install nvidia-ml-py to support NVML calls (or to suppress the pynvml missing warning).
  3. Run the Python benchmark script:
     python3 benchmarks/python/benchmark.py [and your arguments]
  4. Observe the traceback in the console output.
Traceback (most recent call last):
  File "/home/jetson/miniconda3/envs/trtllm-tiny/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/jetson/miniconda3/envs/trtllm-tiny/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/jetson/project/TensorRT-LLM/benchmarks/python/mem_monitor.py", line 72, in _upd_peak_memory_usage
    peak_host_used, peak_device_used = self.get_memory_usage()
  File "/home/jetson/project/TensorRT-LLM/benchmarks/python/mem_monitor.py", line 84, in get_memory_usage
    device_used, _, _ = device_memory_info()
  File "/home/jetson/miniconda3/envs/trtllm-tiny/lib/python3.10/site-packages/tensorrt_llm/profiler.py", line 163, in device_memory_info
    mem_info = _device_get_memory_info_fn(handle)
NameError: name '_device_get_memory_info_fn' is not defined

Expected behavior

The profiler should handle the Jetson platform gracefully. Since _device_get_memory_info_fn is skipped during initialization on Jetson, the device_memory_info function should check for the platform and return early (or initialize correctly), rather than attempting to call an undefined function.

Actual behavior

The benchmark script continues to run (the main process reports throughput/latency correctly), but the memory-monitoring process fails in the background with the NameError shown above, so no memory statistics are collected.

Additional notes

Proposed solution:
I have verified a fix locally. The device_memory_info function should have a guard clause that returns early when on_jetson_l4t is True. Additionally, a one-time warning should be logged to inform the user that memory monitoring is not fully supported on this platform.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Memory — Memory utilization in TRTLLM: leak/OOM handling, footprint optimization, memory profiling.
  • not a bug — Some known limitation, but not a bug.
  • stale
  • waiting for feedback
