
[Bug]: Crash when enabling MAX_UTILIZATION #9931

@Shang-Pin

Description

System Info

  • TensorRT-LLM v1.2.0rc2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Enable MAX_UTILIZATION in the scheduler_config. We saw the crash specifically with DeepSeek-V3:

scheduler_config:
  capacity_scheduler_policy: MAX_UTILIZATION
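
For reference, the YAML above is the extra LLM API options form of the setting. A minimal Python sketch of the same configuration through the LLM API is shown below; it assumes SchedulerConfig and CapacitySchedulerPolicy are importable from tensorrt_llm.llmapi as in recent releases, and the model path and request count are placeholders, not part of the original report.

# Hedged sketch: enable the MAX_UTILIZATION capacity scheduler via the LLM API.
# Import paths and the scheduler_config argument follow recent TensorRT-LLM
# releases; adjust for your version. Model path and request count are placeholders.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import CapacitySchedulerPolicy, SchedulerConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder; any model that fills KV cache works
    scheduler_config=SchedulerConfig(
        capacity_scheduler_policy=CapacitySchedulerPolicy.MAX_UTILIZATION,
    ),
)

# Drive enough concurrent requests that the KV cache fills up and the scheduler
# starts pausing requests; that is when the crash below appears.
outputs = llm.generate(["Hello"] * 256)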

Expected behavior

No crash

Actual behavior

[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [I] iter = 4286, global_rank = 0, rank = 0, currank_total_requests = 0/0, host_step_time = 88.22870254516602ms, prev_device_step_time = 89.87372589111328ms, timestamp = 2025-12-05 06:23:18, num_scheduled_requests: 48, states = {'num_ctx_requests': 0, 'num_ctx_tokens': 0, 'num_generation_tokens': 96}
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Error in event loop: vector::_M_default_append
Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append

Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append

additional notes

We most likely start seeing the crash when more requests get paused because KV cache space has run out.
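
For what it's worth, vector::_M_default_append is the message libstdc++ attaches to the std::length_error thrown when std::vector::resize is asked for a size it cannot hold (for example a negative value cast to size_t), and pybind11 surfaces that as a Python ValueError; that points at the max_input_len value passed to req.pause while requests are being paused. A hypothetical guard around that call site is sketched below: the names paused_requests, max_input_len and req.pause come from the traceback, everything else is an assumption, and this is only an illustration of where a check could go, not the actual fix.

# Hypothetical illustration only: validate the length before handing it to the
# C++ binding, so a bad value fails loudly instead of crashing the event loop
# with ValueError: vector::_M_default_append.
def pause_requests_safely(paused_requests, max_input_len):
    """Pause each scheduled-out request, refusing lengths a std::vector cannot hold."""
    for req in paused_requests:
        if max_input_len is None or max_input_len < 0:
            raise RuntimeError(
                f"invalid max_input_len={max_input_len!r} while pausing {req}")
        req.pause(max_input_len)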

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Labels

  • Inference runtime<NV>: General operational aspects of TRTLLM execution not in other categories.
  • bug: Something isn't working.
