System Info
- TensorRT-LLM v1.2.0rc2
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Enable MAX_UTILIZATION in the scheduler_config; we observed this specifically with DeepSeek V3:

scheduler_config:
  capacity_scheduler_policy: MAX_UTILIZATION
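
For reference, a minimal sketch of how the same policy can be set through the Python LLM API. The model id is a placeholder, and the SchedulerConfig / CapacitySchedulerPolicy import location is assumed from the current llmapi layout, so adjust as needed:

```python
# Hedged sketch: same scheduler policy applied via the Python LLM API.
# The model id is a placeholder; SchedulerConfig / CapacitySchedulerPolicy
# are assumed to be exported from tensorrt_llm.llmapi.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import CapacitySchedulerPolicy, SchedulerConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder model id
    scheduler_config=SchedulerConfig(
        capacity_scheduler_policy=CapacitySchedulerPolicy.MAX_UTILIZATION,
    ),
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```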
Expected behavior
No crash
actual behavior
[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [I] iter = 4286, global_rank = 0, rank = 0, currank_total_requests = 0/0, host_step_time = 88.22870254516602ms, prev_device_step_time = 89.87372589111328ms, timestamp = 2025-12-05 06:23:18, num_scheduled_requests: 48, states = {'num_ctx_requests': 0, 'num_ctx_tokens': 0, 'num_generation_tokens': 96}
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Error in event loop: vector::_M_default_append
Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append

Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append
additional notes
We most likely start seeing the crash when more requests get paused because KV cache space runs out (see the sketch below).
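
For anyone trying to reproduce this, a hypothetical stress sketch that should make request pausing much more likely by keeping the KV cache small and issuing many long generations at once. The free_gpu_memory_fraction value, prompt set, and request count are assumptions for illustration, not our exact workload:

```python
# Hypothetical repro sketch (not our exact workload): a deliberately small KV
# cache plus many long generations should force the MAX_UTILIZATION scheduler
# to pause requests once KV cache space runs out.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import (CapacitySchedulerPolicy, KvCacheConfig,
                                 SchedulerConfig)

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder model id
    scheduler_config=SchedulerConfig(
        capacity_scheduler_policy=CapacitySchedulerPolicy.MAX_UTILIZATION,
    ),
    # Assumed knob: keep the KV cache small so it fills up quickly.
    kv_cache_config=KvCacheConfig(free_gpu_memory_fraction=0.3),
)

# Many concurrent requests with long outputs keep consuming KV cache blocks
# during generation until the scheduler has to pause some of them.
prompts = ["Write a long story about a scheduler."] * 256
outputs = llm.generate(prompts, SamplingParams(max_tokens=2048))
```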
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.