
[Bug]: Crash when enabling MAX_UTILIZATION #9931

@Shang-Pin

Description

System Info

  • TensorRT-LLM v1.2.0rc2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Enable MAX_UTILIZATION in the scheduler_config. We saw the crash specifically with DeepSeek-V3:

scheduler_config:
  capacity_scheduler_policy: MAX_UTILIZATION
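
For reference, the YAML above is the extra LLM API options form of the setting. A minimal Python sketch of the same configuration through the LLM API is shown below; it assumes SchedulerConfig and CapacitySchedulerPolicy are importable from tensorrt_llm.llmapi as in recent releases, and the model path and request count are placeholders, not part of the original report.

# Hedged sketch: enable the MAX_UTILIZATION capacity scheduler via the LLM API.
# Import paths and the scheduler_config argument follow recent TensorRT-LLM
# releases; adjust for your version. Model path and request count are placeholders.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import CapacitySchedulerPolicy, SchedulerConfig

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder; any model that fills KV cache works
    scheduler_config=SchedulerConfig(
        capacity_scheduler_policy=CapacitySchedulerPolicy.MAX_UTILIZATION,
    ),
)

# Drive enough concurrent requests that the KV cache fills up and the scheduler
# starts pausing requests; that is when the crash below appears.
outputs = llm.generate(["Hello"] * 256)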

Expected behavior

No crash

Actual behavior

[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [I] iter = 4286, global_rank = 0, rank = 0, currank_total_requests = 0/0, host_step_time = 88.22870254516602ms, prev_device_step_time = 89.87372589111328ms, timestamp = 2025-12-05 06:23:18, num_scheduled_requests: 48, states = {'num_ctx_requests': 0, 'num_ctx_tokens': 0, 'num_generation_tokens': 96}
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Error in event loop: vector::_M_default_append
Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 0] [E] Error in event loop: vector::_M_default_append
[12/05/2025-06:23:18] [TRT-LLM] [RANK 7] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append

Exception in thread Thread-3 (_event_loop_wrapper):
Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
[12/05/2025-06:23:18] [TRT-LLM] [RANK 5] [E] Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 346, in _event_loop_wrapper
    self.event_loop()
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1425, in _executor_loop_overlap
    self._pause_requests(scheduled_batch.paused_requests)
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 2535, in _pause_requests
    req.pause(max_input_len)
ValueError: vector::_M_default_append

additional notes

We most likely start seeing the crash when more requests get paused because KV cache space has run out.
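
For what it's worth, vector::_M_default_append is the message libstdc++ attaches to the std::length_error thrown when std::vector::resize is asked for a size it cannot hold (for example a negative value cast to size_t), and pybind11 surfaces that as a Python ValueError; that points at the max_input_len value passed to req.pause while requests are being paused. A hypothetical guard around that call site is sketched below: the names paused_requests, max_input_len and req.pause come from the traceback, everything else is an assumption, and this is only an illustration of where a check could go, not the actual fix.

# Hypothetical illustration only: validate the length before handing it to the
# C++ binding, so a bad value fails loudly instead of crashing the event loop
# with ValueError: vector::_M_default_append.
def pause_requests_safely(paused_requests, max_input_len):
    """Pause each scheduled-out request, refusing lengths a std::vector cannot hold."""
    for req in paused_requests:
        if max_input_len is None or max_input_len < 0:
            raise RuntimeError(
                f"invalid max_input_len={max_input_len!r} while pausing {req}")
        req.pause(max_input_len)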

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Labels

  • Inference runtime<NV>: General operational aspects of TRTLLM execution not in other categories.
  • bug: Something isn't working.
