[Bug]: trying to run the tensorrt-llm offline example, but the program hangs #9766

@w58296

Description

System Info

  • CPU: i5-14000kf
  • GPU: RTX 4070 Super
  • Python: 3.10
  • nvidia-smi: NVIDIA-SMI 580.105.08, Driver Version: 572.16, CUDA Version: 12.8
  • nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2025 NVIDIA Corporation
    Built on Wed_Jan_15_19:20:09_PST_2025
    Cuda compilation tools, release 12.8, V12.8.61
    Build cuda_12.8.r12.8/compiler.35404655_0
  • tensorrt-llm: 1.0.0
  • torch: 2.7.1
  • torchprofile: 0.0.4
  • torchvision: 0.22.1
  • tensorrt: 10.11.0.33

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import os
os.environ["TLLM_LLM_ENABLE_DEBUG"] = "1"
from tensorrt_llm import LLM, SamplingParams


def main():

    # The model can be an HF model name, a path to a local HF model,
    # or a TensorRT Model Optimizer quantized checkpoint such as
    # nvidia/Llama-3.1-8B-Instruct-FP8 on HF.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", backend="pytorch")

    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create the sampling parameters.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    for output in llm.generate(prompts, sampling_params):
        print(
            f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}"
        )

    # Example output from the official quickstart; note it includes a fourth
    # prompt ('The president of the United States is') not used above.
    # Prompt: 'Hello, my name is', Generated text: '\n\nJane Smith. I am a student pursuing my degree in Computer Science at [university]. I enjoy learning new things, especially technology and programming'
    # Prompt: 'The president of the United States is', Generated text: 'likely to nominate a new Supreme Court justice to fill the seat vacated by the death of Antonin Scalia. The Senate should vote to confirm the'
    # Prompt: 'The capital of France is', Generated text: 'Paris.'
    # Prompt: 'The future of AI is', Generated text: 'an exciting time for us. We are constantly researching, developing, and improving our platform to create the most advanced and efficient model available. We are'


if __name__ == '__main__':
    main()
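
To locate where the process is stuck, one option (a minimal sketch using only the standard library, not part of the original script) is to have faulthandler dump every thread's stack on a timer, added near the top of the script:

import faulthandler

# Dump every thread's Python stack to stderr after 60 seconds and keep
# repeating, so the frame that never returns shows up in the traces.
faulthandler.dump_traceback_later(60, repeat=True)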

Expected behavior

Successfully get model output for each prompt.

Actual behavior

/home/rookie/Qwen/.venv/bin/python3.10 /home/rookie/Qwen/trt_llm_demo.py
:1184: FutureWarning: The cuda.cuda module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.driver module instead.
:1184: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.
[2025-12-07 13:29:44] INFO config.py:54: PyTorch version 2.7.1 available.
LLM debug mode enabled.
[12/07/2025-13:29:46] [TRT-LLM] [I] Starting TensorRT LLM init.
2025-12-07 13:29:46,319 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[12/07/2025-13:29:46] [TRT-LLM] [I] TensorRT LLM inited.
[TensorRT-LLM] TensorRT LLM version: 1.0.0
[12/07/2025-13:29:46] [TRT-LLM] [I] Using LLM with PyTorch backend
[12/07/2025-13:29:46] [TRT-LLM] [W] Using default gpus_per_node: 1
[12/07/2025-13:29:46] [TRT-LLM] [I] Set nccl_plugin to None.
[12/07/2025-13:29:46] [TRT-LLM] [I] neither checkpoint_format nor checkpoint_loader were provided, checkpoint_format will be set to HF.
LLM.args.mpi_session: None
/home/rookie/Qwen/.venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
warnings.warn(
[12/07/2025-13:29:47] [TRT-LLM] [I] PyTorchConfig(extra_resource_managers={}, use_cuda_graph=True, cuda_graph_batch_sizes=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 64, 128], cuda_graph_max_batch_size=128, cuda_graph_padding_enabled=False, disable_overlap_scheduler=False, moe_max_num_tokens=None, moe_load_balancer=None, attention_dp_enable_balance=False, attention_dp_time_out_iters=50, attention_dp_batching_wait_iters=10, attn_backend='TRTLLM', moe_backend='CUTLASS', enable_mixed_sampler=False, enable_trtllm_sampler=False, kv_cache_dtype='auto', enable_iter_perf_stats=False, enable_iter_req_stats=False, print_iter_log=False, torch_compile_enabled=False, torch_compile_fullgraph=True, torch_compile_inductor_enabled=False, torch_compile_piecewise_cuda_graph=False, torch_compile_enable_userbuffers=True, torch_compile_max_num_streams=1, enable_autotuner=True, enable_layerwise_nvtx_marker=False, load_format=<LoadFormat.AUTO: 0>, enable_min_latency=False, allreduce_strategy='AUTO', stream_interval=1, force_dynamic_quantization=False, _limit_torch_cuda_mem_fraction=True)
create pool session ...
rank 0 using MpiPoolSession to spawn MPI processes
Server [proxy_request_queue] bound to tcp://127.0.0.1:45799 in PAIR
[12/07/2025-13:29:47] [TRT-LLM] [I] Generating a new HMAC key for server proxy_request_queue
Server [worker_init_status_queue] bound to tcp://127.0.0.1:45971 in PAIR
[12/07/2025-13:29:47] [TRT-LLM] [I] Generating a new HMAC key for server worker_init_status_queue
Server [proxy_result_queue] bound to tcp://127.0.0.1:46275 in PAIR
[12/07/2025-13:29:47] [TRT-LLM] [I] Generating a new HMAC key for server proxy_result_queue
Server [proxy_stats_queue] bound to tcp://127.0.0.1:45893 in PAIR
[12/07/2025-13:29:47] [TRT-LLM] [I] Generating a new HMAC key for server proxy_stats_queue
Server [proxy_kv_cache_events_queue] bound to tcp://127.0.0.1:45993 in PAIR
[12/07/2025-13:29:47] [TRT-LLM] [I] Generating a new HMAC key for server proxy_kv_cache_events_queue

Additional notes
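
The log stops right after "rank 0 using MpiPoolSession to spawn MPI processes" and the ZMQ proxy queues are bound, which suggests the spawned MPI worker never reports back through worker_init_status_queue. A minimal sketch to check whether MPI dynamic process spawning works at all in this environment, assuming MpiPoolSession sits on top of mpi4py (a hypothetical standalone script, not part of TensorRT-LLM):

# check_mpi_spawn.py -- if this also hangs, the problem is likely in the
# local MPI installation rather than in TensorRT-LLM itself.
from mpi4py.futures import MPIPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    # MPIPoolExecutor spawns its worker processes through MPI dynamic
    # process management, the same mechanism a pool session relies on.
    with MPIPoolExecutor(max_workers=1) as executor:
        print(list(executor.map(square, [1, 2, 3])))

Run it as a plain Python script (python check_mpi_spawn.py); mpi4py's futures module performs the spawn itself.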

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
