
[Feature]: Add detailed inference logging similar to SGLang and vLLM #9778

@0xd8b


🚀 The feature, motivation and pitch

Description

I propose adding detailed, structured logging for the inference process (prefill/decode batches), similar to the excellent logging found in projects like SGLang and vLLM. This would greatly improve visibility into system performance and make debugging and monitoring easier.

Motivation & Expected Benefits

Debugging & Monitoring: Easily track request states, token usage, throughput, and CUDA graph status in real-time.

Performance Analysis: Monitor key metrics like #running-req, #queue-req, gen throughput (token/s), and token usage to identify bottlenecks (see the throughput sketch after this list).

Operational Clarity: Provides a clear, consistent log stream that helps developers and operators understand system behavior during inference.
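
For concreteness, here is a minimal Python sketch of one way the gen throughput (token/s) figure could be computed, as tokens generated per logging window. The ThroughputMeter name is hypothetical, not an existing TRT-LLM API:

```python
import time

class ThroughputMeter:
    """Tracks decode tokens per second over each logging window."""

    def __init__(self) -> None:
        self._last_time = time.monotonic()
        self._tokens_since_log = 0

    def add_tokens(self, n: int) -> None:
        # Called once per decode step with the number of tokens produced.
        self._tokens_since_log += n

    def snapshot(self) -> float:
        # Tokens/s since the previous snapshot; resets the window.
        now = time.monotonic()
        elapsed = max(now - self._last_time, 1e-9)
        tokens_per_s = self._tokens_since_log / elapsed
        self._last_time = now
        self._tokens_since_log = 0
        return tokens_per_s
```
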
Proposed Log Format Example

The logs should follow a structured, readable format like the sample below (inspired by the SGLang/vLLM style):

[YYYY-MM-DD HH:MM:SS] Prefill batch. #new-seq: 1, #new-token: 25, #cached-token: 1, token usage: 0.00, #running-req: 0, #queue-req: 0
[YYYY-MM-DD HH:MM:SS] Decode batch. #running-req: 1, #token: 283, token usage: 0.00, cuda graph: True, gen throughput (token/s): 108.30, #queue-req: 0
(See full example logs in the section below.)
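
As an illustration, here is a minimal Python sketch of a helper that emits these exact lines. The BatchStats fields and function names are hypothetical, not an existing TRT-LLM API; the `[YYYY-MM-DD HH:MM:SS]` prefix comes from the logging formatter:

```python
import logging
from dataclasses import dataclass

# The formatter supplies the "[YYYY-MM-DD HH:MM:SS]" timestamp prefix.
logging.basicConfig(format="[%(asctime)s] %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
logger = logging.getLogger("trtllm.batch")

@dataclass
class BatchStats:
    new_seq: int = 0          # sequences added in this prefill batch
    new_token: int = 0        # uncached prompt tokens to process
    cached_token: int = 0     # prompt tokens served from the KV cache
    token: int = 0            # total tokens held by running requests
    token_usage: float = 0.0  # fraction of KV-cache capacity in use
    running_req: int = 0
    queue_req: int = 0
    cuda_graph: bool = False
    gen_throughput: float = 0.0  # tokens/s, e.g. from a ThroughputMeter

def log_prefill(s: BatchStats) -> None:
    logger.info(
        "Prefill batch. #new-seq: %d, #new-token: %d, #cached-token: %d, "
        "token usage: %.2f, #running-req: %d, #queue-req: %d",
        s.new_seq, s.new_token, s.cached_token,
        s.token_usage, s.running_req, s.queue_req)

def log_decode(s: BatchStats) -> None:
    logger.info(
        "Decode batch. #running-req: %d, #token: %d, token usage: %.2f, "
        "cuda graph: %s, gen throughput (token/s): %.2f, #queue-req: %d",
        s.running_req, s.token, s.token_usage,
        s.cuda_graph, s.gen_throughput, s.queue_req)
```

With the config above, `log_decode(BatchStats(running_req=1, token=283, cuda_graph=True, gen_throughput=108.30))` reproduces the decode line from the sample.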

Example Log Output (from a test run)

[2025-12-08 03:57:29] Decode batch. #running-req: 1, #token: 283, token usage: 0.00, cuda graph: True, gen throughput (token/s): 0.07, #queue-req: 0
[2025-12-08 03:57:30] INFO:     10.40.32.80:34224 - "POST /v1/chat/completions HTTP/1.0" 200 OK
[2025-12-08 08:36:21] Prefill batch. #new-seq: 1, #new-token: 1, #cached-token: 249, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-12-08 08:36:21] Decode batch. #running-req: 1, #token: 271, token usage: 0.00, cuda graph: True, gen throughput (token/s): 0.00, #queue-req: 0
... (additional log lines)

Alternatives

None.

Additional context

This feature would be especially useful for high-throughput serving environments and performance tuning.

The implementation should ideally allow the logging level/verbosity to be configurable (e.g., via an environment variable or a config file), as sketched below.
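
A sketch of the environment-variable variant; the variable name TLLM_BATCH_LOG_LEVEL is an assumption for illustration, not an existing TRT-LLM setting:

```python
import logging
import os

# Hypothetical env var; TRT-LLM would choose its own name or config key.
level_name = os.environ.get("TLLM_BATCH_LOG_LEVEL", "INFO").upper()
logging.getLogger("trtllm.batch").setLevel(
    getattr(logging, level_name, logging.INFO))
```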

Reference: SGLang and vLLM both provide similarly detailed logging.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • Inference runtime<NV>: General operational aspects of TRTLLM execution not in other categories.
  • feature request: New feature or request. This includes new model, dtype, functionality support.
  • waiting for feedback
