Commit e666a70

[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
1 parent 6ee1c87 commit e666a70

4 files changed: +13 -1 lines changed

docs/source/commands/trtllm-serve/run-benchmark-with-trtllm-serve.md

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Run benchmarking with `trtllm-serve`
 
-TensorRT LLM provides the OpenAI-compatiable API via `trtllm-serve` command.
+TensorRT LLM provides the OpenAI-compatible API via `trtllm-serve` command.
 A complete reference for the API is available in the [OpenAI API Reference](https://platform.openai.com/docs/api-reference).
 
 This step-by-step tutorial covers the following topics for running online serving benchmarking with Llama 3.1 70B and Qwen2.5-VL-7B for multimodal models:
@@ -190,6 +190,10 @@ Across different requests, **average TPOT** is the mean of each request's TPOT (
 \text{TPS} = \frac{\text{\#Output\ Tokens}}{T_{last} - T_{first}}
 ```
 
+### Request Time Breakdown
+
+For more detailed metrics beyond the key metrics above, there is an [experimental tool](https://github.com/NVIDIA/TensorRT-LLM/tree/main/tensorrt_llm/serve/scripts/time_breakdown) for request time breakdown.
+
 ## About `extra_llm_api_options`
 trtllm-serve provides `extra_llm_api_options` knob to **overwrite** the parameters specified by trtllm-serve.
 Generally, We create a YAML file that contains various performance switches.
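The hunk above refers to `extra_llm_api_options` without showing a concrete file, so here is a minimal sketch of what such a YAML and the serve invocation might look like. The option names `kv_cache_config.free_gpu_memory_fraction` and `enable_chunked_prefill` are illustrative LLM API arguments rather than values taken from this doc, and the `--extra_llm_api_options` flag spelling is assumed from the knob name; `perf_metrics_max_requests` is the key this same commit documents in the time breakdown README below.

```bash
# Sketch: write an extra-llm-api-config.yaml and point trtllm-serve at it.
# Option names are illustrative; consult the LLM API reference for the exact
# set supported by your TensorRT LLM version.
cat > extra-llm-api-config.yaml <<'EOF'
kv_cache_config:
  free_gpu_memory_fraction: 0.9   # illustrative performance switch
enable_chunked_prefill: true      # illustrative performance switch
perf_metrics_max_requests: 1000   # knob documented in the time breakdown README
EOF

# Flag spelling assumed from the knob name; <model> is a placeholder.
trtllm-serve <model> --extra_llm_api_options extra-llm-api-config.yaml
```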

docs/source/developer-guide/perf-benchmarking.md

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ easier for users to reproduce our officially published [performance overview](./
 the [in-flight batching section](../features/attention.md#inflight-batching) that describes the concept
 in further detail.
 
+To benchmark the OpenAI-compatible `trtllm-serve`, please refer to the [run benchmarking with `trtllm-serve`](../commands/trtllm-serve/run-benchmark-with-trtllm-serve.md) section.
+
 ## Before Benchmarking
 
 For rigorous benchmarking where consistent and reproducible results are critical, proper GPU configuration is essential. These settings help maximize GPU utilization, eliminate performance variability, and ensure optimal conditions for accurate measurements. While not strictly required for normal operation, we recommend applying these configurations when conducting performance comparisons or publishing benchmark results.
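The "proper GPU configuration" mentioned in that context usually comes down to enabling persistence mode and locking clocks so run-to-run variance is minimized. The commands below are a hedged sketch of that kind of setup rather than the doc's official checklist; the clock values are placeholders to be picked from your GPU's supported clocks.

```bash
# Illustrative GPU-pinning steps for reproducible benchmark runs.
sudo nvidia-smi -pm 1                         # enable persistence mode
nvidia-smi -q -d SUPPORTED_CLOCKS             # list clocks your GPU supports
sudo nvidia-smi -lgc <min_clock>,<max_clock>  # lock graphics clocks (placeholders)
# ... run the benchmark ...
sudo nvidia-smi -rgc                          # reset graphics clocks afterwards
```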

tensorrt_llm/serve/scripts/time_breakdown/README.md

Lines changed: 6 additions & 0 deletions
@@ -73,6 +73,11 @@ The tool aims to track detailed timing segments throughout the request lifecycle
 - **Time Period**: `gen_server_first_token_time` → `disagg_server_first_token_time`
 - **Description**: Routing overhead from generation server back through disagg server
 - **Includes**: Response forwarding, aggregation
+
+#### Visualization of Disaggregated Server Metrics
+The timepoints are recorded internally by TensorRT LLM's per-request performance metrics (also available via the LLM API) and the OpenAI-compatible server.
+![Visualization of Disaggregated Metrics](images/perf_metrics_timepoints.png)
+
 ## Input Format
 
 The tool expects a JSON file containing an array of request performance metrics (unit: seconds).
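As a quick sanity check on such a file, the snippet below sketches how one might pull out the disagg-routing overhead described in the hunk above. It assumes each array entry exposes the timepoints as top-level numeric fields named exactly as in this README (`gen_server_first_token_time`, `disagg_server_first_token_time`) and that the file is called `perf_metrics.json`; the real schema and filename are defined by the tool, so treat this purely as an illustration.

```bash
# Hypothetical: average routing overhead (seconds) between the generation
# server's first token and the disagg server's first token, given the JSON
# layout assumed above.
jq '[ .[]
      | select(.gen_server_first_token_time and .disagg_server_first_token_time)
      | .disagg_server_first_token_time - .gen_server_first_token_time ]
    | {count: length,
       avg_overhead_s: (if length > 0 then add / length else null end)}' perf_metrics.json
```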
@@ -139,6 +144,7 @@ Set
 perf_metrics_max_requests: <INTEGER>
 ```
 in the `extra-llm-api-config.yaml`. If you are running disaggregated serving, you should add configs for all servers (disagg, context and generation server).
+The server keeps at most `perf_metrics_max_requests` entries.
 
 Step 2:
 Add `--save-request-time-breakdown` when running `benchmark_serving.py`
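Putting the two steps together, an end-to-end sketch might look like the following. Only the `perf_metrics_max_requests` key and the `--save-request-time-breakdown` flag come from this diff; everything else is a placeholder for whatever arguments your benchmark run already uses.

```bash
# Step 1: every server involved (disagg, context, generation) should have
# something like this in its extra-llm-api-config.yaml:
#   perf_metrics_max_requests: 1000
#
# Step 2: ask the benchmark script to dump the time breakdown;
# <other args> stands in for your existing benchmark_serving.py arguments.
python benchmark_serving.py <other args> --save-request-time-breakdown
```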
tensorrt_llm/serve/scripts/time_breakdown/images/perf_metrics_timepoints.png

126 KB binary image (not rendered in the diff view)
