Commit f93f012

Add compare subcommand to README (#664)

* Add compare subcommand to README
* Fix rendering issue
* Add compare.md
* Add breakpoints
* Clean up
* Add sample plots
* Adjust size
* Add height to plot
* Add a section about default visualization
* Address feedback

1 parent 76ac69a commit f93f012

File tree

7 files changed: +334 −4 lines changed

src/c++/perf_analyzer/genai-perf/README.md

Lines changed: 83 additions & 4 deletions
@@ -194,6 +194,83 @@ Request throughput (per sec): 4.44

See [Tutorial](docs/tutorial.md) for additional examples.

<br/>

# Visualization

GenAI-Perf can also generate various plots that visualize the performance of the
current profile run. This is disabled by default, but users can easily enable it
by passing the `--generate-plots` option when running the benchmark:

```bash
genai-perf \
  -m gpt2 \
  --service-kind triton \
  --backend tensorrtllm \
  --streaming \
  --concurrency 1 \
  --generate-plots
```

This will generate a [set of default plots](docs/compare.md#example-plots) such as:
- Time to first token (TTFT) analysis
- Request latency analysis
- TTFT vs. number of input tokens
- Inter-token latencies vs. token positions
- Number of input tokens vs. number of output tokens

## Using the `compare` Subcommand to Visualize Multiple Runs

The `compare` subcommand in GenAI-Perf lets users compare multiple
profile runs and visualize the differences through plots.

### Usage
Assuming the user has two profile export JSON files,
`profile1.json` and `profile2.json`,
they can execute the `compare` subcommand using the `--files` option:

```bash
genai-perf compare --files profile1.json profile2.json
```

Executing the above command will perform the following actions under the
`compare` directory:
1. Generate a YAML configuration file (e.g. `config.yaml`) containing the
metadata for each plot generated during the comparison process.
2. Automatically generate the [default set of plots](docs/compare.md#example-plots)
(e.g. TTFT vs. number of input tokens) that compare the two profile runs.

```
compare
├── config.yaml
├── distribution_of_input_tokens_to_generated_tokens.jpeg
├── request_latency.jpeg
├── time_to_first_token.jpeg
├── time_to_first_token_vs_number_of_input_tokens.jpeg
├── token-to-token_latency_vs_output_token_position.jpeg
└── ...
```

### Customization
Users can iteratively modify the generated YAML configuration
file to suit their specific requirements.
They can alter the plots according to their preferences and execute
the command with the `--config` option followed by the path to the modified
configuration file:

```bash
genai-perf compare --config compare/config.yaml
```

This command regenerates the plots based on the updated configuration settings,
enabling users to refine the visual representation of the comparison results as
per their needs.

See [Compare documentation](docs/compare.md) for more details.

<br/>

# Model Inputs

GenAI-Perf supports model input prompts from either synthetically generated
@@ -203,8 +280,7 @@ inputs, or from the HuggingFace
specified using the `--input-dataset` CLI option.

When the dataset is synthetic, you can specify the following options:
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1.
* `--synthetic-input-tokens-mean <int>`: The mean number of tokens in the
generated prompts when using synthetic data, >= 1.
* `--synthetic-input-tokens-stddev <int>`: The standard deviation of the number of
@@ -215,8 +291,7 @@ When the dataset is coming from HuggingFace, you can specify the following
options:
* `--input-dataset {openorca,cnn_dailymail}`: HuggingFace dataset to use for
benchmarking.
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1.

When the dataset is coming from a file, you can specify the following
options:
@@ -240,6 +315,8 @@ You can optionally set additional model inputs with the following option:
model with a singular value, such as `stream:true` or `max_tokens:5`. This
flag can be repeated to supply multiple extra inputs.

<br/>

# Metrics

GenAI-Perf collects a diverse set of metrics that capture the performance of
@@ -254,6 +331,8 @@ the inference server.
| <span id="output_token_throughput_metric">Output Token Throughput</span> | Total number of output tokens from benchmark divided by benchmark duration | None–one value per benchmark |
| <span id="request_throughput_metric">Request Throughput</span> | Number of final responses from benchmark divided by benchmark duration | None–one value per benchmark |
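Both throughput metrics in the table are simple ratios over the benchmark duration. A quick sketch of the arithmetic, using made-up counts chosen so the request throughput matches the 4.44 req/sec shown in the sample output earlier:

```python
# Worked example of the two benchmark-level throughput formulas above.
# The counts and duration are illustrative values, not real benchmark results.
total_output_tokens = 10_000   # total output tokens across the benchmark
final_responses = 200          # number of final responses received
benchmark_duration_s = 45.0    # benchmark duration in seconds

output_token_throughput = total_output_tokens / benchmark_duration_s
request_throughput = final_responses / benchmark_duration_s

print(f"Output token throughput: {output_token_throughput:.1f} tokens/sec")
print(f"Request throughput: {request_throughput:.2f} req/sec")
```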

<br/>

# Command Line Options

##### `-h`
(Five sample plot images added under `docs/assets/`.)
src/c++/perf_analyzer/genai-perf/docs/compare.md

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
<!--
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
 * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
 * Neither the name of NVIDIA CORPORATION nor the names of its
   contributors may be used to endorse or promote products derived
   from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# GenAI-Perf Compare Subcommand

There are two ways to use the `compare` subcommand to create plots across
multiple runs: pass the profile export files directly with the `--files`
option, or pass a YAML configuration file describing the plots with the
`--config` option.

## Running initially with the `--files` option

If the user does not have a YAML configuration file,
they can run the `compare` subcommand with the `--files` option to generate a
set of default plots as well as a pre-filled YAML config file for the plots.

```bash
genai-perf compare --files profile1.json profile2.json profile3.json
```

This will generate the default plots and compare across the three runs.
GenAI-Perf will also generate an initial YAML configuration file `config.yaml`
that is pre-filled with plot configurations as follows:

```yaml
plot1:
  title: Time to First Token
  x_metric: ''
  y_metric: time_to_first_tokens
  x_label: Time to First Token (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot2:
  title: Request Latency
  x_metric: ''
  y_metric: request_latencies
  x_label: Request Latency (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot3:
  title: Distribution of Input Tokens to Generated Tokens
  x_metric: num_input_tokens
  y_metric: num_output_tokens
  x_label: Number of Input Tokens Per Request
  y_label: Number of Generated Tokens Per Request
  width: 1200
  height: 450
  type: heatmap
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot4:
  title: Time to First Token vs Number of Input Tokens
  x_metric: num_input_tokens
  y_metric: time_to_first_tokens
  x_label: Number of Input Tokens
  y_label: Time to First Token (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot5:
  title: Token-to-Token Latency vs Output Token Position
  x_metric: token_positions
  y_metric: inter_token_latencies
  x_label: Output Token Position
  y_label: Token-to-Token Latency (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
```

Once the user has the YAML configuration file,
they can iterate by editing the config file and re-running with the
`--config` option to regenerate the plots.

```bash
# edit the configuration
vi config.yaml

# regenerate the plots
genai-perf compare --config config.yaml
```

## Running directly with the `--config` option

If the user would like to create a custom plot (other than the default ones provided),
they can build their own YAML configuration file containing the information
about the plots they would like to generate.
For instance, if the user would like to see how the inter-token latencies change
with the number of output tokens, which is not part of the default plots,
they could add the following YAML block to the file:

```yaml
plot1:
  title: Inter Token Latency vs Output Tokens
  x_metric: num_output_tokens
  y_metric: inter_token_latencies
  x_label: Num Output Tokens
  y_label: Avg ITL (ms)
  width: 1200
  height: 450
  type: scatter
  paths:
  - <path-to-profile-export-file>
  - <path-to-profile-export-file>
  output: compare
```

After adding the lines, the user can run the following command to generate the
plots specified in the configuration file (in this case, `config.yaml`):

```bash
genai-perf compare --config config.yaml
```

The user can check the generated plots under the output directory:
```
compare/
├── inter_token_latency_vs_output_tokens.jpeg
└── ...
```

## YAML Schema

Here are more details about the YAML configuration file and its structure.
The general YAML schema for the plot configuration looks as follows:

```yaml
plot1:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]

plot2:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]

# add more plots
```

The user can add as many plots as they would like by adding plot blocks to
the configuration file (the blocks above follow a key pattern of `plot<#>`,
but that is not required; the key can be any arbitrary string).
For each plot block, the user can specify the following configurations:
- `title`: The title of the plot.
- `x_metric`: The name of the metric to be used on the x-axis.
- `y_metric`: The name of the metric to be used on the y-axis.
- `x_label`: The x-axis label (or description).
- `y_label`: The y-axis label (or description).
- `width`: The width of the entire plot.
- `height`: The height of the entire plot.
- `type`: The type of the plot. It must be one of the three: `scatter`, `box`,
or `heatmap`.
- `paths`: List of paths to the profile export files to compare.
- `output`: The path to the output directory to store all the plots and the
YAML configuration file.

> [!NOTE]
> The user *must* provide at least one valid path to a profile export file.

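Since a hand-edited configuration must still satisfy this schema, it can help to sanity-check the plot blocks before re-running `compare`. The small helper below is our own illustration (not part of GenAI-Perf); it checks each block, represented as a parsed dictionary, against the fields listed above:

```python
# Hypothetical schema check for compare plot blocks (illustration only).
# Field names and valid plot types come from the schema described above.
REQUIRED_FIELDS = {
    "title", "x_metric", "y_metric", "x_label", "y_label",
    "width", "height", "type", "paths", "output",
}
VALID_TYPES = {"scatter", "box", "heatmap"}

def validate_plot_config(config: dict) -> list[str]:
    """Return a list of problems found; an empty list means the config looks valid."""
    problems = []
    for name, block in config.items():
        missing = REQUIRED_FIELDS - set(block)
        if missing:
            problems.append(f"{name}: missing fields {sorted(missing)}")
        if block.get("type") not in VALID_TYPES:
            problems.append(f"{name}: 'type' must be scatter, box, or heatmap")
        if not block.get("paths"):
            problems.append(f"{name}: needs at least one profile export path")
    return problems

# A well-formed block, mirroring the pre-filled plot1 shown earlier.
example = {
    "plot1": {
        "title": "Time to First Token",
        "x_metric": "",
        "y_metric": "time_to_first_tokens",
        "x_label": "Time to First Token (ms)",
        "y_label": "",
        "width": 1200,
        "height": 700,
        "type": "box",
        "paths": ["profile1.json"],
        "output": "compare",
    }
}
print(validate_plot_config(example))  # [] -> the block is well-formed
```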
## Example Plots

Here is the list of sample plots that are created by default when running the
`compare` subcommand:

### Distribution of Input Tokens to Generated Tokens
<img src="assets/distribution_of_input_tokens_to_generated_tokens.jpeg" width="800" height="300" />

### Request Latency Analysis
<img src="assets/request_latency.jpeg" width="800" height="300" />

### Time to First Token Analysis
<img src="assets/time_to_first_token.jpeg" width="800" height="300" />

### Time to First Token vs. Number of Input Tokens
<img src="assets/time_to_first_token_vs_number_of_input_tokens.jpeg" width="800" height="300" />

### Token-to-Token Latency vs. Output Token Position
<img src="assets/token-to-token_latency_vs_output_token_position.jpeg" width="800" height="300" />