|
| 1 | +<!-- |
| 2 | +Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 3 | +
|
| 4 | +Redistribution and use in source and binary forms, with or without |
| 5 | +modification, are permitted provided that the following conditions |
| 6 | +are met: |
| 7 | + * Redistributions of source code must retain the above copyright |
| 8 | + notice, this list of conditions and the following disclaimer. |
| 9 | + * Redistributions in binary form must reproduce the above copyright |
| 10 | + notice, this list of conditions and the following disclaimer in the |
| 11 | + documentation and/or other materials provided with the distribution. |
| 12 | + * Neither the name of NVIDIA CORPORATION nor the names of its |
| 13 | + contributors may be used to endorse or promote products derived |
| 14 | + from this software without specific prior written permission. |
| 15 | +
|
| 16 | +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY |
| 17 | +EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE |
| 18 | +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR |
| 19 | +PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR |
| 20 | +CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, |
| 21 | +EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, |
| 22 | +PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR |
| 23 | +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY |
| 24 | +OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT |
| 25 | +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE |
| 26 | +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| 27 | +--> |
| 28 | + |
| 29 | +# GenAI-Perf Compare Subcommand |
| 30 | + |
| 31 | +There are two ways to use the `compare` subcommand to create plots |
| 32 | +across multiple runs. The first is to pass the profile export files directly |
| 33 | +with the `--files` option. The second is to pass a YAML configuration file with the `--config` option. |
| 34 | + |
| 35 | +## Running initially with `--files` option |
| 36 | + |
| 37 | +If the user does not have a YAML configuration file, |
| 38 | +they can run the `compare` subcommand with the `--files` option to generate a |
| 39 | +set of default plots as well as a pre-filled YAML config file for the plots. |
| 40 | + |
| 41 | +```bash |
| 42 | +genai-perf compare --files profile1.json profile2.json profile3.json |
| 43 | +``` |
| 44 | + |
| 45 | +This will generate the default plots and compare across the three runs. |
| 46 | +GenAI-Perf will also generate an initial YAML configuration file `config.yaml` |
| 47 | +that is pre-filled with plot configurations as follows: |
| 48 | + |
| 49 | +```yaml |
| 50 | +plot1: |
| 51 | + title: Time to First Token |
| 52 | + x_metric: '' |
| 53 | + y_metric: time_to_first_tokens |
| 54 | + x_label: Time to First Token (ms) |
| 55 | + y_label: '' |
| 56 | + width: 1200 |
| 57 | + height: 700 |
| 58 | + type: box |
| 59 | + paths: |
| 60 | + - profile1.json |
| 61 | + - profile2.json |
| 62 | + - profile3.json |
| 63 | + output: compare |
| 64 | +plot2: |
| 65 | + title: Request Latency |
| 66 | + x_metric: '' |
| 67 | + y_metric: request_latencies |
| 68 | + x_label: Request Latency (ms) |
| 69 | + y_label: '' |
| 70 | + width: 1200 |
| 71 | + height: 700 |
| 72 | + type: box |
| 73 | + paths: |
| 74 | + - profile1.json |
| 75 | + - profile2.json |
| 76 | + - profile3.json |
| 77 | + output: compare |
| 78 | +plot3: |
| 79 | + title: Distribution of Input Tokens to Generated Tokens |
| 80 | + x_metric: num_input_tokens |
| 81 | + y_metric: num_output_tokens |
| 82 | + x_label: Number of Input Tokens Per Request |
| 83 | + y_label: Number of Generated Tokens Per Request |
| 84 | + width: 1200 |
| 85 | + height: 450 |
| 86 | + type: heatmap |
| 87 | + paths: |
| 88 | + - profile1.json |
| 89 | + - profile2.json |
| 90 | + - profile3.json |
| 91 | + output: compare |
| 92 | +plot4: |
| 93 | + title: Time to First Token vs Number of Input Tokens |
| 94 | + x_metric: num_input_tokens |
| 95 | + y_metric: time_to_first_tokens |
| 96 | + x_label: Number of Input Tokens |
| 97 | + y_label: Time to First Token (ms) |
| 98 | + width: 1200 |
| 99 | + height: 700 |
| 100 | + type: scatter |
| 101 | + paths: |
| 102 | + - profile1.json |
| 103 | + - profile2.json |
| 104 | + - profile3.json |
| 105 | + output: compare |
| 106 | +plot5: |
| 107 | + title: Token-to-Token Latency vs Output Token Position |
| 108 | + x_metric: token_positions |
| 109 | + y_metric: inter_token_latencies |
| 110 | + x_label: Output Token Position |
| 111 | + y_label: Token-to-Token Latency (ms) |
| 112 | + width: 1200 |
| 113 | + height: 700 |
| 114 | + type: scatter |
| 115 | + paths: |
| 116 | + - profile1.json |
| 117 | + - profile2.json |
| 118 | + - profile3.json |
| 119 | + output: compare |
| 120 | +``` |
| 121 | +
|
| 122 | +Once the user has the YAML configuration file, |
| 123 | +they can repeat the process of editing the config file and running with |
| 124 | +the `--config` option to re-generate the plots iteratively. |
| 125 | + |
| 126 | +```bash |
| 127 | +# edit |
| 128 | +vi config.yaml |
| 129 | +
|
| 130 | +# re-generate the plots |
| 131 | +genai-perf compare --config config.yaml |
| 132 | +``` |
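
When the edit-and-regenerate loop is scripted, the two invocation modes shown above can be wrapped in a small helper. The following is a minimal Python sketch that only builds the argument list and runs nothing. `build_compare_command` is a hypothetical name, and treating `--files` and `--config` as alternatives is an assumption that mirrors the two workflows in this document rather than documented CLI behavior.

```python
from typing import List, Optional, Sequence


def build_compare_command(
    files: Optional[Sequence[str]] = None,
    config: Optional[str] = None,
) -> List[str]:
    """Build the argv for `genai-perf compare` (hypothetical helper).

    Assumption: exactly one of `files` or `config` is supplied, matching
    the two workflows described in this document.
    """
    if bool(files) == bool(config):
        raise ValueError("pass either files or config, but not both")

    cmd = ["genai-perf", "compare"]
    if files:
        # First run: generate default plots and an initial config.yaml.
        cmd += ["--files", *files]
    else:
        # Later runs: re-generate plots from the edited configuration.
        cmd += ["--config", config]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)` if the script should actually launch the command.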
| 133 | + |
| 134 | +## Running directly with `--config` option |
| 135 | + |
| 136 | +If the user would like to create a custom plot (other than the default ones provided), |
| 137 | +they can build their own YAML configuration file that contains the information |
| 138 | +about the plots they would like to generate. |
| 139 | +For instance, if the user would like to see how inter-token latencies change |
| 140 | +with the number of output tokens, which is not part of the default plots, |
| 141 | +they could add the following YAML block to the file: |
| 142 | + |
| 143 | +```yaml |
| 144 | +plot1: |
| 145 | + title: Inter Token Latency vs Output Tokens |
| 146 | + x_metric: num_output_tokens |
| 147 | + y_metric: inter_token_latencies |
| 148 | + x_label: Num Output Tokens |
| 149 | + y_label: Avg ITL (ms) |
| 150 | + width: 1200 |
| 151 | + height: 450 |
| 152 | + type: scatter |
| 153 | + paths: |
| 154 | + - <path-to-profile-export-file> |
| 155 | + - <path-to-profile-export-file> |
| 156 | + output: compare |
| 157 | +``` |
| 158 | + |
| 159 | +After adding the lines, the user can run the following command to generate the |
| 160 | +plots specified in the configuration file (in this case, `config.yaml`): |
| 161 | + |
| 162 | +```bash |
| 163 | +genai-perf compare --config config.yaml |
| 164 | +``` |
| 165 | + |
| 166 | +The user can check the generated plots under the output directory: |
| 167 | +``` |
| 168 | +compare/ |
| 169 | +├── inter_token_latency_vs_output_tokens.jpeg |
| 170 | +└── ... |
| 171 | +``` |
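
The file name in the listing above appears to follow from the plot title: lowercased, with spaces replaced by underscores, plus a `.jpeg` extension. The Python sketch below reproduces that pattern so a script can predict where a plot will land. The rule is inferred from the sample file names in this document and is an assumption, not a documented guarantee; `expected_plot_filename` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expected_plot_filename(title: str, output_dir: str = "compare") -> str:
    """Guess the image path for a plot title (inferred naming rule).

    Assumption: the title is lowercased and whitespace becomes underscores;
    other characters such as hyphens are kept as they are.
    """
    stem = "_".join(title.lower().split())
    return str(PurePosixPath(output_dir) / f"{stem}.jpeg")


print(expected_plot_filename("Inter Token Latency vs Output Tokens"))
```

If the tool's naming differs in some release, only the one-line `stem` rule needs to change.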
| 172 | + |
| 173 | +## YAML Schema |
| 174 | + |
| 175 | +Here are more details about the YAML configuration file and its structure. |
| 176 | +The general YAML schema for the plot configuration looks as follows: |
| 177 | + |
| 178 | +```yaml |
| 179 | +plot1: |
| 180 | + title: [str] |
| 181 | + x_metric: [str] |
| 182 | + y_metric: [str] |
| 183 | + x_label: [str] |
| 184 | + y_label: [str] |
| 185 | + width: [int] |
| 186 | + height: [int] |
| 187 | + type: [scatter,box,heatmap] |
| 188 | + paths: |
| 189 | + - [str] |
| 190 | + - ... |
| 191 | + output: [str] |
| 192 | +
|
| 193 | +plot2: |
| 194 | + title: [str] |
| 195 | + x_metric: [str] |
| 196 | + y_metric: [str] |
| 197 | + x_label: [str] |
| 198 | + y_label: [str] |
| 199 | + width: [int] |
| 200 | + height: [int] |
| 201 | + type: [scatter,box,heatmap] |
| 202 | + paths: |
| 203 | + - [str] |
| 204 | + - ... |
| 205 | + output: [str] |
| 206 | +
|
| 207 | +# add more plots |
| 208 | +``` |
| 209 | + |
| 210 | +The user can add as many plots as they would like by adding plot |
| 211 | +blocks to the configuration file (the blocks have a key pattern of `plot<#>`, |
| 212 | +but that is not required and the user can set the key to any arbitrary string). |
| 213 | +For each plot block, the user can specify the following configurations: |
| 214 | +- `title`: The title of the plot. |
| 215 | +- `x_metric`: The name of the metric to be used on the x-axis. |
| 216 | +- `y_metric`: The name of the metric to be used on the y-axis. |
| 217 | +- `x_label`: The x-axis label (or description). |
| 218 | +- `y_label`: The y-axis label (or description). |
| 219 | +- `width`: The width of the entire plot. |
| 220 | +- `height`: The height of the entire plot. |
| 221 | +- `type`: The type of the plot. It must be one of the three: `scatter`, `box`, |
| 222 | +or `heatmap`. |
| 223 | +- `paths`: List of paths to the profile export files to compare. |
| 224 | +- `output`: The path to the output directory to store all the plots and YAML |
| 225 | +configuration file. |
| 226 | + |
| 227 | +> [!Note] |
| 228 | +> The user *MUST* provide at least one valid path to a profile export file. |
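
The schema above can be checked before running `compare`. Below is a minimal Python sketch of such a check, operating on a configuration that has already been loaded into a dictionary (for example with a YAML parser such as PyYAML's `yaml.safe_load`, which is an assumption about the reader's setup). The names `validate_plot_block` and `validate_config` are hypothetical, and treating every key as required is also an assumption; the document only states the allowed plot types and the need for at least one path.

```python
from typing import Any, Dict, List

# Expected Python type for each key in a plot block (taken from the schema).
EXPECTED_TYPES = {
    "title": str, "x_metric": str, "y_metric": str,
    "x_label": str, "y_label": str,
    "width": int, "height": int,
    "type": str, "paths": list, "output": str,
}
VALID_PLOT_TYPES = ("scatter", "box", "heatmap")


def validate_plot_block(name: str, block: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one plot block."""
    errors = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in block:
            errors.append(f"{name}: missing key '{key}'")
            continue
        value = block[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{name}: '{key}' should be {expected.__name__}")

    if isinstance(block.get("type"), str) and block["type"] not in VALID_PLOT_TYPES:
        errors.append(f"{name}: 'type' must be one of {', '.join(VALID_PLOT_TYPES)}")

    paths = block.get("paths")
    if isinstance(paths, list):
        if not paths:
            errors.append(f"{name}: 'paths' needs at least one profile export file")
        elif not all(isinstance(p, str) for p in paths):
            errors.append(f"{name}: every entry in 'paths' should be str")
    return errors


def validate_config(config: Dict[str, Dict[str, Any]]) -> List[str]:
    """Validate every plot block; block keys may be arbitrary strings."""
    errors = []
    for name, block in config.items():
        errors.extend(validate_plot_block(name, block))
    return errors
```

The sketch checks only the shape of the configuration. Whether each path points to an existing, well-formed profile export file would need a separate check.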
| 229 | + |
| 230 | + |
| 231 | + |
| 232 | +## Example Plots |
| 233 | + |
| 234 | +Here is the list of sample plots that are created by default when running the |
| 235 | +`compare` subcommand: |
| 236 | + |
| 237 | +### Distribution of Input Tokens to Generated Tokens |
| 238 | +<img src="assets/distribution_of_input_tokens_to_generated_tokens.jpeg" width="800" height="300" /> |
| 239 | + |
| 240 | +### Request Latency Analysis |
| 241 | +<img src="assets/request_latency.jpeg" width="800" height="300" /> |
| 242 | + |
| 243 | +### Time to First Token Analysis |
| 244 | +<img src="assets/time_to_first_token.jpeg" width="800" height="300" /> |
| 245 | + |
| 246 | +### Time to First Token vs. Number of Input Tokens |
| 247 | +<img src="assets/time_to_first_token_vs_number_of_input_tokens.jpeg" width="800" height="300" /> |
| 248 | + |
| 249 | +### Token-to-Token Latency vs. Output Token Position |
| 250 | +<img src="assets/token-to-token_latency_vs_output_token_position.jpeg" width="800" height="300" /> |
| 251 | + |