Add vllm:time_per_output_token_seconds and vllm:time_to_first_token_seconds metrics (#217)
* Add vllm:time_per_output_token_seconds and vllm:time_to_first_token_seconds histogram metrics, including support in fake metrics, and update the README
Signed-off-by: Maya Barnea <[email protected]>
* Add test for the ttft fake metrics command line parameter with a value for the last bucket
Signed-off-by: Maya Barnea <[email protected]>
* Move the model name calculation out of a loop
Signed-off-by: Maya Barnea <[email protected]>
* Changes according to the PR review
Signed-off-by: Maya Barnea <[email protected]>
* Changes according to review comments
Signed-off-by: Maya Barnea <[email protected]>
---------
Signed-off-by: Maya Barnea <[email protected]>
README.md: 6 additions & 4 deletions
@@ -143,10 +143,12 @@ For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started
     - `running-requests`
     - `waiting-requests`
     - `kv-cache-usage`
-    - `loras` - an array containing LoRA information objects, each with the fields: `running` (a comma-separated list of LoRAs in use by running requests), `waiting` (a comma-separated list of LoRAs to be used by waiting requests), and `timestamp` (seconds since Jan 1 1970, the timestamp of this metric).
+    - `loras` - an array containing LoRA information objects, each with the fields: `running` (a comma-separated list of LoRAs in use by running requests), `waiting` (a comma-separated list of LoRAs to be used by waiting requests), and `timestamp` (seconds since Jan 1 1970, the timestamp of this metric).
+    - `ttft-buckets-values` - array of values for time-to-first-token buckets, each value in this array is a value for the corresponding bucket. Array may contain less values than number of buckets, all trailing missing values assumed as 0. Buckets upper boundaries are: 0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0, 160.0, 640.0, 2560.0, +Inf.
+    - `tpot-buckets-values` - array of values for time-per-output-token buckets, each value in this array is a value for the corresponding bucket. Array may contain less values than number of buckets, all trailing missing values assumed as 0. Buckets upper boundaries are: 0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0, +Inf.
 - `data-parallel-size`: number of ranks to run in Data Parallel deployment, from 1 to 8, default is 1. The ports will be assigned as follows: rank 0 will run on the configured `port`, rank 1 on `port`+1, etc.
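
The padding rule described for `ttft-buckets-values` and `tpot-buckets-values` (trailing missing values are treated as 0) can be illustrated with a minimal Python sketch. This is not the simulator's code: the function names `pad_bucket_values` and `pair_with_bounds` are hypothetical, and rejecting an array longer than the bucket list is an assumption, since the README text does not say how that case is handled. The bucket boundaries are copied from the README text above.

```python
import math

# Upper bucket boundaries as listed in the README text (seconds).
TTFT_BUCKET_BOUNDS = [
    0.001, 0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.25, 0.5, 0.75, 1.0,
    2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0, 160.0, 640.0, 2560.0, math.inf,
]
TPOT_BUCKET_BOUNDS = [
    0.01, 0.025, 0.05, 0.075, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1.0,
    2.5, 5.0, 7.5, 10.0, 20.0, 40.0, 80.0, math.inf,
]


def pad_bucket_values(values, bounds):
    """Return a copy of `values` with trailing zeros, one entry per bucket.

    Hypothetical helper. Raising on too many values is an assumption;
    the README only describes arrays that are shorter than the bucket list.
    """
    if len(values) > len(bounds):
        raise ValueError(
            f"got {len(values)} values for {len(bounds)} buckets"
        )
    return list(values) + [0] * (len(bounds) - len(values))


def pair_with_bounds(values, bounds):
    """Pair each upper boundary with its (zero-padded) bucket value."""
    return list(zip(bounds, pad_bucket_values(values, bounds)))


if __name__ == "__main__":
    # Only the first three TTFT buckets are given; the rest default to 0.
    for upper, value in pair_with_bounds([10, 20, 30], TTFT_BUCKET_BOUNDS):
        print(f"le={upper}: {value}")
```

Under these assumptions, an input of `[10, 20, 30]` fills the buckets with upper boundaries 0.001, 0.005, and 0.01, and every remaining bucket up to +Inf gets 0.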