We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
1 parent f5f4fc6 commit 0bb540aCopy full SHA for 0bb540a
src/inference_endpoint/testing/variable_throughput_server.py
@@ -523,7 +523,7 @@ class VariableResponseServer:
523
output_len_mean: Mean output sequence length (chars).
524
output_len_spread: Coefficient of variation for output length.
525
output_len_min: Minimum output sequence length (chars).
526
- output_len_max: Maximum output sequence length (chars). None = 8 * mean.
+ output_len_max: Maximum output sequence length (chars). None = 2 * mean.
527
response_rate_mean: Per-request response rate mean (responses/sec). 0 = no rate mode.
528
response_rate_spread: CoV for per-request response rate.
529
inter_token_latency: Per-token delay (TPOT) mean in milliseconds. 0 = no ICL mode.
0 commit comments