docs/cli-options.md (2 additions, 2 deletions)

--- a/docs/cli-options.md
+++ b/docs/cli-options.md
@@ -167,7 +167,7 @@ Use the legacy 'max_tokens' field instead of 'max_completion_tokens' in request
 
 #### `--use-server-token-count`
 
-[Deprecated] This flag is a no-op and will be removed in a future release. AIPerf now always computes both client-side and server-reported token counts. Server counts are preferred for output metrics; client counts are used for input validation.
+[Deprecated] This flag is a no-op and will be removed in a future release. AIPerf now prefers server-reported token counts when available and falls back to client-side tokenization for input. Use --tokenize-output to enable client-side output tokenization.
 
 <br/>_Flag (no value required)_
 
 #### `--stream-usage`, `--no-stream-usage`
@@ -675,7 +675,7 @@ Enable client-side tokenization of output and reasoning tokens, even when the se
 
 #### `--tokenize-input`, `--no-tokenize-input`
 
-Enable client-side tokenization of input prompts for every request. When enabled, locally computed input token counts are always stored in token_counts.input_local. When disabled, client-side input tokenization only occurs as a fallback when the server does not report prompt tokens. Automatically set to False for user-provided input datasets (--custom-dataset-type or --public-dataset) unless explicitly overridden.
+Enable client-side tokenization of input prompts for every request. When enabled, locally computed input token counts are always stored in token_counts.input_local. When disabled, client-side input tokenization only occurs as a fallback when the server does not report prompt tokens. Use --no-tokenize-input to disable client-side input tokenization entirely, including fallback. Automatically set to False for user-provided input datasets (--custom-dataset-type or --public-dataset) unless explicitly overridden.
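The updated help text implies three distinct input-tokenization behaviors, depending on whether the flag was passed explicitly. The minimal Python sketch below shows one way to resolve them. It assumes the flag defaults to enabled when it is not passed and the dataset is synthetic; `InputTokenizationMode` and `resolve_input_tokenization` are hypothetical names, not AIPerf's API.

```python
from enum import Enum
from typing import Optional


class InputTokenizationMode(Enum):
    """Hypothetical labels for the three behaviors described in the help text."""

    ALWAYS = "always"      # tokenize every prompt; store token_counts.input_local
    FALLBACK = "fallback"  # tokenize only when the server omits prompt tokens
    NEVER = "never"        # no client-side input tokenization, not even as fallback


def resolve_input_tokenization(
    tokenize_input: Optional[bool], user_provided_dataset: bool
) -> InputTokenizationMode:
    """Map --tokenize-input / --no-tokenize-input to a mode (hypothetical helper).

    tokenize_input is None when neither flag was passed on the command line.
    """
    if tokenize_input is True:
        # Explicit --tokenize-input wins, even for user-provided datasets.
        return InputTokenizationMode.ALWAYS
    if tokenize_input is False:
        # Explicit --no-tokenize-input also disables the fallback path.
        return InputTokenizationMode.NEVER
    # No explicit flag: the automatic default is False for user-provided
    # datasets, which still leaves fallback tokenization active.
    if user_provided_dataset:
        return InputTokenizationMode.FALLBACK
    # Assumption: otherwise the flag defaults to enabled.
    return InputTokenizationMode.ALWAYS
```

The key distinction is between "disabled by default" and "explicitly disabled". Only the explicit flag turns off the fallback.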

 - When the server reports `usage.prompt_tokens`, that value is used for ISL (and thus for console display and derived metrics).
-- Falls back to client-side tokenization when server does not report prompt token counts.
+- Falls back to client-side tokenization when server does not report prompt token counts, unless `--no-tokenize-input` is explicitly specified.
 - Client-side tokenization uses `add_special_tokens=False` to count only content tokens.
 - Automatically disabled for user-provided input datasets; use `--tokenize-input` to force.
-- Use `--no-tokenize-input` to skip when relying on server-reported prompt tokens.
+- Use `--no-tokenize-input` to disable client-side input tokenization entirely (no fallback).
 - Useful for understanding the relationship between input size and latency/throughput.
 
 ---
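Taken together, the updated ISL bullets describe a per-request precedence rule. The sketch below illustrates that rule. It assumes a pluggable `count_tokens` callable in place of a real tokenizer; `resolve_isl` and its parameters are hypothetical, not taken from AIPerf.

```python
from typing import Callable, Optional


def resolve_isl(
    server_prompt_tokens: Optional[int],
    prompt: str,
    count_tokens: Callable[[str], int],
    allow_fallback: bool,
) -> Optional[int]:
    """Pick the input sequence length (ISL) for one request (hypothetical helper).

    count_tokens stands in for client-side tokenization; allow_fallback is
    False only when --no-tokenize-input was given explicitly.
    """
    if server_prompt_tokens is not None:
        # Server-reported usage.prompt_tokens takes precedence.
        return server_prompt_tokens
    if allow_fallback:
        # Server omitted prompt tokens: count them locally instead.
        return count_tokens(prompt)
    # Explicit --no-tokenize-input: no fallback, so ISL stays unknown.
    return None
```

With a Hugging Face tokenizer, `count_tokens` could be `lambda s: len(tokenizer.encode(s, add_special_tokens=False))`, which mirrors the `add_special_tokens=False` behavior mentioned in the bullets.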
@@ -813,7 +816,7 @@ total_usage_total_tokens = sum(r.usage_total_tokens for r in records if r.valid)
 ## Usage Discrepancy Metrics
 
 > [!NOTE]
-> These metrics measure the percentage difference between API-reported token counts (`usage` fields) and client-computed token counts. They are **not displayed in console output** but help identify tokenizer mismatches or counting discrepancies. Prompt diff requires `--tokenize-input` (or fallback tokenization when server omits prompt tokens) for user-provided datasets. Output and reasoning diff metrics require `--tokenize-output` to populate both server and client values.
+> These metrics measure the percentage difference between API-reported token counts (`usage` fields) and client-computed token counts. They are **not displayed in console output** but help identify tokenizer mismatches or counting discrepancies. Prompt diff requires both server-reported and client-computed input token counts. Client-side input tokenization is used as a fallback when the server omits prompt tokens, unless `--no-tokenize-input` is explicitly specified. Output and reasoning diff metrics require `--tokenize-output` to populate both server and client values.
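The note does not give the exact formula. As an illustration only, the sketch below assumes the difference is taken relative to the client-computed count, and that the metric is undefined when either count is missing, matching the requirement that both values be present. `usage_diff_pct` is a hypothetical helper, not AIPerf's implementation.

```python
from typing import Optional


def usage_diff_pct(
    server_count: Optional[int], client_count: Optional[int]
) -> Optional[float]:
    """Percentage difference of a server-reported count vs. a client count.

    Assumed formula: 100 * (server - client) / client. Returns None when
    either value is missing, since the metric needs both counts.
    """
    if server_count is None or client_count is None:
        return None
    if client_count == 0:
        # Avoid dividing by zero; a real implementation may handle this differently.
        return None
    return 100.0 * (server_count - client_count) / client_count
```

A positive value means the server reported more tokens than the local tokenizer counted, which can point to a tokenizer mismatch or to special tokens the server includes in its count.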