Commit e2c3be0

Address feedback

1 parent 330b971 commit e2c3be0

File tree

7 files changed: +62 -39 lines changed


docs/cli-options.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -167,7 +167,7 @@ Use the legacy 'max_tokens' field instead of 'max_completion_tokens' in request
 
 #### `--use-server-token-count`
 
-[Deprecated] This flag is a no-op and will be removed in a future release. AIPerf now always computes both client-side and server-reported token counts. Server counts are preferred for output metrics; client counts are used for input validation.
+[Deprecated] This flag is a no-op and will be removed in a future release. AIPerf now prefers server-reported token counts when available and falls back to client-side tokenization for input. Use --tokenize-output to enable client-side output tokenization.
 <br/>_Flag (no value required)_
 
 #### `--stream-usage`, `--no-stream-usage`
@@ -675,7 +675,7 @@ Enable client-side tokenization of output and reasoning tokens, even when the se
 
 #### `--tokenize-input`, `--no-tokenize-input`
 
-Enable client-side tokenization of input prompts for every request. When enabled, locally computed input token counts are always stored in token_counts.input_local. When disabled, client-side input tokenization only occurs as a fallback when the server does not report prompt tokens. Automatically set to False for user-provided input datasets (--custom-dataset-type or --public-dataset) unless explicitly overridden.
+Enable client-side tokenization of input prompts for every request. When enabled, locally computed input token counts are always stored in token_counts.input_local. When disabled, client-side input tokenization only occurs as a fallback when the server does not report prompt tokens. Use --no-tokenize-input to disable client-side input tokenization entirely, including fallback. Automatically set to False for user-provided input datasets (--custom-dataset-type or --public-dataset) unless explicitly overridden.
 <br/>_Default: `True`_
 
 ### Load Generator
```

docs/metrics-reference.md

Lines changed: 7 additions & 4 deletions
````diff
@@ -394,15 +394,18 @@ The number of input/prompt tokens for a single request. This represents the size
 **Formula:**
 ```python
 # Server-preferred (falls back to client-side)
-input_sequence_length = usage.prompt_tokens or len(tokenizer.encode(prompt, add_special_tokens=False))
+if usage.prompt_tokens is not None:
+    input_sequence_length = usage.prompt_tokens
+else:
+    input_sequence_length = len(tokenizer.encode(prompt, add_special_tokens=False))
 ```
 
 **Notes:**
 - When the server reports `usage.prompt_tokens`, that value is used for ISL (and thus for console display and derived metrics).
-- Falls back to client-side tokenization when server does not report prompt token counts.
+- Falls back to client-side tokenization when server does not report prompt token counts, unless `--no-tokenize-input` is explicitly specified.
 - Client-side tokenization uses `add_special_tokens=False` to count only content tokens.
 - Automatically disabled for user-provided input datasets; use `--tokenize-input` to force.
-- Use `--no-tokenize-input` to skip when relying on server-reported prompt tokens.
+- Use `--no-tokenize-input` to disable client-side input tokenization entirely (no fallback).
 - Useful for understanding the relationship between input size and latency/throughput.
 
 ---
@@ -813,7 +816,7 @@ total_usage_total_tokens = sum(r.usage_total_tokens for r in records if r.valid)
 ## Usage Discrepancy Metrics
 
 > [!NOTE]
-> These metrics measure the percentage difference between API-reported token counts (`usage` fields) and client-computed token counts. They are **not displayed in console output** but help identify tokenizer mismatches or counting discrepancies. Prompt diff requires `--tokenize-input` (or fallback tokenization when server omits prompt tokens) for user-provided datasets. Output and reasoning diff metrics require `--tokenize-output` to populate both server and client values.
+> These metrics measure the percentage difference between API-reported token counts (`usage` fields) and client-computed token counts. They are **not displayed in console output** but help identify tokenizer mismatches or counting discrepancies. Prompt diff requires both server-reported and client-computed input token counts. Client-side input tokenization is used as a fallback when the server omits prompt tokens, unless `--no-tokenize-input` is explicitly specified. Output and reasoning diff metrics require `--tokenize-output` to populate both server and client values.
 
 ### Usage Prompt Tokens Diff %
 
````

src/aiperf/common/config/endpoint_config.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -230,8 +230,9 @@ def url(self) -> str:
         Field(
             description=(
                 "[Deprecated] This flag is a no-op and will be removed in a future release. "
-                "AIPerf now always computes both client-side and server-reported token counts. "
-                "Server counts are preferred for output metrics; client counts are used for input validation."
+                "AIPerf now prefers server-reported token counts when available and falls back "
+                "to client-side tokenization for input. Use --tokenize-output to enable "
+                "client-side output tokenization."
             ),
         ),
         CLIParameter(
```

src/aiperf/common/config/tokenizer_config.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -80,6 +80,8 @@ class TokenizerConfig(BaseConfig):
             "When enabled, locally computed input token counts are always stored "
             "in token_counts.input_local. When disabled, client-side input tokenization "
             "only occurs as a fallback when the server does not report prompt tokens. "
+            "Use --no-tokenize-input to disable client-side input tokenization entirely, "
+            "including fallback. "
             "Automatically set to False for user-provided input datasets "
             "(--custom-dataset-type or --public-dataset) unless explicitly overridden.",
         ),
```
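The "unless explicitly overridden" behavior depends on telling a default `tokenize_input=True` apart from a value the user actually passed. The parser change in this commit does that via Pydantic's `model_fields_set`, which records only explicitly supplied fields. A plain-Python stand-in (not AIPerf's real `TokenizerConfig`) mimicking that check:

```python
class FakeTokenizerConfig:
    """Stand-in (not AIPerf's TokenizerConfig) that mimics Pydantic's
    model_fields_set: the set of fields the caller passed explicitly."""

    def __init__(self, **kwargs):
        self.tokenize_input = kwargs.get("tokenize_input", True)
        self.model_fields_set = set(kwargs)


def explicit_no_tokenize_input(cfg) -> bool:
    # Mirrors the parser's __init__ check: only counts as an explicit
    # --no-tokenize-input if the user actually set the field to False.
    return "tokenize_input" in cfg.model_fields_set and not cfg.tokenize_input
```

This is why a dataset-driven default of `False` does not disable fallback tokenization, while a user-supplied `--no-tokenize-input` does.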

src/aiperf/common/models/record_models.py

Lines changed: 12 additions & 24 deletions
```diff
@@ -881,30 +881,18 @@ def __post_init__(self) -> None:
 class TokenCounts:
     """Token counts for a record."""
 
-    input: int | None = Field(
-        default=None,
-        description="The server-reported prompt token count from the API usage field. If None, the server did not report prompt tokens.",
-    )
-    input_local: int | None = Field(
-        default=None,
-        description="The number of input tokens computed by the client-side tokenizer. If None, the number of tokens could not be calculated.",
-    )
-    output: int | None = Field(
-        default=None,
-        description="The server-reported output token count (completion minus reasoning). If None, the server did not report completion tokens.",
-    )
-    output_local: int | None = Field(
-        default=None,
-        description="The number of output tokens computed by the client-side tokenizer.",
-    )
-    reasoning: int | None = Field(
-        default=None,
-        description="The server-reported reasoning token count. If None, the server did not report reasoning tokens.",
-    )
-    reasoning_local: int | None = Field(
-        default=None,
-        description="The number of reasoning tokens computed by the client-side tokenizer.",
-    )
+    input: int | None = None
+    """Server-reported prompt token count from the API usage field."""
+    input_local: int | None = None
+    """Input tokens computed by the client-side tokenizer."""
+    output: int | None = None
+    """Server-reported output token count (completion minus reasoning)."""
+    output_local: int | None = None
+    """Output tokens computed by the client-side tokenizer."""
+    reasoning: int | None = None
+    """Server-reported reasoning token count."""
+    reasoning_local: int | None = None
+    """Reasoning tokens computed by the client-side tokenizer."""
 
 
 @dataclass
```

src/aiperf/records/inference_result_parser.py

Lines changed: 19 additions & 7 deletions
```diff
@@ -54,6 +54,7 @@ def __init__(
             "tokenize_input" in user_config.tokenizer.model_fields_set
             and not user_config.tokenizer.tokenize_input
         )
+        self._warned_no_usage: bool = False
         if (
             self.model_endpoint.endpoint.streaming
             and self.model_endpoint.endpoint.stream_usage
@@ -63,12 +64,19 @@ def __init__(
                 "Server-reported token counts will be requested. "
                 "Use --no-stream-usage if the server does not support stream_options."
             )
-        if not self.disable_tokenization and not self.tokenize_input:
+        if not self.disable_tokenization and self._explicit_no_tokenize_input:
             self.info(
-                "Input tokenization is disabled. "
-                "Usage prompt token diff metrics will not be available. "
+                "Input tokenization is disabled (--no-tokenize-input). "
+                "Client-side input token counts will not be computed, even as fallback. "
                 "Use --tokenize-input to enable."
             )
+        elif not self.disable_tokenization and not self.tokenize_input:
+            self.info(
+                "Always-on input tokenization is disabled. "
+                "Client-side input tokenization will still occur as a fallback "
+                "when the server does not report prompt tokens. "
+                "Use --tokenize-input to enable for all requests."
+            )
         if not self.disable_tokenization and not self.tokenize_output:
             self.info(
                 "Output tokenization is disabled. "
@@ -311,21 +319,25 @@ async def _compute_token_counts(
             responses, reasoning_server
         )
 
-        # Warn if server provided no usage information at all
+        # Warn once if server provided no usage information at all
         if (
-            input_token_count is None
+            not self._warned_no_usage
+            and input_token_count is None
             and output_server is None
             and reasoning_server is None
         ):
+            self._warned_no_usage = True
             self.warning(
                 "Server did not provide token usage information. Token count metrics will be unavailable. "
                 "Verify that your API endpoint supports usage reporting (stream_options are automatically configured for OpenAI-compatible endpoints)."
             )
 
         # Client-side input tokenization
         input_local: int | None = None
-        if not self.disable_tokenization and (
-            self.tokenize_input or input_token_count is None
+        if (
+            not self.disable_tokenization
+            and not self._explicit_no_tokenize_input
+            and (self.tokenize_input or input_token_count is None)
         ):
             try:
                 input_local = await self.compute_input_token_count(request_record)
```

tests/unit/records/test_inference_result_parser.py

Lines changed: 17 additions & 0 deletions
```diff
@@ -560,6 +560,23 @@ async def test_no_tokenize_input_fallback(
         assert result.token_counts.input is None
         assert result.token_counts.input_local == 8  # 8 words in sample turn
 
+    async def test_explicit_no_tokenize_input_skips_fallback(
+        self, setup_inference_parser, request_record, spy_tokenizer
+    ):
+        """Explicit --no-tokenize-input without server usage → no fallback."""
+        setup_inference_parser.tokenize_input = False
+        setup_inference_parser._explicit_no_tokenize_input = True
+        setup_inference_parser.get_tokenizer = AsyncMock(return_value=spy_tokenizer)
+        setup_parser_responses(
+            setup_inference_parser,
+            [make_parsed_response(text="output", include_usage=False)],
+        )
+
+        result = await setup_inference_parser.process_valid_record(request_record)
+
+        assert result.token_counts.input is None
+        assert result.token_counts.input_local is None
+
     async def test_tokenize_input_always_computes(
         self, setup_inference_parser, request_record, spy_tokenizer
     ):
```
