- OSL uses consistent source selection (all-server or all-client) to avoid double-counting. Some servers embed reasoning tokens inside `completion_tokens` but leave `reasoning_tokens` null — mixing server output with client reasoning would count reasoning twice.
- For models that do not support or separately report reasoning tokens, OSL equals the output token count.
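The source-selection rule above can be sketched as follows. This is an illustrative helper, not the tool's actual code; the parameter names and `None`-handling are assumptions:

```python
from typing import Optional


def osl(
    server_completion: Optional[int],
    server_reasoning: Optional[int],
    client_output: Optional[int],
    client_reasoning: Optional[int],
) -> Optional[int]:
    """Compute OSL from a single source (all-server or all-client).

    Mixing sources (e.g. server completion_tokens + client reasoning)
    could count reasoning twice, because some servers embed reasoning
    inside completion_tokens while reporting reasoning_tokens as null.
    """
    # All-server: completion_tokens may already embed reasoning, and
    # reasoning_tokens may be None, so use server values exclusively.
    if server_completion is not None:
        return server_completion + (server_reasoning or 0)
    # All-client fallback: both pieces come from client tokenization.
    if client_output is not None:
        return client_output + (client_reasoning or 0)
    return None
```

For a model that embeds reasoning in `completion_tokens` and reports `reasoning_tokens` as null, the all-server path returns `completion_tokens` unchanged, so the reasoning tokens are counted exactly once.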
---
## Usage Discrepancy Metrics
> [!NOTE]
> These metrics measure the percentage difference between API-reported token counts (`usage` fields) and client-computed token counts. They are **not displayed in console output** but help identify tokenizer mismatches or counting discrepancies. Prompt diff requires `--tokenize-input` (or fallback tokenization when server omits prompt tokens) for user-provided datasets. Output and reasoning diff metrics require `--tokenize-output` to populate both server and client values.
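A percentage-difference metric of this kind can be sketched as below. The function name and the choice of the client-computed count as the denominator are assumptions for illustration, not the tool's definition:

```python
def usage_diff_pct(server_tokens: int, client_tokens: int) -> float:
    """Signed percentage difference between the API-reported count and
    the client-computed count, relative to the client value.

    A positive result means the server reported more tokens than the
    client tokenizer produced; a large magnitude suggests a tokenizer
    mismatch or a counting discrepancy.
    """
    if client_tokens == 0:
        # Degenerate case: no client-side tokens to compare against.
        return 0.0 if server_tokens == 0 else float("inf")
    return 100.0 * (server_tokens - client_tokens) / client_tokens
```

For example, a server report of 102 tokens against a client count of 100 yields +2.0%.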