fix: Anthropic response conversion token mappings (#5652)
Conversation
Claude finished @H2Shami's task (View job)

PR Review: Anthropic Token Mapping Fix

Core Token Mapping Analysis ✅

The change addresses the fundamental difference between Anthropic and OpenAI token counting: Anthropic's `input_tokens` excludes cache read tokens, while OpenAI's `prompt_tokens` includes cached tokens.

Guard Rails Added: Multiple protective checks in the usage processors prevent negative token counts when cached tokens exceed the reported prompt tokens.

Test Coverage: Tests verify both cache read and cache write scenarios work correctly, including edge cases with no caching.

Critical Security/Performance Issues
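The convention mismatch described above can be sketched in TypeScript. This is a minimal illustration, not the repo's actual code; the interfaces and helper name are assumptions. Anthropic reports `input_tokens` excluding cache reads, while OpenAI's `prompt_tokens` includes cached tokens:

```typescript
// Sketch: converting Anthropic-style usage to the OpenAI convention.
// Anthropic's input_tokens EXCLUDES cache read tokens; OpenAI's
// prompt_tokens INCLUDES them (detailed in prompt_tokens_details).
interface AnthropicUsage {
  input_tokens: number;
  cache_read_input_tokens?: number;
  output_tokens: number;
}

interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details: { cached_tokens: number };
}

function anthropicToOpenAIUsage(u: AnthropicUsage): OpenAIUsage {
  const cached = u.cache_read_input_tokens ?? 0;
  // Add cache reads back so prompt_tokens matches the OpenAI convention.
  const promptTokens = u.input_tokens + cached;
  return {
    prompt_tokens: promptTokens,
    completion_tokens: u.output_tokens,
    total_tokens: promptTokens + u.output_tokens,
    prompt_tokens_details: { cached_tokens: cached },
  };
}
```

For example, an Anthropic response with `input_tokens: 10` and `cache_read_input_tokens: 90` maps to `prompt_tokens: 100` with `cached_tokens: 90`, which is what OpenAI-style cost code expects.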
Pull request overview
This PR primarily targets correcting token/usage accounting when converting and processing Anthropic (and OpenAI Responses) usage data—especially around cache read/write tokens—so downstream cost/usage calculations align with OpenAI-style conventions.
Changes:
- Adjust effective prompt token calculations to handle cases where cached tokens should not be subtracted (Anthropic-style conventions) across stream/body/usage processors.
- Extend Responses usage types/mappers to carry cache write token fields (`cache_write_tokens`, `cache_write_details`) through conversions.
- Update generated OpenAPI artifacts (jawnTypes, MCP public types, swagger) and add extensive debug logging in worker/Jawn.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| worker/src/lib/dbLogger/DBLoggable.ts | Adds raw/normalized response debug logging around AI Gateway normalization and usage extraction. |
| valhalla/jawn/src/lib/shared/bodyProcessors/openAIStreamProcessor.ts | Updates effective prompt token logic for Responses + chat-completions streaming usage. |
| valhalla/jawn/src/lib/shared/bodyProcessors/genericBodyProcessor.ts | Updates effective prompt token calculation heuristic for generic JSON usage payloads. |
| valhalla/jawn/src/lib/handlers/ResponseBodyHandler.ts | Adds debug logging for request info and raw request/response bodies; minor formatting changes. |
| packages/llm-mapper/transform/types/responses.ts | Extends ResponsesUsage to include cache write token fields/details. |
| packages/llm-mapper/transform/providers/responses/streamedResponse/toResponses.ts | Maps OpenAI-style prompt_tokens_details.cache_write_* into Responses input_tokens_details. |
| packages/llm-mapper/transform/providers/responses/openai/response/toResponses.ts | Same mapping for non-stream OpenAI response -> Responses conversion. |
| packages/llm-mapper/transform/providers/anthropic/streamedResponse/toOpenai.ts | Changes Anthropic stream conversion so prompt_tokens includes cached read tokens (OpenAI convention). |
| packages/cost/usage/openAIUsageProcessor.ts | Adjusts effective prompt token calculation to avoid subtracting cached tokens when cached > prompt_tokens. |
| packages/cost/usage/mapModelUsageToOpenAI.ts | Minor formatting-only change (blank line). |
| packages/tests/llm-mapper/anthropic-stream-cache-tokens.test.ts | Updates expectation so prompt_tokens includes cache read tokens after Anthropic stream conversion. |
| web/lib/clients/jawnTypes/public.ts | Alters generated JSON schema typing (introduces JsonValue/JsonObject, sets Json to JsonObject). |
| web/lib/clients/jawnTypes/private.ts | Same generated JSON schema typing change as public types. |
| bifrost/lib/clients/jawnTypes/public.ts | Same generated JSON schema typing change as web public types. |
| bifrost/lib/clients/jawnTypes/private.ts | Same generated JSON schema typing change as web private types. |
| helicone-mcp/src/types/public.ts | Generated MCP public types remove multiple /v1/public/* endpoints/schemas. |
| docs/swagger.json | Generated swagger removes multiple /v1/public/* endpoints/schemas and restructures parts of the spec. |
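Per the `toResponses` rows in the table above, the Responses mappers now carry cache write fields through the conversion. A hedged sketch of what such a mapping looks like (the interfaces and function name here are illustrative assumptions, not the package's actual exports):

```typescript
interface ChatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    cache_write_tokens?: number;
  };
}

interface ResponsesUsage {
  input_tokens: number;
  output_tokens: number;
  input_tokens_details: {
    cached_tokens: number;
    cache_write_tokens?: number;
  };
}

// Map OpenAI chat-completions usage into Responses-style usage,
// propagating cache write tokens so downstream cost code can price them.
function toResponsesUsage(u: ChatUsage): ResponsesUsage {
  return {
    input_tokens: u.prompt_tokens,
    output_tokens: u.completion_tokens,
    input_tokens_details: {
      cached_tokens: u.prompt_tokens_details?.cached_tokens ?? 0,
      cache_write_tokens: u.prompt_tokens_details?.cache_write_tokens,
    },
  };
}
```

Dropping `cache_write_tokens` at this boundary is exactly the kind of silent loss the PR guards against, since cache writes are priced differently from ordinary input tokens.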
Comments suppressed due to low confidence (3)
web/lib/clients/jawnTypes/public.ts:1616
`components.schemas.Json` is being narrowed to `JsonObject`, which excludes valid JSON root values (arrays, strings, numbers, booleans, null). This will make generated client types incorrect for fields that can contain arbitrary JSON (e.g., metadata/settings) and may break consumers. Consider defining `Json` as a recursive `JsonValue` (or `unknown`) and regenerating via the OpenAPI generator rather than hand-editing this file.
```typescript
  parentPromptVersionId: string;
};
Json: JsonObject;
ExperimentV2PromptVersion: {
  created_at: string | null;
  experiment_id: string | null;
  helicone_template: components["schemas"]["Json"] | null;
  id: string;
```
bifrost/lib/clients/jawnTypes/public.ts:1616
`components.schemas.Json` is being changed to `JsonObject`, which excludes valid JSON values like arrays and primitives. This makes the generated client types incorrect for arbitrary JSON fields and can break downstream usage. Please restore `Json` to a recursive `JsonValue` (or `unknown`) and regenerate rather than editing the generated output.
```typescript
  parentPromptVersionId: string;
};
Json: JsonObject;
ExperimentV2PromptVersion: {
  created_at: string | null;
  experiment_id: string | null;
  helicone_template: components["schemas"]["Json"] | null;
  id: string;
```
helicone-mcp/src/types/public.ts:395
This generated MCP OpenAPI types file removes multiple `/v1/public/*` endpoints (waitlist, stats, alert-banner) compared to the previous version, which is a potentially breaking API surface change and appears unrelated to the PR's stated focus on Anthropic token mappings. Please confirm this removal is intentional and that downstream consumers have been updated; otherwise regenerate from the correct/updated swagger source.
```typescript
  delete: operations["DeleteWebhook"];
};
"/v1/webhooks/{webhookId}/test": {
  post: operations["TestWebhook"];
};
"/v1/vault/add": {
  post: operations["AddKey"];
};
"/v1/vault/keys": {
```
```typescript
// DEBUG: Log raw response before any processing
try {
  const rawText = rawResponseBody.join("");
  // For streaming, just grab the last few lines (where usage lives)
  const lines = rawText.split("\n").filter((l: string) => l.trim());
  const lastLines = lines.slice(-5);
  console.log("[RAW_RESPONSE] isStream:", this.request.isStream, "provider:", this.request.provider);
  console.log("[RAW_RESPONSE] last lines:", lastLines.map((l: string) => l.substring(0, 500)).join("\n"));
  // Try to extract usage from the raw response
  try {
    const parsed = JSON.parse(rawText);
    if (parsed?.usage) {
      console.log("[RAW_RESPONSE] usage:", JSON.stringify(parsed.usage));
    }
  } catch {
    // Streaming - look for usage in last data lines
    for (const line of lastLines) {
      if (line.includes('"usage"') && line.startsWith("data:")) {
        try {
          const chunk = JSON.parse(line.replace("data: ", ""));
          if (chunk?.usage) console.log("[RAW_RESPONSE] stream usage chunk:", JSON.stringify(chunk.usage));
        } catch {}
      }
    }
  }
} catch (e) {
```
The added raw-response debug logging joins and prints response body content (including streaming chunks) to stdout. This can leak sensitive data (prompts/completions, PII, keys) and adds significant CPU/memory overhead on every request. Please remove this, or gate it behind a disabled-by-default debug flag with strict redaction/size limits.
```typescript
console.log("[NORMALIZE] params:", JSON.stringify({ isStream: normParams.isStream, provider: normParams.provider, responseFormat: normParams.responseFormat, bodyMapping: normParams.bodyMapping }));
openAIResponse = await normalizeAIGatewayResponse(normParams);
// Log the normalized usage
try {
  if (normParams.isStream) {
    // For streams, find the last chunk with usage
    const normLines = openAIResponse.split("\n").filter((l: string) => l.startsWith("data: ") && l.includes('"usage"'));
    const lastUsageLine = normLines[normLines.length - 1];
    if (lastUsageLine) {
      const chunk = JSON.parse(lastUsageLine.replace("data: ", ""));
      console.log("[NORMALIZED] stream usage:", JSON.stringify(chunk.usage || chunk.response?.usage));
    }
  } else {
    const normParsed = JSON.parse(openAIResponse);
    console.log("[NORMALIZED] usage:", JSON.stringify(normParsed.usage));
  }
} catch (e) {
  console.log("[NORMALIZED] failed to log:", e);
}
```
This normalization debug logging prints normalization parameters and attempts to parse/log usage from the normalized response. Besides potential sensitive-data exposure, openAIResponse.split(...) + JSON.parse in the hot path can be expensive for large streams. Please remove these logs or guard them behind a debug flag and ensure output is redacted/truncated.
```typescript
try {
  console.log("Processing response body for request:", context.message.log.request);
```
The handler now logs the entire request log object for every response. Request logs can contain sensitive identifiers/headers and this also increases log volume. Please remove this or gate it behind a debug flag with redaction.
```typescript
let responseBody = context.rawLog.rawResponseBody;
const requestBody = context.rawLog.rawRequestBody;
console.log("[RAW_RESONSE_BODY]", responseBody);
console.log("[RAW_REQUEST_BODY]", requestBody);
```
Logging raw response and request bodies here can leak sensitive data (prompt/response content, API keys) and can be extremely large, impacting performance and log costs. Also, the log tag has a typo ("RAW_RESONSE_BODY"). Please remove/gate these logs behind a debug flag with redaction and fix the tag spelling if kept.
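One way to address this feedback is to route all debug output through a disabled-by-default helper that truncates and redacts before printing. A sketch under stated assumptions: the env flag name and the redaction regex are illustrative, not existing Helicone code:

```typescript
// Disabled unless explicitly turned on in the environment.
const DEBUG_USAGE_LOGS = process.env.DEBUG_USAGE_LOGS === "1";

// Truncate and crudely redact anything that looks like an API key or token.
function redactAndTruncate(value: unknown, maxLen = 500): string {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  return text
    .replace(/(sk-|Bearer\s+)[A-Za-z0-9._-]+/g, "$1[REDACTED]")
    .slice(0, maxLen);
}

function debugLog(tag: string, value: unknown): void {
  if (!DEBUG_USAGE_LOGS) return; // no-op in production by default
  console.log(`[${tag}]`, redactAndTruncate(value));
}
```

Keeping the redaction and size cap in one pure function also makes the behavior easy to unit test independently of the logger.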
```diff
       gbCost: number;
       /** Format: double */
       requestsCost: number;
     };
   };
-  Json: (string | number | boolean | {
-    [key: string]: components["schemas"]["Json"];
-  } | components["schemas"]["Json"][]) | null;
+  Json: JsonObject;
   "ResultSuccess__40_Database-at-public_91_Tables_93_-at-organization_91_Row_93_-and-_role-string__41_-Array_": {
     data: (({
       tier: string | null;
```
components.schemas.Json is now JsonObject, which disallows non-object JSON roots (arrays/primitives/null). If these schemas are meant to represent arbitrary JSON blobs (as the previous recursive union did), this change makes the generated types inaccurate and can break callers. Please restore Json to a recursive JsonValue (or unknown) and address via generator config rather than manual edits.
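The recursive type the reviewer asks to restore can be written as follows. This is a sketch: the alias names mirror the `JsonValue`/`JsonObject` names mentioned in the review, but the exact shape should come from the OpenAPI generator, not a hand edit:

```typescript
// A JSON root can be any of these, not just an object.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// JsonObject is then the object-rooted subset.
type JsonObject = { [key: string]: JsonValue };

// Arrays and primitives are valid JSON roots under JsonValue,
// but would be type errors under JsonObject:
const arrayRoot: JsonValue = ["a", 1, { nested: null }];
const primitiveRoot: JsonValue = 42;
```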
```diff
       gbCost: number;
       /** Format: double */
       requestsCost: number;
     };
   };
-  Json: (string | number | boolean | {
-    [key: string]: components["schemas"]["Json"];
-  } | components["schemas"]["Json"][]) | null;
+  Json: JsonObject;
   "ResultSuccess__40_Database-at-public_91_Tables_93_-at-organization_91_Row_93_-and-_role-string__41_-Array_": {
     data: (({
       tier: string | null;
```
components.schemas.Json is now typed as JsonObject, which disallows non-object JSON roots (arrays/primitives/null). If the API schema intends this to represent arbitrary JSON, this is a breaking type regression for generated clients. Please restore the previous recursive JSON union (or use unknown) and regenerate via the OpenAPI tooling.
```json
"/v1/webhooks/{webhookId}/test": {
  "post": {
    "operationId": "TestWebhook",
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__success-boolean--message-string_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Webhooks"
    ],
    "security": [
      {
        "api_key": []
      }
    ],
    "parameters": [
      {
        "in": "path",
        "name": "webhookId",
        "required": true,
        "schema": {
          "type": "string"
        }
      }
    ]
  }
},
"/v1/public/waitlist/feature": {
  "post": {
    "operationId": "AddToWaitlist",
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__success-boolean--position_63_-number--alreadyOnList_63_-boolean--sharedPlatforms_63_-string-Array_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Waitlist"
    ],
    "security": [],
    "parameters": [],
    "requestBody": {
      "required": true,
      "content": {
        "application/json": {
          "schema": {
            "properties": {
              "feature": {
                "type": "string"
              },
              "email": {
                "type": "string"
              }
            },
            "required": [
              "feature",
              "email"
            ],
            "type": "object"
          }
        }
      }
    }
  }
},
"/v1/public/waitlist/feature/status": {
  "get": {
    "operationId": "IsOnWaitlist",
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__isOnWaitlist-boolean_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Waitlist"
    ],
    "security": [],
    "parameters": [
      {
        "in": "query",
        "name": "email",
        "required": true,
        "schema": {
          "type": "string"
        }
      },
      {
        "in": "query",
        "name": "feature",
        "required": true,
        "schema": {
          "type": "string"
        }
      }
    ]
  }
},
"/v1/public/waitlist/feature/count": {
  "get": {
    "operationId": "GetWaitlistCount",
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__count-number_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Waitlist"
    ],
    "security": [],
    "parameters": [
      {
        "in": "query",
        "name": "feature",
        "required": true,
        "schema": {
          "type": "string"
        }
      }
    ]
  }
},
"/v1/public/waitlist/feature/share": {
  "post": {
    "operationId": "TrackShare",
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__success-boolean--newPosition_63_-number--message-string_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Waitlist"
    ],
    "security": [],
    "parameters": [],
    "requestBody": {
      "required": true,
      "content": {
        "application/json": {
          "schema": {
            "properties": {
              "platform": {
                "type": "string",
                "enum": [
                  "twitter",
                  "linkedin"
                ]
              },
              "feature": {
                "type": "string"
              },
              "email": {
                "type": "string"
              }
            },
            "required": [
              "platform",
              "feature",
              "email"
            ],
            "type": "object"
    "responses": {
      "200": {
        "description": "Ok",
        "content": {
          "application/json": {
            "schema": {
              "$ref": "#/components/schemas/Result__success-boolean--message-string_.string_"
            }
          }
        }
      }
    },
    "tags": [
      "Webhooks"
    ],
    "security": [
      {
        "api_key": []
      }
    ],
    "parameters": [
      {
        "in": "path",
        "name": "webhookId",
        "required": true,
        "schema": {
          "type": "string"
        }
      }
    ]
  }
},
"/v1/vault/add": {
```
docs/swagger.json no longer contains several /v1/public/* endpoints (e.g., waitlist/stats/alert-banner). This is a potentially breaking documentation/spec change and seems out of scope for a token-mapping bug fix PR. Please confirm the underlying API actually removed these routes (and update callers), or regenerate swagger from the intended source so docs/types stay consistent.
```typescript
// If cached > prompt_tokens, the data follows Anthropic convention where
// prompt_tokens is already the non-cached input count. Don't subtract.
// Otherwise assume OpenAI convention where prompt_tokens includes cached.
const effectivePromptTokens = cachedTokens > promptTokens
  ? Math.max(0, promptTokens - promptAudioTokens)
  : Math.max(0, promptTokens - cachedTokens - promptAudioTokens);
```
In theory this shouldn't execute, but it's a guard rail so that the issue we just solved won't happen again in the future.
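Extracted into a standalone function, the guard rail above behaves like this (a sketch for illustration; the real processors compute it inline):

```typescript
// If cached > prompt, assume Anthropic convention: prompt_tokens already
// excludes cache reads, so subtracting cached again would go negative.
// Otherwise assume OpenAI convention, where prompt_tokens includes cached.
function effectivePromptTokens(
  promptTokens: number,
  cachedTokens: number,
  audioTokens = 0
): number {
  return cachedTokens > promptTokens
    ? Math.max(0, promptTokens - audioTokens)
    : Math.max(0, promptTokens - cachedTokens - audioTokens);
}
```

With OpenAI-style data (`prompt_tokens: 100`, `cached_tokens: 20`) this yields 80 effective prompt tokens; with Anthropic-style data (`prompt_tokens: 10`, `cached_tokens: 90`) it leaves the 10 untouched instead of producing a negative count.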
```diff
  const inputTokens = event.usage.input_tokens ?? this.inputTokens;
  this.finalUsage = {
-   prompt_tokens: event.usage.input_tokens ?? this.inputTokens,
+   prompt_tokens: inputTokens + (cachedTokens ?? 0),
```
```diff
  // If cached > prompt_tokens, the data follows Anthropic convention where
  // prompt_tokens is already the non-cached input count. Don't subtract.
  const usage = response.usage;
- const effectivePromptTokens = usage?.prompt_tokens !== undefined
-   ? Math.max(0, (usage.prompt_tokens ?? 0) - (usage.prompt_tokens_details?.cached_tokens ?? 0) - (usage.prompt_tokens_details?.audio_tokens ?? 0))
-   : Math.max(0, (usage.input_tokens ?? 0) - (usage.input_tokens_details?.cached_tokens ?? 0));
+ const gPromptToks = usage?.prompt_tokens ?? usage?.input_tokens ?? 0;
+ const gCachedToks = usage?.prompt_tokens_details?.cached_tokens ?? usage?.input_tokens_details?.cached_tokens ?? 0;
+ const gAudioToks = usage?.prompt_tokens_details?.audio_tokens ?? 0;
+ const effectivePromptTokens = gCachedToks > gPromptToks
+   ? Math.max(0, gPromptToks - gAudioToks)
+   : Math.max(0, gPromptToks - gCachedToks - gAudioToks);
```
```typescript
const rInputToks = usageData.input_tokens ?? 0;
const rCachedToks = usageData.input_tokens_details?.cached_tokens ?? 0;
const effectivePromptTokens = rCachedToks > rInputToks
  ? rInputToks
  : Math.max(0, rInputToks - rCachedToks);
```
```typescript
// If cached > prompt_tokens, the data follows Anthropic convention where
// prompt_tokens is already the non-cached input count. Don't subtract.
const promptToks = usageData.prompt_tokens ?? usageData.input_tokens ?? 0;
const cachedToks = usageData.prompt_tokens_details?.cached_tokens
  ?? usageData.input_tokens_details?.cached_tokens ?? 0;
const audioToks = usageData.prompt_tokens_details?.audio_tokens ?? 0;
const effectivePromptTokens = cachedToks > promptToks
  ? Math.max(0, promptToks - audioToks)
  : Math.max(0, promptToks - cachedToks - audioToks);
```
…/Helicone/helicone into hammad/fix/anthropic-token-count