feat: add Anthropic Batch API support for 50% cost savings #8672
Conversation
```typescript
// Add 1M context beta flag if enabled for Claude Sonnet 4 and 4.5
if (
	(modelId === "claude-sonnet-4-20250514" || modelId === "claude-sonnet-4-5") &&
	this.options.anthropicBeta1MContext
) {
	betas.push("context-1m-2025-08-07")
}
```
The prompt caching beta header is missing when creating batch requests with prompt caching support. When `supportsPromptCaching(modelId)` returns true, the `"prompt-caching-2024-07-31"` beta should be added to the betas array (similar to line 128 in the streaming path). Without this header, prompt caching won't work correctly in batch mode even though cache breakpoints are being added to the messages.
Suggested change:

```diff
  // Add 1M context beta flag if enabled for Claude Sonnet 4 and 4.5
  if (
  	(modelId === "claude-sonnet-4-20250514" || modelId === "claude-sonnet-4-5") &&
  	this.options.anthropicBeta1MContext
  ) {
  	betas.push("context-1m-2025-08-07")
  }
+ // Add prompt caching beta if model supports it
+ if (this.supportsPromptCaching(modelId)) {
+ 	betas.push("prompt-caching-2024-07-31")
+ }
```
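For context on why the betas array matters in batch mode: the flags have to be forwarded as an `anthropic-beta` header on the batch create call. A hedged sketch of that forwarding, with the request-options shape assumed from the `batchOptions` argument visible later in this review:

```typescript
// Sketch only: forward accumulated beta flags when creating the batch.
// The second argument and its `headers` field are assumptions based on
// the `batchOptions` parameter seen elsewhere in this diff; the
// comma-separated `anthropic-beta` header is the documented format.
const batch = await this.client.messages.batches.create(
	{
		requests: [/* ...one entry per message request... */],
	},
	betas.length > 0 ? { headers: { "anthropic-beta": betas.join(",") } } : undefined,
)
```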
src/api/providers/anthropic.ts (Outdated)
```typescript
			},
		],
	},
	batchOptions as any,
```
The `as any` type assertion bypasses TypeScript's type checking and could hide type mismatches. The Anthropic SDK's types should be used directly without casting. If the types don't match, either the SDK types need updating or the code needs adjustment to match the actual API contract.
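One way to drop the cast without fighting the SDK's types is to describe only the fields actually sent; a hedged sketch (the local type is an assumption, not the SDK's own export):

```typescript
// Describe the shape we actually send instead of erasing it with `as any`.
// If the SDK's request-options type disagrees at the call site, the
// compiler will say exactly where.
type BatchRequestOptions = {
	headers?: Record<string, string>
}

const batchOptions: BatchRequestOptions = {
	headers: { "anthropic-beta": betas.join(",") },
}
```

Alternatively, `satisfies` against the SDK's exported params type would keep the inferred type while still surfacing mismatches.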
```typescript
while (Date.now() - startTime < BATCH_MAX_POLL_TIME_MS) {
	const status = await this.client.messages.batches.retrieve(batch.id)

	if (status.processing_status === "ended") {
		completedBatch = status
		break
	}

	// Wait before next poll
	await new Promise((resolve) => setTimeout(resolve, BATCH_POLL_INTERVAL_MS))
}
```
The batch status handling is incomplete. According to Anthropic's Batch API documentation, `processing_status` can be `"in_progress"`, `"canceling"`, `"ended"`, `"expired"`, or `"canceled"`. Currently, only `"ended"` is checked, which means if a batch expires or is canceled, the code will wait until the timeout and throw a generic timeout error instead of providing a specific error message about the actual failure state.
Suggested change:

```diff
  while (Date.now() - startTime < BATCH_MAX_POLL_TIME_MS) {
  	const status = await this.client.messages.batches.retrieve(batch.id)

  	if (status.processing_status === "ended") {
  		completedBatch = status
  		break
  	}

+ 	// Handle failure states
+ 	if (status.processing_status === "expired") {
+ 		throw new Error("Batch request expired before completing")
+ 	}
+ 	if (status.processing_status === "canceled") {
+ 		throw new Error("Batch request was canceled")
+ 	}
+
  	// Wait before next poll
  	await new Promise((resolve) => setTimeout(resolve, BATCH_POLL_INTERVAL_MS))
  }
```
```typescript
/**
 * Creates a message using the Batch API for 50% cost savings.
 * This method handles the async batch job lifecycle: create, poll, and retrieve results.
 */
private async *createBatchMessage(
```
The new Batch API functionality lacks test coverage. Given the complexity of the batch processing lifecycle (polling, timeout handling, result retrieval) and the potential for errors at each stage, this code should have comprehensive tests. Consider adding tests that cover:
- Successful batch processing with prompt caching enabled
- Successful batch processing without prompt caching
- Batch timeout scenarios
- Batch expiration/cancellation scenarios
- Error handling in batch results
- Verification that the 50% cost discount is applied correctly
- Verification that beta headers are included when needed
The existing `anthropic.spec.ts` provides a good pattern to follow for mocking the SDK's batch API methods; a sketch of one such test follows.
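A minimal sketch of the expiration scenario, assuming vitest, the module-mock style of `anthropic.spec.ts`, and that the expired-state handling suggested above is adopted; the handler import path, constructor options, and message shape are illustrative:

```typescript
import { describe, it, expect, vi } from "vitest"
import { AnthropicHandler } from "../anthropic" // path assumed

// vi.hoisted makes the mocks visible inside the hoisted vi.mock factory.
const { mockCreate, mockRetrieve } = vi.hoisted(() => ({
	mockCreate: vi.fn(),
	mockRetrieve: vi.fn(),
}))

vi.mock("@anthropic-ai/sdk", () => ({
	default: vi.fn().mockImplementation(() => ({
		messages: {
			batches: { create: mockCreate, retrieve: mockRetrieve },
		},
	})),
}))

describe("createBatchMessage", () => {
	it("surfaces an expiration error instead of polling until timeout", async () => {
		mockCreate.mockResolvedValue({ id: "batch_123" })
		mockRetrieve.mockResolvedValue({ processing_status: "expired" })

		const handler = new AnthropicHandler({ apiKey: "test-key", anthropicUseBatchApi: true })
		const stream = handler.createMessage("system prompt", [{ role: "user", content: "hi" }])

		await expect(async () => {
			for await (const _chunk of stream) {
				// Drain the generator; the rejection surfaces here.
			}
		}).rejects.toThrow(/expired/)
	})
})
```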
```typescript
} else if (result.result.type === "errored") {
	const errorType = result.result.error.type
	throw new Error(`Batch request failed: ${errorType}`)
}
```
The error handling only includes the error type but omits the error message, making debugging difficult. Anthropic's batch API error responses include both `type` and `message` fields. Including the full error context would help users understand what went wrong with their batch request.
Suggested change:

```diff
  } else if (result.result.type === "errored") {
- 	const errorType = result.result.error.type
- 	throw new Error(`Batch request failed: ${errorType}`)
+ 	const error = result.result.error
+ 	throw new Error(`Batch request failed: ${error.type}${error.message ? ` - ${error.message}` : ""}`)
  }
```
This makes sense, I think I will add this.
```typescript
// Process results
for await (const result of results) {
	if (result.result.type === "succeeded") {
		const message = result.result.message

		// Yield content blocks
		for (const content of message.content) {
			if (content.type === "text") {
				yield { type: "text", text: content.text }
			} else if (content.type === "thinking") {
				yield { type: "reasoning", text: content.thinking }
			}
		}

		// Yield usage information
		const usage = message.usage
		yield {
			type: "usage",
			inputTokens: usage.input_tokens || 0,
			outputTokens: usage.output_tokens || 0,
			cacheWriteTokens: usage.cache_creation_input_tokens || undefined,
			cacheReadTokens: usage.cache_read_input_tokens || undefined,
		}

		// Calculate and yield cost
		yield {
			type: "usage",
			inputTokens: 0,
			outputTokens: 0,
			totalCost: calculateApiCostAnthropic(
				this.getModel().info,
				usage.input_tokens || 0,
				usage.output_tokens || 0,
				usage.cache_creation_input_tokens || 0,
				usage.cache_read_input_tokens || 0,
			),
		}
	} else if (result.result.type === "errored") {
		const errorType = result.result.error.type
		throw new Error(`Batch request failed: ${errorType}`)
	}
}
```
The batch results processing doesn't validate that at least one successful result was returned. If the batch completes but all results are errors (or the results iterator is empty), the user would see the "Using Batch API" notification but receive no response text or usage information. This could happen if the batch request was malformed or if there were API-level issues. Consider tracking whether any successful result was processed and throwing an appropriate error if not.
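A hedged sketch of that tracking, layered onto the result loop above (the flag name is illustrative):

```typescript
// Track whether any result succeeded; fail loudly if the batch produced none,
// rather than silently yielding nothing after the "Using Batch API" notice.
let sawSuccess = false

for await (const result of results) {
	if (result.result.type === "succeeded") {
		sawSuccess = true
		// ...yield content, usage, and cost as in the loop above...
	} else if (result.result.type === "errored") {
		const errorType = result.result.error.type
		throw new Error(`Batch request failed: ${errorType}`)
	}
}

if (!sawSuccess) {
	throw new Error("Batch ended but returned no successful results")
}
```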
I don't think we need to do this.
```typescript
{
	requests: [
		{
			custom_id: `req_${Date.now()}`,
```
The custom_id uses only `Date.now()`, which could theoretically cause collisions if multiple batch requests are initiated in the same millisecond (e.g., in high-concurrency scenarios or automated testing). While unlikely in typical usage, a more robust approach would include additional entropy to guarantee uniqueness.
Suggested change:

```diff
- custom_id: `req_${Date.now()}`,
+ custom_id: `req_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`,
```
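If the runtime guarantees Node 16.7+ or a modern browser, `crypto.randomUUID()` would be a simpler entropy source; this is an alternative sketch, not what the suggestion above uses:

```typescript
// Assumes globalThis.crypto.randomUUID is available (Node 16.7+ / modern browsers).
custom_id: `req_${crypto.randomUUID()}`,
```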
```typescript
// Apply 50% discount for Batch API (applies after 1M context pricing if both enabled)
if (this.options.anthropicUseBatchApi) {
	info = {
		...info,
		inputPrice: typeof info.inputPrice === "number" ? info.inputPrice * 0.5 : undefined,
		outputPrice: typeof info.outputPrice === "number" ? info.outputPrice * 0.5 : undefined,
		cacheWritesPrice: typeof info.cacheWritesPrice === "number" ? info.cacheWritesPrice * 0.5 : undefined,
		cacheReadsPrice: typeof info.cacheReadsPrice === "number" ? info.cacheReadsPrice * 0.5 : undefined,
	}
}
```
The 50% batch API discount is applied in both `getModel()` (lines 273-281) and in `useSelectedModel.ts` (lines 398-409). While these serve different purposes (backend cost calculation vs UI pricing display), this duplication could lead to maintenance issues if the discount logic needs to change. Consider extracting the discount calculation into a shared utility function to ensure consistency and reduce duplication.
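The follow-up commits mention extracting exactly such a helper, `applyBatchApiDiscount()` in `src/shared/cost.ts`. A minimal sketch of what it could look like, assuming the pricing fields seen in the snippet above (the interface name is illustrative):

```typescript
// Sketch only: the real implementation lives in src/shared/cost.ts per the
// follow-up commit; field names mirror the getModel() snippet above.
interface BatchDiscountablePricing {
	inputPrice?: number
	outputPrice?: number
	cacheWritesPrice?: number
	cacheReadsPrice?: number
}

export function applyBatchApiDiscount<T extends BatchDiscountablePricing>(info: T): T {
	const halve = (price?: number) => (typeof price === "number" ? price * 0.5 : undefined)
	return {
		...info,
		inputPrice: halve(info.inputPrice),
		outputPrice: halve(info.outputPrice),
		cacheWritesPrice: halve(info.cacheWritesPrice),
		cacheReadsPrice: halve(info.cacheReadsPrice),
	}
}
```

Both `anthropic.ts` and `useSelectedModel.ts` could then call this one function, keeping backend cost calculation and UI pricing display in lockstep.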
```typescript
// Batch API polling configuration
const BATCH_POLL_INTERVAL_MS = 5000 // Poll every 5 seconds
const BATCH_MAX_POLL_TIME_MS = 600000 // Max 10 minutes polling
```
The timeout constant is set to 10 minutes (600,000 ms), but the PR description states "max 5 minutes timeout". This discrepancy between code and documentation could confuse users about the actual timeout behavior. Either update the constant to match the documented 5 minutes (`300000`) or update the PR description to reflect the 10-minute timeout.
Changed this in the PR comment, although I don't think it's that big of an issue really.
Addresses roomote bot feedback on PR RooCodeInc#8672:
- Enhanced error handling to include full error details via JSON.stringify() for better debugging when batch requests fail
- Extracted batch API discount calculation into shared applyBatchApiDiscount() utility function in src/shared/cost.ts to eliminate code duplication between backend (anthropic.ts) and frontend (useSelectedModel.ts)
- Added documentation comment explaining custom_id generation approach and future considerations for multiple requests per batch
Adds toggle to enable Anthropic's Batch API for async message processing with 50% cost reduction. Includes:
- Backend implementation with proper prompt caching support
- Beta header support for 1M context compatibility
- UI pricing display updates to show 50% discount
- Settings UI toggle with translations
…ype cast, handle batch failure states
Add translations for anthropicBatchApiLabel and anthropicBatchApiDescription across all 17 supported languages (ca, de, es, fr, hi, id, it, ja, ko, nl, pl, pt-BR, ru, tr, vi, zh-CN, zh-TW)
- Enhanced error messages with JSON.stringify() for full error context
- Extracted applyBatchApiDiscount() utility to shared cost.ts
- Added documentation for custom_id approach
- Fixed batch status handling to only fail on actual error states (errored/expired/canceled) instead of failing on any unknown transitional state
- Rebased onto main to resolve Claude Haiku 4.5 model conflict
Force-pushed from 44e8d23 to 2e6e434.
Description
Adds support for Anthropic's Batch API with a new settings toggle, enabling 50% cost savings on API requests through asynchronous batch processing.
Changes
Implementation Details
Testing
Documentation
Official Anthropic Batch API docs: https://docs.anthropic.com/en/api/creating-message-batches
Closes #8667
Important

Adds support for Anthropic's Batch API with a new setting for 50% cost savings through async batch processing.

- Adds `anthropicUseBatchApi` setting in `provider-settings.ts` for 50% cost savings via async batch processing.
- Implements batch processing in `AnthropicHandler` in `anthropic.ts` using the `client.messages.batches` API.
- Adds `createBatchMessage()` in `anthropic.ts` to handle the batch job lifecycle.
- Updates `getModel()` and `createMessage()` to apply the 50% discount.
- Adds a toggle for `anthropicUseBatchApi` in `Anthropic.tsx`.
- Updates `settings.json` for the new batch API settings.