Batch API Support for Async Workloads (up to 50% cost savings) #1597
SebConejo started this conversation in Feature request
Problem
Major LLM providers (OpenAI, Anthropic, Google) offer Batch APIs, a separate endpoint where you upload a file of requests (JSONL) and the provider processes them asynchronously, typically within 24 hours. In exchange for relaxed latency, batch requests are ~50% cheaper than standard API calls.
Today, Manifest routes requests in real time, one at a time. There is no way to leverage batch endpoints through the router. Users who need batch processing are forced to bypass Manifest entirely and go directly to providers, losing the benefits of intelligent routing and cost optimization.
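For reference, the provider batch endpoints consume a JSONL file where each line is a self-contained request. A minimal sketch of building one in OpenAI's documented batch input format (the model name and prompts are placeholders):

```python
import json

# Each JSONL line pairs a custom_id (used to match results back to
# requests later) with a standard chat-completions request body.
prompts = ["Classify: 'great product'", "Classify: 'arrived broken'"]

lines = []
for i, prompt in enumerate(prompts):
    lines.append(json.dumps({
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": prompt}],
        },
    }))

with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The resulting file is uploaded once; the provider processes every line asynchronously and returns a matching output file.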
What Batch API Looks Like
Unlike standard request/response, batch is a multi-step workflow: upload a JSONL file of requests, create a batch job referencing that file, poll the job until it reaches a terminal status (typically within 24 hours), then download the output file and match results to requests by ID.
Typical use cases: evaluations, bulk classification, dataset labeling, content generation at scale, synthetic data, anything that doesn't need an instant response.
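The submit/poll/retrieve lifecycle can be sketched against a stand-in client (`FakeBatchClient` below is a stub, not a real SDK; a real integration would call the provider's batch endpoints and poll far less aggressively):

```python
import time

class FakeBatchClient:
    """Stand-in for a provider SDK: completes a batch after a few polls."""
    def __init__(self):
        self._polls = 0

    def create_batch(self, input_file_id):
        return {"id": "batch_123", "status": "in_progress"}

    def retrieve_batch(self, batch_id):
        self._polls += 1
        status = "completed" if self._polls >= 3 else "in_progress"
        return {"id": batch_id, "status": status, "output_file_id": "file_out"}

def run_batch(client, input_file_id, poll_interval=0.01):
    # 1) submit the job, 2) poll until a terminal status,
    # 3) return the handle to the results file for download.
    batch = client.create_batch(input_file_id)
    while batch["status"] not in ("completed", "failed", "expired"):
        time.sleep(poll_interval)  # real code would back off (minutes, not ms)
        batch = client.retrieve_batch(batch["id"])
    return batch

result = run_batch(FakeBatchClient(), "file_in")
print(result["status"])  # completed
```

This lifecycle is exactly what a router could own on the user's behalf, so callers submit once and collect results when ready.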
Why It Matters
Proposal
Add batch mode support to Manifest, enabling the router to accept asynchronous batch workloads, submit them to provider batch endpoints, and manage the poll/retrieve lifecycle on the user's behalf.
This would make Manifest useful for both real-time and async workloads, covering a much larger share of LLM API spend.
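Purely illustrative, since the discussion doesn't specify an interface: one way a router could extend its cost optimization to batch is by selecting a backend against discounted batch pricing rather than real-time pricing. All provider names and prices below are made-up placeholders:

```python
# Hypothetical cost table: provider -> price per 1M input tokens.
# An assumed ~50% batch discount is applied uniformly; real discounts
# and prices vary by provider and model.
REALTIME_PRICES = {"provider_a": 2.50, "provider_b": 3.00, "provider_c": 1.25}
BATCH_DISCOUNT = 0.5

def cheapest_batch_provider(prices, discount=BATCH_DISCOUNT):
    # Same selection logic a router already runs for real-time traffic,
    # but evaluated against discounted batch pricing.
    return min(prices, key=lambda p: prices[p] * discount)

print(cheapest_batch_provider(REALTIME_PRICES))  # provider_c
```

In practice the router would also weigh per-provider batch limits, completion windows, and model availability, not just price.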
👍 React if this would be useful for your workflow.