Description
What specific problem does this solve?
Ark allows context caching for some models, providing cost and speed benefits. This should be added to Roo Code.
Additional context (optional)
No response
Roo Code Task Links (Optional)
No response
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear impact and context
Interested in implementing this?
- Yes, I'd like to help implement this feature
Implementation requirements
- I understand this needs approval before implementation begins
How should this be solved? (REQUIRED if contributing, optional otherwise)
There are two caching APIs on Ark: the context API and the OpenAI-compatible Responses API. Neither should be much work to add, and both will save time and money, assuming a sensible expiry policy (storage costs money too).
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
Given an already existing chat,
When the user adds a request to it,
Then the request should include a reference identifying an ongoing context/previous response
And the returned usage object has prompt_tokens_details.cached_tokens being nonzero
Given a previously set checkpoint,
When the user goes back to it,
Then the request should include a reference identifying the point in the checkpoint,
And the returned usage object has prompt_tokens_details.cached_tokens being nonzero,
And the returned dialogue should not show any signs of being affected by input that happened after the checkpoint
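
The `cached_tokens` condition in the criteria above could be checked with a small helper like this sketch. The `Usage` shape assumes Ark mirrors the OpenAI-style `prompt_tokens_details` field; the helper name is illustrative, not existing Roo Code API.

```typescript
// Assumed shape of the usage object returned by the API (mirrors
// OpenAI's prompt_tokens_details; optional fields hedge against
// responses that omit cache details entirely).
interface Usage {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// The acceptance check: a follow-up request in an existing chat
// should report a nonzero number of cached prompt tokens.
function hitCache(usage: Usage): boolean {
  return (usage.prompt_tokens_details?.cached_tokens ?? 0) > 0;
}

console.log(hitCache({ prompt_tokens: 1200, prompt_tokens_details: { cached_tokens: 1100 } })); // → true
console.log(hitCache({ prompt_tokens: 1200 })); // → false
```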
Technical considerations (REQUIRED if contributing, optional otherwise)
API documentation:
- https://www.volcengine.com/docs/82379/1602228 (responses): use `previous_response_id`, referring to the `id` of a previous request, with the extra body content `"caching": {"type": "enabled"}`.
- https://www.volcengine.com/docs/82379/1396491 (context): use `context_id`. Set `mode: "session"` for auto-append, which is easier to use but makes it impossible to go back.
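
To make the two options concrete, here is a minimal sketch of the cache-related request fields for each API. The field names (`previous_response_id`, `caching`, `context_id`, `mode`) come from the docs above; the helper functions and everything else are illustrative assumptions, not existing Roo Code or Ark SDK API.

```typescript
// Sketch of the extra request fields for each Ark caching API.
// Field names follow the linked docs; the helpers are assumed.

interface ResponsesCachingBody {
  previous_response_id?: string;
  caching: { type: "enabled" };
}

// Responses API: enable caching and optionally chain onto a prior response.
function responsesCachingBody(previousResponseId?: string): ResponsesCachingBody {
  const body: ResponsesCachingBody = { caching: { type: "enabled" } };
  if (previousResponseId) body.previous_response_id = previousResponseId;
  return body;
}

interface ContextSessionBody {
  context_id: string;
  mode: "session";
}

// Context API: reference a previously created context in session mode
// (auto-append: convenient, but there is no going back to a checkpoint).
function contextSessionBody(contextId: string): ContextSessionBody {
  return { context_id: contextId, mode: "session" };
}

console.log(JSON.stringify(responsesCachingBody("resp_abc")));
console.log(JSON.stringify(contextSessionBody("ctx_xyz")));
```

These fragments would be merged into the provider's request body; the surrounding request plumbing is left out since it depends on how Roo Code's provider layer ends up shaped.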
Both are perfectly serviceable for append-only chat (session mode). However, for going back to checkpoints, the responses API seems to provide finer-grained control.
- The first roadblock is that the Roo Code codebase currently uses the OpenAI Chat Completions API instead of the newer, recommended OpenAI Responses API. That needs to be fixed first. See https://platform.openai.com/docs/guides/responses-vs-chat-completions?api-mode=responses.
- After that, using the Responses API with caching is the natural next step. Some care in setting the expiry time is recommended (try 1 hour).
- There's also the problem of cache management.
- A message can only be cached if it and all preceding messages had `"caching": {"type": "enabled"}`.
- Removing a cached response also removes all subsequent cached responses that depend on it. The response ids of those subsequent responses remain valid, however, and referencing them causes the subsequent responses to be re-generated and re-cached. I cannot tell from the documentation whether the user-assistant message pair from the cached response is kept or removed during re-generation.
- Caching costs money. I am not sure whether it is each chain of responses or each individual cached response that is billed, but the way the doc is written suggests the former. It states that the cost for each hour is the value of `cache_tokens` given in `usage` (if that hour contains a request), or otherwise the maximum value of `cache_tokens` from the previous hour. In any case, I think one hour of TTL is more than enough.
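
The first constraint above (a response is cacheable only if every request in its chain had caching enabled) suggests cache management needs to track the cacheable prefix of a conversation. A hedged sketch, with hypothetical record and helper names:

```typescript
// Hypothetical per-request bookkeeping record; nothing here is an
// existing Roo Code type, it only illustrates the chain constraint.
interface RequestRecord {
  id: string;
  cachingEnabled: boolean;
}

// Walk the chain in order and return the longest prefix that is
// still cacheable: the first request sent without caching breaks
// the chain for everything after it.
function cacheablePrefix(chain: RequestRecord[]): RequestRecord[] {
  const out: RequestRecord[] = [];
  for (const req of chain) {
    if (!req.cachingEnabled) break; // chain broken; nothing later is cacheable
    out.push(req);
  }
  return out;
}

const chain: RequestRecord[] = [
  { id: "r1", cachingEnabled: true },
  { id: "r2", cachingEnabled: true },
  { id: "r3", cachingEnabled: false },
  { id: "r4", cachingEnabled: true }, // not cacheable: r3 broke the chain
];
console.log(cacheablePrefix(chain).map((r) => r.id)); // → ["r1", "r2"]
```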
Trade-offs and risks (REQUIRED if contributing, optional otherwise)
Code complexity, mostly. Moving to the Responses API should also benefit native OpenAI models.