Description
What specific problem does this solve?
Ark allows context caching for some models, providing cost and speed benefits. This should be added to Roo Code.
Additional context (optional)
No response
Roo Code Task Links (Optional)
No response
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear impact and context
Interested in implementing this?
- Yes, I'd like to help implement this feature
Implementation requirements
- I understand this needs approval before implementation begins
How should this be solved? (REQUIRED if contributing, optional otherwise)
There are two caching APIs on Ark: the context API and the OpenAI-compatible Responses API. Neither should be much work to add, and both will save time and money, assuming a sensible expiry policy (storage costs money too).
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
Given an already existing chat,
When the user adds a request to it,
Then the request should include a reference identifying an ongoing context/previous response
And the returned usage object has prompt_tokens_details.cached_tokens being nonzero
Given a previously set checkpoint,
When the user goes back to it,
Then the request should include a reference identifying the point in the checkpoint,
And the returned usage object has prompt_tokens_details.cached_tokens being nonzero,
And the returned dialogue should not show any signs of being affected by input that happened after the checkpoint
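
The `cached_tokens` condition in the criteria above could be checked with a small helper like this sketch. The `Usage` shape assumes Ark mirrors the OpenAI-style `prompt_tokens_details` field; the helper name is illustrative, not existing Roo Code API.

```typescript
// Assumed shape of the usage object returned by the API (mirrors
// OpenAI's prompt_tokens_details; optional fields hedge against
// responses that omit cache details entirely).
interface Usage {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// The acceptance check: a follow-up request in an existing chat
// should report a nonzero number of cached prompt tokens.
function hitCache(usage: Usage): boolean {
  return (usage.prompt_tokens_details?.cached_tokens ?? 0) > 0;
}

console.log(hitCache({ prompt_tokens: 1200, prompt_tokens_details: { cached_tokens: 1100 } })); // → true
console.log(hitCache({ prompt_tokens: 1200 })); // → false
```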
Technical considerations (REQUIRED if contributing, optional otherwise)
API documentation:
- https://www.volcengine.com/docs/82379/1602228 (responses): use `previous_response_id`, referring to the `id` of a previous request, with the extra body content `"caching": {"type": "enabled"}`.
- https://www.volcengine.com/docs/82379/1396491 (context): use `context_id`. Set `mode: "session"` for auto-append, which is easier to use but makes it impossible to go back.
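
To make the two options concrete, here is a minimal sketch of the cache-related request fields for each API. The field names (`previous_response_id`, `caching`, `context_id`, `mode`) come from the docs above; the helper functions and everything else are illustrative assumptions, not existing Roo Code or Ark SDK API.

```typescript
// Sketch of the extra request fields for each Ark caching API.
// Field names follow the linked docs; the helpers are assumed.

interface ResponsesCachingBody {
  previous_response_id?: string;
  caching: { type: "enabled" };
}

// Responses API: enable caching and optionally chain onto a prior response.
function responsesCachingBody(previousResponseId?: string): ResponsesCachingBody {
  const body: ResponsesCachingBody = { caching: { type: "enabled" } };
  if (previousResponseId) body.previous_response_id = previousResponseId;
  return body;
}

interface ContextSessionBody {
  context_id: string;
  mode: "session";
}

// Context API: reference a previously created context in session mode
// (auto-append: convenient, but there is no going back to a checkpoint).
function contextSessionBody(contextId: string): ContextSessionBody {
  return { context_id: contextId, mode: "session" };
}

console.log(JSON.stringify(responsesCachingBody("resp_abc")));
console.log(JSON.stringify(contextSessionBody("ctx_xyz")));
```

These fragments would be merged into the provider's request body; the surrounding request plumbing is left out since it depends on how Roo Code's provider layer ends up shaped.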
Both are perfectly serviceable for append-only chat (session mode). However, for going back to checkpoints, the responses API seems to provide finer-grained control.
- The first roadblock is that the Roo Code codebase currently uses the OpenAI Chat Completions API instead of the newer, recommended OpenAI Responses API. That needs to be fixed first. See https://platform.openai.com/docs/guides/responses-vs-chat-completions?api-mode=responses.
- After that, using the Responses API with caching is the natural next step. Some care in setting the expiry time is recommended (try 1 hour).
- There's also the problem of cache management.
- A message can only be cached if it and all preceding messages had `"caching": {"type": "enabled"}`.
- Removing a cached response also removes all subsequent cached responses that depend on it. The response ids of those subsequent responses remain valid, however, and referencing them causes the subsequent responses to be re-generated and re-cached. I cannot tell from the documentation whether the user-assistant message pair from the cached response is kept or removed during re-generation.
- Caching costs money. I am not sure whether it is each chain of responses or each individual cached response that is billed, but the way the doc is written suggests the former. It states that the cost for each hour is the value of `cache_tokens` given in `usage` (if that hour contains a request), or otherwise the maximum value of `cache_tokens` from the previous hour. In any case, I think one hour of TTL is more than enough.
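
The first constraint above (a response is cacheable only if every request in its chain had caching enabled) suggests cache management needs to track the cacheable prefix of a conversation. A hedged sketch, with hypothetical record and helper names:

```typescript
// Hypothetical per-request bookkeeping record; nothing here is an
// existing Roo Code type, it only illustrates the chain constraint.
interface RequestRecord {
  id: string;
  cachingEnabled: boolean;
}

// Walk the chain in order and return the longest prefix that is
// still cacheable: the first request sent without caching breaks
// the chain for everything after it.
function cacheablePrefix(chain: RequestRecord[]): RequestRecord[] {
  const out: RequestRecord[] = [];
  for (const req of chain) {
    if (!req.cachingEnabled) break; // chain broken; nothing later is cacheable
    out.push(req);
  }
  return out;
}

const chain: RequestRecord[] = [
  { id: "r1", cachingEnabled: true },
  { id: "r2", cachingEnabled: true },
  { id: "r3", cachingEnabled: false },
  { id: "r4", cachingEnabled: true }, // not cacheable: r3 broke the chain
];
console.log(cacheablePrefix(chain).map((r) => r.id)); // → ["r1", "r2"]
```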
Trade-offs and risks (REQUIRED if contributing, optional otherwise)
Code complexity, mostly. Moving to the Responses API should also benefit native OpenAI models.