
Commit d6c9406

update context cache param
1 parent 7216983 commit d6c9406

File tree

3 files changed: +78 -16 lines changed

docs/my-website/docs/providers/vertex.md

Lines changed: 66 additions & 4 deletions
@@ -827,6 +827,72 @@ Use Vertex AI context caching is supported by calling provider api directly. (Un

[**Go straight to provider**](../pass_through/vertex_ai.md#context-caching)

#### 1. Create the Cache

First, create the cache by sending a `POST` request to the `cachedContents` endpoint via the LiteLLM proxy.

<Tabs>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
    "displayName": "example_cache",
    "contents": [{
      "role": "user",
      "parts": [{
        "text": ".... a long book to be cached"
      }]
    }]
  }'
```

</TabItem>
</Tabs>

#### 2. Get the Cache Name from the Response

Vertex AI will return a response containing the `name` of the cached content. This name is the identifier for your cached data.

```json
{
  "name": "projects/12341234/locations/{location}/cachedContents/123123123123123",
  "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
  "createTime": "2025-09-23T19:13:50.674976Z",
  "updateTime": "2025-09-23T19:13:50.674976Z",
  "expireTime": "2025-09-23T20:13:50.655988Z",
  "displayName": "example_cache",
  "usageMetadata": {
    "totalTokenCount": 1246,
    "textCount": 5132
  }
}
```

#### 3. Use the Cached Content

Use the `name` from the response as the cached content identifier in subsequent API calls to reuse the cached information. It is passed as `cachedContent` (or `cached_content`) in the body of your request to `/chat/completions`.

<Tabs>
<TabItem value="proxy" label="PROXY">

```json
{
  "cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "what is the book about?"
    }
  ]
}
```

</TabItem>
</Tabs>
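
For Python users, here is a minimal sketch of the same call made with the `requests` library; the proxy address and the `LITELLM_KEY` environment variable are carried over from the curl example in step 1, and the `cachedContent` value is the `name` returned in step 2.

```python
# Minimal sketch: reuse the cached content through the LiteLLM proxy's
# /chat/completions route. Assumes the proxy from step 1 is running at
# http://0.0.0.0:4000 and LITELLM_KEY holds a valid proxy key.
import os
import requests

response = requests.post(
    "http://0.0.0.0:4000/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['LITELLM_KEY']}",
    },
    json={
        # "name" returned by the cachedContents call in step 2
        "cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
        "model": "gemini-2.5-flash",
        "messages": [{"role": "user", "content": "what is the book about?"}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```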

## Pre-requisites
* `pip install google-cloud-aiplatform` (pre-installed on proxy docker image)
@@ -2736,7 +2802,3 @@ Once that's done, when you deploy the new container in the Google Cloud Run serv

s/o @[Darien Kindlund](https://www.linkedin.com/in/kindlund/) for this tutorial

litellm/llms/vertex_ai/gemini/transformation.py

Lines changed: 10 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -537,7 +537,11 @@ def sync_transform_request_body(
537537
logging_obj=logging_obj,
538538
)
539539
else: # [TODO] implement context caching for gemini as well
540-
cached_content = optional_params.pop("cached_content", None)
540+
cached_content = None
541+
if "cached_content" in optional_params:
542+
cached_content = optional_params.pop("cached_content")
543+
elif "cachedContent" in optional_params:
544+
cached_content = optional_params.pop("cachedContent")
541545

542546
return _transform_request_body(
543547
messages=messages,
@@ -584,7 +588,11 @@ async def async_transform_request_body(
584588
logging_obj=logging_obj,
585589
)
586590
else: # [TODO] implement context caching for gemini as well
587-
cached_content = optional_params.pop("cached_content", None)
591+
cached_content = None
592+
if "cached_content" in optional_params:
593+
cached_content = optional_params.pop("cached_content")
594+
elif "cachedContent" in optional_params:
595+
cached_content = optional_params.pop("cachedContent")
588596

589597
return _transform_request_body(
590598
messages=messages,
@@ -649,5 +657,3 @@ def _transform_system_message(
649657
return SystemInstructions(parts=system_content_blocks), messages
650658

651659
return None, messages
652-
653-
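
The effect of this change is that the request transformation now accepts the cache identifier under either the snake_case or the camelCase key. A standalone sketch of the same pop logic in isolation (`pop_cached_content` is a hypothetical name, not the litellm call site):

```python
from typing import Any, Dict, Optional

def pop_cached_content(optional_params: Dict[str, Any]) -> Optional[str]:
    """Mirror the updated extraction: prefer `cached_content`, fall back to `cachedContent`."""
    cached_content = None
    if "cached_content" in optional_params:
        cached_content = optional_params.pop("cached_content")
    elif "cachedContent" in optional_params:
        cached_content = optional_params.pop("cachedContent")
    return cached_content

# Both spellings resolve to the same value, and the key is popped so it is not
# forwarded to Vertex AI as an ordinary generation parameter.
params = {"cachedContent": "projects/123/locations/us-central1/cachedContents/456", "temperature": 0.2}
print(pop_cached_content(params))  # projects/123/locations/us-central1/cachedContents/456
print(params)                      # {'temperature': 0.2}
```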

litellm/llms/vertex_ai/vertex_llm_base.py

Lines changed: 2 additions & 8 deletions
```diff
@@ -271,17 +271,11 @@ def _ensure_access_token(
 
     def is_using_v1beta1_features(self, optional_params: dict) -> bool:
         """
-        VertexAI only supports ContextCaching on v1beta1
-
         use this helper to decide if request should be sent to v1 or v1beta1
 
-        Returns v1beta1 if context caching is enabled
-        Returns v1 in all other cases
+        Returns true if any beta feature is enabled
+        Returns false in all other cases
         """
-        if "cached_content" in optional_params:
-            return True
-        if "CachedContent" in optional_params:
-            return True
         return False
 
     def _check_custom_proxy(
```
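
With this commit the helper reports no beta features for any request, so a `cached_content`/`CachedContent` parameter on its own no longer forces the call onto `v1beta1`. As a hypothetical illustration of what the returned flag controls (not litellm's actual URL-building code), the boolean typically just selects the version segment of the Vertex AI endpoint:

```python
def vertex_api_base(location: str, uses_v1beta1: bool) -> str:
    # Hypothetical helper: Vertex AI serves the same routes under /v1 and /v1beta1,
    # so the flag only changes the version path segment.
    version = "v1beta1" if uses_v1beta1 else "v1"
    return f"https://{location}-aiplatform.googleapis.com/{version}"

# After this change, is_using_v1beta1_features(...) returns False even when
# cached_content is set, so requests go to the stable v1 endpoint:
print(vertex_api_base("us-central1", uses_v1beta1=False))
# https://us-central1-aiplatform.googleapis.com/v1
```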
