@@ -815,6 +815,77 @@ Use Vertex AI context caching is supported by calling provider api directly. (Un
[**Go straight to provider**](../pass_through/vertex_ai.md#context-caching)
+ #### 1. Create the Cache
+
+ First, create the cache by sending a `POST` request to the `cachedContents` endpoint via the LiteLLM proxy.
+
+ <Tabs>
+ <TabItem value="proxy" label="PROXY">
+
+ ```bash
+ curl http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
+   -H "Content-Type: application/json" \
+   -H "Authorization: Bearer $LITELLM_KEY" \
+   -d '{
+     "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
+     "displayName": "example_cache",
+     "contents": [{
+       "role": "user",
+       "parts": [{
+         "text": ".... a long book to be cached"
+       }]
+     }]
+   }'
+ ```
+
+ </TabItem>
+ </Tabs>
+
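+ If you are scripting this step, the same request can be made from Python. This is a minimal sketch rather than an official example: it assumes the proxy is running at `http://0.0.0.0:4000`, that `LITELLM_KEY` is set in the environment, and uses placeholder values for the project and location.
+
+ ```python
+ import os
+ import requests
+
+ PROJECT_ID = "your-project-id"  # placeholder - replace with your GCP project
+ LOCATION = "us-central1"        # placeholder - replace with your region
+
+ # Create the cache via the LiteLLM proxy's Vertex AI pass-through route,
+ # mirroring the curl request above.
+ resp = requests.post(
+     f"http://0.0.0.0:4000/vertex_ai/v1/projects/{PROJECT_ID}/locations/{LOCATION}/cachedContents",
+     headers={
+         "Content-Type": "application/json",
+         "Authorization": f"Bearer {os.environ['LITELLM_KEY']}",
+     },
+     json={
+         "model": f"projects/{PROJECT_ID}/locations/{LOCATION}/publishers/google/models/gemini-2.5-flash",
+         "displayName": "example_cache",
+         "contents": [
+             {"role": "user", "parts": [{"text": ".... a long book to be cached"}]}
+         ],
+     },
+ )
+ resp.raise_for_status()
+ print(resp.json()["name"])  # this is the identifier used in step 3
+ ```
+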
+ #### 2. Get the Cache Name from the Response
+
+ Vertex AI will return a response containing the `name` of the cached content. This name is the identifier for your cached data.
+
+ ```json
+ {
+   "name": "projects/12341234/locations/{location}/cachedContents/123123123123123",
+   "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
+   "createTime": "2025-09-23T19:13:50.674976Z",
+   "updateTime": "2025-09-23T19:13:50.674976Z",
+   "expireTime": "2025-09-23T20:13:50.655988Z",
+   "displayName": "example_cache",
+   "usageMetadata": {
+     "totalTokenCount": 1246,
+     "textCount": 5132
+   }
+ }
+ ```
+
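+ Continuing the Python sketch from step 1 (again, an illustrative snippet rather than official docs), the fields you typically care about here are `name` and `expireTime` - the cache is only usable until it expires:
+
+ ```python
+ from datetime import datetime, timezone
+
+ cache = resp.json()          # the response shown above
+ cache_name = cache["name"]   # pass this as `cachedContent` in step 3
+ expires_at = datetime.fromisoformat(cache["expireTime"].replace("Z", "+00:00"))
+
+ if expires_at <= datetime.now(timezone.utc):
+     print("cache has expired - create a new one before sending requests")
+ ```
+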
+ #### 3. Use the Cached Content
+
+ Use the `name` from the response as `cachedContent` or `cached_content` in subsequent API calls to reuse the cached information. This is passed in the body of your request to `/chat/completions`.
+
+ <Tabs>
+ <TabItem value="proxy" label="PROXY">
+
+ ```bash
+ curl http://0.0.0.0:4000/chat/completions \
+   -H "Content-Type: application/json" \
+   -H "Authorization: Bearer $LITELLM_KEY" \
+   -d '{
+     "cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
+     "model": "gemini-2.5-flash",
+     "messages": [
+       {
+         "role": "user",
+         "content": "what is the book about?"
+       }
+     ]
+   }'
+ ```
+
+ </TabItem>
+ </Tabs>
+
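+ The same call can also be made with the OpenAI Python SDK pointed at the LiteLLM proxy. This is a rough sketch (not from the upstream docs): the base URL, key variable, and cache name are the placeholders used above, and `extra_body` is used to forward the non-standard `cachedContent` field in the request body.
+
+ ```python
+ import os
+ from openai import OpenAI
+
+ client = OpenAI(
+     base_url="http://0.0.0.0:4000",      # LiteLLM proxy
+     api_key=os.environ["LITELLM_KEY"],
+ )
+
+ response = client.chat.completions.create(
+     model="gemini-2.5-flash",
+     messages=[{"role": "user", "content": "what is the book about?"}],
+     # replace with the `name` returned when you created the cache (step 2)
+     extra_body={"cachedContent": "projects/{project_id}/locations/{location}/cachedContents/{cache_id}"},
+ )
+ print(response.choices[0].message.content)
+ ```
+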
## Pre-requisites
* `pip install google-cloud-aiplatform` (pre-installed on proxy docker image)
@@ -2724,7 +2795,3 @@ Once that's done, when you deploy the new container in the Google Cloud Run serv
s/o @[Darien Kindlund](https://www.linkedin.com/in/kindlund/) for this tutorial
-
-
-
-