Vertex AI context caching is supported by calling the provider API directly. (Unified endpoint support coming soon.)

[**Go straight to provider**](../pass_through/vertex_ai.md#context-caching)

#### 1. Create the Cache

First, create the cache by sending a `POST` request to the `cachedContents` endpoint via the LiteLLM proxy.

<Tabs>
<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
    "displayName": "example_cache",
    "contents": [{
      "role": "user",
      "parts": [{
        "text": ".... a long book to be cached"
      }]
    }]
  }'
```

</TabItem>
</Tabs>
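
To confirm a cache was created, you could also list cached contents through the same route. This is a minimal sketch, assuming the pass-through forwards `GET` requests to Vertex AI the same way it forwards the `POST` above:

```bash
# List cached contents for the project/location (same placeholders as above).
curl http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
  -H "Authorization: Bearer $LITELLM_KEY"
```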

#### 2. Get the Cache Name from the Response

Vertex AI will return a response containing the `name` of the cached content. This name is the identifier for your cached data.

```json
{
  "name": "projects/12341234/locations/{location}/cachedContents/123123123123123",
  "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
  "createTime": "2025-09-23T19:13:50.674976Z",
  "updateTime": "2025-09-23T19:13:50.674976Z",
  "expireTime": "2025-09-23T20:13:50.655988Z",
  "displayName": "example_cache",
  "usageMetadata": {
    "totalTokenCount": 1246,
    "textCount": 5132
  }
}
```
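
For scripting, you can capture this identifier directly from the create call. A sketch using `jq` (an addition here, not part of the original flow; it assumes `jq` is installed and `$LITELLM_KEY` is set):

```bash
# Create the cache and store the returned "name" field for later requests.
CACHE_NAME=$(curl -s http://0.0.0.0:4000/vertex_ai/v1/projects/{project_id}/locations/{location}/cachedContents \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "model": "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-flash",
    "displayName": "example_cache",
    "contents": [{"role": "user", "parts": [{"text": ".... a long book to be cached"}]}]
  }' | jq -r '.name')

echo "$CACHE_NAME"
# e.g. projects/12341234/locations/{location}/cachedContents/123123123123123
```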

#### 3. Use the Cached Content

Use the `name` from the response as `cachedContent` in subsequent API calls to reuse the cached information. This is passed in the body of your request to `/chat/completions`.

<Tabs>
<TabItem value="proxy" label="PROXY">

```json
{
  "cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "what is the book about?"
    }
  ]
}
```

</TabItem>
</Tabs>
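
Putting it together, the full request might look like the following sketch (it assumes the same `http://0.0.0.0:4000` proxy address and `$LITELLM_KEY` as in step 1):

```bash
# Send a chat completion that reuses the cached book via "cachedContent".
curl http://0.0.0.0:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_KEY" \
  -d '{
    "cachedContent": "projects/545201925769/locations/us-central1/cachedContents/4511135542628319232",
    "model": "gemini-2.5-flash",
    "messages": [
      {"role": "user", "content": "what is the book about?"}
    ]
  }'
```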

## Pre-requisites

* `pip install google-cloud-aiplatform` (pre-installed on proxy docker image)

Once that's done, when you deploy the new container in the Google Cloud Run service...

s/o @[Darien Kindlund](https://www.linkedin.com/in/kindlund/) for this tutorial