articles/ai-services/openai/how-to/prompt-caching.md
Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 12/15/2024
+ms.date: 03/20/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -22,6 +22,7 @@ Caches are typically cleared within 5-10 minutes of inactivity and are always re
 
 Currently only the following models support prompt caching with Azure OpenAI:
 
+- `o3-mini-2025-01-31`
 - `o1-2024-12-17`
 - `o1-preview-2024-09-12`
 - `o1-mini-2024-09-12`
@@ -36,7 +37,7 @@ Currently only the following models support prompt caching with Azure OpenAI:
 
 ## API support
 
-Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o1 model family supports the `cached_tokens` API response parameter.
+Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o-series model family supports the `cached_tokens` API response parameter.
 
 ## Getting started
 
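The hunk above refers to the `cached_tokens` API response parameter. As a minimal sketch, assuming the count is nested under `usage.prompt_tokens_details` in the parsed chat completion JSON (this diff does not show the nesting), a helper such as the hypothetical `cached_token_count` below could read it defensively. The sample payload and its token counts are illustrative values, not real service output.

```python
from typing import Any, Mapping


def cached_token_count(response: Mapping[str, Any]) -> int:
    """Return the cached-token count from a parsed chat completion response.

    Assumes the value lives at usage.prompt_tokens_details.cached_tokens.
    Returns 0 when any level is missing or null, which is what a response
    from a model that does not report cached_tokens would look like here.
    """
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or 0)


# Hypothetical payload for illustration only; the token counts are made up.
sample = {
    "model": "o1-2024-12-17",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 1566,
        "prompt_tokens_details": {"cached_tokens": 1408},
    },
}

print(cached_token_count(sample))
```

Returning 0 instead of raising keeps the helper usable across models, since per the changed line only the o-series family reports `cached_tokens` at this time.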
@@ -50,7 +51,7 @@ When a match is found between the token computations in a prompt and the current
 ```json
 {
   "created": 1729227448,
-  "model": "o1-preview-2024-09-12",
+  "model": "o1-2024-12-17",
   "object": "chat.completion",
   "service_tier": null,
   "system_fingerprint": "fp_50cdd5dc04",
@@ -82,13 +83,13 @@ Prompt caching is supported for:
 |**Images**| Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. |`gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
-|**Tool use**| Both the messages array and tool definitions. |`gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) |
-|**Structured outputs**| Structured output schema is appended as a prefix to the system message. |`gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
+|**Tool use**| Both the messages array and tool definitions. |`gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) <br> `o3-mini` (version 2025-01-31) |
+|**Structured outputs**| Structured output schema is appended as a prefix to the system message. |`gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) <br> `o3-mini` (version 2025-01-31) |
 
 To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.
 
 ## Can I disable prompt caching?
 
-Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
+Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.