Skip to content

Commit 752e722

Browse files
Merge pull request #2525 from mrbullwinkle/mrb_01_27_2025_update
[Azure OpenAI] Prompt caching update
2 parents b16b401 + 33a50a1 commit 752e722

File tree

1 file changed

+5
-5
lines changed

1 file changed

+5
-5
lines changed

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -75,16 +75,16 @@ A single character difference in the first 1,024 tokens will result in a cache m
7575

7676
## What is cached?
7777

78-
The o1-series models are text only and don't support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array which are less likely to have an identical 1024 token prefix.
78+
o1-series models feature support varies by model. For more details, see our dedicated [reasoning models guide](./reasoning.md).
7979

8080
Prompt caching is supported for:
8181

8282
|**Caching supported**|**Description**|**Supported models**|
8383
|--------|--------|--------|
84-
| **Messages** | The complete messages array: system, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
85-
| **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini` |
86-
| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
87-
| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` |
84+
| **Messages** | The complete messages array: system, developer, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) <br> `o1` (version 2024-12-17) |
85+
| **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
86+
| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) <br> `o1` (version 2024-12-17) |
87+
| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
8888

8989
To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.
9090

0 commit comments

Comments
 (0)