
Commit 39c09ff

Merge pull request #3639 from mrbullwinkle/mrb_03_20_2025_prompt_caching
[Azure OpenAI] o3-mini prompt caching update
2 parents 715d0bb + 217d044 commit 39c09ff

File tree

1 file changed (+8, -7 lines)


articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 8 additions & 7 deletions
@@ -6,7 +6,7 @@ services: cognitive-services
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 12/15/2024
+ms.date: 03/20/2025
 author: mrbullwinkle
 ms.author: mbullwin
 recommendations: false
@@ -22,6 +22,7 @@ Caches are typically cleared within 5-10 minutes of inactivity and are always re
 
 Currently only the following models support prompt caching with Azure OpenAI:
 
+- `o3-mini-2025-01-31`
 - `o1-2024-12-17`
 - `o1-preview-2024-09-12`
 - `o1-mini-2024-09-12`
@@ -36,7 +37,7 @@ Currently only the following models support prompt caching with Azure OpenAI:
 
 ## API support
 
-Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o1 model family supports the `cached_tokens` API response parameter.
+Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o-series model family supports the `cached_tokens` API response parameter.
 
 ## Getting started
 
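Since only the o-series model versions are documented to return `cached_tokens`, a caller may want to guard on the model name before relying on the field. A minimal sketch, assuming the version list from the bullet points above (the helper name and the set are illustrative, not part of the API):

```python
# Model versions listed above as supporting prompt caching with Azure OpenAI
# (illustrative set; per the article, only o-series models report `cached_tokens`).
O_SERIES_CACHING_MODELS = {
    "o3-mini-2025-01-31",
    "o1-2024-12-17",
    "o1-preview-2024-09-12",
    "o1-mini-2024-09-12",
}

def reports_cached_tokens(model: str) -> bool:
    """True if this model version is documented to return `cached_tokens`."""
    return model in O_SERIES_CACHING_MODELS

print(reports_cached_tokens("o1-2024-12-17"))   # True
print(reports_cached_tokens("gpt-4o"))          # False
```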

@@ -50,7 +51,7 @@ When a match is found between the token computations in a prompt and the current
 ```json
 {
   "created": 1729227448,
-  "model": "o1-preview-2024-09-12",
+  "model": "o1-2024-12-17",
   "object": "chat.completion",
   "service_tier": null,
   "system_fingerprint": "fp_50cdd5dc04",
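The response in the diff above is truncated before the `usage` block, where `cached_tokens` is surfaced under `prompt_tokens_details`. A minimal sketch of reading it from a parsed response payload (the numeric values are made up for illustration; the field names follow the chat completions response shape):

```python
import json

# Illustrative response payload; only the caching-relevant fields are shown,
# and the token counts are invented sample values.
payload = json.loads("""
{
  "model": "o1-2024-12-17",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 2006,
    "completion_tokens": 300,
    "total_tokens": 2306,
    "prompt_tokens_details": {
      "cached_tokens": 1920
    }
  }
}
""")

# `cached_tokens` reports how many prompt tokens were served from the cache;
# treat it as 0 when the field is absent (no cacheable prefix matched).
details = payload["usage"].get("prompt_tokens_details", {})
cached = details.get("cached_tokens", 0)
print(f"{cached} of {payload['usage']['prompt_tokens']} prompt tokens came from the cache")
```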
@@ -82,13 +83,13 @@ Prompt caching is supported for:
 
 |**Caching supported**|**Description**|**Supported models**|
 |--------|--------|--------|
-| **Messages** | The complete messages array: system, developer, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) |
+| **Messages** | The complete messages array: system, developer, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) <br> `o3-mini` (version 2025-01-31) |
 | **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
-| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) |
-| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) |
+| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`gpt-4o-mini-realtime-preview` (version 2024-12-17)<br> `o1` (version 2024-12-17) <br> `o3-mini` (version 2025-01-31) |
+| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` <br> `o1` (version 2024-12-17) <br> `o3-mini` (version 2025-01-31) |
 
 To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.
 
 ## Can I disable prompt caching?
 
-Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
+Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
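Because the doc advises putting repetitive content at the beginning of the messages array, a request builder would keep the static content (system instructions, policy text) first and append the variable user input last. A minimal sketch under that reading; the prompt text and helper name are illustrative:

```python
# Hypothetical static prefix reused verbatim across requests (illustrative text).
STATIC_SYSTEM_PROMPT = (
    "You are a support assistant. Answer using the policy excerpts provided."
)

def build_messages(user_question: str) -> list:
    """Keep the stable content first so repeated requests share a common prefix."""
    return [
        # Identical across requests -> forms the repeatable, cacheable prefix.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},
        # Varies per request -> placed last so it doesn't disturb the prefix.
        {"role": "user", "content": user_question},
    ]

messages = build_messages("How do I reset my password?")
print([m["role"] for m in messages])   # ['system', 'user']
```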
