Commit a6c912b

Merge pull request #5872 from mrbullwinkle/mrb_07_03_2025_prompt_caching
[Azure OpenAI] [Prompt caching support updates]
2 parents: 73638c6 + e2c3453

File tree

1 file changed (+4, -16 lines)

articles/ai-foundry/openai/how-to/prompt-caching.md

Lines changed: 4 additions & 16 deletions
@@ -16,24 +16,12 @@ recommendations: false
 
 Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the service is able to retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for Standard deployment types and up to [100% discount on input tokens](/azure/ai-services/openai/concepts/provisioned-throughput) for Provisioned deployment types.
 
-Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.
+Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.
 
 ## Supported models
 
-Currently only the following models support prompt caching with Azure OpenAI:
-
-- `o3-mini-2025-01-31`
-- `o1-2024-12-17`
-- `o1-preview-2024-09-12`
-- `o1-mini-2024-09-12`
-- `gpt-4o-2024-11-20`
-- `gpt-4o-2024-08-06`
-- `gpt-4o-mini-2024-07-18`
-- `gpt-4o-realtime-preview` (version 2024-12-17)
-- `gpt-4o-mini-realtime-preview` (version 2024-12-17)
-- `gpt-4.1-2025-04-14`
-- `gpt-4.1-nano-2025-04-14`
-- `gpt-4.1-mini-2025-04-14`
+- Prompt caching is supported with all Azure OpenAI models GPT-4o and newer.
+- Prompt caching applies to models with chat completions, completions, responses, or real-time operations; it isn't available for models without these operations.
 
 ## API support
 
@@ -77,7 +65,7 @@ A single character difference in the first 1,024 tokens will result in a cache miss.
 
 ## What is cached?
 
-o1-series models feature support varies by model. For more details, see our dedicated [reasoning models guide](./reasoning.md).
+o1-series models feature support varies by model. For more information, see our dedicated [reasoning models guide](./reasoning.md).
 
 Prompt caching is supported for: