Commit 5f22b8c

update
1 parent 0855211 commit 5f22b8c

1 file changed (+11 -11 lines)

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 11 additions & 11 deletions
@@ -14,14 +14,14 @@ recommendations: false

# Prompt caching

-Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than re-process the same input tokens over and over again, the model is able to retain a temporary cache of processed input data to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost.
+Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the model is able to retain a temporary cache of processed input data to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost.

## Supported models

Currently only the following models support prompt caching with Azure OpenAI:

-- `o1-preview` (2024-09-12)
-- `o1-mini` (2024-09-12)
+- `o1-preview-2024-09-12`
+- `o1-mini-2024-09-12`

## API support

@@ -31,10 +31,10 @@ Official support for prompt caching was first added in API version `2024-10-01-p

For a request to take advantage of prompt caching the request must be:

-- A minimum of 1024 tokens in length.
-- The first 1024 tokens in the prompt must be identical.
+- A minimum of 1,024 tokens in length.
+- The first 1,024 tokens in the prompt must be identical.

-When a match is found between a prompt and the current content of the prompt cache it is referred to a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.
+When a match is found between a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.

```json
{
@@ -59,13 +59,13 @@ When a match is found between a prompt and the current content of the prompt cac
}
```

-After the first 1024 tokens cache hits will occur for every 128 additional identical tokens.
+After the first 1,024 tokens cache hits will occur for every 128 additional identical tokens.

-A single character difference in the first 1024 tokens will result in a cache miss which is characterized by a `cached_tokens` value of 0. Prompt caching is enabled by default with no additional configuration needed for supported models.
+A single character difference in the first 1,024 tokens will result in a cache miss which is characterized by a `cached_tokens` value of 0. Prompt caching is enabled by default with no additional configuration needed for supported models.

## What is cached?

-The o1-series models are text only and do not support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array which are less likely to have an identical 1024 token prefix.
+The o1-series models are text only and don't support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array which are less likely to have an identical 1024 token prefix.

Once prompt caching is enabled for other supported models prompt caching will expand to support:

@@ -76,8 +76,8 @@ Once prompt caching is enabled for other supported models prompt caching will ex
|**Tool use**| Both the messages array and tool definitions |
|**Structured outputs** | Structured output schema is appended as a prefix to the system message|

-To improve the likelihood of cache hits occurring you should structure your requests such that repetitive content occurs at the beginning of the messages array.
+To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.

## Can I disable prompt caching?

-Prompt caching is enabled by default. There is no opt out option.
+Prompt caching is enabled by default. There is no opt-out option.
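
Not part of the commit, but as a rough sketch of the changed guidance in practice: the example below assumes the `openai` Python package's `AzureOpenAI` client, placeholder environment variables for the endpoint and key, a hypothetical `o1-mini` deployment name, and the `2024-10-01-preview` API version the article cites. It puts the repetitive content at the start of the prompt and reads `cached_tokens` from `prompt_tokens_details` to check for a cache hit.

```python
# Illustrative sketch only (not part of this commit). Assumes the `openai`
# Python package with Azure support; endpoint, key, and deployment name are
# placeholders.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder env vars
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # version the article cites for prompt caching support
)

# Put the long, repeated content first so the opening 1,024 tokens are
# identical across requests; only the short question at the end varies.
shared_prefix = "..."  # placeholder for shared instructions/context (>= 1,024 tokens)

response = client.chat.completions.create(
    model="o1-mini",  # hypothetical deployment of o1-mini-2024-09-12
    messages=[{"role": "user", "content": shared_prefix + "\n\nQuestion: ..."}],
)

# cached_tokens > 0 indicates a cache hit; 0 indicates a cache miss.
details = response.usage.prompt_tokens_details
print("cached_tokens:", details.cached_tokens if details else 0)
```

Per the article, a `cached_tokens` value of 0 on an otherwise identical request usually means something differs within the first 1,024 tokens of the prompt.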
