
Commit 3fdec72

Learn Editor: Update prompt-caching.md
1 parent e9794a4 commit 3fdec72

File tree

1 file changed: +5 -3 lines changed


articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 5 additions & 3 deletions
@@ -14,7 +14,9 @@ recommendations: false
 
 # Prompt caching
 
-Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the model is able to retain a temporary cache of processed input data to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
+Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the service is able to retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
+
+Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.
 
 ## Supported models
 
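As a rough illustration of the 50% discount described in the added paragraph, the sketch below computes the billed input cost for a request where part of the prompt was served from the cache. The per-token price and token counts are hypothetical placeholders, not actual Azure OpenAI pricing.

```python
# Illustrative only: how a 50% discount on cached input tokens affects billed cost.
# The per-token price below is a hypothetical placeholder, not real Azure OpenAI pricing.

def billed_input_cost(prompt_tokens: int, cached_tokens: int, price_per_token: float) -> float:
    """Cached tokens are billed at 50% of the normal input token price."""
    uncached = prompt_tokens - cached_tokens
    return uncached * price_per_token + cached_tokens * price_per_token * 0.5

# Example: a 2,048-token prompt where the first 1,024 tokens hit the cache.
print(billed_input_cost(prompt_tokens=2048, cached_tokens=1024, price_per_token=0.00001))
```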

@@ -37,7 +39,7 @@ For a request to take advantage of prompt caching the request must be both:
 - A minimum of 1,024 tokens in length.
 - The first 1,024 tokens in the prompt must be identical.
 
-When a match is found between a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.
+When a match is found between the token computations in a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.
 
 ```json
 {
@@ -83,7 +85,7 @@ To improve the likelihood of cache hits occurring, you should structure your req
 
 ## Can I disable prompt caching?
 
-Prompt caching is enabled by default. There is no opt-out option.
+Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
 
 ## How does prompt caching work for Provisioned deployments?
 
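To tie the diffed passages together, here is a minimal sketch of a request shaped to benefit from prompt caching: the long, static instructions come first, only the short user question varies, and the response usage is inspected for cached tokens as described above. It assumes the `openai` Python package (v1.x) with an `AzureOpenAI` client, a hypothetical deployment name and API version, and the `prompt_tokens_details.cached_tokens` usage field; adjust names to your environment.

```python
import os
from openai import AzureOpenAI  # assumes openai>=1.x

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # hypothetical; use a version that reports cached tokens
)

# Keep the long, identical content (instructions, examples) at the start of the prompt
# so the first 1,024+ tokens match across requests; put the variable part last.
STATIC_INSTRUCTIONS = "You are a support assistant. ..."  # imagine >1,024 tokens of fixed text

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-deployment",  # hypothetical deployment name
        messages=[
            {"role": "system", "content": STATIC_INSTRUCTIONS},
            {"role": "user", "content": question},
        ],
    )
    # Cache hits are reported under the prompt token details of the usage block.
    details = getattr(response.usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) if details else 0
    print(f"cached_tokens: {cached}")
    return response.choices[0].message.content

ask("How do I reset my password?")
ask("What are your support hours?")  # a second call can reuse the cached prefix
```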
