
Commit 19b44af

Learn Editor: Update prompt-caching.md
1 parent 3fdec72 commit 19b44af

File tree

1 file changed: +3 -7 lines changed


articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 3 additions & 7 deletions
@@ -14,7 +14,7 @@ recommendations: false
 
 # Prompt caching
 
-Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the service is able to retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
+Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the service is able to retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for Standard deployment types and up to [100% discount on input tokens](/azure/ai-services/openai/concepts/provisioned-throughput) for Provisioned deployment types.
 
 Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.
 
@@ -30,7 +30,7 @@ Currently only the following models support prompt caching with Azure OpenAI:
 
 ## API support
 
-Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only `o1-preview-2024-09-12` and `o1-mini-2024-09-12` models support the `cached_tokens` API response parameter.
+Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o1 model family supports the `cached_tokens` API response parameter.
 
 ## Getting started
 
@@ -85,8 +85,4 @@ To improve the likelihood of cache hits occurring, you should structure your req
 
 ## Can I disable prompt caching?
 
-Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
-
-## How does prompt caching work for Provisioned deployments?
-
-For supported models on provisioned deployments, we discount up to 100% of cached input tokens. For more information, see our [Provisioned Throughput documentation](/azure/ai-services/openai/concepts/provisioned-throughput).
+Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
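
For context on the `cached_tokens` field referenced in the API support hunk above, here is a minimal sketch (not part of this commit) of how a caller might read it from a chat completions response. It assumes the `openai` Python SDK pointed at an Azure OpenAI resource; the endpoint, API key, and deployment name are placeholders.

```python
# Minimal sketch: observing cached_tokens in a chat completions response.
# Endpoint, key, and deployment name below are placeholders/assumptions.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # first API version with official prompt caching support
)

# A long, stable prefix at the start of the prompt makes cache hits more likely.
response = client.chat.completions.create(
    model="o1-mini",  # placeholder deployment name for a supported model
    messages=[
        {"role": "user", "content": "Summarize the key points of prompt caching."},
    ],
)

# usage.prompt_tokens_details.cached_tokens reports how many input tokens were served from cache.
details = response.usage.prompt_tokens_details
print("prompt tokens:", response.usage.prompt_tokens)
print("cached tokens:", details.cached_tokens if details else 0)
```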

0 commit comments
