
Commit 9d39745

Merge pull request #1743 from MicrosoftDocs/main
11/27 11:00 AM IST Publish
2 parents 796b906 + 4bd1ba4 commit 9d39745

4 files changed, +10 -12 lines changed

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 6 additions & 8 deletions
@@ -14,7 +14,9 @@ recommendations: false
 
 # Prompt caching
 
-Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the model is able to retain a temporary cache of processed input data to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/).
+Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context is referring to the input you send to the model as part of your chat completions request. Rather than reprocess the same input tokens over and over again, the service is able to retain a temporary cache of processed input token computations to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost. For supported models, cached tokens are billed at a [50% discount on input token pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for Standard deployment types and up to [100% discount on input tokens](/azure/ai-services/openai/concepts/provisioned-throughput) for Provisioned deployment types.
+
+Caches are typically cleared within 5-10 minutes of inactivity and are always removed within one hour of the cache's last use. Prompt caches are not shared between Azure subscriptions.
 
 ## Supported models
 
@@ -28,7 +30,7 @@ Currently only the following models support prompt caching with Azure OpenAI:
 
 ## API support
 
-Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only `o1-preview-2024-09-12` and `o1-mini-2024-09-12` models support the `cached_tokens` API response parameter.
+Official support for prompt caching was first added in API version `2024-10-01-preview`. At this time, only the o1 model family supports the `cached_tokens` API response parameter.
 
 ## Getting started
 
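The `cached_tokens` field above is easiest to understand end to end. Below is a minimal sketch, not part of this commit: it assumes the `openai` npm package's `AzureOpenAI` client and a hypothetical `o1-mini` deployment on a placeholder resource; only the `2024-10-01-preview` API version and the `usage.prompt_tokens_details.cached_tokens` response field come from the changed text.

```javascript
import { AzureOpenAI } from "openai";

// Placeholder resource details -- substitute your own endpoint, key, and
// deployment name ("o1-mini" here is hypothetical).
const client = new AzureOpenAI({
  endpoint: "https://YOUR-RESOURCE.openai.azure.com",
  apiKey: process.env.AZURE_OPENAI_API_KEY,
  apiVersion: "2024-10-01-preview", // first API version with prompt caching support
  deployment: "o1-mini",
});

// The prompt must be at least 1,024 tokens long to be cacheable at all.
const longPrompt = "...";

const response = await client.chat.completions.create({
  model: "o1-mini",
  messages: [{ role: "user", content: longPrompt }],
});

// On a cache hit the service reports how many input tokens were served from
// the cache; expect 0 (or an absent field) on a miss.
console.log(response.usage?.prompt_tokens_details?.cached_tokens ?? 0);
```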

@@ -37,7 +39,7 @@ For a request to take advantage of prompt caching the request must be both:
 - A minimum of 1,024 tokens in length.
 - The first 1,024 tokens in the prompt must be identical.
 
-When a match is found between a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.
+When a match is found between the token computations in a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits will show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_token_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.
 
 ```json
 {
@@ -83,8 +85,4 @@ To improve the likelihood of cache hits occurring, you should structure your req
 
 ## Can I disable prompt caching?
 
-Prompt caching is enabled by default. There is no opt-out option.
-
-## How does prompt caching work for Provisioned deployments?
-
-For supported models on provisioned deployments, we discount up to 100% of cached input tokens. For more information, see our [Provisioned Throughput documentation](/azure/ai-services/openai/concepts/provisioned-throughput).
+Prompt caching is enabled by default for all supported models. There is no opt-out support for prompt caching.
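The hunk's truncated context line ("To improve the likelihood of cache hits occurring, you should structure your req…") is the practical takeaway of this file: put the long, static content first so the first 1,024 tokens are identical across requests, and let only the trailing content vary. A hedged sketch of that structuring, reusing the `client` from the previous example; the `ask` helper and the placeholder instruction text are hypothetical:

```javascript
// Static content first: this prefix must be byte-for-byte identical (and at
// least 1,024 tokens long) across requests for a cache hit. Placeholder text.
const staticInstructions = "You are a support agent. <long, unchanging policy text>";

async function ask(question) {
  const response = await client.chat.completions.create({
    model: "o1-mini",
    messages: [
      { role: "user", content: staticInstructions }, // shared prefix, always first
      { role: "user", content: question },           // variable content, always last
    ],
  });
  console.log("cached_tokens:", response.usage?.prompt_tokens_details?.cached_tokens ?? 0);
  return response.choices[0].message.content;
}

// The first call warms the cache; later calls with the same prefix can hit it
// until the cache is cleared (typically after 5-10 minutes of inactivity).
await ask("How do I reset my password?");
await ask("What is the refund window?");
```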

articles/ai-services/speech-service/fast-transcription-create.md

Lines changed: 1 addition & 1 deletion
@@ -1102,6 +1102,6 @@ Here are some property options to configure a transcription when you call the [T
 
 ## Related content
 
-- [Fast transcription REST API reference](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-2024-11-15&preserve-view=true)
+- [Fast transcription REST API reference](https://go.microsoft.com/fwlink/?linkid=2296107)
 - [Speech to text supported languages](./language-support.md?tabs=stt)
 - [Batch transcription](./batch-transcription.md)

articles/ai-services/speech-service/speech-services-quotas-and-limits.md

Lines changed: 2 additions & 2 deletions
@@ -136,10 +136,10 @@ The limits in this table apply per Speech resource when you create a personal vo
 | Quota | Free (F0)| Standard (S0) |
 |-----|-----|-----|
 | New connections per minute | Not available for F0 | 2 new connections per minute |
-| Max connection duration with speaking | Not available for F0 | 10 minutes<sup>1</sup> |
+| Max connection duration with speaking | Not available for F0 | 20 minutes<sup>1</sup> |
 | Max connection duration with idle state | Not available for F0 | 5 minutes |
 
-<sup>1</sup> To ensure continuous operation of the real-time avatar for more than 10 minutes, you can enable auto-reconnect. For information about how to set up auto-reconnect, refer to this [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
+<sup>1</sup> To ensure continuous operation of the real-time avatar for more than 20 minutes, you can enable auto-reconnect. For information about how to set up auto-reconnect, refer to this [sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
 
 #### Audio Content Creation tool
 
articles/ai-services/speech-service/text-to-speech-avatar/real-time-synthesis-avatar.md

Lines changed: 1 addition & 1 deletion
@@ -148,7 +148,7 @@ avatarSynthesizer.startAvatarAsync(peerConnection).then(
 );
 ```
 
-Our real-time API disconnects after 5 minutes of avatar's idle state. Even if the avatar isn't idle and functioning normally, the real-time API will disconnect after a 10-minute connection. To ensure continuous operation of the real-time avatar for more than 10 minutes, you can enable automatic reconnect. For information about how to set up automatic reconnect, refer to this [JavaScript sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
+Our real-time API disconnects after 5 minutes of avatar's idle state. Even if the avatar isn't idle and functioning normally, the real-time API will disconnect after a 20-minute connection. To ensure continuous operation of the real-time avatar for more than 20 minutes, you can enable automatic reconnect. For information about how to set up automatic reconnect, refer to this [JavaScript sample code](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/js/browser/avatar/README.md) (search "auto reconnect").
 
 ## Synthesize talking avatar video from text input
 