Commit 6d49935

Merge pull request #227625 from eric-urban/eur/quotas-limits
refactor tables and include speech translation
2 parents 4e678bc + 435b6b6 commit 6d49935

1 file changed

+40
-36
lines changed

articles/cognitive-services/Speech-Service/speech-services-quotas-and-limits.md

Lines changed: 40 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -8,36 +8,41 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: speech-service
1010
ms.topic: conceptual
11-
ms.date: 04/22/2022
11+
ms.date: 02/17/2023
1212
ms.author: alexeyo
1313
---
1414

1515
# Speech service quotas and limits
1616

1717
This article contains a quick reference and a detailed description of the quotas and limits for the Speech service in Azure Cognitive Services. The information applies to all [pricing tiers](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) of the service. It also contains some best practices to avoid request throttling.
1818

19+
For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
20+
1921
## Quotas and limits reference
2022

21-
The following sections provide you with a quick guide to the quotas and limits that apply to Speech service.
23+
The following sections provide you with a quick guide to the quotas and limits that apply to the Speech service.
24+
25+
For information about adjustable quotas for Standard (S0) Speech resources, see [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit). The quotas and limits for Free (F0) Speech resources aren't adjustable.
2226

2327
### Speech-to-text quotas and limits per resource
2428

25-
In the following tables, the parameters without the **Adjustable** row aren't adjustable for all price tiers.
29+
This section describes speech-to-text quotas and limits per Speech resource. Unless otherwise specified, the limits aren't adjustable.
2630

27-
#### Online transcription
31+
#### Online transcription and speech translation
2832

2933
You can use online transcription with the [Speech SDK](speech-sdk.md) or the [speech-to-text REST API for short audio](rest-speech-to-text-short.md).
3034

31-
| Quota | Free (F0)<sup>1</sup> | Standard (S0) |
35+
> [!IMPORTANT]
36+
> These limits apply to concurrent speech-to-text online transcription requests and speech translation requests combined. For example, if you have 60 concurrent speech-to-text requests and 40 concurrent speech translation requests, you'll reach the limit of 100 concurrent requests.
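
Because the limit is shared, a client that issues both kinds of requests may want to track them against one budget rather than two. The following is a minimal client-side sketch, not part of the Speech SDK: the class name `SharedConcurrencyBudget`, its methods, and the request-kind labels are hypothetical, and the default of 100 is taken from the Standard (S0) default in the table below.

```python
import threading


class SharedConcurrencyBudget:
    """Hypothetical client-side guard: one pool of slots shared by
    speech-to-text and speech translation requests."""

    def __init__(self, limit: int = 100):
        self.limit = limit
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        # Per-kind counters are for visibility only; the semaphore enforces the cap.
        self.in_flight = {"speech_to_text": 0, "speech_translation": 0}

    def try_acquire(self, kind: str) -> bool:
        """Reserve a slot without blocking. Returns False when the budget is used up."""
        if kind not in self.in_flight:
            raise ValueError(f"unknown request kind: {kind}")
        if not self._slots.acquire(blocking=False):
            return False
        with self._lock:
            self.in_flight[kind] += 1
        return True

    def release(self, kind: str) -> None:
        """Return a slot after the request finishes."""
        with self._lock:
            if self.in_flight.get(kind, 0) == 0:
                raise RuntimeError(f"no in-flight {kind} request to release")
            self.in_flight[kind] -= 1
        self._slots.release()
```

With 60 speech-to-text slots and 40 speech translation slots reserved, the next `try_acquire` call of either kind returns `False` until a slot is released.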
37+
38+
| Quota | Free (F0) | Standard (S0) |
3239
|--|--|--|
33-
| Concurrent request limit - base model endpoint | 1 | 100 (default value) |
34-
| Adjustable | No<sup>2</sup> | Yes<sup>2</sup> |
35-
| Concurrent request limit - custom endpoint | 1 | 100 (default value) |
36-
| Adjustable | No<sup>2</sup> | Yes<sup>2</sup> |
40+
| Concurrent request limit - base model endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit). |
41+
| Concurrent request limit - custom endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit). |
3742

3843
#### Batch transcription
3944

40-
| Quota | Free (F0)<sup>1</sup> | Standard (S0) |
45+
| Quota | Free (F0) | Standard (S0) |
4146
|--|--|--|
4247
| [Speech-to-text REST API](rest-speech-to-text.md) limit | Not available for F0 | 300 requests per minute |
4348
| Max audio input file size | N/A | 1 GB |
@@ -48,7 +53,9 @@ You can use online transcription with the [Speech SDK](speech-sdk.md) or the [sp
4853

4954
#### Model customization
5055

51-
| Quota | Free (F0)<sup>1</sup> | Standard (S0) |
56+
The limits in this table apply per Speech resource when you create a Custom Speech model.
57+
58+
| Quota | Free (F0) | Standard (S0) |
5259
|--|--|--|
5360
| REST API limit | 300 requests per minute | 300 requests per minute |
5461
| Max number of speech datasets | 2 | 500 |
@@ -57,33 +64,24 @@ You can use online transcription with the [Speech SDK](speech-sdk.md) or the [sp
5764
| Max pronunciation dataset file size for data import | 1 KB | 1 MB |
5865
| Max text size when you're using the `text` parameter in the [Models_Create](https://westcentralus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Models_Create/) API request | 200 KB | 500 KB |
5966

60-
<sup>1</sup> For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).<br/>
61-
<sup>2</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit).<br/>
67+
### Text-to-speech quotas and limits per resource
6268

63-
### Text-to-speech quotas and limits per Speech resource
69+
This section describes text-to-speech quotas and limits per Speech resource. Unless otherwise specified, the limits aren't adjustable.
6470

65-
In the following tables, the parameters without the **Adjustable** row aren't adjustable for all price tiers.
71+
#### Common text-to-speech quotas and limits
6672

67-
#### General
68-
69-
| Quota | Free (F0)<sup>3</sup> | Standard (S0) |
73+
| Quota | Free (F0) | Standard (S0) |
7074
|--|--|--|
71-
| **Max number of transactions per certain time period** | | |
72-
| Real-time API. Prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds | 200 transactions per second (TPS) (default value) |
73-
| Adjustable | No<sup>4</sup> | Yes<sup>5</sup>, up to 1000 TPS |
74-
| **HTTP-specific quotas** | | |
75+
| Maximum number of transactions per time period for prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds<br/><br/>This limit isn't adjustable. | 200 transactions per second (TPS) (default value)<br/><br/>The rate is adjustable up to 1000 TPS for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#text-to-speech-increase-concurrent-request-limit). |
7576
| Max audio length produced per request | 10 min | 10 min |
7677
| Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 | 50 |
77-
| **Websocket specific quotas** | | |
78-
| Max audio length produced per turn | 10 min | 10 min |
79-
| Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 | 50 |
80-
| Max SSML message size per turn | 64 KB | 64 KB |
78+
| Max SSML message size per turn for websocket | 64 KB | 64 KB |
8179

8280
#### Custom Neural Voice
8381

84-
| Quota | Free (F0)<sup>3</sup> | Standard (S0) |
82+
| Quota | Free (F0)| Standard (S0) |
8583
|--|--|--|
86-
| Max number of transactions per second (TPS) | Not available for F0 | See [General](#general) |
84+
| Max number of transactions per second (TPS) | Not available for F0 | 200 transactions per second (TPS) (default value) |
8785
| Max number of datasets | N/A | 500 |
8886
| Max number of simultaneous dataset uploads | N/A | 5 |
8987
| Max data file size for data import per dataset | N/A | 2 GB |
@@ -98,12 +96,20 @@ In the following tables, the parameters without the **Adjustable** row aren't ad
9896
| File size | 3,000 characters per file | 20,000 characters per file |
9997
| Export to audio library | 1 concurrent task | N/A |
10098

101-
<sup>3</sup> For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).<br/>
102-
<sup>4</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices) and [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling).<br/>
103-
<sup>5</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#text-to-speech-increase-concurrent-request-limit).<br/>
99+
### Speaker recognition quotas and limits per resource
100+
101+
Speaker recognition is limited to 20 transactions per second (TPS).
104102

105103
## Detailed description, quota adjustment, and best practices
106104

105+
Some of the Speech service quotas are adjustable. This section provides additional explanations, best practices, and adjustment instructions.
106+
107+
The following quotas are adjustable for Standard (S0) resources. The Free (F0) request limits aren't adjustable.
108+
109+
- Speech-to-text [concurrent request limit](#online-transcription-and-speech-translation) for base model endpoint and custom endpoint
110+
- Text-to-speech [maximum number of transactions per time period](#text-to-speech-quotas-and-limits-per-resource) for prebuilt neural voices and custom neural voices
111+
- Speech translation [concurrent request limit](#online-transcription-and-speech-translation)
112+
107113
Before requesting a quota increase (where applicable), ensure that it's necessary. Speech service uses autoscaling technologies to bring the required computational resources in on-demand mode. At the same time, Speech service tries to keep your costs low by not maintaining an excessive amount of hardware capacity.
108114

109115
Let's look at an example. Suppose that your application receives response code 429, which indicates that there are too many requests. Your application receives this response even though your workload is within the limits defined by the [Quotas and limits reference](#quotas-and-limits-reference). The most likely explanation is that Speech service is scaling up to your demand and didn't reach the required scale yet. Therefore the service doesn't immediately have enough resources to serve the request. In most cases, this throttled state is transient.
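
Since the throttled state is usually transient, one common way for a client to ride it out is to retry with exponential backoff. The sketch below illustrates that general pattern under stated assumptions; it isn't taken from the Speech SDK. The `send` callable is a hypothetical stand-in for whatever issues your request and returns a `(status_code, body)` pair, and the delay values are illustrative, not recommended settings.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry `send` while it returns HTTP 429, waiting longer after each attempt.

    `send` is a hypothetical callable supplied by the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # Out of attempts: hand the 429 back to the caller.
        # Exponential backoff with full jitter, capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))
    return status, body
```

The jitter spreads retries from many clients over time so that they don't all hit the service again at the same instant while it's still scaling up.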
@@ -121,14 +127,12 @@ The next sections describe specific cases of adjusting quotas.
121127

122128
### Speech-to-text: increase online transcription concurrent request limit
123129

124-
By default, the number of concurrent requests is limited to 100 per resource in the base model, and 100 per custom endpoint in the custom model. For the standard pricing tier, you can increase this amount. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.
130+
By default, the number of concurrent speech-to-text [online transcription requests and speech translation requests](#online-transcription-and-speech-translation) combined is limited to 100 per resource in the base model, and 100 per custom endpoint in the custom model. For the standard pricing tier, you can increase this amount. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.
125131

126132
>[!NOTE]
127-
> If you use custom models, be aware that one Speech service resource might be associated with many custom endpoints hosting many custom model deployments. Each custom endpoint has the default limit of concurrent requests (100) set by creation. If you need to adjust it, you need to make the adjustment of each custom endpoint *separately*. Note also that the value of the limit of concurrent requests for the base model of a resource has *no* effect to the custom endpoints associated with this resource.
128-
129-
Increasing the limit of concurrent requests doesn't directly affect your costs. Speech service uses a payment model that requires that you pay only for what you use. The limit defines how high the service can scale before it starts throttle your requests.
133+
> Concurrent request limits for base and custom models need to be adjusted separately. You can have a Speech service resource that's associated with many custom endpoints hosting many custom model deployments. As needed, the limit adjustments per custom endpoint must be requested separately.
130134
131-
Concurrent request limits for base and custom models need to be adjusted separately.
135+
Increasing the limit of concurrent requests doesn't directly affect your costs. The Speech service uses a payment model that requires that you pay only for what you use. The limit defines how high the service can scale before it starts to throttle your requests.
132136

133137
You aren't able to see the existing value of the concurrent request limit parameter in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.
134138
