articles/cognitive-services/Speech-Service/speech-services-quotas-and-limits.md
Lines changed: 35 additions & 18 deletions
@@ -16,28 +16,33 @@ ms.author: alexeyo
16
16
17
17
This article contains a quick reference and a detailed description of the quotas and limits for the Speech service in Azure Cognitive Services. The information applies to all [pricing tiers](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) of the service. It also contains some best practices to avoid request throttling.
18
18
19
+
For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
20
+
19
21
## Quotas and limits reference
20
22
21
23
The following sections provide you with a quick guide to the quotas and limits that apply to Speech service.
22
24
25
+
For information about adjustable quotas for Standard (S0) Speech resources, see [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit). Request limits for Free (F0) Speech resources aren't adjustable.
26
+
23
27
### Speech-to-text quotas and limits per resource
24
28
25
-
In the following tables, the parameters without the **Adjustable** row aren't adjustable for all price tiers.
29
+
This section describes speech-to-text quotas and limits per Speech resource.
26
30
27
31
#### Online transcription
28
32
29
33
You can use online transcription with the [Speech SDK](speech-sdk.md) or the [speech-to-text REST API for short audio](rest-speech-to-text-short.md).
30
34
31
-
| Quota | Free (F0)<sup>1</sup> | Standard (S0) |
35
+
> [!IMPORTANT]
36
+
> These limits apply to concurrent speech-to-text [online transcription](#online-transcription) requests and [speech translation](#speech-translation-quotas-and-limits-per-resource) requests combined. For example, if you have 50 concurrent speech-to-text requests and 50 concurrent speech translation requests, you'll reach the limit of 100 concurrent requests.
37
+
38
+
| Quota | Free (F0) | Standard (S0) |
32
39
|--|--|--|
33
40
| Concurrent request limit - base model endpoint | 1 | 100 (default value) |
|[Speech-to-text REST API](rest-speech-to-text.md) limit | Not available for F0 | 300 requests per minute |
43
48
| Max audio input file size | N/A | 1 GB |
@@ -48,7 +53,7 @@ You can use online transcription with the [Speech SDK](speech-sdk.md) or the [sp
48
53
49
54
#### Model customization
50
55
51
-
| Quota | Free (F0)<sup>1</sup>| Standard (S0) |
56
+
| Quota | Free (F0) | Standard (S0) |
52
57
|--|--|--|
53
58
| REST API limit | 300 requests per minute | 300 requests per minute |
54
59
| Max number of speech datasets | 2 | 500 |
@@ -57,20 +62,16 @@ You can use online transcription with the [Speech SDK](speech-sdk.md) or the [sp
57
62
| Max pronunciation dataset file size for data import | 1 KB | 1 MB |
58
63
| Max text size when you're using the `text` parameter in the [Models_Create](https://westcentralus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Models_Create/) API request | 200 KB | 500 KB |
59
64
60
-
<sup>1</sup> For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).<br/>
61
-
<sup>2</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-online-transcription-concurrent-request-limit).<br/>
62
-
63
65
### Text-to-speech quotas and limits per Speech resource
64
66
65
-
In the following tables, the parameters without the **Adjustable** row aren't adjustable for all price tiers.
67
+
This section describes text-to-speech quotas and limits per Speech resource.
66
68
67
69
#### General
68
70
69
-
| Quota | Free (F0)<sup>3</sup>| Standard (S0) |
71
+
| Quota | Free (F0) | Standard (S0) |
70
72
|--|--|--|
71
73
|**Max number of transactions per certain time period**|||
72
-
| Real-time API. Prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds | 200 transactions per second (TPS) (default value) |
73
-
| Adjustable | No<sup>4</sup> | Yes<sup>5</sup>, up to 1000 TPS |
74
+
| Real-time API. Prebuilt neural voices and custom neural voices. | 20 transactions per 60 seconds | 200 transactions per second (TPS) (default value)<br/><br/>The rate is adjustable up to 1000 TPS for Standard (S0) resources. See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#text-to-speech-increase-concurrent-request-limit). |
74
75
|**HTTP-specific quotas**|||
75
76
| Max audio length produced per request | 10 min | 10 min |
76
77
| Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 | 50 |
@@ -81,9 +82,9 @@ In the following tables, the parameters without the **Adjustable** row aren't ad
81
82
82
83
#### Custom Neural Voice
83
84
84
-
| Quota | Free (F0)<sup>3</sup> | Standard (S0) |
85
+
| Quota | Free (F0)| Standard (S0) |
85
86
|--|--|--|
86
-
| Max number of transactions per second (TPS) | Not available for F0 |See [General](#general)|
87
+
| Max number of transactions per second (TPS) | Not available for F0 |200 transactions per second (TPS) (default value) |
87
88
| Max number of datasets | N/A | 500 |
88
89
| Max number of simultaneous dataset uploads | N/A | 5 |
89
90
| Max data file size for data import per dataset | N/A | 2 GB |
@@ -98,12 +99,28 @@ In the following tables, the parameters without the **Adjustable** row aren't ad
98
99
| File size | 3,000 characters per file | 20,000 characters per file |
<sup>3</sup> For the free (F0) pricing tier, see also the monthly allowances at the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).<br/>
102
-
<sup>4</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices) and [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling).<br/>
103
-
<sup>5</sup> See [additional explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#text-to-speech-increase-concurrent-request-limit).<br/>
102
+
### Speech translation quotas and limits per resource
103
+
104
+
This section describes speech translation quotas and limits per Speech resource.
105
+
106
+
> [!IMPORTANT]
107
+
> These limits apply to concurrent speech-to-text [online transcription](#online-transcription) requests and [speech translation](#speech-translation-quotas-and-limits-per-resource) requests combined. For example, if you have 50 concurrent speech-to-text requests and 50 concurrent speech translation requests, you'll reach the limit of 100 concurrent requests.
108
+
109
+
| Quota | Free (F0) | Standard (S0) |
110
+
|--|--|--|
111
+
| Concurrent request limit - base model endpoint | 1 | 100 (default value) |
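
Because online transcription and speech translation requests count against the same concurrent request limit, a client that issues both kinds of request may want a single shared gate rather than one gate per feature. The following is a minimal sketch of that idea, not part of the Speech SDK: `SharedConcurrencyLimit` and `request_fn` are hypothetical names, and the limit you pass in is whatever value applies to your resource (for example, 100 for a default Standard (S0) resource).

```python
import threading


class SharedConcurrencyLimit:
    """Hypothetical client-side gate shared by transcription and translation calls."""

    def __init__(self, limit):
        # One pool of slots for both request types, because the service
        # counts them against the same concurrent request limit.
        self._slots = threading.BoundedSemaphore(limit)

    def try_run(self, request_fn):
        """Run request_fn if a slot is free. Returns (ran, result)."""
        if not self._slots.acquire(blocking=False):
            # No free slot: the caller can queue the work or retry later.
            return False, None
        try:
            return True, request_fn()
        finally:
            self._slots.release()
```

A caller would create one `SharedConcurrencyLimit` per Speech resource and route every transcription and translation call through `try_run`, so that the two request types can never exceed the limit together.
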
## Detailed description, quota adjustment, and best practices
106
115
116
+
Some of the Speech service quotas are adjustable. This section provides additional explanations, best practices, and adjustment instructions.
117
+
118
+
The following quotas are adjustable for Standard (S0) resources. The Free (F0) request limits aren't adjustable.
119
+
120
+
- Speech-to-text [concurrent request limit](#online-transcription) for base model endpoint and custom endpoint
121
+
- Text-to-speech [maximum number of transactions per time period](#text-to-speech-quotas-and-limits-per-speech-resource) for prebuilt neural voices and custom neural voices
Before requesting a quota increase (where applicable), ensure that it's necessary. Speech service uses autoscaling technologies to bring the required computational resources in on-demand mode. At the same time, Speech service tries to keep your costs low by not maintaining an excessive amount of hardware capacity.
108
125
109
126
Let's look at an example. Suppose that your application receives response code 429, which indicates that there are too many requests. Your application receives this response even though your workload is within the limits defined by the [Quotas and limits reference](#quotas-and-limits-reference). The most likely explanation is that Speech service is scaling up to your demand and didn't reach the required scale yet. Therefore the service doesn't immediately have enough resources to serve the request. In most cases, this throttled state is transient.
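
Because this throttled state is usually transient, one common way for a client to handle response code 429 is to retry after a growing, randomized delay. The following is a minimal sketch under stated assumptions rather than a prescribed implementation: `ThrottledError`, `send_request`, and the delay values are hypothetical, and your own request function is assumed to raise `ThrottledError` when the service returns 429.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error that the caller's request function raises on HTTP 429."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request, retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the throttling error to the caller.
                raise
            # The upper bound doubles on each retry and is capped at max_delay.
            # The random jitter keeps many clients from retrying in lockstep.
            upper_bound = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper_bound))
```

The delay values here are illustrative only. If the 429 responses persist after several retries while the workload stays within the documented limits, that is a better signal for requesting a quota increase than a single throttled request.
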