articles/cognitive-services/Speech-Service/rest-text-to-speech.md
27 additions & 18 deletions
@@ -3,13 +3,13 @@ title: Text-to-speech API reference (REST) - Speech service
 titleSuffix: Azure Cognitive Services
 description: Learn how to use the text-to-speech REST API. In this article, you'll learn about authorization options, query options, how to structure a request and receive a response.
 services: cognitive-services
-author: erhopf
+author: IEvangelist
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 12/09/2019
-ms.author: erhopf
+ms.date: 03/23/2020
+ms.author: dapine
 ---
 
 # Text-to-speech REST API
@@ -94,35 +94,44 @@ This response has been truncated to illustrate the structure of a response.
         "Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
         "ShortName": "ar-EG-Hoda",
         "Gender": "Female",
-        "Locale": "ar-EG"
+        "Locale": "ar-EG",
+        "SampleRateHertz": "16000",
+        "VoiceType": "Standard"
     },
     {
         "Name": "Microsoft Server Speech Text to Speech Voice (ar-SA, Naayf)",
         "ShortName": "ar-SA-Naayf",
         "Gender": "Male",
-        "Locale": "ar-SA"
+        "Locale": "ar-SA",
+        "SampleRateHertz": "16000",
+        "VoiceType": "Standard"
     },
     {
         "Name": "Microsoft Server Speech Text to Speech Voice (bg-BG, Ivan)",
         "ShortName": "bg-BG-Ivan",
         "Gender": "Male",
-        "Locale": "bg-BG"
+        "Locale": "bg-BG",
+        "SampleRateHertz": "16000",
+        "VoiceType": "Standard"
     },
     {
         "Name": "Microsoft Server Speech Text to Speech Voice (ca-ES, HerenaRUS)",
         "ShortName": "ca-ES-HerenaRUS",
         "Gender": "Female",
-        "Locale": "ca-ES"
+        "Locale": "ca-ES",
+        "SampleRateHertz": "16000",
+        "VoiceType": "Standard"
     },
     {
-        "Name": "Microsoft Server Speech Text to Speech Voice (cs-CZ, Jakub)",
-        "ShortName": "cs-CZ-Jakub",
-        "Gender": "Male",
-        "Locale": "cs-CZ"
+        "Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
+        "ShortName": "zh-CN-XiaoxiaoNeural",
+        "Gender": "Female",
+        "Locale": "zh-CN",
+        "SampleRateHertz": "24000",
+        "VoiceType": "Neural"
     },
 
     ...
-
 ]
 ```
 
@@ -136,7 +145,7 @@ The HTTP status code for each response indicates success or common errors.
 | 400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long. |
 | 401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region. |
 | 429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription. |
-| 502 | Bad Gateway| Network or server-side issue. May also indicate invalid headers. |
+| 502 | Bad Gateway| Network or server-side issue. May also indicate invalid headers. |
 
 
 ## Convert text-to-speech
@@ -186,7 +195,7 @@ The body of each `POST` request is sent as [Speech Synthesis Markup Language (SS
 
 ### Sample request
 
-This HTTP request uses SSML to specify the voice and language. The body cannot exceed 1,000 characters.
+This HTTP request uses SSML to specify the voice and language. If the body length is long, and the resulting audio exceeds 10 minutes - it is truncated to 10 minutes. In other words, the audio length cannot exceed 10 minutes.
 
 ```http
 POST /cognitiveservices/v1 HTTP/1.1
@@ -221,12 +230,12 @@ The HTTP status code for each response indicates success or common errors.
 | 413 | Request Entity Too Large | The SSML input is longer than 1024 characters. |
 | 415 | Unsupported Media Type | It's possible that the wrong `Content-Type` was provided. `Content-Type` should be set to `application/ssml+xml`. |
 | 429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription. |
-| 502 | Bad Gateway| Network or server-side issue. May also indicate invalid headers. |
+| 502 | Bad Gateway| Network or server-side issue. May also indicate invalid headers. |
 
 If the HTTP status is `200 OK`, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
 
 ## Next steps
 
--[Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
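
The voices-list hunk in this file adds `SampleRateHertz` and `VoiceType` to every entry of the sample response. As a rough sketch of how a client might consume the new fields, the Python below parses a payload shaped like that sample and groups voice short names by type. The three entries are copied from the diff. The helper names `group_by_voice_type` and `sample_rate_hz` are hypothetical. Treating `SampleRateHertz` as a string that needs converting follows the sample only, not any documented guarantee, and the `"Unknown"` fallback is an assumption.

```python
import json
from collections import defaultdict

# Three entries copied from the sample response shown in the diff.
voices_json = """
[
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
        "ShortName": "ar-EG-Hoda",
        "Gender": "Female",
        "Locale": "ar-EG",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (ar-SA, Naayf)",
        "ShortName": "ar-SA-Naayf",
        "Gender": "Male",
        "Locale": "ar-SA",
        "SampleRateHertz": "16000",
        "VoiceType": "Standard"
    },
    {
        "Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
        "ShortName": "zh-CN-XiaoxiaoNeural",
        "Gender": "Female",
        "Locale": "zh-CN",
        "SampleRateHertz": "24000",
        "VoiceType": "Neural"
    }
]
"""


def group_by_voice_type(payload):
    """Group voice short names by their VoiceType field (hypothetical helper)."""
    groups = defaultdict(list)
    for voice in json.loads(payload):
        # Assumption: entries without VoiceType are bucketed as "Unknown".
        groups[voice.get("VoiceType", "Unknown")].append(voice["ShortName"])
    return dict(groups)


def sample_rate_hz(voice):
    """Convert the string-encoded sample rate from the sample into an int."""
    return int(voice["SampleRateHertz"])


groups = group_by_voice_type(voices_json)
print(groups)
```

Grouping on `VoiceType` rather than matching a `Neural` suffix in `ShortName` keeps the client independent of naming conventions, which the diff does not promise.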
articles/cognitive-services/Speech-Service/speech-synthesis-markup.md
17 additions & 9 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/11/2020
+ms.date: 03/23/2020
 ms.author: dapine
 ---
 
@@ -326,23 +326,31 @@ Phonetic alphabets are composed of phones, which are made up of letters, numbers
 
 | Attribute | Description | Required / Optional |
 |-----------|-------------|---------------------|
-|`alphabet`| Specifies the phonetic alphabet to use when synthesizing the pronunciation of the string in the `ph` attribute. The string specifying the alphabet must be specified in lowercase letters. The following are the possible alphabets that you may specify.<ul><li>`ipa`– International Phonetic Alphabet</li><li>`sapi`– Speech service phonetic alphabet</li><li>`ups`– Universal Phone Set</li></ul><br>The alphabet applies only to the `phoneme` in the element. For more information, see [Phonetic Alphabet Reference](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet). | Optional |
+|`alphabet`| Specifies the phonetic alphabet to use when synthesizing the pronunciation of the string in the `ph` attribute. The string specifying the alphabet must be specified in lowercase letters. The following are the possible alphabets that you may specify.<ul><li>`ipa`–<a href="https://en.wikipedia.org/wiki/International_Phonetic_Alphabet" target="_blank">International Phonetic Alphabet <span class="docon docon-navigate-external x-hidden-focus"></span></a></li><li>`sapi`–[Speech service phonetic alphabet](speech-ssml-phonetic-sets.md)</li><li>`ups`– Universal Phone Set</li></ul><br>The alphabet applies only to the `phoneme` in the element.. | Optional |
 |`ph`| A string containing phones that specify the pronunciation of the word in the `phoneme` element. If the specified string contains unrecognized phones, the text-to-speech (TTS) service rejects the entire SSML document and produces none of the speech output specified in the document. | Required if using phonemes. |
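
The attribute table above says `alphabet` must be lowercase and one of `ipa`, `sapi`, or `ups`, and that `ph` carries the phone string. The following is a minimal Python sketch of building such a `phoneme` element with the standard library. The function name `phoneme_element` is hypothetical, the IPA string for "tomato" is only an illustrative transcription, and rejecting unknown alphabets on the client side is an assumption about what a caller might want, not documented service behavior.

```python
import xml.etree.ElementTree as ET

# Alphabet identifiers listed in the attribute table; they must be lowercase.
ALLOWED_ALPHABETS = {"ipa", "sapi", "ups"}


def phoneme_element(text, ph, alphabet="ipa"):
    """Return a serialized SSML <phoneme> element (hypothetical helper)."""
    if alphabet not in ALLOWED_ALPHABETS:
        # Assumption: fail early instead of sending an invalid alphabet.
        raise ValueError(
            f"alphabet must be one of {sorted(ALLOWED_ALPHABETS)}, got {alphabet!r}"
        )
    elem = ET.Element("phoneme", {"alphabet": alphabet, "ph": ph})
    elem.text = text
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(elem, encoding="unicode")


# Illustrative IPA transcription; not taken from the diff.
snippet = phoneme_element("tomato", "təˈmeɪtoʊ")
print(snippet)
```

Validating before sending matters here because, per the `ph` row, the service rejects the whole SSML document when it meets unrecognized phones, so a cheap local check on the alphabet name can catch one class of mistake early.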
articles/cognitive-services/Speech-Service/text-to-speech.md
4 additions & 4 deletions
@@ -3,13 +3,13 @@ title: Text-to-speech - Speech service
 titleSuffix: Azure Cognitive Services
 description: The text-to-speech feature in the Speech service enables your applications, tools, or devices to convert text into natural human-like synthesized speech. Choose preset voices or create your own custom voice.
 services: cognitive-services
-author: erhopf
+author: IEvangelist
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/11/2020
-ms.author: erhopf
+ms.date: 03/23/2020
+ms.author: dapine
 ---
 
 # What is text-to-speech?
@@ -26,7 +26,7 @@ Text-to-speech from the Speech service enables your applications, tools, or devi
 
 * Speech synthesis - Use the [Speech SDK](quickstarts/text-to-speech-audio-file.md) or [REST API](rest-text-to-speech.md) to convert text-to-speech using standard, neural, or custom voices.
 
-* Asynchronous synthesis of long audio - Use the [Long Audio API](long-audio-api.md) to asynchronously synthesize text-to-speech files longer than 10 minutes (for example audio books or lectures). Unlike synthesis performed using the Speech SDK or speech-to-text REST API, responses aren't returned in real time. The expectation is that requests are sent asynchronously, responses are polled for, and that the synthesized audio is downloaded when made available from the service. Only neural voices are supported.
+* Asynchronous synthesis of long audio - Use the [Long Audio API](long-audio-api.md) to asynchronously synthesize text-to-speech files longer than 10 minutes (for example audio books or lectures). Unlike synthesis performed using the Speech SDK or speech-to-text REST API, responses aren't returned in real time. The expectation is that requests are sent asynchronously, responses are polled for, and that the synthesized audio is downloaded when made available from the service. Only custom neural voices are supported.
 
 * Standard voices - Created using Statistical Parametric Synthesis and/or Concatenation Synthesis techniques. These voices are highly intelligible and sound natural. You can easily enable your applications to speak in more than 45 languages, with a wide range of voice options. These voices provide high pronunciation accuracy, including support for abbreviations, acronym expansions, date/time interpretations, polyphones, and more. For a full list of standard voices, see [supported languages](language-support.md#text-to-speech).