
Commit f1db56a

Merge pull request #108605 from IEvangelist/ttsSweep
Updated TTS docs, based on direct feedback
2 parents 4c8635a + c26b727 commit f1db56a

3 files changed, +48 -31 lines changed

articles/cognitive-services/Speech-Service/rest-text-to-speech.md

Lines changed: 27 additions & 18 deletions
@@ -3,13 +3,13 @@ title: Text-to-speech API reference (REST) - Speech service
 titleSuffix: Azure Cognitive Services
 description: Learn how to use the text-to-speech REST API. In this article, you'll learn about authorization options, query options, how to structure a request and receive a response.
 services: cognitive-services
-author: erhopf
+author: IEvangelist
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 12/09/2019
-ms.author: erhopf
+ms.date: 03/23/2020
+ms.author: dapine
 ---
 
 # Text-to-speech REST API
@@ -94,35 +94,44 @@ This response has been truncated to illustrate the structure of a response.
 "Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
 "ShortName": "ar-EG-Hoda",
 "Gender": "Female",
-"Locale": "ar-EG"
+"Locale": "ar-EG",
+"SampleRateHertz": "16000",
+"VoiceType": "Standard"
 },
 {
 "Name": "Microsoft Server Speech Text to Speech Voice (ar-SA, Naayf)",
 "ShortName": "ar-SA-Naayf",
 "Gender": "Male",
-"Locale": "ar-SA"
+"Locale": "ar-SA",
+"SampleRateHertz": "16000",
+"VoiceType": "Standard"
 },
 {
 "Name": "Microsoft Server Speech Text to Speech Voice (bg-BG, Ivan)",
 "ShortName": "bg-BG-Ivan",
 "Gender": "Male",
-"Locale": "bg-BG"
+"Locale": "bg-BG",
+"SampleRateHertz": "16000",
+"VoiceType": "Standard"
 },
 {
 "Name": "Microsoft Server Speech Text to Speech Voice (ca-ES, HerenaRUS)",
 "ShortName": "ca-ES-HerenaRUS",
 "Gender": "Female",
-"Locale": "ca-ES"
+"Locale": "ca-ES",
+"SampleRateHertz": "16000",
+"VoiceType": "Standard"
 },
 {
-"Name": "Microsoft Server Speech Text to Speech Voice (cs-CZ, Jakub)",
-"ShortName": "cs-CZ-Jakub",
-"Gender": "Male",
-"Locale": "cs-CZ"
+"Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
+"ShortName": "zh-CN-XiaoxiaoNeural",
+"Gender": "Female",
+"Locale": "zh-CN",
+"SampleRateHertz": "24000",
+"VoiceType": "Neural"
 },
 
 ...
-
 ]
 ```
 
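After this change, each entry in the voices list response carries `SampleRateHertz` and `VoiceType`. The following is a minimal Python sketch, assuming the response body has already been fetched as a JSON string, that selects voices by type and reads their sample rate. The function names are hypothetical, and `SAMPLE_RESPONSE` is just two entries copied from the hunk above.

```python
import json
from typing import Dict, List

# Two entries copied from the sample response above.
SAMPLE_RESPONSE = """[
  {"Name": "Microsoft Server Speech Text to Speech Voice (ar-EG, Hoda)",
   "ShortName": "ar-EG-Hoda", "Gender": "Female", "Locale": "ar-EG",
   "SampleRateHertz": "16000", "VoiceType": "Standard"},
  {"Name": "Microsoft Server Speech Text to Speech Voice (zh-CN, XiaoxiaoNeural)",
   "ShortName": "zh-CN-XiaoxiaoNeural", "Gender": "Female", "Locale": "zh-CN",
   "SampleRateHertz": "24000", "VoiceType": "Neural"}
]"""


def voices_of_type(voices_json: str, voice_type: str) -> List[Dict[str, str]]:
    """Return the voice entries whose VoiceType equals voice_type.

    Entries without a VoiceType field (the older response shape) are skipped.
    """
    return [v for v in json.loads(voices_json) if v.get("VoiceType") == voice_type]


def sample_rate_hz(voice: Dict[str, str]) -> int:
    """SampleRateHertz is returned as a string (for example "24000"), so convert it."""
    return int(voice["SampleRateHertz"])


if __name__ == "__main__":
    for voice in voices_of_type(SAMPLE_RESPONSE, "Neural"):
        print(voice["ShortName"], sample_rate_hz(voice))  # prints: zh-CN-XiaoxiaoNeural 24000
```

Using `dict.get` rather than indexing keeps the filter tolerant of entries that predate the `VoiceType` field.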
@@ -136,7 +145,7 @@ The HTTP status code for each response indicates success or common errors.
 | 400 | Bad Request | A required parameter is missing, empty, or null. Or, the value passed to either a required or optional parameter is invalid. A common issue is a header that is too long. |
 | 401 | Unauthorized | The request is not authorized. Check to make sure your subscription key or token is valid and in the correct region. |
 | 429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription. |
-| 502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers. |
+| 502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers. |
 
 
 ## Convert text-to-speech
@@ -186,7 +195,7 @@ The body of each `POST` request is sent as [Speech Synthesis Markup Language (SS
 
 ### Sample request
 
-This HTTP request uses SSML to specify the voice and language. The body cannot exceed 1,000 characters.
+This HTTP request uses SSML to specify the voice and language. If the body is long and the resulting audio exceeds 10 minutes, the audio is truncated to 10 minutes. In other words, the audio length can't exceed 10 minutes.
 
 ```http
 POST /cognitiveservices/v1 HTTP/1.1
@@ -221,12 +230,12 @@ The HTTP status code for each response indicates success or common errors.
 | 413 | Request Entity Too Large | The SSML input is longer than 1024 characters. |
 | 415 | Unsupported Media Type | It's possible that the wrong `Content-Type` was provided. `Content-Type` should be set to `application/ssml+xml`. |
 | 429 | Too Many Requests | You have exceeded the quota or rate of requests allowed for your subscription. |
-| 502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers. |
+| 502 | Bad Gateway | Network or server-side issue. May also indicate invalid headers. |
 
 If the HTTP status is `200 OK`, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
 
 ## Next steps
 
-- [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services/)
-- [Customize acoustic models](how-to-customize-acoustic-models.md)
-- [Customize language models](how-to-customize-language-model.md)
+- [Get your Speech trial subscription](https://azure.microsoft.com/try/cognitive-services)
+- [Asynchronous synthesis for long-form audio](quickstarts/text-to-speech/async-synthesis-long-form-audio.md)
+- [Get started with Custom Voice](how-to-custom-voice.md)
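The sample request above is a `POST` to `/cognitiveservices/v1` with an SSML body and, per the 415 row, `Content-Type: application/ssml+xml`. Below is a minimal Python sketch that builds such a request without sending it. The regional host pattern, the `Bearer` token scheme, the `X-Microsoft-OutputFormat` value, and the `User-Agent` string are assumptions not shown in this diff, and `build_tts_request` is a hypothetical name.

```python
import urllib.request


def build_tts_request(region: str, access_token: str, ssml: str,
                      output_format: str = "riff-24khz-16bit-mono-pcm") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech POST request.

    Assumptions: the host name pattern, the Bearer scheme, the output format
    value, and the User-Agent are not taken from this diff; only the
    /cognitiveservices/v1 path and the Content-Type value are.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Authorization": f"Bearer {access_token}",
        # A wrong Content-Type yields 415 Unsupported Media Type.
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": "tts-rest-sketch",  # hypothetical application name
    }
    return urllib.request.Request(url, data=ssml.encode("utf-8"),
                                  headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` returns the audio bytes on `200 OK`; for the 4xx and 5xx codes in the tables, `urlopen` raises `urllib.error.HTTPError`, whose `code` attribute holds the status.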

articles/cognitive-services/Speech-Service/speech-synthesis-markup.md

Lines changed: 17 additions & 9 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/11/2020
+ms.date: 03/23/2020
 ms.author: dapine
 ---
 
@@ -326,23 +326,31 @@ Phonetic alphabets are composed of phones, which are made up of letters, numbers
 
 | Attribute | Description | Required / Optional |
 |-----------|-------------|---------------------|
-| `alphabet` | Specifies the phonetic alphabet to use when synthesizing the pronunciation of the string in the `ph` attribute. The string specifying the alphabet must be specified in lowercase letters. The following are the possible alphabets that you may specify.<ul><li>`ipa` &ndash; International Phonetic Alphabet</li><li>`sapi` &ndash; Speech service phonetic alphabet</li><li>`ups` &ndash; Universal Phone Set</li></ul><br>The alphabet applies only to the `phoneme` in the element. For more information, see [Phonetic Alphabet Reference](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet). | Optional |
+| `alphabet` | Specifies the phonetic alphabet to use when synthesizing the pronunciation of the string in the `ph` attribute. The string specifying the alphabet must be specified in lowercase letters. The following are the possible alphabets that you may specify.<ul><li>`ipa` &ndash; <a href="https://en.wikipedia.org/wiki/International_Phonetic_Alphabet" target="_blank">International Phonetic Alphabet <span class="docon docon-navigate-external x-hidden-focus"></span></a></li><li>`sapi` &ndash; [Speech service phonetic alphabet](speech-ssml-phonetic-sets.md)</li><li>`ups` &ndash; Universal Phone Set</li></ul><br>The alphabet applies only to the `phoneme` in the element. | Optional |
 | `ph` | A string containing phones that specify the pronunciation of the word in the `phoneme` element. If the specified string contains unrecognized phones, the text-to-speech (TTS) service rejects the entire SSML document and produces none of the speech output specified in the document. | Required if using phonemes. |
 
 **Examples**
 
-```XML
+```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
-<s>His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme></s>
+<phoneme alphabet="ipa" ph="t&#x259;mei&#x325;&#x27E;ou&#x325;"> tomato </phoneme>
+</voice>
+</speak>
+```
+
+```xml
+<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
+<voice name="en-US-AriaRUS">
+<phoneme alphabet="sapi" ph="iy eh n y uw eh s"> en-US </phoneme>
 </voice>
 </speak>
 ```
 
 ```xml
 <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 <voice name="en-US-AriaRUS">
-<phoneme alphabet="ipa" ph="t&#x259;mei&#x325;&#x27E;ou&#x325;"> tomato </phoneme>
+<s>His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme></s>
 </voice>
 </speak>
 ```
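The phoneme examples above write SSML by hand, where a wrong namespace URI or a mistyped alphabet name is easy to miss. The following is a minimal Python sketch, using only the standard-library `xml.etree.ElementTree`, that builds the same `speak` / `voice` / `phoneme` structure and rejects alphabets outside the lowercase `ipa`, `sapi`, and `ups` values from the attribute table. `build_phoneme_ssml` is a hypothetical helper, not part of the Speech SDK.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # serialized with the xml: prefix

# Emit SSML as the default namespace instead of ElementTree's "ns0:" prefix.
ET.register_namespace("", SSML_NS)

# Alphabet values from the attribute table; they must be lowercase.
ALPHABETS = {"ipa", "sapi", "ups"}


def build_phoneme_ssml(voice: str, text: str, alphabet: str, ph: str,
                       lang: str = "en-US") -> str:
    """Return an SSML document that speaks `text` using the phones in `ph`."""
    if alphabet not in ALPHABETS:
        raise ValueError(f"unsupported alphabet {alphabet!r}; expected one of {sorted(ALPHABETS)}")
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    phoneme = ET.SubElement(voice_el, f"{{{SSML_NS}}}phoneme", {"alphabet": alphabet, "ph": ph})
    phoneme.text = text
    # Non-ASCII phones (for example IPA symbols) are written as literal characters.
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Same voice, alphabet, and phones as the sapi example above.
    print(build_phoneme_ssml("en-US-AriaRUS", "en-US", "sapi", "iy eh n y uw eh s"))
```

The serializer handles escaping, so a `ph` value containing characters such as `&` or `"` still produces well-formed XML.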
@@ -425,13 +433,13 @@ You can use the `sapi` as the vale for the `alphabet` attribute with custom lexi
 <?xml version="1.0" encoding="UTF-16"?>
 <lexicon version="1.0"
 xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
-xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
-xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
 http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
 alphabet="sapi" xml:lang="en-US">
 <lexeme>
-<grapheme>BTW</grapheme>
-<alias> By the way </alias>
+<grapheme>BTW</grapheme>
+<alias> By the way </alias>
 </lexeme>
 <lexeme>
 <grapheme> Benigni </grapheme>
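A custom lexicon like the one above maps each `grapheme` either to an `alias` (as `BTW` is mapped to "By the way") or to a pronunciation. The following is a minimal Python sketch, using the standard-library `xml.etree.ElementTree`, that lists the alias substitutions a lexicon file defines, which can help review a lexicon before uploading it. `read_aliases` is a hypothetical helper, and `SAMPLE_LEXICON` is a trimmed copy of the lexicon above (only the `BTW` lexeme, without the XML declaration and schema attributes).

```python
import xml.etree.ElementTree as ET
from typing import Dict

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Trimmed copy of the lexicon above, keeping only the BTW lexeme.
SAMPLE_LEXICON = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="sapi" xml:lang="en-US">
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias> By the way </alias>
  </lexeme>
</lexicon>"""


def read_aliases(lexicon_xml: str) -> Dict[str, str]:
    """Map each grapheme to its alias, skipping lexemes that have no <alias>."""
    root = ET.fromstring(lexicon_xml)
    aliases: Dict[str, str] = {}
    for lexeme in root.iter(PLS_NS + "lexeme"):
        alias = lexeme.find(PLS_NS + "alias")
        if alias is None or alias.text is None:
            continue  # for example, a lexeme that gives a <phoneme> instead
        for grapheme in lexeme.findall(PLS_NS + "grapheme"):
            if grapheme.text:
                # Surrounding whitespace, as in "<alias> By the way </alias>", is trimmed.
                aliases[grapheme.text.strip()] = alias.text.strip()
    return aliases
```

For the sample, `read_aliases(SAMPLE_LEXICON)` returns `{"BTW": "By the way"}`.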

articles/cognitive-services/Speech-Service/text-to-speech.md

Lines changed: 4 additions & 4 deletions
@@ -3,13 +3,13 @@ title: Text-to-speech - Speech service
 titleSuffix: Azure Cognitive Services
 description: The text-to-speech feature in the Speech service enables your applications, tools, or devices to convert text into natural human-like synthesized speech. Choose preset voices or create your own custom voice.
 services: cognitive-services
-author: erhopf
+author: IEvangelist
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: speech-service
 ms.topic: conceptual
-ms.date: 03/11/2020
-ms.author: erhopf
+ms.date: 03/23/2020
+ms.author: dapine
 ---
 
 # What is text-to-speech?
@@ -26,7 +26,7 @@ Text-to-speech from the Speech service enables your applications, tools, or devi
 
 * Speech synthesis - Use the [Speech SDK](quickstarts/text-to-speech-audio-file.md) or [REST API](rest-text-to-speech.md) to convert text-to-speech using standard, neural, or custom voices.
 
-* Asynchronous synthesis of long audio - Use the [Long Audio API](long-audio-api.md) to asynchronously synthesize text-to-speech files longer than 10 minutes (for example audio books or lectures). Unlike synthesis performed using the Speech SDK or speech-to-text REST API, responses aren't returned in real time. The expectation is that requests are sent asynchronously, responses are polled for, and that the synthesized audio is downloaded when made available from the service. Only neural voices are supported.
+* Asynchronous synthesis of long audio - Use the [Long Audio API](long-audio-api.md) to asynchronously synthesize text-to-speech files longer than 10 minutes (for example audio books or lectures). Unlike synthesis performed using the Speech SDK or text-to-speech REST API, responses aren't returned in real time. The expectation is that requests are sent asynchronously, responses are polled for, and the synthesized audio is downloaded when the service makes it available. Only custom neural voices are supported.
 
 * Standard voices - Created using Statistical Parametric Synthesis and/or Concatenation Synthesis techniques. These voices are highly intelligible and sound natural. You can easily enable your applications to speak in more than 45 languages, with a wide range of voice options. These voices provide high pronunciation accuracy, including support for abbreviations, acronym expansions, date/time interpretations, polyphones, and more. For a full list of standard voices, see [supported languages](language-support.md#text-to-speech).
 
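The Long Audio API bullet describes a submit, poll, then download workflow. The following is a minimal Python sketch of the polling step only, written as a generic loop. `wait_for_synthesis`, the `get_status` callable, and the status strings `"Succeeded"` and `"Failed"` are hypothetical placeholders, not the Long Audio API's actual endpoints or status values.

```python
import time
from typing import Callable


def wait_for_synthesis(
    get_status: Callable[[], str],
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout elapses.

    "Succeeded" and "Failed" are hypothetical terminal states chosen for this
    sketch; substitute the values the service actually documents.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"synthesis still {status!r} after {timeout_s} s")
        sleep(interval_s)  # wait before asking again instead of polling in a tight loop
```

Injecting `sleep` and `clock` keeps the loop testable without real delays; in practice `get_status` would wrap whatever status request the service documents, and the download happens only after a successful terminal state.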