articles/cognitive-services/Speech-Service/rest-text-to-speech.md
38 additions & 23 deletions
@@ -230,29 +230,6 @@ This table lists required and optional headers for text-to-speech requests:
 |`X-Microsoft-OutputFormat`| Specifies the audio output format. For a complete list of accepted values, see [Audio outputs](#audio-outputs). | Required |
 |`User-Agent`| The application name. The provided value must be fewer than 255 characters. | Required |
 
-### Audio outputs
-
-This is a list of supported audio formats that are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 24-kHz, 16-kHz, and 8-kHz audio outputs.
-> If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
-
 ### Request body
 
 If you're using a custom neural voice, the body of a request can be sent as plain text (ASCII or UTF-8). Otherwise, the body of each `POST` request is sent as [SSML](speech-synthesis-markup.md). SSML allows you to choose the voice and language of the synthesized speech that the text-to-speech feature returns. For a complete list of supported voices, see [Language and voice support for the Speech service](language-support.md#text-to-speech).
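To show how the headers in this hunk and the SSML request body fit together, here is a minimal Python sketch that builds the `POST` request without sending it. The endpoint pattern, the `Ocp-Apim-Subscription-Key` authentication header, and the `application/ssml+xml` content type are assumptions that do not appear in this diff, and `build_ssml` and `build_tts_request` are hypothetical helper names used only for illustration.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

# Assumption: this endpoint pattern is not shown in the diff above.
ENDPOINT_TEMPLATE = "https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US"):
    """Wrap plain text in a minimal SSML document that selects a voice."""
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice xml:lang='{lang}' name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )


def build_tts_request(region, key, text,
                      output_format="ogg-24khz-16bit-mono-opus",
                      user_agent="tts-sketch"):
    """Build (but do not send) a text-to-speech POST request."""
    # The header table requires a User-Agent of fewer than 255 characters.
    if len(user_agent) >= 255:
        raise ValueError("User-Agent must be fewer than 255 characters")
    headers = {
        # Assumption: authentication and content-type headers are not in this diff.
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        # Both headers below are marked Required in the table.
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": user_agent,
    }
    return Request(
        ENDPOINT_TEMPLATE.format(region=region),
        data=build_ssml(text).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected offline. The text is XML-escaped before it is placed in the `<voice>` element so that characters such as `<` or `&` do not break the SSML.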
@@ -293,6 +270,44 @@ The HTTP status code for each response indicates success or common errors:
 
 If the HTTP status is `200 OK`, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
 
+## Audio outputs
+
+This is a list of supported audio formats that are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Prebuilt neural voices are created from samples that use a 24-kHz sample rate. All voices can upsample or downsample to other sample rates when synthesizing.
+> en-US-AriaNeural, en-US-JennyNeural, and zh-CN-XiaoxiaoNeural are available in public preview in 48-kHz output. Other voices support 24-kHz audio upsampled to 48-kHz output.
+
+> [!NOTE]
+> If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
+
 ## Next steps
 
 - [Create a free Azure account](https://azure.microsoft.com/free/cognitive-services/)
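
The relocated Audio outputs section says that prebuilt neural voices are built from 24-kHz samples and are upsampled or downsampled to match the requested format. The following Python sketch illustrates that rule under one assumption: that every format name embeds its sample rate as `<n>khz`, as `ogg-24khz-16bit-mono-opus` does. The helper names `sample_rate_hz` and `resampling` are hypothetical and not part of the service.

```python
import re

# From the added text: prebuilt neural voices use a 24-kHz sample rate.
NATIVE_RATE_HZ = 24_000


def sample_rate_hz(output_format):
    """Extract the sample rate from a format name such as 'ogg-24khz-16bit-mono-opus'."""
    match = re.search(r"(\d+)khz", output_format.lower())
    if match is None:
        raise ValueError(f"no sample rate found in {output_format!r}")
    return int(match.group(1)) * 1000


def resampling(output_format, native_rate_hz=NATIVE_RATE_HZ):
    """Return 'upsample', 'downsample', or 'none' for a voice with the given native rate."""
    requested = sample_rate_hz(output_format)
    if requested > native_rate_hz:
        return "upsample"
    if requested < native_rate_hz:
        return "downsample"
    return "none"
```

For example, `resampling("ogg-24khz-16bit-mono-opus")` returns `"none"` for a 24-kHz voice. For the voices listed as available in 48-kHz preview, passing `native_rate_hz=48_000` models a voice that needs no upsampling for a 48-kHz format.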