Commit 6dde2c4

Merge pull request #200494 from jboback/TTS-48k
48Khz public preview + output format discovery
2 parents 71e3631 + ff05793 commit 6dde2c4

2 files changed: 39 additions, 24 deletions

articles/cognitive-services/Speech-Service/rest-text-to-speech.md

Lines changed: 38 additions & 23 deletions
````diff
@@ -230,29 +230,6 @@ This table lists required and optional headers for text-to-speech requests:
 | `X-Microsoft-OutputFormat` | Specifies the audio output format. For a complete list of accepted values, see [Audio outputs](#audio-outputs). | Required |
 | `User-Agent` | The application name. The provided value must be fewer than 255 characters. | Required |
 
-### Audio outputs
-
-This is a list of supported audio formats that are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 24-kHz, 16-kHz, and 8-kHz audio outputs.
-
-```output
-raw-16khz-16bit-mono-pcm riff-16khz-16bit-mono-pcm
-raw-24khz-16bit-mono-pcm riff-24khz-16bit-mono-pcm
-raw-48khz-16bit-mono-pcm riff-48khz-16bit-mono-pcm
-raw-8khz-8bit-mono-mulaw riff-8khz-8bit-mono-mulaw
-raw-8khz-8bit-mono-alaw riff-8khz-8bit-mono-alaw
-audio-16khz-32kbitrate-mono-mp3 audio-16khz-64kbitrate-mono-mp3
-audio-16khz-128kbitrate-mono-mp3 audio-24khz-48kbitrate-mono-mp3
-audio-24khz-96kbitrate-mono-mp3 audio-24khz-160kbitrate-mono-mp3
-audio-48khz-96kbitrate-mono-mp3 audio-48khz-192kbitrate-mono-mp3
-raw-16khz-16bit-mono-truesilk raw-24khz-16bit-mono-truesilk
-webm-16khz-16bit-mono-opus webm-24khz-16bit-mono-opus
-ogg-16khz-16bit-mono-opus ogg-24khz-16bit-mono-opus
-ogg-48khz-16bit-mono-opus
-```
-
-> [!NOTE]
-> If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
-
 ### Request body
 
 If you're using a custom neural voice, the body of a request can be sent as plain text (ASCII or UTF-8). Otherwise, the body of each `POST` request is sent as [SSML](speech-synthesis-markup.md). SSML allows you to choose the voice and language of the synthesized speech that the text-to-speech feature returns. For a complete list of supported voices, see [Language and voice support for the Speech service](language-support.md#text-to-speech).
@@ -293,6 +270,44 @@ The HTTP status code for each response indicates success or common errors:
 
 If the HTTP status is `200 OK`, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
 
+## Audio outputs
+
+This is a list of supported audio formats that are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Prebuilt neural voices are created from samples that use a 24-khz sample rate. All voices can upsample or downsample to other sample rates when synthesizing.
+
+|Streaming |Non-Streaming |
+|----------------------------------|-------------------------|
+|audio-16khz-16bit-32kbps-mono-opus|riff-8khz-8bit-mono-alaw |
+|audio-16khz-32kbitrate-mono-mp3 |riff-8khz-8bit-mono-mulaw|
+|audio-16khz-64kbitrate-mono-mp3 |riff-8khz-16bit-mono-pcm |
+|audio-16khz-128kbitrate-mono-mp3 |riff-24khz-16bit-mono-pcm|
+|audio-24khz-16bit-24kbps-mono-opus|riff-48khz-16bit-mono-pcm|
+|audio-24khz-16bit-48kbps-mono-opus| |
+|audio-24khz-48kbitrate-mono-mp3 | |
+|audio-24khz-96kbitrate-mono-mp3 | |
+|audio-24khz-160kbitrate-mono-mp3 | |
+|audio-48khz-96kbitrate-mono-mp3 | |
+|audio-48khz-192kbitrate-mono-mp3 | |
+|ogg-16khz-16bit-mono-opus | |
+|ogg-24khz-16bit-mono-opus | |
+|ogg-48khz-16bit-mono-opus | |
+|raw-8khz-8bit-mono-alaw | |
+|raw-8khz-8bit-mono-mulaw | |
+|raw-8khz-16bit-mono-pcm | |
+|raw-16khz-16bit-mono-pcm | |
+|raw-16khz-16bit-mono-truesilk | |
+|raw-24khz-16bit-mono-pcm | |
+|raw-24khz-16bit-mono-truesilk | |
+|raw-48khz-16bit-mono-pcm | |
+|webm-16khz-16bit-mono-opus | |
+|webm-24khz-16bit-24kbps-mono-opus | |
+|webm-24khz-16bit-mono-opus | |
+
+> [!NOTE]
+> en-US-AriaNeural, en-US-JennyNeural and zh-CN-XiaoxiaoNeural are available in public preview in 48Khz output. Other voices support 24khz upsampled to 48khz output.
+
+> [!NOTE]
+> If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
+
 ## Next steps
 
 - [Create a free Azure account](https://azure.microsoft.com/free/cognitive-services/)
````
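
The relocated "Audio outputs" section splits the accepted `X-Microsoft-OutputFormat` values into streaming and non-streaming formats. As a rough illustration of how a client might use that table, the sketch below validates a format name against it and builds the two required headers quoted in this diff. The function names (`sample_rate_khz`, `build_output_headers`) are hypothetical and not part of any SDK. The format sets are copied from the table as it stands in this commit, so the live service may accept values that are not listed here. Authentication and the request itself are left out.

```python
import re

# Format names copied from the Streaming / Non-Streaming table added in this
# commit. This is a snapshot; the service itself is the source of truth.
STREAMING_FORMATS = frozenset({
    "audio-16khz-16bit-32kbps-mono-opus", "audio-16khz-32kbitrate-mono-mp3",
    "audio-16khz-64kbitrate-mono-mp3", "audio-16khz-128kbitrate-mono-mp3",
    "audio-24khz-16bit-24kbps-mono-opus", "audio-24khz-16bit-48kbps-mono-opus",
    "audio-24khz-48kbitrate-mono-mp3", "audio-24khz-96kbitrate-mono-mp3",
    "audio-24khz-160kbitrate-mono-mp3", "audio-48khz-96kbitrate-mono-mp3",
    "audio-48khz-192kbitrate-mono-mp3", "ogg-16khz-16bit-mono-opus",
    "ogg-24khz-16bit-mono-opus", "ogg-48khz-16bit-mono-opus",
    "raw-8khz-8bit-mono-alaw", "raw-8khz-8bit-mono-mulaw",
    "raw-8khz-16bit-mono-pcm", "raw-16khz-16bit-mono-pcm",
    "raw-16khz-16bit-mono-truesilk", "raw-24khz-16bit-mono-pcm",
    "raw-24khz-16bit-mono-truesilk", "raw-48khz-16bit-mono-pcm",
    "webm-16khz-16bit-mono-opus", "webm-24khz-16bit-24kbps-mono-opus",
    "webm-24khz-16bit-mono-opus",
})
NON_STREAMING_FORMATS = frozenset({
    "riff-8khz-8bit-mono-alaw", "riff-8khz-8bit-mono-mulaw",
    "riff-8khz-16bit-mono-pcm", "riff-24khz-16bit-mono-pcm",
    "riff-48khz-16bit-mono-pcm",
})


def sample_rate_khz(output_format: str) -> int:
    """Read the sample rate (in kHz) out of a format name."""
    match = re.search(r"-(\d+)khz-", output_format)
    if match is None:
        raise ValueError(f"no sample rate found in {output_format!r}")
    return int(match.group(1))


def build_output_headers(output_format: str, user_agent: str) -> dict:
    """Build the two required headers quoted in the diff (hypothetical helper)."""
    if output_format not in STREAMING_FORMATS | NON_STREAMING_FORMATS:
        raise ValueError(f"format not in this commit's table: {output_format!r}")
    # The header table requires the User-Agent to be fewer than 255 characters.
    if len(user_agent) >= 255:
        raise ValueError("User-Agent must be fewer than 255 characters")
    return {
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": user_agent,
    }


if __name__ == "__main__":
    headers = build_output_headers("riff-48khz-16bit-mono-pcm", "my-tts-client")
    print(headers)
    print(sample_rate_khz(headers["X-Microsoft-OutputFormat"]), "kHz")
```

Keeping the two sets separate lets a caller check `output_format in STREAMING_FORMATS` before deciding whether to play audio as it arrives or wait for the complete file.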

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -429,7 +429,7 @@ items:
 displayName: reference
 - name: Text-to-speech REST API
 href: rest-text-to-speech.md
-displayName: reference
+displayName: reference, tts output, output format
 - name: Speaker Recognition REST API
 href: /rest/api/speakerrecognition/
 displayName: reference
```
