Merge pull request #200494 from jboback/TTS-48k

v-ccolin · web-flow · commit 6dde2c47d0f0 · 2022-06-26T09:35:10.000+01:00
48-kHz public preview + output format discovery
diff --git a/articles/cognitive-services/Speech-Service/rest-text-to-speech.md b/articles/cognitive-services/Speech-Service/rest-text-to-speech.md
@@ -230,29 +230,6 @@ This table lists required and optional headers for text-to-speech requests:
 | `X-Microsoft-OutputFormat` | Specifies the audio output format. For a complete list of accepted values, see [Audio outputs](#audio-outputs). | Required |
 | `User-Agent` | The application name. The provided value must be fewer than 255 characters. | Required |
 
-### Audio outputs
-
-This is a list of supported audio formats that are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 24-kHz, 16-kHz, and 8-kHz audio outputs.
-
-```output
-raw-16khz-16bit-mono-pcm            riff-16khz-16bit-mono-pcm
-raw-24khz-16bit-mono-pcm            riff-24khz-16bit-mono-pcm
-raw-48khz-16bit-mono-pcm            riff-48khz-16bit-mono-pcm
-raw-8khz-8bit-mono-mulaw            riff-8khz-8bit-mono-mulaw
-raw-8khz-8bit-mono-alaw             riff-8khz-8bit-mono-alaw
-audio-16khz-32kbitrate-mono-mp3     audio-16khz-64kbitrate-mono-mp3
-audio-16khz-128kbitrate-mono-mp3    audio-24khz-48kbitrate-mono-mp3
-audio-24khz-96kbitrate-mono-mp3     audio-24khz-160kbitrate-mono-mp3
-audio-48khz-96kbitrate-mono-mp3     audio-48khz-192kbitrate-mono-mp3
-raw-16khz-16bit-mono-truesilk       raw-24khz-16bit-mono-truesilk
-webm-16khz-16bit-mono-opus          webm-24khz-16bit-mono-opus
-ogg-16khz-16bit-mono-opus           ogg-24khz-16bit-mono-opus
-ogg-48khz-16bit-mono-opus
-```
-
-> [!NOTE]
-> If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
-
 ### Request body
 
 If you're using a custom neural voice, the body of a request can be sent as plain text (ASCII or UTF-8). Otherwise, the body of each `POST` request is sent as [SSML](speech-synthesis-markup.md). SSML allows you to choose the voice and language of the synthesized speech that the text-to-speech feature returns. For a complete list of supported voices, see [Language and voice support for the Speech service](language-support.md#text-to-speech).
@@ -293,6 +270,44 @@ The HTTP status code for each response indicates success or common errors:
 
 If the HTTP status is `200 OK`, the body of the response contains an audio file in the requested format. This file can be played as it's transferred, saved to a buffer, or saved to a file.
 
+## Audio outputs
+
+The following table lists the supported audio formats. Send one of these values in the `X-Microsoft-OutputFormat` header of each request. Each format incorporates a bit rate and an encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Prebuilt neural voices are created from samples that use a 24-kHz sample rate. All voices can upsample or downsample to other sample rates when synthesizing.
+
+|Streaming                         |Non-Streaming            |
+|----------------------------------|-------------------------|
+|audio-16khz-16bit-32kbps-mono-opus|riff-8khz-8bit-mono-alaw |
+|audio-16khz-32kbitrate-mono-mp3   |riff-8khz-8bit-mono-mulaw|
+|audio-16khz-64kbitrate-mono-mp3   |riff-8khz-16bit-mono-pcm |
+|audio-16khz-128kbitrate-mono-mp3  |riff-24khz-16bit-mono-pcm|
+|audio-24khz-16bit-24kbps-mono-opus|riff-48khz-16bit-mono-pcm|
+|audio-24khz-16bit-48kbps-mono-opus|                         |
+|audio-24khz-48kbitrate-mono-mp3   |                         |
+|audio-24khz-96kbitrate-mono-mp3   |                         |
+|audio-24khz-160kbitrate-mono-mp3  |                         |
+|audio-48khz-96kbitrate-mono-mp3   |                         |
+|audio-48khz-192kbitrate-mono-mp3  |                         |
+|ogg-16khz-16bit-mono-opus         |                         |
+|ogg-24khz-16bit-mono-opus         |                         |
+|ogg-48khz-16bit-mono-opus         |                         |
+|raw-8khz-8bit-mono-alaw           |                         |
+|raw-8khz-8bit-mono-mulaw          |                         |
+|raw-8khz-16bit-mono-pcm           |                         |
+|raw-16khz-16bit-mono-pcm          |                         |
+|raw-16khz-16bit-mono-truesilk     |                         |
+|raw-24khz-16bit-mono-pcm          |                         |
+|raw-24khz-16bit-mono-truesilk     |                         |
+|raw-48khz-16bit-mono-pcm          |                         |
+|webm-16khz-16bit-mono-opus        |                         |
+|webm-24khz-16bit-24kbps-mono-opus |                         |
+|webm-24khz-16bit-mono-opus        |                         |
+
+> [!NOTE]
+> The en-US-AriaNeural, en-US-JennyNeural, and zh-CN-XiaoxiaoNeural voices are available in public preview with 48-kHz output. Other voices support 24-kHz audio upsampled to 48-kHz output.
+
+> [!NOTE]
+> If your selected voice and output format have different sample rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).
+
 ## Next steps
 
 - [Create a free Azure account](https://azure.microsoft.com/free/cognitive-services/)
diff --git a/articles/cognitive-services/Speech-Service/toc.yml b/articles/cognitive-services/Speech-Service/toc.yml
@@ -429,7 +429,7 @@ items:
           displayName: reference       
         - name: Text-to-speech REST API
           href: rest-text-to-speech.md
-          displayName: reference 
+          displayName: reference, tts output, output format
         - name: Speaker Recognition REST API
           href: /rest/api/speakerrecognition/
           displayName: reference
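
Reviewer sketch (not part of the patch): the relocated "Audio outputs" table could be exercised on the client side roughly as follows. The format names and the two header names are copied from the diff above, and the `User-Agent` length check follows the "fewer than 255 characters" rule in the headers table. The helper names `sample_rate_hz` and `build_headers` are hypothetical, and the assumption that every format name embeds its sample rate as `-<N>khz-` is inferred from the table rather than stated by the service.

```python
import re

# Format names copied from the "Audio outputs" table added in this diff.
STREAMING_FORMATS = frozenset({
    "audio-16khz-16bit-32kbps-mono-opus", "audio-16khz-32kbitrate-mono-mp3",
    "audio-16khz-64kbitrate-mono-mp3", "audio-16khz-128kbitrate-mono-mp3",
    "audio-24khz-16bit-24kbps-mono-opus", "audio-24khz-16bit-48kbps-mono-opus",
    "audio-24khz-48kbitrate-mono-mp3", "audio-24khz-96kbitrate-mono-mp3",
    "audio-24khz-160kbitrate-mono-mp3", "audio-48khz-96kbitrate-mono-mp3",
    "audio-48khz-192kbitrate-mono-mp3", "ogg-16khz-16bit-mono-opus",
    "ogg-24khz-16bit-mono-opus", "ogg-48khz-16bit-mono-opus",
    "raw-8khz-8bit-mono-alaw", "raw-8khz-8bit-mono-mulaw",
    "raw-8khz-16bit-mono-pcm", "raw-16khz-16bit-mono-pcm",
    "raw-16khz-16bit-mono-truesilk", "raw-24khz-16bit-mono-pcm",
    "raw-24khz-16bit-mono-truesilk", "raw-48khz-16bit-mono-pcm",
    "webm-16khz-16bit-mono-opus", "webm-24khz-16bit-24kbps-mono-opus",
    "webm-24khz-16bit-mono-opus",
})
NON_STREAMING_FORMATS = frozenset({
    "riff-8khz-8bit-mono-alaw", "riff-8khz-8bit-mono-mulaw",
    "riff-8khz-16bit-mono-pcm", "riff-24khz-16bit-mono-pcm",
    "riff-48khz-16bit-mono-pcm",
})


def sample_rate_hz(output_format: str) -> int:
    """Return the sample rate embedded in a format name, in hertz.

    Assumes the naming pattern '-<N>khz-' seen in the table (hypothetical helper).
    """
    match = re.search(r"-(\d+)khz-", output_format)
    if match is None:
        raise ValueError(f"no sample rate found in {output_format!r}")
    return int(match.group(1)) * 1000


def build_headers(output_format: str, user_agent: str) -> dict:
    """Build the two request headers that this diff documents (hypothetical helper).

    Authentication and content-type headers are deliberately left out because
    they are not shown in this diff.
    """
    if output_format not in STREAMING_FORMATS | NON_STREAMING_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    if len(user_agent) >= 255:
        raise ValueError("User-Agent must be fewer than 255 characters")
    return {
        "X-Microsoft-OutputFormat": output_format,
        "User-Agent": user_agent,
    }


if __name__ == "__main__":
    headers = build_headers("riff-48khz-16bit-mono-pcm", "my-tts-client")
    print(headers)
    print(sample_rate_hz(headers["X-Microsoft-OutputFormat"]))
```

Validating the format name locally is a design convenience only; the service remains the authority on which values it accepts, so the sets above would need updating whenever the table changes.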