
Commit 19856a4

Merge pull request #212772 from sally-baolian/patch-64
Update language-support.md
2 parents 3386401 + 10ff997 commit 19856a4

4 files changed: +7 −6 lines changed

articles/cognitive-services/Speech-Service/language-support.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ Each prebuilt neural voice supports a specific language and dialect, identified
 > [!IMPORTANT]
 > Pricing varies for Prebuilt Neural Voice (see *Neural* on the pricing page) and Custom Neural Voice (see *Custom Neural* on the pricing page). For more information, see the [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) page.

-Prebuilt neural voices are created from samples that use a 24-khz sample rate. All voices can upsample or downsample to other sample rates when synthesizing.
+Each prebuilt neural voice model is available at 24kHz and high-fidelity 48kHz. Other sample rates can be obtained through upsampling or downsampling when synthesizing.

 Please note that the following neural voices are retired.
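For context on the change above, here is a minimal sketch of requesting the 48kHz high-fidelity output through the Speech SDK for Python. The key, region, voice name, and the `Riff48Khz16BitMonoPcm` enum value are illustrative assumptions, not part of this diff:

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: supply your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
speech_config.speech_synthesis_voice_name = "en-US-JennyNeural"

# Asking for a 48kHz RIFF format selects the high-fidelity model; other
# sample rates are produced by upsampling or downsampling at synthesis time.
speech_config.set_speech_synthesis_output_format(
    speechsdk.SpeechSynthesisOutputFormat.Riff48Khz16BitMonoPcm
)

audio_config = speechsdk.audio.AudioOutputConfig(filename="greeting-48khz.wav")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

result = synthesizer.speak_text_async("The high-fidelity voice model is speaking.").get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Saved 48kHz audio to greeting-48khz.wav")
```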

articles/cognitive-services/Speech-Service/long-audio-api.md

Lines changed: 2 additions & 0 deletions
@@ -455,6 +455,8 @@ We support flexible audio output formats. You can generate audio outputs per par
 > [!NOTE]
 > The default audio format is riff-16khz-16bit-mono-pcm.
+>
+> The sample rate for long audio voices is 24kHz, not 48kHz. Other sample rates can be obtained through upsampling or downsampling when synthesizing.

 * riff-8khz-16bit-mono-pcm
 * riff-16khz-16bit-mono-pcm

articles/cognitive-services/Speech-Service/rest-text-to-speech.md

Lines changed: 3 additions & 4 deletions
@@ -272,7 +272,7 @@ If the HTTP status is `200 OK`, the body of the response contains an audio file
 ## Audio outputs

-The supported streaming and non-streaming audio formats are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Prebuilt neural voices are created from samples that use a 24-khz sample rate. All voices can upsample or downsample to other sample rates when synthesizing.
+The supported streaming and non-streaming audio formats are sent in each request as the `X-Microsoft-OutputFormat` header. Each format incorporates a bit rate and encoding type. The Speech service supports 48-kHz, 24-kHz, 16-kHz, and 8-kHz audio outputs. Each prebuilt neural voice model is available at 24kHz and high-fidelity 48kHz.

 #### [Streaming](#tab/streaming)
@@ -322,9 +322,8 @@ riff-48khz-16bit-mono-pcm
 ***

 > [!NOTE]
-> en-US-AriaNeural, en-US-JennyNeural and zh-CN-XiaoxiaoNeural are available in public preview in 48Khz output. Other voices support 24khz upsampled to 48khz output.
-
-> [!NOTE]
+> If you select a 48kHz output format, the high-fidelity 48kHz voice model is invoked. Sample rates other than 24kHz and 48kHz can be obtained through upsampling or downsampling when synthesizing; for example, 44.1kHz is downsampled from 48kHz.
+>
 > If your selected voice and output format have different bit rates, the audio is resampled as necessary. You can decode the `ogg-24khz-16bit-mono-opus` format by using the [Opus codec](https://opus-codec.org/downloads/).

 ## Next steps
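As context for the `X-Microsoft-OutputFormat` behavior this file documents, here is a minimal sketch of a REST synthesis request that asks for the 48kHz format. The region, key, voice, and output filename are placeholders; the endpoint shape follows the single-voice endpoint described in rest-text-to-speech.md:

```python
import requests

# Placeholders: substitute your own Speech resource region and key.
region = "eastus"
subscription_key = "YOUR_SPEECH_KEY"
url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"

headers = {
    "Ocp-Apim-Subscription-Key": subscription_key,
    "Content-Type": "application/ssml+xml",
    # Requesting a 48kHz format; per the note above, this invokes the
    # high-fidelity voice model rather than upsampled 24kHz audio.
    "X-Microsoft-OutputFormat": "riff-48khz-16bit-mono-pcm",
    "User-Agent": "tts-format-sample",
}

ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    "<voice name='en-US-JennyNeural'>Hello from the high-fidelity voice model.</voice>"
    "</speak>"
)

response = requests.post(url, headers=headers, data=ssml.encode("utf-8"))
response.raise_for_status()

# The response body is the RIFF/WAV audio at 48kHz.
with open("hello-48khz.wav", "wb") as f:
    f.write(response.content)
```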

articles/cognitive-services/Speech-Service/text-to-speech.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Here's more information about neural text-to-speech features in the Speech servi
 * **Asynchronous synthesis of long audio**: Use the [Long Audio API](long-audio-api.md) to asynchronously synthesize text-to-speech files longer than 10 minutes (for example, audio books or lectures). Unlike synthesis performed via the Speech SDK or speech-to-text REST API, responses aren't returned in real time. The expectation is that requests are sent asynchronously, responses are polled for, and synthesized audio is downloaded when the service makes it available.

-* **Prebuilt neural voices**: Microsoft neural text-to-speech capability uses deep neural networks to overcome the limits of traditional speech synthesis with regard to stress and intonation in spoken language. Prosody prediction and voice synthesis happen simultaneously, which results in more fluid and natural-sounding outputs. You can use neural voices to:
+* **Prebuilt neural voices**: Microsoft neural text-to-speech capability uses deep neural networks to overcome the limits of traditional speech synthesis with regard to stress and intonation in spoken language. Prosody prediction and voice synthesis happen simultaneously, which results in more fluid and natural-sounding outputs. Each prebuilt neural voice model is available at 24kHz and high-fidelity 48kHz. You can use neural voices to:

   - Make interactions with chatbots and voice assistants more natural and engaging.
   - Convert digital texts such as e-books into audiobooks.
