Commit d354631

hd voices
1 parent 0f34b67 commit d354631

3 files changed: +52 -4 lines changed

articles/ai-services/speech-service/high-definition-voices.md

Lines changed: 50 additions & 2 deletions
@@ -26,8 +26,10 @@ The following are the key features of Azure AI Speech HD voices:
 | Key features | Description |
 |--------------|-------------|
 | **Human-like speech generation** | Neural text to speech HD voices can generate highly natural and human-like speech. The model is trained on millions of hours of multilingual data, enabling it to accurately interpret input text and generate speech with the appropriate emotion, pace, and rhythm without manual adjustments. |
-| **Version control** | With neural text to speech HD voices, we release different versions of the same voice, each with a unique base model size and recipe. This offers you the opportunity to experience new voice variations or continue using a specific version of a voice. |
+| **Conversational** | Neural text to speech HD voices can replicate natural speech patterns, including spontaneous pauses and emphasis. When given conversational text, the model can reproduce common phonemes like pauses and filler words. The generated voice sounds as if someone is conversing directly with you. |
+| **Prosody variations** | Neural text to speech HD voices introduce slight variations in each output to enhance realism. These variations make the speech sound more natural, as human voices naturally exhibit variation. |
 | **High fidelity** | The primary objective of neural text to speech HD voices is to generate high-fidelity audio. The synthetic speech produced by our system can closely mimic human speech in both quality and naturalness. |
+| **Version control** | With neural text to speech HD voices, we release different versions of the same voice, each with a unique base model size and recipe. This offers you the opportunity to experience new voice variations or continue using a specific version of a voice. |
 
 ## Comparison of Azure AI Speech HD voices to other Azure text to speech voices

@@ -40,14 +42,60 @@ Here's a comparison of features between Azure AI Speech HD voices, Azure OpenAI
 | **Region** | North Central US, Sweden Central | North Central US, Sweden Central | Available in dozens of regions. See the [region list](regions.md#speech-service).|
 | **Number of voices** | 12 | 6 | More than 500 |
 | **Multilingual** | No (perform on primary language only) | Yes | Yes (applicable only to multilingual voices) |
-| **SSML support** | Support for [a subset of SSML elements](#supported-and-unsupported-ssml-elements-for-azure-neural-text-to-speech-hd-voices).| Support for [a subset of SSML elements](openai-voices.md#ssml-elements-supported-by-openai-text-to-speech-voices-in-azure-ai-speech). | Support for the [full set of SSML](speech-synthesis-markup-structure.md) in Azure AI Speech. |
+| **SSML support** | Support for [a subset of SSML elements](#supported-and-unsupported-ssml-elements-for-azure-ai-speech-hd-voices).| Support for [a subset of SSML elements](openai-voices.md#ssml-elements-supported-by-openai-text-to-speech-voices-in-azure-ai-speech). | Support for the [full set of SSML](speech-synthesis-markup-structure.md) in Azure AI Speech. |
 | **Development options** | Speech SDK, Speech CLI, REST API | Speech SDK, Speech CLI, REST API | Speech SDK, Speech CLI, REST API |
 | **Deployment options** | Cloud only | Cloud only | Cloud, embedded, hybrid, and containers. |
 | **Real-time or batch synthesis** | Real-time only | Real-time and batch synthesis | Real-time and batch synthesis |
 | **Latency** | Less than 300 ms | Greater than 500 ms | Less than 300 ms |
 | **Sample rate of synthesized audio** | 8, 16, 22.05, 24, 44.1, and 48 kHz | 8, 16, 24, and 48 kHz | 8, 16, 22.05, 24, 44.1, and 48 kHz |
 | **Speech output audio format** | opus, mp3, pcm, truesilk | opus, mp3, pcm, truesilk | opus, mp3, pcm, truesilk |

+## Supported Azure AI Speech HD voices
+
+The following table lists the Azure AI Speech HD voices that are currently available.
+
+| HD voice name | Neural voice persona | Locale |
+|---------------|----------------------|--------|
+| de-DE-Seraphina:DragonHDLatestNeural | de-DE-Seraphina | de-DE |
+| en-US-Andrew:DragonHDLatestNeural | en-US-Andrew | en-US |
+| en-US-Andrew2:DragonHDLatestNeural | en-US-Andrew2 | en-US |
+| en-US-Aria:DragonHDLatestNeural | en-US-Aria | en-US |
+| en-US-Ava:DragonHDLatestNeural | en-US-Ava | en-US |
+| en-US-Davis:DragonHDLatestNeural | en-US-Davis | en-US |
+| en-US-Emma:DragonHDLatestNeural | en-US-Emma | en-US |
+| en-US-Emma2:DragonHDLatestNeural | en-US-Emma2 | en-US |
+| en-US-Jenny:DragonHDLatestNeural | en-US-Jenny | en-US |
+| en-US-Steffan:DragonHDLatestNeural | en-US-Steffan | en-US |
+| ja-JP-Masaru:DragonHDLatestNeural | ja-JP-Masaru | ja-JP |
+| zh-CN-Xiaochen:DragonHDLatestNeural | zh-CN-Xiaochen | zh-CN |
+
+
+## How to use Azure AI Speech HD voices
+
+You can use HD voices with the same Speech SDK and REST APIs as the non HD voices.
+
+Here are some key points to consider when using Azure AI Speech HD voices:
+
+- **Voice locale**: The locale in the voice name indicates its original language and region.
+- **Base models**:
+    - HD voices come with a base model that understands the input text and predicts the speaking pattern accordingly. You can specify the desired model (such as DragonHDLatestNeural) according to the availability of each voice.
+- **SSML usage**: To reference a voice in SSML, use the format `voicename:basemodel`. The name before the colon, such as `en-US-Andrew`, is the voice persona name and its original locale. The base model is tracked by versions in subsequent updates.
+- **Temperature parameter**:
+    - The temperature value is a float ranging from 0 to 1, influencing the randomness of the output.
+    - You can also adjust the temperature parameter to control the variation of outputs.
+    - **Lower temperature**: Results in less randomness, leading to more predictable outputs.
+    - **Higher temperature**: Increases randomness, allowing for more diverse outputs.
+    - The default temperature is set at 1.0.
+    - Less randomness yields more stable results, while more randomness offers variety but less consistency.
+
+Here's an example of how to use Azure AI Speech HD voices in SSML:
+
+```ssml
+<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
+<voice name='en-US-Ava:DragonHDLatestNeural' parameters='temperature=0.8'>Here is a test</voice>
+</speak>
+```
+
 ## Supported and unsupported SSML elements for Azure AI Speech HD voices
 
 The Speech Synthesis Markup Language (SSML) with input text determines the structure, content, and other characteristics of the text to speech output. For example, you can use SSML to define a paragraph, a sentence, a break or a pause, or silence. You can wrap text with event tags such as bookmark or viseme that your application processes later.
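
Note on the added "How to use" section: the `voicename:basemodel` format and the `temperature` parameter described in this diff can be exercised with a small helper. The following is a minimal Python sketch, not part of the commit. The function names `split_hd_voice_name` and `build_hd_ssml` are hypothetical, the locale is assumed to be the first two hyphen-separated parts of the persona name (consistent with the voices table in the diff), and the 0 to 1 range check mirrors the range stated in the added text.

```python
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"  # namespace URI as written in the diff


def split_hd_voice_name(voice: str) -> tuple[str, str, str]:
    """Split 'voicename:basemodel' into (persona, base_model, locale).

    Hypothetical helper; assumes the locale is the first two
    hyphen-separated parts of the persona name (e.g. 'en-US').
    """
    persona, sep, base_model = voice.partition(":")
    if not sep or not persona or not base_model:
        raise ValueError(f"expected 'voicename:basemodel', got {voice!r}")
    locale = "-".join(persona.split("-")[:2])
    return persona, base_model, locale


def _attr(value: str) -> str:
    # Escape &, <, > and single quotes for a single-quoted XML attribute.
    return escape(value, {"'": "&apos;"})


def build_hd_ssml(voice: str, text: str, temperature: float = 1.0) -> str:
    """Build an SSML document for an HD voice (hypothetical helper)."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    _, _, locale = split_hd_voice_name(voice)
    params = f"temperature={temperature}"
    return (
        f"<speak version='1.0' xmlns='{SSML_NS}' "
        f"xmlns:mstts='{MSTTS_NS}' xml:lang='{_attr(locale)}'>"
        f"<voice name='{_attr(voice)}' parameters='{_attr(params)}'>"
        f"{escape(text)}</voice></speak>"
    )


if __name__ == "__main__":
    print(build_hd_ssml("en-US-Ava:DragonHDLatestNeural", "Here is a test", 0.8))
```

The resulting string is plain SSML, so it should be usable with whichever SSML entry point your client offers (for example, an SSML synthesis call in the Speech SDK); that wiring is left out here because it needs a subscription key and region.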

articles/ai-services/speech-service/speech-synthesis-markup-structure.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Refer to the sections below for details about how to structure elements in the S
 > In addition to Azure AI Speech neural (non HD) voices, you can also use [Azure AI Speech high definition (HD) voices](high-definition-voices.md) and [Azure OpenAI neural (HD and non HD) voices](openai-voices.md). The HD voices provide a higher quality for more versatile scenarios.
 >
 > Some voices don't support all [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup-structure.md) tags. This includes neural text to speech HD voices, personal voices, and embedded voices.
-- For Azure AI Speech high definition (HD) voices, check the SSML support [here](high-definition-voices.md#supported-and-unsupported-ssml-elements-for-azure-neural-text-to-speech-hd-voices).
+- For Azure AI Speech high definition (HD) voices, check the SSML support [here](high-definition-voices.md#supported-and-unsupported-ssml-elements-for-azure-ai-speech-hd-voices).
 - For personal voice, you can find the SSML support [here](personal-voice-how-to-use.md#supported-and-unsupported-ssml-elements-for-personal-voice).
 - For embedded voices, check the SSML support [here](embedded-speech.md#embedded-voices-capabilities).

articles/ai-services/speech-service/text-to-speech.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Here's more information about neural text to speech features in the Speech servi
 > In addition to Azure AI Speech neural (non HD) voices, you can also use [Azure AI Speech high definition (HD) voices](high-definition-voices.md) and [Azure OpenAI neural (HD and non HD) voices](openai-voices.md). The HD voices provide a higher quality for more versatile scenarios.
 >
 > Some voices don't support all [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup-structure.md) tags. This includes neural text to speech HD voices, personal voices, and embedded voices.
-- For Azure AI Speech high definition (HD) voices, check the SSML support [here](high-definition-voices.md#supported-and-unsupported-ssml-elements-for-azure-neural-text-to-speech-hd-voices).
+- For Azure AI Speech high definition (HD) voices, check the SSML support [here](high-definition-voices.md#supported-and-unsupported-ssml-elements-for-azure-ai-speech-hd-voices).
 - For personal voice, you can find the SSML support [here](personal-voice-how-to-use.md#supported-and-unsupported-ssml-elements-for-personal-voice).
 - For embedded voices, check the SSML support [here](embedded-speech.md#embedded-voices-capabilities).
