articles/ai-services/speech-service/batch-synthesis-properties.md (+2 −2)

@@ -34,7 +34,7 @@ Batch synthesis properties are described in the following table.
 |`outputs.result`|The location of the batch synthesis result files with audio output and logs.<br/><br/>This property is read-only.|
 |`properties`|A defined set of optional batch synthesis configuration settings.|
 |`properties.sizeInBytes`|The audio output size in bytes.<br/><br/>This property is read-only.|
-|`properties.billingDetails`|The number of words that were processed and billed by `customNeuralCharacters` versus `neuralCharacters` (prebuilt) voices.<br/><br/>This property is read-only.|
+|`properties.billingDetails`|The number of words that were processed and billed by `customNeuralCharacters` (custom voice) versus `neuralCharacters` (standard voice).<br/><br/>This property is read-only.|
 |`properties.concatenateResult`|Determines whether to concatenate the result. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.decompressOutputFiles`|Determines whether to unzip the synthesis result files in the destination container. This property can only be set when the `destinationContainerUrl` property is set. This optional `bool` value ("true" or "false") is "false" by default.|
 |`properties.destinationContainerUrl`|The batch synthesis results can be stored in a writable Azure container. If you don't specify a container URI with a [shared access signatures (SAS)](/azure/storage/common/storage-sas-overview) token, the Speech service stores the results in a container managed by Microsoft. SAS with stored access policies isn't supported. When the synthesis job is deleted, the result data is also deleted.<br/><br/>This optional property isn't included in the response when you get the synthesis job.|
@@ -59,7 +59,7 @@ Batch synthesis properties are described in the following table.
 |`synthesisConfig.speakerProfileId`|The speaker profile ID of a personal voice.<br/><br/>For information about available personal voice base model names, see [integrate personal voice](personal-voice-how-to-use.md#integrate-personal-voice-in-your-application).<br/>For information about how to get the speaker profile ID, see [language and voice support](personal-voice-create-voice.md).<br/><br/>This property is required when `inputKind` is set to `"PlainText"`.|
 |`synthesisConfig.style`|For some voices, you can adjust the speaking style to express different emotions like cheerfulness, empathy, and calm. You can optimize the voice for different scenarios like customer service, newscast, and voice assistant.<br/><br/>For information about the available styles per voice, see [voice styles and roles](language-support.md?tabs=tts#voice-styles-and-roles).<br/><br/>This optional property is only applicable when `inputKind` is set to `"PlainText"`.|
 |`synthesisConfig.styleDegree`|The intensity of the speaking style. You can specify a stronger or softer style to make the speech more expressive or subdued. The range of accepted values is 0.01 to 2, inclusive. The default value is 1, which means the predefined style intensity. The minimum unit is 0.01, which results in a slight tendency toward the target style. A value of 2 doubles the default style intensity. If the style degree is missing or isn't supported for your voice, this attribute is ignored.<br/><br/>For information about the available styles per voice, see [voice styles and roles](language-support.md?tabs=tts#voice-styles-and-roles).<br/><br/>This optional property is only applicable when `inputKind` is set to `"PlainText"`.|
-|`synthesisConfig.voice`|The voice that speaks the audio output.<br/><br/>For information about the available prebuilt neural voices, see [language and voice support](language-support.md?tabs=tts). To use a custom voice, you must specify a valid custom voice and deployment ID mapping in the `customVoices` property. To use a personal voice, you need to specify the `synthesisConfig.speakerProfileId` property.<br/><br/>This property is required when `inputKind` is set to `"PlainText"`.|
+|`synthesisConfig.voice`|The voice that speaks the audio output.<br/><br/>For information about the available standard voices, see [language and voice support](language-support.md?tabs=tts). To use a custom voice, you must specify a valid custom voice and deployment ID mapping in the `customVoices` property. To use a personal voice, you need to specify the `synthesisConfig.speakerProfileId` property.<br/><br/>This property is required when `inputKind` is set to `"PlainText"`.|
 |`synthesisConfig.volume`|The volume of the audio output.<br/><br/>For information about the accepted values, see the [adjust prosody](speech-synthesis-markup-voice.md#adjust-prosody) table in the Speech Synthesis Markup Language (SSML) documentation. Invalid values are ignored.<br/><br/>This optional property is only applicable when `inputKind` is set to `"PlainText"`.|
 |`inputKind`|Indicates whether the `inputs` text property should be plain text or SSML. The possible case-insensitive values are "PlainText" and "SSML". When `inputKind` is set to `"PlainText"`, you must also set the `synthesisConfig` voice property.<br/><br/>This property is required.|
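The properties in these hunks come together in a job request body. As a rough sketch (the field names come from the table above, but the helper function, the example voice name, and the default values are illustrative assumptions, not part of this article), a plain-text job body might be assembled like this:

```python
def build_batch_synthesis_body(text: str, voice: str = "en-US-AvaNeural") -> dict:
    """Assemble a JSON-serializable body for a batch synthesis job (sketch)."""
    return {
        # "PlainText" means synthesisConfig.voice must also be set (see `inputKind`)
        "inputKind": "PlainText",
        "inputs": [{"content": text}],
        "synthesisConfig": {
            "voice": voice,  # the voice that speaks the audio output
        },
        "properties": {
            "concatenateResult": False,  # optional; "false" by default
            # decompressOutputFiles is omitted here: it's only valid when
            # destinationContainerUrl is also set
        },
    }

body = build_batch_synthesis_body("Hello, world.")
```

Serializing `body` with `json.dumps` gives the kind of payload the table describes; read-only properties such as `properties.sizeInBytes` and `properties.billingDetails` appear only in responses, never in the request.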
articles/ai-services/speech-service/call-center-overview.md (+2 −2)

@@ -42,7 +42,7 @@ The Speech service offers the following features that can be used for call cente
 - [Speaker identification](./speaker-recognition-overview.md): Helps you determine an unknown speaker's identity within a group of enrolled speakers. It's typically used for call center customer verification scenarios or fraud detection.
 - [Language identification](./language-identification.md): Identifies languages spoken in audio. It can be used in real-time and post-call analysis for insights, or to control the environment (such as the output language of a virtual agent).

-The Speech service works well with prebuilt models. However, you might want to further customize and tune the experience for your product or environment. Typical examples for Speech customization include:
+You might want to further customize and fine-tune the experience for your product or environment. Typical examples for Speech fine-tuning include:

 | Speech customization | Description |
 | -------------- | ----------- |
@@ -57,7 +57,7 @@ The Language service offers the following features that can be used for call cen
 - [Conversation summarization](../language-service/summarization/overview.md?tabs=conversation-summarization): Summarizes in abstract text what each conversation participant said about the issues and resolutions. For example, a call center can group product issues that have a high volume.
 - [Sentiment analysis and opinion mining](../language-service/sentiment-opinion-mining/overview.md): Analyzes transcriptions and associates positive, neutral, or negative sentiment at the utterance and conversation level.

-While the Language service works well with prebuilt models, you might want to further customize and tune models to extract more information from your data. Typical examples for Language customization include:
+You might want to further customize and fine-tune models to extract more information from your data. Typical examples for Language customization include:
articles/ai-services/speech-service/custom-neural-voice.md (+2 −2)

@@ -19,7 +19,7 @@ Custom neural voice (CNV) is a text to speech feature that lets you create a one
 >
 > Access to [Custom neural voice (CNV) Lite](custom-neural-voice-lite.md) is available for anyone to demo and evaluate CNV before investing in professional recordings to create a higher-quality voice.

-Out of the box, [text to speech](text-to-speech.md) can be used with prebuilt neural voices for each [supported language](language-support.md?tabs=tts). The prebuilt neural voices work well in most text to speech scenarios if a unique voice isn't required.
+Out of the box, [text to speech](text-to-speech.md) can be used with standard voices for each [supported language](language-support.md?tabs=tts). The standard voices work well in most text to speech scenarios if a unique voice isn't required.

 Custom neural voice is based on neural text to speech technology and the multilingual, multi-speaker, universal model. You can create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of custom neural voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md?tabs=tts) for custom neural voice.
@@ -46,7 +46,7 @@ Here's an overview of the steps to create a custom neural voice in Speech Studio
 1. [Test your voice](professional-voice-train-voice.md#test-your-voice-model). Prepare test scripts for your voice model that cover the different use cases for your apps. It's a good idea to use scripts within and outside the training dataset, so you can test the quality more broadly for different content.
 1. [Deploy and use your voice model](professional-voice-deploy-endpoint.md) in your apps.

-You can tune, adjust, and use your custom voice, similarly as you would use a prebuilt neural voice. Convert text into speech in real-time, or generate audio content offline with text input. You use the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).
+You can tune, adjust, and use your custom voice similarly to how you would use a standard voice. Convert text into speech in real time, or generate audio content offline with text input. You can use the [REST API](./rest-text-to-speech.md), the [Speech SDK](./get-started-text-to-speech.md), or the [Speech Studio](https://speech.microsoft.com/audiocontentcreation).

 > [!TIP]
 > Check out the code samples in the [Speech SDK repository on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/custom-voice/README.md) to see how to use custom neural voice in your application.
articles/ai-services/speech-service/faq-tts.yml (+3 −3)

@@ -36,7 +36,7 @@ sections:
   - question: |
       What output audio formats does text to speech support?
     answer: |
-      Azure AI text to speech supports various streaming and non-streaming audio formats, with the commonly used sampling rates. All TTS prebuilt neural voices are created to support high-fidelity audio outputs with 48 kHz and 24 kHz. The audio can be resampled to support other rates as needed. See [Audio outputs](rest-text-to-speech.md#audio-outputs).
+      Azure AI text to speech supports various streaming and non-streaming audio formats, with the commonly used sampling rates. All TTS standard voices are created to support high-fidelity audio outputs at 48 kHz and 24 kHz. The audio can be resampled to support other rates as needed. See [Audio outputs](rest-text-to-speech.md#audio-outputs).
   - question: |
       Can the voice be customized to stress specific words?
     answer: |
@@ -72,9 +72,9 @@ sections:
     answer: |
       We recommend that you keep the style consistent in one set of training data. If the styles are different, put them into different training sets. In this case, consider using the multi-style voice training feature of custom neural voice. For the script selection criteria, see [Record custom voice samples](record-custom-voice-samples.md).
   - question: |
-      Does switching styles via SSML work for custom neural voices?
+      Does switching styles via SSML work for custom voices?
     answer: |
-      Switching styles via SSML works for both prebuilt multi-style voices and CNV multi-style voices. With multi-style training, you can create a voice that speaks in different styles, and you can also adjust these styles via SSML.
+      Switching styles via SSML works for both multi-style standard voices and multi-style custom voices. With multi-style training, you can create a voice that speaks in different styles, and you can also adjust these styles via SSML.
   - question: |
       How does cross-lingual voice work with languages that have different pronunciation structure and assembly?
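To make the style-switching answer concrete, here's a minimal sketch that builds SSML with two `mstts:express-as` style sections inside one voice element. The voice name and style names are placeholders; check which styles your standard or custom voice actually supports.

```python
def express_as(text: str, style: str, degree: float = 1.0) -> str:
    """Wrap text in an mstts:express-as element with a style and style degree."""
    return (f'<mstts:express-as style="{style}" styledegree="{degree}">'
            f'{text}</mstts:express-as>')

# Two styles in one document; the synthesizer switches style between sections.
ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
    '<voice name="en-US-AvaNeural">'
    + express_as("Great news, the patch shipped!", "cheerful", 1.5)
    + express_as("Unfortunately, the servers are still down.", "sad")
    + '</voice></speak>'
)
```

The `styledegree` attribute corresponds to the 0.01–2 intensity range described for `synthesisConfig.styleDegree` earlier in this changeset.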
articles/ai-services/speech-service/gaming-concepts.md (+3 −3)

@@ -19,7 +19,7 @@ Here are a few Speech features to consider for flexible and interactive game exp
 - Bring everyone into the conversation by synthesizing audio from text, or by displaying text from audio.
 - Make the game more accessible for players who are unable to read text in a particular language, including young players who don't read or write. Players can listen to storylines and instructions in their preferred language.
 - Create game avatars and nonplayable characters (NPCs) that can initiate or participate in a conversation in-game.
-- Prebuilt neural voice can provide highly natural out-of-box voices with leading voice variety in terms of a large portfolio of languages and voices.
+- Standard voice can provide highly natural out-of-box voices with leading voice variety in terms of a large portfolio of languages and voices.
 - Custom neural voice for creating a voice that stays on-brand with consistent quality and speaking style. You can add emotions, accents, nuances, laughter, and other paralinguistic sounds and expressions.
 - Use game dialogue prototyping to shorten the amount of time and money spent in production to get the game to market sooner. You can rapidly swap lines of dialog and listen to variations in real time to iterate on the game content.
@@ -29,15 +29,15 @@ For information about locale and regional availability, see [Language and voice

 ## Text to speech

-Help bring everyone into the conversation by converting text messages to audio using [Text to speech](text-to-speech.md) for scenarios such as game dialogue prototyping, greater accessibility, or nonplayable character (NPC) voices. Text to speech includes [prebuilt neural voice](language-support.md?tabs=tts#prebuilt-neural-voices) and [custom neural voice](language-support.md?tabs=tts#custom-neural-voice) features. Prebuilt neural voice can provide highly natural out-of-box voices with leading voice variety in terms of a large portfolio of languages and voices. Custom neural voice is an easy-to-use self-service for creating a highly natural custom voice.
+Help bring everyone into the conversation by converting text messages to audio using [Text to speech](text-to-speech.md) for scenarios such as game dialogue prototyping, greater accessibility, or nonplayable character (NPC) voices. Text to speech includes [standard voice](language-support.md?tabs=tts#standard-voices) and [custom neural voice](language-support.md?tabs=tts#custom-neural-voice) features. Standard voice can provide highly natural out-of-box voices with leading voice variety in terms of a large portfolio of languages and voices. Custom neural voice is an easy-to-use self-service for creating a highly natural custom voice.

 When enabling this functionality in your game, keep in mind the following benefits:

 - Voices and languages supported - A large portfolio of [locales and voices](language-support.md?tabs=tts#supported-languages) are supported. You can also [specify multiple languages](speech-synthesis-markup-voice.md#adjust-speaking-languages) for Text to speech output. For [custom neural voice](custom-neural-voice.md), you can [choose to create](professional-voice-train-voice.md?tabs=neural#choose-a-training-method) different languages from single-language training data.
 - Emotional styles supported - [Emotional tones](language-support.md?tabs=tts#voice-styles-and-roles), such as cheerful, angry, sad, excited, hopeful, friendly, unfriendly, terrified, shouting, and whispering. You can [adjust the speaking style](speech-synthesis-markup-voice.md#use-speaking-styles-and-roles), style degree, and role at the sentence level.
 - Visemes supported - You can use visemes during real-time synthesis to control the movement of 2D and 3D avatar models, so that the mouth movements are perfectly matched to synthetic speech. For more information, see [Get facial position with viseme](how-to-speech-synthesis-viseme.md).
 - Fine-tuning Text to speech output with Speech Synthesis Markup Language (SSML) - With SSML, you can customize Text to speech outputs, with richer voice tuning support. For more information, see [Speech Synthesis Markup Language (SSML) overview](speech-synthesis-markup.md).
-- Audio outputs - Each prebuilt neural voice model is available at 24 kHz and high-fidelity 48 kHz. If you select a 48-kHz output format, the high-fidelity voice model with 48 kHz is invoked accordingly. Sample rates other than 24 kHz and 48 kHz can be obtained through upsampling or downsampling when synthesizing. For example, 44.1 kHz is downsampled from 48 kHz. Each audio format incorporates a bitrate and encoding type. For more information, see the [supported audio formats](rest-text-to-speech.md?tabs=streaming#audio-outputs). For more information on 48-kHz high-quality voices, see [this introduction blog](https://techcommunity.microsoft.com/t5/ai-cognitive-services-blog/azure-neural-tts-voices-upgraded-to-48khz-with-hifinet2-vocoder/ba-p/3665252).
+- Audio outputs - Each standard voice model is available at 24 kHz and high-fidelity 48 kHz. If you select a 48-kHz output format, the high-fidelity voice model with 48 kHz is invoked accordingly. Sample rates other than 24 kHz and 48 kHz can be obtained through upsampling or downsampling when synthesizing. For example, 44.1 kHz is downsampled from 48 kHz. Each audio format incorporates a bitrate and encoding type. For more information, see the [supported audio formats](rest-text-to-speech.md?tabs=streaming#audio-outputs). For more information on 48-kHz high-quality voices, see [this introduction blog](https://techcommunity.microsoft.com/t5/ai-cognitive-services-blog/azure-neural-tts-voices-upgraded-to-48khz-with-hifinet2-vocoder/ba-p/3665252).
For an example, see the [text to speech quickstart](get-started-text-to-speech.md).
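The "Audio outputs" bullet notes that only 24 kHz and 48 kHz are native model rates and that other rates are resampled (for example, 44.1 kHz from 48 kHz). The service's exact selection logic isn't spelled out in this article, but a nearest-native-rate heuristic consistent with that example can be sketched as:

```python
# Native model sample rates per the audio outputs note above.
NATIVE_RATES_HZ = (24_000, 48_000)

def source_rate_for(target_hz: int) -> int:
    """Guess which native model rate a target output rate is resampled from (sketch)."""
    if target_hz in NATIVE_RATES_HZ:
        return target_hz
    # Pick the native rate closest to the requested output rate.
    return min(NATIVE_RATES_HZ, key=lambda native: abs(native - target_hz))

print(source_rate_for(44_100))  # → 48000, matching the 44.1 kHz example
print(source_rate_for(16_000))  # → 24000
```

This is only an illustration of the resampling relationship; the output format you actually request is one of the named formats in the supported audio formats list.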