articles/cognitive-services/Speech-Service/captioning-concepts.md (+1 -1)
@@ -225,7 +225,7 @@ Profanity filter is applied to the result `Text` and `MaskedNormalizedForm` prop
 ## Language identification
 
-If the language in the audio could change, use continuous [language identification](language-identification.md). Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md#speech-to-text-and-text-to-speech). You provide up to 10 candidate languages, at least one of which is expected to be in the audio. The Speech service returns the most likely language in the audio.
+If the language in the audio could change, use continuous [language identification](language-identification.md). Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md). You provide up to 10 candidate languages, at least one of which is expected to be in the audio. The Speech service returns the most likely language in the audio.
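
For context on the hunk above, here's a minimal C# sketch of continuous language identification with the Speech SDK. It's a sketch under assumptions, not part of the article: it presumes a recent SDK (roughly 1.25 or later) that exposes `PropertyId.SpeechServiceConnection_LanguageIdMode`, and the key, region, candidate list, and file name are placeholders.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

        // Keep re-evaluating the spoken language throughout the audio,
        // instead of deciding once at the start (assumes SDK 1.25+).
        speechConfig.SetProperty(PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous");

        // Up to 10 candidate languages; at least one should occur in the audio.
        var autoDetect = AutoDetectSourceLanguageConfig.FromLanguages(
            new[] { "en-US", "de-DE", "fr-FR" });

        using var audioConfig = AudioConfig.FromWavFileInput("caption-source.wav"); // placeholder file
        using var recognizer = new SpeechRecognizer(speechConfig, autoDetect, audioConfig);

        recognizer.Recognized += (s, e) =>
        {
            // Each recognized phrase carries the language the service detected for it.
            var detected = AutoDetectSourceLanguageResult.FromResult(e.Result);
            Console.WriteLine($"[{detected.Language}] {e.Result.Text}");
        };

        await recognizer.StartContinuousRecognitionAsync();
        Console.ReadLine(); // let recognition run until Enter is pressed
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```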
articles/cognitive-services/Speech-Service/conversation-transcription.md (+1 -1)
@@ -82,7 +82,7 @@ Audio data is processed live to return the speaker identifier and transcript, an
 ## Language support
 
-Currently, conversation transcription supports [all speech-to-text languages](language-support.md#speech-to-text) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`.
+Currently, conversation transcription supports [all speech-to-text languages](language-support.md) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`.
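
A hedged aside on the region constraint above (not from the article): the restriction bites when you build the speech config, because the resource's region is baked in there. The key is a placeholder.

```csharp
using Microsoft.CognitiveServices.Speech;

// The Speech resource must live in a region where conversation transcription
// is available; `centralus` is one of the four listed above.
var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "centralus");

// Any supported speech-to-text language can then be used.
speechConfig.SpeechRecognitionLanguage = "en-US";
```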
articles/cognitive-services/Speech-Service/custom-neural-voice.md (+1 -1)
@@ -16,7 +16,7 @@ ms.author: eur
 Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data. If you're looking for ready-to-use options, check out our [text-to-speech](text-to-speech.md) service.
 
-Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md#speech-to-text-and-text-to-speech) for Custom Neural Voice.
+Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md) for Custom Neural Voice.
 
 > [!IMPORTANT]
 > Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md (+2 -2)
@@ -18,7 +18,7 @@ ms.author: eur
 The tool is based on [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). It allows you to adjust text-to-speech output attributes in real time or batch synthesis, such as voice characters, voice styles, speaking speed, pronunciation, and prosody.
 
-You have easy access to a broad portfolio of [languages and voices](language-support.md#text-to-speech). These voices include state-of-the-art prebuilt neural voices and your custom neural voice, if you've built one.
+You have easy access to a broad portfolio of [languages and voices](language-support.md). These voices include state-of-the-art prebuilt neural voices and your custom neural voice, if you've built one.
 
 To learn more, view the [Audio Content Creation tutorial video](https://youtu.be/ygApYuOOG6w).
24
24
@@ -70,7 +70,7 @@ Each step in the preceding diagram is described here:
 1. Choose the Speech resource you want to work with.
 1. [Create an audio tuning file](#create-an-audio-tuning-file) by using plain text or SSML scripts. Enter or upload your content into Audio Content Creation.
-1. Choose the voice and the language for your script content. Audio Content Creation includes all of the [Microsoft text-to-speech voices](language-support.md#text-to-speech). You can use prebuilt neural voices or a custom neural voice.
+1. Choose the voice and the language for your script content. Audio Content Creation includes all of the [Microsoft text-to-speech voices](language-support.md). You can use prebuilt neural voices or a custom neural voice.
 
 > [!NOTE]
 > Gated access is available for Custom Neural Voice, which allows you to create high-definition voices that are similar to natural-sounding speech. For more information, see [Gating process](./text-to-speech.md).
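
Since the tool is SSML-based, the same attributes it tunes (voice, style, rate, pitch) can be sent straight to the service. A hedged C# sketch follows; the voice name and style values are illustrative, and the key and region are placeholders.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        using var synthesizer = new SpeechSynthesizer(config); // plays to the default speaker

        // The attributes Audio Content Creation adjusts in its UI, expressed as SSML.
        string ssml = @"
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis'
       xmlns:mstts='https://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-JennyNeural'>
    <mstts:express-as style='cheerful'>
      <prosody rate='+10%' pitch='+2%'>
        Welcome to Audio Content Creation.
      </prosody>
    </mstts:express-as>
  </voice>
</speak>";

        var result = await synthesizer.SpeakSsmlAsync(ssml);
        Console.WriteLine($"Synthesis result: {result.Reason}");
    }
}
```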
articles/cognitive-services/Speech-Service/how-to-custom-speech-test-and-train.md (+4 -4)
@@ -73,7 +73,7 @@ You can use audio + human-labeled transcript data for both [training](how-to-cus
 - To improve the acoustic aspects like slight accents, speaking styles, and background noises.
 - To measure the accuracy of Microsoft's speech-to-text when it's processing your audio files.
 
-For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.
+For a list of base models that support training with audio data, see [Language support](language-support.md). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.
 
 > [!IMPORTANT]
 > If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-service) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.
@@ -143,7 +143,7 @@ Expected utterances often follow a certain pattern. One common pattern is that u
 * "I have a question about `product`," where `product` is a list of possible products.
 * "Make that `object` `color`," where `object` is a list of geometric shapes and `color` is a list of colors.
 
-For a list of supported base models and locales for training with structured text, see [Language support](language-support.md#speech-to-text). You must use the latest base model for these locales. For locales that don't support training with structured text, the service will take any training sentences that don't reference any classes as part of training with plain-text data.
+For a list of supported base models and locales for training with structured text, see [Language support](language-support.md). You must use the latest base model for these locales. For locales that don't support training with structured text, the service will take any training sentences that don't reference any classes as part of training with plain-text data.
 
 The structured-text file should have an .md extension. The maximum file size is 200 MB, and the text encoding must be UTF-8 BOM. The syntax of the Markdown is the same as that from the Language Understanding models, in particular list entities and example utterances. For more information about the complete Markdown syntax, see the <a href="/azure/bot-service/file-format/bot-builder-lu-file-format" target="_blank">Language Understanding Markdown</a>.
149
149
@@ -202,7 +202,7 @@ Here's an example structured text file:
 Specialized or made-up words might have unique pronunciations. These words can be recognized if they can be broken down into smaller words to pronounce them. For example, to recognize "Xbox", pronounce it as "X box". This approach won't increase overall accuracy, but can improve recognition of this and other keywords.
 
-You can provide a custom pronunciation file to improve recognition. Don't use custom pronunciation files to alter the pronunciation of common words. For a list of languages that support custom pronunciation, see [language support](language-support.md#speech-to-text).
+You can provide a custom pronunciation file to improve recognition. Don't use custom pronunciation files to alter the pronunciation of common words. For a list of languages that support custom pronunciation, see [language support](language-support.md).
 
 > [!NOTE]
 > You can either use a pronunciation data file on its own, or you can add pronunciation within a structured text data file. The Speech service doesn't support training a model where you select both of those datasets as input.
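
For the hunk above, a hedged illustration of what a custom pronunciation file contains, assuming the documented layout of a display form and a spoken form separated by a tab, one entry per line. The file name and entries are made up, and the UTF-8 BOM encoding is carried over from the training-data guidance above.

```csharp
using System.IO;
using System.Text;

// Each line: display form <TAB> spoken form.
var entries = new[]
{
    "Xbox\tX box",
    "3CPO\tthree c p o",
};

// Encoding.UTF8 writes a byte-order mark here, matching the
// UTF-8 BOM requirement for training data files.
File.WriteAllLines("pronunciation.txt", entries, Encoding.UTF8);
```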
@@ -259,7 +259,7 @@ Use <a href="http://sox.sourceforge.net" target="_blank" rel="noopener">SoX</a>
 ### Audio data for training
 
-Not all base models support [training with audio data](language-support.md#speech-to-text). For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text).
+Not all base models support [training with audio data](language-support.md). For a list of base models that support training with audio data, see [Language support](language-support.md).
 
 Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-service) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.
articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md (+1 -1)
@@ -169,7 +169,7 @@ After you validate your data files, you can use them to build your Custom Neural
 If you want to create a voice in the same language as your training data, select the **Neural** method. For the **Neural** method, you can select different versions of the training recipe for your model. The versions vary according to the features supported and model training time. Normally, new versions are enhanced ones, with bug fixes and new features supported. The latest version is selected by default.
 
-You can also select **Neural - cross lingual** and **Target language** to create a secondary language for your voice model. Only one target language can be selected for a voice model. You don't need to prepare additional data in the target language for training, but your test script needs to be in the target language. For the languages supported by the cross lingual feature, see [supported languages](language-support.md#speech-to-text-and-text-to-speech).
+You can also select **Neural - cross lingual** and **Target language** to create a secondary language for your voice model. Only one target language can be selected for a voice model. You don't need to prepare additional data in the target language for training, but your test script needs to be in the target language. For the languages supported by the cross lingual feature, see [supported languages](language-support.md).
 
 The same unit price applies to both **Neural** and **Neural - cross lingual**. Check [the pricing details](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) for training.
articles/cognitive-services/Speech-Service/how-to-custom-voice.md (+2 -2)
@@ -14,7 +14,7 @@ ms.author: eur
 # Create a Project
 
-[Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md#speech-to-text-and-text-to-speech) and [region](regions.md#speech-service).
+[Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md) and [region](regions.md#speech-service).
 
 > [!IMPORTANT]
 > Custom Neural Voice Pro can be used to create higher-quality models that are indistinguishable from human recordings. For access you must commit to using it in alignment with our responsible AI principles. Learn more about our [policy on limited access](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply here](https://aka.ms/customneural).
@@ -50,7 +50,7 @@ To create a custom voice project:
 ## Cross lingual feature
 
-With the cross lingual feature (public preview), you can create a different language for your voice model. If the language of your training data is supported by the cross lingual feature, you can create a voice that speaks a different language from your training data. For example, with `zh-CN` training data, you can create a voice that speaks `en-US` or any other language supported by the feature. For details, see [supported languages](language-support.md#speech-to-text-and-text-to-speech). You don't need to prepare additional data in the target language for training, but your test script needs to be in the target language.
+With the cross lingual feature (public preview), you can create a different language for your voice model. If the language of your training data is supported by the cross lingual feature, you can create a voice that speaks a different language from your training data. For example, with `zh-CN` training data, you can create a voice that speaks `en-US` or any other language supported by the feature. For details, see [supported languages](language-support.md). You don't need to prepare additional data in the target language for training, but your test script needs to be in the target language.
 
 To create a voice that speaks a different language from your training data, select the training method **Neural-cross lingual** during training. See [how to train your custom neural voice model](how-to-custom-voice-create-voice.md#train-your-custom-neural-voice-model).
articles/cognitive-services/Speech-Service/how-to-migrate-to-prebuilt-neural-voice.md (+1 -1)
@@ -15,7 +15,7 @@ ms.author: v-baolianzou
 # Migrate from prebuilt standard voice to prebuilt neural voice
 
 > [!IMPORTANT]
-> We are retiring the standard voices from September 1, 2021 through August 31, 2024. If you used a standard voice with your Speech resource prior to September 1, 2021, you can continue to do so until August 31, 2024. All other Speech resources can only use prebuilt neural voices. You can choose from the supported [neural voice names](language-support.md#prebuilt-neural-voices). After August 31, the standard voices won't be supported with any Speech resource.
+> We are retiring the standard voices from September 1, 2021 through August 31, 2024. If you used a standard voice with your Speech resource prior to September 1, 2021, you can continue to do so until August 31, 2024. All other Speech resources can only use prebuilt neural voices. You can choose from the supported [neural voice names](language-support.md). After August 31, the standard voices won't be supported with any Speech resource.
 
 The prebuilt neural voice provides more natural-sounding speech output and, thus, a better end-user experience.
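
A minimal hedged sketch of the migration itself (the voice names are examples; the key and region are placeholders): replace the standard voice name with one of the supported neural voice names on the speech config.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        var config = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

        // Previously a standard voice such as "en-US-JessaRUS";
        // now one of the prebuilt neural voices.
        config.SpeechSynthesisVoiceName = "en-US-JennyNeural";

        using var synthesizer = new SpeechSynthesizer(config);
        var result = await synthesizer.SpeakTextAsync("Hello from a prebuilt neural voice.");
        Console.WriteLine(result.Reason);
    }
}
```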
articles/cognitive-services/Speech-Service/how-to-pronunciation-assessment.md (+2 -2)
@@ -30,7 +30,7 @@ You can get pronunciation assessment scores for:
 - Phonemes in SAPI or IPA format
 
 > [!NOTE]
-> For information about availability of pronunciation assessment, see [supported languages](language-support.md#pronunciation-assessment) and [available regions](regions.md#speech-service).
+> For information about availability of pronunciation assessment, see [supported languages](language-support.md) and [available regions](regions.md#speech-service).
 >
 > The syllable groups, IPA phonemes, and spoken phoneme features of pronunciation assessment are currently only available for the en-US locale.
@@ -141,7 +141,7 @@ To request syllable-level results along with phonemes, set the granularity [conf
 ## Phoneme alphabet format
 
-The phoneme name is provided together with the score, to help identify which phonemes were pronounced accurately or inaccurately. For the [supported languages](language-support.md#pronunciation-assessment), you can get the phoneme name in [SAPI](/previous-versions/windows/desktop/ee431828(v=vs.85)#american-english-phoneme-table) format, and for the `en-US` locale, you can also get the phoneme name in [IPA](https://en.wikipedia.org/wiki/IPA) format.
+The phoneme name is provided together with the score, to help identify which phonemes were pronounced accurately or inaccurately. For the [supported languages](language-support.md), you can get the phoneme name in [SAPI](/previous-versions/windows/desktop/ee431828(v=vs.85)#american-english-phoneme-table) format, and for the `en-US` locale, you can also get the phoneme name in [IPA](https://en.wikipedia.org/wiki/IPA) format.
 
 The following table compares example SAPI phonemes with the corresponding IPA phonemes.
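
Pulling the two hunks together, a hedged C# sketch that requests phoneme-level scores and IPA phoneme names (IPA being `en-US` only, per the note above). The reference text, key, region, and file name are placeholders, and the `PhonemeAlphabet` property assumes a recent SDK.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");
        speechConfig.SpeechRecognitionLanguage = "en-US";

        // Score against a known reference, down to the phoneme level.
        var pronConfig = new PronunciationAssessmentConfig(
            "good morning",
            GradingSystem.HundredMark,
            Granularity.Phoneme);

        // SAPI phoneme names are the default; IPA is available for en-US.
        pronConfig.PhonemeAlphabet = "IPA";

        using var audioConfig = AudioConfig.FromWavFileInput("good-morning.wav"); // placeholder file
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        pronConfig.ApplyTo(recognizer);

        var result = await recognizer.RecognizeOnceAsync();
        var assessment = PronunciationAssessmentResult.FromResult(result);
        Console.WriteLine($"Accuracy: {assessment.AccuracyScore}");
    }
}
```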
articles/cognitive-services/Speech-Service/how-to-recognize-intents-from-speech-csharp.md (+1 -1)
@@ -219,7 +219,7 @@ The application doesn't parse the JSON result. It only displays the JSON text in
 ## Specify recognition language
 
-By default, LUIS recognizes intents in US English (`en-us`). By assigning a locale code to the `SpeechRecognitionLanguage` property of the speech configuration, you can recognize intents in other languages. For example, add `config.SpeechRecognitionLanguage = "de-de";` in our application before creating the recognizer to recognize intents in German. For more information, see [LUIS language support](../LUIS/luis-language-support.md#languages-supported).
+By default, LUIS recognizes intents in US English (`en-us`). By assigning a locale code to the `SpeechRecognitionLanguage` property of the speech configuration, you can recognize intents in other languages. For example, add `config.SpeechRecognitionLanguage = "de-de";` in our application before creating the recognizer to recognize intents in German. For more information, see [LUIS language support](../LUIS/luis-language-support.md).
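
A short hedged sketch of the step described above (the key and region are placeholders):

```csharp
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

var config = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

// Set the locale before constructing the recognizer, as the text describes.
config.SpeechRecognitionLanguage = "de-de";

using var recognizer = new IntentRecognizer(config);
```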