articles/cognitive-services/Speech-Service/call-center-transcription.md (+1 -1)
@@ -113,7 +113,7 @@ Internally, Microsoft uses these technologies to support Microsoft customer call
Some businesses are required to transcribe conversations in real time. You can use real-time transcription to identify keywords and trigger searches for content and resources that are relevant to the conversation, to monitor sentiment, to improve accessibility, or to provide translations for customers and agents who aren't native speakers.
-For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). Currently, speech-to-text is available in [more than 20 languages](language-support.md), and the SDK is available in C++, C#, Java, Python, JavaScript, Objective-C, and Go. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).
+For scenarios that require real-time transcription, we recommend using the [Speech SDK](speech-sdk.md). The SDK is available in C++, C#, Java, Python, JavaScript, Objective-C, and Go. Samples are available in each language on [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk). For the latest news and updates, see [Release notes](releasenotes.md).
Internally, Microsoft uses the previously mentioned technologies to analyze Microsoft customer calls in real time, as shown in the following diagram:
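For reference, a minimal sketch of real-time transcription with the Speech SDK in Python, assuming a Speech resource whose key and region are exported as `SPEECH_KEY` and `SPEECH_REGION` (placeholder names):

```python
import os
import time

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config, audio_config=audio_config
)

# Print each finalized phrase as it arrives; a real call-center pipeline would
# scan this text for keywords, sentiment, or translation triggers instead.
recognizer.recognized.connect(lambda evt: print(evt.result.text))

recognizer.start_continuous_recognition()
time.sleep(30)  # keep transcribing for 30 seconds
recognizer.stop_continuous_recognition()
```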
articles/cognitive-services/Speech-Service/captioning-concepts.md (+1 -1)
@@ -225,7 +225,7 @@ Profanity filter is applied to the result `Text` and `MaskedNormalizedForm` prop
## Language identification
-If the language in the audio could change, use continuous [language identification](language-identification.md). Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md). You provide up to 10 candidate languages, at least one of which is expected to be in the audio. The Speech service returns the most likely language in the audio.
+If the language in the audio could change, use continuous [language identification](language-identification.md). Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification). You provide up to 10 candidate languages, at least one of which is expected to be in the audio. The Speech service returns the most likely language in the audio.
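As a sketch of the behavior the changed paragraph describes, continuous language identification with the Python Speech SDK might look like this (key, region, and file name are placeholders; continuous LID requires a recent SDK version):

```python
import time

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
# Ask the service to re-detect the language as it changes mid-stream.
speech_config.set_property(
    speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous"
)
# Up to 10 candidate locales; at least one should occur in the audio.
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "de-DE", "zh-CN"]
)
audio_config = speechsdk.audio.AudioConfig(filename="multilingual.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config,
)

def on_recognized(evt):
    # Each result carries the most likely language for that utterance.
    detected = speechsdk.AutoDetectSourceLanguageResult(evt.result).language
    print(f"[{detected}] {evt.result.text}")

recognizer.recognized.connect(on_recognized)
recognizer.start_continuous_recognition()
time.sleep(30)
recognizer.stop_continuous_recognition()
```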
articles/cognitive-services/Speech-Service/conversation-transcription.md (+1 -1)
@@ -82,7 +82,7 @@ Audio data is processed live to return the speaker identifier and transcript, an
## Language support
-Currently, conversation transcription supports [all speech-to-text languages](language-support.md) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`.
+Currently, conversation transcription supports [all speech-to-text languages](language-support.md?tabs=stt-tts) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`.
articles/cognitive-services/Speech-Service/custom-neural-voice.md (+1 -1)
@@ -16,7 +16,7 @@ ms.author: eur
Custom Neural Voice is a text-to-speech feature that lets you create a one-of-a-kind, customized, synthetic voice for your applications. With Custom Neural Voice, you can build a highly natural-sounding voice by providing your audio samples as training data. If you're looking for ready-to-use options, check out our [text-to-speech](text-to-speech.md) service.
-Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md) for Custom Neural Voice.
+Based on the neural text-to-speech technology and the multilingual, multi-speaker, universal model, Custom Neural Voice lets you create synthetic voices that are rich in speaking styles, or adaptable across languages. The realistic and natural-sounding voice of Custom Neural Voice can represent brands, personify machines, and allow users to interact with applications conversationally. See the [supported languages](language-support.md?tabs=stt-tts) for Custom Neural Voice.
> [!IMPORTANT]
> Custom Neural Voice access is limited based on eligibility and usage criteria. Request access on the [intake form](https://aka.ms/customneural).
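Once a custom neural voice is trained and deployed, consuming it is a small configuration change on top of ordinary synthesis. A sketch in Python, where the deployment ID and voice name are hypothetical values from your Speech Studio deployment:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
# Point the config at the Custom Neural Voice deployment instead of a
# prebuilt voice; both values are placeholders from Speech Studio.
speech_config.endpoint_id = "<custom-voice-deployment-id>"
speech_config.speech_synthesis_voice_name = "MyBrandNeuralVoice"

# With no audio config, output goes to the default speaker.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from our brand voice.").get()
```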
With Custom Speech, you can evaluate and improve the Microsoft speech-to-text accuracy for your applications and products.
-Out of the box, speech to text utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md) is used by default. The base model works very well in most speech recognition scenarios.
+Out of the box, speech to text utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt-tts) is used by default. The base model works very well in most speech recognition scenarios.
A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary, by providing text data to train the model. It can also be used to improve recognition for the specific audio conditions of the application, by providing audio data with reference transcriptions.
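To make the base-model/custom-model split concrete: with no endpoint configured, the SDK uses the latest base model for the locale; routing requests through a deployed custom model is one property. A sketch with a placeholder endpoint ID:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
# Omit endpoint_id to use the latest base model for the request's locale;
# set it to route recognition through a deployed Custom Speech model instead.
speech_config.endpoint_id = "<custom-speech-endpoint-id>"

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print(recognizer.recognize_once().text)
```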
articles/cognitive-services/Speech-Service/direct-line-speech.md (+1 -1)
@@ -49,7 +49,7 @@ Sample code for creating a voice assistant is available on GitHub. These samples
Voice assistants built using Speech service can use the full range of customization options available for [speech-to-text](speech-to-text.md), [text-to-speech](text-to-speech.md), and [custom keyword selection](./custom-keyword-basics.md).
> [!NOTE]
-> Customization options vary by language/locale (see [Supported languages](./language-support.md)).
+> Customization options vary by language/locale (see [Supported languages](./language-support.md?tabs=stt-tts)).
Direct Line Speech and its associated functionality for voice assistants are an ideal supplement to the [Virtual Assistant Solution and Enterprise Template](/azure/bot-service/bot-builder-enterprise-template-overview). Though Direct Line Speech can work with any compatible bot, these resources provide a reusable baseline for high-quality conversational experiences as well as common supporting skills and models to get started quickly.
articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md (+2 -2)
@@ -18,7 +18,7 @@ ms.author: eur
The tool is based on [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). It allows you to adjust text-to-speech output attributes in real time or batch synthesis, such as voice characters, voice styles, speaking speed, pronunciation, and prosody.
-You have easy access to a broad portfolio of [languages and voices](language-support.md). These voices include state-of-the-art prebuilt neural voices and your custom neural voice, if you've built one.
+You have easy access to a broad portfolio of [languages and voices](language-support.md?tabs=stt-tts). These voices include state-of-the-art prebuilt neural voices and your custom neural voice, if you've built one.
To learn more, view the [Audio Content Creation tutorial video](https://youtu.be/ygApYuOOG6w).
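The attributes the tool adjusts (voice, style, speaking speed, prosody) are plain SSML, which can also be submitted through the SDK. A sketch, assuming the `en-US-JennyNeural` voice and its `cheerful` style are available in your region:

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<key>", region="<region>")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Voice, speaking style, and rate expressed as SSML, of the kind Audio
# Content Creation lets you tune and export.
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="cheerful">
      <prosody rate="+10%">Welcome to Audio Content Creation!</prosody>
    </mstts:express-as>
  </voice>
</speak>
"""
synthesizer.speak_ssml_async(ssml).get()
```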
@@ -70,7 +70,7 @@ Each step in the preceding diagram is described here:
1. Choose the Speech resource you want to work with.
1. [Create an audio tuning file](#create-an-audio-tuning-file) by using plain text or SSML scripts. Enter or upload your content into Audio Content Creation.
-1. Choose the voice and the language for your script content. Audio Content Creation includes all of the [Microsoft text-to-speech voices](language-support.md). You can use prebuilt neural voices or a custom neural voice.
+1. Choose the voice and the language for your script content. Audio Content Creation includes all of the [prebuilt text-to-speech voices](language-support.md?tabs=stt-tts). You can use prebuilt neural voices or a custom neural voice.
> [!NOTE]
> Gated access is available for Custom Neural Voice, which allows you to create high-definition voices that are similar to natural-sounding speech. For more information, see [Gating process](./text-to-speech.md).
-Custom Speech projects contain models, training and testing datasets, and deployment endpoints. Each project is specific to a [locale](language-support.md). For example, you might create a project for English in the United States.
+Custom Speech projects contain models, training and testing datasets, and deployment endpoints. Each project is specific to a [locale](language-support.md?tabs=stt-tts). For example, you might create a project for English in the United States.
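A hedged sketch of creating such a locale-specific project with the Speech-to-text REST API v3.0 (region, key, and display name are placeholders):

```python
import requests

resp = requests.post(
    "https://westus2.api.cognitive.microsoft.com/speechtotext/v3.0/projects",
    headers={
        "Ocp-Apim-Subscription-Key": "<key>",
        "Content-Type": "application/json",
    },
    # Each project is pinned to one locale, e.g. English (United States).
    json={"displayName": "My US English project", "locale": "en-US"},
)
resp.raise_for_status()
print(resp.json()["self"])  # URL of the newly created project
```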
articles/cognitive-services/Speech-Service/how-to-custom-speech-model-and-endpoint-lifecycle.md (+1 -1)
@@ -42,7 +42,7 @@ When a custom model or base model expires, it is no longer available for transcr
|Transcription route |Expired model result |Recommendation |
|---------|---------|---------|
-|Custom endpoint|Speech recognition requests will fall back to the most recent base model for the same [locale](language-support.md). You will get results, but recognition might not accurately transcribe your domain data. |Update the endpoint's model as described in the [Deploy a Custom Speech model](how-to-custom-speech-deploy-model.md) guide. |
+|Custom endpoint|Speech recognition requests will fall back to the most recent base model for the same [locale](language-support.md?tabs=stt-tts). You will get results, but recognition might not accurately transcribe your domain data. |Update the endpoint's model as described in the [Deploy a Custom Speech model](how-to-custom-speech-deploy-model.md) guide. |
|Batch transcription |[Batch transcription](batch-transcription.md) requests for expired models will fail with a 4xx error. |In each [CreateTranscription](https://westus2.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CreateTranscription) REST API request body, set the `model` property to a base model or custom model that hasn't yet expired. Otherwise don't include the `model` property to always use the latest base model. |
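To illustrate the table's batch-transcription recommendation: the `model` property of the CreateTranscription request body pins a specific model, and omitting it falls back to the latest base model. A sketch with hypothetical URLs and IDs:

```python
import requests

body = {
    "displayName": "nightly call batch",
    "locale": "en-US",
    "contentUrls": ["https://example.blob.core.windows.net/audio/call1.wav"],
    # Pin a model that hasn't expired; drop this property entirely to always
    # use the latest base model instead.
    "model": {
        "self": "https://westus2.api.cognitive.microsoft.com/speechtotext/v3.0/models/<model-id>"
    },
}
resp = requests.post(
    "https://westus2.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions",
    headers={
        "Ocp-Apim-Subscription-Key": "<key>",
        "Content-Type": "application/json",
    },
    json=body,
)
resp.raise_for_status()  # an expired model would surface as a 4xx error
```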
articles/cognitive-services/Speech-Service/how-to-custom-speech-test-and-train.md (+4 -4)
@@ -73,7 +73,7 @@ You can use audio + human-labeled transcript data for both [training](how-to-cus
- To improve the acoustic aspects like slight accents, speaking styles, and background noises.
- To measure the accuracy of Microsoft's speech-to-text when it's processing your audio files.
-For a list of base models that support training with audio data, see [Language support](language-support.md). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.
+For a list of base models that support training with audio data, see [Language support](language-support.md?tabs=stt-tts). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.
> [!IMPORTANT]
> If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-service) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.
@@ -143,7 +143,7 @@ Expected utterances often follow a certain pattern. One common pattern is that u
* "I have a question about `product`," where `product` is a list of possible products.
* "Make that `object``color`," where `object` is a list of geometric shapes and `color` is a list of colors.
-For a list of supported base models and locales for training with structured text, see [Language support](language-support.md). You must use the latest base model for these locales. For locales that don't support training with structured text, the service will take any training sentences that don't reference any classes as part of training with plain-text data.
+For a list of supported base models and locales for training with structured text, see [Language support](language-support.md?tabs=stt-tts). You must use the latest base model for these locales. For locales that don't support training with structured text, the service will take any training sentences that don't reference any classes as part of training with plain-text data.
The structured-text file should have an .md extension. The maximum file size is 200 MB, and the text encoding must be UTF-8 BOM. The syntax of the Markdown is the same as that from the Language Understanding models, in particular list entities and example utterances. For more information about the complete Markdown syntax, see the <a href="/azure/bot-service/file-format/bot-builder-lu-file-format" target="_blank">Language Understanding Markdown</a>.
@@ -202,7 +202,7 @@ Here's an example structured text file:
Specialized or made-up words might have unique pronunciations. These words can be recognized if they can be broken down into smaller words to pronounce them. For example, to recognize "Xbox", pronounce it as "X box". This approach won't increase overall accuracy, but can improve recognition of this and other keywords.
-You can provide a custom pronunciation file to improve recognition. Don't use custom pronunciation files to alter the pronunciation of common words. For a list of languages that support custom pronunciation, see [language support](language-support.md).
+You can provide a custom pronunciation file to improve recognition. Don't use custom pronunciation files to alter the pronunciation of common words. For a list of languages that support custom pronunciation, see [language support](language-support.md?tabs=stt-tts).
> [!NOTE]
> You can either use a pronunciation data file on its own, or you can add pronunciation within a structured text data file. The Speech service doesn't support training a model where you select both of those datasets as input.
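For reference, a pronunciation file is plain text: the recognized display form, a tab, then the spoken form. A small sketch that writes one (the second entry is illustrative):

```python
# Each line: display form that should appear in results, a tab, then how
# speakers actually say it. The "Xbox" row mirrors the example above.
entries = [
    ("Xbox", "x box"),
    ("CNTK", "c n t k"),  # hypothetical acronym entry
]
# utf-8-sig writes a BOM, matching the encoding the service expects.
with open("pronunciation.txt", "w", encoding="utf-8-sig") as f:
    for display, spoken in entries:
        f.write(f"{display}\t{spoken}\n")
```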
@@ -259,7 +259,7 @@ Use <a href="http://sox.sourceforge.net" target="_blank" rel="noopener">SoX</a>
### Audio data for training
-Not all base models support [training with audio data](language-support.md). For a list of base models that support training with audio data, see [Language support](language-support.md).
+Not all base models support [training with audio data](language-support.md?tabs=stt-tts). For a list of base models that support training with audio data, see [Language support](language-support.md?tabs=stt-tts).
Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-service) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.