articles/cognitive-services/Speech-Service/custom-speech-overview.md (+1 -3)
@@ -29,15 +29,13 @@ With Custom Speech, you can upload your own data, test and train a custom model,
Here's more information about the sequence of steps shown in the previous diagram:
- 1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal.
+ 1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
1. [Upload test data](./how-to-custom-speech-upload-data.md). Upload test data to evaluate the Microsoft speech-to-text offering for your applications, tools, and products.
1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech-to-text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if additional training is required.
1. [Train a model](how-to-custom-speech-train-model.md). Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
- If you will train a custom model with audio data, choose a Speech resource [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training audio data. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After a model is trained, you can copy it to a Speech resource in another region as needed.
-
## Next steps
* [Create a project](how-to-custom-speech-create-project.md)
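As background for the quantitative-testing step above: WER compares the recognition result against a human-labeled reference transcript. This is the standard definition, given here for context rather than taken from the PR:

$$\text{WER} = \frac{S + D + I}{N}$$

where $S$, $D$, and $I$ are the substituted, deleted, and inserted words in the recognition result relative to the reference, and $N$ is the number of words in the reference.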
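The deploy step above ends with a custom endpoint ID. A minimal sketch of pointing the Speech SDK at such an endpoint (Python; the subscription key, region, and endpoint ID are placeholder assumptions, and input comes from the default microphone):

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute your Speech resource key/region and the
# endpoint ID shown in Speech Studio after deployment.
speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourRegion")
speech_config.endpoint_id = "YourEndpointId"  # route recognition to the custom model

# Recognize a single utterance from the default microphone.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()
print(result.text)
```

Without `endpoint_id`, the same code falls back to the base model for the region, which is a convenient way to compare base and custom recognition side by side.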
articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md (+1 -1)
@@ -41,7 +41,7 @@ After you sign up for the Azure account, you need to create a Speech resource in
It takes a few moments to deploy your new Speech resource. After the deployment is complete, you can start using the Audio Content Creation tool.
> [!NOTE]
- > If you plan to use neural voices, make sure that you create your resource in [a region that supports neural voices](regions.md#prebuilt-neural-voices).
+ > If you plan to use neural voices, make sure that you create your resource in [a region that supports neural voices](regions.md#speech-service).
### Step 3: Sign in to Audio Content Creation with your Azure account and Speech resource
articles/cognitive-services/Speech-Service/how-to-custom-speech-create-project.md (+4 -0)
@@ -25,6 +25,10 @@ To create a Custom Speech project, follow these steps:
1. Sign in to the [Speech Studio](https://aka.ms/speechstudio/customspeech).
1. Select the subscription and Speech resource to work with.
+
+ > [!IMPORTANT]
+ > If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
+
1. Select **Custom speech** > **Create a new project**.
1. Follow the instructions provided by the wizard to create your project.
articles/cognitive-services/Speech-Service/how-to-custom-speech-test-and-train.md (+5 -5)
@@ -45,7 +45,7 @@ Training with plain text or structured text usually finishes within a few minute
>
> Start with small sets of sample data that match the language, acoustics, and hardware where your model will be used. Small datasets of representative data can expose problems before you invest in gathering larger datasets for training. For sample Custom Speech data, see <a href="https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/sampledata/customspeech" target="_target">this GitHub repository</a>.
- If you will train a custom model with audio data, choose a Speech resource [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware available for training audio data. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After the model is trained, you can copy the model to another region as needed with the [CopyModelToSubscription](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CopyModelToSubscription) REST API.
+ If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After the model is trained, you can copy the model to another region as needed with the [CopyModelToSubscription](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CopyModelToSubscription) REST API.
## Consider datasets by scenario
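As a companion to the CopyModelToSubscription operation linked in the hunk above, here is a hedged sketch of the call (Speech-to-text REST API v3.0). The URL shape and the `targetSubscriptionKey` body field are assumptions based on that operation; verify against the API reference before use:

```python
import requests

source_region = "eastus"               # region with dedicated training hardware
source_key = "YourSourceResourceKey"   # key of the Speech resource that trained the model
model_id = "YourModelId"
target_key = "YourTargetResourceKey"   # key of the Speech resource in the destination region

# Assumed v3.0 operation shape; check the CopyModelToSubscription reference.
url = f"https://{source_region}.api.cognitive.microsoft.com/speechtotext/v3.0/models/{model_id}/copyto"
response = requests.post(
    url,
    headers={"Ocp-Apim-Subscription-Key": source_key},
    json={"targetSubscriptionKey": target_key},
)
response.raise_for_status()
print(response.json()["self"])  # URI of the copied model
```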
@@ -76,7 +76,7 @@ You can use audio + human-labeled transcript data for both [training](how-to-cus
For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.
> [!IMPORTANT]
- > If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.
+ > If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-service) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.
Audio with human-labeled transcripts offers the greatest accuracy improvements if the audio comes from the target use case. Samples must cover the full scope of speech. For example, a call center for a retail store would get the most calls about swimwear and sunglasses during summer months. Ensure that your sample includes the full scope of speech that you want to detect.
@@ -91,7 +91,7 @@ Consider these details:
* The Speech service automatically uses the transcripts to improve the recognition of domain-specific words and phrases, as though they were added as related text.
* It can take several days for a training operation to finish. To improve the speed of training, be sure to create your Speech service subscription in a region that has dedicated hardware for training.
- A large training dataset is required to improve recognition. Generally, we recommend that you provide word-by-word transcriptions for 1 to 20 hours of audio. However, even as little as 30 minutes can help improve recognition results. Although creating human-labeled transcription can take time, improvements in recognition will only be as good as the data that you provide. You should only upload only high-quality transcripts.
+ A large training dataset is required to improve recognition. Generally, we recommend that you provide word-by-word transcriptions for 1 to 20 hours of audio. However, even as little as 30 minutes can help improve recognition results. Although creating human-labeled transcription can take time, improvements in recognition will only be as good as the data that you provide. You should upload only high-quality transcripts.
Audio files can have silence at the beginning and end of the recording. If possible, include at least a half-second of silence before and after speech in each sample file. Although audio with low recording volume or disruptive background noise is not helpful, it shouldn't limit or degrade your custom model. Always consider upgrading your microphones and signal processing hardware before gathering audio samples.
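The half-second-of-silence guidance above is easy to enforce mechanically. A minimal sketch using only Python's standard `wave` module (assumes uncompressed PCM WAV input; file names are placeholders):

```python
import wave

def pad_with_silence(src_path: str, dst_path: str, seconds: float = 0.5) -> None:
    """Prepend and append `seconds` of silence to a PCM WAV file."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Zeroed PCM samples are silence; one frame is sampwidth bytes per channel.
    pad = b"\x00" * (int(params.framerate * seconds) * params.sampwidth * params.nchannels)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(pad + frames + pad)

pad_with_silence("sample.wav", "sample_padded.wav")
```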
@@ -154,7 +154,7 @@ Here are key details about the supported Markdown format:
|`@list`|A list of items that can be referenced in an example sentence.|Maximum of 10 lists. Maximum of 4,000 items per list.|
|`speech:phoneticlexicon`|A list of phonetic pronunciations according to the [Universal Phone Set](customize-pronunciation.md). Pronunciation is adjusted for each instance where the word appears in a list or training sentence. For example, if you have a word that sounds like "cat" and you want to adjust the pronunciation to "k ae t", you would add `- cat/k ae t` to the `speech:phoneticlexicon` list.|Maximum of 15,000 entries. Maximum of 2 pronunciations per word.|
|`#ExampleSentences`|A pound symbol (`#`) delimits a section of example sentences. The section heading can only contain letters, digits, and underscores. Example sentences should reflect the range of speech that your model should expect. A training sentence can refer to items under a `@list` by using surrounding left and right curly braces (`{@list name}`). You can refer to multiple lists in the same training sentence, or none at all.|Maximum of 50,000 example sentences|
- |`//`|Comments follow a double slash (`//`).|Not applicable|
+ |`//`|Comments follow a double slash (`//`).|Not applicable|
Here's an example structured text file:
@@ -261,7 +261,7 @@ Use <a href="http://sox.sourceforge.net" target="_blank" rel="noopener">SoX</a>
Not all base models support [training with audio data](language-support.md#speech-to-text). For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text).
- Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.
+ Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-service) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.

articles/cognitive-services/Speech-Service/how-to-custom-speech-train-model.md
In this article, you'll learn how to train a custom model to improve recognition accuracy from the Microsoft base model. The speech recognition accuracy and quality of a Custom Speech model will remain consistent, even when a new base model is released.
+ > [!NOTE]
+ > You pay to use Custom Speech models, but you are not charged for training a model.
+
Training a model is typically an iterative process. You will first select a base model that is the starting point for a new model. You train a model with [datasets](./how-to-custom-speech-test-and-train.md) that can include text and audio, and then you test. If the recognition quality or accuracy doesn't meet your requirements, you can create a new model with additional or modified training data, and then test again.
You can use a custom model for a limited time after it's trained. You must periodically recreate and adapt your custom model from the latest base model to take advantage of the improved accuracy and quality. For more information, see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md).
- > [!NOTE]
- > You pay to use Custom Speech models, but you are not charged for training a model.
-
- If you plan to train a model with audio data, use a Speech resource in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training. After a model is trained, you can [copy it to a Speech resource](#copy-a-model) in another region as needed.
+ > [!IMPORTANT]
+ > If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. After a model is trained, you can [copy it to a Speech resource](#copy-a-model) in another region as needed.
+ >
+ > In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. See footnotes in the [regions](regions.md#speech-service) table for more information.
## Create a model
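To ground the "Create a model" section above, here is a hedged sketch of the model-creation call (Speech-to-text REST API v3.0). The body fields shown are assumptions drawn from the v3.0 API shape, and all IDs and keys are placeholders; check the CreateModel reference before use:

```python
import requests

region = "eastus"        # prefer a region with dedicated hardware if training with audio
key = "YourResourceKey"
base = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0"

# Assumed v3.0 request body; verify field names against the API reference.
body = {
    "displayName": "My custom model",
    "locale": "en-US",
    "baseModel": {"self": f"{base}/models/base/YourBaseModelId"},
    "datasets": [{"self": f"{base}/datasets/YourDatasetId"}],
}

response = requests.post(f"{base}/models", headers={"Ocp-Apim-Subscription-Key": key}, json=body)
response.raise_for_status()
# Per the hunk below, the top-level `self` property is the new model's URI.
print(response.json()["self"])
```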
@@ -209,7 +212,7 @@ The top-level `self` property in the response body is the model's URI. Use this
## Copy a model
- You can copy a model to another project that uses the same locale. For example, after a model is trained with audio data in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training, you can copy it to a Speech resource in another region as needed.
+ You can copy a model to another project that uses the same locale. For example, after a model is trained with audio data in a [region](regions.md#speech-service) with dedicated hardware for training, you can copy it to a Speech resource in another region as needed.
articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md (+1 -1)
@@ -288,7 +288,7 @@ After you've updated the engine version for your voice model, you need to [redep
For more information, [learn more about the capabilities and limits of this feature, and the best practice to improve your model quality](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).
> [!NOTE]
- > Custom Neural Voice training is only available in some regions. But you can easily copy a neural voice model from these regions to other regions. For more information, see the [regions for Custom Neural Voice](regions.md#text-to-speech).
+ > Custom Neural Voice training is only available in some regions. But you can easily copy a neural voice model from these regions to other regions. For more information, see the [regions for Custom Neural Voice](regions.md#speech-service).
articles/cognitive-services/Speech-Service/how-to-custom-voice.md (+1 -1)
@@ -14,7 +14,7 @@ ms.author: eur
# Create a Project
- [Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md#custom-neural-voice) and [region](regions.md#custom-neural-voices).
+ [Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md#custom-neural-voice) and [region](regions.md#speech-service).
> [!IMPORTANT]
> Custom Neural Voice Pro can be used to create higher-quality models that are indistinguishable from human recordings. For access you must commit to using it in alignment with our responsible AI principles. Learn more about our [policy on limited access](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply here](https://aka.ms/customneural).
articles/cognitive-services/Speech-Service/how-to-deploy-and-use-endpoint.md (+1 -1)
@@ -51,7 +51,7 @@ The custom endpoint is functionally identical to the standard endpoint that's us
You can copy your voice model to another project for the same region or another region. For example, you can copy a neural voice model that was trained in one region, to a project for another region.
> [!NOTE]
- > Custom neural voice training is only available in the these regions: East US, Southeast Asia, and UK South. But you can copy a neural voice model from those regions to other regions. For more information, see the [regions for custom neural voice](regions.md#text-to-speech).
+ > Custom neural voice training is only available in these regions: East US, Southeast Asia, and UK South. But you can copy a neural voice model from those regions to other regions. For more information, see the [regions for custom neural voice](regions.md#speech-service).
To copy your custom neural voice model to another project:
articles/cognitive-services/Speech-Service/how-to-develop-custom-commands-application.md (+1 -1)
@@ -625,7 +625,7 @@ Another way to customize Custom Commands responses is to select an output voice.
>
> [!NOTE]
- > For public voices, neural types are available only for specific regions. For more information, see [Speech service supported regions](./regions.md#prebuilt-neural-voices).
+ > For public voices, neural types are available only for specific regions. For more information, see [Speech service supported regions](./regions.md#speech-service).
>
> You can create custom voices on the **Custom Voice** project page. For more information, see [Get started with Custom Voice](./how-to-custom-voice.md).
articles/cognitive-services/Speech-Service/how-to-pronunciation-assessment.md (+1 -1)
@@ -30,7 +30,7 @@ You can get pronunciation assessment scores for:
- Phonemes in SAPI or IPA format
> [!NOTE]
- > For information about availability of pronunciation assessment, see [supported languages](language-support.md#pronunciation-assessment) and [available regions](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation).
+ > For information about availability of pronunciation assessment, see [supported languages](language-support.md#pronunciation-assessment) and [available regions](regions.md#speech-service).
>
> The syllable groups, IPA phonemes, and spoken phoneme features of pronunciation assessment are currently only available for the en-US locale.
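To make the scoring surface above concrete, here is a minimal sketch of requesting pronunciation assessment with the Speech SDK (Python; the key, region, reference text, and WAV file name are placeholder assumptions):

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourRegion")
audio_config = speechsdk.audio.AudioConfig(filename="myspeech.wav")

# Score the recording against a known reference, down to phoneme granularity.
pron_config = speechsdk.PronunciationAssessmentConfig(
    reference_text="Good morning",
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
)

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
pron_config.apply_to(recognizer)

result = recognizer.recognize_once()
scores = speechsdk.PronunciationAssessmentResult(result)
print(scores.accuracy_score, scores.fluency_score, scores.completeness_score)
```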