Commit 7025429 (2 parents: d747bd9 + ec5b180)

Merge pull request #206185 from eric-urban/eur/speech-regions

Don't bucket regions under SDK and REST

22 files changed: +92 -192 lines

articles/cognitive-services/Speech-Service/custom-speech-overview.md

Lines changed: 1 addition & 3 deletions
@@ -29,15 +29,13 @@ With Custom Speech, you can upload your own data, test and train a custom model,

Here's more information about the sequence of steps shown in the previous diagram:

- 1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal.
+ 1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
1. [Upload test data](./how-to-custom-speech-upload-data.md). Upload test data to evaluate the Microsoft speech-to-text offering for your applications, tools, and products.
1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech-to-text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if additional training is required.
1. [Train a model](how-to-custom-speech-train-model.md). Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.

- If you will train a custom model with audio data, choose a Speech resource [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training audio data. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After a model is trained, you can copy it to a Speech resource in another region as needed.

## Next steps

* [Create a project](how-to-custom-speech-create-project.md)
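
For context on step 6 above (deploying a custom endpoint), here's a minimal sketch of calling a deployed Custom Speech endpoint, assuming the Python Speech SDK; the key, region, endpoint ID, and file name are hypothetical placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder values: use your own Speech resource key/region and the
# endpoint ID shown in Speech Studio after you deploy the custom model.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
speech_config.endpoint_id = "YOUR_CUSTOM_ENDPOINT_ID"  # routes recognition to the custom model

audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```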

articles/cognitive-services/Speech-Service/how-to-audio-content-creation.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ After you sign up for the Azure account, you need to create a Speech resource in

It takes a few moments to deploy your new Speech resource. After the deployment is complete, you can start using the Audio Content Creation tool.

> [!NOTE]
- > If you plan to use neural voices, make sure that you create your resource in [a region that supports neural voices](regions.md#prebuilt-neural-voices).
+ > If you plan to use neural voices, make sure that you create your resource in [a region that supports neural voices](regions.md#speech-service).

### Step 3: Sign in to Audio Content Creation with your Azure account and Speech resource
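
Related to the note above: one way to confirm which neural voices your resource's region supports is to list voices through the Speech SDK. A minimal sketch, assuming the Python SDK; the key and region are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder key/region: substitute your own Speech resource values.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)

# get_voices_async returns the voices available to this resource's region.
voices = synthesizer.get_voices_async().get().voices
neural = [v.short_name for v in voices if "Neural" in v.short_name]
print(f"{len(neural)} neural voices available, for example: {neural[:5]}")
```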

articles/cognitive-services/Speech-Service/how-to-custom-speech-create-project.md

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,10 @@ To create a Custom Speech project, follow these steps:

1. Sign in to the [Speech Studio](https://aka.ms/speechstudio/customspeech).
1. Select the subscription and Speech resource to work with.
+
+    > [!IMPORTANT]
+    > If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
+
1. Select **Custom speech** > **Create a new project**.
1. Follow the instructions provided by the wizard to create your project.

articles/cognitive-services/Speech-Service/how-to-custom-speech-test-and-train.md

Lines changed: 5 additions & 5 deletions
@@ -45,7 +45,7 @@ Training with plain text or structured text usually finishes within a few minutes

>
> Start with small sets of sample data that match the language, acoustics, and hardware where your model will be used. Small datasets of representative data can expose problems before you invest in gathering larger datasets for training. For sample Custom Speech data, see <a href="https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/sampledata/customspeech" target="_target">this GitHub repository</a>.

- If you will train a custom model with audio data, choose a Speech resource [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware available for training audio data. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After the model is trained, you can copy the model to another region as needed with the [CopyModelToSubscription](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CopyModelToSubscription) REST API.
+ If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information. In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. After the model is trained, you can copy the model to another region as needed with the [CopyModelToSubscription](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-0/operations/CopyModelToSubscription) REST API.

## Consider datasets by scenario
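
As a rough illustration of the CopyModelToSubscription call linked above, assuming the v3.0 Speech-to-text REST API: the sketch below uses Python `requests`, and the URL shape, body field, and all keys/IDs are assumptions to verify against the linked reference.

```python
import requests

# All values below are hypothetical placeholders.
source_region = "eastus"            # region where the model was trained
model_id = "YOUR_MODEL_ID"          # trained custom model in the source resource
source_key = "SOURCE_RESOURCE_KEY"  # key of the Speech resource that owns the model
target_key = "TARGET_RESOURCE_KEY"  # key of the Speech resource in the destination region

# Assumed v3.0 operation: POST /models/{id}/copyto with the target resource key.
url = f"https://{source_region}.api.cognitive.microsoft.com/speechtotext/v3.0/models/{model_id}/copyto"
response = requests.post(
    url,
    headers={"Ocp-Apim-Subscription-Key": source_key, "Content-Type": "application/json"},
    json={"targetSubscriptionKey": target_key},
)
response.raise_for_status()
print(response.json().get("self"))  # URI of the copied model in the target resource
```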

@@ -76,7 +76,7 @@ You can use audio + human-labeled transcript data for both [training](how-to-cus

For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text). Even if a base model does support training with audio data, the service might use only part of the audio. And it will still use all the transcripts.

> [!IMPORTANT]
- > If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.
+ > If a base model doesn't support customization with audio data, only the transcription text will be used for training. If you switch to a base model that supports customization with audio data, the training time may increase from several hours to several days. The change in training time would be most noticeable when you switch to a base model in a [region](regions.md#speech-service) without dedicated hardware for training. If the audio data is not required, you should remove it to decrease the training time.

Audio with human-labeled transcripts offers the greatest accuracy improvements if the audio comes from the target use case. Samples must cover the full scope of speech. For example, a call center for a retail store would get the most calls about swimwear and sunglasses during summer months. Ensure that your sample includes the full scope of speech that you want to detect.

@@ -91,7 +91,7 @@ Consider these details:

* The Speech service automatically uses the transcripts to improve the recognition of domain-specific words and phrases, as though they were added as related text.
* It can take several days for a training operation to finish. To improve the speed of training, be sure to create your Speech service subscription in a region that has dedicated hardware for training.

- A large training dataset is required to improve recognition. Generally, we recommend that you provide word-by-word transcriptions for 1 to 20 hours of audio. However, even as little as 30 minutes can help improve recognition results. Although creating human-labeled transcription can take time, improvements in recognition will only be as good as the data that you provide. You should only upload only high-quality transcripts.
+ A large training dataset is required to improve recognition. Generally, we recommend that you provide word-by-word transcriptions for 1 to 20 hours of audio. However, even as little as 30 minutes can help improve recognition results. Although creating human-labeled transcription can take time, improvements in recognition will only be as good as the data that you provide. You should upload only high-quality transcripts.

Audio files can have silence at the beginning and end of the recording. If possible, include at least a half-second of silence before and after speech in each sample file. Although audio with low recording volume or disruptive background noise is not helpful, it shouldn't limit or degrade your custom model. Always consider upgrading your microphones and signal processing hardware before gathering audio samples.

@@ -154,7 +154,7 @@ Here are key details about the supported Markdown format:

|`@list`|A list of items that can be referenced in an example sentence.|Maximum of 10 lists. Maximum of 4,000 items per list.|
|`speech:phoneticlexicon`|A list of phonetic pronunciations according to the [Universal Phone Set](customize-pronunciation.md). Pronunciation is adjusted for each instance where the word appears in a list or training sentence. For example, if you have a word that sounds like "cat" and you want to adjust the pronunciation to "k ae t", you would add `- cat/k ae t` to the `speech:phoneticlexicon` list.|Maximum of 15,000 entries. Maximum of 2 pronunciations per word.|
|`#ExampleSentences`|A pound symbol (`#`) delimits a section of example sentences. The section heading can only contain letters, digits, and underscores. Example sentences should reflect the range of speech that your model should expect. A training sentence can refer to items under a `@list` by using surrounding left and right curly braces (`{@list name}`). You can refer to multiple lists in the same training sentence, or none at all.|Maximum of 50,000 example sentences|
- |`//`|Comments follow a double slash (`//`) .|Not applicable|
+ |`//`|Comments follow a double slash (`//`).|Not applicable|

Here's an example structured text file:

@@ -261,7 +261,7 @@ Use <a href="http://sox.sourceforge.net" target="_blank" rel="noopener">SoX</a>

Not all base models support [training with audio data](language-support.md#speech-to-text). For a list of base models that support training with audio data, see [Language support](language-support.md#speech-to-text).

- Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.
+ Even if a base model supports training with audio data, the service might use only part of the audio. In [regions](regions.md#speech-service) with dedicated hardware available for training audio data, the Speech service will use up to 20 hours of your audio training data. In other regions, the Speech service uses up to 8 hours of your audio data.

## Next steps

articles/cognitive-services/Speech-Service/how-to-custom-speech-train-model.md

Lines changed: 8 additions & 5 deletions
@@ -18,14 +18,17 @@ zone_pivot_groups: speech-studio-cli-rest

In this article, you'll learn how to train a custom model to improve recognition accuracy from the Microsoft base model. The speech recognition accuracy and quality of a Custom Speech model will remain consistent, even when a new base model is released.

+ > [!NOTE]
+ > You pay to use Custom Speech models, but you are not charged for training a model.
+
Training a model is typically an iterative process. You will first select a base model that is the starting point for a new model. You train a model with [datasets](./how-to-custom-speech-test-and-train.md) that can include text and audio, and then you test. If the recognition quality or accuracy doesn't meet your requirements, you can create a new model with additional or modified training data, and then test again.

You can use a custom model for a limited time after it's trained. You must periodically recreate and adapt your custom model from the latest base model to take advantage of the improved accuracy and quality. For more information, see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md).

- > [!NOTE]
- > You pay to use Custom Speech models, but you are not charged for training a model.
-
- If you plan to train a model with audio data, use a Speech resource in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training. After a model is trained, you can [copy it to a Speech resource](#copy-a-model) in another region as needed.
+ > [!IMPORTANT]
+ > If you will train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. After a model is trained, you can [copy it to a Speech resource](#copy-a-model) in another region as needed.
+ >
+ > In regions with dedicated hardware for Custom Speech training, the Speech service will use up to 20 hours of your audio training data, and can process about 10 hours of data per day. In other regions, the Speech service uses up to 8 hours of your audio data, and can process about 1 hour of data per day. See footnotes in the [regions](regions.md#speech-service) table for more information.

## Create a model

@@ -209,7 +212,7 @@ The top-level `self` property in the response body is the model's URI. Use this

## Copy a model

- You can copy a model to another project that uses the same locale. For example, after a model is trained with audio data in a [region](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation) with dedicated hardware for training, you can copy it to a Speech resource in another region as needed.
+ You can copy a model to another project that uses the same locale. For example, after a model is trained with audio data in a [region](regions.md#speech-service) with dedicated hardware for training, you can copy it to a Speech resource in another region as needed.

::: zone pivot="speech-studio"
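
The hunk header above refers to the top-level `self` property in the REST response. As a rough sketch of where that URI comes from when you create (train) a model, assuming the v3.0 Speech-to-text REST API; the request body fields here are assumptions, so consult the API reference before relying on them.

```python
import requests

# Hypothetical placeholders throughout.
region = "eastus"
key = "YOUR_SPEECH_KEY"
base = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.0"

# Assumed CreateModel body: references to a base model and training datasets.
body = {
    "displayName": "My custom model",
    "locale": "en-US",
    "baseModel": {"self": f"{base}/models/base/BASE_MODEL_ID"},
    "datasets": [{"self": f"{base}/datasets/DATASET_ID"}],
}
response = requests.post(
    f"{base}/models",
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json=body,
)
response.raise_for_status()
model_uri = response.json()["self"]  # the model's URI, per the hunk header above
print(model_uri)
```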

articles/cognitive-services/Speech-Service/how-to-custom-voice-create-voice.md

Lines changed: 1 addition & 1 deletion
@@ -288,7 +288,7 @@ After you've updated the engine version for your voice model, you need to [redep

For more information, [learn more about the capabilities and limits of this feature, and the best practice to improve your model quality](/legal/cognitive-services/speech-service/custom-neural-voice/characteristics-and-limitations-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext).

> [!NOTE]
- > Custom Neural Voice training is only available in some regions. But you can easily copy a neural voice model from these regions to other regions. For more information, see the [regions for Custom Neural Voice](regions.md#text-to-speech).
+ > Custom Neural Voice training is only available in some regions. But you can easily copy a neural voice model from these regions to other regions. For more information, see the [regions for Custom Neural Voice](regions.md#speech-service).

## Next steps

articles/cognitive-services/Speech-Service/how-to-custom-voice.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ ms.author: eur

# Create a Project

- [Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md#custom-neural-voice) and [region](regions.md#custom-neural-voices).
+ [Custom Neural Voice](https://aka.ms/customvoice) is a set of online tools that you use to create a recognizable, one-of-a-kind voice for your brand. All it takes to get started are a handful of audio files and the associated transcriptions. See if Custom Neural Voice supports your [language](language-support.md#custom-neural-voice) and [region](regions.md#speech-service).

> [!IMPORTANT]
> Custom Neural Voice Pro can be used to create higher-quality models that are indistinguishable from human recordings. For access you must commit to using it in alignment with our responsible AI principles. Learn more about our [policy on limited access](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=%2fazure%2fcognitive-services%2fspeech-service%2fcontext%2fcontext) and [apply here](https://aka.ms/customneural).

articles/cognitive-services/Speech-Service/how-to-deploy-and-use-endpoint.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ The custom endpoint is functionally identical to the standard endpoint that's us

You can copy your voice model to another project for the same region or another region. For example, you can copy a neural voice model that was trained in one region, to a project for another region.

> [!NOTE]
- > Custom neural voice training is only available in these regions: East US, Southeast Asia, and UK South. But you can copy a neural voice model from those regions to other regions. For more information, see the [regions for custom neural voice](regions.md#text-to-speech).
+ > Custom neural voice training is only available in these regions: East US, Southeast Asia, and UK South. But you can copy a neural voice model from those regions to other regions. For more information, see the [regions for custom neural voice](regions.md#speech-service).

To copy your custom neural voice model to another project:

articles/cognitive-services/Speech-Service/how-to-develop-custom-commands-application.md

Lines changed: 1 addition & 1 deletion
@@ -625,7 +625,7 @@ Another way to customize Custom Commands responses is to select an output voice.

> ![Screenshot showing sample sentences and parameters.](media/custom-commands/select-custom-voice.png)

> [!NOTE]
- > For public voices, neural types are available only for specific regions. For more information, see [Speech service supported regions](./regions.md#prebuilt-neural-voices).
+ > For public voices, neural types are available only for specific regions. For more information, see [Speech service supported regions](./regions.md#speech-service).
>
> You can create custom voices on the **Custom Voice** project page. For more information, see [Get started with Custom Voice](./how-to-custom-voice.md).

articles/cognitive-services/Speech-Service/how-to-pronunciation-assessment.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ You can get pronunciation assessment scores for:

- Phonemes in SAPI or IPA format

> [!NOTE]
- > For information about availability of pronunciation assessment, see [supported languages](language-support.md#pronunciation-assessment) and [available regions](regions.md#speech-to-text-pronunciation-assessment-text-to-speech-and-translation).
+ > For information about availability of pronunciation assessment, see [supported languages](language-support.md#pronunciation-assessment) and [available regions](regions.md#speech-service).
>
> The syllable groups, IPA phonemes, and spoken phoneme features of pronunciation assessment are currently only available for the en-US locale.
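
To ground the note above, here's a minimal sketch of requesting pronunciation assessment scores, assuming the Python Speech SDK; the key, region, audio file, and reference text are placeholders.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder key/region; pronunciation assessment is available only in some regions.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="eastus")
audio_config = speechsdk.audio.AudioConfig(filename="sample.wav")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Score against a reference transcript, with phoneme-level granularity (en-US only).
pa_config = speechsdk.PronunciationAssessmentConfig(
    reference_text="Hello world",
    grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
    granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
)
pa_config.apply_to(recognizer)

result = recognizer.recognize_once()
pa_result = speechsdk.PronunciationAssessmentResult(result)
print(pa_result.accuracy_score, pa_result.fluency_score, pa_result.completeness_score)
```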
