articles/ai-services/speech-service/custom-speech-overview.md (6 additions, 4 deletions)
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 1/18/2024
+ms.date: 1/19/2024
 ms.author: eur
 ms.custom: contperf-fy21q2, references_regions
 ---
@@ -17,7 +17,9 @@ With Custom Speech, you can evaluate and improve the accuracy of speech recognit
 Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing various common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works well in most speech recognition scenarios.

-A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition based for the specific audio conditions of the application by providing audio data with reference transcriptions.
+A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary by providing text data to train the model. It can also be used to improve recognition for the specific audio conditions of the application by providing audio data with reference transcriptions.
+
+You can also train a model with structured text when the data follows a pattern, to specify custom pronunciations, and to customize display text formatting with custom inverse text normalization, custom rewrite, and custom profanity filtering.
## How does it work?
@@ -27,7 +29,7 @@ With Custom Speech, you can upload your own data, test and train a custom model,
 Here's more information about the sequence of steps shown in the previous diagram:

-1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. See footnotes in the [regions](regions.md#speech-service) table for more information.
+1. [Create a project](how-to-custom-speech-create-project.md) and choose a model. Use a <a href="https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices" title="Create a Speech resource" target="_blank">Speech resource</a> that you create in the Azure portal. If you train a custom model with audio data, choose a Speech resource region with dedicated hardware for training audio data. For more information, see footnotes in the [regions](regions.md#speech-service) table.
 1. [Upload test data](./how-to-custom-speech-upload-data.md). Upload test data to evaluate the speech to text offering for your applications, tools, and products.
 1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
 1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech to text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if more training is required.
@@ -40,7 +42,7 @@ Here's more information about the sequence of steps shown in the previous diagra
 ## Responsible AI

-An AI system includes not only the technology, but also the people who use it, the people who will be affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.
+An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.

 * [Transparency note and use cases](/legal/cognitive-services/speech-service/speech-to-text/transparency-note?context=/azure/ai-services/speech-service/context/context)
 * [Characteristics and limitations](/legal/cognitive-services/speech-service/speech-to-text/characteristics-and-limitations?context=/azure/ai-services/speech-service/context/context)
articles/ai-services/speech-service/how-to-custom-speech-continuous-integration-continuous-deployment.md (7 additions, 7 deletions)
@@ -6,7 +6,7 @@ author: nitinme
 manager: cmayomsft
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 05/08/2022
+ms.date: 1/19/2024
 ms.author: nitinme
 ---
@@ -24,22 +24,22 @@ Custom CI/CD solutions are possible, but for a robust, pre-built solution, use t
 The purpose of these workflows is to ensure that each Custom Speech model has better recognition accuracy than the previous build. If the updates to the testing and/or training data improve the accuracy, these workflows create a new Custom Speech endpoint.

-Git servers such as GitHub and Azure DevOps can run automated workflows when specific Git events happen, such as merges or pull requests. For example, a CI workflow can be triggered when updates to testing data are pushed to the *main* branch. Different Git Servers will have different tooling, but will allow scripting command-line interface (CLI) commands so that they can execute on a build server.
+Git servers such as GitHub and Azure DevOps can run automated workflows when specific Git events happen, such as merges or pull requests. For example, a CI workflow can be triggered when updates to testing data are pushed to the *main* branch. Different Git servers have different tooling, but allow scripting command-line interface (CLI) commands so that they can execute on a build server.

-Along the way, the workflows should name and store data, tests, test files, models, and endpoints such that they can be traced back to the commit or version they came from. It is also helpful to name these assets so that it is easy to see which were created after updating testing data versus training data.
+Along the way, the workflows should name and store data, tests, test files, models, and endpoints such that they can be traced back to the commit or version they came from. It's also helpful to name these assets so that it's easy to see which were created after updating testing data versus training data.

 ### CI workflow for testing data updates

-The principal purpose of the CI/CD workflows is to build a new model using the training data, and to test that model using the testing data to establish whether the [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) (WER) has improved compared to the previous best-performing model (the "benchmark model"). If the new model performs better, it becomes the new benchmark model against which future models are compared.
+The principal purpose of the CI/CD workflows is to build a new model using the training data, and to test that model using the testing data to establish whether the [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) (WER) improved compared to the previous best-performing model (the "benchmark model"). If the new model performs better, it becomes the new benchmark model against which future models are compared.

-The CI workflow for testing data updates should retest the current benchmark model with the updated test data to calculate the revised WER. This ensures that when the WER of a new model is compared to the WER of the benchmark, both models have been tested against the same test data and you're comparing like with like.
+The CI workflow for testing data updates should retest the current benchmark model with the updated test data to calculate the revised WER. This ensures that when the WER of a new model is compared to the WER of the benchmark, both models were tested against the same test data and you're comparing like with like.

 This workflow should trigger on updates to testing data and:

 - Test the benchmark model against the updated testing data.
 - Store the test output, which contains the WER of the benchmark model, using the updated data.
 - The WER from these tests will become the new benchmark WER that future models must beat.
-- The CD workflow does not execute for updates to testing data.
+- The CD workflow doesn't execute for updates to testing data.
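The retest step for testing-data updates can be sketched as a small helper. `run_test` here is a hypothetical test runner standing in for a call to the Speech service's evaluation API, and the model and test-set names are placeholders, not real identifiers:

```python
def rebaseline_on_testing_data_update(benchmark_model: str, test_set: str, run_test) -> float:
    """CI for testing-data updates: retest the current benchmark model against
    the updated test data and return its WER as the new benchmark to beat.
    No CD step runs here; only the benchmark WER is refreshed."""
    result = run_test(benchmark_model, test_set)  # hypothetical runner, e.g. returns {"wer": 0.14}
    return result["wer"]

# Usage with a stubbed test runner:
new_benchmark_wer = rebaseline_on_testing_data_update(
    "benchmark-model-v3", "test-set-updated", lambda model, data: {"wer": 0.14}
)
```

In a real pipeline, the returned value would be stored alongside the test output so the next training-data workflow compares against it.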
### CI workflow for training data updates
@@ -51,7 +51,7 @@ This workflow should trigger on updates to training data and:
 - Test the new model against the testing data.
 - Store the test output, which contains the WER.
 - Compare the WER from the new model to the WER from the benchmark model.
-- If the WER does not improve, stop the workflow.
+- If the WER doesn't improve, stop the workflow.
 - If the WER improves, execute the CD workflow to create a Custom Speech endpoint.
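The WER comparison that gates the CD workflow can be sketched in Python. The `wer` helper below is a minimal Levenshtein-based word error rate for illustration only, not the Speech service's implementation; in the workflow, both WER values would come from the stored test outputs:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    # (substitutions, insertions, deletions all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

def should_deploy(new_model_wer: float, benchmark_wer: float) -> bool:
    """CD runs only when the new model strictly beats the current benchmark."""
    return new_model_wer < benchmark_wer

# Example: one deleted word out of four reference words -> WER 0.25.
print(wer("play the next track", "play next track"))  # 0.25
```

A strict `<` comparison means a model that merely ties the benchmark doesn't trigger a new endpoint deployment.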
articles/ai-services/speech-service/how-to-custom-speech-create-project.md (6 additions, 6 deletions)
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 11/29/2022
+ms.date: 1/19/2024
 ms.author: eur
 zone_pivot_groups: speech-studio-cli-rest
 ---
@@ -30,7 +30,7 @@ To create a Custom Speech project, follow these steps:
 1. Select **Custom speech** > **Create a new project**.
 1. Follow the instructions provided by the wizard to create your project.

-Select the new project by name or select **Go to project**. You will see these menu items in the left panel: **Speech datasets**, **Train custom models**, **Test models**, and **Deploy models**.
+Select the new project by name or select **Go to project**. You'll see these menu items in the left panel: **Speech datasets**, **Train custom models**, **Test models**, and **Deploy models**.

 ::: zone-end
@@ -39,7 +39,7 @@ Select the new project by name or select **Go to project**. You will see these m
 To create a project, use the `spx csr project create` command. Construct the request parameters according to the following instructions:

 - Set the required `language` parameter. The locale of the project and the contained datasets should be the same. The locale can't be changed later. The Speech CLI `language` parameter corresponds to the `locale` property in the JSON request and response.
-- Set the required `name` parameter. This is the name that will be displayed in the Speech Studio. The Speech CLI `name` parameter corresponds to the `displayName` property in the JSON request and response.
+- Set the required `name` parameter. This is the name that is displayed in the Speech Studio. The Speech CLI `name` parameter corresponds to the `displayName` property in the JSON request and response.

 Here's an example Speech CLI command that creates a project:
@@ -88,7 +88,7 @@ spx help csr project
 To create a project, use the [Projects_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Projects_Create) operation of the [Speech to text REST API](rest-speech-to-text.md). Construct the request body according to the following instructions:

 - Set the required `locale` property. This should be the locale of the contained datasets. The locale can't be changed later.
-- Set the required `displayName` property. This is the project name that will be displayed in the Speech Studio.
+- Set the required `displayName` property. This is the project name that is displayed in the Speech Studio.

 Make an HTTP POST request using the URI as shown in the following [Projects_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Projects_Create) example. Replace `YourSubscriptionKey` with your Speech resource key, replace `YourServiceRegion` with your Speech resource region, and set the request body properties as previously described.
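A POST request of that shape can be sketched with the Python standard library. This is an illustrative construction, not the documented example: the URL pattern follows the v3.1 Speech to text REST API, `YourSubscriptionKey` and `YourServiceRegion` are the placeholders named above, and the `displayName`/`description` values are hypothetical:

```python
import json
import urllib.request

# Placeholders -- substitute your own Speech resource key and region.
subscription_key = "YourSubscriptionKey"
service_region = "YourServiceRegion"

# Projects_Create endpoint of the Speech to text REST API (v3.1).
url = f"https://{service_region}.api.cognitive.microsoft.com/speechtotext/v3.1/projects"

body = {
    "locale": "en-US",            # locale of the contained datasets; can't be changed later
    "displayName": "My Project",  # project name shown in Speech Studio (hypothetical)
    "description": "Custom Speech project",  # optional, hypothetical
}

request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; the response JSON echoes the
# project's properties, including a "self" URL that identifies the new project.
```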
@@ -137,13 +137,13 @@ There are a few approaches to using Custom Speech models:
 - A custom model augments the base model to include domain-specific vocabulary shared across all areas of the custom domain.
 - Multiple custom models can be used when the custom domain has multiple areas, each with a specific vocabulary.

-One recommended way to see if the base model will suffice is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) score. If the WER score is high, training a custom model to recognize the incorrectly identified words is recommended.
+One recommended way to see if the base model suffices is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) score. If the WER score is high, training a custom model to recognize the incorrectly identified words is recommended.

 Multiple models are recommended if the vocabulary varies across the domain areas. For instance, Olympic commentators report on various events, each associated with its own vernacular. Because each Olympic event vocabulary differs significantly from others, building a custom model specific to an event increases accuracy by limiting the utterance data relative to that particular event. As a result, the model doesn't need to sift through unrelated data to make a match. Regardless, training still requires a decent variety of training data. Include audio from various commentators who have different accents, gender, age, etcetera.

 ## Model stability and lifecycle

-A base model or custom model deployed to an endpoint using Custom Speech is fixed until you decide to update it. The speech recognition accuracy and quality will remain consistent, even when a new base model is released. This allows you to lock in the behavior of a specific model until you decide to use a newer model.
+A base model or custom model deployed to an endpoint using Custom Speech is fixed until you decide to update it. The speech recognition accuracy and quality remain consistent, even when a new base model is released. This allows you to lock in the behavior of a specific model until you decide to use a newer model.

 Whether you train your own model or use a snapshot of a base model, you can use the model for a limited time. For more information, see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md).