Commit f56a51d: custom speech overview

1 parent 2ddb067

2 files changed: +22 additions, -20 deletions


articles/ai-services/speech-service/custom-speech-overview.md

18 additions & 1 deletion
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 9/15/2024
+ms.date: 2/25/2025
 ms.author: eur
 ms.custom: references_regions
 ---
@@ -40,6 +40,23 @@ Here's more information about the sequence of steps shown in the previous diagram
 > [!TIP]
 > A hosted deployment endpoint isn't required to use custom speech with the [Batch transcription API](batch-transcription.md). You can conserve resources if the custom speech model is only used for batch transcription. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
+
+## Choose your model
+
+There are a few approaches to using custom speech models:
+
+- The base model provides accurate speech recognition out of the box for a range of [scenarios](#speech-scenarios). Base models are updated periodically to improve accuracy and quality. If you use base models, we recommend using the latest default base model. If a required customization capability is only available with an older model, you can choose an older base model.
+- A custom model augments the base model to include domain-specific vocabulary shared across all areas of the custom domain.
+- Multiple custom models can be used when the custom domain has multiple areas, each with a specific vocabulary.
+
+One recommended way to see if the base model suffices is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) score. If the WER score is high, we recommend training a custom model to recognize the incorrectly identified words.
+
+Multiple models are recommended if the vocabulary varies across the domain areas. For instance, Olympic commentators report on various events, each associated with its own vernacular. Because each Olympic event vocabulary differs significantly from the others, building a custom model specific to an event increases accuracy by limiting the utterance data relative to that particular event. As a result, the model doesn't need to sift through unrelated data to make a match. Regardless, training still requires a sufficient variety of training data. Include audio from various commentators who have different accents, genders, ages, and so on.
+
+## Model stability and lifecycle
+
+A base model or custom model deployed to an endpoint using custom speech is fixed until you decide to update it. The speech recognition accuracy and quality remain consistent, even when a new base model is released. This allows you to lock in the behavior of a specific model until you decide to use a newer one.
+
+Whether you train your own model or use a snapshot of a base model, you can use the model only for a limited time. For more information, see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md).
+
 ## Responsible AI
 
 An AI system includes not only the technology, but also the people who use it, the people who are affected by it, and the environment in which it's deployed. Read the transparency notes to learn about responsible AI use and deployment in your systems.
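The WER comparison that the added "Choose your model" section describes can be sketched with a small word-level Levenshtein computation. This is an illustrative stand-in for the evaluation the Speech service performs on uploaded transcripts, not the service's own code, and the function name and sample transcripts are hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as word-level Levenshtein distance against a human transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# Hypothetical example: compare a human transcript with a base model's output.
wer = word_error_rate(
    "the patient shows signs of tachycardia",
    "the patient shows signs of a tacky cardia",
)
print(f"WER: {wer:.0%}")  # prints "WER: 50%"
```

A high score from a comparison like this suggests the base model doesn't cover the domain vocabulary, which is the signal the section uses to recommend training a custom model.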

articles/ai-services/speech-service/how-to-custom-speech-create-project.md

4 additions & 19 deletions
@@ -48,15 +48,17 @@ In the [Azure AI Foundry portal](https://ai.azure.com), you can fine-tune some A
 
 1. Enter the language, name, and description for the fine-tuning job. Then select **Create**.
 
-Go to the Azure AI Speech documentation to continue fine-tuning your model.
+## Continue fine-tuning
+
+Go to the Azure AI Speech documentation to learn how to continue fine-tuning your custom speech model:
 
 * [Upload training and testing datasets](./how-to-custom-speech-upload-data.md)
 * [Train a model](how-to-custom-speech-train-model.md)
 * [Test model quantitatively](how-to-custom-speech-evaluate-data.md) and [test model qualitatively](./how-to-custom-speech-inspect-data.md)
 * [Deploy a model](how-to-custom-speech-deploy-model.md)
 
 ## View fine-tuned models
 
-You can access your custom speech models and deployments from the **Fine-tuning** page.
+After fine-tuning, you can access your custom speech models and deployments from the **Fine-tuning** page.
 
 1. Select **Fine-tuning** from the left pane.
 1. Select **AI Service fine-tuning**.
@@ -82,23 +84,6 @@ Select the new project by name or select **Go to project**. You'll see these men
 
 ::: zone-end
 
-## Choose your model
-
-There are a few approaches to using custom speech models:
-
-- The base model provides accurate speech recognition out of the box for a range of [scenarios](overview.md#speech-scenarios). Base models are updated periodically to improve accuracy and quality. We recommend that if you use base models, use the latest default base models. If a required customization capability is only available with an older model, then you can choose an older base model.
-- A custom model augments the base model to include domain-specific vocabulary shared across all areas of the custom domain.
-- Multiple custom models can be used when the custom domain has multiple areas, each with a specific vocabulary.
-
-One recommended way to see if the base model suffices is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) score. If the WER score is high, training a custom model to recognize the incorrectly identified words is recommended.
-
-Multiple models are recommended if the vocabulary varies across the domain areas. For instance, Olympic commentators report on various events, each associated with its own vernacular. Because each Olympic event vocabulary differs significantly from others, building a custom model specific to an event increases accuracy by limiting the utterance data relative to that particular event. As a result, the model doesn't need to sift through unrelated data to make a match. Regardless, training still requires a decent variety of training data. Include audio from various commentators who have different accents, gender, age, etcetera.
-
-## Model stability and lifecycle
-
-A base model or custom model deployed to an endpoint using custom speech is fixed until you decide to update it. The speech recognition accuracy and quality remain consistent, even when a new base model is released. This allows you to lock in the behavior of a specific model until you decide to use a newer model.
-
-Whether you train your own model or use a snapshot of a base model, you can use the model for a limited time. For more information, see [Model and endpoint lifecycle](./how-to-custom-speech-model-and-endpoint-lifecycle.md).
-
 ## Related content
 
 * [Training and testing datasets](./how-to-custom-speech-test-and-train.md)
