
Commit 9e865ca

Merge branch 'main' into release-postgres-flexible
2 parents: ea96260 + cf71a26

File tree

160 files changed: +2006 additions, -855 deletions


articles/ai-services/document-intelligence/quickstarts/try-document-intelligence-studio.md

Lines changed: 0 additions & 2 deletions
@@ -22,8 +22,6 @@ monikerRange: '>=doc-intel-3.0.0'

 [Document Intelligence Studio](https://formrecognizer.appliedai.azure.com/) is an online tool for visually exploring, understanding, and integrating features from the Document Intelligence service in your applications. You can get started by exploring the pretrained models with sample or your own documents. You can also create projects to build custom template models and reference the models in your applications using the [Python SDK](get-started-sdks-rest-api.md?view=doc-intel-3.0.0&preserve-view=true) and other quickstarts.

-> [!VIDEO https://www.microsoft.com/en-us/videoplayer/embed/RE56n49]
-
 ## Prerequisites for new users

 * An active [**Azure account**](https://azure.microsoft.com/free/cognitive-services/). If you don't have one, you can [**create a free account**](https://azure.microsoft.com/free/).

articles/ai-services/speech-service/faq-stt.yml

Lines changed: 1 addition & 1 deletion
@@ -164,7 +164,7 @@ sections:
   - question: |
       What is word error rate (WER), and how is it computed?
     answer: |
-      WER is the evaluation metric for speech recognition. WER is calculated as the total number of errors (insertions, deletions, and substitutions), divided by the total number of words in the reference transcription. For more information, see [Test model quantitatively](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate).
+      WER is the evaluation metric for speech recognition. WER is calculated as the total number of errors (insertions, deletions, and substitutions), divided by the total number of words in the reference transcription. For more information, see [Test model quantitatively](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer).

   - question: |
       How do I determine whether the results of an accuracy test are good?
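
To make the calculation in that answer concrete, here's a small worked example (hypothetical numbers, not part of the commit): a reference transcription of 10 words recognized with 1 insertion, 1 deletion, and 1 substitution gives

$$
WER = {{I+D+S}\over N} \times 100 = {{1+1+1}\over 10} \times 100 = 30\%
$$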

articles/ai-services/speech-service/how-to-custom-speech-continuous-integration-continuous-deployment.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ Along the way, the workflows should name and store data, tests, test files, mode

 ### CI workflow for testing data updates

-The principal purpose of the CI/CD workflows is to build a new model using the training data, and to test that model using the testing data to establish whether the [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate) (WER) has improved compared to the previous best-performing model (the "benchmark model"). If the new model performs better, it becomes the new benchmark model against which future models are compared.
+The principal purpose of the CI/CD workflows is to build a new model using the training data, and to test that model using the testing data to establish whether the [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) (WER) has improved compared to the previous best-performing model (the "benchmark model"). If the new model performs better, it becomes the new benchmark model against which future models are compared.

 The CI workflow for testing data updates should retest the current benchmark model with the updated test data to calculate the revised WER. This ensures that when the WER of a new model is compared to the WER of the benchmark, both models have been tested against the same test data and you're comparing like with like.

@@ -78,7 +78,7 @@ The [Speech DevOps template repo](https://github.com/Azure-Samples/Speech-Servic

 - Copy the template repository to your GitHub account, then create Azure resources and a [service principal](../../active-directory/develop/app-objects-and-service-principals.md#service-principal-object) for the GitHub Actions CI/CD workflows.
 - Walk through the "[dev inner loop](/dotnet/architecture/containerized-lifecycle/design-develop-containerized-apps/docker-apps-inner-loop-workflow)." Update training and testing data from a feature branch, test the changes with a temporary development model, and raise a pull request to propose and review the changes.
 - When training data is updated in a pull request to *main*, train models with the GitHub Actions CI workflow.
-- Perform automated accuracy testing to establish a model's [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate) (WER). Store the test results in Azure Blob.
+- Perform automated accuracy testing to establish a model's [Word Error Rate](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) (WER). Store the test results in Azure Blob.
 - Execute the CD workflow to create an endpoint when the WER improves.

 ## Next steps
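
As a sketch of the WER-gating decision this workflow describes (compare a candidate model's WER against the benchmark and promote only on improvement), a CI step could look roughly like the Python below. The file names, JSON shape, and function are hypothetical illustrations; they aren't taken from the template repo or this commit.

```python
import json
from pathlib import Path


def should_promote(candidate_results: Path, benchmark_results: Path) -> bool:
    """Return True when the candidate model's WER beats the current benchmark.

    Both files are assumed (hypothetically) to be JSON like {"wer": 12.3},
    written by an earlier accuracy-test step in the workflow.
    """
    candidate_wer = json.loads(candidate_results.read_text())["wer"]
    benchmark_wer = json.loads(benchmark_results.read_text())["wer"]
    # Lower WER means higher accuracy, so promote only on strict improvement.
    return candidate_wer < benchmark_wer


if __name__ == "__main__":
    if should_promote(Path("candidate.json"), Path("benchmark.json")):
        print("Candidate becomes the new benchmark; run the CD workflow to create an endpoint.")
    else:
        print("Benchmark model retained; skip deployment.")
```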

articles/ai-services/speech-service/how-to-custom-speech-create-project.md

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ There are a few approaches to using Custom Speech models:

 - A custom model augments the base model to include domain-specific vocabulary shared across all areas of the custom domain.
 - Multiple custom models can be used when the custom domain has multiple areas, each with a specific vocabulary.

-One recommended way to see if the base model will suffice is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate) score. If the WER score is high, training a custom model to recognize the incorrectly identified words is recommended.
+One recommended way to see if the base model will suffice is to analyze the transcription produced from the base model and compare it with a human-generated transcript for the same audio. You can compare the transcripts and obtain a [word error rate (WER)](how-to-custom-speech-evaluate-data.md#evaluate-word-error-rate-wer) score. If the WER score is high, training a custom model to recognize the incorrectly identified words is recommended.

 Multiple models are recommended if the vocabulary varies across the domain areas. For instance, Olympic commentators report on various events, each associated with its own vernacular. Because each Olympic event vocabulary differs significantly from others, building a custom model specific to an event increases accuracy by limiting the utterance data relative to that particular event. As a result, the model doesn't need to sift through unrelated data to make a match. Regardless, training still requires a decent variety of training data. Include audio from various commentators who have different accents, gender, age, etcetera.
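
To illustrate the base-model check described in the changed paragraph, here's a short Python sketch that compares a base-model transcript with a human-generated transcript and reports a WER score. It uses the open-source `jiwer` package purely as a stand-in scoring tool; neither the package nor the sample strings appear in this commit.

```python
# pip install jiwer  (third-party scoring package; an assumption, not part of this commit)
import jiwer

human_transcript = "the quick brown fox jumps over the lazy dog"
base_model_transcript = "the quick brown fox jump over a lazy dog"

# jiwer.wer returns the word error rate as a fraction between 0.0 and 1.0.
wer = jiwer.wer(human_transcript, base_model_transcript)
print(f"WER: {wer:.0%}")  # a high value suggests training a custom model
```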

articles/ai-services/speech-service/how-to-custom-speech-evaluate-data.md

Lines changed: 24 additions & 5 deletions
@@ -6,9 +6,8 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 11/29/2022
+ms.date: 12/20/2023
 ms.author: eur
-ms.custom: ignite-fall-2021
 zone_pivot_groups: speech-studio-cli-rest
 show_latex: true
 no-loc: [$$, '\times', '\over']
@@ -22,7 +21,7 @@ In this article, you learn how to quantitatively measure and improve the accurac

 ## Create a test

-You can test the accuracy of your custom model by creating a test. A test requires a collection of audio files and their corresponding transcriptions. You can compare a custom model's accuracy with a speech to text base model or another custom model. After you [get](#get-test-results) the test results, [evaluate](#evaluate-word-error-rate) the word error rate (WER) compared to speech recognition results.
+You can test the accuracy of your custom model by creating a test. A test requires a collection of audio files and their corresponding transcriptions. You can compare a custom model's accuracy with a speech to text base model or another custom model. After you [get](#get-test-results) the test results, [evaluate the word error rate (WER)](#evaluate-word-error-rate-wer) compared to speech recognition results.

 ::: zone pivot="speech-studio"

@@ -222,7 +221,7 @@ The top-level `self` property in the response body is the evaluation's URI. Use

 ## Get test results

-You should get the test results and [evaluate](#evaluate-word-error-rate) the word error rate (WER) compared to speech recognition results.
+You should get the test results and [evaluate](#evaluate-word-error-rate-wer) the word error rate (WER) compared to speech recognition results.

 ::: zone pivot="speech-studio"

@@ -386,7 +385,7 @@ You should receive a response body in the following format:
 ::: zone-end


-## Evaluate word error rate
+## Evaluate word error rate (WER)

 The industry standard for measuring model accuracy is [word error rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate). WER counts the number of incorrect words identified during recognition, and divides the sum by the total number of words provided in the human-labeled transcript (N).

@@ -423,6 +422,26 @@ How the errors are distributed is important. When many deletion errors are encou

 By analyzing individual files, you can determine what type of errors exist, and which errors are unique to a specific file. Understanding issues at the file level will help you target improvements.

+## Evaluate token error rate (TER)
+
+Besides [word error rate](#evaluate-word-error-rate-wer), you can also use the extended measurement of **Token Error Rate (TER)** to evaluate quality on the final end-to-end display format. In addition to the lexical format (`That will cost $900.` instead of `that will cost nine hundred dollars`), TER takes into account the display format aspects such as punctuation, capitalization, and ITN. Learn more about [Display output formatting with speech to text](display-text-format.md).
+
+TER counts the number of incorrect tokens identified during recognition, and divides the sum by the total number of tokens provided in the human-labeled transcript (N).
+
+$$
+TER = {{I+D+S}\over N} \times 100
+$$
+
+The formula of TER calculation is also very similar to WER. The only difference is that TER is calculated based on the token level instead of word level.
+* Insertion (I): Tokens that are incorrectly added in the hypothesis transcript
+* Deletion (D): Tokens that are undetected in the hypothesis transcript
+* Substitution (S): Tokens that were substituted between reference and hypothesis
+
+In a real-world case, you may analyze both WER and TER results to get the desired improvements.
+
+> [!NOTE]
+> To measure TER, you need to make sure the [audio + transcript testing data](./how-to-custom-speech-test-and-train.md#audio--human-labeled-transcript-data-for-training-or-testing) includes transcripts with display formatting such as punctuation, capitalization, and ITN.
+
 ## Example scenario outcomes

 Speech recognition scenarios vary by audio quality and language (vocabulary and speaking style). The following table examines four common scenarios:
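
As an editorial illustration of how TER differs from WER (a hypothetical example, not part of the commit or the article): suppose the human-labeled display transcript is `That will cost $900.` and the recognized display output is `that will cost $900`. The lexical words match, so WER is 0%. If the display tokens are counted so that `that` versus `That` and `$900` versus `$900.` are each a substitution (an assumption about how the tokenizer treats capitalization and trailing punctuation), then 2 of the 4 tokens are wrong:

$$
TER = {{0+0+2}\over 4} \times 100 = 50\%
$$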
