Commit 0570907

initial draft to distinguish batch vs real-time
1 parent 2a9a831 commit 0570907

9 files changed: +63 −49 lines

articles/cognitive-services/Speech-Service/batch-transcription-create.md

Lines changed: 6 additions & 3 deletions
@@ -113,7 +113,7 @@ To create a transcription, use the `spx batch transcription create` command. Con
 Here's an example Speech CLI command that creates a transcription job:
 
 ```azurecli-interactive
-spx batch transcription create --api-version v3.1 --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav
+spx batch transcription create --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav
 ```
 
 You should receive a response body in the following format:
@@ -223,12 +223,15 @@ curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-
 ::: zone pivot="speech-cli"
 
 ```azurecli-interactive
-spx batch transcription create --api-version v3.1 --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav --model "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"
+spx batch transcription create --name "My Transcription" --language "en-US" --content https://crbn.us/hello.wav;https://crbn.us/whatstheweatherlike.wav --model "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"
 ```
 
 ::: zone-end
 
-To use a Custom Speech model for batch transcription, you need the model's URI. You can retrieve the model location when you create or get a model. The top-level `self` property in the response body is the model's URI. For an example, see the JSON response example in the [Create a model](how-to-custom-speech-train-model.md?pivots=rest-api#create-a-model) guide. A [custom model deployment endpoint](how-to-custom-speech-deploy-model.md) isn't needed for the batch transcription service.
+To use a Custom Speech model for batch transcription, you need the model's URI. You can retrieve the model location when you create or get a model. The top-level `self` property in the response body is the model's URI. For an example, see the JSON response example in the [Create a model](how-to-custom-speech-train-model.md?pivots=rest-api#create-a-model) guide.
+
+> [!TIP]
+> A [hosted deployment endpoint](how-to-custom-speech-deploy-model.md) isn't required to use custom speech with the batch transcription service. You can conserve resources if the [custom speech model](how-to-custom-speech-train-model.md) is only used for batch transcription.
 
 Batch transcription requests for expired models will fail with a 4xx error. You'll want to set the `model` property to a base model or custom model that hasn't yet expired. Otherwise don't include the `model` property to always use the latest base model. For more information, see [Choose a model](how-to-custom-speech-create-project.md#choose-your-model) and [Custom Speech model lifecycle](how-to-custom-speech-model-and-endpoint-lifecycle.md).
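
For the REST pivot, the request-body counterpart of the CLI command above can be sketched as follows. This is a minimal, hedged sketch of a Transcriptions_Create call, not text from this commit: the `eastus` region and `YourSubscriptionKey` are placeholders, and the model URI reuses the base-model example above.

```azurecli-interactive
# Sketch only: region and subscription key are placeholders; the model URI is
# the base-model example used earlier in this article.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "displayName": "My Transcription",
  "locale": "en-US",
  "contentUrls": ["https://crbn.us/hello.wav", "https://crbn.us/whatstheweatherlike.wav"],
  "model": {"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"}
}' "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
```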
articles/cognitive-services/Speech-Service/custom-speech-overview.md

Lines changed: 6 additions & 5 deletions
@@ -15,15 +15,12 @@ ms.custom: contperf-fy21q2, references_regions
 
 # What is Custom Speech?
 
-With Custom Speech, you can evaluate and improve the Microsoft speech-to-text accuracy for your applications and products.
+With Custom Speech, you can evaluate and improve the accuracy of speech recognition for your applications and products. A custom speech model can be used for [real-time speech-to-text](speech-to-text.md), [speech translation](speech-translation.md), and [batch transcription](batch-transcription.md).
 
-Out of the box, speech to text utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works very well in most speech recognition scenarios.
+Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pre-trained with dialects and phonetics representing a variety of common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works very well in most speech recognition scenarios.
 
 A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition for the specific audio conditions of the application by providing audio data with reference transcriptions.
 
-> [!NOTE]
-> You pay to use Custom Speech models, but you are not charged for training a model. Usage includes hosting of your deployed custom endpoint in addition to using the endpoint for speech-to-text. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
-
 ## How does it work?
 
 With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint.
@@ -37,7 +34,11 @@ Here's more information about the sequence of steps shown in the previous diagra
 1. [Test recognition quality](how-to-custom-speech-inspect-data.md). Use the [Speech Studio](https://aka.ms/speechstudio/customspeech) to play back uploaded audio and inspect the speech recognition quality of your test data.
 1. [Test model quantitatively](how-to-custom-speech-evaluate-data.md). Evaluate and improve the accuracy of the speech-to-text model. The Speech service provides a quantitative word error rate (WER), which you can use to determine if additional training is required.
 1. [Train a model](how-to-custom-speech-train-model.md). Provide written transcripts and related text, along with the corresponding audio data. Testing a model before and after training is optional but recommended.
+   > [!NOTE]
+   > You pay for Custom Speech model usage and endpoint hosting, but you are not charged for training a model.
 1. [Deploy a model](how-to-custom-speech-deploy-model.md). Once you're satisfied with the test results, deploy the model to a custom endpoint. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
+   > [!TIP]
+   > A hosted deployment endpoint isn't required to use Custom Speech with the [Batch transcription API](batch-transcription.md). You can conserve resources if the custom speech model is only used for batch transcription. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
 
 ## Next steps
articles/cognitive-services/Speech-Service/how-to-custom-speech-deploy-model.md

Lines changed: 3 additions & 3 deletions
@@ -15,10 +15,10 @@ zone_pivot_groups: speech-studio-cli-rest
 
 # Deploy a Custom Speech model
 
-In this article, you'll learn how to deploy an endpoint for a Custom Speech model. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
+In this article, you'll learn how to deploy an endpoint for a Custom Speech model. With the exception of [batch transcription](batch-transcription.md), you must deploy a custom endpoint to use a Custom Speech model.
 
-> [!NOTE]
-> You pay to use Custom Speech models, but you are not charged for training a model. Usage includes hosting of your deployed custom endpoint in addition to using the endpoint for speech-to-text. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
+> [!TIP]
+> A hosted deployment endpoint isn't required to use Custom Speech with the [Batch transcription API](batch-transcription.md). You can conserve resources if the [custom speech model](how-to-custom-speech-train-model.md) is only used for batch transcription. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
 
 You can deploy an endpoint for a base or custom model, and then [update](#change-model-and-redeploy-endpoint) the endpoint later to use a better trained model.
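
Where a concrete request helps, here's a rough sketch of deploying via the v3.1 REST API. The Endpoints_Create body shape is an assumption, and the region, subscription key, and `YourModelId` are placeholders, not values from this commit.

```azurecli-interactive
# Sketch of a v3.1 Endpoints_Create request; region, key, and YourModelId are placeholders.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "displayName": "My Endpoint",
  "locale": "en-US",
  "model": {"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/YourModelId"}
}' "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/endpoints"
```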
articles/cognitive-services/Speech-Service/how-to-custom-speech-train-model.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ zone_pivot_groups: speech-studio-cli-rest
 In this article, you'll learn how to train a custom model to improve recognition accuracy from the Microsoft base model. The speech recognition accuracy and quality of a Custom Speech model will remain consistent, even when a new base model is released.
 
 > [!NOTE]
-> You pay to use Custom Speech models, but you are not charged for training a model. Usage includes hosting of your deployed custom endpoint in addition to using the endpoint for speech-to-text. For more information, see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
+> You pay for Custom Speech model usage and [endpoint hosting](how-to-custom-speech-deploy-model.md), but you are not charged for training a model.
 
 Training a model is typically an iterative process. You will first select a base model that is the starting point for a new model. You train a model with [datasets](./how-to-custom-speech-test-and-train.md) that can include text and audio, and then you test. If the recognition quality or accuracy doesn't meet your requirements, you can create a new model with additional or modified training data, and then test again.
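
To make one iteration concrete, here's a hedged sketch of a training request against the v3.1 Models_Create operation. The region, subscription key, and `YourDatasetId` are placeholders; the base model URI reuses the example GUID from the batch transcription article in this commit.

```azurecli-interactive
# Sketch of a v3.1 Models_Create request that trains a custom model from a base
# model plus one dataset. Region, key, and YourDatasetId are placeholders.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "displayName": "My Custom Model",
  "locale": "en-US",
  "baseModel": {"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/1aae1070-7972-47e9-a977-87e3b05c457d"},
  "datasets": [{"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/datasets/YourDatasetId"}]
}' "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models"
```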
articles/cognitive-services/Speech-Service/how-to-get-speech-session-id.md

Lines changed: 11 additions & 11 deletions
@@ -17,12 +17,12 @@ ms.author: alexeyo
 If you use [Speech-to-text](speech-to-text.md) and need to open a support case, you are often asked to provide a *Session ID* or *Transcription ID* of the problematic transcriptions to debug the issue. This article explains how to get these IDs.
 
 > [!NOTE]
-> * *Session ID* is used in [Online transcription](get-started-speech-to-text.md) and [Translation](speech-translation.md).
+> * *Session ID* is used in [real-time speech to text](get-started-speech-to-text.md) and [speech translation](speech-translation.md).
 > * *Transcription ID* is used in [Batch transcription](batch-transcription.md).
 
-## Getting Session ID for Online transcription and Translation. (Speech SDK and REST API for short audio).
+## Getting Session ID
 
-[Online transcription](get-started-speech-to-text.md) and [Translation](speech-translation.md) use either the [Speech SDK](speech-sdk.md) or the [REST API for short audio](rest-speech-to-text-short.md).
+[Real-time speech to text](get-started-speech-to-text.md) and [speech translation](speech-translation.md) use either the [Speech SDK](speech-sdk.md) or the [REST API for short audio](rest-speech-to-text-short.md).
 
 To get the Session ID when using the SDK, you need to:
@@ -80,20 +80,20 @@ https://westeurope.stt.speech.microsoft.com/speech/recognition/conversation/cogn
 
 ## Getting Transcription ID for Batch transcription
 
-[Batch transcription](batch-transcription.md) uses [Speech-to-text REST API](rest-speech-to-text.md).
+[Batch transcription API](batch-transcription.md) is a subset of the [Speech-to-text REST API](rest-speech-to-text.md).
 
-The required Transcription ID is the GUID value contained in the main `self` element of the Response body returned by requests, like [Transcriptions_Create](https://centralus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create).
+The required Transcription ID is the GUID value contained in the main `self` element of the Response body returned by requests, like [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create).
 
-The example below is the Response body of a `Create Transcription` request. GUID value `537216f8-0620-4a10-ae2d-00bdb423b36f` found in the first `self` element is the Transcription ID.
+The following is an example response body of a [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) request. The GUID value `537216f8-0620-4a10-ae2d-00bdb423b36f` found in the first `self` element is the Transcription ID.
 
 ```json
 {
-  "self": "https://japaneast.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/537216f8-0620-4a10-ae2d-00bdb423b36f",
+  "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/537216f8-0620-4a10-ae2d-00bdb423b36f",
   "model": {
-    "self": "https://japaneast.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/824bd685-2d45-424d-bb65-c3fe99e32927"
+    "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/824bd685-2d45-424d-bb65-c3fe99e32927"
   },
   "links": {
-    "files": "https://japaneast.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/537216f8-0620-4a10-ae2d-00bdb423b36f/files"
+    "files": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/537216f8-0620-4a10-ae2d-00bdb423b36f/files"
   },
   "properties": {
     "diarizationEnabled": false,
@@ -113,7 +113,7 @@ The example below is the Response body of a `Create Transcription` request. GUID
 }
 ```
 > [!NOTE]
-> Use the same technique to determine different IDs required for debugging issues related to [Custom Speech](custom-speech-overview.md), like uploading a dataset using a [Datasets_Create](https://centralus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Datasets_Create) request.
+> Use the same technique to determine different IDs required for debugging issues related to [Custom Speech](custom-speech-overview.md), like uploading a dataset using a [Datasets_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Datasets_Create) request.
 
 > [!NOTE]
-> You can also see all existing transcriptions and their Transcription IDs for a given Speech resource by using a [Transcriptions_Get](https://centralus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Get) request.
+> You can also see all existing transcriptions and their Transcription IDs for a given Speech resource by using a [Transcriptions_Get](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Get) request.
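
As a quick sketch of that lookup, listing all transcriptions for a Speech resource could look like the following; the region and subscription key are placeholders, and the list-operation shape is assumed from the v3.1 API.

```azurecli-interactive
# Sketch: list transcriptions for a Speech resource. Each item's `self` URI
# ends with its Transcription ID. Region and key are placeholders.
curl -v -X GET -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
```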

articles/cognitive-services/Speech-Service/language-identification.md

Lines changed: 3 additions & 3 deletions
@@ -497,7 +497,7 @@ speechRecognizer.recognizeOnceAsync((result: SpeechSDK.SpeechRecognitionResult)
 ### Using Speech-to-text custom models
 
 > [!NOTE]
-> Language detection with custom models can be used in OnLine transcription only. Batch transcription supports language detection for base models.
+> Language detection with custom models can only be used with real-time speech to text and speech translation. Batch transcription only supports language detection for base models.
 
 ::: zone pivot="programming-language-csharp"
 This sample shows how to use language detection with a custom endpoint. If the detected language is `en-US`, then the default model is used. If the detected language is `fr-FR`, then the custom model endpoint is used. For more information, see [Deploy a Custom Speech model](how-to-custom-speech-deploy-model.md).
@@ -592,9 +592,9 @@ var autoDetectSourceLanguageConfig = SpeechSDK.AutoDetectSourceLanguageConfig.fr
 To identify languages in [Batch transcription](batch-transcription.md), you need to use the `languageIdentification` property in the body of your [transcription REST request](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create). The example in this section shows the usage of the `languageIdentification` property with four candidate languages.
 
 > [!WARNING]
-> Batch transcription supports language identification for base models only. If both language identification and custom model usage are specified in the transcription request, the service will automatically fall back to the base models for the specified candidate languages. This may result in unexpected recognition results.
+> Batch transcription only supports language identification for base models. If both language identification and a custom model are specified in the transcription request, the service will fall back to use the base models for the specified candidate languages. This may result in unexpected recognition results.
 >
-> If your scenario requires both language identification and custom models, use [OnLine transcription](#using-speech-to-text-custom-models).
+> If your speech to text scenario requires both language identification and custom models, use [real-time speech to text](#using-speech-to-text-custom-models) instead of batch transcription.
 
 ```json
 {
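
The JSON example is truncated at this hunk boundary. For reference, here's a hedged sketch of such a request with four candidate languages; the locales, content URL, region, and key are illustrative assumptions, not values from this commit.

```azurecli-interactive
# Sketch of a v3.1 Transcriptions_Create request using languageIdentification
# with four candidate locales. All values are illustrative placeholders.
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "displayName": "My Transcription",
  "locale": "en-US",
  "contentUrls": ["https://crbn.us/hello.wav"],
  "properties": {
    "languageIdentification": {
      "candidateLocales": ["en-US", "de-DE", "es-ES", "fr-FR"]
    }
  }
}' "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
```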

articles/cognitive-services/Speech-Service/speech-services-private-link.md

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ Speech-to-text has two REST APIs. Each API serves a different purpose, uses diff
 
 The Speech-to-text REST APIs are:
 - [Speech-to-text REST API](rest-speech-to-text.md), which is used for [Batch transcription](batch-transcription.md) and [Custom Speech](custom-speech-overview.md).
-- [Speech-to-text REST API for short audio](rest-speech-to-text-short.md), which is used for online transcription
+- [Speech-to-text REST API for short audio](rest-speech-to-text-short.md), which is used for real-time speech to text.
 
 Usage of the Speech-to-text REST API for short audio and the Text-to-speech REST API in the private endpoint scenario is the same. It's equivalent to the [Speech SDK case](#speech-resource-with-a-custom-domain-name-and-a-private-endpoint-usage-with-the-speech-sdk) described later in this article.