Skip to content

Commit 8f6fe7e

Browse files
authored
Merge pull request #271910 from MicrosoftDocs/main
4/11/2024 AM Publish
2 parents c7a3605 + f819011 commit 8f6fe7e

File tree

70 files changed

+773
-1096
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

70 files changed

+773
-1096
lines changed

articles/ai-services/.openpublishing.redirection.ai-services-from-cog.json

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2042,9 +2042,14 @@
20422042
},
20432043
{
20442044
"source_path_from_root": "/articles/cognitive-services/speech-service/migrate-v2-to-v3.md",
2045-
"redirect_url": "/azure/ai-services/speech-service/migrate-v2-to-v3",
2045+
"redirect_url": "/azure/ai-services/speech-service/migrate-v3-1-to-v3-2",
20462046
"redirect_document_id": true
20472047
},
2048+
{
2049+
"source_path_from_root": "/articles/ai-services/speech-service/migrate-v2-to-v3.md",
2050+
"redirect_url": "/azure/ai-services/speech-service/migrate-v3-1-to-v3-2",
2051+
"redirect_document_id": false
2052+
},
20482053
{
20492054
"source_path_from_root": "/articles/cognitive-services/speech-service/migrate-v3-0-to-v3-1.md",
20502055
"redirect_url": "/azure/ai-services/speech-service/migrate-v3-0-to-v3-1",

articles/ai-services/speech-service/batch-transcription-create.md

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: how-to
10-
ms.date: 1/26/2024
10+
ms.date: 4/15/2024
1111
zone_pivot_groups: speech-cli-rest
1212
ms.custom: devx-track-csharp
1313
# Customer intent: As a user who implements audio transcription, I want create transcriptions in bulk so that I don't have to submit audio content repeatedly.
@@ -29,7 +29,7 @@ With batch transcriptions, you submit [audio data](batch-transcription-audio-dat
2929

3030
::: zone pivot="rest-api"
3131

32-
To create a transcription, use the [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation of the [Speech to text REST API](rest-speech-to-text.md#transcriptions). Construct the request body according to the following instructions:
32+
To create a transcription, use the [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) operation of the [Speech to text REST API](rest-speech-to-text.md#batch-transcription). Construct the request body according to the following instructions:
3333

3434
- You must set either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).
3535
- Set the required `locale` property. This value should match the expected locale of the audio data to transcribe. You can't change the locale later.
@@ -40,7 +40,7 @@ To create a transcription, use the [Transcriptions_Create](https://eastus.dev.co
4040

4141
For more information, see [Request configuration options](#request-configuration-options).
4242

43-
Make an HTTP POST request that uses the URI as shown in the following [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) example.
43+
Make an HTTP POST request that uses the URI as shown in the following [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) example.
4444

4545
- Replace `YourSubscriptionKey` with your Speech resource key.
4646
- Replace `YourServiceRegion` with your Speech resource region.
@@ -102,11 +102,11 @@ You should receive a response body in the following format:
102102
}
103103
```
104104

105-
The top-level `self` property in the response body is the transcription's URI. Use this URI to [get](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Get) details such as the URI of the transcriptions and transcription report files. You also use this URI to [update](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Update) or [delete](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Delete) a transcription.
105+
The top-level `self` property in the response body is the transcription's URI. Use this URI to [get](/rest/api/speechtotext/transcriptions/get) details such as the URI of the transcriptions and transcription report files. You also use this URI to [update](/rest/api/speechtotext/transcriptions/update) or [delete](/rest/api/speechtotext/transcriptions/delete) a transcription.
106106

107-
You can query the status of your transcriptions with the [Transcriptions_Get](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Get) operation.
107+
You can query the status of your transcriptions with the [Transcriptions_Get](/rest/api/speechtotext/transcriptions/get) operation.
108108

109-
Call [Transcriptions_Delete](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Delete)
109+
Call [Transcriptions_Delete](/rest/api/speechtotext/transcriptions/delete)
110110
regularly from the service, after you retrieve the results. Alternatively, set the `timeToLive` property to ensure the eventual deletion of the results.
111111

112112
::: zone-end
@@ -169,15 +169,15 @@ spx help batch transcription
169169

170170
::: zone pivot="rest-api"
171171

172-
Here are some property options that you can use to configure a transcription when you call the [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation.
172+
Here are some property options that you can use to configure a transcription when you call the [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create) operation.
173173

174174
| Property | Description |
175175
|----------|-------------|
176176
|`channels`|An array of channel numbers to process. Channels `0` and `1` are transcribed by default. |
177177
|`contentContainerUrl`| You can submit individual audio files or a whole storage container.<br/><br/>You must specify the audio data location by using either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property isn't returned in the response.|
178178
|`contentUrls`| You can submit individual audio files or a whole storage container.<br/><br/>You must specify the audio data location by using either the `contentContainerUrl` or `contentUrls` property. For more information, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property isn't returned in the response.|
179179
|`destinationContainerUrl`|The result can be stored in an Azure container. If you don't specify a container, the Speech service stores the results in a container managed by Microsoft. When the transcription job is deleted, the transcription result data is also deleted. For more information, such as the supported security scenarios, see [Specify a destination container URL](#specify-a-destination-container-url).|
180-
|`diarization`|Indicates that the Speech service should attempt diarization analysis on the input, which is expected to be a mono channel that contains multiple voices. The feature isn't available with stereo recordings.<br/><br/>Diarization is the process of separating speakers in audio data. The batch pipeline can recognize and separate multiple speakers on mono channel recordings.<br/><br/>Specify the minimum and maximum number of people who might be speaking. You must also set the `diarizationEnabled` property to `true`. The [transcription file](batch-transcription-get.md#transcription-result-file) contains a `speaker` entry for each transcribed phrase.<br/><br/>You need to use this property when you expect three or more speakers. For two speakers, setting `diarizationEnabled` property to `true` is enough. For an example of the property usage, see [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create).<br/><br/>The maximum number of speakers for diarization must be less than 36 and more or equal to the `minSpeakers` property. For an example, see [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create).<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.<br/><br/>**Note**: This property is only available with Speech to text REST API version 3.1 and later. If you set this property with any previous version, such as version 3.0, it's ignored and only two speakers are identified.|
180+
|`diarization`|Indicates that the Speech service should attempt diarization analysis on the input, which is expected to be a mono channel that contains multiple voices. The feature isn't available with stereo recordings.<br/><br/>Diarization is the process of separating speakers in audio data. The batch pipeline can recognize and separate multiple speakers on mono channel recordings.<br/><br/>Specify the minimum and maximum number of people who might be speaking. You must also set the `diarizationEnabled` property to `true`. The [transcription file](batch-transcription-get.md#transcription-result-file) contains a `speaker` entry for each transcribed phrase.<br/><br/>You need to use this property when you expect three or more speakers. For two speakers, setting `diarizationEnabled` property to `true` is enough. For an example of the property usage, see [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create).<br/><br/>The maximum number of speakers for diarization must be less than 36 and more or equal to the `minSpeakers` property. For an example, see [Transcriptions_Create](/rest/api/speechtotext/transcriptions/create).<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.<br/><br/>**Note**: This property is only available with Speech to text REST API version 3.1 and later. If you set this property with any previous version, such as version 3.0, it's ignored and only two speakers are identified.|
181181
|`diarizationEnabled`|Specifies that the Speech service should attempt diarization analysis on the input, which is expected to be a mono channel that contains two voices. The default value is `false`.<br/><br/>For three or more voices you also need to use property `diarization`. Use only with Speech to text REST API version 3.1 and later.<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.|
182182
|`displayName`|The name of the batch transcription. Choose a name that you can refer to later. The display name doesn't have to be unique.<br/><br/>This property is required.|
183183
|`displayFormWordLevelTimestampsEnabled`|Specifies whether to include word-level timestamps on the display form of the transcription results. The results are returned in the `displayWords` property of the transcription file. The default value is `false`.<br/><br/>**Note**: This property is only available with Speech to text REST API version 3.1 and later.|
@@ -187,7 +187,7 @@ Here are some property options that you can use to configure a transcription whe
187187
|`model`|You can set the `model` property to use a specific base model or [custom speech](how-to-custom-speech-train-model.md) model. If you don't specify the `model`, the default base model for the locale is used. For more information, see [Use a custom model](#use-a-custom-model) and [Use a Whisper model](#use-a-whisper-model).|
188188
|`profanityFilterMode`|Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. |
189189
|`punctuationMode`|Specifies how to handle punctuation in recognition results. Accepted values are `None` to disable punctuation, `Dictated` to imply explicit (spoken) punctuation, `Automatic` to let the decoder deal with punctuation, or `DictatedAndAutomatic` to use dictated and automatic punctuation. The default value is `DictatedAndAutomatic`.<br/><br/>This property isn't applicable for Whisper models.|
190-
|`timeToLive`|A duration after the transcription job is created, when the transcription results will be automatically deleted. The value is an ISO 8601 encoded duration. For example, specify `PT12H` for 12 hours. As an alternative, you can call [Transcriptions_Delete](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Delete) regularly after you retrieve the transcription results.|
190+
|`timeToLive`|A duration after the transcription job is created, when the transcription results will be automatically deleted. The value is an ISO 8601 encoded duration. For example, specify `PT12H` for 12 hours. As an alternative, you can call [Transcriptions_Delete](/rest/api/speechtotext/transcriptions/delete) regularly after you retrieve the transcription results.|
191191
|`wordLevelTimestampsEnabled`|Specifies if word level timestamps should be included in the output. The default value is `false`.<br/><br/>This property isn't applicable for Whisper models. Whisper is a display-only model, so the lexical field isn't populated in the transcription.|
192192

193193

@@ -260,7 +260,7 @@ To use a Whisper model for batch transcription, you need to set the `model` prop
260260
Whisper models by batch transcription are supported in the Australia East, Central US, East US, North Central US, South Central US, Southeast Asia, and West Europe regions.
261261

262262
::: zone pivot="rest-api"
263-
You can make a [Models_ListBaseModels](https://westus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-2-preview2/operations/Models_ListBaseModels) request to get available base models for all locales.
263+
You can make a [Models_ListBaseModels](/rest/api/speechtotext/models/list-base-models) request to get available base models for all locales.
264264

265265
Make an HTTP GET request as shown in the following example for the `eastus` region. Replace `YourSubscriptionKey` with your Speech resource key. Replace `eastus` if you're using a different region.
266266

articles/ai-services/speech-service/batch-transcription-get.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ To get transcription results, first check the [status](#get-transcription-status
2020

2121
::: zone pivot="rest-api"
2222

23-
To get the status of the transcription job, call the [Transcriptions_Get](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Get) operation of the [Speech to text REST API](rest-speech-to-text.md).
23+
To get the status of the transcription job, call the [Transcriptions_Get](/rest/api/speechtotext/transcriptions/get) operation of the [Speech to text REST API](rest-speech-to-text.md).
2424

2525
> [!IMPORTANT]
2626
> Batch transcription jobs are scheduled on a best-effort basis. At peak hours, it may take up to 30 minutes or longer for a transcription job to start processing. Most of the time during the execution the transcription status will be `Running`. This is because the job is assigned the `Running` status the moment it moves to the batch transcription backend system. When the base model is used, this assignment happens almost immediately; it's slightly slower for custom models. Thus, the amount of time a transcription job spends in the `Running` state doesn't correspond to the actual transcription time but also includes waiting time in the internal queues.
@@ -134,7 +134,7 @@ spx help batch transcription
134134

135135
::: zone pivot="rest-api"
136136

137-
The [Transcriptions_ListFiles](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_ListFiles) operation returns a list of result files for a transcription. A [transcription report](#transcription-report-file) file is provided for each submitted batch transcription job. In addition, one [transcription](#transcription-result-file) file (the end result) is provided for each successfully transcribed audio file.
137+
The [Transcriptions_ListFiles](/rest/api/speechtotext/transcriptions/list-files) operation returns a list of result files for a transcription. A [transcription report](#transcription-report-file) file is provided for each submitted batch transcription job. In addition, one [transcription](#transcription-result-file) file (the end result) is provided for each successfully transcribed audio file.
138138

139139
Make an HTTP GET request using the "files" URI from the previous response body. Replace `YourTranscriptionId` with your transcription ID, replace `YourSubscriptionKey` with your Speech resource key, and replace `YourServiceRegion` with your Speech resource region.
140140

articles/ai-services/speech-service/batch-transcription.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ ms.custom: devx-track-csharp
1717
> [!IMPORTANT]
1818
> New pricing is in effect for batch transcription via [Speech to text REST API v3.2](./migrate-v3-1-to-v3-2.md). For more information, see the [pricing guide](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services).
1919
20-
Batch transcription is used to transcribe a large amount of audio data in storage. Both the [Speech to text REST API](rest-speech-to-text.md#transcriptions) and [Speech CLI](spx-basics.md) support batch transcription.
20+
Batch transcription is used to transcribe a large amount of audio data in storage. Both the [Speech to text REST API](rest-speech-to-text.md#batch-transcription) and [Speech CLI](spx-basics.md) support batch transcription.
2121

2222
You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.
2323

0 commit comments

Comments
 (0)