articles/cognitive-services/Speech-Service/batch-transcription-create.md
24 additions, 4 deletions
@@ -27,7 +27,10 @@ To create a transcription, use the [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation
 - You must set either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).
 - Set the required `locale` property. This should match the expected locale of the audio data to transcribe. The locale can't be changed later.
 - Set the required `displayName` property. Choose a transcription name that you can refer to later. The transcription name doesn't have to be unique and can be changed later.
-- Optionally you can set the `wordLevelTimestampsEnabled` property to `true` to enable word-level timestamps in the transcription results. The default value is `false`. For more information, see [request configuration options](#request-configuration-options).
+- Optionally you can set the `wordLevelTimestampsEnabled` property to `true` to enable word-level timestamps in the transcription results. The default value is `false`.
+- Optionally you can set the `languageIdentification` property. Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification).<br/><br/>If you set the `languageIdentification` property, then you must also set `languageIdentification.candidateLocales` with candidate locales.
+
+For more information, see [request configuration options](#request-configuration-options).

 Make an HTTP POST request using the URI as shown in the following [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) example. Replace `YourSubscriptionKey` with your Speech resource key, replace `YourServiceRegion` with your Speech resource region, and set the request body properties as previously described.
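For orientation, here's a minimal curl sketch of that create request with the new `languageIdentification` block included. The endpoint shape and placeholder names follow the article's conventions; the container URL and display name are hypothetical, and this is a sketch rather than the article's exact sample.

```bash
# Sketch: create a batch transcription with language identification (v3.1).
# YourSubscriptionKey, YourServiceRegion, and the container URL are placeholders.
curl -v -X POST "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions" \
  -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" \
  -H "Content-Type: application/json" \
  -d '{
    "contentContainerUrl": "https://YourStorageAccount.blob.core.windows.net/YourContainer?SAS_TOKEN",
    "locale": "en-US",
    "displayName": "My Transcription",
    "properties": {
      "wordLevelTimestampsEnabled": true,
      "languageIdentification": {
        "candidateLocales": ["en-US", "de-DE", "es-ES"]
      }
    }
  }'
```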
@@ -65,7 +73,14 @@ You should receive a response body in the following format:
       1
     ],
     "punctuationMode": "DictatedAndAutomatic",
-    "profanityFilterMode": "Masked"
+    "profanityFilterMode": "Masked",
+    "languageIdentification": {
+      "candidateLocales": [
+        "en-US",
+        "de-DE",
+        "es-ES"
+      ]
+    }
   },
   "lastActionDateTime": "2022-10-21T14:18:06Z",
   "status": "NotStarted",
@@ -147,11 +162,16 @@ Here are some property options that you can use to configure a transcription when you call the [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation.
 | Property | Description |
 |----------|-------------|
 |`channels`|An array of channel numbers to process. Channels `0` and `1` are transcribed by default. |
-|`contentContainerUrl`| You can submit individual audio files, or a whole storage container.You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
-|`contentUrls`| You can submit individual audio files, or a whole storage container.You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
+|`contentContainerUrl`| You can submit individual audio files, or a whole storage container.<br/><br/>You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
+|`contentUrls`| You can submit individual audio files, or a whole storage container.<br/><br/>You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
 |`destinationContainerUrl`|The result can be stored in an Azure container. If you don't specify a container, the Speech service stores the results in a container managed by Microsoft. When the transcription job is deleted, the transcription result data is also deleted. For more information, see [Destination container URL](#destination-container-url).|
 |`diarization`|Indicates that diarization analysis should be carried out on the input, which is expected to be a mono channel that contains multiple voices. Specify the minimum and maximum number of people who might be speaking. You must also set the `diarizationEnabled` property to `true`. The [transcription file](batch-transcription-get.md#transcription-result-file) will contain a `speaker` entry for each transcribed phrase.<br/><br/>You need to use this property when you expect three or more speakers. For two speakers, setting the `diarizationEnabled` property to `true` is enough. See an example of the property usage in the [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation description.<br/><br/>Diarization is the process of separating speakers in audio data. The batch pipeline can recognize and separate multiple speakers on mono channel recordings. The feature isn't available with stereo recordings.<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.<br/><br/>**Note**: This property is only available with Speech-to-text REST API version 3.1.|
 |`diarizationEnabled`|Specifies that diarization analysis should be carried out on the input, which is expected to be a mono channel that contains two voices. The default value is `false`.<br/><br/>For three or more voices you also need to use property `diarization` (only with Speech-to-text REST API version 3.1).<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.|
+|`displayName`|The name of the batch transcription. Choose a name that you can refer to later. The display name doesn't have to be unique.<br/><br/>This property is required.|
+|`languageIdentification`|Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification).<br/><br/>If you set the `languageIdentification` property, then you must also set `languageIdentification.candidateLocales` with candidate locales.|
+|`languageIdentification.candidateLocales`|The candidate locales for language identification, such as `"properties": { "languageIdentification": { "candidateLocales": ["en-US", "de-DE", "es-ES"]}}`. A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported.|
+|`languageIdentification.speechModelMapping`|An optional mapping of locales to speech model entities. For example: `"properties": { "languageIdentification": { "candidateLocales": ["en-US", "de-DE", "es-ES"], "speechModelMapping": { "en-US": { "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/ae8d1643-53e4-4554-be4c-221dcfb471c5" } } } }`. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales; values are entities for models of the respective locale.|
+|`locale`|The locale of the batch transcription. This should match the expected locale of the audio data to transcribe. The locale can't be changed later.<br/><br/>This property is required.|
 |`model`|You can set the `model` property to use a specific base model or [Custom Speech](how-to-custom-speech-train-model.md) model. If you don't specify the `model`, the default base model for the locale is used. For more information, see [Using custom models](#using-custom-models).|
 |`profanityFilterMode`|Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. |
 |`punctuationMode`|Specifies how to handle punctuation in recognition results. Accepted values are `None` to disable punctuation, `Dictated` to imply explicit (spoken) punctuation, `Automatic` to let the decoder deal with punctuation, or `DictatedAndAutomatic` to use dictated and automatic punctuation. The default value is `DictatedAndAutomatic`.|
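To show how several of these options fit together, here is an illustrative request body combining diarization with language identification. It is a sketch only: the container URL, display name, and speaker counts are placeholders, and the exact shapes of the `diarization` and `speechModelMapping` objects are assumptions based on the Transcriptions_Create reference rather than anything shown in this diff.

```json
{
  "contentContainerUrl": "https://YourStorageAccount.blob.core.windows.net/YourContainer?SAS_TOKEN",
  "locale": "en-US",
  "displayName": "My Transcription",
  "properties": {
    "diarizationEnabled": true,
    "diarization": {
      "speakers": { "minCount": 1, "maxCount": 5 }
    },
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "languageIdentification": {
      "candidateLocales": ["en-US", "de-DE", "es-ES"],
      "speechModelMapping": {
        "en-US": { "self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/ae8d1643-53e4-4554-be4c-221dcfb471c5" }
      }
    }
  }
}
```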
articles/cognitive-services/Speech-Service/batch-transcription-get.md
9 additions, 3 deletions
@@ -51,7 +51,14 @@ You should receive a response body in the following format:
     ],
     "punctuationMode": "DictatedAndAutomatic",
     "profanityFilterMode": "Masked",
-    "duration": "PT3S"
+    "duration": "PT3S",
+    "languageIdentification": {
+      "candidateLocales": [
+        "en-US",
+        "de-DE",
+        "es-ES"
+      ]
+    }
   },
   "lastActionDateTime": "2022-09-10T18:39:09Z",
   "status": "Succeeded",
@@ -123,7 +130,6 @@ spx help batch transcription

 ::: zone pivot="rest-api"

-
 The [Transcriptions_ListFiles](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_ListFiles) operation returns a list of result files for a transcription. A [transcription report](#transcription-report-file) file is provided for each submitted batch transcription job. In addition, one [transcription](#transcription-result-file) file (the end result) is provided for each successfully transcribed audio file.

 Make an HTTP GET request using the "files" URI from the previous response body. Replace `YourTranscriptionId` with your transcription ID, replace `YourSubscriptionKey` with your Speech resource key, and replace `YourServiceRegion` with your Speech resource region.
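A minimal curl sketch of that GET request, assuming the standard v3.1 `files` route; the placeholder names follow the article's conventions, and the "files" URI returned in your own response body is the authoritative one to use.

```bash
# Sketch: list result files for an existing batch transcription (v3.1).
# YourTranscriptionId, YourSubscriptionKey, and YourServiceRegion are placeholders.
curl -v -X GET "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/YourTranscriptionId/files" \
  -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey"
```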
@@ -347,7 +353,7 @@ Depending in part on the request parameters set when you created the transcription
 |`durationInTicks`|The audio duration in ticks (1 tick is 100 nanoseconds).|
 |`itn`|The inverse text normalized (ITN) form of the recognized text. Abbreviations such as "Doctor Smith" to "Dr Smith", phone numbers, and other transformations are applied.|
 |`lexical`|The actual words recognized.|
-|`locale`|The locale identified from the input the audio. The `languageIdentification` request property must be set to `true`, otherwise this property is not present.<br/><br/>**Note**: This property is only available with speech-to-text REST API version 3.1.|
+|`locale`|The locale identified from the input audio. The `languageIdentification` request property must be set, otherwise this property is not present.<br/><br/>**Note**: This property is only available with speech-to-text REST API version 3.1.|
 |`maskedITN`|The ITN form with profanity masking applied.|
 |`nBest`|A list of possible transcriptions for the current phrase with confidences.|
 |`offset`|The offset in audio of this phrase. The value is an ISO 8601 encoded duration.|
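To make these result properties easier to picture, here is an illustrative, trimmed `recognizedPhrases` entry from a transcription result file. All values and the surrounding fields are hypothetical examples, not output from a real job.

```json
{
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT0.07S",
      "duration": "PT1.59S",
      "offsetInTicks": 700000,
      "durationInTicks": 15900000,
      "locale": "en-US",
      "nBest": [
        {
          "confidence": 0.93,
          "lexical": "call doctor smith at five five five one two three four",
          "itn": "call dr smith at 555-1234",
          "maskedITN": "call dr smith at 555-1234",
          "display": "Call Dr. Smith at 555-1234."
        }
      ]
    }
  ]
}
```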