
Commit b29d82c

lid properties
1 parent e381acf commit b29d82c

File tree: 2 files changed, 33 additions and 7 deletions


articles/cognitive-services/Speech-Service/batch-transcription-create.md

Lines changed: 24 additions & 4 deletions
@@ -27,7 +27,10 @@ To create a transcription, use the [Transcriptions_Create](https://eastus.dev.co
 - You must set either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).
 - Set the required `locale` property. This should match the expected locale of the audio data to transcribe. The locale can't be changed later.
 - Set the required `displayName` property. Choose a transcription name that you can refer to later. The transcription name doesn't have to be unique and can be changed later.
-- Optionally you can set the `wordLevelTimestampsEnabled` property to `true` to enable word-level timestamps in the transcription results. The default value is `false`. For more information, see [request configuration options](#request-configuration-options).
+- Optionally you can set the `wordLevelTimestampsEnabled` property to `true` to enable word-level timestamps in the transcription results. The default value is `false`.
+- Optionally you can set the `languageIdentification` property. Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification).<br/><br/>If you set the `languageIdentification` property, then you must also set `languageIdentification.candidateLocales` with candidate locales.
+
+For more information, see [request configuration options](#request-configuration-options).
 
 Make an HTTP POST request using the URI as shown in the following [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) example. Replace `YourSubscriptionKey` with your Speech resource key, replace `YourServiceRegion` with your Speech resource region, and set the request body properties as previously described.

@@ -42,6 +45,11 @@ curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-
   "model": null,
   "properties": {
     "wordLevelTimestampsEnabled": true,
+    "languageIdentification": {
+      "candidateLocales": [
+        "en-US", "de-DE", "es-ES"
+      ],
+    }
   },
 }' "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
 ```
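The hunk above shows only the changed fragment of the request body. For context, a complete `Transcriptions_Create` request that exercises the new property might look like the following sketch; the `contentUrls`, `locale`, and `displayName` values are illustrative placeholders, not part of this commit.

```
# Sketch of a full Transcriptions_Create request (placeholder audio URL, locale, and name).
curl -v -X POST -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey" -H "Content-Type: application/json" -d '{
  "contentUrls": [
    "https://contoso.blob.core.windows.net/audio/sample1.wav"
  ],
  "locale": "en-US",
  "displayName": "My Transcription",
  "model": null,
  "properties": {
    "wordLevelTimestampsEnabled": true,
    "languageIdentification": {
      "candidateLocales": ["en-US", "de-DE", "es-ES"]
    }
  }
}' "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
```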
@@ -65,7 +73,14 @@ You should receive a response body in the following format:
         1
       ],
       "punctuationMode": "DictatedAndAutomatic",
-      "profanityFilterMode": "Masked"
+      "profanityFilterMode": "Masked",
+      "languageIdentification": {
+        "candidateLocales": [
+          "en-US",
+          "de-DE",
+          "es-ES"
+        ]
+      }
     },
     "lastActionDateTime": "2022-10-21T14:18:06Z",
     "status": "NotStarted",
@@ -147,11 +162,16 @@ Here are some property options that you can use to configure a transcription whe
 | Property | Description |
 |----------|-------------|
 |`channels`|An array of channel numbers to process. Channels `0` and `1` are transcribed by default. |
-|`contentContainerUrl`| You can submit individual audio files, or a whole storage container. You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
-|`contentUrls`| You can submit individual audio files, or a whole storage container. You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
+|`contentContainerUrl`| You can submit individual audio files, or a whole storage container.<br/><br/>You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information about Azure blob storage for batch transcription, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
+|`contentUrls`| You can submit individual audio files, or a whole storage container.<br/><br/>You must specify the audio data location via either the `contentContainerUrl` or `contentUrls` property. For more information, see [Locate audio files for batch transcription](batch-transcription-audio-data.md).<br/><br/>This property won't be returned in the response.|
 |`destinationContainerUrl`|The result can be stored in an Azure container. If you don't specify a container, the Speech service stores the results in a container managed by Microsoft. When the transcription job is deleted, the transcription result data is also deleted. For more information, see [Destination container URL](#destination-container-url).|
 |`diarization`|Indicates that diarization analysis should be carried out on the input, which is expected to be a mono channel that contains multiple voices. Specify the minimum and maximum number of people who might be speaking. You must also set the `diarizationEnabled` property to `true`. The [transcription file](batch-transcription-get.md#transcription-result-file) will contain a `speaker` entry for each transcribed phrase.<br/><br/>You need to use this property when you expect three or more speakers. For two speakers setting `diarizationEnabled` property to `true` is enough. See an example of the property usage in [Transcriptions_Create](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_Create) operation description.<br/><br/>Diarization is the process of separating speakers in audio data. The batch pipeline can recognize and separate multiple speakers on mono channel recordings. The feature isn't available with stereo recordings.<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.<br/><br/>**Note**: This property is only available with Speech-to-text REST API version 3.1.|
 |`diarizationEnabled`|Specifies that diarization analysis should be carried out on the input, which is expected to be a mono channel that contains two voices. The default value is `false`.<br/><br/>For three or more voices you also need to use property `diarization` (only with Speech-to-text REST API version 3.1).<br/><br/>When this property is selected, source audio length can't exceed 240 minutes per file.|
+|`displayName`|The name of the batch transcription. Choose a name that you can refer to later. The display name doesn't have to be unique.<br/><br/>This property is required.|
+|`languageIdentification`|Language identification is used to identify languages spoken in audio when compared against a list of [supported languages](language-support.md?tabs=language-identification).<br/><br/>If you set the `properties.languageIdentification` property, then you must also set `properties.languageIdentification.candidateLocales` with candidate locales.|
+|`languageIdentification.candidateLocales`|The candidate locales for language identification such as `"properties": { "languageIdentification": { "candidateLocales": ["en-US", "de-DE", "es-ES"]}}`. A minimum of 2 and a maximum of 10 candidate locales, including the main locale for the transcription, is supported.|
+|`languageIdentification.speechModelMapping`|An optional mapping of locales to speech model entities. For example: `"properties": { "candidateLocales": ["en-US", "de-DE", "es-ES"], "speechModelMapping": { "en-US": {"self": "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/models/base/ae8d1643-53e4-4554-be4c-221dcfb471c5" }}}`. If no model is given for a locale, the default base model is used. Keys must be locales contained in the candidate locales, values are entities for models of the respective locale.|
+|`locale`|The locale of the batch transcription. This should match the expected locale of the audio data to transcribe. The locale can't be changed later.<br/><br/>This property is required.|
 |`model`|You can set the `model` property to use a specific base model or [Custom Speech](how-to-custom-speech-train-model.md) model. If you don't specify the `model`, the default base model for the locale is used. For more information, see [Using custom models](#using-custom-models).|
 |`profanityFilterMode`|Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. |
 |`punctuationMode`|Specifies how to handle punctuation in recognition results. Accepted values are `None` to disable punctuation, `Dictated` to imply explicit (spoken) punctuation, `Automatic` to let the decoder deal with punctuation, or `DictatedAndAutomatic` to use dictated and automatic punctuation. The default value is `DictatedAndAutomatic`.|
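Taken together, a `properties` payload that combines language identification with a few of the other options from this table might look like the following sketch; the specific values are illustrative choices, not defaults.

```
{
  "properties": {
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked",
    "languageIdentification": {
      "candidateLocales": ["en-US", "de-DE", "es-ES"]
    }
  }
}
```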

articles/cognitive-services/Speech-Service/batch-transcription-get.md

Lines changed: 9 additions & 3 deletions
@@ -51,7 +51,14 @@ You should receive a response body in the following format:
       ],
       "punctuationMode": "DictatedAndAutomatic",
       "profanityFilterMode": "Masked",
-      "duration": "PT3S"
+      "duration": "PT3S",
+      "languageIdentification": {
+        "candidateLocales": [
+          "en-US",
+          "de-DE",
+          "es-ES"
+        ]
+      }
     },
     "lastActionDateTime": "2022-09-10T18:39:09Z",
     "status": "Succeeded",
@@ -123,7 +130,6 @@ spx help batch transcription
 
 ::: zone pivot="rest-api"
 
-
 The [Transcriptions_ListFiles](https://eastus.dev.cognitive.microsoft.com/docs/services/speech-to-text-api-v3-1/operations/Transcriptions_ListFiles) operation returns a list of result files for a transcription. A [transcription report](#transcription-report-file) file is provided for each submitted batch transcription job. In addition, one [transcription](#transcription-result-file) file (the end result) is provided for each successfully transcribed audio file.
 
 Make an HTTP GET request using the "files" URI from the previous response body. Replace `YourTranscriptionId` with your transcription ID, replace `YourSubscriptionKey` with your Speech resource key, and replace `YourServiceRegion` with your Speech resource region.
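A sketch of that request follows; it assumes the "files" URI returned in the response body takes the standard `/transcriptions/{id}/files` form.

```
curl -v -X GET "https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions/YourTranscriptionId/files" -H "Ocp-Apim-Subscription-Key: YourSubscriptionKey"
```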
@@ -347,7 +353,7 @@ Depending in part on the request parameters set when you created the transcripti
 |`durationInTicks`|The audio duration in ticks (1 tick is 100 nanoseconds).|
 |`itn`|The inverse text normalized (ITN) form of the recognized text. Abbreviations such as "Doctor Smith" to "Dr Smith", phone numbers, and other transformations are applied.|
 |`lexical`|The actual words recognized.|
-|`locale`|The locale identified from the input the audio. The `languageIdentification` request property must be set to `true`, otherwise this property is not present.<br/><br/>**Note**: This property is only available with speech-to-text REST API version 3.1.|
+|`locale`|The locale identified from the input audio. The `languageIdentification` request property must be set, otherwise this property is not present.<br/><br/>**Note**: This property is only available with speech-to-text REST API version 3.1.|
 |`maskedITN`|The ITN form with profanity masking applied.|
 |`nBest`|A list of possible transcriptions for the current phrase with confidences.|
 |`offset`|The offset in audio of this phrase. The value is an ISO 8601 encoded duration.|
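To make these properties concrete, here is a minimal sketch of one recognized phrase as it might appear in a transcription result file; the values are illustrative, and the exact set of fields depends on the request options (for example, `locale` appears only when `languageIdentification` was set).

```
{
  "recognitionStatus": "Success",
  "channel": 0,
  "offset": "PT0.07S",
  "duration": "PT1.3S",
  "durationInTicks": 13000000,
  "locale": "en-US",
  "nBest": [
    {
      "confidence": 0.95,
      "lexical": "hello world",
      "itn": "hello world",
      "maskedITN": "hello world",
      "display": "Hello world."
    }
  ]
}
```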
