Construct the form definition according to the following instructions:
 
-- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. The supported locales that you can specify are: da-DK, de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fi-FI, fr-FR, he-IL, hi-IN, id-ID, it-IT, ja-JP, ko-KR, pl-PL, pt-BR, pt-PT, sv-SE, and zh-CN. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
+- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
 
 For more information about `locales` and other properties for the fast transcription API, see the [request configuration options](#request-configuration-options) section later in this guide.
 
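Reviewer note: the `locales` instruction in the hunk above can be illustrated with a minimal Python sketch that serializes the form definition. The helper name `build_definition` is hypothetical, and the assumption that the definition is sent as a JSON string is not confirmed by this diff.

```python
import json


def build_definition(locales=None):
    """Return a JSON string for the form definition (hypothetical helper).

    `locales` is optional; when omitted, the property is left out entirely
    so the service can fall back to its own language handling.
    """
    definition = {}
    if locales:
        definition["locales"] = list(locales)
    return json.dumps(definition)


# Matches the example in the changed line: the locale is set to en-US.
print(build_definition(["en-US"]))  # {"locales": ["en-US"]}
```

Leaving `locales` out of the payload, rather than sending an empty list, mirrors the "optional (but recommended)" wording in the changed line.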
@@ -94,7 +94,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 1600,
@@ -118,7 +118,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 2240,
@@ -152,7 +152,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 3280,
@@ -181,7 +181,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 5040,
@@ -200,7 +200,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 5440,
@@ -229,7 +229,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 // More transcription results...
 // Redacted for brevity
@@ -265,7 +265,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.9314801
+"confidence": 0.92022026
 },
 {
 "offsetMilliseconds": 181960,
@@ -284,7 +284,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.9314801
+"confidence": 0.92022026
 }
 ]
 }
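Reviewer note: since these hunks only refresh per-phrase `confidence` values, a short Python sketch may help readers see how the fields are typically consumed. The top-level `phrases` key and the helper name `low_confidence_phrases` are assumptions, because the enclosing array name is not visible in this diff. The sample values are the updated ones from the hunks above.

```python
def low_confidence_phrases(response, threshold):
    """Return (offsetMilliseconds, confidence) pairs below `threshold`.

    Assumes each phrase object carries `offsetMilliseconds` and `confidence`,
    as shown in the sample response, under a top-level "phrases" array
    (the array name is an assumption, not taken from this diff).
    """
    flagged = []
    for phrase in response.get("phrases", []):
        if phrase["confidence"] < threshold:
            flagged.append((phrase["offsetMilliseconds"], phrase["confidence"]))
    return flagged


# Trimmed sample built from two phrases in the updated response.
sample = {
    "phrases": [
        {"offsetMilliseconds": 1600, "locale": "en-US", "confidence": 0.93554276},
        {"offsetMilliseconds": 181960, "locale": "en-US", "confidence": 0.92022026},
    ]
}

print(low_confidence_phrases(sample, threshold=0.93))  # [(181960, 0.92022026)]
```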
@@ -1730,11 +1730,11 @@ Here are some property options to configure a transcription when you call the [T
 |----------|-------------|----------------------|
 |`channels`| The list of zero-based indices of the channels to be transcribed separately. Up to two channels are supported unless diarization is enabled. By default, the fast transcription API merges all input channels into a single channel and then performs the transcription. If this isn't desirable, channels can be transcribed independently without merging.<br/><br/>If you want to transcribe the channels from a stereo audio file separately, you need to specify `[0,1]`, `[0]`, or `[1]`. Otherwise, stereo audio is merged to mono and only a single channel is transcribed.<br/><br/>If the audio is stereo and diarization is enabled, then you can't set the `channels` property to `[0,1]`. The Speech service doesn't support diarization of multiple channels.<br/><br/>For mono audio, the `channels` property is ignored, and the audio is always transcribed as a single channel.| Optional |
 |`diarization`| The diarization configuration. Diarization is the process of recognizing and separating multiple speakers in one audio channel. For example, specify `"diarization": {"maxSpeakers": 2, "enabled": true}`. Then the transcription file contains `speaker` entries (such as `"speaker": 0` or `"speaker": 1`) for each transcribed phrase. | Optional |
-| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API. For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
+| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API (API version 2024-11-15 or later). For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
 |`profanityFilterMode`|Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. | Optional |
 
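Reviewer note: the constraints described in the table rows above (at most two channels, no multi-channel input when diarization is enabled, four accepted `profanityFilterMode` values) can be checked client-side before sending a request. The following Python sketch is illustrative only: the helper `build_options` and its parameter names are hypothetical, and the service itself remains the authority on what it accepts.

```python
ALLOWED_PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_options(locales=None, channels=None, max_speakers=None,
                  profanity_filter_mode="Masked"):
    """Build a request-options dict, enforcing the rules stated in the table.

    Hypothetical helper: diarization is treated as enabled whenever
    `max_speakers` is given.
    """
    if profanity_filter_mode not in ALLOWED_PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode!r}")

    diarization_enabled = max_speakers is not None
    if channels is not None:
        if len(channels) > 2:
            raise ValueError("up to two channels are supported")
        if diarization_enabled and len(channels) > 1:
            raise ValueError("diarization of multiple channels isn't supported")

    options = {"profanityFilterMode": profanity_filter_mode}
    if locales:
        options["locales"] = list(locales)
    if channels is not None:
        options["channels"] = list(channels)
    if diarization_enabled:
        options["diarization"] = {"maxSpeakers": max_speakers, "enabled": True}
    return options


print(build_options(locales=["en-US"], channels=[0], max_speakers=2))
```

Failing fast on an invalid combination such as `channels=[0, 1]` with diarization gives the caller a clearer error than a rejected request would.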
 ## Related content
 
 - [Fast transcription REST API reference](/rest/api/speechtotext/transcriptions/transcribe)
 - [Speech to text supported languages](./language-support.md?tabs=stt)