Commit 14e727e

Update fast-transcription-create.md
1 parent b160336 commit 14e727e

1 file changed: +11 -11 lines changed
articles/ai-services/speech-service/fast-transcription-create.md

Lines changed: 11 additions & 11 deletions
@@ -62,7 +62,7 @@ curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtot
 
 Construct the form definition according to the following instructions:
 
-- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. The supported locales that you can specify are: da-DK, de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fi-FI, fr-FR, he-IL, hi-IN, id-ID, it-IT, ja-JP, ko-KR, pl-PL, pt-BR, pt-PT, sv-SE, and zh-CN. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
+- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
 
 For more information about `locales` and other properties for the fast transcription API, see the [request configuration options](#request-configuration-options) section later in this guide.
 
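As a reviewer's aid, the `locales` instruction in the hunk above can be sketched in Python. This is a minimal sketch, not the documented client: the helper name `build_definition` is hypothetical, and it assumes the form definition is sent as a JSON string whose `locales` value is a list of locale codes.

```python
import json


def build_definition(locales=None):
    """Return a JSON string for the form definition (hypothetical helper).

    `locales` is optional but recommended; when it is omitted, the key is
    left out so the service can fall back to its own language handling.
    """
    definition = {}
    if locales:
        # Assumption: locales are passed as a list such as ["en-US"].
        definition["locales"] = list(locales)
    return json.dumps(definition)


if __name__ == "__main__":
    print(build_definition(["en-US"]))
```

Leaving the key out when no locale is known, rather than sending an empty list, mirrors the text's description of `locales` as optional.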
@@ -94,7 +94,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 1600,
@@ -118,7 +118,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 2240,
@@ -152,7 +152,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 3280,
@@ -181,7 +181,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 5040,
@@ -200,7 +200,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 {
 "offsetMilliseconds": 5440,
@@ -229,7 +229,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.93616915
+"confidence": 0.93554276
 },
 // More transcription results...
 // Redacted for brevity
@@ -265,7 +265,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.9314801
+"confidence": 0.92022026
 },
 {
 "offsetMilliseconds": 181960,
@@ -284,7 +284,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 }
 ],
 "locale": "en-US",
-"confidence": 0.9314801
+"confidence": 0.92022026
 }
 ]
 }
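The hunks above only change per-phrase `confidence` values in the sample response. A short Python sketch of how a reader might summarize those values follows. The function name `summarize_confidence` is hypothetical, and the `phrases` key for the containing array is an assumption, since the array's name is not visible in this diff.

```python
from statistics import mean


def summarize_confidence(response, phrases_key="phrases"):
    """Summarize per-phrase confidence scores from a parsed response dict.

    Assumption: each phrase object carries a numeric `confidence` field,
    as in the sample response, under the array named by `phrases_key`.
    """
    phrases = response.get(phrases_key, [])
    scores = [p["confidence"] for p in phrases if "confidence" in p]
    if not scores:
        return None
    return {"count": len(scores), "min": min(scores), "mean": mean(scores)}


if __name__ == "__main__":
    sample = {
        "phrases": [
            {"offsetMilliseconds": 1600, "locale": "en-US", "confidence": 0.93554276},
            {"offsetMilliseconds": 181960, "locale": "en-US", "confidence": 0.92022026},
        ]
    }
    print(summarize_confidence(sample))
```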
@@ -1730,11 +1730,11 @@ Here are some property options to configure a transcription when you call the [T
 |----------|-------------|----------------------|
 | `channels` | The list of zero-based indices of the channels to be transcribed separately. Up to two channels are supported unless diarization is enabled. By default, the fast transcription API merges all input channels into a single channel and then performs the transcription. If this isn't desirable, channels can be transcribed independently without merging.<br/><br/>If you want to transcribe the channels from a stereo audio file separately, you need to specify `[0,1]`, `[0]`, or `[1]`. Otherwise, stereo audio is merged to mono and only a single channel is transcribed.<br/><br/>If the audio is stereo and diarization is enabled, then you can't set the `channels` property to `[0,1]`. The Speech service doesn't support diarization of multiple channels.<br/><br/>For mono audio, the `channels` property is ignored, and the audio is always transcribed as a single channel.| Optional |
 | `diarization` | The diarization configuration. Diarization is the process of recognizing and separating multiple speakers in one audio channel. For example, specify `"diarization": {"maxSpeakers": 2, "enabled": true}`. Then the transcription file contains `speaker` entries (such as `"speaker": 0` or `"speaker": 1`) for each transcribed phrase. | Optional |
-| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API. For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
+| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API (API version 2024-11-15 or later). For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
 | `profanityFilterMode` |Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. | Optional |
 
 ## Related content
 
 - [Fast transcription REST API reference](/rest/api/speechtotext/transcriptions/transcribe)
 - [Speech to text supported languages](./language-support.md?tabs=stt)
-- [Batch transcription](./batch-transcription.md)
+- [Batch transcription](./batch-transcription.md)
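
The constraints in the options table above (`channels`, `diarization`, `profanityFilterMode`) can be sketched as a small client-side check. This is an illustrative sketch only: `build_options` is a hypothetical helper, the checks are a reading of the table text rather than the service's actual validation, and the stereo rule is simplified to "no more than one channel when diarization is enabled".

```python
# Accepted values as listed in the options table; the default is "Masked".
ALLOWED_PROFANITY_MODES = ("None", "Masked", "Removed", "Tags")


def build_options(locales=None, channels=None, max_speakers=None,
                  profanity_filter_mode="Masked"):
    """Build a request-options dict, rejecting combinations the table rules out."""
    if profanity_filter_mode not in ALLOWED_PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode!r}")

    options = {"profanityFilterMode": profanity_filter_mode}
    if locales:
        options["locales"] = list(locales)

    if channels is not None:
        channels = list(channels)
        # Channel indices are zero-based, and at most two channels are supported.
        if len(channels) > 2 or any(c < 0 for c in channels):
            raise ValueError("channels must hold at most two zero-based indices")
        # Diarization of multiple channels is not supported.
        if max_speakers is not None and len(channels) > 1:
            raise ValueError("channels cannot be [0,1] when diarization is enabled")
        options["channels"] = channels

    if max_speakers is not None:
        options["diarization"] = {"maxSpeakers": max_speakers, "enabled": True}
    return options


if __name__ == "__main__":
    print(build_options(locales=["en-US"], channels=[0, 1]))
    print(build_options(locales=["en-US"], max_speakers=2))
```

Failing early on the client keeps an invalid `channels` and `diarization` combination from reaching the service at all.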