Update fast-transcription-create.md

ArcherAZ · web-flow · commit 14e727e96052 · 2025-06-06T09:35:57.000-07:00
diff --git a/articles/ai-services/speech-service/fast-transcription-create.md b/articles/ai-services/speech-service/fast-transcription-create.md
@@ -62,7 +62,7 @@ curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtot
 
 Construct the form definition according to the following instructions:
 
-- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. The supported locales that you can specify are: da-DK, de-DE, en-GB, en-IN, en-US, es-ES, es-MX, fi-FI, fr-FR, he-IL, hi-IN, id-ID, it-IT, ja-JP, ko-KR, pl-PL, pt-BR, pt-PT, sv-SE, and zh-CN. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
+- Set the optional (but recommended) `locales` property that should match the expected locale of the audio data to transcribe. In this example, the locale is set to `en-US`. For more information about the supported locales, see [speech to text supported languages](./language-support.md?tabs=stt).
 
 For more information about `locales` and other properties for the fast transcription API, see the [request configuration options](#request-configuration-options) section later in this guide.
 
@@ -94,7 +94,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		{
 			"offsetMilliseconds": 1600,
@@ -118,7 +118,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		{
 			"offsetMilliseconds": 2240,
@@ -152,7 +152,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		{
 			"offsetMilliseconds": 3280,
@@ -181,7 +181,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		{
 			"offsetMilliseconds": 5040,
@@ -200,7 +200,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		{
 			"offsetMilliseconds": 5440,
@@ -229,7 +229,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.93616915
+			"confidence": 0.93554276
 		},
 		// More transcription results...
 	    // Redacted for brevity
@@ -265,7 +265,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.9314801
+			"confidence": 0.92022026
 		},
 		{
 			"offsetMilliseconds": 181960,
@@ -284,7 +284,7 @@ The response includes `durationMilliseconds`, `offsetMilliseconds`, and more. Th
 				}
 			],
 			"locale": "en-US",
-			"confidence": 0.9314801
+			"confidence": 0.92022026
 		}
 	]
 }
@@ -1730,11 +1730,11 @@ Here are some property options to configure a transcription when you call the [T
 |----------|-------------|----------------------|
 | `channels` | The list of zero-based indices of the channels to be transcribed separately. Up to two channels are supported unless diarization is enabled. By default, the fast transcription API merges all input channels into a single channel and then performs the transcription. If this isn't desirable, channels can be transcribed independently without merging.<br/><br/>If you want to transcribe the channels from a stereo audio file separately, you need to specify `[0,1]`, `[0]`, or `[1]`. Otherwise, stereo audio is merged to mono and only a single channel is transcribed.<br/><br/>If the audio is stereo and diarization is enabled, then you can't set the `channels` property to `[0,1]`. The Speech service doesn't support diarization of multiple channels.<br/><br/>For mono audio, the `channels` property is ignored, and the audio is always transcribed as a single channel.| Optional |
 | `diarization` | The diarization configuration. Diarization is the process of recognizing and separating multiple speakers in one audio channel. For example, specify `"diarization": {"maxSpeakers": 2, "enabled": true}`. Then the transcription file contains `speaker` entries (such as `"speaker": 0` or `"speaker": 1`) for each transcribed phrase. | Optional |
-| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API. For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
+| `locales` | The list of locales that should match the expected locale of the audio data to transcribe.<br/><br/>If you know the locale of the audio file, you can specify it to improve transcription accuracy and minimize the latency. If a single locale is specified, that locale is used for transcription.<br/><br/>But if you're not sure about the locale, you can specify multiple locales to use language identification. Language identification might be more accurate with a more precise list of candidate locales.<br/><br/>If you don't specify any locale, then the Speech service will use the latest multi-lingual model to identify the locale and transcribe continuously.<br/><br/> You can get the latest supported languages via the [Transcriptions - List Supported Locales](/rest/api/speechtotext/transcriptions/list-supported-locales) REST API (API version 2024-11-15 or later). For more information about locales, see the [Speech service language support](language-support.md?tabs=stt) documentation.| Optional but recommended if you know the expected locale. |
 | `profanityFilterMode` |Specifies how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. | Optional |
 
 ## Related content
 
 - [Fast transcription REST API reference](/rest/api/speechtotext/transcriptions/transcribe)
 - [Speech to text supported languages](./language-support.md?tabs=stt)
-- [Batch transcription](./batch-transcription.md)
+- [Batch transcription](./batch-transcription.md)
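
The request configuration options touched by this change (`locales`, `channels`, `diarization`, `profanityFilterMode`) can be sketched as a small helper that assembles the form definition. This is a minimal illustration based only on the property descriptions in the table above. The function name `build_definition` and its parameters are hypothetical and not part of the Speech service or any SDK. It makes no network call.

```python
import json

# Values listed for `profanityFilterMode` in the options table.
ALLOWED_PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_definition(locales=None, channels=None, max_speakers=None,
                     profanity_filter_mode="Masked"):
    """Return a JSON string for the form definition (hypothetical helper)."""
    if profanity_filter_mode not in ALLOWED_PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode}")

    definition = {"profanityFilterMode": profanity_filter_mode}

    # Omitting `locales` leaves locale identification to the service.
    if locales:
        definition["locales"] = list(locales)

    if channels is not None:
        if len(channels) > 2:
            raise ValueError("up to two channels are supported")
        definition["channels"] = list(channels)

    if max_speakers is not None:
        # Per the table, diarization can't be combined with channels [0,1].
        if channels is not None and len(channels) > 1:
            raise ValueError("diarization of multiple channels isn't supported")
        definition["diarization"] = {"maxSpeakers": max_speakers, "enabled": True}

    return json.dumps(definition)


if __name__ == "__main__":
    print(build_definition(locales=["en-US"]))
```

Passing a single locale, as in the `en-US` example in the article, yields a definition with one entry in `locales`. Passing several candidates would correspond to the language identification case described in the table.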
