
Commit 130dd63

authored
Merge pull request #273536 from eric-urban/eur/build-ai-speech-fast-transcription
fast transcription API preview
2 parents 89a0727 + cd846c8 commit 130dd63

10 files changed

+324
-15
lines changed

articles/ai-services/speech-service/batch-transcription-audio-data.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: how-to
10-
ms.date: 1/18/2024
10+
ms.date: 5/21/2024
1111
ms.devlang: csharp
1212
ms.custom: devx-track-csharp, devx-track-azurecli
1313
---
@@ -26,7 +26,7 @@ You can specify one or multiple audio files when creating a transcription. We re
2626

2727
## Supported audio formats and codecs
2828

29-
The batch transcription API supports many different formats and codecs, such as:
29+
The batch transcription API (and [fast transcription API](./fast-transcription-create.md)) supports many different formats and codecs, such as:
3030

3131
- WAV
3232
- MP3

articles/ai-services/speech-service/batch-transcription-create.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: how-to
10-
ms.date: 4/15/2024
10+
ms.date: 5/21/2024
1111
zone_pivot_groups: speech-cli-rest
1212
ms.custom: devx-track-csharp
1313
# Customer intent: As a user who implements audio transcription, I want to create transcriptions in bulk so that I don't have to submit audio content repeatedly.

articles/ai-services/speech-service/batch-transcription-get.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: how-to
10-
ms.date: 1/18/2024
10+
ms.date: 5/21/2024
1111
zone_pivot_groups: speech-cli-rest
1212
ms.custom: devx-track-csharp
1313
---

articles/ai-services/speech-service/batch-transcription.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: eric-urban
77
ms.author: eur
88
ms.service: azure-ai-speech
99
ms.topic: overview
10-
ms.date: 1/18/2024
10+
ms.date: 5/21/2024
1111
ms.devlang: csharp
1212
ms.custom: devx-track-csharp
1313
---
Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
1+
---
2+
title: Use the fast transcription API - Speech service
3+
titleSuffix: Azure AI services
4+
description: Learn how to use Azure AI Speech for fast transcriptions, where you submit audio and get the transcription results much faster than real-time audio.
5+
manager: nitinme
6+
author: eric-urban
7+
ms.author: eur
8+
ms.service: azure-ai-speech
9+
ms.topic: how-to
10+
ms.date: 7/12/2024
11+
# Customer intent: As a user who implements audio transcription, I want to create transcriptions as quickly as possible.
12+
---
13+
14+
# Use the fast transcription API (preview) with Azure AI Speech
15+
16+
> [!NOTE]
17+
> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
18+
>
19+
> Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview. This preview version is subject to change and is not recommended for production use. It will be retired without notice 90 days after a successor preview version is released or the API becomes generally available (GA).
20+
21+
The fast transcription API transcribes audio files and returns the results synchronously, much faster than real time. Use fast transcription in scenarios where you need the transcript of an audio recording as quickly as possible with predictable latency, such as:
22+
23+
- Quick audio or video transcription, subtitling, and editing
24+
- Video dubbing
25+
26+
> [!TIP]
27+
> Try out fast transcription in [Azure AI Studio](https://aka.ms/fasttranscription/studio).
28+
29+
## Prerequisites
30+
31+
- An Azure AI Speech resource in one of the regions where the fast transcription API is available. The supported regions are: Central India, East US, Southeast Asia, and West Europe. For more information about regions supported for other Speech service features, see [Speech service regions](./regions.md).
32+
- An audio file (less than 2 hours long and less than 200 MB in size) in one of the formats and codecs supported by the batch transcription API. For more information about supported audio formats, see [supported audio formats](./batch-transcription-audio-data.md#supported-audio-formats-and-codecs).
33+
34+
## Use the fast transcription API
35+
36+
The fast transcription API is a REST API that uses multipart/form-data to submit audio files for transcription. The API returns the transcription results synchronously.
37+
38+
Construct the request body according to the following instructions:
39+
40+
- Set the required `locales` property. This value should match the expected locale of the audio data to transcribe. The supported locales are: en-US, es-ES, es-MX, fr-FR, hi-IN, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN. You can only specify one locale per transcription request.
41+
- Optionally, set the `profanityFilterMode` property to specify how to handle profanity in recognition results. Accepted values are `None` to disable profanity filtering, `Masked` to replace profanity with asterisks, `Removed` to remove all profanity from the result, or `Tags` to add profanity tags. The default value is `Masked`. The `profanityFilterMode` property works the same way as in the [batch transcription API](./batch-transcription.md).
42+
- Optionally, set the `channels` property to specify the zero-based indices of the channels to be transcribed separately. If not specified, multiple channels are merged and transcribed jointly. Up to two channels are supported. To transcribe the channels of a stereo audio file separately, specify `[0,1]`. Otherwise, stereo audio is merged to mono, mono audio is left as is, and only a single channel is transcribed. In either of the latter cases, the output has no channel indices for the transcribed text, since only a single audio stream is transcribed.
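
As a rough illustration of the properties described above, the following Python sketch assembles the JSON string for the `definition` form field. The helper name `build_definition` and the validation rules are assumptions made for this example and are not part of the API. The locale and profanity values are copied from the list above and might change while the API is in preview.

```python
import json

# Values copied from the property descriptions above (preview, subject to change).
SUPPORTED_LOCALES = {
    "en-US", "es-ES", "es-MX", "fr-FR", "hi-IN",
    "it-IT", "ja-JP", "ko-KR", "pt-BR", "zh-CN",
}
PROFANITY_MODES = {"None", "Masked", "Removed", "Tags"}


def build_definition(locale, profanity_filter_mode="Masked", channels=None):
    """Return the JSON string for the `definition` form field (hypothetical helper)."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(f"unsupported locale: {locale}")
    if profanity_filter_mode not in PROFANITY_MODES:
        raise ValueError(f"unsupported profanityFilterMode: {profanity_filter_mode}")

    # Only one locale is allowed per request, but the property is still a list.
    definition = {
        "locales": [locale],
        "profanityFilterMode": profanity_filter_mode,
    }

    if channels is not None:
        # Assumption: "up to two channels" with zero-based indices means 0 and 1.
        if len(channels) > 2 or any(c not in (0, 1) for c in channels):
            raise ValueError("channels must contain at most the indices 0 and 1")
        definition["channels"] = list(channels)

    return json.dumps(definition)


print(build_definition("en-US", channels=[0, 1]))
```

Omitting `channels` leaves the property out of the definition, which corresponds to the merged, single-stream behavior described above.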
43+
44+
Make a multipart/form-data POST request to the `transcriptions` endpoint with the audio file and the request body properties. The following example shows how to create a transcription using the fast transcription API.
45+
46+
- Replace `YourSubscriptionKey` with your Speech resource key.
47+
- Replace `YourServiceRegion` with your Speech resource region.
48+
- Replace `YourAudioFile` with the path to your audio file.
49+
- Set the form definition properties as previously described.
50+
51+
```azurecli-interactive
curl --location 'https://YourServiceRegion.api.cognitive.microsoft.com/speechtotext/transcriptions:transcribe?api-version=2024-05-15-preview' \
--header 'Content-Type: multipart/form-data' \
--header 'Accept: application/json' \
--header 'Ocp-Apim-Subscription-Key: YourSubscriptionKey' \
--form 'audio=@"YourAudioFile"' \
--form 'definition="{
\"locales\":[\"en-US\"],
\"profanityFilterMode\": \"Masked\",
\"channels\": [0,1]}"'
```
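
For callers who prefer not to shell out to curl, the same request could be assembled with the Python standard library. This is a minimal sketch under stated assumptions: the function name `build_transcribe_request` is hypothetical, the `application/octet-stream` part type for the audio is a guess rather than a documented requirement, and the request is only built here, not sent.

```python
import urllib.request
import uuid

API_VERSION = "2024-05-15-preview"


def build_transcribe_request(region, key, audio_bytes, filename, definition_json):
    """Assemble (but do not send) the multipart POST from the curl example."""
    boundary = uuid.uuid4().hex
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        f"/speechtotext/transcriptions:transcribe?api-version={API_VERSION}"
    )

    # Part 1: the audio file, under the form field name `audio`.
    audio_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"  # assumed part type
    ).encode("utf-8")

    # Part 2: the JSON definition, under the form field name `definition`.
    definition_head = (
        f"\r\n--{boundary}\r\n"
        'Content-Disposition: form-data; name="definition"\r\n\r\n'
    ).encode("utf-8")

    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = (
        audio_head + audio_bytes
        + definition_head + definition_json.encode("utf-8")
        + closing
    )

    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_transcribe_request(
    "YourServiceRegion", "YourSubscriptionKey",
    b"placeholder audio bytes", "YourAudioFile.wav", '{"locales":["en-US"]}',
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. Because the call is synchronous, the JSON transcription would be expected in the response body.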
62+
63+
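The response includes the top-level `duration`, `combinedPhrases`, and `phrases` properties, with per-phrase details such as `channel`, `offset`, `words`, and `confidence`. The `combinedPhrases` property contains the full transcription for each channel separately. For example, when each speaker is recorded on a separate channel, everything the first speaker said is in the first element of the `combinedPhrases` array, and everything the second speaker said is in the second element of the array.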
64+
65+
```json
{
  "duration": 185079,
  "combinedPhrases": [
    {
      "channel": 0,
      "text": "Hello. Thank you for calling Contoso. Who am I speaking with today? Hi, Mary. Are you calling because you need health insurance? Great. If you can answer a few questions, we can get you signed up in the Jiffy. So what's your full name? Got it. And what's the best callback number in case we get disconnected? Yep, that'll be fine. Got it. So to confirm, it's 234-554-9312. Excellent. Let's get some additional information for your application. Do you have a job? OK, so then you have a Social Security number as well. OK, and what is your Social Security number please? Sorry, what was that, a 25 or a 225? You cut out for a bit. Alright, thank you so much. And could I have your e-mail address please? Great. Uh That is the last question. So let me take your information and I'll be able to get you signed up right away. Thank you for calling Contoso and I'll be able to get you signed up immediately. One of our agents will call you back in about 24 hours or so to confirm your application. Absolutely. If you need anything else, please give us a call at 1-800-555-5564, extension 123. Thank you very much for calling Contoso. Uh Yes, of course. So the default is a digital membership card, but we can send you a physical card if you prefer. Uh, yeah. Absolutely. I've made a note on your file. You're very welcome. Thank you for calling Contoso and have a great day."
    },
    {
      "channel": 1,
      "text": "Hi, my name is Mary Rondo. I'm trying to enroll myself with Contuso. Yes, yeah, I'm calling to sign up for insurance. Okay. So Mary Beth Rondo, last name is R like Romeo, O like Ocean, N like Nancy D, D like Dog, and O like Ocean again. Rondo. I only have a cell phone so I can give you that. Sure, so it's 234-554 and then 9312. Yep, that's right. Uh Yes, I am self-employed. Yes, I do. Uh Sure, so it's 412256789. It's double two, so 412, then another two, then five. Yeah, it's maryrondo@gmail.com. So my first and last name at gmail.com. No periods, no dashes. That was quick. Thank you. Actually, so I have one more question. I'm curious, will I be getting a physical card as proof of coverage? uh Yes. Could you please mail it to me when it's ready? I'd like to have it shipped to, are you ready for my address? So it's 2660 Unit A on Maple Avenue SE, Lansing, and then zip code is 48823. Awesome. Thanks so much."
    }
  ],
  "phrases": [
    {
      "channel": 0,
      "offset": 720,
      "duration": 480,
      "text": "Hello.",
      "words": [
        {
          "text": "Hello.",
          "offset": 720,
          "duration": 480
        }
      ],
      "locale": "en-US",
      "confidence": 0.9177142
    },
    {
      "channel": 0,
      "offset": 1200,
      "duration": 1120,
      "text": "Thank you for calling Contoso.",
      "words": [
        {
          "text": "Thank",
          "offset": 1200,
          "duration": 200
        },
        {
          "text": "you",
          "offset": 1400,
          "duration": 80
        },
        {
          "text": "for",
          "offset": 1480,
          "duration": 120
        },
        {
          "text": "calling",
          "offset": 1600,
          "duration": 240
        },
        {
          "text": "Contoso.",
          "offset": 1840,
          "duration": 480
        }
      ],
      "locale": "en-US",
      "confidence": 0.9177142
    },
    {
      "channel": 0,
      "offset": 2320,
      "duration": 1120,
      "text": "Who am I speaking with today?",
      "words": [
        {
          "text": "Who",
          "offset": 2320,
          "duration": 160
        },
        {
          "text": "am",
          "offset": 2480,
          "duration": 80
        },
        {
          "text": "I",
          "offset": 2560,
          "duration": 80
        },
        {
          "text": "speaking",
          "offset": 2640,
          "duration": 320
        },
        {
          "text": "with",
          "offset": 2960,
          "duration": 160
        },
        {
          "text": "today?",
          "offset": 3120,
          "duration": 320
        }
      ],
      "locale": "en-US",
      "confidence": 0.9177142
    },
    // More transcription results removed for brevity
    // {...},
    {
      "channel": 1,
      "offset": 4480,
      "duration": 1600,
      "text": "Hi, my name is Mary Rondo.",
      "words": [
        {
          "text": "Hi,",
          "offset": 4480,
          "duration": 400
        },
        {
          "text": "my",
          "offset": 4880,
          "duration": 120
        },
        {
          "text": "name",
          "offset": 5000,
          "duration": 120
        },
        {
          "text": "is",
          "offset": 5120,
          "duration": 160
        },
        {
          "text": "Mary",
          "offset": 5280,
          "duration": 240
        },
        {
          "text": "Rondo.",
          "offset": 5520,
          "duration": 560
        }
      ],
      "locale": "en-US",
      "confidence": 0.8989456
    },
    {
      "channel": 1,
      "offset": 6080,
      "duration": 1920,
      "text": "I'm trying to enroll myself with Contuso.",
      "words": [
        {
          "text": "I'm",
          "offset": 6080,
          "duration": 160
        },
        {
          "text": "trying",
          "offset": 6240,
          "duration": 200
        },
        {
          "text": "to",
          "offset": 6440,
          "duration": 80
        },
        {
          "text": "enroll",
          "offset": 6520,
          "duration": 200
        },
        {
          "text": "myself",
          "offset": 6720,
          "duration": 360
        },
        {
          "text": "with",
          "offset": 7080,
          "duration": 120
        },
        {
          "text": "Contuso.",
          "offset": 7200,
          "duration": 800
        }
      ],
      "locale": "en-US",
      "confidence": 0.8989456
    },
    // More transcription results removed for brevity
    // {...},
  ]
}
```
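
To make the shape of the response more concrete, the following Python sketch reads an abridged copy of it: one helper returns the transcript per channel from `combinedPhrases`, and another merges `phrases` from all channels into a single timeline ordered by `offset`. The helper names are hypothetical. The article doesn't state the unit of `offset` and `duration`, so the sketch only uses `offset` for ordering and doesn't convert it.

```python
# Abridged stand-in for the response above; the texts are shortened.
response = {
    "duration": 185079,
    "combinedPhrases": [
        {"channel": 0, "text": "Hello. Thank you for calling Contoso."},
        {"channel": 1, "text": "Hi, my name is Mary Rondo."},
    ],
    "phrases": [
        {"channel": 1, "offset": 4480, "duration": 1600,
         "text": "Hi, my name is Mary Rondo.", "confidence": 0.8989456},
        {"channel": 0, "offset": 720, "duration": 480,
         "text": "Hello.", "confidence": 0.9177142},
        {"channel": 0, "offset": 1200, "duration": 1120,
         "text": "Thank you for calling Contoso.", "confidence": 0.9177142},
    ],
}


def transcript_by_channel(resp):
    """Map each channel index to its full transcript from `combinedPhrases`."""
    return {item["channel"]: item["text"] for item in resp["combinedPhrases"]}


def interleave_phrases(resp):
    """Return (offset, channel, text) tuples ordered by start offset."""
    return sorted((p["offset"], p["channel"], p["text"]) for p in resp["phrases"])


for offset, channel, text in interleave_phrases(response):
    print(f"[{offset:>6}] channel {channel}: {text}")
```

Sorting by `offset` is one simple way to reconstruct the back-and-forth of a two-channel call. If the request didn't specify `channels`, the phrases have no channel index and `p["channel"]` would need a fallback.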
261+
262+
## Related content
263+
264+
- [Speech to text quickstart](./get-started-speech-to-text.md)
265+
- [Batch transcription API](./batch-transcription.md)

articles/ai-services/speech-service/includes/release-notes/release-notes-stt.md

Lines changed: 11 additions & 2 deletions
@@ -2,18 +2,27 @@
22
author: eric-urban
33
ms.service: azure-ai-speech
44
ms.topic: include
5-
ms.date: 6/6/2024
5+
ms.date: 7/12/2024
66
ms.author: eur
77
---
88

9+
### July 2024 release
10+
11+
#### Fast Transcription API (Preview)
12+
13+
Fast transcription is now available in public preview. Fast transcription allows you to transcribe audio files to text accurately and synchronously, with a high speed factor. It can transcribe 30 minutes of audio in less than 1 minute. For more information, see the [fast transcription API guide](../../fast-transcription-create.md).
14+
15+
> [!TIP]
16+
> Try out fast transcription in [Azure AI Studio](https://aka.ms/fasttranscription/studio).
17+
918
### June 2024 release
1019

1120
#### Speech to text REST API v3.2 general availability
1221

1322
The Speech to text REST API version 3.2 is now generally available. For more information about speech to text REST API v3.2, see the [Speech to text REST API v3.2 reference documentation](/rest/api/speechtotext/operation-groups?view=rest-speechtotext-v3.2&preserve-view=true) and the [Speech to text REST API guide](../../rest-speech-to-text.md).
1423

1524
> [!NOTE]
16-
> Preview versions *3.2-preview.1* and *3*.2-preview.2* will be removed in September 2024.
25+
> Preview versions *3.2-preview.1* and *3.2-preview.2* will be removed in September 2024.
1726
1827
[Speech to text REST API](../../rest-speech-to-text.md) v3.1 will be retired on a date to be announced. Speech to text REST API v3.0 will be retired on April 1st, 2026. For more information about upgrading, see the Speech to text REST API [v3.0 to v3.1](../../migrate-v3-0-to-v3-1.md) and [v3.1 to v3.2](../../migrate-v3-1-to-v3-2.md) migration guides.
1928

articles/ai-services/speech-service/overview.md

Lines changed: 12 additions & 0 deletions
@@ -58,6 +58,18 @@ With [real-time speech to text](get-started-speech-to-text.md), the audio is tra
5858
- Dictation
5959
- Voice agents
6060

61+
### Fast transcription API (Preview)
62+
63+
The fast transcription API transcribes audio files and returns the results synchronously, much faster than real time. Use fast transcription in scenarios where you need the transcript of an audio recording as quickly as possible with predictable latency, such as:
64+
65+
- Quick audio or video transcription, subtitling, and editing
66+
- Video dubbing
67+
68+
> [!NOTE]
69+
> Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview.
70+
71+
To get started with fast transcription, see [use the fast transcription API (preview)](fast-transcription-create.md).
72+
6173
### Batch transcription
6274

6375
[Batch transcription](batch-transcription.md) is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:

articles/ai-services/speech-service/speech-services-quotas-and-limits.md

Lines changed: 12 additions & 3 deletions
@@ -2,12 +2,13 @@
22
title: Speech service quotas and limits
33
titleSuffix: Azure AI services
44
description: Quick reference, detailed description, and best practices on the quotas and limits for the Speech service in Azure AI services.
5-
author: alexeyo26
5+
author: eric-urban
6+
ms.author: eur
67
manager: nitinme
78
ms.service: azure-ai-speech
89
ms.topic: conceptual
9-
ms.date: 1/22/2024
10-
ms.author: alexeyo
10+
ms.date: 5/21/2024
11+
ms.reviewer: alexeyo
1112
---
1213

1314
# Speech service quotas and limits
@@ -42,6 +43,14 @@ You can use real-time speech to text with the [Speech SDK](speech-sdk.md) or the
4243
| Concurrent request limit - custom endpoint | 1 <br/><br/>This limit isn't adjustable. | 100 (default value)<br/><br/>The rate is adjustable for Standard (S0) resources. See [more explanations](#detailed-description-quota-adjustment-and-best-practices), [best practices](#general-best-practices-to-mitigate-throttling-during-autoscaling), and [adjustment instructions](#speech-to-text-increase-real-time-speech-to-text-concurrent-request-limit). |
4344
| Max audio length for [real-time diarization](./get-started-stt-diarization.md). | N/A | 240 minutes per file |
4445

46+
#### Fast transcription
47+
48+
| Quota | Free (F0) | Standard (S0) |
|-----|-----|-----|
| Maximum audio input file size | N/A | 200 MB |
| Maximum audio length | N/A | 120 minutes per file |
| Maximum requests per minute | N/A | 300 |
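
A client could check these quotas locally before submitting audio. The following Python sketch is one way to do that, not an official validation routine. The function names are hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and the limits are treated as inclusive maximums.

```python
# Limits copied from the fast transcription table above (Standard S0).
MAX_FILE_BYTES = 200 * 1024 * 1024  # assumes "MB" means 1024 * 1024 bytes
MAX_AUDIO_MINUTES = 120
MAX_REQUESTS_PER_MINUTE = 300


def limit_violations(size_bytes, duration_minutes):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; the maximum is {MAX_FILE_BYTES}"
        )
    if duration_minutes > MAX_AUDIO_MINUTES:
        problems.append(
            f"audio is {duration_minutes} minutes; the maximum is {MAX_AUDIO_MINUTES}"
        )
    return problems


def min_seconds_between_requests():
    """Spacing that keeps a steady single client under the per-minute request quota."""
    return 60 / MAX_REQUESTS_PER_MINUTE


print(limit_violations(150 * 1024 * 1024, 95))
print(min_seconds_between_requests())
```

At 300 requests per minute, a client that paces itself evenly would wait at least 0.2 seconds between requests.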
53+
4554
#### Batch transcription
4655

4756
| Quota | Free (F0) | Standard (S0) |
