|**Cost**|[Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
|**What you get**| Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost)|
|**What you don’t get**|❌Data processing guarantee<br> <br> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/)| ❌High volume w/consistent low latency | ❌Pay-per-call flexibility |
|**Per-call Latency**| Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |
|**Sku Name in code**|`GlobalStandard`|`Standard`|`ProvisionedManaged`|
@@ -52,6 +52,9 @@ Standard deployments are optimized for low to medium volume workloads with high

 ## Global standard

+> [!IMPORTANT]
+> Data might be processed outside of the resource's Azure geography, but data storage remains in its Azure geography. [Learn more about data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
+
 Global deployments are available in the same Azure OpenAI resources as non-global offers but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard will provide the highest default quota for new models and eliminates the need to load balance across multiple resources.

 The deployment type is optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability. The threshold is set per model. See the [quotas page to learn more](./quota.md).
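
A minimal sketch, not part of this commit, of how the SKU names from the "Sku Name in code" row might be looked up in tooling. The column labels used as dictionary keys ("global standard", "standard", "provisioned") are assumptions, since the table header is not visible in this diff, and both helper functions are hypothetical. The data-residency check reflects only the note added above for global standard deployments.

```python
# SKU names copied from the "Sku Name in code" table row.
# The keys are assumed column labels; the table header is not shown in the diff.
SKU_BY_DEPLOYMENT_TYPE = {
    "global standard": "GlobalStandard",
    "standard": "Standard",
    "provisioned": "ProvisionedManaged",
}


def sku_name(deployment_type: str) -> str:
    """Return the SKU name for a deployment type label (hypothetical helper)."""
    # Normalize case and whitespace so "Global  Standard" also matches.
    key = " ".join(deployment_type.lower().split())
    if key not in SKU_BY_DEPLOYMENT_TYPE:
        raise ValueError(f"unknown deployment type: {deployment_type!r}")
    return SKU_BY_DEPLOYMENT_TYPE[key]


def may_process_outside_geography(deployment_type: str) -> bool:
    """True when the docs warn that data might be processed outside the
    resource's Azure geography (stated here for global standard only)."""
    return sku_name(deployment_type) == "GlobalStandard"


if __name__ == "__main__":
    for label in ("Global standard", "Standard", "Provisioned"):
        print(label, "->", sku_name(label), may_process_outside_geography(label))
```

A check like this could gate a deployment script for workloads that need processing to stay within one geography; the actual deployment call is deliberately left out.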
articles/ai-services/speech-service/speech-to-text.md (49 additions & 25 deletions)
@@ -6,37 +6,41 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: overview
-ms.date: 5/21/2024
+ms.date: 7/23/2024
 ms.author: eur
 ---

 # What is speech to text?

-In this overview, you learn about the benefits and capabilities of the speech to text feature of the Speech service, which is part of Azure AI services. Speech to text can be used for [real-time](#real-time-speech-to-text), [batch transcription](#batch-transcription-api), or [fast transcription](./fast-transcription-create.md) of audio streams into text.
+Azure AI Speech service offers advanced speech to text capabilities. This feature supports both real-time and batch transcription, providing versatile solutions for converting audio streams into text.

-> [!NOTE]
-> To compare pricing of [real-time](#real-time-speech-to-text), [batch transcription](#batch-transcription-api), and [fast transcription](./fast-transcription-create.md), see [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/).
+## Core Features

-For a full list of available speech to text languages, see [Language and voice support](language-support.md?tabs=stt).
+The speech to text service offers the following core features:
+- [Real-time](#real-time-speech-to-text) transcription: Instant transcription with intermediate results for live audio inputs.
+- [Fast transcription](#fast-transcription-preview): Fastest synchronous output for situations with predictable latency.
+- [Batch transcription](#batch-transcription-api): Efficient processing for large volumes of prerecorded audio.
+- [Custom speech](#custom-speech): Models with enhanced accuracy for specific domains and conditions.

 ## Real-time speech to text

-With real-time speech to text, the audio is transcribed as speech is recognized from a microphone or file. Use real-time speech to text for applications that need to transcribe audio in real-time such as:
-- Transcriptions, captions, or subtitles for live meetings
+Real-time speech to text transcribes audio as it's recognized from a microphone or file. It's ideal for applications requiring immediate transcription, such as:
+- **Transcriptions, captions, or subtitles for live meetings**: Real-time audio transcription for accessibility and record-keeping.
+- **Diarization**: Identifying and distinguishing between different speakers in the audio.
+- **Pronunciation assessment**: Evaluating and providing feedback on pronunciation accuracy.
+- **Call center agents assist**: Providing real-time transcription to assist customer service representatives.
+- **Dictation**: Transcribing spoken words into written text for documentation purposes.
+- **Voice agents**: Enabling interactive voice response systems to transcribe user queries and commands.

-Real-time speech to text is available via the [Speech SDK](speech-sdk.md) and the [Speech CLI](spx-overview.md).
+Real-time speech to text can be accessed via the Speech SDK, Speech CLI, and REST API, allowing integration into various applications and workflows.
+Real-time speech to text is available via the [Speech SDK](speech-sdk.md), the [Speech CLI](spx-overview.md), and REST APIs such as the [Fast transcription API](fast-transcription-create.md).

 ## Fast transcription (Preview)

-Fast transcription API is used to transcribe audio files with returning results synchronously and much faster than real-time audio. Use fast transcription in the scenarios that you need the transcript of an audio recording as quickly as possible with predictable latency, such as:
+Fast transcription API is used to transcribe audio files with returning results synchronously and faster than real-time audio. Use fast transcription in the scenarios that you need the transcript of an audio recording as quickly as possible with predictable latency, such as:

-- Quick audio or video transcription, subtitles, and edit.
-- Video translation
+- **Quick audio or video transcription and subtitles**: Quickly get a transcription of an entire video or audio file in one go.
+- **Video translation**: Immediately get new subtitles for a video if you have audio in different languages.

 > [!NOTE]
 > Fast transcription API is only available via the speech to text REST API version 2024-05-15-preview and later.
@@ -45,14 +49,15 @@ To get started with fast transcription, see [use the fast transcription API (pre

 ## Batch transcription API

-[Batch transcription](batch-transcription.md) is used to transcribe a large amount of audio in storage. You can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results. Use batch transcription for applications that need to transcribe audio in bulk such as:
-- Transcriptions, captions, or subtitles for prerecorded audio
-- Contact center post-call analytics
-- Diarization
+[Batch transcription](batch-transcription.md) is designed for transcribing large amounts of audio stored in files. This method processes audio asynchronously and is suited for:
+- **Transcriptions, captions, or subtitles for prerecorded audio**: Converting stored audio content into text.
+- **Contact center post-call analytics**: Analyzing recorded calls to extract valuable insights.
+- **Diarization**: Differentiating between speakers in recorded audio.

 Batch transcription is available via:
-- [Speech to text REST API](rest-speech-to-text.md): To get started, see [How to use batch transcription](batch-transcription.md) and [Batch transcription samples (REST)](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch).
-- The [Speech CLI](spx-overview.md) supports both real-time and batch transcription. For Speech CLI help with batch transcriptions, run the following command:
+- [Speech to text REST API](rest-speech-to-text.md): Facilitates batch processing with the flexibility of RESTful calls. To get started, see [How to use batch transcription](batch-transcription.md) and [Batch transcription samples](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/batch).
+- [Speech CLI](spx-overview.md): Supports both real-time and batch transcription, making it easy to manage transcription tasks. For Speech CLI help with batch transcriptions, run the following command:
+
 ```azurecli-interactive
 spx help batch transcription
 ```
@@ -66,9 +71,27 @@ With [custom speech](./custom-speech-overview.md), you can evaluate and improve

 Out of the box, speech recognition utilizes a Universal Language Model as a base model that is trained with Microsoft-owned data and reflects commonly used spoken language. The base model is pretrained with dialects and phonetics representing various common domains. When you make a speech recognition request, the most recent base model for each [supported language](language-support.md?tabs=stt) is used by default. The base model works well in most speech recognition scenarios.

-A custom model can be used to augment the base model to improve recognition of domain-specific vocabulary specific to the application by providing text data to train the model. It can also be used to improve recognition based for the specific audio conditions of the application by providing audio data with reference transcriptions. For more information, see [custom speech](./custom-speech-overview.md) and [Speech to text REST API](rest-speech-to-text.md).
+Custom speech allows you to tailor the speech recognition model to better suit your application's specific needs. This can be particularly useful for:
+- **Improving recognition of domain-specific vocabulary**: Train the model with text data relevant to your field.
+- **Enhancing accuracy for specific audio conditions**: Use audio data with reference transcriptions to refine the model.
+
+For more information about custom speech, see the [custom speech overview](./custom-speech-overview.md) and the [speech to text REST API](rest-speech-to-text.md) documentation.
+
+For details about customization options per language and locale, see the [language and voice support for the Speech service](./language-support.md?tabs=stt) documentation.
+
+## Usage Examples
+
+Here are some practical examples of how you can utilize Azure AI speech to text:

-Customization options vary by language or locale. To verify support, see [Language and voice support for the Speech service](./language-support.md?tabs=stt).
+| Use case | Scenario | Solution |
+| --- | --- | --- |
+| **Live meeting transcriptions and captions** | A virtual event platform needs to provide real-time captions for webinars. | Integrate real-time speech to text using the Speech SDK to transcribe spoken content into captions displayed live during the event. |
+| **Customer service enhancement** | A call center wants to assist agents by providing real-time transcriptions of customer calls. | Use real-time speech to text via the Speech CLI to transcribe calls, enabling agents to better understand and respond to customer queries. |
+| **Video subtitling** | A video-hosting platform wants to quickly generate a set of subtitles for a video. | Use fast transcription to quickly get a set of subtitles for the entire video. |
+| **Educational tools** | An e-learning platform aims to provide transcriptions for video lectures. | Apply batch transcription through the speech to text REST API to process prerecorded lecture videos, generating text transcripts for students. |
+| **Healthcare documentation** | A healthcare provider needs to document patient consultations. | Use real-time speech to text for dictation, allowing healthcare professionals to speak their notes and have them transcribed instantly. Use a custom model to enhance recognition of specific medical terms. |
+| **Media and entertainment** | A media company wants to create subtitles for a large archive of videos. | Use batch transcription to process the video files in bulk, generating accurate subtitles for each video. |
+| **Market research** | A market research firm needs to analyze customer feedback from audio recordings. | Employ batch transcription to convert audio feedback into text, enabling easier analysis and insights extraction. |

 ## Responsible AI

@@ -79,7 +102,8 @@ An AI system includes not only the technology, but also the people who use it, t
 * [Integration and responsible use](/legal/cognitive-services/speech-service/speech-to-text/guidance-integration-responsible-use?context=/azure/ai-services/speech-service/context/context)
 * [Data, privacy, and security](/legal/cognitive-services/speech-service/speech-to-text/data-privacy-security?context=/azure/ai-services/speech-service/context/context)

-## Next steps
+## Related content

 - [Get started with speech to text](get-started-speech-to-text.md)
 - [Create a batch transcription](batch-transcription-create.md)
+- For detailed pricing information, visit the [Speech service pricing](https://azure.microsoft.com/pricing/details/cognitive-services/speech-services/) page.
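
A minimal sketch, not part of this commit, of the selection logic implied by the Usage Examples table in speech-to-text.md: live audio maps to real-time speech to text, prerecorded audio that needs a synchronous result maps to fast transcription, and everything else maps to batch transcription. The `Mode` enum, the `choose_mode` function, and its two parameters are hypothetical names introduced for illustration; the Speech service exposes no such API.

```python
from enum import Enum


class Mode(Enum):
    """Transcription modes described in the article (labels are illustrative)."""
    REAL_TIME = "real-time speech to text"
    FAST = "fast transcription (preview)"
    BATCH = "batch transcription"


def choose_mode(live_audio: bool, needs_synchronous_result: bool) -> Mode:
    """Pick a transcription mode from two coarse requirements.

    live_audio: audio arrives as it is spoken (meetings, calls, dictation).
    needs_synchronous_result: a prerecorded file must be transcribed as
    quickly as possible, with the result returned in the same request.
    """
    if live_audio:
        return Mode.REAL_TIME
    if needs_synchronous_result:
        return Mode.FAST
    # Large volumes of stored audio, processed asynchronously.
    return Mode.BATCH


if __name__ == "__main__":
    print(choose_mode(live_audio=True, needs_synchronous_result=False).value)
    print(choose_mode(live_audio=False, needs_synchronous_result=True).value)
    print(choose_mode(live_audio=False, needs_synchronous_result=False).value)
```

Custom speech is orthogonal to this choice: per the article, a custom model improves accuracy for a domain or audio condition regardless of which mode is used.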
@@ -67,6 +67,9 @@ The in-place migration feature doesn't support the following scenarios. See the
 - App Service Environment v1 in a [Classic virtual network](/previous-versions/azure/virtual-network/create-virtual-network-classic)
 - ELB App Service Environment v2 with IP SSL addresses
 - ELB App Service Environment v1 with IP SSL addresses
+- App Service Environment with a name that doesn't meet the character limits. The entire name, including the domain suffix, must be 64 characters or fewer. For example: *my-ase-name.appserviceenvironment.net* for ILB and *my-ase-name.p.azurewebsites.net* for ELB must be 64 characters or fewer. If you don't meet the character limit, you must migrate manually. The character limits specifically for the App Service Environment name are as follows:
+  - ILB App Service Environment name character limit: 36 characters
+  - ELB App Service Environment name character limit: 42 characters

 The App Service platform reviews your App Service Environment to confirm in-place migration support. If your scenario doesn't pass all validation checks, you can't migrate at this time using the in-place migration feature. If your environment is in an unhealthy or suspended state, you can't migrate until you make the needed updates.
# Migration to App Service Environment v3 using the side-by-side migration feature
@@ -81,6 +81,9 @@ The side-by-side migration feature doesn't support the following scenarios. See
 - If you have an App Service Environment v1, you can migrate using the [in-place migration feature](migrate.md) or one of the [manual migration options](migration-alternatives.md).
 - ELB App Service Environment v2 with IP SSL addresses
 - [Zone pinned](zone-redundancy.md) App Service Environment v2
+- App Service Environment with a name that doesn't meet the character limits. The entire name, including the domain suffix, must be 64 characters or fewer. For example: *my-ase-name.appserviceenvironment.net* for ILB and *my-ase-name.p.azurewebsites.net* for ELB must be 64 characters or fewer. If you don't meet the character limit, you must migrate manually. The character limits specifically for the App Service Environment name are as follows:
+  - ILB App Service Environment name character limit: 36 characters
+  - ELB App Service Environment name character limit: 42 characters

 The App Service platform reviews your App Service Environment to confirm side-by-side migration support. If your scenario doesn't pass all validation checks, you can't migrate at this time using the side-by-side migration feature. If your environment is in an unhealthy or suspended state, you can't migrate until you make the needed updates.
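
A minimal sketch, not part of this commit, of a local pre-check for the name restriction that both migration articles add. It applies the limits exactly as stated in the added text (36 characters for ILB, 42 for ELB, and 64 for the full name including the domain suffix) and uses the domain suffixes from the examples. The function name and the `kind` parameter are hypothetical; the platform's own validation remains authoritative.

```python
# Domain suffixes taken from the examples in the added bullet.
DOMAIN_SUFFIX = {
    "ILB": ".appserviceenvironment.net",
    "ELB": ".p.azurewebsites.net",
}

# Per-type name limits as stated in the added text.
NAME_LIMIT = {"ILB": 36, "ELB": 42}

# Limit for the entire name, including the domain suffix.
MAX_FULL_NAME = 64


def name_within_migration_limits(name: str, kind: str) -> bool:
    """Return True if an App Service Environment name meets the stated limits.

    kind is "ILB" or "ELB" (case-insensitive). A False result means the
    migration feature is not supported and a manual migration is required.
    """
    kind = kind.upper()
    if kind not in NAME_LIMIT:
        raise ValueError(f"kind must be 'ILB' or 'ELB', got {kind!r}")
    full_name = name + DOMAIN_SUFFIX[kind]
    return len(name) <= NAME_LIMIT[kind] and len(full_name) <= MAX_FULL_NAME


if __name__ == "__main__":
    print(name_within_migration_limits("my-ase-name", "ILB"))
    print(name_within_migration_limits("a" * 37, "ILB"))
```

With the suffixes shown, the per-type limit is the binding constraint (a 36-character ILB name plus its suffix is 62 characters), so the full-name check is kept only to mirror the wording of the docs.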