Commit 5c21d45

Merge pull request #215943 from eric-urban/eur/embedded-speech

Embedded Speech preview

2 parents 00848e9 + 696f5dc

3 files changed: +177 -1 lines changed

articles/cognitive-services/Speech-Service/embedded-speech.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
---
title: Embedded Speech - Speech service
titleSuffix: Azure Cognitive Services
description: Embedded Speech is designed for on-device scenarios where cloud connectivity is intermittent or unavailable.
services: cognitive-services
author: eric-urban
manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: how-to
ms.date: 10/31/2022
ms.author: eur
zone_pivot_groups: programming-languages-set-thirteen
---

# Embedded Speech (preview)

Embedded Speech is designed for on-device [speech-to-text](speech-to-text.md) and [text-to-speech](text-to-speech.md) scenarios where cloud connectivity is intermittent or unavailable. For example, you can use embedded speech in medical equipment, a voice-enabled air conditioning unit, or a car that might travel out of range. You can also develop hybrid cloud and offline solutions. For scenarios where your devices must be in a secure environment like a bank or government entity, you should first consider [disconnected containers](/azure/cognitive-services/containers/disconnected-containers).

> [!IMPORTANT]
> Microsoft limits access to embedded speech. You can apply for access through the Azure Cognitive Services [embedded speech limited access review](https://aka.ms/csgate-embedded-speech). For more information, see [Limited access for embedded speech](/legal/cognitive-services/speech-service/embedded-speech/limited-access-embedded-speech?context=/azure/cognitive-services/speech-service/context/context).

## Platform requirements

Embedded speech is included with the Speech SDK (version 1.24.1 and higher) for C#, C++, and Java. Refer to the general [Speech SDK installation requirements](quickstarts/setup-platform.md) for details specific to your programming language and target platform.

**Choose your target environment**

# [Android](#tab/android)

Requires Android 7.0 (API level 24) or higher on ARM64 (`arm64-v8a`) or ARM32 (`armeabi-v7a`) hardware.

Embedded text-to-speech with neural voices is supported only on ARM64.

# [Linux](#tab/linux)

Requires Linux on x64, ARM64, or ARM32 hardware with [supported Linux distributions](quickstarts/setup-platform.md?tabs=linux).

Embedded speech isn't supported on RHEL/CentOS 7.

Embedded text-to-speech with neural voices isn't supported on ARM32.

# [macOS](#tab/macos)

Requires macOS 10.14 or newer on x64 or ARM64 hardware.

# [Windows](#tab/windows)

Requires Windows 10 or newer on x64 or ARM64 hardware.

The latest [Microsoft Visual C++ Redistributable for Visual Studio 2015-2022](/cpp/windows/latest-supported-vc-redist?view=msvc-170&preserve-view=true) must be installed regardless of the programming language used with the Speech SDK.

The Speech SDK for Java doesn't support Windows on ARM64.

---

## Limitations

Embedded speech is only available with the C#, C++, and Java SDKs. The other Speech SDKs, Speech CLI, and REST APIs don't support embedded speech.

Embedded speech recognition only supports mono 16-bit, 16-kHz PCM-encoded WAV audio.

Embedded neural voices only support a 24-kHz sample rate.
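
As one way to satisfy the recognition audio constraint above, point the audio configuration at a WAV file that's already in the required format. Here's a minimal C# sketch, assuming an `embeddedSpeechConfig` like the one set up in the [configuration section](#embedded-speech-configuration) below; the file name is illustrative.

```csharp
// A minimal sketch: embedded recognition from a WAV file that is already
// mono, 16-bit, 16-kHz PCM. The file name is illustrative.
using var audioConfig = AudioConfig.FromWavFileInput("speech-16khz-16bit-mono.wav");
using var recognizer = new SpeechRecognizer(embeddedSpeechConfig, audioConfig);
var result = await recognizer.RecognizeOnceAsync();
```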

## Models and voices

For embedded speech, you'll need to download the speech recognition models for [speech-to-text](speech-to-text.md) and voices for [text-to-speech](text-to-speech.md). Instructions will be provided upon successful completion of the [limited access review](https://aka.ms/csgate-embedded-speech) process.

## Embedded speech configuration

For cloud-connected applications, as shown in most Speech SDK samples, you use the `SpeechConfig` object with a Speech resource key and region. For embedded speech, you don't use a Speech resource. Instead of a cloud resource, you use the [models and voices](#models-and-voices) that you downloaded to your local device.

Use the `EmbeddedSpeechConfig` object to set the location of the models or voices. If your application is used for both speech-to-text and text-to-speech, you can use the same `EmbeddedSpeechConfig` object to set the location of the models and voices.

::: zone pivot="programming-language-csharp"

```csharp
// Provide the location of the models and voices.
List<string> paths = new List<string>();
paths.Add("C:\\dev\\embedded-speech\\stt-models");
paths.Add("C:\\dev\\embedded-speech\\tts-voices");
var embeddedSpeechConfig = EmbeddedSpeechConfig.FromPaths(paths.ToArray());

// For speech-to-text
embeddedSpeechConfig.SetSpeechRecognitionModel(
    "Microsoft Speech Recognizer en-US FP Model V8",
    Environment.GetEnvironmentVariable("MODEL_KEY"));

// For text-to-speech
embeddedSpeechConfig.SetSpeechSynthesisVoice(
    "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)",
    Environment.GetEnvironmentVariable("VOICE_KEY"));
embeddedSpeechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
```

You can find ready-to-use embedded speech samples at [GitHub](https://aka.ms/csspeech/samples).

- [C# (.NET 6.0)](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/dotnetcore/embedded-speech)
- [C# for Unity](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/csharp/unity/embedded-speech)
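
After the configuration is set, you create a recognizer or synthesizer the same way as with cloud speech. Here's a minimal sketch, assuming the `embeddedSpeechConfig` above with the models and voices already downloaded to the listed paths:

```csharp
// A minimal sketch, assuming the embeddedSpeechConfig set up above.
// Speech-to-text from the default microphone:
using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
using var recognizer = new SpeechRecognizer(embeddedSpeechConfig, audioConfig);
var recognition = await recognizer.RecognizeOnceAsync();
Console.WriteLine($"Recognized: {recognition.Text}");

// Text-to-speech to the default speaker:
using var synthesizer = new SpeechSynthesizer(embeddedSpeechConfig);
using var synthesis = await synthesizer.SpeakTextAsync("Embedded speech works offline.");
```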
::: zone-end

::: zone pivot="programming-language-cpp"

> [!TIP]
> The `GetEnvironmentVariable` function is defined in the [speech-to-text quickstart](get-started-speech-to-text.md) and [text-to-speech quickstart](get-started-text-to-speech.md).

```cpp
// Provide the location of the models and voices.
vector<string> paths;
paths.push_back("C:\\dev\\embedded-speech\\stt-models");
paths.push_back("C:\\dev\\embedded-speech\\tts-voices");
auto embeddedSpeechConfig = EmbeddedSpeechConfig::FromPaths(paths);

// For speech-to-text
embeddedSpeechConfig->SetSpeechRecognitionModel(
    "Microsoft Speech Recognizer en-US FP Model V8",
    GetEnvironmentVariable("MODEL_KEY"));

// For text-to-speech
embeddedSpeechConfig->SetSpeechSynthesisVoice(
    "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)",
    GetEnvironmentVariable("VOICE_KEY"));
embeddedSpeechConfig->SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat::Riff24Khz16BitMonoPcm);
```

You can find ready-to-use embedded speech samples at [GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/cpp/embedded-speech).
::: zone-end

::: zone pivot="programming-language-java"

```java
// Provide the location of the models and voices.
List<String> paths = new ArrayList<>();
paths.add("C:\\dev\\embedded-speech\\stt-models");
paths.add("C:\\dev\\embedded-speech\\tts-voices");
var embeddedSpeechConfig = EmbeddedSpeechConfig.fromPaths(paths);

// For speech-to-text
embeddedSpeechConfig.setSpeechRecognitionModel(
    "Microsoft Speech Recognizer en-US FP Model V8",
    System.getenv("MODEL_KEY"));

// For text-to-speech
embeddedSpeechConfig.setSpeechSynthesisVoice(
    "Microsoft Server Speech Text to Speech Voice (en-US, JennyNeural)",
    System.getenv("VOICE_KEY"));
embeddedSpeechConfig.setSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
```

You can find ready-to-use embedded speech samples at [GitHub](https://aka.ms/csspeech/samples).

- [Java (JRE)](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/java/jre/embedded-speech)
- [Java for Android](https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples/java/android/embedded-speech)
::: zone-end

## Hybrid speech

Hybrid speech with the `HybridSpeechConfig` object uses the cloud speech service by default, with embedded speech as a fallback in case cloud connectivity is limited or slow.

With hybrid speech configuration for [speech-to-text](speech-to-text.md) (recognition models), embedded speech is used when the connection to the cloud service fails after repeated attempts. Recognition may resume using the cloud service if the connection is later restored.

With hybrid speech configuration for [text-to-speech](text-to-speech.md) (voices), embedded and cloud synthesis are run in parallel, and the final result is selected based on which one returns a faster response. The best result is evaluated on each synthesis request.
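
For illustration, here's a minimal C# sketch of a hybrid setup, assuming the `speechKey` and `speechRegion` variables from the quickstarts and the `embeddedSpeechConfig` from the embedded speech configuration section:

```csharp
// A minimal sketch: combine a cloud SpeechConfig with an EmbeddedSpeechConfig.
// speechKey and speechRegion are assumed to be defined as in the quickstarts.
var cloudSpeechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
var hybridSpeechConfig = HybridSpeechConfig.FromConfigs(cloudSpeechConfig, embeddedSpeechConfig);

// Use the hybrid config anywhere a SpeechConfig is expected.
using var recognizer = new SpeechRecognizer(hybridSpeechConfig, AudioConfig.FromDefaultMicrophoneInput());
```

The same hybrid configuration can be passed to a `SpeechSynthesizer` for the parallel cloud and embedded synthesis described above.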

## Cloud speech

For cloud speech, you use the `SpeechConfig` object, as shown in the [speech-to-text quickstart](get-started-speech-to-text.md) and [text-to-speech quickstart](get-started-text-to-speech.md). To run the quickstarts for embedded speech, you can replace `SpeechConfig` with `EmbeddedSpeechConfig` or `HybridSpeechConfig`. Most of the other speech recognition and synthesis code is the same, whether you use cloud, embedded, or hybrid configuration.

## Next steps

- [Quickstart: Recognize and convert speech to text](get-started-speech-to-text.md)
- [Quickstart: Convert text to speech](get-started-text-to-speech.md)

articles/cognitive-services/Speech-Service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -361,6 +361,8 @@ items:
 - name: Generate a REST API client library
   href: swagger-documentation.md
   displayName: rest
+- name: Embedded Speech
+  href: embedded-speech.md
 - name: Speech containers
   items:
   - name: Cognitive Services containers documentation

articles/cognitive-services/cognitive-services-limited-access.md

Lines changed: 4 additions & 1 deletion
@@ -7,7 +7,7 @@ author: PatrickFarley
 manager: nitinme
 ms.service: cognitive-services
 ms.topic: conceptual
-ms.date: 06/16/2022
+ms.date: 10/27/2022
 ms.author: pafarley
 ---

@@ -25,6 +25,7 @@ Limited Access services are made available to customers under the terms governin

 The following services are Limited Access:

+- [Embedded Speech](/legal/cognitive-services/speech-service/embedded-speech/limited-access-embedded-speech?context=/azure/cognitive-services/speech-service/context/context): All features
 - [Custom Neural Voice](/legal/cognitive-services/speech-service/custom-neural-voice/limited-access-custom-neural-voice?context=/azure/cognitive-services/speech-service/context/context): Pro features
 - [Speaker Recognition](/legal/cognitive-services/speech-service/speaker-recognition/limited-access-speaker-recognition?context=/azure/cognitive-services/speech-service/context/context): All features
 - [Face API](/legal/cognitive-services/computer-vision/limited-access-identity?context=/azure/cognitive-services/computer-vision/context/context): Identify and Verify features, face ID property

@@ -39,6 +40,7 @@ Features of these services that aren't listed above are available without regist

 Submit a registration form for each Limited Access service you would like to use:

+- [Embedded Speech](https://aka.ms/csgate-embedded-speech): All features
 - [Custom Neural Voice](https://aka.ms/customneural): Pro features
 - [Speaker Recognition](https://aka.ms/azure-speaker-recognition): All features
 - [Face API](https://aka.ms/facerecognition): Identify and Verify features

@@ -67,6 +69,7 @@ Existing customers have until June 30, 2023 to submit a registration form and be

 The registration forms can be found here:

+- [Embedded Speech](https://aka.ms/csgate-embedded-speech): All features
 - [Custom Neural Voice](https://aka.ms/customneural): Pro features
 - [Speaker Recognition](https://aka.ms/azure-speaker-recognition): All features
 - [Face API](https://aka.ms/facerecognition): Identify and Verify features
