articles/cognitive-services/Speech-Service/speech-sdk.md
+28 -17 lines changed
@@ -8,47 +8,47 @@ manager: nitinme
ms.service: cognitive-services
ms.subservice: speech-service
ms.topic: conceptual
-ms.date: 03/27/2020
+ms.date: 04/03/2020
ms.author: dapine
---
# About the Speech SDK
-The Speech software development kit (SDK) exposes many of the Speech service capabilities, making it easier to develop speech-enabled applications. There are various SDKs available in many programming languages. All of the Speech SDKs are cross-platform, with the exception of the Objective-C SDK (which is only available on iOS and macOS).
+The Speech software development kit (SDK) exposes many of the Speech service capabilities, empowering you to develop speech-enabled applications. The Speech SDK is available in many programming languages, all of which work cross-platform, except for Objective-C, which is only available on iOS and macOS.

-The Speech SDK exposes many features from the Speech service, but not all of them. The capabilities of the Speech SDK are often associated to scenarios. It's ideal for both real-time and non-real-time scenarios, using local devices, files, and even input and output streams. There are [known limitations](#known-limitations) with the Speech SDK, where feature gaps exist. When a scenario is unachievable with the Speech SDK, look for a REST API alternative.
+The Speech SDK exposes many features from the Speech service, but not all of them. The capabilities of the Speech SDK are often associated with scenarios. The Speech SDK is ideal for both real-time and non-real-time scenarios, using local devices, files, Azure Blob Storage, and even input and output streams. When a scenario is not achievable with the Speech SDK, look for a REST API alternative.
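The scenario sketches below use the Python Speech SDK purely for illustration; they are minimal, hedged examples rather than the article's own samples. Each assumes the `azure-cognitiveservices-speech` package and placeholder credentials (`YourSubscriptionKey`, `westus`):

```python
# pip install azure-cognitiveservices-speech
import azure.cognitiveservices.speech as speechsdk

# Placeholder key and region -- substitute the values from your own Speech resource
speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
```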
### Speech-to-text
-Speech-to-text (also known as *speech recognition*) transcribes audio streams to text that your applications, tools, or devices can consume or display. Use speech-to-text with [Language Understanding (LUIS)](https://docs.microsoft.com/azure/cognitive-services/luis) to derive user intents from transcribed speech and act on voice commands. For more information, see [Speech-to-text basics](speech-to-text-basics.md).
+[Speech-to-text](speech-to-text.md) (also known as *speech recognition*) transcribes audio streams to text that your applications, tools, or devices can consume or display. Use speech-to-text with [Language Understanding (LUIS)](../luis/index.yml) to derive user intents from transcribed speech and act on voice commands. Use [Speech Translation](speech-translation.md) to translate speech input to a different language with a single call. For more information, see [Speech-to-text basics](speech-to-text-basics.md).
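A minimal sketch of this scenario, building on the configuration above: recognize a single utterance from the default microphone.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
# With no audio_config, the recognizer listens on the default microphone
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

result = recognizer.recognize_once()  # blocks until one utterance is recognized
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized:", result.text)
```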
### Text-to-speech
-Text-to-speech (also known as *speech synthesis*) converts text into human-like synthesized speech, using the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). For more information on standard or neural voices, see [Text-to-speech language and voice support](language-support.md#text-to-speech).
+[Text-to-speech](text-to-speech.md) (also known as *speech synthesis*) converts text into human-like synthesized speech. The input can be either plain text strings or text that uses the [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup.md). For more information on standard or neural voices, see [Text-to-speech language and voice support](language-support.md#text-to-speech).
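A matching sketch for synthesis, again with placeholder credentials; with no audio configuration, the synthesized audio plays on the default speaker. SSML input would go through `speak_ssml_async` instead.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Plain-text input; blocks until synthesis has finished
result = synthesizer.speak_text_async("Hello, world!").get()
if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print("Synthesis finished.")
```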
-### Keyword spotting
+### Voice assistants
-The concept of [keyword spotting](speech-devices-sdk-create-kws.md) is supported in the Speech SDK. Keyword spotting is the act of identifying a keyword in speech, followed by an action upon hearing the keyword. For example, "Hey Cortana" would activate the Cortana assistant.
+Voice assistants using the Speech SDK enable developers to create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant. The implementation uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands (Preview) service for task completion. Additionally, voice assistants can be built with the [Custom Voice Portal](https://aka.ms/customvoice) to create a unique voice experience.
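A hedged sketch of connecting to a bot over Direct Line Speech with the Python SDK's `dialog` namespace. It assumes the Speech resource is already linked to a bot through the Direct Line Speech channel; credentials are placeholders, and method names follow the async-with-`.get()` pattern used elsewhere in the Python SDK.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; the resource must be linked to a Direct Line Speech bot
config = speechsdk.dialog.BotFrameworkConfig.from_subscription("YourSubscriptionKey", "westus")
connector = speechsdk.dialog.DialogServiceConnector(config)

# Bot replies arrive as Bot Framework activities (JSON strings)
connector.activity_received.connect(lambda evt: print("Activity:", evt.activity))

connector.connect_async().get()
connector.listen_once_async().get()  # capture one utterance and send it to the bot
```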
-###Voice assistants
+#### Keyword spotting
-Voice assistants using the Speech SDK enable developers to create natural, human-like conversational interfaces for their applications and experiences. The voice assistant service provides fast, reliable interaction between a device and an assistant. The implementation uses the Bot Framework's Direct Line Speech channel or the integrated Custom Commands (Preview) service for task completion.
+The concept of [keyword spotting](speech-devices-sdk-create-kws.md) is supported in the Speech SDK. Keyword spotting is the act of identifying a keyword in speech, followed by an action upon hearing the keyword. For example, "Hey Cortana" would activate the Cortana assistant.
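A hedged Python sketch of keyword-triggered recognition. The `hey_computer.table` file is a hypothetical stand-in for a keyword model trained in the portal, and the sketch assumes the recognizer exposes `start_keyword_recognition` as the other SDK languages do.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# "hey_computer.table" is a hypothetical trained keyword model file
model = speechsdk.KeywordRecognitionModel("hey_computer.table")

# Fires for the keyword itself and for the speech that follows it
recognizer.recognized.connect(lambda evt: print("Recognized:", evt.result.text))

recognizer.start_keyword_recognition(model)  # listen locally until the keyword is heard
```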
### Meeting scenarios
The Speech SDK is perfect for transcribing meeting scenarios, whether from a single device or multi-device conversation.
#### Conversation Transcription
-Enables real-time (and asynchronous) speech recognition, speaker identification, and sentence attribution to each speaker (also known as *diarization*). It's perfect for transcribing in-person meetings with the ability to distinguish speakers.
+[Conversation Transcription](conversation-transcription.md) enables real-time (and asynchronous) speech recognition, speaker identification, and sentence attribution to each speaker (also known as *diarization*). It's perfect for transcribing in-person meetings with the ability to distinguish speakers.
#### Multi-device Conversation
-Connect multiple devices or clients in a conversation to send speech-based or text-based messages, with easy support for transcription and translation.
+With [Multi-device Conversation](multi-device-conversation.md), connect multiple devices or clients in a conversation to send speech-based or text-based messages, with easy support for transcription and translation.
### Custom / agent scenarios
@@ -60,23 +60,34 @@ A common scenario for speech-to-text is transcribing large volumes of telephony
### Codec compressed audio input
-Several of the Speech SDKs' support codec compressed audio input streams. For more information, see <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams" target="_blank">use compressed audio input formats <span class="docon docon-navigate-external x-hidden-focus"></span></a>.
+Several of the Speech SDK programming languages support codec compressed audio input streams. For more information, see <a href="https://docs.microsoft.com/azure/cognitive-services/speech-service/how-to-use-codec-compressed-audio-input-streams" target="_blank">use compressed audio input formats <span class="docon docon-navigate-external x-hidden-focus"></span></a>.
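A sketch of compressed input in Python (MP3 chosen arbitrarily; the input file name is a placeholder). Note the linked article's caveat that handling compressed formats also requires GStreamer on the host in several environments.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")

# Declare the compressed container format of the incoming stream (MP3 here)
stream_format = speechsdk.audio.AudioStreamFormat(
    compressed_stream_format=speechsdk.AudioStreamContainerFormat.MP3)
push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# "speech.mp3" is a placeholder input file; its bytes are pushed into the stream as-is
with open("speech.mp3", "rb") as audio_file:
    push_stream.write(audio_file.read())
push_stream.close()

print(recognizer.recognize_once().text)
```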
-## Known limitations
+## REST API
-While the Speech SDK covers many feature capabilities with various scenarios, there are known limitations. Certain functionalities are only available from the Azure portal, Custom Speech portal, Custom voice portal, or the REST API. As an example, endpoint management is not possible through the Speech SDK.
+While the Speech SDK covers many feature capabilities of the Speech Service, for some scenarios you might want to use the REST API. Certain functionalities are only available from the Azure portal, Custom Speech portal, Custom Voice portal, or the REST API. As an example, endpoint management is only exposed via the REST API.
+
+> [!TIP]
+> When relying on the REST API, use the <a href="https://editor.swagger.io/" target="_blank">Swagger Editor <span class="docon docon-navigate-external x-hidden-focus"></span></a> to automatically generate client libraries.
+> For example, to generate a Batch transcription client library:
+> 1. Select **Generate Client** and choose your desired programming language
### Batch transcription
-Batch transcription enables asynchronous speech-to-text transcription of large volumes of data. It is a REST-based service however, which uses the same endpoint as customization and model management. Batch transcription is only possible from the REST API.
+[Batch transcription](batch-transcription.md) enables asynchronous speech-to-text transcription of large volumes of data. Batch transcription is only possible from the REST API.
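A hedged sketch of starting a batch job with Python's `requests`. The endpoint and payload follow the v2.0 API shape of this era, and every value (region, SAS URL, key) is a placeholder -- consult the Batch transcription REST reference for the current contract.

```python
import requests

region = "westus"  # placeholder region
url = f"https://{region}.cris.ai/api/speechtotext/v2.0/transcriptions"  # assumed v2.0 endpoint

body = {
    # SAS URL placeholder pointing at audio in Azure Blob Storage
    "recordingsUrl": "https://contoso.blob.core.windows.net/audio/meeting.wav?sv=...",
    "locale": "en-US",
    "name": "Batch transcription sketch",
    "models": [],  # an empty list selects the baseline models
}
headers = {
    "Ocp-Apim-Subscription-Key": "YourSubscriptionKey",
    "Content-Type": "application/json",
}

response = requests.post(url, json=body, headers=headers)
# On acceptance, the Location header points at the transcription resource to poll
print(response.status_code, response.headers.get("Location"))
```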
+
+## Customization
+
+The Speech Service delivers great functionality with its default models across speech-to-text, text-to-speech, and speech translation. Sometimes you may want to increase the baseline performance to work even better with your unique use case. The Speech Service has a variety of no-code customization tools that make customization easy and allow you to create a competitive advantage with custom models based on your own data. These models are available only to you and your organization.
### Custom Speech-to-text
-When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. The creation and management of Custom Speech models is only available through the [Custom Speech Portal](https://aka.ms/customspeech), and not the Speech SDK. However, once the Custom Speech model is published it can be consumed by the Speech SDK.
+When using speech-to-text for recognition and transcription in a unique environment, you can create and train custom acoustic, language, and pronunciation models to address ambient noise or industry-specific vocabulary. The creation and management of no-code Custom Speech models is available through the [Custom Speech Portal](https://aka.ms/customspeech). Once the Custom Speech model is published, it can be consumed by the Speech SDK.
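Consuming a published Custom Speech model is a small change to the earlier recognition sketch: point the configuration at the model's endpoint ID. The ID below is a placeholder copied from the portal.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
# Endpoint ID of the published Custom Speech model (placeholder from the Custom Speech Portal)
speech_config.endpoint_id = "YourEndpointId"

recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
print(recognizer.recognize_once().text)
```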
### Custom Text-to-speech
-Custom text-to-speech, also known as Custom Voice is a set of online tools that allow you to create a recognizable, one-of-a-kind voice for your brand. The creation and management of Custom Voice models is only available through the [Custom Voice Portal](https://aka.ms/customvoice), and not the Speech SDK. However, once the Custom Voice model is published it can be consumed by the Speech SDK.
+Custom text-to-speech, also known as Custom Voice, is a set of online tools that allow you to create a recognizable, one-of-a-kind voice for your brand. The creation and management of no-code Custom Voice models is available through the [Custom Voice Portal](https://aka.ms/customvoice). Once the Custom Voice model is published, it can be consumed by the Speech SDK.
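Similarly, a published Custom Voice model can be consumed by pointing the synthesis configuration at its deployment; the deployment ID and voice name below are placeholders from the Custom Voice Portal.

```python
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="westus")
speech_config.endpoint_id = "YourDeploymentId"  # placeholder deployment ID
speech_config.speech_synthesis_voice_name = "YourCustomVoiceName"  # placeholder voice name

synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
synthesizer.speak_text_async("Hello from a custom voice.").get()
```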