---
description: The release notes provide a log of updates, enhancements, bug fixes, and changes to the Speech Devices SDK. This article is updated with each release of the Speech Devices SDK.
services: cognitive-services
author: wsturman
manager: nitinme
ms.date: 07/10/2019
ms.author: wellsi
---

# Release notes: Speech Devices SDK

The following sections list changes in the most recent releases.

## Speech Devices SDK 1.6.0:

- Support for [Azure Kinect DK](https://azure.microsoft.com/services/kinect-dk/) on Windows and Linux, with a common [sample application](https://aka.ms/sdsdk-download).
- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.6.0. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).
## Speech Devices SDK 1.5.1:

- Included [Conversation Transcription](conversation-transcription-service.md) in the sample app.
- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.5.1. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).

## Speech Devices SDK 1.5.0:

- The Speech Devices SDK is now generally available (GA) and no longer a gated preview.
- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.5.0. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).
- New keyword technology brings significant quality improvements; see Breaking changes.
- New audio processing pipeline for improved far-field recognition.

**Breaking changes**

- Because of the new keyword technology, all keywords must be re-created at our improved keyword portal. To fully remove old keywords from the device, uninstall the old app.

## Speech Devices SDK 1.4.0:

- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.4.0. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).

## Speech Devices SDK 1.3.1:

- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.3.1. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).
- Updated keyword handling; see Breaking changes.
- The sample application adds a choice of language for both speech recognition and translation.

**Breaking changes**

- [Installing a keyword](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-devices-sdk-create-kws) has been simplified; it's now part of the app and doesn't need separate installation on the device.
- Keyword recognition has changed, and two events are supported:
  - `RecognizingKeyword` indicates that the speech result contains (unverified) keyword text.
  - `RecognizedKeyword` indicates that keyword recognition completed recognizing the given keyword.
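
  As a minimal sketch of how an app might observe these two events with the Java Speech SDK — the subscription key, region, and keyword-model file name below are placeholders, not values from these release notes:

  ```java
  import com.microsoft.cognitiveservices.speech.*;
  import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

  public class KeywordEvents {
      public static void main(String[] args) throws Exception {
          SpeechConfig config = SpeechConfig.fromSubscription("<subscription-key>", "<region>");
          AudioConfig audio = AudioConfig.fromDefaultMicrophoneInput();
          SpeechRecognizer recognizer = new SpeechRecognizer(config, audio);

          // Fired while audio that may contain the keyword is still being verified.
          recognizer.recognizing.addEventListener((s, e) -> {
              if (e.getResult().getReason() == ResultReason.RecognizingKeyword) {
                  System.out.println("RecognizingKeyword: " + e.getResult().getText());
              }
          });

          // Fired once keyword recognition has completed for the given keyword.
          recognizer.recognized.addEventListener((s, e) -> {
              if (e.getResult().getReason() == ResultReason.RecognizedKeyword) {
                  System.out.println("RecognizedKeyword: " + e.getResult().getText());
              }
          });

          // "kws.table" stands in for a keyword model created at the keyword portal.
          KeywordRecognitionModel model = KeywordRecognitionModel.fromFile("kws.table");
          recognizer.startKeywordRecognitionAsync(model).get();
          System.in.read(); // keep the process alive while listening
      }
  }
  ```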

## Speech Devices SDK 1.1.0:

- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.1.0. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).
- Far-field speech recognition accuracy has been improved with our enhanced audio processing algorithm.
- The sample application added Chinese speech recognition support.

## Speech Devices SDK 1.0.1:

- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 1.0.1. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).
- Speech recognition accuracy is improved with our enhanced audio processing algorithm.
- Fixed a bug in continuous recognition audio sessions.

**Breaking changes**

- This release introduces a number of breaking changes. Check [this page](https://aka.ms/csspeech/breakingchanges_1_0_0) for details relating to the APIs.
- The KWS model files aren't compatible with Speech Devices SDK 1.0.1. The existing keyword files are deleted after the new keyword files are written to the device.

## Speech Devices SDK 0.5.0:

- Improved the accuracy of speech recognition by fixing a bug in the audio processing code.
- Updated the [Speech SDK](https://docs.microsoft.com/azure/cognitive-services/speech-service/speech-sdk-reference) component to version 0.5.0. For more information, see its [release notes](https://aka.ms/csspeech/whatsnew).

**`articles/cognitive-services/Speech-Service/faq-stt.md`** (13 additions, 13 deletions)

---
title: Frequently asked questions about the Speech to Text service in Azure
titleSuffix: Azure Cognitive Services
description: Get answers to frequently asked questions about the Speech to Text service.
services: cognitive-services
author: PanosPeriorellis
manager: nitinme

**Q: What is the difference between a baseline model and a custom Speech to Text model?**

**A**: A baseline model has been trained by using Microsoft-owned data and is already deployed in the cloud. You can use a custom model to adapt a model to better fit a specific environment that has specific ambient noise or language. Factory floors, cars, or noisy streets would require an adapted acoustic model. Topics like biology, physics, radiology, product names, and custom acronyms would require an adapted language model.
**A**: The current limit for a dataset is 2 GB. The limit is due to the restriction on the size of a file for HTTP upload.

**Q: Can I zip my text files so I can upload a larger text file?**

**A**: No. Currently, only uncompressed text files are allowed.
**Q: The data report says there were failed utterances. What is the issue?**

**A**: Failing to upload 100 percent of the utterances in a file is not a problem. If the vast majority of the utterances in an acoustic or language dataset (for example, more than 95 percent) are successfully imported, the dataset can be usable. However, we recommend that you try to understand why the utterances failed and fix the problems. Most common problems, such as formatting errors, are easy to fix.

## Creating an acoustic model
**Q: What data should I collect?**

**A**: Collect data that's as close to the application scenario and use case as possible. The data collection should match the target application and users in terms of device or devices, environments, and types of speakers. In general, you should collect data from as broad a range of speakers as possible.

**Q: How should I collect acoustic data?**

**A**: You can create a standalone data collection application or use off-the-shelf audio recording software. You can also create a version of your application that logs the audio data and then uses the data.
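
For instance, a bare-bones recorder using the standard `javax.sound.sampled` API could capture WAV clips for later upload — a sketch that assumes 16-kHz, 16-bit mono PCM suits your scenario, which is not a requirement stated in this FAQ:

```java
import javax.sound.sampled.*;
import java.io.File;

public class CollectAudio {
    public static void main(String[] args) throws Exception {
        // 16 kHz, 16-bit, mono, little-endian PCM — a typical format for speech data.
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();

        // Write the captured audio to a WAV file on a background thread;
        // AudioSystem.write blocks until the line is closed.
        Thread writer = new Thread(() -> {
            try {
                AudioSystem.write(new AudioInputStream(line),
                        AudioFileFormat.Type.WAVE, new File("utterance.wav"));
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        });
        writer.start();

        Thread.sleep(5000); // capture roughly five seconds of speech
        line.stop();
        line.close();
        writer.join();
    }
}
```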
**A**: The results show a comparison between the baseline model and the model you customized. You should aim to beat the baseline model to make customization worthwhile.

**Q: How do I determine the WER of a base model so I can see if there was an improvement?**

**A**: The offline test results show the baseline accuracy of the custom model and the improvement over baseline.
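
For reference, the conventional formula behind these numbers — insertions, deletions, and substitutions counted against the total number of words in the reference transcript — is:

```latex
\mathrm{WER} = \frac{I + D + S}{N}
```

For example, a 100-word reference transcript recognized with 1 insertion, 2 deletions, and 3 substitutions gives a WER of 6/100, or 6 percent.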
**Q: Can I just upload a list of words?**

**A**: Uploading a list of words will add the words to the vocabulary, but it won't teach the system how the words are typically used. By providing full or partial utterances (sentences or phrases of things that users are likely to say), the language model can learn the new words and how they are used. The custom language model is good not only for adding new words to the system, but also for adjusting the likelihood of known words for your application. Providing full utterances helps the system learn better.
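
As an invented illustration (these lines are not from the FAQ), a language-data file built from full utterances might look like the following, rather than a bare word list:

```text
schedule a biopsy review with the radiology team for tuesday
what does the contoso pressure gauge measure
add the contoso gauge readings to the patient report
```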
## Tenant Model (Custom Speech with Office 365 data)
**Q: What information is included in the Tenant Model, and how is it created?**

**A:** A Tenant Model is built using [public group](https://support.office.com/article/learn-about-office-365-groups-b565caa1-5c40-40ef-9915-60fdb2d97fa2) emails and documents that can be seen by anyone in your organization.

**Q: What speech experiences are improved by the Tenant Model?**

**A:** When the Tenant Model is enabled, created, and published, it is used to improve recognition for any enterprise application built with the Speech Service that also passes a user AAD token indicating membership in the enterprise.

The speech experiences built into Office 365, such as Dictation and PowerPoint Captioning, aren't changed when you create a Tenant Model for your Speech Service applications.
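
As a rough sketch of the client side of that flow with the Java Speech SDK — `getAadUserToken()` is a hypothetical helper standing in for however your app acquires the signed-in user's AAD token, and the exact token format the Tenant Model expects isn't covered here:

```java
import com.microsoft.cognitiveservices.speech.*;

public class TenantModelRecognition {
    public static void main(String[] args) throws Exception {
        // Hypothetical helper: obtain the signed-in user's AAD token (for example, via MSAL).
        String aadToken = getAadUserToken();

        // Authenticate with the user's token so the service can associate the
        // request with the enterprise tenant.
        SpeechConfig config = SpeechConfig.fromAuthorizationToken(aadToken, "<region>");
        SpeechRecognizer recognizer = new SpeechRecognizer(config);

        SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
        System.out.println("Recognized: " + result.getText());
        recognizer.close();
    }

    private static String getAadUserToken() {
        return "<AAD token acquired elsewhere>";
    }
}
```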

**`articles/cognitive-services/Speech-Service/faq-text-to-speech.md`** (4 additions, 4 deletions)

---
title: Frequently asked questions about the Text to Speech service in Azure
titleSuffix: Azure Cognitive Services
description: Get answers to frequently asked questions about the Text to Speech service.
services: cognitive-services
author: PanosPeriorellis
manager: nitinme

**Q: What is the difference between a standard voice model and a custom voice model?**

**A**: The standard voice model (also called a _voice font_) has been trained by using Microsoft-owned data and is already deployed in the cloud. You can use a custom voice model either to adapt an average model and transfer the timbre and expression of the speaker’s voice style, or to train a full, new model based on the training data prepared by the user. Today, more and more customers want a one-of-a-kind, branded voice for their bots. The custom voice-building platform is the right choice for that option.

**`articles/cognitive-services/Speech-Service/how-to-select-audio-input-devices.md`** (4 additions, 2 deletions)

---
title: How to select an audio input device with the Speech SDK - Speech Service
titleSuffix: Azure Cognitive Services
description: Learn about selecting audio input devices in the Speech SDK (C++, C#, Python, Objective-C, Java, JavaScript) by obtaining the IDs of the audio devices connected to a system.
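
A minimal Java sketch of what device selection looks like once you have an ID — the ID string and credentials are placeholders, and the article itself covers how each platform enumerates device IDs:

```java
import com.microsoft.cognitiveservices.speech.*;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;

public class SelectMicrophone {
    public static void main(String[] args) throws Exception {
        // Placeholder: a platform-specific device ID obtained from the OS's
        // audio-device enumeration APIs.
        String deviceId = "<audio device ID>";

        SpeechConfig config = SpeechConfig.fromSubscription("<subscription-key>", "<region>");
        AudioConfig audio = AudioConfig.fromMicrophoneInput(deviceId);

        SpeechRecognizer recognizer = new SpeechRecognizer(config, audio);
        SpeechRecognitionResult result = recognizer.recognizeOnceAsync().get();
        System.out.println("Recognized: " + result.getText());
        recognizer.close();
    }
}
```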