Commit c787984

Merge pull request #284602 from cdpark/refresh-ai-recognize
Feature 294024: Q&M: AI Services freshness for 180d target - Recognize speech
2 parents 653c8cb + d849656 commit c787984

12 files changed: +103 -106 lines changed


articles/ai-services/speech-service/how-to-recognize-speech.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ author: eric-urban
  manager: nitinme
  ms.service: azure-ai-speech
  ms.topic: how-to
- ms.date: 1/21/2024
+ ms.date: 08/13/2024
  ms.author: eur
  ms.devlang: cpp
  ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python

@@ -56,7 +56,7 @@ keywords: speech to text, speech to text software
  [!INCLUDE [CLI include](includes/how-to/recognize-speech/cli.md)]
  ::: zone-end

- ## Next steps
+ ## Related content

  * [Try the speech to text quickstart](get-started-speech-to-text.md)
  * [Improve recognition accuracy with custom speech](custom-speech-overview.md)

articles/ai-services/speech-service/includes/how-to/recognize-speech/cli.md

Lines changed: 4 additions & 5 deletions
@@ -2,7 +2,7 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ---

@@ -21,11 +21,11 @@ spx recognize --microphone
  > [!NOTE]
  > The Speech CLI defaults to English. You can choose a different language [from the speech to text table](../../../../language-support.md?tabs=stt). For example, add `--source de-DE` to recognize German speech.

- Speak into the microphone, and you can see transcription of your words into text in real-time. The Speech CLI stops after a period of silence, or when you select **Ctrl+C**.
+ Speak into the microphone, and you can see transcription of your words into text in real time. The Speech CLI stops after a period of silence, or when you select **Ctrl+C**.

  ## Recognize speech from a file

- The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any *.wav* file (16 KHz or 8 KHz, 16-bit, and mono PCM) that contains English speech. Or if you want a quick sample, download the <a href="https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/whatstheweatherlike.wav" download="whatstheweatherlike" target="_blank">whatstheweatherlike.wav <span class="docon docon-download x-hidden-focus"></span></a> file, and copy it to the same directory as the Speech CLI binary file.
+ The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any *.wav* file (16 kHz or 8 kHz, 16-bit, and mono PCM) that contains English speech. Or if you want a quick sample, download the file [whatstheweatherlike.wav](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/whatstheweatherlike.wav), and copy it to the same directory as the Speech CLI binary file.

  Use the following command to run the Speech CLI to recognize speech found in the audio file:

@@ -42,5 +42,4 @@ The Speech CLI shows a text transcription of the speech on the screen.

  Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

- For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
-
+ For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

articles/ai-services/speech-service/includes/how-to/recognize-speech/cpp.md

Lines changed: 12 additions & 13 deletions
@@ -2,20 +2,20 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ---

  [!INCLUDE [Header](../../common/cpp.md)]

  [!INCLUDE [Introduction](intro.md)]

- ## Create a speech configuration
+ ## Create a speech configuration instance

- To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](/cpp/cognitive-services/speech/speechconfig) instance. This class includes information about your subscription, like your key and associated location/region, endpoint, host, or authorization token.
+ To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](/cpp/cognitive-services/speech/speechconfig) instance. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

- 1. Create a `SpeechConfig` instance by using your key and region.
- 1. Create a Speech resource on the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices).
+ 1. Create a Speech resource in the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices). Get the Speech resource key and region.
+ 1. Create a `SpeechConfig` instance by using the following code. Replace `YourSpeechKey` and `YourSpeechRegion` with your Speech resource key and region.

  ```cpp
  using namespace std;

@@ -48,11 +48,11 @@ auto result = speechRecognizer->RecognizeOnceAsync().get();
  cout << "RECOGNIZED: Text=" << result->Text << std::endl;
  ```

- If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. For more information on how to get the device ID for your audio input device, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md)
+ If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. To learn how to get the device ID, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md).

  ## Recognize speech from a file

- If you want to recognize speech from an audio file instead of using a microphone, you still need to create an `AudioConfig` instance. But for this case you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:
+ If you want to recognize speech from an audio file instead of using a microphone, you still need to create an `AudioConfig` instance. However, you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:

  ```cpp
  using namespace Microsoft::CognitiveServices::Speech::Audio;

@@ -110,7 +110,7 @@ switch (result->Reason)

  ## Continuous recognition

- Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the `Recognizing`, `Recognized`, and `Canceled` events to get the recognition results. To stop recognition, you must call [StopContinuousRecognitionAsync](/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync). Here's an example of how continuous recognition is performed on an audio input file.
+ Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the `Recognizing`, `Recognized`, and `Canceled` events to get the recognition results. To stop recognition, you must call [StopContinuousRecognitionAsync](/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync). Here's an example of continuous recognition performed on an audio input file.

  Start by defining the input and initializing [`SpeechRecognizer`](/cpp/cognitive-services/speech/speechrecognizer):
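
The continuous recognition pattern this hunk describes is the same in every SDK language: subscribe to the events, start recognition, and call the stop method when you're done. A minimal C# sketch of that flow, assuming placeholder key, region, and file names:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        // Replace YourSpeechKey and YourSpeechRegion with your Speech resource key and region.
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
        using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var stopRecognition = new TaskCompletionSource<int>();

        // Partial hypotheses while audio is still being processed.
        speechRecognizer.Recognizing += (s, e) =>
            Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");

        // Final result for each recognized phrase.
        speechRecognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
        };

        // Errors or end of the audio stream.
        speechRecognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");
            stopRecognition.TrySetResult(0);
        };
        speechRecognizer.SessionStopped += (s, e) => stopRecognition.TrySetResult(0);

        await speechRecognizer.StartContinuousRecognitionAsync();
        await stopRecognition.Task;
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
```

The `TaskCompletionSource` keeps the program alive until the `Canceled` or `SessionStopped` event fires.
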

@@ -192,13 +192,13 @@ A common task for speech recognition is specifying the input (or source) languag
  speechConfig->SetSpeechRecognitionLanguage("de-DE");
  ```

- [`SetSpeechRecognitionLanguage`](/cpp/cognitive-services/speech/speechconfig#setspeechrecognitionlanguage) is a parameter that takes a string as an argument. For more information, see the [list of supported speech to text locales](../../../language-support.md?tabs=stt).
+ [`SetSpeechRecognitionLanguage`](/cpp/cognitive-services/speech/speechconfig#setspeechrecognitionlanguage) is a parameter that takes a string as an argument. For a list of supported locales, see [Language and voice support for the Speech service](../../../language-support.md).

  ## Language identification

- You can use [language identification](../../../language-identification.md?pivots=programming-language-cpp#use-speech-to-text) with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
+ You can use language identification with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.

- For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-cpp#use-speech-to-text).
+ For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-cpp).

  ## Use a custom endpoint

@@ -214,5 +214,4 @@ auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);

  Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

- For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
-
+ For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
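
The container note above (repeated in the CLI and C# includes) says to swap the key-and-region initialization for a container host URL. In C#, that change is roughly a one-liner; the `ws://localhost:5000` host is an assumption for a speech-to-text container running locally on its default port:

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

// Initialize from a container host URL instead of a key and region.
// Assumes a local speech-to-text container listening on port 5000.
var speechConfig = SpeechConfig.FromHost(new Uri("ws://localhost:5000"));
var speechRecognizer = new SpeechRecognizer(speechConfig);
```
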

articles/ai-services/speech-service/includes/how-to/recognize-speech/csharp.md

Lines changed: 21 additions & 21 deletions
@@ -2,7 +2,7 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ms.custom: devx-track-csharp
  ---

@@ -11,12 +11,12 @@ ms.custom: devx-track-csharp

  [!INCLUDE [Introduction](intro.md)]

- ## Create a speech configuration
+ ## Create a speech configuration instance

- To call the Speech service by using the Speech SDK, you need to create a [`SpeechConfig`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig) instance. This class includes information about your subscription, like your key and associated location/region, endpoint, host, or authorization token.
+ To call the Speech service by using the Speech SDK, you need to create a [`SpeechConfig`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig) instance. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

- 1. Create a `SpeechConfig` instance by using your key and location/region.
- 1. Create a Speech resource on the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices).
+ 1. Create a Speech resource in the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices). Get the Speech resource key and region.
+ 1. Create a `SpeechConfig` instance by using the following code. Replace `YourSpeechKey` and `YourSpeechRegion` with your Speech resource key and region.

  ```csharp
  using System;
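
The diff context stops at the top of the C# sample. The step the updated list item describes amounts to the following sketch, using the same `YourSpeechKey` and `YourSpeechRegion` placeholders:

```csharp
using Microsoft.CognitiveServices.Speech;

class Program
{
    static void Main()
    {
        // Replace YourSpeechKey and YourSpeechRegion with your Speech resource key and region.
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");
    }
}
```
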
@@ -74,11 +74,11 @@ class Program
  }
  ```

- If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. Learn [how to get the device ID](../../../how-to-select-audio-input-devices.md) for your audio input device.
+ If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. To learn how to get the device ID, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md).

  ## Recognize speech from a file

- If you want to recognize speech from an audio file instead of a microphone, you still need to create an `AudioConfig` instance. But for this case you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:
+ If you want to recognize speech from an audio file instead of a microphone, you still need to create an `AudioConfig` instance. However, you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:

  ```csharp
  using System;
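
The context again cuts off at the using directives. The two paragraphs changed in this hunk, recognizing from a file with `FromWavFileInput()` and picking a specific microphone by device ID, look roughly like this (key, region, path, and device ID are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Read audio from a file instead of the default microphone.
        // For a specific microphone, use AudioConfig.FromMicrophoneInput("<device ID>") instead.
        using var audioConfig = AudioConfig.FromWavFileInput("PathToFile.wav");
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }
}
```
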
@@ -110,9 +110,9 @@ class Program

  For many use cases, it's likely that your audio data comes from Azure Blob Storage, or it's otherwise already in memory as a `byte[]` instance or a similar raw data structure. The following example uses [`PushAudioInputStream`](/dotnet/api/microsoft.cognitiveservices.speech.audio.pushaudioinputstream) to recognize speech, which is essentially an abstracted memory stream. The sample code does the following actions:

- * Writes raw audio data (PCM) to `PushAudioInputStream` by using the `Write()` function, which accepts a `byte[]` instance.
+ * Writes raw audio data to `PushAudioInputStream` by using the `Write()` function, which accepts a `byte[]` instance.
  * Reads a *.wav* file by using `FileReader` for demonstration purposes. If you already have audio data in a `byte[]` instance, you can skip directly to writing the content to the input stream.
- * The default format is 16-bit, 16-KHz mono pulse-code modulation (PCM) data. To customize the format, you can pass an [`AudioStreamFormat`](/dotnet/api/microsoft.cognitiveservices.speech.audio.audiostreamformat) object to `CreatePushStream()` by using the static function `AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels)`.
+ * The default format is 16-bit, 16-kHz mono pulse-code modulation (PCM) data. To customize the format, you can pass an [`AudioStreamFormat`](/dotnet/api/microsoft.cognitiveservices.speech.audio.audiostreamformat) object to `CreatePushStream()` by using the static function `AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels)`.

  ```csharp
  using System;
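
The bullet list above summarizes the push-stream sample that the truncated context introduces. A condensed sketch of that flow follows; the 44-byte header skip reflects the recommendation in the next hunk and assumes a canonical WAV header with no extra chunks:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Default stream format: 16-bit, 16-kHz, mono PCM. Pass AudioStreamFormat.GetWaveFormatPCM(...)
        // to CreatePushStream() if your data uses a different format.
        using var pushStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Read a .wav file for demonstration; skip the header so the bytes written to the
        // stream begin at the raw PCM data.
        byte[] fileBytes = File.ReadAllBytes("whatstheweatherlike.wav");
        const int wavHeaderSize = 44; // assumption: canonical 44-byte WAV header
        pushStream.Write(fileBytes[wavHeaderSize..]);
        pushStream.Close();

        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
    }
}
```
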
@@ -149,7 +149,7 @@ class Program
  }
  ```

- Using a push stream as input assumes that the audio data is a raw PCM and skips any headers. The API still works in certain cases if the header hasn't been skipped. For the best results, consider implementing logic to read off the headers so that `byte[]` begins at the *start of the audio data*.
+ Using a push stream as input assumes that the audio data is raw PCM and skips any headers. The API still works in certain cases if the header isn't skipped. For the best results, consider implementing logic to read off the headers so that `byte[]` begins at the *start of the audio data*.

  ## Handle errors

@@ -267,13 +267,13 @@ A common task for speech recognition is specifying the input (or source) languag
  speechConfig.SpeechRecognitionLanguage = "it-IT";
  ```

- The [`SpeechRecognitionLanguage`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.speechrecognitionlanguage) property expects a language-locale format string. For more information, see the [list of supported speech to text locales](../../../language-support.md?tabs=stt).
+ The [`SpeechRecognitionLanguage`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.speechrecognitionlanguage) property expects a language-locale format string. For a list of supported locales, see [Language and voice support for the Speech service](../../../language-support.md?tabs=stt).

  ## Language identification

- You can use [language identification](../../../language-identification.md?pivots=programming-language-csharp#use-speech-to-text) with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
+ You can use language identification with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.

- For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-csharp#use-speech-to-text).
+ For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-csharp).

  ## Use a custom endpoint
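
For the language identification paragraphs changed above, the basic wiring in C# looks something like the following sketch; the candidate languages are illustrative, and the linked article remains the complete sample:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Candidate languages the service should choose between.
        var autoDetectConfig = AutoDetectSourceLanguageConfig.FromLanguages(
            new[] { "en-US", "de-DE", "it-IT" });

        using var audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        using var speechRecognizer = new SpeechRecognizer(speechConfig, autoDetectConfig, audioConfig);

        var result = await speechRecognizer.RecognizeOnceAsync();
        var detected = AutoDetectSourceLanguageResult.FromResult(result);
        Console.WriteLine($"Detected {detected.Language}: {result.Text}");
    }
}
```
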

@@ -289,20 +289,20 @@ var speechRecognizer = new SpeechRecognizer(speechConfig);

  Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

- For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
+ For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

  ## Change how silence is handled

- If a user is expected to speak faster or slower than usual, the default behaviors for nonspeech silence in input audio might not result in what you expect. Common problems with silence handling include:
+ If a user speaks faster or slower than usual, the default behaviors for nonspeech silence in input audio might not result in what you expect. Common problems with silence handling include:

- - Fast-speech chaining many sentences together into a single recognition result, instead of breaking sentences into individual results.
- - Slow speech separating parts of a single sentence into multiple results.
- - A single-shot recognition ending too quickly while waiting for speech to begin.
+ - Fast-speech that chains many sentences together into a single recognition result, instead of breaking sentences into individual results.
+ - Slow speech that separates parts of a single sentence into multiple results.
+ - A single-shot recognition that ends too quickly while waiting for speech to begin.

  These problems can be addressed by setting one of two *timeout properties* on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

  - **Segmentation silence timeout** adjusts how much nonspeech audio is allowed within a phrase that's currently being spoken before that phrase is considered "done."
-   - *Higher* values generally make results longer and allow longer pauses from the speaker within a phrase but make results take longer to arrive. They can also make separate phrases combine together into a single result when set too high.
+   - *Higher* values generally make results longer and allow longer pauses from the speaker within a phrase but make results take longer to arrive. They can also combine separate phrases into a single result when set too high.
    - *Lower* values generally make results shorter and ensure more prompt and frequent breaks between phrases, but can also cause single phrases to separate into multiple results when set too low.
    - This timeout can be set to integer values between 100 and 5000, in milliseconds, with 500 a typical default.
  - **Initial silence timeout** adjusts how much nonspeech audio is allowed *before* a phrase before the recognition attempt ends in a "no match" result.
@@ -311,9 +311,9 @@ These problems can be addressed by setting one of two *timeout properties* on th
    - Because continuous recognition generates many results, this value determines how often "no match" results arrive but doesn't otherwise affect the content of recognition results.
    - This timeout can be set to any non-negative integer value, in milliseconds, or set to 0 to disable it entirely. 5000 is a typical default for single-shot recognition while 15000 is a typical default for continuous recognition.

- As there are tradeoffs when modifying these timeouts, you should only change the settings when you have a problem related to silence handling. Default values optimally handle most spoken audio and only uncommon scenarios should encounter problems.
+ Since there are tradeoffs when modifying these timeouts, you should only change the settings when you have a problem related to silence handling. Default values optimally handle most spoken audio and only uncommon scenarios should encounter problems.

- **Example:** Users speaking a serial number like "ABC-123-4567" might pause between character groups long enough for the serial number to be broken into multiple results. In this case, try a higher value like 2000 ms for the segmentation silence timeout:
+ **Example:** Users speaking a serial number like "ABC-123-4567" might pause between character groups long enough for the serial number to be broken into multiple results. In this case, try a higher value like 2000 milliseconds for the segmentation silence timeout:

  ```csharp
  speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");
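
The hunk ends at the segmentation timeout line. The initial silence timeout described in the bullets above is set the same way, through a property ID on the config; the 10000 ms value here is only an example:

```csharp
// Allow up to 10 seconds of leading silence before a "no match" result is returned.
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
```
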
