articles/ai-services/speech-service/includes/how-to/recognize-speech/cli.md
4 additions & 5 deletions
@@ -2,7 +2,7 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ---
@@ -21,11 +21,11 @@ spx recognize --microphone
  > [!NOTE]
  > The Speech CLI defaults to English. You can choose a different language [from the speech to text table](../../../../language-support.md?tabs=stt). For example, add `--source de-DE` to recognize German speech.

- Speak into the microphone, and you can see transcription of your words into text in real-time. The Speech CLI stops after a period of silence, or when you select **Ctrl+C**.
+ Speak into the microphone, and you can see transcription of your words into text in real time. The Speech CLI stops after a period of silence, or when you select **Ctrl+C**.

  ## Recognize speech from a file

- The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any *.wav* file (16 KHz or 8 KHz, 16-bit, and mono PCM) that contains English speech. Or if you want a quick sample, download the <a href="https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/whatstheweatherlike.wav" download="whatstheweatherlike" target="_blank">whatstheweatherlike.wav <span class="docon docon-download x-hidden-focus"></span></a> file, and copy it to the same directory as the Speech CLI binary file.
+ The Speech CLI can recognize speech in many file formats and natural languages. In this example, you can use any *.wav* file (16 kHz or 8 kHz, 16-bit, and mono PCM) that contains English speech. Or if you want a quick sample, download the file [whatstheweatherlike.wav](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/csharp/sharedcontent/console/whatstheweatherlike.wav), and copy it to the same directory as the Speech CLI binary file.

  Use the following command to run the Speech CLI to recognize speech found in the audio file:
@@ -42,5 +42,4 @@ The Speech CLI shows a text transcription of the speech on the screen.
  Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

- For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
-
+ For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
articles/ai-services/speech-service/includes/how-to/recognize-speech/cpp.md
12 additions & 13 deletions
@@ -2,20 +2,20 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ---

  [!INCLUDE [Header](../../common/cpp.md)]

  [!INCLUDE [Introduction](intro.md)]

- ## Create a speech configuration
+ ## Create a speech configuration instance

- To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](/cpp/cognitive-services/speech/speechconfig) instance. This class includes information about your subscription, like your key and associated location/region, endpoint, host, or authorization token.
+ To call the Speech service using the Speech SDK, you need to create a [`SpeechConfig`](/cpp/cognitive-services/speech/speechconfig) instance. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

- 1. Create a `SpeechConfig` instance by using your key and region.
- 1. Create a Speech resource on the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices).
+ 1. Create a Speech resource in the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices). Get the Speech resource key and region.
+ 1. Create a `SpeechConfig` instance by using the following code. Replace `YourSpeechKey` and `YourSpeechRegion` with your Speech resource key and region.

  ```cpp
  using namespace std;
@@ -48,11 +48,11 @@ auto result = speechRecognizer->RecognizeOnceAsync().get();
- If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. For more information on how to get the device ID for your audio input device, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md)
+ If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. To learn how to get the device ID, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md).

  ## Recognize speech from a file

- If you want to recognize speech from an audio file instead of using a microphone, you still need to create an `AudioConfig` instance. But for this case you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:
+ If you want to recognize speech from an audio file instead of using a microphone, you still need to create an `AudioConfig` instance. However, you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:

- Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the `Recognizing`, `Recognized`, and `Canceled` events to get the recognition results. To stop recognition, you must call [StopContinuousRecognitionAsync](/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync). Here's an example of how continuous recognition is performed on an audio input file.
+ Continuous recognition is a bit more involved than single-shot recognition. It requires you to subscribe to the `Recognizing`, `Recognized`, and `Canceled` events to get the recognition results. To stop recognition, you must call [StopContinuousRecognitionAsync](/cpp/cognitive-services/speech/speechrecognizer#stopcontinuousrecognitionasync). Here's an example of continuous recognition performed on an audio input file.

  Start by defining the input and initializing [`SpeechRecognizer`](/cpp/cognitive-services/speech/speechrecognizer):
@@ -192,13 +192,13 @@ A common task for speech recognition is specifying the input (or source) languag
- [`SetSpeechRecognitionLanguage`](/cpp/cognitive-services/speech/speechconfig#setspeechrecognitionlanguage) is a parameter that takes a string as an argument. For more information, see the [list of supported speech to text locales](../../../language-support.md?tabs=stt).
+ [`SetSpeechRecognitionLanguage`](/cpp/cognitive-services/speech/speechconfig#setspeechrecognitionlanguage) is a parameter that takes a string as an argument. For a list of supported locales, see [Language and voice support for the Speech service](../../../language-support.md).

  ## Language identification

- You can use [language identification](../../../language-identification.md?pivots=programming-language-cpp#use-speech-to-text) with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
+ You can use language identification with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.

- For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-cpp#use-speech-to-text).
+ For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-cpp).

  ## Use a custom endpoint
@@ -214,5 +214,4 @@ auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);
  Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

- For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
-
+ For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
articles/ai-services/speech-service/includes/how-to/recognize-speech/csharp.md
21 additions & 21 deletions
@@ -2,7 +2,7 @@
  author: eric-urban
  ms.service: azure-ai-speech
  ms.topic: include
- ms.date: 09/01/2023
+ ms.date: 08/13/2024
  ms.author: eur
  ms.custom: devx-track-csharp
  ---
@@ -11,12 +11,12 @@ ms.custom: devx-track-csharp
  [!INCLUDE [Introduction](intro.md)]

- ## Create a speech configuration
+ ## Create a speech configuration instance

- To call the Speech service by using the Speech SDK, you need to create a [`SpeechConfig`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig) instance. This class includes information about your subscription, like your key and associated location/region, endpoint, host, or authorization token.
+ To call the Speech service by using the Speech SDK, you need to create a [`SpeechConfig`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig) instance. This class includes information about your subscription, like your key and associated region, endpoint, host, or authorization token.

- 1. Create a `SpeechConfig` instance by using your key and location/region.
- 1. Create a Speech resource on the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices).
+ 1. Create a Speech resource in the [Azure portal](https://portal.azure.com/#create/Microsoft.CognitiveServicesSpeechServices). Get the Speech resource key and region.
+ 1. Create a `SpeechConfig` instance by using the following code. Replace `YourSpeechKey` and `YourSpeechRegion` with your Speech resource key and region.

  ```csharp
  using System;
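The rest of that code block is truncated in this view. For orientation only (this sketch is not part of the diff), the step the updated instruction describes amounts to a minimal C# program like the following; `YourSpeechKey` and `YourSpeechRegion` are the same placeholders the doc names:

```csharp
using System;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static void Main()
    {
        // Replace with your Speech resource key and region (for example, "westus").
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Optional: set the recognition language up front.
        speechConfig.SpeechRecognitionLanguage = "en-US";

        Console.WriteLine("Speech configuration created.");
    }
}
```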
@@ -74,11 +74,11 @@ class Program
  }
  ```

- If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. Learn [how to get the device ID](../../../how-to-select-audio-input-devices.md) for your audio input device.
+ If you want to use a *specific* audio input device, you need to specify the device ID in `AudioConfig`. To learn how to get the device ID, see [Select an audio input device with the Speech SDK](../../../how-to-select-audio-input-devices.md).

  ## Recognize speech from a file

- If you want to recognize speech from an audio file instead of a microphone, you still need to create an `AudioConfig` instance. But for this case you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:
+ If you want to recognize speech from an audio file instead of a microphone, you still need to create an `AudioConfig` instance. However, you don't call `FromDefaultMicrophoneInput()`. You call `FromWavFileInput()` and pass the file path:

  ```csharp
  using System;
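The file's own sample is cut off in this view, so purely as a hedged sketch of the two paragraphs in this hunk (not the article's code), file-based recognition and the specific-device variant look roughly like this; `YourAudioFile.wav` and the device ID string are placeholder assumptions:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Recognize from a WAV file instead of the default microphone.
        using var audioConfig = AudioConfig.FromWavFileInput("YourAudioFile.wav");
        // To use a specific microphone instead, pass its device ID:
        // using var audioConfig = AudioConfig.FromMicrophoneInput("<device-id>");

        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: {result.Text}");
    }
}
```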
@@ -110,9 +110,9 @@ class Program
  For many use cases, it's likely that your audio data comes from Azure Blob Storage, or it's otherwise already in memory as a `byte[]` instance or a similar raw data structure. The following example uses [`PushAudioInputStream`](/dotnet/api/microsoft.cognitiveservices.speech.audio.pushaudioinputstream) to recognize speech, which is essentially an abstracted memory stream. The sample code does the following actions:

- * Writes raw audio data (PCM) to `PushAudioInputStream` by using the `Write()` function, which accepts a `byte[]` instance.
+ * Writes raw audio data to `PushAudioInputStream` by using the `Write()` function, which accepts a `byte[]` instance.
  * Reads a *.wav* file by using `FileReader` for demonstration purposes. If you already have audio data in a `byte[]` instance, you can skip directly to writing the content to the input stream.
- * The default format is 16-bit, 16-KHz mono pulse-code modulation (PCM) data. To customize the format, you can pass an [`AudioStreamFormat`](/dotnet/api/microsoft.cognitiveservices.speech.audio.audiostreamformat) object to `CreatePushStream()` by using the static function `AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels)`.
+ * The default format is 16-bit, 16-kHz mono pulse-code modulation (PCM) data. To customize the format, you can pass an [`AudioStreamFormat`](/dotnet/api/microsoft.cognitiveservices.speech.audio.audiostreamformat) object to `CreatePushStream()` by using the static function `AudioStreamFormat.GetWaveFormatPCM(sampleRate, (byte)bitRate, (byte)channels)`.

  ```csharp
  using System;
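The push-stream sample itself is also truncated here. As an unofficial sketch of what the bullets describe, combined with the header-skipping advice in the next hunk, the flow is roughly the following; the file name is a placeholder, and the 44-byte offset assumes a canonical PCM *.wav* header with no extra chunks:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Default push stream format: 16-bit, 16-kHz, mono PCM. For other formats:
        // AudioInputStream.CreatePushStream(AudioStreamFormat.GetWaveFormatPCM(8000, 16, 1));
        using var pushStream = AudioInputStream.CreatePushStream();
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

        // Read the file and trim a canonical 44-byte RIFF/WAVE header so the bytes
        // written to the stream start at the audio data. Real files can carry extra
        // chunks, so production code should walk the chunk list instead.
        byte[] allBytes = File.ReadAllBytes("YourAudioFile.wav");
        const int canonicalHeaderSize = 44;
        byte[] pcm = new byte[allBytes.Length - canonicalHeaderSize];
        Array.Copy(allBytes, canonicalHeaderSize, pcm, 0, pcm.Length);

        pushStream.Write(pcm);
        pushStream.Close();

        var result = await speechRecognizer.RecognizeOnceAsync();
        Console.WriteLine($"RECOGNIZED: {result.Text}");
    }
}
```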
@@ -149,7 +149,7 @@ class Program
  }
  ```

- Using a push stream as input assumes that the audio data is a raw PCM and skips any headers. The API still works in certain cases if the header hasn't been skipped. For the best results, consider implementing logic to read off the headers so that `byte[]` begins at the *start of the audio data*.
+ Using a push stream as input assumes that the audio data is raw PCM and skips any headers. The API still works in certain cases if the header isn't skipped. For the best results, consider implementing logic to read off the headers so that `byte[]` begins at the *start of the audio data*.

  ## Handle errors
267
267
speechConfig.SpeechRecognitionLanguage="it-IT";
268
268
```
269
269
270
-
The [`SpeechRecognitionLanguage`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.speechrecognitionlanguage) property expects a language-locale format string. For more information, see the [list of supported speech to text locales](../../../language-support.md?tabs=stt).
270
+
The [`SpeechRecognitionLanguage`](/dotnet/api/microsoft.cognitiveservices.speech.speechconfig.speechrecognitionlanguage) property expects a language-locale format string. For a list of supported locales, see [Language and voice support for the Speech service](../../../language-support.md?tabs=stt).
271
271
272
272
## Language identification
273
273
274
-
You can use [language identification](../../../language-identification.md?pivots=programming-language-csharp#use-speech-to-text) with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
274
+
You can use language identification with speech to text recognition when you need to identify the language in an audio source and then transcribe it to text.
275
275
276
-
For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-csharp#use-speech-to-text).
276
+
For a complete code sample, see [Language identification](../../../language-identification.md?pivots=programming-language-csharp).
277
277
278
278
## Use a custom endpoint
279
279
@@ -289,20 +289,20 @@ var speechRecognizer = new SpeechRecognizer(speechConfig);
289
289
290
290
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.
291
291
292
-
For more information about containers, see [Host URLs](../../../speech-container-howto.md#host-urls) in Install and run Speech containers with Docker.
292
+
For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
293
293
294
294
## Change how silence is handled
295
295
296
-
If a user is expected to speak faster or slower than usual, the default behaviors for nonspeech silence in input audio might not result in what you expect. Common problems with silence handling include:
296
+
If a user speaks faster or slower than usual, the default behaviors for nonspeech silence in input audio might not result in what you expect. Common problems with silence handling include:
297
297
298
-
- Fast-speech chaining many sentences together into a single recognition result, instead of breaking sentences into individual results.
299
-
- Slow speech separating parts of a single sentence into multiple results.
300
-
- A single-shot recognition ending too quickly while waiting for speech to begin.
298
+
- Fast-speech that chains many sentences together into a single recognition result, instead of breaking sentences into individual results.
299
+
- Slow speech that separates parts of a single sentence into multiple results.
300
+
- A single-shot recognition that ends too quickly while waiting for speech to begin.
301
301
302
302
These problems can be addressed by setting one of two *timeout properties* on the `SpeechConfig` instance used to create a `SpeechRecognizer`:
303
303
304
304
-**Segmentation silence timeout** adjusts how much nonspeech audio is allowed within a phrase that's currently being spoken before that phrase is considered "done."
305
-
-*Higher* values generally make results longer and allow longer pauses from the speaker within a phrase but make results take longer to arrive. They can also make separate phrases combine together into a single result when set too high.
305
+
-*Higher* values generally make results longer and allow longer pauses from the speaker within a phrase but make results take longer to arrive. They can also combine separate phrases into a single result when set too high.
306
306
-*Lower* values generally make results shorter and ensure more prompt and frequent breaks between phrases, but can also cause single phrases to separate into multiple results when set too low.
307
307
- This timeout can be set to integer values between 100 and 5000, in milliseconds, with 500 a typical default.
308
308
-**Initial silence timeout** adjusts how much nonspeech audio is allowed *before* a phrase before the recognition attempt ends in a "no match" result.
@@ -311,9 +311,9 @@ These problems can be addressed by setting one of two *timeout properties* on th
    - Because continuous recognition generates many results, this value determines how often "no match" results arrive but doesn't otherwise affect the content of recognition results.
    - This timeout can be set to any non-negative integer value, in milliseconds, or set to 0 to disable it entirely. 5000 is a typical default for single-shot recognition while 15000 is a typical default for continuous recognition.

- As there are tradeoffs when modifying these timeouts, you should only change the settings when you have a problem related to silence handling. Default values optimally handle most spoken audio and only uncommon scenarios should encounter problems.
+ Since there are tradeoffs when modifying these timeouts, you should only change the settings when you have a problem related to silence handling. Default values optimally handle most spoken audio and only uncommon scenarios should encounter problems.

- **Example:** Users speaking a serial number like "ABC-123-4567" might pause between character groups long enough for the serial number to be broken into multiple results. In this case, try a higher value like 2000 ms for the segmentation silence timeout:
+ **Example:** Users speaking a serial number like "ABC-123-4567" might pause between character groups long enough for the serial number to be broken into multiple results. In this case, try a higher value like 2000 milliseconds for the segmentation silence timeout:
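The snippet that this example sentence introduces is cut off in the diff. As a rough sketch of the idea (assuming the `PropertyId` values exposed by recent Speech SDK releases), both timeouts are set as string-valued properties on `SpeechConfig` before the recognizer is created:

```csharp
using Microsoft.CognitiveServices.Speech;

class Program
{
    static void Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSpeechKey", "YourSpeechRegion");

        // Allow up to 2000 ms of silence within a phrase before it's considered done.
        speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");

        // Optionally wait up to 10000 ms for speech to start before returning "no match".
        speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");

        using var speechRecognizer = new SpeechRecognizer(speechConfig);
    }
}
```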