Commit 25af885

Merge pull request #208550 from trrwilson/patch-15
[CogSvc] Speech: add a timeout configuration section to the speech how-to
2 parents f40d47f + 37366af commit 25af885

1 file changed

+40
-0
lines changed

articles/cognitive-services/Speech-Service/includes/how-to/recognize-speech/csharp.md

Lines changed: 40 additions & 0 deletions
@@ -288,3 +288,43 @@ var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourSer
speechConfig.EndpointId = "YourEndpointId";
var speechRecognizer = new SpeechRecognizer(speechConfig);
```

## Change how silence is handled

If a user is expected to speak faster or slower than usual, the default handling of non-speech silence in input audio may not produce the results you expect. Common problems with silence handling include:

- Fast speech chaining many sentences together into a single recognition result instead of breaking sentences into individual results
- Slow speech separating parts of a single sentence into multiple results
- A single-shot recognition ending too quickly while waiting for speech to begin

These problems can be addressed by setting one of two *timeout properties* on the `SpeechConfig` used to create a `SpeechRecognizer`:

- **Segmentation silence timeout** adjusts how much non-speech audio is allowed within a phrase that's currently being spoken before that phrase is considered "done."
  - *Higher* values generally make results longer and allow longer pauses from the speaker within a phrase, but they make results take longer to arrive and can also combine separate phrases into a single result when set too high
  - *Lower* values generally make results shorter and ensure more prompt and frequent breaks between phrases, but can also cause single phrases to separate into multiple results when set too low
  - This timeout can be set to integer values between 100 and 5000, in milliseconds, with 500 as a typical default
- **Initial silence timeout** adjusts how much non-speech audio is allowed *before* a phrase begins; when that limit is reached, the recognition attempt ends in a "no match" result.
  - *Higher* values give speakers more time to react and start speaking, but can also result in slow responsiveness when nothing is spoken
  - *Lower* values ensure a prompt "no match" for faster user experience and more controlled audio handling, but may cut a speaker off too quickly when set too low
  - Because continuous recognition generates many results, this value determines how often "no match" results will arrive but doesn't otherwise affect the content of recognition results
  - This timeout can be set to any non-negative integer value, in milliseconds; a value of 0 disables it entirely. 5000 is a typical default for single-shot recognition, while 15000 is a typical default for continuous recognition
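
The value ranges listed above can be checked before a property is set. The following is a minimal sketch, not part of the Speech SDK: the `SilenceTimeouts` class and both helper methods are hypothetical names introduced here for illustration, and the limits are taken only from the ranges stated in this article.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class (not part of the Speech SDK).
public static class SilenceTimeouts
{
    // Range stated in this article for the segmentation silence timeout.
    public const int MinSegmentationMs = 100;
    public const int MaxSegmentationMs = 5000;

    // Clamps a requested value into the documented range and formats it
    // as the string form that SpeechConfig.SetProperty accepts.
    public static string SegmentationValue(int requestedMs)
    {
        int clamped = Math.Clamp(requestedMs, MinSegmentationMs, MaxSegmentationMs);
        return clamped.ToString(CultureInfo.InvariantCulture);
    }

    // Rejects negative values; 0 is allowed and disables the timeout.
    public static string InitialSilenceValue(int requestedMs)
    {
        if (requestedMs < 0)
        {
            throw new ArgumentOutOfRangeException(
                nameof(requestedMs), "Initial silence timeout must be non-negative.");
        }
        return requestedMs.ToString(CultureInfo.InvariantCulture);
    }
}
```

The returned string could then be passed as the second argument to `speechConfig.SetProperty`, for example with `PropertyId.Speech_SegmentationSilenceTimeoutMs` and `SilenceTimeouts.SegmentationValue(2000)`. Clamping silently adjusts out-of-range input; throwing instead may be preferable if a bad value should be surfaced to the caller.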

Because modifying these timeouts involves tradeoffs, change the settings only when you observe a problem related to silence handling. Default values optimally handle the majority of spoken audio, and only uncommon scenarios should encounter problems.

**Example:** users speaking a serial number like "ABC-123-4567" pause between character groups long enough for the serial number to be broken into multiple results. In this case, setting the segmentation silence timeout to a higher value like 2000 ms could help:

```csharp
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");
```

**Example:** a recorded presenter's speech is fast enough that several sentences in a row get combined, with big recognition results only arriving once or twice per minute. In this case, setting the segmentation silence timeout to a lower value like 300 ms could help:

```csharp
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
```
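
To see how a segmentation setting changes where results break, it can help to log each result as it arrives during continuous recognition. The following is a rough sketch under several assumptions: the environment variable names `SPEECH_KEY` and `SPEECH_REGION` and the file name `presentation.wav` are placeholders chosen for illustration, and the program needs the Speech SDK package and a valid resource to run.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

class SegmentationDemo
{
    static async Task Main()
    {
        // Assumption: key and region are supplied through these hypothetical variables.
        var speechConfig = SpeechConfig.FromSubscription(
            Environment.GetEnvironmentVariable("SPEECH_KEY"),
            Environment.GetEnvironmentVariable("SPEECH_REGION"));
        speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");

        // Assumption: "presentation.wav" is a placeholder for the recorded audio.
        using var audioConfig = AudioConfig.FromWavFileInput("presentation.wav");
        using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

        var stopped = new TaskCompletionSource<bool>();

        // Each Recognized event carries one final result (one segment).
        recognizer.Recognized += (sender, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"[{DateTime.Now:HH:mm:ss}] {e.Result.Text.Length} chars: {e.Result.Text}");
            }
        };

        // Finish when the file has been consumed or recognition is canceled.
        recognizer.Canceled += (sender, e) => stopped.TrySetResult(true);
        recognizer.SessionStopped += (sender, e) => stopped.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();
        await stopped.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

Running the same file with different timeout values and comparing the number and length of logged results is one way to judge whether a change is helping.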

**Example:** a single-shot recognition asking a speaker to find and read a serial number ends too quickly while the number is being found. In this case, a longer initial silence timeout like 10000 ms could help:

```csharp
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
```
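
When tuning the initial silence timeout, it can be useful to check why a single-shot attempt ended. The following is a minimal sketch, assuming the default microphone as input and the hypothetical environment variable names `SPEECH_KEY` and `SPEECH_REGION` for credentials; it needs the Speech SDK package and a valid resource to run.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class InitialSilenceDemo
{
    static async Task Main()
    {
        // Assumption: key and region are supplied through these hypothetical variables.
        var speechConfig = SpeechConfig.FromSubscription(
            Environment.GetEnvironmentVariable("SPEECH_KEY"),
            Environment.GetEnvironmentVariable("SPEECH_REGION"));
        speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");

        // With no AudioConfig, the recognizer uses the default microphone.
        using var recognizer = new SpeechRecognizer(speechConfig);
        var result = await recognizer.RecognizeOnceAsync();

        switch (result.Reason)
        {
            case ResultReason.RecognizedSpeech:
                Console.WriteLine($"Recognized: {result.Text}");
                break;
            case ResultReason.NoMatch:
                // The no-match reason indicates whether silence caused the attempt to end.
                var noMatch = NoMatchDetails.FromResult(result);
                Console.WriteLine($"No match: {noMatch.Reason}");
                break;
            case ResultReason.Canceled:
                var cancellation = CancellationDetails.FromResult(result);
                Console.WriteLine($"Canceled: {cancellation.Reason}");
                break;
        }
    }
}
```

If attempts frequently end with a no-match reason related to initial silence while the speaker is still getting ready, that suggests the timeout is too short for the scenario.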
