
Commit 07476b7

Merge pull request #872 from eric-urban/eur/semantic-segmentation
semantic segmentation
2 parents 838d6c1 + e78b1e6 commit 07476b7


5 files changed (+87, -5 lines)


articles/ai-services/speech-service/how-to-recognize-speech.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: how-to
-ms.date: 9/20/2024
+ms.date: 10/17/2024
ms.author: eur
ms.devlang: cpp
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python

articles/ai-services/speech-service/includes/how-to/recognize-speech/cpp.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
---

@@ -215,3 +215,24 @@ auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```cpp
speechConfig->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
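For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```cpp
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    speechConfig->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");

    // Uses the default microphone as the audio input.
    auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);

    // Each final result should now end at sentence-ending punctuation.
    speechRecognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        std::cout << "RECOGNIZED: " << e.Result->Text << std::endl;
    });

    speechRecognizer->StartContinuousRecognitionAsync().get();
    std::cout << "Listening. Press Enter to stop." << std::endl;
    std::cin.get();
    speechRecognizer->StopContinuousRecognitionAsync().get();
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.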

articles/ai-services/speech-service/includes/how-to/recognize-speech/csharp.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
ms.custom: devx-track-csharp
---
@@ -330,3 +330,23 @@ speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
```csharp
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
```

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with [silence-based segmentation](#change-how-silence-is-handled):
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```csharp
speechConfig.SetProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
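For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```csharp
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig.SetProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");

        // Uses the default microphone as the audio input.
        using var speechRecognizer = new SpeechRecognizer(speechConfig);

        // Each final result should now end at sentence-ending punctuation.
        speechRecognizer.Recognized += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZED: {e.Result.Text}");
        };

        await speechRecognizer.StartContinuousRecognitionAsync();
        Console.WriteLine("Listening. Press Enter to stop.");
        Console.ReadLine();
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.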

articles/ai-services/speech-service/includes/how-to/recognize-speech/java.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.custom: devx-track-java
ms.author: eur
---
@@ -233,3 +233,23 @@ SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```java
speechConfig.setProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
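For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```java
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
import com.microsoft.cognitiveservices.speech.PropertyId;
import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechRecognizer;

public class SemanticSegmentationSketch {
    public static void main(String[] args) throws Exception {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig.setProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");

        // Uses the default microphone as the audio input.
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);

        // Each final result should now end at sentence-ending punctuation.
        speechRecognizer.recognized.addEventListener((s, e) ->
            System.out.println("RECOGNIZED: " + e.getResult().getText()));

        speechRecognizer.startContinuousRecognitionAsync().get();
        System.out.println("Listening. Press Enter to stop.");
        System.in.read();
        speechRecognizer.stopContinuousRecognitionAsync().get();

        speechRecognizer.close();
        speechConfig.close();
    }
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.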

articles/ai-services/speech-service/includes/how-to/recognize-speech/python.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
---

@@ -180,3 +180,24 @@ speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```python
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
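For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```python
# Minimal continuous recognition sketch with semantic segmentation enabled.
# "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
import time

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")

# Uses the default microphone as the audio input.
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# Each final result should now end at sentence-ending punctuation.
speech_recognizer.recognized.connect(lambda evt: print("RECOGNIZED:", evt.result.text))

speech_recognizer.start_continuous_recognition()
time.sleep(30)  # keep listening for 30 seconds
speech_recognizer.stop_continuous_recognition()
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.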

0 commit comments