
Commit 07476b7

Merge pull request #872 from eric-urban/eur/semantic-segmentation
semantic segmentation
2 parents 838d6c1 + e78b1e6 commit 07476b7


5 files changed (+87, -5 lines)


articles/ai-services/speech-service/how-to-recognize-speech.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: how-to
-ms.date: 9/20/2024
+ms.date: 10/17/2024
ms.author: eur
ms.devlang: cpp
ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python

articles/ai-services/speech-service/includes/how-to/recognize-speech/cpp.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
---

@@ -215,3 +215,24 @@ auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```cpp
speechConfig->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
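For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```cpp
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    auto speechConfig = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");
    speechConfig->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");

    // Uses the default microphone as the audio input.
    auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);

    // Each final result should now end at sentence-ending punctuation.
    speechRecognizer->Recognized.Connect([](const SpeechRecognitionEventArgs& e)
    {
        std::cout << "RECOGNIZED: " << e.Result->Text << std::endl;
    });

    speechRecognizer->StartContinuousRecognitionAsync().get();
    std::cout << "Listening. Press Enter to stop." << std::endl;
    std::cin.get();
    speechRecognizer->StopContinuousRecognitionAsync().get();
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.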

articles/ai-services/speech-service/includes/how-to/recognize-speech/csharp.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
ms.custom: devx-track-csharp
---
@@ -330,3 +330,23 @@ speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
```csharp
speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
```

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with [silence-based segmentation](#change-how-silence-is-handled):
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```csharp
speechConfig.SetProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
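For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```csharp
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static async Task Main()
    {
        var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig.SetProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");

        // Uses the default microphone as the audio input.
        using var speechRecognizer = new SpeechRecognizer(speechConfig);

        // Each final result should now end at sentence-ending punctuation.
        speechRecognizer.Recognized += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZED: {e.Result.Text}");
        };

        await speechRecognizer.StartContinuousRecognitionAsync();
        Console.WriteLine("Listening. Press Enter to stop.");
        Console.ReadLine();
        await speechRecognizer.StopContinuousRecognitionAsync();
    }
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.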

articles/ai-services/speech-service/includes/how-to/recognize-speech/java.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.custom: devx-track-java
ms.author: eur
---
@@ -233,3 +233,23 @@ SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```java
speechConfig.setProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
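For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```java
// Minimal continuous recognition sketch with semantic segmentation enabled.
// "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
import com.microsoft.cognitiveservices.speech.PropertyId;
import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechRecognizer;

public class SemanticSegmentationSketch {
    public static void main(String[] args) throws Exception {
        SpeechConfig speechConfig = SpeechConfig.fromSubscription("YourSubscriptionKey", "YourServiceRegion");
        speechConfig.setProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");

        // Uses the default microphone as the audio input.
        SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);

        // Each final result should now end at sentence-ending punctuation.
        speechRecognizer.recognized.addEventListener((s, e) ->
            System.out.println("RECOGNIZED: " + e.getResult().getText()));

        speechRecognizer.startContinuousRecognitionAsync().get();
        System.out.println("Listening. Press Enter to stop.");
        System.in.read();
        speechRecognizer.stopContinuousRecognitionAsync().get();

        speechRecognizer.close();
        speechConfig.close();
    }
}
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.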

articles/ai-services/speech-service/includes/how-to/recognize-speech/python.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
ms.author: eur
---

@@ -180,3 +180,24 @@ speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).

## Semantic segmentation

Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks (a "wall of text"), which severely degrades readability.
- Over-segmentation: When a user pauses only briefly, the silence detection mechanism can split the speech in the wrong place.

Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.

To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:

```python
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")
```

Semantic segmentation has the following limitations:
- You need the Speech SDK version 1.41 or later to use semantic segmentation.
- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition), such as transcription and captioning scenarios. It shouldn't be used in single-shot recognition or dictation mode (a continuous recognition sketch follows this list).
- Semantic segmentation isn't available for all languages and locales. Currently, it's only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
- Semantic segmentation doesn't yet support confidence scores or NBest lists, so we don't recommend it if you rely on either.
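For context, here's a minimal sketch of how the property fits into a continuous recognition loop. It isn't part of this commit; `YourSubscriptionKey` and `YourServiceRegion` are placeholder values to replace with your own.

```python
# Minimal continuous recognition sketch with semantic segmentation enabled.
# "YourSubscriptionKey" and "YourServiceRegion" are placeholders, not real values.
import time

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YourSubscriptionKey", region="YourServiceRegion")
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")

# Uses the default microphone as the audio input.
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

# Each final result should now end at sentence-ending punctuation.
speech_recognizer.recognized.connect(lambda evt: print("RECOGNIZED:", evt.result.text))

speech_recognizer.start_continuous_recognition()
time.sleep(30)  # keep listening for 30 seconds
speech_recognizer.stop_continuous_recognition()
```

The only change from a standard continuous recognition setup is the `Speech_SegmentationStrategy` property.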

0 commit comments