Commit 0f29910

Merge pull request #950 from MicrosoftDocs/main
10/22 11:00 AM IST Publish
2 parents bf31bc8 + 7660708 commit 0f29910

37 files changed (+488, -229 lines)

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -4,6 +4,11 @@
       "source_path_from_root": "/articles/ai-services/openai/concepts/use-your-image-data.md",
       "redirect_url": "/azure/ai-services/openai/concepts/use-your-data",
       "redirect_document_id": true
+    },
+    {
+      "source_path_from_root": "/articles/search/search-howto-create-indexers.md",
+      "redirect_url": "/azure/search/search-how-to-create-indexers",
+      "redirect_document_id": false
     }
   ]
 }

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
---
title: 'Prompt caching with Azure OpenAI Service'
titleSuffix: Azure OpenAI
description: Learn how to use prompt caching with Azure OpenAI
services: cognitive-services
manager: nitinme
ms.service: azure-ai-openai
ms.topic: how-to
ms.date: 10/18/2024
author: mrbullwinkle
ms.author: mbullwin
recommendations: false
---

# Prompt caching

Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. *"Prompt"* in this context refers to the input you send to the model as part of your chat completions request. Rather than reprocessing the same input tokens over and over again, the model retains a temporary cache of processed input data to improve overall performance. Prompt caching has no impact on the output content returned in the model response beyond a reduction in latency and cost.

## Supported models

Currently, only the following models support prompt caching with Azure OpenAI:

- `o1-preview-2024-09-12`
- `o1-mini-2024-09-12`

## API support

Official support for prompt caching was first added in API version `2024-10-01-preview`.

## Getting started

For a request to take advantage of prompt caching, it must meet both of the following conditions:

- The prompt is a minimum of 1,024 tokens in length.
- The first 1,024 tokens of the prompt are identical to those of a previous request.

When a match is found between a prompt and the current content of the prompt cache, it's referred to as a cache hit. Cache hits show up as [`cached_tokens`](/azure/ai-services/openai/reference-preview#cached_tokens) under [`prompt_tokens_details`](/azure/ai-services/openai/reference-preview#properties-for-prompt_tokens_details) in the chat completions response.

```json
{
  "created": 1729227448,
  "model": "o1-preview-2024-09-12",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_50cdd5dc04",
  "usage": {
    "completion_tokens": 1518,
    "prompt_tokens": 1566,
    "total_tokens": 3084,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 576
    },
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 1408
    }
  }
}
```

After the first 1,024 tokens, cache hits occur for every 128 additional identical tokens.

A single character difference in the first 1,024 tokens results in a cache miss, which is characterized by a `cached_tokens` value of 0. Prompt caching is enabled by default, with no additional configuration needed for supported models.
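
To illustrate, here's a minimal sketch (not part of the committed article) of checking for cache hits from Python with the `openai` package's `AzureOpenAI` client. The endpoint, key, deployment name, and prompt contents are placeholders, and the `prompt_tokens_details` attribute assumes a recent version of the client library.

```python
from openai import AzureOpenAI

# Placeholders: substitute your own resource endpoint, key, and deployment name.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-API-KEY",
    api_version="2024-10-01-preview",  # first API version with official prompt caching support
)

long_shared_prefix = "..."  # placeholder: 1,024+ tokens reused verbatim across requests
user_question = "Summarize the key points."

response = client.chat.completions.create(
    model="o1-preview",  # deployment name for a model that supports prompt caching
    messages=[{"role": "user", "content": long_shared_prefix + user_question}],
)

usage = response.usage
details = getattr(usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", 0) if details else 0

# cached == 0 indicates a cache miss; a nonzero value counts the reused prefix tokens.
print(f"prompt_tokens={usage.prompt_tokens}, cached_tokens={cached}")
```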

## What is cached?

The o1-series models are text only and don't support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array, which are less likely to have an identical 1,024-token prefix.

When prompt caching is enabled for other supported models, it will expand to support:

| **Caching supported** | **Description** |
|--------|--------|
| **Messages** | The complete messages array: system, user, and assistant content. |
| **Images** | Images included in user messages, either as links or as base64-encoded data. The `detail` parameter must be set the same across requests. |
| **Tool use** | Both the messages array and tool definitions. |
| **Structured outputs** | The structured output schema is appended as a prefix to the system message. |

To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array, as in the sketch that follows.
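
As one hedged illustration (not from the article), the static, reusable portion of a request can be placed first and the short, varying portion last; the variable names and prompt text here are purely illustrative.

```python
# Long, static instructions or reference text (ideally 1,024+ tokens) go first so that
# repeat requests share an identical prefix the service can cache.
STATIC_CONTEXT = "You are answering questions about the following product guide: ..."  # placeholder text

def build_messages(user_question: str) -> list[dict]:
    # o1-series models don't support system messages, so the static context is sent as a user message.
    return [
        {"role": "user", "content": STATIC_CONTEXT},   # identical across requests -> cacheable prefix
        {"role": "user", "content": user_question},    # varies per request -> placed last
    ]

messages = build_messages("Does the warranty cover water damage?")
```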

## Can I disable prompt caching?

Prompt caching is enabled by default. There is no opt-out option.

articles/ai-services/openai/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -130,6 +130,8 @@ items:
         href: ./how-to/completions.md
       - name: JSON mode
         href: ./how-to/json-mode.md
+      - name: Prompt caching
+        href: ./how-to/prompt-caching.md
       - name: Reproducible output
         href: ./how-to/reproducible-output.md
       - name: Structured outputs

articles/ai-services/speech-service/how-to-recognize-speech.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ author: eric-urban
 manager: nitinme
 ms.service: azure-ai-speech
 ms.topic: how-to
-ms.date: 9/20/2024
+ms.date: 10/17/2024
 ms.author: eur
 ms.devlang: cpp
 ms.custom: devx-track-extended-java, devx-track-go, devx-track-js, devx-track-python

articles/ai-services/speech-service/includes/how-to/recognize-speech/cpp.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
 ms.author: eur
 ---

@@ -215,3 +215,24 @@ auto speechRecognizer = SpeechRecognizer::FromConfig(speechConfig);
 Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

 For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
+
+## Semantic segmentation
+
+Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
+- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks ("wall of text"), which severely degrades their readability experience.
+- Over-segmentation: When a user pauses for a short time, the silence detection mechanism can segment incorrectly.
+
+Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.
+
+To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:
+
+```cpp
+speechConfig->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");
+```
+
+Some of the limitations of semantic segmentation are as follows:
+- You need the Speech SDK version 1.41 or later to use semantic segmentation.
+- Semantic segmentation is only intended for use in [continuous recognition](#continuous-recognition). This includes scenarios such as transcription and captioning. It shouldn't be used in single recognition or dictation mode.
+- Semantic segmentation isn't available for all languages and locales. Currently, semantic segmentation is only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
+- Semantic segmentation doesn't yet support confidence scores and NBest lists. As such, we don't recommend semantic segmentation if you're using confidence scores or NBest lists.

articles/ai-services/speech-service/includes/how-to/recognize-speech/csharp.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
 ms.author: eur
 ms.custom: devx-track-csharp
 ---
@@ -330,3 +330,23 @@ speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "300");
 ```csharp
 speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");
 ```
+
+## Semantic segmentation
+
+Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with [silence-based segmentation](#change-how-silence-is-handled):
+- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks ("wall of text"), which severely degrades their readability experience.
+- Over-segmentation: When a user pauses for a short time, the silence detection mechanism can segment incorrectly.
+
+Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.
+
+To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:
+
+```csharp
+speechConfig.SetProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
+```
+
+Some of the limitations of semantic segmentation are as follows:
+- You need the Speech SDK version 1.41 or later to use semantic segmentation.
+- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition). This includes scenarios such as transcription and captioning. It shouldn't be used in single recognition or dictation mode.
+- Semantic segmentation isn't available for all languages and locales. Currently, semantic segmentation is only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
+- Semantic segmentation doesn't yet support confidence scores and NBest lists. As such, we don't recommend semantic segmentation if you're using confidence scores or NBest lists.

articles/ai-services/speech-service/includes/how-to/recognize-speech/java.md

Lines changed: 21 additions & 1 deletion
@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
 ms.custom: devx-track-java
 ms.author: eur
 ---
@@ -233,3 +233,23 @@ SpeechRecognizer speechRecognizer = new SpeechRecognizer(speechConfig);
 Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

 For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
+
+## Semantic segmentation
+
+Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
+- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks ("wall of text"), which severely degrades their readability experience.
+- Over-segmentation: When a user pauses for a short time, the silence detection mechanism can segment incorrectly.
+
+Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.
+
+To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:
+
+```java
+speechConfig.setProperty(PropertyId.Speech_SegmentationStrategy, "Semantic");
+```
+
+Some of the limitations of semantic segmentation are as follows:
+- You need the Speech SDK version 1.41 or later to use semantic segmentation.
+- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition). This includes scenarios such as transcription and captioning. It shouldn't be used in single recognition or dictation mode.
+- Semantic segmentation isn't available for all languages and locales. Currently, semantic segmentation is only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
+- Semantic segmentation doesn't yet support confidence scores and NBest lists. As such, we don't recommend semantic segmentation if you're using confidence scores or NBest lists.

articles/ai-services/speech-service/includes/how-to/recognize-speech/python.md

Lines changed: 22 additions & 1 deletion
@@ -2,7 +2,7 @@
 author: eric-urban
 ms.service: azure-ai-speech
 ms.topic: include
-ms.date: 08/13/2024
+ms.date: 10/17/2024
 ms.author: eur
 ---

@@ -180,3 +180,24 @@ speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
 Speech containers provide websocket-based query endpoint APIs that are accessed through the Speech SDK and Speech CLI. By default, the Speech SDK and Speech CLI use the public Speech service. To use the container, you need to change the initialization method. Use a container host URL instead of key and region.

 For more information about containers, see Host URLs in [Install and run Speech containers with Docker](../../../speech-container-howto.md#host-urls).
+
+## Semantic segmentation
+
+Semantic segmentation is a speech recognition segmentation strategy that's designed to mitigate issues associated with silence-based segmentation:
+- Under-segmentation: When users speak for a long time without pauses, they can see a long sequence of text without breaks ("wall of text"), which severely degrades their readability experience.
+- Over-segmentation: When a user pauses for a short time, the silence detection mechanism can segment incorrectly.
+
+Instead of relying on silence timeouts, semantic segmentation segments and returns final results when it detects sentence-ending punctuation (such as '.' or '?'). This improves the user experience with higher-quality, semantically complete segments and prevents long intermediate results.
+
+To use semantic segmentation, you need to set the following property on the `SpeechConfig` instance used to create a `SpeechRecognizer`:
+
+```python
+speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")
+```
+
+Some of the limitations of semantic segmentation are as follows:
+- You need the Speech SDK version 1.41 or later to use semantic segmentation.
+- Semantic segmentation is only intended for use in [continuous recognition](#use-continuous-recognition). This includes scenarios such as transcription and captioning. It shouldn't be used in single recognition or dictation mode.
+- Semantic segmentation isn't available for all languages and locales. Currently, semantic segmentation is only available for English (en) locales such as en-US, en-GB, en-IN, and en-AU.
+- Semantic segmentation doesn't yet support confidence scores and NBest lists. As such, we don't recommend semantic segmentation if you're using confidence scores or NBest lists.
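
To put that property in context, here's a minimal, self-contained sketch (not part of this commit) of continuous recognition from a WAV file with semantic segmentation enabled, using the Python Speech SDK. The key, region, and file name are placeholders, and Speech SDK 1.41 or later is assumed.

```python
import time
import azure.cognitiveservices.speech as speechsdk

# Placeholders: substitute your own Speech resource key, region, and audio file.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_SPEECH_REGION")
speech_config.speech_recognition_language = "en-US"  # semantic segmentation currently requires an English (en) locale
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationStrategy, "Semantic")

audio_config = speechsdk.audio.AudioConfig(filename="your_long_recording.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

done = False

def handle_recognized(evt):
    # With semantic segmentation, each final result should end at sentence-ending punctuation
    # rather than at a silence timeout.
    print("RECOGNIZED:", evt.result.text)

def handle_stopped(evt):
    global done
    done = True

speech_recognizer.recognized.connect(handle_recognized)
speech_recognizer.session_stopped.connect(handle_stopped)
speech_recognizer.canceled.connect(handle_stopped)

speech_recognizer.start_continuous_recognition()
while not done:
    time.sleep(0.5)
speech_recognizer.stop_continuous_recognition()
```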

articles/ai-services/speech-service/includes/quickstarts/keyword-recognition/swift.md

Lines changed: 3 additions & 0 deletions
@@ -21,3 +21,6 @@ ms.author: eur
 ## Use a keyword model with the Speech SDK

 See the [sample on GitHub](https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/b4257370e1d799f0b8b64be9bf2a34cad8b1a251/samples/objective-c/ios/speech-samples/speech-samples/ViewController.m#L585) for using your Custom Keyword model with the Objective C SDK. Although we don't currently have a Swift sample for parity, the concepts are similar.
+
+> [!NOTE]
+> If you are going to use keyword recognition in your Swift application on iOS, note that new keyword models created in Speech Studio will require using either the Speech SDK xcframework bundle from [https://aka.ms/csspeech/iosbinaryembedded](https://aka.ms/csspeech/iosbinaryembedded) or the `MicrosoftCognitiveServicesSpeechEmbedded-iOS` pod in your project.

articles/ai-services/speech-service/includes/quickstarts/platform/cpp-requirements.md

Lines changed: 2 additions & 0 deletions
@@ -21,6 +21,8 @@ The Speech SDK for C++ only supports the following distributions on the x64, ARM

 - Ubuntu 20.04/22.04/24.04
 - Debian 11/12
+- Amazon Linux 2023
+- Azure Linux 3.0

 [!INCLUDE [Linux distributions](linux-distributions.md)]
