Commit 5c9563d

Merge pull request #285618 from dlepow/aoaif
[APIM] Updates to AOAI policy availability
2 parents: 722ebce + dd0e64e

8 files changed (+18 −6 lines)

articles/api-management/api-management-gateways-overview.md

Lines changed: 6 additions & 3 deletions

@@ -198,6 +198,9 @@ Scale capacity by adding and removing scale [units](upgrade-and-scale.md) in the
 
 ## Related content
 
-- Learn more about [API Management in a Hybrid and multicloud World](https://aka.ms/hybrid-and-multi-cloud-api-management)
-- Learn more about using the [capacity metric](api-management-capacity.md) for scaling decisions
-- Learn about [observability capabilities](observability.md) in API Management
+Learn more about:
+
+- [API Management in a Hybrid and multicloud World](https://aka.ms/hybrid-and-multi-cloud-api-management)
+- [Capacity metric](api-management-capacity.md) for scaling decisions
+- [Observability capabilities](observability.md) in API Management
+- [GenAI gateway capabilities](genai-gateway-capabilities.md) in API Management

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 2 additions & 1 deletion

@@ -13,7 +13,7 @@ ms.collection: ce-skilling-ai-copilot
 
 # Enable semantic caching for Azure OpenAI APIs in Azure API Management
 
-[!INCLUDE [api-management-availability-basicv2-standardv2](../../includes/api-management-availability-basicv2-standardv2.md)]
+[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
 Enable semantic caching of responses to Azure OpenAI API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't the same. For background, see [Tutorial: Use Azure Cache for Redis as a semantic cache](../azure-cache-for-redis/cache-tutorial-semantic-cache.md).
 
@@ -152,3 +152,4 @@ For example, if the cache was used, the **Output** section includes entries simi
 
 * [Caching policies](api-management-policies.md#caching)
 * [Azure Cache for Redis](../azure-cache-for-redis/cache-overview.md)
+* [GenAI gateway capabilities](genai-gateway-capabilities.md) in Azure API Management
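
To make the enablement described in this article concrete, here's a minimal sketch of the policy pair in a policy definition (the backend ID `embeddings-backend` and the attribute values are illustrative placeholders, not documented defaults):

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached completion when a semantically similar prompt is found -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Cache fresh completions for 60 seconds for future lookups -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

The lookup runs in the inbound section so a cache hit can short-circuit the backend call; the store runs in the outbound section so new completions are cached for subsequent similar prompts.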

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of r
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 ## Policy statement
 
 ```xml
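
For context, a hedged sketch of the `azure-openai-semantic-cache-lookup` statement that the truncated fence above introduces (the attribute values and the `vary-by` expression are illustrative):

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <!-- Optional: partition the cache, here per subscription -->
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
```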

articles/api-management/azure-openai-semantic-cache-store-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 ## Policy statement
 
 ```xml
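
The corresponding store statement is a single element; a sketch with an illustrative cache duration in seconds:

```xml
<!-- Cache the completion for 60 seconds in the configured external cache -->
<azure-openai-semantic-cache-store duration="60" />
```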

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
 * Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
-* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
+* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 
 ## Example
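
A hedged sketch of a typical configuration that ties the bullets above together (the per-minute value and the variable name are illustrative):

```xml
<!-- Key the counter per subscription; with estimate-prompt-tokens="false",
     token counts come from the usage section of non-streamed responses -->
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="5000"
    estimate-prompt-tokens="false"
    remaining-tokens-variable-name="remainingTokens" />
```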

articles/api-management/genai-gateway-capabilities.md

Lines changed: 1 addition & 1 deletion

@@ -95,7 +95,7 @@ The backend [circuit breaker](backends.md#circuit-breaker) features dynamic trip
 
 ## Semantic caching policy
 
-Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.md) policies to optimize token consumption by using semantic caching, which stores completions for prompts with similar meaning.
+Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.md) policies to optimize token use by storing completions for similar prompts.
 
 :::image type="content" source="media/genai-gateway-capabilities/semantic-caching.png" alt-text="Diagram of semantic caching in API Management.":::
 

articles/api-management/llm-semantic-cache-lookup-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ Use the `llm-semantic-cache-lookup` policy to perform cache lookup of responses
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-llm-models](../../includes/api-management-llm-models.md)]
+
 ## Policy statement
 
 ```xml
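
The `llm-*` policies mirror their `azure-openai-*` counterparts; a sketch of the lookup statement opened by the truncated fence above (values are illustrative):

```xml
<llm-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />
```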

articles/api-management/llm-semantic-cache-store-policy.md

Lines changed: 2 additions & 0 deletions

@@ -25,6 +25,8 @@ The `llm-semantic-cache-store` policy caches responses to chat completion API an
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-llm-models](../../includes/api-management-llm-models.md)]
+
 ## Policy statement
 
 ```xml
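
And the matching store statement, again a sketch with an illustrative duration:

```xml
<!-- Cache the chat completion for 60 seconds -->
<llm-semantic-cache-store duration="60" />
```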
