
Commit 28c3edb

Merge pull request #279263 from dlepow/semcache2
[APIM] AOAI policies - tier and gateway support
2 parents efbaf9e + ca5086a commit 28c3edb

7 files changed: 40 additions & 22 deletions

articles/api-management/api-management-gateways-overview.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ Managed and self-hosted gateways support all available [policies](api-management
 
 <sup>1</sup> Configured policies that aren't supported by the self-hosted gateway are skipped during policy execution.<br/>
 <sup>2</sup> The quota by key policy isn't available in the v2 tiers.<br/>
-<sup>3</sup> The rate limit by key and quota by key policies aren't available in the Consumption tier.<br/>
+<sup>3</sup> The rate limit by key, quota by key, and Azure OpenAI token limit policies aren't available in the Consumption tier.<br/>
 <sup>4</sup> [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
 

articles/api-management/azure-openai-emit-token-metric-policy.md

Lines changed: 7 additions & 3 deletions
@@ -6,7 +6,7 @@ author: dlepow
 
 ms.service: api-management
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 07/09/2024
 ms.author: danlep
 ms.collection: ce-skilling-ai-copilot
 ms.custom:
@@ -21,6 +21,8 @@ The `azure-openai-emit-token-metric` policy sends metrics to Application Insight
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 
 ## Prerequisites
 

@@ -74,13 +76,15 @@ The `azure-openai-emit-token-metric` policy sends metrics to Application Insight
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) inbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2
+- [**Gateways:**](api-management-gateways-overview.md) classic, v2, consumption, self-hosted
 
 ### Usage notes
 
 * This policy can be used multiple times per policy definition.
-* You can configure at most 10 custom definitions for this policy.
+* You can configure at most 10 custom dimensions for this policy.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available, values in the usage section of the response from the Azure OpenAI Service API are used to determine token metrics.
+* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, token metrics are estimated.
 
 ## Example
 
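For context, a minimal sketch of the policy these usage notes describe; the `openai` namespace and the dimension choices are illustrative placeholders, not part of this change:

```xml
<policies>
    <inbound>
        <base />
        <!-- Send token-count metrics to Application Insights, split by custom
             dimensions (at most 10, per the usage note above). -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="API ID" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```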

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 3 additions & 1 deletion
@@ -6,13 +6,15 @@ ms.service: api-management
 ms.custom:
 - build-2024
 ms.topic: how-to
-ms.date: 05/13/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ms.collection: ce-skilling-ai-copilot
 ---
 
 # Enable semantic caching for Azure OpenAI APIs in Azure API Management
 
+[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
+
 Enable semantic caching of responses to Azure OpenAI API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't the same. For background, see [Tutorial: Use Azure Cache for Redis as a semantic cache](../azure-cache-for-redis/cache-tutorial-semantic-cache.md).
 
 ## Prerequisites
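For context, semantic caching as described in this article pairs a lookup policy in the inbound section with a store policy in the outbound section. A minimal sketch, assuming a backend named `embeddings-backend` has been configured for the embeddings API; the threshold and duration values are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response when a prompt scores within the
             similarity threshold against previously cached prompts. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion for 60 seconds so similar prompts can reuse it. -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```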

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

articles/api-management/azure-openai-semantic-cache-store-policy.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Azure API Management policy reference - azure-openai-sematic-cache-store
+title: Azure API Management policy reference - azure-openai-semantic-cache-store
 description: Reference for the azure-openai-semantic-cache-store policy available for use in Azure API Management. Provides policy usage, settings, and examples.
 services: api-management
 author: dlepow
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 4 additions & 14 deletions
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

@@ -23,18 +23,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
-## Supported Azure OpenAI Service models
-
-The policy is used with APIs [added to API Management from the Azure OpenAI Service](azure-openai-api-from-specification.md) of the following types:
-
-| API type | Supported models |
-|-------|-------------|
-| Chat completion | gpt-3.5<br/><br/>gpt-4 |
-| Completion | gpt-3.5-turbo-instruct |
-| Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
-
-
-For more information, see [Azure OpenAI Service models](../ai-services/openai/concepts/models.md).
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
 
 ## Policy statement
 

@@ -67,12 +56,13 @@ For more information, see [Azure OpenAI Service models](../ai-services/openai/co
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) inbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2
+- [**Gateways:**](api-management-gateways-overview.md) classic, v2, self-hosted
 
 ### Usage notes
 
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 
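For context, a minimal sketch of the policy these usage notes apply to; the counter key, limit, and variable name are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Cap each caller (keyed by client IP here) at 5,000 tokens per minute.
             With estimate-prompt-tokens="false", actual usage reported by the
             Azure OpenAI Service response is counted where available, per the
             usage note above. -->
        <azure-openai-token-limit
            counter-key="@(context.Request.IpAddress)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```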

includes/api-management-azure-openai-models.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+author: dlepow
+ms.service: api-management
+ms.custom:
+- build-2024
+ms.topic: include
+ms.date: 07/09/2024
+ms.author: danlep
+---
+
+## Supported Azure OpenAI Service models
+
+The policy is used with APIs [added to API Management from the Azure OpenAI Service](../articles/api-management/azure-openai-api-from-specification.md) of the following types:
+
+| API type | Supported models |
+|-------|-------------|
+| Chat completion | gpt-3.5<br/><br/>gpt-4 |
+| Completion | gpt-3.5-turbo-instruct |
+| Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
+
+For more information, see [Azure OpenAI Service models](../articles/ai-services/openai/concepts/models.md).
+
