
Commit 16333b2

usage updates
1 parent 6908860 commit 16333b2

4 files changed: +8 −5 lines changed


articles/api-management/azure-openai-emit-token-metric-policy.md
Lines changed: 3 additions & 1 deletion

@@ -79,8 +79,10 @@ The `azure-openai-emit-token-metric` policy sends metrics to Application Insight
 ### Usage notes
 
 * This policy can be used multiple times per policy definition.
-* You can configure at most 10 custom definitions for this policy.
+* You can configure at most 10 custom dimensions for this policy.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available, values in the usage section of the response from the Azure OpenAI Service API are used to determine token metrics.
+* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, token metrics are estimated.
 
 ## Example
 
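For context, a minimal sketch of how custom dimensions are configured on this policy (the namespace, dimension names, and value expression below are illustrative assumptions, not part of this commit):

```xml
<azure-openai-emit-token-metric namespace="openai">
    <!-- Each <dimension> element is one custom dimension; per the usage note
         above, at most 10 can be configured (values here are illustrative). -->
    <dimension name="API ID" />
    <dimension name="Client IP" value="@(context.Request.IpAddress)" />
</azure-openai-emit-token-metric>
```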

articles/api-management/azure-openai-semantic-cache-lookup-policy.md
Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,7 @@ ms.author: danlep
 
 # Get cached responses of Azure OpenAI API requests
 
-[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
+[!INCLUDE [api-management-availability-basicv2-standardv2](../../includes/api-management-availability-basicv2-standardv2.md)]
 
 Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API and Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
@@ -60,7 +60,7 @@ Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of r
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) inbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2, consumption, self-hosted
+- [**Gateways:**](api-management-gateways-overview.md) v2
 
 ### Usage notes
 
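As a sketch of the policy this change scopes to v2 gateways (the threshold, backend ID, and auth values below are illustrative assumptions):

```xml
<!-- Inbound section: return a cached response when the prompt is within
     the similarity threshold of a previously cached request. -->
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />
```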

articles/api-management/azure-openai-semantic-cache-store-policy.md
Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,7 @@ ms.author: danlep
 
 # Cache responses to Azure OpenAI API requests
 
-[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
+[!INCLUDE [api-management-availability-basicv2-standardv2](../../includes/api-management-availability-basicv2-standardv2.md)]
 
 The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI Chat Completion API and Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
@@ -44,7 +44,7 @@ The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) outbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2, consumption, self-hosted
+- [**Gateways:**](api-management-gateways-overview.md) v2
 
 ### Usage notes
 
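A matching sketch for the store side (the duration value is an illustrative assumption):

```xml
<!-- Outbound section: cache the backend response for the configured number
     of seconds so the semantic-cache-lookup policy above can serve it. -->
<azure-openai-semantic-cache-store duration="60" />
```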

articles/api-management/azure-openai-token-limit-policy.md
Lines changed: 1 addition & 0 deletions

@@ -73,6 +73,7 @@ For more information, see [Azure OpenAI Service models](../ai-services/openai/co
 
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 
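To illustrate the usage note added here, a minimal sketch of the policy with prompt-token estimation disabled (the counter key, limit, and variable name are illustrative assumptions):

```xml
<!-- With estimate-prompt-tokens="false", token usage reported in the
     service response is used where available, per the added note. -->
<azure-openai-token-limit
    counter-key="@(context.Request.IpAddress)"
    tokens-per-minute="500"
    estimate-prompt-tokens="false"
    remaining-tokens-variable-name="remainingTokens" />
```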
