ai token usage

gitName · gitName · commit 52c290c7dd7f · 2025-04-09T09:51:19.000-07:00
diff --git a/articles/api-management/azure-openai-token-limit-policy.md b/articles/api-management/azure-openai-token-limit-policy.md
@@ -74,6 +74,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
diff --git a/articles/api-management/llm-token-limit-policy.md b/articles/api-management/llm-token-limit-policy.md
@@ -75,6 +75,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
diff --git a/articles/api-management/rate-limit-policy.md b/articles/api-management/rate-limit-policy.md
@@ -87,7 +87,8 @@ To understand the difference between rate limits and quotas, [see Rate limits an
 * This policy can be used only once per policy definition.
 * This policy is only applied when an API is accessed using a subscription key.
 * [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
-* In a [multi-region](api-management-howto-deploy-multi-region.md) deployment, this policy counts calls separately at each regional gateway in the deployment. The policy doesn't aggregate all call data for the instance. 
+* [!INCLUDE [api-management-rate-limit-gateway-calls](../../includes/api-management-rate-limit-gateway-calls.md)]
+
 
 
 ## Example
diff --git a/includes/api-management-rate-limit-gateway-calls.md b/includes/api-management-rate-limit-gateway-calls.md
@@ -1,2 +1,9 @@
 ---
-* This policy tracks calls independently at each gateway where it is applied, including regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md) and [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway). It doesn't aggregate call data across the entire instance. 
+author: dlepow
+ms.service: azure-api-management
+ms.topic: include
+ms.date: 04/09/2025
+ms.author: danlep
+---
+
+This policy tracks calls independently at each gateway where it is applied, including regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md) and [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway). It doesn't aggregate call data across the entire instance. 
diff --git a/includes/api-management-token-limit-gateway-counts.md b/includes/api-management-token-limit-gateway-counts.md
@@ -0,0 +1,8 @@
+---
+author: dlepow
+ms.service: azure-api-management
+ms.topic: include
+ms.date: 04/09/2025
+ms.author: danlep
+---
+This policy tracks token usage independently at each gateway where it is applied, including [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway) and regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md). It doesn't aggregate token counts across the entire instance.
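
Note on the new include: because each gateway counts tokens on its own, the effective instance-wide ceiling for one counter key can approach the configured limit multiplied by the number of gateways serving that key. The following is a minimal sketch of that behavior, not the gateway's actual implementation. The `GatewayTokenCounter` class, the `simulate` helper, the gateway names, and the 1000-token limit are all hypothetical, and the model assumes a single time window and rejects a request outright when it would exceed the local limit.

```python
from collections import defaultdict


class GatewayTokenCounter:
    """Hypothetical stand-in for one gateway's independent token counter."""

    def __init__(self, tokens_per_minute):
        self.tokens_per_minute = tokens_per_minute
        # counter key -> tokens consumed in the current (single) window
        self.used = defaultdict(int)

    def try_consume(self, key, tokens):
        """Admit the request only if this gateway's own count stays within the limit."""
        if self.used[key] + tokens > self.tokens_per_minute:
            return False
        self.used[key] += tokens
        return True


def simulate(gateway_names, tokens_per_minute, requests):
    """Return the total tokens admitted across all gateways.

    requests is a list of (gateway_name, counter_key, tokens) tuples.
    Each gateway keeps its own counter; nothing is aggregated across them.
    """
    gateways = {name: GatewayTokenCounter(tokens_per_minute) for name in gateway_names}
    admitted = 0
    for gateway_name, key, tokens in requests:
        if gateways[gateway_name].try_consume(key, tokens):
            admitted += tokens
    return admitted


# Same counter key, traffic split across two hypothetical regional gateways.
requests = [
    ("west-us", "sub-1", 400),
    ("west-us", "sub-1", 400),
    ("west-us", "sub-1", 400),  # rejected: west-us alone would reach 1200
    ("east-us", "sub-1", 400),
    ("east-us", "sub-1", 400),
]
total = simulate(["west-us", "east-us"], 1000, requests)
print(total)  # 1600, above the 1000-token limit configured per gateway
```

If a strict instance-wide cap matters, this suggests sizing the per-gateway limit with the number of gateways in mind, since the policy doesn't aggregate token counts across the instance.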