Commit 52c290c

Author: gitName
Commit message: ai token usage
Parent: c3f3f44

5 files changed, +20 -2 lines changed

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
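The streaming bullet above amounts to a small decision rule. As an illustration only, here is a hypothetical Python model of it; `counting_mode` and `CountingMode` are made-up names, not part of API Management, and the model assumes nothing beyond what the Azure OpenAI bullet states: streaming forces estimation of both prompt and completion tokens, and otherwise `estimate-prompt-tokens` decides whether prompt tokens are estimated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountingMode:
    """Hypothetical model: which counts are estimated vs. read from the endpoint's usage metrics."""
    prompt_estimated: bool
    completion_estimated: bool


def counting_mode(stream: bool, estimate_prompt_tokens: bool) -> CountingMode:
    # Streaming forces prompt-token estimation, whatever estimate-prompt-tokens says.
    prompt_estimated = stream or estimate_prompt_tokens
    # Per the azure-openai-token-limit bullet, completion tokens are estimated only when
    # the response is streamed; otherwise the endpoint's reported usage is used.
    completion_estimated = stream
    return CountingMode(prompt_estimated, completion_estimated)


if __name__ == "__main__":
    for stream in (False, True):
        for estimate in (False, True):
            print(f"stream={stream}, estimate-prompt-tokens={estimate} -> {counting_mode(stream, estimate)}")
```

Modeling the rule as a pure function keeps the four attribute combinations easy to enumerate and compare against the bullet text.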
articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 0 deletions
@@ -75,6 +75,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
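To make the image caveat concrete, here is a rough Python sketch of the flat per-image charge applied when prompts are estimated. It assumes (the docs don't say this) that the charged prompt total is simply the estimated text tokens plus 1200 per image; the function names and the 255-tokens-per-image figure in the usage lines are hypothetical.

```python
IMAGE_TOKEN_CHARGE = 1200  # per-image count stated for streaming / estimated prompts


def estimated_prompt_charge(text_tokens: int, image_count: int) -> int:
    """Hypothetical: tokens charged against the limit when prompt tokens are estimated."""
    if text_tokens < 0 or image_count < 0:
        raise ValueError("counts must be non-negative")
    # Assumption: estimated text tokens plus a flat charge for every image.
    return text_tokens + IMAGE_TOKEN_CHARGE * image_count


def overcount(image_count: int, backend_tokens_per_image: int) -> int:
    """Extra tokens charged compared with what the backend model would count for the images."""
    return image_count * (IMAGE_TOKEN_CHARGE - backend_tokens_per_image)


if __name__ == "__main__":
    # Hypothetical request: 500 text tokens and 3 images the backend would count at 255 tokens each.
    print(estimated_prompt_charge(500, 3))  # 4100
    print(overcount(3, 255))                # 2835
```

With several images per request, the flat charge can dominate the estimate, so limits and quotas may be consumed noticeably faster when streaming.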
articles/api-management/rate-limit-policy.md

Lines changed: 2 additions & 1 deletion
@@ -87,7 +87,8 @@ To understand the difference between rate limits and quotas, [see Rate limits an
 * This policy can be used only once per policy definition.
 * This policy is only applied when an API is accessed using a subscription key.
 * [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
-* In a [multi-region](api-management-howto-deploy-multi-region.md) deployment, this policy counts calls separately at each regional gateway in the deployment. The policy doesn't aggregate all call data for the instance.
+* [!INCLUDE [api-management-rate-limit-gateway-calls](../../includes/api-management-rate-limit-gateway-calls.md)]
+
 
 
 ## Example
Lines changed: 8 additions & 1 deletion
@@ -1,2 +1,9 @@
 ---
-* This policy tracks calls independently at each gateway where it is applied, including regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md) and [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway). It doesn't aggregate call data across the entire instance.
+author: dlepow
+ms.service: azure-api-management
+ms.topic: include
+ms.date: 04/09/2025
+ms.author: danlep
+---
+
+This policy tracks calls independently at each gateway where it is applied, including regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md) and [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway). It doesn't aggregate call data across the entire instance.
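Because each gateway keeps its own counts, a client whose calls are spread over several gateways can, in the worst case, be admitted up to the limit at each of them. The Python sketch below models this with independent in-memory counters; `GatewayCounter` and `accepted_calls` are hypothetical names, and the model ignores renewal periods and whatever counting algorithm the gateways actually use.

```python
from collections import defaultdict


class GatewayCounter:
    """Hypothetical call counter kept by one gateway for a single renewal period."""

    def __init__(self, name: str, calls_limit: int) -> None:
        self.name = name
        self.calls_limit = calls_limit
        # Counts are local to this gateway; no other gateway reads or updates them.
        self._counts: dict[str, int] = defaultdict(int)

    def try_call(self, key: str) -> bool:
        """Admit the call if this gateway's count for `key` is still below the limit."""
        if self._counts[key] >= self.calls_limit:
            return False  # throttled at this gateway only
        self._counts[key] += 1
        return True


def accepted_calls(gateways: list[GatewayCounter], key: str, attempts_per_gateway: int) -> int:
    """Send the same number of calls with one key to every gateway; count how many are admitted."""
    return sum(
        gateway.try_call(key)
        for gateway in gateways
        for _ in range(attempts_per_gateway)
    )


if __name__ == "__main__":
    limit = 10
    gateways = [GatewayCounter("region-a", limit), GatewayCounter("region-b", limit)]
    # 15 attempts per gateway: each admits 10, so 20 calls get through in total,
    # twice the configured limit, because the counts are never aggregated.
    print(accepted_calls(gateways, "subscription-key-1", 15))
```

The same reasoning applies to token counts, which the token limit policies also track per gateway.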
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+author: dlepow
+ms.service: azure-api-management
+ms.topic: include
+ms.date: 04/09/2025
+ms.author: danlep
+---
+This policy tracks token usage independently at each gateway where it is applied, including [workspace gateways](../articles/api-management/workspaces-overview.md#workspace-gateway) and regional gateways in a [multi-region deployment](../articles/api-management/api-management-howto-deploy-multi-region.md). It doesn't aggregate token counts across the entire instance.