articles/api-management/api-management-sample-flexible-throttling.md (+12 −7)
@@ -3,11 +3,9 @@ title: Advanced request throttling with Azure API Management
description: Learn how to create and apply flexible quota and rate limiting policies with Azure API Management.
services: api-management
author: dlepow
- manager: erikre
- ms.assetid: fc813a65-7793-4c17-8bb9-e387838193ae
ms.service: azure-api-management
ms.topic: concept-article
- ms.date: 02/03/2018
+ ms.date: 04/10/2025
ms.author: danlep
---
@@ -58,7 +56,7 @@ The following policies restrict a single client IP address to only 10 calls ever
counter-key="@(context.Request.IpAddress)" />
```
- If all clients on the Internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users are sharing a single public IP address due to them accessing the Internet via a NAT device. Despite this, for APIs that allow unauthenticated access the `IpAddress` might be the best option.
+ If all clients on the internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users share a single public IP address because they access the internet through a NAT device. Despite this, for APIs that allow unauthenticated access, `IpAddress` might be the best option.
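Pieced together, the complete policy whose tail appears in the context lines above — a minimal sketch using the 10-calls-per-minute values from this article's example — looks like this:

```xml
<!-- Limit each client IP address to 10 calls per 60 seconds.
     Place inside the <inbound> section of the policy definition. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
```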
## User identity throttling
If an end user is authenticated, then a throttling key can be generated based on information that uniquely identifies that user.
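One way to build such a key — a sketch assuming callers present a JWT bearer token in the `Authorization` header — is to use the token's subject claim as the counter key:

```xml
<!-- Throttle per authenticated user: the JWT's subject claim,
     read from the Authorization header, becomes the counter key. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key='@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Subject)' />
```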
@@ -85,8 +83,15 @@ When the throttling key is defined using a [policy expression](./api-management-
This enables the developer's client application to choose how they want to create the rate limiting key. The client developers could create their own rate tiers by allocating sets of keys to users and rotating the key usage.
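As an illustration — a sketch in which the `Rate-Key` header name is hypothetical, chosen by the client developer rather than built into API Management — the key could come from a custom request header:

```xml
<!-- The client supplies its own throttling key in a custom header.
     "Rate-Key" is an illustrative header name, not a built-in one. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key='@(context.Request.Headers.GetValueOrDefault("Rate-Key",""))' />
```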
+ ## Considerations for multiple regions or gateways
+
+ Rate limiting policies like `rate-limit`, `rate-limit-by-key`, `azure-openai-token-limit`, and `llm-token-limit` use counters at the level of the API Management gateway. This means that in [multi-region deployments](api-management-howto-deploy-multi-region.md) of API Management, each regional gateway has a separate counter, and rate limits are enforced separately for each region. Similarly, in API Management instances with [workspaces](workspaces-overview.md), limits are enforced separately for each workspace gateway.
+
+ Quota policies such as `quota` and `quota-by-key` are global, meaning that a single counter is used at the level of the API Management instance.
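A sketch contrasting the two behaviors, with illustrative numbers (a 10-calls-per-minute rate limit alongside a 10,000-calls-per-day quota for the same key):

```xml
<!-- Rate limit: counted separately by each regional or workspace gateway. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
<!-- Quota: one counter shared across the entire API Management instance
     (86400 seconds = one day). -->
<quota-by-key calls="10000"
              renewal-period="86400"
              counter-key="@(context.Request.IpAddress)" />
```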
## Summary
- Azure API Management provides rate and quota throttling to both protect and add value to your API service. The new throttling policies with custom scoping rules allow you finer grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these new policies by manufacturing rate limiting keys with client IP addresses, user identity, and client generated values. However, there are many other parts of the message that could be used such as user agent, URL path fragments, message size.
+ Azure API Management provides rate and quota throttling to both protect and add value to your API service. These throttling policies with custom scoping rules give you finer-grained control, enabling your customers to build even better applications. The examples in this article demonstrate the use of these policies by manufacturing rate limiting keys with client IP addresses, user identity, and client-generated values. However, there are many other parts of the message that could be used, such as user agent, URL path fragments, and message size.
+ ## Related content
- ## Next steps
- Please give us your feedback as a GitHub issue for this topic. It would be great to hear about other potential key values that have been a logical choice in your scenarios.
+ * [Rate limit and quota policies](api-management-policies.md#rate-limiting-and-quotas)
articles/api-management/azure-openai-token-limit-policy.md (+1 −0)
@@ -74,6 +74,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
+ * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
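For context, a minimal sketch of this policy with illustrative values — `counter-key`, `tokens-per-minute`, and `estimate-prompt-tokens` are the documented attributes used here:

```xml
<!-- Limit each subscription to an illustrative 5000 tokens per minute.
     When estimate-prompt-tokens is true (and always when streaming),
     prompt tokens are estimated instead of read from usage metrics. -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
                          tokens-per-minute="5000"
                          estimate-prompt-tokens="true" />
```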
articles/api-management/llm-token-limit-policy.md (+1 −0)
@@ -75,6 +75,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
* Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
+ * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
- *When `increment-condition` or `increment-count` are defined using expressions, evaluation and increment of ratelimit counter are postponed to end of outbound pipeline to allow for policy expressions based on the reponse. Limit exceeded condition is not evaluated at the same time in this case and will be evaluated on next incoming call. This leads to cases where `429 Too Many Requests` status code is returned 1 call later than usual.
+ * When `increment-condition` or `increment-count` is defined using an expression, evaluation and increment of the rate limit counter are postponed to the end of the outbound pipeline to allow for policy expressions based on the response. In this case, the limit-exceeded condition is not evaluated at the same time and is instead evaluated on the next incoming call. This leads to cases where a `429 Too Many Requests` status code is returned one call later than usual.
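A sketch of the scenario this note describes, using the documented `increment-condition` attribute with an illustrative 10-calls-per-minute limit:

```xml
<!-- Count only successful calls against the limit. Because the expression
     reads context.Response, the counter is incremented at the end of the
     outbound pipeline, so a breach surfaces as 429 on the next call. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   increment-condition="@(context.Response.StatusCode == 200)"
                   counter-key="@(context.Request.IpAddress)" />
```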