Commit 5ac7005

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into us417294-vnet-top-freshness
2 parents: 8f5c5a8 + e2ca634

26 files changed: +230 -163 lines

articles/api-management/api-management-sample-flexible-throttling.md

Lines changed: 12 additions & 7 deletions
````diff
@@ -3,11 +3,9 @@ title: Advanced request throttling with Azure API Management
 description: Learn how to create and apply flexible quota and rate limiting policies with Azure API Management.
 services: api-management
 author: dlepow
-manager: erikre
-ms.assetid: fc813a65-7793-4c17-8bb9-e387838193ae
 ms.service: azure-api-management
 ms.topic: concept-article
-ms.date: 02/03/2018
+ms.date: 04/10/2025
 ms.author: danlep
 
 ---
````

````diff
@@ -58,7 +56,7 @@ The following policies restrict a single client IP address to only 10 calls ever
     counter-key="@(context.Request.IpAddress)" />
 ```
 
-If all clients on the Internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users are sharing a single public IP address due to them accessing the Internet via a NAT device. Despite this, for APIs that allow unauthenticated access the `IpAddress` might be the best option.
+If all clients on the internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users are sharing a single public IP address due to them accessing the internet via a NAT device. Despite this, for APIs that allow unauthenticated access the `IpAddress` might be the best option.
 
 ## User identity throttling
 If an end user is authenticated, then a throttling key can be generated based on information that uniquely identifies that user.
````

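For context, the `counter-key` shown in this hunk belongs to a `rate-limit-by-key` policy along these lines; a minimal sketch matching the 10-calls-per-minute limit the hunk header describes, with the surrounding element reconstructed rather than taken verbatim from the diff:

```xml
<!-- Sketch: limit each client IP address to 10 calls per 60 seconds. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
```
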
````diff
@@ -85,8 +83,15 @@ When the throttling key is defined using a [policy expression](./api-management-
 
 This enables the developer's client application to choose how they want to create the rate limiting key. The client developers could create their own rate tiers by allocating sets of keys to users and rotating the key usage.
 
+## Considerations for multiple regions or gateways
+
+Rate limiting policies like `rate-limit`, `rate-limit-by-key`, `azure-openai-token-limit`, and `llm-token-limit` use counters at the level of the API Management gateway. This means that in [multi-region deployments](api-management-howto-deploy-multi-region.md) of API Management, each regional gateway has a separate counter, and rate limits are enforced separately for each region. Similarly, in API Management instances with [workspaces](workspaces-overview.md), limits are enforced separately for each workspace gateway.
+
+Quota policies such as `quota` and `quota-by-key` are global, meaning that a single counter is used at the level of the API Management instance.
+
 ## Summary
-Azure API Management provides rate and quota throttling to both protect and add value to your API service. The new throttling policies with custom scoping rules allow you finer grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these new policies by manufacturing rate limiting keys with client IP addresses, user identity, and client generated values. However, there are many other parts of the message that could be used such as user agent, URL path fragments, message size.
+Azure API Management provides rate and quota throttling to both protect and add value to your API service. These throttling policies with custom scoping rules allow you finer grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these new policies by manufacturing rate limiting keys with client IP addresses, user identity, and client generated values. However, there are many other parts of the message that could be used such as user agent, URL path fragments, and message size.
+
+## Related content
 
-## Next steps
-Please give us your feedback as a GitHub issue for this topic. It would be great to hear about other potential key values that have been a logical choice in your scenarios.
+* [Rate limit and quota policies](api-management-policies.md#rate-limiting-and-quotas)
````

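To illustrate the user identity throttling this hunk references, the counter key can be derived from a claim in the caller's JWT; a hedged sketch, with the header name and expression assumed rather than taken from this commit:

```xml
<!-- Sketch: key the rate limit counter on the subject claim of the caller's JWT. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Subject)" />
```
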

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -74,6 +74,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
````

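For reference, the `azure-openai-token-limit` policy that this note amends is configured roughly as follows; a sketch only, with the counter key and limit values assumed rather than taken from this commit:

```xml
<!-- Sketch: cap each subscription at an estimated 5,000 tokens per minute,
     estimating prompt tokens so requests can be rejected before reaching the backend. -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
                          tokens-per-minute="5000"
                          estimate-prompt-tokens="true" />
```
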
articles/api-management/llm-token-limit-policy.md

Lines changed: 1 addition & 0 deletions
````diff
@@ -75,6 +75,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
 * Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
+* [!INCLUDE [api-management-token-limit-gateway-counts](../../includes/api-management-token-limit-gateway-counts.md)]
 
 ## Examples
 
````

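The `llm-token-limit` policy mirrors the Azure OpenAI variant; a minimal sketch under the same assumptions (counter key and limit values illustrative):

```xml
<!-- Sketch: cap each client IP at an estimated 1,000 tokens per minute for a generic LLM backend. -->
<llm-token-limit counter-key="@(context.Request.IpAddress)"
                 tokens-per-minute="1000"
                 estimate-prompt-tokens="true" />
```
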
articles/api-management/rate-limit-by-key-policy.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -6,7 +6,7 @@ author: dlepow
 
 ms.service: azure-api-management
 ms.topic: reference
-ms.date: 07/23/2024
+ms.date: 03/31/2025
 ms.author: danlep
 ---
 
````

````diff
@@ -63,8 +63,8 @@ To understand the difference between rate limits and quotas, [see Rate limits an
 
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
-* When `increment-condition` or `increment-count` are defined using expressions, evaluation and increment of rate limit counter are postponed to end of outbound pipeline to allow for policy expressions based on the reponse. Limit exceeded condition is not evaluated at the same time in this case and will be evaluated on next incoming call. This leads to cases where `429 Too Many Requests` status code is returned 1 call later than usual.
-
+* [!INCLUDE [api-management-rate-limit-gateway-calls](../../includes/api-management-rate-limit-gateway-calls.md)]
+* When `increment-condition` or `increment-count` are defined using expressions, evaluation and increment of the rate limit counter are postponed to the end of outbound pipeline to allow for policy expressions based on the response. Limit exceeded condition is not evaluated at the same time in this case and will be evaluated on next incoming call. This leads to cases where `429 Too Many Requests` status code is returned 1 call later than usual.
 
 ## Example
 
````

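To make the corrected note concrete, an expression-based `increment-condition` counts a call against the limit only once the response is known, which is why counter evaluation moves to the end of the outbound pipeline; a sketch with illustrative values:

```xml
<!-- Sketch: count only successful (200) responses toward the limit; the counter
     increments at the end of the outbound pipeline, so a limit breach surfaces
     as 429 Too Many Requests on the next call. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   increment-condition="@(context.Response.StatusCode == 200)"
                   counter-key="@(context.Request.IpAddress)" />
```
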
articles/api-management/rate-limit-policy.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -6,7 +6,7 @@ author: dlepow
 
 ms.service: azure-api-management
 ms.topic: reference
-ms.date: 07/23/2024
+ms.date: 03/31/2025
 ms.author: danlep
 ---
 
````

````diff
@@ -87,6 +87,8 @@ To understand the difference between rate limits and quotas, [see Rate limits an
 * This policy can be used only once per policy definition.
 * This policy is only applied when an API is accessed using a subscription key.
 * [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
+* [!INCLUDE [api-management-rate-limit-gateway-calls](../../includes/api-management-rate-limit-gateway-calls.md)]
+
 
 
 ## Example
````

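Because `rate-limit` applies only to calls made with a subscription key (per the note above), its configuration needs no explicit counter key; a minimal sketch with illustrative values:

```xml
<!-- Sketch: allow each subscription at most 20 calls per 90 seconds. -->
<rate-limit calls="20" renewal-period="90" />
```
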
