articles/api-management/api-management-sample-flexible-throttling.md (+12 −7)
@@ -3,11 +3,9 @@ title: Advanced request throttling with Azure API Management
description: Learn how to create and apply flexible quota and rate limiting policies with Azure API Management.
services: api-management
author: dlepow
- manager: erikre
- ms.assetid: fc813a65-7793-4c17-8bb9-e387838193ae
ms.service: azure-api-management
ms.topic: concept-article
- ms.date: 02/03/2018
+ ms.date: 04/10/2025
ms.author: danlep
---
@@ -58,7 +56,7 @@ The following policies restrict a single client IP address to only 10 calls ever
counter-key="@(context.Request.IpAddress)" />
```
- If all clients on the Internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users are sharing a single public IP address due to them accessing the Internet via a NAT device. Despite this, for APIs that allow unauthenticated access the `IpAddress` might be the best option.
+ If all clients on the internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users share a single public IP address because they access the internet through a NAT device. Despite this, for APIs that allow unauthenticated access, `IpAddress` might be the best option.
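Pieced together, the complete policy whose tail appears in the context lines above — a minimal sketch using the 10-calls-per-minute values from this article's example — looks like this:

```xml
<!-- Limit each client IP address to 10 calls per 60 seconds.
     Place inside the <inbound> section of the policy definition. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
```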
## User identity throttling
If an end user is authenticated, then a throttling key can be generated based on information that uniquely identifies that user.
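One way to build such a key — a sketch assuming callers present a JWT bearer token in the `Authorization` header — is to use the token's subject claim as the counter key:

```xml
<!-- Throttle per authenticated user: the JWT's subject claim,
     read from the Authorization header, becomes the counter key. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key='@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Subject)' />
```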
@@ -85,8 +83,15 @@ When the throttling key is defined using a [policy expression](./api-management-
This enables the developer's client application to choose how they want to create the rate limiting key. The client developers could create their own rate tiers by allocating sets of keys to users and rotating the key usage.
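As an illustration — a sketch in which the `Rate-Key` header name is hypothetical, chosen by the client developer rather than built into API Management — the key could come from a custom request header:

```xml
<!-- The client supplies its own throttling key in a custom header.
     "Rate-Key" is an illustrative header name, not a built-in one. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key='@(context.Request.Headers.GetValueOrDefault("Rate-Key",""))' />
```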
+ ## Considerations for multiple regions or gateways
+
+ Rate limiting policies like `rate-limit`, `rate-limit-by-key`, `azure-openai-token-limit`, and `llm-token-limit` use counters at the level of the API Management gateway. This means that in [multi-region deployments](api-management-howto-deploy-multi-region.md) of API Management, each regional gateway has a separate counter, and rate limits are enforced separately for each region. Similarly, in API Management instances with [workspaces](workspaces-overview.md), limits are enforced separately for each workspace gateway.
+
+ Quota policies such as `quota` and `quota-by-key` are global, meaning that a single counter is used at the level of the API Management instance.
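A sketch contrasting the two behaviors, with illustrative numbers (a 10-calls-per-minute rate limit alongside a 10,000-calls-per-day quota for the same key):

```xml
<!-- Rate limit: counted separately by each regional or workspace gateway. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)" />
<!-- Quota: one counter shared across the entire API Management instance
     (86400 seconds = one day). -->
<quota-by-key calls="10000"
              renewal-period="86400"
              counter-key="@(context.Request.IpAddress)" />
```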
## Summary
- Azure API Management provides rate and quota throttling to both protect and add value to your API service. The new throttling policies with custom scoping rules allow you finer grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these new policies by manufacturing rate limiting keys with client IP addresses, user identity, and client generated values. However, there are many other parts of the message that could be used such as user agent, URL path fragments, message size.
+ Azure API Management provides rate and quota throttling to both protect and add value to your API service. These throttling policies with custom scoping rules give you finer-grained control, enabling your customers to build even better applications. The examples in this article demonstrate the use of these policies by manufacturing rate limiting keys with client IP addresses, user identity, and client-generated values. However, there are many other parts of the message that could be used, such as user agent, URL path fragments, and message size.
+ ## Related content
- ## Next steps
- Please give us your feedback as a GitHub issue for this topic. It would be great to hear about other potential key values that have been a logical choice in your scenarios.
+ * [Rate limit and quota policies](api-management-policies.md#rate-limiting-and-quotas)
articles/api-management/azure-openai-token-limit-policy.md (+1 −0)
@@ -74,6 +74,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
+ * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
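For context, a minimal sketch of this policy with illustrative values — `counter-key`, `tokens-per-minute`, and `estimate-prompt-tokens` are the documented attributes used here:

```xml
<!-- Limit each subscription to an illustrative 5000 tokens per minute.
     When estimate-prompt-tokens is true (and always when streaming),
     prompt tokens are estimated instead of read from usage metrics. -->
<azure-openai-token-limit counter-key="@(context.Subscription.Id)"
                          tokens-per-minute="5000"
                          estimate-prompt-tokens="true" />
```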
articles/api-management/llm-token-limit-policy.md (+1 −0)
@@ -75,6 +75,7 @@ By relying on token usage metrics returned from the LLM endpoint, the policy can
* Certain LLM endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
+ * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
- *When `increment-condition` or `increment-count` are defined using expressions, evaluation and increment of ratelimit counter are postponed to end of outbound pipeline to allow for policy expressions based on the reponse. Limit exceeded condition is not evaluated at the same time in this case and will be evaluated on next incoming call. This leads to cases where `429 Too Many Requests` status code is returned 1 call later than usual.
+ * When `increment-condition` or `increment-count` is defined using an expression, evaluation and increment of the rate limit counter are postponed to the end of the outbound pipeline to allow for policy expressions based on the response. In this case, the limit-exceeded condition is not evaluated at the same time and is instead evaluated on the next incoming call. This leads to cases where a `429 Too Many Requests` status code is returned one call later than usual.
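A sketch of the scenario this note describes, using the documented `increment-condition` attribute with an illustrative 10-calls-per-minute limit:

```xml
<!-- Count only successful calls against the limit. Because the expression
     reads context.Response, the counter is incremented at the end of the
     outbound pipeline, so a breach surfaces as 429 on the next call. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   increment-condition="@(context.Response.StatusCode == 200)"
                   counter-key="@(context.Request.IpAddress)" />
```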