Commit 0f5b98a

Merge pull request #299696 from v-albemi/apim-throttling
Freshness Edit: Azure API Management
2 parents 0b354b6 + a73b363 commit 0f5b98a

3 files changed (+37, -26 lines)

articles/api-management/api-management-sample-flexible-throttling.md

Lines changed: 35 additions & 24 deletions
@@ -1,49 +1,56 @@
 ---
-title: Advanced request throttling with Azure API Management
-description: Learn how to create and apply flexible quota and rate limiting policies with Azure API Management.
+title: Advanced Request Throttling with Azure API Management
+description: Learn how to create and apply flexible quota and rate limiting policies by using Azure API Management.
 services: api-management
 author: dlepow
 ms.service: azure-api-management
 ms.topic: concept-article
-ms.date: 04/10/2025
+ms.date: 05/15/2025
 ms.author: danlep

+#customer intent: As an API provider, I want to create and apply quota and rate limiting so that I can protect my APIs from abuse and/or create value for different API product tiers.
 ---
+
 # Advanced request throttling with Azure API Management

 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]

-Being able to throttle incoming requests is a key role of Azure API Management. Either by controlling the rate of requests or the total requests/data transferred, API Management allows API providers to protect their APIs from abuse and create value for different API product tiers.
+The ability to throttle incoming requests is a key role of Azure API Management. API Management enables API providers to protect their APIs from abuse and create value for different API product tiers by controlling either the rate of requests or the total requests/data transferred. This article describes how to create and apply quota and rate limiting.

 ## Rate limits and quotas
+
 Rate limits and quotas are used for different purposes.

 ### Rate limits
-Rate limits are usually used to protect against short and intense volume bursts. For example, if you know your backend service has a bottleneck at its database with a high call volume, you could set a `rate-limit-by-key` policy to not allow high call volume by using this setting.
+
+Rate limits are usually used to protect against short and intense volume bursts. For example, if you know your backend service has a bottleneck at its database when call volumes are high, you can set a `rate-limit-by-key` policy to disallow high call volumes.

 [!INCLUDE [api-management-rate-limit-accuracy](../../includes/api-management-rate-limit-accuracy.md)]


 ### Quotas
-Quotas are usually used for controlling call rates over a longer period of time. For example, they can set the total number of calls that a particular subscriber can make within a given month. For monetizing your API, quotas can also be set differently for tier-based subscriptions. For example, a Basic tier subscription might be able to make no more than 10,000 calls a month but a Premium tier could go up to 100,000,000 calls each month.

-Within Azure API Management, rate limits are typically propagated faster across the nodes to protect against spikes. In contrast, usage quota information is used over a longer term and hence its implementation is different.
+Quotas are usually used to control call rates over a longer period of time. For example, they can set the total number of calls that a particular subscriber can make within a given month. If you monetize your API, you can also set quotas differently for tier-based subscriptions. For example, a Basic tier subscription might be able to make no more than 10,000 calls per month, but a Premium tier might be able to make 100,000,000 calls each month.
+
+In API Management, rate limits are typically propagated faster across the nodes to protect against spikes. In contrast, usage quota information is used over a longer term, so its implementation is different.

 [!INCLUDE [api-management-quota-accuracy](../../includes/api-management-quota-accuracy.md)]


 ## Product-based throttling
-Rate throttling capabilities that are scoped to a particular subscription are useful for the API provider to apply limits on the developers who have signed up to use their API. However, it does not help, for example, in throttling individual end users of the API. It is possible for a single user of the developer's application to consume the entire quota and then prevent other customers of the developer from being able to use the application. Also, several customers who might generate a high volume of requests may limit access to occasional users.
+
+API providers can use rate throttling capabilities that are scoped to a particular subscription to apply limits on the developers who have signed up to use their API. However, this type of throttling doesn't help, for example, with throttling individual end users of the API. It's possible for a single user of the developer's application to consume the entire quota and prevent other customers of the developer from being able to use the application. Also, several customers who generate a high volume of requests might limit access to occasional users.

 ## Custom key-based throttling

 > [!NOTE]
-> The `rate-limit-by-key` and `quota-by-key` policies are not available when in the Consumption tier of Azure API Management. The `quota-by-key` policy is also currently not available in the v2 tiers.
+> The `rate-limit-by-key` and `quota-by-key` policies aren't available in the Consumption tier of Azure API Management.

-The [rate-limit-by-key](rate-limit-by-key-policy.md) and [quota-by-key](quota-by-key-policy.md) policies provide a more flexible solution to traffic control. These policies allow you to define expressions to identify the keys that are used to track traffic usage. The way this works is easiest illustrated with an example.
+The [rate-limit-by-key](rate-limit-by-key-policy.md) and [quota-by-key](quota-by-key-policy.md) policies provide a more flexible solution to traffic control. These policies enable you to define expressions to identify the keys that are used to track traffic usage. This technique is illustrated in the following examples.

-## IP address throttling
-The following policies restrict a single client IP address to only 10 calls every minute, with a total of 1,000,000 calls and 10,000 kilobytes of bandwidth per month.
+### IP address throttling
+
+The following policies restrict a single client IP address to only 10 calls every minute and enforce a total of 1,000,000 calls and 10,000 kilobytes of bandwidth per month:

 ```xml
 <rate-limit-by-key calls="10"
@@ -56,41 +63,45 @@ The following policies restrict a single client IP address to only 10 calls ever
 counter-key="@(context.Request.IpAddress)" />
 ```

-If all clients on the internet used a unique IP address, this might be an effective way of limiting usage by user. However, it is likely that multiple users are sharing a single public IP address due to them accessing the internet via a NAT device. Despite this, for APIs that allow unauthenticated access the `IpAddress` might be the best option.
+If all clients on the internet used a unique IP address, this might be an effective way of limiting usage by user. However, it's likely that multiple users are sharing a single public IP address because they access the internet via a NAT device. Still, for APIs that allow unauthenticated access, using `IpAddress` might be the best option.
+
+### User identity throttling

-## User identity throttling
-If an end user is authenticated, then a throttling key can be generated based on information that uniquely identifies that user.
+If an end user is authenticated, you can generate a throttling key based on information that uniquely identifies the user:

 ```xml
 <rate-limit-by-key calls="10"
 renewal-period="60"
 counter-key="@(context.Request.Headers.GetValueOrDefault("Authorization","").AsJwt()?.Subject)" />
 ```

-This example shows how to extract the Authorization header, convert it to `JWT` object and use the subject of the token to identify the user and use that as the rate limiting key. If the user identity is stored in the `JWT` as one of the other claims, then that value could be used in its place.
+This example shows how to extract the Authorization header, convert it to a `JWT` object, and use the subject of the token to identify the user. It then uses that value as the rate limiting key. If the user identity is stored in the `JWT` as one of the other claims, that value can be used instead.
+
+### Combined policies

-## Combined policies
-Although the user-based throttling policies provide more control than the subscription-based throttling policies, there is still value combining both capabilities. Throttling by product subscription key ([Limit call rate by subscription](rate-limit-policy.md) and [Set usage quota by subscription](quota-policy.md)) is a great way to enable monetizing of an API by charging based on usage levels. The finer grained control of being able to throttle by user is complementary and prevents one user's behavior from degrading the experience of another.
+Although user-based throttling policies provide more control than subscription-based throttling policies, there is still value in combining both capabilities. For monetized APIs, throttling by product subscription key ([Limit call rate by subscription](rate-limit-policy.md) and [Set usage quota by subscription](quota-policy.md)) is a great way to implement fees that are based on usage levels. The finer-grained control of being able to throttle by user is complementary and prevents one user's behavior from degrading the experience of another.

-## Client driven throttling
-When the throttling key is defined using a [policy expression](./api-management-policy-expressions.md), then it is the API provider that is choosing how the throttling is scoped. However, a developer might want to control how they rate limit their own customers. This could be enabled by the API provider by introducing a custom header to allow the developer's client application to communicate the key to the API.
+### Client-driven throttling
+
+When the throttling key is defined via a [policy expression](./api-management-policy-expressions.md), the API provider chooses how the throttling is scoped. However, a developer might want to control how they rate-limit their own customers. The API provider can enable this type of control by introducing a custom header to allow the developer's client application to communicate the key to the API:

 ```xml
 <rate-limit-by-key calls="100"
 renewal-period="60"
 counter-key="@(request.Headers.GetValueOrDefault("Rate-Key",""))"/>
 ```

-This enables the developer's client application to choose how they want to create the rate limiting key. The client developers could create their own rate tiers by allocating sets of keys to users and rotating the key usage.
+This technique enables the developer's client application to determine how to create the rate limiting key. The client developers could create their own rate tiers by allocating sets of keys to users and rotating the key usage.

 ## Considerations for multiple regions or gateways

-Rate limiting policies like `rate-limit`, `rate-limit-by-key`, `azure-openai-token-limit`, and `llm-token-limit` use counters at the level of the API Management gateway. This means that in [multi-region deployments](api-management-howto-deploy-multi-region.md) of API Management, each regional gateway has a separate counter, and rate limits are enforced separately for each region. Similarly, in API Management instances with [workspaces](workspaces-overview.md), limits are enforced separately for each workspace gateway.
+Rate limiting policies like `rate-limit`, `rate-limit-by-key`, `azure-openai-token-limit`, and `llm-token-limit` use counters at the level of the API Management gateway. Therefore, in [multi-region deployments](api-management-howto-deploy-multi-region.md) of API Management, each regional gateway has a separate counter, and rate limits are enforced separately for each region. Similarly, in API Management instances with [workspaces](workspaces-overview.md), limits are enforced separately for each workspace gateway.

-Quota policies such as `quota` and `quota-by-key` are global, meaning that a single counter is used at the level of the API Management instance.
+Quota policies like `quota` and `quota-by-key` are global, which means that a single counter is used at the level of the API Management instance.

 ## Summary
-Azure API Management provides rate and quota throttling to both protect and add value to your API service. These throttling policies with custom scoping rules allow you finer grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these new policies by manufacturing rate limiting keys with client IP addresses, user identity, and client generated values. However, there are many other parts of the message that could be used such as user agent, URL path fragments, and message size.
+
+API Management provides rate and quota throttling to protect and add value to your API service. Throttling policies that have custom scoping rules provide finer-grained control over those policies to enable your customers to build even better applications. The examples in this article demonstrate the use of these policies by creating rate limiting keys with client IP addresses, user identity, and client-generated values. You can, however, use many other parts of the message, such as user agent, URL path fragments, and message size.

 ## Related content

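The client-driven throttling hunk in this diff builds its key from `request.Headers`, while the IP address and user identity snippets reach the request through `context.Request`, which is how policy expressions normally expose it. A minimal sketch that follows the `context.Request` form and pairs a subscription-scoped `rate-limit` with a client-driven `rate-limit-by-key`, the combination the Combined policies section recommends, might look like the following. The `Rate-Key` header comes from the article; the call counts and 60-second windows are illustrative assumptions, not values from this commit.

```xml
<policies>
    <inbound>
        <base />
        <!-- Subscription-scoped limit: at most 1,000 calls per 60 seconds
             for each subscription (illustrative values). -->
        <rate-limit calls="1000" renewal-period="60" />
        <!-- Client-driven limit: the developer's app supplies the counter key
             in a Rate-Key request header. GetValueOrDefault returns ""
             when the header is absent. -->
        <rate-limit-by-key calls="100"
                           renewal-period="60"
                           counter-key="@(context.Request.Headers.GetValueOrDefault("Rate-Key",""))" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

The `rate-limit` element counts calls per subscription, which supports charging by product tier, while the keyed limit lets the developer's app keep one of its own users from consuming the whole allowance. Because requests that omit the header all evaluate to the same empty key, you may want to decide separately how such requests should be treated.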
includes/api-management-quota-accuracy.md

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ ms.date: 07/05/2022
 ms.author: danlep
 ---
 > [!NOTE]
-> When underlying compute resources restart in the service platform, API Management may continue to handle requests for a short period after a quota is reached.
+> When underlying compute resources restart in the service platform, API Management might continue to handle requests for a short period after a quota is reached.

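The Quotas section of the article contrasts a Basic subscription capped at 10,000 calls a month with a Premium subscription allowed 100,000,000. A minimal sketch of that split, assuming one `quota` policy per product and approximating a month as a 30-day window, is shown here. The product names are hypothetical, and `renewal-period` is expressed in seconds.

```xml
<!-- Hypothetical "Basic" product policy: each subscription gets 10,000 calls per window.
     renewal-period is in seconds: 30 days x 86,400 s/day = 2,592,000 s. -->
<quota calls="10000" renewal-period="2592000" />

<!-- Hypothetical "Premium" product policy, applied at that product's own scope. -->
<quota calls="100000000" renewal-period="2592000" />
```

As the note in this include says, a caller might still be served briefly after reaching the quota, so treat the configured value as a target rather than an exact cutoff. Per the article's Considerations section, the counter behind `quota` is global to the instance rather than kept per regional gateway.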
includes/api-management-rate-limit-accuracy.md

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ ms.date: 07/05/2022
 ms.author: danlep
 ---
 > [!CAUTION]
-> Due to the distributed nature of throttling architecture, rate limiting is never completely accurate. The difference between the configured and the actual number of allowed requests varies based on request volume and rate, backend latency, and other factors.
+> Because of the distributed nature of throttling architecture, rate limiting is never completely accurate. The difference between the configured number of allowed requests and the actual number varies depending on request volume and rate, backend latency, and other factors.

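Given the caution that rate limiting is never completely accurate, clients benefit from being told how much headroom they have left instead of assuming the configured count is exact. The sketch below extends the per-IP limit from the article (10 calls every 60 seconds) with response headers, assuming the `remaining-calls-header-name` and `total-calls-header-name` attributes of `rate-limit-by-key` are available in your API Management tier. The header names are illustrative choices, not names defined by the service.

```xml
<!-- Per-client-IP limit: 10 calls per 60-second window (values from the IP address example).
     Calls over the limit are rejected with 429 Too Many Requests.
     The two header attributes are assumed to be supported; header names are illustrative. -->
<rate-limit-by-key calls="10"
                   renewal-period="60"
                   counter-key="@(context.Request.IpAddress)"
                   remaining-calls-header-name="X-RateLimit-Remaining"
                   total-calls-header-name="X-RateLimit-Limit" />
```

Because each regional or workspace gateway keeps its own rate limit counter, a client that reaches more than one gateway could see different remaining-call values from each.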