Commit ea30921

Merge pull request #276225 from dlepow/apimrl
[APIM] Rate limit counter clarification
2 parents fbcd502 + 3aa29f0 commit ea30921

3 files changed: +19 -9 lines changed

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 2 additions & 1 deletion
@@ -72,10 +72,11 @@ For more information, see [Azure OpenAI Service models](../ai-services/openai/co

 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]

 ## Example

-In the following example, the token limit of 5000 per minute is keyed by the caller IP address. The policy doesn't estimate the number of tokens required for a prompt. After each policy execution, the remaining tokens allowed in the time period are stored in the variable `remainingTokens`.
+In the following example, the token limit of 5000 per minute is keyed by the caller IP address. The policy doesn't estimate the number of tokens required for a prompt. After each policy execution, the remaining tokens allowed for that caller IP address in the time period are stored in the variable `remainingTokens`.

 ```xml
 <policies>

articles/api-management/rate-limit-by-key-policy.md

Lines changed: 9 additions & 8 deletions
@@ -6,7 +6,7 @@ author: dlepow

 ms.service: api-management
 ms.topic: article
-ms.date: 03/18/2024
+ms.date: 05/23/2024
 ms.author: danlep
 ---

@@ -42,15 +42,15 @@ To understand the difference between rate limits and quotas, [see Rate limits an

 | Attribute | Description | Required | Default |
 | ------------------- | ----------------------------------------------------------------------------------------------------- | -------- | ------- |
-| calls | The maximum total number of calls allowed during the time interval specified in the `renewal-period`. Policy expressions are allowed. | Yes | N/A |
+| calls | The maximum total number of calls allowed for the key value during the time interval specified in the `renewal-period`. Policy expressions are allowed. | Yes | N/A |
 | counter-key | The key to use for the rate limit policy. For each key value, a single counter is used for all scopes at which the policy is configured. Policy expressions are allowed. | Yes | N/A |
 | increment-condition | The Boolean expression specifying if the request should be counted towards the rate (`true`). Policy expressions are allowed. | No | N/A |
 | increment-count | The number by which the counter is increased per request. Policy expressions are allowed. | No | 1 |
 | renewal-period | The length in seconds of the sliding window during which the number of allowed requests should not exceed the value specified in `calls`. Maximum allowed value: 300 seconds. Policy expressions are allowed. | Yes | N/A |
-| retry-after-header-name | The name of a custom response header whose value is the recommended retry interval in seconds after the specified call rate is exceeded. Policy expressions aren't allowed. | No | `Retry-After` |
-| retry-after-variable-name | The name of a policy expression variable that stores the recommended retry interval in seconds after the specified call rate is exceeded. Policy expressions aren't allowed. | No | N/A |
-| remaining-calls-header-name | The name of a response header whose value after each policy execution is the number of remaining calls allowed for the time interval specified in the `renewal-period`. Policy expressions aren't allowed. | No | N/A |
-| remaining-calls-variable-name | The name of a policy expression variable that after each policy execution stores the number of remaining calls allowed for the time interval specified in the `renewal-period`. Policy expressions aren't allowed. | No | N/A |
+| retry-after-header-name | The name of a custom response header whose value is the recommended retry interval in seconds after the specified call rate is exceeded for the key value. Policy expressions aren't allowed. | No | `Retry-After` |
+| retry-after-variable-name | The name of a policy expression variable that stores the recommended retry interval in seconds after the specified call rate is exceeded for the key value. Policy expressions aren't allowed. | No | N/A |
+| remaining-calls-header-name | The name of a response header whose value after each policy execution is the number of remaining calls allowed for the key value in the time interval specified in the `renewal-period`. Policy expressions aren't allowed. | No | N/A |
+| remaining-calls-variable-name | The name of a policy expression variable that after each policy execution stores the number of remaining calls allowed for the key value in the time interval specified in the `renewal-period`. Policy expressions aren't allowed. | No | N/A |
 | total-calls-header-name | The name of a response header whose value is the value specified in `calls`. Policy expressions aren't allowed. | No | N/A |

 ## Usage
@@ -61,18 +61,19 @@ To understand the difference between rate limits and quotas, [see Rate limits an

 ### Usage notes

+* [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 * [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)


 ## Example

-In the following example, the rate limit of 10 calls per 60 seconds is keyed by the caller IP address. After each policy execution, the remaining calls allowed in the time period are stored in the variable `remainingCallsPerIP`.
+In the following example, the rate limit of 10 calls per 60 seconds is keyed by the caller IP address. After each policy execution, the remaining calls allowed for that caller IP address in the time period are stored in the variable `remainingCallsPerIP`.

 ```xml
 <policies>
 <inbound>
 <base />
-<rate-limit-by-key calls="10"
+<rate-limit-by-key calls="10"
 renewal-period="60"
 increment-condition="@(context.Response.StatusCode == 200)"
 counter-key="@(context.Request.IpAddress)"

includes/api-management-rate-limit-key-scope.md

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+author: dlepow
+ms.service: api-management
+ms.topic: include
+ms.date: 05/23/2024
+ms.author: danlep
+---
+API Management uses a single counter for each `counter-key` value that you specify in the policy. The counter is updated at all scopes at which the policy is configured with that key value. If you want to configure separate counters at different scopes (for example, a specific API or product), specify different key values at the different scopes. For example, append a string that identifies the scope to the value of an expression.
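
Reviewer illustration (not part of this commit): the advice in the new include about appending a scope identifier could look roughly like the sketch below. It assumes two hypothetical APIs, here called `orders-api` and `billing-api`, that should each keep their own per-IP counter. The suffix strings and the limit values are illustrative assumptions; only the `rate-limit-by-key` element and its `calls`, `renewal-period`, and `counter-key` attributes are taken from the documented policy.

```xml
<!-- Policy configured at the scope of a hypothetical "orders-api" -->
<!-- The "-orders-api" suffix is an assumed label that makes the key value unique to this scope -->
<rate-limit-by-key calls="10"
    renewal-period="60"
    counter-key="@(context.Request.IpAddress + "-orders-api")" />

<!-- Policy configured at the scope of a hypothetical "billing-api" -->
<!-- A different suffix yields a different key value, and so a separate counter -->
<rate-limit-by-key calls="10"
    renewal-period="60"
    counter-key="@(context.Request.IpAddress + "-billing-api")" />
```

If both scopes instead used the bare `@(context.Request.IpAddress)` expression, the key values would match and, per the include text, both policies would update the same counter.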
