Commit 5f68278

Merge branch 'MicrosoftDocs:main' into liliankasem/func/cancellation
2 parents d229e30 + 74d472d commit 5f68278

325 files changed, +7056 -618 lines changed


.openpublishing.publish.config.json

Lines changed: 0 additions & 6 deletions
@@ -242,12 +242,6 @@
  "branch": "main",
  "branch_mapping": {}
  },
- {
- "path_to_root": "azure-proactive-resiliency-library",
- "url": "https://github.com/Azure/Azure-Proactive-Resiliency-Library",
- "branch": "main",
- "branch_mapping": {}
- },
  {
  "path_to_root": "azure-sdk-for-go-samples",
  "url": "https://github.com/Azure-Samples/azure-sdk-for-go-samples",

articles/active-directory-b2c/json-transformations.md

Lines changed: 1 addition & 1 deletion
@@ -337,7 +337,7 @@ The GetClaimFromJson claims transformation gets a single element from a JSON dat

  ## GetClaimsFromJsonArray

- Get a list of specified elements from Json data. Check out the [Live demo](https://github.com/azure-ad-b2c/unit-tests/tree/main/claims-transformation/json#getclaimsfromjsonarray) of this claims transformation.
+ Get a list of specified elements from JSON data. Check out the [Live demo](https://github.com/azure-ad-b2c/unit-tests/tree/main/claims-transformation/json#getclaimsfromjsonarray) of this claims transformation.

  | Element | TransformationClaimType | Data Type | Notes |
  | ---- | ----------------------- | --------- | ----- |

articles/api-management/api-management-api-import-restrictions.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ author: dlepow
  ms.service: azure-api-management
  ms.custom:
  - build-2024
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 04/24/2024
  ms.author: danlep
  ---

articles/api-management/api-management-subscriptions.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ services: api-management
  author: dlepow

  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 09/03/2024
  ms.author: danlep
  ms.custom: engagement-fy23

articles/api-management/authentication-authorization-overview.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ description: Learn about authentication and authorization features in Azure API
  author: dlepow

  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 11/08/2023
  ms.author: danlep
  ---

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 38 additions & 8 deletions
@@ -17,7 +17,7 @@ ms.author: danlep

  [!INCLUDE [api-management-availability-premium-dev-standard-basic-premiumv2-standardv2-basicv2](../../includes/api-management-availability-premium-dev-standard-basic-premiumv2-standardv2-basicv2.md)]

- The `azure-openai-token-limit` policy prevents Azure OpenAI Service API usage spikes on a per key basis by limiting consumption of language model tokens to a specified number per minute. When the token usage is exceeded, the caller receives a `429 Too Many Requests` response status code.
+ The `azure-openai-token-limit` policy prevents Azure OpenAI Service API usage spikes on a per key basis by limiting consumption of language model tokens to a specified rate (number per minute), a quota over a specified period, or both. When a specified token rate limit is exceeded, the caller receives a `429 Too Many Requests` response status code. When a specified quota is exceeded, the caller receives a `403 Forbidden` response status code.

  By relying on token usage metrics returned from the OpenAI endpoint, the policy can accurately monitor and enforce limits in real time. The policy also enables precalculation of prompt tokens by API Management, minimizing unnecessary requests to the OpenAI backend if the limit is already exceeded.

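Reviewer note on the hunk above: the new wording separates two failure modes, `429 Too Many Requests` for the per-minute rate limit and `403 Forbidden` for the quota. A caller may want to treat them differently. The following is a minimal client-side sketch, not part of the commit. The function names `retry_seconds` and `next_action` are hypothetical, and the sketch assumes the retry header carries a number of seconds under the default `Retry-After` name, as the attribute table in this diff describes. A real gateway can also return `403` for unrelated reasons, which this sketch does not distinguish.

```python
from typing import Mapping, Optional, Tuple


def retry_seconds(headers: Mapping[str, str], name: str = "Retry-After") -> Optional[float]:
    """Return the retry interval in seconds, or None if absent or not numeric.

    Assumption: the header value is a number of seconds. Pass `name` when the
    policy sets a custom `retry-after-header-name`.
    """
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get(name.lower())
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


def next_action(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[float]]:
    """Map a response to a hypothetical client action and suggested wait."""
    wait = retry_seconds(headers)
    if status == 429:
        # Rate limit (tokens-per-minute) exceeded: a short back-off is usually enough.
        return ("retry", wait)
    if status == 403:
        # Quota (token-quota) exceeded: the wait can last until the period resets.
        return ("quota_exhausted", wait)
    return ("proceed", None)
```

Keeping the two outcomes separate lets a caller retry automatically after a `429` while surfacing a `403` to an operator, since a quota window can be as long as a year.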
@@ -30,9 +30,13 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
  ```xml
  <azure-openai-token-limit counter-key="key value"
      tokens-per-minute="number"
+     token-quota="number"
+     token-quota-period="Hourly | Daily | Weekly | Monthly | Yearly"
      estimate-prompt-tokens="true | false"
      retry-after-header-name="custom header name, replaces default 'Retry-After'"
      retry-after-variable-name="policy expression variable name"
+     remaining-quota-tokens-header-name="header name"
+     remaining-quota-tokens-variable-name="policy expression variable name"
      remaining-tokens-header-name="header name"
      remaining-tokens-variable-name="policy expression variable name"
      tokens-consumed-header-name="header name"
@@ -43,12 +47,16 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
  | Attribute | Description | Required | Default |
  | -------------- | ----------------------------------------------------------------------------------------------------- | -------- | ------- |
  | counter-key | The key to use for the token limit policy. For each key value, a single counter is used for all scopes at which the policy is configured. Policy expressions are allowed.| Yes | N/A |
- | tokens-per-minute | The maximum number of tokens consumed by prompt and completion per minute. | Yes | N/A |
+ | tokens-per-minute | The maximum number of tokens consumed by prompt and completion per minute. | Either a rate limit (`tokens-per-minute`), a quota (`token-quota` over a `token-quota-period`), or both must be specified. | N/A |
+ | token-quota | The maximum number of tokens allowed during the time interval specified in the `token-quota-period`. Policy expressions aren't allowed. | Either a rate limit (`tokens-per-minute`), a quota (`token-quota` over a `token-quota-period`), or both must be specified. | N/A |
+ | token-quota-period | The length of the fixed window after which the `token-quota` resets. The value must be one of the following: `Hourly`, `Daily`, `Weekly`, `Monthly`, `Yearly`. The start time of a quota period is calculated using the UTC timestamp truncated to the unit (hour, day, etc.) used for the period. | Either a rate limit (`tokens-per-minute`), a quota (`token-quota` over a `token-quota-period`), or both must be specified. | N/A |
  | estimate-prompt-tokens | Boolean value that determines whether to estimate the number of tokens required for a prompt: <br> - `true`: estimate the number of tokens based on prompt schema in API; may reduce performance. <br> - `false`: don't estimate prompt tokens. <br><br>When set to `false`, the remaining tokens per `counter-key` are calculated using the actual token usage from the response of the model. This could result in prompts being sent to the model that exceed the token limit. In such a case, this will be detected in the response, and all succeeding requests will be blocked by the policy until the token limit frees up again. | Yes | N/A |
- | retry-after-header-name | The name of a custom response header whose value is the recommended retry interval in seconds after the specified `tokens-per-minute` is exceeded. Policy expressions aren't allowed. | No | `Retry-After` |
- | retry-after-variable-name | The name of a variable that stores the recommended retry interval in seconds after the specified `tokens-per-minute` is exceeded. Policy expressions aren't allowed. | No | N/A |
- | remaining-tokens-header-name | The name of a response header whose value after each policy execution is the number of remaining tokens allowed for the time interval. Policy expressions aren't allowed.| No | N/A |
- | remaining-tokens-variable-name | The name of a variable that after each policy execution stores the number of remaining tokens allowed for the time interval. Policy expressions aren't allowed.| No | N/A |
+ | retry-after-header-name | The name of a custom response header whose value is the recommended retry interval in seconds after the specified `tokens-per-minute` or `token-quota` is exceeded. Policy expressions aren't allowed. | No | `Retry-After` |
+ | retry-after-variable-name | The name of a variable that stores the recommended retry interval in seconds after the specified `tokens-per-minute` or `token-quota` is exceeded. Policy expressions aren't allowed. | No | N/A |
+ | remaining-quota-tokens-header-name | The name of a response header whose value after each policy execution is the number of remaining tokens corresponding to `token-quota` allowed for the `token-quota-period`. Policy expressions aren't allowed. | No | N/A |
+ | remaining-quota-tokens-variable-name | The name of a variable that after each policy execution stores the number of remaining tokens corresponding to `token-quota` allowed for the `token-quota-period`. Policy expressions aren't allowed. | No | N/A |
+ | remaining-tokens-header-name | The name of a response header whose value after each policy execution is the number of remaining tokens corresponding to `tokens-per-minute` allowed for the time interval. Policy expressions aren't allowed.| No | N/A |
+ | remaining-tokens-variable-name | The name of a variable that after each policy execution stores the number of remaining tokens corresponding to `tokens-per-minute` allowed for the time interval. Policy expressions aren't allowed.| No | N/A |
  | tokens-consumed-header-name | The name of a response header whose value is the number of tokens consumed by both prompt and completion. The header is added to response only after the response is received from backend. Policy expressions aren't allowed.| No | N/A |
  | tokens-consumed-variable-name | The name of a variable initialized to the estimated number of tokens in the prompt in `backend` section of pipeline if `estimate-prompt-tokens` is `true` and zero otherwise. The variable is updated with the reported count upon receiving the response in `outbound` section.| No | N/A |

@@ -64,11 +72,14 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
  * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
  * Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
  * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
+ * For models that accept image input, image tokens are generally counted by the backend language model and included in limit and quota calculations. However, when streaming is used or `estimate-prompt-tokens` is set to `true`, the policy currently over-counts each image as a maximum count of 1200 tokens.
  * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]

- ## Example
+ ## Examples

- In the following example, the token limit of 5000 per minute is keyed by the caller IP address. The policy doesn't estimate the number of tokens required for a prompt. After each policy execution, the remaining tokens allowed for that caller IP address in the time period are stored in the variable `remainingTokens`.
+ ### Token rate limit
+
+ In the following example, the token rate limit of 5000 per minute is keyed by the caller IP address. The policy doesn't estimate the number of tokens required for a prompt. After each policy execution, the remaining tokens allowed for that caller IP address in the time period are stored in the variable `remainingTokens`.

  ```xml
  <policies>
@@ -84,6 +95,25 @@ In the following example, the token limit of 5000 per minute is keyed by the cal
  </policies>
  ```

+ ### Token quota
+
+ In the following example, the token quota of 100000 is keyed by the subscription ID and resets monthly. After each policy execution, the number of remaining tokens allowed for that subscription ID in the time period is stored in the variable `remainingQuotaTokens`.
+
+ ```xml
+ <policies>
+     <inbound>
+         <base />
+         <azure-openai-token-limit
+             counter-key="@(context.Subscription.Id)"
+             token-quota="100000" token-quota-period="Monthly" remaining-quota-tokens-variable-name="remainingQuotaTokens" />
+     </inbound>
+     <outbound>
+         <base />
+     </outbound>
+ </policies>
+
+ ```
+
  ## Related policies

  * [Rate limiting and quotas](api-management-policies.md#rate-limiting-and-quotas)

articles/api-management/compute-infrastructure.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Azure API Management compute platform
  description: Learn about the compute platform used to host your API Management service instance. Instances in the classic service tiers of API Management are hosted on the stv1 or stv2 compute platform.
  author: dlepow
  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 03/26/2024
  ms.author: danlep
  ms.custom:

articles/api-management/credentials-overview.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: About credential manager in Azure API Management
  description: Learn about using credential manager in Azure API Management to create and manage connections to backend SaaS APIs
  author: dlepow
  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 11/14/2023
  ms.author: danlep
  ms.custom: references_regions

articles/api-management/credentials-process-flow.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Credential manager in Azure API Management - process flows
  description: Learn about the management and runtime process flows for managing OAuth 2.0 connections using credential manager in Azure API Management
  author: dlepow
  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 11/14/2023
  ms.author: danlep
  ---

articles/api-management/developer-portal-overview.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ services: api-management
  author: dlepow

  ms.service: azure-api-management
- ms.topic: conceptual
+ ms.topic: concept-article
  ms.date: 03/29/2024
  ms.author: danlep
  ---
