
Commit 28c3edb

Merge pull request #279263 from dlepow/semcache2
[APIM] AOAI policies - tier and gateway support
2 parents efbaf9e + ca5086a commit 28c3edb

7 files changed: 40 additions & 22 deletions

articles/api-management/api-management-gateways-overview.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ Managed and self-hosted gateways support all available [policies](api-management
 
 <sup>1</sup> Configured policies that aren't supported by the self-hosted gateway are skipped during policy execution.<br/>
 <sup>2</sup> The quota by key policy isn't available in the v2 tiers.<br/>
-<sup>3</sup> The rate limit by key and quota by key policies aren't available in the Consumption tier.<br/>
+<sup>3</sup> The rate limit by key, quota by key, and Azure OpenAI token limit policies aren't available in the Consumption tier.<br/>
 <sup>4</sup> [!INCLUDE [api-management-self-hosted-gateway-rate-limit](../../includes/api-management-self-hosted-gateway-rate-limit.md)] [Learn more](how-to-self-hosted-gateway-on-kubernetes-in-production.md#request-throttling)
 

articles/api-management/azure-openai-emit-token-metric-policy.md

Lines changed: 7 additions & 3 deletions
@@ -6,7 +6,7 @@ author: dlepow
 
 ms.service: api-management
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 07/09/2024
 ms.author: danlep
 ms.collection: ce-skilling-ai-copilot
 ms.custom:
@@ -21,6 +21,8 @@ The `azure-openai-emit-token-metric` policy sends metrics to Application Insight
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 
 ## Prerequisites
 

@@ -74,13 +76,15 @@ The `azure-openai-emit-token-metric` policy sends metrics to Application Insight
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) inbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2
+- [**Gateways:**](api-management-gateways-overview.md) classic, v2, consumption, self-hosted
 
 ### Usage notes
 
 * This policy can be used multiple times per policy definition.
-* You can configure at most 10 custom definitions for this policy.
+* You can configure at most 10 custom dimensions for this policy.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available, values in the usage section of the response from the Azure OpenAI Service API are used to determine token metrics.
+* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, token metrics are estimated.
 
 ## Example
 
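For context, a minimal sketch of the policy these usage notes describe; the `openai` namespace and the dimension choices are illustrative placeholders, not part of this change:

```xml
<policies>
    <inbound>
        <base />
        <!-- Send token-count metrics to Application Insights, split by custom
             dimensions (at most 10, per the usage note above). -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="API ID" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```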

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 3 additions & 1 deletion
@@ -6,13 +6,15 @@ ms.service: api-management
 ms.custom:
 - build-2024
 ms.topic: how-to
-ms.date: 05/13/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ms.collection: ce-skilling-ai-copilot
 ---
 
 # Enable semantic caching for Azure OpenAI APIs in Azure API Management
 
+[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
+
 Enable semantic caching of responses to Azure OpenAI API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't the same. For background, see [Tutorial: Use Azure Cache for Redis as a semantic cache](../azure-cache-for-redis/cache-tutorial-semantic-cache.md).
 
 ## Prerequisites
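For context, semantic caching as described in this article pairs a lookup policy in the inbound section with a store policy in the outbound section. A minimal sketch, assuming a backend named `embeddings-backend` has been configured for the embeddings API; the threshold and duration values are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached response when a prompt scores within the
             similarity threshold against previously cached prompts. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion for 60 seconds so similar prompts can reuse it. -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```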

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

articles/api-management/azure-openai-semantic-cache-store-policy.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Azure API Management policy reference - azure-openai-sematic-cache-store
+title: Azure API Management policy reference - azure-openai-semantic-cache-store
 description: Reference for the azure-openai-semantic-cache-store policy available for use in Azure API Management. Provides policy usage, settings, and examples.
 services: api-management
 author: dlepow
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 4 additions & 14 deletions
@@ -9,7 +9,7 @@ ms.collection: ce-skilling-ai-copilot
 ms.custom:
 - build-2024
 ms.topic: article
-ms.date: 05/10/2024
+ms.date: 06/25/2024
 ms.author: danlep
 ---
 

@@ -23,18 +23,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
-## Supported Azure OpenAI Service models
-
-The policy is used with APIs [added to API Management from the Azure OpenAI Service](azure-openai-api-from-specification.md) of the following types:
-
-| API type | Supported models |
-|-------|-------------|
-| Chat completion | gpt-3.5<br/><br/>gpt-4 |
-| Completion | gpt-3.5-turbo-instruct |
-| Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
-
-
-For more information, see [Azure OpenAI Service models](../ai-services/openai/concepts/models.md).
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
 
 ## Policy statement
 

@@ -67,12 +56,13 @@ For more information, see [Azure OpenAI Service models](../ai-services/openai/co
 
 - [**Policy sections:**](./api-management-howto-policies.md#sections) inbound
 - [**Policy scopes:**](./api-management-howto-policies.md#scopes) global, workspace, product, API, operation
-- [**Gateways:**](api-management-gateways-overview.md) classic, v2
+- [**Gateways:**](api-management-gateways-overview.md) classic, v2, self-hosted
 
 ### Usage notes
 
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
+* Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
 * Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 
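For context, a minimal sketch of the policy these usage notes apply to; the counter key, limit, and variable name are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Cap each caller (keyed by client IP here) at 5,000 tokens per minute.
             With estimate-prompt-tokens="false", actual usage reported by the
             Azure OpenAI Service response is counted where available, per the
             usage note above. -->
        <azure-openai-token-limit
            counter-key="@(context.Request.IpAddress)"
            tokens-per-minute="5000"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```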

includes/api-management-azure-openai-models.md

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+---
+author: dlepow
+ms.service: api-management
+ms.custom:
+- build-2024
+ms.topic: include
+ms.date: 07/09/2024
+ms.author: danlep
+---
+
+## Supported Azure OpenAI Service models
+
+The policy is used with APIs [added to API Management from the Azure OpenAI Service](../articles/api-management/azure-openai-api-from-specification.md) of the following types:
+
+| API type | Supported models |
+|-------|-------------|
+| Chat completion | gpt-3.5<br/><br/>gpt-4 |
+| Completion | gpt-3.5-turbo-instruct |
+| Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
+
+For more information, see [Azure OpenAI Service models](../articles/ai-services/openai/concepts/models.md).
+
