---
title: GenAI gateway capabilities in Azure API Management
description: Learn about Azure API Management's policies and features to manage generative AI APIs, such as token rate limiting, load balancing, and semantic caching.
services: api-management
author: dlepow
ms.service: api-management
ms.collection: ce-skilling-ai-copilot
ms.topic: concept-article
ms.date: 08/13/2024
ms.author: danlep
---
# Overview of generative AI gateway capabilities in Azure API Management
This article introduces capabilities in Azure API Management to help you manage generative AI APIs, such as those provided by [Azure OpenAI Service](../ai-services/openai/overview.md). Azure API Management provides a range of policies, metrics, and other features to enhance security, performance, and reliability for the APIs serving your intelligent apps. Collectively, these features are called *generative AI (GenAI) gateway capabilities* for your generative AI APIs.
> [!NOTE]
> * This article focuses on capabilities to manage APIs exposed by Azure OpenAI Service. Many of the GenAI gateway capabilities apply to other large language model (LLM) APIs, including those available through [Azure AI Model Inference API](../ai-studio/reference/reference-model-inference-api.md).
> * Generative AI gateway capabilities are features of API Management's existing API gateway, not a separate API gateway. For more information on API Management, see [Azure API Management overview](api-management-key-concepts.md).
## Challenges in managing generative AI APIs
One of the main resources you have in generative AI services is *tokens*. Azure OpenAI Service assigns quota for your model deployments, expressed in tokens-per-minute (TPM), which is then distributed across your model consumers, such as different applications, developer teams, or departments within the company.
Azure makes it easy to connect a single app to Azure OpenAI Service: you can connect directly by using an API key, with a TPM limit configured at the model deployment level. However, as your application portfolio grows, you might have multiple apps calling one or more Azure OpenAI Service endpoints deployed as pay-as-you-go or [Provisioned Throughput Units](../ai-services/openai/concepts/provisioned-throughput.md) (PTU) instances. That comes with certain challenges:
* How is token usage tracked across multiple applications? Can cross-charges be calculated for multiple applications/teams that use Azure OpenAI Service models?
* How do you ensure that a single app doesn't consume the whole TPM quota, leaving other apps with no option to use Azure OpenAI Service models?
* How is the API key securely distributed across multiple applications?
* How is load distributed across multiple Azure OpenAI endpoints? Can you ensure that the committed capacity in PTUs is exhausted before falling back to pay-as-you-go instances?

## Token limit policy

The [Azure OpenAI token limit](azure-openai-token-limit-policy.md) policy helps you manage and enforce limits per API consumer based on the usage of Azure OpenAI Service tokens. The following basic example demonstrates how to set a TPM limit of 500 per subscription key:

```xml
<azure-openai-token-limit counter-key="@(context.Subscription.Id)" tokens-per-minute="500">
</azure-openai-token-limit>
```
> [!TIP]
> To manage and enforce token limits for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-token-limit](llm-token-limit-policy.md) policy.
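
For illustration, here's a minimal sketch of how the equivalent LLM policy might look, assuming that `llm-token-limit` accepts the same `counter-key` and `tokens-per-minute` attributes as its Azure OpenAI counterpart:

```xml
<!-- Sketch only: attributes assumed to mirror azure-openai-token-limit. -->
<llm-token-limit counter-key="@(context.Subscription.Id)" tokens-per-minute="500">
</llm-token-limit>
```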
## Emit token metric policy
The [Azure OpenAI emit token metric](azure-openai-emit-token-metric-policy.md) policy sends metrics to Application Insights about consumption of LLM tokens through Azure OpenAI Service APIs. The policy helps provide an overview of the utilization of Azure OpenAI Service models across multiple applications or API consumers. This policy could be useful for chargeback scenarios, monitoring, and capacity planning.
:::image type="content" source="media/genai-gateway-capabilities/emit-token-metrics.png" alt-text="Diagram of emitting Azure OpenAI Service token metrics using API Management.":::
For example, the following policy sends metrics to Application Insights split by API ID:

```xml
<azure-openai-emit-token-metric namespace="AzureOpenAI">
    <dimension name="API ID" />
</azure-openai-emit-token-metric>
```
> [!TIP]
> To send metrics for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-emit-token-metric](llm-emit-token-metric-policy.md) policy.
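
As a sketch, the LLM variant is assumed here to take the same shape as the Azure OpenAI policy, including the `namespace` attribute and `dimension` elements; the namespace value is illustrative:

```xml
<!-- Sketch only: structure assumed to mirror azure-openai-emit-token-metric. -->
<llm-emit-token-metric namespace="llm-metrics">
    <dimension name="API ID" value="@(context.Api.Id)" />
</llm-emit-token-metric>
```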
## Backend load balancer and circuit breaker
One of the challenges when building intelligent applications is to ensure that the applications are resilient to backend failures and can handle high loads. By configuring your Azure OpenAI Service endpoints using [backends](backends.md) in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to the Azure OpenAI Service backends if they're not responsive.
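
For example, once your Azure OpenAI endpoints are registered as backends in a load-balanced pool, a policy can route all requests to that pool. In this sketch, `openai-backend-pool` is a placeholder backend ID; the pool membership, priority and weight settings, and circuit breaker rules are configured on the backend resources themselves:

```xml
<policies>
    <inbound>
        <base />
        <!-- "openai-backend-pool" is a placeholder ID for a backend pool of
             Azure OpenAI endpoints configured in this API Management instance. -->
        <set-backend-service backend-id="openai-backend-pool" />
    </inbound>
</policies>
```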

## Semantic caching policy

:::image type="content" source="media/genai-gateway-capabilities/semantic-caching.png" alt-text="Diagram of semantic caching in API Management.":::
In API Management, enable semantic caching by using Azure Redis Enterprise or another [external cache](api-management-howto-cache-external.md) compatible with RediSearch and onboarded to Azure API Management. By using the Azure OpenAI Service Embeddings API, the [azure-openai-semantic-cache-store](azure-openai-semantic-cache-store-policy.md) and [azure-openai-semantic-cache-lookup](azure-openai-semantic-cache-lookup-policy.md) policies store and retrieve semantically similar prompt completions from the cache. This approach enables completion reuse, resulting in reduced token consumption and improved response performance.
> [!TIP]
> To enable semantic caching for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-semantic-cache-store](llm-semantic-cache-store-policy.md) and [llm-semantic-cache-lookup](llm-semantic-cache-lookup-policy.md) policies.
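
To show how the pieces fit together, here's a hedged sketch of the Azure OpenAI caching policies in a policy definition; `embeddings-backend` is a placeholder backend ID for an embeddings deployment, and the attribute values are illustrative:

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up semantically similar prompts in the cache.
             "embeddings-backend" is a placeholder backend ID. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Store the returned completion in the cache for 60 seconds. -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```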
## Labs and samples
* [Labs for the GenAI gateway capabilities of Azure API Management](https://github.com/Azure-Samples/AI-Gateway)
* [Azure API Management (APIM) - Azure OpenAI Sample (Node.js)](https://github.com/Azure-Samples/genai-gateway-apim)
* [Python sample code for using Azure OpenAI with API Management](https://github.com/Azure-Samples/openai-apim-lb/blob/main/docs/sample-code.md)
* [AI hub gateway landing zone accelerator](https://github.com/Azure-Samples/ai-hub-gateway-solution-accelerator)
## Architecture and design considerations
## Related content
* [Blog: Introducing GenAI capabilities in Azure API Management](https://techcommunity.microsoft.com/t5/azure-integration-services-blog/introducing-genai-gateway-capabilities-in-azure-api-management/ba-p/4146525)
* [Blog: Integrating Azure Content Safety with API Management for Azure OpenAI Endpoints](https://techcommunity.microsoft.com/t5/fasttrack-for-azure/integrating-azure-content-safety-with-api-management-for-azure/ba-p/4202505)
* [Smart load balancing for OpenAI endpoints and Azure API Management](https://techcommunity.microsoft.com/t5/fasttrack-for-azure/smart-load-balancing-for-openai-endpoints-and-azure-api/ba-p/3991616)
* [Authenticate and authorize access to Azure OpenAI APIs using Azure API Management](api-management-authenticate-authorize-azure-openai.md)