Commit 01063fa

updated diagrams
1 parent fb188ec commit 01063fa

6 files changed: +17 -11 lines changed

articles/api-management/genai-gateway-capabilities.md

Lines changed: 17 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -7,26 +7,30 @@ author: dlepow
77
ms.service: api-management
88
ms.collection: ce-skilling-ai-copilot
99
ms.topic: concept-article
10-
ms.date: 07/16/2024
10+
ms.date: 07/24/2024
1111
ms.author: danlep
1212
---
1313

1414
# Overview of generative AI gateway capabilities in Azure API Management
1515

1616
[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
1717

18-
While generative AI services and their APIs provide powerful capabilities for understanding, interpreting, and generating human-like text and images, they can also impose significant management, monitoring, and security challenges for app developers. This article provides an introduction to how Azure API Management can help you manage generative AI APIs, such as those provided by [Azure OpenAI Service](../ai-services/openai/overview.md). Azure API Management provides a range of capabilities including policies, metrics, and other features to enhance security, performance, and reliability of APIs for your intelligent apps. Collectively, this set of features enables API Management to be a *generative AI (GenAI) gateway* for your applications.
18+
This article introduces capabilities in Azure API Management that help you manage generative AI APIs, such as those provided by [Azure OpenAI Service](../ai-services/openai/overview.md). Azure API Management provides a range of policies, metrics, and other features to enhance security, performance, and reliability for the APIs serving your intelligent apps. Collectively, these features are called *generative AI (GenAI) gateway capabilities*.
19+
20+
> [!NOTE]
21+
> * This article focuses on capabilities to manage APIs exposed by Azure OpenAI Service. Many of the GenAI gateway capabilities can be applied to APIs for other generative AI services.
22+
> * Generative AI gateway capabilities are features of API Management's existing API gateway, not a separate API gateway. For more information on API Management, see [Azure API Management overview](../api-management/overview.md).
1923
2024
## Challenges in managing generative AI APIs
2125

22-
One of the main resources you have in Azure OpenAI Service is tokens. Azure OpenAI Service assigns quota for your model deployments expressed in tokens-per-minute (TPM) which is then distributed across your model consumers - for example, different applications, developer teams, departments within the company, etc.
26+
One of the main resources you have in Azure OpenAI Service is *tokens*. Azure OpenAI Service assigns quota for your model deployments expressed in tokens-per-minute (TPM), which is then distributed across your model consumers, such as different applications, developer teams, or departments within the company.
2327

24-
Azure makes it easy to connect a single app to Azure OpenAI Service: you can connect directly using an API key with a TPM limit configured directly on the model deployment level. However, when you start growing your application portfolio, you are presented with multiple apps calling single or even multiple Azure OpenAI Service endpoints deployed as pay-as-you-go or [Provisioned Throughput Units](../ai-services/openai/concepts/provisioned-throughput.md) (PTU) instances. That comes with certain challenges:
28+
Azure makes it easy to connect a single app to Azure OpenAI Service: you can connect directly using an API key with a TPM limit configured directly on the model deployment level. However, when you start growing your application portfolio, you're presented with multiple apps calling single or even multiple Azure OpenAI Service endpoints deployed as pay-as-you-go or [Provisioned Throughput Units](../ai-services/openai/concepts/provisioned-throughput.md) (PTU) instances. That comes with certain challenges:
2529

2630
* How is token usage tracked across multiple applications? Can cross charges be calculated for multiple applications/teams that use Azure OpenAI Service models?
2731
* How do you ensure that a single app doesn't consume the whole TPM quota, leaving other apps with no option to use Azure OpenAI Service models?
2832
* How is the API key securely distributed across multiple applications?
29-
* How is load distributed across multiple Azure OpenAI endpoints? Can you ensure that the committed capacity in PTUs is used first before falling back to pay-as-you-go instances?
33+
* How is load distributed across multiple Azure OpenAI endpoints? Can you ensure that the committed capacity in PTUs is exhausted before falling back to pay-as-you-go instances?
3034

3135
The rest of this article describes how Azure API Management can help you address these challenges.
3236

@@ -42,7 +46,7 @@ Configure the [Azure OpenAI token limit policy](azure-openai-token-limit-policy.
4246

4347
:::image type="content" source="media/genai-gateway-capabilities/token-rate-limiting.png" alt-text="Diagram of limiting Azure OpenAI Service tokens in API Management.":::
4448

45-
This policy provides flexibility to assign token-based limits on any counter key, such as subscription key, IP address, or an arbitrary key defined through a policy expression. The policy also enables precalculation of prompt tokens on the Azure API Management side, minimizing unnecessary requests to the Azure OpenAI Service backend if the prompt already exceeds the limit.
49+
This policy provides flexibility to assign token-based limits on any counter key, such as subscription key, originating IP address, or an arbitrary key defined through a policy expression. The policy also enables precalculation of prompt tokens on the Azure API Management side, minimizing unnecessary requests to the Azure OpenAI Service backend if the prompt already exceeds the limit.
4650

4751
The following basic example demonstrates how to set a TPM limit of 500 per subscription key:
4852

@@ -70,29 +74,31 @@ For example, the following policy sends metrics to Application Insights split by
7074
</azure-openai-emit-token-metric>
7175
```
7276

73-
## Load balancer and circuit breaker
77+
## Backend load balancer and circuit breaker
7478

75-
One of the challenges when building intelligent applications is to ensure that the applications' backends are resilient to backend failures and can handle high loads. By configuring your Azure OpenAI Service endpoints using [backends](backends.md) in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to the Azure OpenAI Service backends if they're not responsive.
79+
One of the challenges when building intelligent applications is to ensure that the applications are resilient to backend failures and can handle high loads. By configuring your Azure OpenAI Service endpoints using [backends](backends.md) in Azure API Management, you can balance the load across them. You can also define circuit breaker rules to stop forwarding requests to the Azure OpenAI Service backends if they're not responsive.
7680

7781
The backend [load balancer](backends.md#backends-in-api-management) supports round-robin, weighted, and priority-based load balancing, giving you flexibility to define a load distribution strategy that meets your specific requirements. For example, define priorities within the load balancer configuration to ensure optimal utilization of specific Azure OpenAI endpoints, particularly those purchased as PTUs.
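
The priority-based strategy described above can be illustrated with a minimal Python sketch. It is not how API Management implements load balancing. The `Backend` class, the backend names, and the conventions that a lower `priority` number is tried first and that weights are applied by random choice are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Backend:
    name: str          # hypothetical label, e.g. a PTU or pay-as-you-go endpoint
    priority: int      # assumption: a lower number is tried first
    weight: int = 1    # relative share within the same priority group
    healthy: bool = True


def pick_backend(backends, rng=random):
    """Return a healthy backend from the lowest-numbered priority group."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backend available")
    best = min(b.priority for b in healthy)
    group = [b for b in healthy if b.priority == best]
    # Weighted random choice among backends that share the top priority.
    return rng.choices(group, weights=[b.weight for b in group], k=1)[0]


# Hypothetical pool: one PTU endpoint preferred over two pay-as-you-go endpoints.
pool = [
    Backend("ptu-east", priority=1),
    Backend("paygo-west", priority=2, weight=3),
    Backend("paygo-north", priority=2, weight=1),
]
```

While `ptu-east` is healthy, `pick_backend(pool)` always returns it. Once it is marked unhealthy, traffic falls back to the two pay-as-you-go entries in roughly a 3:1 ratio.
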
7882

7983
:::image type="content" source="media/genai-gateway-capabilities/backend-load-balancing.png" alt-text="Diagram of using backend load balancing in API Management.":::
8084

81-
The backend [circuit breaker](backends.md#circuit-breaker) features dynamic trip duration, applying values from the Retry-After header provided by the backend. This ensures precise and timely recovery of the backends, maximizing the utilization of your priority backends to their fullest.
85+
The backend [circuit breaker](backends.md#circuit-breaker) features dynamic trip duration, applying values from the Retry-After header provided by the backend. This ensures precise and timely recovery of the backends, maximizing the utilization of your priority backends.
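
The idea of a dynamic trip duration can be sketched in a few lines of Python. This is a simplified illustration, not the API Management implementation: the `CircuitBreaker` class is hypothetical, it trips on a single 429 or 5xx response rather than on a configurable failure rule, it reads `Retry-After` only in its delay-in-seconds form, and it expects the header under that exact key.

```python
import time


class CircuitBreaker:
    """Stops forwarding requests until the backend's Retry-After delay has elapsed."""

    def __init__(self, default_trip_seconds=10.0, clock=time.monotonic):
        self.default_trip_seconds = default_trip_seconds
        self._clock = clock
        self._open_until = 0.0  # circuit is open (blocking) before this instant

    def allow_request(self):
        return self._clock() >= self._open_until

    def record_response(self, status, headers):
        # Assumption for this sketch: one throttled or failed response trips the circuit.
        if status == 429 or status >= 500:
            self._open_until = self._clock() + self._trip_duration(headers)

    def _trip_duration(self, headers):
        value = headers.get("Retry-After", "")
        try:
            # Use the backend's own hint when it is a number of seconds.
            return max(0.0, float(value))
        except ValueError:
            # Missing header or HTTP-date form: fall back to a fixed duration.
            return self.default_trip_seconds
```

Because the trip duration follows the backend's own hint, requests resume as soon as the backend reports it can accept them, rather than after a fixed, possibly too long or too short, interval.
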
86+
87+
:::image type="content" source="media/genai-gateway-capabilities/backend-circuit-breaker.png" alt-text="Diagram of using backend circuit breaker in API Management.":::
8288

8389
## Semantic caching policy
8490

8591
Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.md) policies to optimize token consumption by using semantic caching, which stores completions for prompts with similar meaning.
8692

8793
:::image type="content" source="media/genai-gateway-capabilities/semantic-caching.png" alt-text="Diagram of semantic caching in API Management.":::
8894

89-
In API Management, enable semantic caching by using Azure Redis Enterprise or another external cache compatible with RediSearch and onboarded to Azure API Management. By leveraging the Azure OpenAI Service Embeddings API, this policy identifies semantically similar prompts and stores their respective completions in the cache. This approach ensures completions reuse, resulting in reduced token consumption and improved response performance.
95+
In API Management, enable semantic caching by using Azure Redis Enterprise or another [external cache](api-management-howto-cache-external.md) compatible with RediSearch and onboarded to Azure API Management. By using the Azure OpenAI Service Embeddings API, this policy identifies semantically similar prompts and stores their respective completions in the cache. This approach ensures completions reuse, resulting in reduced token consumption and improved response performance.
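
As a rough illustration of the lookup described above, the following Python sketch keeps completions in memory and matches prompts by cosine similarity of their embeddings. It is a sketch under stated assumptions, not the policy's implementation: `SemanticCache`, the `0.9` threshold, and `toy_embed` (a letter-frequency stand-in for a real embeddings API) are all hypothetical, and a real deployment would use an external cache with vector search instead of a Python list.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed, threshold=0.9):
        self._embed = embed          # function mapping a prompt to a vector
        self._threshold = threshold  # minimum similarity that counts as a hit
        self._entries = []           # list of (embedding, completion) pairs

    def lookup(self, prompt):
        """Return the completion of the most similar stored prompt, or None."""
        query = self._embed(prompt)
        best_score, best_completion = 0.0, None
        for vector, completion in self._entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_completion = score, completion
        return best_completion if best_score >= self._threshold else None

    def store(self, prompt, completion):
        self._entries.append((self._embed(prompt), completion))


def toy_embed(text):
    """Stand-in for an embeddings API: a letter-frequency vector (illustration only)."""
    text = text.lower()
    return [text.count(ch) for ch in "abcdefghijklmnopqrstuvwxyz"]
```

A cache hit returns the stored completion without calling the model, which is where the token savings come from. The threshold trades hit rate against the risk of returning a completion for a prompt that is not actually equivalent.
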
9096

9197

9298
## Labs and samples
9399

94100
* [Labs for the GenAI gateway capabilities of Azure API Management](https://github.com/Azure-Samples/AI-Gateway)
95-
* [Azure API Management (APIM) - Azure Open AI Sample (Node.js)](https://github.com/Azure-Samples/genai-gateway-apim)
101+
* [Azure API Management (APIM) - Azure OpenAI Sample (Node.js)](https://github.com/Azure-Samples/genai-gateway-apim)
96102
* [Python sample code for using Azure OpenAI with API Management](https://github.com/Azure-Samples/openai-apim-lb/blob/main/docs/sample-code.md)
97103

98104
## Architecture and design considerations
Five binary files changed (sizes as shown; previews not loaded): 25.3 KB, -454 KB, -278 KB, -358 KB, -350 KB
