Commit c2a1e43

Author: gitName
Commit message: wip
1 parent b4dc2df commit c2a1e43

File tree: 1 file changed, 9 additions, 5 deletions


articles/api-management/genai-gateway-capabilities.md

Lines changed: 9 additions & 5 deletions
@@ -7,7 +7,7 @@ author: dlepow
 ms.service: azure-api-management
 ms.collection: ce-skilling-ai-copilot
 ms.topic: concept-article
-ms.date: 02/05/2025
+ms.date: 04/29/2025
 ms.author: danlep
 ---

@@ -18,7 +18,7 @@ ms.author: danlep
 This article introduces capabilities in Azure API Management to help you manage generative AI APIs, such as those provided by [Azure OpenAI Service](/azure/ai-services/openai/overview). Azure API Management provides a range of policies, metrics, and other features to enhance security, performance, and reliability for the APIs serving your intelligent apps. Collectively, these features are called *AI gateway capabilities* for your generative AI APIs.

 > [!NOTE]
-> * This article focuses on capabilities to manage APIs exposed by Azure OpenAI Service. Many of the AI gateway capabilities apply to other large language model (LLM) APIs, including those available through [Azure AI Model Inference API](/azure/ai-studio/reference/reference-model-inference-api).
+> * Use AI gateway capabilities to manage APIs exposed by Azure OpenAI Service and other large language model (LLM) APIs, including those available through the [Azure AI Model Inference API](/azure/ai-studio/reference/reference-model-inference-api) or OpenAI-compatible models served through third-party inference providers.
 > * AI gateway capabilities are features of API Management's existing API gateway, not a separate API gateway. For more information on API Management, see [Azure API Management overview](api-management-key-concepts.md).

 ## Challenges in managing generative AI APIs
@@ -57,7 +57,7 @@ The following basic example demonstrates how to set a TPM limit of 500 per subsc
 ```

 > [!TIP]
-> To manage and enforce token limits for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-token-limit](llm-token-limit-policy.md) policy.
+> To manage and enforce token limits for other LLM APIs, API Management provides the equivalent [llm-token-limit](llm-token-limit-policy.md) policy.


 ## Emit token metric policy
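For context on the hunk above: the "TPM limit of 500 per subscription" example it references uses the `azure-openai-token-limit` policy in the API's inbound section. A minimal sketch of such a policy (attribute values here are illustrative, not taken from this commit) might look like:

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch: cap each subscription at 500 tokens per minute,
             keyed on the caller's subscription ID. Values are illustrative. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="false"
            remaining-tokens-header-name="x-remaining-tokens" />
    </inbound>
    <outbound>
        <base />
    </outbound>
</policies>
```

The `llm-token-limit` policy linked in the tip follows the same shape for other LLM APIs.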
@@ -79,7 +79,7 @@ For example, the following policy sends metrics to Application Insights split by
 ```

 > [!TIP]
-> To send metrics for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-emit-token-metric](llm-emit-token-metric-policy.md) policy.
+> To send metrics for other LLM APIs, API Management provides the equivalent [llm-emit-token-metric](llm-emit-token-metric-policy.md) policy.

 ## Backend load balancer and circuit breaker

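The emit-token-metric policy discussed in the hunk above records prompt, completion, and total token counts as custom metrics, split by dimensions. A minimal sketch (the namespace and dimension choices are illustrative, not taken from this commit):

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch: emit token-count metrics split by caller dimensions.
             Namespace and dimensions are illustrative. -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

The `llm-emit-token-metric` policy linked in the tip takes the same form for other LLM APIs.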
@@ -102,7 +102,11 @@ Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.m
 In API Management, enable semantic caching by using Azure Redis Enterprise or another [external cache](api-management-howto-cache-external.md) compatible with RediSearch and onboarded to Azure API Management. By using the Azure OpenAI Service Embeddings API, the [azure-openai-semantic-cache-store](azure-openai-semantic-cache-store-policy.md) and [azure-openai-semantic-cache-lookup](azure-openai-semantic-cache-lookup-policy.md) policies store and retrieve semantically similar prompt completions from the cache. This approach ensures completions reuse, resulting in reduced token consumption and improved response performance.

 > [!TIP]
-> To enable semantic caching for LLM APIs available through the Azure AI Model Inference API, API Management provides the equivalent [llm-semantic-cache-store-policy](llm-semantic-cache-store-policy.md) and [llm-semantic-cache-lookup-policy](llm-semantic-cache-lookup-policy.md) policies.
+> To enable semantic caching for other LLM APIs, API Management provides the equivalent [llm-semantic-cache-store](llm-semantic-cache-store-policy.md) and [llm-semantic-cache-lookup](llm-semantic-cache-lookup-policy.md) policies.
+
+
+## Content safety policy
+


 ## Labs and samples
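The semantic caching policies referenced in the hunk above are used as a lookup/store pair: a cache lookup in the inbound section and a cache store in the outbound section. A minimal sketch (the threshold, duration, and backend name are illustrative, not taken from this commit):

```xml
<policies>
    <inbound>
        <base />
        <!-- Sketch: return a cached completion if a semantically similar
             prompt was seen before. Threshold and backend name are illustrative. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Sketch: cache the returned completion for 60 seconds -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

The `embeddings-backend-id` must reference a backend configured for an embeddings deployment, since the lookup compares prompt embeddings rather than raw text.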
