Commit 5c9563d

Merge pull request #285618 from dlepow/aoaif
[APIM] Updates to AOAI policy availability
2 parents: 722ebce + dd0e64e

8 files changed (+18 −6 lines)

articles/api-management/api-management-gateways-overview.md

Lines changed: 6 additions & 3 deletions

@@ -198,6 +198,9 @@ Scale capacity by adding and removing scale [units](upgrade-and-scale.md) in the
 
 ## Related content
 
-- Learn more about [API Management in a Hybrid and multicloud World](https://aka.ms/hybrid-and-multi-cloud-api-management)
-- Learn more about using the [capacity metric](api-management-capacity.md) for scaling decisions
-- Learn about [observability capabilities](observability.md) in API Management
+Learn more about:
+
+- [API Management in a Hybrid and multicloud World](https://aka.ms/hybrid-and-multi-cloud-api-management)
+- [Capacity metric](api-management-capacity.md) for scaling decisions
+- [Observability capabilities](observability.md) in API Management
+- [GenAI gateway capabilities](genai-gateway-capabilities.md) in API Management

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 2 additions & 1 deletion

@@ -13,7 +13,7 @@ ms.collection: ce-skilling-ai-copilot
 
 # Enable semantic caching for Azure OpenAI APIs in Azure API Management
 
-[!INCLUDE [api-management-availability-basicv2-standardv2](../../includes/api-management-availability-basicv2-standardv2.md)]
+[!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
 Enable semantic caching of responses to Azure OpenAI API requests to reduce bandwidth and processing requirements imposed on the backend APIs and lower latency perceived by API consumers. With semantic caching, you can return cached responses for identical prompts and also for prompts that are similar in meaning, even if the text isn't the same. For background, see [Tutorial: Use Azure Cache for Redis as a semantic cache](../azure-cache-for-redis/cache-tutorial-semantic-cache.md).
 
@@ -152,3 +152,4 @@ For example, if the cache was used, the **Output** section includes entries simi
 
 * [Caching policies](api-management-policies.md#caching)
 * [Azure Cache for Redis](../azure-cache-for-redis/cache-overview.md)
+* [GenAI gateway capabilities](genai-gateway-capabilities.md) in Azure API Management
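
To make the enablement described in this article concrete, here's a minimal sketch of the policy pair in a policy definition (the backend ID `embeddings-backend` and the attribute values are illustrative placeholders, not documented defaults):

```xml
<policies>
    <inbound>
        <base />
        <!-- Return a cached completion when a semantically similar prompt is found -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <!-- Cache fresh completions for 60 seconds for future lookups -->
        <azure-openai-semantic-cache-store duration="60" />
        <base />
    </outbound>
</policies>
```

The lookup runs in the inbound section so a cache hit can short-circuit the backend call; the store runs in the outbound section so new completions are cached for subsequent similar prompts.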

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of r
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 ## Policy statement
 
 ```xml
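
For context, a hedged sketch of the `azure-openai-semantic-cache-lookup` statement that the truncated fence above introduces (the attribute values and the `vary-by` expression are illustrative):

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned">
    <!-- Optional: partition the cache, here per subscription -->
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
```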

articles/api-management/azure-openai-semantic-cache-store-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-azure-openai-models](../../includes/api-management-azure-openai-models.md)]
+
 ## Policy statement
 
 ```xml
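
The corresponding store statement is a single element; a sketch with an illustrative cache duration in seconds:

```xml
<!-- Cache the completion for 60 seconds in the configured external cache -->
<azure-openai-semantic-cache-store duration="60" />
```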

articles/api-management/azure-openai-token-limit-policy.md

Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@ By relying on token usage metrics returned from the OpenAI endpoint, the policy
 * This policy can be used multiple times per policy definition.
 * This policy can optionally be configured when adding an API from the Azure OpenAI Service using the portal.
 * Where available when `estimate-prompt-tokens` is set to `false`, values in the usage section of the response from the Azure OpenAI Service API are used to determine token usage.
-* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute.
+* Certain Azure OpenAI endpoints support streaming of responses. When `stream` is set to `true` in the API request to enable streaming, prompt tokens are always estimated, regardless of the value of the `estimate-prompt-tokens` attribute. Completion tokens are also estimated when responses are streamed.
 * [!INCLUDE [api-management-rate-limit-key-scope](../../includes/api-management-rate-limit-key-scope.md)]
 
 ## Example
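
A hedged sketch of a typical configuration that ties the bullets above together (the per-minute value and the variable name are illustrative):

```xml
<!-- Key the counter per subscription; with estimate-prompt-tokens="false",
     token counts come from the usage section of non-streamed responses -->
<azure-openai-token-limit
    counter-key="@(context.Subscription.Id)"
    tokens-per-minute="5000"
    estimate-prompt-tokens="false"
    remaining-tokens-variable-name="remainingTokens" />
```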

articles/api-management/genai-gateway-capabilities.md

Lines changed: 1 addition & 1 deletion

@@ -95,7 +95,7 @@ The backend [circuit breaker](backends.md#circuit-breaker) features dynamic trip
 
 ## Semantic caching policy
 
-Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.md) policies to optimize token consumption by using semantic caching, which stores completions for prompts with similar meaning.
+Configure [Azure OpenAI semantic caching](azure-openai-enable-semantic-caching.md) policies to optimize token use by storing completions for similar prompts.
 
 :::image type="content" source="media/genai-gateway-capabilities/semantic-caching.png" alt-text="Diagram of semantic caching in API Management.":::
 

articles/api-management/llm-semantic-cache-lookup-policy.md

Lines changed: 2 additions & 0 deletions

@@ -26,6 +26,8 @@ Use the `llm-semantic-cache-lookup` policy to perform cache lookup of responses
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-llm-models](../../includes/api-management-llm-models.md)]
+
 ## Policy statement
 
 ```xml
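
The `llm-*` policies mirror their `azure-openai-*` counterparts; a sketch of the lookup statement opened by the truncated fence above (values are illustrative):

```xml
<llm-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />
```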

articles/api-management/llm-semantic-cache-store-policy.md

Lines changed: 2 additions & 0 deletions

@@ -25,6 +25,8 @@ The `llm-semantic-cache-store` policy caches responses to chat completion API an
 
 [!INCLUDE [api-management-policy-generic-alert](../../includes/api-management-policy-generic-alert.md)]
 
+[!INCLUDE [api-management-llm-models](../../includes/api-management-llm-models.md)]
+
 ## Policy statement
 
 ```xml
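
And the matching store statement, again a sketch with an illustrative duration:

```xml
<!-- Cache the chat completion for 60 seconds -->
<llm-semantic-cache-store duration="60" />
```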
