Commit f966be2

Merge pull request #296197 from dlepow/remv
[APIM] Completion model deprecations
2 parents 8315064 + 4202dfd commit f966be2

5 files changed, +7 -5 lines changed

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Enable semantic caching of responses to Azure OpenAI API requests to reduce band
 
 * One or more Azure OpenAI Service APIs must be added to your API Management instance. For more information, see [Add an Azure OpenAI Service API to Azure API Management](azure-openai-api-from-specification.md).
 * The Azure OpenAI service must have deployments for the following:
-* Chat Completion API (or Completion API) - Deployment used for API consumer calls
+* Chat Completion API - Deployment used for API consumer calls
 * Embeddings API - Deployment used for semantic caching
 * The API Management instance must be configured to use managed identity authentication to the Azure OpenAI APIs. For more information, see [Authenticate and authorize access to Azure OpenAI APIs using Azure API Management ](api-management-authenticate-authorize-azure-openai.md#authenticate-with-managed-identity).
 * An [Azure Cache for Redis Enterprise](../azure-cache-for-redis/quickstart-create-redis-enterprise.md) or [Azure Managed Redis](../azure-cache-for-redis/quickstart-create-managed-redis.md) instance. The **RediSearch** module must be enabled on the Redis cache.

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API and Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Cache responses to Azure OpenAI API requests](azure-openai-semantic-cache-store-policy.md) policy.
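
Reviewer note: the lookup described in this file ("vector proximity of the prompt to previous requests and a specified similarity score threshold") can be pictured with the rough sketch below. It is not the policy's implementation. The `SemanticCache` class, the `min_similarity` parameter, and the use of cosine similarity are all assumptions made for illustration. How the real policy scores proximity is not stated in this diff, and real embeddings would come from the Embeddings API deployment rather than hand-written vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory stand-in for the external cache (illustration only)."""

    def __init__(self, min_similarity):
        # Assumption: higher score means more similar; a hit needs score >= min_similarity.
        self.min_similarity = min_similarity
        self.entries = []  # list of (prompt_embedding, response) pairs

    def store(self, embedding, response):
        """Counterpart of the cache-store step: remember a response under its prompt embedding."""
        self.entries.append((list(embedding), response))

    def lookup(self, embedding):
        """Return the response of the closest cached prompt, or None if none is close enough."""
        best_score, best_response = -1.0, None
        for cached_embedding, response in self.entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.min_similarity else None


cache = SemanticCache(min_similarity=0.9)
cache.store([1.0, 0.0, 0.0], "cached answer")
hit = cache.lookup([0.9, 0.1, 0.0])   # nearly the same direction as the stored prompt
miss = cache.lookup([0.0, 1.0, 0.0])  # orthogonal to the stored prompt
```

In this sketch `hit` is the cached response and `miss` is `None`, which also shows why the lookup policy needs a corresponding store policy: without a prior `store`, every lookup misses.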

articles/api-management/azure-openai-semantic-cache-store-policy.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI Chat Completion API and Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI Chat Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Get cached responses to Azure OpenAI API requests](azure-openai-semantic-cache-lookup-policy.md) policy.

articles/api-management/llm-semantic-cache-store-policy.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-The `llm-semantic-cache-store` policy caches responses to chat completion API and completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+The `llm-semantic-cache-store` policy caches responses to chat completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Get cached responses to large language model API requests](llm-semantic-cache-lookup-policy.md) policy.

includes/api-management-azure-openai-models.md

Lines changed: 3 additions & 1 deletion
@@ -15,10 +15,12 @@ The policy is used with APIs [added to API Management from the Azure OpenAI Serv
 | API type | Supported models |
 |-------|-------------|
 | Chat completion | gpt-3.5<br/><br/>gpt-4<br/><br/>gpt-4o<sup>1</sup> |
-| Completion | gpt-3.5-turbo-instruct |
 | Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
 
 <sup>1</sup> The `gpt-4o` model is multimodal (accepts text or image inputs and generates text).
 
+> [!NOTE]
+> Traditional completion APIs are only available with legacy model versions and support is limited.
+
 For more information, see [Azure OpenAI Service models](/azure/ai-services/openai/concepts/models).
 