Merge pull request #296197 from dlepow/remv

prmerger-automator[bot] · web-flow · commit f966be2bc8dc · 2025-03-12T19:50:12.000Z
[APIM] Completion model deprecations
diff --git a/articles/api-management/azure-openai-enable-semantic-caching.md b/articles/api-management/azure-openai-enable-semantic-caching.md
@@ -24,7 +24,7 @@ Enable semantic caching of responses to Azure OpenAI API requests to reduce band
 
 * One or more Azure OpenAI Service APIs must be added to your API Management instance. For more information, see [Add an Azure OpenAI Service API to Azure API Management](azure-openai-api-from-specification.md).
 * The Azure OpenAI service must have deployments for the following:
-    * Chat Completion API (or Completion API) - Deployment used for API consumer calls 
+    * Chat Completion API - Deployment used for API consumer calls 
     * Embeddings API - Deployment used for semantic caching
 * The API Management instance must be configured to use managed identity authentication to the Azure OpenAI APIs. For more information, see [Authenticate and authorize access to Azure OpenAI APIs using Azure API Management ](api-management-authenticate-authorize-azure-openai.md#authenticate-with-managed-identity).
 * An [Azure Cache for Redis Enterprise](../azure-cache-for-redis/quickstart-create-redis-enterprise.md) or [Azure Managed Redis](../azure-cache-for-redis/quickstart-create-managed-redis.md) instance. The **RediSearch** module must be enabled on the Redis cache.
diff --git a/articles/api-management/azure-openai-semantic-cache-lookup-policy.md b/articles/api-management/azure-openai-semantic-cache-lookup-policy.md
@@ -17,7 +17,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API and Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of responses to Azure OpenAI Chat Completion API requests from a configured external cache, based on vector proximity of the prompt to previous requests and a specified similarity score threshold. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Cache responses to Azure OpenAI API requests](azure-openai-semantic-cache-store-policy.md) policy. 
diff --git a/articles/api-management/azure-openai-semantic-cache-store-policy.md b/articles/api-management/azure-openai-semantic-cache-store-policy.md
@@ -17,7 +17,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI Chat Completion API and Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+The `azure-openai-semantic-cache-store` policy caches responses to Azure OpenAI Chat Completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Get cached responses to Azure OpenAI API requests](azure-openai-semantic-cache-lookup-policy.md) policy. 
diff --git a/articles/api-management/llm-semantic-cache-store-policy.md b/articles/api-management/llm-semantic-cache-store-policy.md
@@ -16,7 +16,7 @@ ms.author: danlep
 
 [!INCLUDE [api-management-availability-all-tiers](../../includes/api-management-availability-all-tiers.md)]
 
-The `llm-semantic-cache-store` policy caches responses to chat completion API and completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
+The `llm-semantic-cache-store` policy caches responses to chat completion API requests to a configured external cache. Response caching reduces bandwidth and processing requirements imposed on the backend Azure OpenAI API and lowers latency perceived by API consumers.
 
 > [!NOTE]
 > * This policy must have a corresponding [Get cached responses to large language model API requests](llm-semantic-cache-lookup-policy.md) policy. 
diff --git a/includes/api-management-azure-openai-models.md b/includes/api-management-azure-openai-models.md
@@ -15,10 +15,12 @@ The policy is used with APIs [added to API Management from the Azure OpenAI Serv
 | API type | Supported models |
 |-------|-------------|
 | Chat completion     |  gpt-3.5<br/><br/>gpt-4<br/><br/>gpt-4o<sup>1</sup> |
-| Completion | gpt-3.5-turbo-instruct |
 | Embeddings | text-embedding-3-large<br/><br/> text-embedding-3-small<br/><br/>text-embedding-ada-002 |
 
 <sup>1</sup> The `gpt-4o` model is multimodal (accepts text or image inputs and generates text).
 
+> [!NOTE]
+> Traditional completion APIs are only available with legacy model versions and support is limited.
+
 For more information, see [Azure OpenAI Service models](/azure/ai-services/openai/concepts/models).