
Commit 970588b

Merge pull request #291808 from nimakamoosi/nimak/fix/move-semantic-caching-managed-identity-to-backend
Update documentation for moving Semantic caching embedding auth to backend.
2 parents 568e1f2 + 3f5c8ba commit 970588b

3 files changed (+4, −5 lines)

articles/api-management/azure-openai-enable-semantic-caching.md

Lines changed: 4 additions & 1 deletion
@@ -61,6 +61,10 @@ Configure a [backend](backends.md) resource for the embeddings API deployment wi
 ```
 https://my-aoai.openai.azure.com/openai/deployments/embeddings-deployment/embeddings
 ```
+* **Authorization credentials** - Go to the **Managed Identity** tab.
+* **Client identity** - Select *System assigned identity* or enter a user-assigned managed identity client ID.
+* **Resource ID** - Enter `https://cognitiveservices.azure.com/` for Azure OpenAI Service.
+
 ### Test backend
 
 To test the backend, create an API operation for your Azure OpenAI Service API:
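
For context, once the backend itself carries the managed identity credentials, the test operation only needs to route requests to it. A minimal inbound policy sketch, assuming the illustrative backend ID `embeddings-deployment` from the hunk above:

```xml
<policies>
    <inbound>
        <base />
        <!-- Route this test operation to the embeddings backend; the backend's
             managed identity configuration supplies the token for Azure OpenAI. -->
        <set-backend-service backend-id="embeddings-deployment" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>
```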
@@ -123,7 +127,6 @@ Configure the following policies to enable semantic caching for Azure OpenAI API
 <azure-openai-semantic-cache-lookup
     score-threshold="0.8"
     embeddings-backend-id="embeddings-deployment"
-    embeddings-backend-auth="system-assigned"
     ignore-system-messages="true"
     max-message-count="10">
     <vary-by>@(context.Subscription.Id)</vary-by>
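
With `embeddings-backend-auth` removed, the lookup policy relies entirely on the backend's own credentials. A minimal sketch of the full pairing, with the lookup in the inbound section and a cache store in the outbound section (the 60-second duration is illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a semantically similar cached completion before calling Azure OpenAI. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.8"
            embeddings-backend-id="embeddings-deployment"
            ignore-system-messages="true"
            max-message-count="10">
            <vary-by>@(context.Subscription.Id)</vary-by>
        </azure-openai-semantic-cache-lookup>
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion for subsequent lookups; duration is in seconds (illustrative). -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```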

articles/api-management/azure-openai-semantic-cache-lookup-policy.md

Lines changed: 0 additions & 2 deletions
@@ -34,7 +34,6 @@ Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of r
 <azure-openai-semantic-cache-lookup
     score-threshold="similarity score threshold"
     embeddings-backend-id ="backend entity ID for embeddings API"
-    embeddings-backend-auth ="system-assigned"
     ignore-system-messages="true | false"
     max-message-count="count" >
     <vary-by>"expression to partition caching"</vary-by>
@@ -47,7 +46,6 @@ Use the `azure-openai-semantic-cache-lookup` policy to perform cache lookup of r
 | ----------------- | ------------------------------------------------------ | -------- | ------- |
 | score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. [Learn more](../azure-cache-for-redis/cache-tutorial-semantic-cache.md#change-the-similarity-threshold). | Yes | N/A |
 | embeddings-backend-id | [Backend](backends.md) ID for OpenAI embeddings API call. | Yes | N/A |
-| embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
 | ignore-system-messages | Boolean. If set to `true`, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | false |
 | max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A |
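
After this change, a usage example in the reference reduces to the remaining attributes. A sketch with illustrative values (a matching `azure-openai-semantic-cache-store` policy in the outbound section completes the setup, as sketched earlier):

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.05"
    embeddings-backend-id="embeddings-backend">
    <vary-by>@(context.Subscription.Id)</vary-by>
</azure-openai-semantic-cache-lookup>
```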

articles/api-management/llm-semantic-cache-lookup-policy.md

Lines changed: 0 additions & 2 deletions
@@ -34,7 +34,6 @@ Use the `llm-semantic-cache-lookup` policy to perform cache lookup of responses
 <llm-semantic-cache-lookup
     score-threshold="similarity score threshold"
     embeddings-backend-id ="backend entity ID for embeddings API"
-    embeddings-backend-auth ="system-assigned"
     ignore-system-messages="true | false"
     max-message-count="count" >
     <vary-by>"expression to partition caching"</vary-by>
@@ -47,7 +46,6 @@ Use the `llm-semantic-cache-lookup` policy to perform cache lookup of responses
 | ----------------- | ------------------------------------------------------ | -------- | ------- |
 | score-threshold | Similarity score threshold used to determine whether to return a cached response to a prompt. Value is a decimal between 0.0 and 1.0. [Learn more](../azure-cache-for-redis/cache-tutorial-semantic-cache.md#change-the-similarity-threshold). | Yes | N/A |
 | embeddings-backend-id | [Backend](backends.md) ID for OpenAI embeddings API call. | Yes | N/A |
-| embeddings-backend-auth | Authentication used for Azure OpenAI embeddings API backend. | Yes. Must be set to `system-assigned`. | N/A |
 | ignore-system-messages | Boolean. If set to `true`, removes system messages from a GPT chat completion prompt before assessing cache similarity. | No | false |
 | max-message-count | If specified, number of remaining dialog messages after which caching is skipped. | No | N/A |
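
The `llm-semantic-cache-lookup` policy mirrors the Azure OpenAI variant; a sketch with illustrative values after the attribute removal:

```xml
<llm-semantic-cache-lookup
    score-threshold="0.8"
    embeddings-backend-id="embeddings-backend"
    ignore-system-messages="true"
    max-message-count="10">
    <vary-by>@(context.Subscription.Id)</vary-by>
</llm-semantic-cache-lookup>
```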
