
Commit 5c48f7e

Merge pull request #279657 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/azure-docs (branch main)
2 parents 10f5197 + d9763e4 commit 5c48f7e

4 files changed, +10 -9 lines changed

articles/aks/gpu-cluster.md

Lines changed: 4 additions & 3 deletions
```diff
@@ -177,9 +177,10 @@ To use Azure Linux, you specify the OS SKU by setting `os-sku` to `AzureLinux` d
         name: nvidia-device-plugin-ds
     spec:
       tolerations:
-      - key: nvidia.com/gpu
-        operator: Exists
-        effect: NoSchedule
+      - key: "sku"
+        operator: "Equal"
+        value: "gpu"
+        effect: "NoSchedule"
       # Mark this pod as a critical add-on; when enabled, the critical add-on
       # scheduler reserves resources for critical add-on pods so that they can
       # be rescheduled after a failure.
```
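The new toleration lets the device plugin pods land on nodes tainted `sku=gpu:NoSchedule` instead of `nvidia.com/gpu`. As a rough illustration of why the old toleration stops matching, Kubernetes toleration matching with the `Equal` and `Exists` operators can be sketched like this (a simplified stand-in, not the actual scheduler code):

```python
# Simplified sketch of Kubernetes toleration/taint matching.
# This mimics the documented Equal/Exists rules; it is NOT the real scheduler.

def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a pod toleration matches a node taint."""
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # Exists ignores the taint's value entirely
    return toleration.get("value") == taint["value"]

# Node pool tainted with sku=gpu:NoSchedule, as the surrounding doc assumes
taint = {"key": "sku", "value": "gpu", "effect": "NoSchedule"}

new_toleration = {"key": "sku", "operator": "Equal",
                  "value": "gpu", "effect": "NoSchedule"}
old_toleration = {"key": "nvidia.com/gpu", "operator": "Exists",
                  "effect": "NoSchedule"}

print(tolerates(new_toleration, taint))  # True: key, value, and effect match
print(tolerates(old_toleration, taint))  # False: key no longer matches
```

The old toleration fails on the key comparison alone, which is why the diff replaces it rather than adding alongside it.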

articles/aks/private-clusters.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -58,7 +58,7 @@ Create a private cluster with default basic networking using the [`az aks create
 ```azurecli-interactive
 az aks create \
     --name <private-cluster-name> \
-    --resource-group-name <private-cluster-resource-group> \
+    --resource-group <private-cluster-resource-group> \
     --load-balancer-sku standard \
     --enable-private-cluster \
    --generate-ssh-keys
````

articles/azure-cache-for-redis/cache-troubleshoot-timeouts.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -145,11 +145,11 @@ There are several changes you can make to mitigate high server load:
 
 - Investigate what is causing high server load such as [long-running commands](#long-running-commands), noted in this article, because of high memory pressure.
 - [Scale](cache-how-to-scale.md) out to more shards to distribute load across multiple Redis processes or scale up to a larger cache size with more CPU cores. For more information, see [Azure Cache for Redis planning FAQs](./cache-planning-faq.yml).
-- If your production workload on a _C1_ cache is negatively affected by extra latency from virus scanning, you can reduce the effect by to pay for a higher tier offering with multiple CPU cores, such as _C2_.
+- If your production workload on a _C1_ cache is negatively affected by extra latency from some internal defender scan runs, you can reduce the effect by scaling to a higher tier offering with multiple CPU cores, such as _C2_.
 
 #### Spikes in server load
 
-On _C0_ and _C1_ caches, you might see short spikes in server load not caused by an increase in requests a couple times a day while virus scanning is running on the VMs. You see higher latency for requests while virus scanning is happening on these tiers. Caches on the _C0_ and _C1_ tiers only have a single core to multitask, dividing the work of serving virus scanning and Redis requests.
+On _C0_ and _C1_ caches, you might see short spikes in server load not caused by an increase in requests a couple times a day while internal defender scanning is running on the VMs. You see higher latency for requests while internal defender scans happen on these tiers. Caches on the _C0_ and _C1_ tiers only have a single core to multitask, dividing the work of serving internal defender scanning and Redis requests.
 
 ### High memory usage
```

articles/cosmos-db/ai-agents.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -396,7 +396,7 @@ from langchain_core.runnables.history import RunnableWithMessageHistory
 from langchain.agents import AgentExecutor, create_openai_tools_agent
 from service import TravelAgentTools as agent_tools
 
-load_dotenv(override=True)
+load_dotenv(override=False)
 
 
 chat : ChatOpenAI | None=None
@@ -438,7 +438,7 @@ def LLM_init():
 LLM_init()
 ```
 
-The **init.py** file commences by initiating the loading of environment variables from a **.env** file utilizing the ```load_dotenv(override=True)``` method. Then, a global variable named ```agent_with_chat_history``` is instantiated for the agent, intended for use by our **TravelAgent.py**. The ```LLM_init()``` method is invoked during module initialization to configure our AI agent for conversation via the API web layer. The OpenAI Chat object is instantiated using the GPT-3.5 model, incorporating specific parameters such as model name and temperature. The chat object, tools list, and prompt template are combined to generate an ```AgentExecutor```, which operates as our AI Travel Agent. Lastly, the agent with history, ```agent_with_chat_history```, is established using ```RunnableWithMessageHistory``` with chat history (MongoDBChatMessageHistory), enabling it to maintain a complete conversation history via Azure Cosmos DB.
+The **init.py** file commences by initiating the loading of environment variables from a **.env** file utilizing the ```load_dotenv(override=False)``` method. Then, a global variable named ```agent_with_chat_history``` is instantiated for the agent, intended for use by our **TravelAgent.py**. The ```LLM_init()``` method is invoked during module initialization to configure our AI agent for conversation via the API web layer. The OpenAI Chat object is instantiated using the GPT-3.5 model, incorporating specific parameters such as model name and temperature. The chat object, tools list, and prompt template are combined to generate an ```AgentExecutor```, which operates as our AI Travel Agent. Lastly, the agent with history, ```agent_with_chat_history```, is established using ```RunnableWithMessageHistory``` with chat history (MongoDBChatMessageHistory), enabling it to maintain a complete conversation history via Azure Cosmos DB.
 
 #### Prompt
 
@@ -507,7 +507,7 @@ from model.prompt import PromptResponse
 import time
 from dotenv import load_dotenv
 
-load_dotenv(override=True)
+load_dotenv(override=False)
 
 
 def agent_chat(input:str, session_id:str)->str:
````
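The change above flips `override` from `True` to `False` in `load_dotenv`. Per python-dotenv's documented behavior, `override=False` means variables already set in the process environment win over values in the `.env` file, while `override=True` lets the file overwrite them. A minimal stand-in (not python-dotenv itself, and the variable names are illustrative) that shows the semantics:

```python
import os

# Minimal stand-in for python-dotenv's `override` flag; NOT the real library.
# Variable names below are hypothetical examples.
def apply_dotenv(env_values: dict, override: bool) -> None:
    """Merge .env-style values into os.environ, honoring `override`."""
    for key, value in env_values.items():
        if override or key not in os.environ:
            os.environ[key] = value

os.environ["OPENAI_API_KEY"] = "from-shell"  # already set before load
dotenv_file = {"OPENAI_API_KEY": "from-dotenv", "MODEL": "gpt-3.5-turbo"}

apply_dotenv(dotenv_file, override=False)
print(os.environ["OPENAI_API_KEY"])  # from-shell: existing value is kept
print(os.environ["MODEL"])           # gpt-3.5-turbo: missing keys still load
```

With `override=False`, a key set in the deployment environment (for example, an API key injected by the hosting platform) can no longer be silently replaced by a stale `.env` entry.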
