
Commit afdd998 — Merge pull request #269158 from MicrosoftDocs/main

Publish to live, Friday 4 AM PST, 3/15

2 parents abb60ab + f6dc5ac

63 files changed: +745 −343 lines changed


articles/ai-services/openai/concepts/use-your-data.md — 1 addition, 33 deletions

@@ -383,40 +383,8 @@ You can send a streaming request using the `stream` parameter, allowing data to
 
 #### Conversation history for better results
 
-When you chat with a model, providing a history of the chat will help the model return higher quality results.
+When you chat with a model, providing a history of the chat will help the model return higher quality results. You don't need to include the `context` property of the assistant messages in your API requests for better response quality. See [the API reference documentation](../references/on-your-data.md#examples) for examples.
 
-```json
-{
-    "dataSources": [
-        {
-            "type": "AzureCognitiveSearch",
-            "parameters": {
-                "endpoint": "'$AZURE_AI_SEARCH_ENDPOINT'",
-                "key": "'$AZURE_AI_SEARCH_API_KEY'",
-                "indexName": "'$AZURE_AI_SEARCH_INDEX'"
-            }
-        }
-    ],
-    "messages": [
-        {
-            "role": "user",
-            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
-        },
-        {
-            "role": "tool",
-            "content": "{\"citations\": [{\"content\": \"title: Azure AI services and Machine Learning\\ntitleSuffix: Azure AI services\\ndescription: Learn where Azure AI services fits in with other Azure offerings for machine learning.\\nAzure AI services and machine learning\\nAzure AI services provides machine learning capabilities to solve general problems such as...\\n \"articles\\\\cognitive-services\\\\cognitive-services-and-machine-learning.md\", \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=1018. Scores=0.32200050354003906 and 1.2880020141601562.Org Highlight count=115.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"What are the differences between Azure Machine Learning and Azure AI services?\\\"]\"}"
-        },
-        {
-            "role": "assistant",
-            "content": " \nAzure Machine Learning is a product and service tailored for data scientists to build, train, and deploy machine learning models [doc1]..."
-        },
-        {
-            "role": "user",
-            "content": "How do I use Azure machine learning?"
-        }
-    ]
-}
-```
 
 ## Token usage estimation for Azure OpenAI On Your Data
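The guidance above — send the chat history, but omit the assistant messages' `context` property — can be sketched as request-payload construction. This is a hedged illustration only: `build_payload` is a hypothetical helper (not part of any SDK), the endpoint, key, and index values are placeholders, and the `data_sources` field names follow the newer On Your Data request shape rather than the deleted `dataSources` example.

```python
# Sketch: assemble an On Your Data-style chat request that forwards prior
# turns but drops any stored assistant `context` metadata, per the updated
# guidance. All names and values here are illustrative placeholders.

def build_payload(history, new_question, search_endpoint, search_key, index_name):
    """Build a request body from prior turns plus a new user question."""
    messages = []
    for turn in history:
        # Forward only role/content; the `context` property is not needed.
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": new_question})
    return {
        "data_sources": [{
            "type": "azure_search",  # assumed data source type for this sketch
            "parameters": {
                "endpoint": search_endpoint,
                "authentication": {"type": "api_key", "key": search_key},
                "index_name": index_name,
            },
        }],
        "messages": messages,
    }

history = [
    {"role": "user", "content": "Who is DRI?"},
    {"role": "assistant",
     "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
     # Stored context from an earlier response is intentionally not forwarded.
     "context": {"intent": "..."}},
]
payload = build_payload(history, "Opinion mining team",
                        "https://example.search.windows.net", "<key>", "my-index")
```

The key point is structural: every forwarded message carries only `role` and `content`.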

articles/ai-services/openai/references/on-your-data.md — 9 additions, 16 deletions

@@ -28,7 +28,7 @@ POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=
 ```
 
 **Supported versions**
-* `2024-02-15-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json)
+* `2024-02-15-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json).
 * `2024-02-01` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01).
 
 > [!NOTE]
@@ -48,7 +48,6 @@ The request body inherits the same schema of chat completions API request. This
 
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `messages` | [ChatMessage](#chat-message)[] | True | The array of messages to generate chat completions for, in the chat format. The [request chat message](#chat-message) has a `context` property, which is added for Azure OpenAI On Your Data.|
 | `data_sources` | [DataSource](#data-source)[] | True | The configuration entries for Azure OpenAI On Your Data. There must be exactly one element in the array. If `data_sources` is not provided, the service uses chat completions model directly, and does not use Azure OpenAI On Your Data.|
 
 ## Response body
@@ -57,17 +56,17 @@ The response body inherits the same schema of chat completions API response. The
 
 ## Chat message
 
-In both request and response, when the chat message `role` is `assistant`, the chat message schema inherits from the chat completions assistant chat message, and is extended with the property `context`.
+The response assistant message schema inherits from the chat completions assistant [chat message](../reference.md#chatmessage), and is extended with the property `context`.
 
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `context` | [Context](#context) | False | Represents the incremental steps performed by the Azure OpenAI On Your Data while processing the request, including the detected search intent and the retrieved documents. |
+| `context` | [Context](#context) | False | Represents the incremental steps performed by the Azure OpenAI On Your Data while processing the request, including the retrieved documents. |
 
 ## Context
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `citations` | [Citation](#citation)[] | False | The data source retrieval result, used to generate the assistant message in the response.|
-| `intent` | string | False | The detected intent from the chat history, used to pass to the next turn to carry over the context.|
+| `citations` | [Citation](#citation)[] | False | The data source retrieval result, used to generate the assistant message in the response. Clients can render references from the citations. |
+| `intent` | string | False | The detected intent from the chat history. Passing back the previous intent is no longer needed. Ignore this property. |
 
 ## Citation
 
@@ -91,7 +90,7 @@ This list shows the supported data sources.
 
 ## Examples
 
-This example shows how to pass context with conversation history for better results.
+This example shows how to pass conversation history for better results.
 
 Prerequisites:
 * Configure the role assignments from Azure OpenAI system assigned managed identity to Azure search service. Required roles: `Search Index Data Reader`, `Search Service Contributor`.
@@ -137,10 +136,7 @@ completion = client.chat.completions.create(
         },
         {
             "role": "assistant",
-            "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
-            "context": {
-                "intent": "[\"Who is DRI?\", \"What is the meaning of DRI?\", \"Define DRI\"]"
-            }
+            "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?"
         },
         {
             "role": "user",
@@ -191,14 +187,11 @@ az rest --method POST \
     "messages": [
        {
            "role": "user",
-           "content": "Who is DRI?",
+           "content": "Who is DRI?"
        },
        {
            "role": "assistant",
-           "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
-           "context": {
-               "intent": "[\"Who is DRI?\", \"What is the meaning of DRI?\", \"Define DRI\"]"
-           }
+           "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?"
        },
        {
           "role": "user",
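The edits above remove `context`/`intent` round-tripping from assistant messages in both the Python and `az rest` examples. For clients that stored full responses in their chat history, a minimal client-side cleanup could look like this — `strip_assistant_context` is a hypothetical helper sketched here, not part of any SDK:

```python
def strip_assistant_context(messages):
    """Return a copy of the chat history with the `context` property removed
    from assistant messages, since passing the previous intent back to the
    service is no longer needed."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and "context" in msg:
            # Shallow-copy the message without its `context` key.
            msg = {k: v for k, v in msg.items() if k != "context"}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Who is DRI?"},
    {"role": "assistant",
     "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
     "context": {"intent": '["Who is DRI?", "What is the meaning of DRI?", "Define DRI"]'}},
    {"role": "user", "content": "Opinion mining team"},
]
cleaned = strip_assistant_context(history)
```

Only assistant messages are touched; user and tool messages pass through unchanged.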

articles/aks/cluster-autoscaler-overview.md — 9 additions, 6 deletions

@@ -31,14 +31,16 @@ It's a common practice to enable cluster autoscaler for nodes and either the Ver
 * To **effectively run workloads concurrently on both Spot and Fixed node pools**, consider using [*priority expanders*](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders). This approach allows you to schedule pods based on the priority of the node pool.
 * Exercise caution when **assigning CPU/Memory requests on pods**. The cluster autoscaler scales up based on pending pods rather than CPU/Memory pressure on nodes.
 * For **clusters concurrently hosting both long-running workloads, like web apps, and short/bursty job workloads**, we recommend separating them into distinct node pools with [Affinity Rules](./operator-best-practices-advanced-scheduler.md#node-affinity)/[expanders](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders) or using [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) to help prevent unnecessary node drain or scale down operations.
-* We **don't recommend making direct changes to nodes in autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, and system pods running on them.
+* In an autoscaler-enabled node pool, **scale down nodes by removing workloads instead of manually reducing the node count**. Manually reducing the count can be problematic if the node pool is already at maximum capacity or if active workloads are running on the nodes, and it can cause unexpected cluster autoscaler behavior.
 * Nodes don't scale up if pods have a PriorityClass value below -10. Priority -10 is reserved for [overprovisioning pods](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler). For more information, see [Using the cluster autoscaler with Pod Priority and Preemption](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption).
 * **Don't combine other node autoscaling mechanisms**, such as Virtual Machine Scale Set autoscalers, with the cluster autoscaler.
 * The cluster autoscaler **might be unable to scale down if pods can't move, such as in the following situations**:
   * A directly created pod not backed by a controller object, such as a Deployment or ReplicaSet.
   * A pod disruption budget (PDB) that's too restrictive and doesn't allow the number of pods to fall below a certain threshold.
   * A pod uses node selectors or anti-affinity that can't be honored if scheduled on a different node.
   For more information, see [What types of pods can prevent the cluster autoscaler from removing a node?](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node).
+
+> [!IMPORTANT]
+> **Don't make changes to individual nodes within autoscaled node pools.** All nodes in the same node group should have uniform capacity, labels, taints, and system pods running on them.
 
 ## Cluster autoscaler profile
 
@@ -52,21 +54,22 @@ It's important to note that the cluster autoscaler profile settings are cluster-
 
 #### Example 1: Optimizing for performance
 
-For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. It also helps mitigate the risk of swift scale down operations on underutilized nodes, enhancing the pod scheduling efficiency.
+For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. They also help mitigate the risk of swift scale down operations on underutilized nodes, enhancing pod scheduling efficiency. Also increase `ok-total-unready-count` and `max-total-unready-percentage`.
 
-For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations.
+For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations. See the [profile for bursty workloads](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-bursty-workloads).
 
 #### Example 2: Optimizing for cost
 
-If you want a cost-optimized profile, we recommend setting the following parameter configurations:
+If you want a [cost-optimized profile](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-aggressive-scale-down), we recommend setting the following parameter configurations:
 
 * Reduce `scale-down-unneeded-time`, which is the amount of time a node should be unneeded before it's eligible for scale down.
 * Reduce `scale-down-delay-after-add`, which is the amount of time to wait after a node is added before considering it for scale down.
 * Increase `scale-down-utilization-threshold`, which is the utilization threshold for removing nodes.
 * Increase `max-empty-bulk-delete`, which is the maximum number of nodes that can be deleted in a single call.
+* Set `skip-nodes-with-local-storage` to `false`.
+* Increase `ok-total-unready-count` and `max-total-unready-percentage`.
 
 ## Common issues and mitigation recommendations
 
+View scaling failures and scale-up not triggered events via the [CLI or Azure portal](./cluster-autoscaler.md#retrieve-cluster-autoscaler-logs-and-status).
+
 ### Not triggering scale up operations
 
 | Common causes | Mitigation recommendations |
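The overview above notes that PriorityClass values below -10 don't trigger scale-up, with -10 reserved for overprovisioning placeholder pods. As a hedged illustration of what such a class might look like (the name and description are invented for this sketch; check the upstream cluster-autoscaler FAQ before relying on the exact cutoff value):

```yaml
# Hypothetical PriorityClass for overprovisioning placeholder pods.
# Value -10 keeps placeholders below all real workloads so the cluster
# autoscaler does not scale up on their behalf, while real pending pods
# can still preempt them.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning          # illustrative name
value: -10                        # at the reserved overprovisioning priority
preemptionPolicy: Never           # placeholders never preempt other pods
globalDefault: false
description: "Placeholder pods for overprovisioning; evicted first under pressure."
```

Pods referencing this class via `priorityClassName` act as headroom: when real workloads arrive, the placeholders are preempted and rescheduled, which is what triggers the autoscaler to add capacity for them.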

articles/aks/cluster-autoscaler.md — 31 additions, 4 deletions

@@ -195,6 +195,24 @@ The following table lists the available settings for the cluster autoscaler prof
     --cluster-autoscaler-profile scan-interval=30s
 ```
 
+### Configure cluster autoscaler profile for aggressive scale down
+
+> [!NOTE]
+> Scaling down aggressively is not recommended for clusters that experience frequent scale-outs and scale-ins within short intervals, as it can result in extended node provisioning times under these circumstances. Increasing `scale-down-delay-after-add` helps in these circumstances by keeping the node around longer to handle incoming workloads.
+
+```azurecli-interactive
+az aks update \
+  --resource-group myResourceGroup \
+  --name myAKSCluster \
+  --cluster-autoscaler-profile scan-interval=30s,scale-down-delay-after-add=0s,scale-down-delay-after-failure=30s,scale-down-unneeded-time=3m,scale-down-unready-time=3m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=1000,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
+```
+
+### Configure cluster autoscaler profile for bursty workloads
+
+```azurecli-interactive
+az aks update \
+  --resource-group myResourceGroup \
+  --name myAKSCluster \
+  --cluster-autoscaler-profile scan-interval=20s,scale-down-delay-after-add=10m,scale-down-delay-after-failure=1m,scale-down-unneeded-time=5m,scale-down-unready-time=5m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=100,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
+```
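The profile strings in the commands above are easy to mistype: `--cluster-autoscaler-profile` takes a single comma-separated list of `key=value` pairs, and an embedded space (as originally present after `scan-interval=30s,`) would make the shell split the list into separate arguments. A small hypothetical helper for composing the value programmatically:

```python
def format_autoscaler_profile(settings):
    """Compose the value for `--cluster-autoscaler-profile`: a single
    comma-separated list of key=value pairs with no embedded spaces."""
    for key, value in settings.items():
        if " " in key or " " in str(value):
            raise ValueError(f"no spaces allowed in {key}={value}")
    # dicts preserve insertion order, so output order matches input order
    return ",".join(f"{key}={value}" for key, value in settings.items())

aggressive = {
    "scan-interval": "30s",
    "scale-down-delay-after-add": "0s",
    "scale-down-unneeded-time": "3m",
    "skip-nodes-with-local-storage": "false",
}
profile = format_autoscaler_profile(aggressive)
# pass as: az aks update ... --cluster-autoscaler-profile "<profile>"
```

This is only a formatting aid; the valid setting names and value formats come from the profile table in the document above.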
@@ -206,12 +224,11 @@ The following table lists the available settings for the cluster autoscaler prof
     --cluster-autoscaler-profile ""
 ```
 
-## Retrieve cluster autoscaler logs and status updates
+## Retrieve cluster autoscaler logs and status
 
 You can retrieve logs and status updates from the cluster autoscaler to help diagnose and debug autoscaler events. AKS manages the cluster autoscaler on your behalf and runs it in the managed control plane. You can enable control plane logs to see the logs and operations from the cluster autoscaler.
 
 ### [Azure CLI](#tab/azure-cli)
-
 1. Set up a rule for resource logs to push cluster autoscaler logs to Log Analytics using the [instructions here][aks-view-master-logs]. Make sure you check the box for `cluster-autoscaler` when selecting options for **Logs**.
 2. Select the **Log** section on your cluster.
 3. Enter the following example query into Log Analytics:
@@ -224,8 +241,16 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
 
     As long as there are logs to retrieve, you should see logs similar to the following logs:
 
     :::image type="content" source="media/cluster-autoscaler/autoscaler-logs.png" alt-text="Screenshot of Log Analytics logs.":::
 
+4. View cluster autoscaler scale-up-not-triggered events in the CLI:
+
+    ```bash
+    kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
+    ```
+
+5. View cluster autoscaler warning events in the CLI:
+
+    ```bash
+    kubectl get events --field-selector source=cluster-autoscaler,type=Warning
+    ```
+
-The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:
+6. The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:
 
     ```bash
     kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
     ```
@@ -244,6 +269,8 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
 ---
 
 For more information, see the [Kubernetes/autoscaler GitHub project FAQ][kubernetes-faq].
+
+## Cluster autoscaler metrics
+
+You can enable [control plane metrics (Preview)](./monitor-control-plane-metrics.md) to see the logs and operations from the [cluster autoscaler](./control-plane-metrics-default-list.md#minimal-ingestion-for-default-off-targets) with the [Azure Monitor managed service for Prometheus add-on](../azure-monitor/essentials/prometheus-metrics-overview.md).
 
 ## Next steps
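The `kubectl get events --field-selector` commands added above filter on exact field matches, combined with AND semantics across the comma-separated pairs. As a rough local illustration of that selection logic (a simulation over plain dicts, not the kubectl implementation):

```python
def field_selector(events, **fields):
    """Simulate kubectl's --field-selector: keep events whose fields all
    exactly match the given key=value pairs (AND semantics)."""
    return [e for e in events if all(e.get(k) == v for k, v in fields.items())]

# Sample events, shaped like the fields the commands above select on.
events = [
    {"source": "cluster-autoscaler", "reason": "NotTriggerScaleUp", "type": "Normal"},
    {"source": "cluster-autoscaler", "reason": "ScaleDown", "type": "Warning"},
    {"source": "kubelet", "reason": "NotTriggerScaleUp", "type": "Warning"},
]

# Equivalent to: --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
not_triggered = field_selector(events, source="cluster-autoscaler", reason="NotTriggerScaleUp")

# Equivalent to: --field-selector source=cluster-autoscaler,type=Warning
warnings = field_selector(events, source="cluster-autoscaler", type="Warning")
```

Both selectors intersect conditions, so an event from another source with a matching reason (the kubelet event here) is excluded.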

articles/aks/create-nginx-ingress-private-controller.md — 2 additions, 2 deletions

@@ -1,5 +1,5 @@
 ---
-title: Configure internal NGIX ingress controller for Azure private DNS zone
+title: Configure internal NGINX ingress controller for Azure private DNS zone
 description: Understand how to configure an ingress controller with a private IP address and an Azure private DNS zone using the application routing add-on for Azure Kubernetes Service.
 ms.subservice: aks-networking
 ms.custom: devx-track-azurecli
@@ -320,4 +320,4 @@ For other configuration information related to SSL encryption other advanced NGI
 [azure-dns-zone-role]: ../dns/dns-protect-private-zones-recordsets.md
 [az-network-private-dns-zone-create]: /cli/azure/network/private-dns/zone?#az-network-private-dns-zone-create
 [az-network-private-dns-link-vnet-create]: /cli/azure/network/private-dns/link/vnet#az-network-private-dns-link-vnet-create
-[az-network-private-dns-record-set-a-list]: /cli/azure/network/private-dns/record-set/a#az-network-private-dns-record-set-a-list
+[az-network-private-dns-record-set-a-list]: /cli/azure/network/private-dns/record-set/a#az-network-private-dns-record-set-a-list
