
Commit afdd998 — Merge pull request #269158 from MicrosoftDocs/main

Publish to live, Friday 4 AM PST, 3/15

2 parents abb60ab + f6dc5ac

63 files changed: +745 −343 lines changed


articles/ai-services/openai/concepts/use-your-data.md — 1 addition, 33 deletions

@@ -383,40 +383,8 @@ You can send a streaming request using the `stream` parameter, allowing data to
 
 #### Conversation history for better results
 
-When you chat with a model, providing a history of the chat will help the model return higher quality results.
+When you chat with a model, providing a history of the chat will help the model return higher quality results. You don't need to include the `context` property of the assistant messages in your API requests for better response quality. See [the API reference documentation](../references/on-your-data.md#examples) for examples.
 
-```json
-{
-    "dataSources": [
-        {
-            "type": "AzureCognitiveSearch",
-            "parameters": {
-                "endpoint": "'$AZURE_AI_SEARCH_ENDPOINT'",
-                "key": "'$AZURE_AI_SEARCH_API_KEY'",
-                "indexName": "'$AZURE_AI_SEARCH_INDEX'"
-            }
-        }
-    ],
-    "messages": [
-        {
-            "role": "user",
-            "content": "What are the differences between Azure Machine Learning and Azure AI services?"
-        },
-        {
-            "role": "tool",
-            "content": "{\"citations\": [{\"content\": \"title: Azure AI services and Machine Learning\\ntitleSuffix: Azure AI services\\ndescription: Learn where Azure AI services fits in with other Azure offerings for machine learning.\\nAzure AI services and machine learning\\nAzure AI services provides machine learning capabilities to solve general problems such as...\\n \"articles\\\\cognitive-services\\\\cognitive-services-and-machine-learning.md\", \"url\": null, \"metadata\": {\"chunking\": \"orignal document size=1018. Scores=0.32200050354003906 and 1.2880020141601562.Org Highlight count=115.\"}, \"chunk_id\": \"0\"}], \"intent\": \"[\\\"What are the differences between Azure Machine Learning and Azure AI services?\\\"]\"}"
-        },
-        {
-            "role": "assistant",
-            "content": " \nAzure Machine Learning is a product and service tailored for data scientists to build, train, and deploy machine learning models [doc1]..."
-        },
-        {
-            "role": "user",
-            "content": "How do I use Azure machine learning?"
-        }
-    ]
-}
-```
 
 ## Token usage estimation for Azure OpenAI On Your Data
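The guidance above — send the chat history, but omit the assistant messages' `context` property — can be sketched as request-payload construction. This is a hedged illustration only: `build_payload` is a hypothetical helper (not part of any SDK), the endpoint, key, and index values are placeholders, and the `data_sources` field names follow the newer On Your Data request shape rather than the deleted `dataSources` example.

```python
# Sketch: assemble an On Your Data-style chat request that forwards prior
# turns but drops any stored assistant `context` metadata, per the updated
# guidance. All names and values here are illustrative placeholders.

def build_payload(history, new_question, search_endpoint, search_key, index_name):
    """Build a request body from prior turns plus a new user question."""
    messages = []
    for turn in history:
        # Forward only role/content; the `context` property is not needed.
        messages.append({"role": turn["role"], "content": turn["content"]})
    messages.append({"role": "user", "content": new_question})
    return {
        "data_sources": [{
            "type": "azure_search",  # assumed data source type for this sketch
            "parameters": {
                "endpoint": search_endpoint,
                "authentication": {"type": "api_key", "key": search_key},
                "index_name": index_name,
            },
        }],
        "messages": messages,
    }

history = [
    {"role": "user", "content": "Who is DRI?"},
    {"role": "assistant",
     "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
     # Stored context from an earlier response is intentionally not forwarded.
     "context": {"intent": "..."}},
]
payload = build_payload(history, "Opinion mining team",
                        "https://example.search.windows.net", "<key>", "my-index")
```

The key point is structural: every forwarded message carries only `role` and `content`.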

articles/ai-services/openai/references/on-your-data.md — 9 additions, 16 deletions

@@ -28,7 +28,7 @@ POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version=
 ```
 
 **Supported versions**
-* `2024-02-15-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json)
+* `2024-02-15-preview` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/preview/2024-02-15-preview/inference.json).
 * `2024-02-01` [Swagger spec](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01).
 
 > [!NOTE]
@@ -48,7 +48,6 @@ The request body inherits the same schema of chat completions API request. This
 
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `messages` | [ChatMessage](#chat-message)[] | True | The array of messages to generate chat completions for, in the chat format. The [request chat message](#chat-message) has a `context` property, which is added for Azure OpenAI On Your Data.|
 | `data_sources` | [DataSource](#data-source)[] | True | The configuration entries for Azure OpenAI On Your Data. There must be exactly one element in the array. If `data_sources` is not provided, the service uses chat completions model directly, and does not use Azure OpenAI On Your Data.|
 
 ## Response body
@@ -57,17 +56,17 @@ The response body inherits the same schema of chat completions API response. The
 
 ## Chat message
 
-In both request and response, when the chat message `role` is `assistant`, the chat message schema inherits from the chat completions assistant chat message, and is extended with the property `context`.
+The response assistant message schema inherits from the chat completions assistant [chat message](../reference.md#chatmessage), and is extended with the property `context`.
 
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `context` | [Context](#context) | False | Represents the incremental steps performed by the Azure OpenAI On Your Data while processing the request, including the detected search intent and the retrieved documents. |
+| `context` | [Context](#context) | False | Represents the incremental steps performed by the Azure OpenAI On Your Data while processing the request, including the retrieved documents. |
 
 ## Context
 |Name | Type | Required | Description |
 |--- | --- | --- | --- |
-| `citations` | [Citation](#citation)[] | False | The data source retrieval result, used to generate the assistant message in the response.|
-| `intent` | string | False | The detected intent from the chat history, used to pass to the next turn to carry over the context.|
+| `citations` | [Citation](#citation)[] | False | The data source retrieval result, used to generate the assistant message in the response. Clients can render references from the citations. |
+| `intent` | string | False | The detected intent from the chat history. Passing back the previous intent is no longer needed. Ignore this property. |
 
 ## Citation
 
@@ -91,7 +90,7 @@ This list shows the supported data sources.
 
 ## Examples
 
-This example shows how to pass context with conversation history for better results.
+This example shows how to pass conversation history for better results.
 
 Prerequisites:
 * Configure the role assignments from Azure OpenAI system assigned managed identity to Azure search service. Required roles: `Search Index Data Reader`, `Search Service Contributor`.
@@ -137,10 +136,7 @@ completion = client.chat.completions.create(
         },
         {
             "role": "assistant",
-            "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
-            "context": {
-                "intent": "[\"Who is DRI?\", \"What is the meaning of DRI?\", \"Define DRI\"]"
-            }
+            "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?"
         },
         {
             "role": "user",
@@ -191,14 +187,11 @@ az rest --method POST \
     "messages": [
        {
            "role": "user",
-           "content": "Who is DRI?",
+           "content": "Who is DRI?"
        },
        {
            "role": "assistant",
-           "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
-           "context": {
-               "intent": "[\"Who is DRI?\", \"What is the meaning of DRI?\", \"Define DRI\"]"
-           }
+           "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?"
        },
        {
           "role": "user",
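The edits above remove `context`/`intent` round-tripping from assistant messages in both the Python and `az rest` examples. For clients that stored full responses in their chat history, a minimal client-side cleanup could look like this — `strip_assistant_context` is a hypothetical helper sketched here, not part of any SDK:

```python
def strip_assistant_context(messages):
    """Return a copy of the chat history with the `context` property removed
    from assistant messages, since passing the previous intent back to the
    service is no longer needed."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and "context" in msg:
            # Shallow-copy the message without its `context` key.
            msg = {k: v for k, v in msg.items() if k != "context"}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Who is DRI?"},
    {"role": "assistant",
     "content": "DRI stands for Directly Responsible Individual of a service. Which service are you asking about?",
     "context": {"intent": '["Who is DRI?", "What is the meaning of DRI?", "Define DRI"]'}},
    {"role": "user", "content": "Opinion mining team"},
]
cleaned = strip_assistant_context(history)
```

Only assistant messages are touched; user and tool messages pass through unchanged.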

articles/aks/cluster-autoscaler-overview.md — 9 additions, 6 deletions

@@ -31,14 +31,16 @@ It's a common practice to enable cluster autoscaler for nodes and either the Ver
 * To **effectively run workloads concurrently on both Spot and Fixed node pools**, consider using [*priority expanders*](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders). This approach allows you to schedule pods based on the priority of the node pool.
 * Exercise caution when **assigning CPU/Memory requests on pods**. The cluster autoscaler scales up based on pending pods rather than CPU/Memory pressure on nodes.
 * For **clusters concurrently hosting both long-running workloads, like web apps, and short/bursty job workloads**, we recommend separating them into distinct node pools with [Affinity Rules](./operator-best-practices-advanced-scheduler.md#node-affinity)/[expanders](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-are-expanders) or using [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) to help prevent unnecessary node drain or scale down operations.
-* We **don't recommend making direct changes to nodes in autoscaled node pools**. All nodes in the same node group should have uniform capacity, labels, and system pods running on them.
+* In an autoscaler-enabled node pool, **scale down nodes by removing workloads instead of manually reducing the node count**. Manually reducing the count can be problematic if the node pool is already at maximum capacity or if active workloads are running on the nodes, and it can cause unexpected cluster autoscaler behavior.
 * Nodes don't scale up if pods have a PriorityClass value below -10. Priority -10 is reserved for [overprovisioning pods](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler). For more information, see [Using the cluster autoscaler with Pod Priority and Preemption](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-cluster-autoscaler-work-with-pod-priority-and-preemption).
 * **Don't combine other node autoscaling mechanisms**, such as Virtual Machine Scale Set autoscalers, with the cluster autoscaler.
 * The cluster autoscaler **might be unable to scale down if pods can't move, such as in the following situations**:
   * A directly created pod not backed by a controller object, such as a Deployment or ReplicaSet.
   * A pod disruption budget (PDB) that's too restrictive and doesn't allow the number of pods to fall below a certain threshold.
   * A pod uses node selectors or anti-affinity that can't be honored if scheduled on a different node.
   For more information, see [What types of pods can prevent the cluster autoscaler from removing a node?](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node).
+
+> [!IMPORTANT]
+> **Don't make changes to individual nodes within autoscaled node pools.** All nodes in the same node group should have uniform capacity, labels, taints, and system pods running on them.
 
 ## Cluster autoscaler profile
 
@@ -52,21 +54,22 @@ It's important to note that the cluster autoscaler profile settings are cluster-
 
 #### Example 1: Optimizing for performance
 
-For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. It also helps mitigate the risk of swift scale down operations on underutilized nodes, enhancing the pod scheduling efficiency.
+For clusters that handle substantial and bursty workloads with a primary focus on performance, we recommend increasing the `scan-interval` and decreasing the `scale-down-utilization-threshold`. These settings help batch multiple scaling operations into a single call, optimizing scaling time and the utilization of compute read/write quotas. They also help mitigate the risk of swift scale down operations on underutilized nodes, enhancing pod scheduling efficiency. Also increase `ok-total-unready-count` and `max-total-unready-percentage`.
 
-For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations.
+For clusters with daemonset pods, we recommend setting `ignore-daemonset-utilization` to `true`, which effectively ignores node utilization by daemonset pods and minimizes unnecessary scale down operations. See the [profile for bursty workloads](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-bursty-workloads).
 
 #### Example 2: Optimizing for cost
 
-If you want a cost-optimized profile, we recommend setting the following parameter configurations:
+If you want a [cost-optimized profile](./cluster-autoscaler.md#configure-cluster-autoscaler-profile-for-aggressive-scale-down), we recommend setting the following parameter configurations:
 
 * Reduce `scale-down-unneeded-time`, which is the amount of time a node should be unneeded before it's eligible for scale down.
 * Reduce `scale-down-delay-after-add`, which is the amount of time to wait after a node is added before considering it for scale down.
 * Increase `scale-down-utilization-threshold`, which is the utilization threshold for removing nodes.
 * Increase `max-empty-bulk-delete`, which is the maximum number of nodes that can be deleted in a single call.
+* Set `skip-nodes-with-local-storage` to `false`.
+* Increase `ok-total-unready-count` and `max-total-unready-percentage`.
 
 ## Common issues and mitigation recommendations
 
+View scaling failures and scale-up not triggered events via the [CLI or Azure portal](./cluster-autoscaler.md#retrieve-cluster-autoscaler-logs-and-status).
+
 ### Not triggering scale up operations
 
 | Common causes | Mitigation recommendations |
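The overview above notes that PriorityClass values below -10 don't trigger scale-up, with -10 reserved for overprovisioning placeholder pods. As a hedged illustration of what such a class might look like (the name and description are invented for this sketch; check the upstream cluster-autoscaler FAQ before relying on the exact cutoff value):

```yaml
# Hypothetical PriorityClass for overprovisioning placeholder pods.
# Value -10 keeps placeholders below all real workloads so the cluster
# autoscaler does not scale up on their behalf, while real pending pods
# can still preempt them.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning          # illustrative name
value: -10                        # at the reserved overprovisioning priority
preemptionPolicy: Never           # placeholders never preempt other pods
globalDefault: false
description: "Placeholder pods for overprovisioning; evicted first under pressure."
```

Pods referencing this class via `priorityClassName` act as headroom: when real workloads arrive, the placeholders are preempted and rescheduled, which is what triggers the autoscaler to add capacity for them.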

articles/aks/cluster-autoscaler.md — 31 additions, 4 deletions

@@ -195,6 +195,24 @@ The following table lists the available settings for the cluster autoscaler prof
     --cluster-autoscaler-profile scan-interval=30s
 ```
 
+### Configure cluster autoscaler profile for aggressive scale down
+
+> [!NOTE]
+> Scaling down aggressively is not recommended for clusters that experience frequent scale-outs and scale-ins within short intervals, as it can result in extended node provisioning times under these circumstances. Increasing `scale-down-delay-after-add` helps in these circumstances by keeping the node around longer to handle incoming workloads.
+
+```azurecli-interactive
+az aks update \
+  --resource-group myResourceGroup \
+  --name myAKSCluster \
+  --cluster-autoscaler-profile scan-interval=30s,scale-down-delay-after-add=0s,scale-down-delay-after-failure=30s,scale-down-unneeded-time=3m,scale-down-unready-time=3m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=1000,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
+```
+
+### Configure cluster autoscaler profile for bursty workloads
+
+```azurecli-interactive
+az aks update \
+  --resource-group myResourceGroup \
+  --name myAKSCluster \
+  --cluster-autoscaler-profile scan-interval=20s,scale-down-delay-after-add=10m,scale-down-delay-after-failure=1m,scale-down-unneeded-time=5m,scale-down-unready-time=5m,max-graceful-termination-sec=30,skip-nodes-with-local-storage=false,max-empty-bulk-delete=100,max-total-unready-percentage=100,ok-total-unready-count=1000,max-node-provision-time=15m
+```
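The profile strings in the commands above are easy to mistype: `--cluster-autoscaler-profile` takes a single comma-separated list of `key=value` pairs, and an embedded space (as originally present after `scan-interval=30s,`) would make the shell split the list into separate arguments. A small hypothetical helper for composing the value programmatically:

```python
def format_autoscaler_profile(settings):
    """Compose the value for `--cluster-autoscaler-profile`: a single
    comma-separated list of key=value pairs with no embedded spaces."""
    for key, value in settings.items():
        if " " in key or " " in str(value):
            raise ValueError(f"no spaces allowed in {key}={value}")
    # dicts preserve insertion order, so output order matches input order
    return ",".join(f"{key}={value}" for key, value in settings.items())

aggressive = {
    "scan-interval": "30s",
    "scale-down-delay-after-add": "0s",
    "scale-down-unneeded-time": "3m",
    "skip-nodes-with-local-storage": "false",
}
profile = format_autoscaler_profile(aggressive)
# pass as: az aks update ... --cluster-autoscaler-profile "<profile>"
```

This is only a formatting aid; the valid setting names and value formats come from the profile table in the document above.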
@@ -206,12 +224,11 @@ The following table lists the available settings for the cluster autoscaler prof
     --cluster-autoscaler-profile ""
 ```
 
-## Retrieve cluster autoscaler logs and status updates
+## Retrieve cluster autoscaler logs and status
 
 You can retrieve logs and status updates from the cluster autoscaler to help diagnose and debug autoscaler events. AKS manages the cluster autoscaler on your behalf and runs it in the managed control plane. You can enable control plane logs to see the logs and operations from the cluster autoscaler.
 
 ### [Azure CLI](#tab/azure-cli)
-
 1. Set up a rule for resource logs to push cluster autoscaler logs to Log Analytics using the [instructions here][aks-view-master-logs]. Make sure you check the box for `cluster-autoscaler` when selecting options for **Logs**.
 2. Select the **Log** section on your cluster.
 3. Enter the following example query into Log Analytics:
@@ -224,8 +241,16 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
 
     As long as there are logs to retrieve, you should see logs similar to the following logs:
 
     :::image type="content" source="media/cluster-autoscaler/autoscaler-logs.png" alt-text="Screenshot of Log Analytics logs.":::
 
+4. View cluster autoscaler scale-up-not-triggered events in the CLI:
+
+    ```bash
+    kubectl get events --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
+    ```
+
+5. View cluster autoscaler warning events in the CLI:
+
+    ```bash
+    kubectl get events --field-selector source=cluster-autoscaler,type=Warning
+    ```
+
-The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:
+6. The cluster autoscaler also writes out the health status to a `configmap` named `cluster-autoscaler-status`. You can retrieve these logs using the following `kubectl` command:
 
     ```bash
     kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml
     ```
@@ -244,6 +269,8 @@ You can retrieve logs and status updates from the cluster autoscaler to help dia
 ---
 
 For more information, see the [Kubernetes/autoscaler GitHub project FAQ][kubernetes-faq].
+
+## Cluster autoscaler metrics
+
+You can enable [control plane metrics (Preview)](./monitor-control-plane-metrics.md) to see the logs and operations from the [cluster autoscaler](./control-plane-metrics-default-list.md#minimal-ingestion-for-default-off-targets) with the [Azure Monitor managed service for Prometheus add-on](../azure-monitor/essentials/prometheus-metrics-overview.md).
 
 ## Next steps
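The `kubectl get events --field-selector` commands added above filter on exact field matches, combined with AND semantics across the comma-separated pairs. As a rough local illustration of that selection logic (a simulation over plain dicts, not the kubectl implementation):

```python
def field_selector(events, **fields):
    """Simulate kubectl's --field-selector: keep events whose fields all
    exactly match the given key=value pairs (AND semantics)."""
    return [e for e in events if all(e.get(k) == v for k, v in fields.items())]

# Sample events, shaped like the fields the commands above select on.
events = [
    {"source": "cluster-autoscaler", "reason": "NotTriggerScaleUp", "type": "Normal"},
    {"source": "cluster-autoscaler", "reason": "ScaleDown", "type": "Warning"},
    {"source": "kubelet", "reason": "NotTriggerScaleUp", "type": "Warning"},
]

# Equivalent to: --field-selector source=cluster-autoscaler,reason=NotTriggerScaleUp
not_triggered = field_selector(events, source="cluster-autoscaler", reason="NotTriggerScaleUp")

# Equivalent to: --field-selector source=cluster-autoscaler,type=Warning
warnings = field_selector(events, source="cluster-autoscaler", type="Warning")
```

Both selectors intersect conditions, so an event from another source with a matching reason (the kubelet event here) is excluded.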

articles/aks/create-nginx-ingress-private-controller.md — 2 additions, 2 deletions

@@ -1,5 +1,5 @@
 ---
-title: Configure internal NGIX ingress controller for Azure private DNS zone
+title: Configure internal NGINX ingress controller for Azure private DNS zone
 description: Understand how to configure an ingress controller with a private IP address and an Azure private DNS zone using the application routing add-on for Azure Kubernetes Service.
 ms.subservice: aks-networking
 ms.custom: devx-track-azurecli
@@ -320,4 +320,4 @@ For other configuration information related to SSL encryption other advanced NGI
 [azure-dns-zone-role]: ../dns/dns-protect-private-zones-recordsets.md
 [az-network-private-dns-zone-create]: /cli/azure/network/private-dns/zone?#az-network-private-dns-zone-create
 [az-network-private-dns-link-vnet-create]: /cli/azure/network/private-dns/link/vnet#az-network-private-dns-link-vnet-create
-[az-network-private-dns-record-set-a-list]: /cli/azure/network/private-dns/record-set/a#az-network-private-dns-record-set-a-list
+[az-network-private-dns-record-set-a-list]: /cli/azure/network/private-dns/record-set/a#az-network-private-dns-record-set-a-list
