MicrosoftDocs
diff --git a/articles/ai-services/content-safety/includes/severity-levels.md b/articles/ai-services/content-safety/includes/severity-levels.md
Lines changed: 18 additions & 18 deletions
diff --git a/articles/ai-services/openai/concepts/content-filter.md b/articles/ai-services/openai/concepts/content-filter.md
Lines changed: 4 additions & 0 deletions
diff --git a/articles/ai-services/openai/concepts/models.md b/articles/ai-services/openai/concepts/models.md
Lines changed: 3 additions & 3 deletions
diff --git a/articles/ai-services/openai/includes/dall-e-python.md b/articles/ai-services/openai/includes/dall-e-python.md
Lines changed: 3 additions & 2 deletions
diff --git a/articles/aks/TOC.yml b/articles/aks/TOC.yml
Lines changed: 5 additions & 1 deletion
diff --git a/articles/aks/availability-zones.md b/articles/aks/availability-zones.md
Lines changed: 3 additions & 0 deletions
diff --git a/articles/aks/best-practices-performance-scale-large.md b/articles/aks/best-practices-performance-scale-large.md
Lines changed: 1 addition & 1 deletion
diff --git a/articles/aks/best-practices-performance-scale.md b/articles/aks/best-practices-performance-scale.md
Lines changed: 2 additions & 2 deletions
@@ -796,6 +796,10 @@ data: {"id":"","object":"","created":0,"model":"","choices":[{"index":0,"finish_
 
 data: [DONE] 
 ```
+
+> [!IMPORTANT]
+> When content filtering is triggered for a prompt and a `"status": 400` is received as part of the response, there may be a charge for the request because the service evaluated the prompt. [Charges also occur](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) when a `"status": 200` is received with `"finish_reason": "content_filter"`. In this case, the prompt didn't have any issues, but the completion generated by the model was found to violate the content filtering rules, so the completion was filtered.
+
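The two billed cases described in the note can be told apart on the client side. The following is a minimal sketch, assuming the HTTP status code and the parsed JSON body are already in hand. `classify_filter_outcome` and its return labels are hypothetical names, and the `error.code == "content_filter"` check on the 400 body is an assumption about the error payload; only the status values and the `finish_reason` field come from the note itself.

```python
from typing import Any, Dict


def classify_filter_outcome(status: int, body: Dict[str, Any]) -> str:
    """Label a response by where content filtering applied (hypothetical helper)."""
    if status == 400:
        # Assumption: a filtered prompt is reported with error.code "content_filter".
        # Other 400 responses (for example, malformed requests) are not filter hits.
        error = body.get("error") or {}
        if error.get("code") == "content_filter":
            return "prompt_filtered"
        return "other_error"
    if status == 200:
        # The prompt passed; check whether any generated completion was filtered.
        for choice in body.get("choices", []):
            if choice.get("finish_reason") == "content_filter":
                return "completion_filtered"
        return "ok"
    return "other_error"
```

Both `prompt_filtered` and `completion_filtered` correspond to requests that may be charged, so an application could log them separately from ordinary failures.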
 ## Best practices
 
 As part of your application design, consider the following best practices to deliver a positive experience with your application while minimizing potential harms:
 
@@ -101,7 +101,7 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 **<sup>2</sup>** GPT-4 Turbo with Vision Preview = `gpt-4` (vision-preview). To deploy this model, under **Deployments** select model **gpt-4**. For **Model version** select **vision-preview**.
 
 > [!CAUTION]
-> We don't recommend using these models in production. We will upgrade all deployments of these models to a future stable version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
+> We don't recommend using preview models in production. We will upgrade all deployments of preview models to a future stable version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
 
 > [!NOTE]
 > Regions where GPT-4 (0314) & (0613) are listed as available have access to both the 8K and 32K versions of the model.
@@ -110,8 +110,8 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
 
 | Model Availability | gpt-4 (0314) | gpt-4 (0613) | gpt-4 (1106-preview) | gpt-4 (vision-preview) | 
 |---|:---|:---|:---|:---|
-| Available to all subscriptions with Azure OpenAI access | | Australia East <br> Canada East <br> France Central <br> Sweden Central <br> Switzerland North | Australia East <br> Canada East <br> East US 2 <br> France Central <br> Norway East <br> South India <br> Sweden Central <br> UK South <br> West US | Switzerland North <br> West US | 
-| Available to subscriptions with current access to the model version in the region | East US <br> France Central <br> South Central US <br> UK South | East US <br> East US 2 <br> Japan East <br> UK South | | Australia East <br>Sweden Central|
+| Available to all subscriptions with Azure OpenAI access | | Australia East <br> Canada East <br> France Central <br> Sweden Central <br> Switzerland North | Australia East <br> Canada East <br> East US 2 <br> France Central <br> Norway East <br> South India <br> Sweden Central <br> UK South <br> West US | Sweden Central <br> Switzerland North <br> West US | 
+| Available to subscriptions with current access to the model version in the region | East US <br> France Central <br> South Central US <br> UK South | East US <br> East US 2 <br> Japan East <br> UK South | | Australia East |
 
 ### GPT-3.5 models
 
 
@@ -64,8 +64,6 @@ Open a command prompt and browse to your project folder. Create a new python fil
 
 ## Install the Python SDK
 
-> [!IMPORTANT]
-> The latest release of the [OpenAI Python library](https://pypi.org/project/openai/) does not currently support DALL-E when used with Azure OpenAI. To access DALL-E with Azure OpenAI use version `0.28.1`.
 
 Install the OpenAI Python SDK by using the following command:
 
@@ -77,6 +75,9 @@ pip install openai
 
 #### [DALL-E 2](#tab/dalle2)
 
+> [!IMPORTANT]
+> The latest release of the [OpenAI Python library](https://pypi.org/project/openai/) doesn't currently support DALL-E 2 when used with Azure OpenAI. To access DALL-E 2 with Azure OpenAI, use version `0.28.1`, or follow the [migration guide](/azure/ai-services/openai/how-to/migration?tabs=python%2Cdalle-fix) to use DALL-E 2 with OpenAI 1.x.
+
 ```bash
 pip install openai==0.28.1
 ```
 
@@ -346,7 +346,11 @@
         - name: Proximity placement groups
           href: reduce-latency-ppg.md
         - name: Cluster Autoscaler
-          href: cluster-autoscaler.md
+          items:
+            - name: Cluster Autoscaler overview
+              href: cluster-autoscaler-overview.md
+            - name: Use the Cluster Autoscaler on AKS
+              href: cluster-autoscaler.md
         - name: Node autoprovision
           href: node-autoprovision.md
         - name: Availability Zones
 
@@ -68,6 +68,9 @@ AKS clusters deployed using availability zones can distribute nodes across multi
 
 If a single zone becomes unavailable, your applications continue to run on clusters configured to spread across multiple zones.
 
+> [!NOTE]
+> When implementing **availability zones with the [cluster autoscaler](./cluster-autoscaler-overview.md)**, we recommend using a single node pool for each zone. You can set the `--balance-similar-node-groups` parameter to `True` to maintain a balanced distribution of nodes across zones for your workloads during scale-up operations. When this approach isn't implemented, scale-down operations can disrupt the balance of nodes across zones.
+
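To illustrate what a balanced distribution across zones means during scale up, here is a toy sketch in Python. It is not the cluster autoscaler's actual algorithm; `balanced_scale_up` and the pool names are hypothetical, and the rule of adding each new node to the currently smallest pool is an assumption chosen only to show the intended outcome.

```python
from typing import Dict


def balanced_scale_up(pool_sizes: Dict[str, int], new_nodes: int) -> Dict[str, int]:
    """Add nodes one at a time to whichever pool is currently smallest."""
    sizes = dict(pool_sizes)  # copy so the caller's mapping is not modified
    for _ in range(new_nodes):
        # Iterate pools in name order so ties resolve deterministically.
        smallest = min(sorted(sizes), key=lambda name: sizes[name])
        sizes[smallest] += 1
    return sizes


# Hypothetical starting point: one node pool per zone, unevenly sized.
print(balanced_scale_up({"zone1": 3, "zone2": 1, "zone3": 2}, new_nodes=3))
```

With one pool per zone, evening out pool sizes also evens out the number of nodes per zone, which is why the note pairs the two recommendations.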
 ## Create an AKS cluster across availability zones
 
 When you create a cluster using the [az aks create][az-aks-create] command, the `--zones` parameter specifies the availability zones to deploy agent nodes into. The availability zones that the managed control plane components are deployed into are **not** controlled by this parameter. They are automatically spread across all availability zones (if present) in the region during cluster deployment.
 
@@ -76,7 +76,7 @@ Keeping the above considerations in mind, customers are typically able to deploy
 Always upgrade your Kubernetes clusters to the latest version. Newer versions contain many improvements that address performance and throttling issues. If you're using an upgraded version of Kubernetes and still see throttling due to the actual load or the number of clients in the subscription, you can try the following options:
 
 * **Analyze errors using AKS Diagnose and Solve Problems**: You can use [AKS Diagnose and Solve Problems](./aks-diagnostics.md) to analyze errors, identify the root cause, and get resolution recommendations.
-  * **Increase the Cluster Autoscaler scan interval**: If the diagnostic reports show that [Cluster Autoscaler throttling has been detected](/troubleshoot/azure/azure-kubernetes/429-too-many-requests-errors#analyze-and-identify-errors-by-using-aks-diagnose-and-solve-problems), you can [increase the scan interval](./cluster-autoscaler.md#change-the-cluster-autoscaler-settings) to reduce the number of calls to Virtual Machine Scale Sets from the Cluster Autoscaler.
+  * **Increase the Cluster Autoscaler scan interval**: If the diagnostic reports show that [Cluster Autoscaler throttling has been detected](/troubleshoot/azure/azure-kubernetes/429-too-many-requests-errors#analyze-and-identify-errors-by-using-aks-diagnose-and-solve-problems), you can [increase the scan interval](./cluster-autoscaler.md#update-the-cluster-autoscaler-settings) to reduce the number of calls to Virtual Machine Scale Sets from the Cluster Autoscaler.
   * **Reconfigure third-party applications to make fewer calls**: If you filter by *user agents* in the ***View request rate and throttle details*** diagnostic and see that [a third-party application, such as a monitoring application, makes a large number of GET requests](/troubleshoot/azure/azure-kubernetes/429-too-many-requests-errors#analyze-and-identify-errors-by-using-aks-diagnose-and-solve-problems), you can change the settings of these applications to reduce the frequency of the GET calls. Make sure the application clients use exponential backoff when calling Azure APIs.
 * **Split your clusters into different subscriptions or regions**: If you have a large number of clusters and node pools that use Virtual Machine Scale Sets, you can split them into different subscriptions or regions within the same subscription. Most Azure API limits are shared at the subscription-region level, so you can move or scale your clusters to different subscriptions or regions to get unblocked on Azure API throttling. This option is especially helpful if you expect your clusters to have high activity. There are no generic guidelines for these limits. If you want specific guidance, you can create a support ticket.
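
The exponential backoff recommended for application clients in the list above can be sketched as follows. This is a minimal illustration, not an Azure SDK API: `ThrottledError` is a hypothetical stand-in for an HTTP 429 response, and `call_with_backoff`, its defaults, and the "full jitter" strategy are assumptions made for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (too many requests) response."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on throttling, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Double the ceiling on each attempt, cap it at max_delay, then
            # wait a random time below the ceiling so clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be exercised without real waiting.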
 
 
@@ -57,7 +57,7 @@ Implementing [vertical pod autoscaling](./vertical-pod-autoscaler.md) is useful
 
 Implementing cluster autoscaling is useful if your existing nodes lack sufficient capacity, as it helps with scaling up and provisioning new nodes.
 
-When considering cluster autoscaling, the decision of when to remove a node involves a tradeoff between optimizing resource utilization and ensuring resource availability. Eliminating underutilized nodes enhances cluster utilization but might result in new workloads having to wait for resources to be provisioned before they can be deployed. It's important to find a balance between these two factors that aligns with your cluster and workload requirements and [configure the cluster autoscaler profile settings accordingly](./cluster-autoscaler.md#change-the-cluster-autoscaler-settings).
+When considering cluster autoscaling, the decision of when to remove a node involves a tradeoff between optimizing resource utilization and ensuring resource availability. Eliminating underutilized nodes enhances cluster utilization but might result in new workloads having to wait for resources to be provisioned before they can be deployed. It's important to find a balance between these two factors that aligns with your cluster and workload requirements and [configure the cluster autoscaler profile settings accordingly](./cluster-autoscaler.md#update-the-cluster-autoscaler-settings).
 
 The Cluster Autoscaler profile settings apply universally to all autoscaler-enabled node pools in your cluster. This means that any scaling actions occurring in one autoscaler-enabled node pool might impact the autoscaling behavior in another node pool. It's important to apply consistent and synchronized profile settings across all relevant node pools to ensure that the autoscaler behaves as expected.
 
@@ -234,7 +234,7 @@ The following table provides a breakdown of suggested use cases for OS disks sup
 
 #### IOPS and throughput
 
-Input/output operations per second (IOPS) refers to the number of read and write operations that a disk can perform in a second. Throughout refers to the amount of data that can be transferred in a given time period.
+Input/output operations per second (IOPS) refers to the number of read and write operations that a disk can perform in a second. Throughput refers to the amount of data that can be transferred in a given time period.
 
 OS disks are responsible for storing the operating system and its associated files, and the VMs are responsible for running the applications. When selecting a VM, ensure the size and performance of the OS disk and VM SKU don't have a large discrepancy. A discrepancy in size or performance can cause performance issues and resource contention. For example, if the OS disk is significantly smaller than the VMs, it can limit the amount of space available for application data and cause the system to run out of disk space. If the OS disk has lower performance than the VMs, it can become a bottleneck and limit the overall performance of the system. Make sure the size and performance are balanced to ensure optimal performance in Kubernetes.
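
IOPS and throughput are linked by the size of each I/O operation: throughput is roughly IOPS multiplied by the I/O size. A small worked example in Python, where the disk figures are hypothetical and chosen only for the arithmetic:

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s = operations per second x KiB per operation / 1024."""
    return iops * io_size_kib / 1024


# Hypothetical disk: 500 IOPS at 16 KiB per operation.
print(throughput_mib_per_s(500, 16))  # 7.8125
```

The same disk moves more data per second with larger I/O sizes, until it reaches whichever of its IOPS or throughput limits applies first.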