
Commit d4fe4b2

Merge pull request #105954 from jluk/autorepair
update zones info
2 parents 12eaa47 + ebec4a5 commit d4fe4b2

2 files changed: 23 additions and 52 deletions

articles/aks/availability-zones.md

Lines changed: 18 additions & 19 deletions
@@ -4,15 +4,15 @@ description: Learn how to create a cluster that distributes nodes across availab
 services: container-service
 ms.custom: fasttrack-edit
 ms.topic: article
-ms.date: 06/24/2019
+ms.date: 02/27/2020
 
 ---
 
 # Create an Azure Kubernetes Service (AKS) cluster that uses availability zones
 
-An Azure Kubernetes Service (AKS) cluster distributes resources such as the nodes and storage across logical sections of the underlying Azure compute infrastructure. This deployment model makes sure that the nodes run across separate update and fault domains in a single Azure datacenter. AKS clusters deployed with this default behavior provide a high level of availability to protect against a hardware failure or planned maintenance event.
+An Azure Kubernetes Service (AKS) cluster distributes resources such as nodes and storage across logical sections of the underlying Azure infrastructure. When availability zones are used, this deployment model ensures that nodes in a given availability zone are physically separated from nodes defined in another availability zone. AKS clusters deployed with multiple availability zones configured provide a higher level of availability to protect against a hardware failure or a planned maintenance event.
 
-To provide a higher level of availability to your applications, AKS clusters can be distributed across availability zones. These zones are physically separate datacenters within a given region. When the cluster components are distributed across multiple zones, your AKS cluster is able to tolerate a failure in one of those zones. Your applications and management operations continue to be available even if one entire datacenter has a problem.
+By defining node pools in a cluster that span multiple zones, nodes in a given node pool can continue operating even if a single zone goes down. If your applications are orchestrated to tolerate the failure of a subset of nodes, they can remain available even during a physical failure in a single datacenter.
 
 This article shows you how to create an AKS cluster and distribute the node components across availability zones.
 
@@ -37,40 +37,36 @@ AKS clusters can currently be created using availability zones in the following
 
 The following limitations apply when you create an AKS cluster using availability zones:
 
-* You can only enable availability zones when the cluster is created.
+* You can only define availability zones when the cluster or node pool is created.
 * Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones.
-* You can't disable availability zones for an AKS cluster once it has been created.
-* The node size (VM SKU) selected must be available across all availability zones.
-* Clusters with availability zones enabled require use of Azure Standard Load Balancers for distribution across zones.
-* You must use Kubernetes version 1.13.5 or greater in order to deploy Standard Load Balancers.
-
-AKS clusters that use availability zones must use the Azure load balancer *standard* SKU, which is the default value for the load balancer type. This load balancer type can only be defined at cluster create time. For more information and the limitations of the standard load balancer, see [Azure load balancer standard SKU limitations][standard-lb-limitations].
+* The node size (VM SKU) you select must be available across all of the availability zones you select.
+* Clusters with availability zones enabled require Azure Standard Load Balancers for distribution across zones. This load balancer type can only be defined at cluster create time. For more information and the limitations of the standard load balancer, see [Azure load balancer standard SKU limitations][standard-lb-limitations].
 
 ### Azure disks limitations
 
-Volumes that use Azure managed disks are currently not zonal resources. Pods rescheduled in a different zone from their original zone can't reattach their previous disk(s). It's recommended to run stateless workloads that don't require persistent storage that may come across zonal issues.
+Volumes that use Azure managed disks are currently not zone-redundant resources. Volumes can't be attached across zones, so they must be co-located in the same zone as the node that hosts the target pod.
 
-If you must run stateful workloads, use taints and tolerations in your pod specs to tell the Kubernetes scheduler to create pods in the same zone as your disks. Alternatively, use network-based storage such as Azure Files that can attach to pods as they're scheduled between zones.
+If you must run stateful workloads, use node pool taints and tolerations in your pod specs to keep pods scheduled in the same zone as your disks. Alternatively, use network-based storage such as Azure Files, which can attach to pods as they're scheduled between zones.
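Editor's note (not part of this change): the taint-and-toleration guidance in the added line above could look like the following minimal sketch. The taint key/value, the PVC name, and the zone value are illustrative assumptions; only the zone label key itself appears elsewhere in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zonal-app
spec:
  # Assumption: the node pool in zone eastus2-1 was created with the
  # hypothetical taint app=zonal-db:NoSchedule; tolerate it so the pod
  # can be scheduled onto those nodes.
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "zonal-db"
    effect: "NoSchedule"
  # Pin the pod to the same zone as its Azure managed disk.
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: eastus2-1
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: zonal-app-pvc   # hypothetical PVC backed by an Azure disk in eastus2-1
```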
 
 ## Overview of availability zones for AKS clusters
 
-Availability zones is a high-availability offering that protects your applications and data from datacenter failures. Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, theres a minimum of three separate zones in all enabled regions. The physical separation of availability zones within a region protects applications and data from datacenter failures. Zone-redundant services replicate your applications and data across availability zones to protect from single-points-of-failure.
+Availability zones are a high-availability offering that protects your applications and data from datacenter failures. Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, there's a minimum of three separate zones in all zone-enabled regions. The physical separation of availability zones within a region protects applications and data from datacenter failures.
 
 For more information, see [What are availability zones in Azure?][az-overview].
 
 AKS clusters that are deployed using availability zones can distribute nodes across multiple zones within a single region. For example, a cluster in the *East US 2* region can create nodes in all three availability zones in *East US 2*. This distribution of AKS cluster resources improves cluster availability as they're resilient to failure of a specific zone.
 
 ![AKS node distribution across availability zones](media/availability-zones/aks-availability-zones.png)
 
-In a zone outage, the nodes can be rebalanced manually or using the cluster autoscaler. If a single zone becomes unavailable, your applications continue to run.
+If a single zone becomes unavailable, your applications continue to run as long as the cluster is spread across multiple zones.
 
 ## Create an AKS cluster across availability zones
 
-When you create a cluster using the [az aks create][az-aks-create] command, the `--zones` parameter defines which zones agent nodes are deployed into. The AKS control plane components for your cluster are also spread across zones in the highest available configuration when you define the `--zones` parameter at cluster creation time.
+When you create a cluster using the [az aks create][az-aks-create] command, the `--zones` parameter defines which zones agent nodes are deployed into. Control plane components such as etcd are spread across three zones if you define the `--zones` parameter at cluster creation time. The specific zones that the control plane components are spread across are independent of the explicit zones selected for the initial node pool.
 
-If you don't define any zones for the default agent pool when you create an AKS cluster, the AKS control plane components for your cluster will not use availability zones. You can add additional node pools using the [az aks nodepool add][az-aks-nodepool-add] command and specify `--zones` for those new nodes, however the control plane components remain without availability zone awareness. You can't change the zone awareness for a node pool or the AKS control plane components once they're deployed.
+If you don't define any zones for the default agent pool when you create an AKS cluster, the control plane components aren't guaranteed to spread across availability zones. You can add additional node pools using the [az aks nodepool add][az-aks-nodepool-add] command and specify `--zones` for the new nodes, but doing so doesn't change how the control plane has been spread across zones. Availability zone settings can only be defined at cluster or node pool create time.
 
-The following example creates an AKS cluster named *myAKSCluster* in the resource group named *myResourceGroup*. A total of *3* nodes are created - one agent in zone *1*, one in *2*, and then one in *3*. The AKS control plane components are also distributed across zones in the highest available configuration since they're defined as part of the cluster create process.
+The following example creates an AKS cluster named *myAKSCluster* in the resource group named *myResourceGroup*. A total of *3* nodes are created - one agent in zone *1*, one in *2*, and then one in *3*.
 
 ```azurecli-interactive
 az group create --name myResourceGroup --location eastus2
@@ -87,6 +83,8 @@ az aks create \
 
 It takes a few minutes to create the AKS cluster.
 
+When deciding what zone a new node should belong to, a given AKS node pool uses the [best-effort zone balancing offered by the underlying Azure Virtual Machine Scale Sets][vmss-zone-balancing]. A given AKS node pool is considered "balanced" if each zone has the same number of VMs, or +\- 1 VM compared to all other zones, in the scale set.
+
 ## Verify node distribution across zones
 
 When the cluster is ready, list the agent nodes in the scale set to see what availability zone they're deployed in.
@@ -144,13 +142,13 @@ Name: aks-nodepool1-28993262-vmss000004
 failure-domain.beta.kubernetes.io/zone=eastus2-2
 ```
 
-As you can see, we now have two additional nodes in zones 1 and 2. You can deploy an application consisting of three replicas. We will use NGINX as example:
+We now have two additional nodes in zones 1 and 2. You can deploy an application consisting of three replicas. We will use NGINX as an example:
 
 ```console
 kubectl run nginx --image=nginx --replicas=3
 ```
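Editor's note (not part of this change): a declarative equivalent of the `kubectl run` command above is a Deployment manifest. The optional anti-affinity block, which nudges replicas toward different zones, is an illustrative assumption rather than something this article prescribes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Prefer (but don't require) placing each replica in a different zone.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: nginx
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: nginx
        image: nginx
```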
 
-If you verify that nodes where your pods are running, you will see that the pods are running on the pods corresponding to three different availability zones. For example with the command `kubectl describe pod | grep -e "^Name:" -e "^Node:"` you would get an output similar to this:
+By viewing the nodes where your pods are running, you see that the pods are running on nodes corresponding to three different availability zones. For example, with the command `kubectl describe pod | grep -e "^Name:" -e "^Node:"` you would get output similar to this:
 
 ```console
 Name: nginx-6db489d4b7-ktdwg
@@ -182,6 +180,7 @@ This article detailed how to create an AKS cluster that uses availability zones.
 [az-extension-update]: /cli/azure/extension#az-extension-update
 [az-aks-nodepool-add]: /cli/azure/ext/aks-preview/aks/nodepool#ext-aks-preview-az-aks-nodepool-add
 [az-aks-get-credentials]: /cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials
+[vmss-zone-balancing]: ../virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones.md#zone-balancing
 
 <!-- LINKS - external -->
 [kubectl-describe]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#describe

articles/aks/troubleshooting.md

Lines changed: 5 additions & 33 deletions
@@ -130,10 +130,10 @@ Based on the output of the cluster status:
 
 ## I'm receiving errors that my service principal was not found when I try to create a new cluster without passing in an existing one.
 
-When creating an AKS cluster it requires a service principal to create resources on your behalf. AKS offers the ability to have a new one created at cluster creation time, but this requires Azure Active Directory to fully propagate the new service principal in a reasonable time in order to have the cluster succeed in creation. When this propagation takes too long, the cluster will fail validation to create as it cannot find an available service principal to do so.
+Creating an AKS cluster requires a service principal to create resources on your behalf. AKS can create a new service principal at cluster creation time, but this requires Azure Active Directory to propagate it in a reasonable time for cluster creation to succeed. If regional propagation exceeds the timeout thresholds, cluster creation fails validation because it cannot find an available service principal.
 
 Use the following workarounds for this:
-1. Use an existing service principal which has already propagated across regions and exists to pass into AKS at cluster create time.
+1. Use an existing service principal and pass it to AKS at cluster create time.
 2. If using automation scripts, add time delays between service principal creation and AKS cluster creation.
 3. If using Azure portal, return to the cluster settings during create and retry the validation page after a few minutes.
 
@@ -154,41 +154,14 @@ Verify that your settings are not conflicting with any of the required or option
 | 1.14 | 1.14.2 or later |
 
 
-### What versions of Kubernetes have Azure Disk support on the Sovereign Cloud?
+### What versions of Kubernetes have Azure Disk support on the Sovereign Clouds?
 
 | Kubernetes version | Recommended version |
 | -- | :--: |
 | 1.12 | 1.12.0 or later |
 | 1.13 | 1.13.0 or later |
 | 1.14 | 1.14.0 or later |
 
-
-### WaitForAttach failed for Azure Disk: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax
-
-In Kubernetes version 1.10, MountVolume.WaitForAttach may fail with an the Azure Disk remount.
-
-On Linux, you may see an incorrect DevicePath format error. For example:
-
-```console
-MountVolume.WaitForAttach failed for volume "pvc-f1562ecb-3e5f-11e8-ab6b-000d3af9f967" : azureDisk - Wait for attach expect device path as a lun number, instead got: /dev/disk/azure/scsi1/lun1 (strconv.Atoi: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax)
-Warning FailedMount 1m (x10 over 21m) kubelet, k8s-agentpool-66825246-0 Unable to mount volumes for pod
-```
-
-On Windows, you may see a wrong DevicePath(LUN) number error. For example:
-
-```console
-Warning FailedMount 1m kubelet, 15282k8s9010 MountVolume.WaitForAttach failed for volume "disk01" : azureDisk - WaitForAttach failed within timeout node (15282k8s9010) diskId:(andy-mghyb
-1102-dynamic-pvc-6c526c51-4a18-11e8-ab5c-000d3af7b38e) lun:(4)
-```
-
-This issue has been fixed in the following versions of Kubernetes:
-
-| Kubernetes version | Fixed version |
-| -- | :--: |
-| 1.10 | 1.10.2 or later |
-| 1.11 | 1.11.0 or later |
-| 1.12 and later | N/A |
-
 ### Failure when setting uid and gid in mountOptions for Azure Disk
 
 Azure Disk uses the ext4,xfs filesystem by default and mountOptions such as uid=x,gid=x can't be set at mount time. For example if you tried to set mountOptions uid=999,gid=999, would see an error like:
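Editor's note (not part of this change; the error output introduced by the sentence above lies outside this hunk): a commonly used alternative to uid/gid mount options is to set ownership through the pod security context. The names and values below are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: disk-ownership-example
spec:
  # Illustrative values: run the container as UID 999 and let Kubernetes
  # apply GID 999 to the mounted volume instead of uid/gid mount options.
  securityContext:
    runAsUser: 999
    fsGroup: 999
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: azure-disk-pvc   # hypothetical PVC backed by an Azure disk
```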
@@ -322,7 +295,6 @@ This issue has been fixed in the following versions of Kubernetes:
 
 If you are using a version of Kubernetes that does not have the fix for this issue and your node VM has an obsolete disk list, you can mitigate the issue by detaching all non-existing disks from the VM as a single, bulk operation. **Individually detaching non-existing disks may fail.**
 
-
 ### Large number of Azure Disks causes slow attach/detach
 
 When the number of Azure Disks attached to a node VM is larger than 10, attach and detach operations may be slow. This issue is a known issue and there are no workarounds at this time.
@@ -379,7 +351,7 @@ Recommended settings:
 | 1.12.0 - 1.12.1 | 0755 |
 | 1.12.2 and later | 0777 |
 
-If using a cluster with Kubernetes version 1.8.5 or greater and dynamically creating the persistent volume with a storage class, mount options can be specified on the storage class object. The following example sets *0777*:
+Mount options can be specified on the storage class object. The following example sets *0777*:
 
 ```yaml
 kind: StorageClass
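Editor's note (not part of this change): the storage class example is truncated by this hunk. Purely for orientation, and assuming the Azure Files storage class discussed in this part of the file, such an object might look like the sketch below; the name and option list are assumptions, not the file's actual example.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-0777          # hypothetical name
provisioner: kubernetes.io/azure-file
mountOptions:
  # Grant 0777 on directories and files in the mounted share.
  - dir_mode=0777
  - file_mode=0777
```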
@@ -475,7 +447,7 @@ To update your Azure secret file, use `kubectl edit secret`. For example:
 kubectl edit secret azure-storage-account-{storage-account-name}-secret
 ```
 
-After a few minutes, the agent node will retry the azure file mount with the updated storage key.
+After a few minutes, the agent node will retry the file mount with the updated storage key.
 
 ### Cluster autoscaler fails to scale with error failed to fix node group sizes
 