
Commit d4fe4b2

Merge pull request #105954 from jluk/autorepair
update zones info
2 parents 12eaa47 + ebec4a5 commit d4fe4b2

2 files changed: 23 additions and 52 deletions

articles/aks/availability-zones.md

Lines changed: 18 additions & 19 deletions
@@ -4,15 +4,15 @@ description: Learn how to create a cluster that distributes nodes across availab
 services: container-service
 ms.custom: fasttrack-edit
 ms.topic: article
-ms.date: 06/24/2019
+ms.date: 02/27/2020
 
 ---
 
 # Create an Azure Kubernetes Service (AKS) cluster that uses availability zones
 
-An Azure Kubernetes Service (AKS) cluster distributes resources such as the nodes and storage across logical sections of the underlying Azure compute infrastructure. This deployment model makes sure that the nodes run across separate update and fault domains in a single Azure datacenter. AKS clusters deployed with this default behavior provide a high level of availability to protect against a hardware failure or planned maintenance event.
+An Azure Kubernetes Service (AKS) cluster distributes resources such as nodes and storage across logical sections of the underlying Azure infrastructure. When availability zones are used, this deployment model ensures that nodes in a given availability zone are physically separated from nodes defined in another availability zone. AKS clusters deployed with multiple availability zones configured provide a higher level of availability to protect against a hardware failure or a planned maintenance event.
 
-To provide a higher level of availability to your applications, AKS clusters can be distributed across availability zones. These zones are physically separate datacenters within a given region. When the cluster components are distributed across multiple zones, your AKS cluster is able to tolerate a failure in one of those zones. Your applications and management operations continue to be available even if one entire datacenter has a problem.
+By defining node pools in a cluster that span multiple zones, nodes in a given node pool can continue operating even if a single zone goes down. If your applications are orchestrated to tolerate the failure of a subset of nodes, they can remain available even during a physical failure in a single datacenter.
 
 This article shows you how to create an AKS cluster and distribute the node components across availability zones.
 
@@ -37,40 +37,36 @@ AKS clusters can currently be created using availability zones in the following
 
 The following limitations apply when you create an AKS cluster using availability zones:
 
-* You can only enable availability zones when the cluster is created.
+* You can only define availability zones when the cluster or node pool is created.
 * Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones.
-* You can't disable availability zones for an AKS cluster once it has been created.
-* The node size (VM SKU) selected must be available across all availability zones.
-* Clusters with availability zones enabled require use of Azure Standard Load Balancers for distribution across zones.
-* You must use Kubernetes version 1.13.5 or greater in order to deploy Standard Load Balancers.
-
-AKS clusters that use availability zones must use the Azure load balancer *standard* SKU, which is the default value for the load balancer type. This load balancer type can only be defined at cluster create time. For more information and the limitations of the standard load balancer, see [Azure load balancer standard SKU limitations][standard-lb-limitations].
+* The node size (VM SKU) you select must be available across all of the availability zones you select.
+* Clusters with availability zones enabled require Azure Standard Load Balancers for distribution across zones. This load balancer type can only be defined at cluster create time. For more information and the limitations of the standard load balancer, see [Azure load balancer standard SKU limitations][standard-lb-limitations].
 
 ### Azure disks limitations
 
-Volumes that use Azure managed disks are currently not zonal resources. Pods rescheduled in a different zone from their original zone can't reattach their previous disk(s). It's recommended to run stateless workloads that don't require persistent storage that may come across zonal issues.
+Volumes that use Azure managed disks are currently not zone-redundant resources. Volumes can't be attached across zones, so they must be co-located in the same zone as the node that hosts the target pod.
 
-If you must run stateful workloads, use taints and tolerations in your pod specs to tell the Kubernetes scheduler to create pods in the same zone as your disks. Alternatively, use network-based storage such as Azure Files that can attach to pods as they're scheduled between zones.
+If you must run stateful workloads, use node pool taints and tolerations in your pod specs to keep pods scheduled in the same zone as your disks. Alternatively, use network-based storage such as Azure Files, which can attach to pods as they're scheduled between zones.
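Editor's note (not part of this change): the taint-and-toleration guidance in the added line above could look like the following minimal sketch. The taint key/value, the PVC name, and the zone value are illustrative assumptions; only the zone label key itself appears elsewhere in this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zonal-app
spec:
  # Assumption: the node pool in zone eastus2-1 was created with the
  # hypothetical taint app=zonal-db:NoSchedule; tolerate it so the pod
  # can be scheduled onto those nodes.
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "zonal-db"
    effect: "NoSchedule"
  # Pin the pod to the same zone as its Azure managed disk.
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: eastus2-1
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: zonal-app-pvc   # hypothetical PVC backed by an Azure disk in eastus2-1
```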
 
 ## Overview of availability zones for AKS clusters
 
-Availability zones is a high-availability offering that protects your applications and data from datacenter failures. Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, theres a minimum of three separate zones in all enabled regions. The physical separation of availability zones within a region protects applications and data from datacenter failures. Zone-redundant services replicate your applications and data across availability zones to protect from single-points-of-failure.
+Availability zones are a high-availability offering that protects your applications and data from datacenter failures. Zones are unique physical locations within an Azure region. Each zone is made up of one or more datacenters equipped with independent power, cooling, and networking. To ensure resiliency, there's a minimum of three separate zones in all zone-enabled regions. The physical separation of availability zones within a region protects applications and data from datacenter failures.
 
 For more information, see [What are availability zones in Azure?][az-overview].
 
 AKS clusters that are deployed using availability zones can distribute nodes across multiple zones within a single region. For example, a cluster in the *East US 2* region can create nodes in all three availability zones in *East US 2*. This distribution of AKS cluster resources improves cluster availability as they're resilient to failure of a specific zone.
 
 ![AKS node distribution across availability zones](media/availability-zones/aks-availability-zones.png)
 
-In a zone outage, the nodes can be rebalanced manually or using the cluster autoscaler. If a single zone becomes unavailable, your applications continue to run.
+If a single zone becomes unavailable, your applications continue to run as long as the cluster is spread across multiple zones.
 
 ## Create an AKS cluster across availability zones
 
-When you create a cluster using the [az aks create][az-aks-create] command, the `--zones` parameter defines which zones agent nodes are deployed into. The AKS control plane components for your cluster are also spread across zones in the highest available configuration when you define the `--zones` parameter at cluster creation time.
+When you create a cluster using the [az aks create][az-aks-create] command, the `--zones` parameter defines which zones agent nodes are deployed into. Control plane components such as etcd are spread across three zones if you define the `--zones` parameter at cluster creation time. The specific zones that the control plane components are spread across are independent of the explicit zones selected for the initial node pool.
 
-If you don't define any zones for the default agent pool when you create an AKS cluster, the AKS control plane components for your cluster will not use availability zones. You can add additional node pools using the [az aks nodepool add][az-aks-nodepool-add] command and specify `--zones` for those new nodes, however the control plane components remain without availability zone awareness. You can't change the zone awareness for a node pool or the AKS control plane components once they're deployed.
+If you don't define any zones for the default agent pool when you create an AKS cluster, the control plane components aren't guaranteed to spread across availability zones. You can add additional node pools using the [az aks nodepool add][az-aks-nodepool-add] command and specify `--zones` for the new nodes, but doing so doesn't change how the control plane has been spread across zones. Availability zone settings can only be defined at cluster or node pool create time.
 
-The following example creates an AKS cluster named *myAKSCluster* in the resource group named *myResourceGroup*. A total of *3* nodes are created - one agent in zone *1*, one in *2*, and then one in *3*. The AKS control plane components are also distributed across zones in the highest available configuration since they're defined as part of the cluster create process.
+The following example creates an AKS cluster named *myAKSCluster* in the resource group named *myResourceGroup*. A total of *3* nodes are created - one agent in zone *1*, one in *2*, and then one in *3*.
 
 ```azurecli-interactive
 az group create --name myResourceGroup --location eastus2
@@ -87,6 +83,8 @@ az aks create \
 
 It takes a few minutes to create the AKS cluster.
 
+When deciding what zone a new node should belong to, a given AKS node pool uses the [best-effort zone balancing offered by the underlying Azure Virtual Machine Scale Sets][vmss-zone-balancing]. A given AKS node pool is considered "balanced" if each zone has the same number of VMs, or +\- 1 VM compared to all other zones, in the scale set.
+
 ## Verify node distribution across zones
 
 When the cluster is ready, list the agent nodes in the scale set to see what availability zone they're deployed in.
@@ -144,13 +142,13 @@ Name: aks-nodepool1-28993262-vmss000004
 failure-domain.beta.kubernetes.io/zone=eastus2-2
 ```
 
-As you can see, we now have two additional nodes in zones 1 and 2. You can deploy an application consisting of three replicas. We will use NGINX as example:
+We now have two additional nodes in zones 1 and 2. You can deploy an application consisting of three replicas. We will use NGINX as an example:
 
 ```console
 kubectl run nginx --image=nginx --replicas=3
 ```
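Editor's note (not part of this change): a declarative equivalent of the `kubectl run` command above is a Deployment manifest. The optional anti-affinity block, which nudges replicas toward different zones, is an illustrative assumption rather than something this article prescribes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # Prefer (but don't require) placing each replica in a different zone.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: nginx
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: nginx
        image: nginx
```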
 
-If you verify that nodes where your pods are running, you will see that the pods are running on the pods corresponding to three different availability zones. For example with the command `kubectl describe pod | grep -e "^Name:" -e "^Node:"` you would get an output similar to this:
+By viewing the nodes where your pods are running, you see that the pods are running on nodes corresponding to three different availability zones. For example, with the command `kubectl describe pod | grep -e "^Name:" -e "^Node:"` you would get output similar to this:
 
 ```console
 Name: nginx-6db489d4b7-ktdwg
@@ -182,6 +180,7 @@ This article detailed how to create an AKS cluster that uses availability zones.
 [az-extension-update]: /cli/azure/extension#az-extension-update
 [az-aks-nodepool-add]: /cli/azure/ext/aks-preview/aks/nodepool#ext-aks-preview-az-aks-nodepool-add
 [az-aks-get-credentials]: /cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials
+[vmss-zone-balancing]: ../virtual-machine-scale-sets/virtual-machine-scale-sets-use-availability-zones.md#zone-balancing
 
 <!-- LINKS - external -->
 [kubectl-describe]: https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#describe

articles/aks/troubleshooting.md

Lines changed: 5 additions & 33 deletions
@@ -130,10 +130,10 @@ Based on the output of the cluster status:
 
 ## I'm receiving errors that my service principal was not found when I try to create a new cluster without passing in an existing one.
 
-When creating an AKS cluster it requires a service principal to create resources on your behalf. AKS offers the ability to have a new one created at cluster creation time, but this requires Azure Active Directory to fully propagate the new service principal in a reasonable time in order to have the cluster succeed in creation. When this propagation takes too long, the cluster will fail validation to create as it cannot find an available service principal to do so.
+Creating an AKS cluster requires a service principal to create resources on your behalf. AKS can create a new service principal at cluster creation time, but this requires Azure Active Directory to propagate it in a reasonable time for cluster creation to succeed. If regional propagation exceeds the timeout thresholds, cluster creation fails validation because it cannot find an available service principal.
 
 Use the following workarounds for this:
-1. Use an existing service principal which has already propagated across regions and exists to pass into AKS at cluster create time.
+1. Use an existing service principal and pass it to AKS at cluster create time.
 2. If using automation scripts, add time delays between service principal creation and AKS cluster creation.
 3. If using Azure portal, return to the cluster settings during create and retry the validation page after a few minutes.
 
@@ -154,41 +154,14 @@ Verify that your settings are not conflicting with any of the required or option
 | 1.14 | 1.14.2 or later |
 
 
-### What versions of Kubernetes have Azure Disk support on the Sovereign Cloud?
+### What versions of Kubernetes have Azure Disk support on the Sovereign Clouds?
 
 | Kubernetes version | Recommended version |
 | -- | :--: |
 | 1.12 | 1.12.0 or later |
 | 1.13 | 1.13.0 or later |
 | 1.14 | 1.14.0 or later |
 
-
-### WaitForAttach failed for Azure Disk: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax
-
-In Kubernetes version 1.10, MountVolume.WaitForAttach may fail with an the Azure Disk remount.
-
-On Linux, you may see an incorrect DevicePath format error. For example:
-
-```console
-MountVolume.WaitForAttach failed for volume "pvc-f1562ecb-3e5f-11e8-ab6b-000d3af9f967" : azureDisk - Wait for attach expect device path as a lun number, instead got: /dev/disk/azure/scsi1/lun1 (strconv.Atoi: parsing "/dev/disk/azure/scsi1/lun1": invalid syntax)
-Warning FailedMount 1m (x10 over 21m) kubelet, k8s-agentpool-66825246-0 Unable to mount volumes for pod
-```
-
-On Windows, you may see a wrong DevicePath(LUN) number error. For example:
-
-```console
-Warning FailedMount 1m kubelet, 15282k8s9010 MountVolume.WaitForAttach failed for volume "disk01" : azureDisk - WaitForAttach failed within timeout node (15282k8s9010) diskId:(andy-mghyb
-1102-dynamic-pvc-6c526c51-4a18-11e8-ab5c-000d3af7b38e) lun:(4)
-```
-
-This issue has been fixed in the following versions of Kubernetes:
-
-| Kubernetes version | Fixed version |
-| -- | :--: |
-| 1.10 | 1.10.2 or later |
-| 1.11 | 1.11.0 or later |
-| 1.12 and later | N/A |
-
 ### Failure when setting uid and gid in mountOptions for Azure Disk
 
 Azure Disk uses the ext4,xfs filesystem by default and mountOptions such as uid=x,gid=x can't be set at mount time. For example if you tried to set mountOptions uid=999,gid=999, would see an error like:
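Editor's note (not part of this change; the error output introduced by the sentence above lies outside this hunk): a commonly used alternative to uid/gid mount options is to set ownership through the pod security context. The names and values below are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: disk-ownership-example
spec:
  # Illustrative values: run the container as UID 999 and let Kubernetes
  # apply GID 999 to the mounted volume instead of uid/gid mount options.
  securityContext:
    runAsUser: 999
    fsGroup: 999
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - mountPath: /data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: azure-disk-pvc   # hypothetical PVC backed by an Azure disk
```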
@@ -322,7 +295,6 @@ This issue has been fixed in the following versions of Kubernetes:
 
 If you are using a version of Kubernetes that does not have the fix for this issue and your node VM has an obsolete disk list, you can mitigate the issue by detaching all non-existing disks from the VM as a single, bulk operation. **Individually detaching non-existing disks may fail.**
 
-
 ### Large number of Azure Disks causes slow attach/detach
 
 When the number of Azure Disks attached to a node VM is larger than 10, attach and detach operations may be slow. This issue is a known issue and there are no workarounds at this time.
@@ -379,7 +351,7 @@ Recommended settings:
 | 1.12.0 - 1.12.1 | 0755 |
 | 1.12.2 and later | 0777 |
 
-If using a cluster with Kubernetes version 1.8.5 or greater and dynamically creating the persistent volume with a storage class, mount options can be specified on the storage class object. The following example sets *0777*:
+Mount options can be specified on the storage class object. The following example sets *0777*:
 
 ```yaml
 kind: StorageClass
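Editor's note (not part of this change): the storage class example is truncated by this hunk. Purely for orientation, and assuming the Azure Files storage class discussed in this part of the file, such an object might look like the sketch below; the name and option list are assumptions, not the file's actual example.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-0777          # hypothetical name
provisioner: kubernetes.io/azure-file
mountOptions:
  # Grant 0777 on directories and files in the mounted share.
  - dir_mode=0777
  - file_mode=0777
```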
@@ -475,7 +447,7 @@ To update your Azure secret file, use `kubectl edit secret`. For example:
 kubectl edit secret azure-storage-account-{storage-account-name}-secret
 ```
 
-After a few minutes, the agent node will retry the azure file mount with the updated storage key.
+After a few minutes, the agent node will retry the file mount with the updated storage key.
 
 ### Cluster autoscaler fails to scale with error failed to fix node group sizes
 