---
title: Create a managed or user-assigned NAT gateway for your Azure Kubernetes Service (AKS) cluster
titleSuffix: Azure Kubernetes Service
description: Learn how to create an AKS cluster with managed NAT integration and user-assigned NAT gateway.
author: asudbring
ms.subservice: aks-networking
ms.custom: devx-track-azurecli
ms.topic: how-to
ms.date: 05/30/2023
ms.author: allensu
---

# Create a managed or user-assigned NAT gateway for your Azure Kubernetes Service (AKS) cluster

While you can route egress traffic through an Azure Load Balancer, there are limitations on the number of outbound flows of traffic you can have. Azure NAT Gateway allows up to 64,512 outbound UDP and TCP traffic flows per IP address with a maximum of 16 IP addresses.

This article shows you how to create an Azure Kubernetes Service (AKS) cluster with a managed NAT gateway and a user-assigned NAT gateway for egress traffic. It also shows you how to disable OutboundNAT on Windows.

## Before you begin

* Make sure you're using the latest version of [Azure CLI][az-cli] (see the version check sketch after this list).
* Make sure you're using Kubernetes version 1.20.x or above.
* Managed NAT gateway is incompatible with custom virtual networks.
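
A minimal way to confirm both, assuming you're already signed in with `az login` (the location is only an example; use your own region):

```azurecli-interactive
# Show the installed Azure CLI version and its components.
az version

# List the Kubernetes versions available in a region so you can confirm 1.20.x or above.
az aks get-versions --location southcentralus --output table
```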

## Create an AKS cluster with a managed NAT gateway

* Create an AKS cluster with a new managed NAT gateway using the [`az aks create`][az-aks-create] command with the `--outbound-type managedNATGateway`, `--nat-gateway-managed-outbound-ip-count`, and `--nat-gateway-idle-timeout` parameters. If you want the NAT gateway to operate out of availability zones, specify the zones using `--zones`.

```azurecli-interactive
az aks create \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --node-count 3 \
    --outbound-type managedNATGateway \
    --nat-gateway-managed-outbound-ip-count 2 \
    --nat-gateway-idle-timeout 4
```

> [!IMPORTANT]
> If no value for the outbound IP address is specified, the default value is one.
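
If you want the NAT gateway to operate out of availability zones, as mentioned above, append `--zones` to the same command. A hedged variant of the previous command, assuming zone 1 is available in your region (the zone number is only illustrative):

```azurecli-interactive
az aks create \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --node-count 3 \
    --outbound-type managedNATGateway \
    --nat-gateway-managed-outbound-ip-count 2 \
    --nat-gateway-idle-timeout 4 \
    --zones 1
```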

### Update the number of outbound IP addresses

* Update the outbound IP address or idle timeout using the [`az aks update`][az-aks-update] command with the `--nat-gateway-managed-outbound-ip-count` or `--nat-gateway-idle-timeout` parameter.

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --nat-gateway-managed-outbound-ip-count 5
```
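
Updating the idle timeout follows the same pattern. A minimal sketch, assuming a 30-minute timeout suits your traffic patterns (the value is only an example):

```azurecli-interactive
az aks update \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --nat-gateway-idle-timeout 30
```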

## Create an AKS cluster with a user-assigned NAT gateway

This configuration requires bring-your-own networking (via [Kubenet][byo-vnet-kubenet] or [Azure CNI][byo-vnet-azure-cni]) and that the NAT gateway is preconfigured on the subnet. The following commands create the required resources for this scenario. Make sure to run them all in the same session so that the values stored in variables are still available for the `az aks create` command.

1. Create a resource group using the [`az group create`][az-group-create] command.

```azurecli-interactive
az group create --name myResourceGroup \
    --location southcentralus
```
@@ -72,13 +66,13 @@ To create an AKS cluster with a user-assigned NAT gateway, use `--outbound-type

```azurecli-interactive
IDENTITY_ID=$(az identity create \
    --resource-group myResourceGroup \
    --name myNatClusterId \
    --location southcentralus \
    --query id \
    --output tsv)
```

3. Create a public IP for the NAT gateway using the [`az network public-ip create`][az-network-public-ip-create] command.

```azurecli-interactive
az network public-ip create \
@@ -88,7 +82,7 @@ To create an AKS cluster with a user-assigned NAT gateway, use `--outbound-type
    --sku standard
```

4. Create the NAT gateway using the [`az network nat gateway create`][az-network-nat-gateway-create] command.

```azurecli-interactive
az network nat gateway create \
@@ -98,7 +92,7 @@ To create an AKS cluster with a user-assigned NAT gateway, use `--outbound-type
    --public-ip-addresses myNatGatewayPip
```

5. Create a virtual network using the [`az network vnet create`][az-network-vnet-create] command.

```azurecli-interactive
az network vnet create \
```
@@ -114,19 +108,19 @@ To create an AKS cluster with a user-assigned NAT gateway, use `--outbound-type

```azurecli-interactive
SUBNET_ID=$(az network vnet subnet create \
    --resource-group myResourceGroup \
    --vnet-name myVnet \
    --name myNatCluster \
    --address-prefixes 172.16.0.0/22 \
    --nat-gateway myNatGateway \
    --query id \
    --output tsv)
```

7. Create an AKS cluster with the [`az aks create`][az-aks-create] command, using the subnet with the NAT gateway and the managed identity.

```azurecli-interactive
az aks create \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --location southcentralus \
    --network-plugin azure \
    --vnet-subnet-id $SUBNET_ID \
```
@@ -146,13 +140,13 @@ Windows OutboundNAT can cause certain connection and communication issues with y

Windows enables OutboundNAT by default. You can now manually disable OutboundNAT when creating new Windows agent pools.

> [!NOTE]
> OutboundNAT can only be disabled on Windows Server 2019 node pools.

### Prerequisites

* You need to use `aks-preview` and register the feature flag.

1. Install or update `aks-preview` using the [`az extension add`][az-extension-add] or [`az extension update`][az-extension-update] command.

```azurecli
# Install aks-preview
@@ -164,44 +158,44 @@ Windows enables OutboundNAT by default. You can now manually disable OutboundNAT
az extension update --name aks-preview
```

2. Register the feature flag using the [`az feature register`][az-feature-register] command.

```azurecli
az feature register --namespace Microsoft.ContainerService --name DisableWindowsOutboundNATPreview
```

3. Check the registration status using the [`az feature list`][az-feature-list] command.

```azurecli
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/DisableWindowsOutboundNATPreview')].{Name:name,State:properties.state}"
```

4. Refresh the registration of the `Microsoft.ContainerService` resource provider using the [`az provider register`][az-provider-register] command.

```azurecli
az provider register --namespace Microsoft.ContainerService
```

* Your clusters must have a managed NAT gateway (which may increase the overall cost).
* If you're using Kubernetes version 1.25 or older, you need to [update your deployment configuration][upgrade-kubernetes].
* If you need to switch from a load balancer to a NAT gateway, you can either add a NAT gateway into the VNet or run [`az aks upgrade`][aks-upgrade] to update the outbound type.

### Manually disable OutboundNAT for Windows

* Manually disable OutboundNAT for Windows when creating new Windows agent pools using the [`az aks nodepool add`][az-aks-nodepool-add] command with the `--disable-windows-outbound-nat` flag.

> [!NOTE]
> You can use an existing AKS cluster, but you may need to update the outbound type and add a node pool to enable `--disable-windows-outbound-nat`.

```azurecli
az aks nodepool add \
    --resource-group myResourceGroup \
    --cluster-name myNatCluster \
    --name mynodepool \
    --node-count 3 \
    --os-type Windows \
    --disable-windows-outbound-nat
```
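
If the existing cluster currently uses a load balancer for egress, you may need to move it to a managed NAT gateway first, as the note above mentions. A hedged sketch, assuming your Azure CLI and `aks-preview` versions support changing `--outbound-type` on an existing cluster (check `az aks update --help`):

```azurecli
az aks update \
    --resource-group myResourceGroup \
    --name myNatCluster \
    --outbound-type managedNATGateway
```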

## Next steps
@@ -212,7 +206,7 @@ For more information on Azure NAT Gateway, see [Azure NAT Gateway][nat-docs].

**articles/aks/node-auto-repair.md** (+29 −33)

---
title: Automatically repair Azure Kubernetes Service (AKS) nodes
description: Learn about node auto-repair functionality and how AKS fixes broken worker nodes.
ms.topic: conceptual
ms.date: 05/30/2023
---

# Azure Kubernetes Service (AKS) node auto-repair

Azure Kubernetes Service (AKS) continuously monitors the health state of worker nodes and performs automatic node repair if they become unhealthy. The Azure virtual machine (VM) platform [performs maintenance on VMs][vm-updates] experiencing issues. AKS and Azure VMs work together to minimize service disruptions for clusters.

In this article, you learn how the automatic node repair functionality behaves for Windows and Linux nodes.

## How AKS checks for unhealthy nodes

AKS uses the following rules to determine if a node is unhealthy and needs repair:

* The node reports the **NotReady** status on consecutive checks within a 10-minute time frame.
* The node doesn't report any status within 10 minutes.

You can manually check the health state of your nodes with the `kubectl get nodes` command.
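
For example, a minimal check (`-o wide` also shows node IP addresses and OS image details):

```
kubectl get nodes -o wide
```

Nodes showing **NotReady** in the `STATUS` column are the ones the rules above apply to.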

## How automatic repair works

> [!NOTE]
> AKS initiates repair operations with the user account **aks-remediator**.

If AKS identifies an unhealthy node that remains unhealthy for *five* minutes, AKS performs the following actions:

1. Attempts to restart the node.
2. If the node restart is unsuccessful, AKS reimages the node.
3. If the reimage is unsuccessful and it's a Linux node, AKS redeploys the node.

AKS engineers investigate alternative remediations if auto-repair is unsuccessful.

If you want the remediator to reimage the node, you can add the node condition `"customerMarkedAsUnhealthy": true`.

## Node auto-drain

[Scheduled events][scheduled-events] can occur on the underlying VMs in any of your node pools. For [spot node pools][spot-node-pools], scheduled events may cause a *preempt* node event for the node. Certain node events, such as *preempt*, cause AKS node auto-drain to attempt a cordon and drain of the affected node. This process enables rescheduling for any affected workloads on that node. You might notice the node receives a taint with `"remediator.aks.microsoft.com/unschedulable"`, because of `"kubernetes.azure.com/scalesetpriority: spot"`.

The following table shows the node events and actions they cause for AKS node auto-drain:

| Event | Description | Action |
| --- | --- | --- |
| Freeze | The VM is scheduled to pause for a few seconds. CPU and network connectivity may be suspended, but there's no impact on memory or open files. | No action. |
| Reboot | The VM is scheduled for reboot. The VM's non-persistent memory is lost. | No action. |
| Redeploy | The VM is scheduled to move to another node. The VM's ephemeral disks are lost. | Cordon and drain. |
| Preempt | The spot VM is being deleted. The VM's ephemeral disks are lost. | Cordon and drain. |
| Terminate | The VM is scheduled for deletion. | Cordon and drain. |
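
To check whether auto-drain has cordoned or tainted a particular node, you can inspect it with `kubectl`. A minimal sketch, where `<node-name>` is a placeholder for one of your node names:

```
# Review the Taints and Unschedulable fields in the output.
kubectl describe node <node-name>
```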

## Limitations

In many cases, AKS can determine if a node is unhealthy and attempt to repair the issue. However, there are cases where AKS either can't repair the issue or can't detect that an issue exists. For example, AKS can't detect issues in the following scenarios:

* A node status isn't being reported due to an error in network configuration.
* A node failed to initially register as a healthy node.

## Next steps

Use [availability zones][availability-zones] to increase high availability with your AKS cluster workloads.