articles/operator-nexus/concepts-nexus-kubernetes-placement.md
ms.date: 04/19/2024
ms.custom: template-concept
---

# Resource placement in Azure Operator Nexus Kubernetes

Operator Nexus instances are deployed at the customer premises. Each instance
comprises one or more racks of bare metal servers.

When a user creates a Nexus Kubernetes Cluster (NKS), they specify a count and
a [stock keeping unit](./reference-nexus-kubernetes-cluster-sku.md) (SKU) for
virtual machines (VM) that make up the Kubernetes Control Plane and one or more
Agent Pools. Agent Pools are the set of Worker Nodes on which a customer's
containerized network functions run.
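
A rough mental model of those pieces can be sketched as data types. The names below are illustrative only (they aren't the Nexus API or ARM schema); they just capture the concepts this article uses: a VM SKU, an Agent Pool with a count and optional `AvailabilityZones`, and a cluster with a Control Plane and one or more Agent Pools.

```python
from dataclasses import dataclass, field


@dataclass
class VmSku:
    """Shape of a Nexus Kubernetes VM SKU, e.g. NC_G48_224_v1 (48 vCPU, 224Gi RAM)."""
    name: str
    vcpus: int
    memory_gib: int


@dataclass
class AgentPool:
    """A pool of identically sized worker-node VMs."""
    sku: VmSku
    count: int
    availability_zones: list[str] = field(default_factory=list)  # empty = not pinned to racks


@dataclass
class NksCluster:
    """Control Plane VMs plus one or more Agent Pools."""
    control_plane_sku: VmSku
    control_plane_count: int
    agent_pools: list[AgentPool]
```
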
The Nexus platform is responsible for deciding the bare metal server on which
each NKS VM launches.

## How the Nexus platform schedules a Nexus Kubernetes Cluster VM

Nexus first identifies the set of potential bare metal servers that meet all of
the resource requirements of the NKS VM SKU. For example, if the user
specified an `NC_G48_224_v1` VM SKU for their agent pool, Nexus collects the
bare metal servers that have available capacity for 48 vCPU, 224Gi of RAM, etc.

Nexus also considers the `AvailabilityZones` field of the Agent Pool or Control
Plane being scheduled. If this field isn't empty, Nexus filters the list of
potential bare metal servers to only those servers in the specified
availability zones (racks). This behavior is a *hard scheduling constraint*. If
there are no bare metal servers in the filtered list, Nexus *doesn't schedule*
the NKS VM and the cluster fails to provision.
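
A minimal sketch of this candidate-selection step, reusing the illustrative `VmSku` shape from the earlier sketch plus a hypothetical `BareMetalServer` record; it only mirrors the constraints described above and is not the platform's actual scheduler code.

```python
from dataclasses import dataclass


@dataclass
class BareMetalServer:
    name: str
    rack: str                 # availability zone (rack) the server lives in
    free_vcpus: int
    free_memory_gib: int


def candidate_servers(servers, sku, availability_zones):
    """Apply the hard constraints: enough free capacity, and the right rack if zones are set."""
    candidates = [
        s for s in servers
        if s.free_vcpus >= sku.vcpus and s.free_memory_gib >= sku.memory_gib
    ]
    if availability_zones:    # hard scheduling constraint
        candidates = [s for s in candidates if s.rack in availability_zones]
    if not candidates:
        # Mirrors the behavior above: no eligible server means the VM is not scheduled.
        raise RuntimeError("no eligible bare metal server; the NKS VM cannot be scheduled")
    return candidates
```
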

Once Nexus identifies a list of potential bare metal servers on which to place
the NKS VM, Nexus then picks one of the bare metal servers after applying the
following sorting rules (see the sketch after this list):

1. Prefer bare metal servers in availability zones (racks) that don't have NKS
   VMs from this NKS Cluster. In other words, *spread the NKS VMs for an NKS
   Cluster across availability zones*.

1. Prefer bare metal servers within a single availability zone (rack) that
   don't have other NKS VMs from the same NKS Cluster. In other words,
   *spread the NKS VMs for an NKS Cluster across bare metal servers within an
   availability zone*.

1. If the NKS VM SKU is either `NC_G48_224_v1` or `NC_P46_224_v1`, prefer
   bare metal servers that already house `NC_G48_224_v1` or `NC_P46_224_v1`
   NKS VMs from other NKS Clusters. In other words, *group the extra-large
   VMs from different NKS Clusters on the same bare metal servers*. This rule
   "bin packs" the extra-large VMs in order to reduce fragmentation of the
   available compute resources.
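
Read together, the three rules behave like a composite sort key: earlier rules dominate later ones, and the best-ranked eligible server wins. The sketch below is again illustrative only; the lookup tables are hypothetical bookkeeping, not a real Nexus API.

```python
EXTRA_LARGE_SKUS = {"NC_G48_224_v1", "NC_P46_224_v1"}


def placement_sort_key(server, sku_name, cluster_vms_by_rack, cluster_vms_by_server,
                       other_clusters_xl_by_server):
    """Lower tuples sort first; pick min(candidates, key=...) to choose a server."""
    # Rule 1: prefer racks that hold no VMs from this NKS Cluster yet.
    rack_already_used = bool(cluster_vms_by_rack.get(server.rack))
    # Rule 2: prefer servers that hold no VMs from this NKS Cluster yet.
    server_already_used = bool(cluster_vms_by_server.get(server.name))
    # Rule 3: for extra-large SKUs only, prefer servers that already host
    # extra-large VMs from *other* NKS Clusters (bin packing).
    bin_pack_bonus = (
        sku_name in EXTRA_LARGE_SKUS
        and bool(other_clusters_xl_by_server.get(server.name))
    )
    return (rack_already_used, server_already_used, not bin_pack_bonus)


# Usage sketch: chosen = min(candidates, key=lambda s: placement_sort_key(s, ...))
```
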

## Example placement scenarios

The following sections highlight behavior that Nexus users should expect
when creating NKS Clusters against an Operator Nexus environment.

> **Hint**: You can see which bare metal server your NKS VMs were scheduled to
> by examining the `nodes.bareMetalMachineId` property of the NKS
> KubernetesCluster resource or viewing the "Host" column in Azure Portal's
> display of Kubernetes Cluster Nodes.

:::image type="content" source="media/nexus-kubernetes/show-baremetal-host.png" lightbox="media/nexus-kubernetes/show-baremetal-host.png" alt-text="A screenshot showing bare metal server for NKS VMs.":::
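
If you prefer a script to the portal view above, something along these lines can print the VM-to-host mapping. It assumes the Azure CLI `networkcloud` extension is installed and signed in; the resource names are placeholders, and the exact JSON path to the node list may differ by CLI version, so treat this as a sketch rather than a supported tool.

```python
import json
import subprocess

# Hypothetical resource names -- substitute your own.
RESOURCE_GROUP = "my-nexus-rg"
CLUSTER_NAME = "my-nks-cluster"

# The property path follows the hint above (nodes -> bareMetalMachineId); adjust it
# if your CLI version nests the node list differently (for example under "properties").
raw = subprocess.run(
    ["az", "networkcloud", "kubernetescluster", "show",
     "--resource-group", RESOURCE_GROUP, "--name", CLUSTER_NAME, "--output", "json"],
    check=True, capture_output=True, text=True,
).stdout
cluster = json.loads(raw)

nodes = cluster.get("nodes") or cluster.get("properties", {}).get("nodes") or []
for node in nodes:
    print(node.get("name"), "->", node.get("bareMetalMachineId"))
```
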

The example Operator Nexus environment has these specifications:

Given an empty Operator Nexus environment with the given capacity, we create
three differently sized Nexus Kubernetes Clusters.

The NKS Clusters have these specifications, and we assume for the purposes of
this exercise that the user creates the three Clusters in the following order:

Cluster A

Cluster C Agent Pool #1 has 12 VMs restricted to AvailabilityZones [1, 4] so it
has 12 VMs on 12 bare metal servers, six in each of racks 1 and 4.

Extra-large VMs (the `NC_P46_224_v1` SKU) from different clusters are placed
on the same bare metal servers (see rule #3 in [How the Nexus platform schedules a Nexus Kubernetes Cluster VM](#how-the-nexus-platform-schedules-a-nexus-kubernetes-cluster-vm)).

Here's a visualization of a layout the user might see after deploying Clusters
A, B, and C into an empty environment.

:::image type="content" source="media/nexus-kubernetes/after-first-deployment.png" lightbox="media/nexus-kubernetes/after-first-deployment.png" alt-text="Diagram showing possible layout of VMs after first deployment.":::

### Half-full environment

We now run through an example of launching another NKS Cluster when the target
environment is half-full. The target environment is half-full after Clusters A,
B, and C are deployed into the target environment.

If a Cluster D control plane VM lands on rack 7 or 8, it's likely that one
Cluster D Agent Pool #1 VM lands on the same bare metal server as that Cluster
D control plane VM. This behavior is due to Agent Pool #1 being "pinned" to
racks 7 and 8. Capacity constraints in those racks cause the scheduler to
collocate a control plane VM and an Agent Pool #1 VM from the same NKS
Cluster.

Cluster D's Agent Pool #2 has three VMs on different bare metal servers on each

and Agent Pool #2 are collocated on the same bare metal servers in racks 7 and

Here's a visualization of a layout the user might see after deploying Cluster
D into the target environment.

:::image type="content" source="media/nexus-kubernetes/after-second-deployment.png" lightbox="media/nexus-kubernetes/after-second-deployment.png" alt-text="Diagram showing possible layout of VMs after second deployment.":::

### Nearly full environment

In our example target environment, four of the eight racks are
close to capacity. Let's try to launch another NKS Cluster.

Cluster E has the following specifications:

| E | Agent Pool #1|`NC_P46_224_v1`| 32 | 8 | 8 |**4**|**3, 4 or 5**|

Cluster E's Agent Pool #1 will spread unevenly over all eight racks. Racks 7
and 8 will have three NKS VMs from Agent Pool #1 instead of the expected four
NKS VMs because there's no more capacity for the extra-large SKU VMs in those
racks after scheduling Clusters A through D. Because racks 7 and 8 don't have
capacity for the fourth extra-large SKU in Agent Pool #1, five NKS VMs will
land on the two least-utilized racks. In our example, those least-utilized
racks were racks 3 and 6.
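
The arithmetic behind that uneven spread can be reproduced with a small round-robin calculation that respects a per-rack cap. The cap of three extra-large VMs in racks 7 and 8 comes from the scenario above; every other number here is illustrative, and the real scheduler places the two overflow VMs on the least-utilized racks (3 and 6 in this example) rather than simply on the first racks in numeric order as this sketch does.

```python
def spread(total_vms, racks, cap):
    """Round-robin VMs across racks, skipping racks that have hit their capacity cap."""
    placed = {rack: 0 for rack in racks}
    while total_vms > 0:
        progressed = False
        for rack in racks:
            if total_vms == 0:
                break
            if placed[rack] < cap[rack]:
                placed[rack] += 1
                total_vms -= 1
                progressed = True
        if not progressed:
            raise RuntimeError("not enough capacity for the remaining VMs")
    return placed


racks = list(range(1, 9))
# Racks 7 and 8 can take only three more extra-large VMs each; assume the other
# racks still have room (the cap of 10 is an arbitrary illustrative number).
cap = {r: 3 if r in (7, 8) else 10 for r in racks}
print(spread(32, racks, cap))
# Prints {1: 5, 2: 5, 3: 4, 4: 4, 5: 4, 6: 4, 7: 3, 8: 3}: two racks end up with
# five VMs, four racks with four, and racks 7 and 8 with three.
```
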

Here's a visualization of a layout the user might see after deploying Cluster
E into the target environment.

:::image type="content" source="media/nexus-kubernetes/after-third-deployment.png" lightbox="media/nexus-kubernetes/after-third-deployment.png" alt-text="Diagram showing possible layout of VMs after third deployment.":::

## Placement during a runtime upgrade

As of April 2024 (Network Cloud 2304.1 release), runtime upgrades are performed
using a rack-by-rack strategy. Bare metal servers in rack 1 are reimaged all at
once. The upgrade process pauses until all the bare metal servers successfully
restart and tell Nexus that they're ready to receive workloads.

> [!NOTE]
> It is possible to instruct Operator Nexus to only reimage a portion of
> the bare metal servers in a rack at once; however, the default is to reimage
> all bare metal servers in a rack in parallel.

When an individual bare metal server is reimaged, all workloads running on that
bare metal server, including all NKS VMs, lose power and connectivity. Workload
containers running on NKS VMs will, in turn, lose power and connectivity.
After one minute of not being able to reach those workload containers, the NKS
Cluster's Kubernetes Control Plane will mark the corresponding Pods as
unhealthy. If the Pods are members of a Deployment or StatefulSet, the NKS
Cluster's Kubernetes Control Plane attempts to launch replacement Pods to
bring the observed replica count of the Deployment or StatefulSet back to the
desired replica count.

New Pods only launch if there's available capacity for the Pod in the remaining
healthy NKS VMs. As of April 2024 (Network Cloud 2304.1 release), new NKS VMs
aren't created to replace NKS VMs that were on the bare metal server being
reimaged.

Once the bare metal server is successfully reimaged and able to accept new NKS
VMs, the NKS VMs that were originally on the same bare metal server relaunch
on the newly reimaged bare metal server. Workload containers may then be
scheduled to those NKS VMs, potentially restoring the Deployments or
StatefulSets that had Pods on NKS VMs that were on the bare metal server.

> [!NOTE]
> This behavior may seem to the user as if the NKS VMs did not
> "move" from the bare metal server, when in fact a new instance of an identical
> NKS VM was launched on the newly reimaged bare metal server that retained the
> same bare metal server name as before reimaging.

## Best practices

When working with Operator Nexus, keep the following best practices in mind.

* Avoid specifying `AvailabilityZones` for an Agent Pool.
* Launch larger NKS Clusters before smaller ones.
* Reduce the Agent Pool's Count before reducing the VM SKU size.

### Avoid specifying AvailabilityZones for an Agent Pool

As you can tell from the above placement scenarios, specifying
`AvailabilityZones` for an Agent Pool is the primary reason that NKS VMs from
the same NKS Cluster would end up on the same bare metal server. By specifying
`AvailabilityZones`, you "pin" the Agent Pool to a subset of racks and
therefore limit the number of potential bare metal servers in that set of racks
for other NKS Clusters and other Agent Pool VMs in the same NKS Cluster to
land on.

Therefore, our first best practice is to avoid specifying `AvailabilityZones`

two or three VMs in an agent pool. You might consider setting
`AvailabilityZones` for that agent pool to `[1,3,5,7]` or `[0,2,4,6]` to
increase availability during runtime upgrades.
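
One way to see why alternating racks help: assuming racks are upgraded one at a time, as described in the runtime-upgrade section, a pool pinned to `[1,3,5,7]` never has more than one of its racks down at any point. The loop below is a plain illustration of that counting argument, not a Nexus API, and it uses the same rack numbering as the examples earlier in this article.

```python
def racks_still_available(pinned_zones, rack_being_upgraded):
    """Zones from the pinned set that are unaffected while one rack is reimaged."""
    return [zone for zone in pinned_zones if zone != rack_being_upgraded]


pinned = [1, 3, 5, 7]
for rack in range(1, 9):                      # rack-by-rack upgrade order
    up = racks_still_available(pinned, rack)
    print(f"upgrading rack {rack}: {len(up)} of {len(pinned)} pinned racks still available")
# At most one of the four pinned racks is unavailable during any single rack's reimage.
```
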

### Launch larger NKS Clusters before smaller ones

As of April 2024, and the Network Cloud 2403.1 release, NKS Clusters are
scheduled in the order in which they're created. To most efficiently pack your
target environment, we recommend you create larger NKS Clusters before
smaller ones. Likewise, we recommend you schedule larger Agent Pools before
smaller ones.

This recommendation is important for Agent Pools using the extra-large
`NC_G48_224_v1` or `NC_P46_224_v1` SKU. Scheduling the Agent Pools with the
greatest count of these extra-large SKU VMs creates a larger set of bare metal
servers upon which other extra-large SKU VMs from Agent Pools in other NKS
Clusters can collocate.

### Reduce the Agent Pool's count before reducing the VM SKU size

If you run into capacity constraints when launching an NKS Cluster or Agent
Pool, reduce the Count of the Agent Pool before adjusting the VM SKU size. For
example, if you attempt to create an NKS Cluster with an Agent Pool with VM SKU
size of `NC_P46_224_v1` and a Count of 24 and get back a failure to provision
the NKS Cluster due to insufficient resources, you may be tempted to use a VM
SKU Size of `NC_P36_168_v1` and continue with a Count of 24. However, due to
requirements for workload VMs to be aligned to a single NUMA cell on a bare
metal server, it's likely that the same request results in similar