articles/operator-nexus/concepts-nexus-availability.md
Lines changed: 8 additions & 8 deletions
@@ -37,7 +37,7 @@ When it comes to availability, there are two areas to consider:

 ## Deploy and Configure Nexus for High Availability

-[Reliability in Azure Operator Nexus \| Microsoft Learn](https://learn.microsoft.com/en-us/azure/reliability/reliability-operator-nexus) provides details of how to deploy the Nexus services that run in Azure so as to maximize availability.
+[Reliability in Azure Operator Nexus \| Microsoft Learn](https://learn.microsoft.com/azure/reliability/reliability-operator-nexus) provides details of how to deploy the Nexus services that run in Azure so as to maximize availability.

 ### Capacity and Redundancy Planning

@@ -52,11 +52,11 @@ Go through the following steps to help plan a Nexus deployment.
 3. If your workloads support a split between control-plane and data-plane elements, consider whether to separately design control-plane sites that can control a larger number of more widely distributed data-plane sites. This option is only likely to be attractive for larger deployments. For smaller deployments, or deployments with workloads that don't support separating the control-plane and the data-plane, you're more likely to use a homogenous site architecture where all sites are identical.


-4. Plan the distribution of workload instances to determine the number of racks needed in each site type, allowing for the fact that each rack is a Nexus zone. The platform can enforce affinity/anti-affinity rules at the scope of these zones, to ensure workload instances are distributed in such a way as to be resilient to failures of individual servers or racks. See [this article](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-virtual-machine-placement-hints) for more on affinity/anti-affinity rules. The Nexus Azure Kubernetes Server (NAKS) controller automatically distributes nodes within a cluster across the available servers in a zone as uniformly as possible, within other constraints. As a result, failure of any single server has the minimum impact on the total capacity remaining.
+4. Plan the distribution of workload instances to determine the number of racks needed in each site type, allowing for the fact that each rack is a Nexus zone. The platform can enforce affinity/anti-affinity rules at the scope of these zones, to ensure workload instances are distributed in such a way as to be resilient to failures of individual servers or racks. See [this article](https://learn.microsoft.com/azure/operator-nexus/howto-virtual-machine-placement-hints) for more on affinity/anti-affinity rules. The Nexus Azure Kubernetes Server (NAKS) controller automatically distributes nodes within a cluster across the available servers in a zone as uniformly as possible, within other constraints. As a result, failure of any single server has the minimum impact on the total capacity remaining.

-5. Factor in the [threshold redundancy](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-cluster-runtime-upgrade#configure-compute-threshold-parameters-for-runtime-upgrade-using-cluster-updatestrategy) that is required within each site on upgrade. This configuration option indicates to the orchestration engine the minimum number of worker nodes that must be available in order for a platform upgrade to be considered successful and allowed to proceed. Reserving these nodes eats into any capacity headroom. Setting a higher bar decreases the overall deployment's resilience to failure of individual nodes, but improves efficiency of utilization of the available capacity.
+5. Factor in the [threshold redundancy](https://learn.microsoft.com/azure/operator-nexus/howto-cluster-runtime-upgrade#configure-compute-threshold-parameters-for-runtime-upgrade-using-cluster-updatestrategy) that is required within each site on upgrade. This configuration option indicates to the orchestration engine the minimum number of worker nodes that must be available in order for a platform upgrade to be considered successful and allowed to proceed. Reserving these nodes eats into any capacity headroom. Setting a higher bar decreases the overall deployment's resilience to failure of individual nodes, but improves efficiency of utilization of the available capacity.

-6. Nexus supports between 1 and 8 racks per site inclusive, with each rack containing 4, 8, 12 or 16 servers. All racks must be identical in terms of number of servers. See [here](https://learn.microsoft.com/en-us/azure/operator-nexus/reference-near-edge-compute) for specifics of the resource available for workloads. See the following diagram, and also [this article](https://learn.microsoft.com/en-us/azure/operator-nexus/reference-limits-and-quotas) for other limits and quotas that might have an impact.
+6. Nexus supports between 1 and 8 racks per site inclusive, with each rack containing 4, 8, 12 or 16 servers. All racks must be identical in terms of number of servers. See [here](https://learn.microsoft.com/azure/operator-nexus/reference-near-edge-compute) for specifics of the resource available for workloads. See the following diagram, and also [this article](https://learn.microsoft.com/azure/operator-nexus/reference-limits-and-quotas) for other limits and quotas that might have an impact.

 7. Nexus supports one or two Pure storage arrays. Currently, these arrays are available to workload NFs running as Kubernetes nodes. Workloads running as VMs use local storage from the server they're instantiated on.

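The planning steps in this hunk (one Nexus zone per rack, 1 to 8 identical racks of 4, 8, 12 or 16 servers, instances spread across zones) reduce to some simple arithmetic. The Python sketch below is a minimal illustration under stated assumptions, not a Nexus API or tool: the function names are hypothetical, instances are assumed to be spread as evenly as possible across racks by zone-level anti-affinity, and both the threshold redundancy reserved for upgrades and per-server resource limits are ignored.

```python
import math

# Hypothetical planning helpers; not part of any Nexus SDK or CLI.
# Limits as stated in the article: 1-8 racks per site, 4/8/12/16 servers
# per rack, and every rack in a site has the same number of servers.
ALLOWED_SERVERS_PER_RACK = {4, 8, 12, 16}
MIN_RACKS, MAX_RACKS = 1, 8


def validate_site(servers_per_rack: list[int]) -> None:
    """Raise ValueError if a proposed site layout breaks the stated rack limits."""
    racks = len(servers_per_rack)
    if not MIN_RACKS <= racks <= MAX_RACKS:
        raise ValueError(f"site has {racks} racks; expected {MIN_RACKS}-{MAX_RACKS}")
    if len(set(servers_per_rack)) != 1:
        raise ValueError("all racks in a site must have the same number of servers")
    if servers_per_rack[0] not in ALLOWED_SERVERS_PER_RACK:
        raise ValueError(f"{servers_per_rack[0]} servers per rack is not a supported size")


def survivors_after_rack_loss(instances: int, racks: int) -> int:
    """Instances still running if the most heavily loaded rack fails.

    Assumption: anti-affinity at rack (zone) scope spreads instances as
    evenly as possible, so no rack holds more than ceil(instances / racks).
    """
    worst_rack = math.ceil(instances / racks)
    return instances - worst_rack


def min_instances_for(required: int, racks: int) -> int:
    """Smallest N with N - ceil(N / racks) >= required, i.e. an N+k sizing."""
    if racks < 2:
        raise ValueError("surviving a rack failure needs at least two racks")
    n = required
    while survivors_after_rack_loss(n, racks) < required:
        n += 1
    return n
```

For example, with four racks, keeping six instances in service through the loss of any one rack needs eight instances, since the worst-hit rack then holds two of them. With only two racks the same goal needs twelve, which illustrates why spreading across more zones reduces the k in N+k.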
@@ -100,7 +100,7 @@ Ensure that the Nexus routing tables have redundant routes preconfigured, as opp

 ### Identity and Authentication

-During a disconnection event, the on-premises infrastructure and workloads aren't able to reach Entra in order to perform user authentication. To prepare for a disconnection, you can ensure that all necessary identities and their associated permissions and user keys are preconfigured. Nexus provides [an API](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-baremetal-bmm-ssh) that the operator can use to automate this process. Preconfiguring this information ensures that authenticated management access to the infrastructure continues unimpeded by loss of connectivity to Entra.
+During a disconnection event, the on-premises infrastructure and workloads aren't able to reach Entra in order to perform user authentication. To prepare for a disconnection, you can ensure that all necessary identities and their associated permissions and user keys are preconfigured. Nexus provides [an API](https://learn.microsoft.com/azure/operator-nexus/howto-baremetal-bmm-ssh) that the operator can use to automate this process. Preconfiguring this information ensures that authenticated management access to the infrastructure continues unimpeded by loss of connectivity to Entra.

 ### Managing Platform Upgrade

@@ -112,20 +112,20 @@ Nexus platform upgrade is a fairly lengthy process. The customer initiates the u

 - The process is only active on one rack in the selected site at a time. Although upgrade is done in-place, there's still some impact to the worker nodes in the rack during the upgrade.

-For more information about the upgrade process, see [this article](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-cluster-runtime-upgrade#upgrading-cluster-runtime-using-cli). For more information about ensuring control-plane resiliency, see [this one](https://learn.microsoft.com/en-us/azure/operator-nexus/concepts-rack-resiliency).
+For more information about the upgrade process, see [this article](https://learn.microsoft.com/azure/operator-nexus/howto-cluster-runtime-upgrade#upgrading-cluster-runtime-using-cli). For more information about ensuring control-plane resiliency, see [this one](https://learn.microsoft.com/azure/operator-nexus/concepts-rack-resiliency).

 ## Designing and Operating High Availability Workloads for Nexus

 Workloads should ideally follow a cloud-native design, with N+k clusters that can be deployed across multiple nodes and racks within a site, using the Nexus zone concept.

-The Well Architected Framework guidance on [mission critical](https://learn.microsoft.com/en-us/azure/well-architected/mission-critical/) and [carrier grade](https://learn.microsoft.com/en-us/azure/well-architected/carrier-grade/) workloads on Azure also applies to workloads on Nexus.
+The Well Architected Framework guidance on [mission critical](https://learn.microsoft.com/azure/well-architected/mission-critical/) and [carrier grade](https://learn.microsoft.com/azure/well-architected/carrier-grade/) workloads on Azure also applies to workloads on Nexus.

 Designing and implementing highly available workloads on any platform requires a top-down approach. Start with an understanding of the availability required from the solution as a whole. Consider the key elements of the solution and their predicted availability. Then determine how these attributes need to be combined in order to achieve the solution level goals.


 ### Workload Placement

-Nexus has extensive support for providing hints to the Kubernetes orchestrator to control how workloads are deployed across the available worker nodes. See [this article](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-virtual-machine-placement-hints) for full details.
+Nexus has extensive support for providing hints to the Kubernetes orchestrator to control how workloads are deployed across the available worker nodes. See [this article](https://learn.microsoft.com/azure/operator-nexus/howto-virtual-machine-placement-hints) for full details.
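Every removed/added pair in this commit makes the same change: the `/en-us` locale segment is dropped from learn.microsoft.com links, presumably so that Microsoft Learn can serve each reader's own locale. A minimal Python sketch of that rewrite follows. It is not a tool the commit is known to have used; the function names are hypothetical, and it assumes, as a simplification, that a locale segment is exactly two letters, a hyphen, and two letters immediately after the host.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a locale segment looks like "en-us" or "de-de"; longer tags
# such as "sr-latn-rs" are deliberately not handled by this sketch.
LOCALE_SEGMENT = re.compile(r"^[a-z]{2}-[a-z]{2}$", re.IGNORECASE)
# Matches a Microsoft Learn URL up to whitespace or the ")" closing a Markdown link.
LINK = re.compile(r"https://learn\.microsoft\.com/[^\s)]+")


def strip_locale(url: str) -> str:
    """Drop a leading locale segment from a learn.microsoft.com URL path."""
    parts = urlsplit(url)
    if parts.netloc != "learn.microsoft.com":
        return url
    segments = parts.path.split("/")  # path starts with "/", so segments[0] == ""
    if len(segments) > 1 and LOCALE_SEGMENT.match(segments[1]):
        del segments[1]
    # Query string and #fragment are carried over unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


def strip_locales_in_markdown(text: str) -> str:
    """Apply strip_locale to every Microsoft Learn link found in Markdown text."""
    return LINK.sub(lambda m: strip_locale(m.group(0)), text)
```

Running `strip_locales_in_markdown` over the removed lines above yields the added lines, including the links that carry a `#fragment` or a trailing slash; links that already omit the locale, or point at other hosts, pass through untouched.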