---
title: "Azure Operator Nexus: Availability"
description: Overview of the availability features of Azure Operator Nexus.
author: joemarshallmsft
ms.author: joemarshall
ms.service: azure-operator-nexus
ms.topic: conceptual
ms.date: 02/15/2024
ms.custom: template-concept
---

# Introduction to Availability

When it comes to availability, there are two areas to consider:

- Availability of the Nexus platform itself, including:

  - Capacity and Redundancy Planning

  - Considering Workload Redundancy Requirements

  - Site Deployment and Connection

  - Other Networking Considerations for Availability

  - Identity and Authentication

  - Managing Platform Upgrade

- Availability of the Network Functions (NFs) running on the platform, including:

  - Configuration Updates

  - Workload Upgrade

  - Workload Healing

## Deploy and Configure Nexus for High Availability

[Reliability in Azure Operator Nexus](https://learn.microsoft.com/en-us/azure/reliability/reliability-operator-nexus) describes how to deploy the Nexus services that run in Azure so as to maximize their availability.

### Capacity and Redundancy Planning

Each on-premises deployment is a multi-rack design that provides physical redundancy at every level of the stack.

Use the following steps to help plan a Nexus deployment.

1. Determine the initial set of workloads (Network Functions) that the deployment should be sized to host.

2. Determine the capacity requirements for each of these workloads, including the redundancy each one needs.

3. If your workloads support a split between control-plane and data-plane elements, consider whether to design separate control-plane sites that can control a larger number of more widely distributed data-plane sites. This option is likely to be attractive only for larger deployments. For smaller deployments, or for workloads that don't support separating the control plane from the data plane, you're more likely to use a homogeneous site architecture in which all sites are identical.

4. Plan the distribution of workload instances to determine the number of racks needed in each site type, bearing in mind that each rack is a Nexus zone. The platform can enforce affinity and anti-affinity rules at the scope of these zones, so that workload instances are distributed in a way that is resilient to failures of individual servers or racks. For more information on affinity and anti-affinity rules, see [virtual machine placement hints](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-virtual-machine-placement-hints). The Nexus Azure Kubernetes Service (NAKS) controller automatically distributes the nodes of a cluster across the available servers in a zone as uniformly as possible, within other constraints. As a result, the failure of any single server has the smallest possible impact on the remaining capacity.

5. Factor in the [threshold redundancy](https://learn.microsoft.com/en-us/azure/operator-nexus/howto-cluster-runtime-upgrade#configure-compute-threshold-parameters-for-runtime-upgrade-using-cluster-updatestrategy) required within each site during an upgrade. This configuration option tells the orchestration engine the minimum number of worker nodes that must be available for a platform upgrade to be considered successful and allowed to proceed. Reserving these nodes reduces any capacity headroom. Setting a higher bar decreases the overall deployment's resilience to failures of individual nodes, but improves the utilization of the available capacity.

6. Nexus supports from 1 to 8 racks per site, with each rack containing 4, 8, 12, or 16 servers. All racks must contain the same number of servers. For the resources available to workloads, see [near-edge compute](https://learn.microsoft.com/en-us/azure/operator-nexus/reference-near-edge-compute). See also the following diagram, and [limits and quotas](https://learn.microsoft.com/en-us/azure/operator-nexus/reference-limits-and-quotas) for other limits that might affect your design.

7. Nexus supports one or two Pure Storage arrays. Currently, these arrays are available only to workload NFs running as Kubernetes nodes. Workloads running as VMs use local storage on the server where they're instantiated.

8. Also consider the number of available physical sites and any per-site limitations, such as bandwidth or power.
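The distribution arithmetic in step 4 can be illustrated with a short sketch. It assumes that capacity is counted in whole instances (workload instances across racks, or NAKS nodes across the servers of a zone) and that instances are spread as evenly as possible, as the text describes for the NAKS controller. Under that assumption, losing any one rack or server removes at most the ceiling of instances divided by racks or servers. The helper names `worst_case_loss` and `min_instances_for_zone_loss` are hypothetical and are not part of any Nexus API.

```python
def worst_case_loss(instances: int, failure_domains: int) -> int:
    """Instances lost when the most heavily loaded failure domain fails.

    Assumes an even (round-robin) spread, so no single domain (a rack/zone,
    or a server within a zone) holds more than ceil(instances / domains).
    """
    if instances < 0 or failure_domains < 1:
        raise ValueError("need instances >= 0 and at least one failure domain")
    # Integer ceiling division, avoiding floating point.
    return (instances + failure_domains - 1) // failure_domains


def min_instances_for_zone_loss(required: int, zones: int) -> int:
    """Smallest instance count that still leaves `required` instances running
    after any single zone (rack) is lost, assuming an even spread."""
    if zones < 2:
        raise ValueError("surviving a zone loss needs at least two zones")
    n = required
    # Add instances until the survivors of the worst single-zone loss suffice.
    while n - worst_case_loss(n, zones) < required:
        n += 1
    return n


# A workload needs 6 running instances and the site has 4 racks (zones).
print(min_instances_for_zone_loss(6, zones=4))  # 8: spread 2/2/2/2, losing a rack leaves 6

# 10 NAKS nodes spread over 16 servers in a zone: one server failure costs at most 1 node.
print(worst_case_loss(10, 16))  # 1
```

Summing these per-workload figures gives a rough starting point for the rack and server counts considered in the remaining steps.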

:::image type="content" source="media/nexus-availability-1.png" alt-text="Diagram of a typical server and rack structure in an Operator Nexus deployment.":::

**Figure 1 - Nexus elements in a single site**
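The site-shape limits in step 6 can be captured in a small validation helper. This is a sketch only. The constants mirror the limits stated in step 6: 1 to 8 racks, 4, 8, 12, or 16 servers per rack, and the same number of servers in every rack. The names are hypothetical, and the helper counts servers without modeling any that the platform might reserve for its own use.

```python
from itertools import product
from typing import List, Tuple

# Limits stated in step 6 (hypothetical constant names).
MIN_RACKS, MAX_RACKS = 1, 8
SERVERS_PER_RACK_OPTIONS = (4, 8, 12, 16)


def is_valid_site(rack_server_counts: List[int]) -> bool:
    """True if a site's per-rack server counts satisfy the step 6 limits."""
    if not MIN_RACKS <= len(rack_server_counts) <= MAX_RACKS:
        return False
    first = rack_server_counts[0]
    # Every rack must use an allowed size, and all racks must match.
    return first in SERVERS_PER_RACK_OPTIONS and all(
        count == first for count in rack_server_counts
    )


def valid_site_shapes() -> List[Tuple[int, int, int]]:
    """All (racks, servers_per_rack, total_servers), smallest total first."""
    shapes = [
        (racks, per_rack, racks * per_rack)
        for racks, per_rack in product(
            range(MIN_RACKS, MAX_RACKS + 1), SERVERS_PER_RACK_OPTIONS
        )
    ]
    return sorted(shapes, key=lambda shape: (shape[2], shape[0]))


print(is_valid_site([16, 16, 16]))  # True
print(is_valid_site([16, 12]))      # False: racks differ
print(len(valid_site_shapes()))     # 32
```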

Capacity planning is usually an iterative process. Work with your Microsoft account team, which has tooling to help make this process more straightforward.
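One way to picture that iteration is a loop that adds racks to a site until the plan holds. This is a hypothetical, simplified sketch, and its assumptions are as follows:

- Capacity is counted in whole servers, and `demand_servers` already includes the per-workload redundancy from step 2.
- The worst case to plan for is one whole rack out of service, whether through failure or because an upgrade takes racks out one at a time.
- The step 5 threshold is a simple count of servers that must remain available.

`racks_needed` is an illustrative name, not a Nexus tool.

```python
from typing import Optional


def racks_needed(
    demand_servers: int,
    upgrade_threshold_servers: int,
    servers_per_rack: int,
    max_racks: int = 8,
) -> Optional[int]:
    """Smallest rack count for which the site still meets both the workload
    demand and the upgrade threshold with any one rack out of service.

    Returns None if no rack count up to `max_racks` is sufficient, which
    suggests larger racks or a second site.
    """
    if servers_per_rack not in (4, 8, 12, 16):
        raise ValueError("servers_per_rack must be 4, 8, 12 or 16")
    # Both the workloads and the upgrade threshold must be satisfied.
    required = max(demand_servers, upgrade_threshold_servers)
    for racks in range(1, max_racks + 1):
        # Assumed worst case: one whole rack (one zone) is unavailable.
        remaining = (racks - 1) * servers_per_rack
        if remaining >= required:
            return racks
    return None


print(racks_needed(demand_servers=40, upgrade_threshold_servers=36, servers_per_rack=16))  # 4
print(racks_needed(demand_servers=200, upgrade_threshold_servers=0, servers_per_rack=16))  # None
```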

As demand on the infrastructure grows over time, whether from subscriber growth or from workloads migrating to the platform, you can scale the Nexus deployment by adding racks to existing sites or by adding new sites. The choice depends on criteria such as the limitations of any single site (power, space, bandwidth, and so on).
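A similar rough projection can show when existing sites can still absorb growth and when a new site is needed. The sketch below is hypothetical and makes the following assumptions:

- The number of extra racks has already been worked out.
- Only the 8-rack-per-site limit from step 6 applies.
- Existing sites are filled in list order before new ones are opened.

It ignores the power, space, and bandwidth limits mentioned above. `plan_rack_additions` is an illustrative name only.

```python
from typing import List

MAX_RACKS_PER_SITE = 8  # per-site limit from step 6


def plan_rack_additions(site_racks: List[int], extra_racks: int) -> List[int]:
    """Return new per-site rack counts after adding `extra_racks`.

    Fills existing sites up to MAX_RACKS_PER_SITE first (in list order),
    then opens new sites. Power, space and bandwidth are not modeled.
    """
    if extra_racks < 0:
        raise ValueError("extra_racks must be non-negative")
    plan = list(site_racks)
    for i, racks in enumerate(plan):
        free = max(0, MAX_RACKS_PER_SITE - racks)
        take = min(free, extra_racks)
        plan[i] = racks + take
        extra_racks -= take
    # Whatever does not fit in existing sites goes into new sites.
    while extra_racks > 0:
        take = min(MAX_RACKS_PER_SITE, extra_racks)
        plan.append(take)
        extra_racks -= take
    return plan


# Two existing sites with 6 and 8 racks; growth calls for 5 more racks.
print(plan_rack_additions([6, 8], 5))  # [8, 8, 3]
```

In this example, only two of the five racks fit in existing sites, so the plan calls for a third site.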