Commit 8de3fde

Add details on VF-Lag.
1 parent 7735792 commit 8de3fde

1 file changed: +8 -0 lines changed

articles/operator-nexus/concepts-nexus-availability.md

Lines changed: 8 additions & 0 deletions
@@ -92,6 +92,14 @@ Although the initial requirement was for 400 nodes across the deployment, the de

 For another workload, you might choose not to "layer" the multiple levels of redundancy, taking the view that designing for concurrent failure of one site, a rack in another site, and a server in another rack in that same site is overkill. Ultimately, the optimum design depends on the specific service offered by the workload and on details of the workload itself, in particular its load-balancing functionality. Modeling the service using Markov chains to identify the various error modes, with associated probabilities, would also help determine which errors might realistically occur simultaneously. For example, a workload that can apply back-pressure when a given site has reduced capacity due to a server failure might then be able to redirect traffic to one of the remaining sites that still have full redundancy.

+### Site Deployment and Connection
+
+Each Nexus site is connected to an Azure region that hosts the in-Azure resources, such as Cluster Manager and Nexus Fabric Controller. Ideally, connect each Nexus site to a different Azure region to maximize the resilience of the Nexus deployment to an interruption in any one Azure region. Depending on the geography, there's likely to be a trade-off between maximizing the number of distinct Azure regions that the deployment depends on and any restrictions around data residency or sovereignty.
+
+Virtual machines, including Virtual Network Functions (VNFs) and Nexus Azure Kubernetes Service (AKS), as well as services hosted on-premises within Operator Nexus, connect to the network fabric over highly available links. This connectivity relies on redundant physical connections, provided through Single Root Input/Output Virtualization (SR-IOV) interfaces that use Virtual Function Link Aggregation (VF-Lag) technology.
+
+VF-Lag aggregates virtual functions (VFs) into a logical Link Aggregation Group (LAG) across a pair of ports on the physical network interface card (NIC). This capability exposes a single, highly available virtual function to the workload. Users don't need to configure anything to benefit from VF-Lag, which simplifies deployment.
+
 ### Other Networking Considerations for Availability

 The Nexus infrastructure and workloads make extensive use of Domain Name System (DNS). Since there's no authoritative DNS responder within the Nexus platform, there's nothing to respond to DNS requests if the Nexus site becomes disconnected from Azure. Therefore, take care to ensure that all DNS entries have a Time to Live (TTL) that is consistent with the desired maximum disconnection duration, currently typically 72 hours.
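
The TTL guidance in the final context line can be checked mechanically. The following is a minimal sketch, not part of the commit, that reads "consistent with" as "at least as long as" the 72-hour window mentioned in the text. The function name `find_short_ttls`, the record tuples, and the host names are hypothetical, and in practice the TTL values would come from your own DNS zone data.

```python
from datetime import timedelta

# Assumption: the 72-hour maximum disconnection window quoted in the article.
MAX_DISCONNECTION = timedelta(hours=72)


def find_short_ttls(records, max_disconnection=MAX_DISCONNECTION):
    """Return names of records whose TTL is shorter than the window.

    `records` is an iterable of (name, ttl_seconds) pairs.
    """
    limit = int(max_disconnection.total_seconds())
    return [name for name, ttl_seconds in records if ttl_seconds < limit]


# Hypothetical records for illustration only: (name, TTL in seconds).
example_records = [
    ("nf1.example.internal", 259200),  # exactly 72 hours
    ("nf2.example.internal", 3600),    # 1 hour, too short for the window
]

print(find_short_ttls(example_records))  # ['nf2.example.internal']
```

Any name the check reports would expire from resolver caches before the end of the disconnection window, so those entries are the ones to review.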
