---
title: Azure VMware Solution NSX Scale and Performance Recommendations for VMware HCX
description: Learn about the default NSX topology in Azure VMware Solution and recommended practices to mitigate performance issues in HCX migration use cases.
ms.topic: how-to
ms.service: azure-vmware
ms.date: 12/18/2024
ms.custom: engagement-fy25
---

# Azure VMware Solution NSX scale and performance recommendations for VMware HCX

In this article, you learn about the default NSX topology in Azure VMware Solution, NSX data path performance characteristics, how to identify NSX data path resource constraints, and the recommended configurations that help mitigate those constraints and optimize overall data path performance for HCX migrations.

## Azure VMware Solution NSX default topology

The Azure VMware Solution NSX default topology has the following configuration:

- Three-node NSX Manager cluster.
- NSX Edge and Gateway for north-bound traffic:
  - Two Large form factor NSX Edges, deployed in an NSX Edge cluster.
  - A default NSX Tier-0 Gateway in Active/Active mode.
  - A default NSX Tier-1 Gateway in Active/Standby mode.
  - A default HCX uplink segment connected to the default Tier-1 Gateway.

Customers typically host their application workloads by creating new NSX segments and attaching them to the default Tier-1 Gateway. Additionally, customers with an HCX migration use case use the default HCX uplink segment, which is also connected to the default Tier-1 Gateway.

The default NSX topology for Azure VMware Solution, where all traffic exits through the default Tier-1 Gateway, might not be optimal for your traffic flows and throughput requirements. Here are some potential challenges and the recommended configurations to optimize the NSX Edge data path resources.

### Potential challenges

- All north-bound network traffic (migrations, L2 extensions, VM traffic outbound of Azure VMware Solution) uses the default Tier-1 Gateway, which is in Active/Standby mode.
- In the default Active/Standby mode, the Tier-1 Gateway uses only the active Edge VM for all north-bound traffic.
- The second Edge VM, which is standby, isn't used for north-bound traffic.
- Depending on the throughput requirements and flows, this could create a bottleneck on the active Edge VM.

### Recommended practices

You can change the NSX north-bound network connectivity to distribute the traffic evenly across both Edge VMs by creating additional Tier-1 Gateways and distributing the NSX segments across them. For an HCX migration use case, the recommendation is to move HCX Layer 2 (L2) extension and migration traffic to a newly created Tier-1 Gateway so that it uses the NSX Edge resources optimally.

To make the active Edge for a given Tier-1 Gateway predictable, create the additional Tier-1 Gateway with the High Availability (HA) Mode set to Active/Standby and the Failover mode set to Preemptive. This configuration lets you select a different active Edge VM than the one in use by the default Tier-1 Gateway, which splits north-bound traffic across multiple Tier-1 Gateways so that both NSX Edges are optimally utilized, avoiding the potential bottleneck of the default NSX topology.

:::image type="content" source="media/nsxt/default-nsx-topology.png" alt-text="Diagram showing the default NSX topology in Azure VMware Solution." border="false" lightbox="media/nsxt/default-nsx-topology.png":::

Figure 1: The default NSX topology in Azure VMware Solution.

### NSX Edge performance characteristics

Each NSX Edge virtual machine (Edge VM) can support up to approximately 20 Gbps, based on the number of flows, the packet size, and the services enabled on the NSX gateways. Each Large form factor Edge VM has four Data Plane Development Kit (DPDK)-enabled CPU cores, and each DPDK core can process up to approximately 5 Gbps of traffic, depending on flow hashing, packet size, and the services enabled on the NSX gateway. For more information on NSX Edge performance, see the VMware NSX-T Reference Design Guide, section 8.6.2.
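
As a rough, back-of-the-envelope illustration of these numbers (the constants below simply restate the approximate figures above and aren't sizing guarantees), the following Python sketch contrasts the theoretical aggregate capacity of the default two-Edge cluster with the cap that a single heavy flow hits when it's pinned to one DPDK core.

```python
# Approximate figures from the paragraph above; actual throughput depends on
# flow hashing, packet size, and the services enabled on the NSX gateways.
DPDK_CORES_PER_LARGE_EDGE = 4   # Large form factor Edge VM
GBPS_PER_DPDK_CORE = 5          # ~5 Gbps per DPDK core
EDGE_VMS = 2                    # default Azure VMware Solution Edge cluster

aggregate_gbps = EDGE_VMS * DPDK_CORES_PER_LARGE_EDGE * GBPS_PER_DPDK_CORE
single_flow_cap_gbps = GBPS_PER_DPDK_CORE  # one flow hashes to one core

print(f"Theoretical aggregate: ~{aggregate_gbps} Gbps across {EDGE_VMS} Edge VMs")
print(f"Single heavy flow (for example, HCX NE-to-NE): ~{single_flow_cap_gbps} Gbps")
```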

## Monitor, identify, and fix potential Edge data path performance bottlenecks

### How to monitor and identify NSX Edge data path resource constraints

You can monitor NSX Edge performance and identify data path resource constraints by using the built-in NSX alarm framework. The following critical NSX Edge alarms indicate NSX Edge data path resource constraints:

1. Edge NIC out of transmit/receive buffer.
2. Edge datapath CPU very high.
3. Edge datapath NIC throughput very high.

:::image type="content" source="media/nsxt/nsx-edge-critical-alerts.png" alt-text="Diagram showing NSX Edge health critical alerts." border="false" lightbox="media/nsxt/nsx-edge-critical-alerts.png":::

Figure 2: NSX Edge health critical alerts.
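
These alarms can also be retrieved programmatically. The following Python sketch is a minimal, unofficial example that polls the NSX Manager alarm framework for open alarms; the manager FQDN, credentials, and the field used to filter for Edge-related alarms are assumptions you would adjust for your environment.

```python
# Minimal sketch: list open alarms from the NSX Manager alarm framework.
# Assumptions: NSX Manager FQDN, credentials, and reachability from this host.
import requests

NSX_MANAGER = "https://nsx-manager.example.com"   # assumed FQDN
AUTH = ("cloudadmin", "example-password")         # assumed credentials

resp = requests.get(
    f"{NSX_MANAGER}/api/v1/alarms",
    params={"status": "OPEN"},
    auth=AUTH,
    verify=False,  # lab only; validate certificates in production
)
resp.raise_for_status()

for alarm in resp.json().get("results", []):
    # Print Edge-related data path alarms such as CPU or NIC throughput events.
    if "edge" in str(alarm.get("feature_name", "")).lower():
        print(alarm.get("event_type"), alarm.get("severity"), alarm.get("node_id"))
```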

### How to fix NSX Edge resource constraints

To validate the issue, check the historic and real-time traffic throughput:

- Historic and real-time traffic throughput: check the traffic throughput at the time of the alarm to correlate the throughput with the alarm.

:::image type="content" source="media/nsxt/nsx-edge-performance-charts.png" alt-text="Diagram showing NSX Edge VM performance charts." border="false" lightbox="media/nsxt/nsx-edge-performance-charts.png":::

Figure 3: NSX Edge VM performance charts.

To mitigate the issue, here are a few options to consider.

Mitigation options:

1. Edge scale-up: Scaling the NSX Edge up from the Large (4 DPDK CPU cores) to the X-Large (8 DPDK CPU cores) form factor could resolve part of the issue.

   - Edge scale-up provides additional CPU and memory for data path packet processing.
   - Edge scale-up might not help if you have one or more heavy flows, for example, HCX Network Extension (NE) to Network Extension (NE) traffic, because such traffic can be pinned to one of the DPDK CPU cores.

2. Tier-1 Gateway topology change: Change the default Azure VMware Solution NSX Tier-1 Gateway topology by adding multiple Tier-1 Gateways to split the traffic across multiple Edge VMs.

   - More details are in the next section, with an example of the HCX migration use case.

3. Edge scale-out: If you have a large number of hosts and workloads in the SDDC, an NSX Edge scale-out (from two Edges to four Edges) could be an option to add NSX Edge data path resources.

   - However, NSX Edge scale-out is effective only with a change to the default NSX Tier-1 Gateway topology that distributes the traffic optimally across all four Edge VMs. More details are in the next section, with an example of the HCX migration use case.

### Default settings and configuration recommendations for NSX Edge data path performance

Here are a few configuration recommendations to mitigate NSX Edge VM performance challenges.

1. By default, Edge VMs are part of the Azure VMware Solution management resource pool in vCenter Server, and all appliances in the management resource pool have dedicated compute resources assigned.
2. By default, Edge VMs are hosted on different hosts, with anti-affinity rules applied, to avoid placing multiple heavy packet-processing workloads on the same host.
3. Disable the Tier-1 Gateway firewall if it isn't required, to gain packet-processing capacity. (By default, the Tier-1 Gateway firewall is enabled.)
4. Make sure the NSX Edge VMs and the HCX Network Extension (NE) appliances are on separate hosts, to avoid placing multiple heavy packet-processing workloads on the same host.
5. For the HCX migration use case, make sure that the HCX Network Extension (NE) and HCX Interconnect (IX) appliances have CPU reserved, which allows HCX to optimally process the migration traffic. (By default, these appliances have no CPU reservation.) A sketch of applying such a reservation follows this list.
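
As an illustration of recommendation 5, the following Python (pyVmomi) sketch shows one way a CPU reservation could be applied to HCX appliance VMs. The vCenter Server FQDN, credentials, appliance name prefixes, and the reservation value are placeholders, and whether you have permission to reconfigure these appliances depends on where they run; treat this as a sketch rather than a supported procedure.

```python
# Hedged sketch: set a CPU reservation (in MHz) on HCX IX/NE appliance VMs.
# All names, credentials, and values below are assumptions for illustration.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="cloudadmin@vsphere.local",
                  pwd="example-password",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
hcx_vms = [vm for vm in view.view if vm.name.startswith(("HCX-IX", "HCX-NE"))]  # assumed naming

for vm in hcx_vms:
    # Reserve CPU so the appliance's packet processing isn't starved under contention.
    spec = vim.vm.ConfigSpec(cpuAllocation=vim.ResourceAllocationInfo(reservation=4000))
    vm.ReconfigVM_Task(spec=spec)  # example: reserve 4,000 MHz

Disconnect(si)
```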

## How to optimize Azure VMware Solution NSX data path performance - HCX use case

One of the most frequent scenarios that can reach the NSX Edge data path limit is the HCX migration and network extension use case. HCX migration and network extension create heavy flows (a single flow between Network Extension appliances), and each such flow is hashed to a single Edge and to a single DPDK core within that Edge VM. Based on the flow hashing, this can limit HCX migration and network extension traffic to roughly 5 Gbps.

HCX Network Extension appliances have a throughput limit of 4-6 Gbps per appliance. A recommended practice is to deploy multiple HCX NE appliances to distribute the load across them, ensuring reliable performance. This also creates multiple network flows, improving flow hashing across different NSX Edges and across cores within an NSX Edge VM, as the short sketch after this paragraph illustrates.
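
As a quick illustration of that sizing point (the target throughput and per-appliance figure below are assumptions, not guidance), this sketch estimates how many NE appliances a given amount of extension traffic might call for.

```python
# Illustrative only: estimate NE appliance count for a target extension throughput.
import math

TARGET_GBPS = 12            # assumed concurrent extension/migration target
GBPS_PER_NE_APPLIANCE = 4   # conservative end of the 4-6 Gbps per-appliance range

appliances = math.ceil(TARGET_GBPS / GBPS_PER_NE_APPLIANCE)
print(f"~{appliances} NE appliances for ~{TARGET_GBPS} Gbps; "
      f"{appliances} appliances also mean more flows to hash across Edges and cores")
```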

Given the nature of HCX use case traffic patterns and the default Azure VMware Solution topology, here are a few recommended practices to mitigate NSX Edge VM bottlenecks.

## Optimizing NSX Edge performance (mitigate NSX Edge bottlenecks)

In general, creating additional Tier-1 Gateways and distributing segments across them helps to mitigate potential NSX Edge data path bottlenecks. The following steps show how to create a new Tier-1 Gateway and move an HCX uplink segment to it, which allows you to separate HCX traffic from workload VM traffic.

:::image type="content" source="media/nsxt/nsx-traffic-flow-additional-tier-1-gateway.png" alt-text="Diagram showing NSX traffic flow in Azure VMware Solution with an additional Tier-1 gateway." border="false" lightbox="media/nsxt/nsx-traffic-flow-additional-tier-1-gateway.png":::

Figure 4: NSX traffic flow with additional Tier-1 Gateways created.

### Detailed steps (mitigate Edge VM bottleneck)

First, [create an NSX Tier-1 Gateway](tutorial-nsx-tier-1-gateway.md).

Distributed Only option:

1. No Edge cluster can be selected.
2. All connected segments and service ports must be advertised.
3. No stateful services are available in the Distributed Only option.

:::image type="content" source="media/nsxt/nsx-tier-1-gateway-distributed-only.png" alt-text="Diagram showing the NSX Tier-1 gateway Distributed Only option." border="false" lightbox="media/nsxt/nsx-tier-1-gateway-distributed-only.png":::

Figure 5: Tier-1 Gateway Distributed Only option.

>[!IMPORTANT]
>In the Distributed Only High Availability (HA) Mode, traffic is distributed across all Edge VMs. Workload traffic and migration traffic may traverse the active Edge at the same time.

Active/Standby option:

1. Select the **Edge Cluster**.
2. For **Auto Allocate Edges**, select **No** on the radio button.
3. Select the **Edge VM** that isn't currently active as the preferred Edge.
4. For the **Fail Over** setting, select **Preemptive**. This ensures that traffic always fails back to the preferred Edge VM selected in step 3.
5. Select **All Connected Segments and Service Ports** to be advertised.
6. Select **Save**.

An Active/Standby configuration with the preferred Edge VM defined allows you to force traffic to the Edge VM that isn't the active Edge on the default Tier-1 Gateway. If the Edge cluster is scaled out to four Edges, creating the new Tier-1 Gateway and selecting Edge VM 03 and Edge VM 04 might be a better option to isolate HCX traffic completely. The following sketch shows an API-based equivalent of these steps.
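
For reference, here's a hedged Python sketch of roughly the same configuration made through the NSX Policy API instead of the UI. The NSX Manager FQDN, credentials, gateway name, and the edge cluster and edge node IDs are placeholders you'd look up in your environment, and this isn't an official Azure VMware Solution procedure.

```python
# Hedged sketch: create/patch a Tier-1 Gateway via the NSX Policy API with a
# preemptive failover mode and a preferred (currently standby) Edge VM.
# FQDN, credentials, and IDs below are placeholders.
import requests

NSX = "https://nsx-manager.example.com"    # assumed FQDN
AUTH = ("cloudadmin", "example-password")  # assumed credentials
EDGE_CLUSTER = "/infra/sites/default/enforcement-points/default/edge-clusters/<edge-cluster-id>"
PREFERRED_EDGE = f"{EDGE_CLUSTER}/edge-nodes/<standby-edge-node-id>"

tier1 = {
    "display_name": "Tier-1-HCX",
    "failover_mode": "PREEMPTIVE",                     # fail back to the preferred Edge
    "route_advertisement_types": ["TIER1_CONNECTED"],  # advertise connected segments
}
requests.patch(f"{NSX}/policy/api/v1/infra/tier-1s/Tier-1-HCX",
               json=tier1, auth=AUTH, verify=False).raise_for_status()

# Assigning an edge cluster gives the gateway an Active/Standby service router;
# preferred_edge_paths pins the active role to the chosen Edge VM.
locale = {"edge_cluster_path": EDGE_CLUSTER, "preferred_edge_paths": [PREFERRED_EDGE]}
requests.patch(f"{NSX}/policy/api/v1/infra/tier-1s/Tier-1-HCX/locale-services/default",
               json=locale, auth=AUTH, verify=False).raise_for_status()
```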

:::image type="content" source="media/nsxt/nsx-tier-1-gateway-active-standby.png" alt-text="Diagram showing the NSX Tier-1 gateway Active/Standby option." border="false" lightbox="media/nsxt/nsx-tier-1-gateway-active-standby.png":::

Figure 6: Tier-1 Gateway Active/Standby option.

>[!NOTE]
>Microsoft recommends the Active/Standby HA Mode when additional Tier-1 Gateways are created. This allows customers to separate workload and migration traffic across different Edge VMs.

## Create a new segment for HCX uplink and attach it to the new Tier-1 Gateway

For detailed instructions on NSX segment creation, see [NSX Segment Creation](tutorial-nsx-t-network-segment.md).

Select the newly created Tier-1 Gateway when creating your new NSX segment.

>[!NOTE]
>When creating a new NSX segment, customers can utilize the Azure VMware Solution reserved IP space. For example, a new segment can be created with an IP range of 10.18.75.129/26, assuming the IP space 10.18.72.0/22 was used to create the Azure VMware Solution private cloud.

:::image type="content" source="media/nsxt/nsx-segment-creation.png" alt-text="Diagram showing the creation of an NSX segment." border="false" lightbox="media/nsxt/nsx-segment-creation.png":::

Figure 7: NSX segment creation for the new HCX uplink network.
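
If you prefer to script this step, the following hedged Python sketch creates the segment through the NSX Policy API and attaches it to the new Tier-1 Gateway; the manager FQDN, credentials, transport zone ID, segment name, and gateway CIDR are placeholders (the CIDR mirrors the example in the note above).

```python
# Hedged sketch: create an overlay segment attached to the new Tier-1 Gateway.
# FQDN, credentials, IDs, and addressing below are placeholders.
import requests

NSX = "https://nsx-manager.example.com"    # assumed FQDN
AUTH = ("cloudadmin", "example-password")  # assumed credentials

segment = {
    "display_name": "HCX-Uplink-02",
    "connectivity_path": "/infra/tier-1s/Tier-1-HCX",   # the newly created Tier-1 Gateway
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/<overlay-tz-id>",
    "subnets": [{"gateway_address": "10.18.75.129/26"}],  # example from the note above
}
requests.patch(f"{NSX}/policy/api/v1/infra/segments/HCX-Uplink-02",
               json=segment, auth=AUTH, verify=False).raise_for_status()
```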

## Create an HCX network profile

For detailed steps on how to create an HCX network profile, see [HCX Network Profile](configure-vmware-hcx.md#create-network-profiles).

1. Navigate to the HCX portal, select **Interconnect**, and then select **Network Profile**.
2. Select **Create Network Profile**.
3. Select **NSX Network**, and choose the newly created **HCX Uplink segment**.
4. Add the desired **IP Pool range**.
5. (Optional) Select **HCX Uplink** as the HCX Traffic Type.
6. Select **Create**.

:::image type="content" source="media/hcx/hcx-uplink-network-profile.png" alt-text="Diagram showing the creation of an HCX network profile." border="false" lightbox="media/nsxt/hcx-uplink-network-profile.png":::

Figure 8: HCX network profile creation.

After the new HCX uplink network profile is created, update the existing Service Mesh and edit the default uplink profile to use the newly created network profile.

:::image type="content" source="media/hcx/hcx-service-mesh-edit.png" alt-text="Diagram showing how to edit an existing HCX Service Mesh." border="false" lightbox="media/nsxt/hcx-service-mesh-edit.png":::

Figure 9: HCX Service Mesh edit.

7. Select the existing **Service Mesh** and select **Edit**.
8. Edit the default uplink with the newly created network profile.
9. Select **Service Mesh Change**.

:::image type="content" source="media/hcx/hcx-in-service-mode.png" alt-text="Diagram showing how to enable In-Service Mode on an HCX Network Extension appliance." border="false" lightbox="media/nsxt/hcx-in-service-mode.png":::

Figure 10: HCX In-Service Mode.

>[!NOTE]
>Consider enabling In-Service Mode on the HCX Network Extension appliances to reduce downtime during this Service Mesh edit.

10. Select **Finish**.

>[!IMPORTANT]
>Downtime varies depending on the Service Mesh change being made. It's recommended to allocate 5 minutes of downtime for these changes to take effect.

## More information

[VMware NSX Reference Design Guide](https://www.vmware.com/docs/nsx-t-reference-design-guide-3-2-v1.1-1)