| 1 | +--- |
| 2 | +title: Troubleshoot Azure Virtual Network NAT connectivity problems |
| 3 | +titleSuffix: Azure Virtual Network NAT troubleshooting |
| 4 | +description: Troubleshoot issues with Virtual Network NAT. |
| 5 | +services: virtual-network |
| 6 | +documentationcenter: na |
| 7 | +author: asudbring |
| 8 | +manager: KumudD |
| 9 | +ms.service: virtual-network |
| 10 | +Customer intent: As an IT administrator, I want to troubleshoot Virtual Network NAT. |
| 11 | +ms.devlang: na |
| 12 | +ms.topic: overview |
| 13 | +ms.tgt_pltfrm: na |
| 14 | +ms.workload: infrastructure-services |
| 15 | +ms.date: 03/02/2020 |
| 16 | +ms.author: allensu |
| 17 | +--- |
| 18 | + |
| 19 | +# Troubleshoot Azure Virtual Network NAT connectivity problems |
| 20 | + |
| 21 | +This article helps administrators diagnose and resolve connectivity problems when using Virtual Network NAT. |
| 22 | + |
| 23 | +>[!NOTE] |
| 24 | +>Virtual Network NAT is available as public preview at this time. Currently it's only available in a limited set of [regions](nat-overview.md#region-availability). This preview is provided without a service level agreement and isn't recommended for production workloads. Certain features may not be supported or may have constrained capabilities. See the [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms) for details. |
| 25 | + |
| 26 | +## Problems |
| 27 | + |
| 28 | +- [SNAT exhaustion](#snat-exhaustion). |
| 29 | +- [ICMP ping is failing](#icmp-ping-is-failing). |
| 30 | + |
| 31 | +To resolve these problems, follow the steps in the Resolution section. |
| 32 | + |
| 33 | +## Resolution |
| 34 | + |
| 35 | +### SNAT exhaustion |
| 36 | + |
| 37 | +A single [NAT gateway resource](nat-gateway-resource.md) supports from 64,000 up to 1 million concurrent flows. Each IP address contributes 64,000 SNAT ports to the available inventory, and you can use up to 16 IP addresses per NAT gateway resource. For more detail on the SNAT mechanism, see [Source Network Address Translation](nat-gateway-resource.md#source-network-address-translation). |
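The inventory arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not an Azure API: the function names are hypothetical, and the sizing helper assumes the simplified model that every concurrent flow consumes one SNAT port.

```python
import math

PORTS_PER_IP = 64_000      # SNAT ports contributed by each public IP address
MAX_IPS_PER_GATEWAY = 16   # maximum IP addresses per NAT gateway resource


def snat_port_inventory(ip_count: int) -> int:
    """Total SNAT ports available for a given number of IP addresses."""
    if not 1 <= ip_count <= MAX_IPS_PER_GATEWAY:
        raise ValueError(f"ip_count must be between 1 and {MAX_IPS_PER_GATEWAY}")
    return ip_count * PORTS_PER_IP


def ips_needed(peak_flows: int) -> int:
    """Smallest IP count whose inventory covers peak_flows.

    Assumption (simplified model): one SNAT port per concurrent flow.
    """
    if peak_flows < 0:
        raise ValueError("peak_flows must not be negative")
    needed = max(1, math.ceil(peak_flows / PORTS_PER_IP))
    if needed > MAX_IPS_PER_GATEWAY:
        raise ValueError("demand exceeds what a single NAT gateway can provide")
    return needed
```

With 16 IP addresses the inventory is 16 × 64,000 = 1,024,000 ports, which is where the "1 million" figure comes from.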
| 38 | + |
| 39 | +#### Steps: |
| 40 | + |
| 41 | +1. Investigate how your application is creating outbound connectivity (for example, code review or packet capture). |
| 42 | +2. Determine if this activity is expected behavior or whether the application is misbehaving. Use metrics in Azure Monitor to substantiate your findings. |
| 43 | +3. Evaluate if appropriate patterns are followed. |
| 44 | +4. Evaluate if SNAT port exhaustion should be mitigated with additional IP addresses assigned to the NAT gateway resource. |
| 45 | + |
| 46 | +#### Design pattern: |
| 47 | + |
| 48 | +Always take advantage of connection reuse and connection pooling whenever possible. This pattern will avoid resource exhaustion problems outright and result in predictable behavior. Primitives for these patterns can be found in many development libraries and frameworks. |
| 49 | +- Consider [asynchronous polling patterns](https://docs.microsoft.com/azure/architecture/patterns/async-request-reply) for long-running operations to free up connection resources for other operations. |
| 50 | +- Long-lived flows (for example, reused TCP connections) should use TCP keepalives or application layer keepalives to keep intermediate systems from timing out idle connections. |
| 51 | +- Graceful [retry patterns](https://docs.microsoft.com/azure/architecture/patterns/retry) should be used to avoid aggressive retries/bursts during transient failure or failure recovery. |
| 52 | +Creating a new TCP connection for every HTTP operation (also known as "atomic connections") is an anti-pattern. Atomic connections prevent your application from scaling well and waste resources. Always pipeline multiple operations into the same connection. Your application will benefit in transaction speed and resource costs. When your application uses transport layer encryption (for example, TLS), there's a significant cost associated with the processing of new connections. Review [Azure Cloud Design Patterns](https://docs.microsoft.com/azure/architecture/patterns/) for additional best practice patterns. |
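As a rough sketch of connection reuse, the following Python example sends several HTTP requests over one TCP connection and enables TCP keepalives on it. It uses only the standard library; the function name `fetch_all` is hypothetical, and the requests are issued sequentially on the shared connection rather than with true HTTP pipelining. Reuse only takes effect if the server keeps the connection open (HTTP/1.1 persistent connections).

```python
import http.client
import socket


def fetch_all(host, port, paths, timeout=10.0):
    """GET every path over a single TCP connection; return (status, body) pairs."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
        # Enable TCP keepalives so an idle, reused connection is less likely
        # to be timed out by intermediate systems.
        conn.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        results = []
        for path in paths:
            conn.request("GET", path)
            response = conn.getresponse()
            # Drain the body before the next request so the same
            # connection can carry it.
            body = response.read()
            results.append((response.status, body))
        return results
    finally:
        conn.close()
```

Each call consumes a single SNAT port for the whole batch instead of one port per request, which is the effect the pattern above is after.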
| 53 | + |
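The graceful retry guidance above can be illustrated with exponential backoff and random jitter. This is a minimal sketch under stated assumptions: `retry_with_backoff` and its parameters are hypothetical names, the default delays are arbitrary, and only `ConnectionError` and `TimeoutError` are treated as transient.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait before each retry is drawn uniformly from 0 to a cap that
    doubles per attempt (bounded by max_delay), so many clients recovering
    at once don't retry in a synchronized burst.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the delay can be replaced in tests. In practice, you would wrap a single outbound call, for example `retry_with_backoff(lambda: fetch_report())`, where `fetch_report` is a placeholder for your own function.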
| 54 | +#### Mitigations |
| 55 | + |
| 56 | +You can scale outbound connectivity as follows: |
| 57 | + |
| 58 | +| Scenario | Mitigation | |
| 59 | +|---|---| |
| 60 | +| You're experiencing contention for SNAT ports and SNAT port exhaustion during periods of high usage. | Determine if you can add public IP address resources or public IP prefix resources, up to a total of 16 IP addresses for your NAT gateway. Each added IP address provides 64,000 more SNAT ports to the inventory and allows you to scale your scenario further.| |
| 61 | +| You've already assigned 16 IP addresses and are still experiencing SNAT port exhaustion. | Distribute your application environment across multiple subnets and provide a NAT gateway resource for each subnet. | |
| 62 | + |
| 63 | +>[!NOTE] |
| 64 | +>It is important to understand why SNAT exhaustion occurs. Make sure you are using the right patterns for scalable and reliable scenarios. Adding more SNAT ports to a scenario without understanding the cause of the demand should be a last resort. If you do not understand why your scenario is applying pressure on SNAT port inventory, adding more SNAT ports to the inventory by adding more IP addresses will only delay the same exhaustion failure as your application scales. You may be masking other inefficiencies and anti-patterns. |
| 65 | + |
| 66 | +### ICMP ping is failing |
| 67 | + |
| 68 | +[Virtual Network NAT](nat-overview.md) supports IPv4 UDP and TCP protocols. ICMP isn't supported and is expected to fail. Instead, use TCP connection tests (for example, "TCP ping") and UDP-specific application layer tests to validate end-to-end connectivity. |
| 69 | + |
| 70 | +The following table can be used as a starting point for choosing which tools to test with. |
| 71 | + |
| 72 | +| Operating system | Generic TCP connection test | TCP application layer test | UDP | |
| 73 | +|---|---|---|---| |
| 74 | +| Linux | nc | curl | application specific | |
| 75 | +| Windows | [PsPing](https://docs.microsoft.com/sysinternals/downloads/psping) | PowerShell [Invoke-WebRequest](https://docs.microsoft.com/powershell/module/microsoft.powershell.utility/invoke-webrequest) | application specific | |
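If none of the tools in the table are available, a generic TCP connection test can also be scripted. The sketch below uses Python's standard `socket` module; the function name `tcp_ping` is hypothetical, and it only reports whether the TCP handshake completes and how long it takes.

```python
import socket
import time


def tcp_ping(host, port, timeout=3.0):
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        # create_connection completes the TCP handshake; the socket is
        # closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None
```

For example, `tcp_ping("example.com", 443)` (a placeholder destination) returns a float when the endpoint is reachable over TCP and `None` otherwise.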
| 76 | + |
| 77 | +## Next steps |
| 78 | + |
| 79 | +- Learn about [Virtual Network NAT](nat-overview.md) |
| 80 | +- Learn about [NAT gateway resource](nat-gateway-resource.md) |