
Commit b684406

Merge pull request #107735 from christiankuhtz/patch-300
Tweaks
2 parents d5b2ecf + 9588da9 commit b684406


1 file changed, +14 -5 lines changed


articles/virtual-network/troubleshoot-nat.md

Lines changed: 14 additions & 5 deletions
@@ -12,7 +12,7 @@ ms.devlang: na
 ms.topic: overview
 ms.tgt_pltfrm: na
 ms.workload: infrastructure-services
-ms.date: 03/05/2020
+ms.date: 03/14/2020
 ms.author: allensu
 ---
 
@@ -48,14 +48,23 @@ Frequently the root cause of SNAT exhaustion is an anti-pattern for how outbound
 
 Always take advantage of connection reuse and connection pooling whenever possible. These patterns will avoid resource exhaustion problems and result in predictable behavior. Primitives for these patterns can be found in many development libraries and frameworks.
 
-_**Solution:**_ Use appropriate patterns
+_**Solution:**_ Use appropriate patterns and best practices
+
+- Atomic requests (one request per connection) are a poor design choice. This anti-pattern limits scale, reduces performance, and decreases reliability. Instead, reuse HTTP/S connections to reduce the number of connections and associated SNAT ports. Application scale and performance will improve because of fewer handshakes, less overhead, and lower cryptographic cost when using TLS.
+- DNS can introduce many individual flows at volume when the client isn't caching the DNS resolver's results. Use caching.
+- UDP flows (for example, DNS lookups) allocate SNAT ports for the duration of the idle timeout. The longer the idle timeout, the higher the pressure on SNAT ports. Use a short idle timeout (for example, 4 minutes).
+- Use connection pools to shape your connection volume.
+- Never silently abandon a TCP flow and rely on TCP timers to clean it up. Doing so leaves state allocated at intermediate systems and endpoints and makes ports unavailable for other connections, which can trigger application failures and SNAT exhaustion.
+- TCP close-related timer values shouldn't be changed without expert knowledge of the impact. While TCP will recover, your application performance can be negatively impacted when the endpoints of a connection have mismatched expectations. The desire to change timers is usually a sign of an underlying design problem. Review the following recommendations.
+
+Often, SNAT exhaustion can also be amplified by other anti-patterns in the underlying application. Review these additional patterns and best practices to improve the scale and reliability of your service.
 
 - Consider [asynchronous polling patterns](https://docs.microsoft.com/azure/architecture/patterns/async-request-reply) for long-running operations to free up connection resources for other operations.
-- Long-lived flows (for example reused TCP connections) should use TCP keepalives or application layer keepalives to avoid intermediate systems timing out.
+- Long-lived flows (for example, reused TCP connections) should use TCP keepalives or application-layer keepalives to avoid intermediate systems timing out. Increasing the idle timeout is a last resort and may not resolve the root cause. A long timeout can cause low-rate failures when it expires and can introduce delays and unnecessary failures.
 - Graceful [retry patterns](https://docs.microsoft.com/azure/architecture/patterns/retry) should be used to avoid aggressive retries/bursts during transient failure or failure recovery.
 Creating a new TCP connection for every HTTP operation (also known as "atomic connections") is an anti-pattern. Atomic connections will prevent your application from scaling well and waste resources. Always pipeline multiple operations into the same connection. Your application will benefit in transaction speed and resource costs. When your application uses transport layer encryption (for example TLS), there's a significant cost associated with the processing of new connections. Review [Azure Cloud Design Patterns](https://docs.microsoft.com/azure/architecture/patterns/) for additional best practice patterns.
 
-#### Possible mitigations
+#### Additional possible mitigations
 
 _**Solution:**_ Scale outbound connectivity as follows:
 
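The connection-reuse guidance in this hunk (one request per connection versus reused HTTP/S connections) can be observed locally. The following is a minimal Python sketch that uses only the standard library. `CountingHandler` and `count_connections` are hypothetical names, and the loopback server stands in for a real service. The sketch counts the TCP connections the server accepts when five requests are sent atomically and when they share one reused HTTP/1.1 connection. Behind a NAT gateway, each new connection would need its own SNAT port. The demo itself does not involve NAT.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the TCP connection open between requests (keep-alive).
    protocol_version = "HTTP/1.1"

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with self.server.lock:
            self.server.connections += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the client reuse the connection
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


def count_connections(requests: int, reuse: bool) -> int:
    """Send `requests` GETs to a local server; return how many TCP connections it accepted."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    server.connections = 0
    server.lock = threading.Lock()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    conn = None
    try:
        for _ in range(requests):
            if conn is None or not reuse:
                if conn is not None:
                    conn.close()  # atomic pattern: tear down after every request
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body so the connection can be reused
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    with server.lock:
        return server.connections


if __name__ == "__main__":
    print("atomic:", count_connections(5, reuse=False))
    print("reused:", count_connections(5, reuse=True))
```

Running the script should report 5 connections for the atomic pattern and 1 for the reused connection. HTTP client libraries with built-in pooling usually handle this reuse for you.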
@@ -161,7 +170,7 @@ You can indicate interest in additional capabilities through [Virtual Network NA
 ## Next steps
 
 * Learn about [Virtual Network NAT](nat-overview.md)
-* Learn about [NAT gateway resource](nat-gateway-resource.md)
+* Learn about [NAT gateway resource](nat-gateway-resource.md)
 * Learn about [metrics and alerts for NAT gateway resources](nat-metrics.md).
 * [Tell us what to build next for Virtual Network NAT in UserVoice](https://aka.ms/natuservoice).
 
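This is a minimal sketch of the DNS caching recommendation in the second hunk. It assumes a client that resolves names through `socket.getaddrinfo()`. `CachingResolver` is a hypothetical helper that answers repeated lookups from memory for a fixed TTL, so only cache misses produce new DNS queries and the UDP flows they create. The fixed 30-second TTL is an assumption, because `getaddrinfo()` does not report the record's actual TTL.

```python
import socket
import time


class CachingResolver:
    """Hypothetical helper: cache getaddrinfo() results for a fixed TTL.

    socket.getaddrinfo() does not expose the record's real DNS TTL, so a
    fixed, conservative TTL is assumed here.
    """

    def __init__(self, ttl_seconds=30.0, resolver=socket.getaddrinfo, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._resolver = resolver  # injectable for testing
        self._clock = clock        # injectable for testing
        self._cache = {}           # (host, port) -> (expires_at, addrinfo list)

    def resolve(self, host, port):
        key = (host, port)
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: no new DNS query, so no new UDP flow
        result = self._resolver(host, port)  # cache miss or expired entry
        self._cache[key] = (now + self._ttl, result)
        return result


if __name__ == "__main__":
    resolver = CachingResolver(ttl_seconds=30)
    for _ in range(3):
        # Only the first call performs a lookup; the next two are served from the cache.
        print(resolver.resolve("localhost", 443)[0][4])
```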
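The "use connection pools to shape your connection volume" bullet can be sketched as a bounded pool. `BoundedPool` is a hypothetical class, not an Azure or library API. A semaphore caps how many connections are checked out at once, and idle connections are reused before new ones are created. Callers that cannot get a connection within `timeout` fail fast instead of opening more flows. The `factory` argument stands in for whatever creates a real connection.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class BoundedPool(Generic[T]):
    """Hypothetical pool that never has more than max_size connections in use."""

    def __init__(self, factory: Callable[[], T], max_size: int) -> None:
        self._factory = factory
        self._idle = queue.LifoQueue()                      # most recently used first
        self._slots = threading.BoundedSemaphore(max_size)  # limits checked-out connections
        self._lock = threading.Lock()
        self.created = 0  # how many connections the factory has opened

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        # Wait for a free slot instead of opening an unbounded number of flows.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("connection pool exhausted")
        try:
            try:
                conn = self._idle.get_nowait()  # reuse an idle connection if one exists
            except queue.Empty:
                conn = self._factory()  # otherwise open a new one
                with self._lock:
                    self.created += 1
            try:
                yield conn
            finally:
                self._idle.put(conn)  # hand the connection back for the next caller
        finally:
            self._slots.release()


if __name__ == "__main__":
    pool = BoundedPool(object, max_size=2)
    for _ in range(10):
        with pool.connection() as conn:
            pass  # use conn here
    print("connections created:", pool.created)
```

This sketch returns a connection to the pool even if the caller's code raised an exception. A production pool would usually also discard connections that failed or that sat idle too long.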
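The following minimal loopback sketch illustrates closing a TCP flow explicitly instead of abandoning it. The helper name `send_and_close` is hypothetical. The client socket is closed when its `with` block exits, which performs an orderly close. As a result, the peer's read loop ends immediately on an empty read instead of waiting for a timeout.

```python
import socket


def send_and_close(payload: bytes) -> bytes:
    """Hypothetical demo: send payload over loopback TCP, close it explicitly,
    and return everything the peer read before it saw the close."""
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            peer, _ = listener.accept()
            client.sendall(payload)
        # Leaving the with-block above called client.close(): the kernel sends a
        # FIN, so the flow is torn down now instead of lingering until timers fire.
        with peer:
            peer.settimeout(5)  # safety net for the demo only
            received = []
            while True:
                chunk = peer.recv(1024)
                if not chunk:  # b"" means the other side closed in an orderly way
                    break
                received.append(chunk)
    return b"".join(received)


if __name__ == "__main__":
    print(send_and_close(b"done"))
```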
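For the keepalive recommendation on long-lived flows, here is a minimal sketch that enables TCP keepalives on a Python socket. The option names are platform-specific and not available everywhere, so each one is guarded with `hasattr`. The 60-second idle time, 10-second probe interval, and probe count of 5 are assumptions. The point is to keep the idle time well below the idle timeout configured on the NAT gateway; the article uses 4 minutes as an example.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10,
                         count: int = 5) -> socket.socket:
    """Turn on TCP keepalives so an idle long-lived connection still sends probes
    before intermediate systems (such as a NAT gateway) time it out."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe (Linux and some other platforms).
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        # macOS uses TCP_KEEPALIVE for the same idle-time setting.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)  # seconds between probes
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)  # unanswered probes before giving up
    return sock


if __name__ == "__main__":
    with enable_tcp_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM)) as s:
        print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

Application-layer keepalives, the alternative the article mentions, follow the same idea at the protocol level: send a small message before the idle timeout can expire.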
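For the graceful retry bullet, one common approach is capped exponential backoff with full jitter. The sketch below is a hypothetical helper, not the implementation behind the linked Azure retry pattern. The choice of transient exceptions (`ConnectionError` and `TimeoutError` here), the attempt limit, and the delays are all assumptions to adjust per service. `sleep` and `rng` are injectable so the timing can be tested.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call operation(), retrying transient failures with capped exponential
    backoff and full jitter so many clients don't retry in synchronized bursts."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)  # 0.5, 1, 2, 4, ... up to max_delay
            sleep(rng() * cap)  # full jitter: wait a random time in [0, cap)
    raise ValueError("max_attempts must be at least 1")
```

The random wait spreads retries from many clients over time. That avoids the burst of new connections, and new SNAT ports, that synchronized retries would cause after a transient failure.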