Tweaks

christiankuhtz · web-flow · commit ec2e3e2e2206 · 2020-03-14T01:00:19.000-07:00
diff --git a/articles/virtual-network/troubleshoot-nat.md b/articles/virtual-network/troubleshoot-nat.md
@@ -48,10 +48,16 @@ Frequently the root cause of SNAT exhaustion is an anti-pattern for how outbound
 
 Always take advantage of connection reuse and connection pooling whenever possible.  These patterns will avoid resource exhaustion problems and result in predictable behavior. Primitives for these patterns can be found in many development libraries and frameworks.
 
-_**Solution:**_ Use appropriate patterns
+_**Solution:**_ Use appropriate patterns and best practices
+
+- Atomic requests (one request per connection) are a poor design choice. Such an anti-pattern limits scale, reduces performance, and decreases reliability. Instead, reuse HTTP/S connections to reduce the number of connections and associated SNAT ports. Application scale and performance will improve because of fewer handshakes, less overhead, and lower cryptographic operation cost when using TLS.
+- Use connection pools to shape your connection volume.
+- Never silently abandon a TCP flow and rely on TCP timers to clean up the flow. Doing so leaves state allocated and unavailable for other connections, which can trigger application failures and SNAT exhaustion. Modifying TCP close timers is also discouraged. While TCP will recover, your application performance can be negatively impacted when the endpoints of a connection have mismatched expectations.
+
+SNAT exhaustion is often amplified by other anti-patterns in the underlying application. Review these additional patterns and best practices to improve the scale and reliability of your service.
 
 - Consider [asynchronous polling patterns](https://docs.microsoft.com/azure/architecture/patterns/async-request-reply) for long-running operations to free up connection resources for other operations.
-- Long-lived flows (for example reused TCP connections) should use TCP keepalives or application layer keepalives to avoid intermediate systems timing out.
+- Long-lived flows (for example reused TCP connections) should use TCP keepalives or application layer keepalives to avoid intermediate systems timing out. Increasing the timeout is a last resort and may not resolve the root cause. A long timeout can cause low-rate failures when the timeout expires, and can introduce delay and unnecessary failures.
 - Graceful [retry patterns](https://docs.microsoft.com/azure/architecture/patterns/retry) should be used to avoid aggressive retries/bursts during transient failure or failure recovery.
 Creating a new TCP connection for every HTTP operation (also known as "atomic connections") is an anti-pattern.  Atomic connections will prevent your application from scaling well and waste resources.  Always pipeline multiple operations into the same connection.  Your application will benefit in transaction speed and resource costs.  When your application uses transport layer encryption (for example TLS), there's a significant cost associated with the processing of new connections.  Review [Azure Cloud Design Patterns](https://docs.microsoft.com/azure/architecture/patterns/) for additional best practice patterns.
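
Reviewer note: the difference between atomic connections and connection reuse described in this change can be illustrated with a small, self-contained sketch. It is not taken from the article. It assumes a loopback HTTP server standing in for a remote endpoint, and the names `CountingHandler`, `run_requests`, and `demo` are hypothetical. The server counts accepted TCP connections, and each such connection would consume a SNAT port if it were made outbound through NAT.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that counts accepted TCP connections."""

    protocol_version = "HTTP/1.1"  # allow persistent (keep-alive) connections
    connections_seen = 0
    _lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted TCP connection, not once per request.
        super().setup()
        with CountingHandler._lock:
            CountingHandler.connections_seen += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_requests(port, count, reuse):
    """Issue `count` GETs and return how many TCP connections the server saw."""
    CountingHandler.connections_seen = 0
    if reuse:
        # One connection carries every request.
        conn = http.client.HTTPConnection("127.0.0.1", port)
        for _ in range(count):
            conn.request("GET", "/")
            conn.getresponse().read()  # drain the body before the next request
        conn.close()
    else:
        # Anti-pattern: a fresh TCP connection for every request.
        for _ in range(count):
            conn = http.client.HTTPConnection("127.0.0.1", port)
            conn.request("GET", "/")
            conn.getresponse().read()
            conn.close()
    return CountingHandler.connections_seen


def demo(count=5):
    """Return (connections with reuse, connections without reuse)."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        reused = run_requests(port, count, reuse=True)
        atomic = run_requests(port, count, reuse=False)
    finally:
        server.shutdown()
        server.server_close()
    return reused, atomic


if __name__ == "__main__":
    reused, atomic = demo()
    print(f"reused: {reused} connection(s), atomic: {atomic} connection(s)")
```

In this sketch the reused client opens a single connection regardless of the request count, while the atomic client opens one per request. A production application would more likely rely on the pooling primitives of its HTTP library than on a hand-managed connection.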