Commit c9f337b

Further updates
1 parent dcc6bc1 commit c9f337b

2 files changed

+178
-8
lines changed

support/azure/app-service/toc.yml

Lines changed: 3 additions & 0 deletions
@@ -48,4 +48,7 @@
4848
href: troubleshoot-vnet-integration-apps.md
4949
- name: Troubleshoot Performance Degradation
5050
href: troubleshoot-performance-degradation.md
51+
- name: Troubleshoot Intermittent Outbound Connection Errors
52+
href: troubleshoot-intermittent-outbound-connection-errors.md
53+
5154

support/azure/app-service/troubleshoot-intermittent-outbound-connection-errors.md

Lines changed: 175 additions & 8 deletions
@@ -1,16 +1,183 @@
11
---
2-
title:
3-
description:
4-
author: JarrettRenshaw
5-
manager: dcscontentpm
6-
ms.topic:
7-
ms.service: azure-app-service
2+
title: Troubleshoot Intermittent Outbound Connection Errors
3+
description: Learn how to troubleshoot intermittent connection errors and related performance issues in Azure App Service.
4+
ms.topic: troubleshooting-general
85
ms.date: 10/14/2025
9-
ms.author: jarrettr
6+
ms.custom: security-recommendations,fasttrack-edit
7+
ms.author: msangapu
8+
author: msangapu-msft
9+
manager: dcscontentpm
1010
ms.reviewer: v-ryanberg
11+
ms.service: azure-app-service
1112
---
12-
#
1313

14+
# Troubleshoot intermittent outbound connection errors in Azure App Service
15+
16+
This article helps you troubleshoot intermittent connection errors and related performance issues in [Azure App Service](/azure/app-service/overview). It provides more information on, and troubleshooting methodologies for, exhaustion of source network address translation (SNAT) ports. If you require more help at any point in this article, contact the Azure experts at [Azure Community Support](https://azure.microsoft.com/support/community/). Alternatively, you can file an Azure support incident. Go to [Azure support](https://azure.microsoft.com/support/options/) and select **Submit a support ticket**.
17+
18+
## Symptoms
19+
20+
Applications and functions hosted on Azure App Service might exhibit one or more of the following issues:
21+
22+
* Slow response times on all or some of the instances in a service plan.
23+
* Intermittent 5xx or **Bad Gateway** errors.
24+
* Time-out error messages.
25+
* Failures to connect to external endpoints (such as SQL Database, Service Fabric, or other app services).
26+
27+
## Cause
28+
29+
The major cause for intermittent connection issues is hitting a limit while making new outbound connections. The limits you can hit include:
30+
31+
* TCP connections: There's a limit on the number of outbound connections that can be made. The limit on outbound connections is associated with the size of the worker used.
32+
* SNAT ports: [Outbound connections in Azure](/azure/load-balancer/load-balancer-outbound-connections) describes SNAT port restrictions and how they affect outbound connections. Azure uses source network address translation (SNAT) and load balancers (not exposed to customers) to communicate with public IP addresses. Each instance on Azure App Service is initially preallocated *128* SNAT ports. The SNAT port limit affects opening connections to the same address and port combination. If your app creates connections to a mix of address and port combinations, you won't use up your SNAT ports. The SNAT ports are used up when you make repeated calls to the same address and port combination. After a port is released, it's available for reuse as needed. The Azure network load balancer reclaims SNAT ports from closed connections only after waiting for four minutes.
33+
34+
When applications or functions rapidly open new connections, they can quickly exhaust their preallocated quota of 128 ports. They're then blocked until a new SNAT port becomes available, either through dynamic allocation of more SNAT ports or through reuse of a reclaimed SNAT port. If your app runs out of SNAT ports, it has intermittent outbound connectivity issues.
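
As a rough back-of-the-envelope sketch of why the quota runs out so quickly, the following Python snippet estimates how many new short-lived connections per second to a single address and port combination the preallocated ports can absorb. It assumes only the two figures quoted above (128 preallocated ports and a four-minute reclaim wait) and deliberately ignores dynamic allocation of extra ports, so the result is a conservative estimate and not a platform guarantee. The function names are illustrative.

```python
PREALLOCATED_SNAT_PORTS = 128   # per instance, figure quoted in this article
RECLAIM_WAIT_SECONDS = 4 * 60   # wait before a closed connection's port is reusable


def sustainable_new_connections_per_second(
    ports=PREALLOCATED_SNAT_PORTS, reclaim_wait_seconds=RECLAIM_WAIT_SECONDS
):
    """Steady-state rate of new connections to one destination.

    Simplifying assumption: every connection is closed right away, so each
    one holds its SNAT port only for the reclaim wait.
    """
    if ports <= 0 or reclaim_wait_seconds <= 0:
        raise ValueError("ports and reclaim_wait_seconds must be positive")
    return ports / reclaim_wait_seconds


def seconds_until_exhaustion(new_connections_per_second, ports=PREALLOCATED_SNAT_PORTS):
    """Time to consume the quota when no port has been reclaimed yet.

    Only meaningful when the result is shorter than the reclaim wait.
    """
    if new_connections_per_second <= 0:
        raise ValueError("new_connections_per_second must be positive")
    return ports / new_connections_per_second


if __name__ == "__main__":
    print(round(sustainable_new_connections_per_second(), 2))
    print(seconds_until_exhaustion(10))
```

Under these assumptions, the sustainable rate works out to roughly 0.53 new connections per second to one destination, which is why an app that opens a fresh connection for every call can hit the limit within seconds.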
35+
36+
## Avoiding the problem
37+
38+
There are a few solutions that let you avoid SNAT port limitations. They include:
39+
40+
* Connection pools: By pooling your connections, you avoid opening new network connections for calls to the same address and port.
41+
* Service endpoints: You don't have a SNAT port restriction to services secured with service endpoints.
42+
* Private endpoints: You don't have a SNAT port restriction to services secured with private endpoints.
43+
* NAT gateway: With a NAT gateway, you have 64k outbound SNAT ports that are usable by the resources sending traffic through it.
44+
45+
To avoid the SNAT port problem, avoid repeatedly creating new connections to the same host and port. Connection pools are one of the more obvious ways to do that.
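
To make the difference concrete, here's a minimal, self-contained Python sketch that uses only the standard library. It starts a throwaway HTTP server on the loopback interface and counts how many distinct client source ports the server sees when a client reuses one connection versus opening a new connection per request. The local server, the handler class, and the `count_source_ports` helper are all assumptions made for illustration. Loopback traffic doesn't pass through SNAT, so the source-port count here is only a stand-in for the SNAT ports that real outbound calls would consume.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler that records each client's source port."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    seen_ports = set()

    def do_GET(self):
        type(self).seen_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_source_ports(requests, reuse):
    """Send `requests` GETs and return how many source ports the server saw."""
    PortRecordingHandler.seen_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        if reuse:
            # One connection, many requests: the pooled/keep-alive pattern.
            conn = http.client.HTTPConnection(host, port)
            for _ in range(requests):
                conn.request("GET", "/")
                conn.getresponse().read()
            conn.close()
        else:
            # A new connection per request: the pattern that burns ports.
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port)
                conn.request("GET", "/")
                conn.getresponse().read()
                conn.close()
        return len(PortRecordingHandler.seen_ports)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("reused connection:", count_source_ports(5, reuse=True))
    print("new connection per request:", count_source_ports(5, reuse=False))
```

With a reused connection, the server should see a single source port no matter how many requests are sent. With a new connection per request, it typically sees one port per request.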
46+
47+
If your destination is an Azure service that supports service endpoints, you can avoid SNAT port exhaustion issues by using [regional virtual network integration](/azure/app-service/overview-vnet-integration) and service endpoints or private endpoints. When you use regional virtual network integration and place service endpoints on the integration subnet, your app's outbound traffic to those services won't have outbound SNAT port restrictions. Likewise, if you use regional virtual network integration and private endpoints, you won't have any outbound SNAT port issues to that destination.
48+
49+
If your destination is an external endpoint outside of Azure, [using a NAT gateway](/azure/app-service/networking/nat-gateway-integration) gives you 64k outbound SNAT ports. It also gives you a dedicated outbound address that you don't share with anybody.
50+
51+
If possible, improve your code to use connection pools and avoid the entire situation. It isn't always possible to change code fast enough to mitigate this situation. For the cases where you can't change your code in time, take advantage of the other solutions. The best solution to the problem is to combine all of the solutions as best you can. Try to use service endpoints and private endpoints to Azure services and the NAT gateway for the rest.
52+
53+
To learn more about strategies for mitigating SNAT port exhaustion, see [Use SNAT for outbound connections](/azure/load-balancer/load-balancer-outbound-connections). Of these strategies, the following are applicable to apps and functions hosted on Azure App Service.
54+
55+
### Use connection pooling
56+
57+
* For pooling HTTP connections, review [Pool HTTP connections with HttpClientFactory](/aspnet/core/performance/performance-best-practices#pool-http-connections-with-httpclientfactory).
58+
* For information on SQL Server connection pooling, review [SQL Server Connection Pooling (ADO.NET)](/dotnet/framework/data/adonet/sql-server-connection-pooling).
59+
60+
The following articles describe how to implement connection pooling for different solution stacks.
61+
62+
#### Node
63+
64+
By default, connections for Node.js aren't kept alive.
65+
66+
* [MySQL](https://github.com/mysqljs/mysql#pooling-connections)
67+
* [MongoDB](https://www.mongodb.com/docs/manual/administration/connection-pool-overview)
68+
* [PostgreSQL](https://node-postgres.com/features/pooling)
69+
* [SQL Server](https://github.com/tediousjs/node-mssql#connection-pools)
70+
71+
HTTP keep-alive
72+
73+
* [agentkeepalive](https://www.npmjs.com/package/agentkeepalive)
74+
* [Node.js v13.9.0 documentation](https://nodejs.org/api/http.html)
75+
76+
#### Java
77+
78+
Java Database Connectivity (JDBC) connection pooling
79+
80+
* [Tomcat 8](https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html)
81+
* [C3p0](https://github.com/swaldman/c3p0)
82+
* [HikariCP](https://github.com/brettwooldridge/HikariCP)
83+
* [Apache DBCP](https://commons.apache.org/proper/commons-dbcp/)
84+
85+
HTTP connection pooling
86+
87+
* [HttpClient Overview](https://hc.apache.org/httpcomponents-client-5.4.x/)
88+
89+
#### PHP
90+
91+
Although PHP doesn't support connection pooling, you can try using persistent database connections to your back-end server.
92+
93+
* MySQL server
94+
95+
* [MySQLi connections](https://www.php.net/manual/mysqli.quickstart.connections.php) for newer versions
96+
* [mysql_pconnect](https://www.php.net/manual/function.mysql-pconnect.php) for older versions of PHP
97+
98+
* Other data sources
99+
100+
* [PHP connection management](https://www.php.net/manual/pdo.connections.php)
101+
102+
#### Python
103+
104+
* [MySQL](https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html)
105+
* [MariaDB](https://mariadb.com/docs/connectors/mariadb-connector-python/api/pool/)
106+
* [PostgreSQL](https://www.psycopg.org/docs/pool.html)
107+
* [Pyodbc](https://github.com/mkleehammer/pyodbc/wiki/The-pyodbc-Module#pooling)
108+
* [SQLAlchemy](https://docs.sqlalchemy.org/en/20/core/pooling.html)
109+
110+
HTTP connection pooling
111+
112+
* Keep-alive and HTTP connection pooling are enabled by default in the [Requests](https://requests.readthedocs.io/en/latest/user/advanced/#keep-alive) module.
113+
* [Urllib3](https://urllib3.readthedocs.io/en/stable/reference/urllib3.connectionpool.html)
114+
115+
### Reuse connections
116+
117+
For more pointers and examples on managing connections in Azure Functions, see [Manage connections in Azure Functions](/azure/azure-functions/manage-connections).
118+
119+
### Use less aggressive retry logic
120+
121+
For more guidance and examples, see [Retry pattern](/azure/architecture/patterns/retry).
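
As one possible illustration of less aggressive retries, the following Python sketch wraps an operation in exponential backoff with full jitter, so that failed outbound calls don't immediately open a burst of new connections. The helper name, its defaults, and the choice to retry only on `OSError` (which covers `ConnectionError` and `TimeoutError`) are assumptions for this sketch, not settings prescribed by App Service.

```python
import random
import time


def call_with_backoff(
    operation,
    max_attempts=4,
    base_delay=0.5,
    max_delay=30.0,
    sleep=time.sleep,
    rng=random.random,
):
    """Call `operation`, retrying connection-style failures with backoff.

    The wait before retry n is a random fraction of
    min(max_delay, base_delay * 2**n), so retries spread out over time.
    `sleep` and `rng` are injectable so the helper can be tested quickly.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)
```

A caller would pass a zero-argument function that performs the outbound call, for example `call_with_backoff(lambda: fetch_order(order_id))`, where `fetch_order` is a hypothetical function in your own code.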
122+
123+
### Use keepalives to reset the outbound idle timeout
124+
125+
For implementing keepalives for Node.js apps, see [My node application is making excessive outbound calls](/azure/app-service/app-service-web-nodejs-best-practices-and-troubleshoot-guide#my-node-application-is-making-excessive-outbound-calls).
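
For stacks other than Node.js, the same idea can be sketched at the socket level. The following Python example assumes that "keepalives" here means TCP-level keepalive probes and enables them on a socket by using the standard `socket` module. The tuning constants are platform specific, so the sketch sets each one only when the running platform exposes it. The default values are illustrative and don't come from this article.

```python
import socket


def enable_tcp_keepalive(sock, idle_seconds=60, interval_seconds=15, probe_count=4):
    """Turn on TCP keepalive probes for `sock` and return it.

    idle_seconds, interval_seconds, and probe_count are illustrative values;
    pick an idle time shorter than the idle timeout you need to stay under.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning options aren't available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_tcp_keepalive(s)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    finally:
        s.close()
```

Many client libraries create their own sockets, so in practice you might need a library-specific hook to apply these options. Check your library's documentation before you rely on this pattern.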
126+
127+
### More guidance specific to App Service
128+
129+
* A [load test](/azure/devops/test/load-test/app-service-web-app-performance-test) should simulate real-world data at a steady feed rate. Testing apps and functions under real-world stress can identify and resolve SNAT port exhaustion issues ahead of time.
130+
* Ensure that the back-end services can return responses quickly. For troubleshooting performance issues with Azure SQL Database, review [Troubleshoot Azure SQL Database performance issues with Intelligent Insights](/azure/azure-sql/database/intelligent-insights-troubleshoot-performance#recommended-troubleshooting-flow).
131+
* Scale out the App Service plan to more instances. For more information on scaling, see [Scale an app in Azure App Service](/azure/app-service/manage-scale-up). Each worker instance in an App Service plan is allocated a number of SNAT ports. If you spread your usage across more instances, you might get the SNAT port usage per instance below the recommended limit of 100 outbound connections per unique remote endpoint.
132+
* Consider moving to [App Service Environment (ASE)](/azure/app-service/environment/using-an-ase), where you're allotted a single outbound IP address, and the limits for connections and SNAT ports are higher. In an ASE, the number of SNAT ports per instance is based on the [Azure load balancer preallocation table](/azure/load-balancer/load-balancer-outbound-connections#snatporttable). For example, an ASE with 1-50 worker instances has 1,024 preallocated ports per instance, while an ASE with 51-100 worker instances has 512 preallocated ports per instance.
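
The scale-out guidance in the preceding list can be turned into a quick estimate. The following Python sketch computes how many instances would keep each one at or below the recommended 100 outbound connections per unique remote endpoint. It assumes that load is spread evenly across instances, which real traffic rarely does perfectly, so treat the answer as a starting point. The function name is illustrative.

```python
import math

RECOMMENDED_CONNECTIONS_PER_ENDPOINT = 100  # per instance, from this article


def instances_needed(
    peak_connections_to_endpoint,
    per_instance_target=RECOMMENDED_CONNECTIONS_PER_ENDPOINT,
):
    """Smallest instance count that keeps each instance at or under the target.

    Assumes connections to the endpoint are spread evenly across instances.
    """
    if peak_connections_to_endpoint < 0 or per_instance_target <= 0:
        raise ValueError("connection counts must be non-negative and target positive")
    return max(1, math.ceil(peak_connections_to_endpoint / per_instance_target))


if __name__ == "__main__":
    print(instances_needed(450))
```

For example, a peak of 450 concurrent connections to one endpoint suggests at least five instances under these assumptions.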
133+
134+
The outbound TCP limits are easier to avoid, because they're set by the size of your worker. You can see the limits in [Sandbox Cross VM Numerical Limits - TCP Connections](https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#cross-vm-numerical-limits).
135+
136+
|Limit name|Description|Small (A1)|Medium (A2)|Large (A3)|Isolated tier (ASE)|
137+
|---|---|---|---|---|---|
138+
|Connections|Number of connections across entire VM|1920|3968|8064|16,000|
139+
140+
To avoid outbound TCP limits, you can either increase the size of your workers, or scale out horizontally.
141+
142+
## Troubleshooting help
143+
144+
Knowing the two types of outbound connection limits, and what your app does, should make it easier to troubleshoot. If you know that your app makes many calls to the same storage account, you might suspect a SNAT limit. If your app makes a great many calls to endpoints all over the internet, you might suspect that you're reaching the VM-wide TCP connection limit.
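
That rule of thumb can be sketched as a small Python heuristic. Given a list of outbound connections observed on one instance, it flags a possible SNAT limit when a single address and port combination accounts for at least 128 connections, and a possible TCP limit when the total reaches the per-VM figure from the limits table earlier in this article. The input format and function name are hypothetical, and the result is a hint for where to look first and not a diagnosis.

```python
from collections import Counter

SNAT_PORTS_PER_INSTANCE = 128  # preallocated quota quoted in this article
TCP_CONNECTION_LIMITS = {      # per-VM limits from the table in this article
    "small": 1920,
    "medium": 3968,
    "large": 8064,
    "isolated": 16000,
}


def suspect_limits(connections, worker_size="small"):
    """Return a list containing "snat" and/or "tcp" for limits worth checking.

    `connections` is an iterable of (remote_host, remote_port) tuples, one per
    open outbound connection on a single instance (a hypothetical input that
    you would collect yourself).
    """
    per_destination = Counter(connections)
    total = sum(per_destination.values())
    findings = []
    busiest = per_destination.most_common(1)
    if busiest and busiest[0][1] >= SNAT_PORTS_PER_INSTANCE:
        findings.append("snat")  # many connections to one address and port
    if total >= TCP_CONNECTION_LIMITS[worker_size]:
        findings.append("tcp")   # many connections overall
    return findings
```

For instance, 130 connections to one storage endpoint would return `["snat"]`, while 2,000 connections spread across distinct endpoints on a small worker would return `["tcp"]`.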
145+
146+
If you don't know the application behavior well enough to determine the cause quickly, there are some tools and techniques available in App Service to help with that determination.
147+
148+
### Find SNAT port allocation information
149+
150+
You can use [App Service Diagnostics](/azure/app-service/overview-diagnostics) to find SNAT port allocation information, and observe the SNAT port allocation metric of an App Service site. To find SNAT port allocation information, follow these steps:
151+
152+
1. To access App Service diagnostics, navigate to your App Service web app or App Service Environment in the [Azure portal](https://portal.azure.com/). In the sidebar menu, select **Diagnose and solve problems**.
153+
1. Select the **Availability and Performance** category.
154+
1. Select the **SNAT Port Exhaustion** tile in the list of available tiles under the category. The recommended practice is to keep SNAT port usage below 128.
155+
If you need the underlying metric, you can still open a support ticket, and the support engineer can get the metric from the back end for you.
156+
157+
Because SNAT port usage isn't available as a metric, you can't configure autoscale based on either SNAT port usage or the SNAT port allocation metric.
158+
159+
### TCP connections and SNAT ports
160+
161+
TCP connections and SNAT ports aren't directly related. A TCP connections usage detector is included in the **Diagnose and Solve Problems** management page of any App Service app. Search for the phrase *TCP connections* to find it.
162+
163+
* The SNAT ports are only used for external network flows, while the total TCP connections includes local loopback connections.
164+
* A SNAT port can be shared by different flows, if the flows are different in either protocol, IP address or port. The TCP Connections metric counts every TCP connection.
165+
* The TCP connections limit happens at the worker instance level. The Azure Network outbound load balancing doesn't use the TCP Connections metric for SNAT port limiting.
166+
* The TCP connections limits are described in [Sandbox Cross VM Numerical Limits - TCP Connections](https://github.com/projectkudu/kudu/wiki/Azure-Web-App-sandbox#cross-vm-numerical-limits).
167+
* Existing TCP sessions fail when new outbound TCP sessions are added from an Azure App Service source port. You can either use a single IP or reconfigure backend pool members to avoid conflicts.
168+
169+
|Limit name|Description|Small (A1)|Medium (A2)|Large (A3)|Isolated tier (ASE)|
170+
|---|---|---|---|---|---|
171+
|Connections|Number of connections across entire VM|1920|3968|8064|16,000|
172+
173+
### WebJobs and database connections
174+
175+
If SNAT ports are exhausted, and WebJobs are unable to connect to SQL Database, there's no metric to show how many connections are opened by each individual web application process. To find the problematic WebJob, move several WebJobs out to another App Service plan to see if the situation improves, or if an issue remains in one of the plans. Repeat the process until you find the problematic WebJob.
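
The move-and-observe procedure is effectively a bisection, which the following Python sketch makes explicit. It assumes that exactly one WebJob is responsible and that, after moving a group of WebJobs to another plan, you can tell whether the original plan recovers. The `original_plan_recovers` callback is a hypothetical stand-in for that manual observation, not an Azure API.

```python
def find_problematic_webjob(webjobs, original_plan_recovers):
    """Narrow a list of WebJob names down to the single suspected culprit.

    `original_plan_recovers(moved)` must return True when moving the WebJobs
    in `moved` to another plan makes the original plan healthy again, which
    means the culprit is among the moved WebJobs.
    """
    candidates = list(webjobs)
    if not candidates:
        raise ValueError("at least one WebJob is required")
    while len(candidates) > 1:
        moved = candidates[: len(candidates) // 2]
        if original_plan_recovers(moved):
            candidates = moved                      # culprit was moved out
        else:
            candidates = candidates[len(moved):]    # culprit stayed behind
    return candidates[0]
```

Halving the candidates on each round means that even a plan with dozens of WebJobs needs only a handful of moves. If more than one WebJob contributes to the exhaustion, this sketch finds at most one of them.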
176+
177+
## Related content
178+
179+
* [SNAT with App Service](https://4lowtherabbit.github.io/blogs/2019/10/SNAT/)
180+
* [Troubleshoot slow app performance issues in Azure App Service](/azure/app-service/troubleshoot-performance-degradation)
14181

15182
[!INCLUDE [azure-help-support](~/includes/azure-help-support.md)]
16183
