 title: Diagnose and troubleshoot Azure Cosmos DB Java Async SDK| Microsoft Docs
-description: Use features like client-side logging, and other third-party tools to identify, diagnose, and troubleshoot Azure Cosmos DB issues.
+description: Use features like client-side logging and other third-party tools to identify, diagnose, and troubleshoot Azure Cosmos DB issues.
 services: cosmos-db
 author: moderakh
@@ -13,54 +13,60 @@ ms.component: cosmosdb-sql
 ms.topic: troubleshooting
 ---

-# Troubleshooting issues when using Java Async SDK with Azure Cosmos DB SQL API accounts
-This article covers common issues, workarounds, diagnostics steps, and tools when using [Java Async ADK](sql-api-sdk-async-java.md) with Azure Cosmos DB SQL API accounts.
-Java Async SDK provides client-side logical representation to access Azure Cosmos DB SQL API. This article describes the tools and approaches to help you if you run into any issues.
+# Troubleshoot issues when you use the Java Async SDK with Azure Cosmos DB SQL API accounts
+This article covers common issues, workarounds, diagnostic steps, and tools when you use the [Java Async SDK](sql-api-sdk-async-java.md) with Azure Cosmos DB SQL API accounts.
+The Java Async SDK provides client-side logical representation to access the Azure Cosmos DB SQL API. This article describes tools and approaches to help you if you run into any issues.

 Start with this list:
-* Take a look at the [Common issues and workarounds] section in this article.
-* Our SDK is [open-source on github](https://github.com/Azure/azure-cosmosdb-java) and we have [issues section](https://github.com/Azure/azure-cosmosdb-java/issues) that we actively monitor. Check if you find any similar issue with a workaround already filed.
-* Review [performance tips](performance-tips-async-java.md) and follow the suggested practices.
-* Follow the rest of this article, if you didn't find a solution, file a [GitHub issue](https://github.com/Azure/azure-cosmosdb-java/issues).
+
+* Take a look at the [Common issues and workarounds] section in this article.
+* Look at the SDK, which is available [open source on GitHub](https://github.com/Azure/azure-cosmosdb-java). It has an [issues section](https://github.com/Azure/azure-cosmosdb-java/issues) that's actively monitored. Check to see if any similar issue with a workaround is already filed.
+* Review the [performance tips](performance-tips-async-java.md), and follow the suggested practices.
+* Read the rest of this article, if you didn't find a solution. Then file a [GitHub issue](https://github.com/Azure/azure-cosmosdb-java/issues).

 ## <a name="common-issues-workarounds"></a>Common issues and workarounds
-* Make sure the app is running on the same region as your Cosmos DB account.
-* Check the CPU usage on the host where the app is running. If CPU usage is 90% or more, consider running your app on a host with higher configuration or distribute the load on more machines.
+* Make sure the app is running on the same region as your Azure Cosmos DB account.
+* Check the CPU usage on the host where the app is running. If CPU usage is 90 percent or more, run your app on a host with a higher configuration. Or you can distribute the load on more machines.

 #### Connection throttling
-Connection throttling can happen due to either [Connection limit on host machine], or [Azure SNAT (PAT) port exhaustion]:
+Connection throttling can happen because of either a [connection limit on a host machine] or [Azure SNAT (PAT) port exhaustion].

-##### <a name="connection-limit-on-host"></a>Connection limit on host machine
-Some Linux systems (like 'Red Hat') have an upper limit on the total number of open files. Sockets in Linux are implemented as files, so this number limits the total number of connections too.
-Run the following command:
+##### <a name="connection-limit-on-host"></a>Connection limit on a host machine
+Some Linux systems, such as Red Hat, have an upper limit on the total number of open files. Sockets in Linux are implemented as files, so this number limits the total number of connections, too.
+Run the following command.

 ```bash
 ulimit -a
 ```
-The number of open files ("nofile") needs to be large enough (at least as double as your connection pool size). Read more detail in [performance tips](performance-tips-async-java.md).
+The number of max allowed open files, which are identified as "nofile," needs to be at least double your connection pool size. For more information, see [Performance tips](performance-tips-async-java.md).
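
As a rough illustration of the "at least double" rule, the following Java sketch compares the open-file limit of the current process with a connection pool size. The class name `OpenFileLimitCheck` and the pool size of 1,000 are hypothetical. The sketch reads the limit through `com.sun.management.UnixOperatingSystemMXBean`, which is typically available only on Unix-like JVMs, so treat that part as an assumption about your runtime and not as part of the SDK.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class OpenFileLimitCheck {

    // Rule of thumb from the text: "nofile" should be at least double the pool size.
    static boolean isLimitSufficient(long maxOpenFiles, int connectionPoolSize) {
        return maxOpenFiles >= 2L * connectionPoolSize;
    }

    // Returns the open-file limit of this process, or -1 when the JVM doesn't expose it.
    static long maxOpenFiles() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        int poolSize = 1000; // Hypothetical pool size; use the value from your own configuration.
        long limit = maxOpenFiles();
        if (limit < 0) {
            System.out.println("Open-file limit isn't available on this JVM.");
        } else {
            System.out.println("nofile=" + limit + ", sufficient=" + isLimitSufficient(limit, poolSize));
        }
    }
}
```

If the check reports an insufficient limit, raise "nofile" for the account that runs the app, or lower the pool size.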

 ##### <a name="snat"></a>Azure SNAT (PAT) port exhaustion

-If your app is deployed on Azure VM without a public IP address, by default [Azure SNAT ports](https://docs.microsoft.com/azure/load-balancer/load-balancer-outbound-connections#preallocatedports) are used to establish connections to any endpoint outside of your VM. The number of connections allowed from the VM to the Cosmos DB endpoint is limited by the [Azure SNAT configuration](https://docs.microsoft.com/azure/load-balancer/load-balancer-outbound-connections#preallocatedports).
+If your app is deployed on Azure Virtual Machines without a public IP address, by default [Azure SNAT ports](https://docs.microsoft.com/azure/load-balancer/load-balancer-outbound-connections#preallocatedports) establish connections to any endpoint outside of your VM. The number of connections allowed from the VM to the Azure Cosmos DB endpoint is limited by the [Azure SNAT configuration](https://docs.microsoft.com/azure/load-balancer/load-balancer-outbound-connections#preallocatedports).
+
+Azure SNAT ports are used only when your VM has a private IP address and a process from the VM tries to connect to a public IP address. There are two workarounds to avoid Azure SNAT limitation:

-The Azure SNAT ports are used only when your Azure VM has a private IP address and a process from the VM attempts to establish a connection to a public IP address. So, there are two workarounds to avoid Azure SNAT limitation:
-* Add your Azure Cosmos DB service endpoint to the subnet of your Azure VM VNET as explained in [Enabling VNET Service Endpoint](https://docs.microsoft.com/azure/virtual-network/virtual-network-service-endpoints-overview). When service endpoint is enabled, the requests no longer are sent from a public IP to cosmos DB instead the VNET and subnet identity is sent. This change may result in firewall drops if only public IPs are allowed. If you are using firewall, when enabling service endpoint, add subnet to firewall using [VNET ACLs](https://docs.microsoft.com/azure/virtual-network/virtual-networks-acl).
-* Assign a public IP to your Azure VM.
+* Add your Azure Cosmos DB service endpoint to the subnet of your Azure Virtual Machines virtual network. For more information, see [Azure Virtual Network service endpoints](https://docs.microsoft.com/azure/virtual-network/virtual-network-service-endpoints-overview).

-#### Http proxy
+When the service endpoint is enabled, the requests are no longer sent from a public IP to Azure Cosmos DB. Instead, the virtual network and subnet identity are sent. This change might result in firewall drops if only public IPs are allowed. If you use a firewall, when you enable the service endpoint, add a subnet to the firewall by using [Virtual Network ACLs](https://docs.microsoft.com/azure/virtual-network/virtual-networks-acl).
+* Assign a public IP to your Azure VM.

-If you use an HttpProxy, make sure your HttpProxy can support the number of connections configured in the SDK `ConnectionPolicy`.
+#### HTTP proxy
+
+If you use an HTTP proxy, make sure it can support the number of connections configured in the SDK `ConnectionPolicy`.
-The SDK uses [Netty](https://netty.io/) IO library for communicating to Azure Cosmos DB Service. We have Async API and we use non-blocking IO APIs of netty. The SDK's IO work is performed on IO netty threads. The number of IO netty threads is configured to be the same as the number of the CPU cores of the app machine. The netty IO threads are only meant to be used for non blocking netty IO work. The SDK returns the API invocation result on one of the netty IO threads to the apps's code. If the app after receiving results on the netty thread performs a long lasting operation on the netty thread, that may result in SDK to not have enough number of IO threads for performing its internal IO work. Such app coding may result in low throughput, high latency, and `io.netty.handler.timeout.ReadTimeoutException` failures. The workaround is to switch the thread when you know the operation will take time.
+The SDK uses the [Netty](https://netty.io/) IO library to communicate with Azure Cosmos DB. The SDK has Async APIs and uses non-blocking IO APIs of Netty. The SDK's IO work is performed on IO Netty threads. The number of IO Netty threads is configured to be the same as the number of CPU cores of the app machine.
+
+The Netty IO threads are meant to be used only for non-blocking Netty IO work. The SDK returns the API invocation result on one of the Netty IO threads to the app's code. If the app performs a long-lasting operation after it receives results on the Netty thread, the SDK might not have enough IO threads to perform its internal IO work. Such app coding might result in low throughput, high latency, and `io.netty.handler.timeout.ReadTimeoutException` failures. The workaround is to switch the thread when you know the operation takes time.

-For example, the following code snippet shows that if you perform longlasting work, which takes more than a few milliseconds, on the netty thread, you eventually can get into a state where no netty IO thread is present to process IO work, and as a result you get ReadTimeoutException:
+For example, take a look at the following code snippet. You might perform long-lasting work that takes more than a few milliseconds on the Netty thread. If so, you eventually can get into a state where no Netty IO thread is present to process IO work. As a result, you get a ReadTimeoutException failure.
-Whenever you need to do time taking work (for example, computationally heavy work, blocking IO), switch the thread to a worker provided by your `customScheduler` using `.observeOn(customScheduler)` API.
+You might need to do work that takes time, for example, computationally heavy work or blocking IO. In this case, switch the thread to a worker provided by your `customScheduler` by using the `.observeOn(customScheduler)` API.
 ```java
-    .observeOn(customScheduler) //switches the thread.
+    .observeOn(customScheduler) //Switches the thread.
     .subscribe(
         // ...
     );
 ```
-By using `observeOn(customScheduler)`, you release the netty IO thread and switch to your own custom thread provided by customScheduler.
-This modification will solve the problem, and you won't get `io.netty.handler.timeout.ReadTimeoutException` failure anymore.
+By using `observeOn(customScheduler)`, you release the Netty IO thread and switch to your own custom thread provided by the custom scheduler.
+This modification solves the problem. You won't get a `io.netty.handler.timeout.ReadTimeoutException` failure anymore.
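
The thread switch can also be sketched with only the Java standard library. The following example is an analogy and not SDK code. It uses `CompletableFuture` instead of the SDK's observables, a single-thread executor named `io-thread` as a stand-in for a Netty IO thread, and a pool named `worker-thread` as a stand-in for the custom scheduler. All of these names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadSwitchSketch {

    // Produces a result on the IO stand-in, then hands the slow follow-up work to the workers.
    static String handleOffIoThread(ExecutorService ioThread, ExecutorService workers) {
        return CompletableFuture
                .supplyAsync(() -> "result", ioThread)
                // thenApplyAsync with an executor plays the role of observeOn(customScheduler):
                // the callback runs on a worker thread, so the IO thread is released right away.
                .thenApplyAsync(r -> r + " handled on " + Thread.currentThread().getName(), workers)
                .join();
    }

    static String demo() {
        ExecutorService io = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService workers = Executors.newFixedThreadPool(2, r -> new Thread(r, "worker-thread"));
        try {
            return handleOffIoThread(io, workers);
        } finally {
            io.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point is the same as in the SDK case: keep the callback that receives the result short, and move anything slow to threads that you own.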

 ### Connection pool exhausted issue

-`PoolExhaustedException` is a client-side failure. If you get this failure often, that's indication that your app workload is higher than what the SDK connection pool can serve. Increasing connection pool size or distributing the load on multiple apps may help.
+`PoolExhaustedException` is a client-side failure. This failure indicates that your app workload is higher than what the SDK connection pool can serve. Increase the connection pool size or distribute the load on multiple apps.

 ### Request rate too large
-This failure is a server-side failure indicating that you consumed your provisioned throughput and should retry later. If you get this failure often, consider increasing the collection throughput.
+This failure is a server-side failure. It indicates that you consumed your provisioned throughput. Retry later. If you get this failure often, consider an increase in the collection throughput.
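
To make "retry later" concrete, the following Java sketch waits and retries when an operation reports that the request rate is too large. It is a minimal illustration under stated assumptions: `RequestRateTooLargeException`, its `retryAfterMillis` field, and `callWithRetry` are hypothetical names and are not SDK types. Check the SDK's own retry settings before you add a loop like this.

```java
import java.util.function.Supplier;

final class ThrottleRetrySketch {

    // Hypothetical stand-in for a "request rate too large" failure; not an SDK type.
    static final class RequestRateTooLargeException extends RuntimeException {
        final long retryAfterMillis;

        RequestRateTooLargeException(long retryAfterMillis) {
            super("Request rate too large");
            this.retryAfterMillis = retryAfterMillis;
        }
    }

    // Runs the operation, waiting the suggested delay between attempts, up to maxAttempts.
    static <T> T callWithRetry(Supplier<T> operation, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RequestRateTooLargeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // Give up and surface the failure to the caller.
                }
                try {
                    Thread.sleep(e.retryAfterMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
    }

    // Simulates an operation that is throttled twice and then succeeds; returns the call count.
    static int demoAttempts() {
        int[] calls = {0};
        callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RequestRateTooLargeException(10);
            }
            return "ok";
        }, 5);
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("Attempts used: " + demoAttempts());
    }
}
```

Retries only smooth out short bursts. If the loop regularly runs out of attempts, that is the signal to increase the collection throughput.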

 ### Failure connecting to Azure Cosmos DB emulator

-Cosmos DB emulator HTTPS certificate is self-signed. For SDK to work with emulator you should import the emulator certificate to Java TrustStore. As explained [here](local-emulator-export-ssl-certificates.md).
+The Azure Cosmos DB emulator HTTPS certificate is self-signed. For the SDK to work with the emulator, import the emulator certificate to a Java TrustStore. For more information, see [Export Azure Cosmos DB emulator certificates](local-emulator-export-ssl-certificates.md).
-Review [sfl4j logging manual](https://www.slf4j.org/manual.html) for more information.
+For more information, see the [sfl4j logging manual](https://www.slf4j.org/manual.html).

 ## <a name="netstats"></a>OS network statistics
-Run netstat command to get a sense of how many connections are in `Established` state, `CLOSE_WAIT` state, etc.
+Run the netstat command to get a sense of how many connections are in states such as `ESTABLISHED` and `CLOSE_WAIT`.

-On Linux you can run the following command:
+On Linux, you can run the following command.
 ```bash
 netstat -nap
 ```
-Filter the result to only connections to Cosmos DB endpoint.
+Filter the result to only connections to the Azure Cosmos DB endpoint.

-Apparently, the number of connections to Cosmos DB endpoint in `Established` state should be not greater than your configured connection pool size.
+The number of connections to the Azure Cosmos DB endpoint in the `ESTABLISHED` state can't be greater than your configured connection pool size.

-If there are many connections to Cosmos DB endpoint in `CLOSE_WAIT` state, for example more than 1000 connections, that's an indication of connections are established and torn down quickly, which may potentially cause problems. Review [Common issues and workarounds] section for more detail.
+Many connections to the Azure Cosmos DB endpoint might be in the `CLOSE_WAIT` state. There might be more than 1,000. A number that high indicates that connections are established and torn down quickly. This situation potentially causes problems. For more information, see the [Common issues and workarounds] section.
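
The filtering and counting step can be sketched in Java as follows. The sketch assumes the usual Linux `netstat -nap` column order (protocol, receive queue, send queue, local address, foreign address, state), and it matches the endpoint by IP address. The class name `NetstatStateCount` and every address in the sample are hypothetical, so substitute captured output and the IP address that your account endpoint resolves to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NetstatStateCount {

    // Counts TCP connections per state for lines whose foreign address is the given IP.
    static Map<String, Integer> countByState(List<String> netstatLines, String endpointIp) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 6 || !fields[0].startsWith("tcp")) {
                continue; // Skips headers, UDP rows, and Unix socket rows.
            }
            int portSeparator = fields[4].lastIndexOf(':');
            if (portSeparator < 0) {
                continue;
            }
            String host = fields[4].substring(0, portSeparator);
            // Also accepts the IPv4-mapped form that tcp6 rows can show.
            if (host.equals(endpointIp) || host.equals("::ffff:" + endpointIp)) {
                counts.merge(fields[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    // Hypothetical sample output; the addresses come from documentation-only ranges.
    static Map<String, Integer> sample() {
        List<String> lines = Arrays.asList(
                "Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name",
                "tcp 0 0 10.0.0.4:50001 203.0.113.10:443 ESTABLISHED 4242/java",
                "tcp 0 0 10.0.0.4:50002 203.0.113.10:443 ESTABLISHED 4242/java",
                "tcp 1 0 10.0.0.4:50003 203.0.113.10:443 CLOSE_WAIT 4242/java",
                "tcp 0 0 10.0.0.4:50004 198.51.100.7:443 ESTABLISHED 4242/java");
        return countByState(lines, "203.0.113.10");
    }

    public static void main(String[] args) {
        System.out.println(sample());
    }
}
```

Compare the `ESTABLISHED` count with your configured connection pool size, and watch whether the `CLOSE_WAIT` count keeps growing.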

 <!--Anchors-->
 [Common issues and workarounds]: #common-issues-workarounds