
Commit 0bc8848

Merge pull request #208580 from ealsur/users/ealsur/netpolish
Cosmos DB: Polishing NET troubleshooting docs
2 parents e874cd3 + 38cc554 commit 0bc8848


3 files changed with 25 additions and 51 deletions

articles/cosmos-db/sql/troubleshoot-dot-net-sdk-slow-request.md

Lines changed: 7 additions & 3 deletions
@@ -4,7 +4,7 @@ description: Learn how to diagnose and fix slow requests when you use Azure Cosm
 author: ealsur
 ms.service: cosmos-db
 ms.subservice: cosmosdb-sql
-ms.date: 07/08/2022
+ms.date: 08/19/2022
 ms.author: maquaran
 ms.topic: troubleshooting
 ms.reviewer: mjbrown
@@ -27,8 +27,12 @@ When you design your application, [follow the .NET SDK best practices](performan
 Consider the following when developing your application:

 * The application should be in the same region as your Azure Cosmos DB account.
-* The SDK has several caches that have to be initialized, which might slow down the first few requests.
-* The connectivity mode should be direct and TCP.
+* Your [ApplicationRegion](/dotnet/api/microsoft.azure.cosmos.cosmosclientoptions.applicationregion), [ApplicationPreferredRegions](/dotnet/api/microsoft.azure.cosmos.cosmosclientoptions.applicationpreferredregions), or [PreferredLocations](/dotnet/api/microsoft.azure.documents.client.connectionpolicy.preferredlocations) for V2 SDK configuration is should reflect your regional preference and point to the region your application is deployed on.
+* There might be a bottleneck on the Network interface because of high traffic. If the application is running on Azure Virtual Machines, there are possible workarounds:
+* Consider using a [Virtual Machine with Accelerated Networking enabled](../../virtual-network/create-vm-accelerated-networking-powershell.md).
+* Enable [Accelerated Networking on an existing Virtual Machine](../../virtual-network/create-vm-accelerated-networking-powershell.md#enable-accelerated-networking-on-existing-vms).
+* Consider using a [higher end Virtual Machine](../../virtual-machines/sizes.md).
+* Prefer [direct connectivity mode](sql-sdk-connection-modes.md).
 * Avoid high CPU. Make sure to look at the maximum CPU and not the average, which is the default for most logging systems. Anything above roughly 40 percent can increase the latency.

 ## Metadata operations

articles/cosmos-db/sql/troubleshoot-dot-net-sdk.md

Lines changed: 4 additions & 39 deletions
@@ -3,7 +3,7 @@ title: Diagnose and troubleshoot issues when using Azure Cosmos DB .NET SDK
 description: Use features like client-side logging and other third-party tools to identify, diagnose, and troubleshoot Azure Cosmos DB issues when using .NET SDK.
 author: seesharprun
 ms.service: cosmos-db
-ms.date: 03/05/2021
+ms.date: 08/19/2022
 ms.author: sidandrews
 ms.reviewer: mjbrown
 ms.subservice: cosmosdb-sql
@@ -63,9 +63,8 @@ If your app is deployed on [Azure Virtual Machines without a public IP address](
 * Assign a [public IP to your Azure VM](../../load-balancer/troubleshoot-outbound-connection.md#configure-an-individual-public-ip-on-vm).

 ### <a name="high-network-latency"></a>High network latency
-High network latency can be identified by using the [diagnostics string](/dotnet/api/microsoft.azure.documents.client.resourceresponsebase.requestdiagnosticsstring) in the V2 SDK or [diagnostics](/dotnet/api/microsoft.azure.cosmos.responsemessage.diagnostics#Microsoft_Azure_Cosmos_ResponseMessage_Diagnostics) in V3 SDK.

-If no [timeouts](troubleshoot-dot-net-sdk-request-timeout.md) are present and the diagnostics show single requests where the high latency is evident.
+High network latency can be identified by using the diagnostics.

 # [V3 SDK](#tab/diagnostics-v3)

@@ -76,20 +75,6 @@ ItemResponse<MyItem> response = await container.CreateItemAsync<MyItem>(item);
 Console.WriteLine(response.Diagnostics.ToString());
 ```

-Network interactions in the diagnostics will be for example:
-
-```json
-{
-"name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
-"id": "0e026cca-15d3-4cf6-bb07-48be02e1e82e",
-"component": "Transport",
-"start time": "12: 58: 20: 032",
-"duration in milliseconds": 1638.5957
-}
-```
-
-Where the `duration in milliseconds` would show the latency.
-
 # [V2 SDK](#tab/diagnostics-v2)

 The diagnostics are available when the client is configured in [direct mode](sql-sdk-connection-modes.md), through the `RequestDiagnosticsString` property:
@@ -98,33 +83,13 @@ The diagnostics are available when the client is configured in [direct mode](sql
 ResourceResponse<Document> response = await client.ReadDocumentAsync(documentLink, new RequestOptions() { PartitionKey = new PartitionKey(partitionKey) });
 Console.WriteLine(response.RequestDiagnosticsString);
 ```
-
-And the latency would be on the difference between `ResponseTime` and `RequestStartTime`:
-
-```bash
-RequestStartTime: 2020-03-09T22:44:49.5373624Z, RequestEndTime: 2020-03-09T22:44:49.9279906Z, Number of regions attempted:1
-ResponseTime: 2020-03-09T22:44:49.9279906Z, StoreResult: StorePhysicalAddress: rntbd://..., ...
-```
 ---

-This latency can have multiple causes:
-
-* Your application is not running in the same region as your Azure Cosmos DB account.
-* Your [PreferredLocations](/dotnet/api/microsoft.azure.documents.client.connectionpolicy.preferredlocations) or [ApplicationRegion](/dotnet/api/microsoft.azure.cosmos.cosmosclientoptions.applicationregion) configuration is incorrect and is trying to connect to a different region to where your application is currently running on.
-* There might be a bottleneck on the Network interface because of high traffic. If the application is running on Azure Virtual Machines, there are possible workarounds:
-* Consider using a [Virtual Machine with Accelerated Networking enabled](../../virtual-network/create-vm-accelerated-networking-powershell.md).
-* Enable [Accelerated Networking on an existing Virtual Machine](../../virtual-network/create-vm-accelerated-networking-powershell.md#enable-accelerated-networking-on-existing-vms).
-* Consider using a [higher end Virtual Machine](../../virtual-machines/sizes.md).
+Please see our [latency troubleshooting guide](troubleshoot-dot-net-sdk-slow-request.md) once you have obtained diagnostics for the affected operations.

 ### Common query issues

-The [query metrics](sql-api-query-metrics.md) will help determine where the query is spending most of the time. From the query metrics, you can see how much of it is being spent on the back-end vs the client. Learn more about [troubleshooting query performance](troubleshoot-query-performance.md).
-
-* If the back-end query returns quickly, and spends a large time on the client check the load on the machine. It's likely that there are not enough resource and the SDK is waiting for resources to be available to handle the response.
-* If the back-end query is slow, try [optimizing the query](troubleshoot-query-performance.md) and looking at the current [indexing policy](../index-overview.md)
-
-> [!NOTE]
-> For improved performance, we recommend Windows 64-bit host processing. The SQL SDK includes a native ServiceInterop.dll to parse and optimize queries locally. ServiceInterop.dll is supported only on the Windows x64 platform. For Linux and other unsupported platforms where ServiceInterop.dll isn't available, an additional network call will be made to the gateway to get the optimized query.
+The [query metrics](sql-api-query-metrics.md) will help determine where the query is spending most of the time. From the query metrics, you can see how much of it is being spent on the back-end vs the client. Learn more on the [query performance guide](performance-tips-query-sdk.md?pivots=programming-language-csharp).

 If you encounter the following error: `Unable to load DLL 'Microsoft.Azure.Cosmos.ServiceInterop.dll' or one of its dependencies:` and are using Windows, you should upgrade to the latest Windows version.

articles/cosmos-db/sql/troubleshoot-service-unavailable.md

Lines changed: 14 additions & 9 deletions
@@ -4,7 +4,7 @@ description: Learn how to diagnose and fix Azure Cosmos DB service unavailable e
 author: rothja
 ms.service: cosmos-db
 ms.subservice: cosmosdb-sql
-ms.date: 08/06/2020
+ms.date: 08/19/2022
 ms.author: jroth
 ms.topic: troubleshooting
 ms.reviewer: mjbrown
@@ -13,7 +13,15 @@ ms.reviewer: mjbrown
 # Diagnose and troubleshoot Azure Cosmos DB service unavailable exceptions
 [!INCLUDE[appliesto-sql-api](../includes/appliesto-sql-api.md)]

-The SDK wasn't able to connect to Azure Cosmos DB.
+The SDK wasn't able to connect to Azure Cosmos DB. This scenario can be transient or permanent depending on the network conditions.
+
+It is important to make sure the application design is following our [guide for designing resilient applications with Azure Cosmos DB SDKs](conceptual-resilient-sdk-applications.md) to make sure it correctly reacts to different network conditions. Your application should have retries in place for service unavailable errors.
+
+When evaluating the case for service unavailable errors:
+
+* What is the impact measured in volume of operations affected compared to the operations succeeding? Is it within the service SLAs?
+* Is the P99 latency affected?
+* Are the failures affecting all your application instances or only a subset? When the issue is reduced to a subset of instances, it's commonly a problem related to those instances.

 ## Troubleshooting steps
 The following list contains known causes and solutions for service unavailable exceptions.
@@ -22,12 +30,7 @@ The following list contains known causes and solutions for service unavailable e
 Verify that all the [required ports](sql-sdk-connection-modes.md#service-port-ranges) are enabled.

 ### Client-side transient connectivity issues
-Service unavailable exceptions can surface when there are transient connectivity problems that are causing timeouts. Typically, the stack trace related to this scenario will contain a `TransportException` error. For example:
-
-```C#
-TransportException: A client transport error occurred: The request timed out while waiting for a server response.
-(Time: xxx, activity ID: xxx, error code: ReceiveTimeout [0x0010], base error: HRESULT 0x80131500
-```
+Service unavailable exceptions can surface when there are transient connectivity problems that are causing timeouts and can be safely retried following the [design recommendations](conceptual-resilient-sdk-applications.md#timeouts-and-connectivity-related-failures-http-408503).

 Follow the [request timeout troubleshooting steps](troubleshoot-dot-net-sdk-request-timeout.md#troubleshooting-steps) to resolve it.

@@ -37,4 +40,6 @@ Check the [Azure status](https://azure.status.microsoft/status) to see if there'

 ## Next steps
 * [Diagnose and troubleshoot](troubleshoot-dot-net-sdk.md) issues when you use the Azure Cosmos DB .NET SDK.
-* Learn about performance guidelines for [.NET v3](performance-tips-dotnet-sdk-v3-sql.md) and [.NET v2](performance-tips.md).
+* [Diagnose and troubleshoot](troubleshoot-java-sdk-v4-sql.md) issues when you use the Azure Cosmos DB Java SDK.
+* Learn about performance guidelines for [.NET](performance-tips-dotnet-sdk-v3-sql.md).
+* Learn about performance guidelines for [Java](performance-tips-java-sdk-v4-sql.md).
