articles/cosmos-db/troubleshoot-query-performance.md
Lines changed: 58 additions & 18 deletions
@@ -4,31 +4,31 @@ description: Learn how to identify, diagnose, and troubleshoot Azure Cosmos DB S
 author: ginamr
 ms.service: cosmos-db
 ms.topic: troubleshooting
-ms.date: 01/08/2020
+ms.date: 01/10/2020
 ms.author: girobins
 ms.subservice: cosmosdb-sql
 ms.reviewer: sngun
 ---
-# Guide for Optimizing Queries in Azure Cosmos DB
+# Troubleshoot query issues when using Azure Cosmos DB

-This document walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. While the steps outlined in this document should not be considered a “catch all” for potential query issues, we have consolidated most performance tips here. You should use this document as a starting place for troubleshooting for Azure Cosmos DB’s core (SQL) API.
+This article walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. While the steps outlined in this document should not be considered a “catch all” for potential query issues, we have consolidated the most common performance tips here. You should use this document as a starting place for troubleshooting for Azure Cosmos DB’s core (SQL) API.

 You can broadly categorize query optimizations in Azure Cosmos DB: Optimizations that reduce the Request Unit (RU) charge of the query and optimizations that just reduce latency. By reducing the RU charge of a query, you will almost certainly decrease latency as well.
-This document will use examples that can be recreated using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) data set.
+This document will use examples that can be recreated using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.

 ### Obtaining query metrics:

 When optimizing a query in Azure Cosmos DB, the first step is always to [obtain the query metrics](profile-sql-api-query.md) for your query. These are also available through the Azure Portal as shown below:
-After obtaining query metrics, compare the Retrieved Document Count with the Loaded Document Count for your query. Use this comparison to identify the relevant sections to reference below.
+After obtaining query metrics, compare the Retrieved Document Count with the Output Document Count for your query. Use this comparison to identify the relevant sections to reference below.

 You can reference the below section to understand the relevant query optimizations for your scenario:

 ### Query's RU charge is too high

-#### Loaded Document Count is significantly greater than Retrieved Document Count
+#### Retrieved Document Count is significantly greater than Output Document Count

 a. [Ensure that the indexing policy includes necessary paths](#ensure-that-the-indexing-policy-includes-necessary-paths)

@@ -42,7 +42,7 @@ e. [Optimize JOIN expressions by using a subquery](#optimize-join-expressions-by

 <br>

-#### Loaded Document Count is approximately equal to Retrieved Document Count
+#### Retrieved Document Count is approximately equal to Output Document Count

 a. [Avoid cross partition queries](#avoid-cross-partition-queries)

@@ -62,15 +62,53 @@ c. [Increasing MaxConcurrency](#increasing-maxconcurrency)

 d. [Increasing MaxBufferedItemCount](#increasing-maxbuffereditemcount)

-## Optimizations for queries where Loaded Document Count significantly exceeds Retrieved Document Count:
+## Optimizations for queries where Retrieved Document Count significantly exceeds Output Document Count:

-The Retrieved Document Count is the number of documents that will show up in the results of your query. The Loaded Document Count is the number of documents that needed to be scanned. If the Loaded Document Count is significantly higher than the Retrieved Document Count, then there was at least one part of your query that was unable to utilize the index.
+The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, then there was at least one part of your query that was unable to utilize the index and needed to do a scan.
+
+Below is an example of scan query that wasn't entirely served by the index.
+
+Query:
+
+```sql
+SELECT VALUE c.description
+FROM c
+WHERE UPPER(c.description) = "BABYFOOD, DESSERT, FRUIT DESSERT, WITHOUT ASCORBIC ACID, JUNIOR"
+```
+
+Query Metrics:
+
+```
+Retrieved Document Count : 60,951
+Retrieved Document Size : 399,998,938 bytes
+Output Document Count : 7
+Output Document Size : 510 bytes
+Index Utilization : 0.00 %
+Total Query Execution Time : 4,500.34 milliseconds
+  Query Preparation Times
+    Query Compilation Time : 0.09 milliseconds
+    Logical Plan Build Time : 0.05 milliseconds
+    Physical Plan Build Time : 0.04 milliseconds
+    Query Optimization Time : 0.01 milliseconds
+  Index Lookup Time : 0.01 milliseconds
+  Document Load Time : 4,177.66 milliseconds
+  Runtime Execution Times
+    Query Engine Times : 322.16 milliseconds
+    System Function Execution Time : 85.74 milliseconds
+    User-defined Function Execution Time : 0.00 milliseconds
+  Document Write Time : 0.01 milliseconds
+Client Side Metrics
+  Retry Count : 0
+  Request Charge : 4,059.95 RUs
+```
+
+Retrieved Document Count (60,951) is significantly greater than Output Document Count (7) so this query needed to do a scan.

 ## Ensure that the indexing policy includes necessary paths

 Your indexing policy should cover any properties included in WHERE clauses, ORDER BY clauses, JOINs, and most System Functions. The path specified in the index policy should match (case-sensitive) the property in the JSON documents.

-If we run a simple query on the nutrition data set, we observe a much lower RU charge when the property in the WHERE clause is indexed.
+If we run a simple query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset, we observe a much lower RU charge when the property in the WHERE clause is indexed.

 ### Original

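The comparison that the hunk above describes (Retrieved Document Count against Output Document Count) can be sketched in a few lines of Python. This is only an illustration: the function name `classify_query` and the cutoff `ratio_threshold=10` are assumptions made for the example, since the article says "significantly greater" without giving a number.

```python
def classify_query(retrieved_count: int, output_count: int,
                   ratio_threshold: float = 10.0) -> str:
    """Label a query by how many loaded documents were actually needed.

    ratio_threshold is a hypothetical cutoff chosen for illustration;
    the article does not define what "significantly greater" means.
    """
    if retrieved_count < 0 or output_count < 0:
        raise ValueError("document counts must be non-negative")
    # Avoid dividing by zero when the query returns no documents.
    ratio = retrieved_count / max(output_count, 1)
    if ratio >= ratio_threshold:
        return "retrieved >> output"   # part of the query likely scanned
    return "retrieved ~= output"       # query was mostly served by the index


# Counts taken from the query metrics in the hunk above.
print(classify_query(60_951, 7))  # prints "retrieved >> output"
```

A result of `retrieved >> output` points to the index-related optimizations, while `retrieved ~= output` points to the sections on cross partition queries and latency.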
@@ -124,13 +162,15 @@ You can add additional properties to the indexing policy at any time, with no im

 ## Understand which system functions utilize the index

-If the expression can be translated into a range of string values, then it can utilize the index; otherwise, it cannot. Here is the list of string functions that can utilize the index:
+If the expression can be translated into a range of string values, then it can utilize the index; otherwise, it cannot.
+
+Here is the list of string functions that can utilize the index:

 - STARTSWITH(str_expr, str_expr)
 - LEFT(str_expr, num_expr) = str_expr
-- SUBSTRING(str_expr, num_expr, num_expr) = str_expr, but only if first num_expr is
+- SUBSTRING(str_expr, num_expr, num_expr) = str_expr, but only if first num_expr is 0

-Some common system functions that must load each document are shown below:
+Some common system functions that do not use the index and must load each document are below:
@@ -144,7 +184,7 @@ Other parts of the query may still utilize the index despite the system function

 ## Optimize queries with both a filter and an ORDER BY clause

-While queries with a filter and an ORDER BY clause will normally utilize a range index, they will be more efficient if they can be served from a composite index.
+While queries with a filter and an ORDER BY clause will normally utilize a range index, they will be more efficient if they can be served from a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.

 ### Original

@@ -212,7 +252,7 @@ Updated indexing policy:

 ## Optimize queries that use DISTINCT

-It will be more efficient to find the DISTINCT set of results if the duplicate results are consecutive. Adding an ORDER BY clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to ORDER BY multiple properties, add a composite index.
+It will be more efficient to find the DISTINCT set of results if the duplicate results are consecutive. Adding an ORDER BY clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to ORDER BY multiple properties, add a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.

 ### Original

@@ -281,9 +321,9 @@ JOIN (SELECT VALUE s FROM s IN c.servings WHERE s.amount > 1)

 Assume that only one item in the tags array matches the filter, and there are five items for both nutrients and servings arrays. The JOIN expressions will then expand to 1 x 1 x 5 x 5 = 25 items, as opposed to 1,000 items in the first query.

-## Optimizations for queries where Loaded Document Count is approximately equal to Retrieved Document Count:
+## Optimizations for queries where Retrieved Document Count is approximately equal to Output Document Count:

-If the Loaded Document Count is approximately equal to the Retrieved Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Loaded Document Count may exceed Retrieved Document Count by 1. This should not be cause for concern.
+If the Retrieved Document Count is approximately equal to the Output Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Retrieved Document Count may exceed Output Document Count by 1. This should not be cause for concern.

 ## Avoid cross partition queries

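The JOIN arithmetic quoted in the context line of the hunk above can be checked with a short Python sketch. The per-array length of 10 for the unfiltered query is an assumption, chosen only because it is consistent with the 1,000 figure in the text; the helper name `join_tuple_count` is hypothetical.

```python
from math import prod


def join_tuple_count(array_lengths):
    """Tuples produced for one document when each JOIN iterates one array.

    Chained JOINs form a cross product, so the count is the product of
    the lengths of the arrays being joined.
    """
    return prod(array_lengths)


# Assumed: tags, nutrients, and servings each hold 10 items (not stated
# in this hunk; picked to match the 1,000 items mentioned in the text).
unfiltered = join_tuple_count([10, 10, 10])

# From the text: one matching tag, five nutrients, five servings.
filtered = join_tuple_count([1, 5, 5])

print(unfiltered, filtered)  # prints "1000 25"
```

Filtering inside the subquery shrinks each array before the cross product is formed, which is why the tuple count drops so sharply.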
@@ -356,7 +396,7 @@ Here is the relevant composite index:

 ## Common optimizations that reduce query latency (no impact on RU charge):

-In many cases, RU charge may be acceptable but query latency is still too high. The below sections give an overview of tips for reducing query latency. If you run the same query multiple times on the same data set, it will have the same RU charge each time. However, query latency may vary between query executions.
+In many cases, RU charge may be acceptable but query latency is still too high. The below sections give an overview of tips for reducing query latency. If you run the same query multiple times on the same dataset, it will have the same RU charge each time. However, query latency may vary between query executions.

 ## Improving proximity between your app and Azure Cosmos DB