Commit e2aafd4

committed
update query troubleshooting guide
1 parent a86b49b commit e2aafd4

1 file changed

+58
-18
lines changed

articles/cosmos-db/troubleshoot-query-performance.md

Lines changed: 58 additions & 18 deletions
@@ -4,31 +4,31 @@ description: Learn how to identify, diagnose, and troubleshoot Azure Cosmos DB S
44
author: ginamr
55
ms.service: cosmos-db
66
ms.topic: troubleshooting
7-
ms.date: 01/08/2020
7+
ms.date: 01/10/2020
88
ms.author: girobins
99
ms.subservice: cosmosdb-sql
1010
ms.reviewer: sngun
1111
---
12-
# Guide for Optimizing Queries in Azure Cosmos DB
12+
# Troubleshoot query issues when using Azure Cosmos DB
1313

14-
This document walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. While the steps outlined in this document should not be considered a “catch all” for potential query issues, we have consolidated most performance tips here. You should use this document as a starting place for troubleshooting for Azure Cosmos DB’s core (SQL) API.
14+
This article walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. While the steps outlined in this document should not be considered a “catch all” for potential query issues, we have consolidated the most common performance tips here. You should use this document as a starting place for troubleshooting for Azure Cosmos DB’s core (SQL) API.
1515

1616
You can broadly categorize query optimizations in Azure Cosmos DB: Optimizations that reduce the Request Unit (RU) charge of the query and optimizations that just reduce latency. By reducing the RU charge of a query, you will almost certainly decrease latency as well.
17-
This document will use examples that can be recreated using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) data set.
17+
This document will use examples that can be recreated using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
1818

1919
### Obtaining query metrics:
2020

2121
When optimizing a query in Azure Cosmos DB, the first step is always to [obtain the query metrics](profile-sql-api-query.md) for your query. These are also available through the Azure portal as shown below:
2222

2323
![Obtaining query metrics](./media/troubleshoot-query-performance/obtain-query-metrics.jpg)
2424

25-
After obtaining query metrics, compare the Retrieved Document Count with the Loaded Document Count for your query. Use this comparison to identify the relevant sections to reference below.
25+
After obtaining query metrics, compare the Retrieved Document Count with the Output Document Count for your query. Use this comparison to identify the relevant sections to reference below.
2626

2727
Refer to the sections below to find the relevant query optimizations for your scenario:
2828

2929
### Query's RU charge is too high
3030

31-
#### Loaded Document Count is significantly greater than Retrieved Document Count
31+
#### Retrieved Document Count is significantly greater than Output Document Count
3232

3333
a. [Ensure that the indexing policy includes necessary paths](#ensure-that-the-indexing-policy-includes-necessary-paths)
3434

@@ -42,7 +42,7 @@ e. [Optimize JOIN expressions by using a subquery](#optimize-join-expressions-by
4242

4343
<br>
4444

45-
#### Loaded Document Count is approximately equal to Retrieved Document Count
45+
#### Retrieved Document Count is approximately equal to Output Document Count
4646

4747
a. [Avoid cross partition queries](#avoid-cross-partition-queries)
4848

@@ -62,15 +62,53 @@ c. [Increasing MaxConcurrency](#increasing-maxconcurrency)
6262

6363
d. [Increasing MaxBufferedItemCount](#increasing-maxbuffereditemcount)
6464

65-
## Optimizations for queries where Loaded Document Count significantly exceeds Retrieved Document Count:
65+
## Optimizations for queries where Retrieved Document Count significantly exceeds Output Document Count:
6666

67-
The Retrieved Document Count is the number of documents that will show up in the results of your query. The Loaded Document Count is the number of documents that needed to be scanned. If the Loaded Document Count is significantly higher than the Retrieved Document Count, then there was at least one part of your query that was unable to utilize the index.
67+
The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, then there was at least one part of your query that was unable to utilize the index and needed to do a scan.
68+
69+
Below is an example of a scan query that wasn't entirely served by the index.
70+
71+
Query:
72+
73+
```sql
74+
SELECT VALUE c.description
75+
FROM c
76+
WHERE UPPER(c.description) = "BABYFOOD, DESSERT, FRUIT DESSERT, WITHOUT ASCORBIC ACID, JUNIOR"
77+
```
78+
79+
Query Metrics:
80+
81+
```
82+
Retrieved Document Count : 60,951
83+
Retrieved Document Size : 399,998,938 bytes
84+
Output Document Count : 7
85+
Output Document Size : 510 bytes
86+
Index Utilization : 0.00 %
87+
Total Query Execution Time : 4,500.34 milliseconds
88+
Query Preparation Times
89+
Query Compilation Time : 0.09 milliseconds
90+
Logical Plan Build Time : 0.05 milliseconds
91+
Physical Plan Build Time : 0.04 milliseconds
92+
Query Optimization Time : 0.01 milliseconds
93+
Index Lookup Time : 0.01 milliseconds
94+
Document Load Time : 4,177.66 milliseconds
95+
Runtime Execution Times
96+
Query Engine Times : 322.16 milliseconds
97+
System Function Execution Time : 85.74 milliseconds
98+
User-defined Function Execution Time : 0.00 milliseconds
99+
Document Write Time : 0.01 milliseconds
100+
Client Side Metrics
101+
Retry Count : 0
102+
Request Charge : 4,059.95 RUs
103+
```
104+
105+
Retrieved Document Count (60,951) is significantly greater than Output Document Count (7), so this query needed to do a scan.
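As a rough illustration of this comparison, the sketch below parses metrics text in the format shown above and computes the ratio of Retrieved Document Count to Output Document Count. The function names `parse_counts` and `likely_scan` and the threshold of 10 are assumptions made for this example; they are not part of any Azure Cosmos DB SDK.

```python
import re


def parse_counts(metrics_text):
    """Extract the two document counts from query metrics text.

    Assumes lines shaped like 'Retrieved Document Count : 60,951'.
    """
    counts = {}
    for name in ("Retrieved Document Count", "Output Document Count"):
        match = re.search(rf"{name}\s*:\s*([\d,]+)", metrics_text)
        if match is None:
            raise ValueError(f"missing metric: {name}")
        # Strip thousands separators before converting to int.
        counts[name] = int(match.group(1).replace(",", ""))
    return counts


def likely_scan(metrics_text, threshold=10.0):
    """Return (ratio, flag); the threshold is an arbitrary assumption."""
    counts = parse_counts(metrics_text)
    retrieved = counts["Retrieved Document Count"]
    output = counts["Output Document Count"]
    # Guard against division by zero when the query returns nothing.
    ratio = retrieved / max(output, 1)
    return ratio, ratio > threshold


sample = """
Retrieved Document Count                 :          60,951
Retrieved Document Size                  :     399,998,938 bytes
Output Document Count                    :               7
"""
ratio, scan = likely_scan(sample)
print(f"ratio={ratio:.0f}, likely scan={scan}")
```

A large ratio only suggests that part of the query could not use the index; the metrics themselves remain the authoritative signal.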
68106

69107
## Ensure that the indexing policy includes necessary paths
70108

71109
Your indexing policy should cover any properties included in WHERE clauses, ORDER BY clauses, JOINs, and most System Functions. The path specified in the index policy should match (case-sensitive) the property in the JSON documents.
72110

73-
If we run a simple query on the nutrition data set, we observe a much lower RU charge when the property in the WHERE clause is indexed.
111+
If we run a simple query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset, we observe a much lower RU charge when the property in the WHERE clause is indexed.
74112

75113
### Original
76114

@@ -124,13 +162,15 @@ You can add additional properties to the indexing policy at any time, with no im
124162

125163
## Understand which system functions utilize the index
126164

127-
If the expression can be translated into a range of string values, then it can utilize the index; otherwise, it cannot. Here is the list of string functions that can utilize the index:
165+
If the expression can be translated into a range of string values, then it can utilize the index; otherwise, it cannot.
166+
167+
Here is the list of string functions that can utilize the index:
128168

129169
- STARTSWITH(str_expr, str_expr)
130170
- LEFT(str_expr, num_expr) = str_expr
131-
- SUBSTRING(str_expr, num_expr, num_expr) = str_expr, but only if first num_expr is
171+
- SUBSTRING(str_expr, num_expr, num_expr) = str_expr, but only if first num_expr is 0
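To make the idea of "a range of string values" concrete, here is a minimal sketch showing that a prefix match is equivalent to a range check on ordered strings. It uses Python's code-point string ordering purely as an illustration; it makes no claim about how Azure Cosmos DB implements its index, and the helper names are hypothetical.

```python
def prefix_range(prefix):
    """Return (low, high) so that low <= s < high exactly when s starts with prefix.

    Assumes a non-empty prefix whose last character is not the
    maximum Unicode code point.
    """
    if not prefix:
        raise ValueError("prefix must be non-empty")
    last = ord(prefix[-1])
    if last == 0x10FFFF:
        raise ValueError("cannot increment the last character")
    # The upper bound is the prefix with its last character bumped by one.
    return prefix, prefix[:-1] + chr(last + 1)


def starts_with_via_range(value, prefix):
    """Evaluate a prefix match as a half-open range comparison."""
    low, high = prefix_range(prefix)
    return low <= value < high


descriptions = ["Babyfood, dessert", "Babyfood", "Baby", "Bacon", "Cabbage"]
matches = [d for d in descriptions if starts_with_via_range(d, "Babyfood")]
print(matches)
```

A function such as UPPER has no comparable translation, because the stored value and the compared value are no longer ordered the same way, which is why it must load each document.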
132172

133-
Some common system functions that must load each document are shown below:
173+
Some common system functions that do not use the index and must load each document are listed below:
134174

135175
| **System Function** | **Ideas for Optimization** |
136176
| --------------------------------------- |------------------------------------------------------------ |
@@ -144,7 +184,7 @@ Other parts of the query may still utilize the index despite the system function
144184

145185
## Optimize queries with both a filter and an ORDER BY clause
146186

147-
While queries with a filter and an ORDER BY clause will normally utilize a range index, they will be more efficient if they can be served from a composite index.
187+
While queries with a filter and an ORDER BY clause will normally utilize a range index, they will be more efficient if they can be served from a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
148188

149189
### Original
150190

@@ -212,7 +252,7 @@ Updated indexing policy:
212252

213253
## Optimize queries that use DISTINCT
214254

215-
It will be more efficient to find the DISTINCT set of results if the duplicate results are consecutive. Adding an ORDER BY clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to ORDER BY multiple properties, add a composite index.
255+
It will be more efficient to find the DISTINCT set of results if the duplicate results are consecutive. Adding an ORDER BY clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to ORDER BY multiple properties, add a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
216256

217257
### Original
218258

@@ -281,9 +321,9 @@ JOIN (SELECT VALUE s FROM s IN c.servings WHERE s.amount > 1)
281321

282322
Assume that only one item in the tags array matches the filter, and there are five items for both nutrients and servings arrays. The JOIN expressions will then expand to 1 x 1 x 5 x 5 = 25 items, as opposed to 1,000 items in the first query.
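The arithmetic above can be checked with a small sketch that counts the cross product of the joined arrays for a single document. The array contents are placeholders, and the unfiltered size of 10 items per array is an assumption chosen only because it reproduces the 1,000 figure.

```python
from itertools import product


def join_cardinality(*arrays):
    """Count the tuples produced for one document by joining these arrays."""
    return sum(1 for _ in product(*arrays))


# After the subquery filters: 1 matching tag, 5 nutrients, 5 servings.
tags = ["t0"]
nutrients = [f"n{i}" for i in range(5)]
servings = [f"s{i}" for i in range(5)]
filtered = join_cardinality(tags, nutrients, servings)

# Assumed (hypothetical) unfiltered sizes: 10 items in each array.
unfiltered = join_cardinality(range(10), range(10), range(10))

print(filtered, unfiltered)
```

Because the counts multiply, filtering each array inside its subquery before the JOIN shrinks the intermediate result far more than filtering afterwards.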
283323

284-
## Optimizations for queries where Loaded Document Count is approximately equal to Retrieved Document Count:
324+
## Optimizations for queries where Retrieved Document Count is approximately equal to Output Document Count:
285325

286-
If the Loaded Document Count is approximately equal to the Retrieved Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Loaded Document Count may exceed Retrieved Document Count by 1. This should not be cause for concern.
326+
If the Retrieved Document Count is approximately equal to the Output Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Retrieved Document Count may exceed Output Document Count by 1. This should not be cause for concern.
287327

288328
## Avoid cross partition queries
289329

@@ -356,7 +396,7 @@ Here is the relevant composite index:
356396
357397
## Common optimizations that reduce query latency (no impact on RU charge):
358398
359-
In many cases, RU charge may be acceptable but query latency is still too high. The below sections give an overview of tips for reducing query latency. If you run the same query multiple times on the same data set, it will have the same RU charge each time. However, query latency may vary between query executions.
399+
In many cases, RU charge may be acceptable but query latency is still too high. The below sections give an overview of tips for reducing query latency. If you run the same query multiple times on the same dataset, it will have the same RU charge each time. However, query latency may vary between query executions.
360400
361401
## Improving proximity between your app and Azure Cosmos DB
362402
