Commit 9d11c5d

Merge pull request #112422 from timsander1/master
refresh query troubleshooting guide
2 parents 8ff4272 + 7cfb712 commit 9d11c5d

1 file changed, +67 -36 lines changed

articles/cosmos-db/troubleshoot-query-performance.md

Lines changed: 67 additions & 36 deletions

````diff
@@ -4,7 +4,7 @@ description: Learn how to identify, diagnose, and troubleshoot Azure Cosmos DB S
 author: timsander1
 ms.service: cosmos-db
 ms.topic: troubleshooting
-ms.date: 04/20/2020
+ms.date: 04/22/2020
 ms.author: tisande
 ms.subservice: cosmosdb-sql
 ms.reviewer: sngun
````

````diff
@@ -24,26 +24,28 @@ This article provides examples that you can re-create by using the [nutrition](h
 
 ## Common SDK issues
 
-- For best performance, follow the [Performance tips](performance-tips.md).
+Before reading this guide, it is helpful to consider common SDK issues that aren't related to the query engine.
+
+- For best performance, follow these [Performance tips](performance-tips.md).
 > [!NOTE]
 > For improved performance, we recommend Windows 64-bit host processing. The SQL SDK includes a native ServiceInterop.dll to parse and optimize queries locally. ServiceInterop.dll is supported only on the Windows x64 platform. For Linux and other unsupported platforms where ServiceInterop.dll isn't available, an additional network call will be made to the gateway to get the optimized query.
-- You can set a `MaxItemCount` for your queries but you can't specify a minimum item count.
+- The SDK allows setting a `MaxItemCount` for your queries but you can't specify a minimum item count.
 - Code should handle any page size, from zero to the `MaxItemCount`.
-- The number of items in a page will always be less than the specified `MaxItemCount`. However, `MaxItemCount` is strictly a maximum and there could be fewer results than this amount.
+- The number of items in a page will always be less or equal to the specified `MaxItemCount`. However, `MaxItemCount` is strictly a maximum and there could be fewer results than this amount.
 - Sometimes queries may have empty pages even when there are results on a future page. Reasons for this could be:
 - The SDK could be doing multiple network calls.
 - The query might be taking a long time to retrieve the documents.
 - All queries have a continuation token that will allow the query to continue. Be sure to drain the query completely. Look at the SDK samples, and use a `while` loop on `FeedIterator.HasMoreResults` to drain the entire query.
 
 ## Get query metrics
 
-When you optimize a query in Azure Cosmos DB, the first step is always to [get the query metrics](profile-sql-api-query.md) for your query. These metrics are also available through the Azure portal:
+When you optimize a query in Azure Cosmos DB, the first step is always to [get the query metrics](profile-sql-api-query.md) for your query. These metrics are also available through the Azure portal. Once you run your query in the Data Explorer, the query metrics are visible next to the **Results** tab:
 
 [ ![Getting query metrics](./media/troubleshoot-query-performance/obtain-query-metrics.png) ](./media/troubleshoot-query-performance/obtain-query-metrics.png#lightbox)
 
-After you get the query metrics, compare the Retrieved Document Count with the Output Document Count for your query. Use this comparison to identify the relevant sections to review in this article.
+After you get the query metrics, compare the **Retrieved Document Count** with the **Output Document Count** for your query. Use this comparison to identify the relevant sections to review in this article.
 
-The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, there was at least one part of your query that was unable to use the index and needed to do a scan.
+The **Retrieved Document Count** is the number of documents that the query engine needed to load. The **Output Document Count** is the number of documents that were needed for the results of the query. If the **Retrieved Document Count** is significantly higher than the **Output Document Count**, there was at least one part of your query that was unable to use an index and needed to do a scan.
 
 Refer to the following sections to understand the relevant query optimizations for your scenario.
 
````
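The paging rules in the hunk above can be illustrated with a small in-memory simulation. Those rules are: a page can hold anywhere from zero items up to `MaxItemCount`, empty pages can appear before later results, and a query must be drained with a `while` loop on `HasMoreResults`. This is a Python sketch under stated assumptions, not the Azure Cosmos DB SDK. `FakeFeedIterator`, `has_more_results`, `read_next()`, and `drain()` are hypothetical stand-ins for the .NET `FeedIterator` pattern.

```python
from typing import List


class FakeFeedIterator:
    """Hypothetical stand-in for an SDK feed iterator (not an Azure Cosmos DB API).

    Each read_next() call returns one page. A page may be empty, and no page
    holds more than max_item_count items.
    """

    def __init__(self, pages: List[List[dict]], max_item_count: int) -> None:
        if any(len(page) > max_item_count for page in pages):
            raise ValueError("a page can't hold more than max_item_count items")
        self._pages = list(pages)
        self._next = 0  # plays the role of the continuation token

    @property
    def has_more_results(self) -> bool:
        return self._next < len(self._pages)

    def read_next(self) -> List[dict]:
        page = self._pages[self._next]
        self._next += 1
        return page


def drain(iterator: FakeFeedIterator) -> List[dict]:
    """Collect every result, tolerating empty and partially filled pages."""
    results: List[dict] = []
    while iterator.has_more_results:
        # An empty page doesn't mean the query is done, so keep looping
        # until the iterator reports that no results remain.
        results.extend(iterator.read_next())
    return results


# Three pages with max_item_count=2: a partial page, an empty page, a full page.
pages = [[{"id": "1"}], [], [{"id": "2"}, {"id": "3"}]]
print([doc["id"] for doc in drain(FakeFeedIterator(pages, max_item_count=2))])
# ['1', '2', '3']
```

Code that stopped at the first empty page would return only `['1']` here. That is why the loop condition checks the iterator rather than the page size.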

````diff
@@ -57,19 +59,19 @@ Refer to the following sections to understand the relevant query optimizations f
 
 - [Understand which aggregate queries use the index.](#understand-which-aggregate-queries-use-the-index)
 
-- [Modify queries that have both a filter and an ORDER BY clause.](#modify-queries-that-have-both-a-filter-and-an-order-by-clause)
+- [Optimize queries that have both a filter and an ORDER BY clause.](#optimize-queries-that-have-both-a-filter-and-an-order-by-clause)
 
 - [Optimize JOIN expressions by using a subquery.](#optimize-join-expressions-by-using-a-subquery)
 
 <br>
 
 #### Retrieved Document Count is approximately equal to Output Document Count
 
-- [Avoid cross partition queries.](#avoid-cross-partition-queries)
+- [Minimize cross partition queries.](#minimize-cross-partition-queries)
 
 - [Optimize queries that have filters on multiple properties.](#optimize-queries-that-have-filters-on-multiple-properties)
 
-- [Modify queries that have both a filter and an ORDER BY clause.](#modify-queries-that-have-both-a-filter-and-an-order-by-clause)
+- [Optimize queries that have both a filter and an ORDER BY clause.](#optimize-queries-that-have-both-a-filter-and-an-order-by-clause)
 
 <br>
 
````

````diff
@@ -85,7 +87,7 @@ Refer to the following sections to understand the relevant query optimizations f
 
 ## Queries where Retrieved Document Count exceeds Output Document Count
 
-The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, there was at least one part of your query that was unable to use the index and needed to do a scan.
+The **Retrieved Document Count** is the number of documents that the query engine needed to load. The **Output Document Count** is the number of documents returned by the query. If the **Retrieved Document Count** is significantly higher than the **Output Document Count**, there was at least one part of your query that was unable to use an index and needed to do a scan.
 
 Here's an example of scan query that wasn't entirely served by the index:
 
````
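One way to apply the comparison described above is to compute the ratio of the two counts. The Python sketch below does this. The `scan_ratio` and `likely_scan` helpers are hypothetical, and so is the default threshold of 10, because the article only says the Retrieved Document Count is "significantly higher" without giving a cutoff.

```python
def scan_ratio(retrieved: int, output: int) -> float:
    """Documents loaded by the query engine per document returned."""
    if output == 0:
        # Nothing returned: any retrieved document was unnecessary work.
        return float("inf") if retrieved > 0 else 0.0
    return retrieved / output


def likely_scan(retrieved: int, output: int, threshold: float = 10.0) -> bool:
    """Flag metrics suggesting part of the query couldn't use an index.

    The default threshold is an illustrative assumption, not an Azure Cosmos DB rule.
    """
    return scan_ratio(retrieved, output) >= threshold


# Metrics from the UPPER() example in the next hunk: 60,951 retrieved, 7 output.
print(round(scan_ratio(60_951, 7)))  # 8707
print(likely_scan(60_951, 7))        # True
# A TOP query that retrieves one extra document is not a scan.
print(likely_scan(11, 10))           # False
```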

````diff
@@ -123,20 +125,25 @@ Client Side Metrics
 Request Charge : 4,059.95 RUs
 ```
 
-The Retrieved Document Count (60,951) is significantly higher than the Output Document Count (7), so this query needed to do a scan. In this case, the system function [UPPER()](sql-query-upper.md) doesn't use the index.
+The **Retrieved Document Count** (60,951) is significantly higher than the **Output Document Count** (7), implying that this query resulted in a document scan. In this case, the system function [UPPER()](sql-query-upper.md) doesn't use an index.
 
 ### Include necessary paths in the indexing policy
 
-Your indexing policy should cover any properties included in `WHERE` clauses, `ORDER BY` clauses, `JOIN`, and most system functions. The path specified in the index policy should match (case-sensitive) the property in the JSON documents.
+Your indexing policy should cover any properties included in `WHERE` clauses, `ORDER BY` clauses, `JOIN`, and most system functions. The desired paths specified in the index policy should match the properties in the JSON documents.
+
+> [!NOTE]
+> Properties in Azure Cosmos DB indexing policy are case-sensitive
 
-If you run a simple query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset, you observe a much lower RU charge when the property in the `WHERE` clause is indexed:
+If you run the following simple query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset, you will observe a much lower RU charge when the property in the `WHERE` clause is indexed:
 
 #### Original
 
 Query:
 
 ```sql
-SELECT * FROM c WHERE c.description = "Malabar spinach, cooked"
+SELECT *
+FROM c
+WHERE c.description = "Malabar spinach, cooked"
 ```
 
 Indexing policy:
````
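The new note says that indexing-policy paths are case-sensitive. The Python sketch below shows why a path written as `/Description/?` doesn't help a filter on `c.description`. The `is_indexed` helper is hypothetical and deliberately simplified. It handles only included paths ending in `/?` (the scalar at that exact path) or `/*` (everything below a prefix), and it ignores excluded paths and how they take precedence.

```python
from typing import Iterable


def is_indexed(included_paths: Iterable[str], property_path: str) -> bool:
    """Simplified, case-sensitive check against an indexing policy's includedPaths.

    property_path uses indexing-policy notation, for example "/description".
    Excluded paths and their precedence rules are not modeled.
    """
    for path in included_paths:
        if path.endswith("/?"):
            # "/description/?" covers only the scalar value at "/description".
            if property_path == path[:-2]:
                return True
        elif path.endswith("/*"):
            # "/*" covers every path; "/nutrients/*" covers paths below "/nutrients".
            prefix = path[:-2]
            if prefix == "" or property_path.startswith(prefix + "/"):
                return True
    return False


print(is_indexed(["/description/?"], "/description"))  # True
print(is_indexed(["/Description/?"], "/description"))  # False: case differs
print(is_indexed(["/*"], "/description"))              # True
```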

````diff
@@ -213,42 +220,55 @@ For example, given these two sample queries, the query with both an equality and
 Query with only `CONTAINS` filter - higher RU charge:
 
 ```sql
-SELECT COUNT(1) FROM c WHERE CONTAINS(c.description, "spinach")
+SELECT COUNT(1)
+FROM c
+WHERE CONTAINS(c.description, "spinach")
 ```
 
 Query with both equality filter and `CONTAINS` filter - lower RU charge:
 
 ```sql
-SELECT AVG(c._ts) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" AND CONTAINS(c.description, "spinach")
+SELECT AVG(c._ts)
+FROM c
+WHERE c.foodGroup = "Sausages and Luncheon Meats" AND CONTAINS(c.description, "spinach")
 ```
 
-Here are additional examples of aggregates queries that will not fully use the index:
+Here are additional examples of aggregate queries that will not fully use the index:
 
 #### Queries with system functions that don't use the index
 
 You should refer to the relevant [system function's page](sql-query-system-functions.md) to see if it uses the index.
 
 ```sql
-SELECT MAX(c._ts) FROM c WHERE CONTAINS(c.description, "spinach")
+SELECT MAX(c._ts)
+FROM c
+WHERE CONTAINS(c.description, "spinach")
 ```
 
 #### Aggregate queries with user-defined functions(UDF's)
 
 ```sql
-SELECT AVG(c._ts) FROM c WHERE udf.MyUDF("Sausages and Luncheon Meats")
+SELECT AVG(c._ts)
+FROM c
+WHERE udf.MyUDF("Sausages and Luncheon Meats")
 ```
 
 #### Queries with GROUP BY
 
-The RU charge of `GROUP BY` will increase as the cardinality of the properties in the `GROUP BY` clause increases. In this example, the query engine must load every document that matches the `c.foodGroup = "Sausages and Luncheon Meats"` filter so the RU charge is expected to be high.
+The RU charge of queries with `GROUP BY` will increase as the cardinality of the properties in the `GROUP BY` clause increases. In the below query, for example, the RU charge of the query will increase as the number unique descriptions increases.
+
+The RU charge of an aggregate function with a `GROUP BY` clause will be higher than the RU charge of an aggregate function alone. In this example, the query engine must load every document that matches the `c.foodGroup = "Sausages and Luncheon Meats"` filter so the RU charge is expected to be high.
 
 ```sql
-SELECT COUNT(1) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" GROUP BY c.description
+SELECT COUNT(1)
+FROM c
+WHERE c.foodGroup = "Sausages and Luncheon Meats"
+GROUP BY c.description
 ```
 
 If you plan to frequently run the same aggregate queries, it may be more efficient to build a real-time materialized view with the [Azure Cosmos DB change feed](change-feed.md) than running individual queries.
 
-### Modify queries that have both a filter and an ORDER BY clause
+### Optimize queries that have both a filter and an ORDER BY clause
 
 Although queries that have a filter and an `ORDER BY` clause will normally use a range index, they'll be more efficient if they can be served from a composite index. In addition to modifying the indexing policy, you should add all properties in the composite index to the `ORDER BY` clause. This change to the query will ensure that it uses the composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset:
 
````
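The hunk above suggests building a real-time materialized view from the change feed instead of repeatedly running an aggregate such as the `GROUP BY` query. That idea can be sketched as an incremental counter. This Python sketch assumes that the change feed delivers the latest version of each inserted or updated document; it doesn't model deletes. The `GroupCountView` class, the event shape, and the sample documents are all hypothetical. A real consumer would read the change feed through an SDK or Azure Functions, not from a Python list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (foodGroup, description)


class GroupCountView:
    """Hypothetical materialized view: document count per (foodGroup, description)."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self._key_by_id: Dict[str, Key] = {}  # last grouping key seen per document id

    def apply(self, changes: Iterable[dict]) -> None:
        """Fold a batch of changed documents into the running counts."""
        for doc in changes:
            key = (doc["foodGroup"], doc["description"])
            previous = self._key_by_id.get(doc["id"])
            if previous == key:
                continue  # an update that didn't change the grouping key
            if previous is not None:
                # The document moved to another group: remove its old contribution.
                self.counts[previous] -= 1
                if self.counts[previous] == 0:
                    del self.counts[previous]
            self.counts[key] += 1
            self._key_by_id[doc["id"]] = key

    def count_by_description(self, food_group: str) -> Dict[str, int]:
        """Read-side equivalent of the GROUP BY query, with no document scan."""
        return {desc: n for (group, desc), n in self.counts.items() if group == food_group}


# Made-up change batches: three inserts, then an update that changes a description.
view = GroupCountView()
view.apply([
    {"id": "1", "foodGroup": "Sausages and Luncheon Meats", "description": "Bologna"},
    {"id": "2", "foodGroup": "Sausages and Luncheon Meats", "description": "Bologna"},
    {"id": "3", "foodGroup": "Soups, Sauces, and Gravies", "description": "Soup"},
])
view.apply([{"id": "2", "foodGroup": "Sausages and Luncheon Meats", "description": "Salami"}])
print(view.count_by_description("Sausages and Luncheon Meats"))
# {'Bologna': 1, 'Salami': 1}
```

The design choice is that each change does a constant amount of work, so reading the view doesn't get more expensive as the number of matching documents grows.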

````diff
@@ -257,7 +277,10 @@ Although queries that have a filter and an `ORDER BY` clause will normally use a
 Query:
 
 ```sql
-SELECT * FROM c WHERE c.foodGroup = "Soups, Sauces, and Gravies" ORDER BY c._ts ASC
+SELECT *
+FROM c
+WHERE c.foodGroup = "Soups, Sauces, and Gravies"
+ORDER BY c._ts ASC
 ```
 
 Indexing policy:
````

````diff
@@ -283,7 +306,8 @@ Indexing policy:
 Updated query (includes both properties in the `ORDER BY` clause):
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup = "Soups, Sauces, and Gravies"
 ORDER BY c.foodGroup, c._ts ASC
 ```
````
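The updated query adds `c.foodGroup` to the `ORDER BY` clause so that the clause lines up with the composite index. The Python sketch below encodes the matching rule as it is commonly stated in the Azure Cosmos DB indexing documentation. Under that rule, the `ORDER BY` properties must appear in the same order as the composite index, and either every direction matches the index or every direction is reversed. The composite index `(/foodGroup ascending, /_ts ascending)` is an assumption, because the indexing policy isn't shown in this diff. The `order_by_matches` helper is hypothetical and doesn't model how filters interact with the index.

```python
from typing import List, Tuple

# Each entry is (path, direction), e.g. ("/foodGroup", "ascending").
Spec = List[Tuple[str, str]]


def order_by_matches(composite_index: Spec, order_by: Spec) -> bool:
    """True if ORDER BY lists the index paths in order, with all directions
    equal to the index's or all reversed (simplified; filters aren't modeled)."""
    if [path for path, _ in composite_index] != [path for path, _ in order_by]:
        return False
    same = [d1 == d2 for (_, d1), (_, d2) in zip(composite_index, order_by)]
    return all(same) or not any(same)


# Assumed composite index for this example: foodGroup ASC, _ts ASC.
index = [("/foodGroup", "ascending"), ("/_ts", "ascending")]

# Original query: ORDER BY c._ts ASC does not line up with the composite index.
print(order_by_matches(index, [("/_ts", "ascending")]))  # False
# Updated query: ORDER BY c.foodGroup, c._ts ASC (ASC is the default direction).
print(order_by_matches(index, [("/foodGroup", "ascending"), ("/_ts", "ascending")]))  # True
# Fully reversed directions also match; mixed directions don't.
print(order_by_matches(index, [("/foodGroup", "descending"), ("/_ts", "descending")]))  # True
print(order_by_matches(index, [("/foodGroup", "ascending"), ("/_ts", "descending")]))  # False
```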

````diff
@@ -319,6 +343,7 @@ Updated indexing policy:
 **RU charge:** 8.86 RUs
 
 ### Optimize JOIN expressions by using a subquery
+
 Multi-value subqueries can optimize `JOIN` expressions by pushing predicates after each select-many expression rather than after all cross joins in the `WHERE` clause.
 
 Consider this query:
````

````diff
@@ -335,7 +360,7 @@ AND n.nutritionValue < 10) AND s.amount > 1
 
 **RU charge:** 167.62 RUs
 
-For this query, the index will match any document that has a tag with the name "infant formula", nutritionValue greater than 0, and serving amount greater than 1. The `JOIN` expression here will perform the cross-product of all items of tags, nutrients, and servings arrays for each matching document before any filter is applied. The `WHERE` clause will then apply the filter predicate on each `<c, t, n, s>` tuple.
+For this query, the index will match any document that has a tag with the name `infant formula`, `nutritionValue` greater than 0, and `amount` greater than 1. The `JOIN` expression here will perform the cross-product of all items of tags, nutrients, and servings arrays for each matching document before any filter is applied. The `WHERE` clause will then apply the filter predicate on each `<c, t, n, s>` tuple.
 
 For example, if a matching document has 10 items in each of the three arrays, it will expand to 1 x 10 x 10 x 10 (that is, 1,000) tuples. The use of subqueries here can help to filter out joined array items before joining with the next expression.
 
````
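The 1 x 10 x 10 x 10 expansion can be checked directly. The Python sketch below counts the tuples that a plain `JOIN` builds for one document `c`. It then compares that count with filtering each array before joining, which is the effect the subquery rewrite aims for. The predicates mirror the query's filters (tag name `infant formula`, `0 < nutritionValue < 10`, `amount > 1`). The array contents are made-up stand-ins; only the count of 10 items per array comes from the text.

```python
from itertools import product

# Made-up document with 10 items in each array, as in the text's example.
doc = {
    "tags": [{"name": "infant formula"}] + [{"name": f"tag {i}"} for i in range(9)],
    "nutrients": [{"nutritionValue": v} for v in range(10)],
    "servings": [{"amount": a} for a in range(10)],
}


def tag_ok(tag: dict) -> bool:
    return tag["name"] == "infant formula"


def nutrient_ok(nutrient: dict) -> bool:
    return 0 < nutrient["nutritionValue"] < 10


def serving_ok(serving: dict) -> bool:
    return serving["amount"] > 1


# Plain JOIN: build every <t, n, s> combination, then apply the WHERE filter.
all_tuples = list(product(doc["tags"], doc["nutrients"], doc["servings"]))
filtered_after = [
    (t, n, s) for t, n, s in all_tuples if tag_ok(t) and nutrient_ok(n) and serving_ok(s)
]

# Subquery style: filter each array before it enters the cross product.
filtered_before = list(product(
    [t for t in doc["tags"] if tag_ok(t)],
    [n for n in doc["nutrients"] if nutrient_ok(n)],
    [s for s in doc["servings"] if serving_ok(s)],
))

print(len(all_tuples))                    # 1000
print(len(filtered_before))               # 72
print(filtered_after == filtered_before)  # True
```

Both approaches return the same 72 tuples. The difference is that the plain `JOIN` first builds all 1,000 combinations and then discards most of them.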

````diff
@@ -355,9 +380,9 @@ Assume that only one item in the tags array matches the filter and that there ar
 
 ## Queries where Retrieved Document Count is equal to Output Document Count
 
-If the Retrieved Document Count is approximately equal to the Output Document Count, the query didn't have to scan many unnecessary documents. For many queries, like those that use the TOP keyword, Retrieved Document Count might exceed Output Document Count by 1. You don't need to be concerned about this.
+If the **Retrieved Document Count** is approximately equal to the **Output Document Count**, the query engine didn't have to scan many unnecessary documents. For many queries, like those that use the `TOP` keyword, **Retrieved Document Count** might exceed **Output Document Count** by 1. You don't need to be concerned about this.
 
-### Avoid cross partition queries
+### Minimize cross partition queries
 
 Azure Cosmos DB uses [partitioning](partitioning-overview.md) to scale individual containers as Request Unit and data storage needs increase. Each physical partition has a separate and independent index. If your query has an equality filter that matches your container's partition key, you'll need to check only the relevant partition's index. This optimization reduces the total number of RUs that the query requires.
 
````

````diff
@@ -366,26 +391,30 @@ If you have a large number of provisioned RUs (more than 30,000) or a large amou
 For example, if you create a container with the partition key foodGroup, the following queries will need to check only a single physical partition:
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup = "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"
 ```
 
-These queries would also be optimized by the addition of the partition key in the query:
+Queries that have an `IN` filter with the partition key will only check the relevant physical partition(s) and will not "fan-out":
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup IN("Soups, Sauces, and Gravies", "Vegetables and Vegetable Products") and c.description = "Mushroom, oyster, raw"
 ```
 
-Queries that have range filters on the partition key, or that don't have any filters on the partition key, will need to check every physical partition's index for results:
+Queries that have range filters on the partition key, or that don't have any filters on the partition key, will need to "fan-out" and check every physical partition's index for results:
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.description = "Mushroom, oyster, raw"
 ```
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup > "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"
 ```
 
````
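The difference between single-partition, `IN`, and fan-out queries comes from how a partition key value maps to a physical partition. The Python sketch below models this with a hypothetical hash-based router over four physical partitions. The partition count, the MD5-based hash, and the `partition_for` and `partitions_to_query` helpers are all illustrative assumptions. They don't describe how Azure Cosmos DB actually assigns partition key ranges.

```python
import hashlib
from typing import Optional, Sequence, Set

PHYSICAL_PARTITIONS = 4  # hypothetical container layout


def partition_for(key_value: str) -> int:
    """Hypothetical mapping of a partition key value to a physical partition."""
    digest = hashlib.md5(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PHYSICAL_PARTITIONS


def partitions_to_query(equality_values: Optional[Sequence[str]]) -> Set[int]:
    """Partitions whose index a query must check.

    equality_values holds the partition key values from an = or IN filter.
    None means there is no such filter: either no partition key filter or
    only a range filter. Hashing doesn't preserve order, so a range can't
    be narrowed to specific partitions, and the query fans out to all of them.
    """
    if equality_values is None:
        return set(range(PHYSICAL_PARTITIONS))
    return {partition_for(value) for value in equality_values}


# c.foodGroup = "Soups, Sauces, and Gravies" -> exactly one partition
print(len(partitions_to_query(["Soups, Sauces, and Gravies"])))  # 1
# c.foodGroup IN (...) -> at most one partition per listed value
in_filter = ["Soups, Sauces, and Gravies", "Vegetables and Vegetable Products"]
print(len(partitions_to_query(in_filter)) <= 2)  # True
# No partition key filter, or c.foodGroup > "..." -> fan-out to every partition
print(len(partitions_to_query(None)))  # 4
```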

````diff
@@ -396,12 +425,14 @@ Although queries that have filters on multiple properties will normally use a ra
 Here are some examples of queries that could be optimized with a composite index:
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup = "Vegetables and Vegetable Products" AND c._ts = 1575503264
 ```
 
 ```sql
-SELECT * FROM c
+SELECT *
+FROM c
 WHERE c.foodGroup = "Vegetables and Vegetable Products" AND c._ts > 1575503264
 ```
 
````
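For the two example queries above, the relevant kind of index is a composite index on `foodGroup` and `_ts`. The Python sketch below builds such an indexing policy as a dictionary and prints its composite index as JSON. It follows the general shape of an Azure Cosmos DB indexing policy (`indexingMode`, `includedPaths`, `excludedPaths`, `compositeIndexes`), and the `composite_index_policy` helper is hypothetical. Putting equality-filtered properties first and the single range-filtered property last is an assumption based on commonly documented guidance for filters on composite indexes. Check the current indexing documentation before relying on it.

```python
import json
from typing import Dict, List, Optional


def composite_index_policy(equality_paths: List[str], range_path: Optional[str] = None) -> Dict:
    """Hypothetical helper: build an indexing policy with one composite index.

    Equality-filtered paths come first; the (at most one) range-filtered
    path goes last, following the assumption stated above.
    """
    paths = list(equality_paths) + ([range_path] if range_path else [])
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [],
        "compositeIndexes": [
            [{"path": p, "order": "ascending"} for p in paths]
        ],
    }


# WHERE c.foodGroup = "..." AND c._ts > 1575503264
policy = composite_index_policy(["/foodGroup"], range_path="/_ts")
print(json.dumps(policy["compositeIndexes"]))
# [[{"path": "/foodGroup", "order": "ascending"}, {"path": "/_ts", "order": "ascending"}]]
```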