Commit e737112

Merge pull request #112096 from timsander1/master
update query troubleshooting guide
2 parents d66163f + e306832 commit e737112

1 file changed

+58
-12
lines changed

articles/cosmos-db/troubleshoot-query-performance.md

Lines changed: 58 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ description: Learn how to identify, diagnose, and troubleshoot Azure Cosmos DB S
44
author: timsander1
55
ms.service: cosmos-db
66
ms.topic: troubleshooting
7-
ms.date: 02/10/2020
7+
ms.date: 04/20/2020
88
ms.author: tisande
99
ms.subservice: cosmosdb-sql
1010
ms.reviewer: sngun
@@ -13,7 +13,7 @@ ms.reviewer: sngun
1313

1414
This article walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. Although you shouldn't consider the steps outlined in this article a complete defense against potential query issues, we've included the most common performance tips here. You should use this article as a starting place for troubleshooting slow or expensive queries in the Azure Cosmos DB core (SQL) API. You can also use [diagnostics logs](cosmosdb-monitor-resource-logs.md) to identify queries that are slow or that consume significant amounts of throughput.
1515

16-
You can broadly categorize query optimizations in Azure Cosmos DB:
16+
You can broadly categorize query optimizations in Azure Cosmos DB:
1717

1818
- Optimizations that reduce the Request Unit (RU) charge of the query
1919
- Optimizations that just reduce latency
@@ -22,19 +22,18 @@ If you reduce the RU charge of a query, you'll almost certainly decrease latency
2222

2323
This article provides examples that you can re-create by using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
2424

25-
## Important
25+
## Common SDK issues
2626

2727
- For best performance, follow the [Performance tips](performance-tips.md).
2828
> [!NOTE]
2929
> For improved performance, we recommend Windows 64-bit host processing. The SQL SDK includes a native ServiceInterop.dll to parse and optimize queries locally. ServiceInterop.dll is supported only on the Windows x64 platform. For Linux and other unsupported platforms where ServiceInterop.dll isn't available, an additional network call will be made to the gateway to get the optimized query.
30-
- Azure Cosmos DB queries don't support a minimum item count.
31-
- Code should handle any page size, from zero to the maximum item count.
32-
- The number of items in a page can and will change without notice.
33-
- Empty pages are expected for queries and can appear at any time.
34-
- Empty pages are exposed in the SDKs because that exposure allows more opportunities to cancel a query. It also makes it clear that the SDK is doing multiple network calls.
35-
- Empty pages can appear in existing workloads because a physical partition is split in Azure Cosmos DB. The first partition will have zero results, which causes the empty page.
36-
- Empty pages are caused by the backend preempting a query because the query is taking more than some fixed amount of time on the backend to retrieve the documents. If Azure Cosmos DB preempts a query, it will return a continuation token that will allow the query to continue.
37-
- Be sure to drain the query completely. Look at the SDK samples, and use a `while` loop on `FeedIterator.HasMoreResults` to drain the entire query.
30+
- You can set a `MaxItemCount` for your queries, but you can't specify a minimum item count.
31+
- Code should handle any page size, from zero to the `MaxItemCount`.
32+
- The number of items in a page will never exceed the specified `MaxItemCount`. However, `MaxItemCount` is strictly a maximum, and a page can contain fewer results than this amount.
33+
- Sometimes queries may have empty pages even when there are results on a future page. Reasons for this could be:
34+
- The SDK could be doing multiple network calls.
35+
- The query might be taking a long time to retrieve the documents.
36+
- All queries have a continuation token that will allow the query to continue. Be sure to drain the query completely. Look at the SDK samples, and use a `while` loop on `FeedIterator.HasMoreResults` to drain the entire query.
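The draining pattern described in the list above can be sketched as follows. This is a minimal, self-contained simulation rather than SDK code: `FakeFeedIterator`, `has_more_results`, and `read_next` are hypothetical stand-ins that mimic the shape of `FeedIterator.HasMoreResults`, and the page contents are made up. The point it illustrates is that the loop keys off the "has more results" flag, not off whether the last page held any items.

```python
from typing import List


class FakeFeedIterator:
    """Hypothetical stand-in for an SDK feed iterator; not a real SDK class."""

    def __init__(self, pages: List[List[str]]) -> None:
        self._pages = list(pages)
        self._position = 0

    @property
    def has_more_results(self) -> bool:
        # True while unread pages remain, even if the next page is empty.
        return self._position < len(self._pages)

    def read_next(self) -> List[str]:
        page = self._pages[self._position]
        self._position += 1
        return page


def drain(iterator: FakeFeedIterator) -> List[str]:
    """Collect every item, tolerating empty pages along the way."""
    results: List[str] = []
    while iterator.has_more_results:
        page = iterator.read_next()  # a page may hold zero items
        results.extend(page)
    return results


# Pages of varying size, including empty pages that precede later results.
pages = [["a", "b"], [], [], ["c"], []]
items = drain(FakeFeedIterator(pages))
```

A loop that stopped at the first empty page would miss the item on the fourth page in this example.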
3837

3938
## Get query metrics
4039

@@ -56,6 +55,8 @@ Refer to the following sections to understand the relevant query optimizations f
5655

5756
- [Understand which system functions use the index.](#understand-which-system-functions-use-the-index)
5857

58+
- [Understand which aggregate queries use the index.](#understand-which-aggregate-queries-use-the-index)
59+
5960
- [Modify queries that have both a filter and an ORDER BY clause.](#modify-queries-that-have-both-a-filter-and-an-order-by-clause)
6061

6162
- [Optimize JOIN expressions by using a subquery.](#optimize-join-expressions-by-using-a-subquery)
@@ -184,7 +185,7 @@ You can add properties to the indexing policy at any time, with no effect on wri
184185

185186
If an expression can be translated into a range of string values, it can use the index. Otherwise, it can't.
186187

187-
Here's the list of string functions that can use the index:
188+
Here's a list of some common string functions that can use the index:
188189

189190
- STARTSWITH(str_expr, str_expr)
190191
- LEFT(str_expr, num_expr) = str_expr
@@ -202,6 +203,51 @@ Following are some common system functions that don't use the index and must loa
202203

203204
Other parts of the query might still use the index even though the system functions don't.
204205

206+
### Understand which aggregate queries use the index
207+
208+
In most cases, aggregate system functions in Azure Cosmos DB will use the index. However, depending on the filters or additional clauses in an aggregate query, the query engine may be required to load a large number of documents. Typically, the query engine will apply equality and range filters first. After applying these filters,
209+
the query engine can evaluate additional filters and resort to loading remaining documents to compute the aggregate, if needed.
210+
211+
For example, given these two sample queries, the query with both an equality filter and a `CONTAINS` system function filter will generally be more efficient than the query with just a `CONTAINS` system function filter. This is because the equality filter is applied first and uses the index before documents need to be loaded for the more expensive `CONTAINS` filter.
212+
213+
Query with only `CONTAINS` filter - higher RU charge:
214+
215+
```sql
216+
SELECT COUNT(1) FROM c WHERE CONTAINS(c.description, "spinach")
217+
```
218+
219+
Query with both equality filter and `CONTAINS` filter - lower RU charge:
220+
221+
```sql
222+
SELECT AVG(c._ts) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" AND CONTAINS(c.description, "spinach")
223+
```
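To make the difference between the two queries above concrete, here is a toy model of the filter ordering. It is only a sketch under loud assumptions: the five documents, the dictionary used as an "index", and the `count_matches` helper are all invented for illustration, and the model says nothing about how the actual query engine or RU accounting works. It only counts how many documents would have to be loaded before the substring check can run.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Tiny made-up dataset; the values are illustrative, not from the nutrition data.
docs = [
    {"id": 0, "foodGroup": "Sausages and Luncheon Meats", "description": "Turkey roll with spinach"},
    {"id": 1, "foodGroup": "Sausages and Luncheon Meats", "description": "Beef salami"},
    {"id": 2, "foodGroup": "Vegetables", "description": "Raw spinach"},
    {"id": 3, "foodGroup": "Vegetables", "description": "Carrots"},
    {"id": 4, "foodGroup": "Baked Products", "description": "Pie with spinach"},
]

# Toy "index": property value -> ids of the documents that have that value.
food_group_index: Dict[str, List[int]] = defaultdict(list)
for doc in docs:
    food_group_index[doc["foodGroup"]].append(doc["id"])


def count_matches(substring: str, food_group: Optional[str] = None) -> Tuple[int, int]:
    """Return (match_count, documents_loaded) in this toy model."""
    if food_group is None:
        candidate_ids = [doc["id"] for doc in docs]  # no equality filter: every document is a candidate
    else:
        candidate_ids = food_group_index.get(food_group, [])  # index narrows the candidates first
    loaded = 0
    matches = 0
    for doc_id in candidate_ids:
        doc = docs[doc_id]  # simulate loading the document
        loaded += 1
        if substring in doc["description"]:  # substring check needs the loaded document
            matches += 1
    return matches, loaded


contains_only = count_matches("spinach")
with_equality = count_matches("spinach", "Sausages and Luncheon Meats")
```

In this model the filter on `foodGroup` cuts the number of loaded documents from five to two, which is the effect the paragraph above attributes to applying the equality filter first.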
224+
225+
Here are additional examples of aggregate queries that will not fully use the index:
226+
227+
#### Queries with system functions that don't use the index
228+
229+
You should refer to the relevant [system function's page](sql-query-system-functions.md) to see if it uses the index.
230+
231+
```sql
232+
SELECT MAX(c._ts) FROM c WHERE CONTAINS(c.description, "spinach")
233+
```
234+
235+
#### Aggregate queries with user-defined functions (UDFs)
236+
237+
```sql
238+
SELECT AVG(c._ts) FROM c WHERE udf.MyUDF("Sausages and Luncheon Meats")
239+
```
240+
241+
#### Queries with GROUP BY
242+
243+
The RU charge of `GROUP BY` will increase as the cardinality of the properties in the `GROUP BY` clause increases. In this example, the query engine must load every document that matches the `c.foodGroup = "Sausages and Luncheon Meats"` filter, so the RU charge is expected to be high.
244+
245+
```sql
246+
SELECT COUNT(1) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" GROUP BY c.description
247+
```
248+
249+
If you plan to frequently run the same aggregate queries, it may be more efficient to build a real-time materialized view with the [Azure Cosmos DB change feed](change-feed.md) than to run individual queries.
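As a rough illustration of the materialized-view idea, the sketch below keeps a per-`foodGroup` item count up to date as batches of changed documents arrive, instead of recomputing the aggregate with a query each time. Everything here is an assumption made for the example: `FoodGroupCountView` is a hypothetical in-memory class, the batches are hand-written, and no change feed API is called. A real view would read from the change feed and persist its state.

```python
from collections import Counter
from typing import Dict, Iterable


class FoodGroupCountView:
    """Hypothetical in-memory view: number of items per foodGroup."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self._group_by_id: Dict[str, str] = {}  # last known group for each item id

    def apply(self, changes: Iterable[dict]) -> None:
        """Fold a batch of inserted or updated documents into the view."""
        for doc in changes:
            previous = self._group_by_id.get(doc["id"])
            if previous is not None:
                self.counts[previous] -= 1  # undo the earlier version of this item
                if self.counts[previous] == 0:
                    del self.counts[previous]
            self._group_by_id[doc["id"]] = doc["foodGroup"]
            self.counts[doc["foodGroup"]] += 1


view = FoodGroupCountView()
view.apply([
    {"id": "1", "foodGroup": "Vegetables"},
    {"id": "2", "foodGroup": "Vegetables"},
    {"id": "3", "foodGroup": "Baked Products"},
])
view.apply([{"id": "2", "foodGroup": "Baked Products"}])  # an update to item 2
```

Reading `view.counts` then answers the aggregate without loading any documents, at the cost of doing a small amount of work for every change.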
250+
205251
### Modify queries that have both a filter and an ORDER BY clause
206252

207253
Although queries that have a filter and an `ORDER BY` clause will normally use a range index, they'll be more efficient if they can be served from a composite index. In addition to modifying the indexing policy, you should add all properties in the composite index to the `ORDER BY` clause. This change to the query will ensure that it uses the composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset:
