articles/cosmos-db/troubleshoot-query-performance.md
Lines changed: 58 additions & 12 deletions
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,7 @@ description: Learn how to identify, diagnose, and troubleshoot Azure Cosmos DB S
4
4
author: timsander1
5
5
ms.service: cosmos-db
6
6
ms.topic: troubleshooting
7
-
ms.date: 02/10/2020
7
+
ms.date: 04/20/2020
8
8
ms.author: tisande
9
9
ms.subservice: cosmosdb-sql
10
10
ms.reviewer: sngun
@@ -13,7 +13,7 @@ ms.reviewer: sngun
13
13
14
14
This article walks through a general recommended approach for troubleshooting queries in Azure Cosmos DB. Although you shouldn't consider the steps outlined in this article a complete defense against potential query issues, we've included the most common performance tips here. You should use this article as a starting place for troubleshooting slow or expensive queries in the Azure Cosmos DB core (SQL) API. You can also use [diagnostics logs](cosmosdb-monitor-resource-logs.md) to identify queries that are slow or that consume significant amounts of throughput.
15
15
16
-
You can broadly categorize query optimizations in Azure Cosmos DB:
16
+
You can broadly categorize query optimizations in Azure Cosmos DB:
17
17
18
18
- Optimizations that reduce the Request Unit (RU) charge of the query
19
19
- Optimizations that just reduce latency
@@ -22,19 +22,18 @@ If you reduce the RU charge of a query, you'll almost certainly decrease latency
22
22
23
23
This article provides examples that you can re-create by using the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
24
24
25
-
## Important
25
+
## Common SDK issues
26
26
27
27
- For best performance, follow the [Performance tips](performance-tips.md).
28
28
> [!NOTE]
29
29
> For improved performance, we recommend Windows 64-bit host processing. The SQL SDK includes a native ServiceInterop.dll to parse and optimize queries locally. ServiceInterop.dll is supported only on the Windows x64 platform. For Linux and other unsupported platforms where ServiceInterop.dll isn't available, an additional network call will be made to the gateway to get the optimized query.
30
-
- Azure Cosmos DB queries don't support a minimum item count.
31
-
- Code should handle any page size, from zero to the maximum item count.
32
-
- The number of items in a page can and will change without notice.
33
-
- Empty pages are expected for queries and can appear at any time.
34
-
- Empty pages are exposed in the SDKs because that exposure allows more opportunities to cancel a query. It also makes it clear that the SDK is doing multiple network calls.
35
-
- Empty pages can appear in existing workloads because a physical partition is split in Azure Cosmos DB. The first partition will have zero results, which causes the empty page.
36
-
- Empty pages are caused by the backend preempting a query because the query is taking more than some fixed amount of time on the backend to retrieve the documents. If Azure Cosmos DB preempts a query, it will return a continuation token that will allow the query to continue.
37
-
- Be sure to drain the query completely. Look at the SDK samples, and use a `while` loop on `FeedIterator.HasMoreResults` to drain the entire query.
30
+
- You can set a `MaxItemCount` for your queries, but you can't specify a minimum item count.
31
+
- Code should handle any page size, from zero to the `MaxItemCount`.
32
+
- The number of items in a page will always be less than or equal to the specified `MaxItemCount`. However, `MaxItemCount` is strictly a maximum, and a page can contain fewer results than this amount.
33
+
- Sometimes queries may have empty pages even when there are results on a future page. Reasons for this could be:
34
+
- The SDK could be doing multiple network calls.
35
+
- The query might be taking a long time to retrieve the documents.
36
+
- All queries have a continuation token that will allow the query to continue. Be sure to drain the query completely. Look at the SDK samples, and use a `while` loop on `FeedIterator.HasMoreResults` to drain the entire query.
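The draining pattern described in the list above can be sketched as follows. This is a minimal, self-contained simulation rather than SDK code: `FakeFeedIterator`, `has_more_results`, and `read_next` are hypothetical stand-ins that only mimic the shape of `FeedIterator.HasMoreResults`, and the page contents are made up. It shows that a loop that keeps reading until no pages remain collects every result, even when some pages are empty.

```python
from typing import List


class FakeFeedIterator:
    """Hypothetical stand-in for an SDK feed iterator (not a real Cosmos DB class)."""

    def __init__(self, pages: List[List[dict]]) -> None:
        self._pages = pages
        self._next = 0  # plays the role of a continuation token

    @property
    def has_more_results(self) -> bool:
        # True while at least one page has not been read yet.
        return self._next < len(self._pages)

    def read_next(self) -> List[dict]:
        # Return the next page; a page may legitimately be empty.
        page = self._pages[self._next]
        self._next += 1
        return page


def drain(iterator: FakeFeedIterator) -> List[dict]:
    """Read pages until the iterator reports that none remain."""
    items: List[dict] = []
    while iterator.has_more_results:
        items.extend(iterator.read_next())  # an empty page simply adds nothing
    return items


# Made-up pages: the second and last pages are empty, yet results follow the second.
pages = [[{"id": "1"}], [], [{"id": "2"}, {"id": "3"}], []]
results = drain(FakeFeedIterator(pages))
print([item["id"] for item in results])
```

Stopping at the first empty page would have returned only the first item, which is why the loop condition checks for remaining pages instead of the size of the last page.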
38
37
39
38
## Get query metrics
40
39
@@ -56,6 +55,8 @@ Refer to the following sections to understand the relevant query optimizations f
56
55
57
56
- [Understand which system functions use the index.](#understand-which-system-functions-use-the-index)
58
57
58
+
- [Understand which aggregate queries use the index.](#understand-which-aggregate-queries-use-the-index)
59
+
59
60
- [Modify queries that have both a filter and an ORDER BY clause.](#modify-queries-that-have-both-a-filter-and-an-order-by-clause)
60
61
61
62
- [Optimize JOIN expressions by using a subquery.](#optimize-join-expressions-by-using-a-subquery)
@@ -184,7 +185,7 @@ You can add properties to the indexing policy at any time, with no effect on wri
184
185
185
186
If an expression can be translated into a range of string values, it can use the index. Otherwise, it can't.
186
187
187
-
Here's the list of string functions that can use the index:
188
+
Here's the list of some common string functions that can use the index:
188
189
189
190
- STARTSWITH(str_expr, str_expr)
190
191
- LEFT(str_expr, num_expr) = str_expr
@@ -202,6 +203,51 @@ Following are some common system functions that don't use the index and must loa
202
203
203
204
Other parts of the query might still use the index even though the system functions don't.
204
205
206
+
### Understand which aggregate queries use the index
207
+
208
+
In most cases, aggregate system functions in Azure Cosmos DB will use the index. However, depending on the filters or additional clauses in an aggregate query, the query engine may be required to load a high number of documents. Typically, the query engine will apply equality and range filters first. After applying these filters,
209
+
the query engine can evaluate additional filters and resort to loading remaining documents to compute the aggregate, if needed.
210
+
211
+
For example, given these two sample queries, the query with both an equality and `CONTAINS` system function filter will generally be more efficient than a query with just a `CONTAINS` system function filter. This is because the equality filter is applied first and uses the index before documents need to be loaded for the more expensive `CONTAINS` filter.
212
+
213
+
Query with only `CONTAINS` filter - higher RU charge:
214
+
215
+
```sql
216
+
SELECT COUNT(1) FROM c WHERE CONTAINS(c.description, "spinach")
217
+
```
218
+
219
+
Query with both equality filter and `CONTAINS` filter - lower RU charge:
220
+
221
+
```sql
222
+
SELECT AVG(c._ts) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" AND CONTAINS(c.description, "spinach")
223
+
```
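The difference between the two queries above can be illustrated with a toy model. This is only a sketch under stated assumptions: the documents, the dictionary used as an "index", and the `count_contains` helper are all hypothetical, and the number of loaded documents is a rough proxy for cost, not how Azure Cosmos DB computes RU charges. It shows that an equality filter served from an index shrinks the set of documents that must be loaded before the substring check runs.

```python
from typing import Dict, List, Optional, Tuple

# Made-up documents loosely shaped like the nutrition dataset.
docs: List[dict] = [
    {"id": 1, "foodGroup": "Sausages and Luncheon Meats", "description": "Sausage with spinach and feta"},
    {"id": 2, "foodGroup": "Sausages and Luncheon Meats", "description": "Bologna, beef"},
    {"id": 3, "foodGroup": "Vegetables", "description": "Raw spinach"},
    {"id": 4, "foodGroup": "Vegetables", "description": "Raw kale"},
    {"id": 5, "foodGroup": "Baked Products", "description": "Wheat bread"},
]

# Toy index on foodGroup: maps each value to the documents that have it.
food_group_index: Dict[str, List[dict]] = {}
for doc in docs:
    food_group_index.setdefault(doc["foodGroup"], []).append(doc)


def count_contains(term: str, food_group: Optional[str] = None) -> Tuple[int, int]:
    """Return (matching documents, documents loaded) for a substring filter.

    With food_group set, the toy index narrows the candidates first;
    without it, every document has to be loaded and checked.
    """
    if food_group is not None:
        candidates = food_group_index.get(food_group, [])
    else:
        candidates = docs
    matches = sum(1 for doc in candidates if term in doc["description"])
    return matches, len(candidates)


print(count_contains("spinach"))
print(count_contains("spinach", "Sausages and Luncheon Meats"))
```

In this model the substring-only filter loads all five documents, while the combined filter loads only the two documents in the matching food group.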
224
+
225
+
Here are additional examples of aggregate queries that will not fully use the index:
226
+
227
+
#### Queries with system functions that don't use the index
228
+
229
+
You should refer to the relevant [system function's page](sql-query-system-functions.md) to see if it uses the index.
230
+
231
+
```sql
232
+
SELECT MAX(c._ts) FROM c WHERE CONTAINS(c.description, "spinach")
233
+
```
234
+
235
+
#### Aggregate queries with user-defined functions (UDFs)
236
+
237
+
```sql
238
+
SELECT AVG(c._ts) FROM c WHERE udf.MyUDF("Sausages and Luncheon Meats")
239
+
```
240
+
241
+
#### Queries with GROUP BY
242
+
243
+
The RU charge of `GROUP BY` will increase as the cardinality of the properties in the `GROUP BY` clause increases. In this example, the query engine must load every document that matches the `c.foodGroup = "Sausages and Luncheon Meats"` filter so the RU charge is expected to be high.
244
+
245
+
```sql
246
+
SELECT COUNT(1) FROM c WHERE c.foodGroup = "Sausages and Luncheon Meats" GROUP BY c.description
247
+
```
248
+
249
+
If you plan to frequently run the same aggregate queries, it may be more efficient to build a real-time materialized view with the [Azure Cosmos DB change feed](change-feed.md) than to run individual queries.
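The materialized-view idea can be sketched as an aggregate that is updated incrementally as changes arrive, so reading it does not require rerunning the query. The example below is a simplified illustration, not the change feed API: `CountByGroupView`, its `apply` method, and the sample changes are hypothetical, and it assumes each change carries the latest version of an item with `id` and `foodGroup` properties.

```python
from typing import Dict


class CountByGroupView:
    """Hypothetical in-memory view that keeps a COUNT per foodGroup up to date."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self._group_of: Dict[str, str] = {}  # item id -> group last seen for it

    def apply(self, item: dict) -> None:
        """Fold one changed item (insert or update) into the view."""
        item_id, group = item["id"], item["foodGroup"]
        previous = self._group_of.get(item_id)
        if previous == group:
            return  # the update did not move the item between groups
        if previous is not None:
            # The item moved: remove it from its old group's count.
            self.counts[previous] -= 1
            if self.counts[previous] == 0:
                del self.counts[previous]
        self.counts[group] = self.counts.get(group, 0) + 1
        self._group_of[item_id] = group


# Made-up sequence of changes: two inserts, then an update that moves item "a".
changes = [
    {"id": "a", "foodGroup": "Vegetables"},
    {"id": "b", "foodGroup": "Vegetables"},
    {"id": "a", "foodGroup": "Snacks"},
]

view = CountByGroupView()
for change in changes:
    view.apply(change)
print(view.counts)
```

Each change costs a constant amount of work in this sketch, whereas rerunning the aggregate query would pay for loading the matching documents every time.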
250
+
205
251
### Modify queries that have both a filter and an ORDER BY clause
206
252
207
253
Although queries that have a filter and an `ORDER BY` clause will normally use a range index, they'll be more efficient if they can be served from a composite index. In addition to modifying the indexing policy, you should add all properties in the composite index to the `ORDER BY` clause. This change to the query will ensure that it uses the composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset: