## Optimizations for queries where Retrieved Document Count significantly exceeds Output Document Count:
66
+
## Queries where Retrieved Document Count exceeds Output Document Count
69
67
70
68
The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, then there was at least one part of your query that was unable to utilize the index and needed to do a scan.
71
69
@@ -107,7 +105,7 @@ Client Side Metrics
107
105
108
106
Retrieved Document Count (60,951) is significantly greater than Output Document Count (7) so this query needed to do a scan. In this case, the system function [UPPER()](sql-query-upper.md) does not utilize the index.
109
107
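The comparison described above can be sketched as a small check. This is only an illustration in Python and is not part of any Azure Cosmos DB SDK: the function name `likely_scanned` and the 10x cutoff for "significantly greater" are assumptions made for the example.

```python
def likely_scanned(retrieved_count: int, output_count: int,
                   ratio_threshold: float = 10.0) -> bool:
    """Return True when far more documents were loaded than returned.

    ratio_threshold is a hypothetical cutoff for "significantly greater";
    tune it for your own workload.
    """
    if output_count == 0:
        # Nothing was returned, so any loaded document was unnecessary work.
        return retrieved_count > 0
    return retrieved_count / output_count >= ratio_threshold


# Counts quoted in the text: 60,951 retrieved versus 7 returned.
print(likely_scanned(60_951, 7))
# A difference of a single document is not flagged.
print(likely_scanned(8, 7))
```

A ratio is used rather than an absolute difference so that the same check works for both small and large result sets.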
110
-
## Ensure that the indexing policy includes necessary paths
108
+
## Include necessary paths in the indexing policy
111
109
112
110
Your indexing policy should cover any properties included in `WHERE` clauses, `ORDER BY` clauses, `JOIN` expressions, and most system functions. The path specified in the indexing policy must match the property name in the JSON documents exactly, including case.
113
111
@@ -185,7 +183,7 @@ Some common system functions that do not use the index and must load each docume
185
183
186
184
Other parts of the query may still utilize the index despite the system functions not using the index.
187
185
188
-
## Optimize queries with both a filter and an ORDER BY clause
186
+
## Queries with both a filter and an ORDER BY clause
189
187
190
188
While queries with a filter and an `ORDER BY` clause will normally utilize a range index, they will be more efficient if they can be served from a composite index. In addition to modifying the indexing policy, you should add all properties in the composite index to the `ORDER BY` clause. This query modification will ensure that it utilizes the composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
191
189
@@ -255,33 +253,6 @@ Updated indexing policy:
255
253
256
254
**RU Charge:** 8.86 RU's
257
255
258
-
## Optimize queries that use DISTINCT
259
-
260
-
It will be more efficient to find the `DISTINCT` set of results if the duplicate results are consecutive. Adding an `ORDER BY` clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to `ORDER BY` multiple properties, add a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
261
-
262
-
### Original
263
-
264
-
Query:
265
-
266
-
```sql
267
-
SELECT DISTINCT c.foodGroup
268
-
FROM c
269
-
```
270
-
271
-
**RU Charge:** 32.39 RU's
272
-
273
-
### Optimized
274
-
275
-
Updated query:
276
-
277
-
```sql
278
-
SELECT DISTINCT c.foodGroup
279
-
FROM c
280
-
ORDER BY c.foodGroup
281
-
```
282
-
283
-
**RU Charge:** 3.38 RU's
284
-
285
256
## Optimize JOIN expressions by using a subquery
286
257
Multi-value subqueries can optimize `JOIN` expressions by pushing predicates after each select-many expression rather than after all cross-joins in the `WHERE` clause.
287
258
@@ -317,7 +288,7 @@ JOIN (SELECT VALUE s FROM s IN c.servings WHERE s.amount > 1)
317
288
318
289
Assume that only one item in the tags array matches the filter, and there are five items for both nutrients and servings arrays. The `JOIN` expressions will then expand to 1 x 1 x 5 x 5 = 25 items, as opposed to 1,000 items in the first query.
319
290
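The arithmetic behind this comparison can be checked with a short Python sketch. The function name `expanded_item_count` is hypothetical, and the array sizes for the first query (10 items each in tags, nutrients, and servings) are an assumption chosen to reproduce the 1,000 figure quoted above.

```python
from math import prod


def expanded_item_count(array_sizes):
    """Number of tuples a chain of JOINs produces for a single document."""
    return prod(array_sizes)


# Assumed for illustration: 10 items in each of the three arrays.
without_subquery = expanded_item_count([10, 10, 10])

# With subqueries: one matching tag, five nutrients, five servings.
with_subquery = expanded_item_count([1, 5, 5])

print(without_subquery, with_subquery)
```

Because the sizes multiply, filtering each array before the cross product shrinks the intermediate result far more than filtering afterward in the `WHERE` clause.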
320
-
## Optimizations for queries where Retrieved Document Count is approximately equal to Output Document Count:
291
+
## Queries where Retrieved Document Count is equal to Output Document Count
321
292
322
293
If the Retrieved Document Count is approximately equal to the Output Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Retrieved Document Count may exceed Output Document Count by 1. This should not be cause for concern.
323
294
@@ -353,7 +324,7 @@ SELECT * FROM c
353
324
WHERE c.foodGroup > "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"
354
325
```
355
326
356
-
## Optimize queries that have a filter on multiple properties
327
+
## Filters on multiple properties
357
328
358
329
While queries with filters on multiple properties will normally utilize a range index, they will be more efficient if they can be served from a composite index. For small amounts of data, this optimization will not have a significant impact. It may prove useful, however, for large amounts of data. You can optimize at most one non-equality filter per composite index. If your query has multiple non-equality filters, pick one of them to utilize the composite index. The remainder will continue to utilize range indexes. The non-equality filter must be defined last in the composite index. [Learn more about composite indexes](index-policy.md#composite-indexes).
359
330
@@ -396,23 +367,23 @@ Here is the relevant composite index:
396
367
}
397
368
```
398
369
399
-
## Common optimizations that reduce query latency (no impact on RU charge):
370
+
## Optimizations that reduce query latency:
400
371
401
372
In many cases, the RU charge may be acceptable but query latency is still too high. The sections below give an overview of tips for reducing query latency. If you run the same query multiple times on the same dataset, it will have the same RU charge each time. However, query latency may vary between query executions.
402
373
403
-
## Improving proximity between your app and Azure Cosmos DB
374
+
## Improve proximity
404
375
405
376
Queries that are run from a different region than the Azure Cosmos DB account will have higher latency than if they were run inside the same region. For example, if you run code on your desktop computer, you should expect latency to be tens or hundreds of milliseconds (or more) greater than if the query came from a virtual machine within the same Azure region as Azure Cosmos DB. It is simple to [globally distribute data in Azure Cosmos DB](distribute-data-globally.md) to bring your data closer to your app.
406
377
407
-
## Increasing provisioned throughput
378
+
## Increase provisioned throughput
408
379
409
380
In Azure Cosmos DB, your provisioned throughput is measured in Request Units (RUs) per second. Suppose you have a query that consumes 5 RUs of throughput. If you provision 1,000 RU/s, you can run that query 200 times per second. If you attempt to run the query when there is not enough throughput available, Azure Cosmos DB returns an HTTP 429 error. Any of the current Core (SQL) API SDKs will automatically retry the query after waiting a brief period. Throttled requests take longer, so increasing provisioned throughput can improve query latency. You can observe the [total number of throttled requests](use-metrics.md#understand-how-many-requests-are-succeeding-or-causing-errors) in the Metrics blade of the Azure portal.
410
381
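The throughput arithmetic above can be written as a one-line calculation. This is a minimal Python sketch; the function name `max_queries_per_second` is hypothetical, and it assumes the query is the only workload consuming the provisioned throughput.

```python
def max_queries_per_second(provisioned_rus_per_second: float,
                           ru_charge_per_query: float) -> float:
    """Upper bound on how often a query can run before being throttled."""
    if ru_charge_per_query <= 0:
        raise ValueError("ru_charge_per_query must be positive")
    return provisioned_rus_per_second / ru_charge_per_query


# Figures from the text: 1,000 provisioned, 5 consumed per query.
print(max_queries_per_second(1000, 5))
```

Requests beyond this rate are the ones that receive HTTP 429 responses and are retried, which is where the added latency comes from.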
411
-
## Increasing MaxConcurrency
382
+
## Increase MaxConcurrency
412
383
413
384
Parallel queries work by querying multiple partitions in parallel. However, data from an individual partition is fetched serially with respect to the query. Setting MaxConcurrency to the number of partitions therefore gives the best chance of achieving the most performant query, provided all other system conditions remain the same. If you don't know the number of partitions, you can set MaxConcurrency (or MaxDegreeOfParallelism in older SDK versions) to a high number, and the system uses the minimum of the number of partitions and the user-provided value as the maximum degree of parallelism.
414
385
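The rule for the resulting degree of parallelism can be sketched in Python. This models only the minimum described above for positive values; the function name `effective_parallelism` is hypothetical and does not correspond to an SDK call.

```python
def effective_parallelism(partition_count: int, max_concurrency: int) -> int:
    """Degree of parallelism actually used, per the rule described above.

    Assumes both arguments are positive integers.
    """
    if partition_count <= 0 or max_concurrency <= 0:
        raise ValueError("both arguments must be positive")
    return min(partition_count, max_concurrency)


# A high setting is capped by the number of partitions.
print(effective_parallelism(partition_count=12, max_concurrency=100))
# A low setting limits parallelism below the partition count.
print(effective_parallelism(partition_count=12, max_concurrency=4))
```

This is why choosing a high value is safe when the partition count is unknown: the extra headroom is simply not used.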
415
-
## Increasing MaxBufferedItemCount
386
+
## Increase MaxBufferedItemCount
416
387
417
388
Queries are designed to pre-fetch results while the client processes the current batch of results. Pre-fetching improves the overall latency of a query. Setting MaxBufferedItemCount limits the number of pre-fetched results. By setting this value to the expected number of results returned (or a higher number), the query can receive the maximum benefit from pre-fetching. Setting this value to -1 allows the system to automatically decide the number of items to buffer.