Commit ab410b8

Merge pull request #101483 from timsander1/master
fix headings
2 parents 9bf674f + 894f549 commit ab410b8

1 file changed: +18 -47 lines changed

articles/cosmos-db/troubleshoot-query-performance.md

Lines changed: 18 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -33,13 +33,11 @@ You can reference the below section to understand the relevant query optimizatio
3333

3434
#### Retrieved Document Count is significantly greater than Output Document Count
3535

36-
- [Ensure that the indexing policy includes necessary paths](#ensure-that-the-indexing-policy-includes-necessary-paths)
36+
- [Include necessary paths in the indexing policy](#include-necessary-paths-in-the-indexing-policy)
3737

3838
- [Understand which system functions utilize the index](#understand-which-system-functions-utilize-the-index)
3939

40-
- [Optimize queries with both a filter and an ORDER BY clause](#optimize-queries-with-both-a-filter-and-an-order-by-clause)
41-
42-
- [Optimize queries that use DISTINCT](#optimize-queries-that-use-distinct)
40+
- [Queries with both a filter and an ORDER BY clause](#queries-with-both-a-filter-and-an-order-by-clause)
4341

4442
- [Optimize JOIN expressions by using a subquery](#optimize-join-expressions-by-using-a-subquery)
4543

@@ -49,23 +47,23 @@ You can reference the below section to understand the relevant query optimizatio
4947

5048
- [Avoid cross partition queries](#avoid-cross-partition-queries)
5149

52-
- [Optimize queries that have a filter on multiple properties](#optimize-queries-that-have-a-filter-on-multiple-properties)
50+
- [Filters on multiple properties](#filters-on-multiple-properties)
5351

54-
- [Optimize queries with both a filter and an ORDER BY clause](#optimize-queries-with-both-a-filter-and-an-order-by-clause)
52+
- [Queries with both a filter and an ORDER BY clause](#queries-with-both-a-filter-and-an-order-by-clause)
5553

5654
<br>
5755

5856
### Query's RU charge is acceptable but latency is still too high
5957

60-
- [Improving proximity between your app and Azure Cosmos DB](#improving-proximity-between-your-app-and-azure-cosmos-db)
58+
- [Improve proximity](#improve-proximity)
6159

62-
- [Increasing provisioned throughput](#increasing-provisioned-throughput)
60+
- [Increase provisioned throughput](#increase-provisioned-throughput)
6361

64-
- [Increasing MaxConcurrency](#increasing-maxconcurrency)
62+
- [Increase MaxConcurrency](#increase-maxconcurrency)
6563

66-
- [Increasing MaxBufferedItemCount](#increasing-maxbuffereditemcount)
64+
- [Increase MaxBufferedItemCount](#increase-maxbuffereditemcount)
6765

68-
## Optimizations for queries where Retrieved Document Count significantly exceeds Output Document Count:
66+
## Queries where Retrieved Document Count exceeds Output Document Count
6967

7068
The Retrieved Document Count is the number of documents that the query needed to load. The Output Document Count is the number of documents that were needed for the results of the query. If the Retrieved Document Count is significantly higher than the Output Document Count, then there was at least one part of your query that was unable to utilize the index and needed to do a scan.
7169

@@ -107,7 +105,7 @@ Client Side Metrics
107105

108106
Retrieved Document Count (60,951) is significantly greater than Output Document Count (7) so this query needed to do a scan. In this case, the system function [UPPER()](sql-query-upper.md) does not utilize the index.
109107

110-
## Ensure that the indexing policy includes necessary paths
108+
## Include necessary paths in the indexing policy
111109

112110
Your indexing policy should cover any properties included in `WHERE` clauses, `ORDER BY` clauses, `JOIN`, and most System Functions. The path specified in the index policy should match (case-sensitive) the property in the JSON documents.
113111

@@ -185,7 +183,7 @@ Some common system functions that do not use the index and must load each docume
185183

186184
Other parts of the query may still utilize the index despite the system functions not using the index.
187185

188-
## Optimize queries with both a filter and an ORDER BY clause
186+
## Queries with both a filter and an ORDER BY clause
189187

190188
While queries with a filter and an `ORDER BY` clause will normally utilize a range index, they will be more efficient if they can be served from a composite index. In addition to modifying the indexing policy, you should add all properties in the composite index to the `ORDER BY` clause. This query modification will ensure that it utilizes the composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
191189

@@ -255,33 +253,6 @@ Updated indexing policy:
255253

256254
**RU Charge:** 8.86 RU's
257255

258-
## Optimize queries that use DISTINCT
259-
260-
It will be more efficient to find the `DISTINCT` set of results if the duplicate results are consecutive. Adding an `ORDER BY` clause to the query and a composite index will ensure that duplicate results are consecutive. If you need to `ORDER BY` multiple properties, add a composite index. You can observe the impact by running a query on the [nutrition](https://github.com/CosmosDB/labs/blob/master/dotnet/setup/NutritionData.json) dataset.
261-
262-
### Original
263-
264-
Query:
265-
266-
```sql
267-
SELECT DISTINCT c.foodGroup
268-
FROM c
269-
```
270-
271-
**RU Charge:** 32.39 RU's
272-
273-
### Optimized
274-
275-
Updated query:
276-
277-
```sql
278-
SELECT DISTINCT c.foodGroup
279-
FROM c
280-
ORDER BY c.foodGroup
281-
```
282-
283-
**RU Charge:** 3.38 RU's
284-
285256
## Optimize JOIN expressions by using a subquery
286257
Multi-value subqueries can optimize `JOIN` expressions by pushing predicates after each select-many expression rather than after all cross-joins in the `WHERE` clause.
287258

@@ -317,7 +288,7 @@ JOIN (SELECT VALUE s FROM s IN c.servings WHERE s.amount > 1)
317288

318289
Assume that only one item in the tags array matches the filter, and there are five items for both nutrients and servings arrays. The `JOIN` expressions will then expand to 1 x 1 x 5 x 5 = 25 items, as opposed to 1,000 items in the first query.
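As a rough illustration of the arithmetic in the line above (not part of the changed file), the cross-product size can be sketched in Python. The helper name `expanded_items` is hypothetical, and the figure of 10 items per array for the first query is an assumption chosen to match the 1,000 items quoted; the visible diff context does not state it.

```python
from math import prod


def expanded_items(matching_documents, array_sizes):
    """Number of tuples a chain of JOINs produces for the matching documents.

    Each JOIN over an array multiplies the tuple count by the number of
    array items that survive to that point.
    """
    return matching_documents * prod(array_sizes)


# First query (assumed sizes): filters run after the full cross-product,
# so every item of tags, nutrients, and servings takes part.
unfiltered = expanded_items(1, [10, 10, 10])

# Subquery version: one tag matches, and five nutrients and five servings
# pass their filters before the JOIN expands.
filtered = expanded_items(1, [1, 5, 5])

print(unfiltered, filtered)
```

The point of the sketch is only that filtering inside each subquery shrinks every factor of the product before the multiplication happens.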
319290

320-
## Optimizations for queries where Retrieved Document Count is approximately equal to Output Document Count:
291+
## Queries where Retrieved Document Count is equal to Output Document Count
321292

322293
If the Retrieved Document Count is approximately equal to the Output Document Count, it means the query did not have to scan many unnecessary documents. For many queries, such as those that use the TOP keyword, Retrieved Document Count may exceed Output Document Count by 1. This should not be cause for concern.
323294

@@ -353,7 +324,7 @@ SELECT * FROM c
353324
WHERE c.foodGroup > "Soups, Sauces, and Gravies" and c.description = "Mushroom, oyster, raw"
354325
```
355326
356-
## Optimize queries that have a filter on multiple properties
327+
## Filters on multiple properties
357328
358329
While queries with filters on multiple properties will normally utilize a range index, they will be more efficient if they can be served from a composite index. For small amounts of data, this optimization will not have a significant impact. It may prove useful, however, for large amounts of data. You can only optimize, at most, one non-equality filter per composite index. If your query has multiple non-equality filters, you should pick one of them that will utilize the composite index. The remainder will continue to utilize range indexes. The non-equality filter must be defined last in the composite index. [Learn more about composite indexes](index-policy.md#composite-indexes)
359330
@@ -396,23 +367,23 @@ Here is the relevant composite index:
396367
}
397368
```
398369
399-
## Common optimizations that reduce query latency (no impact on RU charge):
370+
## Optimizations that reduce query latency:
400371
401372
In many cases, RU charge may be acceptable but query latency is still too high. The below sections give an overview of tips for reducing query latency. If you run the same query multiple times on the same dataset, it will have the same RU charge each time. However, query latency may vary between query executions.
402373
403-
## Improving proximity between your app and Azure Cosmos DB
374+
## Improve proximity
404375
405376
Queries that are run from a different region than the Azure Cosmos DB account will have a higher latency than if they were run inside the same region. For example, if you were running code on your desktop computer, you should expect latency to be tens or hundreds (or more) milliseconds greater than if the query came from a Virtual Machine within the same Azure region as Azure Cosmos DB. It is simple to [globally distribute data in Azure Cosmos DB](distribute-data-globally.md) to ensure you can bring your data closer to your app.
406377
407-
## Increasing provisioned throughput
378+
## Increase provisioned throughput
408379
409380
In Azure Cosmos DB, your provisioned throughput is measured in Request Units (RUs). Imagine you have a query that consumes 5 RUs of throughput. If you provision 1,000 RUs per second, you would be able to run that query 200 times per second. If you attempted to run the query when there was not enough throughput available, Azure Cosmos DB would return an HTTP 429 error. Any of the current Core (SQL) API SDKs will automatically retry this query after waiting a brief period. Throttled requests take longer, so increasing provisioned throughput can improve query latency. You can observe the [total number of throttled requests](use-metrics.md#understand-how-many-requests-are-succeeding-or-causing-errors) in the Metrics blade of the Azure portal.
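The throughput arithmetic in the line above can be checked with a small Python sketch (an annotation, not part of the changed file). The function name `max_queries_per_second` is hypothetical, and the sketch assumes the query is the only workload consuming the provisioned throughput.

```python
def max_queries_per_second(provisioned_ru_per_second, ru_per_query):
    """Whole number of queries that fit into one second of provisioned RUs.

    Assumes no other requests share the throughput; anything beyond this
    rate would be throttled (HTTP 429) and retried by the SDK.
    """
    if ru_per_query <= 0:
        raise ValueError("ru_per_query must be positive")
    return provisioned_ru_per_second // ru_per_query


# The example from the text: a 5 RU query against 1,000 provisioned RUs.
rate = max_queries_per_second(1000, 5)
print(rate)
```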
410381
411-
## Increasing MaxConcurrency
382+
## Increase MaxConcurrency
412383
413384
Parallel queries work by querying multiple partitions in parallel. However, data from an individual partition is fetched serially with respect to the query. So, setting MaxConcurrency to the number of partitions has the best chance of achieving the most performant query, provided all other system conditions remain the same. If you don't know the number of partitions, you can set MaxConcurrency (or MaxDegreeOfParallelism in older SDK versions) to a high number, and the system chooses the minimum of the number of partitions and the user-provided value as the maximum degree of parallelism.
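The "minimum of the two values" rule in the line above can be sketched in Python (an annotation, not part of the changed file). The function `effective_parallelism` is a hypothetical illustration of the rule as described; it is not an SDK call and only covers positive values.

```python
def effective_parallelism(partition_count, max_concurrency):
    """Degree of parallelism implied by the rule described in the text.

    The query cannot fan out to more partitions than exist, and it will
    not exceed the limit the caller supplied.
    """
    if partition_count < 1 or max_concurrency < 1:
        raise ValueError("both values must be positive in this sketch")
    return min(partition_count, max_concurrency)


# A deliberately high setting is capped by the number of partitions.
capped_by_partitions = effective_parallelism(10, 1000)

# A low setting limits the fan-out even when more partitions exist.
capped_by_setting = effective_parallelism(10, 4)

print(capped_by_partitions, capped_by_setting)
```

This is why setting a high value is a safe default when the partition count is unknown: the extra headroom is simply not used.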
414385
415-
## Increasing MaxBufferedItemCount
386+
## Increase MaxBufferedItemCount
416387
417388
Queries are designed to pre-fetch results while the current batch of results is being processed by the client. The pre-fetching helps in overall latency improvement of a query. Setting the MaxBufferedItemCount limits the number of pre-fetched results. By setting this value to the expected number of results returned (or a higher number), the query can receive maximum benefit from pre-fetching. Setting this value to -1 allows the system to automatically decide the number of items to buffer.
418389
