You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/cosmos-db/index-overview.md
+5-5Lines changed: 5 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -20,7 +20,7 @@ The goal of this article is to explain how Azure Cosmos DB indexes data and how
20
20
21
21
## From items to trees
22
22
23
-
Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation. What that means is that every property of that item gets represented as a node in a tree. A pseudo root node is created as a parent to all the first-level properties of the item. The leaf nodes contain the actual scalar values carried by an item.
23
+
Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation. This means that every property of that item gets represented as a node in a tree. A pseudo root node is created as a parent to all the first-level properties of the item. The leaf nodes contain the actual scalar values carried by an item.
24
24
25
25
As an example, consider this item:
26
26
@@ -201,7 +201,7 @@ Here is a table that summarizes the different ways indexes are used in Azure Cos
201
201
| Full index scan | Read distinct set of indexed values and load only matching items from the transactional data store | Contains, EndsWith, RegexMatch, LIKE | Increases linearly based on the cardinality of indexed properties | Increases based on number of items in query results |
202
202
| Full scan | Load all items from the transactional data store | Upper, Lower | N/A | Increases based on number of items in container |
203
203
204
-
When writing queries, you should use filter predicate that use the index as efficiently as possible. For example, if either `StartsWith` or `Contains` would work for your use case, you should opt for `StartsWith` since it will do a precise index scan instead of a full index scan.
204
+
When writing queries, you should use filter predicate that uses the index as efficiently as possible. For example, if either `StartsWith` or `Contains` would work for your use case, you should opt for `StartsWith` since it will do a precise index scan instead of a full index scan.
205
205
206
206
## Index usage details
207
207
@@ -273,7 +273,7 @@ The query predicate (filtering on items where any location has "France" as its c
273
273
274
274
:::image type="content" source="./media/index-overview/matching-path.png" alt-text="Matching a specific path within a tree" border="false":::
275
275
276
-
Since this query has an equality filter, after traversing this tree, we can quickly identify the index pages that contain the query results. In this case, the query engine would read index pages that contain Item 1. An index seek is the most efficient way to use the index. With an index seek we only read the necessary index pages and load only the items in the query results. Therefore, the index lookup time and RU charge from index lookup are incredibly low, regardless of the total data volume.
276
+
Since this query has an equality filter, after traversing this tree, we can quickly identify the index pages that contain the query results. In this case, the query engine would read index pages that contain Item 1. An index seek is the most efficient way to use the index. With an index seek, we only read the necessary index pages and load only the items in the query results. Therefore, the index lookup time and RU charge from index lookup are incredibly low, regardless of the total data volume.
To execute this query, the query engine must do an index seek on `headquarters/employees` and full index scan on `headquarters/country`. The query engine has internal heuristics that it uses to evaluate the query filter expression as efficiently as possible. In this case, the query engine would avoid needing to read unnecessary index pages by doing the index seek first. If, for example, only 50 items matched the equality filter, the query engine would only need to evaluate `Contains` on the index pages that contained those 50 items. A full index scan of the entire container wouldn't be necessary.
348
+
To execute this query, the query engine must do an index seek on `headquarters/employees` and full index scan on `headquarters/country`. The query engine has internal heuristics that it uses to evaluate the query filter expression as efficiently as possible. In this case, the query engine would avoid needing to read unnecessary index pages by doing the index seek first. If for example, only 50 items matched the equality filter, the query engine would only need to evaluate `Contains` on the index pages that contained those 50 items. A full index scan of the entire container wouldn't be necessary.
349
349
350
350
## Index utilization for scalar aggregate functions
351
351
352
352
Queries with aggregate functions must rely exclusively on the index in order to use it.
353
353
354
354
In some cases, the index can return false positives. For example, when evaluating `Contains` on the index, the number of matches in the index may exceed the number of query results. The query engine will load all index matches, evaluate the filter on the loaded items, and return only the correct results.
355
355
356
-
For the majority of queries, loading false positive index matches will not have any noticeable impact on index utilization.
356
+
For most queries, loading false positive index matches will not have any noticeable impact on index utilization.
In addition to including or excluding paths for individual properties, you can also specify a composite index. If you would like to perform a query that has an `ORDER BY` clause for multiple properties, a [composite index](../index-policy.md#composite-indexes) on those properties is required. Additionally, composite indexes will have a performance benefit for queries that have a multiple filters or both a filter and an ORDER BY clause.
103
+
In addition to including or excluding paths for individual properties, you can also specify a composite index. If you would like to perform a query that has an `ORDER BY` clause for multiple properties, a [composite index](../index-policy.md#composite-indexes) on those properties is required. Additionally, composite indexes will have a performance benefit for queries that have multiple filters or both a filter and an ORDER BY clause.
104
104
105
105
> [!NOTE]
106
106
> Composite paths have an implicit `/?` since only the scalar value at that path is indexed. The `/*` wildcard is not supported in composite paths. You shouldn't specify `/?` or `/*` in a composite path.
When defining a custom indexing policy while creating a new container, the SDK V3's fluent API lets you write this definition in a concise and efficient way:
339
+
The SDK V3's fluent API lets you write this definition in a concise and efficient way when defining a custom indexing policy while creating a new container:
0 commit comments