Commit 6a16c4b

Merge pull request #215524 from jcocchi/cosmos-query-updates
Cosmos DB Geospatial and indexing policy updates
2 parents f91b9a9 + a17f969 commit 6a16c4b

9 files changed: +120 -139 lines changed

articles/cosmos-db/index-overview.md

Lines changed: 6 additions & 6 deletions
@@ -20,7 +20,7 @@ The goal of this article is to explain how Azure Cosmos DB indexes data and how
 
 ## From items to trees
 
-Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation. What that means is that every property of that item gets represented as a node in a tree. A pseudo root node is created as a parent to all the first-level properties of the item. The leaf nodes contain the actual scalar values carried by an item.
+Every time an item is stored in a container, its content is projected as a JSON document, then converted into a tree representation. This means that every property of that item gets represented as a node in a tree. A pseudo root node is created as a parent to all the first-level properties of the item. The leaf nodes contain the actual scalar values carried by an item.
 
 As an example, consider this item:
 
@@ -147,7 +147,7 @@ Range indexes can be used on scalar values (string or number). The default index
 SELECT * FROM c WHERE ST_INTERSECTS(c.property, { 'type':'Polygon', 'coordinates': [[ [31.8, -5], [32, -5], [31.8, -5] ]] })
 ```
 
-Spatial indexes can be used on correctly formatted [GeoJSON](./sql-query-geospatial-intro.md) objects. Points, LineStrings, Polygons, and MultiPolygons are currently supported. To use this index type, set by using the `"kind": "Range"` property when configuring the indexing policy. To learn how to configure spatial indexes, see [Spatial indexing policy examples](how-to-manage-indexing-policy.md#spatial-index)
+Spatial indexes can be used on correctly formatted [GeoJSON](./sql-query-geospatial-intro.md) objects. Points, LineStrings, Polygons, and MultiPolygons are currently supported. To learn how to configure spatial indexes, see [Spatial indexing policy examples](how-to-manage-indexing-policy.md#spatial-index)
 
 ### Composite indexes
 
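
Note on the hunk above: the removed sentence tied spatial indexes to `"kind": "Range"`. A spatial index is instead declared in the `spatialIndexes` section of the indexing policy. The following is a minimal sketch rather than text from this commit; the property path `/location/*` is a hypothetical name chosen for illustration, and the listed types are the four GeoJSON types the paragraph names as supported.

```json
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [],
    "spatialIndexes": [
        {
            "path": "/location/*",
            "types": ["Point", "LineString", "Polygon", "MultiPolygon"]
        }
    ]
}
```

Listing only the types that actually occur under the path (for example just `Point`) is likely to keep the index smaller; the full list is shown here only to mirror the supported types.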
@@ -201,7 +201,7 @@ Here is a table that summarizes the different ways indexes are used in Azure Cos
 | Full index scan | Read distinct set of indexed values and load only matching items from the transactional data store | Contains, EndsWith, RegexMatch, LIKE | Increases linearly based on the cardinality of indexed properties | Increases based on number of items in query results |
 | Full scan | Load all items from the transactional data store | Upper, Lower | N/A | Increases based on number of items in container |
 
-When writing queries, you should use filter predicate that use the index as efficiently as possible. For example, if either `StartsWith` or `Contains` would work for your use case, you should opt for `StartsWith` since it will do a precise index scan instead of a full index scan.
+When writing queries, you should use filter predicate that uses the index as efficiently as possible. For example, if either `StartsWith` or `Contains` would work for your use case, you should opt for `StartsWith` since it will do a precise index scan instead of a full index scan.
 
 ## Index usage details
 
@@ -273,7 +273,7 @@ The query predicate (filtering on items where any location has "France" as its c
 
 :::image type="content" source="./media/index-overview/matching-path.png" alt-text="Matching a specific path within a tree" border="false":::
 
-Since this query has an equality filter, after traversing this tree, we can quickly identify the index pages that contain the query results. In this case, the query engine would read index pages that contain Item 1. An index seek is the most efficient way to use the index. With an index seek we only read the necessary index pages and load only the items in the query results. Therefore, the index lookup time and RU charge from index lookup are incredibly low, regardless of the total data volume.
+Since this query has an equality filter, after traversing this tree, we can quickly identify the index pages that contain the query results. In this case, the query engine would read index pages that contain Item 1. An index seek is the most efficient way to use the index. With an index seek, we only read the necessary index pages and load only the items in the query results. Therefore, the index lookup time and RU charge from index lookup are incredibly low, regardless of the total data volume.
 
 ### Precise index scan
 
@@ -345,15 +345,15 @@ FROM company
 WHERE company.headquarters.employees = 200 AND CONTAINS(company.headquarters.country, "United")
 ```
 
-To execute this query, the query engine must do an index seek on `headquarters/employees` and full index scan on `headquarters/country`. The query engine has internal heuristics that it uses to evaluate the query filter expression as efficiently as possible. In this case, the query engine would avoid needing to read unnecessary index pages by doing the index seek first. If, for example, only 50 items matched the equality filter, the query engine would only need to evaluate `Contains` on the index pages that contained those 50 items. A full index scan of the entire container wouldn't be necessary.
+To execute this query, the query engine must do an index seek on `headquarters/employees` and full index scan on `headquarters/country`. The query engine has internal heuristics that it uses to evaluate the query filter expression as efficiently as possible. In this case, the query engine would avoid needing to read unnecessary index pages by doing the index seek first. If for example, only 50 items matched the equality filter, the query engine would only need to evaluate `Contains` on the index pages that contained those 50 items. A full index scan of the entire container wouldn't be necessary.
 
 ## Index utilization for scalar aggregate functions
 
 Queries with aggregate functions must rely exclusively on the index in order to use it.
 
 In some cases, the index can return false positives. For example, when evaluating `Contains` on the index, the number of matches in the index may exceed the number of query results. The query engine will load all index matches, evaluate the filter on the loaded items, and return only the correct results.
 
-For the majority of queries, loading false positive index matches will not have any noticeable impact on index utilization.
+For most queries, loading false positive index matches will not have any noticeable impact on index utilization.
 
 For example, consider the following query:
 
articles/cosmos-db/index-policy.md

Lines changed: 1 addition & 17 deletions
@@ -87,22 +87,6 @@ Any indexing policy has to include the root path `/*` as either an included or a
 
 - If the indexing mode is set to **consistent**, the system properties `id` and `_ts` are automatically indexed.
 
-When including and excluding paths, you may encounter the following attributes:
-
-- `kind` can be either `range` or `hash`. Hash index support is limited to equality filters. Range index functionality provides all of the functionality of hash indexes as well as efficient sorting, range filters, system functions. We always recommend using a range index.
-
-- `precision` is a number defined at the index level for included paths. A value of `-1` indicates maximum precision. We recommend always setting this value to `-1`.
-
-- `dataType` can be either `String` or `Number`. This indicates the types of JSON properties that will be indexed.
-
-It's no longer necessary to set these properties. When not specified, these properties will have the following default values:
-
-| **Property Name** | **Default Value** |
-| ----------------------- | -------------------------------- |
-| `kind` | `range` |
-| `precision` | `-1` |
-| `dataType` | `String` and `Number` |
-
 See [this section](how-to-manage-indexing-policy.md#indexing-policy-examples) for indexing policy examples for including and excluding paths.
 
 ## Include/exclude precedence
@@ -335,7 +319,7 @@ A container's indexing policy can be updated at any time [by using the Azure por
 > Index transformation is an operation that consumes [Request Units](request-units.md). Request Units consumed by an index transformation aren't currently billed if you are using [serverless](serverless.md) containers. These Request Units will get billed once serverless becomes generally available.
 
 > [!NOTE]
-> You can track the progress of index transformation in the Azure portal or [by using one of the SDKs](how-to-manage-indexing-policy.md).
+> You can track the progress of index transformation in the [Azure portal](how-to-manage-indexing-policy.md#use-the-azure-portal) or by [using one of the SDKs](how-to-manage-indexing-policy.md#dotnet-sdk).
 
 There's no impact to write availability during any index transformations. The index transformation uses your provisioned RUs but at a lower priority than your CRUD operations or queries.
 
articles/cosmos-db/nosql/how-to-manage-indexing-policy.md

Lines changed: 32 additions & 108 deletions
@@ -44,40 +44,6 @@ Here are some examples of indexing policies shown in [their JSON format](../inde
 }
 ```
 
-This indexing policy is equivalent to the one below which manually sets ```kind```, ```dataType```, and ```precision``` to their default values. These properties are no longer necessary to explicitly set and you should omit them from your indexing policy entirely (as shown in above example). If you try to set these properties, they'll be automatically removed from your indexing policy.
-
-
-```json
-{
-    "indexingMode": "consistent",
-    "includedPaths": [
-        {
-            "path": "/*",
-            "indexes": [
-                {
-                    "kind": "Range",
-                    "dataType": "Number",
-                    "precision": -1
-                },
-                {
-                    "kind": "Range",
-                    "dataType": "String",
-                    "precision": -1
-                }
-            ]
-        }
-    ],
-    "excludedPaths": [
-        {
-            "path": "/path/to/single/excluded/property/?"
-        },
-        {
-            "path": "/path/to/root/of/multiple/excluded/properties/*"
-        }
-    ]
-}
-```
-
 ### Opt-in policy to selectively include some property paths
 
 ```json
@@ -99,48 +65,6 @@ This indexing policy is equivalent to the one below which manually sets ```kind`
 }
 ```
 
-This indexing policy is equivalent to the one below which manually sets ```kind```, ```dataType```, and ```precision``` to their default values. These properties are no longer necessary to explicitly set and you should omit them from your indexing policy entirely (as shown in above example). If you try to set these properties, they'll be automatically removed from your indexing policy.
-
-
-```json
-{
-    "indexingMode": "consistent",
-    "includedPaths": [
-        {
-            "path": "/path/to/included/property/?",
-            "indexes": [
-                {
-                    "kind": "Range",
-                    "dataType": "Number"
-                },
-                {
-                    "kind": "Range",
-                    "dataType": "String"
-                }
-            ]
-        },
-        {
-            "path": "/path/to/root/of/multiple/included/properties/*",
-            "indexes": [
-                {
-                    "kind": "Range",
-                    "dataType": "Number"
-                },
-                {
-                    "kind": "Range",
-                    "dataType": "String"
-                }
-            ]
-        }
-    ],
-    "excludedPaths": [
-        {
-            "path": "/*"
-        }
-    ]
-}
-```
-
 > [!NOTE]
 > It is generally recommended to use an **opt-out** indexing policy to let Azure Cosmos DB proactively index any new property that may be added to your data model.
 
@@ -176,7 +100,7 @@ This indexing policy is equivalent to the one below which manually sets ```kind`
 
 ## <a id="composite-index"></a>Composite indexing policy examples
 
-In addition to including or excluding paths for individual properties, you can also specify a composite index. If you would like to perform a query that has an `ORDER BY` clause for multiple properties, a [composite index](../index-policy.md#composite-indexes) on those properties is required. Additionally, composite indexes will have a performance benefit for queries that have a multiple filters or both a filter and an ORDER BY clause.
+In addition to including or excluding paths for individual properties, you can also specify a composite index. If you would like to perform a query that has an `ORDER BY` clause for multiple properties, a [composite index](../index-policy.md#composite-indexes) on those properties is required. Additionally, composite indexes will have a performance benefit for queries that have multiple filters or both a filter and an ORDER BY clause.
 
 > [!NOTE]
 > Composite paths have an implicit `/?` since only the scalar value at that path is indexed. The `/*` wildcard is not supported in composite paths. You shouldn't specify `/?` or `/*` in a composite path.
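
Note on the hunk above: the composite index it describes can be sketched as the following indexing policy. This is an illustrative example, not part of the commit; the property paths `/name` and `/age` are borrowed from the .NET SDK sample elsewhere in this diff and are otherwise assumptions. In line with the note, the composite paths carry no `/?` or `/*` suffix.

```json
{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        { "path": "/*" }
    ],
    "excludedPaths": [],
    "compositeIndexes": [
        [
            { "path": "/name", "order": "ascending" },
            { "path": "/age", "order": "descending" }
        ]
    ]
}
```

A policy like this would be expected to serve a query ending in `ORDER BY c.name ASC, c.age DESC`; the order of the paths and their sort directions are part of the index definition.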
@@ -377,36 +301,6 @@ To create a container with a custom indexing policy see, [Create a container wit
 
 ## <a id="dotnet-sdk"></a> Use the .NET SDK
 
-# [.NET SDK V2](#tab/dotnetv2)
-
-The `DocumentCollection` object from the [.NET SDK v2](https://www.nuget.org/packages/Microsoft.Azure.DocumentDB/) exposes an `IndexingPolicy` property that lets you change the `IndexingMode` and add or remove `IncludedPaths` and `ExcludedPaths`.
-
-```csharp
-// Retrieve the container's details
-ResourceResponse<DocumentCollection> containerResponse = await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri("database", "container"));
-// Set the indexing mode to consistent
-containerResponse.Resource.IndexingPolicy.IndexingMode = IndexingMode.Consistent;
-// Add an included path
-containerResponse.Resource.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
-// Add an excluded path
-containerResponse.Resource.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/name/*" });
-// Add a spatial index
-containerResponse.Resource.IndexingPolicy.SpatialIndexes.Add(new SpatialSpec() { Path = "/locations/*", SpatialTypes = new Collection<SpatialType>() { SpatialType.Point } } );
-// Add a composite index
-containerResponse.Resource.IndexingPolicy.CompositeIndexes.Add(new Collection<CompositePath> {new CompositePath() { Path = "/name", Order = CompositePathSortOrder.Ascending }, new CompositePath() { Path = "/age", Order = CompositePathSortOrder.Descending }});
-// Update container with changes
-await client.ReplaceDocumentCollectionAsync(containerResponse.Resource);
-```
-
-To track the index transformation progress, pass a `RequestOptions` object that sets the `PopulateQuotaInfo` property to `true`.
-
-```csharp
-// retrieve the container's details
-ResourceResponse<DocumentCollection> container = await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri("database", "container"), new RequestOptions { PopulateQuotaInfo = true });
-// retrieve the index transformation progress from the result
-long indexTransformationProgress = container.IndexTransformationProgress;
-```
-
 # [.NET SDK V3](#tab/dotnetv3)
 
 The `ContainerProperties` object from the [.NET SDK v3](https://www.nuget.org/packages/Microsoft.Azure.Cosmos/) (see [this Quickstart](quickstart-dotnet.md) regarding its usage) exposes an `IndexingPolicy` property that lets you change the `IndexingMode` and add or remove `IncludedPaths` and `ExcludedPaths`.
@@ -442,7 +336,7 @@ ContainerResponse containerResponse = await client.GetContainer("database", "con
 long indexTransformationProgress = long.Parse(containerResponse.Headers["x-ms-documentdb-collection-index-transformation-progress"]);
 ```
 
-When defining a custom indexing policy while creating a new container, the SDK V3's fluent API lets you write this definition in a concise and efficient way:
+The SDK V3's fluent API lets you write this definition in a concise and efficient way when defining a custom indexing policy while creating a new container:
 
 ```csharp
 await client.GetDatabase("database").DefineContainer(name: "container", partitionKeyPath: "/myPartitionKey")
@@ -463,6 +357,36 @@ await client.GetDatabase("database").DefineContainer(name: "container", partitio
     .Attach()
     .CreateIfNotExistsAsync();
 ```
+
+# [.NET SDK V2](#tab/dotnetv2)
+
+The `DocumentCollection` object from the [.NET SDK v2](https://www.nuget.org/packages/Microsoft.Azure.DocumentDB/) exposes an `IndexingPolicy` property that lets you change the `IndexingMode` and add or remove `IncludedPaths` and `ExcludedPaths`.
+
+```csharp
+// Retrieve the container's details
+ResourceResponse<DocumentCollection> containerResponse = await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri("database", "container"));
+// Set the indexing mode to consistent
+containerResponse.Resource.IndexingPolicy.IndexingMode = IndexingMode.Consistent;
+// Add an included path
+containerResponse.Resource.IndexingPolicy.IncludedPaths.Add(new IncludedPath { Path = "/*" });
+// Add an excluded path
+containerResponse.Resource.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/name/*" });
+// Add a spatial index
+containerResponse.Resource.IndexingPolicy.SpatialIndexes.Add(new SpatialSpec() { Path = "/locations/*", SpatialTypes = new Collection<SpatialType>() { SpatialType.Point } } );
+// Add a composite index
+containerResponse.Resource.IndexingPolicy.CompositeIndexes.Add(new Collection<CompositePath> {new CompositePath() { Path = "/name", Order = CompositePathSortOrder.Ascending }, new CompositePath() { Path = "/age", Order = CompositePathSortOrder.Descending }});
+// Update container with changes
+await client.ReplaceDocumentCollectionAsync(containerResponse.Resource);
+```
+
+To track the index transformation progress, pass a `RequestOptions` object that sets the `PopulateQuotaInfo` property to `true`.
+
+```csharp
+// retrieve the container's details
+ResourceResponse<DocumentCollection> container = await client.ReadDocumentCollectionAsync(UriFactory.CreateDocumentCollectionUri("database", "container"), new RequestOptions { PopulateQuotaInfo = true });
+// retrieve the index transformation progress from the result
+long indexTransformationProgress = container.IndexTransformationProgress;
+```
 ---
 
 ## Use the Java SDK
articles/cosmos-db/nosql/query/TOC.yml

Lines changed: 3 additions & 0 deletions
@@ -226,6 +226,9 @@
 - name: Spatial functions
   displayName: spatial functions, spatial, system functions, built-in functions
   href: spatial-functions.md
+- name: ST_AREA
+  displayName: ST_AREA, st area, area, built-in functions
+  href: st-area.md
 - name: ST_DISTANCE
   displayName: ST_DISTANCE, st distance, distance, built-in functions
   href: st-distance.md

articles/cosmos-db/nosql/query/geospatial-intro.md

Lines changed: 4 additions & 2 deletions
@@ -101,11 +101,13 @@ Azure Cosmos DB interprets coordinates as represented per the WGS-84 reference s
 **LineStrings in GeoJSON**
 
 ```json
+{
     "type":"LineString",
-    "coordinates":[ [
+    "coordinates":[
         [ 31.8, -5 ],
         [ 31.8, -4.7 ]
-    ] ]
+    ]
+}
 ```
 
 ### Polygons

articles/cosmos-db/nosql/query/spatial-functions.md

Lines changed: 1 addition & 5 deletions
@@ -18,17 +18,13 @@ Azure Cosmos DB supports the following Open Geospatial Consortium (OGC) built-in
 
 The following scalar functions perform an operation on a spatial object input value and return a numeric or Boolean value.
 
+* [ST_AREA](st-area.md)
 * [ST_DISTANCE](st-distance.md)
 * [ST_INTERSECTS](st-intersects.md)
 * [ST_ISVALID](st-isvalid.md)
 * [ST_ISVALIDDETAILED](st-isvaliddetailed.md)
 * [ST_WITHIN](st-within.md)
 
-
-
-
-
-
 ## Next steps
 
 - [System functions Azure Cosmos DB](system-functions.md)
