Commit 508c73b

author
Jill Grant
authored
Merge pull request #279116 from abinav2307/master
Update how-to-create-indexes.md
2 parents 2c6255a + a442ae4 commit 508c73b

1 file changed

+121
-29
lines changed

articles/cosmos-db/mongodb/vcore/how-to-create-indexes.md

Lines changed: 121 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -1,67 +1,159 @@
11
---
2-
title: Optimize index creation in Azure Cosmos DB for MongoDB vCore
2+
title: Indexing Best Practices in Azure Cosmos DB for MongoDB vCore
33
titleSuffix: Azure Cosmos DB for MongoDB vCore
44
description: Use create Indexing for empty collections in Azure Cosmos DB for MongoDB vCore.
5-
author: khelanmodi
6-
ms.author: khelanmodi
5+
author: abinav2307
6+
ms.author: abramees
77
ms.reviewer: sidandrews
88
ms.service: cosmos-db
99
ms.subservice: mongodb-vcore
1010
ms.topic: conceptual
11-
ms.date: 1/24/2024
11+
ms.date: 6/24/2024
1212
---
1313

14-
# Optimize index creation in Azure Cosmos DB for MongoDB vCore
14+
# Indexing Best Practices in Azure Cosmos DB for MongoDB vCore
1515

1616
[!INCLUDE[MongoDB vCore](~/reusable-content/ce-skilling/azure/includes/cosmos-db/includes/appliesto-mongodb-vcore.md)]
1717

18-
The `CreateIndexes` Command in Azure Cosmos DB for MongoDB vCore has an option to optimize index creation, especially beneficial for scenarios involving empty collections. This document outlines the usage and expected behavior of this new option.
18+
## Queryable fields should always have indexes created
19+
Read operations based on predicates and aggregates consult the index for the corresponding filters. In the absence of indexes, the database engine performs a document scan to retrieve the matching documents. Scans are always expensive and get progressively more expensive as the volume of data in a collection grows. For optimal query performance, indexes should always be created for all queryable fields.
1920

20-
## Advantages in Specific Scenarios
21+
## Avoid unnecessary indexes and indexing all fields by default
22+
Indexes should be created only for queryable fields. Wildcard indexing should be used only when query patterns are unpredictable and any field in the document structure can appear in a query filter.
2123

22-
- **Efficiency in Migration Utilities**: This option is ideal in migration contexts, reducing the time for index creation by preventing delays caused by waiting for transactions with pre-existing snapshots.
23-
- **Streamlined Index Creation Process**: In Cosmos DB for MongoDB vCore, this translates to a simpler process with a single collection scan, enhancing efficiency.
24-
- **Enhanced Control**: Users gain more control over the indexing process, crucial in environments balancing read and write operations during index creation.
24+
> [!TIP]
25+
> Azure Cosmos DB for MongoDB vCore only indexes the _id field by default. All other fields are not indexed by default. The fields to be indexed should be planned ahead of time to maximize query performance, while minimizing impact on writes from indexing too many fields.
2526
26-
## Prerequisites
27+
When a new document is inserted for the first time or an existing document is updated or deleted, each of the specified fields in the index is also updated. If the indexing policy contains a large number of fields (or all the fields in the document), more resources are consumed by the server in updating the corresponding indexes. When running at scale, only the queryable fields should be indexed while all remaining fields not used in query predicates should remain excluded from the index.
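
One way to plan the indexed fields ahead of time is to derive them from the query filters the application is known to issue. The following is a minimal sketch in plain JavaScript, not part of the service or the MongoDB drivers: `collectQueryableFields` is a hypothetical helper, and the sample filters are illustrative assumptions. It only inspects top-level field names and the sub-filters of logical operators such as `$or`.

```javascript
// Hypothetical helper (not a MongoDB or Cosmos DB API): collect the field
// names referenced by a list of query filter documents.
function collectQueryableFields(filters) {
  const fields = new Set();
  const visit = (filter) => {
    for (const [key, value] of Object.entries(filter)) {
      if (key.startsWith("$")) {
        // Logical operators ($and, $or, $nor) hold an array of sub-filters.
        if (Array.isArray(value)) value.forEach(visit);
      } else {
        // Any other key is a field name used in a predicate.
        fields.add(key);
      }
    }
  };
  filters.forEach(visit);
  return [...fields].sort();
}

// Assumed sample filters an application might issue.
const sampleFilters = [
  { lastName: "Smith", timeInOrgInYears: { $gt: 5 } },
  { $or: [{ division: "Azure" }, { companyName: "Microsoft" }] }
];

const queryableFields = collectQueryableFields(sampleFilters);
console.log(queryableFields);
```

Fields that never appear in the resulting list are candidates to leave out of the index.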
2728

28-
- An existing Azure Cosmos DB for MongoDB vCore cluster.
29-
- If you don't have an Azure subscription, [create an account for free](https://azure.microsoft.com/free).
30-
- If you have an existing Azure subscription, [create a new Azure Cosmos DB for MongoDB vCore cluster](quickstart-portal.md).
29+
## Create the necessary indexes before data ingestion
30+
For optimal performance, indexes should be created upfront before data is loaded. All document writes, updates, and deletes synchronously update the corresponding indexes. If indexes are created after data is ingested, more server resources are consumed to index historical data. Depending on the size of the historical data, this operation is time-consuming and impacts steady-state read and write performance.
3131

32-
## Default Setting
32+
> [!NOTE]
33+
> For scenarios where read patterns change and indexes need to be added, background indexing should be enabled, which can be done through a support ticket.
3334
34-
The default value of this option is `false`, ensuring backward compatibility and maintaining the existing non-blocking behavior.
35+
## For multiple indexes created on historical data, issue nonblocking createIndex commands for each field
36+
It is not always possible to plan for all query patterns upfront, particularly as application requirements evolve. Changing application needs inevitably requires fields to be added to the index on a cluster with a large amount of historical data.
3537

36-
## Blocking Option
38+
In such scenarios, each createIndex command should be issued asynchronously without waiting on a response from the server.
3739

38-
The `CreateIndexes` Command includes a `{ "blocking": true }` option, designed to provide more control over the indexing process in an empty collection.
40+
> [!NOTE]
41+
> By default, Azure Cosmos DB for MongoDB vCore responds to a createIndex operation only after the index is fully built on historical data. Depending on the size of the cluster and the volume of data ingested, this can take time and appear as though the server is not responding to the createIndex command.
3942
40-
Setting `{ "blocking": true }` blocks all write operations (delete, update, insert) to the collection until index creation is completed. This feature is particularly useful in scenarios such as migration utilities where indexes are created on empty collections before data writes commence.
43+
If the createIndex commands are being issued through the Mongo Shell, use Ctrl + C to interrupt the command to stop waiting on a response and issue the next set of operations.
4144

42-
## Create an index using the blocking option
45+
> [!NOTE]
46+
> Using Ctrl + C to interrupt the createIndex command after it has been issued does not terminate the index build operation on the server. It simply stops the Shell from waiting on a response from the server, while the server asynchronously continues to build the index over the existing documents.
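
Issuing one command per field can be sketched as follows. This is an illustrative helper in plain JavaScript, assuming single-field ascending indexes and the `<field>_1` naming used elsewhere in this article; `buildCreateIndexCommands` is a hypothetical name, not a service API. It only builds the command documents and does not contact a server.

```javascript
// Hypothetical helper: build one createIndexes command document per field,
// so that each index build can be issued as a separate command.
function buildCreateIndexCommands(collection, fields) {
  return fields.map((field) => ({
    createIndexes: collection,
    // Single-field ascending index named "<field>_1" (an assumed convention).
    indexes: [{ key: { [field]: 1 }, name: `${field}_1` }]
  }));
}

// Assumed example: two fields on the 'employee' collection.
const indexCommands = buildCreateIndexCommands("employee", ["lastName", "division"]);
console.log(JSON.stringify(indexCommands, null, 2));
```

In the Mongo Shell, each resulting document could then be passed to `db.runCommand(...)` in turn, interrupting the wait with Ctrl + C as described above before issuing the next one.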
4347
44-
For simplicity, let us consider an example of a blog application with the following setup:
48+
## Create Compound Indexes for queries with predicates on multiple fields
49+
Compound indexes should be used in the following scenarios:
50+
- Queries with filters on multiple fields
51+
- Queries with filters on multiple fields and with one or more fields sorted in ascending or descending order
4552

46-
- **Database name**: `cosmicworks`
47-
- **Collection name**: `products`
53+
Consider the following document in the 'employee' collection of the 'cosmicworks' database:
54+
```json
55+
{
56+
"firstName": "Steve",
57+
"lastName": "Smith",
58+
"companyName": "Microsoft",
59+
"division": "Azure",
60+
"subDivision": "Data & AI",
61+
"timeInOrgInYears": 7
62+
}
63+
```
64+
65+
Consider the following query to find all employees with the last name 'Smith' who have been with the organization for more than 5 years:
66+
```javascript
67+
db.employee.find({"lastName": "Smith", "timeInOrgInYears": {"$gt": 5}})
68+
```
69+
70+
A compound index on both 'lastName' and 'timeInOrgInYears' optimizes this query:
71+
```javascript
72+
use cosmicworks;
73+
db.employee.createIndex({"lastName" : 1, "timeInOrgInYears" : 1})
74+
```
75+
76+
## Track the status of a createIndex operation
77+
When indexes are added and historical data needs to be indexed, the progress of the index build operation can be tracked using db.currentOp().
78+
79+
Consider this sample to track the indexing progress on the 'cosmicworks' database.
80+
```javascript
81+
use cosmicworks;
82+
db.currentOp()
83+
```
4884

49-
To demonstrate the use of this new option in the `cosmicworks` database for an empty collection named `products`. This code snippet demonstrates how to use the blocking option, which will temporarily block write operations to the specified collection during index creation in an empty collection:
85+
When a createIndex operation is in progress, the response looks like:
86+
```json
87+
{
88+
"inprog": [
89+
{
90+
"shard": "defaultShard",
91+
"active": true,
92+
"type": "op",
93+
"opid": "30000451493:1719209762286363",
94+
"op_prefix": 30000451493,
95+
"currentOpTime": "2024-06-24T06:16:02.000Z",
96+
"secs_running": 0,
97+
"command": { "aggregate": "" },
98+
"op": "command",
99+
"waitingForLock": false
100+
},
101+
{
102+
"shard": "defaultShard",
103+
"active": true,
104+
"type": "op",
105+
"opid": "30000451876:1719209638351743",
106+
"op_prefix": 30000451876,
107+
"currentOpTime": "2024-06-24T06:13:58.000Z",
108+
"secs_running": 124,
109+
"command": { "createIndexes": "" },
110+
"op": "workerCommand",
111+
"waitingForLock": false,
112+
"progress": {},
113+
"msg": ""
114+
}
115+
],
116+
"ok": 1
117+
}
118+
```
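
Because `db.currentOp()` returns every in-progress operation, it can help to filter the response down to index builds. The sketch below is plain JavaScript that assumes the response shape shown above (an `inprog` array whose entries carry a `command` document); `findIndexBuilds` is a hypothetical helper, and the sample object is a trimmed copy of the response above.

```javascript
// Hypothetical helper: keep only operations whose command is createIndexes.
function findIndexBuilds(currentOpResult) {
  return (currentOpResult.inprog || [])
    .filter((op) => op.command && "createIndexes" in op.command)
    .map((op) => ({
      opid: op.opid,
      secsRunning: op.secs_running,
      waitingForLock: op.waitingForLock
    }));
}

// Trimmed copy of the sample response shown above.
const sampleCurrentOp = {
  inprog: [
    { opid: "30000451493:1719209762286363", secs_running: 0,
      command: { aggregate: "" }, op: "command", waitingForLock: false },
    { opid: "30000451876:1719209638351743", secs_running: 124,
      command: { createIndexes: "" }, op: "workerCommand", waitingForLock: false }
  ],
  ok: 1
};

const indexBuilds = findIndexBuilds(sampleCurrentOp);
console.log(indexBuilds);
```

In the Mongo Shell, the same function could be applied to the live result, for example `findIndexBuilds(db.currentOp())`.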
119+
120+
## Enable Large Index Keys by default
121+
Even if the documents do not contain keys with a large number of characters or multiple levels of nesting, enabling large index keys ensures these scenarios are covered.
122+
123+
Consider this sample to enable large index keys on the 'large_index_coll' collection in the 'cosmicworks' database.
124+
125+
```javascript
126+
use cosmicworks;
127+
db.runCommand(
128+
{
129+
"createIndexes": "large_index_coll",
130+
"indexes": [
131+
{
132+
"key": { "ikey": 1 },
133+
"name": "ikey_1",
134+
"enableLargeIndexKeys": true
135+
}
136+
]
137+
})
138+
```
139+
140+
## Prioritizing Index Builds over new Write Operations using the Blocking Option
141+
For scenarios in which the index should be created before data is loaded, the blocking option should be used to block incoming writes until the index build completes.
142+
143+
Setting `{ "blocking": true }` is particularly useful in migration utilities where indexes are created on empty collections before data writes commence.
144+
145+
Consider an example of the blocking option for index creation on the 'employee' collection in the 'cosmicworks' database:
50146

51147
```javascript
52148
use cosmicworks;
53149
db.runCommand({
54-
createIndexes: "products",
150+
createIndexes: "employee",
55151
indexes: [{"key":{"name":1}, "name":"name_1"}],
56152
blocking: true
57153
})
58154

59155
```
60156

61-
## Summary
62-
63-
The introduction of the blocking option in the `CreateIndexes` Command of Azure Cosmos DB for MongoDB (vCore) is a strategic enhancement for optimizing index creation for an empty collection. This feature complements the existing non-blocking method, providing an additional tool for scenarios requiring efficient index creation on empty collections.
64-
65157
## Related content
66158

67159
Check out [text indexing](how-to-create-text-index.md), which allows for efficient searching and querying of text-based data.
