Commit 3a201d4

Merge pull request #279254 from abinav2307/master
Workaround to create wildcard indexes on documents with many fields
2 parents 0cf637d + a493d1f commit 3a201d4

1 file changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
---
title: Wildcard indexes in Azure Cosmos DB for MongoDB vCore
titleSuffix: Azure Cosmos DB for MongoDB vCore
description: Sample to create wildcard indexes in Azure Cosmos DB for MongoDB vCore.
author: abinav2307
ms.author: abramees
ms.reviewer: sidandrews
ms.service: cosmos-db
ms.subservice: mongodb-vcore
ms.topic: conceptual
ms.date: 6/25/2024
---

# Create wildcard indexes in Azure Cosmos DB for MongoDB vCore

[!INCLUDE[MongoDB vCore](~/reusable-content/ce-skilling/azure/includes/cosmos-db/includes/appliesto-mongodb-vcore.md)]

While most workloads have a predictable set of fields used in query filters and predicates, ad hoc query patterns may filter on any field in the JSON document structure.

Wildcard indexing can be helpful in the following scenarios:
- Queries filter on any field in the document, so indexing all fields through a single command is easier than indexing each field individually.
- Queries filter on most fields in the document, so indexing all but a few fields through a single command is easier than indexing most fields individually.

This sample describes a simple workaround to minimize the effort needed to create individual indexes until wildcard indexing is generally available in Azure Cosmos DB for MongoDB vCore.

## Solution
Consider the following JSON document:
```json
{
    "firstName": "Steve",
    "lastName": "Smith",
    "companyName": "Microsoft",
    "division": "Azure",
    "subDivision": "Data & AI",
    "timeInOrgInYears": 7,
    "roles": [
        {
            "teamName": "Windows",
            "teamSubName": "Operating Systems",
            "timeInTeamInYears": 3
        },
        {
            "teamName": "Devices",
            "teamSubName": "Surface",
            "timeInTeamInYears": 2
        },
        {
            "teamName": "Devices",
            "teamSubName": "Surface",
            "timeInTeamInYears": 2
        }
    ]
}
```

The following indexes are created under the covers when wildcard indexing is used:
- db.collection.createIndex({"firstName": 1})
- db.collection.createIndex({"lastName": 1})
- db.collection.createIndex({"companyName": 1})
- db.collection.createIndex({"division": 1})
- db.collection.createIndex({"subDivision": 1})
- db.collection.createIndex({"timeInOrgInYears": 1})
- db.collection.createIndex({"roles.teamName": 1})
- db.collection.createIndex({"roles.teamSubName": 1})
- db.collection.createIndex({"roles.timeInTeamInYears": 1})

While this sample document only requires nine fields to be explicitly indexed, indexing fields individually becomes tedious and error prone for larger documents with hundreds or thousands of fields.

The jar file detailed in the rest of this document makes indexing fields in larger documents simpler. The jar takes a sample JSON document as input, parses the document, and executes a createIndex command for each field without user intervention.
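
As a rough illustration of the parsing step described above, the following JavaScript sketch walks a sample document and prints one createIndex command per leaf field. It is not the jar's actual implementation: collectFieldPaths, buildIndexSpecs, and the employee collection name are hypothetical names chosen for this example, and the sketch assumes that fields inside arrays are addressed by their parent's dotted path (for example, roles.teamName).

```javascript
// Hypothetical helper, not the jar's actual code: collect the dotted path of
// every leaf field in a sample document. Arrays are traversed without an
// index segment, so nested fields come out as "roles.teamName".
function collectFieldPaths(value, prefix = "", paths = new Set()) {
  if (Array.isArray(value)) {
    // Array elements share the parent's path.
    for (const item of value) collectFieldPaths(item, prefix, paths);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      collectFieldPaths(child, prefix ? `${prefix}.${key}` : key, paths);
    }
  } else if (prefix) {
    // Scalar (or null) leaf: record its path once.
    paths.add(prefix);
  }
  return paths;
}

// Build one single-field ascending index specification per path.
function buildIndexSpecs(doc) {
  return [...collectFieldPaths(doc)].map((path) => ({ [path]: 1 }));
}

// Trimmed version of the sample employee document.
const sample = {
  firstName: "Steve",
  timeInOrgInYears: 7,
  roles: [
    { teamName: "Windows", timeInTeamInYears: 3 },
    { teamName: "Devices", timeInTeamInYears: 2 },
  ],
};

for (const spec of buildIndexSpecs(sample)) {
  console.log(`db.employee.createIndex(${JSON.stringify(spec)})`);
}
```

The printed commands can be pasted into a mongosh session. Empty arrays and empty objects produce no path in this sketch, so a sample document should carry a representative value for every field that needs an index.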

## Prerequisites

### Java 21
On the machine that runs the jar (for example, a virtual machine that you connect to over SSH), install Java 21 using the following commands:

```bash
# Install OpenJDK 21
sudo apt update
sudo apt install openjdk-21-jdk
```

## Sample jar to create individual indexes for all fields

Clone the repository containing the Java sample, which iterates through each field in the JSON document's structure and issues a createIndex operation for each field.

```bash
git clone https://github.com/Azure-Samples/cosmosdb-mongodb-vcore-wildcard-indexing.git
```

The cloned repository doesn't need to be built if no changes are made to the solution. The prebuilt runnable jar, azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar, is already included in the runnableJar/ folder. Run the jar with the following required parameters:
- The Azure Cosmos DB for MongoDB vCore cluster connection string, with the username and password used when the cluster was provisioned
- The Azure Cosmos DB for MongoDB vCore database
- The collection to be indexed
- The location of the JSON file with the document structure for the collection. The jar parses this document to extract every field and issue individual createIndex operations.

```bash
# Quote the connection string so the shell doesn't interpret the & and ? characters
java -jar azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar 'mongodb+srv://<user>:<password>@abinav-test-benchmarking.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000' cosmicworks employee sampleEmployee.json
```

## Track the status of a createIndex operation
The jar file doesn't wait for a response from each createIndex operation. The indexes are created asynchronously on the server, and the progress of the index build operation can be tracked on the cluster.

Consider this sample to track indexing progress on the 'cosmicworks' database.
```javascript
use cosmicworks;
db.currentOp()
```

When a createIndex operation is in progress, the response looks like:
```json
{
    "inprog": [
        {
            "shard": "defaultShard",
            "active": true,
            "type": "op",
            "opid": "30000451493:1719209762286363",
            "op_prefix": 30000451493,
            "currentOpTime": "2024-06-24T06:16:02.000Z",
            "secs_running": 0,
            "command": { "aggregate": "" },
            "op": "command",
            "waitingForLock": false
        },
        {
            "shard": "defaultShard",
            "active": true,
            "type": "op",
            "opid": "30000451876:1719209638351743",
            "op_prefix": 30000451876,
            "currentOpTime": "2024-06-24T06:13:58.000Z",
            "secs_running": 124,
            "command": { "createIndexes": "" },
            "op": "workerCommand",
            "waitingForLock": false,
            "progress": {},
            "msg": ""
        }
    ],
    "ok": 1
}
```
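
To pick out index builds from a response like the one above, the inprog array can be filtered for entries whose command contains a createIndexes key. The following sketch assumes that response shape. findIndexBuilds is a hypothetical helper written for this example, not part of mongosh or the sample repository.

```javascript
// Hypothetical helper: return the in-progress index builds from a
// db.currentOp() result. Assumes a build appears as an inprog entry whose
// "command" object has a "createIndexes" key, as in the sample response.
function findIndexBuilds(currentOpResult) {
  const ops = currentOpResult.inprog ?? [];
  return ops
    .filter((op) => op.command && "createIndexes" in op.command)
    .map((op) => ({ opid: op.opid, secsRunning: op.secs_running }));
}

// Trimmed copy of the sample response, so the sketch runs without a cluster.
const sampleResponse = {
  inprog: [
    { opid: "30000451493:1719209762286363", secs_running: 0, command: { aggregate: "" } },
    { opid: "30000451876:1719209638351743", secs_running: 124, command: { createIndexes: "" } },
  ],
  ok: 1,
};

const builds = findIndexBuilds(sampleResponse);
console.log(builds.length === 0 ? "No index builds in progress" : builds);
```

In a mongosh session, the same helper could be called as findIndexBuilds(db.currentOp()). An empty result suggests that no index build is currently running.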

## Related content

Check out the [full sample](https://github.com/Azure-Samples/cosmosdb-mongodb-vcore-wildcard-indexing) on GitHub.

Check out [indexing best practices](how-to-create-indexes.md) for indexing on Azure Cosmos DB for MongoDB vCore.
