| 1 | +--- |
| 2 | +title: Wildcard Indexes in Azure Cosmos DB for MongoDB vCore |
| 3 | +titleSuffix: Azure Cosmos DB for MongoDB vCore |
| 4 | +description: Sample to create wildcard indexes in Azure Cosmos DB for MongoDB vCore. |
| 5 | +author: abinav2307 |
| 6 | +ms.author: abramees |
| 7 | +ms.reviewer: sidandrews |
| 8 | +ms.service: cosmos-db |
| 9 | +ms.subservice: mongodb-vcore |
| 10 | +ms.topic: conceptual |
| 11 | +ms.date: 6/25/2024 |
| 12 | +--- |
1 | 13 |
| 14 | + |
| 15 | +# Creating Wildcard Indexes in Azure Cosmos DB for MongoDB vCore |
| 16 | + |
| 17 | +[!INCLUDE[MongoDB vCore](~/reusable-content/ce-skilling/azure/includes/cosmos-db/includes/appliesto-mongodb-vcore.md)] |
| 18 | + |
| 19 | +While most workloads have a predictable set of fields used in query filters and predicates, ad hoc query patterns may filter on any field in the JSON document structure.
| 20 | + |
| 21 | +Wildcard indexing can be helpful in the following scenarios: |
| 22 | +- Queries filter on any field in the document, so indexing all fields with a single command is easier than indexing each field individually.
| 23 | +- Queries filter on most fields in the document, so indexing all but a few fields with a single command is easier than indexing most fields individually.
| 24 | + |
| 25 | +Wildcard indexing will soon be available in Azure Cosmos DB for MongoDB vCore. This sample describes a simple workaround that minimizes the effort needed to create individual indexes until wildcard indexing is generally available.
| 26 | + |
| 27 | +## Solution |
| 28 | +Consider the JSON document below:
| 29 | +```json
| 30 | +{
| 31 | +  "firstName": "Steve",
| 32 | +  "lastName": "Smith",
| 33 | +  "companyName": "Microsoft",
| 34 | +  "division": "Azure",
| 35 | +  "subDivision": "Data & AI",
| 36 | +  "timeInOrgInYears": 7,
| 37 | +  "roles": [
| 38 | +    {
| 39 | +      "teamName": "Windows",
| 40 | +      "teamSubName": "Operating Systems",
| 41 | +      "timeInTeamInYears": 3
| 42 | +    },
| 43 | +    {
| 44 | +      "teamName": "Devices",
| 45 | +      "teamSubName": "Surface",
| 46 | +      "timeInTeamInYears": 2
| 47 | +    },
| 48 | +    {
| 49 | +      "teamName": "Devices",
| 50 | +      "teamSubName": "Surface",
| 51 | +      "timeInTeamInYears": 2
| 52 | +    }
| 53 | +  ]
| 54 | +}
| 55 | +```
| 56 | +The following indexes will be created under the covers when wildcard indexing is used.
| 57 | +- db.collection.createIndex({"firstName": 1})
| 58 | +- db.collection.createIndex({"lastName": 1})
| 59 | +- db.collection.createIndex({"companyName": 1})
| 60 | +- db.collection.createIndex({"division": 1})
| 61 | +- db.collection.createIndex({"subDivision": 1})
| 62 | +- db.collection.createIndex({"timeInOrgInYears": 1})
| 64 | +- db.collection.createIndex({"roles.teamName": 1})
| 65 | +- db.collection.createIndex({"roles.teamSubName": 1})
| 66 | +- db.collection.createIndex({"roles.timeInTeamInYears": 1})
| 67 | +
| 68 | +While this sample document requires only nine fields to be explicitly indexed, indexing each field individually becomes tedious and error-prone for larger documents with hundreds or thousands of fields.
| 69 | + |
| 70 | +The jar file detailed in the rest of this document makes indexing fields in larger documents simpler. The jar takes a sample JSON document as input, parses the document and executes createIndex commands for each field without the need for user intervention. |
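
The repository's Java implementation isn't reproduced here. As a rough illustration of the parsing step, the following JavaScript sketch walks a sample document and derives one dotted field path per leaf field, treating array elements as sharing their parent's path. The helper name `collectFieldPaths` and the trimmed `sampleEmployee` document are assumptions made for this example, and the sketch only prints the createIndex commands rather than running them.

```javascript
// Hypothetical helper (not taken from the sample repository): collect the
// dotted path of every leaf field in a sample document.
function collectFieldPaths(value, prefix = "", paths = new Set()) {
  if (Array.isArray(value)) {
    // Array elements share the parent's path: roles[0].teamName -> roles.teamName
    for (const element of value) {
      collectFieldPaths(element, prefix, paths);
    }
  } else if (value !== null && typeof value === "object") {
    // Nested document: extend the path with each key and recurse.
    for (const [key, child] of Object.entries(value)) {
      collectFieldPaths(child, prefix ? `${prefix}.${key}` : key, paths);
    }
  } else if (prefix) {
    // Scalar leaf: record its path once (the Set removes repeats).
    paths.add(prefix);
  }
  return paths;
}

// Trimmed version of the sample document used in this article.
const sampleEmployee = {
  firstName: "Steve",
  lastName: "Smith",
  companyName: "Microsoft",
  division: "Azure",
  subDivision: "Data & AI",
  timeInOrgInYears: 7,
  roles: [
    { teamName: "Windows", teamSubName: "Operating Systems", timeInTeamInYears: 3 },
    { teamName: "Devices", teamSubName: "Surface", timeInTeamInYears: 2 }
  ]
};

const fieldPaths = [...collectFieldPaths(sampleEmployee)];

// Build one single-field ascending index command per path, as text only.
const commands = fieldPaths.map(
  (path) => `db.employee.createIndex(${JSON.stringify({ [path]: 1 })})`
);
console.log(commands.join("\n"));
```

Because the paths are gathered in a set, fields repeated across array elements (such as `roles.teamName`) yield a single index command. The printed commands can be reviewed before pasting them into a shell session.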
| 71 | + |
| 72 | +## Prerequisites |
| 73 | + |
| 74 | +### Java 21 |
| 75 | +Connect to the machine that will run the jar (for example, over SSH to a virtual machine) and install Java 21 using the following commands:
| 76 | +
| 77 | +```bash
| 78 | +# Install OpenJDK 21
| 79 | +sudo apt update
| 80 | +sudo apt install openjdk-21-jdk
| 81 | +``` |
| 82 | + |
| 83 | +### Sample jar to create individual indexes for all fields |
| 84 | + |
| 85 | +Clone the repository containing the Java sample, which iterates through each field in the JSON document's structure and issues a createIndex operation for each field.
| 86 | + |
| 87 | +```bash |
| 88 | +git clone https://github.com/Azure-Samples/cosmosdb-mongodb-vcore-wildcard-indexing.git |
| 89 | +``` |
| 90 | + |
| 91 | +The cloned repository does not need to be built if there are no changes to be made to the solution. The built runnable jar named azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar is already included in the runnableJar/ folder. The jar can be executed by specifying the following required parameters: |
| 92 | +- Azure Cosmos DB for MongoDB vCore cluster connection string with the username and password used when the cluster was provisioned |
| 93 | +- The database within which the collection has been created |
| 94 | +- The name of the collection to be indexed |
| 95 | +- The location of the local JSON file containing the document structure for the specified collection. This is the document that will be read by the jar file to extract each field and issue individual createIndex operations.
| 96 | + |
| 97 | +```bash |
| 98 | +java -jar azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar 'mongodb+srv://<user>:<password>@abinav-test-benchmarking.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000' cosmicworks employee sampleEmployee.json
| 99 | +``` |
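
If the username or password contains characters that are reserved in a URI (such as `@`, `/`, or `:`), they need to be percent-encoded before being placed in the connection string. The following is a minimal sketch of assembling the string. The helper name `buildConnectionString`, the credentials, and the `<cluster-name>` host placeholder are assumptions for illustration, and the query options are copied from the command above.

```javascript
// Hypothetical helper: assemble a connection string with percent-encoded credentials.
function buildConnectionString(user, password, host) {
  // encodeURIComponent escapes reserved characters such as '@', '/' and ':'.
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  // Options copied from the sample command in this article.
  const options = "tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000";
  return `mongodb+srv://${credentials}@${host}/?${options}`;
}

// Placeholder values; substitute the real cluster host and credentials.
const connectionString = buildConnectionString(
  "appUser",
  "p@ss/word",
  "<cluster-name>.global.mongocluster.cosmos.azure.com"
);
console.log(connectionString);
```

When passing the resulting string on a command line, keep it in single quotes so the shell doesn't interpret the `&` characters.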
| 100 | + |
| 101 | +### Track the status of a createIndex operation |
| 102 | +The jar file is designed to not wait on a response from each createIndex operation. The indexes are created asynchronously on the server and the progress of the index build operation on the cluster can be tracked. |
| 103 | + |
| 104 | +Consider this sample to track indexing progress on the 'cosmicworks' database. |
| 105 | +```javascript |
| 106 | +use cosmicworks; |
| 107 | +db.currentOp() |
| 108 | +``` |
| 109 | + |
| 110 | +When a createIndex operation is in progress, the response looks like: |
| 111 | +```json |
| 112 | +{ |
| 113 | + "inprog": [ |
| 114 | + { |
| 115 | + "shard": "defaultShard", |
| 116 | + "active": true, |
| 117 | + "type": "op", |
| 118 | + "opid": "30000451493:1719209762286363", |
| 119 | + "op_prefix": 30000451493, |
| 120 | + "currentOpTime": "2024-06-24T06:16:02.000Z", |
| 121 | + "secs_running": 0, |
| 122 | + "command": { "aggregate": "" }, |
| 123 | + "op": "command", |
| 124 | + "waitingForLock": false |
| 125 | + }, |
| 126 | + { |
| 127 | + "shard": "defaultShard", |
| 128 | + "active": true, |
| 129 | + "type": "op", |
| 130 | + "opid": "30000451876:1719209638351743", |
| 131 | + "op_prefix": 30000451876, |
| 132 | + "currentOpTime": "2024-06-24T06:13:58.000Z", |
| 133 | + "secs_running": 124, |
| 134 | + "command": { "createIndexes": "" }, |
| 135 | + "op": "workerCommand", |
| 136 | + "waitingForLock": false, |
| 137 | + "progress": {}, |
| 138 | + "msg": "" |
| 139 | + } |
| 140 | + ], |
| 141 | + "ok": 1 |
| 142 | +} |
| 143 | +``` |
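
On a busy cluster, the `inprog` array can contain many unrelated operations. The following sketch filters a response shaped like the one above down to the index builds. The helper name `indexBuildsInProgress` is an assumption, and the sample input is a trimmed copy of the response shown above rather than live output.

```javascript
// Hypothetical helper: keep only in-progress operations whose command is createIndexes.
function indexBuildsInProgress(currentOpResult) {
  return (currentOpResult.inprog ?? [])
    .filter((op) => op.command && "createIndexes" in op.command)
    .map((op) => ({ opid: op.opid, secsRunning: op.secs_running }));
}

// Trimmed copy of the sample response from this article.
const sampleResponse = {
  inprog: [
    {
      opid: "30000451493:1719209762286363",
      secs_running: 0,
      command: { aggregate: "" },
      op: "command"
    },
    {
      opid: "30000451876:1719209638351743",
      secs_running: 124,
      command: { createIndexes: "" },
      op: "workerCommand"
    }
  ],
  ok: 1
};

const builds = indexBuildsInProgress(sampleResponse);
console.log(builds);
```

In a shell session, the same helper could be applied to the live result, for example `indexBuildsInProgress(db.currentOp())`, assuming the response has the shape shown above. An empty result would then suggest that no index builds are currently running.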
| 144 | + |
| 145 | +## Related content |
| 146 | + |
| 147 | +Check out [indexing best practices](how-to-create-indexes.md), which details best practices for indexing on Azure Cosmos DB for MongoDB vCore. |