---
title: Wildcard indexes in Azure Cosmos DB for MongoDB vCore
titleSuffix: Azure Cosmos DB for MongoDB vCore
description: Sample to create wildcard indexes in Azure Cosmos DB for MongoDB vCore.
author: abinav2307
ms.author: abramees
ms.reviewer: sidandrews
ms.service: cosmos-db
ms.subservice: mongodb-vcore
ms.topic: conceptual
ms.date: 6/25/2024
---

# Create wildcard indexes in Azure Cosmos DB for MongoDB vCore

[!INCLUDE[MongoDB vCore](~/reusable-content/ce-skilling/azure/includes/cosmos-db/includes/appliesto-mongodb-vcore.md)]

While most workloads have a predictable set of fields used in query filters and predicates, ad hoc query patterns may filter on any field in the JSON document structure.

Wildcard indexing can be helpful in the following scenarios:
- Queries filter on any field in the document, so indexing all fields through a single command is easier than indexing each field individually.
- Queries filter on most fields in the document, so indexing all but a few fields through a single command is easier than indexing most fields individually.

This sample describes a simple workaround that minimizes the effort needed to create individual indexes until wildcard indexing is generally available in Azure Cosmos DB for MongoDB vCore.

## Solution
Consider the following JSON document:
```json
{
  "firstName": "Steve",
  "lastName": "Smith",
  "companyName": "Microsoft",
  "division": "Azure",
  "subDivision": "Data & AI",
  "timeInOrgInYears": 7,
  "roles": [
    {
      "teamName": "Windows",
      "teamSubName": "Operating Systems",
      "timeInTeamInYears": 3
    },
    {
      "teamName": "Devices",
      "teamSubName": "Surface",
      "timeInTeamInYears": 2
    },
    {
      "teamName": "Devices",
      "teamSubName": "Surface",
      "timeInTeamInYears": 2
    }
  ]
}
```

A wildcard index indexes every field in this document with a single command. That is equivalent to running the following individual createIndex commands:
- `db.collection.createIndex({ "firstName": 1 })`
- `db.collection.createIndex({ "lastName": 1 })`
- `db.collection.createIndex({ "companyName": 1 })`
- `db.collection.createIndex({ "division": 1 })`
- `db.collection.createIndex({ "subDivision": 1 })`
- `db.collection.createIndex({ "timeInOrgInYears": 1 })`
- `db.collection.createIndex({ "roles.teamName": 1 })`
- `db.collection.createIndex({ "roles.teamSubName": 1 })`
- `db.collection.createIndex({ "roles.timeInTeamInYears": 1 })`

This sample document requires only nine fields to be indexed explicitly. For larger documents with hundreds or thousands of fields, indexing each field individually becomes tedious and error-prone.

The jar file described in the rest of this document simplifies indexing fields in larger documents. The jar takes a sample JSON document as input, parses it, and issues a createIndex command for each field without user intervention.
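
The jar itself is written in Java. The following JavaScript sketch illustrates the same general approach and is not the jar's actual implementation. It walks a sample document and collects a dot-notation path for every leaf field. It then issues one ascending, single-field createIndex call per path. The function names `collectFieldPaths`, `buildIndexSpecs`, and `createFieldIndexes` are hypothetical and don't come from the sample repository. The sketch assumes that fields inside arrays of embedded documents are indexed under the array's path (for example, `roles.teamName`), matching the list of indexes above.

```javascript
// Hypothetical helper (not taken from the sample repository): collects every
// leaf field path in a sample document, using dot notation for nested fields.
function collectFieldPaths(doc, prefix = "", paths = new Set()) {
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (Array.isArray(value)) {
      // Fields of embedded documents in an array are collected under the
      // array's path (for example, "roles.teamName"). An array of scalars
      // gets an index on the array field itself.
      const embedded = value.filter(isPlainObject);
      if (embedded.length === 0) {
        paths.add(path);
      }
      for (const element of embedded) {
        collectFieldPaths(element, path, paths);
      }
    } else if (isPlainObject(value)) {
      collectFieldPaths(value, path, paths);
    } else {
      paths.add(path);
    }
  }
  return paths; // A Set, so repeated array elements add each path only once.
}

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Builds one single-field, ascending index key per path, e.g. { "roles.teamName": 1 }.
function buildIndexSpecs(sampleDoc) {
  return [...collectFieldPaths(sampleDoc)].map((path) => ({ [path]: 1 }));
}

// Issues one createIndex call per field path. `collection` is assumed to be a
// mongosh collection object such as db.employee.
function createFieldIndexes(collection, sampleDoc) {
  for (const spec of buildIndexSpecs(sampleDoc)) {
    collection.createIndex(spec);
  }
}

const sample = {
  firstName: "Steve",
  roles: [{ teamName: "Windows", timeInTeamInYears: 3 }],
};
console.log([...collectFieldPaths(sample)].join(", "));
// → firstName, roles.teamName, roles.timeInTeamInYears
```

You can paste these functions into mongosh. Then a call such as `createFieldIndexes(db.employee, sampleDoc)`, where `sampleDoc` is a hypothetical variable holding the full sample document, issues one createIndex command per distinct field. Deduplicating paths with a `Set` keeps repeated array elements, such as the two identical `Devices` roles, from producing duplicate commands.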

## Prerequisites

### Java 21
The jar requires Java 21. Install OpenJDK 21 on the machine that will run the jar, for example a virtual machine that you connect to over SSH. On Ubuntu or another Debian-based distribution, use the following commands:

```bash
# Install OpenJDK 21
sudo apt update
sudo apt install openjdk-21-jdk

# Verify the installation
java -version
```

## Sample jar to create individual indexes for all fields

Clone the repository that contains the Java sample. The sample iterates through each field in the JSON document's structure and issues a createIndex operation for each field.

```bash
git clone https://github.com/Azure-Samples/cosmosdb-mongodb-vcore-wildcard-indexing.git
```

You don't need to build the cloned repository unless you want to change the solution. The prebuilt runnable jar, azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar, is included in the runnableJar/ folder. Run the jar with the following required parameters:
- The Azure Cosmos DB for MongoDB vCore cluster connection string, with the username and password set when the cluster was provisioned
- The Azure Cosmos DB for MongoDB vCore database
- The collection to index
- The path to a JSON file that contains a sample document with the collection's document structure. The jar parses this document to extract every field and issues an individual createIndex operation for each one.

```bash
# Quote the connection string so the shell doesn't interpret the & characters.
java -jar azure-cosmosdb-mongo-data-indexer-1.0-SNAPSHOT.jar "mongodb+srv://<user>:<password>@<cluster-name>.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000" cosmicworks employee sampleEmployee.json
```
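
After the index builds finish, you might want to confirm that every expected field is indexed. The following sketch compares a list of expected field paths against the output of `db.collection.getIndexes()`. The helper name `findMissingIndexes` is hypothetical. The sketch assumes that only single-field indexes count toward coverage, so compound indexes are ignored.

```javascript
// Hypothetical check (not part of the sample repository): returns the expected
// field paths that don't yet have a single-field index. `indexes` is assumed
// to be the array returned by db.collection.getIndexes(), where each entry
// has a `key` document such as { "roles.teamName": 1 }.
function findMissingIndexes(expectedPaths, indexes) {
  const indexedPaths = new Set(
    indexes
      .map((index) => Object.keys(index.key))
      .filter((keys) => keys.length === 1) // Ignore compound indexes.
      .map((keys) => keys[0])
  );
  return expectedPaths.filter((path) => !indexedPaths.has(path));
}

// Example with a hard-coded, getIndexes()-style result:
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { firstName: 1 }, name: "firstName_1" },
];
console.log(findMissingIndexes(["firstName", "roles.teamName"], indexes).join(", "));
// → roles.teamName
```

In mongosh, run `use cosmicworks`. Then call `findMissingIndexes(["firstName", "roles.teamName"], db.employee.getIndexes())` to list any of those fields that still lack an index.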

## Track the status of a createIndex operation
The jar doesn't wait for a response from each createIndex operation. The indexes are built asynchronously on the server, and you can track the progress of each index build on the cluster.

Use the following commands to track indexing progress on the `cosmicworks` database:
```javascript
use cosmicworks
db.currentOp()
```

When a createIndex operation is in progress, the response looks similar to the following. The entry whose `command` contains `createIndexes` is the index build, and `secs_running` shows how long it has been running.
```json
{
  "inprog": [
    {
      "shard": "defaultShard",
      "active": true,
      "type": "op",
      "opid": "30000451493:1719209762286363",
      "op_prefix": 30000451493,
      "currentOpTime": "2024-06-24T06:16:02.000Z",
      "secs_running": 0,
      "command": { "aggregate": "" },
      "op": "command",
      "waitingForLock": false
    },
    {
      "shard": "defaultShard",
      "active": true,
      "type": "op",
      "opid": "30000451876:1719209638351743",
      "op_prefix": 30000451876,
      "currentOpTime": "2024-06-24T06:13:58.000Z",
      "secs_running": 124,
      "command": { "createIndexes": "" },
      "op": "workerCommand",
      "waitingForLock": false,
      "progress": {},
      "msg": ""
    }
  ],
  "ok": 1
}
```
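
The `currentOp` output can also include unrelated operations, such as the `aggregate` command in the response above. The following sketch filters the response down to index builds and reports how long each one has been running. The helper name `indexBuildsInProgress` is hypothetical. Based on the sample response, the filter assumes that each index build's `command` field contains a `createIndexes` key.

```javascript
// Hypothetical helper (not part of the sample repository): keeps only the
// index builds from a db.currentOp() result, based on the response shape
// shown above, where an index build's `command` contains `createIndexes`.
function indexBuildsInProgress(currentOpResult) {
  return currentOpResult.inprog
    .filter(
      (op) =>
        op.command !== null &&
        typeof op.command === "object" &&
        "createIndexes" in op.command
    )
    .map((op) => ({ opid: op.opid, secsRunning: op.secs_running }));
}

// Example using an abbreviated copy of the response above:
const response = {
  inprog: [
    { opid: "30000451493:1719209762286363", secs_running: 0, command: { aggregate: "" } },
    { opid: "30000451876:1719209638351743", secs_running: 124, command: { createIndexes: "" } },
  ],
  ok: 1,
};
console.log(JSON.stringify(indexBuildsInProgress(response)));
// → [{"opid":"30000451876:1719209638351743","secsRunning":124}]
```

In mongosh, run `use cosmicworks` and then call `indexBuildsInProgress(db.currentOp())` to get the same summary from the live cluster. If the result is an empty array, no operation in the response was an index build at the time of the call.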

## Related content

Check out the [full sample on GitHub](https://github.com/Azure-Samples/cosmosdb-mongodb-vcore-wildcard-indexing).

Check out [indexing best practices](how-to-create-indexes.md), which details best practices for indexing on Azure Cosmos DB for MongoDB vCore.