
Commit fb91a35

Update for co_locate_with and name change to DocumentDB

2 parents: dad9aa5 + 63d9c08

5 files changed: +68 −5 lines

.gitignore
(1 addition, 0 deletions)

@@ -0,0 +1 @@
+__pycache__/

README.md
(27 additions, 4 deletions)

@@ -1,6 +1,6 @@
-# Schema Transformer: Migrating RU-based Azure Cosmos DB for MongoDB to vCore-based
+# Schema Transformer: Migrating RU-based Azure Cosmos DB for MongoDB to Azure DocumentDB

-Schema Transformer is a Python script designed to analyze Mongo RU collection schemas and efficiently transform them into a vCore-optimized structure. This ensures seamless compatibility and enhances query performance.
+Schema Transformer is a Python script designed to analyze Mongo RU collection schemas and efficiently transform them into a DocumentDB-optimized structure. This ensures seamless compatibility and enhances query performance.

 With this tool, you can generate index and sharding recommendations tailored specifically to your workload, making your migration smoother and more efficient.

@@ -9,16 +9,17 @@ With this tool, you can generate index and sharding recommendations tailored spe
 The tool supports the following versions:

 - **Source:** Azure Cosmos DB for MongoDB RU-based (version 4.2 and above)
-- **Target:** Azure Cosmos DB for MongoDB vCore (all versions)
+- **Target:** Azure DocumentDB (all versions)

 ## How to Run the Script

 ### Prerequisites

 Before running the assessment, ensure that the client machine meets the following requirements:

-- Access to both source and target MongoDB endpoints, either over a private or public network via the specified IP or hostname.
+- Access to both the source MongoDB RU endpoint and the target Azure DocumentDB endpoint, over either a private or public network via the specified IP or hostname.
 - Python (version 3.10 or above) must be installed.
+- The PyMongo library must be installed (`pip install pymongo`).

 ### Steps to Run the Assessment

@@ -132,6 +133,27 @@
    }
    ```

+6. To colocate collections with a reference collection:
+
+   ```json
+   {
+       "sections": [
+           {
+               "include": [
+                   "db1.coll2",
+                   "db1.coll3"
+               ],
+               "migrate_shard_key": "false",
+               "drop_if_exists": "true",
+               "optimize_compound_indexes": "true",
+               "co_locate_with": "coll1"
+           }
+       ]
+   }
+   ```
+
+   **Note:** The collection specified in `co_locate_with` must already exist in the same database as the collection being processed. If the reference collection is not found, the script will fail with an error.
+
 4. Run the following command, providing the full path of the JSON file created in the previous step:

 ```cmd
@@ -148,3 +170,4 @@ This process will generate a vCore-optimized schema with index and sharding reco
 | **migrate_shard_key** | Determines whether the existing shard key definition should be migrated. If set to `True`, the shard key is retained; if `False`, the target collection remains unsharded. Collections that are originally unsharded in the source will remain unsharded in the target, regardless of this setting. **Default:** `False`. |
 | **drop_if_exists** | Specifies whether collections with the same name in the target should be dropped and recreated. If `True`, existing collections are removed before migration; if `False`, they remain unchanged. **Default:** `False`. |
 | **optimize_compound_indexes** | Controls whether compound indexes should be optimized. If `True`, the script identifies redundant indexes and excludes them from migration; if `False`, all indexes are migrated as-is. **Default:** `False`. |
+| **co_locate_with** | Specifies the name of a reference collection from the same database to colocate with. When specified, the target collection is colocated with the reference collection for improved query performance. The reference collection must exist in the same database before colocation is applied, or an error is thrown. This option is useful for optimizing queries that join or access related collections together. **Default:** `None`. |
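Because `co_locate_with` names a bare collection that is resolved against the database of each entry in `include`, a config file can be sanity-checked before the script runs. The helper below is a hypothetical pre-flight check, not part of the tool; it only verifies that every included collection carries a `db.collection` prefix for the reference to resolve against:

```python
import json

def check_colocation_sections(config_text: str) -> list[str]:
    """Return a list of problems found in sections that request colocation."""
    problems = []
    config = json.loads(config_text)
    for section in config.get("sections", []):
        if section.get("co_locate_with") is None:
            continue  # colocation not requested for this section
        for qualified_name in section.get("include", []):
            # The reference collection is a bare name, so each included
            # collection must be qualified as "<db>.<collection>".
            if "." not in qualified_name:
                problems.append(
                    f"'{qualified_name}' is missing a database prefix"
                )
    return problems

sample = '{"sections": [{"include": ["db1.coll2"], "co_locate_with": "coll1"}]}'
print(check_colocation_sections(sample))  # prints []
```

Whether the reference collection actually exists can only be confirmed against the live target database, which the script itself checks at migration time.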

collection_config.py
(2 additions, 0 deletions)

@@ -1,4 +1,5 @@
 from dataclasses import dataclass
+from typing import Optional

 @dataclass
 class CollectionConfig:
@@ -11,3 +12,4 @@ class CollectionConfig:
     migrate_shard_key: bool
     drop_if_exists: bool
     optimize_compound_indexes: bool = False
+    co_locate_with: Optional[str] = None
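The new field can be exercised in isolation. A minimal sketch of the updated dataclass (field list abbreviated; the real class also carries database and collection names) showing that `co_locate_with` defaults to `None`, so existing configs keep working unchanged:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CollectionConfig:
    # Abbreviated for illustration; the real class has more fields.
    migrate_shard_key: bool
    drop_if_exists: bool
    optimize_compound_indexes: bool = False
    co_locate_with: Optional[str] = None  # None means "do not colocate"

# Omitting co_locate_with leaves colocation disabled.
cfg = CollectionConfig(migrate_shard_key=False, drop_if_exists=True)
print(cfg.co_locate_with)  # prints None
```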

json_parser.py
(3 additions, 1 deletion)

@@ -32,6 +32,7 @@ def parse_json(self) -> List[CollectionConfig]:
         migrate_shard_key = section.get("migrate_shard_key", "false").lower() == "true"
         drop_if_exists = section.get("drop_if_exists", "false").lower() == "true"
         optimize_compound_indexes = section.get("optimize_compound_indexes", "false").lower() == "true"
+        co_locate_with = section.get("co_locate_with")

         for collection in collections_to_migrate:
             if collection in collection_configs:
@@ -43,7 +44,8 @@ def parse_json(self) -> List[CollectionConfig]:
                 collection_name=collection_name,
                 migrate_shard_key=migrate_shard_key,
                 drop_if_exists=drop_if_exists,
-                optimize_compound_indexes=optimize_compound_indexes
+                optimize_compound_indexes=optimize_compound_indexes,
+                co_locate_with=co_locate_with
             )
             collection_configs[collection] = collection_config
         return collection_configs.values()
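Note the asymmetry in the parser: the boolean options arrive as the JSON strings `"true"`/`"false"` and are lowered and compared, while `co_locate_with` is read as an optional plain string (`dict.get` yields `None` when the key is absent). A standalone sketch of that pattern, with a hypothetical `parse_section` helper standing in for the relevant lines above:

```python
def parse_section(section: dict) -> dict:
    # Booleans are encoded as the strings "true"/"false" in the config JSON,
    # so they are lowercased and compared against "true".
    return {
        "migrate_shard_key": section.get("migrate_shard_key", "false").lower() == "true",
        "drop_if_exists": section.get("drop_if_exists", "false").lower() == "true",
        "optimize_compound_indexes": section.get("optimize_compound_indexes", "false").lower() == "true",
        # co_locate_with is optional: absent keys come back as None.
        "co_locate_with": section.get("co_locate_with"),
    }

print(parse_section({"drop_if_exists": "True", "co_locate_with": "coll1"}))
```

The `.lower()` call makes the boolean flags case-insensitive, which is why the README can show `"true"` while the option table talks about `True`.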

schema_migration.py
(35 additions, 0 deletions)

@@ -54,6 +54,11 @@ def migrate_schema(
         else:
             print("-- Target collection already exists. Skipping creation.")

+        # Handle colocation if specified
+        if collection_config.co_locate_with:
+            print(f"-- Setting up colocation with collection: {collection_config.co_locate_with}")
+            self._setup_colocation(dest_db, collection_name, collection_config.co_locate_with)
+
         # Check if shard key should be created
         if collection_config.migrate_shard_key:
             source_shard_key = self._get_shard_key_ru(source_db, collection_config)
@@ -176,3 +181,33 @@ def _is_subarray(self, sub: List, main: List) -> bool:
             if main[i:i + sub_len] == sub:
                 return True
         return False
+
+    def _setup_colocation(self, dest_db: Database, collection_name: str, reference_collection: str) -> None:
+        """
+        Set up colocation for a collection with a reference collection.
+
+        :param dest_db: The destination database object.
+        :param collection_name: The name of the collection to colocate.
+        :param reference_collection: The name of the reference collection to colocate with.
+        :raises ValueError: If the reference collection does not exist.
+        """
+        # Check if reference collection exists
+        if reference_collection not in dest_db.list_collection_names():
+            raise ValueError(
+                f"Reference collection '{reference_collection}' not found in database '{dest_db.name}'. "
+                f"Cannot colocate collection '{collection_name}'."
+            )
+
+        # Run collMod command to set up colocation
+        try:
+            dest_db.command({
+                "collMod": collection_name,
+                "colocation": {
+                    "collection": reference_collection
+                }
+            })
+            print(f"---- Successfully colocated '{collection_name}' with '{reference_collection}'")
+        except Exception as e:
+            raise ValueError(
+                f"Failed to colocate collection '{collection_name}' with '{reference_collection}': {str(e)}"
+            )
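The existence check in `_setup_colocation` can be exercised without a live server against a stand-in for PyMongo's `Database`. The `FakeDatabase` stub and module-level `setup_colocation` below are hypothetical test scaffolding mirroring the method above, and whether the server accepts a `colocation` argument to `collMod` depends on the target being Azure DocumentDB, not stock MongoDB:

```python
class FakeDatabase:
    """Hypothetical stand-in for pymongo.database.Database."""
    def __init__(self, name, collections):
        self.name = name
        self._collections = list(collections)
        self.commands = []  # records every command() call for inspection

    def list_collection_names(self):
        return list(self._collections)

    def command(self, spec):
        self.commands.append(spec)

def setup_colocation(dest_db, collection_name, reference_collection):
    # Mirrors _setup_colocation above: fail fast if the reference is missing,
    # otherwise issue the collMod command against the destination database.
    if reference_collection not in dest_db.list_collection_names():
        raise ValueError(
            f"Reference collection '{reference_collection}' not found "
            f"in database '{dest_db.name}'."
        )
    dest_db.command({"collMod": collection_name,
                     "colocation": {"collection": reference_collection}})

db = FakeDatabase("db1", ["coll1", "coll2"])
setup_colocation(db, "coll2", "coll1")   # succeeds, records one collMod
try:
    setup_colocation(db, "coll3", "missing")
except ValueError as exc:
    print("rejected:", exc)
```

Checking `list_collection_names()` before issuing the command lets the script surface a clear configuration error instead of an opaque server-side failure.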
