Schema Transformer is a Python script designed to analyze Mongo RU Collection schemas and efficiently transform them into a DocumentDB optimized structure. This ensures seamless compatibility and enhances query performance.
With this tool, you can generate index and sharding recommendations tailored specifically to your workload, making your migration smoother and more efficient.
The tool supports the following versions:
- Source: Azure Cosmos DB for MongoDB RU-based (version 4.2 and above)
- Target: Azure DocumentDB (all versions)
Before running the assessment, ensure that the client machine meets the following requirements:
- Access to both source MongoDB RU endpoint and target Azure DocumentDb endpoint, either over a private or public network via the specified IP or hostname.
- Python (version 3.10 or above) must be installed.
- PyMongo library must be installed (
pip install pymongo).
-
Download the latest release and unzip it.
-
Open the command prompt and navigate to the extracted directory.
-
Create a JSON file to define the collections to be migrated. Each section in the configuration will define the schema migration options for a set of collections (you can specify
*to refer to all collections in an account anddb.*to refer all collections within a database). Refer the next section for more details on configuration options. Here are some examples --
To specify all collections present in the account
{ "sections": [ { "include": [ "*" ], "exclude": [], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true" } ] } -
To specify all collections except a particular database
{ "sections": [ { "include": [ "*" ], "exclude": [ "db1.*" ], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true" } ] } -
To specify all collections except few
{ "sections": [ { "include": [ "*" ], "exclude": [ "db1.coll1", "db2.coll2" ], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true" } ] } -
To migrate specific collections
{ "sections": [ { "include": [ "db1.coll1", "db2.coll2" ], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true" } ] } -
To migrate different set of collections with different configuration options
{ "sections": [ { "include": [ "*" ], "exclude": [ "db1.coll1", "db2.coll2" ], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true" }, { "include": [ "db1.coll1", "db2.coll2" ], "migrate_shard_key": "true", "drop_if_exists": "true", "optimize_compound_indexes": "true" } ] } -
To colocate collections with a reference collection
{ "sections": [ { "include": [ "db1.coll2", "db1.coll3" ], "migrate_shard_key": "false", "drop_if_exists": "true", "optimize_compound_indexes": "true", "co_locate_with": "coll1" } ] }Note: The collection specified in
co_locate_withmust already exist in the same database as the collection being processed. If the reference collection is not found, the script will fail with an error.
-
-
Run the following command, providing the full path of the JSON file created in the previous step:
python main.py --config <path_to_your_json_file> --source-uri <source_mongo_connection_string> --dest-uri <destination_connection_string>
This process will generate a vCore-optimized schema with index and sharding recommendations based on your workload.
| Option | Description |
|---|---|
| migrate_shard_key | Determines whether the existing shard key definition should be migrated. If set to True, the shard key is retained; if False, the target collection remains unsharded. Collections that are originally unsharded in the source will remain unsharded in the target, regardless of this setting. Default: False. |
| drop_if_exists | Specifies whether collections with the same name in the target should be dropped and recreated. If True, existing collections are removed before migration; if False, they remain unchanged. Default: False. |
| optimize_compound_indexes | Controls whether compound indexes should be optimized. If True, the script identifies redundant indexes and excludes them from migration; if False, all indexes are migrated as-is. Default: False. |
| co_locate_with | Specifies the name of a reference collection from the same database to colocate with. When specified, the target collection will be colocated with the reference collection for improved query performance. The reference collection must exist in the same database before colocation is applied, or an error will be thrown. This option is useful for optimizing queries that join or access related collections together. Default: None. |