Commit cfa9de7
Updating pre-migration doc
1 parent 6aaf83c commit cfa9de7

1 file changed: 24 additions & 34 deletions

articles/cosmos-db/mongodb-pre-migration.md
ms.subservice: cosmosdb-mongo
ms.topic: conceptual
ms.date: 04/17/2019
ms.author: roaror
---

# Pre-migration steps for data migrations from MongoDB to Azure Cosmos DB's API for MongoDB

Before you migrate your data from MongoDB (either on-premises or in the cloud) to Azure Cosmos DB’s API for MongoDB, you should:

1. [Read the key considerations about using Azure Cosmos DB's API for MongoDB](#considerations)
2. [Choose an option to migrate your data](#options)
3. [Estimate the throughput needed for your workloads](#estimate-throughput)
4. [Pick an optimal partition key for your data](#partitioning)
5. [Understand the indexing policy that you can set on your data](#indexing)

If you have already completed the above prerequisites for migration, you can [migrate MongoDB data to Azure Cosmos DB's API for MongoDB using the Azure Database Migration Service](../dms/tutorial-mongodb-cosmos-db.md). If you haven't created an Azure Cosmos DB account yet, you can browse any of the [Quickstarts](create-mongodb-dotnet.md).

## <a id="considerations">Main considerations when using Azure Cosmos DB's API for MongoDB</a>
3325

The following are specific characteristics of Azure Cosmos DB's API for MongoDB:

- **Capacity model**: Database capacity in Azure Cosmos DB is based on a throughput model, measured in [Request Units per second](request-units.md), a unit that represents the number of database operations that can be executed against a collection each second. This capacity can be allocated at [a database or collection level](set-throughput.md), and it can work on a manually provisioned model or an [AutoPilot model](provision-throughput-autopilot.md) (a sketch of allocating throughput from the MongoDB shell follows this list).
- **Request Units**: Every database operation has an associated Request Unit (RU) cost in Cosmos DB. When an operation executes, its cost is subtracted from the RUs available in that second. If a request requires more RUs than are currently provisioned, the two options are to increase the number of provisioned RUs, or to wait until the next second starts and then retry the operation.
- **Elastic capacity**: The capacity of a given collection or database can be changed at any time, which allows the database to elastically adapt to the throughput requirements of your workload.
- **Automatic sharding**: Cosmos DB provides an automatic partitioning system that requires only a shard (or partition) key. The [automatic partitioning mechanism](partition-data.md) is shared across all Cosmos DB APIs, and it enables seamless data and throughput scaling through horizontal distribution.
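
For example, a collection with a shard key and dedicated throughput can be created from the MongoDB shell using Cosmos DB's MongoDB extension ("custom action") commands. The following is a minimal sketch, assuming you are already connected to the target database; the collection name, shard key, and RU value are illustrative:

```javascript
// Minimal sketch: create a collection with a shard key and dedicated
// throughput using Cosmos DB's MongoDB extension (custom action) commands.
// Collection name, shard key, and RU value are illustrative.
db.runCommand({
    customAction: "CreateCollection",
    collection: "orders",      // hypothetical collection name
    shardKey: "customerId",    // partition/shard key used for automatic sharding
    offerThroughput: 1000      // dedicated RUs per second for this collection
})

// Read back the collection's current settings, including its throughput
db.runCommand({ customAction: "GetCollection", collection: "orders" })
```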

## <a id="options">Migration options for Azure Cosmos DB's API for MongoDB</a>
3733

|**Migration type**|**Solution**|**Considerations**|
|---------|---------|---------|
|Offline|[Data Migration Tool](https://docs.microsoft.com/azure/cosmos-db/import-data)|&bull; Easy to set up and supports multiple sources <br/>&bull; Not suitable for large datasets|
|Offline|[Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-cosmos-db)|&bull; Easy to set up and supports multiple sources <br/>&bull; Makes use of the Azure Cosmos DB bulk executor library <br/>&bull; Suitable for large datasets <br/>&bull; Lack of checkpointing means that any issue during the course of migration would require a restart of the whole migration process <br/>&bull; Lack of a dead letter queue means that a few erroneous files could stop the entire migration process <br/>&bull; Needs custom code to increase read throughput for certain data sources|
|Offline|[Existing Mongo Tools (mongodump, mongorestore, Studio3T)](https://azure.microsoft.com/resources/videos/using-mongodb-tools-with-azure-cosmos-db/)|&bull; Easy to set up and integrate <br/>&bull; Needs custom handling for throttles (see the sketch after this table)|
|Online|[Azure Database Migration Service](https://docs.microsoft.com/azure/dms/tutorial-mongodb-cosmos-db-online)|&bull; Fully managed migration service <br/>&bull; Provides hosting and monitoring solutions for the migration task <br/>&bull; Suitable for large datasets and takes care of replicating live changes <br/>&bull; Works only with other MongoDB sources|
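
When you drive existing MongoDB tools against Cosmos DB, requests that exceed the provisioned RUs are rejected with error code 16500 (TooManyRequests), so the client has to back off and retry. The following is a minimal sketch of such handling in the MongoDB shell; the collection, document, and retry parameters are illustrative:

```javascript
// Minimal sketch: retry a write when Cosmos DB throttles it.
// Cosmos DB's API for MongoDB reports rate limiting as error code 16500
// (TooManyRequests); names and retry parameters are illustrative.
function insertWithRetry(coll, doc, maxRetries) {
    for (var attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            coll.insertOne(doc);
            return true;                    // write succeeded
        } catch (e) {
            if (e.code !== 16500) throw e;  // not a throttle: re-raise
            sleep(100 * (attempt + 1));     // simple linear backoff before retrying
        }
    }
    return false;                           // still throttled after all retries
}

insertWithRetry(db.orders, { customerId: "42", total: 19.99 }, 5);
```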

## <a id="estimate-throughput"></a> Estimate the throughput need for your workloads
5343

In Azure Cosmos DB, throughput is provisioned in advance and is measured in Request Units (RUs) per second. Unlike VMs or on-premises servers, RUs are easy to scale up and down at any time: you can change the number of provisioned RUs instantly. For more information, see [Request units in Azure Cosmos DB](request-units.md).
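
As an illustration, a collection's provisioned RUs can be changed on the fly through Cosmos DB's MongoDB extension commands; this is a minimal sketch, and the collection name and RU value are illustrative:

```javascript
// Minimal sketch: scale a collection's provisioned throughput up or down
// (collection name and RU value are illustrative).
db.runCommand({
    customAction: "UpdateCollection",
    collection: "orders",
    offerThroughput: 2000   // new RUs per second for this collection
})
```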

You can use the [Cosmos DB Capacity Calculator](https://cosmos.azure.com/capacitycalculator/) to estimate the number of Request Units to provision, based on your database account configuration, amount of data, document size, and required reads and writes per second.

The following are key factors that affect the number of required RUs:

- **Item (i.e., document) size**: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.
- **Item property count**: Assuming the [default indexing](index-overview.md) on all properties, the number of RUs consumed to write an item increases as the item property count increases. You can reduce the request unit consumption for write operations by [limiting the number of indexed properties](index-policy.md) (see the sketch after this list).
- **Concurrent operations**: The request units consumed also depend on the frequency with which different CRUD operations (like writes, reads, updates, deletes) and more complex queries are executed. You can use [mongostat](https://docs.mongodb.com/manual/reference/program/mongostat/) to output the concurrency needs of your current MongoDB data.
- **Query patterns**: The complexity of a query affects how many request units are consumed by the query.
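
For example, if your queries filter on only a few properties, indexing just those properties rather than every property keeps the RU cost of writes down. A minimal sketch from the MongoDB shell, where the collection and field names are illustrative:

```javascript
// Minimal sketch: index only the properties your queries actually filter on
// (collection and field names are illustrative).
db.orders.createIndex({ customerId: 1 })   // index the field used in query filters
db.orders.createIndex({ orderDate: 1 })    // index a second commonly filtered field
```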

The best way to understand the cost of queries is to load sample data into Azure Cosmos DB and [run sample queries from the MongoDB Shell](connect-mongodb-account.md), using the `getLastRequestStatistics` command to get the request charge, which outputs the number of RUs consumed:

`db.runCommand({getLastRequestStatistics: 1})`
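
For example, you can run a representative query and immediately ask for its charge; the collection, filter, and response values below are illustrative:

```javascript
// Run a representative query against your sample data
// (collection name and filter are illustrative).
db.orders.find({ customerId: "42" }).limit(10).toArray()

// Ask for the request charge of the last operation on this connection
db.runCommand({ getLastRequestStatistics: 1 })

// Illustrative response shape; RequestCharge is the number of RUs consumed:
// {
//   "CommandName": "find",
//   "RequestCharge": 3.54,
//   "RequestDurationInMilliSeconds": 5,
//   "ok": 1
// }
```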
