- Highlighted DMS as the recommended option
- Fixed index section, now points to Azure Cosmos DB Mongo API indexing document.
- Simplified throughput estimation steps
articles/cosmos-db/mongodb-pre-migration.md
+15 -14 (15 additions & 14 deletions)
@@ -1,12 +1,12 @@
1
1
---
2
2
title: Pre-migration steps for data migration to Azure Cosmos DB's API for MongoDB
3
3
description: This doc provides an overview of the prerequisites for a data migration from MongoDB to Cosmos DB.
4
-
author: roaror
4
+
author: lbosq
5
5
ms.service: cosmos-db
6
6
ms.subservice: cosmosdb-mongo
7
7
ms.topic: conceptual
8
8
ms.date: 01/09/2020
9
-
ms.author: roaror
9
+
ms.author: lbosq
10
10
---
11
11
12
12
# Pre-migration steps for data migrations from MongoDB to Azure Cosmos DB's API for MongoDB
@@ -24,32 +24,33 @@ If you have already completed the above pre-requisites for migration, you can [M
24
24
## <a id="considerations"></a>Main considerations when using Azure Cosmos DB's API for MongoDB
25
25
26
26
The following are specific characteristics about Azure Cosmos DB's API for MongoDB:
27
-
- **Capacity model**: Database capacity on Cosmos DB is based on a throughput-based model. This model is based on [Request Units per second](request-units.md), which is a unit that represents the number of database operations that can be executed against a collection on a per-second basis. This capacity can be allocated at [a database or collection level](set-throughput.md), and it can works on an allocation model, or an [AutoPilot model](provision-throughput-autopilot.md).
28
-
- **Request Units**: Every database operation has an associated Request Units (RUs) cost in Cosmos DB. When executed, this is subtracted from the available request units level on a given second. If a request requires more RUs than currently allocated the two options are increasing the amount of RUs, or waiting until the next second starts, then retrying the operation.
27
+
- **Capacity model**: Database capacity on Azure Cosmos DB is based on a throughput-based model. This model is based on [Request Units per second](request-units.md), which is a unit that represents the number of database operations that can be executed against a collection on a per-second basis. This capacity can be allocated at [a database or collection level](set-throughput.md), and it can be provisioned on an allocation model, or using the [AutoPilot model](provision-throughput-autopilot.md).
28
+
- **Request Units**: Every database operation has an associated Request Unit (RU) cost in Azure Cosmos DB. When the operation executes, this cost is subtracted from the request units available in that second. If a request requires more RUs than are currently allocated, the two options are to increase the number of RUs, or to wait until the next second starts and then retry the operation.
29
29
- **Elastic capacity**: The capacity for a given collection or database can change at any time. This allows for the database to elastically adapt to the throughput requirements of your workload.
30
-
- **Automatic sharding**: Cosmos DB provides an automatic partitioning system that only requires a shard (or partitioning) key. The [automatic partitioning mechanism](partition-data.md) is shared across all Cosmos DB APIs and it allows for seamless data and throughput scaling through horizontal distribution.
30
+
- **Automatic sharding**: Azure Cosmos DB provides an automatic partitioning system that only requires a shard (or partitioning) key. The [automatic partitioning mechanism](partition-data.md) is shared across all Azure Cosmos DB APIs and it allows for seamless data and throughput scaling through horizontal distribution.
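
The **Request Units** item above describes retrying an operation once the next second starts. A minimal Python sketch of that pattern follows. `ThrottledError` and `run_with_retry` are hypothetical names introduced only for illustration; a real driver reports throttling with its own error type, which this sketch does not attempt to model.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a driver's 'request rate too large' error."""


def run_with_retry(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ThrottledError, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                # Out of attempts: surface the throttling error to the caller.
                raise
            # The RU budget is per second, so wait roughly one second.
            # A little random jitter keeps parallel workers from retrying in lockstep.
            sleep(base_delay + random.uniform(0, 0.1 * base_delay))
```

The `sleep` parameter is injectable so that the wait can be replaced in tests. If throttling persists after retries, the other option named above (increasing the provisioned RUs) is likely the better fix.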
31
31
32
32
## <a id="options"></a>Migration options for Azure Cosmos DB's API for MongoDB
33
33
34
+
The [Azure Database Migration Service for Azure Cosmos DB's API for MongoDB](../dms/tutorial-mongodb-cosmos-db.md) provides a mechanism that simplifies data migration by providing a fully managed hosting platform, migration monitoring options and automatic throttling handling. The full list of options is as follows:
|Offline|[Data Migration Tool](https://docs.microsoft.com/azure/cosmos-db/import-data)|• Easy to set up and supports multiple sources <br/>• Not suitable for large datasets.|
37
39
|Offline|[Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-cosmos-db)|• Easy to set up and supports multiple sources <br/>• Makes use of the Azure Cosmos DB bulk executor library <br/>• Suitable for large datasets <br/>• Lack of checkpointing means that any issue during the course of migration would require a restart of the whole migration process<br/>• Lack of a dead letter queue would mean that a few erroneous files could stop the entire migration process <br/>• Needs custom code to increase read throughput for certain data sources|
38
40
|Offline|[Existing Mongo Tools (mongodump, mongorestore, Studio3T)](https://azure.microsoft.com/resources/videos/using-mongodb-tools-with-azure-cosmos-db/)|• Easy to set up and integrate <br/>• Needs custom handling for throttles|
39
-
|Online|[Azure Database Migration Service](https://docs.microsoft.com/azure/dms/tutorial-mongodb-cosmos-db-online)|• Fully managed migration service.<br/>• Provides hosting and monitoring solutions for the migration task. <br/>• Suitable for large datasets and takes care of replicating live changes <br/>• Works only with other MongoDB sources|
41
+
|Online|[Azure Database Migration Service](../dms/tutorial-mongodb-cosmos-db-online.md)|• Fully managed migration service.<br/>• Provides hosting and monitoring solutions for the migration task. <br/>• Suitable for large datasets and takes care of replicating live changes <br/>• Works only with other MongoDB sources|
40
42
41
43
42
44
## <a id="estimate-throughput"></a> Estimate the throughput need for your workloads
43
45
44
46
In Azure Cosmos DB, the throughput is provisioned in advance and is measured in Request Units (RUs) per second. Unlike VMs or on-premises servers, RUs are easy to scale up and down at any time. You can change the number of provisioned RUs instantly. For more information, see [Request units in Azure Cosmos DB](request-units.md).
45
47
46
-
You can use the [Cosmos DB Capacity Calculator](https://cosmos.azure.com/capacitycalculator/) to determine the amount of Request Units based on your database account configuration, amount of data, document size, and required reads and writes per second.
48
+
You can use the [Azure Cosmos DB Capacity Calculator](https://cosmos.azure.com/capacitycalculator/) to determine the amount of Request Units based on your database account configuration, amount of data, document size, and required reads and writes per second.
47
49
48
50
The following are key factors that affect the number of required RUs:
49
-
- **Item (i.e., document) size**: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.
50
-
- **Item property count**: Assuming the [default indexing](index-overview.md) on all properties, the number of RUs consumed to write an item increases as the item property count increases. You can reduce the request unit consumption for write operations by [limiting the number of indexed properties](index-policy.md).
51
-
- **Concurrent operations**: Request units consumed also depends on the frequency with which different CRUD operations (like writes, reads, updates, deletes) and more complex queries are executed. You can use [mongostat](https://docs.mongodb.com/manual/reference/program/mongostat/) to output the concurrency needs of your current MongoDB data.
52
-
- **Query patterns**: The complexity of a query affects how many request units are consumed by the query.
51
+
- **Document size**: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.
52
+
- **Document property count**: The number of RUs consumed to create or update a document is related to the number, complexity and length of its properties. You can reduce the request unit consumption for write operations by [limiting the number of indexed properties](mongodb-indexing.md).
53
+
- **Query patterns**: The complexity of a query affects how many request units are consumed by the query.
53
54
54
55
The best way to understand the cost of queries is to load sample data into Azure Cosmos DB, [run sample queries from the MongoDB Shell](connect-mongodb-account.md), and then use the `getLastRequestStatistics` command to get the request charge, which is the number of RUs the query consumed:
55
56
@@ -59,15 +60,15 @@ This command will output a JSON document similar to the following:
After you understand the number of RUs consumed by a query and the concurrency needs for that query, you can adjust the number of provisioned RUs. Optimizing RUs is not a one-time event: you should continually tune the provisioned RUs, scaling them up or down depending on whether you expect light traffic or a heavy workload such as a data import.
63
+
You can also use [the diagnostic settings](cosmosdb-monitor-resource-logs.md) to understand the frequency and patterns of the queries executed against Azure Cosmos DB. The results from the diagnostic logs can be sent to a storage account, an EventHub instance or [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/log-query/get-started-portal).
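
Once the RU charge of each query and its frequency are known, a first-pass throughput estimate is simple arithmetic: multiply each operation's charge by how often it runs per second, then add the results. The sketch below illustrates this in Python. The function name and the sample figures are invented for illustration and are not measurements from a real workload.

```python
def estimate_required_rus(workload):
    """Sum RU charge x operations per second over every operation type.

    workload: iterable of (ru_charge_per_operation, operations_per_second) pairs.
    """
    return sum(charge * rate for charge, rate in workload)


# Hypothetical workload: (RU charge reported for one operation, operations per second)
sample_workload = [
    (2.5, 100),   # point reads
    (10.0, 20),   # writes
    (1.0, 50),    # small queries
]

print(estimate_required_rus(sample_workload))  # 500.0
```

An estimate like this is only a starting point. Actual consumption should still be checked against the diagnostic logs and adjusted over time.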
63
64
64
65
## <a id="partitioning"></a>Choose your partition key
65
-
Partitioning is a key point of consideration before migrating to a globally distributed database like Azure Cosmos DB. Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the storage and throughput needs of your application. This feature works in a similar way as sharding, without the need to host and configure routing servers.
66
+
Partitioning, also known as sharding, is a key point of consideration before migrating data. Azure Cosmos DB uses fully managed partitioning to increase the capacity in a database to meet the storage and throughput requirements. This feature doesn't need the hosting or configuration of routing servers.
66
67
67
68
In a similar way, the partitioning capability automatically adds capacity and re-balances the data accordingly. For details and recommendations on choosing the right partition key for your data, please see the [Choosing a Partition Key article](https://docs.microsoft.com/azure/cosmos-db/partitioning-overview#choose-partitionkey).
68
69
69
70
## <a id="indexing"></a>Index your data
70
-
By default, Azure Cosmos DB indexes all your data fields upon ingestion. You can modify the [indexing policy](index-policy.md) in Azure Cosmos DB at any time. For more details on indexing, you can read more about it in the [Indexing in Azure Cosmos DB](index-overview.md) section.
71
+
By default, Azure Cosmos DB provides automatic indexing on all data inserted. The indexing capabilities provided by Azure Cosmos DB include adding composite indices, unique indices and time-to-live (TTL) indices. The index management interface is mapped to the `createIndex()` command. Learn more at [Indexing in Azure Cosmos DB's API for MongoDB](mongodb-indexing.md).
71
72
72
73
[Azure Database Migration Service](../dms/tutorial-mongodb-cosmos-db.md) automatically migrates MongoDB collections with unique indexes. However, the unique indexes must be created before the migration. Azure Cosmos DB does not support the creation of unique indexes when there is already data in your collections. For more information, see [Unique keys in Azure Cosmos DB](unique-keys.md).
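
Because unique indexes have to exist before any data arrives, one approach is to create them on the empty target collection as a pre-migration step. The Python sketch below assumes a collection object that exposes a `create_index` method, as PyMongo's `Collection` does. The helper name and the field names used with it are hypothetical.

```python
def create_unique_indexes(collection, field_names):
    """Create one ascending unique index per field on an (empty) collection.

    collection: any object with a PyMongo-style create_index method.
    Returns the index names reported by create_index.
    """
    created = []
    for field in field_names:
        # 1 means ascending order, as in the MongoDB shell's createIndex().
        name = collection.create_index([(field, 1)], unique=True)
        created.append(name)
    return created
```

With PyMongo, this might be called as `create_unique_indexes(client["mydb"]["users"], ["email"])` against the target account before the migration starts. The database, collection and field names in that call are placeholders.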