Commit f982150

Integrated feedback
- Highlighted DMS as the recommended option
- Fixed index section, now points to Azure Cosmos DB Mongo API indexing document.
- Simplified throughput estimation steps
1 parent 340cbb2 commit f982150

1 file changed

+15
-14
lines changed

articles/cosmos-db/mongodb-pre-migration.md

Lines changed: 15 additions & 14 deletions
@@ -1,12 +1,12 @@
11
---
22
title: Pre-migration steps for data migration to Azure Cosmos DB's API for MongoDB
33
description: This doc provides an overview of the prerequisites for a data migration from MongoDB to Cosmos DB.
4-
author: roaror
4+
author: lbosq
55
ms.service: cosmos-db
66
ms.subservice: cosmosdb-mongo
77
ms.topic: conceptual
88
ms.date: 01/09/2020
9-
ms.author: roaror
9+
ms.author: lbosq
1010
---
1111

1212
# Pre-migration steps for data migrations from MongoDB to Azure Cosmos DB's API for MongoDB
@@ -24,32 +24,33 @@ If you have already completed the above pre-requisites for migration, you can [M
2424
## <a id="considerations"></a>Main considerations when using Azure Cosmos DB's API for MongoDB
2525

2626
The following are specific characteristics about Azure Cosmos DB's API for MongoDB:
27-
- **Capacity model**: Database capacity on Cosmos DB is based on a throughput-based model. This model is based on [Request Units per second](request-units.md), which is a unit that represents the number of database operations that can be executed against a collection on a per-second basis. This capacity can be allocated at [a database or collection level](set-throughput.md), and it can works on an allocation model, or an [AutoPilot model](provision-throughput-autopilot.md).
28-
- **Request Units**: Every database operation has an associated Request Units (RUs) cost in Cosmos DB. When executed, this is subtracted from the available request units level on a given second. If a request requires more RUs than currently allocated the two options are increasing the amount of RUs, or waiting until the next second starts, then retrying the operation.
27+
- **Capacity model**: Database capacity on Azure Cosmos DB is based on a throughput-based model. This model is based on [Request Units per second](request-units.md), which is a unit that represents the number of database operations that can be executed against a collection on a per-second basis. This capacity can be allocated at [a database or collection level](set-throughput.md), and it can be provisioned on an allocation model, or using the [AutoPilot model](provision-throughput-autopilot.md).
28+
- **Request Units**: Every database operation has an associated Request Units (RUs) cost in Azure Cosmos DB. When executed, this cost is subtracted from the available request units level on a given second. If a request requires more RUs than currently allocated, the two options are increasing the amount of RUs, or waiting until the next second starts, then retrying the operation.
2929
- **Elastic capacity**: The capacity for a given collection or database can change at any time. This allows for the database to elastically adapt to the throughput requirements of your workload.
30-
- **Automatic sharding**: Cosmos DB provides an automatic partitioning system that only requires a shard (or partitioning) key. The [automatic partitioning mechanism](partition-data.md) is shared across all Cosmos DB APIs and it allows for seamless data and throughout scaling through horizontal distribution.
30+
- **Automatic sharding**: Azure Cosmos DB provides an automatic partitioning system that only requires a shard (or partitioning) key. The [automatic partitioning mechanism](partition-data.md) is shared across all Azure Cosmos DB APIs and it allows for seamless data and throughput scaling through horizontal distribution.
3131
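The retry option described under **Request Units** can be sketched as follows. This is a minimal illustration rather than the behavior of any particular driver: `ThrottledError` is a hypothetical stand-in for whatever rate-limit error your client surfaces, and `run_with_retry` is an illustrative helper name.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a rate-limit (request rate too large) error."""


def run_with_retry(operation, max_attempts=5, wait_seconds=1.0, sleep=time.sleep):
    """Run operation(); when throttled, wait for the next one-second window and retry.

    The one-second default mirrors the per-second RU budget described above.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts:
                # Out of attempts: let the caller decide (for example, raise the RUs).
                raise
            sleep(wait_seconds)
```

If retries are exhausted regularly, that is a sign to take the other option and increase the provisioned RUs.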

3232
## <a id="options"></a>Migration options for Azure Cosmos DB's API for MongoDB
3333

34+
The [Azure Database Migration Service for Azure Cosmos DB's API for MongoDB](../dms/tutorial-mongodb-cosmos-db.md) provides a mechanism that simplifies data migration by providing a fully managed hosting platform, migration monitoring options and automatic throttling handling. The full list of options is the following:
35+
3436
|**Migration type**|**Solution**|**Considerations**|
3537
|---------|---------|---------|
3638
|Offline|[Data Migration Tool](https://docs.microsoft.com/azure/cosmos-db/import-data)|&bull; Easy to set up and supports multiple sources <br/>&bull; Not suitable for large datasets.|
3739
|Offline|[Azure Data Factory](https://docs.microsoft.com/azure/data-factory/connector-azure-cosmos-db)|&bull; Easy to set up and supports multiple sources <br/>&bull; Makes use of the Azure Cosmos DB bulk executor library <br/>&bull; Suitable for large datasets <br/>&bull; Lack of checkpointing means that any issue during the course of migration would require a restart of the whole migration process<br/>&bull; Lack of a dead letter queue would mean that a few erroneous files could stop the entire migration process <br/>&bull; Needs custom code to increase read throughput for certain data sources|
3840
|Offline|[Existing Mongo Tools (mongodump, mongorestore, Studio3T)](https://azure.microsoft.com/resources/videos/using-mongodb-tools-with-azure-cosmos-db/)|&bull; Easy to set up and integrate <br/>&bull; Needs custom handling for throttles|
39-
|Online|[Azure Database Migration Service](https://docs.microsoft.com/azure/dms/tutorial-mongodb-cosmos-db-online)|&bull; Fully managed migration service.<br/>&bull; Provides hosting and monitoring solutions for the migration task. <br/>&bull; Suitable for large datasets and takes care of replicating live changes <br/>&bull; Works only with other MongoDB sources|
41+
|Online|[Azure Database Migration Service](../dms/tutorial-mongodb-cosmos-db-online.md)|&bull; Fully managed migration service.<br/>&bull; Provides hosting and monitoring solutions for the migration task. <br/>&bull; Suitable for large datasets and takes care of replicating live changes <br/>&bull; Works only with other MongoDB sources|
4042

4143

4244
## <a id="estimate-throughput"></a> Estimate the throughput need for your workloads
4345

4446
In Azure Cosmos DB, the throughput is provisioned in advance and is measured in Request Units (RUs) per second. Unlike VMs or on-premises servers, RUs are easy to scale up and down at any time. You can change the number of provisioned RUs instantly. For more information, see [Request units in Azure Cosmos DB](request-units.md).
4547

46-
You can use the [Cosmos DB Capacity Calculator](https://cosmos.azure.com/capacitycalculator/) to determine the amount of Request Units based on your database account configuration, amount of data, document size, and required reads and writes per second.
48+
You can use the [Azure Cosmos DB Capacity Calculator](https://cosmos.azure.com/capacitycalculator/) to determine the amount of Request Units based on your database account configuration, amount of data, document size, and required reads and writes per second.
4749

4850
The following are key factors that affect the number of required RUs:
49-
- **Item (i.e., document) size**: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.
50-
- **Item property count**: Assuming the [default indexing](index-overview.md) on all properties, the number of RUs consumed to write an item increases as the item property count increases. You can reduce the request unit consumption for write operations by [limiting the number of indexed properties](index-policy.md).
51-
- **Concurrent operations**: Request units consumed also depends on the frequency with which different CRUD operations (like writes, reads, updates, deletes) and more complex queries are executed. You can use [mongostat](https://docs.mongodb.com/manual/reference/program/mongostat/) to output the concurrency needs of your current MongoDB data.
52-
- **Query patterns**: The complexity of a query affects how many request units are consumed by the query.
51+
- **Document size**: As the size of an item/document increases, the number of RUs consumed to read or write the item/document also increases.
52+
- **Document property count**: The number of RUs consumed to create or update a document is related to the number, complexity and length of its properties. You can reduce the request unit consumption for write operations by [limiting the number of indexed properties](mongodb-indexing.md).
53+
- **Query patterns**: The complexity of a query affects how many request units are consumed by the query.
5354
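As a rough cross-check alongside the capacity calculator, the required throughput can be approximated by multiplying each operation's RU charge by how often it runs per second and summing the results. The sketch below assumes that simple additive model; the function name and the sample charges and rates are hypothetical placeholders, not measured values.

```python
def estimate_required_rus(workload):
    """Sum (RU charge per operation x operations per second) over all operation types."""
    return sum(charge * per_second for charge, per_second in workload.values())


# Hypothetical workload: operation name -> (RU charge per operation, operations per second)
sample_workload = {
    "point_read": (1.0, 500),
    "write": (10.0, 50),
    "query": (10.1, 20),
}

if __name__ == "__main__":
    print(f"Estimated RU/s: {estimate_required_rus(sample_workload):.0f}")
```

Replace the placeholder charges with figures observed against sample data, since document size, property count, and query complexity all change the per-operation cost.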

5455
The best way to understand the cost of queries is to use sample data in Azure Cosmos DB, [and run sample queries from the MongoDB Shell](connect-mongodb-account.md) using the `getLastRequestStatistics` command to get the request charge, which will output the number of RUs consumed:
5556

@@ -59,15 +60,15 @@ This command will output a JSON document similar to the following:
5960

6061
```{ "_t": "GetRequestStatisticsResponse", "ok": 1, "CommandName": "find", "RequestCharge": 10.1, "RequestDurationInMilliSeconds": 7.2}```
6162
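If you collect these statistics in a script, the request charge can be read from the JSON document shown above. A minimal sketch, assuming the output has been captured as a JSON string; `request_charge` is an illustrative helper name, not part of any driver.

```python
import json


def request_charge(stats_json):
    """Return the RU charge from a request statistics response given as JSON text."""
    stats = json.loads(stats_json)
    if stats.get("ok") != 1:
        raise ValueError("request statistics command did not succeed")
    return float(stats["RequestCharge"])


# The sample response from the article, reused verbatim.
sample = (
    '{ "_t": "GetRequestStatisticsResponse", "ok": 1, "CommandName": "find", '
    '"RequestCharge": 10.1, "RequestDurationInMilliSeconds": 7.2}'
)

if __name__ == "__main__":
    print(request_charge(sample))
```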

62-
After you understand the number of RUs consumed by a query and the concurrency needs for that query, you can adjust the number of provisioned RUs. Optimizing RUs is not a one-time event - you should continually optimize or scale up the RUs provisioned, depending on whether you are not expecting a heavy traffic, as opposed to a heavy workload or importing data.
63+
You can also use [the diagnostic settings](cosmosdb-monitor-resource-logs.md) to understand the frequency and patterns of the queries executed against Azure Cosmos DB. The results from the diagnostic logs can be sent to a storage account, an EventHub instance or [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/log-query/get-started-portal).
6364

6465
## <a id="partitioning"></a>Choose your partition key
65-
Partitioning is a key point of consideration before migrating to a globally distributed database like Azure Cosmos DB. Azure Cosmos DB uses partitioning to scale individual containers in a database to meet the storage and throughput needs of your application. This feature works in a similar way as sharding, without the need to host and configure routing servers.
66+
Partitioning, also known as Sharding, is a key point of consideration before migrating data. Azure Cosmos DB uses fully-managed partitioning to increase the capacity in a database to meet the storage and throughput requirements. This feature doesn't need the hosting or configuration of routing servers.
6667

6768
In a similar way, the partitioning capability automatically adds capacity and re-balances the data accordingly. For details and recommendations on choosing the right partition key for your data, please see the [Choosing a Partition Key article](https://docs.microsoft.com/azure/cosmos-db/partitioning-overview#choose-partitionkey).
6869

6970
## <a id="indexing"></a>Index your data
70-
By default, Azure Cosmos DB indexes all your data fields upon ingestion. You can modify the [indexing policy](index-policy.md) in Azure Cosmos DB at any time. For more details on indexing, you can read more about it in the [Indexing in Azure Cosmos DB](index-overview.md) section.
71+
By default, Azure Cosmos DB provides automatic indexing on all data inserted. The indexing capabilities provided by Azure Cosmos DB include adding composite indices, unique indices and time-to-live (TTL) indices. The index management interface is mapped to the `createIndex()` command. Learn more at [Indexing in Azure Cosmos DB's API for MongoDB](mongodb-indexing.md).
7172

7273
[Azure Database Migration Service](../dms/tutorial-mongodb-cosmos-db.md) automatically migrates MongoDB collections with unique indexes. However, the unique indexes must be created before the migration. Azure Cosmos DB does not support the creation of unique indexes when there is already data in your collections. For more information, see [Unique keys in Azure Cosmos DB](unique-keys.md).
7374
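Because unique indexes have to exist before any data arrives, it can help to prepare the index definitions ahead of the migration. The sketch below builds the command document in the shape of MongoDB's `createIndexes` database command, which is what the shell's `createIndex()` helper sends. The collection name `orders` and the field `orderId` are hypothetical, and whether a given index option is accepted by your Azure Cosmos DB account is an assumption to verify.

```python
def unique_index_command(collection, fields):
    """Build a createIndexes command document for an ascending unique index."""
    key = {field: 1 for field in fields}
    name = "_".join(f"{field}_1" for field in fields)
    return {
        "createIndexes": collection,
        "indexes": [{"key": key, "name": name, "unique": True}],
    }


if __name__ == "__main__":
    # Hypothetical collection and field names, for illustration only.
    print(unique_index_command("orders", ["orderId"]))
```

With a driver such as PyMongo, a document like this could be passed to `Database.command` against the still-empty target collection before the migration starts.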
