articles/cosmos-db/bulk-executor-graph-dotnet.md
Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ ms.reviewer: sngun
 This tutorial provides instructions about using Azure CosmosDB's bulk executor .NET library to import and update graph objects into an Azure Cosmos DB Gremlin API container. This process makes use of the Graph class in the [bulk executor library](https://docs.microsoft.com/azure/cosmos-db/bulk-executor-overview) to create Vertex and Edge objects programmatically to then insert multiple of them per network request. This behavior is configurable through the bulk executor library to make optimal use of both database and local memory resources.

-As opposed to sending Gremlin queries to a database, where the command is evaluated and then executed one at a time, using the bulk executor library will instead require to create and validate the objects locally. After creating the objects, the library allows you to send graph objects to the database service sequentially. Using this method, data ingestion speeds can be increased up to 100x, which makes it an ideal method for initial data migrations or periodical data movement operations. Learn more by visiting the GitHub page of the [Azure Cosmos DB Graph bulk executor sample application](https://aka.ms/graph-bulkexecutor-sample).
+As opposed to sending Gremlin queries to a database, where the command is evaluated and then executed one at a time, using the bulk executor library will instead require to create and validate the objects locally. After creating the objects, the library allows you to send graph objects to the database service sequentially. Using this method, data ingestion speeds can be increased up to 100x, which makes it an ideal method for initial data migrations or periodical data movement operations. Learn more by visiting the GitHub page of the [Azure Cosmos DB Graph bulk executor sample application](https://github.com/Azure-Samples/azure-cosmosdb-graph-bulkexecutor-dotnet-getting-started).
* Git. For more information check out the [Git Downloads page](https://git-scm.com/downloads).
116
116
117
117
### Clone the sample application
118
-In this tutorial, we'll follow through the steps for getting started by using the [Azure Cosmos DB Graph bulk executor sample](https://aka.ms/graph-bulkexecutor-sample) hosted on GitHub. This application consists of a .NET solution that randomly generates vertex and edge objects and then executes bulk insertions to the specified graph database account. To get the application, run the `git clone` command below:
118
+In this tutorial, we'll follow through the steps for getting started by using the [Azure Cosmos DB Graph bulk executor sample](https://github.com/Azure-Samples/azure-cosmosdb-graph-bulkexecutor-dotnet-getting-started) hosted on GitHub. This application consists of a .NET solution that randomly generates vertex and edge objects and then executes bulk insertions to the specified graph database account. To get the application, run the `git clone` command below:
articles/cosmos-db/bulk-executor-java.md
Lines changed: 9 additions & 9 deletions
@@ -13,7 +13,7 @@ ms.reviewer: sngun
 # Use bulk executor Java library to perform bulk operations on Azure Cosmos DB data

-This tutorial provides instructions on using the Azure Cosmos DB’s bulk executor Java library to import, and update Azure Cosmos DB documents. To learn about bulk executor library and how it helps you leverage massive throughput and storage, see [bulk executor Library overview](bulk-executor-overview.md) article. In this tutorial, you build a Java application that generates random documents and they are bulk imported into an Azure Cosmos container. After importing, you will bulk update some properties of a document.
+This tutorial provides instructions on using the Azure Cosmos DB's bulk executor Java library to import, and update Azure Cosmos DB documents. To learn about bulk executor library and how it helps you leverage massive throughput and storage, see [bulk executor Library overview](bulk-executor-overview.md) article. In this tutorial, you build a Java application that generates random documents and they are bulk imported into an Azure Cosmos container. After importing, you will bulk update some properties of a document.

 Currently, the bulk executor library is supported only by Azure Cosmos DB SQL API and Gremlin API accounts. This article describes how to use bulk executor Java library with SQL API accounts. To learn about using bulk executor .NET library with Gremlin API, see [perform bulk operations in Azure Cosmos DB Gremlin API](bulk-executor-graph-dotnet.md).
@@ -23,7 +23,7 @@ Currently, the bulk executor library is supported only by Azure Cosmos DB SQL AP
 * You can [try Azure Cosmos DB for free](https://azure.microsoft.com/try/cosmosdb/) without an Azure subscription, free of charge and commitments. Or, you can use the [Azure Cosmos DB Emulator](https://docs.microsoft.com/azure/cosmos-db/local-emulator) with the `https://localhost:8081` endpoint. The Primary Key is provided in [Authenticating requests](local-emulator.md#authenticating-requests).

-* [Java Development Kit (JDK) 1.7+](https://aka.ms/azure-jdks)
+* [Java Development Kit (JDK) 1.7+](/java/azure/jdk/?view=azure-java-stable)
 - On Ubuntu, run `apt-get install default-jdk` to install the JDK.

 - Be sure to set the JAVA_HOME environment variable to point to the folder where the JDK is installed.
@@ -46,7 +46,7 @@ The cloned repository contains two samples "bulkimport" and "bulkupdate" relativ
 ## Bulk import data to Azure Cosmos DB

-1. The Azure Cosmos DB’s connection strings are read as arguments and assigned to variables defined in CmdLineConfiguration.java file.
+1. The Azure Cosmos DB's connection strings are read as arguments and assigned to variables defined in CmdLineConfiguration.java file.

 2. Next the DocumentClient object is initialized by using the following statements:
@@ -126,7 +126,7 @@ The cloned repository contains two samples "bulkimport" and "bulkupdate" relativ
 6. After the target dependencies are generated, you can invoke the bulk importer application by using the following command:

 ```java
-java -Xmx12G -jar bulkexecutor-sample-1.0-SNAPSHOT-jar-with-dependencies.jar -serviceEndpoint *<Fill in your AzureCosmosDB’s endpoint>* -masterKey *<Fill in your AzureCosmosDB’s master key>* -databaseId bulkImportDb -collectionId bulkImportColl -operation import -shouldCreateCollection -collectionThroughput 1000000 -partitionKey /profileid -maxConnectionPoolSize 6000 -numberOfDocumentsForEachCheckpoint 1000000 -numberOfCheckpoints 10
+java -Xmx12G -jar bulkexecutor-sample-1.0-SNAPSHOT-jar-with-dependencies.jar -serviceEndpoint *<Fill in your AzureCosmosDB's endpoint>* -masterKey *<Fill in your Azure Cosmos DB's master key>* -databaseId bulkImportDb -collectionId bulkImportColl -operation import -shouldCreateCollection -collectionThroughput 1000000 -partitionKey /profileid -maxConnectionPoolSize 6000 -numberOfDocumentsForEachCheckpoint 1000000 -numberOfCheckpoints 10
 ```

 The bulk importer creates a new database and a collection with the database name, collection name, and throughput values specified in the App.config file.
@@ -146,7 +146,7 @@ You can update existing documents by using the BulkUpdateAsync API. In this exam
@@ -179,7 +179,7 @@ You can update existing documents by using the BulkUpdateAsync API. In this exam
 |int getNumberOfDocumentsUpdated() | The total number of documents that were successfully updated out of the documents supplied to the bulk update API call. |
 |double getTotalRequestUnitsConsumed() | The total request units (RU) consumed by the bulk update API call. |
 |Duration getTotalTimeTaken() | The total time taken by the bulk update API call to complete execution. |
-|List\<Exception> getErrors() |Gets the list of errors if some documents out of the batch supplied to the bulk update API call failed to get inserted. |
+|List\<Exception> getErrors() |Gets the list of errors if some documents out of the batch supplied to the bulk update API call failed to get inserted. |

 3. After you have the bulk update application ready, build the command-line tool from source by using the 'mvn clean package' command. This command generates a jar file in the target folder:
@@ -190,7 +190,7 @@ You can update existing documents by using the BulkUpdateAsync API. In this exam
 4. After the target dependencies are generated, you can invoke the bulk update application by using the following command:

 ```
-java -Xmx12G -jar bulkexecutor-sample-1.0-SNAPSHOT-jar-with-dependencies.jar -serviceEndpoint **<Fill in your Azure Cosmos DB’s endpoint>* -masterKey **<Fill in your Azure Cosmos DB’s master key>* -databaseId bulkUpdateDb -collectionId bulkUpdateColl -operation update -collectionThroughput 1000000 -partitionKey /profileid -maxConnectionPoolSize 6000 -numberOfDocumentsForEachCheckpoint 1000000 -numberOfCheckpoints 10
+java -Xmx12G -jar bulkexecutor-sample-1.0-SNAPSHOT-jar-with-dependencies.jar -serviceEndpoint **<Fill in your Azure Cosmos DB's endpoint>* -masterKey **<Fill in your Azure Cosmos DB's master key>* -databaseId bulkUpdateDb -collectionId bulkUpdateColl -operation update -collectionThroughput 1000000 -partitionKey /profileid -maxConnectionPoolSize 6000 -numberOfDocumentsForEachCheckpoint 1000000 -numberOfCheckpoints 10
 ```

 ## Performance tips
@@ -200,14 +200,14 @@ Consider the following points for better performance when using bulk executor li
 * For best performance, run your application from an Azure VM in the same region as your Cosmos DB account write region.
 * For achieving higher throughput:

-* Set the JVM’s heap size to a large enough number to avoid any memory issue in handling large number of documents. Suggested heap size: max(3GB, 3 * sizeof(all documents passed to bulk import API in one batch)).
+* Set the JVM's heap size to a large enough number to avoid any memory issue in handling large number of documents. Suggested heap size: max(3GB, 3 * sizeof(all documents passed to bulk import API in one batch)).
 * There is a preprocessing time, due to which you will get higher throughput when performing bulk operations with a large number of documents. So, if you want to import 10,000,000 documents, running bulk import 10 times on 10 bulk of documents each of size 1,000,000 is preferable than running bulk import 100 times on 100 bulk of documents each of size 100,000 documents.

 * It is recommended to instantiate a single DocumentBulkExecutor object for the entire application within a single virtual machine that corresponds to a specific Azure Cosmos container.

 * Since a single bulk operation API execution consumes a large chunk of the client machine's CPU and network IO. This happens by spawning multiple tasks internally, avoid spawning multiple concurrent tasks within your application process each executing bulk operation API calls. If a single bulk operation API call running on a single virtual machine is unable to consume your entire container's throughput (if your container's throughput > 1 million RU/s), it's preferable to create separate virtual machines to concurrently execute bulk operation API calls.

-
+
 ## Next steps

 * To learn about maven package details and release notes of bulk executor Java library, see [bulk executor SDK details](sql-api-sdk-bulk-executor-java.md).
articles/cosmos-db/bulk-executor-overview.md
Lines changed: 1 addition & 1 deletion
@@ -46,4 +46,4 @@ The bulk executor library makes sure to maximally utilize the throughput allocat
 * Learn more by trying out the sample applications consuming the bulk executor library in [.NET](bulk-executor-dot-net.md) and [Java](bulk-executor-java.md).
 * Check out the bulk executor SDK information and release notes in [.NET](sql-api-sdk-bulk-executor-dot-net.md) and [Java](sql-api-sdk-bulk-executor-java.md).
 * The bulk executor library is integrated into the Cosmos DB Spark connector, to learn more, see [Azure Cosmos DB Spark connector](spark-connector.md) article.
-* The bulk executor library is also integrated into a new version of [Azure Cosmos DB connector](https://aka.ms/bulkexecutor-adf-v2) for Azure Data Factory to copy data.
+* The bulk executor library is also integrated into a new version of [Azure Cosmos DB connector](../data-factory/connector-azure-cosmos-db.md) for Azure Data Factory to copy data.
articles/cosmos-db/cosmosdb-migrationchoices.md
Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ The following factors determine the choice of the migration tool:
 |Offline|[Azure Cosmos DB Spark connector](https://docs.microsoft.com/azure/cosmos-db/spark-connector)|• Makes use of the Azure Cosmos DB bulk executor library <br/>• Suitable for large datasets <br/>• Needs a custom Spark setup <br/>• Spark is sensitive to schema inconsistencies and this can be a problem during migration |
 |Offline|[Custom tool with Cosmos DB bulk executor library](https://docs.microsoft.com/azure/cosmos-db/migrate-cosmosdb-data)|• Provides checkpointing, dead-lettering capabilities which increases migration resiliency <br/>• Suitable for very large datasets (10 TB+) <br/>• Requires custom setup of this tool running as an App Service |
 |Online|[Cosmos DB Functions + ChangeFeed API](https://docs.microsoft.com/azure/cosmos-db/change-feed-functions)|• Easy to set up <br/>• Works only if the source is an Azure Cosmos DB container <br/>• Not suitable for large datasets <br/>• Does not capture deletes from the source container |
-|Online|[Custom Migration Service using ChangeFeed](https://aka.ms/CosmosDBMigrationSample)|• Provides progress tracking <br/>• Works only if the source is an Azure Cosmos DB container <br/>• Works for larger datasets as well <br/>• Requires the user to set up an App Service to host the Change feed processor <br/>• Does not capture deletes from the source container|
+|Online|[Custom Migration Service using ChangeFeed](https://github.com/nomiero/CosmosDBLiveETLSample)|• Provides progress tracking <br/>• Works only if the source is an Azure Cosmos DB container <br/>• Works for larger datasets as well <br/>• Requires the user to set up an App Service to host the Change feed processor <br/>• Does not capture deletes from the source container|
 |Online|[Striim](https://docs.microsoft.com/azure/cosmos-db/cosmosdb-sql-api-migrate-data-striim)|• Works with a large variety of sources like Oracle, DB2, SQL Server <br/>• Easy to build ETL pipelines and provides a dashboard for monitoring <br/>• Supports larger datasets <br/>• Since this is a third-party tool, it needs to be purchased from the marketplace and installed in the user's environment|
articles/cosmos-db/create-cassandra-api-account-java.md
Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ This tutorial covers the following tasks:
 * If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?ref=microsoft.com&utm_source=microsoft.com&utm_medium=docs&utm_campaign=visualstudio) before you begin.

-* Get the latest version of [Java Development Kit (JDK)](https://aka.ms/azure-jdks).
+* Get the latest version of [Java Development Kit (JDK)](/java/azure/jdk/?view=azure-java-stable).

 * [Download](https://maven.apache.org/download.cgi) and [install](https://maven.apache.org/install.html) the [Maven](https://maven.apache.org/) binary archive.
 - On Ubuntu, you can run `apt-get install maven` to install Maven.