
Commit ade5816

Merge pull request #95392 from twooley/twadlsg1seo
updating titles for SEO and markdown cleanup
2 parents: 2cb664e + 82099e9

7 files changed: +234 −216 lines

articles/data-lake-store/data-lake-store-copy-data-azure-storage-blob.md

Lines changed: 28 additions & 23 deletions
@@ -1,47 +1,44 @@
 ---
-title: Copy data from Azure Storage Blobs into Azure Data Lake Storage Gen1 | Microsoft Docs
+title: Copy data from Azure Storage blobs to Data Lake Storage Gen1
 description: Use AdlCopy tool to copy data from Azure Storage Blobs to Azure Data Lake Storage Gen1
-services: data-lake-store
-documentationcenter: ''
-author: twooley
-manager: mtillman
-editor: cgronlun
 
-ms.assetid: dc273ef8-96ef-47a6-b831-98e8a777a5c1
+author: twooley
 ms.service: data-lake-store
-ms.devlang: na
 ms.topic: conceptual
 ms.date: 05/29/2018
 ms.author: twooley
 
 ---
 # Copy data from Azure Storage Blobs to Azure Data Lake Storage Gen1
+
 > [!div class="op_single_selector"]
 > * [Using DistCp](data-lake-store-copy-data-wasb-distcp.md)
 > * [Using AdlCopy](data-lake-store-copy-data-azure-storage-blob.md)
 >
 >
 
-Azure Data Lake Storage Gen1 provides a command line tool, [AdlCopy](https://www.microsoft.com/download/details.aspx?id=50358), to copy data from the following sources:
+Data Lake Storage Gen1 provides a command-line tool, [AdlCopy](https://www.microsoft.com/download/details.aspx?id=50358), to copy data from the following sources:
 
-* From Azure Storage Blobs into Data Lake Storage Gen1. You cannot use AdlCopy to copy data from Data Lake Storage Gen1 to Azure Storage blobs.
-* Between two Azure Data Lake Storage Gen1 accounts.
+* From Azure Storage blobs into Data Lake Storage Gen1. You can't use AdlCopy to copy data from Data Lake Storage Gen1 to Azure Storage blobs.
+* Between two Data Lake Storage Gen1 accounts.
 
 Also, you can use the AdlCopy tool in two different modes:
 
 * **Standalone**, where the tool uses Data Lake Storage Gen1 resources to perform the task.
 * **Using a Data Lake Analytics account**, where the units assigned to your Data Lake Analytics account are used to perform the copy operation. You might want to use this option when you are looking to perform the copy tasks in a predictable manner.
 
 ## Prerequisites
+
 Before you begin this article, you must have the following:
 
 * **An Azure subscription**. See [Get Azure free trial](https://azure.microsoft.com/pricing/free-trial/).
-* **Azure Storage Blobs** container with some data.
-* **An Azure Data Lake Storage Gen1 account**. For instructions on how to create one, see [Get started with Azure Data Lake Storage Gen1](data-lake-store-get-started-portal.md)
-* **Azure Data Lake Analytics account (optional)** - See [Get started with Azure Data Lake Analytics](../data-lake-analytics/data-lake-analytics-get-started-portal.md) for instructions on how to create a Data Lake Analytics account.
+* **Azure Storage blobs** container with some data.
+* **A Data Lake Storage Gen1 account**. For instructions on how to create one, see [Get started with Azure Data Lake Storage Gen1](data-lake-store-get-started-portal.md)
+* **Data Lake Analytics account (optional)** - See [Get started with Azure Data Lake Analytics](../data-lake-analytics/data-lake-analytics-get-started-portal.md) for instructions on how to create a Data Lake Analytics account.
 * **AdlCopy tool**. Install the [AdlCopy tool](https://www.microsoft.com/download/details.aspx?id=50358).
 
 ## Syntax of the AdlCopy tool
+
 Use the following syntax to work with the AdlCopy tool
 
     AdlCopy /Source <Blob or Data Lake Storage Gen1 source> /Dest <Data Lake Storage Gen1 destination> /SourceKey <Key for Blob account> /Account <Data Lake Analytics account> /Units <Number of Analytics units> /Pattern
@@ -55,19 +52,20 @@ The parameters in the syntax are described below:
 | SourceKey |Specifies the storage access key for the Azure storage blob source. This is required only if the source is a blob container or a blob. |
 | Account |**Optional**. Use this if you want to use Azure Data Lake Analytics account to run the copy job. If you use the /Account option in the syntax but do not specify a Data Lake Analytics account, AdlCopy uses a default account to run the job. Also, if you use this option, you must add the source (Azure Storage Blob) and destination (Azure Data Lake Storage Gen1) as data sources for your Data Lake Analytics account. |
 | Units |Specifies the number of Data Lake Analytics units that will be used for the copy job. This option is mandatory if you use the **/Account** option to specify the Data Lake Analytics account. |
-| Pattern |Specifies a regex pattern that indicates which blobs or files to copy. AdlCopy uses case-sensitive matching. The default pattern used when no pattern is specified is to copy all items. Specifying multiple file patterns is not supported. |
+| Pattern |Specifies a regex pattern that indicates which blobs or files to copy. AdlCopy uses case-sensitive matching. The default pattern when no pattern is specified is to copy all items. Specifying multiple file patterns is not supported. |
 
 ## Use AdlCopy (as standalone) to copy data from an Azure Storage blob
+
 1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
-2. Run the following command to copy a specific blob from the source container to a Data Lake Storage Gen1 folder:
+1. Run the following command to copy a specific blob from the source container to a Data Lake Storage Gen1 folder:
 
        AdlCopy /source https://<source_account>.blob.core.windows.net/<source_container>/<blob name> /dest swebhdfs://<dest_adlsg1_account>.azuredatalakestore.net/<dest_folder>/ /sourcekey <storage_account_key_for_storage_container>
 
    For example:
 
       AdlCopy /source https://mystorage.blob.core.windows.net/mycluster/HdiSamples/HdiSamples/WebsiteLogSampleData/SampleLog/909f2b.log /dest swebhdfs://mydatalakestorage.azuredatalakestore.net/mynewfolder/ /sourcekey uJUfvD6cEvhfLoBae2yyQf8t9/BpbWZ4XoYj4kAS5Jf40pZaMNf0q6a8yqTxktwVgRED4vPHeh/50iS9atS5LQ==
 
-   >[!NOTE]
+   >[!NOTE]
    >The syntax above specifies the file to be copied to a folder in the Data Lake Storage Gen1 account. AdlCopy tool creates a folder if the specified folder name does not exist.
 
 You will be prompted to enter the credentials for the Azure subscription under which you have your Data Lake Storage Gen1 account. You will see an output similar to the following:
@@ -91,10 +89,11 @@ The parameters in the syntax are described below:
 If you are copying from an Azure Blob Storage account, you may be throttled during copy on the blob storage side. This will degrade the performance of your copy job. To learn more about the limits of Azure Blob Storage, see Azure Storage limits at [Azure subscription and service limits](../azure-subscription-service-limits.md).
 
 ## Use AdlCopy (as standalone) to copy data from another Data Lake Storage Gen1 account
+
 You can also use AdlCopy to copy data between two Data Lake Storage Gen1 accounts.
 
 1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
-2. Run the following command to copy a specific file from one Data Lake Storage Gen1 account to another.
+1. Run the following command to copy a specific file from one Data Lake Storage Gen1 account to another.
 
       AdlCopy /Source adl://<source_adlsg1_account>.azuredatalakestore.net/<path_to_file> /dest adl://<dest_adlsg1_account>.azuredatalakestore.net/<path>/
 
@@ -114,15 +113,16 @@ You can also use AdlCopy to copy data between two Data Lake Storage Gen1 account
     100% data copied.
     Finishing Copy.
     Copy Completed. 1 file copied.
-3. The following command copies all files from a specific folder in the source Data Lake Storage Gen1 account to a folder in the destination Data Lake Storage Gen1 account.
+1. The following command copies all files from a specific folder in the source Data Lake Storage Gen1 account to a folder in the destination Data Lake Storage Gen1 account.
 
      AdlCopy /Source adl://mydatastorage.azuredatalakestore.net/mynewfolder/ /dest adl://mynewdatalakestorage.azuredatalakestore.net/mynewfolder/
 
 ### Performance considerations
 
-When using AdlCopy as a standalone tool, the copy is run on shared, Azure managed resources. The performance you may get in this environment depends on system load and available resources. This mode is best used for small transfers on an ad hoc basis. No parameters need to be tuned when using AdlCopy as a standalone tool.
+When using AdlCopy as a standalone tool, the copy is run on shared, Azure-managed resources. The performance you may get in this environment depends on system load and available resources. This mode is best used for small transfers on an ad hoc basis. No parameters need to be tuned when using AdlCopy as a standalone tool.
 
 ## Use AdlCopy (with Data Lake Analytics account) to copy data
+
 You can also use your Data Lake Analytics account to run the AdlCopy job to copy data from Azure storage blobs to Data Lake Storage Gen1. You would typically use this option when the data to be moved is in the range of gigabytes and terabytes, and you want better and predictable performance throughput.
 
 To use your Data Lake Analytics account with AdlCopy to copy from an Azure Storage Blob, the source (Azure Storage Blob) must be added as a data source for your Data Lake Analytics account. For instructions on adding additional data sources to your Data Lake Analytics account, see [Manage Data Lake Analytics account data sources](../data-lake-analytics/data-lake-analytics-manage-use-portal.md#manage-data-sources).
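The concrete commands for this section are elided from the diff shown on this page. As an illustrative sketch only, combining the `/Account` and `/Units` switches from the syntax table earlier in the article (the account name `myadlanalytics`, the unit count, and the paths below are hypothetical, not taken from the commit), an analytics-backed copy might look like:

```
AdlCopy /Source https://mystorage.blob.core.windows.net/mycontainer/ /Dest swebhdfs://mydatalakestorage.azuredatalakestore.net/mynewfolder/ /SourceKey <storage_account_key> /Account myadlanalytics /Units 2
```

Per the parameter table, `/Units` is mandatory whenever `/Account` is specified, and each file uses at most one unit, so requesting more units than there are files to copy does not improve performance.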
@@ -149,10 +149,11 @@ Similarly, run the following command to copy all files from a specific folder in
 When copying data in the range of terabytes, using AdlCopy with your own Azure Data Lake Analytics account provides better and more predictable performance. The parameter that should be tuned is the number of Azure Data Lake Analytics Units to use for the copy job. Increasing the number of units will increase the performance of your copy job. Each file to be copied can use maximum one unit. Specifying more units than the number of files being copied will not increase performance.
 
 ## Use AdlCopy to copy data using pattern matching
+
 In this section, you learn how to use AdlCopy to copy data from a source (in our example below we use Azure Storage Blob) to a destination Data Lake Storage Gen1 account using pattern matching. For example, you can use the steps below to copy all files with .csv extension from the source blob to the destination.
 
 1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
-2. Run the following command to copy all files with *.csv extension from a specific blob from the source container to a Data Lake Storage Gen1 folder:
+1. Run the following command to copy all files with *.csv extension from a specific blob from the source container to a Data Lake Storage Gen1 folder:
 
      AdlCopy /source https://<source_account>.blob.core.windows.net/<source_container>/<blob name> /dest swebhdfs://<dest_adlsg1_account>.azuredatalakestore.net/<dest_folder>/ /sourcekey <storage_account_key_for_storage_container> /Pattern *.csv
 
@@ -161,20 +162,24 @@ In this section, you learn how to use AdlCopy to copy data from a source (in our
     AdlCopy /source https://mystorage.blob.core.windows.net/mycluster/HdiSamples/HdiSamples/FoodInspectionData/ /dest adl://mydatalakestorage.azuredatalakestore.net/mynewfolder/ /sourcekey uJUfvD6cEvhfLoBae2yyQf8t9/BpbWZ4XoYj4kAS5Jf40pZaMNf0q6a8yqTxktwVgRED4vPHeh/50iS9atS5LQ== /Pattern *.csv
 
 ## Billing
+
 * If you use the AdlCopy tool as standalone you will be billed for egress costs for moving data, if the source Azure Storage account is not in the same region as the Data Lake Storage Gen1 account.
 * If you use the AdlCopy tool with your Data Lake Analytics account, standard [Data Lake Analytics billing rates](https://azure.microsoft.com/pricing/details/data-lake-analytics/) will apply.
 
 ## Considerations for using AdlCopy
-* AdlCopy (for version 1.0.5), supports copying data from sources that collectively have more than thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files/folders into different sub-folders and use the path to those sub-folders as the source instead.
+
+* AdlCopy (for version 1.0.5), supports copying data from sources that collectively have more than thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files/folders into different subfolders and use the path to those subfolders as the source instead.
 
 ## Performance considerations for using AdlCopy
 
-AdlCopy supports copying data containing thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files/folders into smaller sub-folders. AdlCopy was built for ad hoc copies. If you are trying to copy data on a recurring basis, you should consider using [Azure Data Factory](../data-factory/connector-azure-data-lake-store.md) that provides full management around the copy operations.
+AdlCopy supports copying data containing thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files/folders into smaller subfolders. AdlCopy was built for ad hoc copies. If you are trying to copy data on a recurring basis, you should consider using [Azure Data Factory](../data-factory/connector-azure-data-lake-store.md) that provides full management around the copy operations.
 
 ## Release notes
+
 * 1.0.13 - If you are copying data to the same Azure Data Lake Storage Gen1 account across multiple adlcopy commands, you do not need to reenter your credentials for each run anymore. Adlcopy will now cache that information across multiple runs.
 
 ## Next steps
+
 * [Secure data in Data Lake Storage Gen1](data-lake-store-secure-data.md)
 * [Use Azure Data Lake Analytics with Data Lake Storage Gen1](../data-lake-analytics/data-lake-analytics-get-started-portal.md)
 * [Use Azure HDInsight with Data Lake Storage Gen1](data-lake-store-hdinsight-hadoop-use-portal.md)
