Data Lake Storage Gen1 provides a command-line tool, [AdlCopy](https://www.microsoft.com/download/details.aspx?id=50358), to copy data from the following sources:

* From Azure Storage blobs into Data Lake Storage Gen1. You can't use AdlCopy to copy data from Data Lake Storage Gen1 to Azure Storage blobs.
* Between two Data Lake Storage Gen1 accounts.

You can also use the AdlCopy tool in two different modes:

* **Standalone**, where the tool uses Data Lake Storage Gen1 resources to perform the task.
* **Using a Data Lake Analytics account**, where the units assigned to your Data Lake Analytics account are used to perform the copy operation. You might want to use this option when you need the copy tasks to perform predictably.

## Prerequisites

Before you begin this article, you must have the following:

* **An Azure subscription**. See [Get Azure free trial](https://azure.microsoft.com/pricing/free-trial/).
* **An Azure Storage blob container** with some data.
* **A Data Lake Storage Gen1 account**. For instructions on how to create one, see [Get started with Azure Data Lake Storage Gen1](data-lake-store-get-started-portal.md).
* **Data Lake Analytics account (optional)**. See [Get started with Azure Data Lake Analytics](../data-lake-analytics/data-lake-analytics-get-started-portal.md) for instructions on how to create a Data Lake Analytics account.
* **AdlCopy tool**. Install the [AdlCopy tool](https://www.microsoft.com/download/details.aspx?id=50358).

## Syntax of the AdlCopy tool

Use the following syntax to work with the AdlCopy tool:

    AdlCopy /Source <Blob or Data Lake Storage Gen1 source> /Dest <Data Lake Storage Gen1 destination> /SourceKey <Key for Blob account> /Account <Data Lake Analytics account> /Units <Number of Analytics units> /Pattern

The parameters in the syntax are described below:

| Option | Description |
| --- | --- |
| SourceKey |Specifies the storage access key for the Azure Storage blob source. This is required only if the source is a blob container or a blob. |
| Account |**Optional**. Use this if you want to use a Data Lake Analytics account to run the copy job. If you use the /Account option in the syntax but do not specify a Data Lake Analytics account, AdlCopy uses a default account to run the job. Also, if you use this option, you must add the source (Azure Storage blob) and destination (Data Lake Storage Gen1) as data sources for your Data Lake Analytics account. |
| Units |Specifies the number of Data Lake Analytics units that will be used for the copy job. This option is mandatory if you use the **/Account** option to specify the Data Lake Analytics account. |
| Pattern |Specifies a regex pattern that indicates which blobs or files to copy. AdlCopy uses case-sensitive matching. When no pattern is specified, all items are copied. Specifying multiple file patterns is not supported. |
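
As a rough illustration of how these options combine, the following Python sketch assembles an AdlCopy argument list. The helper name `build_adlcopy_args` is hypothetical and not part of the tool. Only the option names (`/Source`, `/Dest`, `/SourceKey`, `/Account`, `/Units`, `/Pattern`) come from the syntax above, and the values are the same placeholders used there. The sketch only builds the command line and does not run AdlCopy.

```python
from typing import List, Optional


def build_adlcopy_args(
    source: str,
    dest: str,
    source_key: Optional[str] = None,
    account: Optional[str] = None,
    units: Optional[int] = None,
    pattern: Optional[str] = None,
) -> List[str]:
    """Assemble an AdlCopy argument list (hypothetical helper, not part of the tool)."""
    args = ["AdlCopy", "/Source", source, "/Dest", dest]
    if source_key is not None:
        # Required only when the source is a blob container or a blob.
        args += ["/SourceKey", source_key]
    if account is not None:
        # /Units is mandatory whenever /Account is used.
        if units is None or units < 1:
            raise ValueError("/Units is required when /Account is used")
        # An empty account name leaves the choice to AdlCopy's default account.
        args += ["/Account"] + ([account] if account else [])
        args += ["/Units", str(units)]
    if pattern is not None:
        # A single pattern only; multiple patterns are not supported.
        args += ["/Pattern", pattern]
    return args


# Placeholder values taken from the syntax line, for illustration only.
example = build_adlcopy_args(
    source="<Blob or Data Lake Storage Gen1 source>",
    dest="<Data Lake Storage Gen1 destination>",
    source_key="<Key for Blob account>",
    account="<Data Lake Analytics account>",
    units=4,
)
print(example)
```

Returning a list rather than a single string keeps each value as one argument, so values that contain spaces do not need manual quoting if the list is later passed to a process launcher.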

## Use AdlCopy (as standalone) to copy data from an Azure Storage blob

1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
1. Run the following command to copy a specific blob from the source container to a Data Lake Storage Gen1 folder:

    > The syntax above specifies the file to be copied to a folder in the Data Lake Storage Gen1 account. The AdlCopy tool creates the folder if the specified folder name does not exist.

    You will be prompted to enter the credentials for the Azure subscription under which you have your Data Lake Storage Gen1 account. You will see an output similar to the following:

If you are copying from an Azure Blob Storage account, you may be throttled during the copy on the Blob Storage side. Throttling degrades the performance of your copy job. To learn more about the limits of Azure Blob Storage, see Azure Storage limits at [Azure subscription and service limits](../azure-subscription-service-limits.md).

## Use AdlCopy (as standalone) to copy data from another Data Lake Storage Gen1 account

You can also use AdlCopy to copy data between two Data Lake Storage Gen1 accounts.

1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
1. Run the following command to copy a specific file from one Data Lake Storage Gen1 account to another.

        100% data copied.
        Finishing Copy.
        Copy Completed. 1 file copied.

1. The following command copies all files from a specific folder in the source Data Lake Storage Gen1 account to a folder in the destination Data Lake Storage Gen1 account.

When you use AdlCopy as a standalone tool, the copy runs on shared, Azure-managed resources. The performance you get in this environment depends on system load and available resources. This mode is best used for small transfers on an ad hoc basis. No parameters need to be tuned when using AdlCopy as a standalone tool.

## Use AdlCopy (with Data Lake Analytics account) to copy data

You can also use your Data Lake Analytics account to run the AdlCopy job to copy data from Azure Storage blobs to Data Lake Storage Gen1. You would typically use this option when the data to be moved is in the range of gigabytes and terabytes, and you want better and more predictable throughput.

To use your Data Lake Analytics account with AdlCopy to copy from an Azure Storage blob, the source (Azure Storage blob) must be added as a data source for your Data Lake Analytics account. For instructions on adding additional data sources to your Data Lake Analytics account, see [Manage Data Lake Analytics account data sources](../data-lake-analytics/data-lake-analytics-manage-use-portal.md#manage-data-sources).

When copying data in the range of terabytes, using AdlCopy with your own Data Lake Analytics account provides better and more predictable performance. The parameter to tune is the number of Data Lake Analytics units to use for the copy job. Increasing the number of units increases the performance of your copy job, up to a limit: each file to be copied can use at most one unit, so specifying more units than the number of files being copied will not increase performance.
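
The one-unit-per-file limit can be expressed as a small calculation. The following Python sketch is only an illustration of that rule; the function name `effective_units` is hypothetical and nothing here is part of AdlCopy or Data Lake Analytics.

```python
def effective_units(requested_units: int, file_count: int) -> int:
    """Return how many units can do useful work, given at most one unit per file."""
    if requested_units < 1:
        raise ValueError("requested_units must be at least 1")
    if file_count < 0:
        raise ValueError("file_count cannot be negative")
    # Units beyond the number of files have nothing to copy.
    return min(requested_units, file_count)


# Requesting 32 units for 10 files: only 10 units can be used.
for requested, files in [(4, 10), (10, 10), (32, 10)]:
    print(requested, files, effective_units(requested, files))
```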

## Use AdlCopy to copy data using pattern matching

In this section, you learn how to use AdlCopy to copy data from a source (the example below uses an Azure Storage blob) to a destination Data Lake Storage Gen1 account using pattern matching. For example, you can use the steps below to copy all files with the .csv extension from the source blob to the destination.

1. Open a command prompt and navigate to the directory where AdlCopy is installed, typically `%HOMEPATH%\Documents\adlcopy`.
1. Run the following command to copy all files with the .csv extension from a specific blob in the source container to a Data Lake Storage Gen1 folder:

* If you use the AdlCopy tool as standalone, you will be billed for egress costs for moving data if the source Azure Storage account is not in the same region as the Data Lake Storage Gen1 account.
* If you use the AdlCopy tool with your Data Lake Analytics account, standard [Data Lake Analytics billing rates](https://azure.microsoft.com/pricing/details/data-lake-analytics/) will apply.

## Considerations for using AdlCopy

* AdlCopy (version 1.0.5) supports copying data from sources that collectively have thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files and folders into different subfolders and use the path to those subfolders as the source instead.

## Performance considerations for using AdlCopy

AdlCopy supports copying data containing thousands of files and folders. However, if you encounter issues copying a large dataset, you can distribute the files and folders into smaller subfolders. AdlCopy was built for ad hoc copies. If you are trying to copy data on a recurring basis, consider using [Azure Data Factory](../data-factory/connector-azure-data-lake-store.md), which provides full management of the copy operations.
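
One way to plan such a split is sketched below in Python. This is an illustration only: the function name `assign_subfolders`, the `part-NNNN` naming scheme, and the batch size are all assumptions, and the article does not state a specific file-count limit. The sketch only computes a grouping and does not move any data.

```python
from typing import Dict, List


def assign_subfolders(paths: List[str], max_per_folder: int) -> Dict[str, List[str]]:
    """Group file paths into numbered subfolders of at most max_per_folder files."""
    if max_per_folder < 1:
        raise ValueError("max_per_folder must be at least 1")
    groups: Dict[str, List[str]] = {}
    for start in range(0, len(paths), max_per_folder):
        # Hypothetical naming scheme: part-0000, part-0001, ...
        name = f"part-{start // max_per_folder:04d}"
        groups[name] = paths[start:start + max_per_folder]
    return groups


# Five hypothetical files, at most two per subfolder.
files = [f"data-{i}.csv" for i in range(5)]
for folder, members in assign_subfolders(files, 2).items():
    print(folder, members)
```

Each resulting subfolder path could then be used as the source of its own copy job.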

## Release notes

* 1.0.13 - If you are copying data to the same Data Lake Storage Gen1 account across multiple AdlCopy commands, you no longer need to reenter your credentials for each run. AdlCopy now caches that information across multiple runs.

## Next steps

* [Secure data in Data Lake Storage Gen1](data-lake-store-secure-data.md)
* [Use Azure Data Lake Analytics with Data Lake Storage Gen1](../data-lake-analytics/data-lake-analytics-get-started-portal.md)
* [Use Azure HDInsight with Data Lake Storage Gen1](data-lake-store-hdinsight-hadoop-use-portal.md)