articles/storage/blobs/data-lake-storage-migrate-on-premises-HDFS-cluster.md
Lines changed: 24 additions & 16 deletions
@@ -5,21 +5,21 @@ services: storage
 author: normesta

 ms.service: storage
-ms.date: 03/01/2019
+ms.date: 06/05/2019
 ms.author: normesta
 ms.topic: article
 ms.component: data-lake-storage-gen2
 ---

 # Use Azure Data Box to migrate data from an on-premises HDFS store to Azure Storage

-You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device.
+You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device. You can choose from an 80-TB Data Box or a 770-TB Data Box Heavy.

 This article helps you complete these tasks:

-:heavy_check_mark: Copy your data to a Data Box device.
+:heavy_check_mark: Copy your data to a Data Box or a Data Box Heavy device.

-:heavy_check_mark: Ship the Data Box device to Microsoft.
+:heavy_check_mark: Ship the device back to Microsoft.

 :heavy_check_mark: Move the data onto your Data Lake Storage Gen2 storage account.

@@ -33,23 +33,23 @@ You need these things to complete the migration.

 * An on-premises Hadoop cluster that contains your source data.

-* An [Azure Data Box device](https://azure.microsoft.com/services/storage/databox/).
+* An [Azure Data Box device](https://azure.microsoft.com/services/storage/databox/).

--[Order your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-ordered). While ordering your Box, remember to choose a storage account that **doesn't** have hierarchical namespaces enabled on it. This is because Data Box does not yet support direct ingestion into Azure Data Lake Storage Gen2. You will need to copy into a storage account and then do a second copy into the ADLS Gen2 account. Instructions for this are given in the steps below.
--[Cable and connect your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-set-up) to an on-premises network.
+-[Order your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-ordered) or [Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-ordered). While ordering your device, remember to choose a storage account that **doesn't** have hierarchical namespaces enabled on it. This is because Data Box devices do not yet support direct ingestion into Azure Data Lake Storage Gen2. You will need to copy into a storage account and then do a second copy into the ADLS Gen2 account. Instructions for this are given in the steps below.
+- Cable and connect your [Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-set-up) or [Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-set-up) to an on-premises network.

 If you are ready, let's start.

 ## Copy your data to a Data Box device

 To copy the data from your on-premises HDFS store to a Data Box device, you'll set a few things up, and then use the [DistCp](https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html) tool.

-If the amount of data that you are copying is more than the capacity of a single Data Box, you will have to break up your data set into sizes that do fit into your Data Boxes.
+If the amount of data that you are copying is more than the capacity of a single Data Box or that of a single node on a Data Box Heavy, break up your data set into sizes that do fit into your devices.
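Note on splitting the data set: one possible way to plan the split is to group top-level HDFS directories by size so that each group fits on one device. The sketch below uses a first-fit-decreasing heuristic and is illustrative only. The `plan_batches` helper, the directory names, and the sizes are hypothetical (sizes could be gathered with something like `hdfs dfs -du -s`), and the 80 TB capacity is simply the figure quoted in this article, so check the usable capacity of your own device.

```python
from typing import Dict, List


def plan_batches(dir_sizes_tb: Dict[str, float], capacity_tb: float) -> List[List[str]]:
    """Group directories into batches whose total size fits on one device.

    First-fit decreasing: place the largest directories first, each into
    the first batch that still has enough free space.
    """
    batches: List[List[str]] = []
    remaining: List[float] = []  # free space left in each batch, in TB
    ordered = sorted(dir_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in ordered:
        if size > capacity_tb:
            raise ValueError(f"{path} ({size} TB) exceeds device capacity; split it further")
        for i, free in enumerate(remaining):
            if size <= free:
                batches[i].append(path)
                remaining[i] -= size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([path])
            remaining.append(capacity_tb - size)
    return batches


# Hypothetical directory sizes in TB; replace with figures from your cluster.
sizes = {"/data/logs": 50.0, "/data/events": 40.0, "/data/archive": 30.0, "/data/tmp": 20.0}
for n, batch in enumerate(plan_batches(sizes, capacity_tb=80.0), start=1):
    print(f"device {n}: {batch}")
```

The heuristic does not guarantee the minimum number of devices, but it is usually close, and it fails loudly when a single directory is too large to fit and has to be split at a finer level.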

-Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box. The REST API interface will make the Data Box appear as an HDFS store to your cluster.
+Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box device. The REST API interface will make the device appear as an HDFS store to your cluster.


-1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box. Sign in to the local web UI of Data Box and go to the **Connect and copy** page. Against the Azure storage account for your Data Box, under **Access settings**, locate and select **REST(Preview)**.
+1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box or Data Box Heavy. Sign in to the local web UI of Data Box and go to the **Connect and copy** page. Against the Azure storage account for your device, under **Access settings**, locate and select **REST**.

@@ -59,7 +59,7 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your

-3. Add the endpoint and the Data Box IP address to `/etc/hosts` on each node.
+3. Add the endpoint and the Data Box or Data Box Heavy node IP address to `/etc/hosts` on each node.
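Note on the `/etc/hosts` step: a hosts entry is a line with the IP address followed by the host name it should resolve to. The sketch below shows one way to add such a mapping idempotently. It works on a string rather than on the real file, and the IP address, the endpoint name, and the `ensure_entry` helper are hypothetical placeholders, not values taken from a Data Box.

```python
def hosts_entry(ip_address: str, endpoint: str) -> str:
    """Format one hosts-file line: IP address, whitespace, host name."""
    return f"{ip_address}\t{endpoint}"


def ensure_entry(hosts_text: str, ip_address: str, endpoint: str) -> str:
    """Return hosts_text with a mapping for endpoint, appended only if missing."""
    for line in hosts_text.splitlines():
        # Ignore comments, then look for the endpoint among the host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and endpoint in fields[1:]:
            return hosts_text
    separator = "" if (not hosts_text or hosts_text.endswith("\n")) else "\n"
    return f"{hosts_text}{separator}{hosts_entry(ip_address, endpoint)}\n"


# Hypothetical values; use the IP address and blob endpoint shown in the
# local web UI of your device.
current = "127.0.0.1\tlocalhost\n"
updated = ensure_entry(current, "192.0.2.10", "mystorageaccount.blob.example.invalid")
print(updated, end="")
```

Running the same call a second time leaves the text unchanged, which makes the step safe to repeat on every node of the cluster.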
@@ -119,21 +119,29 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
 To improve the copy speed:
 - Try changing the number of mappers. (The above example uses `m` = 4 mappers.)
 - Try running multiple `distcp` in parallel.
-- Remember that large files perform better than small files.
+- Remember that large files perform better than small files.
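Note on the copy-speed tips: the first two tips can be combined by launching several `hadoop distcp` jobs at once, each with its own `-m` value. The sketch below is a minimal illustration, not the article's command. The `distcp_command` and `run_parallel` helpers, the source directories, and the destination placeholder are hypothetical, and any `-D` connection options your copy needs would have to be passed through `extra_args`. By default it only prints the commands.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def distcp_command(source: str, dest: str, mappers: int = 4,
                   extra_args: Sequence[str] = ()) -> List[str]:
    """Build one `hadoop distcp` invocation; `-m` caps the number of map tasks."""
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    return ["hadoop", "distcp", *extra_args, "-m", str(mappers), source, dest]


def run_parallel(commands: List[List[str]], max_parallel: int = 2,
                 dry_run: bool = True) -> List[int]:
    """Run the commands concurrently and return their exit codes in order.

    With dry_run=True (the default) each command is printed, not executed.
    """
    def run(cmd: List[str]) -> int:
        if dry_run:
            print(shlex.join(cmd))
            return 0
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))


# Hypothetical source directories and destination placeholder.
dest_root = "wasb://<container>@<endpoint>"
commands = [
    distcp_command(f"/data/{name}", f"{dest_root}/{name}", mappers=8)
    for name in ("logs", "events")
]
exit_codes = run_parallel(commands)  # dry run: prints two command lines
```

Raising `mappers` or `max_parallel` does not help indefinitely, because the network link and the device eventually become the bottleneck, so it is worth measuring throughput after each change.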

 ## Ship the Data Box to Microsoft

 Follow these steps to prepare and ship the Data Box device to Microsoft.

-1. After the data copy is complete, run [Prepare to ship](https://docs.microsoft.com/azure/databox/data-box-deploy-copy-data-via-rest) on your Data Box. After the device preparation is complete, download the BOM files. You will use these BOM or manifest files later to verify the data uploaded to Azure. Shut down the device and remove the cables.
-2. Schedule a pickup with UPS to [Ship your Data Box back to Azure](https://docs.microsoft.com/azure/databox/data-box-deploy-picked-up).
-3. After Microsoft receives your device, it is connected to the network datacenter and data is uploaded to the storage account you specified (with hierarchical namespaces disabled) when you ordered the Data Box. Verify against the BOM files that all your data is uploaded to Azure. You can now move this data to a Data Lake Storage Gen2 storage account.
+1. After the data copy is complete, run:
+
+- [Prepare to ship on your Data Box or Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-deploy-copy-data-via-rest).
+- After the device preparation is complete, download the BOM files. You will use these BOM or manifest files later to verify the data uploaded to Azure.
+- Shut down the device and remove the cables.
+2. Schedule a pickup with UPS. Follow the instructions to:
+
+- [Ship your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-picked-up)
+- [Ship your Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-picked-up).
+3. After Microsoft receives your device, it is connected to the datacenter network and the data is uploaded to the storage account you specified (with hierarchical namespaces disabled) when you placed the device order. Verify against the BOM files that all your data is uploaded to Azure. You can now move this data to a Data Lake Storage Gen2 storage account.
+

 ## Move the data onto your Data Lake Storage Gen2 storage account

 This step is needed if you are using Azure Data Lake Storage Gen2 as your data store. If you are using just a blob storage account without hierarchical namespace as your data store, you do not need to do this step.

-You can do this in 2 ways.
+You can do this in 2 ways.

 - Use [Azure Data Factory to move data to ADLS Gen2](https://docs.microsoft.com/azure/data-factory/load-azure-data-lake-storage-gen2). You will have to specify **Azure Blob Storage** as the source.