Commit 71b550d

Added migrate to ADLS Gen2 article
1 parent ff0fae2 commit 71b550d

2 files changed

+24
-16
lines changed

articles/storage/blobs/data-lake-storage-migrate-on-premises-HDFS-cluster.md

Lines changed: 24 additions & 16 deletions
@@ -5,21 +5,21 @@ services: storage
55
author: normesta
66

77
ms.service: storage
8-
ms.date: 03/01/2019
8+
ms.date: 06/05/2019
99
ms.author: normesta
1010
ms.topic: article
1111
ms.component: data-lake-storage-gen2
1212
---
1313

1414
# Use Azure Data Box to migrate data from an on-premises HDFS store to Azure Storage
1515

16-
You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device.
16+
You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device. You can choose from an 80-TB Data Box or a 770-TB Data Box Heavy.
1717

1818
This article helps you complete these tasks:
1919

20-
:heavy_check_mark: Copy your data to a Data Box device.
20+
:heavy_check_mark: Copy your data to a Data Box or a Data Box Heavy device.
2121

22-
:heavy_check_mark: Ship the Data Box device to Microsoft.
22+
:heavy_check_mark: Ship the device back to Microsoft.
2323

2424
:heavy_check_mark: Move the data onto your Data Lake Storage Gen2 storage account.
2525

@@ -33,23 +33,23 @@ You need these things to complete the migration.
3333

3434
* An on-premises Hadoop cluster that contains your source data.
3535

36-
* An [Azure Data Box device](https://azure.microsoft.com/services/storage/databox/).
36+
* An [Azure Data Box device](https://azure.microsoft.com/services/storage/databox/).
3737

38-
- [Order your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-ordered). While ordering your Box, remember to choose a storage account that **doesn't** have hierarchical namespaces enabled on it. This is because Data Box does not yet support direct ingestion into Azure Data Lake Storage Gen2. You will need to copy into a storage account and then do a second copy into the ADLS Gen2 account. Instructions for this are given in the steps below.
39-
- [Cable and connect your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-set-up) to an on-premises network.
38+
- [Order your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-ordered) or [Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-ordered). While ordering your device, remember to choose a storage account that **doesn't** have hierarchical namespaces enabled on it. This is because Data Box devices do not yet support direct ingestion into Azure Data Lake Storage Gen2. You will need to copy into a storage account and then do a second copy into the ADLS Gen2 account. Instructions for this are given in the steps below.
39+
- Cable and connect your [Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-set-up) or [Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-set-up) to an on-premises network.
4040

4141
If you are ready, let's start.
4242

4343
## Copy your data to a Data Box device
4444

4545
To copy the data from your on-premises HDFS store to a Data Box device, you'll set a few things up, and then use the [DistCp](https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html) tool.
4646

47-
If the amount of data that you are copying is more than the capacity of a single Data Box, you will have to break up your data set into sizes that do fit into your Data Boxes.
47+
If the amount of data that you are copying is more than the capacity of a single Data Box, or that of a single node on a Data Box Heavy, break up your data set into sizes that fit into your devices.
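
One way to plan such a split is sketched below. This is only an illustration, not part of the article's procedure: it assumes you already know the size of each top-level HDFS directory (for example, from `hdfs dfs -du`), and it groups directories into device-sized batches with a simple first-fit-decreasing heuristic. The function name, the directory paths, and the use of the nominal 80-TB figure as the capacity are all assumptions; usable capacity on a real device is lower, so leave headroom.

```python
from typing import Dict, List

TB = 10**12
# Assumption: nominal capacity of one Data Box; real usable space is smaller.
DATA_BOX_CAPACITY_BYTES = 80 * TB


def split_into_batches(dir_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Group directories into batches that each fit within `capacity` bytes.

    Uses first-fit decreasing: place the largest directories first, each into
    the first batch that still has room, opening a new batch when none does.
    """
    batches: List[List[str]] = []
    free: List[int] = []  # remaining bytes per batch, parallel to `batches`
    ordered = sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in ordered:
        if size > capacity:
            raise ValueError(f"{path} alone exceeds device capacity; split it further")
        for i, remaining in enumerate(free):
            if size <= remaining:
                batches[i].append(path)
                free[i] -= size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([path])
            free.append(capacity - size)
    return batches
```

Each resulting batch would then be copied to its own device (or Data Box Heavy node) in a separate `distcp` run.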
4848

49-
Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box. The REST API interface will make the Data Box appear as a HDFS store to your cluster.
49+
Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box device. The REST API interface will make the device appear as an HDFS store to your cluster.
5050

5151

52-
1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box. Sign in to the local web UI of Data Box and go to **Connect and copy** page. Against the Azure storage account for your Data Box, under **Access settings**, locate and select **REST(Preview)**.
52+
1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box or Data Box Heavy. Sign in to the local web UI of the device and go to the **Connect and copy** page. Against the Azure storage account for your device, under **Access settings**, locate and select **REST**.
5353

5454
!["Connect and copy" page](media/data-lake-storage-migrate-on-premises-HDFS-cluster/data-box-connect-rest.png)
5555

@@ -59,7 +59,7 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
5959

6060
!["Access storage account and upload data" dialog](media/data-lake-storage-migrate-on-premises-HDFS-cluster/data-box-connection-string-http.png)
6161

62-
3. Add the endpoint and the Data Box IP address to `/etc/hosts` on each node.
62+
3. Add the endpoint and the Data Box or Data Box Heavy node IP address to `/etc/hosts` on each node.
6363

6464
```
6565
10.128.5.42 mystorageaccount.blob.mydataboxno.microsoftdatabox.com
@@ -119,21 +119,29 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
119119
To improve the copy speed:
120120
- Try changing the number of mappers. (The above example uses `m` = 4 mappers.)
121121
- Try running multiple `distcp` jobs in parallel.
122-
- Remember that large files perform better than small files.
122+
- Remember that large files perform better than small files.
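
The first two tips can be combined. The sketch below is a hypothetical helper, not something the article provides: it builds one `hadoop distcp` argument list per source directory with a chosen mapper count (`-m`), so that the copies can be launched side by side. The destination root is assumed to be a `wasb://` URL pointing at the Data Box REST endpoint, and the function name, container name, and paths are placeholders. Any account-key or other `-D` settings your cluster needs are omitted.

```python
from typing import List


def build_distcp_commands(sources: List[str], dest_root: str, mappers: int) -> List[List[str]]:
    """Return one `hadoop distcp` argv list per source directory.

    Nothing is executed here; the caller decides how to launch the commands.
    """
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    commands = []
    for src in sources:
        # Mirror the source path under the destination root.
        dest = dest_root.rstrip("/") + "/" + src.strip("/")
        commands.append(["hadoop", "distcp", "-m", str(mappers), src, dest])
    return commands
```

Each returned list could be passed to `subprocess.Popen` to start the copies concurrently. Measure throughput before and after changing the mapper count or the number of parallel jobs, because the best values depend on your cluster and network.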
123123
124124
## Ship the Data Box to Microsoft
125125
126126
Follow these steps to prepare and ship the Data Box device to Microsoft.
127127
128-
1. After the data copy is complete, run [Prepare to ship](https://docs.microsoft.com/azure/databox/data-box-deploy-copy-data-via-rest) on your Data Box. After the device preparation is complete, download the BOM files. You will use these BOM or manifest files later to verify the data uploaded to Azure. Shut down the device and remove the cables.
129-
2. Schedule a pickup with UPS to [Ship your Data Box back to Azure](https://docs.microsoft.com/azure/databox/data-box-deploy-picked-up).
130-
3. After Microsoft receives your device, it is connected to the network datacenter and data is uploaded to the storage account you specified (with hierarchical namespaces disabled) when you ordered the Data Box. Verify against the BOM files that all your data is uploaded to Azure. You can now move this data to a Data Lake Storage Gen2 storage account.
128+
1. After the data copy is complete, run:
129+
130+
- [Prepare to ship on your Data Box or Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-deploy-copy-data-via-rest).
131+
- After the device preparation is complete, download the BOM files. You will use these BOM or manifest files later to verify the data uploaded to Azure.
132+
- Shut down the device and remove the cables.
133+
2. Schedule a pickup with UPS. Follow the instructions to:
134+
135+
- [Ship your Data Box](https://docs.microsoft.com/azure/databox/data-box-deploy-picked-up)
136+
- [Ship your Data Box Heavy](https://docs.microsoft.com/azure/databox/data-box-heavy-deploy-picked-up).
137+
3. After Microsoft receives your device, it is connected to the datacenter network and the data is uploaded to the storage account you specified (with hierarchical namespaces disabled) when you placed the device order. Verify against the BOM files that all your data is uploaded to Azure. You can now move this data to a Data Lake Storage Gen2 storage account.
138+
131139
132140
## Move the data onto your Data Lake Storage Gen2 storage account
133141
134142
This step is needed if you are using Azure Data Lake Storage Gen2 as your data store. If you are using just a blob storage account without hierarchical namespace as your data store, you do not need to do this step.
135143
136-
You can do this in 2 ways.
144+
You can do this in 2 ways.
137145
138146
- Use [Azure Data Factory to move data to ADLS Gen2](https://docs.microsoft.com/azure/data-factory/load-azure-data-lake-storage-gen2). You will have to specify **Azure Blob Storage** as the source.
139147
-486 Bytes