# Use Azure Data Box to migrate data from an on-premises HDFS store to Azure Storage
 
-You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device. You can choose from a 80-TB Data Box or a 770-TB Data Box Heavy.
+You can migrate data from an on-premises HDFS store of your Hadoop cluster into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device. You can choose from an 80-TB Data Box or a 770-TB Data Box Heavy.
 
 This article helps you complete these tasks:
 
@@ -46,10 +46,10 @@ To copy the data from your on-premises HDFS store to a Data Box device, you'll s
 
 If the amount of data that you are copying is more than the capacity of a single Data Box or that of single node on Data Box Heavy, break up your data set into sizes that do fit into your devices.
 
-Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box device. The REST API interface will make the device appear as a HDFS store to your cluster.
+Follow these steps to copy data via the REST APIs of Blob/Object storage to your Data Box device. The REST API interface will make the device appear as an HDFS store to your cluster.
 
 
-1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box or Data Box Heavy. Sign in to the local web UI of Data Box and go to **Connect and copy** page. Against the Azure storage account for your device, under **Access settings**, locate and select **REST**.
+1. Before you copy the data via REST, identify the security and connection primitives to connect to the REST interface on the Data Box or Data Box Heavy. Sign in to the local web UI of Data Box and go to **Connect and copy** page. Against the Azure storage account for your device, under **Access settings**, locate, and select **REST**.
 
 
 
@@ -66,7 +66,7 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
 ```
 If you are using some other mechanism for DNS, you should ensure that the Data Box endpoint can be resolved.
 
-4. Set a shell variable `azjars` to point to the `hadoop-azure` and the `microsoft-windowsazure-storage-sdk` jar files. These files are under the Hadoop installation directory (You can check if these files exist by using this command `ls -l $<hadoop_install_dir>/share/hadoop/tools/lib/ | grep azure` where `<hadoop_install_dir>` is the directory where you have installed Hadoop) Use the full paths.
+4. Set a shell variable `azjars` to point to the `hadoop-azure` and the `microsoft-windowsazure-storage-sdk` jar files. These files are under the Hadoop installation directory (You can check if these files exist by using this command `ls -l $<hadoop_install_dir>/share/hadoop/tools/lib/ | grep azure` where `<hadoop_install_dir>` is the directory where you have installed Hadoop) Use the full paths.
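Note on step 4 in the hunk above: the following is a minimal sketch of one way the `azjars` variable could be built from full paths. It assumes that the jar file names carry a version suffix (so a glob is used rather than a hard-coded file name) and that the variable is later passed to an option that takes a comma-separated list, such as Hadoop's `-libjars`. The function name `find_azjars` is hypothetical and does not come from the article.

```shell
# Hypothetical helper (not from the article): print a comma-separated list of
# the full paths of the hadoop-azure and storage SDK jars under a Hadoop
# installation directory passed as the first argument.
find_azjars() {
  lib_dir="$1/share/hadoop/tools/lib"
  jars=""
  # "[0-9]*" keeps the first glob from also matching other hadoop-azure-* jars
  # whose names continue with a word instead of a version number.
  for jar in "$lib_dir"/hadoop-azure-[0-9]*.jar \
             "$lib_dir"/microsoft-windowsazure-storage-sdk-*.jar; do
    # Skip patterns that matched nothing (the literal pattern is not a file).
    [ -f "$jar" ] || continue
    # Append with a comma separator, but no leading comma on the first entry.
    jars="${jars:+$jars,}$jar"
  done
  printf '%s\n' "$jars"
}
```

For example, `azjars=$(find_azjars "$hadoop_install_dir")` followed by `echo "$azjars"` lets you confirm that both jars were found before you start the copy. An empty result means that neither file exists at the expected location.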
@@ -118,7 +118,7 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
 
 To improve the copy speed:
 - Try changing the number of mappers. (The above example uses `m` = 4 mappers.)
-- Try running mutliple `distcp` in parallel.
+- Try running multiple `distcp` in parallel.
 - Remember that large files perform better than small files.
 
 ## Ship the Data Box to Microsoft
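Note on the copy-speed tips in the hunk above: running several `distcp` jobs in parallel could look roughly like the sketch below, which starts one background job per source directory and then waits for all of them. This is an illustration under assumptions, not the article's command. The function name `run_parallel_copies` and the `COPY_CMD` override are hypothetical, and the default `hadoop distcp -m 4` omits the connection and authentication options that a real copy to the device needs.

```shell
# Hypothetical helper (not from the article): start one copy job per source
# directory in the background, wait for all of them, and return the number
# of jobs that failed (0 means every job succeeded).
# COPY_CMD is an assumed override so the sketch can be dry-run, for example
# with COPY_CMD=echo; replace the default with your full distcp invocation.
run_parallel_copies() {
  dest="$1"
  shift
  pids=""
  for src in "$@"; do
    # Left unquoted on purpose so the command string splits into words.
    ${COPY_CMD:-hadoop distcp -m 4} "$src" "$dest" &
    pids="$pids $!"
  done
  failed=0
  for pid in $pids; do
    # wait returns the exit status of that background job.
    wait "$pid" || failed=$((failed + 1))
  done
  return "$failed"
}
```

A dry run such as `COPY_CMD=echo; run_parallel_copies <destination> <source-1> <source-2>` prints the argument pairs that would be passed to each job without copying anything. Splitting the data set by top-level directory is only one possible way to partition the work.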
@@ -141,7 +141,7 @@ Follow these steps to prepare and ship the Data Box device to Microsoft.
 
 This step is needed if you are using Azure Data Lake Storage Gen2 as your data store. If you are using just a blob storage account without hierarchical namespace as your data store, you do not need to do this step.
 
-You can do this in 2 ways.
+You can do this in two ways.
 
 - Use [Azure Data Factory to move data to ADLS Gen2](https://docs.microsoft.com/azure/data-factory/load-azure-data-lake-storage-gen2). You will have to specify **Azure Blob Storage** as the source.