Commit 60d5e3d

Merge pull request #201828 from sipastak/db-hdfs
1959189 Data Box: [Update] Data Box and HDFS
2 parents bb4b50e + d1ad71b commit 60d5e3d

1 file changed, +21 -2 lines changed

articles/storage/blobs/data-lake-storage-migrate-on-premises-HDFS-cluster.md

Lines changed: 21 additions & 2 deletions
@@ -3,7 +3,7 @@ title: Migrate from on-prem HDFS store to Azure Storage with Azure Data Box
 description: Migrate data from an on-premises HDFS store into Azure Storage (blob storage or Data Lake Storage Gen2) by using a Data Box device.
 author: normesta
 ms.service: storage
-ms.date: 02/14/2019
+ms.date: 06/16/2022
 ms.author: normesta
 ms.topic: how-to
 ms.subservice: data-lake-storage-gen2
@@ -147,11 +147,30 @@ Follow these steps to copy data via the REST APIs of Blob/Object storage to your
 
 To improve the copy speed:
 
-- Try changing the number of mappers. (The above example uses `m` = 4 mappers.)
+- Try changing the number of mappers. (The default number of mappers is 20. The above example uses `m` = 4 mappers.)
+
+- Try `-D fs.azure.concurrentRequestCount.out=<thread_number>`. Replace `<thread_number>` with the number of threads per mapper. The product of the number of mappers and the number of threads per mapper, `m*<thread_number>`, should not exceed 32.
 
 - Try running multiple `distcp` in parallel.
 
 - Remember that large files perform better than small files.
+
+- If you have files larger than 200 GB, we recommend changing the block size to 100 MB with the following parameters:
+
+```
+hadoop distcp \
+-libjars $azjars \
+-Dfs.azure.write.request.size=104857600 \
+-Dfs.AbstractFileSystem.wasb.Impl=org.apache.hadoop.fs.azure.Wasb \
+-Dfs.azure.account.key.<blob_service_endpoint>=<account_key> \
+-strategy dynamic \
+-Dmapreduce.map.memory.mb=16384 \
+-Dfs.azure.concurrentRequestCount.out=8 \
+-Dmapreduce.map.java.opts=-Xmx8196m \
+-m 4 \
+-update \
+/data/bigfile wasb://<container_name>@<blob_service_endpoint>/bigfile
+```
 
 ## Ship the Data Box to Microsoft

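The two numeric rules added in this change (no more than 32 threads in total, and larger blocks for files over 200 GB) can be sanity-checked before starting a copy. The Bash sketch below is illustrative and not part of the article: the helper names `check_thread_budget`, `max_blob_bytes`, and `to_gib` are hypothetical. It assumes that a block blob can hold at most 50,000 blocks and that `fs.azure.write.request.size` sets the size of each uploaded block in bytes. Under those assumptions, 4 MiB blocks (assumed here to be the default) cap a single file at about 195 GiB, which would be consistent with the 200 GB threshold. The 104857600-byte (100 MiB) setting raises that cap to roughly 4.8 TiB.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; helper names are hypothetical.
# Assumptions (not stated in the article): a block blob holds at most
# 50,000 blocks, and fs.azure.write.request.size is the per-block size in bytes.
set -euo pipefail

readonly MAX_BLOCKS=50000        # assumed block blob block limit
readonly MAX_TOTAL_THREADS=32    # from the article: m * <thread_number> <= 32

# Succeeds (exit status 0) when mappers * threads per mapper is within the limit.
check_thread_budget() {
  local mappers=$1 threads=$2
  (( mappers * threads <= MAX_TOTAL_THREADS ))
}

# Prints the largest file size, in bytes, that fits in one block blob
# when each block is request_size bytes.
max_blob_bytes() {
  local request_size=$1
  echo $(( request_size * MAX_BLOCKS ))
}

# Prints a byte count as whole GiB (integer division, rounds down).
to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Usage with the values from the example command (-m 4, 8 threads per mapper):
check_thread_budget 4 8 && echo "4 mappers x 8 threads: within budget"
check_thread_budget 8 8 || echo "8 mappers x 8 threads: over budget"
to_gib "$(max_blob_bytes 4194304)"     # 4 MiB blocks   -> prints 195
to_gib "$(max_blob_bytes 104857600)"   # 100 MiB blocks -> prints 4882
```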
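The tip about running several `distcp` jobs in parallel could look like the following Bash sketch. It starts one job per source directory in the background and then waits for all of them to finish. It is a hypothetical illustration: `copy_in_parallel`, `run`, `DRY_RUN`, `DEST_ROOT`, and the `/data/dir*` paths are placeholders. The `-libjars` and `-D` options from the example command are left out for brevity, and by default the sketch only prints the commands. Each job runs its own mappers, and the article does not say whether the 32-thread guidance applies per job or to all jobs combined. Treat `-m 4` here as a starting point rather than a tuned value.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run one distcp job per source directory in parallel.
# DEST_ROOT and the source paths are placeholders; add -libjars and the -D
# options from the article's command for a real copy.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them
DEST_ROOT='wasb://<container_name>@<blob_service_endpoint>'

# Prints the command in dry-run mode; otherwise runs it.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Starts one background distcp per argument, then waits for all of them.
# Returns 1 if any job failed.
copy_in_parallel() {
  local src pid status=0
  local -a pids=()
  (( $# > 0 )) || return 0
  for src in "$@"; do
    run hadoop distcp -m 4 -update "$src" "${DEST_ROOT}${src}" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage (dry run prints two commands; their order may vary):
copy_in_parallel /data/dir1 /data/dir2
```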