Commit bcf77f8

Merge branch 'master' of https://github.com/MicrosoftDocs/azure-docs-pr into work02

2 parents 99d8790 + 8c6e528

3 files changed, +59 -53 lines changed


articles/hdinsight/hdinsight-troubleshoot-hdfs.md

Lines changed: 54 additions & 3 deletions
@@ -6,13 +6,13 @@ ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 09/30/2019
+ms.date: 04/27/2020
 ms.custom: seodec18
 ---

 # Troubleshoot Apache Hadoop HDFS by using Azure HDInsight

-Learn about the top issues and their resolutions when working with Hadoop Distributed File System (HDFS) payloads in Apache Ambari. For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).
+Learn top issues and resolutions when working with Hadoop Distributed File System (HDFS). For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).

 ## <a name="how-do-i-access-local-hdfs-from-inside-a-cluster"></a>How do I access the local HDFS from inside a cluster?

@@ -67,9 +67,60 @@ Access the local HDFS from the command line and application code instead of by u
 hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.lck
 ```

+## Storage exception for write on blob
+
+### Issue
+
+When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:
+
+```error
+ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
+copyFromLocal: java.io.IOException
+        at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
+        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
+        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+        at java.lang.Thread.run(Thread.java:745)
+Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
+        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
+        at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
+        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
+        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
+        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
+        ... 7 more
+```
+
+### Cause
+
+HBase on HDInsight clusters defaults to a block size of 256 KB when writing to Azure storage. While this works for HBase APIs or REST APIs, it results in an error when using the `hadoop` or `hdfs dfs` command-line utilities.
+
+### Resolution
+
+Use `fs.azure.write.request.size` to specify a larger block size. You can make this change on a per-use basis by using the `-D` parameter. The following command is an example using this parameter with the `hadoop` command:
+
+```bash
+hadoop fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
+```
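The same generic `-D` override is also accepted by the `hdfs dfs` form of the shell. A minimal sketch, assuming the same hypothetical `test_large_file.bin` and `/example/data` target as above:

```bash
# Pass a 4 MB write block size for this single copy only (per-invocation override)
hdfs dfs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
```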
+
+You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. Use the following steps to change the value in the Ambari Web UI:
+
+1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
+2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
+3. In the **Filter...** field, enter `fs.azure.write.request.size`.
+4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).
+
+![Image of changing the value through Ambari Web UI](./media/hdinsight-troubleshoot-hdfs/hbase-change-block-write-size.png)
+
+For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
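After saving the change in Ambari, one way to check which value the HDFS client tools actually pick up is `hdfs getconf`. A small sketch, assuming it's run from an SSH session on a cluster head node:

```bash
# Print the effective client-side value of the write block size setting
hdfs getconf -confKey fs.azure.write.request.size
```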
+
 ## du

-The [-du](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.
+The [`-du`](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.

 The `-s` option produces an aggregate summary of file lengths being displayed.
 The `-h` option formats the file sizes.
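To make the `-du` options concrete, here is a brief usage sketch, assuming an `/example/data` directory exists in the cluster's default storage; the sizes reported will vary per cluster:

```bash
# Per-item sizes under /example/data, in human-readable units
hdfs dfs -du -h /example/data

# Aggregate total for the whole directory
hdfs dfs -du -s -h /example/data
```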

articles/hdinsight/hdinsight-upload-data.md

Lines changed: 5 additions & 50 deletions
@@ -8,7 +8,7 @@ ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdiseo17may2017,seoapr2020
 ms.date: 04/27/2020
----
+---

 # Upload data for Apache Hadoop jobs in HDInsight

@@ -26,7 +26,7 @@ Note the following requirements before you begin:

 ## Upload data to Azure Storage

-## Utilities
+### Utilities

 Microsoft provides the following utilities to work with Azure Storage:

@@ -41,7 +41,7 @@ Microsoft provides the following utilities to work with Azure Storage:
 > [!NOTE]
 > The Hadoop command is only available on the HDInsight cluster. The command only allows loading data from the local file system into Azure Storage.

-## Hadoop command line
+### Hadoop command line

 The Hadoop command line is only useful for storing data into Azure storage blob when the data is already present on the cluster head node.
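As a brief sketch of that workflow, assuming a hypothetical `data.txt` already sits in the head node's local working directory and `/example/data` exists in the default storage account:

```bash
# Copy a local file from the head node into the cluster's default storage
hadoop fs -copyFromLocal data.txt /example/data/data.txt

# Confirm the upload
hadoop fs -ls /example/data
```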

@@ -66,9 +66,9 @@ or
 For a list of other Hadoop commands that work with files, see [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html)

 > [!WARNING]
-> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see the [storage exception for write on blob](#storage-exception-for-write-on-blob) section in this article.
+> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see [storage exception for write on blob](hdinsight-troubleshoot-hdfs.md#storage-exception-for-write-on-blob).

-## Graphical clients
+### Graphical clients

 There are also several applications that provide a graphical interface for working with Azure Storage. The following table is a list of a few of these applications:

@@ -116,51 +116,6 @@ Azure Storage can also be accessed using an Azure SDK from the following program

 For more information on installing the Azure SDKs, see [Azure downloads](https://azure.microsoft.com/downloads/)

-## Troubleshooting
-
-### Storage exception for write on blob
-
-**Symptoms**: When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:
-
-ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
-copyFromLocal: java.io.IOException
-        at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
-        at java.lang.Thread.run(Thread.java:745)
-Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
-        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
-        at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
-        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
-        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
-        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
-        ... 7 more
-
-**Cause**: HBase on HDInsight clusters default to a block size of 256 KB when writing to Azure storage. While it works for HBase APIs or REST APIs, it results in an error when using the `hadoop` or `hdfs dfs` command-line utilities.
-
-**Resolution**: Use `fs.azure.write.request.size` to specify a larger block size. You can do this modification on a per-use basis by using the `-D` parameter. The following command is an example using this parameter with the `hadoop` command:
-
-```bash
-hadoop -fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
-```
-
-You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:
-
-1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
-2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
-3. In the **Filter...** field, enter `fs.azure.write.request.size`.
-4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).
-
-![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)
-
-For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
-
 ## Next steps

 Now that you understand how to get data into HDInsight, read the following articles to learn analysis:
