
Commit eed6aae

Merge pull request #112873 from dagiro/freshness_c47
freshness_c47
2 parents 1c7f0ea + 23c66d2

3 files changed: +56 -50 lines changed

articles/hdinsight/hdinsight-troubleshoot-hdfs.md

Lines changed: 54 additions & 3 deletions
@@ -6,13 +6,13 @@ ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: troubleshooting
-ms.date: 09/30/2019
+ms.date: 04/27/2020
ms.custom: seodec18
---

# Troubleshoot Apache Hadoop HDFS by using Azure HDInsight

-Learn about the top issues and their resolutions when working with Hadoop Distributed File System (HDFS) payloads in Apache Ambari. For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).
+Learn top issues and resolutions when working with Hadoop Distributed File System (HDFS). For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).

## <a name="how-do-i-access-local-hdfs-from-inside-a-cluster"></a>How do I access the local HDFS from inside a cluster?

@@ -67,9 +67,60 @@ Access the local HDFS from the command line and application code instead of by u
hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.lck
```

+## Storage exception for write on blob

+### Issue

+When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:

+```error
+ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
+copyFromLocal: java.io.IOException
+        at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
+        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
+        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
+        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
+        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
+        at java.lang.Thread.run(Thread.java:745)
+Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
+        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
+        at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
+        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
+        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
+        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
+        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
+        ... 7 more
+```
+### Cause

+HBase on HDInsight clusters defaults to a block size of 256 KB when writing to Azure storage. Because a block blob can contain at most 50,000 blocks, a 256 KB block size caps a single upload at roughly 12 GB. The default works for HBase APIs or REST APIs, but it results in the error above when you use the `hadoop` or `hdfs dfs` command-line utilities to write larger files.

+### Resolution

+Use `fs.azure.write.request.size` to specify a larger block size. You can make this change on a per-use basis by using the `-D` parameter. The following command is an example using this parameter with the `hadoop` command:

+```bash
+hadoop fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
+```
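A per-invocation override of the same kind should also work through the `hdfs dfs` front end, because both shells accept Hadoop generic options such as `-D`; the file name and destination path below are placeholders only:

```bash
# Raise the WASB block size to 4 MB for this single copy operation.
hdfs dfs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
```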

+You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. Use the following steps to change the value in the Ambari Web UI:

+1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
+2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
+3. In the **Filter...** field, enter `fs.azure.write.request.size`.
+4. Change the value from 262144 (256 KB) to the new value, for example, 4194304 (4 MB).

+![Image of changing the value through Ambari Web UI](./media/hdinsight-troubleshoot-hdfs/hbase-change-block-write-size.png)

+For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
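After Ambari saves the change and the affected services are restarted, the new value is pushed to `core-site.xml` on the cluster nodes. A minimal way to confirm the effective setting from an SSH session, assuming the standard HDInsight configuration layout under `/etc/hadoop/conf`:

```bash
# Print the value the client configuration now resolves for the property (expect 4194304 after the change).
hdfs getconf -confKey fs.azure.write.request.size

# Or look for the property in core-site.xml directly.
grep -A 1 "fs.azure.write.request.size" /etc/hadoop/conf/core-site.xml
```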

## du

-The [-du](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.
+The [`-du`](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays the sizes of files and directories contained in the given directory, or the length of a file if the given path is a file.

The `-s` option produces an aggregate summary of the file lengths being displayed.
The `-h` option formats the file sizes in a human-readable form.
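As a quick illustration of combining the two options (the `/tmp` path is only an example), the following returns one summarized, human-readable total for a directory:

```bash
# Aggregate, human-readable size of everything under /tmp in the cluster's default file system.
hdfs dfs -du -s -h /tmp
```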

articles/hdinsight/hdinsight-upload-data.md

Lines changed: 2 additions & 47 deletions
@@ -8,7 +8,7 @@ ms.service: hdinsight
ms.topic: conceptual
ms.custom: hdiseo17may2017,seoapr2020
ms.date: 04/27/2020
---

# Upload data for Apache Hadoop jobs in HDInsight

@@ -66,7 +66,7 @@ or
For a list of other Hadoop commands that work with files, see [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html)

> [!WARNING]
-> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see the [storage exception for write on blob](#storage-exception-for-write-on-blob) section in this article.
+> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see [storage exception for write on blob](hdinsight-troubleshoot-hdfs.md#storage-exception-for-write-on-blob).

## Graphical clients

@@ -116,51 +116,6 @@ Azure Storage can also be accessed using an Azure SDK from the following program

For more information on installing the Azure SDKs, see [Azure downloads](https://azure.microsoft.com/downloads/)

-## Troubleshooting

-### Storage exception for write on blob

-**Symptoms**: When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:

-ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
-copyFromLocal: java.io.IOException
-        at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
-        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
-        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
-        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
-        at java.lang.Thread.run(Thread.java:745)
-Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
-        at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
-        at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
-        at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
-        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
-        at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
-        at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
-        ... 7 more

-**Cause**: HBase on HDInsight clusters default to a block size of 256 KB when writing to Azure storage. While it works for HBase APIs or REST APIs, it results in an error when using the `hadoop` or `hdfs dfs` command-line utilities.

-**Resolution**: Use `fs.azure.write.request.size` to specify a larger block size. You can do this modification on a per-use basis by using the `-D` parameter. The following command is an example using this parameter with the `hadoop` command:

-```bash
-hadoop -fs -D fs.azure.write.request.size=4194304 -copyFromLocal test_large_file.bin /example/data
-```

-You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:

-1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
-2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
-3. In the **Filter...** field, enter `fs.azure.write.request.size`.
-4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).

-![Image of changing the value through Ambari Web UI](./media/hdinsight-upload-data/hbase-change-block-write-size.png)

-For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).

## Next steps

Now that you understand how to get data into HDInsight, read the following articles to learn analysis:
