articles/hdinsight/hdinsight-troubleshoot-hdfs.md
+54 -3 (54 additions & 3 deletions)
@@ -6,13 +6,13 @@ ms.author: hrasheed
6
6
ms.reviewer: jasonh
7
7
ms.service: hdinsight
8
8
ms.topic: troubleshooting
9
-
ms.date: 09/30/2019
9
+
ms.date: 04/27/2020
10
10
ms.custom: seodec18
11
11
---
12
12
13
13
# Troubleshoot Apache Hadoop HDFS by using Azure HDInsight
14
14
15
-
Learn about the top issues and their resolutions when working with Hadoop Distributed File System (HDFS) payloads in Apache Ambari. For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).
15
+
Learn about the top issues and their resolutions when working with the Hadoop Distributed File System (HDFS). For a full list of commands, see the [HDFS Commands Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html) and the [File System Shell Guide](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html).
16
16
17
17
## <a name="how-do-i-access-local-hdfs-from-inside-a-cluster"></a>How do I access the local HDFS from inside a cluster?
18
18
@@ -67,9 +67,60 @@ Access the local HDFS from the command line and application code instead of by u
When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:
75
+
76
+
```error
77
+
ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
78
+
copyFromLocal: java.io.IOException
79
+
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
80
+
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
81
+
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
82
+
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
83
+
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
84
+
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
85
+
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
86
+
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
87
+
at java.lang.Thread.run(Thread.java:745)
88
+
Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
89
+
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
90
+
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
91
+
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
92
+
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
93
+
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
94
+
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
95
+
... 7 more
96
+
```
97
+
98
+
### Cause
99
+
100
+
HBase on HDInsight clusters defaults to a block size of 256 KB when writing to Azure storage. This block size works for HBase APIs or REST APIs, but it results in an error when using the `hadoop` or `hdfs dfs` command-line utilities.
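
The ~12 GB threshold is consistent with simple arithmetic if the limit being hit is the Azure block blob cap of 50,000 blocks per blob. That cap is an assumption here, since the article doesn't name the limit. A minimal sketch of the calculation:

```shell
# Assumption (not stated in this article): an Azure block blob
# holds at most 50,000 blocks.
MAX_BLOCKS=50000

# Default HBase block size on HDInsight, per the text above: 256 KB.
BLOCK_SIZE_BYTES=$((256 * 1024))

# Largest file that fits in one blob at this block size.
MAX_FILE_BYTES=$((MAX_BLOCKS * BLOCK_SIZE_BYTES))
MAX_FILE_MB=$((MAX_FILE_BYTES / 1024 / 1024))

echo "Max file size at 256 KB blocks: ${MAX_FILE_MB} MB"
```

Under that assumption the ceiling is 12,500 MB, or about 12.2 GB, which matches the size at which the error is reported.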
101
+
102
+
### Resolution
103
+
104
+
Use `fs.azure.write.request.size` to specify a larger block size. You can make this change on a per-use basis with the `-D` parameter. The following command is an example of using this parameter with the `hadoop` command:
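
The command itself doesn't appear in this view of the diff. As a rough sketch, assuming a local file named `test_large_file.bin` and a destination of `/example/data` (both placeholders), a per-use override with a 4 MB block size might look like this:

```shell
# New block size in bytes: 4 MB = 4 * 1024 * 1024 = 4194304.
WRITE_REQUEST_SIZE=$((4 * 1024 * 1024))

if command -v hadoop >/dev/null 2>&1; then
    # -D is a generic option, so it goes before the -copyFromLocal command.
    # The source file and destination path are placeholders.
    hadoop fs -D fs.azure.write.request.size="${WRITE_REQUEST_SIZE}" \
        -copyFromLocal test_large_file.bin /example/data
else
    echo "hadoop not found; run this on a cluster node" >&2
fi
```

The override applies only to this invocation; the cluster-wide default stays at its configured value.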
You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:
111
+
112
+
1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
113
+
2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
114
+
3. In the **Filter...** field, enter `fs.azure.write.request.size`.
115
+
4. Change the value from 262144 (256 KB) to the new value, for example, 4194304 (4 MB).
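
After the change is saved, one way to check it from a cluster node is to ask the client configuration for the property. This is a sketch only, and it assumes the property is visible to `hdfs getconf` on the node where it runs:

```shell
# Value expected after the Ambari change: 4 MB in bytes.
EXPECTED_SIZE=$((4 * 1024 * 1024))

if command -v hdfs >/dev/null 2>&1; then
    # Print the value that the client configuration currently resolves to.
    ACTUAL_SIZE=$(hdfs getconf -confKey fs.azure.write.request.size)
    if [ "$ACTUAL_SIZE" = "$EXPECTED_SIZE" ]; then
        echo "fs.azure.write.request.size is ${ACTUAL_SIZE}, as expected"
    else
        echo "fs.azure.write.request.size is ${ACTUAL_SIZE}, expected ${EXPECTED_SIZE}"
    fi
else
    echo "hdfs not found; run this on a cluster node" >&2
fi
```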
116
+
117
+

118
+
119
+
For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
120
+
70
121
## du
71
122
72
-
The [-du](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.
123
+
The [`-du`](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du) command displays the sizes of the files and directories contained in the given directory or, if the path is a single file, the length of that file.
73
124
74
125
The `-s` option produces an aggregate summary of file lengths being displayed.
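
As an illustration, the two forms can be compared on the same directory. The path `/example/data` is a placeholder, and `-h` is added only to print sizes in human-readable units:

```shell
# Placeholder path; replace with a directory that exists on your cluster.
TARGET_DIR="/example/data"

if command -v hdfs >/dev/null 2>&1; then
    # One line per file or subdirectory under TARGET_DIR.
    hdfs dfs -du -h "$TARGET_DIR"

    # A single aggregate line for TARGET_DIR as a whole.
    hdfs dfs -du -s -h "$TARGET_DIR"
else
    echo "hdfs not found; run this on a cluster node" >&2
fi
```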
articles/hdinsight/hdinsight-upload-data.md
+5 -50 (5 additions & 50 deletions)
@@ -8,7 +8,7 @@ ms.service: hdinsight
8
8
ms.topic: conceptual
9
9
ms.custom: hdiseo17may2017,seoapr2020
10
10
ms.date: 04/27/2020
11
-
---
11
+
---
12
12
13
13
# Upload data for Apache Hadoop jobs in HDInsight
14
14
@@ -26,7 +26,7 @@ Note the following requirements before you begin:
26
26
27
27
## Upload data to Azure Storage
28
28
29
-
## Utilities
29
+
### Utilities
30
30
31
31
Microsoft provides the following utilities to work with Azure Storage:
32
32
@@ -41,7 +41,7 @@ Microsoft provides the following utilities to work with Azure Storage:
41
41
> [!NOTE]
42
42
> The Hadoop command is only available on the HDInsight cluster. The command only allows loading data from the local file system into Azure Storage.
43
43
44
-
## Hadoop command line
44
+
### Hadoop command line
45
45
46
46
The Hadoop command line is only useful for storing data into Azure storage blob when the data is already present on the cluster head node.
47
47
@@ -66,9 +66,9 @@ or
66
66
For a list of other Hadoop commands that work with files, see [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html](https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html)
67
67
68
68
> [!WARNING]
69
-
> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see the [storage exception for write on blob](#storage-exception-for-write-on-blob) section in this article.
69
+
> On Apache HBase clusters, the default block size used when writing data is 256 KB. While this works fine when using HBase APIs or REST APIs, using the `hadoop` or `hdfs dfs` commands to write data larger than ~12 GB results in an error. For more information, see [storage exception for write on blob](hdinsight-troubleshoot-hdfs.md#storage-exception-for-write-on-blob).
70
70
71
-
## Graphical clients
71
+
### Graphical clients
72
72
73
73
There are also several applications that provide a graphical interface for working with Azure Storage. The following table is a list of a few of these applications:
74
74
@@ -116,51 +116,6 @@ Azure Storage can also be accessed using an Azure SDK from the following program
116
116
117
117
For more information on installing the Azure SDKs, see [Azure downloads](https://azure.microsoft.com/downloads/)
118
118
119
-
## Troubleshooting
120
-
121
-
### Storage exception for write on blob
122
-
123
-
**Symptoms**: When using the `hadoop` or `hdfs dfs` commands to write files that are ~12 GB or larger on an HBase cluster, you may come across the following error:
124
-
125
-
ERROR azure.NativeAzureFileSystem: Encountered Storage Exception for write on Blob : example/test_large_file.bin._COPYING_ Exception details: null Error Code : RequestBodyTooLarge
126
-
copyFromLocal: java.io.IOException
127
-
at com.microsoft.azure.storage.core.Utility.initIOException(Utility.java:661)
128
-
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:366)
129
-
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:350)
130
-
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
131
-
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
132
-
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
133
-
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
134
-
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
135
-
at java.lang.Thread.run(Thread.java:745)
136
-
Caused by: com.microsoft.azure.storage.StorageException: The request body is too large and exceeds the maximum permissible limit.
137
-
at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:89)
138
-
at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:307)
139
-
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:182)
140
-
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlockInternal(CloudBlockBlob.java:816)
141
-
at com.microsoft.azure.storage.blob.CloudBlockBlob.uploadBlock(CloudBlockBlob.java:788)
142
-
at com.microsoft.azure.storage.blob.BlobOutputStream$1.call(BlobOutputStream.java:354)
143
-
... 7 more
144
-
145
-
**Cause**: HBase on HDInsight clusters default to a block size of 256 KB when writing to Azure storage. While it works for HBase APIs or REST APIs, it results in an error when using the `hadoop` or `hdfs dfs` command-line utilities.
146
-
147
-
**Resolution**: Use `fs.azure.write.request.size` to specify a larger block size. You can do this modification on a per-use basis by using the `-D` parameter. The following command is an example using this parameter with the `hadoop` command:
You can also increase the value of `fs.azure.write.request.size` globally by using Apache Ambari. The following steps can be used to change the value in the Ambari Web UI:
154
-
155
-
1. In your browser, go to the Ambari Web UI for your cluster. The URL is `https://CLUSTERNAME.azurehdinsight.net`, where `CLUSTERNAME` is the name of your cluster. When prompted, enter the admin name and password for the cluster.
156
-
2. From the left side of the screen, select **HDFS**, and then select the **Configs** tab.
157
-
3. In the **Filter...** field, enter `fs.azure.write.request.size`.
158
-
4. Change the value from 262144 (256 KB) to the new value. For example, 4194304 (4 MB).
159
-
160
-

161
-
162
-
For more information on using Ambari, see [Manage HDInsight clusters using the Apache Ambari Web UI](hdinsight-hadoop-manage-ambari.md).
163
-
164
119
## Next steps
165
120
166
121
Now that you understand how to get data into HDInsight, read the following articles to learn how to analyze it: