Commit 51c5dbc

Merge pull request #85407 from dagiro/ts_hdfs2
ts_hdfs2
2 parents a60c915 + 0d6bac1 commit 51c5dbc

4 files changed: +20 -323 lines changed

articles/hdinsight/hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md

Lines changed: 6 additions & 6 deletions
@@ -5,7 +5,7 @@ ms.service: hdinsight
 ms.topic: troubleshooting
 author: hrasheed-msft
 ms.author: hrasheed
-ms.date: 08/02/2019
+ms.date: 08/14/2019
 ---
 
 # Scenario: Local HDFS stuck in safe mode on Azure HDInsight cluster
@@ -14,9 +14,9 @@ This article describes troubleshooting steps and possible resolutions for issues
 
 ## Issue
 
-Local HDFS stuck in safe mode on Azure HDInsight cluster. You receive an error message similar as follows:
+The local Apache Hadoop Distributed File System (HDFS) is stuck in safe mode on the HDInsight cluster. You receive an error message similar as follows:
 
-```
+```output
 hdiuser@hn0-spark2:~$ hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
 17/04/05 16:20:52 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.mkdirs over hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net/10.0.0.22:8020. Not retrying because try once and fail.
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /temp. Name node is in safe mode.
@@ -28,7 +28,7 @@ mkdir: Cannot create directory /temp. Name node is in safe mode.
 
 ## Cause
 
-HDInsight cluster has been scaled down to very few nodes below or close to HDFS replication factor.
+The HDInsight cluster has been scaled down to very few nodes below, or number of nodes is close to the HDFS replication factor.
 
 ## Resolution
 
@@ -56,6 +56,6 @@ If you didn't see your problem or are unable to solve your issue, visit one of t
 
 * Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
 
-* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.
+* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
 
-* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, please review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
+* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).

articles/hdinsight/hbase/apache-troubleshoot-hbase.md

Lines changed: 4 additions & 159 deletions
@@ -5,8 +5,8 @@ ms.service: hdinsight
 author: hrasheed-msft
 ms.author: hrasheed
 ms.custom: hdinsightactive, seodec18
-ms.topic: conceptual
-ms.date: 12/06/2018
+ms.topic: troubleshooting
+ms.date: 08/14/2019
 ---
 
 # Troubleshoot Apache HBase by using Azure HDInsight
@@ -46,161 +46,6 @@ A potential cause for timeout issues when you use the `hbck` command might be th
 5. In the Ambari UI, restart the Active HBase Master service.
 6. Run the `hbase hbck -fixAssignments` command again.
 
-## <a name="how-do-i-force-disable-hdfs-safe-mode-in-a-cluster"></a>How do I force-disable HDFS safe mode in a cluster?
-
-### Issue
-
-The local Apache Hadoop Distributed File System (HDFS) is stuck in safe mode on the HDInsight cluster.
-
-### Detailed description
-
-This error might be caused by a failure when you run the following HDFS command:
-
-```apache
-hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
-```
-
-The error you might see when you try to run the command looks like this:
-
-```apache
-hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
-17/04/05 16:20:52 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.mkdirs over hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net/10.0.0.22:8020. Not retrying because try once and fail.
-org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /temp. Name node is in safe mode.
-It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
-at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359)
-at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010)
-at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102)
-at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
-at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
-at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
-at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
-at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
-at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
-at java.security.AccessController.doPrivileged(Native Method)
-at javax.security.auth.Subject.doAs(Subject.java:422)
-at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
-at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
-at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
-at org.apache.hadoop.ipc.Client.call(Client.java:1496)
-at org.apache.hadoop.ipc.Client.call(Client.java:1396)
-at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
-at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
-at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:603)
-at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-at java.lang.reflect.Method.invoke(Method.java:498)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
-at com.sun.proxy.$Proxy11.mkdirs(Unknown Source)
-at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3061)
-at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3031)
-at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1162)
-at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1158)
-at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
-at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1158)
-at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1150)
-at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1898)
-at org.apache.hadoop.fs.shell.Mkdir.processNonexistentPath(Mkdir.java:76)
-at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
-at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
-at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
-at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
-at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
-at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
-at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
-at org.apache.hadoop.fs.FsShell.main(FsShell.java:350)
-mkdir: Cannot create directory /temp. Name node is in safe mode.
-```
-
-### Probable cause
-
-The HDInsight cluster has been scaled down to a very few nodes. The number of nodes is below or close to the HDFS replication factor.
-
-### Resolution steps
-
-1. Get the status of the HDFS on the HDInsight cluster by running the following commands:
-
-```apache
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
-```
-
-```apache
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
-Safe mode is ON
-Configured Capacity: 3372381241344 (3.07 TB)
-Present Capacity: 3138625077248 (2.85 TB)
-DFS Remaining: 3102710317056 (2.82 TB)
-DFS Used: 35914760192 (33.45 GB)
-DFS Used%: 1.14%
-Under replicated blocks: 0
-Blocks with corrupt replicas: 0
-Missing blocks: 0
-Missing blocks (with replication factor 1): 0
-
--------------------------------------------------
-Live datanodes (8):
-
-Name: 10.0.0.17:30010 (10.0.0.17)
-Hostname: 10.0.0.17
-Decommission Status : Normal
-Configured Capacity: 421547655168 (392.60 GB)
-DFS Used: 5288128512 (4.92 GB)
-Non DFS Used: 29087272960 (27.09 GB)
-DFS Remaining: 387172253696 (360.58 GB)
-DFS Used%: 1.25%
-DFS Remaining%: 91.85%
-Configured Cache Capacity: 0 (0 B)
-Cache Used: 0 (0 B)
-Cache Remaining: 0 (0 B)
-Cache Used%: 100.00%
-Cache Remaining%: 0.00%
-Xceivers: 2
-Last contact: Wed Apr 05 16:22:00 UTC 2017
-...
-
-```
-2. You also can check the integrity of the HDFS on the HDInsight cluster by using the following commands:
-
-```apache
-hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
-```
-
-```apache
-Connecting to namenode via http://hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net:30070/fsck?ugi=hdiuser&path=%2F
-FSCK started by hdiuser (auth:SIMPLE) from /10.0.0.22 for path / at Wed Apr 05 16:40:28 UTC 2017
-....................................................................................................
-
-....................................................................................................
-..................Status: HEALTHY
-Total size: 9330539472 B
-Total dirs: 37
-Total files: 2618
-Total symlinks: 0 (Files currently being written: 2)
-Total blocks (validated): 2535 (avg. block size 3680686 B)
-Minimally replicated blocks: 2535 (100.0 %)
-Over-replicated blocks: 0 (0.0 %)
-Under-replicated blocks: 0 (0.0 %)
-Mis-replicated blocks: 0 (0.0 %)
-Default replication factor: 3
-Average block replication: 3.0
-Corrupt blocks: 0
-Missing replicas: 0 (0.0 %)
-Number of data-nodes: 8
-Number of racks: 1
-FSCK ended at Wed Apr 05 16:40:28 UTC 2017 in 187 milliseconds
-
-The filesystem under path '/' is HEALTHY
-```
-
-3. If you determine that there are no missing, corrupt, or under-replicated blocks, or that those blocks can be ignored, run the following command to take the name node out of safe mode:
-
-```apache
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
-```
-
-
 ## How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?
 
 ### Resolution steps
@@ -411,5 +256,5 @@ Here's what's happening behind the scenes:
 sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
 ```
 
-### See Also
-[Troubleshoot by Using Azure HDInsight](../../hdinsight/hdinsight-troubleshoot-guide.md)
+### See also
+[Troubleshoot by using Azure HDInsight](../../hdinsight/hdinsight-troubleshoot-guide.md)
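
Step 3 of the resolution steps removed in this file says to leave safe mode only when no blocks are missing, corrupt, or under-replicated. A minimal Python sketch of that check follows. It assumes the report text uses the same labels as the sample `hdfs dfsadmin -report` output in the diff above; the names `parse_report` and `can_leave_safe_mode` are hypothetical and not part of Hadoop or HDInsight.

```python
import re

# Counters from `hdfs dfsadmin -report` that should all be zero before safe
# mode is left manually. Labels are copied from the sample report in the diff.
BLOCK_COUNTERS = (
    "Under replicated blocks",
    "Blocks with corrupt replicas",
    "Missing blocks",
)


def parse_report(report_text):
    """Return (safe_mode_on, {label: count}) parsed from report text."""
    safe_mode_on = "Safe mode is ON" in report_text
    counters = {}
    for label in BLOCK_COUNTERS:
        # The colon must follow the label directly, so the line
        # "Missing blocks (with replication factor 1): 0" is not matched.
        pattern = rf"^{re.escape(label)}:[ \t]*(\d+)[ \t]*$"
        match = re.search(pattern, report_text, re.MULTILINE)
        if match:
            counters[label] = int(match.group(1))
    return safe_mode_on, counters


def can_leave_safe_mode(report_text):
    """True only if safe mode is on and every counter is present and zero."""
    safe_mode_on, counters = parse_report(report_text)
    all_present = len(counters) == len(BLOCK_COUNTERS)
    all_zero = all(value == 0 for value in counters.values())
    return safe_mode_on and all_present and all_zero
```

The input would be the captured output of `hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report`. The sketch is deliberately stricter than the removed text, which also allows leaving safe mode when the affected blocks can be ignored; that judgment is left to the operator.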

articles/hdinsight/hdinsight-troubleshoot-guide.md

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ ms.date: 05/29/2019
 
 | Apache workload | Top questions |
 |---|---|
-|![HBase](./media/hdinsight-troubleshoot-guide/HBASE.png)<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I force-disable HDFS safe mode on a cluster?](hbase/apache-troubleshoot-hbase.md#how-do-i-force-disable-hdfs-safe-mode-in-a-cluster)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/apache-troubleshoot-hbase.md#what-causes-a-master-server-to-fail-to-start)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
-|![HDFS](./media/hdinsight-troubleshoot-guide/HDFS.png)<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[How do I force-disable HDFS safe mode on a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-force-disable-hdfs-safe-mode-in-a-cluster)|
+|![HBase](./media/hdinsight-troubleshoot-guide/HBASE.png)<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/apache-troubleshoot-hbase.md#what-causes-a-master-server-to-fail-to-start)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
+|![HDFS](./media/hdinsight-troubleshoot-guide/HDFS.png)<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[Local HDFS stuck in safe mode on Azure HDInsight cluster](hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md)|
 |![Hive](./media/hdinsight-troubleshoot-guide/HIVE.png)<br>[Troubleshoot Apache Hive](hdinsight-troubleshoot-hive.md)|<br>[How do I export a Hive metastore and import it on another cluster?](hdinsight-troubleshoot-hive.md#how-do-i-export-a-hive-metastore-and-import-it-on-another-cluster)<br><br>[How do I locate Apache Hive logs on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-locate-hive-logs-on-a-cluster)<br><br>[How do I launch the Apache Hive shell with specific configurations on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-launch-the-hive-shell-with-specific-configurations-on-a-cluster)<br><br>[How do I analyze Apache Tez DAG data on a cluster-critical path?](hdinsight-troubleshoot-hive.md#how-do-i-analyze-tez-dag-data-on-a-cluster-critical-path)<br><br>[How do I download Apache Tez DAG data from a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-download-tez-dag-data-from-a-cluster)|
 |![Spark](./media/hdinsight-troubleshoot-guide/SPARK.png)<br>[Troubleshoot Apache Spark](hdinsight-troubleshoot-SPARK.md)|<br>[How do I configure an Apache Spark application by using Apache Ambari on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-ambari-on-clusters)<br><br>[How do I configure an Apache Spark application by using a Jupyter notebook on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-a-jupyter-notebook-on-clusters)<br><br>[How do I configure an Apache Spark application by using Apache Livy on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-livy-on-clusters)<br><br>[How do I configure an Apache Spark application by using spark-submit on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-spark-submit-on-clusters)<br><br>[How do I configure an Apache Spark application by using IntelliJ?](spark/apache-spark-intellij-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using Eclipse?](spark/apache-spark-eclipse-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using VSCode?](hdinsight-for-vscode.md)<br><br>[What causes an Apache Spark application OutOfMemoryError exception?](spark/apache-troubleshoot-spark.md#what-causes-an-apache-spark-application-outofmemoryerror-exception)|
 |![Storm](./media/hdinsight-troubleshoot-guide/STORM.png)<br>[Troubleshoot Apache Storm](hdinsight-troubleshoot-STORM.md)|<br>[How do I access the Apache Storm UI on a cluster?](storm/apache-troubleshoot-storm.md#how-do-i-access-the-storm-ui-on-a-cluster)<br><br>[How do I transfer Apache Storm event hub spout checkpoint information from one topology to another?](storm/apache-troubleshoot-storm.md#how-do-i-transfer-storm-event-hub-spout-checkpoint-information-from-one-topology-to-another)<br><br>[How do I locate Storm binaries on a cluster?](storm/apache-troubleshoot-storm.md#how-do-i-locate-storm-binaries-on-a-cluster)<br><br>[How do I determine the deployment topology of a Storm cluster?](storm/apache-troubleshoot-storm.md#how-do-i-determine-the-deployment-topology-of-a-storm-cluster)<br><br>[How do I locate Apache Storm event hub spout binaries for development?](storm/apache-troubleshoot-storm.md#how-do-i-locate-storm-event-hub-spout-binaries-for-development)|
