Commit e74daf1

ts_hdfs2
1 parent 656c1b2 commit e74daf1

3 files changed (+8 -159 lines)

articles/hdinsight/hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md

Lines changed: 6 additions & 6 deletions
@@ -5,7 +5,7 @@ ms.service: hdinsight
 ms.topic: troubleshooting
 author: hrasheed-msft
 ms.author: hrasheed
-ms.date: 08/02/2019
+ms.date: 08/14/2019
 ---
 
 # Scenario: Local HDFS stuck in safe mode on Azure HDInsight cluster
@@ -14,9 +14,9 @@ This article describes troubleshooting steps and possible resolutions for issues
 
 ## Issue
 
-Local HDFS stuck in safe mode on Azure HDInsight cluster. You receive an error message similar as follows:
+The local Apache Hadoop Distributed File System (HDFS) is stuck in safe mode on the HDInsight cluster. You receive an error message similar as follows:
 
-```
+```output
 hdiuser@hn0-spark2:~$ hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
 17/04/05 16:20:52 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.mkdirs over hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net/10.0.0.22:8020. Not retrying because try once and fail.
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /temp. Name node is in safe mode.
@@ -28,7 +28,7 @@ mkdir: Cannot create directory /temp. Name node is in safe mode.
 
 ## Cause
 
-HDInsight cluster has been scaled down to very few nodes below or close to HDFS replication factor.
+The HDInsight cluster has been scaled down to very few nodes below, or number of nodes is close to the HDFS replication factor.
 
 ## Resolution
 
@@ -56,6 +56,6 @@ If you didn't see your problem or are unable to solve your issue, visit one of t
 
 * Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
 
-* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.
+* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
 
-* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, please review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
+* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
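The Cause text in this file attributes safe mode to a cluster scaled down to a node count below or close to the HDFS replication factor. A minimal bash sketch of that comparison, assuming the text printed by `hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report` (the command used in the resolution steps removed from hdinsight-troubleshoot-hdfs.md in this commit) is piped in on stdin. The function names, the `REPLICATION` default of 3 (taken from the `Default replication factor: 3` line in that text's fsck sample output), and the one-node `MARGIN` are hypothetical choices, not part of Hadoop or HDInsight.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Hadoop or HDInsight). Both read the
# text printed by `hdfs dfsadmin -report` on stdin.

# Print N from the first "Live datanodes (N):" line.
live_datanodes() {
  awk -F'[()]' '/^Live datanodes \(/ { print $2; exit }'
}

# Succeed (exit 0) when live DataNodes <= REPLICATION + MARGIN.
# REPLICATION defaults to 3 and MARGIN to 1; both are assumptions,
# so check the cluster's dfs.replication setting.
too_few_datanodes() {
  local live
  live=$(live_datanodes)
  if [ -z "$live" ]; then
    echo "no 'Live datanodes (N):' line found" >&2
    return 2
  fi
  [ "$live" -le $(( ${REPLICATION:-3} + ${MARGIN:-1} )) ]
}
```

For example, `hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report | too_few_datanodes && echo "Live DataNodes are at or near the replication factor"`. A successful result is a prompt to review the recent scale-down, not proof that it caused safe mode.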

articles/hdinsight/hdinsight-troubleshoot-guide.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ ms.date: 05/29/2019
 | Apache workload | Top questions |
 |---|---|
 |![HBase](./media/hdinsight-troubleshoot-guide/HBASE.png)<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/apache-troubleshoot-hbase.md#what-causes-a-master-server-to-fail-to-start)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
-|![HDFS](./media/hdinsight-troubleshoot-guide/HDFS.png)<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[How do I force-disable HDFS safe mode on a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-force-disable-hdfs-safe-mode-in-a-cluster)|
+|![HDFS](./media/hdinsight-troubleshoot-guide/HDFS.png)<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[Local HDFS stuck in safe mode on Azure HDInsight cluster](hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md)|
 |![Hive](./media/hdinsight-troubleshoot-guide/HIVE.png)<br>[Troubleshoot Apache Hive](hdinsight-troubleshoot-hive.md)|<br>[How do I export a Hive metastore and import it on another cluster?](hdinsight-troubleshoot-hive.md#how-do-i-export-a-hive-metastore-and-import-it-on-another-cluster)<br><br>[How do I locate Apache Hive logs on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-locate-hive-logs-on-a-cluster)<br><br>[How do I launch the Apache Hive shell with specific configurations on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-launch-the-hive-shell-with-specific-configurations-on-a-cluster)<br><br>[How do I analyze Apache Tez DAG data on a cluster-critical path?](hdinsight-troubleshoot-hive.md#how-do-i-analyze-tez-dag-data-on-a-cluster-critical-path)<br><br>[How do I download Apache Tez DAG data from a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-download-tez-dag-data-from-a-cluster)|
 |![Spark](./media/hdinsight-troubleshoot-guide/SPARK.png)<br>[Troubleshoot Apache Spark](hdinsight-troubleshoot-SPARK.md)|<br>[How do I configure an Apache Spark application by using Apache Ambari on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-ambari-on-clusters)<br><br>[How do I configure an Apache Spark application by using a Jupyter notebook on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-a-jupyter-notebook-on-clusters)<br><br>[How do I configure an Apache Spark application by using Apache Livy on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-livy-on-clusters)<br><br>[How do I configure an Apache Spark application by using spark-submit on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-spark-submit-on-clusters)<br><br>[How do I configure an Apache Spark application by using IntelliJ?](spark/apache-spark-intellij-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using Eclipse?](spark/apache-spark-eclipse-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using VSCode?](hdinsight-for-vscode.md)<br><br>[What causes an Apache Spark application OutOfMemoryError exception?](spark/apache-troubleshoot-spark.md#what-causes-an-apache-spark-application-outofmemoryerror-exception)|
 |![Storm](./media/hdinsight-troubleshoot-guide/STORM.png)<br>[Troubleshoot Apache Storm](hdinsight-troubleshoot-STORM.md)|<br>[How do I access the Apache Storm UI on a cluster?](storm/apache-troubleshoot-storm.md#how-do-i-access-the-storm-ui-on-a-cluster)<br><br>[How do I transfer Apache Storm event hub spout checkpoint information from one topology to another?](storm/apache-troubleshoot-storm.md#how-do-i-transfer-storm-event-hub-spout-checkpoint-information-from-one-topology-to-another)<br><br>[How do I locate Storm binaries on a cluster?](storm/apache-troubleshoot-storm.md#how-do-i-locate-storm-binaries-on-a-cluster)<br><br>[How do I determine the deployment topology of a Storm cluster?](storm/apache-troubleshoot-storm.md#how-do-i-determine-the-deployment-topology-of-a-storm-cluster)<br><br>[How do I locate Apache Storm event hub spout binaries for development?](storm/apache-troubleshoot-storm.md#how-do-i-locate-storm-event-hub-spout-binaries-for-development)|

articles/hdinsight/hdinsight-troubleshoot-hdfs.md

Lines changed: 1 addition & 152 deletions
@@ -5,7 +5,7 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 08/14/2019
+ms.date: 08/14/2019
 ms.custom: seodec18
 ---
 
@@ -66,157 +66,6 @@ Access the local HDFS from the command line and application code instead of by u
 hdfs://mycluster/tmp/hive/hive/a0be04ea-ae01-4cc4-b56d-f263baf2e314/inuse.lck
 ```
 
-
-## <a name="how-do-i-force-disable-hdfs-safe-mode-in-a-cluster"></a>How do I force-disable HDFS safe mode in a cluster?
-
-### Issue
-
-The local Apache Hadoop Distributed File System (HDFS) is stuck in safe mode on the HDInsight cluster. Failure occurs when you run the following HDFS command:
-
-```bash
-hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
-```
-
-You receive an error message similar as follows:
-
-```output
-hdfs dfs -D "fs.default.name=hdfs://mycluster/" -mkdir /temp
-17/04/05 16:20:52 WARN retry.RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.mkdirs over hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net/10.0.0.22:8020. Not retrying because try once and fail.
-org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /temp. Name node is in safe mode.
-It was turned on manually. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.
-at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1359)
-at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4010)
-at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1102)
-at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:630)
-at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
-at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
-at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
-at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
-at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
-at java.security.AccessController.doPrivileged(Native Method)
-at javax.security.auth.Subject.doAs(Subject.java:422)
-at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
-at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
-at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1552)
-at org.apache.hadoop.ipc.Client.call(Client.java:1496)
-at org.apache.hadoop.ipc.Client.call(Client.java:1396)
-at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
-at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
-at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:603)
-at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
-at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
-at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
-at java.lang.reflect.Method.invoke(Method.java:498)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:278)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:194)
-at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:176)
-at com.sun.proxy.$Proxy11.mkdirs(Unknown Source)
-at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3061)
-at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3031)
-at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1162)
-at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1158)
-at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
-at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1158)
-at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1150)
-at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1898)
-at org.apache.hadoop.fs.shell.Mkdir.processNonexistentPath(Mkdir.java:76)
-at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:273)
-at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
-at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119)
-at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
-at org.apache.hadoop.fs.FsShell.run(FsShell.java:297)
-at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
-at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
-at org.apache.hadoop.fs.FsShell.main(FsShell.java:350)
-mkdir: Cannot create directory /temp. Name node is in safe mode.
-```
-
-### Cause
-
-The HDInsight cluster has been scaled down to a very few nodes. The number of nodes is below or close to the HDFS replication factor.
-
-### Resolution
-
-1. Get the status of HDFS on the HDInsight cluster by using the following commands:
-
-```bash
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
-```
-
-```sample output
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -report
-Safe mode is ON
-Configured Capacity: 3372381241344 (3.07 TB)
-Present Capacity: 3138625077248 (2.85 TB)
-DFS Remaining: 3102710317056 (2.82 TB)
-DFS Used: 35914760192 (33.45 GB)
-DFS Used%: 1.14%
-Under replicated blocks: 0
-Blocks with corrupt replicas: 0
-Missing blocks: 0
-Missing blocks (with replication factor 1): 0
-
--------------------------------------------------
-Live datanodes (8):
-
-Name: 10.0.0.17:30010 (10.0.0.17)
-Hostname: 10.0.0.17
-Decommission Status : Normal
-Configured Capacity: 421547655168 (392.60 GB)
-DFS Used: 5288128512 (4.92 GB)
-Non DFS Used: 29087272960 (27.09 GB)
-DFS Remaining: 387172253696 (360.58 GB)
-DFS Used%: 1.25%
-DFS Remaining%: 91.85%
-Configured Cache Capacity: 0 (0 B)
-Cache Used: 0 (0 B)
-Cache Remaining: 0 (0 B)
-Cache Used%: 100.00%
-Cache Remaining%: 0.00%
-Xceivers: 2
-Last contact: Wed Apr 05 16:22:00 UTC 2017
-...
-```
-
-1. Check the integrity of HDFS on the HDInsight cluster by using the following commands:
-
-```bash
-hdfs fsck -D "fs.default.name=hdfs://mycluster/" /
-```
-
-```sample output
-Connecting to namenode via http://hn0-spark2.2oyzcdm4sfjuzjmj5dnmvscjpg.dx.internal.cloudapp.net:30070/fsck?ugi=hdiuser&path=%2F
-FSCK started by hdiuser (auth:SIMPLE) from /10.0.0.22 for path / at Wed Apr 05 16:40:28 UTC 2017
-....................................................................................................
-
-....................................................................................................
-..................Status: HEALTHY
-Total size: 9330539472 B
-Total dirs: 37
-Total files: 2618
-Total symlinks: 0 (Files currently being written: 2)
-Total blocks (validated): 2535 (avg. block size 3680686 B)
-Minimally replicated blocks: 2535 (100.0 %)
-Over-replicated blocks: 0 (0.0 %)
-Under-replicated blocks: 0 (0.0 %)
-Mis-replicated blocks: 0 (0.0 %)
-Default replication factor: 3
-Average block replication: 3.0
-Corrupt blocks: 0
-Missing replicas: 0 (0.0 %)
-Number of data-nodes: 8
-Number of racks: 1
-FSCK ended at Wed Apr 05 16:40:28 UTC 2017 in 187 milliseconds
-
-The filesystem under path '/' is HEALTHY
-```
-
-1. If you determine that there are no missing, corrupt, or under-replicated blocks, or that those blocks can be ignored, run the following command to take the name node out of safe mode:
-
-```apache
-hdfs dfsadmin -D "fs.default.name=hdfs://mycluster/" -safemode leave
-```
-
 ## Next steps
 
 If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
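The three resolution steps in the removed text above (run the report, run fsck, then run `-safemode leave` only when there are no missing, corrupt, or under-replicated blocks) can be sketched as a bash gate. This is a minimal sketch under stated assumptions: the counter labels are copied from the sample output above and may differ between Hadoop versions; a label that does not appear is skipped rather than treated as a failure; and the function names `counter`, `safe_to_leave`, and `leave_safe_mode_if_clean` are hypothetical. It automates only the clear-cut case; deciding that problem blocks "can be ignored" is left to a person.

```bash
#!/usr/bin/env bash
# Minimal sketch; function names are hypothetical. Labels come from the
# sample `hdfs dfsadmin -report` and `hdfs fsck` output and may vary by version.
FS_OPT="fs.default.name=hdfs://mycluster/"

# Print the first integer after "<label>:" on a line that starts with
# <label> (leading blanks allowed). Prints nothing if the label is absent.
counter() {
  awk -v label="$1" '
    { line = $0; sub(/^[ \t]+/, "", line) }
    index(line, label ":") == 1 {
      rest = substr(line, length(label) + 2)
      if (match(rest, /[0-9]+/)) { print substr(rest, RSTART, RLENGTH); exit }
    }'
}

# Succeed only if fsck says HEALTHY and every listed counter that appears is 0.
safe_to_leave() {
  local report="$1" fsck="$2" label value
  printf '%s\n' "$fsck" | grep -q "is HEALTHY" || return 1
  for label in "Under replicated blocks" "Blocks with corrupt replicas" \
      "Missing blocks" "Under-replicated blocks" "Corrupt blocks" \
      "Missing replicas"; do
    value=$(printf '%s\n%s\n' "$report" "$fsck" | counter "$label")
    if [ -n "$value" ] && [ "$value" -ne 0 ]; then
      return 1
    fi
  done
  return 0
}

# Run the three commands from the resolution steps; leave safe mode only
# when safe_to_leave succeeds. Not invoked automatically.
leave_safe_mode_if_clean() {
  local report fsck
  report=$(hdfs dfsadmin -D "$FS_OPT" -report) || return 1
  # fsck can exit non-zero on an unhealthy filesystem; its output is still parsed.
  fsck=$(hdfs fsck -D "$FS_OPT" /)
  if safe_to_leave "$report" "$fsck"; then
    hdfs dfsadmin -D "$FS_OPT" -safemode leave
  else
    echo "fsck is not HEALTHY or a block counter is non-zero; review before leaving safe mode." >&2
    return 1
  fi
}
```

After sourcing the file on a head node, call `leave_safe_mode_if_clean`. Keeping the parsing in separate functions lets the gate be checked against saved command output before it is pointed at a live cluster.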
