Commit bd1ddcc

Merge pull request #105502 from msft-tacox/patch-7
Update zookeeper-troubleshoot-quorum-fails.md
2 parents a2148d6 + 58a7e1f commit bd1ddcc

1 file changed (+11, -4 lines)

articles/hdinsight/spark/zookeeper-troubleshoot-quorum-fails.md

Lines changed: 11 additions & 4 deletions
@@ -15,24 +15,31 @@ This article describes troubleshooting steps and possible resolutions for issues
 
 ## Issue
 
-Apache ZooKeeper server is unhealthy, symptoms could include: both Resource Managers/Name Nodes are in standby mode, simple HDFS operations do not work, `zkFailoverController` is stopped and cannot be started, Yarn/Spark/Livy jobs fail due to Zookeeper errors. You may see an error message similar to:
+Apache ZooKeeper server is unhealthy, symptoms could include: both Resource Managers/Name Nodes are in standby mode, simple HDFS operations do not work, `zkFailoverController` is stopped and cannot be started, Yarn/Spark/Livy jobs fail due to Zookeeper errors. LLAP Daemons may also fail to start on Secure Spark or Interactive Hive clusters. You may see an error message similar to:
 
 ```
 19/06/19 08:27:08 ERROR ZooKeeperStateStore: Fatal Zookeeper error. Shutting down Livy server.
 19/06/19 08:27:08 INFO LivyServer: Shutting down Livy server.
 ```
 
+In the Zookeeper Server logs on any Zookeeper host at /var/log/zookeeper/zookeeper-zookeeper-server-\*.out, you may also see the following error:
+
+```
+2020-02-12 00:31:52,513 - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
+java.nio.channels.CancelledKeyException
+```
+
 ## Cause
 
 When the volume of snapshot files is large or snapshot files are corrupted, ZooKeeper server will fail to form a quorum, which causes ZooKeeper related services unhealthy. ZooKeeper server will not remove old snapshot files from its data directory, instead, it is a periodic task to be performed by users to maintain the healthiness of ZooKeeper. For more information, see [ZooKeeper Strengths and Limitations](https://zookeeper.apache.org/doc/r3.3.5/zookeeperAdmin.html#sc_strengthsAndLimitations).
 
 ## Resolution
 
-Check ZooKeeper data directory `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeepe/version-2` to find out if the snapshots file size is large. Take the following steps if large snapshots exist:
+Check ZooKeeper data directory `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeeper/version-2` to find out if the snapshots file size is large. Take the following steps if large snapshots exist:
 
-1. Back up snapshots in `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeepe/version-2`.
+1. Back up snapshots in `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeeper/version-2`.
 
-1. Clean up snapshots in `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeepe/version-2`.
+1. Clean up snapshots in `/hadoop/zookeeper/version-2` and `/hadoop/hdinsight-zookeeper/version-2`.
 
 1. Restart all ZooKeeper servers from Apache Ambari UI.
 
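
Not part of the commit: the "check snapshot size" and "back up snapshots" steps in the Resolution section could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions. It uses the two data directories named in the article, assumes snapshot files follow ZooKeeper's usual `snapshot.*` naming, and takes a backup destination that is a hypothetical path chosen by the operator. It deliberately does not delete anything; clean-up and the Ambari restart remain manual steps.

```python
import shutil
from pathlib import Path

# Data directories named in the article; adjust if your cluster differs.
ZK_DATA_DIRS = [
    "/hadoop/zookeeper/version-2",
    "/hadoop/hdinsight-zookeeper/version-2",
]


def snapshot_usage(data_dir):
    """Return (file_count, total_bytes) for snapshot.* files in data_dir."""
    files = [p for p in Path(data_dir).glob("snapshot.*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def backup_snapshots(data_dir, backup_dir):
    """Copy snapshot.* files into backup_dir and return the copied file names.

    backup_dir is a hypothetical destination picked by the operator.
    """
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in sorted(Path(data_dir).glob("snapshot.*")):
        if p.is_file():
            shutil.copy2(p, dest / p.name)  # copy2 also preserves timestamps
            copied.append(p.name)
    return copied


if __name__ == "__main__":
    # Report snapshot count and size per data directory (read-only).
    for d in ZK_DATA_DIRS:
        if Path(d).is_dir():
            count, size = snapshot_usage(d)
            print(f"{d}: {count} snapshot files, {size / 2**20:.1f} MiB")
        else:
            print(f"{d}: not found")
```

Run on each ZooKeeper host, the script only reports sizes; calling `backup_snapshots(data_dir, backup_dir)` before any clean-up keeps a copy to restore from if needed.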
