articles/hdinsight/high-availability-components.md
Lines changed: 14 additions & 13 deletions
@@ -10,7 +10,7 @@ ms.date: 10/09/2019
---
# High availability services supported by Azure HDInsight
-
In order to provide you with optimal levels of availability for your analytics components, HDInsight has developed a unique architecture for ensuring high availability (HA) of critical services. Some components of this architecture were developed by HDInsight to provide automatic failover. Other components are standard Apache components which are deployed to support specific services. This article explains the architecture of the HA service model in HDInsight, how HDInsight supports failover for HA services, and best practices to recover from other service interruptions.
+
To provide optimal availability for your analytics components, HDInsight uses a unique architecture for ensuring high availability (HA) of critical services. HDInsight developed some components of this architecture to provide automatic failover. Other components are standard Apache components that are deployed to support specific services. This article explains the architecture of the HA service model in HDInsight, how HDInsight supports failover for HA services, and best practices to recover from other service interruptions.
## High availability infrastructure
@@ -28,9 +28,9 @@ To achieve this level of dependability, HDInsight has developed a unique reliabi
- Slave high availability service
- Master high availability service
-
There are also other high availability services which are supported by open source Apache reliability services. These components are also present on HDInsight clusters, but are not developed by HDInsight:
+
Other high availability services are supported by open-source Apache reliability services. These components are also present on HDInsight clusters, but they aren't developed by HDInsight:
-
- HDFS NameNode
+
- Hadoop Distributed File System (HDFS) NameNode
- YARN Resource Manager
- HBase Master
@@ -58,16 +58,16 @@ To maintain the correct states of HA services and provide a fast failover, HDIns
### Apache ZooKeeper
-
Apache ZooKeeper is a high-performance coordination service for distributed applications. In production, ZooKeeper usually runs in replicated mode where a replicated group of ZooKeeper servers form a quorum. Each HDInsight cluster has three ZooKeeper nodes which allow three ZooKeeper servers to form a quorum. HDInsight has two ZooKeeper quorums running in parallel with each other. One quorum decides the active headnode in a cluster on which HDInsight HA services should run. Another quorum is used to coordinate HA services provided by Apache, as detailed in later sections.
+
Apache ZooKeeper is a high-performance coordination service for distributed applications. In production, ZooKeeper usually runs in replicated mode, where a replicated group of ZooKeeper servers forms a quorum. Each HDInsight cluster has three ZooKeeper nodes that allow three ZooKeeper servers to form a quorum. HDInsight has two ZooKeeper quorums running in parallel. One quorum decides the active headnode in a cluster, on which HDInsight HA services should run. The other quorum coordinates HA services provided by Apache, as detailed in later sections.
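The three-node quorum works because ZooKeeper's replicated mode only needs a strict majority of its servers to be reachable. The following Python sketch illustrates that majority arithmetic; the function names are hypothetical and aren't part of HDInsight or ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with `ensemble_size` servers."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a strict majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} servers: quorum of {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With three ZooKeeper nodes, a quorum needs two servers, so each quorum keeps working with one node down. A fourth server wouldn't raise that tolerance, which is why ensembles are usually sized with an odd number of servers.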
### Slave failover controller
-
The slave failover controller runs on every node in an HDInsight cluster. It is responsible for starting the Ambari agent and slave-ha-service on each node. It periodically queries the first ZooKeeper quorum about the active headnode. When the active and standby headnodes change, the slave failover controller performs the following:
+
The slave failover controller runs on every node in an HDInsight cluster. This controller is responsible for starting the Ambari agent and `slave-ha-service` on each node. It periodically queries the first ZooKeeper quorum for the active headnode. When the active and standby headnodes change, the slave failover controller performs the following steps:
1. Updates the host configuration file.
1. Restarts Ambari agent.
-
The slave-ha-service is responsible for stopping the HDInsight HA services (except Ambari server) on the standby headnode.
+
The `slave-ha-service` is responsible for stopping the HDInsight HA services (except the Ambari server) on the standby headnode.
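The polling behavior of the slave failover controller can be sketched as a small Python class. This is a hypothetical model, not HDInsight code: the three injected callables stand in for querying the first ZooKeeper quorum, updating the host configuration file, and restarting the Ambari agent, and all names are invented for illustration.

```python
from typing import Callable, List, Optional


class SlaveFailoverControllerSketch:
    """Hypothetical per-node controller that reacts to active-headnode changes."""

    def __init__(
        self,
        query_active_headnode: Callable[[], str],
        update_hosts_file: Callable[[str], None],
        restart_ambari_agent: Callable[[], None],
    ) -> None:
        self._query = query_active_headnode
        self._update_hosts = update_hosts_file
        self._restart_agent = restart_ambari_agent
        # None until the first poll, so the first answer always triggers a sync.
        self._known_active: Optional[str] = None

    def poll_once(self) -> bool:
        """Query once; return True if the active headnode changed and was handled."""
        active = self._query()
        if active == self._known_active:
            return False
        self._known_active = active
        self._update_hosts(active)  # step 1: update the host configuration file
        self._restart_agent()       # step 2: restart the Ambari agent
        return True


if __name__ == "__main__":
    answers = iter(["headnode0", "headnode0", "headnode1"])
    log: List[str] = []
    controller = SlaveFailoverControllerSketch(
        query_active_headnode=lambda: next(answers),
        update_hosts_file=lambda host: log.append(f"hosts->{host}"),
        restart_ambari_agent=lambda: log.append("restart-agent"),
    )
    print([controller.poll_once() for _ in range(3)])  # [True, False, True]
    print(log)
```

A real controller would call something like `poll_once` on a timer; the sketch leaves the scheduling out so that each poll is easy to observe.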
### Master failover controller
@@ -83,11 +83,11 @@ The master-ha-service only runs on the active headnode, it stops the HDInsight H
### The failover process
-
A health monitor, which is a daemon, runs along with each master failover controller to perform heartbeats with the headnodes. The headnode is regarded as an HA service in this scenario. The health monitor checks if the HA service is healthy and if it's ready to join in the leadership election. If yes, this HA service will compete in the election. If no, it will quit the election until it becomes ready again.
+
A health monitor runs along with each master failover controller to perform heartbeats for the headnodes. The headnode is regarded as an HA service in this scenario. The health monitor checks whether each HA service is healthy and ready to join the leadership election. If so, this HA service competes in the election. If not, it quits the election until it becomes ready again.
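To make the health gating concrete, here is a minimal in-memory Python sketch. It assumes a simplified model in which the earliest-joined healthy node wins whenever no leader exists; HDInsight coordinates the real election through ZooKeeper, and the class and node names are hypothetical.

```python
from typing import List, Optional


class HealthGatedElection:
    """Toy model of a leadership election gated by health checks (hypothetical).

    A node stays in the election only while its health check passes, and a
    leader that turns unhealthy gives up leadership immediately.
    """

    def __init__(self) -> None:
        self.leader: Optional[str] = None
        self.candidates: List[str] = []  # healthy nodes, in join order

    def report(self, node: str, healthy: bool) -> None:
        """Apply one health-monitor result for `node`."""
        if healthy:
            if node not in self.candidates:
                self.candidates.append(node)  # ready again: rejoin the election
        else:
            if node in self.candidates:
                self.candidates.remove(node)  # not ready: quit the election
            if self.leader == node:
                self.leader = None
        if self.leader is None and self.candidates:
            self.leader = self.candidates[0]


if __name__ == "__main__":
    election = HealthGatedElection()
    election.report("headnode0", healthy=True)
    election.report("headnode1", healthy=True)
    print(election.leader)  # headnode0
    election.report("headnode0", healthy=False)
    print(election.leader)  # headnode1
    election.report("headnode0", healthy=True)
    print(election.leader)  # headnode1: a recovered node doesn't preempt the leader
```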
-
For active headnode failures, such as headnode crash, or rebooting, if the standby headnode achieves the leadership and becomes active, its master failover controller will start all HDInsight HA services on it. It will also stop these services on the other headnode.
+
For active headnode failures, such as a headnode crash or reboot, if the standby headnode wins the leadership election and becomes active, its master failover controller starts all HDInsight HA services on it. The master failover controller also stops these services on the other headnode.
-
For HDInsight HA service failures, such as service down, unhealthy, and so on, master failover controller should be able to automatically restart or stop the services according to the headnode status. Users shouldn't manually start HDInsight HA services on both head nodes. Instead, allow automatic or manual failover to recover the problem.
+
For HDInsight HA service failures, such as a service that is down or unhealthy, the master failover controller should automatically restart or stop the services according to the headnode status. Users shouldn't manually start HDInsight HA services on both headnodes. Instead, allow automatic or manual failover to help the service recover.
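The failover behavior described in the preceding paragraphs amounts to a reconciliation rule that depends on whether a headnode is active. The Python sketch below expresses that rule under simplifying assumptions: each service is in one of three hypothetical states, and `reconcile` and the `svc-*` names are illustrative only, not HDInsight internals.

```python
from typing import Dict, List, Tuple

VALID_STATES = {"running", "stopped", "unhealthy"}  # hypothetical state model


def reconcile(is_active_headnode: bool, states: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (action, service) pairs a controller might issue on one headnode.

    On the active headnode, stopped services are started and unhealthy ones
    restarted. On the standby headnode, anything not stopped is stopped, so
    the HA services don't stay up on both headnodes at once.
    """
    actions: List[Tuple[str, str]] = []
    for service, state in sorted(states.items()):
        if state not in VALID_STATES:
            raise ValueError(f"unknown state {state!r} for {service}")
        if is_active_headnode:
            if state == "stopped":
                actions.append(("start", service))
            elif state == "unhealthy":
                actions.append(("restart", service))
        elif state != "stopped":
            actions.append(("stop", service))
    return actions


if __name__ == "__main__":
    observed = {"svc-a": "stopped", "svc-b": "unhealthy", "svc-c": "running"}
    print(reconcile(True, observed))   # [('start', 'svc-a'), ('restart', 'svc-b')]
    print(reconcile(False, observed))  # [('stop', 'svc-b'), ('stop', 'svc-c')]
```

After a failover, the newly active headnode sees its services as stopped and starts them, while the old headnode, now standby, stops whatever is still running there.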
### Inadvertent manual intervention
@@ -111,14 +111,15 @@ The second Zookeeper quorum is independent of the first quorum, so the active Na
### YARN Resource Manager
-
HDInsight clusters based on Apache Hadoop 2.4 or higher support YARN Resource Manager high availability. There are two resource managers, rm1 and rm2, running on headnode-0 and headnode-1, respectively. Like NameNode, Resource Manager is also configured for automatic failover. Another Resource Manager is automatically elected to be the active one when the active Resource Manager goes down or unresponsive.
+
HDInsight clusters based on Apache Hadoop 2.4 or higher support YARN Resource Manager high availability. There are two resource managers, rm1 and rm2, running on headnode-0 and headnode-1, respectively. Like NameNode, YARN Resource Manager is configured for automatic failover. When the active resource manager goes down or becomes unresponsive, the other resource manager is automatically elected to be active.
-
Resource Manager uses its embedded ActiveStandbyElector as a failure detector and leader elector. Unlike HDFS NodeManager, Resource Manager doesn't need a separate ZKFC daemon. The active Resource Manager writes its states into Apache Zookeeper.
-
Resource Manager high availability is independent from NameNode and HDInsight HA services, the active Resource Manager may not run on active headnode or headnode that the active NameNode is running. For more information about YARN Resource Manager high availability, see [Resource Manager High Availability](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html).
+
YARN Resource Manager uses its embedded ActiveStandbyElector as a failure detector and leader elector. Unlike HDFS NameNode, YARN Resource Manager doesn't need a separate ZKFC daemon. The active resource manager writes its state into Apache ZooKeeper.
+
+
YARN Resource Manager high availability is independent of NameNode and HDInsight HA services. The active resource manager might not run on the active headnode, or on the headnode where the active NameNode is running. For more information about YARN Resource Manager high availability, see [Resource Manager High Availability](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html).
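The following is a simplified Python model of lock-based leader election in the spirit of ActiveStandbyElector, assuming an in-memory stand-in for ZooKeeper ephemeral nodes. `FakeZooKeeper`, `try_become_active`, and the lock path are hypothetical; real implementations also watch the lock node and guard against a stale active instance (fencing), which this sketch leaves out.

```python
from typing import Dict, List, Optional


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper ephemeral nodes (not a real client)."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}  # znode path -> owning session id

    def create_ephemeral(self, path: str, session: str) -> bool:
        """Create `path` for `session`; return False if it already exists."""
        if path in self._owners:
            return False
        self._owners[path] = session
        return True

    def expire_session(self, session: str) -> None:
        """Delete every ephemeral node owned by an expired session."""
        expired: List[str] = [p for p, s in self._owners.items() if s == session]
        for path in expired:
            del self._owners[path]

    def owner(self, path: str) -> Optional[str]:
        return self._owners.get(path)


LOCK_PATH = "/example/active-lock"  # hypothetical znode path


def try_become_active(zk: FakeZooKeeper, rm_id: str) -> bool:
    """Whoever creates the lock node first is active; the others stay standby."""
    return zk.create_ephemeral(LOCK_PATH, rm_id)


if __name__ == "__main__":
    zk = FakeZooKeeper()
    print(try_become_active(zk, "rm1"))  # True: rm1 is active
    print(try_become_active(zk, "rm2"))  # False: rm2 stays standby
    zk.expire_session("rm1")             # rm1 stops responding; its session expires
    print(try_become_active(zk, "rm2"))  # True: rm2 takes over
    print(zk.owner(LOCK_PATH))           # rm2
```

The ephemeral node is what makes failover automatic in this model: when the active instance's session expires, its lock disappears and the standby can acquire it on its next attempt.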
### HBase Master
-
HDInsight HBase clusters support HBase Master high availability. Unlike other HA services, which run on headnodes, HBase Masters run on the three Zookeeper nodes, where one of them is the active master and the other two are standby. Like NameNode, HBase Master coordinates with Apache Zookeeper for leader election and does automatic failover when current active master has problems. There is only one active HBase Master at any time.
+
HDInsight HBase clusters support HBase Master high availability. Unlike other HA services, which run on headnodes, HBase Masters run on the three ZooKeeper nodes, where one of them is the active master and the other two are standby. Like NameNode, HBase Master coordinates with Apache ZooKeeper for leader election and does automatic failover when the current active master has problems. There is only one active HBase Master at any time.