|
| 1 | +--- |
| 2 | +title: Reliability in Azure HDInsight |
| 3 | +description: Find out about reliability in Azure HDInsight |
| 4 | +author: apurbasroy |
| 5 | +ms.service: azure |
| 6 | +ms.topic: conceptual |
| 7 | +ms.date: 02/27/2023 |
| 8 | +ms.author: anaharris |
| 9 | +ms.custom: references_regions, subject-reliability |
| 10 | +CustomerIntent: As a cloud architect/engineer, I need general guidance on migrating HDInsight to using availability zones. |
| 11 | +--- |
| 12 | + |
| 13 | + |
| 14 | +# Reliability in Azure HDInsight |
| 15 | + |
| 16 | + |
| 17 | +This article describes reliability support in Azure HDInsight. It covers intra-regional resiliency with [availability zones](#availability-zone-support) and [cross-region resiliency with disaster recovery](#disaster-recovery-cross-region-failover). For a more detailed overview of reliability in Azure, see [Azure reliability](/azure/architecture/framework/resiliency/overview). |
| 18 | + |
| 19 | + |
| 20 | +## Availability zone support |
| 21 | + |
| 22 | +Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. Availability zones are designed so that if one zone is affected by a local failure, regional services, capacity, and high availability are supported by the remaining two zones. Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see [Availability zone service and regional support](availability-zones-service-support.md). |
| 23 | + |
| 24 | +An Azure HDInsight cluster consists of multiple nodes (head nodes, worker nodes, gateway nodes, and ZooKeeper nodes). In a region that supports availability zones, HDInsight automatically spreads the cluster nodes across all zones of the selected region by default. In this default deployment model, you have no control over which cluster nodes are provisioned in which availability zone. |
| 25 | + |
| 26 | +However, Azure HDInsight also supports both [zone-redundant and zonal deployment configurations](availability-zones-service-support.md#azure-services-with-availability-zone-support). |
| 27 | + |
| 28 | +- **Zonal**. Azure HDInsight cluster nodes are placed in a single zone that you select in the selected region. A zonal HDInsight cluster is isolated from any outages that occur in other zones. However, if an outage impacts the specific zone chosen for the HDInsight cluster, the cluster won't be available. This deployment model provides inexpensive, low-latency network connectivity within the cluster. Replicating this deployment model into multiple availability zones can provide a higher level of availability to protect against hardware failure. |
| 29 | + |
| 30 | +- **Zone-redundant**. If your application requires availability across multiple availability zones, you can create a primary HDInsight cluster in one availability zone and a secondary HDInsight cluster of minimum size in a different availability zone to save cost. With this design, if an availability zone other than the primary cluster's zone goes down, the primary cluster isn't impacted. If the primary cluster's availability zone goes down, you need to promote the secondary cluster in the other availability zone to primary, route the workload to this new primary cluster, and quickly scale up the cluster size to pick up the data processing. |
| 31 | + |
| 32 | + |
| 33 | +## Prerequisites |
| 34 | + |
| 35 | +- Availability zones are only supported for clusters created after June 15, 2021. Availability zone settings can't be updated after the cluster is created. You also can't update an existing, non-availability zone cluster to use availability zones. |
| 36 | + |
| 37 | +- Clusters must be created under a custom VNet. |
| 38 | + |
| 39 | +- You need to bring your own SQL database for the Ambari database and for any external metastore, such as the Hive metastore, so that you can configure these databases in the same availability zone. |
| 40 | + |
| 41 | +- Your HDInsight clusters must be created with the availability zone option in one of the following regions: |
| 42 | + |
| 43 | + - Australia East |
| 44 | + - Brazil South |
| 45 | + - Canada Central |
| 46 | + - Central US |
| 47 | + - East US |
| 48 | + - East US 2 |
| 49 | + - France Central |
| 50 | + - Germany West Central |
| 51 | + - Japan East |
| 52 | + - Korea Central |
| 53 | + - North Europe |
| 54 | + - Qatar Central |
| 55 | + - Southeast Asia |
| 56 | + - South Central US |
| 57 | + - UK South |
| 58 | + - US Gov Virginia |
| 59 | + - West Europe |
| 60 | + - West US 2 |
| 61 | + |
| 62 | + |
| 63 | +### SLA improvements |
| 64 | +<!-- Need info --> |
| 65 | + |
| 66 | +## Create an HDInsight cluster using availability zone |
| 67 | + |
| 68 | +You can use an Azure Resource Manager (ARM) template to deploy an HDInsight cluster into a specified availability zone. |
| 69 | + |
| 70 | +In the `resources` section of the template, add a `zones` property that specifies the availability zone you want the cluster to be deployed into. |
| 71 | + |
| 72 | +```json |
| 73 | +    "resources": [ |
| 74 | +        { |
| 75 | +            "type": "Microsoft.HDInsight/clusters", |
| 76 | +            "apiVersion": "2021-06-01", |
| 77 | +            "name": "[parameters('clusterName')]", |
| 78 | +            "location": "East US 2", |
| 79 | +            "zones": [ |
| 80 | +                "1" |
| 81 | +            ] |
| 82 | +        } |
| 83 | +    ] |
| 84 | +``` |
| 85 | + |
| 86 | +### Verify nodes within one availability zone |
| 87 | + |
| 88 | +When the HDInsight cluster is ready, you can check the cluster overview to see which availability zone its nodes are deployed in. |
| 89 | + |
| 90 | +:::image type="content" source="../hdinsight/media/hdinsight-use-availability-zones/cluster-availability-zone-info.png" alt-text="Screenshot that shows availability zone info in cluster overview" border="true"::: |
| 91 | + |
| 92 | +**Get API response**: |
| 93 | + |
| 94 | +```json |
| 95 | +    [ |
| 96 | +        { |
| 97 | +            "location": "East US 2", |
| 98 | +            "zones": [ |
| 99 | +                "1" |
| 100 | +            ] |
| 101 | +        } |
| 102 | +    ] |
| 103 | +``` |
| 104 | + |
| 105 | +### Scale up the cluster |
| 106 | + |
| 107 | +You can scale up an HDInsight cluster by adding more worker nodes. The newly added worker nodes are placed in the same availability zone as the cluster. |
| 108 | + |
| 109 | + |
| 110 | +### Zonal failover support |
| 111 | + |
| 112 | +{Need more info here.} |
| 113 | + |
| 114 | +Make sure to implement logic to easily route workload to the secondary cluster and regularly back up the configurations in Ambari DB. |
| 115 | + |
| 116 | +### Availability zone redeployment |
| 117 | + |
| 118 | +Azure HDInsight currently doesn't support in-place migration of existing clusters to availability zones. However, you can [recreate your cluster](#create-an-hdinsight-cluster-using-availability-zone) and choose availability zone support during cluster creation. |
| 119 | + |
| 120 | +### Zone down experience |
| 121 | + |
| 122 | +When an availability zone goes down: |
| 123 | + |
| 124 | + - You can't SSH to this cluster. |
| 125 | + - You can't delete, scale up, or scale down this cluster. |
| 126 | + - You can't submit jobs or see job history. |
| 127 | + - You can still submit a new cluster creation request in a different region. |
| 128 | + |
| 129 | + |
| 130 | +### Low-latency design |
| 131 | + |
| 132 | +{Any more info here?} |
| 133 | + |
| 134 | + |
| 135 | +## Disaster recovery: cross-region failover |
| 136 | + |
| 137 | +<!-- Need info on this --> |
| 138 | + |
| 139 | + |
| 140 | +## Next steps |
| 141 | + |
| 142 | +> [!div class="nextstepaction"] |
| 143 | +> [Reliability in Azure](availability-zones-overview.md) |