
Commit e41baed

Merge pull request #250098 from v-akarnase/hbase-update
Hbase-update
2 parents 6c59463 + 3832be3 commit e41baed


7 files changed

+565
-0
lines changed


articles/hdinsight/TOC.yml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -775,8 +775,12 @@ items:
775775
href: ./hbase/apache-hbase-backup-replication.md
776776
- name: Migrate Apache HBase cluster to newer version
777777
href: ./hbase/apache-hbase-migrate-new-version.md
778+
- name: Migrate an Apache HBase cluster to HDInsight 5.1
779+
href: ./hbase/apache-hbase-migrate-hdinsight-5-1.md
778780
- name: Migrate Apache HBase version and storage account
779781
href: ./hbase/apache-hbase-migrate-new-version-new-storage-account.md
782+
- name: Migrate Apache HBase to HDInsight 5.1 and a new storage account
783+
href: ./hbase/apache-hbase-migrate-hdinsight-5-1-new-storage-account.md
780784
- name: Apache Phoenix performance best practices
781785
href: ./hbase/apache-hbase-phoenix-performance.md
782786
- name: Create Apache HBase cluster in Azure VNet
Lines changed: 268 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,268 @@
1+
---
2+
title: Migrate an HBase cluster to HDInsight 5.1 and a new storage account - Azure HDInsight
3+
description: Learn how to migrate an Apache HBase cluster in Azure HDInsight to HDInsight 5.1 with a different Azure Storage account.
4+
ms.service: hdinsight
5+
ms.topic: how-to
6+
ms.custom: hdinsightactive
7+
ms.date: 06/30/2023
8+
---
9+
10+
# Migrate Apache HBase to HDInsight 5.1 and a new storage account
11+
12+
This article discusses how to update your Apache HBase cluster on Azure HDInsight to a newer version with a different Azure Storage account.
13+
14+
This article applies only if you need to use different Storage accounts for your source and destination clusters. To upgrade versions with the same Storage account for your source and destination clusters, see [Migrate Apache HBase to a new version](./apache-hbase-migrate-hdinsight-5-1.md).
15+
16+
The downtime while upgrading can be more than 20 minutes. This downtime is caused by the steps to flush all in-memory data and wait for all procedures to complete, plus the time to configure and restart the services on the new cluster. Your results vary, depending on the number of nodes, amount of data, and other variables.
17+
18+
19+
## Review Apache HBase compatibility
20+
21+
Before upgrading Apache HBase, ensure the HBase versions on the source and destination clusters are compatible. Review the HBase version compatibility matrix and release notes in the [HBase Reference Guide](https://hbase.apache.org/book.html#upgrading) to make sure your application is compatible with the new version.
22+
23+
Here's an example compatibility matrix. **Y** indicates compatibility and **N** indicates a potential incompatibility:
24+
25+
| Compatibility type | Major version| Minor version | Patch |
26+
| --- | --- | --- | --- |
27+
| Client-Server wire compatibility | N | Y | Y |
28+
| Server-Server compatibility | N | Y | Y |
29+
| File format compatibility | N | Y | Y |
30+
| Client API compatibility | N | Y | Y |
31+
| Client binary compatibility | N | N | Y |
32+
| **Server-side limited API compatibility** | | | |
33+
| Stable | N | Y | Y |
34+
| Evolving | N | N | Y |
35+
| Unstable | N | N | N |
36+
| Dependency compatibility | N | Y | Y |
37+
| Operational compatibility | N | N | Y |
38+
39+
The HBase version release notes should describe any breaking incompatibilities. Test your application in a cluster running the target version of HDInsight and HBase.
40+
41+
For more information about HDInsight versions and compatibility, see [Azure HDInsight versions](../hdinsight-component-versioning.md).
42+
43+
## Apache HBase cluster migration overview
44+
45+
To upgrade and migrate your Apache HBase cluster on Azure HDInsight to a new storage account, you complete the following basic steps. For detailed instructions, see the detailed steps and commands.
46+
47+
Prepare the source cluster:
48+
1. Stop data ingestion.
49+
1. Check cluster health.
50+
1. Stop replication if needed.
51+
1. Flush `memstore` data.
52+
1. Stop HBase.
53+
1. For clusters with accelerated writes, back up the Write Ahead Log (WAL) directory.
54+
55+
Prepare the destination cluster:
56+
1. Create the destination cluster.
57+
1. Stop HBase from Ambari.
58+
1. Clean Zookeeper data.
59+
1. Switch user to HBase.
60+
61+
Complete the migration:
62+
1. Clean the destination file system, migrate the data, and remove `/hbase/hbase.id`.
63+
1. Clean and migrate the WAL.
64+
1. Start all services from Ambari on the destination cluster.
65+
1. Verify HBase.
66+
1. Delete the source cluster.
67+
68+
## Detailed migration steps and commands
69+
70+
Use these detailed steps and commands to migrate your Apache HBase cluster with a new storage account.
71+
72+
### Prepare the source cluster
73+
74+
1. Stop ingestion to the source HBase cluster.
75+
76+
1. Use HBase hbck to verify cluster health.
77+
78+
1. Check the HBCK Report page on the HBase UI. A healthy cluster doesn't show any inconsistencies.
79+
80+
:::image type="content" source="./media/apache-hbase-migrate-new-version/verify-hbck-report.png" alt-text="Screenshot showing how to verify HBCK report." lightbox="./media/apache-hbase-migrate-new-version/verify-hbck-report.png":::
81+
82+
1. If any inconsistencies exist, fix them by using [hbase hbck2](/azure/hdinsight/hbase/how-to-use-hbck2-tool/).
83+
84+
1. Note the number of online regions on the source cluster, so that you can compare it with the number on the destination cluster after the migration.
85+
86+
:::image type="content" source="./media/apache-hbase-migrate-new-version/total-number-of-regions.png" alt-text="Screenshot showing count of number of regions." lightbox="./media/apache-hbase-migrate-new-version/total-number-of-regions.png":::
87+
88+
1. If replication is enabled on the cluster, stop it, and re-enable replication on the destination cluster after the migration. For more information, see the [HBase replication guide](/azure/hdinsight/hbase/apache-hbase-replication/).
89+
90+
1. Flush the source HBase cluster you're upgrading.
91+
92+
HBase writes incoming data to an in-memory store called a *`memstore`*. After the `memstore` reaches a certain size, HBase flushes it to disk for long-term storage in the cluster's storage account. Deleting the source cluster after an upgrade also deletes any data in the `memstore`s. To retain the data, manually flush each table's `memstore` to disk before upgrading.
93+
94+
You can flush the `memstore` data by running the [flush_all_tables.sh](https://github.com/Azure/hbase-utils/blob/master/scripts/flush_all_tables.sh) script from the [hbase-utils GitHub repository](https://github.com/Azure/hbase-utils/).
95+
96+
You can also flush the `memstore` data by running the following HBase shell command from inside the HDInsight cluster:
97+
98+
```bash
99+
hbase shell
100+
flush "<table-name>"
101+
```
102+
1. Wait for 15 minutes and verify that all the procedures are completed and that the masterProcWal files don't have any pending procedures.
103+
104+
1. Verify the Procedures page to confirm that there are no pending procedures.
105+
106+
:::image type="content" source="./media/apache-hbase-migrate-new-version/verify-master-process.png" alt-text="Screenshot showing how to verify master process." lightbox="./media/apache-hbase-migrate-new-version/verify-master-process.png":::
107+
1. Stop HBase.
108+
109+
1. Sign in to [Apache Ambari](https://ambari.apache.org/) on the source cluster at `https://<OLDCLUSTERNAME>.azurehdinsight.net`.
110+
1. Turn on maintenance mode for HBase.
111+
1. Stop only the HBase Masters first. Stop the standby masters first, and stop the active HBase Master last.
112+
113+
:::image type="content" source="./media/apache-hbase-migrate-new-version/stop-master-services.png" alt-text="Screenshot showing how to stop master services." lightbox="./media/apache-hbase-migrate-new-version/stop-master-services.png":::
114+
115+
1. Stop the HBase service, which stops the remaining servers.
116+
117+
> [!NOTE]
118+
> HBase 2.4.11 does not support some of the old Procedures.
119+
>
120+
> For more information on connecting to and using Ambari, see [Manage HDInsight clusters by using the Ambari Web UI](../hdinsight-hadoop-manage-ambari.md).
121+
>
122+
> Stopping HBase in the order described in the previous steps prevents HBase from creating new master proc WALs.
123+
124+
1. If your source HBase cluster doesn't have the [Accelerated Writes](apache-hbase-accelerated-writes.md) feature, skip this step. For source HBase clusters with Accelerated Writes, back up the WAL directory under HDFS by running the following commands from an SSH session on any source cluster Zookeeper node or worker node.
125+
126+
```bash
127+
hdfs dfs -mkdir /hbase-wal-backup
128+
hdfs dfs -cp hdfs://mycluster/hbasewal /hbase-wal-backup
129+
```
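
The flush step earlier in this procedure flushes one table per `flush` command. As a rough sketch, the helper below turns a list of table names into one `flush` command per table, which can then be piped into the HBase shell in non-interactive mode (`hbase shell -n`). The function name and the `tables.txt` file are hypothetical and only illustrate the idea; this isn't the `flush_all_tables.sh` script from the hbase-utils repository.

```bash
#!/usr/bin/env bash
# Sketch: read table names (one per line) from stdin and print one
# HBase shell flush command per table. Blank lines are skipped.
# The function name is an assumption made for this example.
build_flush_commands() {
  local table
  while IFS= read -r table; do
    [ -n "$table" ] || continue
    printf "flush '%s'\n" "$table"
  done
}

# Hypothetical usage on a cluster node (tables.txt is an assumed file
# that holds one table name per line):
#   build_flush_commands < tables.txt | hbase shell -n
```

Generating the commands as text first makes it easy to review them before anything runs against the cluster.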
130+
131+
### Prepare the destination cluster
132+
133+
1. In the Azure portal, [set up a new destination HDInsight cluster](../hdinsight-hadoop-provision-linux-clusters.md) that uses a different storage account than your source cluster.
134+
135+
1. Sign in to [Apache Ambari](https://ambari.apache.org/) on the new cluster at `https://<NEWCLUSTERNAME>.azurehdinsight.net`, and stop the HBase services.
136+
137+
1. Clean the Zookeeper data on the destination cluster by running the following commands in any Zookeeper node or worker node:
138+
139+
```bash
140+
hbase zkcli
141+
rmr /hbase-unsecure
142+
quit
143+
```
144+
145+
1. Switch the user to HBase by running `sudo su hbase`.
146+
147+
### Clean and migrate the file system and WAL
148+
149+
Run the following commands, depending on your source HDInsight version and whether the source and destination clusters have Accelerated Writes. In this article, the destination cluster is HDInsight version 5.1.
150+
151+
- [The source cluster is HDInsight 4.0 with Accelerated Writes, and the destination cluster has Accelerated Writes](#the-source-cluster-is-hdinsight-40-without-accelerated-writes-and-the-destination-cluster-has-accelerated-writes).
152+
- [The source cluster is HDInsight 4.0 without Accelerated Writes, and the destination cluster has Accelerated Writes](#the-source-cluster-is-hdinsight-40-without-accelerated-writes-and-the-destination-cluster-has-accelerated-writes).
153+
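- [The source cluster is HDInsight 4.0 without Accelerated Writes, and the destination cluster doesn't have Accelerated Writes](#the-source-cluster-is-hdinsight-40-without-accelerated-writes-and-the-destination-cluster-doesnt-have-accelerated-writes).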
154+
155+
The `<container-endpoint-url>` for the storage account is `https://<storageaccount>.blob.core.windows.net/<container-name>`. Pass the SAS token for the storage account at the very end of the URL.
156+
157+
- The `<container-fullpath>` for storage type WASB is `wasbs://<container-name>@<storageaccount>.blob.core.windows.net`.
158+
- The `<container-fullpath>` for storage type Azure Data Lake Storage Gen2 is `abfs://<container-name>@<storageaccount>.dfs.core.windows.net`.
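
The two `<container-fullpath>` formats above differ only in scheme and endpoint suffix. The following sketch builds either form from its parts. The function name and the `wasb` and `adls2` keywords are assumptions made for this example, not HDInsight settings.

```bash
#!/usr/bin/env bash
# Sketch: print the <container-fullpath> for a storage type.
# Arguments: storage type (wasb or adls2, names chosen for this example),
# container name, storage account name.
container_fullpath() {
  local storage_type="$1" container="$2" account="$3"
  case "$storage_type" in
    wasb)
      printf 'wasbs://%s@%s.blob.core.windows.net\n' "$container" "$account"
      ;;
    adls2)
      printf 'abfs://%s@%s.dfs.core.windows.net\n' "$container" "$account"
      ;;
    *)
      echo "unknown storage type: $storage_type" >&2
      return 1
      ;;
  esac
}

# Hypothetical usage with the copy commands later in this article:
#   hadoop distcp "$(container_fullpath wasb <container-name> <storageaccount>)/hbase" /
```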
159+
160+
#### Copy commands
161+
162+
The HDFS copy command is `hdfs dfs <copy properties starting with -D> -cp`.
163+
164+
Use `hadoop distcp` for better performance when copying files not in a page blob: `hadoop distcp <copy properties starting with -D>`
165+
166+
To pass the key of the storage account, use:
167+
- `-Dfs.azure.account.key.<storageaccount>.blob.core.windows.net='<storage account key>'`
168+
- `-Dfs.azure.account.keyprovider.<storageaccount>.blob.core.windows.net=org.apache.hadoop.fs.azure.SimpleKeyProvider`
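
As a minimal sketch of how the two key properties above fit into a copy command, the helper below prints both `-D` options for a given storage account. The function name is hypothetical; the property names are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: print the two -D properties that pass a storage account key,
# one per line. The function name is an assumption made for this example.
storage_key_opts() {
  local account="$1" key="$2"
  local host="${account}.blob.core.windows.net"
  printf -- '-Dfs.azure.account.key.%s=%s\n' "$host" "$key"
  printf -- '-Dfs.azure.account.keyprovider.%s=%s\n' "$host" \
    'org.apache.hadoop.fs.azure.SimpleKeyProvider'
}

# Hypothetical usage: collect the options in an array so that each one
# stays a single argument, then pass them to distcp.
#   mapfile -t key_opts < <(storage_key_opts <storageaccount> '<storage account key>')
#   hadoop distcp "${key_opts[@]}" <source-container-fullpath>/hbase /
```

Keeping the options in an array avoids word-splitting problems if the command is assembled in a script.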
169+
170+
You can also use [AzCopy](../../storage/common/storage-ref-azcopy.md) for better performance when copying HBase data files.
171+
172+
1. Run the AzCopy command:
173+
174+
```bash
175+
azcopy cp "<source-container-endpoint-url>/hbase" "<target-container-endpoint-url>" --recursive
176+
```
177+
178+
1. If the destination storage account is Azure Blob storage, do this step after the copy. If the destination storage account is Data Lake Storage Gen2, skip this step.
179+
180+
The Hadoop WASB driver uses special zero-sized blobs that correspond to every directory. AzCopy skips these files when doing the copy. Some WASB operations use these blobs, so you must create them in the destination cluster. To create the blobs, run the following Hadoop command from any node in the destination cluster:
181+
182+
```bash
183+
sudo -u hbase hadoop fs -chmod -R 0755 /hbase
184+
```
185+
186+
You can download AzCopy from [Get started with AzCopy](../../storage/common/storage-use-azcopy-v10.md). For more information about using AzCopy, see [azcopy copy](../../storage/common/storage-ref-azcopy-copy.md).
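
The AzCopy command above expects each `<container-endpoint-url>` to end with the SAS token. The following sketch assembles such a URL from its parts, so that the token always lands after the path. The function name is hypothetical, and the sketch assumes the token may be supplied with or without a leading `?`.

```bash
#!/usr/bin/env bash
# Sketch: build a container endpoint URL with the SAS token at the end.
# Arguments: storage account, container name, path inside the container
# (for example /hbase, or an empty string), SAS token.
# The function name is an assumption made for this example.
container_endpoint_url() {
  local account="$1" container="$2" path="$3"
  local sas="${4#\?}"   # drop a leading "?" if the token includes one
  printf 'https://%s.blob.core.windows.net/%s%s?%s\n' \
    "$account" "$container" "$path" "$sas"
}

# Hypothetical usage:
#   azcopy cp "$(container_endpoint_url <source-account> <container-name> /hbase '<source-sas>')" \
#             "$(container_endpoint_url <target-account> <container-name> '' '<target-sas>')" --recursive
```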
187+
188+
189+
#### The source cluster is HDInsight 4.0 without Accelerated Writes, and the destination cluster has Accelerated Writes
190+
191+
1. To clean the file system and migrate data, run the following commands:
192+
193+
```bash
194+
hdfs dfs -rm -r /hbase
195+
hadoop distcp <source-container-fullpath>/hbase /
196+
```
197+
198+
1. Remove `hbase.id` by running `hdfs dfs -rm /hbase/hbase.id`
199+
200+
1. To clean and migrate the WAL, run the following commands:
201+
202+
```bash
203+
hdfs dfs -rm -r hdfs://<destination-cluster>/hbasewal
204+
hdfs dfs -Dfs.azure.page.blob.dir="/hbase-wals" -cp <source-container-fullpath>/hbase-wals hdfs://<destination-cluster>/hbasewal
205+
```
206+
207+
#### The source cluster is HDInsight 4.0 without Accelerated Writes, and the destination cluster doesn't have Accelerated Writes
208+
209+
1. To clean the file system and migrate data, run the following commands:
210+
211+
```bash
212+
hdfs dfs -rm -r /hbase
213+
hadoop distcp <source-container-fullpath>/hbase /
214+
```
215+
216+
1. Remove `hbase.id` by running `hdfs dfs -rm /hbase/hbase.id`
217+
218+
1. To clean and migrate the WAL, run the following commands:
219+
220+
```bash
221+
hdfs dfs -rm -r /hbase-wals/*
222+
hdfs dfs -Dfs.azure.page.blob.dir="/hbase-wals" -cp <source-container-fullpath>/hbase-wals /
223+
```
224+
225+
### Complete the migration
226+
227+
1. On the destination cluster, save your changes and restart all required services as indicated by Ambari.
228+
229+
1. Point your application to the destination cluster.
230+
231+
> [!NOTE]
232+
> The static DNS name for your application changes when you upgrade. Rather than hard-coding this DNS name, you can configure a CNAME in your domain name's DNS settings that points to the cluster's name. Another option is to use a configuration file for your application that you can update without redeploying.
233+
234+
1. Start the ingestion.
235+
236+
1. Verify HBase consistency and simple Data Definition Language (DDL) and Data Manipulation Language (DML) operations.
237+
238+
1. If the destination cluster is satisfactory, delete the source cluster.
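
For the DDL and DML verification step above, a minimal smoke test could create a scratch table, write and read one row, and then drop the table. The sketch below only prints the HBase shell commands. The function name, the default table name `migration_smoke_test`, and the column family `cf` are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Sketch: print HBase shell commands for a simple DDL/DML smoke test.
# Optional argument: scratch table name (an assumed default is used
# when it's omitted). The table is dropped at the end.
smoke_test_commands() {
  local table="${1:-migration_smoke_test}"
  cat <<EOF
create '${table}', 'cf'
put '${table}', 'row1', 'cf:q', 'value1'
get '${table}', 'row1'
disable '${table}'
drop '${table}'
EOF
}

# Hypothetical usage on a destination cluster node:
#   smoke_test_commands | hbase shell -n
```

Use a table name that doesn't collide with an existing table, because the last two commands disable and drop it.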
239+
240+
## Troubleshooting
241+
242+
### Use case 1:
243+
The HBase masters and region servers are up, but regions are stuck in transition, or only one region (the `hbase:meta` region) is assigned and the other regions are still waiting to be assigned.
244+
245+
**Solution:**
246+
247+
1. Use SSH to connect to any ZooKeeper node of the original cluster. If this is an ESP cluster, run `kinit -k -t /etc/security/keytabs/hbase.service.keytab hbase/<zk FQDN>`.
248+
1. Run `echo "scan 'hbase:meta'" | hbase shell > meta.out` to read `hbase:meta` into a file.
249+
1. Run `grep "info:sn" meta.out | awk '{print $4}' | sort | uniq` to get all region server (RS) instance names where the regions were present in the old cluster. The output should look like `value=<wn FQDN>,16020,........`.
250+
1. Create a dummy WAL directory with that `wn` value.
251+
252+
If the cluster is an Accelerated Writes cluster:
253+
```
254+
hdfs dfs -mkdir hdfs://mycluster/hbasewal/WALs/<wn FQDN>,16020,.........
255+
```
256+
If the cluster is a non-Accelerated Writes cluster:
257+
```
258+
hdfs dfs -mkdir /hbase-wals/WALs/<wn FQDN>,16020,.........
259+
```
260+
1. Restart the active `HMaster`.
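
The solution steps above can be combined into one pass over `meta.out`. The following sketch reuses the `grep` and `awk` pipeline from the steps, strips the `value=` prefix, and prints one `hdfs dfs -mkdir` command per distinct region server. The function name is hypothetical, and the sketch assumes the `info:sn` lines have the four-field layout that the steps rely on.

```bash
#!/usr/bin/env bash
# Sketch: read a saved hbase:meta scan from stdin and print one
# "hdfs dfs -mkdir" command per distinct region server name.
# Argument: WAL root, for example hdfs://mycluster/hbasewal or /hbase-wals.
# The function name is an assumption made for this example.
dummy_wal_mkdir_commands() {
  local wal_root="$1" server
  grep 'info:sn' \
    | awk '{print $4}' \
    | sed 's/^value=//' \
    | sort -u \
    | while IFS= read -r server; do
        printf 'hdfs dfs -mkdir %s/WALs/%s\n' "$wal_root" "$server"
      done
}

# Hypothetical usage: review the printed commands, then run them.
#   dummy_wal_mkdir_commands /hbase-wals < meta.out
```

Printing the commands instead of running them directly leaves room to check the server names before any directory is created.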
261+
## Next steps
262+
263+
To learn more about [Apache HBase](https://hbase.apache.org/) and upgrading HDInsight clusters, see the following articles:
264+
265+
- [Upgrade an HDInsight cluster to a newer version](../hdinsight-upgrade-cluster.md)
266+
- [Monitor and manage Azure HDInsight using the Apache Ambari Web UI](../hdinsight-hadoop-manage-ambari.md)
267+
- [Azure HDInsight versions](../hdinsight-component-versioning.md)
268+
- [Optimize Apache HBase](../optimize-hbase-ambari.md)
