---
title: Migrate Azure HDInsight 3.6 Hive workloads to HDInsight 4.0
description: Learn how to migrate Apache Hive workloads on HDInsight 3.6 to HDInsight 4.0.
author: msft-tacox
ms.author: tacox
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.date: 11/13/2019
---

# Migrate Azure HDInsight 3.6 Hive workloads to HDInsight 4.0

This document shows you how to migrate Apache Hive and LLAP workloads on HDInsight 3.6 to HDInsight 4.0. HDInsight 4.0 runs Hive 3, which adds Hive and LLAP features such as materialized views and query result caching that aren't available on HDInsight 3.6.

This article covers the following subjects:

* Preservation of Hive security policies across HDInsight versions
* Query execution and debugging from HDInsight 3.6 to HDInsight 4.0

## Migrate Apache Hive metadata to HDInsight 4.0

One advantage of Hive is the ability to export metadata to an external database (referred to as the Hive Metastore). The **Hive Metastore** is responsible for storing table metadata, including the table storage location, column names, and table index information. The metastore database schema differs between Hive versions. The recommended way to upgrade the Hive metastore safely is to create a copy and upgrade the copy instead of the current production environment.

## Copy metastore

HDInsight 3.6 and HDInsight 4.0 require different metastore schemas and can't share a single metastore.

### External metastore

Create a new copy of your external metastore. One safe and easy way to copy an external metastore is to [restore the database](../../sql-database/sql-database-recovery-using-backups.md#point-in-time-restore) under a different name by using the SQL Database restore function. See [Use external metadata stores in Azure HDInsight](../hdinsight-use-external-metadata-stores.md) to learn more about attaching an external metastore to an HDInsight cluster.

### Internal metastore

If you're using the internal metastore, you can use queries to export object definitions in the Hive metastore and import them into a new database.

1. Connect to the HDInsight cluster by using a [Secure Shell (SSH) client](../hdinsight-hadoop-linux-use-ssh-unix.md).

1. From your open SSH session, run the following command. It uses the [Beeline client](../hadoop/apache-hadoop-use-hive-beeline.md) to connect to HiveServer2 and writes the definitions of all databases and tables to a file:

    ```bash
    # For each database, record CREATE DATABASE/USE statements; for each table, record its DDL.
    for d in $(beeline -u "jdbc:hive2://localhost:10001/;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show databases;"); do
        echo "create database $d; use $d;" >> alltables.sql
        for t in $(beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show tables;"); do
            ddl=$(beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table $t;")
            echo "$ddl ;" >> alltables.sql
            # Partitioned tables get MSCK REPAIR TABLE so their partitions are re-registered after import.
            echo "$ddl" | grep -q "PARTITIONED\s*BY" && echo "MSCK REPAIR TABLE $t ;" >> alltables.sql
        done
    done
    ```

    This command generates a file named **alltables.sql**. Because the `default` database can't be deleted or re-created, remove the `create database default;` statement from **alltables.sql**.

1. Exit your SSH session. Then enter an `scp` command to download **alltables.sql** to your local machine.

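
If you'd rather not edit **alltables.sql** by hand, the following Bash function is a minimal sketch of that cleanup. It assumes the file was produced by the export loop above, which writes each database as a single `create database NAME; use NAME;` line. The function name `strip_default_db` is hypothetical. It rewrites the file through a temporary copy instead of using `sed -i`, whose syntax differs between GNU and BSD `sed`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop "create database default; " from the export file,
# keeping the following "use default;" so later DDL still targets the default database.
strip_default_db() {
    local file="$1"
    # Only a line that starts with exactly this statement changes; names like "default_x" don't match.
    sed 's/^create database default; //' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage: strip_default_db alltables.sql
```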
## Upgrade metastore

Once the metastore **copy** is complete, run a schema upgrade script as a [script action](../hdinsight-hadoop-customize-cluster-linux.md) on the existing HDInsight 3.6 cluster. The script upgrades the new metastore to the Hive 3 schema so that the database can be attached as an HDInsight 4.0 metastore.

Use the values in the following table. Replace `SQLSERVERNAME DATABASENAME USERNAME PASSWORD` with the appropriate values for the **copied** Hive metastore, separated by spaces. Don't include ".database.windows.net" when specifying the SQL server name.

|Property | Value |
|---|---|
|Bash script URI|`https://hdiconfigactions.blob.core.windows.net/hivemetastoreschemaupgrade/launch-schema-upgrade.sh`|
|Node type(s)|Head nodes|
|Parameters|SQLSERVERNAME DATABASENAME USERNAME PASSWORD|

> [!Warning]
> The upgrade, which converts the HDInsight 3.6 metadata schema to the HDInsight 4.0 schema, can't be reversed.

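
Server names copied from the Azure portal usually carry the `.database.windows.net` suffix. As a minimal sketch, the hypothetical Bash function below strips that suffix and prints the four values in the `SQLSERVERNAME DATABASENAME USERNAME PASSWORD` order described above. It only formats the string and doesn't run the script action. Because the values are space-separated, it assumes none of them, including the password, contains a space.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build "SQLSERVERNAME DATABASENAME USERNAME PASSWORD"
# for the metastore schema upgrade script action.
metastore_upgrade_params() {
    local server="$1" database="$2" user="$3" password="$4"
    # Remove the ".database.windows.net" suffix if present; the script expects the bare server name.
    server="${server%.database.windows.net}"
    printf '%s %s %s %s\n' "$server" "$database" "$user" "$password"
}

# Usage: metastore_upgrade_params myserver.database.windows.net hivemeta-copy sqladmin 'MyPassword'
```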

You can verify the upgrade by running the following SQL query against the database:

```sql
select * from dbo.version
```

## Migrate Hive tables to HDInsight 4.0

After you complete the previous steps to migrate the Hive metastore to HDInsight 4.0, the tables and databases recorded in the metastore become visible from the HDInsight 4.0 cluster when you run `show tables` or `show databases`. See [Query execution across HDInsight versions](#query-execution-across-hdinsight-versions) for information on query execution in HDInsight 4.0 clusters.

The actual data from the tables, however, isn't accessible until the cluster has access to the necessary storage accounts. To make sure your HDInsight 4.0 cluster can access the same data as your old HDInsight 3.6 cluster, complete the following steps:

1. Determine the Azure storage account of your table or database, for example by using `describe formatted`.

1. If your HDInsight 4.0 cluster is already running, attach the Azure storage account to the cluster via Ambari. If you haven't yet created the HDInsight 4.0 cluster, make sure the Azure storage account is specified as either the primary or a secondary cluster storage account. For more information about adding storage accounts to HDInsight clusters, see [Add additional storage accounts to HDInsight](../hdinsight-hadoop-add-storage.md).

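
In the output of `describe formatted` for a table, the `Location` row holds the table's storage URI. Examples are `wasbs://container@account.blob.core.windows.net/...` for Azure Blob storage and `abfs://filesystem@account.dfs.core.windows.net/...` for Azure Data Lake Storage Gen2. The hypothetical Bash function below is a minimal sketch that extracts the account name from such a URI. It assumes the `scheme://container@account.host/path` form.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the storage account name from a Hive table location such as
#   wasbs://container@account.blob.core.windows.net/hive/warehouse/mytable
storage_account_from_location() {
    local location="$1"
    local authority="${location#*://}"   # container@account.blob.core.windows.net/hive/...
    authority="${authority%%/*}"         # container@account.blob.core.windows.net
    local host="${authority#*@}"         # account.blob.core.windows.net
    printf '%s\n' "${host%%.*}"          # account
}

# Usage: storage_account_from_location "wasbs://data@mystorage.blob.core.windows.net/hive/warehouse/mytable"
```

Run this for each location you find, then make sure every account it reports is attached to the HDInsight 4.0 cluster.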
## Deploy new HDInsight 4.0 and connect to the new metastore

After the schema upgrade is complete, deploy a new HDInsight 4.0 cluster and connect it to the upgraded metastore. If you've already deployed an HDInsight 4.0 cluster, use Ambari to configure it to connect to the upgraded metastore.

## Run schema migration script from HDInsight 4.0

Tables are treated differently in HDInsight 3.6 and HDInsight 4.0. For this reason, you can't share the same tables across clusters of different versions. If you want to use HDInsight 3.6 at the same time as HDInsight 4.0, you must have separate copies of the data for each version.

Your Hive workload may include a mix of ACID and non-ACID tables. One key difference between Hive on HDInsight 3.6 (Hive 2) and Hive on HDInsight 4.0 (Hive 3) is ACID-compliance for tables. In HDInsight 3.6, enabling Hive ACID-compliance requires additional configuration, but in HDInsight 4.0 tables are ACID-compliant by default. The only action required before migration is to run a major compaction against each ACID table on the 3.6 cluster. From the Hive view or from Beeline, run the following query:

```sql
ALTER TABLE myacidtable COMPACT 'major';
```

This compaction is required because HDInsight 3.6 and HDInsight 4.0 handle ACID delta files differently. A major compaction rewrites the deltas into a new base, which gives you a clean slate that guarantees consistency. Section 4 of the [Hive migration documentation](https://docs.hortonworks.com/HDPDocuments/Ambari-2.7.3.0/bk_ambari-upgrade-major/content/prepare_hive_for_upgrade.html) contains guidance for bulk compaction of HDInsight 3.6 ACID tables.
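
For a handful of tables, you can generate the compaction statements instead of typing them one at a time. The following minimal Bash sketch is not the bulk-compaction procedure from the linked documentation. The hypothetical `compaction_statements` function reads `database.table` names, one per line, and prints an `ALTER TABLE ... COMPACT 'major';` statement for each. You could save the output to a file and run it with Beeline's `-f` option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "database.table" names (one per line) on stdin and
# print a major-compaction statement for each, skipping blank lines and "#" comments.
compaction_statements() {
    local name
    while IFS= read -r name || [ -n "$name" ]; do
        # Strip whitespace, including stray carriage returns from files edited on Windows.
        name="$(printf '%s' "$name" | tr -d '[:space:]')"
        case "$name" in
            ''|'#'*) continue ;;
        esac
        printf "ALTER TABLE %s COMPACT 'major';\n" "$name"
    done
}

# Usage: compaction_statements < acid_tables.txt > compact_all.sql
```

Compaction requests are queued and run in the background, so check `SHOW COMPACTIONS` for completion before you migrate. This sketch also doesn't handle partitioned tables, which take a `PARTITION (...)` clause to name the partition to compact.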

Once you've completed the metastore migration and compaction steps, you can migrate the actual warehouse. After you complete the Hive warehouse migration, the HDInsight 4.0 warehouse will have the following properties:

* External tables in HDInsight 3.6 will be external tables in HDInsight 4.0.
* Non-transactional managed tables in HDInsight 3.6 will be external tables in HDInsight 4.0.
* Transactional managed tables in HDInsight 3.6 will be managed tables in HDInsight 4.0.

You may need to adjust the properties of your warehouse before executing the migration. For example, if you expect a table to be accessed by a third party (such as an HDInsight 3.6 cluster), that table must be external once the migration is complete. In HDInsight 4.0, all managed tables are transactional. Therefore, managed tables in HDInsight 4.0 should only be accessed by HDInsight 4.0 clusters.
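
The three mapping rules listed above can be written as a small planning aid. The hypothetical Bash function below is a minimal sketch. It assumes you read the `Table Type` row and the `transactional` table property from `describe formatted` output on the 3.6 cluster, and it prints the table type each table is expected to have after migration. It encodes only those three rules.

```bash
#!/usr/bin/env bash
# Hypothetical helper encoding the HDInsight 3.6 -> 4.0 table mapping listed above.
# $1: Table Type from `describe formatted` (MANAGED_TABLE or EXTERNAL_TABLE)
# $2: value of the "transactional" table property (true, false, or empty if unset)
hdi4_table_type() {
    local table_type="$1" transactional="${2:-false}"
    case "$table_type" in
        EXTERNAL_TABLE)
            echo "EXTERNAL_TABLE" ;;   # external stays external
        MANAGED_TABLE)
            # Compare case-insensitively: the property may be stored as TRUE or true.
            if [ "$(printf '%s' "$transactional" | tr '[:upper:]' '[:lower:]')" = "true" ]; then
                echo "MANAGED_TABLE"   # transactional managed stays managed
            else
                echo "EXTERNAL_TABLE"  # non-transactional managed becomes external
            fi ;;
        *)
            echo "unknown table type: $table_type" >&2
            return 1 ;;
    esac
}

# Usage: hdi4_table_type MANAGED_TABLE true
```

A table that a third party such as an HDInsight 3.6 cluster must read should map to `EXTERNAL_TABLE`. If it maps to `MANAGED_TABLE`, adjust it before you run the migration.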

Once your table properties are set correctly, execute the Hive warehouse migration tool:

1. Connect to your cluster headnode using SSH. For instructions, see [Connect to HDInsight using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
1. Open a login shell as the Hive user by running `sudo su - hive`.
1. Determine the data platform stack version by executing `ls /usr/hdp`. This displays a version string that you use in the next command.
1. Execute the following command from the shell. Replace `STACK_VERSION` with the version string from the previous step:

After the migration tool completes, your Hive warehouse will be ready for HDInsight 4.0.

> [!Important]
> Managed tables in HDInsight 4.0 (including tables migrated from 3.6) should not be accessed by other services or applications, including HDInsight 3.6 clusters.

## Secure Hive across HDInsight versions

Since HDInsight 3.6, HDInsight integrates with Azure Active Directory using HDIn…

4. Navigate to the **Ranger Service Manager** panel in your HDInsight 4.0 cluster.
5. Navigate to the policy named **HIVE** and import the Ranger policy JSON from step 2.

## Check compatibility and modify code as needed in a test app

When migrating workloads such as existing programs and queries, check the release notes and documentation for changes, and modify your code as necessary. If your HDInsight 3.6 cluster uses a shared Spark and Hive metastore, [additional configuration using Hive Warehouse Connector](./apache-hive-warehouse-connector.md) is required.

## Deploy new app for production

To switch to the new cluster, you can either install a new client application and use it as the new production environment, or upgrade your existing client application and switch it to HDInsight 4.0.

## Switch HDInsight 4.0 to production

If the metastores diverged while you were testing, you'll need to bring those changes over just before switching. In this case, you can export and import the metastore again and then rerun the upgrade.

## Remove the old production environment

Once you've confirmed that the release is complete and fully operational, you can remove the HDInsight 3.6 cluster and the previous metastore. Make sure that everything is migrated before deleting the environment.

## Query execution across HDInsight versions

There are two ways to execute and debug Hive/LLAP queries within an HDInsight 3.6 cluster. HiveCLI provides a command-line experience, and the Tez view/Hive view provides a GUI-based workflow.

In HDInsight 4.0, HiveCLI has been replaced with Beeline. HiveCLI is a Thrift client for HiveServer 1, and Beeline is a JDBC client that provides access to HiveServer2. Beeline can also be used to connect to any other JDBC-compatible database endpoint. Beeline is available out of the box on HDInsight 4.0, with no installation needed.

In HDInsight 3.6, the GUI client for interacting with Hive server is the Ambari Hive View. HDInsight 4.0 replaces the Hive View with Hortonworks Data Analytics Studio (DAS). DAS doesn't ship with HDInsight clusters out of the box and isn't an officially supported package. However, DAS can be installed on the cluster using a [script action](../hdinsight-hadoop-customize-cluster-linux.md) as follows:

Launch a script action against your cluster, with **Head nodes** as the node type for execution, and use the following Bash script URI: `https://hdiconfigactions.blob.core.windows.net/dasinstaller/LaunchDASInstaller.sh`