articles/hdinsight/interactive-query/apache-hive-migrate-workloads.md
Lines changed: 42 additions & 9 deletions
Original file line number
Diff line number
Diff line change
@@ -5,7 +5,7 @@ author: reachnijel
5
5
ms.author: nijelsf
6
6
ms.service: hdinsight
7
7
ms.topic: how-to
8
-
ms.date: 07/18/2022
8
+
ms.date: 10/20/2022
9
9
---
10
10
11
11
# Migrate Azure HDInsight 3.6 Hive workloads to HDInsight 4.0
@@ -47,7 +47,7 @@ Migration of Hive tables to a new Storage Account needs to be done as a separate
47
47
This step uses the [`Hive Schema Tool`](https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool) from HDInsight 4.0 to upgrade the metastore schema.
48
48
49
49
> [!WARNING]
50
-
> This step is not reversible. Run this only on a copy of the metastore.
50
+
> This step isn't reversible. Run this only on a copy of the metastore.
51
51
52
52
1. Create a temporary HDInsight 4.0 cluster to access the 4.0 Hive `schematool`. You can use the [default Hive metastore](../hdinsight-use-external-metadata-stores.md#default-metastore) for this step.
53
53
@@ -65,7 +65,7 @@ This step uses the [`Hive Schema Tool`](https://cwiki.apache.org/confluence/disp
65
65
> [!NOTE]
66
66
> This utility uses client `beeline` to execute SQL scripts in `/usr/hdp/$STACK_VERSION/hive/scripts/metastore/upgrade/mssql/upgrade-*.mssql.sql`.
67
67
>
68
-
> SQL Syntax in these scripts is not necessarily compatible to other client tools. For example, [SSMS](/sql/ssms/download-sql-server-management-studio-ssms) and [Query Editor on Azure Portal](/azure/azure-sql/database/connect-query-portal) require keyword `GO` after each command.
68
+
> The SQL syntax in these scripts isn't necessarily compatible with other client tools. For example, [SSMS](/sql/ssms/download-sql-server-management-studio-ssms) and [Query Editor on Azure Portal](/azure/azure-sql/database/connect-query-portal) require the keyword `GO` after each command.
69
69
>
70
70
> If any script fails due to resource capacity or transaction timeouts, scale up the SQL Database.
71
71
@@ -85,7 +85,7 @@ Create a new HDInsight 4.0 cluster, [selecting the upgraded Hive metastore](../h
85
85
86
86
* The new cluster doesn't require having the same default filesystem.
87
87
88
-
* If the metastore contains tables residing in multiple Storage Accounts, you need to add those Storage Accounts to the new cluster to access those tables. See [add additional Storage Accounts to HDInsight](../hdinsight-hadoop-add-storage.md).
88
+
* If the metastore contains tables residing in multiple Storage Accounts, you need to add those Storage Accounts to the new cluster to access those tables. See [add extra Storage Accounts to HDInsight](../hdinsight-hadoop-add-storage.md).
89
89
90
90
* If Hive jobs fail due to storage inaccessibility, verify that the table location is in a Storage Account added to the cluster.
91
91
@@ -104,20 +104,53 @@ sudo su - hive
104
104
STACK_VERSION=$(hdp-select status hive-server2 | awk '{ print $3; }')
### 6. Class not found error with `MultiDelimitSerDe`
108
+
109
+
**Problem**
110
+
111
+
When you run a Hive query after migrating from HDInsight 3.6 to HDInsight 4.0, you might receive a `java.lang.ClassNotFoundException` stating that the class `org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe` isn't found. In HDInsight 3.6, this SerDe class is part of `hive-contrib-1.2.1000.2.6.5.3033-1.jar`. In HDInsight 4.0, it's removed and replaced by the class `org.apache.hadoop.hive.serde2.MultiDelimitSerDe`, which is part of the `hive-exec` JAR. HiveServer2 (HS2) loads the `hive-exec` JAR by default when the service starts.
112
+
113
+
**Steps to troubleshoot**
114
+
115
+
1. Check whether any JAR under a folder contains this class. The JAR is most likely under the Hive libraries folder, which is `/usr/hdp/current/hive/lib` in HDInsight.
116
+
1. Check for both classes, `org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe` and `org.apache.hadoop.hive.serde2.MultiDelimitSerDe`, as described in the solution.
117
+
118
+
**Solution**
119
+
120
+
1. Although a JAR file is a binary file, you can still use the `grep` command with the `-Hrni` switches, as below, to search for a particular class name.
1. The above output confirms that no JAR contains the class `org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe` and that the `hive-exec` JAR contains `org.apache.hadoop.hive.serde2.MultiDelimitSerDe`.
133
+
1. Try to create the table with the row format SerDe set as `ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'`.
134
+
1. Creating the table this way fixes the issue. If you've already created the table, you can update its SerDe class by using the following commands:
135
+
```
136
+
Hive => ALTER TABLE TABLE_NAME SET SERDE 'org.apache.hadoop.hive.serde2.MultiDelimitSerDe'
137
+
Backend DB => UPDATE SERDES SET SLIB='org.apache.hadoop.hive.serde2.MultiDelimitSerDe' where SLIB='org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe';
138
+
```
139
+
The `UPDATE` command changes the SerDe details manually in the backend DB, and the `ALTER TABLE` command changes the table to the new SerDe class from Beeline or Hive.
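
The JAR search described in the solution can be sketched as a small shell function. This is a minimal sketch rather than the article's exact command: the function name `find_class_in_jars` is hypothetical, and it assumes a `grep` that supports `-r`, `-l`, and `--include` (GNU grep does). Entry names inside a JAR are stored uncompressed with `/` separators, so the sketch turns each `.` in the class name into a pattern that matches either `.` or `/`.

```shell
# Hypothetical helper: list JAR files under a directory that mention a class.
# $1: fully qualified class name, $2: directory to search
find_class_in_jars() {
  # Turn each '.' into a bracket expression matching '.' or '/',
  # because JAR entries use paths such as org/apache/.../Name.class.
  pattern=$(printf '%s' "$1" | sed 's|\.|[./]|g')
  # -r: recurse, -l: print only the names of matching files.
  grep -rl --include='*.jar' -e "$pattern" "$2" 2>/dev/null
}

# Example call (assumes the HDInsight Hive library path named above):
# find_class_in_jars org.apache.hadoop.hive.serde2.MultiDelimitSerDe /usr/hdp/current/hive/lib
```

If the function prints nothing for the `contrib` class name but prints a `hive-exec` JAR for the new class name, that matches the situation described in the solution.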
107
140
108
141
## Secure Hive across HDInsight versions
109
142
110
143
HDInsight optionally integrates with Azure Active Directory using HDInsight Enterprise Security Package (ESP). ESP uses Kerberos and Apache Ranger to manage the permissions of specific resources within the cluster. Ranger policies deployed against Hive in HDInsight 3.6 can be migrated to HDInsight 4.0 with the following steps:
111
144
112
145
1. Navigate to the Ranger Service Manager panel in your HDInsight 3.6 cluster.
113
-
2. Navigate to the policy named **HIVE** and export the policy to a json file.
114
-
3. Make sure that all users referred to in the exported policy json exist in the new cluster. If a user is referred to in the policy json but doesn't exist in the new cluster, either add the user to the new cluster or remove the reference from the policy.
115
-
4. Navigate to the **Ranger Service Manager** panel in your HDInsight 4.0 cluster.
116
-
5. Navigate to the policy named **HIVE** and import the ranger policy json from step 2.
146
+
1. Navigate to the policy named **HIVE** and export the policy to a json file.
147
+
1. Make sure that all users referred to in the exported policy json exist in the new cluster. If a user is referred to in the policy json but doesn't exist in the new cluster, either add the user to the new cluster or remove the reference from the policy.
148
+
1. Navigate to the **Ranger Service Manager** panel in your HDInsight 4.0 cluster.
149
+
1. Navigate to the policy named **HIVE** and import the Ranger policy json from step 2.
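
The user check in step 3 above can be partly scripted. The following is a minimal sketch under stated assumptions: you've already written the user names referenced in the exported policy to one file and the user names that exist in the new cluster to another, one name per line. The function name `missing_users` and both input files are hypothetical and aren't part of Ranger or HDInsight.

```shell
# Hypothetical helper: print users referenced by the policy that are
# missing from the new cluster.
# $1: file of users referenced in the exported policy (one per line)
# $2: file of users that exist in the new cluster (one per line)
missing_users() {
  policy_sorted=$(mktemp)
  cluster_sorted=$(mktemp)
  # comm needs sorted input; -u also drops duplicate names.
  sort -u "$1" > "$policy_sorted"
  sort -u "$2" > "$cluster_sorted"
  # -23 suppresses lines unique to the second file and lines common to both,
  # leaving only names that appear in the policy list alone.
  comm -23 "$policy_sorted" "$cluster_sorted"
  rm -f "$policy_sorted" "$cluster_sorted"
}
```

Each name the function prints is a user to either add to the new cluster or remove from the policy before the import.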
117
150
118
151
## Hive changes in HDInsight 4.0 that may require application changes
119
152
120
-
* See [Additional configuration using Hive Warehouse Connector](./apache-hive-warehouse-connector.md) for sharing the metastore between Spark and Hive for ACID tables.
153
+
* See [Extra configuration using Hive Warehouse Connector](./apache-hive-warehouse-connector.md) for sharing the metastore between Spark and Hive for ACID tables.
121
154
122
155
* HDInsight 4.0 uses [Storage Based Authorization](https://cwiki.apache.org/confluence/display/Hive/Storage+Based+Authorization+in+the+Metastore+Server). If you modify file permissions or create folders as a different user than Hive, you'll likely hit Hive errors based on storage permissions. To fix, grant `rw-` access to the user. See [HDFS Permissions Guide](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html).