**articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md** (+36 −14)
````diff
@@ -6,7 +6,7 @@ ms.topic: how-to
 author: abhishjain002
 ms.author: abhishjain
 ms.reviewer: nijelsf
-ms.date: 03/11/2025
+ms.date: 08/08/2025
 ---

 # Integrate Apache Spark and Apache Hive with Hive Warehouse Connector in Azure HDInsight
````
````diff
@@ -20,8 +20,8 @@ Apache Hive offers support for database transactions that are Atomic, Consistent
 Apache Spark has a Structured Streaming API that gives streaming capabilities not available in Apache Hive. Beginning with HDInsight 4.0, Apache Spark 2.3.1 & above, and Apache Hive 3.1.0 have separate metastore catalogs, which make interoperability difficult.

 The Hive Warehouse Connector (HWC) makes it easier to use Spark and Hive together. The HWC library loads data from LLAP daemons to Spark executors in parallel. This process makes it more efficient and adaptable than a standard JDBC connection from Spark to Hive. This brings out two different execution modes for HWC:
->- Hive JDBC mode via HiveServer2
->- Hive LLAP mode using LLAP daemons **[Recommended]**
+- Hive JDBC mode via HiveServer2
+- Hive LLAP mode using LLAP daemons **[Recommended]**

 By default, HWC is configured to use Hive LLAP daemons.
 For executing Hive queries (both read and write) using the above modes with their respective APIs, see [HWC APIs](./hive-warehouse-connector-apis.md).
````
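For context, a minimal sketch of how the mode choice surfaces at launch time. The assembly JAR path and the `spark.datasource.hive.warehouse.read.via.llap` property follow HWC 1.x conventions on HDInsight and are assumptions; newer HWC releases expose a different read-mode property, so verify against the version shipped on your cluster.

```bash
# Sketch: selecting the HWC execution mode when starting spark-shell.
# JAR path and property name are assumptions based on HWC 1.x conventions.

# Hive LLAP mode (default): reads are served by the LLAP daemons.
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar \
  --conf spark.datasource.hive.warehouse.read.via.llap=true

# Hive JDBC mode: reads go through HiveServer2 instead.
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar \
  --conf spark.datasource.hive.warehouse.read.via.llap=false
```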
@@ -41,7 +41,6 @@ Some of the operations supported by the Hive Warehouse Connector are:
41
41
## Hive Warehouse Connector setup
42
42
43
43
> [!IMPORTANT]
44
-
> - The HiveServer2 Interactive instance installed on Spark 2.4 Enterprise Security Package clusters is not supported for use with the Hive Warehouse Connector. Instead, you must configure a separate HiveServer2 Interactive cluster to host your HiveServer2 Interactive workloads. A Hive Warehouse Connector configuration that utilizes a single Spark 2.4 cluster is not supported.
45
44
> - Hive Warehouse Connector (HWC) Library is not supported for use with Interactive Query Clusters where Workload Management (WLM) feature is enabled. <br>
46
45
In a scenario where you only have Spark workloads and want to use HWC Library, ensure Interactive Query cluster doesn't have Workload Management feature enabled (`hive.server2.tez.interactive.queue` configuration is not set in Hive configs). <br>
47
46
For a scenario where both Spark workloads (HWC) and LLAP native workloads exists, You need to create two separate Interactive Query Clusters with shared metastore database. One cluster for native LLAP workloads where WLM feature can be enabled on need basis and other cluster for HWC only workload where WLM feature shouldn't be configured.
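As a quick way to verify the precondition in this note, one might check whether the queue property is present on the Interactive Query cluster. A hedged sketch; the config directory path is an assumption and varies by HDInsight image:

```bash
# Sketch: on an Interactive Query head node, confirm the WLM queue property
# is absent before pointing HWC at this cluster. The conf path is an
# assumption; locate hive-site.xml on your own image.
grep -R "hive.server2.tez.interactive.queue" /etc/hive*/conf* 2>/dev/null \
  && echo "WLM queue is set: not suitable for HWC" \
  || echo "Property not set: OK for HWC"
```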
````diff
@@ -56,15 +55,13 @@ Hive Warehouse Connector needs separate clusters for Spark and Interactive Query
 | HWC Version | Spark Version | InteractiveQuery Version |
-1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+1. Create an HDInsight Spark **5.1** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
-1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+1. Create an HDInsight Interactive Query (LLAP) **5.1** cluster with the same storage account and Azure virtual network as the Spark cluster.

 ### Configure HWC settings
````
````diff
@@ -102,6 +99,20 @@ value. The value may be similar to: `thrift://iqgiro.rekufuk2y2cezcbowjkbwfnyvd.
 1. Save changes and restart all affected components.

+#### Additional configurations for Spark and Hive
+
+The following configuration needs to be done for **all** head and worker nodes of your Spark and Hive clusters.
+
+1. Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark and Apache Hive nodes. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command.
+1. Append the contents of the Hive cluster's /etc/hosts file to the /etc/hosts file of the Spark cluster, and vice versa.
+1. Once all nodes are updated, restart both clusters.
````
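The added steps leave the exact commands to the reader; the sketch below shows one way they might look. `sshuser` and the `CLUSTERNAME-ssh.azurehdinsight.net` host form are the HDInsight defaults from the linked ssh article, and the cluster names are placeholders.

```bash
# Sketch: merge the Hive cluster's /etc/hosts into a Spark node (repeat on
# every head and worker node of both clusters, in both directions).
# SPARKCLUSTER and HIVECLUSTER are placeholder cluster names.

# Connect to a Spark node:
ssh sshuser@SPARKCLUSTER-ssh.azurehdinsight.net

# From that node, append the Hive cluster's host entries:
ssh sshuser@HIVECLUSTER-ssh.azurehdinsight.net 'cat /etc/hosts' | sudo tee -a /etc/hosts
```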
````diff
 ### Configure HWC for Enterprise Security Package (ESP) clusters

 The Enterprise Security Package (ESP) provides enterprise-grade capabilities like Active Directory-based authentication, multi-user support, and role-based access control for Apache Hadoop clusters in Azure HDInsight. For more information on ESP, see [Use Enterprise Security Package in HDInsight](../domain-joined/apache-domain-joined-architecture.md).
````
````diff
@@ -118,16 +129,27 @@ Apart from the configurations mentioned in the previous section, add the following
 * From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/summary` where CLUSTERNAME is the name of your Interactive Query cluster. Click on **HiveServer2 Interactive**. You'll see the Fully Qualified Domain Name (FQDN) of the head node on which LLAP is running, as shown in the screenshot. Replace `<llap-headnode>` with this value.

-  :::image type="content" source="./media/apache-hive-warehouse-connector/head-node-hive-server-interactive.png" alt-text="hive warehouse connector Head Node." border="true":::
+  :::image type="content" source="./media/apache-hive-warehouse-connector/head-node-hive-server-interactive.png" alt-text="Screenshot of hive warehouse connector Head Node." border="true":::

 * Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Interactive Query cluster. Look for the `default_realm` parameter in the `/etc/krb5.conf` file. Replace `<AAD-DOMAIN>` with this value as an uppercase string; otherwise the credential won't be found.
 * For instance, `hive/hn*.mjry42ikpruuxgs2qy2kpg4q5e.cx.internal.cloudapp.net@PKRSRVUQVMAE6J85.D2.INTERNAL.CLOUDAPP.NET`.

+1. The following configuration needs to be done for **all** head and worker nodes of your Spark and Hive clusters.
+
+   * Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark and Apache Hive nodes. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command.
+   * Append the tenant domain name (for example, "fabrikam.onmicrosoft.com") to the last line of /etc/resolv.conf on the head and worker nodes of your Spark and Hive clusters.
+
 1. Save changes and restart components as needed.
````
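A hedged sketch of these two lookups on a node. Appending the tenant domain as a `search` entry is one plausible reading of "append in the last line of /etc/resolv.conf", and fabrikam.onmicrosoft.com is the example tenant from the step above:

```bash
# Sketch: find the Kerberos realm to use for <AAD-DOMAIN> (uppercase it),
# then append the tenant domain to /etc/resolv.conf. Run on every head and
# worker node of both clusters. The "search" form is an assumption about
# the intended resolv.conf edit.
grep default_realm /etc/krb5.conf
echo "search fabrikam.onmicrosoft.com" | sudo tee -a /etc/resolv.conf
```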
````diff
 ## Hive Warehouse Connector usage

 You can choose between a few different methods to connect to your Interactive Query cluster and execute queries using the Hive Warehouse Connector. Supported methods include the following tools:
````
````diff
@@ -234,21 +256,21 @@ kinit USERNAME
 hive.executeQuery("SELECT * FROM demo").show()
 ```

-:::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-before-ranger-policy.png" alt-text="demo table before applying ranger policy." border="true":::
+:::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-before-ranger-policy.png" alt-text="Screenshot of demo table before applying ranger policy." border="true":::

 1. Apply a column masking policy that only shows the last four characters of the column.
 1. Go to the Ranger Admin UI at `https://LLAPCLUSTERNAME.azurehdinsight.net/ranger/`.
 1. Click on the Hive service for your cluster under **Hive**.

-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-service-manager.png" alt-text="ranger service manager." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-service-manager.png" alt-text="Screenshot of ranger service manager." border="true":::

 1. Click on the **Masking** tab and then **Add New Policy**.
 1. Provide a desired policy name. Select database: **Default**, Hive table: **demo**, Hive column: **name**, User: **rsadmin2**, Access Types: **select**, and **Partial mask: show last 4** from the **Select Masking Option** menu. Click **Add**.
````
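To confirm the policy behaves as intended, the earlier read can simply be re-run as the restricted user; a sketch using the article's own demo names:

```bash
# Sketch: after saving the masking policy, authenticate as the restricted
# user (rsadmin2, from the policy above) and re-run the read in the
# spark-shell HWC session:
kinit rsadmin2
#   hive.executeQuery("SELECT * FROM demo").show()
# With the policy applied, the name column should now show only its last
# four characters.
```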
**articles/hdinsight/interactive-query/hive-warehouse-connector-v2-apis.md** (+3 −3)
````diff
@@ -6,12 +6,12 @@ ms.topic: how-to
 author: abhishjain002
 ms.author: abhishjain
 ms.reviewer: nijelsf
-ms.date: 01/02/2025
+ms.date: 08/07/2025
 ---

-# Hive Warehouse Connector 2.0 APIs in Azure HDInsight
+# Hive Warehouse Connector 2.1 and 2.0 APIs in Azure HDInsight

-This article lists all the APIs supported by Hive warehouse connector 2.0. All the examples shown are how to run using spark-shell and hive warehouse connector session.
+This article lists all the APIs supported by Hive warehouse connector 2.1 and 2.0. All the examples show how to run them using spark-shell and a Hive Warehouse Connector session.
````
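Since the examples in that article assume a running spark-shell with an HWC session, a launch sketch may help. The JAR path and JDBC URL are placeholders to be replaced with values from your cluster and Ambari:

```bash
# Sketch: start spark-shell with the HWC assembly so the article's examples
# can be pasted into the session. JAR path and JDBC URL are placeholders.
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<zookeeper-quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive"

# Inside the shell, an HWC session is then built with the HWC 2.x API:
#   val hive = com.hortonworks.hwc.HiveWarehouseSession.session(spark).build()
```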