Commit 2d0c0aa

Merge pull request #303958 from abhishjain002/patch-8
Update apache-hive-warehouse-connector.md
2 parents 1e59d60 + 2efb2a8 commit 2d0c0aa

3 files changed: +40 -18 lines changed

articles/hdinsight/TOC.yml

Lines changed: 1 addition & 1 deletion

````diff
@@ -907,7 +907,7 @@ items:
       href: ./interactive-query/apache-hive-warehouse-connector-zeppelin.md
     - name: HWC 1.0 Supported APIs
       href: ./interactive-query/hive-warehouse-connector-apis.md
-    - name: HWC 2.0 Supported APIs
+    - name: HWC 2.1 and 2.0 Supported APIs
      href: ./interactive-query/hive-warehouse-connector-v2-apis.md
    - name: Apache Hive with Hadoop
      href: ./hadoop/hdinsight-use-hive.md
````

articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md

Lines changed: 36 additions & 14 deletions

````diff
@@ -6,7 +6,7 @@ ms.topic: how-to
 author: abhishjain002
 ms.author: abhishjain
 ms.reviewer: nijelsf
-ms.date: 03/11/2025
+ms.date: 08/08/2025
 ---
 
 # Integrate Apache Spark and Apache Hive with Hive Warehouse Connector in Azure HDInsight
````
````diff
@@ -20,8 +20,8 @@ Apache Hive offers support for database transactions that are Atomic, Consistent
 Apache Spark has a Structured Streaming API that gives streaming capabilities not available in Apache Hive. Beginning with HDInsight 4.0, Apache Spark 2.3.1 & above, and Apache Hive 3.1.0 have separate metastore catalogs, which make interoperability difficult.
 
 The Hive Warehouse Connector (HWC) makes it easier to use Spark and Hive together. The HWC library loads data from LLAP daemons to Spark executors in parallel. This process makes it more efficient and adaptable than a standard JDBC connection from Spark to Hive. This brings out two different execution modes for HWC:
-> - Hive JDBC mode via HiveServer2
-> - Hive LLAP mode using LLAP daemons **[Recommended]**
+- Hive JDBC mode via HiveServer2
+- Hive LLAP mode using LLAP daemons **[Recommended]**
 
 By default, HWC is configured to use Hive LLAP daemons.
 For executing Hive queries (both read and write) using the above modes with their respective APIs, see [HWC APIs](./hive-warehouse-connector-apis.md).
````
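These two modes map directly onto the HWC session API: `execute()` submits a statement through HiveServer2 over JDBC, while `executeQuery()` reads from the LLAP daemons in parallel. A minimal spark-shell sketch follows; launching spark-shell with the HWC assembly jar is assumed, and the `demo` table is borrowed from the Ranger example later in this commit.

```scala
// Minimal sketch of the two HWC execution modes (assumes spark-shell was
// launched with the HWC assembly jar, and that a `demo` table exists).
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Hive JDBC mode: routed through HiveServer2 over JDBC.
hive.execute("SELECT * FROM demo").show()

// Hive LLAP mode (recommended): rows are read from LLAP daemons in parallel.
hive.executeQuery("SELECT * FROM demo").show()
```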
````diff
@@ -41,7 +41,6 @@ Some of the operations supported by the Hive Warehouse Connector are:
 ## Hive Warehouse Connector setup
 
 > [!IMPORTANT]
-> - The HiveServer2 Interactive instance installed on Spark 2.4 Enterprise Security Package clusters is not supported for use with the Hive Warehouse Connector. Instead, you must configure a separate HiveServer2 Interactive cluster to host your HiveServer2 Interactive workloads. A Hive Warehouse Connector configuration that utilizes a single Spark 2.4 cluster is not supported.
 > - Hive Warehouse Connector (HWC) Library is not supported for use with Interactive Query clusters where the Workload Management (WLM) feature is enabled. <br>
 In a scenario where you only have Spark workloads and want to use the HWC Library, ensure the Interactive Query cluster doesn't have the Workload Management feature enabled (the `hive.server2.tez.interactive.queue` configuration is not set in Hive configs). <br>
 For a scenario where both Spark workloads (HWC) and LLAP native workloads exist, you need to create two separate Interactive Query clusters with a shared metastore database: one cluster for native LLAP workloads, where the WLM feature can be enabled as needed, and another cluster for HWC-only workloads, where the WLM feature shouldn't be configured.
````
````diff
@@ -56,15 +55,13 @@ Hive Warehouse Connector needs separate clusters for Spark and Interactive Query
 
 | HWC Version | Spark Version | InteractiveQuery Version |
 |:---:|:---:|---|
-| v1 | Spark 2.4 \| HDI 4.0 | Interactive Query 3.1 \| HDI 4.0 |
-| v2 | Spark 3.1 \| HDI 5.0 | Interactive Query 3.1 \| HDI 5.0 |
 | v2.1 | Spark 3.3.0 \| HDI 5.1 | Interactive Query 3.1 \| HDI 5.1 |
 
 ### Create clusters
 
-1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+1. Create an HDInsight Spark **5.1** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
 
-1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+1. Create an HDInsight Interactive Query (LLAP) **5.1** cluster with the same storage account and Azure virtual network as the Spark cluster.
 
 ### Configure HWC settings
 
````
````diff
@@ -102,6 +99,20 @@ value. The value may be similar to: `thrift://iqgiro.rekufuk2y2cezcbowjkbwfnyvd.
 
 1. Save changes and restart all affected components.
 
+#### Additional configurations for Spark and Hive
+
+The following configuration needs to be done on **all** head and worker nodes of your Spark and Hive clusters.
+
+1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark and Apache Hive nodes. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
+
+   ```cmd
+   ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
+   ```
+
+1. Append the contents of the Hive cluster's /etc/hosts file to the Spark cluster's /etc/hosts file, and vice versa.
+
+1. Once all nodes are updated, restart both clusters.
+
 ### Configure HWC for Enterprise Security Package (ESP) clusters
 
 The Enterprise Security Package (ESP) provides enterprise-grade capabilities like Active Directory-based authentication, multi-user support, and role-based access control for Apache Hadoop clusters in Azure HDInsight. For more information on ESP, see [Use Enterprise Security Package in HDInsight](../domain-joined/apache-domain-joined-architecture.md).
````
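As a quick post-restart check, the values configured above can be read back from spark-shell on the Spark cluster. This is a sketch: the three property names are the HWC configuration keys this setup section normally populates, and since they aren't shown in this hunk, treat them as assumptions.

```scala
// Sanity-check sketch: confirm the HWC settings set via Ambari are visible to
// Spark. Property names are assumed from the HWC setup (not shown in the hunk).
println(spark.conf.get("spark.datasource.hive.warehouse.metastoreUri"))    // the thrift://... value copied above
println(spark.conf.get("spark.sql.hive.hiveserver2.jdbc.url"))             // HiveServer2 Interactive JDBC URL
println(spark.conf.get("spark.hadoop.hive.llap.daemon.service.hosts"))     // typically @llap0
```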
````diff
@@ -118,16 +129,27 @@ Apart from the configurations mentioned in the previous section, add the following
 
 * From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/summary` where CLUSTERNAME is the name of your Interactive Query cluster. Click on **HiveServer2 Interactive**. You'll see the Fully Qualified Domain Name (FQDN) of the head node on which LLAP is running as shown in the screenshot. Replace `<llap-headnode>` with this value.
 
-  :::image type="content" source="./media/apache-hive-warehouse-connector/head-node-hive-server-interactive.png" alt-text="hive warehouse connector Head Node." border="true":::
+  :::image type="content" source="./media/apache-hive-warehouse-connector/head-node-hive-server-interactive.png" alt-text="Screenshot of hive warehouse connector Head Node." border="true":::
 
 * Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Interactive Query cluster. Look for `default_realm` parameter in the `/etc/krb5.conf` file. Replace `<AAD-DOMAIN>` with this value as an uppercase string, otherwise the credential won't be found.
 
   :::image type="content" source="./media/apache-hive-warehouse-connector/aad-domain.png" alt-text="Screenshot of Hive warehouse connector AAD Domain." border="true":::
 
 * For instance, `hive/hn*.mjry42ikpruuxgs2qy2kpg4q5e.cx.internal.cloudapp.net@PKRSRVUQVMAE6J85.D2.INTERNAL.CLOUDAPP.NET`.
 
+1. The following configuration needs to be done on **all** head and worker nodes of your Spark and Hive clusters.
+
+   * Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark and Apache Hive nodes. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
+
+     ```cmd
+     ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
+     ```
+
+   * Append the tenant domain name (for example, "fabrikam.onmicrosoft.com") as the last line of /etc/resolv.conf on the head and worker nodes of your Spark and Hive clusters.
+
 1. Save changes and restart components as needed.
 
+
 ## Hive Warehouse Connector usage
 
 You can choose between a few different methods to connect to your Interactive Query cluster and execute queries using the Hive Warehouse Connector. Supported methods include the following tools:
````
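For the spark-shell route (one of the tools the article lists next), a connection typically looks like the sketch below. The assembly jar path follows the HDInsight layout and the exact file name varies by HWC version; both are assumptions here.

```scala
// Sketch: connect from spark-shell. The launch command is shown as a comment
// because the jar name varies by HWC version (path is the assumed HDInsight layout):
//   spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()   // runs against the Interactive Query cluster
```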
````diff
@@ -234,21 +256,21 @@ kinit USERNAME
 hive.executeQuery("SELECT * FROM demo").show()
 ```
 
-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-before-ranger-policy.png" alt-text="demo table before applying ranger policy." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-before-ranger-policy.png" alt-text="Screenshot of demo table before applying ranger policy." border="true":::
 
 1. Apply a column masking policy that only shows the last four characters of the column.
 1. Go to the Ranger Admin UI at `https://LLAPCLUSTERNAME.azurehdinsight.net/ranger/`.
 1. Click on the Hive service for your cluster under **Hive**.
-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-service-manager.png" alt-text="ranger service manager." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-service-manager.png" alt-text="Screenshot of ranger service manager." border="true":::
 1. Click on the **Masking** tab and then **Add New Policy**
 
-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-hive-policy-list.png" alt-text="hive warehouse connector ranger hive policy list." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-hive-policy-list.png" alt-text="Screenshot of hive warehouse connector ranger hive policy list." border="true":::
 
 1. Provide a desired policy name. Select database: **Default**, Hive table: **demo**, Hive column: **name**, User: **rsadmin2**, Access Types: **select**, and **Partial mask: show last 4** from the **Select Masking Option** menu. Click **Add**.
-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-create-policy.png" alt-text="create policy." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-ranger-create-policy.png" alt-text="Screenshot of create policy." border="true":::
 1. View the table's contents again. After applying the ranger policy, we can see only the last four characters of the column.
 
-   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-after-ranger-policy.png" alt-text="demo table after applying ranger policy." border="true":::
+   :::image type="content" source="./media/apache-hive-warehouse-connector/hive-warehouse-connector-table-after-ranger-policy.png" alt-text="Screenshot of demo table after applying ranger policy." border="true":::
 
 ## Next steps
 
````

articles/hdinsight/interactive-query/hive-warehouse-connector-v2-apis.md

Lines changed: 3 additions & 3 deletions

````diff
@@ -6,12 +6,12 @@ ms.topic: how-to
 author: abhishjain002
 ms.author: abhishjain
 ms.reviewer: nijelsf
-ms.date: 01/02/2025
+ms.date: 08/07/2025
 ---
 
-# Hive Warehouse Connector 2.0 APIs in Azure HDInsight
+# Hive Warehouse Connector 2.1 and 2.0 APIs in Azure HDInsight
 
-This article lists all the APIs supported by Hive warehouse connector 2.0. All the examples shown are how to run using spark-shell and hive warehouse connector session.
+This article lists all the APIs supported by Hive Warehouse Connector 2.1 and 2.0. All the examples show how to run them using spark-shell and a Hive Warehouse Connector session.
 
 How to create Hive warehouse connector session:
 
````