articles/hdinsight/interactive-query/apache-hive-warehouse-connector-supported-spark-operations.md (+4 −2)
@@ -38,7 +38,7 @@ Spark doesn't natively support writing to Hive's managed ACID tables. However,us
1. Filter the table `hivesampletable` where the column `state` equals `Colorado`. This Hive query returns a Spark DataFrame, which is saved to the Hive table `sampletable_colorado` using the `write` function.
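A minimal Scala sketch of that step, assuming it runs in spark-shell with the HWC assembly jar on the classpath and that the `HiveWarehouseSession` is built from the active `spark` session as in the HWC setup article:

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the active SparkSession (`spark` in spark-shell).
val hive = HiveWarehouseSession.session(spark).build()

// Filter hivesampletable on state and save the result to a managed Hive
// table through the HWC data source.
hive.table("hivesampletable")
  .filter("state = 'Colorado'")
  .write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .mode("append")
  .option("table", "sampletable_colorado")
  .save()
```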
articles/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md (+17 −8)
@@ -9,17 +9,29 @@ ms.topic: conceptual
ms.date: 01/05/2020
---
-# Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight
+# Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight
HDInsight Spark clusters include Apache Zeppelin notebooks with different interpreters. This article focuses on the Livy interpreter for accessing Hive tables from Spark by using Hive Warehouse Connector.
-## Prerequisites
+## Prerequisite
-1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+Complete the [Hive Warehouse Connector setup](apache-hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.
-1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+## Getting started
-1. Setup up both Spark and Hive clusters with HWC using this document. See [Setup HDInsight clusters with Hive Warehouse Connector](./apache-hive-warehouse-connector.md#configure-hwc-settings)
+1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
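The command itself doesn't survive in this diff; the usual form from the HDInsight docs, assuming the default `sshuser` account, is:

```bash
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```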
| Configuration | Value |
|----|----|
| livy.spark.sql.hive.hiveserver2.jdbc.url.principal | `hive/<headnode-FQDN>@<AAD-Domain>` (needed only for ESP clusters) |
| livy.spark.sql.hive.hiveserver2.jdbc.url | Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. Replace `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. |

In the `hive/<headnode-FQDN>@<AAD-Domain>` service principal, replace `<headnode-FQDN>` with the fully qualified domain name of the Interactive Query cluster's head node, and replace `<AAD-Domain>` with the name of the Azure Active Directory (AAD) domain that the cluster is joined to. Use an uppercase string for the `<AAD-Domain>` value; otherwise the credential won't be found. Check `/etc/krb5.conf` for the realm names if needed.
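Put together, the Livy interpreter settings follow roughly this shape. This is an illustrative sketch, not verbatim doc content: `8998` is Livy's default port, and the JDBC string mirrors the one used in the Spark cluster settings later in this article:

```
zeppelin.livy.url                                   http://<headnode-FQDN>:8998
livy.spark.sql.hive.hiveserver2.jdbc.url            jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2
livy.spark.sql.hive.hiveserver2.jdbc.url.principal  hive/<headnode-FQDN>@<AAD-Domain>
```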
-For `zeppelin.livy.url` configuration, `headnode-FQDN` denotes the Fully Qualified Domain Name of the head node host of the Spark cluster.
* Save the changes and restart the Livy interpreter.
NOTE: If the Livy interpreter isn't accessible, modify the `shiro.ini` file within the Zeppelin component in Ambari. For more information, see [Enabling access control for interpreter, configuration, and credential settings](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-zeppelin-security/content/enabling_access_control_for_interpreter__configuration__and_credential_settings.html).
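As a rough illustration (the exact role names depend on your Zeppelin security setup, so treat this as a sketch rather than the prescribed fix), access to interpreter settings is gated in the `[urls]` section of `shiro.ini`:

```ini
[urls]
# Interpreter settings are often restricted to an admin role; if your
# user can't reach the Livy interpreter page, adjust this role list.
/api/interpreter/** = authc, roles[admin]
```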
1. From a web browser, navigate to `https://LLAPCLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/configs` where LLAPCLUSTERNAME is the name of your Interactive Query cluster.
-* Select **Add Property...** as needed to add/update the following.
+1. Navigate to **Advanced** > **General** > **hive.metastore.uris** and note the value. The value may be similar to: `thrift://iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083,thrift://hn1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083`.
-| Configuration | Value |
-|----|----|
-|`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. REPLACE `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. |
-|`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
-|`spark.datasource.hive.warehouse.metastoreUri`| The value of **hive.metastore.uris** Interactive Query cluster |
-|`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode |
-|`spark.hadoop.hive.zookeeper.quorum`| The value of **hive.zookeeper.quorum** of Interactive Query cluster |
+1. Navigate to **Advanced** > **Advanced hive-site** > **hive.zookeeper.quorum** and note the value. The value may be similar to: `zk0-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk4-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181`.
-* Save changes and restart components as needed.
+1. Navigate to **Advanced** > **Advanced hive-interactive-site** > **hive.llap.daemon.service.hosts** and note the value. The value may be similar to: `@llap0`.
+
+#### Configure Spark cluster settings
+1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/SPARK2/configs` where CLUSTERNAME is the name of your Apache Spark cluster.
1. Select **Add Property...** to add the following configurations:
+| Configuration | Value |
+|----|----|
+|`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. <br>Replace `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. Replace `PWD` with the actual password.|
+|`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. <br> Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
+|`spark.datasource.hive.warehouse.metastoreUri`| The value you obtained earlier from **hive.metastore.uris**. |
+|`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode. |
+|`spark.hadoop.hive.zookeeper.quorum`| The value you obtained earlier from **hive.zookeeper.quorum**. |
+|`spark.hadoop.hive.llap.daemon.service.hosts`| The value you obtained earlier from **hive.llap.daemon.service.hosts**. |
+
+1. Save changes and restart all affected components.
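Once these are saved, you can sanity-check the setup from the Spark head node. A hedged sketch of launching `spark-shell` with the connector (on HDInsight 4.0 the HWC assembly jar sits under `/usr/hdp/current/hive_warehouse_connector/`; `<VERSION>` is a placeholder, so list the directory for the exact name):

```bash
# List the connector directory first to find the exact jar name.
ls /usr/hdp/current/hive_warehouse_connector/

# spark-shell runs in YARN client mode, hence the credentials flag is false.
spark-shell --master yarn \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<VERSION>.jar \
  --conf spark.security.credentials.hiveserver2.enabled=false
```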
### Configure HWC for Enterprise Security Package (ESP) clusters
@@ -85,7 +100,7 @@ You can choose between a few different methods to connect to your Interactive Qu
Here are some examples of connecting to HWC from Spark.
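For instance, a minimal Scala sketch of building a `HiveWarehouseSession` and running a read (the builder and `executeQuery` are part of the HWC API; the query itself is illustrative):

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session from the active SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// executeQuery sends the statement to HiveServer2 Interactive (LLAP)
// and returns the result as a Spark DataFrame.
hive.executeQuery("SELECT * FROM hivesampletable LIMIT 10").show()
```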
@@ -145,7 +160,7 @@ Firstly, SSH into the headnode of the Apache Spark cluster. For more information
To learn about the Spark operations supported by HWC, see [Spark operations supported by Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md).
-#### Run queries on Enterprise Security Package (ESP) clusters
+## Run queries on Enterprise Security Package (ESP) clusters
* Before starting spark-shell or spark-submit, run the following command.
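The command itself is elided in this diff; on ESP clusters this step is a Kerberos sign-in, typically of the form shown below (replace `USERNAME` with a domain user that has access to the cluster):

```bash
kinit USERNAME
```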
@@ -190,9 +205,9 @@ To learn about the spark operations supported by HWC, please follow this [docume
## Next steps
-* [Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
+* [Apache Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
* [Use Interactive Query with HDInsight](https://docs.microsoft.com/azure/hdinsight/interactive-query/apache-interactive-query-get-started)
-* [Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
+* [Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
* [Examples of interacting with Hive Warehouse Connector using Zeppelin, Livy, spark-submit, and pyspark](https://community.hortonworks.com/articles/223626/integrating-apache-hive-with-apache-spark-hive-war.html)
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support: