
Commit 6dff5f1

Updated a few configurations
1 parent f45d0d2 commit 6dff5f1

3 files changed: +51 -25 lines


articles/hdinsight/interactive-query/apache-hive-warehouse-connector-supported-spark-operations.md

Lines changed: 4 additions & 2 deletions
@@ -38,7 +38,7 @@ Spark doesn't natively support writing to Hive's managed ACID tables. However,us
1. Filter the table `hivesampletable` where the column `state` equals `Colorado`. This Hive query returns a Spark DataFrame and is saved in the Hive table `sampletable_colorado` using the `write` function.

```scala
- hive.table("hivesampletable").filter("state = 'Colorado'").write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).option("table","sampletable_colorado").save()
+ hive.table("hivesampletable").filter("state = 'Colorado'").write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").mode("append").option("table","sampletable_colorado").save()
```

1. View the results with the following command:
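    The command itself falls outside this hunk; a minimal sketch of viewing the results, assuming the `hive` HiveWarehouseSession from the earlier steps (the exact call in the article may differ):

    ```scala
    // Read back the table that was just written through the Hive Warehouse
    // Connector session (`hive`) and print a few rows.
    hive.table("sampletable_colorado").show()
    ```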
@@ -76,7 +76,7 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests
1. Then write the streaming data to the newly created table using the following command:

```scala
- lines.filter("value = 'HiveSpark'").writeStream.format(HiveWarehouseSession.STREAM_TO_STREAM).option("database", "default").option("table","stream_table").option("metastoreUri",spark.conf.get("spark.datasource.hive.warehouse.metastoreUri")).option("checkpointLocation","/tmp/checkpoint1").start()
+ lines.filter("value = 'HiveSpark'").writeStream.format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource").option("database", "default").option("table","stream_table").option("metastoreUri",spark.conf.get("spark.datasource.hive.warehouse.metastoreUri")).option("checkpointLocation","/tmp/checkpoint1").start()
```

>[!Important]
@@ -98,6 +98,8 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests
Use **Ctrl + C** to stop netcat on the second SSH session. Use `:q` to exit spark-shell on the first SSH session.

+ **NOTE:** In ESP-enabled Spark 4.0 clusters, structured streaming writes are not supported.

## Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

articles/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md

Lines changed: 17 additions & 8 deletions
@@ -9,17 +9,29 @@ ms.topic: conceptual
ms.date: 01/05/2020
---

- # Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight
+ # Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight

HDInsight Spark clusters include Apache Zeppelin notebooks with different interpreters. In this article, we will focus only on the Livy interpreter to access Hive tables from Spark using Hive Warehouse Connector.

- ## Prerequisites
+ ## Prerequisite

- 1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+ Complete the [Hive Warehouse Connector setup](apache-hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

- 1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+ ## Getting started

- 1. Setup up both Spark and Hive clusters with HWC using this document. See [Setup HDInsight clusters with Hive Warehouse Connector](./apache-hive-warehouse-connector.md#configure-hwc-settings)
+ 1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

+ 1. From your ssh session, execute the following command to note the versions for `hive-warehouse-connector-assembly` and `pyspark_hwc`:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

+ Save the output for later use when configuring Apache Zeppelin.

## Configure Livy
@@ -68,13 +80,10 @@ Following configurations are required to be able to access hive tables from Zepp
| livy.spark.submit.pyFiles | file:///usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<STACK_VERSION>.zip |
| livy.spark.sql.hive.hiveserver2.jdbc.url.principal | `hive/<headnode-FQDN>@<AAD-Domain>` (Needed only for ESP clusters) |
| livy.spark.sql.hive.hiveserver2.jdbc.url | Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. REPLACE `LLAPCLUSTERNAME` with the name of your Interactive Query cluster |
- | zeppelin.livy.url | http://{headnode-FQDN}:8998 |
| spark.security.credentials.hiveserver2.enabled | true |

In the `hive/<headnode-FQDN>@<AAD-Domain>` service principal, replace `<headnode-FQDN>` with the Fully Qualified Domain Name of the head node host of the Interactive Query cluster. Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value, otherwise the credential won't be found. Check /etc/krb5.conf for the realm names if needed.

- For `zeppelin.livy.url` configuration, `headnode-FQDN` denotes the Fully Qualified Domain Name of the head node host of the Spark cluster.

* Save the changes and restart the Livy interpreter.

NOTE: If the Livy interpreter is not accessible, modify the `shiro.ini` file present within the Zeppelin component in Ambari. Refer to this [document](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-zeppelin-security/content/enabling_access_control_for_interpreter__configuration__and_credential_settings.html).
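
Once the Livy interpreter restarts with these settings, a Zeppelin paragraph on the `%livy2` interpreter can build a Hive Warehouse session and query Hive directly. A minimal sketch, assuming the configurations above are in place (the table name is only an example):

```scala
// Runs in a Zeppelin %livy2 paragraph; `spark` is the SparkSession provided by Livy.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Sanity check that the JDBC URL and metastore settings resolve correctly.
hive.showDatabases().show()

// Query a Hive table through HiveServer2 Interactive (LLAP).
hive.executeQuery("SELECT * FROM hivesampletable LIMIT 10").show()
```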

articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md

Lines changed: 30 additions & 15 deletions
@@ -43,21 +43,36 @@ Hive Warehouse Connector needs separate clusters for Spark and Interactive Query
### Configure HWC settings

- * From Ambari web UI of Spark cluster, navigate to **Spark2** > **CONFIGS** > **Custom spark2-defaults**.
+ #### Gather preliminary information

- ![Apache Ambari Spark2 configuration](./media/apache-hive-warehouse-connector/hive-warehouse-connector-spark2-ambari.png)
+ 1. From a web browser, navigate to `https://LLAPCLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/configs` where LLAPCLUSTERNAME is the name of your Interactive Query cluster.

- * Select **Add Property...** as needed to add/update the following.
+ 1. Navigate to **Advanced** > **General** > **hive.metastore.uris** and note the value. The value may be similar to: `thrift://iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083,thrift://hn1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083`.

- | Configuration | Value |
- |----|----|
- |`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. REPLACE `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. |
- |`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
- |`spark.datasource.hive.warehouse.metastoreUri`| The value of **hive.metastore.uris** Interactive Query cluster |
- |`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode |
- |`spark.hadoop.hive.zookeeper.quorum`| The value of **hive.zookeeper.quorum** of Interactive Query cluster |
+ 1. Navigate to **Advanced** > **Advanced hive-site** > **hive.zookeeper.quorum** and note the value. The value may be similar to: `zk0-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk4-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181`.

- * Save changes and restart components as needed.
+ 1. Navigate to **Advanced** > **Advanced hive-interactive-site** > **hive.llap.daemon.service.hosts** and note the value. The value may be similar to: `@llap0`.

+ #### Configure Spark cluster settings

+ 1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/SPARK2/configs` where CLUSTERNAME is the name of your Apache Spark cluster.

+ 1. Expand **Custom spark2-defaults**.

+ ![Apache Ambari Spark2 configuration](./media/apache-hive-warehouse-connector/hive-warehouse-connector-spark2-ambari.png)

+ 1. Select **Add Property...** to add the following configurations:

| Configuration | Value |
|----|----|
|`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. <br>Replace `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. Replace `PWD` with the actual password.|
|`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. <br> Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
|`spark.datasource.hive.warehouse.metastoreUri`| The value you obtained earlier from **hive.metastore.uris**. |
|`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode. |
|`spark.hadoop.hive.zookeeper.quorum`| The value you obtained earlier from **hive.zookeeper.quorum**. |
|`spark.hadoop.hive.llap.daemon.service.hosts`| The value you obtained earlier from **hive.llap.daemon.service.hosts**. |

+ 1. Save changes and restart all affected components.
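
    As a quick sanity check after the restart (not part of the original article), a spark-shell session can read these settings back from the runtime configuration:

    ```scala
    // Confirm from spark-shell that the Custom spark2-defaults values added above
    // were picked up by the cluster after the restart.
    val hwcKeys = Seq(
      "spark.sql.hive.hiveserver2.jdbc.url",
      "spark.datasource.hive.warehouse.metastoreUri",
      "spark.hadoop.hive.zookeeper.quorum",
      "spark.hadoop.hive.llap.daemon.service.hosts"
    )
    hwcKeys.foreach(key => println(s"$key = ${spark.conf.get(key, "<not set>")}"))
    ```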
### Configure HWC for Enterprise Security Package (ESP) clusters

@@ -85,7 +100,7 @@ You can choose between a few different methods to connect to your Interactive Qu
* [Spark-shell / PySpark](../spark/apache-spark-shell.md)
* [Spark-submit](#spark-submit)
- * [Zeppelin](../spark/apache-spark-zeppelin-notebook.md)
+ * [Zeppelin](./apache-hive-warehouse-connector-zeppelin-livy.md)

Below are some examples to connect to HWC from Spark.
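
Those examples fall outside this hunk. As a minimal sketch of the usual starting point (the database and table names here are only placeholders), a spark-shell session builds the Hive Warehouse session from the settings configured earlier:

```scala
// spark-shell sketch: build a Hive Warehouse session and run basic DDL/queries.
// `spark` is the shell's SparkSession; hwc_demo/demo are placeholder names.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

hive.createDatabase("hwc_demo", true)   // true = ignore if the database exists
hive.setDatabase("hwc_demo")
hive.executeUpdate("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)")
hive.executeQuery("SELECT * FROM demo").show()
```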
@@ -145,7 +160,7 @@ Firstly, SSH into the headnode of the Apache Spark cluster. For more information
To learn about the spark operations supported by HWC, please follow this [document](./apache-hive-warehouse-connector-supported-spark-operations.md).

- #### Run queries on Enterprise Security Package (ESP) clusters
+ ## Run queries on Enterprise Security Package (ESP) clusters

* Before initiating the spark-shell or spark-submit, execute the following command.

@@ -190,9 +205,9 @@ To learn about the spark operations supported by HWC, please follow this [docume
## Next steps

- * [Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
+ * [Apache Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
* [Use Interactive Query with HDInsight](https://docs.microsoft.com/azure/hdinsight/interactive-query/apache-interactive-query-get-started)
- * [Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
+ * [Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
* [Examples of interacting with Hive Warehouse Connector using Zeppelin, Livy, spark-submit, and pyspark](https://community.hortonworks.com/articles/223626/integrating-apache-hive-with-apache-spark-hive-war.html)

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
