
Commit c958fdd

committed
Incorporated Code Review Comments including formatting changes
1 parent e938862 commit c958fdd

5 files changed (+241, -213 lines)


articles/hdinsight/TOC.yml

Lines changed: 6 additions & 6 deletions
@@ -762,12 +762,12 @@
      href: ./hadoop/apache-hadoop-hive-pig-udf-dotnet-csharp.md
    - name: Use Python with Apache Hive and Apache Pig
      href: ./hadoop/python-udf-hdinsight.md
-   - name: Apache Hive with Apache Spark
-     href: ./interactive-query/apache-hive-warehouse-connector.md
-   - name: Spark operations supported by Hive Warehouse Connector
-     href: ./interactive-query/apache-hive-warehouse-connector-supported-spark-operations.md
-   - name: Use Apache Zeppelin with Hive Warehouse Connector
-     href: ./interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md
+   - name: HWC integration with Apache Spark and Apache Hive
+     href: ./interactive-query/hive-warehouse-connector.md
+   - name: HWC and Apache Spark operations
+     href: ./interactive-query/hive-warehouse-connector-operations.md
+   - name: HWC integration with Apache Zeppelin
+     href: ./interactive-query/hive-warehouse-connector-zeppelin.md
    - name: Apache Hive with Hadoop
      href: ./hadoop/hdinsight-use-hive.md
    - name: Use the Apache Hive View

articles/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md

Lines changed: 0 additions & 133 deletions
This file was deleted.
articles/hdinsight/interactive-query/hive-warehouse-connector-operations.md

Lines changed: 48 additions & 17 deletions
@@ -1,17 +1,52 @@
---
-title: Spark operations supported by Hive Warehouse Connector - Azure HDInsight
+title: Apache Spark operations supported by Hive Warehouse Connector in Azure HDInsight
description: Learn about the different capabilities of Hive Warehouse Connector on Azure HDInsight.
author: nis-goel
ms.author: nisgoel
-ms.reviewer: hrasheed
+ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
-ms.date: 01/05/2020
+ms.date: 05/22/2020
---

-# Spark operations supported by Hive Warehouse Connector on Azure HDInsight
+# Apache Spark operations supported by Hive Warehouse Connector in Azure HDInsight

-The article shows different spark based operations supported by HWC. All examples shown below will be executed through spark-shell.
+This article shows Spark-based operations supported by Hive Warehouse Connector (HWC). All examples shown below are executed through the Apache Spark shell.

## Prerequisite

Complete the [Hive Warehouse Connector setup](./hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

## Getting started

To start a spark-shell session, do the following steps:

1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

1. From your ssh session, execute the following command to note the `hive-warehouse-connector-assembly` version:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

1. Edit the code below with the `hive-warehouse-connector-assembly` version identified above. Then execute the command to start the spark shell:

```bash
spark-shell --master yarn \
--jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<STACK_VERSION>.jar \
--conf spark.security.credentials.hiveserver2.enabled=false
```

1. After starting the spark-shell, you can start a Hive Warehouse Connector instance with the following commands (a quick sanity check is shown after these steps):

```scala
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
```
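
A quick way to confirm that the session above is wired up correctly is to list the databases HWC can see. This is a minimal, hedged sketch that only reuses the `HiveWarehouseSession` API built in the previous step:

```scala
// List the databases visible through HWC. If this returns rows, the connector
// can reach HiveServer2 Interactive on the Interactive Query cluster.
hive.showDatabases().show()
```
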
## Creating Spark DataFrames using Hive queries
@@ -49,15 +84,17 @@ Spark doesn't natively support writing to Hive's managed ACID tables. However,us
![hive warehouse connector show hive table](./media/apache-hive-warehouse-connector/hive-warehouse-connector-show-hive-table.png)

## Structured streaming writes

Using Hive Warehouse Connector, you can use Spark streaming to write data into Hive tables.

-Follow the steps below to create a Hive Warehouse Connector example that ingests data from a Spark stream on localhost port 9999 into a Hive table.
+> [!IMPORTANT]
+> Structured streaming writes are not supported in ESP-enabled Spark 4.0 clusters.

-1. Follow the steps under [Connecting and running queries](./apache-hive-warehouse-connector.md#connecting-and-running-queries) to trigger the spark-shell.
+Follow the steps below to ingest data from a Spark stream on localhost port 9999 into a Hive table via Hive Warehouse Connector.

-1. Begin the spark stream with the following command:
+1. From your open Spark shell, begin a Spark stream with the following command:

```scala
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", 9999).load()
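// The remaining steps of this example are elided by the diff excerpt. As a hedged
// sketch only (the filter value, database and table names, and sink options are
// assumptions to be checked against the HWC documentation), the stream produced by
// `lines` is typically fed from netcat in a second SSH session and written to a
// Hive table through the HWC streaming data source, along these lines:
lines.filter("value = 'HiveSpark'")
  .writeStream
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .option("database", "default")
  .option("table", "stream_table")
  .option("metastoreUri", spark.conf.get("spark.datasource.hive.warehouse.metastoreUri"))
  .start()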
@@ -98,14 +135,8 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests

Use **Ctrl + C** to stop netcat on the second SSH session. Use `:q` to exit spark-shell on the first SSH session.

-**NOTE:** In ESP enabled Spark 4.0 clusters, structured streaming writes are not supported.

## Next steps

-If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
-
-* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
-
-* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.
-
-* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, please review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-portal/supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
+* [HWC integration with Apache Spark and Apache Hive](./hive-warehouse-connector.md)
+* [Use Interactive Query with HDInsight](./apache-interactive-query-get-started.md)
+* [HWC integration with Apache Zeppelin](./hive-warehouse-connector-zeppelin.md)
articles/hdinsight/interactive-query/hive-warehouse-connector-zeppelin.md

Lines changed: 134 additions & 0 deletions

@@ -0,0 +1,134 @@
---
title: Hive Warehouse Connector - Apache Zeppelin using Livy - Azure HDInsight
description: Learn how to integrate Hive Warehouse Connector with Apache Zeppelin on Azure HDInsight.
author: nis-goel
ms.author: nisgoel
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.date: 05/22/2020
---

# Integrate Apache Zeppelin with Hive Warehouse Connector in Azure HDInsight

HDInsight Spark clusters include Apache Zeppelin notebooks with different interpreters. In this article, we'll focus only on the Livy interpreter to access Hive tables from Spark using Hive Warehouse Connector.

## Prerequisite

Complete the [Hive Warehouse Connector setup](hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

## Getting started

1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

1. From your ssh session, execute the following command to note the versions for `hive-warehouse-connector-assembly` and `pyspark_hwc`:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

Save the output for later use when configuring Apache Zeppelin.

## Configure Livy
The following configurations are required to access Hive tables from Zeppelin with the Livy interpreter.

### Interactive Query Cluster

1. From a web browser, navigate to `https://LLAPCLUSTERNAME.azurehdinsight.net/#/main/services/HDFS/configs` where LLAPCLUSTERNAME is the name of your Interactive Query cluster.

1. Navigate to **Advanced** > **Custom core-site**. Select **Add Property...** to add the following configurations:

    | Configuration | Value |
    | ----------------------------- |-------|
    | hadoop.proxyuser.livy.groups | * |
    | hadoop.proxyuser.livy.hosts | * |

1. Save changes and restart all affected components.

### Spark Cluster

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/SPARK2/configs` where CLUSTERNAME is the name of your Apache Spark cluster.

1. Expand **Custom livy2-conf**. Select **Add Property...** to add the following configuration:

    | Configuration | Value |
    | ----------------------------- |------------------------------------------ |
    | livy.file.local-dir-whitelist | /usr/hdp/current/hive_warehouse_connector/ |

1. Save changes and restart all affected components.

### Configure Livy Interpreter in Zeppelin UI (Spark Cluster)

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/zeppelin/#/interpreter`, where `CLUSTERNAME` is the name of your Apache Spark cluster.

1. Navigate to **livy2**.

1. Add the following configurations:

    | Configuration | Value |
    | ----------------------------- |:------------------------------------------:|
    | livy.spark.hadoop.hive.llap.daemon.service.hosts | @llap0 |
    | livy.spark.security.credentials.hiveserver2.enabled | true |
    | livy.spark.sql.hive.llap | true |
    | livy.spark.yarn.security.credentials.hiveserver2.enabled | true |
    | livy.superusers | livy,zeppelin |
    | livy.spark.jars | `file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-VERSION.jar`.<br>Replace VERSION with the value you obtained from [Getting started](#getting-started), earlier. |
    | livy.spark.submit.pyFiles | `file:///usr/hdp/current/hive_warehouse_connector/pyspark_hwc-VERSION.zip`.<br>Replace VERSION with the value you obtained from [Getting started](#getting-started), earlier. |
    | livy.spark.sql.hive.hiveserver2.jdbc.url | Set it to the HiveServer2 Interactive JDBC URL of the Interactive Query cluster. |
    | spark.security.credentials.hiveserver2.enabled | true |

1. For ESP clusters only, add the following configuration:

    | Configuration| Value|
    |---|---|
    | livy.spark.sql.hive.hiveserver2.jdbc.url.principal | `hive/<headnode-FQDN>@<AAD-DOMAIN>` |

    Replace `<headnode-FQDN>` with the Fully Qualified Domain Name of the head node of the Interactive Query cluster.
    Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) domain that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value, otherwise the credential won't be found; a hypothetical example is `hive/hn0-llap.contoso.com@CONTOSO.ONMICROSOFT.COM`. Check `/etc/krb5.conf` for the realm names if needed.

1. Save the changes and restart the Livy interpreter.

If the Livy interpreter isn't accessible, modify the `shiro.ini` file present within the Zeppelin component in Ambari. For more information, see [Configuring Apache Zeppelin Security](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-zeppelin-security/content/enabling_access_control_for_interpreter__configuration__and_credential_settings.html).

## Running Queries in Zeppelin

Launch a Zeppelin notebook using the Livy interpreter and execute the following code:

```scala
%livy2

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
import org.apache.spark.sql.SaveMode

// Initialize the hive context
val hive = HiveWarehouseSession.session(spark).build()

// Create a database
hive.createDatabase("hwc_db", true)
hive.setDatabase("hwc_db")

// Create a Hive table
hive.createTable("testers").ifNotExists().column("id", "bigint").column("name", "string").create()

val dataDF = Seq( (1, "foo"), (2, "bar"), (8, "john")).toDF("id", "name")

// Validate writes to the table
dataDF.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").mode("append").option("table", "hwc_db.testers").save()

// Validate reads
hive.executeQuery("select * from testers").show()
```
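
As a hedged follow-up check in the same Livy session (it only reuses the `hive` session and the `hwc_db` database created above), you can list the tables HWC sees to confirm the write landed:

```scala
%livy2

// List the tables in hwc_db; "testers" should appear if the steps above succeeded.
hive.setDatabase("hwc_db")
hive.showTables().show()
```
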
## Next steps

* [HWC and Apache Spark operations](./hive-warehouse-connector-operations.md)
* [HWC integration with Apache Spark and Apache Hive](./hive-warehouse-connector.md)
* [Use Interactive Query with HDInsight](./apache-interactive-query-get-started.md)
