
Commit c958fdd

committed
Incorporated Code Review Comments including formatting changes
1 parent e938862 commit c958fdd

5 files changed (+241, -213 lines)


articles/hdinsight/TOC.yml

Lines changed: 6 additions & 6 deletions
@@ -762,12 +762,12 @@
      href: ./hadoop/apache-hadoop-hive-pig-udf-dotnet-csharp.md
    - name: Use Python with Apache Hive and Apache Pig
      href: ./hadoop/python-udf-hdinsight.md
-   - name: Apache Hive with Apache Spark
-     href: ./interactive-query/apache-hive-warehouse-connector.md
-   - name: Spark operations supported by Hive Warehouse Connector
-     href: ./interactive-query/apache-hive-warehouse-connector-supported-spark-operations.md
-   - name: Use Apache Zeppelin with Hive Warehouse Connector
-     href: ./interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md
+   - name: HWC integration with Apache Spark and Apache Hive
+     href: ./interactive-query/hive-warehouse-connector.md
+   - name: HWC and Apache Spark operations
+     href: ./interactive-query/hive-warehouse-connector-operations.md
+   - name: HWC integration with Apache Zeppelin
+     href: ./interactive-query/hive-warehouse-connector-zeppelin.md
    - name: Apache Hive with Hadoop
      href: ./hadoop/hdinsight-use-hive.md
    - name: Use the Apache Hive View

articles/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md

Lines changed: 0 additions & 133 deletions
This file was deleted.
articles/hdinsight/interactive-query/hive-warehouse-connector-operations.md

Lines changed: 48 additions & 17 deletions
@@ -1,17 +1,52 @@
---
-title: Spark operations supported by Hive Warehouse Connector - Azure HDInsight
+title: Apache Spark operations supported by Hive Warehouse Connector in Azure HDInsight
description: Learn about the different capabilities of Hive Warehouse Connector on Azure HDInsight.
author: nis-goel
ms.author: nisgoel
-ms.reviewer: hrasheed
+ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
-ms.date: 01/05/2020
+ms.date: 05/22/2020
---

-# Spark operations supported by Hive Warehouse Connector on Azure HDInsight
+# Apache Spark operations supported by Hive Warehouse Connector in Azure HDInsight

-The article shows different spark based operations supported by HWC. All examples shown below will be executed through spark-shell.
+This article shows Spark-based operations supported by Hive Warehouse Connector (HWC). All examples shown below are executed through the Apache Spark shell.

## Prerequisite

Complete the [Hive Warehouse Connector setup](./hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

## Getting started

To start a spark-shell session, do the following steps:

1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

1. From your ssh session, execute the following command to note the `hive-warehouse-connector-assembly` version:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

1. Edit the code below with the `hive-warehouse-connector-assembly` version identified above. Then execute the command to start the spark shell:

```bash
spark-shell --master yarn \
--jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<STACK_VERSION>.jar \
--conf spark.security.credentials.hiveserver2.enabled=false
```

1. After starting the spark-shell, you can start a Hive Warehouse Connector instance with the following commands (a quick sanity check is shown after these steps):

```scala
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
```
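
A quick way to confirm that the session above is wired up correctly is to list the databases HWC can see. This is a minimal, hedged sketch that only reuses the `HiveWarehouseSession` API built in the previous step:

```scala
// List the databases visible through HWC. If this returns rows, the connector
// can reach HiveServer2 Interactive on the Interactive Query cluster.
hive.showDatabases().show()
```
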
## Creating Spark DataFrames using Hive queries
@@ -49,15 +84,17 @@ Spark doesn't natively support writing to Hive's managed ACID tables. However,us
![hive warehouse connector show hive table](./media/apache-hive-warehouse-connector/hive-warehouse-connector-show-hive-table.png)

## Structured streaming writes

Using Hive Warehouse Connector, you can use Spark streaming to write data into Hive tables.

-Follow the steps below to create a Hive Warehouse Connector example that ingests data from a Spark stream on localhost port 9999 into a Hive table.
+> [!IMPORTANT]
+> Structured streaming writes are not supported in ESP-enabled Spark 4.0 clusters.

-1. Follow the steps under [Connecting and running queries](./apache-hive-warehouse-connector.md#connecting-and-running-queries) to trigger the spark-shell.
+Follow the steps below to ingest data from a Spark stream on localhost port 9999 into a Hive table via Hive Warehouse Connector.

-1. Begin the spark stream with the following command:
+1. From your open Spark shell, begin a Spark stream with the following command:

```scala
val lines = spark.readStream.format("socket").option("host", "localhost").option("port", 9999).load()
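// The remaining steps of this example are elided by the diff excerpt. As a hedged
// sketch only (the filter value, database and table names, and sink options are
// assumptions to be checked against the HWC documentation), the stream produced by
// `lines` is typically fed from netcat in a second SSH session and written to a
// Hive table through the HWC streaming data source, along these lines:
lines.filter("value = 'HiveSpark'")
  .writeStream
  .format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource")
  .option("database", "default")
  .option("table", "stream_table")
  .option("metastoreUri", spark.conf.get("spark.datasource.hive.warehouse.metastoreUri"))
  .start()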
@@ -98,14 +135,8 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests

Use **Ctrl + C** to stop netcat on the second SSH session. Use `:q` to exit spark-shell on the first SSH session.

-**NOTE:** In ESP enabled Spark 4.0 clusters, structured streaming writes are not supported.

## Next steps

-If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
-
-* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
-
-* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience by connecting the Azure community to the right resources: answers, support, and experts.
-
-* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, please review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-portal/supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
+* [HWC integration with Apache Spark and Apache Hive](./hive-warehouse-connector.md)
+* [Use Interactive Query with HDInsight](./apache-interactive-query-get-started.md)
+* [HWC integration with Apache Zeppelin](./hive-warehouse-connector-zeppelin.md)
articles/hdinsight/interactive-query/hive-warehouse-connector-zeppelin.md

Lines changed: 134 additions & 0 deletions

@@ -0,0 +1,134 @@
---
title: Hive Warehouse Connector - Apache Zeppelin using Livy - Azure HDInsight
description: Learn how to integrate Hive Warehouse Connector with Apache Zeppelin on Azure HDInsight.
author: nis-goel
ms.author: nisgoel
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: conceptual
ms.date: 05/22/2020
---

# Integrate Apache Zeppelin with Hive Warehouse Connector in Azure HDInsight

HDInsight Spark clusters include Apache Zeppelin notebooks with different interpreters. In this article, we'll focus only on the Livy interpreter to access Hive tables from Spark using Hive Warehouse Connector.

## Prerequisite

Complete the [Hive Warehouse Connector setup](hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

## Getting started

1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

1. From your ssh session, execute the following command to note the versions for `hive-warehouse-connector-assembly` and `pyspark_hwc`:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

Save the output for later use when configuring Apache Zeppelin.

## Configure Livy
The following configurations are required to access Hive tables from Zeppelin with the Livy interpreter.

### Interactive Query Cluster

1. From a web browser, navigate to `https://LLAPCLUSTERNAME.azurehdinsight.net/#/main/services/HDFS/configs` where LLAPCLUSTERNAME is the name of your Interactive Query cluster.

1. Navigate to **Advanced** > **Custom core-site**. Select **Add Property...** to add the following configurations:

    | Configuration | Value |
    | ----------------------------- |-------|
    | hadoop.proxyuser.livy.groups | * |
    | hadoop.proxyuser.livy.hosts | * |

1. Save changes and restart all affected components.

### Spark Cluster

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/SPARK2/configs` where CLUSTERNAME is the name of your Apache Spark cluster.

1. Expand **Custom livy2-conf**. Select **Add Property...** to add the following configuration:

    | Configuration | Value |
    | ----------------------------- |------------------------------------------ |
    | livy.file.local-dir-whitelist | /usr/hdp/current/hive_warehouse_connector/ |

1. Save changes and restart all affected components.

### Configure Livy Interpreter in Zeppelin UI (Spark Cluster)

1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/zeppelin/#/interpreter`, where `CLUSTERNAME` is the name of your Apache Spark cluster.

1. Navigate to **livy2**.

1. Add the following configurations:

    | Configuration | Value |
    | ----------------------------- |:------------------------------------------:|
    | livy.spark.hadoop.hive.llap.daemon.service.hosts | @llap0 |
    | livy.spark.security.credentials.hiveserver2.enabled | true |
    | livy.spark.sql.hive.llap | true |
    | livy.spark.yarn.security.credentials.hiveserver2.enabled | true |
    | livy.superusers | livy,zeppelin |
    | livy.spark.jars | `file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-VERSION.jar`.<br>Replace VERSION with the value you obtained from [Getting started](#getting-started), earlier. |
    | livy.spark.submit.pyFiles | `file:///usr/hdp/current/hive_warehouse_connector/pyspark_hwc-VERSION.zip`.<br>Replace VERSION with the value you obtained from [Getting started](#getting-started), earlier. |
    | livy.spark.sql.hive.hiveserver2.jdbc.url | Set it to the HiveServer2 Interactive JDBC URL of the Interactive Query cluster. |
    | spark.security.credentials.hiveserver2.enabled | true |

1. For ESP clusters only, add the following configuration:

    | Configuration| Value|
    |---|---|
    | livy.spark.sql.hive.hiveserver2.jdbc.url.principal | `hive/<headnode-FQDN>@<AAD-DOMAIN>` |

    Replace `<headnode-FQDN>` with the Fully Qualified Domain Name of the head node of the Interactive Query cluster.
    Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) domain that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value, otherwise the credential won't be found; a hypothetical example is `hive/hn0-llap.contoso.com@CONTOSO.ONMICROSOFT.COM`. Check `/etc/krb5.conf` for the realm names if needed.

1. Save the changes and restart the Livy interpreter.

If the Livy interpreter isn't accessible, modify the `shiro.ini` file present within the Zeppelin component in Ambari. For more information, see [Configuring Apache Zeppelin Security](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-zeppelin-security/content/enabling_access_control_for_interpreter__configuration__and_credential_settings.html).

## Running Queries in Zeppelin

Launch a Zeppelin notebook using the Livy interpreter and execute the following code:

```scala
%livy2

import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._
import org.apache.spark.sql.SaveMode

// Initialize the hive context
val hive = HiveWarehouseSession.session(spark).build()

// Create a database
hive.createDatabase("hwc_db", true)
hive.setDatabase("hwc_db")

// Create a Hive table
hive.createTable("testers").ifNotExists().column("id", "bigint").column("name", "string").create()

val dataDF = Seq( (1, "foo"), (2, "bar"), (8, "john")).toDF("id", "name")

// Validate writes to the table
dataDF.write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").mode("append").option("table", "hwc_db.testers").save()

// Validate reads
hive.executeQuery("select * from testers").show()
```
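
As a hedged follow-up check in the same Livy session (it only reuses the `hive` session and the `hwc_db` database created above), you can list the tables HWC sees to confirm the write landed:

```scala
%livy2

// List the tables in hwc_db; "testers" should appear if the steps above succeeded.
hive.setDatabase("hwc_db")
hive.showTables().show()
```
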
## Next steps

* [HWC and Apache Spark operations](./hive-warehouse-connector-operations.md)
* [HWC integration with Apache Spark and Apache Hive](./hive-warehouse-connector.md)
* [Use Interactive Query with HDInsight](./apache-interactive-query-get-started.md)
