
Commit 6dff5f1

Updated a few configurations
1 parent f45d0d2 commit 6dff5f1

3 files changed: +51 -25 lines


articles/hdinsight/interactive-query/apache-hive-warehouse-connector-supported-spark-operations.md

Lines changed: 4 additions & 2 deletions
@@ -38,7 +38,7 @@ Spark doesn't natively support writing to Hive's managed ACID tables. However,us
1. Filter the table `hivesampletable` where the column `state` equals `Colorado`. This Hive query returns a Spark DataFrame and is saved in the Hive table `sampletable_colorado` using the `write` function.

```scala
- hive.table("hivesampletable").filter("state = 'Colorado'").write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).option("table","sampletable_colorado").save()
+ hive.table("hivesampletable").filter("state = 'Colorado'").write.format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector").mode("append").option("table","sampletable_colorado").save()
```

1. View the results with the following command:
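    The command itself falls outside this hunk; a minimal sketch of viewing the results, assuming the `hive` HiveWarehouseSession from the earlier steps (the exact call in the article may differ):

    ```scala
    // Read back the table that was just written through the Hive Warehouse
    // Connector session (`hive`) and print a few rows.
    hive.table("sampletable_colorado").show()
    ```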
@@ -76,7 +76,7 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests
1. Then write the streaming data to the newly created table using the following command:

```scala
- lines.filter("value = 'HiveSpark'").writeStream.format(HiveWarehouseSession.STREAM_TO_STREAM).option("database", "default").option("table","stream_table").option("metastoreUri",spark.conf.get("spark.datasource.hive.warehouse.metastoreUri")).option("checkpointLocation","/tmp/checkpoint1").start()
+ lines.filter("value = 'HiveSpark'").writeStream.format("com.hortonworks.spark.sql.hive.llap.streaming.HiveStreamingDataSource").option("database", "default").option("table","stream_table").option("metastoreUri",spark.conf.get("spark.datasource.hive.warehouse.metastoreUri")).option("checkpointLocation","/tmp/checkpoint1").start()
```

>[!Important]
@@ -98,6 +98,8 @@ Follow the steps below to create a Hive Warehouse Connector example that ingests
Use **Ctrl + C** to stop netcat on the second SSH session. Use `:q` to exit spark-shell on the first SSH session.

+ **NOTE:** In ESP-enabled Spark 4.0 clusters, structured streaming writes are not supported.

## Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

articles/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin-livy.md

Lines changed: 17 additions & 8 deletions
@@ -9,17 +9,29 @@ ms.topic: conceptual
ms.date: 01/05/2020
---

- # Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight
+ # Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight

HDInsight Spark clusters include Apache Zeppelin notebooks with different interpreters. In this article, we will focus only on the Livy interpreter to access Hive tables from Spark using Hive Warehouse Connector.

- ## Prerequisites
+ ## Prerequisite

- 1. Create an HDInsight Spark **4.0** cluster with a storage account and a custom Azure virtual network. For information on creating a cluster in an Azure virtual network, see [Add HDInsight to an existing virtual network](../../hdinsight/hdinsight-plan-virtual-network-deployment.md#existingvnet).
+ Complete the [Hive Warehouse Connector setup](apache-hive-warehouse-connector.md#hive-warehouse-connector-setup) steps.

- 1. Create an HDInsight Interactive Query (LLAP) **4.0** cluster with the same storage account and Azure virtual network as the Spark cluster.
+ ## Getting started

- 1. Setup up both Spark and Hive clusters with HWC using this document. See [Setup HDInsight clusters with Hive Warehouse Connector](./apache-hive-warehouse-connector.md#configure-hwc-settings)
+ 1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:

```cmd
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
```

+ 1. From your ssh session, execute the following command to note the versions for `hive-warehouse-connector-assembly` and `pyspark_hwc`:

```bash
ls /usr/hdp/current/hive_warehouse_connector
```

+ Save the output for later use when configuring Apache Zeppelin.

## Configure Livy
@@ -68,13 +80,10 @@ Following configurations are required to be able to access hive tables from Zepp
| livy.spark.submit.pyFiles | file:///usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<STACK_VERSION>.zip |
| livy.spark.sql.hive.hiveserver2.jdbc.url.principal | `hive/<headnode-FQDN>@<AAD-Domain>` (Needed only for ESP clusters) |
| livy.spark.sql.hive.hiveserver2.jdbc.url | Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. REPLACE `LLAPCLUSTERNAME` with the name of your Interactive Query cluster |
- | zeppelin.livy.url | http://{headnode-FQDN}:8998 |
| spark.security.credentials.hiveserver2.enabled | true |

In the `hive/<headnode-FQDN>@<AAD-Domain>` service principal, replace `<headnode-FQDN>` with the Fully Qualified Domain Name of the head node host of the Interactive Query cluster. Replace `<AAD-DOMAIN>` with the name of the Azure Active Directory (AAD) that the cluster is joined to. Use an uppercase string for the `<AAD-DOMAIN>` value, otherwise the credential won't be found. Check /etc/krb5.conf for the realm names if needed.

- For `zeppelin.livy.url` configuration, `headnode-FQDN` denotes the Fully Qualified Domain Name of the head node host of the Spark cluster.

* Save the changes and restart the Livy interpreter.

NOTE: If the Livy interpreter is not accessible, modify the `shiro.ini` file present within the Zeppelin component in Ambari. Refer to this [document](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.0.1/configuring-zeppelin-security/content/enabling_access_control_for_interpreter__configuration__and_credential_settings.html).
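
Once the Livy interpreter restarts with these settings, a Zeppelin paragraph on the `%livy2` interpreter can build a Hive Warehouse session and query Hive directly. A minimal sketch, assuming the configurations above are in place (the table name is only an example):

```scala
// Runs in a Zeppelin %livy2 paragraph; `spark` is the SparkSession provided by Livy.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

// Sanity check that the JDBC URL and metastore settings resolve correctly.
hive.showDatabases().show()

// Query a Hive table through HiveServer2 Interactive (LLAP).
hive.executeQuery("SELECT * FROM hivesampletable LIMIT 10").show()
```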

articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md

Lines changed: 30 additions & 15 deletions
@@ -43,21 +43,36 @@ Hive Warehouse Connector needs separate clusters for Spark and Interactive Query
### Configure HWC settings

- * From Ambari web UI of Spark cluster, navigate to **Spark2** > **CONFIGS** > **Custom spark2-defaults**.
+ #### Gather preliminary information

- ![Apache Ambari Spark2 configuration](./media/apache-hive-warehouse-connector/hive-warehouse-connector-spark2-ambari.png)
+ 1. From a web browser, navigate to `https://LLAPCLUSTERNAME.azurehdinsight.net/#/main/services/HIVE/configs` where LLAPCLUSTERNAME is the name of your Interactive Query cluster.

- * Select **Add Property...** as needed to add/update the following.
+ 1. Navigate to **Advanced** > **General** > **hive.metastore.uris** and note the value. The value may be similar to: `thrift://iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083,thrift://hn1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:9083`.

- | Configuration | Value |
- |----|----|
- |`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. Set it to the HiveServer2 JDBC connection string of the Interactive Query cluster. REPLACE `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. |
- |`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
- |`spark.datasource.hive.warehouse.metastoreUri`| The value of **hive.metastore.uris** Interactive Query cluster |
- |`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode |
- |`spark.hadoop.hive.zookeeper.quorum`| The value of **hive.zookeeper.quorum** of Interactive Query cluster |
+ 1. Navigate to **Advanced** > **Advanced hive-site** > **hive.zookeeper.quorum** and note the value. The value may be similar to: `zk0-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk1-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181,zk4-iqgiro.rekufuk2y2cezcbowjkbwfnyvd.bx.internal.cloudapp.net:2181`.

- * Save changes and restart components as needed.
+ 1. Navigate to **Advanced** > **Advanced hive-interactive-site** > **hive.llap.daemon.service.hosts** and note the value. The value may be similar to: `@llap0`.

+ #### Configure Spark cluster settings

+ 1. From a web browser, navigate to `https://CLUSTERNAME.azurehdinsight.net/#/main/services/SPARK2/configs` where CLUSTERNAME is the name of your Apache Spark cluster.

+ 1. Expand **Custom spark2-defaults**.

+ ![Apache Ambari Spark2 configuration](./media/apache-hive-warehouse-connector/hive-warehouse-connector-spark2-ambari.png)

+ 1. Select **Add Property...** to add the following configurations:

| Configuration | Value |
|----|----|
|`spark.sql.hive.hiveserver2.jdbc.url`|`jdbc:hive2://LLAPCLUSTERNAME.azurehdinsight.net:443/;user=admin;password=PWD;ssl=true;transportMode=http;httpPath=/hive2`. <br>Replace `LLAPCLUSTERNAME` with the name of your Interactive Query cluster. Replace `PWD` with the actual password.|
|`spark.datasource.hive.warehouse.load.staging.dir`|`wasbs://STORAGE_CONTAINER_NAME@STORAGE_ACCOUNT_NAME.blob.core.windows.net/tmp`. <br> Set to a suitable HDFS-compatible staging directory. If you have two different clusters, the staging directory should be a folder in the staging directory of the LLAP cluster's storage account so that HiveServer2 has access to it. Replace `STORAGE_ACCOUNT_NAME` with the name of the storage account being used by the cluster, and `STORAGE_CONTAINER_NAME` with the name of the storage container. |
|`spark.datasource.hive.warehouse.metastoreUri`| The value you obtained earlier from **hive.metastore.uris**. |
|`spark.security.credentials.hiveserver2.enabled`|`true` for YARN cluster mode and `false` for YARN client mode. |
|`spark.hadoop.hive.zookeeper.quorum`| The value you obtained earlier from **hive.zookeeper.quorum**. |
|`spark.hadoop.hive.llap.daemon.service.hosts`| The value you obtained earlier from **hive.llap.daemon.service.hosts**. |

+ 1. Save changes and restart all affected components.
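
    As a quick sanity check after the restart (not part of the original article), a spark-shell session can read these settings back from the runtime configuration:

    ```scala
    // Confirm from spark-shell that the Custom spark2-defaults values added above
    // were picked up by the cluster after the restart.
    val hwcKeys = Seq(
      "spark.sql.hive.hiveserver2.jdbc.url",
      "spark.datasource.hive.warehouse.metastoreUri",
      "spark.hadoop.hive.zookeeper.quorum",
      "spark.hadoop.hive.llap.daemon.service.hosts"
    )
    hwcKeys.foreach(key => println(s"$key = ${spark.conf.get(key, "<not set>")}"))
    ```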
### Configure HWC for Enterprise Security Package (ESP) clusters

@@ -85,7 +100,7 @@ You can choose between a few different methods to connect to your Interactive Qu
* [Spark-shell / PySpark](../spark/apache-spark-shell.md)
* [Spark-submit](#spark-submit)
- * [Zeppelin](../spark/apache-spark-zeppelin-notebook.md)
+ * [Zeppelin](./apache-hive-warehouse-connector-zeppelin-livy.md)

Below are some examples to connect to HWC from Spark.
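
Those examples fall outside this hunk. As a minimal sketch of the usual starting point (the database and table names here are only placeholders), a spark-shell session builds the Hive Warehouse session from the settings configured earlier:

```scala
// spark-shell sketch: build a Hive Warehouse session and run basic DDL/queries.
// `spark` is the shell's SparkSession; hwc_demo/demo are placeholder names.
import com.hortonworks.hwc.HiveWarehouseSession

val hive = HiveWarehouseSession.session(spark).build()

hive.createDatabase("hwc_demo", true)   // true = ignore if the database exists
hive.setDatabase("hwc_demo")
hive.executeUpdate("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)")
hive.executeQuery("SELECT * FROM demo").show()
```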
@@ -145,7 +160,7 @@ Firstly, SSH into the headnode of the Apache Spark cluster. For more information
To learn about the spark operations supported by HWC, please follow this [document](./apache-hive-warehouse-connector-supported-spark-operations.md).

- #### Run queries on Enterprise Security Package (ESP) clusters
+ ## Run queries on Enterprise Security Package (ESP) clusters

* Before initiating the spark-shell or spark-submit, execute the following command.

@@ -190,9 +205,9 @@ To learn about the spark operations supported by HWC, please follow this [docume
## Next steps

- * [Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
+ * [Apache Spark Operations Supported By Hive Warehouse Connector](./apache-hive-warehouse-connector-supported-spark-operations.md)
* [Use Interactive Query with HDInsight](https://docs.microsoft.com/azure/hdinsight/interactive-query/apache-interactive-query-get-started)
- * [Use Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
+ * [Integrate Apache Zeppelin with Hive Warehouse Connector on Azure HDInsight](./apache-hive-warehouse-connector-zeppelin-livy.md)
* [Examples of interacting with Hive Warehouse Connector using Zeppelin, Livy, spark-submit, and pyspark](https://community.hortonworks.com/articles/223626/integrating-apache-hive-with-apache-spark-hive-war.html)

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
