articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md
Lines changed: 8 additions & 2 deletions
@@ -10,7 +10,7 @@ ms.date: 05/28/2020
# Integrate Apache Spark and Apache Hive with Hive Warehouse Connector in Azure HDInsight
-The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive. It supports tasks such as moving data between Spark DataFrames and Hive tables. Also, by directing Spark streaming data into Hive tables. Hive Warehouse Connector works like a bridge between Spark and Hive. It also supports Scala, Java, and Python as programming languages for development.
+The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive. It supports tasks such as moving data between Spark DataFrames and Hive tables, and directing Spark streaming data into Hive tables. The Hive Warehouse Connector works like a bridge between Spark and Hive, and it supports Scala, Java, and Python as programming languages for development. However, the R language is not supported.
The Hive Warehouse Connector allows you to take advantage of the unique features of Hive and Spark to build powerful big-data applications.
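To make the bridge concrete, here is a minimal Scala sketch of an HWC session; `spark` is the ambient `SparkSession` (as in spark-shell), and the `demo.employees` table names are placeholders, not tables from this article:

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of an existing SparkSession (`spark`).
val hive = HiveWarehouseSession.session(spark).build()

// Read: run a Hive query through HWC; the result comes back as a Spark DataFrame.
val df = hive.executeQuery("SELECT * FROM demo.employees")
df.show()

// Write: save the DataFrame to a Hive table through the HWC data source.
df.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .option("table", "demo.employees_copy")
  .save()
```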
@@ -122,6 +122,8 @@ Below are some examples to connect to HWC from Spark.
### Spark-shell
+Spark-shell is a way to run Spark interactively through a modified version of the Scala shell.
+
1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
    ```cmd
    ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
    ```
@@ -151,6 +153,9 @@ Below are some examples to connect to HWC from Spark.
### Spark-submit
+Spark-submit is a utility to submit any Spark program (or job) to Spark clusters.
+The spark-submit job will set up and configure Spark and the Hive Warehouse Connector per our instructions, execute the program we pass to it, and then cleanly release the resources that were being used.
+
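For orientation, here is a minimal sketch of what such a program might look like on the Scala side before it is packaged into the assembly jar; the object name, app name, and table are hypothetical, not from this article:

```scala
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// A hypothetical HWC application that spark-submit would launch.
object MyHwcApp {
  def main(args: Array[String]): Unit = {
    // spark-submit supplies the cluster context; we only request the session.
    val spark = SparkSession.builder().appName("MyHwcApp").getOrCreate()

    // Build the HWC session that bridges Spark and Hive.
    val hive = HiveWarehouseSession.session(spark).build()

    // Placeholder workload: read a Hive table through HWC and show it.
    hive.executeQuery("SELECT * FROM demo.employees").show()

    spark.stop()
  }
}
```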
Once you build the Scala/Java code along with the dependencies into an assembly jar, use the command below to launch a Spark application. Replace `<VERSION>` and `<APP_JAR_PATH>` with the actual values.
* YARN Client mode
@@ -176,7 +181,8 @@ Once you build the scala/java code along with the dependencies into an assembly
/<APP_JAR_PATH>/myHwcAppProject.jar
```
-For Python, add the following configuration as well.
+This utility is also used when we have written the entire application in PySpark and packaged it into .py files, so that we can submit the entire code to the Spark cluster for execution.
+For Python applications, simply pass a .py file in place of /<APP_JAR_PATH>/myHwcAppProject.jar, and add the configuration (Python .zip) file below to the search path with --py-files.
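As a hedged sketch of such a submission (the jar and .zip paths assume the stock HDP install layout, and `<APP_PATH>/myHwcApp.py` is a placeholder):

```bash
spark-submit \
  --master yarn \
  --deploy-mode client \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<VERSION>.jar \
  --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<VERSION>.zip \
  /<APP_PATH>/myHwcApp.py
```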