Commit 0531f22

Detailing 'spark-submit' utility & unsupportability of R
1 parent 193afc8 commit 0531f22

1 file changed: +8 −2 lines changed

articles/hdinsight/interactive-query/apache-hive-warehouse-connector.md

Lines changed: 8 additions & 2 deletions
@@ -10,7 +10,7 @@ ms.date: 05/28/2020
 
 # Integrate Apache Spark and Apache Hive with Hive Warehouse Connector in Azure HDInsight
 
-The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive. It supports tasks such as moving data between Spark DataFrames and Hive tables. Also, by directing Spark streaming data into Hive tables. Hive Warehouse Connector works like a bridge between Spark and Hive. It also supports Scala, Java, and Python as programming languages for development.
+The Apache Hive Warehouse Connector (HWC) is a library that allows you to work more easily with Apache Spark and Apache Hive. It supports tasks such as moving data between Spark DataFrames and Hive tables, and directing Spark streaming data into Hive tables. Hive Warehouse Connector works like a bridge between Spark and Hive. It supports Scala, Java, and Python as development languages; the R language is not supported.
 
 The Hive Warehouse Connector allows you to take advantage of the unique features of Hive and Spark to build powerful big-data applications.
 
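For orientation, here is a minimal sketch of the DataFrame-to-Hive movement described in the changed paragraph, assuming the `com.hortonworks.hwc.HiveWarehouseSession` API this article uses and an existing `SparkSession` named `spark` (the table names are illustrative):

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()

// Hive table -> Spark DataFrame, executed through HiveServer2 Interactive (LLAP).
val df = hive.executeQuery("SELECT * FROM hivesampletable LIMIT 10")

// Spark DataFrame -> Hive table, written back through the HWC data source.
// The target table name is illustrative.
df.write
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "hivesampletable_copy")
  .save()
```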

@@ -122,6 +122,8 @@ Below are some examples to connect to HWC from Spark.
 
 ### Spark-shell
 
+Spark-shell is a way to run Spark interactively through a modified version of the Scala shell.
+
 1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your Apache Spark cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
 
 ```cmd
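Once spark-shell is running with the connector on its classpath, an interactive session looks roughly like the sketch below, assuming the `HiveWarehouseSession` API used elsewhere in this article (`hivesampletable` is the default HDInsight sample table):

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session from the shell's built-in SparkSession (spark).
val hive = HiveWarehouseSession.session(spark).build()

// Browse the warehouse interactively.
hive.showDatabases().show()
hive.setDatabase("default")

// Run a Hive query and inspect the result as a DataFrame.
hive.executeQuery("SELECT * FROM hivesampletable LIMIT 10").show()
```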
@@ -151,6 +153,9 @@ Below are some examples to connect to HWC from Spark.
 
 ### Spark-submit
 
+Spark-submit is a utility to submit any Spark program (or job) to Spark clusters.
+The spark-submit job sets up and configures Spark and the Hive Warehouse Connector per our instructions, executes the program passed to it, and then cleanly releases the resources that were in use.
+
 Once you build the scala/java code along with the dependencies into an assembly jar, use the below command to launch a Spark application. Replace `<VERSION>`, and `<APP_JAR_PATH>` with the actual values.
 
 * YARN Client mode
@@ -176,7 +181,8 @@ Once you build the scala/java code along with the dependencies into an assembly
 /<APP_JAR_PATH>/myHwcAppProject.jar
 ```
 
-For Python, add the following configuration as well.
+This utility is also used when the entire application is written in pySpark and packaged into .py files, so that the whole program can be submitted to the Spark cluster for execution.
+For Python applications, pass a .py file in place of `/<APP_JAR_PATH>/myHwcAppProject.jar`, and add the configuration (Python .zip) file below to the search path with `--py-files`.
 
 ```python
 --py-files /usr/hdp/current/hive_warehouse_connector/pyspark_hwc-<VERSION>.zip
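To make the spark-submit lifecycle described above concrete, here is a hedged sketch of the kind of Scala application such an assembly jar might contain; the object name is hypothetical, and the HWC session API is the one used elsewhere in this article:

```scala
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// Hypothetical entry point, compiled (with dependencies) into the
// assembly jar that spark-submit launches.
object MyHwcApp {
  def main(args: Array[String]): Unit = {
    // spark-submit supplies the master, deploy mode, and HWC configuration.
    val spark = SparkSession.builder().appName("MyHwcApp").getOrCreate()
    val hive = HiveWarehouseSession.session(spark).build()

    // Program body: run a Hive query through HWC.
    hive.executeQuery("SELECT COUNT(*) AS cnt FROM hivesampletable").show()

    // Release cluster resources when the job finishes.
    spark.stop()
  }
}
```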
