
Commit cd0b288

Merge pull request #106088 from dagiro/freshness7
freshness7
2 parents 143db66 + dbd7d1e


articles/hdinsight/spark/apache-spark-livy-rest-interface.md

Lines changed: 35 additions & 30 deletions
@@ -5,37 +5,35 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive,hdiseo17may2017
 ms.topic: conceptual
-ms.date: 06/11/2019
+ms.custom: hdinsightactive,hdiseo17may2017
+ms.date: 02/28/2020
 ---
 
 # Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster
 
-Learn how to use [Apache Livy](https://livy.incubator.apache.org/), the [Apache Spark](https://spark.apache.org/) REST API, which is used to submit remote jobs to an Azure HDInsight Spark cluster. For detailed documentation, see [https://livy.incubator.apache.org/](https://livy.incubator.apache.org/).
+Learn how to use [Apache Livy](https://livy.incubator.apache.org/), the Apache Spark REST API, which is used to submit remote jobs to an Azure HDInsight Spark cluster. For detailed documentation, see [Apache Livy](https://livy.incubator.apache.org/docs/latest/rest-api.html).
 
 You can use Livy to run interactive Spark shells or submit batch jobs to be run on Spark. This article talks about using Livy to submit batch jobs. The snippets in this article use cURL to make REST API calls to the Livy Spark endpoint.
 
 ## Prerequisites
 
-* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
-
-* [cURL](https://curl.haxx.se/). This article uses cURL to demonstrate how to make REST API calls against an HDInsight Spark cluster.
+An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).
 
 ## Submit an Apache Livy Spark batch job
 
 Before you submit a batch job, you must upload the application jar on the cluster storage associated with the cluster. You can use [AzCopy](../../storage/common/storage-use-azcopy.md), a command-line utility, to do so. There are various other clients you can use to upload data. You can find more about them at [Upload data for Apache Hadoop jobs in HDInsight](../hdinsight-upload-data.md).
 
 ```cmd
-curl -k --user "<hdinsight user>:<user password>" -v -H "Content-Type: application/json" -X POST -d '{ "file":"<path to application jar>", "className":"<classname in jar>" }' 'https://<spark_cluster_name>.azurehdinsight.net/livy/batches' -H "X-Requested-By: admin"
+curl -k --user "admin:password" -v -H "Content-Type: application/json" -X POST -d '{ "file":"<path to application jar>", "className":"<classname in jar>" }' 'https://<spark_cluster_name>.azurehdinsight.net/livy/batches' -H "X-Requested-By: admin"
 ```
 
 ### Examples
 
-* If the jar file is on the cluster storage (WASB)
+* If the jar file is on the cluster storage (WASBS)
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST -d '{ "file":"wasb://<container>@<storage-account>.blob.core.windows.net/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
+curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST -d '{ "file":"wasbs://<container>@<storage-account>.blob.core.windows.net/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
 ```
 
 * If you want to pass the jar filename and the classname as part of an input file (in this example, input.txt)
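For readers following along in code rather than cURL, the POST above can be sketched as plain request construction. A minimal sketch under stated assumptions: `build_livy_batch_request` is a hypothetical helper name, not an SDK call, and the jar path is the sample path used later in this diff; the function only assembles the URL, headers, and JSON body that the cURL command sends.

```python
import json

def build_livy_batch_request(cluster_name, jar_path, class_name):
    # Assemble the pieces of the Livy batch-submission POST shown above.
    # Illustrative helper only; sending the request and supplying basic-auth
    # credentials are left to whatever HTTP client you use.
    url = "https://{}.azurehdinsight.net/livy/batches".format(cluster_name)
    headers = {
        "Content-Type": "application/json",
        # Livy expects X-Requested-By on state-changing requests.
        "X-Requested-By": "admin",
    }
    body = json.dumps({"file": jar_path, "className": class_name})
    return url, headers, body

url, headers, body = build_livy_batch_request(
    "mysparkcluster",
    "wasbs:///example/jars/SparkSimpleApp.jar",
    "com.microsoft.spark.example.WasbIOTest",
)
```

Sending these pieces with any HTTP client, plus basic auth, mirrors the cURL call; certificate handling (the `-k` flag) is likewise left to the caller.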
@@ -49,15 +47,15 @@ curl -k --user "<hdinsight user>:<user password>" -v -H "Content-Type: applicati
 Syntax:
 
 ```cmd
-curl -k --user "<hdinsight user>:<user password>" -v -X GET "https://<spark_cluster_name>.azurehdinsight.net/livy/batches"
+curl -k --user "admin:password" -v -X GET "https://<spark_cluster_name>.azurehdinsight.net/livy/batches"
 ```
 
 ### Examples
 
 * If you want to retrieve all the Livy Spark batches running on the cluster:
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches"
+curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches"
 ```
 
 * If you want to retrieve a specific batch with a given batch ID
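The list that `GET /livy/batches` returns can also be consumed programmatically. A sketch, assuming the response shape documented by the Livy REST API (`from`, `total`, `sessions`); `running_batch_ids` is an illustrative name, not a library function.

```python
import json

def running_batch_ids(batches_response_body):
    # Pull the batch ids out of a GET /livy/batches response body.
    # The {"from", "total", "sessions": [...]} shape follows the Livy REST API.
    doc = json.loads(batches_response_body)
    return [session["id"] for session in doc.get("sessions", [])]

# A cluster with no batches reports total:0 and an empty session list.
empty = '{"from":0,"total":0,"sessions":[]}'
one = '{"from":0,"total":1,"sessions":[{"id":0,"state":"starting"}]}'
```

With the sample bodies above, `running_batch_ids(empty)` is an empty list, and `running_batch_ids(one)` yields `[0]`, matching the **id:0** batch in the walkthrough.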
@@ -69,7 +67,7 @@ curl -k --user "<hdinsight user>:<user password>" -v -X GET "https://<spark_clus
 ## Delete a Livy Spark batch job
 
 ```cmd
-curl -k --user "<hdinsight user>:<user password>" -v -X DELETE "https://<spark_cluster_name>.azurehdinsight.net/livy/batches/{batchId}"
+curl -k --user "admin:mypassword1!" -v -X DELETE "https://<spark_cluster_name>.azurehdinsight.net/livy/batches/{batchId}"
 ```
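The DELETE call follows the same URL pattern with the batch ID appended. A small sketch, again using a hypothetical helper name and only building the request pieces rather than contacting a cluster:

```python
def build_livy_delete_request(cluster_name, batch_id):
    # URL and headers for DELETE /livy/batches/<batchId>; illustrative only.
    url = "https://{}.azurehdinsight.net/livy/batches/{}".format(
        cluster_name, batch_id
    )
    # X-Requested-By is required here too, since DELETE changes state.
    headers = {"X-Requested-By": "admin"}
    return url, headers

url, headers = build_livy_delete_request("mysparkcluster", 0)
```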
 
 ### Example
@@ -84,22 +82,29 @@ curl -k --user "admin:mypassword1!" -v -X DELETE "https://mysparkcluster.azurehd
 
 Livy provides high-availability for Spark jobs running on the cluster. Here are a couple of examples.
 
-* If the Livy service goes down after you have submitted a job remotely to a Spark cluster, the job continues to run in the background. When Livy is back up, it restores the status of the job and reports it back.
-* Jupyter notebooks for HDInsight are powered by Livy in the backend. If a notebook is running a Spark job and the Livy service gets restarted, the notebook continues to run the code cells.
+* If the Livy service goes down after you've submitted a job remotely to a Spark cluster, the job continues to run in the background. When Livy is back up, it restores the status of the job and reports it back.
+* Jupyter notebooks for HDInsight are powered by Livy in the backend. If a notebook is running a Spark job and the Livy service gets restarted, the notebook continues to run the code cells.
 
 ## Show me an example
 
-In this section, we look at examples to use Livy Spark to submit batch job, monitor the progress of the job, and then delete it. The application we use in this example is the one developed in the article [Create a standalone Scala application and to run on HDInsight Spark cluster](apache-spark-create-standalone-application.md). The steps here assume that:
+In this section, we look at examples that use Livy Spark to submit a batch job, monitor the progress of the job, and then delete it. The application we use in this example is the one developed in the article [Create a standalone Scala application to run on HDInsight Spark cluster](apache-spark-create-standalone-application.md). The steps here assume:
 
-* You have already copied over the application jar to the storage account associated with the cluster.
-* You have CuRL installed on the computer where you are trying these steps.
+* You've already copied over the application jar to the storage account associated with the cluster.
+* You have cURL installed on the computer where you're trying these steps.
 
 Perform the following steps:
 
-1. Let us first verify that Livy Spark is running on the cluster. We can do so by getting a list of running batches. If you are running a job using Livy for the first time, the output should return zero.
+1. For ease of use, set environment variables. This example is based on a Windows environment; revise the variables as needed for your environment. Replace `CLUSTERNAME` and `PASSWORD` with the appropriate values.
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches"
+set clustername=CLUSTERNAME
+set password=PASSWORD
+```
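The added step uses Windows `cmd` syntax. On macOS or Linux, the same convenience variables might be set as follows (a sketch; `CLUSTERNAME` and `PASSWORD` remain placeholders to substitute with your cluster's values):

```shell
# POSIX-shell equivalent of the Windows `set` commands above.
clustername=CLUSTERNAME
password=PASSWORD

# Later cURL calls then interpolate with "$var" instead of %var%, e.g.:
url="https://${clustername}.azurehdinsight.net/livy/batches"
echo "$url"
```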
+
+1. Verify that Livy Spark is running on the cluster. We can do so by getting a list of running batches. If you're running a job using Livy for the first time, the output should return zero.
+
+```cmd
+curl -k --user "admin:%password%" -v -X GET "https://%clustername%.azurehdinsight.net/livy/batches"
 ```
 
 You should get an output similar to the following snippet:
@@ -118,16 +123,16 @@ Perform the following steps:
 
 Notice how the last line in the output says **total:0**, which suggests no running batches.
 
-2. Let us now submit a batch job. The following snippet uses an input file (input.txt) to pass the jar name and the class name as parameters. If you are running these steps from a Windows computer, using an input file is the recommended approach.
+1. Let us now submit a batch job. The following snippet uses an input file (input.txt) to pass the jar name and the class name as parameters. If you're running these steps from a Windows computer, using an input file is the recommended approach.
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST --data @C:\Temp\input.txt "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
+curl -k --user "admin:%password%" -v -H "Content-Type: application/json" -X POST --data @C:\Temp\input.txt "https://%clustername%.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
 ```
 
 The parameters in the file **input.txt** are defined as follows:
 
 ```text
-{ "file":"wasb:///example/jars/SparkSimpleApp.jar", "className":"com.microsoft.spark.example.WasbIOTest" }
+{ "file":"wasbs:///example/jars/SparkSimpleApp.jar", "className":"com.microsoft.spark.example.WasbIOTest" }
 ```
 
 You should see an output similar to the following snippet:
@@ -147,10 +152,10 @@ Perform the following steps:
 
 Notice how the last line of the output says **state:starting**. It also says **id:0**. Here, **0** is the batch ID.
 
-3. You can now retrieve the status of this specific batch using the batch ID.
+1. You can now retrieve the status of this specific batch using the batch ID.
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches/0"
+curl -k --user "admin:%password%" -v -X GET "https://%clustername%.azurehdinsight.net/livy/batches/0"
 ```
 
 You should see an output similar to the following snippet:
@@ -164,15 +169,15 @@ Perform the following steps:
 < Date: Fri, 20 Nov 2015 23:54:42 GMT
 < Content-Length: 509
 <
-{"id":0,"state":"success","log":["\t diagnostics: N/A","\t ApplicationMaster host: 10.0.0.4","\t ApplicationMaster RPC port: 0","\t queue: default","\t start time: 1448063505350","\t final status: SUCCEEDED","\t tracking URL: http://myspar.lpel1gnnvxne3gwzqkfq5u5uzh.jx.internal.cloudapp.net:8088/proxy/application_1447984474852_0002/","\t user: root","15/11/20 23:52:47 INFO Utils: Shutdown hook called","15/11/20 23:52:47 INFO Utils: Deleting directory /tmp/spark-b72cd2bf-280b-4c57-8ceb-9e3e69ac7d0c"]}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
+{"id":0,"state":"success","log":["\t diagnostics: N/A","\t ApplicationMaster host: 10.0.0.4","\t ApplicationMaster RPC port: 0","\t queue: default","\t start time: 1448063505350","\t final status: SUCCEEDED","\t tracking URL: http://myspar.lpel.jx.internal.cloudapp.net:8088/proxy/application_1447984474852_0002/","\t user: root","15/11/20 23:52:47 INFO Utils: Shutdown hook called","15/11/20 23:52:47 INFO Utils: Deleting directory /tmp/spark-b72cd2bf-280b-4c57-8ceb-9e3e69ac7d0c"]}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
 ```
 
 The output now shows **state:success**, which suggests that the job was successfully completed.
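Rather than re-running the status call by hand until **state:success** appears, a script can classify the reported state. A sketch, assuming the batch states named in the Livy REST API documentation (**starting**, **running**, and the terminal **success**, **dead**, and **killed**); the helper name is illustrative:

```python
import json

# States after which a Livy batch will not change again.
TERMINAL_STATES = {"success", "dead", "killed"}

def batch_finished(status_response_body):
    # Inspect the "state" field of a GET /livy/batches/<batchId> response.
    return json.loads(status_response_body)["state"] in TERMINAL_STATES

# The walkthrough's batch moves from state:starting to state:success,
# so polling stops once batch_finished(...) returns True.
```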
 
-4. If you want, you can now delete the batch.
+1. If you want, you can now delete the batch.
 
 ```cmd
-curl -k --user "admin:mypassword1!" -v -X DELETE "https://mysparkcluster.azurehdinsight.net/livy/batches/0"
+curl -k --user "admin:%password%" -v -X DELETE "https://%clustername%.azurehdinsight.net/livy/batches/0"
 ```
 
 You should see an output similar to the following snippet:
@@ -189,11 +194,11 @@ Perform the following steps:
 {"msg":"deleted"}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
 ```
 
-The last line of the output shows that the batch was successfully deleted. Deleting a job, while it is running, also kills the job. If you delete a job that has completed, successfully or otherwise, it deletes the job information completely.
+The last line of the output shows that the batch was successfully deleted. Deleting a job while it's running also kills the job. If you delete a job that has completed, successfully or otherwise, it deletes the job information completely.
 
 ## Updates to Livy configuration starting with HDInsight 3.5 version
 
-HDInsight 3.5 clusters and above, by default, disable use of local file paths to access sample data files or jars. We encourage you to use the `wasb://` path instead to access jars or sample data files from the cluster.
+HDInsight 3.5 clusters and above, by default, disable use of local file paths to access sample data files or jars. We encourage you to use the `wasbs://` path instead to access jars or sample data files from the cluster.
 
 ## Submitting Livy jobs for a cluster within an Azure virtual network
 
@@ -203,4 +208,4 @@ If you connect to an HDInsight Spark cluster from within an Azure Virtual Networ
 
 * [Apache Livy REST API documentation](https://livy.incubator.apache.org/docs/latest/rest-api.html)
 * [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
-* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
+* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
