Commit 10ced7a

Merge pull request #79316 from dagiro/freshness120

2 parents 58e49f4 + 0756682 commit 10ced7a

1 file changed: +128 −87 lines

articles/hdinsight/spark/apache-spark-livy-rest-interface.md

ms.reviewer: jasonh
ms.service: hdinsight
ms.custom: hdinsightactive,hdiseo17may2017
ms.topic: conceptual
ms.date: 06/11/2019
---

# Use Apache Spark REST API to submit remote jobs to an HDInsight Spark cluster

Learn how to use [Apache Livy](https://livy.incubator.apache.org/), the [Apache Spark](https://spark.apache.org/) REST API, to submit remote jobs to an Azure HDInsight Spark cluster. For detailed documentation, see [https://livy.incubator.apache.org/](https://livy.incubator.apache.org/).

You can use Livy to run interactive Spark shells or to submit batch jobs to be run on Spark. This article discusses using Livy to submit batch jobs. The snippets in this article use cURL to make REST API calls to the Livy Spark endpoint.

## Prerequisites

* An Apache Spark cluster on HDInsight. For instructions, see [Create Apache Spark clusters in Azure HDInsight](apache-spark-jupyter-spark-sql.md).

* [cURL](https://curl.haxx.se/). This article uses cURL to demonstrate how to make REST API calls against an HDInsight Spark cluster.

## Submit an Apache Livy Spark batch job
Before you submit a batch job, you must upload the application jar on the cluster storage associated with the cluster. You can use [AzCopy](../../storage/common/storage-use-azcopy.md), a command-line utility, to do so. There are various other clients you can use to upload data. You can find more about them at [Upload data for Apache Hadoop jobs in HDInsight](../hdinsight-upload-data.md).

```cmd
curl -k --user "<hdinsight user>:<user password>" -v -H "Content-Type: application/json" -X POST -d '{ "file":"<path to application jar>", "className":"<classname in jar>" }' 'https://<spark_cluster_name>.azurehdinsight.net/livy/batches' -H "X-Requested-By: admin"
```

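For scripting scenarios, the same POST request can be assembled programmatically. The following Python sketch is not part of the original article: the helper names (`livy_batches_url`, `build_batch_payload`) are hypothetical, and actually sending the request (for example with `requests.post`, as hinted in the comment) would need the cluster credentials shown in the cURL command.

```python
import json

def livy_batches_url(cluster_name):
    """Build the Livy batch endpoint URL for an HDInsight cluster."""
    return "https://{}.azurehdinsight.net/livy/batches".format(cluster_name)

def build_batch_payload(jar_path, class_name, args=None):
    """Build the JSON body Livy expects for a batch submission."""
    payload = {"file": jar_path, "className": class_name}
    if args:
        payload["args"] = args
    return payload

# No request is sent here; a real submission would resemble:
#   requests.post(url, auth=(user, password), json=payload,
#                 headers={"X-Requested-By": "admin"}, verify=False)
url = livy_batches_url("mysparkcluster")
payload = build_batch_payload("wasb:///example/jars/SparkSimpleApp.jar",
                              "com.microsoft.spark.example.WasbIOTest")
print(url)
print(json.dumps(payload))
```
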
### Examples

* If the jar file is on the cluster storage (WASB):

    ```cmd
    curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST -d '{ "file":"wasb://[email protected]/data/SparkSimpleTest.jar", "className":"com.microsoft.spark.test.SimpleFile" }' "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
    ```

* If you want to pass the jar filename and the classname as part of an input file (in this example, input.txt):

    ```cmd
    curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST --data @C:\Temp\input.txt "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
    ```

## Get information on Livy Spark batches running on the cluster

Syntax:

```cmd
curl -k --user "<hdinsight user>:<user password>" -v -X GET "https://<spark_cluster_name>.azurehdinsight.net/livy/batches"
```

### Examples

* If you want to retrieve all the Livy Spark batches running on the cluster:

    ```cmd
    curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches"
    ```

* If you want to retrieve a specific batch with a given batch ID:

    ```cmd
    curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches/{batchId}"
    ```

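The GET call returns a JSON document of the form `{"from":0,"total":0,"sessions":[...]}` (an example appears later in this article). As a hypothetical illustration, a small Python helper can pull out the total and the per-batch states from that body:

```python
import json

def summarize_batches(response_body):
    """Parse the JSON body returned by GET /livy/batches and return
    (total, [(id, state), ...]) for the listed sessions."""
    doc = json.loads(response_body)
    sessions = [(s.get("id"), s.get("state")) for s in doc.get("sessions", [])]
    return doc.get("total", 0), sessions

# The empty listing shown later in this article:
total, sessions = summarize_batches('{"from":0,"total":0,"sessions":[]}')
print(total, sessions)  # 0 []
```
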
## Delete a Livy Spark batch job

```cmd
curl -k --user "<hdinsight user>:<user password>" -v -X DELETE "https://<spark_cluster_name>.azurehdinsight.net/livy/batches/{batchId}"
```

### Example

Deleting a batch job with batch ID `5`:

```cmd
curl -k --user "admin:mypassword1!" -v -X DELETE "https://mysparkcluster.azurehdinsight.net/livy/batches/5"
```

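Livy acknowledges a successful delete with the body `{"msg":"deleted"}` (shown in the walkthrough later in this article). If you script the DELETE call, a hypothetical check like the following can confirm the response; the helper name is an illustration, not part of Livy:

```python
import json

def was_deleted(response_body):
    """Return True if a DELETE /livy/batches/{batchId} response body
    is the {"msg":"deleted"} acknowledgment Livy sends on success."""
    try:
        return json.loads(response_body).get("msg") == "deleted"
    except ValueError:  # body was not valid JSON
        return False

print(was_deleted('{"msg":"deleted"}'))  # True
```
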
## Livy Spark and high-availability

Livy provides high availability for Spark jobs running on the cluster. Here are a couple of examples:

* If the Livy service goes down after you have submitted a job remotely to a Spark cluster, the job continues to run in the background. When Livy is back up, it restores the status of the job and reports it back.
* Jupyter notebooks for HDInsight are powered by Livy in the backend. If a notebook is running a Spark job and the Livy service gets restarted, the notebook continues to run the code cells.

## Show me an example

In this section, we look at examples of using Livy Spark to submit a batch job, monitor the progress of the job, and then delete it. The application we use in this example is the one developed in the article [Create a standalone Scala application to run on HDInsight Spark cluster](apache-spark-create-standalone-application.md). The steps here assume that:

* You have already copied over the application jar to the storage account associated with the cluster.

Perform the following steps:

1. Let us first verify that Livy Spark is running on the cluster. We can do so by getting a list of running batches. If you are running a job using Livy for the first time, the output should return zero.

    ```cmd
    curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches"
    ```

    You should get an output similar to the following snippet:

    ```output
    < HTTP/1.1 200 OK
    < Content-Type: application/json; charset=UTF-8
    < Server: Microsoft-IIS/8.5
    < X-Powered-By: ARR/2.5
    < X-Powered-By: ASP.NET
    < Date: Fri, 20 Nov 2015 23:47:53 GMT
    < Content-Length: 34
    <
    {"from":0,"total":0,"sessions":[]}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
    ```

    Notice how the last line in the output says **total:0**, which suggests no running batches.

2. Let us now submit a batch job. The following snippet uses an input file (input.txt) to pass the jar name and the class name as parameters. If you are running these steps from a Windows computer, using an input file is the recommended approach.

    ```cmd
    curl -k --user "admin:mypassword1!" -v -H "Content-Type: application/json" -X POST --data @C:\Temp\input.txt "https://mysparkcluster.azurehdinsight.net/livy/batches" -H "X-Requested-By: admin"
    ```

    The parameters in the file **input.txt** are defined as follows:

    ```text
    { "file":"wasb:///example/jars/SparkSimpleApp.jar", "className":"com.microsoft.spark.example.WasbIOTest" }
    ```

    You should see an output similar to the following snippet:

    ```output
    < HTTP/1.1 201 Created
    < Content-Type: application/json; charset=UTF-8
    < Location: /0
    < Server: Microsoft-IIS/8.5
    < X-Powered-By: ARR/2.5
    < X-Powered-By: ASP.NET
    < Date: Fri, 20 Nov 2015 23:51:30 GMT
    < Content-Length: 36
    <
    {"id":0,"state":"starting","log":[]}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
    ```

    Notice how the last line of the output says **state:starting**. It also says **id:0**. Here, **0** is the batch ID.

3. You can now retrieve the status of this specific batch using the batch ID.

    ```cmd
    curl -k --user "admin:mypassword1!" -v -X GET "https://mysparkcluster.azurehdinsight.net/livy/batches/0"
    ```

    You should see an output similar to the following snippet:

    ```output
    < HTTP/1.1 200 OK
    < Content-Type: application/json; charset=UTF-8
    < Server: Microsoft-IIS/8.5
    < X-Powered-By: ARR/2.5
    < X-Powered-By: ASP.NET
    < Date: Fri, 20 Nov 2015 23:54:42 GMT
    < Content-Length: 509
    <
    {"id":0,"state":"success","log":["\t diagnostics: N/A","\t ApplicationMaster host: 10.0.0.4","\t ApplicationMaster RPC port: 0","\t queue: default","\t start time: 1448063505350","\t final status: SUCCEEDED","\t tracking URL: http://hn0-myspar.lpel1gnnvxne3gwzqkfq5u5uzh.jx.internal.cloudapp.net:8088/proxy/application_1447984474852_0002/","\t user: root","15/11/20 23:52:47 INFO Utils: Shutdown hook called","15/11/20 23:52:47 INFO Utils: Deleting directory /tmp/spark-b72cd2bf-280b-4c57-8ceb-9e3e69ac7d0c"]}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
    ```

    The output now shows **state:success**, which suggests that the job was successfully completed.

4. If you want, you can now delete the batch.

    ```cmd
    curl -k --user "admin:mypassword1!" -v -X DELETE "https://mysparkcluster.azurehdinsight.net/livy/batches/0"
    ```

    You should see an output similar to the following snippet:

    ```output
    < HTTP/1.1 200 OK
    < Content-Type: application/json; charset=UTF-8
    < Server: Microsoft-IIS/8.5
    < X-Powered-By: ARR/2.5
    < X-Powered-By: ASP.NET
    < Date: Sat, 21 Nov 2015 18:51:54 GMT
    < Content-Length: 17
    <
    {"msg":"deleted"}* Connection #0 to host mysparkcluster.azurehdinsight.net left intact
    ```

    The last line of the output shows that the batch was successfully deleted. Deleting a job while it is running also kills the job. If you delete a job that has completed, successfully or otherwise, it deletes the job information completely.

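In an automated workflow, step 3 above would typically be repeated until the batch reaches a terminal state before moving on to step 4. The following Python sketch is hypothetical: `fetch_state` stands in for the authenticated GET against `/livy/batches/{batchId}`, and the set of terminal state names (`success`, `error`, `dead`, `killed`) is assumed here rather than taken from this article.

```python
import time

TERMINAL_STATES = {"success", "error", "dead", "killed"}  # assumed Livy terminal states

def wait_for_batch(fetch_state, poll_interval=0.0, max_polls=100):
    """Poll fetch_state() until the Livy batch reaches a terminal state;
    return that state, or raise if max_polls is exceeded."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("batch did not finish within max_polls")

# Simulated sequence of states a batch might report:
states = iter(["starting", "running", "success"])
print(wait_for_batch(lambda: next(states)))  # success
```
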
## Updates to Livy configuration starting with HDInsight 3.5 version

HDInsight 3.5 clusters and above, by default, disable use of local file paths to access sample data files or jars. We encourage you to use the `wasb://` path instead to access jars or sample data files from the cluster.

## Submitting Livy jobs for a cluster within an Azure virtual network

If you connect to an HDInsight Spark cluster from within an Azure Virtual Network, you can directly connect to Livy on the cluster. In such a case, the URL for the Livy endpoint is `http://<IP address of the headnode>:8998/batches`. Here, **8998** is the port on which Livy runs on the cluster headnode. For more information on accessing services on non-public ports, see [Ports used by Apache Hadoop services on HDInsight](../hdinsight-hadoop-port-settings-for-services.md).

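When switching a script between the public gateway and the in-network endpoint, only the base URL changes. A hypothetical helper (the IP address below is a placeholder, not from this article's examples):

```python
def livy_vnet_batches_url(headnode_ip, port=8998):
    """Build the direct Livy batches URL used from inside the virtual
    network; 8998 is the port Livy listens on at the headnode."""
    return "http://{}:{}/batches".format(headnode_ip, port)

print(livy_vnet_batches_url("10.0.0.4"))  # http://10.0.0.4:8998/batches
```
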
## Next steps

* [Apache Livy REST API documentation](https://livy.incubator.apache.org/docs/latest/rest-api.html)
* [Manage resources for the Apache Spark cluster in Azure HDInsight](apache-spark-resource-manager.md)
* [Track and debug jobs running on an Apache Spark cluster in HDInsight](apache-spark-job-debugging.md)
