Commit 08d52b2 ("freshness174")

1 parent: 696ccf1
File tree

1 file changed (+78 lines, -51 lines)


articles/hdinsight/hadoop/apache-hadoop-use-mapreduce-curl.md

Lines changed: 78 additions & 51 deletions
````diff
@@ -7,7 +7,7 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 02/27/2018
+ms.date: 01/13/2020
 ---
 
 # Run MapReduce jobs with Apache Hadoop on HDInsight using REST
````
````diff
@@ -17,45 +17,91 @@ Learn how to use the Apache Hive WebHCat REST API to run MapReduce jobs on an Ap
 > [!NOTE]
 > If you are already familiar with using Linux-based Hadoop servers, but you are new to HDInsight, see the [What you need to know about Linux-based Apache Hadoop on HDInsight](../hdinsight-hadoop-linux-information.md) document.
 
+## Prerequisites
 
-## <a id="prereq"></a>Prerequisites
+* An Apache Hadoop cluster on HDInsight. See [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md).
 
-* A Hadoop on HDInsight cluster
-* Windows PowerShell or [Curl](https://curl.haxx.se/) and [jq](https://stedolan.github.io/jq/)
+Either:
+* Windows PowerShell, or
+* [Curl](https://curl.haxx.se/) with [jq](https://stedolan.github.io/jq/)
 
-## <a id="curl"></a>Run a MapReduce job
+## Run a MapReduce job
 
 > [!NOTE]
 > When you use Curl or any other REST communication with WebHCat, you must authenticate the requests by providing the HDInsight cluster administrator user name and password. You must use the cluster name as part of the URI that is used to send the requests to the server.
 >
 > The REST API is secured by using [basic access authentication](https://en.wikipedia.org/wiki/Basic_access_authentication). You should always make requests by using HTTPS to ensure that your credentials are securely sent to the server.
 
-1. To set the cluster login that is used by the scripts in this document, use one of the following commands:
+### Curl
+
+1. For ease of use, set the variables below. This example is based on a Windows environment; revise as needed for your environment.
+
+    ```cmd
+    set CLUSTERNAME=
+    set PASSWORD=
+    ```
+
+1. From a command line, use the following command to verify that you can connect to your HDInsight cluster:
 
     ```bash
-    read -p "Enter your cluster login account name: " LOGIN
+    curl -u admin:%PASSWORD% -G https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/status
     ```
 
-    ```powershell
-    $creds = Get-Credential -UserName admin -Message "Enter the cluster login name and password"
+    The parameters used in this command are as follows:
+
+    * **-u**: Indicates the user name and password used to authenticate the request
+    * **-G**: Indicates that this operation is a GET request
+
+    The beginning of the URI, `https://CLUSTERNAME.azurehdinsight.net/templeton/v1`, is the same for all requests.
+
+    You receive a response similar to the following JSON:
+
+    ```output
+    {"version":"v1","status":"ok"}
     ```
 
-2. To set the cluster name, use one of the following commands:
+1. To submit a MapReduce job, use the following command. Modify the path to **jq** as needed.
 
-    ```bash
-    read -p "Enter the HDInsight cluster name: " CLUSTERNAME
+    ```cmd
+    curl -u admin:%PASSWORD% -d user.name=admin ^
+    -d jar=/example/jars/hadoop-mapreduce-examples.jar ^
+    -d class=wordcount -d arg=/example/data/gutenberg/davinci.txt -d arg=/example/data/output ^
+    https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/mapreduce/jar | ^
+    C:\HDI\jq-win64.exe .id
     ```
 
-    ```powershell
-    $clusterName = Read-Host -Prompt "Enter the HDInsight cluster name"
+    The end of the URI (/mapreduce/jar) tells WebHCat that this request starts a MapReduce job from a class in a jar file. The parameters used in this command are as follows:
+
+    * **-d**: `-G` isn't used, so the request defaults to the POST method. `-d` specifies the data values that are sent with the request.
+    * **user.name**: The user who is running the command
+    * **jar**: The location of the jar file that contains the class to be run
+    * **class**: The class that contains the MapReduce logic
+    * **arg**: The arguments to be passed to the MapReduce job. In this case, the input text file and the output directory
+
+    This command should return a job ID that can be used to check the status of the job:
+
+        job_1415651640909_0026
+
+1. To check the status of the job, use the following command. Replace the value for `JOBID` with the **actual** value returned in the previous step. Revise the location of **jq** as needed.
+
+    ```cmd
+    set JOBID=job_1415651640909_0026
+
+    curl -G -u admin:%PASSWORD% -d user.name=admin https://%CLUSTERNAME%.azurehdinsight.net/templeton/v1/jobs/%JOBID% | ^
+    C:\HDI\jq-win64.exe .status.state
     ```
 
-3. From a command line, use the following command to verify that you can connect to your HDInsight cluster:
+### PowerShell
 
-    ```bash
-    curl -u $LOGIN -G https://$CLUSTERNAME.azurehdinsight.net/templeton/v1/status
+1. For ease of use, set the variables below. Replace `CLUSTERNAME` with your actual cluster name. Execute the command and enter the cluster login password when prompted.
+
+    ```powershell
+    $clusterName="CLUSTERNAME"
+    $creds = Get-Credential -UserName admin -Message "Enter the cluster login password"
     ```
 
+1. Use the following command to verify that you can connect to your HDInsight cluster:
+
     ```powershell
     $resp = Invoke-WebRequest -Uri "https://$clustername.azurehdinsight.net/templeton/v1/status" `
         -Credential $creds `
````
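Editor's aside: the curl workflow above (basic-auth status check against the shared `/templeton/v1` prefix) can be illustrated outside the diff. This is a hypothetical stdlib-only Python sketch, not part of the commit; the helper names are invented, and it only shows what `curl -u` and the status URL amount to:

```python
import base64
import json

def basic_auth_header(user: str, password: str) -> str:
    # curl -u user:password attaches this Authorization header to every request
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def webhcat_url(cluster: str, path: str) -> str:
    # The /templeton/v1 prefix is the same for all WebHCat requests
    return f"https://{cluster}.azurehdinsight.net/templeton/v1/{path}"

print(basic_auth_header("admin", "secret"))
print(webhcat_url("mycluster", "status"))

# A successful connectivity check returns JSON like the article shows
body = '{"version":"v1","status":"ok"}'
print(json.loads(body)["status"])  # ok
```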
````diff
@@ -65,22 +111,12 @@ Learn how to use the Apache Hive WebHCat REST API to run MapReduce jobs on an Ap
 
     You receive a response similar to the following JSON:
 
-        {"status":"ok","version":"v1"}
-
-    The parameters used in this command are as follows:
-
-    * **-u**: Indicates the user name and password used to authenticate the request
-    * **-G**: Indicates that this operation is a GET request
-
-    The beginning of the URI, `https://CLUSTERNAME.azurehdinsight.net/templeton/v1`, is the same for all requests.
-
-4. To submit a MapReduce job, use the following command:
-
-    ```bash
-    JOBID=`curl -u $LOGIN -d user.name=$LOGIN -d jar=/example/jars/hadoop-mapreduce-examples.jar -d class=wordcount -d arg=/example/data/gutenberg/davinci.txt -d arg=/example/data/output https://$CLUSTERNAME.azurehdinsight.net/templeton/v1/mapreduce/jar | jq .id`
-    echo $JOBID
+    ```output
+    {"version":"v1","status":"ok"}
     ```
 
+1. To submit a MapReduce job, use the following command:
+
     ```powershell
     $reqParams = @{}
     $reqParams."user.name" = "admin"
````
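Editor's aside: both variants in the article (repeated `-d` options in curl, the `$reqParams` hashtable in PowerShell) send the same form-encoded POST body to `/mapreduce/jar`. A stdlib-only Python sketch (not from the commit) of what that body looks like, using the article's example paths:

```python
from urllib.parse import urlencode

# The same fields the article passes with -d (curl) or $reqParams (PowerShell);
# arg repeats once per MapReduce argument
req_params = {
    "user.name": "admin",
    "jar": "/example/jars/hadoop-mapreduce-examples.jar",
    "class": "wordcount",
    "arg": ["/example/data/gutenberg/davinci.txt", "/example/data/output"],
}

# doseq=True expands the list into two arg=... pairs, mirroring the
# repeated -d arg=... options in the curl command
body = urlencode(req_params, doseq=True)
print(body)
```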
````diff
@@ -100,52 +136,43 @@ Learn how to use the Apache Hive WebHCat REST API to run MapReduce jobs on an Ap
 
     The end of the URI (/mapreduce/jar) tells WebHCat that this request starts a MapReduce job from a class in a jar file. The parameters used in this command are as follows:
 
-    * **-d**: `-G` is not used, so the request defaults to the POST method. `-d` specifies the data values that are sent with the request.
-    * **user.name**: The user who is running the command
-    * **jar**: The location of the jar file that contains class to be ran
-    * **class**: The class that contains the MapReduce logic
-    * **arg**: The arguments to be passed to the MapReduce job. In this case, the input text file and the directory that are used for the output
+    * **user.name**: The user who is running the command
+    * **jar**: The location of the jar file that contains the class to be run
+    * **class**: The class that contains the MapReduce logic
+    * **arg**: The arguments to be passed to the MapReduce job. In this case, the input text file and the output directory
 
     This command should return a job ID that can be used to check the status of the job:
 
         job_1415651640909_0026
 
-5. To check the status of the job, use the following command:
-
-    ```bash
-    curl -G -u $LOGIN -d user.name=$LOGIN https://$CLUSTERNAME.azurehdinsight.net/templeton/v1/jobs/$JOBID | jq .status.state
-    ```
+1. To check the status of the job, use the following command:
 
     ```powershell
     $reqParams=@{"user.name"="admin"}
     $resp = Invoke-WebRequest -Uri "https://$clusterName.azurehdinsight.net/templeton/v1/jobs/$jobID" `
         -Credential $creds `
         -Body $reqParams `
         -UseBasicParsing
+
     # ConvertFrom-JSON can't handle duplicate names with different case
     # So change one to prevent the error
     $fixDup=$resp.Content.Replace("jobID","job_ID")
     (ConvertFrom-Json $fixDup).status.state
     ```
 
-    If the job is complete, the state returned is `SUCCEEDED`.
+### Both methods
 
-    > [!NOTE]
-    > This Curl request returns a JSON document with information about the job. Jq is used to retrieve only the state value.
+1. If the job is complete, the state returned is `SUCCEEDED`.
 
-6. When the state of the job has changed to `SUCCEEDED`, you can retrieve the results of the job from Azure Blob storage. The `statusdir` parameter that is passed with the query contains the location of the output file. In this example, the location is `/example/curl`. This address stores the output of the job in the clusters default storage at `/example/curl`.
+1. When the state of the job has changed to `SUCCEEDED`, you can retrieve the results of the job from Azure Blob storage. The `statusdir` parameter that is passed with the query contains the location of the output file. In this example, the location is `/example/curl`. This address stores the output of the job in the cluster's default storage at `/example/curl`.
 
     You can list and download these files by using the [Azure CLI](https://docs.microsoft.com/cli/azure/install-azure-cli). For more information on working with blobs from the Azure CLI, see the [Using the Azure CLI with Azure Storage](../../storage/common/storage-azure-cli.md#create-and-manage-blobs) document.
 
-## <a id="nextsteps"></a>Next steps
-
-For general information about MapReduce jobs in HDInsight:
-
-* [Use MapReduce with Apache Hadoop on HDInsight](hdinsight-use-mapreduce.md)
+## Next steps
 
 For information about other ways you can work with Hadoop on HDInsight:
 
+* [Use MapReduce with Apache Hadoop on HDInsight](hdinsight-use-mapreduce.md)
 * [Use Apache Hive with Apache Hadoop on HDInsight](hdinsight-use-hive.md)
-* [Use Apache Pig with Apache Hadoop on HDInsight](hdinsight-use-pig.md)
 
 For more information about the REST interface that is used in this article, see the [WebHCat Reference](https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference).
````
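Editor's aside: in both the curl path (`jq .status.state`) and the PowerShell path (`(ConvertFrom-Json ...).status.state`), the final step extracts one nested field from the job document. A minimal Python sketch (not from the commit) of that extraction, on a trimmed-down response; real `/templeton/v1/jobs/<id>` documents carry many more fields:

```python
import json

# Trimmed-down, illustrative job document; only the fields used here
response = '{"id": "job_1415651640909_0026", "status": {"state": "SUCCEEDED"}}'

# Equivalent of piping the response through `jq .status.state`
state = json.loads(response)["status"]["state"]
print(state)  # SUCCEEDED
```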
