
Commit 419afdd

freshness169 committed this change. 1 parent: 080c9d1.

1 file changed (+20, -33 lines)


articles/hdinsight/hadoop/apache-hadoop-use-mapreduce-powershell.md

Lines changed: 20 additions & 33 deletions
@@ -2,44 +2,39 @@
 title: Use MapReduce and PowerShell with Apache Hadoop - Azure HDInsight
 description: Learn how to use PowerShell to remotely run MapReduce jobs with Apache Hadoop on HDInsight.
 author: hrasheed-msft
+ms.author: hrasheed
 ms.reviewer: jasonh
-
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 05/09/2018
-ms.author: hrasheed
-
+ms.custom: hdinsightactive
+ms.date: 01/08/2020
 ---
+
 # Run MapReduce jobs with Apache Hadoop on HDInsight using PowerShell
 
 [!INCLUDE [mapreduce-selector](../../../includes/hdinsight-selector-use-mapreduce.md)]
 
 This document provides an example of using Azure PowerShell to run a MapReduce job in a Hadoop on HDInsight cluster.
 
-## <a id="prereq"></a>Prerequisites
-
-[!INCLUDE [updated-for-az](../../../includes/updated-for-az.md)]
+## Prerequisites
 
-* **An Azure HDInsight (Hadoop on HDInsight) cluster**
+* An Apache Hadoop cluster on HDInsight. See [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md).
 
-* **A workstation with Azure PowerShell**.
+* The PowerShell [Az Module](https://docs.microsoft.com/powershell/azure/overview) installed.
 
-## <a id="powershell"></a>Run a MapReduce job
+## Run a MapReduce job
 
 Azure PowerShell provides *cmdlets* that allow you to remotely run MapReduce jobs on HDInsight. Internally, PowerShell makes REST calls to [WebHCat](https://cwiki.apache.org/confluence/display/Hive/WebHCat) (formerly called Templeton) running on the HDInsight cluster.
 
 The following cmdlets are used when running MapReduce jobs in a remote HDInsight cluster.
 
-* **Connect-AzAccount**: Authenticates Azure PowerShell to your Azure subscription.
-
-* **New-AzHDInsightMapReduceJobDefinition**: Creates a new *job definition* by using the specified MapReduce information.
-
-* **Start-AzHDInsightJob**: Sends the job definition to HDInsight and starts the job. A *job* object is returned.
-
-* **Wait-AzHDInsightJob**: Uses the job object to check the status of the job. It waits until the job completes or the wait time is exceeded.
-
-* **Get-AzHDInsightJobOutput**: Used to retrieve the output of the job.
+|Cmdlet | Description |
+|---|---|
+|Connect-AzAccount|Authenticates Azure PowerShell to your Azure subscription.|
+|New-AzHDInsightMapReduceJobDefinition|Creates a new *job definition* by using the specified MapReduce information.|
+|Start-AzHDInsightJob|Sends the job definition to HDInsight and starts the job. A *job* object is returned.|
+|Wait-AzHDInsightJob|Uses the job object to check the status of the job. It waits until the job completes or the wait time is exceeded.|
+|Get-AzHDInsightJobOutput|Used to retrieve the output of the job.|
 
 The following steps demonstrate how to use these cmdlets to run a job in your HDInsight cluster.

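The cmdlets in the table above are normally called in sequence. The following is a minimal sketch of that sequence wrapped in one function, assuming the Az.HDInsight module is installed and that you've already authenticated with `Connect-AzAccount`. It isn't the contents of **mapreducejob.ps1**: the function name `Invoke-HDInsightMapReduceJob` and its parameter names are hypothetical, as are the example values in the commented usage lines.

```powershell
# Minimal sketch (assumption): runs the job-definition -> start -> wait -> output
# sequence from the cmdlet table. Invoke-HDInsightMapReduceJob is a hypothetical
# helper, not part of Az.HDInsight. Requires Connect-AzAccount to have been run.
function Invoke-HDInsightMapReduceJob {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [pscredential] $HttpCredential,
        [Parameter(Mandatory)] [string] $JarFile,
        [Parameter(Mandatory)] [string] $ClassName,
        [string[]] $Arguments = @()
    )

    # Build a job definition from the MapReduce information.
    $jobDefinition = New-AzHDInsightMapReduceJobDefinition `
        -JarFile $JarFile `
        -ClassName $ClassName `
        -Arguments $Arguments

    # Submit the definition; Start-AzHDInsightJob returns a job object.
    $job = Start-AzHDInsightJob `
        -ClusterName $ClusterName `
        -JobDefinition $jobDefinition `
        -HttpCredential $HttpCredential

    # Block until the job completes (or the wait time is exceeded).
    # Discard the status object so only the job output is returned.
    Wait-AzHDInsightJob `
        -ClusterName $ClusterName `
        -JobId $job.JobId `
        -HttpCredential $HttpCredential | Out-Null

    # Return the job's output.
    Get-AzHDInsightJobOutput `
        -ClusterName $ClusterName `
        -JobId $job.JobId `
        -HttpCredential $HttpCredential
}

# Example usage (hypothetical cluster name, JAR, class, and paths):
# $creds = Get-Credential -Message "Enter the login for the cluster"
# Invoke-HDInsightMapReduceJob -ClusterName "mycluster" -HttpCredential $creds `
#     -JarFile "/example/jars/hadoop-mapreduce-examples.jar" -ClassName "wordcount" `
#     -Arguments "/example/data/gutenberg/davinci.txt", "/example/data/WordCountOutput"
```

Piping `Wait-AzHDInsightJob` to `Out-Null` keeps the function's output limited to what `Get-AzHDInsightJobOutput` returns.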
@@ -51,7 +46,7 @@ The following steps demonstrate how to use these cmdlets to run a job in your HD
 
 .\mapreducejob.ps1
 
-When you run the script, you are prompted for the name of the HDInsight cluster and the cluster login. You may also be prompted to authenticate to your Azure subscription.
+When you run the script, you're prompted for the name of the HDInsight cluster and the cluster login. You may also be prompted to authenticate to your Azure subscription.
 
 3. When the job completes, you receive output similar to the following text:

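The prompts described in this hunk are issued by the script when it runs. A minimal sketch of how a script might produce them follows. It assumes the Az.Accounts cmdlets `Get-AzContext` and `Connect-AzAccount`; the helper name `Get-HDInsightJobInput` is hypothetical and isn't taken from **mapreducejob.ps1**.

```powershell
# Minimal sketch (assumption): gathers the values a job script needs.
# Get-HDInsightJobInput is a hypothetical helper, not part of the Az modules.
function Get-HDInsightJobInput {
    # Sign in only when there is no current Azure context, so the
    # subscription sign-in prompt doesn't always appear.
    if ($null -eq (Get-AzContext)) {
        Connect-AzAccount | Out-Null
    }

    # Prompt for the cluster name and the cluster login.
    $clusterName = Read-Host -Prompt "Enter the HDInsight cluster name"
    $credential = Get-Credential -Message "Enter the login for the cluster"

    # Return both values so later steps can pass them to the job cmdlets.
    [pscustomobject]@{
        ClusterName    = $clusterName
        HttpCredential = $credential
    }
}
```

In this sketch, the subscription sign-in runs only when no Azure context exists, which is one way a script can make that prompt optional.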
@@ -79,9 +74,9 @@ To see the words and counts produced by the job, open the **output.txt** file in
 > [!NOTE]
 > The output files of a MapReduce job are immutable. So if you rerun this sample, you need to change the name of the output file.
 
-## <a id="troubleshooting"></a>Troubleshooting
+## Troubleshooting
 
-If no information is returned when the job completes, view errors for the job. To view error information for this job, add the following command to the end of the **mapreducejob.ps1** file, save it, and then run it again.
+If no information is returned when the job completes, view errors for the job. To view error information for this job, add the following command to the end of the **mapreducejob.ps1** file. Then save the file and rerun the script.
 
 ```powershell
 # Print the output of the WordCount job.
@@ -95,17 +90,9 @@ Get-AzHDInsightJobOutput `
 
 This cmdlet returns the information that was written to STDERR as the job runs.
 
-## <a id="summary"></a>Summary
+## Next steps
 
-As you can see, Azure PowerShell provides an easy way to run MapReduce jobs on an HDInsight cluster, monitor the job status, and retrieve the output.
-
-## <a id="nextsteps"></a>Next steps
-
-For general information about MapReduce jobs in HDInsight:
+As you can see, Azure PowerShell provides an easy way to run MapReduce jobs on an HDInsight cluster, monitor the job status, and retrieve the output. For information about other ways you can work with Hadoop on HDInsight:
 
 * [Use MapReduce on HDInsight Hadoop](hdinsight-use-mapreduce.md)
-
-For information about other ways you can work with Hadoop on HDInsight:
-
 * [Use Apache Hive with Apache Hadoop on HDInsight](hdinsight-use-hive.md)
-* [Use Apache Pig with Apache Hadoop on HDInsight](hdinsight-use-pig.md)

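The troubleshooting text says the appended command returns what the job wrote to STDERR, but this diff elides the command itself. As a sketch only, and not a reconstruction of the elided lines, the following wrapper asks `Get-AzHDInsightJobOutput` for standard error through its `-DisplayOutputType` parameter. The function name `Get-HDInsightJobError` is hypothetical.

```powershell
# Minimal sketch (assumption): fetch a job's standard error instead of its
# standard output. Get-HDInsightJobError is a hypothetical wrapper.
function Get-HDInsightJobError {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [string] $JobId,
        [Parameter(Mandatory)] [pscredential] $HttpCredential
    )

    # -DisplayOutputType StandardError selects what the job wrote to STDERR.
    Get-AzHDInsightJobOutput `
        -ClusterName $ClusterName `
        -JobId $JobId `
        -HttpCredential $HttpCredential `
        -DisplayOutputType StandardError
}

# Example usage (hypothetical variables from an earlier job submission):
# Get-HDInsightJobError -ClusterName $clusterName -JobId $job.JobId -HttpCredential $creds
```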