
Commit d365012

Merge pull request #111538 from dagiro/freshness59 (freshness59)
2 parents: f62f9ed + 18739bb

File tree: 1 file changed (+13, -14 lines)


articles/hdinsight/hadoop/apache-hadoop-dotnet-csharp-mapreduce-streaming.md

Lines changed: 13 additions & 14 deletions
@@ -7,14 +7,14 @@ ms.reviewer: jasonh
 ms.service: hdinsight
 ms.topic: conceptual
 ms.custom: hdinsightactive
-ms.date: 11/22/2019
+ms.date: 04/15/2020
 ---

 # Use C# with MapReduce streaming on Apache Hadoop in HDInsight

 Learn how to use C# to create a MapReduce solution on HDInsight.

-Apache Hadoop streaming is a utility that allows you to run MapReduce jobs using a script or executable. In this example, .NET is used to implement the mapper and reducer for a word count solution.
+Apache Hadoop streaming allows you to run MapReduce jobs using a script or executable. Here, .NET is used to implement the mapper and reducer for a word count solution.

 ## .NET on HDInsight

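The streaming contract this paragraph describes (any executable that reads lines on stdin and writes tab-separated key/value pairs on stdout) can be sketched in a few lines. This is an illustrative Python equivalent of the word-count mapper and reducer, not the article's C# code, assuming simple whitespace tokenization:

```python
from collections import Counter

def map_lines(lines):
    """Mapper: emit one "word<TAB>1" pair per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_pairs(pairs):
    """Reducer: sum the counts per word; Hadoop sorts pairs by key before the reduce phase."""
    counts = Counter()
    for pair in pairs:
        word, count = pair.split("\t")
        counts[word] += int(count)
    for word in sorted(counts):
        yield f"{word}\t{counts[word]}"

# In a real streaming job each stage reads stdin and writes stdout;
# here the two stages are chained in-process on sample input.
for result in reduce_pairs(sorted(map_lines(["the quick fox", "the fox"]))):
    print(result)
```

A compiled .NET executable plays the same role in the article: Hadoop only sees a process consuming stdin and producing stdout.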
@@ -44,12 +44,9 @@ For more information on streaming, see [Hadoop Streaming](https://hadoop.apache.

 * If using PowerShell, you'll need the [Az Module](https://docs.microsoft.com/powershell/azure/overview).

-* An SSH client (optional). For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
-
 * An Apache Hadoop cluster on HDInsight. See [Get Started with HDInsight on Linux](../hadoop/apache-hadoop-linux-tutorial-get-started.md).

-* The [URI scheme](../hdinsight-hadoop-linux-information.md#URI-and-scheme) for your clusters primary storage. This would be `wasb://` for Azure Storage, `abfs://` for Azure Data Lake Storage Gen2 or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Azure Storage or Data Lake Storage Gen2, the URI would be `wasbs://` or `abfss://`, respectively See also, [secure transfer](../../storage/common/storage-require-secure-transfer.md).
-
+* The [URI scheme](../hdinsight-hadoop-linux-information.md#URI-and-scheme) for your cluster's primary storage. This scheme would be `wasb://` for Azure Storage, `abfs://` for Azure Data Lake Storage Gen2, or `adl://` for Azure Data Lake Storage Gen1. If secure transfer is enabled for Azure Storage or Data Lake Storage Gen2, the URI would be `wasbs://` or `abfss://`, respectively. See also [secure transfer](../../storage/common/storage-require-secure-transfer.md).

 ## Create the mapper
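For illustration, the schemes in the bullet above compose into full storage URIs. The account, container, and path names below are hypothetical; the host suffixes are the standard Azure ones:

```python
# Hypothetical account/container/path names; each scheme pairs with its
# standard Azure host suffix (secure-transfer variants are wasbs/abfss).
examples = {
    "wasbs": "wasbs://mycontainer@myaccount.blob.core.windows.net/example/data",
    "abfss": "abfss://myfilesystem@myaccount.dfs.core.windows.net/example/data",
    "adl":   "adl://myaccount.azuredatalakestore.net/example/data",
}
for scheme, uri in examples.items():
    # Each URI begins with its scheme followed by "://"
    assert uri.startswith(scheme + "://")
    print(uri)
```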

@@ -147,7 +144,7 @@ Next, you need to upload the *mapper* and *reducer* applications to HDInsight st

 1. In Visual Studio, select **View** > **Server Explorer**.

-1. Right-click **Azure**, select **Connect to Microsoft Azure Subscription...**, and complete the sign in process.
+1. Right-click **Azure**, select **Connect to Microsoft Azure Subscription...**, and complete the sign-in process.

 1. Expand the HDInsight cluster that you wish to deploy this application to. An entry with the text **(Default Storage Account)** is listed.

@@ -216,14 +213,16 @@ The following procedure describes how to run a MapReduce job using an SSH sessio

 The following list describes what each parameter and option represents:

-* *hadoop-streaming.jar*: Specifies the jar file that contains the streaming MapReduce functionality.
-* `-files`: Specifies the *mapper.exe* and *reducer.exe* files for this job. The `wasbs:///`, `adl:///`, or `abfs:///` protocol declaration before each file is the path to the root of default storage for the cluster.
-* `-mapper`: Specifies the file that implements the mapper.
-* `-reducer`: Specifies the file that implements the reducer.
-* `-input`: Specifies the input data.
-* `-output`: Specifies the output directory.
+|Parameter | Description |
+|---|---|
+|hadoop-streaming.jar|Specifies the jar file that contains the streaming MapReduce functionality.|
+|-files|Specifies the *mapper.exe* and *reducer.exe* files for this job. The `wasbs:///`, `adl:///`, or `abfs:///` protocol declaration before each file is the path to the root of default storage for the cluster.|
+|-mapper|Specifies the file that implements the mapper.|
+|-reducer|Specifies the file that implements the reducer.|
+|-input|Specifies the input data.|
+|-output|Specifies the output directory.|
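Assembled from the parameters in the table above, a streaming submission might look like the following sketch. The jar location and input path are hypothetical placeholders; the `-files` URIs follow the table's `wasbs:///` convention, and the output directory matches the `hdfs dfs -text` step later in the article:

```python
# Sketch of a streaming job submission built from the parameters in the
# table. The jar location and input path are hypothetical placeholders.
streaming_args = [
    "yarn", "jar", "hadoop-streaming.jar",                 # jar with streaming MapReduce support
    "-files", "wasbs:///mapper.exe,wasbs:///reducer.exe",  # ship both .NET executables to the cluster
    "-mapper", "mapper.exe",                               # command run for the map phase
    "-reducer", "reducer.exe",                             # command run for the reduce phase
    "-input", "/example/data/input.txt",                   # hypothetical input data
    "-output", "/example/wordcountout",                    # output directory
]
print(" ".join(streaming_args))
```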

-3. Once the MapReduce job completes, use the following command to view the results:
+1. Once the MapReduce job completes, use the following command to view the results:

    ```bash
    hdfs dfs -text /example/wordcountout/part-00000
    ```

0 commit comments
