
Commit 52cae07

Removed the co-admin note from the MapReduce streaming article.
Also made a few other fixes.
Parent: ef785f8

File tree

3 files changed: +45, -46 lines

articles/hdinsight/hadoop/apache-hadoop-dotnet-csharp-mapreduce-streaming.md

Lines changed: 45 additions & 46 deletions
@@ -51,7 +51,7 @@ For more information on streaming, see [Hadoop Streaming](https://hadoop.apache.
 
 ## Create the mapper
 
-In Visual Studio, create a new .NET Framework console app named *mapper*. Use the following code for the application:
+In Visual Studio, create a new .NET Framework console application named *mapper*. Use the following code for the application:
 
 ```csharp
 using System;
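
The hunk shows only the first source line of the mapper (`using System;`). For context, a Hadoop streaming mapper of the word-count style this article builds reads records from stdin and writes tab-delimited key/value pairs to stdout. The following is a minimal sketch under that assumption; the class layout and identifier names are illustrative, not necessarily the article's exact code:

```csharp
using System;
using System.Text.RegularExpressions;

namespace mapper
{
    class Program
    {
        static void Main(string[] args)
        {
            string line;

            // Hadoop streaming passes each input record to the mapper on stdin.
            while ((line = Console.ReadLine()) != null)
            {
                // Keep only letters and whitespace so punctuation doesn't split words.
                string onlyText = Regex.Replace(line.ToLower(), @"[^a-z\s]", string.Empty);

                // Emit a tab-delimited key/value pair for each word: "<word>\t1".
                foreach (string word in onlyText.Split(
                    new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
                {
                    Console.WriteLine("{0}\t1", word);
                }
            }
        }
    }
}
```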
@@ -88,7 +88,7 @@ After you create the application, build it to produce the */bin/Debug/mapper.exe
 
 ## Create the reducer
 
-In Visual Studio, create a new .NET Framework console app named *reducer*. Use the following code for the application:
+In Visual Studio, create a new .NET Framework console application named *reducer*. Use the following code for the application:
 
 ```csharp
 using System;
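
Again, only the first source line of the reducer appears in the diff. Because Hadoop streaming sorts mapper output by key before the reduce phase, identical words arrive consecutively, so a reducer can total a run of matching keys and emit the sum when the key changes. A minimal sketch, assuming the conventional `word\t1` mapper output above (names illustrative):

```csharp
using System;

namespace reducer
{
    class Program
    {
        static void Main(string[] args)
        {
            string line;
            string lastWord = null;
            int count = 0;

            // Mapper output arrives on stdin, sorted by key,
            // so all records for one word are consecutive.
            while ((line = Console.ReadLine()) != null)
            {
                // Each record is a tab-delimited key/value pair: "<word>\t1".
                string[] parts = line.Split('\t');
                string word = parts[0];

                if (word != lastWord)
                {
                    // A new key starts; emit the total for the previous one.
                    if (lastWord != null)
                    {
                        Console.WriteLine("{0}\t{1}", lastWord, count);
                    }
                    lastWord = word;
                    count = 0;
                }
                count += int.Parse(parts[1]);
            }

            // Emit the total for the final key.
            if (lastWord != null)
            {
                Console.WriteLine("{0}\t{1}", lastWord, count);
            }
        }
    }
}
```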
@@ -141,12 +141,9 @@ After you create the application, build it to produce the */bin/Debug/reducer.ex
 
 ## Upload to storage
 
-Next, you need to upload the *mapper* and *reducer* apps to HDInsight storage.
+Next, you need to upload the *mapper* and *reducer* applications to HDInsight storage.
 
-> [!NOTE]
-> To upload to storage on your HDInsight cluster from Visual Studio, you need to have at least co-administrator access to your Azure subscription. To change administrators for a subscription, see [Add or change Azure subscription administrators](../../billing/billing-add-change-azure-subscription-administrator.md).
-
-1. In Visual Studio, open **Server Explorer**.
+1. In Visual Studio, choose **View** > **Server Explorer**.
 
 2. Expand **Azure**, and then expand **HDInsight**.
 
@@ -162,59 +159,61 @@ Next, you need to upload the *mapper* and *reducer* apps to HDInsight storage.
 
 5. To upload the .exe files, use one of the following methods:
 
-    * For an **Azure Storage Account**, select the upload icon, and then browse to the *bin\debug* folder for the *mapper* project. Finally, select the *mapper.exe* file and then select **Ok**.
+    * If you're using an **Azure Storage Account**, select the **Upload Blob** icon.
+
+        ![HDInsight upload icon for mapper, Visual Studio](./media/apache-hadoop-dotnet-csharp-mapreduce-streaming/hdinsight-upload-icon.png)
+
+        In the **Upload New File** dialog box, under **File name**, select **Browse**. In the **Upload Blob** dialog box, go to the *bin\debug* folder for the *mapper* project, and then choose the *mapper.exe* file. Finally, select **Open** and then **OK** to complete the upload.
 
-        ![HDInsight upload icon for mapper](./media/apache-hadoop-dotnet-csharp-mapreduce-streaming/hdinsight-upload-icon.png)
-
-    * For **Azure Data Lake Storage**, right-click an empty area in the file listing, and then select **Upload**. Finally, select the *mapper.exe* file and then select **Open**.
+    * For **Azure Data Lake Storage**, right-click an empty area in the file listing, and then select **Upload**. Finally, select the *mapper.exe* file and then select **Open**.
 
-    Once the *mapper.exe* upload has finished, repeat the upload process for the *reducer.exe* file.
+    Once the *mapper.exe* upload has finished, repeat the upload process for the *reducer.exe* file.
 
 ## Run a job: Using an SSH session
 
 The following procedure describes how to run a MapReduce job using an SSH session:
 
-1. Use SSH to connect to the HDInsight cluster. For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
+1. Use SSH to connect to the HDInsight cluster. (For example, run the command `ssh sshuser@<clustername>-ssh.azurehdinsight.net`.) For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
 
 2. Use one of the following commands to start the MapReduce job:
 
-    * If default storage is **Data Lake Storage Gen2**:
-
-        ```bash
-        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-            -files abfs:///mapper.exe,abfs:///reducer.exe \
-            -mapper mapper.exe \
-            -reducer reducer.exe \
-            -input /example/data/gutenberg/davinci.txt \
-            -output /example/wordcountout
-        ```
-
-    * If default storage is **Data Lake Storage Gen1**:
-
-        ```bash
-        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-            -files adl:///mapper.exe,adl:///reducer.exe \
-            -mapper mapper.exe \
-            -reducer reducer.exe \
-            -input /example/data/gutenberg/davinci.txt \
-            -output /example/wordcountout
-        ```
-
-    * If default storage is **Azure Storage**:
-
-        ```bash
-        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-            -files wasb:///mapper.exe,wasb:///reducer.exe \
-            -mapper mapper.exe \
-            -reducer reducer.exe \
-            -input /example/data/gutenberg/davinci.txt \
-            -output /example/wordcountout
-        ```
+    * If the default storage is **Azure Storage**:
+
+        ```bash
+        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+            -files wasb:///mapper.exe,wasb:///reducer.exe \
+            -mapper mapper.exe \
+            -reducer reducer.exe \
+            -input /example/data/gutenberg/davinci.txt \
+            -output /example/wordcountout
+        ```
+
+    * If the default storage is **Data Lake Storage Gen1**:
+
+        ```bash
+        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+            -files adl:///mapper.exe,adl:///reducer.exe \
+            -mapper mapper.exe \
+            -reducer reducer.exe \
+            -input /example/data/gutenberg/davinci.txt \
+            -output /example/wordcountout
+        ```
+
+    * If the default storage is **Data Lake Storage Gen2**:
+
+        ```bash
+        yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+            -files abfs:///mapper.exe,abfs:///reducer.exe \
+            -mapper mapper.exe \
+            -reducer reducer.exe \
+            -input /example/data/gutenberg/davinci.txt \
+            -output /example/wordcountout
+        ```
 
 The following list describes what each parameter and option represents:
 
 * *hadoop-streaming.jar*: Specifies the jar file that contains the streaming MapReduce functionality.
-* `-files`: Specifies the *mapper.exe* and *reducer.exe* files for this job. The `abfs:///`,`adl:///`, or `wasb:///` protocol declaration before each file is the path to the root of default storage for the cluster.
+* `-files`: Specifies the *mapper.exe* and *reducer.exe* files for this job. The `wasb:///`, `adl:///`, or `abfs:///` protocol declaration before each file is the path to the root of default storage for the cluster.
 * `-mapper`: Specifies the file that implements the mapper.
 * `-reducer`: Specifies the file that implements the reducer.
 * `-input`: Specifies the input data.
The remaining two changed files are binary images (25 KB and 571 Bytes); their contents are not shown.
