articles/hdinsight/hadoop/apache-hadoop-dotnet-csharp-mapreduce-streaming.md (45 additions & 46 deletions)
@@ -51,7 +51,7 @@ For more information on streaming, see [Hadoop Streaming](https://hadoop.apache.
## Create the mapper

-In Visual Studio, create a new .NET Framework console app named *mapper*. Use the following code for the application:
+In Visual Studio, create a new .NET Framework console application named *mapper*. Use the following code for the application:

```csharp
using System;
```
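The hunk's context cuts the mapper listing off at its first line. For orientation, a minimal streaming word-count mapper in C# might look like the sketch below. It is an illustration of the pattern, not the article's exact listing: read lines from STDIN, emit a tab-delimited `word\t1` pair for each word on STDOUT.

```csharp
using System;

namespace mapper
{
    class Program
    {
        static void Main(string[] args)
        {
            string line;

            // Hadoop streaming feeds input to the mapper on STDIN, one line at a time.
            while ((line = Console.ReadLine()) != null)
            {
                // Split on whitespace and emit a tab-delimited (word, 1) pair for
                // each word. Hadoop treats everything before the tab as the key.
                var words = line.Split(new[] { ' ', '\t' },
                                       StringSplitOptions.RemoveEmptyEntries);
                foreach (var word in words)
                {
                    Console.WriteLine("{0}\t1", word.ToLower());
                }
            }
        }
    }
}
```

Hadoop streaming communicates with the mapper and reducer entirely through this text protocol: keys and values on STDIN and STDOUT, separated by a tab.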
@@ -88,7 +88,7 @@ After you create the application, build it to produce the */bin/Debug/mapper.exe
## Create the reducer

-In Visual Studio, create a new .NET Framework console app named *reducer*. Use the following code for the application:
+In Visual Studio, create a new .NET Framework console application named *reducer*. Use the following code for the application:

```csharp
using System;
```
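As with the mapper, only the first line of the reducer survives in the diff context. A minimal sketch of a streaming word-count reducer follows; again, this illustrates the technique rather than reproducing the article's code. Because Hadoop sorts mapper output by key before the reducer sees it, all lines for a given word arrive consecutively, so a single running total suffices.

```csharp
using System;

namespace reducer
{
    class Program
    {
        static void Main(string[] args)
        {
            string line;
            string lastWord = null;
            int count = 0;

            // Input arrives sorted by key, so a change in the word signals
            // that the previous word's count is complete.
            while ((line = Console.ReadLine()) != null)
            {
                var parts = line.Split('\t');
                if (parts.Length < 2) continue;

                if (parts[0] != lastWord)
                {
                    // Key changed: emit the total for the previous word.
                    if (lastWord != null)
                    {
                        Console.WriteLine("{0}\t{1}", lastWord, count);
                    }
                    lastWord = parts[0];
                    count = 0;
                }
                count += int.Parse(parts[1]);
            }

            // Emit the final word's total.
            if (lastWord != null)
            {
                Console.WriteLine("{0}\t{1}", lastWord, count);
            }
        }
    }
}
```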
@@ -141,12 +141,9 @@ After you create the application, build it to produce the */bin/Debug/reducer.ex
## Upload to storage

-Next, you need to upload the *mapper* and *reducer* apps to HDInsight storage.
+Next, you need to upload the *mapper* and *reducer* applications to HDInsight storage.

-> [!NOTE]
-> To upload to storage on your HDInsight cluster from Visual Studio, you need to have at least co-administrator access to your Azure subscription. To change administrators for a subscription, see [Add or change Azure subscription administrators](../../billing/billing-add-change-azure-subscription-administrator.md).
-
-1. In Visual Studio, open **Server Explorer**.
+1. In Visual Studio, choose **View** > **Server Explorer**.

2. Expand **Azure**, and then expand **HDInsight**.
@@ -162,59 +159,61 @@ Next, you need to upload the *mapper* and *reducer* apps to HDInsight storage.
162
159
163
160
5. To upload the .exe files, use one of the following methods:
164
161
165
-
* For an **Azure Storage Account**, select the upload icon, and then browse to the *bin\debug* folder for the *mapper* project. Finally, select the *mapper.exe* file and then select **Ok**.
162
+
* If you're using an **Azure Storage Account**, select the **Upload Blob** icon.
163
+
164
+

165
+
166
+
In the **Upload New File** dialog box, under **File name**, select **Browse**. In the **Upload Blob** dialog box, go to the *bin\debug* folder for the *mapper* project, and then choose the *mapper.exe* file. Finally, select **Open** and then **OK** to complete the upload.
166
167
167
-

168
-
169
-
* For **Azure Data Lake Storage**, right-click an empty area in the file listing, and then select **Upload**. Finally, select the *mapper.exe* file and then select **Open**.
168
+
* For **Azure Data Lake Storage**, right-click an empty area in the file listing, and then select **Upload**. Finally, select the *mapper.exe* file and then select **Open**.
170
169
171
-
Once the *mapper.exe* upload has finished, repeat the upload process for the *reducer.exe* file.
170
+
Once the *mapper.exe* upload has finished, repeat the upload process for the *reducer.exe* file.
172
171
173
172
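If you'd rather script the upload than click through Server Explorer, a short sketch using the Azure.Storage.Blobs SDK is below. This is an assumed alternative, not a step from the article; the connection string and container name are placeholders for your cluster's default storage details.

```csharp
using Azure.Storage.Blobs;

class UploadApps
{
    static void Main()
    {
        // Placeholder values: supply the connection string for the storage
        // account and the container that back the cluster's default storage.
        var container = new BlobContainerClient(
            "<storage-account-connection-string>", "<cluster-container>");

        // Put mapper.exe and reducer.exe at the root of default storage so the
        // job can reference them as wasb:///mapper.exe and wasb:///reducer.exe.
        container.GetBlobClient("mapper.exe")
                 .Upload(@"mapper\bin\Debug\mapper.exe", overwrite: true);
        container.GetBlobClient("reducer.exe")
                 .Upload(@"reducer\bin\Debug\reducer.exe", overwrite: true);
    }
}
```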
## Run a job: Using an SSH session

The following procedure describes how to run a MapReduce job using an SSH session:

-1. Use SSH to connect to the HDInsight cluster. For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
+1. Use SSH to connect to the HDInsight cluster. (For example, run the command `ssh sshuser@<clustername>-ssh.azurehdinsight.net`.) For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).

2. Use one of the following commands to start the MapReduce job:
-* If default storage is **Data Lake Storage Gen2**:
-
-    ```bash
-    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-        -files abfs:///mapper.exe,abfs:///reducer.exe \
-        -mapper mapper.exe \
-        -reducer reducer.exe \
-        -input /example/data/gutenberg/davinci.txt \
-        -output /example/wordcountout
-    ```
-
-* If default storage is **Data Lake Storage Gen1**:
-
-    ```bash
-    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-        -files adl:///mapper.exe,adl:///reducer.exe \
-        -mapper mapper.exe \
-        -reducer reducer.exe \
-        -input /example/data/gutenberg/davinci.txt \
-        -output /example/wordcountout
-    ```
-
-* If default storage is **Azure Storage**:
-
-    ```bash
-    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
-        -files wasb:///mapper.exe,wasb:///reducer.exe \
-        -mapper mapper.exe \
-        -reducer reducer.exe \
-        -input /example/data/gutenberg/davinci.txt \
-        -output /example/wordcountout
-    ```
+* If the default storage is **Azure Storage**:
+
+    ```bash
+    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+        -files wasb:///mapper.exe,wasb:///reducer.exe \
+        -mapper mapper.exe \
+        -reducer reducer.exe \
+        -input /example/data/gutenberg/davinci.txt \
+        -output /example/wordcountout
+    ```
+
+* If the default storage is **Data Lake Storage Gen1**:
+
+    ```bash
+    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+        -files adl:///mapper.exe,adl:///reducer.exe \
+        -mapper mapper.exe \
+        -reducer reducer.exe \
+        -input /example/data/gutenberg/davinci.txt \
+        -output /example/wordcountout
+    ```
+
+* If the default storage is **Data Lake Storage Gen2**:
+
+    ```bash
+    yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
+        -files abfs:///mapper.exe,abfs:///reducer.exe \
+        -mapper mapper.exe \
+        -reducer reducer.exe \
+        -input /example/data/gutenberg/davinci.txt \
+        -output /example/wordcountout
+    ```
The following list describes what each parameter and option represents:

* *hadoop-streaming.jar*: Specifies the jar file that contains the streaming MapReduce functionality.
-* `-files`: Specifies the *mapper.exe* and *reducer.exe* files for this job. The `abfs:///`, `adl:///`, or `wasb:///` protocol declaration before each file is the path to the root of default storage for the cluster.
+* `-files`: Specifies the *mapper.exe* and *reducer.exe* files for this job. The `wasb:///`, `adl:///`, or `abfs:///` protocol declaration before each file is the path to the root of default storage for the cluster.
* `-mapper`: Specifies the file that implements the mapper.
* `-reducer`: Specifies the file that implements the reducer.