Commit 2797e33

Merge pull request #100849 from dagiro/freshness172
freshness172
2 parents 443c101 + 0cbba98 commit 2797e33

1 file changed

+32
-51
lines changed

articles/hdinsight/hadoop/apache-hadoop-use-mapreduce-ssh.md

Lines changed: 32 additions & 51 deletions
@@ -2,15 +2,14 @@
 title: MapReduce and SSH connection with Apache Hadoop - Azure HDInsight
 description: Learn how to use SSH to run MapReduce jobs using Apache Hadoop on HDInsight.
 author: hrasheed-msft
+ms.author: hrasheed
 ms.reviewer: jasonh
-
 ms.service: hdinsight
-ms.custom: hdinsightactive
 ms.topic: conceptual
-ms.date: 04/10/2018
-ms.author: hrasheed
-
+ms.custom: hdinsightactive
+ms.date: 01/10/2020
 ---
+
 # Use MapReduce with Apache Hadoop on HDInsight with SSH
 
 [!INCLUDE [mapreduce-selector](../../../includes/hdinsight-selector-use-mapreduce.md)]
@@ -20,31 +19,17 @@ Learn how to submit MapReduce jobs from a Secure Shell (SSH) connection to HDIns
 > [!NOTE]
 > If you are already familiar with using Linux-based Apache Hadoop servers, but you are new to HDInsight, see [Linux-based HDInsight tips](../hdinsight-hadoop-linux-information.md).
 
-## <a id="prereq"></a>Prerequisites
-
-* A Linux-based HDInsight (Hadoop on HDInsight) cluster
-
-* An SSH client. For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md)
-
-## <a id="ssh"></a>Connect with SSH
+## Prerequisites
 
-Connect to the cluster using SSH. For example, the following command connects to a cluster named **myhdinsight** as the **sshuser** account:
+An Apache Hadoop cluster on HDInsight. See [Create Apache Hadoop clusters using the Azure portal](../hdinsight-hadoop-create-linux-clusters-portal.md).
 
-```bash
-
-```
+## Use Hadoop commands
 
-**If you use a certificate key for SSH authentication**, you may need to specify the location of the private key on your client system, for example:
+1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
 
-```bash
-ssh -i ~/mykey.key [email protected]
-```
-
-**If you use a password for SSH authentication**, you need to provide the password when prompted.
-
-For more information on using SSH with HDInsight, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).
-
-## <a id="hadoop"></a>Use Hadoop commands
+```cmd
+
+```
 
 1. After you are connected to the HDInsight cluster, use the following command to start a MapReduce job:
 
@@ -57,14 +42,16 @@ For more information on using SSH with HDInsight, see [Use SSH with HDInsight](.
 > [!NOTE]
 > For more information about this MapReduce job and the example data, see [Use MapReduce in Apache Hadoop on HDInsight](hdinsight-use-mapreduce.md).
 
-2. The job emits details as it processes, and it returns information similar to the following text when the job completes:
+The job emits details as it processes, and it returns information similar to the following text when the job completes:
 
-File Input Format Counters
-Bytes Read=1395666
-File Output Format Counters
-Bytes Written=337623
+```output
+File Input Format Counters
+Bytes Read=1395666
+File Output Format Counters
+Bytes Written=337623
+```
 
-3. When the job completes, use the following command to list the output files:
+1. When the job completes, use the following command to list the output files:
 
 ```bash
 hdfs dfs -ls /example/data/WordCountOutput
@@ -75,33 +62,27 @@ For more information on using SSH with HDInsight, see [Use SSH with HDInsight](.
 > [!NOTE]
 > Some MapReduce jobs may split the results across multiple **part-r-#####** files. If so, use the ##### suffix to indicate the order of the files.
 
-4. To view the output, use the following command:
+1. To view the output, use the following command:
 
 ```bash
 hdfs dfs -cat /example/data/WordCountOutput/part-r-00000
 ```
 
-This command displays a list of the words that are contained in the **wasb://example/data/gutenberg/davinci.txt** file and the number of times each word occurred. The following text is an example of the data that is contained in the file:
+This command displays a list of the words that are contained in the **wasbs://example/data/gutenberg/davinci.txt** file and the number of times each word occurred. The following text is an example of the data that is contained in the file:
 
-wreathed 3
-wreathing 1
-wreaths 1
-wrecked 3
-wrenching 1
-wretched 6
-wriggling 1
-
-## <a id="summary"></a>Summary
-
-As you can see, Hadoop commands provide an easy way to run MapReduce jobs in an HDInsight cluster and then view the job output.
+```output
+wreathed 3
+wreathing 1
+wreaths 1
+wrecked 3
+wrenching 1
+wretched 6
+wriggling 1
+```
 
-## <a id="nextsteps"></a>Next steps
+## Next steps
 
-For general information about MapReduce jobs in HDInsight:
+As you can see, Hadoop commands provide an easy way to run MapReduce jobs in an HDInsight cluster and then view the job output. For information about other ways you can work with Hadoop on HDInsight:
 
 * [Use MapReduce on HDInsight Hadoop](hdinsight-use-mapreduce.md)
-
-For information about other ways you can work with Hadoop on HDInsight:
-
 * [Use Apache Hive with Apache Hadoop on HDInsight](hdinsight-use-hive.md)
-* [Use Apache Pig with Apache Hadoop on HDInsight](hdinsight-use-pig.md)
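
The output-viewing step in the changed article (`hdfs dfs -cat` on a `part-r-00000` file) prints words in alphabetical order, so the most frequent words are not easy to spot. The following is a minimal sketch, not part of this commit, of ranking that output by count. `top_words` is a hypothetical helper name, and the sketch assumes each output line is a word followed by whitespace and a count, as in the sample shown in the diff.

```bash
# Hypothetical helper (not from the article): print the N most frequent words.
# Assumes stdin lines look like "word<whitespace>count", as in the sample output.
top_words() {
    # Sort by the second field numerically, highest first; break ties by word.
    # Keep the first N lines (default 10).
    sort -k2,2nr -k1,1 | head -n "${1:-10}"
}

# On the cluster, assuming the output path used in the article:
# hdfs dfs -cat /example/data/WordCountOutput/part-r-00000 | top_words 5
```

Because the helper only reads standard input, it can be tried on a local copy of the output before being run against the cluster.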
