Commit d7b0b2c

Merge pull request #105911 from dagiro/freshness3
freshness3
2 parents df1e965 + 4dabf19 commit d7b0b2c

1 file changed: +5 -6 lines changed

articles/hdinsight/hadoop/apache-hadoop-introduction.md

Lines changed: 5 additions & 6 deletions
@@ -1,14 +1,13 @@
 ---
 title: What is the Apache Hadoop technology stack? - Azure HDInsight
 description: An introduction to HDInsight, and the Apache Hadoop technology stack and components.
-keywords: azure hadoop, hadoop azure, hadoop intro, hadoop introduction, hadoop technology stack, intro to hadoop, introduction to hadoop, what is a hadoop cluster, what is hadoop cluster, what is hadoop used for
 author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: hdinsightactive,hdiseo17may2017,mvc,seodec18
 ms.topic: overview
-ms.date: 08/15/2019
+ms.custom: hdinsightactive,hdiseo17may2017,mvc,seodec18
+ms.date: 02/27/2020
 #Customer intent: As a data analyst, I want understand what is Hadoop and how it is offered in Azure HDInsight so that I can decide on using HDInsight instead of on premises clusters.
 ---
 
@@ -20,15 +19,15 @@ Azure HDInsight is a fully managed, full-spectrum, open-source analytics service
 
 To see available Hadoop technology stack components on HDInsight, see [Components and versions available with HDInsight](../hdinsight-component-versioning.md). To read more about Hadoop in HDInsight, see the [Azure features page for HDInsight](https://azure.microsoft.com/services/hdinsight/).
 
-## <a id="whatis"></a>What is MapReduce
+## What is MapReduce
 
 Apache Hadoop MapReduce is a software framework for writing jobs that process vast amounts of data. Input data is split into independent chunks. Each chunk is processed in parallel across the nodes in your cluster. A MapReduce job consists of two functions:
 
 * **Mapper**: Consumes input data, analyzes it (usually with filter and sorting operations), and emits tuples (key-value pairs)
 
 * **Reducer**: Consumes tuples emitted by the Mapper and performs a summary operation that creates a smaller, combined result from the Mapper data
 
-A basic word count MapReduce job example is illustrated in the following diagram:
+A basic word count MapReduce job example is illustrated in the following diagram:
 
 ![HDI.WordCountDiagram](./media/apache-hadoop-introduction/hdi-word-count-diagram.gif)
 
@@ -47,7 +46,7 @@ Hadoop streaming communicates with the mapper and reducer over STDIN and STDOUT.
 
 [key]/t[value]
 
-For more information, see [Hadoop Streaming](https://hadoop.apache.org/docs/r1.2.1/streaming.html).
+For more information, see [Hadoop Streaming](https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html).
 
 For examples of using Hadoop streaming with HDInsight, see the following document:
 
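The Mapper and Reducer roles and the line-oriented key/value format that appear in the diff context can be sketched as a word count in Python. This is a minimal illustration and not part of the commit. It assumes that the `/t` in `[key]/t[value]` stands for a tab character, and the names `mapper`, `reducer`, and `run_local` are hypothetical. The `sorted` call is only a local stand-in for the sort that Hadoop performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Chain mapper -> sort -> reducer in one process (assumption: local test only)."""
    return list(reducer(sorted(mapper(lines))))
```

In an actual streaming job, the mapper and reducer would typically be two separate scripts that read from STDIN and write to STDOUT. Keeping them as generator functions here makes the contract easy to check without a cluster.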