Skip to content

Commit ade043c

Browse files
authored
Update apache-hadoop-run-samples-linux.md
1 parent b3266ed commit ade043c

File tree

1 file changed

+3
-3
lines changed

1 file changed

+3
-3
lines changed

articles/hdinsight/hadoop/apache-hadoop-run-samples-linux.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -47,7 +47,7 @@ The following samples are contained in this archive:
4747
|teravalidate|Checking results of terasort.|
4848
|wordcount|Counts the words in the input files.|
4949
|`wordmean`|Counts the average length of the words in the input files.|
50-
|`wordmedian|Counts the median length of the words in the input files.|
50+
|`wordmedian`|Counts the median length of the words in the input files.|
5151
|wordstandarddeviation|Counts the standard deviation of the length of the words in the input files.|
5252

5353
## Run the wordcount example
@@ -166,15 +166,15 @@ The value returned by this command is similar to **3.14159155000000000000**. For
166166

167167
GraySort is a benchmark sort. The metric is the sort rate (TB/minute) that is achieved while sorting large amounts of data, usually a 100 TB minimum.
168168

169-
This sample uses a modest 10 GB of data so that it can be run relatively quickly. It uses the MapReduce applications developed by **Owen O'Malley** and **Arun Murthy**. These applications won the annual general-purpose ("Daytona") terabyte sort benchmark in 2009, with a rate of 0.578 TB/min (100 TB in 173 minutes). For more information on this and other sorting benchmarks, see the [Sort Benchmark](https://sortbenchmark.org/) site.
169+
This sample uses a modest 10 GB of data so that it can be run relatively quickly. It uses the MapReduce applications developed by `Owen O'Malley` and `Arun Murthy`. These applications won the annual general-purpose ("Daytona") terabyte sort benchmark in 2009, with a rate of 0.578 TB/min (100 TB in 173 minutes). For more information on this and other sorting benchmarks, see the [Sort Benchmark](https://sortbenchmark.org/) site.
170170

171171
This sample uses three sets of MapReduce programs:
172172

173173
* **TeraGen**: A MapReduce program that generates rows of data to sort
174174

175175
* **TeraSort**: Samples the input data and uses MapReduce to sort the data into a total order
176176

177-
TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce `i+1`.
177+
TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce `i` are all less than the output of reduce `i+1`.
178178

179179
* **TeraValidate**: A MapReduce program that validates that the output is globally sorted
180180

0 commit comments

Comments
 (0)