`articles/hdinsight/hadoop/apache-hadoop-run-samples-linux.md` — 3 additions & 3 deletions
```diff
@@ -47,7 +47,7 @@ The following samples are contained in this archive:
 |teravalidate|Checking results of terasort.|
 |wordcount|Counts the words in the input files.|
 |`wordmean`|Counts the average length of the words in the input files.|
-|`wordmedian|Counts the median length of the words in the input files.|
+|`wordmedian`|Counts the median length of the words in the input files.|
 |wordstandarddeviation|Counts the standard deviation of the length of the words in the input files.|

 ## Run the wordcount example
```
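The `wordmedian` row corrected in the hunk above describes a sample that computes the median word length of its input files. As a minimal local sketch of that computation (a plain Python illustration, not the Hadoop MapReduce implementation shipped in the examples jar):

```python
import statistics

def word_median(text: str) -> float:
    """Median length of the whitespace-separated words in text,
    the same statistic the wordmedian sample computes at scale."""
    lengths = [len(word) for word in text.split()]
    return statistics.median(lengths)

print(word_median("the quick brown fox"))  # -> 4.0 (lengths 3, 5, 5, 3)
```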
```diff
@@ -166,15 +166,15 @@ The value returned by this command is similar to **3.14159155000000000000**. For

 GraySort is a benchmark sort. The metric is the sort rate (TB/minute) that is achieved while sorting large amounts of data, usually a 100 TB minimum.

-This sample uses a modest 10 GB of data so that it can be run relatively quickly. It uses the MapReduce applications developed by **Owen O'Malley** and **Arun Murthy**. These applications won the annual general-purpose ("Daytona") terabyte sort benchmark in 2009, with a rate of 0.578 TB/min (100 TB in 173 minutes). For more information on this and other sorting benchmarks, see the [Sort Benchmark](https://sortbenchmark.org/) site.
+This sample uses a modest 10 GB of data so that it can be run relatively quickly. It uses the MapReduce applications developed by `Owen O'Malley` and `Arun Murthy`. These applications won the annual general-purpose ("Daytona") terabyte sort benchmark in 2009, with a rate of 0.578 TB/min (100 TB in 173 minutes). For more information on this and other sorting benchmarks, see the [Sort Benchmark](https://sortbenchmark.org/) site.

 This sample uses three sets of MapReduce programs:

 * **TeraGen**: A MapReduce program that generates rows of data to sort

 * **TeraSort**: Samples the input data and uses MapReduce to sort the data into a total order

-    TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce `i+1`.
+    TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce `i` are all less than the output of reduce `i+1`.

 * **TeraValidate**: A MapReduce program that validates that the output is globally sorted
```
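The partitioner described in the changed TeraSort paragraph can be sketched in a few lines of Python (an illustration of the sampled-key technique, not Hadoop's actual Java `TotalOrderPartitioner`). Given N-1 sorted boundary keys for N reduce tasks, a binary search finds the reduce index for each key:

```python
import bisect

def make_partitioner(sampled_keys):
    """Build a TeraSort-style partitioner from N-1 sampled boundary keys.

    All keys with sample[i-1] <= key < sample[i] are routed to reduce i,
    so every output of reduce i sorts before every output of reduce i+1.
    """
    boundaries = sorted(sampled_keys)

    def partition(key):
        # bisect_right counts the boundaries <= key, which is exactly
        # the index i of the target reduce task.
        return bisect.bisect_right(boundaries, key)

    return partition

# Three reduce tasks need N-1 = 2 boundary keys (hypothetical samples).
partition = make_partitioner([b"g", b"p"])
assert partition(b"apple") == 0  # key < b"g"
assert partition(b"grape") == 1  # b"g" <= key < b"p"
assert partition(b"zebra") == 2  # key >= b"p"
```

Because the boundaries are sorted and the search is half-open on the right, equal keys always land in the same partition, which is what makes the concatenated reduce outputs globally sorted.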