Commit 8221178

Update apache-hadoop-run-samples-linux.md
1 parent ffabf85 commit 8221178

File tree

1 file changed (+8 −8 lines changed)


articles/hdinsight/hadoop/apache-hadoop-run-samples-linux.md

Lines changed: 8 additions & 8 deletions
```diff
@@ -29,7 +29,7 @@ The following samples are contained in this archive:
 |---|---|
 |aggregatewordcount|Counts the words in the input files.|
 |aggregatewordhist|Computes the histogram of the words in the input files.|
-|bbp|Uses Bailey-Borwein-Plouffe to compute exact digits of Pi.|
+|`bbp`|Uses Bailey-Borwein-Plouffe to compute exact digits of Pi.|
 |dbcount|Counts the pageview logs stored in a database.|
 |distbbp|Uses a BBP-type formula to compute exact bits of Pi.|
 |grep|Counts the matches of a regex in the input.|
```
```diff
@@ -38,15 +38,15 @@ The following samples are contained in this archive:
 |pentomino|Tile laying program to find solutions to pentomino problems.|
 |pi|Estimates Pi using a quasi-Monte Carlo method.|
 |randomtextwriter|Writes 10 GB of random textual data per node.|
-|randomwriter|Writes 10 GB of random data per node.|
-|secondarysort|Defines a secondary sort to the reduce phase.|
+|`randomwriter`|Writes 10 GB of random data per node.|
+|`secondarysort`|Defines a secondary sort to the reduce phase.|
 |sort|Sorts the data written by the random writer.|
 |sudoku|A sudoku solver.|
 |teragen|Generate data for the terasort.|
 |terasort|Run the terasort.|
 |teravalidate|Checking results of terasort.|
 |wordcount|Counts the words in the input files.|
-|wordmean|Counts the average length of the words in the input files.|
+|`wordmean`|Counts the average length of the words in the input files.|
 |wordmedian|Counts the median length of the words in the input files.|
 |wordstandarddeviation|Counts the standard deviation of the length of the words in the input files.|

```

````diff
@@ -116,7 +116,7 @@ The following samples are contained in this archive:
 * Each column can contain either a number or `?` (which indicates a blank cell)
 * Cells are separated by a space

-There is a certain way to construct Sudoku puzzles; you can't repeat a number in a column or row. There's an example on the HDInsight cluster that is properly constructed. It is located at `/usr/hdp/*/hadoop/src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/dancing/puzzle1.dta` and contains the following text:
+There is a certain way to construct Sudoku puzzles; you can't repeat a number in a column or row. There is an example of the HDInsight cluster that is properly constructed. It is located at `/usr/hdp/*/hadoop/src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/dancing/puzzle1.dta` and contains the following text:

 ```output
 8 5 ? 3 9 ? ? ? ?
````
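The constraint stated in the changed line (no repeated number in a column or row) can be checked mechanically. A minimal Python sketch, assuming the space-separated format with `?` for blanks described in the bullets above; the function name `is_valid` and the tiny 3x3 example are illustrative, not part of the Hadoop sample:

```python
def is_valid(puzzle_text):
    """Check that no filled cell repeats in its row or column.

    Cells are space-separated; '?' marks a blank and is ignored.
    """
    rows = [line.split() for line in puzzle_text.strip().splitlines()]
    cols = list(zip(*rows))  # transpose to get columns
    for group in rows + cols:
        filled = [cell for cell in group if cell != "?"]
        if len(filled) != len(set(filled)):
            return False  # a number repeats in this row or column
    return True

# Tiny hypothetical 3x3 grid (a real puzzle is 9x9).
print(is_valid("1 2 ?\n? 1 3\n3 ? 2"))   # no repeats in any row or column
print(is_valid("1 1 ?\n? 2 3\n3 ? 2"))   # first row repeats 1
```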
```diff
@@ -152,7 +152,7 @@ The results appear similar to the following text:

 ## Pi (π) example

-The pi sample uses a statistical (quasi-Monte Carlo) method to estimate the value of pi. Points are placed at random in a unit square. The square also contains a circle. The probability that the points fall within the circle is equal to the area of the circle, pi/4. The value of pi can be estimated from the value of 4R. R is the ratio of the number of points that are inside the circle to the total number of points that are within the square. The larger the sample of points used, the better the estimate is.
+The pi sample uses a statistical (quasi-Monte Carlo) method to estimate the value of pi. Points are placed at random in a unit square. The square also contains a circle. The probability that the points fall within the circle is equal to the area of the circle, pi/4. The value of pi can be estimated from the value of `4R`. R is the ratio of the number of points that are inside the circle to the total number of points that are within the square. The larger the sample of points used, the better the estimate is.

 Use the following command to run this sample. This command uses 16 maps with 10,000,000 samples each to estimate the value of pi:

```
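The estimation the changed paragraph describes can be sketched outside Hadoop. A minimal Python version of the same idea, using a pseudorandom sampler rather than the quasi-random sequence the Hadoop sample uses; `estimate_pi` is an illustrative name, not Hadoop's API:

```python
import random

def estimate_pi(num_points, seed=0):
    """Estimate pi by sampling random points in the unit square.

    The fraction R of points that land inside the quarter circle of
    radius 1 approximates its area, pi/4, so pi is roughly 4R.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points

print(estimate_pi(1_000_000))  # close to 3.14; more points, better estimate
```

As with the Hadoop sample, the error shrinks as the sample count grows, which is why the command above uses 16 maps of 10,000,000 samples each.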
```diff
@@ -162,7 +162,7 @@ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar

 The value returned by this command is similar to **3.14159155000000000000**. For references, the first 10 decimal places of pi are 3.1415926535.

-## 10 GB GraySort example
+## 10-GB GraySort example

 GraySort is a benchmark sort. The metric is the sort rate (TB/minute) that is achieved while sorting large amounts of data, usually a 100 TB minimum.

```
```diff
@@ -174,7 +174,7 @@ This sample uses three sets of MapReduce programs:

 * **TeraSort**: Samples the input data and uses MapReduce to sort the data into a total order

-TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce i+1.
+TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce i+1.

 * **TeraValidate**: A MapReduce program that validates that the output is globally sorted

```
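The partitioning rule in the changed line can be illustrated in a few lines. A hypothetical Python sketch (the names `partition` and `split_points` are illustrative, not Hadoop's TotalOrderPartitioner API): with N-1 sorted sample keys, a binary search places each key in the reduce whose range contains it.

```python
import bisect

def partition(key, split_points):
    """Return the reduce index for a key, given N-1 sorted sample keys.

    All keys with sample[i-1] <= key < sample[i] go to reduce i, so every
    output of reduce i sorts before every output of reduce i+1.
    """
    return bisect.bisect_right(split_points, key)

# Hypothetical example: 3 reduces need N-1 = 2 split points.
splits = ["g", "p"]
print(partition("apple", splits))  # 0: "apple" < "g"
print(partition("grape", splits))  # 1: "g" <= "grape" < "p"
print(partition("zebra", splits))  # 2: "p" <= "zebra"
```

Because the split points are sampled from the input, each reduce receives a similar share of keys, and simply concatenating the reduce outputs in index order yields a totally ordered result.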