articles/hdinsight/hadoop/apache-hadoop-run-samples-linux.md
8 additions & 8 deletions
@@ -29,7 +29,7 @@ The following samples are contained in this archive:
 |---|---|
 |aggregatewordcount|Counts the words in the input files.|
 |aggregatewordhist|Computes the histogram of the words in the input files.|
-|bbp|Uses Bailey-Borwein-Plouffe to compute exact digits of Pi.|
+|`bbp`|Uses Bailey-Borwein-Plouffe to compute exact digits of Pi.|
 |dbcount|Counts the pageview logs stored in a database.|
 |distbbp|Uses a BBP-type formula to compute exact bits of Pi.|
 |grep|Counts the matches of a regex in the input.|
@@ -38,15 +38,15 @@ The following samples are contained in this archive:
 |pentomino|Tile laying program to find solutions to pentomino problems.|
 |pi|Estimates Pi using a quasi-Monte Carlo method.|
 |randomtextwriter|Writes 10 GB of random textual data per node.|
-|randomwriter|Writes 10 GB of random data per node.|
-|secondarysort|Defines a secondary sort to the reduce phase.|
+|`randomwriter`|Writes 10 GB of random data per node.|
+|`secondarysort`|Defines a secondary sort to the reduce phase.|
 |sort|Sorts the data written by the random writer.|
 |sudoku|A sudoku solver.|
 |teragen|Generate data for the terasort.|
 |terasort|Run the terasort.|
 |teravalidate|Checking results of terasort.|
 |wordcount|Counts the words in the input files.|
-|wordmean|Counts the average length of the words in the input files.|
+|`wordmean`|Counts the average length of the words in the input files.|
 |wordmedian|Counts the median length of the words in the input files.|
 |wordstandarddeviation|Counts the standard deviation of the length of the words in the input files.|
 
@@ -116,7 +116,7 @@ The following samples are contained in this archive:
 * Each column can contain either a number or `?` (which indicates a blank cell)
 * Cells are separated by a space
 
-There is a certain way to construct Sudoku puzzles; you can't repeat a number in a column or row. There's an example on the HDInsight cluster that is properly constructed. It is located at `/usr/hdp/*/hadoop/src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/dancing/puzzle1.dta` and contains the following text:
+There is a certain way to construct Sudoku puzzles; you can't repeat a number in a column or row. There is an example on the HDInsight cluster that is properly constructed. It is located at `/usr/hdp/*/hadoop/src/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/dancing/puzzle1.dta` and contains the following text:
 
 ```output
 8 5 ? 3 9 ? ? ? ?
@@ -152,7 +152,7 @@ The results appear similar to the following text:
 
 ## Pi (π) example
 
-The pi sample uses a statistical (quasi-Monte Carlo) method to estimate the value of pi. Points are placed at random in a unit square. The square also contains a circle. The probability that the points fall within the circle is equal to the area of the circle, pi/4. The value of pi can be estimated from the value of 4R. R is the ratio of the number of points that are inside the circle to the total number of points that are within the square. The larger the sample of points used, the better the estimate is.
+The pi sample uses a statistical (quasi-Monte Carlo) method to estimate the value of pi. Points are placed at random in a unit square. The square also contains a circle. The probability that the points fall within the circle is equal to the area of the circle, pi/4. The value of pi can be estimated from the value of `4R`. R is the ratio of the number of points that are inside the circle to the total number of points that are within the square. The larger the sample of points used, the better the estimate is.
 
 Use the following command to run this sample. This command uses 16 maps with 10,000,000 samples each to estimate the value of pi:
 
@@ -162,7 +162,7 @@ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar
 
 The value returned by this command is similar to **3.14159155000000000000**. For reference, the first 10 decimal places of pi are 3.1415926535.
 
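To make the 4R estimate above concrete, here is a minimal single-process Java sketch of the same calculation. The class name `PiEstimate`, the fixed seed, and the sample count are illustrative assumptions; the actual Hadoop pi sample spreads the counting across map tasks and uses quasi-random points, but it reports the same 4R arithmetic described above.

```java
import java.util.Random;

/**
 * Single-process sketch of the 4R estimate described above (not the
 * distributed Hadoop sample): sample points in the unit square and count
 * the fraction R that falls inside a quarter circle of radius 1, which has
 * the same area (pi/4) as the circle inscribed in the square.
 */
public class PiEstimate {
    public static void main(String[] args) {
        long samples = 10_000_000L;      // one map task's worth in the article's command
        long inside = 0;
        Random rng = new Random(42);     // fixed seed so the run is repeatable

        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {  // point lands inside the quarter circle
                inside++;
            }
        }

        double r = (double) inside / samples;  // R: points inside / total points
        System.out.println("Estimated pi = " + (4.0 * r));
    }
}
```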
-## 10GB GraySort example
+## 10-GB GraySort example
 
 GraySort is a benchmark sort. The metric is the sort rate (TB/minute) that is achieved while sorting large amounts of data, usually a 100 TB minimum.
 
@@ -174,7 +174,7 @@ This sample uses three sets of MapReduce programs:
 
 * **TeraSort**: Samples the input data and uses MapReduce to sort the data into a total order
 
-TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce i+1.
+TeraSort is a standard MapReduce sort, except for a custom partitioner. The partitioner uses a sorted list of N-1 sampled keys that define the key range for each reduce. In particular, all keys such that sample[i-1] <= key < sample[i] are sent to reduce i. This partitioner guarantees that the outputs of reduce i are all less than the output of reduce `i+1`.
 
 * **TeraValidate**: A MapReduce program that validates that the output is globally sorted
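As a rough illustration of the partitioning rule described in the TeraSort item above, the following standalone Java sketch maps a key to a reduce index by binary-searching a sorted list of N-1 sample keys. The class `RangePartitionSketch` and the string keys are assumptions made for illustration; the real TeraSort partitioner operates on byte-oriented keys inside the Hadoop MapReduce API, but the sample[i-1] <= key < sample[i] mapping is the same.

```java
import java.util.Arrays;

/**
 * Standalone sketch of the range-partitioning rule quoted above:
 * N-1 sorted sample keys split the key space into N ranges, one per reduce.
 * A key with sample[i-1] <= key < sample[i] goes to reduce i, so every key
 * handled by reduce i sorts before every key handled by reduce i+1.
 */
public class RangePartitionSketch {
    private final String[] splitPoints;  // the N-1 sampled keys, sorted ascending

    public RangePartitionSketch(String[] sortedSamples) {
        this.splitPoints = sortedSamples.clone();
    }

    /** Returns the reduce index (0..N-1) for a key. */
    public int getPartition(String key) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns the match index, or (-(insertion point) - 1) if
        // the key is absent; either way the number of split keys <= key is the
        // reduce index.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Three split points define four reduces.
        RangePartitionSketch p = new RangePartitionSketch(new String[] {"g", "n", "t"});
        System.out.println(p.getPartition("apple"));   // 0: "apple" < "g"
        System.out.println(p.getPartition("hadoop"));  // 1: "g" <= "hadoop" < "n"
        System.out.println(p.getPartition("sort"));    // 2: "n" <= "sort" < "t"
        System.out.println(p.getPartition("yarn"));    // 3: "t" <= "yarn"
    }
}
```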