Commit 75864c9

Merge pull request #2 from learninghadoop2/gmodena-readme-gettingstarted
Add project documentation
2 parents 9285bab + e24889b commit 75864c9

1 file changed

+100
-0
lines changed

README.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# book-examples

This repository contains examples (and errata) for [Learning Hadoop 2](http://learninghadoop2.com).

## Requirements

Throughout the book we use Cloudera CDH 5.0 and Amazon EMR as reference systems. All examples target,
and have been tested with, Java 7.

## Build the examples
The easiest way to build the examples with CDH 5.0 dependencies is to use the provided Gradle and sbt scripts.

### Gradle
We use [Gradle](https://gradle.org) to compile the Java code and collect the required class files into a single JAR file.

```{bash}
$ ./gradlew jar
```

JARs can then be submitted to Hadoop with:

```{bash}
$ hadoop jar <job jarfile> <main class> <argument 1> ... <argument n>
```
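
As a rough sketch of how the submission step could be wrapped, the function below checks that the JAR exists before calling `hadoop jar`. The name `submit_job` and the `DRY_RUN` variable are hypothetical and are not part of this repository; only the `hadoop jar` invocation itself comes from the text above.

```bash
# Hypothetical helper (not shipped with the examples): validate the JAR
# path, then pass everything on to `hadoop jar`.
submit_job() {
  if [ "$#" -lt 2 ]; then
    echo "usage: submit_job <job jarfile> <main class> [arguments...]" >&2
    return 2
  fi
  local jar="$1"
  local main_class="$2"
  shift 2
  if [ ! -f "$jar" ]; then
    echo "JAR not found: $jar (did you run ./gradlew jar?)" >&2
    return 1
  fi
  # DRY_RUN=1 (an assumption of this sketch) prints the command
  # instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo hadoop jar "$jar" "$main_class" "$@"
  else
    hadoop jar "$jar" "$main_class" "$@"
  fi
}
```

Setting `DRY_RUN=1` makes it possible to check the assembled command on a machine without a Hadoop client installed.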

#### Example - Chapter 3 (MapReduce and beyond)

To build the ch3 examples:
```{bash}
$ git clone https://github.com/learninghadoop2/book-examples
$ cd book-examples/ch3
$ ./gradlew jar
```

The script will take care of downloading a Gradle distribution from
the official repo
(https://services.gradle.org/distributions/gradle-2.0-bin.zip),
and use it to build the code under
src/main/java/com/learninghadoop2/mapreduce/. You will find the
resulting JAR in build/libs/mapreduce-example.jar.

We can run the WordCount example as described in Chapter 3:
```{bash}
$ hadoop jar build/libs/mapreduce-example.jar \
com.learninghadoop2.mapreduce.WordCount \
input.txt \
output
```
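
To sanity-check the job's result on a small input, one option is to compute the counts locally and compare them with the files Hadoop writes to `output`. The sketch below assumes that WordCount splits words on whitespace and emits tab-separated `word<TAB>count` lines sorted by byte order; check the Chapter 3 source before relying on that. The function name `local_wordcount` is hypothetical.

```bash
# Hypothetical local reference for WordCount (assumes whitespace
# tokenization and tab-separated "word<TAB>count" output lines).
local_wordcount() {
  # One word per line, drop empty lines, sort by byte order, count.
  tr -s '[:space:]' '\n' < "$1" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Calling `local_wordcount input.txt` prints one line per distinct word, which can then be diffed against the job output.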

For more information on how Gradle is bootstrapped to run the build,
refer to https://docs.gradle.org/current/userguide/gradle_wrapper.html.
The Gradle wrapper is distributed with the examples
(gradle/wrapper/gradle-wrapper.jar).
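
To see which Gradle distribution the wrapper will download, you can read the `distributionUrl` entry from the wrapper's properties file. This sketch assumes the standard wrapper layout, where `gradle-wrapper.properties` sits next to `gradle-wrapper.jar`; the function name `wrapper_distribution_url` is hypothetical.

```bash
# Hypothetical helper: print the distribution URL the Gradle wrapper
# is configured to use (assumes the standard wrapper layout).
wrapper_distribution_url() {
  local props="${1:-gradle/wrapper/gradle-wrapper.properties}"
  # Properties files escape ':' as '\:', so drop the backslashes.
  sed -n 's/^distributionUrl=//p' "$props" | tr -d '\\'
}
```

Run from a chapter directory such as `ch3`, it should print the Gradle distribution URL that `./gradlew` will fetch on first use.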

### SBT

We use [sbt](http://www.scala-sbt.org) to build, manage, and execute the Spark examples in Chapter 5.

The build.sbt file controls the codebase metadata and software dependencies.

The source code for all examples can be compiled with:
```{bash}
$ cd ch5
$ sbt compile
```

Or, it can be packaged into a JAR file with:
```{bash}
$ sbt package
```

For Spark in standalone mode, a helper script to execute compiled classes can be generated with:
```{bash}
$ sbt add-start-script-tasks
$ sbt start-script
```
The helper can be invoked as follows:

```{bash}
$ target/start <class name> <master> <param1> … <param n>
```

#### YARN on CDH5

To run the examples on a YARN grid on CDH5, you can build a JAR file using:
```{bash}
$ sbt package
```

and then ship it to the Resource Manager using the spark-submit command:

```{bash}
$ ./bin/spark-submit --class application.to.execute --master yarn-cluster [options] target/scala-2.10/chapter-4_2.10-1.0.jar [<param1> … <param n>]
```
Unlike in standalone mode, we don't need to specify a `<master>` URI: `--master yarn-cluster` is sufficient.

More information on launching Spark on YARN can be found at http://spark.apache.org/docs/latest/running-on-yarn.html.
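
The JAR path passed to spark-submit follows sbt's default naming for `sbt package`, which (as far as we know) is `target/scala-<scala binary version>/<name>_<scala binary version>-<version>.jar`. The sketch below rebuilds that path from its parts; the function name `sbt_jar_path` is hypothetical, and the actual project name and version come from build.sbt.

```bash
# Hypothetical helper: rebuild the default `sbt package` output path
# from the project name, Scala binary version, and project version.
sbt_jar_path() {
  local name="$1"
  local scala_binary="$2"
  local version="$3"
  echo "target/scala-${scala_binary}/${name}_${scala_binary}-${version}.jar"
}
```

For instance, `sbt_jar_path chapter-4 2.10 1.0` yields the path used in the spark-submit command above.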
