| 1 | +# book-examples |
| 2 | + |
| 3 | +This repository contains examples (and errata) for [Learning Hadoop 2](http://learninghadoop2.com). |
| 4 | + |
| 5 | +## Requirements |
| 6 | + |
| 7 | +Throughout the book we use Cloudera CDH 5.0 and Amazon EMR as reference systems. All examples target, |
| 8 | +and have been tested with, Java 7. |
| 9 | + |
| 10 | +## Build the examples |
| 11 | +The easiest way to build the examples, with CDH 5.0 dependencies, is to use the provided Gradle and sbt scripts. |
| 12 | + |
| 13 | +### Gradle |
| 14 | +We use [Gradle](https://gradle.org) to compile Java code and collect the required class files into a single JAR file. |
| 15 | + |
| 16 | +```{bash} |
| 17 | +$ ./gradlew jar |
| 18 | +``` |
| 19 | + |
| 20 | +JARs can then be submitted to Hadoop with: |
| 21 | + |
| 22 | +```{bash} |
| 23 | +$ hadoop jar <job jarfile> <main class> <argument 1> ... <argument n> |
| 24 | +``` |
| 25 | + |
| 26 | +#### Example - Chapter 3 (MapReduce and beyond) |
| 27 | + |
| 28 | +To build the ch3 examples: |
| 29 | +```{bash} |
| 30 | +$ git clone https://github.com/learninghadoop2/book-examples |
| 31 | +$ cd book-examples/ch3 |
| 32 | +$ ./gradlew jar |
| 33 | +``` |
| 34 | + |
| 35 | +The script downloads a Gradle distribution from |
| 36 | +the official repository |
| 37 | +(https://services.gradle.org/distributions/gradle-2.0-bin.zip) |
| 38 | +and uses it to build the code under |
| 39 | +`src/main/java/com/learninghadoop2/mapreduce/`. You will find the |
| 40 | +resulting JAR in `build/libs/mapreduce-example.jar`. |
| 41 | + |
| 42 | +We can run the WordCount example as described in Chapter 3: |
| 43 | +```{bash} |
| 44 | +$ hadoop jar build/libs/mapreduce-example.jar \ |
| 45 | +com.learninghadoop2.mapreduce.WordCount \ |
| 46 | +input.txt \ |
| 47 | +output |
| 48 | +``` |
| 49 | + |
| 50 | + |
| 51 | +For more information on how Gradle is bootstrapped to run the build, |
| 52 | +refer to https://docs.gradle.org/current/userguide/gradle_wrapper.html. |
| 53 | +The Gradle wrapper is distributed with the examples |
| 54 | +(`gradle/wrapper/gradle-wrapper.jar`). |
| 55 | + |
| 56 | + |
| 57 | +### SBT |
| 58 | + |
| 59 | +We use [sbt](http://www.scala-sbt.org) to build, manage, and execute the Spark examples in Chapter 5. |
| 60 | + |
| 61 | +The build.sbt file controls the codebase metadata and software dependencies. |
| 62 | + |
| 63 | +The source code for all examples can be compiled with: |
| 64 | +```{bash} |
| 65 | +$ cd ch5 |
| 66 | +$ sbt compile |
| 67 | +``` |
| 68 | + |
| 69 | +Or, it can be packaged into a JAR file with: |
| 70 | +```{bash} |
| 71 | +$ sbt package |
| 72 | +``` |
| 73 | + |
| 74 | +For Spark in standalone mode, a helper script to execute compiled classes can be generated with: |
| 75 | +```{bash} |
| 76 | +$ sbt add-start-script-tasks |
| 77 | +$ sbt start-script |
| 78 | +``` |
| 79 | +The helper can be invoked as follows: |
| 80 | + |
| 81 | +```{bash} |
| 82 | +$ target/start <class name> <master> <param1> … <param n> |
| 83 | +``` |
| 84 | + |
| 85 | + |
| 86 | +#### YARN on CDH5 |
| 87 | + |
| 88 | +To run the examples on a YARN grid on CDH5, you can build a JAR file using: |
| 89 | +```{bash} |
| 90 | +$ sbt package |
| 91 | +``` |
| 92 | + |
| 93 | +and then ship it to the Resource Manager using the spark-submit command: |
| 94 | + |
| 95 | +```{bash} |
| 96 | +$ ./bin/spark-submit --class application.to.execute --master yarn-cluster [options] target/scala-2.10/chapter-4_2.10-1.0.jar [<param1> … <param n>] |
| 97 | +``` |
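| 98 | +Unlike in standalone mode, we don't need to specify a `<master>` URI: with `--master yarn-cluster`, Spark reads the Resource Manager address from the Hadoop configuration. |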
| 99 | + |
| 100 | +More information on launching Spark on YARN can be found at http://spark.apache.org/docs/latest/running-on-yarn.html. |