Running Shark Locally

HarveyFeng edited this page May 18, 2012 · 6 revisions

This guide describes how to get Shark running locally.

Shark requires Hive 0.7.0 and Spark (0.4-SNAPSHOT).

Get the patched Hive from the AMPLab GitHub account:

$ export HIVE_DEV_HOME=/path/to/hive
$ git clone git://github.com/amplab/hive.git -b shark-0.7.0 $HIVE_DEV_HOME
$ cd $HIVE_DEV_HOME
$ ant package

Get Spark from GitHub, compile it, and publish it to the local Ivy repository:

$ git clone git://github.com/mesos/spark.git spark 
$ cd spark 
$ sbt/sbt publish-local

Get Shark from GitHub:

$ git clone git://github.com/amplab/shark.git shark
$ cd shark

Before building Shark, first edit the config file:

conf/shark-env.sh
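A minimal sketch of what this edit might look like. Only HIVE_HOME (set to $HIVE_DEV_HOME/build/dist, as described below) is confirmed by this page; the other variable and all paths are illustrative assumptions:

```shell
# conf/shark-env.sh -- illustrative sketch; only HIVE_HOME is confirmed by this guide
export HIVE_DEV_HOME=/path/to/hive           # hypothetical path to the patched Hive checkout
export HIVE_HOME=$HIVE_DEV_HOME/build/dist   # must point at the patched Hive's dist output
export SCALA_HOME=/path/to/scala             # assumed variable; hypothetical path
```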

Compile Shark (make sure $HIVE_HOME is set to $HIVE_DEV_HOME/build/dist, either in the config file or as an environment variable):

$ sbt/sbt products

There are several executables in bin/:

  • shark: Runs Shark CLI.
  • shark-withinfo: Runs Shark with INFO level logs printed to the console.
  • shark-withdebug: Runs Shark with DEBUG level logs printed to the console.
  • shark-shell: Runs the Shark Scala console. This provides an experimental feature that converts HiveQL queries into a TableRDD.
  • clear-buffer-cache.py: Automatically clears OS buffer caches on Mesos EC2 clusters. This is handy for performance studies.
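The cache clearing that clear-buffer-cache.py automates corresponds to the standard Linux drop_caches interface; a manual, single-machine equivalent (requires root) might look like the following. The script presumably applies something similar across the cluster's hosts, but that is an assumption, not something this page states:

```shell
# Manual equivalent of clearing the OS buffer cache on one Linux host (requires root)
sync                                          # flush dirty pages to disk first
echo 3 | sudo tee /proc/sys/vm/drop_caches    # drop page cache, dentries, and inodes
```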
