
sparkler 0.1

Thamme Gowda edited this page Jun 7, 2016 · 17 revisions


Sparkler v0.1

How to Set Up a Solr-backed Crawldb Store

Requirements

Apache Solr (Tested on 6.0.1)

Steps

1. Download Apache Solr

# A place to keep all the files organized
mkdir -p ~/work/sparkler/
cd ~/work/sparkler/
# Download the Solr binary (pick your version and mirror; older releases are kept at https://archive.apache.org/dist/lucene/solr/)
wget "http://apache.mirrors.hoobly.com/lucene/solr/6.0.1/solr-6.0.1.tgz"
# Extract Solr
tar xvzf solr-6.0.1.tgz
# Add the crawldb config set (SPARKLER_GIT_SOURCE_PATH = path to your Sparkler git checkout)
cd solr-6.0.1/
cp -rv "${SPARKLER_GIT_SOURCE_PATH}/conf/solr/crawldb" server/solr/configsets/
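The copy step above silently does the wrong thing if `SPARKLER_GIT_SOURCE_PATH` is unset or points at the wrong directory. Below is a minimal sketch of a guard that fails early in those cases before copying. It assumes only the paths used above. The function name `copy_crawldb_configset` is hypothetical and not part of Sparkler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy Sparkler's crawldb config set into a Solr install,
# failing early if the Sparkler source path is unset or incomplete.
copy_crawldb_configset() {
  local src="${1:-${SPARKLER_GIT_SOURCE_PATH:-}}"  # Sparkler git checkout
  local solr_home="${2:-.}"                         # extracted Solr directory

  if [ -z "$src" ]; then
    echo "SPARKLER_GIT_SOURCE_PATH is not set" >&2
    return 1
  fi
  if [ ! -d "$src/conf/solr/crawldb" ]; then
    echo "missing $src/conf/solr/crawldb" >&2
    return 1
  fi
  if [ ! -d "$solr_home/server/solr/configsets" ]; then
    echo "$solr_home does not look like an extracted Solr directory" >&2
    return 1
  fi
  # Same copy as the manual step, minus the verbose flag
  cp -r "$src/conf/solr/crawldb" "$solr_home/server/solr/configsets/"
}
```

Usage: `copy_crawldb_configset "$SPARKLER_GIT_SOURCE_PATH" ~/work/sparkler/solr-6.0.1`.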

2. Start Solr in Local Mode

There are many ways to do this; here is a relatively easy way to start Solr with the crawldb config set:

# from the extracted Solr directory
cp -r server/solr/configsets/crawldb server/solr/
./bin/solr start
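You can poll Solr instead of guessing how long startup takes. Below is a minimal sketch, assuming Solr listens on the default port 8983 and `curl` is installed. It probes the CoreAdmin `STATUS` endpoint roughly once per second. The name `wait_for_solr` and its default of 30 attempts are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Solr's CoreAdmin STATUS endpoint until it answers
# or the attempt budget runs out. Requires curl.
wait_for_solr() {
  local base_url="${1:-http://localhost:8983/solr}"
  local max_tries="${2:-30}"   # attempts, roughly one second apart
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    # -s: silent, -f: fail on HTTP errors, --max-time: don't hang on one probe
    if curl -sf --max-time 2 "$base_url/admin/cores?action=STATUS" >/dev/null 2>&1; then
      echo "Solr is up at $base_url"
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$i" -lt "$max_tries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "Solr did not respond at $base_url after $max_tries attempts" >&2
  return 1
}
```

Usage: `./bin/solr start && wait_for_solr`.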

Wait for Solr to finish starting, then open http://localhost:8983/solr/#/~cores/ in your browser. Click Add Core, enter 'crawldb' in both the name and instanceDir fields, and click Add Core.
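As an alternative to the admin UI, you should be able to create the same core with one HTTP request to Solr's CoreAdmin API, using `action=CREATE` with `name` and `instanceDir`. Below is a minimal sketch, assuming the default port 8983, `curl`, and that `server/solr/crawldb` was copied as shown above. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: create the crawldb core through Solr's CoreAdmin API
# instead of the admin UI's "Add Core" form.

# Build the CoreAdmin CREATE URL for a core name and instance directory.
coreadmin_create_url() {
  local name="$1" instance_dir="$2"
  local base_url="${3:-http://localhost:8983/solr}"
  echo "$base_url/admin/cores?action=CREATE&name=$name&instanceDir=$instance_dir"
}

# Send the request; defaults match the form values used above.
create_core() {
  local name="${1:-crawldb}" instance_dir="${2:-crawldb}"
  local url
  url="$(coreadmin_create_url "$name" "$instance_dir" "${3:-}")"
  curl -sf "$url"
}
```

Usage: `create_core crawldb crawldb`. Solr should answer with a status response. If the core already exists, the request fails instead of overwriting it.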

The crawldb core is now ready; skip step 3.

3. Start Solr in Cloud Mode

// Coming soon

4. Inject Seed URLs

Create a file named seed.txt and enter your seed URLs, one per line. For example:

http://nutch.apache.org/
http://tika.apache.org/
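The seed file can be written and sanity-checked from the shell before injecting. Below is a minimal sketch, assuming one URL per line as in the example above. The check only confirms that each non-empty line starts with `http://` or `https://`. It is not Sparkler's own validation, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: write the example seed file and sanity-check it.

write_seed_file() {
  local path="${1:-seed.txt}"
  cat > "$path" <<'EOF'
http://nutch.apache.org/
http://tika.apache.org/
EOF
}

# Returns non-zero if any non-empty line is not an http(s) URL.
validate_seed_file() {
  local path="${1:-seed.txt}" line n=0 bad=0
  # "|| [ -n "$line" ]" also reads a last line that lacks a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    [ -z "$line" ] && continue   # skip empty lines
    case "$line" in
      http://*|https://*) ;;
      *) echo "line $n: not an http(s) URL: $line" >&2; bad=1 ;;
    esac
  done < "$path"
  return "$bad"
}
```

Usage: `write_seed_file seed.txt && validate_seed_file seed.txt`.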