sparkler 0.1
Thamme Gowda edited this page Jun 7, 2016 · 17 revisions
How to Set Up a Solr-Backed Crawldb Store
Apache Solr (Tested on 6.0.1)
# A place to keep all the files organized
mkdir -p ~/work/sparkler/
cd ~/work/sparkler/
# Download Solr Binary
wget "http://apache.mirrors.hoobly.com/lucene/solr/6.0.1/solr-6.0.1.tgz"  # pick your version and mirror
# Extract Solr
tar xvzf solr-6.0.1.tgz
# Add crawldb config sets
cd solr-6.0.1/
# SPARKLER_GIT_SOURCE_PATH must point to your local checkout of the sparkler source
cp -rv "${SPARKLER_GIT_SOURCE_PATH}/conf/solr/crawldb" server/solr/configsets/
There are many ways to do this. Here is a relatively easy way to start Solr with crawldb:
# from the solr extracted directory
cp -r server/solr/configsets/crawldb server/solr/
./bin/solr start
Wait a moment for Solr to start, then open http://localhost:8983/solr/#/~cores/ in your browser. Click Add Core, enter 'crawldb' in both the name and instanceDir form fields, and click Add Core.
Now the crawldb core is ready. Skip step 3.
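The Add Core step can probably also be done from the command line through Solr's CoreAdmin HTTP API. The following is only a sketch, not something taken from the sparkler docs: it assumes Solr is listening on the default http://localhost:8983/solr, that curl is installed, and that the crawldb directory has already been copied into server/solr/ as shown above. The function names (create_core_url, status_url, wait_for_solr, create_core) are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create and check the crawldb core via Solr's CoreAdmin API.
# Assumptions: default local Solr URL, curl available, core config already in server/solr/crawldb.
SOLR_URL="http://localhost:8983/solr"
CORE="crawldb"

# Build the CoreAdmin CREATE URL (name and instanceDir are both set to the core name,
# mirroring the two form fields filled in the admin UI).
create_core_url() {
  printf '%s/admin/cores?action=CREATE&name=%s&instanceDir=%s\n' "$SOLR_URL" "$1" "$1"
}

# Build the CoreAdmin STATUS URL for a core.
status_url() {
  printf '%s/admin/cores?action=STATUS&core=%s\n' "$SOLR_URL" "$1"
}

# Poll Solr once per second until it answers, giving up after N tries (default 30).
wait_for_solr() {
  local tries="${1:-30}"
  local i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -sf "$(status_url "$CORE")" > /dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Wait for Solr, create the core, then print its status.
create_core() {
  wait_for_solr || { echo "Solr did not come up at $SOLR_URL" >&2; return 1; }
  curl -s "$(create_core_url "$CORE")"
  curl -s "$(status_url "$CORE")"
}
```

After `./bin/solr start`, source this file and run `create_core`. If the core already exists (for example because it was added through the browser), the CREATE call is expected to return an error, and the STATUS call alone is enough to confirm the core is loaded.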
// Coming soon
Create a file called seed.txt and enter your seed URLs. Example:
http://nutch.apache.org/
http://tika.apache.org/
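As a small sketch, the seed file can also be written and sanity-checked from the shell. This assumes one URL per line, which this page does not state explicitly, and the check itself (every line is a single http or https URL) is just an illustrative guard, not part of sparkler.

```shell
#!/usr/bin/env bash
# Sketch: write seed.txt and sanity-check it.
# Assumption: one URL per line (not confirmed by this page).
SEED_FILE="seed.txt"

cat > "$SEED_FILE" <<'EOF'
http://nutch.apache.org/
http://tika.apache.org/
EOF

# Count lines that are NOT a single http(s) URL; grep -c prints 0 when there are none.
bad_lines=$(grep -vcE '^https?://[^[:space:]]+$' "$SEED_FILE")
total_lines=$(wc -l < "$SEED_FILE" | tr -d '[:space:]')

echo "seed URLs: $total_lines, malformed lines: $bad_lines"
```

A non-zero malformed count usually means a stray blank line or two URLs pasted onto the same line.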