
Sparkler Usage

Tom Barber edited this page Dec 24, 2020 · 14 revisions

Basics

A Simple Crawl

Once you have Sparkler installed and configured, you can kick off your first crawl using the command line flags shown below.

./sparkler.sh inject -su bbc.co.uk -id test
./sparkler.sh crawl -id test

This example crawls bbc.co.uk under the job id test. The id is optional; if you don't supply one, Sparkler returns a generated job id.

A crawl always runs in two steps. The inject phase seeds the database with the starting URLs. The crawl phase then iterates through the seeded URLs and populates the database with the crawl results.
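Because the two phases must run in order against the same job id, it can be convenient to wrap them in a small shell function. The following is a minimal sketch, not part of Sparkler itself: the names `run_crawl` and `SPARKLER_BIN` are hypothetical, and it uses only the `inject -su ... -id ...` and `crawl -id ...` invocations shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_crawl and SPARKLER_BIN are illustrative names,
# not something Sparkler ships. Adjust SPARKLER_BIN to your install path.
SPARKLER_BIN="${SPARKLER_BIN:-./sparkler.sh}"

run_crawl() {
  local seed_url="$1" job_id="$2"

  # Step 1: inject seeds the database with the start URL under this job id.
  "$SPARKLER_BIN" inject -su "$seed_url" -id "$job_id" || return 1

  # Step 2: crawl works through the URLs seeded under the same job id.
  # Skipped if the inject step failed.
  "$SPARKLER_BIN" crawl -id "$job_id"
}
```

With the function loaded, `run_crawl bbc.co.uk test` would reproduce the example above. Stopping when inject fails avoids starting a crawl against a job that was never seeded.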

Headers

Enabling Plugins

Basic Plugins

Fetcher HTMLUnit

Regex

Samehost

Advanced Usage

Plugins

Fetcher Chrome

URL Injector

POST/PUT Commands

Config Override

Additional Fields
