add a crawl option to not recrawl already crawled pages

Plan of action:
- add an option "expand_crawl" that takes a new crawl depth (open questions: should the new depth be added to the original one? should there be a max_expanded_depth setting?)
- monkeypatch the spider's `_request` function to first check whether the page already exists in MongoDB and, if so, skip the request and instead feed the spider directly with the lrulinks stored in MongoDB
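
A rough sketch of the second step, to make the idea concrete. Everything here is hypothetical and simplified: `FakeSpider` stands in for the real spider, the `stored_pages` dict stands in for the mongo collection holding each page's lrulinks, and `patch_request` and `crawl` are illustrative names only. It just shows the "serve from storage, otherwise really fetch" wrapper around `_request`, not the actual implementation.

```python
from collections import deque


class FakeSpider:
    """Hypothetical stand-in for the real spider; only `_request` matters here."""

    def __init__(self, web):
        self.web = web        # url -> outgoing links (simulated network)
        self.fetched = []     # urls that were actually requested

    def _request(self, url):
        self.fetched.append(url)
        return list(self.web.get(url, []))


def patch_request(spider, stored_pages):
    """Monkeypatch spider._request so already stored pages are not refetched.

    `stored_pages` is a plain dict (url -> links) standing in for the
    mongo collection of crawled pages and their lrulinks (assumption).
    """
    original = spider._request

    def _request(url):
        if url in stored_pages:
            # Page already crawled: skip the request, reuse the stored links.
            return list(stored_pages[url])
        links = original(url)
        stored_pages[url] = links
        return links

    spider._request = _request
    return spider


def crawl(spider, start_url, max_depth):
    """Breadth-first crawl up to max_depth; returns the set of visited urls."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        links = spider._request(url)
        if depth >= max_depth:
            continue  # page is visited but its links are not followed
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return seen
```

With this in place, a first `crawl(spider, start, 1)` followed by `crawl(spider, start, 3)` on the same patched spider only sends requests for the pages beyond the first crawl's reach. Whether the expanded depth should be given as an absolute value (as here) or added to the original one is the open question from the first bullet.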


Linked with #158, which could be implemented together with this one now that this feature is coming in.

Extra: add a RickRoll/Recrawl easter egg!

add a crawl option to not recrawl already crawled pages #507
