
Refactor for scalability #1

@jamesjarvis

Description


Try to make this a bit more scalable:

  • Revisit the DB schema and see whether it should store some more information.
  • Create a slightly more persistent message queue system rather than keeping the queue in memory, since an in-memory queue inherently limits the number of URLs that can be scraped.
  • The crawling service consumes a URL from the message queue and either throws it away (if it has been seen before) or scrapes it and adds any URLs it finds back into the message queue.
  • A tiny service seeds a few initial URLs into the message queue (perhaps this could be exposed so people can request that their URL be scraped?)
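A rough sketch of the queue/worker/seed flow described above, assuming SQLite (Python's stdlib `sqlite3`) as the persistent queue and "seen" store. The class `PersistentURLQueue`, the `crawl` function and the `fetch_links` callback are all hypothetical names; `fetch_links` stands in for whatever actually downloads a page and extracts its links. As in the bullet points, duplicates are discarded when they are consumed, not when they are enqueued.

```python
from __future__ import annotations

import sqlite3
from typing import Callable, Iterable


class PersistentURLQueue:
    """Hypothetical FIFO URL queue plus a 'seen' set, stored in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path (e.g. "crawler.db") to keep the queue on disk.
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS queue (
                id  INTEGER PRIMARY KEY AUTOINCREMENT,
                url TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY);
            """
        )
        self.conn.commit()

    def push(self, urls: Iterable[str]) -> None:
        # Append URLs to the tail of the queue in one transaction.
        with self.conn:
            self.conn.executemany(
                "INSERT INTO queue (url) VALUES (?)", ((u,) for u in urls)
            )

    def pop(self) -> str | None:
        # Remove and return the oldest URL, or None if the queue is empty.
        with self.conn:
            row = self.conn.execute(
                "SELECT id, url FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self.conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            return row[1]

    def mark_seen(self, url: str) -> bool:
        # Record the URL as seen; return False if it had been seen before.
        with self.conn:
            if self.conn.execute(
                "SELECT 1 FROM seen WHERE url = ?", (url,)
            ).fetchone():
                return False
            self.conn.execute("INSERT INTO seen (url) VALUES (?)", (url,))
            return True


def crawl(
    queue: PersistentURLQueue,
    fetch_links: Callable[[str], list],
    max_pages: int,
) -> list:
    """Worker loop: consume URLs, skip seen ones, enqueue discovered links."""
    scraped = []
    while len(scraped) < max_pages:
        url = queue.pop()
        if url is None:
            break  # queue drained
        if not queue.mark_seen(url):
            continue  # seen before, so throw it away
        queue.push(fetch_links(url))  # scrape and feed new URLs back in
        scraped.append(url)
    return scraped


if __name__ == "__main__":
    # A fake link graph stands in for real HTTP fetching and parsing.
    fake_web = {
        "https://a.example": ["https://b.example", "https://c.example"],
        "https://b.example": ["https://a.example"],
        "https://c.example": ["https://b.example", "https://d.example"],
    }
    q = PersistentURLQueue()
    q.push(["https://a.example"])  # the "seed" service, in one line
    print(crawl(q, lambda u: fake_web.get(u, []), max_pages=10))
```

Run as a script, this visits a, b, c and d once each and drops the repeat visits to a and b. SQLite only works for a single host, so splitting the DB/queue and the workers across separate machines would probably need a networked broker (for example Redis or RabbitMQ) in place of this class.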

Also, for shits and giggles, try to rope together some of the Raspberry Pis lying around so that one can act as the DB / message queue service and another can be a worker.

Finally, create a UI for this so I can get some saucy graphs.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
