Open
Description
Try to make this a bit more scalable:
- Look at the DB schema again and see whether it should include some more info.
- Create a slightly more persistent message queue system rather than retaining everything in memory, as that inherently limits the number of URLs that can be scraped.
- The crawling service consumes a URL from the message queue and either throws it away (if it has been seen before), or scrapes it and adds any other URLs it finds back into the message queue.
- A tiny little service that seeds a few initial URLs into the message queue (perhaps this could be exposed so people could request their URL to be scraped?)
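A minimal sketch of the queue / worker / seeder loop described above, just to pin the idea down. Everything here is an assumption rather than a decision: SQLite stands in for whatever persistent queue ends up being used, the `queue` and `seen` tables are made up for illustration, and `fetch_links` is a hypothetical callback in place of the real scraper.

```python
import sqlite3
from typing import Callable, Iterable, List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store. Pass a file path to survive restarts."""
    conn = sqlite3.connect(path)
    # Hypothetical schema: a FIFO queue table plus a set of seen URLs.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS seen (url TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def enqueue(conn: sqlite3.Connection, urls: Iterable[str]) -> None:
    """Add URLs to the back of the queue. The seeder service is just this call."""
    conn.executemany("INSERT INTO queue (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def dequeue(conn: sqlite3.Connection) -> Optional[str]:
    """Pop the oldest URL, or return None when the queue is empty."""
    row = conn.execute("SELECT id, url FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]


def mark_seen(conn: sqlite3.Connection, url: str) -> bool:
    """Record the URL as seen. Returns True only the first time it is recorded."""
    cur = conn.execute("INSERT OR IGNORE INTO seen (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1


def crawl(
    conn: sqlite3.Connection,
    fetch_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
) -> List[str]:
    """Worker loop: consume a URL, drop it if already seen, else scrape and re-queue."""
    scraped: List[str] = []
    while len(scraped) < max_pages:
        url = dequeue(conn)
        if url is None:
            break  # nothing left to do
        if not mark_seen(conn, url):
            continue  # seen before, throw it away
        scraped.append(url)
        enqueue(conn, fetch_links(url))  # discovered URLs go back on the queue
    return scraped
```

Seeding is then `enqueue(conn, ["https://example.com"])` followed by `crawl(conn, fetch_links)`. With more than one worker the dequeue step would need proper locking or a real broker, which this sketch does not attempt.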
Also, for shits and giggles, try to rope together some Raspberry Pis that are lying around, so one can be the DB / message queue service and the other can be a worker.
And finally, create the UI for this, so I can get some saucy graphs.