Spider Depth First Scraping rather than Breadth First #140
Replies: 2 comments
I've literally just seen the `priority` param, which I somehow missed when searching the docs earlier. I believe that will solve my issue 😅
Higher priority requests are executed first. If all priorities were equal, the requests added to the queue first would be executed first.
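To illustrate why that gives depth-first behaviour, here is a minimal, library-agnostic sketch. It does not use the spider API at all: `SITE`, `crawl` and `priority_for_depth` are hypothetical names made up for this example, and the queue only mimics the rule described above (higher priority first, ties resolved in insertion order). Assuming the real scheduler behaves the same way, setting each request's priority to its link depth should make the crawl finish one branch before returning to the next.

```python
import heapq
from itertools import count

# Hypothetical toy site: page -> links found on that page.
SITE = {
    "home": ["cleaning-tools", "home-decor"],
    "cleaning-tools": ["mops", "brooms"],
    "home-decor": ["lamps"],
    "mops": [],
    "brooms": [],
    "lamps": [],
}


def crawl(start, priority_for_depth):
    """Return the visit order for a queue where higher priority pops first
    and equal priorities pop in the order they were added (FIFO)."""
    order = count()  # insertion counter, used as the FIFO tie-breaker
    # heapq is a min-heap, so the priority is negated to pop the highest first.
    queue = [(-priority_for_depth(0), next(order), start, 0)]
    seen = {start}
    visited = []
    while queue:
        _, _, page, depth = heapq.heappop(queue)
        visited.append(page)
        for link in SITE[page]:
            if link not in seen:
                seen.add(link)
                entry = (-priority_for_depth(depth + 1), next(order), link, depth + 1)
                heapq.heappush(queue, entry)
    return visited


# Equal priorities everywhere: plain FIFO, i.e. breadth first.
breadth_first = crawl("home", lambda depth: 0)

# Priority grows with depth: deeper links jump the queue, i.e. depth first.
depth_first = crawl("home", lambda depth: depth)

print(breadth_first)
print(depth_first)
```

With equal priorities the toy crawl visits both category pages before any product page. With depth as the priority it visits `cleaning-tools`, then everything linked from it, and only then moves on to `home-decor`.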
Is it currently possible for spiders to make requests in depth-first order rather than breadth-first? From some testing, I cannot find any setting for this. It may be possible with some logic inside the parse functions.
To be clear, I'm talking about building a spider that is intended to crawl many different URLs on a site. Currently, if a page contains many URLs for other pages (as a basic example, the Amazon homepage), the spider will visit each of the linked pages (cleaning tools, home decor, etc.) before it starts opening the URLs found on those pages. Instead, I want to visit cleaning tools, then continue crawling through the links on that page before returning to the homepage to go to the next page.

Let me know if you need any more explanation or clarification.