What's the best approach to speed things up? #3

@jalbstmeijer

Hi,

I'm looking for ways to speed up the crawling process.
Where website-scraper takes up to 8 minutes to crawl a site, website-scraper-puppeteer needs 40 minutes for the same site. (Sure, I expect some penalty.)

Adding CPU resources only helps up to a point, as I see that website-scraper-puppeteer (Chromium) does not use all of the CPU available.

Would it be possible to use website-scraper-puppeteer only for JS scraping/execution and leave the rest to website-scraper? If so, how?

While looking into Puppeteer performance options, I came across puppeteer-pool (https://github.com/latesh/puppeteer-pool).
Would it be possible to use puppeteer-pool from within website-scraper-puppeteer? If so, how?
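
To make clear what I mean by pooling, here is a minimal generic sketch of the idea: a fixed number of reusable resources shared among many tasks. It is not the puppeteer-pool API and not how website-scraper-puppeteer works internally. `Pool`, `create`, and `use` are hypothetical names, and the assumption is that `create` could be something like a function that opens a browser page.

```javascript
// Minimal resource pool: hands out at most `size` resources at a time.
// `create` is a hypothetical async factory; nothing here depends on Puppeteer.
// Sketch only: a failing `create` call is not handled.
class Pool {
  constructor(create, size) {
    this.create = create;
    this.size = size;
    this.idle = [];     // resources ready for reuse
    this.created = 0;   // number of resources created so far
    this.waiters = [];  // resolvers for callers waiting on a resource
  }

  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    if (this.created < this.size) {
      this.created += 1;
      return this.create();
    }
    // Pool exhausted: wait until release() hands over a resource.
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  release(resource) {
    const next = this.waiters.shift();
    if (next) next(resource);       // pass it straight to a waiting caller
    else this.idle.push(resource);  // otherwise keep it for later reuse
  }

  // Run `task` with a pooled resource and always give the resource back.
  async use(task) {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}
```

With something like this, each URL would be handled via `pool.use(...)`, so only `size` pages exist at once and they are reused instead of being reopened for every request.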

I tried the last option, but I'm not very familiar with Node, so I failed to get it to work.

Gr, J

Labels: enhancement, help wanted