Open
Labels
enhancement: New feature or request.
t-tooling: Issues with this label are in the ownership of the tooling team.
Description
- Some JavaScript-heavy websites (e.g. https://tripadvisor.com) cannot be scraped with Scrapy alone.
Can you check why our Beautiful Soup template fails on tripadvisor.com? https://console.apify.com/actors/jWYbXHu32SvZf1Cgb/runs/0IYh4rWH9Ig2vIUSM#output
- Solution: We can provide a new Scrapy Actor template using a headless browser like Playwright.
- PyPI packages: scrapy and scrapy-playwright.
- The integration of Playwright into the Scrapy project is pretty simple: scrapy-playwright provides a Scrapy component, ScrapyPlaywrightDownloadHandler, which needs to be added to the project.
- Check the Web scraping with Scrapy blog post for more information and inspiration.
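
For illustration, a minimal sketch of what adding the download handler could look like in the template's settings.py. The handler path, the asyncio reactor setting, and the "playwright" meta key follow the scrapy-playwright README as I understand it. The `_PLAYWRIGHT_HANDLER` constant and the `playwright_meta` helper are hypothetical names introduced here, not part of either library or of the existing template.

```python
# settings.py (sketch): route HTTP(S) downloads through scrapy-playwright.
# Assumption: handler path as documented in the scrapy-playwright README.
_PLAYWRIGHT_HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"

DOWNLOAD_HANDLERS = {
    "http": _PLAYWRIGHT_HANDLER,
    "https": _PLAYWRIGHT_HANDLER,
}

# scrapy-playwright runs on the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def playwright_meta(**extra):
    """Hypothetical helper: build Request.meta that opts a request into Playwright.

    Requests without the "playwright" key keep using Scrapy's regular downloader.
    """
    return {"playwright": True, **extra}
```

In a spider, a request would then opt in with something like `scrapy.Request(url, meta=playwright_meta())`, so only pages that actually need a browser pay the rendering cost.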