
This repository contains Python-based scrapers that extract data from Rotten Tomatoes Movie Listings and Rotten Tomatoes Movie Details pages. The scrapers use the Crawlbase Crawling API to handle CAPTCHA challenges, pagination, anti-bot protections, and JavaScript-rendered content seamlessly.
The extracted data is parsed and saved in JSON format.
➡ For detailed instructions, visit the full blog here.
The Rottentomatoes.com Movie Listings Scraper (rottentomatoes_serp_scraper.py) extracts:
- Movie Title
- Critics Score
- Audience Score
- Movie Page Link
It also automatically handles pagination, ensuring comprehensive data extraction. It saves the extracted data in a JSON file.
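The pagination and JSON output described above can be sketched roughly as follows. This is not the code in rottentomatoes_serp_scraper.py. `fetch_page` is a hypothetical callable standing in for the Crawlbase request plus HTML parsing, and the record keys and the "stop when a page adds nothing new" rule are assumptions made for illustration.

```python
import json


def collect_listings(fetch_page, max_pages=5):
    """Gather movie records across pages, skipping duplicates.

    fetch_page is a hypothetical callable: it takes a 1-based page number
    and returns a list of dicts that each have a "link" key (assumed shape).
    """
    movies = []
    seen_links = set()
    for page in range(1, max_pages + 1):
        added = 0
        for movie in fetch_page(page):
            if movie["link"] in seen_links:
                continue  # already collected from an earlier page
            seen_links.add(movie["link"])
            movies.append(movie)
            added += 1
        if added == 0:
            # Assumed stop rule: a page with nothing new ends the crawl.
            break
    return movies


def save_json(records, path):
    """Write the records to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Deduplicating by link keeps the output stable even if a later page repeats earlier entries.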
The Rottentomatoes.com Movie Details Page Scraper (rottentomatoes_movie_page_scraper.py) extracts detailed movie information, including:
- Movie Title
- Synopsis
- Movie details such as director, producer, screenwriter, distributor, and rating
It saves the extracted data in a JSON file.
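As a rough illustration of how a details page might be turned into a JSON record with BeautifulSoup, here is a minimal sketch. The markup in `SAMPLE_HTML`, the CSS selectors, and the output keys are all invented for this example; the real Rotten Tomatoes markup and the selectors used in rottentomatoes_movie_page_scraper.py will differ.

```python
import json

from bs4 import BeautifulSoup

# Made-up markup for illustration only; the live page is structured differently.
SAMPLE_HTML = """
<h1 class="title">Example Movie</h1>
<p class="synopsis">A short plot summary.</p>
<dl class="details">
  <dt>Director</dt><dd>Jane Doe</dd>
  <dt>Rating</dt><dd>PG-13</dd>
</dl>
"""


def parse_movie_page(html):
    """Extract title, synopsis, and label/value details from the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1.title")
    synopsis = soup.select_one("p.synopsis")
    details = {}
    for label in soup.select("dl.details dt"):
        value = label.find_next_sibling("dd")
        if value is not None:
            details[label.get_text(strip=True)] = value.get_text(strip=True)
    return {
        # Missing elements become None instead of raising an error.
        "title": title.get_text(strip=True) if title else None,
        "synopsis": synopsis.get_text(strip=True) if synopsis else None,
        "details": details,
    }


movie = parse_movie_page(SAMPLE_HTML)
print(json.dumps(movie, indent=2))
```

Returning `None` for missing fields keeps the scraper from crashing when a page omits a section.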
Ensure that Python is installed on your system. Check the version using:
# Use python3 if you're on Linux with Python 3 installed
python --version
Next, install the required dependencies:
pip install crawlbase beautifulsoup4
- Crawlbase – Handles JavaScript rendering and bypasses bot protections.
- BeautifulSoup – Parses and extracts structured data from HTML.
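To show where Crawlbase fits before BeautifulSoup takes over, here is a minimal sketch of the fetch step. `fetch_html` and `wait_ms` are hypothetical names, and the `ajax_wait` and `page_wait` values are guesses rather than settings taken from the scripts. The `api` argument is expected to be a `CrawlingAPI` instance from the `crawlbase` package; passing it in keeps the function easy to test offline.

```python
def fetch_html(api, url, wait_ms=5000):
    """Fetch a JavaScript-rendered page through Crawlbase and return its HTML.

    api: an object with a get(url, options) method, e.g. crawlbase.CrawlingAPI.
    wait_ms: assumed delay that gives the page time to render.
    """
    options = {"ajax_wait": "true", "page_wait": str(wait_ms)}
    response = api.get(url, options)
    status = str(response["headers"].get("pc_status"))
    if status != "200":
        raise RuntimeError(f"Crawlbase request failed for {url} (pc_status={status})")
    body = response["body"]
    # The body may arrive as bytes; decode it so it can be handed to BeautifulSoup.
    return body.decode("utf-8") if isinstance(body, bytes) else body


# Real usage (needs network access and a valid JS token):
#   from crawlbase import CrawlingAPI
#   api = CrawlingAPI({"token": "CRAWLBASE_JS_TOKEN"})
#   html = fetch_html(api, url)
```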
- Get Your Crawlbase Access Token
  - Sign up for Crawlbase to get an API token.
  - Use the JS token for Rottentomatoes.com scraping, as the site uses JavaScript-rendered content.
- Update the Scraper with Your Token
  - Replace "CRAWLBASE_JS_TOKEN" in the script with your Crawlbase JS token.
- Run the Scraper
  # Use python3 if required (for Linux/macOS)
  python SCRAPER_FILE_NAME.py
  Replace "SCRAPER_FILE_NAME.py" with the actual script name (rottentomatoes_serp_scraper.py or rottentomatoes_movie_page_scraper.py).
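If you would rather not paste the token into the script, one option is to read it from an environment variable. This is an assumption for illustration, not something the provided scripts do; `load_token` and the environment variable name are hypothetical.

```python
import os

# The placeholder string that ships in the scripts.
PLACEHOLDER = "CRAWLBASE_JS_TOKEN"


def load_token(env_var="CRAWLBASE_JS_TOKEN"):
    """Return the Crawlbase JS token from the environment, or fail loudly."""
    token = os.environ.get(env_var, "").strip()
    if not token or token == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Crawlbase JS token."
        )
    return token
```

The returned value can then be passed wherever the script currently uses the "CRAWLBASE_JS_TOKEN" placeholder.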
Planned improvements:
- Expand the scrapers to extract additional movie details.
- Optimize data storage and add export formats (e.g., CSV, database integration).
- Improve scraper efficiency and speed.
Why use these scrapers:
- Extracts Rotten Tomatoes data efficiently.
- Bypasses CAPTCHAs and anti-bot protections with Crawlbase.
- Handles JavaScript-rendered content seamlessly.
- Supports easy pagination for scraping multiple pages.