Rottentomatoes.com Scrapers

Description

This repository contains Python scrapers that extract data from Rotten Tomatoes movie listing pages and movie details pages. The scrapers use the Crawlbase Crawling API to handle CAPTCHA challenges, pagination, anti-bot protections, and JavaScript-rendered content.

The extracted data is parsed and saved in JSON format.
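As a rough illustration of the JSON output step, the sketch below writes a list of records to disk with the standard `json` module. The `save_to_json` helper and the record shape in the usage note are assumptions made for this example, not code taken from the scrapers.

```python
import json
from pathlib import Path


def save_to_json(records, path):
    """Write a list of dict records to `path` as pretty-printed UTF-8 JSON."""
    # ensure_ascii=False keeps non-ASCII movie titles readable in the file.
    text = json.dumps(records, indent=2, ensure_ascii=False)
    Path(path).write_text(text, encoding="utf-8")
```

For example, `save_to_json([{"title": "Example Movie"}], "movies.json")` would create `movies.json` in the current directory; the file name is hypothetical.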

➡ For detailed instructions, visit the full blog here.

Scrapers Overview

Rottentomatoes.com Movie Listings Scraper

The Rottentomatoes.com Movie Listings Scraper (rottentomatoes_serp_scraper.py) extracts:

Movie Title
Critics Score
Audience Score
Movie Page Link

The scraper also follows pagination automatically so that every page of results is collected, then saves the extracted data to a JSON file.
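One way to picture the pagination handling is a loop that keeps requesting pages until nothing new comes back. The sketch below is a generic pattern, not the repository's implementation: `fetch_page` is a hypothetical callable that returns a list of movie dicts for a page number, and the `link` key is assumed to identify each movie.

```python
def collect_all_pages(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... and merge the results.

    Stops when a page contributes no unseen movie links or when
    max_pages is reached, whichever comes first.
    """
    seen_links = set()
    movies = []
    for page in range(1, max_pages + 1):
        added = 0
        for movie in fetch_page(page):
            # Skip movies that already appeared on an earlier page.
            if movie["link"] in seen_links:
                continue
            seen_links.add(movie["link"])
            movies.append(movie)
            added += 1
        if added == 0:
            break
    return movies
```

The `max_pages` cap is a safety limit so that a page which keeps returning fresh items cannot make the loop run forever.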

Rottentomatoes.com Movie Details Page Scraper

The Rottentomatoes.com Movie Details Page Scraper (rottentomatoes_movie_page_scraper.py) extracts detailed movie information, including:

Movie Title
Synopsis
Movie details such as Director, Producer, Screenwriter, Distributor, and Rating

It saves the extracted data in a JSON file.
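To make the details extraction more concrete, here is a minimal sketch of pulling a title, a synopsis, and label/value pairs out of HTML with BeautifulSoup. The markup in `SAMPLE_HTML` and every CSS selector are invented for illustration; the real Rotten Tomatoes pages use different markup, so the selectors in the actual script will differ.

```python
# Invented markup for illustration only; not the real page structure.
SAMPLE_HTML = """
<h1 class="title">Example Movie</h1>
<p class="synopsis">A short plot summary.</p>
<dl class="details">
  <dt>Director</dt><dd>Jane Doe</dd>
  <dt>Distributor</dt><dd>Example Pictures</dd>
</dl>
"""


def parse_movie_page(html):
    """Return title, synopsis, and a details dict from a movie page."""
    # Imported here so this module still loads where bs4 is not installed.
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("h1.title")
    synopsis = soup.select_one("p.synopsis")

    details = {}
    for label in soup.select("dl.details dt"):
        # Each <dt> label is paired with the <dd> value that follows it.
        value = label.find_next_sibling("dd")
        if value is not None:
            details[label.get_text(strip=True)] = value.get_text(strip=True)

    return {
        "title": title.get_text(strip=True) if title else None,
        "synopsis": synopsis.get_text(strip=True) if synopsis else None,
        "details": details,
    }
```

Returning `None` for a missing element, rather than raising, keeps one changed selector from discarding the rest of the record.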

Environment Setup

Ensure that Python is installed on your system. Check the version using:

# Use python3 if you're on Linux with Python 3 installed
python --version

Next, install the required dependencies:

pip install crawlbase beautifulsoup4

Crawlbase – Handles JavaScript rendering and bypasses bot protections.
BeautifulSoup – Parses and extracts structured data from HTML.
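The sketch below shows how the two dependencies might fit together when fetching a page. It assumes the `crawlbase` package's `CrawlingAPI` client and its dictionary-style response with `status_code` and `body` keys; the helper names, the `ajax_wait` and `page_wait` values, and the error handling are illustrative choices, so check them against the actual scripts and the Crawlbase documentation.

```python
def extract_html(response):
    """Return the page HTML from a Crawlbase response dict, or None on failure."""
    if response.get("status_code") != 200:
        return None
    body = response.get("body", b"")
    # The body may arrive as bytes; decode it before handing it to a parser.
    return body.decode("utf-8") if isinstance(body, bytes) else body


def fetch_html(url, token):
    """Fetch a JavaScript-rendered page through the Crawlbase Crawling API."""
    # Imported here so extract_html stays usable without the package installed.
    from crawlbase import CrawlingAPI  # third-party: pip install crawlbase

    api = CrawlingAPI({"token": token})
    # Ask the API to wait for AJAX content; 5000 ms is an illustrative delay.
    options = {"ajax_wait": "true", "page_wait": "5000"}
    return extract_html(api.get(url, options))
```

The string returned by `fetch_html` can then be passed to `BeautifulSoup(html, "html.parser")` for parsing.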

Running the Scrapers

1. Get Your Crawlbase Access Token
   - Sign up for Crawlbase here to get an API token.
   - Use the JS token for Rottentomatoes.com scraping, as the site uses JavaScript-rendered content.
2. Update the Scraper with Your Token
   - Replace "CRAWLBASE_JS_TOKEN" in the script with your Crawlbase JS Token.
3. Run the Scraper

# Use python3 if required (for Linux/macOS)
python SCRAPER_FILE_NAME.py

Replace "SCRAPER_FILE_NAME.py" with the actual script name (rottentomatoes_serp_scraper.py or rottentomatoes_movie_page_scraper.py).
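If you would rather not edit the token into the script, one alternative is to read it from an environment variable. This is a sketch of that idea, not something the scripts are known to do: the variable name `CRAWLBASE_JS_TOKEN` and the `resolve_token` helper are assumptions chosen to mirror the placeholder string.

```python
import os

# The placeholder string that the README says to replace in the script.
PLACEHOLDER = "CRAWLBASE_JS_TOKEN"


def resolve_token(env=None):
    """Return the Crawlbase JS token from the environment, or raise if unset."""
    env = os.environ if env is None else env
    token = env.get("CRAWLBASE_JS_TOKEN", "").strip()
    # Reject an empty value or the untouched placeholder.
    if not token or token == PLACEHOLDER:
        raise RuntimeError("Set CRAWLBASE_JS_TOKEN to your Crawlbase JS token.")
    return token
```

On Linux or macOS the variable could be set with `export CRAWLBASE_JS_TOKEN=your_token` before running the script.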

To-Do List

Expand scrapers to extract additional movie details.
Optimize data storage and export formats (e.g., CSV, database integration).
Enhance scraper efficiency and speed.
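The CSV export mentioned in the list above could start from something like the following sketch, which uses only the standard `csv` module. The `records_to_csv` helper and the column names in the usage note are hypothetical; they are not part of the current scrapers.

```python
import csv
import io


def records_to_csv(records, fieldnames):
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fieldnames;
    # keys missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

For instance, `records_to_csv(movies, ["title", "critics_score", "audience_score", "link"])` would produce one row per movie. When writing the result to a file, open it with `newline=""` so the line endings are not translated twice.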

Why Use These Scrapers?

Extracts Rotten Tomatoes data efficiently.
Bypasses CAPTCHAs and anti-bot protections with Crawlbase.
Handles JavaScript-rendered content seamlessly.
Supports easy pagination for scraping multiple pages.

ScraperHub/rotten-tomatoes-scraper
