Conversation

@T-rav T-rav commented Oct 10, 2025

WebCat now uses Serper's optimized scraping infrastructure as the primary content extraction method, with Trafilatura as fallback. This makes WebCat a true composite search tool: one SERPER_API_KEY enables both search and scraping.

Benefits:

  • Much faster and more reliable scraping via Serper's infrastructure
  • Cleaner markdown output with preserved document structure
  • Single API key for both search and content extraction
  • Automatic fallback to Trafilatura when Serper is unavailable
  • Reduced compute costs compared to local scraping

Changes:

  • Add scrape_webpage() function to serper_client.py
  • Update content_scraper.py to prioritize Serper scrape API
  • Update README to highlight composite tool functionality
  • Maintain backward compatibility with Trafilatura fallback

Pricing: Serper scraping at $0.001 per scrape (included in free tier)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Serper-powered web scraping when an API key is provided, with automatic fallback to Trafilatura for reliability.
  • Documentation

    • Updated README to reflect Serper-based scraping and search.
    • Clarified that a single Serper API key enables both search and scraping.
    • Revised setup steps, architecture flow, and tool references.
  • Chores

    • Bumped version to 2.5.1.

T-rav and others added 2 commits October 10, 2025 16:28

coderabbitai bot commented Oct 10, 2025

Caution: Review failed. The pull request is closed.

Walkthrough

Adds Serper-based web scraping with optional fallback to Trafilatura, updates README to reflect new scraping/search setup, introduces a Serper client function for scraping, adjusts content scraper control flow to prefer Serper when configured, and increments the version to 2.5.1.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs updates**<br>`README.md` | Revised scraping/search descriptions to use Serper scrape API (primary) with Trafilatura fallback; clarified API key usage; updated architecture and repo structure references. |
| **Serper client**<br>`docker/clients/serper_client.py` | Added `scrape_webpage(url: str, api_key: str) -> Optional[str]` using Serper scrape API; error handling and logging; minor typing/docstring updates. |
| **Content scraping flow**<br>`docker/services/content_scraper.py` | Implemented Serper-first scraping path gated by `SERPER_API_KEY` with fallback to existing Trafilatura logic; added imports, env config, logging, and result wrapping/truncation. |
| **Versioning**<br>`docker/constants.py` | Bumped `VERSION` from "2.5.0" to "2.5.1". |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as Caller
    participant CS as ContentScraper
    participant SC as SerperClient
    participant SA as Serper Scrape API
    participant T as Trafilatura

    U->>CS: scrape(url)
    alt SERPER_API_KEY present
        CS->>SC: scrape_webpage(url, api_key)
        SC->>SA: POST / (url)
        SA-->>SC: text/markdown or empty
        alt Content returned
            SC-->>CS: markdown text
            CS-->>U: wrapped+truncated content
        else No content/error
            SC-->>CS: None / error
            Note over CS: Fallback to Trafilatura
            CS->>T: extract(url)
            T-->>CS: text/markdown/snippet or None
            CS-->>U: result (or snippet)
        end
    else No SERPER_API_KEY
        CS->>T: extract(url)
        T-->>CS: text/markdown/snippet or None
        CS-->>U: result (or snippet)
    end
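The control flow in the diagram can be sketched as plain Python. The names below (serper_scrape, trafilatura_extract, MAX_CONTENT_CHARS) are hypothetical stand-ins for whatever helpers content_scraper.py actually uses; only the Serper-first/fallback shape is taken from the PR:

```python
# Hypothetical sketch of the Serper-first scraping flow shown in the diagram.
# Helper names and the truncation limit are illustrative, not from the diff.
import os
from typing import Callable, Optional

MAX_CONTENT_CHARS = 8000  # assumed truncation limit


def scrape(
    url: str,
    serper_scrape: Callable[[str, str], Optional[str]],
    trafilatura_extract: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer Serper when SERPER_API_KEY is set; otherwise, or on failure, use Trafilatura."""
    api_key = os.environ.get("SERPER_API_KEY")
    if api_key:
        content = serper_scrape(url, api_key)
        if content:
            return content[:MAX_CONTENT_CHARS]  # wrap/truncate the Serper result
    # No key, empty content, or a Serper error: fall back to Trafilatura.
    return trafilatura_extract(url)
```

Injecting the two scrapers as callables keeps the decision logic trivially testable; in the real module they would simply be direct imports from serper_client and the existing Trafilatura code path.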

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

In whiskered taps I fetch the news,
Serper first, if keys you use;
If clouds roll in and calls do fail,
Trafilatura sets the trail.
Version hops with gentle cheer—
A rabbit nods: the path is clear. 🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff8c2a3 and e44a103.

📒 Files selected for processing (4)
  • README.md (4 hunks)
  • docker/clients/serper_client.py (2 hunks)
  • docker/constants.py (1 hunks)
  • docker/services/content_scraper.py (3 hunks)


@T-rav T-rav merged commit 68a668a into main Oct 10, 2025
11 checks passed
@T-rav T-rav deleted the chore/bump-version-2.5.0 branch October 10, 2025 22:31
