Conversation

@T-rav T-rav commented Oct 10, 2025

WebCat now uses Serper's optimized scraping infrastructure as the primary content extraction method, with Trafilatura as fallback. This makes WebCat a true composite search tool: one SERPER_API_KEY enables both search and scraping.

Benefits:

  • Much faster and more reliable scraping via Serper's infrastructure
  • Cleaner markdown output with preserved document structure
  • Single API key for both search and content extraction
  • Automatic fallback to Trafilatura when Serper is unavailable
  • Reduced compute costs compared to local scraping

Changes:

  • Add scrape_webpage() function to serper_client.py
  • Update content_scraper.py to prioritize Serper scrape API
  • Update README to highlight composite tool functionality
  • Maintain backward compatibility with Trafilatura fallback

Pricing: Serper scraping at $0.001 per scrape (included in free tier)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added Serper-powered web scraping when an API key is provided, with automatic fallback to Trafilatura for reliability.
  • Documentation

    • Updated README to reflect Serper-based scraping and search.
    • Clarified that a single Serper API key enables both search and scraping.
    • Revised setup steps, architecture flow, and tool references.
  • Chores

    • Bumped version to 2.5.1.

T-rav and others added 2 commits October 10, 2025 16:28

coderabbitai bot commented Oct 10, 2025

Caution: Review failed. The pull request is closed.

Walkthrough

Adds Serper-based web scraping with optional fallback to Trafilatura, updates README to reflect new scraping/search setup, introduces a Serper client function for scraping, adjusts content scraper control flow to prefer Serper when configured, and increments the version to 2.5.1.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Docs updates**<br>`README.md` | Revised scraping/search descriptions to use Serper scrape API (primary) with Trafilatura fallback; clarified API key usage; updated architecture and repo structure references. |
| **Serper client**<br>`docker/clients/serper_client.py` | Added `scrape_webpage(url: str, api_key: str) -> Optional[str]` using Serper scrape API; error handling and logging; minor typing/docstring updates. |
| **Content scraping flow**<br>`docker/services/content_scraper.py` | Implemented Serper-first scraping path gated by `SERPER_API_KEY` with fallback to existing Trafilatura logic; added imports, env config, logging, and result wrapping/truncation. |
| **Versioning**<br>`docker/constants.py` | Bumped `VERSION` from "2.5.0" to "2.5.1". |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as Caller
    participant CS as ContentScraper
    participant SC as SerperClient
    participant SA as Serper Scrape API
    participant T as Trafilatura

    U->>CS: scrape(url)
    alt SERPER_API_KEY present
        CS->>SC: scrape_webpage(url, api_key)
        SC->>SA: POST / (url)
        SA-->>SC: text/markdown or empty
        alt Content returned
            SC-->>CS: markdown text
            CS-->>U: wrapped+truncated content
        else No content/error
            SC-->>CS: None / error
            Note over CS: Fallback to Trafilatura
            CS->>T: extract(url)
            T-->>CS: text/markdown/snippet or None
            CS-->>U: result (or snippet)
        end
    else No SERPER_API_KEY
        CS->>T: extract(url)
        T-->>CS: text/markdown/snippet or None
        CS-->>U: result (or snippet)
    end
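The control flow in the diagram can be sketched as plain Python. The names below (serper_scrape, trafilatura_extract, MAX_CONTENT_CHARS) are hypothetical stand-ins for whatever helpers content_scraper.py actually uses; only the Serper-first/fallback shape is taken from the PR:

```python
# Hypothetical sketch of the Serper-first scraping flow shown in the diagram.
# Helper names and the truncation limit are illustrative, not from the diff.
import os
from typing import Callable, Optional

MAX_CONTENT_CHARS = 8000  # assumed truncation limit


def scrape(
    url: str,
    serper_scrape: Callable[[str, str], Optional[str]],
    trafilatura_extract: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer Serper when SERPER_API_KEY is set; otherwise, or on failure, use Trafilatura."""
    api_key = os.environ.get("SERPER_API_KEY")
    if api_key:
        content = serper_scrape(url, api_key)
        if content:
            return content[:MAX_CONTENT_CHARS]  # wrap/truncate the Serper result
    # No key, empty content, or a Serper error: fall back to Trafilatura.
    return trafilatura_extract(url)
```

Injecting the two scrapers as callables keeps the decision logic trivially testable; in the real module they would simply be direct imports from serper_client and the existing Trafilatura code path.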

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

In whiskered taps I fetch the news,
Serper first, if keys you use;
If clouds roll in and calls do fail,
Trafilatura sets the trail.
Version hops with gentle cheer—
A rabbit nods: the path is clear. 🥕✨


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff8c2a3 and e44a103.

📒 Files selected for processing (4)
  • README.md (4 hunks)
  • docker/clients/serper_client.py (2 hunks)
  • docker/constants.py (1 hunks)
  • docker/services/content_scraper.py (3 hunks)


@T-rav T-rav merged commit 68a668a into main Oct 10, 2025
11 checks passed
@T-rav T-rav deleted the chore/bump-version-2.5.0 branch October 10, 2025 22:31
