
Implement maxAge fast scraping parameter#72

Closed
devin-ai-integration[bot] wants to merge 1 commit into main from
devin/1751317120-implement-maxage-parameter

Conversation


devin-ai-integration bot commented Jun 30, 2025

Implement maxAge fast scraping parameter

Summary

This PR implements the maxAge fast scraping parameter across all scraping-related tools in the Firecrawl MCP Server, enabling 500% faster scraping through intelligent caching as documented in PR #34 of firecrawl-docs.

Key Changes:

  • Added a maxAge parameter (number, defaults to 0) to the SCRAPE_TOOL, CRAWL_TOOL, and SEARCH_TOOL schemas
  • Created the missing BATCH_SCRAPE_TOOL, which was referenced in tests but absent from the main code
  • Added a proper type guard and request handler for the batch scraping functionality
  • Updated all tool schemas to include maxAge with proper descriptions and defaults
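The type guard added for batch scraping is not shown in this description. A minimal sketch of what such a guard might look like is below; the exact shape of the batch scrape arguments is an assumption here, since the real interface lives in src/index.ts and may differ:

```typescript
// Hypothetical shape of the batch scrape tool arguments; the actual
// interface in src/index.ts may carry additional fields.
interface BatchScrapeOptions {
  urls: string[];
  options?: {
    formats?: string[];
    maxAge?: number; // cache tolerance in milliseconds
  };
}

// Type guard: narrows an unknown MCP tool-call argument object to
// BatchScrapeOptions before it is handed to the Firecrawl client.
function isBatchScrapeOptions(args: unknown): args is BatchScrapeOptions {
  if (typeof args !== 'object' || args === null) return false;
  const a = args as Record<string, unknown>;
  return Array.isArray(a.urls) && a.urls.every((u) => typeof u === 'string');
}
```

A guard like this is what lets the request handler reject malformed tool calls before any API request is made.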

The maxAge parameter accepts a value in milliseconds: cached content is returned if it is younger than the specified age; otherwise fresh content is scraped. A value of 0 (the default) means always scrape fresh.
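The caching decision described above reduces to a simple freshness check. The sketch below is illustrative only; the real cache logic runs server-side in the Firecrawl API, not in the MCP server, and the function name is hypothetical:

```typescript
// Illustrative freshness check for the maxAge semantics described above.
// cachedAtMs: when the cached copy was scraped (undefined = no cache entry);
// maxAgeMs: the caller's staleness tolerance; nowMs: current time.
function shouldUseCache(
  cachedAtMs: number | undefined,
  maxAgeMs: number,
  nowMs: number
): boolean {
  // maxAge of 0 (the default) means: always scrape fresh.
  if (maxAgeMs <= 0 || cachedAtMs === undefined) return false;
  // Use the cache only while the cached copy is younger than maxAge.
  return nowMs - cachedAtMs < maxAgeMs;
}
```

So a call with maxAge=300000 will accept any cached copy scraped within the last five minutes and fall back to a fresh scrape otherwise.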

Review & Testing Checklist for Human

  • Test actual caching behavior: Verify maxAge parameter works with real Firecrawl API calls (make same request twice with maxAge > 0, confirm second request uses cache)
  • Test new BATCH_SCRAPE_TOOL: Verify the previously missing batch scrape functionality now works end-to-end
  • Verify backward compatibility: Test all existing tools still work without maxAge specified
  • Test parameter passing: Confirm maxAge gets properly passed to underlying Firecrawl client methods
  • Integration testing: Run the MCP server with a real MCP client and test all modified tools

Recommended test plan:

  1. Start MCP server locally
  2. Test each tool (scrape, crawl, batch_scrape, search) with and without maxAge
  3. For caching verification: scrape same URL twice with maxAge=300000 (5min), verify second call is faster
  4. Verify error handling when maxAge is invalid (negative, non-number)
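The caching-verification step above can be sketched as a small timing harness. The `scrape` function here is a stand-in for a real Firecrawl client call, backed by a fake in-memory cache so the sketch is self-contained; against the real API the same two-call pattern applies:

```typescript
// Stand-in for a Firecrawl scrape call: slow on a cache miss, fast on a hit.
const cache = new Map<string, { body: string; at: number }>();

async function scrape(url: string, maxAge: number): Promise<string> {
  const hit = cache.get(url);
  if (hit && maxAge > 0 && Date.now() - hit.at < maxAge) {
    return hit.body; // served from cache, no simulated network delay
  }
  await new Promise<void>((resolve) => setTimeout(resolve, 200)); // simulate latency
  const body = `<html>content of ${url}</html>`;
  cache.set(url, { body, at: Date.now() });
  return body;
}

async function main(): Promise<void> {
  const url = 'https://example.com';

  const t1 = Date.now();
  await scrape(url, 300_000); // first call: fresh scrape
  const first = Date.now() - t1;

  const t2 = Date.now();
  await scrape(url, 300_000); // second call: should hit the cache
  const second = Date.now() - t2;

  console.log(`first=${first}ms second=${second}ms cacheFaster=${second < first}`);
}

main();
```

With the real server, replace `scrape` with the actual MCP tool call and compare wall-clock times of the two requests.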

Diagram

```mermaid
%%{init: { "theme": "default" }}%%
graph TD
    subgraph "MCP Server Structure"
        Index["src/index.ts"]:::major-edit
        Tests["src/index.test.ts"]:::context
    end

    subgraph "Tool Definitions (Updated)"
        SCRAPE["SCRAPE_TOOL<br/>+maxAge param"]:::major-edit
        CRAWL["CRAWL_TOOL<br/>+maxAge in scrapeOptions"]:::major-edit
        SEARCH["SEARCH_TOOL<br/>+maxAge in scrapeOptions"]:::minor-edit
        BATCH["BATCH_SCRAPE_TOOL<br/>**NEW TOOL**"]:::major-edit
    end

    subgraph "API Handlers (Updated)"
        Handler["CallToolRequestSchema<br/>+batch_scrape case"]:::major-edit
        TypeGuards["Type Guards<br/>+isBatchScrapeOptions"]:::minor-edit
    end

    subgraph "Firecrawl Client Calls"
        ScrapeCall["client.scrapeUrl()"]:::context
        CrawlCall["client.asyncCrawlUrl()"]:::context
        BatchCall["client.asyncBatchScrapeUrls()"]:::context
        SearchCall["client.search()"]:::context
    end

    Index --> SCRAPE
    Index --> CRAWL
    Index --> SEARCH
    Index --> BATCH
    Index --> Handler
    Index --> TypeGuards

    Handler --> ScrapeCall
    Handler --> CrawlCall
    Handler --> BatchCall
    Handler --> SearchCall

    Tests -.-> BATCH

    subgraph Legend
        L1["Major Edit"]:::major-edit
        L2["Minor Edit"]:::minor-edit
        L3["Context/No Edit"]:::context
    end

    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF
```

Notes

  • Critical Discovery: The BATCH_SCRAPE_TOOL was completely missing from the main code despite being referenced in tests - this was a significant gap that needed to be filled
  • TypeScript Issue: Had to remove origin: 'mcp-server' parameter from batch scrape call due to type compatibility issues
  • Testing Limitation: While all lint/test/build checks pass, the actual caching behavior with real Firecrawl API calls couldn't be verified in the development environment
  • Documentation Alignment: Implementation follows the fast-scraping documentation from firecrawl-docs PR #34

Session Info:

- Add maxAge parameter to SCRAPE_TOOL input schema
- Add maxAge to CRAWL_TOOL scrapeOptions for faster crawling
- Create missing BATCH_SCRAPE_TOOL with maxAge support
- Add maxAge to SEARCH_TOOL scrapeOptions
- Ensure maxAge is passed through to all Firecrawl API calls

Fixes #69

Co-Authored-By: Nick <nicolascamara29@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring
