
feat: add batch scrape tools with maxConcurrency support (ENG-2756)#78

Closed
devin-ai-integration[bot] wants to merge 2 commits into main from devin/ENG-2756-1753370764


@devin-ai-integration
Contributor

feat: add batch scrape tools with maxConcurrency support (ENG-2756)

Summary

This PR implements two new MCP tools to support batch scraping operations with concurrency control:

  • firecrawl_batch_scrape: Initiates batch scraping of multiple URLs with configurable maxConcurrency parameter
  • firecrawl_check_batch_status: Monitors the progress and status of batch scrape jobs

The implementation follows existing patterns from SCRAPE_TOOL and CHECK_CRAWL_STATUS_TOOL, exposing the underlying Firecrawl JavaScript SDK's batch scraping capabilities through the MCP server interface.
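
As a rough illustration of what such a tool definition could look like, here is a minimal sketch. The `ToolDefinition` interface, the description strings, and the placement of `maxConcurrency` under a nested `options` object are assumptions made for this example; only the tool name and the `maxConcurrency` parameter name come from this PR.

```typescript
// Minimal local stand-in for an MCP tool definition (name, description, JSON Schema input).
// Assumption: the real server imports its Tool type from the MCP SDK instead.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, unknown>;
    required?: string[];
  };
}

// Hypothetical shape of the batch scrape tool; the schema fields are illustrative.
const BATCH_SCRAPE_TOOL: ToolDefinition = {
  name: "firecrawl_batch_scrape",
  description: "Start a batch scrape of several URLs and return a job ID.",
  inputSchema: {
    type: "object",
    properties: {
      urls: {
        type: "array",
        items: { type: "string" },
        description: "URLs to scrape in one batch job",
      },
      options: {
        type: "object",
        properties: {
          maxConcurrency: {
            type: "number",
            description: "Upper bound on URLs scraped in parallel",
          },
        },
      },
    },
    required: ["urls"],
  },
};
```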

Key changes:

  • Added tool definitions with comprehensive schemas including maxConcurrency parameter
  • Implemented request handlers using asyncBatchScrapeUrls and checkBatchScrapeStatus SDK methods
  • Added type checking function isBatchScrapeOptions for input validation
  • Registered tools in the ListToolsRequestSchema handler
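
The input validation mentioned above could be sketched as a TypeScript type guard along these lines. The `BatchScrapeOptions` interface and the specific checks (non-empty `urls`, positive integer `maxConcurrency`) are assumptions for illustration and may differ from the function in this PR.

```typescript
// Hypothetical argument shape; the PR's actual types come from the Firecrawl SDK.
interface BatchScrapeOptions {
  urls: string[];
  options?: {
    formats?: string[];
    maxConcurrency?: number;
  };
}

// Type guard: narrows unknown tool arguments to BatchScrapeOptions.
function isBatchScrapeOptions(args: unknown): args is BatchScrapeOptions {
  if (typeof args !== "object" || args === null) return false;
  const candidate = args as { urls?: unknown; options?: unknown };

  // urls must be a non-empty array of strings.
  if (!Array.isArray(candidate.urls) || candidate.urls.length === 0) return false;
  if (!candidate.urls.every((u) => typeof u === "string")) return false;

  // options is optional, but must be an object when present.
  if (candidate.options === undefined) return true;
  if (typeof candidate.options !== "object" || candidate.options === null) return false;

  // maxConcurrency, when given, must be a positive integer.
  const { maxConcurrency } = candidate.options as { maxConcurrency?: unknown };
  return (
    maxConcurrency === undefined ||
    (typeof maxConcurrency === "number" &&
      Number.isInteger(maxConcurrency) &&
      maxConcurrency > 0)
  );
}
```

A request handler would typically call the guard first and return an error response when it fails, before passing anything to the SDK.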

Review & Testing Checklist for Human

  • Verify maxConcurrency works with real API calls - The parameter is passed under a @ts-expect-error suppression because it is missing from the SDK's ScrapeParams type, even though the API accepts it
  • Test batch scraping end-to-end - Initiate a batch scrape with different maxConcurrency values and verify it works as expected
  • Validate status checking functionality - Ensure firecrawl_check_batch_status correctly returns job progress and results
  • Review TypeScript error suppressions - Confirm the @ts-expect-error comments are acceptable for maxConcurrency and origin parameters
  • Test error handling scenarios - Verify proper error responses for invalid URLs, authentication failures, and API errors

Recommended test plan:

  1. Set up MCP server with real Firecrawl API key
  2. Test batch scraping 3-5 URLs with maxConcurrency=2
  3. Monitor job status until completion
  4. Verify results are properly formatted and returned
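
Steps 2 to 4 of this plan could be scripted roughly as follows. This is a sketch only: `BatchClient`, `BatchStatus`, `runBatchAndWait`, and `makeFakeClient` are hypothetical names, and the fake client stands in for the real MCP tools so the loop can be exercised without an API key.

```typescript
// Hypothetical status shape and client interface, for illustration only.
interface BatchStatus {
  status: "scraping" | "completed" | "failed";
  completed: number;
  total: number;
}

interface BatchClient {
  startBatch(urls: string[], maxConcurrency: number): Promise<string>; // resolves to a job ID
  checkStatus(jobId: string): Promise<BatchStatus>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start a batch job, then poll until it leaves the "scraping" state.
async function runBatchAndWait(
  client: BatchClient,
  urls: string[],
  maxConcurrency: number,
  intervalMs = 1000,
  maxPolls = 60,
): Promise<BatchStatus> {
  const jobId = await client.startBatch(urls, maxConcurrency);
  for (let i = 0; i < maxPolls; i++) {
    const status = await client.checkStatus(jobId);
    if (status.status !== "scraping") return status;
    await sleep(intervalMs);
  }
  throw new Error(`Batch job ${jobId} did not finish after ${maxPolls} polls`);
}

// In-memory stand-in that marks one more URL as done on every status check.
function makeFakeClient(): BatchClient {
  let total = 0;
  let completed = 0;
  return {
    async startBatch(urls: string[]): Promise<string> {
      total = urls.length;
      completed = 0;
      return "fake-job-1";
    },
    async checkStatus(): Promise<BatchStatus> {
      completed = Math.min(total, completed + 1);
      const status: BatchStatus["status"] =
        completed === total ? "completed" : "scraping";
      return { status, completed, total };
    },
  };
}
```

For a real run, the fake client would be replaced by one that calls firecrawl_batch_scrape and firecrawl_check_batch_status through an MCP client.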

Diagram

%%{ init : { "theme" : "default" }}%%
graph TD
    Client["MCP Client"]
    Server["src/index.ts<br/>MCP Server"]
    SDK["Firecrawl JS SDK<br/>asyncBatchScrapeUrls()"]
    API["Firecrawl API<br/>/v1/batch/scrape"]
    
    Client -->|firecrawl_batch_scrape| Server
    Client -->|firecrawl_check_batch_status| Server
    Server -->|asyncBatchScrapeUrls| SDK
    Server -->|checkBatchScrapeStatus| SDK
    SDK -->|HTTP POST| API
    
    Server:::major-edit
    SDK:::context
    API:::context
    Client:::context
    
    subgraph Legend
        L1[Major Edit]:::major-edit
        L2[Minor Edit]:::minor-edit  
        L3[Context/No Edit]:::context
    end
    
    
    classDef major-edit fill:#90EE90
    classDef minor-edit fill:#87CEEB
    classDef context fill:#FFFFFF

Notes

  • TypeScript compatibility: The maxConcurrency parameter is passed under @ts-expect-error because it is not part of the SDK's ScrapeParams type, although the underlying API accepts it
  • SDK method choice: asyncBatchScrapeUrls is used instead of batchScrapeUrls so that the job ID is returned immediately rather than after the batch completes
  • Testing limitation: Current tests are mocked - real API testing needed to verify maxConcurrency functionality
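
For reviewers weighing the @ts-expect-error suppression, one possible alternative is to widen the parameter type locally with an intersection. The sketch below is not what this PR does; `ScrapeParamsLike` is a stand-in for the SDK's ScrapeParams type and `buildBatchParams` is a hypothetical helper.

```typescript
// Stand-in for the SDK's ScrapeParams type.
// Assumption: the real type has no maxConcurrency field.
interface ScrapeParamsLike {
  formats?: string[];
  onlyMainContent?: boolean;
}

// Widen the params locally instead of suppressing the compiler error.
type BatchScrapeParams = ScrapeParamsLike & { maxConcurrency?: number };

// Hypothetical helper that builds the params object handed to the SDK call.
// maxConcurrency is only added when the caller actually provided it.
function buildBatchParams(
  base: ScrapeParamsLike,
  maxConcurrency?: number,
): BatchScrapeParams {
  return maxConcurrency === undefined
    ? { ...base }
    : { ...base, maxConcurrency };
}
```

Because TypeScript applies excess property checks only to fresh object literals, a variable of the widened type can usually be passed where the narrower type is expected. Whether that holds for the actual SDK signature would need to be confirmed against its type declarations.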
  • Session info: Requested by Nick (@nickscamara) - Link to Devin run

- Add BATCH_SCRAPE_TOOL definition with maxConcurrency parameter
- Add CHECK_BATCH_STATUS_TOOL for monitoring batch operations
- Implement request handlers for both new tools
- Register tools in ListToolsRequestSchema handler
- Follow existing patterns from SCRAPE_TOOL and CHECK_CRAWL_STATUS_TOOL

Fixes ENG-2756

Co-Authored-By: Nick <nicolascamara29@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

The TypeScript compiler no longer requires error suppression for the origin
property in the batch scrape options, so this commit removes the unused
@ts-expect-error directive to fix the CI build failure.

Co-Authored-By: Nick <nicolascamara29@gmail.com>
@devin-ai-integration
Contributor Author

Closing due to inactivity for more than 30 days.

@savvaki

savvaki commented Sep 2, 2025

why was this closed?

