This repository was archived by the owner on Nov 10, 2025. It is now read-only.

Fix issue #2783: Allow FirecrawlScrapeWebsiteTool to handle URL in constructor#300

Closed
devin-ai-integration[bot] wants to merge 1 commit into main from
devin/1746697111-fix-firecrawl-scrape-website-tool

Conversation

@devin-ai-integration
Contributor

Fix issue #2783: Allow FirecrawlScrapeWebsiteTool to handle URL in constructor

Fixes crewAIInc/crewAI#2783

Description

This PR fixes an issue where the FirecrawlScrapeWebsiteTool fails when a URL is passed to the constructor rather than the run method. The documentation on the CrewAI website shows users passing the URL to the constructor, but the tool doesn't handle this parameter correctly, resulting in the error: "FirecrawlApp.scrape_url() takes 2 positional arguments but 3 were given".

Changes

  • Modified the FirecrawlScrapeWebsiteTool to accept and store a URL parameter in the constructor
  • Modified the _run method to use the URL from run() if provided, otherwise use the URL from the constructor
  • Updated the README.md to show both usage patterns
  • Added tests to verify the fix works correctly
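
The constructor/run precedence described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in: the real tool wraps FirecrawlApp, which is stubbed out here so the fallback logic can be shown in isolation.

```python
from typing import Optional


class FirecrawlScrapeWebsiteToolSketch:
    """Simplified sketch of the constructor/run URL precedence (not the real tool)."""

    def __init__(self, api_key: Optional[str] = None, url: Optional[str] = None):
        self.api_key = api_key
        self.url = url  # stored so run() can fall back to it

    def run(self, url: Optional[str] = None) -> str:
        # A URL passed to run() takes precedence over the constructor value.
        url_to_use = url if url is not None else self.url
        if not url_to_use:
            raise ValueError("URL must be provided in the constructor or to run()")
        # Stand-in for FirecrawlApp.scrape_url(url_to_use, ...)
        return f"scraped:{url_to_use}"
```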

Testing

Created tests to verify:

  • URL can be passed to the constructor
  • URL can be passed to the run method
  • URL in run method overrides URL in constructor
  • Error is raised if no URL is provided
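
The override case, for instance, can be checked with a mock in place of the Firecrawl client. The stub below is hypothetical (not the shipped test file) and only mirrors the behavior under test:

```python
from unittest.mock import MagicMock


class StubTool:
    """Minimal stand-in mirroring the behavior under test (hypothetical)."""

    def __init__(self, app, url=None):
        self._app = app
        self.url = url

    def run(self, url=None):
        url_to_use = url if url is not None else self.url
        if url_to_use is None:
            raise ValueError("no URL provided")
        return self._app.scrape_url(url_to_use)


app = MagicMock()
tool = StubTool(app, url="https://ctor.example")
tool.run(url="https://run.example")
# The run() argument should take precedence over the constructor URL.
app.scrape_url.assert_called_once_with("https://run.example")
```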

Link to Devin run: https://app.devin.ai/sessions/006b12b0476e4bfba670ecfbe3c2e75a
Requested by: Joe Moura (joao@crewai.com)

…nstructor

Co-Authored-By: Joe Moura <joao@crewai.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@joaomdmoura
Collaborator

Disclaimer: This review was made by a crew of AI Agents.

Code Review for PR #300

This pull request addresses issue #2783 by enhancing the FirecrawlScrapeWebsiteTool to accept a URL in the constructor as well as in the run() method. The changes span three files: firecrawl_scrape_website_tool.py, README.md, and test_issue_2783_fixed.py. Below are detailed observations and recommendations for each file.

1. firecrawl_scrape_website_tool.py Analysis

Positive Aspects:

  • URL Handling Flexibility: Accepting the URL in both the constructor and the run() method lets users supply it whichever way is more convenient.
  • Type Hints and Documentation: The usage of type hints for parameters improves code readability and maintainability.
  • Integration: The modifications seamlessly fit into the existing code structure, making it easier for developers to adopt.

Issues and Suggestions:

  1. Error Handling Enhancement: It’s crucial to implement additional checks for URL validity to prevent runtime errors. Consider the following:

    def _run(self, url: Optional[str] = None, **kwargs):
        # Prefer the URL passed at call time; fall back to the constructor value.
        url_to_use = url if url is not None else self.url
        if not isinstance(url_to_use, str) or not url_to_use.strip():
            raise ValueError("URL must be a non-empty string")
        return self._firecrawl.scrape_url(url_to_use, **self.config)
  2. Constructor Parameter Documentation: While the constructor documentation exists, it should explicitly outline optional parameters:

    def __init__(self, api_key: Optional[str] = None, url: Optional[str] = None, **kwargs) -> None:
        """
        Initialize FirecrawlScrapeWebsiteTool.
        Args:
            api_key (Optional[str]): Firecrawl API key
            url (Optional[str]): Website URL
            **kwargs: Additional configurations
        """
  3. Configuration Validation: Implement more comprehensive validation checks for the configuration dictionary, ensuring that required fields adhere to expected types and formats:

    @validator('config')
    def validate_config(cls, v):
        if not isinstance(v, dict):
            raise ValueError("Config must be a dictionary")
        if 'formats' in v and not isinstance(v['formats'], list):
            raise ValueError("formats must be a list")
        return v

2. README.md Analysis

Positive Aspects:

  • Usage Examples: The documentation includes clear examples illustrating different usage scenarios, which will help users integrate the tool effectively.
  • Structured Layout: Well-organized documentation enhances accessibility and readability for end-users.

Suggestions:

  1. Error Handling Examples: Adding practical examples that demonstrate error handling capabilities can greatly aid users in managing exceptions:
    ## Error Handling
    ```python
    try:
        tool = FirecrawlScrapeWebsiteTool(url="invalid-url")
        result = tool.run()
    except ValueError as e:
        print(f"Error: {e}")
    ```

  2. Configuration Examples: Including various configuration examples for advanced users will make the documentation even more comprehensive:
    ## Configuration Examples
    ```python
    config = {"formats": ["html", "markdown"], "wait_for": 5}
    tool = FirecrawlScrapeWebsiteTool(url="example.com", config=config)
    ```


3. test_issue_2783_fixed.py Analysis

Positive Aspects:

  • Test Coverage: Extensive tests are provided to validate the new features, ensuring robustness in various scenarios.
  • Effective Mocking: Usage of mocking effectively isolates tests and maintains reliability of test results.

Suggestions:

  1. Edge Case Tests: Introducing tests for edge cases will further solidify the robustness of the tool:

    def test_invalid_url_format(self):
        tool = FirecrawlScrapeWebsiteTool(api_key="test_key")
        with self.assertRaises(ValueError):
            tool.run(url="")
        with self.assertRaises(ValueError):
            tool.run(url=None)
  2. Cleanup in setUp/tearDown: Implementing cleanup methods can ensure that the testing environment remains pristine between tests:

    def tearDown(self):
        # Reset mocks so state does not leak between tests
        self.mock_firecrawl_app.reset_mock()
        self.mock_instance.reset_mock()

General Recommendations

  • Documentation Enhancements: Augment error messages and the clarity around expected input types and output behaviors.
  • Code Quality Improvements: Introduce additional input validations for URLs and implement logging to capture API interactions for troubleshooting.
  • Testing Improvements: Consider adding more integration and performance tests to comprehensively assess the tool’s behavior under load.

Security Considerations

  • URL Validation: Ensuring robust URL validation is imperative to ward off security vulnerabilities.
  • API Key Management: Ensure that API keys are masked in logs and have mechanisms for secure handling and rotation.
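
Both points can be illustrated with small standard-library helpers. These are hypothetical sketches, not part of the tool or the Firecrawl SDK:

```python
from urllib.parse import urlparse


def is_valid_http_url(url: str) -> bool:
    """Basic sanity check for an http(s) URL: requires a scheme and a host.

    A sketch only; production validation may need stricter rules.
    """
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def mask_api_key(key: str, visible: int = 4) -> str:
    """Mask all but the last few characters of an API key before logging."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```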

The changes made in this PR represent an important step forward in enhancing the functionality of the FirecrawlScrapeWebsiteTool. Addressing the recommendations made here will further increase the code's robustness and maintainability.



Development

Successfully merging this pull request may close these issues.

[BUG] FirecrawlApp.scrape_url() takes 2 positional arguments but 3 were given
