Fix FirecrawlScrapeWebsiteTool#298
Conversation
…ect Dict type annotation - Add required config parameter when creating the tool - Change type hint from `dict` to `Dict` to resolve Pydantic validation issues
|
Disclaimer: This review was made by a crew of AI Agents. Code Review Comment for PR #298: FirecrawlScrapeWebsiteTool ChangesOverviewThe changes in this pull request aim to enhance the implementation of the FirecrawlScrapeWebsiteTool by correcting type annotations and adding necessary configuration parameters. Below is a detailed analysis of the code along with recommendations for further improvement. Positive Aspects
Issues and Recommendations1. Type Hint ConsistencyThe Current Implementation:def _run(self, url: str, timeout: Optional[int] = 30000):
return self._firecrawl.scrape_url(url, **self.config)Recommended Implementation:def _run(self, url: str, timeout: Optional[int] = None) -> str:
config = self.config.copy()
if timeout:
config['timeout'] = timeout
return self._firecrawl.scrape_url(url, **config)2. Configuration ManagementThe current implementation does not merge user-defined config values with default values. Current Implementation:if config:
kwargs["config"] = configRecommended Implementation:def __init__(self, api_key: Optional[str] = None, config: Optional[Dict[str, Any]] = None, **kwargs):
default_config = {
"formats": ["markdown"],
"only_main_content": True,
"include_tags": [],
"exclude_tags": [],
"headers": {},
"wait_for": 0,
"json_options": None
}
if config:
default_config.update(config)
kwargs["config"] = default_config
super().__init__(**kwargs)3. Enhanced Error HandlingImproving error messages during imports can enhance debugging experience. Current Implementation:try:
from firecrawl import FirecrawlApp # type: ignore
except ImportError:
raise ImportError("Firecrawl is not installed. Please install it with `pip install firecrawl`")Recommended Implementation:try:
from firecrawl import FirecrawlApp # type: ignore
except ImportError as e:
raise ImportError(
"Firecrawl package is required but not installed. "
"Please install it with `pip install firecrawl`. "
f"Original error: {str(e)}"
) from e4. Documentation ClarityAdding type hints in the docstring can clarify expected input and output types. Recommended Docstring enhancement:class FirecrawlScrapeWebsiteTool(BaseTool):
"""Tool for scraping websites using Firecrawl.
Args:
api_key (Optional[str]): Firecrawl API key
config (Optional[Dict[str, Any]]): Configuration dictionary with the following options:
formats (List[str]): Output formats, e.g., ["markdown"]
only_main_content (bool): Extract only main content. Default: True
include_tags (List[str]): Tags to include. Default: []
exclude_tags (List[str]): Tags to exclude. Default: []
headers (Dict[str, str]): Headers to include. Default: {}
wait_for (int): Time to wait for page to load in ms. Default: 0
json_options (Optional[Dict]): Options for JSON extraction. Default: None
Returns:
str: Scraped content in specified format
"""5. Configuration ValidationAdding validation checks ensures that required fields are correctly set and of the right type. Recommended Implementation:@property
def config(self) -> Dict[str, Any]:
required_fields = {
"formats": list,
"only_main_content": bool,
"include_tags": list,
"exclude_tags": list,
"headers": dict
}
for field, expected_type in required_fields.items():
if field not in self._config or not isinstance(self._config[field], expected_type):
raise ValueError(f"Config field '{field}' must be of type {expected_type}")
return self._configHistorical Context from Related PRs
ConclusionThe changes made in this PR move in the right direction by improving type handling, documentation, and error management. Implementing the above recommendations will further enhance the robustness and usability of the |
| def __init__(self, api_key: Optional[str] = None, config: Optional[Dict[str, Any]] = None, **kwargs): | ||
| if config: | ||
| kwargs["config"] = config |
There was a problem hiding this comment.
I got a bit confused here since
config: Dict[str, Any] = Field(means that config is required
however, your initializer is treating is as optional:
config: Optional[Dict[str, Any]] = NoneDoes this change to config really needed?
| self._firecrawl = FirecrawlApp(api_key=api_key) | ||
|
|
||
| def _run(self, url: str): | ||
| def _run(self, url: str, timeout: Optional[int] = 30000): |
There was a problem hiding this comment.
I think this parameter very useless haha.. We are even using that here..
What about removing from FirecrawlScrapeWebsiteToolSchema?
- removing optional config - removing timeout from Pydantic model
- removing config from __init__
- removing timeout
lucasgomide
left a comment
There was a problem hiding this comment.
hey @nicoferdi96 welcome and thanks for your first contribution
* fix FirecrawlScrapeWebsiteTool: add missing config parameter and correct Dict type annotation - Add required config parameter when creating the tool - Change type hint from `dict` to `Dict` to resolve Pydantic validation issues * Update firecrawl_scrape_website_tool.py - removing optional config - removing timeout from Pydantic model * Removing config from __init__ - removing config from __init__ * Update firecrawl_scrape_website_tool.py - removing timeout
Add missing config parameter and correct Dict type annotation
dicttoDictto resolve Pydantic validation issues