Firecrawl V1 migration / rehauling #15834
Walkthrough
This pull request revises several Firecrawl components by updating action configurations, removing legacy properties, and introducing new ones. Key changes include refactoring run methods to simplify parameter handling, consolidating JSON parsing logic in utility functions, and introducing new constants for output formats. Additionally, several properties have been removed or updated to reflect a new API version, and the package version has been incremented to denote major updates.
Changes
Sequence Diagram(s)
sequenceDiagram
participant U as User
participant A as Action (Crawl/Scrape)
participant F as Firecrawl Service
U->>A: Invoke run() with parameters
A->>F: Call crawl/scrape method with destructured parameters
F-->>A: Return response with identifier
A->>U: Output summary message using response.id
Warning
There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.
🔧 ESLint
components/housecall_pro/housecall_pro.app.mjs
Oops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
components/rabbitmq/rabbitmq.app.mjs
Oops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
components/servicetitan/servicetitan.app.mjs
Oops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
📜 Recent review details
Configuration used: CodeRabbit UI
⛔ Files ignored due to path filters (1)
Actionable comments posted: 1
🧹 Nitpick comments (7)
components/firecrawl/common/utils.mjs (2)
17-17: Minor style preference.
A semicolon after a function definition can be optional in many JS style guides, so this is just a matter of stylistic consistency.
18-24: Centralized JSON parsing helper.
Encapsulating the try-catch in optionalParseAsJSON improves readability and ensures code reuse. However, consider whether you want to log or handle parse errors to help with debugging in some scenarios.
components/firecrawl/actions/scrape-page/scrape-page.mjs (1)
71-73: Extra scraping options.
additionalOptions is flexible; passing it through parseObjectEntries supports partial or nested JSON structures. Consider clarifying allowable keys to avoid confusion.
components/firecrawl/firecrawl.app.mjs (1)
12-16: Consider validating input types.
While allowing extra parameters via additionalOptions is flexible, ensure that any nested objects or arrays are properly validated before passing them to the API to avoid unexpected parsing errors.
components/firecrawl/actions/crawl-url/crawl-url.mjs (3)
60-63: Careful with external link expansions.
Following external sites can balloon the crawl scope. Consider adding usage hints or warnings.
66-71: Typo in the documentation link format.
The link bracket is unbalanced: [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post). Add the missing bracket to ensure proper rendering.
- description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) ...
+ description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post] ...
75-83: Validate merging of properties in run() method.
Destructuring this merges all user-provided fields into data, which can cause collisions in certain scenarios. Consider restricting or enumerating which fields are forwarded to ensure consistent input.
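One way to act on this nitpick is to forward only an explicit list of fields instead of everything left over after destructuring. The following is a minimal sketch, not the PR's code: `FORWARDED_FIELDS` and `pickDefined` are hypothetical names, and the field list only reuses prop names mentioned elsewhere in this review.

```javascript
// Hypothetical allow-list; these are prop names mentioned in this review,
// not a verified copy of crawl-url.mjs.
const FORWARDED_FIELDS = [
  "url",
  "excludePaths",
  "includePaths",
  "maxDepth",
];

// Copy only allow-listed keys whose values are defined.
function pickDefined(source, keys) {
  const picked = {};
  for (const key of keys) {
    if (source[key] !== undefined) {
      picked[key] = source[key];
    }
  }
  return picked;
}

// Stand-in for the action's `this`, including fields that should not be sent.
const props = {
  url: "https://example.com",
  maxDepth: 2,
  firecrawl: {},
  internalFlag: true,
};

const data = pickDefined(props, FORWARDED_FIELDS);
console.log(data);
```

The trade-off is that every new prop has to be added to the list by hand, which is more maintenance than rest-destructuring but makes the request body predictable.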
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (7)
- components/firecrawl/actions/crawl-url/crawl-url.mjs (2 hunks)
- components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (2 hunks)
- components/firecrawl/actions/scrape-page/scrape-page.mjs (2 hunks)
- components/firecrawl/common/constants.mjs (1 hunks)
- components/firecrawl/common/utils.mjs (1 hunks)
- components/firecrawl/firecrawl.app.mjs (2 hunks)
- components/firecrawl/package.json (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- components/firecrawl/common/constants.mjs
- components/firecrawl/package.json
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Verify TypeScript components
- GitHub Check: pnpm publish
- GitHub Check: Publish TypeScript components
🔇 Additional comments (35)
components/firecrawl/common/utils.mjs (4)
1-1: Refactored function declaration looks fine.
Changing from an arrow function to a standard function declaration is acceptable and can improve clarity, especially if you plan to reuse the function or hoist it.
7-7: Good use of a dedicated function for parsing JSON conditionally.
Calling optionalParseAsJSON(item) inside the array map keeps the code clean and ensures each string item is safely parsed.
13-13: Consistent JSON parsing for string inputs.
Using optionalParseAsJSON here matches the logic for arrays and ensures uniform handling of string data.
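For readers without the diff at hand, a helper of this kind might look like the sketch below. It is reconstructed from the review's description (a try-catch around JSON.parse that falls back to the original value) and is an assumption, not the actual contents of utils.mjs; `parseItems` is a hypothetical caller covering the array and string cases.

```javascript
// Assumed behavior: try to parse a string as JSON and return the original
// value unchanged when it is not a string or not valid JSON.
function optionalParseAsJSON(value) {
  if (typeof value !== "string") {
    return value;
  }
  try {
    return JSON.parse(value);
  } catch (e) {
    // Parse errors are swallowed here; logging them is what the nitpick suggests.
    return value;
  }
}

// Hypothetical caller showing the array and string cases from the review.
function parseItems(input) {
  if (Array.isArray(input)) {
    return input.map((item) => optionalParseAsJSON(item));
  }
  return optionalParseAsJSON(input);
}

console.log(parseItems(["{\"a\": 1}", "plain text", 42]));
console.log(parseItems("[1, 2, 3]"));
```

One caveat of this approach: a string such as "123" or "true" is valid JSON, so it comes back as a number or boolean rather than a string.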
26-39: Provides flexible handling of key-value pairs.
parseObjectEntries will parse each value if it's a parseable string. This function is particularly useful when you receive partially stringified JSON objects. Keep in mind it only does one-level parsing; for deeply nested objects, you may need a recursive approach.
components/firecrawl/actions/get-crawl-status/get-crawl-status.mjs (4)
5-5: Renamed action aligns with broader usage.
Changing the name to "Get Crawl Data" clarifies that the endpoint retrieves not just status but also data.
6-6: Documentation link updated.
Updating the URL to reflect the new endpoint path is appropriate. Ensure that the destination link remains valid in the future.
7-7: Version bump appears appropriate.
Incrementing to 0.0.2 signals minor changes. Make sure you maintain a consistent versioning strategy across related actions.
24-24: Minor improvement for summary formatting.
Including parentheses around the crawl ID is a stylistic choice that can improve readability.
components/firecrawl/actions/scrape-page/scrape-page.mjs (14)
1-2: New imports enable modular usage.
Importing OUTPUT_FORMAT_OPTIONS and parseObjectEntries helps encapsulate logic and constants outside this file, promoting code reuse.
8-9: Description updated to reflect new documentation.
Providing a direct link to the updated docs helps users quickly find relevant reference material.
10-10: Major version update.
Version 1.0.0 indicates a significant or breaking change, which is consistent with the added and removed properties in this action.
19-19: Clearer URL description.
“The URL to scrape” is concise and easily understandable.
21-25: Flexible output format handling.
Defining formats as an array of strings and referencing OUTPUT_FORMAT_OPTIONS gives users a straightforward way to customize the scraped data’s output.
29-31: Selective content retrieval.
onlyMainContent is a handy property for focusing on key content. Ensure documentation clarifies what qualifies as “main” content.
34-39: Granular inclusion of tags.
Allowing users to specify includeTags can be extremely helpful. Thoroughly test edge cases where tags might overlap with excludeTags.
40-45: Granular exclusion of tags.
excludeTags is similarly valuable. Combining it with includeTags may raise potential conflicts; consider clarifications or priority rules in the docs.
46-51: Configurable request headers.
Providing a headers object adds flexibility, such as setting a custom user agent or cookies.
52-57: Wait time for dynamic content.
waitFor helps ensure that dynamic content loads properly before scraping. Document potential performance impacts of long wait times.
58-63: Mobile emulation.
mobile can help replicate a mobile user experience. This is especially useful for verifying responsive layouts.
65-69: Supported request consumer control.
timeout provides a safeguard against long-running requests, preventing indefinite hangs.
78-80: Clean destructuring of props.
This pattern keeps code manageable by separating out additionalOptions and re-bundling everything else in data.
84-85: Efficient merging of options.
Using parseObjectEntries(additionalOptions) ensures each option is conditionally parsed without cluttering your main scraping logic.
components/firecrawl/firecrawl.app.mjs (4)
10-10: No issues with the updated description.
This description succinctly clarifies the purpose of the url property.
19-23: Minor improvement in property clarity.
Renaming and clarifying the crawlId description helps users better understand its usage. No further issues found.
54-54: Confirm removal of /status segment.
This new path, /crawl/${crawlId}, replaces the older /status. Make sure the upstream and downstream calls align with this updated endpoint.
27-27: ❓ Verification inconclusive
Verify the new API endpoint version.
Ensure that all references to the old v0 endpoint have been removed or refactored to avoid inconsistent API requests.
🏁 Script executed:
#!/bin/bash
# Searches for any references to the old v0 endpoint in the repository
rg "https://api\.firecrawl\.dev/v0"
Length of output: 37
API Endpoint Update Verified – Manual Confirmation Recommended
The search for any lingering "https://api.firecrawl.dev/v0" reference returned no results, indicating that the codebase now consistently uses the new v1 endpoint as shown in components/firecrawl/firecrawl.app.mjs (line 27). However, because the automated search produced minimal output, please manually verify across the repository to ensure no overlooked references to the old API remain.
- Confirm that the file components/firecrawl/firecrawl.app.mjs now exclusively returns "https://api.firecrawl.dev/v1".
- Double-check any related modules or configuration files for hardcoded URLs referencing v0.
components/firecrawl/actions/crawl-url/crawl-url.mjs (9)
7-8: Description and version updates look good.
These changes clearly communicate the action’s purpose and version.
18-23: Exclusion logic is clearly documented.
No issues found. The excludePaths property meets typical user needs for regulating URL path crawls.
31-33: Clarity on crawl depth.
Renaming and clarifying the maxDepth property helps users control recursion levels effectively.
37-39: No concerns with ignoring sitemaps.
This feature can be useful for advanced or custom crawls.
42-45: Good optional toggle for query parameters.
Disabling repeated scraping of different query params can significantly reduce redundant requests.
48-51: Limit parameter is appropriately placed.
Capping the maximum pages at the client side is standard.
54-57: Back-link crawling logic looks fine.
Allowing backward links can be beneficial, but remain mindful of potential infinite loop scenarios if link structures are cyclical.
86-86: Summary message is helpful.
The dynamic output with response.id is clear and user-friendly.
1-1: ✅ Verification successful
Validate import path.
Confirm that parseObjectEntries is defined and exported in ../../common/utils.mjs.
🏁 Script executed:
#!/bin/bash
fd utils.mjs --exec cat {} | rg 'parseObjectEntries'
Length of output: 186
Validated Import Path: The function parseObjectEntries is confirmed to be defined and exported in ../../common/utils.mjs (as evidenced by the search results). No changes are needed regarding the import path.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/firecrawl/actions/crawl-url/crawl-url.mjs (1)
71-71: Fix malformed Markdown link in description.
There's a syntax issue in the description where the markdown link is not properly formatted.
- description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable.",
+ description: "Additional parameters to send in the request. [See the documentation](https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable.",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- components/firecrawl/actions/crawl-url/crawl-url.mjs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pnpm publish
- GitHub Check: Verify TypeScript components
- GitHub Check: Publish TypeScript components
🔇 Additional comments (6)
components/firecrawl/actions/crawl-url/crawl-url.mjs (6)
7-8: Version increment and documentation update look good.
The description now clearly points to the relevant documentation and the version has been properly incremented to 1.0.0 to reflect the significant changes in the Firecrawl API.
18-23: New excludePaths property is well-documented.
The property is clear and the description provides a helpful example to illustrate how the regex pattern matching works.
24-29: The includePaths label is correct.
The label "Include Paths" now properly matches the property's purpose, addressing the previous review comment.
42-47: Good addition of new crawling parameters.
The new properties (ignoreQueryParameters, allowBackwardLinks, allowExternalLinks) are well-documented with clear descriptions that explain their purpose.
Also applies to: 54-65
74-83: Good refactoring of parameter handling in run method.
The destructuring approach improves code readability and simplifies parameter passing. The use of the parseObjectEntries utility function for additionalOptions is a clean way to handle the JSON parsing.
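To make the JSON handling concrete, here is a sketch of what a one-level parseObjectEntries could look like. The body is an assumption based on how this review describes the utility (parse each value if it is a parseable string), not the actual utils.mjs source, and the webhook value is shortened from the example in the prop description.

```javascript
// Assumed shape of the utility: parse each top-level value that is a
// JSON-parseable string and leave every other value untouched.
function parseObjectEntries(obj = {}) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => {
      if (typeof value !== "string") {
        return [key, value];
      }
      try {
        return [key, JSON.parse(value)];
      } catch (e) {
        return [key, value]; // not JSON, keep the raw string
      }
    }),
  );
}

// Values as a user might type them into the additionalOptions prop.
const additionalOptions = {
  webhook: "{\"url\": \"https://your-server-webhook-api.com\", \"events\": [\"completed\"]}",
  note: "not json",
};

const parsed = parseObjectEntries(additionalOptions);
console.log(parsed.webhook.events);
console.log(parsed.note);
```

Because only top-level values are parsed, a stringified object nested inside another stringified object stays a string, which matches the one-level caveat raised in the first review.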
86-86: Updated API field name in summary export.
The code now correctly references response.id instead of the previous response.jobId, matching the changes in the Firecrawl V1 API response structure.
michelle0927
left a comment
LGTM!
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- components/firecrawl/actions/crawl-url/crawl-url.mjs (2 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
components/firecrawl/actions/crawl-url/crawl-url.mjs
[error] 71-71: expected , but instead found webhook
Remove webhook
(parse)
[error] 71-71: expected , but instead found ": {"
Remove ": {"
(parse)
[error] 71-71: expected : but instead found url
Remove url
(parse)
[error] 71-71: expected , but instead found ": "
Remove ": "
(parse)
[error] 71-71: expected : but instead found https
Remove https
(parse)
[error] 71-71: expected , but instead found :
Remove :
(parse)
[error] 72-72: Expected a property, a shorthand property, a getter, a setter, or a method but instead found '}'.
Expected a property, a shorthand property, a getter, a setter, or a method here.
(parse)
🪛 GitHub Actions: Pull Request Checks
components/firecrawl/actions/crawl-url/crawl-url.mjs
[error] 71-71: Parsing error: Unexpected token webhook
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pnpm publish
- GitHub Check: Publish TypeScript components
- GitHub Check: Verify TypeScript components
🔇 Additional comments (13)
components/firecrawl/actions/crawl-url/crawl-url.mjs (13)
1-1: No actionable feedback.
7-8: No actionable feedback.
18-21: No actionable feedback.
24-27: No actionable feedback.
31-33: No actionable feedback.
37-39: No actionable feedback.
42-45: No actionable feedback.
48-51: No actionable feedback.
54-57: No actionable feedback.
60-63: No actionable feedback.
75-78: No actionable feedback.
81-82: No actionable feedback.
86-86: No actionable feedback.
additionalOptions: {
  propDefinition: [
    firecrawl,
    "waitFor",
    "additionalOptions",
  ],
  optional: true,
  description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the `webhook` param, use the value `{"webhook": {"url": "https://your-server-webhook-api.com","headers": {},"metadata": {},"events": ["completed"]}}`",
},
Fix the parse error in the description.
Pipeline logs indicate a parsing error at line 71 due to bracket mismatch and unescaped backticks in the doc link and JSON example. This breaks the build. Please apply the following fix:
- description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the `webhook` param, use the value `{\"webhook\": {\"url\": \"https://your-server-webhook-api.com\",\"headers\": {},\"metadata\": {},\"events\": [\"completed\"]}}`",
+ description: "Additional parameters to send in the request. (https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the 'webhook' param, use the value '{\"webhook\":{\"url\":\"https://your-server-webhook-api.com\",\"headers\":{},\"metadata\":{},\"events\":[\"completed\"]}}'.",
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- additionalOptions: {
-   propDefinition: [
-     firecrawl,
-     "waitFor",
-     "additionalOptions",
-   ],
-   optional: true,
-   description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the `webhook` param, use the value `{"webhook": {"url": "https://your-server-webhook-api.com","headers": {},"metadata": {},"events": ["completed"]}}`",
- },
+ additionalOptions: {
+   propDefinition: [
+     firecrawl,
+     "additionalOptions",
+   ],
+   description: "Additional parameters to send in the request. (https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the 'webhook' param, use the value '{\"webhook\":{\"url\":\"https://your-server-webhook-api.com\",\"headers\":{},\"metadata\":{},\"events\":[\"completed\"]}}'.",
+ },
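The parse error reported for line 71 comes from double quotes appearing inside a double-quoted string literal; backticks are ordinary characters in such a string and need no escaping. The sketch below shows two ways to write a literal of this kind so that it parses. The description text is shortened for illustration, and the constant names are hypothetical.

```javascript
// Broken form (kept as a comment because it does not parse):
// description: "use the value `{"webhook": {"events": ["completed"]}}`"

// Option 1: escape the inner double quotes.
const descriptionEscaped =
  "For example, use the value `{\"webhook\": {\"events\": [\"completed\"]}}`";

// Option 2: delimit the string with single quotes so the inner
// double quotes and backticks need no escaping.
const descriptionSingleQuoted =
  'For example, use the value `{"webhook": {"events": ["completed"]}}`';

// Both literals produce the same string.
console.log(descriptionEscaped === descriptionSingleQuoted);
```

Which option fits depends on the repository's lint rules; a quotes rule that enforces double quotes would favor the escaped form.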
Actionable comments posted: 0
♻️ Duplicate comments (1)
components/firecrawl/actions/crawl-url/crawl-url.mjs (1)
71-71: ⚠️ Potential issue
Fix the URL formatting and escape quotes in description.
The documentation URL has mismatched brackets and there might still be potential parsing issues with backticks and quotes in the JSON example.
- description: "Additional parameters to send in the request. [https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the `webhook` param, use the value `{\"webhook\": {\"url\": \"https://your-server-webhook-api.com\",\"headers\": {},\"metadata\": {},\"events\": [\"completed\"]}}`",
+ description: "Additional parameters to send in the request. See [documentation](https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for available parameters. Values will be parsed as JSON where applicable. For example, to add the webhook param, use the value '{\"webhook\":{\"url\":\"https://your-server-webhook-api.com\",\"headers\":{},\"metadata\":{},\"events\":[\"completed\"]}}'",
🧹 Nitpick comments (1)
components/firecrawl/actions/crawl-url/crawl-url.mjs (1)
66-72: Consider adding error handling for JSON parsing.
While the code uses parseObjectEntries to handle JSON parsing, there's no explicit error handling if the additionalOptions contains invalid JSON.
  async run({ $ }) {
    const { firecrawl, additionalOptions, ...data } = this;
+   let parsedOptions = {};
+   try {
+     if (additionalOptions) {
+       parsedOptions = parseObjectEntries(additionalOptions);
+     }
+   } catch (error) {
+     throw new Error(`Error parsing additionalOptions: ${error.message}`);
+   }
    const response = await firecrawl.crawl({
      $,
      data: {
        ...data,
-       ...(additionalOptions && parseObjectEntries(additionalOptions)),
+       ...parsedOptions,
      },
    });
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- components/firecrawl/actions/crawl-url/crawl-url.mjs (2 hunks)
🔇 Additional comments (5)
components/firecrawl/actions/crawl-url/crawl-url.mjs (5)
1-1: Good addition of utility import.
The imported parseObjectEntries utility function is appropriately used later to handle JSON parsing for the additionalOptions, which improves code maintainability.
7-8: Version bump correctly reflects significant changes.
The version has been appropriately incremented from 0.0.1 to 1.0.0, indicating a major update with breaking changes to the component's interface. The updated description with documentation link is helpful for users.
24-27: Label and property name for includePaths now match correctly.
The label now correctly states "Include Paths" which aligns with the property name and prevents user confusion.
75-83: Clean code refactoring of the run method.
The updated implementation properly destructures properties and uses the parseObjectEntries utility to handle JSON parsing, resulting in cleaner and more maintainable code.
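The destructure-and-merge pattern praised here can be tried in isolation with a stubbed client. This is only a sketch under stated assumptions: `fakeFirecrawl` is a hypothetical stub that echoes the request body instead of calling the API, `parseObjectEntries` is a simplified stand-in for the real utility, and the plain `action` object only imitates how the component reads props from `this`.

```javascript
// Simplified stand-in for the real utility (assumption): parse string
// values as JSON when possible, otherwise keep them as they are.
function parseObjectEntries(obj) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    try {
      out[key] = typeof value === "string" ? JSON.parse(value) : value;
    } catch (e) {
      out[key] = value;
    }
  }
  return out;
}

// Hypothetical stub client that echoes the request body.
const fakeFirecrawl = {
  async crawl({ data }) {
    return { id: "stub-id", received: data };
  },
};

// Stand-in for the action object; `run` reads its props from `this`.
const action = {
  firecrawl: fakeFirecrawl,
  url: "https://example.com",
  maxDepth: 2,
  additionalOptions: { webhook: "{\"url\": \"https://your-server-webhook-api.com\"}" },
  async run() {
    // `run` is an own property of this plain object, so it is excluded too.
    const { firecrawl, additionalOptions, run, ...data } = this;
    return firecrawl.crawl({
      data: {
        ...data,
        ...(additionalOptions && parseObjectEntries(additionalOptions)),
      },
    });
  },
};

const pending = action.run();
pending.then((response) => console.log(response.id, response.received));
```

Having to exclude `run` by hand in this stub is a small instance of the earlier concern that rest-destructuring forwards whatever happens to be on the object.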
86-86: Updated summary message to use correct ID property.
The summary message has been updated to use response.id instead of the previous response.jobId, which aligns with the updated API response structure.
/approve
Closes #15780
Crawl and Scrape endpoints changed pretty drastically in terms of parameters, so the props were changed accordingly.