New Components - webscrape_ai #17582
Walkthrough
A new "Scrape Website" action was added to the WebScrapeAI integration, allowing users to specify a URL, extraction command, schema, pagination, headers, and custom JavaScript instructions. The app logic was refactored to support real API requests, and package dependencies were updated to include the platform library.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Action ("scrape-website.mjs")
participant App ("webscrape_ai.app.mjs")
participant WebScrapeAI_API
User->>Action: Provide URL, command, schema, etc.
Action->>App: Call scrapeWebsite(opts)
App->>WebScrapeAI_API: POST /scrapeWebSite with params
WebScrapeAI_API-->>App: Return scraped data
App-->>Action: Return response
Action-->>User: Export summary & return data
Actionable comments posted: 0
🧹 Nitpick comments (2)
components/webscrape_ai/actions/scrape-website/scrape-website.mjs (2)
26-30: Clarify the schema prop description.
The description shows an object example but the prop type is string, which could confuse users about the expected input format.
Consider updating the description to clarify that both formats are supported:
- description: "Schema representing the fields you want to scrape. E.g. `{\"author\":\"string\",\"comments_count\":\"integer\",\"points\":\"integer\",\"posted_time\":\"string\",\"title\":\"string\",\"url\":\"url\"}`",
+ description: "Schema representing the fields you want to scrape. Can be a JSON string or object. E.g. `{\"author\":\"string\",\"comments_count\":\"integer\",\"points\":\"integer\",\"posted_time\":\"string\",\"title\":\"string\",\"url\":\"url\"}`",
37-42: Clarify the headers format.
The description mentions "key-value pairs" but doesn't specify the exact format expected by the API.
Consider providing a clearer format example:
- description: "List of headers in key-value pairs. i.e `Accept: application/json`",
+ description: "HTTP headers as key-value pairs, one per line. E.g. `Accept: application/json`",
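To make the suggested "one per line" header format concrete, here is a minimal sketch of how such entries could be turned into a headers object. The `parseHeaders` helper is hypothetical and is not part of this PR; it assumes each entry is a string of the form `Name: value` and silently skips entries without a colon.

```javascript
// Hypothetical helper (not from the PR): converts "Name: value" strings
// into a plain headers object. Entries without a colon are skipped.
function parseHeaders(lines = []) {
  const headers = {};
  for (const line of lines) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // not a key-value pair
    const key = line.slice(0, idx).trim();
    // Split on the first colon only, so values may themselves contain colons.
    const value = line.slice(idx + 1).trim();
    if (key) headers[key] = value;
  }
  return headers;
}

const parsed = parseHeaders(["Accept: application/json", "X-Trace: a:b"]);
console.log(parsed);
```

Whether the real action sends headers as an object, a list, or a serialized string depends on what the WebScrapeAI API expects, which this review thread does not state.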
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (3)
- components/webscrape_ai/actions/scrape-website/scrape-website.mjs (1 hunks)
- components/webscrape_ai/package.json (2 hunks)
- components/webscrape_ai/webscrape_ai.app.mjs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
components/webscrape_ai/package.json (1)
Learnt from: jcortes
PR: PipedreamHQ/pipedream#14935
File: components/sailpoint/package.json:15-18
Timestamp: 2024-12-12T19:23:09.039Z
Learning: When developing Pipedream components, do not add built-in Node.js modules like `fs` to `package.json` dependencies, as they are native modules provided by the Node.js runtime.
components/webscrape_ai/webscrape_ai.app.mjs (1)
Learnt from: GTFalcao
PR: PipedreamHQ/pipedream#16954
File: components/salesloft/salesloft.app.mjs:14-23
Timestamp: 2025-06-04T17:52:05.780Z
Learning: In the Salesloft API integration (components/salesloft/salesloft.app.mjs), the _makeRequest method returns response.data which directly contains arrays for list endpoints like listPeople, listCadences, listUsers, and listAccounts. The propDefinitions correctly call .map() directly on these responses without needing to destructure a nested data property.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pnpm publish
- GitHub Check: Verify TypeScript components
- GitHub Check: Lint Code Base
- GitHub Check: Publish TypeScript components
🔇 Additional comments (8)
components/webscrape_ai/package.json (2)
3-3: LGTM: Appropriate version bump for new functionality.
The minor version increment correctly reflects the addition of new functionality (the scrape website action).
15-16: LGTM: Platform dependency correctly added.
The @pipedream/platform dependency is properly added to support the axios import used in the app client.
components/webscrape_ai/webscrape_ai.app.mjs (3)
1-1: LGTM: Proper platform import for HTTP client.
The axios import from the platform is correctly implemented to support API requests.
8-22: LGTM: Standard Pipedream app client pattern.
The implementation follows the established pattern with:
- _baseUrl() method for API endpoint
- _makeRequest() method with automatic authentication
- Proper parameter handling and request options spreading
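The pattern listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `createApp`, the `https://api.example.com` base URL, the `apiKey` query parameter, and the injected `httpClient` (standing in for the platform's axios) are all hypothetical. Only the method names `_baseUrl`, `_makeRequest`, and `scrapeWebsite`, and the `POST /scrapeWebSite` call, come from the review.

```javascript
// Hypothetical sketch of the app-client pattern; names marked "assumed"
// are not taken from the PR.
function createApp({ apiKey, httpClient }) {
  return {
    // Assumed placeholder base URL.
    _baseUrl() {
      return "https://api.example.com";
    },
    // Shared request helper: builds the full URL, injects credentials,
    // and spreads any remaining request options through unchanged.
    _makeRequest({ path, params = {}, ...opts }) {
      return httpClient({
        url: `${this._baseUrl()}${path}`,
        // Assumed auth scheme: API key sent as a query parameter.
        params: { ...params, apiKey },
        ...opts,
      });
    },
    // Thin wrapper for the endpoint named in the sequence diagram.
    scrapeWebsite(opts = {}) {
      return this._makeRequest({
        method: "POST",
        path: "/scrapeWebSite",
        ...opts,
      });
    },
  };
}

// Usage with a stub client that simply echoes the request config.
const app = createApp({ apiKey: "demo-key", httpClient: (config) => config });
console.log(app.scrapeWebsite({ params: { url: "https://example.com" } }));
```

Keeping authentication inside `_makeRequest` means each endpoint wrapper only has to name its method and path.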
23-28: LGTM: Clean API method wrapper.
The scrapeWebsite method provides a clean interface for the specific endpoint while leveraging the shared request infrastructure.
components/webscrape_ai/actions/scrape-website/scrape-website.mjs (3)
11-15: LGTM: Helpful timeout alert for users.
The alert about potential timeout issues is valuable user guidance for synchronous API operations.
56-58: LGTM: Proper schema handling.
The conditional JSON.stringify for object schemas is well implemented to handle both string and object inputs.
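For readers without the diff at hand, the conditional stringify described here might look like the sketch below. The `normalizeSchema` name and the handling of missing values are assumptions for illustration; the PR's actual lines 56-58 are not shown in this thread.

```javascript
// Hypothetical helper illustrating the "string or object" schema handling.
function normalizeSchema(schema) {
  // Assumption: an omitted schema is passed through as undefined.
  if (schema === undefined || schema === null) return undefined;
  // Objects are serialized; strings are assumed to already be JSON text.
  return typeof schema === "object" ? JSON.stringify(schema) : schema;
}

console.log(normalizeSchema({ title: "string", points: "integer" }));
console.log(normalizeSchema("{\"title\":\"string\"}"));
```

A stricter variant could also run `JSON.parse` on string input to reject malformed schemas before the request is sent.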
64-67: Ensure the response is an array (or unwrap the data property).
The current summary uses response.length directly, but the scrapeWebsite call may return a full HTTP response object or wrap the array under a data property. Please confirm the exact shape of the JSON returned by /scrapeWebSite and update the code accordingly. For example, if the array is nested in response.data, you can:
• Destructure and return only the array:
- const response = await this.webscrapeAi.scrapeWebsite({ … });
- $.export("$summary", `Scraped ${this.url} and got ${response.length} result${response.length === 1 ? "" : "s"}`);
- return response;
+ const { data } = await this.webscrapeAi.scrapeWebsite({ … });
+ $.export("$summary", `Scraped ${this.url} and got ${data.length} result${data.length === 1 ? "" : "s"}`);
+ return data;
• Or handle both cases in one go:
const response = await this.webscrapeAi.scrapeWebsite({ … });
const results = Array.isArray(response) ? response : response.data;
$.export("$summary", `Scraped ${this.url} and got ${results.length} result${results.length === 1 ? "" : "s"}`);
return results;
luancazarine
left a comment
Hi @michelle0927, LGTM! Ready for QA!
Resolves #17451