[Components] scrapingant #13316 #15379
Walkthrough

This pull request introduces a new ScrapingAnt component for web scraping in Pipedream. The changes include a new general extraction action, constants for proxy and resource types, and an enhanced ScrapingAnt application module with new configuration properties and methods. The package version has been updated to reflect these additions, providing a structured approach to configuring and executing web scraping tasks.
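For orientation, the walkthrough above can be sketched as a minimal Pipedream-style action module. This is a hypothetical, heavily simplified stand-in (the stub `app` object and its return value are illustrative, not the PR's actual implementation; only the action key, the `$.export` summary pattern, and the `generalExtraction` method name come from the review diffs below):

```javascript
// Stand-in for the ScrapingAnt app module's request helper. The real module
// calls the ScrapingAnt API via @pipedream/platform's axios; this stub just
// echoes the requested URL so the action's shape can be shown end to end.
const app = {
  async generalExtraction({ params }) {
    return { url: params.url, status: "done" };
  },
};

// Simplified shape of the new general extraction action.
const generalExtraction = {
  key: "scrapingant-general-extraction",
  name: "General Extraction",
  version: "0.0.1",
  type: "action",
  async run({ $ }) {
    const response = await this.app.generalExtraction({
      $,
      params: { url: this.url },
    });
    $.export("$summary", `Successfully scraped ${this.url}`);
    return response;
  },
};

// Minimal harness mimicking how Pipedream binds props onto `this`.
async function demo() {
  const exported = {};
  const $ = { export: (k, v) => { exported[k] = v; } };
  const result = await generalExtraction.run.call(
    { app, url: "https://example.com" },
    { $ },
  );
  console.log(result.status, exported["$summary"]);
}
demo();
```

The harness at the end is only there to show the calling convention; in production, Pipedream supplies `$` and binds configured props onto the action object itself.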
Actionable comments posted: 2
🧹 Nitpick comments (3)
components/scrapingant/actions/general-extraction/general-extraction.mjs (1)

4-8: Version number should start at 0.1.0 for new components

The version is set to "0.0.1", but as this is a new component with complete functionality, it should start at 0.1.0 to align with semantic versioning practices.

```diff
- version: "0.0.1",
+ version: "0.1.0",
```

components/scrapingant/common/constants.mjs (1)

110-163: Add documentation links for resource types

The resource types should include references to the Puppeteer/Chrome DevTools Protocol documentation for a better understanding of each type's purpose.

```diff
  RESOURCE_TYPES: [
+   // See: https://pptr.dev/api/puppeteer.resourcetype
    {
      label: "Document",
      value: "document",
    },
```

components/scrapingant/scrapingant.app.mjs (1)

65-67: Consider making the base URL configurable

The API endpoint should be configurable to support different environments (e.g., staging, production) and potential future changes.

```diff
  _baseUrl() {
-   return "https://api.scrapingant.com/v2";
+   return this.$auth.base_url || "https://api.scrapingant.com/v2";
  },
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`

📒 Files selected for processing (4)

- `components/scrapingant/actions/general-extraction/general-extraction.mjs` (1 hunks)
- `components/scrapingant/common/constants.mjs` (1 hunks)
- `components/scrapingant/package.json` (2 hunks)
- `components/scrapingant/scrapingant.app.mjs` (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pnpm publish
- GitHub Check: Verify TypeScript components
- GitHub Check: Publish TypeScript components
🔇 Additional comments (2)
components/scrapingant/common/constants.mjs (1)
2-99: Consider adding ISO code validation for proxy countries

The proxy countries list should be validated against ISO 3166-1 alpha-2 country codes to ensure accuracy. Also, consider sorting the countries alphabetically for better maintainability.
✅ Verification successful
Country codes are valid and well-organized
All country codes in the PROXY_COUNTRIES constant are valid ISO 3166-1 alpha-2 codes. The current organization of the list is logical and maintainable.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Verify country codes against ISO standard
curl -s https://raw.githubusercontent.com/lukes/ISO-3166-Countries-with-Regional-Codes/master/slim-2/slim-2.json | \
  jq -r '.[] | select(.["alpha-2"] | IN("BR","CA","CN","CZ","FR","DE","HK","IN","ID","IT","IL","JP","NL","PL","RU","SA","SG","KR","ES","GB","AE","US","VN")) | .["alpha-2"]' | \
  sort | uniq -c | grep -v " 1"
```

Length of output: 324
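The same sanity check can also be sketched in the component's own language. This is a minimal JavaScript version (the country list is the subset queried in the script above; this only checks the two-uppercase-letter format and duplicates, not membership in the full ISO 3166-1 table):

```javascript
// Sketch: verify each proxy country code matches the ISO 3166-1 alpha-2
// format (two uppercase letters) and that the list has no duplicates.
const PROXY_COUNTRIES = [
  "BR", "CA", "CN", "CZ", "FR", "DE", "HK", "IN", "ID", "IT", "IL", "JP",
  "NL", "PL", "RU", "SA", "SG", "KR", "ES", "GB", "AE", "US", "VN",
];

const looksLikeAlpha2 = (code) => /^[A-Z]{2}$/.test(code);
const invalid = PROXY_COUNTRIES.filter((c) => !looksLikeAlpha2(c));
const hasDuplicates = new Set(PROXY_COUNTRIES).size !== PROXY_COUNTRIES.length;

console.log(invalid.length === 0 && !hasDuplicates ? "ok" : "problems found");
// prints: ok
```

A check like this could run in a unit test so the constants file stays valid as countries are added.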
components/scrapingant/package.json (1)
15-17: Verify compatibility with @pipedream/platform version

The specified version range for @pipedream/platform should be tested for compatibility.
✅ Verification successful
@pipedream/platform dependency version is appropriate
The specified version `^3.0.3` is using the latest available 3.0.x release and follows proper semver practices.

🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Check latest version and compatibility
npm info @pipedream/platform versions --json | jq -r '.[] | select(startswith("3.0"))' | sort -V
```

Length of output: 122
I left a few improvement suggestions here
Co-authored-by: Guilherme Falcão <[email protected]>
Looks almost good to go, moving it to QA, there's just one more syntax improvement I commented on.
/approve
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/scrapingant/scrapingant.app.mjs (1)
7-64: Improve prop descriptions and options.

The prop definitions are well-structured but could benefit from more concise descriptions and better use of options.

Apply these improvements:

```diff
  blockResource: {
    type: "string[]",
    label: "Block Resource",
-   description: "Prevents cloud browser from loading specified resource types",
+   description: "Prevents cloud browser from loading the specified resource types",
    options: constants.RESOURCE_TYPES,
    optional: true,
  },
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `pnpm-lock.yaml` is excluded by `!**/pnpm-lock.yaml`

📒 Files selected for processing (3)

- `components/scrapingant/actions/general-extraction/general-extraction.mjs` (1 hunks)
- `components/scrapingant/common/constants.mjs` (1 hunks)
- `components/scrapingant/scrapingant.app.mjs` (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/scrapingant/common/constants.mjs
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pnpm publish
- GitHub Check: Verify TypeScript components
- GitHub Check: Publish TypeScript components
🔇 Additional comments (5)
components/scrapingant/actions/general-extraction/general-extraction.mjs (3)
3-72: LGTM! Well-structured component definition.

The component metadata and props are well-defined with appropriate documentation links and proper handling of browser-dependent properties.
90-107: Add error handling and improve summary message.

The run method needs:

- Error handling for API calls
- A more informative summary message including the scraped URL

Apply this diff to improve the code:

```diff
  async run({ $ }) {
+   try {
      const response = await this.app.generalExtraction({
        $,
        params: {
          url: this.url,
          browser: this.browser,
          return_page_source: this.returnPageSource,
          cookies: this.cookies,
          js_snippet: this.jsSnippet,
          proxy_type: this.proxyType,
          proxy_country: this.proxyCountry,
          wait_for_selector: this.waitForSelector,
          block_resource: this.blockResource,
        },
      });
-     $.export("$summary", "Successfully sent the request to ScrapingAnt");
+     $.export("$summary", `Successfully scraped ${this.url}`);
      return response;
+   } catch (error) {
+     throw new Error(`Failed to scrape URL: ${error.message}`);
+   }
  },
```
73-88: 🛠️ Refactor suggestion

Consolidate duplicate conditions and remove unused object.

The code can be simplified by:

- Consolidating the three identical browser checks into one.
- Removing the unused `props` object.

Apply this diff to improve the code:

```diff
  async additionalProps(existingProps) {
-   const props = {};
    if (this.browser) {
      existingProps.returnPageSource.hidden = false;
      existingProps.returnPageSource.disabled = false;
-   }
-   if (this.browser) {
      existingProps.jsSnippet.hidden = false;
      existingProps.jsSnippet.disabled = false;
-   }
-   if (this.browser) {
      existingProps.blockResource.hidden = false;
      existingProps.blockResource.disabled = false;
    }
-   return props;
+   return {};
  },
```

Likely invalid or redundant comment.
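The same consolidation can also be expressed as a loop over the three browser-dependent prop names. This is an alternative sketch, not the diff the reviewer posted; the prop names mirror the PR's action module, and the standalone function stands in for the action method:

```javascript
// Alternative sketch: toggle all browser-dependent props in one loop.
const BROWSER_DEPENDENT_PROPS = ["returnPageSource", "jsSnippet", "blockResource"];

async function additionalProps(existingProps) {
  if (this.browser) {
    for (const key of BROWSER_DEPENDENT_PROPS) {
      existingProps[key].hidden = false;
      existingProps[key].disabled = false;
    }
  }
  // No extra dynamic props are needed; return an empty object.
  return {};
}
```

A loop keeps the list of browser-dependent props in one place, so adding a fourth such prop later means touching one array instead of copying another `if` block.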
components/scrapingant/scrapingant.app.mjs (2)
69-84: Enhance error handling in _makeRequest method.

The request method should include timeout configuration and better error handling with specific error types.

Apply this diff to improve the code:

```diff
  async _makeRequest(opts = {}) {
    const {
      $ = this,
      path,
      headers,
+     timeout = 30000,
      ...otherOpts
    } = opts;
-   return axios($, {
-     ...otherOpts,
-     url: this._baseUrl() + path,
-     headers: {
-       ...headers,
-       "x-api-key": `${this.$auth.api_token}`,
-     },
-   });
+   try {
+     return await axios($, {
+       ...otherOpts,
+       url: this._baseUrl() + path,
+       headers: {
+         ...headers,
+         "x-api-key": `${this.$auth.api_token}`,
+       },
+       timeout,
+     });
+   } catch (error) {
+     if (error.response?.status === 401) {
+       throw new Error("Invalid API key. Please check your authentication credentials.");
+     }
+     throw error;
+   }
  },
```
85-90: LGTM! Simple and effective implementation.

The generalExtraction method is well-implemented with proper parameter passing.
@lcaresia I think the change requested above hasn't been applied yet
/approve |
WHY

Summary by CodeRabbit

- New Features
- Improvements
- Version Update