- 
                Notifications
    You must be signed in to change notification settings 
- Fork 5.5k
New Components - scrapeninja #15753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Components - scrapeninja #15753
Conversation
| The latest updates on your projects. Learn more about Vercel for Git ↗︎ 3 Skipped Deployments
 | 
| WalkthroughThe pull request removes legacy files and introduces new modules that redefine the ScrapeNinja application. The outdated  Changes
 Sequence Diagram(s)sequenceDiagram
  participant A as "Action: Scrape with JS Rendering"
  participant B as "ScrapeNinja App (scrapeJs)"
  participant C as "API Request (_makeRequest)"
  
  A->>B: Prepare scraping config (viewport, parameters)
  B->>C: Call _makeRequest for JS rendering
  C-->>B: Return API response
  B-->>A: Return scraping result
sequenceDiagram
  participant A as "Action: Scrape without JS"
  participant B as "ScrapeNinja App (scrapeNonJs)"
  participant C as "API Request (_makeRequest)"
  
  A->>B: Assemble scraping parameters and data
  B->>C: Call _makeRequest for non-JS scraping
  C-->>B: Return API response
  B-->>A: Return scraping result
Possibly related PRs
 Suggested reviewers
 Poem
 Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 ESLint
 components/scrapeninja/actions/scrape-without-js/scrape-without-js.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs components/scrapeninja/actions/scrape-with-js-rendering/scrape-with-js-rendering.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs components/scrapeninja/scrapeninja.app.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs ✨ Finishing Touches
 Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit: 
 Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
 Other keywords and placeholders
 CodeRabbit Configuration File ( | 
Actions - Non JS Scraping - Scraping With JS Rendering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few suggestions regarding component naming and prop descriptions
        
          
                components/scrapeninja/actions/non-js-scraping/non-js-scraping.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                components/scrapeninja/actions/scraping-with-js-rendering/scraping-with-js-rendering.mjs
              
                Outdated
          
            Show resolved
            Hide resolved
        
      | /approve | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
components/scrapeninja/actions/non-js-scraping/non-js-scraping.mjs (1)
83-104: Consider more robust error handling.The current error handling assumes a specific error structure with
response.data. This could fail if the error doesn't have this structure (e.g., network errors, timeouts).- } catch ({ response: { data } }) { - throw new ConfigurationError(data.message || data.stderr); + } catch (error) { + if (error.response && error.response.data) { + throw new ConfigurationError(error.response.data.message || error.response.data.stderr); + } + throw new ConfigurationError(error.message || "An unknown error occurred"); + }components/scrapeninja/actions/scraping-with-js-rendering/scraping-with-js-rendering.mjs (1)
184-229: Improve error handling for more robustness.Similar to the non-JS scraping action, the error handling here could be more robust to handle various error scenarios.
- } catch ({ response: { data } }) { - throw new ConfigurationError(parseError(data)); + } catch (error) { + if (error.response && error.response.data) { + throw new ConfigurationError(parseError(error.response.data)); + } + throw new ConfigurationError(error.message || "An unknown error occurred"); + }components/scrapeninja/common/utils.mjs (2)
32-47: Optimize the clearObj function to avoid O(n²) complexity.The use of spread operator (
...) in the reducer can lead to O(n²) time complexity as flagged by the static analysis tool. This could impact performance with larger objects.export const clearObj = (obj) => { - return Object.entries(obj) - .filter(([ - _, - v, - ]) => (v != null && v != "" && _ != "$emit")) - .reduce((acc, [ - k, - v, - ]) => ({ - ...acc, - [k]: (!Array.isArray(v) && v === Object(v)) - ? clearObj(v) - : v, - }), {}); + const result = {}; + for (const [key, value] of Object.entries(obj)) { + if (value != null && value !== "" && key !== "$emit") { + result[key] = (!Array.isArray(value) && value === Object(value)) + ? clearObj(value) + : value; + } + } + return result; };This approach avoids the spread operator in the reducer, achieving the same result more efficiently.
🧰 Tools
🪛 Biome (1.9.4)
[error] 42-42: Avoid the use of spread (
...) syntax on accumulators.Spread syntax should be avoided on accumulators (like those in
.reduce) because it causes a time complexity ofO(n^2).
Consider methods such as .splice or .push instead.(lint/performance/noAccumulatingSpread)
26-30: Enhance parseError to handle all possible error cases.The parseError function doesn't have a fallback if none of the expected properties are present.
export const parseError = (data) => { if (data.message) return data.message; if (data.stderr) return data.stderr; if (data.errors) return Object.entries(data.errors[0])[0][1]; + return JSON.stringify(data) || "Unknown error occurred"; };This ensures that some meaningful message is always returned, even for unexpected error structures.
components/scrapeninja/scrapeninja.app.mjs (1)
7-161: Consider adding input validation for critical parametersWhile the property definitions are comprehensive with good descriptions, consider adding runtime validation for critical parameters like URL to provide better error messages to users.
For example, you could add a method to validate the URL format before sending to the API:
validateUrl(url) { try { new URL(url); return true; } catch (err) { throw new Error(`Invalid URL format: ${url}`); } }And then call this from your scrape methods.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- pnpm-lock.yamlis excluded by- !**/pnpm-lock.yaml
📒 Files selected for processing (7)
- components/scrapeninja/.gitignore(0 hunks)
- components/scrapeninja/actions/non-js-scraping/non-js-scraping.mjs(1 hunks)
- components/scrapeninja/actions/scraping-with-js-rendering/scraping-with-js-rendering.mjs(1 hunks)
- components/scrapeninja/app/scrapeninja.app.ts(0 hunks)
- components/scrapeninja/common/utils.mjs(1 hunks)
- components/scrapeninja/package.json(1 hunks)
- components/scrapeninja/scrapeninja.app.mjs(1 hunks)
💤 Files with no reviewable changes (2)
- components/scrapeninja/.gitignore
- components/scrapeninja/app/scrapeninja.app.ts
🧰 Additional context used
🪛 Biome (1.9.4)
components/scrapeninja/common/utils.mjs
[error] 42-42: Avoid the use of spread (...) syntax on accumulators.
Spread syntax should be avoided on accumulators (like those in .reduce) because it causes a time complexity of O(n^2).
Consider methods such as .splice or .push instead.
(lint/performance/noAccumulatingSpread)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Verify TypeScript components
- GitHub Check: pnpm publish
- GitHub Check: Publish TypeScript components
🔇 Additional comments (7)
components/scrapeninja/actions/non-js-scraping/non-js-scraping.mjs (1)
5-10: LGTM! The naming follows the guidelines.The action name "Scrape without JS" correctly follows the naming convention with a verb as suggested in the prior review comments.
components/scrapeninja/actions/scraping-with-js-rendering/scraping-with-js-rendering.mjs (1)
225-225: Summary message includes useful context.Good job including the URL in the summary message. This provides better context than the generic message in the non-JS action.
components/scrapeninja/package.json (2)
3-6: Version bump and entry point change look good.The version bump from 0.0.2 to 0.1.0 is appropriate for the addition of new features, and the main entry point change reflects the updated file structure.
15-17: Dependency addition is necessary and correct.Adding the @pipedream/platform dependency is necessary as the code imports ConfigurationError from this package.
components/scrapeninja/scrapeninja.app.mjs (3)
32-34: Update proxy property description as previously suggestedThe description is clear, but there was a previous suggestion to update this text.
description: "Premium or your own proxy URL (overrides `Geo` prop). [Read more about ScrapeNinja proxy setup](https://scrapeninja.net/docs/proxy-setup/).",
162-196: LGTM! Well-structured API client methodsThe methods are well-organized with a clean separation of concerns:
- Base URL and headers are properly encapsulated
- Generic request method with appropriate defaults
- Specific methods for different scraping endpoints
This follows good practices for API client implementation.
1-6: LGTM! Clean imports and app definitionThe import statement and app definition are clean and follow Pipedream conventions.
        
          
                components/scrapeninja/actions/scraping-with-js-rendering/scraping-with-js-rendering.mjs
          
            Show resolved
            Hide resolved
        
      | import scrapeninja from "../../scrapeninja.app.mjs"; | ||
|  | ||
| export default { | ||
| key: "scrapeninja-non-js-scraping", | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@luancazarine can you rename the actual action names / slugs? Since they are new components I think it'd be best for them to match the name before being shipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filenames are still the same - maybe the commit wasn't pushed?
/actions/non-js-scraping/non-js-scraping.mjs
should match the action name, as in
/actions/scrape-without-js/scrape-without-js.mjs
The same goes for the component key, for both components
…ping-with-js-rendering.mjs Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
| /approve | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (4)
components/scrapeninja/actions/scrape-without-js/scrape-without-js.mjs (1)
83-99: Validate user-provided URL and propertiesThe
urland other optional props are passed directly to thescrapeNonJsservice. If these props can contain invalid or unexpected values (e.g., malformed URLs), consider adding validations or gracefully handling invalid inputs to prevent unexpected errors in the external call.components/scrapeninja/actions/scrape-with-js-rendering/scrape-with-js-rendering.mjs (1)
186-193: Consider additional logs or metrics for debugging complex rendering flowsWhen scraping pages that require JS rendering, introducing optional debug logs around waitForSelector, postWaitTime, or
viewportcould help in diagnosing slow or failing scraping sessions. This is especially useful when debugging complex page interactions or large timeouts.components/scrapeninja/scrapeninja.app.mjs (2)
163-165: Prefer a configurable base URLCurrently
_baseUrl()returns a hardcoded"https://scrapeninja.p.rapidapi.com". If you ever need to switch to different ScrapeNinja environments (e.g., dev, staging), consider making this URL configurable via environment variables.
166-171: Handle missing Rapid API key gracefully
this.$auth.rapid_api_keyis used directly. If the key is absent, your calls will fail. Consider checking for presence or throwing an informative error if the user fails to provide a required key.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- components/scrapeninja/actions/scrape-with-js-rendering/scrape-with-js-rendering.mjs(1 hunks)
- components/scrapeninja/actions/scrape-without-js/scrape-without-js.mjs(1 hunks)
- components/scrapeninja/scrapeninja.app.mjs(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: pnpm publish
- GitHub Check: Verify TypeScript components
- GitHub Check: Publish TypeScript components
🔇 Additional comments (2)
components/scrapeninja/actions/scrape-without-js/scrape-without-js.mjs (1)
2-2: Confirm the behavior ofparseObjectfor headersThe
headersprop is declared as an array of strings, and here it's parsed withparseObject(this.headers). If each array element is expected to be a standard “Key: Value” string, ensureparseObjectconsistently produces the correct header key-value pairs or returns a meaningful error if the formatting is invalid.Would you like a quick test or verification script to confirm expected parsing behavior?
components/scrapeninja/actions/scrape-with-js-rendering/scrape-with-js-rendering.mjs (1)
1-5: Validate imports and utilities usageImports for
clearObj,parseError, andparseObjectsuggest that objects and errors are filtered or transformed. Confirm the correct usage of these utilities in the rest of the codebase to ensure consistent handling of optional properties (e.g., ensuring we don't parsenullor incorrectly format error messages).
| } catch ({ response: { data } }) { | ||
| throw new ConfigurationError(data.message || data.stderr); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against missing data in error response
Destructuring ({ response: { data } }) throws a runtime error if response or data is undefined. Add a fallback or restructure the catch block to avoid uncaught exceptions in cases where the error format differs from the expected shape.
} catch (err) {
-  throw new ConfigurationError(err.response.data.message || err.response.data.stderr);
+  const msg = err?.response?.data?.message || err?.response?.data?.stderr || "Unknown error";
+  throw new ConfigurationError(msg);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } catch ({ response: { data } }) { | |
| throw new ConfigurationError(data.message || data.stderr); | |
| } | |
| } catch (err) { | |
| const msg = err?.response?.data?.message || err?.response?.data?.stderr || "Unknown error"; | |
| throw new ConfigurationError(msg); | |
| } | 
| } catch ({ response: { data } }) { | ||
| throw new ConfigurationError(parseError(data)); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent crashing on unexpected error shapes
Similar to the other action, the nested destructuring in the catch block can lead to runtime errors if response or data objects are missing. Use safe access or a fallback to avoid an unhandled exception when err has a different structure.
} catch (err) {
-  throw new ConfigurationError(parseError(err.response.data));
+  const safeData = err?.response?.data;
+  throw new ConfigurationError(parseError(safeData));
}Committable suggestion skipped: line range outside the PR's diff.
| _makeRequest({ | ||
| $ = this, path, ...opts | ||
| }) { | ||
| return axios($, { | ||
| url: this._baseUrl() + path, | ||
| headers: this._headers(), | ||
| ...opts, | ||
| }); | ||
| }, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Centralize or standardize request error handling
_makeRequest merges additional options into a single axios request. If your application frequently encounters network or parsing errors, consider building a unified error-handling layer here to avoid repeating try-catch logic in multiple actions.
Resolves #15137.
Summary by CodeRabbit
New Features
Chores