
Conversation

michelle0927 (Collaborator) commented Oct 1, 2025

Resolves #18503

Summary by CodeRabbit

  • New Features

    • Added action to create OnPage crawl tasks with configurable options and success feedback.
    • Added action to retrieve crawled pages by task ID with pagination support.
  • Style

    • Renamed “Parse Page Content” to “Parse Page Content with OnPage”; minor version update.
  • Chores

    • Bumped DataForSEO package to 0.4.0.
    • Updated platform dependency to ^3.1.0.


vercel bot commented Oct 1, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| pipedream-docs | Ignored | Ignored | | Oct 1, 2025 6:33pm |
| pipedream-docs-redirect-do-not-edit | Ignored | Ignored | | Oct 1, 2025 6:33pm |


coderabbitai bot commented Oct 1, 2025

Walkthrough

Adds two new DataForSEO OnPage actions for creating crawl tasks and retrieving crawled pages, updates an existing OnPage parse action’s metadata, and bumps the package version and a dependency.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| OnPage API actions (new): components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs, components/dataforseo/actions/get-crawled-pages/get-crawled-pages.mjs | Introduces actions to create OnPage tasks and fetch crawled pages via DataForSEO OnPage endpoints, with request building, response validation, and summary export. |
| OnPage parse action metadata: components/dataforseo/actions/parse-page-content/parse-page-content.mjs | Renames the action to “Parse Page Content with OnPage” and bumps its version from 0.0.3 to 0.0.4. No logic changes. |
| Package metadata: components/dataforseo/package.json | Bumps the package version 0.3.0 → 0.4.0 and updates the @pipedream/platform dependency ^3.0.3 → ^3.1.0. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User/Workflow
  participant A as Action: Create OnPage Task
  participant DFS as DataForSEO OnPage API

  U->>A: Trigger run()
  A->>DFS: POST /on_page/task_post {target, max_crawl_pages, ...}
  DFS-->>A: Response {status_code, tasks[0].status_code, ...}
  alt Valid status codes (20000/20100)
    A-->>U: Summary: Successfully created onpage task.
  else Error status
    A-->>U: Throw ConfigurationError with status_message
  end
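The error branch in the diagram above can be pictured with a small sketch. This is illustrative only, not the merged code: the assertSuccess helper name is made up, and the exact checks may differ, but the field names (status_code, tasks[0], status_message) follow the response shape shown in the diagram.

```js
import { ConfigurationError } from "@pipedream/platform";

// Success codes shown in the diagram: 20000 (ok) and 20100 (task created)
const VALID_STATUS_CODES = [
  20000,
  20100,
];

// Throws a ConfigurationError carrying DataForSEO's own status_message when
// either the top-level response or the first task reports a failure code.
function assertSuccess(response) {
  const task = response?.tasks?.[0];
  if (!VALID_STATUS_CODES.includes(response?.status_code)
    || !VALID_STATUS_CODES.includes(task?.status_code)) {
    throw new ConfigurationError(task?.status_message ?? response?.status_message);
  }
  return response;
}
```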
sequenceDiagram
  autonumber
  actor U as User/Workflow
  participant A as Action: Get Crawled Pages
  participant DFS as DataForSEO OnPage API

  U->>A: Trigger run()
  A->>DFS: POST /on_page/pages {id, limit, search_after_token, tag}
  DFS-->>A: Response {status_code, tasks[0].status_code, ...}
  alt Valid status codes (20000)
    A-->>U: Summary: Successfully retrieved crawled pages.
  else Error status
    A-->>U: Throw ConfigurationError with status_message
  end
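At the raw API level, the request in the second diagram corresponds to a POST roughly like the one below. This is a hedged sketch: the base URL, Basic-auth scheme, and the getCrawledPagesRaw name are assumptions based on DataForSEO's public conventions, and the shipped action routes the call through dataforseo._makeRequest rather than calling axios directly.

```js
import { axios } from "@pipedream/platform";

// $ is the Pipedream step context; login/password are DataForSEO API credentials.
async function getCrawledPagesRaw($, {
  login, password, id, limit, searchAfterToken, tag,
}) {
  return axios($, {
    method: "POST",
    url: "https://api.dataforseo.com/v3/on_page/pages",
    auth: {
      username: login,
      password,
    },
    // DataForSEO expects an array of task objects as the request body
    data: [
      {
        id,                                    // OnPage task identifier
        limit,                                 // maximum pages per response
        search_after_token: searchAfterToken,  // pagination cursor from a previous page
        tag,                                   // optional user-defined tag
      },
    ],
  });
}
```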

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I thump my paws on fertile logs,
New OnPage trails through data fogs—
I queue the crawl, then hop for leaves,
Fetch pages where the spider weaves.
Version bumps? A gentle breeze.
Carrots cached, I parse with ease.
Hoppy runs and 20000s, please! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Description Check (⚠️ Warning): The pull request description contains only “Resolves #18503” and does not follow the repository’s description template, which requires a “## WHY” section explaining the motivation and context of the changes. Without this section, readers lack insight into why the OnPage API endpoints are being added. Resolution: update the description to include the “## WHY” section from the template, summarizing the rationale for adding the OnPage API endpoints and how they enhance the DataForSEO integration.
  • Title Check (❓ Inconclusive): The title “DataForSEO - new components” is related to the repository but too generic to convey the primary change, namely the addition of OnPage API actions (create-onpage-task and get-crawled-pages) to the DataForSEO component, so it may not be informative enough for a teammate scanning pull request history. Resolution: rename the title to mention the OnPage API additions specifically, for example “Add DataForSEO OnPage Task and Crawled Pages Actions.”
✅ Passed checks (3 passed)
  • Linked Issues Check (✅ Passed): The new createOnpageTask and getCrawledPages actions map to the OnPage API endpoints with appropriate metadata, request payload mapping, and error handling, fulfilling the objective of issue #18503 to update DataForSEO with OnPage API functionality. The version bumps and naming updates also reflect the integration of these new endpoints.
  • Out of Scope Changes Check (✅ Passed): All modifications relate directly to the OnPage API integration or the necessary version and naming updates for the DataForSEO component; there are no unrelated or extraneous changes.
  • Docstring Coverage (✅ Passed): No functions found in the changes; docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch issue-18503

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs (1)

23-27: Consider adding validation for maxCrawlPages.

Adding a min constraint would prevent users from submitting invalid values (e.g., 0 or negative numbers).

 maxCrawlPages: {
   type: "integer",
   label: "Max Crawl Pages",
   description: "The number of pages to crawl on the specified domain",
+  min: 1,
 },
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5128d19 and 83064dc.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (4)
  • components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs (1 hunks)
  • components/dataforseo/actions/get-crawled-pages/get-crawled-pages.mjs (1 hunks)
  • components/dataforseo/actions/parse-page-content/parse-page-content.mjs (1 hunks)
  • components/dataforseo/package.json (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/dataforseo/actions/get-crawled-pages/get-crawled-pages.mjs (2)
components/dataforseo/actions/parse-page-content/parse-page-content.mjs (1)
  • response (61-72)
components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs (1)
  • response (69-82)
components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs (2)
components/dataforseo/actions/parse-page-content/parse-page-content.mjs (1)
  • response (61-72)
components/dataforseo/actions/get-crawled-pages/get-crawled-pages.mjs (1)
  • response (51-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Verify TypeScript components
  • GitHub Check: Publish TypeScript components
  • GitHub Check: pnpm publish
  • GitHub Check: Lint Code Base
🔇 Additional comments (9)
components/dataforseo/package.json (1)

3-3: LGTM! Version and dependency updates are appropriate.

The version bump to 0.4.0 correctly reflects the addition of new OnPage actions, and the platform dependency upgrade to ^3.1.0 aligns with the latest stable release.

Also applies to: 16-16

components/dataforseo/actions/parse-page-content/parse-page-content.mjs (1)

6-6: LGTM! Metadata updates align with OnPage API focus.

The name clarification and version bump appropriately reflect the OnPage API integration theme of this PR.

Also applies to: 9-9

components/dataforseo/actions/get-crawled-pages/get-crawled-pages.mjs (3)

1-14: LGTM! Imports and metadata are well-configured.

The ConfigurationError import and action metadata follow Pipedream conventions. The readOnlyHint: true annotation correctly reflects that this action retrieves data without modifying resources.


41-49: LGTM! Method implementation follows established patterns.

The getCrawledPage helper correctly delegates to dataforseo._makeRequest and matches the pattern used in other DataForSEO actions.


22-27: Confirmed limit and tag propDefinitions exist in dataforseo.app.mjs; no further action needed.
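Putting those three comments together, the file’s overall shape is roughly the following. This is a sketch under stated assumptions, not the merged source: the key, version, taskId prop, and the _makeRequest option names are illustrative; only the limit/tag propDefinitions, the getCrawledPage delegation, and the 20000 success code are taken from the review above.

```js
import dataforseo from "../../dataforseo.app.mjs";
import { ConfigurationError } from "@pipedream/platform";

export default {
  key: "dataforseo-get-crawled-pages", // illustrative key
  name: "Get Crawled Pages",
  description: "Retrieve crawled pages for a DataForSEO OnPage task",
  version: "0.0.1",
  type: "action",
  props: {
    dataforseo,
    taskId: {
      type: "string",
      label: "Task ID",
      description: "ID of the OnPage task to fetch pages for",
    },
    // limit and tag reuse propDefinitions confirmed to exist in dataforseo.app.mjs
    limit: {
      propDefinition: [
        dataforseo,
        "limit",
      ],
    },
    tag: {
      propDefinition: [
        dataforseo,
        "tag",
      ],
    },
    // the merged action also supports pagination (see search_after_token in the diagram above)
  },
  methods: {
    getCrawledPage(opts = {}) {
      // Delegates to the shared request helper on the app file
      return this.dataforseo._makeRequest({
        method: "POST",
        path: "/on_page/pages",
        ...opts,
      });
    },
  },
  async run({ $ }) {
    const response = await this.getCrawledPage({
      $,
      data: [
        {
          id: this.taskId,
          limit: this.limit,
          tag: this.tag,
        },
      ],
    });
    if (response?.tasks?.[0]?.status_code !== 20000) {
      throw new ConfigurationError(response?.tasks?.[0]?.status_message);
    }
    $.export("$summary", "Successfully retrieved crawled pages.");
    return response;
  },
};
```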

components/dataforseo/actions/create-onpage-task/create-onpage-task.mjs (4)

1-14: LGTM! Imports and metadata are correctly configured.

The readOnlyHint: false annotation appropriately reflects that this action creates a new task.


59-67: LGTM! Method implementation is consistent.

The helper method follows the established pattern used across DataForSEO actions.


68-82: LGTM! API request payload is correctly structured.

The property mapping from camelCase props to snake_case API fields follows DataForSEO API conventions.
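For reference, the camelCase-to-snake_case mapping being praised looks roughly like this. It is a sketch: only target and maxCrawlPages are confirmed props, and the merged run() (lines 68-82 of the action) maps the full set of options.

```js
// Maps the action's camelCase props onto the snake_case fields the DataForSEO
// task_post endpoint expects; the request body is an array of task objects.
function buildTaskPostPayload({ target, maxCrawlPages }) {
  return [
    {
      target,                          // domain or page to crawl
      max_crawl_pages: maxCrawlPages,  // crawl budget for the task
    },
  ];
}
```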


17-22: target propDefinition verified

The target propDefinition exists in the DataForSeo app’s definitions, so no further action is required.
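With the earlier nitpick applied, the props block would read roughly as follows. This is a sketch: the action metadata above the props is omitted, and the description string is copied from the review diff.

```js
import dataforseo from "../../dataforseo.app.mjs";

export default {
  // ...key, name, version, and other action metadata omitted...
  props: {
    dataforseo,
    // target reuses the propDefinition confirmed to exist on the app
    target: {
      propDefinition: [
        dataforseo,
        "target",
      ],
    },
    maxCrawlPages: {
      type: "integer",
      label: "Max Crawl Pages",
      description: "The number of pages to crawl on the specified domain",
      min: 1, // reviewer-suggested lower bound to reject 0 or negative values
    },
  },
};
```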

vunguyenhung merged commit 0dfb169 into master Oct 2, 2025
10 checks passed
vunguyenhung deleted the issue-18503 branch October 2, 2025 04:17

Development

Successfully merging this pull request may close these issues.

[ACTION] Update DataForSEO to include OnPage API Endpoints
