-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New Components - docparser #16191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Components - docparser #16191
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 3 Skipped Deployments
|
WalkthroughThis pull request introduces multiple new modules and updates within the Docparser component. Two new actions are added for fetching documents from a URL and uploading documents. A new utility function ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/Trigger
participant Action as "fetch-document-url Action"
participant DocparserApp as "Docparser App"
Client->>Action: Trigger fetch with URL
Action->>DocparserApp: Call fetchDocumentFromURL(parserId, formData)
DocparserApp-->>Action: Return document ID
Action-->>Client: Return summary message
sequenceDiagram
participant Client as Client/Trigger
participant Action as "upload-document Action"
participant Utils as "checkTmp Utility"
participant DocparserApp as "Docparser App"
Client->>Action: Trigger upload with file path
Action->>Utils: Validate file path
Utils-->>Action: Return validated file path
Action->>DocparserApp: Call uploadDocument(parserId, formData)
DocparserApp-->>Action: Return document ID
Action-->>Client: Return summary message
sequenceDiagram
participant DeployHook as "Deploy Hook"
participant CommonBase as "Common Base Module"
participant Database as "Database Service"
DeployHook->>CommonBase: Call emitEvent(limit:25)
CommonBase->>Database: Retrieve last processed date
Database-->>CommonBase: Return last processed date
CommonBase->>CommonBase: Process and update events
CommonBase-->>DeployHook: Emit processed events
Assessment against linked issues
Suggested reviewers
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
components/docparser/actions/upload-document/upload-document.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs components/docparser/common/utils.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs components/docparser/docparser.app.mjsOops! Something went wrong! :( ESLint: 8.57.1 Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'jsonc-eslint-parser' imported from /eslint.config.mjs
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Sources - New Document Data Available Actions - Fetch Document URL - Upload Document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (8)
components/docparser/common/utils.mjs (1)
1-6: Add input validation for more robust error handlingThe
checkTmpfunction ensures paths start with "/tmp", which is good for security, but it lacks validation for edge cases. Consider adding checks for null/undefined values and empty strings.export const checkTmp = (filename) => { + if (!filename) { + return "/tmp/undefined"; + } if (!filename.startsWith("/tmp")) { return `/tmp/${filename}`; } return filename; };components/docparser/sources/new-document-data-available/new-document-data-available.mjs (1)
12-20: Consider specifying the deduplication strategy more explicitlyThe
dedupe: "unique"setting is good, but it's not clear which field is used for deduplication. Consider specifying this or adding a comment to clarify how unique events are identified.You might want to add a comment or specify a
uniqueKeyproperty if needed:dedupe: "unique", + // Using document_id as the unique identifier for deduplicationcomponents/docparser/sources/new-document-data-available/test-event.mjs (1)
1-16: Update sample dates to realistic timestampsThe sample event contains dates set to April 2025, which is in the future. While this works for testing purposes, consider using more realistic dates (past or present) to avoid confusion.
"page_count": 5, - "uploaded_at": "2025-04-08T13:32:02+00:00", - "processed_at": "2025-04-08T13:32:02+00:00", - "uploaded_at_utc": "2025-04-08T13:32:02+00:00", - "uploaded_at_user": "2025-04-08T06:32:02+00:00", - "processed_at_utc": "2025-04-08T13:32:02+00:00", - "processed_at_user": "2025-04-08T06:32:02+00:00" + "uploaded_at": "2023-04-08T13:32:02+00:00", + "processed_at": "2023-04-08T13:32:02+00:00", + "uploaded_at_utc": "2023-04-08T13:32:02+00:00", + "uploaded_at_user": "2023-04-08T06:32:02+00:00", + "processed_at_utc": "2023-04-08T13:32:02+00:00", + "processed_at_user": "2023-04-08T06:32:02+00:00"components/docparser/sources/common/base.mjs (2)
21-27: Implement safeguard for database operations.The methods for getting and setting the last date should include error handling for robustness.
_getLastDate() { + try { return this.db.get("lastDate") || "1970-01-01T00:00:00"; + } catch (error) { + console.error("Error retrieving last date:", error); + return "1970-01-01T00:00:00"; + } }, _setLastDate(lastDate) { + try { this.db.set("lastDate", lastDate); + } catch (error) { + console.error("Error setting last date:", error); + } },
66-68: Add error handling to the run method.The run method should have proper error handling to ensure failures are logged and don't crash the component.
async run() { + try { await this.emitEvent(); + } catch (error) { + console.error("Error in run method:", error); + } },components/docparser/docparser.app.mjs (3)
7-20: Validate parser IDs in the options method.The parserId options method should include error handling and validation of the returned data.
parserId: { type: "string", label: "Parser ID", description: "The ID of the parser to be used.", async options() { + try { const parsers = await this.listParsers(); + + if (!Array.isArray(parsers)) { + throw new Error("Invalid response format from listParsers"); + } + return parsers.map(({ id: value, label, }) => ({ label, value, })); + } catch (error) { + console.error("Error fetching parser options:", error); + return []; + } }, },
54-62: Enhance the fetchDocumentFromURL method with input validation.The method should validate that the parserId is provided before making the API call.
fetchDocumentFromURL({ parserId, ...opts }) { + if (!parserId) { + throw new Error("Parser ID is required"); + } + return this._makeRequest({ method: "POST", path: `/v2/document/fetch/${parserId}`, ...opts, }); },
63-71: Enhance the uploadDocument method with input validation.Similar to the fetchDocumentFromURL method, this should validate the parserId before proceeding.
uploadDocument({ parserId, ...opts }) { + if (!parserId) { + throw new Error("Parser ID is required"); + } + return this._makeRequest({ method: "POST", path: `/v1/document/upload/${parserId}`, ...opts, }); },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (8)
components/docparser/actions/fetch-document-url/fetch-document-url.mjs(1 hunks)components/docparser/actions/upload-document/upload-document.mjs(1 hunks)components/docparser/common/utils.mjs(1 hunks)components/docparser/docparser.app.mjs(1 hunks)components/docparser/package.json(2 hunks)components/docparser/sources/common/base.mjs(1 hunks)components/docparser/sources/new-document-data-available/new-document-data-available.mjs(1 hunks)components/docparser/sources/new-document-data-available/test-event.mjs(1 hunks)
🔇 Additional comments (3)
components/docparser/sources/new-document-data-available/new-document-data-available.mjs (1)
6-11: LGTM! Component metadata looks goodThe component is well-defined with appropriate metadata for a Pipedream source.
components/docparser/package.json (2)
16-16: LGTM! Dependency update is appropriateThe update to @pipedream/platform version is a minor patch update which is appropriate for maintaining compatibility.
3-3:❓ Verification inconclusive
Verify intentional version downgrade
The version has been changed from "0.6.0" to "0.1.0", which is unusual as version numbers typically increase. Please verify this downgrade is intentional and not a mistake.
🌐 Web query:
What are best practices for semantic versioning when releasing new features?💡 Result:
Semantic versioning (SemVer) is a standardized approach to versioning software releases, using a
MAJOR.MINOR.PATCHformat (e.g.,2.4.1). When releasing new features, follow these best practices to ensure clarity and compatibility:
Versioning Rules for New Features
Increment the
MINORversion when adding backward-compatible functionality[1][3][9].
Example:1.3.0→1.4.0after introducing a new API endpoint.
- Avoid breaking changes (reserve those for
MAJORupdates)[7][9].- Reset
PATCHto0when updatingMINOR[9][10].Use
MAJORversion0during initial development (e.g.,0.1.0) to signal instability[4][8][9]. Transition to1.0.0once the API is stable[4][9].Maintain backward compatibility for
MINORreleases to avoid forcing users to rewrite code[3][7].
Automation and Tooling
- Automate versioning using tools like
semantic-release, which analyzes commit messages (e.g.,feat:,fix:) to determine version increments[1][5].
Example GitHub Actions workflow:- name: Release run: npx semantic-release- Generate changelogs automatically with plugins like
semantic-release/changelogto document new features and fixes[1][6].
Git Tagging and Release Management
- Tag releases with semantic versions (e.g.,
v1.4.0) and avoid modifying tags post-release[2][6].- Pin dependencies to
MINORorPATCHranges (e.g.,^1.4.0) to allow safe automatic updates[10].
Communication and Transparency
- Publish release notes explaining new features, deprecations, and bug fixes[1][3][7].
- Document your versioning policy to align teams and users[3][7].
- Adopt a release schedule (e.g., quarterly minor updates) to set user expectations[4][7].
Pre-release Handling
- Use prerelease labels (e.g.,
1.4.0-beta.1) for testing unfinished features[9][10].- Leverage feature flags to decouple feature releases from version updates, enabling gradual rollouts[3][7].
Team Coordination
- Align QA, DevOps, and product teams on versioning rules to streamline testing and deployment[6][7].
- Audit dependencies to ensure third-party
MINOR/PATCHupdates don’t introduce breaking changes[10].
By adhering to these practices, teams can deliver features predictably, minimize compatibility issues, and foster trust with users. Tools like
semantic-releaseand Git tags reduce manual effort, while clear communication ensures stakeholders understand the impact of each update[1][3][6].Citations:
- 1: https://aws.amazon.com/blogs/devops/using-semantic-versioning-to-simplify-release-management/
- 2: https://www.gitkraken.com/gitkon/semantic-versioning-git-tags
- 3: https://launchdarkly.com/blog/software-release-versioning/
- 4: https://www.cloudbees.com/blog/best-practices-when-versioning-a-release
- 5: https://github.com/semantic-release/semantic-release
- 6: https://www.tiny.cloud/blog/improving-our-engineering-best-practices-with-semantic-versioning/
- 7: https://cpl.thalesgroup.com/software-monetization/software-versioning-basics
- 8: https://arjancodes.com/blog/how-to-implement-semantic-versioning-in-software-projects/
- 9: https://semver.org
- 10: https://www.baeldung.com/cs/semantic-versioning
Confirm the Version Reset in package.json
The version in
components/docparser/package.jsonhas been changed from"0.6.0"to"0.1.0". As per semantic versioning best practices, when introducing new features it’s standard to increment the MINOR (or PATCH) version rather than regress to a lower version. If this reset to"0.1.0"reflects an intentional reinitialization due to early development states (where major version0signals instability), please clearly document this decision (for example, in your changelog). Otherwise, consider aligning the version update with established semantic versioning rules by incrementing rather than downgrading.
jcortes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @luancazarine I've just added an observation other than that lgtm! Ready for QA!
|
/approve |
Resolves #13255.
Summary by CodeRabbit
New Features
Chores