Collaborator

@michelle0927 michelle0927 commented Sep 29, 2025

Resolves #18486
Resolves #18352

  • Bug fix for error “TypeError: job.getQueryResults is not a function”
  • Introduces additional updates to help reduce out-of-memory errors

Summary by CodeRabbit

  • New Features
    • Added option to limit the number of BigQuery rows processed per execution.
  • Refactor
    • Reworked BigQuery processing to page and batch results, reducing memory usage and improving reliability when handling large datasets.
    • Enforced a maximum event size of 1000 for safer processing.
  • Documentation
    • Clarified the description for the deduplication key in BigQuery query results.
  • Chores
    • Bumped component versions (google_cloud package and BigQuery sources) with no functional changes beyond those noted above.


vercel bot commented Sep 29, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
pipedream-docs Ignored Ignored Sep 29, 2025 3:49pm
pipedream-docs-redirect-do-not-edit Ignored Ignored Sep 29, 2025 3:49pm

Contributor

coderabbitai bot commented Sep 29, 2025

Walkthrough

This PR bumps versions across the Google Cloud BigQuery package and sources. The BigQuery Query Results source updates its metadata generation. The BigQuery common module now creates the query job inline and processes results page by page with memory-aware limits; it adds maxRowsPerExecution, caps eventSize, and removes the old helper methods.

Changes

Cohort / File(s) and Summary

  • Version bumps
    • Files: components/google_cloud/package.json, components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs, components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs
    • Summary: Incremented versions: package.json 0.6.1→0.6.2; New Row source 0.1.7→0.1.8; Query Results source 0.1.6→0.1.7.
  • Query Results source metadata tweaks
    • Files: components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs
    • Summary: Flattened props.dedupeKey.description to a single line; reworked generateMetaForCollection(rows, ts) to use a for-of loop updating the hash per row.
  • BigQuery common refactor: pagination + memory guards
    • Files: components/google_cloud/sources/common/bigquery.mjs
    • Summary: Added maxRowsPerExecution; set eventSize max: 1000; replaced helper-based query execution with inline client/job creation and a paginated loop using pageToken, pageSize, maxPages, and totalRowsProcessed; immediate per-chunk processing; added logs and try/catch; removed createQueryJob and getRowsForQuery.
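
The paginated loop summarized above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `processInPages`, `fetchPage`, `handleChunk`, and `fakeFetchPage` are hypothetical names, and `fetchPage` stands in for whatever call returns one page of rows plus a token for the next page. The option names mirror the ones mentioned in the summary (pageSize, maxPages, a row cap), but their defaults here are assumptions.

```javascript
// Hypothetical sketch of a page-by-page loop with row and page caps.
// `fetchPage({ pageToken, maxResults })` is an assumed callback that resolves
// to `{ rows, nextPageToken }`; it is NOT a BigQuery client method.
async function processInPages(fetchPage, handleChunk, opts = {}) {
  const {
    pageSize = 1000, // rows requested per page
    chunkSize = 100, // rows handed to handleChunk at a time
    maxRows = 5000, // cap on rows per execution (cf. maxRowsPerExecution)
    maxPages = 10, // cap on pages per execution
  } = opts;

  let pageToken;
  let pages = 0;
  let totalRowsProcessed = 0;

  do {
    // Never request more rows than the remaining budget allows.
    const remaining = maxRows - totalRowsProcessed;
    const { rows, nextPageToken } = await fetchPage({
      pageToken,
      maxResults: Math.min(pageSize, remaining),
    });

    // Process each page immediately in small chunks so that only one page
    // of rows is held in memory at a time.
    for (let i = 0; i < rows.length; i += chunkSize) {
      await handleChunk(rows.slice(i, i + chunkSize));
    }

    totalRowsProcessed += rows.length;
    pages += 1;
    pageToken = nextPageToken;
  } while (pageToken && pages < maxPages && totalRowsProcessed < maxRows);

  return { pages, totalRowsProcessed };
}

// In-memory stand-in for a paged data source, used only for illustration.
const data = Array.from({ length: 25 }, (_, i) => ({ id: i }));
async function fakeFetchPage({ pageToken = 0, maxResults }) {
  const rows = data.slice(pageToken, pageToken + maxResults);
  const next = pageToken + rows.length;
  return { rows, nextPageToken: next < data.length ? next : undefined };
}
```

Checking the caps in the loop condition, rather than after collecting all rows, is what keeps memory bounded: the loop stops as soon as the page token runs out or either limit is reached.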

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Source as Source Runner
  participant BQ as BigQuery Client
  participant Job as Query Job
  participant Pager as Page Iterator
  participant Emitter as Event Emitter

  Note over Source: Start execution
  Source->>BQ: instantiate client
  Source->>BQ: createQueryJob(queryOpts)
  BQ-->>Source: job handle
  Source->>Job: request first page (pageSize)
  loop until no page or limits hit
    Job-->>Source: rows, pageToken
    Note right of Source: process rows in small chunks<br/>update cursor / last result ID
    Source->>Emitter: emit events per chunk/row
    alt pageToken present and under limits
      Source->>Job: get next page (pageToken)
    else limits hit or no page
      Note over Source: stop (memory/pages cap)
    end
  end
  Note over Source: Finish execution (log summary)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

A rabbit hopped through clouds of query light,
Nibbled pages, tiny bites—just right.
No more memory fright at night,
Row by row, it keeps things tight.
Version carrots stacked in sight—
Hop! The job completes in flight. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Out of Scope Changes Check (⚠️ Warning)
    • Explanation: This PR includes version bumps in components/google_cloud/package.json and the BigQuery New Row source, as well as an unrelated update to the dedupeKey description in the Query Results source, which were not specified in the linked issues focused on bug fixing and memory-aware pagination.
    • Resolution: Remove or justify the version bumps and documentation tweak in this PR, or move them to a separate release-preparation change set to keep the scope aligned with the linked issues.
  • Description Check (⚠️ Warning)
    • Explanation: The pull request description does not follow the repository’s required template because it omits the “## WHY” section and does not explain the rationale behind the BigQuery changes, instead only listing issue resolutions and a brief summary of fixes.
    • Resolution: Please update the description to include a “## WHY” section that clearly outlines the motivation for these BigQuery source changes, in accordance with the template.
✅ Passed checks (3 passed)
  • Title Check (✅ Passed)
    • Explanation: The title “Google Cloud Big Query improvements” correctly references the BigQuery components that are being enhanced, covering both the bug fix and memory-handling updates, and thus aligns with the main focus of the changeset.
  • Linked Issues Check (✅ Passed)
    • Explanation: The changes inline the BigQuery job execution and remove the invalid getQueryResults call to resolve the TypeError (#18486), and they implement pagination with configurable maxRowsPerExecution, memory checks, and enhanced error logging to address out-of-memory failures in the Query Results trigger (#18352).
  • Docstring Coverage (✅ Passed)
    • Explanation: No functions found in the changes. Docstring coverage check skipped.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1)

116-126: Restore a valid query helper for _getIdOfLastRow.

this.getRowsForQuery was removed from common/bigquery.mjs, so this call now throws TypeError: this.getRowsForQuery is not a function, breaking deployments. Please update this code to use the new inline job execution pattern (e.g. create the query job via the BigQuery client and call job.getQueryResults) or reinstate a shared helper before shipping.

Consider applying:

-      const rows = await this.getRowsForQuery(queryOpts, this.datasetId);
+      const client = this.googleCloud
+        .getBigQueryClient()
+        .dataset(this.datasetId);
+      const [job] = await client.createQueryJob(queryOpts);
+      const [rows] = await job.getQueryResults({
+        maxResults: limit,
+      });
🧹 Nitpick comments (1)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)

24-24: Clarify dedupeKey wording and behavior.

Suggest: “The name of a column in the query results to use for deduplication…” Also note that if omitted, the source will dedupe the batch by a deterministic hash of each row (see code), not by a random UUID.

If you adopt the deterministic fallback above, align this description and the linked docs accordingly.
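
A deterministic fallback of the kind described here could look roughly like the sketch below. It is an assumption-laden illustration, not the source's actual implementation: the function name and the per-row for-of hashing come from the change summary, but the hash algorithm (SHA-1), the use of JSON.stringify for serialization, and the shape of the returned object are all assumptions.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical sketch: derive one stable ID for a batch of rows by feeding
// each row into a single hash, so identical batches yield identical IDs.
function generateMetaForCollection(rows, ts) {
  const hash = createHash("sha1"); // algorithm choice is an assumption
  for (const row of rows) {
    // Updating per row avoids serializing the whole batch into one large string.
    // Note: JSON.stringify is sensitive to key order, so rows must be built
    // consistently for the ID to be stable.
    hash.update(JSON.stringify(row));
  }
  const id = hash.digest("hex");
  return {
    id,
    summary: `New event (ID: ${id})`, // summary format is an assumption
    ts,
  };
}
```

Unlike a random UUID, an ID computed this way lets a repeated, unchanged result set be recognized as a duplicate on the next run.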

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b8ba5c and 6375748.

📒 Files selected for processing (4)
  • components/google_cloud/package.json (1 hunks)
  • components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1 hunks)
  • components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (3 hunks)
  • components/google_cloud/sources/common/bigquery.mjs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (2)
components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)
  • rows (116-116)
  • hash (153-153)
components/google_cloud/sources/common/bigquery.mjs (1)
  • rows (81-84)
components/google_cloud/sources/common/bigquery.mjs (1)
components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)
  • queryOpts (110-115)
  • rows (116-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Lint Code Base
  • GitHub Check: Publish TypeScript components
  • GitHub Check: Verify TypeScript components
  • GitHub Check: pnpm publish
🔇 Additional comments (1)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)

11-11: Version bump looks good; ensure cross-file consistency.

Confirm related sources and the package version in components/google_cloud/package.json reflect the same release and include a changelog entry summarizing the BigQuery streaming/memory improvements.

Collaborator

@luancazarine luancazarine left a comment

Hi @michelle0927, LGTM! Ready for QA!

@michelle0927 michelle0927 merged commit cd835d7 into master Sep 30, 2025
10 checks passed
@michelle0927 michelle0927 deleted the issue-18352-2 branch September 30, 2025 14:52

Development

Successfully merging this pull request may close these issues.

  • [BUG] Google Cloud - Bigquery source
  • [BUG] BigQuery - New Query Result throws out of memory error

3 participants