Google Cloud Big Query improvements #18490

michelle0927 · 2025-09-29T15:48:58Z

Resolves #18486
Resolves #18352

Bug fix for error “TypeError: job.getQueryResults is not a function”
Introduces additional updates to help reduce out-of-memory errors

Summary by CodeRabbit

New Features
- Added option to limit the number of BigQuery rows processed per execution.
Refactor
- Reworked BigQuery processing to page and batch results, reducing memory usage and improving reliability when handling large datasets.
- Enforced a maximum event size of 1000 for safer processing.
Documentation
- Clarified the description for the deduplication key in BigQuery query results.
Chores
- Bumped component versions (google_cloud package and BigQuery sources) with no functional changes beyond those noted above.
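As a rough illustration of the two limits mentioned above, the sketch below shows how a row cap and an event-size cap might be declared and applied. The prop labels, descriptions, and default values are assumptions made for this example, as is the `chunkRows` helper; only the names `maxRowsPerExecution` and `eventSize` and the maximum of 1000 come from this PR.

```javascript
// Hypothetical prop definitions. Labels, descriptions, and defaults are
// illustrative assumptions; only the prop names and `max: 1000` come from the PR.
const props = {
  eventSize: {
    type: "integer",
    label: "Event Size",
    description: "Number of rows to include in each emitted event",
    default: 10, // assumed default
    min: 1,
    max: 1000, // cap described in the summary above
  },
  maxRowsPerExecution: {
    type: "integer",
    label: "Max Rows Per Execution",
    description: "Upper bound on the rows processed in a single run",
    default: 5000, // assumed default
    min: 1,
    optional: true,
  },
};

// Hypothetical helper: split rows into chunks of at most `eventSize` rows,
// clamping the requested size to the range allowed by the prop definition.
function chunkRows(rows, eventSize) {
  const size = Math.max(
    props.eventSize.min,
    Math.min(eventSize, props.eventSize.max),
  );
  const chunks = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}
```

Clamping inside the helper means an oversized value can never produce an event larger than the declared maximum, even if validation at the prop level is bypassed.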

vercel · 2025-09-29T15:49:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| pipedream-docs | Ignored | | | Sep 29, 2025 3:49pm |
| pipedream-docs-redirect-do-not-edit | Ignored | | | Sep 29, 2025 3:49pm |

coderabbitai · 2025-09-29T15:49:07Z

Walkthrough

Version bumps across Google Cloud BigQuery packages and sources. BigQuery Query Results source updates metadata generation. BigQuery common module refactors query execution to inline job creation with paginated, memory-aware processing, adds maxRowsPerExecution, caps eventSize, and removes helper methods.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Version bumps: `components/google_cloud/package.json`, `components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs`, `components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs` | Incremented versions: package.json 0.6.1→0.6.2; New Row source 0.1.7→0.1.8; Query Results source 0.1.6→0.1.7. |
| Query Results source metadata tweaks: `components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs` | Flattened `props.dedupeKey.description` to a single line; reworked `generateMetaForCollection(rows, ts)` to use a for-of loop that updates the hash per row. |
| BigQuery common refactor (pagination + memory guards): `components/google_cloud/sources/common/bigquery.mjs` | Added `maxRowsPerExecution`; set `eventSize` `max: 1000`; replaced helper-based query execution with inline client/job creation and a paginated loop using `pageToken`, `pageSize`, `maxPages`, and `totalRowsProcessed`; immediate per-chunk processing; added logs and try/catch; removed `createQueryJob` and `getRowsForQuery`. |
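The per-row hashing described for `generateMetaForCollection(rows, ts)` could look roughly like the following. This is a sketch only: the hash algorithm, the JSON serialization of each row, the digest encoding, and the summary text are all assumptions, since the table does not show the actual implementation.

```javascript
const crypto = require("node:crypto");

// Sketch of a collection-level meta generator that feeds one row at a time
// into a single hash. Algorithm (sha1), serialization, and summary wording
// are assumptions, not the source's confirmed behavior.
function generateMetaForCollection(rows, ts) {
  const hash = crypto.createHash("sha1");
  for (const row of rows) {
    // Updating per row avoids building one large string for the whole batch.
    hash.update(JSON.stringify(row));
  }
  return {
    id: hash.digest("base64"),
    summary: `${rows.length} new rows`,
    ts,
  };
}
```

Updating the hash incrementally keeps peak memory proportional to a single serialized row rather than to the whole batch, which fits the memory-reduction goal of the PR.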

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Source as Source Runner
  participant BQ as BigQuery Client
  participant Job as Query Job
  participant Pager as Page Iterator
  participant Emitter as Event Emitter

  Note over Source: Start execution
  Source->>BQ: instantiate client
  Source->>BQ: createQueryJob(queryOpts)
  BQ-->>Source: job handle
  Source->>Job: request first page (pageSize)
  loop until no page or limits hit
    Job-->>Source: rows, pageToken
    Note right of Source: process rows in small chunks<br/>update cursor / last result ID
    Source->>Emitter: emit events per chunk/row
    alt pageToken present and under limits
      Source->>Job: get next page (pageToken)
    else limits hit or no page
      Note over Source: stop (memory/pages cap)
    end
  end
  Note over Source: Finish execution (log summary)
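The loop in the diagram can be sketched as a small driver function. Here `fetchPage` is a hypothetical stand-in for whatever call returns one page of results (for example, a wrapper around the job's paged results API), and the default values for `pageSize`, `maxPages`, and `maxRowsPerExecution` are assumptions; the variable names follow those listed in the walkthrough.

```javascript
// Sketch of a paginated, row-capped processing loop.
// `fetchPage({ maxResults, pageToken })` is a hypothetical function that must
// resolve to { rows, pageToken }, with pageToken undefined on the last page.
// `handleRows(rows)` is a hypothetical callback that emits or processes a page.
async function processQueryPages(fetchPage, handleRows, opts = {}) {
  const {
    pageSize = 1000, // assumed default
    maxPages = 10, // assumed default
    maxRowsPerExecution = 5000, // assumed default
  } = opts;

  let pageToken;
  let pageCount = 0;
  let totalRowsProcessed = 0;

  do {
    const remaining = maxRowsPerExecution - totalRowsProcessed;
    const page = await fetchPage({
      maxResults: Math.min(pageSize, remaining),
      pageToken,
    });

    // Guard against a page that returns more rows than requested.
    const rows = page.rows.slice(0, remaining);
    if (rows.length > 0) {
      // Process each page immediately instead of accumulating all rows.
      await handleRows(rows);
    }

    totalRowsProcessed += rows.length;
    pageCount += 1;
    pageToken = page.pageToken;
  } while (
    pageToken
    && pageCount < maxPages
    && totalRowsProcessed < maxRowsPerExecution
  );

  return {
    pageCount,
    totalRowsProcessed,
  };
}
```

Because each page is handed to `handleRows` before the next one is requested, at most one page of rows is held in memory at a time, and the two caps bound how much work a single execution can do.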

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Google Cloud BigQuery improvements #18467 — Also modifies components/google_cloud/sources/common/bigquery.mjs to change query execution and pagination, overlapping with this refactor.

Poem

A rabbit hopped through clouds of query light,
Nibbled pages, tiny bites—just right.
No more memory fright at night,
Row by row, it keeps things tight.
Version carrots stacked in sight—
Hop! The job completes in flight. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes Check	⚠️ Warning	This PR includes version bumps in components/google_cloud/package.json and the BigQuery New Row source, as well as an unrelated update to the dedupeKey description in the Query Results source, which were not specified in the linked issues focused on bug fixing and memory‐aware pagination.	Remove or justify the version bumps and documentation tweak in this PR, or move them to a separate release‐preparation change set to keep the scope aligned with the linked issues.
Description Check	⚠️ Warning	The pull request description does not follow the repository’s required template because it omits the “## WHY” section and does not explain the rationale behind the BigQuery changes, instead only listing issue resolutions and a brief summary of fixes.	Please update the description to include a “## WHY” section that clearly outlines the motivation for these BigQuery source changes, in accordance with the template.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title “Google Cloud Big Query improvements” correctly references the BigQuery components that are being enhanced, covering both the bug fix and memory‐handling updates, and thus aligns with the main focus of the changeset.
Linked Issues Check	✅ Passed	The changes inline the BigQuery job execution and remove the invalid getQueryResults call to resolve the TypeError (#18486), and they implement pagination with configurable maxRowsPerExecution, memory checks, and enhanced error logging to address out‐of‐memory failures in the Query Results trigger (#18352).
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1)
116-126: Restore a valid query helper for _getIdOfLastRow.

this.getRowsForQuery was removed from common/bigquery.mjs, so this call now throws TypeError: this.getRowsForQuery is not a function, breaking deployments. Please update this code to use the new inline job execution pattern (e.g. create the query job via the BigQuery client and call job.getQueryResults) or reinstate a shared helper before shipping.

Consider applying:
-      const rows = await this.getRowsForQuery(queryOpts, this.datasetId);
+      const client = this.googleCloud
+        .getBigQueryClient()
+        .dataset(this.datasetId);
+      const [job] = await client.createQueryJob(queryOpts);
+      const [rows] = await job.getQueryResults({
+        maxResults: limit,
+      });

🧹 Nitpick comments (1)

components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)

24-24: Clarify dedupeKey wording and behavior.

Suggest: “The name of a column in the query results to use for deduplication…” Also note that if omitted, the source will dedupe the batch by a deterministic hash of each row (see code), not by a random UUID.

If you adopt the deterministic fallback above, align this description and the linked docs accordingly.
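A deterministic fallback of the kind this comment describes might be sketched as follows. The function name `getRowId`, the choice of SHA-256, and the key-sorting step are assumptions for illustration; the source's actual hashing logic is not shown in this thread.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: derive a dedupe ID for a single row.
// If `dedupeKey` names a column with a value, use that value;
// otherwise fall back to a deterministic hash of the row's contents.
function getRowId(row, dedupeKey) {
  if (dedupeKey && row[dedupeKey] !== undefined && row[dedupeKey] !== null) {
    return String(row[dedupeKey]);
  }
  // Sort top-level keys so that column order does not change the hash.
  // Nested objects are serialized as-is (an accepted limitation of this sketch).
  const entries = Object.keys(row)
    .sort()
    .map((key) => [key, row[key]]);
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(entries))
    .digest("hex");
}
```

Unlike a random UUID, the hash is identical for identical rows across executions, so repeated query results can be recognized as duplicates.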

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b8ba5c and 6375748.

📒 Files selected for processing (4)

components/google_cloud/package.json (1 hunks)
components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1 hunks)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (3 hunks)
components/google_cloud/sources/common/bigquery.mjs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (2)

components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)

rows (116-116)

hash (153-153)

components/google_cloud/sources/common/bigquery.mjs (1)

rows (81-84)

components/google_cloud/sources/common/bigquery.mjs (1)

components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)

queryOpts (110-115)

rows (116-116)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Lint Code Base
GitHub Check: Publish TypeScript components
GitHub Check: Verify TypeScript components
GitHub Check: pnpm publish

🔇 Additional comments (1)

components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)

11-11: Version bump looks good; ensure cross-file consistency.

Confirm related sources and the package version in components/google_cloud/package.json reflect the same release and include a changelog entry summarizing the BigQuery streaming/memory improvements.

components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs

luancazarine

Hi @michelle0927, LGTM! Ready for QA!

updates

6375748

coderabbitai bot reviewed Sep 29, 2025

pipedream-component-development requested a review from jcortes September 29, 2025 15:59

michelle0927 removed the request for review from jcortes September 29, 2025 16:00

pipedream-component-development requested a review from luancazarine September 29, 2025 16:00

luancazarine approved these changes Sep 29, 2025

michelle0927 merged commit cd835d7 into master Sep 30, 2025
10 checks passed

michelle0927 deleted the issue-18352-2 branch September 30, 2025 14:52
