Google Cloud BigQuery improvements #18490
Conversation
Walkthrough

Version bumps across the Google Cloud BigQuery packages and sources. The BigQuery Query Results source updates metadata generation. The BigQuery common module refactors query execution to inline job creation with paginated, memory-aware processing, adds `maxRowsPerExecution`, caps `eventSize`, and removes helper methods.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Source as Source Runner
  participant BQ as BigQuery Client
  participant Job as Query Job
  participant Pager as Page Iterator
  participant Emitter as Event Emitter
  Note over Source: Start execution
  Source->>BQ: instantiate client
  Source->>BQ: createQueryJob(queryOpts)
  BQ-->>Source: job handle
  Source->>Job: request first page (pageSize)
  loop until no page or limits hit
    Job-->>Source: rows, pageToken
    Note right of Source: process rows in small chunks<br/>update cursor / last result ID
    Source->>Emitter: emit events per chunk/row
    alt pageToken present and under limits
      Source->>Job: get next page (pageToken)
    else limits hit or no page
      Note over Source: stop (memory/pages cap)
    end
  end
  Note over Source: Finish execution (log summary)
```
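The loop in the sequence diagram above can be sketched as a small helper. This is a minimal sketch, not the PR's actual implementation: `fetchPage` stands in for the BigQuery page fetch (`job.getQueryResults`), and the option names `pageSize` and `maxRowsPerExecution` are illustrative.

```javascript
// Hypothetical sketch of the paginated, memory-aware loop in the diagram.
// `fetchPage` is a stand-in for the BigQuery page fetch; the names here are
// assumptions for illustration, not the component's real API.
async function runPaginatedQuery(fetchPage, emit, {
  pageSize = 100,
  maxRowsPerExecution = 1000,
} = {}) {
  let pageToken;
  let processed = 0;
  do {
    // Each call returns one page of rows plus a token for the next page.
    const { rows, nextPageToken } = await fetchPage({ pageSize, pageToken });
    for (const row of rows) {
      if (processed >= maxRowsPerExecution) {
        return processed; // stop: row cap hit mid-page
      }
      emit(row);
      processed += 1;
    }
    pageToken = nextPageToken;
  } while (pageToken && processed < maxRowsPerExecution);
  return processed;
}
```

Checking the cap both mid-page and before requesting the next page keeps memory bounded without fetching pages that would be discarded.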
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (2 warnings), ✅ Passed checks (3 passed).
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1)
116-126: Restore a valid query helper for `_getIdOfLastRow`.

`this.getRowsForQuery` was removed from `common/bigquery.mjs`, so this call now throws `TypeError: this.getRowsForQuery is not a function`, breaking deployments. Please update this code to use the new inline job execution pattern (e.g. create the query job via the BigQuery client and call `job.getQueryResults`) or reinstate a shared helper before shipping. Consider applying:
```diff
- const rows = await this.getRowsForQuery(queryOpts, this.datasetId);
+ const client = this.googleCloud
+   .getBigQueryClient()
+   .dataset(this.datasetId);
+ const [job] = await client.createQueryJob(queryOpts);
+ const [rows] = await job.getQueryResults({
+   maxResults: limit,
+ });
```
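If several sources need this pattern, the suggested replacement could be reinstated as a shared helper. A minimal sketch, assuming a client that follows the `@google-cloud/bigquery` shapes (`createQueryJob(opts)` resolving to `[job]`, `job.getQueryResults(opts)` resolving to `[rows]`); the helper name and signature here are illustrative:

```javascript
// Hypothetical shared helper mirroring the suggested fix. `client` is
// assumed to expose the @google-cloud/bigquery shapes:
//   client.createQueryJob(opts) -> resolves to [job]
//   job.getQueryResults(opts)   -> resolves to [rows]
async function getRowsForQuery(client, queryOpts, limit) {
  const [job] = await client.createQueryJob(queryOpts);
  const [rows] = await job.getQueryResults({ maxResults: limit });
  return rows;
}
```

Keeping the job creation behind one function lets `_getIdOfLastRow` and the run path share the same inline-job pattern instead of drifting apart again.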
🧹 Nitpick comments (1)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)
24-24: Clarify `dedupeKey` wording and behavior.

Suggest: “The name of a column in the query results to use for deduplication…” Also note that if omitted, the source will dedupe the batch by a deterministic hash of each row (see code), not by a random UUID.
If you adopt the deterministic fallback above, align this description and the linked docs accordingly.
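The deterministic fallback described above can be sketched with Node's built-in crypto module. `rowHash` is an illustrative name, not the component's actual helper, and this sketch assumes rows serialize with a stable key order:

```javascript
import { createHash } from "node:crypto";

// Sketch of the deterministic fallback: with no dedupeKey configured, hash
// the serialized row so identical rows map to the same id across runs
// (a random UUID would not). Caveat: JSON.stringify is sensitive to key
// order, so rows built with differing key order would hash differently.
function rowHash(row) {
  return createHash("sha256").update(JSON.stringify(row)).digest("hex");
}
```

Because the hash depends only on row content, re-running the same query cannot double-emit unchanged rows, which a UUID-per-row fallback would.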
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- components/google_cloud/package.json (1 hunk)
- components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (1 hunk)
- components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (3 hunks)
- components/google_cloud/sources/common/bigquery.mjs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (2)
- components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)
  - rows (116-116)
  - hash (153-153)
- components/google_cloud/sources/common/bigquery.mjs (1)
  - rows (81-84)
components/google_cloud/sources/common/bigquery.mjs (1)
- components/google_cloud/sources/bigquery-new-row/bigquery-new-row.mjs (2)
  - queryOpts (110-115)
  - rows (116-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Lint Code Base
- GitHub Check: Publish TypeScript components
- GitHub Check: Verify TypeScript components
- GitHub Check: pnpm publish
🔇 Additional comments (1)
components/google_cloud/sources/bigquery-query-results/bigquery-query-results.mjs (1)
11-11: Version bump looks good; ensure cross-file consistency.

Confirm related sources and the package version in `components/google_cloud/package.json` reflect the same release, and include a changelog entry summarizing the BigQuery streaming/memory improvements.
Hi @michelle0927, LGTM! Ready for QA!
Resolves #18486
Resolves #18352