@irjudson
Contributor

Summary

Implements two new features for enhanced BigQuery sync capabilities:

1. Custom SQL Query Support (Issue #24)

  • Add optional customQuery field to table configuration
  • Support @lastTimestamp variable substitution for incremental sync
  • Automatically wrap custom queries with partitioning logic (MOD-based)
  • Enable JOINs, transformations, and complex WHERE clauses
  • Maintain distributed workload partitioning across cluster

Example:

- id: enriched_events
  customQuery: |
    SELECT 
      e.event_time as timestamp,
      e.vessel_id,
      v.vessel_name
    FROM `maritime_tracking.events` e
    LEFT JOIN `maritime_tracking.vessels` v ON e.vessel_id = v.id
    WHERE e.event_time > TIMESTAMP(@lastTimestamp)
  timestampColumn: timestamp
  targetTable: EnrichedEvents
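
The PR text does not show how the MOD-based wrapping is built, so the following is only a minimal sketch of one way it could work, not the plugin's actual code. The function name `wrap_with_partition`, the subquery alias `q`, the use of `UNIX_MICROS` as the partition key, and the choice to leave `@lastTimestamp` in place as a BigQuery named query parameter are all assumptions made for illustration.

```python
import re


def wrap_with_partition(custom_query: str, timestamp_column: str,
                        node_index: int, cluster_size: int) -> str:
    """Wrap a user-supplied query so one node reads only its share of rows.

    Hypothetical helper: the real plugin may partition differently.
    """
    if cluster_size < 1 or not 0 <= node_index < cluster_size:
        raise ValueError("node_index must be in [0, cluster_size)")
    # Only accept a plain identifier, since it is interpolated into SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", timestamp_column):
        raise ValueError("timestamp_column must be a plain identifier")

    # Drop a trailing semicolon so the query can be nested as a subquery.
    inner = custom_query.strip().rstrip(";").rstrip()

    # @lastTimestamp is left untouched inside `inner`; it is assumed to be
    # bound later as a named query parameter when the job is submitted.
    return (
        f"SELECT * FROM (\n{inner}\n) AS q\n"
        f"WHERE MOD(UNIX_MICROS(q.`{timestamp_column}`), {cluster_size}) = {node_index}\n"
        f"ORDER BY q.`{timestamp_column}`"
    )


if __name__ == "__main__":
    query = (
        "SELECT e.event_time AS timestamp, e.vessel_id\n"
        "FROM `maritime_tracking.events` e\n"
        "WHERE e.event_time > TIMESTAMP(@lastTimestamp);"
    )
    # Node 1 of a 3-node cluster.
    print(wrap_with_partition(query, "timestamp", node_index=1, cluster_size=3))
```

Under this scheme every node runs the same custom query but keeps only the rows whose timestamp falls in its MOD bucket, which is how the JOINs and filters in `customQuery` can coexist with distributed partitioning.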

2. Proxy Configuration (Issue #23)

  • Add optional proxy configuration at bigquery level
  • Support HTTP/HTTPS proxies with authentication
  • Configure via proxy.enabled and proxy.url fields
  • Plugin-level configuration (does not affect Harper replication)

Example:

bigquery:
  proxy:
    enabled: true
    url: "http://proxy.vz.com:8080"
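
The commit message for this feature says the proxy is applied through environment variables respected by the BigQuery SDK. As a rough sketch of that idea, the helper below maps the `bigquery.proxy` block to a set of variables. The function name `proxy_env` and the specific variable names `HTTP_PROXY` and `HTTPS_PROXY` are assumptions (they are the conventional names, but the PR does not list which ones it sets), and credentials are assumed to be embedded in the URL.

```python
from urllib.parse import urlsplit


def proxy_env(bigquery_config: dict) -> dict:
    """Return the environment variables implied by a bigquery.proxy block.

    Hypothetical helper: returns an empty dict when the proxy is absent
    or disabled, so callers can apply the result unconditionally.
    """
    proxy = bigquery_config.get("proxy") or {}
    if not proxy.get("enabled"):
        return {}

    url = proxy.get("url", "")
    parts = urlsplit(url)
    # Only HTTP/HTTPS proxies are described in the PR.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported proxy url: {url!r}")

    # Assumed variable names; the same proxy is used for both schemes.
    return {"HTTP_PROXY": url, "HTTPS_PROXY": url}


if __name__ == "__main__":
    config = {"proxy": {"enabled": True, "url": "http://proxy.vz.com:8080"}}
    for name, value in proxy_env(config).items():
        print(f"{name}={value}")
```

Returning a dict rather than mutating the process environment directly keeps the sketch easy to test; a caller would copy the values into the environment before creating the BigQuery client.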

Additional Changes

This PR also includes:

  • bq-benchmark tool for measuring BigQuery to Harper throughput
  • Updated README documentation
  • 18 new tests for custom query and proxy features

Testing

  • ✅ All 91 existing tests pass
  • ✅ All 18 new tests pass
  • ✅ Linter passed
  • ✅ Pre-commit hooks passed

Closes

Closes #23, Closes #24

irjudson and others added 3 commits January 9, 2026 10:42
Implements a comprehensive benchmarking tool that measures sustained
throughput from BigQuery to Harper with dual metrics (checkpoint-based
and table-based) to provide complete pipeline visibility.

Features:
- Three-phase benchmark (warmup, measurement, cooldown)
- Dual metrics: checkpoint-based (ingestion speed) + table-based (query availability)
- Replication lag calculation between ingestion and query availability
- Sliding window algorithm that excludes startup/consistency artifacts
- Comprehensive Markdown reports with statistics
- JSON export for programmatic access
- Resource profiling (CPU, memory)
- Pre-flight validation (Harper, BigQuery, tables, checkpoints)

Architecture:
- CLI with 4 commands: setup, run, analyze, cleanup
- Metrics collector with checkpoint client
- Phase manager for three-phase execution
- Sliding window for throughput/latency calculations
- Markdown reporter with dual metrics comparison
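
The sliding-window component is not shown in this conversation, so the following is only an illustrative sketch of how throughput could be derived from periodic cumulative record counts. The class name `SlidingWindow`, its methods, and the sample format are assumptions, not the tool's actual interface.

```python
from collections import deque


class SlidingWindow:
    """Keep recent (time, cumulative_count) samples and report records/sec.

    Hypothetical sketch: samples older than `window_seconds` relative to
    the newest sample are discarded, so startup artifacts age out.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp_seconds, cumulative_record_count)

    def add(self, t: float, total_records: int) -> None:
        self.samples.append((t, total_records))
        # Evict samples that fall outside the window ending at `t`.
        while self.samples and t - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def throughput(self) -> float:
        """Records per second between the oldest and newest retained samples."""
        if len(self.samples) < 2:
            return 0.0
        (t0, n0), (t1, n1) = self.samples[0], self.samples[-1]
        return (n1 - n0) / (t1 - t0) if t1 > t0 else 0.0


if __name__ == "__main__":
    window = SlidingWindow(window_seconds=10)
    for t, total in [(0, 0), (5, 500), (10, 1000), (15, 2000)]:
        window.add(t, total)
    print(window.throughput())
```

Feeding the same class once with checkpoint counts and once with table row counts would give the two throughput figures that the dual-metrics report compares.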

Design document: docs/plans/2026-01-09-bq-benchmark-design.md

MVP is complete and functional for benchmarking BigQuery → Harper pipelines.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Replace "coming soon" text with actual tool documentation, including:
- Tool capabilities (throughput, replication lag, latency, resource monitoring)
- Quick start commands
- Description of three-phase benchmark approach
- Link to complete documentation

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Implements two new features for enhanced BigQuery sync capabilities:

1. Custom SQL Query Support (Issue #24)
   - Add optional customQuery field to table configuration
   - Support @lastTimestamp variable substitution for incremental sync
   - Automatically wrap custom queries with partitioning logic (MOD-based)
   - Enable JOINs, transformations, and complex WHERE clauses
   - Maintain distributed workload partitioning across cluster

2. Proxy Configuration (Issue #23)
   - Add optional proxy configuration at bigquery level
   - Support HTTP/HTTPS proxies with authentication
   - Configure via proxy.enabled and proxy.url fields
   - Plugin-level configuration (does not affect Harper replication)
   - Uses environment variables respected by BigQuery SDK

All existing tests pass (91/91) plus 18 new tests for these features.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@irjudson irjudson merged commit c6b3a93 into main Jan 23, 2026
4 checks passed
@irjudson irjudson deleted the feature/bq-benchmark-tool branch January 23, 2026 16:22

Development

Successfully merging this pull request may close these issues.

Add SQL query capabilities
Proxy configurations at the plugin level
