A Cloud Native Go microservice that exports BigQuery query results to destinations via a pluggable driver:

- GCS Parquet using BigQuery server-side `EXPORT DATA`
- StarRocks table load with automatic table creation and batched inserts
- Driver Architecture: Select the destination via `EXPORT_DRIVER` (`GCS_PARQUET` or `STARROCKS`).
- Efficient Export (GCS): Uses BigQuery's native `EXPORT DATA` statement (server-side export).
- StarRocks Load: Creates the table if missing and performs batched inserts for high throughput.
- Cloud Native:
  - Stateless architecture suitable for Cloud Run.
  - JSON structured logging (`slog`) for Cloud Logging.
  - Graceful shutdown handling.
  - Health check endpoint (`/health`).
- Flexible Output: Supports exporting to specific folders or wildcard paths in GCS.
- Go 1.25+
- Google Cloud Project with BigQuery and GCS enabled.
- Service Account with permissions:
  - BigQuery Job User
  - BigQuery Data Viewer (on the source dataset)
  - Storage Object Admin (on the destination bucket)
The application is configured via environment variables:
| Variable | Description | Default |
|---|---|---|
| `PORT` | HTTP port to listen on | `8080` |
| `RUN_MODE` | `service` (HTTP) or `job` (one-off) | `service` |
| `GCP_PROJECT_ID` | Google Cloud project ID | Detected from credentials |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to service account JSON key | - |
| `GIN_MODE` | Gin framework mode (`release` or `debug`) | `release` (if unset) |
| `API_KEY` | Optional API key for request auth | - |
| `EXPORT_DRIVER` | Destination driver: `GCS_PARQUET` or `STARROCKS` | `GCS_PARQUET` |
| `STARROCKS_HOST` | StarRocks FE host | - |
| `STARROCKS_PORT` | StarRocks MySQL port | `9030` |
| `STARROCKS_USER` | StarRocks user | - |
| `STARROCKS_PASSWORD` | StarRocks password | - |
| `STARROCKS_DB` | Default database used when a request omits `database` | - |
| `STARROCKS_WAREHOUSE` | Session warehouse for StarRocks | `default_warehouse` |
| `STARROCKS_BATCH_SIZE` | Insert batch size | `1000` |
Job mode environment overrides (only when `RUN_MODE=job`):

| Variable | Description | Default |
|---|---|---|
| `JOB_QUERY` | SQL to run on BigQuery | - |
| `JOB_QUERY_LOCATION` | BigQuery job location (e.g., `US`) | - |
| `JOB_TABLE` | Target table name for StarRocks | - |
| `JOB_DATABASE` | Target database for StarRocks | - |
| `JOB_OUTPUT` | GCS output URI/prefix for Parquet | - |
| `JOB_FILENAME` | Base filename for Parquet exports | - |
| `JOB_USE_TIMESTAMP` | Append timestamp to filenames (`true`/`false`) | `false` |
| `JOB_CREATE_DDL` | Optional explicit `CREATE TABLE` DDL | - |
A single endpoint, `POST /api/export`, supports both drivers. The body shape is unified; fields are validated per driver.
Request Body:

```json
{
  "query": "SELECT * FROM dataset.table",
  "query_location": "US",
  "table": "optional-for-starrocks",
  "output": "required-for-gcs",
  "filename": "optional-for-gcs",
  "use_timestamp": false
}
```

- Common: `query` and `query_location` are required.
- GCS Parquet: `output` is required; `filename` and `use_timestamp` are optional. The response includes `gcs_path`.
- StarRocks:
  - `table` is optional; defaults to `export`.
  - `database` is optional; overrides the default `STARROCKS_DB` for this request. If set, the service ensures the database exists (creates it if missing).
  - `create_ddl` is optional; if provided, it is executed to create the table (e.g., a full `CREATE TABLE ...` statement). If not provided, the service infers the schema from the BigQuery result and:
    - Creates the table if missing, using a default DUPLICATE KEY model (first column) and HASH distribution (8 buckets)
    - Performs automatic schema evolution by adding missing columns when the query returns new fields
  - The response includes `starrocks_table` and `rows_loaded`.
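The per-driver rules above can be sketched as a validation method on the request struct. The struct and method names here are illustrative, not the actual source:

```go
package main

import (
	"errors"
	"fmt"
)

// ExportRequest mirrors the unified request body documented above.
type ExportRequest struct {
	Query         string `json:"query"`
	QueryLocation string `json:"query_location"`
	Table         string `json:"table"`
	Database      string `json:"database"`
	Output        string `json:"output"`
	Filename      string `json:"filename"`
	UseTimestamp  bool   `json:"use_timestamp"`
	CreateDDL     string `json:"create_ddl"`
}

// Validate applies the common rules first, then the driver-specific ones.
func (r ExportRequest) Validate(driver string) error {
	if r.Query == "" || r.QueryLocation == "" {
		return errors.New("query and query_location are required")
	}
	switch driver {
	case "GCS_PARQUET":
		if r.Output == "" {
			return errors.New("output is required for GCS_PARQUET")
		}
	case "STARROCKS":
		// table, database, and create_ddl are all optional for StarRocks.
	default:
		return fmt.Errorf("unknown driver %q", driver)
	}
	return nil
}
```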
When running via `docker compose up`, the service listens on `localhost:8080`, requires the header `X-API-Key: apikey`, and defaults to `EXPORT_DRIVER=GCS_PARQUET`.

- StarRocks (switch the driver first: set `EXPORT_DRIVER=STARROCKS` in compose or env). You can omit `STARROCKS_DB` and specify `database` in the request:
```bash
curl -X POST http://localhost:8080/api/export \
  -H "Content-Type: application/json" \
  -H "X-API-Key: apikey" \
  -d '{
    "query": "SELECT puskesmas_name, file_name FROM test_data.data_ingestion_report LIMIT 10",
    "query_location": "asia-southeast2",
    "table": "data_ingestion_report",
    "database": "synthetic_data"
  }'
```

Explicit DDL (optional):
```bash
curl -X POST http://localhost:8080/api/export \
  -H "Content-Type: application/json" \
  -H "X-API-Key: apikey" \
  -d '{
    "query": "SELECT puskesmas_name, file_name FROM test_data.data_ingestion_report LIMIT 10",
    "query_location": "asia-southeast2",
    "table": "data_ingestion_report",
    "database": "synthetic_data",
    "create_ddl": "CREATE TABLE IF NOT EXISTS synthetic_data.data_ingestion_report (puskesmas_name VARCHAR(256), file_name VARCHAR(256)) ENGINE=OLAP"
  }'
```

```bash
# Build and run with ADC mounted
docker compose up --build

# Stop
docker compose down
```

Place your service account JSON at the project root as `sa.json`. The compose file mounts it into the container and sets `GOOGLE_APPLICATION_CREDENTIALS=/app/creds/sa.json`. Optional environment variables can be provided via `.env`.
Set an API key via environment variable:

```bash
API_KEY=your-api-key
```

Send the header on requests:

```
X-API-Key: your-api-key
```

The `/health` endpoint is public; `/api/export` requires the header when `API_KEY` is set. For Cloud Scheduler, add the same header in the job configuration.
```bash
docker build -t bq-exporter .
```

Use a Cloud Run service when each run finishes under 60 minutes.

```bash
gcloud run deploy bq-exporter \
  --image gcr.io/YOUR_PROJECT/bq-exporter \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```

Trigger with a Cloud Scheduler HTTP target that POSTs to `/api/export`. Prefer OIDC auth.
Use a Cloud Run Job for larger loads that may exceed the 60-minute HTTP limit.

- Build and deploy the same image as a Job:

```bash
gcloud run jobs create bq-exporter-job \
  --image gcr.io/YOUR_PROJECT/bq-exporter \
  --region us-central1 \
  --set-env-vars RUN_MODE=job
```

- Provide per-run overrides via environment variables:

```bash
gcloud run jobs update bq-exporter-job \
  --region us-central1 \
  --set-env-vars \
  JOB_QUERY="SELECT id FROM dataset.table",\
  JOB_QUERY_LOCATION="US",\
  JOB_TABLE="users",\
  JOB_DATABASE="analytics",\
  JOB_OUTPUT="",\
  JOB_FILENAME="",\
  JOB_USE_TIMESTAMP="false",\
  JOB_CREATE_DDL=""
```

- Run on demand, or schedule via Cloud Scheduler using the Jobs API (e.g., with Cloud Workflows or Cloud Functions as an orchestrator).
Job mode logs the result and exits; no HTTP server is started.
Cloud Scheduler can call the Cloud Run Admin API to run the job on schedule.
- URL: `POST https://run.googleapis.com/v2/projects/PROJECT_ID/locations/REGION/jobs/JOB_NAME:run`
- Auth:
  - Use an OAuth token with scope `https://www.googleapis.com/auth/cloud-platform`
  - Grant the Scheduler's service account `roles/run.jobRunner` (or `roles/run.admin`)
- Body (per-run environment overrides):

```json
{
  "overrides": {
    "containerOverrides": [
      {
        "env": [
          { "name": "JOB_QUERY", "value": "SELECT id FROM dataset.table" },
          { "name": "JOB_QUERY_LOCATION", "value": "US" },
          { "name": "JOB_TABLE", "value": "users" },
          { "name": "JOB_DATABASE", "value": "analytics" },
          { "name": "JOB_OUTPUT", "value": "gs://my-bucket/exports/" },
          { "name": "JOB_FILENAME", "value": "daily" },
          { "name": "JOB_USE_TIMESTAMP", "value": "true" },
          { "name": "JOB_CREATE_DDL", "value": "" }
        ]
      }
    ]
  }
}
```
```bash
# Create .env file
echo "GOOGLE_APPLICATION_CREDENTIALS=./key.json" > .env

# Run
go run main.go
```
- Build and push the image:

```bash
gcloud builds submit --tag gcr.io/YOUR_PROJECT/bq-exporter
```

- Deploy:

```bash
gcloud run deploy bq-exporter \
  --image gcr.io/YOUR_PROJECT/bq-exporter \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --service-account YOUR-SERVICE-ACCOUNT@YOUR_PROJECT.iam.gserviceaccount.com
```

Note: Remove `--allow-unauthenticated` if you want to secure the service with IAM.
To trigger this service on a schedule (e.g., every hour), create a Cloud Scheduler job:

- Target type: HTTP
- URL: `https://your-cloud-run-url.run.app/api/export`
- HTTP Method: POST
- Body:

```json
{
  "query": "SELECT * FROM dataset.table WHERE date = CURRENT_DATE()",
  "output": "gs://my-bucket/daily-export/"
}
```

- Auth header: Add an OIDC token (select your service account).

The application automatically logs the `X-CloudScheduler-JobName` and `X-CloudScheduler-ScheduleTime` headers to help you trace execution in Cloud Logging.
- Ensure the stack is running:

```bash
docker compose up --detach --wait --wait-timeout 120
```

- Open the MySQL-compatible CLI inside the FE container:

```bash
docker compose exec starrocks-fe mysql -uroot -h starrocks-fe -P9030
```

- Example queries:

```sql
SHOW STORAGE VOLUMES;
SHOW COMPUTE NODES;
CREATE DATABASE IF NOT EXISTS analytics;
SHOW DATABASES;
```

- Notes:
  - The default storage volume is auto-created and set via FE config in `docker-compose.yml` (shared-data mode pointing at MinIO).
  - Databases and tables are not auto-created. Create them via SQL, or let the exporter create tables on first load when using the `STARROCKS` driver.
- Create a simple OLAP table (Duplicate Key model) and insert rows:

```sql
CREATE TABLE IF NOT EXISTS analytics.users (
  id BIGINT,
  name VARCHAR(256),
  created_at DATETIME
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 8
PROPERTIES ("replication_num" = "1");

INSERT INTO analytics.users (id, name, created_at) VALUES
  (1, 'Alice', '2026-02-19 09:00:00'),
  (2, 'Bob', '2026-02-19 09:05:00');

SELECT * FROM analytics.users ORDER BY id;
SELECT COUNT(*) FROM analytics.users;
```

- Tip: You can run the above directly via:
```bash
docker compose exec -T starrocks-fe mysql -uroot -h starrocks-fe -P9030 -e "
CREATE TABLE IF NOT EXISTS analytics.users (
  id BIGINT,
  name VARCHAR(256),
  created_at DATETIME
) ENGINE=OLAP
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 8
PROPERTIES (\"replication_num\" = \"1\");
INSERT INTO analytics.users (id, name, created_at) VALUES
  (1, 'Alice', '2026-02-19 09:00:00'),
  (2, 'Bob', '2026-02-19 09:05:00');
SELECT COUNT(*) FROM analytics.users;
SELECT * FROM analytics.users ORDER BY id;"
```