This repository was archived by the owner on Feb 3, 2026. It is now read-only.

bigquery-table-to-one-file

Using Cloud Dataflow, this trivial Java application reads a table in BigQuery and writes it out as a single GZIP-compressed file in GCS. Why? Because BigQuery currently only supports unsharded exports for tables under 1 GB.

Relevant docs:

https://cloud.google.com/bigquery/docs/exporting-data
https://cloud.google.com/dataflow/

It uses the Application Default Credentials pointed to by the GOOGLE_APPLICATION_CREDENTIALS environment variable. See all about that here: https://developers.google.com/identity/protocols/application-default-credentials

In the code, change the table name, bucket details, etc. to suit your needs. You will also need to create the GCS bucket(s) yourself; I wasn't bothered making them CLI parameters.
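The core of the pipeline can be sketched roughly as follows, assuming the Apache Beam Java SDK (the class name, table, and bucket paths below are placeholders, not the repo's actual values, and the original may target an older Dataflow SDK API):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class BigQueryTableToOneFile {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadTable", BigQueryIO.readTableRows()
            .from("bigquery-public-data:samples.wikipedia"))  // table to export
     .apply("ToLine", MapElements.into(TypeDescriptors.strings())
            .via(row -> row.toString()))                      // serialize each row as a text line
     .apply("WriteOneFile", TextIO.write()
            .to("gs://<your_bucket>/output/table")
            .withoutSharding()                                // force a single output file
            .withCompression(Compression.GZIP));              // GZIP-compress the result

    p.run();
  }
}
```

The key call is `withoutSharding()`, which forces all output through one writer and so produces one file instead of the usual sharded `-00000-of-00042` pieces.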

To run, pass these pipeline options:

```
--project=<your_project_id>
--runner=DataflowRunner
--jobName=bigquery-table-to-one-file
--maxNumWorkers=50
--zone=australia-southeast1-a
--stagingLocation=gs://<your_bucket>/jars
--tempLocation=gs://<your_bucket>/tmp
```
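One way to launch it is via Maven's exec plugin; the main class name here is a guess, so adjust it to match the repo's source:

```shell
# Hypothetical invocation; the main class name is an assumption.
mvn compile exec:java \
  -Dexec.mainClass=com.shinesolutions.BigQueryTableToOneFile \
  -Dexec.args="--project=<your_project_id> \
    --runner=DataflowRunner \
    --jobName=bigquery-table-to-one-file \
    --maxNumWorkers=50 \
    --zone=australia-southeast1-a \
    --stagingLocation=gs://<your_bucket>/jars \
    --tempLocation=gs://<your_bucket>/tmp"
```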

I tested it with the public Wikipedia sample table (~1 billion rows, ~100 GB); the export took about 6 hours using 50 n1-standard-1 workers.

About

Using Cloud Dataflow, read a table in BigQuery and turn it into one file in GCS (BigQuery only supports sharded exports for tables over 1 GB).
