This repository was archived by the owner on Mar 21, 2025. It is now read-only.

A Dataflow Flex Template that loads data exports in Apache Avro format into Cloud Spanner from a staged Cloud Storage bucket.


nitobuendia/dataflow-gcs-avro-to-spanner-scd


Load Apache Avro files to Cloud Spanner with Slowly Changing Dimensions (SCD) using Dataflow Flex template

Customers have large volumes of transactional data with Slowly Changing Dimensions (SCD), which may need to be loaded to Cloud Spanner during migrations.

The Dataflow pipeline template in this solution lets customers load exports (in Apache Avro format) of their current database or data warehouse into Cloud Spanner from a staged Cloud Storage bucket.

The Dataflow pipeline supports the following SCD Types:

  • SCD Type 1: updates the existing row if the primary key exists; otherwise, inserts a new row.

  • SCD Type 2: if the primary key exists, sets the existing row's end date to the current timestamp; it then inserts a new row with a null end date and, if a start date column is passed, a start date set to the current timestamp.
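The two update strategies above can be sketched over an in-memory table as follows. This is a minimal illustration, not the template's code: the table is a hypothetical list of dicts standing in for a Spanner table, the column names `end_date` and `start_date` are hypothetical, and closing only the row whose end date is still null is an assumption about which "existing row" Type 2 updates.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]  # column name -> value (hypothetical in-memory row)


def _key(row: Row, key_cols: Sequence[str]) -> tuple:
    """Return the primary-key tuple of a row."""
    return tuple(row[c] for c in key_cols)


def scd_type1(table: List[Row], record: Row, key_cols: Sequence[str]) -> None:
    """SCD Type 1: overwrite the row with the same key, else insert it."""
    for i, row in enumerate(table):
        if _key(row, key_cols) == _key(record, key_cols):
            table[i] = {**row, **record}  # update in place; no history is kept
            return
    table.append(dict(record))  # key not found: insert a new row


def scd_type2(
    table: List[Row],
    record: Row,
    key_cols: Sequence[str],
    end_col: str = "end_date",           # hypothetical column name
    start_col: Optional[str] = None,     # hypothetical; only set if passed
    now: Optional[datetime] = None,
) -> None:
    """SCD Type 2: close the current row for this key, then insert a new one."""
    if now is None:
        now = datetime.now(timezone.utc)
    for row in table:
        # Assumption: only the open version (null end date) gets closed.
        if _key(row, key_cols) == _key(record, key_cols) and row.get(end_col) is None:
            row[end_col] = now
    new_row = {**record, end_col: None}  # the new current version
    if start_col is not None:
        new_row[start_col] = now
    table.append(new_row)
```

For example, calling `scd_type2` twice for key `id=1` (first with `city="Paris"`, then `city="Tokyo"`) leaves two rows: the Paris row with its end date set to the second call's timestamp, and an open Tokyo row. The linear scan is only for readability; a real pipeline would express these decisions as Spanner reads and mutations rather than list operations.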

For more details, check the documentation.

Disclaimer: This is a copy/fork for display-of-work purposes. If you are interested in using or contributing to this project, refer to the original source.
