Skip to content

[Backfill] Improve backfill performance by skipping merge until stream exhausts #262

@george-zubrienko

Description

@george-zubrienko

Currently we merge each batch from source into temporary table and then fire CREATE OR REPLACE. However, since backfills for many source are APPEND-ONLY, or, in case of Synapse, can be pre-aggregated, an improvement can be done to reduce load on merge engine:

  • Do not create staging tables except for backfill table
  • Append data files to backfill table
  • Modify backfill-overwrite query to perform pre-aggregation if needed (for Synapse, for example)

This will allow us to bypass query engine until the last stage, maximizing read throughput from source.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions