Data Integration: add optional truncate table for FULL_TABLE sync (Postgres & MySQL) #6076

Open
topdev998 wants to merge 1 commit into mage-ai:master from topdev998:feature/truncate-full-table-sync

@topdev998

Description

Add an optional truncate_full_table configuration flag for SQL destinations to support truncating the destination table during FULL_TABLE replication.

When enabled, the destination table is truncated before loading a fresh snapshot, ensuring idempotent full refresh behavior. This logic is implemented in the shared SQL base class and applied to PostgreSQL and MySQL destinations.

Motivation

Currently, FULL_TABLE replication appends data unless the table is manually cleared. This enhancement allows users to perform a clean overwrite of the destination table automatically, which is a common requirement for full refresh pipelines.

Implementation details

  • Introduced KEY_TRUNCATE_FULL_TABLE = "truncate_full_table" in destinations/constants.py
  • Added:
    - truncate_full_table property
    - build_truncate_table_commands helper in destinations/sql/base.py
  • Updated Destination.build_query_strings:
    • When:
      • replication_method == FULL_TABLE
      • truncate_full_table == True
      • destination table exists
    • Then:
      • add a TRUNCATE TABLE statement during the initial batch (batch 0), before insert queries
  • Exposed the flag in
    • destinations/postgresql/templates/config.json
    • destinations/mysql/templates/config.json
      (default is false to preserve existing behavior)
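
The gating described above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the code from this PR: `plan_queries`, its parameters, and the body of `build_truncate_table_commands` are hypothetical stand-ins, and only the key name `truncate_full_table` and the helper's name come from the description.

```python
from typing import Dict, List

# Key name as given in the PR description.
KEY_TRUNCATE_FULL_TABLE = "truncate_full_table"
REPLICATION_METHOD_FULL_TABLE = "FULL_TABLE"


def build_truncate_table_commands(full_table_name: str) -> List[str]:
    # Hypothetical stand-in for the helper named in the PR;
    # the real signature and quoting rules may differ.
    return [f"TRUNCATE TABLE {full_table_name}"]


def plan_queries(
    config: Dict,
    replication_method: str,
    table_exists: bool,
    batch_index: int,
    full_table_name: str,
    insert_commands: List[str],
) -> List[str]:
    # Hypothetical function: truncate only when every condition holds.
    should_truncate = (
        replication_method == REPLICATION_METHOD_FULL_TABLE
        and bool(config.get(KEY_TRUNCATE_FULL_TABLE, False))  # defaults to off
        and table_exists
        and batch_index == 0  # only before the first batch of inserts
    )
    commands: List[str] = []
    if should_truncate:
        commands += build_truncate_table_commands(full_table_name)
    # The truncate statement, if any, always precedes the inserts.
    return commands + insert_commands
```

Restricting the truncate to batch 0 matters: if it ran on later batches, each batch would wipe the rows loaded by the previous one.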

How to use

Example (PostgreSQL):
{
  "database": "your_db",
  "host": "your_host",
  "password": "your_password",
  "port": 5432,
  "schema": "public",
  "table": "your_table",
  "username": "your_user",
  "truncate_full_table": true
}

Example (MySQL):

{
  "database": "your_db",
  "host": "your_host",
  "password": "your_password",
  "port": 3306,
  "table": "your_table",
  "username": "your_user",
  "use_lowercase": true,
  "truncate_full_table": true
}

How Has This Been Tested?

Unit tests

Added unit tests for both destinations:

  • PostgreSQLDestinationTests.test_build_query_strings_truncate_full_table
  • MySQLDestinationTests.test_build_query_strings_truncate_full_table

These tests verify that:

  • TRUNCATE TABLE is generated when:
    • replication_method = FULL_TABLE
    • truncate_full_table = True
    • table exists
  • No truncate occurs when the flag is disabled or replication method is not FULL_TABLE
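
The shape of such a test could look roughly like the following. This is a sketch only: `truncate_statements` is a hypothetical stand-in for the destination logic, and the test class is not one of the tests added in this PR.

```python
import unittest
from typing import Dict, List


def truncate_statements(
    config: Dict, replication_method: str, table_exists: bool
) -> List[str]:
    # Hypothetical stand-in for the destination logic under test.
    if (
        replication_method == "FULL_TABLE"
        and config.get("truncate_full_table") is True
        and table_exists
    ):
        return ["TRUNCATE TABLE public.your_table"]
    return []


class TruncateFullTableSketchTests(unittest.TestCase):
    def test_truncate_generated_when_all_conditions_hold(self):
        self.assertEqual(
            truncate_statements({"truncate_full_table": True}, "FULL_TABLE", True),
            ["TRUNCATE TABLE public.your_table"],
        )

    def test_no_truncate_otherwise(self):
        # One case per condition that can fail.
        cases = [
            ({"truncate_full_table": False}, "FULL_TABLE", True),
            ({"truncate_full_table": True}, "INCREMENTAL", True),
            ({"truncate_full_table": True}, "FULL_TABLE", False),
        ]
        for config, method, exists in cases:
            with self.subTest(config=config, method=method, exists=exists):
                self.assertEqual(truncate_statements(config, method, exists), [])
```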

Run tests:

python -m unittest mage_integrations.tests.destinations.postgresql.test_postgres
python -m unittest mage_integrations.tests.destinations.mysql.test_mysql

Manual validation

  • Simulated destination execution via build_query_strings
  • Verified:
    • With truncate_full_table=true → TRUNCATE TABLE present
    • With truncate_full_table=false → no truncate
    • With non-FULL_TABLE → no truncate

Checklist

  • The PR is tagged with proper labels (enhancement, data integration)
  • I have performed a self-review of my own code
  • I have added unit tests that prove my feature works
  • I have commented my code where necessary
  • I have made corresponding changes to the configuration templates

@wangxiaoyou1993
Member

Have you tested with real Mage data integration pipeline?

@topdev998
Author

Have you tested with real Mage data integration pipeline?

Hi, thanks for the question!
I validated the feature through unit tests and by manually executing build_query_strings to simulate the destination pipeline behavior. This confirmed that when truncate_full_table is enabled with FULL_TABLE replication, the TRUNCATE TABLE statement is generated and executed in batch 0 before insert queries.

I also attempted to run a full Mage pipeline using the Docker dev environment, but encountered a transient dependency issue during the frontend build (Yarn registry error), so I wasn’t able to complete that end-to-end run.

