|`isolationLevel`|`serializable`| Specifies the Iceberg isolation level to use |
|`branch`|`main`| Specifies the Iceberg branch to use |

### SCD1 Merge Operation

The SCD1 (Slowly Changing Dimension Type 1) merge operation in SwiftLakeEngine allows you to update existing records, insert new ones, and delete records while maintaining only the current state of data without preserving history. This operation is ideal for dimensions where historical values are not needed.

#### Changes Mode

In Changes Mode, the SCD1 merge operation processes incremental changes based on change events. It expects input data containing records with an Operation Column specifying the change type (INSERT/UPDATE, DELETE). The operation applies the specified changes to matching records based on key columns.
|`isolationLevel`|`serializable`| Specifies the Iceberg isolation level to use |
|`branch`|`main`| Specifies the Iceberg branch to use |

##### Best Practices

1. Always specify key columns to ensure accurate matching between source and target records.
2. Use table filtering to optimize performance, especially for large tables.
3. Consider using `executeSourceSqlOnceOnly` for complex or non-deterministic source queries.
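The Changes Mode behavior described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the SwiftLakeEngine API: rows are plain `Map<String, Object>` values stored under the values of their key columns, and the operation column is assumed to hold the strings `INSERT`, `UPDATE`, or `DELETE` (a hypothetical encoding). It shows why SCD1 keeps no history: an insert or update simply overwrites whatever is stored under the key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only, not the SwiftLakeEngine API. Assumes the operation
// column holds "INSERT", "UPDATE", or "DELETE" (hypothetical encoding).
final class Scd1ChangesSketch {

    private Scd1ChangesSketch() {}

    // Builds the match key for a row from the values of the key columns.
    static List<Object> keyOf(Map<String, Object> row, List<String> keyColumns) {
        List<Object> key = new ArrayList<>();
        for (String column : keyColumns) {
            key.add(row.get(column));
        }
        return key;
    }

    // Applies change events, in list order, to the current table state.
    // INSERT/UPDATE overwrites the row stored under the key (no history kept);
    // DELETE removes the row with the matching key.
    static void applyChanges(Map<List<Object>, Map<String, Object>> table,
                             List<Map<String, Object>> changes,
                             List<String> keyColumns,
                             String operationColumn) {
        for (Map<String, Object> change : changes) {
            List<Object> key = keyOf(change, keyColumns);
            String op = String.valueOf(change.get(operationColumn));
            if (op.equals("INSERT") || op.equals("UPDATE")) {
                Map<String, Object> row = new LinkedHashMap<>(change);
                row.remove(operationColumn); // the operation column is not stored
                table.put(key, row);
            } else if (op.equals("DELETE")) {
                table.remove(key);
            } else {
                throw new IllegalArgumentException("Unknown operation: " + op);
            }
        }
    }
}
```

Because events are applied in list order in this sketch, a later event for the same key overrides an earlier one.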
#### Snapshot Mode

In Snapshot Mode, the SCD1 merge operation compares a complete input snapshot with the existing data in the table. The differences between the two identify inserts, updates, and deletes, so the input does not need an operation column.
|`tableFilterSql`| - | SQL predicate that selects the existing data in the table to compare against the input snapshot |
|`tableFilter`| - | Alternative to `tableFilterSql` that specifies the condition using the Expressions API |
|`sourceSql`| - | SQL SELECT query that retrieves the input data for the merge |
|`sourceMybatisStatement`| - | MyBatis SELECT statement that retrieves the input data, specified by the fields below |
| `id`| - | Identifier of the MyBatis SELECT statement to retrieve input data |
| `parameter`| - | Parameters to replace in the MyBatis SQL query |
|`keyColumns`| - | Primary key columns used to match source and target records |
|`valueColumns`| All non-key columns | List of columns to consider when detecting changes between the source snapshot and the target table |
|`valueColumnsMetadata`| - | Map of column names to their value comparison metadata, including the maximum allowed delta for numeric columns and null replacement values |
|`columns`| All columns of the table | List of column names to include in the merge |
|`executeSourceSqlOnceOnly`|`false`| Set to `true` for partitioned tables with expensive or non-deterministic queries |
|`skipDataSorting`|`false`| When set to `true`, skips sorting data before insertion |
|`processSourceTables`| SwiftLakeEngine-level `processTablesDefaultValue` | Whether to process tables present in the source SQL |
|`isolationLevel`|`serializable`| Specifies the Iceberg isolation level to use |
|`branch`|`main`| Specifies the Iceberg branch to use |
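To make the snapshot comparison concrete, the sketch below classifies keys into inserts, updates, and deletes. It illustrates the idea under simplifying assumptions (both sides already loaded into maps keyed by the key-column values, exact equality on the value columns); it is not SwiftLakeEngine's implementation. `target` stands for the existing rows selected by `tableFilterSql` or `tableFilter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustration of snapshot comparison, not SwiftLakeEngine's implementation.
// Both sides are assumed to be maps from key-column values to rows.
final class SnapshotDiffSketch {

    private SnapshotDiffSketch() {}

    static final class Diff<K> {
        final List<K> inserts = new ArrayList<>(); // keys only in the source snapshot
        final List<K> updates = new ArrayList<>(); // keys on both sides whose values differ
        final List<K> deletes = new ArrayList<>(); // keys only in the filtered target data
    }

    // source: the complete input snapshot.
    // target: the existing rows selected by the table filter. Rows outside the
    // filter never reach this method, so they are never classified as deletes.
    static <K> Diff<K> diff(Map<K, Map<String, Object>> source,
                            Map<K, Map<String, Object>> target,
                            List<String> valueColumns) {
        Diff<K> result = new Diff<>();
        for (Map.Entry<K, Map<String, Object>> entry : source.entrySet()) {
            Map<String, Object> existing = target.get(entry.getKey());
            if (existing == null) {
                result.inserts.add(entry.getKey());
            } else if (!sameValues(entry.getValue(), existing, valueColumns)) {
                result.updates.add(entry.getKey());
            }
        }
        for (K key : target.keySet()) {
            if (!source.containsKey(key)) {
                result.deletes.add(key);
            }
        }
        return result;
    }

    // Exact comparison of the value columns (null equals null).
    static boolean sameValues(Map<String, Object> a, Map<String, Object> b,
                              List<String> valueColumns) {
        for (String column : valueColumns) {
            if (!Objects.equals(a.get(column), b.get(column))) {
                return false;
            }
        }
        return true;
    }
}
```

The comment on `target` is the reason the table filter matters: the comparison can only delete rows it is shown, and it will delete every shown row that the snapshot omits.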
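##### Best Practices

1. Carefully define `tableFilterSql` (or `tableFilter`) so that it selects exactly the subset of existing data that the input snapshot represents; existing rows within that subset that are missing from the snapshot are treated as deletes.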
2. Use `valueColumnsMetadata` to fine-tune change detection, especially for numeric columns where small variations might not be considered significant changes.
3. Consider using `executeSourceSqlOnceOnly` for complex or expensive source queries to improve performance.
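The effect of `valueColumnsMetadata` on change detection can be sketched as a per-column comparison. `ValueMetadata`, `maxDelta`, and `nullReplacement` are hypothetical names modeled on the option's description in the table above, not SwiftLakeEngine types, and treating a difference exactly equal to the maximum delta as "unchanged" is an assumption.

```java
import java.math.BigDecimal;
import java.util.Objects;

// Sketch of tolerance-aware change detection in the spirit of valueColumnsMetadata.
// ValueMetadata, maxDelta, and nullReplacement are hypothetical names.
final class ValueComparisonSketch {

    private ValueComparisonSketch() {}

    static final class ValueMetadata {
        final BigDecimal maxDelta;    // numeric differences up to this amount are ignored (assumed inclusive)
        final Object nullReplacement; // substituted for null on both sides before comparing

        ValueMetadata(BigDecimal maxDelta, Object nullReplacement) {
            this.maxDelta = maxDelta;
            this.nullReplacement = nullReplacement;
        }
    }

    // Returns true when the source value should count as a change to the target value.
    static boolean isChanged(Object sourceValue, Object targetValue, ValueMetadata meta) {
        Object source = sourceValue;
        Object target = targetValue;
        if (meta != null && meta.nullReplacement != null) {
            if (source == null) {
                source = meta.nullReplacement;
            }
            if (target == null) {
                target = meta.nullReplacement;
            }
        }
        if (meta != null && meta.maxDelta != null
                && source instanceof Number && target instanceof Number) {
            // Parsing toString() compares numbers as printed (e.g. 10.005);
            // NaN and infinities are not handled in this sketch.
            BigDecimal a = new BigDecimal(source.toString());
            BigDecimal b = new BigDecimal(target.toString());
            return a.subtract(b).abs().compareTo(meta.maxDelta) > 0;
        }
        return !Objects.equals(source, target);
    }
}
```

With a `maxDelta` of `0.01`, a target value of `10.0` and a source value of `10.005` would not trigger an update in this sketch, while `10.5` would.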
### SCD2 Merge Operation
The SCD2 (Slowly Changing Dimension Type 2) merge operation in SwiftLakeEngine preserves historical records, maintaining a temporal history of changes. This allows for efficient auditing and analysis of data over time.
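For contrast with SCD1, here is a minimal sketch of how a type 2 update can preserve history: the current version of a key is closed and a new version is appended, instead of the row being overwritten. `VersionedRow`, `effectiveFrom`, and `effectiveTo` are hypothetical names chosen for illustration; SwiftLakeEngine's actual SCD2 columns and options may differ.

```java
import java.util.List;
import java.util.Objects;

// Illustrative SCD2-style versioning, not the SwiftLakeEngine API.
// VersionedRow, effectiveFrom, and effectiveTo are hypothetical names.
final class Scd2Sketch {

    private Scd2Sketch() {}

    static final class VersionedRow {
        final String key;
        final String value;
        final long effectiveFrom;
        Long effectiveTo; // null while this version is the current one

        VersionedRow(String key, String value, long effectiveFrom) {
            this.key = key;
            this.value = value;
            this.effectiveFrom = effectiveFrom;
        }
    }

    // Records a new value for the key at time 'at' while keeping earlier versions.
    static void upsert(List<VersionedRow> history, String key, String value, long at) {
        for (VersionedRow row : history) {
            if (row.key.equals(key) && row.effectiveTo == null) {
                if (Objects.equals(row.value, value)) {
                    return; // unchanged value: no new version is needed
                }
                row.effectiveTo = at; // close the current version
                break;
            }
        }
        history.add(new VersionedRow(key, value, at));
    }
}
```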