Commit 42c6c85

Rama Mothukuri authored
SwiftLake 0.2.0 release (#8)
Co-authored-by: Rama Mothukuri <mothukur@arcesium.com>
1 parent: 178a6c5

3 files changed: +56 −11 lines changed

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 2 additions & 1 deletion

@@ -14,7 +14,8 @@ body:
       label: SwiftLake version
       description: What version of SwiftLake are you using?
       options:
-        - "0.1.0 (latest release)"
+        - "0.2.0 (latest release)"
+        - "0.1.0"
         - "main (development)"
         - "Other (please specify in description)"
     validations:

README.md

Lines changed: 52 additions & 8 deletions

@@ -83,14 +83,14 @@ Add this to your `pom.xml`:
 <dependency>
     <groupId>com.arcesium.swiftlake</groupId>
     <artifactId>swiftlake-core</artifactId>
-    <version>0.1.0</version>
+    <version>0.2.0</version>
 </dependency>
 ```
 
 #### Gradle
 Add this to your `build.gradle`:
 ```gradle
-implementation 'com.arcesium.swiftlake:swiftlake-core:0.1.0'
+implementation 'com.arcesium.swiftlake:swiftlake-core:0.2.0'
 ```
 
 ### Setup
@@ -203,13 +203,13 @@ To use SwiftLake with Amazon S3, you need to configure the S3 file system:
 <dependency>
     <groupId>com.arcesium.swiftlake</groupId>
     <artifactId>swiftlake-aws</artifactId>
-    <version>0.1.0</version>
+    <version>0.2.0</version>
 </dependency>
 ```
 
 ##### Gradle
 ```gradle
-implementation 'com.arcesium.swiftlake:swiftlake-aws:0.1.0'
+implementation 'com.arcesium.swiftlake:swiftlake-aws:0.2.0'
 ```
 
 2. Configure S3 in your SwiftLake setup:
@@ -435,9 +435,11 @@ swiftLakeEngine.deleteFrom(tableName)
 | `isolationLevel` | `serializable` | Specifies the Iceberg isolation level to use |
 | `branch` | `main` | Specifies the Iceberg branch to use |
 
-### SCD1Merge Operation
+### SCD1 Merge Operation
+The SCD1 (Slowly Changing Dimension Type 1) merge operation in SwiftLakeEngine allows you to update existing records, insert new ones, and delete records while maintaining only the current state of data without preserving history. This operation is ideal for dimensions where historical values are not needed.
 
-The SCD1Merge functionality in SwiftLakeEngine allows you to perform Slowly Changing Dimension Type 1 (SCD1) merges on Iceberg tables. This operation combines insert, update, and delete operations, enabling you to update existing records, insert new ones, and delete records based on matching conditions and an operation column in the source.
+#### Changes Mode
+In Changes Mode, the SCD1 merge operation processes incremental changes based on change events. It expects input data containing records with an Operation Column specifying the change type (INSERT/UPDATE, DELETE). The operation applies the specified changes to matching records based on key columns.
 
 ```java
 String sourceSql = "SELECT * FROM (VALUES (1, 'a', 'category1', DATE'2025-01-01', 'INSERT'), " +
@@ -452,7 +454,7 @@ swiftLakeEngine.applyChangesAsSCD1(tableName)
     .execute();
 ```
 
-#### Configuration Options
+##### Configuration Options
 
 | Name | Default | Description |
 |------|---------|------------------------------------------------------------------------------------|
@@ -475,12 +477,54 @@ swiftLakeEngine.applyChangesAsSCD1(tableName)
 | `isolationLevel` | `serializable` | Specifies the Iceberg isolation level to use |
 | `branch` | `main` | Specifies the Iceberg branch to use |
 
-#### Best Practices
+##### Best Practices
 
 1. Always specify key columns to ensure accurate matching between source and target records.
 2. Use table filtering to optimize performance, especially for large tables.
 3. Consider using `executeSourceSqlOnceOnly` for complex or non-deterministic source queries.
 
+#### Snapshot Mode
+In Snapshot Mode, the SCD1 merge operation performs a merge based on snapshot comparisons. This mode identifies differences by comparing a complete input snapshot with the existing data in the table, enabling efficient detection of inserts, updates, and deletes without requiring an operation column.
+
+```java
+String sourceSql = "SELECT * FROM (VALUES (1, 'a', 'category1', DATE'2025-01-01'), " +
+    "(3, 'c', 'category3', DATE'2025-01-01')) " +
+    "source(id, data, category, date)";
+
+swiftLakeEngine.applySnapshotAsSCD1(tableName)
+    .tableFilterSql("date = DATE'2025-01-01'")
+    .sourceSql(sourceSql)
+    .keyColumns(List.of("id", "category", "date"))
+    .execute();
+```
+
+##### Configuration Options
+
+| Name | Default | Description |
+|------|---------|------------------------------------------------------------------------------------|
+| `tableFilterSql` | - | SQL predicate to retrieve the existing snapshot data from the table for comparison |
+| `tableFilter` | - | Alternative condition specification using Expressions APIs |
+| `sourceSql` | - | SELECT statement SQL query to retrieve input data for merge |
+| `sourceMybatisStatement` | - | |
+| &nbsp;&nbsp;&nbsp;&nbsp;`id` | - | Identifier of the MyBatis SELECT statement to retrieve input data |
+| &nbsp;&nbsp;&nbsp;&nbsp;`parameter` | - | Parameters to replace in the MyBatis SQL query |
+| `keyColumns` | - | Primary key columns used to match source and target records |
+| `valueColumns` | All non-key columns | List of columns to consider when detecting changes between source snapshot and target table |
+| `valueColumnsMetadata` | - | Map of column names to their value comparison metadata, including maximum allowed delta for numeric columns and null replacement values |
+| `columns` | All columns of the table | List of column names to include in the merge |
+| `executeSourceSqlOnceOnly` | `false` | Set to true for partitioned tables with expensive or non-deterministic queries |
+| `skipDataSorting` | `false` | When set to true, skips sorting data before insertion |
+| `sqlSessionFactory` | SwiftLake Engine-level SqlSessionFactory | Optional SqlSessionFactory for MyBatis integration |
+| `processSourceTables` | SwiftLake Engine-level processTablesDefaultValue | Process tables present in the source SQL |
+| `isolationLevel` | `serializable` | Specifies the Iceberg isolation level to use |
+| `branch` | `main` | Specifies the Iceberg branch to use |
+
+##### Best Practices
+
+1. Carefully define your tableFilter to ensure you're comparing the correct subset of existing data.
+2. Use `valueColumnsMetadata` to fine-tune change detection, especially for numeric columns where small variations might not be considered significant changes.
+3. Consider using `executeSourceSqlOnceOnly` for complex or expensive source queries to improve performance.
+
 ### SCD2 Merge Operation
 
 The SCD2 (Slowly Changing Dimension Type 2) merge operation in SwiftLakeEngine preserves historical records, maintaining a temporal history of changes. This allows for efficient auditing and analysis of data over time.
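The two SCD1 modes introduced in this README change can be illustrated without SwiftLake itself. Below is a minimal, library-free Java sketch of the semantics only; the class and helper names (`Scd1Sketch`, `applyChanges`, `diffSnapshot`) are hypothetical and the real SwiftLake operations (`applyChangesAsSCD1`, `applySnapshotAsSCD1`) work on Iceberg tables, not in-memory maps. Changes Mode applies explicit INSERT/UPDATE/DELETE events; Snapshot Mode derives the same events by diffing a complete source snapshot against the current state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class Scd1Sketch {

    // Changes Mode: each source record carries an operation column.
    // Rows are keyed by a single key column; the String stands in for the value columns.
    static Map<Integer, String> applyChanges(Map<Integer, String> table,
                                             List<Object[]> events) {
        Map<Integer, String> result = new HashMap<>(table);
        for (Object[] e : events) {
            Integer key = (Integer) e[0];
            String value = (String) e[1];
            String op = (String) e[2];
            if ("DELETE".equals(op)) {
                result.remove(key);
            } else {
                // INSERT/UPDATE is an upsert: SCD1 keeps only the current state.
                result.put(key, value);
            }
        }
        return result;
    }

    // Snapshot Mode: infer change events by comparing the full desired snapshot
    // with the current table state; no operation column is needed.
    static List<Object[]> diffSnapshot(Map<Integer, String> table,
                                       Map<Integer, String> snapshot) {
        List<Object[]> events = new ArrayList<>();
        for (Map.Entry<Integer, String> e : snapshot.entrySet()) {
            String current = table.get(e.getKey());
            if (current == null) {
                events.add(new Object[]{e.getKey(), e.getValue(), "INSERT"});
            } else if (!current.equals(e.getValue())) {
                events.add(new Object[]{e.getKey(), e.getValue(), "UPDATE"});
            } // equal value columns: no change event is emitted
        }
        for (Integer key : table.keySet()) {
            if (!snapshot.containsKey(key)) {
                events.add(new Object[]{key, null, "DELETE"});
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = Map.of(1, "a", 2, "b");

        // Changes Mode: explicit operations per record.
        Map<Integer, String> afterChanges = applyChanges(table, List.of(
                new Object[]{2, "b2", "UPDATE"},
                new Object[]{3, "c", "INSERT"},
                new Object[]{1, null, "DELETE"}));
        System.out.println(new TreeMap<>(afterChanges)); // {2=b2, 3=c}

        // Snapshot Mode: the same end state, derived by comparison.
        Map<Integer, String> snapshot = Map.of(2, "b2", 3, "c");
        Map<Integer, String> afterSnapshot =
                applyChanges(table, diffSnapshot(table, snapshot));
        System.out.println(new TreeMap<>(afterSnapshot)); // {2=b2, 3=c}
    }
}
```

Note that Snapshot Mode here reduces to Changes Mode after the diff step, which mirrors why the README's snapshot variant needs `valueColumns`/`valueColumnsMetadata` (controlling the comparison) while the changes variant does not.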

build.gradle

Lines changed: 2 additions & 2 deletions

@@ -16,7 +16,7 @@ plugins {
 
 allprojects {
     group = 'com.arcesium.swiftlake'
-    version = '0.1.0'
+    version = '0.2.0'
     repositories {
         mavenCentral()
     }
@@ -72,7 +72,7 @@ subprojects {
 
     pom {
         name = 'SwiftLake'
-        description = 'A lightweight Java library for cloud data lakes'
+        description = 'SwiftLake: Simplifying lakehouse data operations with Iceberg and DuckDB'
         url = 'https://github.com/arcesium/swiftlake'
 
         licenses {

0 commit comments