feat(table): add RowDelta API for atomic row-level mutations #789

Open
laskoviymishka wants to merge 1 commit into apache:main from laskoviymishka:feat/rowdelta-api

@laskoviymishka
Contributor

Adds Transaction.NewRowDelta() — Go equivalent of Java's BaseRowDelta. Commits data files and delete files (position or equality) in one atomic snapshot. This is needed for row-level mutations: an UPDATE becomes an equality delete for the old row + append of the new row, both in one commit.

Resolves #602.

API

rd := tx.NewRowDelta(snapshotProps)
rd.AddRows(dataFile1, dataFile2)
rd.AddDeletes(posDeleteFile, eqDeleteFile)
rd.Commit(ctx)

Operation type picked automatically: data-only → append, deletes-only → delete, both → overwrite.
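
The selection rule above can be sketched as a small pure function. This is a minimal illustration, not the PR's actual code: the `Operation` type and `pickOperation` helper are hypothetical names, and treating an empty delta as an append is an assumption the description does not cover.

```go
package main

import "fmt"

// Operation holds the snapshot operation name written to the summary.
// The type and constants are hypothetical; only the three names come from the PR text.
type Operation string

const (
	OpAppend    Operation = "append"
	OpDelete    Operation = "delete"
	OpOverwrite Operation = "overwrite"
)

// pickOperation (hypothetical helper) maps the staged file counts to an operation:
// data-only -> append, deletes-only -> delete, both -> overwrite.
func pickOperation(numDataFiles, numDeleteFiles int) Operation {
	switch {
	case numDataFiles > 0 && numDeleteFiles > 0:
		return OpOverwrite
	case numDeleteFiles > 0:
		return OpDelete
	default:
		// Assumption: an empty delta is treated like an append.
		return OpAppend
	}
}

func main() {
	fmt.Println(pickOperation(2, 0)) // append
	fmt.Println(pickOperation(0, 1)) // delete
	fmt.Println(pickOperation(1, 1)) // overwrite
}
```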

Validation

  • Delete files require format version >= 2
  • Equality deletes must have non-empty EqualityFieldIDs referencing existing schema columns
  • Content types checked: no data files in AddDeletes, no delete files in AddRows
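
A rough sketch of how these three checks could fit together, assuming simplified stand-in types. `File`, `Content`, and `validateDelta` are hypothetical and do not mirror iceberg-go's real data file interface or error messages.

```go
package main

import (
	"errors"
	"fmt"
)

// Content is a simplified stand-in for a manifest entry content type.
type Content int

const (
	ContentData Content = iota
	ContentPosDeletes
	ContentEqDeletes
)

// File is a hypothetical, minimal view of a data or delete file.
type File struct {
	Path             string
	Content          Content
	EqualityFieldIDs []int
}

// validateDelta (hypothetical) applies the checks listed above to staged files.
func validateDelta(formatVersion int, schemaFieldIDs map[int]bool, rows, deletes []File) error {
	for _, f := range rows {
		if f.Content != ContentData {
			return fmt.Errorf("AddRows: %s is a delete file", f.Path)
		}
	}
	if len(deletes) > 0 && formatVersion < 2 {
		return errors.New("delete files require format version >= 2")
	}
	for _, f := range deletes {
		switch f.Content {
		case ContentData:
			return fmt.Errorf("AddDeletes: %s is a data file", f.Path)
		case ContentEqDeletes:
			if len(f.EqualityFieldIDs) == 0 {
				return fmt.Errorf("equality delete %s has no equality field IDs", f.Path)
			}
			for _, id := range f.EqualityFieldIDs {
				if !schemaFieldIDs[id] {
					return fmt.Errorf("equality delete %s: field ID %d not in schema", f.Path, id)
				}
			}
		}
	}
	return nil
}

func main() {
	schema := map[int]bool{1: true, 2: true}
	data := File{Path: "data-1.parquet", Content: ContentData}
	eqDel := File{Path: "eq-1.parquet", Content: ContentEqDeletes, EqualityFieldIDs: []int{1}}
	badDel := File{Path: "eq-2.parquet", Content: ContentEqDeletes, EqualityFieldIDs: []int{999}}

	fmt.Println(validateDelta(2, schema, []File{data}, []File{eqDel})) // valid on a v2 table
	fmt.Println(validateDelta(1, schema, nil, []File{eqDel}))          // v1 table rejects deletes
	fmt.Println(validateDelta(2, schema, nil, []File{badDel}))         // unknown field ID
}
```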

Known limitations

  • No conflict detection for concurrent writers — documented in the type comment
  • Uses fast-append producer (no manifest merging)

What's tested

The interesting ones:

  • Commit data + position deletes, check snapshot summary has added-data-files=1, added-delete-files=1, operation is overwrite
  • Commit equality deletes, check added-equality-delete-files shows up in summary
  • Read back manifests after commit, verify there's one data manifest and one delete manifest with correct content types in entries
  • Two RowDeltas on same transaction (batch1 append, batch2 append+delete), verify cumulative total-data-files
  • v1 table rejects delete files with clear error
  • Equality delete file without field IDs → error
  • Equality delete file with field ID 999 (not in schema) → error

The round-trip integration test:

  1. Write 5 rows as real Parquet, append to table
  2. Write a position delete file targeting positions 1 and 3 (0-based), commit via RowDelta
  3. Scan the table back: expect 3 rows with IDs [1, 3, 5] (beta and delta gone)

This covers the full path: write parquet → RowDelta commit → scan with position delete filtering applied.
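
To make the expected result concrete, here is a toy in-memory version of position delete filtering. It does not touch Parquet or the real scanner. The `Row` type and `applyPositionDeletes` are hypothetical, and the names other than beta and delta are made up for illustration.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one record in the data file.
type Row struct {
	ID   int64
	Name string
}

// applyPositionDeletes drops the rows whose 0-based ordinal position
// within the data file appears in deleted.
func applyPositionDeletes(rows []Row, deleted []int64) []Row {
	skip := make(map[int64]struct{}, len(deleted))
	for _, pos := range deleted {
		skip[pos] = struct{}{}
	}
	kept := make([]Row, 0, len(rows))
	for pos, r := range rows {
		if _, gone := skip[int64(pos)]; gone {
			continue
		}
		kept = append(kept, r)
	}
	return kept
}

func main() {
	rows := []Row{
		{1, "alpha"}, {2, "beta"}, {3, "gamma"}, {4, "delta"}, {5, "epsilon"},
	}
	for _, r := range applyPositionDeletes(rows, []int64{1, 3}) {
		fmt.Println(r.ID, r.Name)
	}
	// Remaining IDs: 1, 3, 5
}
```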

What's left to do

This PR covers the commit API. Remaining work for full DML support:

  • Equality delete file writing — a writer that produces Parquet files with PK-only schema and EntryContentEqDeletes content type. The RowDelta API already accepts them, but there's no convenient writer yet.
  • Equality delete reading — the scanner currently errors with "iceberg-go does not yet support equality deletes" (scanner.go:415). Needs: collect eq delete entries during scan planning, match to data files by partition + sequence number, apply hash-based anti-join during Arrow reads.
  • Conflict validation: validateFromSnapshot, validateNoConflictingDataFiles, etc. Java's Flink connector skips most of this for streaming, so it's not blocking for CDC use cases.
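
For the equality delete reading item, the hash-based anti-join could look roughly like the sketch below. It assumes a single int64 key column, ignores partition matching, and uses hypothetical types (`DataRow`, `EqDelete`, `antiJoin`). The sequence-number rule follows the Iceberg spec: an equality delete applies only to data files with a strictly lower data sequence number. That rule is why an UPDATE committed as one RowDelta does not delete its own newly appended row.

```go
package main

import "fmt"

// DataRow is a hypothetical record keyed by a single int64 column.
type DataRow struct {
	ID    int64
	Value string
}

// EqDelete is a hypothetical in-memory equality delete file:
// its data sequence number plus the key values it deletes.
type EqDelete struct {
	Seq  int64
	Keys []int64
}

// antiJoin returns the rows of a data file (with sequence number dataSeq)
// that survive the given equality deletes.
func antiJoin(rows []DataRow, dataSeq int64, deletes []EqDelete) []DataRow {
	deleted := make(map[int64]struct{})
	for _, d := range deletes {
		if d.Seq <= dataSeq {
			continue // applies only to strictly older data files
		}
		for _, k := range d.Keys {
			deleted[k] = struct{}{}
		}
	}
	kept := make([]DataRow, 0, len(rows))
	for _, r := range rows {
		if _, gone := deleted[r.ID]; !gone {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	// UPDATE id=2: equality delete for the old row plus append of the new row,
	// both committed at sequence number 2.
	oldFile := []DataRow{{1, "a"}, {2, "b"}} // written at sequence number 1
	newFile := []DataRow{{2, "b2"}}          // written at sequence number 2
	deletes := []EqDelete{{Seq: 2, Keys: []int64{2}}}

	fmt.Println(antiJoin(oldFile, 1, deletes)) // old id=2 is filtered out
	fmt.Println(antiJoin(newFile, 2, deletes)) // new id=2 survives
}
```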

@laskoviymishka laskoviymishka marked this pull request as ready for review March 15, 2026 21:33
Add Transaction.NewRowDelta() for committing data files and delete files  (position or equality) in a single atomic snapshot. Includes format version validation, equality field ID validation, and full round-trip integration test.

Development

Successfully merging this pull request may close these issues.

support for row delta