
add TransactionalCatalog interface for multi-table atomic commits #784

@laskoviymishka


Feature Request / Improvement

The Iceberg REST spec defines POST /v1/transactions/commit for atomic commits across multiple tables (OpenAPI spec). Java has supported this since Iceberg 1.4 via RESTSessionCatalog.commitTransaction().

Motivation

Tooling that snapshots an entire database to Iceberg (e.g., all tables from a Postgres database) currently commits each table independently. If a failure occurs midway, the lakehouse is left in an inconsistent state: some tables reflect the new snapshot while others are stale. A multi-table atomic commit solves this by making the whole batch succeed or fail together.

Example: snapshotting 5 Postgres tables where the 3rd commit fails:

Table     Commit Result
────────  ──────────────
users     ✓ committed (snapshot T₁)
orders    ✓ committed (snapshot T₁)
payments  ✗ failed (409 conflict)
products  — never attempted
inventory — never attempted

Consumers then see inconsistent cross-table state: joins, aggregates, and foreign-key relationships across these tables silently produce wrong results.

Proposal

Add a TableCommit struct in the table package:

// TableCommit holds the identifier, requirements, and updates
// for a single table within a multi-table transaction.
type TableCommit struct {
    Identifier   Identifier
    Requirements []Requirement
    Updates      []Update
}
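To make the struct concrete, the sketch below builds one TableCommit per table for the Postgres snapshot scenario. It also splits each identifier into the namespace and table name, which the REST body carries separately. The sketch is self-contained, so Identifier, Requirement and Update are local stand-ins rather than the real table package types. They are assumed here to be a string slice and two opaque interfaces. The pg namespace and the splitIdentifier helper are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the table package types (assumptions for this sketch).
type Identifier = []string
type Requirement interface{}
type Update interface{}

// TableCommit mirrors the proposed struct.
type TableCommit struct {
	Identifier   Identifier
	Requirements []Requirement
	Updates      []Update
}

// splitIdentifier (hypothetical helper) treats the last element as the table
// name and everything before it as the namespace.
func splitIdentifier(id Identifier) (namespace []string, name string, err error) {
	if len(id) < 2 {
		return nil, "", errors.New("identifier needs a namespace and a table name")
	}
	return id[:len(id)-1], id[len(id)-1], nil
}

func main() {
	tables := []string{"users", "orders", "payments", "products", "inventory"}
	commits := make([]TableCommit, 0, len(tables))
	for _, t := range tables {
		// Requirements and updates would come from each table's pending snapshot.
		commits = append(commits, TableCommit{Identifier: Identifier{"pg", t}})
	}
	for _, c := range commits {
		ns, name, err := splitIdentifier(c.Identifier)
		if err != nil {
			panic(err)
		}
		fmt.Println(ns, name)
	}
}
```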

Add an optional TransactionalCatalog interface in the catalog package (separate from Catalog to avoid breaking existing implementations):

// TransactionalCatalog is an optional interface implemented by catalogs
// that support atomic multi-table commits.
type TransactionalCatalog interface {
    CommitTransaction(ctx context.Context, commits []table.TableCommit) error
}

Callers check support via type assertion:

if tc, ok := cat.(catalog.TransactionalCatalog); ok {
    if err := tc.CommitTransaction(ctx, commits); err != nil {
        return err
    }
} else {
    // Fall back to committing each table independently (not atomic).
}

Design Notes

  • A separate interface avoids breaking existing Catalog implementations (Glue, Hive, SQL) that don't support multi-table commits server-side.
  • Input validation: commits must be non-empty, and every commit must have a non-nil identifier.
  • The endpoint returns 204 No Content, so no table metadata comes back. Callers that need the updated state must call LoadTable() afterwards.

