-
Notifications
You must be signed in to change notification settings - Fork 156
Description
Apache Iceberg version
main (development)
Please describe the bug π
txn.AddFiles() in table/transaction.go hardcodes .fastAppend(), completely
bypassing appendSnapshotProducer() which correctly reads commit.manifest-merge.enabled.
Setting commit.manifest-merge.enabled=true in table properties has no effect when using
AddFiles() β manifests accumulate 1-per-commit indefinitely regardless of the property value.
All other append paths behave correctly:
| Method | Uses appendSnapshotProducer() |
|---|---|
AddDataFiles() |
β yes |
Append() |
β yes |
AddFiles() |
β hardcodes .fastAppend() β this bug |
Steps to reproduce
// Create table with merge enabled and minCountToMerge=2
meta, _ := table.NewMetadata(schema, iceberg.UnpartitionedSpec,
table.UnsortedSortOrder, location, iceberg.Properties{
table.ManifestMergeEnabledKey: "true",
table.ManifestMinMergeCountKey: "2",
})
cat := &inMemoryCatalog{meta: meta}
tbl := table.New(ident, meta, metadataLoc,
func(_ context.Context) (iceio.IO, error) { return fs, nil }, cat)
// 3 separate AddFiles commits β merge should fire at commit 3 (count > minCount=2)
for i := range 3 {
txn := tbl.NewTransaction()
_ = txn.AddFiles(ctx, []string{filePaths[i]}, nil, false)
tbl, _ = txn.Commit(ctx)
}
snap := tbl.CurrentSnapshot()
manifests, _ := snap.Manifests(fs)
fmt.Println(len(manifests)) // prints 3 β BUG: should print 1Expected behavior
AddFiles() should route through appendSnapshotProducer() which reads
ManifestMergeEnabledKey and returns mergeAppend() or fastAppend() accordingly.
With minCountToMerge=2 and 3 commits, the 3rd commit should trigger a merge
producing 1 consolidated manifest.
Java reference: BaseTable.newAppend() always uses MergeAppend.
MANIFEST_MERGE_ENABLED_DEFAULT is true in Java. The Go implementation
should behave consistently with the reference implementation.
Actual behavior
AddFiles() always uses fastAppend() regardless of commit.manifest-merge.enabled.
Each call creates one new manifest file that is never consolidated.
Production impact observed on a table with 1 commit/min running 18 hours:
- HEAD snapshot accumulated 1121 manifests (one per commit, never merged)
- Orphan cleanup scan time grew +1m 50s/hour (must open all 1121 manifests via
FetchEntries()) - Athena query planning opens all 1121 manifests before executing any query
After applying the 1-line fix below, HEAD dropped to 16 manifests on the next commit.
Root cause
// table/transaction.go β hardcoded, bypasses the property:
updater := t.updateSnapshot(fs, snapshotProps, OpAppend).fastAppend()
// Fix (1 line) β same as AddDataFiles() and Append() already do:
updater := t.appendSnapshotProducer(fs, snapshotProps)appendSnapshotProducer() already exists and correctly reads the property:
func (t *Transaction) appendSnapshotProducer(afs io.IO, props iceberg.Properties) *snapshotProducer {
manifestMerge := t.meta.props.GetBool(ManifestMergeEnabledKey, ManifestMergeEnabledDefault)
updateSnapshot := t.updateSnapshot(afs, props, OpAppend)
if manifestMerge {
return updateSnapshot.mergeAppend()
}
return updateSnapshot.fastAppend()
}It is used by AddDataFiles() and Append() but was missed in AddFiles().
Test gap
Three existing tests set commit.manifest-merge.enabled=true but none call AddFiles():
| Test | File | Uses AddFiles()? |
|---|---|---|
TestCommitV3RowLineageDeltaIncludesExistingRows |
snapshot_producers_test.go |
β calls newMergeAppendFilesProducer() directly |
TestMergeManifests |
table_test.go |
β uses tblA.AppendTable() β Append() |
TestSetProperties |
transaction_test.go |
β only sets the property, never appends any files |
Because no test exercises AddFiles() + commit.manifest-merge.enabled=true together,
the bug went undetected.
A regression test through the AddFiles() public API is included in the linked PR.
Additional context
- Affected method:
Transaction.AddFiles()intable/transaction.go - Not affected:
AddDataFiles(),Append(),ReplaceDataFiles()β all correctly useappendSnapshotProducer() - Java reference:
BaseTable.newAppend()always returnsMergeAppend;MANIFEST_MERGE_ENABLED_DEFAULT = true - A PR with the fix and regression test is ready to submit
Note: This bug was identified and the fix was developed with AI assistance (Claude).
The implementation has been verified end-to-end including Java reference behavior,
production impact measurement, and a regression test proving both the bug and the fix.