
feat(manifest): support snapshot manifest cache #386

Open
gripleaf wants to merge 4 commits into alibaba:main from gripleaf:feat/snapshot-manifest-cache

Conversation

gripleaf commented Jun 29, 2026

Contributor

Purpose

In our production workload, a Paimon table may have about 60k buckets. A batch of data is imported roughly every 15 minutes, and the interval may become longer. During query planning, one scan read and decoded about 4.89 million manifest entries with 16 threads, which took around 30 seconds, while only a small subset of entries was finally kept after pruning.

This patch introduces a snapshot-level live manifest cache to reduce the cost of repeatedly reading and decoding manifests. The cache stores the merged live manifest entries of snapshots. When scanning a snapshot, paimon-cpp looks up the latest cached snapshot whose ID is not greater than the target snapshot. If the cached snapshot is the target itself, the scan uses it directly; otherwise paimon-cpp builds the target snapshot incrementally by reading the intermediate delta manifests.
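The lookup-then-replay flow described above can be sketched as follows. This is a minimal illustration, not the code in this patch: `DeltaEntry`, `LiveFiles`, `FindBaseSnapshot`, and `BuildLive` are hypothetical names, live entries are reduced to file names, and replaying every delta from the start when no cached base exists is a simplification assumed here.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one decoded delta manifest entry.
struct DeltaEntry {
    std::string file_name;
    bool is_delete;  // true: the file was removed in this snapshot
};

// Hypothetical stand-in for the merged live entries of one snapshot.
using LiveFiles = std::set<std::string>;

// Returns the greatest cached snapshot id that is <= target, if any.
std::optional<int64_t> FindBaseSnapshot(const std::map<int64_t, LiveFiles>& cached,
                                        int64_t target) {
    auto it = cached.upper_bound(target);  // first id strictly greater than target
    if (it == cached.begin()) {
        return std::nullopt;  // every cached snapshot is newer than the target
    }
    return std::prev(it)->first;
}

// Builds the live set for `target` from the closest cached base plus deltas.
LiveFiles BuildLive(const std::map<int64_t, LiveFiles>& cached,
                    const std::map<int64_t, std::vector<DeltaEntry>>& deltas,
                    int64_t target) {
    LiveFiles live;
    auto first_delta = deltas.begin();  // assumed fallback: replay from the start
    if (auto base = FindBaseSnapshot(cached, target)) {
        live = cached.at(*base);
        if (*base == target) {
            return live;  // exact hit: no manifest needs to be read
        }
        first_delta = deltas.upper_bound(*base);
    }
    // Replay only the deltas of snapshots in (base, target].
    auto last_delta = deltas.upper_bound(target);
    for (auto it = first_delta; it != last_delta; ++it) {
        for (const auto& entry : it->second) {
            if (entry.is_delete) {
                live.erase(entry.file_name);
            } else {
                live.insert(entry.file_name);
            }
        }
    }
    return live;
}
```

With snapshots 3 and 7 cached, a scan of snapshot 5 starts from snapshot 3 and replays only the deltas of snapshots 4 and 5.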

The implementation reuses the existing byte-oriented Cache interface:

  • add CacheKind::SNAPSHOT_LIVE_MANIFEST;
  • use table_path#branch as the logical cache key;
  • store multiple snapshots in one serialized binary cache value, bounded by scan.manifest-entry-cache.max-snapshots;
  • enable the optimization when a cache is provided through ScanContextBuilder::WithCache() and scan.manifest-entry-cache.max-snapshots is greater than 0.
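To make the key, bound, and enablement rules above concrete, here is a small sketch. `MakeCacheKey`, `CacheEnabled`, and `SnapshotBundle` are hypothetical names, not identifiers from this patch, and evicting the oldest snapshot first is an assumption: the description only says the bundle is bounded by scan.manifest-entry-cache.max-snapshots.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Builds the logical cache key in the table_path#branch form.
std::string MakeCacheKey(const std::string& table_path, const std::string& branch) {
    return table_path + "#" + branch;
}

// The cache path is taken only when a cache was supplied and the bound is positive.
bool CacheEnabled(bool has_cache, int64_t max_snapshots) {
    return has_cache && max_snapshots > 0;
}

// Hypothetical bundle holding live entries for several snapshots of one key.
class SnapshotBundle {
 public:
    explicit SnapshotBundle(size_t max_snapshots) : max_snapshots_(max_snapshots) {}

    void Put(int64_t snapshot_id, std::vector<std::string> live_files) {
        by_snapshot_[snapshot_id] = std::move(live_files);
        // Assumed policy: drop the smallest (oldest) snapshot ids first.
        while (by_snapshot_.size() > max_snapshots_) {
            by_snapshot_.erase(by_snapshot_.begin());
        }
    }

    bool Contains(int64_t snapshot_id) const { return by_snapshot_.count(snapshot_id) > 0; }
    size_t Size() const { return by_snapshot_.size(); }

 private:
    size_t max_snapshots_;
    std::map<int64_t, std::vector<std::string>> by_snapshot_;
};
```

Keeping all snapshots of a table branch under one key means a single cache lookup is enough to find the closest base snapshot, at the cost of rewriting the whole value whenever a snapshot is added.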

This trades an in-memory serialize/deserialize step for avoiding remote manifest reads and manifest decoding. In the expected case, live entries are much fewer than all historical manifest entries; even in a conservative case, the cache is useful as long as deserializing cached live entries is cheaper than reading manifest files from remote storage and decoding all required manifest entries again.

Tests

  • cmake --build build --target paimon-core-test -j2
  • ./build/release/paimon-core-test --gtest_filter='FileStoreScanTest.TestSnapshotLiveManifestCache:FileStoreScanTest.TestSnapshotLiveManifestCacheUsesCacheKey:TableScanTest.*:CoreOptionsTest.TestDefaultValue:CoreOptionsTest.TestFromMap:CoreOptionsTest.TestInvalidCase'
  • git -c filter.lfs.process= -c filter.lfs.clean=cat -c filter.lfs.required=false diff --check

API and Format

This change extends the public cache kind enum with CacheKind::SNAPSHOT_LIVE_MANIFEST.

It removes the separate scan.manifest-entry-cache.enabled option. The cache is controlled by scan.manifest-entry-cache.max-snapshots: values greater than 0 enable the cache path when a cache is provided through WithCache(), and 0 disables it.

It does not change table storage format, file format, or network protocol. The serialized snapshot live manifest bundle is an in-memory cache value only and can be rebuilt from manifest files if absent or evicted.

Documentation

Yes. Added/updated user guide documentation for snapshot live manifest cache under docs/source/user_guide/manifest_entry_cache.rst.

Generative AI tooling

Generated-by: Codex (GPT-5)

gripleaf force-pushed the feat/snapshot-manifest-cache branch 12 times, most recently from 8337c3b to 5280b8e on June 30, 2026 13:31
gripleaf changed the title from "[WIP] feat(manifest): support snapshot manifest cache" to "feat(manifest): support snapshot manifest cache" on Jul 1, 2026
gripleaf marked this pull request as ready for review on July 1, 2026 02:05
gripleaf force-pushed the feat/snapshot-manifest-cache branch from 5280b8e to 4f2be24 on July 1, 2026 03:13
gripleaf marked this pull request as draft on July 1, 2026 08:54
gripleaf force-pushed the feat/snapshot-manifest-cache branch 5 times, most recently from a30b4b7 to 7ac31b2 on July 1, 2026 13:25
gripleaf marked this pull request as ready for review on July 1, 2026 13:27
gripleaf force-pushed the feat/snapshot-manifest-cache branch from 7ac31b2 to ffa1072 on July 2, 2026 02:59
gripleaf force-pushed the feat/snapshot-manifest-cache branch from 4968e72 to 44e9ee5 on July 2, 2026 11:31

for (const auto& [snapshot_id, entries] : entries_by_snapshot_) {
    out.WriteValue<int64_t>(snapshot_id);
    PAIMON_RETURN_NOT_OK(serializer.SerializeList(*entries, &out));
}

Collaborator

I’d like to understand the need for versioning in the serialization/deserialization of SnapshotLiveManifestEntries. Is the expectation that it may be persisted or transferred across module boundaries? My understanding is probably not, since both parsing and toBytes seem to be handled entirely within SnapshotLiveManifestEntries itself.

My concern is that once we introduce a version here, any format change in a nested field of ManifestEntry—for example DataFileMeta—could require bumping the outer SnapshotLiveManifestEntries version as well to preserve cross-language compatibility. Right now, we only carry this kind of version information for protocols like split and commit message, so I want to confirm whether the version here is actually necessary.

Contributor Author

Yes, you are right. SnapshotLiveManifestEntries is in-memory state only, so there is no multi-version issue during its lifecycle. I will remove the versioning logic.
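For illustration, a version-free layout for a value that never leaves process memory could look like the sketch below. It is not the SnapshotLiveManifestEntries code: `Bundle`, `WriteValue`, `ReadValue`, `Serialize`, and `Deserialize` are hypothetical, entries are reduced to strings, and integers are written in host byte order on the assumption that the bytes are only ever read back by the same process.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: snapshot id -> live entries (reduced to strings).
using Bundle = std::map<int64_t, std::vector<std::string>>;

// Appends a fixed-width value in host byte order.
template <typename T>
void WriteValue(std::string* out, T value) {
    char buf[sizeof(T)];
    std::memcpy(buf, &value, sizeof(T));
    out->append(buf, sizeof(T));
}

// Reads a fixed-width value; returns false if the input is truncated.
template <typename T>
bool ReadValue(const std::string& in, size_t* pos, T* value) {
    if (in.size() - *pos < sizeof(T)) return false;
    std::memcpy(value, in.data() + *pos, sizeof(T));
    *pos += sizeof(T);
    return true;
}

// Layout: snapshot count, then per snapshot: id, entry count, length-prefixed entries.
// There is no version header because the value is rebuilt whenever it is missing.
std::string Serialize(const Bundle& bundle) {
    std::string out;
    WriteValue<int64_t>(&out, static_cast<int64_t>(bundle.size()));
    for (const auto& [snapshot_id, entries] : bundle) {
        WriteValue<int64_t>(&out, snapshot_id);
        WriteValue<int64_t>(&out, static_cast<int64_t>(entries.size()));
        for (const auto& entry : entries) {
            WriteValue<int64_t>(&out, static_cast<int64_t>(entry.size()));
            out.append(entry);
        }
    }
    return out;
}

bool Deserialize(const std::string& in, Bundle* bundle) {
    size_t pos = 0;
    int64_t snapshots = 0;
    if (!ReadValue(in, &pos, &snapshots) || snapshots < 0) return false;
    for (int64_t i = 0; i < snapshots; ++i) {
        int64_t snapshot_id = 0;
        int64_t count = 0;
        if (!ReadValue(in, &pos, &snapshot_id)) return false;
        if (!ReadValue(in, &pos, &count) || count < 0) return false;
        std::vector<std::string> entries;
        for (int64_t j = 0; j < count; ++j) {
            int64_t len = 0;
            if (!ReadValue(in, &pos, &len) || len < 0) return false;
            if (static_cast<uint64_t>(len) > in.size() - pos) return false;
            entries.emplace_back(in, pos, static_cast<size_t>(len));
            pos += static_cast<size_t>(len);
        }
        (*bundle)[snapshot_id] = std::move(entries);
    }
    return pos == in.size();  // reject trailing bytes
}
```

If a value fails to parse, the caller can treat it as a cache miss and rebuild from manifest files, which is why no compatibility header is needed in this setting.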

}
auto scan_context_result = context_builder.Finish();
EXPECT_TRUE(scan_context_result.ok()) << scan_context_result.status().ToString();
auto table_scan_result = TableScan::Create(std::move(scan_context_result).value());

Collaborator

Consider using EXPECT_OK_AND_ASSIGN.

ASSERT_EQ(supplier_calls_after_first_scan, manifest_cache->SupplierCallCount());
}

Result<std::vector<std::shared_ptr<DataSplitImpl>>> RunBucketSnapshotScan(

Collaborator

To ensure correctness across orthogonal scan scenarios, please make the relevant scan tests parameterized.

In particular, convert the scan-inte/scan-and-read-inte/data-evolution tests that cover bucket pruning, predicate filtering, PK scans, data evolution scans, row-range scans, etc. to TEST_P, and run each case with snapshot live manifest cache both disabled and enabled.

This would make sure the new snapshot manifest cache path remains semantically equivalent to the existing non-cache path across the existing scan/filter matrix.
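The equivalence being asked for can be sketched without a test framework as below. In GoogleTest the same matrix would normally be expressed with TEST_P and a boolean parameter such as ::testing::Bool(); this sketch only shows the underlying check. `ScanCase` and `FindMismatches` are hypothetical names, and a scan result is reduced to a list of file names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical scan case: a name plus a function that plans the scan with the
// snapshot live manifest cache disabled or enabled and returns the planned files.
struct ScanCase {
    std::string name;
    std::function<std::vector<std::string>(bool use_cache)> run;
};

// Runs every case with the cache off and on; returns the names of the cases
// whose two results differ.
std::vector<std::string> FindMismatches(const std::vector<ScanCase>& cases) {
    std::vector<std::string> mismatches;
    for (const auto& scan_case : cases) {
        const auto without_cache = scan_case.run(false);
        const auto with_cache = scan_case.run(true);
        if (without_cache != with_cache) {
            mismatches.push_back(scan_case.name);
        }
    }
    return mismatches;
}
```

An empty result means the cached path matched the uncached path for every case in the matrix.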
