feat(manifest): support snapshot manifest cache #386
for (const auto& [snapshot_id, entries] : entries_by_snapshot_) {
  out.WriteValue<int64_t>(snapshot_id);
  PAIMON_RETURN_NOT_OK(serializer.SerializeList(*entries, &out));
}
I’d like to understand the need for versioning in the serialization/deserialization of SnapshotLiveManifestEntries. Is the expectation that it may be persisted or transferred across module boundaries? My understanding is that it is not, since both parsing and toBytes appear to be handled entirely within SnapshotLiveManifestEntries itself.
My concern is that once we introduce a version here, any format change in a nested field of ManifestEntry (for example, DataFileMeta) could also require bumping the outer SnapshotLiveManifestEntries version to preserve cross-language compatibility. Right now, we carry this kind of version information only for protocols such as split and commit message, so I want to confirm whether the version here is actually necessary.
Yes, you are right. SnapshotLiveManifestEntries is purely in-memory state, and there is no multi-version issue during its lifecycle. I will remove the versioning logic.
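The outcome of this thread can be illustrated with a minimal sketch of an unversioned layout for an in-memory map from snapshot ID to entries. Entry, EntriesBySnapshot, Serialize, and Deserialize are hypothetical stand-ins, not paimon-cpp's ManifestEntry or serializer API; the sketch only shows that bytes which never leave the process need no version header, just a layout that the same binary both writes and reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ManifestEntry; the real type carries far more fields.
struct Entry {
  std::string file_name;
  int64_t row_count = 0;
  bool operator==(const Entry& other) const {
    return file_name == other.file_name && row_count == other.row_count;
  }
};

using EntriesBySnapshot = std::map<int64_t, std::vector<Entry>>;

// Host-endian raw copy: acceptable only because the bytes never leave the process.
inline void PutI64(std::string* out, int64_t v) {
  char buf[sizeof(v)];
  std::memcpy(buf, &v, sizeof(v));
  out->append(buf, sizeof(v));
}

// Reads one int64_t at *pos; returns false if fewer than 8 bytes remain.
inline bool GetI64(const std::string& in, std::size_t* pos, int64_t* v) {
  if (in.size() - *pos < sizeof(*v)) return false;
  std::memcpy(v, in.data() + *pos, sizeof(*v));
  *pos += sizeof(*v);
  return true;
}

// Layout: [snapshot count] then per snapshot [id][entry count][entries...].
// No leading version field: reader and writer always ship in the same binary.
inline std::string Serialize(const EntriesBySnapshot& entries_by_snapshot) {
  std::string out;
  PutI64(&out, static_cast<int64_t>(entries_by_snapshot.size()));
  for (const auto& [snapshot_id, entries] : entries_by_snapshot) {
    PutI64(&out, snapshot_id);
    PutI64(&out, static_cast<int64_t>(entries.size()));
    for (const Entry& e : entries) {
      PutI64(&out, static_cast<int64_t>(e.file_name.size()));
      out.append(e.file_name);
      PutI64(&out, e.row_count);
    }
  }
  return out;
}

// Returns std::nullopt on truncated or trailing bytes instead of crashing.
inline std::optional<EntriesBySnapshot> Deserialize(const std::string& in) {
  std::size_t pos = 0;
  int64_t snapshot_count = 0;
  if (!GetI64(in, &pos, &snapshot_count)) return std::nullopt;
  EntriesBySnapshot result;
  for (int64_t i = 0; i < snapshot_count; ++i) {
    int64_t snapshot_id = 0;
    int64_t entry_count = 0;
    if (!GetI64(in, &pos, &snapshot_id) || !GetI64(in, &pos, &entry_count)) {
      return std::nullopt;
    }
    std::vector<Entry>& entries = result[snapshot_id];
    for (int64_t j = 0; j < entry_count; ++j) {
      int64_t name_len = 0;
      if (!GetI64(in, &pos, &name_len) || name_len < 0 ||
          static_cast<uint64_t>(name_len) > in.size() - pos) {
        return std::nullopt;
      }
      Entry e;
      e.file_name = in.substr(pos, static_cast<std::size_t>(name_len));
      pos += static_cast<std::size_t>(name_len);
      if (!GetI64(in, &pos, &e.row_count)) return std::nullopt;
      entries.push_back(std::move(e));
    }
  }
  if (pos != in.size()) return std::nullopt;
  return result;
}
```

Because such bytes are never persisted, a layout change only needs Serialize and Deserialize to be updated together; nested types can evolve without any outer version bump.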
}
auto scan_context_result = context_builder.Finish();
EXPECT_TRUE(scan_context_result.ok()) << scan_context_result.status().ToString();
auto table_scan_result = TableScan::Create(std::move(scan_context_result).value());
Consider using EXPECT_OK_AND_ASSIGN.
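The suggestion replaces an explicit ok() check followed by value() with a single macro call. The sketch below shows the general pattern with a hypothetical Result<T> and a hypothetical CHECK_OK_AND_ASSIGN macro; it does not reproduce paimon-cpp's EXPECT_OK_AND_ASSIGN, whose exact failure semantics are not shown in this thread. This version reports the error and returns false from the enclosing function.

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>
#include <utility>

// Hypothetical minimal Result<T>; paimon-cpp has its own Result/Status types.
template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)) {}  // implicit, like `return v;`
  static Result Error(std::string message) {
    Result r;
    r.error_ = std::move(message);
    return r;
  }
  bool ok() const { return value_.has_value(); }
  const std::string& error() const { return error_; }
  T&& MoveValue() && { return std::move(*value_); }

 private:
  Result() = default;
  std::optional<T> value_;
  std::string error_;
};

#define SKETCH_CONCAT_INNER(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_INNER(a, b)

// Evaluates `rexpr` once; on error, prints it and returns false from the
// enclosing function; on success, moves the value into `lhs`.
#define CHECK_OK_AND_ASSIGN(lhs, rexpr)                                        \
  auto SKETCH_CONCAT(sketch_result_, __LINE__) = (rexpr);                      \
  if (!SKETCH_CONCAT(sketch_result_, __LINE__).ok()) {                         \
    std::cerr << "not ok: " << SKETCH_CONCAT(sketch_result_, __LINE__).error() \
              << "\n";                                                         \
    return false;                                                              \
  }                                                                            \
  lhs = std::move(SKETCH_CONCAT(sketch_result_, __LINE__)).MoveValue()

// Hypothetical fallible call standing in for TableScan::Create().
inline Result<int> ParsePositive(int v) {
  if (v <= 0) return Result<int>::Error("expected a positive value");
  return v;
}

// One macro line replaces the ok() check plus the separate value() call.
inline bool Doubled(int input, int* out) {
  CHECK_OK_AND_ASSIGN(auto value, ParsePositive(input));
  *out = value * 2;
  return true;
}
```

Besides being shorter, the macro evaluates the expression exactly once, so the error message and the assigned value always come from the same result.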
ASSERT_EQ(supplier_calls_after_first_scan, manifest_cache->SupplierCallCount());
}

Result<std::vector<std::shared_ptr<DataSplitImpl>>> RunBucketSnapshotScan(
To ensure correctness across orthogonal scan scenarios, please make the relevant scan tests parameterized.
In particular, convert the scan-inte/scan-and-read-inte/data-evolution tests that cover bucket pruning, predicate filtering, PK scans, data evolution scans, row-range scans, etc. to TEST_P, and run each case with snapshot live manifest cache both disabled and enabled.
This would ensure that the new snapshot live manifest cache path stays semantically equivalent to the existing non-cache path across the whole scan/filter matrix.
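A framework-free sketch of the requested equivalence check: every scan case runs once with the cache disabled and once with it enabled, and any case whose results differ is reported. ScanCase, ScanFn, and FindCacheMismatches are hypothetical names; a real suite would call the actual scan APIs and compare the produced splits.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical scan scenario; the real matrix covers bucket pruning, predicate
// filtering, PK scans, data evolution scans, row-range scans, and so on.
struct ScanCase {
  std::string name;
  int target_bucket;  // -1 means "no bucket filter" (hypothetical field)
};

// A scan under test returns the data files it selected for a case.
using ScanFn =
    std::function<std::vector<std::string>(const ScanCase&, bool cache_enabled)>;

// Runs every case with the cache disabled and enabled and returns the names of
// cases whose results differ; an empty result means the two paths agree.
inline std::vector<std::string> FindCacheMismatches(
    const std::vector<ScanCase>& cases, const ScanFn& scan) {
  std::vector<std::string> mismatches;
  for (const ScanCase& c : cases) {
    if (scan(c, /*cache_enabled=*/false) != scan(c, /*cache_enabled=*/true)) {
      mismatches.push_back(c.name);
    }
  }
  return mismatches;
}
```

With GoogleTest, the same matrix is usually expressed by deriving the fixture from ::testing::TestWithParam<bool>, writing each case with TEST_P, and instantiating the suite with INSTANTIATE_TEST_SUITE_P(..., ::testing::Bool()), so every case runs for both values of the cache flag.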
Purpose
In our production workload, a Paimon table may have about 60k buckets. A batch of data is imported roughly every 15 minutes, and the interval may become longer. During query planning, one scan read and decoded about 4.89 million manifest entries with 16 threads, which took around 30 seconds, while only a small subset of entries was finally kept after pruning.
This patch introduces a snapshot-level live manifest cache to reduce the cost of repeatedly reading and decoding manifests. The cache stores the merged live manifest entries of each cached snapshot. When scanning a newer snapshot, paimon-cpp looks up the latest cached snapshot whose ID is not greater than the target snapshot ID. If that is the target snapshot itself, the scan uses the cached entries directly; otherwise, paimon-cpp builds the target snapshot's live entries incrementally by reading the delta manifests of the intermediate snapshots.
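The lookup described above amounts to a floor search over cached snapshot IDs followed by a replay of deltas. The sketch below models a snapshot's live entries as a set of data file names and each delta manifest as ADD/DELETE entries; Kind, DeltaEntry, LiveFiles, and BuildLiveFiles are hypothetical, and the sketch assumes consecutive snapshot IDs with every intermediate delta available.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified model: a delta entry adds or deletes one data file.
enum class Kind { kAdd, kDelete };
struct DeltaEntry {
  Kind kind;
  std::string file;
};

using LiveFiles = std::set<std::string>;

// Applies one snapshot's delta entries: ADD inserts a file, DELETE erases it.
inline void ApplyDelta(const std::vector<DeltaEntry>& delta, LiveFiles* live) {
  for (const DeltaEntry& e : delta) {
    if (e.kind == Kind::kAdd) {
      live->insert(e.file);
    } else {
      live->erase(e.file);
    }
  }
}

// Finds the latest cached snapshot <= target. If it equals target, reuse it;
// otherwise replay the deltas of snapshots (base, target]. Returns nullopt when
// no usable base or delta exists, leaving a full manifest read as the fallback.
inline std::optional<LiveFiles> BuildLiveFiles(
    const std::map<int64_t, LiveFiles>& cache,
    const std::map<int64_t, std::vector<DeltaEntry>>& deltas, int64_t target) {
  auto it = cache.upper_bound(target);  // first cached snapshot > target
  if (it == cache.begin()) return std::nullopt;  // nothing cached <= target
  --it;                                          // latest cached snapshot <= target
  LiveFiles live = it->second;
  if (it->first == target) return live;
  for (int64_t id = it->first + 1; id <= target; ++id) {  // assumes consecutive IDs
    auto d = deltas.find(id);
    if (d == deltas.end()) return std::nullopt;  // missing intermediate delta
    ApplyDelta(d->second, &live);
  }
  return live;
}
```

Calling upper_bound and stepping back once finds the largest cached ID not greater than the target in logarithmic time, so the lookup cost stays small even with several cached snapshots.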
The implementation reuses the existing byte-oriented Cache interface:
- cache kind: CacheKind::SNAPSHOT_LIVE_MANIFEST;
- logical cache key: table_path#branch;
- cache size limit: scan.manifest-entry-cache.max-snapshots;
- activation: only when a cache is provided through ScanContextBuilder::WithCache() and scan.manifest-entry-cache.max-snapshots is greater than 0.

This trades an in-memory serialize/deserialize round trip for avoiding remote manifest reads and manifest decoding. In the expected case, live entries are far fewer than all historical manifest entries; even in a conservative case, the cache pays off as long as deserializing the cached live entries is cheaper than reading manifest files from remote storage and decoding all required manifest entries again.
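The cache key and the snapshot limit can be sketched as follows. MakeCacheKey follows the table_path#branch key stated above; BoundedSnapshotStore is hypothetical and assumes that scan.manifest-entry-cache.max-snapshots caps the number of snapshots retained per key and that the oldest snapshot ID is evicted first. Values are kept as opaque bytes, matching the byte-oriented Cache interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Logical cache key as described in the PR: table_path#branch.
inline std::string MakeCacheKey(const std::string& table_path,
                                const std::string& branch) {
  return table_path + "#" + branch;
}

// Hypothetical per-key store keeping at most max_snapshots serialized snapshots;
// evicting the smallest (oldest) snapshot ID first is an assumption.
class BoundedSnapshotStore {
 public:
  explicit BoundedSnapshotStore(int64_t max_snapshots)
      : max_snapshots_(max_snapshots) {}

  void Put(const std::string& key, int64_t snapshot_id, std::string bytes) {
    auto& snapshots = by_key_[key];
    snapshots[snapshot_id] = std::move(bytes);
    while (static_cast<int64_t>(snapshots.size()) > max_snapshots_) {
      snapshots.erase(snapshots.begin());  // std::map is ordered: oldest ID first
    }
  }

  // Snapshot IDs currently held for a key, in ascending order.
  std::vector<int64_t> SnapshotIds(const std::string& key) const {
    std::vector<int64_t> ids;
    auto it = by_key_.find(key);
    if (it == by_key_.end()) return ids;
    for (const auto& entry : it->second) ids.push_back(entry.first);
    return ids;
  }

 private:
  int64_t max_snapshots_;
  std::map<std::string, std::map<int64_t, std::string>> by_key_;
};
```

Keying by table path plus branch keeps the snapshot histories of different branches of the same table apart, since snapshot IDs are only meaningful within one branch.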
Tests
cmake --build build --target paimon-core-test -j2
./build/release/paimon-core-test --gtest_filter='FileStoreScanTest.TestSnapshotLiveManifestCache:FileStoreScanTest.TestSnapshotLiveManifestCacheUsesCacheKey:TableScanTest.*:CoreOptionsTest.TestDefaultValue:CoreOptionsTest.TestFromMap:CoreOptionsTest.TestInvalidCase'
git -c filter.lfs.process= -c filter.lfs.clean=cat -c filter.lfs.required=false diff --check

API and Format
This change extends the public cache kind enum with CacheKind::SNAPSHOT_LIVE_MANIFEST.

It removes the separate scan.manifest-entry-cache.enabled option. The cache is controlled by scan.manifest-entry-cache.max-snapshots alone: values greater than 0 enable the cache path when a cache is provided through WithCache(), and 0 disables it.

It does not change the table storage format, file format, or network protocol. The serialized snapshot live manifest bundle is an in-memory cache value only and can be rebuilt from manifest files if it is absent or evicted.
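With the separate enabled option gone, a single integer decides whether the cache path is used. The sketch below parses scan.manifest-entry-cache.max-snapshots from a string option map and applies that rule. ParseMaxSnapshots and UseSnapshotLiveManifestCache are hypothetical, rejecting negative values is an assumption, and default_value stands in for the real default, which the PR does not state.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Returns the configured limit, default_value when the key is absent, or
// std::nullopt for malformed input or (by assumption) negative values.
inline std::optional<int64_t> ParseMaxSnapshots(
    const std::map<std::string, std::string>& options, int64_t default_value) {
  auto it = options.find("scan.manifest-entry-cache.max-snapshots");
  if (it == options.end()) return default_value;
  const std::string& text = it->second;
  int64_t value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  auto [ptr, ec] = std::from_chars(first, last, value);
  // Reject parse errors, trailing characters, and negative numbers.
  if (ec != std::errc() || ptr != last || value < 0) return std::nullopt;
  return value;
}

// > 0 enables the cache path, but only if a cache was supplied via WithCache();
// 0 disables it.
inline bool UseSnapshotLiveManifestCache(bool cache_provided,
                                         int64_t max_snapshots) {
  return cache_provided && max_snapshots > 0;
}
```

Folding the switch into the limit removes the contradictory combination of an enabled cache with a zero-snapshot budget.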
Documentation
Yes. Added/updated user guide documentation for snapshot live manifest cache under docs/source/user_guide/manifest_entry_cache.rst.

Generative AI tooling
Generated-by: Codex (GPT-5)