Skip to content

refactor(hummock): add owned from_persisted_protobuf to reduce memory amplification#24940

Open
Li0k wants to merge 1 commit intomainfrom
li0k/from_protobuf_owned
Open

refactor(hummock): add owned from_persisted_protobuf to reduce memory amplification#24940
Li0k wants to merge 1 commit intomainfrom
li0k/from_protobuf_owned

Conversation

@Li0k
Copy link
Contributor

@Li0k Li0k commented Mar 3, 2026

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Add From for HummockVersionCommon (owned) to move data instead of cloning during deserialization
  • Add from_persisted_protobuf_owned and from_protobuf_owned methods for HummockVersion, HummockVersionDelta, and HummockVersionCheckpoint
  • Add TableChangeLogCommon::from_protobuf_owned for owned change log conversion
  • Update all callers that pass owned/temporary protobuf values to use the new owned variants, eliminating unnecessary clones of key_range, table_ids, and stale_objects across thousands of SstableInfos

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors Hummock protobuf deserialization paths to support owned conversions, aiming to reduce memory amplification by moving large protobuf fields (e.g., SSTable key ranges / table_ids) instead of cloning during checkpoint/version loading.

Changes:

  • Add owned from_persisted_protobuf_owned / from_protobuf_owned conversion APIs for Hummock version/delta/checkpoint-related types.
  • Implement From<PbHummockVersion> for HummockVersionCommon<T> and add TableChangeLogCommon::from_protobuf_owned to enable move-based conversions.
  • Update call sites (meta snapshot load/build paths, time travel, tests, benches) to use owned conversion APIs when protobuf values are already owned/temporary.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/storage/src/hummock/store/version.rs Update test to construct HummockVersion via owned persisted-protobuf conversion.
src/storage/src/hummock/event_handler/uploader/test_utils.rs Update test helper to use owned persisted-protobuf conversion.
src/storage/hummock_sdk/src/version.rs Add owned persisted-protobuf conversion entrypoints and owned From<PbHummockVersion> conversion.
src/storage/hummock_sdk/src/change_log.rs Add owned change log conversion helper (from_protobuf_owned).
src/storage/benches/bench_table_watermarks.rs Switch benchmark setup to owned persisted-protobuf conversion.
src/storage/backup/src/meta_snapshot_v2.rs Decode snapshot metadata using owned HummockVersion conversion.
src/storage/backup/src/meta_snapshot_v1.rs Decode snapshot metadata using owned HummockVersion conversion.
src/meta/src/hummock/manager/time_travel.rs Use owned persisted-protobuf conversions when to_protobuf() already yields owned protobuf values.
src/meta/src/hummock/manager/mod.rs Use owned persisted-protobuf conversion for deltas loaded from DB models.
src/meta/src/hummock/manager/checkpoint.rs Add owned checkpoint conversion and use it when reading checkpoints from object store.
src/meta/src/backup_restore/meta_snapshot_builder.rs Convert version deltas using the owned persisted-protobuf API.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +624 to +628
/// Convert an owned `PbHummockVersionDelta` deserialized from persisted state to
/// `HummockVersionDelta`, moving data instead of cloning.
pub fn from_persisted_protobuf_owned(delta: PbHummockVersionDelta) -> Self {
delta.into()
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_persisted_protobuf_owned currently just calls delta.into(), but the underlying From<PbHummockVersionDelta> conversion still clones change_log_delta (log_delta.new_log.clone() + iterating by reference). That means the new owned API won’t fully eliminate the intended cloning/memory amplification for deltas. Consider changing the From<PbHummockVersionDelta> impl to consume pb_version_delta.change_log_delta via into_iter() and convert each PbChangeLogDelta with its owned From<PbChangeLogDelta> impl (in change_log.rs), avoiding the clone() entirely.

Copilot uses AI. Check for mistakes.
@Li0k Li0k requested a review from zwang28 March 5, 2026 13:12
… amplification

- Add From<PbHummockVersion> for HummockVersionCommon<T> (owned) to move
  data instead of cloning during deserialization
- Add from_persisted_protobuf_owned and from_protobuf_owned methods for
  HummockVersion, HummockVersionDelta, and HummockVersionCheckpoint
- Add TableChangeLogCommon::from_protobuf_owned for owned change log conversion
- Update all callers that pass owned/temporary protobuf values to use
  the new owned variants, eliminating unnecessary clones of key_range,
  table_ids, and stale_objects across thousands of SstableInfos
@Li0k Li0k force-pushed the li0k/from_protobuf_owned branch from 0701220 to b4a42fd Compare March 5, 2026 14:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants