
chore: streamlined migration #429

Open
jot2re wants to merge 23 commits into main from tore/refactor/update-prss-upgrade-logic


jot2re (Collaborator) commented Feb 19, 2026

Description of changes

  • Streamlines the migration approach so that all migration tasks are carried out immediately when kms-server is started.
  • Refactors the handling of legacy FHE keys so that they are not removed immediately.
  • Changes the handling of legacy PRSS setups so that they are no longer loaded; instead, they are converted at boot and the legacy PRSS data is removed.
  • Removes the PRSS overwrite logic to avoid hiding bugs, since a PRSS should never be overwritten now that epochs have been introduced.
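A minimal sketch of the boot-time behavior listed above, assuming a flat in-memory key-value store. The path layouts, the `LegacyPrss/` prefix, `convert_legacy_prss` and `migrate_at_boot` are hypothetical illustrations rather than the kms-server storage API; only the `FheKeyInfo` and `PrssSetupCombined` names come from this PR. Legacy FHE keys are copied under an epoch but kept, while legacy PRSS entries are converted and then deleted.

```rust
use std::collections::HashMap;

/// Hypothetical flat key-value store: "<data type>/<path>" -> serialized bytes.
type Storage = HashMap<String, Vec<u8>>;

/// Hypothetical conversion from a legacy PRSS encoding to the combined one.
/// A real conversion would deserialize and re-serialize versioned structs.
fn convert_legacy_prss(legacy: &[u8]) -> Vec<u8> {
    let mut combined = b"combined:".to_vec();
    combined.extend_from_slice(legacy);
    combined
}

/// Boot-time migration sketch (hypothetical, not the real migrate_to_0_13_1).
fn migrate_at_boot(storage: &mut Storage, epoch_id: &str) {
    // Legacy FHE keys live at FheKeyInfo/<key_id>; copy them to
    // FheKeyInfo/<epoch_id>/<key_id> but keep the legacy entries.
    let legacy_keys: Vec<(String, Vec<u8>)> = storage
        .iter()
        .filter_map(|(path, data)| {
            let key_id = path.strip_prefix("FheKeyInfo/")?;
            if key_id.contains('/') {
                None // already epoch-scoped, not legacy
            } else {
                Some((key_id.to_string(), data.clone()))
            }
        })
        .collect();
    for (key_id, data) in legacy_keys {
        storage
            .entry(format!("FheKeyInfo/{epoch_id}/{key_id}"))
            .or_insert(data);
    }

    // Legacy PRSS setups are converted to the combined format, then deleted.
    let legacy_prss: Vec<String> = storage
        .keys()
        .filter(|path| path.starts_with("LegacyPrss/"))
        .cloned()
        .collect();
    for path in legacy_prss {
        if let Some(legacy) = storage.remove(&path) {
            let id = &path["LegacyPrss/".len()..];
            storage.insert(format!("PrssSetupCombined/{id}"), convert_legacy_prss(&legacy));
        }
    }
}
```

Running the function a second time leaves the store unchanged: epoch-scoped paths are never mistaken for legacy ones, and no legacy PRSS entries remain to convert.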

Issue ticket number and link

PR Checklist

I attest that all checked items are satisfied. Any deviation is clearly justified above.

  • Title follows conventional commits (e.g. chore: ...).
  • Tests added for every new pub item and test coverage has not decreased.
  • Public APIs and non-obvious logic documented; unfinished work marked as TODO(#issue).
  • unwrap/expect/panic only in tests or for invariant bugs (documented if present).
  • No dependency version changes OR (if changed) only minimal required fixes.
  • No architectural protocol changes OR linked spec PR/issue provided.
  • No breaking deployment config changes OR devops label + infra notified + infra-team reviewer assigned.
  • No breaking gRPC / serialized data changes OR commit marked with ! and affected teams notified.
  • No modifications to existing versionized structs OR backward compatibility tests updated.
  • No critical business logic / crypto changes OR ≥2 reviewers assigned.
  • No new sensitive data fields added OR Zeroize + ZeroizeOnDrop implemented.
  • No new public storage data OR data is verifiable (signature / digest).
  • No unsafe; if unavoidable: minimal, justified, documented, and test/fuzz covered.
  • Strongly typed boundaries: typed inputs validated at the edge; no untyped values or errors cross modules.
  • Self-review completed.

Dependency Update Questionnaire (only if deps changed or added)

Answer in the Cargo.toml next to the dependency (or here if updating):

  1. Ownership changes or suspicious concentration?
  2. Low popularity?
  3. Unusual version jump?
  4. Lacking documentation?
  5. Missing CI?
  6. No security / disclosure policy?
  7. Significant size increase?

More details and explanations for the checklist and dependency updates can be found in CONTRIBUTING.md.

cla-bot added the cla-signed label (The CLA has been signed.) Feb 19, 2026

github-actions bot commented Feb 19, 2026

Consolidated Tests Results 2026-02-25 - 17:22:59

Test Results

14 passed

Details

Tests: 14
Duration: not captured
Tool: junit-to-ctrf
Build: build-and-test → test-reporter (#590)
Pull request: chore: streamlined migration (#429)

test-reporter: Run #590

Tests 📝 | Passed ✅ | Failed ❌ | Skipped ⏭️ | Pending ⏳ | Other ❓ | Flaky 🍂 | Duration ⏱️
14 | 14 | 0 | 0 | 0 | 0 | 0 | not captured

🎉 All tests passed!

Tests

Test Name | Duration
full_gen_tests_k8s_default_threshld_sequential_crs | 33.0s
test_k8s_threshld_insecure | 3m 14s
k8s_test_crs_uniqueness | 33.0s
k8s_test_keygen_and_crs | 3m 15s
k8s_test_keygen_uniqueness | 8m 53s
full_gen_tests_k8s_default_threshld_sequential_crs | 32.8s
test_k8s_threshld_insecure | 3m 15s
k8s_test_crs_uniqueness | 33.1s
k8s_test_keygen_and_crs | 3m 13s
k8s_test_keygen_uniqueness | 8m 57s
full_gen_tests_k8s_default_centralzd_sequential_crs | 1.8s
test_k8s_centralzd_insecure | 1m 40s
full_gen_tests_default_k8s_centralized_sequential_crs | 1.8s
k8s_test_centralized_insecure | 1m 1s

🍂 No flaky tests in this run.

Github Test Reporter by CTRF 💚


jot2re changed the title from "refactor: streamlined upgrade" to "chore: streamlined migration" Feb 19, 2026
jot2re requested review from Copilot February 19, 2026 13:04

Copilot AI (Contributor) left a comment


Pull request overview

This PR streamlines the migration approach for the KMS server by consolidating migration logic to run immediately at startup when calling kms-server. The changes refactor how legacy FHE keys and PRSS (Pseudo-Random Secret Sharing) setups are handled during migration from version 0.12.x to 0.13.1.

Changes:

  • Introduced migrate_to_0_13_1() function that consolidates FHE key and PRSS migration at server startup
  • Refactored FHE key migration to preserve legacy data instead of immediately deleting it, with cleanup deferred to version 0.14.0
  • Changed PRSS migration to convert legacy format to new combined format at boot and delete legacy data immediately
  • Removed PRSS overwrite logic in epoch_manager to prevent silently hiding bugs
  • Moved init_all_prss_from_storage() call to after PRSS initialization in the startup sequence

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • core/service/src/bin/kms-server.rs: Updated to call the new consolidated migration function at startup with threshold and num_parties parameters
  • core/service/src/engine/migration.rs: Added migrate_to_0_13_1() and migrate_to_0_14_0() functions; refactored FHE key migration to not delete legacy data immediately; moved PRSS migration logic from epoch_manager; added comprehensive tests
  • core/service/src/engine/threshold/service/epoch_manager.rs: Removed init_legacy_prss_from_storage() method and PRSS overwrite checking logic; updated documentation
  • core/service/src/engine/threshold/service/kms_impl.rs: Reordered initialization to call init_all_prss_from_storage() after conditional PRSS creation
  • core/service/src/engine/threshold/service/session.rs: Added #[allow(dead_code)] to the num_parties() method, which is no longer used after the refactoring
Comments suppressed due to low confidence (1)

core/service/src/engine/threshold/service/epoch_manager.rs:330

  • The removal of the PRSS overwrite check, combined with the storage layer's non-overwriting behavior, creates a potential inconsistency. If init_prss is called twice with the same epoch_id: (1) the first call generates a PRSS and stores it on disk; (2) the second call generates a different PRSS and attempts to store it, which returns Ok without writing anything (lines 284-289 in file.rs), and then adds this new PRSS to session_maker (line 330), so the in-memory PRSS differs from the one on disk. The comment "Ensure data can be stored before updating the model in ram" (line 321) suggests the intent is to prevent exactly this, but store_versioned_at_request_id returns Ok() even when it skips the write. Consider either (1) checking whether data already exists before running the expensive PRSS protocol, or (2) making store_versioned_at_request_id return an error when data already exists, at least for this critical use case.
```rust
// Ensure data can be stored before updating the model in ram
store_versioned_at_request_id(
    &mut (*priv_storage),
    &(*epoch_id).into(),
    &prss,
    &PrivDataType::PrssSetupCombined.to_string(),
)
.await?;

session_maker.add_epoch(*epoch_id, prss).await;
```
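A sketch of option (2), assuming a hypothetical in-memory store whose write refuses to overwrite instead of silently returning Ok. `StrictStorage`, `store_new`, `SessionMaker`, `AlreadyExists` and the `PrssSetupCombined/<epoch_id>` path are illustrative stand-ins for store_versioned_at_request_id and session_maker, not the project's types. The point is that the in-memory model is only updated after a write that actually happened.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error: data already exists at the requested location.
#[derive(Debug, PartialEq)]
struct AlreadyExists(String);

impl fmt::Display for AlreadyExists {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "data already exists at {}", self.0)
    }
}

impl std::error::Error for AlreadyExists {}

/// Hypothetical storage that errors on overwrite instead of skipping silently.
#[derive(Default)]
struct StrictStorage {
    data: HashMap<String, Vec<u8>>,
}

impl StrictStorage {
    fn store_new(&mut self, path: &str, bytes: Vec<u8>) -> Result<(), AlreadyExists> {
        if self.data.contains_key(path) {
            return Err(AlreadyExists(path.to_string()));
        }
        self.data.insert(path.to_string(), bytes);
        Ok(())
    }
}

/// Hypothetical in-memory PRSS registry, one entry per epoch.
#[derive(Default)]
struct SessionMaker {
    epochs: HashMap<String, Vec<u8>>,
}

/// Persist first; update the in-memory model only if the write happened.
fn init_prss(
    storage: &mut StrictStorage,
    sessions: &mut SessionMaker,
    epoch_id: &str,
    prss: Vec<u8>,
) -> Result<(), AlreadyExists> {
    let path = format!("PrssSetupCombined/{epoch_id}");
    storage.store_new(&path, prss.clone())?;
    sessions.epochs.insert(epoch_id.to_string(), prss);
    Ok(())
}
```

With this behavior, a second init_prss for the same epoch fails loudly, and both the stored and the in-memory PRSS remain the first one.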


Copilot AI (Contributor) left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

jot2re and others added 5 commits February 19, 2026 14:23
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.



jot2re marked this pull request as ready for review February 19, 2026 14:18
jot2re requested a review from a team as a code owner February 19, 2026 14:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
kc1212 (Contributor) left a comment


Thanks for this. I think that, in addition to all this, we need to check that our storage backends work properly with the two FheKeyInfo storage patterns.

Could you add some tests, for all the storage backends, checking that operations such as finding all epoch IDs for FheKeyInfo only return the epoch IDs that match this pattern:
FheKeyInfo/<epoch_id>/<key_id>

and do not return the <key_id> as an epoch ID for this pattern: FheKeyInfo/<key_id>?
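The requested pattern check could look roughly like the following sketch, which works on plain path strings rather than a real storage backend; `epoch_ids_from_paths` is a hypothetical helper, not an existing function. Only paths with exactly three segments, `<data_type>/<epoch_id>/<key_id>`, contribute an epoch ID, so legacy `<data_type>/<key_id>` entries are skipped.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: collect epoch IDs from paths under `data_type`.
/// Only `<data_type>/<epoch_id>/<key_id>` counts; the legacy
/// `<data_type>/<key_id>` layout is ignored so that a key ID is never
/// reported as an epoch ID.
fn epoch_ids_from_paths<'a, I>(paths: I, data_type: &str) -> BTreeSet<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let prefix = format!("{data_type}/");
    paths
        .into_iter()
        .filter_map(|path| path.strip_prefix(prefix.as_str()))
        .filter_map(|rest| {
            let mut segments = rest.split('/');
            match (segments.next(), segments.next(), segments.next()) {
                (Some(epoch), Some(key), None) if !epoch.is_empty() && !key.is_empty() => {
                    Some(epoch.to_string())
                }
                _ => None, // legacy layout, deeper paths, or empty segments
            }
        })
        .collect()
}
```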

kc1212 (Contributor) commented Feb 19, 2026

More concretely, functions such as all_epoch_ids_for_data perform some kind of iteration that depends on the paths. These should all be tested when there is a mixture of storage patterns.
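One way to run every backend against the same mixed layout is a generic check over a small storage trait. This sketch assumes two hypothetical backends: an object-store style one that lists full keys by prefix, and a filesystem style one where a data-type directory holds legacy key files next to epoch subdirectories. Neither `Backend` nor its methods are the project's real storage interface; the trait method merely borrows the name all_epoch_ids_for_data from this thread.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical minimal storage interface for the sketch.
trait Backend: Default {
    fn put(&mut self, path: &str, bytes: Vec<u8>);
    fn all_epoch_ids_for_data(&self, data_type: &str) -> BTreeSet<String>;
}

/// Object-store style: full paths as keys, listing by prefix.
#[derive(Default)]
struct FlatBackend(BTreeMap<String, Vec<u8>>);

impl Backend for FlatBackend {
    fn put(&mut self, path: &str, bytes: Vec<u8>) {
        self.0.insert(path.to_string(), bytes);
    }
    fn all_epoch_ids_for_data(&self, data_type: &str) -> BTreeSet<String> {
        let prefix = format!("{data_type}/");
        self.0
            .keys()
            .filter_map(|k| k.strip_prefix(prefix.as_str()))
            // An epoch ID must be followed by a further segment (the key ID).
            .filter_map(|rest| rest.split_once('/').map(|(epoch, _)| epoch.to_string()))
            .collect()
    }
}

/// Filesystem style: legacy key files and epoch subdirectories side by side.
#[allow(dead_code)]
enum Entry {
    File(Vec<u8>),
    Dir(BTreeMap<String, Vec<u8>>),
}

#[derive(Default)]
struct TreeBackend(BTreeMap<String, BTreeMap<String, Entry>>);

impl Backend for TreeBackend {
    fn put(&mut self, path: &str, bytes: Vec<u8>) {
        let parts: Vec<&str> = path.split('/').collect();
        let dir = self.0.entry(parts[0].to_string()).or_default();
        match parts.as_slice() {
            [_, key_id] => {
                dir.insert(key_id.to_string(), Entry::File(bytes));
            }
            [_, epoch_id, key_id] => {
                let entry = dir
                    .entry(epoch_id.to_string())
                    .or_insert_with(|| Entry::Dir(BTreeMap::new()));
                // In this sketch, a name already taken by a legacy file is left untouched.
                if let Entry::Dir(files) = entry {
                    files.insert(key_id.to_string(), bytes);
                }
            }
            _ => panic!("path layout not supported by this sketch: {path}"),
        }
    }
    fn all_epoch_ids_for_data(&self, data_type: &str) -> BTreeSet<String> {
        self.0
            .get(data_type)
            .map(|dir| {
                dir.iter()
                    .filter(|(_, entry)| matches!(entry, Entry::Dir(_)))
                    .map(|(name, _)| name.clone())
                    .collect()
            })
            .unwrap_or_default()
    }
}

/// Write both FheKeyInfo layouts to a fresh backend and list the epoch IDs.
fn epoch_ids_with_mixed_patterns<B: Backend>() -> BTreeSet<String> {
    let mut backend = B::default();
    backend.put("FheKeyInfo/legacy-key", vec![0]);
    backend.put("FheKeyInfo/epoch-1/key-a", vec![1]);
    backend.put("FheKeyInfo/epoch-2/key-b", vec![2]);
    backend.all_epoch_ids_for_data("FheKeyInfo")
}
```

The two backends recognize epochs differently (a further path segment versus a subdirectory), which is why the same mixed-pattern data should be pushed through each of them.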

jot2re (Collaborator, Author) commented Feb 24, 2026

@kc1212 additional tests added in 91e897b

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.



kc1212 (Contributor) left a comment


I have just one concern, around the removal of the PRSS existence check. The rest all looks OK.
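For reference, Copilot's option (1), checking for an existing PRSS before running the protocol, could be sketched as below. The storage map, the `PrssSetupCombined/<epoch_id>` key, `PrssInit` and `init_prss_if_absent` are hypothetical; `run_protocol` stands in for the expensive interactive PRSS generation.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of PRSS initialization for an epoch.
#[derive(Debug, PartialEq)]
enum PrssInit {
    Created,
    AlreadyPresent,
}

/// Check storage before running the (expensive, interactive) PRSS protocol,
/// so a repeated call neither wastes work nor produces a second, different PRSS.
fn init_prss_if_absent<F>(
    storage: &mut HashMap<String, Vec<u8>>,
    epoch_id: &str,
    run_protocol: F,
) -> PrssInit
where
    F: FnOnce() -> Vec<u8>,
{
    let path = format!("PrssSetupCombined/{epoch_id}");
    if storage.contains_key(&path) {
        return PrssInit::AlreadyPresent;
    }
    storage.insert(path, run_protocol());
    PrssInit::Created
}
```

This check-then-write is not atomic, so concurrent callers for the same epoch would still need a lock or a storage layer that refuses to overwrite.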


Labels

cla-signed The CLA has been signed.


4 participants