feat: [cp2.6] implement data salvage for force failover#48527
bigsheeper wants to merge 9 commits into milvus-io:2.6
Cherry-pick from master PR milvus-io#47599 Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
…Append mock
- Remove trailing newline in internal/proxy/impl.go (gofmt)
- Remove mockBroadcastService.EXPECT().Append() call in TestForcePromoteUpdateConfigError: the Broadcast interface in 2.6 only has Ack(), not Append(), so the mock setup was invalid. The test validates early at request validation before reaching any broadcast, so the mock is unnecessary.
Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Add unit test coverage for salvage checkpoint functionality introduced in PR milvus-io#47599:
- kv_catalog_test.go: TestCatalogSalvageCheckpoint (save/get success, errors, multiple clusters) and extend TestBuildPrefixAndKey with salvage key builders
- replicate_service_test.go: TestReplicateServiceGetSalvageCheckpoint (success, error, closed_lifetime)
- manager_test.go: TestSalvageCheckpointLoadedFromEtcd and TestSalvageCheckpointMultipleForcePromotes
- salvage_checkpoint_test.go (new): TestUpdateCheckpointForcePromote and TestConsumeDirtySnapshotWithSalvageCheckpoint covering the force-promote path in recovery_storage_impl
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Covers all code paths in the DumpMessages RPC handler:
- Node health check (unhealthy → error)
- Parameter validation (missing pchannel, nil/empty start_message_id)
- Context cancellation (returns ctx.Err())
- Scanner error/done paths
- Start/end timetick filtering
- System message filtering (RollbackTxn filtered, data messages passed)
- Stream send error propagation
- Channel closed (returns nil)
Uses mock WAL via streaming.SetWALForTest with RunAndReturn to inject messages synchronously into the buffered msgCh channel.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
…eCheckpoint Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
…effects
Tests using SetWALForTest(mockWAL)/defer SetWALForTest(nil) were inadvertently resetting the streaming singleton to nil after tests in proxy_test.go had set it to noopWALAccesser via SetupNoopWALForTest(). This caused TestDeleteTask_Execute/delete_produce_failed to panic on streaming.WAL().AppendMessages() when the singleton was nil.
Fix: save and restore the previous WAL value in each test instead of unconditionally resetting to nil.
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
… no checkpoints exist
When salvageCheckpoints map is empty, make() returns an empty non-nil slice, causing require.Nil assertion in TestWAL to fail. Return nil explicitly when there are no salvage checkpoints.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Yihao Dai <yihao.dai@zilliz.com>
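The save-and-restore fix described in this commit can be sketched as follows. The names here (`walAccesser`, `namedWAL`, `current`, `swapWAL`) are hypothetical stand-ins and do not reflect the actual streaming package API; the sketch only shows the general pattern of restoring the previous singleton value rather than resetting it to nil.

```go
package main

import "fmt"

// walAccesser is a hypothetical stand-in for the streaming WAL interface.
type walAccesser interface{ Name() string }

type namedWAL string

func (n namedWAL) Name() string { return string(n) }

// current plays the role of the package-level singleton.
var current walAccesser

// swapWAL installs w and returns a function that restores whatever
// value was installed before, instead of resetting to nil.
func swapWAL(w walAccesser) (restore func()) {
	prev := current
	current = w
	return func() { current = prev }
}

func main() {
	current = namedWAL("noop") // installed by an earlier test's setup

	func() {
		restore := swapWAL(namedWAL("mock"))
		defer restore()
		fmt.Println(current.Name()) // mock
	}()

	// The earlier value survives, so later callers do not see nil.
	fmt.Println(current.Name()) // noop
}
```

With an unconditional reset to nil in the deferred call, the last line would dereference a nil interface and panic, which is the failure mode the commit message describes.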
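The nil versus empty-slice distinction behind this fix can be shown with a small sketch. The `checkpoint` type and `listCheckpoints` function are hypothetical and only illustrate the Go behavior; they are not the code from the PR.

```go
package main

import "fmt"

// checkpoint is a hypothetical stand-in for a salvage checkpoint.
type checkpoint struct {
	ClusterID string
	TimeTick  uint64
}

// listCheckpoints returns nil explicitly when the map is empty, so that
// callers comparing the result against nil see "no checkpoints".
func listCheckpoints(m map[string]checkpoint) []checkpoint {
	if len(m) == 0 {
		return nil
	}
	out := make([]checkpoint, 0, len(m))
	for _, cp := range m {
		out = append(out, cp)
	}
	return out
}

func main() {
	// make() yields an empty but non-nil slice.
	empty := make([]checkpoint, 0)
	fmt.Println(empty == nil) // false

	fmt.Println(listCheckpoints(map[string]checkpoint{}) == nil) // true

	one := listCheckpoints(map[string]checkpoint{
		"a": {ClusterID: "a", TimeTick: 1},
	})
	fmt.Println(len(one)) // 1
}
```

An assertion such as testify's `require.Nil` treats a nil slice as nil but fails on an empty non-nil slice, which is why the early return matters.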
Cherry-pick from master
pr: #47599
design doc: https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/20260205-data_salvage_for_force_failover.md
Summary
Cherry-picked from master PR #47599 (open - preemptive backport)
Note: Original PR is still open. This CP PR should be updated if the original PR changes.
Verification