fix: vector clustering key compaction task stuck in analyzing state #48529
xiaocai2333 wants to merge 4 commits into milvus-io:master
…ilvus-io#47540)
When FloatVector is used as a clustering key, the compaction task gets stuck because the analyzing state was not mapped in FromCompactionState, causing the global scheduler to drop the task. After processAnalyzing transitions the state back to pipelining, no one re-schedules doCompact. Fix by mapping analyzing to InProgress so the scheduler keeps the task in runningTasks, and guarding QueryTaskOnWorker to skip DataNode queries while the task is still analyzing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
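The mapping fix described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual Milvus code: the `CompactionState` and `SchedulerState` types, their constant names, and every mapping other than analyzing to InProgress are assumptions made for the example.

```go
package main

import "fmt"

// CompactionState and SchedulerState are hypothetical stand-ins for the
// real enums; only the state names mentioned in the PR are modelled.
type CompactionState int

const (
	Pipelining CompactionState = iota
	Analyzing
	Executing
	Completed
)

type SchedulerState int

const (
	None SchedulerState = iota // unmapped: the scheduler drops the task
	Init
	InProgress
	Finished
)

// FromCompactionState maps a compaction state to a scheduler state.
// Before the fix, Analyzing would have fallen through to the default
// branch and returned None. The other mappings are assumed.
func FromCompactionState(s CompactionState) SchedulerState {
	switch s {
	case Pipelining:
		return Init
	case Analyzing, Executing:
		return InProgress
	case Completed:
		return Finished
	default:
		return None
	}
}

func main() {
	// Reports whether an analyzing task is kept as in-progress.
	fmt.Println(FromCompactionState(Analyzing) == InProgress)
}
```

Grouping Analyzing with Executing in one case keeps the scheduler's view simple: both are states in which the task is alive and must stay in runningTasks.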
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: xiaocai2333
…uest
The analyzeTask.CreateTaskOnWorker was sending an incomplete AnalyzeRequest with empty SegmentStats, Dim=0, and missing clustering configuration params. Port the logic from 2.5's PreCheck to populate segment binlog IDs, extract vector dimension from schema TypeParams, calculate NumClusters, and set all clustering compaction configuration parameters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
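As an illustration of the "extract vector dimension from schema TypeParams" step, here is a small sketch. The `KeyValuePair` struct, the `dimFromTypeParams` helper, and the `"dim"` key name are assumptions for the example, not the code in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// KeyValuePair is a hypothetical stand-in for one TypeParams entry.
type KeyValuePair struct {
	Key   string
	Value string
}

// dimFromTypeParams returns the vector dimension stored under the
// assumed "dim" key, or an error if it is missing or not positive.
func dimFromTypeParams(params []KeyValuePair) (int64, error) {
	for _, kv := range params {
		if kv.Key != "dim" {
			continue
		}
		dim, err := strconv.ParseInt(kv.Value, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid dim %q: %w", kv.Value, err)
		}
		if dim <= 0 {
			return 0, fmt.Errorf("dim must be positive, got %d", dim)
		}
		return dim, nil
	}
	return 0, fmt.Errorf("dim not found in type params")
}

func main() {
	dim, err := dimFromTypeParams([]KeyValuePair{{Key: "dim", Value: "128"}})
	fmt.Println(dim, err)
}
```

Returning an error for a missing dimension, rather than silently using 0, avoids sending the kind of incomplete request (Dim=0) that this commit fixes.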
…ycle
Verify the full state machine: pipelining → analyzing → pipelining (with AnalyzeVersion) → executing. Covers the three fixes from milvus-io#47540:
- FromCompactionState(analyzing) returns InProgress
- QueryTaskOnWorker skips when analyzing
- CreateTaskOnWorker calls doCompact after AnalyzeVersion is set
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
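The lifecycle this test verifies can be modelled with a toy state machine. The sketch below is only a simplified picture of the transitions named above; the `clusteringTask` type, its `step` method, and the use of a non-zero `analyzeVersion` to mean "analysis finished" are assumptions for illustration.

```go
package main

import "fmt"

type State string

const (
	Pipelining State = "pipelining"
	Analyzing  State = "analyzing"
	Executing  State = "executing"
)

// clusteringTask is a hypothetical, heavily simplified task model.
type clusteringTask struct {
	state          State
	analyzeVersion int64 // 0 means the analyze step has not completed
}

// step advances the task by one scheduling round.
func (t *clusteringTask) step(analyzeDone bool) {
	switch t.state {
	case Pipelining:
		if t.analyzeVersion == 0 {
			t.state = Analyzing // corresponds to doAnalyze
		} else {
			t.state = Executing // corresponds to doCompact
		}
	case Analyzing:
		if analyzeDone {
			t.analyzeVersion = 1
			t.state = Pipelining // hand back for re-scheduling
		}
	}
}

// lifecycle records every state the task passes through.
func lifecycle() []State {
	t := &clusteringTask{state: Pipelining}
	trace := []State{t.state}
	for t.state != Executing {
		t.step(true)
		trace = append(trace, t.state)
	}
	return trace
}

func main() {
	fmt.Println(lifecycle()) // [pipelining analyzing pipelining executing]
}
```

The second visit to pipelining is the point where the original bug appeared: unless the scheduler still tracks the task, nothing calls `step` again and the task never reaches executing.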
Codecov Report
❌ Patch coverage is 51.51%.
❌ Your patch check has failed because the patch coverage (51.51%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## master #48529 +/- ##
==========================================
- Coverage 77.60% 77.58% -0.02%
==========================================
Files 2113 2111 -2
Lines 350935 350746 -189
==========================================
- Hits 272352 272138 -214
- Misses 70242 70268 +26
+ Partials 8341 8340 -1
…us-io#47540)
Add end-to-end test that creates a collection with FloatVector as clustering key, inserts data, triggers clustering compaction, and verifies the full lifecycle completes (including the analyze step).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Summary
- Map the analyzing state to InProgress in FromCompactionState so the global scheduler keeps the task in runningTasks instead of dropping it
- Guard QueryTaskOnWorker to skip DataNode queries while the task is in the analyzing state, preventing a premature state reset

Root cause
When FloatVector is used as a clustering key, CreateTaskOnWorker calls doAnalyze(), which sets the task state to analyzing. However, FromCompactionState did not map analyzing to any scheduler state and returned None. This caused the global scheduler to drop the task from both pendingTasks and runningTasks. After processAnalyzing() transitioned the state back to pipelining, nothing re-scheduled doCompact(), leaving the task stuck forever.

Test plan
- TestFromCompactionState: verifies all compaction states map correctly, including analyzing → InProgress
- TestQueryTaskOnWorkerSkipAnalyzing: verifies QueryTaskOnWorker returns immediately when the task is in the analyzing state, without querying DataNode

issue: #47540
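The QueryTaskOnWorker guard and the check that it never reaches the DataNode might look roughly like this. It is a sketch under assumptions: the `worker` interface, `countingWorker` fake, and `compactionTask` struct are invented for the example and do not mirror the real Milvus types or signatures.

```go
package main

import "fmt"

// worker is a hypothetical abstraction of the DataNode query call.
type worker interface {
	QueryCompaction(planID int64) string
}

// countingWorker is a test fake that records how often it is queried.
type countingWorker struct{ calls int }

func (w *countingWorker) QueryCompaction(planID int64) string {
	w.calls++
	return "executing"
}

// compactionTask is a hypothetical, simplified task.
type compactionTask struct {
	planID int64
	state  string
}

// QueryTaskOnWorker refreshes the task state from the worker, but
// returns early while the task is analyzing: the DataNode does not
// know the plan yet, so its answer could wrongly reset the state.
func (t *compactionTask) QueryTaskOnWorker(w worker) {
	if t.state == "analyzing" {
		return
	}
	t.state = w.QueryCompaction(t.planID)
}

func main() {
	w := &countingWorker{}
	ct := &compactionTask{planID: 1, state: "analyzing"}
	ct.QueryTaskOnWorker(w)
	fmt.Println(ct.state, w.calls) // analyzing 0
}
```

Counting calls on a fake worker is one simple way to assert "returns immediately without querying DataNode"; a mocking library's call-count assertion would serve the same purpose.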
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com