[CORE-6913] cloud/scrub: fix false positive for replaced segments #29655
oleiman wants to merge 2 commits into redpanda-data:dev
Pull request overview
This PR fixes a false positive issue in the cloud storage scrubber where compacted reuploads were incorrectly flagged as anomalies. The scrubber's manifest could become stale while GC deleted old segments, causing false positives when replacement segments existed at the same offset range but with different names/sizes.
Changes:
- Enhanced anomaly filtering to compare segment identities (names) in addition to offset ranges
- Applied the fix to both missing segment and segment metadata anomaly filtering paths
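The filtering change described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `seg`, `name_of`, `manifest`, `keep_by_range`, and `keep_by_identity` are hypothetical stand-ins, and the naming scheme is invented. The only property assumed is the one the PR relies on, namely that the segment name encodes the size, so a compacted reupload at the same offset range gets a different name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified manifest entry (not Redpanda's real type).
struct seg {
    int64_t base_offset;
    int64_t committed_offset;
    uint64_t size_bytes;
};

// Hypothetical naming scheme. The only property relied on is that the
// name encodes the size, so a compacted reupload gets a different name.
std::string name_of(const seg& s) {
    assert(s.base_offset <= s.committed_offset);
    return std::to_string(s.base_offset) + "-"
           + std::to_string(s.committed_offset) + "-"
           + std::to_string(s.size_bytes) + ".log";
}

// Current manifest, keyed by base offset.
using manifest = std::map<int64_t, seg>;

// Old check: keep the anomaly whenever the manifest still has a segment
// covering the same offset range.
bool keep_by_range(const manifest& m, const seg& missing) {
    auto it = m.find(missing.base_offset);
    return it != m.end()
           && it->second.committed_offset == missing.committed_offset;
}

// New check: additionally require that the manifest entry is the same
// object (same name) as the segment that was reported missing.
bool keep_by_identity(const manifest& m, const seg& missing) {
    auto it = m.find(missing.base_offset);
    return it != m.end()
           && it->second.committed_offset == missing.committed_offset
           && name_of(it->second) == name_of(missing);
}
```

With a manifest that holds a 512-byte replacement at offsets 100-199, a stale report for a 4096-byte segment at the same range is kept by `keep_by_range` but dropped by `keep_by_identity`, which is the false positive this PR targets.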
CI test results
- test results on build#80836
- test results on build#80905
- test results on build#80914
The scrub false-positive filter in process_anomalies() only checked whether a segment with the same offset range existed in the manifest. A compacted reupload produces a replacement segment at the same offset range but with a different name (different size). When GC deleted the old segment from cloud storage while the scrubber was still referencing a stale manifest, the filter kept the anomaly because the offset range still matched, even though the current segment at that range was a different (replacement) object that existed in cloud storage.

Compare generate_remote_segment_name() for the manifest entry and the reported-missing segment so that replacements with the same offset range but different identity are correctly recognized as false positives.

Fixes CORE-6913.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
Test for race between scrubber and compacted segment reupload:

1. Create manifest with 3 segments, remove the middle one from cloud storage so the detector reports it missing
2. Replace it in the manifest with a compacted version at the same offset range but different size_bytes
3. Assert generate_remote_segment_name() differs for the original vs compacted segment (v2/v3 names encode size)
4. Call process_anomalies() and assert the anomaly is filtered out as a false positive
5. Verify no anomalies remain after filtering

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Oren Leiman <oren.leiman@redpanda.com>
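The test steps above can be modeled in a small self-contained sketch. It does not use the real test fixtures, detector, or process_anomalies(): `seg`, `name_of`, `filter_false_positives`, and `run_scenario` are hypothetical stand-ins, and the name format is invented, assuming only that names encode the segment size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified manifest entry (not Redpanda's real type).
struct seg {
    int64_t base_offset;
    int64_t committed_offset;
    uint64_t size_bytes;
};

// Hypothetical name scheme; assumed only to encode the size.
std::string name_of(const seg& s) {
    return std::to_string(s.base_offset) + "-"
           + std::to_string(s.committed_offset) + "-"
           + std::to_string(s.size_bytes) + ".log";
}

using manifest = std::map<int64_t, seg>;

// Keep a reported-missing segment only if the manifest still holds the
// very same object (same name) at that base offset.
std::vector<seg> filter_false_positives(
  const manifest& m, const std::vector<seg>& reported) {
    std::vector<seg> kept;
    for (const auto& r : reported) {
        auto it = m.find(r.base_offset);
        if (it != m.end() && name_of(it->second) == name_of(r)) {
            kept.push_back(r);
        }
    }
    return kept;
}

// Returns the number of anomalies left after filtering.
std::size_t run_scenario() {
    // 1. Three segments; the middle one is reported missing.
    manifest m;
    m[0] = seg{0, 99, 1000};
    m[100] = seg{100, 199, 1000};
    m[200] = seg{200, 299, 1000};
    const seg original = m.at(100);
    const std::vector<seg> reported{original};

    // 2. A compacted reupload replaces it: same range, smaller size.
    const seg compacted{100, 199, 400};
    m[100] = compacted;

    // 3. The two names must differ because the size differs.
    assert(name_of(original) != name_of(compacted));

    // 4. and 5. Filter, then report how many anomalies remain.
    return filter_false_positives(m, reported).size();
}
```

In this model `run_scenario()` returns 0, since the only reported segment was replaced in the manifest by an object with a different name.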
The scrub false-positive filter in process_anomalies() only checked whether a segment with the same offset range existed in the manifest. A compacted reupload produces a replacement segment at the same offset range but with a different name (different size). When GC deleted the old segment from cloud storage while the scrubber was still referencing a stale manifest, the filter kept the anomaly because the offset range still matched—even though the current segment at that range was a different (replacement) object that existed in cloud storage.
Compare generate_remote_segment_name() for the manifest entry and the reported-missing segment so that replacements with the same offset range but different identity are correctly recognized as false positives.
Fixes CORE-6913.
Backports Required
Release Notes
Improvements