-
-
Notifications
You must be signed in to change notification settings - Fork 814
compact: fix dealing with mismatching hints (1.4-maint) #9203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compact: fix dealing with mismatching hints (1.4-maint) #9203
Conversation
…#8535 emit only a warning, but let compaction complete. after that, borg check --repair can fix the hints successfully. This is the result of a longer discussion with Antigravity AI and me: Detailed Explanation: Why Converting AssertionError to Warning is Correct ========================================================================= PROBLEM OVERVIEW ---------------- The assertion `assert segments[segment] == 0` in compact_segments() was causing borg compact to crash when segment reference counts in the hints file didn't match the actual repository state. This typically occurred after index corruption or repository recovery scenarios. ROOT CAUSE ANALYSIS ------------------- The crash happens due to a fundamental mismatch between two data structures: 1. self.segments (loaded from hints file) - Contains reference counts for each segment - Persisted to disk in the hints file - Represents the "last known state" 2. self.index (loaded from index file) - Contains mappings of object IDs to (segment, offset) pairs - Can be corrupted or lost - When corrupted, triggers auto-recovery The Problem Scenario: 1. Repository has valid data with consistent hints.N and index.N 2. Index file gets corrupted (crash, disk error, etc.) 3. Borg detects corruption and auto-recovers: - Loads hints.N (with old reference counts) - Rebuilds index by replaying segments - Commits the rebuilt index 4. State is now inconsistent IF segments were deleted/lost: - self.segments[X] = 10 (from old hints, assumes segment X exists) - Segment X was actually deleted/lost - self.index has 0 entries for segment X (rebuilt from remaining segments) 5. During compact_segments(): - Tries to iterate objects in segment X - Segment X doesn't exist (was deleted/lost) - OR: segment X exists but objects aren't in index (superseded) - segments[X] is never decremented - segments[X] remains 10 instead of becoming 0 - Assertion fails! WHY THE FIX IS CORRECT ---------------------- 1. Hints are Advisory, Not Authoritative The hints file is an optimization to avoid scanning all segments. It's explicitly designed to be rebuildable from scratch by scanning segments. Therefore, incorrect hints should not cause a fatal error. 2. Self-Healing Behavior By converting the assertion to a warning and allowing compaction to proceed: - Compaction completes successfully - New hints are written with correct reference counts - Repository is automatically healed - No manual intervention required 3. Data Safety is Preserved The fix does NOT compromise data integrity because: - Compaction first copies all live data from segments to new segments - Only after all live data is safely copied are segments marked for deletion - The index determines what's "live" (authoritative source of truth) - Segments are deleted only when they contain no live data (per index) - The refcount warning indicates stale hints, not actual data loss risk - After compaction, new hints are written with correct counts 4. Consistent with Design Philosophy Borg already handles many corruption scenarios gracefully: - Missing hints → regenerated from segments - Corrupted index → rebuilt from segments - Missing segments → detected and handled This fix extends that philosophy to hint/index mismatches. 5. Alternative Solutions are Worse Other approaches considered: a) Crash and require manual intervention - Current behavior, user-hostile - Requires expert knowledge to fix b) Automatically run check --repair - Too aggressive, may hide real problems - User should decide when to repair c) Refuse to compact - Leaves repository in degraded state - Prevents normal operations VERIFICATION ------------ The fix has been verified with test cases that reproduce both scenarios: 1. test_missing_segment_in_hints - Simulates missing segment files - Verifies compact succeeds and updates hints correctly 2. test_index_corruption_with_old_hints - Simulates the root cause: corrupted index with old hints - Verifies compact succeeds despite reference count mismatch 3. test_subtly_corrupted_hints_without_integrity - Existing test updated to expect warning instead of crash - Verifies repository remains consistent after compaction OPERATIONAL IMPACT ------------------ After this fix: 1. Users experiencing this crash can now run `borg compact` successfully 2. The warning message alerts them to the inconsistency 3. They can optionally run `borg check --repair` for peace of mind 4. Repository continues to function normally The warning message provides enough information for debugging while not blocking normal operations. CONCLUSION ---------- Converting the assertion to a warning is the correct fix because: - It aligns with Borg's design philosophy of graceful degradation - It enables self-healing behavior - It preserves data safety - It improves user experience - It's consistent with how other corruption scenarios are handled The assertion was overly strict for a data structure (hints) that is explicitly designed to be advisory and rebuildable.
The code used to remove the missing segment only from "compact" hints, but we need to also remove it from "segments" hints.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 1.4-maint #9203 +/- ##
=============================================
- Coverage 80.60% 80.57% -0.03%
=============================================
Files 38 38
Lines 11253 11256 +3
Branches 1769 1771 +2
=============================================
Hits 9070 9070
- Misses 1614 1616 +2
- Partials 569 570 +1 ☔ View full report in Codecov by Sentry. |
|
TODO: Check if some bits of this should be ported to master. The borg2 repo code is totally different, but there is legacyrepository code there (mainly for reading though). |
No description provided.