Summary
After `bd doctor` completes, the `.beads/dolt/beads_z/.dolt/noms/LOCK` file remains on disk instead of being cleaned up. Doctor then reports this LOCK as a problem on subsequent runs.
Environment
- Beads version: v0.55.1
- Backend: Dolt
- Database name: `beads_z` (configured in the `dolt_database` field)
- OS: macOS (Darwin)
Reproduction Steps
- Run `bd doctor` when no LOCK files exist
- Doctor opens Dolt connections to perform checks
- Doctor completes and exits
- Check that the LOCK file persists:

  ```bash
  test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK exists"
  # Output: LOCK exists
  ```

- Run `bd doctor` again:

  ```
  # Output: Reports LOCK as problem
  RUNTIME (1/2 passed)
  ⚠ Dolt Lock Health: 1 lock issue(s) detected
      noms LOCK file exists at dolt/beads_z/.dolt/noms/LOCK — may block database access
  ```
Expected Behavior
When doctor finishes, all Dolt connections should be properly closed and Dolt should remove the .dolt/noms/LOCK file.
Actual Behavior
The LOCK file persists after doctor exits, causing:
- Doctor reports the LOCK as "stale" or warns it "may block database access" on subsequent runs
- A false warning for users who manually delete the LOCK file (it reappears)
- Users are unsure whether the LOCK is from a crashed process or from doctor itself
Root Cause Analysis
Proof from Code Analysis:
Actual Flow in runDiagnostics() (cmd/bd/doctor.go:330-636)
```go
func runDiagnostics(path string) doctorResult {
	// Lines 458-479: Federation checks open Dolt connections
	remotesAPICheck := convertWithCategory(doctor.CheckFederationRemotesAPI(path), doctor.CategoryFederation)
	peerConnCheck := convertWithCategory(doctor.CheckFederationPeerConnectivity(path), doctor.CategoryFederation)
	// ... other checks that open Dolt connections

	// Line 225: CheckLockHealth runs AFTER other checks
	lockCheck := CheckLockHealth(path)
	result.Checks = append(result.Checks, lockCheck)
}
```
CheckLockHealth Does NOT Open Dolt (Lines 573-636)
```go
func CheckLockHealth(path string) DoctorCheck {
	beadsDir := resolveBeadsDir(filepath.Join(path, ".beads"))
	if !IsDoltBackend(beadsDir) {
		return DoctorCheck{Name: "Dolt Lock Health", Status: StatusOK, Message: "N/A"}
	}

	var warnings []string

	// ONLY checks filesystem for LOCK files - does NOT open Dolt
	doltDir := filepath.Join(beadsDir, "dolt")
	if dbEntries, err := os.ReadDir(doltDir); err == nil {
		for _, dbEntry := range dbEntries {
			if !dbEntry.IsDir() {
				continue
			}
			nomsLock := filepath.Join(doltDir, dbEntry.Name(), ".dolt", "noms", "LOCK")
			if _, err := os.Stat(nomsLock); err == nil {
				warnings = append(warnings,
					fmt.Sprintf("noms LOCK file exists at dolt/%s/.dolt/noms/LOCK", dbEntry.Name()))
			}
		}
	}

	// Returns warning if any LOCK files found
	if len(warnings) > 0 {
		return DoctorCheck{
			Name:    "Dolt Lock Health",
			Status:  StatusWarning,
			Message: fmt.Sprintf("%d lock issue(s) detected", len(warnings)),
			Detail:  strings.Join(warnings, "; "),
			Fix:     "Run 'bd doctor --fix' to clean stale lock files, or wait for other process to finish",
		}
	}
}
```
The Bug Timeline
- Early checks open Dolt connections (lines 458-479):
  - `CheckFederationRemotesAPI()` → calls `dolt.New()` → Dolt creates `.dolt/noms/LOCK`
  - `CheckFederationPeerConnectivity()` → calls `dolt.New()` → Dolt creates `.dolt/noms/LOCK`
  - `CheckFederationConflicts()` → calls `dolt.New()` → Dolt creates `.dolt/noms/LOCK`
  - Other Dolt-based checks open connections
- Dolt connections are NOT properly closed:
  - Federation checks use `defer func() { _ = store.Close() }()` (federation.go:60, 146, 263, 356, 463)
  - However, Dolt's embedded driver does NOT remove `.dolt/noms/LOCK` on close
- CheckLockHealth runs LAST (line 225):
  - It finds the `.dolt/noms/LOCK` file left behind by earlier checks
  - It cannot distinguish between (see the sketch after this list):
    - a LOCK from the current doctor run (created by the checks at lines 458-479)
    - a LOCK from a crashed previous process
    - a LOCK from a concurrent `bd` process
- Doctor exits:
  - The LOCK file remains on disk
  - The next doctor run finds it again and reports a warning
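One way to remove this ambiguity would be to snapshot which noms LOCK files already exist before doctor opens any Dolt connections, and have the lock check compare against that snapshot afterwards. A minimal sketch of the idea, assuming a hypothetical `snapshotNomsLocks` helper (not existing beads code):

```go
package doctor

import (
	"os"
	"path/filepath"
)

// snapshotNomsLocks records which .dolt/noms/LOCK files already exist under
// <beadsDir>/dolt. Called before any doctor check opens a Dolt connection,
// it lets a later lock check tell pre-existing locks apart from locks that
// this doctor run created as a side effect.
// Hypothetical helper, not part of the current beads codebase.
func snapshotNomsLocks(beadsDir string) map[string]bool {
	existing := make(map[string]bool)
	doltDir := filepath.Join(beadsDir, "dolt")
	entries, err := os.ReadDir(doltDir)
	if err != nil {
		return existing
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		lock := filepath.Join(doltDir, e.Name(), ".dolt", "noms", "LOCK")
		if _, statErr := os.Stat(lock); statErr == nil {
			existing[lock] = true
		}
	}
	return existing
}
```

The lock health check could then warn only about LOCK files present in the snapshot (genuinely pre-existing) and treat the rest as artifacts of the current run.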
Concrete Evidence
| Check | Opens Dolt? | Closes Connection? | Line |
|---|---|---|---|
| CheckFederationRemotesAPI | ✅ YES | ✅ YES (defer close) | 146 |
| CheckFederationPeerConnectivity | ✅ YES | ✅ YES (defer close) | 263 |
| CheckFederationConflicts | ✅ YES | ✅ YES (defer close) | 356 |
| CheckDoltServerModeMismatch | ✅ YES | ✅ YES (defer close) | 463 |
| CheckLockHealth | ❌ NO | N/A | 573 (only checks filesystem) |
Key Finding: Dolt's embedded driver does NOT remove .dolt/noms/LOCK when connections are closed, even with proper defer Close() pattern.
Dolt Connection Close Pattern
Looking at federation checks (cmd/bd/doctor/federation.go):
```go
// Line 146 - CheckFederationRemotesAPI
defer func() { _ = store.Close() }() // ← Uses proper defer pattern
// But Dolt doesn't clean up .dolt/noms/LOCK when this closes
```
Confirmed behavior: Even with the proper `defer func() { _ = store.Close() }()` pattern, the `.dolt/noms/LOCK` file remains after doctor exits.
This proves that the bug is in Dolt's embedded driver, not in doctor's connection management code.
Code Locations to Investigate
File: cmd/bd/doctor/dolt.go
Functions that open Dolt connections:
- `openDoltDBEmbedded()` - Creates connections for doctor checks
- `openDoltDBWithLock()` - Creates connections with an advisory lock
- `checkStatusWithDB()` - Uses `conn.db` from above
Close pattern:
```go
defer conn.Close() // Should trigger Dolt cleanup
```
Dolt's expected behavior:
When `conn.db.Close()` is called, Dolt should:
- Release the `.dolt/noms/LOCK` file
- Close all file handles
- Clean up any transient resources
Actual behavior:
LOCK file remains on disk after doctor exits.
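Until the driver behaves that way, one interim option is an application-level wrapper whose Close removes the noms LOCK after the underlying connection closes. A rough sketch; the `doltConn` type, its fields, and the assumption that no other process is using the database are hypothetical, not the actual beads types:

```go
package doctor

import (
	"database/sql"
	"os"
	"path/filepath"
)

// doltConn is a hypothetical wrapper around an embedded Dolt connection.
type doltConn struct {
	db    *sql.DB
	dbDir string // e.g. .beads/dolt/beads_z
}

// Close closes the underlying connection and then removes the noms LOCK
// file that the embedded driver leaves behind. Only safe when this process
// is the sole user of the database directory.
func (c *doltConn) Close() error {
	if err := c.db.Close(); err != nil {
		return err
	}
	lock := filepath.Join(c.dbDir, ".dolt", "noms", "LOCK")
	if err := os.Remove(lock); err != nil && !os.IsNotExist(err) {
		return err
	}
	return nil
}
```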
Impact
Severity: Medium
User-facing impact:
- Doctor consistently shows the "LOCK file exists" warning on every run
- Users are unsure whether they should delete the file (it's from doctor, not a crash)
- Confusing UX: the tool creates the problem and then reports it as an external issue
Functional impact:
- Low: new `bd` operations may fail if the LOCK isn't cleaned up properly
- May cause "database locked" errors if another process tries to open Dolt
- However, the stale LOCK warning from doctor suggests the lock is not actively held
Suggested Fixes
Fix A: Ensure proper connection cleanup
Add explicit cleanup after all doctor checks complete:
```go
// In cmd/bd/doctor.go main doctor function
defer func() {
	// Explicitly close any open connections
	if conn != nil {
		conn.Close()
	}
}()
```
Fix B: Force Dolt to cleanup LOCK
After closing connections, explicitly remove stale LOCK files:
```go
// In cmd/bd/doctor/dolt.go CheckLockHealth()
defer func() {
	// After checking, cleanup any stale LOCKs from this run
	nomsLockPath := filepath.Join(beadsDir, "dolt", dbName, ".dolt", "noms", "LOCK")
	_ = os.Remove(nomsLockPath)
}()
```
Fix C: Coordinate LOCK check timing
Don't check for LOCK files while doctor connections are open:

```go
// In cmd/bd/doctor.go
// Run LOCK health check FIRST, before opening any Dolt connections
lockHealthCheck := CheckLockHealth(path)

// Then run other checks that open connections
// This avoids detecting doctor's own LOCK files
```
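For illustration, a self-contained sketch of the reordering idea using stand-in types; the real `runDiagnostics`, `DoctorCheck`, and check functions live in cmd/bd/doctor and differ from these simplified stand-ins. Note that checking first still would not flag a LOCK left behind by the current run itself:

```go
package doctor

// Simplified stand-ins for the real DoctorCheck / doctorResult types.
type DoctorCheck struct {
	Name    string
	Status  string
	Message string
}

type doctorResult struct {
	Checks []DoctorCheck
}

// runDiagnosticsLockFirst sketches Fix C: run the filesystem-only lock
// check before any check that opens a Dolt connection, so a LOCK found at
// that point must predate this doctor run.
func runDiagnosticsLockFirst(path string, lockCheck func(string) DoctorCheck, doltChecks []func(string) DoctorCheck) doctorResult {
	var result doctorResult

	// 1. Lock health first, before any Dolt connection exists.
	result.Checks = append(result.Checks, lockCheck(path))

	// 2. Then the checks that open Dolt connections and may create
	//    .dolt/noms/LOCK as a side effect.
	for _, check := range doltChecks {
		result.Checks = append(result.Checks, check(path))
	}
	return result
}
```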
Workarounds for Users
Workaround #1: Accept the warning
Doctor's LOCK file is from doctor itself, not a crashed process. Accept the warning until the bug is fixed.
Workaround #2: Manual cleanup after doctor
After doctor completes, manually remove LOCK files (knowing they'll reappear):
```bash
# After doctor finishes
rm -f .beads/dolt/beads_z/.dolt/noms/LOCK
```
Verification
To confirm this is a bug:
```bash
# 1. Remove all LOCK files
rm -f .beads/dolt/beads_z/.dolt/noms/LOCK
rm -f .beads/dolt-access.lock

# 2. Verify no LOCK files
test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK exists" || echo "No LOCK"

# 3. Run doctor (creates LOCK)
bd doctor 2>&1 | grep "Dolt Lock Health"

# 4. Check if LOCK persists
test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK still exists" || echo "LOCK cleaned up"
# Expected: LOCK still exists (bug confirmed)
```
Additional Context
Observed behavior:
- When the user manually deletes the `.dolt/noms/LOCK` file and runs `bd doctor` again, the LOCK file reappears
- This confirms that `bd doctor` itself is creating the LOCK file during its operation
- The LOCK persists after doctor exits instead of being cleaned up

User testing:
- User deleted the LOCK file manually
- Ran `bd doctor`
- LOCK file appeared again
- This confirms the bug is reproducible
Checklist for Fix Implementation
- Investigated concrete bug location and flow (CODE PROOF ADDED)
- Verified Dolt doesn't remove LOCK on close (Dolt driver issue)
- Confirmed doctor uses proper defer Close() pattern (not the issue)
- Documented actual bug timeline with line numbers
- Fix Dolt embedded driver to remove .dolt/noms/LOCK on connection close
- Add test case verifying LOCK cleanup after doctor completion (see the test sketch after this checklist)
- Test with both embedded and server mode Dolt connections
- Consider running LOCK health check BEFORE opening Dolt connections
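In support of the test-case item above, a minimal sketch of a regression test. It assumes the `runDiagnostics(path string) doctorResult` signature quoted earlier, plus a hypothetical `setupDoltRepo(t)` fixture that creates a temporary `.beads` workspace with a Dolt database named `beads_z`; neither the fixture nor the exact paths are existing beads test code:

```go
package main

import (
	"os"
	"path/filepath"
	"testing"
)

// TestDoctorLeavesNoNomsLock verifies that a full doctor run does not leave
// a .dolt/noms/LOCK file behind in the Dolt workspace.
func TestDoctorLeavesNoNomsLock(t *testing.T) {
	path := setupDoltRepo(t) // hypothetical fixture helper

	// Run the full diagnostic pass, which opens and closes Dolt connections.
	_ = runDiagnostics(path)

	// After doctor completes, no noms LOCK should remain on disk.
	lock := filepath.Join(path, ".beads", "dolt", "beads_z", ".dolt", "noms", "LOCK")
	if _, err := os.Stat(lock); err == nil {
		t.Fatalf("noms LOCK still present after doctor run: %s", lock)
	} else if !os.IsNotExist(err) {
		t.Fatalf("unexpected error checking %s: %v", lock, err)
	}
}
```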