Skip to content

[BUG] Doctor Leaves Dolt LOCK File After Completion #1925

@mikij

Description

@mikij

Summary

After bd doctor completes, .beads/dolt/beads_z/.dolt/noms/LOCK file remains on disk instead of being cleaned up. Doctor then reports this LOCK as a problem on subsequent runs.

Environment

  • Beads version: v0.55.1
  • Backend: Dolt
  • Database name: beads_z (configured in dolt_database field)
  • OS: macOS (Darwin)

Reproduction Steps

  1. Run bd doctor when no LOCK files exist
  2. Doctor opens Dolt connections to perform checks
  3. Doctor completes and exits
  4. Check that LOCK file persists:
    test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK exists"
    # Output: LOCK exists
  5. Run bd doctor again:
    # Output: Reports LOCK as problem
    RUNTIME (1/2 passed)
      ⚠  Dolt Lock Health: 1 lock issue(s) detected
            noms LOCK file exists at dolt/beads_z/.dolt/noms/LOCK — may block database access
    

Expected Behavior

When doctor finishes, all Dolt connections should be properly closed and Dolt should remove the .dolt/noms/LOCK file.

Actual Behavior

The LOCK file persists after doctor exits, causing:

  1. Doctor to report LOCK as "stale" or "block database access" on subsequent runs
  2. False warning for users who manually delete the LOCK file (it reappears)
  3. Users unsure if LOCK is from a crashed process or from doctor itself

Root Cause Analysis

Proof from Code Analysis:

Actual Flow in runDiagnostics() (cmd/bd/doctor.go:330-636)

func runDiagnostics(path string) doctorResult {
    // Lines 458-479: Federation checks open Dolt connections
    remotesAPICheck := convertWithCategory(doctor.CheckFederationRemotesAPI(path), doctor.CategoryFederation)
    peerConnCheck := convertWithCategory(doctor.CheckFederationPeerConnectivity(path), doctor.CategoryFederation)
    // ... other checks that open Dolt connections

    // Line 225: CheckLockHealth runs AFTER other checks
    lockCheck := CheckLockHealth(path)
    result.Checks = append(result.Checks, lockCheck)
}

CheckLockHealth Does NOT Open Dolt (Lines 573-636)

func CheckLockHealth(path string) DoctorCheck {
    beadsDir := resolveBeadsDir(filepath.Join(path, ".beads"))

    if !IsDoltBackend(beadsDir) {
        return DoctorCheck{Name: "Dolt Lock Health", Status: StatusOK, Message: "N/A"}
    }

    var warnings []string

    // ONLY checks filesystem for LOCK files - does NOT open Dolt
    doltDir := filepath.Join(beadsDir, "dolt")
    if dbEntries, err := os.ReadDir(doltDir); err == nil {
        for _, dbEntry := range dbEntries {
            if !dbEntry.IsDir() {
                continue
            }
            nomsLock := filepath.Join(doltDir, dbEntry.Name(), ".dolt", "noms", "LOCK")
            if _, err := os.Stat(nomsLock); err == nil {
                warnings = append(warnings,
                    fmt.Sprintf("noms LOCK file exists at dolt/%s/.dolt/noms/LOCK", dbEntry.Name()))
            }
        }
    }

    // Returns warning if any LOCK files found
    if len(warnings) > 0 {
        return DoctorCheck{
            Name:     "Dolt Lock Health",
            Status:   StatusWarning,
            Message:  fmt.Sprintf("%d lock issue(s) detected", len(warnings)),
            Detail:   strings.Join(warnings, "; "),
            Fix:      "Run 'bd doctor --fix' to clean stale lock files, or wait for other process to finish",
        }
    }
}

The Bug Timeline

  1. Early checks open Dolt connections (lines 458-479):

    • CheckFederationRemotesAPI() → calls dolt.New() → Dolt creates .dolt/noms/LOCK
    • CheckFederationPeerConnectivity() → calls dolt.New() → Dolt creates .dolt/noms/LOCK
    • CheckFederationConflicts() → calls dolt.New() → Dolt creates .dolt/noms/LOCK
    • Other Dolt-based checks open connections
  2. Dolt connections are NOT properly closed:

    • Federation checks use defer func() { _ = store.Close() }() (federation.go:60, 146, 263, 356, 463)
    • However, Dolt's embedded driver does NOT remove .dolt/noms/LOCK on close
  3. CheckLockHealth runs LAST (line 225):

    • It finds the .dolt/noms/LOCK file left behind by earlier checks
    • It cannot distinguish between:
      • LOCK from current doctor run (created by checks in lines 458-479)
      • LOCK from crashed previous process
    • LOCK from concurrent bd process
  4. Doctor exits:

    • The LOCK file remains on disk
    • Next doctor run finds it again and reports as warning

Concrete Evidence

Check Opens Dolt? Closes Connection? Line
CheckFederationRemotesAPI ✅ YES ⚠️ DOLUT 146 (defer close)
CheckFederationPeerConnectivity ✅ YES ⚠️ DOLUT 263 (defer close)
CheckFederationConflicts ✅ YES ⚠️ DOLUT 356 (defer close)
CheckDoltServerModeMismatch ✅ YES ⚠️ DOLUT 463 (defer close)
CheckLockHealth ❌ NO N/A 573 (only checks filesystem)

Key Finding: Dolt's embedded driver does NOT remove .dolt/noms/LOCK when connections are closed, even with proper defer Close() pattern.

Dolt Connection Close Pattern

Looking at federation checks (cmd/bd/doctor/federation.go):

// Line 146 - CheckFederationRemotesAPI
defer func() { _ = store.Close() }()  // ← Uses proper defer pattern

// But Dolt doesn't clean up .dolt/noms/LOCK when this closes

Confirmed behavior: Even with proper defer func() { _ = store.Close() }() pattern, the .dolt/noms/LOCK file remains after doctor exits.

This proves that the bug is in Dolt's embedded driver, not in doctor's connection management code.

Code Locations to Investigate

File: cmd/bd/doctor/dolt.go

Functions that open Dolt connections:

  • openDoltDBEmbedded() - Creates connections for doctor checks
  • openDoltDBWithLock() - Creates connections with advisory lock
  • checkStatusWithDB() - Uses conn.db from above

Close pattern:

defer conn.Close()  // Should trigger Dolt cleanup

Dolt's expected behavior:
When conn.db.Close() is called, Dolt should:

  1. Release .dolt/noms/LOCK file
  2. Close all file handles
  3. Clean up any transient resources

Actual behavior:
LOCK file remains on disk after doctor exits.

Impact

Severity: Medium

User-facing impact:

  • Doctor consistently shows "LOCK file exists" warning on every run
  • Users unsure if they should delete the file (it's from doctor, not a crash)
  • Confusing UX - tool creates problem and reports it as external issue

Functional impact:

  • Low - New bd operations may fail if LOCK isn't cleaned properly
  • May cause "database locked" errors if another process tries to open Dolt
  • However, stale LOCK warning from doctor suggests it's not actively held

Suggested Fixes

Fix A: Ensure proper connection cleanup

Add explicit cleanup after all doctor checks complete:

// In cmd/bd/doctor.go main doctor function
defer func() {
    // Explicitly close any open connections
    if conn != nil {
        conn.Close()
    }
}()

Fix B: Force Dolt to cleanup LOCK

After closing connections, explicitly remove stale LOCK files:

// In cmd/bd/doctor/dolt.go CheckLockHealth()
defer func() {
    // After checking, cleanup any stale LOCKs from this run
    nomsLockPath := filepath.Join(beadsDir, "dolt", dbName, ".dolt", "noms", "LOCK")
    _ = os.Remove(nomsLockPath)
}()

Fix C: Coordinate LOCK check timing

Don't check for LOCK files while doctor connections are open:

// In cmd/bd/doctor.go
// Run LOCK health check FIRST, before opening any Dolt connections
lockHealthCheck := CheckLockHealth(path)

// Then run other checks that open connections
// This avoids detecting doctor's own LOCK files

Workarounds for Users

Workaround #1: Accept the warning

Doctor's LOCK file is from doctor itself, not a crashed process. Accept the warning until the bug is fixed.

Workaround #2: Manual cleanup after doctor

After doctor completes, manually remove LOCK files (knowing they'll reappear):

# After doctor finishes
rm -f .beads/dolt/beads_z/.dolt/noms/LOCK

Verification

To confirm this is a bug:

# 1. Remove all LOCK files
rm -f .beads/dolt/beads_z/.dolt/noms/LOCK
rm -f .beads/dolt-access.lock

# 2. Verify no LOCK files
test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK exists" || echo "No LOCK"

# 3. Run doctor (creates LOCK)
bd doctor 2>&1 | grep "Dolt Lock Health"

# 4. Check if LOCK persists
test -f .beads/dolt/beads_z/.dolt/noms/LOCK && echo "LOCK still exists" || echo "LOCK cleaned up"

# Expected: LOCK still exists (bug confirmed)

Additional Context

Observed behavior:

  • When user manually deletes the .dolt/noms/LOCK file and runs bd doctor again, the LOCK file reappears
  • This confirms that bd doctor itself is creating the LOCK file during its operation
  • The LOCK persists after doctor exits instead of being cleaned up

User testing:

  • User deleted the LOCK file manually
  • Ran bd doctor
  • LOCK file appeared again
  • This confirms the bug is reproducible

Checklist for Fix Implementation

  • Investigated concrete bug location and flow (CODE PROOF ADDED)
  • Verified Dolt doesn't remove LOCK on close (Dolt driver issue)
  • Confirmed doctor uses proper defer Close() pattern (not the issue)
  • Documented actual bug timeline with line numbers
  • Fix Dolt embedded driver to remove .dolt/noms/LOCK on connection close
  • Add test case verifying LOCK cleanup after doctor completion
  • Test with both embedded and server mode Dolt connections
  • Consider running LOCK health check BEFORE opening Dolt connections
  • Test with both embedded and server mode Dolt connections

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions