
[BUG] Windows: stale .memory-write.lock not released, blocks smart-extractor writes #622

@jlin53882

Description


Bug Description

On Windows, the proper-lockfile-based lock in store.ts:runWithFileLock() fails to release stale locks, causing smart-extractor and memory-reflection hooks to fail with Lock file is already being held.

Observed Symptoms

  • The error "Lock file is already being held" is repeated in logs
  • smart-extractor fails to process candidates
  • memory-reflection hook fails
  • auto-recall times out after 60s

Root Cause Analysis

What was wrong in the original analysis

  1. proper-lockfile does not use a PID file — In v4.1.2 it uses a lock directory created via mkdir, and the actual lock is <target>.lock.

  2. proper-lockfile requires the target file to exist — It defaults to realpath: true, so simply removing the pre-creation of .memory-write.lock would cause lockfile.lock(lockPath) to fail with ENOENT, making things worse.

  3. The stale lock mechanism is unclear — stale: 10000 should clean up the .lock directory, but on Windows the rmdir behavior, or a crash that skips the cleanup handlers, may prevent this.

What actually happens

.memory-write.lock        ← target file (pre-created, stays empty)
.memory-write.lock.lock/  ← actual lock directory (managed by proper-lockfile, in old implementation)

When a process crashes without calling release():

  • The .lock directory may remain as a stale lock
  • stale: 10000 should reclaim it, but Windows-specific behavior may interfere
  • Subsequent writes fail with "Lock file is already being held"

Recommended Fix

1. Use lockfilePath with dbPath

Lock the database directory directly and use lockfilePath to specify the lock location:

const release = await lockfile.lock(this.config.dbPath, {
  lockfilePath: lockPath,
  retries: { retries: 10, factor: 2, minTimeout: 200, maxTimeout: 5000 },
  stale: 10000,
});
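For context, these retry options bound how long a writer waits before giving up. A rough estimate, assuming the exponential backoff formula used by the retry package that proper-lockfile delegates to (minTimeout * factor^attempt, capped at maxTimeout, no jitter):

```typescript
// Worst-case cumulative wait for the retry options above:
// each attempt waits minTimeout * factor^attempt, capped at maxTimeout.
function totalBackoffMs(
  retries: number,
  factor: number,
  minTimeout: number,
  maxTimeout: number,
): number {
  let total = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    total += Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout);
  }
  return total;
}

console.log(totalBackoffMs(10, 2, 200, 5000)); // 31200 (~31s)
```

That keeps the worst-case wait comfortably under the 60s auto-recall timeout mentioned in the symptoms.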

2. Proactive Stale Lock Cleanup

Add proactive cleanup for legacy lock artifacts that exceed a threshold (5 minutes by default):

const lockPath = join(this.config.dbPath, ".memory-write.lock");
const staleThresholdMs = 5 * 60 * 1000;

if (existsSync(lockPath)) {
  try {
    const stat = statSync(lockPath);
    const ageMs = Date.now() - stat.mtimeMs;

    if (ageMs > staleThresholdMs) {
      if (stat.isDirectory()) {
        rmSync(lockPath, { recursive: true, force: true });
      } else {
        unlinkSync(lockPath);
      }
      console.warn(`[memory-lancedb-pro] cleared stale lock artifact: ${lockPath} ageMs=${ageMs}`);
    }
  } catch (err) {
    console.warn(`[memory-lancedb-pro] failed to inspect/clear stale lock artifact: ${lockPath} err=${String(err)}`);
  }
}
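The age check can be exercised in isolation. A minimal sketch (the clearStaleLock helper name and temp paths are illustrative): create a fake legacy lock directory, backdate its mtime past the threshold, and confirm it gets cleared:

```typescript
import {
  existsSync,
  mkdirSync,
  mkdtempSync,
  rmSync,
  statSync,
  unlinkSync,
  utimesSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Same age-based cleanup as above, extracted into a helper.
// Returns true if a stale artifact was removed.
function clearStaleLock(lockPath: string, staleThresholdMs: number): boolean {
  if (!existsSync(lockPath)) return false;
  const stat = statSync(lockPath);
  const ageMs = Date.now() - stat.mtimeMs;
  if (ageMs <= staleThresholdMs) return false;
  if (stat.isDirectory()) {
    rmSync(lockPath, { recursive: true, force: true });
  } else {
    unlinkSync(lockPath);
  }
  return true;
}

// Simulate a crash that left a legacy .lock directory behind.
const dir = mkdtempSync(join(tmpdir(), "lock-demo-"));
const lockPath = join(dir, ".memory-write.lock");
mkdirSync(lockPath);

// Backdate its mtime by 10 minutes so it exceeds the 5-minute threshold.
const tenMinAgoSec = (Date.now() - 10 * 60 * 1000) / 1000;
utimesSync(lockPath, tenMinAgoSec, tenMinAgoSec);

const cleared = clearStaleLock(lockPath, 5 * 60 * 1000);
const stillPresent = existsSync(lockPath);
console.log(cleared, stillPresent); // true false
rmSync(dir, { recursive: true, force: true });
```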

Why Proactive Cleanup Is Needed

Even with proper-lockfile's built-in stale mechanism, certain conditions can prevent cleanup:

  • Process crashes without triggering proper cleanup handlers
  • Windows-specific rmdir behavior may fail on non-empty directories
  • The stale threshold (10s) may be too short for slow operations

Proactive cleanup provides a fallback for legacy artifacts that should never exist in a healthy system.

Complete Implementation

private async runWithFileLock<T>(fn: () => Promise<T>): Promise<T> {
  const queued = this.updateQueue.catch(() => {}).then(async () => {
    const lockfile = await loadLockfile();

    try { mkdirSync(this.config.dbPath, { recursive: true }); } catch {}

    const lockPath = join(this.config.dbPath, ".memory-write.lock");
    const staleThresholdMs = 5 * 60 * 1000;

    // Proactive cleanup of stale lock artifacts
    if (existsSync(lockPath)) {
      try {
        const stat = statSync(lockPath);
        const ageMs = Date.now() - stat.mtimeMs;

        if (ageMs > staleThresholdMs) {
          if (stat.isDirectory()) {
            rmSync(lockPath, { recursive: true, force: true });
          } else {
            unlinkSync(lockPath);
          }
          console.warn(`[memory-lancedb-pro] cleared stale lock artifact: ${lockPath} ageMs=${ageMs}`);
        }
      } catch (err) {
        console.warn(`[memory-lancedb-pro] failed to inspect/clear stale lock artifact: ${lockPath} err=${String(err)}`);
      }
    }

    const release = await lockfile.lock(this.config.dbPath, {
      lockfilePath: lockPath,
      retries: { retries: 10, factor: 2, minTimeout: 200, maxTimeout: 5000 },
      stale: 10000,
    });

    try {
      return await fn();
    } finally {
      await release();
    }
  });

  this.updateQueue = queued.catch(() => {});
  return queued;
}
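Separately from the file lock, the updateQueue chaining in runWithFileLock serializes writes within a single process. A minimal sketch of that pattern in isolation (the WriteQueue name is illustrative):

```typescript
// Serialize async tasks by chaining them on a single promise, as
// runWithFileLock does with this.updateQueue.
class WriteQueue {
  private queue: Promise<unknown> = Promise.resolve();

  run<T>(fn: () => Promise<T>): Promise<T> {
    // Chain after the previous task; swallow its error so one failure
    // does not poison the queue for later callers.
    const queued = this.queue.catch(() => {}).then(fn);
    this.queue = queued.catch(() => {});
    return queued;
  }
}

// Usage: even when the first task is slower, tasks complete in submission order.
const q = new WriteQueue();
const order: number[] = [];
await Promise.all([
  q.run(async () => {
    await new Promise((r) => setTimeout(r, 30));
    order.push(1);
  }),
  q.run(async () => {
    order.push(2);
  }),
]);
console.log(order); // [1, 2]
```

Note the double catch: the chain continues past a failed task, while the caller of run() still sees the original rejection.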

Required Imports

import {
  existsSync,
  mkdirSync,
  rmSync,
  statSync,
  unlinkSync,
} from "node:fs";
import { join } from "node:path";

Why This Is Better

| Change | Reason |
| --- | --- |
| Lock dbPath directly | Avoids ENOENT — the directory always exists |
| Use lockfilePath | Single lock artifact: `<dbPath>/.memory-write.lock` |
| Proactive cleanup | Legacy artifacts are cleared before they block writes |
| Threshold = 5 min | Conservative; only clears truly stale locks |
| Log failures | Cleanup errors are not silently swallowed |

Relationship to Issue #623

Issue #623 addresses same-instance concurrent writes competing for the same lock.

The fixes for both issues should land together.

Environment

  • Windows (observed)
  • memory-lancedb-pro 1.1.0-beta.10
  • Multiple concurrent agents
