fix: resolve stale lock and same-instance write contention (#622, #623) by jlin53882 · Pull Request #626 · CortexReach/memory-lancedb-pro

jlin53882 · 2026-04-14T18:58:14Z

Summary

Fixes two related lock issues in memory-lancedb-pro:

#622: Stale .memory-write.lock not released, blocking smart-extractor writes on Windows
#623: Same-instance concurrent writes compete for the lock, causing "Lock file is already being held"

Changes

src/store.ts

Added updateQueue for intra-instance write serialization
Added lockfilePath option with an explicit lock path
Added proactive stale lock cleanup (5-minute threshold)
Fixed a nested deadlock where update() used both runWithFileLock() AND runSerializedUpdate()
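A minimal sketch of the intra-instance serialization described above, assuming updateQueue is a promise chain and the file lock is not reentrant. FileLock, Store, and the method bodies are hypothetical stand-ins, not the actual src/store.ts code. It also shows why update() must not wrap runSerializedUpdate() in runWithFileLock() when the serialized path takes the same lock: the inner acquire waits on a lock its own caller holds. This in-memory stand-in simply hangs; an on-disk lock that retries would instead fail with an already-held error (proper-lockfile reports "Lock file is already being held").

```typescript
// Hypothetical stand-in for a non-reentrant lock (the real store uses an on-disk lock).
class FileLock {
  private held = false;
  private waiters: Array<() => void> = [];

  async acquire(): Promise<void> {
    if (!this.held) {
      this.held = true;
      return;
    }
    // Not reentrant: a second acquire waits even if its caller already holds the lock.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand ownership directly to the next waiter
    } else {
      this.held = false;
    }
  }
}

class Store {
  // Hypothetical: a promise chain that orders this instance's writes.
  private updateQueue: Promise<unknown> = Promise.resolve();
  private lock = new FileLock();

  private async runWithFileLock<T>(fn: () => Promise<T>): Promise<T> {
    await this.lock.acquire();
    try {
      return await fn();
    } finally {
      this.lock.release();
    }
  }

  private runSerializedUpdate<T>(fn: () => Promise<T>): Promise<T> {
    // Wait for earlier writes from this instance, then take the file lock once.
    const run = this.updateQueue.then(() => this.runWithFileLock(fn));
    // Keep the chain usable even if this write rejects.
    this.updateQueue = run.catch(() => undefined);
    return run;
  }

  // Go through the queue only. Also wrapping this in runWithFileLock() would
  // make the inner acquire() wait on a lock its own caller holds.
  update<T>(fn: () => Promise<T>): Promise<T> {
    return this.runSerializedUpdate(fn);
  }
}
```

Because each write chains onto the previous one, concurrent update() calls on the same instance never compete for the lock file; cross-process contention is still left to the file lock.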

test/lock-recovery.test.mjs

Tests for stale lock recovery

test/store-write-queue.test.mjs

Tests for queue serialization

Testing

7 tests pass, 1 skipped

Closes #622
Closes #623

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f691cff8bd


chatgpt-codex-connector · 2026-04-14T19:02:30Z

+          if (ageMs > staleThresholdMs) {
+            if (stat.isDirectory()) {


Clear legacy lock artifacts immediately

This cleanup gate can leave a legacy .memory-write.lock regular file in place for up to 5 minutes, but the new lockfile.lock(this.config.dbPath, { lockfilePath }) flow expects that path to be created as a lock directory. In upgrade/restart scenarios where the old file was created recently, lock acquisition can fail (ELOCKED/ENOTDIR) and block all writes until the age threshold is crossed. Remove incompatible pre-existing lock artifacts regardless of age (or use the same short stale window as the lock) before acquiring the new lock.
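One way to apply this suggestion, sketched with Node's fs module: anything at the lock path that is not a directory (such as a legacy regular-file .memory-write.lock) is removed regardless of age, while a lock directory is removed only once its mtime is older than the stale threshold. The helper name cleanupLockArtifact and its return values are hypothetical, not the code merged in this PR.

```typescript
import * as fs from "node:fs";

// Hypothetical helper implementing the review suggestion; the return value says what it did.
function cleanupLockArtifact(
  lockPath: string,
  staleThresholdMs: number,
  now: number = Date.now(),
): "none" | "removed-legacy-file" | "removed-stale-dir" | "kept" {
  let stat: fs.Stats;
  try {
    stat = fs.lstatSync(lockPath);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "ENOENT") return "none";
    throw err;
  }

  if (!stat.isDirectory()) {
    // A regular file can never be a valid directory-style lock,
    // so remove it immediately instead of waiting for the age gate.
    fs.rmSync(lockPath, { force: true });
    return "removed-legacy-file";
  }

  const ageMs = now - stat.mtimeMs;
  if (ageMs > staleThresholdMs) {
    fs.rmSync(lockPath, { recursive: true, force: true });
    return "removed-stale-dir";
  }
  return "kept"; // probably a live lock held by another writer; leave it alone
}
```

Removing a directory that another process still holds is the risk the age gate guards against, which is why the sketch applies the immediate path only to non-directory artifacts.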


jlin53882 · 2026-04-14T19:08:53Z

CI Failure Analysis

The cli-smoke and storage-and-schema job failures are NOT related to this PR's changes.

Root Cause

These failures are tracked in:

Issue #590: cli-smoke test failing: missing store.count() mock (regression from PR #582, "fix: skip 75ms retry when store is empty (Bug 2)")
Issue #596: cli-smoke test: store mock missing count() method

The cli-smoke.mjs mock for store is missing a count() method. PR #582 added a countStore parameter to retrieveWithRetry(), but the test mock was never updated.
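A sketch of the mismatch, assuming retrieveWithRetry() takes a retrieval function plus a countStore callback that it consults only when the first attempt comes back empty (per PR #582's title, the 75 ms retry is skipped when the store is empty). The signature and the StoreLike interface are hypothetical; the point is that a mock without count() makes the countStore path throw.

```typescript
// Hypothetical shape of the store that the smoke test mocks.
interface StoreLike {
  search(query: string): Promise<string[]>;
  count(): Promise<number>;
}

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Hypothetical: retry once after 75 ms, unless the store is known to be empty.
async function retrieveWithRetry(
  retrieve: () => Promise<string[]>,
  countStore: () => Promise<number>,
): Promise<string[]> {
  const first = await retrieve();
  if (first.length > 0) return first;
  if ((await countStore()) === 0) return []; // empty store: nothing to wait for
  await sleep(75);
  return retrieve();
}

// A mock that satisfies the full interface, including count().
const mockStore: StoreLike = {
  search: async () => [],
  count: async () => 0,
};
```

With count() present, an empty store short-circuits without the retry; with it missing, calling the callback raises a TypeError, which matches the failing-smoke-test symptom described above.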

Verification

I tested on base commit

rwmjhb · 2026-04-15T07:31:02Z

Please resolve the conflicts first.


jlin53882 · 2026-04-15T09:50:48Z

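@rwmjhb could you take a look?

Status update

✅ Conflicts resolved: the PR has been rebased onto the latest upstream/master, with zero conflicts.

⚠️ packaging-and-workflow CI failure

CI fails in verify-ci-test-manifest.mjs:

Error: unexpected manifest entry: test/hook-dedup-phase1.test.mjs

A preliminary check suggests this is a pre-existing issue introduced when upstream #617 was merged, not something introduced by this PR:

test/hook-dedup-phase1.test.mjs is listed in scripts/ci-test-manifest.mjs (added by #617, "feat(phase1): hook event deduplication (continuation of PR #430)")
but it is not in EXPECTED_BASELINE in scripts/verify-ci-test-manifest.mjs
which makes verifyExactOnceCoverage() fail
upstream master's own CI fails at the same step

I'm checking the CI status of upstream #617 and will post an update once confirmed.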
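The failure mode can be sketched as an exact-once comparison between the manifest and the baseline. The function below is a hypothetical reconstruction for illustration, not the actual scripts/verify-ci-test-manifest.mjs code; only the EXPECTED_BASELINE name and the "unexpected manifest entry" message come from the log above.

```typescript
// Hypothetical: every manifest entry must appear in the baseline exactly once, and vice versa.
function verifyExactOnceCoverage(manifest: string[], expectedBaseline: string[]): void {
  const baseline = new Set(expectedBaseline);
  const seen = new Set<string>();
  for (const entry of manifest) {
    // The "duplicate" and "missing" messages are illustrative, not from the real script.
    if (seen.has(entry)) throw new Error(`duplicate manifest entry: ${entry}`);
    seen.add(entry);
    if (!baseline.has(entry)) throw new Error(`unexpected manifest entry: ${entry}`);
  }
  for (const entry of baseline) {
    if (!seen.has(entry)) throw new Error(`missing manifest entry: ${entry}`);
  }
}
```

Under this model, a new test file has to be added to both lists in the same change; adding it only to the manifest reproduces the error above.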

jlin53882 · 2026-04-15T09:56:33Z

Potential issues related to PR #617 Hook Event Deduplication

Hi @CortexReach/memory-lancedb-pro maintainers,

PR #617 implemented hook event deduplication (a _hookEventDedup Set with at most 200 entries, pruned to the newest 100) and registered its new test file in scripts/ci-test-manifest.mjs.
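The bounded dedup structure can be sketched with a JavaScript Set, which iterates in insertion order, so the oldest keys come first. The class name BoundedDedup and its seen() method are hypothetical; only the 200/100 limits come from the comment, and pruning once the size exceeds 200 is an assumption.

```typescript
// Hypothetical sketch of a bounded dedup set: max 200 keys, pruned to the newest 100.
class BoundedDedup {
  private readonly keys = new Set<string>();
  private readonly max: number;
  private readonly keep: number;

  constructor(max = 200, keep = 100) {
    this.max = max;
    this.keep = keep;
  }

  // Returns true if the key was already seen (the event should be skipped).
  seen(key: string): boolean {
    if (this.keys.has(key)) return true;
    this.keys.add(key);
    if (this.keys.size > this.max) {
      // Set preserves insertion order, so drop the oldest keys until `keep` remain.
      const excess = this.keys.size - this.keep;
      let removed = 0;
      for (const k of this.keys) {
        if (removed >= excess) break;
        this.keys.delete(k);
        removed++;
      }
    }
    return false;
  }

  get size(): number {
    return this.keys.size;
  }
}
```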

Note that PR #626 adds two new test files:

test/lock-recovery.test.mjs
test/store-write-queue.test.mjs

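Suggestion: confirm that these two test files are also registered in scripts/ci-test-manifest.mjs (a similar packaging-and-workflow CI failure briefly appeared on PR #617 and was fixed before merge).

Also, PR #617's hook deduplication logic (index.ts) and PR #626's lock recovery (src/store.ts) touch different module-level state. If they interact at all (for example, whether a dedup skip at the hook layer changes the timing of lock events at the store layer), it would be worth confirming that the tests cover it.

Thanks!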


…ortexReach#415) Align with upstream/master (including the PR CortexReach#626 proactive cleanup) while keeping James's fix for Issue CortexReach#415:
- Bring in proactive cleanup from PR CortexReach#626 (stale locks older than 5 minutes are removed automatically)
- [Fix CortexReach#415] Conservative retries settings:
  - minTimeout: 1000ms (avoids overly dense retries under high load)
  - maxTimeout: 30000ms (tolerates longer event-loop stalls)
  - stale: 10000ms
- [Fix CortexReach#415] onCompromised flag: a compromised lock no longer crashes immediately; the finally block decides whether to throw the fn() error or compromisedErr
- Add lock-stress-test.mjs: verifies concurrent writes and retry behavior, plus a stress test
PR CortexReach#517: CortexReach/memory-lancedb-pro
Issue CortexReach#415: ECOMPROMISED crash under event-loop pressure
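The onCompromised handling described in this commit can be sketched as follows. proper-lockfile's default onCompromised simply throws, which is the crash path; here the callback only records the error, and the try/finally decides what surfaces. The Acquire type, withLock, skipping release() on a compromised lock, and the precedence rule (an error from fn() wins over compromisedErr) are assumptions for illustration, not the code in PR #517.

```typescript
// Hypothetical: acquire() registers an onCompromised callback and resolves to a release function.
type Acquire = (onCompromised: (err: Error) => void) => Promise<() => Promise<void>>;

async function withLock<T>(acquire: Acquire, fn: () => Promise<T>): Promise<T> {
  const state: { compromisedErr?: Error } = {};
  const release = await acquire((err) => {
    // Record the compromise instead of throwing (throwing here is the crash path).
    state.compromisedErr = err;
  });

  try {
    const result = await fn();
    // fn() finished, but the lock was lost mid-write: report that to the caller.
    if (state.compromisedErr) throw state.compromisedErr;
    return result;
  } finally {
    // Assumption: a compromised lock is no longer ours, so skip release().
    // If fn() threw, its error propagates unchanged (fn() error takes precedence).
    if (!state.compromisedErr) await release();
  }
}
```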

- PR CortexReach#517: onCompromised callback for graceful ECOMPROMISED handling
- PR CortexReach#517: retries 10, maxTimeout 30000ms, minTimeout 1000ms
- PR CortexReach#626: proactive stale lock cleanup (>5min)
- PR CortexReach#626: updateQueue for intra-instance write serialization
Issue CortexReach#632: lock contention in multi-agent environment
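To estimate how long a contended writer waits under these settings: as we understand it, proper-lockfile passes its retries option to the retry package, which waits min(minTimeout * factor^attempt, maxTimeout) before each retry. The sketch computes that schedule for retries 10, minTimeout 1000ms, maxTimeout 30000ms; factor 2 and randomize off are the retry package's defaults and are assumed here, since the commit messages do not state them.

```typescript
// Delay before each retry, following the retry package's backoff formula
// with randomize disabled and factor defaulting to 2 (both assumed).
function retryDelays(retries: number, minTimeout: number, maxTimeout: number, factor = 2): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    const timeout = Math.round(Math.max(minTimeout, 1) * Math.pow(factor, attempt));
    delays.push(Math.min(timeout, maxTimeout));
  }
  return delays;
}

const delays = retryDelays(10, 1000, 30000);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays.join(", ")); // 1000, 2000, 4000, 8000, 16000, 30000, 30000, 30000, 30000, 30000
console.log(`total back-off before giving up: ${totalMs / 1000}s`); // 181s
```

With stale: 10000ms, a lock whose mtime the holder has stopped refreshing for about 10 s can be treated as stale and taken over, so the roughly 3-minute back-off budget mainly matters when another live writer keeps refreshing its lock.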

chatgpt-codex-connector bot reviewed Apr 14, 2026


jlin53882 mentioned this pull request Apr 15, 2026

fix(store): proper-lockfile retries + ECOMPROMISED graceful handling (#415) #517

Open

fix: resolve stale lock and same-instance write contention (CortexRea…

18c39e7

…ch#622, CortexReach#623)

jlin53882 force-pushed the fix/issue-622-623-lock-clean-v2 branch from f691cff to 18c39e7 on April 15, 2026 09:18

rwmjhb merged commit 2718a8b into CortexReach:master Apr 15, 2026
5 of 7 checks passed

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 16, 2026

fix(store): align with PR CortexReach#626 — add proactive cleanup for…

895dff2

… stale locks

ScientificProgrammer mentioned this pull request Apr 19, 2026

fix(auto-recall): thread AbortSignal to cancel embedding on timeout #668

Open

arthurliang mentioned this pull request Apr 20, 2026

[BUG] ENOENT from proper-lockfile realpath() after proactive stale lock cleanup #670

Open


rwmjhb merged 1 commit into CortexReach:master from jlin53882:fix/issue-622-623-lock-clean-v2


2 participants
