fix: resolve stale lock and same-instance write contention (#622, #623) #626
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f691cff8bd
    if (ageMs > staleThresholdMs) {
      if (stat.isDirectory()) {
Clear legacy lock artifacts immediately
This cleanup gate can leave a legacy .memory-write.lock regular file in place for up to 5 minutes, but the new lockfile.lock(this.config.dbPath, { lockfilePath }) flow expects that path to be created as a lock directory. In upgrade/restart scenarios where the old file was created recently, lock acquisition can fail (ELOCKED/ENOTDIR) and block all writes until the age threshold is crossed. Remove incompatible pre-existing lock artifacts regardless of age (or use the same short stale window as the lock) before acquiring the new lock.
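The reviewer's suggestion can be sketched as a cleanup step that runs just before acquiring the lock. This is a minimal sketch, not the PR's code. It assumes the new lock is a directory at `lockfilePath`, which is how proper-lockfile creates locks. The helper names `clearIncompatibleLockArtifact` and `statOrNull` are hypothetical. A regular file at the lock path is removed immediately, whatever its age. A lock directory is removed only after it passes the stale threshold.

```typescript
import { promises as fs } from "node:fs";
import type { Stats } from "node:fs";

type CleanupResult = "none" | "removed-legacy-file" | "removed-stale-dir" | "kept";

// Returns the Stats for a path, or null if nothing exists there.
async function statOrNull(path: string): Promise<Stats | null> {
  try {
    return await fs.stat(path);
  } catch (err) {
    if ((err as { code?: string }).code === "ENOENT") return null;
    throw err;
  }
}

// Hypothetical pre-lock cleanup. A legacy regular file at the lock path is
// incompatible with a directory-based lock, so it is removed regardless of
// age; a lock directory is removed only once it is older than the threshold.
async function clearIncompatibleLockArtifact(
  lockfilePath: string,
  staleThresholdMs: number,
  now: number = Date.now(),
): Promise<CleanupResult> {
  const stat = await statOrNull(lockfilePath);
  if (stat === null) return "none";

  if (!stat.isDirectory()) {
    // Legacy .memory-write.lock file: delete now instead of waiting 5 minutes.
    await fs.rm(lockfilePath, { force: true });
    return "removed-legacy-file";
  }

  const ageMs = now - stat.mtimeMs;
  if (ageMs > staleThresholdMs) {
    // Lock directory left behind by a writer that never released it.
    await fs.rm(lockfilePath, { recursive: true, force: true });
    return "removed-stale-dir";
  }
  return "kept";
}
```

Calling a helper like this right before `lockfile.lock(this.config.dbPath, { lockfilePath })` means a recently written legacy file cannot block writes after an upgrade. Genuinely stale lock directories are still subject to the age check.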
CI Failure Analysis
The cli-smoke and storage-and-schema job failures are NOT related to this PR's changes.
Root Cause
These failures are tracked in:
The cli-smoke.mjs mock for store is missing a count() method. PR #582 added a countStore parameter to
Verification
I tested on base commit
Please resolve the conflicts first.
Branch updated from f691cff to 18c39e7.
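@rwmjhb Could you take a look? Status update:
✅ Conflicts resolved. The PR has been rebased onto … CI is failing in … My initial check suggests this is a pre-existing problem that appeared when upstream #617 was merged, not something introduced by this PR:
I'm checking the CI status of upstream #617 and will post an update once that is confirmed.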
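Potential issue related to PR #617 (Hook Event Deduplication)
Hi @CortexReach/memory-lancedb-pro maintainers, PR #617 implemented hook event deduplication (… I noticed that PR #626 adds two new test files:
Suggestion: please confirm whether these two test files are also registered in … In addition, the hook deduplication logic in PR #617 (… Thanks!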
…ortexReach#415)
Aligns with upstream/master (including the PR CortexReach#626 proactive cleanup) while keeping James's fix for Issue CortexReach#415:
- Bring in proactive cleanup from PR CortexReach#626 (stale locks older than 5 minutes are removed automatically)
- [Fix CortexReach#415] Conservative retry settings:
  - minTimeout: 1000ms (avoids overly aggressive retries under high load)
  - maxTimeout: 30000ms (tolerates longer event-loop blocking)
  - stale: 10000ms
- [Fix CortexReach#415] onCompromised flag: a compromised lock no longer crashes the process immediately; the finally block decides in one place whether to throw the fn() error or compromisedErr
- Add lock-stress-test.mjs: verifies concurrent writes and retry behavior, and runs a stress test

PR CortexReach#517: CortexReach/memory-lancedb-pro
Issue CortexReach#415: ECOMPROMISED crash under event-loop pressure
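The #415 changes listed above can be sketched roughly as follows. They are conservative retries, recording `onCompromised` instead of crashing, and a single place that decides which error to throw. The option names `stale`, `retries`, `lockfilePath`, and `onCompromised` are proper-lockfile's. Everything else is an assumption rather than the PR's actual code. That includes the injected `lock` parameter, which lets the sketch run without the dependency, and the rule that an error from `fn()` takes precedence over `compromisedErr`. The commit makes this decision inside a finally block. The sketch releases in `finally` and throws afterwards, which avoids throwing from inside `finally`.

```typescript
// Minimal local stand-ins for proper-lockfile's lock() signature, so the
// sketch runs without the dependency. Only the options used here are typed.
type LockOptions = {
  stale?: number;
  retries?: { retries: number; minTimeout: number; maxTimeout: number };
  lockfilePath?: string;
  onCompromised?: (err: Error) => void;
};
type LockFn = (file: string, opts: LockOptions) => Promise<() => Promise<void>>;

// Conservative settings from the commit message (fix for #415).
const LOCK_OPTIONS: LockOptions = {
  stale: 10_000,
  retries: { retries: 10, minTimeout: 1_000, maxTimeout: 30_000 },
};

async function runWithFileLock<T>(
  lock: LockFn,
  dbPath: string,
  lockfilePath: string,
  fn: () => Promise<T>,
): Promise<T> {
  // Holder object so the callback's write is visible after fn() completes.
  const state: { compromisedErr?: Error } = {};
  const release = await lock(dbPath, {
    ...LOCK_OPTIONS,
    lockfilePath,
    // Record the compromise instead of throwing from the callback, which
    // would surface as an uncaught exception and crash the process.
    onCompromised: (err) => {
      state.compromisedErr = err;
    },
  });

  let failed = false;
  let fnErr: unknown;
  let result: T | undefined;
  try {
    result = await fn();
  } catch (err) {
    failed = true;
    fnErr = err;
  } finally {
    // Releasing a compromised lock may reject; that must not mask fn()'s outcome.
    await release().catch(() => undefined);
  }

  // Assumed precedence: fn()'s own error first, then the lock compromise.
  if (failed) throw fnErr;
  if (state.compromisedErr !== undefined) throw state.compromisedErr;
  return result as T;
}
```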
- PR CortexReach#517: onCompromised callback for graceful ECOMPROMISED handling
- PR CortexReach#517: retries 10, maxTimeout 30000ms, minTimeout 1000ms
- PR CortexReach#626: proactive stale lock cleanup (>5min)
- PR CortexReach#626: updateQueue for intra-instance write serialization

Issue CortexReach#632: lock contention in multi-agent environment
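The waits implied by "retries 10, maxTimeout 30000ms, minTimeout 1000ms" can be computed. This sketch assumes proper-lockfile hands `retries` to the `retry` package. It also assumes that package's default exponential formula, min(minTimeout * factor^attempt, maxTimeout), with factor 2 and randomization off. The `backoffSchedule` helper is hypothetical.

```typescript
// Delay before each retry attempt under capped exponential backoff
// (assumed to match the `retry` package's defaults with randomize: false).
function backoffSchedule(
  retries: number,
  minTimeout: number,
  maxTimeout: number,
  factor = 2,
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(Math.round(minTimeout * factor ** attempt), maxTimeout));
  }
  return delays;
}

// Settings from PR #517: retries 10, minTimeout 1000ms, maxTimeout 30000ms.
const delays = backoffSchedule(10, 1_000, 30_000);
// delays: 1000, 2000, 4000, 8000, 16000, 30000, 30000, 30000, 30000, 30000
const totalWaitMs = delays.reduce((sum, ms) => sum + ms, 0);
// totalWaitMs: 181000
```

Under these assumptions, a blocked writer gives up after about 181 s of cumulative waiting, not counting the time spent in each attempt. From the sixth retry onward, every wait is capped at 30 s.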
Summary
Fixes two related lock issues in memory-lancedb-pro:
Changes
src/store.ts
runWithFileLock() and runSerializedUpdate()
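The two layers in src/store.ts can be sketched as a promise-chain queue wrapped around the file lock. Only the names `updateQueue` and `runSerializedUpdate` come from this PR. The `SerializedWriter` class and the injected `FileLockRunner` are hypothetical, and the real store may structure these calls differently. Writes from one instance queue up before they touch the lock file, so they no longer race each other for it (#623). The file lock still guards against other processes.

```typescript
type FileLockRunner = <T>(fn: () => Promise<T>) => Promise<T>;

// Hypothetical skeleton showing only the two locking layers.
class SerializedWriter {
  // Tail of the in-process write chain (the PR names this updateQueue).
  private updateQueue: Promise<void> = Promise.resolve();
  private readonly withFileLock: FileLockRunner;

  constructor(withFileLock: FileLockRunner) {
    this.withFileLock = withFileLock;
  }

  runSerializedUpdate<T>(fn: () => Promise<T>): Promise<T> {
    // Wait for every earlier write in this instance to settle, then take the
    // cross-process file lock, so same-instance writers never contend for it.
    const run = this.updateQueue.then(() => this.withFileLock(fn));
    // Settle the chain either way; callers still see the outcome via `run`.
    this.updateQueue = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}
```

Each write chains `.then(() => undefined, () => undefined)` onto the queue. This keeps one failed write from blocking the rest of the queue. The caller still receives the original rejection through the returned promise.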
test/lock-recovery.test.mjs
test/store-write-queue.test.mjs
Testing
Closes #622
Closes #623