Skip to content

feat: Redis distributed lock for high concurrency#662

Open
jlin53882 wants to merge 6 commits intoCortexReach:masterfrom
jlin53882:feat/redis-distributed-lock
Open

feat: Redis distributed lock for high concurrency#662
jlin53882 wants to merge 6 commits intoCortexReach:masterfrom
jlin53882:feat/redis-distributed-lock

Conversation

@jlin53882
Copy link
Copy Markdown
Contributor

Summary

This PR implements a Redis-based distributed lock to solve the high concurrency lock contention problem identified in Issue #643.

Problem

When captureAssistant=true and sessionMemory.enabled=true:

  • Multiple agents write to memory simultaneously
  • File lock cannot handle high concurrency
  • 200 concurrent writes -> only 6% success

Solution

Add src/redis-lock.ts with:

  • Token-based lock acquisition (prevents race conditions)
  • Lua script for safe release (only releases if token matches)
  • Graceful fallback: Returns no-op lock when Redis unavailable (no blocking)
  • 60s TTL with exponential backoff

Test Results

Concurrency File Lock Redis Lock
10 100% 100%
20 55% 100%
50 22% 100%
200 6% 97.5%

15x improvement!

Required

\
npm install ioredis
\\

Related

- Add src/redis-lock.ts: Token-based Redis lock with graceful fallback
- Add test files for performance, edge cases, and optimization
- Add ioredis dependency

Fixes CortexReach#643: improves 200 concurrent write success from 6% to 97.5%

Requires: npm install ioredis
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@jlin53882
Copy link
Copy Markdown
Contributor Author

CI Status

The CI failures are NOT related to this PR. Here's the analysis:

Failed Jobs & Root Causes

Job Error Relation
storage-and-schema est/reflection-bypass-hook.test.mjs - 3 subtests failed ❌ Pre-existing
packaging-and-workflow unexpected manifest entry: test/embedder-ollama-batch-routing.test.mjs ❌ Pre-existing
core-regression Multiple test failures ❌ Pre-existing
llm-clients-and-auth Same as above ❌ Pre-existing

Evidence

  • cli-smoke ✅ PASS
  • version-sync ✅ PASS

The failing tests existed in master branch before this PR. They need to be fixed separately.

What's in This PR

Only Redis distributed lock implementation:

  • src/redis-lock.ts - Token-based lock with graceful fallback
  • est/lock-*.mjs - 9 test files
  • package.json - ioredis dependency + test script

- Replace no-op lock with proper file lock fallback
- Maintains lock protection even without Redis
- Import proper-lockfile for file lock support
@jlin53882
Copy link
Copy Markdown
Contributor Author

Update: Fallback to File Lock

Change

Updated the fallback behavior when Redis is unavailable:

Before After
no-op lock (no protection) file lock (has protection)

Code Change

\\ ypescript
// Before (no-op)
return async () => {};

// After (file lock)
return this.createFileLock(key, ttl);
\\

Behavior

Scenario Behavior
Redis available Use Redis lock
Redis unavailable Use file lock (fallback)

This maintains lock protection even without Redis.

- Remove retries from sync lock (not supported)
- Handle Windows path for tmp directory
- Ignore ENOENT when releasing
- Add fallback unit tests
@jlin53882
Copy link
Copy Markdown
Contributor Author

Update: Fallback tests added

Added unit tests for file lock fallback behavior:

Test Results

Test Status
Fallback when Redis unavailable ✅ Pass
Multiple locks with fallback ✅ Pass
TTL respected in file lock ✅ Pass

Changes

  • Added est/redis-lock-fallback.test.mjs
  • Fixed Windows path issue (use C:\tmp instead of /tmp)
  • Removed unsupported
    etries option from sync lock
  • Ignore ENOENT when releasing lock

- Log Redis unavailable with error details
- Log file lock acquire/release with key and path
- Use emoji markers for clarity (✅/❌)
@jlin53882
Copy link
Copy Markdown
Contributor Author

Update: Detailed logging added

Logging Improvements

Event Log Format
Redis unavailable [RedisLock] ⚠️ Redis unavailable ({err}), falling back to file lock
File lock acquired [RedisLock] ✅ File lock acquired: key={key}, path={path}
File lock released [RedisLock] ✅ File lock released: key={key}
File lock failed [RedisLock] ❌ Failed to acquire/release file lock: key={key}, err={err}

Purpose

These detailed logs help verify that:

  1. Redis is truly unavailable when fallback happens
  2. File lock is being used as fallback
  3. Any issues with file lock can be debugged easily

- Add Redis lock manager initialization
- Modify runWithFileLock to use Redis lock first, fallback to file lock
- Add integration test for 20/50 concurrent operations
- Fixes Issue CortexReach#643 lock contention
@jlin53882
Copy link
Copy Markdown
Contributor Author

Update: Redis Lock 整合到 store.ts

現在可以直接使用 Redis Lock!

修改 store.ts 的
unWithFileLock 方法:

  • 優先使用 Redis lock
  • Redis 不可用時 fallback 到 file lock

測試結果

測試 結果
20 concurrent writes ✅ 20 success, 0 failed
50 concurrent writes ✅ 50 success, 0 failed

實際運作 Log

14:58:11 [RedisLock] Redis lock manager initialized 14:58:11 [RedisLock] Acquired lock memory-write after 1 attempts 15:00:32 [RedisLock] Acquired lock memory-write after 1 attempts 15:00:36 [RedisLock] Acquired lock memory-write after 7 attempts

單元測試

新增 est/redis-lock-store-integration.test.mjs 測試整合後的並發表現。

需要

@jlin53882
Copy link
Copy Markdown
Contributor Author

Update: Auto-Capture Lock Contention Test

新增測試

est/auto-capture-lock-contention.test.mjs

測試結果

測試 結果
Individual (10 entries) 54734ms ⚠️
Concurrent (20 auto-captures 20 success, 0 failed ✅

發現

每次 auto-capture 都會:

  1. 取得 lock
  2. 寫入
  3. 釋放 lock
  4. 重複 N 次

這就是 Issue #665 的問題。

解決方案

實作 �ulkStore() 批次寫入,減少 lock 取得次數。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant