fix(mcp): async tool handlers + pre-computed embeddings — parallel-store hang#318

Open
Huntehhh wants to merge 1 commit into buildingjoshbetter:main from Huntehhh:fix/async-mcp-handlers-parallel-store-hang
Conversation

@Huntehhh
Contributor

Summary

Resolves the 10-15 second Claude Code harness hang when 3+ parallel truememory_store (or truememory_search) MCP calls are issued in a single tool batch. Server-side writes complete in ~60ms — the hang is at the MCP transport + engine-lock layers.

Two-layer fix:

  1. MCP layer: 7 hot-path @mcp.tool() handlers changed def → async def. Engine calls wrapped in asyncio.to_thread() so FastMCP's single event-loop thread stays free for concurrent JSON-RPC requests. Also fixes @tracked to return an async wrapper for coroutine functions (otherwise the async-ification is silently defeated by the existing sync wrapper).

  2. Engine layer: TrueMemoryEngine.add() pre-computes both embeddings OUTSIDE _write_lock. Previously the lock was held during ~10-50ms of model.encode() per store, serializing all concurrent writes. PyTorch releases the GIL inside .encode(), so concurrent stores now overlap on inference; they only contend at the actual INSERTs (μs).
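The MCP-layer shape described above can be sketched as a minimal, self-contained Python example. This is an illustration, not the real truememory code: the `@mcp.tool()` registration is omitted, and `blocking_engine_add`, the handler signature, and the sleep duration are stand-ins.

```python
import asyncio
import functools
import inspect
import time

def tracked(func):
    """Async-aware decorator sketch: without the iscoroutinefunction
    branch, a sync wrapper around an async handler returns an unawaited
    coroutine, silently defeating the async-ification."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            return await func(*args, **kwargs)
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return sync_wrapper

def blocking_engine_add(content: str) -> str:
    time.sleep(0.05)  # stands in for model.encode() + the actual INSERT
    return f"stored:{content}"

@tracked
async def truememory_store(content: str) -> str:
    # asyncio.to_thread keeps the single event-loop thread free for
    # other concurrent JSON-RPC requests while the blocking call runs.
    return await asyncio.to_thread(blocking_engine_add, content)

async def batch():
    start = time.perf_counter()
    results = await asyncio.gather(*(truememory_store(f"m{i}") for i in range(3)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(batch())
# three 50 ms calls overlap instead of running back-to-back
```

With sync handlers, the same batch would take at least 150 ms end-to-end; with `to_thread` the three calls overlap in the default executor.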

Changes

| File | Change |
| --- | --- |
| truememory/mcp_server.py | 7 tool handlers def → async def; engine calls via await asyncio.to_thread(...) |
| truememory/engine.py | add() pre-computes content + separation embeddings outside _write_lock |
| truememory/telemetry.py | @tracked detects coroutine functions and returns an async wrapper for them |
| tests/test_health_stats.py | Update test_truememory_stats_includes_health to await the now-async handler |
| tests/test_concurrent_store_hang.py | NEW — 3 regression tests (threaded engine.add(), handler-shape check, asyncio.gather() end-to-end) |

truememory_configure intentionally stays sync — heavy state mutation, called once at setup, not on the hot path.
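The engine-layer lock reordering can be sketched as follows. The class, `FakeModel`, and the second-embedding input are hypothetical stand-ins; only the lock placement mirrors the described fix.

```python
import threading

class FakeModel:
    """Stand-in for the embedding model; the real .encode() takes
    ~10-50 ms per call and releases the GIL inside PyTorch."""
    def encode(self, text: str) -> list[float]:
        return [float(len(text))]

class EngineSketch:
    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        self._model = FakeModel()
        self.rows: list[tuple] = []

    def add(self, content: str) -> int:
        # Fix: compute both embeddings BEFORE taking _write_lock, so the
        # lock covers only the microsecond-scale insert, not inference.
        content_vec = self._model.encode(content)
        separation_vec = self._model.encode(content[::-1])
        with self._write_lock:
            self.rows.append((content, content_vec, separation_vec))
            return len(self.rows)

engine = EngineSketch()
rowid = engine.add("hello")
```

Because the GIL is released during real inference, moving `encode()` out of the critical section lets concurrent `add()` calls overlap on the slow part and contend only on the append/INSERT.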

Test Plan

  • New regression tests pass: pytest tests/test_concurrent_store_hang.py (3/3)
  • Health stats tests pass: pytest tests/test_health_stats.py (11/11)
  • Threading tests pass: pytest tests/test_ensure_connection_threading.py (2/2)
  • Wider suite on Windows: 572 pass / 17 pre-existing failures unrelated to this change (test_platform_compat.py POSIX-path tests, test_spawn_gate.py system-memory dependent, test_cli_help.py locale issues, etc.)
  • Manual verification needed on POSIX — pytest on Linux/macOS to confirm no regression in the Windows-failing tests
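The threaded regression check can be sketched like this (thread count, sleep duration, and `fake_store` are illustrative; the actual test body in test_concurrent_store_hang.py exercises the real engine.add()):

```python
import threading
import time

def fake_store(results: list, i: int) -> None:
    time.sleep(0.05)  # blocking work with the GIL released, like .encode()
    results[i] = i

N = 4
results = [None] * N
start = time.perf_counter()
threads = [threading.Thread(target=fake_store, args=(results, i)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# pre-fix lock contention would push this toward N * 50 ms;
# overlapped stores finish in roughly one sleep interval
```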

Breaking Changes

The 7 MCP tool handlers are now async def. Transparent for MCP clients (JSON-RPC wire protocol unchanged). Direct Python callers must update:

- result = truememory_store(content="...", user_id="...")
+ result = await truememory_store(content="...", user_id="...")
+ # or, from sync code:  result = asyncio.run(truememory_store(content="...", user_id="..."))

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>

fix(mcp): async tool handlers + pre-computed embeddings — parallel-store hang

Resolves the 10-15s harness hang when 3+ truememory_store or search MCP
calls fire in parallel. Three layered changes:

1. mcp_server.py — 7 hot-path @mcp.tool() handlers (store / search /
   search_deep / get / forget / stats / entity_profile) changed from
   sync `def` to `async def`. Engine calls run via
   `await asyncio.to_thread(...)` so FastMCP's event-loop thread stays
   free for concurrent JSON-RPC requests. truememory_configure stays
   sync — heavy state mutation, called once at setup.

2. telemetry.py — `@tracked` is now async-aware. Wrapping an `async def`
   in the old sync wrapper produced an unawaited coroutine object that
   silently defeated the async-ification.

3. engine.py — `add()` pre-computes both content + separation embeddings
   OUTSIDE `_write_lock`. Previously the lock was held during the two
   ~10-50ms model.encode() calls, serializing all concurrent stores.
   PyTorch releases the GIL inside .encode(), so concurrent stores can
   now overlap on inference; they only contend at the INSERTs (μs).

Tests:
- tests/test_concurrent_store_hang.py (new): three regression locks —
  threaded engine.add(), MCP handler-shape check, asyncio.gather()
  end-to-end.
- tests/test_health_stats.py: wrap the now-async truememory_stats() in
  asyncio.run().

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>