fix(mcp): async tool handlers + pre-computed embeddings (parallel-store hang) #318
Open
Huntehhh wants to merge 1 commit into
…ore hang

Resolves the 10-15s harness hang when 3+ `truememory_store` or `truememory_search` MCP calls fire in parallel. Three layered changes:

1. `mcp_server.py`: 7 hot-path `@mcp.tool()` handlers (store / search / search_deep / get / forget / stats / entity_profile) changed from sync `def` to `async def`. Engine calls run via `await asyncio.to_thread(...)` so FastMCP's event-loop thread stays free for concurrent JSON-RPC requests. `truememory_configure` stays sync: heavy state mutation, called once at setup.
2. `telemetry.py`: `@tracked` is now async-aware. Wrapping an `async def` in the old sync wrapper produced an unawaited coroutine object that silently defeated the async-ification.
3. `engine.py`: `add()` pre-computes both content and separation embeddings OUTSIDE `_write_lock`. Previously the lock was held during the two ~10-50ms `model.encode()` calls, serializing all concurrent stores. PyTorch releases the GIL inside `.encode()`, so concurrent stores can now overlap on inference; they only contend at the INSERTs (μs).

Tests:

- `tests/test_concurrent_store_hang.py` (new): three regression locks (threaded `engine.add()`, MCP handler-shape check, `asyncio.gather()` end-to-end).
- `tests/test_health_stats.py`: wrap the now-async `truememory_stats()` in `asyncio.run()`.

Co-Authored-By: claude-opus-4-7 <wontreply@getfucked.ai>
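The three layered changes above can be sketched end to end in one self-contained script. Everything in it is hypothetical: `tracked`, `CALLS`, `ToyEngine`, `_encode`, and `store` are illustrative stand-ins, not the actual `telemetry.tracked`, `TrueMemoryEngine`, or MCP handler code, and `time.sleep()` stands in for `model.encode()` (sleeping releases the GIL, which the PR states PyTorch also does inside `.encode()`). The sketch shows an async-aware decorator, both embeddings computed before the write lock is taken, and a handler that offloads the blocking call with `asyncio.to_thread()`, driven concurrently with `asyncio.gather()` in the spirit of the new regression test.

```python
import asyncio
import functools
import inspect
import threading
import time

# Hypothetical telemetry sink: one (function name, elapsed seconds) entry per call.
CALLS: list[tuple[str, float]] = []


def tracked(func):
    """Sketch of an async-aware timing decorator (hypothetical stand-in for telemetry.tracked)."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                # Awaiting here is the point: a plain sync wrapper would just
                # return the unawaited coroutine object and time nothing.
                return await func(*args, **kwargs)
            finally:
                CALLS.append((func.__name__, time.perf_counter() - start))
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            CALLS.append((func.__name__, time.perf_counter() - start))
    return sync_wrapper


class ToyEngine:
    """Hypothetical engine: slow encode, fast insert."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self.rows = []

    def _encode(self, text):
        time.sleep(0.1)  # stand-in for model.encode(); sleeping releases the GIL
        return [float(len(text))]

    def add(self, text):
        # Both embeddings are computed OUTSIDE the lock, so concurrent adds overlap here.
        content_vec = self._encode(text)
        separation_vec = self._encode(text.upper())
        with self._write_lock:  # only the cheap "INSERT" is serialized
            self.rows.append((text, content_vec, separation_vec))
            return len(self.rows)


ENGINE = ToyEngine()


@tracked
async def store(text: str) -> dict:
    # The blocking engine call runs in a worker thread; the event loop stays free.
    row_id = await asyncio.to_thread(ENGINE.add, text)
    return {"id": row_id}


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(store(f"memory {i}") for i in range(3)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(sorted(r["id"] for r in results))  # [1, 2, 3]
print(f"3 parallel stores took {elapsed:.2f}s")  # expect well under the ~0.6s a serialized version needs
```

If the two `_encode()` calls were moved inside `with self._write_lock:`, the three stores would run one after another and take roughly three times as long, which is the serialization the engine change removes.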
This was referenced May 16, 2026
Summary
Resolves the 10-15 second Claude Code harness hang when 3+ parallel `truememory_store` (or `truememory_search`) MCP calls are issued in a single tool batch. Server-side writes complete in ~60ms; the hang is at the MCP transport and engine-lock layers.

Two-layer fix:
- MCP layer: 7 hot-path `@mcp.tool()` handlers changed from `def` to `async def`. Engine calls are wrapped in `asyncio.to_thread()` so FastMCP's single event-loop thread stays free for concurrent JSON-RPC requests. Also fixes `@tracked` to return an async wrapper for coroutine functions (otherwise the existing sync wrapper silently defeats the async-ification).
- Engine layer: `TrueMemoryEngine.add()` pre-computes both embeddings OUTSIDE `_write_lock`. Previously the lock was held during ~10-50ms of `model.encode()` per store, serializing all concurrent writes. PyTorch releases the GIL inside `.encode()`, so concurrent stores now overlap on inference; they only contend at the actual INSERTs (μs).

Changes
| File | Change |
|---|---|
| `truememory/mcp_server.py` | `def` → `async def`; engine calls via `await asyncio.to_thread(...)` |
| `truememory/engine.py` | `add()` pre-computes content + separation embeddings outside `_write_lock` |
| `truememory/telemetry.py` | `@tracked` detects coroutine functions and returns an async wrapper for them |
| `tests/test_health_stats.py` | `test_truememory_stats_includes_health` updated to await the now-async handler |
| `tests/test_concurrent_store_hang.py` | New: three regression tests (threaded `engine.add()`, handler-shape check, `asyncio.gather()` end-to-end) |

`truememory_configure` intentionally stays sync: heavy state mutation, called once at setup, not on the hot path.

Test Plan
- `pytest tests/test_concurrent_store_hang.py` (3/3)
- `pytest tests/test_health_stats.py` (11/11)
- `pytest tests/test_ensure_connection_threading.py` (2/2)
- … (`test_platform_compat.py` POSIX-path tests, `test_spawn_gate.py` system-memory dependent, `test_cli_help.py` locale issues, etc.)

Breaking Changes
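The 7 MCP tool handlers are now `async def`. This is transparent for MCP clients (the JSON-RPC wire protocol is unchanged). Direct Python callers must update: each handler call now returns a coroutine that has to be awaited or, from synchronous code, run with `asyncio.run()`, as `tests/test_health_stats.py` now does for `truememory_stats()`.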
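A minimal sketch of the caller-side migration for this breaking change. `truememory_stats` here is a hypothetical stand-in that returns a placeholder dict; the real handler's arguments and return value may differ. It shows that calling the async handler without awaiting only creates a coroutine object, followed by the two call styles that actually run it.

```python
import asyncio
import inspect


async def truememory_stats() -> dict:
    """Hypothetical stand-in for the now-async handler; the real one returns engine stats."""
    await asyncio.sleep(0)
    return {"memories": 0}


def sync_caller() -> dict:
    # Old style, `stats = truememory_stats()`, now only builds a coroutine object.
    pending = truememory_stats()
    assert inspect.iscoroutine(pending)
    pending.close()  # discard it cleanly so no "never awaited" RuntimeWarning is emitted
    # New style from synchronous code: run the coroutine to completion.
    return asyncio.run(truememory_stats())


async def async_caller() -> dict:
    # Inside code that already runs on an event loop, await the handler directly.
    return await truememory_stats()


print(sync_caller())                # {'memories': 0}
print(asyncio.run(async_caller()))  # {'memories': 0}
```

`asyncio.run()` raises `RuntimeError` when called while an event loop is already running in the same thread, so callers that are themselves coroutines should use `await` rather than `asyncio.run()`.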