fix: shared model server + tier switch bug fixes #343
Merged
VectorCacheRegistry.set() was called without last_embedded_id, so last_embedded_id was reset to 0 after every rebuild. This broke delta rebuilds, because the system then assumed no messages had been embedded. The finalize step now queries the vec table for the actual MAX(rowid) and COUNT(*) before registering the result.
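A minimal sketch of the corrected finalize step, using Python's sqlite3. The registry class, the keyword arguments of its set() method, and the table name vec_edge are hypothetical stand-ins, since the real VectorCacheRegistry.set() signature isn't shown in this PR. A plain table also substitutes for the sqlite-vec virtual table:

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass


@dataclass
class VecCacheEntry:
    # Hypothetical registry record; field names are illustrative only.
    table: str
    last_embedded_id: int
    row_count: int


class VectorCacheRegistry:
    """Hypothetical stand-in for the real registry: one entry per vec table."""

    def __init__(self) -> None:
        self._entries: dict[str, VecCacheEntry] = {}

    def set(self, table: str, *, last_embedded_id: int, row_count: int) -> None:
        self._entries[table] = VecCacheEntry(table, last_embedded_id, row_count)

    def get(self, table: str) -> VecCacheEntry | None:
        return self._entries.get(table)


def finalize_rebuild(conn: sqlite3.Connection, registry: VectorCacheRegistry, table: str) -> None:
    # Read real progress from the vec table instead of letting it default to 0.
    # The table name is interpolated, so it must come from trusted code only.
    max_rowid, count = conn.execute(
        f"SELECT COALESCE(MAX(rowid), 0), COUNT(*) FROM {table}"
    ).fetchone()
    registry.set(table, last_embedded_id=max_rowid, row_count=count)
```

COALESCE keeps an empty table at 0 rather than NULL, so a freshly created table still reads as "nothing embedded yet", while a populated one resumes from its highest rowid.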
migrate_legacy_vec_tables() used the currently active tier group, which could put edge-model vectors into basepro tables. It now detects the actual model from metadata (defaulting to edge for pre-tier-switch DBs) and registers the migrated tables via VectorCacheRegistry.set().
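A sketch of the detection step, assuming a key/value metadata table with an embedding_model key. The table schema, key name, and model names below are hypothetical; the point is that the tier comes from what the DB records, never from the currently active tier, and that a DB with no recorded model falls back to edge:

```python
from __future__ import annotations

import sqlite3

# Hypothetical mapping from the recorded embedding model to its tier group.
MODEL_TO_TIER = {
    "edge-model": "edge",
    "basepro-model": "basepro",
}


def detect_legacy_tier(conn: sqlite3.Connection) -> str:
    """Return the tier the legacy vectors were actually embedded with."""
    try:
        row = conn.execute(
            "SELECT value FROM metadata WHERE key = 'embedding_model'"
        ).fetchone()
    except sqlite3.OperationalError:
        # No metadata table at all: a pre-tier-switch DB, so assume edge.
        return "edge"
    if row is None:
        # Metadata exists but no model was recorded: also pre-tier-switch.
        return "edge"
    return MODEL_TO_TIER.get(row[0], "edge")
```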
Replace triple-sampling + adaptive sleeping with a single RAM check per batch. Old behavior: 7s of overhead per batch (5s triple-sample + 2s adaptive sleep) for 1s of work. New behavior: a 0.2s pause, plus 5s only if free RAM is below 2GB. Batch sizes are fixed at init based on total RAM and device type. OOM recovery is still handled by the worker (halve the batch and retry).
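A sketch of the simplified throttling loop. It reads "0.2s pause + 5s" as 0.2s always plus an extra 5s under memory pressure. The batch-size numbers in initial_batch_size() are hypothetical, and MemoryError stands in for whatever OOM exception the real embedding backend raises (for example, PyTorch raises its own OOM error type). RAM readings and sleep are injected so the logic can be exercised without real memory pressure:

```python
from __future__ import annotations

import time
from typing import Callable

GIB = 1024 ** 3
LOW_RAM_THRESHOLD = 2 * GIB  # back off only when free RAM drops below 2 GB
BASE_PAUSE_S = 0.2           # fixed pause after every batch
LOW_RAM_PAUSE_S = 5.0        # extra pause, only under memory pressure


def initial_batch_size(total_bytes: int, device: str) -> int:
    """Pick a batch size once, at init. Hypothetical numbers, for illustration."""
    size = 64 if device in ("cuda", "mps") else 32
    if total_bytes < 8 * GIB:
        size //= 2
    return size


def pause_after_batch(available_bytes: int) -> float:
    """One RAM reading per batch replaces triple-sampling + adaptive sleep."""
    pause = BASE_PAUSE_S
    if available_bytes < LOW_RAM_THRESHOLD:
        pause += LOW_RAM_PAUSE_S
    return pause


def run_batches(
    items: list,
    batch_size: int,
    embed_batch: Callable[[list], list],
    available_bytes: Callable[[], int],
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Embed items batch by batch; on OOM, halve the batch size and retry."""
    out: list = []
    i = 0
    while i < len(items):
        batch = items[i : i + batch_size]
        try:
            out.extend(embed_batch(batch))
        except MemoryError:  # stand-in for a backend-specific OOM error
            if batch_size == 1:
                raise
            batch_size //= 2
            continue  # retry the same items with a smaller batch
        i += len(batch)
        sleep(pause_after_batch(available_bytes()))
    return out
```

With 1s of work per batch, this caps the steady-state overhead at 0.2s instead of 7s, and only pays the 5s back-off when a single reading shows real pressure.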
Models were preloaded eagerly on startup, and the idle timer would then unload them 5 minutes later if no search happened, wasting the initial load. Models now lazy-load by default (they load on the first search). Users who want eager preloading can set TRUEMEMORY_PRELOAD_MODELS=1.
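A minimal lazy-loading sketch. LazyModel, preload_requested() and startup() are hypothetical names, and treating only the exact string "1" as "enabled" is an assumption about how TRUEMEMORY_PRELOAD_MODELS is parsed:

```python
from __future__ import annotations

import os
import threading
from typing import Callable, Generic, Mapping, TypeVar

T = TypeVar("T")


class LazyModel(Generic[T]):
    """Defer an expensive load until the first caller actually needs it."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._model: T | None = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> T:
        # Double-checked locking: concurrent first searches trigger one load.
        if self._model is None:
            with self._lock:
                if self._model is None:
                    self._model = self._loader()
        return self._model


def preload_requested(env: Mapping[str, str] = os.environ) -> bool:
    return env.get("TRUEMEMORY_PRELOAD_MODELS") == "1"


def startup(model: LazyModel) -> None:
    # Eager load only on explicit opt-in; otherwise the first search loads it.
    if preload_requested():
        model.get()
```

Because nothing loads at startup by default, the idle timer has nothing to unload until a search has actually used the model.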
#335) Add a standalone model server process that loads the embedding model and reranker ONCE and serves all TrueMemory processes (MCP server, ingest hooks, CLI) over a Unix domain socket. Reduces memory from ~10GB (5 processes x 2GB each) to ~2.5GB (1 server + 5 lightweight clients).
- truememory/model_server.py: UDS listener, lazy model loading, idle-timeout auto-shutdown, PID lifecycle management
- truememory/model_client.py: EmbeddingProxy/RerankerProxy drop-in replacements, auto-start logic, transparent fallback to local loading
- Integration: get_model() and get_reranker() use the server when available and fall back to local loading when it isn't running (e.g., in tests)
- MCP server startup calls ensure_server_running() to launch the server
- Set TRUEMEMORY_NO_MODEL_SERVER=1 to force local loading
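A heavily simplified sketch of the server/proxy split, not the code in truememory/model_server.py or model_client.py. The newline-delimited JSON protocol, the encode() method name, and fake_embed() (which stands in for the real model) are all assumptions; idle-timeout shutdown, auto-start and PID handling are omitted. It only illustrates one shared server process plus a client that falls back to loading locally when the socket is unreachable or TRUEMEMORY_NO_MODEL_SERVER=1 is set (Unix only):

```python
from __future__ import annotations

import json
import os
import socket
import socketserver
import threading
from typing import Callable

Embed = Callable[[list], list]


def fake_embed(texts: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for the embedding model: one length-based dimension.
    return [[float(len(t))] for t in texts]


class _Handler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # One JSON request per line in, one JSON response per line out.
        for line in self.rfile:
            texts = json.loads(line)["texts"]
            resp = {"vectors": self.server.embed(texts)}
            self.wfile.write(json.dumps(resp).encode() + b"\n")


class ModelServer(socketserver.ThreadingUnixStreamServer):
    daemon_threads = True

    def __init__(self, path: str, embed: Embed) -> None:
        self.embed = embed  # loaded once, shared by every client connection
        super().__init__(path, _Handler)


def start_server(path: str, embed: Embed) -> ModelServer:
    """Serve on a background daemon thread (no idle timeout or PID file here)."""
    srv = ModelServer(path, embed)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv


class EmbeddingProxy:
    """Client side: prefer the shared server, else load the model locally."""

    def __init__(self, path: str, local_loader: Callable[[], Embed]) -> None:
        self.path = path
        self._local_loader = local_loader
        self._local: Embed | None = None

    def encode(self, texts: list[str]) -> list[list[float]]:
        if os.environ.get("TRUEMEMORY_NO_MODEL_SERVER") != "1":
            try:
                return self._remote(texts)
            except OSError:
                pass  # socket missing or refused: fall back to local loading
        if self._local is None:
            self._local = self._local_loader()
        return self._local(texts)

    def _remote(self, texts: list[str]) -> list[list[float]]:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(5.0)
            s.connect(self.path)
            s.sendall(json.dumps({"texts": texts}).encode() + b"\n")
            with s.makefile("rb") as f:
                line = f.readline()
        return json.loads(line)["vectors"]
```

The memory saving comes from the model living only in the server process; each client holds just a socket path and, on fallback, its own lazily loaded copy.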
This was referenced May 16, 2026
Summary
- …model_server.py + model_client.py): Loads the embedding model and reranker ONCE and serves all processes over a Unix domain socket. Reduces memory from ~10GB (5 processes × 2GB each) to ~2.5GB (1 server + 5 lightweight clients). Auto-starts on MCP server launch, auto-stops after an idle timeout (300s), and falls back to local loading if the server is unavailable.
- _finalize_rebuild resetting last_embedded_id (basepro last_embedded_id=0 after rebuild: _finalize_rebuild overwrites progress #332): It was zeroing out progress after every successful rebuild, breaking delta rebuilds. It now queries the actual vec table for the correct values.

Test plan