True Long-Term Memory & Agentic Awareness for SillyTavern - With Branch Support
This is a fork of OpenVault with added chat branch awareness. When you create a branch from an earlier message, memories from messages that don't exist in the branch are automatically pruned.
OpenVault transforms your characters from simple chatbots into aware participants. It gives them narrative memory: the ability to recall specific events, track relationship dynamics (Trust/Tension), and remember emotional shifts, all while respecting the character's Point of View
Unlike standard vector storage, OpenVault uses a Smart Agentic Pipeline to decide what is worth remembering and when to recall it
- 🧠 Intelligent Extraction: Automatically analyzes your chat to save significant moments (Actions, Revelations, Emotions) while ignoring small talk
- 👁️ POV-Aware: No more meta-gaming. Characters only remember what they actually witnessed or were told
- ❤️ Relationship Tracking: Tracks Trust and Tension levels that evolve naturally based on your interactions
- 🔎 Hybrid Search: Combines Semantic Search (vibes/meaning) with Keyword Search (specific names/terms) to find the perfect memory
- 📉 Narrative Decay: Memories fade naturally over time unless they are highly important or reinforced
- 🙈 Auto-Hide: Keeps your prompt clean by hiding old messages, while OpenVault keeps the memories alive in the background
- 🔒 100% Local & Private: All data is stored in your chat file. Supports local embeddings (WASM/WebGPU) or Ollama
- 🌿 Branch-Aware: Automatically prunes memories when switching to chat branches, ensuring each timeline has its own consistent memory state
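The decay-and-reinforcement behavior described above can be pictured with a toy formula (purely illustrative; the half-life constant, field names, and the formula itself are assumptions, not OpenVault's actual implementation):

```javascript
// Hypothetical decay sketch: a memory's score drops with age, fades more
// slowly for high-importance memories (1-5 stars), and resets whenever the
// memory is recalled ("reinforced").
function decayedScore(memory, currentTurn) {
  const age = currentTurn - memory.lastRecalledTurn; // turns since reinforcement
  const halfLife = 20 * memory.importance;           // important memories fade slower
  return memory.baseScore * Math.pow(0.5, age / halfLife);
}
```

Under this kind of model, a 5-star memory retains most of its score long after a 1-star memory from the same scene has faded below the retrieval threshold.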
When you create a chat branch in SillyTavern (from an earlier message), the original OpenVault would carry over ALL memories from the parent chat - even memories extracted from messages that don't exist in the branch.
OpenVault Branch fixes this by:
- Detecting when you switch to a branch with fewer messages than the memories reference
- Automatically pruning memories that reference non-existent messages
- Cleaning up character states and relationships accordingly
- Showing a toast notification when pruning occurs
Example:
- Parent chat has 200 messages with 50 memories
- You create a branch from message #3
- OpenVault Branch automatically removes memories from messages 4-200
- Your branch now has a clean memory state matching its actual message history
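The pruning step above amounts to filtering memories by the index of the message they were extracted from. A minimal sketch (field names and data shapes are hypothetical, not the extension's actual internals):

```javascript
// Hypothetical branch pruning: keep only memories whose source message
// still exists in the branch's (shorter) message history.
function pruneForBranch(memories, branchMessageCount) {
  const kept = memories.filter(m => m.messageId < branchMessageCount);
  const pruned = memories.length - kept.length;
  return { kept, pruned };
}

// Example mirroring the scenario above: 50 memories spread across a
// 200-message parent chat, branched from message #3 (indices 0-3 survive).
const memories = Array.from({ length: 50 }, (_, i) => ({
  messageId: Math.floor((i / 50) * 200),
  text: `memory ${i}`,
}));
const { kept, pruned } = pruneForBranch(memories, 4);
```

Character states and relationship values would then be recomputed from the surviving memories, so the branch's Trust/Tension levels match its actual history.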
- Open SillyTavern
- Navigate to Extensions > Install Extension
- Paste this URL: https://github.com/vadash/openvault
- Click Install
- Reload SillyTavern
- Enable: Go to the OpenVault tab (top of extensions list) and check Enable OpenVault
- Configure LLM: Select your Extraction Profile (the model that writes memories) and, optionally, a Retrieval Profile (the model that picks memories). Pick a fast non-reasoning model such as GLM Air or the free Nvidia NIM Kimi K2
- Embeddings: Choose e5, or try gemma if you have a modern GPU (RTX 2060 or above)
- Chat: Just roleplay! OpenVault works in the background
- Before the AI replies, OpenVault injects relevant memories
- After the AI replies, OpenVault analyzes the new messages for memories
A visual overview of your memory health
- Status: Shows if the system is Ready, Extracting, or Retrieving
- Quick Toggles: Turn the system on/off or toggle Auto-Hide
- Extraction Progress: Shows if there are backlog messages waiting to be processed
Browse everything your character remembers
- Search & Filter: Find memories by specific characters or event types (Action, Emotion, etc.)
- Edit: Fix incorrect details or change the importance rating (1-5 stars) of a memory
- Delete: Remove memories that didn't happen or aren't wanted
- Smart Retrieval: Keeps the AI involved in the recall process. It reads the top potential memories and picks only the ones truly relevant to the current scene. Try it both ON and OFF to see which works better for your setup
Embeddings allow the AI to find memories based on meaning (e.g., searching "Fight" finds "Combat")
- Browser Models (Transformers.js): Runs entirely in your browser
- bge: Best for English. Fast
- gemma: Very smart, but requires WebGPU (Chrome/Edge with hardware acceleration)
- Ollama: Offload the work to your local LLM backend
- Context Window Size: How much past chat the LLM reads when writing new memories. Higher = better context, slower generation
- Pre-filter / Final Budget: Controls how many tokens are used for memory processing vs. final injection into the prompt
Fine-tune how the engine finds memories:
- Semantic Match Weight: Turns up the "Vibes" search. Finds conceptually similar events
- Keyword Match Weight: Turns up "Exact" search. Essential for finding specific names or proper nouns
- Semantic Threshold: The strictness filter. Lower values let more "loosely related" memories through; higher values require exact matches
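These three knobs combine into a single score per memory, along these lines (a minimal sketch; the weight/threshold parameter names and the cosine/overlap helpers are assumptions, not OpenVault's actual code):

```javascript
// Hypothetical hybrid scoring: semantic similarity blended with keyword overlap.
function cosineSim(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Fraction of query terms that appear verbatim in the memory text.
function keywordOverlap(queryTerms, memoryText) {
  const words = new Set(memoryText.toLowerCase().split(/\W+/));
  const hits = queryTerms.filter(t => words.has(t.toLowerCase())).length;
  return queryTerms.length ? hits / queryTerms.length : 0;
}

// semWeight / kwWeight correspond to the two weight sliders;
// threshold is the Semantic Threshold filter.
function scoreMemory(queryVec, queryTerms, memory, { semWeight, kwWeight, threshold }) {
  const sem = cosineSim(queryVec, memory.embedding);
  if (sem < threshold) return null; // too loosely related, drop it
  return semWeight * sem + kwWeight * keywordOverlap(queryTerms, memory.text);
}
```

Raising the keyword weight helps when proper nouns matter ("Vault 13", a character's name); raising the semantic weight helps when the scene rhymes with a past event without sharing any words.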
OpenVault can automatically "hide" messages older than a specific threshold (default: 50)
- Hidden messages are removed from the prompt sent to the LLM, saving you money and tokens
- However, OpenVault has already extracted the memories from those messages
- Result: You can have a chat with 5,000 messages, but only send ~50 messages + ~10 relevant memories to the AI. Infinite context feel with zero token bloat
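In effect, the prompt is assembled from a sliding window of recent messages plus the retrieved memories standing in for everything older (an illustrative sketch; the function and field names are made up, and the 50-message default comes from the threshold setting above):

```javascript
// Hypothetical sketch of auto-hide: only the newest messages go into the
// prompt, prefixed by memories extracted from the hidden ones.
function buildPrompt(messages, retrievedMemories, visibleCount = 50) {
  const visible = messages.slice(-visibleCount); // newest ~50 messages only
  const memoryBlock = retrievedMemories
    .map(m => `[Memory] ${m.text}`)
    .join("\n");
  return memoryBlock + "\n\n" + visible.map(m => m.text).join("\n");
}
```

A 5,000-message chat thus costs roughly the same per turn as a 50-message one, since the hidden 4,950 messages are represented only by the handful of memories retrieved for the current scene.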
"WebGPU not available"
- WebGPU requires a secure context (HTTPS or localhost). If accessing SillyTavern over a local network IP (e.g., 192.168.1.x), you must enable "Insecure origins treated as secure" in your browser flags:
  - Go to chrome://flags
  - Enable #enable-unsafe-webgpu
  - Enable #enable-webgpu-developer-features
  - In #unsafely-treat-insecure-origin-as-secure, add your SillyTavern URL
  - Restart the browser
"Ollama Connection Failed"
- Ensure your Ollama server is running with the OLLAMA_ORIGINS="*" environment variable set to allow browser access
"Extraction is skipped/stuck"
- Check the SillyTavern server console. Ensure your Main API is connected and not busy generating a reply
OpenVault Branch is free and open-source software licensed under AGPL-3.0. Based on OpenVault. Created for the SillyTavern community
Version 1.32 (Branch Fork)


