CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Vlt-Bridge (formerly Document-MCP) is a monorepo containing:

Document-MCP: Multi-tenant Obsidian-like documentation viewer with AI-first workflow
vlt-cli: AI memory and context retrieval CLI tool with CodeRAG

AI agents write/update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.

Vlt Oracle: Multi-source intelligent context retrieval for AI coding agents, combining:

vlt threads: Development history and memory
Markdown vault: Documentation (Document-MCP)
CodeRAG: Code understanding with hybrid retrieval (vector + BM25 + graph)

Architecture: Python 3.11+ backend (FastAPI + FastMCP) + React 19 frontend (Vite 7 + shadcn/ui) + vlt-cli (Python CLI)

Monorepo Structure

Vlt-Bridge/
├── backend/           # Document-MCP FastAPI backend
├── frontend/          # Document-MCP React frontend
├── packages/
│   └── vlt-cli/       # vlt CLI tool (memory, threads, oracle, coderag)
├── specs/             # Feature specifications (SpecKit)
└── data/              # Local data (vaults, indexes)

Key Concepts:

Vault: Per-user filesystem directory containing .md files
MCP Server: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
Indexer: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
Wikilinks: [[Note Name]] resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
Optimistic Concurrency: Version counter in SQLite (not frontmatter); UI sends if_version, MCP uses last-write-wins
RAG: LlamaIndex with Gemini embeddings for semantic search over vault content
TTS: ElevenLabs integration for text-to-speech note reading

Development Commands

Quick Start (Full Stack)

# Automated startup (recommended)
./start-dev.sh                # Starts backend (8000) + frontend (5173)
./stop-dev.sh                 # Stop both services
./status-dev.sh               # Check running processes

Backend (Python 3.11+)

cd backend

# Setup (first time)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .
uv pip install -e ".[dev]"   # Dev dependencies (pytest, httpx)

# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000

# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py

# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001

# Tests
uv run pytest                          # All tests
uv run pytest tests/unit               # Unit tests only
uv run pytest tests/integration        # Integration tests
uv run pytest -k test_vault_write      # Single test pattern
uv run pytest -v                       # Verbose output
uv run pytest --lf                     # Last failed tests

Frontend (Node 18+, React 19 + Vite 7)

cd frontend

# Setup (first time)
npm install

# Development server
npm run dev                   # Start Vite dev server (http://localhost:5173)

# Build
npm run build                 # TypeScript compile + Vite build to dist/

# Lint
npm run lint                  # ESLint check

# Preview production build
npm run preview               # Serve dist/ (after npm run build)

Docker (Local Testing)

# Build and run container locally (mirrors HF Spaces deployment)
docker build -t document-mcp .
docker run -p 7860:7860 -e JWT_SECRET_KEY="dev-secret" document-mcp
# Access at http://localhost:7860

Database Initialization

# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"

Architecture Deep Dive

Backend Service Layers

3-tier architecture:

Models (backend/src/models/): Pydantic schemas for validation
- note.py: Note, NoteMetadata, NoteSummary
- user.py: User, UserProfile
- search.py: SearchResult, SearchQuery
- index.py: IndexHealth
- auth.py: TokenRequest, TokenResponse
Services (backend/src/services/): Business logic
- vault.py: Filesystem operations (read/write/list/delete notes)
  - validate_note_path(): Path security (no .., max 256 chars, Unix separators)
  - sanitize_path(): Resolves and enforces vault root boundary
- indexer.py: SQLite FTS5 + metadata tracking
  - index_note(): Updates metadata, FTS, tags, links (synchronous on every write)
  - search_notes(): BM25 ranking with title 3x weight, body 1x, recency bonus
  - get_backlinks(): Follows link graph (note → sources that reference it)
- auth.py: JWT + HF OAuth integration
  - create_access_token(): Issues JWT with sub=user_id and a 90-day expiry (exp)
  - verify_token(): Validates JWT and extracts user_id
- config.py: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
- database.py: SQLite connection manager + schema DDL
API/MCP (backend/src/api/ and backend/src/mcp/):
- api/routes/: FastAPI endpoints
  - auth.py: OAuth, JWT, user endpoints
  - notes.py: CRUD operations (with optimistic concurrency)
  - search.py: Full-text search
  - index.py: Index rebuild/health
  - graph.py: Note relationship graph for visualization
  - rag.py: RAG/vector DB queries (LlamaIndex + Gemini)
  - tts.py: Text-to-speech (ElevenLabs)
  - demo.py, system.py: Demo data seeding, system info
- api/middleware/auth_middleware.py: JWT Bearer token validation
- mcp/server.py: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)

Critical Path Validation (in vault.py):

All note paths MUST pass validate_note_path() (returns (bool, str) tuple)
Then sanitize_path() resolves and ensures no vault escape
Failure = 400 Bad Request with specific error message
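
The two-step check above can be sketched roughly as follows. This is a minimal illustration, not the code in vault.py: the function names and the (bool, str) return shape come from this document, while the exact rules (rejecting empty and absolute paths, checking `..` per path segment) and the error messages are assumptions.

```python
from pathlib import Path, PurePosixPath

MAX_PATH_LENGTH = 256  # limit stated in this document


def validate_note_path(note_path: str) -> tuple[bool, str]:
    """Return (is_valid, error_message); the message is empty on success."""
    if not note_path:
        return False, "Path must not be empty"  # assumed rule
    if len(note_path) > MAX_PATH_LENGTH:
        return False, f"Path exceeds {MAX_PATH_LENGTH} characters"
    if "\\" in note_path:
        return False, "Path must use Unix separators (/)"
    if note_path.startswith("/"):
        return False, "Path must be relative to the vault root"  # assumed rule
    if ".." in PurePosixPath(note_path).parts:
        return False, "Path must not contain '..'"
    return True, ""


def sanitize_path(vault_root: Path, note_path: str) -> Path:
    """Resolve note_path under vault_root and refuse anything that escapes it."""
    root = vault_root.resolve()
    candidate = (root / note_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError("Path escapes the vault root")
    return candidate
```

In an API route, a False result from the first function or a ValueError from the second would be translated into the 400 response described above.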

SQLite Index Schema

5 tables (see backend/src/services/database.py):

note_metadata: Version tracking, size, timestamps (per note)
note_fts: Contentless FTS5 with porter tokenizer, prefix='2 3' for autocomplete
note_tags: Many-to-many (user_id, note_path, tag)
note_links: Link graph (source_path → target_path, is_resolved flag)
index_health: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)
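
To make the note_fts description concrete, here is a small self-contained sketch of a contentless FTS5 table with the porter tokenizer, prefix indexes, and BM25 ranking that weights the title 3x. The column layout, the `paths` lookup, and the `search` helper are assumptions for illustration; the real DDL lives in database.py, and the recency bonus is omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Contentless table (content=''): only the index is stored, so results are
# identified by rowid and mapped back to note paths separately.
conn.execute(
    """
    CREATE VIRTUAL TABLE note_fts USING fts5(
        title, body,
        content='',
        tokenize='porter',
        prefix='2 3'
    )
    """
)

# Hypothetical rowid -> note path mapping (the real schema may differ).
paths = {
    1: "guides/auth.md",
    2: "ops/deploy.md",
    3: "changelog.md",
    4: "glossary.md",
    5: "roadmap.md",
}
rows = [
    (1, "Authentication guide", "Setup steps for login."),
    (2, "Deployment", "Authentication is configured through environment variables on the server."),
    (3, "Changelog", "Release notes."),
    (4, "Glossary", "Terms."),
    (5, "Roadmap", "Planned work."),
]
conn.executemany("INSERT INTO note_fts(rowid, title, body) VALUES (?, ?, ?)", rows)


def search(query: str, limit: int = 10) -> list[str]:
    """Return note paths ordered best match first.

    bm25() returns lower (more negative) values for better matches, so an
    ascending sort puts the best hit first. Weights: title 3.0, body 1.0.
    """
    cursor = conn.execute(
        "SELECT rowid, bm25(note_fts, 3.0, 1.0) AS score "
        "FROM note_fts WHERE note_fts MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return [paths[rowid] for rowid, _score in cursor]
```

A query such as `search("authentication")` ranks the note with the term in its title ahead of the one that only mentions it in the body, and `search("dep*")` exercises the prefix index.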

Indexer Update Flow (in indexer.py):

write_note() → vault.write_note() → indexer.index_note()
                                  ↓
                            [metadata table: version++]
                            [FTS table: re-insert title+body]
                            [tags table: clear + re-insert]
                            [links table: extract wikilinks, resolve, update backlinks]
                            [health table: note_count++, last_incremental_update=now]

Wikilink Resolution Algorithm

In indexer.py (resolve_wikilink logic):

Normalize link text to slug: normalize_slug("API Design") → "api-design"
Find all notes where slug matches normalize_slug(title) or normalize_slug(filename_stem)
If multiple matches:
- Prefer same folder as source note
- Else lexicographically smallest path (ASCII sort)
Store in note_links table with is_resolved=1 (or 0 if no match)

Broken links are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.

MCP Server Modes

STDIO (python src/mcp/server.py):

For Claude Desktop/Code local integration
Uses LOCAL_USER_ID from env (default: "local-dev")
No authentication

HTTP (python src/mcp/server.py --http --port 8001):

For remote clients (HF Space deployment)
Requires Authorization: Bearer <jwt> header
JWT validated → user_id extracted → scoped to that user's vault

Tool definitions: Both modes serve the tools defined in mcp/server.py with FastMCP decorators (@mcp.tool)

Frontend Architecture

Component Hierarchy:

App.tsx (main layout, routing)
├── MainApp.tsx (authenticated app shell)
│   ├── DirectoryTree.tsx (left sidebar: vault explorer)
│   ├── NoteViewer.tsx (read mode: react-markdown rendering)
│   ├── NoteEditor.tsx (edit mode: split view with live preview)
│   ├── SearchBar.tsx (debounced search with dropdown)
│   ├── ChatPanel.tsx (AI chat interface for RAG)
│   ├── GraphView.tsx (note relationship visualization)
│   └── TableOfContents.tsx (heading navigator)
├── Login.tsx (HF OAuth flow)
└── Settings.tsx (token access, preferences)

Key Libraries:

react-markdown + remark-gfm: Markdown rendering with GFM support
shadcn/ui: UI components (30+ primitives from Radix UI)
react-force-graph-2d: Note relationship graph visualization
react-resizable-panels: Split pane layout
lib/wikilink.ts: Parse [[...]] + resolve via GET /api/backlinks
services/api.ts: Fetch wrapper with Bearer token injection

Wikilink Rendering (in NoteViewer.tsx):

Custom react-markdown renderer for links
Detect [[Note Name]] pattern → fetch backlinks → resolve to path → make clickable
Broken links styled differently (e.g., red/dashed underline)

Version Conflict Flow (Optimistic Concurrency)

UI Edit Scenario:

User opens note → GET /api/notes/{path} → receives {..., version: 5}
User edits → clicks Save → PUT /api/notes/{path} with {"if_version": 5, ...}
Backend checks: if current version != 5 → return 409 Conflict
UI shows "Note changed, please reload" message

MCP Write: No version check, always succeeds (last-write-wins).

Environment Configuration

See .env.example for all variables. Key settings:

MODE: local (single-user, no OAuth) or space (HF multi-tenant)
JWT_SECRET_KEY: Generate with python -c "import secrets; print(secrets.token_urlsafe(32))"
VAULT_BASE_DIR: Where vaults are stored (e.g., ./data/vaults)
DB_PATH: SQLite database file (e.g., ./data/index.db)
LOCAL_USER_ID: Default user for local mode (default: local-dev)

HF Space variables (only needed when MODE=space):

HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST

Optional integrations:

GOOGLE_API_KEY: Gemini API for RAG embeddings and LLM
ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID, ELEVENLABS_MODEL: TTS integration

Constraints & Limits

Note size: 1 MiB max (enforced in vault.py)
Vault limit: 5,000 notes per user (configurable in indexer.py)
Path length: 256 chars max (validated in vault.py)
Wikilink syntax: Only [[wikilink]] supported (no aliases like [[link|alias]])

Performance Targets

MCP operations: <500ms for 1,000-note vaults
UI directory load: <2s
Note render: <1s
Search: <1s for 5,000 notes
Index rebuild: <30s for 1,000 notes

SpecKit Workflow (in .specify/)

This repo uses the SpecKit methodology for feature planning:

specs/###-feature-name/: Feature documentation
- spec.md: User stories, requirements, success criteria
- plan.md: Tech stack, architecture, structure
- data-model.md: Entities, schemas, validation
- contracts/: OpenAPI + MCP tool schemas
- tasks.md: Implementation task checklist
Slash commands: /speckit.specify, /speckit.plan, /speckit.tasks, /speckit.implement
Scripts: .specify/scripts/bash/ (feature scaffolding, context updates)

Implemented features: 001-obsidian-docs-viewer, 002-add-graph-view, 003-ai-chat-window, 004-gemini-vault-chat, 006-ui-polish, 011-coderag-project-init

CodeRAG Commands

The vlt CLI includes CodeRAG functionality for indexing and searching codebases with hybrid retrieval (vector + BM25 + graph).

Initialize Code Index

# Interactive project selection
vlt coderag init

# Specify project directly
vlt coderag init --project <project-id>

# Index specific directory
vlt coderag init --project <project-id> --path /path/to/codebase

# Force re-index (overwrite existing)
vlt coderag init --project <project-id> --force

# Run in foreground with progress display
vlt coderag init --project <project-id> --foreground

Notes:

By default, indexing runs in background via the daemon
If daemon is not running, you will be prompted to run in foreground
Existing indexes require --force to overwrite

Check Indexing Status

# Human-readable status
vlt coderag status --project <project-id>

# JSON output for scripting
vlt coderag status --project <project-id> --json

Status values:

pending: Job queued, waiting for daemon
running: Indexing in progress
completed: Indexing finished successfully
failed: Indexing failed (check error_message)
cancelled: Job was cancelled by user

Search Code Index

# Semantic search
vlt coderag search "function that handles authentication" --project <project-id>

# Limit results
vlt coderag search "error handling" --project <project-id> --limit 5

Repository Map

# Generate overview of codebase structure
vlt coderag map --project <project-id>

# Focus on specific directory
vlt coderag map --project <project-id> --scope src/api/

Daemon Management (for background indexing)

# Start daemon
vlt daemon start

# Stop daemon
vlt daemon stop

# Check daemon status
vlt daemon status

Supported Languages

CodeRAG supports: python, typescript, tsx, javascript, go, rust

Files matching patterns in coderag.toml (or default **/*.py) are indexed.

Configuration (coderag.toml)

Place in project root for custom settings:

[coderag]
include = ["**/*.py", "**/*.ts", "**/*.tsx"]
exclude = ["**/node_modules/**", "**/.venv/**", "**/dist/**"]

[coderag.embedding]
batch_size = 10

[coderag.repomap]
max_tokens = 4000
include_signatures = true

MCP Client Configuration

Claude Desktop (STDIO, local mode):

{
  "mcpServers": {
    "document-mcp": {
      "command": "uv",
      "args": ["run", "python", "src/mcp/server.py"],
      "cwd": "/absolute/path/to/Vlt-Bridge/backend"
    }
  }
}

Remote HTTP (HF Space with JWT):

{
  "mcpServers": {
    "document-mcp": {
      "url": "https://your-space.hf.space/mcp",
      "transport": "http",
      "headers": {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
      }
    }
  }
}

Obtain JWT: POST /api/tokens after HF OAuth login.

ChatGPT Widget Integration

The app can be embedded in ChatGPT as an iframe:

Widget served at /widget.html with special MIME type text/html+skybridge
MCP endpoint remains accessible for other AI agents simultaneously
Entry point: frontend/src/widget.tsx

Agent Notification System (ANS)

The ANS provides real-time notifications to AI agents during task execution, enabling self-awareness about tool failures, budget limits, and operational issues.

Architecture Overview

Event Source (oracle_agent.py, tool_executor.py)
    │
    ▼ emit(Event)
EventBus (pub/sub)
    │
    ▼ notify handlers
SubscriberLoader → Subscriber configs (*.toml)
    │
    ▼ filter + batch
NotificationAccumulator
    │
    ▼ render with template
ToonFormatter (Jinja2 + python-toon)
    │
    ▼ yield OracleStreamChunk(type="system")
SSE Stream → Frontend ChatPanel

Components

EventBus (backend/src/services/ans/bus.py):

Pub/sub pattern for decoupled event emission
Supports wildcard subscriptions (e.g., tool.*)
Thread-safe with overflow handling
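
The wildcard subscription idea can be illustrated with a toy bus. This is a simplified sketch, not the implementation in bus.py: `MiniEventBus` and this trimmed `Event` class are hypothetical, glob-style matching via fnmatch is an assumption about how patterns like tool.* behave, and thread safety and overflow handling are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable


@dataclass
class Event:
    type: str
    source: str
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], None]


class MiniEventBus:
    """Single-threaded pub/sub with glob-style event type patterns."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        """Register handler for every event whose type matches pattern."""
        self._handlers[pattern].append(handler)

    def emit(self, event: Event) -> int:
        """Deliver event to all matching handlers; return the delivery count."""
        delivered = 0
        for pattern, handlers in self._handlers.items():
            if fnmatchcase(event.type, pattern):
                for handler in handlers:
                    handler(event)
                    delivered += 1
        return delivered
```

With this sketch, a handler subscribed to `tool.*` receives `tool.call.failure` and `tool.call.timeout` events but not `budget.token.warning`.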

Subscribers (backend/src/services/ans/subscribers/*.toml):

TOML-based configuration for each notification type
Define event types, severity filters, batching windows
Reference Jinja2 templates for formatting

Example subscriber config:

[subscriber]
id = "tool_failure"
name = "Tool Failure Notifications"

[events]
types = ["tool.call.failure", "tool.call.timeout"]
severity_filter = "warning"

[output]
priority = "high"
inject_at = "after_tool"
template = "tool_failure.toon.j2"
core = true  # Cannot be disabled by user

Templates (backend/src/services/ans/templates/*.toon.j2):

Jinja2 templates producing TOON (Token-Optimized Object Notation) output
Compact format optimized for LLM context windows
Supports batching multiple events into single notification

Notification Injection Points:

turn_start: Injected before agent receives next prompt (budget warnings)
after_tool: Injected after tool execution (tool failures)
immediate: Injected as soon as event occurs (critical alerts)

Frontend Display

System messages appear in ChatPanel with distinct styling:

Yellow/amber left border and background
AlertCircle icon with "System" attribution
Rendered inline with agent/user messages
Persisted in context_nodes.system_messages_json

Available Subscribers

Subscriber	Events	Priority	Inject At
tool_failure	tool.call.failure, tool.call.timeout	high	after_tool
budget_warning	budget.token.warning, budget.iteration.warning	normal	turn_start
budget_exceeded	budget.token.exceeded, budget.iteration.exceeded	critical	immediate
loop_detected	agent.loop.detected	high	immediate

Extending ANS

Create backend/src/services/ans/subscribers/my_subscriber.toml
Create backend/src/services/ans/templates/my_subscriber.toon.j2

Emit events from your service code:

# Within backend/src/services/, use relative imports:
from .ans.bus import get_event_bus
from .ans.event import Event, Severity

bus = get_event_bus()
bus.emit(Event(
    type="my.custom.event",
    source="my_service",
    severity=Severity.INFO,
    payload={"message": "Something happened"}
))

User Settings

Users can toggle non-core subscribers in the Settings > Notifications tab. Core subscribers (marked core = true) cannot be disabled. Settings are stored in user_settings.disabled_subscribers_json.

RLM Oracle (022-rlm-oracle)

The Oracle uses a REPL-centric inference harness. The LLM is given a Python environment where the entire project lives as variables (project, sub_oracle, Final). It writes code to explore and synthesize answers programmatically.

Before (BT Oracle): Query classifier → prompt composer → BT XML signals → multi-turn loop
After (RLM Oracle): REPL environment → LLM writes Python → Final variable terminates loop

REPLExecutor: RestrictedPython sandbox with 30s timeout; approved modules (re, json, math, datetime, collections, itertools); Final sentinel detection
ProjectContext: Exposes project files, threads, notes as Python objects in REPL namespace
SubOracleCallable: Recursive sub-oracle calls (max depth 2, max 3 calls per root session)
Streaming: progress chunks carry REPL stdout; content chunks carry the Final answer; done chunk carries metadata["iteration_count"]
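
The control flow of the REPL loop can be sketched in a few lines. This is only an illustration of Final sentinel detection and the turn limit: it uses plain `exec`, which is NOT a sandbox, whereas the real REPLExecutor uses RestrictedPython with a timeout and an approved module list. The `run_repl` function and its return shape are hypothetical, and the snippets here are fixed strings rather than code generated by an LLM after each turn.

```python
import contextlib
import io
from collections.abc import Iterable

_UNSET = object()  # marks "Final has not been assigned yet"


def run_repl(snippets: Iterable[str], max_turns: int = 25):
    """Run snippets in one shared namespace until one assigns Final.

    Returns (answer, turns_used, transcript), where transcript holds the
    captured stdout of each executed turn. answer is None if the turn
    limit is reached or the snippets run out first.
    """
    namespace = {"Final": _UNSET}
    transcript: list[str] = []
    for turn, code in enumerate(snippets, start=1):
        if turn > max_turns:
            break
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            exec(code, namespace)  # unsandboxed; for illustration only
        transcript.append(stdout.getvalue())
        if namespace["Final"] is not _UNSET:
            return namespace["Final"], turn, transcript
    return None, len(transcript), transcript


answer, turns, transcript = run_repl(
    [
        "files = ['a.py', 'b.py']\nprint(len(files))",
        "Final = f'{len(files)} files'",
        "print('never runs')",
    ]
)
print(answer, turns)  # 2 files 2
```

The captured stdout per turn corresponds to what the progress chunks carry, and the value bound to Final corresponds to the content chunk.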

Architecture

The Oracle API routes (/api/oracle and /api/oracle/stream) use RLMOracleWrapper:

from backend.src.services.rlm_oracle import RLMOracleWrapper

wrapper = RLMOracleWrapper(
    user_id="user-id",
    api_key="openrouter-api-key",
    project_id="project-id",
    model="deepseek/deepseek-chat-v3-0324",
    max_tokens=4096,
)

async for chunk in wrapper.process_query(query="Hello", context_id=None):
    print(chunk.type, chunk.content)

Environment Variables

Variable	Default	Description
`ORACLE_MAX_TURNS`	`25`	Max REPL iterations per root session
`ORACLE_SUB_MAX_TURNS`	`8`	Max iterations per sub-oracle session

Key Files

File	Purpose
`backend/src/services/rlm_oracle.py`	RLMOracleWrapper, RLMSession, RLMPromptBuilder, SubOracleCallable
`backend/src/services/project_context.py`	ProjectContext, TextHandle, FileManifest
`backend/src/services/repl_executor.py`	REPLExecutor, REPLNamespace (RestrictedPython sandbox)
`backend/src/services/openrouter_client.py`	OpenRouter HTTP client (moved from bt/services/)
`backend/src/api/routes/oracle.py`	Oracle API routes (updated to use RLMOracleWrapper)

Recent Changes

022-rlm-oracle: Replaced BT Oracle with RLM Oracle harness; LLM writes Python in REPL with project/sub_oracle/Final namespace; deleted entire backend/src/bt/ directory; added REPLExecutor (RestrictedPython), ProjectContext (file/thread/note handles), RLMOracleWrapper; Go symbol extraction + end_line field in CodeRAG repomap
018-vlt-mcp-server: Added vlt-mcp unified MCP server (packages/vlt-cli/src/vlt/mcp/) with 17 tools across 5 modules (thread_tools, meta_tools, code_tools, oracle_tools, vault_tools); Oracle toggle backend route (/api/settings/oracle); Oracle tab in Settings.tsx; 164ms cold-start via STDIO; registered as user-scope MCP in Claude Code

Active Technologies

Python 3.11+ (backend only; no frontend changes) (022-rlm-oracle)
RestrictedPython>=8.0 for REPL sandbox (022-rlm-oracle)
No new persistence. Ephemeral RLMSession per query. OracleBridge (existing) handles conversation history via existing context_nodes table. (022-rlm-oracle)
