CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Vlt-Bridge (formerly Document-MCP) is a monorepo containing:

  1. Document-MCP: Multi-tenant Obsidian-like documentation viewer with AI-first workflow
  2. vlt-cli: AI memory and context retrieval CLI tool with CodeRAG

AI agents write/update documentation via MCP (Model Context Protocol), while humans read and edit through a web UI. The system provides per-user vaults with Markdown notes, full-text search (SQLite FTS5), wikilink resolution, tag indexing, and backlink tracking.

Vlt Oracle: Multi-source intelligent context retrieval for AI coding agents, combining:

  • vlt threads: Development history and memory
  • Markdown vault: Documentation (Document-MCP)
  • CodeRAG: Code understanding with hybrid retrieval (vector + BM25 + graph)

Architecture: Python 3.11+ backend (FastAPI + FastMCP) + React 19 frontend (Vite 7 + shadcn/ui) + vlt-cli (Python CLI)

Monorepo Structure

Vlt-Bridge/
├── backend/           # Document-MCP FastAPI backend
├── frontend/          # Document-MCP React frontend
├── packages/
│   └── vlt-cli/       # vlt CLI tool (memory, threads, oracle, coderag)
├── specs/             # Feature specifications (SpecKit)
└── data/              # Local data (vaults, indexes)

Key Concepts:

  • Vault: Per-user filesystem directory containing .md files
  • MCP Server: Exposes tools for AI agents (STDIO for local, HTTP for remote with JWT)
  • Indexer: SQLite FTS5 for full-text search + separate tables for tags/links/metadata
  • Wikilinks: [[Note Name]] resolved via case-insensitive slug matching (prefers same folder, then lexicographic)
  • Optimistic Concurrency: Version counter in SQLite (not frontmatter); UI sends if_version, MCP uses last-write-wins
  • RAG: LlamaIndex with Gemini embeddings for semantic search over vault content
  • TTS: ElevenLabs integration for text-to-speech note reading

Development Commands

Quick Start (Full Stack)

# Automated startup (recommended)
./start-dev.sh                # Starts backend (8000) + frontend (5173)
./stop-dev.sh                 # Stop both services
./status-dev.sh               # Check running processes

Backend (Python 3.11+)

cd backend

# Setup (first time)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e .
uv pip install -e ".[dev]"   # Dev dependencies (pytest, httpx)

# Run FastAPI HTTP server (for UI)
uv run uvicorn src.api.main:app --reload --host 0.0.0.0 --port 8000

# Run MCP STDIO server (for Claude Desktop/Code)
uv run python src/mcp/server.py

# Run MCP HTTP server (for remote clients with JWT)
uv run python src/mcp/server.py --http --port 8001

# Tests
uv run pytest                          # All tests
uv run pytest tests/unit               # Unit tests only
uv run pytest tests/integration        # Integration tests
uv run pytest -k test_vault_write      # Single test pattern
uv run pytest -v                       # Verbose output
uv run pytest --lf                     # Last failed tests

Frontend (Node 18+, React 19 + Vite 7)

cd frontend

# Setup (first time)
npm install

# Development server
npm run dev                   # Start Vite dev server (http://localhost:5173)

# Build
npm run build                 # TypeScript compile + Vite build to dist/

# Lint
npm run lint                  # ESLint check

# Preview production build
npm run preview               # Serve dist/ (after npm run build)

Docker (Local Testing)

# Build and run container locally (mirrors HF Spaces deployment)
docker build -t document-mcp .
docker run -p 7860:7860 -e JWT_SECRET_KEY="dev-secret" document-mcp
# Access at http://localhost:7860

Database Initialization

# Backend database is auto-initialized on first run
# Manual reset (WARNING: destroys all data)
cd backend
rm -f ../data/index.db
uv run python -c "from src.services.database import DatabaseService; DatabaseService().initialize()"

Architecture Deep Dive

Backend Service Layers

3-tier architecture:

  1. Models (backend/src/models/): Pydantic schemas for validation

    • note.py: Note, NoteMetadata, NoteSummary
    • user.py: User, UserProfile
    • search.py: SearchResult, SearchQuery
    • index.py: IndexHealth
    • auth.py: TokenRequest, TokenResponse
  2. Services (backend/src/services/): Business logic

    • vault.py: Filesystem operations (read/write/list/delete notes)
      • validate_note_path(): Path security (no .., max 256 chars, Unix separators)
      • sanitize_path(): Resolves and enforces vault root boundary
    • indexer.py: SQLite FTS5 + metadata tracking
      • index_note(): Updates metadata, FTS, tags, links (synchronous on every write)
      • search_notes(): BM25 ranking with title 3x weight, body 1x, recency bonus
      • get_backlinks(): Follows link graph (note → sources that reference it)
    • auth.py: JWT + HF OAuth integration
      • create_access_token(): Issues JWT with sub=user_id, exp=90days
      • verify_token(): Validates JWT and extracts user_id
    • config.py: Env var management (MODE, JWT_SECRET_KEY, VAULT_BASE_DIR, etc.)
    • database.py: SQLite connection manager + schema DDL
  3. API/MCP (backend/src/api/ and backend/src/mcp/):

    • api/routes/: FastAPI endpoints
      • auth.py: OAuth, JWT, user endpoints
      • notes.py: CRUD operations (with optimistic concurrency)
      • search.py: Full-text search
      • index.py: Index rebuild/health
      • graph.py: Note relationship graph for visualization
      • rag.py: RAG/vector DB queries (LlamaIndex + Gemini)
      • tts.py: Text-to-speech (ElevenLabs)
      • demo.py, system.py: Demo data seeding, system info
    • api/middleware/auth_middleware.py: JWT Bearer token validation
    • mcp/server.py: FastMCP tools (7 tools: list, read, write, delete, search, backlinks, tags)

Critical Path Validation (in vault.py):

  • All note paths MUST pass validate_note_path() (returns (bool, str) tuple)
  • Then sanitize_path() resolves and ensures no vault escape
  • Failure = 400 Bad Request with specific error message
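The two-step check above can be sketched as follows. This is a minimal illustration, not the code in vault.py: the function names and the (bool, str) return shape come from this document, while the exact rules, error messages, and the `vault_root` parameter are assumptions.

```python
from pathlib import Path, PurePosixPath

MAX_PATH_LENGTH = 256  # limit stated in this document


def validate_note_path(note_path: str) -> tuple[bool, str]:
    """Return (is_valid, error_message); the message is empty when valid."""
    if not note_path:
        return False, "path is empty"
    if len(note_path) > MAX_PATH_LENGTH:
        return False, f"path exceeds {MAX_PATH_LENGTH} characters"
    if "\\" in note_path:
        return False, "path must use Unix separators"
    if note_path.startswith("/"):
        return False, "path must be relative to the vault root"
    if ".." in PurePosixPath(note_path).parts:
        return False, "path must not contain '..'"
    return True, ""


def sanitize_path(vault_root: Path, note_path: str) -> Path:
    """Resolve note_path under vault_root and refuse anything that escapes it."""
    root = vault_root.resolve()
    candidate = (root / note_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError("path escapes the vault root")
    return candidate
```

A route handler would presumably call `validate_note_path()` first and turn a `False` result into a 400 response carrying the returned message, then pass the path to `sanitize_path()` before touching the filesystem.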

SQLite Index Schema

5 tables (see backend/src/services/database.py):

  1. note_metadata: Version tracking, size, timestamps (per note)
  2. note_fts: Contentless FTS5 with porter tokenizer, prefix='2 3' for autocomplete
  3. note_tags: Many-to-many (user_id, note_path, tag)
  4. note_links: Link graph (source_path → target_path, is_resolved flag)
  5. index_health: Aggregate stats (note_count, last_full_rebuild, last_incremental_update)

Indexer Update Flow (in indexer.py):

write_note() → vault.write_note() → indexer.index_note()
                                  ↓
                            [metadata table: version++]
                            [FTS table: re-insert title+body]
                            [tags table: clear + re-insert]
                            [links table: extract wikilinks, resolve, update backlinks]
                            [health table: note_count++, last_incremental_update=now]

Wikilink Resolution Algorithm

In indexer.py (resolve_wikilink logic):

  1. Normalize link text to slug: normalize_slug("API Design") → "api-design"
  2. Find all notes where slug matches normalize_slug(title) or normalize_slug(filename_stem)
  3. If multiple matches:
    • Prefer same folder as source note
    • Else lexicographically smallest path (ASCII sort)
  4. Store in note_links table with is_resolved=1 (or 0 if no match)

Broken links are tracked (is_resolved=0) and can be queried for UI "Create note" affordance.
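The resolution steps above can be sketched in a few lines. This is an illustration under assumptions, not the logic in indexer.py: the slug rule (lowercase, non-alphanumerics collapsed to hyphens) is a guess consistent with the "api-design" example, and the `notes` mapping of path to title is a hypothetical stand-in for the index tables.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def normalize_slug(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def resolve_wikilink(link_text: str, source_path: str, notes: dict[str, str]) -> Optional[str]:
    """notes maps note path -> title. Returns the resolved path, or None if broken."""
    slug = normalize_slug(link_text)
    matches = [
        path
        for path, title in notes.items()
        if slug in (normalize_slug(title), normalize_slug(PurePosixPath(path).stem))
    ]
    if not matches:
        return None  # would be stored with is_resolved=0
    source_folder = PurePosixPath(source_path).parent
    same_folder = [p for p in matches if PurePosixPath(p).parent == source_folder]
    # Same-folder matches win; otherwise take the lexicographically smallest path.
    return min(same_folder or matches)
```

With two notes titled "API Design" in `docs/` and `arch/`, a link from `docs/intro.md` resolves to the `docs/` copy, while a link from any other folder falls back to `arch/api-design.md` because it sorts first.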

MCP Server Modes

STDIO (python src/mcp/server.py):

  • For Claude Desktop/Code local integration
  • Uses LOCAL_USER_ID from env (default: "local-dev")
  • No authentication

HTTP (python src/mcp/server.py --http --port 8001):

  • For remote clients (HF Space deployment)
  • Requires Authorization: Bearer <jwt> header
  • JWT validated → user_id extracted → scoped to that user's vault
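To make the "JWT validated → user_id extracted" step concrete, here is a standard-library sketch of issuing and checking a token. It assumes HS256 signing and a shared secret such as JWT_SECRET_KEY; the document does not say which algorithm or JWT library the backend uses, and `user_id_from_authorization` is a hypothetical helper name. Only the `sub` and `exp` claims and the 90-day lifetime come from this document.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_access_token(user_id: str, secret: str, ttl_seconds: int = 90 * 24 * 3600) -> str:
    """Issue an HS256 token with sub=user_id and an exp claim (90 days by default)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    payload = _b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.{_b64url(_sign(f'{header}.{payload}', secret))}"


def user_id_from_authorization(header_value: str, secret: str) -> Optional[str]:
    """Return the sub claim from 'Bearer <jwt>', or None if anything is invalid."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or token.count(".") != 2:
        return None
    header_b64, payload_b64, signature_b64 = token.split(".")
    try:
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None
    # Check the signature before trusting anything inside the payload.
    if not hmac.compare_digest(signature, _sign(f"{header_b64}.{payload_b64}", secret)):
        return None
    try:
        claims = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if not isinstance(claims, dict) or claims.get("exp", 0) < time.time():
        return None
    return claims.get("sub")
```

The returned user ID is what would scope every subsequent tool call to that user's vault. A production service would normally rely on a maintained JWT library rather than hand-rolled verification like this.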

Endpoint: Tools defined in mcp/server.py with FastMCP decorators (@mcp.tool)

Frontend Architecture

Component Hierarchy:

App.tsx (main layout, routing)
├── MainApp.tsx (authenticated app shell)
│   ├── DirectoryTree.tsx (left sidebar: vault explorer)
│   ├── NoteViewer.tsx (read mode: react-markdown rendering)
│   ├── NoteEditor.tsx (edit mode: split view with live preview)
│   ├── SearchBar.tsx (debounced search with dropdown)
│   ├── ChatPanel.tsx (AI chat interface for RAG)
│   ├── GraphView.tsx (note relationship visualization)
│   └── TableOfContents.tsx (heading navigator)
├── Login.tsx (HF OAuth flow)
└── Settings.tsx (token access, preferences)

Key Libraries:

  • react-markdown + remark-gfm: Markdown rendering with GFM support
  • shadcn/ui: UI components (30+ primitives from Radix UI)
  • react-force-graph-2d: Note relationship graph visualization
  • react-resizable-panels: Split pane layout
  • lib/wikilink.ts: Parse [[...]] + resolve via GET /api/backlinks
  • services/api.ts: Fetch wrapper with Bearer token injection

Wikilink Rendering (in NoteViewer.tsx):

  • Custom react-markdown renderer for links
  • Detect [[Note Name]] pattern → fetch backlinks → resolve to path → make clickable
  • Broken links styled differently (e.g., red/dashed underline)

Version Conflict Flow (Optimistic Concurrency)

UI Edit Scenario:

  1. User opens note → GET /api/notes/{path} → receives {..., version: 5}
  2. User edits → clicks Save → PUT /api/notes/{path} with {"if_version": 5, ...}
  3. Backend checks: if current version != 5 → return 409 Conflict
  4. UI shows "Note changed, please reload" message

MCP Write: No version check, always succeeds (last-write-wins).
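The difference between the two write paths can be sketched against an in-memory SQLite table. The `note_metadata` table name and the version counter come from this document; the column names, the `save_note` helper, and the `VersionConflict` exception are assumptions made for illustration.

```python
import sqlite3
from typing import Optional


class VersionConflict(Exception):
    """Raised when if_version no longer matches the stored version (HTTP 409)."""


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical, reduced schema: only the columns this sketch needs.
    conn.execute(
        "CREATE TABLE note_metadata ("
        "user_id TEXT, note_path TEXT, version INTEGER, "
        "PRIMARY KEY (user_id, note_path))"
    )


def save_note(conn: sqlite3.Connection, user_id: str, note_path: str,
              if_version: Optional[int] = None) -> int:
    """Bump the note's version and return it.

    if_version=None models the MCP path (last-write-wins); an integer models
    the UI path, which fails if someone else saved in between.
    """
    row = conn.execute(
        "SELECT version FROM note_metadata WHERE user_id = ? AND note_path = ?",
        (user_id, note_path),
    ).fetchone()
    current = row[0] if row else 0
    if if_version is not None and if_version != current:
        raise VersionConflict(f"expected version {if_version}, found {current}")
    new_version = current + 1
    conn.execute(
        "INSERT OR REPLACE INTO note_metadata (user_id, note_path, version) "
        "VALUES (?, ?, ?)",
        (user_id, note_path, new_version),
    )
    return new_version
```

In this sketch a UI save that carries a stale `if_version` raises `VersionConflict`, which the API layer would translate into the 409 response described above, while a call without `if_version` always succeeds.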

Environment Configuration

See .env.example for all variables. Key settings:

  • MODE: local (single-user, no OAuth) or space (HF multi-tenant)
  • JWT_SECRET_KEY: Generate with python -c "import secrets; print(secrets.token_urlsafe(32))"
  • VAULT_BASE_DIR: Where vaults are stored (e.g., ./data/vaults)
  • DB_PATH: SQLite database file (e.g., ./data/index.db)
  • LOCAL_USER_ID: Default user for local mode (default: local-dev)

HF Space variables (only needed when MODE=space):

  • HF_OAUTH_CLIENT_ID, HF_OAUTH_CLIENT_SECRET, HF_SPACE_HOST

Optional integrations:

  • GOOGLE_API_KEY: Gemini API for RAG embeddings and LLM
  • ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID, ELEVENLABS_MODEL: TTS integration

Constraints & Limits

  • Note size: 1 MiB max (enforced in vault.py)
  • Vault limit: 5,000 notes per user (configurable in indexer.py)
  • Path length: 256 chars max (validated in vault.py)
  • Wikilink syntax: Only [[wikilink]] supported (no aliases like [[link|alias]])
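As an illustration of the wikilink constraint, the following sketch extracts plain `[[wikilink]]` targets and skips the alias form. The regular expression and the `extract_wikilinks` name are assumptions; the actual parser may treat aliased links differently (for example, by keeping the whole text as the target).

```python
import re

# Matches [[Note Name]] but not the alias form [[Note Name|alias]].
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]")


def extract_wikilinks(body: str) -> list[str]:
    """Return link targets in order of appearance, with whitespace trimmed."""
    return [match.group(1).strip() for match in WIKILINK_RE.finditer(body)]
```

For the input `"See [[API Design]] and [[ Setup ]], not [[link|alias]]."` this returns `["API Design", "Setup"]`.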

Performance Targets

  • MCP operations: <500ms for 1,000-note vaults
  • UI directory load: <2s
  • Note render: <1s
  • Search: <1s for 5,000 notes
  • Index rebuild: <30s for 1,000 notes

SpecKit Workflow (in .specify/)

This repo uses the SpecKit methodology for feature planning:

  • specs/###-feature-name/: Feature documentation
    • spec.md: User stories, requirements, success criteria
    • plan.md: Tech stack, architecture, structure
    • data-model.md: Entities, schemas, validation
    • contracts/: OpenAPI + MCP tool schemas
    • tasks.md: Implementation task checklist
  • Slash commands: /speckit.specify, /speckit.plan, /speckit.tasks, /speckit.implement
  • Scripts: .specify/scripts/bash/ (feature scaffolding, context updates)

Implemented features: 001-obsidian-docs-viewer, 002-add-graph-view, 003-ai-chat-window, 004-gemini-vault-chat, 006-ui-polish, 011-coderag-project-init

CodeRAG Commands

The vlt CLI includes CodeRAG functionality for indexing and searching codebases with hybrid retrieval (vector + BM25 + graph).

Initialize Code Index

# Interactive project selection
vlt coderag init

# Specify project directly
vlt coderag init --project <project-id>

# Index specific directory
vlt coderag init --project <project-id> --path /path/to/codebase

# Force re-index (overwrite existing)
vlt coderag init --project <project-id> --force

# Run in foreground with progress display
vlt coderag init --project <project-id> --foreground

Notes:

  • By default, indexing runs in background via the daemon
  • If daemon is not running, you will be prompted to run in foreground
  • Existing indexes require --force to overwrite

Check Indexing Status

# Human-readable status
vlt coderag status --project <project-id>

# JSON output for scripting
vlt coderag status --project <project-id> --json

Status values:

  • pending: Job queued, waiting for daemon
  • running: Indexing in progress
  • completed: Indexing finished successfully
  • failed: Indexing failed (check error_message)
  • cancelled: Job was cancelled by user
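A script that polls the JSON status could classify the values above along these lines. The payload shape is an assumption: the `status` key is hypothetical, and `error_message` is the only field name this document mentions.

```python
import json

# Statuses after which polling can stop, per the list above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def summarize_status(raw_json: str) -> tuple[bool, str]:
    """Return (is_finished, message) for one status payload."""
    data = json.loads(raw_json)
    status = data.get("status", "unknown")
    if status == "failed":
        return True, f"failed: {data.get('error_message', 'no error message')}"
    return status in TERMINAL_STATUSES, status
```

The input would typically be the captured output of `vlt coderag status --project <project-id> --json`; a caller would keep polling while the first element of the result is `False`.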

Search Code Index

# Semantic search
vlt coderag search "function that handles authentication" --project <project-id>

# Limit results
vlt coderag search "error handling" --project <project-id> --limit 5

Repository Map

# Generate overview of codebase structure
vlt coderag map --project <project-id>

# Focus on specific directory
vlt coderag map --project <project-id> --scope src/api/

Daemon Management (for background indexing)

# Start daemon
vlt daemon start

# Stop daemon
vlt daemon stop

# Check daemon status
vlt daemon status

Supported Languages

CodeRAG supports: python, typescript, tsx, javascript, go, rust

Files matching patterns in coderag.toml (or default **/*.py) are indexed.

Configuration (coderag.toml)

Place in project root for custom settings:

[coderag]
include = ["**/*.py", "**/*.ts", "**/*.tsx"]
exclude = ["**/node_modules/**", "**/.venv/**", "**/dist/**"]

[coderag.embedding]
batch_size = 10

[coderag.repomap]
max_tokens = 4000
include_signatures = true

MCP Client Configuration

Claude Desktop (STDIO, local mode):

{
  "mcpServers": {
    "document-mcp": {
      "command": "uv",
      "args": ["run", "python", "src/mcp/server.py"],
      "cwd": "/absolute/path/to/Document-MCP/backend"
    }
  }
}

Remote HTTP (HF Space with JWT):

{
  "mcpServers": {
    "document-mcp": {
      "url": "https://your-space.hf.space/mcp",
      "transport": "http",
      "headers": {
        "Authorization": "Bearer YOUR_JWT_TOKEN"
      }
    }
  }
}

Obtain JWT: POST /api/tokens after HF OAuth login.

ChatGPT Widget Integration

The app can be embedded in ChatGPT as an iframe:

  • Widget served at /widget.html with special MIME type text/html+skybridge
  • MCP endpoint remains accessible for other AI agents simultaneously
  • Entry point: frontend/src/widget.tsx

Agent Notification System (ANS)

The ANS provides real-time notifications to AI agents during task execution, enabling self-awareness about tool failures, budget limits, and operational issues.

Architecture Overview

Event Source (oracle_agent.py, tool_executor.py)
    │
    ▼ emit(Event)
EventBus (pub/sub)
    │
    ▼ notify handlers
SubscriberLoader → Subscriber configs (*.toml)
    │
    ▼ filter + batch
NotificationAccumulator
    │
    ▼ render with template
ToonFormatter (Jinja2 + python-toon)
    │
    ▼ yield OracleStreamChunk(type="system")
SSE Stream → Frontend ChatPanel

Components

EventBus (backend/src/services/ans/bus.py):

  • Pub/sub pattern for decoupled event emission
  • Supports wildcard subscriptions (e.g., tool.*)
  • Thread-safe with overflow handling
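The pub/sub behavior with wildcard subscriptions can be sketched as below. This is not the code in bus.py: the `Event` dataclass is a reduced stand-in, `fnmatch`-style matching is an assumption about how `tool.*` is interpreted, and overflow handling is left out.

```python
import fnmatch
import threading
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """Reduced stand-in for the real Event class (no severity field here)."""
    type: str
    source: str
    payload: dict = field(default_factory=dict)


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._lock = threading.Lock()

    def subscribe(self, pattern: str, handler: Callable[[Event], None]) -> None:
        """Register a handler for an exact type or a wildcard such as 'tool.*'."""
        with self._lock:
            self._handlers[pattern].append(handler)

    def emit(self, event: Event) -> int:
        """Call every handler whose pattern matches; return how many were called."""
        with self._lock:
            matched = [
                handler
                for pattern, handlers in self._handlers.items()
                if fnmatch.fnmatchcase(event.type, pattern)
                for handler in handlers
            ]
        # Handlers run outside the lock so they can subscribe or emit themselves.
        for handler in matched:
            handler(event)
        return len(matched)
```

A handler subscribed to `tool.*` receives `tool.call.failure` and `tool.call.timeout` but not `budget.token.warning`.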

Subscribers (backend/src/services/ans/subscribers/*.toml):

  • TOML-based configuration for each notification type
  • Define event types, severity filters, batching windows
  • Reference Jinja2 templates for formatting

Example subscriber config:

[subscriber]
id = "tool_failure"
name = "Tool Failure Notifications"

[events]
types = ["tool.call.failure", "tool.call.timeout"]
severity_filter = "warning"

[output]
priority = "high"
inject_at = "after_tool"
template = "tool_failure.toon.j2"
core = true  # Cannot be disabled by user

Templates (backend/src/services/ans/templates/*.toon.j2):

  • Jinja2 templates producing TOON (Token-Optimized Object Notation) output
  • Compact format optimized for LLM context windows
  • Supports batching multiple events into single notification

Notification Injection Points:

  • turn_start: Injected before agent receives next prompt (budget warnings)
  • after_tool: Injected after tool execution (tool failures)
  • immediate: Injected as soon as event occurs (critical alerts)

Frontend Display

System messages appear in ChatPanel with distinct styling:

  • Yellow/amber left border and background
  • AlertCircle icon with "System" attribution
  • Rendered inline with agent/user messages
  • Persisted in context_nodes.system_messages_json

Available Subscribers

| Subscriber | Events | Priority | Inject At |
|---|---|---|---|
| tool_failure | tool.call.failure, tool.call.timeout | high | after_tool |
| budget_warning | budget.token.warning, budget.iteration.warning | normal | turn_start |
| budget_exceeded | budget.token.exceeded, budget.iteration.exceeded | critical | immediate |
| loop_detected | agent.loop.detected | high | immediate |

Extending ANS

  1. Create backend/src/services/ans/subscribers/my_subscriber.toml
  2. Create backend/src/services/ans/templates/my_subscriber.toon.j2
  3. Emit events from your service code:
    # Within backend/src/services/, use relative imports:
    from .ans.bus import get_event_bus
    from .ans.event import Event, Severity
    
    bus = get_event_bus()
    bus.emit(Event(
        type="my.custom.event",
        source="my_service",
        severity=Severity.INFO,
        payload={"message": "Something happened"}
    ))

User Settings

Users can toggle non-core subscribers via Settings > Notifications tab. Core subscribers (marked core = true) cannot be disabled. Settings stored in user_settings.disabled_subscribers_json.

RLM Oracle (022-rlm-oracle)

The Oracle uses a REPL-centric inference harness. The LLM is given a Python environment where the entire project lives as variables (project, sub_oracle, Final). It writes code to explore and synthesize answers programmatically.

Before (BT Oracle): Query classifier → prompt composer → BT XML signals → multi-turn loop
After (RLM Oracle): REPL environment → LLM writes Python → Final variable terminates loop

  • REPLExecutor: RestrictedPython sandbox with 30s timeout; approved modules (re, json, math, datetime, collections, itertools); Final sentinel detection
  • ProjectContext: Exposes project files, threads, notes as Python objects in REPL namespace
  • SubOracleCallable: Recursive sub-oracle calls (max depth 2, max 3 calls per root session)
  • Streaming: progress chunks carry REPL stdout; content chunks carry the Final answer; done chunk carries metadata["iteration_count"]
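The control flow of a REPL-centric loop with a Final sentinel can be sketched as follows. This is deliberately simplified: it uses plain `exec` with no sandbox and no timeout, whereas the document describes a RestrictedPython sandbox with a 30s limit. The `run_repl_loop` name and the `next_code` callback, which stands in for the LLM call, are hypothetical.

```python
import contextlib
import io
from typing import Any, Callable, Optional


def run_repl_loop(next_code: Callable[[str], str], namespace: dict,
                  max_turns: int = 25) -> tuple[Optional[Any], int]:
    """Run model-written code until it assigns Final or max_turns is reached.

    next_code receives the previous turn's output and returns the next snippet.
    Returns (final_answer_or_None, iteration_count).
    """
    feedback = ""
    for turn in range(1, max_turns + 1):
        code = next_code(feedback)
        stdout = io.StringIO()
        try:
            with contextlib.redirect_stdout(stdout):
                exec(code, namespace)  # NOT sandboxed; illustration only
            feedback = stdout.getvalue()
        except Exception as exc:
            # Errors are fed back so the model can correct itself next turn.
            feedback = stdout.getvalue() + f"{type(exc).__name__}: {exc}"
        if namespace.get("Final") is not None:
            return namespace["Final"], turn
    return None, max_turns
```

Captured stdout corresponds to what the progress chunks would carry, the value assigned to `Final` to the content chunk, and the returned turn number to `metadata["iteration_count"]`.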

Architecture

The Oracle API routes (/api/oracle and /api/oracle/stream) use RLMOracleWrapper:

from backend.src.services.rlm_oracle import RLMOracleWrapper

wrapper = RLMOracleWrapper(
    user_id="user-id",
    api_key="openrouter-api-key",
    project_id="project-id",
    model="deepseek/deepseek-chat-v3-0324",
    max_tokens=4096,
)

async for chunk in wrapper.process_query(query="Hello", context_id=None):
    print(chunk.type, chunk.content)

Environment Variables

| Variable | Default | Description |
|---|---|---|
| ORACLE_MAX_TURNS | 25 | Max REPL iterations per root session |
| ORACLE_SUB_MAX_TURNS | 8 | Max iterations per sub-oracle session |
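Reading these limits with their documented defaults could look like the sketch below. The `read_turn_limits` helper is hypothetical; only the variable names and default values come from this document.

```python
import os
from typing import Mapping, Optional


def read_turn_limits(env: Optional[Mapping[str, str]] = None) -> tuple[int, int]:
    """Return (root_max_turns, sub_oracle_max_turns) from the environment."""
    env = os.environ if env is None else env
    return (
        int(env.get("ORACLE_MAX_TURNS", "25")),
        int(env.get("ORACLE_SUB_MAX_TURNS", "8")),
    )
```

Accepting an explicit mapping keeps the helper easy to test without touching the process environment.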

Key Files

| File | Purpose |
|---|---|
| backend/src/services/rlm_oracle.py | RLMOracleWrapper, RLMSession, RLMPromptBuilder, SubOracleCallable |
| backend/src/services/project_context.py | ProjectContext, TextHandle, FileManifest |
| backend/src/services/repl_executor.py | REPLExecutor, REPLNamespace (RestrictedPython sandbox) |
| backend/src/services/openrouter_client.py | OpenRouter HTTP client (moved from bt/services/) |
| backend/src/api/routes/oracle.py | Oracle API routes (updated to use RLMOracleWrapper) |

Recent Changes

  • 022-rlm-oracle: Replaced BT Oracle with RLM Oracle harness; LLM writes Python in REPL with project/sub_oracle/Final namespace; deleted entire backend/src/bt/ directory; added REPLExecutor (RestrictedPython), ProjectContext (file/thread/note handles), RLMOracleWrapper; Go symbol extraction + end_line field in CodeRAG repomap
  • 018-vlt-mcp-server: Added vlt-mcp unified MCP server (packages/vlt-cli/src/vlt/mcp/) with 17 tools across 5 modules (thread_tools, meta_tools, code_tools, oracle_tools, vault_tools); Oracle toggle backend route (/api/settings/oracle); Oracle tab in Settings.tsx; 164ms cold-start via STDIO; registered as user-scope MCP in Claude Code

Active Technologies

  • Python 3.11+ (backend only; no frontend changes) (022-rlm-oracle)
  • RestrictedPython>=8.0 for REPL sandbox (022-rlm-oracle)
  • No new persistence. Ephemeral RLMSession per query. OracleBridge (existing) handles conversation history via existing context_nodes table. (022-rlm-oracle)