LDRS v3 — Hybrid RAG Deep Agent System

Living Document RAG System, version 3 — a 6-stage retrieval-augmented generation pipeline that combines hierarchical grep search, pgvector semantic similarity, BM25 fusion ranking, and an agentic reasoning loop with grounding verification. Designed for multi-document technical knowledge bases with full Nepali/Devanagari Unicode support.

Python 3.12+   |   PostgreSQL 16 + pgvector   |   OpenAI-compatible API

Architecture Overview

LDRS v3 processes user queries through a fixed 6-stage pipeline in which each stage has a well-defined input/output contract. Three stages involve LLM calls (Intent Classification, Agent Loop, Grounding); the remaining stages are deterministic.

                          ┌──────────────────────────┐
                          │   Source .md Files        │
                          │   (docs/ directory)       │
                          └────────────┬─────────────┘
                                       │
                          ┌────────────▼─────────────┐
                          │   File Watcher (Stage 0)  │
                          │   watchdog + debounce     │
                          └────────────┬─────────────┘
                                       │
                          ┌────────────▼─────────────┐
                          │   PageIndex (md_to_tree)  │
                          │   .md → structure JSON    │
                          └────────────┬─────────────┘
                                       │
                     ┌─────────────────┼─────────────────┐
                     │                 │                  │
            ┌────────▼───────┐ ┌──────▼────────┐ ┌──────▼───────┐
            │ Structure JSON │ │  Embed into   │ │  Registry    │
            │ (results/)     │ │  pgvector     │ │  (JSON)      │
            └────────────────┘ └───────────────┘ └──────────────┘


  User Query ──────────────────────────────────────────────────────────
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 1: Intent Classification (LLM)               │
            │  → intent_type, selected_files, query_variants,     │
            │    pattern_hints, needs_db, likely_multihop          │
            └────────┬────────────────────────────────────────────┘
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 2: Parallel Retrieval                        │
            │  ┌──────────────┐    ┌──────────────────────┐       │
            │  │  TreeGrep    │    │  pgvector Similarity  │       │
            │  │  (pattern    │ ∥  │  (query_variants →    │       │
            │  │   hints)     │    │   cosine search)      │       │
            │  └──────┬───────┘    └──────────┬───────────┘       │
            │         └──────────┬────────────┘                   │
            │                    ▼                                │
            │         Section Pool (deduplicated)                 │
            └────────┬────────────────────────────────────────────┘
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 3: BM25 Fusion Ranking                       │
            │  score = (w_bm25 × BM25) + (w_vec × vector)        │
            │        + (w_grep × grep_density)                    │
            │  × recency_factor × tag_boost                       │
            │  Weights shift by intent_type                       │
            └────────┬────────────────────────────────────────────┘
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 4: VFS Population                            │
            │  /sessions/{id}/manifest.json                       │
            │  /sessions/{id}/retrieved/rank1__doc__section.md     │
            │  /sessions/{id}/conversation/                       │
            │  /sessions/{id}/working/scratchpad.md                │
            └────────┬────────────────────────────────────────────┘
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 5: Agent Loop (LLM + Function Calling)       │
            │  Read manifest → self-sufficiency check →            │
            │  read_section / fetch_section / scratchpad →          │
            │  cited answer synthesis                              │
            └────────┬────────────────────────────────────────────┘
                     │
            ┌────────▼────────────────────────────────────────────┐
            │  Stage 6: Grounding Verification (LLM)              │
            │  Extract claims → entailment check per claim →       │
            │  flag unsupported → caveat insertion →                │
            │  optional re-grounding loop                         │
            └────────┬────────────────────────────────────────────┘
                     │
                     ▼
              Final Verified Answer

Key Design Decisions

OpenAI function calling is used directly (not through a framework) for the agent loop, giving fine-grained control over tool execution, message history, and forced synthesis on iteration limits.
Dual retrieval (TreeGrep + pgvector) ensures both keyword-exact and semantic matches are captured. BM25 fusion ranking merges these signals with intent-aware weight presets.
VFS-per-session gives the agent a self-contained filesystem with ranked sections, conversation context, and a scratchpad. The agent navigates via manifest.json and reads sections selectively.
Grounding verification runs after every agent answer. Each cited claim is checked via LLM entailment against its source section. Unsupported claims get caveats; high flag ratios trigger re-grounding.
NFC normalization is applied at every text boundary for safe handling of Nepali/Devanagari characters.

Pipeline Stages

Stage 0: File Watcher

Monitors directories for .md file changes using the watchdog library. On create/modify, the file is re-indexed (parsed, embedded, registered). On delete, embeddings and registry entries are removed. A configurable debounce (default 2 seconds) prevents rapid successive re-indexing.

Stage 1: Intent Classification

A single LLM call receives the user query and the compact registry JSON and returns a JSON object with these fields:

Field	Description
`intent_type`	One of: `exact`, `conceptual`, `comparative`, `multihop`, `db_query`, `hybrid`
`selected_files`	Files likely to contain the answer, with confidence scores
`query_variants`	2-4 rephrased queries for diverse retrieval
`pattern_hints`	`{literals, phrases, prefix_wildcards}` for TreeGrep
`needs_db`	Whether database access is needed
`likely_multihop`	Whether multi-hop reasoning is required

Fast path: if the registry is empty, the classifier returns default values without making an LLM call.

Stage 2: Parallel Retrieval

Two retrieval methods run concurrently:

TreeGrep — searches structure JSONs using pattern_hints from Stage 1. Matches are weighted by field: title (3.0), summary (2.0), body (1.0). Supports exact substring and word-level matching, with stop-word filtering and a configurable minimum match ratio (0.3).
pgvector — embeds each query_variant and runs cosine similarity search against the sections table. Multi-query results are deduplicated by (doc_name, section_id), keeping the highest similarity score.

Results are merged into a unified SectionCandidate pool with combined grep and vector signals.
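A minimal sketch of this merge step, assuming both retrievers are coroutines returning `(doc_name, section_id, score)` tuples. The `SectionCandidate` fields mirror the signals named in this README, but the class shape, the summing of grep scores, and the tuple format are illustrative assumptions, not the project's actual code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SectionCandidate:
    # Hypothetical shape; the README names grep_score, grep_hits, vector_similarity.
    doc_name: str
    section_id: str
    grep_score: float = 0.0
    grep_hits: int = 0
    vector_similarity: float = 0.0


def dedupe_vector_hits(hits):
    """Keep the highest similarity per (doc_name, section_id) across query variants."""
    best = {}
    for doc_name, section_id, sim in hits:
        key = (doc_name, section_id)
        if sim > best.get(key, float("-inf")):
            best[key] = sim
    return best


async def retrieve(grep_search, vector_search):
    # Run TreeGrep and pgvector retrieval concurrently.
    grep_hits, vector_hits = await asyncio.gather(grep_search(), vector_search())
    pool = {}
    for doc_name, section_id, score in grep_hits:
        cand = pool.setdefault((doc_name, section_id), SectionCandidate(doc_name, section_id))
        cand.grep_score += score  # assumption: grep scores accumulate per section
        cand.grep_hits += 1
    for key, sim in dedupe_vector_hits(vector_hits).items():
        cand = pool.setdefault(key, SectionCandidate(*key))
        cand.vector_similarity = sim
    return list(pool.values())
```

A section found by both methods ends up as one candidate carrying both signals, which is what lets Stage 3 fuse them.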

Stage 3: BM25 Fusion Ranking

Scores each candidate using a weighted fusion:

raw_score = (w_bm25 × BM25_norm) + (w_vector × similarity) + (w_grep × grep_density)
final_score = raw_score × recency_factor × tag_boost

Weight presets by intent type:

Intent	BM25	Vector	Grep
`exact`	0.30	0.35	0.35
`conceptual`	0.25	0.55	0.20
`comparative`	0.35	0.40	0.25
`multihop`	0.35	0.40	0.25
`db_query`	0.30	0.40	0.30
`hybrid`	0.35	0.40	0.25

Metadata boosts:

recency_factor: exponential decay with 365-day half-life, range [0.5, 1.0]
tag_boost: fraction of query tokens matching registry tags, range [1.0, 1.5]
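The scoring formula, the weight presets, and the two boosts can be sketched as follows. The preset values come from the table above and the fallback weights from the config defaults further down; the exact shapes of `recency_factor` (a half-life decay clamped to [0.5, 1.0]) and `tag_boost` (a linear map of overlap onto [1.0, 1.5]) are one plausible reading of the documented ranges, not confirmed implementations.

```python
from datetime import date

# (w_bm25, w_vector, w_grep) per intent, copied from the preset table.
WEIGHT_PRESETS = {
    "exact": (0.30, 0.35, 0.35),
    "conceptual": (0.25, 0.55, 0.20),
    "comparative": (0.35, 0.40, 0.25),
    "multihop": (0.35, 0.40, 0.25),
    "db_query": (0.30, 0.40, 0.30),
    "hybrid": (0.35, 0.40, 0.25),
}
DEFAULT_WEIGHTS = (0.4, 0.4, 0.2)  # assumption: used only for an unknown intent
HALF_LIFE_DAYS = 365.0


def recency_factor(last_modified: date, today: date) -> float:
    # Assumed form: 0.5 ** (age / half_life), clamped to the documented [0.5, 1.0].
    age_days = max((today - last_modified).days, 0)
    return max(0.5, 0.5 ** (age_days / HALF_LIFE_DAYS))


def tag_boost(query_tokens: set, tags: set) -> float:
    # Assumed form: fraction of query tokens found in the tags, mapped onto [1.0, 1.5].
    if not query_tokens:
        return 1.0
    return 1.0 + 0.5 * len(query_tokens & tags) / len(query_tokens)


def fusion_score(intent, bm25_norm, similarity, grep_density,
                 last_modified, today, query_tokens, tags):
    w_bm25, w_vec, w_grep = WEIGHT_PRESETS.get(intent, DEFAULT_WEIGHTS)
    raw = w_bm25 * bm25_norm + w_vec * similarity + w_grep * grep_density
    return raw * recency_factor(last_modified, today) * tag_boost(query_tokens, tags)
```

For a `conceptual` query with BM25 0.7, similarity 0.9, grep density 0.3, a document modified today, and one of two query tokens matching a tag, this gives (0.175 + 0.495 + 0.06) × 1.0 × 1.25 = 0.9125.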

For comparative intent, a round-robin interleave ensures balanced multi-file coverage.
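One way to implement the round-robin interleave, sketched under the assumption that ranked candidates arrive as `(doc_name, section_id, score)` tuples sorted by score. Files take turns in order of their best-scoring section, and each file keeps its internal ranking.

```python
def interleave_by_file(ranked, max_sections):
    """Round-robin across source files; `ranked` is sorted by score, descending."""
    queues = {}  # dicts keep insertion order, so the top-scoring file goes first
    for item in ranked:
        queues.setdefault(item[0], []).append(item)
    out = []
    while queues and len(out) < max_sections:
        for doc in list(queues):  # iterate over a copy so we can delete exhausted files
            out.append(queues[doc].pop(0))
            if not queues[doc]:
                del queues[doc]
            if len(out) >= max_sections:
                break
    return out
```

Without this step, a comparison between two documents could fill every VFS slot with sections from whichever file happens to score higher.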

Stage 4: VFS Population

Creates a per-session directory with:

manifest.json — ranked section list with metadata and scores
retrieved/ — individual .md files per ranked section
conversation/ — history summary and recent turns
db_context/ — optional database query results
working/scratchpad.md — agent working memory

The manifest is the agent's primary navigation interface.

Stage 5: Agent Loop

An iterative OpenAI function-calling loop:

The agent receives the query, intent type, and manifest
It calls tools (read_section, fetch_section, write_scratchpad, etc.)
Each iteration is an LLM call with tool_choice="auto"
When the LLM produces a text response without tool calls, the loop ends
If max iterations are reached, a forced synthesis prompt is sent

Tool definitions are provided in OpenAI function-calling format. Only sections actually read via read_section or fetch_section may be cited.
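The loop above can be sketched without any SDK by passing in a `chat` callable that returns assistant messages as plain dicts in the OpenAI chat format (`tool_calls` entries with `id`, `function.name`, and a JSON `function.arguments` string). The real AgentLoop works with SDK response objects; the `chat` signature and the forced-synthesis wording here are hypothetical.

```python
import json


def run_agent_loop(chat, tools, messages, max_iterations=10):
    """chat(messages, tool_choice) -> assistant message dict; tools: name -> callable."""
    for _ in range(max_iterations):
        reply = chat(messages, tool_choice="auto")
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content") or ""  # plain text ends the loop
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn.get("arguments") or "{}")
            try:
                result = tools[fn["name"]](**args)
            except Exception as exc:  # unknown tool or bad arguments: report back to the model
                result = f"ERROR: {exc}"
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": str(result)})
    # Iteration limit reached: ask for a final answer with tools disabled.
    messages.append({"role": "user", "content": "Iteration limit reached. "
                     "Write your final cited answer from the sections you have read."})
    return chat(messages, tool_choice="none").get("content") or ""
```

Tool errors are returned to the model as tool messages instead of being raised, so one bad call does not abort the whole answer.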

Stage 6: Grounding Verification

Post-answer verification of every cited claim:

Extract (claim, citation) pairs from the answer text
Locate cited section content in the VFS
LLM entailment check: "Does this section support this claim?"
Flag unsupported claims; insert caveat text
If flag ratio > 0.4, set re_grounded=True; the pipeline re-runs Stage 5 with grounding feedback appended to the query

Flagged claims are logged to results/hallucination_log.jsonl.
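The decision at the end of Stage 6 reduces to a small function, sketched here with verdicts given as `(claim, supported)` pairs. The caveat string is a placeholder (the full wording used by GroundingVerifier is not reproduced here), and appending it directly after the claim text is an assumption. Note the strict `>`: exactly 40% flagged does not trigger re-grounding.

```python
FLAG_RATIO_THRESHOLD = 0.4
CAVEAT = " [Note: unverified]"  # placeholder text, not the project's actual caveat


def apply_grounding(answer, verdicts):
    """verdicts: list of (claim, supported). Returns (caveated_answer, re_grounded)."""
    flagged = [claim for claim, supported in verdicts if not supported]
    for claim in flagged:
        answer = answer.replace(claim, claim + CAVEAT, 1)
    ratio = len(flagged) / len(verdicts) if verdicts else 0.0
    return answer, ratio > FLAG_RATIO_THRESHOLD
```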

Quick Start

Prerequisites

Python 3.12+
Docker (for PostgreSQL + pgvector)
An OpenAI-compatible API endpoint (local or remote)

1. Clone and install

git clone <repo-url> ldrs_v3
cd ldrs_v3
pip install -r requirements.txt

Or install as a package:

pip install -e ".[dev]"

2. Start PostgreSQL + pgvector

docker compose up -d

This starts pgvector/pgvector:pg16 on port 5432 and runs scripts/init_db.sql to create the sections and hallucination_log tables with the pgvector extension.

3. Configure environment

cp .env.example .env
# Edit .env with your API key, base URL, and other settings

4. Index documents

Place your .md source files in docs/, then index:

# Via API (start server first)
uvicorn api.server:app --host 0.0.0.0 --port 8001

# Then index via HTTP
curl -X POST http://localhost:8001/index-directory

Or programmatically:

import asyncio
from agent.config import AgentConfig
from agent.pipeline import Pipeline

async def main():
    pipeline = Pipeline(AgentConfig())
    await pipeline.startup()
    results = await pipeline.index_directory("docs/")
    for r in results:
        print(f"{r.doc_name}: {r.node_count} nodes, {r.embedded_count} embedded")
    await pipeline.shutdown()

asyncio.run(main())

5. Query

curl -X POST http://localhost:8001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How does OAuth2 token refresh work?"}'

6. Launch the UI

streamlit run ui/streamlit_app.py

Configuration Reference

All configuration is centralized in AgentConfig (agent/config.py). Values are read from environment variables with sensible defaults. Constructor arguments override env values.

LLM Settings

Env Var	Field	Default	Description
`API_KEY`	`api_key`	`""`	API key for the OpenAI-compatible endpoint
`BASE_URL`	`base_url`	`http://localhost:8000/v1`	Base URL for chat and embeddings
`DEFAULT_MODEL`	`default_model`	`qwen3-vl`	Default model for chat completions
`EMBEDDING_MODEL`	`embedding_model`	`text-embedding-3-small`	Model name for embeddings

PostgreSQL

Env Var	Field	Default	Description
`POSTGRES_HOST`	`postgres_host`	`localhost`	PostgreSQL host
`POSTGRES_PORT`	`postgres_port`	`5432`	PostgreSQL port
`POSTGRES_DB`	`postgres_db`	`ldrs_v3`	Database name
`POSTGRES_USER`	`postgres_user`	`ldrs`	Database user
`POSTGRES_PASSWORD`	`postgres_password`	`ldrs_secret`	Database password
—	`embedding_dim`	`1536`	Embedding vector dimension

LangSmith Monitoring

Env Var	Field	Default	Description
`LANGSMITH_TRACING`	`langsmith_tracing`	`false`	Enable LangSmith tracing
`LANGSMITH_API_KEY`	`langsmith_api_key`	`""`	LangSmith API key
`LANGSMITH_PROJECT`	`langsmith_project`	`ldrs-v3`	LangSmith project name

File Watcher

Env Var	Field	Default	Description
`WATCH_DIRS`	`watch_dirs`	`./docs`	Comma-separated directories to monitor
`WATCH_DEBOUNCE`	`watch_debounce`	`2.0`	Debounce interval in seconds

Retrieval & Agent Tuning

Env Var	Field	Default	Description
`MAX_VFS_SECTIONS`	`max_vfs_sections`	`15`	Max sections in VFS per query
—	`max_context_chars`	`15000`	Character budget for context
—	`max_grep_results`	`50`	Max TreeGrep results per document
`MAX_AGENT_ITERATIONS`	`max_agent_iterations`	`10`	Max agent loop iterations

Fusion Weights (Defaults)

Field	Default	Description
`bm25_weight`	`0.4`	Default BM25 weight
`vector_weight`	`0.4`	Default vector similarity weight
`grep_weight`	`0.2`	Default grep density weight

These defaults are overridden by the intent-specific presets in FusionRanker.

Directories

Field	Default	Description
`results_dir`	`./results`	Structure JSONs and registry
`docs_dir`	`./docs`	Source `.md` files
`sessions_dir`	`./sessions`	VFS session data
`registry_path`	`./results/registry.json`	Auto-derived from `results_dir`

Derived Properties

postgres_dsn — full PostgreSQL connection string
async_postgres_dsn — async-compatible connection string (for asyncpg)
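The "environment first, constructor wins" behavior and the derived DSN can be sketched with `dataclasses.field(default_factory=...)`, which reads the environment at construction time. `ConfigSketch` is a hypothetical subset of AgentConfig covering only the PostgreSQL fields; the exact format of `async_postgres_dsn` is not documented here, so it is left out.

```python
import os
from dataclasses import dataclass, field
from urllib.parse import quote


def _env(name, default):
    return os.environ.get(name, default)


@dataclass
class ConfigSketch:
    # Env var names and defaults taken from the PostgreSQL table above.
    postgres_host: str = field(default_factory=lambda: _env("POSTGRES_HOST", "localhost"))
    postgres_port: int = field(default_factory=lambda: int(_env("POSTGRES_PORT", "5432")))
    postgres_db: str = field(default_factory=lambda: _env("POSTGRES_DB", "ldrs_v3"))
    postgres_user: str = field(default_factory=lambda: _env("POSTGRES_USER", "ldrs"))
    postgres_password: str = field(default_factory=lambda: _env("POSTGRES_PASSWORD", "ldrs_secret"))

    @property
    def postgres_dsn(self) -> str:
        # Percent-encode credentials so characters like '@' cannot break the URL.
        user = quote(self.postgres_user, safe="")
        password = quote(self.postgres_password, safe="")
        return f"postgresql://{user}:{password}@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}"
```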

API Reference

The FastAPI server runs on port 8001 (configurable via PORT env var).

POST `/query`

Run the full 6-stage pipeline for a user query.

Request body:

{
  "query": "How does OAuth2 token refresh work?",
  "conversation_summary": "optional prior conversation summary",
  "recent_turns": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "db_context": null,
  "cleanup_session": false
}

Response:

{
  "answer": "OAuth2 token refresh works by...",
  "intent_type": "conceptual",
  "selected_files": ["authentication.md"],
  "candidates_count": 12,
  "ranked_count": 8,
  "session_id": "a1b2c3d4e5f6",
  "citations": ["authentication.md § Token Refresh"],
  "claims_checked": 3,
  "claims_supported": 3,
  "claims_flagged": 0,
  "re_grounded": false,
  "usage": {
    "total_input_tokens": 2500,
    "total_output_tokens": 800,
    "total_tokens": 3300,
    "total_cost_usd": 0.0,
    "total_llm_calls": 4,
    "total_query_time_ms": 3200.5,
    "stage_breakdown": {},
    "stage_timings_ms": {}
  },
  "total_time_ms": 3450.2,
  "success": true,
  "error": ""
}

POST `/batch-query`

Run multiple queries sequentially.

Request body:

{
  "queries": ["question 1", "question 2"],
  "conversation_summary": null,
  "cleanup_sessions": false
}

Response: List[QueryResponse]

POST `/index`

Index a single markdown file.

Request body:

{
  "md_path": "/absolute/path/to/document.md",
  "tags": ["auth", "security"],
  "summary": "OAuth2 authentication documentation",
  "if_thinning": false
}

Response:

{
  "md_path": "/absolute/path/to/document.md",
  "doc_name": "document",
  "index_path": "results/document_structure.json",
  "node_count": 15,
  "section_count": 12,
  "embedded_count": 12,
  "success": true,
  "error": ""
}

POST `/index-directory`

Index all .md files in a directory.

Request body:

{
  "directory": null,
  "tags": null,
  "if_thinning": false
}

If directory is null, defaults to config.docs_dir.

Response: List[IndexResponse]

GET `/corpus`

Get a summary of the current corpus.

Response:

{
  "total_files": 5,
  "total_tokens": 25000,
  "total_nodes": 120,
  "files_with_embeddings": 5,
  "file_names": ["auth.md", "api.md", "setup.md"]
}

GET `/corpus/stats`

Detailed corpus statistics including per-file info.

Response:

{
  "summary": { "...corpus summary..." },
  "files": {
    "auth.md": {
      "summary": "OAuth2 flow documentation",
      "tags": ["auth"],
      "sections_count": 8,
      "size_tokens": 1200,
      "has_embeddings": true,
      "last_modified": "2026-02-01"
    }
  }
}

POST `/corpus/rebuild`

Re-index all documents in the docs directory. Returns List[IndexResponse].

GET `/sessions`

List all VFS session IDs. Returns List[str].

DELETE `/sessions/{session_id}`

Delete a specific VFS session.

Response: {"status": "deleted", "session_id": "..."}

POST `/sessions/cleanup`

Delete all VFS sessions.

Response: {"status": "cleaned", "sessions_deleted": "5"}

GET `/health`

Health check endpoint.

Response:

{
  "status": "ok",
  "pipeline_started": true,
  "corpus_files": 5,
  "version": "3.0.0"
}

Streamlit UI

The Streamlit chat interface (ui/streamlit_app.py) provides:

Chat interface — send queries and view cited answers
Pipeline details — expandable panel showing intent type, candidate counts, ranked counts, claims checked/flagged, timing
Corpus info — sidebar showing file count, token count, node count
Usage stats — expandable JSON view of per-stage token usage
Health status — sidebar indicator of API connection status

Launch:

streamlit run ui/streamlit_app.py

The UI connects to the FastAPI server (default http://localhost:8001). The API URL is configurable via the sidebar text input.

Module Reference

All modules live under agent/. The package uses lazy imports via __getattr__ in agent/__init__.py to avoid pulling in heavy dependencies on package load.

`config.py` — AgentConfig

Centralized @dataclass configuration. All settings are loaded from environment variables with sensible defaults. Provides the computed properties postgres_dsn and async_postgres_dsn.

`monitoring.py` — UsageTracker & LangSmith

setup_monitoring(config) — configures LangSmith tracing env vars
UsageTracker — per-query accumulator for token counts, latencies, and costs across all stages
LLMCallRecord — individual LLM call record
StageTimer — per-stage timing

`registry.py` — Registry

JSON-based document registry following the AGENT_SYSTEM schema. Tracks:

Document metadata (summary, tags, sections, token counts)
Embedding status
Last modified dates
Index paths

Key methods:

add_file() — add/update a file entry with structure tree
remove_file() — remove a file entry
mark_embeddings() — update embedding status
get_corpus_summary() — aggregate corpus statistics
get_for_llm() — compact version for Stage 1 LLM context
save() — atomic write (tmp file + rename)

Uses NFC normalization on all text fields and tiktoken for token counting.
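The atomic save can be sketched with the standard "write to a temp file in the same directory, then `os.replace`" pattern, which never leaves a half-written registry.json behind. NFC normalization of every string is included because this README applies it at each text boundary. The function names are hypothetical.

```python
import json
import os
import tempfile
import unicodedata


def _nfc(obj):
    # Recursively NFC-normalize every string, including dict keys.
    if isinstance(obj, str):
        return unicodedata.normalize("NFC", obj)
    if isinstance(obj, dict):
        return {_nfc(k): _nfc(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [_nfc(v) for v in obj]
    return obj


def save_atomic(data, path):
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(_nfc(data), fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

`ensure_ascii=False` keeps Devanagari readable in the file instead of `\uXXXX` escapes.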

`embedder.py` — Embedder

Async section-level embedding pipeline:

PostgreSQL connection pool via asyncpg (min 2, max 10 connections)
Batch embedding via OpenAI-compatible /v1/embeddings endpoint
Upsert semantics (delete + insert per document)
Cosine similarity search with optional doc_names scope
Multi-query search with deduplication
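A scoped cosine search can be expressed as one SQL statement with asyncpg-style `$n` placeholders; pgvector's `<=>` operator returns cosine distance, so similarity is `1 - distance`. The `embedding` column name is an assumption (it is not listed in this README), and the vector is passed as a text literal cast to `vector` so no custom codec is needed.

```python
def build_similarity_query(query_embedding, top_k=10, doc_names=None):
    """Return (sql, args) for e.g. `await conn.fetch(sql, *args)`."""
    vec_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    args = [vec_literal]
    parts = [
        "SELECT doc_name, section_id, section_title,",
        "1 - (embedding <=> $1::vector) AS similarity",  # hypothetical column name
        "FROM sections",
    ]
    if doc_names:  # optional scope to the files chosen in Stage 1
        args.append(list(doc_names))
        parts.append(f"WHERE doc_name = ANY(${len(args)}::text[])")
    args.append(top_k)
    parts.append("ORDER BY embedding <=> $1::vector")  # ascending distance = best first
    parts.append(f"LIMIT ${len(args)}")
    return " ".join(parts), args
```

Ordering by the raw distance expression (rather than by the computed similarity) is what allows a pgvector index on the column to be used.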

`indexer.py` — Indexer

End-to-end document indexing:

md_to_tree() — parse .md into structure tree via PageIndex
Save structure JSON to results/
Flatten tree into sections (with breadcrumbs)
Embed sections into pgvector
Register in Registry with metadata

Also handles remove_file() and index_directory().

`tree_grep.py` — TreeGrep

Hierarchical pattern search across structure JSON nodes. Each searched field has a configurable relevance score:

Field	Relevance Score
Title	3.0
Summary	2.0
Body text	1.0

Two matching modes:

Tier 1: Exact substring match (full score)
Tier 2: Word-level match with ratio scaling (min ratio: 0.3)

Features:

NFC normalization for Unicode safety
Stop word filtering (200+ English stop words)
Scope filtering by node_id or title
Snippet extraction with configurable padding (60 chars)
search_from_hints() for Intent Classifier pattern_hints
Multi-pattern search with deduplication
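The two matching modes and the field scores combine roughly as follows. This sketch assumes nodes are dicts with `title`, `summary`, and `text` keys and that per-field scores are summed; both are assumptions, and the stop list is a tiny stand-in for the real 200+ word list. Words are split on whitespace and stripped of punctuation (including the Devanagari danda) rather than split with `\w`, which would break Devanagari words at vowel signs.

```python
import string
import unicodedata

FIELD_SCORES = {"title": 3.0, "summary": 2.0, "text": 1.0}
MIN_WORD_RATIO = 0.3
STOP_WORDS = {"a", "an", "and", "does", "how", "in", "is", "of", "the", "to"}  # illustrative subset
PUNCT = string.punctuation + "।॥"


def _norm(text):
    return unicodedata.normalize("NFC", text).casefold()


def _words(text):
    return [w.strip(PUNCT) for w in text.split() if w.strip(PUNCT)]


def score_node(node, pattern):
    pat = _norm(pattern)
    pat_words = [w for w in _words(pat) if w not in STOP_WORDS]
    total = 0.0
    for field, weight in FIELD_SCORES.items():
        text = _norm(node.get(field, ""))
        if not text:
            continue
        if pat in text:  # Tier 1: exact substring, full field score
            total += weight
        elif pat_words:  # Tier 2: word-level match, scaled by the matched ratio
            text_words = set(_words(text))
            ratio = sum(w in text_words for w in pat_words) / len(pat_words)
            if ratio >= MIN_WORD_RATIO:
                total += weight * ratio
    return total
```

For a node titled "Token Refresh" with summary "How refresh tokens are rotated" and body "Clients exchange a refresh token.", the pattern "token refresh" scores 3.0 (exact title) + 1.0 (half the words in the summary) + 1.0 (all words in the body) = 5.0.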

`watcher.py` — FileWatcher

File system monitoring using watchdog:

Debounced event handling (configurable interval)
Handles create, modify, delete, and move events
Async indexing via asyncio.run_coroutine_threadsafe()
Automatic registry timestamp updates
Recursive directory watching
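Debouncing per path can be sketched with `threading.Timer`, since watchdog delivers events on its own thread. Each new event for a path cancels that path's pending timer, so a burst of saves produces a single callback. In the real watcher the callback would presumably hand the work to the event loop with `asyncio.run_coroutine_threadsafe(coro, loop)`; the `Debouncer` class itself is hypothetical.

```python
import threading


class Debouncer:
    """Run callback(path) once, `interval` seconds after the last event for that path."""

    def __init__(self, interval, callback):
        self.interval = interval
        self.callback = callback
        self._timers = {}
        self._lock = threading.Lock()

    def trigger(self, path):
        with self._lock:
            pending = self._timers.pop(path, None)
            if pending is not None:
                pending.cancel()  # restart the quiet period
            timer = threading.Timer(self.interval, self._fire, args=(path,))
            timer.daemon = True
            self._timers[path] = timer
            timer.start()

    def _fire(self, path):
        with self._lock:
            # A Timer runs in its own thread, so current_thread() is this timer.
            if self._timers.get(path) is not threading.current_thread():
                return  # superseded by a newer event
            del self._timers[path]
        self.callback(path)
```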

`intent_classifier.py` — IntentClassifier

Stage 1 LLM call that produces:

IntentResult with intent type, file selection, query variants
PatternHints for TreeGrep routing
SelectedFile list with confidence scores

Handles: malformed JSON, missing fields, invalid intent types. Falls back to conceptual intent with original query on any failure.
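The fallback behavior can be sketched as a defensive parser. `IntentSketch` is a hypothetical, trimmed-down stand-in for IntentResult (file selection and pattern hints are omitted), and falling back to the original query when no usable variants are returned is an assumption.

```python
import json
from dataclasses import dataclass, field

# A tuple (not a set) so membership tests never fail on unhashable values.
VALID_INTENTS = ("exact", "conceptual", "comparative", "multihop", "db_query", "hybrid")


@dataclass
class IntentSketch:
    intent_type: str
    query_variants: list = field(default_factory=list)
    needs_db: bool = False
    likely_multihop: bool = False


def parse_intent(raw, query):
    """Parse the Stage 1 LLM output; fall back to a conceptual intent on any problem."""
    fallback = IntentSketch("conceptual", [query])
    try:
        data = json.loads(raw)
    except (TypeError, ValueError):  # non-string input or malformed JSON
        return fallback
    if not isinstance(data, dict) or data.get("intent_type") not in VALID_INTENTS:
        return fallback
    variants = data.get("query_variants")
    if not isinstance(variants, list):
        variants = []
    variants = [v for v in variants if isinstance(v, str) and v.strip()]
    return IntentSketch(
        intent_type=data["intent_type"],
        query_variants=variants or [query],
        needs_db=bool(data.get("needs_db", False)),
        likely_multihop=bool(data.get("likely_multihop", False)),
    )
```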

`retriever.py` — Retriever

Stage 2 parallel retrieval:

Loads TreeGrep instances for selected files
Runs grep and vector search concurrently via asyncio.gather()
Merges results into SectionCandidate pool
Deduplication by (doc_name, section_id)
Combined signals: grep_score, grep_hits, vector_similarity

`fusion_ranker.py` — FusionRanker

Stage 3 BM25 fusion ranking:

BM25Okapi scoring from rank-bm25 library
BM25 normalization to [0, 1] range
Intent-based weight presets
Recency factor (exponential decay, 365-day half-life)
Tag overlap boost
Multi-file interleave for comparative intent
Capped at max_vfs_sections

`vfs.py` — VFS

Stage 4 virtual filesystem:

Creates session directories with UUID-based IDs
Writes ranked sections as individual .md files
Builds manifest.json with metadata and score breakdowns
Writes conversation context and db_context
Supports on-demand section fetching (add_fetched_section)
Session listing and cleanup

`tools.py` — AgentTools

Five tools available to the agent during Stage 5:

Tool	Description
`read_section(vfs_path)`	Read a section from the VFS manifest
`fetch_section(source_file, section_header)`	Pull additional section on demand
`search_conversation_history(query)`	Search prior conversation turns
`write_scratchpad(content)`	Write to private working memory
`read_scratchpad()`	Read working memory

Tool definitions are generated in OpenAI function-calling format via get_tool_definitions(). The sections_read property tracks which sections were accessed (for citation validation).
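In the OpenAI tools format, each entry has `"type": "function"` and a `function` object whose `parameters` field is a JSON Schema. A sketch of what a definitions builder might produce for the five tools follows; the descriptions are taken from the table above, and the function name is hypothetical so it is not confused with the project's get_tool_definitions().

```python
def _tool(name, description, properties=None, required=None):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {  # JSON Schema for the tool's arguments
                "type": "object",
                "properties": properties or {},
                "required": required or [],
            },
        },
    }


def get_tool_definitions_sketch():
    str_param = {"type": "string"}
    return [
        _tool("read_section", "Read a section from the VFS manifest.",
              {"vfs_path": str_param}, ["vfs_path"]),
        _tool("fetch_section", "Pull an additional section on demand.",
              {"source_file": str_param, "section_header": str_param},
              ["source_file", "section_header"]),
        _tool("search_conversation_history", "Search prior conversation turns.",
              {"query": str_param}, ["query"]),
        _tool("write_scratchpad", "Write to private working memory.",
              {"content": str_param}, ["content"]),
        _tool("read_scratchpad", "Read working memory."),
    ]
```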

`agent_loop.py` — AgentLoop

Stage 5 iterative function-calling loop:

System prompt defines the agent protocol (manifest-first, cite-inline)
Iterates up to max_agent_iterations times
Each iteration: LLM call with tools → execute tool calls → append results
Terminates when the LLM produces content without tool calls
Forced synthesis on iteration limit
Citation extraction via regex: [source: file.md § Section]
Records per-iteration token usage

`grounding.py` — GroundingVerifier

Stage 6 grounding verification:

Extracts (claim, citation) pairs from answer text
Locates cited source content in the VFS manifest
Per-claim LLM entailment check (strict verification prompt)
Flags unsupported claims (max 10 claims checked per answer)
Caveat insertion for flagged claims
Re-grounding trigger at >40% flag ratio
Hallucination logging to results/hallucination_log.jsonl
Handles JSON parse failures and verification errors gracefully

`pipeline.py` — Pipeline

End-to-end orchestrator:

Lazy component initialization (all components created on first use)
startup() / shutdown() lifecycle management
query() runs all 6 stages sequentially
Re-grounding loop with feedback to the agent
Conversation state management (last 20 turns)
Index pass-through (index_file, index_directory)
Corpus info and session management utilities

Indexing & File Watcher

Document Format

Source documents are Markdown files. PageIndex parses them into a hierarchical structure tree based on heading levels.

Indexing Pipeline

.md file → md_to_tree() → structure JSON → flatten → embed → register

PageIndex parsing — md_to_tree() converts Markdown headings into a tree structure with node IDs, titles, text content, and page ranges.
Structure JSON — saved to results/<doc_name>_structure.json for TreeGrep and the fetch_section tool.
Flattening — the tree is flattened into a list of section dicts with {node_id, title, text, line_num, breadcrumb}. Only nodes with non-empty text are included.
Embedding — sections are batch-embedded via the OpenAI-compatible embeddings endpoint and upserted into the sections table in PostgreSQL with pgvector.
Registration — the document is added to registry.json with metadata (summary, tags, sections, token count, embedding status).
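The flattening step can be sketched as a depth-first walk that carries the ancestor titles along to build the breadcrumb. The child key `nodes` and the ` > ` separator are assumptions about the PageIndex tree and the breadcrumb format.

```python
def flatten_tree(nodes, parents=()):
    """Depth-first flatten; keep only nodes whose text is non-empty."""
    sections = []
    for node in nodes:
        path = parents + (node.get("title", ""),)
        text = node.get("text") or ""
        if text.strip():
            sections.append({
                "node_id": node.get("node_id"),
                "title": node.get("title", ""),
                "text": text,
                "line_num": node.get("line_num"),
                "breadcrumb": " > ".join(path),  # hypothetical separator
            })
        # Children are still visited even if this node had no text of its own.
        sections.extend(flatten_tree(node.get("nodes", []), path))
    return sections
```

An empty parent heading (such as a document title with no body text) is skipped, but it still appears in its children's breadcrumbs.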

File Watcher

The watcher monitors directories for .md file changes:

import asyncio

from agent.config import AgentConfig
from agent.watcher import FileWatcher

async def main():
    config = AgentConfig()
    watcher = FileWatcher(config)
    await watcher.start()             # starts the watchdog observer
    try:
        await asyncio.Event().wait()  # application runs until cancelled
    finally:
        await watcher.stop()          # stops the observer and cleans up

asyncio.run(main())

Events handled:

created / modified → re-index the file
deleted → remove embeddings + registry entry + structure JSON
moved → treat source as deleted, destination as created

Virtual Filesystem (VFS)

Each query creates a VFS session directory:

sessions/{session_id}/
  manifest.json              ← agent reads this first
  retrieved/
    rank1__docname__section.md
    rank2__docname__section.md
    ...
  conversation/
    history_summary.md        ← conversation context
    recent_turns.json         ← last N turns
  db_context/
    relevant_records.json     ← optional database results
  working/
    scratchpad.md             ← agent working memory

manifest.json

{
  "session_id": "a1b2c3d4e5f6",
  "created_at": "2026-02-28T10:00:00Z",
  "intent_type": "conceptual",
  "query_variants": ["original query", "variant 1"],
  "sections": [
    {
      "vfs_path": "retrieved/rank1__auth__oauth_flow.md",
      "source_file": "authentication.md",
      "section": "OAuth Flow",
      "one_line_summary": "Describes the OAuth2 authorization code flow...",
      "retrieval_method": "grep+vector",
      "final_score": 0.85,
      "score_breakdown": {
        "bm25": 0.7,
        "vector": 0.9,
        "grep_density": 0.3
      },
      "why_included": "vector=0.90; bm25=0.70; grep_density=0.30",
      "last_modified": "2026-02-01",
      "fetch_more_hint": false
    }
  ]
}

Agent Tools & Function Calling

The agent has five tools, defined in OpenAI function-calling format. Each example below shows a tool name with sample argument values:

`read_section`

{
  "name": "read_section",
  "parameters": {
    "vfs_path": "retrieved/rank1__auth__oauth_flow.md"
  }
}

Reads a section listed in the manifest. Only sections read via this tool may be cited. The tool tracks all accessed paths in sections_read.

`fetch_section`

{
  "name": "fetch_section",
  "parameters": {
    "source_file": "authentication.md",
    "section_header": "Token Refresh"
  }
}

Pulls an additional section on demand from the structure JSON. The section is added to the VFS retrieved/ directory and manifest. Use when the manifest hints at more content or a multihop reference is followed.

`search_conversation_history`

{
  "name": "search_conversation_history",
  "parameters": {
    "query": "what did we discuss about authentication"
  }
}

Searches prior conversation turns via keyword matching.

`write_scratchpad`

{
  "name": "write_scratchpad",
  "parameters": {
    "content": "## Reasoning\n..."
  }
}

Writes to the agent's private working memory. Content should follow the structured format: ## Reasoning, ## Key Facts, ## Open Questions, ## Synthesis Plan.

`read_scratchpad`

No parameters. Reads the current scratchpad content.

Grounding Verification

Process

Claim extraction — regex matches sentences followed by [source: ...] citations
Source lookup — maps citations to VFS sections via manifest
Entailment check — LLM verifies each claim against its source
Caveat insertion — unsupported claims get a [Note: This claim could not be fully verified...] suffix
Re-grounding — if >40% of claims are flagged, the pipeline re-runs Stage 5 with explicit feedback about which claims failed
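Claim extraction can be sketched as one regex that captures the sentence immediately before each `[source: ...]` marker and then splits multi-source citations on commas. This is an illustrative approximation: a claim containing an internal period (for example "e.g." or "1.5 seconds") would be truncated, and a section name containing a comma would be split.

```python
import re

# Hypothetical pattern: text up to a sentence end, then a citation block.
CLAIM_RE = re.compile(r"([^.!?\]]+[.!?])\s*\[source:\s*([^\]]+)\]")


def extract_claims(answer):
    """Return (claim, citation) pairs; multi-source citations yield one pair per source."""
    pairs = []
    for m in CLAIM_RE.finditer(answer):
        claim = m.group(1).strip()
        for citation in m.group(2).split(","):
            pairs.append((claim, citation.strip()))
    return pairs
```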

Hallucination Log

Flagged claims are appended to results/hallucination_log.jsonl:

{
  "timestamp": "2026-02-28T10:00:00Z",
  "session_id": "a1b2c3d4e5f6",
  "claim": "The refresh token expires after 30 days",
  "citation": "authentication.md § Token Refresh",
  "supported": false,
  "reason": "The source does not mention a 30-day expiry period"
}

Cost Control

Maximum 10 claims verified per answer (MAX_CLAIMS_TO_VERIFY)
Source content truncated to 3000 chars per verification call
Re-grounding runs at most once (MAX_REGROUND_ATTEMPTS = 1)

Monitoring & LangSmith

UsageTracker

Every query creates a UsageTracker that records:

Per-call: stage, model, input/output tokens, latency, cost
Per-stage: start/end timing
Query-level: total query time

The summary() method returns a dict with:

{
    "total_input_tokens": 2500,
    "total_output_tokens": 800,
    "total_tokens": 3300,
    "total_cost_usd": 0.0,
    "total_llm_calls": 4,
    "total_query_time_ms": 3200.5,
    "stage_breakdown": {
        "intent_classifier": {"input_tokens": 500, "output_tokens": 200, ...},
        "agent_loop": {"input_tokens": 1500, "output_tokens": 400, ...},
        "grounding": {"input_tokens": 500, "output_tokens": 200, ...},
    },
    "stage_timings_ms": {
        "intent_classifier": 850.2,
        "retrieval": 200.5,
        "fusion_ranking": 15.3,
        "vfs_population": 25.1,
        "agent_loop": 1800.0,
        "grounding": 310.5,
    },
}

LangSmith Integration

Enable LangSmith tracing via environment variables:

LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key
LANGSMITH_PROJECT=ldrs-v3

setup_monitoring(config) sets the environment variables that LangSmith/LangChain read automatically. It must be called before any LangChain model is instantiated; Pipeline.startup() handles this.

Citation Format

The agent uses inline citations in its answers:

Single source:

OAuth2 uses refresh tokens to obtain new access tokens. [source: auth.md § Token Refresh]

Multiple sources:

Both OAuth2 and API keys support scoped permissions. [source: auth.md § Scopes, api_keys.md § Permissions]

Citations are extracted by the AgentLoop._extract_citations() method using the regex pattern \[source:\s*([^\]]+)\].

The grounding verifier parses citations in the format file.md § Section Name to locate source content in the VFS.
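Using the regex quoted above, citations can be turned into `(file, section)` lookups by splitting each source on the `§` separator. This is a hypothetical helper, not the verifier's actual code; sources without a `§` are skipped because they cannot be mapped to a section.

```python
import re

CITATION_RE = re.compile(r"\[source:\s*([^\]]+)\]")


def parse_citations(answer):
    """Return unique (file, section) pairs in order of first appearance."""
    found = []
    for body in CITATION_RE.findall(answer):
        for ref in body.split(","):
            file_part, sep, section = ref.partition("§")
            if not sep:
                continue  # not in "file.md § Section" form
            key = (file_part.strip(), section.strip())
            if key not in found:
                found.append(key)
    return found
```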

Testing

Running Tests

# From the project root
python -m pytest tests/ -v

# Or with the development install
pytest tests/ -v

Test Suite

74 tests covering all modules and pipeline stages
All tests pass in approximately 1.5 seconds
Fully offline — no LLM or database calls
Uses unittest.mock.AsyncMock for async mocking
Uses pytest-asyncio with asyncio_mode = "auto"
Test fixtures use tmp_path for filesystem isolation

Test Coverage

Tests cover:

AgentConfig — default values, environment overrides, DSN properties
UsageTracker — recording, timing, summary computation
Registry — add/remove files, save/load, corpus summary, LLM format
TreeGrep — exact/word-level matching, regex, scope, multi-pattern, search_from_hints, deduplication
IntentClassifier — response parsing, malformed JSON, empty registry
SectionCandidate — signal merging
FusionRanker — weight presets, multi-file interleave, recency/tag boosts
VFS — session creation, manifest, section read/write, scratchpad, add fetched section, cleanup
AgentTools — all five tools, error handling
AgentLoop — citation extraction
GroundingVerifier — claim extraction, verification parsing, caveating, flag logging
Pipeline — lifecycle, conversation management, corpus info
Indexer — section flattening, breadcrumbs, recursive node counting

Test Data

tests/fixtures/ — 9 structure JSON files for TreeGrep testing
tests/markdown/ — 12 .md files for indexer testing

Project Structure

ldrs_v3/
├── README.md                    ← this file
├── pyproject.toml               ← project metadata, dependencies, tool config
├── requirements.txt             ← pinned dependencies
├── .env.example                 ← environment variable template
├── .gitignore
├── docker-compose.yml           ← PostgreSQL + pgvector (pg16)
│
├── agent/                       ← core pipeline modules
│   ├── __init__.py              ← lazy imports, __all__ exports
│   ├── config.py                ← AgentConfig dataclass
│   ├── monitoring.py            ← UsageTracker, LangSmith setup
│   ├── registry.py              ← document registry (JSON)
│   ├── embedder.py              ← pgvector embedding pipeline
│   ├── indexer.py               ← md → tree → embed → register
│   ├── tree_grep.py             ← hierarchical pattern search
│   ├── watcher.py               ← file system watcher (watchdog)
│   ├── intent_classifier.py     ← Stage 1: intent + routing
│   ├── retriever.py             ← Stage 2: parallel retrieval
│   ├── fusion_ranker.py         ← Stage 3: BM25 fusion ranking
│   ├── vfs.py                   ← Stage 4: VFS population
│   ├── tools.py                 ← Stage 5: agent tools
│   ├── agent_loop.py            ← Stage 5: agent reasoning loop
│   ├── grounding.py             ← Stage 6: grounding verification
│   └── pipeline.py              ← end-to-end orchestrator
│
├── api/
│   ├── __init__.py
│   └── server.py                ← FastAPI server (11 endpoints)
│
├── ui/
│   ├── __init__.py
│   └── streamlit_app.py         ← Streamlit chat interface
│
├── pageindex/                   ← PageIndex library (md_to_tree)
│   ├── __init__.py
│   ├── page_index_md.py         ← markdown → structure tree
│   ├── utils.py                 ← shared utilities
│   └── config.yaml              ← PageIndex configuration
│
├── scripts/
│   └── init_db.sql              ← PostgreSQL schema (sections + hallucination_log)
│
├── tests/
│   ├── test_ldrs_v3.py          ← 74 tests (all passing)
│   ├── fixtures/                ← 9 structure JSON test files
│   └── markdown/                ← 12 .md test files
│
├── docs/                        ← source .md files (user content)
├── results/                     ← runtime: structure JSONs + registry.json
└── sessions/                    ← runtime: VFS session directories
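The tree notes that agent/__init__.py uses lazy imports with __all__ exports. A common way to get that behavior is a module-level `__getattr__` (PEP 562). The helper below is a generic sketch under that assumption, not the project's actual `__init__.py`; `install_lazy_exports` is a hypothetical name and the export map in the usage comment is illustrative.

```python
import importlib
import types
from typing import Any


def install_lazy_exports(module: types.ModuleType, exports: dict[str, str]) -> None:
    """Attach a PEP 562 module-level __getattr__ that imports names on first access.

    `exports` maps public attribute names to absolute module paths.
    """

    def _lazy_getattr(name: str) -> Any:
        target = exports.get(name)
        if target is None:
            raise AttributeError(f"module {module.__name__!r} has no attribute {name!r}")
        value = getattr(importlib.import_module(target), name)
        setattr(module, name, value)  # cache: later lookups skip __getattr__
        return value

    module.__getattr__ = _lazy_getattr
    module.__all__ = list(exports)


# Hypothetical usage inside agent/__init__.py:
#   import sys
#   install_lazy_exports(sys.modules[__name__], {
#       "AgentConfig": "agent.config",
#       "Pipeline": "agent.pipeline",
#   })
```

Importing the package stays cheap; heavy submodules such as the embedder load only when one of their names is first used.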

Database Schema

PostgreSQL 16 with pgvector extension. Tables created by scripts/init_db.sql:

`sections` table

Column	Type	Description
`id`	`SERIAL PRIMARY KEY`	Auto-incrementing ID
`doc_name`	`TEXT NOT NULL`	Document name
`section_id`	`TEXT NOT NULL`	node_id from PageIndex
`section_title`	`TEXT NOT NULL`	Section title
`source_file`	`TEXT NOT NULL`	Path to source `.md` file
`content`	`TEXT NOT NULL`	Full section text
`line_num`	`INTEGER`	Line number in source `.md`
`embedding`	`vector(1536)`	Embedding vector
`token_count`	`INTEGER DEFAULT 0`	Token count
`created_at`	`TIMESTAMPTZ`	Creation timestamp
`updated_at`	`TIMESTAMPTZ`	Update timestamp

Unique constraint: (doc_name, section_id)

Indexes:

idx_sections_embedding — IVFFlat cosine index (100 lists)
idx_sections_doc_name — B-tree on doc_name

`hallucination_log` table

Column	Type	Description
`id`	`SERIAL PRIMARY KEY`	Auto-incrementing ID
`session_id`	`TEXT NOT NULL`	VFS session ID
`claim_text`	`TEXT NOT NULL`	The flagged claim
`cited_source`	`TEXT`	Citation reference
`cited_section`	`TEXT`	Section reference
`supported`	`BOOLEAN DEFAULT FALSE`	Verification result
`confidence`	`FLOAT`	Confidence score
`logged_at`	`TIMESTAMPTZ`	Timestamp

Troubleshooting

PostgreSQL connection errors

Ensure Docker is running and the pgvector container is healthy:

docker compose ps
docker compose logs postgres

Verify the database is accessible:

psql postgresql://ldrs:ldrs_secret@localhost:5432/ldrs_v3
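If psql is not installed locally, a rough reachability check from Python can tell a stopped container apart from an authentication problem. This standard-library sketch parses the DSN above and opens a TCP connection, so success only means something is listening on the port, not that the credentials are valid. The helper names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def dsn_host_port(dsn: str) -> tuple[str, int]:
    """Extract host and port from a postgresql:// DSN (falls back to localhost:5432)."""
    parts = urlsplit(dsn)
    return parts.hostname or "localhost", parts.port or 5432


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection succeeds; says nothing about credentials."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = dsn_host_port("postgresql://ldrs:ldrs_secret@localhost:5432/ldrs_v3")
# port_is_open(host, port) is False when the container is down or the port is not published.
```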

Embedding dimension mismatch

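The default embedding dimension is 1536 (matching text-embedding-3-small). If you use a different embedding model, update both:

embedding_dim in AgentConfig
The vector(1536) column type in init_db.sql

Vectors already stored at the old dimension are incompatible with the new column type, so recreate the sections table and re-index the documents after making the change.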
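A startup check can catch this mismatch before any insert fails. The sketch assumes the column type is read with PostgreSQL's `format_type()` (query kept as a constant) and compared against `AgentConfig.embedding_dim`; `check_embedding_dim` is a hypothetical helper.

```python
import re

# Returns e.g. 'vector(1536)' for the embedding column (schema-qualified if
# the vector type is not on the search_path).
COLUMN_TYPE_SQL = """
SELECT format_type(atttypid, atttypmod)
FROM pg_attribute
WHERE attrelid = 'sections'::regclass AND attname = 'embedding'
"""

_VECTOR_TYPE = re.compile(r"^(?:\w+\.)?vector\((\d+)\)$")


def parse_vector_dim(column_type: str) -> int:
    """Extract n from a pgvector type string such as 'vector(1536)'."""
    match = _VECTOR_TYPE.match(column_type.strip())
    if match is None:
        raise ValueError(f"not a fixed-dimension vector type: {column_type!r}")
    return int(match.group(1))


def check_embedding_dim(configured_dim: int, column_type: str) -> None:
    """Raise if the configured embedding_dim disagrees with the database column."""
    column_dim = parse_vector_dim(column_type)
    if configured_dim != column_dim:
        raise ValueError(
            f"embedding_dim={configured_dim} but sections.embedding is "
            f"vector({column_dim}); update init_db.sql or the config, then re-index"
        )
```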

Empty search results

Verify documents are indexed: GET /corpus
Check that structure JSONs exist in results/
Ensure embeddings were generated: files_with_embeddings > 0
Check logs for embedding errors
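The first three checks can be scripted. This sketch assumes the corpus summary returned by the API has already been decoded into a dict carrying the files_with_embeddings count named above; `diagnose_empty_results` is hypothetical, and results/ is the runtime directory from the project tree.

```python
from pathlib import Path


def diagnose_empty_results(corpus: dict, results_dir: Path = Path("results")) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    problems: list[str] = []

    # results/ holds one structure JSON per indexed file plus registry.json.
    structure_files = (
        [p for p in results_dir.glob("*.json") if p.name != "registry.json"]
        if results_dir.is_dir()
        else []
    )
    if not structure_files:
        problems.append(f"no structure JSONs found in {results_dir}/")
    if not (results_dir / "registry.json").is_file():
        problems.append("registry.json is missing; nothing is registered")

    # Only files_with_embeddings is read; no other response keys are assumed.
    if corpus.get("files_with_embeddings", 0) <= 0:
        problems.append("files_with_embeddings is 0; check logs for embedding errors")

    return problems
```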

Agent loop hitting max iterations

Increase MAX_AGENT_ITERATIONS, or check whether the VFS manifest lists enough sections to answer the query. The agent may be repeatedly trying to fetch sections that don't exist.

LangSmith tracing not working

Ensure all three variables are set:

LANGSMITH_TRACING=true
LANGSMITH_API_KEY=<valid-key>
LANGSMITH_PROJECT=ldrs-v3

Also confirm that setup_monitoring() runs before any LLM call. Pipeline.startup() handles this automatically.
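A quick pre-flight check for these variables might look like the sketch below. `missing_langsmith_settings` is hypothetical; it only inspects the environment and does not call the LangSmith SDK or setup_monitoring().

```python
import os
from collections.abc import Mapping

REQUIRED_LANGSMITH_VARS = ("LANGSMITH_TRACING", "LANGSMITH_API_KEY", "LANGSMITH_PROJECT")


def missing_langsmith_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """List problems with the LangSmith environment; empty means all three look set."""
    # Unset or blank variables are reported by name.
    problems = [name for name in REQUIRED_LANGSMITH_VARS if not env.get(name, "").strip()]
    # Tracing must be switched on explicitly.
    tracing = env.get("LANGSMITH_TRACING", "").strip().lower()
    if tracing and tracing != "true":
        problems.append(f"LANGSMITH_TRACING is {tracing!r}, expected 'true'")
    return problems
```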

Unicode / Nepali text issues

All text processing applies unicodedata.normalize("NFC", text) at boundaries. If you see matching failures with Devanagari text, check that:

Source files are saved as UTF-8
NFC normalization is applied before comparison
The database encoding is UTF-8
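Normalization matters because the same Devanagari letter can be encoded in more than one way. For example, क़ (qa) exists both as the single code point U+0958 and as क (U+0915) followed by the nukta sign (U+093C). The sketch below shows a raw substring test missing a match that the normalized comparison finds; the helper names are illustrative.

```python
import unicodedata
from pathlib import Path


def nfc(text: str) -> str:
    """Normalize to NFC so canonically equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text)


def load_markdown(path: Path) -> str:
    """Read a source .md file strictly as UTF-8 and normalize it."""
    return nfc(path.read_text(encoding="utf-8"))


def contains(haystack: str, needle: str) -> bool:
    """Substring match that ignores composed vs. decomposed encodings."""
    return nfc(needle) in nfc(haystack)


precomposed = "\u0958"       # क़ as one code point
decomposed = "\u0915\u093c"  # क + nukta
assert precomposed != decomposed            # raw strings differ
assert nfc(precomposed) == nfc(decomposed)  # equal after NFC

# For the database side, `SHOW server_encoding;` in psql should report UTF8.
```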

License

MIT
