Living Document RAG System, version 3 — a 6-stage retrieval-augmented generation pipeline that combines hierarchical grep search, pgvector semantic similarity, BM25 fusion ranking, and an agentic reasoning loop with grounding verification. Designed for multi-document technical knowledge bases with full Nepali/Devanagari Unicode support.
Python 3.12+ | PostgreSQL 16 + pgvector | OpenAI-compatible API
- Architecture Overview
- Pipeline Stages
- Quick Start
- Configuration Reference
- API Reference
- Streamlit UI
- Module Reference
- Indexing & File Watcher
- Virtual Filesystem (VFS)
- Agent Tools & Function Calling
- Grounding Verification
- Monitoring & LangSmith
- Citation Format
- Testing
- Project Structure
- Troubleshooting
LDRS v3 processes user queries through a fixed 6-stage pipeline where each stage has a well-defined input/output contract. Three stages involve LLM calls (Intent Classification, Agent Loop, Grounding); the remaining three (Parallel Retrieval, BM25 Fusion Ranking, VFS Population) are deterministic.
┌──────────────────────────┐
│ Source .md Files │
│ (docs/ directory) │
└────────────┬─────────────┘
│
┌────────────▼─────────────┐
│ File Watcher (Stage 0) │
│ watchdog + debounce │
└────────────┬─────────────┘
│
┌────────────▼─────────────┐
│ PageIndex (md_to_tree) │
│ .md → structure JSON │
└────────────┬─────────────┘
│
┌─────────────────┼─────────────────┐
│ │ │
┌────────▼───────┐ ┌──────▼────────┐ ┌──────▼───────┐
│ Structure JSON │ │ Embed into │ │ Registry │
│ (results/) │ │ pgvector │ │ (JSON) │
└────────────────┘ └───────────────┘ └──────────────┘
User Query ──────────────────────────────────────────────────────────
│
┌────────▼────────────────────────────────────────────┐
│ Stage 1: Intent Classification (LLM) │
│ → intent_type, selected_files, query_variants, │
│ pattern_hints, needs_db, likely_multihop │
└────────┬────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────┐
│ Stage 2: Parallel Retrieval │
│ ┌──────────────┐ ┌──────────────────────┐ │
│ │ TreeGrep │ │ pgvector Similarity │ │
│ │ (pattern │ ∥ │ (query_variants → │ │
│ │ hints) │ │ cosine search) │ │
│ └──────┬───────┘ └──────────┬───────────┘ │
│ └──────────┬────────────┘ │
│ ▼ │
│ Section Pool (deduplicated) │
└────────┬────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────┐
│ Stage 3: BM25 Fusion Ranking │
│ score = (w_bm25 × BM25) + (w_vec × vector) │
│ + (w_grep × grep_density) │
│ × recency_factor × tag_boost │
│ Weights shift by intent_type │
└────────┬────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────┐
│ Stage 4: VFS Population │
│ /sessions/{id}/manifest.json │
│ /sessions/{id}/retrieved/rank1__doc__section.md │
│ /sessions/{id}/conversation/ │
│ /sessions/{id}/working/scratchpad.md │
└────────┬────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────┐
│ Stage 5: Agent Loop (LLM + Function Calling) │
│ Read manifest → self-sufficiency check → │
│ read_section / fetch_section / scratchpad → │
│ cited answer synthesis │
└────────┬────────────────────────────────────────────┘
│
┌────────▼────────────────────────────────────────────┐
│ Stage 6: Grounding Verification (LLM) │
│ Extract claims → entailment check per claim → │
│ flag unsupported → caveat insertion → │
│ optional re-grounding loop │
└────────┬────────────────────────────────────────────┘
│
▼
Final Verified Answer
- OpenAI function calling is used directly (not through a framework) for the agent loop, giving fine-grained control over tool execution, message history, and forced synthesis on iteration limits.
- Dual retrieval (TreeGrep + pgvector) ensures both keyword-exact and semantic matches are captured. BM25 fusion ranking merges these signals with intent-aware weight presets.
- VFS-per-session gives the agent a self-contained filesystem with ranked sections, conversation context, and a scratchpad. The agent navigates via `manifest.json` and reads sections selectively.
- Grounding verification runs after every agent answer. Each cited claim is checked via LLM entailment against its source section. Unsupported claims get caveats; high flag ratios trigger re-grounding.
- NFC normalization is applied at every text boundary for safe handling of Nepali/Devanagari characters.
Monitors directories for .md file changes using the watchdog library.
On create/modify, the file is re-indexed (parsed, embedded, registered).
On delete, embeddings and registry entries are removed. A configurable
debounce (default 2 seconds) prevents rapid successive re-indexing.
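
The debounce behaviour described above can be sketched as follows. This is a minimal illustration, not the project's watcher: the `Debouncer` class, its method names, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Dict, List


class Debouncer:
    """Hypothetical helper: release a path only after it has been quiet."""

    def __init__(self, interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self._last_event: Dict[str, float] = {}

    def record(self, path: str) -> None:
        # A new event for the same path restarts its quiet period.
        self._last_event[path] = self.clock()

    def ready(self) -> List[str]:
        # Return (and forget) paths with no events for `interval` seconds.
        now = self.clock()
        due = [p for p, t in self._last_event.items()
               if now - t >= self.interval]
        for p in due:
            del self._last_event[p]
        return sorted(due)
```

A caller would invoke `record()` from the filesystem event handler and poll `ready()` to decide which files to re-index, so several saves within the interval trigger only one indexing run.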
A single LLM call receives the user query and the compact registry JSON. It outputs a structured JSON with:
| Field | Description |
|---|---|
| `intent_type` | One of: exact, conceptual, comparative, multihop, db_query, hybrid |
| `selected_files` | Files likely to contain the answer, with confidence scores |
| `query_variants` | 2-4 rephrased queries for diverse retrieval |
| `pattern_hints` | `{literals, phrases, prefix_wildcards}` for TreeGrep |
| `needs_db` | Whether database access is needed |
| `likely_multihop` | Whether multi-hop reasoning is required |
Fast paths: empty registry returns defaults without an LLM call.
Two retrieval methods run concurrently:
- TreeGrep: searches structure JSONs using `pattern_hints` from Stage 1. Three-tier matching: title (3.0), summary (2.0), body (1.0). Supports exact substring and word-level matching with stop word filtering and configurable minimum match ratio (0.3).
- pgvector: embeds each `query_variant` and runs cosine similarity search against the `sections` table. Multi-query results are deduplicated by `(doc_name, section_id)`, keeping the highest similarity score.
Results are merged into a unified SectionCandidate pool with combined
grep and vector signals.
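
The merge step can be illustrated with a short sketch. The field names follow the signals named in this README (`grep_score`, `grep_hits`, `vector_similarity`), but the tuple shapes of the inputs and the `merge_candidates` function are assumptions; the real `SectionCandidate` class may differ.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class SectionCandidate:
    # Illustrative stand-in for the project's class of the same name.
    doc_name: str
    section_id: str
    grep_score: float = 0.0
    grep_hits: int = 0
    vector_similarity: float = 0.0


def merge_candidates(
    grep_results: Iterable[Tuple[str, str, float, int]],
    vector_results: Iterable[Tuple[str, str, float]],
) -> List[SectionCandidate]:
    """Merge both result lists into one pool keyed by (doc_name, section_id)."""
    pool: Dict[Tuple[str, str], SectionCandidate] = {}
    for doc, sec, score, hits in grep_results:
        cand = pool.setdefault((doc, sec), SectionCandidate(doc, sec))
        cand.grep_score = max(cand.grep_score, score)
        cand.grep_hits += hits
    for doc, sec, sim in vector_results:
        cand = pool.setdefault((doc, sec), SectionCandidate(doc, sec))
        # Several query variants may hit the same section: keep the best.
        cand.vector_similarity = max(cand.vector_similarity, sim)
    return list(pool.values())
```

A section found by both methods ends up as a single candidate carrying both signals, which is what Stage 3 needs for fusion scoring.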
Scores each candidate using a weighted fusion:
raw_score = (w_bm25 × BM25_norm) + (w_vector × similarity) + (w_grep × grep_density)
final_score = raw_score × recency_factor × tag_boost
Weight presets by intent type:
| Intent | BM25 | Vector | Grep |
|---|---|---|---|
| exact | 0.30 | 0.35 | 0.35 |
| conceptual | 0.25 | 0.55 | 0.20 |
| comparative | 0.35 | 0.40 | 0.25 |
| multihop | 0.35 | 0.40 | 0.25 |
| db_query | 0.30 | 0.40 | 0.30 |
| hybrid | 0.35 | 0.40 | 0.25 |
Metadata boosts:
- `recency_factor`: exponential decay with 365-day half-life, range [0.5, 1.0]
- `tag_boost`: grows with the fraction of query tokens matching registry tags, range [1.0, 1.5]
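
The scoring formula and presets above can be worked through in a few lines. The weights are copied from the table; the exact shapes of `recency_factor` and `tag_boost` are not specified in this README, so the mappings below (decay toward a 0.5 floor, linear boost up to 1.5) are assumptions chosen only to respect the stated ranges.

```python
from typing import Dict, Iterable, Set, Tuple

# (w_bm25, w_vector, w_grep) per intent, taken from the preset table.
WEIGHT_PRESETS: Dict[str, Tuple[float, float, float]] = {
    "exact": (0.30, 0.35, 0.35),
    "conceptual": (0.25, 0.55, 0.20),
    "comparative": (0.35, 0.40, 0.25),
    "multihop": (0.35, 0.40, 0.25),
    "db_query": (0.30, 0.40, 0.30),
    "hybrid": (0.35, 0.40, 0.25),
}


def recency_factor(age_days: float, half_life_days: float = 365.0) -> float:
    # Assumed mapping: starts at 1.0 and decays toward a floor of 0.5.
    return 0.5 + 0.5 * 0.5 ** (max(age_days, 0.0) / half_life_days)


def tag_boost(query_tokens: Iterable[str], tags: Set[str]) -> float:
    # Assumed mapping: 1.0 with no tag overlap, 1.5 when every token matches.
    tokens = [t.lower() for t in query_tokens]
    if not tokens:
        return 1.0
    matched = sum(1 for t in tokens if t in tags)
    return 1.0 + 0.5 * matched / len(tokens)


def final_score(intent: str, bm25_norm: float, similarity: float,
                grep_density: float, age_days: float,
                query_tokens: Iterable[str], tags: Set[str]) -> float:
    w_bm25, w_vec, w_grep = WEIGHT_PRESETS[intent]
    raw = w_bm25 * bm25_norm + w_vec * similarity + w_grep * grep_density
    return raw * recency_factor(age_days) * tag_boost(query_tokens, tags)
```

For a conceptual query with `bm25_norm=0.7`, `similarity=0.9`, and `grep_density=0.3`, the raw score is 0.25 × 0.7 + 0.55 × 0.9 + 0.20 × 0.3 = 0.73 before the metadata boosts are applied.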
For comparative intent, a round-robin interleave ensures balanced
multi-file coverage.
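
One way to picture the round-robin interleave is the sketch below. It assumes candidates arrive as `(doc_name, section_id, score)` tuples already sorted by descending score; the function name and tuple layout are illustrative, not the `FusionRanker` API.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]  # (doc_name, section_id, score)


def interleave_by_file(ranked: List[Candidate], limit: int) -> List[Candidate]:
    """Take one section per file in turn so no single file fills the list."""
    per_file: Dict[str, List[Candidate]] = defaultdict(list)
    for item in ranked:
        per_file[item[0]].append(item)
    # Files are visited in order of first appearance in the ranked list.
    out: List[Candidate] = []
    for round_items in zip_longest(*per_file.values()):
        out.extend(i for i in round_items if i is not None)
    return out[:limit]
```

With three strong sections from one file and one weaker section from another, a limit of 3 still includes the second file, which a plain top-k cut would have dropped.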
Creates a per-session directory with:
- `manifest.json`: ranked section list with metadata and scores
- `retrieved/`: individual `.md` files per ranked section
- `conversation/`: history summary and recent turns
- `db_context/`: optional database query results
- `working/scratchpad.md`: agent working memory
The manifest is the agent's primary navigation interface.
An iterative OpenAI function-calling loop:
- The agent receives the query, intent type, and manifest
- It calls tools (`read_section`, `fetch_section`, `write_scratchpad`, etc.)
- Each iteration is an LLM call with `tool_choice="auto"`
- When the LLM produces a text response without tool calls, the loop ends
- If max iterations are reached, a forced synthesis prompt is sent
Tool definitions are provided in OpenAI function-calling format.
Only sections actually read via read_section or fetch_section may
be cited.
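
The control flow of such a loop can be sketched without any network access. In the example below, `call_llm` is a hypothetical callable standing in for a chat-completions request (with tools enabled when its second argument is true), and the message dictionaries mimic the OpenAI chat format; the wording of the forced synthesis prompt is invented for illustration.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_agent_loop(call_llm: Callable[[List[Message], bool], Message],
                   tools: Dict[str, Callable[..., Any]],
                   messages: List[Message],
                   max_iterations: int = 10) -> str:
    for _ in range(max_iterations):
        reply = call_llm(messages, True)  # tools offered to the model
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # Plain text without tool calls ends the loop.
            return reply.get("content") or ""
        for call in tool_calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            try:
                result = tools[fn["name"]](**args)
            except Exception as exc:  # report tool failures back to the model
                result = f"tool error: {exc}"
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": str(result)})
    # Iteration limit reached: ask once more, this time without tools.
    messages.append({"role": "user",
                     "content": "Iteration limit reached. Answer now using "
                                "only the sections already read."})
    return call_llm(messages, False).get("content") or ""
```

Injecting `call_llm` keeps the loop testable offline: a scripted fake can return a tool call first and a final answer second.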
Post-answer verification of every cited claim:
- Extract `(claim, citation)` pairs from the answer text
- Locate cited section content in the VFS
- LLM entailment check: "Does this section support this claim?"
- Flag unsupported claims; insert caveat text
- If flag ratio > 0.4, set `re_grounded=True`; the pipeline re-runs Stage 5 with grounding feedback appended to the query
Flagged claims are logged to results/hallucination_log.jsonl.
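
The caveat and re-grounding decision can be expressed compactly once the entailment verdicts are available. The sketch below assumes verdicts arrive as `(claim_sentence, supported)` pairs; the function name is hypothetical and the caveat string is kept in the abbreviated form used elsewhere in this README.

```python
from typing import List, Tuple

REGROUND_THRESHOLD = 0.4  # re-ground when the flag ratio exceeds this value
CAVEAT = " [Note: This claim could not be fully verified...]"


def apply_grounding(answer: str,
                    verdicts: List[Tuple[str, bool]]) -> Tuple[str, bool]:
    """Append caveats to unsupported claims and decide on re-grounding."""
    flagged = [claim for claim, supported in verdicts if not supported]
    for claim in flagged:
        # Only the first occurrence of each flagged claim is annotated.
        answer = answer.replace(claim, claim + CAVEAT, 1)
    ratio = len(flagged) / len(verdicts) if verdicts else 0.0
    return answer, ratio > REGROUND_THRESHOLD
```

Because the comparison is strict, two flagged claims out of five (ratio 0.4) do not trigger re-grounding, while one out of two (ratio 0.5) does.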
- Python 3.12+
- Docker (for PostgreSQL + pgvector)
- An OpenAI-compatible API endpoint (local or remote)
git clone <repo-url> ldrs_v3
cd ldrs_v3
pip install -r requirements.txt

Or install as a package:

pip install -e ".[dev]"

docker compose up -d

This starts pgvector/pgvector:pg16 on port 5432 and runs
scripts/init_db.sql to create the sections and hallucination_log
tables with the pgvector extension.

cp .env.example .env
# Edit .env with your API key, base URL, and other settings

Place your .md source files in docs/, then index:

# Via API (start server first)
uvicorn api.server:app --host 0.0.0.0 --port 8001
# Then index via HTTP
curl -X POST http://localhost:8001/index-directory

Or programmatically:
import asyncio

from agent.config import AgentConfig
from agent.pipeline import Pipeline


async def main():
    pipeline = Pipeline(AgentConfig())
    await pipeline.startup()
    # Index every .md file under docs/ and report per-document counts
    results = await pipeline.index_directory("docs/")
    for r in results:
        print(f"{r.doc_name}: {r.node_count} nodes, {r.embedded_count} embedded")
    await pipeline.shutdown()


asyncio.run(main())

curl -X POST http://localhost:8001/query \
  -H "Content-Type: application/json" \
  -d '{"query": "How does OAuth2 token refresh work?"}'

streamlit run ui/streamlit_app.py

All configuration is centralized in AgentConfig (agent/config.py).
Values are read from environment variables with sensible defaults.
Constructor arguments override env values.
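
The precedence rule (constructor argument, then environment variable, then built-in default) can be sketched with a dataclass. `AgentConfigSketch` is a reduced, hypothetical stand-in for `AgentConfig` covering only the PostgreSQL fields; the DSN layout follows the connection string shown in the troubleshooting notes.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str):
    # default_factory defers the lookup until the dataclass is instantiated.
    return field(default_factory=lambda: os.environ.get(name, default))


@dataclass
class AgentConfigSketch:
    postgres_host: str = _env("POSTGRES_HOST", "localhost")
    postgres_port: int = field(
        default_factory=lambda: int(os.environ.get("POSTGRES_PORT", "5432")))
    postgres_db: str = _env("POSTGRES_DB", "ldrs_v3")
    postgres_user: str = _env("POSTGRES_USER", "ldrs")
    postgres_password: str = _env("POSTGRES_PASSWORD", "ldrs_secret")

    @property
    def postgres_dsn(self) -> str:
        return (f"postgresql://{self.postgres_user}:{self.postgres_password}"
                f"@{self.postgres_host}:{self.postgres_port}/{self.postgres_db}")
```

Calling `AgentConfigSketch(postgres_host="db.internal")` overrides whatever `POSTGRES_HOST` holds, while fields not passed explicitly fall back to the environment and then to the defaults.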
| Env Var | Field | Default | Description |
|---|---|---|---|
| `API_KEY` | `api_key` | `""` | API key for the OpenAI-compatible endpoint |
| `BASE_URL` | `base_url` | `http://localhost:8000/v1` | Base URL for chat and embeddings |
| `DEFAULT_MODEL` | `default_model` | `qwen3-vl` | Default model for chat completions |
| `EMBEDDING_MODEL` | `embedding_model` | `text-embedding-3-small` | Model name for embeddings |
| Env Var | Field | Default | Description |
|---|---|---|---|
| `POSTGRES_HOST` | `postgres_host` | `localhost` | PostgreSQL host |
| `POSTGRES_PORT` | `postgres_port` | `5432` | PostgreSQL port |
| `POSTGRES_DB` | `postgres_db` | `ldrs_v3` | Database name |
| `POSTGRES_USER` | `postgres_user` | `ldrs` | Database user |
| `POSTGRES_PASSWORD` | `postgres_password` | `ldrs_secret` | Database password |
| — | `embedding_dim` | `1536` | Embedding vector dimension |
| Env Var | Field | Default | Description |
|---|---|---|---|
| `LANGSMITH_TRACING` | `langsmith_tracing` | `false` | Enable LangSmith tracing |
| `LANGSMITH_API_KEY` | `langsmith_api_key` | `""` | LangSmith API key |
| `LANGSMITH_PROJECT` | `langsmith_project` | `ldrs-v3` | LangSmith project name |
| Env Var | Field | Default | Description |
|---|---|---|---|
| `WATCH_DIRS` | `watch_dirs` | `./docs` | Comma-separated directories to monitor |
| `WATCH_DEBOUNCE` | `watch_debounce` | `2.0` | Debounce interval in seconds |
| Env Var | Field | Default | Description |
|---|---|---|---|
| `MAX_VFS_SECTIONS` | `max_vfs_sections` | `15` | Max sections in VFS per query |
| — | `max_context_chars` | `15000` | Character budget for context |
| — | `max_grep_results` | `50` | Max TreeGrep results per document |
| `MAX_AGENT_ITERATIONS` | `max_agent_iterations` | `10` | Max agent loop iterations |
| Field | Default | Description |
|---|---|---|
| `bm25_weight` | `0.4` | Default BM25 weight |
| `vector_weight` | `0.4` | Default vector similarity weight |
| `grep_weight` | `0.2` | Default grep density weight |
These defaults are overridden by the intent-specific presets in
FusionRanker.
| Field | Default | Description |
|---|---|---|
| `results_dir` | `./results` | Structure JSONs and registry |
| `docs_dir` | `./docs` | Source .md files |
| `sessions_dir` | `./sessions` | VFS session data |
| `registry_path` | `./results/registry.json` | Auto-derived from `results_dir` |
- `postgres_dsn`: full PostgreSQL connection string
- `async_postgres_dsn`: async-compatible connection string (for asyncpg)
The FastAPI server runs on port 8001 (configurable via PORT env var).
Run the full 6-stage pipeline for a user query.
Request body:
{
"query": "How does OAuth2 token refresh work?",
"conversation_summary": "optional prior conversation summary",
"recent_turns": [
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."}
],
"db_context": null,
"cleanup_session": false
}

Response:
{
"answer": "OAuth2 token refresh works by...",
"intent_type": "conceptual",
"selected_files": ["authentication.md"],
"candidates_count": 12,
"ranked_count": 8,
"session_id": "a1b2c3d4e5f6",
"citations": ["authentication.md § Token Refresh"],
"claims_checked": 3,
"claims_supported": 3,
"claims_flagged": 0,
"re_grounded": false,
"usage": {
"total_input_tokens": 2500,
"total_output_tokens": 800,
"total_tokens": 3300,
"total_cost_usd": 0.0,
"total_llm_calls": 4,
"total_query_time_ms": 3200.5,
"stage_breakdown": {},
"stage_timings_ms": {}
},
"total_time_ms": 3450.2,
"success": true,
"error": ""
}

Run multiple queries sequentially.
Request body:
{
"queries": ["question 1", "question 2"],
"conversation_summary": null,
"cleanup_sessions": false
}

Response: List[QueryResponse]
Index a single markdown file.
Request body:
{
"md_path": "/absolute/path/to/document.md",
"tags": ["auth", "security"],
"summary": "OAuth2 authentication documentation",
"if_thinning": false
}

Response:
{
"md_path": "/absolute/path/to/document.md",
"doc_name": "document",
"index_path": "results/document_structure.json",
"node_count": 15,
"section_count": 12,
"embedded_count": 12,
"success": true,
"error": ""
}

Index all .md files in a directory.
Request body:
{
"directory": null,
"tags": null,
"if_thinning": false
}

If directory is null, defaults to config.docs_dir.
Response: List[IndexResponse]
Get a summary of the current corpus.
Response:
{
"total_files": 5,
"total_tokens": 25000,
"total_nodes": 120,
"files_with_embeddings": 5,
"file_names": ["auth.md", "api.md", "setup.md"]
}

Detailed corpus statistics including per-file info.
Response:
{
"summary": { "...corpus summary..." },
"files": {
"auth.md": {
"summary": "OAuth2 flow documentation",
"tags": ["auth"],
"sections_count": 8,
"size_tokens": 1200,
"has_embeddings": true,
"last_modified": "2026-02-01"
}
}
}

Re-index all documents in the docs directory. Returns List[IndexResponse].
List all VFS session IDs. Returns List[str].
Delete a specific VFS session.
Response: {"status": "deleted", "session_id": "..."}
Delete all VFS sessions.
Response: {"status": "cleaned", "sessions_deleted": "5"}
Health check endpoint.
Response:
{
"status": "ok",
"pipeline_started": true,
"corpus_files": 5,
"version": "3.0.0"
}

The Streamlit chat interface (ui/streamlit_app.py) provides:
- Chat interface — send queries and view cited answers
- Pipeline details — expandable panel showing intent type, candidate counts, ranked counts, claims checked/flagged, timing
- Corpus info — sidebar showing file count, token count, node count
- Usage stats — expandable JSON view of per-stage token usage
- Health status — sidebar indicator of API connection status
Launch:
streamlit run ui/streamlit_app.py

The UI connects to the FastAPI server (default http://localhost:8001).
The API URL is configurable via the sidebar text input.
All modules live under agent/. The package uses lazy imports via
__getattr__ in agent/__init__.py to avoid pulling in heavy
dependencies on package load.
Centralized @dataclass configuration. All settings loaded from
environment variables with sensible defaults. Provides postgres_dsn
and async_postgres_dsn computed properties.
- `setup_monitoring(config)`: configures LangSmith tracing env vars
- `UsageTracker`: per-query accumulator for token counts, latencies, and costs across all stages
- `LLMCallRecord`: individual LLM call record
- `StageTimer`: per-stage timing
JSON-based document registry following the AGENT_SYSTEM schema. Tracks:
- Document metadata (summary, tags, sections, token counts)
- Embedding status
- Last modified dates
- Index paths
Key methods:
- `add_file()`: add/update a file entry with structure tree
- `remove_file()`: remove a file entry
- `mark_embeddings()`: update embedding status
- `get_corpus_summary()`: aggregate corpus statistics
- `get_for_llm()`: compact version for Stage 1 LLM context
- `save()`: atomic write (tmp file + rename)
Uses NFC normalization on all text fields and tiktoken for token counting.
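
The "tmp file + rename" pattern used by `save()` can be sketched with the standard library. The function name `save_json_atomic` is illustrative; the registry's own implementation may differ in details such as formatting or error handling.

```python
import json
import os
import tempfile
from pathlib import Path


def save_json_atomic(data: dict, path: str) -> None:
    """Write JSON so readers never observe a half-written file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(data, fh, ensure_ascii=False, indent=2)
        os.replace(tmp_name, target)  # atomic swap into place
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process dies mid-write, only the temporary file is affected and the previous `registry.json` remains intact.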
Async section-level embedding pipeline:
- PostgreSQL connection pool via `asyncpg` (min 2, max 10 connections)
- Batch embedding via OpenAI-compatible `/v1/embeddings` endpoint
- Upsert semantics (delete + insert per document)
- Cosine similarity search with optional `doc_names` scope
- Multi-query search with deduplication
End-to-end document indexing:
- `md_to_tree()`: parse `.md` into structure tree via PageIndex
- Save structure JSON to `results/`
- Flatten tree into sections (with breadcrumbs)
- Embed sections into pgvector
- Register in Registry with metadata
Also handles remove_file() and index_directory().
Hierarchical pattern search across structure JSON nodes. Three-tier matching with configurable relevance scores:
| Field | Relevance Score |
|---|---|
| Title | 3.0 |
| Summary | 2.0 |
| Body text | 1.0 |
Two matching modes:
- Tier 1: Exact substring match (full score)
- Tier 2: Word-level match with ratio scaling (min ratio: 0.3)
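
The two matching modes can be illustrated for a single field. This is a sketch under assumptions: the stop word set is a tiny subset, and scaling the field weight linearly by the match ratio is one plausible reading of "ratio scaling", not a confirmed detail of TreeGrep.

```python
import unicodedata

MIN_MATCH_RATIO = 0.3
# Tiny illustrative subset; the real list has 200+ English stop words.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "how", "does"}


def field_score(pattern: str, text: str, weight: float) -> float:
    """Score one field (title=3.0, summary=2.0, body=1.0) against a pattern."""
    pattern = unicodedata.normalize("NFC", pattern).lower().strip()
    text = unicodedata.normalize("NFC", text).lower()
    if not pattern:
        return 0.0
    if pattern in text:
        return weight  # Tier 1: exact substring, full score
    words = [w for w in pattern.split() if w not in STOP_WORDS]
    if not words:
        return 0.0
    text_words = set(text.split())
    ratio = sum(w in text_words for w in words) / len(words)
    # Tier 2: word-level match, scaled by the fraction of words found.
    return weight * ratio if ratio >= MIN_MATCH_RATIO else 0.0
```

For example, the pattern "refresh token expiry" against the title "Token refresh flow" matches two of three words, giving roughly 2.0 of the title's 3.0.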
Features:
- NFC normalization for Unicode safety
- Stop word filtering (200+ English stop words)
- Scope filtering by node_id or title
- Snippet extraction with configurable padding (60 chars)
- `search_from_hints()` for Intent Classifier `pattern_hints`
- Multi-pattern search with deduplication
File system monitoring using watchdog:
- Debounced event handling (configurable interval)
- Handles create, modify, delete, and move events
- Async indexing via `asyncio.run_coroutine_threadsafe()`
- Automatic registry timestamp updates
- Recursive directory watching
Stage 1 LLM call that produces:
- `IntentResult` with intent type, file selection, query variants
- `PatternHints` for TreeGrep routing
- `SelectedFile` list with confidence scores
Handles: malformed JSON, missing fields, invalid intent types. Falls
back to conceptual intent with original query on any failure.
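
The fallback behaviour can be sketched as a small parser. The function name and the plain-dict return type are assumptions for illustration (the project returns an `IntentResult`); the valid intent names and the default fields come from the Stage 1 description.

```python
import json
from typing import Any, Dict

VALID_INTENTS = {"exact", "conceptual", "comparative",
                 "multihop", "db_query", "hybrid"}


def parse_intent(raw: str, query: str) -> Dict[str, Any]:
    """Parse the classifier's JSON, falling back to safe defaults."""
    fallback: Dict[str, Any] = {
        "intent_type": "conceptual",
        "selected_files": [],
        "query_variants": [query],  # keep the original query
        "pattern_hints": {"literals": [], "phrases": [],
                          "prefix_wildcards": []},
        "needs_db": False,
        "likely_multihop": False,
    }
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback
    if not isinstance(data, dict):
        return fallback
    intent = data.get("intent_type")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        return fallback
    # Missing fields are filled from the defaults.
    return {**fallback, **data}
```

Returning a complete structure in every case means later stages never need to check for missing keys.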
Stage 2 parallel retrieval:
- Loads TreeGrep instances for selected files
- Runs grep and vector search concurrently via `asyncio.gather()`
- Merges results into `SectionCandidate` pool
- Deduplication by `(doc_name, section_id)`
- Combined signals: `grep_score`, `grep_hits`, `vector_similarity`
Stage 3 BM25 fusion ranking:
- BM25Okapi scoring from `rank-bm25` library
- BM25 normalization to [0, 1] range
- Intent-based weight presets
- Recency factor (exponential decay, 365-day half-life)
- Tag overlap boost
- Multi-file interleave for comparative intent
- Capped at `max_vfs_sections`
Stage 4 virtual filesystem:
- Creates session directories with UUID-based IDs
- Writes ranked sections as individual `.md` files
- Builds `manifest.json` with metadata and score breakdowns
- Writes conversation context and db_context
- Supports on-demand section fetching (`add_fetched_section`)
- Session listing and cleanup
Five tools available to the agent during Stage 5:
| Tool | Description |
|---|---|
| `read_section(vfs_path)` | Read a section from the VFS manifest |
| `fetch_section(source_file, section_header)` | Pull additional section on demand |
| `search_conversation_history(query)` | Search prior conversation turns |
| `write_scratchpad(content)` | Write to private working memory |
| `read_scratchpad()` | Read working memory |
Tool definitions are generated in OpenAI function-calling format via
get_tool_definitions(). The sections_read property tracks which
sections were accessed (for citation validation).
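
For orientation, a tool definition in OpenAI function-calling format wraps a JSON Schema for the arguments. The sketch below shows two of the five tools; the description strings and the function name `tool_definitions_sketch` are written for this example and are not taken from `get_tool_definitions()`.

```python
from typing import Any, Dict, List


def tool_definitions_sketch() -> List[Dict[str, Any]]:
    """Two illustrative entries in OpenAI function-calling format."""
    return [
        {
            "type": "function",
            "function": {
                "name": "read_section",
                "description": "Read a section listed in the VFS manifest.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "vfs_path": {
                            "type": "string",
                            "description": "Path from manifest.json",
                        },
                    },
                    "required": ["vfs_path"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "read_scratchpad",
                "description": "Read the current scratchpad content.",
                # A tool without arguments still declares an empty schema.
                "parameters": {"type": "object", "properties": {}},
            },
        },
    ]
```

The list is passed as the `tools` argument of a chat-completions request; the model then replies with tool calls whose `arguments` field is a JSON string matching the declared schema.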
Stage 5 iterative function-calling loop:
- System prompt defines the agent protocol (manifest-first, cite-inline)
- Iterates up to `max_agent_iterations` times
- Each iteration: LLM call with tools → execute tool calls → append results
- Terminates when the LLM produces content without tool calls
- Forced synthesis on iteration limit
- Citation extraction via regex: `[source: file.md § Section]`
- Records per-iteration token usage
Stage 6 grounding verification:
- Extracts `(claim, citation)` pairs from answer text
- Locates cited source content in the VFS manifest
- Per-claim LLM entailment check (strict verification prompt)
- Flags unsupported claims (max 10 claims checked per answer)
- Caveat insertion for flagged claims
- Re-grounding trigger at >40% flag ratio
- Hallucination logging to `results/hallucination_log.jsonl`
- Handles JSON parse failures and verification errors gracefully
End-to-end orchestrator:
- Lazy component initialization (all components created on first use)
- `startup()` / `shutdown()` lifecycle management
- `query()` runs all 6 stages sequentially
- Re-grounding loop with feedback to the agent
- Conversation state management (last 20 turns)
- Index pass-through (`index_file`, `index_directory`)
- Corpus info and session management utilities
Source documents are Markdown files. PageIndex parses them into a hierarchical structure tree based on heading levels.
.md file → md_to_tree() → structure JSON → flatten → embed → register
- PageIndex parsing: `md_to_tree()` converts Markdown headings into a tree structure with node IDs, titles, text content, and page ranges.
- Structure JSON: saved to `results/<doc_name>_structure.json` for TreeGrep and the `fetch_section` tool.
- Flattening: the tree is flattened into a list of section dicts with `{node_id, title, text, line_num, breadcrumb}`. Only nodes with non-empty text are included.
- Embedding: sections are batch-embedded via the OpenAI-compatible embeddings endpoint and upserted into the `sections` table in PostgreSQL with pgvector.
- Registration: the document is added to `registry.json` with metadata (summary, tags, sections, token count, embedding status).
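
The flattening step can be sketched as a recursive walk. The output keys match the section dict described above; the child key `nodes` and the ` > ` breadcrumb separator are assumptions about the structure JSON, not confirmed details.

```python
from typing import Any, Dict, List, Optional


def flatten_tree(nodes: List[Dict[str, Any]],
                 trail: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Flatten a heading tree into sections, skipping nodes without text."""
    sections: List[Dict[str, Any]] = []
    for node in nodes:
        path = (trail or []) + [node.get("title", "")]
        if (node.get("text") or "").strip():
            sections.append({
                "node_id": node.get("node_id"),
                "title": node.get("title", ""),
                "text": node["text"],
                "line_num": node.get("line_num"),
                "breadcrumb": " > ".join(path),
            })
        # Children inherit the path of their ancestors (assumed key: "nodes").
        sections.extend(flatten_tree(node.get("nodes", []), path))
    return sections
```

The breadcrumb preserves the heading hierarchy, so a section titled "Token Refresh" under "Authentication" keeps that context after the tree is flattened for embedding.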
The watcher monitors directories for .md file changes:
import asyncio

from agent.config import AgentConfig
from agent.watcher import FileWatcher


async def main():
    config = AgentConfig()
    watcher = FileWatcher(config)
    await watcher.start()  # starts watchdog observer
    try:
        ...  # application runs
    finally:
        await watcher.stop()  # stops and cleans up


asyncio.run(main())

Events handled:

- created / modified → re-index the file
- deleted → remove embeddings + registry entry + structure JSON
- moved → treat source as deleted, destination as created
Each query creates a VFS session directory:
sessions/{session_id}/
manifest.json ← agent reads this first
retrieved/
rank1__docname__section.md
rank2__docname__section.md
...
conversation/
history_summary.md ← conversation context
recent_turns.json ← last N turns
db_context/
relevant_records.json ← optional database results
working/
scratchpad.md ← agent working memory
{
"session_id": "a1b2c3d4e5f6",
"created_at": "2026-02-28T10:00:00Z",
"intent_type": "conceptual",
"query_variants": ["original query", "variant 1"],
"sections": [
{
"vfs_path": "retrieved/rank1__auth__oauth_flow.md",
"source_file": "authentication.md",
"section": "OAuth Flow",
"one_line_summary": "Describes the OAuth2 authorization code flow...",
"retrieval_method": "grep+vector",
"final_score": 0.85,
"score_breakdown": {
"bm25": 0.7,
"vector": 0.9,
"grep_density": 0.3
},
"why_included": "vector=0.90; bm25=0.70; grep_density=0.30",
"last_modified": "2026-02-01",
"fetch_more_hint": false
}
]
}

The agent has five tools available, defined in OpenAI function-calling format:
{
"name": "read_section",
"parameters": {
"vfs_path": "retrieved/rank1__auth__oauth_flow.md"
}
}

Reads a section listed in the manifest. Only sections read via this tool
or `fetch_section` may be cited. The tool tracks all accessed paths in `sections_read`.
{
"name": "fetch_section",
"parameters": {
"source_file": "authentication.md",
"section_header": "Token Refresh"
}
}

Pulls an additional section on demand from the structure JSON. The section
is added to the VFS retrieved/ directory and manifest. Use when the
manifest hints at more content or a multihop reference is followed.
{
"name": "search_conversation_history",
"parameters": {
"query": "what did we discuss about authentication"
}
}

Searches prior conversation turns via keyword matching.
{
"name": "write_scratchpad",
"parameters": {
"content": "## Reasoning\n..."
}
}

Writes to the agent's private working memory. Content should follow the
structured format: ## Reasoning, ## Key Facts, ## Open Questions,
## Synthesis Plan.
No parameters. Reads the current scratchpad content.
- Claim extraction: regex matches sentences followed by `[source: ...]` citations
- Source lookup: maps citations to VFS sections via manifest
- Entailment check: LLM verifies each claim against its source
- Caveat insertion: unsupported claims get a `[Note: This claim could not be fully verified...]` suffix
- Re-grounding: if >40% of claims are flagged, the pipeline re-runs Stage 5 with explicit feedback about which claims failed
Flagged claims are appended to results/hallucination_log.jsonl:
{
"timestamp": "2026-02-28T10:00:00Z",
"session_id": "a1b2c3d4e5f6",
"claim": "The refresh token expires after 30 days",
"citation": "authentication.md § Token Refresh",
"supported": false,
"reason": "The source does not mention a 30-day expiry period"
}

- Maximum 10 claims verified per answer (`MAX_CLAIMS_TO_VERIFY`)
- Source content truncated to 3000 chars per verification call
- Re-grounding runs at most once (`MAX_REGROUND_ATTEMPTS = 1`)
Every query creates a UsageTracker that records:
- Per-call: stage, model, input/output tokens, latency, cost
- Per-stage: start/end timing
- Query-level: total query time
The summary() method returns a dict with:
{
"total_input_tokens": 2500,
"total_output_tokens": 800,
"total_tokens": 3300,
"total_cost_usd": 0.0,
"total_llm_calls": 4,
"total_query_time_ms": 3200.5,
"stage_breakdown": {
"intent_classifier": {"input_tokens": 500, "output_tokens": 200, ...},
"agent_loop": {"input_tokens": 1500, "output_tokens": 400, ...},
"grounding": {"input_tokens": 500, "output_tokens": 200, ...},
},
"stage_timings_ms": {
"intent_classifier": 850.2,
"retrieval": 200.5,
"fusion_ranking": 15.3,
"vfs_population": 25.1,
"agent_loop": 1800.0,
"grounding": 310.5,
},
}

Enable LangSmith tracing via environment variables:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=your-key
LANGSMITH_PROJECT=ldrs-v3

setup_monitoring(config) sets the appropriate environment variables
that LangSmith/LangChain read automatically. Must be called before any
LangChain model is instantiated (handled by Pipeline.startup()).
The agent uses inline citations in its answers:
Single source:
OAuth2 uses refresh tokens to obtain new access tokens. [source: auth.md § Token Refresh]
Multiple sources:
Both OAuth2 and API keys support scoped permissions. [source: auth.md § Scopes, api_keys.md § Permissions]
Citations are extracted by the AgentLoop._extract_citations() method
using the regex pattern \[source:\s*([^\]]+)\].
The grounding verifier parses citations in the format
file.md § Section Name to locate source content in the VFS.
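
Applying the pattern above to an answer can be sketched as follows. The regex is the one quoted in this section; splitting multi-source citations on commas is an assumption that would misparse section names containing a comma, and `extract_citations` here is a standalone illustration rather than the `AgentLoop` method.

```python
import re
from typing import List, Tuple

CITATION_RE = re.compile(r"\[source:\s*([^\]]+)\]")


def extract_citations(answer: str) -> List[Tuple[str, str]]:
    """Return (file, section) pairs for every inline citation."""
    pairs: List[Tuple[str, str]] = []
    for match in CITATION_RE.finditer(answer):
        # One bracket may hold several comma-separated sources (assumed).
        for part in match.group(1).split(","):
            file_name, sep, section = part.strip().partition("§")
            if sep:
                pairs.append((file_name.strip(), section.strip()))
    return pairs
```

Run against the multi-source example above, this yields one pair for `auth.md` / `Scopes` and one for `api_keys.md` / `Permissions`.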
# From the project root
python -m pytest tests/ -v
# Or with the development install
pytest tests/ -v

- 74 tests covering all modules and pipeline stages
- All tests pass in approximately 1.5 seconds
- Fully offline: no LLM or database calls
- Uses `unittest.mock.AsyncMock` for async mocking
- Uses `pytest-asyncio` with `asyncio_mode = "auto"`
- Test fixtures use `tmp_path` for filesystem isolation
Tests cover:
- `AgentConfig`: default values, environment overrides, DSN properties
- `UsageTracker`: recording, timing, summary computation
- `Registry`: add/remove files, save/load, corpus summary, LLM format
- `TreeGrep`: exact/word-level matching, regex, scope, multi-pattern, `search_from_hints`, deduplication
- `IntentClassifier`: response parsing, malformed JSON, empty registry
- `SectionCandidate`: signal merging
- `FusionRanker`: weight presets, multi-file interleave, recency/tag boosts
- `VFS`: session creation, manifest, section read/write, scratchpad, add fetched section, cleanup
- `AgentTools`: all five tools, error handling
- `AgentLoop`: citation extraction
- `GroundingVerifier`: claim extraction, verification parsing, caveating, flag logging
- `Pipeline`: lifecycle, conversation management, corpus info
- Indexer: section flattening, breadcrumbs, recursive node counting

Test data:

- `tests/fixtures/`: 9 structure JSON files for TreeGrep testing
- `tests/markdown/`: 12 `.md` files for indexer testing
ldrs_v3/
├── README.md ← this file
├── pyproject.toml ← project metadata, dependencies, tool config
├── requirements.txt ← pinned dependencies
├── .env.example ← environment variable template
├── .gitignore
├── docker-compose.yml ← PostgreSQL + pgvector (pg16)
│
├── agent/ ← core pipeline modules
│ ├── __init__.py ← lazy imports, __all__ exports
│ ├── config.py ← AgentConfig dataclass
│ ├── monitoring.py ← UsageTracker, LangSmith setup
│ ├── registry.py ← document registry (JSON)
│ ├── embedder.py ← pgvector embedding pipeline
│ ├── indexer.py ← md → tree → embed → register
│ ├── tree_grep.py ← hierarchical pattern search
│ ├── watcher.py ← file system watcher (watchdog)
│ ├── intent_classifier.py ← Stage 1: intent + routing
│ ├── retriever.py ← Stage 2: parallel retrieval
│ ├── fusion_ranker.py ← Stage 3: BM25 fusion ranking
│ ├── vfs.py ← Stage 4: VFS population
│ ├── tools.py ← Stage 5: agent tools
│ ├── agent_loop.py ← Stage 5: agent reasoning loop
│ ├── grounding.py ← Stage 6: grounding verification
│ └── pipeline.py ← end-to-end orchestrator
│
├── api/
│ ├── __init__.py
│ └── server.py ← FastAPI server (11 endpoints)
│
├── ui/
│ ├── __init__.py
│ └── streamlit_app.py ← Streamlit chat interface
│
├── pageindex/ ← PageIndex library (md_to_tree)
│ ├── __init__.py
│ ├── page_index_md.py ← markdown → structure tree
│ ├── utils.py ← shared utilities
│ └── config.yaml ← PageIndex configuration
│
├── scripts/
│ └── init_db.sql ← PostgreSQL schema (sections + hallucination_log)
│
├── tests/
│ ├── test_ldrs_v3.py ← 74 tests (all passing)
│ ├── fixtures/ ← 9 structure JSON test files
│ └── markdown/ ← 12 .md test files
│
├── docs/ ← source .md files (user content)
├── results/ ← runtime: structure JSONs + registry.json
└── sessions/ ← runtime: VFS session directories
PostgreSQL 16 with pgvector extension. Tables created by
scripts/init_db.sql:
| Column | Type | Description |
|---|---|---|
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing ID |
| `doc_name` | `TEXT NOT NULL` | Document name |
| `section_id` | `TEXT NOT NULL` | node_id from PageIndex |
| `section_title` | `TEXT NOT NULL` | Section title |
| `source_file` | `TEXT NOT NULL` | Path to source .md file |
| `content` | `TEXT NOT NULL` | Full section text |
| `line_num` | `INTEGER` | Line number in source .md |
| `embedding` | `vector(1536)` | Embedding vector |
| `token_count` | `INTEGER DEFAULT 0` | Token count |
| `created_at` | `TIMESTAMPTZ` | Creation timestamp |
| `updated_at` | `TIMESTAMPTZ` | Update timestamp |
Unique constraint: (doc_name, section_id)
Indexes:
- `idx_sections_embedding`: IVFFlat cosine index (100 lists)
- `idx_sections_doc_name`: B-tree on `doc_name`
| Column | Type | Description |
|---|---|---|
| `id` | `SERIAL PRIMARY KEY` | Auto-incrementing ID |
| `session_id` | `TEXT NOT NULL` | VFS session ID |
| `claim_text` | `TEXT NOT NULL` | The flagged claim |
| `cited_source` | `TEXT` | Citation reference |
| `cited_section` | `TEXT` | Section reference |
| `supported` | `BOOLEAN DEFAULT FALSE` | Verification result |
| `confidence` | `FLOAT` | Confidence score |
| `logged_at` | `TIMESTAMPTZ` | Timestamp |
Ensure Docker is running and the pgvector container is healthy:
docker compose ps
docker compose logs postgres

Verify the database is accessible:

psql postgresql://ldrs:ldrs_secret@localhost:5432/ldrs_v3

The default embedding dimension is 1536 (matching text-embedding-3-small).
If using a different model, update both:

- `embedding_dim` in `AgentConfig`
- The `vector(1536)` column type in `init_db.sql`
- Verify documents are indexed: `GET /corpus`
- Check that structure JSONs exist in `results/`
- Ensure embeddings were generated: `files_with_embeddings > 0`
- Check logs for embedding errors
Increase MAX_AGENT_ITERATIONS or check if the manifest has enough
sections. The agent may be trying to fetch sections that don't exist.
Ensure all three variables are set:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=<valid-key>
LANGSMITH_PROJECT=ldrs-v3

And that setup_monitoring() is called before any LLM calls
(handled automatically by Pipeline.startup()).
All text processing applies unicodedata.normalize("NFC", text) at
boundaries. If you see matching failures with Devanagari text, check
that:
- Source files are saved as UTF-8
- NFC normalization is applied before comparison
- The database encoding is UTF-8
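
A quick way to see why normalization matters for Devanagari is to compare two encodings of the same letter. The helper names below are illustrative; `unicodedata.normalize` is the standard library call named above.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    # Compare after normalization so equivalent sequences match.
    return nfc(a) == nfc(b)


precomposed = "\u0958"       # DEVANAGARI LETTER QA as a single code point
decomposed = "\u0915\u093c"  # KA followed by the NUKTA sign

print(precomposed == decomposed)           # False: raw code points differ
print(same_text(precomposed, decomposed))  # True: equal once normalized
```

Both strings render identically, so a mismatch of this kind is invisible in an editor; normalizing the query and the indexed text at the same boundaries avoids it.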
MIT