NSHG‑RAG is not a generic Retrieval‑Augmented Generation (RAG) demo. It is an experimental, research‑grade RAG system designed to study and implement retrieval policies, hybrid knowledge representations, and evaluation‑driven reliability in large‑language‑model systems.
Rather than treating retrieval as a single vector search call, NSHG‑RAG models retrieval as a decision‑making process governed by query decomposition, conditional routing, symbolic constraints, and retriever‑specific trust weighting. Large Language Models (LLMs) are treated as components, not authorities.
The system is built as a neuro‑symbolic RAG testbed that allows controlled experimentation across dense, sparse, symbolic, and clustered retrieval paradigms — with first‑class support for ablation, evaluation, and architectural introspection.
NSHG‑RAG is guided by the following principles:
- Retrieval is a policy, not a function call — Retrieval decisions are explicitly reasoned about, decomposed, filtered, and weighted.
- Knowledge is structured, fallible, and contextual — Text chunks are treated as indexing artifacts, not ground truth. Symbolic graphs, metadata constraints, and clustering provide alternative views over the same knowledge.
- LLMs synthesize, they do not decide — LLMs are used for query decomposition, weighting, and synthesis, while control logic remains external and inspectable.
- Evaluation precedes optimization — Ablation studies, retriever comparisons, and controlled experiments are central to development.
- Failure modes are first-class — The system is designed to expose retrieval bias, routing errors, and relevance trade-offs rather than hide them behind fluent answers.
NSHG-RAG differs from typical RAG systems in several important ways:
- Policy-driven retrieval — Retrieval decisions are externalized, auditable, and dynamically weighted across heterogeneous retrievers.
- Hybrid neuro-symbolic knowledge — Combines dense, sparse, clustering, and symbolic graph retrieval rather than relying on a single retriever.
- Planner-centric orchestration — LLMs are used for query decomposition and synthesis, while retrieval control remains external and inspectable.
- Chunking as first-class concern — Semantic, structure-aware chunks ensure high-quality retrieval and reduce hallucination risk.
- Evaluation-first design — Built-in ablation and comparative evaluation to understand retrieval quality, weighting effects, and system robustness.
- Not a chatbot framework — Optimized for research, analysis, and controlled experiments rather than user-facing conversation.
Important architectural note: In NSHG-RAG, each retriever has a distinct epistemic role. Overlap in functionality is intentional and resolved at the policy and weighting level, not inside individual retrievers.
- Query decomposition into atomic, independent subqueries
- Conditional retrieval based on:
  - Query intent
  - Scope constraints (file, folder, extension)
  - Precision vs recall requirements
- LLM-assisted retriever weighting
- Symbolic, semantic, sparse, and dense hybridization
NSHG-RAG employs multiple retrievers, each optimized for a different notion of relevance. These retrievers are not interchangeable.
- Dense vector retrieval (FAISS) — Chunk-level semantic proximity. Optimized for paraphrasing, implicit references, and fine-grained relevance.
- Sparse lexical retrieval (BM25) — Literal keyword and factoid matching. Optimized for exact terms, identifiers, and surface-form precision.
- Semantic clustering retrieval — Concept-level routing. Optimized for identifying which conceptual region of a document is relevant, not which exact sentence.
- Symbolic graph-based retrieval — Structure- and author-intent-driven access via titles, sections, file paths, and explicit document organization.
Hybrid re-ranking is performed through explicit score-space ensembling under planner-controlled weights, rather than implicit cascades or opaque rerankers.
- Dense vector retrieval (FAISS)
- Sparse lexical retrieval (BM25)
- Semantic clustering retrieval
- Symbolic graph‑based retrieval
- Hybrid re‑ranking under explicit weight control
- Chunk‑level indexing (fixed + semantic)
- Metadata‑aware filtering
- Symbolic graph views over document structure
- Section‑ and title‑aware retrieval
- Explicit planning stage before retrieval
- Retrieval decisions externalized from prompting
- Multi‑step execution with intermediate state
- Robust LLM output parsing and fallback logic
- Retriever ablation planning
- Side‑by‑side retriever comparison
- Precision / recall / F1 tracking
- Regression‑friendly experimental CLI
- Score‑space analysis across retrievers
- Inspection of weight sensitivity and retrieval agreement
User Query
↓
Planner (Decompose → Filter → Weight)
↓
Hybrid Retriever (Symbolic | Cluster | BM25 | FAISS)
↓
Retrieved Evidence (Deduplicated)
↓
LLM Synthesizer
↓
Final Answer
The Planner is the defining component of NSHG‑RAG and the primary reason it qualifies as an advanced RAG system.
- Query Decomposition — Converts a complex user query into atomic, declarative subqueries designed solely for high-confidence retrieval, not reasoning.
- Scope & Constraint Extraction — Identifies explicit filters such as:
  - File names
  - Folder paths
  - File extensions

  Enforces strict scoping when required.
- Retriever Weight Assignment — Dynamically assigns trust weights across retrievers:
  - Symbolic
  - Semantic clustering
  - BM25
  - FAISS

  Weights are chosen based on query intent, scope strictness, and retrieval semantics, not heuristics alone.
- Conditional Retrieval Execution — Each subquery is retrieved independently with its own filters and weights, enabling:
  - Multi-hop retrieval
  - Recall-diverse evidence gathering
  - Bias mitigation across retrievers
- Evidence Aggregation — Retrieved chunks are deduplicated and merged before synthesis.
This design:
- Prevents early hallucination
- Decouples reasoning from retrieval
- Makes retrieval decisions auditable
- Enables ablation at the policy level
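The conditional execution and aggregation steps above can be sketched as a simple loop. The `retrieve` callable and the dict shapes are hypothetical stand-ins invented for illustration, not NSHG-RAG's actual interfaces:

```python
# Illustrative planner execution loop: each subquery runs with its own
# filters and weights, and evidence is deduplicated by chunk id before
# synthesis. All names here are hypothetical.

def execute_plan(subqueries, retrieve):
    evidence, seen = [], set()
    for sq in subqueries:
        chunks = retrieve(sq["query"], filters=sq["filters"], weights=sq["weights"])
        for chunk in chunks:
            if chunk["id"] not in seen:  # aggregate without double-counting
                seen.add(chunk["id"])
                evidence.append(chunk)
    return evidence

# Toy retriever returning overlapping evidence for two subqueries.
def fake_retrieve(query, filters, weights):
    if query == "sub1":
        return [{"id": "a"}, {"id": "b"}]
    return [{"id": "b"}, {"id": "c"}]

plan = [
    {"query": "sub1", "filters": {}, "weights": {}},
    {"query": "sub2", "filters": {}, "weights": {}},
]
evidence = execute_plan(plan, fake_retrieve)  # chunk "b" appears only once
```

Keeping this loop outside the LLM is what makes each retrieval decision auditable: the plan is plain data that can be logged, replayed, and ablated.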
Retrieval quality in NSHG-RAG is fundamentally bounded by chunk quality. Instead of relying on fixed-size or naive recursive splitting, NSHG-RAG employs a semantic chunking strategy designed to preserve conceptual coherence while remaining retriever-friendly.
The SemanticChunker operates as a content-aware preprocessing layer that normalizes heterogeneous documents into retrieval-stable units.
Key characteristics:
- Multi-format parsing — Supports Markdown, PDF, DOCX, PPTX, notebooks, spreadsheets, source code, configuration files, and plain text through specialized parsers.
- Structure-preserving preprocessing — Parsed elements retain metadata such as section titles and document hierarchy, enabling downstream symbolic and scoped retrieval.
- Semantic-aware splitting — Large textual elements are split using embedding similarity rather than fixed token counts. Splits occur only when semantic drift exceeds a configurable threshold.
- Token-aware constraints — Each chunk is bounded by minimum and maximum token limits to ensure compatibility with both sparse and dense retrievers.
- Context-preserving overlap — Adjacent chunks include limited sentence overlap to prevent boundary-induced information loss.
- Retriever-aligned output — Chunks are designed to be equally consumable by BM25, FAISS, clustering, and symbolic retrievers without format-specific bias.
Most RAG failures attributed to retrieval or ranking are, in practice, chunking failures. NSHG-RAG treats chunking as a first-class architectural concern:
- Reduces semantic fragmentation
- Improves retriever agreement
- Enables more stable score-space ensembling
- Strengthens symbolic and section-aware filtering
Chunking in NSHG-RAG is not a preprocessing utility — it is an implicit retrieval policy.
Semantic, structure-aware document parsing and chunk generation (fixed + semantic splitting).
Independent implementations of:
- BM25
- FAISS
- Hybrid retrieval
- Semantic clustering
- Symbolic graph traversal
Explicit retrieval‑reasoning orchestration layer.
Evaluation, ablation, and benchmarking utilities.
Embedding and metadata persistence (SQLite + vector stores).
LLM adapters (Gemini, Ollama).
Abstract contracts enforcing modularity and interchangeability.
NSHG‑RAG treats evaluation as a first‑order concern:
- Retrieval quality is measured independently of generation
- Each retriever can be isolated and stress‑tested
- Ablation studies reveal architectural dependencies
- Performance is tracked across configuration changes
Most RAG systems fail here — NSHG‑RAG is designed to expose those failures.
NSHG-RAG comes with an evaluation CLI that demonstrates its capabilities in comparative and ablation studies.
python eval_cli.py --eval_type comparative --topk 5 --chunker_type semantic

- `--eval_type` — Evaluation mode:
  - `comparative` — Compare all retrievers side-by-side
  - `ablation_weights` — Test dynamic vs fixed planner-assigned weights
  - `ablation_retriever` — Isolate retrievers to measure individual impact
- `--topk` — Number of retrieved chunks per retriever
- `--chunker_type` — Choose `semantic` (structure-aware) or `file` chunking
- `--human_eval` — Force manual evaluation scoring
- `--use_dynamic_weights` — For weight ablation experiments
This workflow demonstrates how NSHG-RAG separates retrieval policy from LLM synthesis, enabling fine-grained experimentation and analysis.
NSHG‑RAG explicitly distinguishes score normalization from retrieval confidence.
All retrievers emit scores normalized to a common [0, 1] range to enable safe combination. However, normalized scores are not treated as calibrated probabilities. Confidence is instead approximated implicitly through:
- Agreement across heterogeneous retrievers
- Planner‑assigned trust weights based on query intent
- Redundancy of evidence across independent subqueries
This design avoids over‑interpreting any single retriever’s score while still enabling principled score‑space ensembling. Explicit probabilistic calibration is left as a future research direction.
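A common way to obtain a shared [0, 1] score range is min-max scaling. The sketch below is an assumption about the general approach, not NSHG-RAG's exact normalization code, and its handling of constant score lists is an arbitrary choice:

```python
# Hypothetical min-max normalization onto [0, 1].
# Mapping constant score lists to 0.0 is an arbitrary choice for this sketch.

def minmax_normalize(scores):
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

normalized = minmax_normalize([2, 4, 6])  # raw BM25-like scores, for example
```

Note that this mapping is rank-preserving but not calibrated: a normalized 0.9 from one retriever is not comparable, as a probability, to a 0.9 from another, which is exactly why the design above leans on agreement and trust weights instead.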
Semantic clusters in NSHG-RAG are conceptual groupings, not reranking mechanisms.
Key design choice:
- Cluster-level similarity scores are intentionally not refined at the chunk level.
Rationale:
- The role of the cluster retriever is to identify which conceptual region of a document is relevant.
- Fine-grained chunk-to-query proximity is delegated to FAISS and BM25, which are explicitly designed for that purpose.
- Cluster retrieval therefore prioritizes high-recall concept discovery over local precision.
All chunks within a relevant cluster are surfaced and subsequently reweighted through the hybrid ensemble. This separation of concerns avoids double-counting semantic similarity and preserves retriever epistemic clarity.
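The concept-level recall behavior can be sketched as follows. The cluster representation, similarity function, and chunk ids are illustrative assumptions rather than the system's actual data model:

```python
# Toy cluster retrieval: rank clusters by centroid similarity, then surface
# every chunk in the top cluster(s) without chunk-level re-scoring.
# Fine-grained ranking is delegated to FAISS/BM25 downstream.

def cluster_retrieve(query_vec, clusters, similarity, top_clusters=1):
    """clusters: list of {"centroid": vector, "chunk_ids": [...]}."""
    ranked = sorted(
        clusters,
        key=lambda c: similarity(query_vec, c["centroid"]),
        reverse=True,
    )
    chunk_ids = []
    for cluster in ranked[:top_clusters]:
        chunk_ids.extend(cluster["chunk_ids"])  # whole cluster is surfaced
    return chunk_ids

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
clusters = [
    {"centroid": [1.0, 0.0], "chunk_ids": ["a", "b"]},
    {"centroid": [0.0, 1.0], "chunk_ids": ["c"]},
]
ids = cluster_retrieve([0.0, 1.0], clusters, dot)
```

Deliberately returning unranked cluster members is what prevents the ensemble from counting semantic similarity twice: the cluster signal says "look here", and FAISS/BM25 say "this sentence".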
The Symbolic Retriever operates over a directory hierarchy knowledge graph (folder → file → section → chunk) and is intentionally not a primary relevance scorer. Its core role is search-space restriction based on query intent, as decided by the planner.
Unlike dense (FAISS) or sparse (BM25) retrievers, the symbolic retriever:
- Does not attempt to rank chunks by semantic relevance
- Acts as a structural filter that reduces the candidate set
- Encodes human-authored organization (folders, files, extensions, section titles)
This mirrors how experts navigate large codebases or document collections: "first narrow the scope, then search deeply".
The symbolic knowledge graph is a MultiDiGraph with the following hierarchy:
Folder → File → Section → Chunk
- Folder nodes encode directory structure
- File nodes encode filenames, extensions, normalized paths
- Section nodes encode semantic headings / document structure
- Chunk nodes store the final retrievable content
The planner may infer query intent such as:
- Target folder(s)
- Specific file names
- File extensions (e.g. `.py`, `.md`)
These constraints are passed to the symbolic retriever, which:
- Identifies candidate files using graph indices
- Collects all descendant section + chunk nodes
- Returns a restricted chunk set to downstream retrievers
This filtered candidate pool is then consumed by:
- BM25 (lexical precision)
- FAISS (dense proximity)
Both retrievers operate only within this planner-approved subset.
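The scope-restriction step can be sketched over a toy hierarchy. A nested dict stands in for the real MultiDiGraph here, and the folder, file, and chunk names are invented for illustration:

```python
# Toy symbolic scope restriction: walk folder -> file -> section -> chunk
# and keep only chunks under files matching the planner's constraints.
# All names in this sketch are hypothetical.

def restrict_scope(hierarchy, extensions=None, file_names=None):
    candidate_chunks = []
    for folder, files in hierarchy.items():
        for file_name, sections in files.items():
            if extensions and not any(file_name.endswith(ext) for ext in extensions):
                continue  # extension constraint from the planner
            if file_names and file_name not in file_names:
                continue  # explicit file-name constraint
            for section, chunk_ids in sections.items():
                candidate_chunks.extend(chunk_ids)  # all descendant chunks
    return candidate_chunks

hierarchy = {
    "src": {
        "main.py": {"Overview": ["c1"], "Usage": ["c2"]},
        "notes.md": {"Intro": ["c3"]},
    },
}
py_chunks = restrict_scope(hierarchy, extensions=[".py"])
```

The returned chunk ids would then define the candidate pool that BM25 and FAISS score, which is how a dense search over unrelated folders is avoided entirely.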
Semantic scoring inside the symbolic retriever is lightweight and structural:
- Uses section titles, not chunk text
- Helps rank chunks within the already-filtered space
- Never competes with FAISS or BM25 as a primary scorer
This separation avoids:
- Global dense search over irrelevant folders
- Semantic score dilution across unrelated domains
- Redundant computation already handled by FAISS
In this architecture:
- Symbolic retrieval = intent-aligned filtering
- Cluster retrieval = concept-level recall
- FAISS / BM25 = fine-grained relevance
This makes retrieval:
- Interpretable
- Planner-controllable
- Efficient at scale
- Aligned with human information-seeking behavior
The symbolic retriever encodes explicit human structure:
- Section titles
- Document hierarchy
- File and folder organization
Symbolic signals are treated as high-precision but low-recall. They are never assumed to be semantically complete and are always combined with semantic retrievers under planner control.
NSHG‑RAG is intentionally open‑ended. Planned and exploratory directions include:
- Temporal and versioned knowledge
- Retrieval‑answer consistency verification
- Contradiction detection across evidence
- Confidence estimation and abstention
- Learned retrieval policies
- Adversarial and injection‑resistant retrieval
NSHG‑RAG is built for:
- Researchers studying RAG reliability
- Engineers designing knowledge systems
- Practitioners moving beyond demo‑level RAG
If your goal is a chatbot, this is overkill. If your goal is trustworthy knowledge synthesis, this is the right level of abstraction.
MIT License
NSHG‑RAG treats RAG not as retrieval‑augmented text generation, but as externalized cognition — where memory, control, and reasoning are explicit, inspectable, and improvable.