Production-grade GraphRAG pipeline for building, improving, evaluating, and documenting knowledge graphs over large engineering and technical PDFs.
This repository implements a deterministic + LLM-assisted GraphRAG system with ontology induction, community-aware retrieval, evaluation, and automated documentation generation.
- Extracts entities and relations from engineering PDFs
- Builds a materialized knowledge graph (NetworkX)
- Learns ontology types and type hierarchies from data
- Detects semantic communities in the graph
- Prevents semantic drift using community-aware retrieval
- Supports hybrid retrieval (Graph + Vector / FAISS)
- Runs targeted LLM improve cycles for factual gaps
- Evaluates multiple RAG strategies
- Generates DOCX and PDF technical documentation automatically
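
As a rough illustration of how community-aware retrieval can limit semantic drift, the sketch below ranks the graph neighbours of a seed entity by a vector-similarity score and discards any neighbour that belongs to a different community. It uses plain dictionaries rather than NetworkX or FAISS so that it runs with the standard library alone. The function name `community_aware_retrieve`, the data layout, and the sample entities are assumptions made for this example, not the repository's actual API.

```python
from collections import defaultdict


def community_aware_retrieve(seed, edges, node_to_community, vector_scores, top_k=3):
    """Rank graph neighbours of `seed`, keeping only nodes in the seed's community.

    edges: iterable of (source, target) pairs, treated as undirected here.
    node_to_community: dict mapping node -> community id (hypothetical layout).
    vector_scores: dict mapping node -> similarity score from a vector index.
    """
    # Build an adjacency map from the edge list.
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seed_community = node_to_community.get(seed)
    candidates = []
    for node in neighbours[seed]:
        # Community boundary: skip neighbours that belong to another community.
        if node_to_community.get(node) != seed_community:
            continue
        candidates.append((vector_scores.get(node, 0.0), node))

    # Highest vector score first; ties broken by node name for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [node for _, node in candidates[:top_k]]


# Toy data: "invoice" is adjacent to "pump" and scores highest on vector
# similarity, but it sits in a different community.
edges = [("pump", "impeller"), ("pump", "seal"), ("pump", "invoice"), ("seal", "gasket")]
node_to_community = {"pump": 0, "impeller": 0, "seal": 0, "gasket": 0, "invoice": 1}
vector_scores = {"impeller": 0.62, "seal": 0.81, "invoice": 0.95}

print(community_aware_retrieve("pump", edges, node_to_community, vector_scores))
# → ['seal', 'impeller']
```

In this toy run the off-topic `invoice` node is excluded despite having the highest similarity score, which is the behaviour the community boundary is meant to enforce.
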
- PDF ingestion and chunking
- Entity & relation extraction
- Graph construction (`graph.pkl`)
- Ontology induction (`type_registry.json`, `type_hierarchy.json`)
- Community detection (`communities.json`)
- Hybrid retrieval (Graph + FAISS)
- Improve cycle (LLM-assisted, checkpointed)
- Strategy evaluation
- Automated documentation generation
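
One way a checkpointed improve cycle could work is sketched below: each factual gap is processed once, and progress is written to disk after every item so that an interrupted run can resume without repeating LLM calls. The names `run_improve_cycle`, `fill_gap`, and `improve_checkpoint.json` are hypothetical, and the LLM call is replaced by a stub; the repository's real checkpoint format may differ.

```python
import json
import tempfile
from pathlib import Path


def run_improve_cycle(gaps, fill_gap, checkpoint_path):
    """Process each gap once, persisting progress after every item."""
    checkpoint_path = Path(checkpoint_path)
    # Resume from a previous run if a checkpoint file already exists.
    done = json.loads(checkpoint_path.read_text()) if checkpoint_path.exists() else {}
    for gap in gaps:
        if gap in done:
            continue  # already handled in an earlier run
        done[gap] = fill_gap(gap)
        # Write after every gap so an interrupted run loses at most one item.
        checkpoint_path.write_text(json.dumps(done, indent=2))
    return done


# Stub standing in for an LLM call; it records which gaps were actually sent.
calls = []


def fake_fill(gap):
    calls.append(gap)
    return f"answer for {gap}"


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "improve_checkpoint.json"  # hypothetical file name
    run_improve_cycle(["gap-1", "gap-2"], fake_fill, ckpt)
    # Second run: only the new gap triggers a call.
    result = run_improve_cycle(["gap-1", "gap-2", "gap-3"], fake_fill, ckpt)

print(calls)
# → ['gap-1', 'gap-2', 'gap-3']
```
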
| Artifact | Description |
|---|---|
| `index/graph.pkl` | Materialized knowledge graph |
| `index/type_registry.json` | Learned ontology classes |
| `index/type_hierarchy.json` | Inferred type hierarchy |
| `index/communities.json` | Graph communities |
| `index/faiss.index` | Vector index |
| `index/strategy_comparison.json` | Strategy evaluation |
| `index/missed_knowledge.json` | Improve-cycle gaps |
| `documentation/GraphRAG_Documentation.docx` | Auto-generated doc |
| `documentation/GraphRAG_Documentation.pdf` | PDF (LibreOffice) |
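
A quick way to confirm that a pipeline run produced everything listed above is to check each path on disk. The sketch below does only that; the helper name `missing_artifacts` is an assumption for this example, and the paths are copied from the artifact list. It does not open or validate the files, since their internal schemas are not described here.

```python
from pathlib import Path

# Paths taken from the artifact list above, relative to the repository root.
EXPECTED_ARTIFACTS = [
    "index/graph.pkl",
    "index/type_registry.json",
    "index/type_hierarchy.json",
    "index/communities.json",
    "index/faiss.index",
    "index/strategy_comparison.json",
    "index/missed_knowledge.json",
    "documentation/GraphRAG_Documentation.docx",
    "documentation/GraphRAG_Documentation.pdf",
]


def missing_artifacts(root="."):
    """Return the expected artifact paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(".")
    if missing:
        print("Missing artifacts:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected artifacts are present.")
```
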
Technical documentation is generated automatically using:

    python scripts/generate_documentation.py

Outputs:
- DOCX (always)
- PDF (via LibreOffice headless)
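
For reference, a DOCX-to-PDF conversion with headless LibreOffice can be sketched as follows. The `--headless`, `--convert-to`, and `--outdir` options are standard LibreOffice command-line flags; the helper names and the fallback of returning `None` when LibreOffice is not installed are assumptions for this example and may not match what `scripts/generate_documentation.py` actually does.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Build the LibreOffice command line for a headless DOCX -> PDF conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_pdf(docx_path, out_dir):
    """Convert a DOCX file to PDF; return the PDF path, or None without LibreOffice."""
    # The executable is usually called `soffice`, sometimes `libreoffice`.
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        return None  # assumed fallback: the caller keeps the DOCX only
    subprocess.run(build_convert_command(docx_path, out_dir, binary), check=True)
    # LibreOffice writes <input stem>.pdf into the output directory.
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")


# Show the command without running it.
command = build_convert_command(
    "documentation/GraphRAG_Documentation.docx", "documentation"
)
print(" ".join(command))
```

Separating command construction from execution keeps the conversion step easy to inspect and test on machines where LibreOffice is not available.
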
- Deterministic structure before LLMs
- LLMs used only for factual completion
- Community boundaries prevent semantic drift
- Explainability and reproducibility over recall
- Core GraphRAG engine: ✅ COMPLETE
- Ontology & communities: ✅ COMPLETE
- Evaluation & documentation: ✅ COMPLETE
- Operationalization: 🚧 IN PROGRESS