Prophet is a modular framework for building and experimenting with knowledge graph-based RAG (Retrieval-Augmented Generation) systems. It enables structured knowledge extraction, retrieval, and reasoning using graph-based techniques.
Bodhi is a dedicated LLM-based knowledge graph extraction pipeline for constructing knowledge graphs from unstructured documents. It supports various formats, including:
- PDFs
- Text files (.txt, .md, .csv, etc.)
Bodhi extracts entities and relationships and constructs knowledge graphs in NetworkX format, which serve as the foundation for downstream retrieval and inference.
Odysseus is a plugin-based graph retrieval engine designed for preprocessing and indexing knowledge graphs. It currently supports:
- GraphRAG: Global Search - Summarizes entire knowledge graphs for broad query coverage.
- GraphRAG: Local Search - Focuses on retrieving highly relevant graph substructures for detailed analysis.
Odysseus precomputes retrieval structures, optimizing runtime efficiency for downstream queries.
Alchemist serves as the dynamic counterpart to Odysseus, executing retrieval queries in real time. Like Odysseus, it supports:
- Global Search - Retrieves broad contextual information.
- Local Search - Extracts specific, high-relevance subgraphs.
Alchemist ensures flexible and adaptive retrieval, integrating various retrieval strategies to enhance response accuracy.
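To make the idea of extracting a high-relevance subgraph concrete, here is a minimal sketch that expands a few seed entities to their k-hop neighborhood. It is a simplified stand-in, not GraphRAG's Local Search and not Alchemist's actual implementation; the function name `k_hop_subgraph`, the plain-dict adjacency format, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def k_hop_subgraph(adjacency: dict[str, set[str]], seeds: list[str], k: int) -> set[str]:
    """Return every node within k hops of any seed (breadth-first expansion).

    Hypothetical helper: a real local search would also rank nodes by
    relevance to the query instead of keeping the whole neighborhood.
    """
    # Ignore seeds that are not present in the graph.
    visited = {s for s in seeds if s in adjacency}
    frontier = deque((s, 0) for s in visited)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # Do not expand beyond k hops.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited


# Toy undirected graph written as an adjacency mapping (illustrative data only).
toy_graph = {
    "checkpointer": {"state"},
    "state": {"checkpointer", "graph"},
    "graph": {"state", "node"},
    "node": {"graph"},
}

one_hop = k_hop_subgraph(toy_graph, ["checkpointer"], 1)
two_hop = k_hop_subgraph(toy_graph, ["checkpointer"], 2)
```

A smaller `k` keeps the retrieved context focused, while a larger `k` moves toward the broad coverage that a global search aims for.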
Sanchayam is a plugin-based storage manager enabling seamless integration of both object storage and file system-based storage solutions. It acts as the central data store for Prophet, ensuring efficient access and persistence of extracted knowledge graphs and retrieval artifacts.
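A plugin-based storage manager of this kind is often organized as a small interface with interchangeable backends. The sketch below shows one possible shape under that assumption; `StorageBackend`, `LocalFileBackend`, `BACKENDS`, and `make_backend` are hypothetical names and are not taken from Sanchayam's code.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical plugin interface; not Sanchayam's actual API."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...


class LocalFileBackend(StorageBackend):
    """Stores each artifact as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.root, key)

    def save(self, key: str, data: bytes) -> None:
        path = self._path(key)
        # Create intermediate directories for keys such as "graphs/demo.bin".
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def load(self, key: str) -> bytes:
        with open(self._path(key), "rb") as f:
            return f.read()


# Registry of available backends; an object-storage plugin would register here.
BACKENDS = {"local": LocalFileBackend}


def make_backend(kind: str, root: str) -> StorageBackend:
    """Look up a backend class by name and instantiate it."""
    return BACKENDS[kind](root)


# Usage: write an artifact to a temporary directory and read it back.
demo_root = tempfile.mkdtemp()
backend = make_backend("local", demo_root)
backend.save("graphs/demo.bin", b"serialized graph bytes")
restored = backend.load("graphs/demo.bin")
```

Keeping callers dependent only on `save` and `load` means a future object storage backend could be swapped in without changing the rest of the pipeline.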
This project is built with Anaconda. To replicate the conda environment, run the following command (an Anaconda installation is required):

    conda env create -f environment.yml

After activating the Conda environment, install the latest PyTorch with CUDA 12.4 manually:

    pip install torch --index-url https://download.pytorch.org/whl/cu124

As of now, the config.yml file allows the following configurations:
- LLM to be used in the pipeline
- Maximum size limit for each text unit extracted from the source document
- Storage backend: Prophet supports both object storage and file system storage.
  - Defaults to local file system storage, which uses Python's os library in the backend.
  - The object storage backend is under development.
- Storage directory: Custom storage directory path for saving the pipeline artifacts
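The maximum text unit size listed above controls how a source document is split before extraction. As a rough illustration, the sketch below greedily packs words into units that never exceed a limit. It assumes the limit is measured in characters, which the configuration does not confirm (it could equally be tokens), and `split_into_text_units` is a hypothetical name, not a function from Bodhi.

```python
def split_into_text_units(text: str, max_chars: int) -> list[str]:
    """Greedily pack whitespace-separated words into units of at most max_chars.

    Assumption: the size limit is counted in characters, not tokens.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    units: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split into pieces.
        while len(word) > max_chars:
            if current:
                units.append(current)
                current = ""
            units.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The unit is full: emit it and start a new one with this word.
            units.append(current)
            current = word
    if current:
        units.append(current)
    return units


units = split_into_text_units("the quick brown fox jumps", max_chars=10)
```

Splitting on word boundaries avoids cutting entity names in half, which matters when each unit is later passed to an LLM for entity and relationship extraction.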
- To prepare a knowledge graph from new documents, store the document in the infra/data/ directory.
- Update the main section of Prophet.py with the new file name, for example:

    init_state = {"source_path":"<langgraph_application_structure.md>"} # Source name with extension
    ...
    odysseus = Odysseus(sources=["langgraph_application_structure"]) # Source name without extension

- Then run Prophet.py:

    python Prophet.py

- This prepares the knowledge graph and the related vector databases for the given document. Once the static processing pipeline has finished, a ZeroMQ-based server starts and listens on port 5555. The Alchemist engine connects to this server at runtime.
- Make sure the Odysseus server is up.
- Update the Alchemist/engine.py file with the query to be answered:

    ...
    state = {"query":"Explain concept of checkpointers?"}
    ...

- Then run the Alchemist engine (it should be run as a Python module):

    python -m Alchemist.engine

- Support for multiple retrieval strategies beyond GraphRAG
- MinIO storage backend integrations within Sanchayam
- Implement additional knowledge graph extraction techniques
Prophet is open source and licensed under the MIT License.