Undertaker-Ai is a sophisticated interface designed to interact with a Microsoft GraphRAG knowledge index. This project is specifically configured to analyze and explore the narrative depths of the light novel series "86 - Eighty Six". By leveraging Graph Retrieval-Augmented Generation (GraphRAG), the application enables users to perform complex query reasoning over structured data extracted from unstructured text.
The application supports two distinct modes of inquiry to analyze the knowledge base:
- Global Search (Map-Reduce):
  - Designed for broad questions that require aggregating information from across the entire dataset.
  - Mechanism: Uses a map-reduce approach to query community summaries, generating a comprehensive answer that synthesizes themes and widespread facts.
  - Use Case: "What are the major themes of the Eighty Six series?" or "How does the war affect the San Magnolia Republic?"
- Local Search (Neighbourhood):
  - Optimized for specific questions about distinct entities (characters, locations, organizations).
  - Mechanism: Navigates to a specific entity's node and explores its immediate neighbors (connected relationships and text units) to provide granular details.
  - Use Case: "Who is Shinei Nouzen?" or "Describe the Juggernaut mecha."
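
The difference between the two modes can be illustrated with a toy sketch in plain Python. It is not GraphRAG's implementation: in the real pipeline the map and reduce steps are LLM calls over community reports, whereas here simple keyword counting stands in for them. The names `SUMMARIES`, `EDGES`, `global_search`, and `local_search` are hypothetical and exist only for this example.

```python
# Toy stand-ins for the index; the real artifacts live in output/ as Parquet files.
SUMMARIES = {
    "c0": "Shinei Nouzen leads the Spearhead Squadron.",
    "c1": "The Spearhead Squadron pilots Juggernaut units.",
    "c2": "The San Magnolia Republic is at war.",
}
EDGES = [
    ("Shinei Nouzen", "Juggernaut", "pilots"),
    ("Shinei Nouzen", "Spearhead Squadron", "leads"),
    ("Spearhead Squadron", "San Magnolia Republic", "serves"),
]


def global_search(query_terms, summaries):
    """Map: score every community summary. Reduce: merge the scored hits."""
    partials = []
    for text in summaries.values():
        # Keyword counting is an assumption standing in for an LLM rating.
        score = sum(term.lower() in text.lower() for term in query_terms)
        if score > 0:
            partials.append((score, text))
    # Reduce step: strongest partial answers first, then join them.
    partials.sort(key=lambda item: item[0], reverse=True)
    return " ".join(text for _, text in partials)


def local_search(entity, edges):
    """Return the entity's direct neighbours with the connecting relation."""
    neighbours = []
    for source, target, relation in edges:
        if source == entity:
            neighbours.append((target, relation))
        elif target == entity:
            neighbours.append((source, relation))
    return sorted(neighbours)


print(global_search(["Spearhead", "Juggernaut"], SUMMARIES))
print(local_search("Shinei Nouzen", EDGES))
```

Global search touches every community and aggregates, while local search starts from one node and looks only one hop out, which is why the first suits thematic questions and the second suits entity questions.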
Visualize the underlying data structure using PyVis:
- Dynamic filtering: Adjust the minimal edge weight to filter out weak connections and focus on strong relationships.
- Node limitation: Control the maximum number of nodes displayed to prevent visual clutter and ensure performance.
- Physics engine: Nodes automatically arrange themselves using a force-directed layout for optimal readability.
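
The two filters described above can be sketched in a few lines of dependency-free Python. This is an illustration under assumptions, not the code in app.py: the function name `filter_graph`, the `(source, target, weight)` edge tuples, and the choice to rank nodes by the total weight of their surviving edges are all hypothetical.

```python
def filter_graph(edges, min_weight, max_nodes):
    """Drop weak edges, then keep only the max_nodes best-connected nodes."""
    # Dynamic filtering: discard edges below the minimum weight.
    strong = [edge for edge in edges if edge[2] >= min_weight]

    # Rank nodes by the summed weight of their remaining edges (an assumed
    # heuristic; the application may rank nodes differently).
    strength = {}
    for source, target, weight in strong:
        strength[source] = strength.get(source, 0) + weight
        strength[target] = strength.get(target, 0) + weight

    # Node limitation: keep the top-ranked nodes, breaking ties by name.
    ranked = sorted(strength, key=lambda node: (-strength[node], node))
    keep = set(ranked[:max_nodes])
    return [edge for edge in strong if edge[0] in keep and edge[1] in keep]


edges = [("A", "B", 5), ("B", "C", 3), ("C", "D", 1), ("A", "C", 4)]
print(filter_graph(edges, min_weight=2, max_nodes=3))
```

The filtered edge list could then be handed to a PyVis network for rendering. Filtering before rendering keeps the force-directed layout responsive on large indexes.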
Every answer generated by the system includes:
- Context Data: The specific text chunks and community reports used by the LLM.
- Traceability: Allows users to verify the information against the source material.
This project is built on Python 3.10+ and integrates several powerful libraries.
- graphrag: Microsoft's library for structured GraphRAG pipelines.
- streamlit: The web framework powering the user interface.
- pandas: For efficient data manipulation of entities and relationships.
- networkx & pyvis: For graph modeling and interactive rendering.
The project uses a strict configuration file to manage the GraphRAG pipeline. Key settings include:
- LLM & Embeddings: Configured to use OpenAI-compatible endpoints (e.g., OpenRouter).
  - default_chat_model: Handles answer generation and graph extraction.
  - default_embedding_model: Generates vector embeddings for text units (text-embedding-3-small).
- Data Ingestion:
  - Input: Text files located in input/.
  - Chunking: Text is split into 1200-token chunks with 100-token overlap to maintain context.
- Storage: Uses lancedb for vector storage and the local file system for artifacts.
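
To make the chunking setting concrete, here is a minimal sliding-window sketch using the 1200-token size and 100-token overlap mentioned above. It is an assumption-laden illustration rather than GraphRAG's own chunker: `chunk_tokens` is a hypothetical helper, and it operates on an already tokenized list instead of calling a real tokenizer.

```python
def chunk_tokens(tokens, size=1200, overlap=100):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each window starts this many tokens after the last
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Integers stand in for token ids from a real tokenizer.
tokens = list(range(2500))
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

With these defaults each new chunk begins 1100 tokens after the previous one, so the last 100 tokens of a chunk reappear at the start of the next and sentences that straddle a boundary are seen whole in at least one chunk.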
- Python 3.10 or higher.
- Git.
- An API Key for an OpenAI-compatible LLM provider (e.g., OpenRouter, OpenAI).
git clone https://github.com/PhucHuwu/Undertaker-Ai.git
cd Undertaker-Ai
pip install -r requirements.txt

Create a .env file in the root directory to store your credentials. This avoids hardcoding sensitive keys in settings.yaml.
# .env file
GRAPHRAG_API_KEY=your_actual_api_key
GRAPHRAG_CHAT_MODEL=your_preferred_model_name

If you have raw text files in the input/ folder but no index in output/, you must run the indexing pipeline first:
python -m graphrag.index --root .

This process extracts entities, relationships, and communities/claims, which can take time depending on the dataset size.
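
Before indexing or launching the app, a small preflight check can confirm that the pieces described above are in place. The sketch below is not part of the repository: `preflight` and `REQUIRED_VARS` are hypothetical names, and it assumes the two variables from the .env example have already been loaded into the process environment.

```python
import os
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_VARS = ("GRAPHRAG_API_KEY", "GRAPHRAG_CHAT_MODEL")


def preflight(root, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    root = Path(root)
    problems = []

    # Credentials: both variables must be present and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"missing environment variable: {name}")

    # Source material: at least one .txt file in input/.
    input_dir = root / "input"
    if not input_dir.is_dir() or not any(input_dir.glob("*.txt")):
        problems.append("no .txt files found in input/")

    # Index artifacts: output/ must exist and contain something.
    output_dir = root / "output"
    if not output_dir.is_dir() or not any(output_dir.iterdir()):
        problems.append("output/ is missing or empty; run the indexing pipeline")

    return problems


if __name__ == "__main__":
    for problem in preflight("."):
        print(problem)
```

Running it from the project root prints one line per missing prerequisite and nothing when the project is ready.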
Start the Streamlit interface:
streamlit run app.py

- Dashboard: The sidebar shows the status of the index loading.
- Chat Interface: Select "Global" or "Local" search, type your query, and view the AI-generated response along with context.
- Visualization: Switch tabs to view the node-link diagram of the characters and events.
- "Output directory not found":
  - Cause: The GraphRAG indexing pipeline has not been run or completed successfully.
  - Solution: Run the indexing command mentioned in the "Data Indexing" section.
- API Errors / Authentication Failures:
  - Cause: Incorrect API Key or Model Name in .env.
  - Solution: Verify your .env file matches the variable names expected by settings.yaml. Check your API provider's dashboard for quota limits.
- Graph Visualization is Empty:
  - Cause: The "Minimum Edge Weight" filter might be too high.
  - Solution: Lower the slider in the visualization tab to reveal weaker connections.
- app.py: The main application entry point.
- settings.yaml: The master configuration file for the GraphRAG pipeline.
- input/: Directory for raw source text files (*.txt).
- output/: Directory where the indexed artifacts (Parquet files, LanceDB) are stored.
- prompts/: Custom prompt templates used to guide the LLM during extraction and search.
- cache/: Local cache to speed up subsequent runs and reduce API costs.
