
Undertaker-Ai

Undertaker-Ai is an interface for querying a Microsoft GraphRAG knowledge index. The project is configured to analyze and explore the narrative of the light novel series "86 - Eighty Six". By leveraging Graph Retrieval-Augmented Generation (GraphRAG), the application lets users run complex, multi-hop queries over structured data extracted from unstructured text.

Graph Preview

Key Features

1. Advanced Search Capabilities

The application supports two distinct modes of inquiry to analyze the knowledge base:

  • Global Search (Map-Reduce):

    • Designed for broad questions that require aggregating information from across the entire dataset.
    • Mechanism: Uses a map-reduce approach to query community summaries, generating a comprehensive answer that synthesizes themes and widespread facts.
    • Use Case: "What are the major themes of the Eighty Six series?" or "How does the war affect the San Magnolia Republic?"
  • Local Search (Neighborhood):

    • Optimized for specific questions about distinct entities (characters, locations, organizations).
    • Mechanism: Navigates to a specific entity's node and explores its immediate neighbors (connected relationships and text units) to provide granular details.
    • Use Case: "Who is Shinei Nouzen?" or "Describe the Juggernaut mecha."
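The difference between the two modes can be illustrated with a toy sketch. This is plain Python for illustration only, not graphrag's actual internals; the summaries, graph, and keyword-overlap scoring below are all made up:

```python
# Toy illustration of the two search strategies (not graphrag's real code).

def _words(text):
    """Lowercase word set with trailing punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}

def global_search(question, community_summaries):
    # Map: score each community summary for relevance (here: keyword overlap).
    keywords = _words(question)
    mapped = [(s, len(keywords & _words(s))) for s in community_summaries]
    # Reduce: keep relevant summaries and synthesize one answer (here: join them).
    relevant = [s for s, score in mapped if score > 1]
    return " ".join(relevant)

def local_search(entity, graph):
    # Start at one entity's node and return its immediate neighborhood.
    return {entity: graph.get(entity, [])}

summaries = ["The war shapes every community.", "Cooking recipes of the realm."]
graph = {"Shinei Nouzen": ["Spearhead Squadron", "Juggernaut"]}

print(global_search("How does the war affect communities?", summaries))
print(local_search("Shinei Nouzen", graph))
```

In the real pipeline, the "map" step asks the LLM to rate each community report's relevance and the "reduce" step asks it to synthesize the top-rated partial answers; local search likewise ranks neighboring entities, relationships, and text units by semantic similarity rather than keyword overlap.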

2. Interactive Knowledge Graph

Visualize the underlying data structure using PyVis:

  • Dynamic filtering: Adjust the minimum edge weight to filter out weak connections and focus on strong relationships.
  • Node limitation: Control the maximum number of nodes displayed to prevent visual clutter and ensure performance.
  • Physics engine: Nodes automatically arrange themselves using a force-directed layout for optimal readability.
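The filtering behind the first two controls amounts to something like the following sketch (plain Python for illustration; the app itself renders with networkx and PyVis, and these edge tuples are hypothetical):

```python
# Illustrative graph-filtering sketch (the real app renders via networkx + PyVis).
edges = [
    ("Shinei Nouzen", "Spearhead Squadron", 9.0),
    ("Shinei Nouzen", "Juggernaut", 7.5),
    ("San Magnolia", "Eighty-Six District", 3.0),
    ("Minor Character", "Background Event", 0.5),
]

def filter_graph(edges, min_weight=1.0, max_nodes=50):
    # Drop weak connections below the minimum edge weight.
    strong = [(u, v, w) for u, v, w in edges if w >= min_weight]
    # Cap the number of distinct nodes, keeping the heaviest edges first.
    strong.sort(key=lambda e: e[2], reverse=True)
    kept, nodes = [], set()
    for u, v, w in strong:
        if len(nodes | {u, v}) <= max_nodes:
            kept.append((u, v, w))
            nodes |= {u, v}
    return kept

print(filter_graph(edges, min_weight=1.0, max_nodes=4))
```

Sorting by weight before applying the node cap means that when the cap is hit, it is the weakest relationships that drop out first, which matches the "focus on strong relationships" behavior described above.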

3. Transparent Sourcing

Every answer generated by the system includes:

  • Context Data: The specific text chunks and community reports used by the LLM.
  • Traceability: Allows users to verify the information against the source material.

Technical Architecture & Configuration

This project is built on Python 3.10+ and integrates several powerful libraries.

Core Dependencies

  • graphrag: Microsoft's library for structured GraphRAG pipelines.
  • streamlit: The web framework powering the user interface.
  • pandas: For efficient data manipulation of entities and relationships.
  • networkx & pyvis: For graph modeling and interactive rendering.

Configuration (settings.yaml)

The project uses a single configuration file, settings.yaml, to manage the GraphRAG pipeline. Key settings include:

  • LLM & Embeddings: Configured to use OpenAI-compatible endpoints (e.g., OpenRouter).
    • default_chat_model: Handles answer generation and graph extraction.
    • default_embedding_model: Generates vector embeddings for text units (text-embedding-3-small).
  • Data Ingestion:
    • Input: Text files located in input/.
    • Chunking: Text is split into 1200-token chunks with 100-token overlap to maintain context.
  • Storage: Uses LanceDB for vector storage and the local file system for artifacts.
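An illustrative excerpt of what such a configuration might look like is shown below. This is a hedged sketch, not the repository's exact file; key names vary between graphrag versions, so check the actual settings.yaml in the repository:

```yaml
# Illustrative settings.yaml excerpt -- key names vary by graphrag version.
models:
  default_chat_model:
    type: openai_chat
    api_key: ${GRAPHRAG_API_KEY}       # resolved from .env
    model: ${GRAPHRAG_CHAT_MODEL}
  default_embedding_model:
    type: openai_embedding
    api_key: ${GRAPHRAG_API_KEY}
    model: text-embedding-3-small

input:
  file_type: text
  base_dir: input

chunks:
  size: 1200        # tokens per chunk
  overlap: 100      # token overlap to preserve context across boundaries

vector_store:
  default_vector_store:
    type: lancedb
```

The ${...} placeholders are expanded from environment variables, which is how the .env file described under "Installation & Setup" keeps credentials out of the configuration file.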

Installation & Setup

Prerequisites

  • Python 3.10 or higher.
  • Git.
  • An API Key for an OpenAI-compatible LLM provider (e.g., OpenRouter, OpenAI).

Step 1: Clone the Repository

git clone https://github.com/PhucHuwu/Undertaker-Ai.git
cd Undertaker-Ai

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3: Configure Environment

Create a .env file in the root directory to store your credentials. This avoids hardcoding sensitive keys in settings.yaml.

# .env file
GRAPHRAG_API_KEY=your_actual_api_key
GRAPHRAG_CHAT_MODEL=your_preferred_model_name
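A minimal sketch of how an application can pick these values up at runtime is shown below. This is a stdlib-only illustration; real projects typically use the python-dotenv package instead, and graphrag's own CLI reads a .env file from the project root automatically:

```python
import os

# Minimal .env loader sketch (real apps usually use the python-dotenv package).
def load_env(path=".env"):
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; split KEY=VALUE on the first "=".
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                # setdefault: values already in the environment win.
                os.environ.setdefault(key.strip(), value.strip())

# After loading, the pipeline can read its credentials, e.g.:
# api_key = os.environ["GRAPHRAG_API_KEY"]
```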

Operations Guide

1. Data Indexing (if needed)

If you have raw text files in the input/ folder but no index in output/, you must run the indexing pipeline first:

python -m graphrag.index --root .

This process extracts entities, relationships, communities, and claims; it can take considerable time depending on the dataset size. (Newer graphrag releases expose the same pipeline as "graphrag index --root .".)

2. Launching the App

Start the Streamlit interface:

streamlit run app.py

3. Usage

  • Dashboard: The sidebar shows the status of the index loading.
  • Chat Interface: Select "Global" or "Local" search, type your query, and view the AI-generated response along with context.
  • Visualization: Switch tabs to view the node-link diagram of the characters and events.

Troubleshooting

Common Issues

  • "Output directory not found":

    • Cause: The GraphRAG indexing pipeline has not been run, or did not complete successfully.
    • Solution: Run the indexing command mentioned in the "Data Indexing" section.
  • API Errors / Authentication Failures:

    • Cause: Incorrect API Key or Model Name in .env.
    • Solution: Verify your .env file matches the variable names expected by settings.yaml. Check your API provider's dashboard for quota limits.
  • Graph Visualization is Empty:

    • Cause: The "Minimum Edge Weight" filter might be too high.
    • Solution: Lower the slider in the visualization tab to reveal weaker connections.

Directory Structure

  • app.py: The main application entry point.
  • settings.yaml: The master configuration file for the GraphRAG pipeline.
  • input/: Directory for raw source text files (*.txt).
  • output/: Directory where the indexed artifacts (Parquet files, LanceDB) are stored.
  • prompts/: Custom prompt templates used to guide the LLM during extraction and search.
  • cache/: Local cache to speed up subsequent runs and reduce API costs.
