🌐 Language

中文 | 📘 English

「Kemeng」 Domain Chat Assistant Based on Knowledge Graph and Corpus

📝 Project Introduction

Pokémon is one of the most influential media franchises worldwide, with a large fictional universe and extensive character data. Decades of games, animation, trading cards, and films have produced a highly structured body of knowledge, which makes it well suited to knowledge graph modeling and intelligent Q&A.

Advances in LLMs and knowledge-enhanced techniques now make it feasible to build a multimodal, structured, and interactive AI system on top of the Pokémon universe. This project builds a Pokémon knowledge graph from Baidu Tieba and Wikipedia data, covering characters, attributes (types), skills (moves), regions, evolution paths, and more. Combined with LLM capabilities, the result is a Pokémon-domain smart assistant named "Kemeng."

The system integrates LangGraph pipeline orchestration, GraphRAG-enhanced retrieval, and graph visualization, so users can get accurate answers to natural-language queries and explore the Pokémon world visually. It also supports geographic mapping, linking locations in the Pokémon world to real-world coordinates for spatial visualization.

This project is designed to be a transferable, scalable domain assistant template, making it easy to adapt for other characters or fields (e.g., Su Shi, finance, e-government) by simply changing the knowledge source and graph structure.

🚀 New Features

LangChain & LangGraph: Supports LangChain 1.x and LangGraph 1.0 for multi-agent orchestration
LightRAG Integration: Integrated with HKUDS LightRAG for efficient retrieval
Advanced RAG: Features Self-RAG, CRAG, HyDE, and Query Decomposition
Agentic Memory: Long-term memory with user preference adaptation
MCP Service: Supports Model Context Protocol for real-world location mapping
Performance: Built-in Semantic Cache and Speculative RAG for speed
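
As a rough illustration of the semantic cache idea listed above, the sketch below returns a stored answer when a new question is similar enough to one already answered. It is not the project's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `SemanticCache` and the `0.8` threshold are hypothetical names and values chosen for the example.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: reuse an answer when a query is close to a stored one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed cut-off, tune for a real embedding model
        self.entries = []           # list of (query vector, answer) pairs

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

    def get(self, query: str) -> Optional[str]:
        query_vec = embed(query)
        best_score, best_answer = 0.0, None
        for stored_vec, answer in self.entries:
            score = cosine(query_vec, stored_vec)
            if score > best_score:
                best_score, best_answer = score, answer
        # Only reuse the answer when the closest stored query is similar enough.
        return best_answer if best_score >= self.threshold else None
```

A caller would check `cache.get(question)` before invoking the LLM and call `cache.put(question, answer)` afterwards. A linear scan is fine for a sketch; a production cache would more likely query a vector index.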

🎯 System Architecture

The project includes a complete Vue 3 + FastAPI stack and a working Q&A system built on the Pokémon knowledge graph. It combines semantic modeling (BERT + TF-IDF + rule matching) with generative Q&A, and supports questions about evolution, type matchups (attribute restraints), skills, and geographic distribution.
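
The rule-matching part of the semantic modeling step can be pictured as a dictionary lookup over the question text. The sketch below is only an illustration under assumptions: `ENTITY_LEXICON`, its labels, and `match_entities` are hypothetical, and the project's actual rules and entity types are not described here.

```python
import re

# Hypothetical lexicon; a real one would be generated from the knowledge graph.
ENTITY_LEXICON = {
    "pikachu": "Pokemon",
    "raichu": "Pokemon",
    "thunder stone": "Item",
    "kanto": "Region",
}


def match_entities(question, lexicon=ENTITY_LEXICON):
    """Return (name, label, (start, end)) for each lexicon entry found in the question."""
    text = question.lower()
    found = []
    # Try longer names first so multi-word entries win over shorter overlapping ones.
    for name in sorted(lexicon, key=len, reverse=True):
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            span = (match.start(), match.end())
            overlaps = any(s < span[1] and span[0] < e for _, _, (s, e) in found)
            if not overlaps:
                found.append((name, lexicon[name], span))
    # Report matches in the order they appear in the question.
    return sorted(found, key=lambda item: item[2])
```

Matches found this way could then be used to build a graph query, with the statistical model handling mentions the lexicon misses.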

Core Architecture:

Hybrid Retrieval: Vector Retrieval (Milvus) + Graph Retrieval (Neo4j) + Keyword Retrieval (BM25)
Agent Orchestration: LangGraph state machine for complex task management
Knowledge Enhancement: GraphRAG for entity relationship extraction
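
How the three retrievers' results are merged is not specified above. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than the project's method; the retriever names and document ids are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict, k: int = 60) -> list:
    """Merge ranked id lists from several retrievers into one scored ranking.

    rankings maps a retriever name to its result ids, best first.
    Each id earns 1 / (k + rank) per list it appears in; k = 60 is the usual default.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = reciprocal_rank_fusion({
    "vector": ["a", "b", "c"],   # e.g. Milvus hits
    "graph": ["b", "a"],         # e.g. Neo4j hits
    "keyword": ["b", "d"],       # e.g. BM25 hits
})
```

RRF only needs ranks, not comparable scores, which is convenient when the sources are as different as a vector index, a graph query, and BM25.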

🎯 Highlights

Fine-tuned a Pokémon-domain LLM ("Kemeng") using web-scraped data.
Built a Pokémon knowledge graph based on Wikipedia and forums.
Automated NER training with RoBERTa + TF-IDF + rule-based matching.
Integrated FunASR (Alibaba DAMO Academy) for ASR (speech-to-text) capabilities.
[NEW] Implemented MCP Service to support mapping and querying of Pokémon world locations to real-world coordinates.
Parsed documents with DeepDoc to improve knowledge base ingestion.
[NEW] Used LangGraph to implement multi-agent collaboration (RAG + Search + Graph + MCP).
Encapsulated a shared agent base class for multi-agent workflows.
Supports graph search, web search, knowledge base search, MCP queries, and voice input, in any combination.
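
The "any combination" behaviour can be pictured as a small state machine that visits only the tools enabled for a request and then hands the gathered evidence to the answer step. The sketch below uses plain Python rather than the LangGraph API, and every name in it (`ChatState`, `NODES`, `route`, `run`) is hypothetical; the tool nodes return placeholder strings instead of calling real services.

```python
from typing import Callable, Dict, List, TypedDict


class ChatState(TypedDict):
    question: str
    enabled: List[str]   # tools switched on for this request
    evidence: List[str]  # snippets gathered so far
    visited: List[str]   # tool nodes that have already run


def make_tool_node(name: str) -> Callable[[ChatState], ChatState]:
    def node(state: ChatState) -> ChatState:
        # A real node would query Neo4j, Milvus, a web search API, or an MCP server.
        snippet = f"[{name}] placeholder result for: {state['question']}"
        return {
            **state,
            "evidence": state["evidence"] + [snippet],
            "visited": state["visited"] + [name],
        }
    return node


NODES: Dict[str, Callable[[ChatState], ChatState]] = {
    name: make_tool_node(name) for name in ("graph", "knowledge_base", "web", "mcp")
}


def route(state: ChatState) -> str:
    """Pick the next enabled tool that has not run yet, else finish."""
    for name in NODES:
        if name in state["enabled"] and name not in state["visited"]:
            return name
    return "answer"


def run(question: str, enabled: List[str]) -> ChatState:
    state: ChatState = {"question": question, "enabled": enabled, "evidence": [], "visited": []}
    while (next_node := route(state)) != "answer":
        state = NODES[next_node](state)
    return state  # the answer step would pass state["evidence"] to the LLM
```

In LangGraph, the same shape would typically be expressed as nodes plus a conditional edge driven by a routing function like `route`.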

🛠️ Deployment

Requirements: Docker & Docker Compose

🐳 Docker Compose One-Click Start (Recommended)

No manual environment setup is needed. Docker Compose starts the application, and optional profiles add the supporting services:

# 1. Clone the repository
git clone https://github.com/skygazer42/pokemon-chat.git
cd pokemon-chat

# 2. Configure environment variables (backend uses repo root `.env`; Docker Compose loads it too)
cp .env.example .env
# Edit .env and fill in your LLM API key (e.g. llm_api_key / SILICONFLOW_API_KEY)
# Optional: enable retrieval/tools (examples)
#   enable_knowledge_graph=true   # Neo4j knowledge graph
#   enable_knowledge_base=true    # Milvus knowledge base
#   enable_web_search=true        # Web search (requires tavily_api_key)
#   enable_mcp=true               # MCP (typically `--profile infra --profile mcp`)
# Optional: enable ASR (FunASR) -> enable_asr=true, funasr_url=ws://funasr:10095 (Docker)
# Optional: restrict CORS origins (recommended for production) -> cors_allow_origins=http://localhost:3100

# 3. Start services
cd docker
# Default starts the app only (API + Web).
docker compose up -d --build

# Optional: start infra (Neo4j/MySQL/Milvus, plus auto Neo4j import via neo4j-bootstrap)
# docker compose --profile infra up -d --build

# Optional: MCP SSE server (needs MySQL, so typically use it with infra)
# docker compose --profile infra --profile mcp up -d --build

# Optional: ASR (FunASR)
# docker compose --profile asr up -d --build
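
The relationship between the optional `.env` flags in step 2 and the Compose profiles in step 3 can be sketched as a small helper. This helper is not part of the repository. The flag-to-profile mapping is inferred from the comments above (the graph, knowledge base, and MCP features need the `infra` services) and should be treated as an assumption.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and full-line comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop trailing inline comments and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def compose_profiles(env: dict) -> list:
    """Map feature flags to Compose profiles (assumed mapping, see lead-in)."""
    def on(key: str) -> bool:
        return env.get(key, "").lower() == "true"

    profiles = []
    if on("enable_knowledge_graph") or on("enable_knowledge_base") or on("enable_mcp"):
        profiles.append("infra")  # Neo4j / MySQL / Milvus
    if on("enable_mcp"):
        profiles.append("mcp")
    if on("enable_asr"):
        profiles.append("asr")
    return profiles


def compose_command(profiles: list) -> list:
    """Build the docker compose argument list for the chosen profiles."""
    args = ["docker", "compose"]
    for profile in profiles:
        args += ["--profile", profile]
    return args + ["up", "-d", "--build"]
```

The resulting list can be printed for review or passed to `subprocess.run(..., cwd="docker")`.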

Access:

Web UI: http://localhost:3100/
API Docs: http://localhost:3100/api/docs (or direct http://localhost:5050/docs)

📦 Data Initialization (First Run)

When you start with --profile infra, Neo4j graph data is imported automatically by the one-shot service neo4j-bootstrap, from:

resources/data/kg_data/entities.json
resources/data/kg_data/relations.json

Force re-import (DANGEROUS: wipes the Neo4j DB):

cd docker
docker compose run --rm neo4j-bootstrap python scripts/import_graph.py --wait-seconds 120 --force --reset

Optional: MySQL map data import (run only if you need the map feature). It is not run automatically, so that the one-command startup stays stable across environments.

cd docker
docker compose exec api python scripts/import_pokemon_map.py

If you want a clean start (wipe persisted data and re-run bootstrap):

cd docker
docker compose down
# These are bind-mounted data directories (not named volumes). Linux/macOS/WSL:
rm -rf volumes/neo4j/data volumes/neo4j/logs volumes/milvus volumes/mysql/data
docker compose --profile infra up -d --build

Windows PowerShell:

cd docker
docker compose down
Remove-Item -Recurse -Force .\volumes\neo4j\data, .\volumes\neo4j\logs, .\volumes\milvus, .\volumes\mysql\data
docker compose --profile infra up -d --build

✅ Verify It's Running

Web UI: http://localhost:3100/
API Ready: http://localhost:5050/readyz
Neo4j Browser: http://localhost:7474/ (only if you started with --profile infra)

Command checks:

cd docker
docker compose ps
docker compose exec -T neo4j cypher-shell 'MATCH (n) RETURN count(n) AS nodes;'  # only with --profile infra
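
Containers can take a while to become ready after `docker compose up`, so a script may want to poll the readiness URL instead of checking once. The sketch below assumes that `/readyz` answers with HTTP 200 once the API is ready, which the text above does not confirm. `http_status` and `wait_until_ready` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0         # connection refused, DNS failure, timeout, ...


def wait_until_ready(url, attempts=30, delay=2.0, probe=http_status, sleep=time.sleep) -> bool:
    """Poll url until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_until_ready("http://localhost:5050/readyz")` waits up to about a minute with the default settings. The `probe` and `sleep` parameters exist only so the function can be exercised without a running server.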

🧰 Troubleshooting (Docker)

Port conflicts: edit the port mappings in docker/docker-compose.yml (defaults: Web=3100, API=5050, Neo4j=7474/7687, MySQL=3307, Milvus=19530/19091)
Orphan containers warning: cd docker && docker compose up -d --build --remove-orphans
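
To find out which default port is already taken before starting the stack, a quick local check like the following may help. It is a sketch, not a tool shipped with the project: the port list is copied from the defaults above, and `port_in_use` and `find_conflicts` are hypothetical names.

```python
import socket

# Default host ports listed in the troubleshooting note above.
DEFAULT_PORTS = {
    "Web": [3100],
    "API": [5050],
    "Neo4j": [7474, 7687],
    "MySQL": [3307],
    "Milvus": [19530, 19091],
}


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


def find_conflicts(ports=DEFAULT_PORTS, check=port_in_use):
    """Return (service, port) pairs whose port is already occupied."""
    return [
        (name, port)
        for name, port_list in ports.items()
        for port in port_list
        if check(port)
    ]
```

Run `find_conflicts()` before `docker compose up`; once the stack is running, its own containers will show up as occupying these ports.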

🤝 Contributing

This repo is meant to be reproducible via Docker. For dev conventions/tests/contribution workflow, see CONTRIBUTING.md.

🔭 Reference Projects

📄 License

This project is licensed under the MIT License, free for commercial and personal use. Please retain author credits when redistributing.
