GLM is a modular, multi-agent, graph-based chain-of-thought reasoning framework designed to solve complex queries by performing multi-step reasoning over graph-structured knowledge bases. It combines powerful language models, structured graph retrieval, and prompt engineering to reason in a human-like, interpretable way.
```
├── core/
│   ├── agent.py       # LLM-based agent
│   ├── glm.py         # GLM framework for multi-agent reasoning
│   ├── llm.py         # LLM interface
│   ├── retriever.py   # Graph-RAG-based retriever
│   └── cache.py       # Thread-safe shared LRU cache
├── custom/
│   ├── fewshots.py    # Few-shot examples for each dataset
│   └── templates.py   # Prompt templates & graph definitions
├── result/            # Output results and logs
├── main.py            # Entrypoint for running experiments
└── accuracy.py        # Evaluation: GPTScore, ROUGE
```
Multi-Agent Reasoning System:
- `classification_agent`: determines the question type and selects the corresponding branch
- `thought_agent`: determines what information is required to solve the question
- `action_agent`: generates a Python code snippet to retrieve the required information
- `retriever`: executes the code to fetch knowledge from the graphs
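The division of labor among the agents can be sketched as below. This is a minimal illustration, not the framework's actual API: the `call_llm` helper and the prompt strings are hypothetical stand-ins.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned reply here."""
    return "deterministic"

def classification_agent(question: str) -> str:
    # Decide the question type, which selects the reasoning branch.
    return call_llm(f"Classify this question: {question}")

def thought_agent(question: str, context: list) -> str:
    # Decide what information is still missing given what was retrieved so far.
    return call_llm(f"Given {context}, what is needed to answer: {question}?")

def action_agent(thought: str) -> str:
    # Emit a Python snippet that fetches the needed information from the graph.
    return call_llm(f"Write retrieval code for: {thought}")
```

In the real framework each agent wraps the shared LLM interface (`core/llm.py`) with its own prompt template.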
Graph-Based Execution:
- Supports domain-specific knowledge graphs (healthcare, academic, legal, etc.)
- Uses FAISS for embedding-based retrieval
- Auto-repairs failed queries using LLM
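The embedding-based retrieval step can be illustrated with a brute-force nearest-neighbor search; FAISS's flat indexes perform the same computation at scale. This is a sketch of the idea, not the framework's retriever code.

```python
import numpy as np

def top_k_nodes(query_vec: np.ndarray, node_vecs: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k node embeddings most similar to the query."""
    # Normalize so the inner product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    m = node_vecs / np.linalg.norm(node_vecs, axis=1, keepdims=True)
    scores = m @ q
    # Sort by descending similarity and keep the top k.
    return np.argsort(-scores)[:k]
```

With FAISS, the normalized matrix `m` would be added to an inner-product index once, and queries would reuse it.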
Template-Driven Prompting:
- Dynamic templates for classification, thought, and code generation
- Configurable few-shot examples for each domain
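Template-driven prompting amounts to filling slots in a per-task string with the domain's few-shot examples and the current question. The template text and placeholder names below are hypothetical, in the spirit of `custom/templates.py`:

```python
# Hypothetical classification template; the actual wording differs per task.
CLASSIFY_TEMPLATE = (
    "You are given a question about a {domain} knowledge graph.\n"
    "Few-shot examples:\n{fewshots}\n"
    "Question: {question}\n"
    "Answer with 'deterministic' or 'non-deterministic'."
)

def build_classify_prompt(domain: str, fewshots: list, question: str) -> str:
    # Join the domain's few-shot examples and fill the template slots.
    return CLASSIFY_TEMPLATE.format(
        domain=domain, fewshots="\n".join(fewshots), question=question
    )
```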
Evaluation Tools:
- GPT-based correctness checking (GPTScore)
- Standard text similarity metrics (ROUGE)
Reasoning Workflow:
- Classify the question as deterministic or non-deterministic
- If deterministic:
  - Action → Retrieve → Get answer
- If non-deterministic:
  - Think → Action → Retrieve → Repeat until finished
- Collect execution logs, answers, and statistics
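The control flow above can be sketched as a single driver function. The `classify`, `think`, `act`, and `retrieve` callables are hypothetical stand-ins for the framework's agents, not its actual interfaces:

```python
def solve(question, classify, think, act, retrieve, max_steps=5):
    """Route a question down the deterministic or iterative branch."""
    logs = []
    if classify(question) == "deterministic":
        # Single Action -> Retrieve step yields the answer directly.
        answer = retrieve(act(question))
        logs.append(("action", answer))
        return answer, logs
    # Non-deterministic: Think -> Action -> Retrieve until the agent finishes.
    context = []
    for _ in range(max_steps):
        thought = think(question, context)
        if thought == "finish":
            break
        observation = retrieve(act(thought))
        context.append(observation)
        logs.append((thought, observation))
    return (context[-1] if context else None), logs
```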
The code is written in Python 3.8. Before running, install the required packages with the following commands (using a virtual environment is recommended):
```shell
conda create --name graphcot python==3.8
conda activate graphcot
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install -c pytorch -c nvidia faiss-gpu=1.7.4
pip3 install -r requirements.txt
```
You can download the test dataset from zeta: https://zeta.alipay.com/zeta/GraphCoT_Dataset
Each dataset directory should contain:
```
graph.json                 # The structured knowledge graph
data.json                  # The list of QA pairs (qid, question, answer)
cache-all-mpnet-base-v2    # The knowledge-graph embedding cache built with the all-mpnet-base-v2 model
```
Example:
```
/home/GraphCoT_Dataset/healthcare/
├── graph.json
├── data.json
└── cache-all-mpnet-base-v2
```
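Loading a dataset directory with this layout is straightforward; a minimal sketch (the internal structure of `graph.json` is an assumption, while `data.json`'s `qid`/`question`/`answer` fields come from the description above):

```python
import json
import os

def load_dataset(root: str):
    """Load the knowledge graph and QA pairs from a dataset directory."""
    with open(os.path.join(root, "graph.json")) as f:
        graph = json.load(f)
    with open(os.path.join(root, "data.json")) as f:
        qa_pairs = json.load(f)  # list of {"qid": ..., "question": ..., "answer": ...}
    return graph, qa_pairs
```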
A complete usage example is provided in main.py. You can modify the configuration by changing variables such as:
```python
dataset = "legal"
default_graph_dir = f"/home/GraphCoT_Dataset/{dataset}"
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
model = "/ossfs/workspace/Qwen/Qwen3-235B-A22B"
embedding_model_name = "/ossfs/workspace/all-mpnet-base-v2"
```
Run GPT-based correctness and ROUGE evaluation:
```shell
python accuracy.py
```
Metrics:
- GPTScore: Uses an LLM to judge whether the answer is correct (Yes/No)
- ROUGE: Measures string overlap between prediction and ground truth
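To make the ROUGE metric concrete, here is a toy ROUGE-L F-measure based on the longest common subsequence of tokens; `accuracy.py` may use a library implementation instead, so this is illustrative only:

```python
def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1: LCS-based overlap between prediction and reference tokens."""
    p, r = prediction.split(), reference.split()
    # Dynamic-programming table for longest-common-subsequence length.
    dp = [[0] * (len(r) + 1) for _ in range(len(p) + 1)]
    for i, pw in enumerate(p):
        for j, rw in enumerate(r):
            dp[i + 1][j + 1] = dp[i][j] + 1 if pw == rw else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[len(p)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(p), lcs / len(r)
    return 2 * prec * rec / (prec + rec)
```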