GLM: Graph-of-Thought Multi-Agent Reasoning Framework

GLM is a modular, graph-based multi-agent chain-of-thought reasoning framework designed to solve complex queries by performing multi-step reasoning over graph-structured knowledge bases. It combines large language models, structured graph retrieval, and prompt engineering to reason in a human-like, interpretable way.


📁 Project Structure

├── core/
│   ├── agent.py         # LLM-based agent
│   ├── glm.py           # GLM framework for multi-agent reasoning
│   ├── llm.py           # LLM interface
│   ├── retriever.py     # Graph-RAG-based retriever
│   ├── cache.py         # Thread-safe shared LRU cache
├── custom/
│   ├── fewshots.py      # Few-shot examples for each dataset
│   ├── templates.py     # Prompt templates & graph definitions
├── result/              # Output results and logs
├── main.py              # Entrypoint for running experiments
├── accuracy.py          # Evaluation: GPTScore, ROUGE
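Among the components above, core/cache.py provides a thread-safe shared LRU cache. As an illustration of what such a component typically looks like (a minimal sketch, not the actual implementation), a lock around an ordered dict is enough:

```python
import threading
from collections import OrderedDict


class SharedLRUCache:
    """Minimal thread-safe LRU cache (illustrative; core/cache.py may differ)."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self._store: OrderedDict = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._store:
                return default
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]

    def put(self, key, value):
        with self._lock:
            if key in self._store:
                self._store.move_to_end(key)
            self._store[key] = value
            if len(self._store) > self.capacity:
                self._store.popitem(last=False)  # evict least recently used
```

Sharing one such cache across agents avoids recomputing embeddings or re-running identical retrieval queries.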

🚀 Key Features

  • Multi-Agent Reasoning System:

    • classification_agent: Determines the question type and selects the corresponding reasoning branch
    • thought_agent: Determines what information is required to answer the question
    • action_agent: Generates a Python code snippet to retrieve the required information
    • retriever: Executes the code to fetch knowledge from the graph
  • Graph-Based Execution:

    • Supports domain-specific knowledge graphs (healthcare, academic, legal, etc.)
    • Uses FAISS for embedding-based retrieval
    • Auto-repairs failed queries using an LLM
  • Template-Driven Prompting:

    • Dynamic templates for classification, thought, and code generation
    • Configurable few-shot examples for each domain
  • Evaluation Tools:

    • GPT-based correctness checking (GPTScore)
    • Standard text similarity metrics (ROUGE)
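The embedding-based retrieval mentioned above relies on FAISS in the actual code; the underlying idea can be illustrated without it using plain cosine similarity over node embeddings (a dependency-free sketch, not the project's retriever):

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, node_vecs, k=2):
    """Return the ids of the k graph nodes most similar to the query embedding."""
    ranked = sorted(node_vecs.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [node_id for node_id, _ in ranked[:k]]
```

FAISS performs the same nearest-neighbor search, but with indexes optimized for millions of high-dimensional vectors.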

🔄 Example Reasoning Flow

  1. Classify the question as deterministic or non-deterministic
  2. If deterministic:
    • Action → Retrieve → Get answer
  3. If non-deterministic:
    • Think → Action → Retrieve → Repeat until finished
  4. Collect execution logs, answer, and statistics
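The branching logic above can be sketched as a loop over agent callables (the names follow the feature list; the real control flow lives in core/glm.py and may differ):

```python
def solve(question, classify, think, act, retrieve, max_steps=5):
    """Illustrative GLM control flow; classify/think/act/retrieve are agent callables."""
    log = []
    if classify(question) == "deterministic":
        # Deterministic questions: one action, one retrieval, done.
        code = act(question, context=None)
        log.append(("action", code))
        return retrieve(code), log
    # Non-deterministic questions: think/act/retrieve until the thought agent is satisfied.
    context = []
    for _ in range(max_steps):
        need = think(question, context)      # what information is still missing?
        if need is None:                     # thought agent decides we can finish
            break
        code = act(question, context=need)   # generate retrieval code for that need
        context.append(retrieve(code))       # execute it against the graph
        log.append(("step", need))
    return (context[-1] if context else None), log
```

The returned log corresponds to step 4: it records which branch ran and what each intermediate step needed.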

⚙️ Installation

1. Install dependencies

The code is written in Python 3.8. Before running, install the required packages with the following commands (using a virtual environment is recommended):

conda create --name graphcot python==3.8
conda activate graphcot
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda install -c pytorch -c nvidia faiss-gpu=1.7.4
pip3 install -r requirements.txt

2. Prepare your dataset

You can download the test dataset from zeta: https://zeta.alipay.com/zeta/GraphCoT_Dataset

Each dataset directory should contain:

graph.json      # The structured knowledge graph
data.json       # The list of QA pairs (qid, question, answer)
cache-all-mpnet-base-v2       # Embedding cache of the knowledge graph, built with the all-mpnet-base-v2 model

Example:

/home/GraphCoT_Dataset/healthcare/
    ├── graph.json
    ├── data.json
    └── cache-all-mpnet-base-v2
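Given that layout, loading a dataset directory amounts to reading the two JSON files (a minimal sketch; the schema of data.json — qid, question, answer — is taken from the description above):

```python
import json
from pathlib import Path


def load_dataset(dataset_dir):
    """Load the knowledge graph and QA pairs from a GraphCoT dataset directory."""
    root = Path(dataset_dir)
    graph = json.loads((root / "graph.json").read_text())
    qa_pairs = json.loads((root / "data.json").read_text())  # list of {qid, question, answer}
    return graph, qa_pairs
```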

▶️ Usage

A complete usage example is provided in main.py. You can modify the configuration by changing variables such as:

  • dataset = "legal"
  • default_graph_dir = f"/home/GraphCoT_Dataset/{dataset}"
  • openai_api_key = "EMPTY"
  • openai_api_base = "http://localhost:8000/v1"
  • model = "/ossfs/workspace/Qwen/Qwen3-235B-A22B"
  • embedding_model_name = "/ossfs/workspace/all-mpnet-base-v2"
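One way to keep these settings together is a small config object (an illustrative grouping, not part of main.py; the defaults mirror the variables listed above):

```python
from dataclasses import dataclass, field


@dataclass
class GLMConfig:
    """Configuration mirroring the variables in main.py (illustrative grouping)."""
    dataset: str = "legal"
    openai_api_key: str = "EMPTY"
    openai_api_base: str = "http://localhost:8000/v1"
    model: str = "/ossfs/workspace/Qwen/Qwen3-235B-A22B"
    embedding_model_name: str = "/ossfs/workspace/all-mpnet-base-v2"
    graph_dir: str = field(init=False)

    def __post_init__(self):
        # Derived exactly as in main.py: default_graph_dir = f"/home/GraphCoT_Dataset/{dataset}"
        self.graph_dir = f"/home/GraphCoT_Dataset/{self.dataset}"
```

Switching domains is then a one-argument change, e.g. `GLMConfig(dataset="healthcare")`.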

📊 Evaluation

Run GPT-based correctness and ROUGE evaluation:

python accuracy.py

Metrics:

  • GPTScore: Uses an LLM to judge whether the answer is correct (Yes/No)
  • ROUGE: Measures string overlap between prediction and ground truth
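ROUGE-L, a common ROUGE variant, scores the longest common subsequence between prediction and reference; it can be illustrated with a small self-contained implementation (a sketch only — accuracy.py may use a library and different variants):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(prediction, reference):
    """ROUGE-L F1 between a predicted answer and a ground-truth answer."""
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

GPTScore complements this: string overlap can miss paraphrased but correct answers, which the LLM judge catches.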
