
MARTI: A Framework for LLM-based Multi-Agent Reinforced Training and Inference

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

MARTI is an open-source framework for training LLM-based Multi-Agent Systems (MAS) with Reinforcement Learning (RL). It enables powerful, scalable, and adaptive workflows by combining centralized multi-agent interactions with distributed policy training. MARTI supports both built-in graph-based workflows and popular third-party multi-agent frameworks.

MARTI-v2 extends the framework with tree search-augmented RL for complex reasoning tasks such as code generation. By integrating multi-agent tree search, MARTI-v2 enables efficient multi-turn exploration with adaptive node expansion and refinement, allowing agents to systematically explore solution spaces and discover high-quality reasoning trajectories. The framework also incorporates advanced RL training techniques (a GSPO loss for sequence-level optimization, TIS correction for the vLLM sampling mismatch, dynamic data filtering, and an overlong buffer that penalizes excessively long responses) to support ultra-long sequences of up to 32K tokens and heterogeneous multi-agent training.

We hope that MARTI not only advances reasoning capabilities beyond those of individual large language models or reasoning models, but also fosters collective intelligence as a step toward general artificial intelligence.

📣 Latest News

  • [2026-02-10] 🚀🚀🚀 We release MARTI-v2, which scales multi-agent tree search via reinforcement learning for code generation (MARS²). See 🌳 MARS² - Multi-Agent Tree Search RL (New!) and the Technical Report.
  • [2026-01-25] MARTI was accepted at ICLR 2026. Congratulations to the team!
  • [2025-10-10] We're thrilled to see both ReviewRL (EMNLP 2025) and CoMAS built on MARTI!
  • [2025-08-05] We have added support for Async Tool Use in Agentic RL and Async Workflows for Multi-Agent RL, enabling more flexible and efficient RL pipelines for both single-agent and multi-agent scenarios. See 🤝 Customised Async Step and Workflow.
  • [2025-05-27] We release the codebase of the MARTI framework. We welcome you to try it out for LLM-based multi-agent reinforcement learning. 🤗


💡 Overview

MARTI-v2: Tree Search-Augmented Multi-Agent RL (🔥New!)

MARTI-v2 extends the framework with tree search-augmented reinforcement learning for complex reasoning tasks like code generation. By integrating multi-agent tree search with advanced RL techniques, MARTI-v2 enables efficient multi-step exploration with adaptive node expansion and refinement, allowing agents to systematically explore solution spaces and discover high-quality reasoning trajectories.

The framework has been adapted to the latest OpenRLHF infrastructure, incorporating state-of-the-art RL training techniques for heterogeneous multi-agent training.

Figure 1: Overview of Core Components of MARTI-v2

Key Features:

  • Multi-Agent Tree Search: Asynchronous, efficient tree exploration across multiple agents, supporting code generation tasks with adaptive node expansion and refinement
  • GSPO Loss: Sequence-level policy optimization (vs. token-level in PPO), better suited for complex reasoning tasks; see the sketch after this list
  • TIS Correction: Truncated Importance Sampling addresses the distribution shift in long-sequence generation, enabling stable training for ultra-long contexts and correcting vLLM sampling bias during rollout
  • Heterogeneous Multi-Agent Training: Train different models simultaneously (e.g., Qwen3-8B + AreaL-boba-2-8B), each with independent roles, training strategies, and dynamic sample filtering
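
As a concrete reference for the GSPO bullet above, here is a minimal PyTorch sketch of a sequence-level clipped objective with group-normalized advantages. The function name, tensor layout, and clipping value are illustrative assumptions and do not correspond to MARTI's internal implementation.

# Illustrative GSPO-style sequence-level clipped loss (not MARTI's internal API)
import torch

def gspo_loss(logp_new, logp_old, response_mask, rewards, clip_eps=3e-4):
    """logp_new / logp_old: (group, seq_len) per-token log-probs of the sampled tokens
    under the current and rollout policies; response_mask: 1 for response tokens;
    rewards: (group,) scalar outcome rewards for responses to the same prompt."""
    # Group-normalized advantages (GRPO-style baseline).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

    # Sequence-level importance ratio: length-normalized sum of per-token log-ratios.
    lengths = response_mask.sum(dim=-1).clamp(min=1)
    log_ratio = ((logp_new - logp_old) * response_mask).sum(dim=-1) / lengths
    ratio = log_ratio.exp()

    # Clip the whole sequence's ratio (the clip range is much tighter than PPO's token-level range).
    unclipped = ratio * adv
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * adv
    return -torch.minimum(unclipped, clipped).mean()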

MARTI

We designed the MARTI framework following the principle of centralized multi-agent interaction with distributed policy training, where all agent interactions and reward allocation occur centrally while policy training is distributed across individual agents. As illustrated in Figure 2, MARTI comprises three core modules: Multi-Agent World, Centralized Rewarding, and Single Agent Trainer.

Figure 2: Overview of Core Components of MARTI
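
To make the centralized-interaction, distributed-training principle concrete, here is a conceptual Python sketch of one training step across the three modules. All object methods (generate, trainer.update), the reward and credit-assignment callables, and the workflow order are hypothetical placeholders, not MARTI's actual interfaces.

# Conceptual sketch of one MARTI-style training step (hypothetical names, not the real API)
def train_step(agents, reward_fn, credit_assigner, tasks):
    # 1) Multi-Agent World: run the agent workflow centrally to collect joint trajectories.
    trajectories = []
    for task in tasks:
        context = {"task": task, "messages": []}
        for agent in agents:                      # e.g., chain or debate order
            reply = agent.generate(context)       # rollout via the shared inference engine
            context["messages"].append((agent.name, reply))
        trajectories.append(context)

    # 2) Centralized Rewarding: score the joint outcome, then assign per-agent credit.
    for traj in trajectories:
        outcome = reward_fn(traj)                         # e.g., verifier or unit tests
        traj["credits"] = credit_assigner(traj, outcome)  # {agent_name: shaped reward}

    # 3) Single Agent Trainer: each agent is updated only on its own turns, in parallel.
    for agent in agents:
        samples = [(traj, traj["credits"][agent.name]) for traj in trajectories]
        agent.trainer.update(samples)             # distributed RL update (PPO/GRPO/etc.)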

Key Features:

  • Multi-Agent Inference + RL Training in a unified framework
  • Graph-based workflows (debate, chain-of-agents, mixture-of-agents)
  • Support for heterogeneous models within the same agent graph
  • Built-in credit assignment and reward shaping strategies (an illustrative example follows this list)
  • Support for diverse RL algorithms (PPO, GRPO, REINFORCE++, TTRL)
  • Third-party integration with AutoGen and CAMEL (experimental)
  • Advanced performance on reasoning benchmarks (e.g., AIME)
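
Following the credit-assignment bullet above, here is one illustrative rule for a chain or debate of agents: every agent shares the verified outcome reward, and agents whose intermediate answer already matches the final verified answer receive a small shaping bonus. This is a toy example, not one of the strategies shipped with MARTI.

# Toy credit-assignment rule (illustrative only, not MARTI's built-in strategies)
def assign_credit(intermediate_answers, final_answer, outcome_reward, bonus=0.5):
    """intermediate_answers: {agent_name: answer}; outcome_reward: scalar from the verifier."""
    credits = {}
    for name, answer in intermediate_answers.items():
        shaped = outcome_reward
        if outcome_reward > 0 and answer == final_answer:
            shaped += bonus   # extra credit for agents that contributed the correct answer
        credits[name] = shaped
    return credits

# Example: two solvers and an aggregator on a trajectory verified as correct.
print(assign_credit({"solver_1": "42", "solver_2": "41", "aggregator": "42"},
                    final_answer="42", outcome_reward=1.0))
# -> {'solver_1': 1.5, 'solver_2': 1.0, 'aggregator': 1.5}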

Additionally, building on single-agent RL frameworks like OpenRLHF and verl, MARTI supports the vLLM v1 Engine and a Hybrid Engine to enable fast and efficient training.

🚀 Quick Start

📦 Installation

git clone https://github.com/TsinghuaC3I/MARTI.git
cd MARTI

pip install -r requirements.txt

Follow the setup instructions for dependencies, including OpenRLHF, Ray, and vLLM.


🌳 MARS² - Multi-Agent Tree Search RL (🔥New!)

MARTI-v2 introduces tree search-augmented reinforcement learning training (MARS²) for complex reasoning tasks like code generation.

Key Features:

  • Single-agent and Multi-agent MCTS training for code generation tasks
  • GSPO Loss: Sequence-level policy optimization (better suited for complex reasoning than PPO's token-level optimization)
  • TIS Correction: Truncated Importance Sampling to address the vLLM sampling distribution mismatch (see the sketch after this list)
  • Dynamic Filtering: Per-agent sample filtering for heterogeneous training
  • Overlong Buffer: Penalty mechanism for excessively long token sequences
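
To illustrate the TIS correction mentioned above: because rollout tokens are sampled by the vLLM engine while gradients are computed by the training engine, each token's loss can be reweighted by the truncated ratio between the two engines' probabilities for that token. The helper below is a minimal sketch; the function name and the cap value are assumptions, not MARTI's configuration.

# Illustrative truncated-importance-sampling weights for the rollout/training mismatch
import torch

def tis_weights(logp_train, logp_rollout, cap=2.0):
    """Per-token log-probs of the sampled tokens under the training engine (logp_train)
    and the vLLM rollout engine (logp_rollout); returns truncated importance weights."""
    ratio = (logp_train - logp_rollout).exp()   # pi_train / pi_rollout per token
    return ratio.clamp(max=cap).detach()        # truncate large ratios, treat as constants

# Usage sketch: scale the per-token surrogate loss before reduction.
# per_token_loss = tis_weights(logp_train, logp_rollout) * per_token_loss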

Single-Agent MCTS Training

# Minimum hardware requirement: approximately 8×80G GPUs

# Set the following paths in the scripts
ROOT_DIR="/path/to/MARTI"
MODEL_DIR="/path/to/models"

# Single-agent MCTS training
# See the script for more training examples
bash examples/mars2/run_train_single_mcts.sh

Multi-Agent MCTS Training

# Minimum hardware requirement: approximately 8×80G GPUs per agent

# Set the following paths in the scripts
ROOT_DIR="/path/to/MARTI"
MODEL_DIR="/path/to/models"

# Multi-agent MCTS training
# See the script for more training examples
bash examples/mars2/run_train_multi_mcts.sh

🤝 Customised Async Step and Workflow

We introduce asynchronous tool use and workflow support for both single-agent and multi-agent RL pipelines. These features make our framework more modular, efficient, and scalable for a variety of RL scenarios.

Supported Workflows:

  • Multi-Agent Debate
  • Chain-of-Agents
  • Mixture-of-Agents
  • Review-RL
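
For a flavor of what a customized asynchronous workflow step can look like, here is a framework-agnostic asyncio sketch of a multi-agent debate round in which agents draft and revise answers concurrently. Every name here (debate_round, agent.generate, etc.) is an illustrative assumption rather than a MARTI API; see the training scripts below for the actual entry points.

# Framework-agnostic asyncio sketch of an async debate workflow (illustrative names only)
import asyncio

async def debate_round(agents, question, previous_answers=None):
    # Each agent drafts or revises its answer concurrently, seeing the others' last answers.
    calls = []
    for agent in agents:
        context = question
        if previous_answers:
            others = [ans for name, ans in previous_answers.items() if name != agent.name]
            context += "\n\nOther agents answered:\n" + "\n".join(others)
        calls.append(agent.generate(context))      # assumed to be an async LLM call
    replies = await asyncio.gather(*calls)         # fan out without blocking
    return {agent.name: reply for agent, reply in zip(agents, replies)}

async def run_debate(agents, question, rounds=2):
    answers = None
    for _ in range(rounds):
        answers = await debate_round(agents, question, answers)
    return answers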

Single-Agent Training

# Minimum hardware requirement: approximately 8×80G GPUs

# Set the following paths in the scripts
ROOT_DIR="/path/to/MARTI"
MODEL_DIR="/path/to/models"

# Train asynchronous multi-turn code RL
bash examples/single-agent/run_train_code_async.sh

# Train asynchronous multi-turn math RL
bash examples/single-agent/run_train_math_async.sh

Multi-Agent Training

# Minimum hardware requirement: approximately 8×80G GPUs per agent

# Set the following paths in the scripts
ROOT_DIR="/path/to/MARTI"
MODEL_DIR="/path/to/models"

# Mixture-of-Agents
bash examples/multi-agent/run_train_chain.sh

# Multi-agent Debate
bash examples/multi-agent/run_train_mad.sh

# Chain-of-agents (MathChat)
bash examples/multi-agent/run_train_mathchat.sh

# Review-RL
bash examples/reviewrl/run_train_reviewrl_async.sh

📊 Experimental Results

MARTI-v2 (New!)

Training Details

We employ the MARTI-v2 framework to train reasoning models, specifically Qwen3-8B, Qwen3-14B, AreaL-boba-2-8B, AreaL-boba-2-14B, and DeepCoder-14B. For multi-agent reinforcement learning, we use a cluster of 3 nodes, each equipped with 8 H200 GPUs, allocating one full node per agent.

Benchmark Results

We evaluate MARTI-v2 on the LCB code generation benchmark under both single-agent and multi-agent settings, comparing against baseline methods. As shown in Figure 3 and Figure 4, our experiments demonstrate that:

  • Single-agent MCTS achieves faster convergence: The single-agent setting outperforms the Vanilla GRPO baseline across all base models, with Pass@1 improvements of up to 4.6% and Pass@1(MCTS) improvements of up to 5.1%, exhibiting faster early-stage convergence and stronger deep optimization capabilities.
  • Multi-agent MCTS breaks performance bottlenecks: The multi-agent setting maintains policy diversity and effectively addresses the performance saturation observed in later training stages. For Qwen3-8B, multi-agent training achieves an 8.0% improvement over the base model, 4.4% over Vanilla GRPO, and 2.9% over the single-agent peak performance.
  • Enhanced system-level collaboration: With 14B-scale heterogeneous agent teams, multi-agent training achieves 71.2% Pass@1(MCTS), with consistent improvements in Pass@N metrics, validating comprehensive enhancements in collaborative problem-solving capabilities.

Figure 3: Experimental results of single-agent MCTS and baseline methods on LCB benchmarks

Figure 4: Pass@1 results of multi-agent MCTS and baseline methods on LCB benchmarks

📚 Documentation

👏 Acknowledgements

MARTI is developed primarily on top of OpenRLHF. We would like to express our gratitude to the developers of OpenRLHF, as well as to the teams behind vLLM, Ray, DeepSpeed, and TreeQuest, for their invaluable contributions.

🤝 Core Contributors

For the full list of contributors, please refer to the author list in the citation. We are also deeply grateful to everyone who engaged in discussions and provided valuable feedback throughout the development of this project.

📬 Contact

For issues or inquiries:

🔬 Citation

If you use MARTI in your research, please cite the project:

@misc{marti2025,
  title={MARTI: A Framework for Multi-Agent LLM Systems Reinforced Training and Inference},
  author={Kaiyan Zhang and Runze Liu and Xuekai Zhu and Kai Tian and Sihang Zeng and Guoli Jia and Yuchen Fan and Xingtai Lv and Yuxin Zuo and Che Jiang and Ziyang Liu and Jianyu Wang and Yuru Wang and Ruotong Zhao and Ermo Hua and Yibo Wang and Shijie Wang and Junqi Gao and Xinwei Long and Youbang Sun and Zhiyuan Ma and Ganqu Cui and Lei Bai and Ning Ding and Biqing Qi and Bowen Zhou},
  year={2025},
  institution={Tsinghua University and Shanghai AI Lab},
  url={https://github.com/TsinghuaC3I/MARTI}
}

@misc{marti2026,
  title={MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation},
  author={Shijie Wang and Pengfei Li and Yikun Fu and Kaifeng Liu and Fangyuan Li and Yang Liu and Xiaowei Sun and Zonglin Li and Siyao Zhao and Jian Zhao and Kai Tian and Dong Li and Junqi Gao and Yutong Zhang and Yiqun Chen and Yuqiang Li and Zoe Li and Weinan Zhang and Peng Ye and Shuyue Hu and Lei Bai and Bowen Zhou and Kaiyan Zhang and Biqing Qi},
  year={2026},
  institution={Shanghai AI Lab and Tsinghua University},
  url={https://github.com/TsinghuaC3I/MARTI}
}

⭐️ Star History

Star History Chart


MARTI © 2025 Tsinghua University & Shanghai AI Lab. All rights reserved.
