OpenMLRL/CoMLRL

License: BSD-3-Clause. Code style: black.

Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of various MARL algorithms for LLM collaboration and support for different environments and benchmarks.

Installation

CoMLRL can be installed via PyPI, conda-forge, or from source:

# Install from PyPI
pip install comlrl

# Install from conda-forge
conda install -c conda-forge comlrl

# Install from source
git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL && pip install -e .

# Install Compatible PyTorch
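
After installing, it can help to confirm which versions ended up in the environment. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of CoMLRL, and the distribution names `comlrl` and `torch` are assumed to match the packages installed above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to your environment.
    for name in ("comlrl", "torch"):
        print(name, installed_version(name) or "not installed")
```

A missing package is reported as "not installed" rather than raising, so the check is safe to run before PyTorch has been set up.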

Features

  • Cooperative MARL trainers to optimize decentralized LLM collaboration:

    • Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, MAREMAX.
      • Aligned individual response joint with joint_mode='aligned'.
      • Memory-efficient cross joint with joint_mode='cross'.
    • Multi-Agent Actor-Critic: Actor-Critic methods, including IAC and MAAC.
      • Independent actor-critic (separate critic or value-head over LLM backbone).
      • Centralized critic over joint prompts with separate actors.
  • Environments that simulate real-world tasks for training and evaluating LLM collaboration:

    • Writing: Multiple LLM agents collaborate on processing articles.
      • TLDR - Summarizing Reddit posts.
      • ArXiv - Expanding abstracts into introductions.
    • Coding: Generate code solutions for programming problems.
      • MBPP - Mostly Basic Python Problems.
      • HumanEval - Handwritten evaluation problems.
      • CoopHumanEval - HumanEval with cooperative nature.
      • ClassEval - Complete class-level code based on attributes and docstrings.
    • Minecraft: Collaborative building tasks in Minecraft.
      • StrBuild - Building structures based on string blueprints.
      • HouseBuild - Constructing houses from given blueprints while defending against spider attacks.
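
The critic-free trainers listed above replace a learned critic with a baseline computed from a group of sampled responses. The sketch below illustrates two such baselines in plain Python, by analogy with single-agent GRPO and RLOO: a group-relative advantage and a leave-one-out advantage. It is not CoMLRL's implementation; the function names are hypothetical, and the use of a population standard deviation is a choice made for this example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style baseline: subtract the group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """RLOO-style baseline: compare each reward with the mean of the others."""
    n = len(rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two samples")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Joint rewards for three sampled joint responses (illustrative numbers).
rewards = [1.0, 2.0, 3.0]
print(group_relative_advantages(rewards))
print(leave_one_out_advantages(rewards))  # [-1.5, 0.0, 1.5]
```

In a cooperative setting with a shared joint reward, one would typically apply the same advantage to every agent's policy-gradient term; whether the library does exactly this is an assumption here.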

Usage

Quick start: train two Qwen2.5-0.5B agents to summarize Reddit posts with MAGRPO:

from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.reinforce import MAGRPOConfig, MAGRPOTrainer

# Load dataset and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))

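# Initialize trainer and start training
trainer = MAGRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",
    num_agents=2,
    tokenizer=tokenizer,
    train_dataset=dataset,
    # Toy joint reward: distance of the length ratio len(b[0]) / len(a[0]) from 3.0,
    # where a and b are the completions passed in for the two agents.
    reward_func=lambda a, b: [abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
    # One prompt formatter per agent; here both agents receive the raw prompt.
    formatters=[lambda example: example["prompt"]] * 2,
    args=MAGRPOConfig(),  # default configuration
)
trainer.train()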
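
The inline reward lambda in the quick start is compact but hard to read. As a sketch, the same computation can be written as a named function and tested on its own before training. The name `length_ratio_reward` is hypothetical, and the signature (one list of completions per agent, returning a list of rewards) is inferred from the lambda rather than from the library's documentation.

```python
from typing import List


def length_ratio_reward(completions_a: List[str], completions_b: List[str]) -> List[float]:
    """Same arithmetic as the quick-start lambda, using the first completion of each agent."""
    # max(..., 1) guards against division by zero for empty completions.
    len_a = max(len(completions_a[0]), 1)
    len_b = max(len(completions_b[0]), 1)
    return [abs(len_b / len_a - 3.0)]


# The second completion is exactly three times as long as the first.
print(length_ratio_reward(["abcd"], ["abcdefghijkl"]))  # [0.0]
# An empty first completion is treated as length 1.
print(length_ratio_reward([""], ["ab"]))  # [1.0]
```

Under the assumed signature, it could be passed as `reward_func=length_ratio_reward` in place of the lambda.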

Contributing

We thank all contributors for their gracious help:


Shuo Liu: 🤔 🚧 💻 📖
Tianle Chen: 🚧 💻 🐛
Ryan Amiri: 🚧 💻 🐛
Zeyu Liang: 📖 🐛

🤔: Foundational Ideas; 🚧: Maintenance; 💻: Code; 📖: Documentation; 🐛: Bug Report.

New contributors, please see the contributing guidelines for setting up a development environment.

Sponsorship

CoMLRL was developed using substantial computational resources. Its growth has been made possible by the generous support of the following organizations and institutions.

We welcome computational sponsorship to support the continued development of CoMLRL. If you are interested in supporting this project, please contact us by email.

Citation

Please cite the following papers if you find this library useful in your research:

@inproceedings{liu2025llmcollabmarl,
  title     = {LLM Collaboration With Multi-Agent Reinforcement Learning},
  author    = {Liu, Shuo and Liang, Zeyu and Lyu, Xueguang and Amato, Christopher},
  booktitle = {Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence},
  year      = {2026}
}

@article{liu2026learndecllmcollabmaac,
  title   = {Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic},
  author  = {Liu, Shuo and Chen, Tianle and Amiri, Ryan and Amato, Christopher},
  journal = {arXiv preprint arXiv:2601.21972},
  year    = {2026}
}