Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of various MARL algorithms for LLM collaboration and support for different environments and benchmarks.
CoMLRL can be installed via PyPI, conda-forge, or from source:
# Install from PyPI
pip install comlrl
# Install from conda-forge
conda install -c conda-forge comlrl
# Install from source
git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL && pip install -e .
# Install a PyTorch build compatible with your platform (see pytorch.org)
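To confirm the install, a quick import check:

# Verify that the package imports correctly
python -c "import comlrl"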
CoMLRL provides cooperative MARL trainers for optimizing decentralized LLM collaboration:
- Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, and MAREMAX.
  - Aligned individual-response joint with joint_mode='aligned'.
  - Memory-efficient cross joint with joint_mode='cross' (see the sketch after this list).
- Multi-Agent Actor-Critic: Actor-critic methods, including IAC and MAAC.
  - Independent actor-critic (separate critic or value head over the LLM backbone).
  - Centralized critic over joint prompts with separate actors.
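A minimal sketch of choosing between the two joint modes. That joint_mode is a MAGRPOConfig field is an assumption based on the option names above; check the library documentation for where this option actually lives:

from comlrl.trainers.reinforce import MAGRPOConfig

# Assumption: joint_mode is accepted by MAGRPOConfig.
aligned_config = MAGRPOConfig(joint_mode="aligned")  # pair each agent's i-th response
cross_config = MAGRPOConfig(joint_mode="cross")      # join across response combinations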
Environments that simulate real-world tasks for training and evaluating LLM collaboration:
- Writing: Multiple LLM agents collaborate on processing articles.
- Coding: Generate code solutions for programming problems (see the reward sketch after this list).
  - MBPP - Mostly Basic Python Problems.
  - HumanEval - Handwritten evaluation problems.
  - CoopHumanEval - A HumanEval variant with a cooperative structure.
  - ClassEval - Complete class-level code based on attributes and docstrings.
- Minecraft: Collaborative building tasks in Minecraft.
  - StrBuild - Building structures based on string blueprints.
  - HouseBuild - Constructing houses from given blueprints while defending against spider attacks.
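As a sketch of what a coding-environment reward can look like, the function below scores each joint program by the fraction of unit tests it passes. The toy harness, the placeholder test, and the choice to concatenate the two agents' snippets are illustrative assumptions, not the library's built-in behavior:

def passes_tests(program, tests):
    # Toy harness: execute the candidate program, then count passing asserts.
    # No sandboxing here; real benchmark evaluation should isolate execution.
    namespace = {}
    try:
        exec(program, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass
    return passed / max(len(tests), 1)

def coding_reward(agent0_responses, agent1_responses):
    # One reward per sampled joint response, matching the reward_func
    # signature used in the quick start below.
    tests = ["assert add(1, 2) == 3"]  # placeholder MBPP-style test
    return [passes_tests(a + "\n" + b, tests)
            for a, b in zip(agent0_responses, agent1_responses)]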
Quick start: train two Qwen2.5-0.5B agents to summarize Reddit posts with MAGRPO:
from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.reinforce import MAGRPOConfig, MAGRPOTrainer
# Load dataset and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))
# Initialize trainer and start training
trainer = MAGRPOTrainer(
model="Qwen/Qwen2.5-0.5B",
num_agents=2,
tokenizer=tokenizer,
train_dataset=dataset,
# Toy reward: push the second response's length toward 3x the first's
# (negated distance, so that larger reward is better)
reward_func=lambda a, b: [-abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
formatters=[lambda example: example["prompt"]] * 2,
args=MAGRPOConfig(),  # default MAGRPO hyperparameters
)
trainer.train()

We gratefully acknowledge the help of all contributors:
Shuo Liu | Tianle Chen | Ryan Amiri | Zeyu Liang
For new contributors, please see the contributing guidelines for setting up a development environment.
CoMLRL was developed using substantial computational resources. Its growth has been made possible by the generous support of the following organizations and institutions.
We welcome computational sponsorship to support the continued development of CoMLRL. If you are interested in supporting this project, please contact us.
Please cite the following papers if you find this library useful in your research:
@inproceedings{liu2025llmcollabmarl,
title = {LLM Collaboration With Multi-Agent Reinforcement Learning},
author = {Liu, Shuo and Liang, Zeyu and Lyu, Xueguang and Amato, Christopher},
booktitle = {Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence},
year = {2026}
}
@article{liu2026learndecllmcollabmaac,
title = {Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic},
author = {Liu, Shuo and Chen, Tianle and Amiri, Ryan and Amato, Christopher},
journal = {arXiv preprint arXiv:2601.21972},
year = {2026}
}

