Cooperative Multi-LLM Reinforcement Learning (CoMLRL) is an open-source library for training multiple LLMs to collaborate using Multi-Agent Reinforcement Learning (MARL). It provides implementations of various MARL algorithms for LLM collaboration and support for different environments and benchmarks.
CoMLRL can be installed via PyPI, conda-forge, or from source:
# Install from PyPI
pip install comlrl
# Install from conda-forge
conda install -c conda-forge comlrl
# Install from source
git clone https://github.com/OpenMLRL/CoMLRL.git
cd CoMLRL && pip install -e .
# Install a PyTorch build compatible with your platform (see pytorch.org)
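To confirm the install, a quick import check:

# Verify that the package imports correctly
python -c "import comlrl"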
CoMLRL provides cooperative MARL trainers for optimizing decentralized LLM collaboration:
- Multi-Agent REINFORCE: Critic-free policy gradient methods, including MAREINFORCE, MAGRPO, MARLOO, and MAREMAX.
  - Aligned individual-response joint with joint_mode='aligned'.
  - Memory-efficient cross joint with joint_mode='cross' (see the sketch after this list).
- Multi-Agent Actor-Critic: Actor-critic methods, including IAC and MAAC.
  - Independent actor-critic (separate critic or value head over the LLM backbone).
  - Centralized critic over joint prompts with separate actors.
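A minimal sketch of choosing between the two joint modes. That joint_mode is a MAGRPOConfig field is an assumption based on the option names above; check the library documentation for where this option actually lives:

from comlrl.trainers.reinforce import MAGRPOConfig

# Assumption: joint_mode is accepted by MAGRPOConfig.
aligned_config = MAGRPOConfig(joint_mode="aligned")  # pair each agent's i-th response
cross_config = MAGRPOConfig(joint_mode="cross")      # join across response combinations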
Environments that simulate real-world tasks for training and evaluating LLM collaboration:
- Writing: Multiple LLM agents collaborate on processing articles.
- Coding: Generate code solutions for programming problems (see the reward sketch after this list).
  - MBPP - Mostly Basic Python Problems.
  - HumanEval - Handwritten evaluation problems.
  - CoopHumanEval - A HumanEval variant with a cooperative structure.
  - ClassEval - Complete class-level code based on attributes and docstrings.
- Minecraft: Collaborative building tasks in Minecraft.
  - StrBuild - Building structures based on string blueprints.
  - HouseBuild - Constructing houses from given blueprints while defending against spider attacks.
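As a sketch of what a coding-environment reward can look like, the function below scores each joint program by the fraction of unit tests it passes. The toy harness, the placeholder test, and the choice to concatenate the two agents' snippets are illustrative assumptions, not the library's built-in behavior:

def passes_tests(program, tests):
    # Toy harness: execute the candidate program, then count passing asserts.
    # No sandboxing here; real benchmark evaluation should isolate execution.
    namespace = {}
    try:
        exec(program, namespace)
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)
            passed += 1
        except Exception:
            pass
    return passed / max(len(tests), 1)

def coding_reward(agent0_responses, agent1_responses):
    # One reward per sampled joint response, matching the reward_func
    # signature used in the quick start below.
    tests = ["assert add(1, 2) == 3"]  # placeholder MBPP-style test
    return [passes_tests(a + "\n" + b, tests)
            for a, b in zip(agent0_responses, agent1_responses)]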
Quick start: train two Qwen2.5-0.5B agents to summarize Reddit posts with MAGRPO:
from datasets import load_dataset
from transformers import AutoTokenizer
from comlrl.trainers.reinforce import MAGRPOConfig, MAGRPOTrainer
# Load dataset and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("trl-lib/tldr", split="train").select(range(128))
# Initialize trainer and start training
trainer = MAGRPOTrainer(
model="Qwen/Qwen2.5-0.5B",
num_agents=2,
tokenizer=tokenizer,
train_dataset=dataset,
# Toy reward: push the second response's length toward 3x the first's
# (negated distance, so that larger reward is better)
reward_func=lambda a, b: [-abs(max(len(b[0]), 1) / max(len(a[0]), 1) - 3.0)],
formatters=[lambda example: example["prompt"]] * 2,
args=MAGRPOConfig(),  # default MAGRPO hyperparameters
)
trainer.train()

We gratefully acknowledge the help of all contributors:
Shuo Liu | Tianle Chen | Ryan Amiri | Zeyu Liang
For new contributors, please see the contributing guidelines for setting up a development environment.
CoMLRL was developed using substantial computational resources. Its growth has been made possible by the generous support of the following organizations and institutions.
We welcome computational sponsorship to support the continued development of CoMLRL. If you are interested in supporting this project, please contact us.
Please cite the following papers if you find this library useful in your research:
@inproceedings{liu2025llmcollabmarl,
title = {LLM Collaboration With Multi-Agent Reinforcement Learning},
author = {Liu, Shuo and Liang, Zeyu and Lyu, Xueguang and Amato, Christopher},
booktitle = {Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence},
year = {2026}
}
@article{liu2026learndecllmcollabmaac,
title = {Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic},
author = {Liu, Shuo and Chen, Tianle and Amiri, Ryan and Amato, Christopher},
journal = {arXiv preprint arXiv:2601.21972},
year = {2026}
}

