Improving Human-AI Coordination through Online Adversarial Training and Generative Models

Description

Training framework of GOAT: Generative Online Adversarial Training

paper: Improving Human-AI Coordination through Online Adversarial Training and Generative Models

demo/website: https://sites.google.com/view/goat-2025/home

Get Started

Install

First you need to have a python environment with pytorch. If you don't have, here is an example of creating a vitual conda environment:

conda create -n goat python=3.12
conda activate goat
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

And then install other packages. These (and a few missing) packages suffice to work on top of an minimal pytorch virtual environment (conda/docker)

pip install absl-py dill numpy scipy tqdm gym pettingzoo ipython pygame ipywidgets opencv-python
pip install wandb icecream setproctitle seaborn tensorboardX slackweb psutil slackweb pyastar2d einops h5py

To install module "mapbt" for MARL and "overcooked" for Overcooked!

git clone https://github.com/pareshrchaudhary/goat_overcooked.git
# Initialize and update submodules (required for overcooked environment and population data)
git submodule update --init --recursive
# Move to the root directory
cd mapbt
# Install mapbt
pip install -e .
# Install overcooked
pip install -e mapbt/envs/overcooked/overcooked_berkeley

Tips

Never work on the "main" branch directly! Create a new branch and do whatever you want
Don't forget "git rebase main" to keep updated with the "main" branch, especially before the pull request
For most cases, create new files or write additional code for a new feature, rather than modify the original code

Run (Example: Overcooked)

For a complete pipeline to train a Cooperator for overcooked, please refer to this document

Here we only show some simple examples.

Self-play on Overcooked

# cd mapbt/scripts
bash train_overcooked_sp.sh

Zero-shot Coordination on Overcooked

Generate Config

(Optional) Use the following script to generate config files if the config files are absent:

# cd mapbt/scripts
bash generate_overcooked_policy_config.sh

The config files can also be found under the "run_dir" of current experiment

Stage 1: Population-based Training

# cd mapbt/scripts
bash train_overcooked_mep_stage1.sh

Stage 2: Train the Adaptive Policy

Copy the model to this folder: "mapbt/scripts/policy_models". Specifiy the path to the policy models that will be used in stage 2 in the ``--population_yaml_path'' file

Then run

# cd mapbt/scripts
bash train_overcooked_mep_stage2.sh

Evaluatiuon with Human Proxy Model

# Train the human proxy model by behavior cloning
bash train_overcooked_bc.sh
# Move the model to mapbt/scripts/policy_models
cp "mep_adaptive_model" "mapbt/scripts/policy_models/[layout_name]/mep_adaptive.pt"
cp "bc_model" "mapbt/scripts/policy_models/[layout_name]/proxy.pt"
# Evaluate
bash eval_overcooked_mep_vs_proxy.sh

Train and Evaluate GOAT with Human Proxy Model

Ensure that policy models directory path is updated in the script and chose between different models for the adversary.

bash train_overcooked_regret.sh gpu_id "seed list"

Cite

@article{chaudhary2025improving,
  title={Improving Human-AI Coordination through Adversarial Training and Generative Models},
  author={Chaudhary, Paresh and Liang, Yancheng and Chen, Daphne and Du, Simon S and Jaques, Natasha},
  journal={arXiv preprint arXiv:2504.15457},
  year={2025}
}

Reference

This repository is built upon the official implementation of this paper

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
mapbt		mapbt
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

Description

Get Started

Install

Tips

Run (Example: Overcooked)

Self-play on Overcooked

Zero-shot Coordination on Overcooked

Generate Config

Stage 1: Population-based Training

Stage 2: Train the Adaptive Policy

Evaluatiuon with Human Proxy Model

Train and Evaluate GOAT with Human Proxy Model

Cite

Reference

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

Improving Human-AI Coordination through Online Adversarial Training and Generative Models

Description

Get Started

Install

Tips

Run (Example: Overcooked)

Self-play on Overcooked

Zero-shot Coordination on Overcooked

Generate Config

Stage 1: Population-based Training

Stage 2: Train the Adaptive Policy

Evaluatiuon with Human Proxy Model

Train and Evaluate GOAT with Human Proxy Model

Cite

Reference

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Packages