Nicholas Edwards¹*, Yukyung Lee²*, Yujun (Audrey) Mao², Yulu Qin², Sebastian Schuster¹³†, Najoung Kim²†
¹University College London, ²Boston University, ³University of Vienna
*, † Equal contribution
Submit your agent here: Go to the submission page 🚀
.
├── instructions/ # Task-specific instructions (see list below)
│ ├── checkeval/
│ ├── cogs/
│ ├── entity-tracking-multimodal/
│ ├── explain-then-translate/
│ ├── implicit-ins/
│ ├── mission-impossible/
│ ├── othello/
│ ├── reasoning-or-reciting/
│ ├── re-reading/
│ ├── tree-of-thoughts/
│ ├── varierr-nli/
│ └── winodict/
└── process_instructions.py # Script for processing instructions

Each subdirectory inside instructions/ contains an instructions.md file that describes the task setting.
- checkeval
- cogs
- entity-tracking-multimodal
- implicit-ins
- mission-impossible
- othello
- reasoning-or-reciting
- re-reading
- tree-of-thoughts
- varierr-nli
- winodict
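
As a quick way to check the layout described above, the following minimal sketch walks an instructions/ directory and collects the instructions.md file for each task. It is only an illustration and does not reflect what process_instructions.py actually does. The function name find_task_instructions and the default root path are assumptions introduced here.

```python
from pathlib import Path


def find_task_instructions(root="instructions"):
    """Map each task directory name to the path of its instructions.md.

    Hypothetical helper for illustration; not part of the repository.
    Returns an empty dict if the root directory does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}

    found = {}
    # Sort so the output order is stable across platforms.
    for task_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        md_file = task_dir / "instructions.md"
        # Skip subdirectories that lack an instructions.md file.
        if md_file.is_file():
            found[task_dir.name] = md_file
    return found


if __name__ == "__main__":
    for name, path in find_task_instructions().items():
        print(f"{name}: {path}")
```

Run from the repository root, this would print one line per task directory that contains an instructions.md file, which makes it easy to spot a task whose instructions are missing.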
@article{edwards2025rex,
title={RExBench: Can coding agents autonomously implement AI research extensions?},
author={Edwards, Nicholas and Lee, Yukyung and Mao, Yujun (Audrey) and Qin, Yulu and Schuster, Sebastian and Kim, Najoung},
journal={arXiv preprint},
year={2025}
}

Team RExBench (rexbench@googlegroups.com)