yukyunglee/RExBench

 
 

RExBench: Can coding agents autonomously implement AI research extensions?

Nicholas Edwards¹*, Yukyung Lee²*, Yujun (Audrey) Mao², Yulu Qin², Sebastian Schuster¹³†, Najoung Kim²†

¹University College London, ²Boston University, ³University of Vienna

*, † Equal contribution

Paper | Website | Dataset 🤗

📊 Submission Page

Submit your agent on the submission page 🚀

📂 Repository Structure

.
├── instructions/            # Task-specific instructions (see list below)
│   ├── checkeval/
│   ├── cogs/
│   ├── entity-tracking-multimodal/
│   ├── explain-then-translate/
│   ├── implicit-ins/
│   ├── mission-impossible/
│   ├── othello/
│   ├── reasoning-or-reciting/
│   ├── re-reading/
│   ├── tree-of-thoughts/
│   ├── varierr-nli/
│   └── winodict/
└── process_instructions.py     # Script for processing instructions

Each subdirectory inside instructions/ contains an instructions.md file that describes the task setting.
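
As a minimal sketch of how this layout could be consumed, the snippet below collects every `instructions.md` under `instructions/` into a dictionary keyed by task name. It assumes only the directory structure shown above; the helper name `load_instructions` is hypothetical and is not taken from `process_instructions.py`, whose actual behavior is not described here.

```python
from pathlib import Path


def load_instructions(root="instructions"):
    """Return {task_name: instructions text} for each task subdirectory.

    Hypothetical helper for illustration; it relies only on the
    instructions/<task>/instructions.md layout described in this README.
    """
    tasks = {}
    # Match instructions.md exactly one level below the root directory.
    for path in sorted(Path(root).glob("*/instructions.md")):
        tasks[path.parent.name] = path.read_text(encoding="utf-8")
    return tasks


if __name__ == "__main__":
    # Print each task name with the size of its instruction file.
    for name, text in load_instructions().items():
        print(f"{name}: {len(text)} characters")
```

Subdirectories without an `instructions.md` file are skipped, so the result only contains tasks that have a description.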

✅ Included Tasks

  • checkeval
  • cogs
  • entity-tracking-multimodal
  • explain-then-translate
  • implicit-ins
  • mission-impossible
  • othello
  • reasoning-or-reciting
  • re-reading
  • tree-of-thoughts
  • varierr-nli
  • winodict

🧠 Baseline Agents

  • Agent 1: aider (GitHub)
  • Agent 2: OpenHands (GitHub)
  • Agent 3: Claude Code

Citation

@article{edwards2025rex,
  title={RExBench: Can coding agents autonomously implement AI research extensions?},
  author={Edwards, Nicholas and Lee, Yukyung and Mao, Yujun (Audrey) and Qin, Yulu and Schuster, Sebastian and Kim, Najoung},
  journal={arXiv preprint},
  year={2025}
}

Contact

Team RExBench (rexbench@googlegroups.com)
