|**METR**| RE-Bench: Evaluating frontier AI R&D capabilities of language-model agents against human experts |[Paper](https://arxiv.org/abs/2411.15114), [GitHub](https://github.com/METR/RE-Bench)|
|**Sakana AI**| The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search |[Paper](https://arxiv.org/abs/2504.08066), [GitHub](https://github.com/SakanaAI/AI-Scientist-v2)|
|**Meta**| AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench |[Paper](https://arxiv.org/abs/2506.22419), [GitHub](https://github.com/facebookresearch/aira-dojo)|
> *Know another public project that cites or forks AIDE?
> [Open a PR](https://github.com/WecoAI/aideml/pulls) and add it to the table!*