Commit 509e96e

curie arxiv
1 parent e1edbbf commit 509e96e

File tree

1 file changed: +18 -0 lines changed

source/_data/SymbioticLab.bib

Lines changed: 18 additions & 0 deletions
@@ -1988,3 +1988,21 @@ @Article{mordal:arxiv25
   We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
   }
 }
+
+@Article{curie:arxiv25,
+  author = {Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
+  title = {{Curie}: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
+  year = {2025},
+  month = {Feb},
+  volume = {abs/2502.16069},
+  archivePrefix = {arXiv},
+  eprint = {2502.16069},
+  url = {https://arxiv.org/abs/2502.16069},
+  publist_link = {code || https://github.com/Just-Curieous/Curie},
+  publist_confkey = {arXiv:2502.16069},
+  publist_link = {paper || https://arxiv.org/abs/2502.16069},
+  publist_topic = {Systems + AI},
+  publist_abstract = {
+  Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
+  }
+}
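As a quick sanity check on the added entry, the following minimal sketch parses it and verifies that the author field splits correctly on the BibTeX "and" separator. It assumes the third-party bibtexparser Python package (v1.x) is available; that package is not part of this repository, and the trimmed-down entry below is only for illustration.

# Minimal sketch: parse the new Curie BibTeX entry and check its fields.
# Assumes the third-party bibtexparser package (v1.x); not part of this repo.
import bibtexparser

ENTRY = r"""
@Article{curie:arxiv25,
  author = {Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
  title = {{Curie}: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
  year = {2025},
  eprint = {2502.16069},
}
"""

db = bibtexparser.loads(ENTRY)            # returns a BibDatabase
entry = db.entries[0]                     # entry fields as a plain dict
assert entry["ID"] == "curie:arxiv25"     # key must not collide with mordal:arxiv25
authors = entry["author"].split(" and ")  # BibTeX separates authors with "and"
assert len(authors) == 10, f"expected 10 authors, got {len(authors)}"
print(entry["title"], "--", authors[0])

Note that BibTeX treats commas inside the author field as part of a single name ("Last, First"), so a comma-separated author list would be parsed as one malformed author; the "and" separators in the entry above are required.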
