
Commit 2bc70b6

Curie arxiv (#300)
* curie arxiv
* Update SymbioticLab.bib for Curie
* Update check.yml
* Update ci-workflow.yml
1 parent e1edbbf

File tree

3 files changed: +20 −2 lines changed


.github/workflows/check.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
           node-version: 16
       # Caching dependencies to speed up workflows. (GitHub will remove any cache entries that have not been accessed in over 7 days.)
       - name: Cache node modules
-        uses: actions/cache@v2
+        uses: actions/cache@v4
         id: cache
         with:
           path: ~/.npm
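For reference, a complete actions/cache@v4 step normally pairs the cached path with a lockfile-derived key (actions/cache@v2 runs on a deprecated runner runtime, which is presumably what motivated this bump). A minimal sketch; the key and restore-keys values below are illustrative assumptions, not taken from this repository's workflows:

      - name: Cache node modules
        uses: actions/cache@v4
        id: cache
        with:
          path: ~/.npm
          # Hypothetical key: hashing the lockfile invalidates the cache when dependencies change.
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-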

.github/workflows/ci-workflow.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ jobs:

      # Caching dependencies to speed up workflows. (GitHub will remove any cache entries that have not been accessed in over 7 days.)
      - name: Cache node modules
-        uses: actions/cache@v2
+        uses: actions/cache@v4
         id: cache
         with:
           path: ~/.npm

source/_data/SymbioticLab.bib

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1988,3 +1988,21 @@ @Article{mordal:arxiv25
19881988
We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
19891989
}
19901990
}
1991+
1992+
@Article{curie:arxiv25,
1993+
author = {Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
1994+
title = {{Curie}: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
1995+
year = {2025},
1996+
month = {Feb},
1997+
volume = {abs/2502.16069},
1998+
archivePrefix = {arXiv},
1999+
eprint = {2502.16069},
2000+
url = {https://arxiv.org/abs/2502.16069},
2001+
publist_link = {code || https://github.com/Just-Curieous/Curie},
2002+
publist_confkey = {arXiv:2502.16069},
2003+
publist_link = {paper || https://arxiv.org/abs/2502.16069},
2004+
publist_topic = {Systems + AI},
2005+
publist_abstract = {
2006+
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
2007+
}
2008+
}
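In a LaTeX document that loads SymbioticLab.bib, the new entry would be cited by its key; the publist_* fields are site-specific metadata for the publication-list generator, and unknown fields like these are ignored by standard BibTeX styles:

\cite{curie:arxiv25}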
