
Commit 2bc70b6

Curie arxiv (#300)
* curie arxiv
* Update SymbioticLab.bib for Curie
* Update check.yml
* Update ci-workflow.yml
1 parent e1edbbf

File tree

3 files changed: +20 −2 lines changed


.github/workflows/check.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
           node-version: 16
       # Caching dependencies to speed up workflows. (GitHub will remove any cache entries that have not been accessed in over 7 days.)
       - name: Cache node modules
-        uses: actions/cache@v2
+        uses: actions/cache@v4
         id: cache
         with:
           path: ~/.npm
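For reference, a complete actions/cache@v4 step normally pairs the cached path with a lockfile-derived key (actions/cache@v2 runs on a deprecated runner runtime, which is presumably what motivated this bump). A minimal sketch; the key and restore-keys values below are illustrative assumptions, not taken from this repository's workflows:

      - name: Cache node modules
        uses: actions/cache@v4
        id: cache
        with:
          path: ~/.npm
          # Hypothetical key: hashing the lockfile invalidates the cache when dependencies change.
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-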

.github/workflows/ci-workflow.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ jobs:

      # Caching dependencies to speed up workflows. (GitHub will remove any cache entries that have not been accessed in over 7 days.)
      - name: Cache node modules
-        uses: actions/cache@v2
+        uses: actions/cache@v4
         id: cache
         with:
           path: ~/.npm

source/_data/SymbioticLab.bib

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1988,3 +1988,21 @@ @Article{mordal:arxiv25
19881988
We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using up to 8.9×-11.6× lower GPU hours than grid search. In the process of our evaluation, we have also discovered new VLMs that outperform their state-of-the-art counterparts.
19891989
}
19901990
}
1991+
1992+
@Article{curie:arxiv25,
1993+
author = {Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
1994+
title = {{Curie}: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
1995+
year = {2025},
1996+
month = {Feb},
1997+
volume = {abs/2502.16069},
1998+
archivePrefix = {arXiv},
1999+
eprint = {2502.16069},
2000+
url = {https://arxiv.org/abs/2502.16069},
2001+
publist_link = {code || https://github.com/Just-Curieous/Curie},
2002+
publist_confkey = {arXiv:2502.16069},
2003+
publist_link = {paper || https://arxiv.org/abs/2502.16069},
2004+
publist_topic = {Systems + AI},
2005+
publist_abstract = {
2006+
Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers, and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at https://github.com/Just-Curieous/Curie.
2007+
}
2008+
}
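In a LaTeX document that loads SymbioticLab.bib, the new entry would be cited by its key; the publist_* fields are site-specific metadata for the publication-list generator, and unknown fields like these are ignored by standard BibTeX styles:

\cite{curie:arxiv25}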
