Commit 21d1975

author
unknown
committed
“update”
1 parent 07ebdbd commit 21d1975

2 files changed: +2 -0 lines changed

paper.pdf

-1 Bytes
Binary file not shown.

paper/sections/7-conclusion.tex

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
\section{Conclusion}

This work advances the study of Scientific General Intelligence (SGI) in both theory and practice. Grounded in the Practical Inquiry Model, we formalize SGI as the capacity to navigate the iterative cycle of \emph{Deliberation}, \emph{Conception}, \emph{Action}, and \emph{Perception} with the versatility of a human scientist. Building on this principle-grounded definition, we operationalize SGI through SGI-Bench, a comprehensive, scientist-aligned benchmark that instantiates four core task families: Scientific Deep Research, Idea Generation, AI-Assisted Scientific Experiment (dry/wet), and Scientific Experimental Reasoning. Complemented by our agentic evaluation framework and multi-metric protocol, SGI-Bench enables scalable, transparent, and domain-faithful assessment.

Experiments reveal a consistent pattern: in \emph{Deep Research}, models show step-level alignment but low exact-match accuracy (10--20\%), with brittleness in quantitative reasoning; in \emph{Idea Generation}, hypotheses are fluent but underspecified and infeasible; in \emph{Dry Experiment}, code is executable but PassAll@k remains low; in \emph{Wet Experiment}, sequences show omissions and misordering; and in \emph{Experimental Reasoning}, models perform better on causal than on comparative reasoning, with persistent multimodal challenges. These results highlight gaps between linguistic fluency and integrated scientific cognition. Moreover, SGI exhibits \emph{dynamic capacity}: Test-Time Reinforcement Learning with novelty rewards improves idea generation without reference answers.

Taken together, SGI-Bench clarifies both what SGI \emph{is} and where current systems \emph{fail}. By integrating principled task design, multi-metric evaluation, and agentic tool use, our framework provides a concrete foundation for systematically advancing SGI. Looking forward, the combination of numerically robust reasoning, planning-aware conception, executable experimentation, comparative multimodal inference, dynamic test-time learning, and efficient tool ecosystems charts a clear path toward general intelligence systems capable of genuine scientific discovery.
