Commit b766898

YihengWang authored and committed
update README
1 parent f4cfaf6 commit b766898

5 files changed (+9 -6 lines changed)

README.md

Lines changed: 9 additions & 6 deletions
@@ -16,10 +16,13 @@ A unified evaluation toolkit and leaderboard for rigorously assessing the scient
 <img src="assets/icon/welcome.png" alt="welcome" height="24" style="vertical-align:middle;" />
 &nbsp;Welcome to the official repository of <strong>SciEval</strong>!

+<div align="center">
+<img src="assets/SciEvalKit.png" alt="SciEval capability radar" width="90%">
 </div>

-## <img src="assets/icon/why.png" alt="why" height="28" style="vertical-align:middle;" />&nbsp;Why SciEval?
+</div>

+## <img src="assets/icon/why.png" alt="why" height="28" style="vertical-align:middle;" />&nbsp;Why SciEval?

 **SciEval** is an open‑source evaluation framework and leaderboard aimed at measuring the **scientific intelligence** of large language and vision–language models.
 Although modern frontier models often achieve *~90* on general‑purpose benchmarks, their performance drops sharply on rigorous, domain‑specific scientific tasks—revealing a persistent **general‑versus‑scientific gap** that motivates the need for SciEval.
@@ -30,10 +33,6 @@ Its design is shaped by following core ideas:
 - **Capability‑oriented & reproducible ▸** A unified toolkit for **dataset construction, prompt engineering, inference, and expert‑aligned scoring** ensures transparent and repeatable comparisons.
 - **Grounded in real scenarios ▸** Benchmarks use domain‑specific data and tasks so performance reflects **actual scientific practice**, not synthetic proxies.

-<div align="center">
-<img src="assets/github.png" alt="SciEval capability radar" width="100%">
-</div>
-

 ## <img src="assets/icon/progress.png" alt="progress" height="28" style="vertical-align:middle;" />&nbsp;Progress in Scientific Intelligence

@@ -51,6 +50,10 @@ Its design is shaped by following core ideas:

 ## <img src="assets/icon/key.png" alt="key" height="28" style="vertical-align:middle;" />&nbsp;Key Features

+<div align="center">
+<img src="assets/radar.png" alt="SciEval capability radar" width="70%">
+</div>
+
 | Category | Highlights |
 | ----------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | **Seven Core Dimensions** | Scientific Knowledge Understanding, Scientific Code Generation, Scientific Symbolic Reasoning, Scientific Hypothesis Generation, Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding |
@@ -62,7 +65,7 @@ Its design is shaped by following core ideas:
 <img src="assets/framework.png" alt="SciEval framework overview" width="65%">
 </div>

-<p align="center">
+<p align="left">
 <em>
 An overview of the SciEval framework, illustrating how heterogeneous scientific datasets, unified prompt construction, model inference, and capability-oriented evaluators are integrated into a single reproducible evaluation pipeline.
 </em>

assets/PrismaEval.png

-1.31 MB
Binary file not shown.

assets/SciEvalKit.png

6.36 MB

assets/radar.png

813 KB

docs/SciEvalKit.pdf

2.81 MB
Binary file not shown.
