Commit 2aea412

update readme
1 parent e47866f commit 2aea412

File tree: 1 file changed (+11 −11 lines)


README.md

Lines changed: 11 additions & 11 deletions
````diff
@@ -1,16 +1,16 @@
-# PrismaEval ToolKit
+# SciEval ToolKit
 
-**PrismaEval** is an open-source evaluation framework and leaderboard for measuring the *scientific intelligence* of large language and vision–language models.
+**SciEval** is an open-source evaluation framework and leaderboard for measuring the *scientific intelligence* of large language and vision–language models.
 It targets the full research workflow, from scientific image understanding to hypothesis generation, and provides a reproducible toolkit that unifies data loading, prompt construction, inference and evaluation.
 
 <div align="center">
-<img src="assets/github.png" alt="PrismaEval capability radar" width="100%">
+<img src="assets/github.png" alt="SciEval capability radar" width="100%">
 </div>
 
-Modern frontier language models routinely score near *90* on general‑purpose benchmarks, yet even the strongest model (e.g., **Gemini 3 Pro**) drops below *60* when challenged by rigorous, domain‑specific scientific tasks. PrismaEval makes this **general‑versus‑scientific gap** explicit and supplies the evaluation infrastructure needed to guide the integration of broad instruction‑tuned abilities with specialised skills in coding, symbolic reasoning and diagram understanding.
+Modern frontier language models routinely score near *90* on general‑purpose benchmarks, yet even the strongest model (e.g., **Gemini 3 Pro**) drops below *60* when challenged by rigorous, domain‑specific scientific tasks. SciEval makes this **general‑versus‑scientific gap** explicit and supplies the evaluation infrastructure needed to guide the integration of broad instruction‑tuned abilities with specialised skills in coding, symbolic reasoning and diagram understanding.
 
 <div align="center">
-<img src="assets/general_scientific_comparison.png" alt="PrismaEval capability radar" width="100%">
+<img src="assets/general_scientific_comparison.png" alt="SciEval capability radar" width="100%">
 </div>
 
 ## Key Features
````
````diff
@@ -24,7 +24,7 @@ Modern frontier language models routinely score near *90* on general‑purpose
 <hr style="height:1px;background:black;border:none;" />
 
 ## News
-* **2025‑12‑05 · PrismaEval v1 Launch**
+* **2025‑12‑05 · SciEval v1 Launch**
 
 &nbsp;&nbsp;• Initial public release of a science‑focused evaluation toolkit and leaderboard devoted to realistic research workflows.
 
 &nbsp;&nbsp;• Initial evaluation of 20 frontier models (closed & open source) now live at <https://discovery.intern-ai.org.cn/sciprismax/leaderboard>.
````
````diff
@@ -47,8 +47,8 @@ guides, or consult the VLMEvalKit tutorial
 
 ### 1 · Install
 ```bash
-git clone https://github.com/PrismaEval/PrismaEval-Kit.git
-cd PrismaEval-Kit
+git clone https://github.com/InternScience/SciEvalKit.git
+cd SciEvalKit
 pip install -e .[all]  # brings in vllm, openai‑sdk, hf_hub, etc.
 ```
 
````
````diff
@@ -60,7 +60,7 @@ OPENAI_API_KEY=...
 GOOGLE_API_KEY=...
 DASHSCOPE_API_KEY=...
 ```
-If no keys are provided, PrismaEval falls back to rule‑based scoring
+If no keys are provided, SciEval falls back to rule‑based scoring
 whenever possible.
 
 ### 3 · Run an API demo test
````
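The key-fallback behaviour described in step 2 can be sketched as follows. This is a minimal illustration only, not SciEval's actual implementation; the function name `pick_scorer` and the `"llm-judge"` / `"rule-based"` labels are hypothetical, while the environment-variable names come from the snippet above.

```python
import os

# Hypothetical sketch: prefer an LLM judge when any API key is configured,
# otherwise fall back to rule-based scoring (as the README describes).
JUDGE_KEYS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "DASHSCOPE_API_KEY")

def pick_scorer(env=None):
    """Return 'llm-judge' if any judge API key is set, else 'rule-based'."""
    env = os.environ if env is None else env
    if any(env.get(k) for k in JUDGE_KEYS):
        return "llm-judge"
    return "rule-based"

# With no keys configured, scoring falls back to rule-based matching.
print(pick_scorer({}))                            # -> rule-based
print(pick_scorer({"OPENAI_API_KEY": "sk-test"})) # -> llm-judge
```

Passing the environment mapping explicitly (rather than always reading `os.environ`) keeps the selection logic easy to test.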
````diff
@@ -88,6 +88,6 @@ python run.py \
 
 ## Acknowledgements
 
-PrismaEval ToolKit is built on top of the excellent **[VLMEvalKit](https://github.com/open-compass/VLMEvalKit)** framework and we thank the OpenCompass team not only for open‑sourcing their engine, but also for publishing thorough deployment and development guides ([Quick Start](https://vlmevalkit.readthedocs.io/en/latest/Quickstart.html)[Development Notes](https://vlmevalkit.readthedocs.io/en/latest/Development.html)) that streamlined our integration.
+SciEval ToolKit is built on top of the excellent **[VLMEvalKit](https://github.com/open-compass/VLMEvalKit)** framework, and we thank the OpenCompass team not only for open‑sourcing their engine but also for publishing thorough deployment and development guides ([Quick Start](https://vlmevalkit.readthedocs.io/en/latest/Quickstart.html) and [Development Notes](https://vlmevalkit.readthedocs.io/en/latest/Development.html)) that streamlined our integration.
 
-We also acknowledge the core PrismaEval contributors for their efforts on dataset curation, evaluation design, and engine implementation: Jun Yao, Han Deng, Yizhou Wang, Jiabei Xiao, Jiaqi Liu, Encheng Su, Yujie Liu, Weida Wang, Junchi Yao, Haoran Sun, Runmin Ma, Bo Zhang, Dongzhan Zhou, Shufei Zhang, Peng Ye, Xiaosong Wang, and Shixiang Tang, as well as all community testers who provided early feedback.
+We also acknowledge the core SciEval contributors for their efforts on dataset curation, evaluation design, and engine implementation: Jun Yao, Han Deng, Yizhou Wang, Jiabei Xiao, Jiaqi Liu, Encheng Su, Yujie Liu, Weida Wang, Junchi Yao, Haoran Sun, Runmin Ma, Bo Zhang, Dongzhan Zhou, Shufei Zhang, Peng Ye, Xiaosong Wang, and Shixiang Tang, as well as all community testers who provided early feedback.
````
