
Commit ddaa1d5

refactor inference core, unify error handling, and enhance judge flexibility
Major changes to the inference architecture and evaluation pipeline (illustrative sketches of the key mechanisms follow the list):

1. Core Architecture & Error Handling:
   - Centralized retry and exception-handling logic in `BaseAPI` and `BaseModel`.
   - Implemented a fail-fast mechanism that exits immediately on critical errors (OOM, auth failures).
   - Introduced `ignore-patterns` (the "Green Light" mechanism) to gracefully handle and record specific errors (e.g., policy violations, content filtering) as valid responses.
   - Cleaned up `generate_inner` across all API wrappers by removing redundant try-except blocks and loops.

2. Streaming & Performance:
   - Added configurable `--stream` support for the major API wrappers (OpenAI, Claude, Gemini, etc.).
   - Implemented `image_mem` support in wrappers to allow zero-IO, in-memory Base64 image passing, bypassing temporary file creation.

3. Judge & Model Initialization:
   - Refactored `build_judge_model` to support dynamic class loading, config-based routing, and fallback to OpenAI-compatible protocols.
   - Unified model-initialization logic to respect `Config > CLI` priority for parameters such as `retry`, `verbose`, and `stream`.

4. Utilities & Environment:
   - Implemented a lazy-loading proxy for heavy-dependency datasets (e.g., AstroVisBench) to resolve import hell.
   - Added an environment-variable isolation context: `_EVAL`-suffixed variables (e.g., `OPENAI_API_KEY_EVAL`) override their base settings during evaluation only.
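The error handling in item 1 boils down to one retry loop that sorts exceptions into three buckets: abort, record, or retry. A minimal sketch, assuming illustrative names (`FailFastError`, `FAIL_FAST_PATTERNS`, `IGNORE_PATTERNS`) rather than the commit's actual identifiers or pattern lists:

```python
import re
import time


class FailFastError(Exception):
    """Critical failure (OOM, auth) that should abort the whole run."""


class BaseAPI:
    # Pattern lists are hypothetical; the real ones live in the wrappers/config.
    FAIL_FAST_PATTERNS = [r'out of memory', r'401', r'invalid api key']
    IGNORE_PATTERNS = [r'content filter', r'policy violation']  # "Green Light"

    def __init__(self, retry: int = 3, wait: float = 3.0):
        self.retry = retry
        self.wait = wait

    def generate_inner(self, message):
        # Subclasses implement the raw API call; no try-except needed here.
        raise NotImplementedError

    def generate(self, message):
        for _ in range(self.retry):
            try:
                return self.generate_inner(message)
            except Exception as err:
                text = str(err).lower()
                if any(re.search(p, text) for p in self.FAIL_FAST_PATTERNS):
                    raise FailFastError(text)   # fail fast: exit immediately
                if any(re.search(p, text) for p in self.IGNORE_PATTERNS):
                    return f'[BLOCKED] {text}'  # record as a valid response
                time.sleep(self.wait)           # transient: back off, retry
        return 'Failed to obtain answer via API.'
```

Because classification happens once in `generate`, subclass `generate_inner` implementations no longer need their own try-except loops.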
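For item 2's `--stream` flag, a hedged sketch of how an OpenAI-compatible wrapper can return the same type whether or not streaming is enabled. Only the OpenAI Python SDK calls here are standard; the `chat` helper itself is hypothetical:

```python
from openai import OpenAI


def chat(client: OpenAI, model: str, messages: list, stream: bool = False) -> str:
    if not stream:
        resp = client.chat.completions.create(model=model, messages=messages)
        return resp.choices[0].message.content
    # Streaming: accumulate deltas so callers see the same str return type.
    parts = []
    for chunk in client.chat.completions.create(
            model=model, messages=messages, stream=True):
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return ''.join(parts)
```

Streaming keeps long generations from thinking models alive past gateway read timeouts, which is presumably why it is configurable per wrapper.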
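The `image_mem` path skips the write-to-temp-file round trip by encoding images to Base64 straight from memory. A minimal sketch (the helper name and placement are assumptions; only the flag name `image_mem` comes from the commit message):

```python
import base64
import io

from PIL import Image


def encode_image_in_memory(img: Image.Image, fmt: str = 'PNG') -> str:
    """Encode a PIL image to Base64 without creating a temporary file."""
    buf = io.BytesIO()
    img.save(buf, format=fmt)
    return base64.b64encode(buf.getvalue()).decode('utf-8')


# Usage: embed directly into an OpenAI-style image_url payload, e.g.
# url = f'data:image/png;base64,{encode_image_in_memory(img)}'
```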
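A sketch of the `build_judge_model` routing described in item 3: an explicit class in the config wins, anything else falls back to an OpenAI-compatible wrapper, and config values override CLI flags. The config keys and the `OpenAIWrapper` import location are assumptions, not the commit's confirmed API:

```python
import importlib


def build_judge_model(cfg: dict, **cli_kwargs):
    # Config > CLI: values from the config file override CLI-provided ones.
    for key in ('retry', 'verbose', 'stream'):
        if key in cfg:
            cli_kwargs[key] = cfg[key]
    if 'class' in cfg:
        # Dynamic class loading: the config names the module and class to use.
        module = importlib.import_module(cfg.get('module', 'scieval.api'))
        return getattr(module, cfg['class'])(**cli_kwargs)
    # Fallback: treat the judge as an OpenAI-compatible endpoint.
    from scieval.api import OpenAIWrapper  # assumed wrapper location
    return OpenAIWrapper(model=cfg['model'], api_base=cfg.get('api_base'),
                         **cli_kwargs)
```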
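Item 4's two utilities are small enough to sketch together: a lazy-import proxy so datasets with heavy dependencies (such as AstroVisBench) do not drag their imports in at package-load time, and a context manager that temporarily promotes `*_EVAL` environment variables over their base names. All names below are illustrative:

```python
import contextlib
import importlib
import os


class LazyModule:
    """Proxy that defers importing a heavy module until first attribute access."""

    def __init__(self, name: str):
        self._name = name
        self._mod = None

    def __getattr__(self, attr):
        if self._mod is None:
            self._mod = importlib.import_module(self._name)
        return getattr(self._mod, attr)


@contextlib.contextmanager
def eval_env():
    """Inside the block, e.g. OPENAI_API_KEY_EVAL shadows OPENAI_API_KEY."""
    saved = {}
    for key, value in list(os.environ.items()):
        if key.endswith('_EVAL') and len(key) > len('_EVAL'):
            base = key[:-len('_EVAL')]
            saved[base] = os.environ.get(base)
            os.environ[base] = value
    try:
        yield
    finally:  # restore the original values after evaluation
        for base, old in saved.items():
            if old is None:
                os.environ.pop(base, None)
            else:
                os.environ[base] = old
```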
1 parent 4bcce23 commit ddaa1d5

698 files changed: +1561, -3164 lines


.gitignore

Lines changed: 1 addition & 1 deletion
@@ -196,7 +196,7 @@ GPT4o_MINI/
 #apple.jpg
 #assets/LOGO.png
 #api_list.txt
-#vlmeval/gemini_tmp.py
+#scieval/gemini_tmp.py
 #run.sh
 #run_g.sh
 #tmp/

README.md

Lines changed: 3 additions & 2 deletions
@@ -36,7 +36,7 @@ English | [简体中文](/docs/zh-CN/README_zh-CN.md) | [日本語](/docs/ja/REA
 - **[2025-07-07]** Supported [**SeePhys**](https://seephys.github.io/), which is a full spectrum multimodal benchmark for evaluating physics reasoning across different knowledge levels. thanks to [**Quinn777**](https://github.com/Quinn777) 🔥🔥🔥
 - **[2025-07-02]** Supported [**OvisU1**](https://huggingface.co/AIDC-AI/Ovis-U1-3B), thanks to [**liyang-7**](https://github.com/liyang-7) 🔥🔥🔥
 - **[2025-06-16]** Supported [**PhyX**](https://phyx-bench.github.io/), a benchmark aiming to assess capacity for physics-grounded reasoning in visual scenarios. 🔥🔥🔥
-- **[2025-05-24]** To facilitate faster evaluations for large-scale or thinking models, **VLMEvalKit supports multi-node distributed inference** using **LMDeploy** (supports *InternVL Series, QwenVL Series, LLaMa4*) or **VLLM**(supports *QwenVL Series, LLaMa4*). You can activate this feature by adding the ```use_lmdeploy``` or ```use_vllm``` flag to your custom model configuration in [config.py](vlmeval/config.py) . Leverage these tools to significantly speed up your evaluation workflows 🔥🔥🔥
+- **[2025-05-24]** To facilitate faster evaluations for large-scale or thinking models, **VLMEvalKit supports multi-node distributed inference** using **LMDeploy** (supports *InternVL Series, QwenVL Series, LLaMa4*) or **VLLM**(supports *QwenVL Series, LLaMa4*). You can activate this feature by adding the ```use_lmdeploy``` or ```use_vllm``` flag to your custom model configuration in [config.py](scieval/config.py) . Leverage these tools to significantly speed up your evaluation workflows 🔥🔥🔥
 - **[2025-05-24]** Supported Models: **InternVL3 Series, Gemini-2.5-Pro, Kimi-VL, LLaMA4, NVILA, Qwen2.5-Omni, Phi4, SmolVLM2, Grok, SAIL-VL-1.5, WeThink-Qwen2.5VL-7B, Bailingmm, VLM-R1, Taichu-VLR**. Supported Benchmarks: **HLE-Bench, MMVP, MM-AlignBench, Creation-MMBench, MM-IFEval, OmniDocBench, OCR-Reasoning, EMMA, ChaXiv, MedXpertQA, Physics, MSEarthMCQ, MicroBench, MMSci, VGRP-Bench, wildDoc, TDBench, VisuLogic, CVBench, LEGO-Puzzles, Video-MMLU, QBench-Video, MME-CoT, VLM2Bench, VMCBench, MOAT, Spatial457 Benchmark**. Please refer to [**VLMEvalKit Features**](https://aicarrier.feishu.cn/wiki/Qp7wwSzQ9iK1Y6kNUJVcr6zTnPe?table=tblsdEpLieDoCxtb) for more details. Thanks to all contributors 🔥🔥🔥
 - **[2025-02-20]** Supported Models: **InternVL2.5 Series, Qwen2.5VL Series, QVQ-72B, Doubao-VL, Janus-Pro-7B, MiniCPM-o-2.6, InternVL2-MPO, LLaVA-CoT, Hunyuan-Standard-Vision, Ovis2, Valley, SAIL-VL, Ross, Long-VITA, EMU3, SmolVLM**. Supported Benchmarks: **MMMU-Pro, WeMath, 3DSRBench, LogicVista, VL-RewardBench, CC-OCR, CG-Bench, CMMMU, WorldSense**. Thanks to all contributors 🔥🔥🔥
 - **[2024-12-11]** Supported [**NaturalBench**](https://huggingface.co/datasets/BaiqiL/NaturalBench), a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with simple questions about natural imagery.
@@ -87,7 +87,8 @@ Note that some VLMs may not be able to run under certain flash-attention version
 
 ```python
 # Demo
-from vlmeval.config import supported_VLM
+from scieval.config import supported_VLM
+
 model = supported_VLM['idefics_9b_instruct']()
 # Forward Single Image
 ret = model.generate(['assets/apple.jpg', 'What is in this image?'])

docs/en/Quickstart.md

Lines changed: 2 additions & 2 deletions
@@ -96,7 +96,7 @@ torchrun --nproc-per-node=2 run.py --data MME --model qwen_chat --verbose
 # When running with `python`, only one VLM instance is instantiated, and it might use multiple GPUs (depending on its default behavior).
 # That is recommended for evaluating very large VLMs (like IDEFICS-80B-Instruct).
 
-# IDEFICS2-8B on MMBench-Video, with 8 frames as inputs and vanilla evaluation. On a node with 8 GPUs. MMBench_Video_8frame_nopack is a defined dataset setting in `vlmeval/dataset/video_dataset_config.py`.
+# IDEFICS2-8B on MMBench-Video, with 8 frames as inputs and vanilla evaluation. On a node with 8 GPUs. MMBench_Video_8frame_nopack is a defined dataset setting in `scieval/dataset/video_dataset_config.py`.
 torchrun --nproc-per-node=8 run.py --data MMBench_Video_8frame_nopack --model idefics2_8
 # GPT-4o (API model) on MMBench-Video, with 1 frame per second as inputs and pack evaluation (all questions of a video in a single query).
 python run.py --data MMBench_Video_1fps_pack --model GPT4o
@@ -131,7 +131,7 @@ Some models, such as Qwen2VL and InternVL, define extensive prompt-building meth
 
 ```python
 def use_custom_prompt(self, dataset: str) -> bool:
-    from vlmeval.dataset import DATASET_TYPE, DATASET_MODALITY
+    from scieval.dataset import DATASET_TYPE, DATASET_MODALITY
     dataset_type = DATASET_TYPE(dataset, default=None)
     if not self._use_custom_prompt:
         return False

docs/en/conf.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 author = 'VLMEvalKit Authors'
 
 # The full version, including alpha/beta/rc tags
-version_file = '../../vlmeval/__init__.py'
+version_file = '../../scieval/__init__.py'
 
 
 def get_version():

docs/ja/README_ja.md

Lines changed: 2 additions & 1 deletion
@@ -47,7 +47,8 @@ PS: 日本語の README には最新のアップデートがすべて含まれ
 
 ```python
 # デモ
-from vlmeval.config import supported_VLM
+from scieval.config import supported_VLM
+
 model = supported_VLM['idefics_9b_instruct']()
 # 単一画像のフォワード
 ret = model.generate(['assets/apple.jpg', 'この画像には何がありますか?'])

docs/zh-CN/Quickstart.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ torchrun --nproc-per-node=2 run.py --data MME --model qwen_chat --verbose
 # 使用 `python` 运行时,只实例化一个 VLM,并且它可能使用多个 GPU。
 # 这推荐用于评估参数量非常大的 VLMs(如 IDEFICS-80B-Instruct)。
 
-# 在 MMBench-Video 上评测 IDEFCIS2-8B, 视频采样 8 帧作为输入,不采用 pack 模式评测. MMBench_Video_8frame_nopack 是一个定义在 `vlmeval/dataset/video_dataset_config.py` 的数据集设定.
+# 在 MMBench-Video 上评测 IDEFCIS2-8B, 视频采样 8 帧作为输入,不采用 pack 模式评测. MMBench_Video_8frame_nopack 是一个定义在 `scieval/dataset/video_dataset_config.py` 的数据集设定.
 torchrun --nproc-per-node=8 run.py --data MMBench_Video_8frame_nopack --model idefics2_8
 # 在 MMBench-Video 上评测 GPT-4o (API 模型), 视频采样每秒一帧作为输入,采用 pack 模式评测
 python run.py --data MMBench_Video_1fps_pack --model GPT4o

docs/zh-CN/README_zh-CN.md

Lines changed: 2 additions & 1 deletion
@@ -65,7 +65,8 @@
 **如何测试一个 VLM 是否可以正常运行:**
 
 ```python
-from vlmeval.config import supported_VLM
+from scieval.config import supported_VLM
+
 model = supported_VLM['idefics_9b_instruct']()
 # 前向单张图片
 ret = model.generate(['assets/apple.jpg', 'What is in this image?'])

docs/zh-CN/conf.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 author = 'VLMEvalKit Authors'
 
 # The full version, including alpha/beta/rc tags
-version_file = '../../vlmeval/__init__.py'
+version_file = '../../scieval/__init__.py'
 
 
 def get_version():

requirements.txt

Lines changed: 14 additions & 0 deletions
@@ -38,3 +38,17 @@ transformers
 typing_extensions
 validators
 xlsxwriter
+datasets
+## clima_qa
+bert_score
+tensorflow-hub
+scikit-learn
+## CMPhysBench
+wrapt_timeout_decorator
+latex2sympy2-extended
+## PHYSICS
+pylatexenc
+math-verify
+# wrapt_timeout_decorator
+## chemBench
+loguru
