
Commit e47866f

update readme
1 parent 3fff7a8 commit e47866f


docs/en/Quickstart.md

Lines changed: 16 additions & 16 deletions
@@ -7,8 +7,8 @@ Before running the evaluation script, you need to configure the VLMs and correct
### Installation

```bash
-git clone https://github.com/open-compass/VLMEvalKit.git
-cd VLMEvalKit
+git clone https://github.com/open-compass/SciEvalKit.git
+cd SciEvalKit
pip install -e .
```

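A quick sanity check of the install, as a minimal sketch: the `vlmutil dlist all` command referenced later in this Quickstart lists the supported dataset names once the package is installed.

```bash
# Sketch: after `pip install -e .`, the vlmutil CLI should be on PATH;
# `vlmutil dlist all` (referenced under Basic Arguments) lists supported datasets.
vlmutil dlist all
```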
@@ -18,10 +18,10 @@ To use API models (e.g., GPT-4v, Gemini-Pro-V) for inference, you must set up AP

> **Note:** Some datasets require an LLM as a Judge and have default evaluation models configured (see *Extra Notes*). You also need to configure the corresponding APIs when evaluating these datasets.

-You can place the required keys in `$VLMEvalKit/.env` or set them directly as environment variables. If you choose to create a `.env` file, the content should look like this:
+You can place the required keys in `$SciEvalKit/.env` or set them directly as environment variables. If you choose to create a `.env` file, the content should look like this:

```bash
-# .env file, place it under $VLMEvalKit
+# .env file, place it under $SciEvalKit

# --- API Keys for Proprietary VLMs ---
# QwenVL APIs
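# A minimal sketch of the alternative mentioned above: instead of a .env file,
# the same keys can be exported as shell environment variables. The variable
# names follow those referenced elsewhere in this Quickstart; values are placeholders.
export OPENAI_API_KEY=<your_key>
export OPENAI_API_BASE=<your_api_base>
export ANTHROPIC_API_KEY=<your_key>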
@@ -62,9 +62,9 @@ Fill in your keys where applicable. These API keys will be automatically loaded

## Step 1: Configuration

-**VLM Configuration:** All VLMs are configured in `vlmeval/config.py`. For some VLMs (e.g., MiniGPT-4, LLaVA-v1-7B), additional configuration is required (setting the code/model weight root directory in the config file).
+**VLM Configuration:** All VLMs are configured in `scieval/config.py`. For some VLMs (e.g., MiniGPT-4, LLaVA-v1-7B), additional configuration is required (setting the code/model weight root directory in the config file).

-When evaluating, you should use the model name specified in `supported_VLM` in `vlmeval/config.py`. Ensure you can successfully run inference with the VLM before starting the evaluation.
+When evaluating, you should use the model name specified in `supported_VLM` in `scieval/config.py`. Ensure you can successfully run inference with the VLM before starting the evaluation.

**Check Command:**

@@ -76,12 +76,12 @@ vlmutil check {MODEL_NAME}

## Step 2: Evaluation

-We use `run.py` for evaluation. You can use `$VLMEvalKit/run.py` or create a soft link to the script.
+We use `run.py` for evaluation. You can use `$SciEvalKit/run.py` or create a soft link to the script.

### Basic Arguments

-* `--data` (list[str]): Set the dataset names supported in VLMEvalKit (refer to `vlmeval/dataset/__init__.py` or use `vlmutil dlist all` to check).
-* `--model` (list[str]): Set the VLM names supported in VLMEvalKit (defined in `supported_VLM` in `vlmeval/config.py`).
+* `--data` (list[str]): Set the dataset names supported in SciEvalKit (refer to `scieval/dataset/__init__.py` or use `vlmutil dlist all` to check).
+* `--model` (list[str]): Set the VLM names supported in SciEvalKit (defined in `supported_VLM` in `scieval/config.py`).
* `--mode` (str, default `'all'`): Running mode, choices are `['all', 'infer', 'eval']`.
  * `"all"`: Perform both inference and evaluation.
  * `"infer"`: Perform inference only.
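For orientation, a minimal invocation assembled from the arguments above might look like the sketch below; the dataset and model names are placeholders rather than values taken from this file.

```bash
# Minimal sketch: one dataset, one model, inference followed by evaluation.
# Replace the placeholders with names from scieval/dataset/__init__.py and
# from supported_VLM in scieval/config.py.
python run.py --data <DATASET_NAME> --model <MODEL_NAME> --mode all
```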
@@ -143,7 +143,7 @@ python run.py --config config.json

* `--judge` (str): Specify the evaluation model for datasets that require model-based evaluation.
  * If not specified, the configured default model will be used.
-  * The model can be a VLM supported in VLMEvalKit or a custom model.
+  * The model can be a VLM supported in SciEvalKit or a custom model.
* `--judge-args` (str): Arguments for the judge model (in JSON string format).
  * You can pass parameters like `temperature`, `max_tokens` when specifying the judge via `--judge`.
  * Specific args depend on the model initialization class (e.g., `scieval.api.gpt.OpenAIWrapper`).
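A hedged sketch of combining `--judge` with `--judge-args`, using the parameters named above; the judge name is a placeholder and the values are illustrative.

```bash
# Sketch: select a judge model and pass judge parameters as a JSON string.
python run.py --data <DATASET_NAME> --model <MODEL_NAME> \
    --judge <JUDGE_MODEL> --judge-args '{"temperature": 0, "max_tokens": 2048}'
```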
@@ -186,7 +186,7 @@ Some datasets have specific requirements during evaluation:
  * If you want to use a model other than the default GPT-4o, you must specify `base_url` and `api_key` separately (defaults to `OPENAI_API_KEY`, `OPENAI_API_BASE` in env).
* **AstroVisBench:**
  * **Environment:** Must download dependencies following the official guide and set the `AstroVisBench_Env` environment variable.
-  * **Python Env:** Due to complex dependencies, it is recommended to create a separate environment, install VLMEvalKit dependencies, and then install the official dependencies to avoid conflicts.
+  * **Python Env:** Due to complex dependencies, it is recommended to create a separate environment, install SciEvalKit dependencies, and then install the official dependencies to avoid conflicts.
  * **Concurrency:** Default concurrency is 4. Can be changed via `--judge-args '{"max_workers": <nums>}'`.
  * **Judge Model:** Requires Claude 3.5 Sonnet for evaluation. Ensure `ANTHROPIC_API_KEY` is set.

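A minimal sketch of the AstroVisBench requirements listed above; the dataset key, the path value, and the worker count are illustrative assumptions.

```bash
# Sketch: set the judge key and benchmark environment noted above, then raise
# the default concurrency of 4 via max_workers (the value 8 is illustrative).
export ANTHROPIC_API_KEY=<your_key>
export AstroVisBench_Env=<path_prepared_per_the_official_guide>
python run.py --data AstroVisBench --model <MODEL_NAME> --judge-args '{"max_workers": 8}'
```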
@@ -210,7 +210,7 @@ The following datasets use specific models as default Judges:

If the model output for a benchmark does not match expectations, it might be due to incorrect prompt construction.

-In VLMEvalKit, each dataset class has a `build_prompt()` function. For example, `ImageMCQDataset.build_prompt()` combines hint, question, and options into a standard format:
+In SciEvalKit, each dataset class has a `build_prompt()` function. For example, `ImageMCQDataset.build_prompt()` combines hint, question, and options into a standard format:

```text
HINT
@@ -222,15 +222,15 @@ B. Option B
Please select the correct answer from the options above.
```

-VLMEvalKit also supports **Model-Level custom prompt building** via `model.build_prompt()`.
+SciEvalKit also supports **Model-Level custom prompt building** via `model.build_prompt()`.
* **Priority:** `model.build_prompt()` overrides `dataset.build_prompt()`.

**Custom `use_custom_prompt()`:**
You can define `model.use_custom_prompt()` to decide when to use the model-specific prompt logic:

```python
def use_custom_prompt(self, dataset: str) -> bool:
-    from vlmeval.dataset import DATASET_TYPE, DATASET_MODALITY
+    from scieval.dataset import DATASET_TYPE, DATASET_MODALITY
    dataset_type = DATASET_TYPE(dataset, default=None)

    if not self._use_custom_prompt:
@@ -244,7 +244,7 @@ def use_custom_prompt(self, dataset: str) -> bool:

### Model Splitting & GPU Allocation

-VLMEvalKit supports automatic GPU resource division for `lmdeploy` or `transformers` backends.
+SciEvalKit supports automatic GPU resource division for `lmdeploy` or `transformers` backends.

* **Python:** Defaults to all visible GPUs. Use `CUDA_VISIBLE_DEVICES` to restrict.
* **Torchrun:**
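As a sketch of the `CUDA_VISIBLE_DEVICES` restriction described for the Python launch mode above (dataset and model names are placeholders):

```bash
# Sketch: limit a python-launched run to two visible GPUs.
CUDA_VISIBLE_DEVICES=0,1 python run.py --data <DATASET_NAME> --model <MODEL_NAME>
```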
@@ -296,4 +296,4 @@ LOCAL_LLM=<model_ID_you_got>
```

**5. Run Evaluation**
-Execute `run.py` as normal.
+Execute `run.py` as normal.
