Commit 3f770db

add the desc of batch eval

1 parent 9f16ab4 commit 3f770db

2 files changed: +75 -56 lines

docs/en/notes/guide/pipelines/EvalPipeline.md

Lines changed: 38 additions & 26 deletions
@@ -30,8 +30,6 @@ dataflow eval init
 dataflow eval api / dataflow eval local
 ```
 
-
-
 ## Step 1: Install Evaluation Environment
 
 Download evaluation environment
@@ -41,17 +39,13 @@ pip install -e .[eval]
 cd ..
 ```
 
-
-
 ## Step 2: Create and Enter DataFlow Working Directory
 
 ```bash
 mkdir workspace
 cd workspace
 ```
 
-
-
 ## Step 3: Prepare Evaluation Data and Initialize Configuration Files
 
 Initialize configuration files
@@ -66,8 +60,6 @@ Project Root/
 └── eval_local.py  # Configuration file for local model evaluator
 ```
 
-
-
 ## Step 4: Prepare Evaluation Data
 
 ### Method 1: JSON Format
@@ -100,10 +92,8 @@ EVALUATOR_RUN_CONFIG = {
 }
 ```
 
-
-
 ## Step 5: Configure Parameters
-
+### Model Parameter Configuration
 If you want to use a local model as the evaluator, please modify the parameters in the `eval_local.py` file.
 
 If you want to use an API model as the evaluator, please modify the parameters in the `eval_api.py` file.
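For the data prepared in Step 4, the benchmark files referenced below presumably hold question/answer pairs under the `input` and `output` keys, the same names the field mapping above and `BENCH_CONFIG` below use. A minimal sketch of producing such a file, assuming a plain JSON list (the exact schema is not shown in this diff):

```python
import json

# Illustrative QA records. The field names follow the mapping used by
# BENCH_CONFIG ("question_key": "input", "reference_answer_key": "output");
# the records themselves are made up for this sketch.
samples = [
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "Name the capital of France.", "output": "Paris"},
]

# Assumes the data file is a single JSON list; adjust if your pipeline
# expects JSON Lines instead.
with open("qa.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, ensure_ascii=False, indent=2)
```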
@@ -122,24 +112,46 @@ TARGET_MODELS = [
 
     # 3. Custom configuration
     # Add more models...
-    # {
-    #     "name": "llama_8b",
-    #     "path": "meta-llama/Llama-3-8B-Instruct",
-    #     "tensor_parallel_size": 2,
-    #     "max_tokens": 2048,
-    #     "gpu_memory_utilization": 0.9,
-
-    # # You can customize prompts for each model. If not specified, defaults to the template in build_prompt function.
-    # # Default prompt for evaluated models
-    # # IMPORTANT: This is the prompt for models being evaluated, NOT for the judge model!!!
-    #     "answer_prompt": """please answer the questions:
-    # question:{question}
-    # answer:"""
-    # }
+
+    {
+        "name": "qwen_7b",                    # Model name
+        "path": "./Qwen2.5-7B-Instruct",      # Model path
+        # Different models can use different parameters
+        "vllm_tensor_parallel_size": 4,       # Number of GPUs
+        "vllm_temperature": 0.1,              # Randomness; higher values give more random output
+        "vllm_top_p": 0.9,                    # Top-p (nucleus) sampling threshold
+        "vllm_max_tokens": 2048,              # Maximum number of generated tokens
+        "vllm_repetition_penalty": 1.0,       # Repetition penalty; values above 1 suppress repetition
+        "vllm_seed": None,                    # Random seed; set it for reproducible results
+        "vllm_gpu_memory_utilization": 0.9,   # Maximum GPU memory utilization
+        # A custom prompt can be defined for each model
+        "answer_prompt": """please answer the following question:"""
+    }
+
+
 ]
 ```
 
-
+### Bench Parameter Configuration
+Batch configuration of benchmarks is supported:
+```python
+BENCH_CONFIG = [
+    {
+        "name": "bench_name",                      # Benchmark name
+        "input_file": "path_to_your_qa/qa.json",   # Data file
+        "question_key": "input",                   # Question field name
+        "reference_answer_key": "output",          # Reference answer field name
+        "output_dir": "path/bench_name",           # Output directory
+    },
+    {
+        "name": "other_bench_name",
+        "input_file": "path_to_your_qa/other_qa.json",
+        "question_key": "input",
+        "reference_answer_key": "output",
+        "output_dir": "path/other_bench_name",
+    },
+]
+```
 
 ## Step 6: Run Evaluation
 
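Before moving on to Step 6, it can be worth sanity-checking the benchmark entries against the data files. The sketch below is an optional convenience, not part of DataFlow; it assumes `BENCH_CONFIG` is importable from the generated `eval_local.py` (`eval_api.py` works the same way) and that each input file is a JSON list:

```python
import json
import os

# Assumption: BENCH_CONFIG can be imported from the generated eval_local.py.
from eval_local import BENCH_CONFIG

for bench in BENCH_CONFIG:
    assert os.path.isfile(bench["input_file"]), f"missing file: {bench['input_file']}"
    with open(bench["input_file"], encoding="utf-8") as f:
        records = json.load(f)  # assumes a JSON list of records
    for record in records:
        # Every record must expose the configured question and answer fields.
        assert bench["question_key"] in record, f"{bench['name']}: missing question field"
        assert bench["reference_answer_key"] in record, f"{bench['name']}: missing answer field"
    print(f"{bench['name']}: {len(records)} records look usable")
```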
docs/zh/notes/guide/pipelines/EvalPipeline.md

Lines changed: 37 additions & 30 deletions
@@ -40,17 +40,13 @@ pip install -e .[eval]
 cd ..
 ```
 
-
-
 ## Step 2: Create and Enter the DataFlow Working Directory
 
 ```bash
 mkdir workspace
 cd workspace
 ```
 
-
-
 ## Step 3: Prepare Evaluation Data and Initialize Configuration Files
 
 Initialize the configuration files
@@ -67,8 +63,6 @@ dataflow eval init
 └── eval_local.py  # Configuration file for the local-model evaluator
 ```
 
-
-
 ## Step 4: Prepare Evaluation Data
 
 ### Method 1:
@@ -90,8 +84,6 @@ dataflow eval init
 
 `output` is the reference answer
 
-
-
 ### Method 2:
 
 You can also leave the data unprocessed (it must still contain explicit question and reference-answer fields) and map the field names through the configuration in eval_api.py and eval_local.py
@@ -104,16 +96,14 @@ EVALUATOR_RUN_CONFIG = {
 }
 ```
 
-
-
 ## Step 5: Configure Parameters
+### Model Parameter Configuration
 
 If you want to use a local model as the evaluator, please modify the parameters in the `eval_local.py` file
 
 If you want to use an API model as the evaluator, please modify the parameters in the `eval_api.py` file
 
-```bash
-Target Models Configuration (same as API mode)
+```python
 
 TARGET_MODELS = [
     # All usage patterns are shown below
@@ -124,28 +114,45 @@ TARGET_MODELS = [
     # "Qwen/Qwen2.5-7B-Instruct"
     # 3. Individual configuration
     # Add more models...
-    # {
-    #     "name": "llama_8b",
-    #     "path": "meta-llama/Llama-3-8B-Instruct",
-    #     "tensor_parallel_size": 2
-    #     "max_tokens": 2048,
-    #     "gpu_memory_utilization": 0.9,
-    # A custom prompt can be defined for each model; if not set, it defaults to the template, i.e. the prompt in the build_prompt function
-    # Default prompt for the evaluated model
-    # Reminder: this prompt is for the model being evaluated; do not confuse it with the evaluator model's prompt!!!
-    # You can customize prompts for each model. If not specified, defaults to the template in build_prompt function.
-    # Default prompt for evaluated models
-    # IMPORTANT: This is the prompt for models being evaluated, NOT for the judge model!!!
-    #     "answer_prompt": """please answer the questions:
-    # question:{question}
-    # answer:"""
-    # ""
-    # }
-    #
+    {
+        "name": "qwen_7b",                    # Model name
+        "path": "./Qwen2.5-7B-Instruct",      # Model path
+        # Different models can use different parameters
+        "vllm_tensor_parallel_size": 4,       # Number of GPUs
+        "vllm_temperature": 0.1,              # Randomness; higher values give more random output
+        "vllm_top_p": 0.9,                    # Nucleus-sampling threshold bounding the cumulative probability of candidate tokens
+        "vllm_max_tokens": 2048,              # Maximum number of generated tokens
+        "vllm_repetition_penalty": 1.0,       # Repetition penalty; values above 1 suppress repetition
+        "vllm_seed": None,                    # Random seed; set it for reproducible results
+        "vllm_gpu_memory_utilization": 0.9,   # Maximum GPU memory utilization
+        # A custom prompt can be defined for each model
+        "answer_prompt": """please answer the following question:"""  # Answer prompt template
+    }
 
 ]
 ```
 
+### Bench Parameter Configuration
+Batch benchmark evaluation is supported:
+```python
+BENCH_CONFIG = [
+    {
+        "name": "bench_name",                      # Benchmark name
+        "input_file": "path_to_your_qa/qa.json",   # Data file
+        "question_key": "input",                   # Question field name
+        "reference_answer_key": "output",          # Answer field name
+        "output_dir": "path/bench_name",           # Output directory
+    },
+    {
+        "name": "other_bench_name",
+        "input_file": "path_to_your_qa/other_qa.json",
+        "question_key": "input",
+        "reference_answer_key": "output",
+        "output_dir": "path/other_bench_name",
+    },
+]
+
+```
 
 
 ## Step 6: Run Evaluation
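The batch evaluation this commit documents amounts to running every target model against every configured benchmark. A rough sketch of that pattern, assuming the `TARGET_MODELS` and `BENCH_CONFIG` lists from the generated `eval_local.py` and using a hypothetical `run_single_eval` placeholder (the actual entry point is `dataflow eval api` / `dataflow eval local`):

```python
import itertools
import os

# Assumption: both lists come from the generated config file.
from eval_local import BENCH_CONFIG, TARGET_MODELS

def run_single_eval(model_cfg, bench_cfg):
    # Hypothetical placeholder for evaluating one model on one benchmark;
    # in practice the DataFlow CLI drives this step.
    name = model_cfg["name"] if isinstance(model_cfg, dict) else model_cfg
    print(f"evaluating {name} on {bench_cfg['name']}")

for model_cfg, bench_cfg in itertools.product(TARGET_MODELS, BENCH_CONFIG):
    # Each benchmark writes its results under its own output directory.
    os.makedirs(bench_cfg["output_dir"], exist_ok=True)
    run_single_eval(model_cfg, bench_cfg)
```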
