Commit 3459e6b

upload evalpipeline-doc zh&en (#126)

* modified pdf2model-doc * upload pdf2model-doc files * upload pdf2model-doc files * EvalPipeline-doc

1 parent 739c430 commit 3459e6b

2 files changed: +336 −0

Lines changed: 164 additions & 0 deletions
# **Evaluation Pipeline**

Only the QA-pair format is supported for evaluation.

## **Quick Start**

```
cd DataFlow
pip install -e .[llamafactory]

cd ..
mkdir workspace
cd workspace

# Place the files you want to evaluate in the working directory

# Initialize the evaluation configuration files
dataflow eval init

# Note: you must edit the configuration file eval_api.py or eval_local.py
# By default, the latest fine-tuned model is found and compared against its base model
# The default evaluation method is semantic evaluation
# The evaluation metric is accuracy
dataflow eval api    # or: dataflow eval local
```

## **Step 1: Install the Evaluation Environment**

Install the evaluation environment:

```
cd DataFlow
pip install -e .[llamafactory]
cd ..
```

## **Step 2: Create and Enter the dataflow Working Folder**

```
mkdir workspace
cd workspace
```

## **Step 3: Initialize the Configuration Files**

Initialize the configuration files:

```
dataflow eval init
```

After initialization is complete, the project directory becomes:

```
Project Root Directory/
├── eval_api.py    # Configuration file for an API-model evaluator
└── eval_local.py  # Configuration file for a local-model evaluator
```

## **Step 4: Prepare Evaluation Data**

**Method 1:** Prepare a JSON file whose records look like the example below:

```
[
    {
        "input": "What properties indicate that material PI-1 has excellent processing characteristics during manufacturing processes?",
        "output": "Material PI-1 has high tensile strength between 85-105 MPa.\nPI-1 exhibits low melt viscosity below 300 Pa·s indicating good flowability.\n\nThe combination of its high tensile strength and low melt viscosity indicates that it can be easily processed without breaking during manufacturing."
    }
]
```

In this example:

`input` is the question

`output` is the reference (standard) answer
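
Before running the evaluator, it can help to sanity-check the data file. A minimal sketch, assuming the format above; the function name and file path are hypothetical, not part of DataFlow:

```python
import json

def check_qa_file(path):
    """Verify every record has a non-empty question and reference answer."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        # Each record needs a non-empty "input" (question) and "output" (answer).
        assert isinstance(rec.get("input"), str) and rec["input"].strip(), f"record {i}: bad 'input'"
        assert isinstance(rec.get("output"), str) and rec["output"].strip(), f"record {i}: bad 'output'"
    return len(records)
```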

**Method 2:** You can also skip data preprocessing (the data must still contain explicit question and standard-answer fields) and configure the field-name mappings in eval_api.py and eval_local.py:

```
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",  # Field name for the model-generated answer
    "input_gt_answer_key": "output",                    # Field name for the standard answer (in the original data)
    "input_question_key": "input"                       # Field name for the question (in the original data)
}
```
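
Conceptually, these three keys tell the evaluator which field plays which role, so raw data can keep its original field names. A rough illustration of that mapping; `resolve_record` is a hypothetical helper, not DataFlow's actual code:

```python
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",
    "input_gt_answer_key": "output",
    "input_question_key": "input",
}

def resolve_record(record, cfg=EVALUATOR_RUN_CONFIG):
    """Read one raw record through the configured field names."""
    return {
        "question": record[cfg["input_question_key"]],
        "gt_answer": record[cfg["input_gt_answer_key"]],
        # The model-generated answer is absent until generation has run.
        "test_answer": record.get(cfg["input_test_answer_key"]),
    }
```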

## **Step 5: Configure Parameters**

To use a local model as the evaluator, modify the parameters in `eval_local.py`. To use an API model as the evaluator, modify the parameters in `eval_api.py`.

```
# Target Models Configuration (same as API mode)

TARGET_MODELS = [
    # Shows all usage methods.
    # The following methods can be mixed freely:
    # 1. Local path
    # "./Qwen2.5-3B-Instruct",
    # 2. Hugging Face path
    # "Qwen/Qwen2.5-7B-Instruct",
    # 3. Individual configuration
    # Add more models...
    # {
    #     "name": "llama_8b",
    #     "path": "meta-llama/Llama-3-8B-Instruct",
    #     "tensor_parallel_size": 2,
    #     "max_tokens": 2048,
    #     "gpu_memory_utilization": 0.9,
    #     # You can customize the prompt for each model. If not specified, it defaults to the template in the build_prompt function.
    #     # IMPORTANT: This is the prompt for the models being evaluated, NOT for the judge model!
    #     "answer_prompt": """please answer the questions:
    # question:{question}
    # answer:"""
    # }
]
```
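
Since `TARGET_MODELS` mixes bare path strings with per-model dicts, a loader presumably normalizes every entry to one shape. A sketch of that idea; the helper name and default values are assumptions, not DataFlow's actual implementation:

```python
# Assumed defaults for illustration only; DataFlow's real defaults may differ.
DEFAULTS = {"tensor_parallel_size": 1, "max_tokens": 2048, "gpu_memory_utilization": 0.9}

def normalize_model(entry):
    """Turn a bare path string or a per-model dict into one uniform config dict."""
    if isinstance(entry, str):
        # Bare local or Hugging Face path: derive a name from the last path segment.
        cfg = {"name": entry.rstrip("/").rsplit("/", 1)[-1], "path": entry}
    else:
        cfg = dict(entry)
    # Explicit per-model settings override the defaults.
    return {**DEFAULTS, **cfg}
```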

## **Step 6: Perform Evaluation**

Run local evaluation:

```
dataflow eval local
```

Run API evaluation:

```
dataflow eval api
```
Lines changed: 172 additions & 0 deletions
# Evaluation Pipeline

Only the QA-pair format is supported for evaluation.

## Quick Start

```
cd DataFlow
pip install -e .[llamafactory]

cd ..
mkdir workspace
cd workspace

# Place the files you want to evaluate in the working directory

# Initialize the evaluation configuration files
dataflow eval init

# Note: you must edit the configuration file eval_api.py or eval_local.py
# By default, the latest fine-tuned model is found and compared against its base model
# The default evaluation method is semantic evaluation
# The evaluation metric is accuracy
dataflow eval api    # or: dataflow eval local
```
26+
27+
28+
29+
## 第一步:安装评估环境
30+
31+
下载评估环境
32+
33+
```
34+
cd DataFlow
35+
pip install -e .[llamafactory]
36+
cd ..
37+
```
38+
39+
40+
41+
## 第二步:创建并进入dataflow工作文件夹
42+
43+
```
44+
mkdir workspace
45+
cd workspace
46+
```
47+
48+
49+
50+
## 第三步:准备评估数据初始化配置文件
51+
52+
初始化配置文件
53+
54+
```
55+
dataflow eval init
56+
```
57+
58+
初始化完成后,项目目录变成:
59+
60+
```
61+
项目根目录/
62+
├── eval_api.py # 评估器为api模型的配置文件
63+
└── eval_local.py # 评估器为本地模型的配置文件
64+
```

## Step 4: Prepare Evaluation Data

### Method 1

Prepare a JSON file whose records look like the example below:

```
[
    {
        "input": "What properties indicate that material PI-1 has excellent processing characteristics during manufacturing processes?",
        "output": "Material PI-1 has high tensile strength between 85-105 MPa.\nPI-1 exhibits low melt viscosity below 300 Pa·s indicating good flowability.\n\nThe combination of its high tensile strength and low melt viscosity indicates that it can be easily processed without breaking during manufacturing."
    }
]
```

In this example:

`input` is the question (it may also be the question and its answer options merged into a single input)

`output` is the reference (standard) answer
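
For multiple-choice data, the note above says the question and its options can be merged into a single `input`. A hypothetical formatting helper, not part of DataFlow, showing one way to do that merge:

```python
def merge_question_options(question, options):
    """Join a question and its lettered options into one 'input' string."""
    # Label options A., B., C., ... in order.
    lettered = [f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options)]
    return "\n".join([question] + lettered)
```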

### Method 2

You can also skip data preprocessing (the data must still contain explicit question and standard-answer fields) and configure the field-name mappings in eval_api.py and eval_local.py:

```
EVALUATOR_RUN_CONFIG = {
    "input_test_answer_key": "model_generated_answer",  # Field name for the model-generated answer
    "input_gt_answer_key": "output",                    # Field name for the standard answer (in the original data)
    "input_question_key": "input"                       # Field name for the question (in the original data)
}
```

## Step 5: Configure Parameters

To use a local model as the evaluator, modify the parameters in `eval_local.py`.

To use an API model as the evaluator, modify the parameters in `eval_api.py`.

```
# Target Models Configuration (same as API mode)

TARGET_MODELS = [
    # Shows all usage methods.
    # The following methods can be mixed freely:
    # 1. Local path
    # "./Qwen2.5-3B-Instruct",
    # 2. Hugging Face path
    # "Qwen/Qwen2.5-7B-Instruct",
    # 3. Individual configuration
    # Add more models...
    # {
    #     "name": "llama_8b",
    #     "path": "meta-llama/Llama-3-8B-Instruct",
    #     "tensor_parallel_size": 2,
    #     "max_tokens": 2048,
    #     "gpu_memory_utilization": 0.9,
    #     # You can customize the prompt for each model. If not specified, it defaults to the template in the build_prompt function.
    #     # IMPORTANT: This is the prompt for the models being evaluated, NOT for the judge model!
    #     "answer_prompt": """please answer the questions:
    # question:{question}
    # answer:"""
    # }
]
```
156+
157+
158+
159+
## 第六步:进行评估
160+
161+
运行本地评估
162+
163+
```
164+
dataflow eval local
165+
```
166+
167+
运行api评估
168+
169+
```
170+
dataflow eval api
171+
```
172+
