Commit 9693926

Authored by ShenQianli
BOTS reference evaluation results collection (#440)
Co-authored-by: ShenQianli <[email protected]>
1 parent 25b8e11 commit 9693926

7 files changed: +182 -5 lines changed

examples/bots/README.md

Lines changed: 19 additions & 0 deletions
@@ -33,6 +33,25 @@ Also refer to the [Data Preparation Guide](https://github.com/LLM360/Reasoning36
 
 Remember to modify the model/data path in `bots.yaml` and `random.yaml` accordingly.
 
+##### (Optional) Customize Reference Evaluation Results
+
+Modify `ref_eval_collect.yaml` to set the reference model you want to evaluate, e.g., Qwen2.5-1.5B-Instruct.
+
+Launch evaluation by executing:
+```bash
+BOTS_REF_EVAL_LOG_FILE="path/to/save/eval/logs" trinity run --config examples/bots/ref_eval_collect.yaml --plugin-dir examples/bots/workflow
+```
+
+The evaluation logs will be saved at the specified location. Then integrate the evaluation results as a new column into the original dataset:
+
+```bash
+python examples/bots/ref_eval_collect.py \
+--data-path <your/path/to/original/dataset> \
+--ref-eval-path <your/path/to/bots_ref_eval_log.jsonl> \
+--ref-eval-key <column name, e.g., qwen2.5_1.5b_pass_rate>
+```
+Remember to update `task_selector.feature_keys` in `bots.yaml`.
+
 ##### Step 3: Training
 Launch training by executing:
 ```bash

examples/bots/README_zh.md

Lines changed: 20 additions & 0 deletions
@@ -30,6 +30,26 @@ BOTS runs as a continuous loop of task selection, model training, and posterior probability updates.
 Please refer to the [Data Preparation Guide](https://github.com/LLM360/Reasoning360?tab=readme-ov-file#data-preparation) and the [Technical Report](https://www.arxiv.org/pdf/2506.14965) provided by LLM360.
 Please modify the model/data paths in `bots.yaml` and `random.yaml` accordingly.
 
+
+##### (Optional) Customize Reference Evaluation Results
+
+Modify `ref_eval_collect.yaml` to set the reference model you want to evaluate, e.g., Qwen2.5-1.5B-Instruct.
+
+Launch the evaluation by running:
+```bash
+BOTS_REF_EVAL_LOG_FILE="path/to/save/eval/logs" trinity run --config examples/bots/ref_eval_collect.yaml --plugin-dir examples/bots/workflow
+```
+
+The evaluation logs are saved at the specified path. Next, merge the evaluation results into the original dataset as a new column:
+
+```bash
+python examples/bots/ref_eval_collect.py \
+--data-path <your/path/to/original/dataset> \
+--ref-eval-path <your/path/to/bots_ref_eval_log.jsonl> \
+--ref-eval-key <column name, e.g., qwen2.5_1.5b_pass_rate>
+```
+Remember to update the `task_selector.feature_keys` field in `bots.yaml`.
+
 ##### Step 3: Training
 Launch training by running:
 ```bash

examples/bots/bots.yaml

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ buffer:
     taskset:
       name: math-train
       storage_type: file
-      path: '<DATA_ROOT>/LLM360/guru-RL-92k/train/math__combined_54.4k.parquet'
+      path: 'your/data/path/containing/math__combined_54.4k.parquet' # you need to set it manually
       split: 'train'
       format:
         prompt_key: 'prompt'
@@ -44,7 +44,7 @@ buffer:
     eval_tasksets:
     - name: math-eval
       storage_type: file
-      path: '<DATA_ROOT>/LLM360/guru-RL-92k/online_eval/math__math_500.parquet'
+      path: 'your/data/path/containing/math__math_500.parquet' # you need to set it manually
       format:
         prompt_key: 'prompt'
         response_key: 'reward_model.ground_truth'

examples/bots/random.yaml

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ buffer:
     taskset:
       name: math-train
       storage_type: file
-      path: '<DATA_ROOT>/LLM360/guru-RL-92k/train/math__combined_54.4k.parquet'
+      path: 'your/data/path/containing/math__combined_54.4k.parquet' # you need to set it manually
       split: 'train'
       format:
         prompt_key: 'prompt'
@@ -32,7 +32,7 @@ buffer:
     eval_tasksets:
     - name: math-eval
       storage_type: file
-      path: '<DATA_ROOT>/LLM360/guru-RL-92k/online_eval/math__math_500.parquet'
+      path: 'your/data/path/containing/math__math_500.parquet' # you need to set it manually
       format:
         prompt_key: 'prompt'
         response_key: 'reward_model.ground_truth'

examples/bots/ref_eval_collect.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+import argparse
+import json
+
+import numpy as np
+import pandas as pd
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--data-path", type=str, required=True)
+    parser.add_argument("--ref-eval-path", type=str, required=True)
+    parser.add_argument("--ref-eval-key", type=str, required=True)
+    args = parser.parse_args()
+
+    print(f"Loading original dataset from {args.data_path}...")
+    original_data = pd.read_parquet(args.data_path)
+    prompt2linenum = {}
+    for i, d in enumerate(original_data["prompt"]):
+        prompt2linenum[d[0]["content"]] = i
+    eval_results = [0.0 for _ in range(len(original_data))]
+    print(f"Loading reference evaluation results from {args.ref_eval_path}...")
+    print(f"Results will be written to the original dataset at a new column {args.ref_eval_key}...")
+    with open(args.ref_eval_path, "r") as f:
+        for line in f:
+            item = json.loads(line)
+            eval_results[prompt2linenum[item["question"][0]["content"]]] = np.mean(item["rewards"])
+    original_data[args.ref_eval_key] = eval_results
+    print(f"Dataset overwritten at {args.data_path}...")
+    original_data.to_parquet(args.data_path)
+
+
+if __name__ == "__main__":
+    main()
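
The merge step in `ref_eval_collect.py` can be sketched without pandas as below. This is a minimal illustration under stated assumptions, not the committed script: `merge_ref_eval` and its `unmatched` return value are hypothetical names, and collecting unknown prompts instead of raising `KeyError` is an assumed design choice. The record layout (`question[0]["content"]`, `rewards`) follows the log entries written by the workflow in this commit.

```python
import json
from statistics import mean


def merge_ref_eval(prompts, log_lines, default=0.0):
    """Return one mean reward per prompt, plus logged prompts that were not found.

    prompts: prompt strings in dataset order (hypothetical flat representation).
    log_lines: iterable of JSONL strings with "question" and "rewards" fields.
    """
    # Map prompt text to row index; later duplicates win, as in the script.
    index = {p: i for i, p in enumerate(prompts)}
    results = [default] * len(prompts)
    unmatched = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        item = json.loads(line)
        question = item["question"][0]["content"]
        row = index.get(question)
        if row is None:
            unmatched.append(question)  # assumption: report instead of raising
            continue
        results[row] = mean(item["rewards"])
    return results, unmatched
```

Prompts that never appear in the log keep the default of `0.0`, which matches the script's initial fill but is indistinguishable from a true pass rate of zero, so checking `unmatched` and the log's coverage before training may be worthwhile.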

examples/bots/ref_eval_collect.yaml

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+project: "bots_ref_eval_collect_demo_1.5B"
+name: "run-1"
+mode: explore
+checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
+algorithm:
+  algorithm_type: grpo
+  repeat_times: 16
+  optimizer:
+    lr: 1e-6
+model:
+  model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-1.5B-Instruct}
+  max_prompt_tokens: 4096
+  max_response_tokens: 8192
+cluster:
+  node_num: 1
+  gpu_per_node: 8
+buffer:
+  total_epochs: 1
+  batch_size: 100
+  explorer_input:
+    taskset:
+      name: math-train
+      storage_type: file
+      path: 'your/data/path/containing/math__combined_54.4k.parquet' # you need to set it manually
+      split: 'train'
+      format:
+        prompt_key: 'prompt'
+        response_key: 'reward_model.ground_truth'
+      rollout_args:
+        temperature: 1.0
+      task_selector:
+        selector_type: sequential
+    default_workflow_type: 'bots_ref_eval_collect_math_boxed_workflow'
+  trainer_input:
+    experience_buffer:
+      name: exp_buffer
+      storage_type: queue
+      path: 'sqlite:///bots_ref_eval_collect_buffer.db'
+explorer:
+  eval_interval: 100
+  runner_per_model: 16
+  rollout_model:
+    engine_num: 8
+    tensor_parallel_size: 1
+    enable_prefix_caching: false
+    enforce_eager: true
+    dtype: bfloat16
+    seed: 42
+synchronizer:
+  sync_method: 'nccl'
+  sync_interval: 10
+  sync_timeout: 1200
+trainer:
+  trainer_type: 'verl'
+  save_interval: 1000
+  grad_clip: 1.0
+  use_dynamic_bsz: true
+  max_token_len_per_gpu: 24576
+  ulysses_sequence_parallel_size: 1
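
The `${oc.env:VAR,default}` values in this config look like OmegaConf's built-in environment resolver: the environment variable wins when it is set, otherwise the text after the comma is used. The sketch below only emulates that behavior with the standard library to make the precedence concrete. `resolve_env` is a hypothetical helper, not part of OmegaConf or Trinity, and it ignores nesting and quoting.

```python
import os
import re

# Matches ${oc.env:NAME} and ${oc.env:NAME,default} (simplified: no nesting or quoting).
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)(?:,([^}]*))?\}")


def resolve_env(value, environ=None):
    """Emulate `${oc.env:VAR,default}` substitution for a single string value."""
    environ = os.environ if environ is None else environ

    def _sub(match):
        name, default = match.group(1), match.group(2)
        if name in environ:
            return environ[name]  # a set variable always overrides the default
        if default is None:
            raise KeyError(f"environment variable {name!r} is not set and has no default")
        return default.strip()

    return _ENV_PATTERN.sub(_sub, value)
```

Under this reading, exporting `TRINITY_MODEL_PATH` before `trinity run` would switch the reference model without editing the YAML, while leaving it unset falls back to `Qwen/Qwen2.5-1.5B-Instruct`.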

examples/bots/workflow/bots_math_boxed_workflow.py

Lines changed: 47 additions & 1 deletion
@@ -1,5 +1,9 @@
-from typing import Union
+import fcntl
+import json
+import os
+from typing import List, Union
 
+from trinity.common.experience import Experience
 from trinity.common.workflows.customized_math_workflows import MathBoxedWorkflow, Task
 from trinity.common.workflows.workflow import WORKFLOWS
 
@@ -21,6 +25,48 @@ def format_messages(self):
         return self.task_desc
 
 
+@WORKFLOWS.register_module("bots_ref_eval_collect_math_boxed_workflow")
+class BOTSRefEvalCollectMathBoxedWorkflow(MathBoxedWorkflow):
+    """A reference evaluation collection workflow for math tasks that give answers in boxed format for BOTS."""
+
+    def reset(self, task: Task):
+        super().reset(task)
+        from trinity.plugins.bots_math_boxed_reward import BOTSMathBoxedRewardFn
+
+        self.reward_fn = BOTSMathBoxedRewardFn(**self.reward_fn_args)
+        self.task_desc = nested_query(self.format_args.prompt_key, self.raw_task)
+        self.truth = nested_query(self.format_args.response_key, self.raw_task)
+
+    def format_messages(self):
+        # the prompts are already in message format
+        return self.task_desc
+
+    def run(self) -> List[Experience]:
+        responses = super().run()
+
+        rewards = [response.reward for response in responses]
+
+        log_entry = {
+            "model_version": self.model.model_version,
+            "rewards": rewards,
+            "question": self.task_desc,
+            "truth": self.truth,
+        }
+
+        log_file_path = os.environ.get("BOTS_REF_EVAL_LOG_FILE", "./bots_ref_eval_log.jsonl")
+        os.makedirs(os.path.dirname(log_file_path), exist_ok=True)
+
+        with open(log_file_path, "a") as f:
+            fcntl.flock(f, fcntl.LOCK_EX)
+            try:
+                json.dump(log_entry, f)
+                f.write("\n")
+            finally:
+                fcntl.flock(f, fcntl.LOCK_UN)
+
+        return responses
+
+
 def nested_query(query_key: str, query_obj: Union[dict, None]):
     # support nested query for a dict given query_keys split by '.'
     if query_obj is None:
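
The logging step in `run()` appends one JSON line per task under an exclusive `fcntl.flock`, so that concurrent runners do not interleave records. A standalone sketch of that pattern follows. `append_jsonl_locked` is a hypothetical helper, not code from this commit, and it makes two assumptions beyond the committed version: it skips `os.makedirs` when the path has no directory component (since `os.makedirs("")` raises), and it flushes before releasing the lock so buffered data is not written after another process has acquired it. `fcntl` is only available on Unix-like systems.

```python
import fcntl
import json
import os


def append_jsonl_locked(path, entry):
    """Append one JSON object as a line, holding an exclusive lock while writing."""
    parent = os.path.dirname(path)
    if parent:  # a bare filename has no parent directory to create
        os.makedirs(parent, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            f.write(json.dumps(entry) + "\n")
            f.flush()  # hand the line to the OS before the lock is released
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

With this helper, setting `BOTS_REF_EVAL_LOG_FILE` to either a nested path or a bare filename such as `log.jsonl` would work the same way.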
