Skip to content

Commit 5e854a2

Browse files
committed
add browse comp plus as an training example
1 parent 90b55e8 commit 5e854a2

File tree

7 files changed

+1156
-0
lines changed

7 files changed

+1156
-0
lines changed
106 KB
Loading
93.4 KB
Loading
Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
# Example of Training a BrowseComp-Plus Search Agent
2+
3+
This example demonstrates how to train a web search and information retrieval agent on the **BrowseComp-Plus** dataset using the ReAct (Reasoning and Acting) paradigm.
4+
5+
BrowseComp-Plus is a comprehensive benchmark for evaluating information retrieval and question answering capabilities. The original dataset and benchmark can be found at [BrowseComp-Plus GitHub](https://github.com/texttron/BrowseComp-Plus).
6+
7+
The config file is located in [`bcp_config.yaml`](bcp_config.yaml).
8+
9+
## Key Features
10+
11+
* **Training ReAct Agent**: The workflow trains a ReAct agent that can reason and act with search tools to find information and answer questions.
12+
* **Local Search Integration**: The agent uses local BM25 or dense retrieval search (no external API required) via BrowseComp-Plus's built-in searcher.
13+
* **Tool-based Interaction**: The agent can:
14+
* **Search**: Query the search index to find relevant documents
15+
* **Get Document** (optional): Retrieve full document content by document ID
16+
* **LLM-as-Judge Evaluation**: The agent's final answer is evaluated by an auxiliary "judge" LLM against ground-truth answers to generate reward signals for training.
17+
* **Asynchronous Execution**: The workflow is designed to run asynchronously for better performance.
18+
19+
## Prerequisites
20+
21+
Before running this workflow, please complete the following setup steps.
22+
23+
### 1. Install BrowseComp-Plus
24+
25+
Clone and set up the BrowseComp-Plus repository:
26+
27+
```bash
28+
# Clone the repository
29+
git clone https://github.com/texttron/BrowseComp-Plus.git
30+
31+
# Set the environment variable (add this to your ~/.bashrc or ~/.zshrc for persistence)
32+
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"
33+
34+
# Install dependencies
35+
cd $BROWSECOMP_PATH
36+
pip install -r requirements.txt
37+
```
38+
39+
### 2. Download and Decrypt the Dataset
40+
41+
Follow the instructions in BrowseComp-Plus to download and decrypt the dataset:
42+
43+
```bash
44+
cd $BROWSECOMP_PATH
45+
46+
# Download the encrypted dataset
47+
# Follow instructions at: https://github.com/texttron/BrowseComp-Plus#data
48+
python scripts_build_index/decrypt_dataset.py --output data/browsecomp_plus_decrypted.jsonl --generate-tsv topics-qrels/queries.tsv
49+
```
50+
51+
### 3. Build the Search Index
52+
53+
Build the BM25 search index (or other index types if preferred):
54+
55+
```bash
56+
cd $BROWSECOMP_PATH
57+
58+
# Build Search index
59+
bash scripts_build_index/download_indexes.sh
60+
61+
# (Optional) To try out other retrieval index methods, please refer to instructions in BrowseComp-Plus Repo
62+
```
63+
64+
### 4. Generate Trinity-RFT Format Dataset
65+
66+
Convert the BrowseComp-Plus dataset to Trinity-RFT format:
67+
68+
```bash
69+
# From the Trinity-RFT root directory
70+
python examples/browse_comp_plus/get_browse_comp_data_for_trinity.py \
71+
--input $BROWSECOMP_PATH/data/browsecomp_plus_decrypted.jsonl \
72+
--output_dir data/trinity_format \
73+
--train_size 400 \
74+
--test_size 200 \
75+
--seed 42
76+
```
77+
78+
This will create:
79+
- `data/trinity_format/train.jsonl`: Training set (400 samples)
80+
- `data/trinity_format/test.jsonl`: Test set (200 samples)
81+
82+
### 5. Set Environment Variables and Config
83+
84+
The configuration file uses environment variables with sensible defaults. Set the required variables:
85+
86+
```bash
87+
# Required: Path to BrowseComp-Plus directory
88+
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"
89+
```
90+
91+
You should also set the `model_path` and the `auxiliary_model_path` in `bcp_config.yaml`.
92+
93+
## Running the Training
94+
95+
Once everything is configured, start the training:
96+
97+
```bash
98+
# Make sure environment variables are set
99+
export BROWSECOMP_PATH="/path/to/BrowseComp-Plus"
100+
export TRINITY_TASKSET_PATH="data/trinity_format"
101+
102+
# start the ray server
103+
ray start --head
104+
105+
# Run training
106+
trinity run --config examples/browse_comp_plus/bcp_config.yaml
107+
```
108+
109+
### Workflow Arguments
110+
111+
The `workflow_args` section controls the agent's behavior:
112+
113+
* **`searcher_type`**: Type of search index to use (e.g. `"bm25"`, etc.)
114+
* **`index_path`**: Path to the search index (uses `BROWSECOMP_INDEX_PATH` env variable)
115+
* **`browsecomp_path`**: Path to BrowseComp-Plus directory (uses `BROWSECOMP_PATH` env variable)
116+
* **`max_iterations`**: Maximum number of search/reasoning steps (default: 30)
117+
* **`top_k`**: Number of search results returned per query (default: 5)
118+
* **`snippet_max_tokens`**: Maximum tokens to include from each document snippet (default: 512)
119+
* **`include_get_document`**: Whether to enable the `get_document` tool (default: false)
120+
121+
122+
## Results
123+
124+
From the below curve you can see that the agent learns for leverage more search calls to gain more accurate answers.
125+
126+
Reward curve:
127+
128+
![](../../docs/sphinx_doc/assets/bcp_reward.png)
129+
130+
Search call curve:
131+
132+
![](../../docs/sphinx_doc/assets/bcp_searchcall.png)
Lines changed: 135 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,135 @@
1+
project: "Trinity_BrowseComp_Plus"
2+
name: "BrowseComp_Plus_Simple_React_Agent"
3+
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
4+
5+
algorithm:
6+
algorithm_type: multi_step_grpo
7+
repeat_times: 8 # Number of rollouts per sample for GRPO
8+
advantage_fn_args:
9+
std_threshold: 0.001
10+
optimizer:
11+
lr: 1e-6
12+
13+
model:
14+
# Main agent model for rollout
15+
model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507}
16+
max_response_tokens: 10000
17+
max_model_len: 64000
18+
19+
cluster:
20+
node_num: 1
21+
gpu_per_node: 8
22+
23+
buffer:
24+
total_epochs: 128
25+
batch_size: 64
26+
train_batch_size: 512 # Total batch size: batch_size * gpu_per_node * gradient_accumulation
27+
28+
explorer_input:
29+
# Training dataset
30+
taskset:
31+
name: browsecomp_train
32+
storage_type: file
33+
path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format}
34+
split: train
35+
format:
36+
prompt_key: 'query' # Field name for the query
37+
response_key: 'answer' # Field name for ground truth answer
38+
workflow_args:
39+
# Uses local searcher (no MCP server required)
40+
max_iterations: 30 # Maximum conversation rounds
41+
max_model_tokens: 64000 # Filter experiences longer than this
42+
# Local searcher configuration
43+
searcher_type: "bm25" # Type of searcher: bm25, dense, etc.
44+
index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25} # Path to search index (relative to BROWSECOMP_PATH)
45+
browsecomp_path: ${oc.env:BROWSECOMP_PATH,null} # Path to BrowseComp-Plus directory
46+
top_k: 5 # Number of search results per query
47+
snippet_max_tokens: 512 # Max tokens per document snippet
48+
include_get_document: false # Whether to include get_document tool
49+
rollout_args:
50+
temperature: 1.0
51+
top_p: 1.0
52+
max_tokens: 10000
53+
enable_progress_bar: true
54+
55+
# Evaluation datasets
56+
eval_tasksets:
57+
- name: browsecomp_eval
58+
storage_type: file
59+
path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format}
60+
split: test
61+
format:
62+
prompt_key: 'query'
63+
response_key: 'answer'
64+
workflow_args:
65+
max_iterations: 30
66+
max_model_tokens: 64000
67+
searcher_type: "bm25"
68+
index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25}
69+
browsecomp_path: ${oc.env:BROWSECOMP_PATH,null}
70+
top_k: 5
71+
snippet_max_tokens: 512
72+
include_get_document: false
73+
rollout_args:
74+
temperature: 1.0
75+
max_tokens: 10000
76+
top_p: 1.0
77+
enable_progress_bar: true
78+
79+
default_workflow_type: 'bcp_simple_react_workflow'
80+
81+
trainer_input:
82+
experience_buffer:
83+
name: experience_buffer
84+
storage_type: queue
85+
max_read_timeout: 7200
86+
replay_buffer:
87+
enable: true
88+
89+
explorer:
90+
eval_interval: 10 # Evaluate every 10 training iterations
91+
max_repeat_times_per_runner: 4
92+
max_timeout: 3600 # 1 hour timeout per rollout
93+
runner_per_model: 16
94+
95+
# Rollout model configuration (agent model)
96+
rollout_model:
97+
enable_thinking: true
98+
enable_history: true
99+
enable_openai_api: true
100+
enable_auto_tool_choice: true # Enable automatic tool calling
101+
tool_call_parser: hermes # Tool call parser format
102+
engine_num: 2 # Number of vLLM engines
103+
tensor_parallel_size: 1 # Tensor parallelism per engine
104+
enable_prefix_caching: false
105+
enforce_eager: true
106+
dtype: bfloat16
107+
seed: 42
108+
gpu_memory_utilization: 0.7
109+
enable_chunked_prefill: true
110+
111+
# Auxiliary models (judge model for evaluation)
112+
auxiliary_models:
113+
- model_path: ${oc.env:TRINITY_JUDGE_MODEL_PATH,qwen/Qwen3-30B-A3B-Instruct-2507}
114+
engine_num: 1
115+
tensor_parallel_size: 2 # Use 2 GPUs for the larger judge model
116+
enable_thinking: false
117+
max_prompt_tokens: 20480
118+
max_response_tokens: 8192
119+
max_model_len: 32000
120+
121+
synchronizer:
122+
sync_style: dynamic_by_explorer
123+
sync_method: 'nccl'
124+
sync_interval: 4 # Sync every 4 batches
125+
sync_timeout: 7200
126+
127+
trainer:
128+
save_interval: 20 # Save checkpoint every 20 iterations
129+
grad_clip: 1.0
130+
use_dynamic_bsz: true
131+
max_token_len_per_gpu: 16384
132+
ulysses_sequence_parallel_size: 4
133+
134+
monitor:
135+
monitor_type: wandb

0 commit comments

Comments
 (0)