[Example] browse comp plus #377
Open
garyzhang99 wants to merge 4 commits into agentscope-ai:main from garyzhang99:dev/browser_comp_plus_react
5e854a2
add browse comp plus as an training example
garyzhang99 634014e
Update examples/browse_comp_plus/README.md
garyzhang99 92f5f1d
Update examples/browse_comp_plus/get_browse_comp_data_for_trinity.py
garyzhang99 f7fe5eb
Update examples/browse_comp_plus/get_browse_comp_data_for_trinity.py
garyzhang99
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,132 @@ | ||
| # Example of Training a BrowseComp-Plus Search Agent | ||
|
|
||
| This example demonstrates how to train a web search and information retrieval agent on the **BrowseComp-Plus** dataset using the ReAct (Reasoning and Acting) paradigm. | ||
|
|
||
| BrowseComp-Plus is a comprehensive benchmark for evaluating information retrieval and question answering capabilities. The original dataset and benchmark can be found at [BrowseComp-Plus GitHub](https://github.com/texttron/BrowseComp-Plus). | ||
|
|
||
| The config file is located in [`bcp_config.yaml`](bcp_config.yaml). | ||
|
|
||
| ## Key Features | ||
|
|
||
| * **Training ReAct Agent**: The workflow trains a ReAct agent that can reason and act with search tools to find information and answer questions. | ||
| * **Local Search Integration**: The agent uses local BM25 or dense retrieval search (no external API required) via BrowseComp-Plus's built-in searcher. | ||
| * **Tool-based Interaction**: The agent can: | ||
| * **Search**: Query the search index to find relevant documents | ||
| * **Get Document** (optional): Retrieve full document content by document ID | ||
| * **LLM-as-Judge Evaluation**: The agent's final answer is evaluated by an auxiliary "judge" LLM against ground-truth answers to generate reward signals for training. | ||
| * **Asynchronous Execution**: The workflow is designed to run asynchronously for better performance. | ||
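The LLM-as-judge reward listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the workflow's actual implementation: the prompt wording, the helper names `build_judge_prompt` and `parse_judge_reward`, and the `correct: yes/no` verdict format are all hypothetical.

```python
import re

# Hypothetical judge prompt; the real workflow may use different wording.
JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Ground-truth answer: {truth}\n"
    "Agent answer: {answer}\n"
    "Does the agent answer match the ground truth? "
    "End your reply with a line 'correct: yes' or 'correct: no'."
)


def build_judge_prompt(question: str, truth: str, answer: str) -> str:
    """Fill the (assumed) judge template with one rollout's data."""
    return JUDGE_TEMPLATE.format(question=question, truth=truth, answer=answer)


def parse_judge_reward(judge_output: str) -> float:
    """Map the judge's last 'correct: yes/no' verdict to a binary reward.

    Output without a recognizable verdict is treated as incorrect (0.0).
    """
    verdicts = re.findall(r"correct:\s*(yes|no)", judge_output, flags=re.IGNORECASE)
    if not verdicts:
        return 0.0
    return 1.0 if verdicts[-1].lower() == "yes" else 0.0
```

Taking the last verdict rather than the first makes the parser tolerant of judges that restate the instructions before answering.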
|
|
||
| ## Prerequisites | ||
|
|
||
| Before running this workflow, please complete the following setup steps. | ||
|
|
||
| ### 1. Install BrowseComp-Plus | ||
|
|
||
| Clone and set up the BrowseComp-Plus repository: | ||
|
|
||
| ```bash | ||
| # Clone the repository | ||
| git clone https://github.com/texttron/BrowseComp-Plus.git | ||
|
|
||
| # Set the environment variable (add this to your ~/.bashrc or ~/.zshrc for persistence) | ||
| export BROWSECOMP_PATH="/path/to/BrowseComp-Plus" | ||
|
|
||
| # Install dependencies | ||
| cd $BROWSECOMP_PATH | ||
| pip install -r requirements.txt | ||
| ``` | ||
|
|
||
| ### 2. Download and Decrypt the Dataset | ||
|
|
||
| Follow the instructions in BrowseComp-Plus to download and decrypt the dataset: | ||
|
|
||
| ```bash | ||
| cd $BROWSECOMP_PATH | ||
|
|
||
| # Download the encrypted dataset | ||
| # Follow instructions at: https://github.com/texttron/BrowseComp-Plus#data | ||
| python scripts_build_index/decrypt_dataset.py --output data/browsecomp_plus_decrypted.jsonl --generate-tsv topics-qrels/queries.tsv | ||
| ``` | ||
|
|
||
| ### 3. Build the Search Index | ||
|
|
||
| Obtain the BM25 search index by downloading the pre-built indexes (or build another index type if preferred): | ||
|
|
||
| ```bash | ||
| cd $BROWSECOMP_PATH | ||
|
|
||
| # Download the pre-built search indexes | ||
| bash scripts_build_index/download_indexes.sh | ||
|
|
||
| # (Optional) To try other retrieval index methods, refer to the instructions in the BrowseComp-Plus repository | ||
| ``` | ||
|
|
||
| ### 4. Generate Trinity-RFT Format Dataset | ||
|
|
||
| Convert the BrowseComp-Plus dataset to Trinity-RFT format: | ||
|
|
||
| ```bash | ||
| # From the Trinity-RFT root directory | ||
| python examples/browse_comp_plus/get_browse_comp_data_for_trinity.py \ | ||
| --input $BROWSECOMP_PATH/data/browsecomp_plus_decrypted.jsonl \ | ||
| --output_dir data/trinity_format \ | ||
| --train_size 400 \ | ||
| --test_size 200 \ | ||
| --seed 42 | ||
| ``` | ||
|
|
||
| This will create: | ||
| - `data/trinity_format/train.jsonl`: Training set (400 samples) | ||
| - `data/trinity_format/test.jsonl`: Test set (200 samples) | ||
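The seeded train/test split performed by the conversion step can be pictured with the sketch below. It is not the code of `get_browse_comp_data_for_trinity.py`; the function names are hypothetical, and the only details taken from this example are the `--train_size`, `--test_size` and `--seed` options and the `query`/`answer` field names used in the config.

```python
import json
import random


def split_records(records, train_size, test_size, seed):
    """Shuffle deterministically, then cut disjoint train and test sets."""
    if train_size + test_size > len(records):
        raise ValueError("train_size + test_size exceeds the number of records")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:train_size]
    test = shuffled[train_size:train_size + test_size]
    return train, test


def to_jsonl(records):
    """Serialize records as JSON Lines, one object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Usage with toy records shaped like the fields referenced in the config.
records = [{"query": f"q{i}", "answer": f"a{i}"} for i in range(10)]
train, test = split_records(records, train_size=6, test_size=3, seed=42)
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.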
|
|
||
| ### 5. Set Environment Variables and Config | ||
|
|
||
| The configuration file uses environment variables with sensible defaults. Set the required variables: | ||
|
|
||
| ```bash | ||
| # Required: Path to BrowseComp-Plus directory | ||
| export BROWSECOMP_PATH="/path/to/BrowseComp-Plus" | ||
| ``` | ||
|
|
||
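| You should also set the agent model path (`model_path` under `model`, or the `TRINITY_MODEL_PATH` environment variable) and the judge model path (`model_path` under `auxiliary_models`, or `TRINITY_JUDGE_MODEL_PATH`) in `bcp_config.yaml`. | ||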
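Before launching, it can help to check how the environment variables will resolve. The sketch below mimics the `${oc.env:VAR,default}` fallbacks from `bcp_config.yaml` with plain `os` calls. The function `resolve_paths` is hypothetical, and joining a relative index path onto `BROWSECOMP_PATH` is an interpretation of the config comment "relative to BROWSECOMP_PATH", not confirmed workflow behavior.

```python
import os


def resolve_paths(env=None):
    """Resolve the paths this example reads, using the defaults from the config."""
    env = os.environ if env is None else env

    browsecomp_path = env.get("BROWSECOMP_PATH")
    if not browsecomp_path:
        # The config defaults this to null, so it must be set explicitly.
        raise RuntimeError("BROWSECOMP_PATH is not set")

    index_path = env.get("BROWSECOMP_INDEX_PATH", "indexes/bm25")
    if not os.path.isabs(index_path):
        # Assumption: relative index paths are anchored at BROWSECOMP_PATH.
        index_path = os.path.join(browsecomp_path, index_path)

    return {
        "browsecomp_path": browsecomp_path,
        "index_path": index_path,
        "taskset_path": env.get("TRINITY_TASKSET_PATH", "data/trinity_format"),
    }
```

Calling `resolve_paths()` with no argument reads the live environment; passing a dict makes the check easy to try without exporting anything.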
|
|
||
| ## Running the Training | ||
|
|
||
| Once everything is configured, start the training: | ||
|
|
||
| ```bash | ||
| # Make sure environment variables are set | ||
| export BROWSECOMP_PATH="/path/to/BrowseComp-Plus" | ||
| export TRINITY_TASKSET_PATH="data/trinity_format" | ||
|
|
||
| # Start the Ray head node | ||
| ray start --head | ||
|
|
||
| # Run training | ||
| trinity run --config examples/browse_comp_plus/bcp_config.yaml | ||
| ``` | ||
|
|
||
| ### Workflow Arguments | ||
|
|
||
| The `workflow_args` section controls the agent's behavior: | ||
|
|
||
| * **`searcher_type`**: Type of search index to use (e.g. `"bm25"`) | ||
| * **`index_path`**: Path to the search index, relative to `BROWSECOMP_PATH` (uses the `BROWSECOMP_INDEX_PATH` env variable; default: `indexes/bm25`) | ||
| * **`browsecomp_path`**: Path to BrowseComp-Plus directory (uses `BROWSECOMP_PATH` env variable) | ||
| * **`max_iterations`**: Maximum number of search/reasoning steps (default: 30) | ||
| * **`top_k`**: Number of search results returned per query (default: 5) | ||
| * **`snippet_max_tokens`**: Maximum tokens to include from each document snippet (default: 512) | ||
| * **`include_get_document`**: Whether to enable the `get_document` tool (default: false) | ||
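To make the roles of `max_iterations`, `top_k` and `snippet_max_tokens` concrete, here is a toy search loop. It is a simplified sketch, not the `bcp_simple_react_workflow` itself: the `policy` and `search` callables, the action dictionaries, and whitespace-based token counting are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


def truncate_snippet(text: str, max_tokens: int) -> str:
    """Keep the first max_tokens whitespace-separated tokens (a crude stand-in for a tokenizer)."""
    return " ".join(text.split()[:max_tokens])


def run_react_episode(
    policy: Callable[[List[Dict]], Dict],
    search: Callable[[str], List[str]],
    max_iterations: int = 30,
    top_k: int = 5,
    snippet_max_tokens: int = 512,
) -> Optional[str]:
    """Alternate between policy decisions and search calls until an answer or the step limit."""
    history: List[Dict] = []
    for _ in range(max_iterations):
        action = policy(history)
        if "answer" in action:
            return action["answer"]  # the agent decided to stop and answer
        results = search(action["query"])[:top_k]  # keep only the top_k hits
        observation = [truncate_snippet(doc, snippet_max_tokens) for doc in results]
        history.append({"query": action["query"], "observation": observation})
    return None  # step budget exhausted without a final answer
```

In this sketch an episode that runs out of iterations returns `None`; how the real workflow scores such rollouts is not specified here.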
|
|
||
|
|
||
| ## Results | ||
|
|
||
| The curves below show that the agent learns to make more search calls in order to give more accurate answers. | ||
|
|
||
| Reward curve: | ||
|
|
||
|  | ||
|
|
||
| Search call curve: | ||
|
|
||
|  | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| project: "Trinity_BrowseComp_Plus" | ||
| name: "BrowseComp_Plus_Simple_React_Agent" | ||
| checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints} | ||
|
|
||
| algorithm: | ||
| algorithm_type: multi_step_grpo | ||
| repeat_times: 8 # Number of rollouts per sample for GRPO | ||
| advantage_fn_args: | ||
| std_threshold: 0.001 | ||
| optimizer: | ||
| lr: 1e-6 | ||
|
|
||
| model: | ||
| # Main agent model for rollout | ||
| model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen3-4B-Instruct-2507} | ||
| max_response_tokens: 10000 | ||
| max_model_len: 64000 | ||
|
|
||
| cluster: | ||
| node_num: 1 | ||
| gpu_per_node: 8 | ||
|
|
||
| buffer: | ||
| total_epochs: 128 | ||
| batch_size: 64 | ||
| train_batch_size: 512 # Experiences per training step: batch_size (64) * repeat_times (8) | ||
|
|
||
| explorer_input: | ||
| # Training dataset | ||
| taskset: | ||
| name: browsecomp_train | ||
| storage_type: file | ||
| path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format} | ||
| split: train | ||
| format: | ||
| prompt_key: 'query' # Field name for the query | ||
| response_key: 'answer' # Field name for ground truth answer | ||
| workflow_args: | ||
| # Uses local searcher (no MCP server required) | ||
| max_iterations: 30 # Maximum conversation rounds | ||
| max_model_tokens: 64000 # Filter experiences longer than this | ||
| # Local searcher configuration | ||
| searcher_type: "bm25" # Type of searcher: bm25, dense, etc. | ||
| index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25} # Path to search index (relative to BROWSECOMP_PATH) | ||
| browsecomp_path: ${oc.env:BROWSECOMP_PATH,null} # Path to BrowseComp-Plus directory | ||
| top_k: 5 # Number of search results per query | ||
| snippet_max_tokens: 512 # Max tokens per document snippet | ||
| include_get_document: false # Whether to include get_document tool | ||
| rollout_args: | ||
| temperature: 1.0 | ||
| top_p: 1.0 | ||
| max_tokens: 10000 | ||
| enable_progress_bar: true | ||
|
|
||
| # Evaluation datasets | ||
| eval_tasksets: | ||
| - name: browsecomp_eval | ||
| storage_type: file | ||
| path: ${oc.env:TRINITY_TASKSET_PATH,data/trinity_format} | ||
| split: test | ||
| format: | ||
| prompt_key: 'query' | ||
| response_key: 'answer' | ||
| workflow_args: | ||
| max_iterations: 30 | ||
| max_model_tokens: 64000 | ||
| searcher_type: "bm25" | ||
| index_path: ${oc.env:BROWSECOMP_INDEX_PATH,indexes/bm25} | ||
| browsecomp_path: ${oc.env:BROWSECOMP_PATH,null} | ||
| top_k: 5 | ||
| snippet_max_tokens: 512 | ||
| include_get_document: false | ||
| rollout_args: | ||
| temperature: 1.0 | ||
| max_tokens: 10000 | ||
| top_p: 1.0 | ||
| enable_progress_bar: true | ||
|
|
||
| default_workflow_type: 'bcp_simple_react_workflow' | ||
|
|
||
| trainer_input: | ||
| experience_buffer: | ||
| name: experience_buffer | ||
| storage_type: queue | ||
| max_read_timeout: 7200 | ||
| replay_buffer: | ||
| enable: true | ||
|
|
||
| explorer: | ||
| eval_interval: 10 # Evaluate every 10 training iterations | ||
| max_repeat_times_per_runner: 4 | ||
| max_timeout: 3600 # 1 hour timeout per rollout | ||
| runner_per_model: 16 | ||
|
|
||
| # Rollout model configuration (agent model) | ||
| rollout_model: | ||
| enable_thinking: true | ||
| enable_history: true | ||
| enable_openai_api: true | ||
| enable_auto_tool_choice: true # Enable automatic tool calling | ||
| tool_call_parser: hermes # Tool call parser format | ||
| engine_num: 2 # Number of vLLM engines | ||
| tensor_parallel_size: 1 # Tensor parallelism per engine | ||
| enable_prefix_caching: false | ||
| enforce_eager: true | ||
| dtype: bfloat16 | ||
| seed: 42 | ||
| gpu_memory_utilization: 0.7 | ||
| enable_chunked_prefill: true | ||
|
|
||
| # Auxiliary models (judge model for evaluation) | ||
| auxiliary_models: | ||
| - model_path: ${oc.env:TRINITY_JUDGE_MODEL_PATH,Qwen/Qwen3-30B-A3B-Instruct-2507} | ||
| engine_num: 1 | ||
| tensor_parallel_size: 2 # Use 2 GPUs for the larger judge model | ||
| enable_thinking: false | ||
| max_prompt_tokens: 20480 | ||
| max_response_tokens: 8192 | ||
| max_model_len: 32000 | ||
|
|
||
| synchronizer: | ||
| sync_style: dynamic_by_explorer | ||
| sync_method: 'nccl' | ||
| sync_interval: 4 # Sync every 4 batches | ||
| sync_timeout: 7200 | ||
|
|
||
| trainer: | ||
| save_interval: 20 # Save checkpoint every 20 iterations | ||
| grad_clip: 1.0 | ||
| use_dynamic_bsz: true | ||
| max_token_len_per_gpu: 16384 | ||
| ulysses_sequence_parallel_size: 4 | ||
|
|
||
| monitor: | ||
| monitor_type: wandb |
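A quick arithmetic check of the GPU layout implied by this config is sketched below. It assumes that the rollout engines and auxiliary (judge) engines each claim `engine_num * tensor_parallel_size` GPUs and that the trainer uses whatever remains; that allocation rule is an assumption about the scheduler rather than something stated in this PR, and `gpu_budget` is a hypothetical helper.

```python
def gpu_budget(node_num, gpu_per_node, rollout_engines, rollout_tp, aux_models):
    """Split the cluster's GPUs between rollout, auxiliary and trainer roles.

    aux_models is a list of (engine_num, tensor_parallel_size) pairs.
    """
    total = node_num * gpu_per_node
    rollout = rollout_engines * rollout_tp
    auxiliary = sum(engines * tp for engines, tp in aux_models)
    trainer = total - rollout - auxiliary
    if trainer <= 0:
        raise ValueError("no GPUs left for the trainer")
    return {"total": total, "rollout": rollout, "auxiliary": auxiliary, "trainer": trainer}


# Values taken from bcp_config.yaml: 1 node x 8 GPUs, 2 rollout engines (TP=1),
# one judge engine (TP=2).
budget = gpu_budget(node_num=1, gpu_per_node=8, rollout_engines=2, rollout_tp=1, aux_models=[(1, 2)])

# ulysses_sequence_parallel_size is 4 in the config; under the assumption above
# it should divide the trainer's GPU count.
assert budget["trainer"] % 4 == 0
```

Under these assumptions the 8 GPUs split into 2 for rollout, 2 for the judge and 4 for training, which is consistent with `ulysses_sequence_parallel_size: 4`.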