Skip to content

Commit 5efe6d3

Browse files
committed
Update Readme
2 parents 5126945 + ff3caf1 commit 5efe6d3

File tree

4 files changed

+31
-3
lines changed

4 files changed

+31
-3
lines changed

README.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,10 +12,14 @@ AgentFly is an extensible framework for building LLM agents with reinforcement l
1212

1313
## 🆕 News
1414

15-
**New: Chat Template System** - A flexible framework for creating conversation templates with multi-model support, vision capabilities, and tool integration. [Learn more →](docs/chat_template/)
15+
**Multi-Modal (Vision) Agent Training Support** - Thanks to the powerful template system, AgentFly now supports training vision-language agents! 🎉
16+
17+
Train agents that can see and understand visual content, including GUI automation and image-based QA. See our [predefined training examples](docs/examples/predefined_training_examples.md) for ready-to-use scripts.
1618

1719
---
1820

21+
**New: Chat Template System** - A flexible framework for creating conversation templates with multi-model support, vision capabilities, and tool integration. [Learn more →](docs/chat_template/)
22+
1923
## Installation
2024
Clone and initialize the project:
2125
```bash

agents/agents/agents/chain/chain_base.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -345,7 +345,7 @@ async def _run_single_chain(self,
345345
newest_messages.append({
346346
"role": "tool",
347347
"tool_call_id": tool_call["id"],
348-
"content": [{"type": "text", "text": observation_json}],
348+
"content": [{"type": "text", "text": observation}],
349349
})
350350
action_input_node.messages = deepcopy(newest_messages)
351351
action_input_node.is_terminal = result["status"] in self.terminal_status
Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
# Predefined Training Examples
2+
3+
This document provides a comprehensive overview of all the predefined training examples available in the `verl/examples/run_agents/` folder. Each example has been tested and configured for specific agent types and tasks.
4+
5+
## Training Examples Overview
6+
7+
| Example Name | Model | Agent Type | Dataset | Tools | Reward Function | Max Steps | Training Steps | Batch Size | Learning Rate | Advantage Estimator |
8+
|--------------|-------|------------|---------|-------|-----------------|-----------|----------------|------------|---------------|-------------------|
9+
| **GUI Agent** | Qwen2.5-VL-Instruct | GUI | GUI R1 Train/Test | pyautogui_code_generator | gui_reward | 4 | 200 | 64 | 4e-7 | GRPO |
10+
| **VLM QA Agent** | Qwen2.5-VL-Instruct | React | InfoSeek Train/Val | asyncdense_retrieve, answer_qa | infoseek_reward | 6 | 200 | 128 | 5e-7 | Reinforce++ |
11+
| **Code Agent** | Qwen2.5-Instruct | Code | Orz Math 57K Train | code_interpreter | math_reward_tool | 8 | 200 | 64 | 5e-7 | GRPO |
12+
| **Webshop Agent** | Qwen2.5-Instruct | React | Webshop Goals Train/Val | webshop_browser | webshop_reward | 8 | 200 | 128 | 4e-7 | GRPO |
13+
| **Search Agent** | Qwen2.5-Instruct | React | HotpotQA Train | google_search, answer | qa_f1_reward | 4 | 200 | 128 | 5e-7 | Reinforce++ |
14+
| **Science World Agent** | Qwen2.5-Instruct | React | ScienceWorld Train/Val | scienceworld_explorer | scienceworld_reward | 20 | 200 | 128 | 4e-7 | Reinforce++ |
15+
| **ALFWorld Agent** | Qwen2.5-Instruct | React | ALFWorld Train/Val | alfworld_step, alfworld_get_admissible_commands, alfworld_get_task_objective | alfworld_episode_reward | 10 | 150 | 64 | 1e-6 | Reinforce++ |
16+
| **Retrieve Agent** | Qwen2.5-Instruct | React | HotpotQA Train | asyncdense_retrieve, answer_qa | qa_f1_reward_format | 4 | 100 | 128 | 5e-7 | Reinforce++ |
17+
18+
## Detailed Configurations
19+
For detailed configurations, please refer to the training scripts.
20+
21+
## Training Curves
22+
23+
Training curves and metrics are logged to WandB for each experiment.
24+

verl

0 commit comments

Comments
 (0)