Update Readme

Reason-Wang · Reason-Wang · commit 5efe6d32fbab · 2025-08-15T19:36:18.000Z
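
Besides the README and docs changes, the `chain_base.py` hunk swaps `observation_json` for `observation` in the text part of the tool message. A minimal sketch of the resulting message shape, assuming `observation` is a plain string at that point; the helper name `build_tool_message` and the sample values are hypothetical and not part of AgentFly:

```python
from typing import Any


def build_tool_message(tool_call_id: str, observation: str) -> dict[str, Any]:
    """Wrap a tool observation in the message shape shown in the hunk.

    Hypothetical helper for illustration only; the real code builds this
    dict inline inside `_run_single_chain`.
    """
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        # The text part carries the observation string as-is.
        "content": [{"type": "text", "text": observation}],
    }


# Sample values are made up for the example.
msg = build_tool_message("call_0", "2 + 2 = 4")
print(msg["content"][0]["text"])
```

Keeping the content as a list of typed parts leaves room for non-text parts, which may be why the multi-modal work keeps this structure; that reading is an inference from the diff, not something the commit states.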
diff --git a/README.md b/README.md
@@ -12,10 +12,14 @@ AgentFly is an extensible framework for building LLM agents with reinforcement l
 
 ## 🆕 News
 
-**New: Chat Template System** - A flexible framework for creating conversation templates with multi-model support, vision capabilities, and tool integration. [Learn more →](docs/chat_template/)
+**Multi-Modal (Vision) Agent Training Support** - Thanks to the powerful template system, AgentFly now supports training vision-language agents! 🎉
+
+Train agents that can see and understand visual content, for tasks such as GUI automation and image-based QA. See our [predefined training examples](docs/examples/predefined_training_examples.md) for ready-to-use scripts.
 
 ---
 
+**New: Chat Template System** - A flexible framework for creating conversation templates with multi-model support, vision capabilities, and tool integration. [Learn more →](docs/chat_template/)
+
 ## Installation
 Clone and initialize the project:
 ```bash
diff --git a/agents/agents/agents/chain/chain_base.py b/agents/agents/agents/chain/chain_base.py
@@ -345,7 +345,7 @@ async def _run_single_chain(self,
                     newest_messages.append({
                         "role": "tool",
                         "tool_call_id": tool_call["id"],
-                        "content": [{"type": "text", "text": observation_json}],
+                        "content": [{"type": "text", "text": observation}],
                     })
                     action_input_node.messages = deepcopy(newest_messages)
                     action_input_node.is_terminal = result["status"] in self.terminal_status
diff --git a/docs/examples/predefined_training_examples.md b/docs/examples/predefined_training_examples.md
@@ -0,0 +1,24 @@
+# Predefined Training Examples
+
+This document summarizes the predefined training examples in the `verl/examples/run_agents/` folder. Each example is configured and tested for a specific agent type and task.
+
+## Training Examples Overview
+
+| Example Name | Model | Agent Type | Dataset | Tools | Reward Function | Max Steps | Training Steps | Batch Size | Learning Rate | Advantage Estimator |
+|--------------|-------|------------|---------|-------|-----------------|-----------|----------------|------------|---------------|-------------------|
+| **GUI Agent** | Qwen2.5-VL-Instruct | GUI | GUI R1 Train/Test | pyautogui_code_generator | gui_reward | 4 | 200 | 64 | 4e-7 | GRPO |
+| **VLM QA Agent** | Qwen2.5-VL-Instruct | React | InfoSeek Train/Val | asyncdense_retrieve, answer_qa | infoseek_reward | 6 | 200 | 128 | 5e-7 | Reinforce++ |
+| **Code Agent** | Qwen2.5-Instruct | Code | Orz Math 57K Train | code_interpreter | math_reward_tool | 8 | 200 | 64 | 5e-7 | GRPO |
+| **Webshop Agent** | Qwen2.5-Instruct | React | Webshop Goals Train/Val | webshop_browser | webshop_reward | 8 | 200 | 128 | 4e-7 | GRPO |
+| **Search Agent** | Qwen2.5-Instruct | React | HotpotQA Train | google_search, answer | qa_f1_reward | 4 | 200 | 128 | 5e-7 | Reinforce++ |
+| **ScienceWorld Agent** | Qwen2.5-Instruct | React | ScienceWorld Train/Val | scienceworld_explorer | scienceworld_reward | 20 | 200 | 128 | 4e-7 | Reinforce++ |
+| **ALFWorld Agent** | Qwen2.5-Instruct | React | ALFWorld Train/Val | alfworld_step, alfworld_get_admissible_commands, alfworld_get_task_objective | alfworld_episode_reward | 10 | 150 | 64 | 1e-6 | Reinforce++ |
+| **Retrieve Agent** | Qwen2.5-Instruct | React | HotpotQA Train | asyncdense_retrieve, answer_qa | qa_f1_reward_format | 4 | 100 | 128 | 5e-7 | Reinforce++ |
+
+## Detailed Configurations
+For the full configuration of each example, refer to its training script in `verl/examples/run_agents/`.
+
+## Training Curves
+
+Training curves and metrics are logged to WandB for each experiment.
+
diff --git a/verl b/verl
@@ -1 +1 @@
-Subproject commit 0ba71360604c85ca7e83168520169fa858681633
+Subproject commit 1f3ca1319d9c44a580394a5f6c20ca156957eedb