We introduce <strong>Embodied Planner-R1</strong>, a novel outcome-driven reinforcement learning framework that enables LLMs to develop interactive capabilities through autonomous exploration.

Embodied Planner-R1 enables LLM agents to learn causal relationships between actions and environmental feedback through <strong>multi-turn</strong> interactions, allowing them to update their policies based on an outcome reward.

<p align="center">
<img src="figs/alf_performance.png" width="700"/>
</p>
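For intuition, the training loop can be pictured as the minimal sketch below. The `policy`/`env` interfaces are illustrative stand-ins, not the verl implementation: the agent interacts over several turns, and only the final task outcome produces a reward.

```
# Illustrative multi-turn rollout with a single outcome reward.
# The policy/env APIs are hypothetical stand-ins, not the verl implementation.
def rollout(policy, env, max_turns=30):
    observation = env.reset()
    trajectory = []
    for _ in range(max_turns):
        action = policy.generate(observation)      # LLM proposes the next action
        observation, done = env.step(action)       # environment returns textual feedback
        trajectory.append((action, observation))
        if done:
            break
    reward = 1.0 if env.task_completed() else 0.0  # one outcome reward per episode
    return trajectory, reward
```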
## 🔥 Releases
<strong>[2025/07/01]</strong>

- 🌌 Full training code and scripts are available.
- 🤗 We open-source our model weights on [Hugging Face]().
## 🚀 Installation
1. Embodied-Planner-R1 is built on [verl](https://github.com/volcengine/verl) and requires vLLM >= 0.8.

```
# Create the conda environment
# ...
pip install fastapi
pip install uvicorn
```
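After installation, a quick Python check confirms that a compatible vLLM build was picked up:

```
# Verify the installed vLLM meets the >= 0.8 requirement.
import vllm
print(vllm.__version__)
```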
## 🛠️ Data preparation

We first prepare the task data used for reinforcement learning:
```
# Get task data for RL training
cd get_data
bash get_data_for_training.sh
```
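If you want to sanity-check the generated tasks before training, something like the sketch below works. The file name and the parquet format are assumptions (verl commonly consumes parquet datasets); adjust both to whatever the script actually emits.

```
# Inspect the generated task data from Python (path and schema are assumptions).
import pandas as pd

tasks = pd.read_parquet("get_data/train.parquet")
print(tasks.columns.tolist())  # available fields
print(tasks.iloc[0])           # one task record
```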
## 🕹️ Quick Start
```
# Remember to replace the paths in the shell scripts with your local paths
bash cmd/alf.sh

bash cmd/sci_easy.sh
```
## 🎮 Evaluation
```
# We follow the framework of MINT to evaluate models.
cd verl/eval_agent
# ...
conda activate eval_agent
python -m eval_agent.main --agent_config er1_alfworld --exp_config alfworld_v2 --split dev --verbose # you can find more examples in eval.sh
```
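To aggregate results across evaluation runs, a small script like the following can help; it assumes the evaluator writes JSONL records with a boolean `success` field, so verify against eval_agent's actual output format first.

```
# Hypothetical aggregation of evaluation outputs (the JSONL schema is an assumption).
import glob
import json

records = [json.loads(line)
           for path in glob.glob("outputs/*.jsonl")
           for line in open(path)]
success_rate = sum(r["success"] for r in records) / len(records)
print(f"Success rate: {success_rate:.1%} over {len(records)} tasks")
```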
## Acknowledgements
The training codebase is primarily based on [Verl](https://github.com/volcengine/verl), while the evaluation framework is adapted from [MINT](https://github.com/xingyaoww/mint-bench). Our model builds upon the foundation of [`Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). We deeply appreciate their excellent contributions.