
Commit 32eb707

Update README.md
1 parent 7aae039 commit 32eb707

File tree

1 file changed (+52, -5 lines)


README.md

Lines changed: 52 additions & 5 deletions
@@ -1,6 +1,41 @@
<div align="center">

# Embodied-Planner-R1
<div>
🌠 Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning 🚀
</div>
</div>

<div>
<br>

<div align="center">

[![Hugging Face Model](https://img.shields.io/badge/models-%23000000?style=for-the-badge&logo=huggingface&logoColor=000&logoColor=white)]()
[![Hugging Face Data](https://img.shields.io/badge/data-%23000000?style=for-the-badge&logo=huggingface&logoColor=000&logoColor=white)]()
[![Paper](https://img.shields.io/badge/Paper-%23000000?style=for-the-badge&logo=arxiv&logoColor=000&labelColor=white)]()
</div>
</div>

We introduce <strong>Embodied Planner-R1</strong>, a novel outcome-driven reinforcement learning framework that enables LLMs to develop interactive capabilities through autonomous exploration.

Embodied Planner-R1 enables LLM agents to learn causal relationships between actions and environmental feedback through <strong>multi-turn</strong> interactions, allowing them to update their policies based on an outcome reward.

<p align="center">
<img src="figs/alf_performance.png" width="700"/>
</p>
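To make the interaction loop concrete, below is a minimal, purely illustrative sketch of a multi-turn rollout with a sparse outcome reward. It is not the actual verl-based training code of this repository; every class, method, and reward definition in it is a placeholder.

```python
# Purely illustrative sketch -- NOT the actual verl / Embodied Planner-R1 code.
# It shows the idea described above: the agent interacts with the environment
# over multiple turns and the policy is updated from a single outcome reward.
import random

class ToyEnv:
    """Stand-in for an embodied task environment (e.g. an ALFWorld-style task)."""
    def reset(self):
        self.steps_left = 5
        return "initial observation"

    def step(self, action):
        self.steps_left -= 1
        done = self.steps_left == 0
        return f"feedback for {action!r}", done

    def task_completed(self):
        return random.random() > 0.5  # placeholder success check

class ToyPolicy:
    """Stand-in for the LLM policy; a real agent generates text actions."""
    def generate(self, observation):
        return random.choice(["go to desk 1", "open drawer 1", "take key"])

    def update(self, trajectory, outcome_reward):
        # A real implementation computes a policy-gradient style loss over the
        # whole multi-turn trajectory, weighted by the outcome reward.
        pass

def rollout_episode(policy, env, max_turns=30):
    """Collect one multi-turn trajectory; only the final outcome is rewarded."""
    observation = env.reset()
    trajectory = []
    for _ in range(max_turns):
        action = policy.generate(observation)
        observation, done = env.step(action)  # environmental feedback per turn
        trajectory.append((observation, action))
        if done:
            break
    return trajectory, (1.0 if env.task_completed() else 0.0)

env, policy = ToyEnv(), ToyPolicy()
for _ in range(3):
    trajectory, reward = rollout_episode(policy, env)
    policy.update(trajectory, reward)  # learn from the sparse outcome reward
```

In the released framework, the policy is an LLM whose generated actions are executed in embodied environments, and the update step is handled by the verl trainer.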

## 🔥 Releases
<strong>[2025/07/01]</strong>
- 🌌 Full training code and scripts are available.
- 🤗 We open-source our model weights on [Hugging Face]().
## 🚀 Installation
1. Embodied-Planner-R1 is based on verl with vLLM >= 0.8.
```
# Create the conda environment
@@ -41,22 +76,23 @@ pip install fastapi
pip install uvicorn
```
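As an optional sanity check (not part of the official setup scripts), you can verify that the installed vLLM meets the >= 0.8 requirement mentioned above. The snippet assumes the `packaging` package is available, which it normally is as a vLLM dependency.

```python
# Optional sanity check: confirm the installed vLLM satisfies vLLM >= 0.8.
# Purely illustrative; not required by the training scripts.
from packaging.version import Version
import vllm

assert Version(vllm.__version__) >= Version("0.8"), (
    f"found vLLM {vllm.__version__}, but >= 0.8 is required"
)
print(f"vLLM {vllm.__version__} OK")
```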

## 🛠️ Data preparation
We need to prepare the task data for reinforcement learning.
```
# get task data for rl training
cd get_data
bash get_data_for_training.sh
```
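If you want to confirm what the script produced (optional), a quick listing helps. The `get_data` output directory below is an assumption; point it at wherever `get_data_for_training.sh` actually writes on your machine.

```python
# Optional: list the files generated by the data-preparation step.
# The output directory is an assumption -- adjust it to the actual location
# used by get_data_for_training.sh in your setup.
from pathlib import Path

out_dir = Path("get_data")
for path in sorted(out_dir.rglob("*")):
    if path.is_file():
        print(f"{path} ({path.stat().st_size} bytes)")
```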

## 🕹️ Quick Start
```
# Remember to replace the paths in the shell scripts with your local paths
bash cmd/alf.sh

bash cmd/sci_easy.sh
```

## 🎮 Evaluation
```
# We follow the MINT framework to evaluate models.
cd verl/eval_agent
@@ -77,3 +113,14 @@ conda activate eval_agent
python -m eval_agent.main --agent_config er1_alfworld --exp_config alfworld_v2 --split dev --verbose # you can find more examples in eval.sh
```
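If you need to repeat the evaluation over several configurations, a small driver can wrap the command above. Only the `er1_alfworld` / `alfworld_v2` / `dev` combination is shown in this README; any other combinations are assumptions, so check `eval.sh` for the ones that actually exist.

```python
# Illustrative driver that reruns the evaluation command shown above.
# Add further (agent_config, exp_config, split) combinations only if they
# exist in eval.sh; the single entry below mirrors the README example.
import subprocess

runs = [("er1_alfworld", "alfworld_v2", "dev")]
for agent_config, exp_config, split in runs:
    subprocess.run(
        [
            "python", "-m", "eval_agent.main",
            "--agent_config", agent_config,
            "--exp_config", exp_config,
            "--split", split,
            "--verbose",
        ],
        check=True,
    )
```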
## Acknowledgements
The training codebase is primarily based on [Verl](https://github.com/volcengine/verl), while the evaluation framework is adapted from [MINT](https://github.com/xingyaoww/mint-bench). Our model builds upon the foundation of [`Qwen2.5-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct). We deeply appreciate their excellent contributions.
## Citation
```
```
