Commit ae2f883

docs: update the readme; update the rst_source/examples; (RLinf#308)
* docs: update the readme; update the rst_source/examples of libero and maniskill; Signed-off-by: WinstonWmj <983289917@qq.com>
Parent: 57745af

File tree: 8 files changed (+221, −143 lines)

README.md

Lines changed: 29 additions & 39 deletions
```diff
@@ -19,7 +19,7 @@
 </div>

 <h1 align="center">
-  <sub>RLinf: Reinforcement Learning Infrastructure for Agentic AI</sub>
+  <sub>RLinf: Reinforcement Learning Infrastructure for Post-training</sub>
 </h1>

 RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for `Infrastructure`, highlighting its role as a robust backbone for next-generation training. It also stands for `Infinite`, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
```
```diff
@@ -30,7 +30,9 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr


 ## What's NEW!
-- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [Behavior 1k](https://github.com/StanfordVL/BEHAVIOR-1K). Doc: [RL on Behavior 1k](https://rlinf.readthedocs.io/en/latest/rst_source/examples/behavior.html)
+- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [GR00T-N1.5](https://github.com/NVIDIA/Isaac-GR00T). Doc: [RL on GR00T-N1.5](https://rlinf.readthedocs.io/en/latest/rst_source/examples/gr00t.html).
+- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [Metaworld](https://github.com/Farama-Foundation/Metaworld). Doc: [RL on Metaworld](https://rlinf.readthedocs.io/en/latest/rst_source/examples/metaworld.html).
+- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [Behavior 1k](https://github.com/StanfordVL/BEHAVIOR-1K). Doc: [RL on Behavior 1k](https://rlinf.readthedocs.io/en/latest/rst_source/examples/behavior.html).
 - [2025/11] Add lora support to π₀ and π₀.₅.
 - [2025/10] 🔥 RLinf supports reinforcement learning fine-tuning for π₀ and π₀.₅! Doc: [RL on π₀ and π₀.₅ Models](https://rlinf.readthedocs.io/en/latest/rst_source/examples/pi0.html). For more technical details, refer to the [RL fine-tuning for π₀ and π₀.₅ technical report](https://arxiv.org/abs/2510.25889). The report on πRL by [Machine Heart](https://mp.weixin.qq.com/s/dFlpmqmE0qfhOQmGG25X9g) and [RoboTech](https://mp.weixin.qq.com/s/S51P-Y1UYXzumnZzon2N1g) are also released.
 - [2025/10] 🔥 RLinf now officially supports online reinforcement learning! Doc: [coding_online_rl](https://rlinf.readthedocs.io/en/latest/rst_source/examples/coding_online_rl.html), Blog post: [The first open-source agent online RL framework RLinf-Online](https://mp.weixin.qq.com/s/jmohmDokuWLhQHFueSHZIQ).
```
```diff
@@ -91,7 +93,7 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 </ul>
 <li><b>Custom Models</b></li>
 <ul>
-<li>MLP-Policy</li>
+<li>MLP-Policy</li>
 </ul>
 </ul>
 </td>
```
```diff
@@ -116,15 +118,15 @@ RLinf is a flexible and scalable open-source infrastructure designed for post-tr
 </tbody>
 </table>

-RLinf supports mainstream VLA models, mainstream CPU & GPU-based simulators via standardized Worker interfaces, and enables the first RL fine-tuning of the $\pi_{0}$ and $\pi_{0.5}$ model family with a flow-matching action expert, as shown in the above table.
+RLinf supports mainstream VLA models, mainstream CPU & GPU-parallel simulators via standardized Worker interfaces, and enables the first RL fine-tuning of the $\pi_{0}$ and $\pi_{0.5}$ model family with a flow-matching action expert, as shown in the above table.

 ### Agentic RL

-Agentic RL includes both RL training for improving LLM reasoning ability, such as [Math Reasoning](https://rlinf.readthedocs.io/en/latest/rst_source/examples/reasoning.html), and RL training for Agents, for example, [RL training of coding agent](https://rlinf.readthedocs.io/en/latest/rst_source/examples/coding_online_rl.html). RLinf can also well support agentic RL. We believe embodied intelligence will also integrate the ability of agents in the future to complete complex tasks.
+Agentic RL includes both RL training for improving LLM reasoning ability, such as [Math Reasoning](https://rlinf.readthedocs.io/en/latest/rst_source/examples/reasoning.html), and RL training for Agents, for example, [RL training of coding agent](https://rlinf.readthedocs.io/en/latest/rst_source/examples/coding_online_rl.html). We believe embodied intelligence will also integrate the ability of agents in the future to complete complex tasks.

 ### High flexibility, efficiency, and scalability

-Besides the rich functionalities introduced above, RLinf has high flexibility to support diverse RL training workflows (e.g., simulator integrated embodied RL, PPO/RLHF), while hiding the complexity of distributed programming. Users can easily scale RL training to a large number of GPU nodes without modifying code, meeting the increasing demand of computation for RL training.
+Besides the rich functionalities introduced above, RLinf has high flexibility to support diverse RL training workflows (PPO, GRPO, SAC and so on), while hiding the complexity of distributed programming. Users can easily scale RL training to a large number of GPU nodes without modifying code, meeting the increasing demand of computation for RL training.

 The high flexibility allows RLinf to explore more efficient scheduling and execution. The hybrid execution mode for embodied RL achieves a **100%+** throughput improvement compared to baseline solutions.
```

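The "standardized Worker interfaces" mentioned in the diff above can be pictured as a thin adapter layer that makes every simulator look the same to the trainer. The sketch below is purely illustrative: the class and method names (`EnvWorker`, `reset`, `step`) are hypothetical assumptions, not RLinf's actual API.

```python
# Hypothetical sketch of a standardized simulator Worker interface.
# Names (EnvWorker, reset, step) are illustrative assumptions, NOT RLinf's API.
from abc import ABC, abstractmethod


class EnvWorker(ABC):
    """Uniform wrapper so CPU- and GPU-parallel simulators look alike."""

    @abstractmethod
    def reset(self) -> list:
        """Return one observation per parallel environment."""

    @abstractmethod
    def step(self, actions: list) -> tuple:
        """Advance all environments one step; return (obs, rewards, dones)."""


class ToyWorker(EnvWorker):
    """Trivial stand-in 'simulator' running n parallel 1-D environments."""

    def __init__(self, n_envs: int):
        self.n_envs = n_envs
        self.state = [0.0] * n_envs

    def reset(self) -> list:
        self.state = [0.0] * self.n_envs
        return list(self.state)

    def step(self, actions: list) -> tuple:
        self.state = [s + a for s, a in zip(self.state, actions)]
        rewards = [-abs(s - 1.0) for s in self.state]  # reward peaks at state 1.0
        dones = [False] * self.n_envs
        return list(self.state), rewards, dones
```

Because the trainer only ever calls `reset`/`step`, swapping one benchmark simulator for another (or a CPU simulator for a GPU-parallel one) reduces to providing a different Worker, which is the decoupling the paragraph describes.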
```diff
@@ -145,7 +147,7 @@ For more tutorials of RLinf and application examples, checkout our [documentatio
 ### Embodied Intelligence

 - RLinf supports both PPO and GRPO algorithms, enabling state-of-the-art training for Vision-Language-Action models.
-- The framework provides seamless integration with mainstream embodied intelligence benchmarks, including ManiSkill3 and LIBERO, and achieves strong performance across diverse evaluation metrics.
+- The framework provides seamless integration with mainstream embodied intelligence benchmarks, and achieves strong performance across diverse evaluation metrics.

 #### OpenVLA and OpenVLA-OFT Results
```

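For context on the GRPO side of that claim: unlike PPO, GRPO drops the learned critic and instead normalizes each prompt's group of rollout rewards against the group's own mean and standard deviation. A minimal sketch of that group-relative advantage (the published algorithm's core idea, not RLinf's implementation):

```python
# Group-relative advantage estimation as used by GRPO (illustrative sketch;
# this shows the published algorithm's core idea, not RLinf's actual code).
from statistics import mean, pstdev


def grpo_advantages(group_rewards: list, eps: float = 1e-6) -> list:
    """Normalize one prompt's rollout rewards by the group mean and std."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]


# Four rollouts of the same task: successes score 1.0, failures 0.0.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts end up with positive advantage and failed ones with negative advantage, and the group sums to zero, so no value network is needed.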
````diff
@@ -583,52 +585,40 @@ We welcome contributions to RLinf. Please read [contribution guide](https://gith
 If you find **RLinf** helpful, please cite the paper:

 ```bibtex
-@misc{yu2025rlinfflexibleefficientlargescale,
-  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
-  author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
-  year={2025},
-  eprint={2509.15965},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2509.15965},
+@article{yu2025rlinf,
+  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+  author={Yu, Chao and Wang, Yuanqing and Guo, Zhen and Lin, Hao and Xu, Si and Zang, Hongzhi and Zhang, Quanlu and Wu, Yongji and Zhu, Chunyang and Hu, Junhao and others},
+  journal={arXiv preprint arXiv:2509.15965},
+  year={2025}
 }
 ```

 If you use RL+VLA in RLinf, you can also cite our technical report and empirical study paper:

 ```bibtex
-@misc{zang2025rlinfvlaunifiedefficientframework,
-  title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
-  author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
-  year={2025},
-  eprint={2510.06710},
-  archivePrefix={arXiv},
-  primaryClass={cs.RO},
-  url={https://arxiv.org/abs/2510.06710},
+@article{zang2025rlinf,
+  title={RLinf-VLA: A Unified and Efficient Framework for VLA+ RL Training},
+  author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
+  journal={arXiv preprint arXiv:2510.06710},
+  year={2025}
 }
 ```

 ```bibtex
-@misc{liu2025rlbringvlageneralization,
-  title={What Can RL Bring to VLA Generalization? An Empirical Study},
-  author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
-  year={2025},
-  eprint={2505.19789},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2505.19789},
+@article{liu2025can,
+  title={What can rl bring to vla generalization? an empirical study},
+  author={Liu, Jijia and Gao, Feng and Wei, Bingwen and Chen, Xinlei and Liao, Qingmin and Wu, Yi and Yu, Chao and Wang, Yu},
+  journal={arXiv preprint arXiv:2505.19789},
+  year={2025}
 }
 ```

 ```bibtex
-@misc{chen2025pitextttrlonlinerlfinetuning,
-  title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
-  author={Kang Chen and Zhihao Liu and Tonghe Zhang and Zhen Guo and Si Xu and Hao Lin and Hongzhi Zang and Quanlu Zhang and Zhaofei Yu and Guoliang Fan and Tiejun Huang and Yu Wang and Chao Yu},
-  year={2025},
-  eprint={2510.25889},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2510.25889},
+@article{chen2025pi_,
+  title={$$\backslash$pi\_$\backslash$texttt $\{$RL$\}$ $: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
+  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Zhang, Quanlu and Yu, Zhaofei and Fan, Guoliang and others},
+  journal={arXiv preprint arXiv:2510.25889},
+  year={2025}
 }
 ```
````

README.zh-CN.md

Lines changed: 32 additions & 38 deletions
```diff
@@ -19,7 +19,7 @@
 </div>

 <h1 align="center">
-  <sub>RLinf: 为Agentic AI而生的强化学习框架</sub>
+  <sub>RLinf: 为Post-training而生的强化学习框架</sub>
 </h1>

 RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进行基础模型的后训练而设计。名称中的 “inf” 既代表 `Infrastructure`,强调其作为新一代训练坚实基础的作用;也代表 `Infinite`,寓意其支持开放式学习、持续泛化以及智能发展的无限可能。
```
```diff
@@ -30,7 +30,9 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进


 ## 最新动态
-- [2025/11] 🔥 基于[Behavior 1k](https://github.com/StanfordVL/BEHAVIOR-1K)的强化学习微调已经上线! 文档:[RL on Behavior 1k](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/behavior.html)
+- [2025/11] 🔥 RLinf现在已经支持强化学习微调[GR00T-N1.5](https://github.com/NVIDIA/Isaac-GR00T)!文档:[RL on GR00T-N1.5](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/gr00t.html)
+- [2025/11] 🔥 基于[Metaworld](https://github.com/Farama-Foundation/Metaworld)的强化学习微调已经上线! 文档:[RL on Metaworld](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/metaworld.html)
+- [2025/11] 🔥 基于[Behavior 1k](https://github.com/StanfordVL/BEHAVIOR-1K)的强化学习微调已经上线! 文档:[RL on Behavior 1k](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/behavior.html)
 - [2025/11] lora微调支持π₀和π₀.₅模型。
 - [2025/10] 🔥 π₀和π₀.₅模型的强化学习微调已经上线! 文档:[π₀和π₀.₅模型强化学习训练](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/pi0.html)。更多技术细节请参考:[π₀ 与 π₀.₅ 模型强化学习微调技术报告](https://arxiv.org/abs/2510.25889)。机器之心与具身智能之心报道:[《RLinf上新πRL:在线强化学习微调π₀ 和 π₀.₅》](https://mp.weixin.qq.com/s/dFlpmqmE0qfhOQmGG25X9g), [《清华大学最新!πRL:用在线强化学习让机器人 “边学边做” 的通用方案》](https://mp.weixin.qq.com/s/S51P-Y1UYXzumnZzon2N1g)
 - [2025/10] 🔥 RLinf 正式支持在线强化学习!文档:[coding_online_rl](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/coding_online_rl.html),同时发布文章 [《首个开源的Agent在线强化学习框架RLinf-Online!让你的Agent今天比昨天更聪明》](https://mp.weixin.qq.com/s/jmohmDokuWLhQHFueSHZIQ)
```
```diff
@@ -89,6 +91,10 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
 <ul>
 <li>Qwen2.5-VL</li>
 </ul>
+<li><b>自定义模型</b></li>
+<ul>
+<li>MLP-Policy ✅</li>
+</ul>
 </ul>
 </td>
 <td>
```
```diff
@@ -112,15 +118,15 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
 </tbody>
 </table>

-如上表所示,RLinf支持主流VLA模型,通过标准的Worker接口支持主流的基于CPU或者GPU的模拟器,首次实现对带有 flow-matching action expert 的 $\pi_{0}$ 和 $\pi_{0.5}$ 模型家族的RL微调。
+如上表所示,RLinf支持主流VLA模型,通过标准的Worker接口支持主流的CPU或者GPU并行的模拟器,首次实现对带有 flow-matching action expert 的 $\pi_{0}$ 和 $\pi_{0.5}$ 模型家族的RL微调。

 ### 智能体强化学习

-智能体强化学习包括用于提升大语言模型推理能力的强化学习训练,例如[数学推理](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/reasoning.html);也包括针对各类智能体的强化学习训练,例如[编程智能体的在线强化学习训练](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/coding_online_rl.html)RLinf 框架能够很好地支持智能体强化学习。我们相信,未来的具身智能也必将融合智能体的能力,以完成更复杂的任务。
+智能体强化学习包括用于提升大语言模型推理能力的强化学习训练,例如[数学推理](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/reasoning.html);也包括针对各类智能体的强化学习训练,例如[编程智能体的在线强化学习训练](https://rlinf.readthedocs.io/zh-cn/latest/rst_source/examples/coding_online_rl.html)。我们相信,未来的具身智能也必将融合智能体的能力,以完成更复杂的任务。

 ### 高灵活性、高效性与高可扩展性

-除了上述丰富功能外,RLinf 还具有高度灵活性,可支持多种强化学习训练工作流(例如集成了模拟器的具身强化学习、PPO/RLHF),同时隐藏了分布式编程的复杂性。用户无需修改代码即可轻松将强化学习训练扩展至大量GPU节点,满足强化学习训练日益增长的计算需求。
+除了上述丰富功能外,RLinf 还具有高度灵活性,可支持多种强化学习训练工作流(PPO、GRPO、SAC等),同时隐藏了分布式编程的复杂性。用户无需修改代码即可轻松将强化学习训练扩展至大量GPU节点,满足强化学习训练日益增长的计算需求。

 这种高灵活性使 RLinf 能够探索更高效的调度与执行模式。在具身强化学习中,混合执行模式相较于基线方案实现了100%以上的吞吐量提升。
```
```diff
@@ -141,7 +147,7 @@ RLinf 是一个灵活且可扩展的开源框架,专为利用强化学习进
 ### 具身智能

 - RLinf 同时支持 PPO 与 GRPO 算法,为视觉-语言-动作(Vision-Language-Action, VLA)模型提供最先进的训练能力。
-- 该框架与主流具身智能基准测试(如 ManiSkill3 与 LIBERO)无缝集成,并在多样化的评测指标上均取得了优异表现。
+- 该框架与主流具身智能基准测试无缝集成,并在多样化的评测指标上均取得了优异表现。

 #### OpenVLA 和 OpenVLA-OFT 结果
```
````diff
@@ -584,52 +590,40 @@ RLinf 具有全面的 CI 测试,涵盖核心组件(通过单元测试)和
 如果您觉得 **RLinf** 对您的研究或工作有所帮助,请引用以下论文:

 ```bibtex
-@misc{yu2025rlinfflexibleefficientlargescale,
-  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
-  author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
-  year={2025},
-  eprint={2509.15965},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2509.15965},
+@article{yu2025rlinf,
+  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
+  author={Yu, Chao and Wang, Yuanqing and Guo, Zhen and Lin, Hao and Xu, Si and Zang, Hongzhi and Zhang, Quanlu and Wu, Yongji and Zhu, Chunyang and Hu, Junhao and others},
+  journal={arXiv preprint arXiv:2509.15965},
+  year={2025}
 }
 ```

 如果你在 RLinf 中使用了 RL+VLA,欢迎引用我们的算法技术报告和实证研究论文:

 ```bibtex
-@misc{zang2025rlinfvlaunifiedefficientframework,
-  title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
-  author={Hongzhi Zang and Mingjie Wei and Si Xu and Yongji Wu and Zhen Guo and Yuanqing Wang and Hao Lin and Liangzhi Shi and Yuqing Xie and Zhexuan Xu and Zhihao Liu and Kang Chen and Wenhao Tang and Quanlu Zhang and Weinan Zhang and Chao Yu and Yu Wang},
-  year={2025},
-  eprint={2510.06710},
-  archivePrefix={arXiv},
-  primaryClass={cs.RO},
-  url={https://arxiv.org/abs/2510.06710},
+@article{zang2025rlinf,
+  title={RLinf-VLA: A Unified and Efficient Framework for VLA+ RL Training},
+  author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
+  journal={arXiv preprint arXiv:2510.06710},
+  year={2025}
 }
 ```

 ```bibtex
-@misc{liu2025rlbringvlageneralization,
-  title={What Can RL Bring to VLA Generalization? An Empirical Study},
-  author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
-  year={2025},
-  eprint={2505.19789},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2505.19789},
+@article{liu2025can,
+  title={What can rl bring to vla generalization? an empirical study},
+  author={Liu, Jijia and Gao, Feng and Wei, Bingwen and Chen, Xinlei and Liao, Qingmin and Wu, Yi and Yu, Chao and Wang, Yu},
+  journal={arXiv preprint arXiv:2505.19789},
+  year={2025}
 }
 ```

 ```bibtex
-@misc{chen2025pitextttrlonlinerlfinetuning,
-  title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
-  author={Kang Chen and Zhihao Liu and Tonghe Zhang and Zhen Guo and Si Xu and Hao Lin and Hongzhi Zang and Quanlu Zhang and Zhaofei Yu and Guoliang Fan and Tiejun Huang and Yu Wang and Chao Yu},
-  year={2025},
-  eprint={2510.25889},
-  archivePrefix={arXiv},
-  primaryClass={cs.LG},
-  url={https://arxiv.org/abs/2510.25889},
+@article{chen2025pi_,
+  title={$$\backslash$pi\_$\backslash$texttt $\{$RL$\}$ $: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
+  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Zhang, Quanlu and Yu, Zhaofei and Fan, Guoliang and others},
+  journal={arXiv preprint arXiv:2510.25889},
+  year={2025}
 }
 ```
````