</div>

<h1 align="center">
  <sub>RLinf: Reinforcement Learning Infrastructure for Post-training</sub>
</h1>

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for `Infrastructure`, highlighting its role as a robust backbone for next-generation training. It also stands for `Infinite`, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## What's NEW!

- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [GR00T-N1.5](https://github.com/NVIDIA/Isaac-GR00T). Doc: [RL on GR00T-N1.5](https://rlinf.readthedocs.io/en/latest/rst_source/examples/gr00t.html).
- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [Metaworld](https://github.com/Farama-Foundation/Metaworld). Doc: [RL on Metaworld](https://rlinf.readthedocs.io/en/latest/rst_source/examples/metaworld.html).
- [2025/11] 🔥 RLinf supports reinforcement learning fine-tuning for [Behavior 1k](https://github.com/StanfordVL/BEHAVIOR-1K). Doc: [RL on Behavior 1k](https://rlinf.readthedocs.io/en/latest/rst_source/examples/behavior.html).
- [2025/11] Added LoRA support for π₀ and π₀.₅.
- [2025/10] 🔥 RLinf supports reinforcement learning fine-tuning for π₀ and π₀.₅! Doc: [RL on π₀ and π₀.₅ Models](https://rlinf.readthedocs.io/en/latest/rst_source/examples/pi0.html). For more technical details, refer to the [RL fine-tuning for π₀ and π₀.₅ technical report](https://arxiv.org/abs/2510.25889). Reports on πRL by [Machine Heart](https://mp.weixin.qq.com/s/dFlpmqmE0qfhOQmGG25X9g) and [RoboTech](https://mp.weixin.qq.com/s/S51P-Y1UYXzumnZzon2N1g) have also been released.
- [2025/10] 🔥 RLinf now officially supports online reinforcement learning! Doc: [coding_online_rl](https://rlinf.readthedocs.io/en/latest/rst_source/examples/coding_online_rl.html), blog post: [The first open-source agent online RL framework RLinf-Online](https://mp.weixin.qq.com/s/jmohmDokuWLhQHFueSHZIQ).
      </ul>
      <li><b>Custom Models</b></li>
      <ul>
        <li>MLP-Policy ✅</li>
      </ul>
    </ul>
  </td>
  </tbody>
</table>

RLinf supports mainstream VLA models, mainstream CPU & GPU-parallel simulators via standardized Worker interfaces, and enables the first RL fine-tuning of the $\pi_{0}$ and $\pi_{0.5}$ model family with a flow-matching action expert, as shown in the table above.
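A flow-matching action expert produces a continuous action by integrating a learned velocity field from Gaussian noise toward an action, rather than sampling tokens. The following is a generic sketch of that sampling loop under simple assumptions; the `velocity_field` here is a toy stand-in whose flow moves straight toward a fixed target, not RLinf's or π₀'s trained expert:

```python
import numpy as np

def sample_action(velocity_field, action_dim, steps=10, rng=None):
    """Euler-integrate a velocity field v(a, t) from t=0 to t=1,
    starting from Gaussian noise, to produce an action (flow matching)."""
    if rng is None:
        rng = np.random.default_rng(0)
    a = rng.standard_normal(action_dim)  # a_0 ~ N(0, I)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        a = a + dt * velocity_field(a, t)  # Euler step: a_{t+dt} = a_t + v*dt
    return a

# Toy "expert": a straight-line (rectified) flow toward a fixed target action.
target = np.array([0.5, -0.2, 0.1])
v = lambda a, t: (target - a) / max(1.0 - t, 1e-3)
action = sample_action(v, action_dim=3)  # converges to `target` at t=1
```

In a real VLA model the velocity field is a neural network conditioned on the observation and language instruction; the integration loop itself is unchanged.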
### Agentic RL

Agentic RL covers both RL training that improves LLM reasoning ability, such as [Math Reasoning](https://rlinf.readthedocs.io/en/latest/rst_source/examples/reasoning.html), and RL training for agents, for example [RL training of a coding agent](https://rlinf.readthedocs.io/en/latest/rst_source/examples/coding_online_rl.html). We believe embodied intelligence will also integrate agent capabilities in the future to complete complex tasks.
### High flexibility, efficiency, and scalability

Beyond the rich functionality introduced above, RLinf is flexible enough to support diverse RL training workflows (PPO, GRPO, SAC, and so on) while hiding the complexity of distributed programming. Users can easily scale RL training to a large number of GPU nodes without modifying code, meeting the growing computational demands of RL training.

This flexibility also lets RLinf explore more efficient scheduling and execution: the hybrid execution mode for embodied RL achieves a **100%+** throughput improvement over baseline solutions.
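Of the algorithms listed above, GRPO is the simplest to illustrate: instead of a learned critic, each rollout's advantage is its reward normalized against the other rollouts in the same task group. The sketch below is a generic illustration of that idea, not RLinf's implementation:

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages (GRPO-style, no learned critic):
    normalize each rollout's reward by the group's mean and std."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Rewards from 4 rollouts of the same task: successes get advantage > 0,
# failures get advantage < 0, relative to the group.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])  # → [1., -1., 1., -1.]
```

These advantages then weight the usual clipped policy-gradient objective, exactly where PPO would use critic-based advantage estimates.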
### Embodied Intelligence

- RLinf supports both PPO and GRPO algorithms, enabling state-of-the-art training for Vision-Language-Action models.
- The framework provides seamless integration with mainstream embodied intelligence benchmarks and achieves strong performance across diverse evaluation metrics.

#### OpenVLA and OpenVLA-OFT Results
If you find **RLinf** helpful, please cite the paper:

```bibtex
@article{yu2025rlinf,
  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation},
  author={Yu, Chao and Wang, Yuanqing and Guo, Zhen and Lin, Hao and Xu, Si and Zang, Hongzhi and Zhang, Quanlu and Wu, Yongji and Zhu, Chunyang and Hu, Junhao and others},
  journal={arXiv preprint arXiv:2509.15965},
  year={2025}
}
```

If you use RL+VLA in RLinf, you can also cite our technical report and empirical study paper:

```bibtex
@article{zang2025rlinf,
  title={RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training},
  author={Zang, Hongzhi and Wei, Mingjie and Xu, Si and Wu, Yongji and Guo, Zhen and Wang, Yuanqing and Lin, Hao and Shi, Liangzhi and Xie, Yuqing and Xu, Zhexuan and others},
  journal={arXiv preprint arXiv:2510.06710},
  year={2025}
}
```

```bibtex
@article{liu2025can,
  title={What Can RL Bring to VLA Generalization? An Empirical Study},
  author={Liu, Jijia and Gao, Feng and Wei, Bingwen and Chen, Xinlei and Liao, Qingmin and Wu, Yi and Yu, Chao and Wang, Yu},
  journal={arXiv preprint arXiv:2505.19789},
  year={2025}
}
```

```bibtex
@article{chen2025pi_,
  title={$\pi_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models},
  author={Chen, Kang and Liu, Zhihao and Zhang, Tonghe and Guo, Zhen and Xu, Si and Lin, Hao and Zang, Hongzhi and Zhang, Quanlu and Yu, Zhaofei and Fan, Guoliang and others},
  journal={arXiv preprint arXiv:2510.25889},
  year={2025}
}
```