
Commit 2ea56e9

fix(async_rl): add docs and fix typos for embodiment async rl (RLinf#790)
* docs: add docs for async PPO
* feat: support multiple rollout epochs for async PPO
* chore: fix typos in async PPO's YAMLs
* fix: let the async env worker also support multiple rollout epochs
* fix: correct critic_warmup_steps
* feat: add support for resuming training for async PPO
* fix: convert version to a tensor to avoid a perf regression
* chore: move version update logic to set global step
* fix: fix the condition when SAC does not set the global step

---------

Signed-off-by: Bo Dai <daibo@infini-ai.com>
1 parent 015ba0f commit 2ea56e9

20 files changed, 940 additions & 78 deletions

docs/source-en/rst_source/tutorials/rlalg/async_ppo.rst

Lines changed: 432 additions & 0 deletions
Large diffs are not rendered by default.

docs/source-en/rst_source/tutorials/rlalg/index.rst

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,7 @@ In this section, we provide an overview of each algorithm, including their core
 
 Each algorithm is implemented with flexibility in mind, allowing researchers and practitioners to apply them to a variety of reinforcement learning tasks. Whether you're exploring standard benchmarks or designing custom environments, RLinf offers streamlined interfaces for training and evaluation.
 
-As of now, RLinf supports seven widely-used reinforcement learning algorithms:
+As of now, RLinf supports eight widely-used reinforcement learning algorithms:
 
 - :doc:`Proximal Policy Optimization (PPO) <ppo>`
 - :doc:`Group Relative Policy Optimization (GRPO) <grpo>`
@@ -14,6 +14,7 @@ As of now, RLinf supports seven widely-used reinforcement learning algorithms:
 - :doc:`Soft Actor-Critic (SAC) <sac>`
 - :doc:`Cross-Q <crossq>`
 - :doc:`RLPD <rlpd>`
+- :doc:`Async Proximal Policy Optimization (Async PPO) <async_ppo>`
 
 We are continuously working to expand the selection of supported algorithms in future releases. Stay tuned for upcoming additions!
 
@@ -28,3 +29,4 @@ We are continuously working to expand the selection of supported algorithms in f
 sac
 crossq
 rlpd
+async_ppo
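The first hunk updates the prose count from "seven" to "eight" by hand so that it keeps matching the `:doc:` bullet list. Below is a minimal sketch of an automated check for that invariant. It is a hypothetical helper, not part of RLinf, and it assumes every algorithm entry is a bullet of the form `- :doc:`Title <target>`` and that the count sentence reads "supports N widely-used ...".

```python
import re

# Hypothetical doc-lint helper (not part of RLinf): checks that the number word
# in "RLinf supports N widely-used ..." matches the count of :doc: bullets.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def stated_count(text: str) -> int:
    """Return the algorithm count claimed in the prose, e.g. 'eight' -> 8."""
    match = re.search(r"supports (\w+) widely-used", text)
    if match is None:
        raise ValueError("count sentence not found")
    return NUMBER_WORDS[match.group(1).lower()]


def listed_count(text: str) -> int:
    """Count bullet lines of the form '- :doc:`Title <target>`'."""
    return len(re.findall(r"^- :doc:`[^`]+<[^`>]+>`", text, flags=re.MULTILINE))


def check_index(text: str) -> bool:
    """True when the stated count equals the number of listed entries."""
    return stated_count(text) == listed_count(text)


# Assumed two-entry excerpt, shaped like the index.rst lines in this diff.
excerpt = """As of now, RLinf supports two widely-used reinforcement learning algorithms:

- :doc:`Proximal Policy Optimization (PPO) <ppo>`
- :doc:`Async Proximal Policy Optimization (Async PPO) <async_ppo>`
"""

print(check_index(excerpt))  # True
```

Run against the full index.rst, a check like this would return False for a change that adds a list entry without updating the count, which is the kind of mismatch this diff fixes.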
