Commit 3d02fd9

Adds symmetry augmentation and RND implementation
Approved-by: Clemens Schwarke
1 parent: c388c6a

File tree: 14 files changed (+843, -83 lines)

README.md

Lines changed: 60 additions & 9 deletions
@@ -3,16 +3,27 @@
 
 Fast and simple implementation of RL algorithms, designed to run fully on GPU.
 This code is an evolution of `rl-pytorch` provided with NVIDIA's Isaac GYM.
 
-| The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more)! |
-| ----------------------------------------------------------------------------------- |
+The main branch supports PPO with additional features from our work.
+These include:
 
-The main branch only supports PPO for now.
-Contributions are welcome.
+* [Random Network Distillation (RND)](https://proceedings.mlr.press/v229/schwarke23a.html)
+* [Symmetry-based Augmentation](https://arxiv.org/abs/2403.04359)
 
 **Maintainer**: Mayank Mittal and Clemens Schwarke <br/>
 **Affiliation**: Robotic Systems Lab, ETH Zurich & NVIDIA <br/>
 **Contact**: [email protected]
 
+Environment repositories using the framework:
+
+* `Isaac Lab` (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
+* `Legged-Gym` (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+
+We welcome contributions from the community. Please check our contribution guidelines for more
+information.
+
+> **Note:** The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it is currently not actively maintained.
+
 ## Setup
 
 The package can be installed via PyPI with:
@@ -50,17 +61,57 @@ We use the following tools for maintaining code quality:
 
 Please check [here](https://pre-commit.com/#install) for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:
-
 ```bash
 # for installation (only once)
 pre-commit install
 # for running
 pre-commit run --all-files
 ```
 
-## Useful Links
+## Citing
+
+**We are working on a white paper for this library.** Until then, please cite the following work
+if you use this library for your research:
+
+```text
+@InProceedings{rudin2022learning,
+  title     = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
+  author    = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
+  booktitle = {Proceedings of the 5th Conference on Robot Learning},
+  pages     = {91--100},
+  year      = {2022},
+  volume    = {164},
+  series    = {Proceedings of Machine Learning Research},
+  publisher = {PMLR},
+  url       = {https://proceedings.mlr.press/v164/rudin22a.html},
+}
+```
 
-Environment repositories using the framework:
+If you use the library with curiosity-driven exploration (random network distillation), please cite:
+
+```text
+@InProceedings{schwarke2023curiosity,
+  title     = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
+  author    = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
+  booktitle = {Proceedings of The 7th Conference on Robot Learning},
+  pages     = {2594--2610},
+  year      = {2023},
+  volume    = {229},
+  series    = {Proceedings of Machine Learning Research},
+  publisher = {PMLR},
+  url       = {https://proceedings.mlr.press/v229/schwarke23a.html},
+}
+```
 
-* `Isaac Lab` (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
-* `Legged-Gym` (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+If you use the library with symmetry augmentation, please cite:
+
+```text
+@InProceedings{mittal2024symmetry,
+  author    = {Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
+  booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
+  title     = {Symmetry Considerations for Learning Task Symmetric Robot Policies},
+  year      = {2024},
+  pages     = {7433--7439},
+  doi       = {10.1109/ICRA57147.2024.10611493}
+}
+```
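The symmetry augmentation added in this commit relies on a user-supplied mirroring function (its expected signature is documented in `config/dummy_config.yaml` below). As a rough sketch of what such a function might look like, the following assumes a hypothetical robot whose observations and actions mirror by element-wise sign flips; the `OBS_MIRROR` and `ACT_MIRROR` vectors and their dimensions are illustrative assumptions, not part of this commit:

```python
from typing import Optional, Tuple

import torch

# Hypothetical mirroring vectors: element-wise sign flips applied to
# observations and actions. For a real robot these would encode which
# joints swap (left <-> right) and which quantities flip sign.
OBS_MIRROR = torch.tensor([1.0, -1.0, 1.0, -1.0])  # assumed 4-dim observation
ACT_MIRROR = torch.tensor([-1.0, 1.0])             # assumed 2-dim action


@torch.no_grad()
def get_symmetric_states(
    obs: Optional[torch.Tensor] = None,
    actions: Optional[torch.Tensor] = None,
    cfg=None,  # environment config; unused in this sketch
    is_critic: bool = False,
) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]:
    """Return each batch stacked with its mirrored counterpart (2x batch size)."""
    obs_aug, act_aug = None, None
    if obs is not None:
        obs_aug = torch.cat([obs, obs * OBS_MIRROR], dim=0)
    if actions is not None:
        act_aug = torch.cat([actions, actions * ACT_MIRROR], dim=0)
    return obs_aug, act_aug
```

With `use_data_augmentation: true`, such a function lets the runner double each mini-batch with mirrored trajectories; with `use_mirror_loss: true`, the mirrored states instead feed a symmetry loss term weighted by `mirror_loss_coeff`.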

config/dummy_config.yaml

Lines changed: 48 additions & 0 deletions
@@ -16,6 +16,52 @@ algorithm:
   num_learning_epochs: 5
   num_mini_batches: 4 # mini batch size = num_envs * num_steps / num_mini_batches
   schedule: adaptive # adaptive, fixed
+
+  # -- Random Network Distillation
+  rnd_cfg:
+    weight: 0.0 # initial weight of the RND reward
+
+    # note: This is a dictionary with a required key called "mode", which can be one of "constant" or "step".
+    # - If "constant", the weight is constant.
+    # - If "step", the weight is updated using the step scheduler. It takes additional parameters:
+    #   - max_num_steps: maximum number of steps to update the weight
+    #   - final_value: final value of the weight
+    # If null, no scheduler is used.
+    weight_schedule: null
+
+    reward_normalization: false # whether to normalize the RND reward
+    gate_normalization: true # whether to normalize the RND gate observations
+
+    # -- Learning parameters
+    learning_rate: 0.001 # learning rate for RND
+
+    # -- Network parameters
+    # note: if -1, the network uses the dimensions of the observation
+    num_outputs: 1 # number of outputs of the RND network
+    predictor_hidden_dims: [-1] # hidden dimensions of the predictor network
+    target_hidden_dims: [-1] # hidden dimensions of the target network
+
+  # -- Symmetry Augmentation
+  symmetry_cfg:
+    use_data_augmentation: true # adds symmetric trajectories to the batch
+    use_mirror_loss: false # adds a symmetry loss term to the loss function
+
+    # string containing the module and function name to import, e.g.
+    # "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states"
+    #
+    # .. code-block:: python
+    #
+    #     @torch.no_grad()
+    #     def get_symmetric_states(
+    #         obs: Optional[torch.Tensor] = None,
+    #         actions: Optional[torch.Tensor] = None,
+    #         cfg: "BaseEnvCfg" = None,
+    #         is_critic: bool = False,
+    #     ) -> Tuple[torch.Tensor, torch.Tensor]:
+    #
+    data_augmentation_func: null
+
+    # coefficient for the symmetry loss term
+    # if 0, no symmetry loss is used
+    mirror_loss_coeff: 0.0
+
 policy:
   class_name: ActorCritic
   # for MLP i.e. `ActorCritic`
@@ -27,6 +73,7 @@ policy:
   # rnn_type: 'lstm'
   # rnn_hidden_size: 512
   # rnn_num_layers: 1
+
 runner:
   num_steps_per_env: 24 # number of steps per environment per iteration
   max_iterations: 1500 # number of policy updates
@@ -44,5 +91,6 @@ runner:
   load_run: -1 # -1 means load latest run
   resume_path: null # updated from load_run and checkpoint
   checkpoint: -1 # -1 means load latest checkpoint
+
   runner_class_name: OnPolicyRunner
   seed: 1
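The `rnd_cfg` block above configures random network distillation: a frozen, randomly initialized target network and a trained predictor network; the predictor's error on an observation serves as the intrinsic (curiosity) reward, scaled by `weight`. A minimal sketch of that mechanism, assuming ELU MLPs and reading the "step" schedule as a switch to `final_value` after `max_num_steps` (both are our assumptions, not the commit's implementation):

```python
import torch
import torch.nn as nn


def make_mlp(in_dim: int, hidden_dims: list, out_dim: int) -> nn.Sequential:
    layers, last = [], in_dim
    for h in hidden_dims:
        # per the config convention, -1 means "use the observation dimension"
        h = in_dim if h == -1 else h
        layers += [nn.Linear(last, h), nn.ELU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)


class RNDSketch:
    """Predictor network trained to match a frozen random target network."""

    def __init__(self, obs_dim: int, num_outputs: int = 1, lr: float = 1e-3):
        self.target = make_mlp(obs_dim, [-1], num_outputs)
        self.predictor = make_mlp(obs_dim, [-1], num_outputs)
        for p in self.target.parameters():  # target stays fixed
            p.requires_grad_(False)
        self.optim = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # per-sample prediction error: high for novel observations
        with torch.no_grad():
            return (self.target(obs) - self.predictor(obs)).square().mean(dim=-1)

    def update(self, obs: torch.Tensor) -> float:
        # train the predictor toward the fixed target outputs
        loss = (self.target(obs) - self.predictor(obs)).square().mean()
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return loss.item()


def step_weight(step: int, initial: float, final_value: float, max_num_steps: int) -> float:
    """One reading of the "step" weight_schedule: switch to final_value after max_num_steps."""
    return initial if step < max_num_steps else final_value
```

Training would then add `weight * intrinsic_reward(obs)` to the task reward and call `update(obs)` alongside the PPO update; `reward_normalization` and `gate_normalization` would standardize the reward and the observations fed to the networks.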
