Commit 3d02fd9

Adds symmetry augmentation and RND implementation
Approved-by: Clemens Schwarke
1 parent: c388c6a

File tree: 14 files changed (+843, -83 lines)

README.md

Lines changed: 60 additions & 9 deletions
@@ -3,16 +3,27 @@
 
 Fast and simple implementation of RL algorithms, designed to run fully on GPU.
 This code is an evolution of `rl-pytorch` provided with NVIDIA's Isaac GYM.
 
-| The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more)! |
-| ----------------------------------------------------------------------------------- |
+The main branch supports PPO with additional features from our work.
+These include:
 
-The main branch only supports PPO for now.
-Contributions are welcome.
+* [Random Network Distillation (RND)](https://proceedings.mlr.press/v229/schwarke23a.html)
+* [Symmetry-based Augmentation](https://arxiv.org/abs/2403.04359)
 
 **Maintainer**: Mayank Mittal and Clemens Schwarke <br/>
 **Affiliation**: Robotic Systems Lab, ETH Zurich & NVIDIA <br/>
 **Contact**: [email protected]
 
+Environment repositories using the framework:
+
+* `Isaac Lab` (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
+* `Legged-Gym` (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+
+We welcome contributions from the community. Please check our contribution guidelines for more
+information.
+
+> **Note:** The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it is currently not actively maintained.
+
 ## Setup
 
 The package can be installed via PyPI with:
@@ -50,17 +61,57 @@ We use the following tools for maintaining code quality:
 
 Please check [here](https://pre-commit.com/#install) for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:
-
 ```bash
 # for installation (only once)
 pre-commit install
 # for running
 pre-commit run --all-files
 ```
 
-## Useful Links
+## Citing
+
+**We are working on a white paper for this library.** Until then, please cite the following work
+if you use this library for your research:
+
+```text
+@InProceedings{rudin2022learning,
+  title     = {Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
+  author    = {Rudin, Nikita and Hoeller, David and Reist, Philipp and Hutter, Marco},
+  booktitle = {Proceedings of the 5th Conference on Robot Learning},
+  pages     = {91--100},
+  year      = {2022},
+  volume    = {164},
+  series    = {Proceedings of Machine Learning Research},
+  publisher = {PMLR},
+  url       = {https://proceedings.mlr.press/v164/rudin22a.html},
+}
+```
 
-Environment repositories using the framework:
+If you use the library with curiosity-driven exploration (random network distillation), please cite:
+
+```text
+@InProceedings{schwarke2023curiosity,
+  title     = {Curiosity-Driven Learning of Joint Locomotion and Manipulation Tasks},
+  author    = {Schwarke, Clemens and Klemm, Victor and Boon, Matthijs van der and Bjelonic, Marko and Hutter, Marco},
+  booktitle = {Proceedings of The 7th Conference on Robot Learning},
+  pages     = {2594--2610},
+  year      = {2023},
+  volume    = {229},
+  series    = {Proceedings of Machine Learning Research},
+  publisher = {PMLR},
+  url       = {https://proceedings.mlr.press/v229/schwarke23a.html},
+}
+```
 
-* `Isaac Lab` (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
-* `Legged-Gym` (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+If you use the library with symmetry augmentation, please cite:
+
+```text
+@InProceedings{mittal2024symmetry,
+  author    = {Mittal, Mayank and Rudin, Nikita and Klemm, Victor and Allshire, Arthur and Hutter, Marco},
+  booktitle = {2024 IEEE International Conference on Robotics and Automation (ICRA)},
+  title     = {Symmetry Considerations for Learning Task Symmetric Robot Policies},
+  year      = {2024},
+  pages     = {7433--7439},
+  doi       = {10.1109/ICRA57147.2024.10611493}
+}
+```
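The symmetry augmentation added in this commit relies on a user-supplied mirroring function (its expected signature is documented in `config/dummy_config.yaml` below). As a rough sketch of what such a function might look like, the following assumes a hypothetical robot whose observations and actions mirror by element-wise sign flips; the `OBS_MIRROR` and `ACT_MIRROR` vectors and their dimensions are illustrative assumptions, not part of this commit:

```python
from typing import Optional, Tuple

import torch

# Hypothetical mirroring vectors: element-wise sign flips applied to
# observations and actions. For a real robot these would encode which
# joints swap (left <-> right) and which quantities flip sign.
OBS_MIRROR = torch.tensor([1.0, -1.0, 1.0, -1.0])  # assumed 4-dim observation
ACT_MIRROR = torch.tensor([-1.0, 1.0])             # assumed 2-dim action


@torch.no_grad()
def get_symmetric_states(
    obs: Optional[torch.Tensor] = None,
    actions: Optional[torch.Tensor] = None,
    cfg=None,  # environment config; unused in this sketch
    is_critic: bool = False,
) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]:
    """Return each batch stacked with its mirrored counterpart (2x batch size)."""
    obs_aug, act_aug = None, None
    if obs is not None:
        obs_aug = torch.cat([obs, obs * OBS_MIRROR], dim=0)
    if actions is not None:
        act_aug = torch.cat([actions, actions * ACT_MIRROR], dim=0)
    return obs_aug, act_aug
```

With `use_data_augmentation: true`, such a function lets the runner double each mini-batch with mirrored trajectories; with `use_mirror_loss: true`, the mirrored states instead feed a symmetry loss term weighted by `mirror_loss_coeff`.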

config/dummy_config.yaml

Lines changed: 48 additions & 0 deletions
@@ -16,6 +16,52 @@ algorithm:
   num_learning_epochs: 5
   num_mini_batches: 4 # mini batch size = num_envs * num_steps / num_mini_batches
   schedule: adaptive # adaptive, fixed
+
+  # -- Random Network Distillation
+  rnd_cfg:
+    weight: 0.0 # initial weight of the RND reward
+
+    # note: This is a dictionary with a required key called "mode", which can be one of "constant" or "step".
+    # - If "constant", the weight is constant.
+    # - If "step", the weight is updated using the step scheduler. It takes additional parameters:
+    #   - max_num_steps: maximum number of steps to update the weight
+    #   - final_value: final value of the weight
+    # If null, no scheduler is used.
+    weight_schedule: null
+
+    reward_normalization: false # whether to normalize the RND reward
+    gate_normalization: true # whether to normalize the RND gate observations
+
+    # -- Learning parameters
+    learning_rate: 0.001 # learning rate for RND
+
+    # -- Network parameters
+    # note: if -1, the network uses the dimensions of the observation
+    num_outputs: 1 # number of outputs of the RND network
+    predictor_hidden_dims: [-1] # hidden dimensions of the predictor network
+    target_hidden_dims: [-1] # hidden dimensions of the target network
+
+  # -- Symmetry Augmentation
+  symmetry_cfg:
+    use_data_augmentation: true # adds symmetric trajectories to the batch
+    use_mirror_loss: false # adds a symmetry loss term to the loss function
+
+    # string containing the module and function name to import, e.g.
+    # "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states"
+    #
+    # .. code-block:: python
+    #
+    #     @torch.no_grad()
+    #     def get_symmetric_states(
+    #         obs: Optional[torch.Tensor] = None,
+    #         actions: Optional[torch.Tensor] = None,
+    #         cfg: "BaseEnvCfg" = None,
+    #         is_critic: bool = False,
+    #     ) -> Tuple[torch.Tensor, torch.Tensor]:
+    #
+    data_augmentation_func: null
+
+    # coefficient for the symmetry loss term
+    # if 0, no symmetry loss is used
+    mirror_loss_coeff: 0.0
+
 policy:
   class_name: ActorCritic
   # for MLP i.e. `ActorCritic`
@@ -27,6 +73,7 @@ policy:
   # rnn_type: 'lstm'
   # rnn_hidden_size: 512
   # rnn_num_layers: 1
+
 runner:
   num_steps_per_env: 24 # number of steps per environment per iteration
   max_iterations: 1500 # number of policy updates
@@ -44,5 +91,6 @@ runner:
   load_run: -1 # -1 means load latest run
   resume_path: null # updated from load_run and checkpoint
   checkpoint: -1 # -1 means load latest checkpoint
+
   runner_class_name: OnPolicyRunner
   seed: 1
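The `rnd_cfg` block above configures random network distillation: a frozen, randomly initialized target network and a trained predictor network; the predictor's error on an observation serves as the intrinsic (curiosity) reward, scaled by `weight`. A minimal sketch of that mechanism, assuming ELU MLPs and reading the "step" schedule as a switch to `final_value` after `max_num_steps` (both are our assumptions, not the commit's implementation):

```python
import torch
import torch.nn as nn


def make_mlp(in_dim: int, hidden_dims: list, out_dim: int) -> nn.Sequential:
    layers, last = [], in_dim
    for h in hidden_dims:
        # per the config convention, -1 means "use the observation dimension"
        h = in_dim if h == -1 else h
        layers += [nn.Linear(last, h), nn.ELU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)


class RNDSketch:
    """Predictor network trained to match a frozen random target network."""

    def __init__(self, obs_dim: int, num_outputs: int = 1, lr: float = 1e-3):
        self.target = make_mlp(obs_dim, [-1], num_outputs)
        self.predictor = make_mlp(obs_dim, [-1], num_outputs)
        for p in self.target.parameters():  # target stays fixed
            p.requires_grad_(False)
        self.optim = torch.optim.Adam(self.predictor.parameters(), lr=lr)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # per-sample prediction error: high for novel observations
        with torch.no_grad():
            return (self.target(obs) - self.predictor(obs)).square().mean(dim=-1)

    def update(self, obs: torch.Tensor) -> float:
        # train the predictor toward the fixed target outputs
        loss = (self.target(obs) - self.predictor(obs)).square().mean()
        self.optim.zero_grad()
        loss.backward()
        self.optim.step()
        return loss.item()


def step_weight(step: int, initial: float, final_value: float, max_num_steps: int) -> float:
    """One reading of the "step" weight_schedule: switch to final_value after max_num_steps."""
    return initial if step < max_num_steps else final_value
```

Training would then add `weight * intrinsic_reward(obs)` to the task reward and call `update(obs)` alongside the PPO update; `reward_normalization` and `gate_normalization` would standardize the reward and the observations fed to the networks.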
