
Commit 875b1da

Formatting fixes (#120)
Main changes:

- Switched to Ruff for linting
- Added type hints
- Formatted comments and docstrings

Validated on the following tasks:

- Isaac-Velocity-Flat-Anymal-D-v0
- Isaac-Velocity-Flat-Anymal-D-Recurrent-v0
- Isaac-Velocity-Flat-Anymal-D-Distillation-v0
- Isaac-Velocity-Flat-Anymal-D-Distillation-Recurrent-v0
1 parent 8363520 commit 875b1da

38 files changed (+1508 −1144 lines)

.flake8

Lines changed: 0 additions & 22 deletions
This file was deleted.

.pre-commit-config.yaml

Lines changed: 5 additions & 24 deletions
```diff
@@ -1,40 +1,21 @@
 repos:
-  - repo: https://github.com/python/black
-    rev: 23.10.1
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.14.0
     hooks:
-      - id: black
-        args: ["--line-length", "120", "--preview"]
-  - repo: https://github.com/pycqa/flake8
-    rev: 6.1.0
-    hooks:
-      - id: flake8
-        additional_dependencies: [flake8-simplify, flake8-return]
+      - id: ruff-check
+      - id: ruff-format
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
     hooks:
-      - id: trailing-whitespace
       - id: check-symlinks
       - id: destroyed-symlinks
       - id: check-yaml
+      - id: check-toml
       - id: check-merge-conflict
       - id: check-case-conflict
       - id: check-executables-have-shebangs
-      - id: check-toml
-      - id: end-of-file-fixer
       - id: check-shebang-scripts-are-executable
       - id: detect-private-key
-      - id: debug-statements
-  - repo: https://github.com/pycqa/isort
-    rev: 5.12.0
-    hooks:
-      - id: isort
-        name: isort (python)
-        args: ["--profile", "black", "--filter-files"]
-  - repo: https://github.com/asottile/pyupgrade
-    rev: v3.15.0
-    hooks:
-      - id: pyupgrade
-        args: ["--py37-plus"]
   - repo: https://github.com/codespell-project/codespell
     rev: v2.2.6
     hooks:
```

CONTRIBUTORS.md

Lines changed: 3 additions & 1 deletion
```diff
@@ -17,12 +17,14 @@ Please keep the lists sorted alphabetically.
 
 ---
 
-* Mayank Mittal
 * Clemens Schwarke
+* Mayank Mittal
 
 ## Authors
 
+* Clemens Schwarke
 * David Hoeller
+* Mayank Mittal
 * Nikita Rudin
 
 ## Contributors
```

README.md

Lines changed: 5 additions & 9 deletions
```diff
@@ -1,15 +1,14 @@
-# RSL RL
+# RSL-RL
 
-A fast and simple implementation of RL algorithms, designed to run fully on GPU.
-This code is an evolution of `rl-pytorch` provided with NVIDIA's Isaac Gym.
+A fast and simple implementation of learning algorithms for robotics. For an overview of the library please have a look at https://arxiv.org/pdf/2509.10771.
 
 Environment repositories using the framework:
 
 * **`Isaac Lab`** (built on top of NVIDIA Isaac Sim): https://github.com/isaac-sim/IsaacLab
-* **`Legged-Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
+* **`Legged Gym`** (built on top of NVIDIA Isaac Gym): https://leggedrobotics.github.io/legged_gym/
 * **`MuJoCo Playground`** (built on top of MuJoCo MJX and Warp): https://github.com/google-deepmind/mujoco_playground/
 
-The main branch supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:
+The library currently supports **PPO** and **Student-Teacher Distillation** with additional features from our research. These include:
 
 * [Random Network Distillation (RND)](https://proceedings.mlr.press/v229/schwarke23a.html) - Encourages exploration by adding
   a curiosity driven intrinsic reward.
@@ -22,8 +21,6 @@ information.
 **Affiliation**: Robotic Systems Lab, ETH Zurich & NVIDIA <br/>
 **Contact**: [email protected]
 
-> **Note:** The `algorithms` branch supports additional algorithms (SAC, DDPG, DSAC, and more). However, it isn't currently actively maintained.
-
 
 ## Setup
 
@@ -57,8 +54,7 @@ For documentation, we adopt the [Google Style Guide](https://sphinxcontrib-napol
 We use the following tools for maintaining code quality:
 
 - [pre-commit](https://pre-commit.com/): Runs a list of formatters and linters over the codebase.
-- [black](https://black.readthedocs.io/en/stable/): The uncompromising code formatter.
-- [flake8](https://flake8.pycqa.org/en/latest/): A wrapper around PyFlakes, pycodestyle, and McCabe complexity checker.
+- [ruff](https://github.com/astral-sh/ruff): An extremely fast Python linter and code formatter, written in Rust.
 
 Please check [here](https://pre-commit.com/#install) for instructions to set these up. To run over the entire repository, please execute the following command in the terminal:
 
```
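The README describes PPO training but the diff shown here stops before the entry point. As a rough orientation, here is a hedged sketch of how training is typically launched with this library. It is not part of the commit: `MyTaskEnv` and `my_tasks` are hypothetical placeholders for an environment implementing rsl_rl's `VecEnv` interface, the config is assumed to be shaped like `config/example_config.yaml` below, and the exact `OnPolicyRunner` signature should be checked against the source.

```python
# Minimal sketch (not part of this commit) of launching PPO training.
# Assumptions: MyTaskEnv implements rsl_rl's VecEnv interface, and the
# YAML config has the same layout as config/example_config.yaml.
import yaml

from rsl_rl.runners import OnPolicyRunner

from my_tasks import MyTaskEnv  # hypothetical environment module

with open("config/example_config.yaml") as f:
    train_cfg = yaml.safe_load(f)["runner"]  # runner section holds policy/algorithm cfg

env = MyTaskEnv(num_envs=4096, device="cuda:0")
runner = OnPolicyRunner(env, train_cfg, log_dir="logs/walking_experiment", device="cuda:0")
runner.learn(num_learning_iterations=train_cfg["max_iterations"])
```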

config/example_config.yaml

Lines changed: 30 additions & 29 deletions
```diff
@@ -1,21 +1,21 @@
 runner:
   class_name: OnPolicyRunner
-  # -- general
-  num_steps_per_env: 24 # number of steps per environment per iteration
-  max_iterations: 1500 # number of policy updates
+  # General
+  num_steps_per_env: 24 # Number of steps per environment per iteration
+  max_iterations: 1500 # Number of policy updates
   seed: 1
-  # -- observations
-  obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # maps observation groups to types. See `vec_env.py` for more information
-  # -- logging parameters
-  save_interval: 50 # check for potential saves every `save_interval` iterations
+  # Observations
+  obs_groups: {"policy": ["policy"], "critic": ["policy", "privileged"]} # Maps observation groups to sets. See `vec_env.py` for more information
+  # Logging parameters
+  save_interval: 50 # Check for potential saves every `save_interval` iterations
   experiment_name: walking_experiment
   run_name: ""
-  # -- logging writer
+  # Logging writer
   logger: tensorboard # tensorboard, neptune, wandb
   neptune_project: legged_gym
   wandb_project: legged_gym
 
-  # -- policy
+  # Policy
   policy:
     class_name: ActorCritic
     activation: elu
@@ -25,45 +25,46 @@ runner:
     critic_hidden_dims: [256, 256, 256]
     init_noise_std: 1.0
     noise_std_type: "scalar" # 'scalar' or 'log'
+    state_dependent_std: false
 
-  # -- algorithm
+  # Algorithm
   algorithm:
     class_name: PPO
-    # -- training
+    # Training
     learning_rate: 0.001
     num_learning_epochs: 5
     num_mini_batches: 4 # mini batch size = num_envs * num_steps / num_mini_batches
     schedule: adaptive # adaptive, fixed
-    # -- value function
+    # Value function
     value_loss_coef: 1.0
     clip_param: 0.2
     use_clipped_value_loss: true
-    # -- surrogate loss
+    # Surrogate loss
    desired_kl: 0.01
     entropy_coef: 0.01
     gamma: 0.99
     lam: 0.95
     max_grad_norm: 1.0
-    # -- miscellaneous
+    # Miscellaneous
     normalize_advantage_per_mini_batch: false
 
-    # -- random network distillation
+    # Random network distillation
     rnd_cfg:
-      weight: 0.0 # initial weight of the RND reward
-      weight_schedule: null # note: this is a dictionary with a required key called "mode". Please check the RND module for more information
-      reward_normalization: false # whether to normalize RND reward
-      # -- learning parameters
-      learning_rate: 0.001 # learning rate for RND
-      # -- network parameters
-      num_outputs: 1 # number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
-      predictor_hidden_dims: [-1] # hidden dimensions of predictor network
-      target_hidden_dims: [-1] # hidden dimensions of target network
+      weight: 0.0 # Initial weight of the RND reward
+      weight_schedule: null # This is a dictionary with a required key called "mode". Please check the RND module for more information
+      reward_normalization: false # Whether to normalize RND reward
+      # Learning parameters
+      learning_rate: 0.001 # Learning rate for RND
+      # Network parameters
+      num_outputs: 1 # Number of outputs of RND network. Note: if -1, then the network will use dimensions of the observation
+      predictor_hidden_dims: [-1] # Hidden dimensions of predictor network
+      target_hidden_dims: [-1] # Hidden dimensions of target network
 
-    # -- symmetry augmentation
+    # Symmetry augmentation
     symmetry_cfg:
-      use_data_augmentation: true # this adds symmetric trajectories to the batch
-      use_mirror_loss: false # this adds symmetry loss term to the loss function
-      data_augmentation_func: null # string containing the module and function name to import
+      use_data_augmentation: true # This adds symmetric trajectories to the batch
+      use_mirror_loss: false # This adds symmetry loss term to the loss function
+      data_augmentation_func: null # String containing the module and function name to import
       # Example: "legged_gym.envs.locomotion.anymal_c.symmetry:get_symmetric_states"
       #
       # .. code-block:: python
@@ -73,4 +74,4 @@ runner:
       #     obs: Optional[torch.Tensor] = None, actions: Optional[torch.Tensor] = None, cfg: "BaseEnvCfg" = None, obs_type: str = "policy"
       # ) -> Tuple[torch.Tensor, torch.Tensor]:
       #
-      mirror_loss_coeff: 0.0 #coefficient for symmetry loss term. If 0, no symmetry loss is used
+      mirror_loss_coeff: 0.0 # Coefficient for symmetry loss term. If 0, no symmetry loss is used
```
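The symmetry block above imports a `data_augmentation_func` by module path but only documents its signature. Below is a hedged sketch of what such a function can look like, matching the signature from the config comments; the mirror index lists and the batch-doubling behavior are invented placeholders, since the real left/right mappings depend on the robot.

```python
# Hypothetical sketch of a symmetry augmentation function with the
# signature documented in example_config.yaml. The mirror index lists
# below are placeholders; a real robot needs its own joint mappings.
from typing import Optional, Tuple

import torch

_OBS_MIRROR_IDX = [1, 0, 3, 2]  # placeholder: swap left/right observation dims
_ACT_MIRROR_IDX = [1, 0]        # placeholder: swap left/right actuators


def get_symmetric_states(
    obs: Optional[torch.Tensor] = None,
    actions: Optional[torch.Tensor] = None,
    cfg: "BaseEnvCfg" = None,
    obs_type: str = "policy",
) -> Tuple[torch.Tensor, torch.Tensor]:
    # Return the original batch concatenated with its mirrored copy, so the
    # caller sees twice as many (symmetric) samples.
    if obs is not None:
        obs = torch.cat([obs, obs[..., _OBS_MIRROR_IDX]], dim=0)
    if actions is not None:
        actions = torch.cat([actions, actions[..., _ACT_MIRROR_IDX]], dim=0)
    return obs, actions
```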

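The `obs_groups` entry in the config also deserves a quick illustration. The snippet below is a toy reconstruction of the idea, not the actual `vec_env.py` code: the environment returns one tensor per observation set, and each consumer listed in `obs_groups` sees the concatenation of its sets, so the critic can receive privileged signals the actor never observes.

```python
# Toy illustration (assumption, not library code) of the obs_groups mapping:
#   {"policy": ["policy"], "critic": ["policy", "privileged"]}
# The env returns one tensor per observation set; each consumer then gets
# the concatenation of the sets listed for it.
import torch

num_envs = 8
obs = {
    "policy": torch.randn(num_envs, 48),      # proprioceptive observations
    "privileged": torch.randn(num_envs, 12),  # simulator-only signals
}
obs_groups = {"policy": ["policy"], "critic": ["policy", "privileged"]}

actor_input = torch.cat([obs[g] for g in obs_groups["policy"]], dim=-1)   # shape (8, 48)
critic_input = torch.cat([obs[g] for g in obs_groups["critic"]], dim=-1)  # shape (8, 60)
```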
licenses/dependencies/black-license.txt

Lines changed: 0 additions & 21 deletions
This file was deleted.

licenses/dependencies/flake8-license.txt

Lines changed: 0 additions & 22 deletions
This file was deleted.

licenses/dependencies/isort-license.txt

Lines changed: 0 additions & 21 deletions
This file was deleted.
File renamed without changes.

licenses/dependencies/pyupgrade-license.txt

Lines changed: 0 additions & 19 deletions
This file was deleted.
