This repository is the starter code for the NYU Reinforcement Learning and Optimal Control project in which students train a Unitree Go2 walking policy in Isaac Lab starting from a minimal baseline and improve it via reward shaping and robustness strategies. Please read this README fully before starting and follow the exact workflow and naming rules below to ensure your runs integrate correctly with the cluster scripts and grading pipeline.
- Fork this repository and do not change the repository name in your fork.
- Your fork must be named rob6323_go2_project so cluster scripts and paths work without modification.
- GitHub Account: You must have a GitHub account to fork this repository and manage your code. If you do not have one, sign up at https://github.com/signup.
- Project Webpage: https://machines-in-motion.github.io/RL_class_go2_project/
- Project Tutorial: https://github.com/machines-in-motion/rob6323_go2_project/blob/master/tutorial/tutorial.md
- Connect to the NYU Greene HPC via SSH; if you are off-campus or not on NYU Wi‑Fi, you must connect through the NYU VPN before SSHing to Greene.
- The official instructions include example SSH config snippets and commands for greene.hpc.nyu.edu and dtn.hpc.nyu.edu as well as VPN and gateway options: https://sites.google.com/nyu.edu/nyu-hpc/accessing-hpc?authuser=0#h.7t97br4zzvip.
After logging into Greene, cd into your home directory (cd $HOME). You must clone your fork into $HOME only (not scratch or archive). This ensures subsequent scripts and paths resolve correctly on the cluster. Since this is a private repository, you need to authenticate with GitHub. You have two options:
The easiest way to avoid managing keys manually is to configure VS Code Remote SSH. If set up correctly, VS Code forwards your local credentials to the cluster.
- Follow the NYU HPC VS Code guide to set up the connection.
Tip: Once connected to Greene in VS Code, you can clone directly without using the terminal:
- Sign in to GitHub: Click the "Accounts" icon (user profile picture) in the bottom-left sidebar. If you aren't signed in, click "Sign in with GitHub" and follow the browser prompts to authorize VS Code.
- Clone the Repo: Open the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`), type `Git: Clone`, and select it.
- Select Destination: When prompted, select your home directory (`/home/<netid>/`) as the clone location.

For more details, see the VS Code Version Control Documentation.
If you prefer using a standard terminal, you must generate a unique SSH key on the Greene cluster and add it to your GitHub account:
- Generate a key: Run the `ssh-keygen` command on Greene (follow the official GitHub documentation on generating a new SSH key).
- Add the key to GitHub: Copy the output of your public key (e.g., `cat ~/.ssh/id_ed25519.pub`) and add it to your account settings (follow the GitHub documentation on adding a new SSH key).
Once authenticated, run the following commands. Replace <your-git-ssh-url> with the SSH URL of your fork (e.g., git@github.com:YOUR_USERNAME/rob6323_go2_project.git).
cd $HOME
git clone <your-git-ssh-url> rob6323_go2_project
Note: You must ensure the target directory is named exactly rob6323_go2_project. This ensures subsequent scripts and paths resolve correctly on the cluster.
- Enter the project directory and run the installer to set up required dependencies and cluster-side tooling.
cd $HOME/rob6323_go2_project
./install.sh
Do not skip this step: it configures the environment expected by the training and evaluation scripts. It launches a job on burst that sets up dependencies and clones the IsaacLab repository inside your Greene storage. You must wait until this burst job completes before launching your first training. To check its progress, run `ssh burst "squeue -u $USER"`; the job disappears from the queue once it has completed. Setup takes around 30 minutes.
You should see something similar to the screenshot below (captured from Greene):
In this output, the ST (state) column indicates the job status:
- `PD` = pending in the queue (waiting for resources).
- `CF` = instance is being configured.
- `R` = job is running.
On burst, it is common for an instance to fail to configure; in that case, the provided scripts automatically relaunch the job when this happens, so you usually only need to wait until the job finishes successfully and no longer appears in squeue.
- In this project you'll only have to modify the two files below, which define the Isaac Lab task and its configuration (including PPO hyperparameters).
- source/rob6323_go2/rob6323_go2/tasks/direct/rob6323_go2/rob6323_go2_env.py
- source/rob6323_go2/rob6323_go2/tasks/direct/rob6323_go2/rob6323_go2_env_cfg.py

PPO hyperparameters are defined in source/rob6323_go2/rob6323_go2/tasks/direct/rob6323_go2/agents/rsl_rl_ppo_cfg.py, but you shouldn't need to modify them.
- Option A (recommended): Use VS Code Remote SSH from your laptop to edit files on Greene; follow the NYU HPC VS Code guide and connect to a compute node as instructed (VPN required off‑campus) (https://sites.google.com/nyu.edu/nyu-hpc/training-support/general-hpc-topics/vs-code). If set up correctly, it simplifies the login process and other tasks, such as cloning a private repo.
- Option B: Edit directly on Greene using a terminal editor such as nano.
nano source/rob6323_go2/rob6323_go2/tasks/direct/rob6323_go2/rob6323_go2_env.py
- Option C: Develop locally on your machine, push to your fork, then pull changes on Greene within your $HOME/rob6323_go2_project clone.
Tip: Don't forget to push your work to GitHub regularly.
- From $HOME/rob6323_go2_project on Greene, submit a training job via the provided script.
cd "$HOME/rob6323_go2_project"
./train.sh
- Check job status with SLURM using squeue on the burst head node as shown below.
ssh burst "squeue -u $USER"
Be aware that jobs can be canceled and requeued by the scheduler or underlying provider policies when higher-priority work preempts your resources, which is normal behavior on shared clusters using preemptible partitions.
- When a job completes, logs are written under logs in your project clone on Greene in the structure logs/[job_id]/rsl_rl/go2_flat_direct/[date_time]/.
- Inside each run directory you will find a TensorBoard events file (events.out.tfevents...), neural network checkpoints (model_[epoch].pt), YAML files with the exact PPO and environment parameters, and a rollout video under videos/play/ that showcases the trained policy.
Use rsync to copy results from the cluster to your local machine. It is faster than scp and can resume interrupted transfers. Run this on your machine (NOT on Greene):
rsync -avzP -e 'ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null' <netid>@dtn.hpc.nyu.edu:/home/<netid>/rob6323_go2_project/logs ./
Explanation of flags:
- `-a`: Archive mode (recursive; preserves permissions and timestamps).
- `-v`: Verbose output.
- `-z`: Compresses data during transfer (faster over the network).
- `-P`: Shows a progress bar and allows resuming partial transfers.
You can inspect training metrics (reward curves, loss values, episode lengths) using TensorBoard. This requires installing it on your local machine.
- Install TensorBoard: On your local computer (do NOT run this on Greene), install the package: `pip install tensorboard`
- Launch the Server: Navigate to the folder containing your downloaded `logs` directory and start the server: `tensorboard --logdir ./logs`
- View Metrics: Open your browser to the URL shown (usually http://localhost:6006/).
Burst storage is accessible only from a job running on burst, not from the burst login node. The provided scripts do not automatically synchronize error logs back to your home directory on Greene. However, you will need access to these logs to debug failed jobs. These error logs differ from the logs in the previous section.
The suggested way to inspect these logs is via the Open OnDemand web interface:
- Navigate to https://ood-burst-001.hpc.nyu.edu.
- Select Files > Home Directory from the top menu.
- You will see a list of files, including your `.err` log files.
- Click on any `.err` file to view its content directly in the browser.

Important: Do not modify anything inside the `rob6323_go2_project` folder on burst storage. This directory is managed by the job scripts, and manual changes may cause synchronization issues or job failures.
- The assignment expects you to go beyond velocity tracking by adding principled reward terms (posture stabilization, foot clearance, slip minimization, smooth actions, contact and collision penalties), robustness via domain randomization, and clear benchmarking metrics for evaluation as described in the course guidelines.
- Keep your repository organized, document your changes in the README, and ensure your scripts are reproducible, as these factors are part of grading alongside policy quality and the short demo video deliverable.
- Isaac Lab documentation — Everything you need to know about IsaacLab, and more!
- Isaac Lab ANYmal C environment — This targets ANYmal C (not Unitree Go2), so use it as a reference and adapt robot config, assets, and reward to Go2.
- DMO (IsaacGym) Go2 walking project page • Go2 walking environment used by the authors • Config file used by the authors — Look at the function `compute_reward_CaT` (beware that some reward terms have a weight of 0 and are thus deactivated; check the weights in the config file). This implementation includes strong reward shaping, domain randomization, and training disturbances for robust sim‑to‑real, but it is written for legacy IsaacGym, and the challenge is to re-implement it in Isaac Lab.
- API References:
  - ArticulationData (`robot.data`) — Contains `root_pos_w`, `joint_pos`, `projected_gravity_b`, etc.
  - ContactSensorData (`_contact_sensor.data`) — Contains `net_forces_w` (contact forces).
Students should only edit README.md below this line.
Author: Arshia Sangwan
Course: ROB-6323 Reinforcement Learning and Optimal Control
Institution: NYU Tandon School of Engineering
This project transforms a minimal two-reward baseline into a comprehensive quadruped locomotion system capable of robust velocity tracking, terrain traversal, and dynamic acrobatic maneuvers. Starting from the base repo, I developed two complete training environments:
-
Flat Terrain Locomotion (Main Task): A production-ready walking policy with 26 reward terms, automatic curriculum learning, and extensive domain randomization for sim-to-real transfer.
-
Controlled Backflip (Bonus): A dynamic aerial maneuver environment that teaches the robot to perform a complete backward somersault with soft landing and recovery.
The following sections detail every technical decision, implementation choice, and the reasoning behind each component.
- Project Architecture
- Main Task: Flat Terrain Locomotion
- Bonus Task : Controlled Backflip
- Training and Evaluation
- Results
- References
rob6323_go2_project/
source/rob6323_go2/rob6323_go2/tasks/direct/rob6323_go2/
rob6323_go2_env.py # Main locomotion environment (1380 lines)
rob6323_go2_env_cfg.py # Main configuration (455 lines)
rob6323_go2_backflip_env.py # Backflip environment (1046 lines)
rob6323_go2_backflip_env_cfg.py # Backflip configuration (262 lines)
agents/
rsl_rl_ppo_cfg.py # PPO hyperparameters
train.sh # Flat terrain training
train_backflip.sh # Backflip training
| File | Lines | Description |
|---|---|---|
| `rob6323_go2_env.py` | 1380 | Complete rewrite with 26 rewards, curriculum, randomization |
| `rob6323_go2_env_cfg.py` | 455 | Comprehensive configuration with documented parameters |
| `rob6323_go2_backflip_env.py` | 1046 | New file: phase-based backflip training |
| `rob6323_go2_backflip_env_cfg.py` | 262 | New file: backflip-specific configuration |
The baseline environment provided only two reward terms: linear velocity tracking and yaw rate tracking. While functional, this minimal approach produces policies that lack the robustness, efficiency, and natural motion quality required for real-world deployment.
My approach follows three core principles:
-
Reward Decomposition: Rather than a monolithic reward function, I decompose the objective into 26 specialized terms, each addressing a specific aspect of locomotion quality. This provides clear learning signals and enables fine-grained behavior tuning.
-
Progressive Difficulty: Training begins with simple conditions and gradually introduces complexity through automatic curriculum learning. This prevents the policy from being overwhelmed early in training.
-
Sim-to-Real Awareness: Every design decision considers the eventual transfer to physical hardware. Domain randomization, observation noise, and actuator modeling all serve this goal.
I implemented 26 reward terms organized into six functional categories. Each term was carefully designed based on established literature and tuned through extensive experimentation.
These rewards encourage the policy to follow velocity commands accurately.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `track_lin_vel_xy` | exp(-error^2 / 0.25) | +1.5 | Exponential kernel provides smooth gradients near zero error and naturally bounds the reward. Inspired by ETH Zurich's legged_gym implementation. |
| `track_ang_vel_z` | exp(-error^2 / 0.25) | +0.8 | Same kernel for yaw rate. Lower scale because turning is secondary to forward motion in most tasks. |
The exponential kernel is superior to quadratic error for several reasons: it saturates at 1.0 for perfect tracking (preventing reward explosion), provides strong gradients near the target, and degrades gracefully for large errors.
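To make these properties concrete, here is a minimal sketch of the kernel (the helper name and signature are illustrative, not the repository's actual code):

```python
import math

def track_lin_vel_xy(cmd_vx, cmd_vy, vx, vy, sigma=0.25, scale=1.5):
    """Exponential tracking kernel: saturates at `scale` for zero error
    and decays smoothly as the squared velocity error grows."""
    err_sq = (cmd_vx - vx) ** 2 + (cmd_vy - vy) ** 2
    return scale * math.exp(-err_sq / sigma)

# Perfect tracking earns the full scale; a 1 m/s error earns almost nothing,
# but the gradient toward the target remains smooth everywhere.
```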
These penalties maintain a stable, natural body configuration.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `orient` | gravity_xy^2 | -2.0 | Penalizes roll and pitch deviation using the projected gravity vector in the body frame. When the robot is level, gravity_x and gravity_y are zero. |
| `base_height` | (h - 0.34)^2 | -10.0 | Maintains the Go2's natural standing height of 0.34 m. Prevents both crouching (inefficient) and over-extension (unstable). |
| `lin_vel_z` | vel_z^2 | -2.0 | Penalizes vertical bouncing, which wastes energy and indicates poor gait quality. |
| `ang_vel_xy` | ang_vel_xy^2 | -0.05 | Dampens rotational oscillations in roll and pitch. Small scale because some rotation is natural during locomotion. |
These penalties reduce actuator wear and produce more natural motion.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `action_rate` | (a_t - a_{t-1})^2 | -0.01 | Penalizes the first derivative of actions (velocity). Reduces high-frequency oscillations. |
| `action_smoothness` | (a_t - 2a_{t-1} + a_{t-2})^2 | -0.005 | Penalizes the second derivative (acceleration/jerk). Uses finite differences to compute action acceleration. |
The combination of first and second derivative penalties produces remarkably smooth motion while still allowing rapid responses when needed.
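A hedged sketch of the two finite-difference penalties (scales taken from the table above; the helper itself is hypothetical):

```python
def smoothness_penalties(a_t, a_prev, a_prev2, w_rate=0.01, w_smooth=0.005):
    """First-difference (action rate) and second-difference (action jerk)
    penalties over an action vector, returned as a single negative reward."""
    rate = sum((x - y) ** 2 for x, y in zip(a_t, a_prev))
    accel = sum((x - 2 * y + z) ** 2 for x, y, z in zip(a_t, a_prev, a_prev2))
    return -w_rate * rate - w_smooth * accel
```

A constant action history incurs no penalty, and a steady ramp incurs only the rate penalty, so the policy can still move quickly when the motion is smooth.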
These penalties protect the hardware from damage and encourage efficient motion.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `dof_vel` | joint_vel^2 | -5e-4 | Limits joint speeds. The Go2 motors have a 30 rad/s velocity limit. |
| `dof_acc` | ((v_t - v_{t-1})/dt)^2 | -2.5e-7 | Limits joint accelerations. Protects gearboxes from shock loads. |
| `dof_pos_limits` | max(0, pos - upper) + max(0, lower - pos) | -10.0 | Penalizes approach to mechanical joint limits defined in the URDF. |
| `torque` | torque^2 | -1e-5 | Encourages energy-efficient, low-torque solutions. |
| `hip_regularization` | (hip_angle - default)^2 | -0.5 | Keeps hip abduction near the default pose. Inspired by legged_gym. Prevents splayed leg configurations. |
These rewards shape natural gait patterns and foot placement.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `feet_clearance` | swing_mask * max(0, 0.08 - foot_z) | -2.0 | Penalizes low foot height during swing phase. Target clearance is 8 cm, essential for obstacle avoidance. |
| `tracking_contacts` | 1 - mean(abs(actual - desired)) | +0.3 | Rewards matching the clock-based gait schedule. Promotes a regular trotting rhythm. |
| `feet_air_time` | (air_time - 0.5) * first_contact | +0.25 | Rewards a swing duration of approximately 0.5 s. Encourages dynamic gaits over shuffling. |
| `feet_slip` | contact * max(0, foot_vel_xy - 0.1) | -0.2 | Penalizes foot sliding while in contact with the ground. Slip wastes energy and indicates poor placement. |
| `stumble` | count(thigh_contact) + count(calf_contact) | -2.0 | Penalizes non-foot body parts contacting the ground. These events indicate stumbling. |
| `feet_contact_force` | sum(max(0, force - 400 N)) | -1e-4 | Penalizes excessive impact forces above 400 N. Protects feet from damage. |
| `foot_impact_vel` | first_contact * (vel_z)^2 | -0.3 | Penalizes high downward velocity at touchdown. Encourages soft landings. |
These rewards address overall behavior and safety.
| Term | Formula | Scale | Rationale |
|---|---|---|---|
| `collision` | (base_force > 5 N) | -5.0 | Penalizes body collisions with the ground. Binary penalty when the base touches the ground. |
| `raibert_heuristic` | (foot_pos - target)^2 | -0.5 | Encourages Raibert-style footstep placement: the foot lands ahead when moving forward. |
| `stand_still` | joint_motion when cmd=0 | -0.2 | Penalizes motion when commanded to stand still. Prevents drifting at zero command. |
| `symmetry` | (FL_h - FR_h)^2 + (RL_h - RR_h)^2 | -0.1 | Penalizes left-right asymmetry. Inspired by DMO. Prevents limping gaits. |
| `termination` | is_terminated | -200.0 | Large penalty when an episode ends in failure. Strongly discourages falling. |
| `power` | sum(abs(torque * vel)) | -1e-5 | Penalizes mechanical power consumption. Encourages energy efficiency. |
| `alive` | 1.0 | +0.2 | Constant reward each timestep. Encourages survival without dominating other terms. |
Training difficulty increases automatically over 5 million environment steps through four phases:
| Progress | Phase | Features Enabled |
|---|---|---|
| 0-25% | Foundation | No randomization. Robot learns basic walking mechanics. |
| 25-50% | Light Randomization | Friction variation (0.5-1.25), mass variation (15%), initial pose randomization. |
| 50-75% | Medium Randomization | Motor strength variation (10%), PD gain variation (10%), observation noise. |
| 75-100% | Full Randomization | Push disturbances (0.5 m/s every 10 seconds), all previous randomizations. |
The curriculum factor is computed as:
progress = total_steps / curriculum_end_step
curriculum_factor = clamp(progress, 0.0, 1.0)

This factor gates all randomization intensities, observation noise magnitudes, and command range scaling.
To bridge the sim-to-real gap, I randomize six categories of parameters:
| Parameter | Range | Purpose |
|---|---|---|
| Ground Friction | 0.5 - 1.25 | Simulates different floor surfaces (tile, carpet, concrete, outdoor) |
| Body Mass | 85% - 115% of nominal | Accounts for payloads and manufacturing variation |
| Motor Strength | 90% - 110% | Simulates actuator degradation and voltage variation |
| PD Gains (Kp, Kd) | 90% - 110% | Accounts for control loop uncertainty |
| Stiction | 0.1 - 0.6 Nm | Models static friction in gearboxes and bearings |
| Viscous Damping | 0.01 - 0.05 Nm*s/rad | Models velocity-dependent friction |
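A sketch of how per-episode parameters might be sampled with the range gated by the curriculum factor (the helper and the interpolation scheme are assumptions on my part; the ranges come from the table, and the nominal values are assumed):

```python
import random

def sample_randomization(curriculum_factor, rng=random):
    """Sample per-episode physics parameters. At curriculum_factor = 0 every
    parameter sits at its nominal value; at 1.0 the full table range is used."""
    def widen(lo, hi, nominal):
        # Interpolate the sampling range between the nominal value and
        # the full randomization range.
        lo_c = nominal + curriculum_factor * (lo - nominal)
        hi_c = nominal + curriculum_factor * (hi - nominal)
        return rng.uniform(lo_c, hi_c)

    return {
        "friction": widen(0.5, 1.25, 1.0),
        "mass_scale": widen(0.85, 1.15, 1.0),
        "motor_strength": widen(0.90, 1.10, 1.0),
        "kp_scale": widen(0.90, 1.10, 1.0),
        "stiction": widen(0.1, 0.6, 0.0),   # assumed nominal: no stiction
    }
```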
Real motors exhibit internal friction that simulation typically ignores. I implemented a physics-based friction model:
tau_stiction = Fs * tanh(qdot / velocity_threshold) # Static friction
tau_viscous = mu_v * qdot # Viscous damping
tau_output = tau_PD - tau_stiction - tau_viscous

The hyperbolic tangent provides a smooth transition around zero velocity, avoiding numerical issues while capturing the "sticking" behavior of real actuators.
The policy receives a 52-dimensional observation vector:
| Dimensions | Content | Scaling | Noise Level |
|---|---|---|---|
| 0-2 | Body linear velocity | x2.0 | 10% |
| 3-5 | Body angular velocity | x0.25 | 20% |
| 6-8 | Projected gravity | x1.0 | 5% |
| 9-11 | Velocity commands | Variable | 0% |
| 12-23 | Joint positions (offset from default) | x1.0 | 1% |
| 24-35 | Joint velocities | x0.05 | High |
| 36-47 | Previous actions | x1.0 | 0% |
| 48-51 | Gait clock signals | x1.0 | 0% |
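A sketch of how scaling and curriculum-gated noise could be applied to one observation group (the helper is hypothetical, and I read the table's noise levels as multiplicative fractions, which is an assumption):

```python
import random

def scale_and_noise(values, scale, noise_frac, curriculum_factor, rng=random):
    """Scale raw sensor values, then perturb each by a uniform multiplicative
    noise whose magnitude is gated by the curriculum factor (so early in
    training the observations are noise-free)."""
    out = []
    for v in values:
        noise = rng.uniform(-1.0, 1.0) * noise_frac * curriculum_factor
        out.append(v * scale * (1.0 + noise))
    return out
```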
I added sinusoidal clock signals that encode the desired gait phase:
# Trot gait: diagonal legs move together
foot_indices = [phase + 0.5, phase, phase, phase + 0.5] # FL, FR, RL, RR
clock_inputs = sin(2 * pi * foot_indices)

This provides explicit timing information to the policy, significantly accelerating the learning of regular trotting gaits. The approach is inspired by "Walk These Ways" from MIT.
The baseline used Isaac Lab's built-in position controller. I replaced this with explicit PD torque control:
torques = Kp * (desired_pos - joint_pos) - Kd * joint_vel
torques = torques * motor_strength # Domain randomization
torques = clip(torques, -torque_limit, torque_limit)
robot.set_joint_effort_target(torques)

This enables randomization of Kp, Kd, and motor strength, which is essential for producing policies robust to actuator variations.
The backflip represents one of the most challenging dynamic maneuvers for quadruped robots. It requires explosive power generation, mid-air attitude control, and precise landing timing. I designed a complete phase-based training system that decomposes this complex skill into learnable sub-behaviors.
The backflip is divided into five distinct phases, each with specific objectives:
| Phase | Duration | Objective | Key Metrics |
|---|---|---|---|
| CROUCH | 0-10% | Lower center of mass, compress legs | Height 22cm, slight backward lean |
| LAUNCH | 10-23% | Explosive leg extension, initiate rotation | Vertical velocity 3 m/s, pitch rate 12 rad/s |
| FLIGHT | 23-57% | Tuck legs, maintain rotation rate | Peak height 0.9m, angular momentum conservation |
| EXTEND | 57-73% | Extend legs for landing preparation | Reduce rotation rate, orient feet downward |
| LAND | 73-100% | Soft touchdown, recover to standing | Low impact velocity, upright orientation |
PHASE_CROUCH = 0 # Prepare for jump
PHASE_LAUNCH = 1 # Explosive takeoff
PHASE_FLIGHT = 2 # Mid-air rotation
PHASE_EXTEND = 3 # Landing preparation
PHASE_LAND = 4 # Recovery to standing

To detect a complete backflip, I track cumulative pitch rotation:
current_pitch = atan2(-gravity_x, -gravity_z)
delta_pitch = current_pitch - previous_pitch
cumulative_pitch += delta_pitch # Full flip = -2*pi radians

The expected height follows projectile motion:
h(t) = h0 + v0*t - 0.5*g*t^2
height_reward = exp(-3.0 * |actual - expected|)
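One subtlety in the cumulative-pitch tracking above: `atan2` jumps at ±π, so the per-step delta should be wrapped before accumulation or a single crossing would register as a near-full turn. A hedged sketch of one way to do this (hypothetical helper, not the repo's code):

```python
import math

def accumulate_pitch(cum_pitch, prev_pitch, gravity_x, gravity_z):
    """Update cumulative pitch from the projected gravity vector, wrapping
    the per-step delta into (-pi, pi] so the running sum survives the
    atan2 discontinuity at +/- pi."""
    pitch = math.atan2(-gravity_x, -gravity_z)
    delta = pitch - prev_pitch
    # Map the delta to the nearest equivalent angle: a jump from +pi to
    # -pi then counts as a tiny rotation, not a full turn.
    delta = (delta + math.pi) % (2 * math.pi) - math.pi
    return cum_pitch + delta, pitch
```

Stepping the gravity vector through a full rotation accumulates ±2π as expected, even though the raw `atan2` output wraps twice along the way.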
The backflip policy receives 56-dimensional observations:
| Dimensions | Content |
|---|---|
| 0-2 | Body linear velocity |
| 3-5 | Body angular velocity |
| 6-8 | Projected gravity (orientation) |
| 9-11 | Phase information (sin, cos, phase_id) |
| 12-23 | Joint position offsets |
| 24-35 | Joint velocities |
| 36-47 | Previous actions |
| 48-51 | Target motion (pitch_vel, height, pitch, time) |
| 52-55 | Feet contact states |
| Phase | Reward | Scale | Description |
|---|---|---|---|
| Crouch | crouch_phase | 1.0 | Proper pre-jump posture |
| Launch | launch_phase | 5.0 | Explosive takeoff with rotation initiation |
| Flight | flight_rotation | 5.0 | Maintain target angular velocity |
| Flight | tuck | 2.0 | Proper tucked leg position |
| Extend | extend_phase | 1.5 | Prepare legs for landing |
| Land | land_soft | 3.0 | Low impact velocity at touchdown |
| Land | land_upright | 4.0 | Land on feet, not back |
| Term | Scale | Description |
|---|---|---|
| angular_momentum | +1.0 | Track target rotation rate |
| height_trajectory | +1.0 | Follow ballistic flight path |
| crash_penalty | -50.0 | Landing on back, head, or side |
| early_contact | -10.0 | Ground contact during flight |
| success_bonus | +100.0 | Complete flip with stable recovery |
Training starts with partial rotations and gradually increases:
rotation_curriculum = 0.5 # Start at 50% rotation
if success_rate > 0.5:
    rotation_curriculum += 0.01 # Increase toward 100%

Flat Terrain (Main Task):

cd $HOME/rob6323_go2_project
./train.sh

Backflip (Bonus):

./train_backflip.sh

Monitor jobs with `ssh burst "squeue -u $USER"`, download logs with `rsync -avzP YOUR_NETID@dtn.hpc.nyu.edu:/home/YOUR_NETID/rob6323_go2_project/logs ./`, and inspect metrics with `tensorboard --logdir ./logs`.

| Metric | Baseline | My Implementation |
|---|---|---|
| Episode Length | ~100 steps (falling) | ~1000 steps (full episode) |
| Velocity Tracking Error | ~1.0 m/s | Less than 0.3 m/s |
| Foot Slip Events | ~50 per episode | Less than 10 per episode |
| Curriculum Completion | N/A | 100% by 5M steps |
I implemented comprehensive TensorBoard logging for all 26 reward terms, plus:
- `vel_tracking_error`: Command-following accuracy
- `orientation_error`: Body stability measure
- `slip_count`: Total foot sliding events
- `energy_consumption`: Mechanical power integral
- `episode_length`: Survival duration
- `curriculum_factor`: Training progress (0.0 to 1.0)
| Iterations | Observed Behavior |
|---|---|
| 0-500 | Learning to crouch and attempt jumps |
| 500-1000 | Initiating rotation, partial flips |
| 1000-1500 | Completing rotation, crash landings |
| 1500-2000 | Soft landings, recovery to standing |
| 2000+ | Consistent, repeatable backflips |
