
Autonomous Drone Racing Experiments

My repository for experimenting with Autonomous Drone Racing and replicating papers in that domain.

This project uses a modified version of flightlib, part of the Flightmare simulator, to simulate many Autonomous Drone Racing environments in parallel to collect samples during RL training. Rerun.io is used for visualization of sampled episodes during training. I have mainly drawn inspiration from two papers by the Robotics and Perception Group at UZH, [1] and [2].

Main contributions

This repository contains a replication of the method described in Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning (2023) [2], whose authors did not provide an open-source implementation of their work. A successful roll-out of a policy trained with the code in the repository can be viewed in the GIF above.

A new environment, RacingEnv, was implemented according to the descriptions in the paper. Track handling, gate observation, gate passage detection, collision checking, domain randomization, curriculum sampling and more were added beyond what was already available in flightlib. Some creative liberty was taken to fill in gaps that the paper's authors did not describe in detail. The new environment was adapted to a modern Stable Baselines 3 (SB3) and Gymnasium API.

A MetricsCallback Python class was made to log training progress, save checkpoints, and intermittently visualize sampled roll-outs using Rerun. Being able to qualitatively view the progress of the policy is great!
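The cadence logic of such a callback can be sketched in plain Python (an illustrative sketch with made-up names and intervals; the actual MetricsCallback subclasses SB3's BaseCallback and logs roll-outs to Rerun):

```python
class MetricsTracker:
    """Decides when to checkpoint and when to visualize a roll-out.

    Illustrative only: class name, default intervals, and the plain
    on_step interface are assumptions, not the repository's actual API.
    """

    def __init__(self, checkpoint_every=100_000, visualize_every=50_000):
        self.checkpoint_every = checkpoint_every
        self.visualize_every = visualize_every
        self._last_ckpt = 0
        self._last_viz = 0

    def on_step(self, num_timesteps):
        """Return (save_checkpoint, visualize_rollout) for this step."""
        save = num_timesteps - self._last_ckpt >= self.checkpoint_every
        viz = num_timesteps - self._last_viz >= self.visualize_every
        if save:
            self._last_ckpt = num_timesteps
        if viz:
            self._last_viz = num_timesteps
        return save, viz
```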

A proper Docker .devcontainer was set up to help others who might want to experiment further with this repository.

Modeling and RL setup

Observation space: Privileged ego state: linear velocity and rotation matrix of the drone. Gates: the relative position between the drone center and the four corners of the next target gate, plus the relative difference in corner positions between consecutive gates, for the next $N$ future gates, where $N$ is a parameter. [2] uses $N = 2$. $N$ corresponds to num_future_gates in the config racing_env.yaml. Observation space dimension (for $N$ future gates):

  • Ego velocity: 3 (linear velocity in body frame)
  • Ego rotation: 9 (rotation matrix flattened)
  • Gate corners: $N$ * 12 (4 corners × 3D relative position).

Total: $(N + 1) * 12$ dimensions in observation space.

Although the description in [2] is somewhat vague, this is how I have defined the gate observations in this project:

$$\delta_{\mathbf{p}_1} = \big[ (\mathbf{c}_{1,1}-\mathbf{p}_t)^\top,\; (\mathbf{c}_{1,2}-\mathbf{p}_t)^\top,\; (\mathbf{c}_{1,3}-\mathbf{p}_t)^\top,\; (\mathbf{c}_{1,4}-\mathbf{p}_t)^\top \big]^\top \in \mathbb{R}^{12}.$$ $$\delta_{\mathbf{p}_i} = \big[ (\mathbf{c}_{i,1}-\mathbf{c}_{i-1,1})^\top,\; (\mathbf{c}_{i,2}-\mathbf{c}_{i-1,2})^\top,\; (\mathbf{c}_{i,3}-\mathbf{c}_{i-1,3})^\top,\; (\mathbf{c}_{i,4}-\mathbf{c}_{i-1,4})^\top \big]^\top \in \mathbb{R}^{12},\quad i=2,\dots,N.$$

where $\delta_{\mathbf{p}_1}$ is the observation of the next gate and $\delta_{\mathbf{p}_i}$ the observation for subsequent gates. $\mathbf{p}_t$ is the position of the quadrotor at timestep $t$, and $\mathbf{c}_{i,j}$ is the position of gate corner $j \in \{1,2,3,4\}$ (ordered counterclockwise, starting with the top-right corner as $1$) of gate $i$.
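The gate observation above can be assembled as follows (a minimal sketch; the function and argument names are my own, not the repository's):

```python
import numpy as np

def gate_observation(p_t, corners):
    """Relative gate-corner observation for N future gates.

    p_t     : (3,) drone position at timestep t
    corners : list of N arrays, each (4, 3) -- the four corner positions
              of gate i, ordered counterclockwise from the top-right corner.
    Returns a flat vector of length N * 12.
    """
    obs = []
    # delta_p_1: corners of the next gate relative to the drone center.
    obs.append((corners[0] - p_t).ravel())
    # delta_p_i, i >= 2: corners relative to the previous gate's corners.
    for i in range(1, len(corners)):
        obs.append((corners[i] - corners[i - 1]).ravel())
    return np.concatenate(obs)
```

With $N = 2$ this yields 24 dimensions; concatenated with the 3-D velocity and the 9-D flattened rotation matrix it gives the $(N + 1) \cdot 12 = 36$ total stated above.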

Control inputs: Mass-normalized collective thrust + body rates (CTBR). Configurable input delay. Noise applied to thrust mapping coefficients to model uncertainty in dynamics.

RL method: PPO, based on the Stable Baselines 3 (SB3) implementation. Unlike [2], no tanh activation function is used to squash the network output. Instead, the default action clipping mechanism of the SB3 PPO implementation is used.
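Concretely, SB3's PPO samples unbounded Gaussian actions and clips them to the action-space Box bounds before stepping the environment. A minimal sketch of that clipping (the bounds shown are placeholders, not the repository's actual CTBR limits):

```python
import numpy as np

# Mass-normalized collective thrust + body rates: a 4-D action.
# These bounds are illustrative assumptions, not the repo's values.
ACT_LOW = np.array([0.0, -6.0, -6.0, -6.0])
ACT_HIGH = np.array([20.0, 6.0, 6.0, 6.0])

def clip_action(raw):
    """Clip an unsquashed Gaussian sample to the Box bounds,
    as SB3 does by default before env.step()."""
    return np.clip(raw, ACT_LOW, ACT_HIGH)
```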

Reward design: Gate progress objective: $r(k) = \left\| g_k - p_{k-1} \right\| - \left\| g_k - p_{k} \right\| - b\left\| \boldsymbol{\omega}_k \right\|$, with target gate center $g_k$, drone positions at the current timestep, $p_k$, and previous timestep, $p_{k-1}$, body rate magnitude $\left\| \boldsymbol{\omega}_k \right\|$, and weighting parameter $b = 0.01$. Collision penalty $r(k) = -10.0$ and goal reward $r(k) = +10.0$ upon finishing the race. All as described in [2]. The gate progress objective makes the reward function dense and conducive to rapid learning.
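Written out directly (a sketch; treating collision and finish as terminal cases that replace the progress term is my reading, not spelled out above):

```python
import numpy as np

def racing_reward(g_k, p_prev, p_curr, omega, b=0.01,
                  collided=False, finished=False):
    """Dense gate-progress reward from [2].

    g_k    : (3,) center of the current target gate
    p_prev : (3,) drone position at the previous timestep
    p_curr : (3,) drone position at the current timestep
    omega  : (3,) body rates
    """
    if collided:
        return -10.0      # collision penalty
    if finished:
        return 10.0       # goal reward upon finishing the race
    # Progress toward the gate center, minus a body-rate penalty.
    progress = np.linalg.norm(g_k - p_prev) - np.linalg.norm(g_k - p_curr)
    return progress - b * np.linalg.norm(omega)
```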

Parallelized simulation: Many simulations can run simultaneously during RL training; the authors of [2] run 100 simulations in parallel. Simple OpenMP loop parallelization is used for the C++-side VecEnv. All simulation runs on the CPU.

Curriculum initialization: Starting positions of the quadrotor are sampled from areas midway between all pairs of subsequent gates, such that all gates are observed early during training. Starting states from which the policy manages to pass at least one gate get put in a success buffer, from which starting states are drawn again with a fixed probability (a hyperparameter). This curriculum is described in [2].
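A minimal sketch of this success-buffer curriculum (class name, buffer size, and the FIFO eviction policy are illustrative assumptions):

```python
import random

class CurriculumSampler:
    """Success-buffer curriculum as in [2] (illustrative sketch).

    Midpoint starting states are always available; states from which the
    policy passed at least one gate enter a buffer and are re-drawn with
    a fixed probability p_buffer.
    """

    def __init__(self, midpoint_states, p_buffer=0.5, max_size=1000):
        self.midpoint_states = midpoint_states
        self.p_buffer = p_buffer
        self.max_size = max_size
        self.buffer = []

    def record_success(self, start_state):
        # FIFO eviction once the buffer is full (an assumption).
        if len(self.buffer) >= self.max_size:
            self.buffer.pop(0)
        self.buffer.append(start_state)

    def sample(self):
        # With probability p_buffer, reuse a previously successful start.
        if self.buffer and random.random() < self.p_buffer:
            return random.choice(self.buffer)
        return random.choice(self.midpoint_states)
```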

Notes on additions to flightlib

My main modification to Flightmare's flightlib is the RacingEnv, which includes additions needed to properly model Autonomous Drone Racing scenarios for RL training.

Track loading

Tracks defined by .yaml files (see files in assets/racetracks/ for format) can be loaded in as Track objects which can be assigned to RacingEnv instances.

Gate passage detection

The RacingEnv detects when the drone successfully passes through the next gate, which makes it possible to define a win condition for finishing the track. This is also required for the curriculum setup described in [2].
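One common way to implement such a check, shown here as an illustrative sketch rather than the repository's exact scheme, is a segment-vs-gate-plane test: detect a front-to-back crossing of the gate plane between timesteps, then verify the crossing point lies inside the gate opening.

```python
import numpy as np

def passed_gate(p_prev, p_curr, gate_center, gate_normal,
                half_w, half_h, gate_right, gate_up):
    """True if the segment p_prev -> p_curr crosses the gate plane
    front-to-back inside the rectangular opening.

    gate_normal/right/up are unit vectors; half_w/half_h are the
    half-extents of the opening along right/up.
    """
    d_prev = np.dot(p_prev - gate_center, gate_normal)
    d_curr = np.dot(p_curr - gate_center, gate_normal)
    if not (d_prev > 0.0 >= d_curr):
        return False                      # no front-to-back crossing
    # Intersection of the motion segment with the gate plane.
    t = d_prev / (d_prev - d_curr)
    hit = p_prev + t * (p_curr - p_prev)
    local = hit - gate_center
    return (abs(np.dot(local, gate_right)) <= half_w and
            abs(np.dot(local, gate_up)) <= half_h)
```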

Collision checking

The collision checking I've added covers collisions between drones and racing gates, for the purpose of calculating collision-based rewards during RL training. Gate-drone collision is based on AABB-sphere collision checking and leverages a sphere representation of the drone. Drones are represented by 5 spheres: one for the center and one for each propeller. Simple and efficient. Collision sphere placement considers the arm length parameter defined for the drone in the configuration file, racing_env.yaml. Collision checking is restricted to the next gate that the quadrotor shall pass, to avoid confusing the policy with collisions against gates that cannot be observed.
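An AABB-sphere check of this kind can be sketched as follows (the propeller offsets are world-aligned here for brevity; the real check would rotate them by the drone's attitude, and all names and radii are illustrative):

```python
import numpy as np

def sphere_aabb_collides(center, radius, box_min, box_max):
    """Sphere vs axis-aligned box: clamp the sphere center to the box
    and compare the squared distance to the squared radius."""
    closest = np.clip(center, box_min, box_max)
    return np.sum((center - closest) ** 2) <= radius ** 2

def drone_gate_collision(drone_pos, arm_length, prop_radius, gate_aabbs):
    """Drone as 5 spheres: one at the center and one per propeller,
    offset by arm_length along the x/y axes. gate_aabbs is a list of
    (box_min, box_max) pairs for the next gate's frame segments."""
    offsets = np.array([[0.0, 0.0, 0.0],
                        [arm_length, 0.0, 0.0], [-arm_length, 0.0, 0.0],
                        [0.0, arm_length, 0.0], [0.0, -arm_length, 0.0]])
    for off in offsets:
        for box_min, box_max in gate_aabbs:
            if sphere_aabb_collides(drone_pos + off, prop_radius,
                                    box_min, box_max):
                return True
    return False
```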

References

[1]: Autonomous Drone Racing with Deep Reinforcement Learning (2021)

[2]: Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning (2023)