This is the roadmap for my project, replicating Autonomous Drone Racing papers. Specifically, the papers [1] and [2] are the focus of the first iteration.
Roadmap tasks:
- Update project and dev-container to Ubuntu 22.04 and modern tools. This point refers to changes in dependencies and project structure, not changes to code.
  - Toss out dependencies related to Stable Baselines 2 (replace with new version 3)
  - Remove the old OpenAI gym download (replace with modern "gymnasium")
  - Toss out tensorflow dependencies completely (though a local tensorboard server should be used for visualizing live training metrics)
  - Toss out anything related to ROS 1, including anything catkin-related (will maybe be replaced with a ROS 2 Humble bridge in the future, not now though)
  - Leave 'flightros/' as reference, but remove all usage of it (to be replaced with ROS 2 bridge if needed)
  - Add Rerun.io for visualization
  - Set up vcpkg for third-party dependencies, like zmq, instead of building from source or using other mechanisms (initial `vcpkg.json` already exists)
  - Switch to uv instead of pip as python3 package manager
  - Set up the `black` formatter for any new python code written (flightmare folder shall be exempted from any formatting/linting, to not disturb old code)
  - Update python packages and other dependencies to the newest stable set of versions. Python packages should have a requirements.txt made, or whatever uv uses (typically `pyproject.toml` plus `uv.lock`). "Modern" and "newest stable" are ambiguous; some experimentation is probably required to find a set of versions where each is either the latest stable release or the latest release that works with the other dependencies. Think: "As new as possible without breaking anything"
- Set up unified track handling
  - Common TrackHandler C++ class that can read and write tracks from/to file. File format: Simplified yaml format compared to what was present in the TOGT-Planner git repository.
  - Convert downloaded benchmark tracks (`.yaml`) in `assets/racetracks/` to this simpler yaml format. Convert all gates to square gates. The side length should be whatever is most common among the square gates of type RectanglePrisma in the current `.yaml` track files. The converted files should also be placed in the `assets/racetracks/simplified` folder.
- Fix project CMakeLists.txt files
  - Top level file including all subdirectories (currently only `common/`).
  - Make top level file reference `flightmare/flightlib/`, such that building the top-level project also builds the `flightlib` shared library (if changes have occurred in that code)
  - CMakeLists.txt in `common/` to build TrackHandler
- Build whole project using common set of dependencies, including `flightlib`
  - Update Eigen version for `flightlib` to use whatever is the newest compatible version
  - Debug errors that arise
  - Specify the newest stable/newest compatible version of each dependency in vcpkg, so that building this project in the future will install specifically the versions compatible with this project
- Set up building with clang instead of gcc so that clang-tidy will work properly.
- Add a new RacingEnv class similar to QuadrotorEnv that will also include logic regarding the racing tracks consisting of square gates
  - All gates in track stored in RacingEnv instance
  - Ability to fetch drone state observed as described in [2] (Linear velocity and rotation matrix of drone).
  - Gates are observed as described in [2] (gate corners).
  - Integer parameter decides how many future gates are included in observation space
  - Compute Gate Progress reward as described in [1] and [2].
- Modify RacingEnv action space to use delayed CTBR (collective thrust + body rates)
  - Use CTBR command mode (collective_thrust + omega) instead of single rotor thrusts mode in RacingEnv::step()
  - Add an input delay defined by a parameter (input delay translated to an integer number of steps based on simulation dt from configuration file). Implement with a Command buffer. Initialize buffer with "neutral" commands that simply thrust "upwards" with 1g. Clear buffer on reset, filling with neutral commands.
- Add racing logic to RacingEnv
  - Add mechanism to RacingEnv that allows detection of when the drone passes through a gate successfully.
- Add collision detection to RacingEnv
  - Build sphere representation of drone based on parameters (arm length) on initialization
  - Perform AABB-Sphere collision checking between drone and the observed gates
- Add RacingEnv to pybind11 wrapper (pybind_wrapper.cpp)
  - Expose `RacingEnv` as `RacingEnv`
  - Expose `VecEnv<RacingEnv>` as `VectorizedRacingEnv`
- Set up an interactive test simulation
  - Make a simple Python `.ipynb` notebook that uses shared library `flightlib` through pybind wrapper. Can start one RacingEnv instance and teleport the drone around.
    - Implement an interface to RacingEnv that simply "teleports" the drone a small distance in the direction given by the keyboard input, completely ignoring the physics (gravity, rotor thrust, etc.). Use a separate method, env.teleportTo(), instead of step(), which remains pure for the RL stuff. teleportTo() still needs to perform all collision/gate pass checks and such, for debugging.
  - Add ability to load track from yaml using pybind11 wrapper for TrackHandler C++ class
  - Set up Rerun as main visualization
    - Use `assets/glTF/uzh_gate.glb` for visualizing the gate
    - Use `assets/glTF/drone_red.glb` for visualizing the drone
    - Add ability to display drone's collision sphere representation (spheres colored in green)
    - Add ability to show each gate as its 4 collision cuboids (in blue)
    - Add ability to show gate observations as 3D lines from drone to next gate's corners
    - Add ability to show collisions by rendering colliding drone collision spheres and gate cuboids in red
    - Add ability to show gate pass detection by making gates passed in correct order yellow (specifically their blue collision cuboids turn yellow, not the 3D gate assets)
  - After each teleport, collisions and gate passes are checked and visualizations updated. Magenta line segment rendered from previous state to current to show movement history
- Set up brand new RL training script
  - Set up SB3 PPO policy as similar as possible to the one described in [2]. Use `VecNormalize` to approximate "input normalization using z-scoring" mentioned in [1].
  - Set up tensorboard locally for visualizing metrics. All the usual metrics + gradients. Regular validation roll-outs with roll-out metrics such as: success rate (passing all gates), collision rate, average velocity, maximum velocity, maximum acceleration (measured in g).
    - Implement with `self.logger` calls in a callback
  - Make a `viz/visualization.py` module for calling Rerun's python API for visualizing the racing environment. Base it on the `.ipynb` used for tests.
    - Use `InstancePoses3D` for visualizing the gate `Asset3D`, which is likely more efficient since the gates are all identical
    - (Optional) Add visualization of gate observations during roll-out visualizations
  - Add an option to sample one roll-out every `viz_every_n_batches` batches (`viz_every_n_batches` ~ 50-100) and visualize it in Rerun during training, to track progress qualitatively.
    - Record a randomly selected roll-out (one from a batch) and send it to rerun after the episode completes via an efficient `send_columns()` call. Timestamps based on simulation time. Magenta trail behind drone shows history (line segments connecting current and previous position). User can replay/scrub timeline to see behavior until next roll-out is sampled for visualization. Rollout visualizations can be stored as `.rrd` to look at later.
  - Make sure that episode step/time limit is set and handled correctly
  - Add sampling of drone start poses at the centerpoint between gates as it is described in [2] in order to speed up training. Part of curriculum setup. This should probably live in `RacingEnv`. Progression described in [2] where starting points that lead to success are sampled from again (initial state buffer).
  - Read env config file only once, not once per env instance
  - Add track setting from train.py (set same track for all envs, used if track randomization off)
  - One single training configuration `.yaml` file used for all hyperparameters. Also specifies path to the `.yaml` used to read environment parameters from.
- Apply noise to thrust mapping coefficients. From [2]: "we [...] randomize the thrust mapping coefficients to simulate unmodeled battery behaviour, such as high voltage drops when flying at very high speeds, and the drag coefficients to simulate unknown aerodynamic effects". Add env parameters for tuning this noise. Aerodynamic drag isn't modeled in flightlib, and I will not implement a model for it for this project.
- Debug training loop until it works as expected
- Verify that logged metrics and visualized roll-outs look reasonable
- Train a decently performing policy
- Add evaluation script
- Remodel repo structure and clean up build system
- Make sure one single CMake build command is enough to build the whole project, including flightgym python module
- Build devcontainer and run tests on other device to mitigate "works on my machine" issues
- (Optional) Add tanh squashing by modifying the SB3 PPO implementation, to match the paper [2].
- (Optional) Add random track generation as described in [1]. This can live in RacingEnv. Complexity of track generation modified by parameters (number of gates, and maximum "magnitude" of pose difference to previous gate/start pose)
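
The input-delay task for RacingEnv (delay translated to an integer number of steps, buffer pre-filled with neutral 1g commands, refilled on reset) could look roughly like the sketch below. It is written in Python for brevity, while the real implementation would presumably live in the C++ RacingEnv. The names `CtbrCommand`, `DelayedCommandBuffer`, and `NEUTRAL` are hypothetical, and the sketch assumes collective thrust is mass-normalized, so that 1g corresponds to 9.81 m/s^2.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple

GRAVITY = 9.81  # m/s^2; assumes mass-normalized collective thrust


@dataclass(frozen=True)
class CtbrCommand:
    """Hypothetical CTBR command: collective thrust plus body rates."""
    collective_thrust: float            # m/s^2 (mass-normalized, an assumption)
    omega: Tuple[float, float, float]   # body rates in rad/s


# "Neutral" command: thrust upwards with 1g, no rotation.
NEUTRAL = CtbrCommand(GRAVITY, (0.0, 0.0, 0.0))


class DelayedCommandBuffer:
    """FIFO buffer that delays commands by a fixed number of simulation steps."""

    def __init__(self, input_delay_s: float, sim_dt: float) -> None:
        # Translate the delay in seconds to a whole number of simulation steps.
        self.delay_steps = max(0, round(input_delay_s / sim_dt))
        self._buffer: Deque[CtbrCommand] = deque()
        self.reset()

    def reset(self) -> None:
        # Clear the buffer and refill it with neutral commands.
        self._buffer = deque([NEUTRAL] * self.delay_steps)

    def push(self, command: CtbrCommand) -> CtbrCommand:
        # Queue the newest command and return the one that is now due.
        self._buffer.append(command)
        return self._buffer.popleft()
```

With this layout, step() would call `push()` with the policy's action and apply the returned command, so the first `delay_steps` simulation steps after a reset simply hover at 1g. A delay of zero passes commands straight through.
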
[1]: Autonomous Drone Racing with Deep Reinforcement Learning (2021)
[2]: Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning (2023)
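
For the collision-detection task, the AABB-Sphere check between one drone collision sphere and one gate cuboid can be sketched as below. This is the standard closest-point test, not necessarily the exact form the project will use. The function name `sphere_intersects_aabb` is hypothetical, and the sketch assumes the sphere center has already been expressed in the frame in which the cuboid is axis-aligned (for a rotated gate, that would be the gate's local frame).

```python
from typing import Sequence


def sphere_intersects_aabb(
    center: Sequence[float],
    radius: float,
    box_min: Sequence[float],
    box_max: Sequence[float],
) -> bool:
    """Return True if the sphere touches or overlaps the axis-aligned box."""
    dist_sq = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        # Clamp the sphere center to the box to get the closest point on it.
        closest = min(max(c, lo), hi)
        dist_sq += (c - closest) ** 2
    # Compare squared distances to avoid a square root.
    return dist_sq <= radius ** 2
```

A gate built from 4 cuboids would run this once per cuboid and per drone sphere, reporting a collision if any pair intersects.
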