Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
Demo videos: partially_sim.mp4, onboard_flight.mp4, failure_new.mp4
This repository is an NVIDIA Isaac Lab extension that contains the environment and algorithms to control a multi-drone transport system (flycrane).
Using DirectMARLEnv, only the hover task has been implemented. The `partial_obs` flag allows training the policies with only local observations (own state and ID, load pose and goal pose). DirectMARLEnvs allow for decentralized algorithms such as MAPPO, but can also be wrapped to allow for centralized training using PPO. This environment has been used to train the agents for the CoRL paper.
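As a rough sketch of what such a local observation could look like (the helper below and its tensor layout are hypothetical, not the environment's actual implementation):

```python
import torch

def local_observation(drone_state, drone_id, num_drones, load_pose, goal_pose):
    """Hypothetical per-agent observation used with partial_obs.

    drone_state: (num_envs, state_dim) own position/orientation/velocities
    load_pose, goal_pose: (num_envs, 7) position + quaternion
    """
    num_envs = drone_state.shape[0]
    # one-hot agent ID so a shared policy can distinguish the drones
    agent_id = torch.zeros(num_envs, num_drones, device=drone_state.device)
    agent_id[:, drone_id] = 1.0
    return torch.cat([drone_state, agent_id, load_pose, goal_pose], dim=-1)
```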
Using ManagerBasedRLEnv, the following tasks have been implemented (although deprecated), allowing for centralized control:
- `hover_llc`/`hover`: The flycrane gets a reference pose at which the payload should hover. This is done either with a differential flatness based controller (DFBC) or end-to-end, which, in this project, means mapping directly from the simulation states to the forces and torques on the quadrotors. All tasks hereafter include the DFBC to minimize the sim2real gap.
- `track`: Track a reference trajectory generated by an external script. Different trajectories, such as figure-8 or ellipsoid shapes, are generated for the payload.
- `FlyThrough`: Make the payload fly through a gap / between two walls while avoiding collisions.
I recommend not developing the ManagerBasedRLEnvs further, since centralized control can also be achieved by wrapping DirectMARLEnvs; they can, however, be used for inspiration.
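As a sketch of how that wrapping can look, assuming Isaac Lab's `multi_agent_to_single_agent` utility (exact import paths depend on the Isaac Lab version, and the Isaac Sim app must already have been launched, as done in the train scripts under `scripts/`):

```python
import gymnasium as gym
from omni.isaac.lab.envs import multi_agent_to_single_agent
from omni.isaac.lab_tasks.utils import parse_env_cfg

# hypothetical module path; importing the extension's task package registers the gym IDs
import MARL_mav_carry_ext.tasks  # noqa: F401

task = "Isaac-flycrane-payload-decentralized-hovering-v0"
env_cfg = parse_env_cfg(task, num_envs=16)   # load the registered environment config
env = gym.make(task, cfg=env_cfg)            # multi-agent DirectMARLEnv

# concatenate the per-agent observation/action spaces so a standard
# single-agent PPO implementation can train one centralized policy
env = multi_agent_to_single_agent(env.unwrapped)
```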
Moreover, a test environment (without an agent) for a single drone is also available in `single_falcon`.
- Install Isaac Lab, see the installation guide.
- Clone and install this fork of SKRL:
  cd skrl
  pip install -e .
- Using a Python interpreter that has Isaac Lab installed, install the library:
  cd exts/MARL_mav_carry_ext
  python -m pip install -e .
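Optionally, you can check that the extension is importable (assuming the installed package name matches the inner folder name):

python -c "import MARL_mav_carry_ext; print('MARL_mav_carry_ext installed')"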
The assets used for the tasks are under `exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets`.
The assets folder contains `$(ROBOT).py` files which hold the configuration of the respective robot in the form of an `ArticulationCfg` (Isaac Lab's actuated-robot config class).
Then, in the `exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets/data/AMR` folder, the robot URDF and corresponding meshes are located in the `$(ROBOT)_data` folder, and the USD files can be found in the `$(ROBOT)` folder.
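As a rough illustration of what such a `$(ROBOT).py` file contains (the names, values and USD path below are placeholders, not the actual ones used in this repository):

```python
import omni.isaac.lab.sim as sim_utils
from omni.isaac.lab.assets import ArticulationCfg

# Placeholder config: the real files set the actual USD path, physical
# properties and initial state for each robot.
EXAMPLE_ROBOT_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="path/to/ROBOT/ROBOT.usd",  # USD under assets/data/AMR/$(ROBOT)/
        rigid_props=sim_utils.RigidBodyPropertiesCfg(disable_gravity=False),
    ),
    init_state=ArticulationCfg.InitialStateCfg(pos=(0.0, 0.0, 1.0)),
    actuators={},  # e.g. rotors driven via external forces rather than joint actuators
)
```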
The controllers are implemented in `exts/MARL_mav_carry_ext/MARL_mav_carry_ext/controllers`. The differential flatness based controller (DFBC) can be found in `geometric.py`. The incremental nonlinear dynamic inversion (INDI) controller can be found in `indi.py`.
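For intuition, a minimal numpy sketch of the standard quadrotor flatness relations is shown below (illustrative only; the repository's DFBC in `geometric.py` and the INDI controller are more involved):

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])

def flatness_thrust_attitude(acc_des, yaw_des, mass):
    """Return collective thrust [N] and desired rotation matrix for a quadrotor."""
    # rotors must produce the desired acceleration plus gravity compensation
    thrust_vec = mass * (acc_des - GRAVITY)
    collective_thrust = np.linalg.norm(thrust_vec)
    b3_des = thrust_vec / collective_thrust              # desired body z-axis
    # desired body x/y axes follow from the reference yaw
    c_des = np.array([np.cos(yaw_des), np.sin(yaw_des), 0.0])
    b2_des = np.cross(b3_des, c_des)
    b2_des /= np.linalg.norm(b2_des)
    b1_des = np.cross(b2_des, b3_des)
    R_des = np.column_stack([b1_des, b2_des, b3_des])    # desired attitude
    return collective_thrust, R_des
```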
The plotting tools are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/plotting_tools. Plotting is supported for both ManagerBasedRLEnv and DirectMARLEnv.
The environments for each specific flycrane task can be found under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/MARL_mav_carry.
Moreover, tasks for a single falcon drone are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/single_falcon. This environment was only used to test the controller.
The configuration class for each task can be found in its respective folder. Following the Isaac Lab structure, the environment is implemented as a `ManagerBasedRLEnv`.
The manager-based environment consists of multiple modules, and their configs can be found in the `$(TASK)_env_cfg.py` file:
- `Scene`: Describes the environment: all prims present, such as the ground plane, lights, robots and sensors, as well as obstacles if there are any.
- `CommandManager`: Generates a new pose command as a reference for the payload and resamples it at certain time intervals. In the case of a trajectory, it samples a new trajectory and passes the 4 (variable) future points as observations.
- `ActionManager`: Processes and sends the actions to the simulation. For the end-to-end case, this class only has one `ActionTerm` that purely reshapes and clamps the forces and torques applied to the body. The RL policy directly learns the forces and torques on the drone bodies (collective thrust and 3 torques). When using a lower-level controller, the policy output is passed to the controller and mapped to rotor forces here.
- `ObservationManager`: The observations available to the policy. Currently, the problem is handled as a centralized problem and contains the following observations:
  - Payload state (position, orientation, linear/angular velocities) in the environment frame
  - Drone states (positions, orientations, linear/angular velocities) in the environment frame
  - Payload pose errors to the goal
  - Payload twist errors to the goal (for trajectory tracking)
- `EventManager`: Describes what happens on certain events such as startup, reset or at certain time intervals. Currently, when reset is called, velocities and external forces/torques on all bodies are set to 0, and the pose of the flycrane is sampled from a uniform distribution to randomize the initial state.
- `RewardManager`: Implements the reward function (an illustrative reward-term declaration is sketched after this list), consisting of the following terms:
  - `reward_pose`: reward for tracking the payload pose
  - `reward_twist`: reward for tracking the payload twist (for trajectory tracking)
  - `reward_force`: reward for keeping the effort (rotor forces) small
  - `reward_policy_action_smoothness`: keeps the change in policy output with respect to the previous timestep small
  - `reward_body_rates`: reward for keeping the commanded body rates small
  - `reward_downwash`: reward for keeping the wake of the drones away from the payload
- `TerminationsManager`: Terminates the episode of an environment if a termination condition has been met. The implemented termination terms are:
  - `time_out`: time out after the max episode length
  - `falcon_fly_low`: terminate when the drones fly too low
  - `payload_fly_low`: terminate when the payload flies too low
  - `illegal_contact`: terminate when forces between bodies get too large
  - `payload_cable_angle`: terminate when the angle between the cables and the payload gets too large
  - `drone_cable_angle`: terminate when the angle between the cables and the drones gets too large
  - `large_states`: terminate when any state of any body is too large
  - `bounding_box`: terminate when the articulation goes outside a specified area
  - `cables_collide`: terminate when the cables collide
  - `drones_collide`: terminate when the drones get too close to each other
  - `target_too_far`: terminate when the pose target is too far away (for trajectory tracking)
- `CurriculumManager`: Manages different learning stages in order to increase the difficulty of the task over time (curriculum learning). This is not implemented.
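To make the structure concrete, here is a hedged sketch of how a single reward term is typically declared in an Isaac Lab manager-based config. The function body, the command name "payload_pose" and the asset name "payload" are illustrative, not the exact names used in this repository:

```python
import torch
from omni.isaac.lab.envs import ManagerBasedRLEnv
from omni.isaac.lab.managers import RewardTermCfg as RewTerm
from omni.isaac.lab.managers import SceneEntityCfg
from omni.isaac.lab.utils import configclass


def payload_pose_error(env: ManagerBasedRLEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
    """Hypothetical reward: exponential of the payload position error to the commanded pose."""
    asset = env.scene[asset_cfg.name]
    # command is given in the environment frame; convert the world-frame root position
    payload_pos_env = asset.data.root_pos_w - env.scene.env_origins
    error = torch.norm(
        env.command_manager.get_command("payload_pose")[:, :3] - payload_pos_env, dim=-1
    )
    return torch.exp(-error)


@configclass
class RewardsCfg:
    reward_pose = RewTerm(
        func=payload_pose_error,
        weight=1.0,
        params={"asset_cfg": SceneEntityCfg("payload")},
    )
```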
The `DirectMARLEnv` case implements all of this functionality directly in one class and is more efficient. The functionality is largely the same as in the manager-based version. It can be found under `exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/directMARL`.
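As a hedged outline (class name hypothetical; the method names are those of Isaac Lab's DirectMARLEnv API), the manager responsibilities roughly map onto these methods:

```python
from omni.isaac.lab.envs import DirectMARLEnv

class FlycraneHoverEnv(DirectMARLEnv):
    """Illustrative skeleton only; bodies omitted."""

    def _pre_physics_step(self, actions):  # ActionManager: process the policy outputs
        ...

    def _apply_action(self):               # ActionManager: write forces/torques to the sim
        ...

    def _get_observations(self):           # ObservationManager: one dict entry per agent
        ...

    def _get_rewards(self):                # RewardManager: per-agent reward dict
        ...

    def _get_dones(self):                  # TerminationsManager: terminated / time-out dicts
        ...

    def _reset_idx(self, env_ids):         # EventManager: randomize the initial state on reset
        ...
```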
Isaac Lab offers different wrappers for different RL libraries to make it easy to switch between libraries. The scripts for the corresponding libraries are implemented in scripts. The usable libraries are rsl_rl and skrl.
The agent configurations for the flycrane are in the respective task's `config/flycrane/agents` folder. The environments are registered as gym environments, and the parameters of the agents can be changed here.
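For illustration, such a registration typically looks like the sketch below (entry points and config-file names are placeholders; the actual registration lives in the task packages of this repository):

```python
import gymnasium as gym

gym.register(
    id="Isaac-flycrane-payload-decentralized-hovering-v0",
    entry_point="MARL_mav_carry_ext.tasks.directMARL.hover:FlycraneHoverEnv",  # placeholder
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "MARL_mav_carry_ext.tasks.directMARL.hover:FlycraneHoverEnvCfg",  # placeholder
        "skrl_cfg_entry_point": "MARL_mav_carry_ext.tasks.directMARL.hover.config.flycrane.agents:skrl_mappo_cfg.yaml",  # placeholder
    },
)
```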
To train the agent, for example using skrl, you can run the following command from the command line:
python3 scripts/skrl/train.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --num_envs=4096 --seed=-1 --algorithm="MAPPO"
This will start training for the hover task with the settings configured in the agent configuration file. For more command line interface arguments, check the respective `train.py` file under `scripts/`. To wrap the environment and allow for centralized training, simply change the algorithm to 'PPO'.
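For example, the same hover task can be trained with a centralized policy:

python3 scripts/skrl/train.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --num_envs=4096 --seed=-1 --algorithm="PPO"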
To play with the learned agent, you can run the `play.py` script. This will load the latest checkpoint from the `logs` directory accumulated during training. For this, execute (for example):
python3 scripts/skrl/play.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --video --video_length=2000 --algorithm="MAPPO" --control_mode="ACCBR" --save_plots --checkpoint=$(PATH_TO_PT_FILE)
To gather data and plot the results of the played episode, add the `--save_plots` flag; this will plot several statistics against time, such as the task metrics and the payload and drone states.
We have a pre-commit template to automatically format your code. To install pre-commit:
pip install pre-commit
Then you can run pre-commit with:
pre-commit run --all-files