Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning
Demo videos: partially_sim.mp4, onboard_flight.mp4, failure_new.mp4
This repository is an NVIDIA Isaac Lab extension that contains the environment and algorithms to control a multi-drone transport system (flycrane).
Using DirectMARLEnv, only the hover task has been implemented. The partial_obs flag allows training the policies with only local observations (own state and ID, load pose, and goal pose). DirectMARLEnvs allow for decentralized algorithms such as MAPPO, but can also be wrapped to allow for centralized training using PPO. This environment was used to train the agents for the CoRL paper.
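For illustration, here is a minimal sketch of how such a local observation vector could be assembled for one agent; the names, shapes, and layout are hypothetical and not the repository's actual code:

```python
import torch

def local_observation(agent_id: int, num_agents: int,
                      drone_state: torch.Tensor,  # (num_envs, 13): position, quaternion, lin/ang velocity
                      load_pose: torch.Tensor,    # (num_envs, 7): position, quaternion
                      goal_pose: torch.Tensor,    # (num_envs, 7): position, quaternion
                      ) -> torch.Tensor:
    """Concatenate the agent's own state, a one-hot agent ID, and the load/goal poses."""
    num_envs = drone_state.shape[0]
    one_hot_id = torch.zeros(num_envs, num_agents, device=drone_state.device)
    one_hot_id[:, agent_id] = 1.0
    return torch.cat([drone_state, one_hot_id, load_pose, goal_pose], dim=-1)
```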
Using ManagerBasedRLEnv, the following tasks have been implemented (although they are deprecated), allowing for centralized control:
- `hover_llc`/`hover`: The flycrane receives a reference pose at which the payload should hover. This is done either with a differential flatness based controller (DFBC) or end-to-end, which in this project means mapping directly from the simulation states to the forces and torques on the quadrotors. All tasks hereafter include the DFBC to minimize the sim2real gap.
- `track`: Track a reference trajectory generated by an external script. Different trajectories, such as figure-8 or ellipsoid shapes, are generated for the payload.
- `FlyThrough`: Make the payload fly through a gap between two walls, avoiding collisions.
I recommend not developing the ManagerBasedRLEnvs further, since centralized control can also be achieved by wrapping DirectMARLEnvs. They can, however, still be used for inspiration.
Moreover, a test environment (without an agent) for a single drone is also available in single_falcon.
- Install Isaac Lab, see the installation guide.
- Clone and install this fork of SKRL:
  cd skrl
  pip install -e .
- Using a Python interpreter that has Isaac Lab installed, install the extension library:
  cd exts/MARL_mav_carry_ext
  python -m pip install -e .
The assets used for the tasks are under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets.
The assets folder contains $(ROBOT).py files, which hold the configuration of the respective robot in the form of an ArticulationCfg (Isaac Lab's actuated-robot configuration class).
Then, in the exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets/data/AMR folder, the robot URDF and corresponding meshes are located in the $(ROBOT)_data folder and the USD files can be found in the $(ROBOT) folder.
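As a rough illustration, a $(ROBOT).py file typically declares something along these lines. This is a hedged sketch assuming the `isaaclab` namespace of recent Isaac Lab releases (older releases use `omni.isaac.lab`); the USD path, joint-name pattern, and actuator values are placeholders, not the values used in this repository:

```python
import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets import ArticulationCfg

# Placeholder configuration for a quadrotor asset; all values are illustrative.
FALCON_CFG = ArticulationCfg(
    spawn=sim_utils.UsdFileCfg(
        usd_path="exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets/data/AMR/falcon/falcon.usd",  # placeholder path
        rigid_props=sim_utils.RigidBodyPropertiesCfg(disable_gravity=False),
    ),
    init_state=ArticulationCfg.InitialStateCfg(pos=(0.0, 0.0, 1.0)),
    actuators={
        "rotors": ImplicitActuatorCfg(
            joint_names_expr=[".*rotor.*"],  # placeholder joint-name pattern
            stiffness=0.0,
            damping=0.0,
        ),
    },
)
```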
The controllers are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/controllers. The differential flatness based controller (DFBC) can be found in geometric.py. The incremental nonlinear dynamic inversion (INDI) controller can be found in indi.py.
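For intuition, the core idea of a differential flatness based position controller is to map a desired acceleration to a collective thrust and a desired body z-axis. The sketch below only illustrates that idea and is not the implementation in geometric.py:

```python
import torch

GRAVITY = 9.81  # m/s^2

def dfbc_thrust(acc_des: torch.Tensor, mass: float) -> tuple[torch.Tensor, torch.Tensor]:
    """acc_des: (num_envs, 3) desired linear acceleration of the vehicle in the world frame."""
    # Desired force: compensate gravity along +z and realize the desired acceleration.
    gravity_comp = torch.tensor([0.0, 0.0, GRAVITY], device=acc_des.device)
    force_des = mass * (acc_des + gravity_comp)
    # Collective thrust is the magnitude; the desired body z-axis is its direction.
    collective_thrust = torch.linalg.norm(force_des, dim=-1)
    z_body_des = force_des / collective_thrust.unsqueeze(-1).clamp(min=1e-6)
    return collective_thrust, z_body_des
```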
The plotting tools are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/plotting_tools. Plotting is supported for both ManagerBasedRLEnv and DirectMARLEnv.
The environments for each specific flycrane task can be found under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/MARL_mav_carry.
Moreover, tasks for a single falcon drone are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/single_falcon. This environment was only used to test the controller.
The configuration class for each task can be found in its respective folder. Following the Isaac Lab structure, the environment is implemented as a ManagerBasedRLEnv.
The manager-based environment consists of multiple modules, whose configs can be found in the $(TASK)_env_cfg.py file (a small sketch of how such terms are declared follows the list below):
- `Scene`: The scene that describes the environment. This describes all prims present in the environment, such as the ground plane, lights, robots, and sensors. It also describes obstacles if there are any in the environment.
- `CommandManager`: Generates a new pose command as a reference for the payload and resamples it at certain time intervals. In the case of a trajectory, it samples a new trajectory and passes the 4 (variable) future points as observations.
- `ActionManager`: Processes and sends the actions to the simulation. For the end-to-end case, this class only has one `ActionTerm` that purely reshapes and clamps the forces and torques applied to the body. The RL policy directly learns the forces and torques on the drone bodies (collective thrust and 3 torques). When using a lower-level controller, the policy output is passed to the controller and mapped to rotor forces here.
- `ObservationManager`: The observations available to the policy. Currently, the problem is handled as a centralized problem and contains the following observations:
  - Payload state (position, orientation, linear/angular velocities) in the environment frame
  - Drone states (positions, orientations, linear/angular velocities) in the environment frame
  - Payload pose errors to the goal
  - Payload twist errors to the goal (for trajectory tracking)
- `EventManager`: Describes what happens on certain events such as startup, reset, or at certain time intervals. Right now, when reset is called, the velocities and external forces/torques on all bodies are set to 0, and the pose of the flycrane is sampled from a uniform distribution to randomize the initial state.
- `RewardManager`: Implements the reward function, which consists of the following terms:
  - `reward_pose`: Reward for tracking the payload pose
  - `reward_twist`: Reward for tracking the payload twist (for trajectory tracking)
  - `reward_force`: Reward to keep the effort (rotor forces) small
  - `reward_policy_action_smoothness`: Keep the changes in policy output with respect to the previous timestep small
  - `reward_body_rates`: Reward to keep the commanded body rates small
  - `reward_downwash`: Reward for keeping the wake of the drones away from the payload
- `TerminationsManager`: Terminates the episode corresponding to an environment if a termination condition has been met. The implemented termination terms are:
  - `time_out`: Time out after the maximum episode length
  - `falcon_fly_low`: Terminate when the drones fly too low
  - `payload_fly_low`: Terminate when the payload flies too low
  - `illegal_contact`: Terminate when contact forces between bodies get too large
  - `payload_cable_angle`: Terminate when the angle between the cables and the payload gets too large
  - `drone_cable_angle`: Terminate when the angle between the cables and the drones gets too large
  - `large_states`: Terminate when any state of any body gets too large
  - `bounding_box`: Terminate when the articulation goes outside of a specified area
  - `cables_collide`: Terminate when the cables collide
  - `drones_collide`: Terminate when the drones get too close to each other
  - `target_too_far`: Terminate when the pose target is too far away (for trajectory tracking)
- `CurriculumManager`: Manages different learning tasks in order to increase the difficulty of the task over time (curriculum learning). This is not implemented.
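As mentioned above, here is a minimal sketch of how reward and termination terms are declared in such a $(TASK)_env_cfg.py file. It assumes the `isaaclab` namespace and uses Isaac Lab's built-in mdp terms; the term names and weights are illustrative, not the ones used in this repository:

```python
from isaaclab.envs import mdp
from isaaclab.managers import RewardTermCfg as RewTerm
from isaaclab.managers import TerminationTermCfg as DoneTerm
from isaaclab.utils import configclass


@configclass
class RewardsCfg:
    """RewardManager terms: each RewTerm points to a reward function and a weight."""
    # Built-in term penalizing changes in the policy output between steps,
    # analogous in spirit to reward_policy_action_smoothness.
    action_smoothness = RewTerm(func=mdp.action_rate_l2, weight=-0.01)


@configclass
class TerminationsCfg:
    """TerminationsManager terms, e.g. the time_out term listed above."""
    time_out = DoneTerm(func=mdp.time_out, time_out=True)
```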
The DirectMARLEnv case implements all of this functionality directly in one class and is more efficient; the functionality is largely the same. It can be found under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/directMARL.
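A hedged skeleton of that pattern (again assuming the `isaaclab` namespace): observations, rewards, and dones are returned as per-agent dictionaries from a single class. Only these three hooks are shown, the bodies are illustrative stubs, and the agent names are hypothetical:

```python
import torch
from isaaclab.envs import DirectMARLEnv, DirectMARLEnvCfg


class FlycraneHoverMARLEnv(DirectMARLEnv):
    cfg: DirectMARLEnvCfg  # cfg.possible_agents lists the agent names, e.g. ["falcon_0", "falcon_1", "falcon_2"]

    def _get_observations(self) -> dict[str, torch.Tensor]:
        # One observation tensor per agent (local observations in the partial_obs case).
        return {agent: torch.zeros(self.num_envs, 1, device=self.device)
                for agent in self.cfg.possible_agents}

    def _get_rewards(self) -> dict[str, torch.Tensor]:
        # A shared payload-tracking reward can simply be broadcast to every agent.
        shared = torch.zeros(self.num_envs, device=self.device)
        return {agent: shared for agent in self.cfg.possible_agents}

    def _get_dones(self) -> tuple[dict[str, torch.Tensor], dict[str, torch.Tensor]]:
        terminated = {agent: torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
                      for agent in self.cfg.possible_agents}
        time_outs = {agent: self.episode_length_buf >= self.max_episode_length - 1
                     for agent in self.cfg.possible_agents}
        return terminated, time_outs
```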
Isaac Lab offers different wrappers for different RL libraries to make it easy to switch between libraries. The scripts for the corresponding libraries are implemented in scripts. The usable libraries are rsl_rl and skrl.
The agent configurations for the flycrane are in the respective task's config/flycrane/agents folder. The environments are registered as gym environments, and the parameters of the agents can be changed here.
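For reference, Isaac Lab tasks are usually registered as gym environments roughly like this. The task ID below is the one used in this repository, but the entry point and the config entry-point strings are illustrative placeholders, not the actual registration code:

```python
import gymnasium as gym

gym.register(
    id="Isaac-flycrane-payload-decentralized-hovering-v0",
    entry_point="isaaclab.envs:DirectMARLEnv",  # assumed namespace
    disable_env_checker=True,
    kwargs={
        # Placeholders: point the launcher to the environment and agent configurations.
        "env_cfg_entry_point": "<module.path.to.env_cfg>:HoverEnvCfg",
        "skrl_cfg_entry_point": "<module.path.to.agents>:skrl_mappo_cfg.yaml",
    },
)
```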
To train the agent, for example using skrl, you can run the following command from the command line:
python3 scripts/skrl/train.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --num_envs=4096 --seed=-1 --algorithm="MAPPO"
This will start the training for the hover task with the configured settings in the agent configuration file. For more command line interface arguments, check the respective train.py file under scripts/. To wrap the environment and allow for centralized training, simply change the algorithm to 'PPO'.
To play with the learned agent, you can run the play.py script. This will load the latest checkpoint from the logs that have been accumulated during training. For this, execute (for example):
python3 scripts/skrl/play.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --video --video_length=2000 --algorithm="MAPPO" --control_mode="ACCBR" --save_plots --checkpoint=$(PATH_TO_PT_FILE)
To gather data and plot the results of the played episode, add the --save_plots flag. This will plot several statistics against time, such as task-specific metrics and the payload and drone states.
We have a pre-commit template to automatically format your code. To install pre-commit:
pip install pre-commit
Then you can run pre-commit with:
pre-commit run --all-files