
Decentralized Aerial Manipulation of a Cable-Suspended Load using Multi-Agent Reinforcement Learning


Author - Jack Zeng | Paper

Demo videos: partially_sim.mp4 · onboard_flight.mp4 · failure_new.mp4

Overview

This repository is an NVIDIA Isaac Lab extension that contains the environment and algorithms to control a multi-drone transport system (flycrane).

Using DirectMARLEnv, only the hover task has been implemented. The partial_obs flag allows training the policies with only local observations (own state and ID, load pose and goal pose). DirectMARLEnvs allow for decentralized algorithms such as MAPPO, but can also be wrapped to allow for centralized training using PPO. This environment has been used to train the agents for the CoRL paper.
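For concreteness, a minimal sketch of how such a local observation could be assembled for one agent; the function and argument names are assumptions for illustration, not the repository's code:

import torch

def local_observation(drone_state, agent_id, num_agents, load_pose, goal_pose):
    # One-hot encoding of the agent's ID so otherwise identical agents can be distinguished.
    one_hot_id = torch.zeros(num_agents)
    one_hot_id[agent_id] = 1.0
    # Own state + ID + load pose + goal pose, matching the partial_obs description above.
    return torch.cat([drone_state, one_hot_id, load_pose, goal_pose], dim=-1)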

Using ManagerBasedRLEnv, the following tasks have been implemented (although they are deprecated), allowing for centralized control:

  • hover_llc/hover The flycrane gets a reference pose at which the payload should hover. This is done either with a differential flatness based controller (DFBC) or end-to-end, which in this project means going directly from the simulation states to the forces and torques on the quadrotors. All tasks hereafter include the DFBC to minimize the sim2real gap.

  • track Track a reference trajectory generated by an external script. Different trajectories in figure-8 or ellipsoid shapes are generated for the payload.

  • FlyThrough Make the payload fly through a gap between two walls while avoiding collisions.

I recommend not developing the ManagerBasedRLEnvs further, since centralized control can also be achieved by wrapping DirectMARLEnvs. They can still be used for inspiration, however.

Moreover, a test environment (without an agent) for a single drone is also available in single_falcon.

Installation

  • Install the skrl library (located in the skrl folder) in editable mode:

cd skrl
pip install -e .

  • Using a Python interpreter that has Isaac Lab installed, install the extension library:

cd exts/MARL_mav_carry_ext
python -m pip install -e .
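As a quick sanity check (assuming the Python package keeps the name of the extension folder), the installation can be verified with:

python -c "import MARL_mav_carry_ext"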

Assets

The assets used for the tasks are under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets. The assets folder contains $(ROBOT).py files, which hold the configuration of the respective robot in the form of an ArticulationCfg (Isaac Lab's actuated-robot configuration class).
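For illustration, a minimal sketch of what such a $(ROBOT).py configuration may look like; the module paths, USD path and actuator naming below are assumptions and depend on the installed Isaac Lab version (older releases use the omni.isaac.lab namespace):

import isaaclab.sim as sim_utils
from isaaclab.actuators import ImplicitActuatorCfg
from isaaclab.assets import ArticulationCfg

FALCON_CFG = ArticulationCfg(
    # USD file exported from the URDF in assets/data/AMR/$(ROBOT)_data
    spawn=sim_utils.UsdFileCfg(usd_path="<path-to-robot>.usd"),
    init_state=ArticulationCfg.InitialStateCfg(pos=(0.0, 0.0, 1.0)),
    actuators={
        "rotors": ImplicitActuatorCfg(
            joint_names_expr=[".*rotor.*"],  # hypothetical joint-name pattern
            stiffness=0.0,
            damping=0.0,
        ),
    },
)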

Then, in the exts/MARL_mav_carry_ext/MARL_mav_carry_ext/assets/data/AMR folder, the robot URDF and corresponding meshes are located in the $(ROBOT)_data folder and the USD files can be found in the $(ROBOT) folder.

Controllers

The controllers are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/controllers. The differential flatness based controller (DFBC) can be found in geometric.py. The incremental nonlinear dynamic inversion (INDI) controller can be found in indi.py.
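As background, a rough sketch of the idea behind such a flatness-based position loop; this is not the code in geometric.py, and the gains, frames and tensor shapes are assumptions:

import torch

def dfbc_thrust(pos, vel, pos_ref, vel_ref, acc_ref, mass, kp=6.0, kd=4.0, g=9.81):
    # PD feedback on the flat output (position) plus feed-forward acceleration
    # and gravity compensation.
    acc_des = kp * (pos_ref - pos) + kd * (vel_ref - vel) + acc_ref + torch.tensor([0.0, 0.0, g])
    thrust_vec = mass * acc_des                  # desired force in the world frame
    collective_thrust = torch.linalg.norm(thrust_vec)
    b3_des = thrust_vec / collective_thrust      # desired body z-axis (attitude reference)
    return collective_thrust, b3_des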

Plotting Tools

The plotting tools are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/plotting_tools. Plotting is supported for both ManagerBasedRLEnv and DirectMARLEnv.

Environments/Tasks

The environments for each specific flycrane task can be found under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/MARL_mav_carry.

Moreover, tasks for a single falcon drone are implemented in exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/single_falcon. This environment was only used to test the controller.

Environment structure

The configuration class for each task can be found in its respective folder. Following the Isaac Lab structure, the environment is implemented as a ManagerBasedRLEnv.

The manager-based environment consists of multiple modules, and their configs can be found in the $(TASK)_env_cfg.py file:

  • Scene The scene that describes the environment. This describes all prims present in the environment such as ground plane, lights, robots and sensors. Also describes obstacles if there are any in the environment.

  • CommandManager Generates a new pose command as a reference for the payload and resamples it at certain time intervals. In the case of a trajectory, it samples a new trajectory and passes the 4 (configurable) future points as observations.

  • ActionManager Processes and sends the actions to the simulation. For the end-to-end case, this class only has 1 ActionTerm that purely reshapes and clamps the forces and torques applied to the body. The RL policy directly learns the forces and torques on the drone bodies (collective thrust and 3 torques). When using a lower-level controller, the policy output is passed to the controller and mapped to rotor forces here.

  • ObservationManager The observations available to the policy. Currently, the problem is handled as a centralized problem and contains the following observations:

    • Payload state (positions, orientations, linear/angular velocities) in environment frame
    • Drone states (positions, orientations, linear/angular velocities) in environment frame
    • Payload pose errors to the goal
    • Payload twist errors to the goal (for trajectory tracking)
  • EventManager Describes what happens on certain events such as startup, reset or at certain time intervals. Right now, when reset is called, velocities and external forces/torques on all bodies are set to 0. The pose of the flycrane is sampled from a uniform distribution to randomize the initial state.

  • RewardManager Implements the reward function, consisting of the following terms (an illustrative sketch of one such term is given after this list):

    • reward_pose Reward for tracking the payload pose
    • reward_twist Reward for tracking the payload twist (for trajectory tracking)
    • reward_force Reward to keep the effort small (rotor forces)
    • reward_policy_action_smoothness Keep the change in policy output with respect to the previous timestep small
    • reward_body_rates Reward to keep the commanded body rates small
    • reward_downwash Reward for keeping the wake of the drones away from the payload
  • TerminationsManager Terminates the episode of an environment when a termination condition has been met. The implemented termination terms are:

    • time_out Time out after max episode length
    • falcon_fly_low Terminate when drones fly too low
    • payload_fly_low Terminate when the payload flies too low
    • illegal_contact Terminate when forces between bodies get too large
    • payload_cable_angle Terminate when the angle between the cables and the payload gets too large
    • drone_cable_angle Terminate when the angle between the cables and the drones gets too large
    • large_states Terminate when any states of any body are too large
    • bounding_box Terminate when the articulation goes outside of a specified area
    • cables_collide Terminate when the cables collide
    • drones_collide Terminate when the drones get too close to each other
    • target_too_far Terminate when the pose target is too far (for trajectory tracking)
  • CurriculumManager Manages different learning stages in order to increase the difficulty of the task over time (curriculum learning). This is not implemented.
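For illustration, a minimal sketch of what a pose-tracking reward term could look like; this is not the repository's implementation, and the kernel shape and scale are assumptions:

import torch

def reward_pose(payload_pos, payload_quat, goal_pos, goal_quat, sigma=0.8):
    # Exponential kernel on the position error: 1 at the goal, decaying with distance.
    pos_err = torch.linalg.norm(goal_pos - payload_pos, dim=-1)
    # Quaternion alignment: |<q_goal, q_payload>| is 1 when the orientations match.
    quat_align = torch.abs(torch.sum(goal_quat * payload_quat, dim=-1))
    return torch.exp(-pos_err / sigma) * quat_align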

The DirectMARLEnv case implements all of this functionality directly in a single class and is more efficient. The functionality is the same for the most part. It can be found under exts/MARL_mav_carry_ext/MARL_mav_carry_ext/tasks/directMARL.

Training and playing

Isaac Lab offers different wrappers for different RL libraries to make it easy to switch between libraries. The scripts for the corresponding libraries are implemented in scripts. The usable libraries are rsl_rl and skrl.

Agents

The agent configurations for the flycrane are in the respective task's config/flycrane/agents folder. The environments are registered as gym environments, and the parameters of the agents can be changed here.
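For reference, the registration typically follows the standard Isaac Lab gym pattern; the entry-point strings and config names below are assumptions, not the repository's exact values:

import gymnasium as gym

gym.register(
    id="Isaac-flycrane-payload-decentralized-hovering-v0",
    entry_point="isaaclab.envs:DirectMARLEnv",  # assumed entry point
    kwargs={
        # hypothetical entry points to the environment and agent configurations
        "env_cfg_entry_point": "MARL_mav_carry_ext.tasks.directMARL:FlycraneHoverEnvCfg",
        "skrl_cfg_entry_point": "MARL_mav_carry_ext.tasks.directMARL.agents:skrl_mappo_cfg.yaml",
    },
)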

Training

To train the agent, for example using skrl, you can run the following command from the command line:

python3 scripts/skrl/train.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --num_envs=4096 --seed=-1 --algorithm="MAPPO"

This will start the training for the hover task with the settings configured in the agent configuration file. For more command line interface arguments, check the respective train.py file under scripts/. To wrap the environment and allow for centralized training, simply change the algorithm to 'PPO', as shown below.
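For example, the training command above would then become:

python3 scripts/skrl/train.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --num_envs=4096 --seed=-1 --algorithm="PPO"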

Playing

To play with the learned agent, you can run the play.py script. This will load the latest checkpoint from the logs that have been accumulated during training. For this, execute (for example):

python3 scripts/skrl/play.py --task=Isaac-flycrane-payload-decentralized-hovering-v0 --headless --video --video_length=2000 --algorithm="MAPPO" --control_mode="ACCBR" --save_plots --checkpoint=$(PATH_TO_PT_FILE)

To gather data and plot results of the played episode, add the --save_plots flag. This will plot several statistics against time, such as task-specific metrics and the payload and drone states.

Code formatting

We have a pre-commit template to automatically format your code. To install pre-commit:

pip install pre-commit

Then you can run pre-commit with:

pre-commit run --all-files
