This project simulates and trains a single or double pendulum on a rail to balance itself in an inverted position using reinforcement learning. The simulation runs in Gazebo, the learning agents are implemented in PyTorch, and ROS 2 serves as the middleware between them.
- Simulates a single or double pendulum on a rail in Gazebo.
- Uses ROS 2 for interfacing sensor data and controlling the pendulum.
- Reinforcement learning setup using Gymnasium-compatible environments (sketched below).
- Dynamic control, data publishing, and state monitoring nodes.
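For orientation, a Gymnasium-compatible environment exposes `reset` and `step`; the skeleton below sketches the interface that `pendulum_env.py` implements. The spaces and placeholder bodies are illustrative assumptions, not the project's actual implementation:

```python
import gymnasium as gym
import numpy as np


class PendulumRailEnv(gym.Env):
    """Schematic Gymnasium environment for the pendulum-on-a-rail task."""

    def __init__(self):
        # Observation: joint positions and speeds; action: trolley speed command.
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.action_space = gym.spaces.Box(low=-10.0, high=10.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # The real environment resets the Gazebo world through a ROS node.
        return np.zeros(6, dtype=np.float32), {}

    def step(self, action):
        # The real environment publishes the speed command, reads /joint_states,
        # and computes the reward from the resulting state.
        obs = np.zeros(6, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}
```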
This project has been developed under Ubuntu 24.04 LTS. We did not manage to get Gazebo working in WSL.
Requires Python 3.12.
Before running the project, ensure you have the following installed:
- ROS 2 (Jazzy)
- Gazebo (Harmonic)
- the ros_gz bridge
The following instructions work for Ubuntu 24.04 LTS (Noble).
The following instructions are taken from the official ROS 2 installation instructions.
Make sure you have a locale which supports UTF-8. If you are in a minimal environment (such as a Docker container), the locale may be something minimal like POSIX.
```bash
locale  # check for UTF-8
```
You will need to add the ROS 2 apt repository to your system. First ensure that the Ubuntu Universe repository is enabled:
```bash
sudo apt install software-properties-common
sudo add-apt-repository universe
```
Now add the ROS 2 GPG key with apt:
```bash
sudo apt update && sudo apt install curl -y
sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
```
Then add the repository to your sources list:
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/nullUpdate your apt repository caches after setting up the repositories.
```bash
sudo apt update
sudo apt upgrade
```
Desktop install (recommended): ROS, RViz, demos, tutorials:
```bash
sudo apt install ros-jazzy-desktop
```
Add ROS 2 to the path: open this file with a text editor:
```bash
nano ~/.bashrc
```
and add this line at the end:
```bash
source /opt/ros/jazzy/setup.bash
```
Save and close the file, then source it:
```bash
source ~/.bashrc
```
The following instructions are taken from the official Gazebo installation instructions:
```bash
sudo apt-get install curl lsb-release gnupg
sudo curl https://packages.osrfoundation.org/gazebo.gpg --output /usr/share/keyrings/pkgs-osrf-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/pkgs-osrf-archive-keyring.gpg] http://packages.osrfoundation.org/gazebo/ubuntu-stable $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/gazebo-stable.list > /dev/null
sudo apt-get update
sudo apt-get install gz-harmonic
```
Gazebo should now be ready to use, with the `gz sim` app ready to be executed.
The following command will install the correct version of Gazebo and ros_gz for your ROS installation on a Linux system.
```bash
sudo apt-get install ros-jazzy-ros-gz
```
Create a new virtual environment and add all the required packages to it:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
- Clone this repository:
  ```bash
  git clone https://github.com/Aym-brz/projetIA
  cd projetIA
  ```
- Build the workspace:
  ```bash
  colcon build
  ```
- Source the workspace:
  ```bash
  source install/setup.bash
  ```
```
projectroot
├── src/projetIA
│   ├── config/                          # Configuration files
│   │   └── bridge_config.yaml           # Bridge configuration between Gazebo and ROS topics
│   ├── models/                          # SDF models
│   │   ├── default_world.sdf            # Empty environment
│   │   ├── double_pendulum_rail.sdf     # Description of the double pendulum
│   │   ├── simple_pendulum_rail.sdf     # Description of the single pendulum
│   │   └── simple_pendulum_up_rail.sdf  # Single pendulum initialized upwards
│   ├── launch/                          # Launch files for the simulations and the ros_gz bridge
│   │   ├── pendulum.launch.py           # Launch the double pendulum
│   │   ├── simple_pendulum_up.launch.py # Launch the simple pendulum starting upwards
│   │   ├── simple_pendulum.launch.py    # Launch the simple pendulum
│   │   └── test_pendulum.launch.py      # Launch the double pendulum and test the different nodes
│   ├── projetIA/                        # Python library for the project
│   │   ├── eval_policy.py               # Evaluate a trained policy
│   │   ├── main.py                      # Launch the training and the evaluation
│   │   ├── network.py                   # Structure of the neural networks
│   │   ├── state_subscriber.py          # ROS node to read the speeds and positions
│   │   ├── speed_publisher.py           # ROS node to publish the speed of the trolley
│   │   ├── pendulum_env.py              # Gymnasium environment; starts all the required ROS nodes
│   │   ├── train_pendulum_DQN.py        # Training script for DQN
│   │   ├── train_pendulum_reinforce.py  # Training script for the REINFORCE algorithm
│   │   ├── train_pendulum.py            # Training script (not working; REINFORCE implementation)
│   │   └── world_control.py             # ROS node to start, pause and reset the simulation
│   ├── setup.py                         # Setup script for the ROS 2 package
│   └── package.xml                      # ROS 2 package metadata
├── README.md                            # Documentation
└── requirements.txt                     # Python requirements
```
- Launch the Gazebo simulation with the ROS 2 bridge:
  ```bash
  ros2 launch projetIA pendulum.launch.py
  ```
  This will launch the simulation, as well as the Gazebo-ROS bridge.
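  Internally, a launch file of this kind starts Gazebo and the ros_gz parameter bridge. Here is a rough sketch; the paths and arguments are assumptions, see `launch/pendulum.launch.py` for the project's actual file:

  ```python
  from launch import LaunchDescription
  from launch.actions import ExecuteProcess
  from launch_ros.actions import Node


  def generate_launch_description():
      return LaunchDescription([
          # Start the Gazebo simulation with the double-pendulum world.
          ExecuteProcess(
              cmd=['gz', 'sim', 'models/double_pendulum_rail.sdf'],
              output='screen',
          ),
          # Bridge the Gazebo and ROS topics listed in the config file.
          Node(
              package='ros_gz_bridge',
              executable='parameter_bridge',
              parameters=[{'config_file': 'src/projetIA/config/bridge_config.yaml'}],
              output='screen',
          ),
      ])
  ```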
- The simulation can be interacted with manually, to check that everything works correctly:
  - The speed and position of the joints can be retrieved on the ROS `/joint_states` topic:
    - Using a node subscribed to the right topics:
      ```bash
      ros2 run projetIA state_subscriber
      ```
    - Manually:
      ```bash
      ros2 topic echo /joint_states
      ```
  - The velocity of the trolley can be set by publishing a float to the topic `/trolley_speed_cmd`:
    - Using a node publishing to the right topics:
      ```bash
      ros2 run projetIA speed_publisher
      ```
    - Manually:
      ```bash
      ros2 topic pub /trolley_speed_cmd std_msgs/msg/Float64 "data: 4.0"
      ```
  - The simulation can be started, paused, and reset by publishing on the ROS topic `/world/default/control`:
    - Using a node:
      ```bash
      ros2 run projetIA world_control
      ```
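  As an illustration, a minimal rclpy publisher for `/trolley_speed_cmd` could look like the sketch below (illustrative only; see `speed_publisher.py` for the project's actual node):

  ```python
  import rclpy
  from rclpy.node import Node
  from std_msgs.msg import Float64


  class TrolleySpeedPublisher(Node):
      """Publish a constant speed command to the trolley at 10 Hz."""

      def __init__(self):
          super().__init__('trolley_speed_publisher')
          self.pub = self.create_publisher(Float64, '/trolley_speed_cmd', 10)
          self.timer = self.create_timer(0.1, self.publish_speed)

      def publish_speed(self):
          msg = Float64()
          msg.data = 4.0  # same value as the manual example above
          self.pub.publish(msg)


  def main():
      rclpy.init()
      node = TrolleySpeedPublisher()
      rclpy.spin(node)
      node.destroy_node()
      rclpy.shutdown()


  if __name__ == '__main__':
      main()
  ```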
- Training or evaluating a policy can be launched by running the file `src/projetIA/projetIA/main.py` (set the different parameters in this file).
- It is also possible to evaluate a policy by running the file `src/projetIA/projetIA/eval_policy.py`.
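  Conceptually, evaluation rolls the trained policy out in the Gymnasium environment. A hedged sketch of such a loop (the environment class, import path, and loading convention are assumptions, not the project's actual API):

  ```python
  import torch
  from projetIA.pendulum_env import PendulumEnv  # hypothetical import path

  # Load a saved policy (assuming the whole module was serialized).
  policy = torch.load("saved_policies/single_pendulum/DQN/starting_down/policy_DQN_2930.pth")
  policy.eval()

  env = PendulumEnv()
  obs, info = env.reset()
  total_reward, done = 0.0, False
  while not done:
      with torch.no_grad():
          action = policy(torch.as_tensor(obs, dtype=torch.float32))
      obs, reward, terminated, truncated, info = env.step(action.numpy())
      total_reward += reward
      done = terminated or truncated
  print(f"Episode return: {total_reward}")
  ```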
Result of a trained policy:

trained.pendulum.mp4
Stability test, applying a force through the Gazebo UI (recording the screen significantly impacted the simulation performance, forcing us to record indirectly):
pendulum.stability.test.mp4
To replicate these videos, follow these steps:
- Launch a Gazebo simulation with a simple pendulum starting downwards:
  ```bash
  ros2 launch projetIA simple_pendulum.launch.py
  ```
- Run the `eval_policy` script with the following settings in the main function:
  ```python
  double_pendulum = False
  starting_up = False
  max_iter = 10000
  is_DQN = True
  save_path = "saved_policies/single_pendulum/DQN/starting_down/policy_DQN_2930.pth"
  ```
- Apply a force to the pendulum using the "Apply Force Torque" section in Gazebo:
  - Click on the three dots in the top right corner of the Gazebo window to display this section.
  - Click on the pendulum and select "upper_link" from the link list.
  - Set the force magnitude in the Y direction.
  - Clicking the "Apply Force" button applies the force for a short duration.
The pendulum starts in the stable, hanging-down position. The reinforcement learning algorithm encourages the pendulum to reach and maintain an inverted balance through reward-based feedback. No supervised learning is used; instead, the reward function incentivizes minimizing angular deviations.
We tried the following reward functions (a sketch of the final version follows the list):

- First version:
  - Positive terms:
    - Maintaining angles near the upright position for both pendulum links.
    - Maintaining the trolley position near the center of the rail.
  - No penalty for failures.
- Second version:
  - Stability terms:
    - Instability, computed from the angular deviation of the pendulum links from the upright position and the distance of the trolley from the center.
    - Stability, defined as the exponential of the negative instability: it increases the reward near the goal and decreases it away from the goal.
  - Force penalty: the derivative of the speed, which reduces the reward when the speed varies too much.
  - Penalty for failures: the simulation resets after a failure, with an additional penalty if the trolley reaches the border.
- Final version:
  - Positive term: angles close to the upright position for both pendulum links.
  - Negative term: trolley position far from the center.
  - Penalty for failures: the simulation resets after a failure, with an additional penalty if the trolley reaches the border.
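As an illustration, the final version could be sketched as follows. The state variables and coefficients are assumptions, not the project's actual implementation; angles are measured from the upright position:

```python
import numpy as np


def final_reward(theta1: float, theta2: float, x: float, x_max: float) -> tuple[float, bool]:
    """Illustrative final-version reward: reward upright links, penalize
    trolley drift, and add a failure penalty when the border is reached."""
    upright_bonus = np.cos(theta1) + np.cos(theta2)  # maximal when both links are upright
    center_penalty = abs(x) / x_max                  # grows as the trolley leaves the center
    failed = abs(x) >= x_max                         # trolley reached the border
    reward = upright_bonus - center_penalty
    if failed:
        reward -= 10.0  # illustrative failure penalty; the simulation is then reset
    return reward, failed
```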
- Run with a GPU for longer training (see the snippet below).
- DDPG implementation.
- A graphics card is recommended to run the code: with a CPU (Intel i7-8650U), DQN training lasted about 10 hours.
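For the GPU point, the usual PyTorch pattern is to select the device once and move the networks and tensors to it (a generic sketch, not taken from the project's code):

```python
import torch
import torch.nn as nn

# Use the GPU when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

net = nn.Linear(4, 2).to(device)     # any network, moved to the selected device
obs = torch.zeros(4, device=device)  # tensors must live on the same device
values = net(obs)
```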
- ros-gz: https://github.com/gazebosim/ros_gz/tree/ros2/ros_gz_sim_demos
- RL definition: https://www.ibm.com/think/topics/reinforcement-learning
- RL: https://www.sciencedirect.com/science/article/abs/pii/S0952197623017025
- Gym environment documentation: https://www.gymlibrary.dev/api/core/
- PPO: https://github.com/ericyangyu/PPO-for-Beginners
- DQN:
- REINFORCE (and DDPG + DQN): https://github.com/fredrikmagnus/RL-for-Inverted-Pendulum
- DDPG: