This repository provides a custom OpenAI Gymnasium environment for simulating and training reinforcement learning (RL) agents to control a quadcopter. It includes a reward function, environment implementation, and a sample notebook for training with the TD3 algorithm using Stable Baselines3.
- Custom Environment: `QuadcopterEnv` simulates a 12-dimensional quadcopter state and 4-dimensional action space.
- Reward Function: Flexible quadratic reward function for state and action penalties.
- TD3 Training Example: Jupyter notebook (`td3.ipynb`) demonstrates training and evaluation of a TD3 agent.
- Logging: Training logs are saved in the `logs/` directory for analysis.
- `quad_copter.py`: Defines the `QuadcopterEnv` class, a Gymnasium-compatible environment for quadcopter control.
- `reward_func.py`: Contains the `quadcopter_reward` function, a quadratic cost-based reward for RL.
- `td3.ipynb`: Jupyter notebook for training and evaluating a TD3 agent on the custom environment.
- `logs/`: Directory for training logs and monitor files.
- Python 3.8+
- gymnasium
- stable-baselines3
- numpy
- pandas (for log analysis)
Install dependencies:

```sh
pip install gymnasium stable-baselines3 numpy pandas
```

- Custom Environment: Use `QuadcopterEnv` from `quad_copter.py` in your RL experiments.
- Reward Function: Import and use `quadcopter_reward` for custom reward shaping.
- Training: Run the `td3.ipynb` notebook to train and evaluate a TD3 agent.
```python
from quad_copter import QuadcopterEnv

env = QuadcopterEnv()
obs, _ = env.reset()
for _ in range(100):
    action = env.action_space.sample()
    # Gymnasium's step() returns five values: the old single `done`
    # flag is split into `terminated` and `truncated`.
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```

- The environment state is a 12D vector: position, angles, velocities, and angular velocities.
- The action is a 4D vector, typically representing motor commands.
- The reward penalizes deviation from the goal state and large actions.
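A quadratic reward of the kind described can be sketched as follows. This is an illustrative sketch, not the repository's exact `quadcopter_reward`; the identity weight matrices `Q` and `R` are assumptions, and the real function may weight state and action terms differently:

```python
import numpy as np

def quadratic_reward(state, action, goal, Q=None, R=None):
    """Negative quadratic cost: penalize state error and control effort.

    Hypothetical sketch; Q weights the 12D state error and R weights
    the 4D action (motor-command) magnitude.
    """
    Q = np.eye(state.size) if Q is None else Q
    R = np.eye(action.size) if R is None else R
    err = state - goal
    return -(err @ Q @ err + action @ R @ action)

# At the goal state with zero action the reward is maximal (zero);
# any deviation from the goal or any control effort makes it negative.
```

Scaling the diagonals of `Q` and `R` trades off tracking accuracy against control effort, which is the usual knob for reward shaping in this kind of setup.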