
![Demo of Plane environment](demo.gif)

**Plane** is a lightweight yet realistic **reinforcement learning environment** simulating a 2D side view of an Airbus A320-like aircraft.
It’s designed for **fast, end-to-end training on GPU with JAX** while staying **physics-based** and **realistic enough** to capture the core challenges of aircraft control.

Plane allows you to benchmark RL agents on **delays, irrecoverable states, partial observability, and competing objectives**, challenges that are often ignored in standard toy environments.

---

## ✨ Features

* 🏎 **Fast & parallelizable** thanks to JAX: scale to thousands of parallel environments on GPU/TPU.
* 📐 **Physics-based**: dynamics are derived from airplane modeling equations (not arcade physics).
* 🧪 **Reliable**: covered by unit tests to ensure stability and reproducibility.
* 🎯 **Challenging**: captures real-world aviation control problems (momentum, delays, irrecoverable states).
* 🔄 **Compatible with multiple interfaces**: designed to work with JAX-based environments.
* 🌟 **Upcoming features**: environmental perturbations (e.g., wind) will be available in future releases.

---

## 📊 Stable Altitude vs. Power & Pitch

Below is an example of how stable altitude changes with engine power and pitch:

![Stable altitude graph](altitude_vs_power_and_stick.png)

```bash
poetry add plane-env
```

## 🎮 Usage

Here’s a minimal example of running an episode and saving a video:

```python
from plane_env.env_jax import Airplane2D, EnvParams

# Create env
env = Airplane2D()
seed = 42
env_params = EnvParams(max_steps_in_episode=1_000)

# Simple constant policy with 80% power and 0° stick input.
action = (0.8, 0.0)

# Save the video
env.save_video(lambda o: action, seed, folder="videos", episode_index=0, params=env_params, format="gif")
```

Of course, you can also use it directly to train an agent with your favorite RL library (here: Stable-Baselines3):

```python
from plane_env.env_gymnasium import Airplane2D, EnvParams
from stable_baselines3 import SAC

# Create env
env = Airplane2D()

# Model training (adapted from https://stable-baselines3.readthedocs.io/en/master/modules/sac.html)
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000, log_interval=4)
model.save("sac_plane")

del model  # remove to demonstrate saving and loading

model = SAC.load("sac_plane")

obs, info = env.reset()
while True:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break
```
---

## 🛩️ Environment Overview (Reinforcement Learning Perspective)

**State (`EnvState`)**: the aircraft dynamics are described by the following variables:

| Variable          | Description                             |
| ----------------- | --------------------------------------- |
| `x`               | Horizontal position (m)                 |
| `x_dot`           | Horizontal speed (m/s)                  |
| `z`               | Altitude (m)                            |
| `z_dot`           | Vertical speed (m/s)                    |
| `theta`           | Pitch angle (rad)                       |
| `theta_dot`       | Pitch angular velocity (rad/s)          |
| `alpha`           | Angle of attack (rad)                   |
| `gamma`           | Flight path angle (rad)                 |
| `m`               | Aircraft mass (kg)                      |
| `power`           | Normalized engine thrust (0–1)          |
| `stick`           | Control stick input for pitch (–1 to 1) |
| `fuel`            | Remaining fuel (kg)                     |
| `t`               | Current timestep                        |
| `target_altitude` | Desired target altitude (m)             |

The state also provides **derived properties** like air density, Mach number, and speed of sound.

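As a rough sketch of how such derived quantities relate to altitude, the standard ISA troposphere model gives temperature, density, speed of sound, and Mach number as simple functions of `z`. The helpers below are illustrative only; Plane's internal formulas may differ:

```python
import math

# Standard-atmosphere constants (sea level), valid in the troposphere (below ~11 km).
T0, RHO0 = 288.15, 1.225          # temperature (K), density (kg/m^3)
LAPSE, G = 0.0065, 9.80665        # lapse rate (K/m), gravity (m/s^2)
R, GAMMA = 287.05, 1.4            # gas constant for air (J/(kg·K)), heat capacity ratio

def temperature(z):
    """Air temperature (K) at altitude z (m), linear lapse in the troposphere."""
    return T0 - LAPSE * z

def isa_density(z):
    """Air density (kg/m^3) from the barometric formula."""
    return RHO0 * (temperature(z) / T0) ** (G / (LAPSE * R) - 1.0)

def speed_of_sound(z):
    """Speed of sound (m/s): a = sqrt(gamma * R * T)."""
    return math.sqrt(GAMMA * R * temperature(z))

def mach(true_airspeed, z):
    """Mach number = true airspeed / local speed of sound."""
    return true_airspeed / speed_of_sound(z)
```

At sea level this yields a density of 1.225 kg/m³ and a speed of sound of roughly 340 m/s, both decreasing with altitude.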
The agent currently observes the full state except **x** and **t** (which should be irrelevant for control) and **fuel** (which is currently unused).

**Action Space**: Continuous 2D vector `[power_requested, stick_requested]` controlling engine thrust and pitch.
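A raw policy output can be squashed into this action box before stepping the environment. `clip_action` below is a hypothetical helper for illustration (Plane may already clip internally):

```python
def clip_action(power_requested, stick_requested):
    """Clamp a raw action into the valid box: power in [0, 1], stick in [-1, 1].

    Hypothetical helper for illustration; not part of the plane_env package.
    """
    power = min(max(power_requested, 0.0), 1.0)
    stick = min(max(stick_requested, -1.0), 1.0)
    return power, stick
```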

**Reward Function**:

* Encourages maintaining **target altitude**.
* Terminal altitude violations (`z < min_alt` or `z > max_alt`) incur `-max_steps_in_episode`.
* Otherwise, the reward is the squared normalized closeness to the target altitude:

$`r_t = \left( \frac{\text{max\_alt} - |\text{target\_altitude} - z_t|}{\text{max\_alt} - \text{min\_alt}} \right)^2`$

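The reward logic can be sketched in plain Python; the altitude bounds and episode length used here are illustrative placeholders, not Plane's actual defaults:

```python
def reward(z, target_altitude, min_alt=0.0, max_alt=12_000.0, max_steps_in_episode=1_000):
    """Sketch of the per-step reward. Bounds are illustrative, not Plane's defaults."""
    if z < min_alt or z > max_alt:
        # Terminal altitude violation: large negative penalty.
        return -max_steps_in_episode
    # Squared normalized closeness to the target altitude.
    return ((max_alt - abs(target_altitude - z)) / (max_alt - min_alt)) ** 2
```

With `min_alt = 0`, the reward peaks at 1 exactly at the target altitude and decays quadratically as the aircraft drifts away from it.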
**Episode Termination**:

* **Altitude limits exceeded** → terminated
* **Maximum episode length reached** → truncated

**Time step**: `delta_t = 0.5 s`, `max_steps_in_episode = 1,000`.

---

## 🧩 Challenges Modeled

Plane is designed to test RL agents under **realistic aviation challenges**:

* ⏳ **Delay**: engine power changes take time to fully apply.
* 👀 **Partial observability**: some forces cannot be directly measured.
* 🏁 **Competing objectives**: reach the target altitude fast while minimizing fuel and overshoot.
* 🌀 **Momentum effects**: control inputs show delayed impact due to physical inertia.
* ⚠️ **Irrecoverable states**: certain trajectories inevitably lead to failure (crash).

> Environmental perturbations (wind, turbulence) are coming in a future release.
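The delay challenge, for example, behaves like a first-order lag: effective thrust chases the requested setting instead of jumping to it. The sketch below is a generic illustration with a hypothetical engine time constant `tau`, not Plane's actual actuator model:

```python
def step_power(power, power_requested, delta_t=0.5, tau=5.0):
    """First-order lag: effective power closes a fraction of the gap each step.

    tau (s) is a hypothetical engine time constant, not Plane's actual value.
    """
    return power + (power_requested - power) * (delta_t / tau)

# The agent commands 80% power from a cold start; thrust builds up gradually.
power = 0.0
history = []
for _ in range(20):  # 20 steps of 0.5 s = 10 s of simulated time
    power = step_power(power, 0.8)
    history.append(power)
```

Even after 10 s of simulated time, the effective power is still noticeably below the commanded 80%, which is why a purely reactive controller tends to overshoot.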

---

## 📦 Roadmap

* [ ] Add perturbations (wind with varying speeds and directions) to model the non-stationarity of the dynamics.
* [ ] Add an easier interface for creating partially observable versions of the environment.
* [ ] Provide ready-to-use benchmark results for popular RL baselines.
* [ ] Add fuel consumption.

---

## 🤝 Contributing

Contributions are welcome!
Please open an issue or PR if you have suggestions, bug reports, or feature requests.

---

## 📜 License

MIT License – feel free to use it in your own research and projects.