|
15 | 15 | "\n", |
16 | 16 | "This notebook is more an \"example of what works\" rather than a deep dive tutorial.\n", |
17 | 17 | "\n", |
18 | | - "See stable-baselines3.readthedocs.io/ for a more detailed information.\n", |
| 18 | + "See https://docs.ray.io/en/latest/rllib/rllib-env.html#configuring-environments for a more detailed information.\n", |
19 | 19 | "\n", |
20 | | - "This notebook is tested with grid2op 1.10.2 and stable baselines3 version 2.3.2 on an ubuntu 20.04 machine.\n", |
| 20 | + "See also https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html for other details\n", |
21 | 21 | "\n", |
| 22 | + "This notebook is tested with grid2op 1.10.2 and ray 2.9 on an ubuntu 20.04 machine.\n", |
22 | 23 | "\n", |
| 24 | + "- [0 Some tips to get started](#0-some-tips-to-get-started) : is a reminder on what you can do to make things work. Indeed, this notebook explains \"how to use grid2op with stable baselines\" but not \"how to create a working agent able to operate a real powergrid in real time with stable baselines\". We wish we could explain the later...\n", |
| 25 | + "- [1 Create the \"Grid2opEnvWrapper\" class](#1-create-the-grid2openvwraper-class) : explain how to create the main grid2op env class that you can use a \"gymnasium\" environment. \n", |
| 26 | + "- [2 Create an environment, and train a first policy](#2-create-an-environment-and-train-a-first-policy): show how to create an environment from the class above (is pretty easy)\n", |
| 27 | + "- [3 Evaluate the trained agent ](#3-evaluate-the-trained-agent): show how to evaluate the trained \"agent\"\n", |
| 28 | + "- [4 Some customizations](#4-some-customizations): explain how to perform some customization of your agent / environment / policy\n", |
23 | 29 | "## 0 Some tips to get started\n", |
24 | 30 | "\n", |
25 | 31 | "<font color='red'> It is unlikely that \"simply\" using a RL algorithm on a grid2op environment will lead to good results for the vast majority of environments.</font>\n", |
|
62 | 68 | "metadata": {}, |
63 | 69 | "source": [ |
64 | 70 | "\n", |
65 | | - "## 1 Create the \"Grid2opEnv\" class\n", |
| 71 | + "## 1 Create the \"Grid2opEnvWrapper\" class\n", |
66 | 72 | "\n", |
67 | 73 | "In the next cell, we define a custom environment (that will internally use the `GymEnv` grid2op class). It is not strictly needed\n", |
68 | 74 | "\n", |
|
102 | 108 | "source": [ |
103 | 109 | "from gymnasium import Env\n", |
104 | 110 | "from gymnasium.spaces import Discrete, MultiDiscrete, Box\n", |
| 111 | + "import json\n", |
105 | 112 | "\n", |
106 | 113 | "import ray\n", |
107 | 114 | "from ray.rllib.algorithms.ppo import PPOConfig\n", |
108 | 115 | "from ray.rllib.algorithms import ppo\n", |
109 | 116 | "\n", |
110 | 117 | "from typing import Dict, Literal, Any\n", |
| 118 | + "import copy\n", |
111 | 119 | "\n", |
112 | 120 | "import grid2op\n", |
113 | 121 | "from grid2op.gym_compat import GymEnv, BoxGymObsSpace, DiscreteActSpace, BoxGymActSpace, MultiDiscreteActSpace\n", |
|
201 | 209 | " else:\n", |
202 | 210 | " raise NotImplementedError(f\"action type '{act_type}' is not currently supported.\")\n", |
203 | 211 | " \n", |
204 | | - " \n", |
205 | | - " def reset(self, seed, options):\n", |
| 212 | + " def reset(self, seed=None, options=None):\n", |
206 | 213 | " # use default _gym_env (from grid2op.gym_compat module)\n", |
| 214 | + " # NB: here you can also specify \"default options\" when you reset, for example:\n", |
| 215 | + " # - limiting the duration of the episode \"max step\"\n", |
| 216 | + " # - starting at different steps \"init ts\"\n", |
| 217 | + " # - study difficult scenario \"time serie id\"\n", |
| 218 | + " # - specify an initial state of your grid \"init state\"\n", |
207 | 219 | " return self._gym_env.reset(seed=seed, options=options)\n", |
208 | 220 | " \n", |
209 | 221 | " def step(self, action):\n", |
|
216 | 228 | "cell_type": "markdown", |
217 | 229 | "metadata": {}, |
218 | 230 | "source": [ |
219 | | - "Now we init ray, because we need to." |
| 231 | + "## 2 Create an environment, and train a first policy" |
220 | 232 | ] |
221 | 233 | }, |
222 | 234 | { |
223 | | - "cell_type": "code", |
224 | | - "execution_count": null, |
| 235 | + "cell_type": "markdown", |
225 | 236 | "metadata": {}, |
226 | | - "outputs": [], |
227 | 237 | "source": [ |
228 | | - "ray.init()" |
| 238 | + "Now we init ray, because we need to." |
229 | 239 | ] |
230 | 240 | }, |
231 | 241 | { |
232 | | - "cell_type": "markdown", |
| 242 | + "cell_type": "code", |
| 243 | + "execution_count": null, |
233 | 244 | "metadata": {}, |
| 245 | + "outputs": [], |
234 | 246 | "source": [ |
235 | | - "## 2 Make a default environment, and train a PPO agent for one iteration" |
| 247 | + "ray.init()" |
236 | 248 | ] |
237 | 249 | }, |
238 | 250 | { |
|
279 | 291 | "cell_type": "markdown", |
280 | 292 | "metadata": {}, |
281 | 293 | "source": [ |
282 | | - "## 3 Train a PPO agent using 2 \"runners\" to make the rollouts\n", |
| 294 | + "## 3 Evaluate the trained agent\n", |
| 295 | + "\n", |
| 296 | + "This notebook is a simple quick introduction for stable baselines only. So we don't really recall everything that has been said previously.\n", |
| 297 | + "\n", |
| 298 | + "Please consult the section `0) Recommended initial steps` of the notebook [11_IntegrationWithExistingRLFrameworks](./11_IntegrationWithExistingRLFrameworks.ipynb) for more information.\n", |
| 299 | + "\n", |
| 300 | + "**TLD;DR** grid2op offers the possibility to test your agent on scenarios / episodes different from the one it has been trained. We greatly encourage you to use this functionality.\n", |
| 301 | + "\n", |
| 302 | + "There are two main ways to evaluate your agent:\n", |
| 303 | + "\n", |
| 304 | + "- you stay in the \"gymnasium\" world (see [here](#31-staying-in-the-gymnasium-ecosystem) ) and you evaluate your policy directly just like you would any other gymnasium compatible environment. Simple, easy but without support for some grid2op features\n", |
| 305 | + "- you \"get back\" to the \"grid2op\" world (detailed [here](#32-using-the-grid2op-ecosystem)) by \"converting\" your NN policy into something that is able to output grid2op like action. This introduces yet again a \"wrapper\" but you can benefit from all grid2op features, such as the `Runner` to save an inspect what your policy has done.\n", |
| 306 | + "\n", |
| 307 | + "<font color='red'> We show here just a simple examples to \"get easily started\". For much better working agents, you can have a look at l2rpn-baselines code. There you have classes that maps the environment, the agents etc. to grid2op directly (you don't have to copy paste any wrapper).</font> \n", |
| 308 | + "\n", |
| 309 | + "\n", |
| 310 | + "\n", |
| 311 | + "### 3.1 staying in the gymnasium ecosystem\n", |
| 312 | + "\n", |
| 313 | + "You can do pretty much what you want, but you have to do it yourself, or use any of the \"Wrappers\" available in gymnasium https://gymnasium.farama.org/main/api/wrappers/ (*eg* https://gymnasium.farama.org/main/api/wrappers/misc_wrappers/#gymnasium.wrappers.RecordEpisodeStatistics) or in your RL framework.\n", |
| 314 | + "\n", |
| 315 | + "For the sake of simplicity, we show how to do things \"manually\" even though we do not recommend to do it like that." |
| 316 | + ] |
| 317 | + }, |
| 318 | + { |
| 319 | + "cell_type": "code", |
| 320 | + "execution_count": null, |
| 321 | + "metadata": {}, |
| 322 | + "outputs": [], |
| 323 | + "source": [] |
| 324 | + }, |
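| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Below is a minimal sketch of such a \"manual\" evaluation loop. It assumes the `Algorithm` trained in section 2 is available in a variable called `rllib_algo` (adapt this name to whatever you used), that the `Grid2opEnvWrapper` accepts an (empty) configuration dict just like the rllib configs pass it, and it queries the policy with `Algorithm.compute_single_action`.\n" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# minimal sketch: evaluate the trained policy on a few episodes, staying in gymnasium\n", |
| | + "# (assumes `rllib_algo` is the Algorithm trained in section 2)\n", |
| | + "nb_episode_test = 2\n", |
| | + "gym_env = Grid2opEnvWrapper({})  # empty config: same defaults as used for training\n", |
| | + "ep_infos = {}\n", |
| | + "for ep_id in range(nb_episode_test):\n", |
| | + "    obs, info = gym_env.reset(seed=ep_id)\n", |
| | + "    done, cum_reward, nb_step = False, 0., 0\n", |
| | + "    while not done:\n", |
| | + "        act = rllib_algo.compute_single_action(obs, explore=False)\n", |
| | + "        obs, reward, terminated, truncated, info = gym_env.step(act)\n", |
| | + "        cum_reward += float(reward)\n", |
| | + "        nb_step += 1\n", |
| | + "        done = terminated or truncated\n", |
| | + "    ep_infos[ep_id] = {\"steps survived\": nb_step, \"total reward\": cum_reward}\n", |
| | + "ep_infos" |
| | + ] |
| | + }, |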
| 325 | + { |
| 326 | + "cell_type": "markdown", |
| 327 | + "metadata": {}, |
| 328 | + "source": [ |
| 329 | + "### 3.2 using the grid2op environment" |
| 330 | + ] |
| 331 | + }, |
| 332 | + { |
| 333 | + "cell_type": "code", |
| 334 | + "execution_count": null, |
| 335 | + "metadata": {}, |
| 336 | + "outputs": [], |
| 337 | + "source": [] |
| 338 | + }, |
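| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Below is a minimal sketch of such a conversion: a grid2op `BaseAgent` that queries the trained rllib `Algorithm` (again assumed to be stored in a variable called `rllib_algo`) and translates observations / actions with the gym spaces of the `Grid2opEnvWrapper`. It assumes the underlying grid2op environment is reachable through the `init_env` attribute of grid2op's `GymEnv`. Such an agent can then be used with the grid2op `Runner`.\n" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "from grid2op.Agent import BaseAgent\n", |
| | + "from grid2op.Runner import Runner\n", |
| | + "\n", |
| | + "\n", |
| | + "class RLlibToGrid2opAgent(BaseAgent):\n", |
| | + "    \"\"\"Minimal sketch: wraps the trained rllib Algorithm into a grid2op agent.\"\"\"\n", |
| | + "    def __init__(self, gym_env_wrapper, trained_algo):\n", |
| | + "        # gym_env_wrapper: a Grid2opEnvWrapper (gives access to the gym spaces)\n", |
| | + "        # trained_algo: the rllib Algorithm trained in section 2\n", |
| | + "        super().__init__(gym_env_wrapper._gym_env.init_env.action_space)\n", |
| | + "        self._gym_env = gym_env_wrapper._gym_env\n", |
| | + "        self._trained_algo = trained_algo\n", |
| | + "\n", |
| | + "    def act(self, obs, reward, done=False):\n", |
| | + "        # grid2op observation -> gym observation -> NN action -> grid2op action\n", |
| | + "        gym_obs = self._gym_env.observation_space.to_gym(obs)\n", |
| | + "        gym_act = self._trained_algo.compute_single_action(gym_obs, explore=False)\n", |
| | + "        return self._gym_env.action_space.from_gym(gym_act)\n", |
| | + "\n", |
| | + "\n", |
| | + "gym_env = Grid2opEnvWrapper({})  # empty config: same defaults as used for training\n", |
| | + "my_agent = RLlibToGrid2opAgent(gym_env, rllib_algo)\n", |
| | + "\n", |
| | + "# the grid2op Runner can now run (and optionally store) complete episodes with this agent\n", |
| | + "runner = Runner(**gym_env._gym_env.init_env.get_params_for_runner(),\n", |
| | + "                agentClass=None, agentInstance=my_agent)\n", |
| | + "res = runner.run(nb_episode=2)\n", |
| | + "res" |
| | + ] |
| | + }, |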
| 339 | + { |
| 340 | + "cell_type": "markdown", |
| 341 | + "metadata": {}, |
| 342 | + "source": [ |
| 343 | + "## 4 some customizations\n", |
| 344 | + "\n", |
| 345 | + "### 4.1 Train a PPO agent using 2 \"runners\" to make the rollouts\n", |
283 | 346 | "\n", |
284 | 347 | "In this second example, we explain briefly how to train the model using 2 \"processes\". This is, the agent will interact with 2 agents at the same time during the \"rollout\" phases.\n", |
285 | 348 | "\n", |
|
296 | 359 | "\n", |
297 | 360 | "# use multiple runners\n", |
298 | 361 | "config2 = (PPOConfig().training(gamma=0.9, lr=0.01)\n", |
299 | | - " .environment(env=Grid2opEnv, env_config={})\n", |
| 362 | + " .environment(env=Grid2opEnvWrapper, env_config={})\n", |
300 | 363 | " .resources(num_gpus=0)\n", |
301 | 364 | " .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)\n", |
302 | 365 | " .framework(\"tf2\")\n", |
|
326 | 389 | "cell_type": "markdown", |
327 | 390 | "metadata": {}, |
328 | 391 | "source": [ |
329 | | - "## 4 Use non default parameters to make the grid2op environment\n", |
| 392 | + "### 4.2 Use non default parameters to make the grid2op environment\n", |
330 | 393 | "\n", |
331 | 394 | "In this third example, we will train a policy using the \"box\" action space, and on another environment (`l2rpn_idf_2023` instead of `l2rpn_case14_sandbox`)" |
332 | 395 | ] |
|
345 | 408 | " \"act_type\": \"box\",\n", |
346 | 409 | " }\n", |
347 | 410 | "config3 = (PPOConfig().training(gamma=0.9, lr=0.01)\n", |
348 | | - " .environment(env=Grid2opEnv, env_config=env_config)\n", |
| 411 | + " .environment(env=Grid2opEnvWrapper, env_config=env_config)\n", |
349 | 412 | " .resources(num_gpus=0)\n", |
350 | 413 | " .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)\n", |
351 | 414 | " .framework(\"tf2\")\n", |
|
392 | 455 | " \"act_type\": \"multi_discrete\",\n", |
393 | 456 | " }\n", |
394 | 457 | "config4 = (PPOConfig().training(gamma=0.9, lr=0.01)\n", |
395 | | - " .environment(env=Grid2opEnv, env_config=env_config4)\n", |
| 458 | + " .environment(env=Grid2opEnvWrapper, env_config=env_config4)\n", |
396 | 459 | " .resources(num_gpus=0)\n", |
397 | 460 | " .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)\n", |
398 | 461 | " .framework(\"tf2\")\n", |
|
422 | 485 | "cell_type": "markdown", |
423 | 486 | "metadata": {}, |
424 | 487 | "source": [ |
425 | | - "## 5 Customize the policy (number of layers, size of layers etc.)\n", |
| 488 | + "### 4.3 Customize the policy (number of layers, size of layers etc.)\n", |
426 | 489 | "\n", |
427 | 490 | "This notebook does not aim at covering all possibilities offered by ray / rllib. For that you need to refer to the ray / rllib documentation.\n", |
428 | 491 | "\n", |
|
439 | 502 | "\n", |
440 | 503 | "# Use a \"Box\" action space (mainly to use redispatching, curtailment and storage units)\n", |
441 | 504 | "config5 = (PPOConfig().training(gamma=0.9, lr=0.01)\n", |
442 | | - " .environment(env=Grid2opEnv, env_config={})\n", |
| 505 | + " .environment(env=Grid2opEnvWrapper, env_config={})\n", |
443 | 506 | " .resources(num_gpus=0)\n", |
444 | 507 | " .env_runners(num_env_runners=2, num_envs_per_env_runner=1, num_cpus_per_env_runner=1)\n", |
445 | 508 | " .framework(\"tf2\")\n", |
|