Commit caf639e

committed
adding a notebook for stable baselines, need to check it works and clean it now [skip ci]
1 parent 7d5cfc8 commit caf639e

File tree

3 files changed: +474 -19 lines changed


CHANGELOG.rst

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@ Change Log
 
 - TODO A number of max buses per sub
 - TODO in the runner, save multiple times the same sceanrio
-
+- TODO in the gym env, make the action_space and observation_space attribute
+  filled automatically (see ray integration, it's boring to have to copy paste...)
 
 [1.10.3] - 2024-xx-yy
 -------------------------

getting_started/11_ray_integration.ipynb

Lines changed: 62 additions & 18 deletions
@@ -15,26 +15,66 @@
 "\n",
 "This notebook is more an \"example of what works\" rather than a deep dive tutorial.\n",
 "\n",
-"See https://docs.ray.io/en/latest/rllib/rllib-env.html#configuring-environments for a more detailed information.\n",
-"\n",
-"See also https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html for other details\n",
-"\n",
-"This notebook is tested with grid2op 1.10 and ray 2.23 on an ubuntu 20.04 machine.\n",
-"\n",
+"See stable-baselines3.readthedocs.io/ for more detailed information.\n",
+"\n",
+"This notebook is tested with grid2op 1.10.2 and stable baselines3 version 2.3.2 on an ubuntu 20.04 machine.\n",
+"\n",
+"\n",
+"## 0 Some tips to get started\n",
+"\n",
+"<font color='red'> It is unlikely that \"simply\" using an RL algorithm on a grid2op environment will lead to good results for the vast majority of environments.</font>\n",
+"\n",
+"To make RL algorithms work with more or less success you might want to:\n",
+"\n",
+" 1) adjust the observation space: in particular, select the right information for your agent. Too much information\n",
+"    and the size of the observation space will blow up and your agent will not learn anything. Not enough\n",
+"    information and your agent will not be able to capture anything.\n",
+" \n",
+" 2) customize the action space: dealing with both discrete and continuous values is often a challenge. So maybe you\n",
+"    want to focus on only one type of action. In all cases, try to reduce the number of actions your agent\n",
+"    can perform. Indeed, for \"larger\" grids (118 substations; as a reference, the French grid counts more than 6,000\n",
+"    such substations...) and even with only 2 busbars per substation (as a reference, some substations have more\n",
+"    than 12 such \"busbars\"), your agent can choose between more than 60,000 different discrete\n",
+"    actions at each step. This is way too large for current RL algorithms as far as we know (and the proposed\n",
+"    environments are small in comparison to real ones).\n",
+" \n",
+" 3) customize the reward: the default reward might not work great for you. Ultimately, what TSOs or ISOs want is\n",
+"    to operate the grid safely, for as long as possible, at a cost as low as possible. It is of course really hard to\n",
+"    capture all of this in one single reward signal. Customizing the reward is also really important because the \"do\n",
+"    nothing\" policy often leads to really good results (much better than random actions), which makes it hard to\n",
+"    encourage the exploration of different actions. So you kind of want to incentivize your agent to perform some actions at some point.\n",
+" \n",
+" 4) use a fast simulator: even if you target an industrial application with industry-grade simulators, we would still\n",
+"    advise you to use (at the early stages of training at least) a fast simulator for the vast majority of the training\n",
+"    process and then maybe fine tune on a better one.\n",
+" \n",
+" 5) combine RL with some heuristics: it's super easy to implement things like \"if there is no issue, then do\n",
+"    nothing\". This can be quite time consuming to learn though. Don't hesitate to check out the \"l2rpn-baselines\"\n",
+"    repository for already \"kind of working\" heuristics.\n",
+" \n",
+"And finally, don't hesitate to check the solutions proposed by winners of past l2rpn competitions in l2rpn-baselines.\n",
+"\n",
+"You can also ask questions on our discord or on our github."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
 "\n",
 "## 1 Create the \"Grid2opEnv\" class\n",
 "\n",
-"In the next cell, we define a custom environment (that will internally use the `GymEnv` grid2op class) that is needed for ray / rllib.\n",
+"In the next cell, we define a custom environment (that will internally use the `GymEnv` grid2op class). It is not strictly needed.\n",
 "\n",
 "Indeed, in order to work with ray / rllib you need to define a custom wrapper on top of the GymEnv wrapper. You then have:\n",
 "\n",
 "- self._g2op_env which is the default grid2op environment, receiving grid2op Action and producing grid2op Observation.\n",
 "- self._gym_env which is a the grid2op defined `gymnasium Environment` that cannot be directly used with ray / rllib\n",
-"- `Grid2opEnv` which is a the wrapper on top of `self._gym_env` to make it usable with ray / rllib.\n",
+"- `Grid2opEnvWrapper` which is the wrapper on top of `self._gym_env` to make it usable with ray / rllib.\n",
 "\n",
-"Ray / rllib expects the gymnasium environment to inherit from `gymnasium.Env` and to be initialized with a given configuration. This is why you need to create the `Grid2opEnv` wrapper on top of `GymEnv`.\n",
+"Ray / rllib expects the gymnasium environment to inherit from `gymnasium.Env` and to be initialized with a given configuration. This is why you need to create the `Grid2opEnvWrapper` wrapper on top of `GymEnv`.\n",
 "\n",
-"In the initialization of `Grid2opEnv`, the `env_config` variable is a dictionary that can take as key-word arguments:\n",
+"In the initialization of `Grid2opEnvWrapper`, the `env_config` variable is a dictionary that can take as keyword arguments:\n",
 "\n",
 "- `backend_cls` : what is the class of the backend. If not provided, it will use `LightSimBackend` from the `lightsim2grid` package\n",
 "- `backend_options`: what options will be used to create the backend for your environment. Your backend will be created by calling\n",
@@ -74,7 +114,7 @@
 "from lightsim2grid import LightSimBackend\n",
 "\n",
 "\n",
-"class Grid2opEnv(Env):\n",
+"class Grid2opEnvWrapper(Env):\n",
 " def __init__(self,\n",
 " env_config: Dict[Literal[\"backend_cls\",\n",
 " \"backend_options\",\n",
@@ -83,7 +123,7 @@
 " \"obs_attr_to_keep\",\n",
 " \"act_type\",\n",
 " \"act_attr_to_keep\"],\n",
-" Any]):\n",
+" Any]= None):\n",
 " super().__init__()\n",
 " if env_config is None:\n",
 " env_config = {}\n",
@@ -207,7 +247,7 @@
 "# Construct a generic config object, specifying values within different\n",
 "# sub-categories, e.g. \"training\".\n",
 "config = (PPOConfig().training(gamma=0.9, lr=0.01)\n",
-" .environment(env=Grid2opEnv, env_config={})\n",
+" .environment(env=Grid2opEnvWrapper, env_config={})\n",
 " .resources(num_gpus=0)\n",
 " .env_runners(num_env_runners=0)\n",
 " .framework(\"tf2\")\n",
@@ -239,7 +279,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## 3 Train a PPO agent using 2 \"runners\" to make the rollouts"
+"## 3 Train a PPO agent using 2 \"runners\" to make the rollouts\n",
+"\n",
+"In this second example, we explain briefly how to train the model using 2 \"processes\". That is, the agent will interact with 2 copies of the environment at the same time during the \"rollout\" phases.\n",
+"\n",
+"But everything related to the training of the agent is still done on the main process (and in this case using only a CPU, not a GPU)."
 ]
 },
 {
@@ -250,7 +294,7 @@
 "source": [
 "# see https://docs.ray.io/en/latest/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.html\n",
 "\n",
-"# use multiple use multiple runners\n",
+"# use multiple runners\n",
 "config2 = (PPOConfig().training(gamma=0.9, lr=0.01)\n",
 " .environment(env=Grid2opEnv, env_config={})\n",
 " .resources(num_gpus=0)\n",
@@ -282,9 +326,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## 4 Use non default parameters to make the l2rpn environment\n",
+"## 4 Use non default parameters to make the grid2op environment\n",
 "\n",
-"In this first example, we will train a policy using the \"box\" action space."
+"In this third example, we will train a policy using the \"box\" action space, and on another environment (`l2rpn_idf_2023` instead of `l2rpn_case14_sandbox`)."
 ]
 },
 {
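
Section 4 switches to a continuous ("box") action space on `l2rpn_idf_2023`. A hedged sketch of what such an `env_config` could look like, reusing the keys listed earlier in the notebook, is given below; the attribute lists and the `env_name` key are illustrative assumptions.

    # Hedged sketch for section 4: continuous ("box") actions on l2rpn_idf_2023.
    # The attribute lists and the "env_name" key are illustrative assumptions.
    env_config3 = {
        "env_name": "l2rpn_idf_2023",                    # assumed key (see the wrapper sketch above)
        "obs_attr_to_keep": ["rho", "gen_p", "load_p"],  # keep a small observation vector
        "act_type": "box",                               # continuous actions only
        "act_attr_to_keep": ["redispatch", "curtail"],   # continuous grid2op action attributes
    }

    config3 = (PPOConfig().training(gamma=0.9, lr=0.01)
               .environment(env=Grid2opEnvWrapper, env_config=env_config3)
               .resources(num_gpus=0)
               .env_runners(num_env_runners=0)
               .framework("tf2"))
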
@@ -441,7 +485,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.13"
+"version": "3.8.10"
 }
 },
 "nbformat": 4,
