Version 0.9.9: Gifs, Better Memory Management, Save-able DataBunches (#14)
* Init Branch
* Added:
- Python 3.7 testing
- parallel matrix testing
- k_partitions_top. You can now divide your runs into k partitions, e.g. k=3: if you run for 100 epochs, MDPDataBunch will store the best episode from the first third, second third, and final third, giving you a perspective on how the agent evolved (see the sketch after this list).
- test for k_partitions_top and k_top
- Gif plotting for AgentInterpretation
- Gif writing for AgentInterpretation
- pickle tests
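
A minimal sketch of the idea behind k_partitions_top (the function signature and data layout here are illustrative, not the actual MDPDataBunch API):

```python
import math

# Illustrative sketch only -- not the actual MDPDataBunch API.
# Split recorded episodes into k contiguous partitions and keep the
# best-reward episode from each partition.
def k_partitions_top(episode_rewards, k=3):
    """episode_rewards: list of (episode_index, total_reward) pairs."""
    n = len(episode_rewards)
    size = math.ceil(n / k)  # partition size, e.g. 100 episodes, k=3 -> 34
    return [max(episode_rewards[i:i + size], key=lambda p: p[1])
            for i in range(0, n, size)]

# e.g. the best episode from each third of a 100-episode run
print(k_partitions_top([(i, float((i * 7) % 23)) for i in range(100)]))
```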
* Fixed:
- plotting failure / bug with the data block *sometimes* double resetting the most recent episode.
- gifs register in jupyter notebooks
* Added:
- auto gif generating code to tests.
* Fixed:
- tests probably crashing because moviepy is not installed. Not sure if we want to deal with moviepy dependencies in Azure.
- data block not loading pybullet wrapper
- the trained_learner method in tests was somehow completely broken
- gif generation crashing due to out of bounds
* Changed:
- migrated layers to layers.py. Should make imports less crazy.
Removed:
- agents_base.py because all the functionality is in agent_core.py or layers.py
* Added:
- .idea dictionaries to snowball common words.
- codeStyle project specification
- ddpg tests
- metric tests
- databunch save / load testing
- learner import / export compatibility
- resolution wrapper. Reduces the render resolution of an env by a step size (see the sketch below).
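
A hedged sketch of what such a resolution-reducing wrapper can look like (class and parameter names are illustrative, not the actual fastrl wrapper):

```python
import gym

class ReduceResolutionWrapper(gym.Wrapper):
    """Illustrative sketch: downsample rendered frames by a step size."""
    def __init__(self, env, step=2):
        super().__init__(env)
        self.step = step

    def render(self, mode='rgb_array', **kwargs):
        frame = self.env.render(mode=mode, **kwargs)
        if mode == 'rgb_array' and frame is not None:
            return frame[::self.step, ::self.step]  # keep every step-th pixel
        return frame
```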
Fixed:
- gif out of bounds issue
- DDPG was somewhat broken. Added a tanh activation to the Actor module. For the critic, if conv is not specified, then a single fc layer is added for the state input only. Previously this was fine with Pendulum; however, Ant would just fail badly with a highly negative reward. Ant penalizes high-value actions, so passing actions through a tanh activation function should fix that (see the actor sketch after this list). The single Linear layer should make the DDPG more stable.
- resolution reduction in ddpg tests
- some of the pybullet env closing issues. You can now specify having the env close after fit (the default is not to close, so that you can easily execute fit multiple times). Although bulletphysics/bullet3#2470 claims this is fixed, at least for me it is still an issue. We fixed it by calling gc.collect() after close to force an immediate cleanup of resources (see the close sketch further below).
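
A minimal sketch of the actor-side fix, assuming a plain fully-connected actor (this is not the exact fastrl module, just the shape of the change):

```python
import torch.nn as nn

class Actor(nn.Module):
    """Sketch: tanh bounds actions to [-1, 1], so envs like Ant that
    penalize large action magnitudes never see extreme values."""
    def __init__(self, state_size, action_size, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_size, hidden), nn.ReLU(),
            nn.Linear(hidden, action_size),
            nn.Tanh(),  # the fix: squash the action output
        )

    def forward(self, state):
        return self.body(state)
```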
* Added:
- frame skipping for gif generation to reduce gif size (sketched below).
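
The frame skipping itself is just subsampling; a tiny sketch (function name illustrative):

```python
def skip_frames(frames, k=2):
    """Keep every k-th frame, shrinking the gif roughly k-fold."""
    return frames[::k]
```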
* Fixed:
- gif generation warning
- more pybullet env crashing issues. It turns out the "close" method call does not actually close the pybullet env. This means that the env will close later when garbage collection runs (possibly disrupting future training). It seems more and more that we need a special close method for every physics engine type, since some don't react to the regular close while others shut down the entire engine permanently! A sketch of the idea follows this list.
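
A hedged sketch of the direction this points in (safe_close is a hypothetical helper, not current fastrl API):

```python
import gc

def safe_close(env):
    """Hypothetical helper: close an env with engine-aware handling."""
    env.close()  # for pybullet-backed envs this may not free the engine
    if 'Bullet' in type(env.unwrapped).__name__:
        gc.collect()  # force immediate cleanup of lingering engine state
```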
Changed:
- Simplifying the ddpg tests
Added:
- RL abbreviations
Notes:
- I am starting to move to making the code follow the fastai style guidelines. I think this will be a continuous progression after v1.0
* Fixed:
- gif generation warning
- random test failure
* Added:
- prototype roadmap
- python-xlib, due to Python 3.7 breaking for some reason...
* Updated:
- convolutional layers now make more sense and reduce the image space
* Changed:
- tests passed
- removed Python 3.7 testing for now due to openai gym not working well with it.
* Added:
- a docker file for fastrl
- trying a simple image test
* Fixed:
- the maze gif env took too much RAM; reduced the image size
* Changed:
- trigger in the yml file is removed so that the pipelines cloud can orchestrate when / how everything will be executed
* Added:
- deployment staging
* Update azure-pipelines.yml for Azure Pipelines
* Added:
- some conditions that might allow this to be multistage but idk lol
* Notes:
- testing twine autobuild
* Update azure-pipelines.yml for Azure Pipelines
* Notes:
- fixed Docker file
- need to reduce GIF size somehow
* Added:
- gifs
- gif processing notebook
* Added:
- initial new README without graphics and stuff
## Roadmap
- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
- [X] 0.8.0 Comprehensive model evaluation **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
- [X] 0.9.0 Notebook demonstrations of basic model usage.
- [ ] **Working on** **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At this point, all models should have guaranteed environments they should succeed in.
- [ ] 1.1.0 More Traditional RL models
    - [ ] Add PPO
    - [ ] Add TRPO
    - [ ] Add D4PG
    - [ ] Add A2C
    - [ ] Add A3C
- [ ] 1.2.0 HRL models *Possibly might change version to 2.0 depending on SMDP issues*
    - [ ] Add SMDP
    - [ ] Add goal-oriented MDPs. Will require a new "Step"
    - [ ] Add FeUdal Network
    - [ ] Add storage-based DataBunch memory management. This can prevent RAM from being used up by episode image frames that may or may not serve any use to the agent, but only for logging.
- [ ] 1.3.0
    - [ ] Add HAC
    - [ ] Add MAXQ
    - [ ] Add HIRO
- [ ] 1.4.0
    - [ ] Add h-DQN
    - [ ] Add Modulated Policy Hierarchies
    - [ ] Add Meta-Learning Shared Hierarchies
- [ ] 1.5.0
    - [ ] Add STRategic Attentive Writer (STRAW)
    - [ ] Add H-DRLN
    - [ ] Add Abstract Markov Decision Process (AMDP)
    - [ ] Add conda integration so that installation can be truly one step.
- [ ] 1.6.0 HRL Options models *Possibly will already be implemented in a previous model*
    - [ ] Options augmentation to DQN based models
    - [ ] Options augmentation to actor critic models
    - [ ] Options augmentation to async actor critic models
- [ ] 1.8.0 HRL Skills
    - [ ] Skills augmentation to DQN based models
    - [ ] Skills augmentation to actor critic models
    - [ ] Skills augmentation to async actor critic models
- [ ] 1.9.0
- [ ] 2.0.0 Add PyBullet Fetch Environments
    - [ ] 2.0.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
    - [ ] 2.0.0 Add HER
## Code
One of the key takeaways from fastai is its use of callbacks. Not only do callbacks allow for logging, but adding a callback to a generic fit function can change its behavior drastically. My goal is to have a library that is as easy as possible to run on a server or on one's own computer. We are also interested in this being easy to extend.
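
A minimal sketch of the callback idea (not fastai's actual Callback API; names are illustrative):

```python
class Callback:
    """Base callback: hooks into the generic fit loop."""
    def on_epoch_begin(self, epoch): pass
    def on_epoch_end(self, epoch): return False  # True -> stop training

class EarlyStop(Callback):
    """Example: a callback that drastically changes fit's behavior."""
    def __init__(self, max_epochs): self.max_epochs = max_epochs
    def on_epoch_end(self, epoch): return epoch + 1 >= self.max_epochs

def fit(epochs, callbacks=()):
    for epoch in range(epochs):
        for cb in callbacks: cb.on_epoch_begin(epoch)
        # ... one epoch of training would run here ...
        if any(cb.on_epoch_end(epoch) for cb in callbacks):
            return  # a callback requested an early stop

fit(100, callbacks=[EarlyStop(max_epochs=3)])  # stops after 3 epochs
```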
We have a few assumptions that I believe the code / support algorithms should adhere to:
- Environments should be pickle-able and serializable. They should be able to shut down and start up multiple times during run time.
- Agents should not need more information than images or state values for an environment per step. This means that environments should not be expected to allow output of contact points, sub-goals, or STRIPS-style logical outputs.

Rationale:
- Shutdown / Startup: Some environments (pybullet) have the issue of shutting down and starting different environments. Luckily, we have a fork of pybullet, so these modifications will be forced.
- Pickling: Being able to encapsulate an environment as a `.pkl` can be important for saving it and all the information it generated.
- Serializable: If we want to do parallel processing, environments need to be serializable to transport them between those processes (see the sketch below).

Some extra assumptions:
- Environments can either be goal-less or have a single goal, which OpenAI defines as `Env` and `GoalEnv` respectively.

These assumptions are necessary for us to implement other envs from other repos. We do not want to be tied to just OpenAI gyms.
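
A sketch of what the pickling assumption demands in practice (CartPole is used purely as an example env):

```python
import pickle
import gym

# An env should survive a pickle round-trip so it can be saved with the
# data it generated, or shipped to a worker process for parallelism.
env = gym.make('CartPole-v1')
env.reset()
restored = pickle.loads(pickle.dumps(env))  # must not raise
restored.reset()
```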
## Versioning
At present, the repo is in its alpha stage. We plan to move this from alpha to a pseudo-beta / working version. Regardless of version, we will follow Python-style versioning.

_Alpha versions_: #.#.#, e.g. 0.1.0. Alpha will never go above 0.99.99; at that point it will become full version 1.0.0. A key point is that during alpha, coding will be quick and dirty with no promise of proper deprecation.

_Beta / full versions_: These will be greater than 1.0.0. We follow the Python method of versioning: **[Breaking Changes]**.**[Backward Compatible Features]**.**[Bug Fixes]**. Features are additions such as new functions, tools, models, and env support. Proper deprecation will also be used.

_Pip update frequency_: We have a pip repository, however we do not plan to update it frequently at the moment. During beta / full versions, we might update pip every 0.5.0 versions.

## Contributing
Follow the templates we have on GitHub. Make a branch either from master or the most recent version branch. We recommend squashing commits / keeping pointless ones to a minimum.
## Contribution
Following fastai's guidelines would be desirable: [Guidelines](https://github.com/fastai/fastai/blob/master/README.md#contribution-guidelines)

We hope that model additions can be added smoothly. All models will only be dependent on `core.layers.py`. As time goes on, the model architecture will improve overall (we are, and will continue to be, figuring things out).
## Style
Since fastai uses a different style from traditional PEP 8, we will be following fastai's [Style](https://docs.fast.ai/dev/style.html) and [Abbreviations](https://docs.fast.ai/dev/abbr.html) guides rather than the [Google Python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#3164-guidelines-derived-from-guidos-recommendations). We will also use RL-specific abbreviations.
0 commit comments