
Commit 6be8c8c

Version 0.9.9: Gifs, Better Memory Management, Save-able DataBunches (#14)
* Init Branch

* Added:
  - Python 3.7 testing
  - parallel matrix testing
  - k_partitions_top. You can now divide your runs, for example k=3. This means if you run for 100 epochs, MDPDataBunch will store the best episode in the first third, second third, and final third, giving you a perspective on how the agent evolved.
  - test for k_partitions_top and k_top
  - Gif plotting for AgentInterpretation
  - Gif writing for AgentInterpretation
  - pickle tests

* Fixed:
  - plotting failure / bug with data block *sometimes* double resetting the most recent episode.
  - gifs register in jupyter notebooks

* Added:
  - auto gif generating code to tests.

* Fixed:
  - tests probably crashing because moviepy is not installed. Not sure if we want to deal with moviepy dependencies in Azure.
  - data block not loading pybullet wrapper
  - somehow trained_learner method in test was completely broken
  - gif generation crashing due to out of bounds

* Changed:
  - migrated layers to layers.py. Should make imports less crazy.

  Removed:
  - agents_base.py because all the functionality is in agent_core.py or layers.py

* Added:
  - .idea dictionaries to snowball common words.
  - codeStyle project specification
  - ddpg tests
  - metric tests
  - databunch save / load testing
  - learner import / export compatibility
  - resolution wrapper. Reduces the render resolution of an env by step size.
  - gif out of bounds issue
  - DDPG was somewhat broken. Added tanh activation to the Actor module. For the Critic, if conv is not specified, then a single fc layer is added for only the state input. Previously this was fine with Pendulum; however Ant would just fail badly / highly negative reward. Now I understand that Ant penalizes high-value actions, so passing actions through a tanh activation function should fix that. The single Linear layer should make the DDPG more stable.
  - resolution reduction in ddpg tests
  - some of the pybullet env closing issues. You can now specify to have the env close after fit (the default is not to close, so that you can easily execute fit multiple times). Though bulletphysics/bullet3#2470 claims this is fixed, at least for me this is still an issue. We fixed this by calling gc.collect() after close to force an immediate clean-up of resources.

* Added:
  - frame skipping for gif generation to reduce gif size.

* Fixed:
  - gif generation warning
  - More pybullet env crashing issues. Turns out the "close" method call does not actually close the pybullet env. This means that the env will close later when gc is being performed (possibly screwing up future training). It seems more and more that we need to have a special close method for every physics engine type, since some don't react to the regular close, but others actually shut down the entire engine permanently!

  Changed:
  - Simplifying the ddpg tests

  Added:
  - RL abbreviations

  Notes:
  - I am starting to move to making the code follow the fastai style guidelines. I think this will be a continuous progression after v1.0

* Fixed:
  - gif generation warning
  - test random failure

* Added:
  - prototype roadmap
  - python-xlib, due to python3.7 breaking for some reason...

* Updated:
  - convolutional layers make more sense plus reduce the image space

* Changed:
  - tests passed
  - removed Python 3.7 testing for now due to openai gym not working well with it.

* Added:
  - a docker file for fastrl
  - trying a simple image test

* Fixed:
  - maze gif env takes too much ram. Reduced image size.

* Changed:
  - trigger in the yml file is removed so that the pipelines cloud can orchestrate when / how everything will be executed

* Added:
  - deployment staging

* Update azure-pipelines.yml for Azure Pipelines

* Added:
  - some conditions that might allow this to be multistage but idk lol

* Notes:
  - testing twine autobuild

* Update azure-pipelines.yml for Azure Pipelines

* Notes:
  - fixed Docker file
  - need to reduce GIF size somehow

* Added:
  - gifs
  - gif processing notebook

* Added:
  - initial new README without graphics and stuff
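The pybullet shutdown workaround described above comes down to forcing garbage collection immediately after `close()`. A minimal sketch of that pattern (this is not the fast_rl API itself; the helper name and environment id are only illustrative, and it assumes `gym` and `pybullet_envs` are installed):

```python
import gc

import gym
import pybullet_envs  # noqa: F401 - registers the pybullet gym environments (assumed installed)


def close_env(env):
    """Close an env and immediately reclaim its resources (hypothetical helper)."""
    env.close()   # ask the env and its wrappers to shut down
    gc.collect()  # force an immediate clean-up so the physics engine is actually released


env = gym.make('AntBulletEnv-v0')  # example env id; any pybullet-backed env behaves similarly
env.reset()
close_env(env)
```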
1 parent 0abb10a commit 6be8c8c

117 files changed (+4279 / -1447 lines)


.gitignore

Lines changed: 10 additions & 2 deletions
@@ -1,13 +1,21 @@
 # IntelliJ project files
-.idea
+*.iml
+.idea/inspectionProfiles/*
+misc.xml
+modules.xml
+other.xml
+vcs.xml
+workspace.xml
 out
 gen
 .DS_Store
-.gitignore
 
 # Jupyter Notebook
 */.ipynb_checkpoints/*
 
+# Secure Files
+.pypirc
+
 # Data Files
 #/docs_src/data/*
 
.idea/codeStyles/Project.xml

Lines changed: 23 additions & 0 deletions
Some generated files are not rendered by default.

.idea/codeStyles/codeStyleConfig.xml

Lines changed: 5 additions & 0 deletions
Some generated files are not rendered by default.

.idea/dictionaries/jlaivins.xml

Lines changed: 23 additions & 0 deletions
Some generated files are not rendered by default.

Dockerfile

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+FROM nvidia/cuda:10.0-base-ubuntu18.04
+# See http://bugs.python.org/issue19846
+ENV LANG C.UTF-8
+LABEL com.nvidia.volumes.needed="nvidia_driver"
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential cmake git curl vim ca-certificates python-qt4 libjpeg-dev \
+    zip nano unzip libpng-dev strace && \
+    rm -rf /var/lib/apt/lists/*
+
+ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ENV PYTHON_VERSION=3.6
+
+RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    chmod +x ~/miniconda.sh && \
+    ~/miniconda.sh -b -p /opt/conda && \
+    rm ~/miniconda.sh && \
+    /opt/conda/bin/conda install conda-build && \
+    apt-get update && apt-get upgrade -y --no-install-recommends
+
+ENV PATH=$PATH:/opt/conda/bin/
+ENV USER fastrl_user
+# Create Environment
+COPY environment.yaml /environment.yaml
+RUN conda env create -f environment.yaml
+
+# Cleanup
+RUN rm -rf /var/lib/apt/lists/* \
+    && apt-get -y autoremove
+
+COPY /fast_rl /fast-reinforcement-learning/fast_rl
+COPY /setup.py /fast-reinforcement-learning/setup.py
+COPY /README.md /fast-reinforcement-learning/README.md
+WORKDIR /fast-reinforcement-learning
+RUN /bin/bash -c "source activate fastrl && pip install -e ."
+
+CMD ["/bin/bash -c"]

README.md

Lines changed: 122 additions & 103 deletions
@@ -2,47 +2,87 @@
 [![pypi fasti_rl version](https://img.shields.io/pypi/v/fast_rl)](https://pypi.python.org/pypi/fast_rl)
 [![github_master version](https://img.shields.io/github/v/release/josiahls/fast-reinforcement-learning?include_prereleases)](https://github.com/josiahls/fast-reinforcement-learning/releases)
 
-**Note: Test passing will not be a useful stability indicator until version 1.0+**
-
-# Fast Reinforcement Learning
-This repo is not affiliated with Jeremy Howard or his course which can be found here: [here](https://www.fast.ai/about/)
-We will be using components from the Fastai library however for building and training our reinforcement learning (RL)
+# Fast_rl
+This repo is not affiliated with Jeremy Howard or his course, which can be found [here](https://www.fast.ai/about/).
+We will be using components from the Fastai library for building and training our reinforcement learning (RL)
 agents.
 
+Our goal is for fast_rl to make benchmarking easier, inference more efficient, and environment compatibility
+as decoupled as possible. This being version 1.0, we still have a lot of work to make RL training itself faster
+and more efficient. The goals for this repo can be seen in the [RoadMap](#roadmap).
+
+A simple example:
+```python
+from fast_rl.agents.dqn import create_dqn_model, dqn_learner
+from fast_rl.agents.dqn_models import *
+from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
+from fast_rl.core.data_block import MDPDataBunch
+from fast_rl.core.metrics import RewardMetric, EpsilonMetric
+
+memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
+explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
+data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
+model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32,32])
+learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
+                    callback_fns=[RewardMetric, EpsilonMetric])
+learn.fit(450)
+```
+
+More complex examples might involve running an RL agent multiple times, generating episode snapshots as gifs, grouping
+reward plots, and finally showing the best and worst runs in a single graph.
+```python
+from fastai.basic_data import DatasetType
+from fast_rl.agents.dqn import create_dqn_model, dqn_learner
+from fast_rl.agents.dqn_models import *
+from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
+from fast_rl.core.data_block import MDPDataBunch
+from fast_rl.core.metrics import RewardMetric, EpsilonMetric
+from fast_rl.core.train import GroupAgentInterpretation, AgentInterpretation
+
+group_interp = GroupAgentInterpretation()
+for i in range(5):
+    memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
+    explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
+    data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
+    model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32,32])
+    learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
+                        callback_fns=[RewardMetric, EpsilonMetric])
+    learn.fit(450)
+
+    interp = AgentInterpretation(learn, ds_type=DatasetType.Train)
+    interp.plot_rewards(cumulative=True, per_episode=True, group_name='cartpole_experience_example')
+    group_interp.add_interpretation(interp)
+    group_interp.to_pickle(f'{learn.model.name.lower()}/', f'{learn.model.name.lower()}')
+    for g in interp.generate_gif(): g.write(f'{learn.model.name.lower()}')
+group_interp.plot_reward_bounds(per_episode=True, smooth_groups=10)
+```
+More examples can be found in `docs_src`, and the actual code being run for generating gifs can be found in `tests` in
+either `test_dqn.py` or `test_ddpg.py`.
+
 As a note, here is a run down of existing RL frameworks:
 - [Intel Coach](https://github.com/NervanaSystems/coach)
 - [Tensor Force](https://github.com/tensorforce/tensorforce)
 - [OpenAI Baselines](https://github.com/openai/baselines)
 - [Tensorflow Agents](https://github.com/tensorflow/agents)
 - [KerasRL](https://github.com/keras-rl/keras-rl)
 
-However there are also frameworks in PyTorch most notably Facebook's Horizon:
+However there are also frameworks in PyTorch:
 - [Horizon](https://github.com/facebookresearch/Horizon)
 - [DeepRL](https://github.com/ShangtongZhang/DeepRL)
-
-Fastai for computer vision and tabular learning has been amazing. One would wish that this would be the same for RL.
-The purpose of this repo is to have a framework that is as easy as possible to start, but also designed for testing
-new agents.
-
-# Table of Contents
-1. [Installation](#installation)
-2. [Alpha TODO](#alpha-todo)
-3. [Code](#code)
-5. [Versioning](#versioning)
-6. [Contributing](#contributing)
-7. [Style](#style)
-
+- [Spinning Up](https://spinningup.openai.com/en/latest/user/introduction.html)
 
 ## Installation
-Very soon we would like to add some form of scripting to install some complicated dependencies. We have 2 steps:
 
-**1.a FastAI**
+**fastai (semi-optional)**\
 [Install Fastai](https://github.com/fastai/fastai/blob/master/README.md#installation)
-or if you are Anaconda (which is a good idea to use Anaconda) you can do: \
+or if you are using Anaconda (which is a good idea to use Anaconda) you can do: \
 `conda install -c pytorch -c fastai fastai`
 
+**fast_rl**\
+Fastai will be installed if it does not exist. If it does exist, the versioning should be repaired by the setup.py.
+`pip install fast_rl`
 
-**1.b Optional / Extra Envs**
+## Installation (Optional)
 OpenAI all gyms: \
 `pip install gym[all]`
 
@@ -52,95 +92,74 @@ Mazes: \
 `python setup.py install`
 
 
-**2 Actual Repo** \
+## Installation Dev (Optional)
 `git clone https://github.com/josiahls/fast-reinforcement-learning.git` \
 `cd fast-reinforcement-learning` \
 `python setup.py install`
 
-## Alpha TODO
-At the moment these are the things we personally urgently need, and then the nice things that will make this repo
-something akin to valuable. These are listed in kind of the order we are planning on executing them.
+## Installation Issues
+Many issues will likely fall under [fastai installation issues](https://github.com/fastai/fastai/blob/master/README.md#installation-issues).
 
-At present, we are in the Alpha stages of agents not being fully tested / debugged. The final step (before 1.0.0) will
-be doing an evaluation of the DQN and DDPG agent implementations and verifying each performs the best it can at
-known environments. Prior to 1.0.0, new changes might break previous code versions, and models are not guaranteed to be
-working at their best. Post 1.0.0 will be more formal feature development with CI, unit testing, etc.
+Any other issues are likely environment related. It is important to note that Python 3.7 is not being tested due to
+an issue with Pyglet and gym not working together. This issue will not stop you from training models, however it might
+impact using OpenAI environments.
 
-**Critical**
-Testable code:
-```python
-from fast_rl.agents.dqn import *
-from fast_rl.agents.dqn_models import *
-from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
-from fast_rl.core.data_block import MDPDataBunch
-from fast_rl.core.metrics import *
-
-data = MDPDataBunch.from_env('CartPole-v1', render='rgb_array', bs=32, add_valid=False)
-model = create_dqn_model(data, FixedTargetDQNModule, opt=torch.optim.RMSprop, lr=0.00025)
-memory = ExperienceReplay(memory_size=1000, reduce_ram=True)
-exploration_method = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
-learner = dqn_learner(data=data, model=model, memory=memory, exploration_method=exploration_method)
-learner.fit(10)
-```
+## RoadMap
 
-- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
-- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
-- [X] 0.9.0 Notebook demonstrations of basic model usage.
 - [ ] **Working on** **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
 this point, all models should have guaranteed environments they should succeed in.
-- [ ] 1.8.0 Add PyBullet Fetch Environments
-- [ ] 1.8.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
-- [ ] 1.8.0 Add HER
-
-
-## Code
-Some of the key take aways is Fastai's use of callbacks. Not only do callbacks allow for logging, but in fact adding a
-callback to a generic fit function can change its behavior drastically. My goal is to have a library that is as easy
-as possible to run on a server or on one's own computer. We are also interested in this being easy to extend.
-
-We have a few assumptions that the code / support algorithms I believe should adhere to:
-- Environments should be pickle-able, and serializable. They should be able to shut down and start up multiple times
-during run time.
-- Agents should not need more information than images or state values for an environment per step. This means that
-environments should not be expected to allow output of contact points, sub-goals, or STRIPS style logical outputs.
-
-Rational:
-- Shutdown / Startup: Some environments (pybullet) have the issue of shutting down and starting different environments.
-Luckily, we have a fork of pybullet, so these modifications will be forced.
-- Pickling: Being able to encapsulate an environment as a `.pkl` can be important for saving it and all the information
-it generated.
-- Serializable: If we want to do parallel processing, environments need to be serializable to transport them between
-those processes.
-
-Some extra assumptions:
-- Environments can easier be goal-less, or have a single goal in which OpenAI defines as `Env` and `GoalEnv`.
-
-These assumptions are necessary for us to implement other envs from other repos. We do not want to be tied to just
-OpenAI gyms.
-
-## Versioning
-At present the repo is in alpha stages being. We plan to move this from alpha to a pseudo beta / working versions.
-Regardless of version, we will follow Python style versioning
-
-_Alpha Versions_: #.#.# e.g. 0.1.0. Alpha will never go above 0.99.99, at that point it will be full version 1.0.0.
-A key point is during alpha, coding will be quick and dirty with no promise of proper deprecation.
-
-_Beta / Full Versions_: These will be greater than 1.0.0. We follow the Python method of versions:
-**[Breaking Changes]**.**[Backward Compatible Features]**.**[Bug Fixes]**. These will be feature
-additions such new functions, tools, models, env support. Also proper deprecation will be used.
-
-_Pip update frequency_: We have a pip repository, however we do not plan to update it as frequently at the moment.
-However, the current frequency will be during Beta / Full Version updates, we might every 0.5.0
-versions update pip.
-
-## Contributing
-Follow the templates we have on github. Make a branch either from master or the most recent version branch.
-We recommend squashing commits / keep pointless ones to a minimum.
+- [ ] 1.1.0 More Traditional RL models
+  - [ ] Add PPO
+  - [ ] Add TRPO
+  - [ ] Add D4PG
+  - [ ] Add A2C
+  - [ ] Add A3C
+- [ ] 1.2.0 HRL models *Possibly might change version to 2.0 depending on SMDP issues*
+  - [ ] Add SMDP
+  - [ ] Add Goal oriented MDPs. Will require a new "Step"
+  - [ ] Add FeUdal Network
+  - [ ] Add storage based DataBunch memory management. This can prevent RAM from being used up by episode image frames
+  that may or may not serve any use to the agent, but only for logging.
+- [ ] 1.3.0
+  - [ ] Add HAC
+  - [ ] Add MAXQ
+  - [ ] Add HIRO
+- [ ] 1.4.0
+  - [ ] Add h-DQN
+  - [ ] Add Modulated Policy Hierarchies
+  - [ ] Add Meta Learning Shared Hierarchies
+- [ ] 1.5.0
+  - [ ] Add STRategic Attentive Writer (STRAW)
+  - [ ] Add H-DRLN
+  - [ ] Add Abstract Markov Decision Process (AMDP)
+  - [ ] Add conda integration so that installation can be truly one step.
+- [ ] 1.6.0 HRL Options models *Possibly will already be implemented in a previous model*
+  - [ ] Options augmentation to DQN based models
+  - [ ] Options augmentation to actor critic models
+  - [ ] Options augmentation to async actor critic models
+- [ ] 1.8.0 HRL Skills
+  - [ ] Skills augmentation to DQN based models
+  - [ ] Skills augmentation to actor critic models
+  - [ ] Skills augmentation to async actor critic models
+- [ ] 1.9.0
+- [ ] 2.0.0 Add PyBullet Fetch Environments
+- [ ] 2.0.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
+- [ ] 2.0.0 Add HER
+
+
+## Contribution
+Following fastai's guidelines would be desirable: [Guidelines](https://github.com/fastai/fastai/blob/master/README.md#contribution-guidelines)
+
+While we hope that model additions will be added smoothly, all models will only depend on `core.layers.py`.
+As time goes on, the model architecture will improve overall (we are, and will continue to be, figuring things out).
 
 
 ## Style
-Fastai does not follow closely with [google python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#3164-guidelines-derived-from-guidos-recommendations),
-however in this repo we will use this guide.
-Some exceptions however (typically found in Fastai):
-- "PEP 8 Multiple statements per line violation" is allowed in the case of if statements as long as they are still
-within the column limit.
+Since fastai uses a different style from traditional PEP-8, we will be following [Style](https://docs.fast.ai/dev/style.html)
+and [Abbreviations](https://docs.fast.ai/dev/abbr.html). We will also use RL-specific abbreviations:
+
+|        | Concept | Abbr. | Combination Examples |
+|:------:|:-------:|:-----:|:--------------------:|
+| **RL** | State   | st    |                      |
+|        | Action  | acn   |                      |
+|        | Bounds  | bb    | Same as Bounding Box |
