
Commit 6be8c8c

Version 0.9.9: Gifs, Better Memory Management, Save-able DataBunches (#14)
* Init Branch

* Added:
  - Python 3.7 testing
  - parallel matrix testing
  - k_partitions_top. You can now divide your runs, for example k=3. This means if you run for 100 epochs, MDPDataBunch will store the best episode in the first third, second third, and final third, giving you a perspective on how the agent evolved.
  - test for k_partitions_top and k_top
  - Gif plotting for AgentInterpretation
  - Gif writing for AgentInterpretation
  - pickle tests

* Fixed:
  - plotting failure / bug with data block *sometimes* double resetting the most recent episode.
  - gifs register in jupyter notebooks

* Added:
  - auto gif generating code to tests.

* Fixed:
  - tests probably crashing because moviepy is not installed. Not sure if we want to deal with moviepy dependencies in Azure.
  - data block not loading pybullet wrapper
  - somehow trained_learner method in test was completely broken
  - gif generation crashing due to out of bounds

* Changed:
  - migrated layers to layers.py. Should make imports less crazy.

  Removed:
  - agents_base.py because all the functionality is in agent_core.py or layers.py

* Added:
  - .idea dictionaries to snowball common words.
  - codeStyle project specification
  - ddpg tests
  - metric tests
  - databunch save / load testing
  - learner import / export compatibility
  - resolution wrapper. Reduces the render resolution of an env by step size.
  - gif out of bounds issue
  - DDPG was somewhat broken. Added tanh activation to the Actor module. For the Critic, if conv is not specified, then a single fc layer is added for only the state input. Previously this was fine with Pendulum; however Ant would just fail badly / highly negative reward. Now I understand that Ant penalizes high-value actions, so passing actions through a tanh activation function should fix that. The single Linear layer should make the DDPG more stable.
  - resolution reduction in ddpg tests
  - some of the pybullet env closing issues. You can now specify to have the env close after fit (the default is not to close, so that you can easily execute fit multiple times). Though bulletphysics/bullet3#2470 claims this is fixed, at least for me this is still an issue. We fixed this by calling gc.collect() after close to force an immediate clean-up of resources.

* Added:
  - frame skipping for gif generation to reduce gif size.

* Fixed:
  - gif generation warning
  - More pybullet env crashing issues. Turns out the "close" method call does not actually close the pybullet env. This means that the env will close later when gc is being performed (possibly screwing up future training). It seems more and more that we need to have a special close method for every physics engine type, since some don't react to the regular close, but others actually shut down the entire engine permanently!

  Changed:
  - Simplifying the ddpg tests

  Added:
  - RL abbreviations

  Notes:
  - I am starting to move to making the code follow the fastai style guidelines. I think this will be a continuous progression after v1.0

* Fixed:
  - gif generation warning
  - test random failure

* Added:
  - prototype roadmap
  - python-xlib, due to python3.7 breaking for some reason...

* Updated:
  - convolutional layers make more sense plus reduce the image space

* Changed:
  - tests passed
  - removed Python 3.7 testing for now due to openai gym not working well with it.

* Added:
  - a docker file for fastrl
  - trying a simple image test

* Fixed:
  - maze gif env takes too much ram. Reduced image size.

* Changed:
  - trigger in the yml file is removed so that the pipelines cloud can orchestrate when / how everything will be executed

* Added:
  - deployment staging

* Update azure-pipelines.yml for Azure Pipelines

* Added:
  - some conditions that might allow this to be multistage but idk lol

* Notes:
  - testing twine autobuild

* Update azure-pipelines.yml for Azure Pipelines

* Notes:
  - fixed Docker file
  - need to reduce GIF size somehow

* Added:
  - gifs
  - gif processing notebook

* Added:
  - initial new README without graphics and stuff
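The pybullet shutdown workaround described above comes down to forcing garbage collection immediately after `close()`. A minimal sketch of that pattern (this is not the fast_rl API itself; the helper name and environment id are only illustrative, and it assumes `gym` and `pybullet_envs` are installed):

```python
import gc

import gym
import pybullet_envs  # noqa: F401 - registers the pybullet gym environments (assumed installed)


def close_env(env):
    """Close an env and immediately reclaim its resources (hypothetical helper)."""
    env.close()   # ask the env and its wrappers to shut down
    gc.collect()  # force an immediate clean-up so the physics engine is actually released


env = gym.make('AntBulletEnv-v0')  # example env id; any pybullet-backed env behaves similarly
env.reset()
close_env(env)
```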
1 parent 0abb10a commit 6be8c8c

117 files changed (+4279 / -1447 lines)


.gitignore

Lines changed: 10 additions & 2 deletions
@@ -1,13 +1,21 @@
 # IntelliJ project files
-.idea
+*.iml
+.idea/inspectionProfiles/*
+misc.xml
+modules.xml
+other.xml
+vcs.xml
+workspace.xml
 out
 gen
 .DS_Store
-.gitignore
 
 # Jupyter Notebook
 */.ipynb_checkpoints/*
 
+# Secure Files
+.pypirc
+
 # Data Files
 #/docs_src/data/*
 
.idea/codeStyles/Project.xml

Lines changed: 23 additions & 0 deletions
Some generated files are not rendered by default.

.idea/codeStyles/codeStyleConfig.xml

Lines changed: 5 additions & 0 deletions
Some generated files are not rendered by default.

.idea/dictionaries/jlaivins.xml

Lines changed: 23 additions & 0 deletions
Some generated files are not rendered by default.

Dockerfile

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+FROM nvidia/cuda:10.0-base-ubuntu18.04
+# See http://bugs.python.org/issue19846
+ENV LANG C.UTF-8
+LABEL com.nvidia.volumes.needed="nvidia_driver"
+
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential cmake git curl vim ca-certificates python-qt4 libjpeg-dev \
+    zip nano unzip libpng-dev strace && \
+    rm -rf /var/lib/apt/lists/*
+
+ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
+ENV PYTHON_VERSION=3.6
+
+RUN curl -o ~/miniconda.sh -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    chmod +x ~/miniconda.sh && \
+    ~/miniconda.sh -b -p /opt/conda && \
+    rm ~/miniconda.sh && \
+    /opt/conda/bin/conda install conda-build && \
+    apt-get update && apt-get upgrade -y --no-install-recommends
+
+ENV PATH=$PATH:/opt/conda/bin/
+ENV USER fastrl_user
+# Create Environment
+COPY environment.yaml /environment.yaml
+RUN conda env create -f environment.yaml
+
+# Cleanup
+RUN rm -rf /var/lib/apt/lists/* \
+    && apt-get -y autoremove
+
+COPY /fast_rl /fast-reinforcement-learning/fast_rl
+COPY /setup.py /fast-reinforcement-learning/setup.py
+COPY /README.md /fast-reinforcement-learning/README.md
+WORKDIR /fast-reinforcement-learning
+RUN /bin/bash -c "source activate fastrl && pip install -e ."
+
+CMD ["/bin/bash -c"]

README.md

Lines changed: 122 additions & 103 deletions
@@ -2,47 +2,87 @@
 [![pypi fasti_rl version](https://img.shields.io/pypi/v/fast_rl)](https://pypi.python.org/pypi/fast_rl)
 [![github_master version](https://img.shields.io/github/v/release/josiahls/fast-reinforcement-learning?include_prereleases)](https://github.com/josiahls/fast-reinforcement-learning/releases)
 
-**Note: Test passing will not be a useful stability indicator until version 1.0+**
-
-# Fast Reinforcement Learning
-This repo is not affiliated with Jeremy Howard or his course which can be found here: [here](https://www.fast.ai/about/)
-We will be using components from the Fastai library however for building and training our reinforcement learning (RL)
+# Fast_rl
+This repo is not affiliated with Jeremy Howard or his course, which can be found [here](https://www.fast.ai/about/).
+We will be using components from the Fastai library for building and training our reinforcement learning (RL)
 agents.
 
+Our goal is for fast_rl to make benchmarking easier, inference more efficient, and environment compatibility
+as decoupled as possible. This being version 1.0, we still have a lot of work to make RL training itself faster
+and more efficient. The goals for this repo can be seen in the [RoadMap](#roadmap).
+
+A simple example:
+```python
+from fast_rl.agents.dqn import create_dqn_model, dqn_learner
+from fast_rl.agents.dqn_models import *
+from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
+from fast_rl.core.data_block import MDPDataBunch
+from fast_rl.core.metrics import RewardMetric, EpsilonMetric
+
+memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
+explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
+data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
+model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32,32])
+learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
+                    callback_fns=[RewardMetric, EpsilonMetric])
+learn.fit(450)
+```
+
+More complex examples might involve running an RL agent multiple times, generating episode snapshots as gifs, grouping
+reward plots, and finally showing the best and worst runs in a single graph.
+```python
+from fastai.basic_data import DatasetType
+from fast_rl.agents.dqn import create_dqn_model, dqn_learner
+from fast_rl.agents.dqn_models import *
+from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
+from fast_rl.core.data_block import MDPDataBunch
+from fast_rl.core.metrics import RewardMetric, EpsilonMetric
+from fast_rl.core.train import GroupAgentInterpretation, AgentInterpretation
+
+group_interp = GroupAgentInterpretation()
+for i in range(5):
+    memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
+    explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
+    data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
+    model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32,32])
+    learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
+                        callback_fns=[RewardMetric, EpsilonMetric])
+    learn.fit(450)
+
+    interp = AgentInterpretation(learn, ds_type=DatasetType.Train)
+    interp.plot_rewards(cumulative=True, per_episode=True, group_name='cartpole_experience_example')
+    group_interp.add_interpretation(interp)
+    group_interp.to_pickle(f'{learn.model.name.lower()}/', f'{learn.model.name.lower()}')
+    for g in interp.generate_gif(): g.write(f'{learn.model.name.lower()}')
+group_interp.plot_reward_bounds(per_episode=True, smooth_groups=10)
+```
+More examples can be found in `docs_src`, and the actual code being run for generating gifs can be found in `tests` in
+either `test_dqn.py` or `test_ddpg.py`.
+
 As a note, here is a run down of existing RL frameworks:
 - [Intel Coach](https://github.com/NervanaSystems/coach)
 - [Tensor Force](https://github.com/tensorforce/tensorforce)
 - [OpenAI Baselines](https://github.com/openai/baselines)
 - [Tensorflow Agents](https://github.com/tensorflow/agents)
 - [KerasRL](https://github.com/keras-rl/keras-rl)
 
-However there are also frameworks in PyTorch most notably Facebook's Horizon:
+However there are also frameworks in PyTorch:
 - [Horizon](https://github.com/facebookresearch/Horizon)
 - [DeepRL](https://github.com/ShangtongZhang/DeepRL)
-
-Fastai for computer vision and tabular learning has been amazing. One would wish that this would be the same for RL.
-The purpose of this repo is to have a framework that is as easy as possible to start, but also designed for testing
-new agents.
-
-# Table of Contents
-1. [Installation](#installation)
-2. [Alpha TODO](#alpha-todo)
-3. [Code](#code)
-5. [Versioning](#versioning)
-6. [Contributing](#contributing)
-7. [Style](#style)
-
+- [Spinning Up](https://spinningup.openai.com/en/latest/user/introduction.html)
 
 ## Installation
-Very soon we would like to add some form of scripting to install some complicated dependencies. We have 2 steps:
 
-**1.a FastAI**
+**fastai (semi-optional)**\
 [Install Fastai](https://github.com/fastai/fastai/blob/master/README.md#installation)
-or if you are Anaconda (which is a good idea to use Anaconda) you can do: \
+or if you are using Anaconda (which is a good idea to use Anaconda) you can do: \
 `conda install -c pytorch -c fastai fastai`
 
+**fast_rl**\
+Fastai will be installed if it does not exist. If it does exist, the versioning should be repaired by the setup.py.
+`pip install fast_rl`
 
-**1.b Optional / Extra Envs**
+## Installation (Optional)
 OpenAI all gyms: \
 `pip install gym[all]`
 
@@ -52,95 +92,74 @@ Mazes: \
 `python setup.py install`
 
 
-**2 Actual Repo** \
+## Installation Dev (Optional)
 `git clone https://github.com/josiahls/fast-reinforcement-learning.git` \
 `cd fast-reinforcement-learning` \
 `python setup.py install`
 
-## Alpha TODO
-At the moment these are the things we personally urgently need, and then the nice things that will make this repo
-something akin to valuable. These are listed in kind of the order we are planning on executing them.
+## Installation Issues
+Many issues will likely fall under [fastai installation issues](https://github.com/fastai/fastai/blob/master/README.md#installation-issues).
 
-At present, we are in the Alpha stages of agents not being fully tested / debugged. The final step (before 1.0.0) will
-be doing an evaluation of the DQN and DDPG agent implementations and verifying each performs the best it can at
-known environments. Prior to 1.0.0, new changes might break previous code versions, and models are not guaranteed to be
-working at their best. Post 1.0.0 will be more formal feature development with CI, unit testing, etc.
+Any other issues are likely environment related. It is important to note that Python 3.7 is not being tested due to
+an issue with Pyglet and gym not working together. This issue will not stop you from training models, however it might
+impact using OpenAI environments.
 
-**Critical**
-Testable code:
-```python
-from fast_rl.agents.dqn import *
-from fast_rl.agents.dqn_models import *
-from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
-from fast_rl.core.data_block import MDPDataBunch
-from fast_rl.core.metrics import *
-
-data = MDPDataBunch.from_env('CartPole-v1', render='rgb_array', bs=32, add_valid=False)
-model = create_dqn_model(data, FixedTargetDQNModule, opt=torch.optim.RMSprop, lr=0.00025)
-memory = ExperienceReplay(memory_size=1000, reduce_ram=True)
-exploration_method = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
-learner = dqn_learner(data=data, model=model, memory=memory, exploration_method=exploration_method)
-learner.fit(10)
-```
+## RoadMap
 
-- [X] 0.7.0 Full test suite using multi-processing. Connect to CI.
-- [X] 0.8.0 Comprehensive model eval **debug/verify**. Each model should succeed at at least a few known environments. Also, massive refactoring will be needed.
-- [X] 0.9.0 Notebook demonstrations of basic model usage.
 - [ ] **Working on** **1.0.0** Base version is completed with working model visualizations proving performance / expected failure. At
 this point, all models should have guaranteed environments they should succeed in.
-- [ ] 1.8.0 Add PyBullet Fetch Environments
-- [ ] 1.8.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
-- [ ] 1.8.0 Add HER
-
-
-## Code
-Some of the key take aways is Fastai's use of callbacks. Not only do callbacks allow for logging, but in fact adding a
-callback to a generic fit function can change its behavior drastically. My goal is to have a library that is as easy
-as possible to run on a server or on one's own computer. We are also interested in this being easy to extend.
-
-We have a few assumptions that the code / support algorithms I believe should adhere to:
-- Environments should be pickle-able, and serializable. They should be able to shut down and start up multiple times
-during run time.
-- Agents should not need more information than images or state values for an environment per step. This means that
-environments should not be expected to allow output of contact points, sub-goals, or STRIPS style logical outputs.
-
-Rational:
-- Shutdown / Startup: Some environments (pybullet) have the issue of shutting down and starting different environments.
-Luckily, we have a fork of pybullet, so these modifications will be forced.
-- Pickling: Being able to encapsulate an environment as a `.pkl` can be important for saving it and all the information
-it generated.
-- Serializable: If we want to do parallel processing, environments need to be serializable to transport them between
-those processes.
-
-Some extra assumptions:
-- Environments can easier be goal-less, or have a single goal in which OpenAI defines as `Env` and `GoalEnv`.
-
-These assumptions are necessary for us to implement other envs from other repos. We do not want to be tied to just
-OpenAI gyms.
-
-## Versioning
-At present the repo is in alpha stages being. We plan to move this from alpha to a pseudo beta / working versions.
-Regardless of version, we will follow Python style versioning
-
-_Alpha Versions_: #.#.# e.g. 0.1.0. Alpha will never go above 0.99.99, at that point it will be full version 1.0.0.
-A key point is during alpha, coding will be quick and dirty with no promise of proper deprecation.
-
-_Beta / Full Versions_: These will be greater than 1.0.0. We follow the Python method of versions:
-**[Breaking Changes]**.**[Backward Compatible Features]**.**[Bug Fixes]**. These will be feature
-additions such new functions, tools, models, env support. Also proper deprecation will be used.
-
-_Pip update frequency_: We have a pip repository, however we do not plan to update it as frequently at the moment.
-However, the current frequency will be during Beta / Full Version updates, we might every 0.5.0
-versions update pip.
-
-## Contributing
-Follow the templates we have on github. Make a branch either from master or the most recent version branch.
-We recommend squashing commits / keep pointless ones to a minimum.
+- [ ] 1.1.0 More Traditional RL models
+  - [ ] Add PPO
+  - [ ] Add TRPO
+  - [ ] Add D4PG
+  - [ ] Add A2C
+  - [ ] Add A3C
+- [ ] 1.2.0 HRL models *Possibly might change version to 2.0 depending on SMDP issues*
+  - [ ] Add SMDP
+  - [ ] Add Goal oriented MDPs. Will require a new "Step"
+  - [ ] Add FeUdal Network
+  - [ ] Add storage based DataBunch memory management. This can prevent RAM from being used up by episode image frames
+  that may or may not serve any use to the agent, but only for logging.
+- [ ] 1.3.0
+  - [ ] Add HAC
+  - [ ] Add MAXQ
+  - [ ] Add HIRO
+- [ ] 1.4.0
+  - [ ] Add h-DQN
+  - [ ] Add Modulated Policy Hierarchies
+  - [ ] Add Meta Learning Shared Hierarchies
+- [ ] 1.5.0
+  - [ ] Add STRategic Attentive Writer (STRAW)
+  - [ ] Add H-DRLN
+  - [ ] Add Abstract Markov Decision Process (AMDP)
+  - [ ] Add conda integration so that installation can be truly one step.
+- [ ] 1.6.0 HRL Options models *Possibly will already be implemented in a previous model*
+  - [ ] Options augmentation to DQN based models
+  - [ ] Options augmentation to actor critic models
+  - [ ] Options augmentation to async actor critic models
+- [ ] 1.8.0 HRL Skills
+  - [ ] Skills augmentation to DQN based models
+  - [ ] Skills augmentation to actor critic models
+  - [ ] Skills augmentation to async actor critic models
+- [ ] 1.9.0
+- [ ] 2.0.0 Add PyBullet Fetch Environments
+- [ ] 2.0.0 Not part of this repo, however the envs need to subclass the OpenAI `gym.GoalEnv`
+- [ ] 2.0.0 Add HER
+
+
+## Contribution
+Following fastai's guidelines would be desirable: [Guidelines](https://github.com/fastai/fastai/blob/master/README.md#contribution-guidelines)
+
+While we hope that model additions will be added smoothly, all models will only depend on `core.layers.py`.
+As time goes on, the model architecture will improve overall (we are, and will continue to be, figuring things out).
 
 
 ## Style
-Fastai does not follow closely with [google python style guide](https://github.com/google/styleguide/blob/gh-pages/pyguide.md#3164-guidelines-derived-from-guidos-recommendations),
-however in this repo we will use this guide.
-Some exceptions however (typically found in Fastai):
-- "PEP 8 Multiple statements per line violation" is allowed in the case of if statements as long as they are still
-within the column limit.
+Since fastai uses a different style from traditional PEP-8, we will be following [Style](https://docs.fast.ai/dev/style.html)
+and [Abbreviations](https://docs.fast.ai/dev/abbr.html). We will also use RL-specific abbreviations:
+
+|        | Concept | Abbr. | Combination Examples |
+|:------:|:-------:|:-----:|:--------------------:|
+| **RL** | State   | st    |                      |
+|        | Action  | acn   |                      |
+|        | Bounds  | bb    | Same as Bounding Box |
