Commit 65e6d70

Version 1.0.0: DDPG / DQN Reward Graphs and Gifs (#16)
* Added:
- initial updates to notebooks
- fixed target lunar lander

* Updated:
- notebooks

Fixed:
- there seems to be a strange versioning issue with whether the keyword axis needs to be passed on argmax pytorch functions

* Note:
- there seems to be an overall issue with image generation

* Added:
- initial gifs, finished notebooks
- gif table generating notebook
- reward graphs

1.0 :)
1 parent df296a0 commit 65e6d70
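
The commit message does not say how the `argmax` keyword issue was resolved. PyTorch documents the keyword as `dim`, while NumPy-style APIs use `axis`, so one hedged way to stay robust to either spelling is a small wrapper that tries `dim=` first and falls back to `axis=`. This is only a sketch: `argmax_compat` and the `AxisOnly` stand-in class are hypothetical names used for illustration and are not part of fast_rl or PyTorch.

```python
def argmax_compat(tensor, dim):
    """Call tensor.argmax with `dim=`, falling back to `axis=` if rejected."""
    try:
        return tensor.argmax(dim=dim)
    except TypeError:
        # Raised when the object's argmax does not accept a `dim` keyword.
        return tensor.argmax(axis=dim)


class AxisOnly:
    """Hypothetical stand-in for an array type whose argmax only takes `axis=`."""

    def __init__(self, rows):
        self.rows = rows

    def argmax(self, axis):
        if axis != 1:
            raise ValueError("this stand-in only supports axis=1")
        # Index of the largest value in each row.
        return [max(range(len(r)), key=r.__getitem__) for r in self.rows]


q_values = AxisOnly([[0.1, 0.9], [0.7, 0.2]])
best_actions = argmax_compat(q_values, dim=1)
print(best_actions)
```

With a real tensor the first branch is expected to succeed; the fallback only runs when `dim` is rejected as an unknown keyword.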

47 files changed: +1259 -3130 lines changed

Dockerfile

Lines changed: 3 additions & 6 deletions
@@ -5,7 +5,7 @@ LABEL com.nvidia.volumes.needed="nvidia_driver"
 
 RUN apt-get update && apt-get install -y --no-install-recommends \
 build-essential cmake git curl vim ca-certificates python-qt4 libjpeg-dev \
-zip nano unzip libpng-dev strace && \
+zip nano unzip libpng-dev strace python-opengl xvfb && \
 rm -rf /var/lib/apt/lists/*
 
 ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
@@ -28,10 +28,7 @@ RUN conda env create -f environment.yaml
 RUN rm -rf /var/lib/apt/lists/* \
 && apt-get -y autoremove
 
-COPY /fast_rl /fast-reinforcement-learning/fast_rl
-COPY /setup.py /fast-reinforcement-learning/setup.py
-COPY /README.md /fast-reinforcement-learning/README.md
-WORKDIR /fast-reinforcement-learning
-RUN /bin/bash -c "source activate fastrl && pip install -e ."
+EXPOSE 8888
+ENV CONDA_DEFAULT_ENV fastrl
 
 CMD ["/bin/bash -c"]
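
The Dockerfile change installs `xvfb` and `python-opengl` and exposes port 8888, which suggests rendering environments without a physical display while serving notebooks from the container. The diff does not show how the container is actually started, so the following is only a sketch under that assumption: it builds a command line that runs Jupyter under `xvfb-run`. The helper name `headless_jupyter_command`, the screen size, and the choice of Jupyter flags are illustrative and not taken from the repository.

```python
import shlex


def headless_jupyter_command(port=8888, screen="1400x900x24"):
    """Build an argv list that runs Jupyter under a virtual X display."""
    return [
        "xvfb-run", "-a",             # -a: pick a free display number automatically
        "-s", f"-screen 0 {screen}",  # -s: arguments forwarded to the Xvfb server
        "jupyter", "notebook",
        "--ip=0.0.0.0",               # listen on all interfaces inside the container
        f"--port={port}",             # matches the port exposed by the Dockerfile
        "--no-browser",
        "--allow-root",               # assumption: the container runs as root
    ]


cmd = headless_jupyter_command()
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run`; the printed string is the equivalent shell command.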

README.md

Lines changed: 52 additions & 1 deletion
@@ -11,6 +11,8 @@ Our goal is for fast_rl to be make benchmarking easier, inference more efficient
 as decoupled as much as possible. This being version 1.0, we still have a lot of work to make RL training itself faster
 and more efficient. The goals for this repo can be seen in the [RoadMap](#roadmap).
 
+**An important note is that training can use up a lot of RAM. This will likely be resolved as more models are being added. Likely will be resolved by off loading to storage in the next few versions.**
+
 A simple example:
 ```python
 from fast_rl.agents.dqn import create_dqn_model, dqn_learner
@@ -162,4 +164,53 @@ and [Abbreviations](https://docs.fast.ai/dev/abbr.html). Also we will use RL spe
 |:------:|:-------:|:-----:|:--------------------:|
 | **RL** | State | st | |
 | | Action | acn | |
-| | Bounds | bb | Same as Bounding Box |
+| | Bounds | bb | Same as Bounding Box |
+
+## Examples
+
+### Reward Graphs
+
+| | Model |
+|:------------------------------------------:|:---------------:|
+| ![01](./res/reward_plots/cartpole_dqn.png) | DQN |
+| ![01](./res/reward_plots/cartpole_dueling.png) | Dueling DQN |
+| ![01](./res/reward_plots/cartpole_double.png) | Double DQN |
+| ![01](./res/reward_plots/cartpole_dddqn.png) | DDDQN |
+| ![01](./res/reward_plots/cartpole_fixedtarget.png) | Fixed Target DQN |
+| ![01](./res/reward_plots/lunarlander_dqn.png) | DQN |
+| ![01](./res/reward_plots/lunarlander_dueling.png) | Dueling DQN |
+| ![01](./res/reward_plots/lunarlander_double.png) | Double DQN |
+| ![01](./res/reward_plots/lunarlander_dddqn.png) | DDDQN |
+| ![01](./res/reward_plots/lunarlander_fixedtarget.png) | Fixed Target DQN |
+| ![01](./res/reward_plots/ant_ddpg.png) | DDPG |
+| ![01](./res/reward_plots/pendulum_ddpg.png) | DDPG |
+| ![01](./res/reward_plots/halfcheetah_ddpg.png) | DDPG |
+
+
+### Agent Stages
+
+| Model | Gif(Early) | Gif(Mid) | Gif(Late) |
+|:------------:|:------------:|:------------:|:------------:|
+| DDPG+PER | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_35.gif) | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_222.gif) | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_431.gif)|
+| DoubleDueling+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_114.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_346.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_925.gif)|
+| DoubleDQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_88.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_613.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_999.gif)|
+| DuelingDQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_112.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_431.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_980.gif)|
+| DoubleDueling+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_151.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_341.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_999.gif)|
+| DQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_93.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_541.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_999.gif)|
+| DuelingDQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_21.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_442.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_998.gif)|
+| DQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_99.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_382.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_949.gif)|
+| DoubleDQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_7.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_514.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_999.gif)|
+| DDPG+PER | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_52.gif) | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_596.gif) | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_984.gif)|
+| DDPG+ER | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_54.gif) | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_614.gif) | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_999.gif)|
+| DQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_44.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_216.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_413.gif)|
+| FixedTargetDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_57.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_309.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_438.gif)|
+| DQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_31.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_207.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_447.gif)|
+| FixedTargetDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_13.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_265.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_449.gif)|
+| DoubleDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_60.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_268.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_438.gif)|
+| DoubleDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_35.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_269.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_444.gif)|
+| DuelingDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_62.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_209.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_432.gif)|
+| DoubleDueling+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_2.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_260.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_438.gif)|
+| DuelingDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_69.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_272.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_438.gif)|
+| DoubleDueling+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_43.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_287.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_447.gif)|
+| DDPG+ER | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_69.gif) | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_197.gif) | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_438.gif)|
+| DDPG+PER | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_55.gif) | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_267.gif) | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_422.gif)|
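
The commit message mentions a "gif table generating notebook", but its contents are not part of the visible diff. The rows above follow a regular filename pattern (`<env>_<memory>_<model>Module_<run>_episode_<n>.gif`), so a table like this could plausibly be produced along the following lines. This is a sketch, not the notebook's actual code: `gif_table_rows`, the regular expression, and the rule of taking the lowest, middle, and highest episode as early, mid, and late are all assumptions.

```python
import re
from collections import defaultdict

# Assumed filename layout, inferred from the rows in the table above.
GIF_RE = re.compile(
    r"(?P<env>[a-z]+)_(?P<memory>[A-Za-z]+)_(?P<model>[A-Za-z]+)Module"
    r"_(?P<run>\d+)_episode_(?P<episode>\d+)\.gif$"
)
MEMORY_ABBR = {"ExperienceReplay": "ER", "PriorityExperienceReplay": "PER"}


def gif_table_rows(filenames, root="./res/run_gifs"):
    """Group gifs by (env, memory, model) and emit one markdown row per group."""
    groups = defaultdict(list)
    for name in filenames:
        m = GIF_RE.match(name)
        if m is None:
            continue  # skip files that do not follow the assumed pattern
        key = (m["env"], m["memory"], m["model"])
        groups[key].append((int(m["episode"]), name))

    rows = []
    for (env, memory, model), items in groups.items():
        items.sort()  # order by episode number
        # Assumption: early/mid/late are the first, middle, and last episodes.
        picks = [items[0], items[len(items) // 2], items[-1]]
        cells = " | ".join(f"![]({root}/{name})" for _, name in picks)
        rows.append(f"| {model}+{MEMORY_ABBR.get(memory, memory)} | {cells}|")
    return rows


example = [
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_431.gif",
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_35.gif",
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_222.gif",
]
rows = gif_table_rows(example)
print("\n".join(rows))
```

For the three pendulum files in the example, the sketch reproduces the first `DDPG+PER` row of the table.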
Binary file not shown.