Commit 65e6d70

Version 1.0.0: DDPG / DQN Reward Graphs and Gifs (#16)
* Added:
  - initial updates to notebooks
  - fixed target lunar lander
* Updated:
  - notebooks
* Fixed:
  - there seems to be a strange versioning issue with whether the keyword axis needs to be passed on argmax pytorch functions
* Note:
  - there seems to be an overall issue with image generation
* Added:
  - initial gifs, finished notebooks
  - gif table generating notebook
  - reward graphs

1.0 :)
1 parent df296a0 commit 65e6d70
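
The "Fixed" note in the commit message does not show the code involved. PyTorch documents the keyword for `argmax` as `dim`, whereas NumPy calls it `axis`, so one plausible reading is that some installed versions rejected one of the two names. The sketch below is a version-tolerant wrapper written under that assumption; `argmax_compat` and the two stand-in functions are hypothetical names for illustration and are not part of fast_rl or PyTorch.

```python
def argmax_compat(argmax_fn, values, axis):
    """Call argmax_fn with `dim=` first, then fall back to `axis=`.

    argmax_fn is any argmax-like callable (for example torch.argmax).
    Caveat: a TypeError raised for an unrelated reason also triggers the
    fallback, so treat this as a sketch rather than a robust shim.
    """
    try:
        return argmax_fn(values, dim=axis)
    except TypeError:
        # The callable does not accept `dim`; retry with the NumPy-style name.
        return argmax_fn(values, axis=axis)


# Stand-ins that each accept only one spelling of the keyword.
def argmax_dim_only(values, dim):
    return max(range(len(values)), key=values.__getitem__)


def argmax_axis_only(values, axis):
    return max(range(len(values)), key=values.__getitem__)


q_values = [0.1, 0.7, 0.2]
best_a = argmax_compat(argmax_dim_only, q_values, 0)
best_b = argmax_compat(argmax_axis_only, q_values, 0)
```

Both calls pick index 1, the position of the largest value, regardless of which keyword the underlying function accepts.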

47 files changed: +1259 -3130 lines changed

Dockerfile

Lines changed: 3 additions & 6 deletions
@@ -5,7 +5,7 @@ LABEL com.nvidia.volumes.needed="nvidia_driver"

 RUN apt-get update && apt-get install -y --no-install-recommends \
 build-essential cmake git curl vim ca-certificates python-qt4 libjpeg-dev \
-zip nano unzip libpng-dev strace && \
+zip nano unzip libpng-dev strace python-opengl xvfb && \
 rm -rf /var/lib/apt/lists/*

 ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64
@@ -28,10 +28,7 @@ RUN conda env create -f environment.yaml
 RUN rm -rf /var/lib/apt/lists/* \
 && apt-get -y autoremove

-COPY /fast_rl /fast-reinforcement-learning/fast_rl
-COPY /setup.py /fast-reinforcement-learning/setup.py
-COPY /README.md /fast-reinforcement-learning/README.md
-WORKDIR /fast-reinforcement-learning
-RUN /bin/bash -c "source activate fastrl && pip install -e ."
+EXPOSE 8888
+ENV CONDA_DEFAULT_ENV fastrl

 CMD ["/bin/bash -c"]
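
The Dockerfile hunks above add `xvfb` and `python-opengl` and expose port 8888, which is Jupyter's default port. The commit does not say how the container is meant to be started, so the following is only a sketch of one way the pieces could fit together: running a notebook server under a virtual X display so that environments can render frames without a physical screen. The function name, screen size, and choice of Jupyter flags are assumptions, not something taken from the repository.

```python
import shlex


def headless_notebook_command(port=8888, screen="1400x900x24"):
    """Build an argv list that runs Jupyter under a virtual display.

    Assumes xvfb-run and jupyter are on PATH inside the container.
    """
    return [
        "xvfb-run",
        "-a",                            # pick a free display number automatically
        "-s", f"-screen 0 {screen}",     # arguments passed through to the Xvfb server
        "jupyter", "notebook",
        "--ip=0.0.0.0",                  # listen on all interfaces so the exposed port is reachable
        f"--port={port}",
        "--no-browser",
        "--allow-root",
    ]


command = headless_notebook_command()
command_line = shlex.join(command)  # shell-quoted form, e.g. for logging
```

Keeping the command as a list makes it safe to hand to `subprocess.run(command)` without shell quoting concerns.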

README.md

Lines changed: 52 additions & 1 deletion
@@ -11,6 +11,8 @@ Our goal is for fast_rl to be make benchmarking easier, inference more efficient
 as decoupled as much as possible. This being version 1.0, we still have a lot of work to make RL training itself faster
 and more efficient. The goals for this repo can be seen in the [RoadMap](#roadmap).

+**An important note is that training can use up a lot of RAM. This will likely be resolved as more models are being added. Likely will be resolved by off loading to storage in the next few versions.**
+
 A simple example:
 ```python
 from fast_rl.agents.dqn import create_dqn_model, dqn_learner
@@ -162,4 +164,53 @@ and [Abbreviations](https://docs.fast.ai/dev/abbr.html). Also we will use RL spe
 |:------:|:-------:|:-----:|:--------------------:|
 | **RL** | State | st | |
 | | Action | acn | |
-| | Bounds | bb | Same as Bounding Box |
+| | Bounds | bb | Same as Bounding Box |
+
+## Examples
+
+### Reward Graphs
+
+| | Model |
+|:------------------------------------------:|:---------------:|
+| ![01](./res/reward_plots/cartpole_dqn.png) | DQN |
+| ![01](./res/reward_plots/cartpole_dueling.png) | Dueling DQN |
+| ![01](./res/reward_plots/cartpole_double.png) | Double DQN |
+| ![01](./res/reward_plots/cartpole_dddqn.png) | DDDQN |
+| ![01](./res/reward_plots/cartpole_fixedtarget.png) | Fixed Target DQN |
+| ![01](./res/reward_plots/lunarlander_dqn.png) | DQN |
+| ![01](./res/reward_plots/lunarlander_dueling.png) | Dueling DQN |
+| ![01](./res/reward_plots/lunarlander_double.png) | Double DQN |
+| ![01](./res/reward_plots/lunarlander_dddqn.png) | DDDQN |
+| ![01](./res/reward_plots/lunarlander_fixedtarget.png) | Fixed Target DQN |
+| ![01](./res/reward_plots/ant_ddpg.png) | DDPG |
+| ![01](./res/reward_plots/pendulum_ddpg.png) | DDPG |
+| ![01](./res/reward_plots/halfcheetah_ddpg.png) | DDPG |
+
+
+### Agent Stages
+
+| Model | Gif(Early) | Gif(Mid) | Gif(Late) |
+|:------------:|:------------:|:------------:|:------------:|
+| DDPG+PER | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_35.gif) | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_222.gif) | ![](./res/run_gifs/pendulum_PriorityExperienceReplay_DDPGModule_1_episode_431.gif)|
+| DoubleDueling+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_114.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_346.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDuelingModule_1_episode_925.gif)|
+| DoubleDQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_88.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_613.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DoubleDQNModule_1_episode_999.gif)|
+| DuelingDQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_112.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_431.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DuelingDQNModule_1_episode_980.gif)|
+| DoubleDueling+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_151.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_341.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDuelingModule_1_episode_999.gif)|
+| DQN+ER | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_93.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_541.gif) | ![](./res/run_gifs/lunarlander_ExperienceReplay_DQNModule_1_episode_999.gif)|
+| DuelingDQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_21.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_442.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DuelingDQNModule_1_episode_998.gif)|
+| DQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_99.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_382.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DQNModule_1_episode_949.gif)|
+| DoubleDQN+PER | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_7.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_514.gif) | ![](./res/run_gifs/lunarlander_PriorityExperienceReplay_DoubleDQNModule_1_episode_999.gif)|
+| DDPG+PER | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_52.gif) | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_596.gif) | ![](./res/run_gifs/ant_PriorityExperienceReplay_DDPGModule_1_episode_984.gif)|
+| DDPG+ER | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_54.gif) | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_614.gif) | ![](./res/run_gifs/ant_ExperienceReplay_DDPGModule_1_episode_999.gif)|
+| DQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_44.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_216.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DQNModule_1_episode_413.gif)|
+| FixedTargetDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_57.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_309.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_FixedTargetDQNModule_1_episode_438.gif)|
+| DQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_31.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_207.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DQNModule_1_episode_447.gif)|
+| FixedTargetDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_13.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_265.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_FixedTargetDQNModule_1_episode_449.gif)|
+| DoubleDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_60.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_268.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDQNModule_1_episode_438.gif)|
+| DoubleDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_35.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_269.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDQNModule_1_episode_444.gif)|
+| DuelingDQN+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_62.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_209.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DuelingDQNModule_1_episode_432.gif)|
+| DoubleDueling+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_2.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_260.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DoubleDuelingModule_1_episode_438.gif)|
+| DuelingDQN+PER | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_69.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_272.gif) | ![](./res/run_gifs/cartpole_PriorityExperienceReplay_DuelingDQNModule_1_episode_438.gif)|
+| DoubleDueling+ER | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_43.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_287.gif) | ![](./res/run_gifs/cartpole_ExperienceReplay_DoubleDuelingModule_1_episode_447.gif)|
+| DDPG+ER | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_69.gif) | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_197.gif) | ![](./res/run_gifs/acrobot_ExperienceReplay_DDPGModule_1_episode_438.gif)|
+| DDPG+PER | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_55.gif) | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_267.gif) | ![](./res/run_gifs/acrobot_PriorityExperienceReplay_DDPGModule_1_episode_422.gif)|
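
The commit message mentions a gif table generating notebook, but its contents are not part of the visible diff. The gif filenames in the Agent Stages table follow a regular pattern (`<env>_<memory>_<model>Module_<run>_episode_<n>.gif`), so a table like the one above could plausibly be produced along the following lines. This is a sketch under that assumption, not the notebook's actual code; `gif_table`, `GIF_RE`, and the rule of taking the first, middle, and last episode per group are all hypothetical choices.

```python
import re
from collections import defaultdict

# Pattern inferred from the filenames in the table above, e.g.
# pendulum_PriorityExperienceReplay_DDPGModule_1_episode_35.gif
GIF_RE = re.compile(
    r"^(?P<env>[a-z]+)_(?P<memory>[A-Za-z]+)_(?P<model>[A-Za-z]+)Module"
    r"_(?P<run>\d+)_episode_(?P<episode>\d+)\.gif$"
)
MEMORY_ABBR = {"ExperienceReplay": "ER", "PriorityExperienceReplay": "PER"}


def gif_table(filenames, root="./res/run_gifs"):
    """Render a markdown table with early, mid, and late gifs per agent."""
    groups = defaultdict(list)
    for name in filenames:
        match = GIF_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the naming pattern
        key = (match["env"], match["model"], match["memory"])
        groups[key].append((int(match["episode"]), name))

    lines = [
        "| Model | Gif(Early) | Gif(Mid) | Gif(Late) |",
        "|:------------:|:------------:|:------------:|:------------:|",
    ]
    for (env, model, memory), items in groups.items():
        if len(items) < 3:
            continue  # need at least three recorded episodes for three stages
        items.sort()  # order by episode number
        picks = [items[0], items[len(items) // 2], items[-1]]
        cells = " | ".join(f"![]({root}/{name})" for _, name in picks)
        label = f"{model}+{MEMORY_ABBR.get(memory, memory)}"
        lines.append(f"| {label} | {cells} |")
    return "\n".join(lines)


example = gif_table([
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_431.gif",
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_35.gif",
    "pendulum_PriorityExperienceReplay_DDPGModule_1_episode_222.gif",
])
```

Sorting on the parsed integer rather than the raw filename matters here, because a plain string sort would place `episode_222` before `episode_35`.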
Binary files not shown (8 files).
