[//]: # (Image References)

[image1]: https://user-images.githubusercontent.com/10624937/42135602-b0335606-7d12-11e8-8689-dd1cf9fa11a9.gif "Trained Agents"
[image2]: https://user-images.githubusercontent.com/10624937/42386929-76f671f0-8106-11e8-9376-f17da2ae852e.png "Kernel"

# Deep Reinforcement Learning Nanodegree

![Trained Agents][image1]

This repository contains material related to Udacity's [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

## Table of Contents

### Tutorials

The tutorials lead you through implementing various algorithms in reinforcement learning. All of the code is in PyTorch (v0.4) and Python 3.

* [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/tree/master/dynamic-programming): Implement Dynamic Programming algorithms such as Policy Evaluation, Policy Improvement, Policy Iteration, and Value Iteration.
* [Monte Carlo](https://github.com/udacity/deep-reinforcement-learning/tree/master/monte-carlo): Implement Monte Carlo methods for prediction and control.
* [Temporal-Difference](https://github.com/udacity/deep-reinforcement-learning/tree/master/temporal-difference): Implement Temporal-Difference methods such as Sarsa, Q-Learning, and Expected Sarsa.
* [Discretization](https://github.com/udacity/deep-reinforcement-learning/tree/master/discretization): Learn how to discretize continuous state spaces, and solve the Mountain Car environment.
* [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/tree/master/tile-coding): Implement a method for discretizing continuous state spaces that enables better generalization.
* [Deep Q-Network](https://github.com/udacity/deep-reinforcement-learning/tree/master/dqn): Explore how to use a Deep Q-Network (DQN) to navigate a space vehicle without crashing.
* [Robotics](https://github.com/dusty-nv/jetson-reinforcement): Use a C++ API to train reinforcement learning agents in 3D virtual robotic simulation. (_External link_)
* [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/tree/master/hill-climbing): Use hill climbing with adaptive noise scaling to balance a pole on a moving cart.
* [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/tree/master/cross-entropy): Use the cross-entropy method to train a car to navigate a steep hill.
* [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/tree/master/reinforce): Learn how to use Monte Carlo Policy Gradients to solve a classic control task.
* **Proximal Policy Optimization**: Explore how to use Proximal Policy Optimization (PPO) to solve a classic reinforcement learning task. (_Coming soon!_)
* **Deep Deterministic Policy Gradients**: Explore how to use Deep Deterministic Policy Gradients (DDPG) with OpenAI Gym environments.
  * [Pendulum](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-pendulum): Use OpenAI Gym's Pendulum environment.
  * [BipedalWalker](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-bipedal): Use OpenAI Gym's BipedalWalker environment.
* [Finance](https://github.com/udacity/deep-reinforcement-learning/tree/master/finance): Train an agent to discover optimal trading strategies.
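
The tabular methods above (Sarsa, Q-Learning, Expected Sarsa) share the same one-step TD update pattern; as a quick reference, here is a minimal Q-Learning sketch in plain Python. The action count, ε, α, and γ below are illustrative values, not taken from the course notebooks:

```python
import random
from collections import defaultdict

N_ACTIONS = 4  # illustrative, e.g. up/right/down/left in a gridworld

# Q[state] holds one value per action, defaulting to zeros.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def epsilon_greedy(Q, state, eps=0.1):
    """Explore with probability eps, otherwise act greedily w.r.t. Q."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    values = Q[state]
    return values.index(max(values))

def q_learning_update(Q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=1.0):
    """One-step Q-Learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward if done else reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
```

Sarsa would differ only in the target, using `Q[next_state][next_action]` for the action actually taken, while Expected Sarsa uses the ε-greedy expectation over `Q[next_state]`.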

### Labs / Projects

The labs and projects can be found below. All of the projects use rich simulation environments from [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents). In the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program, you will receive a review of your project. These reviews are meant to give you personalized feedback and to point out what can be improved in your code.

* [The Taxi Problem](https://github.com/udacity/deep-reinforcement-learning/tree/master/lab-taxi): In this lab, you will train a taxi to pick up and drop off passengers.
* [Navigation](https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation): In the first project, you will train an agent to collect yellow bananas while avoiding blue bananas.
* [Continuous Control](https://github.com/udacity/deep-reinforcement-learning/tree/master/p2_continuous-control): In the second project, you will train a robotic arm to reach target locations.
* [Collaboration and Competition](https://github.com/udacity/deep-reinforcement-learning/tree/master/p3_collab-compet): In the third project, you will train a pair of agents to play tennis!

### Resources

* [Cheatsheet](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet): You are encouraged to use [this PDF file](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet/cheatsheet.pdf) to guide your study of reinforcement learning.

## OpenAI Gym Benchmarks

### Classic Control
- `Acrobot-v1` with [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/blob/master/tile-coding/Tile_Coding_Solution.ipynb) and Q-Learning
- `CartPole-v0` with [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/blob/master/hill-climbing/Hill_Climbing.ipynb) | solved in 13 episodes
- `CartPole-v0` with [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/blob/master/reinforce/REINFORCE.ipynb) | solved in 691 episodes
- `MountainCarContinuous-v0` with [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/blob/master/cross-entropy/CEM.ipynb) | solved in 47 iterations
- `MountainCar-v0` with [Uniform-Grid Discretization](https://github.com/udacity/deep-reinforcement-learning/blob/master/discretization/Discretization_Solution.ipynb) and Q-Learning | solved in <50000 episodes
- `Pendulum-v0` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-pendulum/DDPG.ipynb)
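
The uniform-grid entry above works by binning each continuous state dimension into equal intervals before running tabular Q-Learning. A stripped-down sketch of that binning (the bounds and bin counts below are made up for illustration, not MountainCar's actual limits):

```python
def create_uniform_grid(low, high, bins):
    """For each dimension d, split [low[d], high[d]] into bins[d] equal
    intervals and return the interior split points."""
    return [
        [low[d] + (high[d] - low[d]) * (i + 1) / bins[d]
         for i in range(bins[d] - 1)]
        for d in range(len(low))
    ]

def discretize(sample, grid):
    """Map a continuous sample to a tuple of integer bin indices by
    counting how many split points each value exceeds."""
    return tuple(
        sum(value > split for split in splits)
        for value, splits in zip(sample, grid)
    )
```

For example, with `low=[-1.0, -5.0]`, `high=[1.0, 5.0]`, and 4 bins per dimension, the sample `(0.25, -4.0)` lands in cell `(2, 0)`, which can then index an ordinary Q-table.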

### Box2D
- `BipedalWalker-v2` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-bipedal/DDPG.ipynb)
- `CarRacing-v0` with **Deep Q-Networks (DQN)** | _Coming soon!_
- `LunarLander-v2` with [Deep Q-Networks (DQN)](https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb) | solved in 1504 episodes

### Toy Text
- `FrozenLake-v0` with [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/blob/master/dynamic-programming/Dynamic_Programming_Solution.ipynb)
- `Blackjack-v0` with [Monte Carlo Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/monte-carlo/Monte_Carlo_Solution.ipynb)
- `CliffWalking-v0` with [Temporal-Difference Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/temporal-difference/Temporal_Difference_Solution.ipynb)
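
Every notebook linked above drives its agent through the same Gym-style interaction loop. The sketch below shows the loop's shape against a hypothetical stub environment, so it runs even without `gym` installed; in the real notebooks, `env = gym.make('CliffWalking-v0')` (or any other environment id) takes the stub's place:

```python
class StubEnv:
    """Hypothetical stand-in exposing the classic Gym API:
    reset() -> state, step(action) -> (next_state, reward, done, info)."""
    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, -1.0, self.t >= self.horizon, {}

def run_episode(env, policy):
    """Roll out one episode and return the undiscounted return."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done, _ = env.step(policy(state))
        total += reward
    return total
```

With a reward of -1 per step and a 5-step horizon, `run_episode(StubEnv(), lambda s: 0)` returns `-5.0`.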

## Dependencies

To set up your Python environment to run the code in this repository, follow the instructions below.

1. Create (and activate) a new environment with Python 3.6.

   - __Linux__ or __Mac__:
    ```bash
    conda create --name drlnd python=3.6
    source activate drlnd
    ```
   - __Windows__:
    ```bash
    conda create --name drlnd python=3.6
    activate drlnd
    ```

2. Follow the instructions in [this repository](https://github.com/openai/gym) to perform a minimal install of OpenAI Gym.
   - Next, install the **classic control** environment group by following the instructions [here](https://github.com/openai/gym#classic-control).
   - Then, install the **box2d** environment group by following the instructions [here](https://github.com/openai/gym#box2d).

3. Clone the repository (if you haven't already!), and navigate to the `python/` folder. Then, install several dependencies.
```bash
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
pip install .
```

4. Create an [IPython kernel](http://ipython.readthedocs.io/en/stable/install/kernel_install.html) for the `drlnd` environment.
```bash
python -m ipykernel install --user --name drlnd --display-name "drlnd"
```

5. Before running code in a notebook, change the kernel to match the `drlnd` environment by using the drop-down `Kernel` menu.

![Kernel][image2]

## Want to learn more?

<p align="center">Come learn with us in the <a href="https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893">Deep Reinforcement Learning Nanodegree</a> program at Udacity!</p>

<p align="center"><a href="https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893">
  <img width="503" height="133" src="https://user-images.githubusercontent.com/10624937/42135812-1829637e-7d16-11e8-9aa1-88056f23f51e.png"></a>
</p>