Training an agent to collect items in a large square world using deep reinforcement learning
For this project, the task is to train an agent to navigate a large, square world while collecting yellow bananas and avoiding blue bananas. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 for collecting a blue banana. Thus, the goal is to collect as many yellow bananas as possible while avoiding blue ones.
- State space is 37-dimensional and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction (a minimal interaction sketch is shown after this list).
- Action space is 4-dimensional. The four discrete actions correspond to:
  - `0` - move forward
  - `1` - move backward
  - `2` - move left
  - `3` - move right
- Solution criteria: the environment is considered solved when the agent gets an average score of +13 over 100 consecutive episodes.
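To make the state/action interface concrete, here is a minimal random-action interaction loop. It assumes the `unityagents` Unity ML-Agents wrapper used by this nanodegree and an environment path of `bin/Banana.app`; both the package import and the path are assumptions and may need adjusting to your setup.

```python
# Minimal random-agent loop for the Banana environment (a sketch;
# the package name and environment path are assumptions).
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="bin/Banana.app")  # adjust path to your OS build
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]             # 37-dimensional observation
score = 0
while True:
    action = np.random.randint(4)                   # one of the 4 discrete actions
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]                    # +1 yellow banana, -1 blue banana
    if env_info.local_done[0]:                      # episode finished
        break
env.close()
print("Episode score:", score)
```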
PC configuration used for this project:
- OS: Mac OS 10.14.4 Mojave
- Intel Core i7 quad-core 3.5 GHz, 32 GB RAM, GeForce GTX 780M
- Check this nanodegree's prerequisites, and follow the instructions.
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the environment.
- Place the file in the `bin/` directory, and unzip (or decompress) the file.
To train the agent, start Jupyter Notebook, open `Navigation.ipynb`, and execute the cells. For more information, please check the instructions inside the notebook.
Idea: use a deep neural network for Q-value function approximation, mapping states to action values. The following loss function is minimized during training:
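Presumably this is the standard DQN mean-squared temporal-difference error with a fixed target network, written here in its common form (the exact notation in the notebook may differ):

$$
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]
$$

where $\theta$ are the weights of the online network, $\theta^{-}$ the periodically updated target-network weights, and $\gamma$ the discount factor.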
Neural network architecture:
| Layer | (in, out) | Activation |
|---|---|---|
| Layer 1 | (state_size, 64) | relu |
| Layer 2 | (64, 128) | relu |
| Layer 3 | (128, 128) | relu |
| Layer 4 | (128, 64) | relu |
| Layer 5 | (64, 32) | relu |
| Layer 6 | (32, action_size) | - |
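For clarity, here is a sketch of a network with these layer sizes, assuming PyTorch; the class name is illustrative and may differ from the repo's implementation:

```python
# Q-network matching the layer sizes in the table above (illustrative sketch).
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 64)
        self.fc5 = nn.Linear(64, 32)
        self.fc6 = nn.Linear(32, action_size)  # no activation: raw Q-values

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return self.fc6(x)
```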
- The best model weights are saved in the checkpoint file in the repo.
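To reuse the saved weights, something along these lines should work, assuming the `QNetwork` sketch above and a checkpoint named `checkpoint.pth` (the filename is an assumption; use the actual checkpoint file in the repo):

```python
# Load saved weights for evaluation (the checkpoint filename is an assumption).
import torch

model = QNetwork(state_size=37, action_size=4)
model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
model.eval()  # switch to inference mode for greedy action selection
```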
Multiple potential improvements to this method are available and have been researched:
- Exchanging the epsilon-greedy action selection for a more systematic method, such as another neural network that selects epsilon.
- Providing dynamic reward maps to the agent.
- Using extensions of the above algorithm, namely Double DQN, Dueling DQN, Prioritized Experience Replay, Fixed Q-Targets, or Rainbow DQN, which combines these improvements (a sketch of the Double DQN target is shown after this list).
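As an illustration, below is a sketch of how the Double DQN target is typically computed, assuming PyTorch and two networks (online and target); the function and variable names are illustrative, not taken from the repo:

```python
# Double DQN target: the online network chooses the next action,
# the target network evaluates it (illustrative sketch).
import torch

def double_dqn_targets(rewards, next_states, dones, gamma, online_net, target_net):
    """rewards, dones: shape (batch, 1); next_states: shape (batch, state_size)."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best_actions)
        return rewards + gamma * next_q * (1 - dones)
```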