Training an agent to collect items in a large square world using deep reinforcement learning
For this project, the task is to train an agent to navigate a large, square world while collecting yellow bananas and avoiding blue bananas. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 for collecting a blue banana. Thus, the goal is to collect as many yellow bananas as possible while avoiding blue ones.
- State space is 37-dimensional and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction (a minimal interaction sketch is shown after this list).
- Action space is 4-dimensional. The four discrete actions correspond to:
  - `0` - move forward
  - `1` - move backward
  - `2` - move left
  - `3` - move right
- Solution criteria: the environment is considered solved when the agent gets an average score of +13 over 100 consecutive episodes.
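To make the state/action interface concrete, here is a minimal random-action interaction loop. It assumes the `unityagents` Unity ML-Agents wrapper used by this nanodegree and an environment path of `bin/Banana.app`; both the package import and the path are assumptions and may need adjusting to your setup.

```python
# Minimal random-agent loop for the Banana environment (a sketch;
# the package name and environment path are assumptions).
import numpy as np
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="bin/Banana.app")  # adjust path to your OS build
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]             # 37-dimensional observation
score = 0
while True:
    action = np.random.randint(4)                   # one of the 4 discrete actions
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]                    # +1 yellow banana, -1 blue banana
    if env_info.local_done[0]:                      # episode finished
        break
env.close()
print("Episode score:", score)
```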
PC configuration used for this project:
- OS: Mac OS 10.14.4 Mojave
- Intel Core i7 quad-core 3.5 GHz, 32 GB RAM, GeForce GTX 780M
- Check this nanodegree's prerequisites, and follow the instructions.
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.
(For AWS) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the environment.
- Place the file in the `bin/` directory, and unzip (or decompress) the file.
To train the agent, start Jupyter Notebook, open `Navigation.ipynb`, and execute the cells. For more information, please check the instructions inside the notebook.
Idea: use a deep neural network for Q-value function approximation, mapping states to action values. The following loss function is minimized during training:
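Presumably this is the standard DQN mean-squared temporal-difference error with a fixed target network, written here in its common form (the exact notation in the notebook may differ):

$$
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]
$$

where $\theta$ are the weights of the online network, $\theta^{-}$ the periodically updated target-network weights, and $\gamma$ the discount factor.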
Neural network architecture:
| Layer | (in, out) | Activation |
|---|---|---|
| Layer 1 | (state_size, 64) | relu |
| Layer 2 | (64, 128) | relu |
| Layer 3 | (128, 128) | relu |
| Layer 4 | (128, 64) | relu |
| Layer 5 | (64, 32) | relu |
| Layer 6 | (32, action_size) | - |
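For clarity, here is a sketch of a network with these layer sizes, assuming PyTorch; the class name is illustrative and may differ from the repo's implementation:

```python
# Q-network matching the layer sizes in the table above (illustrative sketch).
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.fc1 = nn.Linear(state_size, 64)
        self.fc2 = nn.Linear(64, 128)
        self.fc3 = nn.Linear(128, 128)
        self.fc4 = nn.Linear(128, 64)
        self.fc5 = nn.Linear(64, 32)
        self.fc6 = nn.Linear(32, action_size)  # no activation: raw Q-values

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        return self.fc6(x)
```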
- The best model weights are saved in the checkpoint file in the repo.
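To reuse the saved weights, something along these lines should work, assuming the `QNetwork` sketch above and a checkpoint named `checkpoint.pth` (the filename is an assumption; use the actual checkpoint file in the repo):

```python
# Load saved weights for evaluation (the checkpoint filename is an assumption).
import torch

model = QNetwork(state_size=37, action_size=4)
model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
model.eval()  # switch to inference mode for greedy action selection
```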
Multiple potential improvements to this method are available and have been researched:
- Exchanging the epsilon-greedy action selection for a more systematic method, such as another neural network that selects epsilon.
- Providing dynamic reward maps to the agent.
- Using extensions of the above algorithm, namely Double DQN, Dueling DQN, Prioritized Experience Replay, Fixed Q-Targets, or Rainbow DQN, which combines these improvements (a sketch of the Double DQN target is shown after this list).
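As an illustration, below is a sketch of how the Double DQN target is typically computed, assuming PyTorch and two networks (online and target); the function and variable names are illustrative, not taken from the repo:

```python
# Double DQN target: the online network chooses the next action,
# the target network evaluates it (illustrative sketch).
import torch

def double_dqn_targets(rewards, next_states, dones, gamma, online_net, target_net):
    """rewards, dones: shape (batch, 1); next_states: shape (batch, state_size)."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, best_actions)
        return rewards + gamma * next_q * (1 - dones)
```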