GitHub - prajwalgatti/DRL-Navigation: An RL agent that implements DDQN algorithm

Description

The RL agent is allowed to traverse across a two dimensional grid with blue and yellow bananas placed across it. The agent is expected to collect the yellow bananas while avoiding the blue ones. The agent receives a positive reward for every yellow banana it collects and a negative reward for every blue banana collected. The size of the state space is 37. The agent is able to move forwards and backwards as well as turn left and right, thus the size of the action space is 4. The minimal expected performance of the agent after training is a score of +13 over 100 consecutive episodes.

Algorithm Used

The current solution implements a Dueling DQN algorithm with Prioritized Experience Replay as described in the "Dueling Network Architectures for Deep Reinforcement Learning" paper (arXiv:1511.06581)

The network's architecture looks like :

DuelQNet(
  (input): Linear(in_features=37, out_features=100, bias=True)
  (dropout): Dropout(p=0.3)
  (hidden): ModuleList(
    (0): Linear(in_features=100, out_features=64, bias=True)
    (1): Linear(in_features=64, out_features=64, bias=True)
    (2): Linear(in_features=64, out_features=64, bias=True)
    (3): Linear(in_features=64, out_features=64, bias=True)
  )
  (value): ModuleList(
    (0): Linear(in_features=64, out_features=64, bias=True)
    (1): Linear(in_features=64, out_features=1, bias=True)
  )
  (advantage): ModuleList(
    (0): Linear(in_features=64, out_features=64, bias=True)
    (1): Linear(in_features=64, out_features=4, bias=True)
  )
)

The hyperparameters selected for the demonstration are:

Learning Rate: 0.0005
Batch size : 64
Learns after every steps : 4
Gamma : 0.99
Tau : 0.003

It took the agent about 471 episodes to be able to perform with not less than score of 13 as an average of 100 episodes.

Plot of Rewards

The goal set of the agent is to reach +13 points as average reward of the last 100 episodes. The current solution manages to reach the goal after 400-500 episodes and keep improving over 17 points.

The saved weights can be found here.

Follow the training notebook here.

Some ideas for future work

Search for better hyperparameters of algorithm as well as neural network
Implement prioritized experience replay mechanism

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Banana.app/Contents		Banana.app/Contents
saved_weights		saved_weights
.DS_Store		.DS_Store
Navigation.ipynb		Navigation.ipynb
README.md		README.md
plot.png		plot.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Description

Algorithm Used

Plot of Rewards

Some ideas for future work

About

Uh oh!

Releases

Packages

Languages

prajwalgatti/DRL-Navigation

Folders and files

Latest commit

History

Repository files navigation

Description

Algorithm Used

Plot of Rewards

Some ideas for future work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages