
Report on project1-Navigation

In this project, the DQN learning algorithm has been used to solve the Navigation problem. Other learning algorithms such as Double DQN, Prioritized Experience Replay, and Dueling DQN will be added later.

The report describes the learning algorithm, the hyper-parameters used, and the architecture of the neural network.

Training Code

The code is written in Python 3 with PyTorch and executed in a Jupyter Notebook.

  • Navigation.ipynb : Main Instruction file
  • dqn_agent.py : Agent and ReplayBuffer Class
  • model.py : Build QNetwork and train function
  • vanila_dqn_checkpoint.pth : Saved Model Weights

Learning Algorithm

Deep Q-Network

Q-learning is a value-based reinforcement learning algorithm that finds the optimal action-selection policy using a Q-function, Q(s,a).

Its goal is to maximize the value function Q,

which is the maximum sum of rewards rt, discounted by γ at each timestep t, achievable by a behaviour policy π=P(a|s) after making an observation (s) and taking an action (a).

The following is pseudocode of the Q-learning algorithm.

  1. Initialize Q-values Q(s,a) arbitrarily for all state-action pairs.

  2. For i=1 to num_episodes:
    Choose an action At in the current state St based on current Q-value estimates (e.g. ε-greedy).
    Take action At and observe the reward and next state, Rt+1, St+1. Update Q(St, At).
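The update step above can be sketched in plain Python as a tabular example (a hypothetical illustration with made-up hyper-parameter values, not the project's code):

```python
import random
from collections import defaultdict

ALPHA = 0.1    # step size for the Q update (hypothetical value)
GAMMA = 0.99   # discount factor
EPSILON = 0.1  # exploration rate for epsilon-greedy

ACTIONS = [0, 1, 2, 3]
Q = defaultdict(float)  # Q[(state, action)] -> value, initialised to 0

def epsilon_greedy(state):
    """Choose a random action with probability EPSILON, else the greedy one."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Q(St,At) <- Q(St,At) + alpha * (Rt+1 + gamma * max_a Q(St+1,a) - Q(St,At))"""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (td_target - Q[(state, action)])
```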

Q-networks approximate the Q-function with a neural network that, given a state, outputs Q-values for each action.
Q(s, a, θ) is a neural network whose objective function is the mean-squared error in Q-values.

To find the optimum parameters θ, optimise by SGD, following the gradient ∂L(θ)/∂θ.
This naive algorithm diverges because successive states are correlated and the targets are non-stationary.
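For a single transition (s, a, r, s'), the objective is the squared TD error, L(θ) = (r + γ max_a' Q(s', a') − Q(s, a))². A minimal numeric sketch of this computation (hypothetical helper names and values, not the project's code):

```python
GAMMA = 0.99  # discount factor, as in the hyper-parameters below

def td_error(q_values, next_q_values, action, reward):
    """Return the TD error: r + gamma * max_a' Q(s',a') - Q(s,a)."""
    target = reward + GAMMA * max(next_q_values)
    return target - q_values[action]

def mse_loss(errors):
    """Mean-squared error over a minibatch of TD errors."""
    return sum(e * e for e in errors) / len(errors)
```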

DQN - Experience Replay
To deal with correlated states, the agent builds a dataset of experiences and then draws random samples from it.
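The idea can be sketched as a fixed-size buffer with uniform sampling (a simplified sketch; the ReplayBuffer in dqn_agent.py additionally converts samples to PyTorch tensors):

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class SimpleReplayBuffer:
    """Fixed-size buffer that stores experiences and samples them uniformly."""

    def __init__(self, buffer_size, batch_size):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences drop off
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between
        # consecutive states.
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```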

DQN - Fixed Target
The agent also keeps a separate set of fixed parameters θ⁻ for computing the targets, and updates them only with some frequency.
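With soft updates (as in this project's TAU hyper-parameter), the target parameters track the local ones slowly: θ⁻ ← τθ + (1−τ)θ⁻. A plain-list sketch of that formula (the project applies it to PyTorch parameter tensors):

```python
TAU = 1e-3  # interpolation factor, matching the TAU hyper-parameter below

def soft_update(local_params, target_params, tau=TAU):
    """Return tau * theta_local + (1 - tau) * theta_target, element-wise."""
    return [tau * l + (1.0 - tau) * t
            for l, t in zip(local_params, target_params)]
```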


Neural Network Architecture
The state space has 37 dimensions and there are 4 actions per state,
so the network has 37 input features and an output size of 4.
The number and sizes of the hidden layers are configurable in this project:
you can pass a list of hidden-layer sizes as one of the input parameters when creating an agent.
The hidden layers used in this project are [64, 32], i.e. 2 layers with 64 and 32 neurons respectively.

Number of features

  • Input layer: 37
  • Hidden layer 1: 64
  • Hidden layer 2: 32
  • Output layer : 4
QNetwork(
  (layers): ModuleList(
    (0): Linear(in_features=37, out_features=64, bias=True)
    (1): Linear(in_features=64, out_features=32, bias=True)
  )
  (output): Linear(in_features=32, out_features=4, bias=True)
)
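As a sanity check, the parameter count implied by this printout can be computed by hand (an illustrative calculation, not part of the project's code):

```python
# Layer sizes from the architecture above: 37 -> 64 -> 32 -> 4
sizes = [37, 64, 32, 4]

# Each Linear layer has in_features * out_features weights
# plus out_features biases.
params_per_layer = [n_in * n_out + n_out
                    for n_in, n_out in zip(sizes, sizes[1:])]
total_params = sum(params_per_layer)  # 2432 + 2080 + 132 = 4644
```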

Hyper-parameters

  • BUFFER_SIZE = int(1e5) # replay buffer size
  • BATCH_SIZE = 64 # minibatch size
  • GAMMA = 0.99 # discount factor
  • TAU = 1e-3 # for soft update of target parameters
  • LR = 5e-4 # learning rate
  • UPDATE_EVERY = 4 # how often to update the network

Note: the learning rate is also configurable; you can specify it when creating an agent.

Plot of Rewards

A plot of rewards per episode:

  • the plot shows the average reward over the last 100 episodes
  • it shows that this agent solves the environment in 169 episodes
Episode 100	Average Score: 2.41
Episode 200	Average Score: 8.68
Episode 269	Average Score: 13.00
Environment solved in 169 episodes!	Average Score: 13.00
Total training time 0:03:53 s

The following video shows how the trained agent collects bananas, and the final score.

https://youtu.be/GxIUse16NSs

Ideas for Future Work

This project used the vanilla DQN, focusing on understanding the algorithm and its implementation.
As future work, improved algorithms such as double DQN, dueling DQN, and prioritized experience replay can be applied, and fine-tuned hyper-parameters that improve the overall performance can be explored.