poojan1202/Implementation_of_RL_algos


Implementation of RL Algorithms

Sarsa to Q-Learning

Monte-Carlo

Implemented the Monte-Carlo method on the Gym MiniGrid environment, MiniGrid-Empty-8x8-v0.

Observation Space

  • gen_obs generates the agent's partially observable view (an image).
  • For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.
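A tabular method needs a single hashable state per grid cell. The sketch below shows one possible way to flatten a position into an integer index, assuming `agent_pos` is an `(x, y)` pair as in MiniGrid. The helper name `pos_to_state` and the row-major layout are illustrative assumptions, not taken from the repository.

```python
def pos_to_state(agent_pos, grid_width):
    """Flatten an (x, y) grid position into one integer index (row-major).

    Hypothetical helper: the repository may encode states differently.
    """
    x, y = agent_pos
    return y * grid_width + x


# Example: an 8x8 grid with the agent in column 3, row 2.
state = pos_to_state((3, 2), grid_width=8)
```

Note that this index ignores the direction the agent is facing, so two situations that differ only in heading map to the same state.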

Action Space

| Num | Action |
| --- | --- |
| 0 | Turn Left |
| 1 | Turn Right |
| 2 | Move Forward |

Reward Function

  • Reward is 1 when agent reaches goal, else 0

Hyperparameters

  • Gamma
    • 0.9
  • Training Episodes
    • 75
  • Exploration
    • Epsilon = Epsilon/1.1
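As a rough illustration of how these settings fit together, here is a minimal first-visit Monte-Carlo sketch using the discount factor listed above. The function names, the `(state, action, reward)` episode layout, the initial epsilon of 1.0, and applying the decay once per episode are all assumptions made for this example.

```python
from collections import defaultdict

GAMMA = 0.9  # discount factor from the hyperparameter list


def discounted_returns(rewards, gamma=GAMMA):
    """Compute the return G_t for every step of one episode, working backwards."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def first_visit_mc_update(q, counts, episode, gamma=GAMMA):
    """Update Q from one episode of (state, action, reward) tuples."""
    returns = discounted_returns([r for _, _, r in episode], gamma)
    seen = set()
    for t, (s, a, _) in enumerate(episode):
        if (s, a) in seen:
            continue  # first-visit: only the first occurrence counts
        seen.add((s, a))
        counts[(s, a)] += 1
        # Incremental mean of the returns observed for this pair.
        q[(s, a)] += (returns[t] - q[(s, a)]) / counts[(s, a)]


q = defaultdict(float)
counts = defaultdict(int)
episode = [(0, 2, 0.0), (1, 2, 0.0), (2, 2, 1.0)]  # goal reached on the last step
first_visit_mc_update(q, counts, episode)

epsilon = 1.0            # assumed starting value
epsilon = epsilon / 1.1  # decay rule from the list, assumed to run once per episode
```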

Results

Training Reward

Simulation

SARSA-λ and SARSA backward

Implemented the SARSA-λ and Backward SARSA methods on the Gym MiniGrid environments, MiniGrid-Empty-8x8-v0 and MiniGrid-FourRooms-v0.

Observation Space

  • gen_obs generates the agent's partially observable view (an image).
  • For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.

Action Space

| Num | Action |
| --- | --- |
| 0 | Turn Left |
| 1 | Turn Right |
| 2 | Move Forward |

Reward Function

  • Reward is 1 when agent reaches goal, else 0

Hyperparameters (SARSA-λ)

  • Gamma
    • 0.9
  • Sarsa Lambda
    • 0.99
  • Training Episodes
    • 50
  • Exploration
    • Epsilon = Epsilon/1.05
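If SARSA-λ here refers to the forward view, the learning target for each step is the λ-return. The sketch below computes it recursively for one finished episode with the gamma and lambda values listed above. The function name and argument layout are assumptions for illustration, and the repository may organise the computation differently.

```python
def lambda_returns(rewards, next_q_values, gamma=0.9, lam=0.99):
    """Forward-view lambda-returns for one finished episode.

    rewards[t]       : reward received after the action at step t
    next_q_values[t] : Q(s_{t+1}, a_{t+1}); must be 0.0 for the terminal step

    Uses the recursion
        G_t = r_t + gamma * ((1 - lam) * Q(s_{t+1}, a_{t+1}) + lam * G_{t+1}).
    """
    out = [0.0] * len(rewards)
    g = 0.0  # lambda-return after the terminal step
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1.0 - lam) * next_q_values[t] + lam * g)
        out[t] = g
    return out


# Three-step episode that reaches the goal on the last step.
targets = lambda_returns([0.0, 0.0, 1.0], [0.5, 0.5, 0.0])
```

With `lam=1.0` the targets reduce to Monte-Carlo returns, and with `lam=0.0` they reduce to the one-step SARSA target.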

Results

Training Reward

Simulation

Hyperparameters (SARSA Backward)

  • Gamma
    • 0.9
  • Sarsa Lambda
    • 0.9
  • Training Episodes
    • 25
  • Exploration
    • Epsilon = Epsilon/1.2
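The backward view is usually implemented with eligibility traces. Below is a minimal sketch of one update step with accumulating traces, using gamma = 0.9 and lambda = 0.9 from the Backward SARSA settings. The step size `alpha` does not appear in this README and is an assumed value, as are the function name and the dictionary-based tables.

```python
from collections import defaultdict


def sarsa_lambda_step(q, traces, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.9):
    """Apply one backward-view SARSA(lambda) update and return the TD error."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    delta = target - q[(s, a)]
    traces[(s, a)] += 1.0  # accumulating trace for the visited pair
    for key in list(traces):
        q[key] += alpha * delta * traces[key]  # credit every recently visited pair
        traces[key] *= gamma * lam             # then decay its trace
    return delta


q = defaultdict(float)
traces = defaultdict(float)
sarsa_lambda_step(q, traces, s=0, a=2, r=0.0, s_next=1, a_next=2, done=False)
sarsa_lambda_step(q, traces, s=1, a=2, r=1.0, s_next=None, a_next=None, done=True)
```

After the second step, the earlier pair `(0, 2)` also receives part of the goal reward through its decayed trace. The traces would normally be reset at the start of each episode.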

Results

Training Reward

Simulation

Q-Learning

Implemented Q-Learning on the Gym MiniGrid environment, MiniGrid-Empty-8x8-v0.

Observation Space

  • gen_obs generates the agent's partially observable view (an image).
  • For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.

Action Space

| Num | Action |
| --- | --- |
| 0 | Turn Left |
| 1 | Turn Right |
| 2 | Move Forward |

Reward Function

  • Reward is 1 when agent reaches goal, else 0

Hyperparameters

  • Gamma
    • Trained agents with 5 different values of gamma
      • 0.9, 0.7, 0.5, 0.3, 0.1
  • Training Episodes
    • 150
  • Exploration
    • Epsilon = Epsilon/1.1
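For reference, a minimal tabular Q-learning sketch with epsilon-greedy action selection over the three actions listed above. The step size `alpha` is not given in this README and is an assumed value, and the function names are illustrative.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1, 2)  # turn left, turn right, move forward


def epsilon_greedy(q, state, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = 0.0 if done else max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


q = defaultdict(float)
q[(1, 2)] = 1.0  # pretend moving forward from state 1 is already known to be good
q_learning_update(q, s=0, a=2, r=0.0, s_next=1, done=False)
greedy_action = epsilon_greedy(q, state=1, epsilon=0.0)
```

Because the goal reward is only 1 at the end, a smaller gamma shrinks the value propagated back to early states: a state `n` steps from the goal is worth at most `gamma ** n`.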

Results

Training Reward

Steps vs Episodes

Simulation

Deep Q-Learning (DQN)

Implemented DQN on the Gym environment, Gym-CartPole-v0.

Observation Space

| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | -0.418 rad (-24 deg) | 0.418 rad (24 deg) |
| 3 | Pole Angular Velocity | -Inf | Inf |

Action Space

| Num | Action |
| --- | --- |
| 0 | Push Cart to Left |
| 1 | Push Cart to Right |

Reward Function

  • Reward is 1 for every step taken, including the termination step

Hyperparameters

  • Network Architecture
    • 4 Linear Layers of dim = [16, 32, 16, 2]
  • Optimizer
    • Adam Optimizer
  • Learning Rate
    • 0.0001
  • Batch Size
    • 128
  • Training Episodes
    • 700
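A batch size of 128 suggests that transitions are sampled from a replay buffer, as is standard for DQN. The sketch below shows such a buffer and the usual bootstrapped target, using only the standard library. The buffer capacity, the discount factor, and all names are assumptions, since none of them appear in this README.

```python
import random
from collections import deque

BATCH_SIZE = 128  # from the hyperparameter list


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):  # capacity is an assumed value
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=BATCH_SIZE):
        # Uniform sampling without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_max, dones, gamma=0.99):
    """DQN targets: r + gamma * max_a' Q(s', a'), with no bootstrap at episode end.

    gamma=0.99 is an assumed default, not a value from the README.
    """
    return [r + (0.0 if d else gamma * q)
            for r, q, d in zip(rewards, next_q_max, dones)]


buffer = ReplayBuffer()
for step in range(200):
    buffer.push(step, 0, 1.0, step + 1, False)
batch = buffer.sample()
```

In a full agent, `next_q_max` would come from the network (or a separate target network) evaluated on the sampled next states.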

Results

Training Reward

Simulation

Policy Gradient

Implemented the Policy Gradient Method (Actor-Critic) on the Gym environment, Gym-CartPole-v0.

Observation Space

The observation is an ndarray with shape (3,) representing the x-y coordinates of the pendulum's free end and its angular velocity.

| Num | Observation | Min | Max |
| --- | --- | --- | --- |
| 0 | x = cos(theta) | -1.0 | 1.0 |
| 1 | y = sin(theta) | -1.0 | 1.0 |
| 2 | Angular Velocity | -8.0 | 8.0 |

Action Space

The action is an ndarray with shape (1,) representing the torque applied to the free end of the pendulum.

| Num | Action | Min | Max |
| --- | --- | --- | --- |
| 0 | Torque | -2.0 | 2.0 |

Reward Function

  • The reward is a function of theta, the angle made by the pendulum.
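If the environment is Gym's Pendulum, which is what the observation and action tables above describe, the reward penalises the angle from upright, the angular velocity, and the applied torque. The sketch below reproduces that formula as a standalone function. The helper names are chosen for this example, and the angle is recovered from the observation with `atan2`.

```python
import math


def angle_normalize(theta):
    """Wrap an angle to the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Reward used by Gym's Pendulum: 0 is the best possible value (upright, at rest)."""
    return -(angle_normalize(theta) ** 2
             + 0.1 * theta_dot ** 2
             + 0.001 * torque ** 2)


def theta_from_observation(obs):
    """Recover theta from an observation (cos(theta), sin(theta), angular velocity)."""
    x, y, _ = obs
    return math.atan2(y, x)


upright = pendulum_reward(0.0, 0.0, 0.0)          # best case
hanging_down = pendulum_reward(math.pi, 0.0, 0.0)  # worst angle, about -pi**2
```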

Hyperparameters

  • Network Architecture
    • Actor
      • 4 Linear Layers of dim = [31,128,32,2]
    • Critic
      • 4 Linear Layers of dim = [31,128,32,1]
  • Optimizer
    • Adam Optimizer
  • Learning Rate
    • 0.0005
  • Batch Size
    • 64
  • Training Episodes
    • 1200
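In an actor-critic setup, the critic's value estimate is commonly used to form an advantage signal for the actor. The sketch below shows the one-step TD version of that signal and the two per-sample losses it feeds. The discount factor and the function names are assumptions, and the repository may use a different advantage estimator.

```python
def td_advantage(reward, value, next_value, done, gamma=0.99):
    """Return (advantage, target) from a one-step TD estimate.

    gamma=0.99 is an assumed default, not a value from the README.
    """
    target = reward + (0.0 if done else gamma * next_value)
    return target - value, target


def actor_loss(log_prob, advantage):
    """Policy-gradient loss for one sample: raise log-prob of better-than-expected actions."""
    return -log_prob * advantage


def critic_loss(value, target):
    """Squared error between the critic's estimate and the TD target."""
    return (target - value) ** 2


advantage, target = td_advantage(reward=1.0, value=0.5, next_value=1.0,
                                 done=False, gamma=0.9)
```

In a real training loop these losses would be averaged over a batch (64 here) and the advantage would be treated as a constant when differentiating the actor loss.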

Results

Training Reward

Simulation

About

Implementation of different Tabular and Deep Reinforcement Learning Algorithms on various Gym Environments.
