Implementation of RL Algorithms

Sarsa to Q-Learning

Monte-Carlo

Implemented the Monte-Carlo method on the GymMinigrid environment MiniGrid-Empty-8x8-v0.

Observation Space

gen_obs generates the agent's partially observable view (an image).
For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.
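One way to turn the agent's position into a single discrete state index is sketched below. It assumes agent_pos is an (x, y) pair of grid coordinates and that the grid width is known; the helper name pos_to_state is hypothetical and is not taken from the repository.

```python
def pos_to_state(agent_pos, grid_width):
    """Flatten an (x, y) grid position into one integer state index."""
    x, y = agent_pos
    # Row-major layout: each row contributes grid_width consecutive indices.
    return y * grid_width + x


# Example on an 8-wide grid (width assumed from the environment name).
state = pos_to_state((3, 2), grid_width=8)
```

A flat index like this can be used directly as a row number in a tabular value function.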

Action Space

Num	Action
0	Turn Left
1	Turn Right
2	Move Forward

Reward Function

The reward is 1 when the agent reaches the goal, and 0 otherwise.

Hyperparameters

Gamma
- 0.9
Training Episodes
- 75
Exploration
- Epsilon = Epsilon/1.1
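The hyperparameters above can be illustrated with a minimal sketch of the two pieces they control: the discounted return used by a Monte-Carlo update, and the epsilon decay. The function names, the starting epsilon of 1.0, and the choice to decay once per episode are assumptions made for illustration; the source does not state them.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute the return G_t for every step of one finished episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def decay_epsilon(epsilon, factor=1.1):
    """Shrink epsilon by a constant factor (Epsilon = Epsilon / 1.1)."""
    return epsilon / factor


# Goal reached on the third step: only the last reward is non-zero.
episode_returns = discounted_returns([0.0, 0.0, 1.0])

# Assumption: epsilon starts at 1.0 and decays once per episode.
epsilon = 1.0
for _ in range(75):
    epsilon = decay_epsilon(epsilon)
```

With a reward that is non-zero only at the goal, the return at each step is simply gamma raised to the number of steps remaining.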

Results

Training Reward

Simulation

SARSA-λ and SARSA backward

Implemented the SARSA-λ and Backward SARSA methods on the GymMinigrid environments MiniGrid-Empty-8x8-v0 and MiniGrid-FourRooms-v0.

Observation Space

gen_obs generates the agent's partially observable view (an image).
For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.

Action Space

Num	Action
0	Turn Left
1	Turn Right
2	Move Forward

Reward Function

The reward is 1 when the agent reaches the goal, and 0 otherwise.

Hyperparameters (SARSA-λ)

Gamma
- 0.9
Sarsa Lambda
- 0.99
Training Episodes
- 50
Exploration
- Epsilon = Epsilon/1.05
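If the SARSA-λ variant is read as the forward view (an assumption; the source does not say which view it uses), the update target is the λ-return. The sketch below computes it with the standard backward recursion over one finished episode. The function name and the next_q argument are hypothetical.

```python
def lambda_returns(rewards, next_q, gamma=0.9, lam=0.99):
    """Compute the lambda-return for each step of one finished episode.

    rewards[t] is the reward received after step t.
    next_q[t] is the current estimate Q(s_{t+1}, a_{t+1}); it must be 0.0
    for the final step, because the terminal state has no value.
    """
    out = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda-return.
        g = rewards[t] + gamma * ((1.0 - lam) * next_q[t] + lam * g)
        out[t] = g
    return out


rewards = [0.0, 0.0, 1.0]
next_q = [0.3, 0.4, 0.0]
targets = lambda_returns(rewards, next_q)
```

Setting lam to 1 recovers the Monte-Carlo return, and setting it to 0 recovers the one-step SARSA target, which is a quick sanity check for any implementation.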

Results

Training Reward

Simulation

Hyperparameters (SARSA Backward)

Gamma
- 0.9
Sarsa Lambda
- 0.9
Training Episodes
- 25
Exploration
- Epsilon = Epsilon/1.2
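A backward-view SARSA(λ) step with accumulating eligibility traces can be sketched as follows. This is a generic tabular version, not the repository's code: the step size alpha is not given in the source and is assumed to be 0.1, and Q and E are assumed to be lists of per-state action values.

```python
def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, done,
                      alpha=0.1, gamma=0.9, lam=0.9):
    """Apply one backward-view SARSA(lambda) update to Q and E in place."""
    target = r if done else r + gamma * Q[s_next][a_next]
    delta = target - Q[s][a]
    E[s][a] += 1.0  # accumulating trace for the pair just visited
    for state in range(len(Q)):
        for action in range(len(Q[state])):
            # Every pair is updated in proportion to its trace.
            Q[state][action] += alpha * delta * E[state][action]
            # Traces fade by gamma * lambda after each step.
            E[state][action] *= gamma * lam
    return delta


# Two states, three actions, a two-step episode that ends at the goal.
Q = [[0.0, 0.0, 0.0] for _ in range(2)]
E = [[0.0, 0.0, 0.0] for _ in range(2)]
sarsa_lambda_step(Q, E, s=0, a=2, r=0.0, s_next=1, a_next=2, done=False)
sarsa_lambda_step(Q, E, s=1, a=2, r=1.0, s_next=1, a_next=0, done=True)
```

After the second step, the earlier state-action pair also receives credit for the goal reward, scaled by its decayed trace. The traces in E should be reset to zero at the start of each episode.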

Results

Training Reward

Simulation

Q-Learning

Implemented the Q-Learning method on the GymMinigrid environment MiniGrid-Empty-8x8-v0.

Observation Space

gen_obs generates the agent's partially observable view (an image).
For a discrete observation, we use agent_pos, which gives the grid cell the agent currently occupies.

Action Space

Num	Action
0	Turn Left
1	Turn Right
2	Move Forward

Reward Function

The reward is 1 when the agent reaches the goal, and 0 otherwise.

Hyperparameters

Gamma
- Trained agents with 5 different values of gamma
  - 0.9, 0.7, 0.5, 0.3, 0.1
Training Episodes
- 150
Exploration
- Epsilon = Epsilon/1.1
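The role of gamma in these runs can be illustrated with a minimal tabular Q-learning update. The step size alpha is not given in the source and is assumed to be 0.1; the function name and the 10-step distance are also illustrative.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Apply one off-policy Q-learning update in place; return the new value."""
    # Bootstrap from the greedy action in the next state.
    best_next = 0.0 if done else max(Q[s_next])
    td_target = r + gamma * best_next
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]


Q = [[0.0, 0.0, 0.0], [0.5, 1.0, 0.2]]
new_value = q_learning_update(Q, s=0, a=2, r=0.0, s_next=1, done=False)

# With a single reward of 1 at the goal, the optimal value of a state that is
# k steps away is gamma ** k, so small gammas leave distant states near zero.
value_10_steps_away = {g: g ** 10 for g in (0.9, 0.7, 0.5, 0.3, 0.1)}
```

This is one reason the choice of gamma matters for a sparse goal reward: with a small gamma, values far from the goal become almost indistinguishable.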

Results

Training Reward

Steps vs Episodes

Simulation

Deep Q-Learning (DQN)

Implemented DQN on the Gym environment CartPole-v0.

Observation Space

Num	Observation	Min	Max
0	Cart Position	-4.8	4.8
1	Cart Velocity	-Inf	Inf
2	Pole Angle	-0.418 rad (-24 deg)	0.418 rad (24 deg)
3	Pole Angular Velocity	-Inf	Inf

Action Space

Num	Action
0	Push Cart to Left
1	Push Cart to Right

Reward Function

Reward is 1 for every step taken, including the termination step

Hyperparameters

Network Architecture
- 4 Linear Layers of dim = [16, 32, 16, 2]
Optimizer
- Adam Optimizer
Learning Rate
- 0.0001
Batch Size
- 128
Training Episodes
- 700
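As a rough size check on the network, the sketch below counts the trainable parameters of a stack of linear layers. It assumes the listed dims are the output widths of the four layers, that the input width is 4 (the size of the observation above), and that every layer has a bias; none of this is confirmed by the source.

```python
def linear_param_count(input_dim, layer_dims):
    """Count weights and biases in a chain of fully connected layers."""
    total = 0
    fan_in = input_dim
    for fan_out in layer_dims:
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
        fan_in = fan_out
    return total


# 4 observation features in, 2 action values out.
n_params = linear_param_count(4, [16, 32, 16, 2])
```

Under these assumptions the network has 1186 parameters.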

Results

Training Reward

Simulation

Policy Gradient

Implemented a Policy Gradient method (Actor-Critic) on the Gym Pendulum environment.

Observation Space

The observation is an ndarray with shape (3,) representing the x-y coordinates of the pendulum's free end and its angular velocity.

Num	Observation	Min	Max
0	x = cos(theta)	-1.0	1.0
1	y = sin(theta)	-1.0	1.0
2	Angular Velocity	-8.0	8.0

Action Space

The action is an ndarray with shape (1,) representing the torque applied to the free end of the pendulum.

Num	Action	Min	Max
0	Torque	-2.0	2.0

Reward Function

The reward is a function of theta, the angle made by the pendulum.
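For reference, the reward documented for Gym's Pendulum environment is sketched below, together with a helper that recovers theta from the cos/sin observation. It is assumed that the environment's reward is used unmodified; the function names are illustrative.

```python
import math


def angle_normalize(theta):
    """Wrap an angle into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Gym Pendulum reward: a negative cost on angle, velocity, and effort."""
    cost = (angle_normalize(theta) ** 2
            + 0.1 * theta_dot ** 2
            + 0.001 * torque ** 2)
    return -cost


def theta_from_obs(cos_theta, sin_theta):
    """Recover the angle from the first two observation components."""
    return math.atan2(sin_theta, cos_theta)


upright = pendulum_reward(0.0, 0.0, 0.0)      # best case: upright and still
hanging = pendulum_reward(math.pi, 0.0, 0.0)  # pendulum pointing straight down
```

The reward is never positive, and it is largest (zero) when the pendulum is upright, motionless, and no torque is applied.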

Hyperparameters

Network Architecture
- Actor
  - 4 Linear Layers of dim = [31,128,32,2]
- Critic
  - 4 Linear Layers of dim = [31,128,32,1]
Optimizer
- Adam Optimizer
Learning Rate
- 0.0005
Batch Size
- 64
Training Episodes
- 1200
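The way the actor and critic interact can be illustrated with the one-step TD advantage, a common choice in actor-critic methods. The source does not say which variant is used, and the discount factor is not listed for this section, so gamma=0.99 and the function name are assumptions.

```python
def td_advantage(reward, value, next_value, done, gamma=0.99):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    # No bootstrapping from a terminal state.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


# Critic estimates V(s) = 2.0 and V(s') = 3.0; the step reward is -1.0.
advantage = td_advantage(reward=-1.0, value=2.0, next_value=3.0,
                         done=False, gamma=0.5)
```

In this formulation the critic is trained to shrink the squared advantage, while the actor's log-probability of the chosen action is weighted by the advantage.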
