
Discounted reward calculation in PPO.py breaks when trajectory reaches max_ep_len in train.py #73

@CaoAnda

Description


In train.py, when a trajectory is cut off because it reaches max_ep_len, the last done value stored for that trajectory in ppo_agent.buffer.is_terminals is False, since the environment itself has not signalled termination.

PPO-PyTorch/train.py

Lines 173 to 181 in 728cce8

    for t in range(1, max_ep_len+1):

        # select action with policy
        action = ppo_agent.select_action(state)
        state, reward, done, _ = env.step(action)

        # saving reward and is_terminals
        ppo_agent.buffer.rewards.append(reward)
        ppo_agent.buffer.is_terminals.append(done)

This causes a problem in the update function of PPO.py: the running discounted_reward is only reset when is_terminal is True, so when the last is_terminal value of a trajectory is False, the returns computed for that trajectory absorb the discounted rewards of the following trajectory in the buffer (see the sketch after the snippet below).

PPO-PyTorch/PPO.py

Lines 200 to 208 in 728cce8

    def update(self):
        # Monte Carlo estimate of returns
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(self.buffer.rewards), reversed(self.buffer.is_terminals)):
            if is_terminal:
                discounted_reward = 0
            discounted_reward = reward + (self.gamma * discounted_reward)
            rewards.insert(0, discounted_reward)
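
To make the failure concrete, here is a minimal, standalone sketch using the same return loop as update(), with hypothetical reward values: the buffer holds two trajectories back to back, the first cut off at max_ep_len (its last flag is False) and the second terminated normally.

    gamma = 0.99

    # trajectory 1 (cut off at max_ep_len) followed by trajectory 2 (terminated normally)
    rewards      = [1.0, 1.0, 1.0, 5.0, 5.0]
    is_terminals = [False, False, False, False, True]

    returns = []
    discounted_reward = 0
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            discounted_reward = 0
        discounted_reward = reward + (gamma * discounted_reward)
        returns.insert(0, discounted_reward)

    print(returns)
    # [12.62..., 11.74..., 10.85..., 9.95, 5.0]
    # The first three returns belong to trajectory 1, but they include the
    # discounted 5.0 rewards of trajectory 2, because discounted_reward is
    # never reset at the truncation boundary.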

Solution:

    for t in range(1, max_ep_len+1):

        # select action with policy
        action = ppo_agent.select_action(state)
        state, reward, done, _ = env.step(action)

        # saving reward and is_terminals
        ppo_agent.buffer.rewards.append(reward)
        ppo_agent.buffer.is_terminals.append(True if t == max_ep_len else done)
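
With the truncation step flagged as terminal, the return of each trajectory only accumulates its own rewards. This does discard whatever reward the agent might have collected beyond the time limit, so the return estimate for truncated episodes is slightly pessimistic, but it stops returns from unrelated trajectories leaking into each other. Re-running the sketch above with the corrected flags (same hypothetical reward values) shows the effect:

    gamma   = 0.99
    rewards = [1.0, 1.0, 1.0, 5.0, 5.0]

    # the step that hit max_ep_len is now recorded as terminal
    is_terminals = [False, False, True, False, True]

    returns = []
    discounted_reward = 0
    for reward, is_terminal in zip(reversed(rewards), reversed(is_terminals)):
        if is_terminal:
            discounted_reward = 0
        discounted_reward = reward + (gamma * discounted_reward)
        returns.insert(0, discounted_reward)

    print(returns)
    # [2.9701, 1.99, 1.0, 9.95, 5.0] -- trajectory 1's returns no longer
    # contain trajectory 2's rewards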
