Using PPO implementation in custom environment #2

Hi, thank you for writing this code. I found it extremely helpful as a beginner.
I have been using this implementation in a custom environment, and I have a general question.

One of the hyperparameters is n_steps, the number of steps to run for each environment per update. Is there an inherent issue if my custom environment has a maximum of 250 steps per episode and penalizes the reward as time passes?

Could this create a conflict, and would the agent learn less effectively as a result? I hope my question makes sense. Please let me know.
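
To make the setup concrete, here is a minimal sketch of the situation I am asking about. It is not taken from this repository: `TimedEnv`, `collect_rollout`, the penalty value, and the n_steps value of 600 are all hypothetical, and I am only assuming that rollouts are collected roughly like this, with the environment reset whenever an episode ends partway through the n_steps.

```python
import random

MAX_EPISODE_STEPS = 250  # episode cap in my custom environment
STEP_PENALTY = -0.01     # hypothetical per-step time penalty
N_STEPS = 600            # hypothetical n_steps, longer than one episode


class TimedEnv:
    """Toy stand-in for my environment (hypothetical, not the real one)."""

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= MAX_EPISODE_STEPS  # episode ends at the step cap
        return float(self.t), STEP_PENALTY, done


def collect_rollout(env, n_steps):
    """Collect exactly n_steps transitions, resetting when an episode ends."""
    rewards, dones = [], []
    obs = env.reset()
    for _ in range(n_steps):
        action = random.choice([0, 1])  # placeholder for the policy
        obs, reward, done = env.step(action)
        rewards.append(reward)
        dones.append(done)
        if done:
            obs = env.reset()  # rollout continues into a new episode
    return rewards, dones


rewards, dones = collect_rollout(TimedEnv(), N_STEPS)
print("transitions collected:", len(rewards))
print("episodes finished inside the rollout:", sum(dones))
print("rollout ends mid-episode:", not dones[-1])
```

In this sketch a single rollout spans two full episodes plus part of a third, so my question is whether an update built from a buffer like this handles the episode boundaries and the time penalty correctly.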