Using PPO implementation in custom environment #2

@chaubeyniha

Hi, thank you for writing this code. I found it extremely helpful as a beginner.
I have been using this implementation in a custom environment and I have a general question.

One of the hyperparameters is n_steps, the number of steps to run for each environment per update. I was wondering whether there is an inherent issue if my custom environment has a maximum of 250 steps per episode and loses reward as time passes.

Could this create a conflict, and would the agent learn less well as a result? I hope my question makes sense. Please do let me know.
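
To make the setup concrete, here is a minimal sketch of the situation I am describing. It is not code from this repository: `TimePenaltyEnv`, `collect_rollout`, the penalty value of -0.01, and the `n_steps=1000` used at the bottom are all hypothetical and chosen only for illustration. It assumes the rollout simply resets the environment whenever an episode ends, so one update's worth of steps can span several 250-step episodes.

```python
import random

MAX_EPISODE_STEPS = 250  # episode cap from my environment
TIME_PENALTY = -0.01     # assumed value, for illustration only


class TimePenaltyEnv:
    """Hypothetical stand-in for the custom environment."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the observation is just the step counter here

    def step(self, action):
        self.t += 1
        reward = TIME_PENALTY  # reward is lost on every step that passes
        done = self.t >= MAX_EPISODE_STEPS
        return self.t, reward, done


def collect_rollout(env, n_steps, seed=0):
    """Collect n_steps transitions, resetting whenever an episode ends."""
    rng = random.Random(seed)
    obs = env.reset()
    rewards, dones = [], []
    for _ in range(n_steps):
        action = rng.choice([0, 1])  # random policy as a placeholder
        obs, reward, done = env.step(action)
        rewards.append(reward)
        dones.append(done)
        if done:
            obs = env.reset()  # the rollout continues into a new episode
    return rewards, dones


rewards, dones = collect_rollout(TimePenaltyEnv(), n_steps=1000)
print(len(rewards), sum(dones))
```

In this sketch the rollout length and the episode length are independent: the `dones` flags mark where each episode ended inside the collected batch. My question is whether the implementation here treats those boundaries in a comparable way.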
