Hi,
Currently, I have run 4 algorithms from stable-baselines on the Roboschool HumanoidFlagrunHarder task. My evaluation metric is the mean reward over 100 episodes. In short: PPO2 performs very well, A2C reaches a mean reward of 500, DDPG stays around 0, and SAC reaches about 280. I have looked through stable-baselines-zoo for tuned hyperparameters for A2C, DDPG, and SAC, but could only find a SAC config for the Bullet Humanoid environment (which is quite close to Roboschool HumanoidFlagrunHarder). Do you have any hyperparameter suggestions for A2C, DDPG, and SAC on this task? I used 400M timesteps for the on-policy methods and 20M for the off-policy methods. It would be nice if tuned settings for this task were added to the zoo.
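For reference, here is the kind of starting point I mean for SAC, assuming the zoo's Bullet Humanoid settings transfer to this task. All values below are illustrative assumptions copied from common SAC defaults, not a tuned recommendation:

```python
# Hypothetical SAC starting point for RoboschoolHumanoidFlagrunHarder-v1,
# loosely based on the zoo's SAC config for the Bullet Humanoid env.
# These values are assumptions, not tuned results for this task.
sac_hyperparams = {
    "learning_rate": 3e-4,
    "buffer_size": 1_000_000,
    "batch_size": 256,
    "ent_coef": "auto",        # automatic entropy-coefficient tuning
    "train_freq": 1,
    "gradient_steps": 1,
    "learning_starts": 10_000,
}

# Usage sketch (requires stable-baselines and roboschool to be installed):
# from stable_baselines import SAC
# import gym, roboschool
# env = gym.make("RoboschoolHumanoidFlagrunHarder-v1")
# model = SAC("MlpPolicy", env, **sac_hyperparams)
# model.learn(total_timesteps=int(2e7))
```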
Thanks.