Hi,
Currently, I have run 4 algorithms from stable-baselines on the Roboschool HumanoidFlagrunHarder task. My evaluation metric is the mean reward over 100 episodes. In short: PPO2 performs very well, A2C reaches a mean reward of 500, DDPG stays around 0, and SAC reaches about 280. I have looked through stable-baselines-zoo for tuned hyperparameters for A2C, DDPG, and SAC, but could only find a SAC config for the Bullet Humanoid environment (which is quite close to Roboschool HumanoidFlagrunHarder). Do you have any hyperparameter suggestions for A2C, DDPG, and SAC on this task? I used 400M timesteps for the on-policy methods and 20M for the off-policy methods. It would be nice if tuned settings for this task were added to the zoo.
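For reference, here is the kind of starting point I mean for SAC, assuming the zoo's Bullet Humanoid settings transfer to this task. All values below are illustrative assumptions copied from common SAC defaults, not a tuned recommendation:

```python
# Hypothetical SAC starting point for RoboschoolHumanoidFlagrunHarder-v1,
# loosely based on the zoo's SAC config for the Bullet Humanoid env.
# These values are assumptions, not tuned results for this task.
sac_hyperparams = {
    "learning_rate": 3e-4,
    "buffer_size": 1_000_000,
    "batch_size": 256,
    "ent_coef": "auto",        # automatic entropy-coefficient tuning
    "train_freq": 1,
    "gradient_steps": 1,
    "learning_starts": 10_000,
}

# Usage sketch (requires stable-baselines and roboschool to be installed):
# from stable_baselines import SAC
# import gym, roboschool
# env = gym.make("RoboschoolHumanoidFlagrunHarder-v1")
# model = SAC("MlpPolicy", env, **sac_hyperparams)
# model.learn(total_timesteps=int(2e7))
```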
Thanks.