Description
I've read your paper and have been using the MuJoCo Playground to test my algorithms. Thank you for your great work. From the report, I see that the Brax framework was used for training and evaluation, with results reported across environments. I have two questions:
Were the Brax hyperparameters tuned separately for each environment? I noticed that the hyperparameters vary across environments, yet for the dm_control environments only two sets were shared: one for PPO and one for SAC.
Regarding the PPO agents: do the maximum achievable returns per episode vary significantly across environments? In some cases returns reach around 900–1000, but in others Brax seems to struggle; for example, HopperHop wasn't solved, FingerSpin reaches ~600, and PendulumSwingup only ~50. Is there a standard expected return for each environment (e.g., around 1000 for each dm_control task)? If so, could these differences be due to insufficient hyperparameter tuning, or are they expected?
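For context on why I'd expect roughly 1000: my understanding (an assumption on my part, not something stated in the report) is that dm_control suite tasks bound per-step rewards in [0, 1] and run episodes of 1000 control steps, which caps the per-episode return. A trivial back-of-envelope sketch:

```python
# Back-of-envelope cap on a dm_control episode return, assuming the suite's
# usual convention: per-step rewards in [0, 1], 1000 control steps per episode.
STEPS_PER_EPISODE = 1000
MAX_REWARD_PER_STEP = 1.0

max_return = STEPS_PER_EPISODE * MAX_REWARD_PER_STEP
print(max_return)  # → 1000.0
```

Under that assumption, ~50 on PendulumSwingup would mean the agent is far from the ceiling, which is why I'm asking whether this gap is a tuning issue or expected behavior.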