Open
Labels
documentation, enhancement
Description
Describe the benchmarking experiment/task
This task reproduces published benchmark results for two common reinforcement learning algorithms, PPO and SAC, on standard MuJoCo Playground environments. The goal is to validate that our implementations are correct and perform on par with the established baselines reported in the paper.
The experimental design is as follows:
- Select Algorithms: PPO and SAC.
- Select Environments: At least two from the standard suite.
- Run Trials: For each algorithm-environment pair, run training with at least 5 different random seeds.
- Training Duration: Match the number of training steps used in the paper (to be confirmed against the paper).
- Log Metrics: Log the episodic return, episode length, and any relevant algorithm-specific metrics (e.g., actor/critic loss) against the environment timestep.
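The design above amounts to a grid of algorithm, environment, and seed combinations. A minimal sketch of enumerating that grid follows; the environment names and the `build_run_configs` helper are placeholders for illustration, not names from our codebase or from MuJoCo Playground.

```python
from itertools import product

ALGORITHMS = ["PPO", "SAC"]
# Placeholder names: the issue does not fix which environments to use.
ENVIRONMENTS = ["env_a", "env_b"]
# Five seeds is the minimum required by the experimental design.
SEEDS = [0, 1, 2, 3, 4]


def build_run_configs(algorithms, environments, seeds):
    """Return one config dict per (algorithm, environment, seed) combination."""
    return [
        {"algorithm": algo, "environment": env, "seed": seed}
        for algo, env, seed in product(algorithms, environments, seeds)
    ]


run_configs = build_run_configs(ALGORITHMS, ENVIRONMENTS, SEEDS)
print(len(run_configs))  # 2 algorithms * 2 environments * 5 seeds = 20
```

Each config would then be handed to whatever training entry point we use, so that every run is traceable to its seed.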
Hypothesis/expected behavior or outcome
We expect our PPO and SAC implementations to reach a final mean episodic return within 10% (ideally within 5%) of the score reported by the chosen reference library (e.g., CleanRL) for the corresponding MuJoCo environment after X training steps. The learning curves from our runs should show a trend and stability similar to the reference curves.
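One way to make the tolerance check concrete is a relative-deviation test against the reference score. The sketch below assumes the margin is measured relative to the absolute value of the reference return (returns can be negative in some environments); `within_margin` is a hypothetical helper and the numbers are illustrative, not measured results.

```python
def within_margin(our_return, reference_return, margin=0.10):
    """Check whether our_return deviates from reference_return by at most `margin`.

    The deviation is relative to |reference_return|, which is an assumption
    about how the percentage in the hypothesis should be interpreted.
    """
    if reference_return == 0:
        raise ValueError("relative margin is undefined for a reference return of 0")
    relative_deviation = abs(our_return - reference_return) / abs(reference_return)
    return relative_deviation <= margin


# Illustrative values only.
print(within_margin(950.0, 1000.0))  # True: 5% deviation
print(within_margin(850.0, 1000.0))  # False: 15% deviation
```

Note that this check is symmetric: a run that beats the reference by more than the margin also fails, which may or may not be what we want to flag.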
Definition of done
This benchmark is considered "done" when:
- Experiments for both PPO and SAC have been completed successfully on the selected environments, with at least 5 seeds each.
- The performance data has been aggregated and plotted, showing the mean and standard deviation of episodic returns across seeds.
- The final mean return for each experiment is confirmed to be within the acceptable 10% margin of the reference score.
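The aggregation step could look like the sketch below. It assumes each seed's episodic returns have already been resampled onto the same evaluation timesteps, and it uses the sample standard deviation (n - 1 in the denominator); both are assumptions, and `aggregate_across_seeds` is a hypothetical helper. The input values are illustrative.

```python
from statistics import mean, stdev


def aggregate_across_seeds(curves):
    """Compute per-timestep mean and sample std over a list of per-seed curves.

    `curves` holds one list of episodic returns per seed; all lists must have
    the same length and be aligned on the same timesteps.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds to compute a standard deviation")
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all seeds must be logged at the same timesteps")
    per_timestep = list(zip(*curves))  # regroup values by timestep
    means = [mean(values) for values in per_timestep]
    stds = [stdev(values) for values in per_timestep]
    return means, stds


# Three seeds, two evaluation points each (illustrative values only).
curves = [[0.0, 10.0], [2.0, 14.0], [4.0, 12.0]]
means, stds = aggregate_across_seeds(curves)
print(means)  # [2.0, 12.0]
print(stds)   # [2.0, 2.0]
```

The last entry of `means` is the final mean return that would be compared against the reference score, and the two lists together give the mean line and shaded band for the learning-curve plot.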
Mandatory checklist before benchmarking is complete
- Experiment is documented - hyperparameters, plots, conclusions/findings etc. are available in a final report.
- Link to the experiment/benchmarking run (optional).