[BENCHMARK] Reproduce MuJoCo Playground baselines #180

### Describe the benchmarking experiment/task

This task reproduces the benchmark results for two common reinforcement learning algorithms, PPO and SAC, on standard **MuJoCo Playground** environments. The goal is to validate that our implementation is correct and performs on par with the established baselines reported in the paper.

The experimental design is as follows:

1.  **Select Algorithms**: PPO and SAC.
2.  **Select Environments**: At least two environments from the standard MuJoCo Playground suite.
3.  **Run Trials**: For each algorithm-environment pair, run training with at least **5 different random seeds**.
4.  **Training Duration**: Use the number of training steps reported in the paper for each environment.
5.  **Log Metrics**: Log the episodic return, episode length, and any relevant algorithm-specific metrics (e.g., actor/critic loss) against the environment timestep.
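
The design above amounts to a grid of algorithm, environment, and seed combinations. A minimal sketch of enumerating that grid is shown below. The environment names, the seed values, and the `run_name` format are hypothetical placeholders for illustration, not choices made in this issue.

```python
from itertools import product

ALGORITHMS = ("ppo", "sac")
# Hypothetical placeholders: substitute the environments actually selected.
ENVIRONMENTS = ("env_a", "env_b")
# At least 5 distinct seeds per algorithm-environment pair (values are an assumption).
SEEDS = tuple(range(5))


def build_run_grid(algorithms=ALGORITHMS, environments=ENVIRONMENTS, seeds=SEEDS):
    """Return one config dict per (algorithm, environment, seed) combination."""
    return [
        {"algo": a, "env": e, "seed": s, "run_name": f"{a}__{e}__seed{s}"}
        for a, e, s in product(algorithms, environments, seeds)
    ]


runs = build_run_grid()
print(len(runs))  # 2 algorithms x 2 environments x 5 seeds = 20
```

Giving every run a unique name up front makes it easier to spot missing or duplicated seeds when the results are aggregated later.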

### Hypothesis/expected behavior or outcome

We expect our implementations of PPO and SAC to reach a final mean episodic return that is **within 5-10%** of the score reported by the chosen reference library (e.g., CleanRL) for the corresponding MuJoCo environment after X training steps. The learning curves from our runs should show a trend and stability similar to the reference curves.
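
One way to make "within 5-10%" concrete is a relative gap between our final mean return and the reference score. The sketch below assumes a symmetric tolerance and divides by the absolute value of the reference so that negative returns are handled; both are assumptions, and the function names are hypothetical. If only underperformance should count as a failure, the check would compare the signed gap instead.

```python
def relative_gap(ours: float, reference: float) -> float:
    """Signed relative difference of our return versus the reference score."""
    if reference == 0:
        raise ValueError("reference return must be non-zero")
    # abs() in the denominator keeps the sign meaningful for negative returns.
    return (ours - reference) / abs(reference)


def within_tolerance(ours: float, reference: float, tol: float = 0.10) -> bool:
    """True if our return deviates from the reference by at most `tol` (symmetric)."""
    return abs(relative_gap(ours, reference)) <= tol


# Illustrative numbers only, not measured results.
print(within_tolerance(950.0, 1000.0))  # True: 5% below the reference
print(within_tolerance(850.0, 1000.0))  # False: 15% below the reference
```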

### Definition of done

This benchmark is considered "done" when:

1.  Experiments for both PPO and SAC have completed successfully on the selected environments, with at least 5 seeds each.
2.  The performance data has been aggregated and plotted, showing the mean and standard deviation of episodic returns across seeds.
3.  The final mean return for each experiment is confirmed to be within the acceptable 10% margin of the reference score.
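
The aggregation in item 2 can be sketched with the standard library alone. This assumes every seed logs its episodic return at the same evaluation points and uses the sample standard deviation (n - 1 in the denominator); neither is specified in this issue, and `aggregate_across_seeds` is a hypothetical helper name.

```python
from statistics import mean, stdev


def aggregate_across_seeds(curves):
    """Compute per-timestep mean and sample std over seeds.

    `curves` is a list with one entry per seed; each entry is a list of
    episodic returns logged at the same evaluation points.
    """
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all seeds must be logged at the same evaluation points")
    # zip(*curves) regroups the data so each tuple holds one timestep across seeds.
    per_step = list(zip(*curves))
    means = [mean(step) for step in per_step]
    # stdev needs at least two samples; fall back to 0.0 for a single seed.
    stds = [stdev(step) if len(step) > 1 else 0.0 for step in per_step]
    return means, stds


# Illustrative numbers only: 3 seeds, 2 evaluation points.
means, stds = aggregate_across_seeds([[0.0, 10.0], [2.0, 14.0], [4.0, 12.0]])
print(means, stds)
```

The last entry of `means` is the final mean return that item 3 compares against the reference score.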

### Mandatory checklist before benchmarking is complete

  * [ ] Experiment is documented - hyperparameters, plots, conclusions/findings etc. are available in a final report.
  * [ ] Link experiment/benchmarking (optional).


