This example demonstrates training a PPO agent on the Unitree H1 robot using a DeepMimic‑style reward to imitate a dataset of human walking trajectories in various directions.
The example leverages the `GoalTrajMimic` goal vector, which encodes target positions, orientations, and velocities of both site and joint states from the expert dataset. The reward function defined in the configuration, `MimicReward`, implements a DeepMimic-style reward. By the end of training, the agent can walk in multiple directions and rotate around its z-axis.
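For intuition, here is a minimal sketch of the DeepMimic-style reward shape: a weighted sum of exponentiated tracking errors between the agent's state and the reference motion. The weights and exponential scales below follow the original DeepMimic paper (Peng et al., 2018); the exact terms and values used by `MimicReward` may differ:

```python
import numpy as np

def deepmimic_reward(q, q_ref, dq, dq_ref, ee, ee_ref, com, com_ref):
    """DeepMimic-style imitation reward: a weighted sum of exponentiated
    tracking errors. Weights/scales are those of the original DeepMimic
    paper; they are illustrative, not necessarily those of MimicReward."""
    r_pose = np.exp(-2.0 * np.sum((q_ref - q) ** 2))      # joint positions
    r_vel = np.exp(-0.1 * np.sum((dq_ref - dq) ** 2))     # joint velocities
    r_ee = np.exp(-40.0 * np.sum((ee_ref - ee) ** 2))     # end-effector (site) positions
    r_com = np.exp(-10.0 * np.sum((com_ref - com) ** 2))  # center-of-mass position
    return 0.65 * r_pose + 0.1 * r_vel + 0.15 * r_ee + 0.1 * r_com
```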
To train the agent, run:

```bash
python experiment.py
```

This command will:
- Train the PPO agent on the Unitree H1 robot for 300 million steps (approximately 36 minutes on an RTX 3080 Ti). To get even smoother policies, train for longer!
- Save the trained agent (as `GAILJax_saved.pkl` in the `outputs` folder).
- Perform a final rendering of the trained policy.
- Save a video of the rendering to the `LocoMuJoCo_recordings/` directory.
- Upload the video to Weights & Biases (WandB) for further analysis (check the command-line logs for details).
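If you want to inspect the saved agent outside of `eval.py`, a plain pickle load is usually enough. This is a sketch assuming the file is a standard pickle; the type of the resulting object depends on the training setup:

```python
import pickle

# Assumes the agent was serialized with the standard pickle protocol.
with open("outputs/GAILJax_saved.pkl", "rb") as f:
    agent = pickle.load(f)
print(type(agent))
```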
Throughout training, the agent will be evaluated using various trajectory-based metrics, including Euclidean distance, Dynamic Time Warping (DTW), and discrete Fréchet distance. These metrics will be computed on different entities such as joint positions, joint velocities, and site positions and orientations. All results will be logged to Weights & Biases (WandB).
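To make the metrics concrete, here is a minimal sketch of two of them, mean Euclidean distance and dynamic time warping, computed between an expert and a policy trajectory of shape `(T, D)` (e.g. joint positions over time). The discrete Fréchet distance is omitted for brevity, and the actual evaluation code may differ:

```python
import numpy as np

def mean_euclidean(traj_a, traj_b):
    # Assumes equal-length trajectories of shape (T, D).
    return np.mean(np.linalg.norm(traj_a - traj_b, axis=-1))

def dtw_distance(traj_a, traj_b):
    # Classic O(T_a * T_b) dynamic time warping over Euclidean
    # frame distances; the trajectories may have different lengths.
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]
```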
To evaluate the trained agent, run:

```bash
python eval.py --path path/to/agent_file
```

If you'd like to evaluate the agent using MuJoCo (instead of Mjx), run:

```bash
python eval.py --path path/to/agent_file --use_mujoco
```
⚠️ Note: Evaluating with MuJoCo may not yield results as robust as with Mjx due to simulator differences. For reliable policy transfer between the two, consider applying domain randomization techniques.

Feel free to reach out for links to the dataset, or for more details about the environment or architecture!
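As a starting point for the domain randomization mentioned in the note above, here is a minimal sketch that perturbs geom friction and body masses of a MuJoCo model between episodes. The ranges are illustrative; the field names come from the standard `mujoco` Python bindings:

```python
import numpy as np
import mujoco

def randomize_model(model: mujoco.MjModel, rng: np.random.Generator):
    # Scale sliding friction of every geom by up to +/-20% (illustrative range).
    model.geom_friction[:, 0] *= rng.uniform(0.8, 1.2, size=model.ngeom)
    # Scale body masses by up to +/-10% (skip the world body at index 0).
    model.body_mass[1:] *= rng.uniform(0.9, 1.1, size=model.nbody - 1)

# Example usage: apply to a freshly loaded model before each episode,
# so that the scale factors do not compound across calls.
# model = mujoco.MjModel.from_xml_path("h1.xml")  # hypothetical path
# randomize_model(model, np.random.default_rng(0))
```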