A PyTorch implementation of the Soft Actor-Critic (SAC) algorithm for training an agent in the MuJoCo Humanoid environment.
You can clone the repository and install the required dependencies using Poetry or pip. This project requires Python 3.13.
# 1. Clone the repository
git clone https://github.com/giansimone/sac-mujoco-humanoid.git
cd sac-mujoco-humanoid
# 2. Initialize environment and install dependencies
poetry env use python3.13
poetry install
# 3. Activate the shell
eval $(poetry env activate)

Alternatively, with pip:
# 1. Clone the repository
git clone https://github.com/giansimone/sac-mujoco-humanoid.git
cd sac-mujoco-humanoid
# 2. Create and activate virtual environment
python3.13 -m venv venv
source venv/bin/activate
# 3. Install package in editable mode
pip install -e .

The repository is laid out as follows:
sac-mujoco-humanoid/
├── sac_mujoco_humanoid/
│ ├── __init__.py
│ ├── agent.py # SAC implementation (Actor/Critic)
│ ├── buffer.py # Replay Buffer
│ ├── config.yaml # Training hyperparameters
│ ├── environment.py # Gym environment wrappers
│ ├── enjoy.py # Evaluation script
│ ├── export.py # Hugging Face export script
│ ├── model.py # PyTorch Network definitions
│ ├── train.py # Main training loop
│ └── utils.py
├── .gitignore
├── LICENSE
├── README.md
└── pyproject.toml

Ensure you are in the sac_mujoco_humanoid source directory, where config.yaml is located, before running these commands.
cd sac_mujoco_humanoid

Train a SAC agent with the default configuration.
Note: The Replay Buffer pre-allocates memory. Ensure your system has at least 8GB of RAM available.
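To see where a RAM figure of this order comes from, here is a rough back-of-the-envelope sketch. It is not the project's buffer code: it assumes the buffer stores (observation, action, reward, next observation, done) as float32, and it assumes the default Humanoid-v5 shapes of 348 observation values and 17 action values. The function name `replay_buffer_bytes` is hypothetical; check `env.observation_space` and buffer.py for the real shapes and dtypes.

```python
def replay_buffer_bytes(capacity, obs_dim, act_dim, bytes_per_value=4):
    """Estimate bytes needed to pre-allocate a replay buffer.

    Assumes each transition stores obs, next_obs, action, reward and done,
    all with the same element size (4 bytes for float32).
    """
    # obs + next_obs + action + one value each for reward and done
    values_per_transition = 2 * obs_dim + act_dim + 2
    return capacity * values_per_transition * bytes_per_value


# capacity matches buffer_size in config.yaml; 348 and 17 are assumed shapes
estimate = replay_buffer_bytes(capacity=1_000_000, obs_dim=348, act_dim=17)
print(f"Estimated buffer size: {estimate / 1e9:.2f} GB")
```

Under these assumptions the buffer alone takes a few gigabytes, before counting the networks, the simulator, and Python overhead. Storing observations as float64 would roughly double the estimate.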
python -m train

Edit config.yaml to customise training parameters.
#Environment
env_name: Humanoid-v5
#Network Architecture
hidden_dim: 256
#Training
total_steps: 2_000_000
buffer_size: 1_000_000
batch_size: 256
start_steps: 10_000
updates_per_step: 1
#SAC Agent
lr: 0.0003
gamma: 0.99
tau: 0.005
alpha: 0.2
auto_tune_alpha: True
#Logging
log_dir: runs/
#System
seed: 42

Watch a trained agent by running the enjoy script. Point the artifact argument to your saved model file.
python -m enjoy \
--artifact runs/sac_Humanoid-v5_YYYY-MM-DD_HHhMMmSSs/final_model.pt \
--num-episodes 5

Share your trained model, config, and a replay video to the Hugging Face Hub.
python -m export \
--username YOUR_HF_USERNAME \
--repo-name sac-mujoco-humanoid \
--artifact-path runs/sac_Humanoid-v5_YYYY-MM-DD_HHhMMmSSs/final_model.pt \
--movie-fps 30 \
--n-eval 10

This will automatically:
- Upload the model weights and config.
- Generate a model card with evaluation metrics (Mean Reward +/- Std).
- Record and upload a video of the agent.
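
The "Mean Reward +/- Std" metric on the model card can be illustrated with a small sketch. This is not the export script's code: the helper name `summarize_returns` and the sample returns are hypothetical, and it assumes the population standard deviation (the convention used by `numpy.std` by default); the actual script may compute it differently.

```python
import statistics


def summarize_returns(episode_returns):
    """Return (mean, std) over a list of per-episode total rewards."""
    mean = statistics.fmean(episode_returns)
    # Population standard deviation; an assumption, not confirmed by the source
    std = statistics.pstdev(episode_returns)
    return mean, std


# Made-up returns for illustration only, not real evaluation results
returns = [1.0, 2.0, 3.0, 4.0]
mean, std = summarize_returns(returns)
print(f"Mean Reward: {mean:.2f} +/- {std:.2f}")
```

With --n-eval 10, the list would hold ten episode returns instead of the four placeholder values used here.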
This project is licensed under the MIT License. See the LICENSE file for details.