This project provides a modular framework for testing pre-trained reinforcement learning (RL) agents on the Highway-Env simulation environment.
The main goal is to discover failure-inducing driving scenarios (e.g., collisions) using different search algorithms.
It serves as the foundation for lab assignments where students will:
- Understand and modify a reinforcement-learning test environment.
- Implement and compare different search-based scenario generation algorithms (e.g., Random Search, Hill Climber).
- Analyze the agent’s robustness by reproducing and recording crash scenarios.
A pre-trained RL agent (trained using PPO) drives autonomously in a simulated highway.
The task is to automatically generate test scenarios (traffic density, spacing, initial lane, etc.) that reveal weaknesses in the policy.
Find scenario parameters that make the ego vehicle crash.
If no crash is found, the goal becomes finding scenarios that minimize the distance between the ego vehicle and other vehicles.
Scenario search changes parameters such as:
vehicles_count: number of cars on the roadlanes_count: number of lanesinitial_spacing: distance between vehiclesego_spacing: spacing between ego and surrounding trafficinitial_lane_id: initial lane position of the ego vehicle
testing-rl/
│
├── main.py # Entry point – runs a chosen search algorithm
│
├── config/
│ ├── __init__.py
│ └── search_space.py # Defines search-space & base configuration
│
├── envs/
│ ├── __init__.py
│ └── highway_env_utils.py # Environment setup, resets, and episode evaluation
│
├── policies/
│ ├── __init__.py
│ └── pretrained_policy.py # Loads a trained model and defines policy()
│
├── search/
│ ├── __init__.py
│ ├── base_search.py # Abstract class ScenarioSearch
│ ├── random_search.py # Implements RandomSearch
│ └── hill_climbing.py # YOUR TASK (template)
│
├── agents/ # Folder containing trained RL policies
│
├── videos/ # Recorded crash videos
│
└── requirements.txt # Python dependencies
-
Pretrained RL agent
A PPO agent trained onhighway-fast-v0with domain randomization. -
Scenario execution utilities (
envs/highway_env_utils.py)- Scenario application / reset
- Episode execution
- Collision detection
- Time-series logging (ego + other vehicles over time)
- Video recording utilities
-
Random Search baseline (
search/random_search.py)
A simple baseline that samples scenarios uniformly at random. -
Hill Climber template (
search/hill_climbing.py)
A template with TODO sections for students.
Do not change public function signatures, as they are used for grading.
- You are expected to implement your solution only by editing:
search/hill_climbing.py. - You may add helper code only inside
search/(e.g.,search/utils_hc.py) and import it fromsearch/hill_climbing.py. - Do not change the interface of any provided classes/functions (especially
ScenarioSearchandrun_search()), otherwise our grading scripts may fail.
git clone https://github.com/<your-repo>/testing-rl.git
cd testing-rl
python -m venv .venv
source .venv/bin/activate # (Mac/Linux)
# OR
.venv\\Scripts\\activate # (Windows)pip install -r requirements.txtThe key dependencies include:
gymnasium>=0.29.0highway-env>=1.8.2torch>=2.2.0stable-baselines3==2.3.0tqdm,numpy
The trained policy (PPO) files are stored in:
agents/
└── model.zip
└── vec_normalized.pkl
The file pretrained_policy.py handles loading the pre-trained PPO agent
Run the random search baseline from main.py:
python main.pyThis will:
- Load the pre-trained PPO policy.
- Sample multiple random highway scenarios.
- Run each configuration for several episodes.
- Record any crashes to
videos/as MP4 files.
run_episode(...) (defined in envs/highway_env_utils.py) returns:
crashed(bool): whether a collision occurred at any time during the episodetime_series(list[dict]): a list of frames (one per time step)
Each frame is a dictionary with (at least) the following keys:
frame["t"]: time step indexframe["crashed"]: whether a crash has occurred at that stepframe["ego"]: ego vehicle state, orNoneif unavailableframe["others"]: list of other vehicles (possibly empty)
A typical frame looks like:
{
"t": 12,
"crashed": False,
"ego": {
"pos": [x, y],
"speed": v,
"heading": h,
"length": L,
"width": W,
"lane_id": k,
},
"others": [
{"pos": [x, y], "length": L, "width": W, "lane_id": k},
...
]
}The exact objective definition is intentionally left open and must be explained and justified by the student.
Hints for computing objectives (fitness)
- A natural goal in this assignment is to identify failure-inducing scenarios, such as those leading to a collision.
- When no collision is observed, the time-series data produced during execution can be used to quantify how unsafe a scenario is, based solely on observable behavior (e.g., vehicle positions and interactions over time).
- Such objective values may capture notions of proximity, risk, or progress toward failure without requiring access to internal agent information.
The exact definition of the objective (fitness) function is intentionally left open and must be clearly described and justified in the report.
See envs/highway_env_utils.py for helper routines you can reuse, including:
make_env(...)andapply_scenario_config(...)for consistent environment setuprun_episode(...)to execute a scenario and collecttime_seriesrecord_video_episode(...)to replay a scenario and save an MP4 video undervideos/
For grading, you are expected to complete only the Hill Climbing implementation provided in:
search/hill_climbing.py
You may add helper functions/modules inside the search/ package (e.g., search/utils_hc.py) if you want, but:
- Do not change public function signatures in the provided files.
- Do not rename/move files that are imported by
main.py. - Your code must remain runnable via
python main.py.
We provide RandomSearch as a baseline in search/random_search.py. Your Hill Climber must follow the same general workflow:
- Generate or mutate a scenario configuration (based on the parameter space in
config/search_space.py). - Evaluate a scenario by executing it with the provided PPO policy (via
envs/highway_env_utils.py). - Use the observed outcome (collision + time-series signals) to guide search.
The Random Search baseline is already implemented. You can run it from main.py (see the code snippet below) to understand:
- how scenarios are sampled,
- how episodes are executed,
- how crashes are detected,
- and how videos are recorded.
from search.random_search import RandomSearch
from config.search_space import param_spec, base_cfg
from policies.pretrained_policy import load_pretrained_policy
from envs.highway_env_utils import make_env
def main():
env_id = "highway-fast-v0"
policy = load_pretrained_policy("agents/model")
env, defaults = make_env(env_id)
search = RandomSearch(env_id, base_cfg, param_spec, policy, defaults)
crashes = search.run_search(n_scenarios=30, n_eval=3, seed=11)
print(f"✅ Found {len(crashes)} crashes.")
if __name__ == "__main__":
main()This project is provided for educational use in the Software Engineering and Testing for AI Systems course (DSAIT4015).
Based on Highway-Env
and Stable-Baselines3.
Author: Annibale Panichella
Institution: TU Delft – SERG / CISELab