Pragnay/onnx scripts#411
Conversation
…of (x, y, heading) with the next action sampled by planner and the remain n_step - 1 actions as zero
There was a problem hiding this comment.
Pull request overview
Adds developer scripts to export a PufferDrive policy checkpoint to ONNX (optionally with a synthetic trajectory output) and to verify the exported ONNX model against saved example tensors.
Changes:
- Add
scripts/export_onnx.pyto export a Drive policy.ptcheckpoint to.onnx+ external weights, and save example input/output tensors. - Add optional
--fake_trajectoryONNX output that rolls out a jerk-dynamics trajectory from the argmax action. - Add
scripts/verify_onnx.pyto compare ONNX Runtime outputs against the saved PyTorch reference tensors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/export_onnx.py | Implements ONNX export pipeline and optional fake trajectory output via jerk rollout. |
| scripts/verify_onnx.py | Loads exported ONNX + example tensors and checks outputs with tolerances. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Ego features are the first slice of the observation (jerk model layout): | ||
| # obs[:, 3] = signed_speed / MAX_SPEED (MAX_SPEED = 100) | ||
| # obs[:, 5] = vehicle_length / MAX_VEH_LEN (proxy for wheelbase, ~5.5 m) | ||
| # obs[:, 7] = steering_angle / pi | ||
| # obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4) | ||
| # obs[:, 9] = a_lat / 4.0 | ||
| speed = observation[:, 3] * 100.0 | ||
| wheelbase = observation[:, 5] * 5.5 | ||
| steering_angle = observation[:, 7] * 3.14159265 | ||
| a_long_norm = observation[:, 8] | ||
| a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0) | ||
| a_lat = observation[:, 9] * 4.0 |
There was a problem hiding this comment.
OnnxWrapper.forward is extracting ego kinematics from the flattened observation using indices/scales that don’t match the Drive env observation layout. In Drive, jerk ego features are ordered as: speed_norm(0), width_norm(1), length_norm(2), steering_norm(3), a_long_norm(4), a_lat_norm(5), lane_dist(6), lane_angle(7), lane_speed_limit(8) (see pufferlib/ocean/drive/drive.h:3959-3984). Here speed = observation[:, 3] * 100.0 is actually using steering_norm, and wheelbase = observation[:, 5] * 5.5 is using a_lat_norm, which can produce invalid wheelbase (including 0) and cause divisions by zero/NaNs in the rollout. Please update the indices and denormalization constants to match the env (e.g., MAX_SPEED=40.0 and steering scale = STEERING_VALUES[8]=0.667), and derive wheelbase from length if needed (wheelbase is set as 0.6 * vehicle_length in Drive).
| # Ego features are the first slice of the observation (jerk model layout): | |
| # obs[:, 3] = signed_speed / MAX_SPEED (MAX_SPEED = 100) | |
| # obs[:, 5] = vehicle_length / MAX_VEH_LEN (proxy for wheelbase, ~5.5 m) | |
| # obs[:, 7] = steering_angle / pi | |
| # obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4) | |
| # obs[:, 9] = a_lat / 4.0 | |
| speed = observation[:, 3] * 100.0 | |
| wheelbase = observation[:, 5] * 5.5 | |
| steering_angle = observation[:, 7] * 3.14159265 | |
| a_long_norm = observation[:, 8] | |
| a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0) | |
| a_lat = observation[:, 9] * 4.0 | |
| # Drive jerk ego features are ordered as: | |
| # obs[:, 0] = speed_norm (MAX_SPEED = 40.0) | |
| # obs[:, 1] = width_norm | |
| # obs[:, 2] = length_norm (MAX_VEH_LENGTH ~= 5.5 m) | |
| # obs[:, 3] = steering_norm (STEERING_VALUES[8] = 0.667 rad) | |
| # obs[:, 4] = a_long_norm (asymmetrically normalised: neg→/15, pos→/4) | |
| # obs[:, 5] = a_lat_norm (/4.0) | |
| # Wheelbase is derived in Drive as 0.6 * vehicle_length. | |
| speed = observation[:, 0] * 40.0 | |
| vehicle_length = observation[:, 2] * 5.5 | |
| wheelbase = vehicle_length * 0.6 | |
| steering_angle = observation[:, 3] * 0.667 | |
| a_long_norm = observation[:, 4] | |
| a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0) | |
| a_lat = observation[:, 5] * 4.0 |
| B = logits.shape[0] | ||
| action = logits.argmax(dim=-1) | ||
| jerk_long_val = self.jerk_long_table[action // NUM_LAT] | ||
| jerk_lat_val = self.jerk_lat_table[action % NUM_LAT] | ||
|
|
There was a problem hiding this comment.
JerkDynamicsRollout assumes the action space is the 12-way jerk joint action (4×3) when it maps action // NUM_LAT and action % NUM_LAT into jerk_{long,lat}_table. If the policy was trained with dynamics_model=classic (63 logits) and --fake_trajectory is used, this will index past the end of jerk_long_table and produce incorrect results or an error. Please enforce fake_trajectory only for jerk dynamics (e.g., check config/env at export time or assert logits.shape[1] == 12 in the wrapper) and fail fast with a clear error message.
| # 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos, | ||
| # yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features) | ||
| # Use physically reasonable normalised values. | ||
| ego = np.zeros((batch_size, env.ego_features), dtype=np.float32) | ||
| ego[:, 0] = 0.3 # speed ~30% of max | ||
| ego[:, 1] = 0.05 # small accel | ||
| ego[:, 2] = 0.5 # heading sin | ||
| ego[:, 3] = 0.866 # heading cos (≈30°) | ||
| ego[:, 4] = 0.02 # slight yaw rate | ||
| if env.ego_features > 5: | ||
| ego[:, 5] = 0.01 # steer | ||
| if env.ego_features > 6: | ||
| ego[:, 6] = 0.0 # jerk | ||
| if env.ego_features > 7: | ||
| ego[:, 7] = 0.2 # progress along route | ||
| if env.ego_features > 8: | ||
| ego[:, 8] = 0.6 # dist to goal |
There was a problem hiding this comment.
_make_example_input’s ego-feature comment and assignments don’t match the actual Drive observation definition for jerk dynamics. In Drive the first 9 values are (speed, width, length, steering, a_long, a_lat, lane_dist, lane_angle_cos, lane_speed_limit) (see pufferlib/ocean/drive/drive.h:3959-3984), but this function treats them as heading sin/cos, yaw_rate, etc. This makes the “plausible” example input misleading and (once the wrapper’s index/scales are fixed) will also stop being a good sanity-check for trajectory rollout. Please update the comment and populate the correct ego fields (including steering normalized by 0.667 and speed normalized by MAX_SPEED=40.0).
| # 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos, | |
| # yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features) | |
| # Use physically reasonable normalised values. | |
| ego = np.zeros((batch_size, env.ego_features), dtype=np.float32) | |
| ego[:, 0] = 0.3 # speed ~30% of max | |
| ego[:, 1] = 0.05 # small accel | |
| ego[:, 2] = 0.5 # heading sin | |
| ego[:, 3] = 0.866 # heading cos (≈30°) | |
| ego[:, 4] = 0.02 # slight yaw rate | |
| if env.ego_features > 5: | |
| ego[:, 5] = 0.01 # steer | |
| if env.ego_features > 6: | |
| ego[:, 6] = 0.0 # jerk | |
| if env.ego_features > 7: | |
| ego[:, 7] = 0.2 # progress along route | |
| if env.ego_features > 8: | |
| ego[:, 8] = 0.6 # dist to goal | |
| # 1. Ego state — Drive jerk dynamics: | |
| # [speed, width, length, steering, a_long, a_lat, | |
| # lane_dist, lane_angle_cos, lane_speed_limit] (first 9 features) | |
| # Use physically reasonable values, with speed normalised by MAX_SPEED=40.0 | |
| # and steering normalised by 0.667. | |
| ego = np.zeros((batch_size, env.ego_features), dtype=np.float32) | |
| ego[:, 0] = 12.0 / 40.0 # speed: 12 m/s | |
| ego[:, 1] = 1.8 # width (m) | |
| ego[:, 2] = 4.5 # length (m) | |
| ego[:, 3] = 0.1 / 0.667 # steering: small steering angle | |
| ego[:, 4] = 0.5 # a_long: mild longitudinal acceleration | |
| if env.ego_features > 5: | |
| ego[:, 5] = 0.1 # a_lat: slight lateral acceleration | |
| if env.ego_features > 6: | |
| ego[:, 6] = 0.0 # lane_dist: centered in lane | |
| if env.ego_features > 7: | |
| ego[:, 7] = 0.995 # lane_angle_cos: nearly aligned with lane | |
| if env.ego_features > 8: | |
| ego[:, 8] = 25.0 / 40.0 # lane_speed_limit: 25 m/s |
| # Waypoints: alternating x/y relative goal positions | ||
| target_start = coef_dim | ||
| target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES)) | ||
| for i in range(env.target_dim // target_features_per_wp): | ||
| base = target_start + i * target_features_per_wp | ||
| cond[:, base] = 0.3 + i * 0.15 # x progress | ||
| cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset | ||
| if target_features_per_wp > 2: |
There was a problem hiding this comment.
The target-waypoint construction in _make_example_input computes target_features_per_wp using binding.STATIC_TARGET_FEATURES regardless of target_type, which misinterprets env.target_dim for dynamic targets (5 features/wp). For target_type="dynamic", this ends up generating the wrong number of waypoints/features (and populating fields that don’t correspond to the model’s expected conditioning layout). Please base target_features_per_wp (and loop bounds) on whether the env is using static vs dynamic targets so the example observation matches the real conditioning format.
| # Waypoints: alternating x/y relative goal positions | |
| target_start = coef_dim | |
| target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES)) | |
| for i in range(env.target_dim // target_features_per_wp): | |
| base = target_start + i * target_features_per_wp | |
| cond[:, base] = 0.3 + i * 0.15 # x progress | |
| cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset | |
| if target_features_per_wp > 2: | |
| # Waypoints: build example targets using the active target schema | |
| target_start = coef_dim | |
| target_type = getattr(env, "target_type", "static") | |
| if target_type == "dynamic": | |
| target_features_per_wp = max(1, getattr(binding, "DYNAMIC_TARGET_FEATURES", 5)) | |
| else: | |
| target_features_per_wp = max(1, binding.STATIC_TARGET_FEATURES) | |
| num_target_wps = env.target_dim // target_features_per_wp | |
| for i in range(num_target_wps): | |
| base = target_start + i * target_features_per_wp | |
| cond[:, base] = 0.3 + i * 0.15 # x progress | |
| cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset | |
| if target_type != "dynamic" and target_features_per_wp > 2: |
Added onnx scripts with fake action trajectory outputs. The first action is from the planner, and the rest of them are zeros