Skip to content

Pragnay/onnx scripts#411

Open
mpragnay wants to merge 3 commits intoemerge/temp_trainingfrom
pragnay/onnx_scripts
Open

Pragnay/onnx scripts#411
mpragnay wants to merge 3 commits intoemerge/temp_trainingfrom
pragnay/onnx_scripts

Conversation

@mpragnay
Copy link
Copy Markdown

Added onnx scripts with fake action trajectory outputs. The first action is from the planner, and the rest of them are zeros

…of (x, y, heading) with the next action sampled by planner and the remain n_step - 1 actions as zero
Copilot AI review requested due to automatic review settings April 28, 2026 04:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds developer scripts to export a PufferDrive policy checkpoint to ONNX (optionally with a synthetic trajectory output) and to verify the exported ONNX model against saved example tensors.

Changes:

  • Add scripts/export_onnx.py to export a Drive policy .pt checkpoint to .onnx + external weights, and save example input/output tensors.
  • Add optional --fake_trajectory ONNX output that rolls out a jerk-dynamics trajectory from the argmax action.
  • Add scripts/verify_onnx.py to compare ONNX Runtime outputs against the saved PyTorch reference tensors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
scripts/export_onnx.py Implements ONNX export pipeline and optional fake trajectory output via jerk rollout.
scripts/verify_onnx.py Loads exported ONNX + example tensors and checks outputs with tolerances.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/export_onnx.py
Comment on lines +166 to +177
# Ego features are the first slice of the observation (jerk model layout):
# obs[:, 3] = signed_speed / MAX_SPEED (MAX_SPEED = 100)
# obs[:, 5] = vehicle_length / MAX_VEH_LEN (proxy for wheelbase, ~5.5 m)
# obs[:, 7] = steering_angle / pi
# obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4)
# obs[:, 9] = a_lat / 4.0
speed = observation[:, 3] * 100.0
wheelbase = observation[:, 5] * 5.5
steering_angle = observation[:, 7] * 3.14159265
a_long_norm = observation[:, 8]
a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
a_lat = observation[:, 9] * 4.0
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OnnxWrapper.forward is extracting ego kinematics from the flattened observation using indices/scales that don’t match the Drive env observation layout. In Drive, jerk ego features are ordered as: speed_norm(0), width_norm(1), length_norm(2), steering_norm(3), a_long_norm(4), a_lat_norm(5), lane_dist(6), lane_angle(7), lane_speed_limit(8) (see pufferlib/ocean/drive/drive.h:3959-3984). Here speed = observation[:, 3] * 100.0 is actually using steering_norm, and wheelbase = observation[:, 5] * 5.5 is using a_lat_norm, which can produce invalid wheelbase (including 0) and cause divisions by zero/NaNs in the rollout. Please update the indices and denormalization constants to match the env (e.g., MAX_SPEED=40.0 and steering scale = STEERING_VALUES[8]=0.667), and derive wheelbase from length if needed (wheelbase is set as 0.6 * vehicle_length in Drive).

Suggested change
# Ego features are the first slice of the observation (jerk model layout):
# obs[:, 3] = signed_speed / MAX_SPEED (MAX_SPEED = 100)
# obs[:, 5] = vehicle_length / MAX_VEH_LEN (proxy for wheelbase, ~5.5 m)
# obs[:, 7] = steering_angle / pi
# obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4)
# obs[:, 9] = a_lat / 4.0
speed = observation[:, 3] * 100.0
wheelbase = observation[:, 5] * 5.5
steering_angle = observation[:, 7] * 3.14159265
a_long_norm = observation[:, 8]
a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
a_lat = observation[:, 9] * 4.0
# Drive jerk ego features are ordered as:
# obs[:, 0] = speed_norm (MAX_SPEED = 40.0)
# obs[:, 1] = width_norm
# obs[:, 2] = length_norm (MAX_VEH_LENGTH ~= 5.5 m)
# obs[:, 3] = steering_norm (STEERING_VALUES[8] = 0.667 rad)
# obs[:, 4] = a_long_norm (asymmetrically normalised: neg→/15, pos→/4)
# obs[:, 5] = a_lat_norm (/4.0)
# Wheelbase is derived in Drive as 0.6 * vehicle_length.
speed = observation[:, 0] * 40.0
vehicle_length = observation[:, 2] * 5.5
wheelbase = vehicle_length * 0.6
steering_angle = observation[:, 3] * 0.667
a_long_norm = observation[:, 4]
a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
a_lat = observation[:, 5] * 4.0

Copilot uses AI. Check for mistakes.
Comment thread scripts/export_onnx.py
Comment on lines +80 to +84
B = logits.shape[0]
action = logits.argmax(dim=-1)
jerk_long_val = self.jerk_long_table[action // NUM_LAT]
jerk_lat_val = self.jerk_lat_table[action % NUM_LAT]

Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JerkDynamicsRollout assumes the action space is the 12-way jerk joint action (4×3) when it maps action // NUM_LAT and action % NUM_LAT into jerk_{long,lat}_table. If the policy was trained with dynamics_model=classic (63 logits) and --fake_trajectory is used, this will index past the end of jerk_long_table and produce incorrect results or an error. Please enforce fake_trajectory only for jerk dynamics (e.g., check config/env at export time or assert logits.shape[1] == 12 in the wrapper) and fail fast with a clear error message.

Copilot uses AI. Check for mistakes.
Comment thread scripts/export_onnx.py
Comment on lines +281 to +297
# 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos,
# yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features)
# Use physically reasonable normalised values.
ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)
ego[:, 0] = 0.3 # speed ~30% of max
ego[:, 1] = 0.05 # small accel
ego[:, 2] = 0.5 # heading sin
ego[:, 3] = 0.866 # heading cos (≈30°)
ego[:, 4] = 0.02 # slight yaw rate
if env.ego_features > 5:
ego[:, 5] = 0.01 # steer
if env.ego_features > 6:
ego[:, 6] = 0.0 # jerk
if env.ego_features > 7:
ego[:, 7] = 0.2 # progress along route
if env.ego_features > 8:
ego[:, 8] = 0.6 # dist to goal
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_make_example_input’s ego-feature comment and assignments don’t match the actual Drive observation definition for jerk dynamics. In Drive the first 9 values are (speed, width, length, steering, a_long, a_lat, lane_dist, lane_angle_cos, lane_speed_limit) (see pufferlib/ocean/drive/drive.h:3959-3984), but this function treats them as heading sin/cos, yaw_rate, etc. This makes the “plausible” example input misleading and (once the wrapper’s index/scales are fixed) will also stop being a good sanity-check for trajectory rollout. Please update the comment and populate the correct ego fields (including steering normalized by 0.667 and speed normalized by MAX_SPEED=40.0).

Suggested change
# 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos,
# yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features)
# Use physically reasonable normalised values.
ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)
ego[:, 0] = 0.3 # speed ~30% of max
ego[:, 1] = 0.05 # small accel
ego[:, 2] = 0.5 # heading sin
ego[:, 3] = 0.866 # heading cos (≈30°)
ego[:, 4] = 0.02 # slight yaw rate
if env.ego_features > 5:
ego[:, 5] = 0.01 # steer
if env.ego_features > 6:
ego[:, 6] = 0.0 # jerk
if env.ego_features > 7:
ego[:, 7] = 0.2 # progress along route
if env.ego_features > 8:
ego[:, 8] = 0.6 # dist to goal
# 1. Ego state — Drive jerk dynamics:
# [speed, width, length, steering, a_long, a_lat,
# lane_dist, lane_angle_cos, lane_speed_limit] (first 9 features)
# Use physically reasonable values, with speed normalised by MAX_SPEED=40.0
# and steering normalised by 0.667.
ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)
ego[:, 0] = 12.0 / 40.0 # speed: 12 m/s
ego[:, 1] = 1.8 # width (m)
ego[:, 2] = 4.5 # length (m)
ego[:, 3] = 0.1 / 0.667 # steering: small steering angle
ego[:, 4] = 0.5 # a_long: mild longitudinal acceleration
if env.ego_features > 5:
ego[:, 5] = 0.1 # a_lat: slight lateral acceleration
if env.ego_features > 6:
ego[:, 6] = 0.0 # lane_dist: centered in lane
if env.ego_features > 7:
ego[:, 7] = 0.995 # lane_angle_cos: nearly aligned with lane
if env.ego_features > 8:
ego[:, 8] = 25.0 / 40.0 # lane_speed_limit: 25 m/s

Copilot uses AI. Check for mistakes.
Comment thread scripts/export_onnx.py
Comment on lines +308 to +315
# Waypoints: alternating x/y relative goal positions
target_start = coef_dim
target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES))
for i in range(env.target_dim // target_features_per_wp):
base = target_start + i * target_features_per_wp
cond[:, base] = 0.3 + i * 0.15 # x progress
cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset
if target_features_per_wp > 2:
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The target-waypoint construction in _make_example_input computes target_features_per_wp using binding.STATIC_TARGET_FEATURES regardless of target_type, which misinterprets env.target_dim for dynamic targets (5 features/wp). For target_type="dynamic", this ends up generating the wrong number of waypoints/features (and populating fields that don’t correspond to the model’s expected conditioning layout). Please base target_features_per_wp (and loop bounds) on whether the env is using static vs dynamic targets so the example observation matches the real conditioning format.

Suggested change
# Waypoints: alternating x/y relative goal positions
target_start = coef_dim
target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES))
for i in range(env.target_dim // target_features_per_wp):
base = target_start + i * target_features_per_wp
cond[:, base] = 0.3 + i * 0.15 # x progress
cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset
if target_features_per_wp > 2:
# Waypoints: build example targets using the active target schema
target_start = coef_dim
target_type = getattr(env, "target_type", "static")
if target_type == "dynamic":
target_features_per_wp = max(1, getattr(binding, "DYNAMIC_TARGET_FEATURES", 5))
else:
target_features_per_wp = max(1, binding.STATIC_TARGET_FEATURES)
num_target_wps = env.target_dim // target_features_per_wp
for i in range(num_target_wps):
base = target_start + i * target_features_per_wp
cond[:, base] = 0.3 + i * 0.15 # x progress
cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset
if target_type != "dynamic" and target_features_per_wp > 2:

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants