Pragnay/onnx scripts by mpragnay · Pull Request #411 · Emerge-Lab/PufferDrive

mpragnay · 2026-04-28T04:27:28Z

Added onnx scripts with fake action trajectory outputs. The first action is from the planner, and the rest of them are zeros

…of (x, y, heading) with the next action sampled by planner and the remain n_step - 1 actions as zero

Copilot

Pull request overview

Adds developer scripts to export a PufferDrive policy checkpoint to ONNX (optionally with a synthetic trajectory output) and to verify the exported ONNX model against saved example tensors.

Changes:

Add scripts/export_onnx.py to export a Drive policy .pt checkpoint to .onnx + external weights, and save example input/output tensors.
Add optional --fake_trajectory ONNX output that rolls out a jerk-dynamics trajectory from the argmax action.
Add scripts/verify_onnx.py to compare ONNX Runtime outputs against the saved PyTorch reference tensors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
scripts/export_onnx.py	Implements ONNX export pipeline and optional fake trajectory output via jerk rollout.
scripts/verify_onnx.py	Loads exported ONNX + example tensors and checks outputs with tolerances.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-28T04:32:04Z

+            # Ego features are the first slice of the observation (jerk model layout):
+            #   obs[:, 3] = signed_speed / MAX_SPEED  (MAX_SPEED = 100)
+            #   obs[:, 5] = vehicle_length / MAX_VEH_LEN  (proxy for wheelbase, ~5.5 m)
+            #   obs[:, 7] = steering_angle / pi
+            #   obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4)
+            #   obs[:, 9] = a_lat / 4.0
+            speed = observation[:, 3] * 100.0
+            wheelbase = observation[:, 5] * 5.5
+            steering_angle = observation[:, 7] * 3.14159265
+            a_long_norm = observation[:, 8]
+            a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
+            a_lat = observation[:, 9] * 4.0


OnnxWrapper.forward is extracting ego kinematics from the flattened observation using indices/scales that don’t match the Drive env observation layout. In Drive, jerk ego features are ordered as: speed_norm(0), width_norm(1), length_norm(2), steering_norm(3), a_long_norm(4), a_lat_norm(5), lane_dist(6), lane_angle(7), lane_speed_limit(8) (see pufferlib/ocean/drive/drive.h:3959-3984). Here speed = observation[:, 3] * 100.0 is actually using steering_norm, and wheelbase = observation[:, 5] * 5.5 is using a_lat_norm, which can produce invalid wheelbase (including 0) and cause divisions by zero/NaNs in the rollout. Please update the indices and denormalization constants to match the env (e.g., MAX_SPEED=40.0 and steering scale = STEERING_VALUES[8]=0.667), and derive wheelbase from length if needed (wheelbase is set as 0.6 * vehicle_length in Drive).

Suggested change

# Ego features are the first slice of the observation (jerk model layout):

# obs[:, 3] = signed_speed / MAX_SPEED (MAX_SPEED = 100)

# obs[:, 5] = vehicle_length / MAX_VEH_LEN (proxy for wheelbase, ~5.5 m)

# obs[:, 7] = steering_angle / pi

# obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4)

# obs[:, 9] = a_lat / 4.0

speed = observation[:, 3] * 100.0

wheelbase = observation[:, 5] * 5.5

steering_angle = observation[:, 7] * 3.14159265

a_long_norm = observation[:, 8]

a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)

a_lat = observation[:, 9] * 4.0

# Drive jerk ego features are ordered as:

# obs[:, 0] = speed_norm (MAX_SPEED = 40.0)

# obs[:, 1] = width_norm

# obs[:, 2] = length_norm (MAX_VEH_LENGTH ~= 5.5 m)

# obs[:, 3] = steering_norm (STEERING_VALUES[8] = 0.667 rad)

# obs[:, 4] = a_long_norm (asymmetrically normalised: neg→/15, pos→/4)

# obs[:, 5] = a_lat_norm (/4.0)

# Wheelbase is derived in Drive as 0.6 * vehicle_length.

speed = observation[:, 0] * 40.0

vehicle_length = observation[:, 2] * 5.5

wheelbase = vehicle_length * 0.6

steering_angle = observation[:, 3] * 0.667

a_long_norm = observation[:, 4]

a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)

a_lat = observation[:, 5] * 4.0

Copilot · 2026-04-28T04:32:05Z

+        B = logits.shape[0]
+        action = logits.argmax(dim=-1)
+        jerk_long_val = self.jerk_long_table[action // NUM_LAT]
+        jerk_lat_val = self.jerk_lat_table[action % NUM_LAT]
+


JerkDynamicsRollout assumes the action space is the 12-way jerk joint action (4×3) when it maps action // NUM_LAT and action % NUM_LAT into jerk_{long,lat}_table. If the policy was trained with dynamics_model=classic (63 logits) and --fake_trajectory is used, this will index past the end of jerk_long_table and produce incorrect results or an error. Please enforce fake_trajectory only for jerk dynamics (e.g., check config/env at export time or assert logits.shape[1] == 12 in the wrapper) and fail fast with a clear error message.

Copilot · 2026-04-28T04:32:05Z

+    # 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos,
+    #    yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features)
+    #    Use physically reasonable normalised values.
+    ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)
+    ego[:, 0] = 0.3  # speed ~30% of max
+    ego[:, 1] = 0.05  # small accel
+    ego[:, 2] = 0.5  # heading sin
+    ego[:, 3] = 0.866  # heading cos (≈30°)
+    ego[:, 4] = 0.02  # slight yaw rate
+    if env.ego_features > 5:
+        ego[:, 5] = 0.01  # steer
+    if env.ego_features > 6:
+        ego[:, 6] = 0.0  # jerk
+    if env.ego_features > 7:
+        ego[:, 7] = 0.2  # progress along route
+    if env.ego_features > 8:
+        ego[:, 8] = 0.6  # dist to goal


_make_example_input’s ego-feature comment and assignments don’t match the actual Drive observation definition for jerk dynamics. In Drive the first 9 values are (speed, width, length, steering, a_long, a_lat, lane_dist, lane_angle_cos, lane_speed_limit) (see pufferlib/ocean/drive/drive.h:3959-3984), but this function treats them as heading sin/cos, yaw_rate, etc. This makes the “plausible” example input misleading and (once the wrapper’s index/scales are fixed) will also stop being a good sanity-check for trajectory rollout. Please update the comment and populate the correct ego fields (including steering normalized by 0.667 and speed normalized by MAX_SPEED=40.0).

Suggested change

# 1. Ego state — jerk model: [speed, accel, heading_sin, heading_cos,

# yaw_rate, steer, jerk, norm_progress, dist_to_goal] (9 features)

# Use physically reasonable normalised values.

ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)

ego[:, 0] = 0.3 # speed ~30% of max

ego[:, 1] = 0.05 # small accel

ego[:, 2] = 0.5 # heading sin

ego[:, 3] = 0.866 # heading cos (≈30°)

ego[:, 4] = 0.02 # slight yaw rate

if env.ego_features > 5:

ego[:, 5] = 0.01 # steer

if env.ego_features > 6:

ego[:, 6] = 0.0 # jerk

if env.ego_features > 7:

ego[:, 7] = 0.2 # progress along route

if env.ego_features > 8:

ego[:, 8] = 0.6 # dist to goal

# 1. Ego state — Drive jerk dynamics:

# [speed, width, length, steering, a_long, a_lat,

# lane_dist, lane_angle_cos, lane_speed_limit] (first 9 features)

# Use physically reasonable values, with speed normalised by MAX_SPEED=40.0

# and steering normalised by 0.667.

ego = np.zeros((batch_size, env.ego_features), dtype=np.float32)

ego[:, 0] = 12.0 / 40.0 # speed: 12 m/s

ego[:, 1] = 1.8 # width (m)

ego[:, 2] = 4.5 # length (m)

ego[:, 3] = 0.1 / 0.667 # steering: small steering angle

ego[:, 4] = 0.5 # a_long: mild longitudinal acceleration

if env.ego_features > 5:

ego[:, 5] = 0.1 # a_lat: slight lateral acceleration

if env.ego_features > 6:

ego[:, 6] = 0.0 # lane_dist: centered in lane

if env.ego_features > 7:

ego[:, 7] = 0.995 # lane_angle_cos: nearly aligned with lane

if env.ego_features > 8:

ego[:, 8] = 25.0 / 40.0 # lane_speed_limit: 25 m/s

Copilot · 2026-04-28T04:32:05Z

+        # Waypoints: alternating x/y relative goal positions
+        target_start = coef_dim
+        target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES))
+        for i in range(env.target_dim // target_features_per_wp):
+            base = target_start + i * target_features_per_wp
+            cond[:, base] = 0.3 + i * 0.15  # x progress
+            cond[:, base + 1] = 0.05 * (i % 2)  # slight lateral offset
+            if target_features_per_wp > 2:


The target-waypoint construction in _make_example_input computes target_features_per_wp using binding.STATIC_TARGET_FEATURES regardless of target_type, which misinterprets env.target_dim for dynamic targets (5 features/wp). For target_type="dynamic", this ends up generating the wrong number of waypoints/features (and populating fields that don’t correspond to the model’s expected conditioning layout). Please base target_features_per_wp (and loop bounds) on whether the env is using static vs dynamic targets so the example observation matches the real conditioning format.

Suggested change

# Waypoints: alternating x/y relative goal positions

target_start = coef_dim

target_features_per_wp = env.target_dim // (env.target_dim // max(1, binding.STATIC_TARGET_FEATURES))

for i in range(env.target_dim // target_features_per_wp):

base = target_start + i * target_features_per_wp

cond[:, base] = 0.3 + i * 0.15 # x progress

cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset

if target_features_per_wp > 2:

# Waypoints: build example targets using the active target schema

target_start = coef_dim

target_type = getattr(env, "target_type", "static")

if target_type == "dynamic":

target_features_per_wp = max(1, getattr(binding, "DYNAMIC_TARGET_FEATURES", 5))

else:

target_features_per_wp = max(1, binding.STATIC_TARGET_FEATURES)

num_target_wps = env.target_dim // target_features_per_wp

for i in range(num_target_wps):

base = target_start + i * target_features_per_wp

cond[:, base] = 0.3 + i * 0.15 # x progress

cond[:, base + 1] = 0.05 * (i % 2) # slight lateral offset

if target_type != "dynamic" and target_features_per_wp > 2:

mpragnay added 3 commits April 27, 2026 20:10

init commit

9e2fc52

Updated onnx files

bf69f86

Added fake trajectory output onnx wrapper, outputs a fake trajectory …

bfc2ee2

…of (x, y, heading) with the next action sampled by planner and the remain n_step - 1 actions as zero

Copilot AI review requested due to automatic review settings April 28, 2026 04:27

Copilot started reviewing on behalf of mpragnay April 28, 2026 04:27 View session

Copilot AI reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pragnay/onnx scripts#411

Pragnay/onnx scripts#411
mpragnay wants to merge 3 commits intoemerge/temp_trainingfrom
pragnay/onnx_scripts

mpragnay commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Copilot AI Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-            # Ego features are the first slice of the observation (jerk model layout):
-            #   obs[:, 3] = signed_speed / MAX_SPEED  (MAX_SPEED = 100)
-            #   obs[:, 5] = vehicle_length / MAX_VEH_LEN  (proxy for wheelbase, ~5.5 m)
-            #   obs[:, 7] = steering_angle / pi
-            #   obs[:, 8] = a_long (asymmetrically normalised: neg→/15, pos→/4)
-            #   obs[:, 9] = a_lat / 4.0
-            speed = observation[:, 3] * 100.0
-            wheelbase = observation[:, 5] * 5.5
-            steering_angle = observation[:, 7] * 3.14159265
-            a_long_norm = observation[:, 8]
-            a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
-            a_lat = observation[:, 9] * 4.0
+            # Drive jerk ego features are ordered as:
+            #   obs[:, 0] = speed_norm          (MAX_SPEED = 40.0)
+            #   obs[:, 1] = width_norm
+            #   obs[:, 2] = length_norm         (MAX_VEH_LENGTH ~= 5.5 m)
+            #   obs[:, 3] = steering_norm       (STEERING_VALUES[8] = 0.667 rad)
+            #   obs[:, 4] = a_long_norm         (asymmetrically normalised: neg→/15, pos→/4)
+            #   obs[:, 5] = a_lat_norm          (/4.0)
+            # Wheelbase is derived in Drive as 0.6 * vehicle_length.
+            speed = observation[:, 0] * 40.0
+            vehicle_length = observation[:, 2] * 5.5
+            wheelbase = vehicle_length * 0.6
+            steering_angle = observation[:, 3] * 0.667
+            a_long_norm = observation[:, 4]
+            a_long = torch.where(a_long_norm < 0, a_long_norm * 15.0, a_long_norm * 4.0)
+            a_lat = observation[:, 5] * 4.0

Conversation

mpragnay commented Apr 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants