
[WIP] Trajviz: Vulkan offline trajectory renderer #398

Open
eugenevinitsky wants to merge 5 commits into 3.0 from ev/visualize_tooling

Conversation

@eugenevinitsky

Summary

A new headless Vulkan-backed renderer that turns saved Drive trajectories into MP4 videos. Independent of the existing raylib visualizer (scripts/build_ocean.sh visualize) — opt-in via TRAJVIZ=1 python setup.py build_ext --inplace. Drive sim code (drive.{h,c,py}, binding.c) is untouched.

Status: WIP. Builds, runs, renders correctly on real Drive sims. There is more to do (see below) before this is merge-ready.

  • Public Python API: Renderer.render_episode(...) and Renderer.render_batch([...]) (up to 16 episodes per batch)
  • CLI: python -m pufferlib.ocean.drive.trajviz <inputs> --maps-dir ... --out ...
  • Random-rollout smoke test: python -m pufferlib.ocean.drive.trajviz.tools.random_rollout
  • Standalone C harness: pufferlib/ocean/drive/trajviz/tests/test_main.c (no Python required)

Performance

On RTX 4080 + 16-core CPU, 1280×720, 90-frame episodes, both views, libx264 -preset veryfast:

batch_size   wall time   per-episode   ep/s
1            345 ms      345 ms        2.9
4            1094 ms     274 ms        3.7
8            2136 ms     267 ms        3.7

Pure Vulkan + readback floor (encoder bypassed) is ~30 ms / ep ≈ 32 ep/s. The remaining gap is libx264 — NVENC integration (deferred) would close it.

Headline optimizations applied during development:

  • HOST_CACHED readback memory — single biggest win (~6-7×). Default HOST_VISIBLE | HOST_COHERENT on NVIDIA picks write-combined PCIe BAR memory, fast for the GPU but ~250 MB/s for CPU reads. HOST_CACHED puts it in regular RAM (>5 GB/s).
  • LINE_STRIP polylines — one draw call per polyline instead of per segment.
  • Vertically-tiled atlas for batched rendering — N episodes' tiles stacked vertically so each tile's bytes are row-contiguous in the readback buffer (one fwrite per pipe per frame, no row stitching).
  • Threaded fwrite fan-out — each ffmpeg pipe owns a writer thread; per-frame write phase costs max(single fwrite) instead of sum(fwrites).
  • F_SETPIPE_SZ — bumps the kernel pipe buffer up to whatever the per-process limit allows so fwrites don't ping-pong on a 64 KB pipe.
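
The HOST_CACHED point above can be sketched as a pure-Python model of the memory-type scan a Vulkan renderer typically does over `vkGetPhysicalDeviceMemoryProperties`. The bit values match `VkMemoryPropertyFlagBits`; the helper itself is hypothetical, not the trajviz code:

```python
# VkMemoryPropertyFlagBits values (from the Vulkan spec).
HOST_VISIBLE, HOST_COHERENT, HOST_CACHED = 0x2, 0x4, 0x8

def pick_readback_type(memory_type_flags):
    """Prefer HOST_VISIBLE|HOST_CACHED (cacheable system RAM, fast CPU
    reads); fall back to HOST_VISIBLE|HOST_COHERENT, which on NVIDIA is
    often write-combined PCIe BAR memory (~250 MB/s CPU reads)."""
    for want in (HOST_VISIBLE | HOST_CACHED, HOST_VISIBLE | HOST_COHERENT):
        for i, flags in enumerate(memory_type_flags):
            if flags & want == want:
                return i
    raise RuntimeError("no host-visible memory type")

# Hypothetical NVIDIA-like layout: type 1 is the BAR type, type 2 adds
# HOST_CACHED -- the scan picks type 2 for readback.
types = [0x1, HOST_VISIBLE | HOST_COHERENT,
         HOST_VISIBLE | HOST_COHERENT | HOST_CACHED]
assert pick_readback_type(types) == 2
```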

Architecture

__init__.py / _native.c / trajviz.{h,c}
                ↓
   vk_renderer (single)   vk_batch_renderer (tiled atlas)
                ↓                  ↓
   vk_pipeline / vk_context / ffmpeg_pipe (+ writer thread)
                ↓                  ↓
        Vulkan 1.3             ffmpeg subprocess

Two views, matching the existing live raylib path:

  • Top-down = RenderView.FULL_SIM_STATE — orthographic full-map
  • BEV = RenderView.BEV_AGENT_OBS — agent-centric ~100m × 178m window, ego at center facing up
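
The "ego at center facing up" convention above amounts to a 2-D rotation plus translation; a minimal sketch, assuming `ego_heading` is the world-frame yaw in radians (the helper name is hypothetical):

```python
import math

def bev_world_to_view(px, py, ego_x, ego_y, ego_heading):
    """Map a world-frame point into an ego-centric BEV frame:
    ego at the origin, ego's forward direction pointing up (+Y)."""
    dx, dy = px - ego_x, py - ego_y
    phi = math.pi / 2.0 - ego_heading  # rotate the forward vector onto +Y
    vx = math.cos(phi) * dx - math.sin(phi) * dy
    vy = math.sin(phi) * dx + math.cos(phi) * dy
    return vx, vy

# Ego facing +X: a point 10 m ahead lands straight up in the view.
vx, vy = bev_world_to_view(10.0, 0.0, 0.0, 0.0, 0.0)
assert abs(vx) < 1e-9 and abs(vy - 10.0) < 1e-9
```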

Documentation in docs/src/trajviz.md covers prerequisites (apt packages), build, Python and CLI usage, performance tuning (sysctl knobs, HOST_CACHED explanation, batch size guidance), debugging env vars, troubleshooting, and architecture overview.

Why Vulkan, not raylib

  • Headless on Linux clusters with no X server (raylib needs xvfb)
  • Throughput-oriented batching (impossible without command-buffer control)
  • Independent build path so the optional Vulkan dep doesn't pollute the live drive sim build

Why WIP — known gaps before merge

  • No NPC / expert-replay agents — currently only renders the controlled agents from get_sim_trajectories. The other ~18 vehicles in a typical Waymo scenario (the WOSAC "context" tracks) aren't shown. Adding them needs a separate Drive API to expose expert trajectories.
  • No 3D follow-cam: RenderView.AGENT_PERSP (3D car meshes from .glb) is not implemented in trajviz. Top-down + BEV only.
  • CPU-bound by libx264 once batched — NVENC integration (Vulkan video encode or libnvidia-encode + CUDA-Vulkan interop) would unlock the remaining ~12% gap to the pure-GPU ceiling.
  • batch_size capped at 16 — atlas image height grows linearly with batch size. Past ~22 tiles (22 × 720 px rows approaches the common 16384 maxImageDimension2D limit) we'd need multiple atlas passes or a 2-D tile grid.
  • Uniform num_steps per batch — short episodes get padded with zeros. Wastes a tiny bit of GPU work on the trailing zeros.
  • Buffer/image helpers duplicated between vk_renderer.c and vk_batch_renderer.c (~50 lines each). Should be consolidated into a shared helper before merge.

Test plan

  • TRAJVIZ=1 python setup.py build_ext --inplace — clean build, no warnings beyond the standard PyCFunction cast
  • Single-episode render via random_rollout.py — 90-frame MP4 with valid h264, 1280×720, both views
  • Batched render of 4 episodes — 4 valid MP4s, ~270 ms/ep
  • Visual sanity check on extracted frames (ego centered in BEV, Waymo road geometry slides correctly)
  • Render an actual saved trajectories_*.npz from a real training run end-to-end via render_npz
  • Try on a non-NVIDIA GPU (AMD radv, Intel) to confirm HOST_CACHED fallback works
  • CI build with TRAJVIZ=1 so the extension keeps compiling
  • Decide whether to delete notebooks/visualize_trajectories.py once map_io.py is the canonical parser

🤖 Generated with Claude Code

eugenevinitsky and others added 3 commits April 11, 2026 14:43
Ported from ev/yolo (~6 commits) as a single clean change, skipping debug
commits and the per-component reward logging / partner obs changes that
were tangled into the same branch history.

## C side (drive.h, datatypes.h, env_binding.h)

- Agent struct gets four float* buffers (sim_traj_x/y/z/heading) sized to
  episode_length. Allocated in init() after set_active_agents, freed in
  free_agent.
- c_step writes the post-move_dynamics state into position t = timestep-1.
  Cheap: 4 float copies per agent per step.
- c_get_sim_trajectories(env, x, y, z, heading, lengths, ep_len) copies all
  active agents' buffers into output arrays. lengths[i] = env->timestep so
  callers know how much of the buffer is valid for the current episode.
- vec_get_sim_trajectories: Python-facing wrapper that iterates sub-envs
  with agent-offset accounting.
- vec_get_world_mean: (x, y, z) tuple from env 0. Used by save_trajectories
  to lift sim coords back into the source map frame for offline rendering.

## Python side (drive.py)

- Drive.__init__ accepts traj_save_dir kwarg; stores it along with a
  _worker_idx slot that vector.py fills in for multiprocessing workers.
- Cache world_mean after binding.vectorize (and on resample_maps).
- get_sim_trajectories() allocates output arrays and calls the binding.
- notify() writes a per-worker traj_worker_{idx}.npz containing the
  sim trajectories, map_ids, agent_offsets, map_files, world_mean.
  Called from workers via the shm notify mechanism.
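
A hedged sketch of the per-worker file described above — the exact schema is an assumption based on the keys this commit message names:

```python
import os
import tempfile
import numpy as np

def write_worker_npz(tmpdir, worker_idx, x, y, z, heading, lengths,
                     map_ids, map_files, world_mean):
    """Write one traj_worker_{idx}.npz with the keys named above.
    Hypothetical helper; shapes/dtypes are illustrative."""
    path = os.path.join(tmpdir, f"traj_worker_{worker_idx}.npz")
    np.savez(path, x=x, y=y, z=z, heading=heading, lengths=lengths,
             map_ids=map_ids,
             map_files=np.asarray(map_files, dtype=object),
             world_mean=world_mean)
    return path

with tempfile.TemporaryDirectory() as d:
    n_agents, ep_len = 3, 90
    x = np.zeros((n_agents, ep_len), np.float32)
    p = write_worker_npz(d, 0, x, x, x, x,
                         np.full(n_agents, 42, np.int32),
                         np.zeros(1, np.int32), ["map_000.bin"],
                         np.zeros(3, np.float32))
    loaded = np.load(p, allow_pickle=True)
    assert loaded["x"].shape == (3, 90) and loaded["lengths"][0] == 42
```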

## Multiprocessing plumbing (vector.py)

- _worker_process tags each env (or sub-envs of a Serial wrapper) with
  _worker_idx after construction so env.notify() knows which file to write.
- Multiprocessing.save_worker_trajectories() sets the notify flag for all
  workers and spins until they all clear it (workers respond inside their
  step loop).
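
The notify handshake above can be modeled in a few lines; the timeout guard is a hypothetical addition (not in the original), included because a dead worker would otherwise stall the loop forever:

```python
import threading
import time
import numpy as np

def wait_for_workers(notify_flags, timeout_s=30.0, poll_s=0.01):
    """Raise the notify flag for every worker, then poll until each
    worker clears its own flag inside its step loop."""
    notify_flags[:] = True
    deadline = time.monotonic() + timeout_s
    while notify_flags.any():
        if time.monotonic() > deadline:
            raise TimeoutError("worker(s) never cleared notify flag")
        time.sleep(poll_s)

flags = np.ones(4, dtype=bool)
# Simulate workers clearing their flags shortly after the request.
threading.Timer(0.05, lambda: flags.fill(False)).start()
wait_for_workers(flags, timeout_s=2.0)
assert not flags.any()
```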

## Checkpoint integration (pufferl.py)

- save_trajectories() dumps the rolling policy buffers (actions, rewards,
  values, logprobs, terminals, truncations) + C-side trajectories + map
  context into trajectories_{epoch:06d}.npz. Supports both multiprocessing
  (fan out via save_worker_trajectories, stitch worker files) and Serial
  (read driver_env directly).
- save_reproducibility() snapshots the compiled .so, key source files,
  active config, and git commit/diff on the first checkpoint of a run.
- Both called inside the existing checkpoint block in train().
- train() pre-creates data_dir/traj_tmp and threads traj_save_dir into
  args["env"] so workers inherit it automatically via env kwargs.
- Opt-out via `save_trajectories: False` in the train config.

Verified locally: C extension builds, Drive.get_sim_trajectories() returns
correctly-shaped arrays with live sim positions, world_mean binding works.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pulled verbatim from ev/yolo (commit af9119a). jupytext percent-format
script that reads the trajectories_<epoch>.npz written by
PuffeRL.save_trajectories and renders agent paths on top of the source
map. Uses world_mean to align sim coords with the map frame.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A new headless Vulkan-backed renderer that turns saved Drive
trajectories into MP4 videos. Independent of the existing raylib
visualizer (`scripts/build_ocean.sh visualize`) — opt-in via
`TRAJVIZ=1 python setup.py build_ext --inplace`. Optional dependency,
won't affect users who don't need it.

## Public surface

- Python: `pufferlib.ocean.drive.trajviz.Renderer`
  - `render_episode(...)` for single-episode rendering
  - `render_batch([...])` for multi-episode batched rendering (up to 16)
  - `render_npz(path, maps_dir, out_dir)` for saved trajectories_*.npz
- CLI: `python -m pufferlib.ocean.drive.trajviz <inputs> --maps-dir ... --out ...`
- Random-rollout smoke test:
  `python -m pufferlib.ocean.drive.trajviz.tools.random_rollout`
- Standalone C harness: `tests/test_main.c` (no Python required)

## Architecture

- `_native.c`     CPython extension shell (numpy unwrap, GIL release)
- `trajviz.{h,c}` public C API: render_episode, render_episodes_batch
- `vk_context`    VkInstance / VkDevice / queue / command pool
- `vk_pipeline`   line + box graphics pipelines, push-constant cameras
- `vk_renderer`   single-episode renderer
- `vk_batch_renderer`  batched renderer with vertically-tiled atlas
                       (per-episode tile bytes are row-contiguous in
                       the readback buffer)
- `ffmpeg_pipe`   pipe-to-ffmpeg + per-pipe writer thread for parallel
                  fan-out fwrites
- `shaders/`      GLSL → SPIR-V (compiled at build time, embedded as
                  uint32_t arrays in a generated `shaders.c`)

## Two views

Matches the existing live raylib path:
- Top-down (RenderView.FULL_SIM_STATE): orthographic full-map
- BEV (RenderView.BEV_AGENT_OBS): agent-centric ~100m × 178m window,
  ego at center facing up

## Performance

On RTX 4080 + 16-core CPU, 1280x720 90-frame episodes with both views,
libx264 -preset veryfast:

  batch_size=1:   345 ms / ep   (2.9 ep/s)
  batch_size=4:  1094 ms total  (274 ms / ep, 3.7 ep/s)
  batch_size=8:  2136 ms total  (267 ms / ep, 3.7 ep/s)

Pure Vulkan + readback floor (no encoder): ~30 ms / ep ≈ 32 ep/s.
The remaining gap is libx264 encoding — NVENC would close it.

Key optimizations applied:
- HOST_CACHED readback memory (6-7× win on its own — uncached PCIe
  BAR reads were ~250 MB/s; cached RAM reads are >5 GB/s)
- LINE_STRIP polylines (one draw per polyline, not per segment)
- Tiled vertical atlas for batched rendering (single submit per frame
  for N episodes; per-tile bytes are row-contiguous for one fwrite
  per pipe per frame)
- Threaded fwrites (per-pipe writer threads, parallel fan-out)
- F_SETPIPE_SZ to fit a frame in one pipe buffer
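
The F_SETPIPE_SZ point can be sketched from Python — a best-effort model, assuming Linux and rawvideo rgb24 frames (the constant is defined manually because not every Python build exposes it):

```python
import contextlib
import fcntl
import os

F_SETPIPE_SZ = 1031  # Linux fcntl command number (assumption)

def frame_bytes(width, height, bytes_per_pixel=3):
    """One rgb24 frame as piped to ffmpeg."""
    return width * height * bytes_per_pixel

def grow_pipe(fd, nbytes):
    """Ask the kernel to round the pipe buffer up to nbytes. Fails
    silently past /proc/sys/fs/pipe-max-size, matching the 'whatever the
    per-process limit allows' behavior described above."""
    with contextlib.suppress(OSError):
        return fcntl.fcntl(fd, F_SETPIPE_SZ, nbytes)
    return None

# A 1280x720 rgb24 frame is ~2.6 MiB -- far beyond the 64 KB default.
assert frame_bytes(1280, 720) == 2_764_800
r, w = os.pipe()
grow_pipe(w, 1 << 20)  # request 1 MiB (may be clamped or refused)
os.close(r); os.close(w)
```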

## Why Vulkan, not raylib

- Headless on Linux clusters with no X server (raylib needs xvfb)
- Throughput-oriented batching (impossible without command-buffer
  control)
- Independent build path so the optional Vulkan dep doesn't pollute
  the live drive sim build

## Documentation

`docs/src/trajviz.md` covers prerequisites (apt packages), build,
Python and CLI usage, performance tuning (sysctl knobs, HOST_CACHED
explanation, batch size guidance), debugging env vars, troubleshooting,
and architecture overview.

## Notes

- `pufferlib/ocean/drive/map_io.py` extracted from
  `notebooks/visualize_trajectories.py` so trajviz and the notebook
  share one .bin map parser. The notebook still works.
- `trajviz/shaders.c` is generated at build time and gitignored.
- Drive sim code (drive.{h,c,py}, binding.c) is untouched.
Copilot AI review requested due to automatic review settings April 11, 2026 22:20

Copilot AI left a comment


Pull request overview

Adds an opt-in, headless Vulkan-backed “trajviz” renderer for offline Drive trajectory visualization (MP4 output), and wires up trajectory capture/saving during training checkpoints to feed the renderer—without changing the live raylib visualizer path.

Changes:

  • Introduces pufferlib.ocean.drive.trajviz (C/Vulkan renderer + CPython extension + CLI/tools) built only when TRAJVIZ=1.
  • Adds C-side sim-trajectory recording and Python/C bindings to retrieve + save per-checkpoint trajectories (including map context) for offline rendering.
  • Adds docs and a shared .bin map parser (map_io.py) to support loading and rendering saved trajectories.

Reviewed changes

Copilot reviewed 35 out of 37 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
setup.py Adds opt-in build path for the trajviz Vulkan CPython extension and shader build step.
pufferlib/vector.py Tags worker envs with _worker_idx; adds save_worker_trajectories() fan-out helper.
pufferlib/pufferl.py Saves trajectories at checkpoint time; adds reproducibility snapshot; threads traj temp dir into env kwargs.
pufferlib/ocean/env_binding.h Adds vectorized bindings to pull sim trajectories and world_mean from C envs.
pufferlib/ocean/drive/drive.py Caches world_mean; adds get_sim_trajectories() and notify() to write per-worker .npz.
pufferlib/ocean/drive/drive.h Allocates/records per-step sim trajectories in C; exposes c_get_sim_trajectories().
pufferlib/ocean/drive/datatypes.h Extends Agent with sim-trajectory buffers and frees them on teardown.
pufferlib/ocean/drive/map_io.py Adds canonical parser/transform helpers for Drive .bin maps for offline tooling.
pufferlib/ocean/drive/trajviz/__init__.py Python Renderer wrapper + render_npz() convenience loader.
pufferlib/ocean/drive/trajviz/__main__.py CLI entry point to render one or more trajectories_*.npz inputs.
pufferlib/ocean/drive/trajviz/_native.c CPython extension: numpy validation + GIL release around render calls.
pufferlib/ocean/drive/trajviz/trajviz.h Public C API for single-episode and batched rendering.
pufferlib/ocean/drive/trajviz/trajviz.c Orchestrates Vulkan renderers + ffmpeg pipes; implements batch tiling path.
pufferlib/ocean/drive/trajviz/vk_context.h Declares Vulkan instance/device/queue lifecycle and error helpers.
pufferlib/ocean/drive/trajviz/vk_context.c Implements Vulkan init (1.3 + dynamic rendering + sync2) and teardown.
pufferlib/ocean/drive/trajviz/vk_pipeline.h Declares shared pipeline/push-constant and instance formats.
pufferlib/ocean/drive/trajviz/vk_pipeline.c Creates Vulkan graphics pipelines for polylines and agent boxes.
pufferlib/ocean/drive/trajviz/vk_renderer.h Declares single-episode renderer (frame slots, readback, ffmpeg drain).
pufferlib/ocean/drive/trajviz/vk_renderer.c Implements per-frame rendering + readback + ffmpeg write for one episode.
pufferlib/ocean/drive/trajviz/vk_batch_renderer.h Declares batched atlas renderer API.
pufferlib/ocean/drive/trajviz/vk_batch_renderer.c Implements batched atlas rendering + threaded pipe write fan-out.
pufferlib/ocean/drive/trajviz/vk_math.h Adds small mat4 helpers for ortho fit and BEV camera.
pufferlib/ocean/drive/trajviz/ffmpeg_pipe.h Declares ffmpeg subprocess pipe + writer-thread API.
pufferlib/ocean/drive/trajviz/ffmpeg_pipe.c Implements popen-based ffmpeg piping and async writer thread.
pufferlib/ocean/drive/trajviz/shaders.h Declares externs for generated SPIR-V blobs.
pufferlib/ocean/drive/trajviz/shaders/build_shaders.sh Builds GLSL → SPIR-V and generates shaders.c.
pufferlib/ocean/drive/trajviz/shaders/polyline.vert Adds GLSL for road polyline vertex stage.
pufferlib/ocean/drive/trajviz/shaders/polyline.frag Adds GLSL for road polyline fragment stage.
pufferlib/ocean/drive/trajviz/shaders/agent_box.vert Adds GLSL for instanced agent quad expansion.
pufferlib/ocean/drive/trajviz/shaders/agent_box.frag Adds GLSL for flat-colored agent box fragment stage.
pufferlib/ocean/drive/trajviz/tools/random_rollout.py End-to-end smoke test that rolls out Drive and renders via trajviz.
pufferlib/ocean/drive/trajviz/tools/__init__.py Marks tools as a package (module discovery).
pufferlib/ocean/drive/trajviz/tests/test_main.c Standalone C harness to validate Vulkan+ffmpeg path without Python.
docs/src/trajviz.md Adds end-user documentation: build/run/tuning/architecture/troubleshooting.
docs/src/SUMMARY.md Links trajviz documentation into the docs sidebar.
notebooks/visualize_trajectories.py Adds/updates notebook for analyzing and plotting saved trajectories.
.gitignore Ignores generated pufferlib/ocean/drive/trajviz/shaders.c.


Comment thread pufferlib/pufferl.py
Comment on lines +720 to +740
if hasattr(self.vecenv, "save_worker_trajectories"):
    traj_tmp = getattr(driver_env, "_traj_save_dir", None) if driver_env else None
    if traj_tmp:
        self.vecenv.save_worker_trajectories()
        worker_files = sorted(glob.glob(os.path.join(traj_tmp, "traj_worker_*.npz")))
        if worker_files:
            all_traj = {}
            map_files = None
            world_mean = None
            for f in worker_files:
                d = np.load(f, allow_pickle=True)
                for k in ("x", "y", "z", "heading", "lengths", "map_ids"):
                    if k in d:
                        all_traj.setdefault(k, []).append(d[k])
                if map_files is None and "map_files" in d:
                    map_files = d["map_files"]
                if world_mean is None and "world_mean" in d:
                    world_mean = d["world_mean"]
            for k, v in all_traj.items():
                key = f"traj_{k}" if k in ("x", "y", "z", "heading", "lengths") else k
                data[key] = np.concatenate(v)

Copilot AI Apr 11, 2026


The multiprocessing stitching path doesn’t include agent_offsets, which render_npz() requires to slice per-env episodes. Simply concatenating per-worker (agent-local) offsets wouldn’t be correct anyway; you likely need to (a) collect each worker’s agent_offsets, (b) shift them by the cumulative agent count, and (c) concatenate to produce a global agent_offsets aligned with the concatenated traj_* arrays.
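
A hedged numpy sketch of the stitch this comment describes (the per-worker offset layout — start index per sub-env, no trailing sentinel — is an assumption for illustration):

```python
import numpy as np

def stitch_agent_offsets(worker_offsets, worker_agent_counts):
    """Shift each worker's agent-local offsets by the running agent
    count, then concatenate into a global agent_offsets array aligned
    with the concatenated traj_* arrays. Hypothetical helper."""
    out, base = [], 0
    for offs, n_agents in zip(worker_offsets, worker_agent_counts):
        out.append(np.asarray(offs) + base)
        base += n_agents
    return np.concatenate(out)

# Worker 0: envs start at agents 0 and 3 (5 agents total).
# Worker 1: envs start at agents 0 and 2 (4 agents total).
g = stitch_agent_offsets([[0, 3], [0, 2]], [5, 4])
assert g.tolist() == [0, 3, 5, 7]
```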

Comment thread pufferlib/pufferl.py
Comment on lines +721 to +726
traj_tmp = getattr(driver_env, "_traj_save_dir", None) if driver_env else None
if traj_tmp:
    self.vecenv.save_worker_trajectories()
    worker_files = sorted(glob.glob(os.path.join(traj_tmp, "traj_worker_*.npz")))
    if worker_files:
        all_traj = {}

Copilot AI Apr 11, 2026


traj_tmp is a shared directory (.../traj_tmp) that is never cleaned, and glob(traj_worker_*.npz) will pick up stale worker files (e.g., if a previous run used more workers, or a crashed worker left an old file). Consider writing into an epoch-scoped subdir, or deleting existing traj_worker_*.npz files before triggering save_worker_trajectories(), and/or validating the expected worker count before stitching.

Comment on lines +123 to +125
PyErr_Fetch(&type, &value, &tb);
PyErr_Format(PyExc_TypeError, "%s: %s", name,
value ? PyUnicode_AsUTF8(PyObject_Str(value)) : "type/shape mismatch");

Copilot AI Apr 11, 2026


as_array() leaks a reference: PyObject_Str(value) creates a new object that isn’t DECREF’d (it’s passed directly into PyUnicode_AsUTF8(...)). Store the PyObject_Str result in a temporary, use it for formatting, then Py_DECREF it to avoid per-call leaks on shape/type errors.

Suggested change
PyErr_Fetch(&type, &value, &tb);
PyErr_Format(PyExc_TypeError, "%s: %s", name,
             value ? PyUnicode_AsUTF8(PyObject_Str(value)) : "type/shape mismatch");
PyObject *value_str = NULL;
const char *message = "type/shape mismatch";
PyErr_Fetch(&type, &value, &tb);
if (value) {
    value_str = PyObject_Str(value);
    if (value_str) {
        const char *utf8 = PyUnicode_AsUTF8(value_str);
        if (utf8)
            message = utf8;
    }
}
PyErr_Format(PyExc_TypeError, "%s: %s", name, message);
Py_XDECREF(value_str);

d->rs.polygonMode = VK_POLYGON_MODE_FILL;
d->rs.cullMode = VK_CULL_MODE_NONE;
d->rs.frontFace = VK_FRONT_FACE_COUNTER_CLOCKWISE;
d->rs.lineWidth = 1.5f; /* used for line topology only; ignored for tris */

Copilot AI Apr 11, 2026


rs.lineWidth is set to 1.5f, but Vulkan requires the wideLines device feature to be enabled for line widths != 1.0. Since vk_ctx_init() doesn’t enable wideLines, this can trigger validation errors or pipeline creation failure on some devices. Either set the line width back to 1.0 or explicitly query+enable VkPhysicalDeviceFeatures::wideLines when supported.

Suggested change
d->rs.lineWidth = 1.5f; /* used for line topology only; ignored for tris */
d->rs.lineWidth = 1.0f; /* Vulkan-safe default; wider lines require wideLines */

Comment on lines +434 to +438
static void record_view(VkCommandBuffer cmd, Renderer *r, FrameSlot *slot, RenderTarget *rt, const Mat4 *mvp,
uint32_t num_instances) {
barrier_image(cmd, rt->color.image, VK_IMAGE_LAYOUT_UNDEFINED, VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
VK_PIPELINE_STAGE_2_TOP_OF_PIPE_BIT, 0, VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT);

Copilot AI Apr 11, 2026


The image layout barrier uses oldLayout = VK_IMAGE_LAYOUT_UNDEFINED every frame. After the first frame the image will actually be in VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL (set later in the same command buffer), so this will trip validation and is undefined behavior. Track the current layout per render target (or always transition from TRANSFER_SRC_OPTIMAL back to COLOR_ATTACHMENT_OPTIMAL after the copy) and use the real old layout in the next frame’s barrier.

Comment on lines +1010 to +1016
PyArrayObject *x_arr = (PyArrayObject *)PyTuple_GetItem(args, 1);
PyArrayObject *y_arr = (PyArrayObject *)PyTuple_GetItem(args, 2);
PyArrayObject *z_arr = (PyArrayObject *)PyTuple_GetItem(args, 3);
PyArrayObject *heading_arr = (PyArrayObject *)PyTuple_GetItem(args, 4);
PyArrayObject *lengths_arr = (PyArrayObject *)PyTuple_GetItem(args, 5);
int ep_len = (int)PyLong_AsLong(PyTuple_GetItem(args, 6));


Copilot AI Apr 11, 2026


vec_get_sim_trajectories casts tuple items directly to PyArrayObject* and calls PyArray_DATA without validating they’re actually NumPy arrays / contiguous / expected dtype+shape. Other vec_* helpers in this file do PyArray_Check and shape-derived stride math. Add PyArray_Check (and ideally dtype/ndim checks) here too to prevent segfaults on misuse from Python.

Comment thread pufferlib/vector.py
Comment on lines +546 to +556
def save_worker_trajectories(self):
    """Trigger every worker to call env.notify(), then block until all finish.

    Used by PuffeRL.save_trajectories() to fan out a trajectory-save request
    across workers. Each worker's env.notify() writes a per-worker npz and
    clears its own notify flag; we spin until all flags are down.
    """
    self.buf["notify"][:] = True
    while any(self.buf["notify"]):
        time.sleep(0.01)


Copilot AI Apr 11, 2026


save_worker_trajectories() spins indefinitely using Python’s any(self.buf['notify']) over a NumPy array. This is both slower than self.buf['notify'].any() and can hang forever if a worker dies or never clears its flag. Consider using np.any(...)/.any() plus a timeout (and surfacing an error) to avoid deadlocking the training loop.

Comment thread setup.py
Comment on lines +27 to +30
# Opt-in: TRAJVIZ=1 builds the Vulkan trajectory renderer as a CPython
# extension. Requires libvulkan-dev + glslang-tools (apt). See
# docs/trajviz.md for installation. Default off — most users don't need it.
TRAJVIZ = os.getenv("TRAJVIZ", "0") == "1"

Copilot AI Apr 11, 2026


This comment points users to docs/trajviz.md, but the documentation added in this PR lives under docs/src/trajviz.md. Update the path so the install instructions are discoverable from the repo layout.

if (!traj_xyh || !vert_offsets || !poly_meta_offsets || !poly_type_offsets || !agent_lengths) {
snprintf(ctx->last_error, sizeof(ctx->last_error), "null required pointer to render_episodes_batch");
return TRAJVIZ_ERR_BAD_ARG;
}

Copilot AI Apr 11, 2026


trajviz_render_episodes_batch allows all_road_offsets / all_road_types to be NULL (they’re not included in the required-pointer check), but later computes num_polys_s from poly_meta_offsets and can call vk_batch_renderer_set_episode with num_polys_s > 0 and off_s/typ_s == NULL, which is likely to crash. Either require these pointers when any episode has polylines, or validate per-episode and force num_polys_s=0 when offsets/types are absent.

Suggested change
    }
}
{
    int any_episode_has_polylines = 0;
    for (int s = 0; s < batch_size; ++s) {
        if (poly_meta_offsets[s + 1] > poly_meta_offsets[s]) {
            any_episode_has_polylines = 1;
            break;
        }
    }
    if (any_episode_has_polylines && (!all_road_offsets || !all_road_types)) {
        snprintf(ctx->last_error, sizeof(ctx->last_error),
                 "road offset/type arrays are required when any episode has polylines");
        return TRAJVIZ_ERR_BAD_ARG;
    }
}

const uint32_t *off_s = (all_road_offsets && num_polys_plus_1 > 0) ? (all_road_offsets + pm_start) : NULL;

uint32_t pt_start = poly_type_offsets[s];
const uint32_t *typ_s = (all_road_types && num_polys_s > 0) ? (all_road_types + pt_start) : NULL;

Copilot AI Apr 11, 2026


The per-episode road slicing can yield num_polys_s > 0 while off_s/typ_s are NULL (because all_road_offsets / all_road_types are treated as optional). Before calling vk_batch_renderer_set_episode, add a consistency check that offsets/types are present whenever num_polys_s > 0, and fail with a clear error if not.

Suggested change
const uint32_t *typ_s = (all_road_types && num_polys_s > 0) ? (all_road_types + pt_start) : NULL;
if (num_polys_s > 0 && (!off_s || !typ_s)) {
    snprintf(ctx->last_error, sizeof(ctx->last_error),
             "episode %d has %u road polygons but missing %s%s",
             s, num_polys_s,
             !off_s ? "road offsets" : "",
             (!off_s && !typ_s) ? " and road types" : (!typ_s ? "road types" : ""));
    err = TRAJVIZ_ERR_BAD_ARG;
    goto cleanup;
}

Adds an env-var encoder selector to ffmpeg_pipe.c with two choices:

  TRAJVIZ_ENCODER unset (default)        → libx264 -preset veryfast
  TRAJVIZ_ENCODER=nvenc / h264_nvenc     → h264_nvenc -preset p4

libx264 stays the default — counter-intuitively NVENC turned out to be
the wrong fit for trajviz's "spawn one ffmpeg subprocess per output MP4
per render" architecture. Three reasons measured empirically on RTX 4080
+ 16-core CPU:

1. NVENC session creation is ~100 ms per session and we spawn 2N
   ffmpeg processes per render_batch call. For short episodes the
   per-session startup tax dominates wall time.

2. The driver still throttles concurrent NVENC sessions per process
   ("incompatible client key (21)") at batch_size ≥ 8 even though the
   consumer-card cap was officially removed in driver 530+.

3. In steady state, libx264 -preset veryfast and NVENC -preset p4 are
   tied per-frame at 720p (~2.3 ms/frame either way). 16 parallel
   libx264 instances on 16 cores out-throughputs a single NVENC engine
   serializing 16 streams.

Per-episode wall time, libx264 vs nvenc, both views, 1280x720:

   batch_size   T=90      T=500       T=1000
   1            350/790   1162/1540   2203/2284
   4            273/815   1139/1442   5157/5432

NVENC closes the startup gap with longer episodes but never wins on
this hardware. Real NVENC throughput unlocks would require either a
single long-lived ffmpeg with multi-input/multi-output or direct
libnvidia-encode integration with VK_KHR_external_memory_fd — both
larger refactors than v1.

`TRAJVIZ_ENCODER=nvenc` remains as a one-line opt-in for users who
want to experiment or have a single-stream long-episode workload
where the math flips.

docs/src/trajviz.md gets a "Choosing an encoder" section with the
empirical table and the architecture explanation, plus the env var is
added to the debugging knobs list.
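
The session-startup tradeoff above can be put into a toy wall-time model. The numbers are the empirical ones quoted in this commit message (100 ms/session, ~2.3 ms per 720p frame for either encoder); the model itself is a simplification, not a measurement:

```python
def batch_wall_ms(batch_size, frames, session_ms, ms_per_frame,
                  parallel_streams):
    """Toy model: each episode spawns two ffmpeg processes (two views),
    each paying a per-session startup cost; encoding throughput is
    divided across parallel_streams."""
    sessions = 2 * batch_size
    encode = sessions * frames * ms_per_frame / parallel_streams
    return sessions * session_ms + encode

# Short episodes: parallel libx264 pays no session tax, while a single
# NVENC engine pays 100 ms x 2N and serializes the streams.
x264 = batch_wall_ms(4, 90, session_ms=0, ms_per_frame=2.3,
                     parallel_streams=8)
nvenc = batch_wall_ms(4, 90, session_ms=100, ms_per_frame=2.3,
                      parallel_streams=1)
assert nvenc > x264
```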
Each Drive sub-env in a vec computes its own world_mean in set_means()
from its own map's road + agent points, so different maps in a
num_maps>1 vec have different world_means. Empirically these can
differ by 10+ km in source-Waymo coordinates across maps from
different cities.

The previous code had a misleading comment in env_binding.h's
vec_get_world_mean ("All envs in a vec share the same map-centering
convention so env 0 is representative") and saved a single world_mean
(env 0's) into trajectories_*.npz. Any offline tool that loaded a
non-env-0 sub-env's source map and tried to align it with that env's
trajectory was off by (this env's world_mean − env 0's world_mean) —
silently rendering roads in the wrong place.

Fixes:

env_binding.h
  - Replace the misleading comment on vec_get_world_mean with one
    that explains the env-0-only nature and points at the new fn
  - Add vec_get_all_world_means(c_envs, out) that fills a
    (num_envs, 3) float32 array with each sub-env's world_mean

drive.py
  - Drive.get_world_means() Python wrapper, returns (num_envs, 3)
  - Drive.notify() now saves world_means (plural, per-env) into the
    per-worker npz, in addition to the legacy world_mean (singular)

pufferl.py
  - PuffeRL.save_trajectories concatenates world_means across the
    per-worker npz files (matching how it concatenates map_ids /
    agent_offsets) and saves it as a (total_envs, 3) array
  - The serial / native PufferEnv path also saves world_means via
    driver_env.get_world_means()

trajviz/__init__.py
  - render_npz prefers world_means (plural) when present and looks up
    the right per-env value via the env loop. Falls back to the
    legacy single world_mean key with a printed warning when older
    npz files are loaded — those files render with mis-aligned roads
    for non-env-0 sub-envs (but at least they render).
  - Map cache key now includes the world_mean tuple, not just map_id,
    so future heterogeneous-init_mode setups can't trip on a stale
    cached entry.

docs/src/trajviz.md
  - New "Per-env world_means" item in the known limitations section
    explaining the schema change and the back-compat behavior

Verified empirically: Drive(num_maps=3, use_all_maps=True) loaded
map_001.bin / map_500.bin / map_900.bin, the three sub-envs returned
world_means (420, -11042, -193), (-1737, -11850, -60), and (5950,
1706, 3) respectively — max diff 12,748 m.
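
A hedged numpy sketch of the per-env alignment this commit describes: pick the right world_means row for a sub-env and lift its sim-frame trajectory back into the source-map frame. Shapes and the additive convention are assumptions for illustration:

```python
import numpy as np

def to_map_frame(traj_xyz, env_idx, world_means):
    """traj_xyz: (num_steps, 3) sim-frame points for one sub-env.
    world_means: (num_envs, 3) per-env offsets, as saved by notify().
    Hypothetical helper."""
    return traj_xyz + world_means[env_idx]

# Two sub-envs with world_means that differ by kilometers (as measured
# in the verification note above).
world_means = np.array([[420.0, -11042.0, -193.0],
                        [-1737.0, -11850.0, -60.0]], np.float32)
traj = np.zeros((4, 3), np.float32)
aligned = to_map_frame(traj, 1, world_means)
assert np.allclose(aligned[0], [-1737.0, -11850.0, -60.0])
```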